Statistics Module
The Stats module provides a comprehensive toolkit for time series analysis, including decomposition methods, transformation functions, autocorrelation analysis, and unit root tests. These functions are essential for preprocessing, analyzing, and understanding time series data before fitting forecasting models.
Overview
The Stats module exports the following functions and types:
| Category | Functions/Types |
|---|---|
| Transformations | box_cox, box_cox!, box_cox_lambda, inv_box_cox |
| Decomposition | decompose, DecomposedTimeSeries, stl, STLResult, mstl, MSTLResult |
| Differencing | diff, ndiffs, nsdiffs |
| Autocorrelation | acf, pacf, ACFResult, PACFResult |
| Unit Root Tests | adf, ADF, kpss, KPSS, phillips_perron, PhillipsPerron, ocsb, OCSB |
| Missing Values | handle_missing, longest_contiguous, interpolate_missing, check_missing, MissingMethod, Contiguous, Interpolate, FailMissing |
| Utilities | fourier, embed, ols, OlsFit, approx, approxfun, seasonal_strength |
Box-Cox Transformations
Box-Cox transformations stabilize variance and make data more normally distributed, which can improve forecast accuracy.
Mathematical Formulation
The Box-Cox transformation is defined as:
\[y^{(\lambda)} = \begin{cases} \frac{y^\lambda - 1}{\lambda} & \text{if } \lambda \neq 0 \\ \log(y) & \text{if } \lambda = 0 \end{cases}\]
box_cox_lambda
Automatically select the optimal Box-Cox transformation parameter.
box_cox_lambda(x, m; method=:guerrero, lower=-1, upper=2)Arguments:
x::AbstractVector: A numeric vector (must be positive for Guerrero method)m::Int: Frequency of the datamethod::Symbol: Selection method -:guerrero(default) or:logliklower::Float64: Lower bound for λ search (default: -1)upper::Float64: Upper bound for λ search (default: 2)
Methods:
- Guerrero: Minimizes the coefficient of variation for subseries (Guerrero, 1993)
- Log-likelihood: Maximizes the profile log likelihood of a linear model
Example:
using Durbyn.Stats
y = [120, 135, 148, 152, 141, 158, 170, 165, 180, 195]
lambda = box_cox_lambda(y, 12, method=:guerrero)box_cox
Apply Box-Cox transformation to a series.
box_cox(x, m; lambda=:auto)Arguments:
x::AbstractVector: Input vectorm::Int: Frequencylambda: Transformation parameter or:autofor automatic selection
Returns: Tuple (transformed_vector, lambda_used)
Example:
y_transformed, lambda = box_cox(y, 12; lambda=:auto)box_cox!
In-place Box-Cox transformation for memory efficiency.
box_cox!(output, x, m; lambda)Note: Use this in tight loops where box_cox is called repeatedly to avoid allocations.
inv_box_cox
Reverse the Box-Cox transformation.
inv_box_cox(x; lambda, biasadj=false, fvar=nothing)Arguments:
x::AbstractArray: Transformed datalambda::Real: Transformation parameter usedbiasadj::Bool: Apply bias adjustment for mean forecasts (default: false)fvar: Forecast variance (required ifbiasadj=true)
Example:
y_original = inv_box_cox(y_transformed; lambda=0.5)
# With bias adjustment for forecasts
y_mean = inv_box_cox(y_transformed; lambda=0.5, biasadj=true, fvar=forecast_variance)References
- Box, G. E. P. and Cox, D. R. (1964). An analysis of transformations. JRSS B, 26, 211-246.
- Guerrero, V.M. (1993). Time-series analysis supported by power transformations. Journal of Forecasting, 12, 37-48.
- Bickel, P. J. and Doksum K. A. (1981). An Analysis of Transformations Revisited. JASA, 76, 296-311.
Time Series Decomposition
Classical Decomposition (decompose)
Decompose a time series into trend, seasonal, and residual components using moving averages.
decompose(x; m, type=:additive, filter=nothing)Arguments:
x::AbstractVector: Time series vectorm::Int: Frequency (observations per cycle)type::Symbol::additiveor:multiplicativefilter: Custom filter coefficients (optional)
Returns: DecomposedTimeSeries struct with fields:
x: Original seriesseasonal: Seasonal componenttrend: Trend componentrandom: Residual/remainderfigure: Seasonal figuretype: Decomposition typem: Frequency
Example:
using Durbyn.Stats
ap = air_passengers()
result = decompose(ap; m=12, type=:multiplicative)
result.trend
result.seasonalSTL Decomposition (stl)
Seasonal-Trend decomposition using LOESS (STL) - a robust and flexible method for decomposing time series.
stl(x, m; s_window, s_degree=0, t_window=nothing, t_degree=1,
l_window=nothing, l_degree=t_degree, s_jump=nothing, t_jump=nothing,
l_jump=nothing, robust=false, inner=nothing, outer=nothing)Arguments:
x::AbstractVector: Time series to decomposem::Int: Seasonal frequency (must be ≥ 2)s_window: Seasonal smoothing window (integer or"periodic")s_degree::Int: Seasonal smoothing polynomial degree (0 or 1)t_window: Trend smoothing windowt_degree::Int: Trend smoothing polynomial degree (0 or 1)l_window: Low-pass filter windowl_degree::Int: Low-pass filter polynomial degrees_jump,t_jump,l_jump: Subsampling stepsrobust::Bool: Use robustness iterations (default: false)inner,outer: Inner/outer iteration counts
Returns: STLResult struct with:
time_series: NamedTuple with:seasonal,:trend,:remainderweights: Robustness weightswindows: (s, t, l) window sizesdegrees: (s, t, l) polynomial degreesjumps: (s, t, l) jump parametersinner,outer: Iteration counts
Example:
ap = air_passengers()
result = stl(ap, 12; s_window=7, robust=true)
# Access components
result.time_series.trend
result.time_series.seasonal
result.time_series.remainder
# Summarize and plot
summary(result)
plot(result)Multiple Seasonal Decomposition (mstl)
Decompose time series with multiple seasonal periods using iterative STL.
mstl(x, m; lambda=nothing, iterate=2, s_window=nothing, stl_kwargs...)Arguments:
x::AbstractVector: Time seriesm: Single period (Int) or vector of periodslambda: Box-Cox parameter (nothing,:auto, or numeric)iterate::Int: Number of outer iterations (default: 2)s_window: Seasonal window(s)stl_kwargs...: Additional arguments passed tostl
Returns: MSTLResult struct with:
data: Original seriestrend: Trend componentseasonals: Vector of seasonal componentsm: Periods usedremainder: Residual componentlambda: Box-Cox λ used
Example:
# Hourly data with daily and weekly patterns
y = rand(200) .+ 2sin.(2π*(1:200)/7) .+ 0.5sin.(2π*(1:200)/30)
result = mstl(y; m=[7, 30], iterate=2, s_window=[11, 23], robust=true)
# Access components
result.trend
result.seasonals[1] # First seasonal component (period 7)
result.seasonals[2] # Second seasonal component (period 30)
result.remainder
# Summarize
summary(result)Seasonal Strength
Measure the strength of seasonality in an MSTL decomposition.
seasonal_strength(x; m, kwargs...)
seasonal_strength(res::MSTLResult)The seasonal strength is computed as:
\[\text{strength} = 1 - \frac{\text{Var}(\text{remainder})}{\text{Var}(\text{remainder} + \text{seasonal})}\]
Values range from 0 (no seasonality) to 1 (strong seasonality).
Example:
result = mstl(y; m=[7, 30])
strength = seasonal_strength(result)Autocorrelation Functions
ACF (acf)
Compute the sample autocorrelation function.
acf(y, m, nlags=nothing; demean=true)Arguments:
y::AbstractVector: Input time seriesm::Int: Frequency/seasonal periodnlags: Number of lags (default:min(10*log10(n), n-1))demean::Bool: Subtract mean before computing (default: true)
Returns: ACFResult with:
values: ACF values at each lag (including lag 0)lags: Lag indicesn: Series lengthm: Frequencyci: 95% confidence interval (±1.96/√n)
Formula:
\[\hat{\rho}(k) = \frac{\sum_{t=1}^{n-k} (y_t - \bar{y})(y_{t+k} - \bar{y})}{\sum_{t=1}^{n} (y_t - \bar{y})^2}\]
Example:
y = randn(100)
result = acf(y, 12)
result.values # ACF values
result.ci # Confidence interval
plot(result) # Requires Plots.jlPACF (pacf)
Compute the sample partial autocorrelation function using the Durbin-Levinson algorithm.
pacf(y, m, nlags=nothing)Arguments:
y::AbstractVector: Input time seriesm::Int: Frequency/seasonal periodnlags: Number of lags (default:min(10*log10(n), n-1))
Returns: PACFResult with:
values: PACF values (lags 1 to nlags)lags: Lag indicesn: Series lengthm: Frequencyci: 95% confidence interval
Example:
y = randn(100)
result = pacf(y, 12)
result.values
plot(result)Differencing
diff
Compute lagged differences of a vector or matrix.
diff(x; lag=1, differences=1)Arguments:
x: Vector or matrixlag::Int: Lag interval (default: 1)differences::Int: Number of times to apply differencing (default: 1)
Example:
y = [1, 3, 6, 10, 15]
diff(y) # [2, 3, 4, 5]
diff(y; lag=2) # [5, 7, 9]
diff(y; differences=2) # [1, 1, 1]ndiffs
Determine the number of non-seasonal differences needed for stationarity.
ndiffs(x; alpha=0.05, test=:kpss, deterministic=:level, maxd=2, kwargs...)Arguments:
x::AbstractVector: Time seriesalpha::Float64: Significance level (clamped to [0.01, 0.10])test::Symbol: Unit root test -:kpss,:adf, or:ppdeterministic::Symbol::level(intercept) or:trend(intercept + trend)maxd::Int: Maximum differences to try (default: 2)
Test Behavior:
- KPSS: Null = stationarity. Returns smallest d where KPSS does not reject.
- ADF/PP: Null = unit root. Returns smallest d where test rejects unit root.
Example:
y = cumsum(randn(100)) # Random walk (non-stationary)
d = ndiffs(y; test=:kpss)
println("Differences needed: $d")
# Using ADF test
d_adf = ndiffs(y; test=:adf, deterministic=:trend)nsdiffs
Determine the number of seasonal differences needed.
nsdiffs(x, m; alpha=0.05, test=:seas, maxD=1, kwargs...)Arguments:
x::AbstractVector: Time seriesm::Int: Seasonal periodalpha::Float64: Significance leveltest::Symbol::seas(default) or:ocsbmaxD::Int: Maximum seasonal differences (default: 1)
Example:
ap = air_passengers()
D = nsdiffs(ap, 12)
println("Seasonal differences needed: $D")Unit Root Tests
ADF Test (adf)
Augmented Dickey-Fuller test for unit roots.
adf(y; type=:none, lags=1, selectlags=:fixed)Null Hypothesis: The series has a unit root (non-stationary)
Arguments:
y::AbstractVector: Time seriestype::Symbol::none,:drift(intercept), or:trend(intercept + trend)lags::Int: Maximum augmentation orderselectlags::Symbol::fixed,:aic, or:bic
Returns: ADF struct with:
model: Test type usedcval: Critical values matrixclevels: Significance levels [0.01, 0.05, 0.10]lag: Selected augmentation orderteststat: Test statistics (τ-statistics)testreg: Auxiliary regression resultsres: Residuals
Example:
y = cumsum(randn(100))
result = adf(y; type=:drift, lags=4, selectlags=:aic)
println("Test statistic: $(result.teststat[1])")
println("Critical values: $(result.cval)")KPSS Test (kpss)
Kwiatkowski-Phillips-Schmidt-Shin test for stationarity.
kpss(y; type=:mu, lags=:short, use_lag=nothing)Null Hypothesis: The series is stationary
Arguments:
y::AbstractVector: Time seriestype::Symbol::mu(constant) or:tau(constant + trend)lags::Symbol::short,:long, or:niluse_lag: Manually specify bandwidth
Returns: KPSS struct with:
type: Test typelag: Bandwidth usedteststat: Test statisticcval,clevels: Critical values and levelsres: Regression residuals
Example:
y = randn(100) # Stationary
result = kpss(y; type=:mu)
println("Test statistic: $(result.teststat)")
println("Critical values: $(result.cval)")Phillips-Perron Test (phillips_perron)
Phillips-Perron unit root test with non-parametric correction.
phillips_perron(x; type=:Z_alpha, model=:constant, lags=:short, use_lag=nothing)Null Hypothesis: The series has a unit root
Arguments:
x::AbstractVector: Time seriestype::Symbol::Z_alphaor:Z_taumodel::Symbol::constantor:trendlags::Symbol::shortor:longuse_lag: Bartlett truncation lag
Returns: PhillipsPerron struct with test results.
Example:
result = phillips_perron(y; type=:Z_tau, model=:trend)OCSB Test (ocsb)
Osborn-Chui-Smith-Birchenhall test for seasonal unit roots.
ocsb(x, m; lag_method=:fixed, maxlag=0, clevels=[0.10, 0.05, 0.01])Null Hypothesis: Seasonal unit root exists
Arguments:
x::AbstractVector: Time seriesm::Int: Seasonal periodlag_method::Symbol::fixed,:AIC,:BIC, or:AICcmaxlag::Int: Maximum AR order to consider
Returns: OCSB struct with:
teststat: OCSB t-statisticcval,clevels: Critical values and levelslag: Selected AR order
Example:
ap = air_passengers()
result = ocsb(ap, 12; lag_method=:AIC)Utility Functions
Fourier Terms (fourier)
Generate Fourier terms for seasonal modeling in regression.
fourier(x; m, K, h=nothing)Arguments:
x::AbstractVector: Time seriesm: Seasonal periodK::Int: Number of Fourier termsh: Forecast horizon (optional)
Returns: Matrix of sin/cos terms
Example:
y = randn(100)
F = fourier(y; m=12, K=6)
# Use F as regressors in ARIMA with external regressorsTime-Delay Embedding (embed)
Create a time-delay embedding matrix (lag matrix).
embed(x, dimension=1)Arguments:
x: Vector or matrixdimension::Int: Embedding dimension
Returns: Matrix with lags in descending order (compatible with R's embed)
Example:
y = [1, 2, 3, 4, 5]
embed(y, 3)
# Returns:
# 3 2 1
# 4 3 2
# 5 4 3Ordinary Least Squares (ols)
Fit OLS linear regression.
ols(y, X)Arguments:
y::AbstractVector: Response vectorX::AbstractMatrix: Design matrix (include intercept column if needed)
Returns: OlsFit struct with:
coef: Estimated coefficientsfitted: Fitted valuesresiduals: Residualssigma2: Residual variancecov: Coefficient covariance matrixse: Standard errorsdf_residual: Residual degrees of freedom
Example:
n = 100
X = hcat(ones(n), randn(n)) # Intercept + predictor
y = X * [2.0, 3.0] + randn(n) * 0.5
fit = ols(y, X)
fit.coef # Coefficients
fit.se # Standard errors
fit.residuals # Residuals
# Predictions
predict(fit, X_new)Interpolation (approx, approxfun)
Linear or constant interpolation of data.
approx(x, y; xout=nothing, method=:linear, n=50, yleft=nothing,
yright=nothing, rule=(1,1), f=0.0, ties=mean, na_rm=true)Arguments:
x, y: Coordinates to interpolatexout: Output grid points (default: n equally spaced)method::linearor:constantn: Number of interpolation pointsyleft,yright: Extrapolation valuesrule:(1,1)for missing at boundaries,(2,2)for boundary valuesf: For:constant, controls step function continuityties: Function to collapse duplicate x values
Returns: NamedTuple (x=xout_vec, y=yout_vec)
Example:
x = [1, 2, 4, 5]
y = [2, 4, 6, 8]
result = approx(x, y; n=10)
# Create interpolation function
f = approxfun(x, y)
f(3) # Interpolate at x=3Missing Value Handling
Durbyn provides a type-dispatched system for handling missing values (missing and NaN) in time series data.
Type Hierarchy
All missing value strategies are subtypes of the abstract type MissingMethod:
| Type | Description |
|---|---|
Contiguous() | Extract the longest contiguous segment without missing values |
Interpolate() | Interpolate missing values (seasonal-aware) |
Interpolate(; linear=true) | Force linear interpolation |
FailMissing() | Throw an error if any missing values are present |
handle_missing
Dispatch to the appropriate strategy based on the MissingMethod type.
handle_missing(x, Contiguous()) # longest contiguous segment
handle_missing(x, Interpolate(); m=12) # seasonal interpolation
handle_missing(x, FailMissing()) # error if missingArguments:
x::AbstractArray: Input vector (may containmissingorNaN)method::MissingMethod: Strategy to usem::Union{Int,Nothing}: Seasonal period (used byInterpolate)
Example:
using Durbyn.Stats
x = [1.0, 2.0, missing, 4.0, 5.0]
# Type dispatch replaces string dispatch
result = handle_missing(x, Contiguous())
result = handle_missing(x, Interpolate())
result = handle_missing(x, FailMissing()) # throws ArgumentErrorlongest_contiguous
Extract the longest contiguous segment of non-missing values.
longest_contiguous(x)Both missing and NaN are treated as missing values.
Example:
x = [missing, 1.0, 2.0, 3.0, missing, 4.0, missing]
longest_contiguous(x) # Returns [1.0, 2.0, 3.0]interpolate_missing
Interpolate missing values in a time series. For seasonal data, uses STL decomposition with seasonal-aware interpolation. For non-seasonal data, uses linear interpolation.
interpolate_missing(x; m=nothing, lambda=nothing, linear=nothing)Arguments:
x::AbstractVector: Time series (may containmissingorNaN)m::Union{Int,Nothing}: Seasonal period (nothingor1= non-seasonal)lambda::Union{Nothing,Real}: Box-Cox transformation parameterlinear::Union{Nothing,Bool}: Force linear interpolation
Algorithm (seasonal):
- Fit preliminary model with Fourier terms and polynomial trend
- Apply robust MSTL decomposition
- Linearly interpolate the seasonally adjusted series
- Add back the seasonal component
- Fall back to linear if results are unstable
Example:
# Non-seasonal
y = [1.0, 2.0, missing, 4.0, 5.0]
interpolate_missing(y)
# Seasonal (monthly)
ap = air_passengers()
ap[50] = NaN
interpolate_missing(ap; m=12)
# With Box-Cox
interpolate_missing(y; lambda=0.5)check_missing
Verify that a vector contains no missing values. Returns the input unchanged if clean; throws ArgumentError otherwise.
check_missing(x)Example:
check_missing([1.0, 2.0, 3.0]) # Returns [1.0, 2.0, 3.0]
check_missing([1.0, missing, 3.0]) # Throws ArgumentError
check_missing([1.0, NaN, 3.0]) # Throws ArgumentErrorUsage in ETS
The ets() function accepts a missing_method keyword argument:
using Durbyn.ExponentialSmoothing
ap = air_passengers()
# Default: extract longest contiguous segment
fit1 = ets(ap, 12, "ZZZ")
# Interpolate missing values
fit2 = ets(ap, 12, "ZZZ"; missing_method=Interpolate())
# Error on missing values
fit3 = ets(ap, 12, "ZZZ"; missing_method=FailMissing())Complete Example: Time Series Preprocessing Pipeline
using Durbyn
using Durbyn.Stats
# Load data
ap = air_passengers()
# 1. Check for stationarity and determine differencing
d = ndiffs(ap; test=:kpss)
D = nsdiffs(ap, 12)
println("Non-seasonal differences: $d, Seasonal differences: $D")
# 2. Apply Box-Cox transformation
y_transformed, lambda = box_cox(ap, 12; lambda=:auto)
println("Box-Cox lambda: $lambda")
# 3. Decompose the series
result = stl(ap, 12; s_window=7, robust=true)
summary(result)
# 4. Examine autocorrelation structure
acf_result = acf(ap, 12, 24)
pacf_result = pacf(ap, 12, 24)
# 5. Multiple seasonal decomposition (if applicable)
mstl_result = mstl(ap; m=12, lambda=lambda)
strength = seasonal_strength(mstl_result)
println("Seasonal strength: $strength")
# 6. Perform unit root tests
adf_result = adf(ap; type=:drift, selectlags=:aic)
kpss_result = kpss(ap; type=:mu)
println("ADF test statistic: $(adf_result.teststat[1])")
println("KPSS test statistic: $(kpss_result.teststat)")References
- Cleveland, R. B., Cleveland, W. S., McRae, J. E., & Terpenning, I. (1990). STL: A Seasonal-Trend Decomposition Procedure Based on Loess. Journal of Official Statistics, 6(1), 3-73.
- Dickey, D. A., & Fuller, W. A. (1979). Distribution of the Estimators for Autoregressive Time Series with a Unit Root. JASA, 74, 427-431.
- Kwiatkowski, D., Phillips, P. C. B., Schmidt, P., & Shin, Y. (1992). Testing the Null Hypothesis of Stationarity Against the Alternative of a Unit Root. Journal of Econometrics, 54, 159-178.
- Phillips, P. C. B., & Perron, P. (1988). Testing for a Unit Root in Time Series Regression. Biometrika, 72(2), 335-346.
- Osborn, D. R., Chui, A. P. L., Smith, J. P., & Birchenhall, C. R. (1988). Seasonality and the Order of Integration for Consumption. Oxford Bulletin of Economics and Statistics, 50, 361-377.
- Box, G. E. P., Jenkins, G. M., Reinsel, G. C., & Ljung, G. M. (2015). Time Series Analysis: Forecasting and Control. Wiley.