Statistics Module

The Stats module provides a comprehensive toolkit for time series analysis, including decomposition methods, transformation functions, autocorrelation analysis, and unit root tests. These functions are essential for preprocessing, analyzing, and understanding time series data before fitting forecasting models.

Overview

The Stats module exports the following functions and types:

Category	Functions/Types
Transformations	`box_cox`, `box_cox!`, `box_cox_lambda`, `inv_box_cox`
Decomposition	`decompose`, `DecomposedTimeSeries`, `stl`, `STLResult`, `mstl`, `MSTLResult`
Differencing	`diff`, `ndiffs`, `nsdiffs`
Autocorrelation	`acf`, `pacf`, `ACFResult`, `PACFResult`
Unit Root Tests	`adf`, `ADF`, `kpss`, `KPSS`, `phillips_perron`, `PhillipsPerron`, `ocsb`, `OCSB`
Missing Values	`handle_missing`, `longest_contiguous`, `interpolate_missing`, `check_missing`, `MissingMethod`, `Contiguous`, `Interpolate`, `FailMissing`
Utilities	`fourier`, `embed`, `ols`, `OlsFit`, `approx`, `approxfun`, `seasonal_strength`

Box-Cox Transformations

Box-Cox transformations stabilize variance and make data more normally distributed, which can improve forecast accuracy.

Mathematical Formulation

The Box-Cox transformation is defined as:

\[y^{(\lambda)} = \begin{cases} \frac{y^\lambda - 1}{\lambda} & \text{if } \lambda \neq 0 \\ \log(y) & \text{if } \lambda = 0 \end{cases}\]

`box_cox_lambda`

Automatically select the optimal Box-Cox transformation parameter.

box_cox_lambda(x, m; method=:guerrero, lower=-1, upper=2)

Arguments:

x::AbstractVector: A numeric vector (must be positive for Guerrero method)
m::Int: Frequency of the data
method::Symbol: Selection method - :guerrero (default) or :loglik
lower::Float64: Lower bound for λ search (default: -1)
upper::Float64: Upper bound for λ search (default: 2)

Methods:

Guerrero: Minimizes the coefficient of variation for subseries (Guerrero, 1993)
Log-likelihood: Maximizes the profile log likelihood of a linear model

Example:

using Durbyn.Stats

y = [120, 135, 148, 152, 141, 158, 170, 165, 180, 195]
lambda = box_cox_lambda(y, 12, method=:guerrero)

`box_cox`

Apply Box-Cox transformation to a series.

box_cox(x, m; lambda=:auto)

Arguments:

x::AbstractVector: Input vector
m::Int: Frequency
lambda: Transformation parameter or :auto for automatic selection

Returns: Tuple (transformed_vector, lambda_used)

Example:

y_transformed, lambda = box_cox(y, 12; lambda=:auto)

`box_cox!`

In-place Box-Cox transformation for memory efficiency.

box_cox!(output, x, m; lambda)

Note: Use this in tight loops where box_cox is called repeatedly to avoid allocations.

`inv_box_cox`

Reverse the Box-Cox transformation.

inv_box_cox(x; lambda, biasadj=false, fvar=nothing)

Arguments:

x::AbstractArray: Transformed data
lambda::Real: Transformation parameter used
biasadj::Bool: Apply bias adjustment for mean forecasts (default: false)
fvar: Forecast variance (required if biasadj=true)

Example:

y_original = inv_box_cox(y_transformed; lambda=0.5)

# With bias adjustment for forecasts
y_mean = inv_box_cox(y_transformed; lambda=0.5, biasadj=true, fvar=forecast_variance)

References

Box, G. E. P. and Cox, D. R. (1964). An analysis of transformations. JRSS B, 26, 211-246.
Guerrero, V.M. (1993). Time-series analysis supported by power transformations. Journal of Forecasting, 12, 37-48.
Bickel, P. J. and Doksum K. A. (1981). An Analysis of Transformations Revisited. JASA, 76, 296-311.

Time Series Decomposition

Classical Decomposition (`decompose`)

Decompose a time series into trend, seasonal, and residual components using moving averages.

decompose(x; m, type=:additive, filter=nothing)

Arguments:

x::AbstractVector: Time series vector
m::Int: Frequency (observations per cycle)
type::Symbol: :additive or :multiplicative
filter: Custom filter coefficients (optional)

Returns: DecomposedTimeSeries struct with fields:

x: Original series
seasonal: Seasonal component
trend: Trend component
random: Residual/remainder
figure: Seasonal figure
type: Decomposition type
m: Frequency

Example:

using Durbyn.Stats

ap = air_passengers()
result = decompose(ap; m=12, type=:multiplicative)
result.trend
result.seasonal

STL Decomposition (`stl`)

Seasonal-Trend decomposition using LOESS (STL) - a robust and flexible method for decomposing time series.

stl(x, m; s_window, s_degree=0, t_window=nothing, t_degree=1,
    l_window=nothing, l_degree=t_degree, s_jump=nothing, t_jump=nothing,
    l_jump=nothing, robust=false, inner=nothing, outer=nothing)

Arguments:

x::AbstractVector: Time series to decompose
m::Int: Seasonal frequency (must be ≥ 2)
s_window: Seasonal smoothing window (integer or "periodic")
s_degree::Int: Seasonal smoothing polynomial degree (0 or 1)
t_window: Trend smoothing window
t_degree::Int: Trend smoothing polynomial degree (0 or 1)
l_window: Low-pass filter window
l_degree::Int: Low-pass filter polynomial degree
s_jump, t_jump, l_jump: Subsampling steps
robust::Bool: Use robustness iterations (default: false)
inner, outer: Inner/outer iteration counts

Returns: STLResult struct with:

time_series: NamedTuple with :seasonal, :trend, :remainder
weights: Robustness weights
windows: (s, t, l) window sizes
degrees: (s, t, l) polynomial degrees
jumps: (s, t, l) jump parameters
inner, outer: Iteration counts

Example:

ap = air_passengers()
result = stl(ap, 12; s_window=7, robust=true)

# Access components
result.time_series.trend
result.time_series.seasonal
result.time_series.remainder

# Summarize and plot
summary(result)
plot(result)

Multiple Seasonal Decomposition (`mstl`)

Decompose time series with multiple seasonal periods using iterative STL.

mstl(x, m; lambda=nothing, iterate=2, s_window=nothing, stl_kwargs...)

Arguments:

x::AbstractVector: Time series
m: Single period (Int) or vector of periods
lambda: Box-Cox parameter (nothing, :auto, or numeric)
iterate::Int: Number of outer iterations (default: 2)
s_window: Seasonal window(s)
stl_kwargs...: Additional arguments passed to stl

Returns: MSTLResult struct with:

data: Original series
trend: Trend component
seasonals: Vector of seasonal components
m: Periods used
remainder: Residual component
lambda: Box-Cox λ used

Example:

# Hourly data with daily and weekly patterns
y = rand(200) .+ 2sin.(2π*(1:200)/7) .+ 0.5sin.(2π*(1:200)/30)
result = mstl(y; m=[7, 30], iterate=2, s_window=[11, 23], robust=true)

# Access components
result.trend
result.seasonals[1]  # First seasonal component (period 7)
result.seasonals[2]  # Second seasonal component (period 30)
result.remainder

# Summarize
summary(result)

Seasonal Strength

Measure the strength of seasonality in an MSTL decomposition.

seasonal_strength(x; m, kwargs...)
seasonal_strength(res::MSTLResult)

The seasonal strength is computed as:

\[\text{strength} = 1 - \frac{\text{Var}(\text{remainder})}{\text{Var}(\text{remainder} + \text{seasonal})}\]

Values range from 0 (no seasonality) to 1 (strong seasonality).

Example:

result = mstl(y; m=[7, 30])
strength = seasonal_strength(result)

Autocorrelation Functions

ACF (`acf`)

Compute the sample autocorrelation function.

acf(y, m, nlags=nothing; demean=true)

Arguments:

y::AbstractVector: Input time series
m::Int: Frequency/seasonal period
nlags: Number of lags (default: min(10*log10(n), n-1))
demean::Bool: Subtract mean before computing (default: true)

Returns: ACFResult with:

values: ACF values at each lag (including lag 0)
lags: Lag indices
n: Series length
m: Frequency
ci: 95% confidence interval (±1.96/√n)

Formula:

\[\hat{\rho}(k) = \frac{\sum_{t=1}^{n-k} (y_t - \bar{y})(y_{t+k} - \bar{y})}{\sum_{t=1}^{n} (y_t - \bar{y})^2}\]

Example:

y = randn(100)
result = acf(y, 12)
result.values  # ACF values
result.ci      # Confidence interval

plot(result)   # Requires Plots.jl

PACF (`pacf`)

Compute the sample partial autocorrelation function using the Durbin-Levinson algorithm.

pacf(y, m, nlags=nothing)

Arguments:

y::AbstractVector: Input time series
m::Int: Frequency/seasonal period
nlags: Number of lags (default: min(10*log10(n), n-1))

Returns: PACFResult with:

values: PACF values (lags 1 to nlags)
lags: Lag indices
n: Series length
m: Frequency
ci: 95% confidence interval

Example:

y = randn(100)
result = pacf(y, 12)
result.values
plot(result)

Differencing

`diff`

Compute lagged differences of a vector or matrix.

diff(x; lag=1, differences=1)

Arguments:

x: Vector or matrix
lag::Int: Lag interval (default: 1)
differences::Int: Number of times to apply differencing (default: 1)

Example:

y = [1, 3, 6, 10, 15]
diff(y)                    # [2, 3, 4, 5]
diff(y; lag=2)             # [5, 7, 9]
diff(y; differences=2)     # [1, 1, 1]

`ndiffs`

Determine the number of non-seasonal differences needed for stationarity.

ndiffs(x; alpha=0.05, test=:kpss, deterministic=:level, maxd=2, kwargs...)

Arguments:

x::AbstractVector: Time series
alpha::Float64: Significance level (clamped to [0.01, 0.10])
test::Symbol: Unit root test - :kpss, :adf, or :pp
deterministic::Symbol: :level (intercept) or :trend (intercept + trend)
maxd::Int: Maximum differences to try (default: 2)

Test Behavior:

KPSS: Null = stationarity. Returns smallest d where KPSS does not reject.
ADF/PP: Null = unit root. Returns smallest d where test rejects unit root.

Example:

y = cumsum(randn(100))  # Random walk (non-stationary)
d = ndiffs(y; test=:kpss)
println("Differences needed: $d")

# Using ADF test
d_adf = ndiffs(y; test=:adf, deterministic=:trend)

`nsdiffs`

Determine the number of seasonal differences needed.

nsdiffs(x, m; alpha=0.05, test=:seas, maxD=1, kwargs...)

Arguments:

x::AbstractVector: Time series
m::Int: Seasonal period
alpha::Float64: Significance level
test::Symbol: :seas (default) or :ocsb
maxD::Int: Maximum seasonal differences (default: 1)

Example:

ap = air_passengers()
D = nsdiffs(ap, 12)
println("Seasonal differences needed: $D")

Unit Root Tests

ADF Test (`adf`)

Augmented Dickey-Fuller test for unit roots.

adf(y; type=:none, lags=1, selectlags=:fixed)

Null Hypothesis: The series has a unit root (non-stationary)

Arguments:

y::AbstractVector: Time series
type::Symbol: :none, :drift (intercept), or :trend (intercept + trend)
lags::Int: Maximum augmentation order
selectlags::Symbol: :fixed, :aic, or :bic

Returns: ADF struct with:

model: Test type used
cval: Critical values matrix
clevels: Significance levels [0.01, 0.05, 0.10]
lag: Selected augmentation order
teststat: Test statistics (τ-statistics)
testreg: Auxiliary regression results
res: Residuals

Example:

y = cumsum(randn(100))
result = adf(y; type=:drift, lags=4, selectlags=:aic)
println("Test statistic: $(result.teststat[1])")
println("Critical values: $(result.cval)")

KPSS Test (`kpss`)

Kwiatkowski-Phillips-Schmidt-Shin test for stationarity.

kpss(y; type=:mu, lags=:short, use_lag=nothing)

Null Hypothesis: The series is stationary

Arguments:

y::AbstractVector: Time series
type::Symbol: :mu (constant) or :tau (constant + trend)
lags::Symbol: :short, :long, or :nil
use_lag: Manually specify bandwidth

Returns: KPSS struct with:

type: Test type
lag: Bandwidth used
teststat: Test statistic
cval, clevels: Critical values and levels
res: Regression residuals

Example:

y = randn(100)  # Stationary
result = kpss(y; type=:mu)
println("Test statistic: $(result.teststat)")
println("Critical values: $(result.cval)")

Phillips-Perron Test (`phillips_perron`)

Phillips-Perron unit root test with non-parametric correction.

phillips_perron(x; type=:Z_alpha, model=:constant, lags=:short, use_lag=nothing)

Null Hypothesis: The series has a unit root

Arguments:

x::AbstractVector: Time series
type::Symbol: :Z_alpha or :Z_tau
model::Symbol: :constant or :trend
lags::Symbol: :short or :long
use_lag: Bartlett truncation lag

Returns: PhillipsPerron struct with test results.

Example:

result = phillips_perron(y; type=:Z_tau, model=:trend)

OCSB Test (`ocsb`)

Osborn-Chui-Smith-Birchenhall test for seasonal unit roots.

ocsb(x, m; lag_method=:fixed, maxlag=0, clevels=[0.10, 0.05, 0.01])

Null Hypothesis: Seasonal unit root exists

Arguments:

x::AbstractVector: Time series
m::Int: Seasonal period
lag_method::Symbol: :fixed, :AIC, :BIC, or :AICc
maxlag::Int: Maximum AR order to consider

Returns: OCSB struct with:

teststat: OCSB t-statistic
cval, clevels: Critical values and levels
lag: Selected AR order

Example:

ap = air_passengers()
result = ocsb(ap, 12; lag_method=:AIC)

Utility Functions

Fourier Terms (`fourier`)

Generate Fourier terms for seasonal modeling in regression.

fourier(x; m, K, h=nothing)

Arguments:

x::AbstractVector: Time series
m: Seasonal period
K::Int: Number of Fourier terms
h: Forecast horizon (optional)

Returns: Matrix of sin/cos terms

Example:

y = randn(100)
F = fourier(y; m=12, K=6)
# Use F as regressors in ARIMA with external regressors

Time-Delay Embedding (`embed`)

Create a time-delay embedding matrix (lag matrix).

embed(x, dimension=1)

Arguments:

x: Vector or matrix
dimension::Int: Embedding dimension

Returns: Matrix with lags in descending order (compatible with R's embed)

Example:

y = [1, 2, 3, 4, 5]
embed(y, 3)
# Returns:
# 3  2  1
# 4  3  2
# 5  4  3

Ordinary Least Squares (`ols`)

Fit OLS linear regression.

ols(y, X)

Arguments:

y::AbstractVector: Response vector
X::AbstractMatrix: Design matrix (include intercept column if needed)

Returns: OlsFit struct with:

coef: Estimated coefficients
fitted: Fitted values
residuals: Residuals
sigma2: Residual variance
cov: Coefficient covariance matrix
se: Standard errors
df_residual: Residual degrees of freedom

Example:

n = 100
X = hcat(ones(n), randn(n))  # Intercept + predictor
y = X * [2.0, 3.0] + randn(n) * 0.5

fit = ols(y, X)
fit.coef       # Coefficients
fit.se         # Standard errors
fit.residuals  # Residuals

# Predictions
predict(fit, X_new)

Interpolation (`approx`, `approxfun`)

Linear or constant interpolation of data.

approx(x, y; xout=nothing, method=:linear, n=50, yleft=nothing,
       yright=nothing, rule=(1,1), f=0.0, ties=mean, na_rm=true)

Arguments:

x, y: Coordinates to interpolate
xout: Output grid points (default: n equally spaced)
method: :linear or :constant
n: Number of interpolation points
yleft, yright: Extrapolation values
rule: (1,1) for missing at boundaries, (2,2) for boundary values
f: For :constant, controls step function continuity
ties: Function to collapse duplicate x values

Returns: NamedTuple (x=xout_vec, y=yout_vec)

Example:

x = [1, 2, 4, 5]
y = [2, 4, 6, 8]
result = approx(x, y; n=10)

# Create interpolation function
f = approxfun(x, y)
f(3)  # Interpolate at x=3

Missing Value Handling

Durbyn provides a type-dispatched system for handling missing values (missing and NaN) in time series data.

Type Hierarchy

All missing value strategies are subtypes of the abstract type MissingMethod:

Type	Description
`Contiguous()`	Extract the longest contiguous segment without missing values
`Interpolate()`	Interpolate missing values (seasonal-aware)
`Interpolate(; linear=true)`	Force linear interpolation
`FailMissing()`	Throw an error if any missing values are present

`handle_missing`

Dispatch to the appropriate strategy based on the MissingMethod type.

handle_missing(x, Contiguous())           # longest contiguous segment
handle_missing(x, Interpolate(); m=12)    # seasonal interpolation
handle_missing(x, FailMissing())          # error if missing

Arguments:

x::AbstractArray: Input vector (may contain missing or NaN)
method::MissingMethod: Strategy to use
m::Union{Int,Nothing}: Seasonal period (used by Interpolate)

Example:

using Durbyn.Stats

x = [1.0, 2.0, missing, 4.0, 5.0]

# Type dispatch replaces string dispatch
result = handle_missing(x, Contiguous())
result = handle_missing(x, Interpolate())
result = handle_missing(x, FailMissing())  # throws ArgumentError

`longest_contiguous`

Extract the longest contiguous segment of non-missing values.

longest_contiguous(x)

Both missing and NaN are treated as missing values.

Example:

x = [missing, 1.0, 2.0, 3.0, missing, 4.0, missing]
longest_contiguous(x)  # Returns [1.0, 2.0, 3.0]

`interpolate_missing`

Interpolate missing values in a time series. For seasonal data, uses STL decomposition with seasonal-aware interpolation. For non-seasonal data, uses linear interpolation.

interpolate_missing(x; m=nothing, lambda=nothing, linear=nothing)

Arguments:

x::AbstractVector: Time series (may contain missing or NaN)
m::Union{Int,Nothing}: Seasonal period (nothing or 1 = non-seasonal)
lambda::Union{Nothing,Real}: Box-Cox transformation parameter
linear::Union{Nothing,Bool}: Force linear interpolation

Algorithm (seasonal):

Fit preliminary model with Fourier terms and polynomial trend
Apply robust MSTL decomposition
Linearly interpolate the seasonally adjusted series
Add back the seasonal component
Fall back to linear if results are unstable

Example:

# Non-seasonal
y = [1.0, 2.0, missing, 4.0, 5.0]
interpolate_missing(y)

# Seasonal (monthly)
ap = air_passengers()
ap[50] = NaN
interpolate_missing(ap; m=12)

# With Box-Cox
interpolate_missing(y; lambda=0.5)

`check_missing`

Verify that a vector contains no missing values. Returns the input unchanged if clean; throws ArgumentError otherwise.

check_missing(x)

Example:

check_missing([1.0, 2.0, 3.0])         # Returns [1.0, 2.0, 3.0]
check_missing([1.0, missing, 3.0])      # Throws ArgumentError
check_missing([1.0, NaN, 3.0])          # Throws ArgumentError

Usage in ETS

The ets() function accepts a missing_method keyword argument:

using Durbyn.ExponentialSmoothing

ap = air_passengers()

# Default: extract longest contiguous segment
fit1 = ets(ap, 12, "ZZZ")

# Interpolate missing values
fit2 = ets(ap, 12, "ZZZ"; missing_method=Interpolate())

# Error on missing values
fit3 = ets(ap, 12, "ZZZ"; missing_method=FailMissing())

Complete Example: Time Series Preprocessing Pipeline

using Durbyn
using Durbyn.Stats

# Load data
ap = air_passengers()

# 1. Check for stationarity and determine differencing
d = ndiffs(ap; test=:kpss)
D = nsdiffs(ap, 12)
println("Non-seasonal differences: $d, Seasonal differences: $D")

# 2. Apply Box-Cox transformation
y_transformed, lambda = box_cox(ap, 12; lambda=:auto)
println("Box-Cox lambda: $lambda")

# 3. Decompose the series
result = stl(ap, 12; s_window=7, robust=true)
summary(result)

# 4. Examine autocorrelation structure
acf_result = acf(ap, 12, 24)
pacf_result = pacf(ap, 12, 24)

# 5. Multiple seasonal decomposition (if applicable)
mstl_result = mstl(ap; m=12, lambda=lambda)
strength = seasonal_strength(mstl_result)
println("Seasonal strength: $strength")

# 6. Perform unit root tests
adf_result = adf(ap; type=:drift, selectlags=:aic)
kpss_result = kpss(ap; type=:mu)

println("ADF test statistic: $(adf_result.teststat[1])")
println("KPSS test statistic: $(kpss_result.teststat)")

References

Cleveland, R. B., Cleveland, W. S., McRae, J. E., & Terpenning, I. (1990). STL: A Seasonal-Trend Decomposition Procedure Based on Loess. Journal of Official Statistics, 6(1), 3-73.
Dickey, D. A., & Fuller, W. A. (1979). Distribution of the Estimators for Autoregressive Time Series with a Unit Root. JASA, 74, 427-431.
Kwiatkowski, D., Phillips, P. C. B., Schmidt, P., & Shin, Y. (1992). Testing the Null Hypothesis of Stationarity Against the Alternative of a Unit Root. Journal of Econometrics, 54, 159-178.
Phillips, P. C. B., & Perron, P. (1988). Testing for a Unit Root in Time Series Regression. Biometrika, 72(2), 335-346.
Osborn, D. R., Chui, A. P. L., Smith, J. P., & Birchenhall, C. R. (1988). Seasonality and the Order of Integration for Consumption. Oxford Bulletin of Economics and Statistics, 50, 361-377.
Box, G. E. P., Jenkins, G. M., Reinsel, G. C., & Ljung, G. M. (2015). Time Series Analysis: Forecasting and Control. Wiley.