Overview

This article provides a complete reference for all features available in chronofeat’s formula interface. Features are automatically generated during fit() and regenerated identically during forecast().

library(chronofeat)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

data(retail)
ts_data <- TimeSeries(retail, date = "date", groups = "items", frequency = "month")

Target Features

Target features are derived from the variable you’re forecasting.

Lags: p()

Create lagged values of the target variable.

# Create 12 consecutive lags (lag_1 through lag_12)
m <- fit(value ~ p(12), data = ts_data, model = lm)
m$predictors
#>  [1] "value_lag_1"  "value_lag_2"  "value_lag_3"  "value_lag_4"  "value_lag_5" 
#>  [6] "value_lag_6"  "value_lag_7"  "value_lag_8"  "value_lag_9"  "value_lag_10"
#> [11] "value_lag_11" "value_lag_12"

# Create specific lags only
m <- fit(value ~ p(1, 7, 12, 24), data = ts_data, model = lm)
# Creates: value_lag_1, value_lag_7, value_lag_12, value_lag_24

How it works during forecasting:

  • At step 1: value_lag_1 uses the last actual observation
  • At step 2: value_lag_1 uses the step-1 prediction
  • And so on recursively
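
The recursion described above can be sketched in base R. This is an illustration only, not chronofeat's implementation: the toy series, the single lag-1 feature, and the names history and preds are assumptions made for the example.

```r
# Toy series; in chronofeat this bookkeeping happens inside forecast().
y <- c(10, 12, 11, 13, 12, 14, 13, 15)

# Training frame with one lag-1 feature (the first value has no lag, so it is dropped).
train <- data.frame(value = y[-1], value_lag_1 = y[-length(y)])
model <- lm(value ~ value_lag_1, data = train)

h <- 3
history <- y
preds <- numeric(h)
for (step in seq_len(h)) {
  # Lag 1 is always the latest entry of the running history:
  # an actual observation at step 1, a prediction afterwards.
  newdata <- data.frame(value_lag_1 = history[length(history)])
  preds[step] <- unname(predict(model, newdata))
  history <- c(history, preds[step])
}
preds
```

Because each prediction is appended to history, errors made at early steps feed into later ones, which is why long horizons tend to be harder than short ones.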

Best practices:

  • For daily data with weekly seasonality: p(1, 7, 14) or p(7)
  • For monthly data with yearly seasonality: p(1, 12) or p(12)
  • Fewer lags lose fewer rows to NA and give a simpler model
  • More lags capture longer patterns but need more history

Moving Averages: q()

Create moving averages of the target over specified windows.

# Single window
m <- fit(value ~ q(12), data = ts_data, model = lm)
m$predictors
#> [1] "value_ma_12"

# Multiple windows
m <- fit(value ~ q(3, 6, 12), data = ts_data, model = lm)
m$predictors
#> [1] "value_ma_3"  "value_ma_6"  "value_ma_12"

How it works:

  • q(7) = mean of the last 7 values (including current)
  • Uses slider::slide_dbl() with .complete = TRUE (returns NA if window incomplete)
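
A minimal base R sketch of the windowing rule in the bullets above: a trailing mean that includes the current value and returns NA until the window is complete. The helper trailing_mean() is hypothetical and written only for illustration; it does not call slider.

```r
# Trailing mean over the last k values, including the current one.
# Positions with fewer than k values of history stay NA (like .complete = TRUE).
trailing_mean <- function(x, k) {
  n <- length(x)
  out <- rep(NA_real_, n)
  if (n < k) return(out)
  for (i in k:n) {
    out[i] <- mean(x[(i - k + 1):i])
  }
  out
}

x <- c(2, 4, 6, 8, 10)
trailing_mean(x, 3)
#> [1] NA NA  4  6  8
```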

Best practices:

  • Captures level/trend without lag-specific patterns
  • Combine with lags: value ~ p(12) + q(12) provides both
  • During forecasting, a window longer than the available history is incomplete and returns NA

Rolling Statistics

Calculate rolling statistics over specified windows.

Rolling Sum: rollsum()

m <- fit(value ~ rollsum(6, 12), data = ts_data, model = lm)
m$predictors
#> [1] "value_rollsum_6"  "value_rollsum_12"

Useful for cumulative metrics (e.g., total sales over the window).

Rolling Standard Deviation: rollsd()

m <- fit(value ~ rollsd(12), data = ts_data, model = lm)
m$predictors
#> [1] "value_rollsd_12"

Captures volatility/variability patterns.

Rolling Min/Max: rollmin(), rollmax()

m <- fit(value ~ rollmin(12) + rollmax(12), data = ts_data, model = lm)
m$predictors
#> [1] "value_rollmin_12" "value_rollmax_12"

Useful for range-based patterns.

Rolling Slope: rollslope()

m <- fit(value ~ rollslope(12), data = ts_data, model = lm)
m$predictors
#> [1] "value_rollslope_12"

Captures local trend direction and strength. Computed as the slope of a linear regression over the window.

Important behavior during forecasting:

Rolling statistics return NA when the available history is shorter than the requested window. This matches training behavior where incomplete windows produce NA.
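
To make the rolling slope concrete, here is a base R sketch that fits a least-squares line to each complete trailing window and keeps its slope. The helper rolling_slope() is hypothetical, and it assumes equally spaced observations so the regressor is simply 1..k; chronofeat's own computation may differ in detail.

```r
# Slope of an ordinary least-squares line fitted to each trailing window.
# For simple regression the slope equals cov(t, y) / var(t). Requires k >= 2.
rolling_slope <- function(x, k) {
  n <- length(x)
  out <- rep(NA_real_, n)
  if (n < k) return(out)
  t_idx <- seq_len(k)
  for (i in k:n) {
    window <- x[(i - k + 1):i]
    out[i] <- cov(t_idx, window) / var(t_idx)
  }
  out
}

rolling_slope(c(1, 3, 5, 7, 6), 3)
#> [1]  NA  NA 2.0 2.0 0.5
```

The first two positions are NA because their windows are incomplete, mirroring the behavior described above.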

Trend: trend()

Add polynomial trend features (time index raised to powers).

# Linear trend
m <- fit(value ~ p(12) + trend(1), data = ts_data, model = lm)
m$predictors
#>  [1] "value_lag_1"  "value_lag_2"  "value_lag_3"  "value_lag_4"  "value_lag_5" 
#>  [6] "value_lag_6"  "value_lag_7"  "value_lag_8"  "value_lag_9"  "value_lag_10"
#> [11] "value_lag_11" "value_lag_12" "trend1"

# Quadratic trend
m <- fit(value ~ p(12) + trend(1, 2), data = ts_data, model = lm)
m$predictors
#>  [1] "value_lag_1"  "value_lag_2"  "value_lag_3"  "value_lag_4"  "value_lag_5" 
#>  [6] "value_lag_6"  "value_lag_7"  "value_lag_8"  "value_lag_9"  "value_lag_10"
#> [11] "value_lag_11" "value_lag_12" "trend1"       "trend2"

How it works:

  • trend(1) = row number (1, 2, 3, …)
  • trend(2) = row number squared
  • During forecasting, continues incrementing (if trained on 100 rows, forecast step 1 = 101)
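
The continuation of the time index can be sketched as follows. The variable names are illustrative, and the sketch assumes one index per series; it only mirrors the trend1 and trend2 columns shown above.

```r
n_train <- 100   # rows seen during training
h <- 3           # forecast horizon

trend_train  <- seq_len(n_train)       # 1, 2, ..., 100
trend_future <- n_train + seq_len(h)   # 101, 102, 103

# Columns matching trend(1, 2): the index and its square.
future_feats <- data.frame(trend1 = trend_future, trend2 = trend_future^2)
future_feats
```

The squared column grows quickly outside the training range (10201, 10404, 10609 here), which is the source of the extrapolation risk noted below.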

Caution:

High-degree polynomials can extrapolate wildly. Prefer trend(1) or use with regularization.


Calendar Features

Calendar features capture seasonal patterns based on the date.

Day of Week: dow()

# dow() returns an ordered factor: Monday, Tuesday, ..., Sunday
m <- fit(value ~ p(7) + dow(), data = ts_data, model = lm)

For daily data, captures within-week patterns (e.g., weekend effects).

Month: month()

# month() returns a factor: 01, 02, ..., 12
m <- fit(value ~ p(12) + month(), data = ts_data, model = lm)

Captures monthly seasonality (very common in retail, energy, etc.).

Week of Year: woy()

# woy() returns integer: 1-53
m <- fit(value ~ p(7) + woy(), data = ts_data, model = lm)

Captures week-level seasonality. Useful for weekly data or daily data where week number matters.

End of Month: eom()

# eom() returns 0/1 indicator
m <- fit(value ~ p(12) + eom(), data = ts_data, model = lm)

Captures end-of-month effects (common in finance, billing cycles).

Day of Month: dom()

# dom() returns integer: 1-31
m <- fit(value ~ p(12) + dom(), data = ts_data, model = lm)

Captures within-month patterns (e.g., payday effects).
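
For intuition, several of these calendar columns can be derived from a Date vector with base R alone. This is a sketch under assumptions, not the package's code: the English weekday labels are hard-coded to avoid locale dependence, and the end-of-month rule (the next day is the 1st) is one straightforward way to compute the indicator.

```r
dates <- as.Date(c("2024-01-30", "2024-01-31", "2024-02-01", "2024-02-29"))

cal <- data.frame(
  date  = dates,
  # "%u" gives the ISO weekday number, 1 = Monday ... 7 = Sunday.
  dow   = factor(as.integer(format(dates, "%u")), levels = 1:7,
                 labels = c("Monday", "Tuesday", "Wednesday", "Thursday",
                            "Friday", "Saturday", "Sunday"),
                 ordered = TRUE),
  month = factor(format(dates, "%m"), levels = sprintf("%02d", 1:12)),
  dom   = as.integer(format(dates, "%d")),
  # The day after a month end is the 1st, so this flags the last day
  # of any month, including 29 February in leap years.
  eom   = as.integer(format(dates + 1, "%d") == "01")
)
cal
```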

Combining Calendar Features

# For daily retail data
value ~ p(7) + dow() + month() + eom()

# For hourly data (if using POSIXct)
value ~ p(24) + dow() + month() + hod()  # hour of day

Important:

Factor-valued calendar features such as dow() and month() have a fixed set of levels. Make sure your training data covers every level you will encounter during forecasting: if you train on Jan-Nov and forecast into December, the month = 12 level is unknown and is converted to NA.
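
The unknown-level problem is ordinary R factor behavior and can be reproduced without the package. In this sketch the month codes are made up for illustration; re-encoding future values with the training levels turns the unseen level into NA, and setdiff() gives a quick coverage check before forecasting.

```r
train_months <- sprintf("%02d", 1:11)   # training covers Jan-Nov only
train_levels <- unique(train_months)

future_months <- c("11", "12")

# Re-encoding with the training levels turns the unseen "12" into NA.
future_factor <- factor(future_months, levels = train_levels)
is.na(future_factor)
#> [1] FALSE  TRUE

# Levels needed in the future but missing from training.
setdiff(future_months, train_levels)
#> [1] "12"
```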


Exogenous Variable Features

Include external predictors in your model.

Raw Variables

Include a column directly (no transformation):

# Assume data has 'price' and 'promo' columns
m <- fit(value ~ p(12) + price + promo, data = ts_data, model = lm)

For forecasting, you must provide future values via the future parameter:

future_data <- data.frame(
  date = seq(as.Date("2024-01-01"), by = "month", length.out = 12),
  items = "item_1",
  price = rep(9.99, 12),
  promo = c(1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1)
)

fc <- forecast(m, future = future_data)

Lagged Exogenous: lag()

Create lags of exogenous variables:

# lag(variable, lag1, lag2, ...)
m <- fit(value ~ p(12) + lag(price, 0, 1, 7), data = ts_data, model = lm)
# Creates: price (lag 0 = current), price_lag_1, price_lag_7

Lag 0 includes the current value of the exogenous variable.
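
What lagging an exogenous column means can be shown with a small base R helper. The function shift() and the price values are hypothetical; the column names simply follow the var_lag_k pattern used in this article.

```r
# Shift a vector down by k positions; k = 0 returns it unchanged.
shift <- function(x, k) {
  n <- length(x)
  if (k == 0) return(x)
  if (k >= n) return(rep(NA_real_, n))
  c(rep(NA_real_, k), head(x, -k))
}

price <- c(9.99, 9.99, 8.49, 8.49, 9.99)
data.frame(
  price       = shift(price, 0),   # lag 0: the current value
  price_lag_1 = shift(price, 1),
  price_lag_2 = shift(price, 2)
)
```

Each extra lag adds one more leading NA, so the longest lag determines how many initial rows have incomplete features.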

Moving Average of Exogenous: ma()

# ma(variable, window1, window2, ...)
m <- fit(value ~ p(12) + ma(price, 7, 28), data = ts_data, model = lm)
# Creates: price_ma_7, price_ma_28

Combining Exogenous Features

# Full exogenous specification
m <- fit(
  value ~ p(12) + month() +
    price + lag(price, 1, 7) + ma(price, 7) +
    promo,
  data = ts_data,
  model = lm
)

Feature Combinations

Here are recommended feature sets for common scenarios:

Basic Autoregressive

# Simple: just lags
value ~ p(12)

# With moving average
value ~ p(12) + q(12)

Seasonal (Calendar-based)

# Monthly data with yearly seasonality
value ~ p(12) + month()

# Daily data with weekly + yearly
value ~ p(7) + dow() + month()

# Daily with all calendar
value ~ p(7) + dow() + month() + woy() + eom()

Trend + Seasonality

# Linear trend + monthly seasonality
value ~ p(12) + trend(1) + month()

# Quadratic trend (use cautiously)
value ~ p(12) + trend(1, 2) + month()

With Rolling Statistics

# Capture level, volatility, and trend
value ~ p(12) + q(12) + rollsd(12) + rollslope(12)

# Multiple windows for short and long patterns
value ~ p(12) + rollsum(7, 28) + rollsd(7, 28)

With Exogenous Variables

# Price effects
value ~ p(12) + month() + price + lag(price, 1, 7)

# Full model
value ~ p(12) + q(12) + month() + dow() +
        price + lag(price, 1, 7) + ma(price, 7) +
        promo + rollslope(12)

Choosing Features

Guidelines by Data Frequency

Frequency    Typical Features
Hourly       p(24, 168) + dow() + hour features
Daily        p(7, 14) + dow() + month()
Weekly       p(4, 52) + woy() + month()
Monthly      p(12, 24) + month()
Quarterly    p(4, 8) + quarter feature

Guidelines by Pattern Type

Pattern                  Recommended Features
Strong trend             trend(1) or differencing
Weekly seasonality       dow() or p(7)
Yearly seasonality       month() or p(12) for monthly data
Volatility clustering    rollsd()
Level shifts             rollsum(), q()
Local trends             rollslope()

Start Simple

  1. Start with p(k) + month() where k matches your seasonality
  2. Check residuals for remaining patterns
  3. Add features incrementally
  4. Use cross-validation to compare
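
Step 4 can be sketched with a one-step-ahead rolling-origin evaluation in base R. Everything here is an assumption made for illustration: the simulated series, the hand-built lag columns, the helper rolling_mae(), and the choice of mean absolute error. It compares two candidate feature sets with plain lm() and does not use chronofeat's own tooling.

```r
set.seed(1)
n <- 60
y <- 10 + 0.1 * seq_len(n) + 2 * sin(2 * pi * seq_len(n) / 12) + rnorm(n, sd = 0.3)

# Feature frame built by hand: lag 1, lag 12, and a linear trend.
df <- data.frame(
  value  = y,
  lag_1  = c(NA, y[-n]),
  lag_12 = c(rep(NA, 12), y[seq_len(n - 12)]),
  trend1 = seq_len(n)
)
df <- df[complete.cases(df), ]

# Refit on all rows before `i`, predict row `i`, and average the absolute errors.
rolling_mae <- function(formula, data, first_test = 30) {
  errs <- vapply(first_test:nrow(data), function(i) {
    fit_i <- lm(formula, data = data[seq_len(i - 1), ])
    abs(data$value[i] - predict(fit_i, data[i, , drop = FALSE]))
  }, numeric(1))
  mean(errs)
}

c(
  lag1_only  = rolling_mae(value ~ lag_1, df),
  with_lag12 = rolling_mae(value ~ lag_1 + lag_12 + trend1, df)
)
```

This evaluates one-step-ahead accuracy with actual lag values; a multi-step comparison would need the recursive scheme described under Lags.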

Feature Behavior During Forecasting

Understanding how features behave during recursive forecasting is crucial:

Complete Windows vs Incomplete

Feature          Training                 Forecasting (early steps)
Lags             Use actual history       Use predictions recursively
MAs              Complete windows only    NA if window incomplete
Rolling stats    Complete windows only    NA if window incomplete
Calendar         From training dates      From generated future dates
Trend            1, 2, …, n               n+1, n+2, …, n+h

NA Handling

  • Rows with NA features are dropped during training
  • During forecasting, NA features → NA predictions for that step
  • Choose window sizes appropriate for your forecast horizon
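
The cost of dropping NA rows is easy to quantify. In this base R sketch (toy data and hand-built lag columns, not package output) a longest lag of 12 removes the first 12 rows from a 20-row series.

```r
y <- as.numeric(1:20)
n <- length(y)
max_lag <- 12

feats <- data.frame(
  value        = y,
  value_lag_1  = c(NA, y[-n]),
  value_lag_12 = c(rep(NA, max_lag), y[seq_len(n - max_lag)])
)

# Keep only rows where every feature is available.
usable <- feats[complete.cases(feats), ]
nrow(feats) - nrow(usable)   # rows lost to NA
#> [1] 12
```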

Low-Level Feature Functions

For manual feature engineering outside the formula interface:

# Create lags and MAs manually
df_feat <- feat_lag_ma_dt(
  df = my_data,
  date = "date",
  target = "sales",
  groups = "store",
  p = 12,          # 12 lags
  q = c(7, 28)     # 7-day and 28-day MAs
)

# Add calendar features
df_feat <- feat_calendar_dt(
  df = df_feat,
  date = "date",
  dow = TRUE,
  month = TRUE,
  woy = FALSE,
  eom = TRUE,
  dom = FALSE
)

# Add rolling statistics
df_feat <- feat_rolling_dt(
  df = df_feat,
  date = "date",
  target = "sales",
  groups = "store",
  windows = c(7, 28),
  stats = c("sum", "sd"),
  trend_windows = 28
)

# Add trend
df_feat <- feat_trend(
  df = df_feat,
  date = "date",
  groups = "store",
  degrees = c(1, 2)
)

Summary Table

Feature           Syntax           Output Columns
Lags              p(k)             {target}_lag_1 through {target}_lag_k
Specific lags     p(1, 7, 12)      {target}_lag_1, {target}_lag_7, {target}_lag_12
Moving average    q(w1, w2)        {target}_ma_w1, {target}_ma_w2
Rolling sum       rollsum(w)       {target}_rollsum_w
Rolling SD        rollsd(w)        {target}_rollsd_w
Rolling min       rollmin(w)       {target}_rollmin_w
Rolling max       rollmax(w)       {target}_rollmax_w
Rolling slope     rollslope(w)     {target}_rollslope_w
Trend             trend(d1, d2)    trend1, trend2
Day of week       dow()            dow (factor)
Month             month()          month (factor)
Week of year      woy()            woy (integer)
End of month      eom()            eom (0/1)
Day of month      dom()            dom (integer)
Exog lag          lag(var, k)      var_lag_k
Exog MA           ma(var, w)       var_ma_w
Raw column        varname          varname