Train a forecasting model using a formula-based feature specification. This function automatically generates time series features (lags, MAs, rolling stats, calendar features) from the formula and fits the specified model.
Arguments
- formula
A formula specifying the target and features using special syntax:
p(k)- Create k lags of the target (e.g., p(12))q(w1, w2, ...)- Create moving averages with specified windows (e.g., q(7,28))dow(),month(),woy(),eom(),dom()- Calendar featuresrollsum(w1,w2),rollsd(w),rollmin(w),rollmax(w)- Rolling statisticsrollslope(w)- Rolling trend slopeslag(varname, k1, k2, ...)- Lags of exogenous variablesma(varname, w1, w2, ...)- MAs of exogenous variablesRaw column names for direct inclusion
- data
A data frame or TimeSeries object containing the time series data
- date
Character string naming the date column (Date class required)
- groups
Character vector naming grouping columns for panel data
- model
Model specification as a list with
fitandpredictfunctions, or a model function (for backward compatibility, e.g.,lm,glm).fit(y, X, ...)- Function that fits model with y (response) and X (predictors matrix)predict(object, newdata, ...)- Function that predicts from fitted model
- ...
Additional arguments passed to the model$fit function
Value
A tsfeature_fit object containing:
model- The fitted model objectdata- Training data with featureshistory_raw- Raw historical data for forecastingformula- Expanded formula used for modelingspec- Feature specificationsmeta- Metadata (date, groups)predictors- Predictor column namesschema- Predictor schema for type consistency
Details
The function automatically:
Parses the formula to identify feature specifications
Generates all requested features within groups
Removes rows with NA values in features (due to lagging/rolling)
Fits the model on the feature-engineered data
Stores necessary metadata for forecasting
Edge Cases and Important Behaviors
Data Sorting: If using a TimeSeries object, data is automatically sorted by groups and date. This ensures correct lag and difference calculations. If passing a plain data frame, ensure it is pre-sorted by groups (if any) and date.
Minimum Data Requirements:
Lags: If history length < lag number, the lag feature will be NA
Moving averages: If history length < window size, returns NA
Rolling statistics: Computed on available data (see below)
Frequency detection: Requires at least 2 date observations per group
Factor Variables: Factor levels observed during training are stored in the model schema. During forecasting, if new factor levels appear that weren't seen in training, they will be silently converted to NA. Consider this when using categorical features like day-of-week or month.
Examples
if (FALSE) { # \dontrun{
# Load retail data
load("data/retail.rda")
# Simple model with 12 lags using linear regression
m1 <- fit(value ~ p(12), data = retail,
date = "date", groups = "items", model = lm)
# Model with lags, MAs, and calendar features
m2 <- fit(value ~ p(12) + q(7, 28) + month() + dow(),
data = retail, date = "date", groups = "items", model = lm)
# GLM for count data
m3 <- fit(count ~ p(7) + dow(), data = count_data,
date = "date", groups = "store",
model = glm, family = poisson())
# Custom model specification
custom_model <- list(
fit = function(y, X, ...) {
# Your custom fitting logic
train_df <- cbind(data.frame(.response = y), X)
lm(.response ~ ., data = train_df, ...)
},
predict = function(object, newdata, ...) {
stats::predict(object, newdata = newdata, ...)
}
)
m4 <- fit(value ~ p(12), data = retail,
date = "date", groups = "items", model = custom_model)
} # }
