Create a TimeSeries Object with Complete Preprocessing Pipeline

This function creates a TimeSeries object that bundles data with frequency information and provides a comprehensive preprocessing pipeline for time series data. It handles:

Frequency detection and validation
Irregular calendar completion (missing dates)
Target variable gap-filling
Exogenous variable gap-filling

Usage

TimeSeries(
  data,
  date,
  groups = NULL,
  frequency = NULL,
  auto_detect = TRUE,
  target = NULL,
  target_na = NULL,
  fill_time = FALSE,
  xreg_na = NULL
)

Arguments

data

A data frame containing the time series data

date

Character string naming the date column. Accepts:

Date: For daily and longer frequencies (day, week, month, quarter, year)
POSIXct: Required for sub-daily frequencies (second, minute, halfhour, hour)

groups

Optional character vector naming grouping columns for panel data. All preprocessing operations (time grid completion, gap-filling) are performed independently per group.

frequency

Character string or numeric specifying the time frequency:

Sub-daily (require POSIXct): "second", "minute", "halfhour", "hour"
Daily+ (work with Date or POSIXct): "day", "businessday", "biweekly", "week", "month", "quarter", "year"
Numeric: Custom interval in days (e.g., 7 for weekly, 14 for biweekly)

If NULL and auto_detect = TRUE, frequency is inferred from median date/time differences.

auto_detect

Logical, if TRUE and frequency is NULL, attempt to detect frequency from the data. Default: TRUE.

target

Character string naming the target column (optional). Required if target_na is specified. The target column will be gap-filled according to the target_na strategy.

target_na

List specifying gap-filling strategy for missing target values.

strategy: Character, one of:
- "error" - Fail if NAs present (forces explicit choice)
- "zero" - Replace NAs with 0 (useful for count data)
- "locf" - Last observation carried forward
- "nocb" - Next observation carried backward
- "linear" - Linear interpolation (time-aware)
- "rolling_mean" - Centered or right-aligned rolling mean
- "stl" - Seasonal-Trend-Loess decomposition (auto-detects period)
- "borrow" - Cross-series borrowing from peer groups
- "custom" - User-provided function
params: List of strategy-specific parameters (see Details)

Default: NULL (no target gap-filling). Adds {target}_is_imputed flag column.

fill_time

Logical, if TRUE, complete missing dates using tidyr::complete(). Uses the frequency parameter to determine step size. When enabled, adds rows for missing dates with NA values in all dynamic columns. Respects group boundaries for panel data. Default: FALSE.

xreg_na

Named list specifying gap-filling strategies for exogenous columns. Each element should be: column_name = list(strategy = "...", params = list(...)).

Keys: Column names to fill (must exist in data)
Values: Lists with strategy and params (same as target_na)

Example: list(price = list(strategy = "locf"), promo = list(strategy = "zero")). Each column gets its own {column}_is_imputed flag. Filling is done per-group if groups is specified.

Value

A TimeSeries object (S3 class) with components:

data - Data frame with completed calendar and filled gaps (if requested)
date - Name of the date column
groups - Names of grouping columns (or NULL)
frequency - Time frequency specification
target - Name of the target column (or NULL)
target_na_meta - Metadata about target gap-filling (or NULL):
- strategy - Strategy used
- params - Parameters used
- n_imputed - Number of values imputed
- n_total - Total observations
- pct_imputed - Percentage imputed
xreg_na_meta - Named list of metadata for each exogenous column (or empty list)
time_fill_meta - Metadata about time grid completion (or NULL):
- n_added - Number of rows added
- by - Step size used
- n_total - Total observations after completion

Details

All preprocessing is auditable via metadata and is_imputed flags.

Preprocessing Pipeline Order

TimeSeries() applies preprocessing in this order:

Sort data by groups (if present) and date
Complete time grid (if fill_time$enabled = TRUE)
- Per-group for panel data
- Adds rows for missing dates with NA values
Fill target (if target and target_na specified)
- Uses specified strategy
- Adds {target}_is_imputed flag
Fill exogenous columns (if xreg_na specified)
- Per-column, per-group filling
- Adds {column}_is_imputed flag for each

Gap-Filling Strategy Parameters

Common parameters across strategies:

max_gap: Maximum consecutive NAs to fill (default: Inf). Throws error if gap exceeds this value.

Strategy-specific parameters:

linear: extrapolate = FALSE - Allow extrapolation beyond observed range
rolling_mean: window = 7, center = TRUE - Window size and alignment
stl: period = NULL, robust = TRUE - Seasonal period (auto-detected if NULL) and robust fitting
borrow: method = "median", neighbors = NULL - Aggregation method and neighbor filtering
custom: fn = function(y, dates, params) {...} - User-provided function

See ?fill_gaps for detailed documentation of each strategy.

Auditability

All gap-filling operations add is_imputed flags:

{target}_is_imputed - Logical vector marking imputed target values
{column}_is_imputed - Logical vector for each exogenous column

These flags can be used to:

Filter imputed rows: data[!data$sales_is_imputed, ]
Use as model predictor: sales ~ ... + sales_is_imputed
Weight observations: lm(..., weights = ifelse(is_imputed, 0.5, 1))

Metadata is stored in the TimeSeries object for full reproducibility.

Examples

if (FALSE) { # \dontrun{
# ===== Basic Usage =====
# Create TimeSeries with auto-detected frequency
ts <- TimeSeries(retail, date = "date", groups = "store")

# Specify frequency explicitly
ts <- TimeSeries(retail, date = "date", groups = "store", frequency = "day")

# ===== Target Gap-Filling =====
# Fill target with last observation carried forward
ts <- TimeSeries(
  retail,
  date = "date",
  target = "sales",
  target_na = list(strategy = "locf", params = list(max_gap = 7))
)

# Fill target with seasonal decomposition
ts <- TimeSeries(
  retail,
  date = "date",
  groups = "store",
  target = "sales",
  target_na = list(strategy = "stl", params = list(period = 7))
)

# ===== Complete Preprocessing Pipeline =====
# Handle irregular calendar + target gaps + exogenous gaps
ts <- TimeSeries(
  sales_df,
  date = "date",
  groups = c("store", "item"),
  frequency = "day",
  target = "sales",
  target_na = list(strategy = "locf"),
  fill_time = TRUE,  # Complete missing dates using frequency
  xreg_na = list(
    price = list(strategy = "linear"),  # Smooth interpolation
    promo = list(strategy = "zero")     # NA = no promotion
  )
)

# Inspect preprocessing results
print(ts)  # Shows all metadata
summary(ts$data$sales_is_imputed)  # Check how many values imputed

# ===== Using with fit() and forecast() =====
# fit() extracts cleaned data automatically
m <- fit(sales ~ p(7) + price + promo, data = ts, model = lm)

# forecast() uses stored frequency
fc <- forecast(m, h = 14)

# ===== Panel Data Example =====
# Each store gets independent preprocessing
ts <- TimeSeries(
  panel_df,
  date = "date",
  groups = "store",
  frequency = "day",
  target = "sales",
  target_na = list(strategy = "borrow", params = list(method = "median")),
  fill_time = TRUE,  # Uses frequency = "day"
  xreg_na = list(
    price = list(strategy = "locf"),
    temp = list(strategy = "linear")
  )
)

# Metadata shows per-store/column imputation counts
ts$time_fill_meta  # Rows added to complete calendar
ts$target_na_meta  # Target imputation stats
ts$xreg_na_meta    # Exogenous imputation stats per column
} # }