Apply an explicit gap-filling strategy to missing target values in time series data. Because the fill rule is chosen explicitly and every imputed value is flagged, features stay deterministic and auditable, which ensures train/forecast parity.
Usage
fill_gaps(
  data,
  target,
  date,
  groups = NULL,
  strategy = "error",
  params = list()
)
Arguments
- data
Data frame with time series data
- target
Character, name of target column
- date
Character, name of date column
- groups
Character vector of group column names (NULL for ungrouped)
- strategy
Character, gap-filling strategy. One of:
"error" - Fail if any NAs present (default, forces explicit choice)
"zero" - Replace NAs with 0 (appropriate for count data)
"locf" - Last observation carried forward
"nocb" - Next observation carried backward
"linear" - Linear interpolation (Phase 2)
"rolling_mean" - Centered rolling mean (Phase 2)
"stl" - Seasonal decomposition (Phase 3)
"borrow" - Cross-series borrowing for panel data (Phase 3)
"custom" - User-provided function (Phase 4)
- params
List of strategy-specific parameters. Common parameters:
max_gap - Maximum gap length to fill (for locf, nocb)
extrapolate - Allow extrapolation (for linear)
window - Window size (for rolling_mean)
period - Seasonal period (for stl)
fn - User function (for custom)
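To make the locf/nocb and max_gap semantics concrete, here is a minimal base-R sketch for a single, date-sorted series. It assumes zoo-style maxgap behaviour: a run of NAs longer than max_gap is left unfilled entirely rather than partially filled. locf_max_gap() is a hypothetical helper, not part of this package, and fill_gaps() may treat long gaps differently.

```r
# Hypothetical helper (not the package's implementation).
# Assumption: zoo-style semantics, i.e. NA runs longer than max_gap stay NA.
locf_max_gap <- function(x, max_gap = Inf) {
  filled <- x
  last <- NA
  # Carry the last observed value forward; leading NAs stay NA
  for (i in seq_along(x)) {
    if (is.na(x[i])) filled[i] <- last else last <- x[i]
  }
  # Locate runs of NAs in the original series
  r <- rle(is.na(x))
  ends <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1
  # Undo the fill for any NA run longer than max_gap
  too_long <- r$values & r$lengths > max_gap
  for (k in which(too_long)) filled[starts[k]:ends[k]] <- NA
  filled
}

x <- c(1, NA, NA, 4, NA, NA, NA, NA, 9)
locf_max_gap(x, max_gap = 3)
```

Under the same assumption, nocb is the mirror image: rev(locf_max_gap(rev(x), max_gap)).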
Value
Data frame with:
Original columns
Filled target column
{target}_is_imputed - Logical flag indicating imputed values
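A minimal sketch of how the "zero" strategy and the flag column could be produced for one target in base R. zero_fill_flagged() is a hypothetical helper for illustration only; the design point it shows is that the flag is computed from the original NAs before any value is overwritten, so it cannot be lost.

```r
# Hypothetical helper illustrating the "zero" strategy plus the
# {target}_is_imputed flag; not the package's implementation.
zero_fill_flagged <- function(data, target) {
  flag_col <- paste0(target, "_is_imputed")
  # Record which values are missing *before* filling them
  data[[flag_col]] <- is.na(data[[target]])
  # Replace the missing values with 0 (count-style data)
  data[[target]][data[[flag_col]]] <- 0
  data
}

sales <- data.frame(
  date = as.Date("2024-01-01") + 0:4,
  revenue = c(10, NA, 5, NA, 8)
)
zero_fill_flagged(sales, "revenue")
```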
Details
The {target}_is_imputed flag allows downstream models to:
Filter out imputed rows if desired
Use the flag as a predictor to learn different behavior
Weight imputed observations differently
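The three uses above map onto ordinary base-R modelling calls. The sketch below uses a toy data frame with a y_is_imputed column; the simulated data and the 0.5 weight for imputed rows are arbitrary illustrative choices, not package defaults.

```r
set.seed(1)
d <- data.frame(x = 1:20)
d$y <- 2 * d$x + rnorm(20)
d$y_is_imputed <- FALSE
d$y_is_imputed[c(5, 12)] <- TRUE  # pretend these two values were imputed

# 1. Filter out imputed rows before fitting
m_observed <- lm(y ~ x, data = d[!d$y_is_imputed, ])
# 2. Use the flag as a predictor (separate intercept shift for imputed rows)
m_flagged <- lm(y ~ x + y_is_imputed, data = d)
# 3. Down-weight imputed rows (0.5 is an illustrative choice)
m_weighted <- lm(y ~ x, data = d, weights = ifelse(d$y_is_imputed, 0.5, 1))
```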
Gap-filling respects group boundaries and never fills across groups. Each group's time series is filled independently.
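One way to honour group boundaries in base R is to sort by group and date, then apply the fill function within each group with ave(). This is a sketch under that assumption; locf() here is a hypothetical helper without a max_gap limit, and fill_gaps() may implement grouping differently.

```r
# Hypothetical LOCF helper: carry the last non-NA value forward;
# leading NAs (no earlier observation in the group) stay NA.
locf <- function(x) {
  idx <- cummax(seq_along(x) * !is.na(x))  # position of last non-NA so far
  idx[idx == 0] <- NA
  x[idx]
}

panel <- data.frame(
  device = c("a", "a", "a", "b", "b", "b"),
  timestamp = rep(1:3, times = 2),
  temperature = c(20, NA, 22, NA, 18, NA)
)
# Sort so each group's series is in time order before filling
panel <- panel[order(panel$device, panel$timestamp), ]
# ave() applies locf() to each device separately, so no value crosses groups
panel$temperature_filled <- ave(panel$temperature, panel$device, FUN = locf)
panel
```

Device "b" keeps its leading NA instead of inheriting device "a"'s last reading, which is exactly the cross-group leakage the grouping rule prevents.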
Examples
if (FALSE) { # \dontrun{
# Retail sales: missing = no sale
sales_filled <- fill_gaps(sales_data, target = "revenue", date = "date",
                          groups = "store", strategy = "zero")
# Sensor data: carry forward up to 3 missing readings
sensor_filled <- fill_gaps(sensor_data, target = "temperature",
                           date = "timestamp", groups = "device",
                           strategy = "locf",
                           params = list(max_gap = 3))
# Check imputation summary
table(sensor_filled$temperature_is_imputed)
# Use filled data in forecasting
ts <- TimeSeries(sensor_filled, date = "timestamp", groups = "device")
m <- fit(temperature ~ p(7) + rollsum(7), data = ts, model = lm)
} # }
