SARIMAX
Seasonal ARIMA with eXogenous variables for time series influenced by external factors
SARIMAX extends SARIMA with exogenous (external) variables. Use it when you have seasonal patterns and additional features that influence your target variable, such as weather, promotions, holidays, or economic indicators.
When to Use SARIMAX
SARIMAX is best suited for:
- Time series with seasonal patterns and external influencing factors
- When you have additional predictors available at prediction time
- Business forecasting with promotional calendars or marketing spend
- Weather-dependent forecasting with meteorological features
- Economic forecasting with leading indicators
- Scenarios where you need both seasonality and feature engineering
- When SARIMA alone underfits due to missing explanatory variables
Strengths
- Combines seasonal patterns with external predictors
- Leverages both historical patterns and current conditions
- Interpretable coefficients for exogenous variables
- Statistical framework with confidence intervals
- Can improve accuracy by incorporating domain knowledge
- Flexible: works with continuous variables and with categorical variables once they are encoded (binary or one-hot)
- Handles trend, seasonality, and covariates simultaneously
Weaknesses
- Requires a value for every exogenous variable at each future period during inference
- More parameters than SARIMA: the same (p,d,q)(P,D,Q,s) orders plus one coefficient per exogenous variable
- Assumes linear relationships with exogenous variables
- Limited to a single seasonal period
- Computationally more intensive than SARIMA
- Requires careful feature engineering
- Can overfit if there are too many exogenous variables relative to the number of observations
Parameters
Common Time Series Parameters
All time series models share these parameters:
- Timestamp Column (required): Column containing dates/times
- Target Column (required): Numeric value to forecast
- Exogenous Variables (optional): List of external feature columns (e.g., ['temperature', 'promotion', 'holiday'])
  - Critical: these must also be available for every future forecast period during inference
- Frequency (optional): Time spacing (D, H, W, M). Auto-inferred if not specified
- Forecast Steps (required, default=1): How many periods to predict
SARIMAX-Specific Parameters
Non-Seasonal Components
AR Order (p)
- Type: Integer
- Default: 1
- Description: Number of autoregressive terms (past values of target)
- Typical Range: 0-5
Differencing (d)
- Type: Integer
- Default: 1
- Description: Degree of differencing to achieve stationarity
  - 0: No differencing
  - 1: First difference (removes a linear trend)
  - 2: Second difference (removes a quadratic trend; rarely needed)
- Typical Range: 0-2
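To make the effect of d concrete, here is a minimal pure-Python sketch of differencing. The fitting library applies this internally when you set d; `difference` is a hypothetical helper written only for illustration.

```python
def difference(series, lag=1):
    """Return y[t] - y[t - lag] for every t where y[t - lag] exists."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# d=1: a linear trend (+2 per step) becomes a constant series
trend = [10, 12, 14, 16, 18]
print(difference(trend))

# d=2: differencing twice turns a quadratic trend (t**2) into a constant
quadratic = [1, 4, 9, 16, 25]
print(difference(difference(quadratic)))
```

Each round of differencing shortens the series by one observation, which is one reason to keep d as small as possible.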
MA Order (q)
- Type: Integer
- Default: 1
- Description: Number of moving average terms (past errors)
- Typical Range: 0-5
Seasonal Components
Seasonal AR (P)
- Type: Integer
- Default: 1
- Description: Seasonal autoregressive order
- Typical Range: 0-2
Seasonal Diff (D)
- Type: Integer
- Default: 1
- Description: Seasonal differencing order
- Typical Range: 0-1
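Seasonal differencing (D=1) subtracts the value from one full season earlier instead of the previous value. A minimal sketch, assuming daily data with a weekly period (s=7); `seasonal_difference` is a hypothetical helper for illustration:

```python
def seasonal_difference(series, s):
    """D=1: subtract the value from one full season earlier, y[t] - y[t - s]."""
    return [series[t] - series[t - s] for t in range(s, len(series))]


# Two weeks of daily data: the same weekly shape, shifted up by 1 in week two
week = [5, 5, 5, 5, 8, 12, 10]
two_weeks = week + [v + 1 for v in week]

# The repeating weekly shape cancels out; only the week-over-week change remains
print(seasonal_difference(two_weeks, s=7))
```

Note that D=1 consumes one full season (s observations) at the start of the series.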
Seasonal MA (Q)
- Type: Integer
- Default: 1
- Description: Seasonal moving average order
- Typical Range: 0-2
Seasonal Period
- Type: Integer
- Default: 12
- Description: Number of periods in seasonal cycle
- Common Values:
  - 7 for weekly in daily data
  - 12 for yearly in monthly data
  - 24 for daily in hourly data
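For reference, one standard way to combine these orders is regression with SARIMA errors, which is the form statsmodels' SARIMAX uses by default; whether this product's implementation follows exactly this parameterization is an assumption. B is the backshift operator (B y_t = y_{t-1}), x_{1,t} ... x_{k,t} are the k exogenous variables, and each beta_i is one exogenous coefficient:

```latex
\begin{aligned}
y_t &= \beta_1 x_{1,t} + \dots + \beta_k x_{k,t} + u_t \\
\phi_p(B)\,\Phi_P(B^s)\,(1 - B)^d\,(1 - B^s)^D\,u_t &= \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t \\[4pt]
\phi_p(B) &= 1 - \phi_1 B - \dots - \phi_p B^p \\
\Phi_P(B^s) &= 1 - \Phi_1 B^s - \dots - \Phi_P B^{Ps} \\
\theta_q(B) &= 1 + \theta_1 B + \dots + \theta_q B^q \\
\Theta_Q(B^s) &= 1 + \Theta_1 B^s + \dots + \Theta_Q B^{Qs}
\end{aligned}
```

Under this form, each exogenous variable adds exactly one coefficient beta_i on top of the p + q + P + Q ARMA coefficients and the noise variance.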
Configuration Tips
Selecting Exogenous Variables
Good Exogenous Variables:
- Known in advance or predictable (temperature forecasts, planned promotions)
- Causally related to target (not just correlated)
- Consistent relationship over time
- Available at forecast time
Poor Exogenous Variables:
- Not available at forecast time (future stock prices)
- Outcome variables or values derived from the target (e.g., don't predict sales with same-period revenue)
- Highly correlated with each other (multicollinearity)
- Too many relative to sample size (risk of overfitting)
Example Exogenous Variables by Domain
Retail Sales:
- is_holiday (binary)
- promotion_discount (numeric)
- competitor_price (numeric)
- marketing_spend (numeric)
Energy Consumption:
- temperature (numeric)
- is_weekend (binary)
- is_holiday (binary)
- day_of_week (categorical → one-hot encoded)
Website Traffic:
- email_campaign_sent (binary)
- ad_impressions (numeric)
- content_posts (count)
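Calendar features such as day_of_week and is_holiday are attractive because they are known for any future date. The sketch below builds them with the standard library; the holiday set and column names are hypothetical. One caveat it encodes: if day_of_week is one-hot encoded, is_weekend equals dow_sat + dow_sun, so including both creates perfect multicollinearity.

```python
from datetime import date, timedelta

DOW = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def calendar_features(day, holidays):
    """Calendar exogenous features for one date; all are known in advance."""
    row = {"is_holiday": int(day in holidays)}
    # One-hot encode day_of_week, dropping Monday as the reference level.
    # is_weekend is deliberately omitted: it would equal dow_sat + dow_sun.
    for i, name in enumerate(DOW):
        if i > 0:
            row[f"dow_{name}"] = int(day.weekday() == i)
    return row


holidays = {date(2024, 12, 25)}  # hypothetical holiday calendar
start = date(2024, 12, 23)       # a Monday
rows = [calendar_features(start + timedelta(days=k), holidays) for k in range(7)]
for r in rows:
    print(r)
```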
Starting Configuration
For seasonal data with exogenous variables, start with:
Non-seasonal: (p=1, d=1, q=1)
Seasonal: (P=1, D=1, Q=1, s=[your period])
Exogenous: 1-5 carefully selected features
Feature Engineering for Exogenous Variables
- Binary Indicators: is_holiday, is_weekend, is_sale_period
- Lagged Exogenous: If the effect is delayed (e.g., ad spend that drives sales a few days later), add lagged copies of the feature
- Interactions: product of two features (e.g., temperature × is_summer)
- Categorical Encoding: One-hot or ordinal encoding for categories
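The lag and interaction transforms above can be sketched in a few lines of pure Python. The helper names and the example columns (ad_spend, temperature, is_summer) are hypothetical illustrations, not part of the product's API.

```python
def add_lag(values, lag):
    """Shift a feature by `lag` steps: out[t] = values[t - lag], None where unavailable."""
    return ([None] * lag + list(values))[:len(values)]


def interaction(a, b):
    """Element-wise product of two aligned features."""
    return [x * y for x, y in zip(a, b)]


ad_spend = [100, 0, 0, 250, 0]
ad_spend_lag2 = add_lag(ad_spend, 2)  # the first 2 rows have no value

temperature = [18.0, 24.0, 31.0, 29.0, 15.0]
is_summer = [0, 1, 1, 1, 0]
temp_x_summer = interaction(temperature, is_summer)

print(ad_spend_lag2)
print(temp_x_summer)
```

Rows whose lagged value is None are usually dropped before fitting. A useful side effect of a lag of k steps is that the first k forecast steps only need values that have already been observed.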
Handling Missing Future Values
Problem: Exogenous variables must be known during forecasting.
Solutions:
- Use Predictable Features: Calendar features (day_of_week, month, is_holiday)
- Use Planned Values: Scheduled promotions, planned marketing spend
- Forecast Exogenous First: Build separate models for weather, prices, etc., then feed their forecasts in as the future exogenous values
- Use Scenarios: Create multiple forecasts with different exogenous assumptions (best/worst case)
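For scenario forecasting, you build one future exogenous matrix per assumption and run the model once per matrix. A minimal sketch, assuming two hypothetical columns (marketing_spend, is_holiday) where only the spend is uncertain:

```python
def build_scenarios(base_plan, multipliers):
    """Return one future-exogenous matrix per scenario.

    base_plan: list of [marketing_spend, is_holiday] rows, one per forecast step.
    multipliers: scenario name -> factor applied to marketing_spend only;
    is_holiday is a calendar fact and stays the same in every scenario.
    """
    return {
        name: [[spend * factor, is_holiday] for spend, is_holiday in base_plan]
        for name, factor in multipliers.items()
    }


plan = [[1000.0, 0], [1000.0, 0], [1500.0, 1]]
scenarios = build_scenarios(plan, {"worst": 0.5, "base": 1.0, "best": 1.25})
for name, rows in scenarios.items():
    print(name, rows)
```

Each matrix is then passed to the trained model as its future exogenous values, giving one forecast (with its own intervals) per scenario.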
Model Complexity Management
- Start with 1-2 most important exogenous variables
- Add more only if they significantly improve cross-validation performance
- Use regularization or feature selection if you have many candidates
- Monitor for overfitting (training error << validation error)
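Time series cross-validation must never train on data that comes after the test window. A minimal expanding-window (rolling-origin) splitter in pure Python; the function name and defaults are illustrative, and scikit-learn's `TimeSeriesSplit` offers a similar, ready-made alternative.

```python
def expanding_window_splits(n_obs, initial_train, horizon, step=None):
    """Yield (train_indices, test_indices) for rolling-origin evaluation.

    Each fold trains on everything before the cutoff and tests on the next
    `horizon` points, so the model never sees the future.
    """
    step = step or horizon
    cutoff = initial_train
    while cutoff + horizon <= n_obs:
        yield list(range(cutoff)), list(range(cutoff, cutoff + horizon))
        cutoff += step


for train, test in expanding_window_splits(n_obs=20, initial_train=10, horizon=5):
    print(len(train), test)
```

To compare candidate exogenous sets, fit each candidate on every training fold, forecast the matching test fold, and average the errors across folds.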
Common Issues and Solutions
Issue: Exogenous Variables Not Available at Forecast Time
Solution:
- Only use features you can know in advance (holidays, planned events)
- Build separate forecast models for uncertain exogenous variables
- Use scenario-based forecasting (multiple forecasts with different assumptions)
- Consider switching to SARIMA, or Prophet without additional regressors, if you lack reliable future features
Issue: Model Ignores Exogenous Variables
Solution:
- Check that exogenous variables have sufficient variation
- Ensure they're not constant or near-constant
- Verify they're on comparable scales (very large range differences can cause numerical problems during optimization)
- Confirm they're actually related to the target (check correlations)
- Use more training observations if available
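The first three checks can be sketched with the standard library: flag features that barely vary, then standardize the rest. The thresholds (min_std, max_share) and feature names are illustrative assumptions. Keep the returned (mean, std) pair, because future exogenous values must be scaled with the training-time statistics.

```python
from statistics import mean, pstdev


def near_constant(features, min_std=1e-8, max_share=0.99):
    """Flag features with a tiny std, or one value in >= max_share of rows."""
    flagged = []
    for name, values in features.items():
        top_share = max(values.count(v) for v in set(values)) / len(values)
        if pstdev(values) < min_std or top_share >= max_share:
            flagged.append(name)
    return flagged


def standardize(values):
    """Return (scaled values, (mean, std)); run near_constant first so std > 0."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values], (mu, sigma)


features = {
    "is_promo": [0] * 199 + [1],                          # promotion on 1 day in 200
    "temperature": [15.0 + (t % 10) for t in range(200)],
}
print(near_constant(features))

scaled_spend, spend_stats = standardize([1000.0, 2000.0, 3000.0])
print(scaled_spend, spend_stats)
```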
Issue: Worse Performance Than SARIMA
Solution:
- Your exogenous variables may be adding noise
- Try simpler SARIMA without exogenous variables
- Check for multicollinearity (remove highly correlated features)
- Ensure exogenous variables are properly preprocessed
- Reduce number of exogenous variables
Issue: Training Fails or Doesn't Converge
Solution:
- Scale exogenous variables to similar ranges
- Remove or impute missing values in exogenous features
- Check for perfect multicollinearity (identical or linearly dependent features)
- Simplify the model orders (p,d,q)(P,D,Q,s)
- Ensure sufficient data (more exogenous variables require more observations)
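A pre-flight check for the data problems listed above can run before fitting. This pure-Python sketch only catches missing values (None or NaN) and exactly identical columns; linearly dependent columns that are not identical need a correlation or rank check instead. The function name and column names are hypothetical.

```python
import math


def preflight_issues(features):
    """Report missing values and exact duplicate columns in exogenous features."""
    issues = []
    for name, values in features.items():
        missing = sum(
            1 for v in values if v is None or (isinstance(v, float) and math.isnan(v))
        )
        if missing:
            issues.append(f"{name}: {missing} missing value(s)")
    names = list(features)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if features[a] == features[b]:
                issues.append(f"{a} and {b} are identical")
    return issues


features = {
    "temperature": [21.0, None, 23.5],
    "is_holiday": [0, 1, 0],
    "holiday_flag": [0, 1, 0],  # duplicate of is_holiday under another name
}
for issue in preflight_issues(features):
    print(issue)
```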
Issue: Need Multiple Seasonalities
Solution:
- SARIMAX handles only one seasonal period
- Use Prophet with additional regressors (handles multiple seasonalities)
- Apply seasonal decomposition, then use SARIMAX on deseasonalized data
- Consider TBATS if multiple seasonalities are critical
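The decomposition route can be sketched with a simple additive seasonal index: remove one seasonal period by subtracting the average deviation at each position in its cycle, let SARIMAX model the remaining period, then add the index back to the forecasts. For example, with hourly data you might remove the weekly pattern (period 168) this way and fit SARIMAX with s=24, since very large s values are slow to fit. This is a simplified classical decomposition under stated assumptions (stable additive pattern, no strong trend, ideally a whole number of cycles); statsmodels' `STL` is a more robust way to estimate the seasonal component. The helper names are hypothetical.

```python
from statistics import mean


def seasonal_index(series, period):
    """Average deviation from the overall mean at each position in the cycle."""
    overall = mean(series)
    return [mean(series[pos::period]) - overall for pos in range(period)]


def deseasonalize(series, index):
    """Subtract the seasonal index; fit SARIMAX on the result."""
    period = len(index)
    return [y - index[t % period] for t, y in enumerate(series)]


def reseasonalize(forecast, index, start):
    """Add the index back; `start` is the time position of the first forecast step."""
    period = len(index)
    return [f + index[(start + h) % period] for h, f in enumerate(forecast)]


# Toy example with period 3 (stands in for a long period such as 168)
series = [10, 20, 30] * 4
idx = seasonal_index(series, period=3)
adjusted = deseasonalize(series, idx)
print(idx, adjusted)
print(reseasonalize([20, 20, 20], idx, start=len(series)))
```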
Issue: Coefficients Have Wrong Signs
Solution:
- Check for multicollinearity between exogenous variables
- Verify data quality (no data entry errors)
- Consider interaction effects or non-linear relationships
- Remove redundant features
Issue: Poor Out-of-Sample Performance
Solution:
- Likely cause: overfitting to the exogenous variables
- Use time series cross-validation to tune
- Reduce number of exogenous variables
- Ensure future exogenous values are realistic (not using hindsight)
Example Use Cases
Retail Sales with Promotions
Target: daily_sales
Exogenous: [is_promotion, discount_percent, is_holiday]
SARIMAX(1,1,1)(1,1,1)7 # weekly seasonality
Captures weekly patterns and the impact of promotions.
Electricity Demand
Target: hourly_demand
Exogenous: [temperature, is_weekend, is_holiday]
SARIMAX(2,0,1)(1,1,0)24 # daily seasonality
Models daily cycles with temperature and calendar effects.
App Downloads with Marketing
Target: daily_downloads
Exogenous: [ad_spend, email_sent, app_store_feature]
SARIMAX(1,1,1)(1,0,1)7 # weekly seasonality
Separates organic weekly patterns from marketing-driven spikes.
Restaurant Revenue
Target: daily_revenue
Exogenous: [temperature, is_raining, local_events]
SARIMAX(1,1,0)(1,1,1)7 # weekly seasonality
Accounts for weather and special events beyond regular weekly patterns.
HVAC Energy Usage
Target: daily_energy
Exogenous: [avg_temperature, humidity, occupancy]
SARIMAX(1,0,1)(1,1,1)7 # weekly seasonality
Models energy use as a function of environmental conditions and weekly patterns.
Inference Requirements
When using a trained SARIMAX model for forecasting, you must provide:
- Trained Model: The fitted SARIMAX model
- Preprocessing Config: How exogenous variables were scaled/encoded
- Training Tail: Recent historical values for lag computation
- Future Exogenous Values: Values of all exogenous variables for each forecast step
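Before calling the model, it helps to validate the future exogenous values and apply the same scaling used in training. In this sketch the `scaling` dictionary is a hypothetical stand-in for the Preprocessing Config (column name mapped to the training mean and standard deviation); the real config format is not specified here.

```python
def prepare_future_exog(future_rows, exog_columns, forecast_steps, scaling):
    """Check the shape of future exogenous values and apply training-time scaling.

    scaling: column name -> (mean, std) saved at training time. Columns not in
    `scaling` (e.g., binary flags) pass through unchanged.
    """
    if len(future_rows) != forecast_steps:
        raise ValueError(
            f"need {forecast_steps} rows of future exogenous values, got {len(future_rows)}"
        )
    prepared = []
    for step, row in enumerate(future_rows, start=1):
        if len(row) != len(exog_columns):
            raise ValueError(f"step {step}: expected {len(exog_columns)} values, got {len(row)}")
        prepared.append([
            (value - scaling[col][0]) / scaling[col][1] if col in scaling else value
            for col, value in zip(exog_columns, row)
        ])
    return prepared


exog_columns = ["temperature", "is_holiday"]
scaling = {"temperature": (24.0, 2.0)}  # hypothetical training-time statistics
print(prepare_future_exog([[26.0, 0], [22.0, 1]], exog_columns, 2, scaling))
```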
Example Inference Input: If forecasting 7 days ahead with exogenous variables [temperature, is_holiday]:
forecast_steps = 7
future_exogenous = [
[25.0, 0], # day 1
[26.5, 0], # day 2
[24.0, 0], # day 3
[23.5, 0], # day 4
[25.0, 1], # day 5 (holiday)
[27.0, 0], # day 6
[28.0, 0], # day 7
]