ARIMA with Seasonality in Python using sktime

ARIMA (AutoRegressive Integrated Moving Average) with seasonality is an extension of the traditional ARIMA model to handle data with seasonal patterns. Seasonal patterns are periodic fluctuations that repeat over a fixed period, such as daily, weekly, monthly, or yearly. For seasonal data, we use the SARIMA (Seasonal ARIMA) or SARIMAX (Seasonal ARIMA with exogenous variables) models.

Here’s a breakdown of how to use ARIMA with seasonality:


Understanding the SARIMA Model

SARIMA extends ARIMA by incorporating seasonal components:

SARIMA(p, d, q)(P, D, Q, s)

  • p, d, q: Non-seasonal ARIMA terms:
    • p: Order of the autoregressive (AR) part: the number of lagged observations used to predict the current value.
    • d: Number of non-seasonal differences applied to make the series stationary.
    • q: Order of the moving-average (MA) part: the number of lagged forecast errors included in the prediction equation.
  • P, D, Q, s: Seasonal components:
    • P: Seasonal autoregressive order: the number of observations at seasonal lags (s, 2s, ...) used to predict the current value.
    • D: Seasonal differencing order: the number of seasonal differences applied to remove seasonal patterns and make the series stationary.
    • Q: Seasonal moving-average order: the number of forecast errors at seasonal lags (s, 2s, ...) included in the prediction equation.
    • s: Seasonal period (e.g., 12 for monthly data with yearly seasonality).
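
For readers who prefer a formula, the SARIMA(p, d, q)(P, D, Q, s) model is commonly written with the backshift operator B, where B y_t = y_{t-1}. The symbols below (B, the polynomials, and the white-noise term ε_t) are standard textbook notation rather than anything defined in this article, and the sign convention for the moving-average polynomials varies between references.

```latex
\begin{aligned}
\phi_p(B)\,\Phi_P(B^{s})\,(1-B)^{d}\,(1-B^{s})^{D}\,y_t
  &= \theta_q(B)\,\Theta_Q(B^{s})\,\varepsilon_t,\\
\phi_p(B) &= 1 - \phi_1 B - \dots - \phi_p B^{p},\\
\Phi_P(B^{s}) &= 1 - \Phi_1 B^{s} - \dots - \Phi_P B^{Ps},\\
\theta_q(B) &= 1 + \theta_1 B + \dots + \theta_q B^{q},\\
\Theta_Q(B^{s}) &= 1 + \Theta_1 B^{s} + \dots + \Theta_Q B^{Qs}.
\end{aligned}
```

The factors (1 - B)^d and (1 - B^s)^D are the non-seasonal and seasonal differencing steps, and the remaining polynomials carry the AR and MA terms at ordinary and seasonal lags.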

In sktime's ARIMA, the seasonal_order parameter captures the seasonality in the time series. It is expressed as (P, D, Q, s), where:

  1. P (Seasonal Autoregressive Order): This parameter specifies the number of seasonal autoregressive terms. It indicates how many past values in the same season (e.g., same month in previous years) influence the current value.
  2. D (Seasonal Differencing Order): This parameter specifies the number of seasonal differences needed to make the time series stationary. Seasonal differencing removes seasonality from the data. For instance, with yearly seasonality in monthly data, each value has the value from the same month of the previous year subtracted from it.
  3. Q (Seasonal Moving-Average Order): This parameter specifies the number of seasonal moving-average terms. It indicates how many past forecast errors from the same season (e.g., same month in previous years) are used to predict the current value.
  4. s (Length of Seasonal Cycle): This parameter specifies the number of time periods per season. For example, if the data has yearly seasonality with monthly observations, s would be 12 because there are 12 months in a year.
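
To make seasonal differencing (D = 1) concrete, here is a minimal NumPy sketch. The series is synthetic and purely illustrative (a linear trend plus a pattern that repeats every 12 steps), and seasonal_difference is a hypothetical helper name, not part of sktime.

```python
import numpy as np

def seasonal_difference(values, s):
    """Return values[t] - values[t - s] for every t >= s (one seasonal difference)."""
    values = np.asarray(values, dtype=float)
    return values[s:] - values[:-s]

# Synthetic monthly series (illustrative only): linear trend plus a pattern
# that repeats exactly every 12 months.
months = np.arange(48)
pattern = np.array([5, 3, 0, -2, -4, -5, -3, 0, 2, 4, 6, 8], dtype=float)
series = 100.0 + 2.0 * months + np.tile(pattern, 4)

seasonal_diff = seasonal_difference(series, s=12)

# The repeating pattern cancels out; only the 12-month trend increase remains.
print(seasonal_diff[:3])  # → [24. 24. 24.]
```

The differenced series is 12 observations shorter than the original, because the first 12 values have no same-month predecessor to subtract.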

Example Breakdown with sktime

Consider the following sktime ARIMA configuration:

# Initialize ARIMA forecaster
forecaster = ARIMA(
    order=(1, 1, 0),
    seasonal_order=(0, 1, 0, 12),
    suppress_warnings=True
)

Here, seasonal_order=(0, 1, 0, 12) means:

  • P = 0: No seasonal autoregressive terms are included.
  • D = 1: The data will be differenced once to remove yearly seasonality.
  • Q = 0: No seasonal moving-average terms are included.
  • s = 12: The seasonality length is 12, indicating a yearly cycle with monthly data.
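
With order=(1, 1, 0) and seasonal_order=(0, 1, 0, 12), the series is differenced twice, once at lag 12 and once at lag 1, before the single AR term is applied. The sketch below checks with NumPy that the two differencing steps together equal y_t - y_{t-1} - y_{t-12} + y_{t-13}. It uses an arbitrary random series, not the airline data, and does not involve sktime.

```python
import numpy as np

# Arbitrary illustrative series (a random walk), not the airline data.
rng = np.random.default_rng(0)
y_demo = np.cumsum(rng.normal(size=60))

# Step 1: seasonal difference at lag 12 (D = 1).
seasonal = y_demo[12:] - y_demo[:-12]
# Step 2: ordinary difference at lag 1 (d = 1).
w_two_step = seasonal[1:] - seasonal[:-1]

# Expanded form of both steps: y_t - y_{t-1} - y_{t-12} + y_{t-13}, for t >= 13.
w_expanded = y_demo[13:] - y_demo[12:-1] - y_demo[1:-12] + y_demo[:-13]

print(np.allclose(w_two_step, w_expanded))  # → True
```

Each differencing step costs observations: a series of length 60 leaves 47 doubly differenced values for the AR(1) part to work with.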

Why Use Seasonal Parameters?

Seasonal parameters are crucial when the time series exhibits regular patterns at specific intervals, like monthly sales data showing peaks during holiday seasons or electricity consumption showing higher values during summer and winter.

By incorporating these seasonal components, the ARIMA model can better capture and predict the underlying seasonal patterns in the data, leading to more accurate forecasts.

In short, according to sktime’s documentation, seasonal_order gives the (P, D, Q, s) order of the seasonal component of the ARIMA model: autoregressive parameters (P), differences (D), moving-average parameters (Q), and periodicity (s). D must be an integer giving the number of seasonal differences. P and Q can each be either an integer, which includes all AR or MA terms up to that order, or an iterable listing the specific lags to include. s is an integer giving the length of the seasonal cycle (e.g., 4 for quarterly data or 12 for monthly data). By default, no seasonal effect is modeled.


Implementation in sktime

Let’s perform time series forecasting with a seasonal ARIMA model on the airline dataset and visualize the results.


Imports and Setup

  1. matplotlib.pyplot: Used for plotting the data and results.
  2. load_airline from sktime.datasets: Loads the monthly airline passenger dataset (a classic time series dataset).
  3. ARIMA from sktime.forecasting.arima: Provides the ARIMA forecasting model.
  4. numpy (imported as np): Used to create the numeric array for the forecasting horizon (fh) passed to predict.

Loading Data

  • y = load_airline():
    • Loads the airline passenger dataset, a univariate time series that contains monthly totals of international airline passengers from 1949 to 1960.

Initializing the ARIMA Forecaster

forecaster = ARIMA(
    order=(1, 1, 0),
    seasonal_order=(0, 1, 0, 12),
    suppress_warnings=True
)
  • order=(1, 1, 0):
    • Specifies the non-seasonal ARIMA model parameters:
      • 1: Number of lagged observations in the AR (AutoRegressive) component.
      • 1: Differencing order (to make the series stationary).
      • 0: Number of lagged forecast errors in the MA (Moving Average) component.
  • seasonal_order=(0, 1, 0, 12):
    • Specifies the seasonal component:
      • 0: No seasonal AR component.
      • 1: Seasonal differencing order.
      • 0: No seasonal MA component.
      • 12: Seasonal periodicity (e.g., 12 months in a year).
  • suppress_warnings=True:
    • Prevents warnings about convergence or other issues from being displayed.

Fitting the Model

  • forecaster.fit(y):
    • Trains the ARIMA model on the airline dataset.

Generating Predictions

  • y_pred = forecaster.predict(fh=np.arange(1, 21)):
    • Forecasts the next 20 periods (forecasting horizon fh):
      • np.arange(1, 21) creates the array [1, 2, ..., 20], representing the steps ahead to predict.
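
Because a relative forecasting horizon in sktime counts steps from the last training observation (1 is the first future period, 0 is the last observed period itself), the start value of the array matters. This quick NumPy check, independent of sktime, shows what each call produces; the variable names are illustrative.

```python
import numpy as np

fh_from_zero = np.arange(20)    # starts at 0: 0, 1, ..., 19
fh_from_one = np.arange(1, 21)  # starts at 1: 1, 2, ..., 20

print(fh_from_zero[0], fh_from_zero[-1])  # → 0 19
print(fh_from_one[0], fh_from_one[-1])    # → 1 20
```

Both arrays have 20 entries, but only the second covers 20 strictly future steps.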

Plotting the Results

plt.figure(figsize=(10, 5))
plt.plot(y.to_timestamp(), label='Actual')
plt.plot(y_pred.to_timestamp(), label='Forecast', color='red')
plt.title('ARIMA Model Forecast - airline dataset')
plt.legend()
plt.savefig('airline.jpg')
plt.show()
  1. plt.figure(figsize=(10, 5)):
    • Sets the figure size to 10 inches by 5 inches.
  2. plt.plot(y.to_timestamp(), label='Actual'):
    • Plots the actual time series data (converted to timestamps for better visualization).
  3. plt.plot(y_pred.to_timestamp(), label='Forecast', color='red'):
    • Plots the forecasted values in red, converted to timestamps so that both lines share the same datetime x-axis.
  4. plt.title():
    • Adds a title to the plot.
  5. plt.legend():
    • Displays the legend for ‘Actual’ and ‘Forecast’ labels.
  6. plt.savefig('airline.jpg'):
    • Saves the plot as an image file named airline.jpg.
  7. plt.show():
    • Displays the plot.

Output

The plot shows:

  • The historical airline passenger data (as the “Actual” line).
  • The forecasted future values (as the “Forecast” line, in red).

Combined code:

import matplotlib.pyplot as plt
from sktime.datasets import load_airline
from sktime.forecasting.arima import ARIMA
import numpy as np

# Load data
y = load_airline()

# Initialize ARIMA forecaster
forecaster = ARIMA(
    order=(1, 1, 0),
    seasonal_order=(0, 1, 0, 12),
    suppress_warnings=True
)

# Fit the model
forecaster.fit(y)

# Generate predictions for the next 20 months (steps 1 to 20 ahead)
y_pred = forecaster.predict(fh=np.arange(1, 21))

# Plot the predictions
plt.figure(figsize=(10, 5))
plt.plot(y.to_timestamp(), label='Actual')
plt.plot(y_pred.to_timestamp(), label='Forecast', color='red')
plt.title('ARIMA Model Forecast - airline dataset')
plt.legend()
plt.savefig('airline.jpg')
plt.show()

