ARIMA stands for AutoRegressive Integrated Moving Average, which is a widely used statistical model for analyzing and forecasting time series data. The model is particularly effective for understanding and predicting future points in a series based on its own past values (autoregressive), differences between consecutive observations (integrated), and past forecast errors (moving average).
ARIMA is flexible and can handle many kinds of time series data, making it a popular choice in fields like finance, economics, and environmental science. Its AR and MA parts assume a stationary series, meaning the mean and variance stay roughly constant over time. When the raw series has a trend, stationarity can often be achieved through differencing, which is what the "I" part does.
Key Components of ARIMA:
- AutoRegressive (AR): This component uses the dependency between an observation and a number of lagged observations (i.e., previous values).
- Integrated (I): This component involves differencing the data to make it stationary, which means that its statistical properties like mean and variance are constant over time.
- Moving Average (MA): This component models an observation as a function of past forecast errors (residuals), rather than as a rolling mean of the observations themselves.
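To make the differencing step concrete, here is a minimal pure-Python sketch. The helper name `difference` and the toy numbers are illustrative assumptions and are not part of sktime.

```python
def difference(series, d=1):
    """Apply first-order differencing d times to a list of numbers."""
    values = list(series)
    for _ in range(d):
        # Each pass replaces the series with its consecutive differences,
        # which shortens it by one element.
        values = [b - a for a, b in zip(values, values[1:])]
    return values


# Toy series with a curved upward trend (made-up data).
y = [3, 5, 8, 12, 17]

once = difference(y, d=1)   # still increasing, so a trend remains
twice = difference(y, d=2)  # constant, so the trend has been removed
print(once)
print(twice)
```

In this toy case two rounds of differencing are needed before the values stop drifting, which corresponds to d=2 in the ARIMA notation.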
ARIMA Model Notation:
ARIMA models are typically denoted as ARIMA(p, d, q), where:
- p is the number of lag observations included in the model (AR part).
- d is the number of times that the raw observations are differenced (I part).
- q is the number of lagged forecast errors included in the model (MA part).
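For a series that has been differenced $d$ times, the model can be written as:
$$y_t = c + \sum_{i=1}^{p} \phi_i \, y_{t-i} + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} + \varepsilon_t$$
Where:
- $y_t$ is the value of the (differenced) series at time $t$.
- $c$ is a constant term.
- $\phi_i$ (for $i = 1, \dots, p$) are the coefficients of the autoregressive terms.
- $\theta_j$ (for $j = 1, \dots, q$) are the coefficients of the moving average terms.
- $\varepsilon_t$ is the error term at time $t$.
This equation represents the ARIMA model with parameters $p$, $d$, and $q$, capturing the autoregressive, differencing, and moving average components, respectively.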
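As a rough illustration of how the AR and MA terms combine, the sketch below computes a one-step-ahead forecast as a constant plus weighted past values plus weighted past errors. The function name `arma_one_step` and all coefficient values are made up for illustration; a real model estimates them from data.

```python
def arma_one_step(c, phi, theta, past_values, past_errors):
    """One-step-ahead forecast from AR and MA terms.

    past_values[0] is the most recent value, past_values[1] the one
    before it, and so on; past_errors uses the same ordering.
    The future error term has expected value zero, so it is left out.
    """
    ar_part = sum(coef * value for coef, value in zip(phi, past_values))
    ma_part = sum(coef * error for coef, error in zip(theta, past_errors))
    return c + ar_part + ma_part


# Hypothetical ARMA(1, 1) coefficients and history.
forecast = arma_one_step(
    c=1.0, phi=[0.5], theta=[0.2], past_values=[10.0], past_errors=[1.0]
)
print(forecast)
```

With d > 0 the same computation would be applied to the differenced series, and the result would then be added back onto the last observed level.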
Implementation in Python with sktime
We’ll start with a basic toy example. Then we’ll see how to use auto-ARIMA in sktime to select the best parameters for an ARIMA model.
In addition to sktime, we also need to install the pmdarima package, which sktime’s ARIMA relies on. This can be done using pip install pmdarima. Otherwise, we’ll hit this error:
ModuleNotFoundError: ARIMA requires package 'pmdarima' to be present in the python environment, but 'pmdarima' was not found. 'pmdarima' is a soft dependency and not included in the base sktime installation. Please run: `pip install pmdarima` to install the pmdarima package. To install all soft dependencies, run: `pip install sktime[all_extras]`
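One way to check for the soft dependency before running anything is to ask Python whether the package can be found. This is a small standard-library sketch; the helper name `has_package` is an assumption made for this example.

```python
import importlib.util


def has_package(name):
    """Return True if a top-level package with this name can be imported."""
    return importlib.util.find_spec(name) is not None


for pkg in ("sktime", "pmdarima"):
    if has_package(pkg):
        print(pkg + ": found")
    else:
        print(pkg + ": missing (try: pip install " + pkg + ")")
```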
Ex1: Basic Toy example
1. Importing Libraries
from sktime.pipeline import make_pipeline
from sktime.transformations.series.adapt import TabularToSeriesAdaptor
from sktime.forecasting.arima import ARIMA
from sktime.datasets import load_airline
from sklearn.preprocessing import StandardScaler
- make_pipeline: Combines different preprocessing steps and a forecasting model into a unified pipeline.
- TabularToSeriesAdaptor: Adapts tabular transformers (like StandardScaler) to work with time series data.
- ARIMA: A popular time series forecasting model that uses autoregression, integration, and moving average components.
- load_airline: Loads the airline passenger dataset, a popular univariate time series dataset.
- StandardScaler: Scales features by removing the mean and scaling to unit variance. It’s a standard preprocessing step.
2. Loading the Dataset
y = load_airline()
- Loads the monthly airline passenger counts dataset, which is a univariate time series.
- The dataset contains values representing the number of airline passengers per month over several years.
3. Defining the Scaler
scaler = TabularToSeriesAdaptor(StandardScaler())
- The StandardScaler from scikit-learn is designed for tabular data. Since sktime deals with time series data, we wrap the scaler using TabularToSeriesAdaptor, enabling it to process time series natively.
- The scaler normalizes the time series, which can improve the performance and convergence of the forecasting model.
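For intuition, standard scaling subtracts the mean and divides by the standard deviation. The sketch below reproduces that arithmetic with the standard library only; `standardize` is a hypothetical helper and the numbers are made up. It uses the population standard deviation, which matches StandardScaler's default behavior.

```python
import statistics


def standardize(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std, i.e. ddof=0
    return [(v - mean) / std for v in values]


y = [2, 4, 4, 4, 5, 5, 7, 9]  # toy data: mean 5, population std 2
z = standardize(y)
print(z)
```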
4. Defining the Forecaster
forecaster = ARIMA(order=(1, 1, 1))
- An ARIMA model is specified with the parameters order=(1, 1, 1):
  - 1 for the autoregressive (AR) term.
  - 1 for the differencing (I) term, which helps make the series stationary.
  - 1 for the moving average (MA) term.
- The ARIMA model will use these parameters to capture the patterns in the time series data.
5. Creating the Pipeline
pipeline = make_pipeline(scaler, forecaster)
- Combines the scaler and forecaster into a unified pipeline.
- The pipeline ensures that the scaling is applied before the ARIMA model is fit to the data.
- This modular approach makes the workflow cleaner and easier to manage.
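Conceptually, a scaler-then-forecaster pipeline scales the series, forecasts on the scaled values, and maps the forecast back to the original units. The toy sketch below mimics that flow with stand-in classes. `ToyScaler` and `last_value_forecast` are hypothetical and much simpler than sktime's components, and the claim that sktime inverts the transform on predictions reflects my understanding of its target-transforming pipelines rather than anything stated in this article.

```python
import statistics


class ToyScaler:
    """Stand-in for a wrapped StandardScaler (illustrative only)."""

    def fit(self, values):
        self.mean_ = statistics.fmean(values)
        self.std_ = statistics.pstdev(values)
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]

    def inverse_transform(self, values):
        return [v * self.std_ + self.mean_ for v in values]


def last_value_forecast(values, steps):
    """Stand-in forecaster: repeat the last observed value."""
    return [values[-1]] * steps


y = [10.0, 12.0, 14.0, 16.0]  # made-up series

scaler = ToyScaler().fit(y)
scaled_forecast = last_value_forecast(scaler.transform(y), steps=3)
forecast = scaler.inverse_transform(scaled_forecast)  # back to original units
print(forecast)
```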
6. Fitting the Pipeline
pipeline.fit(y)
- Fits the entire pipeline to the time series data y.
- The scaling transformation is applied first, followed by fitting the ARIMA model to the scaled data.
7. Making Predictions
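y_pred = pipeline.predict(fh=list(range(1, 13)))
- Forecasts the next 12 time steps (fh = 1, 2, ..., 12) into the future. Note that a single integer such as fh=12 asks sktime for only the 12th step ahead, not for all 12 steps.
- The forecasting horizon (fh) defines how far ahead we want to predict.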
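A relative forecasting horizon counts steps after the last observation. Assuming the monthly airline series ends in December 1960, the sketch below maps horizon steps to calendar months. The helper `horizon_to_months` is hypothetical and only illustrates the bookkeeping; sktime handles this internally.

```python
def horizon_to_months(last_year, last_month, steps):
    """Map relative horizon steps to (year, month) pairs for monthly data."""
    months = []
    for h in steps:
        # Count months from January of last_year, then split into year/month.
        index = (last_month - 1) + h
        months.append((last_year + index // 12, index % 12 + 1))
    return months


future = horizon_to_months(1960, 12, range(1, 13))
print(future[0], future[-1])
```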
8. Outputting the Predictions
print(y_pred)
- Prints the 12-step-ahead predictions.
- The output is a pandas Series containing the forecasted values for the next 12 time steps.
Combined code:
from sktime.pipeline import make_pipeline
from sktime.transformations.series.adapt import TabularToSeriesAdaptor
from sktime.forecasting.arima import ARIMA
from sktime.datasets import load_airline
from sklearn.preprocessing import StandardScaler
# Load sample dataset
y = load_airline()
# Define the scaler (wrapped to adapt to time series)
scaler = TabularToSeriesAdaptor(StandardScaler())
# Define the forecaster
forecaster = ARIMA(order=(1, 1, 1))
# Create the pipeline using make_pipeline
pipeline = make_pipeline(scaler, forecaster)
# Fit the pipeline
pipeline.fit(y)
# Forecast the next 12 steps (a bare fh=12 would return only the 12th step)
y_pred = pipeline.predict(fh=list(range(1, 13)))
print(y_pred)
auto-ARIMA algorithm for finding the optimal parameters for an ARIMA model
The auto-ARIMA algorithm is designed to find the optimal parameters for an ARIMA model, resulting in a single fitted ARIMA model. Auto-ARIMA conducts differencing tests, such as Kwiatkowski-Phillips-Schmidt-Shin, Augmented Dickey-Fuller, or Phillips-Perron, to determine the order of differencing (d). It then fits models within specified ranges for (p) (start_p to max_p) and (q) (start_q to max_q). If the seasonal option is enabled, auto-ARIMA also identifies the optimal seasonal parameters (P) and (Q) after conducting the Canova-Hansen test to determine the seasonal differencing order (D).
The algorithm aims to find the best model by optimizing a selected information criterion—one of (aic) (Akaike Information Criterion), (aicc) (Corrected Akaike Information Criterion), (bic) (Bayesian Information Criterion), (hqic) (Hannan-Quinn Information Criterion), or (oob) (out-of-bag validation scoring)—and returns the ARIMA model that minimizes this value.
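The information criteria trade off goodness of fit (the log-likelihood) against the number of estimated parameters. The sketch below uses the textbook formulas; the log-likelihood values and parameter counts are invented for illustration, and libraries can differ in exactly which parameters they count.

```python
import math


def aic(loglik, k):
    return 2 * k - 2 * loglik


def aicc(loglik, k, n):
    # Small-sample correction added on top of AIC.
    return aic(loglik, k) + (2 * k * (k + 1)) / (n - k - 1)


def bic(loglik, k, n):
    return k * math.log(n) - 2 * loglik


def hqic(loglik, k, n):
    return 2 * k * math.log(math.log(n)) - 2 * loglik


n = 144  # number of observations, e.g. 12 years of monthly data

# Hypothetical fits: order -> (log-likelihood, number of parameters).
candidates = {"(1,1,1)": (-510.0, 3), "(2,1,2)": (-507.5, 5)}

best_by_aic = min(candidates, key=lambda o: aic(*candidates[o]))
best_by_bic = min(candidates, key=lambda o: bic(*candidates[o], n))
print("AIC picks", best_by_aic)
print("BIC picks", best_by_bic)
```

With these made-up numbers the two criteria disagree: BIC penalizes the extra parameters more heavily than AIC, so it prefers the smaller model.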
However, due to stationarity issues, auto-ARIMA in sktime might not always find a converging model. In such cases, a ValueError is raised, suggesting that stationarity-inducing measures be taken or a new range of order values be selected. Non-stepwise selection (essentially a grid search) can be slow, especially for seasonal data.
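To see why a full grid search grows quickly, the sketch below enumerates candidate (p, d, q) orders for given bounds. The function `candidate_orders` is a hypothetical illustration, not the actual search code of sktime or pmdarima, which may apply additional constraints.

```python
from itertools import product


def candidate_orders(start_p, max_p, start_q, max_q, d):
    """List every (p, d, q) combination within the given bounds."""
    p_values = range(start_p, max_p + 1)
    q_values = range(start_q, max_q + 1)
    return [(p, d, q) for p, q in product(p_values, q_values)]


orders = candidate_orders(start_p=0, max_p=3, start_q=0, max_q=2, d=0)
print(len(orders), "non-seasonal candidates")
```

Each seasonal order that is also searched multiplies this count again, so allowing, say, three values each for P and Q would turn 12 candidates into 108 model fits.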
Implementation of auto-ARIMA in sktime
Now, let’s use the AutoARIMA model from the sktime library to automatically select optimal ARIMA parameters for time series forecasting, and then visualize the actual and forecasted data for the airline passengers dataset. Here’s a detailed breakdown:
Imports
- matplotlib.pyplot: Used for creating and customizing plots.
- load_airline from sktime.datasets: Loads the monthly airline passenger dataset, a popular time series dataset.
- numpy: Used to create numerical arrays, particularly for the future forecast horizon (fh).
- AutoARIMA from sktime.forecasting.arima: Automatically selects the best ARIMA model based on given constraints and data.
Loading Data
y = load_airline()
- Loads the airline passenger dataset.
- This dataset contains monthly totals of international airline passengers from 1949 to 1960 as a time series.
Initializing AutoARIMA
forecaster = AutoARIMA(
    sp=12, d=0, max_p=3, max_q=2, suppress_warnings=True
)
The AutoARIMA model automates the process of identifying optimal ARIMA parameters. Here’s what each argument does:
- sp=12: Seasonal periodicity (12 months in a year).
- d=0: Fixes the non-seasonal differencing order (d) at 0. AutoARIMA respects this value instead of estimating d with a stationarity test.
- max_p=3 and max_q=2: Limit the range for potential AR (AutoRegressive) and MA (Moving Average) parameters:
  - p (lagged observations) will be searched up to 3.
  - q (lagged forecast errors) will be searched up to 2.
- suppress_warnings=True: Prevents warnings (e.g., related to convergence or non-invertibility) from being displayed.
Fitting the Model
forecaster.fit(y)
- Trains the AutoARIMA model on the airline passenger dataset to find the optimal parameters.
Generating Predictions
y_pred = forecaster.predict(fh=np.arange(1, 21))
- Forecasts the next 20 periods:
  - fh (forecasting horizon): Specifies the steps ahead to predict.
  - np.arange(1, 21) creates the array [1, 2, ..., 20], representing these steps. (np.arange(20) would instead give [0, 1, ..., 19], which starts at the last observed period rather than the first future one.)
Plotting the Results
plt.figure(figsize=(10, 5))
plt.plot(y.to_timestamp(), label='Actual')
plt.plot(y_pred.to_timestamp(), label='Forecast', color='red')
plt.title('ARIMA Model Forecast - airline dataset')
plt.legend()
plt.savefig('airline-auto-arima.jpg')
plt.show()
- plt.figure(figsize=(10, 5)): Sets the plot size to 10 inches by 5 inches.
- plt.plot(y.to_timestamp(), label='Actual'): Plots the historical airline passenger data, converting its period index to timestamps so matplotlib can place it on a date axis.
- plt.plot(y_pred.to_timestamp(), label='Forecast', color='red'): Plots the forecasted values from the AutoARIMA model in red, with the same index conversion so both lines share one date axis.
- plt.title(): Adds a title to the plot.
- plt.legend(): Displays a legend for the “Actual” and “Forecast” data.
- plt.savefig('airline-auto-arima.jpg'): Saves the plot as an image file named airline-auto-arima.jpg.
- plt.show(): Displays the plot in the current output.
Output

- The plot shows:
  - The historical data (labeled “Actual”).
  - The 20-step-ahead forecast (labeled “Forecast”) in red.
- The forecast leverages the seasonal patterns in the data and uses optimal ARIMA parameters found by AutoARIMA.
Combined code:
import matplotlib.pyplot as plt
from sktime.datasets import load_airline
import numpy as np
from sktime.forecasting.arima import AutoARIMA
y = load_airline()
forecaster = AutoARIMA(
    sp=12, d=0, max_p=3, max_q=2, suppress_warnings=True
)
forecaster.fit(y)
# Steps 1..20 ahead (np.arange(20) would start at 0, the last observed period)
y_pred = forecaster.predict(fh=np.arange(1, 21))
# Plot the predictions
plt.figure(figsize=(10, 5))
plt.plot(y.to_timestamp(), label='Actual')
plt.plot(y_pred.to_timestamp(), label='Forecast', color='red')
plt.title('ARIMA Model Forecast - airline dataset')
plt.legend()
plt.savefig('airline-auto-arima.jpg')
plt.show()