How to Implement a Time-Series Forecasting Model with ARIMA

Time-series forecasting—the process of predicting future values based on historical data—is a foundational component of intelligent business operations. For CTOs and engineering leaders, robust forecasting capabilities are critical for optimizing inventory, managing compute resources, projecting financial performance, and detecting anomalies.

While modern deep learning models (LSTMs, Transformers) garner significant attention, classical statistical methods remain powerful, interpretable, and highly effective, particularly for univariate time series. The most prominent among these is ARIMA, or AutoRegressive Integrated Moving Average.

ARIMA is a model class that captures a suite of different temporal structures in time-series data. This article provides a comprehensive, code-first technical guide for implementing ARIMA, focusing on the practical steps, diagnostic procedures, and architectural considerations required for an enterprise-grade application.

Deconstructing the ARIMA(p, d, q) Model

Before implementation, it is crucial to understand the three components that define the model, denoted as ARIMA($p, d, q$):

  • $p$ (AutoRegressive - AR): This component models the relationship between an observation and a specific number of lagged observations (i.e., previous time steps). The parameter $p$ defines how many previous periods are used to predict the current value. A $p$ of 2 means the current value is predicted based on the values from the previous two periods.
  • $d$ (Integrated - I): This component addresses a critical prerequisite for ARIMA: stationarity. A stationary time series is one whose statistical properties (mean, variance, autocorrelation) are constant over time. If a series exhibits trends (e.g., consistent upward growth), it is non-stationary. The $d$ parameter represents the number of times the raw data must be differenced (i.e., subtracting the previous value from the current value) to achieve stationarity.
  • $q$ (Moving Average - MA): This component models the relationship between an observation and the residual errors from a moving average model applied to $q$ lagged observations. It captures short-term, random shocks or fluctuations in the data.

The goal of the implementation process is to find the optimal values for ($p, d, q$) that best describe the underlying data.
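
Concretely, if $y'_t$ denotes the series after differencing $d$ times, the ARIMA($p, d, q$) model can be written as:

$$y'_t = c + \sum_{i=1}^{p} \phi_i \, y'_{t-i} + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} + \varepsilon_t$$

where $\phi_i$ are the autoregressive coefficients, $\theta_j$ the moving-average coefficients, and $\varepsilon_t$ a white-noise error term.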

The Implementation Workflow: A Step-by-Step Approach

We will use Python with the standard data science stack: pandas for data manipulation, matplotlib for visualization, statsmodels for the core statistical tests and models, pmdarima for automated parameter tuning, and scikit-learn for evaluation metrics.

First, ensure the necessary libraries are installed:

pip install pandas matplotlib statsmodels pmdarima scikit-learn

Step 1: Data Loading and Visualization

Assume we have a simple CSV file (dataset.csv) with monthly sales data:

month         sales
2023-01-01    200
2023-02-01    210
...           ...

import pandas as pd
import matplotlib.pyplot as plt

# Load and prepare the data
# parse_dates=True and index_col=0 are critical for time-series analysis
data = pd.read_csv('dataset.csv', 
                   parse_dates=['month'], 
                   index_col='month')

# Ensure the index is a DatetimeIndex with a defined frequency
data.index = pd.to_datetime(data.index)
data = data.asfreq('MS') # 'MS' = Month Start frequency

# Visualize the data to identify trends and seasonality
plt.figure(figsize=(12, 6))
plt.plot(data['sales'])
plt.title('Monthly Sales Over Time')
plt.xlabel('Date')
plt.ylabel('Sales')
plt.show()

From this plot, we would visually inspect for an upward or downward trend (non-stationarity in mean) or seasonality (cyclical patterns).
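
If trend or seasonality is hard to judge by eye, a classical decomposition can make the components explicit. A minimal sketch using statsmodels' seasonal_decompose, assuming the monthly data loaded above:

from statsmodels.tsa.seasonal import seasonal_decompose

# Split the series into trend, seasonal, and residual components.
# model='additive' assumes seasonal swings stay roughly constant in size;
# use model='multiplicative' if they grow with the level of the series.
decomposition = seasonal_decompose(data['sales'], model='additive')
decomposition.plot()
plt.show()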

Step 2: Ensure Stationarity (Finding $d$)

ARIMA models require stationary data. We use the Augmented Dickey-Fuller (ADF) test to check for stationarity.

  • Null Hypothesis ($H_0$): The series is non-stationary (it has a unit root).
  • Alternative Hypothesis ($H_a$): The series is stationary.

We look for a p-value below 0.05, which allows us to reject the null hypothesis and treat the series as stationary.

from statsmodels.tsa.stattools import adfuller

def check_stationarity(timeseries):
    """
    Performs the ADF test and prints the results.
    """
    print('Results of Dickey-Fuller Test:')
    dftest = adfuller(timeseries, autolag='AIC')
    dfoutput = pd.Series(dftest[0:4], 
                         index=['Test Statistic', 'p-value', '#Lags Used', 'Number of Observations'])
    print(dfoutput)
    
    if dftest[1] > 0.05:
        print("\n[!] Series is Non-Stationary (p-value > 0.05). Differencing is required.")
    else:
        print("\n[✓] Series is Stationary (p-value <= 0.05).")

# Check original data
check_stationarity(data['sales'])

If the p-value is > 0.05, the data is non-stationary. We apply differencing (the 'I' in ARIMA) and re-test. This process gives us our $d$ parameter.

# Apply first-order differencing (d=1)
sales_diff_1 = data['sales'].diff().dropna()

# Check stationarity again
print("\n--- Checking First-Order Differencing (d=1) ---")
check_stationarity(sales_diff_1)

# If still non-stationary, try second-order differencing (d=2)
# sales_diff_2 = sales_diff_1.diff().dropna()
# check_stationarity(sales_diff_2)

The $d$ value is the number of differencing operations required to achieve stationarity. For most business data, $d$ is 1 or 2.
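
Rather than differencing and re-testing in a loop, pmdarima (installed in the setup step) can estimate $d$ directly. A short sketch:

from pmdarima.arima.utils import ndiffs

# Estimate the differencing order needed for stationarity
d_adf = ndiffs(data['sales'], test='adf')    # ADF-based estimate
d_kpss = ndiffs(data['sales'], test='kpss')  # KPSS-based estimate
print(f'Suggested d -> ADF: {d_adf}, KPSS: {d_kpss}')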

Step 3: Identify Parameters $p$ and $q$

Once we have a stationary series (our differenced data), we can find $p$ and $q$.

Method 1: Manual Inspection with ACF/PACF Plots

  • ACF (Autocorrelation Function) Plot: Helps identify the $q$ parameter. Look for the last significant spike outside the confidence interval (blue shaded area) before a sharp "cut-off."
  • PACF (Partial Autocorrelation Function) Plot: Helps identify the $p$ parameter. Look for the last significant spike outside the confidence interval.

from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Use the stationary data (e.g., sales_diff_1)
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))

# ACF plot (for q)
plot_acf(sales_diff_1, ax=ax1, lags=20)

# PACF plot (for p)
plot_pacf(sales_diff_1, ax=ax2, lags=20)

plt.show()

This manual method is subjective and complex. A "cut-off" can be ambiguous.

Method 2: Automated Selection with auto_arima

A more robust and generally preferred approach is to use an automated algorithm that iterates through combinations of ($p, d, q$) and selects the best model according to an information criterion, such as the AIC (Akaike Information Criterion). The pmdarima library provides an excellent auto_arima function for this.

auto_arima will also handle differencing ($d$) and even seasonality (SARIMA) automatically.

import pmdarima as pm

# auto_arima will find the best (p, d, q) combination
# We pass the *original* data, not the differenced data
# m=12 indicates a yearly seasonal cycle (12 months)
# 'stepwise=True' speeds up the search
auto_model = pm.auto_arima(data['sales'], 
                           start_p=1, start_q=1,
                           test='adf',       # Use ADF test to find d
                           max_p=3, max_q=3,  # Max p and q
                           m=12,             # Frequency of the series (12 for monthly)
                           d=None,           # Let the test determine d
                           seasonal=True,    # Test for seasonality
                           start_P=0, 
                           D=None,           # Let the test determine D
                           trace=True,       # Print results
                           error_action='ignore',
                           suppress_warnings=True,
                           stepwise=True)

print(auto_model.summary())

The summary from auto_arima will provide the best-fit model parameters, such as SARIMAX(1, 1, 1)(0, 1, 1, 12), which means:

  • ARIMA ($p,d,q$): (1, 1, 1)
  • Seasonal ($P,D,Q$)m: (0, 1, 1, 12)

This indicates an ARIMA(1,1,1) model with a seasonal component of (0,1,1) based on a 12-month cycle. This combined model is SARIMA, an extension of ARIMA that explicitly handles seasonality.
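
Rather than copying these values by hand from the printed summary, they can be read programmatically off the fitted object; the example values below assume the SARIMAX(1, 1, 1)(0, 1, 1, 12) result described above.

# Read the selected orders off the fitted pmdarima model
print(auto_model.order)           # e.g., (1, 1, 1)
print(auto_model.seasonal_order)  # e.g., (0, 1, 1, 12)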

Step 4: Model Validation and Diagnostics

Fitting a model is not enough. We must validate that its underlying assumptions hold. We do this by analyzing the model's residuals (the difference between the model's predictions and the actual values).

Ideal residuals should behave like white noise:

  1. They should be uncorrelated (no patterns).
  2. They should have a mean of zero.
  3. They should have constant variance.

The auto_arima (or statsmodels) fit object provides a convenient plot_diagnostics method.

auto_model.plot_diagnostics(figsize=(15, 12))
plt.show()

  • Top-Left (Standardized Residuals): Should show no obvious trend or pattern.
  • Top-Right (Histogram plus KDE): Should closely resemble a normal distribution (bell curve), indicating residuals are normally distributed around zero.
  • Bottom-Left (Q-Q Plot): Points should lie closely on the red diagonal line, confirming normality.
  • Bottom-Right (Correlogram/ACF): No spikes should be significant after lag 0. This confirms residuals are uncorrelated.

If these diagnostics look good, the model is statistically sound.
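
For a numeric complement to the visual checks, the Ljung-Box test evaluates whether the residuals are uncorrelated. A short sketch, assuming the auto_model fitted in Step 3:

from statsmodels.stats.diagnostic import acorr_ljungbox

# H0: residuals are uncorrelated (white noise).
# Large p-values mean we cannot reject H0, which is what we want here.
residuals = auto_model.resid()
print(acorr_ljungbox(residuals, lags=[10], return_df=True))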

Step 5: Backtesting (Train/Test Split)

Before forecasting, we must validate performance on unseen data.

from sklearn.metrics import mean_squared_error
from numpy import sqrt

# Split data (e.g., 80% train, 20% test)
train_size = int(len(data) * 0.8)
train_data = data.iloc[:train_size]
test_data = data.iloc[train_size:]

# We must re-fit the model *only* on the training data
# We use the parameters found by auto_arima
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Example parameters from auto_arima: SARIMAX(1, 1, 1)(0, 1, 1, 12)
model = SARIMAX(train_data['sales'],
                order=(1, 1, 1),
                seasonal_order=(0, 1, 1, 12))

# Fit the model
model_fit = model.fit(disp=False)

# Make predictions on the test set
start = len(train_data)
end = len(train_data) + len(test_data) - 1
predictions = model_fit.predict(start=start, end=end, dynamic=False)

# Compare predictions to actual test data
plt.figure(figsize=(12, 6))
plt.plot(train_data['sales'], label='Training Data')
plt.plot(test_data['sales'], label='Actual Data (Test)', color='orange')
plt.plot(predictions, label='ARIMA Predictions', color='green', linestyle='--')
plt.legend()
plt.show()

# Calculate RMSE
rmse = sqrt(mean_squared_error(test_data['sales'], predictions))
print(f'Test RMSE: {rmse:.3f}')
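
RMSE is expressed in the units of the target, which makes it hard to compare across series of different magnitudes. A scale-independent metric such as MAPE can complement it; a small addition, assuming the same test split:

from sklearn.metrics import mean_absolute_percentage_error

# MAPE returns a fraction (0.05 = 5% average error)
mape = mean_absolute_percentage_error(test_data['sales'], predictions)
print(f'Test MAPE: {mape:.2%}')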

Step 6: Final Forecasting

Once validated, we fit the model on the entire dataset and use it to forecast the future.

# Fit the final model on ALL data
final_model = SARIMAX(data['sales'],
                      order=(1, 1, 1),
                      seasonal_order=(0, 1, 1, 12))
final_model_fit = final_model.fit(disp=False)

# Forecast the next 12 periods (months)
n_periods = 12
forecast = final_model_fit.get_forecast(steps=n_periods)

# Get confidence intervals
forecast_ci = forecast.conf_int(alpha=0.05) # 95% confidence

# Plot the forecast
plt.figure(figsize=(14, 7))
plt.plot(data['sales'], label='Historical Data')
plt.plot(forecast.predicted_mean, label='Forecast', color='red')
plt.fill_between(forecast_ci.index,
                 forecast_ci.iloc[:, 0],
                 forecast_ci.iloc[:, 1], color='r', alpha=0.1, label='95% Confidence Interval')
plt.title('Sales Forecast')
plt.legend()
plt.show()
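
In practice, the forecast usually feeds a downstream system rather than a plot. One way to package the point forecast and interval bounds together (the output filename is illustrative):

# Combine the point forecast and interval bounds into one frame
forecast_df = pd.concat(
    [forecast.predicted_mean.rename('forecast'), forecast_ci],
    axis=1
)
forecast_df.to_csv('sales_forecast.csv')
print(forecast_df.head())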

Architectural and Performance Considerations

While powerful, ARIMA is not a one-size-fits-all solution.

  • Scalability: ARIMA is univariate. To forecast 10,000 different products, you must train 10,000 individual models, which can be computationally expensive. Distributed computing frameworks (such as Spark with Pandas UDFs) may be necessary to parallelize model training; a minimal single-machine version of the pattern is sketched after this list.
  • Multivariate Analysis: ARIMA (and SARIMA) cannot natively use external regressors or related time series (e.g., using a competitor's price to predict your sales). For this, SARIMAX (which auto_arima often finds) or VAR (Vector AutoRegression) models are required; see the second sketch after this list.
  • Interpretability vs. Performance: ARIMA is highly interpretable: you can state the statistical relationships precisely (e.g., $p=1$ means sales depend on last month's sales). Deep learning models (LSTMs, etc.) may achieve higher accuracy on complex, non-linear data but are significantly less interpretable ("black box").
  • Data Pipeline: A production-grade forecasting system requires a robust data pipeline for automated re-training. A model trained on 2023 data will become "stale" and perform poorly on 2025 data. An MLOps pipeline should be established to automatically pull new data, re-run auto_arima to check for parameter drift, and deploy the updated model.
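
For the scalability point, here is a minimal single-machine sketch of the one-model-per-series pattern, parallelized with joblib. wide_df (one monthly sales column per product) is a hypothetical frame, and the same fit function maps directly onto a Spark Pandas UDF:

import pmdarima as pm
from joblib import Parallel, delayed

def fit_one(series):
    # Each product gets its own independent univariate model
    return pm.auto_arima(series, seasonal=True, m=12,
                         error_action='ignore', suppress_warnings=True)

# wide_df is a hypothetical DataFrame with one sales column per product
models = Parallel(n_jobs=-1)(
    delayed(fit_one)(wide_df[col]) for col in wide_df.columns
)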
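
For the multivariate point, statsmodels' SARIMAX accepts external regressors through its exog argument. A sketch with a hypothetical competitor_price column aligned on the same monthly index:

from statsmodels.tsa.statespace.sarimax import SARIMAX

# competitor_price is a hypothetical exogenous regressor
model_x = SARIMAX(data['sales'],
                  exog=data[['competitor_price']],
                  order=(1, 1, 1),
                  seasonal_order=(0, 1, 1, 12))
fit_x = model_x.fit(disp=False)

# Forecasting requires future values of the regressor:
# future_exog = ...  # 12 projected rows of competitor_price
# forecast_x = fit_x.get_forecast(steps=12, exog=future_exog)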

Finally

ARIMA/SARIMA models provide a robust, statistically grounded baseline for time-series forecasting. Their strength lies in modeling trends and seasonality with high interpretability. For many enterprise needs—such as inventory, financial, or resource planning—a well-diagnosed SARIMA model is often sufficient and preferable to more complex solutions.

The key to a successful implementation is not just fitting the model, but rigorously adhering to the workflow: validate stationarity ($d$), tune parameters ($p, q, P, Q$) systematically, and thoroughly check model diagnostics.

While this guide provides the building blocks, scaling these methods into an automated, reliable forecasting engine across multiple business units is a significant challenge. This is where specialized AI engineering services for enterprises become invaluable, translating statistical models into high-availability, production-grade forecasting platforms.

FAQs

What is an ARIMA model and what do its (p, d, q) components mean?

ARIMA, or AutoRegressive Integrated Moving Average, is a class of statistical models used to capture and predict temporal structures in time-series data. The model is defined by three parameters: p, d, and q.

  • p (AutoRegressive): This component models the relationship between a current observation and a specific number (p) of previous (lagged) observations.
  • d (Integrated): This represents the number of times the raw data must be "differenced" (subtracting the previous value from the current value) to achieve stationarity, a state where the data's statistical properties (like mean and variance) are constant over time.
  • q (Moving Average): This component models the relationship between an observation and the residual errors from a moving average model based on q lagged observations, capturing short-term, random fluctuations.

What is stationarity and why is it essential for ARIMA forecasting?

Stationarity is a critical prerequisite for building an ARIMA model. A time series is considered stationary if its statistical properties—such as its mean, variance, and autocorrelation—are constant over time. It should not exhibit clear trends (like consistent upward growth) or cyclical patterns. The "Integrated" (d) component of ARIMA is specifically designed to address non-stationarity. This parameter indicates how many differencing operations are needed to transform the data into a stationary series, which is required for the AutoRegressive (p) and Moving Average (q) components to function correctly. The Augmented Dickey-Fuller (ADF) test is commonly used to check if a series is stationary.

What are the key steps to implement an ARIMA forecasting model?

A typical workflow for implementing an ARIMA model involves several key steps:

  1. Data Loading and Visualization: First, the time-series data is loaded and visualized to identify any obvious trends, seasonality, or cyclical patterns.
  2. Ensure Stationarity: The data is tested for stationarity using a statistical test like the Augmented Dickey-Fuller (ADF) test. If the data is non-stationary, differencing is applied (this determines the d parameter).
  3. Identify Parameters (p and q): Once the data is stationary, the p (AutoRegressive) and q (Moving Average) parameters are identified. This can be done manually by inspecting Autocorrelation (ACF) and Partial Autocorrelation (PACF) plots or, more commonly, by using an automated algorithm like auto_arima which finds the best parameters based on a criterion like AIC.
  4. Model Validation and Diagnostics: After fitting the model, its residuals (the errors) are analyzed. The residuals should ideally behave like "white noise"—uncorrelated, with a zero mean and constant variance.
  5. Backtesting and Forecasting: The model is validated on a test set of unseen data to measure its performance (e.g., using RMSE). Once validated, the final model is trained on the entire dataset and used to forecast future values.
