Implementing CI/CD for Machine Learning Models with GitHub Actions

In modern software engineering, CI/CD pipelines are the bedrock of efficient and reliable software delivery. However, when the asset being delivered is a machine learning model, the traditional CI/CD paradigm falls short. The unique lifecycle of ML models—encompassing data validation, training, and performance monitoring—necessitates a specialized approach known as MLOps.

This article provides a comprehensive, actionable guide for CTOs and senior engineers on architecting and implementing a robust CI/CD pipeline for machine learning models using GitHub Actions. We will move beyond theory to provide concrete code examples and architectural patterns that you can implement directly. The focus will be on automating the entire process from code commit to a production-ready, containerized model endpoint.

Architectural Overview: The MLOps CI/CD Pipeline

The core challenge in MLOps is managing the trifecta of code, data, and model as a single, cohesive unit. Our pipeline must automate processes that are often manual and error-prone, ensuring reproducibility and governance.

The architecture we will build is triggered by a git push to the main branch and consists of two primary stages orchestrated by GitHub Actions:

  1. Continuous Integration (CI) & Continuous Training (CT): This stage is responsible for validating the entire ML system. It includes:
    • Code Validation: Linting and running unit tests on the model's source code (e.g., data preprocessing, feature engineering).
    • Data Validation: Ensuring the training data conforms to an expected schema and distribution (a minimal sketch of such a check follows this overview).
    • Model Training & Evaluation: Training the model on a versioned dataset and evaluating its performance against predefined metrics (e.g., accuracy, F1-score). If performance meets the threshold, the trained model is versioned and saved.
    • Artifact Generation: The serialized model, performance metrics, and any other required assets are packaged as build artifacts.
  2. Continuous Delivery (CD): This stage focuses on packaging and deploying the validated model.
    • Containerization: Building a Docker image containing the model and a serving layer (e.g., a FastAPI application).
    • Image Publication: Pushing the versioned Docker image to a container registry (e.g., GitHub Container Registry, Docker Hub, AWS ECR).
    • Deployment: Automatically deploying the container to a target environment, such as a Kubernetes cluster or a cloud-based container service.

This separation ensures that only models that pass rigorous automated checks are considered for deployment, minimizing the risk of introducing regressions into production.
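
To make the data-validation stage concrete before we dive in, here is a minimal sketch of the kind of pre-training check the CI stage could run. The file path matches the repository layout below, but the schema (feature_1, feature_2, target) is an illustrative assumption:

# src/validate_data.py (sketch; the schema is an assumption, adapt to your data)
import pandas as pd

EXPECTED_COLUMNS = {"feature_1", "feature_2", "target"}  # hypothetical schema

def validate(path: str = "data/my_dataset.csv") -> None:
    df = pd.read_csv(path)
    # Schema check: every expected column must be present
    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"Dataset is missing columns: {missing}")
    # Basic sanity checks: non-empty data, no nulls in the target
    if df.empty:
        raise ValueError("Dataset is empty")
    if df["target"].isnull().any():
        raise ValueError("Target column contains null values")

if __name__ == "__main__":
    validate()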

Step-by-Step Implementation

Let's proceed with a practical implementation. We will use a simple Scikit-learn model, but the principles are directly applicable to more complex models built with TensorFlow, PyTorch, or other frameworks.

Repository Structure

A well-organized repository is critical for a manageable MLOps project.

/
├── .github/
│   └── workflows/
│       └── ci-cd.yml         # GitHub Actions workflow definition
├── data/                     # Raw data (managed by DVC)
├── models/                   # Output for serialized models
├── src/
│   ├── __init__.py
│   ├── train.py              # Model training and evaluation script
│   └── serve.py              # API server for model inference
├── tests/
│   └── test_model.py         # Unit tests for our code
├── requirements.txt          # Python dependencies
└── Dockerfile                # Docker definition for the serving image

Data and Model Versioning with DVC

Managing large datasets and model files directly in Git is impractical. We will leverage Data Version Control (DVC) to handle this. DVC stores pointers to data/models in Git while the actual files are stored in remote object storage (like S3, GCS, or even a shared network drive).

To initialize DVC and track our data:

# Install DVC
pip install dvc dvc-s3

# Initialize DVC in your Git repository
dvc init

# Configure remote storage
dvc remote add -d my-remote s3://your-bucket-name/dvc-store

# Add and track data
dvc add data/my_dataset.csv
git add data/my_dataset.csv.dvc .dvc/config
git commit -m "feat: Add initial dataset with DVC"
dvc push

This process ensures that every Git commit has a corresponding, immutable version of the data it was trained on.
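
Reproducing any historical state is then a two-command operation, sketched below; <commit-sha> is a placeholder:

# Restore the code and the DVC pointer files from a past commit
git checkout <commit-sha>

# Materialize the exact data version those pointers reference
# (use `dvc pull` instead if the data is not in the local cache)
dvc checkout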

The CI/CD Workflow: ci-cd.yml

This file is the heart of our automation. It defines the jobs, steps, and triggers for our pipeline. Place it in .github/workflows/ci-cd.yml.

name: MLOps CI/CD Pipeline

on:
  push:
    branches:
      - main

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Repository
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'

      - name: Install Dependencies
        run: |
          pip install -r requirements.txt
          pip install "dvc[s3]" # Install DVC with S3 support (quoted so the shell does not glob the brackets)

      - name: Authenticate with AWS for DVC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1

      - name: Pull Data with DVC
        run: dvc pull

      - name: Run Unit Tests
        run: pytest tests/

      - name: Train and Evaluate Model
        id: train
        run: python src/train.py

      - name: Upload Model Artifact
        uses: actions/upload-artifact@v4
        with:
          name: model
          path: |
            models/classifier.joblib
            metrics.json

  deploy:
    needs: build-and-test
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Repository
        uses: actions/checkout@v4

      - name: Download Model Artifact
        uses: actions/download-artifact@v4
        with:
          name: model
          path: ./

      # actions/download-artifact@v4 extracts a named artifact directly into
      # the given path and preserves the structure it was uploaded with, so
      # models/classifier.joblib and metrics.json land in place -- no moves needed.

      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and Push Docker Image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ghcr.io/${{ github.repository }}:${{ github.sha }}

      # Deployment step would go here. Example for a generic script execution.
      # - name: Deploy to Production
      #   run: ./scripts/deploy.sh ghcr.io/${{ github.repository }}:${{ github.sha }}

Key Points in the Workflow:

  • Secrets: We use secrets.AWS_ACCESS_KEY_ID and secrets.GITHUB_TOKEN for secure authentication. These must be configured in your repository's Settings > Secrets and variables > Actions (or from the command line, as sketched after this list).
  • Dependencies: The deploy job runs only after the build-and-test job succeeds (needs: build-and-test), ensuring we only deploy validated models.
  • Artifacts: actions/upload-artifact and actions/download-artifact are crucial for passing the trained model between jobs, as each job runs in a clean, isolated environment.
  • Image Tagging: We tag the Docker image with the Git commit SHA (${{ github.sha }}). This provides explicit traceability from a deployed container image back to the exact code and data version that produced it.
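
For reference, the required secrets can also be set from the command line with the GitHub CLI; the values below are placeholders:

# Set the AWS credentials consumed by the DVC steps (placeholder values)
gh secret set AWS_ACCESS_KEY_ID --body "AKIA..."
gh secret set AWS_SECRET_ACCESS_KEY --body "your-secret-key"

# GITHUB_TOKEN needs no setup; GitHub Actions injects it automatically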

Core Application Code

Model Training Script (src/train.py)

This script handles training, evaluation, and serialization.

# src/train.py
import os
import pandas as pd
import joblib
import json
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# 1. Load Data
df = pd.read_csv('data/my_dataset.csv')

# Dummy preprocessing
X = df.drop('target', axis=1)
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 2. Train Model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# 3. Evaluate Model
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy}")

# 4. Save Metrics
with open('metrics.json', 'w') as f:
    json.dump({'accuracy': accuracy}, f)

# Performance Gate: Fail the build if accuracy is below a threshold
MINIMUM_ACCURACY = 0.85
if accuracy < MINIMUM_ACCURACY:
    raise Exception(f"Model accuracy {accuracy} is below the threshold of {MINIMUM_ACCURACY}")

# 5. Serialize Model
os.makedirs('models', exist_ok=True)  # the directory may not exist on a fresh CI runner
joblib.dump(model, 'models/classifier.joblib')

print("Training complete. Model and metrics saved.")

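Unit Tests (tests/test_model.py)

The workflow's Run Unit Tests step expects a suite under tests/, which the repository layout lists but we have not yet shown. Here is a minimal sketch; the synthetic columns are illustrative, and a real suite would also cover your preprocessing and feature-engineering code:

# tests/test_model.py (a minimal sketch with an illustrative synthetic dataset)
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def test_model_trains_and_predicts():
    # Tiny synthetic dataset shaped like data/my_dataset.csv (illustrative)
    df = pd.DataFrame({
        "feature_1": [0.1, 0.9, 0.2, 0.8],
        "feature_2": [1.0, 0.0, 1.0, 0.0],
        "target": [0, 1, 0, 1],
    })
    X, y = df.drop("target", axis=1), df["target"]
    model = RandomForestClassifier(n_estimators=10, random_state=42)
    model.fit(X, y)
    predictions = model.predict(X)
    assert len(predictions) == len(y)
    assert set(predictions).issubset({0, 1})
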
Dockerfile for Model Serving

This Dockerfile creates a container image with our FastAPI application and the trained model.

# Dockerfile
FROM python:3.9-slim

WORKDIR /app

# Copy dependencies and install them
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application and model
COPY ./src /app/src
COPY ./models /app/models

# Expose port and define command
EXPOSE 8000
CMD ["uvicorn", "src.serve:app", "--host", "0.0.0.0", "--port", "8000"]

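Before relying on the pipeline, the image can be smoke-tested locally, assuming a trained model already exists in models/ (the image tag is an arbitrary local name):

# Build the serving image and run it locally
docker build -t ml-model-server .
docker run --rm -p 8000:8000 ml-model-server
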
API Server (src/serve.py)

A simple FastAPI application to load the model and serve predictions.

# src/serve.py
import joblib
from fastapi import FastAPI
from pydantic import BaseModel
from typing import List

app = FastAPI(title="ML Model Server")

# Define the input data schema
class ModelInput(BaseModel):
    features: List[float]

# Load the trained model from the file system
model = joblib.load("models/classifier.joblib")

@app.post("/predict")
def predict(data: ModelInput):
    """Makes a prediction based on the input features."""
    prediction = model.predict([data.features])
    return {"prediction": int(prediction[0])}

@app.get("/")
def read_root():
    return {"status": "ML model server is running."}
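
With the server running, predictions can be requested over HTTP. The feature values below are placeholders; their count must match the columns the model was trained on:

# Request a prediction (feature values are placeholders)
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"features": [0.1, 0.9, 1.0, 0.0]}'

# Example response shape: {"prediction": 0}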

Advanced Considerations and Next Steps

This pipeline provides a solid foundation, but a production-grade MLOps system involves more components.

  • Model Registry: For more robust model governance, integrate a model registry like MLflow or Weights & Biases. The CI pipeline would publish the model to the registry, which would then handle versioning, staging (e.g., "staging," "production"), and metadata storage. The CD pipeline would then pull a specific model version from the registry for deployment (a brief sketch follows this list).
  • Feature Stores: In complex environments, a centralized Feature Store (e.g., Feast, Tecton) ensures consistency in feature engineering between training and inference, mitigating online/offline skew.
  • Continuous Monitoring: Deployment is not the final step. Implement monitoring for both operational metrics (latency, error rate) and model performance metrics (concept drift, data drift). If performance degrades, it should trigger an alert or a retraining pipeline.
  • Sophisticated Deployment Strategies: Instead of a direct deployment, consider canary releases or A/B testing. The CD pipeline can be extended to deploy the new model to a small subset of traffic, compare its performance against the incumbent model, and automatically promote or roll back based on the results.
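
As a taste of the registry integration mentioned above, here is a hedged sketch using MLflow. It would slot into src/train.py after evaluation, reusing its model and accuracy variables; the tracking URI and registered model name are assumptions:

# Sketch: publish the trained model to an MLflow registry (URI and name are assumptions)
import mlflow
import mlflow.sklearn

mlflow.set_tracking_uri("http://your-mlflow-server:5000")  # hypothetical server
with mlflow.start_run():
    mlflow.log_metric("accuracy", accuracy)
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        registered_model_name="classifier",  # creates or updates a registry entry
    )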

Conclusion

By automating the machine learning lifecycle with a CI/CD pipeline, we transform model development from an artisanal craft into a disciplined, repeatable engineering process. Using GitHub Actions, we have demonstrated a practical, powerful, and accessible way to build this capability.

This approach significantly reduces the time-to-market for new models, improves collaboration between data science and engineering teams, and provides the necessary governance and reproducibility required for enterprise-grade ML systems. The framework presented here is not merely a theoretical exercise but a direct blueprint for building a more mature MLOps practice within your organization.
