Implementing CI/CD for Machine Learning Models with GitHub Actions

In modern software engineering, CI/CD pipelines are the bedrock of efficient and reliable software delivery. However, when the asset being delivered is a machine learning model, the traditional CI/CD paradigm falls short. The unique lifecycle of ML models—encompassing data validation, training, and performance monitoring—necessitates a specialized approach known as MLOps.

This article provides a comprehensive, actionable guide for CTOs and senior engineers on architecting and implementing a robust CI/CD pipeline for machine learning models using GitHub Actions. We will move beyond theory to provide concrete code examples and architectural patterns that you can implement directly. The focus will be on automating the entire process from code commit to a production-ready, containerized model endpoint.

Architectural Overview: The MLOps CI/CD Pipeline

The core challenge in MLOps is managing the trifecta of code, data, and model as a single, cohesive unit. Our pipeline must automate processes that are often manual and error-prone, ensuring reproducibility and governance.

The architecture we will build is triggered by a git push to the main branch and consists of two primary stages orchestrated by GitHub Actions:

  1. Continuous Integration (CI) & Continuous Training (CT): This stage is responsible for validating the entire ML system. It includes:
    • Code Validation: Linting and running unit tests on the model's source code (e.g., data preprocessing, feature engineering).
    • Data Validation: Ensuring the training data conforms to an expected schema and distribution (a minimal sketch of such a check follows this overview).
    • Model Training & Evaluation: Training the model on a versioned dataset and evaluating its performance against predefined metrics (e.g., accuracy, F1-score). If performance meets the threshold, the trained model is versioned and saved.
    • Artifact Generation: The serialized model, performance metrics, and any other required assets are packaged as build artifacts.
  2. Continuous Delivery (CD): This stage focuses on packaging and deploying the validated model.
    • Containerization: Building a Docker image containing the model and a serving layer (e.g., a FastAPI application).
    • Image Publication: Pushing the versioned Docker image to a container registry (e.g., GitHub Container Registry, Docker Hub, AWS ECR).
    • Deployment: Automatically deploying the container to a target environment, such as a Kubernetes cluster or a cloud-based container service.

This separation ensures that only models that pass rigorous automated checks are considered for deployment, minimizing the risk of introducing regressions into production.
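
To make the data-validation stage concrete before we dive in, here is a minimal sketch of the kind of pre-training check the CI stage could run. The file path matches the repository layout below, but the schema (feature_1, feature_2, target) is an illustrative assumption:

# src/validate_data.py (sketch; the schema is an assumption, adapt to your data)
import pandas as pd

EXPECTED_COLUMNS = {"feature_1", "feature_2", "target"}  # hypothetical schema

def validate(path: str = "data/my_dataset.csv") -> None:
    df = pd.read_csv(path)
    # Schema check: every expected column must be present
    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"Dataset is missing columns: {missing}")
    # Basic sanity checks: non-empty data, no nulls in the target
    if df.empty:
        raise ValueError("Dataset is empty")
    if df["target"].isnull().any():
        raise ValueError("Target column contains null values")

if __name__ == "__main__":
    validate()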

Step-by-Step Implementation

Let's proceed with a practical implementation. We will use a simple Scikit-learn model, but the principles are directly applicable to more complex models built with TensorFlow, PyTorch, or other frameworks.

Repository Structure

A well-organized repository is critical for a manageable MLOps project.

/
├── .github/
│   └── workflows/
│       └── ci-cd.yml         # GitHub Actions workflow definition
├── data/                     # Raw data (managed by DVC)
├── models/                   # Output for serialized models
├── src/
│   ├── __init__.py
│   ├── train.py              # Model training and evaluation script
│   └── serve.py              # API server for model inference
├── tests/
│   └── test_model.py         # Unit tests for our code
├── requirements.txt          # Python dependencies
└── Dockerfile                # Docker definition for the serving image

Data and Model Versioning with DVC

Managing large datasets and model files directly in Git is impractical. We will leverage Data Version Control (DVC) to handle this. DVC stores pointers to data/models in Git while the actual files are stored in remote object storage (like S3, GCS, or even a shared network drive).

To initialize DVC and track our data:

# Install DVC
pip install dvc dvc-s3

# Initialize DVC in your Git repository
dvc init

# Configure remote storage
dvc remote add -d my-remote s3://your-bucket-name/dvc-store

# Add and track data
dvc add data/my_dataset.csv
git add data/my_dataset.csv.dvc .dvc/config
git commit -m "feat: Add initial dataset with DVC"
dvc push

This process ensures that every Git commit has a corresponding, immutable version of the data it was trained on.
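
Reproducing any historical state is then a two-command operation, sketched below; <commit-sha> is a placeholder:

# Restore the code and the DVC pointer files from a past commit
git checkout <commit-sha>

# Materialize the exact data version those pointers reference
# (use `dvc pull` instead if the data is not in the local cache)
dvc checkout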

The CI/CD Workflow: ci-cd.yml

This file is the heart of our automation. It defines the jobs, steps, and triggers for our pipeline. Place it in .github/workflows/ci-cd.yml.

name: MLOps CI/CD Pipeline

on:
  push:
    branches:
      - main

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Repository
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'

      - name: Install Dependencies
        run: |
          pip install -r requirements.txt
          pip install "dvc[s3]" # Install DVC with S3 support (quoted so the shell does not glob the brackets)

      - name: Authenticate with AWS for DVC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1

      - name: Pull Data with DVC
        run: dvc pull

      - name: Run Unit Tests
        run: pytest tests/

      - name: Train and Evaluate Model
        id: train
        run: python src/train.py

      - name: Upload Model Artifact
        uses: actions/upload-artifact@v4
        with:
          name: model
          path: |
            models/classifier.joblib
            metrics.json

  deploy:
    needs: build-and-test
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Repository
        uses: actions/checkout@v4

      - name: Download Model Artifact
        uses: actions/download-artifact@v4
        with:
          name: model
          path: ./

      # actions/download-artifact@v4 extracts a named artifact directly into
      # the given path and preserves the structure it was uploaded with, so
      # models/classifier.joblib and metrics.json land in place -- no moves needed.

      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and Push Docker Image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ghcr.io/${{ github.repository }}:${{ github.sha }}

      # Deployment step would go here. Example for a generic script execution.
      # - name: Deploy to Production
      #   run: ./scripts/deploy.sh ghcr.io/${{ github.repository }}:${{ github.sha }}

Key Points in the Workflow:

  • Secrets: We use secrets.AWS_ACCESS_KEY_ID and secrets.GITHUB_TOKEN for secure authentication. These must be configured in your repository's Settings > Secrets and variables > Actions (or from the command line, as sketched after this list).
  • Dependencies: The deploy job runs only after the build-and-test job succeeds (needs: build-and-test), ensuring we only deploy validated models.
  • Artifacts: actions/upload-artifact and actions/download-artifact are crucial for passing the trained model between jobs, as each job runs in a clean, isolated environment.
  • Image Tagging: We tag the Docker image with the Git commit SHA (${{ github.sha }}). This provides explicit traceability from a deployed container image back to the exact code and data version that produced it.
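
For reference, the required secrets can also be set from the command line with the GitHub CLI; the values below are placeholders:

# Set the AWS credentials consumed by the DVC steps (placeholder values)
gh secret set AWS_ACCESS_KEY_ID --body "AKIA..."
gh secret set AWS_SECRET_ACCESS_KEY --body "your-secret-key"

# GITHUB_TOKEN needs no setup; GitHub Actions injects it automatically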

Core Application Code

Model Training Script (src/train.py)

This script handles training, evaluation, and serialization.

# src/train.py
import os
import pandas as pd
import joblib
import json
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# 1. Load Data
df = pd.read_csv('data/my_dataset.csv')

# Dummy preprocessing
X = df.drop('target', axis=1)
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 2. Train Model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# 3. Evaluate Model
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy}")

# 4. Save Metrics
with open('metrics.json', 'w') as f:
    json.dump({'accuracy': accuracy}, f)

# Performance Gate: Fail the build if accuracy is below a threshold
MINIMUM_ACCURACY = 0.85
if accuracy < MINIMUM_ACCURACY:
    raise Exception(f"Model accuracy {accuracy} is below the threshold of {MINIMUM_ACCURACY}")

# 5. Serialize Model
os.makedirs('models', exist_ok=True)  # the directory may not exist on a fresh CI runner
joblib.dump(model, 'models/classifier.joblib')

print("Training complete. Model and metrics saved.")

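Unit Tests (tests/test_model.py)

The workflow's Run Unit Tests step expects a suite under tests/, which the repository layout lists but we have not yet shown. Here is a minimal sketch; the synthetic columns are illustrative, and a real suite would also cover your preprocessing and feature-engineering code:

# tests/test_model.py (a minimal sketch with an illustrative synthetic dataset)
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def test_model_trains_and_predicts():
    # Tiny synthetic dataset shaped like data/my_dataset.csv (illustrative)
    df = pd.DataFrame({
        "feature_1": [0.1, 0.9, 0.2, 0.8],
        "feature_2": [1.0, 0.0, 1.0, 0.0],
        "target": [0, 1, 0, 1],
    })
    X, y = df.drop("target", axis=1), df["target"]
    model = RandomForestClassifier(n_estimators=10, random_state=42)
    model.fit(X, y)
    predictions = model.predict(X)
    assert len(predictions) == len(y)
    assert set(predictions).issubset({0, 1})
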
Dockerfile for Model Serving

This Dockerfile creates a container image with our FastAPI application and the trained model.

# Dockerfile
FROM python:3.9-slim

WORKDIR /app

# Copy dependencies and install them
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application and model
COPY ./src /app/src
COPY ./models /app/models

# Expose port and define command
EXPOSE 8000
CMD ["uvicorn", "src.serve:app", "--host", "0.0.0.0", "--port", "8000"]

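Before relying on the pipeline, the image can be smoke-tested locally, assuming a trained model already exists in models/ (the image tag is an arbitrary local name):

# Build the serving image and run it locally
docker build -t ml-model-server .
docker run --rm -p 8000:8000 ml-model-server
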
API Server (src/serve.py)

A simple FastAPI application to load the model and serve predictions.

# src/serve.py
import joblib
from fastapi import FastAPI
from pydantic import BaseModel
from typing import List

app = FastAPI(title="ML Model Server")

# Define the input data schema
class ModelInput(BaseModel):
    features: List[float]

# Load the trained model from the file system
model = joblib.load("models/classifier.joblib")

@app.post("/predict")
def predict(data: ModelInput):
    """Makes a prediction based on the input features."""
    prediction = model.predict([data.features])
    return {"prediction": int(prediction[0])}

@app.get("/")
def read_root():
    return {"status": "ML model server is running."}
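
With the server running, predictions can be requested over HTTP. The feature values below are placeholders; their count must match the columns the model was trained on:

# Request a prediction (feature values are placeholders)
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"features": [0.1, 0.9, 1.0, 0.0]}'

# Example response shape: {"prediction": 0}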

Advanced Considerations and Next Steps

This pipeline provides a solid foundation, but a production-grade MLOps system involves more components.

  • Model Registry: For more robust model governance, integrate a model registry like MLflow or Weights & Biases. The CI pipeline would publish the model to the registry, which would then handle versioning, staging (e.g., "staging," "production"), and metadata storage. The CD pipeline would then pull a specific model version from the registry for deployment (a brief sketch follows this list).
  • Feature Stores: In complex environments, a centralized Feature Store (e.g., Feast, Tecton) ensures consistency in feature engineering between training and inference, mitigating online/offline skew.
  • Continuous Monitoring: Deployment is not the final step. Implement monitoring for both operational metrics (latency, error rate) and model performance metrics (concept drift, data drift). If performance degrades, it should trigger an alert or a retraining pipeline.
  • Sophisticated Deployment Strategies: Instead of a direct deployment, consider canary releases or A/B testing. The CD pipeline can be extended to deploy the new model to a small subset of traffic, compare its performance against the incumbent model, and automatically promote or roll back based on the results.
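
As a taste of the registry integration mentioned above, here is a hedged sketch using MLflow. It would slot into src/train.py after evaluation, reusing its model and accuracy variables; the tracking URI and registered model name are assumptions:

# Sketch: publish the trained model to an MLflow registry (URI and name are assumptions)
import mlflow
import mlflow.sklearn

mlflow.set_tracking_uri("http://your-mlflow-server:5000")  # hypothetical server
with mlflow.start_run():
    mlflow.log_metric("accuracy", accuracy)
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        registered_model_name="classifier",  # creates or updates a registry entry
    )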

Conclusion

By automating the machine learning lifecycle with a CI/CD pipeline, we transform model development from an artisanal craft into a disciplined, repeatable engineering process. Using GitHub Actions, we have demonstrated a practical, powerful, and accessible way to build this capability.

This approach significantly reduces the time-to-market for new models, improves collaboration between data science and engineering teams, and provides the necessary governance and reproducibility required for enterprise-grade ML systems. The framework presented here is not merely a theoretical exercise but a direct blueprint for building a more mature MLOps practice within your organization.
