How to Implement a Blue-Green Deployment Strategy on AWS


Blue-Green deployment is a release strategy that minimizes downtime and reduces risk by running two identical production environments, referred to as "Blue" and "Green." At any given time, only one of the environments is live, serving all production traffic.

This article provides a detailed, actionable guide for implementing Blue-Green deployments on AWS, focusing on architectural patterns, automation, and the critical challenge of managing stateful components.

Architectural Prerequisites

Before implementing a Blue-Green strategy, your architecture must adhere to several core principles. Failure to address these prerequisites will negate the benefits and introduce significant operational complexity.

  1. Immutable Infrastructure: The cornerstone of predictable deployments. Both Blue and Green environments should be built from identical, version-controlled templates (e.g., AMIs, Docker images, CloudFormation/Terraform templates). The Green environment is never modified in-place; it is created fresh from the new artifact, tested, and then promoted. The old Blue environment is terminated, not updated.
  2. Stateless Application Tier: Your application servers must be stateless. Any session data, user state, or temporary files must be externalized to a distributed cache (like ElastiCache for Redis) or a shared data store (like S3 or DynamoDB). This ensures that traffic can be shifted between environments without any loss of user context.
  3. Comprehensive Automation: The entire process—provisioning infrastructure, deploying the application, running tests, and switching traffic—must be fully automated. Manual steps introduce the potential for human error, which this strategy is designed to eliminate. AWS CodePipeline, CodeBuild, and CodeDeploy are the primary tools for this orchestration.
  4. Robust Health Checks and Monitoring: You must have reliable health checks that validate not just instance health (HTTP 200 OK) but also application functionality and dependencies. These checks are the gatekeepers that determine whether the Green environment is ready to receive production traffic.
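These prerequisites translate directly into configuration. As a sketch of the fourth point, an ALB target group's health check can be pointed at a real application endpoint rather than a bare port probe; the target group ARN and the /healthz path below are placeholder assumptions, not values from this article:

```shell
# Sketch: tighten a target group's health check so it exercises a real
# application endpoint instead of just confirming the port is open.
# The ARN and the /healthz path are hypothetical placeholders.
aws elbv2 modify-target-group \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/target-green/abcdef1234567890 \
  --health-check-path /healthz \
  --health-check-interval-seconds 15 \
  --healthy-threshold-count 3 \
  --unhealthy-threshold-count 2
```

If the /healthz handler also verifies critical dependencies (database connectivity, cache reachability), the load balancer itself becomes the gatekeeper described above.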

Core Implementation Patterns on AWS

There are two primary methods for directing traffic between Blue and Green environments on AWS, each with distinct trade-offs.


1. DNS-Level Switching with Amazon Route 53

This pattern involves manipulating DNS records to redirect traffic from the Blue environment's endpoint to the Green environment's endpoint. It is conceptually simple but has notable drawbacks.

Architecture:

Each environment (Blue and Green) has its own independent stack, including an Application Load Balancer (ALB) and an Auto Scaling Group (ASG). Two weighted Route 53 record sets share the same name (e.g., api.example.com) under a Weighted Routing Policy, distinguished by SetIdentifier, with one record pointing to each ALB.

Execution Flow:

  1. Provision Green: A complete, parallel stack (VPC, ALB, ASG, EC2 instances) is provisioned for the new application version ("Green").
  2. Deploy & Test: The new application artifact is deployed to the Green instances. Automated smoke tests, integration tests, and health checks are executed against the Green ALB's direct DNS name (e.g., green-alb-12345.us-east-1.elb.amazonaws.com).
  3. Update DNS Weights: Initially, the Route 53 weighted record directs 100% of traffic to the Blue ALB and 0% to the Green ALB. To switch, you update the record set to shift the weights: 0% to Blue and 100% to Green.
  4. Monitor: Observe application metrics (latency, error rates) to confirm the health of the Green environment under full production load.
  5. Decommission Blue: After a predetermined "bake time" (e.g., one hour), the Blue environment is terminated.

AWS CLI Example (DNS Weight Shift):

Assume your hosted zone ID is Z0123456789ABCDEF and your domain is api.example.com. Save the following change batch as traffic-shift-to-green.json; applying it with the command below directs all traffic to the Green ALB (green-alb-dns-name).

{
  "Comment": "Switching production traffic to the Green environment",
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "api.example.com",
        "Type": "CNAME",
        "SetIdentifier": "blue-environment",
        "Weight": 0,
        "TTL": 60,
        "ResourceRecords": [{ "Value": "blue-alb-dns-name.us-east-1.elb.amazonaws.com" }]
      }
    },
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "api.example.com",
        "Type": "CNAME",
        "SetIdentifier": "green-environment",
        "Weight": 100,
        "TTL": 60,
        "ResourceRecords": [{ "Value": "green-alb-dns-name.us-east-1.elb.amazonaws.com" }]
      }
    }
  ]
}
# Execute the change
aws route53 change-resource-record-sets \
  --hosted-zone-id Z0123456789ABCDEF \
  --change-batch file://traffic-shift-to-green.json
  • Pros: Simple to understand; works across regions.
  • Cons: DNS caching. Clients and resolvers cache the old record for the duration of its Time-To-Live (TTL), and some ignore the TTL entirely. Even with a low TTL (e.g., 60 seconds), the traffic shift is neither instantaneous nor deterministic: some users may continue hitting the Blue environment long after the switch, so Blue must remain healthy well beyond the cutover and cannot be torn down immediately.

2. Load Balancer-Level Switching with AWS CodeDeploy and ALB Target Groups

This is the recommended and more robust pattern for most use cases. It provides near-instantaneous, precise traffic control by manipulating the Application Load Balancer's listener rules, avoiding DNS propagation delays entirely.

Architecture:

A single ALB is used with two distinct Target Groups: target-blue and target-green. The ALB's production listener initially forwards all traffic to target-blue, and the deployment process provisions a new fleet of instances as the Green side. Mechanically, the cutover differs by compute platform: for ECS deployments, CodeDeploy retargets the listener from target-blue to target-green; for EC2 deployments, it registers the replacement fleet with the production target group and deregisters the original instances. The effect is the same: an immediate shift at the load balancer.
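The initial wiring for this architecture might look like the following sketch; the VPC ID, ALB ARN, service port, and health-check path are all placeholder assumptions:

```shell
# Illustrative one-time setup (all IDs and ARNs are placeholders):
# two target groups behind one ALB, with the production listener
# initially forwarding everything to target-blue.
aws elbv2 create-target-group \
  --name target-blue --protocol HTTP --port 80 \
  --vpc-id vpc-0123456789abcdef0 --health-check-path /healthz

aws elbv2 create-target-group \
  --name target-green --protocol HTTP --port 80 \
  --vpc-id vpc-0123456789abcdef0 --health-check-path /healthz

aws elbv2 create-listener \
  --load-balancer-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/app/my-alb/1234567890abcdef \
  --protocol HTTP --port 80 \
  --default-actions Type=forward,TargetGroupArn=arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/target-blue/0123456789abcdef
```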

Execution Flow (Automated by AWS CodeDeploy):

  1. Deployment Start: A new deployment is initiated in CodeDeploy, targeting a Deployment Group configured for Blue/Green.
  2. Provision Green: CodeDeploy provisions a new set of EC2 instances (the "replacement" or "Green" fleet) based on your Auto Scaling Group launch template.
  3. Install & Test: The appspec.yml file orchestrates the deployment on these new instances.
    • BeforeInstall, Install, AfterInstall: The new application revision is downloaded and installed.
    • ApplicationStart, ValidateService: The application is started, and health checks are run.
    • Traffic Hook (BeforeAllowTraffic): This is a crucial step. Before the Green fleet receives any production traffic, CodeDeploy runs your validation here: lifecycle event scripts on the replacement instances for EC2 deployments, or Lambda hook functions for ECS and Lambda deployments. This is the place to run comprehensive integration tests against the Green environment.
  4. Reroute Traffic: If the tests pass, CodeDeploy automatically shifts the load balancer to the Green fleet. The cutover is near-instantaneous, and in-flight requests to the old fleet are drained according to the target group's deregistration delay.
  5. Keep Original (Bake Time): The old instances in target-blue are kept running for a configured period. This allows for an immediate rollback if issues are detected post-deployment.
  6. Terminate Blue: After the bake time expires, CodeDeploy terminates the old instances.
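Steps 5 and 6 (bake time and termination) map directly onto the deployment group's Blue/Green configuration. A hedged sketch of creating such a group with the AWS CLI; the application name, role ARN, and Auto Scaling Group name are placeholders:

```shell
# Sketch: a CodeDeploy deployment group configured for Blue/Green on EC2.
# terminationWaitTimeInMinutes=60 implements the one-hour bake time, and
# COPY_AUTO_SCALING_GROUP tells CodeDeploy to provision the Green fleet
# by copying the existing ASG. All names and ARNs are placeholders.
aws deploy create-deployment-group \
  --application-name my-app \
  --deployment-group-name my-app-blue-green \
  --service-role-arn arn:aws:iam::123456789012:role/CodeDeployServiceRole \
  --auto-scaling-groups my-app-asg \
  --deployment-style deploymentType=BLUE_GREEN,deploymentOption=WITH_TRAFFIC_CONTROL \
  --load-balancer-info 'targetGroupInfoList=[{name=target-blue}]' \
  --blue-green-deployment-configuration 'terminateBlueInstancesOnDeploymentSuccess={action=TERMINATE,terminationWaitTimeInMinutes=60},deploymentReadyOption={actionOnTimeout=CONTINUE_DEPLOYMENT},greenFleetProvisioningOption={action=COPY_AUTO_SCALING_GROUP}'
```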

Example appspec.yml for CodeDeploy:

This file defines the hooks that CodeDeploy executes during the deployment lifecycle.

version: 0.0
os: linux
files:
  - source: /
    destination: /var/www/html/my-app
hooks:
  BeforeInstall:
    - location: scripts/stop_server.sh
      timeout: 300
      runas: root
  ApplicationStart:
    - location: scripts/start_server.sh
      timeout: 300
      runas: root
  ValidateService:
    - location: scripts/validate_health.sh
      timeout: 60
      runas: root
  # Hooks for Blue/Green traffic control
  BeforeAllowTraffic:
    - location: scripts/run_integration_tests.sh
      timeout: 1800
  AfterAllowTraffic:
    - location: scripts/post_deployment_smoke_test.sh
      timeout: 300
  • Pros: Near-instantaneous traffic cutover with no DNS issues. Rollback is equally fast: CodeDeploy simply shifts traffic back to the original fleet. Integrates seamlessly with the AWS ecosystem (CodePipeline, ASG).
  • Cons: Slightly more complex initial setup. Confined to a single region.
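The scripts referenced in the hooks are ordinary shell scripts shipped with the revision. A minimal sketch of what scripts/validate_health.sh might contain, assuming (an assumption on my part, not a CodeDeploy requirement) that the application exposes a local /healthz endpoint:

```shell
# Minimal sketch of scripts/validate_health.sh (the script named in the
# appspec's ValidateService hook). It polls a local health endpoint; the
# /healthz path and the retry budget are assumptions, not mandated values.
wait_for_health() {
  local url="$1" retries="${2:-10}"
  local attempt
  for attempt in $(seq 1 "$retries"); do
    if curl -fsS --max-time 5 "$url" > /dev/null 2>&1; then
      echo "healthy after ${attempt} attempt(s)"
      return 0
    fi
    sleep 1
  done
  echo "service failed health validation" >&2
  return 1
}

# In the real script, the exit status becomes the hook's result:
# wait_for_health "http://localhost/healthz" || exit 1
```

A non-zero exit here fails the ValidateService hook and aborts the deployment before any traffic is allowed onto the Green fleet.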

The Database Challenge: Managing State

The most significant challenge in a Blue-Green deployment is managing the persistence layer. Stateless application tiers are simple to swap; databases are not.


Strategy A: Shared Database (For Backward-Compatible Changes)

The simplest approach is for both Blue and Green environments to share the same database.

  • Requirement: This mandates strict schema discipline. All database changes deployed with the Green environment must be backward-compatible. This typically means only additive changes are allowed (new tables, new columns with default values, new indexes). Destructive changes (dropping columns, renaming tables) must be deferred until all traffic has been off the Blue environment for a safe period.
  • Rollback: Rollback is straightforward. Since both versions of the application were writing to the same database, simply switching traffic back to the Blue environment works seamlessly.
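This expand-only discipline is easy to check locally. A toy demonstration with sqlite3 (the table and column names are invented for illustration): Green's additive migration does not disturb a query the Blue version still runs.

```shell
# Toy demonstration of a backward-compatible ("expand-only") change:
# Green's migration adds a column with a default, so Blue's old queries
# keep working against the same database. Names are illustrative.
db="$(mktemp)"
sqlite3 "$db" "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL);"
sqlite3 "$db" "INSERT INTO users (email) VALUES ('a@example.com');"

# Green deploys this migration (additive only):
sqlite3 "$db" "ALTER TABLE users ADD COLUMN display_name TEXT DEFAULT '';"

# Blue's unmodified query is unaffected by the new column:
sqlite3 "$db" "SELECT email FROM users;"   # -> a@example.com
```

A rename or drop, by contrast, would break Blue immediately, which is exactly why destructive changes are deferred to a later "contract" release.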

Strategy B: Blue-Green Database Deployment (For Breaking Changes)

For changes that are not backward-compatible, a more sophisticated approach is required. Amazon RDS Blue/Green Deployments is a managed feature that automates this process.

Execution Flow with RDS Blue/Green:

  1. Create Green DB: Using the AWS Console or CLI, you create a "Green" database deployment from your "Blue" (production) RDS instance. AWS builds a fully synchronized copy of your production database and keeps it in sync using logical replication.
  2. Deploy Green App: Provision your Green application environment and configure it to point to the Green database's endpoint.
  3. Test: Perform validation and testing on the fully isolated Green stack (app + DB). The Green DB continues to sync from the Blue DB via logical replication.
  4. Switchover: When ready, you initiate the switchover. RDS performs the following guarded actions:
    • Blocks writes on both Blue and Green databases.
    • Waits for the Green database to fully catch up with the last transactions from the Blue.
    • Promotes the Green DB to be the new primary, renaming the database endpoints so the application's connection string does not need to change.
    • Retains the old Blue environment under renamed identifiers (an -old suffix) so it remains available for inspection after the switchover, though it no longer receives writes.
  5. Traffic Cutover: Simultaneously, you perform the application traffic switch (e.g., via ALB target group swap).

This approach provides complete isolation and allows for complex schema changes but introduces a brief write-interruption during the final switchover (typically under one minute).
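The managed flow above reduces to two CLI calls. A hedged sketch: the deployment name, source ARN, and returned identifier are placeholders, and the switchover timeout is an illustrative value.

```shell
# Step 1: create the Green environment from the production ("Blue")
# instance. The name and source ARN are placeholders.
aws rds create-blue-green-deployment \
  --blue-green-deployment-name my-app-bg \
  --source arn:aws:rds:us-east-1:123456789012:db:prod-db

# Step 4: when testing is complete, initiate the switchover. The timeout
# (in seconds) aborts the switchover if the guardrail checks cannot
# complete in time, leaving Blue as the primary.
aws rds switchover-blue-green-deployment \
  --blue-green-deployment-identifier bgd-EXAMPLE1234567890 \
  --switchover-timeout 300
```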

Conclusion

Implementing a Blue-Green deployment strategy on AWS is a powerful technique for achieving near-zero-downtime releases and mitigating deployment risk. For most applications, the ALB Target Group Swapping method orchestrated by AWS CodeDeploy offers the most precise and reliable control over traffic. DNS-based switching remains a viable, simpler alternative for services where DNS propagation delays are acceptable.

Success, however, is less about the specific AWS service and more about architectural discipline. Embracing immutable infrastructure, ensuring application statelessness, and rigorously automating the process are non-negotiable prerequisites. For stateful systems, the database becomes the central challenge, requiring a deliberate strategy for managing schema evolution—either through strict backward compatibility or by leveraging advanced features like RDS Blue/Green Deployments for a fully isolated transition.
