How to Use Infrastructure as Code (IaC) to Manage Your Cloud Resources
In modern cloud-native environments, manual infrastructure management—colloquially known as "click-ops"—is an organizational liability. It is brittle, impossible to audit, prone to human error, and creates insidious configuration drift. For engineering leaders, the objective is clear: infrastructure must be managed with the same rigor, testability, and repeatability as the applications that run on it.
This is the central premise of Infrastructure as Code (IaC).
IaC is the practice of managing and provisioning infrastructure (networks, virtual machines, load balancers, databases) through machine-readable definition files, rather than physical hardware configuration or interactive configuration tools. This article provides a technical deep-dive into how to implement IaC effectively, moving beyond introductory concepts to discuss architectural patterns, state management, CI/CD integration, and advanced challenges relevant to CTOs and senior engineers.
Product Engineering Services
Work with our in-house Project Managers, Software Engineers and QA Testers to build your new custom software product or to support your current workflow, following Agile, DevOps and Lean methodologies.
Core Decision: Declarative vs. Imperative IaC
Your first architectural decision is the paradigm of your IaC.
- Imperative (Procedural): You write scripts that define the steps to achieve a desired state. (e.g., "Create a VM," "Check if S3 bucket exists," "If not, create bucket," "Set policy"). Tools like shell scripts using the AWS CLI or the AWS SDK fall into this category.
- Problem: These scripts are not inherently idempotent. Running one twice may fail or create duplicate resources. They grow increasingly complex because you must manually code for every possible current state.
- Declarative (Functional): You define the desired end state of your infrastructure. (e.g., "I require one t3.medium EC2 instance with this AMI and two S3 buckets with these policies"). The IaC tool is responsible for calculating the differential (the "plan") and executing the necessary API calls to reconcile the real-world state with your defined state.
- Benefit: This is inherently idempotent. Running the definition 100 times will result in the same end state, with the tool making no changes after the first successful application.
Verdict: A production-grade strategy must be declarative. The most mature and widely adopted tools in this space are Terraform, AWS CloudFormation/CDK, and Pulumi.
Tooling Architecture: Key Trade-offs
Your choice of tool dictates your workflow, multi-cloud capabilities, and the skillset required of your team.
| Tool | Language | State Management | Key Pro | Key Con |
| --- | --- | --- | --- | --- |
| Terraform | HCL (HashiCorp Configuration Language) | Self-managed (e.g., S3 + DynamoDB) | Cloud-agnostic: best-in-class provider ecosystem for AWS, GCP, Azure, etc. | State management is a critical, self-managed component. |
| AWS CDK | TypeScript, Python, Go, etc. | Managed by AWS (via CloudFormation) | General-purpose language: use loops, classes, logic. Deep AWS integration. | AWS-only: no multi-cloud capability. |
| Pulumi | TypeScript, Python, Go, etc. | Managed by Pulumi Service (default) or self-hosted | General-purpose language + cloud-agnostic. Can use software engineering patterns (unit tests, classes). | Newer, smaller community than Terraform. |
| CloudFormation | YAML / JSON | Managed by AWS | Atomic, transactional deployments with rollbacks (Change Sets). | Extremely verbose and difficult to author manually. (Often used as the target for CDK, not authored directly.) |
Example: Defining an S3 Bucket
Observe the difference in authoring experience.
Terraform (HCL):
Concise, purpose-built, and declarative.
resource "aws_s3_bucket" "artifacts" {
bucket = "my-prod-app-artifacts"
tags = {
Environment = "Production"
ManagedBy = "Terraform"
}
}
resource "aws_s3_bucket_public_access_block" "artifacts_access" {
bucket = aws_s3_bucket.artifacts.id
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
AWS CDK (TypeScript):
Uses a general-purpose language, which appeals to software engineers. This code synthesizes into a verbose CloudFormation YAML template.
import * as s3 from 'aws-cdk-lib/aws-s3';
import { Construct } from 'constructs';
import { Stack, StackProps, RemovalPolicy } from 'aws-cdk-lib';
export class ArtifactsStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    new s3.Bucket(this, 'ArtifactsBucket', {
      bucketName: 'my-prod-app-artifacts',
      publicReadAccess: false,
      blockPublicAccess: s3.BlockPublicAccess.BLOCK_ALL,
      removalPolicy: RemovalPolicy.RETAIN, // Production safety
      versioned: true,
      encryption: s3.BucketEncryption.S3_MANAGED,
    });
  }
}
CTO's Takeaway:
- For multi-cloud environments, or to enforce a single, unified workflow across vendors, Terraform and Pulumi are the clear choices.
- For an AWS-only shop that wants to empower developers to use familiar languages, the AWS CDK is a powerful, first-party solution.
Practical Implementation: A Production-Grade IaC Workflow
This is the most critical section. A tool is useless without a robust, safe, and automated workflow. We will use Terraform for these examples due to its cloud-agnostic prevalence.
Step 1: Secure Remote State Management
The Terraform state file is a JSON file that maps your code definitions to real-world resource IDs.
- It is the single source of truth.
- It often contains sensitive data.
- It must be shared by all team members and CI/CD systems.
- It must be locked to prevent concurrent, conflicting apply operations.
NEVER commit terraform.tfstate to Git. NEVER manage it locally on your laptop.
Solution: Use a remote backend with locking. For AWS, the standard is S3 for storage and DynamoDB for locking.
File: backend.tf
terraform {
  backend "s3" {
    bucket         = "my-company-terraform-state-prod"
    key            = "global/s3/terraform.tfstate" // Unique key per project/env
    region         = "us-east-1"
    dynamodb_table = "terraform-state-lock-prod"
    encrypt        = true
  }
}
This configuration must be bootstrapped: you must create the S3 bucket and DynamoDB table before you can run Terraform (this one-time "chicken-and-egg" problem can be solved with a simple CLI command or a separate, minimal IaC definition).
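The "separate, minimal IaC definition" route can be a tiny Terraform configuration applied once with local state, before the remote backend exists. A sketch, assuming the bucket and table names used in backend.tf above:

```hcl
# Bootstrap config: apply once with *local* state (terraform init && terraform apply)
# before any remote backend exists. Names are placeholders -- match your backend.tf.
resource "aws_s3_bucket" "tf_state" {
  bucket = "my-company-terraform-state-prod"
}

# Versioning lets you recover earlier state files after a bad write.
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_dynamodb_table" "tf_lock" {
  name         = "terraform-state-lock-prod"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID" # The attribute name Terraform's S3 backend expects
  attribute {
    name = "LockID"
    type = "S"
  }
}
```

After this applies, subsequent projects simply point their backend block at the bucket and table.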
Step 2: A Modular, Environment-Driven Repository Structure
Do not put all resources for all environments in one giant main.tf file. This is unmaintainable. The goal is to maximize code reuse and isolate environmental blast-radius.
Recommended Structure:
/terraform-infra
├── README.md
├── environments
│   ├── production
│   │   ├── main.tf            # Defines backend, providers, and calls modules
│   │   ├── outputs.tf
│   │   └── terraform.tfvars   # Prod-specific variables (e.g., instance_count = 10)
│   └── staging
│       ├── main.tf
│       ├── outputs.tf
│       └── terraform.tfvars   # Staging-specific (e.g., instance_count = 1)
│
└── modules
    ├── vpc
    │   ├── main.tf            # Defines VPC, subnets, NAT gateways...
    │   ├── variables.tf       # Input variables (e.g., vpc_cidr_block)
    │   └── outputs.tf         # Output variables (e.g., vpc_id, private_subnet_ids)
    ├── ecs_service
    │   ├── main.tf            # Defines ECS service, task def, LB...
    │   ├── variables.tf
    │   └── outputs.tf
    └── rds_instance
        ├── main.tf
        ├── variables.tf
        └── outputs.tf
environments/production/main.tf:
This file composes modules, creating the actual infrastructure.
provider "aws" {
region = "us-east-1"
}
# Load prod-specific variables
variable "instance_count" { type = number }
# Call the reusable VPC module
module "vpc" {
source = "../../modules/vpc" // Use the local module
vpc_cidr_block = "10.0.0.0/16"
env = "production"
}
# Call the reusable ECS service module
module "app_service" {
source = "../../modules/ecs_service"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnet_ids
instance_count = var.instance_count // From terraform.tfvars
docker_image = "my-app:1.2.5-prod"
}
This pattern provides isolation (staging and prod have different state files) and reusability (the vpc module is defined once and used by all environments).
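For this composition to work, each module must declare an explicit interface. A minimal sketch of what modules/vpc/variables.tf and outputs.tf might contain (the internal resource names aws_vpc.main and aws_subnet.private are illustrative assumptions):

```hcl
# modules/vpc/variables.tf -- the module's inputs
variable "vpc_cidr_block" {
  type        = string
  description = "CIDR range for the VPC"
}

variable "env" {
  type        = string
  description = "Environment name, used for tagging"
}

# modules/vpc/outputs.tf -- values exposed to callers
output "vpc_id" {
  value = aws_vpc.main.id
}

output "private_subnet_ids" {
  value = aws_subnet.private[*].id
}
```

Keeping inputs and outputs explicit is what makes a module safe to reuse: callers depend only on this contract, never on the module's internals.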
Step 3: The GitOps CI/CD Pipeline
All infrastructure changes must go through a pull request (PR) and CI/CD. No exceptions.
Workflow:
1. Branch: Engineer creates a feature branch (e.g., feat/add-redis-cache).
2. Code: Engineer adds a new module call (e.g., module "redis" { ... }) to the staging environment.
3. Commit/Push: Engineer pushes the branch.
4. Pull Request: Engineer opens a PR against the main or develop branch.
5. CI Pipeline (on PR): This is the automated safety net.
   - Lint & Format: terraform fmt -check
   - Static Analysis: tfsec . or checkov -d . (finds security risks like public S3 buckets or unencrypted disks)
   - Initialize: terraform init (in the environments/staging directory)
   - Validate: terraform validate
   - Plan: terraform plan -out=tfplan
   - Comment: The CI bot posts the text output of the plan directly to the PR.
6. Human Review: A Senior Engineer or CTO reviews the PR. This is the most critical step. The reviewer's job is to read the plan output to see exactly what Terraform will Create, Change, or Destroy.
7. Merge (Auto-Apply): Once the PR is approved and merged, a separate pipeline job runs.
   - Apply: terraform apply "tfplan" (applies the exact plan that was reviewed).
Example (GitHub Actions):
.github/workflows/terraform-pr.yml
name: 'Terraform PR Plan'

on: [pull_request]

jobs:
  terraform:
    name: 'Terraform Plan'
    runs-on: ubuntu-latest

    # Run all steps from the staging environment directory
    defaults:
      run:
        working-directory: ./environments/staging

    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2

      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1

      - name: Terraform Init
        run: terraform init

      - name: Terraform Format
        run: terraform fmt -check

      - name: Terraform Validate
        run: terraform validate

      - name: Terraform Plan
        id: plan
        run: terraform plan -no-color -out=tfplan
        # Continue on error so plan failure is visible in PR
        continue-on-error: true

      # This part would typically use a GitHub App or action to post to the PR
      - name: Post Plan to PR
        if: steps.plan.outcome == 'failure'
        run: |
          echo "Terraform plan failed!"
          # (Add logic to post plan output)
          exit 1
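The apply side of the pipeline is a separate, merge-triggered workflow. A hedged sketch, assuming the same credential setup (the mechanics of passing the reviewed plan artifact between workflows are elided here; a production setup would apply that exact plan file rather than re-planning):

```yaml
# .github/workflows/terraform-apply.yml (illustrative)
name: 'Terraform Apply'

on:
  push:
    branches: [main]

jobs:
  terraform:
    name: 'Terraform Apply'
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: ./environments/staging
    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2

      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1

      - name: Terraform Init
        run: terraform init

      # Production-grade: download and apply the reviewed plan artifact
      # instead of re-planning with -auto-approve.
      - name: Terraform Apply
        run: terraform apply -auto-approve
```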
Step 4: Managing Secrets and Sensitive Data
DO NOT hardcode database passwords, API keys, or certificates in .tf files or .tfvars files.
Solution: Use a dedicated secrets manager. Your IaC code should provision the secret placeholder, and the value should be injected from a secure store.
Example (AWS Secrets Manager):
Your Terraform code provisions the secret definition, but not the value.
resource "aws_secretsmanager_secret" "rds_password" {
name = "prod/rds/master_password"
description = "Master password for the production RDS instance"
}
The secret value itself should be populated "out-of-band" (e.g., via the AWS console by a security officer, or a separate, highly restricted CI/CD job).
Your application's IaC (e.g., the ECS Task Definition) can then reference this secret by its ARN, injecting it securely at runtime.
# In your ecs_service module
resource "aws_ecs_task_definition" "app" {
  # ... other config ...

  container_definitions = jsonencode([{
    # ...
    secrets = [
      {
        name      = "DB_PASSWORD" # Env var in the container
        valueFrom = aws_secretsmanager_secret.rds_password.arn
      }
    ]
  }])
}
This decouples the provisioning of infrastructure from the management of sensitive data.
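One detail the snippet above glosses over: the ECS task execution role must be allowed to read the secret, or the container will fail to start. A sketch, where aws_iam_role.ecs_execution is an assumed name for your existing execution role:

```hcl
# Allow the ECS task execution role to fetch the secret at container start.
resource "aws_iam_role_policy" "read_rds_password" {
  name = "read-rds-password"
  role = aws_iam_role.ecs_execution.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["secretsmanager:GetSecretValue"]
      Resource = aws_secretsmanager_secret.rds_password.arn
    }]
  })
}
```

Scoping the policy to the single secret ARN, rather than a wildcard, keeps the blast radius of a compromised task minimal.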
Advanced Challenge: Managing Configuration Drift
Drift is when the real-world state of your infrastructure (what's in the AWS console) desynchronizes from the state defined in your IaC code. This is your worst enemy. It happens when an engineer makes a "quick fix" manually in the console ("I'll just open this security group port for a test...").
Solution:
- Prevention (Policy): Enforce strict, read-only IAM permissions for most engineers. All changes must go through the IaC PR process. This is a cultural and disciplinary challenge as much as a technical one.
- Detection (Automation): Run a scheduled CI job (e.g., nightly) that executes terraform plan against your production environment. If the plan is "dirty" (i.e., it proposes changes), drift has occurred. Send a high-priority alert to the engineering team.
- Remediation: The team's responsibility is to "pave over" the drift.
  - If the manual change was incorrect, simply re-running terraform apply will revert the infrastructure to match the code.
  - If the manual change was correct and desired, the engineer must update the Terraform code to match it, submit a PR, and get it approved before running apply (which will then show "no changes").
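The nightly detection job can reuse the same pipeline tooling. A sketch as a scheduled GitHub Actions workflow, using terraform plan's -detailed-exitcode flag (exit code 0 means no changes, 2 means the plan contains changes, i.e., drift):

```yaml
# .github/workflows/drift-detection.yml (illustrative)
name: 'Nightly Drift Detection'

on:
  schedule:
    - cron: '0 3 * * *' # Every night at 03:00 UTC

jobs:
  drift:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: ./environments/production
    steps:
      - uses: actions/checkout@v3
      - uses: hashicorp/setup-terraform@v2
      - uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1

      - run: terraform init

      # -detailed-exitcode: 0 = no changes, 1 = error, 2 = drift detected.
      # Any non-zero exit fails the job; wire job failure to your
      # alerting channel (Slack, PagerDuty, etc.).
      - name: Detect drift
        run: terraform plan -detailed-exitcode
```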
Conclusion
Infrastructure as Code is not an optional tool; it is a foundational component of a mature, scalable, and reliable engineering organization. By treating infrastructure with the same discipline as application code—versioning, modularizing, testing, and automating it through a CI/CD pipeline—you eliminate a massive class of potential errors and unlock significant development velocity.
For CTOs, the mandate is to move your organization from "click-ops" to a "GitOps" model. Start by inventorying your critical infrastructure, codifying one component at a time (e.g., your networking/VPC), and building an automated pipeline around it. The initial investment in process and tooling pays for itself immediately in stability, auditability, and speed.