How to Monitor and Log a Distributed System with the ELK Stack
In modern distributed architectures, visibility is no longer a luxury; it's a fundamental requirement for operational stability and performance engineering. The ephemeral nature of containers and the complexity of microservice interactions create an opaque environment where traditional monitoring tools fail. Centralized logging and observability are the solutions, and the ELK Stack (Elasticsearch, Logstash, Kibana) remains one of the most powerful, open-source platforms for achieving this.
This article provides a technical, actionable guide for CTOs and senior engineers to design and implement a robust logging pipeline using the ELK Stack. We will move beyond high-level concepts to focus on architectural decisions, practical configurations, and performance tuning at scale.
The Core Architecture: Components and Data Flow
The power of the ELK Stack lies in its modular design. Understanding the specific role of each component is critical to designing a scalable and resilient system. The modern incarnation of the stack is more accurately described as the "Elastic Stack," incorporating Beats as the primary data shipping mechanism.

- Elasticsearch: At its core, Elasticsearch is a distributed, RESTful search and analytics engine built on Apache Lucene. It stores, indexes, and makes vast quantities of structured and unstructured data searchable in near real-time (see the query sketch after this list). For our purposes, it's the database where all logs and metrics reside. Key architectural concepts include nodes (a single Elasticsearch instance, typically one per server), clusters (a collection of nodes), indices (a collection of documents with similar characteristics), and shards (the basic unit of scalability, allowing an index to be partitioned and distributed across nodes).
- Logstash: This is the server-side data processing pipeline. Its primary function is to ingest data from a multitude of sources simultaneously, transform it, and then send it to a "stash" like Elasticsearch. The pipeline is defined by three stages:
- Inputs: Where to pull data from (e.g., Beats, Kafka, TCP sockets).
- Filters: How to process the data. This is where the real power lies—parsing unstructured log text into structured fields (grok), enriching data (geoip), and manipulating fields (mutate).
- Outputs: Where to send the processed data (e.g., Elasticsearch, S3, another message queue).
- Kibana: This is the visualization layer. Kibana provides the web interface to explore, visualize, and build dashboards on top of the data stored in Elasticsearch. Its Discover, Lens, and Dashboard features allow engineers to move from raw logs to aggregated insights, identifying trends and anomalies.
- Beats: These are lightweight, single-purpose data shippers. They are installed on source machines to collect different types of data and forward them to either Logstash or directly to Elasticsearch. Filebeat tails log files, while Metricbeat collects system and service metrics. Using Beats is the standard practice, as it offloads the resource-intensive work from your application servers and provides reliability features like backpressure handling.
The canonical data flow is Beats → Logstash → Elasticsearch, with Kibana querying Elasticsearch for visualization. For high-throughput environments, a message queue is inserted for resilience: Beats → Kafka/Redis → Logstash → Elasticsearch.
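To make Elasticsearch's role concrete: once logs are indexed, ad-hoc questions become a single REST call. The sketch below is illustrative; the microservice-logs-* index pattern and the http.response.status_code field come from the pipeline we build later in this article. It pulls the ten most recent 5xx responses across all services:

# Ten most recent server errors across all services
curl -s "http://localhost:9200/microservice-logs-*/_search?pretty" \
  -H 'Content-Type: application/json' -d '{
    "size": 10,
    "sort": [ { "@timestamp": "desc" } ],
    "query": { "range": { "http.response.status_code": { "gte": 500 } } }
  }'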
Practical Implementation: A Dockerized ELK Stack
To demonstrate a functional setup, we will use Docker Compose to orchestrate the core components. This provides a repeatable environment for development and testing.
Step 1: Orchestrate the Stack with Docker Compose
Create a docker-compose.yml file. This configuration launches Elasticsearch, Logstash, and Kibana, linking them on a dedicated network.
version: '3.8'

services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.9.2
    container_name: elasticsearch
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false # Disable for demo purposes ONLY
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ports:
      - "9200:9200"
    networks:
      - elk-net

  logstash:
    image: docker.elastic.co/logstash/logstash:8.9.2
    container_name: logstash
    volumes:
      - ./logstash/pipeline:/usr/share/logstash/pipeline/
    ports:
      - "5044:5044"
    depends_on:
      - elasticsearch
    networks:
      - elk-net

  kibana:
    image: docker.elastic.co/kibana/kibana:8.9.2
    container_name: kibana
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch
    networks:
      - elk-net

networks:
  elk-net:
    driver: bridge
Note: The xpack.security.enabled=false setting is for simplicity. In a production environment, you must enable security features like TLS encryption and role-based access control.
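As a rough sketch (illustrative only, not a complete hardening guide), enabling basic security on the Elasticsearch service looks like the excerpt below. TLS additionally requires generating certificates and mounting them into the containers, and Kibana and Logstash then need their own credentials (e.g., the built-in kibana_system service account for Kibana and a dedicated writer role for Logstash):

# Excerpt of the elasticsearch service in docker-compose.yml
  elasticsearch:
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=true
      - ELASTIC_PASSWORD=${ELASTIC_PASSWORD}       # bootstrap password for the built-in elastic user
      # - xpack.security.http.ssl.enabled=true     # enable once certificates are mounted
      # - xpack.security.http.ssl.keystore.path=certs/http.p12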
Step 2: Configure the Logstash Pipeline
Logstash is inert without a pipeline configuration. Create a directory logstash/pipeline/ and add a logstash.conf file. This pipeline listens for Beats traffic on port 5044, parses incoming log messages (JSON payloads and NGINX access-log lines), and outputs the structured events to Elasticsearch.
# ./logstash/pipeline/logstash.conf
input {
  beats {
    port => 5044
  }
}

filter {
  # Example: If logs are JSON-encoded strings, parse them into a "log" object
  json {
    source => "message"
    target => "log"
  }

  # Grok filter for a common NGINX access log format
  # Sample: 127.0.0.1 - - [19/Oct/2025:14:09:58 +0000] "GET /api/v1/users HTTP/1.1" 200 512 "-" "Go-http-client/1.1"
  grok {
    match => { "message" => "%{IPORHOST:[client][ip]} %{USER:[user][name]} %{USER:[user][auth]} \[%{HTTPDATE:timestamp}\] \"%{WORD:[http][request][method]} %{DATA:[url][path]} HTTP/%{NUMBER:[http][version]}\" %{NUMBER:[http][response][status_code]:int} %{NUMBER:[http][response][body][bytes]:int} \"%{DATA:[http][request][referrer]}\" \"%{DATA:[user_agent][original]}\"" }
  }

  # GeoIP enrichment based on the parsed client IP (only resolves public addresses)
  geoip {
    source => "[client][ip]"
    target => "[client][geo]"
  }

  # Parse the NGINX timestamp into the event's @timestamp field
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    remove_field => [ "timestamp" ]
  }
}

output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    index => "microservice-logs-%{+YYYY.MM.dd}"
  }
  # For debugging during setup
  stdout { codec => rubydebug }
}
This configuration demonstrates three powerful filters: json for structured logs, grok for unstructured text, and geoip for data enrichment. The output uses a daily rolling index pattern (microservice-logs-%{+YYYY.MM.dd}), which is crucial for managing data retention.
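As a concrete illustration, the sample NGINX line in the comment above comes out of the filter block as a structured document roughly like this (abridged; geoip adds location fields only for public IP addresses, so the loopback address in the sample is not enriched):

{
  "@timestamp": "2025-10-19T14:09:58.000Z",
  "client":     { "ip": "127.0.0.1" },
  "http":       { "version": "1.1",
                  "request":  { "method": "GET", "referrer": "-" },
                  "response": { "status_code": 200, "body": { "bytes": 512 } } },
  "url":        { "path": "/api/v1/users" },
  "user_agent": { "original": "Go-http-client/1.1" },
  "message":    "127.0.0.1 - - [19/Oct/2025:14:09:58 +0000] \"GET /api/v1/users HTTP/1.1\" 200 512 \"-\" \"Go-http-client/1.1\""
}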
Step 3: Configure the Data Shipper (Filebeat)
On your application servers, install and configure Filebeat. The filebeat.yml file specifies which log files to monitor and where to send the data.
# filebeat.yml on an application server
filebeat.inputs:
  - type: filestream
    id: my-app-logs
    enabled: true
    paths:
      - /var/log/my-app/*.log
    # Handle multi-line stack traces (e.g., from Java, Python):
    # lines that begin with whitespace are appended to the preceding event
    parsers:
      - multiline:
          type: pattern
          pattern: '^[[:space:]]'
          negate: false
          match: after

# Send data to our Logstash container
output.logstash:
  hosts: ["your-logstash-host:5044"]
This configuration tails all .log files in /var/log/my-app/, correctly groups multi-line stack traces into a single event, and sends them to the Logstash instance we configured.
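With Filebeat running, you can verify the pipeline end to end against the Dockerized stack from Step 1 (hostnames and ports assume the defaults used above):

# Check that a daily index exists and that documents are arriving
curl -s "http://localhost:9200/_cat/indices/microservice-logs-*?v"
curl -s "http://localhost:9200/microservice-logs-*/_count?pretty"

From there, create a data view for microservice-logs-* in Kibana's Stack Management and explore the events in Discover.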
Advanced Architecture for Scale and Resilience
The basic setup works well but has limitations under heavy load. A production-grade architecture must account for backpressure, data durability, and cost management.
Decoupling with a Message Queue
Directly connecting Beats to Logstash creates a tightly coupled system. If Logstash slows down or fails, backpressure is exerted all the way back to the application servers, potentially impacting their performance.
Introducing a message queue like Apache Kafka or Redis Streams as a buffer between Beats and Logstash decouples the system.
- Beats configuration changes to output to Kafka. This is a lightweight, fast operation.
- Logstash now consumes from a Kafka topic at its own pace.
- Benefits: This architecture provides superior durability (Kafka retains data) and scalability (multiple Logstash consumers can process logs in parallel from the same topic).

Filebeat output to Kafka:
output.kafka:
  hosts: ["kafka-broker-1:9092", "kafka-broker-2:9092"]
  topic: 'filebeat-logs'
  partition.round_robin:
    reachable_only: false
  required_acks: 1
Logstash input from Kafka:
input {
  kafka {
    bootstrap_servers => "kafka-broker-1:9092,kafka-broker-2:9092"
    topics => ["filebeat-logs"]
    group_id => "logstash_consumers"
    codec => "json"   # Filebeat writes events to Kafka as JSON; decode them back into fields
  }
}
Elasticsearch Index Lifecycle Management (ILM)
Storing logs indefinitely is prohibitively expensive. Elasticsearch ILM automates the management of indices over their lifetime. A common strategy is the Hot-Warm-Cold architecture:
- Hot Phase: Index is actively being written to and queried. Resides on the fastest hardware (SSDs).
- Warm Phase: Index is no longer being written to but is still queried. Can be moved to slower, less expensive hardware. Number of replicas can be reduced.
- Cold Phase: Infrequently queried data. Can be moved to the cheapest storage (spinning disks) or a searchable snapshot on object storage (e.g., S3).
- Delete Phase: Index is deleted permanently.
You define an ILM policy in Kibana or via the API and attach it to an index template. This ensures that all new daily indices automatically follow the policy.
Example ILM Policy (JSON):
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "30d"
          }
        }
      },
      "warm": {
        "min_age": "30d",
        "actions": {
          "forcemerge": {
            "max_num_segments": 1
          },
          "shrink": {
            "number_of_shards": 1
          },
          "allocate": {
            "number_of_replicas": 1
          }
        }
      },
      "delete": {
        "min_age": "365d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
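To put the policy into effect, store it in the cluster and reference it from an index template that matches your log indices. A minimal sketch via the REST API follows; the policy and template names are illustrative, and note that the rollover action only takes effect when you write through an alias or a data stream rather than directly to dated index names:

# Store the lifecycle policy (body is the JSON document above, saved as ilm-policy.json)
curl -s -X PUT "http://localhost:9200/_ilm/policy/microservice-logs-policy" \
  -H 'Content-Type: application/json' -d @ilm-policy.json

# Reference it from an index template so every new matching index picks it up automatically
curl -s -X PUT "http://localhost:9200/_index_template/microservice-logs-template" \
  -H 'Content-Type: application/json' -d '{
    "index_patterns": ["microservice-logs-*"],
    "template": {
      "settings": {
        "index.lifecycle.name": "microservice-logs-policy",
        "index.lifecycle.rollover_alias": "microservice-logs"
      }
    }
  }'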
Conclusion
Implementing the ELK Stack is a strategic investment in observability. A well-architected logging pipeline moves an organization from a reactive to a proactive operational posture. It enables engineers to not only diagnose failures but also to understand system behavior, optimize performance, and detect security anomalies.
The key takeaways for a successful implementation are:
- Standardize on Beats: Use Beats for data shipping to ensure reliability and low overhead on application hosts.
- Enrich at the Pipeline: Leverage Logstash's filtering capabilities to parse, structure, and enrich data before it reaches Elasticsearch. Structured logs are exponentially more valuable than raw text.
- Architect for Scale: For any non-trivial workload, introduce a message queue like Kafka to decouple your services from the logging pipeline.
- Manage Data Lifecycle: Implement ILM from day one to control storage costs and maintain cluster performance.
By following these principles, you can build a centralized logging system that serves as the bedrock of your observability strategy, providing clear, actionable insights into the most complex distributed systems.