How to Monitor and Log a Distributed System with the ELK Stack
In modern distributed architectures, visibility is no longer a luxury; it's a fundamental requirement for operational stability and performance engineering. The ephemeral nature of containers and the complexity of microservice interactions create an opaque environment where traditional monitoring tools fail. Centralized logging and observability are the solutions, and the ELK Stack (Elasticsearch, Logstash, Kibana) remains one of the most powerful, open-source platforms for achieving this.
This article provides a technical, actionable guide for CTOs and senior engineers to design and implement a robust logging pipeline using the ELK Stack. We will move beyond high-level concepts to focus on architectural decisions, practical configurations, and performance tuning at scale.
The Core Architecture: Components and Data Flow
The power of the ELK Stack lies in its modular design. Understanding the specific role of each component is critical to designing a scalable and resilient system. The modern incarnation of the stack is more accurately described as the "Elastic Stack," incorporating Beats as the primary data shipping mechanism.

- Elasticsearch: At its core, Elasticsearch is a distributed, RESTful search and analytics engine built on Apache Lucene. It stores, indexes, and makes vast quantities of structured and unstructured data searchable in near real-time (see the query sketch after this list). For our purposes, it's the database where all logs and metrics reside. Key architectural concepts include nodes (a single Elasticsearch instance, typically one per server), clusters (a collection of nodes), indices (a collection of documents with similar characteristics), and shards (the basic unit of scalability, allowing an index to be partitioned and distributed across nodes).
- Logstash: This is the server-side data processing pipeline. Its primary function is to ingest data from a multitude of sources simultaneously, transform it, and then send it to a "stash" like Elasticsearch. The pipeline is defined by three stages:
- Inputs: Where to pull data from (e.g., Beats, Kafka, TCP sockets).
- Filters: How to process the data. This is where the real power lies—parsing unstructured log text into structured fields (grok), enriching data (geoip), and manipulating fields (mutate).
- Outputs: Where to send the processed data (e.g., Elasticsearch, S3, another message queue).
- Kibana: This is the visualization layer. Kibana provides the web interface to explore, visualize, and build dashboards on top of the data stored in Elasticsearch. Its Discover, Lens, and Dashboard features allow engineers to move from raw logs to aggregated insights, identifying trends and anomalies.
- Beats: These are lightweight, single-purpose data shippers. They are installed on source machines to collect different types of data and forward them to either Logstash or directly to Elasticsearch. Filebeat tails log files, while Metricbeat collects system and service metrics. Using Beats is the standard practice, as it offloads the resource-intensive work from your application servers and provides reliability features like backpressure handling.
The canonical data flow is Beats → Logstash → Elasticsearch, with Kibana querying Elasticsearch for visualization. For high-throughput environments, a message queue is inserted for resilience: Beats → Kafka/Redis → Logstash → Elasticsearch.
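To make Elasticsearch's role concrete: once logs are indexed, ad-hoc questions become a single REST call. The sketch below is illustrative; the microservice-logs-* index pattern and the http.response.status_code field come from the pipeline we build later in this article. It pulls the ten most recent 5xx responses across all services:

# Ten most recent server errors across all services
curl -s "http://localhost:9200/microservice-logs-*/_search?pretty" \
  -H 'Content-Type: application/json' -d '{
    "size": 10,
    "sort": [ { "@timestamp": "desc" } ],
    "query": { "range": { "http.response.status_code": { "gte": 500 } } }
  }'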
Practical Implementation: A Dockerized ELK Stack
To demonstrate a functional setup, we will use Docker Compose to orchestrate the core components. This provides a repeatable environment for development and testing.
Step 1: Orchestrate the Stack with Docker Compose
Create a docker-compose.yml file. This configuration launches Elasticsearch, Logstash, and Kibana, linking them on a dedicated network.
version: '3.8'

services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.9.2
    container_name: elasticsearch
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false # Disable for demo purposes ONLY
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ports:
      - "9200:9200"
    networks:
      - elk-net

  logstash:
    image: docker.elastic.co/logstash/logstash:8.9.2
    container_name: logstash
    volumes:
      - ./logstash/pipeline:/usr/share/logstash/pipeline/
    ports:
      - "5044:5044"
    depends_on:
      - elasticsearch
    networks:
      - elk-net

  kibana:
    image: docker.elastic.co/kibana/kibana:8.9.2
    container_name: kibana
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch
    networks:
      - elk-net

networks:
  elk-net:
    driver: bridge
Note: The xpack.security.enabled=false setting is for simplicity. In a production environment, you must enable security features like TLS encryption and role-based access control.
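As a rough sketch (illustrative only, not a complete hardening guide), enabling basic security on the Elasticsearch service looks like the excerpt below. TLS additionally requires generating certificates and mounting them into the containers, and Kibana and Logstash then need their own credentials (e.g., the built-in kibana_system service account for Kibana and a dedicated writer role for Logstash):

# Excerpt of the elasticsearch service in docker-compose.yml
  elasticsearch:
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=true
      - ELASTIC_PASSWORD=${ELASTIC_PASSWORD}       # bootstrap password for the built-in elastic user
      # - xpack.security.http.ssl.enabled=true     # enable once certificates are mounted
      # - xpack.security.http.ssl.keystore.path=certs/http.p12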
Step 2: Configure the Logstash Pipeline
Logstash is inert without a pipeline configuration. Create a directory logstash/pipeline/ and add a logstash.conf file. This pipeline listens for Beats traffic on port 5044, parses incoming log messages (JSON payloads and NGINX access-log lines), and outputs the structured events to Elasticsearch.
# ./logstash/pipeline/logstash.conf
input {
  beats {
    port => 5044
  }
}

filter {
  # Example: If logs are JSON-encoded strings, parse them into a "log" object
  json {
    source => "message"
    target => "log"
  }

  # Grok filter for a common NGINX access log format
  # Sample: 127.0.0.1 - - [19/Oct/2025:14:09:58 +0000] "GET /api/v1/users HTTP/1.1" 200 512 "-" "Go-http-client/1.1"
  grok {
    match => { "message" => "%{IPORHOST:[client][ip]} %{USER:[user][name]} %{USER:[user][auth]} \[%{HTTPDATE:timestamp}\] \"%{WORD:[http][request][method]} %{DATA:[url][path]} HTTP/%{NUMBER:[http][version]}\" %{NUMBER:[http][response][status_code]:int} %{NUMBER:[http][response][body][bytes]:int} \"%{DATA:[http][request][referrer]}\" \"%{DATA:[user_agent][original]}\"" }
  }

  # GeoIP enrichment based on the parsed client IP (only resolves public addresses)
  geoip {
    source => "[client][ip]"
    target => "[client][geo]"
  }

  # Parse the NGINX timestamp into the event's @timestamp field
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    remove_field => [ "timestamp" ]
  }
}

output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    index => "microservice-logs-%{+YYYY.MM.dd}"
  }
  # For debugging during setup
  stdout { codec => rubydebug }
}
This configuration demonstrates three powerful filters: json for structured logs, grok for unstructured text, and geoip for data enrichment. The output uses a daily rolling index pattern (microservice-logs-%{+YYYY.MM.dd}), which is crucial for managing data retention.
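As a concrete illustration, the sample NGINX line in the comment above comes out of the filter block as a structured document roughly like this (abridged; geoip adds location fields only for public IP addresses, so the loopback address in the sample is not enriched):

{
  "@timestamp": "2025-10-19T14:09:58.000Z",
  "client":     { "ip": "127.0.0.1" },
  "http":       { "version": "1.1",
                  "request":  { "method": "GET", "referrer": "-" },
                  "response": { "status_code": 200, "body": { "bytes": 512 } } },
  "url":        { "path": "/api/v1/users" },
  "user_agent": { "original": "Go-http-client/1.1" },
  "message":    "127.0.0.1 - - [19/Oct/2025:14:09:58 +0000] \"GET /api/v1/users HTTP/1.1\" 200 512 \"-\" \"Go-http-client/1.1\""
}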
Step 3: Configure the Data Shipper (Filebeat)
On your application servers, install and configure Filebeat. The filebeat.yml file specifies which log files to monitor and where to send the data.
# filebeat.yml on an application server
filebeat.inputs:
  - type: filestream
    id: my-app-logs
    enabled: true
    paths:
      - /var/log/my-app/*.log
    # Handle multi-line stack traces (e.g., from Java, Python):
    # lines that begin with whitespace are appended to the preceding event
    parsers:
      - multiline:
          type: pattern
          pattern: '^[[:space:]]'
          negate: false
          match: after

# Send data to our Logstash container
output.logstash:
  hosts: ["your-logstash-host:5044"]
This configuration tails all .log files in /var/log/my-app/, correctly groups multi-line stack traces into a single event, and sends them to the Logstash instance we configured.
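With Filebeat running, you can verify the pipeline end to end against the Dockerized stack from Step 1 (hostnames and ports assume the defaults used above):

# Check that a daily index exists and that documents are arriving
curl -s "http://localhost:9200/_cat/indices/microservice-logs-*?v"
curl -s "http://localhost:9200/microservice-logs-*/_count?pretty"

From there, create a data view for microservice-logs-* in Kibana's Stack Management and explore the events in Discover.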
Advanced Architecture for Scale and Resilience
The basic setup works well but has limitations under heavy load. A production-grade architecture must account for backpressure, data durability, and cost management.
Decoupling with a Message Queue
Directly connecting Beats to Logstash creates a tightly coupled system. If Logstash slows down or fails, backpressure is exerted all the way back to the application servers, potentially impacting their performance.
Introducing a message queue like Apache Kafka or Redis Streams as a buffer between Beats and Logstash decouples the system.
- Beats configuration changes to output to Kafka. This is a lightweight, fast operation.
- Logstash now consumes from a Kafka topic at its own pace.
- Benefits: This architecture provides superior durability (Kafka retains data) and scalability (multiple Logstash consumers can process logs in parallel from the same topic).

Filebeat output to Kafka:
output.kafka:
  hosts: ["kafka-broker-1:9092", "kafka-broker-2:9092"]
  topic: 'filebeat-logs'
  partition.round_robin:
    reachable_only: false
  required_acks: 1
Logstash input from Kafka:
input {
  kafka {
    bootstrap_servers => "kafka-broker-1:9092,kafka-broker-2:9092"
    topics => ["filebeat-logs"]
    group_id => "logstash_consumers"
    codec => "json"   # Filebeat writes events to Kafka as JSON; decode them back into fields
  }
}
Elasticsearch Index Lifecycle Management (ILM)
Storing logs indefinitely is prohibitively expensive. Elasticsearch ILM automates the management of indices over their lifetime. A common strategy is the Hot-Warm-Cold architecture:
- Hot Phase: Index is actively being written to and queried. Resides on the fastest hardware (SSDs).
- Warm Phase: Index is no longer being written to but is still queried. Can be moved to slower, less expensive hardware. Number of replicas can be reduced.
- Cold Phase: Infrequently queried data. Can be moved to the cheapest storage (spinning disks) or a searchable snapshot on object storage (e.g., S3).
- Delete Phase: Index is deleted permanently.
You define an ILM policy in Kibana or via the API and attach it to an index template. This ensures that all new daily indices automatically follow the policy.
Example ILM Policy (JSON):
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "30d"
          }
        }
      },
      "warm": {
        "min_age": "30d",
        "actions": {
          "forcemerge": {
            "max_num_segments": 1
          },
          "shrink": {
            "number_of_shards": 1
          },
          "allocate": {
            "number_of_replicas": 1
          }
        }
      },
      "delete": {
        "min_age": "365d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
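To put the policy into effect, store it in the cluster and reference it from an index template that matches your log indices. A minimal sketch via the REST API follows; the policy and template names are illustrative, and note that the rollover action only takes effect when you write through an alias or a data stream rather than directly to dated index names:

# Store the lifecycle policy (body is the JSON document above, saved as ilm-policy.json)
curl -s -X PUT "http://localhost:9200/_ilm/policy/microservice-logs-policy" \
  -H 'Content-Type: application/json' -d @ilm-policy.json

# Reference it from an index template so every new matching index picks it up automatically
curl -s -X PUT "http://localhost:9200/_index_template/microservice-logs-template" \
  -H 'Content-Type: application/json' -d '{
    "index_patterns": ["microservice-logs-*"],
    "template": {
      "settings": {
        "index.lifecycle.name": "microservice-logs-policy",
        "index.lifecycle.rollover_alias": "microservice-logs"
      }
    }
  }'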
Conclusion
Implementing the ELK Stack is a strategic investment in observability. A well-architected logging pipeline moves an organization from a reactive to a proactive operational posture. It enables engineers to not only diagnose failures but also to understand system behavior, optimize performance, and detect security anomalies.
The key takeaways for a successful implementation are:
- Standardize on Beats: Use Beats for data shipping to ensure reliability and low overhead on application hosts.
- Enrich at the Pipeline: Leverage Logstash's filtering capabilities to parse, structure, and enrich data before it reaches Elasticsearch. Structured logs are exponentially more valuable than raw text.
- Architect for Scale: For any non-trivial workload, introduce a message queue like Kafka to decouple your services from the logging pipeline.
- Manage Data Lifecycle: Implement ILM from day one to control storage costs and maintain cluster performance.
By following these principles, you can build a centralized logging system that serves as the bedrock of your observability strategy, providing clear, actionable insights into the most complex distributed systems.