How to Perform Load Testing on a Distributed System with k6
In a monolithic world, load testing was a relatively straightforward affair: point a tool at a single endpoint and increase the pressure. In today's landscape of distributed systems, microservices, and serverless functions, this approach is dangerously insufficient. A modern system's performance is not a single number; it's an emergent property of dozens of discrete, interconnected services communicating over a network. A bottleneck might not be in your primary API gateway but in a downstream authentication service, a saturated message queue, or a poorly indexed database table three hops away.
Understanding this systemic behavior under load is paramount. Failing to do so doesn't just risk slow response times; it risks cascading failures, resource exhaustion, and catastrophic outages.
This article provides a technical, actionable guide for CTOs and senior engineers on implementing robust load testing for distributed systems using k6. We will bypass high-level theory and focus on practical implementation, from scripting complex user journeys to executing tests at scale on Kubernetes and, most critically, correlating client-side metrics with your server-side observability stack.

Why k6 for Distributed Systems?
While tools like Apache JMeter have been mainstays, k6 offers a modern, developer-centric approach particularly suited for distributed architectures:
- High Performance, Low Footprint: k6 is written in Go. It uses a single process and an event-loop-based architecture, allowing it to generate significant load from a single machine with minimal CPU and memory overhead. This is crucial for cost-effective testing.
- Developer-First Scripting (JavaScript ES6): Tests are written in JavaScript, which lowers the barrier to entry: your engineers don't need to learn a new domain-specific language or navigate a complex UI. The scripts are code; treat them as such: version-control them, code-review them, and modularize them.
- Built-in Metrics & Thresholds: k6 provides critical metrics (p95/p99 latency, request rates, error rates) out of the box. More importantly, it allows you to define explicit pass/fail criteria (Thresholds) directly in your script, making it ideal for CI/CD integration.
- Extensibility: k6 supports gRPC, WebSockets, Kafka, and other protocols common in distributed systems, not just HTTP (see the gRPC sketch after this list).
- Observability Integration: k6 is designed to plug directly into modern observability stacks, shipping metrics to Prometheus, Grafana, Datadog, and New Relic.
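To make the extensibility point concrete, here is a minimal sketch of a gRPC test using k6's built-in `k6/net/grpc` module. The proto file, host, service, and method names are hypothetical placeholders; substitute one of your own internal services:

```javascript
import grpc from 'k6/net/grpc';
import { check } from 'k6';

const client = new grpc.Client();
// Hypothetical proto definition; point this at one of your own .proto files
client.load(['.'], 'inventory.proto');

export default function () {
  // Hypothetical internal host; plaintext assumes no TLS inside the cluster
  client.connect('inventory.internal:50051', { plaintext: true });

  // Hypothetical fully-qualified RPC name; replace with your own
  const res = client.invoke('inventory.InventoryService/CheckStock', {
    product_id: 'abc-123',
  });

  check(res, {
    'gRPC status is OK': (r) => r && r.status === grpc.StatusOK,
  });

  client.close();
}
```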
Phase 1: Scripting Complex User Scenarios
A distributed system rarely serves a single, stateless request. Users follow a flow: they log in (hitting an auth service), browse a catalog (hitting a product service), and place an order (hitting an order service, which may trigger a payment service and an inventory service). Your test script must model this reality.
Core k6 Script Structure
A k6 script has two main parts: the `options` object, which defines the load profile, and the `default` function, which contains the logic executed by each Virtual User (VU).
```javascript
import http from 'k6/http';
import { check, sleep, group } from 'k6';

// 1. OPTIONS: Define the load profile
export const options = {
  stages: [
    { duration: '1m', target: 100 }, // Ramp-up to 100 VUs over 1 minute
    { duration: '3m', target: 100 }, // Stay at 100 VUs for 3 minutes
    { duration: '1m', target: 0 },   // Ramp-down to 0 VUs
  ],
  thresholds: {
    // 95% of requests must complete below 500ms
    'http_req_duration': ['p(95)<500'],
    // The error rate must stay below 1% (i.e., 99%+ of requests succeed)
    'http_req_failed': ['rate<0.01'],
    // The 'User Login' group must have a 99.9% check success rate
    'checks{group:::User Login}': ['rate>0.999'],
  },
};

// 2. DEFAULT FUNCTION: The VU logic
export default function () {
  const BASE_URL = 'https://api.your-system.com';
  let authToken;

  // Group 1: User Login (Auth Service)
  group('User Login', () => {
    const loginPayload = JSON.stringify({
      email: `user_${__VU}@example.com`, // Parameterize data per VU
      password: 'supersecretpassword',
    });
    const loginParams = {
      headers: { 'Content-Type': 'application/json' },
    };
    const res = http.post(`${BASE_URL}/v1/auth/login`, loginPayload, loginParams);
    check(res, {
      'login successful (status 200)': (r) => r.status === 200,
      'auth token received': (r) => r.status === 200 && !!r.json('token'),
    });
    if (res.status === 200 && res.json('token')) {
      authToken = res.json('token');
    }
  });

  // Only proceed if login was successful
  if (!authToken) {
    return; // Abort this iteration
  }

  const authParams = {
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${authToken}`,
    },
  };

  // Group 2: Browse Products (Product Service)
  group('Browse Products', () => {
    const res = http.get(`${BASE_URL}/v2/products?category=electronics`, authParams);
    check(res, {
      'get products successful (status 200)': (r) => r.status === 200,
    });
    sleep(1.5); // Simulate user think time
  });

  // Group 3: Place Order (Order Service)
  group('Place Order', () => {
    const orderPayload = JSON.stringify({
      productId: 'abc-123',
      quantity: 1,
    });
    const res = http.post(`${BASE_URL}/v1/orders`, orderPayload, authParams);
    check(res, {
      'order placement successful (status 201)': (r) => r.status === 201,
    });
  });

  sleep(2); // Wait before starting a new session
}
```
Key Takeaways from this script:
- Groups: We use `group()` to organize requests into logical transactions (`User Login`, `Browse Products`). This provides aggregated metrics for each step, allowing you to pinpoint which part of the flow is failing or slow.
- Checks: `check()` validates responses. These are not assertions; they don't stop the test. They collect pass/fail metrics, which we then use in our `thresholds`.
- Thresholds: This is your SLO/SLA as code. The test will return a non-zero exit code (failing your CI pipeline) if `p(95)` latency exceeds 500ms or if the error rate climbs above 1%.
- Data Parameterization: We use `__VU` (a k6-specific variable for the Virtual User ID) to create unique usernames. In a real test, you would load this from a shared data array or file to avoid hitting caches and to simulate real-world variability (see the sketch after this list).
- State: We capture the `authToken` from the login response and pass it in subsequent requests, simulating a real, stateful user session.
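Here is a minimal sketch of that data-parameterization pattern using k6's built-in `SharedArray`, which loads the file once in the init context and shares it read-only across all VUs. The `users.json` file and its fields are assumptions; adapt them to your own test data:

```javascript
import http from 'k6/http';
import { SharedArray } from 'k6/data';

// Loaded once, shared read-only across VUs.
// Assumes a file like: [{ "email": "...", "password": "..." }, ...]
const users = new SharedArray('users', function () {
  return JSON.parse(open('./users.json'));
});

export default function () {
  // Give each VU a distinct credential set to avoid cache hits (__VU starts at 1)
  const user = users[(__VU - 1) % users.length];

  const payload = JSON.stringify({ email: user.email, password: user.password });
  http.post('https://api.your-system.com/v1/auth/login', payload, {
    headers: { 'Content-Type': 'application/json' },
  });
}
```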
Phase 2: Executing at Distributed Scale
A single k6 instance, while efficient, cannot simulate the load of millions of users. For a distributed system, you must run a distributed test. The goal is to generate load from multiple "load generator" machines, all orchestrated by a single controller.
While k6 Cloud offers a managed, "push-button" solution for this, a self-hosted approach on Kubernetes provides maximum control and cost-effectiveness for a technical organization. We achieve this using the k6-operator.
The k6-operator introduces a Custom Resource Definition (CRD) to Kubernetes, allowing you to define a distributed load test declaratively, just like a `Deployment` or a `Service`.

Step-by-Step: Distributed Testing with k6-Operator
1. Prerequisite: Install the k6-operator
```bash
# Ensure you are on the correct K8s context
kubectl apply -f https://github.com/grafana/k6-operator/releases/latest/download/bundle.yaml
```
2. Package Your k6 Script in a ConfigMap
The operator needs access to your `script.js`. The simplest way is to load it into a `ConfigMap`:
```bash
kubectl create configmap my-load-test-script --from-file=script.js
```
3. Define the `K6` Custom Resource
Create a YAML file (e.g., `test-run.yaml`) to define the distributed test. This is where the power lies.
```yaml
apiVersion: k6.io/v1alpha1
kind: K6
metadata:
  name: my-distributed-test
spec:
  # 1. Parallelism: Number of k6 worker pods to spin up
  parallelism: 10
  # 2. Script: Reference the ConfigMap created in Step 2
  script:
    configMap:
      name: my-load-test-script
      file: script.js
  # 3. Arguments: Pass k6 CLI flags (e.g., VUs, duration).
  # These override the 'options' in the script, allowing for dynamic test profiles.
  # 1000 VUs total, split across 10 pods; the -o flag enables the Prometheus output.
  arguments: --vus 1000 --duration 10m -o experimental-prometheus-rw
  # 4. Observability: Send metrics to your stack
  runner:
    env:
      # Example: Configure k6 to output to Prometheus Remote-Write
      - name: K6_PROMETHEUS_RW_SERVER_URL
        value: "http://prometheus-remote-write-endpoint.monitoring.svc.cluster.local/api/v1/write"
      - name: K6_PROMETHEUS_RW_TREND_STATS
        value: "p(95),p(99),min,max,avg,med"
```
4. Execute the Test
Simply apply the manifest:
```bash
kubectl apply -f test-run.yaml
```
Kubernetes will now do the following:
- Read the `K6` resource.
- Spin up one `k6-controller` pod.
- Spin up 10 `k6-worker` pods (as defined by `parallelism: 10`).
- The controller automatically distributes the 1000 VUs (`--vus 1000`) among the workers (100 VUs each).
- Each worker pod runs the same `script.js`, streaming its metrics to your configured backend (e.g., Prometheus).
You now have a scalable, repeatable, and declarative load testing framework running natively on your own infrastructure.
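A few commands are useful while the run is in flight. This is a sketch; the job name below is illustrative, as the operator derives real names from the `K6` resource:

```bash
# Watch the runner pods spin up and complete
kubectl get pods -w

# Tail one runner's output (illustrative job name)
kubectl logs -f job/my-distributed-test-1

# Clean up after the run so the same manifest can be re-applied
kubectl delete -f test-run.yaml
```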
Phase 3: The Critical Link: Correlating Client & Server Metrics
This is the single most important, and most frequently missed, step.
Running the test in Phase 2 will tell you what happened from the client's perspective (e.g., "The `/v1/orders` endpoint p95 latency spiked to 3000ms"). It will not tell you why.
The "why" is on your servers:
- Did the
order-service
pod run out of CPU? - Did its database connection pool exhaust?
- Did a downstream gRPC call to the
inventory-service
time out? - Was there a spike in Kafka consumer lag?
To find the "why," you must correlate the k6 client-side metrics with your server-side observability data on a single, shared timeline.
How to Implement Correlation
1. Inject Trace Context from k6
Your distributed tracing system (e.g., OpenTelemetry, Jaeger, Datadog APM) relies on context propagation, typically via HTTP headers like `traceparent`. Your k6 script must generate and inject these headers so that the requests it generates are included in your server-side traces.
```javascript
import http from 'k6/http';
import { check, group } from 'k6';
import { uuidv4 } from 'https://jslib.k6.io/k6-utils/1.4.0/index.js';

// ... (options and other setup) ...

const BASE_URL = 'https://api.your-system.com';

export default function () {
  let authToken;
  // ... (login flow from Phase 1 populates authToken) ...

  // Generate a unique trace ID for this entire user flow.
  // W3C trace-context IDs are hex-only, so strip the UUID hyphens.
  const traceId = uuidv4().replace(/-/g, '');
  const spanId = uuidv4().replace(/-/g, '').substring(0, 16);

  // W3C Trace Context header
  const traceparent = `00-${traceId}-${spanId}-01`;

  const authParams = {
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${authToken}`,
      'traceparent': traceparent, // <-- INJECT THE TRACE HEADER
    },
  };

  group('Browse Products', () => {
    // This request will now be picked up by your APM/tracing backend
    const res = http.get(`${BASE_URL}/v2/products?category=electronics`, authParams);
    check(res, { /* ... */ });
  });

  // ... (rest of script) ...
}
```
2. Build the Unified Dashboard
With the k6-operator shipping k6 metrics to Prometheus (Phase 2) and your script injecting trace IDs (Phase 3), all your data is now in one place.
In Grafana (or your preferred tool), build a dashboard that layers these two data sources:
- Top Panel (Client-Side):
  - `k6_http_reqs_total` (Request Rate)
  - `k6_http_req_duration_p95` (P95 Latency)
  - `k6_http_req_failed_rate` (Error Rate)
  - `k6_vus` (Active Virtual Users)
- Bottom Panels (Server-Side):
  - Per-Service: CPU/Memory usage, pod counts (HPA activity).
  - Database: Query throughput, query latency, connection counts.
  - Queues: Message queue depth (e.g., Kafka lag, RabbitMQ queue size).
  - Network: Ingress/Egress bandwidth, connection errors.
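For example, the client-side panels might be driven by queries like these (a sketch, assuming the metric names produced by the `experimental-prometheus-rw` output with the trend stats configured in Phase 2):

```promql
# Client-side request rate over the last minute
rate(k6_http_reqs_total[1m])

# P95 request duration, published as a gauge by the trend-stats setting
k6_http_req_duration_p95
```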
Now, when you run your load test, you can watch this dashboard. When you see a spike in `k6_http_req_duration_p95`, you look directly below it. You will see the corresponding spike in database connections, the flatlining of a downstream service's pods, or the HPA scaling up a new node.
You have moved from "the site is slow" to "the site is slow because the `order-service` p99 latency is high, which correlates directly with 95% CPU saturation on the `payment-service` deployment, which is failing its health checks." This is an actionable insight.

Conclusion
Load testing a distributed system with k6 is not a one-time event; it's a continuous practice. By scripting realistic scenarios, executing at scale with the k6-operator, and—most importantly—building unified observability dashboards, you transform load testing from a simple pass/fail check into a powerful performance engineering and debugging tool.
By integrating these practices into your CI/CD pipeline, you establish a performance baseline, protect against regressions, and give your engineering teams the high-fidelity data they need to build resilient, scalable, and high-performance systems. This is no longer just "testing"; it is a foundational component of modern systems architecture and operational excellence.
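As a concrete starting point, here is a minimal sketch of such a pipeline gate, assuming GitHub Actions and the official `grafana/k6` Docker image; adapt it to your own CI system. Because thresholds make k6 exit non-zero on breach, the job fails the pipeline automatically:

```yaml
name: load-test
on: [pull_request]

jobs:
  k6-smoke:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Run the Phase 1 script; a breached threshold fails this step
      - name: Run k6
        run: docker run --rm -i -v "$PWD":/scripts grafana/k6 run /scripts/script.js
```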