Orchestrating State: Building and Deploying Stateful Applications on Kubernetes with Operators


While Kubernetes excels at managing stateless microservices via standard Deployments and ReplicaSets, managing stateful applications—such as databases, message queues, and distributed caches—introduces significant complexity. These applications require stable network identities, persistent storage, and ordered deployment and termination. To solve this in an automated, scalable manner, high-performing engineering teams turn to the Operator Pattern.

This article details the technical implementation of a Kubernetes Operator using Go and Kubebuilder, specifically designed to manage a stateful workload.

Product Engineering Services

Work with our in-house Project Managers, Software Engineers and QA Testers to build your new custom software product or to support your current workflow, following Agile, DevOps and Lean methodologies.

Build with 4Geeks

The Operator Pattern: Automating Day-2 Operations

The Operator pattern codifies operational knowledge into software. It extends the Kubernetes API using Custom Resource Definitions (CRDs) and a custom Controller. The Controller runs a reconciliation loop, ensuring the current state of the system matches the desired state defined in the CRD.

For stateful applications, this means the Operator handles logic that standard Kubernetes resources cannot, such as:

  • Leader Election: Handling primary/replica promotion during failovers.
  • Data Rebalancing: Sharding data when scaling nodes up or down.
  • Backup/Restore: Automating snapshots and restoration processes.
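The reconciliation idea behind all of these behaviors can be sketched in plain Go. The snippet below is a toy model with no Kubernetes dependencies; every type and function name here is illustrative, not part of any Kubernetes library:

```go
package main

import "fmt"

// desiredState and observedState are hypothetical stand-ins for the
// spec and status of a custom resource.
type desiredState struct{ replicas int }
type observedState struct{ replicas int }

// reconcile compares the observed state to the desired state and
// returns the corrective actions needed to converge them.
func reconcile(desired desiredState, observed observedState) []string {
	var actions []string
	switch {
	case observed.replicas < desired.replicas:
		for i := observed.replicas; i < desired.replicas; i++ {
			actions = append(actions, fmt.Sprintf("create pod %d", i))
		}
	case observed.replicas > desired.replicas:
		for i := observed.replicas - 1; i >= desired.replicas; i-- {
			actions = append(actions, fmt.Sprintf("delete pod %d", i))
		}
	}
	return actions // empty means the system is already converged
}

func main() {
	fmt.Println(reconcile(desiredState{replicas: 3}, observedState{replicas: 1}))
}
```

A real controller performs the corrective actions against the API server instead of returning them, and is re-triggered on every relevant cluster event.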

Technical Implementation

We will scaffold a project to create an Operator for a hypothetical KVStore (Key-Value Store) that requires persistent storage and strict ordering.

1. Scaffolding with Kubebuilder

First, initialize the project domain and API.

kubebuilder init --domain 4geeks.io --repo github.com/4geeks/kvstore-operator
kubebuilder create api --group db --version v1 --kind KVStore

2. Defining the Custom Resource (CRD)

The CRD defines the schema for our application. For a stateful service, we need to specify storage requirements and cluster size.

Modify api/v1/kvstore_types.go:

// KVStoreSpec defines the desired state of KVStore
type KVStoreSpec struct {
    // Size defines the number of replicas in the cluster
    // +kubebuilder:validation:Minimum=1
    Size int32 `json:"size"`

    // StorageSize defines the request size for Persistent Volume Claims
    StorageSize string `json:"storageSize"`

    // ContainerImage defines the specific version of the KVStore to run
    ContainerImage string `json:"containerImage"`
}

// KVStoreStatus defines the observed state of KVStore
type KVStoreStatus struct {
    // ReadyReplicas indicates how many nodes are fully synced
    ReadyReplicas int32 `json:"readyReplicas"`
}
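After editing the types, regenerate the deep-copy code and CRD manifests (make generate and make manifests in a Kubebuilder project) and install the CRD into the cluster (make install). A user could then declare a cluster with a manifest like the following; the field values are illustrative:

```yaml
apiVersion: db.4geeks.io/v1
kind: KVStore
metadata:
  name: kvstore-sample
spec:
  size: 3
  storageSize: 10Gi
  containerImage: example/kvstore:1.2.0
```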


3. Implementing the Reconciliation Loop

The Reconcile function is the heart of the Operator. It is triggered whenever a change occurs in the KVStore resource or in the resources it owns (such as the StatefulSet).

In controllers/kvstore_controller.go, we implement logic to manage a StatefulSet. Unlike Deployments, StatefulSets maintain a sticky identity for each Pod (kvstore-0, kvstore-1, ...), which is crucial for data consistency.

func (r *KVStoreReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    log := log.FromContext(ctx)

    // 1. Fetch the KVStore CR instance
    kvStore := &dbv1.KVStore{}
    if err := r.Get(ctx, req.NamespacedName, kvStore); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // 2. Define the desired StatefulSet object
    desiredSts := r.desiredStatefulSet(kvStore)

    // 3. Check if the StatefulSet already exists
    foundSts := &appsv1.StatefulSet{}
    err := r.Get(ctx, types.NamespacedName{Name: desiredSts.Name, Namespace: kvStore.Namespace}, foundSts)

    if err != nil && errors.IsNotFound(err) {
        log.Info("Creating a new StatefulSet", "Namespace", desiredSts.Namespace, "Name", desiredSts.Name)
        err = r.Create(ctx, desiredSts)
        if err != nil {
            return ctrl.Result{}, err
        }
        // Requeue to verify creation
        return ctrl.Result{Requeue: true}, nil
    } else if err != nil {
        return ctrl.Result{}, err
    }

    // 4. Update Strategy: Check for drift in configuration (e.g., Scaling)
    if foundSts.Spec.Replicas == nil || *foundSts.Spec.Replicas != kvStore.Spec.Size {
        log.Info("Updating replica count", "To", kvStore.Spec.Size)
        foundSts.Spec.Replicas = &kvStore.Spec.Size
        err = r.Update(ctx, foundSts)
        if err != nil {
            return ctrl.Result{}, err
        }
    }

    // 5. Update Status
    kvStore.Status.ReadyReplicas = foundSts.Status.ReadyReplicas
    if err := r.Status().Update(ctx, kvStore); err != nil {
        return ctrl.Result{}, err
    }

    return ctrl.Result{}, nil
}
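For the Reconcile function to actually be triggered by changes to the StatefulSet, the controller must declare ownership when it is registered with the manager. A typical wiring, assuming the scaffolded type names, looks like:

```go
// SetupWithManager registers the controller with the manager. Owns()
// makes events on StatefulSets created by this Operator trigger
// reconciliation of the parent KVStore resource.
func (r *KVStoreReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&dbv1.KVStore{}).
		Owns(&appsv1.StatefulSet{}).
		Complete(r)
}
```

Owns() relies on the desired objects carrying an owner reference; calling ctrl.SetControllerReference(kv, sts, r.Scheme) when building the StatefulSet also lets Kubernetes garbage-collect it when the KVStore is deleted.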

4. Handling Persistent Storage with VolumeClaimTemplates

Stateful applications require data persistence across Pod restarts. When constructing the StatefulSet struct in Go, you must define VolumeClaimTemplates. This dynamically provisions a PersistentVolume (PV) for each replica using the underlying Cloud Provider's storage driver (e.g., EBS on AWS, PD on GCP).

func (r *KVStoreReconciler) desiredStatefulSet(kv *dbv1.KVStore) *appsv1.StatefulSet {
    // ... metadata setup ...
    return &appsv1.StatefulSet{
        // ...
        Spec: appsv1.StatefulSetSpec{
            // Headless Service is required for StatefulSets
            ServiceName: kv.Name + "-headless", 
            VolumeClaimTemplates: []corev1.PersistentVolumeClaim{
                {
                    ObjectMeta: metav1.ObjectMeta{Name: "data"},
                    Spec: corev1.PersistentVolumeClaimSpec{
                        AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
                        Resources: corev1.ResourceRequirements{
                            Requests: corev1.ResourceList{
                                corev1.ResourceStorage: resource.MustParse(kv.Spec.StorageSize),
                            },
                        },
                    },
                },
            },
            // ... container spec ...
        },
    }
}
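The headless Service referenced by ServiceName above must also be created by the reconciler; it is what gives each Pod a stable DNS record (kvstore-0.kvstore-headless, and so on). A minimal sketch, assuming a matching app label and a hypothetical client port:

```go
func (r *KVStoreReconciler) desiredHeadlessService(kv *dbv1.KVStore) *corev1.Service {
	return &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{
			Name:      kv.Name + "-headless",
			Namespace: kv.Namespace,
		},
		Spec: corev1.ServiceSpec{
			// ClusterIP "None" makes the Service headless: DNS resolves
			// to the individual Pod IPs rather than a virtual IP.
			ClusterIP: corev1.ClusterIPNone,
			Selector:  map[string]string{"app": kv.Name},
			Ports:     []corev1.ServicePort{{Name: "client", Port: 6379}},
		},
	}
}
```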

Scaling and Cloud Infrastructure Considerations

Deploying Operators in production requires robust underlying infrastructure. The complexity of managing storage classes, multi-zone availability, and secure networking often demands dedicated expertise.

For organizations scaling these capabilities, 4Geeks provides cloud engineering services delivered by dedicated remote teams. Their expertise in cloud architecture and automation helps enterprises design the reliable foundation necessary for running complex stateful workloads on Kubernetes. Whether you are performing a cloud migration or implementing serverless architectures, a specialized partner ensures that your Kubernetes strategy aligns with broader business availability goals.

Conclusion

Operators are the definitive solution for managing stateful complexity on Kubernetes. By abstracting the specific lifecycle management of databases and storage systems into a custom controller, engineering teams can achieve a higher level of automation and reliability. While the initial investment in building an Operator is higher than a simple Helm chart, the long-term operational benefits—automated healing, scaling, and maintenance—are substantial for any serious cloud-native enterprise.

If you are looking to accelerate your infrastructure maturity, consider how 4Geeks can support your journey with expert cloud engineering and DevOps services.


FAQs

Why is the Operator Pattern preferred for deploying stateful applications on Kubernetes?

Standard Kubernetes resources, such as Deployments, excel at managing stateless microservices but lack the logic required for complex stateful workloads like databases or message queues. The Operator Pattern is preferred because it codifies operational knowledge into software, automating tasks such as leader election, data rebalancing, and backup/restore. This ensures that applications requiring stable network identities and persistent storage can be managed reliably and at scale.

How does the reconciliation loop function within a Kubernetes Operator?

The reconciliation loop is the core mechanism of a custom Controller within an Operator. It operates by continuously observing the current state of the cluster and comparing it to the "desired state" defined in the Custom Resource Definition (CRD). If the Controller detects any drift—such as a missing pod or a configuration mismatch—it automatically triggers corrective actions to reconcile the differences, ensuring the system consistently matches the defined requirements without manual intervention.

What role do Custom Resource Definitions (CRDs) play in orchestrating stateful workloads?

Custom Resource Definitions (CRDs) extend the Kubernetes API, enabling users to define custom objects that represent specific application requirements. For stateful applications, a CRD acts as the schema that specifies essential parameters, such as cluster size, storage requirements, and versioning. This allows the Operator to recognize and manage these custom resources just like native Kubernetes objects, facilitating precise control over the application's lifecycle and storage provisioning.
