Kubernetes Cost Optimization: How to Save 40–60% on Cloud Infrastructure in 2026¶
At CORE SYSTEMS, we operate Kubernetes clusters for clients like Packeta (500+ microservices) and Ceska sporitelna — with 99.9% SLA and a team of 50+ experts. From 15+ years of experience, we know that cloud infrastructure costs are growing faster than productivity. The CNCF 2026 survey shows that the average organization overpays for Kubernetes infrastructure by 35–50%. This article is a practical guide to eliminating these losses — based on concrete optimizations we have performed for our clients.
Where money disappears — anatomy of Kubernetes waste¶
Over-provisioning resource requests¶
The biggest source of waste. Developers set requests and limits conservatively because nobody wants their application to be OOMKilled. The result: average CPU utilization in a cluster is typically 15–25%, and memory 40–60%.
```yaml
resources:
  requests:
    cpu: "500m"      # actual usage: ~50m
    memory: "512Mi"  # actual usage: ~120Mi
  limits:
    cpu: "2000m"
    memory: "2Gi"
```
This pod reserves, and is billed for, 500m CPU and 512Mi of memory, even though roughly 90% of that capacity is never used.
Idle namespaces and zombie workloads¶
Development and staging environments run 24/7, despite being active only 8 hours a day. Forgotten jobs, completed CronJobs with history, old ReplicaSets — you’re paying for all of it.
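Much of this debris can be prevented declaratively: Kubernetes can garbage-collect finished Jobs via `ttlSecondsAfterFinished`, and CronJobs can cap how many completed Jobs they keep. A minimal sketch (the names, images, and the specific TTL/limit values are illustrative):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: one-off-migration
spec:
  ttlSecondsAfterFinished: 3600  # delete the Job one hour after it finishes
  template:
    spec:
      containers:
        - name: migrate
          image: registry.example.com/migrate:latest
      restartPolicy: Never
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report
spec:
  schedule: "0 2 * * *"
  successfulJobsHistoryLimit: 1  # keep only the most recent successful Job
  failedJobsHistoryLimit: 3      # keep a few failures for debugging
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: report
              image: registry.example.com/report:latest
          restartPolicy: OnFailure
```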
Suboptimal instance types¶
Running a memory-intensive workload on a compute-optimized instance (or vice versa) — you pay for capacity you can’t use.
Resource Optimization — concrete steps¶
1. Goldilocks — automated resource request recommendations¶
Goldilocks analyzes actual usage via VPA (Vertical Pod Autoscaler) and recommends the right values.
```bash
# Installation
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install goldilocks fairwinds-stable/goldilocks \
  --namespace goldilocks \
  --create-namespace

# Label a namespace for analysis
kubectl label namespace production goldilocks.fairwinds.com/enabled=true

# Open the Goldilocks dashboard
kubectl -n goldilocks port-forward svc/goldilocks-dashboard 8080:80
```
The dashboard shows recommended requests/limits for each deployment based on actual P50/P99 usage over the past N days.
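The underlying idea can be sketched in a few lines of Python: derive a request from the P50 of observed usage and a limit from the P99, each with a safety margin. This is a simplified illustration, not Goldilocks' or VPA's actual estimator (the function name and the 15% margin are our own):

```python
import statistics


def suggest_resources(usage_mcpu: list[float], margin: float = 1.15) -> tuple[int, int]:
    """Suggest (request, limit) in millicores from observed usage samples.

    Request is based on P50, limit on P99, each with a safety margin.
    Illustrative heuristic only -- VPA/Goldilocks use their own estimators.
    """
    cuts = statistics.quantiles(sorted(usage_mcpu), n=100)  # 99 percentile cut points
    p50, p99 = cuts[49], cuts[98]
    return round(p50 * margin), round(p99 * margin)


# Example: usage samples ranging from 1 to 100 millicores
request, limit = suggest_resources([float(v) for v in range(1, 101)])
print(request, limit)  # modest request, more headroom in the limit
```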
2. VPA (Vertical Pod Autoscaler) in recommendation mode¶
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-service-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  updatePolicy:
    updateMode: "Off"  # recommendation only, does not apply automatically
  resourcePolicy:
    containerPolicies:
      - containerName: api
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: 2000m
          memory: 4Gi
        controlledResources: ["cpu", "memory"]
```

```bash
# Show recommendations
kubectl describe vpa api-service-vpa -n production
# Look for the "Target" values for cpu and memory
```
3. HPA with custom metrics¶
Horizontal Pod Autoscaler based on custom business metrics (not just CPU):
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-service-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # wait 5 min before scale-down
      policies:
        - type: Percent
          value: 25
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
```
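For intuition, the scaling decision follows the formula from the Kubernetes HPA documentation: `desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric)`, taking the largest proposal across all configured metrics and clamping to min/max replicas. A minimal sketch (it ignores the HPA's tolerance band and the stabilization windows shown above):

```python
from math import ceil


def desired_replicas(current: int, metrics: list[tuple[float, float]],
                     min_replicas: int = 2, max_replicas: int = 20) -> int:
    """metrics: (currentValue, targetValue) pairs; HPA takes the max proposal."""
    proposal = max(ceil(current * cur / target) for cur, target in metrics)
    return max(min_replicas, min(max_replicas, proposal))


# 4 replicas, CPU at 90% vs a 70% target, 150 req/s vs a 100 req/s target:
print(desired_replicas(4, [(90, 70), (150, 100)]))  # -> 6
```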
4. KEDA for event-driven autoscaling¶
For workloads driven by queues (Kafka, SQS, Redis):
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaledobject
  namespace: production
spec:
  scaleTargetRef:
    name: queue-worker
  minReplicaCount: 0   # scale to zero!
  maxReplicaCount: 50
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.eu-west-1.amazonaws.com/123456789/my-queue
        queueLength: "10"  # one replica per 10 messages
        awsRegion: eu-west-1
```
Scale-to-zero is key — a worker with no messages = 0 pods = 0 cost.
Node Optimization¶
Spot/Preemptible instances with Karpenter¶
Karpenter is an open-source node autoscaler, originally developed at AWS, that can intelligently mix spot and on-demand instances:
```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]  # ARM = cheaper
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
  limits:
    cpu: 1000
    memory: 4000Gi
```
Karpenter consolidates nodes automatically — if 3 workloads sit on 3 nodes, it repacks them onto 1 node and shuts down the remaining 2.
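Consolidation works by evicting pods and rescheduling them elsewhere, so pair it with PodDisruptionBudgets; otherwise a repack can briefly take every replica of a service down. A minimal example (the name and selector are illustrative):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-service-pdb
  namespace: production
spec:
  minAvailable: 1  # keep at least one replica up during node consolidation
  selector:
    matchLabels:
      app: api-service
```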
Regular cleanup of idle nodes¶
```bash
#!/bin/bash
# Identify candidate underutilized nodes
kubectl get nodes -o json | jq -r '
  .items[]
  | select(.status.conditions[] | select(.type=="Ready" and .status=="True"))
  | {
      name: .metadata.name,
      cpu_capacity: .status.capacity.cpu,
      mem_capacity: .status.capacity.memory,
      age: .metadata.creationTimestamp
    }
  | .name + " | CPU: " + .cpu_capacity + " | Age: " + .age
'

# Check actual utilization via metrics-server (skip the header row)
kubectl top nodes --sort-by=cpu | awk 'NR>1 && $3+0 < 20 {print "LOW CPU:", $0}'
```
Namespace and Environment Cleanup¶
Automatic deletion of development environments¶
```python
#!/usr/bin/env python3
"""Auto-cleanup of idle development namespaces."""
import json
import subprocess
from datetime import datetime, timezone, timedelta


def get_namespace_last_activity(namespace: str) -> datetime:
    """Return the timestamp of the most recent event in a namespace."""
    result = subprocess.run(
        ["kubectl", "get", "events", "-n", namespace,
         "--sort-by=.lastTimestamp", "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    events = json.loads(result.stdout)
    if not events["items"]:
        # No events -- note that events expire after ~1h by default,
        # so this is a coarse idleness signal
        return datetime.min.replace(tzinfo=timezone.utc)
    timestamp = events["items"][-1]["lastTimestamp"]
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def cleanup_idle_namespaces(max_idle_hours: int = 48) -> None:
    """Delete namespaces with a 'dev-' prefix that have been idle for more than N hours."""
    result = subprocess.run(
        ["kubectl", "get", "namespaces", "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    namespaces = json.loads(result.stdout)
    now = datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=max_idle_hours)
    for ns in namespaces["items"]:
        name = ns["metadata"]["name"]
        if not name.startswith("dev-"):
            continue
        created = datetime.fromisoformat(
            ns["metadata"]["creationTimestamp"].replace("Z", "+00:00"))
        # Treat creation as activity so freshly created namespaces survive
        last_activity = max(get_namespace_last_activity(name), created)
        if last_activity < cutoff:
            idle_hours = (now - last_activity).total_seconds() / 3600
            print(f"Deleting idle namespace {name} (idle {idle_hours:.0f}h)")
            subprocess.run(["kubectl", "delete", "namespace", name], check=True)


if __name__ == "__main__":
    cleanup_idle_namespaces(max_idle_hours=48)
```
CronJob for overnight scale-down¶
```yaml
# Scale staging to 0 replicas overnight; the original replica counts are
# recorded in an annotation so the morning job can restore them exactly
apiVersion: batch/v1
kind: CronJob
metadata:
  name: staging-scale-down
  namespace: staging
spec:
  schedule: "0 20 * * 1-5"  # Monday-Friday 20:00
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler-sa
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest
              command:
                - /bin/sh
                - -c
                - |
                  for obj in $(kubectl get deploy,statefulset -n staging -o name); do
                    r=$(kubectl get "$obj" -n staging -o jsonpath='{.spec.replicas}')
                    kubectl annotate "$obj" -n staging original-replicas="$r" --overwrite
                    kubectl scale "$obj" -n staging --replicas=0
                  done
          restartPolicy: OnFailure
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: staging-scale-up
  namespace: staging
spec:
  schedule: "0 8 * * 1-5"  # Monday-Friday 8:00
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler-sa
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest
              command:
                - /bin/sh
                - -c
                - |
                  # Restore the replica counts recorded at scale-down
                  for obj in $(kubectl get deploy,statefulset -n staging -o name); do
                    r=$(kubectl get "$obj" -n staging -o jsonpath="{.metadata.annotations['original-replicas']}")
                    kubectl scale "$obj" -n staging --replicas="${r:-1}"
                  done
          restartPolicy: OnFailure
```
FinOps — Kubecost and cost visibility¶
Kubecost for granular cost tracking¶
```bash
# Install Kubecost
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --create-namespace \
  --set kubecostToken="your-token" \
  --set prometheus.server.persistentVolume.size=50Gi
```
Kubecost lets you see costs per namespace, deployment, label, or team — essential for a chargeback model.
Cost allocation labels¶
```yaml
# Every workload must carry cost-allocation labels
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
  labels:
    app: api-service
    team: backend           # owning team
    cost-center: "CC-1042"  # cost center
    environment: production
    product: core-platform
# (spec omitted for brevity)
```

```bash
# Kubecost allocation query per team
# (e.g. via: kubectl port-forward -n kubecost svc/kubecost-cost-analyzer 9090:9090)
curl -s "http://localhost:9090/model/allocation?window=30d&aggregate=label:team&idle=true" | \
  jq -r '.data[0] | to_entries | sort_by(-.value.totalCost)
         | .[] | "\(.key): $\(.value.totalCost | round)"'
```
Results — real numbers¶
After implementing these measures in a typical enterprise cluster:
| Optimization | Typical savings |
|---|---|
| Right-sizing requests (Goldilocks) | 20–30% |
| Spot instances (70% of workloads) | 60–70% on compute |
| Scale-to-zero for dev/staging | 40–60% on nonprod |
| Karpenter consolidation | 10–20% |
| Cleanup idle resources | 5–15% |
| Total | 40–60% of total costs |
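Note that these percentages apply to different bases (all compute, the spot-eligible share, nonprod only), so they compound rather than add. A back-of-the-envelope model with hypothetical numbers shows how the components land in the 40–60% range:

```python
# Hypothetical monthly bill in USD -- replace with your own Kubecost data
prod_compute, nonprod_compute, other = 60_000, 25_000, 15_000

# Right-sizing: 25% off all compute
prod_compute *= 0.75
nonprod_compute *= 0.75

# Spot: ~65% discount on the 70% of prod compute that is spot-eligible
prod_compute *= 1 - 0.70 * 0.65

# Scale-to-zero overnight/weekends: 50% off nonprod
nonprod_compute *= 0.50

new_total = prod_compute + nonprod_compute + other
saving_pct = 100 * (1 - new_total / 100_000)
print(f"{saving_pct:.1f}% saved")
```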
Implementation plan¶
**Week 1–2: Visibility**
- Deploy Kubecost or OpenCost
- Add cost allocation labels to all workloads
- Audit resource utilization via Goldilocks

**Week 3–4: Quick wins**
- Right-size the top 20 most over-provisioned deployments
- Enable scale-to-zero for dev/staging overnight
- Clean up zombie workloads

**Month 2: Automation**
- Karpenter or Cluster Autoscaler with a spot pool
- HPA/KEDA for key services
- Automatic namespace cleanup

**Month 3+: FinOps culture**
- Chargeback reports per team
- Cost budgets and alerting
- Quarterly reviews with development teams
Our Experience: Logistics Client with 500+ Microservices¶
At a logistics client (500+ microservices on Kubernetes), we reduced cloud spend by 42% in 3 months:
- Right-sizing (Goldilocks + VPA): 25% savings — most pods had 4-8x over-provisioned requests
- Spot instances (70% of stateless workloads): 12% savings — Karpenter with fallback to on-demand
- Scale-to-zero for dev/staging: 5% savings — KEDA + CronJob overnight scale-down
- Total: 42% reduction in monthly cloud costs in the first quarter
- Optimization was performed with zero impact on 99.9% SLA
Conclusion¶
Kubernetes cost optimization is not a one-time action — it’s a continuous process. Start with visibility (Kubecost), continue with right-sizing (Goldilocks + VPA), and automate scaling (HPA, KEDA, Karpenter). The result is infrastructure that grows with your needs, not despite them.
CORE SYSTEMS helps enterprise organizations implement FinOps culture and Kubernetes cost governance. Contact us for an audit of your infrastructure.
Need help with implementation?
Our experts can help with design, implementation, and operations. From architecture to production.
Contact us