Kubernetes Cost Optimization: How to Save 40–60% on Cloud Infrastructure in 2026¶
At CORE SYSTEMS, we operate Kubernetes clusters for clients like Packeta (500+ microservices) and Ceska sporitelna — with 99.9% SLA and a team of 50+ experts. From 15+ years of experience, we know that cloud infrastructure costs are growing faster than productivity. The CNCF 2026 survey shows that the average organization overpays for Kubernetes infrastructure by 35–50%. This article is a practical guide to eliminating these losses — based on concrete optimizations we have performed for our clients.
Where money disappears — anatomy of Kubernetes waste¶
Over-provisioning resource requests¶
The biggest source of waste. Developers set requests and limits conservatively because nobody wants their application to be OOMKilled. The result: average CPU utilization in a cluster is typically 15–25%, and memory 40–60%.
```yaml
resources:
  requests:
    cpu: "500m"      # actual usage: ~50m
    memory: "512Mi"  # actual usage: ~120Mi
  limits:
    cpu: "2000m"
    memory: "2Gi"
```
This pod reserves, and is billed for, 500m CPU and 512Mi of memory, even though roughly 90% of that capacity is never used.
Idle namespaces and zombie workloads¶
Development and staging environments run 24/7, despite being active only 8 hours a day. Forgotten jobs, completed CronJobs with history, old ReplicaSets — you’re paying for all of it.
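Much of this debris can be prevented declaratively: Kubernetes can garbage-collect finished Jobs via `ttlSecondsAfterFinished`, and CronJobs can cap how many completed Jobs they keep. A minimal sketch (the names, images, and the specific TTL/limit values are illustrative):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: one-off-migration
spec:
  ttlSecondsAfterFinished: 3600  # delete the Job one hour after it finishes
  template:
    spec:
      containers:
        - name: migrate
          image: registry.example.com/migrate:latest
      restartPolicy: Never
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report
spec:
  schedule: "0 2 * * *"
  successfulJobsHistoryLimit: 1  # keep only the most recent successful Job
  failedJobsHistoryLimit: 3      # keep a few failures for debugging
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: report
              image: registry.example.com/report:latest
          restartPolicy: OnFailure
```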
Suboptimal instance types¶
Running a memory-intensive workload on a compute-optimized instance (or vice versa) — you pay for capacity you can’t use.
Resource Optimization — concrete steps¶
1. Goldilocks — automated resource request recommendations¶
Goldilocks analyzes actual usage via VPA (Vertical Pod Autoscaler) and recommends the right values.
```bash
# Installation
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install goldilocks fairwinds-stable/goldilocks \
  --namespace goldilocks \
  --create-namespace

# Label a namespace for analysis
kubectl label namespace production goldilocks.fairwinds.com/enabled=true

# Open the Goldilocks dashboard
kubectl -n goldilocks port-forward svc/goldilocks-dashboard 8080:80
```
The dashboard shows recommended requests/limits for each deployment based on actual P50/P99 usage over the past N days.
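The underlying idea can be sketched in a few lines of Python: derive a request from the P50 of observed usage and a limit from the P99, each with a safety margin. This is a simplified illustration, not Goldilocks' or VPA's actual estimator (the function name and the 15% margin are our own):

```python
import statistics


def suggest_resources(usage_mcpu: list[float], margin: float = 1.15) -> tuple[int, int]:
    """Suggest (request, limit) in millicores from observed usage samples.

    Request is based on P50, limit on P99, each with a safety margin.
    Illustrative heuristic only -- VPA/Goldilocks use their own estimators.
    """
    cuts = statistics.quantiles(sorted(usage_mcpu), n=100)  # 99 percentile cut points
    p50, p99 = cuts[49], cuts[98]
    return round(p50 * margin), round(p99 * margin)


# Example: usage samples ranging from 1 to 100 millicores
request, limit = suggest_resources([float(v) for v in range(1, 101)])
print(request, limit)  # modest request, more headroom in the limit
```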
2. VPA (Vertical Pod Autoscaler) in recommendation mode¶
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-service-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  updatePolicy:
    updateMode: "Off"  # recommendation only, does not apply automatically
  resourcePolicy:
    containerPolicies:
      - containerName: api
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: 2000m
          memory: 4Gi
        controlledResources: ["cpu", "memory"]
```

```bash
# Show recommendations
kubectl describe vpa api-service-vpa -n production
# Look for the "Target" values for cpu and memory
```
3. HPA with custom metrics¶
Horizontal Pod Autoscaler based on custom business metrics (not just CPU):
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-service-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # wait 5 min before scale-down
      policies:
        - type: Percent
          value: 25
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
```
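For intuition, the scaling decision follows the formula from the Kubernetes HPA documentation: `desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric)`, taking the largest proposal across all configured metrics and clamping to min/max replicas. A minimal sketch (it ignores the HPA's tolerance band and the stabilization windows shown above):

```python
from math import ceil


def desired_replicas(current: int, metrics: list[tuple[float, float]],
                     min_replicas: int = 2, max_replicas: int = 20) -> int:
    """metrics: (currentValue, targetValue) pairs; HPA takes the max proposal."""
    proposal = max(ceil(current * cur / target) for cur, target in metrics)
    return max(min_replicas, min(max_replicas, proposal))


# 4 replicas, CPU at 90% vs a 70% target, 150 req/s vs a 100 req/s target:
print(desired_replicas(4, [(90, 70), (150, 100)]))  # -> 6
```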
4. KEDA for event-driven autoscaling¶
For workloads driven by queues (Kafka, SQS, Redis):
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaledobject
  namespace: production
spec:
  scaleTargetRef:
    name: queue-worker
  minReplicaCount: 0   # scale to zero!
  maxReplicaCount: 50
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.eu-west-1.amazonaws.com/123456789/my-queue
        queueLength: "10"  # one replica per 10 messages
        awsRegion: eu-west-1
```
Scale-to-zero is key — a worker with no messages = 0 pods = 0 cost.
Node Optimization¶
Spot/Preemptible instances with Karpenter¶
Karpenter is an open-source node autoscaler, originally developed at AWS, that can intelligently mix spot and on-demand instances:
```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]  # ARM = cheaper
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
  limits:
    cpu: 1000
    memory: 4000Gi
```
Karpenter consolidates nodes automatically — if 3 workloads sit on 3 nodes, it repacks them onto 1 node and shuts down the remaining 2.
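Consolidation works by evicting pods and rescheduling them elsewhere, so pair it with PodDisruptionBudgets; otherwise a repack can briefly take every replica of a service down. A minimal example (the name and selector are illustrative):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-service-pdb
  namespace: production
spec:
  minAvailable: 1  # keep at least one replica up during node consolidation
  selector:
    matchLabels:
      app: api-service
```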
Regular cleanup of idle nodes¶
```bash
#!/bin/bash
# Identify candidate underutilized nodes
kubectl get nodes -o json | jq -r '
  .items[]
  | select(.status.conditions[] | select(.type=="Ready" and .status=="True"))
  | {
      name: .metadata.name,
      cpu_capacity: .status.capacity.cpu,
      mem_capacity: .status.capacity.memory,
      age: .metadata.creationTimestamp
    }
  | .name + " | CPU: " + .cpu_capacity + " | Age: " + .age
'

# Check actual utilization via metrics-server (skip the header row)
kubectl top nodes --sort-by=cpu | awk 'NR>1 && $3+0 < 20 {print "LOW CPU:", $0}'
```
Namespace and Environment Cleanup¶
Automatic deletion of development environments¶
```python
#!/usr/bin/env python3
"""Auto-cleanup of idle development namespaces."""
import json
import subprocess
from datetime import datetime, timezone, timedelta


def get_namespace_last_activity(namespace: str) -> datetime:
    """Return the timestamp of the most recent event in a namespace."""
    result = subprocess.run(
        ["kubectl", "get", "events", "-n", namespace,
         "--sort-by=.lastTimestamp", "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    events = json.loads(result.stdout)
    if not events["items"]:
        # No events -- note that events expire after ~1h by default,
        # so this is a coarse idleness signal
        return datetime.min.replace(tzinfo=timezone.utc)
    timestamp = events["items"][-1]["lastTimestamp"]
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def cleanup_idle_namespaces(max_idle_hours: int = 48) -> None:
    """Delete namespaces with a 'dev-' prefix that have been idle for more than N hours."""
    result = subprocess.run(
        ["kubectl", "get", "namespaces", "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    namespaces = json.loads(result.stdout)
    now = datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=max_idle_hours)
    for ns in namespaces["items"]:
        name = ns["metadata"]["name"]
        if not name.startswith("dev-"):
            continue
        created = datetime.fromisoformat(
            ns["metadata"]["creationTimestamp"].replace("Z", "+00:00"))
        # Treat creation as activity so freshly created namespaces survive
        last_activity = max(get_namespace_last_activity(name), created)
        if last_activity < cutoff:
            idle_hours = (now - last_activity).total_seconds() / 3600
            print(f"Deleting idle namespace {name} (idle {idle_hours:.0f}h)")
            subprocess.run(["kubectl", "delete", "namespace", name], check=True)


if __name__ == "__main__":
    cleanup_idle_namespaces(max_idle_hours=48)
```
CronJob for overnight scale-down¶
```yaml
# Scale staging to 0 replicas overnight; the original replica counts are
# recorded in an annotation so the morning job can restore them exactly
apiVersion: batch/v1
kind: CronJob
metadata:
  name: staging-scale-down
  namespace: staging
spec:
  schedule: "0 20 * * 1-5"  # Monday-Friday 20:00
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler-sa
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest
              command:
                - /bin/sh
                - -c
                - |
                  for obj in $(kubectl get deploy,statefulset -n staging -o name); do
                    r=$(kubectl get "$obj" -n staging -o jsonpath='{.spec.replicas}')
                    kubectl annotate "$obj" -n staging original-replicas="$r" --overwrite
                    kubectl scale "$obj" -n staging --replicas=0
                  done
          restartPolicy: OnFailure
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: staging-scale-up
  namespace: staging
spec:
  schedule: "0 8 * * 1-5"  # Monday-Friday 8:00
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler-sa
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest
              command:
                - /bin/sh
                - -c
                - |
                  # Restore the replica counts recorded at scale-down
                  for obj in $(kubectl get deploy,statefulset -n staging -o name); do
                    r=$(kubectl get "$obj" -n staging -o jsonpath="{.metadata.annotations['original-replicas']}")
                    kubectl scale "$obj" -n staging --replicas="${r:-1}"
                  done
          restartPolicy: OnFailure
```
FinOps — Kubecost and cost visibility¶
Kubecost for granular cost tracking¶
```bash
# Install Kubecost
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --create-namespace \
  --set kubecostToken="your-token" \
  --set prometheus.server.persistentVolume.size=50Gi
```
Kubecost lets you see costs per namespace, deployment, label, or team — essential for a chargeback model.
Cost allocation labels¶
```yaml
# Every workload must carry cost-allocation labels
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
  labels:
    app: api-service
    team: backend           # owning team
    cost-center: "CC-1042"  # cost center
    environment: production
    product: core-platform
# (spec omitted for brevity)
```

```bash
# Kubecost allocation query per team
# (e.g. via: kubectl port-forward -n kubecost svc/kubecost-cost-analyzer 9090:9090)
curl -s "http://localhost:9090/model/allocation?window=30d&aggregate=label:team&idle=true" | \
  jq -r '.data[0] | to_entries | sort_by(-.value.totalCost)
         | .[] | "\(.key): $\(.value.totalCost | round)"'
```
Results — real numbers¶
After implementing these measures in a typical enterprise cluster:
| Optimization | Typical savings |
|---|---|
| Right-sizing requests (Goldilocks) | 20–30% |
| Spot instances (70% of workloads) | 60–70% on compute |
| Scale-to-zero for dev/staging | 40–60% on nonprod |
| Karpenter consolidation | 10–20% |
| Cleanup idle resources | 5–15% |
| Total | 40–60% of total costs |
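Note that these percentages apply to different bases (all compute, the spot-eligible share, nonprod only), so they compound rather than add. A back-of-the-envelope model with hypothetical numbers shows how the components land in the 40–60% range:

```python
# Hypothetical monthly bill in USD -- replace with your own Kubecost data
prod_compute, nonprod_compute, other = 60_000, 25_000, 15_000

# Right-sizing: 25% off all compute
prod_compute *= 0.75
nonprod_compute *= 0.75

# Spot: ~65% discount on the 70% of prod compute that is spot-eligible
prod_compute *= 1 - 0.70 * 0.65

# Scale-to-zero overnight/weekends: 50% off nonprod
nonprod_compute *= 0.50

new_total = prod_compute + nonprod_compute + other
saving_pct = 100 * (1 - new_total / 100_000)
print(f"{saving_pct:.1f}% saved")
```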
Implementation plan¶
**Week 1–2: Visibility**
- Deploy Kubecost or OpenCost
- Add cost allocation labels to all workloads
- Audit resource utilization via Goldilocks

**Week 3–4: Quick wins**
- Right-size the top 20 most over-provisioned deployments
- Enable scale-to-zero for dev/staging overnight
- Clean up zombie workloads

**Month 2: Automation**
- Karpenter or Cluster Autoscaler with a spot pool
- HPA/KEDA for key services
- Automatic namespace cleanup

**Month 3+: FinOps culture**
- Chargeback reports per team
- Cost budgets and alerting
- Quarterly reviews with development teams
Our Experience: Logistics Client with 500+ Microservices¶
At a logistics client (500+ microservices on Kubernetes), we reduced cloud spend by 42% in 3 months:
- Right-sizing (Goldilocks + VPA): 25% savings — most pods had 4-8x over-provisioned requests
- Spot instances (70% of stateless workloads): 12% savings — Karpenter with fallback to on-demand
- Scale-to-zero for dev/staging: 5% savings — KEDA + CronJob overnight scale-down
- Total: 42% reduction in monthly cloud costs in the first quarter
- Optimization was performed with zero impact on 99.9% SLA
Conclusion¶
Kubernetes cost optimization is not a one-time action — it’s a continuous process. Start with visibility (Kubecost), continue with right-sizing (Goldilocks + VPA), and automate scaling (HPA, KEDA, Karpenter). The result is infrastructure that grows with your needs, not despite them.
CORE SYSTEMS helps enterprise organizations implement FinOps culture and Kubernetes cost governance. Contact us for an audit of your infrastructure.
Need help with implementation?
Our experts can help with design, implementation, and operations. From architecture to production.
Contact us