Resource requests: the prerequisite
The Horizontal Pod Autoscaler (HPA) adjusts a Deployment's
replica count based on observed metrics, typically CPU. For the HPA
to express CPU usage as a percentage, every pod must declare how much CPU it
expects to use. That number is resources.requests.cpu.
Look at the starter web Deployment:
cat autoscaling/web.yaml
Note the resource block:
resources:
  requests:
    cpu: "100m"
  limits:
    cpu: "200m"
100m is 1/10 of a CPU core. The HPA will compare actual CPU usage
against this number, not against the limit.
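To make the arithmetic concrete, here is a minimal bash sketch of how a utilization percentage follows from the request. The function name cpu_utilization_pct is hypothetical. It assumes every pod has the same request, which holds for pods of one Deployment. It divides total usage by total requested CPU, which for equal requests is the same as averaging per-pod percentages. The limit never appears in the calculation.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: CPU utilization (%) relative to requests, not limits.
# Assumption: all pods request the same CPU (one Deployment's pods).
# Usage: cpu_utilization_pct REQUEST_MILLICORES USAGE_MILLICORES...
cpu_utilization_pct() {
  local request=$1
  shift
  if (( $# == 0 )); then
    echo "usage: cpu_utilization_pct REQUEST USAGE..." >&2
    return 1
  fi
  local total_usage=0 pod_count=0 usage
  for usage in "$@"; do
    total_usage=$(( total_usage + usage ))
    pod_count=$(( pod_count + 1 ))
  done
  # Total usage over total requested CPU, as an integer percentage.
  echo $(( total_usage * 100 / (request * pod_count) ))
}

cpu_utilization_pct 100 80       # one pod, 80m of a 100m request -> prints 80
cpu_utilization_pct 100 120 80   # two pods, 200m of 200m          -> prints 100
cpu_utilization_pct 100 180      # above request, under 200m limit -> prints 180
```

The last call shows why utilization can exceed 100%: a pod may burst past its request up to its limit, but the HPA still measures it against the request.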
Apply it:
kubectl apply -f autoscaling/web.yaml
Watch the single replica come up:
kubectl get deployment web
Wait until READY shows 1/1. Click Verify step.
Hint
`kubectl apply -f autoscaling/web.yaml` deploys a single replica with `cpu: 100m` requests.
Create the HorizontalPodAutoscaler
The HPA is a separate object that points at a target Deployment.
Create one imperatively from the CLI:
kubectl autoscale deployment web --cpu-percent=50 --min=1 --max=4
This says: keep web at between 1 and 4 replicas, aiming for an
average CPU utilization of 50% of its requested CPU.
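The Kubernetes documentation gives the HPA's core formula as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The bash function desired_replicas below is a hypothetical, simplified sketch of that formula. It applies the default 10% tolerance and clamps the result to --min/--max. It ignores unready pods, missing metrics and the scaling-rate policies that a real HPA also applies.

```bash
#!/usr/bin/env bash
# Simplified sketch of the HPA's core formula:
#   desired = ceil(current * currentUtil / targetUtil), clamped to [min, max]
# Ignores unready pods, missing metrics and scaling-rate policies.
# Usage: desired_replicas CURRENT CURRENT_UTIL_PCT TARGET_UTIL_PCT MIN MAX
desired_replicas() {
  local current=$1 util=$2 target=$3 min=$4 max=$5
  local diff=$(( util - target ))
  (( diff < 0 )) && diff=$(( -diff ))
  local desired
  # Default tolerance: if util/target is within 0.1 of 1.0, don't scale.
  if (( diff * 10 <= target )); then
    desired=$current
  else
    # Integer ceiling division: ceil(a / b) == (a + b - 1) / b
    desired=$(( (current * util + target - 1) / target ))
  fi
  (( desired < min )) && desired=$min
  (( desired > max )) && desired=$max
  echo "$desired"
}

desired_replicas 1 180 50 1 4   # ceil(3.6) = 4          -> prints 4
desired_replicas 3 53 50 1 4    # within 10% tolerance   -> prints 3
desired_replicas 4 10 50 1 4    # ceil(0.8) = 1          -> prints 1
desired_replicas 1 400 50 1 4   # ceil(8) = 8, capped    -> prints 4
```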
See it:
kubectl get hpa
The columns to watch:
REFERENCE: the Deployment it controls.
TARGETS: <current>/<target> CPU; <unknown>/50% if metrics
haven't arrived yet.
MINPODS / MAXPODS / REPLICAS: the bounds and the current count.
For the full picture:
kubectl describe hpa web
While metrics-server is still warming up, the Events section may
report that the HPA was unable to compute the replica count. Getting
metrics flowing is the next step.
Click Verify step.
Hint
`kubectl autoscale deployment web --cpu-percent=50 --min=1 --max=4`.
Wait for metrics-server to report
The HPA reads CPU usage from metrics-server. For this lab, the
scenario installer applied the upstream metrics-server manifest in
the background when the lab started.
Check it's running:
kubectl get deployment metrics-server -n kube-system
READY should be 1/1. If it isn't yet, give it a minute: pulling
the image takes ~15-30 seconds the first time.
Once the deployment is Ready, ask for live pod metrics:
kubectl top pods
If you see "No resources found." or "metrics not available yet",
wait ~30 seconds and try again. metrics-server polls every 15s and
needs a couple of cycles before it has numbers.
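Polling by hand gets tedious while metrics-server warms up. The sketch below is a hypothetical bash helper named retry (not part of kubectl) that re-runs a command until it exits successfully or the attempts run out. It assumes the command signals "not ready yet" with a non-zero exit status. That holds for an error such as "metrics not available yet", but may not hold for an empty "No resources found." listing.

```bash
#!/usr/bin/env bash
# Hypothetical helper: re-run a command until it succeeds or attempts run out.
# Assumes failure is signalled by a non-zero exit status.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  local attempts=$1 delay=$2 i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    if "$@"; then
      return 0
    fi
    echo "attempt $i/$attempts failed; retrying in ${delay}s" >&2
    # Don't sleep after the final failed attempt.
    (( i < attempts )) && sleep "$delay"
  done
  return 1
}
```

For example, retry 8 15 kubectl top pods keeps trying for roughly two minutes, which covers several metrics-server polling cycles.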
Once kubectl top pods returns CPU/memory rows, look at the HPA
again:
kubectl get hpa
TARGETS is no longer <unknown>/50% โ it shows the live percentage.
That means the HPA is now functional.
Click Verify step.
Hint
`kubectl top pods`: once it works, the HPA's TARGETS column stops showing `<unknown>`.
Generate load and watch it scale up
The load generator sends its traffic to a Service called web.
Create it:
kubectl expose deployment web --port=8080 --target-port=8080
Now apply the load generator, a Deployment running a busybox
container that hammers web with wget in a tight loop:
kubectl apply -f autoscaling/loadgen.yaml
One load pod isn't enough to drive web's CPU over 50%. Pile on
more:
kubectl scale deployment loadgen --replicas=5
In another tab (or in this same one), watch the HPA reach for more
replicas. Press Ctrl+C to stop watching:
kubectl get hpa web -w
The progression takes 1-3 minutes. You should see:
TARGETS climb past 50%.
REPLICAS step up, for example from 1 to 2, then 3, then 4 (capped
by max). The exact steps depend on how far CPU overshoots the target.
kubectl get pods -l run=web shows the new pods.
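Scale-up takes a while even though, by default, the HPA applies no
stabilization window to scale-up. The controller re-evaluates about
every 15 seconds, metrics-server data lags real usage, and new pods
need time to start before they absorb load. The default policies cap
growth per 15-second period at the larger of doubling the replica
count or adding 4 pods, and never beyond maxReplicas.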
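More precisely, the default scale-up limit can be sketched in bash as follows. The function name max_after_scale_up is hypothetical. It assumes the default policies listed in the Kubernetes HPA documentation: a Percent policy of 100 and a Pods policy of 4, each per 15 seconds, with selectPolicy Max. It shows only the rate cap, not the replica calculation itself.

```bash
#!/usr/bin/env bash
# Sketch of the default HPA scale-up cap for one 15-second period,
# assuming default policies: Percent 100, Pods 4, selectPolicy Max.
# Usage: max_after_scale_up CURRENT MAX_REPLICAS
max_after_scale_up() {
  local current=$1 max=$2
  local by_percent=$(( current * 2 ))   # Percent policy: +100%
  local by_pods=$(( current + 4 ))      # Pods policy: +4 pods
  # selectPolicy Max: the more permissive policy wins.
  local limit=$(( by_percent > by_pods ? by_percent : by_pods ))
  (( limit > max )) && limit=$max       # never exceed maxReplicas
  echo "$limit"
}

max_after_scale_up 1 4    # max(2, 5) = 5, capped by max=4 -> prints 4
max_after_scale_up 6 20   # max(12, 10) = 12               -> prints 12
```

With --max=4 the rate cap alone would allow a jump from 1 to 4 in one period. Any stepwise climb in this lab therefore comes from the HPA recalculating on fresh metrics, not from the rate policy.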
Click Verify step once kubectl get deployment web shows at
least 2 ready replicas.
Hint
`kubectl apply -f autoscaling/loadgen.yaml; kubectl scale deployment/loadgen --replicas=5` drives `web`'s CPU up; replicas climb toward 4.
Remove load and watch it scale down
Remove the load:
kubectl scale deployment loadgen --replicas=0
Confirm no more load pods:
kubectl get pods -l app=loadgen
The HPA scales down more cautiously than it scales up. To avoid
flapping, by default it acts on the highest replica recommendation
from the last 5 minutes, so replicas only drop once usage has stayed
low for that long.
You can watch the wait:
kubectl describe hpa web | tail -20
The Events: section will show New size: <N> lines as the HPA
makes decisions, and the Conditions: section will indicate
ScaleDownStabilized while it's waiting.
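If the Events list is long, a small filter can pull out just the latest rescale decision. This is a hypothetical bash helper, last_new_size, that reads kubectl describe output on stdin and prints the number from the last "New size: <N>" line. It assumes the events are listed oldest first, as kubectl describe normally prints them.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print N from the most recent "New size: N" event line
# in `kubectl describe hpa` output read from stdin.
# Assumes events are listed oldest first.
last_new_size() {
  grep -o 'New size: [0-9][0-9]*' | tail -n 1 | awk '{ print $3 }'
}
```

Usage: kubectl describe hpa web | last_new_size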
After about 5 minutes (you can check earlier, but replicas won't drop until the window has passed):
kubectl get hpa web
REPLICAS settles back to 1.
Production tip: tune
behavior.scaleDown.stabilizationWindowSeconds on the HPA spec to
trade scale-down responsiveness for stability. Spiky workloads
often want a longer window; predictable workloads can shrink it.
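To see the trade-off, here is a minimal bash sketch of scale-down stabilization. It picks the highest replica recommendation whose timestamp falls inside the last WINDOW seconds, where WINDOW plays the role of stabilizationWindowSeconds (default 300). The function name and the simulated recommendations are hypothetical. The sketch also ignores the scale-down rate policies a real HPA applies.

```bash
#!/usr/bin/env bash
# Sketch: the HPA acts on the highest recommendation within the window.
# Usage: stabilized_replicas NOW WINDOW TIMESTAMP:RECOMMENDATION...
stabilized_replicas() {
  local now=$1 window=$2 entry t rec best=0
  shift 2
  for entry in "$@"; do
    t=${entry%%:*}      # timestamp in seconds
    rec=${entry##*:}    # recommended replica count
    # Only recommendations strictly newer than (now - window) count.
    if (( t > now - window && rec > best )); then
      best=$rec
    fi
  done
  echo "$best"
}

# Hypothetical run: load removed at t=0; recommendations every 15s
# were 4 up to t=0, then 1.
recs=(-30:4 -15:4 0:4 15:1 30:1 45:1 60:1 75:1)
stabilized_replicas 75 300 "${recs[@]}"   # 300s window -> prints 4
stabilized_replicas 75 60 "${recs[@]}"    # 60s window  -> prints 1
```

At t=75 a 300-second window still remembers the 4-replica recommendations, so nothing is removed. A 60-second window has already forgotten them and lets the HPA drop to 1.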
For this lab, you don't need to wait for the full scale-down. Once the
load is removed and the HPA reports the lower CPU usage in TARGETS,
click Verify step.
Hint
`kubectl scale deployment loadgen --replicas=0`: after about 5 minutes the HPA scales `web` back down.