Autoscaling
The WorkloadDeployment resource implements the standard Kubernetes /scale subresource.
That means a wasmCloud WorkloadDeployment can be scaled like any other scalable Kubernetes resource: with kubectl scale, the Horizontal Pod Autoscaler (HPA), or KEDA.
What scales
Autoscaling changes the number of component instances running across the host group. Hosts are a separate pool managed by the operator's host group Deployment; scaling a WorkloadDeployment schedules more instances of the component onto existing hosts, up to each host's pool size.
This has two practical consequences:
- The host group is a precondition for autoscaling. If you set maxReplicas: 100 but the host group only has capacity for 30 component instances, HPA will hold at 30 and surface the cap in its status. Scale the host group (kubectl scale deployment hostgroup-default -n wasmcloud --replicas=N, or via the runtime.hostGroups[].replicas Helm value) ahead of expected demand, or autoscale the host group separately.
- Pod- and resource-based HPA metrics do not apply. wasmCloud components are not pods, so HPA's built-in Resource (CPU/memory) and Pods metric types have nothing to read. Use External or Object metrics (typically delivered via the Prometheus Adapter or a KEDA scaler).
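The capacity ceiling described in the first point is simple arithmetic. The sketch below is illustrative Python, not part of wasmCloud or the operator. It assumes every host in the group offers the same number of instance slots; `hosts` and `slots_per_host` are hypothetical names, and real capacity depends on how your host group is configured.

```python
def effective_ceiling(max_replicas: int, hosts: int, slots_per_host: int) -> int:
    """Return the replica count an autoscaler can actually reach.

    Assumption: host group capacity is hosts * slots_per_host, with
    identical hosts. Both parameter names are hypothetical.
    """
    capacity = hosts * slots_per_host
    return min(max_replicas, capacity)


# 3 hosts with room for 10 instances each give a capacity of 30,
# so maxReplicas: 100 is effectively capped at 30.
print(effective_ceiling(max_replicas=100, hosts=3, slots_per_host=10))  # 30

# Growing the host group to 10 hosts lifts the ceiling to maxReplicas.
print(effective_ceiling(max_replicas=100, hosts=10, slots_per_host=10))  # 100
```

Running the same calculation against your own host count before setting maxReplicas shows whether the host group or the autoscaler will be the limiting factor.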
Imperative scaling
You can set replicas directly using standard tooling such as kubectl:
kubectl scale workloaddeployment hello --replicas=5
WorkloadDeployment.spec.replicas defaults to 1 (matching native Deployment semantics), so the field can be omitted from manifests where one replica is the desired baseline.
Horizontal Pod Autoscaler (HPA)
The following manifest is a complete Horizontal Pod Autoscaler (HPA) that targets a WorkloadDeployment:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hello
spec:
  scaleTargetRef:
    apiVersion: runtime.wasmcloud.dev/v1alpha1
    kind: WorkloadDeployment
    name: hello
  minReplicas: 1
  maxReplicas: 100
  metrics:
    - type: External
      external:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: '10'
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 60
The example above scales on an external http_requests_per_second metric exposed through the Prometheus Adapter or an equivalent External Metrics API provider. Any source the External Metrics API can read works; the metric simply needs to live outside the workload pods, since wasmCloud workloads aren't pods.
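To make the AverageValue target in the HPA manifest above concrete, here is a rough Python sketch of how HPA turns an external metric into a replica count. It is a simplification under stated assumptions: it uses the default 10% tolerance, ignores the stabilization window and any scaling policies, and the function and parameter names are illustrative rather than part of any Kubernetes API.

```python
import math


def desired_replicas(metric_total: float, target_average: float,
                     current: int, min_replicas: int, max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """Approximate the HPA calculation for an External AverageValue target.

    Assumes current >= 1. Names are illustrative, not a Kubernetes API.
    """
    # Ratio of the observed per-replica average to the target average.
    ratio = metric_total / (target_average * current)
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        return current
    # Otherwise pick enough replicas to bring the average down to the target.
    wanted = math.ceil(metric_total / target_average)
    # Clamp to the minReplicas / maxReplicas bounds from the manifest.
    return max(min_replicas, min(max_replicas, wanted))


# 250 req/s against a target of 10 per replica asks for 25 replicas.
print(desired_replicas(250, 10, current=5, min_replicas=1, max_replicas=100))   # 25

# 52 req/s across 5 replicas is within 10% of the target, so nothing changes.
print(desired_replicas(52, 10, current=5, min_replicas=1, max_replicas=100))    # 5

# 5000 req/s would want 500 replicas but is clamped to maxReplicas.
print(desired_replicas(5000, 10, current=5, min_replicas=1, max_replicas=100))  # 100
```

With averageValue: '10', each replica is therefore expected to absorb roughly 10 requests per second before HPA adds another.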
Kubernetes Event-Driven Autoscaling (KEDA)
KEDA is a CNCF Graduated project that serves as an event-driven autoscaler. KEDA's ScaledObject wraps the same /scale subresource and adds a large catalog of scalers (Prometheus, Kafka, NATS, OTel, cloud queue services, and so on). Behind the scenes, KEDA creates and manages an HPA on your behalf.
A path that works particularly well with wasmCloud is scaling on a metric emitted by the workload itself: a Wasm component exporting OpenTelemetry from its handlers becomes its own scale signal.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: hello
spec:
  scaleTargetRef:
    apiVersion: runtime.wasmcloud.dev/v1alpha1
    kind: WorkloadDeployment
    name: hello
  minReplicaCount: 1
  maxReplicaCount: 100
  pollingInterval: 15
  cooldownPeriod: 60
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
        query: sum(rate(http_server_requests_total{workload="hello"}[1m]))
        threshold: '10'
The component itself doesn't need to be aware of KEDA: it emits standard OpenTelemetry metrics, Prometheus (or another OTel collector) scrapes them, and KEDA closes the loop.
Selector label
The operator stamps a managed label on every WorkloadReplicaSet it creates:
runtime.wasmcloud.dev/workload-deployment=<deployment-name>
This label backs the /scale subresource's selector field. You can use it from kubectl to list a deployment's replica sets:
kubectl get workloadreplicaset -l runtime.wasmcloud.dev/workload-deployment=hello
Do not edit or remove the label by hand; the operator owns it and rewrites it on every reconcile.
Status fields
For visibility into what HPA or KEDA observes, WorkloadDeployment.status exposes two scale-subresource fields:
| Field | Type | Description |
|---|---|---|
| currentReplicas | int32 | Flat replica count read by HPA via statusReplicasPath. Mirrors .status.replicas.current as a scalar. |
| selector | string | Serialized label selector read by HPA via labelSelectorPath. Populated even during a fresh deploy so HPA never sees a missing selector mid-rollout. |
Both fields are operator-managed; they are not part of the user-authored spec.
When autoscaling isn't the right answer
A few cases where WorkloadDeployment autoscaling is a worse fit than alternatives:
- Bursty, short-lived traffic on a static host group. Each new component instance starts in milliseconds, but the value of autoscaling drops if the host group is already large enough to handle peaks. A higher per-component poolSize can absorb bursts without any controller in the loop.
- Scale-to-zero. Setting minReplicas: 0 is supported, but only the workload scales to zero; the host group keeps running. If you need true zero-cost idle, scale the host group separately or shut down the deployment entirely.
- Tight latency targets where cold-start of a new instance matters. Even at sub-millisecond instantiation, a scale-up step that lands on a p99-sensitive request is visible; pre-warming via a higher minReplicas is usually cheaper than chasing a metric.