Version: v2

Autoscaling

The WorkloadDeployment resource implements the standard Kubernetes /scale subresource.

That means a wasmCloud WorkloadDeployment can be scaled like any other scalable Kubernetes resource: with kubectl scale, the Horizontal Pod Autoscaler (HPA), or KEDA.

What scales

Autoscaling changes the number of component instances running across the host group. Hosts are a separate pool managed by the operator's host group Deployment; scaling a WorkloadDeployment schedules more instances of the component onto existing hosts, up to each host's pool size.

This has two practical consequences:

  • The host group is a precondition for autoscaling. If you set maxReplicas: 100 but the host group only has capacity for 30 component instances, HPA will hold at 30 and surface the cap in its status. Scale the host group (kubectl scale deployment hostgroup-default -n wasmcloud --replicas=N, or via the runtime.hostGroups[].replicas Helm value) ahead of expected demand, or autoscale the host group separately.
  • Pod- and resource-based HPA metrics do not apply. wasmCloud components are not pods, so HPA's built-in Resource (CPU/memory) and Pods metric types have nothing to read. Use External or Object metrics (typically delivered via the Prometheus Adapter or a KEDA scaler).
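The capacity cap in the first bullet is simple arithmetic. The sketch below assumes every host in the group offers the same pool size, so total capacity is hosts × pool size. That uniform-pool assumption and the function names are illustrative only, not part of the operator. One function clamps a requested replica count to capacity; the other works out how many hosts a target would need.

```shell
#!/usr/bin/env bash
# Hypothetical helpers. ASSUMPTION: every host in the group offers the
# same per-host pool size, so capacity = hosts * pool_size.

# Print how many component instances can actually run:
# min(requested, hosts * pool_size).
effective_replicas() {
  local requested="$1" hosts="$2" pool_size="$3"
  local capacity=$(( hosts * pool_size ))
  if (( requested < capacity )); then
    echo "$requested"
  else
    echo "$capacity"
  fi
}

# Print the smallest host count whose capacity covers the target
# (integer ceiling of target / pool_size).
hosts_needed() {
  local target="$1" pool_size="$2"
  echo $(( (target + pool_size - 1) / pool_size ))
}

# Example: maxReplicas 100 on 3 hosts with a pool size of 10 each.
#   effective_replicas 100 3 10   -> 30
#   hosts_needed 100 10           -> 10
# Pre-scaling the host group to fit, as described above:
#   kubectl scale deployment hostgroup-default -n wasmcloud --replicas="$(hosts_needed 100 10)"
```

Real host groups may mix pool sizes; in that case capacity is the sum of each host's pool size rather than a product.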

Imperative scaling

You can set replicas directly using standard tooling such as kubectl:

shell
kubectl scale workloaddeployment hello --replicas=5

WorkloadDeployment.spec.replicas defaults to 1 (matching native Deployment semantics), so the field can be omitted from manifests where one replica is the desired baseline.

Horizontal Pod Autoscaler (HPA)

Here is a complete HPA manifest that targets a WorkloadDeployment:

yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hello
spec:
  scaleTargetRef:
    apiVersion: runtime.wasmcloud.dev/v1alpha1
    kind: WorkloadDeployment
    name: hello
  minReplicas: 1
  maxReplicas: 100
  metrics:
    - type: External
      external:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: '10'
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 60

The example above scales on an external http_requests_per_second metric exposed through the Prometheus Adapter or an equivalent External Metrics API provider. Any metric the External Metrics API can serve will work, as long as it doesn't depend on per-pod data, since wasmCloud workloads aren't pods.
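To reason about how the manifest above behaves, it helps to know the calculation HPA applies to an External metric with an AverageValue target: desired replicas = ceil(metric value / averageValue), clamped to [minReplicas, maxReplicas]. The sketch below reproduces only that step. It deliberately ignores HPA's default 10% tolerance band, the scale-down stabilization window, and any rate limits in behavior, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper approximating HPA's calculation for an External
# metric with target type AverageValue. SIMPLIFIED: no tolerance band,
# no stabilization window, no behavior rate limits.
#
# Usage: hpa_desired_replicas <metric_total> <average_value> <min> <max>
hpa_desired_replicas() {
  local total="$1" target="$2" min="$3" max="$4"
  local raw
  # ceil(total / target); awk handles fractional request rates.
  raw=$(awk -v t="$total" -v a="$target" \
    'BEGIN { r = t / a; c = int(r); if (c < r) c++; print c }')
  # Clamp to [minReplicas, maxReplicas].
  if (( raw < min )); then raw="$min"; fi
  if (( raw > max )); then raw="$max"; fi
  echo "$raw"
}

# With averageValue '10', minReplicas 1, maxReplicas 100:
#   hpa_desired_replicas 95 10 1 100     -> 10   (9.5 rounded up)
#   hpa_desired_replicas 2500 10 1 100   -> 100  (capped by maxReplicas)
#   hpa_desired_replicas 0 10 1 100      -> 1    (held at minReplicas)
```

Note that this is HPA's view only; as described under "What scales", the host group can still cap the number of instances that actually run.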

Kubernetes Event-Driven Autoscaling (KEDA)

KEDA is a CNCF Graduated project that serves as an event-driven autoscaler. KEDA's ScaledObject wraps the same /scale subresource and adds a large catalog of scalers (Prometheus, Kafka, NATS, OTel, cloud queue services, and so on). Behind the scenes, KEDA creates and manages an HPA on your behalf.

A path that works particularly well with wasmCloud is scaling on a metric emitted by the workload itself: a Wasm component exporting OpenTelemetry from its handlers becomes its own scale signal.

yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: hello
spec:
  scaleTargetRef:
    apiVersion: runtime.wasmcloud.dev/v1alpha1
    kind: WorkloadDeployment
    name: hello
  minReplicaCount: 1
  maxReplicaCount: 100
  pollingInterval: 15
  cooldownPeriod: 60
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
        query: sum(rate(http_server_requests_total{workload="hello"}[1m]))
        threshold: '10'

The component itself doesn't need to know about KEDA: it emits standard OpenTelemetry metrics, Prometheus collects them (directly or through an OpenTelemetry Collector), and KEDA closes the loop.
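Before wiring a query into a trigger, it can be worth running it by hand against Prometheus's HTTP query API (/api/v1/query) to confirm it returns a single series with a sensible value. The sketch below assumes the serverAddress and query from the ScaledObject above and that curl is installed. The function name and the PROM_SERVER variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run an instant PromQL query against the Prometheus
# HTTP API. ASSUMPTION: the server address matches the ScaledObject's
# serverAddress unless PROM_SERVER overrides it.
PROM_SERVER="${PROM_SERVER:-http://prometheus.monitoring.svc.cluster.local:9090}"

prom_instant_query() {
  local query="$1"
  # -G turns the --data-urlencode value into a URL query parameter, so
  # PromQL braces, quotes, and brackets are escaped correctly.
  curl -fsSG "${PROM_SERVER}/api/v1/query" --data-urlencode "query=${query}"
}

# Same query as the trigger; prints the raw JSON response:
#   prom_instant_query 'sum(rate(http_server_requests_total{workload="hello"}[1m]))'
```

The in-cluster DNS name only resolves from inside the cluster; from a workstation, pointing PROM_SERVER at a port-forwarded address is one option. With KEDA's default AverageValue metric type, the returned value divided by threshold and rounded up roughly predicts the replica count the generated HPA will request, before minReplicaCount and maxReplicaCount apply.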

Selector label

The operator stamps a managed label on every WorkloadReplicaSet it creates:

text
runtime.wasmcloud.dev/workload-deployment=<deployment-name>

This label backs the /scale subresource's selector field. You can use it from kubectl to list a deployment's replica sets:

shell
kubectl get workloadreplicaset -l runtime.wasmcloud.dev/workload-deployment=hello

Do not edit or remove the label by hand; the operator owns it and rewrites it on every reconcile.
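The same label can also be shown as a column rather than used as a filter. The sketch below lists WorkloadReplicaSets across all namespaces with the owning deployment displayed via kubectl's -L (--label-columns) flag. It only reads the label and never modifies it. The function and variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Label key the operator stamps on every WorkloadReplicaSet.
WD_LABEL="runtime.wasmcloud.dev/workload-deployment"

# Hypothetical helper: list replica sets in all namespaces, adding a
# column with the owning WorkloadDeployment's name. Read-only.
replicasets_by_deployment() {
  kubectl get workloadreplicaset --all-namespaces -L "${WD_LABEL}"
}

#   replicasets_by_deployment
```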

Status fields

For visibility into what HPA or KEDA observes, WorkloadDeployment.status exposes two scale-subresource fields:

| Field | Type | Description |
| --- | --- | --- |
| currentReplicas | int32 | Flat replica count read by HPA via statusReplicasPath. Mirrors .status.replicas.current as a scalar. |
| selector | string | Serialized label selector read by HPA via labelSelectorPath. Populated even during a fresh deploy so HPA never sees a missing selector mid-rollout. |

Both fields are operator-managed; they are not part of the user-authored spec.
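To see exactly what HPA or KEDA reads, you can query these fields directly. The sketch below defaults to the default namespace. It also assumes the resource's plural name is workloaddeployments (the usual lowercase-plural convention, not confirmed here). The function names are hypothetical. The first function prints both status fields with a JSONPath template. The second fetches the /scale subresource through its raw API path, which for custom resources returns an autoscaling/v1 Scale object.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for inspecting scale-related status.
# ASSUMPTION: CRD plural "workloaddeployments" in group
# runtime.wasmcloud.dev, version v1alpha1.

# Print .status.currentReplicas and .status.selector, one per line.
scale_status() {
  local name="$1" ns="${2:-default}"
  kubectl get workloaddeployment "$name" -n "$ns" \
    -o jsonpath='{.status.currentReplicas}{"\n"}{.status.selector}{"\n"}'
}

# Fetch the /scale subresource itself, which is what HPA reads.
scale_subresource() {
  local name="$1" ns="${2:-default}"
  kubectl get --raw \
    "/apis/runtime.wasmcloud.dev/v1alpha1/namespaces/${ns}/workloaddeployments/${name}/scale"
}

#   scale_status hello
#   scale_subresource hello
```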

When autoscaling isn't the right answer

A few cases where WorkloadDeployment autoscaling is a worse fit than alternatives:

  • Bursty, short-lived traffic on a static host group. Each new component instance starts in milliseconds, but autoscaling adds little if the host group is already large enough to handle peaks. A higher per-component poolSize can absorb bursts without any controller in the loop.
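  • Scale-to-zero. Scaling the workload to zero is supported (with KEDA's minReplicaCount: 0, or with HPA's minReplicas: 0 on clusters where the HPAScaleToZero feature gate is enabled), but only the workload scales to zero; the host group keeps running. If you need true zero-cost idle, scale the host group separately or shut down the deployment entirely.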
  • Tight latency targets where cold-start of a new instance matters. Even at sub-millisecond instantiation, a scale-up step that lands on a p99-sensitive request is visible; pre-warming via a higher minReplicas is usually cheaper than chasing a metric.