KEP-2021: HPA supports scaling to/from zero pods for object/external metrics

Implementation History
ALPHA Implementable
Created 2020-09-26
Latest v1.36
Milestones
Alpha v1.16
Beta x.y
Stable x.y
Ownership
Owning SIG
SIG Autoscaling
Primary Authors

Release Signoff Checklist

Items marked with (R) are required prior to targeting to a milestone / release.

  • (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
  • (R) KEP approvers have approved the KEP status as implementable
  • (R) Design details are appropriately documented
  • (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
    • e2e Tests for all Beta API Operations (endpoints)
    • (R) Ensure GA e2e tests meet requirements for Conformance Tests
    • (R) Minimum Two Week Window for GA e2e tests to prove flake free
  • (R) Graduation criteria is in place
  • (R) Production readiness review completed
  • (R) Production readiness review approved
  • “Implementation History” section is up-to-date for milestone
  • User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
  • Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

Summary

Horizontal Pod Autoscaler (HPA) automatically scales the number of pods in any resource which supports the scale subresource based on observed CPU or memory utilization (or, with custom metrics support, on some other application-provided metrics) from one to many replicas. This proposal adds support for scaling from zero to many replicas and back to zero for object and external metrics.

Scaling to zero is particularly effective for cost reduction when individual pods demand substantial resource requests, such as dedicated CPUs or GPUs. Since CPU and memory utilization can only be measured on running pods, scaling to zero will be limited to object and external metrics.

Motivation

With the addition of scaling based on object and external metrics, it became possible to automatically adjust the number of running replicas based on an application-provided metric. A typical use case for this is scaling the number of queue consumers based on the length of the consumed queue.

In cases of a frequently idle queue or a less latency-sensitive workload, there is no need to keep one replica running at all times. Instead, you can dynamically scale to zero replicas, especially for workloads with high resource demands, such as those requiring GPUs. This approach not only reduces costs but also has significant energy-saving potential, particularly as GPU workloads become more prevalent. When replicas are scaled to zero, the HPA must also be capable of scaling back up as soon as messages become available.

Goals

  • Provide scaling to zero replicas for object and external metrics
  • Provide scaling from zero replicas for object and external metrics

Non-Goals

  • Provide scaling to/from zero replicas for resource metrics
  • Provide request buffering at the Kubernetes Service level

Proposal

Allow the HPA to scale from and to zero using minReplicas: 0 and an HPA status condition.

User Stories (Optional)

Story 1: Scale a heavy queue consumer on-demand

As the operator of a video processing pipeline, I would like to reduce costs. While video processing is CPU-intensive, it is not a latency-sensitive workload. Therefore, I want my video processing workers to be created only when there is actually a video to be processed, and terminated afterwards.

Notes/Constraints/Caveats (Optional)

Currently, disabling the HPA is possible by manually setting the scaled resource to replicas: 0. This works because the HPA could never reach this state by itself. As replicas: 0 is now a possible state when using minReplicas: 0, it can no longer be used to differentiate between a manually disabled HPA and a workload that was automatically scaled to zero.

Additionally, the replicas: 0 state is problematic because updating an HPA object's minReplicas from 0 to 1 behaves differently depending on the current replica count. If replicas was 0 during the update, the HPA will be disabled for the resource; if it was > 0, the HPA will continue with the new minReplicas value.

To resolve these issues, this KEP introduces an explicit ScaledToZero condition inside the HorizontalPodAutoscalerStatus. If ScaledToZero=True has been recorded, the HPA will scale the workload up from 0 ~> 1 and remove the ScaledToZero=True condition. If the condition is not found, the HPA maintains the current behavior of performing no change.

When the HPA scales a workload from 1 ~> 0, it records the ScaledToZero=True condition inside the status.
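
The rules above can be sketched as a small decision function. This is only an illustration under stated assumptions, not the controller's actual code: the action type, its labels, and the decide function are hypothetical names, and the sketch assumes that scaling up from zero happens only once the metrics ask for more than zero replicas.

```go
package main

import "fmt"

// action is a hypothetical label for the controller's next step.
type action string

const (
	paused        action = "no change: workload treated as manually disabled"
	scaleFromZero action = "scale up from 0 and remove ScaledToZero"
	stayAtZero    action = "stay at 0 and keep ScaledToZero=True"
	scaleToZero   action = "scale to 0 and record ScaledToZero=True"
	regular       action = "regular autoscaling"
)

// decide sketches one reconcile pass. current is the workload's replica
// count, desired is the count the metrics ask for, and scaledToZero reports
// whether the ScaledToZero=True condition is recorded in the HPA status.
func decide(current, desired int32, scaledToZero bool) action {
	switch {
	case current == 0 && !scaledToZero:
		// Zero replicas without the condition: keep today's "paused" behavior.
		return paused
	case current == 0 && desired > 0:
		// The HPA itself scaled to zero earlier, so it may scale back up.
		return scaleFromZero
	case current == 0:
		// Assumption: nothing to do while the metrics still ask for zero.
		return stayAtZero
	case desired == 0:
		return scaleToZero
	default:
		return regular
	}
}

func main() {
	fmt.Println(decide(0, 3, false))
	fmt.Println(decide(0, 3, true))
	fmt.Println(decide(2, 0, false))
}
```

Keeping the distinction in a status condition means the decision depends only on state the controller wrote itself, which is what lets a manually set replicas: 0 keep its existing meaning.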

Risks and Mitigations

As ScaledToZero is not an explicit property, applying a new Deployment with replicas: 0 and an HPA with minReplicas: 0 can be confusing, as the Deployment will never scale.

This should be documented and is detectable by looking at the existing ScalingActive condition.

In the future, pausing the HPA can become an explicit feature and the implicit pausing via replicas: 0 can be deprecated to remove this confusion.

Design Details

Add ScaledToZero as HPA HorizontalPodAutoscalerConditionType

const (
 // ScaledToZero indicates that the HPA controller scaled the workload to zero.
 ScaledToZero HorizontalPodAutoscalerConditionType = "ScaledToZero"
)
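
As a rough illustration of how such a condition could be recorded and cleared, the sketch below uses simplified local types rather than the real autoscaling API types. The condition struct, the helper names, and the reason string are assumptions made for this example only.

```go
package main

import "fmt"

// conditionType and condition are simplified stand-ins for the HPA
// condition types; they are not the real API definitions.
type conditionType string

const scaledToZero conditionType = "ScaledToZero"

type condition struct {
	Type   conditionType
	Status string // "True" or "False"
	Reason string // hypothetical reason text
}

// setCondition updates the entry with the same type or appends a new one.
func setCondition(conds []condition, c condition) []condition {
	for i := range conds {
		if conds[i].Type == c.Type {
			conds[i] = c
			return conds
		}
	}
	return append(conds, c)
}

// removeCondition returns a copy of conds without entries of type t.
func removeCondition(conds []condition, t conditionType) []condition {
	out := make([]condition, 0, len(conds))
	for _, c := range conds {
		if c.Type != t {
			out = append(out, c)
		}
	}
	return out
}

// isTrue reports whether a condition of type t is present with status "True".
func isTrue(conds []condition, t conditionType) bool {
	for _, c := range conds {
		if c.Type == t && c.Status == "True" {
			return true
		}
	}
	return false
}

func main() {
	var status []condition
	// Scaling 1 ~> 0: record the condition.
	status = setCondition(status, condition{scaledToZero, "True", "ScaledByHPA"})
	fmt.Println(isTrue(status, scaledToZero))
	// Scaling 0 ~> 1: remove it again.
	status = removeCondition(status, scaledToZero)
	fmt.Println(isTrue(status, scaledToZero))
}
```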

Test Plan

[x] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.

Most logic related to this KEP is contained in the HPA controller, so testing the various combinations of minReplicas, replicas and ScaledToZero should be achievable with unit tests.

Additionally, integration tests should be added: one that enables scale to zero by setting minReplicas: 0, waits for replicas to become 0, and confirms that ScaledToZero: true has been recorded; and another that increases minReplicas to 1, observes that replicas becomes 1 again, and confirms that ScaledToZero: true has been removed.

Prerequisite testing updates
Unit tests
  • /pkg/controller/podautoscaler: 2025-02-06 - 96.4
Integration tests
  • N/A in this case.
e2e tests

E2E tests will be added under https://github.com/kubernetes/kubernetes/tree/master/test/e2e/autoscaling to test scale down to 0 and scale up with this feature enabled, and scale down to 1 without this feature.

Graduation Criteria

Alpha

  • Implement the ScaledToZero condition recording
  • Ensure that all minReplicas state transitions from 0 to 1 are working as expected

Beta

  • Allowing time for feedback
  • E2E tests for scale to/from zero have been added

GA

  • Allowing time for feedback

Upgrade / Downgrade Strategy

As this KEP changes the allowed values for minReplicas, special care is required for the downgrade case so that updates to HPA objects using minReplicas: 0 are not blocked. Because the alpha code has accepted minReplicas: 0 with the flag enabled or disabled since Kubernetes version 1.16, downgrades to any version >= 1.16 are not an issue.

Before downgrading, all HPAs need to be set to minReplicas: 1 to avoid any deployments being stuck at replicas: 0.

Version Skew Strategy

Production Readiness Review Questionnaire

Feature Enablement and Rollback

How can this feature be enabled / disabled in a live cluster?
  • Feature gate (also fill in values in kep.yaml)
    • Feature gate name: HPAScaleToZero
    • Components depending on the feature gate: kube-apiserver
  • Other
    • Describe the mechanism:

      When the HPAScaleToZero feature gate is enabled, the HPA supports scaling to zero pods based on object or external metrics. The HPA remains active as long as at least one metric value is available.

    • Will enabling / disabling the feature require downtime of the control plane?

      No

    • Will enabling / disabling the feature require downtime or reprovisioning of a node?

      No

Does enabling the feature change any default behavior?

HPA creation/update with minReplicas: 0 is no longer rejected.
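
A minimal sketch of the validation rule this implies is shown below. It is not the actual kube-apiserver validation code: the function and type names are hypothetical, and the requirement for at least one object or external metric is an assumption derived from the Summary, which limits scaling to zero to those metric types.

```go
package main

import "fmt"

// metricSourceType mirrors the metric kinds an HPA can use.
type metricSourceType string

const (
	objectMetric   metricSourceType = "Object"
	externalMetric metricSourceType = "External"
	resourceMetric metricSourceType = "Resource"
)

// validateMinReplicas sketches the lower bound check. gateEnabled stands for
// the HPAScaleToZero feature gate.
func validateMinReplicas(minReplicas int32, metrics []metricSourceType, gateEnabled bool) error {
	lowerBound := int32(1)
	if gateEnabled {
		lowerBound = 0
	}
	if minReplicas < lowerBound {
		return fmt.Errorf("minReplicas must be greater than or equal to %d", lowerBound)
	}
	if minReplicas == 0 {
		// Assumption: scaling to zero needs a metric that does not depend on
		// running pods, that is, an object or external metric.
		for _, m := range metrics {
			if m == objectMetric || m == externalMetric {
				return nil
			}
		}
		return fmt.Errorf("minReplicas: 0 requires at least one object or external metric")
	}
	return nil
}

func main() {
	fmt.Println(validateMinReplicas(0, []metricSourceType{externalMetric}, true))
	fmt.Println(validateMinReplicas(0, []metricSourceType{externalMetric}, false))
	fmt.Println(validateMinReplicas(0, []metricSourceType{resourceMetric}, true))
}
```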

Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

Yes. To downgrade the cluster to a version that does not support the scale-to-zero feature, or to disable the feature gate:

  1. Make sure there are no HPA objects with minReplicas=0 and maxReplicas=0. Here is a one-liner to update them to 1:

    $ kubectl get hpa --all-namespaces --no-headers=true | awk '{if($6==0) printf "kubectl patch hpa/%s --namespace=%s -p \"{\\\"spec\\\":{\\\"minReplicas\\\":1,\\\"maxReplicas\\\":1}}\"\n", $2, $1 }' | sh

  2. Disable HPAScaleToZero feature gate

  3. In case step 1. has been omitted, workloads might be stuck with replicas: 0 and need to be manually scaled up to replicas: 1 to re-enable autoscaling.
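
The preparation in step 1 can also be sketched in Go. The example below only builds the patch bodies and prints the matching kubectl commands; it does not talk to a cluster. The hpaRef type and rollbackPatch function are hypothetical, and the sketch assumes the intent is to raise every HPA whose minReplicas is 0, which is broader than the column filter used by the one-liner above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hpaRef is a hypothetical, minimal view of an HPA object.
type hpaRef struct {
	Namespace   string
	Name        string
	MinReplicas int32
	MaxReplicas int32
}

// specPatch is the patch document sent to the API server.
type specPatch struct {
	Spec map[string]int32 `json:"spec"`
}

// rollbackPatch returns the patch body needed before a rollback. The second
// return value is false when the HPA does not need to be changed.
func rollbackPatch(h hpaRef) (string, bool) {
	if h.MinReplicas > 0 {
		return "", false
	}
	spec := map[string]int32{"minReplicas": 1}
	if h.MaxReplicas < 1 {
		// maxReplicas must not end up below minReplicas.
		spec["maxReplicas"] = 1
	}
	body, err := json.Marshal(specPatch{Spec: spec})
	if err != nil {
		return "", false
	}
	return string(body), true
}

func main() {
	hpas := []hpaRef{
		{"video", "worker", 0, 10},
		{"web", "frontend", 2, 5},
	}
	for _, h := range hpas {
		if body, ok := rollbackPatch(h); ok {
			fmt.Printf("kubectl patch hpa/%s --namespace=%s -p '%s'\n", h.Name, h.Namespace, body)
		}
	}
}
```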

What happens if we reenable the feature if it was previously rolled back?

Nothing. The feature can be re-enabled without problems, and workloads with replicas: 0 targeted by an HPA will be scaled again.

Are there any tests for feature enablement/disablement?

There are currently unit tests for the alpha cases, and tests are planned for the new functionality.

Rollout, Upgrade and Rollback Planning

As this is a new field, every usage is opt-in. In case the Kubernetes version is downgraded, workloads currently scaled to 0 might need to be manually scaled to 1, as the controller would otherwise treat them as paused.

If a rollback is planned, the following steps should be performed before downgrading the Kubernetes version:

  1. Make sure there are no HPA objects with minReplicas=0 and maxReplicas=0. Here is a one-liner to update them to 1:

    $ kubectl get hpa --all-namespaces --no-headers=true | awk '{if($6==0) printf "kubectl patch hpa/%s --namespace=%s -p \"{\\\"spec\\\":{\\\"minReplicas\\\":1,\\\"maxReplicas\\\":1}}\"\n", $2, $1 }' | sh

  2. Disable HPAScaleToZero feature gate

  3. Downgrade the Kubernetes version

How can a rollout or rollback fail? Can it impact already running workloads?

There are no expected side effects when the rollout fails, as the new ScaledToZero condition should only be enabled once the version upgrade has completed.

If the kube-apiserver has been upgraded before the kube-controller-manager, an HPA object has been updated to minReplicas: 0 and the workload is already scaled down to 0 replicas, you must manually scale the workload to at least one replica.

You can detect this situation in one of two ways:

  • Manually, by checking the HPA status and verifying that all entries show ScalingActive set to true and do not mention ScalingDisabled, or

  • Automatically, by using the kube_horizontalpodautoscaler_status_condition metric provided by kube-state-metrics to ensure the ScalingActive condition is true.
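
The manual check can be expressed as a short filter over HPA status conditions. This is a sketch on simplified local types, not a client for the Kubernetes API: the condition struct and both function names are hypothetical, and the HPA names in main are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// condition is a simplified stand-in for an HPA status condition.
type condition struct {
	Type   string
	Status string
	Reason string
}

// scalingDisabled reports whether the conditions show ScalingActive=False
// together with the ScalingDisabled reason.
func scalingDisabled(conds []condition) bool {
	for _, c := range conds {
		if c.Type == "ScalingActive" && c.Status == "False" && c.Reason == "ScalingDisabled" {
			return true
		}
	}
	return false
}

// needManualScaleUp returns the sorted names of HPAs whose workloads have to
// be scaled to at least one replica by hand.
func needManualScaleUp(hpas map[string][]condition) []string {
	var names []string
	for name, conds := range hpas {
		if scalingDisabled(conds) {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	hpas := map[string][]condition{
		"video/worker": {{"ScalingActive", "False", "ScalingDisabled"}},
		"web/frontend": {{"ScalingActive", "True", "ValidMetricFound"}},
	}
	fmt.Println(needManualScaleUp(hpas))
}
```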

If a rollback is attempted, all HPAs should be updated to minReplicas: 1, as otherwise the HPA for deployments with zero replicas will be disabled until replicas has been raised explicitly to at least 1.

What specific metrics should inform a rollback?

If an unexpected number of HPA objects report the status ScalingActive false and mention ScalingDisabled, the feature is not working as desired. All HPA objects should then be updated to minReplicas > 0 again and their managed workloads should be scaled to at least 1 replica.

This condition can also be detected using the kube_horizontalpodautoscaler_status_condition metric provided by kube-state-metrics, but the reason should be manually confirmed for flagged HPA objects.

Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

Not yet, as no implementation based on the new condition is available.

Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

No.

Monitoring Requirements

How can an operator determine if the feature is in use by workloads?

The new status will be visible inside the kube_horizontalpodautoscaler_status_condition metric provided by kube-state-metrics, and the minReplicas: 0 setting is reflected in kube_horizontalpodautoscaler_spec_min_replicas.

How can someone using this feature know that it is working for their instance?

When this feature is enabled for a workload scaled based on an object or external metric, the workload should be scaled to 0 replicas when the metric is 0.
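
To make the expected behavior concrete, here is a simplified calculation of the desired replica count. It assumes an external metric with a per-pod average target and leaves out the tolerance and stabilization handling of the real controller; the function and parameter names are illustrative only.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas divides the total metric value by the per-pod target and
// clamps the result to [minReplicas, maxReplicas]. targetAveragePerPod must
// be greater than zero.
func desiredReplicas(metricTotal, targetAveragePerPod float64, minReplicas, maxReplicas int32) int32 {
	desired := int32(math.Ceil(metricTotal / targetAveragePerPod))
	if desired < minReplicas {
		desired = minReplicas
	}
	if desired > maxReplicas {
		desired = maxReplicas
	}
	return desired
}

func main() {
	// Empty queue with minReplicas: 0, so the workload can go to zero.
	fmt.Println(desiredReplicas(0, 5, 0, 10))
	// 12 queued messages with a target of 5 per pod.
	fmt.Println(desiredReplicas(12, 5, 0, 10))
	// Empty queue with minReplicas: 1, so one replica stays running.
	fmt.Println(desiredReplicas(0, 5, 1, 10))
}
```

With this target style the result does not depend on the current replica count, which is why a metric value of 0 maps directly to 0 replicas once minReplicas allows it.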

What are the reasonable SLOs (Service Level Objectives) for the enhancement?

No changes to the autoscaling SLOs.

What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

No changes to the autoscaling SLIs.

Are there any missing metrics that would be useful to have to improve observability of this feature?

No, not with regard to this KEP.

Dependencies

Does this feature depend on any specific services running in the cluster?

The addition has the same dependencies as the current autoscaling controller.

Scalability

Will enabling / using this feature result in any new API calls?

No, the number of autoscaling-related API calls will remain unchanged. No other components are affected.

Will enabling / using this feature result in introducing new API types?

No, this only modifies the existing API types.

Will enabling / using this feature result in any new calls to the cloud provider?

No, the number of autoscaling-related cloud provider calls will remain unchanged. No other components are affected.

Will enabling / using this feature result in increasing size or count of the existing API objects?

Yes, one additional boolean field inside the spec of every HorizontalPodAutoscaler resource.

Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

No, there are no visible latency changes expected for existing autoscaling operations.

Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, …) in any components?

No, there are no visible changes expected for existing autoscaling operations.

Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?

No.

Troubleshooting

How does this feature react if the API server and/or etcd is unavailable?

Autoscaling will not occur; this is the same as the current behaviour.

What are other known failure modes?
What steps should be taken if SLOs are not being met to determine the problem?

Check metric_computation_duration_seconds to see which metric encountered the latency issue. If the latency problem is caused by metrics used for scaling to zero, you can remove those metrics again from your HPA(s).

Implementation History

Drawbacks

Alternatives

Third-party solutions like KEDA already support scaling to zero for various resources (e.g. RabbitMQ queues). However, these solutions often introduce additional paradigms and complexity. Since Horizontal Pod Autoscaling is already a core feature of Kubernetes and supports scaling to one, adding native support for scaling to zero would be a valuable and low-complexity enhancement.

Infrastructure Needed (Optional)