KEP-2021: HPA supports scaling to/from zero pods for object/external metrics

Release Signoff Checklist
Summary
Motivation
- Goals
- Non-Goals
Proposal
Design Details
Production Readiness Review Questionnaire
Implementation History
Drawbacks
Alternatives
Infrastructure Needed (Optional)

Release Signoff Checklist

Items marked with (R) are required prior to targeting to a milestone / release.

(R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
(R) KEP approvers have approved the KEP status as implementable
(R) Design details are appropriately documented
(R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- e2e Tests for all Beta API Operations (endpoints)
- (R) Ensure GA e2e tests meet requirements for Conformance Tests
- (R) Minimum Two Week Window for GA e2e tests to prove flake free
(R) Graduation criteria is in place
- (R) all GA Endpoints must be hit by Conformance Tests
(R) Production readiness review completed
(R) Production readiness review approved
“Implementation History” section is up-to-date for milestone
User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

Summary

Horizontal Pod Autoscaler (HPA) automatically scales the number of pods in any resource which supports the scale subresource based on observed CPU or memory utilization (or, with custom metrics support, on some other application-provided metrics) from one to many replicas. This proposal adds support for scaling from zero to many replicas and back to zero for object and external metrics.

Scaling to zero is particularly effective for cost reduction when individual pods demand substantial resource requests, such as dedicated CPUs or GPUs. Since CPU and memory utilization can only be measured on running pods, scaling to zero will be limited to object and external metrics.

Motivation

With the addition of scaling based on object and external metrics, it became possible to automatically adjust the number of running replicas based on an application-provided metric. A typical use case for this is scaling the number of queue consumers based on the length of the consumed queue.

In cases of a frequently idle queue or a less latency-sensitive workload, there is no need to keep one replica running at all times. Instead, you can dynamically scale to zero replicas, especially for workloads with high resource demands, such as those requiring GPUs. This approach not only reduces costs but also has significant energy-saving potential, particularly as GPU workloads become more prevalent. When replicas are scaled to zero, the HPA must also be capable of scaling back up as soon as messages become available.

Goals

Provide scaling to zero replicas for object and external metrics
Provide scaling from zero replicas for object and external metrics

Non-Goals

Provide scaling to/from zero replicas for resource metrics
Provide request buffering at the Kubernetes Service level

Proposal

Allow the HPA to scale from and to zero using minReplicas: 0 and an HPA status condition.

User Stories (Optional)

Story 1: Scale a heavy queue consumer on-demand

As the operator of a video processing pipeline, I would like to reduce costs. While video processing is CPU-intensive, it is not a latency-sensitive workload. Therefore, I want my video processing workers to be created only when there is actually a video to be processed, and terminated afterwards.

Notes/Constraints/Caveats (Optional)

Currently, autoscaling can be disabled by manually setting the scaled resource to replicas: 0. This works because the HPA itself could never reach this state. As replicas: 0 is now a possible state when using minReplicas: 0, it can no longer be used to differentiate between a manually disabled HPA and a workload automatically scaled to zero.

Additionally, the replicas: 0 state is problematic because updating an HPA object's minReplicas from 0 to 1 behaves differently depending on the current replica count: if replicas was 0 during the update, the HPA will be disabled for the resource; if it was > 0, the HPA will continue with the new minReplicas value.

To resolve these issues, this KEP introduces an explicit ScaledToZero condition in the HorizontalPodAutoscalerStatus. When ScaledToZero=True has been recorded, the HPA will scale the workload up from 0 ~> 1 when required and remove the ScaledToZero=True condition. If the condition is not found, the HPA keeps the current behavior of performing no change.

When the HPA scales a workload from 1 ~> 0, it records the ScaledToZero=True condition inside the status.
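The decision described above can be sketched as a small pure function. This is a minimal sketch, not the controller's code: the names hpaState and reconcileReplicas are hypothetical, and it assumes the desired replica count has already been computed from the object/external metrics and clamped to [minReplicas, maxReplicas].

```go
package main

import "fmt"

// hpaState is a hypothetical, simplified view of what the HPA controller
// knows during one reconcile.
type hpaState struct {
	currentReplicas int32 // replicas currently set on the scale subresource
	scaledToZero    bool  // ScaledToZero=True is present in the HPA status
}

// reconcileReplicas returns the replica count to apply and whether the
// ScaledToZero=True condition should be present afterwards.
// desired is assumed to be computed from object/external metrics and
// clamped to [minReplicas, maxReplicas], so it is only 0 when minReplicas is 0.
func reconcileReplicas(s hpaState, desired int32) (replicas int32, scaledToZero bool) {
	if s.currentReplicas == 0 && !s.scaledToZero {
		// replicas: 0 was not set by the HPA: treat the workload as
		// manually disabled and perform no change (existing behavior).
		return 0, false
	}
	if desired == 0 {
		// 1 ~> 0, or staying at 0: record ScaledToZero=True.
		return 0, true
	}
	// Any non-zero target, including 0 ~> 1: the condition is removed.
	return desired, false
}

func main() {
	fmt.Println(reconcileReplicas(hpaState{currentReplicas: 0, scaledToZero: false}, 3)) // manually paused: 0 false
	fmt.Println(reconcileReplicas(hpaState{currentReplicas: 2}, 0))                      // scale to zero: 0 true
	fmt.Println(reconcileReplicas(hpaState{currentReplicas: 0, scaledToZero: true}, 1))  // scale from zero: 1 false
}
```

Keeping the function free of API calls makes each replicas/condition combination easy to cover in table-driven unit tests.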

Risks and Mitigations

As ScaledToZero is not an explicit spec property but a status condition recorded by the HPA, applying a new Deployment with replicas: 0 together with an HPA using minReplicas: 0 can be confusing: the HPA has never recorded ScaledToZero=True for it, so the Deployment will never be scaled.

This should be documented; the situation is detectable by looking at the existing ScalingActive condition.

In the future, pausing the HPA could become an explicit feature, and the implicit pausing via replicas: 0 could be deprecated to remove this confusion.

Design Details

Add ScaledToZero as a new HorizontalPodAutoscalerConditionType, next to the existing condition types such as ScalingActive, AbleToScale and ScalingLimited:

```go
const (
	// ScaledToZero indicates that the HPA controller scaled the workload to zero.
	ScaledToZero HorizontalPodAutoscalerConditionType = "ScaledToZero"
)
```
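Recording and removing the condition amounts to upserting or deleting one entry in status.conditions. The sketch below uses hypothetical, trimmed-down local copies of the API types (the real HorizontalPodAutoscalerCondition also carries LastTransitionTime and Message); the helper names and the reason string are assumptions, not the actual controller code.

```go
package main

import "fmt"

// Local stand-ins for the autoscaling API types (hypothetical, trimmed down).
type HorizontalPodAutoscalerConditionType string

const ScaledToZero HorizontalPodAutoscalerConditionType = "ScaledToZero"

type condition struct {
	Type   HorizontalPodAutoscalerConditionType
	Status string // "True", "False" or "Unknown"
	Reason string
}

// hasScaledToZero reports whether ScaledToZero=True is recorded in the status.
func hasScaledToZero(conds []condition) bool {
	for _, c := range conds {
		if c.Type == ScaledToZero && c.Status == "True" {
			return true
		}
	}
	return false
}

// recordScaledToZero is called after a 1 ~> 0 scale-down. It updates an
// existing ScaledToZero entry in place instead of appending a duplicate.
func recordScaledToZero(conds []condition, reason string) []condition {
	for i := range conds {
		if conds[i].Type == ScaledToZero {
			conds[i].Status, conds[i].Reason = "True", reason
			return conds
		}
	}
	return append(conds, condition{Type: ScaledToZero, Status: "True", Reason: reason})
}

// removeScaledToZero is called after a 0 ~> 1 scale-up. It builds a new
// slice so the caller's backing array is left untouched.
func removeScaledToZero(conds []condition) []condition {
	out := make([]condition, 0, len(conds))
	for _, c := range conds {
		if c.Type != ScaledToZero {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	conds := []condition{{Type: "ScalingActive", Status: "True", Reason: "ValidMetricFound"}}
	conds = recordScaledToZero(conds, "ScaledDownToZero") // hypothetical reason string
	fmt.Println(hasScaledToZero(conds), len(conds))         // true 2
	conds = removeScaledToZero(conds)
	fmt.Println(hasScaledToZero(conds), len(conds)) // false 1
}
```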

Test Plan

[x] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.

Most logic related to this KEP is contained in the HPA controller, so testing the various combinations of minReplicas, replicas and the ScaledToZero condition should be achievable with unit tests.

Additionally, integration tests should be added that (a) enable scale to zero by setting minReplicas: 0, wait for replicas to become 0 and confirm that the ScaledToZero=True condition has been recorded, and (b) increase minReplicas to 1, observe that replicas become 1 again and confirm that the ScaledToZero=True condition has been removed.

Prerequisite testing updates

Unit tests

/pkg/controller/podautoscaler: 2025-02-06 - 96.4

Integration tests

HPA integration tests are being introduced via https://github.com/kubernetes/kubernetes/pull/138464 , which includes a scale-to-zero and back scenario for the HPAScaleToZero feature gate. As a follow-up for beta we plan to add a negative test case asserting that scale-to-zero does not happen when an HPA is configured with a CPU (resource) metric, since scaling to zero is intentionally limited to object/external metrics.

e2e tests

E2E tests under https://github.com/kubernetes/kubernetes/tree/master/test/e2e/autoscaling cover scaling down to 0 and back up from 0 based on an external metric with the HPAScaleToZero feature gate enabled.

[sig-autoscaling] [Feature:HPA] [Feature:HPAScaleToZero] Horizontal pod autoscaling (scale to zero) should scale down to zero and back up based on external metric value: https://github.com/kubernetes/kubernetes/blob/master/test/e2e/autoscaling/horizontal_pod_autosclaing_external_metrics.go

Graduation Criteria

Alpha

Implement the ScaledToZero condition recording
Ensure that all minReplicas state transitions from 0 to 1 are working as expected

Beta

Condition-based scale from/to zero implementation merged (https://github.com/kubernetes/kubernetes/pull/135118 )
Unit tests cover behavior with the HPAScaleToZero feature gate both enabled and disabled
E2E test for scale-to-zero and scale-from-zero based on an external metric under test/e2e/autoscaling gated on HPAScaleToZero
Integration tests cover scale-to-zero and back (https://github.com/kubernetes/kubernetes/pull/138464 ) and a negative case ensuring HPAs configured with resource (CPU) metrics are not scaled to zero
Production readiness review approved for beta
User-facing documentation updated in kubernetes/website

GA

HPAScaleToZero feature gate has been enabled by default in beta for at least one release without blocking bugs
Feedback from beta users has been gathered and addressed
E2E test(s) have been running consistently without flakes

Upgrade / Downgrade Strategy

As this KEP changes the allowed values for minReplicas, special care is required for the downgrade case to not prevent any kind of updates for HPA objects using minReplicas: 0. API validation has accepted minReplicas: 0 with the HPAScaleToZero feature gate enabled since Kubernetes 1.16, so downgrades to any version >= 1.16 will not reject existing HPA objects.

Before downgrading to a version without the condition-based implementation, all HPAs using minReplicas: 0 should be set to minReplicas: 1 and their workloads scaled to at least one replica, otherwise workloads currently scaled to replicas: 0 may remain stuck at replicas: 0 (the old controller cannot distinguish “manually paused” from “HPA scaled to zero” without the ScaledToZero condition).
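The pre-downgrade procedure can be expressed as a per-HPA plan. A minimal sketch, assuming a hypothetical hpaSummary type that has already been populated from the API (the HPA's minReplicas plus the current replica count of its scale target); it does not call the Kubernetes API itself.

```go
package main

import "fmt"

// hpaSummary is a hypothetical, pre-fetched view of one HPA and its target.
type hpaSummary struct {
	Namespace, Name string
	MinReplicas     int32 // spec.minReplicas of the HPA
	TargetReplicas  int32 // current replicas of the scale target
}

// downgradeAction means: set the HPA to minReplicas: 1 and, if
// ScaleTargetToOne is set, also scale its target to one replica.
type downgradeAction struct {
	hpaSummary
	ScaleTargetToOne bool
}

// planDowngrade selects HPAs that use minReplicas: 0. HPAs with
// minReplicas >= 1 are skipped, so targets paused manually via
// replicas: 0 stay paused.
func planDowngrade(hpas []hpaSummary) []downgradeAction {
	var actions []downgradeAction
	for _, h := range hpas {
		if h.MinReplicas != 0 {
			continue
		}
		actions = append(actions, downgradeAction{
			hpaSummary: h,
			// A target left at replicas: 0 would look manually paused to a
			// controller that does not know the ScaledToZero condition.
			ScaleTargetToOne: h.TargetReplicas == 0,
		})
	}
	return actions
}

func main() {
	plan := planDowngrade([]hpaSummary{
		{Namespace: "default", Name: "video-worker", MinReplicas: 0, TargetReplicas: 0},
		{Namespace: "default", Name: "queue-consumer", MinReplicas: 0, TargetReplicas: 3},
		{Namespace: "default", Name: "paused", MinReplicas: 1, TargetReplicas: 0},
	})
	for _, a := range plan {
		fmt.Printf("%s/%s: set minReplicas: 1, scale target to 1: %v\n", a.Namespace, a.Name, a.ScaleTargetToOne)
	}
}
```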

Version Skew Strategy

This feature only affects control-plane components (kube-apiserver and kube-controller-manager); there is no interaction with the kubelet, CRI, CNI, or CSI, so node version skew is not relevant.

The relevant skew is between kube-apiserver and kube-controller-manager:

kube-apiserver upgraded first, kube-controller-manager still on the previous version: users may create or update HPAs with minReplicas: 0, but the older controller does not understand the ScaledToZero condition. It will treat replicas: 0 as “manually paused” and will not scale the workload back up from zero. Operators should either avoid minReplicas: 0 until both components are on the new version, or manually scale affected workloads back to replicas >= 1 after the controller is upgraded.
kube-controller-manager upgraded first, kube-apiserver still on the previous version: this is not a supported skew direction in Kubernetes, but has no adverse effect in practice, since the older API server will continue to reject minReplicas: 0 and the controller simply never observes HPAs that require the new behavior.

The HPAScaleToZero feature gate lives in kube-apiserver (for validation) and kube-controller-manager (for the condition-based scaling behavior). Enabling the gate only in one of the two components is effectively a subset of the skew cases above.

Production Readiness Review Questionnaire

Feature Enablement and Rollback

How can this feature be enabled / disabled in a live cluster?

Feature gate (also fill in values in kep.yaml)
- Feature gate name: HPAScaleToZero
- Components depending on the feature gate: kube-apiserver (accepting minReplicas: 0 during validation) and kube-controller-manager (the condition-based scale-from/to-zero behavior)
Other
- Describe the mechanism:
  When the HPAScaleToZero feature gate is enabled, the HPA supports scaling to zero pods based on object or external metrics. The HPA remains active as long as at least one metric value is available.
- Will enabling / disabling the feature require downtime of the control plane?
  No
- Will enabling / disabling the feature require downtime or reprovisioning of a node?
  No

Does enabling the feature change any default behavior?

HPA creation/update with minReplicas: 0 is no longer rejected.

Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

Yes. To downgrade the cluster to a version that does not support the scale-to-zero feature, or to disable the feature gate:

1. Make sure there are no HPA objects with minReplicas: 0. Here is a one-liner to update them to minReplicas: 1:
   $ kubectl get hpa --all-namespaces --no-headers -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,MIN:.spec.minReplicas | awk '$3 == 0 { print $1, $2 }' | while read -r ns name; do kubectl patch hpa "$name" --namespace "$ns" -p '{"spec":{"minReplicas":1}}'; done
2. Disable the HPAScaleToZero feature gate.
In case step 1 has been omitted, workloads might be stuck at replicas: 0 and need to be manually scaled up to replicas: 1 to re-enable autoscaling.

What happens if we reenable the feature if it was previously rolled back?

Nothing special is required: the feature can be re-enabled without problems, and workloads at replicas: 0 targeted by an HPA that has recorded the ScaledToZero=True condition will be scaled again.

Are there any tests for feature enablement/disablement?

Yes. Unit tests in pkg/controller/podautoscaler/horizontal_test.go exercise the HPA controller with the HPAScaleToZero feature gate both enabled and disabled, covering:

HPA creation with minReplicas: 0 being rejected by API validation when the gate is off and accepted when the gate is on.
Scaling from zero only occurring when the ScaledToZero=True condition is present (i.e. recorded by the HPA itself) and the gate is enabled.
The controller conservatively leaving a workload at replicas: 0 when the gate is off, so manually paused workloads are not disturbed.

An e2e test exists at test/e2e/autoscaling/horizontal_pod_autosclaing_external_metrics.go gated on the HPAScaleToZero feature gate via framework.WithFeatureGate. Integration tests covering feature-gate on/off paths are being added in https://github.com/kubernetes/kubernetes/pull/138464 .

Rollout, Upgrade and Rollback Planning

As minReplicas: 0 has to be set explicitly, every usage is opt-in. In case the Kubernetes version is downgraded, workloads currently scaled to 0 might need to be manually scaled to 1, as the controller would otherwise treat them as paused.

If a rollback is planned, the following steps should be performed:

1. Make sure there are no HPA objects with minReplicas: 0. Here is a one-liner to update them to minReplicas: 1:
   $ kubectl get hpa --all-namespaces --no-headers -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,MIN:.spec.minReplicas | awk '$3 == 0 { print $1, $2 }' | while read -r ns name; do kubectl patch hpa "$name" --namespace "$ns" -p '{"spec":{"minReplicas":1}}'; done
2. Disable the HPAScaleToZero feature gate.
3. Downgrade the Kubernetes version.

How can a rollout or rollback fail? Can it impact already running workloads?

No side effects are expected when the rollout fails, as the new ScaledToZero condition should only come into use once the version upgrade has completed.

If the kube-apiserver was upgraded before the kube-controller-manager, an HPA object was updated to minReplicas: 0, and its workload is already scaled down to 0 replicas, you must manually scale the workload to at least one replica.

You can detect this situation in one of two ways:

Manually, by checking the HPA status and verifying that all entries show ScalingActive set to true and do not mention ScalingDisabled, or
Automatically, by using the kube_horizontalpodautoscaler_status_condition metric provided by kube-state-metrics to ensure the ScalingActive condition is true.
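The manual check can also be scripted. A minimal sketch, assuming the HPA status conditions have already been fetched (for example from kubectl get hpa -o json) into hypothetical, trimmed-down structs; it flags every HPA whose ScalingActive condition is not True and reports the reason, such as ScalingDisabled.

```go
package main

import "fmt"

// hpaCondition and hpa are hypothetical, trimmed-down views of an HPA object.
type hpaCondition struct {
	Type, Status, Reason string
}

type hpa struct {
	Namespace, Name string
	Conditions      []hpaCondition
}

// notScaling reports whether the HPA's ScalingActive condition is present
// and not True, together with its reason (e.g. ScalingDisabled when the
// target sits at replicas: 0 and is treated as manually paused).
func notScaling(h hpa) (bool, string) {
	for _, c := range h.Conditions {
		if c.Type == "ScalingActive" {
			return c.Status != "True", c.Reason
		}
	}
	return false, "" // no ScalingActive condition recorded yet
}

func main() {
	hpas := []hpa{
		{Namespace: "default", Name: "video-worker", Conditions: []hpaCondition{
			{Type: "ScalingActive", Status: "False", Reason: "ScalingDisabled"},
		}},
		{Namespace: "default", Name: "queue-consumer", Conditions: []hpaCondition{
			{Type: "ScalingActive", Status: "True", Reason: "ValidMetricFound"},
		}},
	}
	for _, h := range hpas {
		if flagged, reason := notScaling(h); flagged {
			fmt.Printf("%s/%s: ScalingActive is not True (reason: %s)\n", h.Namespace, h.Name, reason)
		}
	}
}
```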

If a rollback is attempted, all HPAs should be updated to minReplicas: 1; otherwise, the HPA of any workload with zero replicas will be disabled until its replicas have been explicitly raised to at least 1.

What specific metrics should inform a rollback?

If an unexpected number of HPA objects report the status condition ScalingActive=False with the reason ScalingDisabled, the feature is not working as desired: all HPA objects should be updated to minReplicas > 0 again, and their managed workloads should be scaled to at least one replica.

This condition can also be detected using the kube_horizontalpodautoscaler_status_condition metric provided by kube-state-metrics, but the reason should be confirmed manually for flagged HPA objects.
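A minimal sketch of evaluating that signal from a kube-state-metrics scrape in the Prometheus text format. It assumes the series carries namespace, horizontalpodautoscaler, condition and status labels with a 0/1 sample value; the label set and the helper name inactiveHPAs are assumptions, and the sketch does not look at the reason, which still has to be confirmed on the HPA objects themselves.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// labelRe matches name="value" pairs inside a sample's label set.
var labelRe = regexp.MustCompile(`(\w+)="([^"]*)"`)

const series = "kube_horizontalpodautoscaler_status_condition{"

// inactiveHPAs returns "namespace/name" for every HPA whose ScalingActive
// condition is reported with status="false" and a sample value of 1.
func inactiveHPAs(scrape string) []string {
	var out []string
	sc := bufio.NewScanner(strings.NewReader(scrape))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, series) {
			continue // skips # HELP / # TYPE lines and other series
		}
		end := strings.LastIndex(line, "}")
		if end < 0 {
			continue
		}
		// The value follows the label set; an optional timestamp may follow it.
		fields := strings.Fields(line[end+1:])
		if len(fields) == 0 {
			continue
		}
		if v, err := strconv.ParseFloat(fields[0], 64); err != nil || v != 1 {
			continue
		}
		labels := map[string]string{}
		for _, m := range labelRe.FindAllStringSubmatch(line[len(series):end], -1) {
			labels[m[1]] = m[2]
		}
		if labels["condition"] == "ScalingActive" && labels["status"] == "false" {
			out = append(out, labels["namespace"]+"/"+labels["horizontalpodautoscaler"])
		}
	}
	return out
}

func main() {
	scrape := `# TYPE kube_horizontalpodautoscaler_status_condition gauge
kube_horizontalpodautoscaler_status_condition{namespace="default",horizontalpodautoscaler="video-worker",condition="ScalingActive",status="false"} 1
kube_horizontalpodautoscaler_status_condition{namespace="default",horizontalpodautoscaler="video-worker",condition="ScalingActive",status="true"} 0
kube_horizontalpodautoscaler_status_condition{namespace="default",horizontalpodautoscaler="queue-consumer",condition="ScalingActive",status="false"} 0`
	fmt.Println(inactiveHPAs(scrape)) // [default/video-worker]
}
```

Comparing the length of the returned list against the number of HPAs expected to be paused gives the "unexpected number" signal described above.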

Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

The condition-based implementation merged for 1.36 (https://github.com/kubernetes/kubernetes/pull/135118 ), so the upgrade and rollback paths can now be exercised:

Upgrade (gate off → on): existing HPAs are unaffected. minReplicas: 0 only starts being accepted once the gate is enabled on kube-apiserver, and scale-to-zero only occurs once the gate is enabled on kube-controller-manager. No ScaledToZero condition exists on objects created before the upgrade, so the controller takes no new action on them.
Downgrade / disablement (gate on → off): the controller stops scaling workloads to zero. Workloads already at replicas: 0 with a recorded ScaledToZero condition must be scaled back to replicas: 1 before the downgrade (see the rollback steps above), because the disabled/older controller treats replicas: 0 as manually paused and will not scale it back up.
Upgrade → downgrade → upgrade: re-enabling the gate resumes scale-from-zero for HPAs still configured with minReplicas: 0; no manual recovery is required for objects that were left untouched.

This behavior is covered by the integration tests added in https://github.com/kubernetes/kubernetes/pull/138464 , which exercise the feature-gate on/off paths.

Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

No.

Monitoring Requirements

How can an operator determine if the feature is in use by workloads?

The new condition will be visible in the kube_horizontalpodautoscaler_status_condition metric provided by kube-state-metrics, and the minReplicas: 0 setting is reflected in kube_horizontalpodautoscaler_spec_min_replicas.

How can someone using this feature know that it is working for their instance?

When this feature is enabled for a workload scaled based on an object or external metric, the workload should be scaled to 0 replicas when the metric is 0.

What are the reasonable SLOs (Service Level Objectives) for the enhancement?

No changes to the autoscaling SLOs.

Scaling from 0 ~> 1 is a new path, but it reuses the regular HPA reconcile loop and is therefore bounded by the existing sync period (--horizontal-pod-autoscaler-sync-period, default 15s): once the object or external metric crosses the threshold, the workload is scaled up on the next reconcile, plus whatever freshness the metrics pipeline adds. The downscale stabilization window does not apply to scale-up, so no additional delay is introduced beyond a single sync period.

What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

No changes to the autoscaling SLIs.

Are there any missing metrics that would be useful to have to improve observability of this feature?

No, not with regard to this KEP.

Dependencies

Does this feature depend on any specific services running in the cluster?

The addition has the same dependencies as the current autoscaling controller.

Scalability

Will enabling / using this feature result in any new API calls?

No, the amount of autoscaling related API calls will remain unchanged. No other components are affected.

Will enabling / using this feature result in introducing new API types?

No, this only modifies the existing API types.

Will enabling / using this feature result in any new calls to the cloud provider?

No, the amount of autoscaling related cloud provider calls will remain unchanged. No other components are affected.

Will enabling / using this feature result in increasing size or count of the existing API objects?

Yes, slightly: while an HPA has scaled its workload to zero, its status carries one additional ScaledToZero condition entry, which is removed again when the workload is scaled back up.

Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

No, there are no visible latency changes expected for existing autoscaling operations.

Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, …) in any components?

No, there are no visible changes expected for existing autoscaling operations.

Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?

No.

Troubleshooting

How does this feature react if the API server and/or etcd is unavailable?

Autoscaling will not occur; this is the same as the current behavior.

What are other known failure modes?

Failed to fetch the relevant object or external metrics.
- Detection: ScalingActive: false condition with FailedGetExternalMetric or FailedGetObjectMetric reason.
- Mitigations: manually scale the resource.
- Diagnostics: Related errors should be printed as the messages of ScalingActive: false.
- Testing: https://github.com/kubernetes/kubernetes/blob/0e3818e02760afa8ed0bea74c6973f605ca4683c/pkg/controller/podautoscaler/replica_calculator_test.go#L451

What steps should be taken if SLOs are not being met to determine the problem?

Check metric_computation_duration_seconds to see which metric encountered the latency issue. If the latency problem is caused by metrics used for scaling to zero, you can remove those metrics again from your HPA(s).

Implementation History

(2019/02/25) Original design doc: https://github.com/kubernetes/kubernetes/issues/69687#issuecomment-467082733
(2019/07/16) Alpha implementation (https://github.com/kubernetes/kubernetes/pull/74526 ) merged for Kubernetes 1.16
(2026/03/18) Alpha re-implementation (https://github.com/kubernetes/kubernetes/pull/135118 ) merged for Kubernetes 1.36
(2026/04/21) Targeted for Beta graduation in Kubernetes 1.37

Drawbacks

Alternatives

Third-party solutions like KEDA already support scaling to zero for various resources (e.g. RabbitMQ queues). However, these solutions often introduce additional paradigms and complexity. Since Horizontal Pod Autoscaling is already a core feature of Kubernetes and supports scaling to one, adding native support for scaling to zero would be a valuable and low-complexity enhancement.

Infrastructure Needed (Optional)