KEP-2021: HPA supports scaling to/from zero pods for object/external metrics
- Release Signoff Checklist
- Summary
- Motivation
- Proposal
- Design Details
- Production Readiness Review Questionnaire
- Implementation History
- Drawbacks
- Alternatives
- Infrastructure Needed (Optional)
Release Signoff Checklist
Items marked with (R) are required prior to targeting to a milestone / release.
- (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
- (R) KEP approvers have approved the KEP status as implementable
- (R) Design details are appropriately documented
- (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- e2e Tests for all Beta API Operations (endpoints)
- (R) Ensure GA e2e tests meet requirements for Conformance Tests
- (R) Minimum Two Week Window for GA e2e tests to prove flake free
- (R) Graduation criteria is in place
- (R) all GA Endpoints must be hit by Conformance Tests
- (R) Production readiness review completed
- (R) Production readiness review approved
- “Implementation History” section is up-to-date for milestone
- User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
- Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
Summary
Horizontal Pod Autoscaler (HPA) automatically scales the number of pods in any resource which supports the scale subresource based on observed CPU or memory utilization (or, with custom metrics support, on some other application-provided metrics) from one to many replicas. This proposal adds support for scaling from zero to many replicas and back to zero for object and external metrics.
Scaling to zero is particularly effective for cost reduction when individual pods demand substantial resource requests, such as dedicated CPUs or GPUs. Since CPU and memory utilization can only be measured on running pods, scaling to zero will be limited to object and external metrics.
Motivation
With the addition of scaling based on object and external metrics, it became possible to automatically adjust the number of running replicas based on an application-provided metric. A typical use case for this is scaling the number of queue consumers based on the length of the consumed queue.
In cases of a frequently idle queue or a less latency-sensitive workload, there is no need to keep one replica running at all times. Instead, you can dynamically scale to zero replicas, especially for workloads with high resource demands, such as those requiring GPUs. This approach not only reduces costs but also has significant energy-saving potential, particularly as GPU workloads become more prevalent. When replicas are scaled to zero, the HPA must also be capable of scaling back up as soon as messages become available.
Goals
- Provide scaling to zero replicas for object and external metrics
- Provide scaling from zero replicas for object and external metrics
Non-Goals
- Provide scaling to/from zero replicas for resource metrics
- Provide request buffering at the Kubernetes Service level
Proposal
Allow the HPA to scale to and from zero using minReplicas: 0 and an HPA status condition.
User Stories (Optional)
Story 1: Scale a heavy queue consumer on-demand
As the operator of a video processing pipeline, I would like to reduce costs. While video processing is CPU-intensive, it is not a latency-sensitive workload. Therefore, I want my video processing workers to be created only when there is actually a video to be processed, and terminated afterwards.
Notes/Constraints/Caveats (Optional)
Currently, the HPA can be disabled by manually setting the scaled resource to replicas: 0. This works because the HPA itself could never reach this state.
As replicas: 0 becomes a possible state when using minReplicas: 0, it can no longer be used to differentiate between a manually disabled HPA and a workload that was automatically scaled to zero.
Additionally, the replicas: 0 state is problematic because updating an HPA object's minReplicas from 0 to 1 behaves differently depending on the current replica count: if replicas was 0 during the update, the HPA is disabled for the resource; if it was > 0, the HPA continues with the new minReplicas value.
To resolve these issues, this KEP introduces an explicit ScaledToZero condition inside the HorizontalPodAutoscalerStatus. When ScaledToZero=True has been recorded, the HPA will scale the workload up from 0 ~> 1 and remove the ScaledToZero=True condition. If the condition is not found, the HPA maintains the current behavior of performing no change.
When the HPA scales a workload from 1 ~> 0, it records the ScaledToZero=True condition inside the status.
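The condition handling described above can be sketched as a small decision function. This is only an illustration under stated assumptions, not the controller's actual code: the `Action` type, its values and the `decide` helper are hypothetical names, `desired` is assumed to be already clamped to the range between minReplicas and maxReplicas, and the sketch generalizes the 1 ~> 0 case to any positive replica count.

```go
package main

import "fmt"

// Action names what this sketch decides to do in one reconcile pass.
// The type and its values are hypothetical and exist only for illustration.
type Action string

const (
	ActionNone          Action = "none"            // leave the workload untouched
	ActionScaleFromZero Action = "scale-from-zero" // scale up from 0 and remove ScaledToZero=True
	ActionScaleToZero   Action = "scale-to-zero"   // scale down to 0 and record ScaledToZero=True
	ActionRegular       Action = "regular"         // ordinary autoscaling between positive counts
)

// decide is a hypothetical helper, not the HPA controller's real logic.
// current is the replica count of the scale target, desired is the proposed
// replica count (assumed to be clamped to [minReplicas, maxReplicas] already),
// and scaledToZero says whether ScaledToZero=True is recorded in the status.
// It returns the action and whether ScaledToZero=True should be present afterwards.
func decide(current, desired int32, scaledToZero bool) (Action, bool) {
	switch {
	case current == 0 && !scaledToZero:
		// replicas: 0 without the condition means "manually disabled": no change.
		return ActionNone, false
	case current == 0 && desired > 0:
		// The HPA itself scaled to zero earlier, so it may scale up again.
		return ActionScaleFromZero, false
	case current == 0:
		// Still nothing to do; keep the condition.
		return ActionNone, true
	case desired == 0:
		// Scaling down to zero records the condition.
		return ActionScaleToZero, true
	default:
		return ActionRegular, false
	}
}

func main() {
	action, condition := decide(0, 1, true)
	fmt.Println(action, condition)
}
```

Keeping the decision in a pure function like this would also make the minReplicas, replicas and ScaledToZero combinations straightforward to cover with table-driven unit tests.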
Risks and Mitigations
As ScaledToZero is not an explicit property, applying a new Deployment with replicas: 0 and an HPA with minReplicas: 0 can be confusing, as the Deployment will never scale.
This needs to be documented and is detectable by looking at the existing ScalingActive condition.
In the future, pausing the HPA can become an explicit feature and the implicit pausing via replicas: 0 can be deprecated to remove this confusion.
Design Details
Add ScaledToZero as HPA HorizontalPodAutoscalerConditionType
const (
// ScaledToZero indicates that the HPA controller scaled the workload to zero.
ScaledToZero HorizontalPodAutoscalerConditionType = "ScaledToZero"
)
Test Plan
[x] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.
Most logic related to this KEP is contained in the HPA controller, so testing the various minReplicas, replicas and ScaledToZero combinations should be achievable with unit tests.
Additionally, integration tests should be added: one that enables scale to zero by setting ScaledToZero: true and minReplicas: 0 and waits for replicas to become 0, and another that increases minReplicas to 1, observes that replicas becomes 1 again and confirms that ScaledToZero: true has been removed.
Prerequisite testing updates
Unit tests
/pkg/controller/podautoscaler: 2025-02-06 - 96.4
Integration tests
- N/A in this case.
e2e tests
E2E tests will be added under https://github.com/kubernetes/kubernetes/tree/master/test/e2e/autoscaling to test scale down to 0 and scale up with this feature enabled, and scale down to 1 without this feature.
Graduation Criteria
Alpha
- Implement the ScaledToZero condition recording
- Ensure that all minReplicas state transitions from 0 to 1 are working as expected
Beta
- Allowing time for feedback
- E2E tests for scale to/from zero have been added
GA
- Allowing time for feedback
Upgrade / Downgrade Strategy
As this KEP changes the allowed values for minReplicas, special care is required in the downgrade case so that updates to HPA objects using minReplicas: 0 are not prevented. Since Kubernetes version 1.16, the alpha code already accepts minReplicas: 0 whether the flag is enabled or disabled, so downgrades to any version >= 1.16 aren’t an issue.
Before downgrading, all HPAs need to be set to minReplicas: 1 to avoid any deployments being stuck at replicas: 0.
Version Skew Strategy
Production Readiness Review Questionnaire
Feature Enablement and Rollback
How can this feature be enabled / disabled in a live cluster?
- Feature gate (also fill in values in kep.yaml)
  - Feature gate name: HPAScaleToZero
  - Components depending on the feature gate: kube-apiserver
- Other
Describe the mechanism:
When the HPAScaleToZero feature gate is enabled, the HPA supports scaling to zero pods based on object or external metrics. The HPA remains active as long as at least one metric value is available.
Will enabling / disabling the feature require downtime of the control plane?
No
Will enabling / disabling the feature require downtime or reprovisioning of a node?
No
Does enabling the feature change any default behavior?
HPA creation/update with minReplicas: 0 is no longer rejected.
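A minimal sketch of what such a validation rule could look like follows. It is an assumption-laden illustration, not the API server's actual validation code: `validateMinReplicas` is a hypothetical helper, the metric source types are local stand-ins for the autoscaling API values, the rule that at least one object or external metric must be configured is inferred from the restriction of scale to zero to those metric types, and the handling of already persisted objects is ignored.

```go
package main

import (
	"errors"
	"fmt"
)

// MetricSourceType is a local stand-in for the metric source types of the
// autoscaling API; only the values needed for this sketch are declared.
type MetricSourceType string

const (
	ResourceMetric MetricSourceType = "Resource"
	ObjectMetric   MetricSourceType = "Object"
	ExternalMetric MetricSourceType = "External"
)

// validateMinReplicas is a hypothetical helper showing one way the rule
// "minReplicas: 0 is only accepted for object/external metrics when the
// feature gate is on" could be expressed. It is not the real validation.
func validateMinReplicas(minReplicas int32, metrics []MetricSourceType, gateEnabled bool) error {
	if minReplicas < 0 {
		return errors.New("minReplicas must not be negative")
	}
	if minReplicas > 0 {
		return nil
	}
	if !gateEnabled {
		return errors.New("minReplicas: 0 requires the HPAScaleToZero feature gate")
	}
	// Assumption: a single object or external metric is enough, because such
	// metrics can still be read while no pods are running.
	for _, m := range metrics {
		if m == ObjectMetric || m == ExternalMetric {
			return nil
		}
	}
	return errors.New("minReplicas: 0 requires at least one object or external metric")
}

func main() {
	err := validateMinReplicas(0, []MetricSourceType{ResourceMetric}, true)
	fmt.Println(err)
}
```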
Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
Yes. To downgrade the cluster to a version that does not support the scale-to-zero feature, or to disable the feature gate:
1. Make sure there are no HPA objects with minReplicas=0 and maxReplicas=0. Here is a one-liner to update them to 1:
   $ kubectl get hpa --all-namespaces --no-headers=true | awk '{if($6==0) printf "kubectl patch hpa/%s --namespace=%s -p \"{\\\"spec\\\":{\\\"minReplicas\\\":1,\\\"maxReplicas\\\":1}}\"\n", $2, $1 }' | sh
2. Disable the HPAScaleToZero feature gate.
In case step 1 has been omitted, workloads might be stuck with replicas: 0 and need to be manually scaled up to replicas: 1 to re-enable autoscaling.
What happens if we reenable the feature if it was previously rolled back?
Nothing. The feature can be re-enabled without problems, and workloads with replicas: 0 targeted by an HPA will be scaled again.
Are there any tests for feature enablement/disablement?
There are currently unit tests for the alpha cases, and tests are planned to be added for the new functionality.
Rollout, Upgrade and Rollback Planning
As this is a new field, every usage is opt-in. In case the Kubernetes version is downgraded, workloads currently scaled to 0 might need to be manually scaled to 1, as the controller would otherwise treat them as paused.
If a rollback is planned, the following steps should be performed before downgrading the Kubernetes version:
1. Make sure there are no HPA objects with minReplicas=0 and maxReplicas=0. Here is a one-liner to update them to 1:
   $ kubectl get hpa --all-namespaces --no-headers=true | awk '{if($6==0) printf "kubectl patch hpa/%s --namespace=%s -p \"{\\\"spec\\\":{\\\"minReplicas\\\":1,\\\"maxReplicas\\\":1}}\"\n", $2, $1 }' | sh
2. Disable the HPAScaleToZero feature gate.
3. Downgrade the Kubernetes version.
How can a rollout or rollback fail? Can it impact already running workloads?
There are no expected side-effects when the rollout fails, as the new ScaledToZero condition should only be enabled once the version upgrade has completed.
If the kube-apiserver has been upgraded before the kube-controller-manager, an HPA object has been updated to minReplicas: 0 and the workload is already scaled down to 0 replicas, you must manually scale the workload to at least one replica.
You can detect this situation in one of two ways:
- Manually, by checking the HPA status and verifying that all entries show ScalingActive set to true and do not mention ScalingDisabled, or
- Automatically, by using the kube_horizontalpodautoscaler_status_condition metric provided by kube-state-metrics to ensure the ScalingActive condition is true.
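The manual check can be sketched as a small predicate over an HPA's status conditions. This is an illustrative sketch only: the `Condition` struct is a simplified local stand-in for the API type, and `needsManualScaleUp` is a hypothetical helper name. The condition type ScalingActive and the reason ScalingDisabled are the ones named in the text above.

```go
package main

import "fmt"

// Condition is a simplified local stand-in for an HPA status condition.
type Condition struct {
	Type   string
	Status string // "True", "False" or "Unknown"
	Reason string
}

// needsManualScaleUp is a hypothetical helper. It flags an HPA whose
// ScalingActive condition is not "True" or whose reason is ScalingDisabled.
// An HPA without a ScalingActive condition is not flagged, because nothing
// can be concluded from a status that has not been written yet.
func needsManualScaleUp(conditions []Condition) bool {
	for _, c := range conditions {
		if c.Type != "ScalingActive" {
			continue
		}
		return c.Status != "True" || c.Reason == "ScalingDisabled"
	}
	return false
}

func main() {
	stuck := []Condition{{Type: "ScalingActive", Status: "False", Reason: "ScalingDisabled"}}
	fmt.Println(needsManualScaleUp(stuck))
}
```

Flagged HPA objects would still need their workloads scaled to at least one replica by hand, as described above.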
If a rollback is attempted, all HPAs should be updated to minReplicas: 1, as otherwise the HPA for deployments with zero replicas will be disabled until replicas have been raised explicitly to at least 1.
What specific metrics should inform a rollback?
If an unexpected number of HPA objects show the status condition ScalingActive set to false and mention ScalingDisabled, the feature isn’t working as desired. In that case, all HPA objects should be updated to minReplicas > 0 again and their managed workloads should be scaled to at least 1.
This condition can also be detected using the kube_horizontalpodautoscaler_status_condition metric provided by kube-state-metrics, but the reason should be manually confirmed for flagged HPA objects.
Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
Not yet, as no implementation based on the new condition is available.
Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
No.
Monitoring Requirements
How can an operator determine if the feature is in use by workloads?
The new status will be visible inside the kube_horizontalpodautoscaler_status_condition metric provided by kube-state-metrics, and the minReplicas: 0 setting is reflected in kube_horizontalpodautoscaler_spec_min_replicas.
How can someone using this feature know that it is working for their instance?
When this feature is enabled for a workload scaled based on an object or external metric, the workload should be scaled to 0 replicas when the metric is 0.
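To illustrate why a metric value of 0 leads to 0 replicas, the sketch below applies the ratio-based formula from the HPA documentation, desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), to a single value-type metric. The function name `proposeReplicas` is hypothetical, and the handling of the zero-replica case (using the ratio alone, since there are no pods to multiply by) is an assumption of this sketch. Tolerance, clamping to minReplicas and maxReplicas, and stabilization are deliberately left out.

```go
package main

import (
	"fmt"
	"math"
)

// proposeReplicas is a hypothetical helper sketching ratio-based scaling for
// one object or external metric with a target value. targetValue must be > 0.
func proposeReplicas(currentReplicas int32, metricValue, targetValue float64) int32 {
	ratio := metricValue / targetValue
	if currentReplicas == 0 {
		// Assumption: with no running pods the ratio alone determines how
		// many replicas are needed, so any positive metric scales up.
		return int32(math.Ceil(ratio))
	}
	// Documented HPA formula: ceil(currentReplicas * current / target).
	return int32(math.Ceil(ratio * float64(currentReplicas)))
}

func main() {
	// An empty queue with three consumers running.
	fmt.Println(proposeReplicas(3, 0, 10))
	// 25 queued messages, a target of 10 per replica, nothing running.
	fmt.Println(proposeReplicas(0, 25, 10))
}
```

In this sketch an empty queue proposes zero replicas regardless of the current count, and a non-empty queue proposes at least one replica even when starting from zero.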
What are the reasonable SLOs (Service Level Objectives) for the enhancement?
No changes to the autoscaling SLOs.
What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
No changes to the autoscaling SLIs.
Are there any missing metrics that would be useful to have to improve observability of this feature?
No, not with regard to this KEP.
Dependencies
Does this feature depend on any specific services running in the cluster?
The addition has the same dependencies as the current autoscaling controller.
Scalability
Will enabling / using this feature result in any new API calls?
No, the amount of autoscaling related API calls will remain unchanged. No other components are affected.
Will enabling / using this feature result in introducing new API types?
No, this only modifies the existing API types.
Will enabling / using this feature result in any new calls to the cloud provider?
No, the amount of autoscaling related cloud provider calls will remain unchanged. No other components are affected.
Will enabling / using this feature result in increasing size or count of the existing API objects?
Yes, one additional boolean field inside the spec of every HorizontalPodAutoscaler resource.
Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
No, there are no visible latency changes expected for existing autoscaling operations.
Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, …) in any components?
No, there are no visible changes expected for existing autoscaling operations.
Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
No.
Troubleshooting
How does this feature react if the API server and/or etcd is unavailable?
Autoscaling will not occur; this is the same as the current behaviour.
What are other known failure modes?
- Failed to fetch the relevant object or external metrics.
  - Detection: ScalingActive: false condition with FailedGetExternalMetric or FailedGetObjectMetric reason.
  - Mitigations: manually scale the resource.
  - Diagnostics: Related errors should be printed as the messages of ScalingActive: false.
  - Testing: https://github.com/kubernetes/kubernetes/blob/0e3818e02760afa8ed0bea74c6973f605ca4683c/pkg/controller/podautoscaler/replica_calculator_test.go#L451
What steps should be taken if SLOs are not being met to determine the problem?
Check metric_computation_duration_seconds to see which metric encountered the latency issue.
If the latency problem is caused by metrics used for scaling to zero, you can remove those metrics again from your HPA(s).
Implementation History
- (2019/02/25) Original design doc: https://github.com/kubernetes/kubernetes/issues/69687#issuecomment-467082733
- (2019/07/16) Alpha implementation (https://github.com/kubernetes/kubernetes/pull/74526) merged for Kubernetes 1.16
Drawbacks
Alternatives
Third-party solutions like KEDA already support scaling to zero for various resources (e.g. RabbitMQ queues). However, these solutions often introduce additional paradigms and complexity. Since Horizontal Pod Autoscaling is already a core feature of Kubernetes and supports scaling to one, adding native support for scaling to zero would be a valuable and low-complexity enhancement.