KEP-3673: Kubelet limit of Parallel Image Pulls
- Release Signoff Checklist
- Summary
- Motivation
- Proposal
- Design Details
- Production Readiness Review Questionnaire
- Implementation History
- Drawbacks
- Alternatives
- Infrastructure Needed (Optional)
Release Signoff Checklist
Items marked with (R) are required prior to targeting to a milestone / release.
- (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
- (R) KEP approvers have approved the KEP status as implementable
- (R) Design details are appropriately documented
- (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- e2e Tests for all Beta API Operations (endpoints)
- (R) Ensure GA e2e tests meet requirements for Conformance Tests
- (R) Minimum Two Week Window for GA e2e tests to prove flake free
- (R) Graduation criteria is in place
- (R) all GA Endpoints must be hit by Conformance Tests
- (R) Production readiness review completed
- (R) Production readiness review approved
- “Implementation History” section is up-to-date for milestone
- User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
- Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
Summary
This KEP proposes adding to kubelet a node-level limit on the number of parallel image pulls.
Motivation
QPS/burst limits on kubelet are confusing
Currently kubelet limits image pulls with QPS and burst, but these settings are confusing because they only limit the rate of requests sent to the container runtime and do not consider the number of inflight image pulls. In other words, even if a small QPS is set, many image pulls could still be in progress in parallel if each pull takes a long time. See issue #112044 as an example.
No way to limit the number of inflight image pulls
A user might want to limit the number of images being pulled at the same time to avoid burdening disk I/O or consuming too much network bandwidth. However, neither kubelet nor containerd currently limits the number of inflight image pulls.
On kubelet, as mentioned above, the QPS/burst limit does not take in-progress pulls into account. On containerd, there is only a per-image limit, which caps the number of layers downloaded in parallel for each image but still allows an unlimited number of downloads overall.
As a side note, cri-o has a configuration option to limit the maximum number of in-progress layer downloads.
Goals
Add a node-level limit of parallel image pulls to kubelet. This limit caps the maximum number of images being pulled in parallel. Any image pull request beyond the limit will be blocked until one image pull finishes.
Non-Goals
- Limiting the number of image layers being downloaded at the same time.
- Prioritizing image pulls in any way.
- Using the number of inflight image pulls as a signal to direct pod scheduling.
Proposal
Before this proposal, serialize-image-pulls defaults to true.
This proposal includes some defaulting and validation logic.
- A new integer field maxParallelImagePulls will be added to the Kubelet Configuration. maxParallelImagePulls is the maximum number of inflight image pulls.
- If neither serialize-image-pulls nor maxParallelImagePulls is set, serialize-image-pulls defaults to true, the same as the current behavior: images are pulled one at a time, which is equivalent to maxParallelImagePulls being 1.
- If serialize-image-pulls is not set and maxParallelImagePulls is set, the default value of serialize-image-pulls depends on maxParallelImagePulls:
  - If maxParallelImagePulls is 1, serialize-image-pulls defaults to true.
  - If maxParallelImagePulls is larger than 1, serialize-image-pulls defaults to false.
- If both serialize-image-pulls and maxParallelImagePulls are set, the following validations apply:
  - If serialize-image-pulls is true, maxParallelImagePulls must be nil or 1. If maxParallelImagePulls is larger than 1, configuration validation fails.
  - If serialize-image-pulls is false, maxParallelImagePulls must be larger than 0. If maxParallelImagePulls is less than 1, configuration validation fails.
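The defaulting and validation rules above can be sketched in Go as follows. imagePullConfig, applyDefaults, validate, and the pointer helpers are hypothetical names chosen for illustration (a nil pointer models "not set"); the real kubelet defaulting and validation code may be structured differently. Defaulting runs first, so validation always sees a concrete serializeImagePulls value.

```go
package main

import (
	"errors"
	"fmt"
)

// imagePullConfig is a hypothetical stand-in for the two relevant
// Kubelet Configuration fields; nil means "not set by the user".
type imagePullConfig struct {
	SerializeImagePulls   *bool
	MaxParallelImagePulls *int32
}

// applyDefaults fills in SerializeImagePulls when the user left it unset.
func applyDefaults(c *imagePullConfig) {
	if c.SerializeImagePulls != nil {
		return // explicitly set; consistency is checked by validate
	}
	// Neither field set, or maxParallelImagePulls == 1: keep serial pulls.
	serialize := true
	if c.MaxParallelImagePulls != nil && *c.MaxParallelImagePulls > 1 {
		serialize = false
	}
	c.SerializeImagePulls = &serialize
}

// validate checks the combination of both fields after defaulting.
func validate(c imagePullConfig) error {
	if c.SerializeImagePulls == nil {
		return errors.New("serializeImagePulls must be defaulted before validation")
	}
	if c.MaxParallelImagePulls == nil {
		return nil // no explicit limit
	}
	m := *c.MaxParallelImagePulls
	if *c.SerializeImagePulls && m != 1 {
		return fmt.Errorf("maxParallelImagePulls must be nil or 1 when serializeImagePulls is true, got %d", m)
	}
	if !*c.SerializeImagePulls && m < 1 {
		return fmt.Errorf("maxParallelImagePulls must be greater than 0 when serializeImagePulls is false, got %d", m)
	}
	return nil
}

func int32Ptr(v int32) *int32 { return &v }

func boolPtr(v bool) *bool { return &v }

func main() {
	c := imagePullConfig{MaxParallelImagePulls: int32Ptr(3)}
	applyDefaults(&c)
	fmt.Println(*c.SerializeImagePulls, validate(c)) // false <nil>

	bad := imagePullConfig{SerializeImagePulls: boolPtr(true), MaxParallelImagePulls: int32Ptr(5)}
	applyDefaults(&bad)
	fmt.Println(validate(bad) != nil) // true
}
```

A maxParallelImagePulls below 1 with serializeImagePulls unset is not covered by the rules above; in this sketch it defaults serializeImagePulls to true and is then rejected by validation.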
User Stories (Optional)
Story 1
A Kubernetes user wants to enable parallel image pulls to reduce workload startup latency, but doesn't want too many images to be pulled at the same time, as that might burden disk I/O and degrade disk performance, or use too much network bandwidth. By setting a proper maxParallelImagePulls, they gain better control over the parallelism and avoid overwhelming the disk.
Notes/Constraints/Caveats (Optional)
Risks and Mitigations
Because the inflight pull count is tracked purely in memory, kubelet loses track of image pulls that were started before a restart. Even on a non-graceful kubelet shutdown, in-flight image pulls appear to be cancelled, but there may be corner cases where pulls started before the restart are still being performed afterwards, so the total could exceed maxParallelImagePulls.
Design Details
- The implementation will be similar to the existing serialImagePuller. More specifically:
  - maxParallelImagePulls will be passed to parallelImagePuller, and parallelImagePuller will create a channel of size maxParallelImagePulls.
  - parallelImagePuller.pullImage will simply try to send the pull request to the channel, and will be blocked if the channel is full.
  - A goroutine will keep processing the requests in the channel.
Test Plan
[X] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.
Prerequisite testing updates
None.
Unit tests
A new unit test was added to image_manager_test.go along with the Alpha implementation.
k8s.io/kubernetes/pkg/kubelet/images/image_manager.go: 05/29/2023 - 97.3
The unit test covers the following cases:
- Kubelet sends image pull requests to the container runtime as long as their number is equal to or below MaxParallelImagePulls.
- Kubelet blocks further image pull requests from being sent to the container runtime once MaxParallelImagePulls is hit.
- If a certain number of image pulls get stuck, other image pull requests can still be sent to the container runtime.
Integration tests
See e2e tests section below.
e2e tests
A new node_e2e test with serialize-image-pulls==false will be added to test parallel image pull limits.
- When maxParallelImagePulls is reached, all further image pulls will be blocked.
- Verify the behavior when the same image is pulled in parallel, which happens when the image pull policy is Always.
- Parallel image pull test cases: https://github.com/kubernetes/kubernetes/blob/6c258fa74b2f0644a6b31a7ce3e613dda41effd4/test/e2e_node/image_pull_test.go
Graduation Criteria
Alpha
- Initial e2e tests completed and enabled
Beta
- Gather feedback from developers and surveys
- Add e2e test to cover the parallel image pull case
GA
- Gather feedback from real-world usage from kubernetes vendors.
- Allow time for feedback.
Upgrade / Downgrade Strategy
N/A
Version Skew Strategy
N/A
Production Readiness Review Questionnaire
Feature Enablement and Rollback
How can this feature be enabled / disabled in a live cluster?
- Other
  - Describe the mechanism: The feature is enabled when the kubelet config field maxParallelImagePulls is set to a non-zero value on kubelet, and can be disabled by unsetting maxParallelImagePulls (leaving it nil) and restarting kubelet.
  - Will enabling / disabling the feature require downtime of the control plane?
    - No
  - Will enabling / disabling the feature require downtime or reprovisioning of a node?
    - Yes, it requires restarting kubelet.
Does enabling the feature change any default behavior?
The change itself does not alter any default behavior. Behavior changes only when the user explicitly sets maxParallelImagePulls to a non-zero value.
Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
A user can roll back to the previous default behavior by setting serialize-image-pulls to true and restarting kubelet.
Similarly, setting maxParallelImagePulls to 1 will be equivalent to setting serialize-image-pulls to true.
What happens if we reenable the feature if it was previously rolled back?
Nothing will happen.
Are there any tests for feature enablement/disablement?
Yes, see e2e tests section above.
Rollout, Upgrade and Rollback Planning
N/A
How can a rollout or rollback fail? Can it impact already running workloads?
No running workloads will be impacted.
Note that changing MaxParallelImagePulls requires a kubelet restart. Since the parallel image pull counter is maintained in memory, restarting kubelet resets the counter and can potentially allow more image pulls than the limit.
What specific metrics should inform a rollback?
In the worst case, image pulls might fail. Users can monitor image pull k8s events and the kubelet_runtime_operations_errors_total metric (with operation_type=pull_image) to see whether image pull failures increase.
Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
See e2e tests section above. The upgrade->downgrade->upgrade path needs multiple kubelet restarts and is not strictly necessary for this feature. We manually tested enabling and disabling this feature by changing the kubelet config and restarting kubelet.
The manual test steps are as follows:
1. Create a one-node 1.27 k8s cluster, which supports MaxParallelImagePulls but leaves its value nil (no limit) by default.
2. Manually change the MaxParallelImagePulls setting by SSH-ing to the node, adding the following to the kubelet config, and restarting kubelet:
   serializeImagePulls: false
   maxParallelImagePulls: 2
3. Deploy three pods to the one-node cluster, each with a different container image. All three images are 5GB; the relatively large size ensures there is enough time between image pulling events, which makes the behavior easier to observe.
4. Observe the k8s events by running kubectl get events, and verify that exactly two images finish pulling first, followed by the remaining one.
5. Manually change the MaxParallelImagePulls setting by SSH-ing to the node again, removing the serializeImagePulls and maxParallelImagePulls entries, and restarting kubelet.
6. Deploy two pods to the cluster, each with a different container image. Both images are 5GB and differ from the three images deployed in step 3.
7. Observe the k8s events by running kubectl get events, and verify that exactly one image finishes pulling first, followed by the other.
Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
No.
Monitoring Requirements
How can an operator determine if the feature is in use by workloads?
Image pulling is managed by kubelet and does not affect how workloads run. That said, when parallel image pulling is enabled (serializeImagePulls is set to false), an operator will observe that a pod can start while kubelet is still pulling images for another pod.
To observe the effect of different MaxParallelImagePulls settings, please refer to the next section.
How can someone using this feature know that it is working for their instance?
- Events
  - Event Reason: Pulling
Assuming MaxParallelImagePulls is set to X, an operator can look at the container runtime log and see X PullImage requests sent to the container runtime at the same time.
If the image pulls take roughly the same amount of time, an operator can check the k8s events and see X images finish pulling at roughly the same time.
What are the reasonable SLOs (Service Level Objectives) for the enhancement?
The success rate of image pulls should remain the same with a parallel image pull limit set as without it. For example, if a cluster admin has set an SLO that 99% of image pulls should succeed, then setting the parallel image pull limit should not lower the success rate below 99%.
What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
We can rely on the existing metrics on image pull to determine if this feature has any impact on image pulling.
- Metrics
  - Metric name: kubelet_runtime_operations_errors_total
  - [Optional] Aggregation method: operation_type=pull_image
  - Components exposing the metric: kubelet
Are there any missing metrics that would be useful to have to improve observability of this feature?
No.
Dependencies
Does this feature depend on any specific services running in the cluster?
No.
Scalability
Will enabling / using this feature result in any new API calls?
No.
Will enabling / using this feature result in introducing new API types?
No.
Will enabling / using this feature result in any new calls to the cloud provider?
No.
Will enabling / using this feature result in increasing size or count of the existing API objects?
No.
Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
No.
Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, …) in any components?
If the parallel pull limit is configured improperly, it might cause disk I/O or network exhaustion, but that is not a new problem: the same can already happen when parallel image pulls are enabled without a limit. This KEP actually makes it possible to avoid such exhaustion by limiting the maximum number of image pulls in parallel.
Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
See question above.
Troubleshooting
How does this feature react if the API server and/or etcd is unavailable?
N/A. This feature does not rely on any component other than kubelet.
What are other known failure modes?
No known failure modes.
What steps should be taken if SLOs are not being met to determine the problem?
If this feature impacts image pulling, the user should unset MaxParallelImagePulls (i.e., set it to nil) or set serializeImagePulls to true to enable serial image pulling.
Implementation History
Alpha
Alpha feature was implemented in 1.27: https://github.com/kubernetes/kubernetes/pull/115220
Beta
Add e2e tests https://github.com/kubernetes/kubernetes/pull/121604 (WIP):
- A new node_e2e test to confirm that image pulls will be blocked if maxParallelImagePulls is reached.
- Verify the behavior of pulling the same image in parallel using imagePullPolicy: Always.
- Check the waiting period of image pulls for pods with MaxParallelImagePulls: 1 and MaxParallelImagePulls: 5.
GA
The feature is GA in 1.35.