KEP-4580: Deprecate & remove Kubelet RunOnce mode
- Release Signoff Checklist
- Summary
- Motivation
- Proposal
- Design Details
- Production Readiness Review Questionnaire
- Implementation History
- Drawbacks
- Alternatives
- Infrastructure Needed (Optional)
Release Signoff Checklist
Items marked with (R) are required prior to targeting to a milestone / release.
- (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
- (R) KEP approvers have approved the KEP status as implementable
- (R) Design details are appropriately documented
- (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- e2e Tests for all Beta API Operations (endpoints)
- (R) Ensure GA e2e tests meet requirements for Conformance Tests
- (R) Minimum Two Week Window for GA e2e tests to prove flake free
- (R) Graduation criteria is in place
- (R) all GA Endpoints must be hit by Conformance Tests
- (R) Production readiness review completed
- (R) Production readiness review approved
- “Implementation History” section is up-to-date for milestone
- User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
- Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
Summary
Deprecate and remove kubelet support for RunOnce mode. Mark the RunOnce field in KubeletConfiguration and the kubelet's --runonce flag as deprecated, and eventually remove the --runonce flag.
Motivation
- RunOnce does not work in systemd mode.
- RunOnce mode doesn't support many newer pod features, such as init containers.
- RunOnce mode does not follow the pod lifecycle described in the documentation; for example, it does not support any volumes.
- RunOnce mode is covered only by some unit tests, with no e2e or integration tests, so we cannot guarantee that it is usable.
Goals
- Mark the RunOnce field in KubeletConfiguration and the --runonce flag of kubelet as deprecated, and finally remove the --runonce flag.
- Remove kubelet support for RunOnce mode.
Non-Goals
Immediate removal: the deprecation and removal process will be gradual and guarded by a feature gate, to increase awareness among potential users.
Proposal
In RunOnce mode, the kubelet process exits after spawning pods from local manifests or a remote URL. The mode is intended for scenarios where one-time tasks need to run on a node. This proposal outlines a plan to deprecate and remove RunOnce mode from the kubelet.
Risks and Mitigations
Some users may still rely on this feature, but podman addresses the same use case in a better-supported way; see https://docs.podman.io/en/latest/markdown/podman-kube.1.html. Affected users can migrate to the podman kube subcommand as needed.
Docker does not officially provide a subcommand similar to podman-kube-play for creating containers from Kubernetes YAML, and there is currently no mature and reliable third-party tool that translates Kubernetes YAML into Docker Compose files. Docker users can, however, perform this translation manually and run the containers with Docker Compose.
Design Details
KubeletConfiguration Change
Mark the RunOnce field as deprecated.
kubelet flag Change
Mark the --runonce flag as deprecated, and remove it in the GA version.
Implement warning logging for RunOnce mode usage
Starting in 1.31, during kubelet startup, if running in RunOnce mode, the kubelet will log a warning message, for example:
klog.Warning("RunOnce mode has been deprecated, and will be removed in a future release")
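The startup check described above could look roughly like the following. This is a minimal sketch, not the kubelet's actual code: the helper name runOnceWarning is hypothetical, and the standard library logger stands in for klog so the example has no external dependencies.

```go
package main

import (
	"log"
)

// runOnceDeprecationWarning is the example message quoted in this KEP.
const runOnceDeprecationWarning = "RunOnce mode has been deprecated, and will be removed in a future release"

// runOnceWarning returns the deprecation message and true when RunOnce mode
// is requested, and an empty string and false otherwise.
// The function name is hypothetical and used only for illustration.
func runOnceWarning(runOnce bool) (string, bool) {
	if !runOnce {
		return "", false
	}
	return runOnceDeprecationWarning, true
}

func main() {
	// In the kubelet, this value would come from the --runonce flag or the
	// RunOnce field in KubeletConfiguration; it is hard-coded here.
	runOnce := true

	// Assumption: the standard logger replaces klog.Warning in this sketch.
	if msg, warn := runOnceWarning(runOnce); warn {
		log.Printf("Warning: %s", msg)
	}
}
```

Keeping the decision separate from the logging call makes the behavior easy to unit test without capturing log output.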
Introduce the LegacyNodeRunOnceMode feature gate
The LegacyNodeRunOnceMode feature gate is introduced to guide users away from the deprecated RunOnce mode. Unless this feature gate is enabled, the kubelet will refuse to start when the --runonce command line flag is set.
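The gate check can be sketched as a small validation step at startup. The featureGates map and the validateRunOnce function below are hypothetical stand-ins for the kubelet's real feature gate machinery and validation code; only the gate name LegacyNodeRunOnceMode and the refuse-to-start behavior come from this KEP.

```go
package main

import (
	"errors"
	"fmt"
)

// featureGates is a simplified, hypothetical stand-in for the kubelet's
// feature gate registry: gate name -> enabled.
type featureGates map[string]bool

// legacyNodeRunOnceMode is the feature gate name proposed in this KEP.
const legacyNodeRunOnceMode = "LegacyNodeRunOnceMode"

// errRunOnceDisabled is returned when RunOnce mode is requested while the
// feature gate is off. The error text is illustrative only.
var errRunOnceDisabled = errors.New(
	"RunOnce mode is deprecated; enable the LegacyNodeRunOnceMode feature gate to keep using it")

// validateRunOnce returns an error when RunOnce mode is requested but the
// feature gate is not enabled, which models the kubelet refusing to start.
func validateRunOnce(runOnce bool, gates featureGates) error {
	if runOnce && !gates[legacyNodeRunOnceMode] {
		return errRunOnceDisabled
	}
	return nil
}

func main() {
	// Gate explicitly disabled while RunOnce mode is requested.
	gates := featureGates{legacyNodeRunOnceMode: false}
	if err := validateRunOnce(true, gates); err != nil {
		fmt.Println("kubelet would refuse to start:", err)
	}
}
```

A kubelet that does not request RunOnce mode is unaffected by the gate in this sketch, matching the statement that only RunOnce users see a behavior change.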
Test Plan
[x] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.
Prerequisite testing updates
Unit tests
- N/A
Integration tests
- N/A
e2e tests
- N/A
Graduation Criteria
Alpha
- Feature gate LegacyNodeRunOnceMode is introduced and is enabled by default. Disabling this feature gate causes the kubelet to fail on startup when RunOnce mode is enabled.
- Mark the RunOnce field in KubeletConfiguration as deprecated.
Beta
- LegacyNodeRunOnceMode feature gate is disabled by default.
- Starting the kubelet in RunOnce mode fails unless the feature gate is explicitly enabled.
GA
- The LegacyNodeRunOnceMode feature gate is disabled by default and can no longer be enabled.
- Comment the RunOnce field in KubeletConfiguration as "no longer has any effect", and remove the kubelet's --runonce flag.
- Remove kubelet RunOnce mode.
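The graduation stages above can be summarized as a table of gate defaults. The sketch below is an illustration under stated assumptions: the gateSpec type only loosely mirrors the default and lock-to-default fields of a Kubernetes feature gate specification, and stageSpecs and runOnceAllowed are hypothetical names. At GA the mode itself is removed, so the GA row only models the gate being locked off.

```go
package main

import "fmt"

// gateSpec is a simplified, hypothetical model of a feature gate
// specification: a default value and whether that default is locked.
type gateSpec struct {
	Default       bool
	LockToDefault bool
}

// stageSpecs encodes the LegacyNodeRunOnceMode defaults from the
// graduation criteria in this KEP.
var stageSpecs = map[string]gateSpec{
	"Alpha": {Default: true, LockToDefault: false},
	"Beta":  {Default: false, LockToDefault: false},
	"GA":    {Default: false, LockToDefault: true},
}

// runOnceAllowed reports whether the gate permits RunOnce mode at a stage.
// override is nil when the operator did not set the gate explicitly.
func runOnceAllowed(stage string, override *bool) (bool, error) {
	spec, ok := stageSpecs[stage]
	if !ok {
		return false, fmt.Errorf("unknown stage %q", stage)
	}
	if override == nil {
		return spec.Default, nil
	}
	if spec.LockToDefault && *override != spec.Default {
		return false, fmt.Errorf("LegacyNodeRunOnceMode is locked to %t at %s", spec.Default, stage)
	}
	return *override, nil
}

func main() {
	enabled := true
	for _, stage := range []string{"Alpha", "Beta", "GA"} {
		byDefault, _ := runOnceAllowed(stage, nil)
		withGate, err := runOnceAllowed(stage, &enabled)
		fmt.Printf("%-5s default=%t gate-enabled=%t err=%v\n", stage, byDefault, withGate, err)
	}
}
```

Modeling the GA stage as a locked default makes explicit that operators lose the escape hatch at that point, which is the main difference between Beta and GA from a configuration standpoint.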
Upgrade / Downgrade Strategy
Version Skew Strategy
- N/A
Production Readiness Review Questionnaire
Feature Enablement and Rollback
- Feature gate (also fill in values in kep.yaml)
  - Feature gate name: LegacyNodeRunOnceMode
  - Components depending on the feature gate: kubelet
  - Will enabling / disabling the feature require downtime of the control plane? Yes. The flag must be set when the kubelet starts, and the kubelet must be restarted to disable it. Hence, there would be brief control component downtime on a given node.
  - Will enabling / disabling the feature require downtime or reprovisioning of a node? Yes. See above; disabling would require brief node downtime.
Does enabling the feature change any default behavior?
No.
Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
Yes. Using the feature gate is the only way to enable/disable this feature.
What happens if we reenable the feature if it was previously rolled back?
Re-enabling the feature will make the RunOnce functionality available in the kubelet.
Are there any tests for feature enablement/disablement?
N/A
Rollout, Upgrade and Rollback Planning
How can a rollout or rollback fail? Can it impact already running workloads?
In the alpha stage, this feature gate is enabled by default.
Cluster operators can test the new behavior by disabling the feature gate.
In the beta stage, this feature gate is disabled by default. With the feature gate disabled, the kubelet will refuse to start if it is still configured to use RunOnce mode.
Cluster operators can reinstate the mode by explicitly enabling the feature gate.
What specific metrics should inform a rollback?
N/A
Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
N/A
Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
We will deprecate and remove the --runonce flag of kubelet and the RunOnce field in KubeletConfiguration.
Monitoring Requirements
- N/A
Dependencies
Does this feature depend on any specific services running in the cluster?
No
Scalability
Will enabling / using this feature result in any new API calls?
No
Will enabling / using this feature result in introducing new API types?
No
Will enabling / using this feature result in any new calls to the cloud provider?
No
Will enabling / using this feature result in increasing size or count of the existing API objects?
No
Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
No
Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, …) in any components?
No
Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
No
Troubleshooting
How does this feature react if the API server and/or etcd is unavailable?
What are other known failure modes?
What steps should be taken if SLOs are not being met to determine the problem?
Implementation History
- 2024-04-17: Initial draft KEP
Drawbacks
Alternatives
- Fix RunOnce mode and add e2e tests and integration tests.
- Make RunOnce mode work in systemd mode and support volumes.