KEP-5343: Make nftables the default kube-proxy backend
KEP-5343: Make nftables the default kube-proxy backend
- Release Signoff Checklist
- Summary
- Motivation
- Proposal
- Design Details
- Production Readiness Review Questionnaire
- Implementation History
- Drawbacks
- Alternatives
Release Signoff Checklist
Items marked with (R) are required prior to targeting to a milestone / release.
- (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
- (R) KEP approvers have approved the KEP status as
implementable - (R) Design details are appropriately documented
- (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- e2e Tests for all Beta API Operations (endpoints)
- (R) Ensure GA e2e tests meet requirements for Conformance Tests
- (R) Minimum Two Week Window for GA e2e tests to prove flake free
- (R) Graduation criteria is in place
- (R) all GA Endpoints must be hit by Conformance Tests
- (R) Production readiness review completed
- (R) Production readiness review approved
- “Implementation History” section is up-to-date for milestone
- User-facing documentation has been created in kubernetes/website , for publication to kubernetes.io
- Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
Summary
KEP-3866
, which introduced the nftables kube-proxy backend, noted
that we would probably eventually want to make it the default backend,
as part of some other KEP. This is that KEP.
Motivation
Goals
Declare
nftablesto be the default backend.Avoid breaking existing users.
Non-Goals
None
Proposal
When we created the nftables backend, we assumed it would
eventually be a better choice for everyone than the iptables or
ipvs backend. However, there were a few reasons for not changing the
default right away.
Kernel support
The nftables backend requires a much newer kernel (5.13) than the
iptables and ipvs backends.
There is no official minimum required kernel version for Kubernetes,
though kubeadm will warn if you are using a kernel that is both
“old” (older than 6.0 currently) and not under LTS upstream. Currently
it allows 5.4, 5.10, 5.15, and 6.0 or later, though the last update to
that list was 9 months ago (September 2025), and 5.4 has ended LTS
since then. 5.10 and 5.15 will end LTS in December 2026, so assuming
we update the list again at that point, Kubernetes 1.39 would
semi-officially only support kernels new enough to support the
nftables backend.
In terms of distributions:
Debian 12 (bookworm) has a 6.1 kernel (and 11 will be out of LTS by then anyway).
Ubuntu 22.04 has a 5.15 kernel (and will be the oldest fully-supported release at that point).
RHEL 9 nominally has a 5.14 kernel, though it’s probably effectively newer than 6.1 anyway. (RHEL 8’s nominally 4.18 kernel probably does not have enough backports to be able to support
nftablesmode.)SLE15 SP4 and SP5 have 5.14, and SP6 and SLE16 have 6.x kernels.
Thus, it seems reasonable to assume that as of Kubernetes 1.39, most
users will have a new-enough kernel to support nftables.
Stability and performance
The new nftables backend needed to get some use to convince us that
it was stable and performant enough to replace iptables. At this
point, with it having been GA for 3 releases, we have gotten enough
bug reports to know that people are using it, and few enough to be
confident that it mostly works.
There have been some performance fixes since GA. One problem we know existed in the past that may still be there is that it definitely used to take a long time to start up a new node in a cluster with many services/endpoints. It is possible that kubernetes #135800 and kubernetes #136499 may have fixed this. (They definitely fixed a closely-related issue.) If not, then we should probably fix it to do very large syncs in multiple stages rather than all in a single transaction.
The biggest remaining stability problem is the fact that nft itself
crashes in some circumstances, notably when it encounters rules
created by someone else using a much newer version of nft than the
kube-proxy image contains. We are working on fixing this now by
building our own patched version of nft that should be more stable
(kubernetes #136786
). (We can’t simply switch to using a newer
version of nft, because then we would be inflicting crashes on users
in the opposite situation, with a host filesystem containing a much
older nft.)
Feature compatibility
Another thing that may have prevented some people from migrating to
nftables mode is the lack of support for localhost NodePorts. This
is being addressed by KEP-6032
, which is expected to be Beta in 1.38
and GA in 1.40.
Transitioning the default proxy mode in existing clusters
Many users presumably run kube-proxy with no --proxy-mode argument
or config-file mode. We do not want to switch them from iptables
to nftables without warning.
Starting in 1.37, we will warn users (via logs and events) when the
proxy mode gets defaulted to iptables (rather than having been
explicitly set to iptables). After enough releases of this, we can
switch the default.
(Alternatively, we could remove the default altogether, and say that
in the future, everybody will have to always explicitly specify the
proxy mode that they want. That avoids having anyone get accidentally
switched to nftables, but it means they accidentally completely
break their cluster instead, which is not an improvement.)
Risks and Mitigations
The big risk of making nftables the default is that it might turn
out to not be ready. Of course, if this turned out to be the case, we
could just un-promote it and go back to iptables as default. (The
mitigation is the fact that we will continue to support the iptables
backend for a while still.)
Design Details
See Graduation Criteria for the timeline.
Test Plan
[X] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.
Prerequisite testing updates
None
Unit tests
Probably none; there’s not going to be much new/modified code here.
Integration tests
None; kube-proxy doesn’t have integration tests.
e2e tests
We could possibly do further testing of switching kube-proxy modes, but we may not need to.
Graduation Criteria
This isn’t really Alpha/Beta/GA, but let’s pretend it is anyway just to structure the work.
Alpha (1.37)
Begin warning users who get
iptablesby default that the default mode will switch tonftablesin 1.40 so they should requestiptablesexplicitly. (Or migrate now.)- Update kubeadm to specify
iptablesexplicitly in new configs if it’s not already doing that.
- Update kubeadm to specify
(KEP-5495 should declare the
ipvsbackend to be deprecated.)
Beta (1.39)
Update kubeadm’s
system-validatorto no longer accept 5.4 and 5.10 kernels, which will now be out of LTS, but to accept 5.14 and 5.15 (which aren’t LTS upstream but are used by still-supported LTS distros).Update the kube-proxy docs to say something more like “nftables doesn’t work with very old kernels” rather than “nftables requires a very recent kernel”.
(KEP-6032 should have a Beta implementation of localhost nodeports for
nftablesby now.)
GA (1.40)
Change the default proxy mode to
nftables.- This will be pointed out in an action-required release note.
Make
iptableslog about the fact that it is no longer the default and thatnftablesis now preferred.Update kubeadm to default to
nftables.(KEP-5495 should update the
KubeProxyIPVSfeature gate toDefault: false.)Update the docs to reflect that
nftablesis the default,iptablesis legacy, andipvsis on the way out.
Upgrade / Downgrade Strategy
Users who already explicitly set the proxy mode will see no change on upgrade or downgrade. Hopefully, by the time the default actually changes, everybody will be explicitly setting the proxy mode.
For people who ignore the warnings and the release notes and update to 1.40 without explicitly setting the proxy mode, they will have a change in behavior on upgrade, and a reversion to that change on rollback. But, “don’t do that then”.
Version Skew Strategy
N/A: kube-proxy is the only component that is changing. While it is
possible that some people will end up with a mix of nodes running
iptables and nodes running nftables during an upgrade, this
configuration is supported.
Production Readiness Review Questionnaire
Feature Enablement and Rollback
The change to the default value will happen automatically at the given
release (planned for 1.40). There is no reason to put the change
behind a feature gate, since no functionality is being added or
removed: anyone who wants to “opt in” to the new default value before
the default changes can just set mode: nftables explicitly, and
anyone who wants to “opt out” of the new default can just set mode: iptables explicitly (before or after the default changes).
For anyone who is explicitly setting a mode, this KEP will have
absolutely no effect. The rest of this section and the next are only
considering the case of people who ignore the warnings and let their
cluster get switched from iptables to nftables automatically when
they upgrade to 1.40.
How can this feature be enabled / disabled in a live cluster?
It can’t. The change to the default value happens automatically in 1.40.
Does enabling the feature change any default behavior?
The feature is a change to default behavior.
Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
If you downgrade back to before 1.40, the default will go back to
iptables. Alternatively, you can explicitly set mode: iptables
after upgrading to undo the effect of the defaults change.
What happens if we reenable the feature if it was previously rolled back?
As expected.
Are there any tests for feature enablement/disablement?
No, though we have manually tested changing the proxy mode to/from
nftables on a running node in the past.
Rollout, Upgrade and Rollback Planning
How can a rollout or rollback fail? Can it impact already running workloads?
If someone running on a kernel that does not support kube-proxy in
nftables mode upgrades to 1.40 without having explicitly set mode: iptables in their config, then kube-proxy will try to start in
nftables mode and fail, leaving the node broken. (Again, “don’t do
that then”.)
In theory, we could have kube-proxy try to detect this case, and have
the default mode be iptables on older hosts and nftables on newer
ones, but that would be confusing for users and hard to document
clearly.
What specific metrics should inform a rollback?
If the nftables mode is not working, it is likely that the cluster
will fail catastrophically (e.g. because of kubernetes.default not
being accessible), and metrics will be beside the point.
Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
No new testing has been done since the mode-changing testing that was
done when the nftables backend was first added.
Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
Switching from iptables mode to nftables mode means that some of
the CLI and config options the user is passing to kube-proxy might now
have no effect (and the user is presumably not setting the
corresponding nftables-specific options).
We give some notes on migrating from iptables mode to nftables
in the online documentation, which points out other things that may
not work right if the proxy mode gets changed out from under the user.
Monitoring Requirements
How can an operator determine if the feature is in use by workloads?
The changed default value is automatically in effect for all clusters after the release in which it changes.
An operator can determine whether a cluster will be / was affected by the changed default by examining the kube-proxy configuration to see if a mode is explicitly specified or not.
How can someone using this feature know that it is working for their instance?
N/A. Anybody who knows that the change in behavior is coming should know that they don’t want to allow the default to be changed out from under them, and thus they should explicitly set a kube-proxy mode to prevent that from happening. (And they can know that that is working by the fact that kube-proxy will no longer log an event at them.)
What are the reasonable SLOs (Service Level Objectives) for the enhancement?
N/A
What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
N/A
Are there any missing metrics that would be useful to have to improve observability of this feature?
We could add a metric for “number of warnings logged about the fact
that you’re not explicitly setting mode”, but anybody who would look
at metrics ought to see the event anyway?
Dependencies
Does this feature depend on any specific services running in the cluster?
No
Scalability
Will enabling / using this feature result in any new API calls?
No
Will enabling / using this feature result in introducing new API types?
No
Will enabling / using this feature result in any new calls to the cloud provider?
No
Will enabling / using this feature result in increasing size or count of the existing API objects?
No
Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
No
Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, …) in any components?
No; switching from iptables to nftables should result in less CPU
and RAM being used.
Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
No
Troubleshooting
How does this feature react if the API server and/or etcd is unavailable?
N/A / the same as kube-proxy already does
What are other known failure modes?
N/A
What steps should be taken if SLOs are not being met to determine the problem?
N/A
Implementation History
Drawbacks
Changing the default value is inherently risky, but we don’t want
people getting stuck with the iptables backend forever.
Alternatives
None.