KEP-3503: Host Network Support for Windows Pods
- Release Signoff Checklist
- Summary
- Motivation
- Proposal
- Design Details
- Production Readiness Review Questionnaire
- Implementation History
- Drawbacks
- Alternatives
- Infrastructure Needed (Optional)
Release Signoff Checklist
Items marked with (R) are required prior to targeting to a milestone / release.
- (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
- (R) KEP approvers have approved the KEP status as implementable
- (R) Design details are appropriately documented
- (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- e2e Tests for all Beta API Operations (endpoints)
- (R) Ensure GA e2e tests meet requirements for Conformance Tests
- (R) Minimum Two Week Window for GA e2e tests to prove flake free
- (R) Graduation criteria is in place
- (R) all GA Endpoints must be hit by Conformance Tests
- (R) Production readiness review completed
- (R) Production readiness review approved
- “Implementation History” section is up-to-date for milestone
- User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
- Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
Summary
Windows already provides the functionality needed for containers to use the node’s networking namespace. This enhancement details the work needed to enable this functionality in Kubernetes.
Motivation
The two main motivating factors are 1) parity with Linux and 2) improved cluster density.
Today it is possible to set hostNetwork=true for Windows pods, but the setting has no effect
(unless the pod contains hostProcess containers). This can be confusing for users.
In clusters with a large number of services, Windows nodes can experience port exhaustion.
One such situation is when a small number of pods need to expose many ports; it can then be desirable to use host networking instead of nodePorts.
Another situation is using hostNetwork=true as an alternative to relying on the ‘hostPort’ CNI feature for exposing many ports in many pods.
Goals
- Enable Windows containers to use the node’s networking namespace for non-HostProcess containers.
Non-Goals
- Discuss implementation details on how container runtimes configure Pod sandboxes for pods that should be joined to the node’s network namespace.
Proposal
Host networking is already supported for Linux Pods, and this enhancement will bring feature parity to Windows Pods.
Changes to enable this for Windows entail updating the Kubelet to populate the necessary CRI-API fields when running on Windows to instruct the container runtime (containerd) to use the node’s network namespace when configuring the sandbox for Pods that specify hostNetwork=true.
Container runtimes will also need to be updated but that work is out-of-scope for this proposal.
User Stories (Optional)
Story 1
As a user of a legacy application I want to bind many arbitrary ports to a host network namespace on a single node, as opposed to taking all the node ports of a cluster.
Story 2
As a DaemonSet which runs before CNI providers are installed (for example, for security, application bootstrapping, or CNI bootstrapping), I want to be able to run a container that isn’t fully privileged (i.e. that isn’t a HostProcessContainer) but which is on the host’s network.
Story 3
As a user creating a Windows pod with hostNetwork=true, I want correct behaviour (i.e. I don’t want the hostNetwork=true setting in Pod specs to be silently ignored).
Notes/Constraints/Caveats (Optional)
Risks and Mitigations
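Very low risk. The kubelet change is guarded by a feature gate, and it is up to each container runtime to act on the new CRI-API fields.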
Design Details
CRI / Kubelet Updates
Add a new WindowsNamespaceOption struct to CRI-API that mirrors the Linux-specific NamespaceOption
struct and contains only the options that apply on Windows.
// WindowsNamespaceOption provides options for Windows namespaces.
message WindowsNamespaceOption {
// Network namespace for this container/sandbox.
// Note: There is currently no way to set CONTAINER scoped network in the Kubernetes API.
// Namespaces currently set by the kubelet: POD, NODE
NamespaceMode network = 1;
}
Update WindowsSandboxSecurityContext to include WindowsNamespaceOption.
// WindowsSandboxSecurityContext holds platform-specific configurations that will be
// applied to a sandbox.
// These settings will only apply to the sandbox container.
message WindowsSandboxSecurityContext {
...
// Configurations for the sandbox's namespaces.
WindowsNamespaceOption namespace_options = 4;
}
Update Kubelet to set new CRI-API fields based on contents of incoming Pod specs.
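The kubelet-side mapping described above can be sketched as follows. This is a minimal illustration, not the kubelet's actual code: the types are local stand-ins for the generated CRI-API Go types, and `podSpec` and `windowsSandboxSecurityContextFor` are hypothetical names introduced only for this example.

```go
package main

import "fmt"

// NamespaceMode is a local stand-in for the CRI-API enum of the same name.
type NamespaceMode int32

const (
	NamespaceModePod       NamespaceMode = 0 // POD: sandbox gets its own network namespace
	NamespaceModeContainer NamespaceMode = 1 // CONTAINER: not settable through the Kubernetes API
	NamespaceModeNode      NamespaceMode = 2 // NODE: sandbox joins the node's network namespace
)

// WindowsNamespaceOption is a local stand-in for the proposed CRI-API message.
type WindowsNamespaceOption struct {
	Network NamespaceMode
}

// WindowsSandboxSecurityContext is a local stand-in; only the new field is modeled.
type WindowsSandboxSecurityContext struct {
	NamespaceOptions *WindowsNamespaceOption
}

// podSpec is a hypothetical, trimmed-down view of a Pod spec.
type podSpec struct {
	HostNetwork bool
}

// windowsSandboxSecurityContextFor (hypothetical helper) selects NODE when the
// Pod sets hostNetwork=true and POD otherwise.
func windowsSandboxSecurityContextFor(spec podSpec) *WindowsSandboxSecurityContext {
	mode := NamespaceModePod
	if spec.HostNetwork {
		mode = NamespaceModeNode
	}
	return &WindowsSandboxSecurityContext{
		NamespaceOptions: &WindowsNamespaceOption{Network: mode},
	}
}

func main() {
	for _, hostNetwork := range []bool{false, true} {
		sc := windowsSandboxSecurityContextFor(podSpec{HostNetwork: hostNetwork})
		fmt.Printf("hostNetwork=%t -> network mode %d\n", hostNetwork, sc.NamespaceOptions.Network)
	}
}
```

The sketch deliberately leaves out the feature gate; with the gate disabled, the kubelet is expected to leave the new field unset.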
Container Runtime Support
Update Containerd to check for new CRI-API fields in RunPodSandbox and configure networking appropriately.
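On the runtime side, the check might look roughly like the sketch below. It is an assumption-laden illustration rather than containerd's implementation: the types are local stand-ins for the CRI-API messages, and `wantsHostNetwork` is a hypothetical helper. The point it shows is that a missing field (for example, from a kubelet that does not populate it) should fall back to the existing pod-scoped behaviour.

```go
package main

import "fmt"

// NamespaceMode is a local stand-in for the CRI-API enum of the same name.
type NamespaceMode int32

const (
	NamespaceModePod  NamespaceMode = 0 // POD
	NamespaceModeNode NamespaceMode = 2 // NODE
)

// Local stand-ins for the CRI-API messages; only the relevant fields are modeled.
type WindowsNamespaceOption struct {
	Network NamespaceMode
}

type WindowsSandboxSecurityContext struct {
	NamespaceOptions *WindowsNamespaceOption
}

type WindowsPodSandboxConfig struct {
	SecurityContext *WindowsSandboxSecurityContext
}

// wantsHostNetwork (hypothetical helper) reports whether the sandbox config asks
// for the node's network namespace. Any unset level is treated as "pod-scoped".
func wantsHostNetwork(cfg *WindowsPodSandboxConfig) bool {
	if cfg == nil || cfg.SecurityContext == nil || cfg.SecurityContext.NamespaceOptions == nil {
		return false
	}
	return cfg.SecurityContext.NamespaceOptions.Network == NamespaceModeNode
}

func main() {
	cases := []struct {
		name string
		cfg  *WindowsPodSandboxConfig
	}{
		{"field unset", &WindowsPodSandboxConfig{SecurityContext: &WindowsSandboxSecurityContext{}}},
		{"POD", &WindowsPodSandboxConfig{SecurityContext: &WindowsSandboxSecurityContext{
			NamespaceOptions: &WindowsNamespaceOption{Network: NamespaceModePod}}}},
		{"NODE", &WindowsPodSandboxConfig{SecurityContext: &WindowsSandboxSecurityContext{
			NamespaceOptions: &WindowsNamespaceOption{Network: NamespaceModeNode}}}},
	}
	for _, c := range cases {
		fmt.Printf("%s: use host network = %t\n", c.name, wantsHostNetwork(c.cfg))
	}
}
```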
Test Plan
[x] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.
Prerequisite testing updates
Unit tests
- k8s.io/kubernetes/pkg/kubelet/kuberuntime: 2022-09-12 - 67%
Integration tests
N/A - There is currently no way to run integration tests that target Windows specific functionality.
e2e tests
- Existing SIG-Network e2e tests for hostNetwork containers will be updated to run for Windows.
Graduation Criteria
Alpha
- CRI-API updates added to codebase
- Unit tests added to validate that the correct CRI-API fields are set depending on whether hostNetwork=true is set on Windows
- Feature implemented behind a feature flag
- Initial e2e tests completed and enabled
Beta
- A version of containerd with support for configuring pods to use the node’s network namespace is released (target v1.8)
- Functionality is validated as part of Windows Operational Readiness validation.
GA
- All feedback from alpha/beta usage is addressed
Upgrade / Downgrade Strategy
N/A
Version Skew Strategy
N/A
Production Readiness Review Questionnaire
Feature Enablement and Rollback
How can this feature be enabled / disabled in a live cluster?
- Feature gate (also fill in values in kep.yaml)
- Feature gate name: WindowsHostNetwork
- Components depending on the feature gate: kubelet
- Other
- Describe the mechanism:
- Will enabling / disabling the feature require downtime of the control plane?
- Will enabling / disabling the feature require downtime or reprovisioning of a node?
Does enabling the feature change any default behavior?
Enabling this feature gate will cause the kubelet to populate the CRI-API fields outlined above. It will be up to each container runtime to act on these fields.
Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
If this feature is disabled after Windows Pods that set hostNetwork=true have already been scheduled,
those Pods will not use the node’s networking namespace if they get recreated.
The impact of this will vary per workload.
What happens if we reenable the feature if it was previously rolled back?
If this feature is reenabled, any Pods that set hostNetwork=true and were created while the feature was disabled will be joined to the node’s networking namespace if they get recreated. The impact of this will vary per workload.
Are there any tests for feature enablement/disablement?
Tests will be added to the kubelet to verify that the above-mentioned CRI-API fields are or are not populated based on the state of the feature gate.
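A table-driven check of the kind described could look like the sketch below. It assumes, for illustration only, that the kubelet leaves the namespace options unset when the gate is off and populates POD or NODE when it is on; `buildNamespaceOptions` is a hypothetical helper, and the types are local stand-ins for the CRI-API ones.

```go
package main

import "fmt"

// NamespaceMode is a local stand-in for the CRI-API enum of the same name.
type NamespaceMode int32

const (
	NamespaceModePod  NamespaceMode = 0 // POD
	NamespaceModeNode NamespaceMode = 2 // NODE
)

// WindowsNamespaceOption is a local stand-in for the proposed CRI-API message.
type WindowsNamespaceOption struct {
	Network NamespaceMode
}

// buildNamespaceOptions (hypothetical helper) returns nil when the
// WindowsHostNetwork gate is disabled, so the CRI-API field stays unset.
// Assumption: with the gate enabled, the field is always populated.
func buildNamespaceOptions(gateEnabled, hostNetwork bool) *WindowsNamespaceOption {
	if !gateEnabled {
		return nil
	}
	if hostNetwork {
		return &WindowsNamespaceOption{Network: NamespaceModeNode}
	}
	return &WindowsNamespaceOption{Network: NamespaceModePod}
}

func main() {
	cases := []struct {
		gateEnabled, hostNetwork bool
	}{
		{false, false},
		{false, true},
		{true, false},
		{true, true},
	}
	for _, c := range cases {
		opts := buildNamespaceOptions(c.gateEnabled, c.hostNetwork)
		if opts == nil {
			fmt.Printf("gate=%t hostNetwork=%t -> field not populated\n", c.gateEnabled, c.hostNetwork)
			continue
		}
		fmt.Printf("gate=%t hostNetwork=%t -> network mode %d\n", c.gateEnabled, c.hostNetwork, opts.Network)
	}
}
```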
Rollout, Upgrade and Rollback Planning
How can a rollout or rollback fail? Can it impact already running workloads?
What specific metrics should inform a rollback?
Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
Monitoring Requirements
How can an operator determine if the feature is in use by workloads?
How can someone using this feature know that it is working for their instance?
- Events
- Event Reason:
- API .status
- Condition name:
- Other field:
- Other (treat as last resort)
- Details:
What are the reasonable SLOs (Service Level Objectives) for the enhancement?
What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
- Metrics
- Metric name:
- [Optional] Aggregation method:
- Components exposing the metric:
- Other (treat as last resort)
- Details:
Are there any missing metrics that would be useful to have to improve observability of this feature?
Dependencies
Does this feature depend on any specific services running in the cluster?
Scalability
Will enabling / using this feature result in any new API calls?
No
Will enabling / using this feature result in introducing new API types?
No
Will enabling / using this feature result in any new calls to the cloud provider?
No
Will enabling / using this feature result in increasing size or count of the existing API objects?
No
Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
No
Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, …) in any components?
No
Troubleshooting
How does this feature react if the API server and/or etcd is unavailable?
What are other known failure modes?
What steps should be taken if SLOs are not being met to determine the problem?
Implementation History
- 2022-09-12: Initial KEP merged
- 2022-10-10: Alpha implementation merged into kubernetes/kubernetes
- 2025-01-31: Feature has been withdrawn and will be removed from the code