KEP-2595: Expanded DNS Configuration
KEP-2595: Expanded DNS Configuration
- Summary
- Motivation
- Proposal
- Design Details
- Production Readiness Review Questionnaire
- Implementation History
- Drawbacks
- Alternatives
- Infrastructure Needed (Optional)
Summary
Allow kubernetes to have expanded DNS(Domain Name System) configuration.
Motivation
Kubernetes today limits DNS configuration according to the obsolete
criteria
. As recent DNS resolvers
allow an arbitrary number of search paths, a new feature gate
ExpandedDNSConfig will be introduced. With this feature, kubernetes allows
more DNS search paths and longer list of DNS search paths to keep up with recent
DNS resolvers.
Confirmed that expanded DNS configuration is supported by
glibc 2.17-323glibc 2.28musl libc 1.22pure Go 1.10 resolverpure Go 1.16 resolver
Goals
- Make
kube-apiserverallow expanded DNS configuration when validating Pod’s or PodTemplate’sDNSConfig - Make
kubeletallow expanded DNS configuration when validatingresolvConf - Make
kubeletallow expanded DNS configuration when validating actual DNS resolver configuration composed ofcluster domain suffixes(e.g. default.svc.cluster.local, svc.cluster.local, cluster.local), kubelet’sresolvConfand Pod’sDNSConfig
Non-Goals
- Remove limitation on DNS search paths completely
- Let cluster administrators limit the number of search paths or the length of DNS search path list to an arbitrary number
Proposal
- Expand
MaxDNSSearchPathsto 32 - Expand
MaxDNSSearchListCharsto 2048
User Stories (Optional)
Story 1
Story 2
Notes/Constraints/Caveats (Optional)
Some container runtimes of older versions have their own restrictions on the number of DNS search paths. For the container runtimes which are older than
- containerd v1.5.6
- CRI-O v1.22
, pods with expanded DNS configuration may get stuck in the pending state. (see kubernetes#104352 )
This enhancement relaxes the validation of Pod and PodTemplate. Once the
feature is activated, it must be carefully disabled. Otherwise, the objects left
over from the previous version which have the expanded DNS configuration can be
problematic.
Risks and Mitigations
There may be some environments(DNS resolver or others) that break without current limitations. At this point, it is fair to open a bug, so they can fix it.
Design Details
- Declare and define
MaxDNSSearchPathsExpandedto32andMaxDNSSearchListCharsExpandedto2048 - Add the feature gate
ExpandedDNSConfig(see Feature Enablement and Rollback ) - If the feature gate
ExpandedDNSConfigis enabled, replaceMaxDNSSearchPathswithMaxDNSSearchPathsExpandedand replaceMaxDNSSearchListCharswithMaxDNSSearchListCharsExpandedto allow expanded DNS configuration
Test Plan
[x] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.
Prerequisite testing updates
N/A
Unit tests
Verified that the API server accepts the pod or podTemplate with the expanded DNS config and the kubelet accepts the resolv.conf or pod with the expanded DNS config.
k8s.io/kubernetes/pkg/kubelet/network/dns/dns.go:2023/04/22-68.7%
Integration tests
No integration tests are planned.
e2e tests
Tests in test/e2e/network/dns.go:
- [It] [sig-network] DNS should work with the pod containing more than 6 DNS search paths and longer than 256 search list characters: results
Graduation Criteria
Alpha
- Implement the feature
- Add appropriate unit tests
Beta
- Address feedback from alpha
- Add e2e tests
- All major container runtimes supported versions allows this feature
GA
- Address feedback from beta
- Sufficient number of users using the feature
- Close on any remaining open issues & bugs
Upgrade / Downgrade Strategy
N/A
Version Skew Strategy
In clusters with older kubelets, old kubelets with resolvConf configured to
exceed bounds throw warnings but do not fail. Eventually, old kubelets truncate
the overage and apply the actual DNS resolver configuration.
In clusters with a replicated control plane, all kube-apiservers should enable or disable the expanded DNS configuration feature.
Production Readiness Review Questionnaire
Feature Enablement and Rollback
How can this feature be enabled / disabled in a live cluster?
- Feature gate (also fill in values in
kep.yaml)- Feature gate name:
ExpandedDNSConfig - Components depending on the feature gate:
kubeletkube-apiserver
- This feature is not compatible with some older container runtimes (see Notes/Constraints/Caveats )
- Feature gate name:
- Other
- Describe the mechanism:
- Will enabling / disabling the feature require downtime of the control plane?
- Will enabling / disabling the feature require downtime or reprovisioning of a node?
- Feature gate (also fill in values in
Does enabling the feature change any default behavior?
Enabling this feature allows kubernetes to have objects(Pod or PodTemplate) with the expanded DNS configuration.
- Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
Yes, the feature can be disabled by disabling the feature gate.
Before disabling the feature gate, it is recommended to remove objects containing podsTemplate with the expanded DNS config as newly created pods will be rejected by the apiserver.
$ cat << \EOF > get-expanded-dns-config-objects.tpl
{{- range $_, $objects := .items}}
{{- with $searches := .spec.template.spec.dnsConfig}}
{{- $length := len .searches }}
{{- if gt $length 6 }}
{{- $objects.metadata.name }}
{{- printf " " }}
{{- continue }}
{{- end}}
{{- $searchStr := "" }}
{{- range $search := .searches}}
{{- $searchStr = printf "%s %s" $searchStr $search }}
{{- end}}
{{- $searchLen := len $searchStr }}
{{- if gt $searchLen 256}}
{{- $objects.metadata.name }}
{{- printf " " }}
{{- continue }}
{{- end }}
{{- end}}
{{- end}}
EOF
# get deployments having the expanded DNS configuration
$ kubectl get deployments.apps --all-namespaces -o go-template-file=get-expanded-dns-config-objects.tpl
Once the feature is disabled, kube-apiserver will reject the newly requested pod having expanded DNS configuration and kubelet will create a resolver configuration excluding the overage.
If there is a problem with an object that already has expanded DNS configuration, the object should be removed manually.
- What happens if we reenable the feature if it was previously rolled back?
New objects with expanded DNS configuration will be accepted by the apiserver and new Pods with expanded configuration will be created by the kubelet.
- Are there any tests for feature enablement/disablement?
Yes.
We verified in unit tests that existing pods work with the feature enabled and already created pods with the expanded DNS config work fine with the feature disabled.
When this feature is disabled, objects containing podTemplate with the expanded DNS config cannot create new pods until that podTemplate is fixed to have the non-expanded DNS config.
Rollout, Upgrade and Rollback Planning
- How can a rollout fail? Can it impact already running workloads?
If a kubelet starts with invalid resolvConf, new workloads will fail DNS
lookups.
- What specific metrics should inform a rollback?
If new workloads start to fail DNS lookups due to a corrupted resolv.conf, or due to older resolver libraries, that would be an indication to rollback the enablement.
- Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
Yes
- Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
No
Monitoring Requirements
- How can an operator determine if the feature is in use by workloads?
There is no metric to indicate the enablement. The operator has to check if there are objects or DNS resolver configuration files with expanded configuration to determine if the feature is in use.
What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
- Metrics
- Metric name:
- [Optional] Aggregation method:
- Components exposing the metric:
- Other (treat as last resort)
- Success of DNS lookups
- Metrics
What are the reasonable SLOs (Service Level Objectives) for the above SLIs?
DNS lookups should not fail as before the feature was enabled.
- Are there any missing metrics that would be useful to have to improve observability of this feature?
TBD
Dependencies
- Does this feature depend on any specific services running in the cluster?
This feature requires container runtime support. See Notes/Constraints/Caveats .
Scalability
- Will enabling / using this feature result in any new API calls?
No
- Will enabling / using this feature result in introducing new API types?
No
- Will enabling / using this feature result in any new calls to the cloud provider?
No
- Will enabling / using this feature result in increasing size or count of the existing API objects?
The sum of the lengths of PodSpec.DNSConfig.Searches can be increased to 2048.
- Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
The DNS lookup time can be increased, but it will be negligible.
- Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, …) in any components?
No
- Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
No, this cannot result in resource exhaustion of node resources as setting
longer DNS search list to /etc/resolv.conf only affects the process running
inside the container.
Troubleshooting
- How does this feature react if the API server and/or etcd is unavailable?
N/A
- What are other known failure modes?
N/A
- What steps should be taken if SLOs are not being met to determine the problem?
If DNS lookups fail, you can check error messages. And then, validate the
kubelet’s resolvConf if it is corrupted or use newer DNS resolver libraries if
they are too old.
Implementation History
- 2021-03-26: Initial discussion at #100583
- 2021-05-11: Initial KEP approved
- 2021-05-27: Initial alpha implementations merged
- 2021-06-05: Initial docs merged
- 2022-01-12: Docs updated to add requirements for the feature
- 2022-08-15: Beta implementation merged
- 2022-08-22: Beta docs merged
Drawbacks
Alternatives
- Remove the limitation of DNS search paths completely
- Make
MaxDNSSearchPathsandMaxDNSSearchListCharsconfigurable