KEP-1797: Configure FQDN as Hostname for Pods
- Release Signoff Checklist
- Summary
- Motivation
- Proposal
- Design Details
- Production Readiness Review Questionnaire
- Implementation History
- Drawbacks
- Alternatives
- Infrastructure Needed (optional)
Release Signoff Checklist
Items marked with (R) are required prior to targeting to a milestone / release.
- (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
- (R) KEP approvers have approved the KEP status as implementable
- (R) Design details are appropriately documented
- (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
- (R) Graduation criteria is in place
- (R) Production readiness review completed
- Production readiness review approved
- “Implementation History” section is up-to-date for milestone
- User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
- Supporting documentation e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
Summary
This proposal gives users the ability to set a pod’s hostname to its Fully Qualified Domain Name (FQDN).
A new PodSpec field, setHostnameAsFQDN, will be introduced. When a user sets this field to true, the pod’s Linux
kernel hostname field (the nodename field of struct utsname)
will be set to its FQDN. Hence, both uname -n and hostname --fqdn will return
the pod’s FQDN. The new PodSpec field setHostnameAsFQDN is unset by default, which behaves like false, to preserve current behavior, i.e.,
setting the hostname field of the kernel to the pod’s short name.
Related Kubernetes issue #1791.
Motivation
This feature would increase the interoperability of Kubernetes with legacy applications. Traditionally, Unix and certain Linux distributions, such as Red Hat and CentOS, have recommended setting the kernel hostname field to the host FQDN. As a consequence, many applications created before Kubernetes rely on this behavior. Having this feature would help containerize existing applications without deep, risky code changes.
Goals
Giving users the ability to set the hostname field of the kernel to the FQDN of a Pod.
Non-Goals
Giving users a way to configure or enforce this feature cluster-wide.
Proposal
This proposal gives users the ability to set a pod’s hostname to its FQDN. A new PodSpec field named setHostnameAsFQDN
will be introduced, with type *bool.
The values of setHostnameAsFQDN are:
- nil (default): The Linux kernel hostname field (the nodename field of struct utsname) of a pod will be set to its short name. This is the current behavior.
- false: Same as nil.
- true: The Linux kernel hostname field (the nodename field of struct utsname) of a pod will be set to its fully qualified domain name (FQDN). The FQDN is determined as described here.
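The three-valued semantics of setHostnameAsFQDN can be sketched in Go. This is a minimal illustration, not the kubelet's actual code: the function kernelHostname and its parameters are hypothetical, and it assumes the pod's short name and FQDN have already been computed, with an empty fqdn meaning the pod has no FQDN (as in Story 1 below).

```go
package main

import "fmt"

// kernelHostname is a hypothetical helper showing which value ends up in the
// kernel hostname field (utsname.nodename) for a given setHostnameAsFQDN.
// shortname is the pod's short hostname; fqdn is its FQDN, or "" when the pod
// has no subdomain and therefore no FQDN.
func kernelHostname(shortname, fqdn string, setHostnameAsFQDN *bool) string {
	// nil (the default) and false both keep the current behavior.
	if setHostnameAsFQDN == nil || !*setHostnameAsFQDN {
		return shortname
	}
	// true only has an effect if the pod actually has an FQDN.
	if fqdn == "" {
		return shortname
	}
	return fqdn
}

func main() {
	enabled := true
	fqdn := "foo.test.bar.svc.cluster.local"
	fmt.Println(kernelHostname("foo", fqdn, nil))      // foo
	fmt.Println(kernelHostname("foo", fqdn, &enabled)) // foo.test.bar.svc.cluster.local
}
```

Using a *bool rather than a plain bool is what lets the API distinguish "unset" from an explicit false, even though both map to the same hostname here.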
User Stories
Story 1: User does not Configure Pod to have FQDN
Assume we have a pod named foo in a namespace bar. The PodSpec subdomain is not set. This pod does not have an FQDN, so the value of setHostnameAsFQDN has no impact. The Pod spec for this example would be:
# Pod spec
apiVersion: v1
kind: Pod
metadata: {"name": "foo", "namespace": "bar"}
spec:
...
If we exec into the Pod:
- uname -n returns foo
- hostname --fqdn returns foo
Story 2: User Configures Pod to have FQDN
Assume we have a pod named foo in a namespace bar. The PodSpec subdomain is set to test. We also assume the cluster-domain is set to its default, i.e. cluster.local. The FQDN of this pod is defined as foo.test.bar.svc.cluster.local (see details here). The user does not set setHostnameAsFQDN. The Pod spec for this example would be:
# Pod spec
apiVersion: v1
kind: Pod
metadata: {"name": "foo", "namespace": "bar"}
spec:
...
hostname: "foo"
subdomain: "test"
If we exec into the Pod:
- uname -n returns foo
- hostname --fqdn returns foo.test.bar.svc.cluster.local
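The FQDN used in Stories 2 and 3 follows the pattern hostname.subdomain.namespace.svc.cluster-domain. The Go sketch below assembles it from the PodSpec and cluster settings; podFQDN is a hypothetical helper, it assumes the pod name is used when the hostname field is empty, and it ignores the truncation of long pod names discussed under Notes/Constraints/Caveats.

```go
package main

import (
	"fmt"
	"strings"
)

// podFQDN is a hypothetical helper that assembles a pod FQDN as
// <hostname>.<subdomain>.<namespace>.svc.<cluster-domain>.
// It returns "" when no subdomain is set, because such a pod has no FQDN.
// If hostname is empty, the pod name is used in its place.
func podFQDN(podName, hostname, subdomain, namespace, clusterDomain string) string {
	if subdomain == "" {
		return ""
	}
	if hostname == "" {
		hostname = podName
	}
	return strings.Join([]string{hostname, subdomain, namespace, "svc", clusterDomain}, ".")
}

func main() {
	// Story 2: pod "foo" in namespace "bar", subdomain "test", default cluster domain.
	fmt.Println(podFQDN("foo", "foo", "test", "bar", "cluster.local"))
	// prints: foo.test.bar.svc.cluster.local
}
```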
Story 3: User Configures Pod to have FQDN and it would like the pod hostname to be the FQDN
Assume we have a pod named foo in a namespace bar. The PodSpec subdomain is set to test. We also assume the cluster-domain is set to its default, i.e. cluster.local. The FQDN of this pod is defined as foo.test.bar.svc.cluster.local (see details here). Additionally, the user sets setHostnameAsFQDN: true. The Pod spec for this example would be:
# Pod spec
apiVersion: v1
kind: Pod
metadata: {"namespace": "bar", "name": "foo"}
spec:
...
hostname: "foo"
subdomain: "test"
setHostnameAsFQDN: true
If we exec into the Pod:
- uname -n returns foo.test.bar.svc.cluster.local
- hostname --fqdn returns foo.test.bar.svc.cluster.local
Notes/Constraints/Caveats
The hostname field of the Linux kernel is limited to 64 bytes (see sethostname(2)), while most Kubernetes resource types require a name as defined in RFC 1123, which limits them to 63 bytes. Kubernetes attempts to use the Pod name as the hostname unless this limit is reached. When the limit is reached, Kubernetes has a series of mechanisms to deal with the issue. These include truncating the Pod hostname when a “naked” Pod name is longer than 63 bytes, and generating Pod names differently when Pods are part of a controller, such as a Deployment. Without any remediation, users might hit the 64-byte kernel hostname limit; Kubernetes will then fail to create the Pod sandbox, and the pod will remain in “ContainerCreating” (Pending status) forever. The feature proposed here will make this issue occur more frequently, as the whole FQDN would now be limited to 64 bytes. Next we illustrate the issue with an example of a potential error message:
$ kubectl get pod
NAME READY STATUS RESTARTS AGE
longpodnametestsaoitfail23423423432wer-547cc5-st6dd 0/1 ContainerCreating 0 52s
$ kubectl describe pod longpodnametestsaoitfail23423423432wer-547cc5-st6dd
Name: longpodnametestsaoitfail23423423432wer-547cc5-st6dd
Namespace: foo
...
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 16s default-scheduler Successfully assigned foo/longpodnametestsaoitfail23423423432wer-547cc5-st6dd to host.company.com
Warning FailedCreatePodSandBox 1s (x2 over 16s) kubelet, host.company.com Failed create pod sandbox: Failed to set FQDN in hostname, Pod hostname longpodnametestsaoitfail23423423432wer-547cc5-st6dd.p1324234234234.foo.svc.testq.company.com is too long (93 characters requested, 64 characters is the limit).
This failure mode is not great because it might not be apparent to users that their pods are failing. To improve the UX of this failure mode, we will create an example Admission Controller that people can take and customize to apply their own policies. For example, if users care only about Deployments, they can make sure this Admission Controller accounts for the size of the FQDN when the setHostnameAsFQDN and subdomain fields are set in the PodSpec template.
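A policy of the kind such an example Admission Controller could enforce boils down to a length check on the would-be FQDN. The Go sketch below is hypothetical: checkFQDNLength and maxKernelHostnameLen are illustrative names, not part of any Kubernetes API, and it assumes the FQDN is built as hostname.subdomain.namespace.svc.cluster-domain. It is fed with the values from the example error message above.

```go
package main

import (
	"fmt"
	"strings"
)

// maxKernelHostnameLen mirrors the 64-byte kernel hostname limit
// described above (see sethostname(2)).
const maxKernelHostnameLen = 64

// checkFQDNLength is a hypothetical admission-style check. It returns an
// error when setHostnameAsFQDN is true and the pod's FQDN would not fit in
// the kernel hostname field. Pods without a subdomain have no FQDN and pass.
func checkFQDNLength(hostname, subdomain, namespace, clusterDomain string, setHostnameAsFQDN *bool) error {
	if setHostnameAsFQDN == nil || !*setHostnameAsFQDN || subdomain == "" {
		return nil
	}
	fqdn := strings.Join([]string{hostname, subdomain, namespace, "svc", clusterDomain}, ".")
	if len(fqdn) > maxKernelHostnameLen {
		return fmt.Errorf("pod FQDN %q is %d bytes, exceeding the %d-byte kernel hostname limit",
			fqdn, len(fqdn), maxKernelHostnameLen)
	}
	return nil
}

func main() {
	enabled := true
	// Values taken from the example error message above.
	err := checkFQDNLength("longpodnametestsaoitfail23423423432wer-547cc5-st6dd",
		"p1324234234234", "foo", "testq.company.com", &enabled)
	fmt.Println(err)
}
```

This sketch checks a concrete pod. For a Deployment template, the final pod name (and thus hostname) carries generated suffixes, like the -547cc5-st6dd in the example, so a real policy would need to reserve room for them.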
Behavior on Windows
There have been discussions with some members of SIG Windows, and it
seems this feature does not make sense from the Windows perspective. However,
the feature works as intended on Windows. Specifically, when
the user configures a pod to have an FQDN and sets setHostnameAsFQDN: true,
Windows sets the registry value of ‘hostname’ for the registry key
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters to
the pod FQDN. When executing the command hostname, the pod FQDN is returned.
Risks and Mitigations
Design Details
Test Plan
The following end-to-end test is implemented in addition to unit tests:
- Create e2e cases for the 3 user scenarios we described and check the value returned by hostname/uname -n versus hostname --fqdn
Graduation Criteria
- Compatible with major systems (e.g. Linux, Windows)
Alpha -> Beta Graduation
- Gather feedback from users
- Ensure e2e tests are running in Testgrid and they are stable
Beta -> GA Graduation
- Allow time for feedback from production users
Upgrade / Downgrade Strategy
We will gate off this feature for one release (1.19) as Alpha, then enable it by default as Beta in the next release (1.20), and graduate it to GA in release 1.22.
Version Skew Strategy
Old kubelets that do not support this feature will ignore the PodSpec setHostnameAsFQDN field and keep setting the kernel hostname to the pod’s short name.
Production Readiness Review Questionnaire
Feature enablement and rollback
This section must be completed when targeting alpha to a release.
How can this feature be enabled / disabled in a live cluster?
- Feature gate (also fill in values in kep.yaml)
  - Feature gate name: SetHostnameAsFQDN
  - Components depending on the feature gate: kube-apiserver and kubelet
- Other
  - Describe the mechanism:
  - Will enabling / disabling the feature require downtime of the control plane?
  - Will enabling / disabling the feature require downtime or reprovisioning of a node?
Does enabling the feature change any default behavior? No
Can the feature be disabled once it has been enabled (i.e. can we rollback the enablement)? Yes, it can be disabled. However, disabling it only affects newly created pods. Existing pods will keep their FQDN as hostname if they were configured for it.
What happens if we re-enable the feature after it was rolled back? New pods configured to have their FQDN as hostname will start getting the FQDN in the kernel hostname field.
Are there any tests for feature enablement/disablement? No, only manual testing was performed.
Rollout, Upgrade and Rollback Planning
This section must be completed when targeting beta graduation to a release.
How can a rollout fail? Can it impact already running workloads? No known failure modes.
What specific metrics should inform a rollback? An abnormal increase in the run_podsandbox_errors_total count could be related to this feature. We should filter the pods that fail to create a sandbox and check whether they are stuck due to the length of their FQDN, as described in the proposal.
Were upgrade and rollback tested? Was upgrade->downgrade->upgrade path tested? Yes. We tested enabling and disabling this feature. Running pods are affected by neither enabling nor disabling this feature. When the feature is disabled, running Pods using the feature keep making use of it, while new pods do not get the setHostnameAsFQDN field even if a user tries to set it. Similarly, when the feature gate is re-enabled, existing pods keep their existing behavior, and new pods that define setHostnameAsFQDN make use of the feature as expected.
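The enable/disable behavior described above, where new pods lose setHostnameAsFQDN while the gate is off but objects already using it keep it, matches the usual Kubernetes pattern of dropping disabled fields on incoming objects. The Go sketch below illustrates that pattern under stated assumptions rather than reproducing the actual API-server code: PodSpec is a stripped-down hypothetical type (not the real k8s.io/api type), dropDisabledSetHostnameAsFQDN is a hypothetical name, and the gateEnabled boolean stands in for a feature-gate lookup.

```go
package main

import "fmt"

// PodSpec is a hypothetical, stripped-down stand-in for the real Pod spec;
// only the fields relevant to this feature are included.
type PodSpec struct {
	Subdomain         string
	SetHostnameAsFQDN *bool
}

// dropDisabledSetHostnameAsFQDN clears SetHostnameAsFQDN on an incoming spec
// when the feature gate is disabled, unless the previously stored spec
// (oldSpec, nil on create) already set it. gateEnabled stands in for a
// feature-gate lookup.
func dropDisabledSetHostnameAsFQDN(spec, oldSpec *PodSpec, gateEnabled bool) {
	if gateEnabled {
		return
	}
	if oldSpec != nil && oldSpec.SetHostnameAsFQDN != nil {
		return // keep the field on objects that already use it
	}
	spec.SetHostnameAsFQDN = nil
}

func main() {
	enabled := true
	// A new pod created while the gate is disabled loses the field.
	newPod := &PodSpec{Subdomain: "test", SetHostnameAsFQDN: &enabled}
	dropDisabledSetHostnameAsFQDN(newPod, nil, false)
	fmt.Println(newPod.SetHostnameAsFQDN == nil) // true
}
```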
Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? N/A
Monitoring requirements
This section must be completed when targeting beta graduation to a release.
How can an operator determine if the feature is in use by workloads? Ideally, this should be a metric. Operations against the Kubernetes API (e.g. checking if there are objects with field X set) may be a last resort. Avoid logs or events for this purpose.
Operators can use the Kubernetes API for this purpose. They would need to get all pods in the cluster and check whether any has both the subdomain and setHostnameAsFQDN fields set. For example, we could find the namespace and name of the pods using this feature with the following command:
kubectl get pod -o json --all-namespaces | jq '.items[] | select(.spec.setHostnameAsFQDN==true) | select(.spec.subdomain!=null) | "\(.metadata.namespace):\(.metadata.name)"'
What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
- Metrics
  - Metric name: run_podsandbox_errors_total
  - Comment: An abnormal increase in run_podsandbox_errors_total might be related to this feature. We should check whether the feature gate is enabled and whether pods are using it.
  - [Optional] Aggregation method:
  - Components exposing the metric: Kubelet
- Other (treat as last resort)
  - Details:
What are the reasonable SLOs (Service Level Objectives) for the above SLIs? At a high level, this usually will be in the form of “high percentile of SLI per day <= X”. It’s impossible to provide comprehensive guidance, but at a very high level (these need more precise definitions) those may be things like:
- per-day percentage of API calls finishing with 5XX errors <= 1%
- 99% percentile over day of absolute value from (job creation time minus expected job creation time) for cron job <= 10%
- 99.9% of /health requests per day finish with 200 code
N/A
Are there any missing metrics that would be useful to have to improve observability of this feature? Describe the metrics themselves and the reason they weren’t added (e.g. cost, implementation difficulties, etc.).
Dependencies
This section must be completed when targeting beta graduation to a release.
- Does this feature depend on any specific services running in the cluster? No
Scalability
For alpha, this section is encouraged: reviewers should consider these questions and attempt to answer them.
For beta, this section is required: reviewers must answer these questions.
For GA, this section is required: approvers should be able to confirm the previous answers based on experience in the field.
Will enabling / using this feature result in any new API calls? No
Will enabling / using this feature result in introducing new API types? No
Will enabling / using this feature result in any new calls to cloud provider? No
Will enabling / using this feature result in increasing size or count of the existing API objects? Pods using this feature are required to set a new field, which increases the size of their objects by a couple of bytes.
Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs? No
Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, …) in any components? No
Troubleshooting
Troubleshooting section serves the Playbook role as of now. We may consider
splitting it into a dedicated Playbook document (potentially with some monitoring
details). For now we leave it here though.
This section must be completed when targeting beta graduation to a release.
How does this feature react if the API server and/or etcd is unavailable? It is not affected.
What are other known failure modes?
For each of them fill in the following information by copying the below template:
- Pod FQDN is longer than 64 bytes
  - Detection: How can it be detected via metrics? Stated another way: how can an operator troubleshoot without logging into a master or worker node? Pods configured to obtain an FQDN that make use of this feature will remain in Pending status, generating error events about the failure to create the PodSandbox due to a too-long FQDN. We could use the metric run_podsandbox_errors_total to identify an abnormal number of failures creating PodSandboxes.
  - Mitigations: What can be done to stop the bleeding, especially for already running user workloads? Pods that fail to start should unset the PodSpec field setHostnameAsFQDN.
  - Diagnostics: What are the useful log messages and their required logging levels that could help debugging the issue? This issue will be logged in Error level log messages and in the Events. The message will be something like:
    GeneratePodSandboxConfig for pod foo failed: Failed to construct FQDN from pod hostname and cluster domain, FQDN <long-fqdn> is too long (64 characters is the max, 70 characters requested)
  - Testing: Are there any tests for failure mode? If not describe why. Both unit tests and e2e tests cover this failure scenario.
What steps should be taken if SLOs are not being met to determine the problem?
Implementation History
- 2020-05-08: KEP Opened PR kubernetes/enhancements #1792, Issue kubernetes/enhancements #1797
- 2020-05-20: KEP marked implementable and merged
- 2020-07-09: Documentation PR merged kubernetes/website #21210 and #22712
- 2020-07-19: Implementation of Feature Merged targeting 1.19 for Alpha. kubernetes/kubernetes #91699
- 2020-07-24: Review for API changes marked as Completed (for kubernetes/kubernetes #91699 changes)
- 2020-08-26: v1.19 includes feature in Alpha
- 2020-08-28: Feature e2e tests running in TestGrid under sig-node-kubelet/node-kubelet-alpha
Drawbacks
Setting the FQDN in the hostname field of the kernel is not standard practice in applications that have been developed to run in orchestration platforms such as Kubernetes. Additionally, the fact that the kernel hostname field is limited to 64 bytes causes poor failure modes, where users might not immediately know that something went wrong.
Alternatives
Alternative to creating this feature:
- Make users fix all their own legacy code to not assume the FQDN is the hostname, which does not seem practical.
Alternative to controlling the feature:
- We could also control the use of this feature using a Kubelet configuration flag. However, configuration flags are harder to maintain, and they require platforms, such as GKE, to add support for them. Additionally, using a PodSpec field ensures that the behavior of controllers, like Deployments, is consistent across all their pods. For example, if we were to use a Kubelet config flag, we might end up in a situation where different pods of the same Deployment behave differently.
Alternatives for improving UX of failure mode:
- Create an admission plugin that calculates the length of the FQDN of the Pod. The problem with this approach is that it might not cover all scenarios: there are many entry points for generating a pod, e.g., Deployments, ReplicaSets, CRDs, etc. Another problem is that it breaks Kubernetes abstraction layers, as we have to make assumptions from the top layer.
- Create non-retriable errors for pods. Currently, failures like the one caused by this kernel hostname limit are retried forever. It would be nice if we could define an error as fatal, so that the pod changes to the Failed state.