KEP-1797: Configure FQDN as Hostname for Pods

Release Signoff Checklist
Summary
Motivation
- Goals
- Non-Goals
Proposal
Design Details
Production Readiness Review Questionnaire
Implementation History
Drawbacks
Alternatives
Infrastructure Needed (optional)

Release Signoff Checklist

Items marked with (R) are required prior to targeting to a milestone / release.

(R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
(R) KEP approvers have approved the KEP status as implementable
(R) Design details are appropriately documented
(R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
(R) Graduation criteria is in place
(R) Production readiness review completed
Production readiness review approved
“Implementation History” section is up-to-date for milestone
User-facing documentation has been created in kubernetes/website , for publication to kubernetes.io
Supporting documentation e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

Summary

This proposal gives users the ability to set a pod’s hostname to its Fully Qualified Domain Name (FQDN). A new PodSpec field, setHostnameAsFQDN, will be introduced. When a user sets this field to true, the pod’s Linux kernel hostname field (the nodename field of struct utsname) will be set to the pod’s FQDN. Hence, both uname -n and hostname --fqdn will return the pod’s FQDN. The field defaults to false to preserve current behavior, i.e., setting the kernel hostname field to the pod’s shortname.

Related Kubernetes issue #1791.

Motivation

This feature would increase the interoperability of Kubernetes with legacy applications. Traditionally, Unix and certain Linux distributions, such as Red Hat and CentOS, have recommended setting the kernel hostname field to the host FQDN. As a consequence, many applications created before Kubernetes rely on this behavior. This feature would help containerize such applications without deep, risky code changes.

Goals

Giving users the ability to set the hostname field of the kernel to the FQDN of a Pod.

Non-Goals

Giving users a way to configure or enforce this feature cluster-wide.

Proposal

This proposal gives users the ability to set a pod’s hostname to its FQDN. A new PodSpec field named setHostnameAsFQDN will be introduced, with type *bool.

The values of setHostnameAsFQDN are:

nil (default): The Linux kernel hostname field (the nodename field of struct utsname) of a pod will be set to its shortname. This is the current behavior.
False: Same as nil.
True: The Linux kernel hostname field (the nodename field of struct utsname) of a pod will be set to its fully qualified domain name (FQDN). The FQDN is determined as described here.

User Stories

Story 1: User does not Configure Pod to have FQDN

Assume we have a pod named foo in a namespace bar. The PodSpec subdomain is not set. This pod does not have an FQDN, so the value of setHostnameAsFQDN has no effect. The Pod spec for this example would be:

# Pod spec
apiVersion: v1
kind: Pod
metadata: {"name": "foo", "namespace": "bar"}
spec:
  ...

If we exec into the Pod:

uname -n returns foo
hostname --fqdn returns foo

Story 2: User Configures Pod to have FQDN

Assume we have a pod named foo in a namespace bar. The PodSpec subdomain is set to test. We also assume the cluster-domain is set to its default, i.e. cluster.local. The FQDN of this pod is defined as foo.test.bar.svc.cluster.local (see details here). The user does not set setHostnameAsFQDN. The Pod spec for this example would be:

# Pod spec
apiVersion: v1
kind: Pod
metadata: {"name": "foo", "namespace": "bar"}
spec:
  ...
  hostname: "foo"  
  subdomain: "test"

If we exec into the Pod:

uname -n returns foo
hostname --fqdn returns foo.test.bar.svc.cluster.local

Story 3: User Configures Pod to have FQDN and would like the pod hostname to be the FQDN

Assume we have a pod named foo in a namespace bar. The PodSpec subdomain is set to test. We also assume the cluster-domain is set to its default, i.e. cluster.local. The FQDN of this pod is defined as foo.test.bar.svc.cluster.local (see details here). Additionally, the user sets setHostnameAsFQDN: true. The Pod spec for this example would be:

# Pod spec
apiVersion: v1
kind: Pod
metadata: {"namespace": "bar", "name": "foo"}
spec:
  ...
  hostname: "foo"  
  subdomain: "test"
  setHostnameAsFQDN: true

If we exec into the Pod:

uname -n returns foo.test.bar.svc.cluster.local
hostname --fqdn returns foo.test.bar.svc.cluster.local
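
The three stories above can be condensed into a small decision function. The following Python sketch is illustrative only and is not the kubelet implementation: the function and parameter names are hypothetical, and the FQDN layout (hostname, subdomain, namespace, svc, cluster domain) is taken from the examples in the stories.

```python
from typing import Optional


def pod_fqdn(hostname: str, subdomain: Optional[str], namespace: str,
             cluster_domain: str = "cluster.local") -> str:
    """Return the pod FQDN, or the short hostname when no subdomain is set."""
    if not subdomain:
        # Story 1: without a subdomain the pod has no FQDN.
        return hostname
    return f"{hostname}.{subdomain}.{namespace}.svc.{cluster_domain}"


def kernel_hostname(hostname: str, subdomain: Optional[str], namespace: str,
                    set_hostname_as_fqdn: Optional[bool] = None,
                    cluster_domain: str = "cluster.local") -> str:
    """Return the value that `uname -n` would report inside the pod."""
    if set_hostname_as_fqdn:
        # Story 3: the kernel hostname is the FQDN.
        return pod_fqdn(hostname, subdomain, namespace, cluster_domain)
    # Stories 1 and 2: nil (None) and false both keep the shortname.
    return hostname
```

For Story 3, kernel_hostname("foo", "test", "bar", True) evaluates to foo.test.bar.svc.cluster.local, while the same call without the last argument (Story 2) evaluates to foo.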

Notes/Constraints/Caveats

The hostname field of the Linux kernel is limited to 64 bytes (see sethostname(2)), while most Kubernetes resource types require a name as defined in RFC 1123, which limits them to 63 bytes. Kubernetes uses the Pod name as the hostname unless this limit is reached. When the limit is reached, Kubernetes has a series of mechanisms to deal with the issue. These include truncating the Pod hostname when a “naked” Pod name is longer than 63 bytes, and using an alternative way of generating Pod names when they are part of a controller, such as a Deployment. Without any remediation, users might hit the 64-byte kernel hostname limit; Kubernetes will then fail to create the Pod sandbox, and the pod will remain in “ContainerCreating” (Pending status) forever. The feature proposed here will make this issue occur more frequently, because the whole FQDN, not just the Pod name, must now fit within 64 bytes. The following example illustrates a potential error message:

$ kubectl get pod
NAME                                                  READY   STATUS              RESTARTS   AGE
longpodnametestsaoitfail23423423432wer-547cc5-st6dd   0/1     ContainerCreating   0          52s

$ kubectl describe pod longpodnametestsaoitfail23423423432wer-547cc5-st6dd
Name:           longpodnametestsaoitfail23423423432wer-547cc5-st6dd
Namespace:      foo
...
...
Events:
  Type     Reason                  Age               From                                Message
  ----     ------                  ----              ----                                -------
  Normal   Scheduled               16s               default-scheduler                   Successfully assigned foo/longpodnametestsaoitfail23423423432wer-547cc5-st6dd to host.company.com
  Warning  FailedCreatePodSandBox  1s (x2 over 16s)  kubelet, host.company.com  Failed create pod sandbox: Failed to set FQDN in hostname, Pod hostname longpodnametestsaoitfail23423423432wer-547cc5-st6dd.p1324234234234.foo.svc.testq.company.com is too long (93 characters requested, 64 characters is the limit).

This failure mode is not great because it might not be apparent to users that their pods are failing. To improve the UX of this failure mode, we will create an example Admission Controller that people can take and customize to apply their own policies. For example, if users care only about Deployments, they can make sure this Admission Controller accounts for the length of the FQDN when the setHostnameAsFQDN and subdomain fields are set in the PodSpec template.
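
As a rough illustration of the kind of check such an Admission Controller could perform, the Python sketch below validates the FQDN length for a single pod. It is a sketch under stated assumptions, not the example controller itself: the function name and the dict-based pod spec are hypothetical, the 64-character limit is the one quoted in this section, and the hostname is assumed to default to the pod name when spec.hostname is unset. For a Deployment, the generated pod name suffix would also have to be accounted for.

```python
from typing import Optional

# Kernel hostname limit quoted in this KEP (see sethostname(2)).
MAX_KERNEL_HOSTNAME_LEN = 64


def fqdn_hostname_error(pod_spec: dict, pod_name: str, namespace: str,
                        cluster_domain: str = "cluster.local") -> Optional[str]:
    """Return an error message if the pod FQDN is too long to be the hostname.

    Returns None when the feature is not in effect or the FQDN fits.
    """
    if not pod_spec.get("setHostnameAsFQDN") or not pod_spec.get("subdomain"):
        return None  # feature not in effect for this pod: nothing to check
    # Assumption: the hostname defaults to the pod name when not set explicitly.
    hostname = pod_spec.get("hostname") or pod_name
    fqdn = f"{hostname}.{pod_spec['subdomain']}.{namespace}.svc.{cluster_domain}"
    if len(fqdn) > MAX_KERNEL_HOSTNAME_LEN:
        return (f"FQDN {fqdn} is too long ({len(fqdn)} characters requested, "
                f"{MAX_KERNEL_HOSTNAME_LEN} characters is the limit)")
    return None
```

Rejecting the pod at admission time surfaces the problem immediately to the user, instead of leaving the pod stuck in ContainerCreating.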

Behavior on Windows

There have been discussions with some members of SIG Windows, and it seems this feature does not make sense from the Windows perspective. However, the feature works as intended on Windows. Specifically, when the user configures a pod to have an FQDN and sets setHostnameAsFQDN: true, Windows sets the registry value ‘hostname’ under the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters to the pod FQDN. When executing the command hostname, the pod FQDN is returned.

Risks and Mitigations

Design Details

Test Plan

The following end-to-end test is implemented in addition to unit tests:

Create e2e cases for the three user stories described above, and compare the values returned by hostname and uname -n with the value returned by hostname --fqdn.

Graduation Criteria

Compatible with major systems (e.g. Linux, Windows)

Alpha -> Beta Graduation

Gather feedback from users
Ensure e2e tests are running in Testgrid and they are stable

Beta -> GA Graduation

Allowing time for feedback from production users

Upgrade / Downgrade Strategy

The feature will be gated off for one release (1.19), enabled by default as Beta in the next release (1.20), and graduated to GA in release 1.22.

Version Skew Strategy

Old kubelets that do not have support for this feature will just ignore the PodSpec setHostnameAsFQDN field.

Production Readiness Review Questionnaire

Feature enablement and rollback

This section must be completed when targeting alpha to a release.

How can this feature be enabled / disabled in a live cluster?
- Feature gate (also fill in values in kep.yaml)
  - Feature gate name: setHostnameAsFQDN
  - Components depending on the feature gate: kube-apiserver and kubelet
- Other
  - Describe the mechanism:
  - Will enabling / disabling the feature require downtime of the control plane?
  - Will enabling / disabling the feature require downtime or reprovisioning of a node?
Does enabling the feature change any default behavior? No
Can the feature be disabled once it has been enabled (i.e. can we rollback the enablement)? Yes, it can be disabled. However, disabling it only affects newly created pods. Existing pods that were configured for it will keep their FQDN as hostname.
What happens if we reenable the feature if it was previously rolled back? New pods configured to have the FQDN as hostname will again get the FQDN in the kernel hostname field.
Are there any tests for feature enablement/disablement? No, only manual testing was performed.

Rollout, Upgrade and Rollback Planning

This section must be completed when targeting beta graduation to a release.

How can a rollout fail? Can it impact already running workloads? No known failure modes.
What specific metrics should inform a rollback? An abnormal increase in the run_podsandbox_errors_total count could be related to this feature. We should filter the pods that fail to create a sandbox and check whether they are stuck due to the length of their FQDN, as described in the proposal.
Were upgrade and rollback tested? Was upgrade->downgrade->upgrade path tested? Yes. We tested enabling and disabling this feature. Running pods are not affected by either enabling or disabling this feature. When the feature is disabled, running Pods that use the feature keep using it, while new pods do not get the setHostnameAsFQDN field even if a user tries to set it. Similarly, when the feature gate is reenabled, existing pods keep their existing behavior, and new pods that define setHostnameAsFQDN make use of the feature as expected.
Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? N/A
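
The enable/disable behavior described above (new pods lose the field while the gate is off, pods that already use it keep it) can be modeled with a few lines of Python. This is a simplified model of the behavior as tested, with hypothetical names and dict-based specs; it is not the API server code.

```python
import copy
from typing import Optional


def drop_disabled_field(new_spec: dict, old_spec: Optional[dict],
                        gate_enabled: bool) -> dict:
    """Return a copy of new_spec as it would be persisted.

    The setHostnameAsFQDN field is dropped when the feature gate is off,
    unless the existing object (old_spec) already uses the field.
    """
    result = copy.deepcopy(new_spec)
    already_in_use = bool(old_spec and old_spec.get("setHostnameAsFQDN"))
    if not gate_enabled and not already_in_use:
        # New pod created while the gate is off: the field is silently dropped.
        result.pop("setHostnameAsFQDN", None)
    return result
```

With the gate off, drop_disabled_field({"subdomain": "test", "setHostnameAsFQDN": True}, None, False) returns {"subdomain": "test"}, whereas an update to a pod that already had the field set keeps it.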

Monitoring requirements

This section must be completed when targeting beta graduation to a release.

How can an operator determine if the feature is in use by workloads? Ideally, this should be a metric. Operations against Kubernetes API (e.g. checking if there are objects with field X set) may be last resort. Avoid logs or events for this purpose.
Yes, operators can use the Kubernetes API for this purpose. They would need to get all pods in the cluster and check whether any has both the subdomain and setHostnameAsFQDN fields set. For example, the namespace and name of the pods using this feature can be found with the following command:
```
kubectl get pod -o json --all-namespaces | jq '.items[] | select(.spec.setHostnameAsFQDN==true) | select(.spec.subdomain!=null) | "\(.metadata.namespace):\(.metadata.name)"'
```
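
For operators who prefer a script over a jq filter, the same check can be sketched in Python. The function name is hypothetical; the input is assumed to be the JSON text produced by kubectl get pod -o json --all-namespaces, and only the pod fields named in this KEP are inspected.

```python
import json
from typing import List


def pods_using_fqdn_hostname(pod_list_json: str) -> List[str]:
    """Return "namespace:name" for each pod that has setHostnameAsFQDN set
    to true and a subdomain defined."""
    pod_list = json.loads(pod_list_json)
    matches = []
    for pod in pod_list.get("items", []):
        spec = pod.get("spec", {})
        if spec.get("setHostnameAsFQDN") is True and spec.get("subdomain"):
            meta = pod["metadata"]
            matches.append(f"{meta['namespace']}:{meta['name']}")
    return matches
```

The kubectl output could be piped into a small wrapper that reads standard input and prints each entry returned by this function.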
What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
- Metrics
  - Metric name: run_podsandbox_errors_total
  - Comment: Abnormal increase in run_podsandbox_errors_total might be related to this feature. We should check if the feature gate is enabled and pods are using it.
  - [Optional] Aggregation method:
  - Components exposing the metric: Kubelet
- Other (treat as last resort)
  - Details:
What are the reasonable SLOs (Service Level Objectives) for the above SLIs? At a high level, this usually will be in the form of “high percentile of SLI per day <= X”. It’s impossible to provide comprehensive guidance, but at a very high level (they need more precise definitions) those may be things like:
- per-day percentage of API calls finishing with 5XX errors <= 1%
- 99% percentile over day of absolute value from (job creation time minus expected job creation time) for cron job <= 10%
- 99.9% of /health requests per day finish with 200 code
N/A
Are there any missing metrics that would be useful to have to improve observability of this feature? Describe the metrics themselves and the reason they weren’t added (e.g. cost, implementation difficulties, etc.).

Dependencies

This section must be completed when targeting beta graduation to a release.

Does this feature depend on any specific services running in the cluster? No

Scalability

For alpha, this section is encouraged: reviewers should consider these questions and attempt to answer them.

For beta, this section is required: reviewers must answer these questions.

For GA, this section is required: approvers should be able to confirm the previous answers based on experience in the field.

Will enabling / using this feature result in any new API calls? No
Will enabling / using this feature result in introducing new API types? No
Will enabling / using this feature result in any new calls to cloud provider? No
Will enabling / using this feature result in increasing size or count of the existing API objects? Pods using this feature are required to set a new field, which increases the size of their objects by a couple of bytes.
Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs ? No
Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, …) in any components? No

Troubleshooting

Troubleshooting section serves the Playbook role as of now. We may consider splitting it into a dedicated Playbook document (potentially with some monitoring details). For now we leave it here though.

This section must be completed when targeting beta graduation to a release.

How does this feature react if the API server and/or etcd is unavailable? It is not affected.
What are other known failure modes?
For each of them fill in the following information by copying the below template:
- Pod FQDN is longer than 64 bytes
  - Detection: How can it be detected via metrics? Stated another way: how can an operator troubleshoot without logging into a master or worker node? Pods that use this feature and whose FQDN is too long will remain in Pending status, generating error events about the failure to create the PodSandbox. The metric run_podsandbox_errors_total can be used to identify an abnormal number of PodSandbox creation failures.
  - Mitigations: What can be done to stop the bleeding, especially for already running user workloads? Pods that fail to start should unset the PodSpec field setHostnameAsFQDN.
  - Diagnostics: What are the useful log messages and their required logging levels that could help debugging the issue? This issue will be logged in Error level log messages and in the Events. The message will be something like GeneratePodSandboxConfig for pod foo failed: Failed to construct FQDN from pod hostname and cluster domain, FQDN <long-fqdn> is too long (64 characters is the max, 70 characters requested)
  - Testing: Are there any tests for failure mode? If not describe why. Both unit tests and e2e tests cover this failure scenario.
What steps should be taken if SLOs are not being met to determine the problem?

Implementation History

2020-05-08: KEP Opened PR kubernetes/enhancements #1792, Issue kubernetes/enhancements #1797
2020-05-20: KEP marked implementable and merged
2020-07-09: Documentation PR merged kubernetes/website #21210 and #22712
2020-07-19: Implementation of Feature Merged targeting 1.19 for Alpha. kubernetes/kubernetes #91699
2020-07-24: Review for API changes marked as Completed (for kubernetes/kubernetes #91699 changes)
2020-08-26: v1.19 includes feature in Alpha
2020-08-28: Feature e2e tests running in TestGrid under sig-node-kubelet/node-kubelet-alpha

Drawbacks

Setting the FQDN in the kernel hostname field is not standard practice for applications developed to run on orchestration platforms such as Kubernetes. Additionally, the 64-byte limit on the kernel hostname field leads to poor failure modes, where users might not immediately know that something went wrong.

Alternatives

Alternative to creating this feature:

Make users fix all their own legacy code to not assume the FQDN is the hostname, which does not seem practical.

Alternative to controlling the feature:

We could also control the use of this feature with a Kubelet configuration flag. Configuration flags are harder to maintain, and they require platforms such as GKE to add support for them. Additionally, a PodSpec field ensures that the behavior of controllers, like Deployments, is consistent across all their pods. For example, with a Kubelet config flag we might end up in a situation where different pods of the same Deployment behave differently.

Alternatives for improving UX of failure mode:

Create an admission plugin that calculates the length of the FQDN of the Pod. The problem with this approach is that it might not cover all scenarios, because there are many entry points that generate a pod, e.g., Deployments, ReplicaSets, CRDs, etc. Another problem is that it breaks Kubernetes abstraction layers, as we have to make assumptions from the top layer.
Create non-retriable errors for pods. Currently, failures like the one caused by this kernel hostname limit are retried forever. It would be useful to be able to mark an error as fatal, so that the pod moves to the Failed state.
