KEP-5311: Relaxed validation for Services names
- Release Signoff Checklist
- Summary
- Motivation
- Proposal
- Design Details
- Production Readiness Review Questionnaire
- Implementation History
- Drawbacks
- Alternatives
Release Signoff Checklist
Items marked with (R) are required prior to targeting to a milestone / release.
- (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
- (R) KEP approvers have approved the KEP status as implementable
- (R) Design details are appropriately documented
- (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- e2e Tests for all Beta API Operations (endpoints)
- (R) Ensure GA e2e tests meet requirements for Conformance Tests
- (R) Minimum Two Week Window for GA e2e tests to prove flake free
- (R) Graduation criteria is in place
- (R) all GA Endpoints must be hit by Conformance Tests
- (R) Production readiness review completed
- (R) Production readiness review approved
- “Implementation History” section is up-to-date for milestone
- User-facing documentation has been created in kubernetes/website , for publication to kubernetes.io
- Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
Summary
This document proposes relaxing Service name validation to bring it in line with the name validation requirements of other resources in Kubernetes.
Motivation
At time of writing, Service name validation is stricter than most of the other Kubernetes resource names.
Loosening the validation of Service names simplifies the Kubernetes code base slightly by removing the apimachineryvalidation.NameIsDNS1035Label validation, which is only used by Service names.
This change also allows users to name their Services with the same conventions as their other resources, i.e., Service names can now start with a digit.
Goals
- Allow Service names to be created using apimachineryvalidation.NameIsDNSLabel validation
Non-Goals
- Change validation for other Kubernetes resource types
- Removal of the apimachineryvalidation.NameIsDNS1035Label function
Proposal
At time of writing Service names are validated with apimachineryvalidation.NameIsDNS1035Label.
The proposal is to change this validation to apimachineryvalidation.NameIsDNSLabel, allowing Service names to start with a digit.
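To make the difference between the two checks concrete, the following is a minimal sketch that approximates both label formats with regular expressions. It does not call the apimachinery functions themselves; the helper names isDNS1035Label and isDNS1123Label are hypothetical, and the patterns are assumed to match the RFC 1035 and RFC 1123 label formats that the two validators enforce.

```go
package main

import (
	"fmt"
	"regexp"
)

// Both label formats share the same maximum length.
const maxLabelLength = 63

var (
	// RFC 1035 label: must start with a lowercase letter.
	dns1035Label = regexp.MustCompile(`^[a-z]([-a-z0-9]*[a-z0-9])?$`)
	// RFC 1123 label: may start with a lowercase letter or a digit.
	dns1123Label = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)
)

// isDNS1035Label approximates the current (strict) Service name check.
// Hypothetical helper, not the apimachinery implementation.
func isDNS1035Label(name string) bool {
	return len(name) <= maxLabelLength && dns1035Label.MatchString(name)
}

// isDNS1123Label approximates the proposed (relaxed) Service name check.
// Hypothetical helper, not the apimachinery implementation.
func isDNS1123Label(name string) bool {
	return len(name) <= maxLabelLength && dns1123Label.MatchString(name)
}

func main() {
	// Print how each sample name fares under both checks.
	for _, name := range []string{"web", "1st-backend", "-bad", "Bad"} {
		fmt.Printf("%-12s strict=%-5t relaxed=%t\n",
			name, isDNS1035Label(name), isDNS1123Label(name))
	}
}
```

In this sketch, a name such as 1st-backend is rejected by the strict check and accepted by the relaxed one, while names that start with a hyphen or contain uppercase letters are rejected by both.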
Risks and Mitigations
- Services are responsible for creating DNS records (i.e., <service-name>.<namespace>.svc.cluster.local). To confirm that downstream systems will support the new validation, we will conduct compatibility testing by:
  - Verifying that DNS providers used in Kubernetes clusters can handle the new Service name format.
  - Running integration tests to ensure that dependent components such as Ingress controllers and service discovery mechanisms function correctly with the updated validation.
- The Ingress resource references Service and will also need a relaxed validation on its reference to the Service.
- Downstream applications may perform validation on fields relating to Service names (e.g., ingress-nginx) and will also need updating.
Design Details
Introduce a new feature gate named RelaxedServiceNameValidation, which is disabled by default in alpha.
When enabled, the feature gate will use the NameIsDNSLabel validation for new Services.
Since the relaxed check allows previously invalid values, care must be taken to support cluster downgrades safely. To accomplish this, the validation will distinguish between new resources and updates to existing resources:
When the feature gate is disabled:
- Creation of Services will use the previous NameIsDNS1035Label() validation for .metadata.name
- Updates of Services will no longer validate .metadata.name, since the field is immutable, so the existing value can be assumed to be valid.
- Creation of Ingress will use the previous NameIsDNS1035Label() validation for .spec.rules[].http.paths[].backend.service.name
- Updates of Ingress will use the previous NameIsDNS1035Label() validation for .spec.rules[].http.paths[].backend.service.name if that field changes; otherwise, there is no validation
When the feature gate is enabled:
- Creation of Services will use the new NameIsDNSLabel() validation for .metadata.name
- Creation and update of Ingress will use the new NameIsDNSLabel() validation for .spec.rules[].http.paths[].backend.service.name
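The rules above can be sketched as a small decision function per operation. This is an illustration under stated assumptions, not the kube-apiserver code: the function names, the boolean gateEnabled parameter, and the regular expressions standing in for NameIsDNS1035Label() and NameIsDNSLabel() are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

var (
	// Stand-in for NameIsDNS1035Label(): must start with a letter.
	strictLabel = regexp.MustCompile(`^[a-z]([-a-z0-9]*[a-z0-9])?$`)
	// Stand-in for NameIsDNSLabel(): may start with a digit.
	relaxedLabel = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

	errInvalidName = errors.New("invalid Service name")
)

// validateName applies the relaxed check when the gate is enabled and the
// strict check otherwise. Hypothetical helper.
func validateName(name string, gateEnabled bool) error {
	re := strictLabel
	if gateEnabled {
		re = relaxedLabel
	}
	if len(name) > 63 || !re.MatchString(name) {
		return fmt.Errorf("%w: %q", errInvalidName, name)
	}
	return nil
}

// validateServiceCreate validates .metadata.name when a Service is created.
func validateServiceCreate(name string, gateEnabled bool) error {
	return validateName(name, gateEnabled)
}

// validateServiceUpdate does not re-validate .metadata.name: the field is
// immutable, so the stored value was already accepted at creation time.
func validateServiceUpdate(_ string, _ bool) error {
	return nil
}

// validateIngressBackendUpdate validates the referenced Service name.
// With the gate disabled, an unchanged reference is not re-validated, so an
// Ingress created while the gate was enabled stays editable after rollback.
func validateIngressBackendUpdate(oldName, newName string, gateEnabled bool) error {
	if !gateEnabled && oldName == newName {
		return nil
	}
	return validateName(newName, gateEnabled)
}

func main() {
	fmt.Println("create, gate off:", validateServiceCreate("1st-backend", false))
	fmt.Println("create, gate on: ", validateServiceCreate("1st-backend", true))
	fmt.Println("update, gate off:", validateServiceUpdate("1st-backend", false))
	fmt.Println("ingress unchanged, gate off:",
		validateIngressBackendUpdate("1st-backend", "1st-backend", false))
	fmt.Println("ingress changed, gate off:  ",
		validateIngressBackendUpdate("web", "1st-backend", false))
}
```

The point of skipping validation on unchanged values is downgrade safety: objects persisted while the gate was enabled remain updatable after the gate is turned off.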
Test Plan
[X] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.
Prerequisite testing updates
Unit tests
Tests which validate Service creation/update and Ingress creation/update to be updated.
- pkg/apis/core/validation: 2025-05-24 - 84.7%
- pkg/apis/networking/validation: 2025-05-24 - 91.9%
Integration tests
Alpha:
- With the feature gate enabled, test that Services can be created with both new and previous validation
- With the feature gate disabled, test that Services can be created with the previous validation, and fail when using the new validation
- Disable the feature gate and ensure that the Service can be edited without a validation error being returned
Beta:
Tests have been written: https://github.com/kubernetes/kubernetes/blob/v1.34.0/test/integration/service/service_test.go#L1219-L1309
e2e tests
Alpha:
- Create a Service that requires the new validation and test if a DNS lookup works for it
Beta:
An e2e exists: https://github.com/kubernetes/kubernetes/blob/v1.34.0/test/e2e/network/dns.go#L659-L686
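As an illustration of what such a DNS check involves, the following sketch builds the cluster-internal name of a Service and resolves it with the Go standard library. It is not the linked e2e test. The Service name 1st-backend, the namespace default, and the cluster domain cluster.local are assumptions chosen for the example, and the lookup is only expected to succeed when run from inside a cluster where that Service exists.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// serviceFQDN builds <service-name>.<namespace>.svc.<cluster-domain>.
// Hypothetical helper for this example.
func serviceFQDN(service, namespace, clusterDomain string) string {
	return fmt.Sprintf("%s.%s.svc.%s", service, namespace, clusterDomain)
}

// lookupService resolves the name using the default resolver, with a
// timeout so a missing record does not block indefinitely.
func lookupService(ctx context.Context, fqdn string) ([]string, error) {
	ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
	defer cancel()
	return net.DefaultResolver.LookupHost(ctx, fqdn)
}

func main() {
	// Assumed example values; replace with a real Service and namespace.
	fqdn := serviceFQDN("1st-backend", "default", "cluster.local")

	addrs, err := lookupService(context.Background(), fqdn)
	if err != nil {
		fmt.Printf("lookup of %s failed: %v\n", fqdn, err)
		return
	}
	fmt.Printf("%s resolves to %v\n", fqdn, addrs)
}
```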
Graduation Criteria
Alpha
- Feature implemented behind a feature flag
- Initial e2e tests completed and enabled
Beta
- E2E and Integration tests completed and enabled.
GA
- Time passes, no major objections
- Promote e2e test to conformance
Upgrade / Downgrade Strategy
Version Skew Strategy
Not applicable - only a single component is being changed.
Production Readiness Review Questionnaire
Feature Enablement and Rollback
How can this feature be enabled / disabled in a live cluster?
- Feature gate (also fill in values in kep.yaml)
  - Feature gate name: RelaxedServiceNameValidation
  - Components depending on the feature gate:
    - kube-apiserver
Does enabling the feature change any default behavior?
No, as this feature is for validation only.
Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
Yes, via the feature gate.
What happens if we reenable the feature if it was previously rolled back?
The new relaxed validation will be enabled again.
Are there any tests for feature enablement/disablement?
The integration tests cover disabling the feature.
Rollout, Upgrade and Rollback Planning
How can a rollout or rollback fail? Can it impact already running workloads?
An initial rollout cannot fail because the feature only changes the behavior when creating new Services and doesn’t affect already-created Services.
A rollback will always succeed if the user hasn’t created any Services that depend on the new validation.
A rollback to a version that has the feature gate turned on will work, even if there are Services that depend on the new validation, since any version of Kubernetes with this feature gate on will allow existing Services to have NameIsDNSLabel names. If a user rolls back to a version with the feature gate off, or to an earlier version that lacks this functionality, and has Services that depend on the new validation, they will be unable to modify those Services, and will probably need to delete them and recreate them with new names.
What specific metrics should inform a rollback?
The following metrics could indicate that this feature is failing:
- apiserver_request_total{code=500, version=v1, resource=service, verb=POST}
- apiserver_request_total{code=500, version=v1, resource=service, verb=PATCH}
Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
Yes, using the following steps:
- Installed a 1.34 Kubernetes cluster with the RelaxedServiceNameValidation feature gate disabled
- Attempted to create a Service with a name starting with a digit - it failed as expected
- Upgraded to a custom-built 1.35 Kubernetes cluster with the RelaxedServiceNameValidation feature gate enabled
- Attempted to create a Service with a name starting with a digit - it succeeded as expected
- Downgraded back to 1.34 with the RelaxedServiceNameValidation feature gate disabled
- Edited that same Service, and it succeeded as expected
Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
No
Monitoring Requirements
How can an operator determine if the feature is in use by workloads?
Existence of any service which has a name that begins with a digit.
How can someone using this feature know that it is working for their instance?
If they are able to apply a Service resource with a name that begins with a digit.
What are the reasonable SLOs (Service Level Objectives) for the enhancement?
N/A
What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
N/A
Are there any missing metrics that would be useful to have to improve observability of this feature?
N/A
Dependencies
Does this feature depend on any specific services running in the cluster?
No. This is a change to API validation.
Scalability
N/A
Will enabling / using this feature result in any new API calls?
No. This is a change to validation of existing API calls.
Will enabling / using this feature result in introducing new API types?
No.
Will enabling / using this feature result in any new calls to the cloud provider?
No.
Will enabling / using this feature result in increasing size or count of the existing API objects?
No, the new validation doesn’t change the maximum length of the field.
Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
No.
Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, …) in any components?
No.
Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
No.
Troubleshooting
How does this feature react if the API server and/or etcd is unavailable?
N/A. This is a change to validation within the API server.
What are other known failure modes?
N/A
What steps should be taken if SLOs are not being met to determine the problem?
N/A
Implementation History
- Alpha
  - KEP (k/enhancements) update PR(s):
  - Code (k/k) update PR(s):
  - Docs (k/website) update PR(s):
- Beta
  - KEP (k/enhancements) update PR(s):
  - Code (k/k) update PR(s):
    - (this PR was rolled back by the PR below) https://github.com/kubernetes/kubernetes/pull/134493
    - https://github.com/kubernetes/kubernetes/pull/135426
    - …
  - Docs (k/website) update(s):
    - (this PR was rolled back by the PR below) https://github.com/kubernetes/website/pull/52920
    - https://github.com/kubernetes/website/pull/53471
    - …
Drawbacks
3rd party tooling being incompatible with the new validation could introduce issues for users
Alternatives
N/A