KEP-5975: Declarative API Definitions
- Summary
- Motivation
- Proposal
- Examples
- Design Details
- Production Readiness Review Questionnaire
- Alternatives
- Future Work
- Implementation History
Note: This KEP tracks an internal code organization change that is not visible to end users. There are no feature gates and no alpha/beta/stable transitions. We are using the KEP process to encourage discussion and community involvement.
Summary
Each API’s directory should only contain code that makes it different from the standard pattern. Today, each contains hundreds of lines of boilerplate that obscure those differences.
It should be impossible for an API author to accidentally violate standard patterns. Deliberate violations should require an exception from an API reviewer.
It should also be easy to adhere to the standard patterns in code. A simple resource should need nothing more than a trivial storage registration:
func NewREST(optsGetter generic.RESTOptionsGetter) (*REST, *StatusREST, error) {
return registry.NewREST(registry.RESTConfig{
Resource: widgets.Resource("widgets"),
Kind: "Widget",
}, optsGetter)
}
This registration could potentially be generated in the future.
All code generation, validation/warning wiring, field wiping/resetting, generation management, field dropping of feature gated fields, and so on, should happen correctly by default. These behaviors should be driven by information already available in the API definition (types.go files) such as declarative validation and feature gate tags.
Motivation
Today, the standard pattern is reimplemented from scratch in every API’s strategy file. A resource that follows all conventions still requires ~100 lines of method implementations identical to every other resource. The most important resource-specific decisions are buried in the noise. This makes it harder to spot the non-standard hooks that reviewers should pay attention to, and it has resulted in mistakes that are challenging and risky to fix.
This is a natural follow-up to declarative validation (KEP-4153, KEP-5073) and declarative defaulting (KEP-1929), which have done much of the heavy lifting and can be further leveraged here.
Every strategy file in pkg/registry/ implements the same ~12 methods:
status clearing, generation bumping, validation delegation, and feature-gate
field dropping. These follow rigid patterns across 67 files totaling ~12,000
lines.
Goals
- Make it impossible to author APIs that violate best practices without being granted an exception by an API reviewer.
- Eliminate the need to hand-write strategy boilerplate for resources that follow standard conventions while preserving the customization that is available today.
- All changes will be backward compatible. External users of API frameworks and code generators will need to opt in to the new behaviors.
Non-Goals
- A large-scale migration. This is intended to provide convenience to the existing API definition framework in a backward-compatible way and will be adopted as needed.
Background
This is not the first time we’ve done this. Back in 2016-2017 @enj did a major sweep:
- https://github.com/kubernetes/kubernetes/pull/37770
- https://github.com/kubernetes/kubernetes/pull/44779
- https://github.com/kubernetes/kubernetes/pull/46390
(thanks @liggitt for the PR links!)
Proposal
Enforcement of Best Practices
https://github.com/kubernetes/kubernetes/pull/137689 demonstrates using a cross-cutting blackbox test to ensure that all APIs adhere to field wiping conventions. We will expand this approach with testing, linting, and exception lists to cover:
- ResetObjectMetaForStatus was used to reset metadata
- https://github.com/kubernetes/kubernetes/pull/137689 tests metadata wiping but does not ensure that ResetObjectMetaForStatus was used
- Generation management (set to 1 on create and monotonically increased when spec is changed by an update)
- Field dropping of feature gated fields
- It should be impossible to add a new field to an existing API without a feature gate
- If a field is feature gated, field dropping must be implemented. If it is not feature gated, the field must not be dropped.
- Field dropping must be tested (can we offer any test conveniences/automation?)
- Default behaviors:
- AllowCreateOnUpdate is false
- AllowUnconditionalUpdate is true
- DefaultGarbageCollectionPolicy is DeleteDependents
- Enablement of generated support code (DeepCopy, Declarative Validation, etc.)
It should be possible to make exceptions, but exceptions should be tracked in a file owned exclusively by API approvers.
We don’t want to have to remember to add the safety nets when adding new groups and new versions. So we will structure them such that they’re automatically added and automatically enforced.
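As a rough illustration of the exception-list idea, the sketch below checks a set of resources against two of the default behaviors listed above and reports any deviation that is not covered by an exception list. All names here (resourceBehavior, findViolations, the exceptions map) are hypothetical and exist only for this example; they are not part of the proposal or of any existing Kubernetes package.

```go
package main

import (
	"fmt"
	"sort"
)

// resourceBehavior is a hypothetical summary of the strategy defaults a
// cross-cutting test could observe for one resource.
type resourceBehavior struct {
	AllowCreateOnUpdate      bool
	AllowUnconditionalUpdate bool
}

// findViolations returns the resources that deviate from the proposed
// defaults (AllowCreateOnUpdate=false, AllowUnconditionalUpdate=true) and
// are not listed in the exception set. Results are sorted for stable output.
func findViolations(observed map[string]resourceBehavior, exceptions map[string]bool) []string {
	var violations []string
	for name, b := range observed {
		standard := !b.AllowCreateOnUpdate && b.AllowUnconditionalUpdate
		if !standard && !exceptions[name] {
			violations = append(violations, name)
		}
	}
	sort.Strings(violations)
	return violations
}

func main() {
	observed := map[string]resourceBehavior{
		"widgets": {AllowCreateOnUpdate: false, AllowUnconditionalUpdate: true},
		"gadgets": {AllowCreateOnUpdate: true, AllowUnconditionalUpdate: true},
		"gizmos":  {AllowCreateOnUpdate: true, AllowUnconditionalUpdate: true},
	}
	// Assumption: in practice this list would live in a file owned by API approvers.
	exceptions := map[string]bool{"gizmos": true}

	// Only "gadgets" deviates without an exception.
	fmt.Println(findViolations(observed, exceptions))
}
```

Because the check iterates over every observed resource, a newly added group or version would be covered without anyone remembering to register it, which is the property the safety nets are meant to have.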
API Declarations
Build on existing registry.Store and strategy interfaces in a
backward-compatible way by introducing a “configuration” layer
that declares a desired resource definition at a high level while still
providing a level of customization and extensibility.
Custom behavior can be added as needed:
func NewREST(optsGetter generic.RESTOptionsGetter) (*REST, *StatusREST, error) {
return registry.NewREST(registry.RESTConfig{
Resource: widgets.Resource("widgets"),
Kind: "Widget",
StrategyConfig: strategy.Config[*widgets.Widget]{
WarningsOnCreate: func(ctx context.Context, obj *widgets.Widget) []string {
return widgetWarnings(obj)
},
},
}, optsGetter)
}
Entirely handwritten strategies can also be used:
type podStrategy struct {}
func (s podStrategy) CheckGracefulDelete(
ctx context.Context, obj runtime.Object, opts *metav1.DeleteOptions) bool {
// Pod-specific logic
}
func NewREST(optsGetter generic.RESTOptionsGetter) (*REST, *StatusREST, error) {
return registry.NewREST(registry.RESTConfig{
Resource: core.Resource("pods"),
Kind: "Pod",
DeleteStrategy: &podStrategy{},
}, optsGetter)
}
The rest of this proposal focuses on what we can offer to minimize the amount of custom configuration needed.
Validation
Declarative validation will be called automatically if available. If handwritten validation is provided, all handwritten and declarative validation code will be called and mismatch checking will be run.
For example,
func NewREST(optsGetter generic.RESTOptionsGetter) (*REST, *StatusREST, error) {
return registry.NewREST(registry.RESTConfig{
Resource: widgets.Resource("widgets"),
Kind: "Widget",
StrategyConfig: strategy.Config[*widgets.Widget]{
Validate: func(ctx context.Context, obj *widgets.Widget) field.ErrorList {
// Call handwritten validation here (Declarative Validation is called and mismatch-checked automatically)
},
},
}, optsGetter)
}
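The exact mismatch-checking semantics are not spelled out in this section, so the following is only a sketch under assumptions: errors are compared as plain strings, the results of both validators are merged, and anything reported by only one side is flagged. The function name validateBoth and the error strings are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// validateBoth merges the errors from handwritten and declarative validation
// and reports every error that only one of the two produced. Both results
// are sorted so the output is deterministic.
func validateBoth(handwritten, declarative []string) (merged, mismatches []string) {
	inHand := make(map[string]bool, len(handwritten))
	for _, e := range handwritten {
		inHand[e] = true
	}
	inDecl := make(map[string]bool, len(declarative))
	for _, e := range declarative {
		inDecl[e] = true
	}
	for e := range inHand {
		merged = append(merged, e)
		if !inDecl[e] {
			mismatches = append(mismatches, "handwritten only: "+e)
		}
	}
	for e := range inDecl {
		if !inHand[e] {
			merged = append(merged, e)
			mismatches = append(mismatches, "declarative only: "+e)
		}
	}
	sort.Strings(merged)
	sort.Strings(mismatches)
	return merged, mismatches
}

func main() {
	handwritten := []string{"spec.name: required", "spec.priority: must be <= 1000"}
	declarative := []string{"spec.priority: must be <= 1000"}

	merged, mismatches := validateBoth(handwritten, declarative)
	fmt.Println(merged)
	fmt.Println(mismatches)
}
```

In this example the name check exists only in handwritten code, so it shows up as a mismatch while still being returned to the caller.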
Warnings
A small extension to validation-gen will generate warning-producing code. In the future, validation errors and warnings may be output in a single validation pass, but that is not a goal for this KEP.
Field Wiping and Fields Resetting
Default behavior will follow best practices:
- Main strategy: clears status on create (obj.Status = TypeStatus{}), and clears status changes on update (new.Status = old.Status).
- Status substrategy: clears spec, labels, and annotations changes and all new fields (new.Spec = old.Spec, metav1.ResetObjectMetaForStatus, etc.). New fields, such as new metadata fields, are cleared unless we take active effort to allow status writes to them.
Managed fields are reset to match the fields that are wiped.
This requires Spec/Status accessor interfaces.
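The default wiping behavior could be written once against accessor interfaces, roughly as sketched below. This is an illustration only: the specStatus interface and the three prepare functions are hypothetical names, metadata resetting and managed-fields handling are omitted, and type arguments are spelled out explicitly so the sketch does not depend on newer type inference rules.

```go
package main

import "fmt"

type WidgetSpec struct{ Size int }
type WidgetStatus struct{ Ready bool }

type Widget struct {
	Spec   WidgetSpec
	Status WidgetStatus
}

func (w *Widget) GetSpec() WidgetSpec      { return w.Spec }
func (w *Widget) SetSpec(s WidgetSpec)     { w.Spec = s }
func (w *Widget) GetStatus() WidgetStatus  { return w.Status }
func (w *Widget) SetStatus(s WidgetStatus) { w.Status = s }

// specStatus is a hypothetical combination of the spec and status accessors.
type specStatus[S, T any] interface {
	GetSpec() S
	SetSpec(S)
	GetStatus() T
	SetStatus(T)
}

// prepareForCreate clears status: clients cannot set it on create.
func prepareForCreate[S, T any](obj specStatus[S, T]) {
	var zero T
	obj.SetStatus(zero)
}

// prepareForUpdate discards status changes on the main resource.
func prepareForUpdate[S, T any](newObj, oldObj specStatus[S, T]) {
	newObj.SetStatus(oldObj.GetStatus())
}

// prepareForStatusUpdate discards spec changes on the status subresource.
func prepareForStatusUpdate[S, T any](newObj, oldObj specStatus[S, T]) {
	newObj.SetSpec(oldObj.GetSpec())
}

func main() {
	created := &Widget{Spec: WidgetSpec{Size: 3}, Status: WidgetStatus{Ready: true}}
	prepareForCreate[WidgetSpec, WidgetStatus](created)
	fmt.Println(created.Spec.Size, created.Status.Ready)

	old := &Widget{Spec: WidgetSpec{Size: 3}, Status: WidgetStatus{Ready: true}}

	update := &Widget{Spec: WidgetSpec{Size: 5}, Status: WidgetStatus{Ready: false}}
	prepareForUpdate[WidgetSpec, WidgetStatus](update, old)
	fmt.Println(update.Spec.Size, update.Status.Ready)

	statusUpdate := &Widget{Spec: WidgetSpec{Size: 9}, Status: WidgetStatus{Ready: false}}
	prepareForStatusUpdate[WidgetSpec, WidgetStatus](statusUpdate, old)
	fmt.Println(statusUpdate.Spec.Size, statusUpdate.Status.Ready)
}
```

None of the three functions mentions Widget, so the same code would serve any type that provides the accessors.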
Generation Management
Default behavior will follow best practice: Generation is set to 1 on create and bumped on update exactly when Spec changes.
This requires Spec/Status accessor interfaces.
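A minimal sketch of this rule, assuming accessor methods on the type, might look as follows. The function names are hypothetical, and reflect.DeepEqual is used here only as a stand-in for whatever semantic equality check the real implementation would use.

```go
package main

import (
	"fmt"
	"reflect"
)

type WidgetSpec struct{ Size int }

type Widget struct {
	Generation int64
	Spec       WidgetSpec
}

func (w *Widget) GetSpec() WidgetSpec    { return w.Spec }
func (w *Widget) GetGeneration() int64   { return w.Generation }
func (w *Widget) SetGeneration(g int64)  { w.Generation = g }

// specGeneration is a hypothetical combination of the accessors needed here.
type specGeneration[S any] interface {
	GetSpec() S
	GetGeneration() int64
	SetGeneration(int64)
}

// initGeneration sets generation to 1 on create, ignoring any client value.
func initGeneration[S any](obj specGeneration[S]) {
	obj.SetGeneration(1)
}

// updateGeneration always starts from the old generation, so a client cannot
// write the field directly, and bumps it by one only when spec changed.
func updateGeneration[S any](newObj, oldObj specGeneration[S]) {
	next := oldObj.GetGeneration()
	if !reflect.DeepEqual(newObj.GetSpec(), oldObj.GetSpec()) {
		next++
	}
	newObj.SetGeneration(next)
}

func main() {
	stored := &Widget{Generation: 42, Spec: WidgetSpec{Size: 1}}
	initGeneration[WidgetSpec](stored)
	fmt.Println(stored.Generation)

	unchanged := &Widget{Generation: 99, Spec: WidgetSpec{Size: 1}}
	updateGeneration[WidgetSpec](unchanged, stored)
	fmt.Println(unchanged.Generation)

	changed := &Widget{Spec: WidgetSpec{Size: 2}}
	updateGeneration[WidgetSpec](changed, stored)
	fmt.Println(changed.Generation)
}
```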
Feature-Gate Field Dropping
The +k8s:featureGate=<GateName> tag being added as part of declarative
validation can also serve as the source of truth for field dropping. The
field-dropping code will be generated by validation-gen. Generated
functions are consumed via the config’s DropDisabledFields hook or
by the default strategies automatically.
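For readers unfamiliar with the pattern, the sketch below shows roughly what a field-dropping function for one gated field could look like. It is hand-written for illustration and is not actual validation-gen output; the function name is hypothetical, and the gate state is passed in as a plain bool rather than read from a real feature gate.

```go
package main

import "fmt"

type WidgetSpec struct {
	// Priority is assumed to be guarded by a feature gate.
	Priority *int32
}

// dropDisabledWidgetFields clears Priority when its gate is off, unless the
// stored (old) object already uses the field. Preserving in-use fields keeps
// existing data intact if the gate is turned off after objects were written.
func dropDisabledWidgetFields(newSpec, oldSpec *WidgetSpec, widgetPriorityEnabled bool) {
	if widgetPriorityEnabled {
		return
	}
	if oldSpec != nil && oldSpec.Priority != nil {
		return // field is already in use; do not drop it
	}
	newSpec.Priority = nil
}

func main() {
	five := int32(5)

	// Create with the gate off: the field is dropped.
	created := &WidgetSpec{Priority: &five}
	dropDisabledWidgetFields(created, nil, false)
	fmt.Println(created.Priority == nil)

	// Update with the gate off while the old object uses the field: kept.
	old := &WidgetSpec{Priority: &five}
	updated := &WidgetSpec{Priority: &five}
	dropDisabledWidgetFields(updated, old, false)
	fmt.Println(updated.Priority != nil)
}
```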
Examples
Adding a New API Resource
For a resource following all best practices:
func NewREST(optsGetter generic.RESTOptionsGetter) (*REST, *StatusREST, error) {
return registry.NewREST(registry.RESTConfig{
Resource: widgets.Resource("widgets"),
Kind: "Widget",
}, optsGetter)
}
Adding a New Field to an Existing API
Adding a new field to an API requires a feature gate:
type WidgetSpec struct {
// +k8s:featureGate=WidgetPriority
// +k8s:optional
// +k8s:minimum=0
// +k8s:maximum=1000
Priority *int32 `json:"priority,omitempty"`
}
The appropriate field dropping is generated automatically when the feature gate is provided.
Declarative validation also automatically handles proper validation of the field when the feature gate is off (i.e. in-use detection and ratcheting).
Design Details
registry.RESTConfig
Provides resource identity, naming, and optional customization of name generation, handwritten validation, handwritten warnings, selectable fields, and printer columns.
Required:
| Field | Type | Purpose |
|---|---|---|
| Resource | schema.GroupResource | Resource identity (group + plural name), matching the existing DefaultQualifiedResource pattern |
| Kind | string | Kind name; the internal GVK is derived as Resource.Group/__internal/Kind |
Optional properties:
| Field | Default | Purpose |
|---|---|---|
| NameGenerator | SimpleNameGenerator | Name generation for generateName |
| TableConvertor | default | Custom printer columns |
| SelectableFields | metadata only | Custom field selectors beyond metadata |
Strategy overrides:
| Field | Default | Purpose |
|---|---|---|
| StrategyConfig | nil | Typed hooks for custom strategy behavior (see strategy.Config) |
When StrategyConfig is set, the provided configuration customizes the strategy; when it is nil, the default strategy behavior is used unchanged.
strategy.Config
strategy.Config provides fields and hooks for custom strategy behavior.
Hooks:
| Hook | Purpose |
|---|---|
| Validate / ValidateUpdate | Additional hand-written validation (merged with DV) |
| WarningsOnCreate / WarningsOnUpdate | Custom warning messages |
| PrepareForCreate / PrepareForUpdate | Optional hooks for hand-written preparation |
Status substrategy:
When the type has a Status field, a status substrategy is created
automatically. If it needs customization, a nested Status *StatusConfig[T]
provides hooks.
Spec/Status Accessor Interfaces
Our implementation needs to copy, clear, and compare Spec and
Status fields. To support this, we propose generating accessor interfaces
that provide type-safe access without reflection:
type SpecAccessor[S any] interface {
GetSpec() S
SetSpec(S)
}
type StatusAccessor[T any] interface {
GetStatus() T
SetStatus(T)
}
type GenerationAccessor interface {
GetGeneration() int64
SetGeneration(value int64)
}
Implementations are trivial one-liners, and can be generated by a deepcopy-gen style generator (which already walks the type graph).
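To show how small the generated code could be, here is a sketch of accessor implementations for a hypothetical Widget type, together with compile-time assertions that the type satisfies each interface. The interface definitions are repeated so the example is self-contained; Widget and its fields are made up for illustration, and a real generator's output may differ.

```go
package main

import "fmt"

type SpecAccessor[S any] interface {
	GetSpec() S
	SetSpec(S)
}

type StatusAccessor[T any] interface {
	GetStatus() T
	SetStatus(T)
}

type GenerationAccessor interface {
	GetGeneration() int64
	SetGeneration(value int64)
}

type WidgetSpec struct{ Size int }
type WidgetStatus struct{ Ready bool }

type Widget struct {
	Generation int64
	Spec       WidgetSpec
	Status     WidgetStatus
}

// The methods below are the kind of one-liners a generator could emit.
func (w *Widget) GetSpec() WidgetSpec          { return w.Spec }
func (w *Widget) SetSpec(s WidgetSpec)         { w.Spec = s }
func (w *Widget) GetStatus() WidgetStatus      { return w.Status }
func (w *Widget) SetStatus(s WidgetStatus)     { w.Status = s }
func (w *Widget) GetGeneration() int64         { return w.Generation }
func (w *Widget) SetGeneration(value int64)    { w.Generation = value }

// Compile-time checks: the build fails if *Widget stops satisfying an
// accessor interface, for example after a field is renamed.
var (
	_ SpecAccessor[WidgetSpec]     = (*Widget)(nil)
	_ StatusAccessor[WidgetStatus] = (*Widget)(nil)
	_ GenerationAccessor           = (*Widget)(nil)
)

func main() {
	w := &Widget{}
	w.SetSpec(WidgetSpec{Size: 2})
	w.SetGeneration(1)
	fmt.Println(w.GetSpec().Size, w.GetGeneration(), w.GetStatus().Ready)
}
```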
Test Plan
[x] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.
Prerequisite testing updates
None. Existing per-resource tests serve as the compatibility gate.
Unit tests
- Unit tests for all new framework types and workflows that are introduced.
Integration tests
- All above “Enforcement of Best Practices” tests are implemented
e2e tests
Not applicable.
Graduation Criteria
This is an internal refactoring with no user-visible behavior change. There are no feature gates and no alpha/beta/stable transitions. Once the new way of defining strategies is implemented and is in use by at least five APIs, we will mark this KEP as implemented.
Upgrade / Downgrade Strategy
Not applicable. Internal refactoring only.
Version Skew Strategy
Not applicable. Entirely within the kube-apiserver binary.
Production Readiness Review Questionnaire
Feature Enablement and Rollback
How can this feature be enabled / disabled in a live cluster?
- Other
- Describe the mechanism: Internal refactoring, always enabled. No feature gate — no behavioral change.
- Will enabling / disabling the feature require downtime of the control plane? No.
- Will enabling / disabling the feature require downtime or reprovisioning of a node? No.
Does enabling the feature change any default behavior?
No.
Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
Not applicable — code refactoring, not a runtime feature.
What happens if we reenable the feature if it was previously rolled back?
Not applicable.
Are there any tests for feature enablement/disablement?
Not applicable — no feature gate.
Rollout, Upgrade and Rollback Planning
How can a rollout or rollback fail? Can it impact already running workloads?
Not applicable.
What specific metrics should inform a rollback?
Not applicable.
Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
Not applicable.
Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
No.
Monitoring Requirements
How can an operator determine if the feature is in use by workloads?
Not applicable — internal refactoring only.
How can someone using this feature know that it is working for their instance?
Not applicable.
What are the reasonable SLOs (Service Level Objectives) for the enhancement?
Not applicable.
What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
Not applicable.
Are there any missing metrics that would be useful to have to improve observability of this feature?
No.
Dependencies
Does this feature depend on any specific services running in the cluster?
No.
Scalability
Will enabling / using this feature result in any new API calls?
No.
Will enabling / using this feature result in introducing new API types?
No.
Will enabling / using this feature result in any new calls to the cloud provider?
No.
Will enabling / using this feature result in increasing size or count of the existing API objects?
No.
Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
No.
Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, …) in any components?
No.
Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
No.
Troubleshooting
How does this feature react if the API server and/or etcd is unavailable?
Not applicable.
What are other known failure modes?
None.
What steps should be taken if SLOs are not being met to determine the problem?
Not applicable.
Alternatives
Declarative tags for all strategy behavior. Tags break down for complex validation options, cross-field generation tracking, custom warnings, and injected dependencies. Tags are appropriate for field dropping, validation, warnings, and defaulting, but for strategy and storage definitions, keeping the definitions in Go provides type safety.
Future Work
- REST install registration. The StorageProvider wiring (~4,200 lines) that maps resources to their REST storage follows a repeating pattern per API group. With registry.NewREST, much of this could be simplified.
- etcd storage test data. test/integration/etcd/data.go (~1,000 lines) contains a per-resource entry with a JSON stub, expected etcd path, and introduced version. This data could potentially be derived from the resource identity and scheme.
- Selectable fields will be tagged with +k8s:selectableField on the type definition. Generated functions would be wired into the Store automatically. This raises the question: Should selectable fields be codified into types.go?
- For 1:1 field-to-column mapping, fields could be tagged with +k8s:printerColumn. A generated TableConvertor is used by default. Computed columns (derived from multiple fields) override TableConvertor via RESTConfig. This also raises the question: Should printer columns be codified into types.go?
Implementation History
- 2026-03-23: Initial KEP filed