KEP-5978: DRA: ClusterResourceClaimTemplate
NOTE: This KEP was withdrawn in favor of the Out-of-Tree Synchronization Controller alternative.
- Release Signoff Checklist
- Summary
- Motivation
- Proposal
- Design Details
- Production Readiness Review Questionnaire
- Implementation History
- Drawbacks
- Alternatives
- Infrastructure Needed (Optional)
Release Signoff Checklist
Items marked with (R) are required prior to targeting to a milestone / release.
- (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
- (R) KEP approvers have approved the KEP status as implementable
- (R) Design details are appropriately documented
- (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- e2e Tests for all Beta API Operations (endpoints)
- (R) Ensure GA e2e tests meet requirements for Conformance Tests
- (R) Minimum Two Week Window for GA e2e tests to prove flake free
- (R) Graduation criteria is in place
- (R) all GA Endpoints must be hit by Conformance Tests within one minor version of promotion to GA
- (R) Production readiness review completed
- (R) Production readiness review approved
- “Implementation History” section is up-to-date for milestone
- User-facing documentation has been created in kubernetes/website , for publication to kubernetes.io
- Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
Summary
We propose introducing a new cluster-scoped API resource, ClusterResourceClaimTemplate, that lets cluster administrators define standard Dynamic Resource Allocation (DRA) templates once at the cluster level. Any workload (Pod or PodGroup) in any namespace can reference these templates. When such a Pod or PodGroup is created, the ResourceClaim controller in the control plane resolves the cluster template and automatically generates a namespace-local ResourceClaim in the Pod's or PodGroup's namespace.
Motivation
Currently, ResourceClaimTemplate is namespace-scoped. Administrators managing clusters with specialized hardware (GPUs, FPGAs, high-performance network devices) must deploy and synchronize identical ResourceClaimTemplate resources across all active namespaces. This namespace pollution makes cluster management complex, especially in dynamic, multi-tenant clusters. Cluster-scoped templates solve this by allowing a single central template definition.
Goals
- Reduce administrative overhead: Eliminate the need to replicate and synchronize identical DRA templates across multiple namespaces.
- Simplify workload configuration: Allow Pods and PodGroups in any namespace to easily consume centrally managed hardware templates.
- Preserve namespace isolation: Ensure cluster-scoped templates do not introduce privilege escalation or cross-namespace risks.
Non-Goals
- Changing how namespace-local ResourceClaimTemplate behaves.
- Dynamically shadowing namespaced templates with cluster templates (resolution must be explicit).
- Direct cluster-wide sharing of a single namespaced ResourceClaim instance (claims remain isolated per namespace).
Proposal
We propose introducing a new cluster-scoped API resource,
ClusterResourceClaimTemplate, which serves as a cluster-wide template for
generating ResourceClaims. Along with this, we propose adding a new
clusterResourceClaimTemplateName field to PodResourceClaim and PodGroupResourceClaim alongside the
existing resourceClaimName and resourceClaimTemplateName fields.
When clusterResourceClaimTemplateName is populated in a Pod or PodGroup specification, the
ResourceClaim controller fetches the referenced ClusterResourceClaimTemplate
and generates a corresponding ResourceClaim in the Pod’s or PodGroup’s namespace,
maintaining standard resource ownership and garbage collection semantics.
User Stories (Optional)
Story 1: Exposing and Consuming Specialized Hardware Cluster-Wide
As a cluster administrator of an AI training platform, I want to expose high-performance GPUs (e.g., NVIDIA H100) to all research groups across the cluster without managing templates in every individual namespace.
- The administrator defines a single ClusterResourceClaimTemplate named nvidia-h100-80gb at the cluster scope.
- A developer/researcher in a newly provisioned namespace (e.g., project-alpha) creates a Pod that references this template. The ephemeral resource claim controller automatically generates the namespaced ResourceClaim in project-alpha with the spec defined in nvidia-h100-80gb:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
  namespace: project-alpha
spec:
  containers:
  - name: training
    image: my-training-image
    resources:
      claims:
      - name: gpu
  resourceClaims:
  - name: gpu
    clusterResourceClaimTemplateName: nvidia-h100-80gb
```
Notes/Constraints/Caveats (Optional)
Risks and Mitigations
Design Details
API Changes
We will introduce a new cluster-scoped resource ClusterResourceClaimTemplate
in resource.k8s.io:
```go
// ClusterResourceClaimTemplate is used to produce ResourceClaim objects.
// Cluster scoped.
type ClusterResourceClaimTemplate struct {
	metav1.TypeMeta `json:",inline"`
	// Standard object metadata
	// +optional
	metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"`

	// Describes the ResourceClaim that is to be generated.
	//
	// This field is immutable. A ResourceClaim will get created by the
	// control plane for a Pod or PodGroup when needed and then not get
	// updated anymore.
	// +optional
	Spec ClusterResourceClaimTemplateSpec `json:"spec" protobuf:"bytes,2,name=spec"`
}

// ClusterResourceClaimTemplateSpec contains the metadata and fields for a ResourceClaim.
type ClusterResourceClaimTemplateSpec struct {
	// ObjectMeta may contain labels and annotations that will be copied into the ResourceClaim
	// when creating it. No other fields are allowed and will be rejected during validation.
	// +optional
	metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"`

	// Spec for the ResourceClaim. The entire content is copied unchanged
	// into the ResourceClaim that gets created from this template. The
	// same fields as in a ResourceClaim are also valid here.
	// +optional
	Spec ResourceClaimSpec `json:"spec" protobuf:"bytes,2,name=spec"`
}
```
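The Spec.ObjectMeta comment above allows only labels and annotations in the template metadata. The following is a minimal sketch of that check. It assumes a hypothetical, trimmed-down templateMeta type that models only two of the forbidden fields. The real validation would operate on metav1.ObjectMeta and cover every field, and validateTemplateMeta is an illustrative name, not existing Kubernetes code.

```go
package main

import (
	"errors"
	"fmt"
)

// templateMeta is a hypothetical, trimmed-down stand-in for the
// metav1.ObjectMeta embedded in ClusterResourceClaimTemplateSpec.
type templateMeta struct {
	Name        string
	Namespace   string
	Labels      map[string]string
	Annotations map[string]string
}

// validateTemplateMeta accepts labels and annotations and rejects the other
// fields modeled here. errors.Join returns nil when no errors were collected.
func validateTemplateMeta(m templateMeta) error {
	var errs []error
	if m.Name != "" {
		errs = append(errs, errors.New("spec.metadata.name: Forbidden: only labels and annotations may be set"))
	}
	if m.Namespace != "" {
		errs = append(errs, errors.New("spec.metadata.namespace: Forbidden: only labels and annotations may be set"))
	}
	return errors.Join(errs...)
}

func main() {
	allowed := templateMeta{Labels: map[string]string{"team": "ml-research"}}
	rejected := templateMeta{Name: "copied-name", Namespace: "project-alpha"}
	fmt.Println("labels only:", validateTemplateMeta(allowed))
	fmt.Println("name and namespace:", validateTemplateMeta(rejected))
}
```

Collecting all errors instead of stopping at the first one matches the usual API-server practice of reporting every invalid field in a single response.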
We will add ClusterResourceClaimTemplateName to PodResourceClaim in k8s.io/api/core/v1:
```go
type PodResourceClaim struct {
	Name string `json:"name" protobuf:"bytes,1,name=name"`

	ResourceClaimName *string `json:"resourceClaimName,omitempty" protobuf:"bytes,3,opt,name=resourceClaimName"`

	ResourceClaimTemplateName *string `json:"resourceClaimTemplateName,omitempty" protobuf:"bytes,4,opt,name=resourceClaimTemplateName"`

	// ClusterResourceClaimTemplateName is the name of a ClusterResourceClaimTemplate
	// object in the cluster.
	//
	// The template will be used to create a new ResourceClaim in the same
	// namespace as the Pod, which will be bound to this Pod. Except for the
	// cluster scope, it behaves identically to ResourceClaimTemplateName.
	//
	// This field is immutable and no changes will be made to the
	// corresponding ResourceClaim by the control plane after creating the
	// ResourceClaim.
	//
	// Exactly one of ResourceClaimName, ResourceClaimTemplateName, and
	// ClusterResourceClaimTemplateName must be set.
	ClusterResourceClaimTemplateName *string `json:"clusterResourceClaimTemplateName,omitempty" protobuf:"bytes,5,opt,name=clusterResourceClaimTemplateName"`
}
```
We will also add ClusterResourceClaimTemplateName to PodGroupResourceClaim
in k8s.io/api/scheduling/v1alpha3 (or the appropriate API group for
PodGroups):
```go
type PodGroupResourceClaim struct {
	Name string `json:"name"`

	ResourceClaimName *string `json:"resourceClaimName,omitempty"`

	ResourceClaimTemplateName *string `json:"resourceClaimTemplateName,omitempty"`

	// ClusterResourceClaimTemplateName is the name of a ClusterResourceClaimTemplate
	// object in the cluster.
	//
	// The template will be used to create a new ResourceClaim in the same
	// namespace as this PodGroup, which will be bound to this PodGroup. Except for the
	// cluster scope, it behaves identically to ResourceClaimTemplateName.
	//
	// This field is immutable and no changes will be made to the
	// corresponding ResourceClaim by the control plane after creating the
	// ResourceClaim.
	//
	// Exactly one of ResourceClaimName, ResourceClaimTemplateName, and
	// ClusterResourceClaimTemplateName must be set.
	ClusterResourceClaimTemplateName *string `json:"clusterResourceClaimTemplateName,omitempty"`
}
```
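The field comments say that, apart from scope, clusterResourceClaimTemplateName behaves like resourceClaimTemplateName. The sketch below shows what that means for the template lookup key. All names in it (podResourceClaim, templateKey, templateFor) are hypothetical stand-ins, not existing Kubernetes code, and the input is assumed to have already passed validation. The same logic would apply to PodGroupResourceClaim.

```go
package main

import (
	"errors"
	"fmt"
)

// podResourceClaim is a hypothetical stand-in for core/v1 PodResourceClaim,
// keeping only the fields relevant to template resolution.
type podResourceClaim struct {
	Name                             string
	ResourceClaimName                *string
	ResourceClaimTemplateName        *string
	ClusterResourceClaimTemplateName *string
}

// templateKey identifies the template to fetch from an informer cache.
// An empty Namespace means the template is cluster-scoped.
type templateKey struct {
	Namespace string
	Name      string
}

// templateFor reports which template a pod claim refers to. ok is false when
// the claim points at an existing ResourceClaim instead of a template.
// Validation is assumed to have ensured at most one source field is set.
func templateFor(podNamespace string, c podResourceClaim) (key templateKey, ok bool, err error) {
	switch {
	case c.ResourceClaimTemplateName != nil:
		// Namespaced template: looked up in the Pod's own namespace.
		return templateKey{Namespace: podNamespace, Name: *c.ResourceClaimTemplateName}, true, nil
	case c.ClusterResourceClaimTemplateName != nil:
		// Cluster template: looked up without a namespace.
		return templateKey{Name: *c.ClusterResourceClaimTemplateName}, true, nil
	case c.ResourceClaimName != nil:
		return templateKey{}, false, nil
	default:
		return templateKey{}, false, errors.New("no claim source set")
	}
}

func main() {
	name := "nvidia-h100-80gb"
	key, ok, err := templateFor("project-alpha", podResourceClaim{Name: "gpu", ClusterResourceClaimTemplateName: &name})
	fmt.Println(key, ok, err)
}
```

Only the lookup key differs between the two template paths. The generated ResourceClaim lands in the Pod's namespace in both cases.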
ResourceClaim Controller Changes
ResourceClaimController will be updated to:
- Fetch ClusterResourceClaimTemplate resources using a cluster-level template informer.
- When a PodResourceClaim sets ClusterResourceClaimTemplateName, create a namespaced ResourceClaim in the Pod's namespace using the ClusterResourceClaimTemplate's Spec.Spec (plus the labels and annotations from Spec.ObjectMeta), and attach standard OwnerReferences mapping it to the Pod.
- When a PodGroupResourceClaim sets ClusterResourceClaimTemplateName, create a namespaced ResourceClaim in the PodGroup's namespace using the ClusterResourceClaimTemplate's Spec.Spec (plus the labels and annotations from Spec.ObjectMeta), and attach standard OwnerReferences mapping it to the PodGroup.
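The Pod case of the claim-generation step can be sketched as follows. This is a minimal illustration under stated assumptions:

- All types are hypothetical, trimmed-down stand-ins, and claimSpec is only a placeholder for ResourceClaimSpec.
- The `<pod>-<claim>-` GenerateName pattern and the controlling owner reference are assumed to follow the existing namespaced-template path.
- A map stands in for the cluster-scoped informer cache.
- A missing template is returned as an error, which the controller would report as an event before retrying.

```go
package main

import "fmt"

// Hypothetical, trimmed-down stand-ins for the real API types.
type ownerReference struct {
	APIVersion string
	Kind       string
	Name       string
	UID        string
	Controller bool
}

// claimSpec is a placeholder for resource.k8s.io ResourceClaimSpec.
type claimSpec struct {
	DeviceClassName string
}

type clusterTemplate struct {
	Name        string
	Labels      map[string]string // from Spec.ObjectMeta.Labels
	Annotations map[string]string // from Spec.ObjectMeta.Annotations
	Spec        claimSpec         // from Spec.Spec
}

type resourceClaim struct {
	GenerateName    string
	Namespace       string
	Labels          map[string]string
	Annotations     map[string]string
	OwnerReferences []ownerReference
	Spec            claimSpec
}

// copyMap returns an independent copy so the generated claim never aliases
// the cached template's maps.
func copyMap(in map[string]string) map[string]string {
	if in == nil {
		return nil
	}
	out := make(map[string]string, len(in))
	for k, v := range in {
		out[k] = v
	}
	return out
}

// claimFromClusterTemplate builds the namespaced ResourceClaim for one pod claim.
func claimFromClusterTemplate(templates map[string]clusterTemplate, podNamespace, podName, podUID, podClaimName, templateName string) (resourceClaim, error) {
	tmpl, found := templates[templateName]
	if !found {
		return resourceClaim{}, fmt.Errorf("ClusterResourceClaimTemplate %q not found", templateName)
	}
	return resourceClaim{
		GenerateName: podName + "-" + podClaimName + "-",
		// The claim always lands in the Pod's namespace, never at cluster scope.
		Namespace:   podNamespace,
		Labels:      copyMap(tmpl.Labels),
		Annotations: copyMap(tmpl.Annotations),
		// The owner reference lets the garbage collector delete the claim
		// together with the Pod.
		OwnerReferences: []ownerReference{{APIVersion: "v1", Kind: "Pod", Name: podName, UID: podUID, Controller: true}},
		// Copied unchanged; a real implementation would deep-copy the spec.
		Spec: tmpl.Spec,
	}, nil
}

func main() {
	templates := map[string]clusterTemplate{
		"nvidia-h100-80gb": {Name: "nvidia-h100-80gb", Spec: claimSpec{DeviceClassName: "example-gpu-class"}},
	}
	claim, err := claimFromClusterTemplate(templates, "project-alpha", "gpu-pod", "1234", "gpu", "nvidia-h100-80gb")
	fmt.Println(claim.Namespace, claim.GenerateName, err)
	_, err = claimFromClusterTemplate(templates, "project-alpha", "gpu-pod", "1234", "gpu", "missing")
	fmt.Println(err)
}
```

The PodGroup path would differ only in the owner reference kind and in using the PodGroup's namespace and name.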
Admission & Validation
A Pod or PodGroup reference to a cluster-scoped template does not require an explicit authorization check (such as a SubjectAccessReview) during Pod or PodGroup admission. This is consistent with how other cluster-scoped class resources, such as StorageClass or RuntimeClass, are referenced.
Because a ClusterResourceClaimTemplate is a template containing a declarative
ResourceClaimSpec and does not grant any special capabilities on its own, a
user could always copy the template’s content and create an equivalent
namespaced ResourceClaimTemplate or inline ResourceClaim. Enforcing access
control on the actual underlying hardware or scheduling resources must be done
at the resource driver level or via namespace resource quotas/limits.
During admission, the API server will perform standard structural validation on
the clusterResourceClaimTemplateName field in both PodResourceClaim and
PodGroupResourceClaim (e.g., checking that exactly one of the template/claim
name fields is populated).
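A minimal sketch of the exactly-one rule follows. It assumes a hypothetical podResourceClaim stand-in, and validateClaimSource is an illustrative name, not the real validation function. The same check would apply to PodGroupResourceClaim. The sketch does not validate the format of the referenced names.

```go
package main

import (
	"fmt"
	"strings"
)

// podResourceClaim is a hypothetical stand-in for core/v1 PodResourceClaim.
type podResourceClaim struct {
	Name                             string
	ResourceClaimName                *string
	ResourceClaimTemplateName        *string
	ClusterResourceClaimTemplateName *string
}

// validateClaimSource enforces that exactly one of the three source fields is set.
func validateClaimSource(c podResourceClaim) error {
	var set []string
	if c.ResourceClaimName != nil {
		set = append(set, "resourceClaimName")
	}
	if c.ResourceClaimTemplateName != nil {
		set = append(set, "resourceClaimTemplateName")
	}
	if c.ClusterResourceClaimTemplateName != nil {
		set = append(set, "clusterResourceClaimTemplateName")
	}
	switch len(set) {
	case 1:
		return nil
	case 0:
		return fmt.Errorf("resourceClaims[%s]: must set one of resourceClaimName, resourceClaimTemplateName, or clusterResourceClaimTemplateName", c.Name)
	default:
		return fmt.Errorf("resourceClaims[%s]: may set only one of: %s", c.Name, strings.Join(set, ", "))
	}
}

func main() {
	tmpl := "nvidia-h100-80gb"
	fmt.Println(validateClaimSource(podResourceClaim{Name: "gpu", ClusterResourceClaimTemplateName: &tmpl}))
	fmt.Println(validateClaimSource(podResourceClaim{Name: "gpu", ResourceClaimTemplateName: &tmpl, ClusterResourceClaimTemplateName: &tmpl}))
}
```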
Test Plan
[x] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.
Prerequisite testing updates
None.
Unit tests
- k8s.io/kubernetes/pkg/apis/core/validation: 85.5%
- k8s.io/kubernetes/pkg/apis/scheduling/validation: 90.6%
- k8s.io/kubernetes/pkg/apis/resource/validation: 96.6%
- k8s.io/kubernetes/pkg/controller/resourceclaim: 75.0%
- k8s.io/dynamic-resource-allocation/resourceclaim: 90.6%
Integration tests
Integration tests will verify:
- Successful generation of namespaced ResourceClaims from ClusterResourceClaimTemplate resources when a Pod or PodGroup is created.
- Correct ownerReferences on the generated ResourceClaims and matching garbage collection when Pods or PodGroups are deleted.
- API validation ensuring that exactly one of resourceClaimName, resourceClaimTemplateName, and clusterResourceClaimTemplateName is set on Pod or PodGroup creation.
- Proper behavior in edge cases (e.g., non-existent templates, updates to templates, and multi-namespace references).
e2e tests
E2E tests will verify:
- Complete scheduling lifecycle of a Pod or PodGroup referencing a ClusterResourceClaimTemplate using a test DRA resource driver (successful allocation, execution, and deallocation/clean-up).
- Correct resource isolation and independent allocation across multiple namespaces referencing the same cluster-scoped template.
Graduation Criteria
Alpha
- API fields added and feature gate ClusterResourceClaimTemplates introduced.
- Controller logic implemented for resolving cluster templates.
- Tests added and enabled.
Beta
- API types graduated to beta.
- Feature gate enabled by default.
- Gather feedback.
GA
- API types graduated to GA.
- Feature gate locked to true.
Upgrade / Downgrade Strategy
- Upgrade: Enabling the ClusterResourceClaimTemplates feature gate registers the ClusterResourceClaimTemplate API. The feature is opt-in and does not affect existing workloads.
- Downgrade: Disabling the feature gate will prevent the creation of new ClusterResourceClaimTemplate objects, and Pod or PodGroup creation requests referencing them will be rejected. Already-generated namespaced ResourceClaims remain unaffected and continue to function as standard claims.
Version Skew Strategy
- kube-apiserver / kube-controller-manager skew: If kube-apiserver is upgraded but kube-controller-manager is not, Pods or PodGroups referencing a cluster template will be admitted but remain in Pending state indefinitely, because the claim controller won't generate the namespaced ResourceClaims.
- Kubelet skew: The kubelet only interacts with standard, namespaced ResourceClaim objects. Since cluster-scoped template resolution is performed entirely by the control plane, there are no dependency or version-skew concerns with older kubelets.
Production Readiness Review Questionnaire
Feature Enablement and Rollback
How can this feature be enabled / disabled in a live cluster?
- Feature gate (also fill in values in kep.yaml)
  - Feature gate name: ClusterResourceClaimTemplates
  - Components depending on the feature gate: kube-apiserver, kube-controller-manager
- Other
  - Describe the mechanism:
  - Will enabling / disabling the feature require downtime of the control plane?
  - Will enabling / disabling the feature require downtime or reprovisioning of a node?
Does enabling the feature change any default behavior?
No.
Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
Yes. If disabled, new Pod or PodGroup specifications with clusterResourceClaimTemplateName
set will be rejected during API validation. Existing Pods or PodGroups with the field
populated will fail validation on any subsequent updates. Generated
ResourceClaim resources already created will continue to exist and be bound,
but further template resolution will be disabled.
What happens if we reenable the feature if it was previously rolled back?
The kube-apiserver and kube-controller-manager will resume resolving
ClusterResourceClaimTemplate references and generating corresponding local
ResourceClaim resources for any new or updated Pods or PodGroups.
Are there any tests for feature enablement/disablement?
We will cover feature enablement/disablement in both unit tests and integration tests.
Rollout, Upgrade and Rollback Planning
How can a rollout or rollback fail? Can it impact already running workloads?
During upgrade, if a new kube-apiserver is upgraded and admits a Pod or PodGroup with
clusterResourceClaimTemplateName but kube-controller-manager has not yet
been upgraded or does not have the feature gate enabled, the Pod or PodGroup will remain
pending indefinitely because the claim controller won’t create the namespaced
ResourceClaim. Already running workloads are unaffected.
What specific metrics should inform a rollback?
- An increase in the number of Pods or PodGroups stuck in Pending state because their ResourceClaims were not generated or allocated.
- High error rates in kube-controller-manager logs related to ResourceClaim creation.
Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
To be completed at Beta stage.
Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
No.
Monitoring Requirements
How can an operator determine if the feature is in use by workloads?
An operator can list ClusterResourceClaimTemplate objects, query
kube-apiserver request metrics for clusterresourceclaimtemplates, or check
whether any Pod or PodGroup sets clusterResourceClaimTemplateName in its
resource claims.
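Listing templates shows that they exist, but not that workloads use them. The sketch below counts claim references per cluster template. It assumes Pod specs have already been fetched, for example with a list call, into the hypothetical podSummary type. The sketch counts claim references rather than Pods, so a Pod with two claims pointing at the same template counts twice.

```go
package main

import (
	"fmt"
	"sort"
)

// podResourceClaim and podSummary are hypothetical, trimmed-down stand-ins
// for the relevant parts of a Pod object.
type podResourceClaim struct {
	Name                             string
	ClusterResourceClaimTemplateName *string
}

type podSummary struct {
	Namespace      string
	Name           string
	ResourceClaims []podResourceClaim
}

// usageByTemplate counts claim references per ClusterResourceClaimTemplate name.
func usageByTemplate(pods []podSummary) map[string]int {
	counts := map[string]int{}
	for _, p := range pods {
		for _, c := range p.ResourceClaims {
			if c.ClusterResourceClaimTemplateName != nil {
				counts[*c.ClusterResourceClaimTemplateName]++
			}
		}
	}
	return counts
}

func main() {
	h100 := "nvidia-h100-80gb"
	pods := []podSummary{
		{Namespace: "project-alpha", Name: "gpu-pod", ResourceClaims: []podResourceClaim{{Name: "gpu", ClusterResourceClaimTemplateName: &h100}}},
		{Namespace: "project-beta", Name: "cpu-pod"},
	}
	counts := usageByTemplate(pods)
	names := make([]string, 0, len(counts))
	for n := range counts {
		names = append(names, n)
	}
	sort.Strings(names) // stable output order
	for _, n := range names {
		fmt.Printf("%s: %d claim reference(s)\n", n, counts[n])
	}
}
```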
How can someone using this feature know that it is working for their instance?
- Events
  - Event Reason:
- API .status
  - Condition name: the generated ResourceClaim's status.allocation is set.
  - Other field: the Pod's status.resourceClaims list, and the existence of the generated namespaced ResourceClaim owned by the Pod or PodGroup.
- Other (treat as last resort)
  - Details:
What are the reasonable SLOs (Service Level Objectives) for the enhancement?
99% of local ResourceClaim creations corresponding to a ClusterResourceClaimTemplate should succeed within 1 second of the Pod or PodGroup being admitted.
What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
- Metrics
  - Metric name: controller_runtime_reconcile_errors_total (filtered by resourceclaimcontroller)
  - [Optional] Aggregation method: sum
  - Components exposing the metric: kube-controller-manager
- Other (treat as last resort)
  - Details:
Are there any missing metrics that would be useful to have to improve observability of this feature?
No. Standard API server request latency/error metrics and controller-runtime metrics are sufficient.
Dependencies
Does this feature depend on any specific services running in the cluster?
No.
Scalability
Will enabling / using this feature result in any new API calls?
Yes:
- kube-controller-manager will list and watch ClusterResourceClaimTemplates (cluster-wide informer index).
- Throughput is bounded by the creation/admission rate of Pods or PodGroups that use these templates.
Will enabling / using this feature result in introducing new API types?
Yes, ClusterResourceClaimTemplate.
Will enabling / using this feature result in any new calls to the cloud provider?
No.
Will enabling / using this feature result in increasing size or count of the existing API objects?
Yes, Pod or PodGroup specifications referencing a cluster template will have one extra field
(clusterResourceClaimTemplateName) set.
Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
No additional checks or operations are performed during Pod or PodGroup admission that would impact existing SLIs/SLOs.
Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, …) in any components?
No. The new informer in kube-controller-manager caches only
ClusterResourceClaimTemplate objects, which are expected to be few in number
and rarely updated.
Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
No.
Troubleshooting
How does this feature react if the API server and/or etcd is unavailable?
Admission/creation of Pods or PodGroups using the templates, as well as the creation, update, or deletion of the ClusterResourceClaimTemplate resources themselves, is blocked, similar to other Kubernetes resources.
What are other known failure modes?
- Template Reference Resolution Failure: A Pod or PodGroup specifies a non-existent clusterResourceClaimTemplateName.
  - Detection: The Pod or PodGroup remains Pending (or unscheduled), with an event indicating that template resolution failed.
  - Mitigations: Create the referenced ClusterResourceClaimTemplate, or recreate the Pod or PodGroup with a valid template name (the field is immutable).
  - Diagnostics: Look at events on the Pod/PodGroup and the logs of kube-controller-manager.
What steps should be taken if SLOs are not being met to determine the problem?
Verify that kube-controller-manager can reach kube-apiserver. Check the
kube-controller-manager logs and the rate of reconcile errors for the
resourceclaim controller.
Implementation History
- 2026-06-02: Initial KEP drafted and proposed targeting v1.37 Alpha.
- 2026-06-16: KEP withdrawn in favor of an out-of-tree controller solution.
Drawbacks
Introducing ClusterResourceClaimTemplate adds another API resource to
resource.k8s.io and expands the PodResourceClaim and PodGroupResourceClaim API
structures. However, this is justified by the significant simplification it
brings to cluster-scoped DRA configuration and resource control.
Alternatives
- Shadowing / Fallback Resolution: Allow resourceClaimTemplateName to resolve to a cluster-scoped resource if the namespaced template is not found.
  - Why Rejected: Shadowing leads to unpredictable resolution behavior and silent failures. An administrator creating a namespaced template would silently override/shadow the cluster-scoped template, changing Pod provisioning paths without explicit audit trails. Dedicated fields make references completely explicit and easily auditable.
- Out-of-Tree Synchronization Controller / CRD: Create an out-of-tree controller (for example, in a kubernetes-sigs repository) that automatically generates namespaced ResourceClaimTemplate objects in target namespaces. This could be driven either by a ClusterResourceClaimTemplate CRD or by publishing master ResourceClaimTemplate objects in a centralized controller namespace using a special naming convention or annotation mapping. Pods would then refer to the generated namespaced templates by name.
  Trade-offs / Motivation for Withdrawal: Compared to an in-tree API, this approach has the following trade-offs:
  - Race Conditions: When a new namespace is created, Pods referencing the template might be rejected or fail to deploy if they are created before the out-of-tree controller finishes copying the template into the new namespace.
  - Resource Proliferation and Overhead: Duplicating identical template objects across tens or hundreds of namespaces creates unnecessary etcd storage overhead and additional API server load.
Despite these trade-offs, the SIG decided to withdraw this KEP in favor of this out-of-tree controller approach. The alternative is considered good enough to fulfill the target use cases, and implementing an in-tree solution does not justify the required engineering effort and the expansion of the core Kubernetes API surface.
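The core reconcile step of such a synchronization controller can be sketched as a diff between the master templates and the copies already present in each target namespace. This is a minimal sketch under stated assumptions:

- The types, the plan function, and the "dra-templates" source namespace are hypothetical.
- The spec is reduced to an opaque string.
- As with the template spec described in this KEP, the copy's spec is assumed to be immutable, so a drifted copy is deleted and recreated instead of updated.

The sketch does not delete stale copies, and it cannot close the race described above. A Pod created before its namespace has a copy still has to wait for the next reconcile.

```go
package main

import "fmt"

// template is a hypothetical stand-in for a namespaced ResourceClaimTemplate.
// Spec is an opaque placeholder for the real ResourceClaimTemplateSpec.
type template struct {
	Namespace string
	Name      string
	Spec      string
}

// action is one write the controller would issue against the API server.
type action struct {
	Verb     string // "create" or "recreate"
	Template template
}

// plan returns the writes needed so that every target namespace holds an
// up-to-date copy of every master template. existing is keyed by
// "namespace/name".
func plan(masters []template, targets []string, existing map[string]template) []action {
	var out []action
	for _, ns := range targets {
		for _, m := range masters {
			want := template{Namespace: ns, Name: m.Name, Spec: m.Spec}
			have, found := existing[ns+"/"+m.Name]
			switch {
			case !found:
				out = append(out, action{Verb: "create", Template: want})
			case have.Spec != want.Spec:
				// Immutable spec: delete and recreate the drifted copy.
				out = append(out, action{Verb: "recreate", Template: want})
			}
		}
	}
	return out
}

func main() {
	masters := []template{{Namespace: "dra-templates", Name: "nvidia-h100-80gb", Spec: "h100-v2"}}
	existing := map[string]template{
		"project-alpha/nvidia-h100-80gb": {Namespace: "project-alpha", Name: "nvidia-h100-80gb", Spec: "h100-v1"},
	}
	for _, a := range plan(masters, []string{"project-alpha", "project-beta"}, existing) {
		fmt.Printf("%s %s/%s\n", a.Verb, a.Template.Namespace, a.Template.Name)
	}
}
```

Every master template is stored once per target namespace. That duplication is the source of the proliferation overhead listed above.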
Infrastructure Needed (Optional)
None.