KEP-5978: DRA: ClusterResourceClaimTemplate

Implementation History
ALPHA Withdrawn
Created 2026-06-02
Latest v1.37
Milestones
Alpha v1.37
Beta v1.38
Stable v1.40
Ownership
Owning SIG
SIG Node
Primary Authors

NOTE: This KEP was withdrawn in favor of the Out-of-Tree Synchronization Controller alternative.

Release Signoff Checklist

Items marked with (R) are required prior to targeting to a milestone / release.

  • (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
  • (R) KEP approvers have approved the KEP status as implementable
  • (R) Design details are appropriately documented
  • (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
    • e2e Tests for all Beta API Operations (endpoints)
    • (R) Ensure GA e2e tests meet requirements for Conformance Tests
    • (R) Minimum Two Week Window for GA e2e tests to prove flake free
  • (R) Graduation criteria is in place
  • (R) Production readiness review completed
  • (R) Production readiness review approved
  • “Implementation History” section is up-to-date for milestone
  • User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
  • Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

Summary

We propose introducing a new cluster-scoped API resource, ClusterResourceClaimTemplate, to allow cluster administrators to define standard Dynamic Resource Allocation (DRA) templates once at the cluster level. Any workload (via Pod or PodGroup) in any namespace can reference these templates. After a Pod or PodGroup that references a cluster template is created, the control plane’s resource claim controller resolves the template and automatically generates a namespace-local ResourceClaim in the Pod’s or PodGroup’s namespace.

Motivation

Currently, ResourceClaimTemplate is namespace-scoped. Administrators managing clusters with specialized hardware (GPUs, FPGAs, high-performance network devices) must deploy and synchronize identical ResourceClaimTemplate resources across all active namespaces. This namespace pollution makes cluster management complex, especially in dynamic, multi-tenant clusters. Cluster-scoped templates solve this by allowing a single central template definition.

Goals

  • Reduce administrative overhead: Eliminate the need to replicate and synchronize identical DRA templates across multiple namespaces.
  • Simplify workload configuration: Allow Pods and PodGroups in any namespace to easily consume centrally managed hardware templates.
  • Preserve namespace isolation: Ensure cluster-scoped templates do not introduce privilege escalation or cross-namespace risks.

Non-Goals

  • Changing how namespace-local ResourceClaimTemplate behaves.
  • Dynamically shadowing namespaced templates with cluster templates (resolution must be explicit).
  • Direct cluster-wide sharing of a single namespaced ResourceClaim instance (claims remain isolated per namespace).

Proposal

We propose introducing a new cluster-scoped API resource, ClusterResourceClaimTemplate, which serves as a cluster-wide template for generating ResourceClaims. Along with this, we propose adding a new clusterResourceClaimTemplateName field to PodResourceClaim and PodGroupResourceClaim alongside the existing resourceClaimName and resourceClaimTemplateName fields.

When clusterResourceClaimTemplateName is populated in a Pod or PodGroup specification, the ResourceClaim controller fetches the referenced ClusterResourceClaimTemplate and generates a corresponding ResourceClaim in the Pod’s or PodGroup’s namespace, maintaining standard resource ownership and garbage collection semantics.

User Stories (Optional)

Story 1: Exposing and Consuming Specialized Hardware Cluster-Wide

As a cluster administrator of an AI training platform, I want to expose high-performance GPUs (e.g., NVIDIA H100) to all research groups across the cluster without managing templates in every individual namespace.

  1. The administrator defines a single ClusterResourceClaimTemplate named nvidia-h100-80gb at the cluster scope.
  2. A developer/researcher in a newly provisioned namespace (e.g., project-alpha) creates a Pod that references this template. The resource claim controller automatically generates a namespaced ResourceClaim in project-alpha with the spec defined in nvidia-h100-80gb:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
  namespace: project-alpha
spec:
  containers:
  - name: training
    image: my-training-image
    resources:
      claims:
        - name: gpu
  resourceClaims:
  - name: gpu
    clusterResourceClaimTemplateName: nvidia-h100-80gb

Notes/Constraints/Caveats (Optional)

Risks and Mitigations

Design Details

API Changes

We will introduce a new cluster-scoped resource ClusterResourceClaimTemplate in resource.k8s.io:

// ClusterResourceClaimTemplate is used to produce ResourceClaim objects.
// Cluster scoped.
type ClusterResourceClaimTemplate struct {
  metav1.TypeMeta `json:",inline"`
  // Standard object metadata
  // +optional
  metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"`

  // Describes the ResourceClaim that is to be generated.
  //
  // This field is immutable. A ResourceClaim will get created by the
  // control plane for a Pod when needed and then not get updated
  // anymore.
  Spec ClusterResourceClaimTemplateSpec `json:"spec" protobuf:"bytes,2,name=spec"`
}

// ClusterResourceClaimTemplateSpec contains the metadata and fields for a ResourceClaim.
type ClusterResourceClaimTemplateSpec struct {
  // ObjectMeta may contain labels and annotations that will be copied into the ResourceClaim
  // when creating it. No other fields are allowed and will be rejected during validation.
  // +optional
  metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"`

  // Spec for the ResourceClaim. The entire content is copied unchanged
  // into the ResourceClaim that gets created from this template. The
  // same fields as in a ResourceClaim are also valid here.
  Spec ResourceClaimSpec `json:"spec" protobuf:"bytes,2,name=spec"`
}

We will add ClusterResourceClaimTemplateName to PodResourceClaim in k8s.io/api/core/v1:

type PodResourceClaim struct {
  Name string `json:"name" protobuf:"bytes,1,name=name"`
  ResourceClaimName *string `json:"resourceClaimName,omitempty" protobuf:"bytes,3,opt,name=resourceClaimName"`
  ResourceClaimTemplateName *string `json:"resourceClaimTemplateName,omitempty" protobuf:"bytes,4,opt,name=resourceClaimTemplateName"`

  // ClusterResourceClaimTemplateName is the name of a ClusterResourceClaimTemplate
  // object in the cluster.
  //
  // The template will be used to create a new ResourceClaim in the same
  // namespace as the Pod, which will be bound to this Pod. Except for the
  // cluster scope, it behaves identically to ResourceClaimTemplateName.
  //
  // This field is immutable and no changes will be made to the
  // corresponding ResourceClaim by the control plane after creating the
  // ResourceClaim.
  //
  // Exactly one of ResourceClaimName, ResourceClaimTemplateName, and
  // ClusterResourceClaimTemplateName must be set.
  ClusterResourceClaimTemplateName *string `json:"clusterResourceClaimTemplateName,omitempty" protobuf:"bytes,5,opt,name=clusterResourceClaimTemplateName"`
}

We will also add ClusterResourceClaimTemplateName to PodGroupResourceClaim in k8s.io/api/scheduling/v1alpha3 (or the appropriate API group for PodGroups):

type PodGroupResourceClaim struct {
  Name string `json:"name"`
  ResourceClaimName *string `json:"resourceClaimName,omitempty"`
  ResourceClaimTemplateName *string `json:"resourceClaimTemplateName,omitempty"`

  // ClusterResourceClaimTemplateName is the name of a ClusterResourceClaimTemplate
  // object in the cluster.
  //
  // The template will be used to create a new ResourceClaim in the same
  // namespace as this PodGroup, which will be bound to this PodGroup. Except for the
  // cluster scope, it behaves identically to ResourceClaimTemplateName.
  //
  // This field is immutable and no changes will be made to the
  // corresponding ResourceClaim by the control plane after creating the
  // ResourceClaim.
  //
  // Exactly one of ResourceClaimName, ResourceClaimTemplateName, and
  // ClusterResourceClaimTemplateName must be set.
  ClusterResourceClaimTemplateName *string `json:"clusterResourceClaimTemplateName,omitempty"`
}

ResourceClaim Controller Changes

ResourceClaimController will be updated to:

  • Fetch ClusterResourceClaimTemplate resources using a cluster-level template informer.
  • When a PodResourceClaim references a ClusterResourceClaimTemplateName, create a namespaced ResourceClaim in the Pod’s namespace using the ClusterResourceClaimTemplate’s Spec.Spec, and attach standard OwnerReferences mapping it to the Pod.
  • When a PodGroupResourceClaim references a ClusterResourceClaimTemplateName, create a namespaced ResourceClaim in the PodGroup’s namespace using the ClusterResourceClaimTemplate’s Spec.Spec, and attach standard OwnerReferences mapping it to the PodGroup.
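The generation step in the list above can be sketched roughly as follows. This is a minimal illustration, not the controller's actual code: the types are simplified stand-ins for the real API packages, and `Owner`, `claimFromClusterTemplate`, the single `DeviceClassName` field, and the `GenerateName` scheme are all assumptions made for the example.

```go
package main

// Simplified stand-ins for the real API types. Only the fields needed to
// illustrate the generation step are modeled; all of them are assumptions.
type OwnerReference struct {
	Kind       string
	Name       string
	UID        string
	Controller bool
}

type ObjectMeta struct {
	Name            string
	GenerateName    string
	Namespace       string
	Labels          map[string]string
	Annotations     map[string]string
	OwnerReferences []OwnerReference
}

// ResourceClaimSpec is reduced to one hypothetical field for brevity.
type ResourceClaimSpec struct {
	DeviceClassName string
}

type ResourceClaim struct {
	Metadata ObjectMeta
	Spec     ResourceClaimSpec
}

type ClusterResourceClaimTemplateSpec struct {
	Metadata ObjectMeta
	Spec     ResourceClaimSpec
}

type ClusterResourceClaimTemplate struct {
	Metadata ObjectMeta
	Spec     ClusterResourceClaimTemplateSpec
}

// Owner identifies the Pod or PodGroup that the generated claim belongs to.
type Owner struct {
	Kind, Name, Namespace, UID string
}

// copyMap returns an independent copy so the claim never aliases the template.
func copyMap(in map[string]string) map[string]string {
	if in == nil {
		return nil
	}
	out := make(map[string]string, len(in))
	for k, v := range in {
		out[k] = v
	}
	return out
}

// claimFromClusterTemplate builds the namespaced ResourceClaim for one claim
// entry of the owner. The template is cluster-scoped, so the namespace always
// comes from the owner.
func claimFromClusterTemplate(tmpl ClusterResourceClaimTemplate, owner Owner, claimEntryName string) ResourceClaim {
	return ResourceClaim{
		Metadata: ObjectMeta{
			// Hypothetical naming scheme; the API server would append a suffix.
			GenerateName: owner.Name + "-" + claimEntryName + "-",
			Namespace:    owner.Namespace,
			Labels:       copyMap(tmpl.Spec.Metadata.Labels),
			Annotations:  copyMap(tmpl.Spec.Metadata.Annotations),
			OwnerReferences: []OwnerReference{{
				Kind:       owner.Kind,
				Name:       owner.Name,
				UID:        owner.UID,
				Controller: true,
			}},
		},
		// Spec.Spec is copied unchanged. A real spec contains slices and would
		// need a deep copy; this stand-in is a plain value.
		Spec: tmpl.Spec.Spec,
	}
}
```

Because the owner reference points at the Pod or PodGroup, deleting the owner lets the garbage collector remove the generated claim, which matches the ownership semantics described above.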

Admission & Validation

A Pod or PodGroup reference to a cluster-scoped template does not require explicit reference authorization checks (such as SubjectAccessReview) during Pod or PodGroup admission. This is consistent with other cluster-scoped templates and class resources like StorageClass or RuntimeClass.

A ClusterResourceClaimTemplate is only a template containing a declarative ResourceClaimSpec and does not grant any special capabilities on its own: a user could always copy the template’s content and create an equivalent namespaced ResourceClaimTemplate or inline ResourceClaim. Access control on the actual underlying hardware or scheduling resources must therefore be enforced at the resource driver level or via namespace resource quotas/limits.

During admission, the API server will perform standard structural validation on the clusterResourceClaimTemplateName field in both PodResourceClaim and PodGroupResourceClaim (e.g. checking that exactly one of the template/claim name fields is populated).
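The "exactly one of" rule could be checked along these lines. This is a sketch under assumptions: `PodResourceClaim` here mirrors only the name fields, and `validateClaimSource` and `errClaimSource` are hypothetical names (real API validation would report a `field.ErrorList` with the offending path rather than a single error).

```go
package main

import "errors"

// PodResourceClaim mirrors only the name fields of the proposed API type.
type PodResourceClaim struct {
	Name                             string
	ResourceClaimName                *string
	ResourceClaimTemplateName        *string
	ClusterResourceClaimTemplateName *string
}

// errClaimSource is a hypothetical error used only in this sketch.
var errClaimSource = errors.New("exactly one of resourceClaimName, resourceClaimTemplateName, and clusterResourceClaimTemplateName must be set")

// validateClaimSource counts the populated source fields and accepts the
// entry only when exactly one of them is set.
func validateClaimSource(c PodResourceClaim) error {
	set := 0
	for _, p := range []*string{
		c.ResourceClaimName,
		c.ResourceClaimTemplateName,
		c.ClusterResourceClaimTemplateName,
	} {
		if p != nil {
			set++
		}
	}
	if set != 1 {
		return errClaimSource
	}
	return nil
}
```

The same check would apply unchanged to PodGroupResourceClaim, since it carries the same three fields.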

Test Plan

[x] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.

Prerequisite testing updates

None.

Unit tests

  • k8s.io/kubernetes/pkg/apis/core/validation: 85.5%
  • k8s.io/kubernetes/pkg/apis/scheduling/validation: 90.6%
  • k8s.io/kubernetes/pkg/apis/resource/validation: 96.6%
  • k8s.io/kubernetes/pkg/controller/resourceclaim: 75.0%
  • k8s.io/dynamic-resource-allocation/resourceclaim: 90.6%

Integration tests

Integration tests will verify:

  • Successful generation of namespaced ResourceClaims from ClusterResourceClaimTemplate resources when a Pod or PodGroup is created.
  • Correct ownerReferences on the generated ResourceClaims and matching garbage collection when Pods or PodGroups are deleted.
  • API validation ensuring that exactly one of resourceClaimName, resourceClaimTemplateName, and clusterResourceClaimTemplateName is set on Pod or PodGroup creation.
  • Proper behavior in edge cases (e.g. non-existent templates, updates to templates, and multi-namespace references).

e2e tests

E2E tests will verify:

  • Complete scheduling lifecycle of a Pod or PodGroup referencing a ClusterResourceClaimTemplate using a test DRA resource driver (successful allocation, execution, and deallocation/clean-up).
  • Correct resource isolation and independent allocation across multiple namespaces referencing the same cluster-scoped template.

Graduation Criteria

Alpha

  • API types and fields added and the ClusterResourceClaimTemplates feature gate introduced.
  • Controller logic implemented for resolving cluster templates.
  • Tests added and enabled.

Beta

  • API types graduated to beta.
  • Feature gate enabled by default.
  • Gather feedback.

GA

  • API types graduated to GA.
  • Feature gate locked to true.

Upgrade / Downgrade Strategy

  • Upgrade: Enabling the ClusterResourceClaimTemplates feature gate registers the ClusterResourceClaimTemplate API. The feature is opt-in and does not affect existing workloads.
  • Downgrade: Disabling the feature gate will prevent the creation of new ClusterResourceClaimTemplate objects, and Pod or PodGroup creation requests referencing them will be rejected. Already-generated, namespaced ResourceClaims remain unaffected and continue to function as standard claims.

Version Skew Strategy

  • Kube-apiserver / Kube-controller-manager skew: If kube-apiserver is upgraded but kube-controller-manager is not, Pods or PodGroups referencing a cluster template will be admitted but remain in Pending state indefinitely because the claim controller won’t generate the namespaced ResourceClaims.
  • Kubelet skew: The kubelet only interacts with standard, namespaced ResourceClaim objects. Since the cluster-scoped template resolution is performed entirely by the control plane, there are no dependency or version skew concerns with older kubelets.

Production Readiness Review Questionnaire

Feature Enablement and Rollback

How can this feature be enabled / disabled in a live cluster?

  • Feature gate (also fill in values in kep.yaml)
    • Feature gate name: ClusterResourceClaimTemplates
    • Components depending on the feature gate: kube-apiserver, kube-controller-manager
  • Other
    • Describe the mechanism:
    • Will enabling / disabling the feature require downtime of the control plane?
    • Will enabling / disabling the feature require downtime or reprovisioning of a node?

Does enabling the feature change any default behavior?

No.

Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

Yes. If disabled, new Pod or PodGroup specifications with clusterResourceClaimTemplateName set will be rejected during API validation. Existing Pods or PodGroups with the field populated will fail validation on any subsequent updates. Generated ResourceClaim resources already created will continue to exist and be bound, but further template resolution will be disabled.
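The rejection behavior described above can be sketched as a gate check that runs on both create and update. The function `validateGatedField` and the error value are hypothetical names for illustration, and the gate is passed in as a plain boolean rather than read from the real feature-gate machinery.

```go
package main

import "errors"

// PodResourceClaim mirrors only the fields relevant to the gate check.
type PodResourceClaim struct {
	Name                             string
	ClusterResourceClaimTemplateName *string
}

// errFeatureDisabled is a hypothetical error used only in this sketch.
var errFeatureDisabled = errors.New("clusterResourceClaimTemplateName requires the ClusterResourceClaimTemplates feature gate")

// validateGatedField rejects any claim entry that sets the gated field while
// the gate is off. As described above, the same check applies to new objects
// and to updates of existing ones.
func validateGatedField(claims []PodResourceClaim, gateEnabled bool) error {
	if gateEnabled {
		return nil
	}
	for _, c := range claims {
		if c.ClusterResourceClaimTemplateName != nil {
			return errFeatureDisabled
		}
	}
	return nil
}
```

Claims that never set the field pass regardless of the gate, so workloads that do not use cluster templates are unaffected by a rollback.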

What happens if we reenable the feature if it was previously rolled back?

The kube-apiserver will accept clusterResourceClaimTemplateName again, and kube-controller-manager will resume resolving ClusterResourceClaimTemplate references and generating the corresponding namespaced ResourceClaim resources for any new or updated Pods or PodGroups.

Are there any tests for feature enablement/disablement?

We will cover feature enablement/disablement in both unit tests and integration tests.

Rollout, Upgrade and Rollback Planning

How can a rollout or rollback fail? Can it impact already running workloads?

During upgrade, if kube-apiserver has been upgraded and admits a Pod or PodGroup with clusterResourceClaimTemplateName but kube-controller-manager has not yet been upgraded or does not have the feature gate enabled, the Pod or PodGroup will remain pending indefinitely because the claim controller won’t create the namespaced ResourceClaim. Already running workloads are unaffected.

What specific metrics should inform a rollback?

  • An increase in the number of Pods or PodGroups stuck in Pending state because their ResourceClaims were never generated or allocated.
  • High error rates in kube-controller-manager logs related to ResourceClaim creation.

Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

To be completed at Beta stage.

Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

No.

Monitoring Requirements

How can an operator determine if the feature is in use by workloads?

An operator can list ClusterResourceClaimTemplate objects or query kube-apiserver request metrics for clusterresourceclaimtemplates.

How can someone using this feature know that it is working for their instance?

  • Events
    • Event Reason:
  • API .status
    • Condition name: status.allocation is set on the generated ResourceClaim.
    • Other field: Pod’s status.resourceClaims list, and the existence of the generated namespaced ResourceClaim owned by the Pod or PodGroup.
  • Other (treat as last resort)
    • Details:

What are the reasonable SLOs (Service Level Objectives) for the enhancement?

99% of local ResourceClaim creations corresponding to a ClusterResourceClaimTemplate should succeed within 1 second of the Pod or PodGroup being admitted.
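As a rough illustration of how this SLO could be evaluated over a window of observations, the sketch below checks whether at least 99% of measured creation latencies fall within one second. It assumes the latencies (admission to ResourceClaim creation) have already been collected; the names `sloMet`, `durationsFromMillis`, `creationSLOThreshold`, and `creationSLOTarget` are hypothetical.

```go
package main

import "time"

const (
	// Values taken from the SLO statement above.
	creationSLOThreshold = time.Second
	creationSLOTarget    = 0.99
)

// durationsFromMillis converts millisecond samples into durations.
func durationsFromMillis(ms ...int64) []time.Duration {
	out := make([]time.Duration, 0, len(ms))
	for _, m := range ms {
		out = append(out, time.Duration(m)*time.Millisecond)
	}
	return out
}

// sloMet reports whether at least the target fraction of the observed
// latencies is within the threshold. An empty window counts as compliant.
func sloMet(latencies []time.Duration, threshold time.Duration, target float64) bool {
	if len(latencies) == 0 {
		return true
	}
	within := 0
	for _, l := range latencies {
		if l <= threshold {
			within++
		}
	}
	return float64(within)/float64(len(latencies)) >= target
}
```

For example, a window of ten samples with a single 2.5-second outlier yields a 90% success ratio and therefore misses the 99% target.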

What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

  • Metrics
    • Metric name: workqueue_retries_total (filtered by the resourceclaim controller’s workqueue name)
    • [Optional] Aggregation method: sum
    • Components exposing the metric: kube-controller-manager
  • Other (treat as last resort)
    • Details:

Are there any missing metrics that would be useful to have to improve observability of this feature?

No. Standard API server request latency/error metrics and the controller’s workqueue metrics are sufficient.

Dependencies

Does this feature depend on any specific services running in the cluster?

No.

Scalability

Will enabling / using this feature result in any new API calls?

Yes:

  • kube-controller-manager will list and watch ClusterResourceClaimTemplates (cluster-wide informer).
  • The rate of ResourceClaim creation is bounded by the creation rate of Pods and PodGroups that use these templates.

Will enabling / using this feature result in introducing new API types?

Yes, ClusterResourceClaimTemplate.

Will enabling / using this feature result in any new calls to the cloud provider?

No.

Will enabling / using this feature result in increasing size or count of the existing API objects?

Yes, Pod or PodGroup specifications referencing a cluster template will have one extra field (clusterResourceClaimTemplateName) set.

Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

No additional checks or operations are performed during Pod or PodGroup admission that would impact existing SLIs/SLOs.

Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, …) in any components?

No. The new informer in kube-controller-manager caches only ClusterResourceClaimTemplates, which are expected to be largely static and few in number.

Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?

No.

Troubleshooting

How does this feature react if the API server and/or etcd is unavailable?

Admission/creation of Pods or PodGroups using the templates, as well as the creation, update, or deletion of the ClusterResourceClaimTemplate resources themselves, is blocked, similar to other Kubernetes resources.

What are other known failure modes?

  • Template Reference Resolution Failure: A Pod or PodGroup specifies a non-existent clusterResourceClaimTemplateName.
    • Detection: Pod or PodGroup remains Pending (or unscheduled) with an event indicating that template resolution failed.
    • Mitigations: Correct the Pod or PodGroup specification to use a valid template or create the referenced ClusterResourceClaimTemplate.
    • Diagnostics: Look at events on the Pod/PodGroup and logs of kube-controller-manager.
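The resolution failure above could surface in the controller roughly as sketched here. This is an assumption-laden illustration: `templateCache` stands in for the cluster-level informer cache, and `syncClaimEntry`, `syncOutcome`, and the event message text are hypothetical, not the controller's actual behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// errTemplateNotFound is a hypothetical sentinel error for this sketch.
var errTemplateNotFound = errors.New("ClusterResourceClaimTemplate not found")

// ClusterResourceClaimTemplate is reduced to its name for this sketch.
type ClusterResourceClaimTemplate struct {
	Name string
}

// templateCache stands in for the cluster-level informer cache, keyed by name.
type templateCache map[string]ClusterResourceClaimTemplate

// resolve looks up a template and wraps the sentinel error when it is missing.
func (c templateCache) resolve(name string) (ClusterResourceClaimTemplate, error) {
	tmpl, ok := c[name]
	if !ok {
		return ClusterResourceClaimTemplate{}, fmt.Errorf("resolving clusterResourceClaimTemplateName %q: %w", name, errTemplateNotFound)
	}
	return tmpl, nil
}

// syncOutcome describes what the controller would do next for one claim entry.
type syncOutcome struct {
	Retry        bool   // requeue so a later-created template is picked up
	EventMessage string // text for an event on the Pod or PodGroup
}

// syncClaimEntry reports a missing template as a retryable condition with a
// message suitable for an event, and returns an empty outcome otherwise.
func syncClaimEntry(cache templateCache, templateName string) syncOutcome {
	_, err := cache.resolve(templateName)
	if errors.Is(err, errTemplateNotFound) {
		return syncOutcome{Retry: true, EventMessage: err.Error()}
	}
	return syncOutcome{}
}
```

Retrying rather than failing permanently matches the mitigation above: once the administrator creates the referenced template, a later sync can proceed without changes to the Pod or PodGroup.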

What steps should be taken if SLOs are not being met to determine the problem?

Verify that kube-controller-manager can reach kube-apiserver. Check kube-controller-manager logs and the rate of reconcile errors for the resourceclaim controller.

Implementation History

  • 2026-06-02: Initial KEP drafted and proposed targeting v1.37 Alpha.
  • 2026-06-16: KEP withdrawn in favor of an out-of-tree controller solution.

Drawbacks

Introducing ClusterResourceClaimTemplate adds another API resource to resource.k8s.io and expands the PodResourceClaim and PodGroupResourceClaim API structures. However, this is justified by the significant simplification it brings to cluster-scoped DRA configuration and resource control.

Alternatives

  • Shadowing / Fallback Resolution: Allow resourceClaimTemplateName to resolve to a cluster-scoped resource if the namespaced template is not found.
    • Why Rejected: Shadowing leads to unpredictable resolution behavior and silent failures. An administrator creating a namespaced template would silently override/shadow the cluster-scoped template, changing Pod provisioning paths without explicit audit trails. Dedicated fields make references completely explicit and easily auditable.

  • Out-of-Tree Synchronization Controller / CRD: Create an out-of-tree controller (for example, in a kubernetes-sigs repository) that automatically generates namespaced ResourceClaimTemplate objects in target namespaces. This could be driven either by a ClusterResourceClaimTemplate CRD or by publishing source ResourceClaimTemplate objects in a centralized controller namespace using a special naming convention or annotation mapping. Pods would then refer to the generated namespaced templates by name.
    • Trade-offs / Motivation for Withdrawal: This approach involves certain trade-offs compared to an in-tree API:

      • Race Conditions: When a new namespace is created, Pods referencing the template might be rejected or fail to deploy if they are created before the out-of-tree controller finishes copying the template into the new namespace.
      • Resource Proliferation and Overhead: Duplicating identical template objects across tens or hundreds of namespaces creates unnecessary etcd storage overhead and additional API server load.

      Despite these trade-offs, the SIG decided to withdraw this KEP in favor of this out-of-tree controller approach. The alternative is considered good enough to fulfill the target use cases, and the benefits of an in-tree solution do not justify the required engineering effort and the expansion of the core Kubernetes API surface.
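The core reconciliation decision of such an out-of-tree controller, and the window in which the race condition above can occur, can be illustrated with a small sketch. The function name `namespacesMissingCopy` and its inputs are hypothetical; a real controller would derive them from namespace and ResourceClaimTemplate informers.

```go
package main

import "sort"

// namespacesMissingCopy returns, in sorted order, the namespaces that do not
// yet hold a copy of the template. Pods created in one of these namespaces
// before the controller catches up would reference a template that does not
// exist yet.
func namespacesMissingCopy(namespaces []string, hasCopy map[string]bool) []string {
	missing := []string{}
	for _, ns := range namespaces {
		if !hasCopy[ns] {
			missing = append(missing, ns)
		}
	}
	sort.Strings(missing)
	return missing
}
```

With namespaces team-a, team-b, and project-alpha, and a copy present only in team-a, the function returns project-alpha and team-b: one template object must be created per remaining namespace, which is the proliferation overhead listed among the trade-offs.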

Infrastructure Needed (Optional)

None.