KEP-5075: DRA Consumable Capacity

Release Signoff Checklist
Summary
- Goals
- Non-Goals
Proposal
- User Stories (Optional)
- Risks and Mitigations
Design Details
Production Readiness Review Questionnaire
Implementation History
Drawbacks
Alternatives
Infrastructure Needed (Optional)

Release Signoff Checklist

Items marked with (R) are required prior to targeting to a milestone / release.

(R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
(R) KEP approvers have approved the KEP status as implementable
(R) Design details are appropriately documented
(R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- e2e Tests for all Beta API Operations (endpoints)
- (R) Ensure GA e2e tests meet requirements for Conformance Tests
- (R) Minimum Two Week Window for GA e2e tests to prove flake free
(R) Graduation criteria is in place
- (R) all GA Endpoints must be hit by Conformance Tests
(R) Production readiness review completed
(R) Production readiness review approved
“Implementation History” section is up-to-date for milestone
User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

Summary

Without this KEP, device sharing is done by having multiple pods (and/or containers) reference the same resource claim, and that resource claim has allocated the device. This can be considered exclusive or dedicated allocation. With this KEP, independent resource claims (and/or requests within a claim) can allocate shares of resources provided by the same underlying device. This enables resource sharing across pods that are completely unrelated, potentially even across different namespaces.

When a device is shared across multiple resource claims, this enhancement enables device resource allocation to be drawn from the device’s overall capacity. It ensures that the total resources consumed by all claim requests remain within the device’s capacity and comply with any defined requestPolicy, such as minimum per-claim resource requirements, if specified. This concept is referred to as consumable capacity. If a request does not specify particular device resource requirements, it implies an expectation of full device capacity.

Notably, each of these independent resource claims can still be referenced by one or more pods. However, the device resources allocated to each request are shared without any isolation guarantees among the pods that reference the same request.

To achieve this, this KEP introduces:

a new device property field to distinguish between devices that can be allocated only once and those that can be allocated multiple times,
a capacity-aware scheduling mechanism that allows limiting or guaranteeing device capacity among the resource claims (or requests) that share the device,
a new capacity requirement field in the device request of the resource claim,
a new consumed capacity field in the allocation result of the resource claim,
a method to associate the allocated device status to the allocation result in the resource claim.

With those in place, a resource claim with multiple requests might allocate the same device multiple times. This may or may not be desired, so this KEP also introduces:

a distinct attribute constraint to prevent allocating the same multi-allocatable device in the same claim multiple times.

Relations to other KEPs:

KEP 4815: A partitioned device can be multi-allocatable, or it can have mutually exclusive partitions where one partition is multi-allocatable and the other is not. The partitioning constraints (the remaining SharedCounters) are checked only once during the allocation of a multi-allocatable partitioned device. Meanwhile, the constraints introduced in this KEP (the remaining Capacity of the multi-allocatable partitioned device) are checked during every allocation of the same device.
KEP 5007: The allocated share can be provisioned at the pre-bind step.
KEP 4817: A single network device can be shared across multiple pods, with each allocated share’s NetworkData identified by a unique Share ID.
KEP 4816: The enhancement must be able to handle subrequests when the DRAPrioritizedList feature is enabled.
KEP 5677: This enhancement reports aggregated allocated, available devices and their resources.
KEP 5941: The enhancement introduces a generic DRA model for parent-scoped shared capacities (sharedCapacities) consumed by related child devices.
KEP 5981: The extractors of sharingAffinity act as a structural gatekeeper before capacity subtraction occurs. They guarantee that multi-allocation devices do not accidentally mingle workloads from distinct administrative configurations.

A motivating use case is to allocate a multi-allocatable network device in the CNI DRA driver, which can be selected by more than one pod on demand during scheduling. The original discussion is in this PR’s comment thread. The limitation of the current implementation has been addressed here. The virtual network device is created and configured, based on the information of the master network device, once the CNI is called. The configuration specific to the generated device cannot be listed in the ResourceSlice in advance.

This feature is also beneficial for other multi-allocatable devices that are not within the scope of KEP-4815. For instance, it allows reserving a fraction of virtual GPU memory in the AWS virtual GPU device plugin. In other words, the device capacity allocation is determined by the user’s claim.

Goals

Introduce an ability to allocate a multi-allocatable device via DRA multiple times in scenarios where pre-defined partitions are not viable, for example because there would be too many of them.
Let DRA drivers declare which device-level resources they can guarantee or reserve for a specific request, and which values are valid to reserve.
Let users specify in device requests how much of certain device resources they require.

Non-Goals

Define driver-specific attributes and configs (such as CNI parameter config).
Support network security policy.
Support aggregated resource consumption where multiple devices are allocated to satisfy a single capacity request. This is related to the comment about distinctAttribute.
Support an extended use case where the resource guaranteeing behavior is determined by the first user request. For example, if the first request does not require a guarantee, the resource remains unguaranteed. However, if the first request requires a guarantee, the resource is marked as guaranteed, and all subsequent requests must adhere to that guarantee.

Proposal

User Stories (Optional)

Story 1

A DRA driver for networks advertises multi-allocatable devices for two interfaces, eth1 and eth2, each of which connects to the same virtual LAN. The admin makes those devices available through a DeviceClass that selects only multi-allocatable devices, and users request access through a request that references that DeviceClass.

Story 2

When requesting two interfaces, the user requests two devices. To ensure that they don’t end up with the same multi-allocatable device for each request, they specify that the driver-specific “interfaceName” attribute must be different.

Story 3

A DRA driver for networks supports QoS-guaranteed bandwidth, which ensures that a specific amount of bandwidth on the multi-allocatable network device can be reserved exclusively for a resource request. The DRA driver also specifies the minimum and maximum amount of capacity that each resource request can reserve. When requesting the guaranteed network device, users specify their required guaranteed bandwidth. Otherwise, the default value defined by the DRA driver is applied.

Risks and Mitigations

The requested amount in the resource claim may not satisfy the capacity request policy, especially if the requested amount exceeds the maximum allowed consumption.
This scenario should be handled similarly to other scheduling issues, such as when the request exceeds the allocatable capacity. In such cases, the allocation fails and the pod remains pending.
The driver includes both multi-allocatable and dedicated (non-multi-allocatable) devices. There is a risk that a user may be allocated a multi-allocatable device (e.g., a multi-allocatable network device) and accidentally configure it with the HostDevice CNI plugin. This would move the device from the host into the user’s pod, preventing other users from accessing the multi-allocatable device.
Mitigation:
- Device drivers should define a concrete request policy. If the device is intended to be shared without capacity limits among requests, the request policy should set the consumable value to zero.
- Additionally, administrators should define clear device classes for multi-allocatable and dedicated devices to prevent such misallocations.
When a driver changes a device property from dedicated to multi-allocatable, existing resource claims that have no specified consumed capacity will adopt a default quantity based on the defined request policy. This default may represent a fraction of the device, potentially altering the behavior of existing claims.
Mitigation:
- The existing allocation result, which has no share ID (as it was previously a dedicated device), will be included in the allocated list. The scheduler must ensure that the device cannot be allocated to another resource claim during the scheduling process.
When a driver updates the request policy, the behavior of resource claims changes.
Mitigation:
- Device drivers should avoid updating the request policy, or do so with caution, if any devices have already been allocated to a ResourceClaim or are under preparation.

Design Details

This enhancement introduces an AllowMultipleAllocations field within the Device of the ResourceSlice to mark whether the device is multi-allocatable among multiple resource claims (or requests). A multi-allocatable device can be assigned to more than one request if it satisfies the selection criteria and constraints. In a CEL selector, the condition device.allowMultipleAllocations == true (or == false) selects devices that have (or do not have) the AllowMultipleAllocations property.

The enhancement also adds a RequestPolicy field to DeviceCapacity. This field specifies how the device resources can be drawn from the device’s capacity for each claim request. The request policy can either specify a range of valid values or a discrete set of them. Each policy must have a default value.

If a device with the AllowMultipleAllocations property does not contain any Capacity, it can be allocated multiple times without device capacity constraints—that is, infinitely, as long as other scheduling conditions are met.

Users can define specific per-device resource requests using the newly added Capacity field, which is available in each supported device request type under DeviceRequest. Capacity contains a Requests map, where each entry specifies the required amount of a device resource. The amount available for allocation is determined by subtracting the aggregated allocation results of current claims from the device’s capacity as defined in the resource slice. The remaining amount will be used solely by the allocator and will not be reflected in the resource slice. The calculation of capacity requirements will round the requested capacity up to the nearest valid amount, based on the capacity’s request policy.

If users do not specify a capacity request, the consumed value will be:

the device’s full capacity or
the default value if it was specified by the request policy or
none if the device usage is unlimited
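
The fallback order above can be sketched as follows. This is only an illustration, not the allocator's code: quantities are modeled as plain int64 values rather than resource.Quantity, and the capacity type and the defaultConsumption function are hypothetical names.

```go
package main

import "fmt"

// capacity is a simplified, hypothetical stand-in for DeviceCapacity.
// Quantities are plain int64 values instead of resource.Quantity.
type capacity struct {
	value         int64  // total capacity advertised in the ResourceSlice
	policyDefault *int64 // requestPolicy.default; nil when no policy is set
}

// defaultConsumption returns what one allocation consumes from each capacity
// when the request does not name any capacity.
func defaultConsumption(caps map[string]capacity) map[string]int64 {
	consumed := make(map[string]int64, len(caps))
	for name, c := range caps {
		if c.policyDefault != nil {
			consumed[name] = *c.policyDefault // default from the request policy
		} else {
			consumed[name] = c.value // no policy: the full capacity
		}
	}
	return consumed // empty for a device without capacities (unlimited use)
}

func main() {
	def := int64(1)
	caps := map[string]capacity{
		"bandwidth": {value: 10, policyDefault: &def},
		"memory":    {value: 64},
	}
	fmt.Println(defaultConsumption(caps))
	fmt.Println(len(defaultConsumption(nil)))
}
```

A device without any capacity entry yields an empty result, which corresponds to the unlimited case.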

A device with AllowMultipleAllocations property can only be allocated when its consumability has been verified and its attributes match the request’s selectors and constraints. The newly added ConsumedCapacity field in the DeviceRequestAllocationResult will be set to the calculated capacity upon a successful allocation. This value may differ from the originally requested amount, as it is rounded up to the nearest valid value based on the device’s request policy.
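
For a policy that lists discrete valid values, the rounding mentioned above amounts to picking the smallest listed value that is not below the request. The sketch below assumes int64 quantities instead of resource.Quantity and a list that is already sorted in ascending order; roundToValidValues is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"sort"
)

// roundToValidValues returns the smallest entry of validValues that is
// greater than or equal to requested. validValues must be sorted ascending.
// ok is false when requested exceeds every valid value, which means the
// request violates the policy and the device cannot be allocated.
func roundToValidValues(requested int64, validValues []int64) (consumed int64, ok bool) {
	i := sort.Search(len(validValues), func(i int) bool {
		return validValues[i] >= requested
	})
	if i == len(validValues) {
		return 0, false
	}
	return validValues[i], true
}

func main() {
	valid := []int64{1, 2, 4, 8}
	for _, req := range []int64{3, 8, 9} {
		consumed, ok := roundToValidValues(req, valid)
		fmt.Println(req, consumed, ok)
	}
}
```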

API enhancement

To enable this enhancement, the following API updates are proposed.

ResourceSliceSpec’s Device

type Device struct {
...
    // AllowMultipleAllocations marks whether the device is allowed to be allocated to multiple DeviceRequests.
    //
    // If AllowMultipleAllocations is set to true, the device can be allocated more than once,
    // and all of its capacity is consumable, regardless of whether the requestPolicy is defined or not.
    //
    // +optional
    // +featureGate=DRAConsumableCapacity
    AllowMultipleAllocations *bool
}

type DeviceCapacity struct {
    // Value defines how much of a certain capacity that device has.
    //
    // This field reflects the fixed total capacity and does not change.
    // The consumed amount is tracked separately by scheduler
    // and does not affect this value.
    //
    // +required
    Value resource.Quantity

    // RequestPolicy defines how this DeviceCapacity must be consumed
    // when the device is allowed to be shared by multiple allocations.
    //
    // The Device must have allowMultipleAllocations set to true in order to set a requestPolicy.
    //
    // If unset, capacity requests are unconstrained:
    // requests can consume any amount of capacity, as long as the total consumed
    // across all allocations does not exceed the device's defined capacity.
    // If request is also unset, default is the full capacity value.
    //
    // +optional
    // +featureGate=DRAConsumableCapacity
    RequestPolicy *CapacityRequestPolicy
}

// CapacityRequestPolicy defines how requests consume device capacity.
//
// Must not set more than one ValidRequestValues.
type CapacityRequestPolicy struct {
    // Default specifies how much of this capacity is consumed by a request
    // that does not contain an entry for it in DeviceRequest's Capacity.
    //
    // +optional
    Default *resource.Quantity

    // ValidValues defines a set of acceptable quantity values in consuming requests.
    //
    // Must not contain more than 10 entries.
    // Must be sorted in ascending order.
    //
    // If this field is set,
    // Default must be defined and it must be included in ValidValues list.
    //
    // If the requested amount does not match any valid value but is smaller than at least one of them,
    // the scheduler uses the smallest valid value that is greater than or equal to the request.
    // That is: min{v ∈ validValues : v ≥ requestedValue}, where requestedValue ≤ max(validValues).
    //
    // If the requested amount exceeds all valid values, the request violates the policy,
    // and this device cannot be allocated.
    //
    // +optional
    // +listType=atomic
    // +oneOf=ValidRequestValues
    ValidValues []resource.Quantity

    // ValidRange defines an acceptable quantity value range in consuming requests.
    //
    // If this field is set,
    // Default must be defined and it must fall within the defined ValidRange.
    //
    // If the requested amount does not fall within the defined range, the request violates the policy,
    // and this device cannot be allocated.
    //
    // If the request doesn't contain this capacity entry, Default value is used.
    //
    // +optional
    // +oneOf=ValidRequestValues
    ValidRange *CapacityRequestPolicyRange
}

// CapacityRequestPolicyRange defines a valid range for consumable capacity values.
//
//   - If the requested amount is less than Min, it is rounded up to the Min value.
//   - If Step is set and the requested amount is between Min and Max but not aligned with Step,
//     it will be rounded up to the next value equal to Min + (n * Step).
//   - If Step is not set, the requested amount is used as-is if it falls within the range Min to Max (if set).
//   - If the requested or rounded amount exceeds Max (if set), the request does not satisfy the policy,
//     and the device cannot be allocated.
type CapacityRequestPolicyRange struct {
    // Min specifies the minimum capacity allowed for a consumption request.
    //
    // Min must be greater than or equal to zero,
    // and less than or equal to the capacity value.
    // requestPolicy.default must be more than or equal to the minimum.
    //
    // +required
    Min *resource.Quantity

    // Max defines the upper limit for capacity that can be requested.
    //
    // Max must be less than or equal to the capacity value.
    // Min and requestPolicy.default must be less than or equal to the maximum.
    //
    // +optional
    Max *resource.Quantity

    // Step defines the step size between valid capacity amounts within the range.
    //
    // Max (if set) and requestPolicy.default must be a multiple of Step.
    // Min + Step must be less than or equal to the capacity value.
    //
    // +optional
    Step *resource.Quantity
}
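
The rounding rules in the CapacityRequestPolicyRange comments can be sketched as below. This is a minimal illustration under stated assumptions: quantities are int64 values rather than resource.Quantity, and requestRange, roundToRange, and errViolatesPolicy are hypothetical names, not part of the proposed API.

```go
package main

import (
	"errors"
	"fmt"
)

// requestRange is a simplified stand-in for CapacityRequestPolicyRange.
type requestRange struct {
	min  int64
	max  *int64 // optional upper limit
	step *int64 // optional step between valid amounts
}

var errViolatesPolicy = errors.New("request violates the capacity request policy")

// roundToRange applies the documented rules: raise to Min, align upward to
// Min + n*Step when Step is set, and reject anything above Max.
func roundToRange(requested int64, r requestRange) (int64, error) {
	consumed := requested
	if consumed < r.min {
		consumed = r.min
	}
	if r.step != nil && *r.step > 0 {
		// n = ceil((consumed - min) / step), using integer arithmetic.
		n := (consumed - r.min + *r.step - 1) / (*r.step)
		consumed = r.min + n*(*r.step)
	}
	if r.max != nil && consumed > *r.max {
		return 0, errViolatesPolicy
	}
	return consumed, nil
}

func main() {
	max, step := int64(30), int64(4)
	r := requestRange{min: 10, max: &max, step: &step}
	for _, req := range []int64{3, 11, 29, 31} {
		consumed, err := roundToRange(req, r)
		fmt.Println(req, consumed, err)
	}
}
```

Note that the check against Max runs after rounding, so a request that is within the range can still be rejected if its rounded value exceeds Max.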

CELDeviceSelector’s description

type CELDeviceSelector struct {
    // ...
    // The expression's input is an object named "device", which carries
    // the following properties:
    //  - driver (string): the name of the driver which defines this device.
    //  - attributes (map[string]object): the device's attributes, grouped by prefix
    //    (e.g. device.attributes["dra.example.com"] evaluates to an object with all
    //    of the attributes which were prefixed by "dra.example.com").
    //  - capacity (map[string]object): the device's capacities, grouped by prefix.
    //  - allowMultipleAllocations (bool): the allowMultipleAllocations property of the device
    //    (v1.34+ with the DRAConsumableCapacity feature enabled).
    // ...
    // +required
    Expression string
}

ResourceClaimSpec’s DeviceRequest

The Capacity field is defined within each supported device request type, such as DeviceSubRequest and ExactDeviceRequest.

type DeviceSubRequest struct {

    // Capacity defines resource requirements against each capacity.
    //
    // If this field is unset and the device supports multiple allocations,
    // the default value will be applied to each capacity according to requestPolicy.
    // For the capacity that has no requestPolicy, default is the full capacity value.
    //
    // Applies to each device allocation.
    // If Count > 1,
    // the request fails if there aren't enough devices that meet the requirements.
    // If AllocationMode is set to All,
    // the request fails if there are devices that otherwise match the request,
    // and have this capacity, with a value >= the requested amount, but which cannot be allocated to this request.
    //
    // +optional
    // +featureGate=DRAConsumableCapacity
    Capacity *CapacityRequirements

}

type ExactDeviceRequest struct {

    // Capacity defines resource requirements against each capacity.
    //
    // If this field is unset and the device supports multiple allocations,
    // the default value will be applied to each capacity according to requestPolicy.
    // For the capacity that has no requestPolicy, default is the full capacity value.
    //
    // Applies to each device allocation.
    // If Count > 1,
    // the request fails if there aren't enough devices that meet the requirements.
    // If AllocationMode is set to All,
    // the request fails if there are devices that otherwise match the request,
    // and have this capacity, with a value >= the requested amount, but which cannot be allocated to this request.
    //
    // +optional
    // +featureGate=DRAConsumableCapacity
    Capacity *CapacityRequirements

}

// CapacityRequirements defines the capacity requirements for a specific device request.
type CapacityRequirements struct {
    // Requests represent individual device resource requests for distinct resources,
    // all of which must be provided by the device.
    //
    // This value is used as an additional filtering condition against the available capacity on the device.
    // This is semantically equivalent to a CEL selector with
    // `device.capacity[<domain>].<name>.compareTo(quantity(<request quantity>)) >= 0`.
    // For example, device.capacity['test-driver.cdi.k8s.io'].counters.compareTo(quantity('2')) >= 0.
    //
    // When a requestPolicy is defined, the requested amount is adjusted upward
    // to the nearest valid value based on the policy.
    // If the requested amount cannot be adjusted to a valid value—because it exceeds what the requestPolicy allows—
    // the device is considered ineligible for allocation.
    //
    // For any capacity that is not explicitly requested:
    // - If no requestPolicy is set, the default consumed capacity is equal to the full device capacity
    //   (i.e., the whole device is claimed).
    // - If a requestPolicy is set, the default consumed capacity is determined according to that policy.
    //
    // If the device allows multiple allocation,
    // the aggregated amount across all requests must not exceed the capacity value.
    // The consumed capacity, which may be adjusted based on the requestPolicy if defined,
    // is recorded in the resource claim’s status.devices[*].consumedCapacity field.
    //
    // +optional
    Requests map[QualifiedName]resource.Quantity
}


type DeviceConstraint struct {

    // DistinctAttribute requires that all devices in question have this
    // attribute and that its type and value are unique across those devices.
    //
    // This acts as the inverse of MatchAttribute.
    //
    // This constraint is used to avoid allocating multiple requests to the same device
    // by ensuring attribute-level differentiation.
    //
    // This is useful for scenarios where resource requests must be fulfilled by separate physical devices.
    // For example, a container requests two network interfaces that must be allocated from two different physical NICs.
    //
    // +optional
    // +oneOf=ConstraintType
    // +featureGate=DRAConsumableCapacity
    DistinctAttribute *FullyQualifiedName
}
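
The DistinctAttribute semantics described in the comment above can be illustrated with a small check. The types below are hypothetical simplifications (attribute values are reduced to a kind and a string), and satisfiesDistinct is not a function from the allocator.

```go
package main

import "fmt"

// attrValue is a simplified attribute: its type and its value as a string.
type attrValue struct {
	kind  string // for example "string", "int", "bool", or "version"
	value string
}

// device is a reduced view of a candidate device.
type device struct {
	name  string
	attrs map[string]attrValue
}

// satisfiesDistinct reports whether every device has the attribute and
// no two devices share the same type and value for it.
func satisfiesDistinct(attr string, devices []device) bool {
	seen := make(map[attrValue]struct{}, len(devices))
	for _, d := range devices {
		v, ok := d.attrs[attr]
		if !ok {
			return false // the attribute is required on every device
		}
		if _, dup := seen[v]; dup {
			return false // the same device or an equal attribute was picked twice
		}
		seen[v] = struct{}{}
	}
	return true
}

func main() {
	eth1 := device{"eth1", map[string]attrValue{"interfaceName": {"string", "eth1"}}}
	eth2 := device{"eth2", map[string]attrValue{"interfaceName": {"string", "eth2"}}}
	fmt.Println(satisfiesDistinct("interfaceName", []device{eth1, eth2}))
	fmt.Println(satisfiesDistinct("interfaceName", []device{eth1, eth1}))
}
```

Because a multi-allocatable device carries the same attribute value each time it is picked, the second call shows how the constraint keeps two requests from landing on the same device.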

ResourceClaimStatus’s DeviceRequestAllocationResult

type DeviceRequestAllocationResult struct {

    // ShareID uniquely identifies an individual allocation share of the device,
    // used when the device supports multiple simultaneous allocations.
    // It serves as an additional map key to differentiate concurrent shares
    // of the same device.
    //
    // +optional
    // +featureGate=DRAConsumableCapacity
    ShareID *types.UID

    // ConsumedCapacity tracks the amount of capacity consumed per device as part of the claim request.
    // The consumed amount may differ from the requested amount: it is rounded up to the nearest valid
    // value based on the device’s requestPolicy if applicable (i.e., may not be less than the requested amount).
    //
    // The total consumed capacity for each device must not exceed the DeviceCapacity's Value.
    //
    // This field is populated only for devices that allow multiple allocations.
    // All capacity entries are included, even if the consumed amount is zero.
    //
    // +optional
    // +featureGate=DRAConsumableCapacity
    ConsumedCapacity map[QualifiedName]resource.Quantity
}

type ResourceClaimStatus struct {

    // Devices contains the status of each device allocated for this
    // claim, as reported by the driver. This can include driver-specific
    // information. Entries are owned by their respective drivers.
    //
    // +optional
    // +listType=map
    // +listMapKey=driver
    // +listMapKey=device
    // +listMapKey=pool
    // +listMapKey=shareID
    // +featureGate=DRAResourceClaimDeviceStatus
    Devices []AllocatedDeviceStatus
}

type AllocatedDeviceStatus struct {

    // ShareID uniquely identifies an individual allocation share of the device.
    //
    // +optional
    // +featureGate=DRAConsumableCapacity
    ShareID *string
}

Scheduling enhancement

When the scheduler invokes the Allocate function in the allocator, the total allocated capacity is calculated by aggregating the consumedCapacity from the DeviceRequestAllocationResult of every resource claim that has already been allocated.
Before allocation proceeds, existing selection criteria (defined by alloc.isSelectable) are evaluated. These include the class selector and request selector.
A new device.allowMultipleAllocations key is introduced in the CEL selector, enabling policies and constraints to recognize whether a device supports allocation by multiple requests.
If a device is considered selectable, the CmpRequestOverCapacity function is invoked to verify whether the consumed capacity would exceed the device’s remaining capacity. The remaining capacity is calculated based on the sum of already allocated and currently allocating capacities.
- The consumed capacity is derived from the requested amount specified in the resource claim, adjusted by the device’s capacity request policy, if defined.
- This value may differ from the originally requested amount—it is rounded up to the nearest valid capacity according to the policy (e.g., using Min + ⌈(Requested - Min)/Step⌉ × Step logic).
If the device has enough remaining capacity to satisfy the consumed amount, constraint checks are applied. In addition to the existing MatchAttribute, this proposal introduces a new constraint: DistinctAttribute, which ensures attribute uniqueness across allocated devices.
Once all selection and constraint checks pass, the allocation is valid. The allocation result is updated with:
- The share identifier (ShareID), which uniquely identifies the allocation on a device.
- The consumed capacity. This consumed capacity is tracked as part of the device’s allocatingCapacity, allowing it to be included in remaining capacity calculations for future allocations within the same call.
Finally, the share identifiers and consumed capacities from all internal results are propagated to the DeviceRequestAllocationResult.
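
The capacity bookkeeping in the steps above can be sketched as follows. This is a simplified model, not the allocator's implementation: quantities are int64 values, already allocated and currently allocating amounts are folded into one tracker, and the names deviceID, allocationResult, tracker, add, and fits are hypothetical (the sketch does not reproduce the signature of CmpRequestOverCapacity).

```go
package main

import "fmt"

// deviceID identifies a device across drivers and pools.
type deviceID struct{ driver, pool, device string }

// allocationResult is a reduced view of DeviceRequestAllocationResult.
type allocationResult struct {
	device   deviceID
	consumed map[string]int64 // consumedCapacity, already policy-adjusted
}

// tracker sums consumed capacity per device and capacity name.
type tracker map[deviceID]map[string]int64

// add records one allocation, whether it comes from an existing claim
// or from a request being allocated in the current call.
func (t tracker) add(r allocationResult) {
	if t[r.device] == nil {
		t[r.device] = map[string]int64{}
	}
	for name, q := range r.consumed {
		t[r.device][name] += q
	}
}

// fits reports whether want can still be drawn from the device's capacity.
func (t tracker) fits(dev deviceID, capacity, want map[string]int64) bool {
	for name, q := range want {
		total, ok := capacity[name]
		if !ok || t[dev][name]+q > total {
			return false
		}
	}
	return true
}

func main() {
	eth1 := deviceID{"cni.example.com", "node-a", "eth1"}
	capacity := map[string]int64{"bandwidth": 10}

	used := tracker{}
	used.add(allocationResult{eth1, map[string]int64{"bandwidth": 6}}) // existing claim

	for _, want := range []int64{3, 2} {
		req := map[string]int64{"bandwidth": want}
		ok := used.fits(eth1, capacity, req)
		if ok {
			used.add(allocationResult{eth1, req}) // visible to later requests
		}
		fmt.Println(want, ok)
	}
}
```

Recording a successful allocation immediately is what lets a later request in the same call see the reduced remaining capacity.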

Handling Device Updates for AllowMultipleAllocations and RequestPolicy

If a device is updated from dedicated (allowMultipleAllocations: false) to multi-allocatable (allowMultipleAllocations: true), it must continue to behave as a dedicated device and not allow sharing until all existing resource claims for that device are released.
If a device is updated from multi-allocatable to dedicated, it should no longer be available for new allocations. However, already allocated devices should not be deallocated.
If the request policy is later set, updated, or unset, the change will apply only to future allocations. No rollback or changes will be applied to shared devices that have already been allocated.
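
One way to read the first two rules is as a gate that runs before the capacity check. The sketch below is an interpretation, not the scheduler's code: it assumes that an allocation made while the device was dedicated carries no share ID, and that a device with no existing allocations is always open. The allocation type and openForNewAllocation are hypothetical names.

```go
package main

import "fmt"

// allocation is a reduced view of an existing allocation result.
type allocation struct {
	shareID string // empty when the device was allocated as a dedicated device
}

// openForNewAllocation reports whether a device may receive another
// allocation, given its current property and its existing allocations.
func openForNewAllocation(allowMultiple bool, existing []allocation) bool {
	if len(existing) == 0 {
		return true // nothing allocated yet
	}
	if !allowMultiple {
		return false // dedicated now: keep existing allocations, accept no new ones
	}
	for _, a := range existing {
		if a.shareID == "" {
			return false // still held by a claim from its dedicated period
		}
	}
	return true // shareable; the capacity check still has to pass
}

func main() {
	fmt.Println(openForNewAllocation(true, []allocation{{shareID: ""}}))
	fmt.Println(openForNewAllocation(true, []allocation{{shareID: "a671734a"}}))
	fmt.Println(openForNewAllocation(false, []allocation{{shareID: "a671734a"}}))
}
```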

Examples

DeviceClass’s selector

selectors:
  - cel:
      expression: |-
        device.allowMultipleAllocations == true

ResourceClaim with capacity requirement

kind: ResourceSlice
...
spec:
  driver: guaranteed-cni.dra.networking.x-k8s.io
  devices:
  - name: eth1
    basic:
      allowMultipleAllocations: true
      attributes:
        name:
          string: "eth1"
      capacity:
        bandwidth:
          requestPolicy:
            default: "1Mi"
            validRange:
              min: "1Mi"
              step: "8"
          value: "10Gi"

ResourceClaim’s request

kind: ResourceClaim
...
spec:
  devices:
    requests: # for devices
    - name: nic
      exactly:
        deviceClassName: qos-aware-shared.device.x-k8s.io
        capacity:
          requests: # for resources which must be provided by those devices
            bandwidth: 5Gi

ResourceClaim’s status

kind: ResourceClaim
...
status:
  allocation:
    devices:
      results:
      - consumedCapacity:
          bandwidth: 5Gi
        device: eth1
        shareID: "a671734a-e8e5-11e4-8fde-42010af09327"
        ...
  devices:
    - data:
        cniVersion: 1.1.0
        ips:
        - address: 10.0.103.49/16
      device: eth1
      shareID: "a671734a-e8e5-11e4-8fde-42010af09327"
      ...

ResourceClaim with distinctAttribute

kind: ResourceClaim
...
spec:
  devices:
    requests:
    - name: macvlan-1
      exactly:
        deviceClassName: simple-multialloc.networking.x-k8s.io
        allocationMode: ExactCount
        count: 1
    - name: macvlan-2
      exactly:
        deviceClassName: simple-multialloc.networking.x-k8s.io
        allocationMode: ExactCount
        count: 1
    constraints:
    - requests:
      - macvlan-1
      - macvlan-2
      distinctAttribute: interfaceName

Test Plan

[x] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.

Prerequisite testing updates

Unit tests

API Validations

Sharing Policy (Device Capacity Test)

Default value is required.
The default must be included in the options for validValues, or fall within the specified validRange.
validValues and validRange must not be defined at the same time within a single request policy.
validValues must be a list of unique values.
validValues must be in ascending order.
The validValues size should be kept within 10 to avoid excessive growth.
The minimum must be less than or equal to the maximum in the validRange.
If a step is defined, both the default and the maximum must be multiples of the step.
The minimum, maximum, and (minimum + step) must each be less than or equal to the capacity value.
If AllowMultipleAllocations of the device is not set or set to false, RequestPolicy for any of its capacity must not be defined.
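
The validation rules listed above could be exercised against a checker like the following. This is a sketch only: it uses int64 quantities, hypothetical type and function names, and it assumes that a step must be positive and that the capacity bounds are inclusive, as the API comments state.

```go
package main

import (
	"errors"
	"fmt"
)

type validRange struct {
	min       int64
	max, step *int64
}

type requestPolicy struct {
	def         *int64
	validValues []int64
	validRange  *validRange
}

// validatePolicy checks one capacity's request policy against the listed rules.
func validatePolicy(allowMultiple bool, capacityValue int64, p *requestPolicy) error {
	if p == nil {
		return nil
	}
	if !allowMultiple {
		return errors.New("requestPolicy requires allowMultipleAllocations to be true")
	}
	if p.def == nil {
		return errors.New("default is required")
	}
	if len(p.validValues) > 0 && p.validRange != nil {
		return errors.New("validValues and validRange are mutually exclusive")
	}
	if len(p.validValues) > 10 {
		return errors.New("validValues must not contain more than 10 entries")
	}
	defaultListed := len(p.validValues) == 0
	for i, v := range p.validValues {
		if i > 0 && v <= p.validValues[i-1] {
			return errors.New("validValues must be unique and ascending")
		}
		defaultListed = defaultListed || v == *p.def
	}
	if !defaultListed {
		return errors.New("default must be one of validValues")
	}
	if r := p.validRange; r != nil {
		if r.min < 0 || r.min > capacityValue || *p.def < r.min {
			return errors.New("need 0 <= min <= capacity and min <= default")
		}
		if r.max != nil && (r.min > *r.max || *p.def > *r.max || *r.max > capacityValue) {
			return errors.New("need min <= default <= max <= capacity")
		}
		if s := r.step; s != nil {
			// Assumption: a step must be positive.
			if *s <= 0 || (*p.def)%(*s) != 0 || (r.max != nil && (*r.max)%(*s) != 0) {
				return errors.New("default and max must be multiples of a positive step")
			}
			if r.min+*s > capacityValue {
				return errors.New("min + step must not exceed the capacity value")
			}
		}
	}
	return nil
}

func main() {
	def, step := int64(1<<20), int64(8)
	p := &requestPolicy{def: &def, validRange: &validRange{min: 1 << 20, step: &step}}
	fmt.Println(validatePolicy(true, 10<<30, p))  // 1Mi default, step 8, 10Gi capacity
	fmt.Println(validatePolicy(false, 10<<30, p)) // rejected: device is dedicated
}
```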

Distinct Attribute

Similar to the matchAttribute, check for a missing domain and required name (invalid request).
If the feature gate is enabled, exactly one of matchAttribute or distinctAttribute must be provided.
If the feature gate is disabled, matchAttribute is required.

Share ID

When this feature gate and DRAResourceClaimDeviceStatus (KEP 4817) are enabled, the combined keys of driver, pool, device, and shareID in Status.Devices must map one-to-one to those keys in Status.Allocation.Devices.
Must be a valid UID
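
The key-matching rule can be sketched as below. It implements one reading of the rule, namely that every status entry must have a unique key that refers to an allocation result; whether every allocation result must also have a status entry is left open here. The shareKey type and statusMatchesAllocation are hypothetical names.

```go
package main

import "fmt"

// shareKey is the combined key of one allocated share.
type shareKey struct{ driver, pool, device, shareID string }

// statusMatchesAllocation reports whether each status key is unique
// and points at an existing allocation result.
func statusMatchesAllocation(results, statuses []shareKey) bool {
	allocated := make(map[shareKey]bool, len(results))
	for _, k := range results {
		allocated[k] = true
	}
	seen := make(map[shareKey]bool, len(statuses))
	for _, k := range statuses {
		if !allocated[k] || seen[k] {
			return false
		}
		seen[k] = true
	}
	return true
}

func main() {
	share := shareKey{"cni.example.com", "node-a", "eth1", "a671734a-e8e5-11e4-8fde-42010af09327"}
	other := shareKey{"cni.example.com", "node-a", "eth1", "unknown"}
	fmt.Println(statusMatchesAllocation([]shareKey{share}, []shareKey{share}))
	fmt.Println(statusMatchesAllocation([]shareKey{share}, []shareKey{other}))
}
```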

Create Strategy

Keep fields if the feature is enabled for ResourceSlice, ResourceClaim, and ResourceClaimTemplate
Drop fields if the feature is disabled for ResourceSlice, ResourceClaim, and ResourceClaimTemplate

Update Strategy

Keep existing fields if the feature is enabled for ResourceSlice, ResourceClaim, and ResourceClaimTemplate
Keep existing fields of ResourceSlice if any feature field is set in the old ResourceSlice.
Keep existing fields of ResourceClaim and ResourceClaimTemplate if any feature field is set in the old ResourceClaim or ResourceClaimTemplate.
Keep existing fields of ResourceClaim if any feature field is set in the old ResourceClaim.Status.
Drop fields if the feature is disabled and the fields have not been used by the old resource as described above.
- If the DRAPrioritizedList is enabled, the Capacity of DeviceSubRequest in FirstAvailable must be dropped as well.
The same strategy for ResourceClaim must be followed regardless of whether the DRAResourceClaimDeviceStatus feature is enabled.

Allocator

Allow Multiple Allocations

can allocate a device which allows multiple allocations multiple times
must not allocate a device which does not allow multiple allocations more than once
can exclude dedicated device from allocation with CEL
can limit allocation to multi-allocatable device with CEL
can work with DRAPartitionableDevices feature.

Consumable Capacity

can gather consumed capacity from allocated resource claims
can add/remove consumed capacity of allocating devices
can round up the user-requested capacity according to the request policy's range and step
requested capacity for non-consumable capacity acts like a >= filter
can work with DRAPrioritizedList feature’s subrequests.

Distinct Attribute

can prevent allocating the same device in the same request with a distinct constraint
can allocate different devices in the same request with a distinct constraint
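
A distinct constraint can be pictured as a uniqueness check on one attribute across the devices chosen for a request. The sketch below is illustrative only: the device dicts, the attribute name, and the function `satisfies_distinct` are hypothetical, and treating a device without the attribute as failing the constraint is an assumption.

```python
def satisfies_distinct(devices, attribute):
    """True if every device has the attribute and no two devices share a value."""
    seen = set()
    for device in devices:
        value = device.get("attributes", {}).get(attribute)
        # Assumption: a missing attribute cannot satisfy the constraint.
        if value is None or value in seen:
            return False
        seen.add(value)
    return True
```

Two shares of the same multi-allocatable device carry the same attribute value, so a candidate set containing both fails the check, while shares from two different devices pass.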

Coverage

k8s.io/dynamic-resource-allocation/structured/internal/experimental: 4/2/2026 - 93.1
k8s.io/dynamic-resource-allocation/structured/internal/incubating: 4/2/2026 - 93.1
k8s.io/kubernetes/pkg/apis/resource/validation: 4/2/2026 - 96.8
k8s.io/kubernetes/pkg/registry/resource/resourceclaimtemplate: 4/2/2026 - 72.0
k8s.io/kubernetes/pkg/registry/resource/resourceclaim: 4/2/2026 - 87.1
k8s.io/kubernetes/pkg/registry/resource/resourceslice: 4/2/2026 - 77.7
k8s.io/kubernetes/pkg/kubelet/cm/dra: 4/2/2026 - 83.5
k8s.io/kubernetes/pkg/kubelet/cm/dra/plugin: 4/2/2026 - 83.5
k8s.io/kubernetes/pkg/kubelet/cm/dra/state: 4/2/2026 - 44.2

Integration tests

The existing integration tests for kube-scheduler which measure performance will be extended to cover the overhead of running the additional logic to support the features in this KEP.

We extend the test for creating large ResourceSlices to ensure that a ResourceSlice using the new fields satisfies the etcd limits.

e2e tests

We extend the DRA test driver to support this feature and add tests to ensure that claims using it are handled by the scheduler as described in this KEP.

The following functionalities should be covered in E2E tests:

ResourceSlice creation: The ResourceSlice must be created successfully with AllowMultipleAllocations and a RequestPolicy.
Pod Scheduling with Available Capacity: A Pod with a resource claim must run successfully when the requested capacity is available.
Capacity Enforcement: A Pod must stay in Pending state if it requests more than the remaining capacity, even if the request is less than the total capacity.
Capacity Release and Re-Scheduling: When a Pod is deleted, its reserved capacity must be released, and any pending Pod with a satisfied request must start running.

Graduation Criteria

Alpha

Feature implemented behind feature gates (DRAConsumableCapacity). Feature Gates are disabled by default.
Documentation provided
Initial unit, integration and e2e tests completed and enabled.

Beta

Feature Gates are enabled by default.
No major outstanding bugs.
2 examples of real-world use cases.
- CNI DRA driver (kubernetes-sigs/cni-dra-driver) can use this feature to manage and limit bandwidth quota.
- DRA Driver for CPU (kubernetes-sigs/dra-driver-cpu) can use this feature to manage and limit CPU resources.
Feedback collected from the community (developers and users) with adjustments provided, implemented and tested.

GA

Available for general testing via the DRA Example Driver (kubernetes-sigs/dra-example-driver)
2 examples of real-world use cases.
- DRA driver for multi-network can use this feature to manage and limit bandwidth quota.
- Accelerator DRA driver can use this feature for on-demand virtual memory allocation.
Allowing time for feedback from developers and users.
Concrete evaluation of scheduling performance metrics, addressing: https://github.com/kubernetes/kubernetes/pull/138618

Upgrade / Downgrade Strategy

In the context of this enhancement, the following strategy is proposed:

All introduced fields are optional and can be omitted if empty. This means that during the upgrade or downgrade process, if certain fields or configurations are not required, they can be left out without causing issues or disrupting the upgrade process.
The general upgrade and downgrade processes will follow the DRA strategy.
The upgrade and downgrade processes of shareID will follow the optional map keys strategy.
The upgrade and downgrade processes of allowMultipleAllocations CEL will follow the VersionOptions method.

Version Skew Strategy

During version skew, where the API server supports the feature but the scheduler does not, the scheduler will return an error if the ResourceClaim contains Capacity, to prevent allocating devices that don’t meet the user's requests. If there is no Capacity, the scheduler continues scheduling and ignores the allowMultipleAllocations and requestPolicy fields in ResourceSlice.

If the feature is enabled on the scheduler but disabled on the API server, the scheduler can continue scheduling as-is without the feature fields.
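
The skew behaviour on the scheduler side can be sketched as two small helpers. This is an illustration only: the flat dict shapes, `FeatureDisabledError`, `check_claim` and `device_view` are hypothetical names, and only the message text is taken from the scheduler log quoted in the Troubleshooting section.

```python
CAPACITY_DISABLED_MSG = (
    "has capacity requests, but the DRAConsumableCapacity feature is disabled"
)


class FeatureDisabledError(Exception):
    """Raised when a claim needs the feature but the scheduler has it off."""


def check_claim(requests, scheduler_feature_enabled):
    """Reject claims with capacity requests when the scheduler lacks the feature."""
    if scheduler_feature_enabled:
        return
    for request in requests:
        if request.get("capacity"):
            raise FeatureDisabledError(f"request {request['name']}: {CAPACITY_DISABLED_MSG}")


def device_view(device, scheduler_feature_enabled):
    """Return the device as the scheduler sees it; gated fields are ignored when off."""
    if scheduler_feature_enabled:
        return dict(device)
    ignored = ("allowMultipleAllocations", "requestPolicy")
    return {k: v for k, v in device.items() if k not in ignored}
```

Failing the claim outright, rather than ignoring its capacity request, avoids handing out a device that cannot honour the requested amount.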

Production Readiness Review Questionnaire

Feature Enablement and Rollback

How can this feature be enabled / disabled in a live cluster?

Feature gate (also fill in values in kep.yaml)
- Feature gate name: DRAConsumableCapacity
- Components depending on the feature gate:
  - kube-scheduler
  - kubelet
  - kube-apiserver
Other
- Describe the mechanism:
- Will enabling / disabling the feature require downtime of the control plane?
- Will enabling / disabling the feature require downtime or reprovisioning of a node?

Does enabling the feature change any default behavior?

Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

Yes, this feature can be disabled once it has been enabled. The AllowMultipleAllocations flag, RequestPolicy and Capacity fields will be dropped. However, the ShareID, ConsumedCapacity, and renamed device (<device id>/<share id>) in device status need to remain to keep the existing allocation result reference valid.

What happens if we reenable the feature if it was previously rolled back?

The fields will be available again for read and write. However, the previously dropped RequestPolicy, Capacity, and ConsumedCapacity will be missing.

Are there any tests for feature enablement/disablement?

The enablement and disablement of this feature are tested as part of the integration tests. Additionally, the feature enablement/disablement tests cover the scenario where the feature gate is switched from enabled to disabled after an allocation has already been made. In this case, the existing resource claim should remain valid, but the remaining device capacity must no longer be multi-allocatable.

Rollout, Upgrade and Rollback Planning

How can a rollout or rollback fail? Can it impact already running workloads?

Enabling the feature gate will enable the field to be written and therefore invoke validation of the field.
Disabling the feature gate removes the ability to consume capacity during scheduling, so ConsumedCapacity in the allocation result is dropped as well. If an external party relies on this field to manage QoS-aware devices, it may fail unless it handles the missing field.
Disabling the feature gate is equivalent to unsetting AllowMultipleAllocations and RequestPolicy; the scheduler handles this as described in this previous section.

What specific metrics should inform a rollback?

A rollback should be considered when scheduler_unschedulable_pods{plugin="DynamicResources"} grows unexpectedly or when scheduler_plugin_execution_duration_seconds{plugin="DynamicResources"} in the kube-scheduler suddenly increases.

Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

The manual test was performed on a local Kind cluster by manually disabling and enabling the feature gate for all control plane components on the Kind node.

When this feature is enabled, a ResourceClaim with the added fields can be deployed and the driver can advertise 10G of bandwidth. A workload that requests 5G can request capacity from devices that allow multiple allocations, and the consumed capacity is updated in ResourceClaim.Status.

When the feature is disabled, existing workloads continue running, and there is no change to ResourceClaim, including the status of consumed capacity. However, new workloads, requesting 2G, that include a capacity request are rejected and remain in a pending state. Additionally, the fields added by this feature are removed when applying a new ResourceClaimTemplate.

When the feature is re-enabled, new ResourceClaimTemplates can be created with the added fields. The scheduler properly prevents over-provisioning of capacity when another workload requesting 8G is deployed, while allowing a workload that requests only 2G to run with its consumed capacity tracked in ResourceClaim.Status, as intended by this feature, without impacting the first workload that was already running.
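
The numbers in this manual test can be replayed with a toy capacity tracker. This is a minimal sketch of the accounting idea, not the scheduler's allocator; the `SharedDevice` class and the claim names are hypothetical.

```python
class SharedDevice:
    """Toy model of a multi-allocatable device with one consumable capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.consumed = {}  # claim name -> consumed amount

    def remaining(self):
        return self.capacity - sum(self.consumed.values())

    def allocate(self, claim, amount):
        """Reserve capacity for a claim; return False if it does not fit."""
        if amount > self.remaining():
            return False
        self.consumed[claim] = amount
        return True

    def release(self, claim):
        """Return a claim's capacity to the device when the claim goes away."""
        self.consumed.pop(claim, None)


device = SharedDevice(capacity=10)      # driver advertises 10G
first = device.allocate("first", 5)     # 5G workload fits
second = device.allocate("second", 8)   # 8G exceeds the remaining 5G
third = device.allocate("third", 2)     # 2G fits next to the first workload
```

After these calls the 8G request is the only one refused, and 3G remains available until one of the running workloads releases its share.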

Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

Monitoring Requirements

How can an operator determine if the feature is in use by workloads?

Check the allowMultipleAllocations flag in the resource slice.

How can someone using this feature know that it is working for their instance?

Events
- Event Reason:
API .status
- Condition name:
- Other field: ResourceClaim.Status.Allocation.Devices.Results[].ShareID
Other (treat as last resort)
- Details:

What are the reasonable SLOs (Service Level Objectives) for the enhancement?

Existing DRA and related SLOs continue to apply.

What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

Metrics
- Metric names:
  - apiserver_request with resource="resourceclaims"
  - scheduler_unschedulable_pods with plugin="DynamicResources"
  - scheduler_plugin_execution_duration_seconds with plugin="DynamicResources"
    - For state gathering, extension_point="PreFilter"
    - For allocation, extension_point="Filter"
    - For status update, extension_point="PostFilter"
- [Optional] Aggregation method:
- Components exposing the metric: kube-apiserver, kube-scheduler
Other (treat as last resort)
- Details:

Are there any missing metrics that would be useful to have to improve observability of this feature?

No.

Dependencies

Does this feature depend on any specific services running in the cluster?

This feature depends on the DRA structured parameters feature being enabled, and on DRA drivers that support the feature being deployed. This feature also works with DRA device status feature if it is enabled.

Scalability

Will enabling / using this feature result in any new API calls?

No.

Will enabling / using this feature result in introducing new API types?

The CapacityRequestPolicy and CapacityRequirements structs will be added to DeviceCapacity in ResourceSlice and to DeviceSubRequest/ExactDeviceRequest in ResourceClaim, respectively.

Will enabling / using this feature result in any new calls to the cloud provider?

No.

Will enabling / using this feature result in increasing size or count of the existing API objects?

Yes, when using this field, the user will add additional data in their ResourceSlice, ResourceClaim and ResourceClaimTemplate objects. This is an incremental increase on top of the existing structures.

Estimated increase in size:

~ 10 bytes of boolean pointer per device
~ 200-1100 bytes per request policy (max 10 options)
~ 100 bytes per capacity per request and allocation result (ResourceSliceMaxAttributesAndCapacitiesPerDevice=32)
~ 40 bytes of share ID per resource allocation
7 bytes extended name in device name if the device status feature is enabled

Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

Scheduling a claim that uses this feature may take slightly longer if the consumed capacity must be aggregated before a suitable device can be found. We will measure this in the beta timeframe.

Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, …) in any components?

No.

Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?

No.

Troubleshooting

The troubleshooting section in https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4381-dra-structured-parameters#troubleshooting still applies. The only additional failure modes come from version skew in the cluster, and the troubleshooting steps provided through the link above should be sufficient to determine the cause.

How does this feature react if the API server and/or etcd is unavailable?

See https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4381-dra-structured-parameters#how-does-this-feature-react-if-the-api-server-andor-etcd-is-unavailable .

What are other known failure modes?

See https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4381-dra-structured-parameters#what-are-other-known-failure-modes .

kube-scheduler cannot allocate ResourceClaims.
The shared device may not have sufficient capacity to satisfy the request. The log message Device capacity not enough and the capacities field in the log Allocating one device can provide further clues for investigation (requires -v=7 on kube-scheduler).

If the feature is disabled but a ResourceClaim still requests capacity, the scheduler log will report: has capacity requests, but the DRAConsumableCapacity feature is disabled. Nevertheless, when using the allocator in stable mode, no logs related to the DRAConsumableCapacity feature will be emitted.

What steps should be taken if SLOs are not being met to determine the problem?

N/A

Implementation History

Alpha 1.34:

Initial implementation merged on 2025-07-30

Alpha 1.35:

Fix scheduler perf test (simplified) merged on 2025-09-10
Fix 133705 - failed to schedule next device PR 133706 has been pushed on 2025-08-26
Fix 134100 - integration with partitionable device PR 134103 has been pushed on 2025-09-17
Fix 134519 - add ShareID to kubelet plugin API PR 134520 has been pushed on 2025-10-10
Increase test coverage PR 134615 has been pushed on 2025-10-15

Beta 1.36:

Fix 136734 - missing GetSharedDeviceIDs bug in GatherAllocatedState has been pushed on 2025-02-04
Promote DRAConsumableCapacity to Beta PR 136611 has been pushed on 2026-01-29

Drawbacks

This adds complexity to the scheduler.

Alternatives

Identifying multi-allocatable Property of Device

Current Approach:

Use a boolean to indicate whether a device can be shared among multiple resource claims (or requests).

Pros:

Simple

Cons:

Implicit infinite sharing if no consumable capacity is defined

Alternatives:

Use an enum, such as Allocatable, with defined values like:

AllocatableOnce — device can only be allocated once
AllocatableMultipleTimes — device can be allocated multiple times
Pros:
- Provides flexibility for future extension according to Kubernetes API conventions.
Cons:
- Increases the program’s memory footprint compared to a boolean when there is only a single binary option to serve the purpose.

Use a count field to specify how many times a device can be reallocated to different resource requests.
Pros:
- Simple.
- No implicit infinite sharing.
Cons:
- Not equivalent to the legacy CNI, which places no limit on the number of master devices, as long as the Pod can be successfully created.

Selecting/Deselecting multi-allocatable Devices

Current Approach:

Extend the CEL selector to recognize device.shareable for filtering multi-allocatable devices.

Alternative:

Introduce explicit flags in the resource request:

AllowShared: Opt-in to allow multi-allocatable devices.
RequireShared: Only allow multi-allocatable devices.

(Default: multi-allocatable devices are excluded unless explicitly allowed.)

Pros:

Does not affect dedicated device selection.
Easier for users to understand and configure, reducing the risk of mistakes.
More user-friendly than writing CEL expressions manually.

Cons:

Adds complexity to the allocation logic for multi-allocatable devices.
Introduces an additional field in resource requests.
May require an abstraction layer if more device features are added in the future.
Less explicit and expressive than CEL for advanced use cases.

Preventing Same multi-allocatable Device from Being Allocated Multiple Times in the Same Claim

Current Approach:

Introduce a new API-level constraint: DistinctAttribute, ensuring devices in a single claim have unique attribute values.

Alternative:

The scheduler enforces this behavior implicitly—never allocate the same multi-allocatable device multiple times to the same resource claim.

Pros:

Avoids any API changes—logic handled internally.

Cons:

Doesn’t support cases where a pod legitimately permits multiple fractions of capacity from the same multi-allocatable device. For example, when a pod uses two vGPUs for parallel processing, it may not require them to come from different devices. It can accept allocations from either the same or different multi-allocatable devices.
Not configurable—users can’t override this behavior when needed.

Identifying shared device in the device status

When the same device is allocated multiple times to different requests, ShareID is required to differentiate between the allocations. This is especially useful when the allocations have different statuses set in AllocatedDeviceStatus.

ShareID, which is intended to serve as part of a composite key for identifying devices, cannot be added to the map keys of the listType because fields used as map keys must be required. Making ShareID required is not an option, as the API feature gate should not introduce required fields.

Current Approach:

Update api-machinery to support adding new keys to listType=map - GitHub Issue

Alternative:

Append /<share id> to the Device name and work around this in the validation function.

Defining a valid range for RequestPolicy

Current Approach:

Newly define minimum, maximum, and chunk size and implement a function to validate the value in range.

Alternative:

The range can be defined and validated using a LimitRange match. Enforcing min/max/default values via LimitRange is a generally useful mechanism.

Alternative words

Device.AllowMultipleAllocations: The alternative names proposed were: Shareable, Shared, and AllowShared.
Step: The alternative names proposed were: ChunkSize, StepSize, UnitSize.
DeviceRequest.Capacity: The alternative names proposed were: Capacities, Capacity.
DeviceRequest.Capacity.Requests: The alternative names proposed were: Required, Reservation, Consumption, Minimum and Min.
Requests was initially dropped because it is already used in the DRA API for device requests. Minimum was selected as an alternative because the actual consumed capacity can be rounded up based on the request policy, for example to match a defined chunk size or meet a minimum requirement. However, Requests was reselected during API review because it is more aligned with the container spec and matches the semantic definition used elsewhere in the API (minimum guaranteed, must be satisfied). This requires a clear description to distinguish between requests for devices and requests for resources that must be provided by those devices.

Future Possibilities

RequestPolicy

The allocation strategy can be introduced for each capacity attribute defined in the RequestPolicy. For example, a strategy field could be added to explicitly define the scheduling behavior for a specific capacity:
```
requestPolicy:
  strategy: ...
```
For example,
- AlwaysConsumed: The default behavior. A predefined default value is always applied if no capacity is explicitly requested.
- ConsumedOrNever: If the first consumer specifies a capacity request, that capacity becomes consumable. If not, it remains non-consumable until the first consumer releases it.
- BlockOrShare: The inverse of ConsumedOrNever. If the first consumer requests no capacity, it consumes the entire device (i.e., full capacity). If it does specify a capacity request, the device remains multi-allocatable up to the guaranteed amount.
The current default behavior is AlwaysConsumed.
A common RequestPolicy can be defined in the Device struct (similar to mixins) and referenced using new fields named RequestPolicyRef or RequestPolicyName, which are mutually exclusive with the Default field, as discussed in this comment’s thread.
Defining an infinite requestPolicy, zeroConsumption, as mutually exclusive with the other valid-value policies. This flag is equivalent to {default: 0, validValues{{0}}}. If the request doesn’t contain this capacity entry, the zero value is used and Default must not be defined. See this comment for further discussion.

CapacityRequirements

Limits field to describe burstable consumption. Handling burstability would be the responsibility of the individual device driver, similar to how the CPU manager handles CPU burst behavior.
