KEP-5234: DRA: ResourceSlice Mixins
- Release Signoff Checklist
- Summary
- Motivation
- Proposal
- Design Details
- Production Readiness Review Questionnaire
- Implementation History
- Drawbacks
- Alternatives
- Infrastructure Needed (Optional)
Release Signoff Checklist
Items marked with (R) are required prior to targeting to a milestone / release.
- (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
- (R) KEP approvers have approved the KEP status as implementable
- (R) Design details are appropriately documented
- (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- e2e Tests for all Beta API Operations (endpoints)
- (R) Ensure GA e2e tests meet requirements for Conformance Tests
- (R) Minimum Two Week Window for GA e2e tests to prove flake free
- (R) Graduation criteria is in place
- (R) all GA Endpoints must be hit by Conformance Tests
- (R) Production readiness review completed
- (R) Production readiness review approved
- “Implementation History” section is up-to-date for milestone
- User-facing documentation has been created in kubernetes/website , for publication to kubernetes.io
- Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
Summary
With Dynamic Resource Allocation (DRA), DRA drivers publish information about the devices that they manage in ResourceSlices. This information is used by the scheduler when selecting devices for user requests in ResourceClaims.
With this KEP, DRA drivers can define metadata in mixins separately from specific devices and include them in a device by reference. This reduces the duplication in ResourceSlices and allows for more compact device definitions. Mixins can also be used in counter sets.
Motivation
DRA requires that drivers publish all available devices on a node/cluster in
ResourceSlice objects. There are scenarios where the number of devices
can be pretty large and each device might have a relatively large amount
of metadata associated with it, primarily in the form of attributes and
capacity. This has a few consequences:
- Several of the devices might have similar metadata, resulting in a lot of duplication between the published devices.
- The size of the data required to specify each device reduces the number of devices that can be defined in a single ResourceSlice without hitting the limit on the total size of objects in Kubernetes (1.5MB by default, but can be changed).
The latter can be addressed by splitting the devices across multiple
ResourceSlices within a single pool, but that isn’t always an option.
In particular, DRA currently doesn’t allow sharing counters across ResourceSlices,
meaning that the number of devices that can fit into a single ResourceSlice
also limits the number of partitionable devices for a single physical device.
Goals
- Enable a more compact way to define devices in ResourceSlices so duplication can be reduced and a larger number of devices can be published within a single ResourceSlice.
- Enable defining counter sets with more counters and devices with more consumed counters.
Non-Goals
- Developing a kubectl command or plugin that lets users see the flattened device definitions is not planned for alpha. Mixins do make it harder to find the full definition of a specific device, so this might be added to the scope for beta or GA.
- Enable devices to have more than 32 attributes and capacities. Increasing this would have implications for the CEL cost functions, so we are not looking to increase the limits as part of this KEP.
Proposal
The proposal has two parts: the definition of mixins and the mechanism for referencing mixins from devices and counter sets.
A new Mixins field will be added to the ResourceSliceSpec as an
optional field of type ResourceSliceMixins. It will have three properties, one for each of the three
types of mixins that will be supported:
- The CounterSet field defines a list of named CounterSetMixins. These define counters that can be used to extend the counters explicitly defined in a CounterSet. This allows for reduced duplication if there are many identical physical devices that must be represented as CounterSets. CounterSetMixins cannot be referenced directly by devices.
- The Device field is a list of named DeviceMixins. These define attributes and capacities that can be used to extend what is defined explicitly in a Device. DeviceMixins cannot be allocated directly, but can only be referenced by devices.
- The DeviceCounterConsumption field defines a list of named DeviceCounterConsumptionMixins. These define the pattern of consumption of counters, distinct from the specific underlying counter set from which they are being consumed. The CounterSet from which the counters will be consumed is not specified in the DeviceCounterConsumptionMixin, but rather provided when the mixin is referenced from the device.
The mixins are referenced using the same pattern in all three places. The field
is named Includes and will contain a list of references to the mixins. The mixins
are applied in the order listed, meaning that later mixins will overwrite earlier
ones in case of conflicts. Properties set directly on the CounterSet, Device or
DeviceCounterConsumption will always override mixins.
- The Includes field on CounterSet is a list of references to mixins defined in the CounterSet field on the ResourceSliceMixins.
- The Includes field on Device is a list of references to mixins defined in the Device field on the ResourceSliceMixins.
- The Includes field on DeviceCounterConsumption is a list of references to mixins defined in the DeviceCounterConsumption field on the ResourceSliceMixins.
With these changes, attributes, capacity, and counters that are shared across devices or counter sets can be split out into mixins, thereby reducing duplication and reducing the size of the ResourceSlice object.
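The override order described above (mixins applied in the order listed, then values set directly on the object) can be sketched in Go. This is a minimal illustration under stated assumptions, not the actual implementation: the simplified types `deviceMixin` and `device`, the helper `flattenAttributes`, and the attribute values are all hypothetical, and attributes are reduced to plain strings.

```go
package main

import "fmt"

// deviceMixin is a simplified, hypothetical stand-in for DeviceMixin.
// Attribute values are plain strings here for brevity.
type deviceMixin struct {
	Name       string
	Attributes map[string]string
}

// device is a simplified, hypothetical stand-in for Device.
type device struct {
	Name       string
	Includes   []string
	Attributes map[string]string
}

// flattenAttributes applies the included mixins in the order listed, so
// later mixins overwrite earlier ones, and then applies the attributes set
// directly on the device, which always win.
func flattenAttributes(d device, mixins map[string]deviceMixin) (map[string]string, error) {
	out := map[string]string{}
	for _, ref := range d.Includes {
		m, ok := mixins[ref]
		if !ok {
			return nil, fmt.Errorf("device %q includes unknown mixin %q", d.Name, ref)
		}
		for k, v := range m.Attributes {
			out[k] = v
		}
	}
	for k, v := range d.Attributes {
		out[k] = v
	}
	return out, nil
}

func main() {
	mixins := map[string]deviceMixin{
		"base": {Name: "base", Attributes: map[string]string{"model": "model-x", "memory": "40Gi"}},
		"big":  {Name: "big", Attributes: map[string]string{"memory": "80Gi"}},
	}
	d := device{
		Name:       "gpu-0",
		Includes:   []string{"base", "big"},
		Attributes: map[string]string{"uuid": "gpu-0-uuid"},
	}
	flat, err := flattenAttributes(d, mixins)
	if err != nil {
		panic(err)
	}
	fmt.Println(flat["model"], flat["memory"], flat["uuid"])
}
```

In this sketch `memory` is taken from `big` because it is listed after `base`, while `uuid` comes from the device itself. The same pattern would apply to capacities and counters.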
Risks and Mitigations
This change doesn’t really affect the functionality of DRA; it just provides a more compact way to define devices in ResourceSlices. But it still has some consequences worth pointing out here.
ResourceSlice resources will be harder to understand
The biggest challenge with this change is that it adds a level of
indirection for the Device and CounterSet definitions, meaning
that it gets harder to understand the ResourceSlice objects.
We have discussed adding a kubectl command or a plugin that will allow users to see the fully flattened versions of a ResourceSlice. But this is not in scope for alpha.
Flattening of ResourceSlices might be needed by all tools using the API
Any tool that needs to understand the full device definition will need to flatten the ResourceSlice. This can lead to duplicate effort across different tools and potential for implementations that differ in meaningful ways. This can be addressed by providing reusable libraries that can be leveraged by other tools.
Mixins and more counters might worsen worst-case scheduling
This will not negatively affect the scheduling performance of existing ResourceSlice definitions, but DRA driver authors taking advantage of mixins should be made aware of possible performance effects due to this increased referential complexity.
This also demonstrates that DRA driver authors should consider performance when they write drivers and decide how to structure devices into ResourceSlices and pools. Information and best-practices about how to write drivers are available in https://github.com/kubernetes-sigs/dra-example-driver and this will also include information about performance and scalability.
Design Details
API
The exact set of proposed API changes can be seen below (... is used in places where new fields
are added to existing types):
// ResourceSliceSpec contains the information published by the driver in one ResourceSlice.
type ResourceSliceSpec struct {
...
// Mixins defines the mixins available for devices and counter sets
// in the ResourceSlice.
//
// +featureGate=DRAResourceSliceMixins
// +optional
Mixins *ResourceSliceMixins
}
type CounterSet struct {
...
// Includes defines a list of references to CounterSetMixin.
// The counters listed in the mixins will be added to the counters
// available in this CounterSet.
//
// The counters of each included mixin are applied to this counter set in
// order. Conflicting counters from multiple mixins are taken from the
// last mixin listed. Counters set on the CounterSet will always override
// counters from mixins.
//
// The mixins referenced here must be defined in the same
// ResourceSlice.
//
// The maximum number of includes is 8.
//
// +featureGate=DRAResourceSliceMixins
// +listType=atomic
// +optional
Includes []string
}
// ResourceSliceMixins defines mixins for the ResourceSlice.
//
// The main purpose of these mixins is to reduce the memory footprint
// of devices since they can reference the mixins provided here rather
// than duplicate them.
type ResourceSliceMixins struct {
// Device represents a list of device mixins, i.e. a collection of
// shared attributes and capacities that an actual device can "include"
// to extend the set of attributes and capacities it already defines.
//
// +optional
// +listType=atomic
Device []DeviceMixin
// DeviceCounterConsumption represents a list of counter
// consumption mixins, each of which contains a set of counters
// that a device will consume from a counter set.
//
// +optional
// +listType=atomic
DeviceCounterConsumption []DeviceCounterConsumptionMixin
// CounterSet represents a list of counter set mixins, i.e.
// a collection of counters that a CounterSet can "include"
// to extend the set of counters it already defines.
//
// +optional
// +listType=atomic
CounterSet []CounterSetMixin
}
// DeviceMixin defines a mixin that can be referenced from a device.
type DeviceMixin struct {
// Name is a unique identifier among all device mixins in the ResourceSlice.
// It must be a DNS label.
//
// +required
Name string
// Attributes defines the set of attributes for this mixin.
// The name of each attribute must be unique in that set.
//
// To ensure this uniqueness, attributes defined by the vendor
// must be listed without the driver name as domain prefix in
// their name. All others must be listed with their domain prefix.
//
// The maximum number of attributes and capacities across all devices
// and device mixins in a ResourceSlice is 4096. When flattened, the
// total number of attributes and capacities for each device must not
// exceed 32.
//
// +optional
Attributes map[QualifiedName]DeviceAttribute
// Capacity defines the set of capacities for this mixin.
// The name of each capacity must be unique in that set.
//
// To ensure this uniqueness, capacities defined by the vendor
// must be listed without the driver name as domain prefix in
// their name. All others must be listed with their domain prefix.
//
// The maximum number of attributes and capacities across all devices
// and device mixins in a ResourceSlice is 4096. When flattened, the
// total number of attributes and capacities for each device must not
// exceed 32.
//
// +optional
Capacity map[QualifiedName]DeviceCapacity
}
// DeviceCounterConsumptionMixin defines a mixin that
// devices can include to extend or override the set of counters
// that a device consumes from a counter set.
type DeviceCounterConsumptionMixin struct {
// Name is a unique identifier among all device counter consumption
// mixins in the ResourceSlice. It must be a DNS label.
//
// +required
Name string
// Counters defines a set of counters
// that a device will consume from a counter set.
//
// The maximum number of consumed counters across all device counter consumptions
// and device counter consumption mixins in a ResourceSlice is 2048.
//
// +required
Counters map[string]Counter
}
// CounterSetMixin defines a mixin that a counter set can include.
type CounterSetMixin struct {
// Name is a unique identifier among all counter set mixins in the ResourceSlice.
// It must be a DNS label.
//
// +required
Name string
// Counters defines the set of counters for this mixin.
// The name of each counter must be unique in that set and must be a DNS label.
//
// The maximum number of counters across all counter sets and counter set
// mixins in a ResourceSlice is 256.
//
// +required
Counters map[string]Counter
}
type Device struct {
...
// Includes defines a list of references to DeviceMixin. The attributes
// and capacity listed in the mixins will be added to the device.
//
// The attributes and capacity of each included mixin are applied in
// order. Conflicting attributes/capacity from multiple mixins are taken from the
// last mixin listed. Attributes and capacity set on the device will
// always override those from mixins.
//
// The mixins referenced here must be defined in the same
// ResourceSlice.
//
// The maximum number of includes is 8.
//
// +featureGate=DRAResourceSliceMixins
// +optional
// +listType=atomic
Includes []string
}
type DeviceCounterConsumption struct {
...
// Includes defines a list of references to DeviceCounterConsumptionMixin.
// The counters listed in the mixins will be added to the
// counters that will be consumed by the device.
//
// The counters of each included mixin are applied in
// order. Conflicting counters from multiple mixins are taken from the
// last mixin listed. Counters set on the DeviceCounterConsumption will
// always override counters from mixins.
//
// The mixins referenced here must be defined in the same
// ResourceSlice.
//
// The maximum number of includes is 8.
//
// +featureGate=DRAResourceSliceMixins
// +optional
// +listType=atomic
Includes []string
}
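The constraints stated in the API comments (mixin names must be DNS labels, at most 8 includes, and every include must refer to a mixin defined in the same ResourceSlice) could be checked roughly as follows. This is a hedged sketch only: the function names `validateMixinName` and `validateIncludes` are hypothetical and are not the validation code in the Kubernetes tree.

```go
package main

import (
	"fmt"
	"regexp"
)

// maxIncludes is the per-field limit stated in the API comments.
const maxIncludes = 8

// dnsLabel matches an RFC 1123 DNS label: lowercase alphanumerics and '-',
// starting and ending with an alphanumeric character.
var dnsLabel = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// validateMixinName (hypothetical helper) checks that a mixin name is a
// DNS label of at most 63 characters.
func validateMixinName(name string) error {
	if len(name) > 63 || !dnsLabel.MatchString(name) {
		return fmt.Errorf("mixin name %q is not a DNS label", name)
	}
	return nil
}

// validateIncludes (hypothetical helper) checks the include count and that
// each reference points at a mixin defined in the same ResourceSlice.
func validateIncludes(includes []string, defined map[string]bool) []error {
	var errs []error
	if len(includes) > maxIncludes {
		errs = append(errs, fmt.Errorf("%d includes exceed the maximum of %d", len(includes), maxIncludes))
	}
	for _, ref := range includes {
		if !defined[ref] {
			errs = append(errs, fmt.Errorf("include %q is not defined in this ResourceSlice", ref))
		}
	}
	return errs
}

func main() {
	defined := map[string]bool{"gpu-base": true}
	fmt.Println(validateMixinName("gpu-base"))
	for _, err := range validateIncludes([]string{"gpu-base", "missing"}, defined) {
		fmt.Println(err)
	}
}
```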
Implementation
The DRA scheduler will keep the mixin structure throughout the scheduling process as much as possible and avoid completely flattening the ResourceSlices. This is to avoid additional memory usage that might come as a result. For example, we plan to walk the mixins as part of the CEL variable lookup to avoid having to flatten the device representation.
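One way to picture a lookup that walks the mixins instead of flattening is the sketch below. It assumes simplified, hypothetical types (`deviceMixin`, `device`) with string-valued attributes and a hypothetical helper `lookupAttribute`; the real scheduler code and its CEL integration may look quite different.

```go
package main

import "fmt"

// deviceMixin and device are simplified, hypothetical stand-ins for the
// API types; attribute values are plain strings here.
type deviceMixin struct {
	Attributes map[string]string
}

type device struct {
	Includes   []string
	Attributes map[string]string
}

// lookupAttribute resolves a single attribute without building a flattened
// copy of the device. Values set directly on the device win, so they are
// checked first. Mixins are then walked from last to first, because later
// mixins override earlier ones.
func lookupAttribute(d device, mixins map[string]deviceMixin, name string) (string, bool) {
	if v, ok := d.Attributes[name]; ok {
		return v, true
	}
	for i := len(d.Includes) - 1; i >= 0; i-- {
		m, ok := mixins[d.Includes[i]]
		if !ok {
			continue // dangling references are assumed to be rejected by validation
		}
		if v, ok := m.Attributes[name]; ok {
			return v, true
		}
	}
	return "", false
}

func main() {
	mixins := map[string]deviceMixin{
		"base": {Attributes: map[string]string{"memory": "40Gi"}},
		"big":  {Attributes: map[string]string{"memory": "80Gi"}},
	}
	d := device{Includes: []string{"base", "big"}}
	v, ok := lookupAttribute(d, mixins, "memory")
	fmt.Println(v, ok)
}
```

The lookup allocates nothing per device, which is the memory saving the paragraph above is after. The cost is up to 8 extra map lookups per attribute access.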
If the mixins feature is disabled, any devices or counter sets that reference mixins will be dropped. This also means that all devices that reference a dropped counter set will be dropped. The result is that the scheduler will not see those devices. From the user’s point of view, the consequence is that the scheduler might pick a different device than expected or fail to allocate any device at all. But we think this failure mode is preferable to allowing the scheduler to make allocation decisions based on incomplete data.
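The dropping rule for a disabled feature gate can be illustrated with a small Go sketch. The types and the helper `visibleDevices` are hypothetical simplifications. The sketch also assumes that a device whose counter consumption includes a mixin counts as a device that references mixins.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the API types.
type counterSet struct {
	Name     string
	Includes []string
}

type consumption struct {
	CounterSet string
	Includes   []string
}

type device struct {
	Name             string
	Includes         []string
	ConsumesCounters []consumption
}

// usesMixins reports whether a device depends on mixins, either directly,
// through its counter consumption, or through a dropped counter set.
func usesMixins(d device, droppedSets map[string]bool) bool {
	if len(d.Includes) > 0 {
		return true
	}
	for _, c := range d.ConsumesCounters {
		if len(c.Includes) > 0 || droppedSets[c.CounterSet] {
			return true
		}
	}
	return false
}

// visibleDevices returns the names of the devices the scheduler would still
// see when the mixins feature is disabled.
func visibleDevices(sets []counterSet, devices []device) []string {
	dropped := map[string]bool{}
	for _, cs := range sets {
		if len(cs.Includes) > 0 {
			dropped[cs.Name] = true
		}
	}
	var kept []string
	for _, d := range devices {
		if !usesMixins(d, dropped) {
			kept = append(kept, d.Name)
		}
	}
	return kept
}

func main() {
	sets := []counterSet{{Name: "plain"}, {Name: "mixed", Includes: []string{"m"}}}
	devices := []device{
		{Name: "a", ConsumesCounters: []consumption{{CounterSet: "plain"}}},
		{Name: "b", ConsumesCounters: []consumption{{CounterSet: "mixed"}}},
		{Name: "c", Includes: []string{"m"}},
	}
	fmt.Println(visibleDevices(sets, devices))
}
```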
Limits
For DRA, we have gradually moved away from individual per-slice and per-map limits towards aggregating at the higher level. The reason for this is to give users maximum flexibility between defining a small number of complex devices or a large number of simple devices in a single ResourceSlice without exceeding the limit on the size of Kubernetes objects.
For this KEP, we propose taking this to what is essentially the logical conclusion, where we enforce most of the limits across all devices, mixins and counter sets in a ResourceSlice, rather than setting separate limits for each of them.
The ResourceSlice-wide limits will be:
- Total number of devices is 128.
- Total combined number of attributes and capacity in a ResourceSlice is 4096 (so with the maximum number of devices, there can be 32 per device).
- Total number of counters is 256.
- Total number of consumed counters is 2048 (so with the maximum number of devices, there can be 16 per device).
We will still enforce some per-field limits:
- The number of mixins that can be referenced from each device, counter set or device counter consumption is 8.
- The number of taints per device is 4.
We will also enforce one limit on the flattened device:
- The combined number of attributes and capacities for a single device cannot exceed 32. We do this to avoid increasing the cost of evaluating the CEL expressions for a device.
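The aggregate limits listed above can be expressed as a simple check. The sketch below is illustrative only: `sliceCounts` and `checkLimits` are hypothetical names, and the counts are assumed to have been tallied across devices, mixins and counter sets beforehand. Note how the per-device figures follow from the totals: 4096 / 128 = 32 attributes and capacities, and 2048 / 128 = 16 consumed counters.

```go
package main

import "fmt"

// ResourceSlice-wide limits as listed in this section.
const (
	maxDevices               = 128
	maxAttributesAndCapacity = 4096
	maxCounters              = 256
	maxConsumedCounters      = 2048
	maxPerFlattenedDevice    = 32
)

// sliceCounts is a hypothetical summary of one ResourceSlice.
type sliceCounts struct {
	Devices                int
	AttributesAndCapacity  int // across all devices and device mixins
	Counters               int // across all counter sets and counter set mixins
	ConsumedCounters       int // across all consumptions and consumption mixins
	LargestFlattenedDevice int // attributes plus capacities after applying mixins
}

// checkLimits returns one message per violated limit.
func checkLimits(c sliceCounts) []string {
	var problems []string
	check := func(what string, got, limit int) {
		if got > limit {
			problems = append(problems, fmt.Sprintf("%s: %d exceeds limit %d", what, got, limit))
		}
	}
	check("devices", c.Devices, maxDevices)
	check("attributes and capacity", c.AttributesAndCapacity, maxAttributesAndCapacity)
	check("counters", c.Counters, maxCounters)
	check("consumed counters", c.ConsumedCounters, maxConsumedCounters)
	check("attributes and capacity on a flattened device", c.LargestFlattenedDevice, maxPerFlattenedDevice)
	return problems
}

func main() {
	// 16 devices can each define many attributes directly and stay within the
	// slice-wide total, but the flattened per-device limit still applies.
	c := sliceCounts{Devices: 16, AttributesAndCapacity: 16 * 40, LargestFlattenedDevice: 40}
	for _, p := range checkLimits(c) {
		fmt.Println(p)
	}
}
```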
The limits on the number of counters across counter sets, mixins and device counter consumption in 1.33 for the Partitionable Devices KEP will be removed, as those are still in alpha. The current limit on the total number of attributes and capacities per device will be adjusted a bit to be enforced on the device with mixins applied, rather than just based on what is defined directly on a device. This doesn’t change the current behavior since a device with more than 32 attributes/capacities defined directly on the device will always fail the updated validation rule.
With these limits, the worst-case size for a ResourceSlice increases from 1,107,864 bytes to 1,288,825 bytes.
Test Plan
[x] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.
Prerequisite testing updates
None
Unit tests
- k8s.io/dynamic-resource-allocation/structured: 04/11/2025 - 91.3%
- k8s.io/kubernetes/pkg/apis/resource/validation: 04/11/2025 - 97.8%
Integration tests
The integration test that verifies the theoretical maximum size of the ResourceSlice resource will be updated.
e2e tests
E2e tests will be added to verify that the mixins are properly applied and used by the scheduler.
Graduation Criteria
Alpha
- Feature implemented behind a feature flag
- Initial e2e tests completed and enabled
Beta
- Gather feedback from developers and surveys
- Additional tests are in Testgrid and linked in KEP
GA
- 3 examples of real-world usage
- Allowing time for feedback
- Conformance tests
Upgrade / Downgrade Strategy
Mixins will no longer work when downgrading to a release without support for them. As described in the Implementation section, devices that use mixins, or that reference counter sets that use mixins, will not be visible to the scheduler.
Version Skew Strategy
During version skew where the apiserver supports the feature and the scheduler doesn’t, devices that use mixins will be dropped and will not be visible to the scheduler (see Implementation).
The exception is 1.34, the first version where this feature is in alpha. If the apiserver is at 1.34 and the scheduler is at 1.33, the apiserver will send the new fields, but the scheduler will not know what to do with them. It will end up ignoring them, which can lead to incorrect scheduling decisions. Note that this scenario only applies to the initial 1.34 release and will not apply to 1.35+. The recommendation is that users not enable this alpha feature unless the scheduler is updated to 1.34 and has the alpha feature enabled as well.
Production Readiness Review Questionnaire
Feature Enablement and Rollback
How can this feature be enabled / disabled in a live cluster?
- Feature gate (also fill in values in kep.yaml)
- Feature gate name: DRAResourceSliceMixins
- Components depending on the feature gate:
- kube-apiserver
- kube-scheduler
Does enabling the feature change any default behavior?
No
Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
Yes. Applications that were already running will continue to run and the allocated devices will remain so.
What happens if we reenable the feature if it was previously rolled back?
It will take effect again and will impact allocation decisions.
Are there any tests for feature enablement/disablement?
This will be covered through unit tests for the apiserver and scheduler.
Rollout, Upgrade and Rollback Planning
How can a rollout or rollback fail? Can it impact already running workloads?
What specific metrics should inform a rollback?
Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
Monitoring Requirements
How can an operator determine if the feature is in use by workloads?
How can someone using this feature know that it is working for their instance?
- Events
- Event Reason:
- API .status
- Condition name:
- Other field:
- Other (treat as last resort)
- Details:
What are the reasonable SLOs (Service Level Objectives) for the enhancement?
What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
- Metrics
- Metric name:
- [Optional] Aggregation method:
- Components exposing the metric:
- Other (treat as last resort)
- Details:
Are there any missing metrics that would be useful to have to improve observability of this feature?
Dependencies
Does this feature depend on any specific services running in the cluster?
Scalability
Will enabling / using this feature result in any new API calls?
No
Will enabling / using this feature result in introducing new API types?
No
Will enabling / using this feature result in any new calls to the cloud provider?
No
Will enabling / using this feature result in increasing size or count of the existing API objects?
Yes and no. It adds fields, which increases the worst-case size of the ResourceSlice object. However, it also provides features that allow drivers to represent devices and counter sets in a more compact way, thereby potentially reducing the size of the ResourceSlice object.
Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
Flattening the devices and counter sets will require slightly more work, but this is unlikely to have any meaningful impact on the time used for allocation.
It does allow DRA driver authors to create more complex devices, with a larger number of counters. It also allows for larger number of counters in the counter sets. This can worsen the worst-case scheduling performance.
Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, …) in any components?
No
Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
No, because the feature is not used on nodes.
Troubleshooting
How does this feature react if the API server and/or etcd is unavailable?
What are other known failure modes?
What steps should be taken if SLOs are not being met to determine the problem?
Implementation History
- 1.33: first KEP revision as part of the Partitionable Devices KEP
- 1.34: split out into a separate KEP
Drawbacks
Using mixins adds to the complexity and makes it harder to get a quick overview of a device or a counter set.
Alternatives
Several alternatives were considered as part of the Partitionable Devices KEP.