KEP-6080: DRA Derived Attributes
- Release Signoff Checklist
- Summary
- Motivation
- Proposal
- Design Details
- Production Readiness Review Questionnaire
- Implementation History
- Drawbacks
- Alternatives
- Future Considerations
Release Signoff Checklist
Items marked with (R) are required prior to targeting to a milestone / release.
- (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
- (R) KEP approvers have approved the KEP status as implementable
- (R) Design details are appropriately documented
- (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- e2e Tests for all Beta API Operations (endpoints)
- (R) Ensure GA e2e tests meet requirements for Conformance Tests
- (R) Minimum Two Week Window for GA e2e tests to prove flake free
- (R) Graduation criteria is in place
- (R) all GA Endpoints must be hit by Conformance Tests within one minor version of promotion to GA
- (R) Production readiness review completed
- (R) Production readiness review approved
- “Implementation History” section is up-to-date for milestone
- User-facing documentation has been created in kubernetes/website , for publication to kubernetes.io
- Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
Summary
DRA currently relies on exact literal matching of device attributes across
requests (matchAttribute). While performant, literal matching is rigid
for end users. Workloads that need to pair devices across complex boundaries
(for example, by extracting a PCI or NUMA ID from a topology string) cannot
express these requirements inline.
This KEP introduces derivedAttributes to .devices.requests. It allows
users to synthesize virtual grouping keys on the fly using scoped, per-device
CEL expressions. The scheduler’s constraint engine then evaluates these
derived keys exactly like static attributes.
Motivation
End-User Rigidity
Currently, DRA co-allocation requires exact literal matching of attributes via
matchAttribute. If a user wants to co-locate a GPU and a NIC based on a
shared PCIe locality, both device drivers must publish the exact same attribute
name and value. If one driver publishes pcie_locality: 0 and the other
publishes pcie_root: 0 (or embeds it in a topology string like numa0-pcie1),
the scheduler cannot match them. Users have no mechanism to bridge these schemas
inline.
Attribute Standardization Bottleneck
To achieve cross-vendor or multi-driver device pairing today, hardware vendors
must agree on standardized attribute naming conventions (e.g.,
resource.kubernetes.io/pcieRoot).
However, core Kubernetes components like the scheduler treat these attribute names as opaque strings during constraint matching. Forced standardization is essentially a human-coordinated workaround for a missing API capability: the inability of the constraint engine to match different attribute names across requests. Relying on this workaround forces vendors into a slow pipeline for API approval whenever a new pairing boundary is needed (real example). While standardized labels provide clear benefits for downstream observability tools, forcing their use as the sole mechanism for device alignment creates rigid, human-coordinated dependencies rather than enabling flexible, API-driven logic.
While ongoing community efforts to standardize schemas are highly valuable for long-term hardware representation, the rapidly evolving AI/ML ecosystem often outstrips the turnaround time of formal standardization cycles. Frequently, the physical topology information needed for co-allocation is already present on device objects (embedded in vendor-specific strings, naming conventions, or capacity metrics). The only bottleneck preventing end users from consuming this data immediately is the strict requirement that attribute names match exactly across requests.
Goals
- Allow .devices.requests to define virtual grouping keys via scoped, per-device CEL expressions (derivedAttributes).
- Enable co-allocation of heterogeneous hardware across differing vendor attribute schemas without prior human coordination or schema standardization.
- Preserve the early-pruning performance of the scheduler’s constraint engine by scoping CEL evaluation to individual candidate devices.
Non-Goals
- Replacing existing static attribute matching (matchAttribute).
- Supporting arbitrary cross-device constraints in CEL (explored in KEP-5254, which did not proceed due to scheduler scaling bottlenecks).
Proposal
We propose extending .devices.requests with derivedAttributes. A derived
attribute defines a CEL expression evaluated against each candidate device
object. The resulting value acts as a virtual attribute that can be referenced
directly by existing .devices.constraints[].matchAttribute fields.
Core Manifest Example: GPU & NIC NUMA Alignment
(Note: The community is actively standardizing numaNode in PR #6073.
This example illustrates the API mechanics for bridging disparate schemas.)
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: gpu-numa-alignment-claim
spec:
  devices:
    requests:
    - name: gpu
      exactly:
        deviceClassName: gpu.nvidia.com
        count: 8
      # [NEW]: Compute virtual grouping key from GPU driver
      derivedAttributes:
      - name: shared-numa-node
        expression: "device.attributes['gpu.nvidia.com'].numa"
    - name: nic
      exactly:
        deviceClassName: dranet
        count: 1
      # [NEW]: Compute virtual grouping key from dranet driver
      derivedAttributes:
      - name: shared-numa-node
        expression: "device.attributes['dra.net'].numaNode"
    constraints:
    # Match the derived attribute across both requests using existing matchAttribute
    - matchAttribute: shared-numa-node
      requests: [gpu, nic]
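The matching step in the manifest above can be sketched in plain Go. This is a minimal illustration, not the scheduler's implementation: the Device type, the deriveFn closures (stand-ins for compiled CEL expressions), and the helper names are all hypothetical, and attribute values are reduced to int64 for brevity.

```go
package main

import "fmt"

// Device is a simplified stand-in for a DRA device: attributes are keyed by
// driver domain, then attribute name. Real attributes are typed; int64 is an
// assumption made to keep the sketch short.
type Device struct {
	Name       string
	Attributes map[string]map[string]int64
}

// deriveFn stands in for one compiled, per-device CEL expression.
type deriveFn func(d Device) int64

// gpuKey mirrors device.attributes['gpu.nvidia.com'].numa.
// Unlike CEL, a missing attribute silently yields 0 here.
func gpuKey(d Device) int64 { return d.Attributes["gpu.nvidia.com"]["numa"] }

// nicKey mirrors device.attributes['dra.net'].numaNode.
func nicKey(d Device) int64 { return d.Attributes["dra.net"]["numaNode"] }

// sameDerivedKey reports whether every device yields the same derived value;
// derive[i] is the expression of the request that selected devs[i].
func sameDerivedKey(devs []Device, derive []deriveFn) bool {
	for i := 1; i < len(devs); i++ {
		if derive[i](devs[i]) != derive[0](devs[0]) {
			return false
		}
	}
	return true
}

// newDevice builds a device with a single attribute (helper for the example).
func newDevice(name, domain, attr string, value int64) Device {
	return Device{Name: name, Attributes: map[string]map[string]int64{domain: {attr: value}}}
}

func main() {
	gpu := newDevice("gpu0", "gpu.nvidia.com", "numa", 0)
	nearNIC := newDevice("eth0", "dra.net", "numaNode", 0)
	farNIC := newDevice("eth1", "dra.net", "numaNode", 1)
	exprs := []deriveFn{gpuKey, nicKey}

	fmt.Println(sameDerivedKey([]Device{gpu, nearNIC}, exprs)) // true
	fmt.Println(sameDerivedKey([]Device{gpu, farNIC}, exprs))  // false
}
```

Each expression reads only its own device, so the two drivers never need to agree on an attribute name; only the derived values are compared.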
User Stories
Custom Multi-Device Grouping (GPU + NIC + CPU)
Users need to co-locate heterogeneous hardware (GPUs, high-speed NICs, host CPUs) managed by independent drivers. Each driver publishes topology metadata under different attribute names. Users can synthesize an identical virtual grouping key across all three requests:
- GPU: expression: "device.attributes['gpu.nvidia.com'].foo_key"
- NIC: expression: "device.attributes['dra.net'].bar_id"
- CPU: expression: "device.attributes['cpu.intel.com'].baz_domain"
Substring / Regex Topology Extraction
Device drivers often publish monolithic topology strings (e.g.,
topology: "numa0-pcieDomain1-nic0"). Users can use CEL string manipulation
to extract the specific boundary needed inline:
expression: "device.attributes['vendor.com'].topology.split('-')[0]"
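To show which value the expression above would produce for the example topology string, here is a Go equivalent of the split-and-index step. The function name numaSegment is hypothetical, and Go's strings.Split is used only as an analogue of the CEL call.

```go
package main

import (
	"fmt"
	"strings"
)

// numaSegment mirrors topology.split('-')[0]: it returns the text before
// the first '-' separator (or the whole string if there is none).
func numaSegment(topology string) string {
	return strings.Split(topology, "-")[0]
}

func main() {
	fmt.Println(numaSegment("numa0-pcieDomain1-nic0")) // numa0
}
```

Two devices whose topology strings start with the same segment would therefore derive the same grouping key, regardless of what follows the first separator.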
Dynamic Capacity Tiering
Users want to group heterogeneous devices into matching performance tiers based on capacity or numeric attributes. CEL comparisons can quantize continuous ranges into discrete matching bins:
expression: "device.attributes['vendor.com'].memory_gb >= 80 ? 'tier-1' : 'tier-2'"
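The quantization performed by the ternary above can be sketched in Go to make the bin boundary explicit. The function name tier is hypothetical; the threshold and tier labels are taken from the expression.

```go
package main

import "fmt"

// tier mirrors: memory_gb >= 80 ? 'tier-1' : 'tier-2'.
func tier(memoryGB int64) string {
	if memoryGB >= 80 {
		return "tier-1"
	}
	return "tier-2"
}

func main() {
	for _, gb := range []int64{40, 80, 96} {
		fmt.Printf("%d GB -> %s\n", gb, tier(gb))
	}
}
```

Devices with 80 GB and 96 GB land in the same bin and would satisfy a matchAttribute constraint on the derived key, while a 40 GB device would not match either of them.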
Implicit Hardware Alignment via Naming Conventions
When physical topology alignment is known implicitly via device naming
conventions (e.g., gpu0 aligns with eth0) rather than explicit topology
attributes, CEL can extract and match the underlying hardware index:
- GPU: expression: "device.name.replace('gpu', '')"
- NIC: expression: "device.name.replace('eth', '')"
Notes/Constraints/Caveats
KEP-5254 Comparison (DRA: Constraints with CEL)
KEP-5254 (PR #5391)
explored using CEL to express arbitrary constraints across entire device groups
(cel.expression). For example:
constraints:
# KEP-5254 proposed evaluating CEL across an entire device group
- cel:
    expression: "devices[0].attributes['gpu.nvidia.com'].numa == devices[1].attributes['dra.net'].numaNode"
  requests: [gpu, nic]
While highly flexible, evaluating expressions across entire device groups
prevented the scheduler from pruning invalid permutations early. Because the CEL
environment required candidate devices for both gpu and nic simultaneously
within the devices list, the scheduler had to generate combinatorial device
permutations before evaluating the constraint. This led to combinatorial
explosion during device filtering.
derivedAttributes resolves this by scoping CEL evaluation strictly to
individual candidate devices. The scheduler evaluates derived keys as individual
devices are filtered, maintaining the exact same early-pruning efficiency as
static attribute matching.
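The difference in evaluation count can be illustrated with a toy model. The sketch below is an assumption-laden simplification (one GPU paired with one NIC, integer keys, hypothetical function names) and does not model the real allocator's search; it only counts how many expression evaluations each approach needs to find the same matching pairs.

```go
package main

import "fmt"

// pairwiseEvaluations models a cross-device expression: it must be evaluated
// once for every (gpu, nic) combination.
func pairwiseEvaluations(gpuKeys, nicKeys []int) (evals, matches int) {
	for _, g := range gpuKeys {
		for _, n := range nicKeys {
			evals++
			if g == n {
				matches++
			}
		}
	}
	return
}

// perDeviceEvaluations models derived attributes: one evaluation per device,
// followed by a plain equality join on the resulting keys.
func perDeviceEvaluations(gpuKeys, nicKeys []int) (evals, matches int) {
	nicsByKey := map[int]int{}
	for _, n := range nicKeys {
		evals++
		nicsByKey[n]++
	}
	for _, g := range gpuKeys {
		evals++
		matches += nicsByKey[g]
	}
	return
}

func main() {
	gpus := []int{0, 0, 0, 0, 1, 1, 1, 1} // NUMA node of each GPU
	nics := []int{0, 0, 1, 1}             // NUMA node of each NIC

	e1, m1 := pairwiseEvaluations(gpus, nics)
	e2, m2 := perDeviceEvaluations(gpus, nics)
	fmt.Println(e1, m1) // 32 16
	fmt.Println(e2, m2) // 12 16
}
```

Both approaches find the same 16 compatible pairs, but the per-device variant grows with the sum of the device counts rather than their product.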
Interaction with KEP-5491 (List Types for Attributes)
KEP-5491 (PR #5492)
introduces list-typed attributes to ResourceSlice (e.g., lists of strings or
integers) and redefines matchAttribute to evaluate as a non-empty set
intersection ($\cap v_k \neq \emptyset$), treating scalar values as
single-element lists.
derivedAttributes is fully forward-compatible with KEP-5491 and will inherit its
capabilities in three key areas:
- CEL Runtime Environment: The CEL environment for expression will expose list-typed attributes exactly as defined in ResourceSlice by KEP-5491.
- List Return Types & Synthesis: The allowed return types for expression will be expanded to include list types ([]string, []int64, []bool). Crucially, manifest authors can use CEL list literal syntax to dynamically synthesize a list from multiple individual scalar attributes inline (e.g., expression: "[device.attributes['v'].r1, device.attributes['v'].r2]"). This allows bridging older scalar-only drivers with KEP-5491 list matching.
- Constraint Matching Semantics: When matchAttribute evaluates a derived attribute that returns a list, it will adopt the exact same non-empty intersection matching semantics defined by KEP-5491. Scalar return values will be treated as single-element lists.
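The non-empty intersection rule can be sketched as follows. This is an illustrative reading of the semantics described above, restricted to string values; commonValues and matches are hypothetical names, and a scalar is modelled as a one-element slice.

```go
package main

import (
	"fmt"
	"sort"
)

// commonValues returns the values present in every list, sorted.
func commonValues(lists ...[]string) []string {
	if len(lists) == 0 {
		return nil
	}
	common := map[string]bool{}
	for _, v := range lists[0] {
		common[v] = true
	}
	for _, l := range lists[1:] {
		next := map[string]bool{}
		for _, v := range l {
			if common[v] {
				next[v] = true
			}
		}
		common = next
	}
	out := make([]string, 0, len(common))
	for v := range common {
		out = append(out, v)
	}
	sort.Strings(out)
	return out
}

// matches reports whether the derived values of all devices intersect.
func matches(lists ...[]string) bool {
	return len(commonValues(lists...)) > 0
}

func main() {
	listDevice := []string{"root1", "root2"} // list synthesized from two scalars
	scalarDevice := []string{"root2"}        // scalar treated as a one-element list
	fmt.Println(matches(listDevice, scalarDevice))      // true
	fmt.Println(matches(listDevice, []string{"root3"})) // false
}
```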
Naming and Override Semantics for Derived Attributes
We are weighing two design options for validating derivedAttributes names:
- Option 1 (Allow FQDNs, e.g., vendor.com/attr): Enables derived attributes to cleanly override static driver attributes of the same name, providing powerful inline flexibility at the slight risk of accidental shadowing.
- Option 2 (Strictly Bare Names, e.g., shared-numa): Enforces an absolute syntactic boundary between static (FQDN) and derived (bare) attributes, eliminating shadowing risks but preventing direct attribute overrides.
Recommended: Option 1 (Allowing FQDNs). Most manifest authors will naturally choose simple bare names. Conversely, authoring an FQDN override requires deliberate effort to duplicate a driver’s schema, indicating clear user intent. Enabling this pattern provides great flexibility.
Risks and Mitigations
- Risk: A new CEL expression needs to be evaluated for each candidate device (in addition to any CEL expressions evaluated for device selectors).
- Mitigation: To prevent redundant evaluations of the same expression on a single device (which may be evaluated multiple times or against multiple requests), the scheduler plugin caches both the compiled CEL ASTs and the evaluated derived attribute values for each candidate device. Evaluation happens exactly once per candidate device per scheduling cycle, making the latency overhead strictly linear O(N) with the number of candidate devices.
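The value cache described in the mitigation can be sketched as a memoization table keyed by device and expression. All names here (derivedCache, fakeEval) are hypothetical, and the evaluator is a placeholder rather than a CEL program.

```go
package main

import "fmt"

// cacheKey identifies one derived value: a device and the expression text.
type cacheKey struct {
	device     string
	expression string
}

// derivedCache memoizes evaluation results for one scheduling cycle.
type derivedCache struct {
	eval   func(device, expression string) string
	values map[cacheKey]string
	evals  int // number of real evaluations, for illustration
}

func newDerivedCache(eval func(device, expression string) string) *derivedCache {
	return &derivedCache{eval: eval, values: map[cacheKey]string{}}
}

// get returns the cached value, evaluating the expression only on a miss.
func (c *derivedCache) get(device, expression string) string {
	k := cacheKey{device, expression}
	if v, ok := c.values[k]; ok {
		return v
	}
	c.evals++
	v := c.eval(device, expression)
	c.values[k] = v
	return v
}

// fakeEval is a placeholder for running a compiled CEL program.
func fakeEval(device, expression string) string {
	return device + "|" + expression
}

func main() {
	c := newDerivedCache(fakeEval)
	c.get("gpu0", "numa")
	c.get("gpu0", "numa") // served from the cache
	c.get("gpu1", "numa")
	fmt.Println(c.evals) // 2
}
```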
Design Details
API Changes
We extend DeviceRequest in resource.k8s.io/v1 with DerivedAttributes:
// package resource
type DeviceRequest struct {
	// Existing fields...
	Name            string `json:"name"`
	DeviceClassName string `json:"deviceClassName"`
	// ...

	// DerivedAttributes defines a set of virtual attributes computed via CEL expressions
	// for each candidate device.
	// +featureGate=DRADerivedAttributes
	// +listType=map
	// +listMapKey=name
	// +optional
	// +k8s:optional
	// +k8s:maxItems=8
	DerivedAttributes []DerivedAttribute `json:"derivedAttributes,omitempty"`
}

type DerivedAttribute struct {
	// Name is the identifier for this derived attribute, used in constraints.
	//
	// It has the same format as the name of attributes in a ResourceSlice.
	//
	// A domain prefix (e.g., "example.com/attribute-name") should be used if
	// the derived attribute is intended to override or shadow a static attribute
	// of the same name from a device driver. If the derived attribute is unique
	// and used solely for inline matching in constraints within the claim, a simple
	// bare name without a domain prefix (e.g., "my-derived-attribute") should
	// be used to prevent accidental shadowing and make the intent clear.
	//
	// +k8s:required
	Name QualifiedName `json:"name"`

	// Expression is a CEL expression evaluated against each candidate device.
	// The expression must evaluate to a primitive scalar (string, integer,
	// boolean, or semver) or a list of these scalars ([]string, []int64,
	// []bool, []semver) to act as a virtual grouping key. Any other return type
	// is an error and causes CEL evaluation for the device to fail.
	//
	// The expression's input is an object named "device", which carries the
	// same properties as in a CELDeviceSelector.
	//
	// When pod scheduling encounters CEL runtime errors (such as looking
	// up an attribute that isn't defined) for some devices, it will abort
	// allocation and fail scheduling for the Pod. Surfacing evaluation
	// errors immediately prevents silent topology matching failures that are
	// extremely hard to detect. A robust expression should, for example, check
	// for the existence of attributes before referencing them to avoid
	// runtime evaluation errors.
	//
	// The length of the expression must be smaller or equal to 10 Ki. The
	// cost of evaluating it is also limited based on the estimated number
	// of logical steps.
	//
	// +k8s:required
	Expression string `json:"expression"`
}
DeviceConstraint in resource.k8s.io/v1 requires zero Go struct modifications.
Existing MatchAttribute *string fields will be reused to reference derived
attributes.
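The structural limits stated in the field comments above (at most 8 entries, names unique and required, expressions required and at most 10 Ki long) could be checked along the following lines. This is a hypothetical sketch, not the kube-apiserver validation code: it uses a local DerivedAttribute type with plain string fields, assumes "10 Ki" means 10240 bytes, and deliberately omits CEL compilation and name-format checks.

```go
package main

import "fmt"

const (
	maxDerivedAttributes = 8         // from +k8s:maxItems=8
	maxExpressionLength  = 10 * 1024 // assumed meaning of "10 Ki"
)

// DerivedAttribute is a simplified local copy of the proposed API type.
type DerivedAttribute struct {
	Name       string
	Expression string
}

// validateDerivedAttributes returns one error per violated structural rule.
func validateDerivedAttributes(attrs []DerivedAttribute) []error {
	var errs []error
	if len(attrs) > maxDerivedAttributes {
		errs = append(errs, fmt.Errorf("derivedAttributes: at most %d entries allowed, got %d",
			maxDerivedAttributes, len(attrs)))
	}
	seen := map[string]bool{}
	for i, a := range attrs {
		switch {
		case a.Name == "":
			errs = append(errs, fmt.Errorf("derivedAttributes[%d].name: required", i))
		case seen[a.Name]:
			errs = append(errs, fmt.Errorf("derivedAttributes[%d].name: duplicate %q", i, a.Name))
		}
		seen[a.Name] = true
		switch {
		case a.Expression == "":
			errs = append(errs, fmt.Errorf("derivedAttributes[%d].expression: required", i))
		case len(a.Expression) > maxExpressionLength:
			errs = append(errs, fmt.Errorf("derivedAttributes[%d].expression: longer than %d bytes",
				i, maxExpressionLength))
		}
	}
	return errs
}

func main() {
	attrs := []DerivedAttribute{
		{Name: "shared-numa-node", Expression: "device.attributes['dra.net'].numaNode"},
		{Name: "shared-numa-node", Expression: "device.attributes['dra.net'].numaNode"},
	}
	for _, err := range validateDerivedAttributes(attrs) {
		fmt.Println(err)
	}
}
```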
CEL Environment & Validation
- Environment: The CEL environment for Expression is exactly the same as that for CELDeviceSelector, containing a single variable device.
- Evaluation Order: Derived attributes are evaluated after the device request’s CELDeviceSelector has filtered candidate devices. They are not injected into the selector’s CEL environment and are exclusively used for evaluating constraints like matchAttribute and distinctAttribute.
- Return Type: The CEL expression must evaluate to a scalar (string, integer, boolean, or semver) or a list of these scalars ([]string, []int64, []bool, []semver).
- Validation: kube-apiserver validates the CEL syntax during ResourceClaim creation and update.
- Runtime Error Handling: If a CEL expression fails to evaluate on a candidate device at runtime (due to a missing attribute, null pointer reference, type mismatch, or other runtime error), the scheduler will abort the allocation and fail scheduling for the Pod immediately, even if other candidate devices or nodes evaluate successfully. This matches the behavior of CEL device selectors in the scheduler, where any runtime evaluation failure aborts allocation and fails scheduling for the Pod rather than silently filtering the device out. Surfacing evaluation errors immediately prevents silent topology matching failures and ensures that broken expressions are detected and resolved.
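The abort-on-error behavior described above can be contrasted with silent filtering in a small sketch. The types and functions are hypothetical, and deriveNUMA stands in for a CEL expression that references an attribute without checking that it exists.

```go
package main

import "fmt"

// Device is a simplified candidate device with integer attributes.
type Device struct {
	Name       string
	Attributes map[string]int64
}

// deriveNUMA stands in for an expression that reads numaNode unguarded;
// a missing attribute is reported as a runtime error.
func deriveNUMA(d Device) (int64, error) {
	v, ok := d.Attributes["numaNode"]
	if !ok {
		return 0, fmt.Errorf("device %q: attribute numaNode is not defined", d.Name)
	}
	return v, nil
}

// deriveAll returns the derived key of every candidate, or the first error.
// A single failing device aborts the whole allocation; it is not skipped.
func deriveAll(candidates []Device) (map[string]int64, error) {
	keys := make(map[string]int64, len(candidates))
	for _, d := range candidates {
		v, err := deriveNUMA(d)
		if err != nil {
			return nil, err
		}
		keys[d.Name] = v
	}
	return keys, nil
}

func main() {
	candidates := []Device{
		{Name: "eth0", Attributes: map[string]int64{"numaNode": 0}},
		{Name: "eth1", Attributes: map[string]int64{}}, // driver did not publish numaNode
	}
	if _, err := deriveAll(candidates); err != nil {
		fmt.Println("allocation aborted:", err)
	}
}
```

Even though eth0 evaluates successfully, the error on eth1 fails the whole attempt, which is why expressions are advised to guard attribute lookups.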
Scheduler Plugin Implementation
In pkg/scheduler/framework/plugins/dynamicresources:
- The plugin compiles the CEL expressions defined in derivedAttributes for all requests in the Pod’s ResourceClaims. Compiled ASTs are cached.
- When evaluating candidate devices for a request, the plugin executes the cached CEL expressions against each Device object. The computed values are stored in a temporary map associated with the candidate device.
- When evaluating constraints (like matchAttribute and distinctAttribute), the plugin implements a lookup precedence: it first checks if the attribute name matches a cached derived attribute on the candidate device’s request; if not found, it falls back to looking up the static attribute on the Device object. If values do not match across the specified requests, the permutation is pruned.
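The lookup precedence in the last step can be sketched as a two-level map lookup. The function name lookupAttribute and the string-valued maps are assumptions made for illustration; the real plugin would work with typed device attributes.

```go
package main

import "fmt"

// lookupAttribute resolves an attribute name for one candidate device:
// derived values computed for the device's request take precedence, and
// static driver-published attributes are the fallback.
func lookupAttribute(name string, derived, static map[string]string) (string, bool) {
	if v, ok := derived[name]; ok {
		return v, true
	}
	v, ok := static[name]
	return v, ok
}

func main() {
	static := map[string]string{"vendor.com/numa": "numa1", "vendor.com/model": "x100"}
	derived := map[string]string{"vendor.com/numa": "1", "shared-numa-node": "1"}

	fmt.Println(lookupAttribute("vendor.com/numa", derived, static))  // 1 true (derived shadows static)
	fmt.Println(lookupAttribute("vendor.com/model", derived, static)) // x100 true (static fallback)
	fmt.Println(lookupAttribute("missing", derived, static))          //  false
}
```

The first call shows the shadowing case that the FQDN naming option permits: a derived attribute with the same qualified name as a driver attribute wins.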
Test Plan
[x] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.
Prerequisite testing updates
None.
Unit tests
- k8s.io/kubernetes/pkg/apis/resource/validation: 2026-05-01 -> 90% (Coverage)
  - Verify validation of derivedAttributes (valid/invalid CEL syntax).
- k8s.io/kubernetes/pkg/scheduler/framework/plugins/dynamicresources: 2026-05-01 -> 80% (Coverage)
  - Verify CEL compilation caching.
  - Verify correct CEL evaluation and constraint matching.
Integration tests
- test/integration/scheduler_perf/dra/derived-attributes:
  - Include a realistic scenario (number of attributes, complexity of the CEL expressions), then define a simple case for correctness checking and larger cases for performance measurement.
e2e tests
- test/e2e/scheduling/dra.go:
  - Add e2e tests verifying multi-request co-allocation using derivedAttributes across test device plugins.
Graduation Criteria
Alpha
- Feature gate DRADerivedAttributes implemented.
- API validation and scheduler plugin implementation complete.
- Unit and integration tests passing.
Beta
- Gather feedback from DRA driver maintainers (SIG Node / SIG Network).
- Any additional e2e tests implemented and running in Testgrid canaries.
- Verify scheduler performance and latency overhead with large device counts
using the scheduler_perf test cases.
GA
- Proven adoption in deployment manifests and user documentation for real-world DRA drivers (e.g., dra-driver-cpu, dra-driver-nvidia-gpu, dra-driver-nvidia-tpus, dranet).
- Allowing time for feedback
- All issues and gaps identified as feedback during beta are resolved
Upgrade / Downgrade Strategy
- Upgrade: Enabling DRADerivedAttributes allows users to create ResourceClaims with derived attributes. Existing claims are unaffected.
- Downgrade: Disabling DRADerivedAttributes prevents creating or updating claims with derived attributes. Existing claims using derived attributes will fail validation on update, and the scheduler will ignore derived attributes during filtering.
Version Skew Strategy
- kube-apiserver and kube-scheduler must both have DRADerivedAttributes enabled.
  - If kube-apiserver is upgraded and has the feature gate enabled but kube-scheduler does not, the older scheduler will ignore derivedAttributes and treat non-FQDN matchAttribute strings as static attributes on the device objects, resulting in scheduling failures. Standard n-1 control plane version skew rules apply.
Production Readiness Review Questionnaire
Feature Enablement and Rollback
How can this feature be enabled / disabled in a live cluster?
- Feature gate (also fill in values in kep.yaml)
  - Feature gate name: DRADerivedAttributes
  - Components depending on the feature gate: kube-apiserver, kube-controller-manager, kube-scheduler
Does enabling the feature change any default behavior?
No. Existing ResourceClaims without derivedAttributes are evaluated exactly
as before.
Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
Yes. Disabling the feature gate prevents creating new ResourceClaims with
derivedAttributes. Existing claims with derivedAttributes will fail
validation on update, and the scheduler will ignore derived attributes during
filtering.
What happens if we reenable the feature if it was previously rolled back?
Existing ResourceClaims with derivedAttributes will resume being evaluated
correctly by the scheduler.
Are there any tests for feature enablement/disablement?
Yes. Unit tests in pkg/apis/resource/validation verify that
derivedAttributes and non-FQDN matchAttribute or distinctAttribute strings
are rejected when the feature gate is disabled.
Rollout, Upgrade and Rollback Planning
How can a rollout or rollback fail? Can it impact already running workloads?
A rollout failure or rollback does not impact running Pods, as DRA resource
allocation occurs during Pod scheduling. If a rollback occurs, pending Pods
referencing claims with derivedAttributes may fail to schedule.
What specific metrics should inform a rollback?
- schedule_attempts_total with result="error" or result="unschedulable" in kube-scheduler.
- plugin_execution_duration_seconds for the dynamicresources plugin in kube-scheduler.
Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
Manual upgrade and rollback testing will be performed during Alpha by toggling the feature gate on a local test cluster and verifying scheduling behavior.
Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
No.
Monitoring Requirements
How can an operator determine if the feature is in use by workloads?
By checking for ResourceClaim objects where
spec.devices.requests[*].derivedAttributes is non-empty.
How can someone using this feature know that it is working for their instance?
- API .status
  - Condition name: Allocation condition on ResourceClaim. When successfully scheduled and allocated, the claim status reflects the allocated devices matching the derived constraints.
- Events
  - Event Reason: Scheduled event on the Pod, indicating successful co-allocation by the scheduler.
What are the reasonable SLOs (Service Level Objectives) for the enhancement?
Scheduling latency for Pods using ResourceClaims with derivedAttributes
should not increase by more than 5% compared to claims using literal
matchAttribute.
What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
- Metrics
  - Metric name: plugin_execution_duration_seconds (filter/reserve) for dynamicresources in kube-scheduler.
  - Components exposing the metric: kube-scheduler.
Are there any missing metrics that would be useful to have to improve observability of this feature?
No. Existing scheduler framework metrics and DRA controller metrics provide sufficient observability.
Dependencies
Does this feature depend on any specific services running in the cluster?
No.
Scalability
Will enabling / using this feature result in any new API calls?
No.
Will enabling / using this feature result in introducing new API types?
No.
Will enabling / using this feature result in any new calls to the cloud provider?
No.
Will enabling / using this feature result in increasing size or count of the existing API objects?
Yes. ResourceClaim objects will increase slightly in size when
derivedAttributes are defined (typically under 500 bytes per claim).
Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
kube-scheduler latency will increase slightly to evaluate CEL expressions.
Because evaluation is scoped per device object rather than across device groups,
the overhead is linear with the number of candidate devices $O(N)$ rather than
combinatorial. Caching compiled CEL ASTs minimizes this overhead.
Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, …) in any components?
No. kube-scheduler will experience a minor, negligible increase in CPU and
memory usage to compile and evaluate CEL expressions.
Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
No.
Troubleshooting
How does this feature react if the API server and/or etcd is unavailable?
Standard Kubernetes behavior applies; scheduling and resource allocation cannot proceed without the API server.
What are other known failure modes?
- Failure mode: Poorly optimized CEL expressions causing scheduler plugin timeouts.
  - Detection: Increase in plugin_execution_duration_seconds for dynamicresources and schedule_attempts_total with result="error".
  - Mitigations: The scheduler enforces a bounded execution time for CEL evaluation. If an expression exceeds the limit or fails at runtime, allocation is aborted and scheduling fails for the Pod, consistent with the Runtime Error Handling rule in Design Details.
What steps should be taken if SLOs are not being met to determine the problem?
Examine kube-scheduler logs and metrics
(plugin_execution_duration_seconds) to identify if specific CEL expressions
in pending ResourceClaims are causing high evaluation latency.
Implementation History
- 2026-05-15: Initial KEP draft created for Alpha in v1.37.
Drawbacks
- Adds complexity to the dynamicresources scheduler plugin, which must manage additional CEL compilation, caching, and runtime evaluation environments (on top of the existing ones).
Alternatives
- (Status Quo) Forced Attribute Standardization: Relying entirely on hardware vendors agreeing on standardized attribute naming conventions (e.g., resource.kubernetes.io/numa-node). This is not ideal as it creates rigid, slow, human-coordinated dependencies rather than enabling flexible, API-driven co-allocation logic.
- KEP-5254 (MatchExpression): Exploring the use of CEL to express arbitrary constraints across entire device groups. While highly flexible, evaluating expressions across entire device groups made it difficult for the scheduler to prune invalid permutations early, leading to combinatorial explosion during device filtering.
Future Considerations
Derived Attributes in Device Selectors
In the current design, derivedAttributes are evaluated after the device
request’s CELDeviceSelector has filtered candidate devices. As a result,
derived attributes cannot be referenced within the selector’s CEL expression
itself.
While making derived attributes available within selectors would improve usability (by avoiding the need to duplicate complex mapping logic across the selector and the derived attribute), it introduces significant complexities:
- Performance Overhead: Evaluating derived attributes before selectors forces early evaluation on all candidate devices.
- CEL Environment Ambiguity: Referencing derived attributes in the CEL environment requires resolving namespace collisions or expanding the standard schema (e.g., introducing device.derivedAttributes).
This functionality is excluded from the current KEP to prioritize scheduling performance and simplify the initial implementation. It may be explored in a future enhancement if the community identifies a strong need for this usability improvement.