KEP-5491: DRA: List Types for Attributes
- Release Signoff Checklist
- Summary
- Motivation
- Proposal
- Design Details
- Production Readiness Review Questionnaire
- Implementation History
- Drawbacks
- Alternatives
- Infrastructure Needed (Optional)
Release Signoff Checklist
Items marked with (R) are required prior to targeting to a milestone / release.
- (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
- (R) KEP approvers have approved the KEP status as implementable
- (R) Design details are appropriately documented
- (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- e2e Tests for all Beta API Operations (endpoints)
- (R) Ensure GA e2e tests meet requirements for Conformance Tests
- (R) Minimum Two Week Window for GA e2e tests to prove flake free
- (R) Graduation criteria is in place
- (R) all GA Endpoints must be hit by Conformance Tests within one minor version of promotion to GA
- (R) Production readiness review completed
- (R) Production readiness review approved
- “Implementation History” section is up-to-date for milestone
- User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
- Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
Summary
The Dynamic Resource Allocation (DRA) API currently allows only scalar attribute values to describe device characteristics. However, many real-world device topologies require representing sets of relationships (e.g., a device adjacent to multiple PCIe roots or NUMA nodes). This KEP introduces support for list-typed attributes in ResourceSlice and extends (redefines) the semantics of ResourceClaim’s constraints[].{matchAttribute, distinctAttribute} so that they work with both list-typed attributes and the scalar attributes supported previously.
Motivation
The ResourceSlice API allows users to attach scalar attributes to devices. These can be used to allocate devices that share a common topology within the node. For certain types of topological relationships, scalar values are insufficient. For example, a CPU may be adjacent to multiple PCIe roots. This enhancement proposes allowing attributes to be lists. The semantics of the MatchAttribute and DistinctAttribute constraints must adapt to the possibility of lists. For example, rather than defining an attribute “match” as equality, it would be defined as a non-empty intersection, treating scalars as single-element lists. Conversely, “distinct” for lists would be defined as pairwise disjointness, i.e. no two devices’ lists share a value.
Goals
- Support typed lists as device attribute values.
- Extend (redefine) the semantics of ResourceClaim’s constraints[].{matchAttribute,distinctAttribute} fields so that they work with list-typed attribute values:
  - matchAttribute: defined as a non-empty intersection
  - distinctAttribute: defined as pairwise disjointness
  - note: scalar values are treated as single-element lists
- Keep the constraints monotonic.
  - Currently, the allocator’s algorithm assumes monotonic constraints only. Monotonic means that once a constraint returns false, adding more devices never causes it to return true. This bounds the computational complexity of searching for device combinations that satisfy the specified constraints. This KEP keeps the matchAttribute/distinctAttribute semantics monotonic.
- Maintain backward compatibility and interoperability for scalar-only attributes.
  - matchAttribute/distinctAttribute: existing constraints keep working because scalar values are treated as single-element lists.
  - CEL expressions in device selectors: when an attribute’s type changes from scalar to list, existing CEL expressions that compare it with a scalar fail to compile. To ease migration for users and DRA driver developers, we will provide a type-agnostic helper function.
Non-Goals
- Introducing generic or complex boolean logic in constraints (KEP-5254: DRA: Constraints with CEL).
- Forcing all drivers to use list attributes immediately.
Proposal
The proposal has two main parts:
- Add list types to DeviceAttribute so that DRA drivers can expose attribute values as typed lists (int, string, bool, version).
- Extend the semantics of the MatchAttribute/DistinctAttribute fields in DeviceConstraint:
  - For MatchAttribute:
    - Previously: it matches when the attribute values among candidate devices are identical (i.e. ∀i,j, v_i = v_j).
    - This KEP: it matches when the intersection (as a set) of all the list values among candidate devices is non-empty (i.e. ∩_k v_k ≠ ∅).
  - For DistinctAttribute:
    - Previously: it matches when all the attribute values among candidate devices are distinct (i.e. ∀i,j s.t. i ≠ j, v_i ≠ v_j).
    - This KEP: it matches when all the list values among candidate devices are pairwise disjoint (i.e. ∀i,j s.t. i ≠ j, v_i ∩ v_j = ∅).
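The redefined semantics above can be sketched in Go as a reference for how scalars and lists interact. This is a minimal, hypothetical sketch rather than the allocator's code: the `attrValue` type and the `matchAttribute`/`distinctAttribute` function names are assumptions for illustration, and only string values are modeled. A scalar is coerced to a single-element list before comparison, which is why scalar-only constraints keep their previous meaning.

```go
package main

import "fmt"

// attrValue is a hypothetical stand-in for a device attribute value: either a
// scalar (Scalar != nil) or a typed list. Only string values are modeled here.
type attrValue struct {
	Scalar *string
	List   []string
}

// asSet coerces a scalar to a single-element list and returns the values as a set.
func asSet(v attrValue) map[string]bool {
	set := map[string]bool{}
	if v.Scalar != nil {
		set[*v.Scalar] = true
		return set
	}
	for _, s := range v.List {
		set[s] = true
	}
	return set
}

// matchAttribute reports whether the intersection of all values is non-empty (∩ v_k ≠ ∅).
func matchAttribute(values []attrValue) bool {
	if len(values) == 0 {
		return true // no devices yet: nothing violates the constraint
	}
	common := asSet(values[0])
	for _, v := range values[1:] {
		next := asSet(v)
		for s := range common {
			if !next[s] {
				delete(common, s) // deleting during range is safe in Go
			}
		}
	}
	return len(common) > 0
}

// distinctAttribute reports whether the values are pairwise disjoint (∀i≠j, v_i ∩ v_j = ∅).
func distinctAttribute(values []attrValue) bool {
	seen := map[string]bool{}
	for _, v := range values {
		cur := asSet(v) // duplicates inside one device's list do not count
		for s := range cur {
			if seen[s] {
				return false
			}
		}
		for s := range cur {
			seen[s] = true
		}
	}
	return true
}

func scalar(s string) attrValue { return attrValue{Scalar: &s} }

func list(s ...string) attrValue { return attrValue{List: s} }

func main() {
	// Scalars keep their previous meaning: equal for match, different for distinct.
	fmt.Println(matchAttribute([]attrValue{scalar("a"), scalar("a")}))    // true
	fmt.Println(distinctAttribute([]attrValue{scalar("a"), scalar("b")})) // true
	// Lists: one shared element is enough to match, and breaks distinct.
	fmt.Println(matchAttribute([]attrValue{scalar("a"), list("a", "b")}))       // true
	fmt.Println(distinctAttribute([]attrValue{list("a", "b"), list("b", "c")})) // false
}
```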
API Changes
Introduce typed lists in DeviceAttribute
kind: ResourceSlice
spec:
  devices:
  - name: typed-list-attributes
    attributes:
      list-of-string:
        list:
          string: ["pci0000:00", "pci0000:01"]
      list-of-int:
        list:
          int: [0, 1, 2]
      list-of-bool:
        list:
          bool: [true, false, true]
      list-of-version:
        list:
          version: ["1.0.0", "1.0.1"]
Introduce .include function in CEL
When an attribute’s type changes from scalar to list, existing CEL expressions that compare it with a scalar no longer compile due to a type mismatch.
// This CEL expression won't compile if the type of attributes["foo"] changes from 1 (scalar) to [1] (list)
attributes["foo"] == 1
To maintain backward compatibility for existing CEL expressions, it might be possible to override comparison operators (==, etc.) so that, for a list-typed attribute, attributes["foo"] == 1 is equivalent to attributes["foo"] == [1]. We do not take this approach because it would not be idiomatic, would diverge from normal CEL type-system expectations, and would confuse anyone who already understands how the CEL type system is supposed to work.
Instead, although users need to rewrite their existing CEL expressions, we plan to provide a helper function, tentatively named .include, that works in a type-agnostic way to make CEL migration easier:
// assume attributes["foo"] is 1
attributes["foo"].include(1) --> true
// assume attributes["foo"] is [1]
attributes["foo"].include(1) --> true
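The intended behavior of the helper can be illustrated with a Go model of its semantics. This is only an illustration under assumptions: the real helper would be a CEL library function whose name (.include) is still tentative, and the `includes` function and `value` type below are hypothetical. The point is that a scalar attribute behaves like a single-element list, so the same expression works before and after a driver switches the attribute type.

```go
package main

import "fmt"

// value is a hypothetical attribute value: a scalar or a list of scalars.
// Scalars are modeled as `any` for brevity; real DRA attributes are typed.
type value struct {
	scalar any
	list   []any
	isList bool
}

// includes mirrors the proposed type-agnostic semantics: a scalar attribute
// is treated as a single-element list, so both forms answer the same question.
// Callers must use the same Go type on both sides (here int64), because
// int(1) and int64(1) compare as unequal when boxed in `any`.
func includes(attr value, x any) bool {
	if !attr.isList {
		return attr.scalar == x
	}
	for _, item := range attr.list {
		if item == x {
			return true
		}
	}
	return false
}

func main() {
	before := value{scalar: int64(1)}                  // attributes["foo"] is 1
	after := value{list: []any{int64(1)}, isList: true} // attributes["foo"] is [1]
	fmt.Println(includes(before, int64(1)), includes(after, int64(1))) // true true
}
```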
User Stories (Optional)
Story 1: Hardware-Topology-Aligned CPUs, GPUs, and NICs
Assume several DRA drivers expose the device attribute resource.kubernetes.io/pcieRoot:
apiVersion: resource.k8s.io/v1
kind: ResourceSlice
metadata:
  name: cpu
spec:
  driver: "cpu.example.com"
  pool:
    name: "cpu"
    resourceSliceCount: 1
  nodeName: node-1
  devices:
  - name: "cpu-0"
    attributes:
      resource.kubernetes.io/pcieRoot:
        list:
          string:
          - pci0000:01
          - pci0000:02
  - name: "cpu-1"
    attributes:
      resource.kubernetes.io/pcieRoot:
        list:
          string:
          - pci0000:03
          - pci0000:04
---
apiVersion: resource.k8s.io/v1
kind: ResourceSlice
metadata:
  name: gpu
spec:
  driver: "gpu.example.com"
  pool:
    name: "gpu"
    resourceSliceCount: 1
  nodeName: node-1
  devices:
  - name: "gpu-0"
    attributes:
      # Assume this driver is older and still exposes a string for the attribute
      resource.kubernetes.io/pcieRoot:
        string: pci0000:01
---
apiVersion: resource.k8s.io/v1
kind: ResourceSlice
metadata:
  name: nic
spec:
  driver: "nic.example.com"
  pool:
    name: "nic"
    resourceSliceCount: 1
  nodeName: node-1
  devices:
  - name: "nic-0"
    attributes:
      # Assume this driver is older and still exposes a string for the attribute
      resource.kubernetes.io/pcieRoot:
        string: pci0000:01
Then, a user can create a ResourceClaim that requests a PCIe-topology-aligned triple of CPU, GPU, and NIC like below:
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
spec:
  devices:
    requests:
    - name: "gpu"
      exactly:
        deviceClassName: gpu.example.com
        count: 1
    - name: "nic"
      exactly:
        deviceClassName: nic.example.com
        count: 1
    - name: "cpu"
      exactly:
        deviceClassName: cpu.example.com
        count: 1
    constraints:
    # "gpu-0", "nic-0", and "cpu-0" above can match
    # because
    # - "pci0000:01" is common to all of them.
    # - a string attribute is treated as a single-element list.
    - requests: ["gpu", "nic", "cpu"]
      matchAttribute: resource.kubernetes.io/pcieRoot
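To make the matching in this story concrete, the following Go sketch evaluates the non-empty-intersection rule over the pcieRoot values from the ResourceSlices above, with the scalar values of gpu-0 and nic-0 coerced to single-element lists. The `commonRoots` helper is hypothetical and only serves as a worked example: it shows that cpu-0 can be combined with gpu-0 and nic-0, whereas cpu-1 cannot.

```go
package main

import (
	"fmt"
	"sort"
)

// commonRoots returns the sorted intersection of the given pcieRoot lists.
// Scalar attributes are passed in as single-element lists.
func commonRoots(lists ...[]string) []string {
	counts := map[string]int{}
	for _, l := range lists {
		seen := map[string]bool{}
		for _, r := range l {
			if !seen[r] { // ignore duplicates within one device's list
				seen[r] = true
				counts[r]++
			}
		}
	}
	var out []string
	for r, n := range counts {
		if n == len(lists) { // present in every device's list
			out = append(out, r)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	gpu0 := []string{"pci0000:01"} // scalar coerced to a list
	nic0 := []string{"pci0000:01"} // scalar coerced to a list
	cpu0 := []string{"pci0000:01", "pci0000:02"}
	cpu1 := []string{"pci0000:03", "pci0000:04"}

	fmt.Println(commonRoots(gpu0, nic0, cpu0)) // [pci0000:01]: constraint satisfied
	fmt.Println(commonRoots(gpu0, nic0, cpu1)) // []: constraint violated
}
```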
Story 2
T.B.D.
Notes/Constraints/Caveats (Optional)
Risks and Mitigations
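- Risk 1: Driver adoption lag
  - Mitigation: scalar values are treated as single-element lists, so drivers that still publish scalars keep working alongside drivers that publish lists.
- Risk 2: Scheduler performance overhead
  - Mitigation: bound the length of list-typed attribute values.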
Design Details
Go Type Definitions
DeviceAttribute
type DeviceAttribute struct {
	...

	// ListValue is a typed list.
	//
	// +optional
	// +k8s:optional
	// +k8s:unionMember
	ListValue *ListAttribute `json:"list,omitempty"`
}

// ListAttribute defines a typed-list value for device attributes.
type ListAttribute struct {
	// IntValue is a list of numbers.
	//
	// +optional
	// +k8s:optional
	// +k8s:unionMember
	// +k8s:listType=atomic
	// +k8s:maxItems=64
	IntValue []int64 `json:"int,omitempty"`

	// BoolValue is a list of true/false values.
	//
	// +optional
	// +k8s:optional
	// +k8s:unionMember
	// +k8s:listType=atomic
	// +k8s:maxItems=64
	BoolValue []bool `json:"bool,omitempty"`

	// StringValue is a list of strings.
	// Each string must not be longer than 64 characters.
	//
	// +optional
	// +k8s:optional
	// +k8s:unionMember
	// +k8s:listType=atomic
	// +k8s:maxItems=64
	// +k8s:eachVal=+k8s:maxLength=64
	StringValue []string `json:"string,omitempty"`

	// VersionValue is a list of semantic versions according to semver.org spec 2.0.0.
	// Each version string must not be longer than 64 characters.
	//
	// +optional
	// +k8s:optional
	// +k8s:unionMember
	// +k8s:listType=atomic
	// +k8s:maxItems=64
	// +k8s:eachVal=+k8s:maxLength=64
	VersionValue []string `json:"version,omitempty"`
}
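The validation implied by the markers above (one union member, at most 64 items, each string or version at most 64 characters) could look roughly like the following hand-written Go sketch. It is an illustration under assumptions, not the generated validation code: `validateListAttribute` is a hypothetical name, the "exactly one non-empty member" rule is inferred from the `+k8s:unionMember` markers and `omitempty` tags, and semver parsing of VersionValue entries is omitted.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// ListAttribute mirrors the proposed Go type (validation markers omitted).
type ListAttribute struct {
	IntValue     []int64  `json:"int,omitempty"`
	BoolValue    []bool   `json:"bool,omitempty"`
	StringValue  []string `json:"string,omitempty"`
	VersionValue []string `json:"version,omitempty"`
}

const (
	maxItems     = 64 // +k8s:maxItems=64
	maxStringLen = 64 // +k8s:eachVal=+k8s:maxLength=64
)

// validateListAttribute is a hypothetical check of the markers above.
func validateListAttribute(l ListAttribute) error {
	lengths := []int{len(l.IntValue), len(l.BoolValue), len(l.StringValue), len(l.VersionValue)}
	set := 0
	for _, n := range lengths {
		if n > 0 {
			set++ // an empty list is indistinguishable from unset after omitempty
		}
		if n > maxItems {
			return fmt.Errorf("list has %d items, must have at most %d", n, maxItems)
		}
	}
	if set != 1 {
		return errors.New("exactly one of int, bool, string, or version must be set")
	}
	// Length limits apply to each string and version entry, counted in characters.
	for _, s := range append(append([]string{}, l.StringValue...), l.VersionValue...) {
		if utf8.RuneCountInString(s) > maxStringLen {
			return fmt.Errorf("value %q is longer than %d characters", s, maxStringLen)
		}
	}
	return nil
}

func main() {
	ok := ListAttribute{StringValue: []string{"pci0000:00", "pci0000:01"}}
	both := ListAttribute{IntValue: []int64{0}, BoolValue: []bool{true}}
	fmt.Println(validateListAttribute(ok))   // <nil>
	fmt.Println(validateListAttribute(both)) // exactly one of int, bool, string, or version must be set
}
```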
Implementation (for evaluating constraints)
Since both the non-empty-intersection and the pairwise-disjoint constraints are monotonic, we would not need to update the Allocator.Allocate() algorithm and can keep using the constraint interface. We will just extend the current matchAttributeConstraint and distinctAttributeConstraint instances. Alternatively, we could introduce new constraint instances for the proposed semantics (e.g., nonEmptyIntersectionMatchAttributeConstraint).
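Because the rule is monotonic, a constraint can be evaluated incrementally while the allocator adds devices, and a candidate can be rejected as soon as the check fails. The following Go sketch illustrates this for the non-empty-intersection rule. It assumes a hypothetical add/remove shape similar in spirit to the allocator's constraint interface; the type and method names are illustrative, not the actual allocator code. The running intersection can only shrink as devices are added, which is exactly why a failed check never becomes true again.

```go
package main

import "fmt"

// intersectionConstraint is a hypothetical incremental matchAttribute check.
// It keeps one running intersection per added device so that remove() can
// backtrack, as a depth-first allocator would.
type intersectionConstraint struct {
	stack []map[string]bool // stack[i] = ∩ of the first i+1 devices' values
}

// add tries to add a device's attribute values (a scalar is passed as a
// one-element slice). It returns false, leaving the state unchanged, if the
// intersection would become empty.
func (c *intersectionConstraint) add(values []string) bool {
	next := map[string]bool{}
	if len(c.stack) == 0 {
		for _, v := range values {
			next[v] = true
		}
	} else {
		top := c.stack[len(c.stack)-1]
		for _, v := range values {
			if top[v] {
				next[v] = true
			}
		}
	}
	if len(next) == 0 {
		return false
	}
	c.stack = append(c.stack, next)
	return true
}

// remove undoes the most recent successful add (LIFO, as in backtracking).
func (c *intersectionConstraint) remove() {
	c.stack = c.stack[:len(c.stack)-1]
}

func main() {
	var c intersectionConstraint
	fmt.Println(c.add([]string{"pci0000:01", "pci0000:02"})) // true
	fmt.Println(c.add([]string{"pci0000:01"}))               // true
	fmt.Println(c.add([]string{"pci0000:03"}))               // false: ∩ would be empty
	c.remove()                                               // backtrack the second device
	fmt.Println(c.add([]string{"pci0000:02"}))               // true: ∩ = {pci0000:02}
}
```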
Test Plan
[x] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.
Prerequisite testing updates
Unit tests
<package>:<date>-<test coverage>
Integration tests
e2e tests
Graduation Criteria
Alpha
- Feature implemented behind a feature flag (DRAListTypeAttributes). The feature gate is disabled by default.
- Documentation provided
- Initial unit, integration and e2e tests completed and enabled.
Beta
- The feature gate is enabled by default.
- No major outstanding bugs.
- 1 example of a real-world use case.
- Feedback collected from the community (developers and users) with adjustments provided, implemented and tested.
GA
- 2 examples of real-world use cases.
- Allowing time for feedback from developers and users.
Upgrade / Downgrade Strategy
Version Skew Strategy
For upgrade, existing ResourceClaims/ResourceSlices keep working as expected because they do not use the new fields.
For downgrade, caution is needed when ResourceSlices with list-type attribute values exist. Already allocated claims are not affected, but on re-allocation, allocation fails if the attribute specified in matchAttribute/distinctAttribute is list-typed.
Production Readiness Review Questionnaire
Feature Enablement and Rollback
How can this feature be enabled / disabled in a live cluster?
- Feature gate (also fill in values in kep.yaml)
  - Feature gate name: DRAListTypeAttributes
  - Components depending on the feature gate: kube-apiserver, kube-scheduler
- Other
  - Describe the mechanism:
  - Will enabling / disabling the feature require downtime of the control plane?
  - Will enabling / disabling the feature require downtime or reprovisioning of a node?
Does enabling the feature change any default behavior?
Basically, no. It only introduces new API fields in ResourceSlice (list-type values in DeviceAttribute), which do NOT change the default behavior as long as no device attribute changes its type.
However, please note that the outcome of ResourceClaim’s matchAttribute/distinctAttribute constraints CHANGES when a device attribute’s type changes from scalar to list.
Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
Yes. When the feature is disabled, DeviceAttributes with list-type values cannot be created, and existing list-type attribute values are ignored. However, if the attribute specified in matchAttribute/distinctAttribute is list-typed, allocation fails.
What happens if we reenable the feature if it was previously rolled back?
List-type attribute values in DeviceAttribute become available again, and matchAttribute/distinctAttribute in ResourceClaim evaluate them with the list-aware semantics again.
Are there any tests for feature enablement/disablement?
Yes, it will be covered by unit tests.
Rollout, Upgrade and Rollback Planning
How can a rollout or rollback fail? Can it impact already running workloads?
What specific metrics should inform a rollback?
Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
Monitoring Requirements
How can an operator determine if the feature is in use by workloads?
How can someone using this feature know that it is working for their instance?
- Events
- Event Reason:
- API .status
- Condition name:
- Other field:
- Other (treat as last resort)
- Details:
What are the reasonable SLOs (Service Level Objectives) for the enhancement?
What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
- Metrics
- Metric name:
- [Optional] Aggregation method:
- Components exposing the metric:
- Other (treat as last resort)
- Details:
Are there any missing metrics that would be useful to have to improve observability of this feature?
Dependencies
Does this feature depend on any specific services running in the cluster?
Scalability
Will enabling / using this feature result in any new API calls?
No
Will enabling / using this feature result in introducing new API types?
No
Will enabling / using this feature result in any new calls to the cloud provider?
No
Will enabling / using this feature result in increasing size or count of the existing API objects?
Yes and no. It adds new fields, which increase the worst-case size of ResourceSlice objects. However, the increase is bounded:
- ResourceSlice: linear in the number of devices defined in the resource, and the number of items in each list is also bounded.
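As a rough illustration of the per-attribute bound implied by the markers in Design Details (at most 64 items, each string or version at most 64 characters), the raw payload of a single list-typed string or version attribute is bounded as follows; JSON/protobuf encoding overhead is ignored here.

```latex
\[
  \underbrace{64}_{\text{max items}} \times \underbrace{64}_{\text{max characters per item}}
  = 4096 \ \text{characters per list-typed string or version attribute}
\]
```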
Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
Not expected. All the constraints proposed in this KEP are monotonic, so the worst-case computational complexity of the device search is unchanged.
Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, …) in any components?
No.
Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
No.
Troubleshooting
How does this feature react if the API server and/or etcd is unavailable?
What are other known failure modes?
What steps should be taken if SLOs are not being met to determine the problem?
Implementation History
Drawbacks
Alternatives
Just support formatted string list instead of introducing list type
We could add pseudo list type support only for string type attribute (e.g. comma separated string).
- Pros:
  - Simple, no change in DeviceAttribute
- Cons:
  - String lists only (can’t support lists of int/version).
  - Prone to misformatted strings
  - Extra parsing cost
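To illustrate the misformatted-string drawback, the following Go snippet models a naive parser for a hypothetical comma-separated pcieRoot value. The `naiveContains` helper is made up for this illustration. A stray space after the comma silently yields an entry that no longer equals the intended root, so a match fails without any validation error, and every consumer would have to agree on the same splitting and trimming rules.

```go
package main

import (
	"fmt"
	"strings"
)

// naiveContains treats raw as a comma-separated list without trimming spaces.
func naiveContains(raw, root string) bool {
	for _, p := range strings.Split(raw, ",") {
		if p == root {
			return true
		}
	}
	return false
}

func main() {
	fmt.Printf("%q\n", strings.Split("pci0000:01, pci0000:02", ",")) // ["pci0000:01" " pci0000:02"]

	fmt.Println(naiveContains("pci0000:01,pci0000:02", "pci0000:02"))  // true
	fmt.Println(naiveContains("pci0000:01, pci0000:02", "pci0000:02")) // false: leading space
}
```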
Introduce matchSemantics/distinctSemantics field for flexible/declarative match
Introduce matchSemantics/distinctSemantics fields into constraints field like this:
matchSemantics field
kind: ResourceClaim
spec:
  constraints:
  - requests: [ "device1", "device2", "device3" ]
    matchAttribute: "resource.kubernetes.io/pcieRoot"
    # [NEW]
    # An optional field that defines customized "match" semantics over attribute values.
    # This field must not be set when "distinctAttribute" is set.
    matchSemantics:
      # mode specifies the "match" semantics
      # Identical (∀i,j, v_i = v_j):
      #   All the attribute values among candidate devices are identical,
      #   supporting both list-order-sensitive and set-equivalence comparisons via `listMode`.
      # NonEmptyIntersection (|∩ v_i| >= k (>=1)):
      #   The intersection (as a set) of list values among candidate devices is non-empty.
      #   The required intersection size could be configurable via `minSize`.
      # For future possible cases:
      # - CommonPrefix/Suffix with customizable length
      # - Identical for aggregated values of the list items (min/max/sum/length)
      mode: Identical | NonEmptyIntersection
      options:
        nonEmptyIntersection:
          # if true, an implicit cast from scalar to list is performed. The default is false.
          coerceScalarToList: true | false
          # minSize specifies the minimum size of the intersection to evaluate as true.
          # Default is 1. The value must be a positive integer.
          minSize: 1
        identical:
          coerceScalarToList: true | false # common option
          # listMode specifies whether equality is evaluated as a set (order/duplicates ignored) or a list (order significant). Default is List.
          listMode: List | Set
Examples of match semantics mode:
| attribute values | Identical | NonEmptyIntersection (coerceScalarToList=true) |
|---|---|---|
| d1="a", d2="b" | false | false |
| d1=["a", "b"], d2=["b", "a"] | false (listMode: List), true (listMode: Set) | true (d1 ∩ d2 = {"a", "b"}) |
| d1=["a", "b"], d2=["a", "c"] | false | true (d1 ∩ d2 = {"a"}) |
| d1=["a", "b"], d2=["c", "d"] | false | false (d1 ∩ d2 = ∅) |
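The modes in this alternative can be sketched in Go to show how `listMode` and `minSize` change the outcomes in the table above. This is a hypothetical illustration of a rejected alternative: values are modeled as string lists with scalars already coerced, and the `identical` and `nonEmptyIntersection` function names are made up for the sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// toSet returns the distinct values of a list.
func toSet(l []string) map[string]bool {
	set := map[string]bool{}
	for _, v := range l {
		set[v] = true
	}
	return set
}

// identical models the Identical mode: ∀i,j, v_i = v_j.
// setMode=true models listMode: Set (order and duplicates ignored);
// setMode=false models listMode: List (order significant).
func identical(lists [][]string, setMode bool) bool {
	canonical := func(l []string) string {
		if setMode {
			uniq := []string{}
			for v := range toSet(l) {
				uniq = append(uniq, v)
			}
			sort.Strings(uniq)
			l = uniq
		}
		return fmt.Sprintf("%q", l) // quoting keeps the encoding unambiguous
	}
	if len(lists) < 2 {
		return true
	}
	first := canonical(lists[0])
	for _, l := range lists[1:] {
		if canonical(l) != first {
			return false
		}
	}
	return true
}

// nonEmptyIntersection models the NonEmptyIntersection mode: |∩ v_i| >= minSize.
func nonEmptyIntersection(lists [][]string, minSize int) bool {
	if len(lists) == 0 {
		return true
	}
	common := toSet(lists[0])
	for _, l := range lists[1:] {
		cur := toSet(l)
		for v := range common {
			if !cur[v] {
				delete(common, v)
			}
		}
	}
	return len(common) >= minSize
}

func main() {
	swapped := [][]string{{"a", "b"}, {"b", "a"}}
	fmt.Println(identical(swapped, false), identical(swapped, true)) // false true
	fmt.Println(nonEmptyIntersection(swapped, 1))                     // true

	partial := [][]string{{"a", "b"}, {"a", "c"}}
	fmt.Println(nonEmptyIntersection(partial, 1), nonEmptyIntersection(partial, 2)) // true false
}
```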
distinctSemantics field
kind: ResourceClaim
spec:
  constraints:
  - requests: [ "device1", "device2", "device3" ]
    distinctAttribute: "resource.kubernetes.io/numaNode" # note: this is an imaginary attribute.
    # [NEW]
    # An optional field that defines customized "distinct" semantics over attribute values.
    # This field must not be set when "matchAttribute" is set.
    distinctSemantics:
      # mode specifies the "distinct" semantics
      # `AllDistinct`:
      #   All the values are distinct, supporting both list-order-sensitive and set-equivalence comparisons via `listMode`.
      #   (i.e. ∀i,j s.t. i ≠ j, v_i != v_j)
      # `EmptyIntersection`:
      #   The intersection (as a set) of all the list values among candidate devices is empty. (i.e. ∩ v_k = ∅)
      # `PairwiseDisjoint`:
      #   Every pair of the list values (as sets) of candidate devices is disjoint (i.e. no overlap at all).
      #   (i.e. ∀i,j s.t. i ≠ j, v_i ∩ v_j = ∅)
      # For future possible cases:
      # - NoCommonPrefix/Suffix, PairwiseDisjointPrefix/Suffix with customizable length
      # - AllDistinct for aggregated values of the list items (min/max/sum/length)
      mode: AllDistinct | EmptyIntersection | PairwiseDisjoint
      options:
        allDistinct:
          coerceScalarToList: true | false # common option
          # listMode specifies whether equality is evaluated as a set (order/duplicates ignored) or a list (order significant). Default is List.
          listMode: List | Set
        emptyIntersection:
          coerceScalarToList: true | false # common option
        pairwiseDisjoint:
          coerceScalarToList: true | false # common option
Examples of distinct semantics mode:
| attribute values | AllDistinct | PairwiseDisjoint (coerceScalarToList=true) | EmptyIntersection (coerceScalarToList=true) |
|---|---|---|---|
| d1="a", d2="b" | true | true | true |
| d1=["a", "b"], d2=["b", "a"] | true (listMode: List), false (listMode: Set) | false (d1 ∩ d2 = {"a", "b"}) | false (∩ dk = {"a", "b"}) |
| d1=["a", "b"], d2=["a", "c"], d3=["a", "d"] | true | false (di ∩ dj = {"a"} ≠ ∅) | false (∩ dk = {"a"} ≠ ∅) |
| d1=["a", "b"], d2=["b", "c"], d3=["c", "a"] | true | false (di ∩ dj ≠ ∅) | true (∩ dk = ∅) |
| d1=["a", "b"], d2=["c", "d"], d3=["e", "f"] | true | true (di ∩ dj = ∅) | true (∩ dk = ∅) |
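The three distinct modes of this alternative can be sketched in Go as follows, again as a hypothetical illustration with scalars already coerced to single-element lists; the function names are made up for the sketch. The example reproduces one row of the table above and also checks how EmptyIntersection behaves as devices are added.

```go
package main

import (
	"fmt"
	"sort"
)

// toSet returns the distinct values of a list.
func toSet(l []string) map[string]bool {
	set := map[string]bool{}
	for _, v := range l {
		set[v] = true
	}
	return set
}

// canonical encodes a list for equality checks; setMode ignores order and duplicates.
func canonical(l []string, setMode bool) string {
	if setMode {
		uniq := []string{}
		for v := range toSet(l) {
			uniq = append(uniq, v)
		}
		sort.Strings(uniq)
		l = uniq
	}
	return fmt.Sprintf("%q", l)
}

// allDistinct models AllDistinct: ∀i≠j, v_i ≠ v_j, compared as lists or as sets.
func allDistinct(lists [][]string, setMode bool) bool {
	seen := map[string]bool{}
	for _, l := range lists {
		k := canonical(l, setMode)
		if seen[k] {
			return false
		}
		seen[k] = true
	}
	return true
}

// emptyIntersection models EmptyIntersection: ∩ v_k = ∅ over all candidate devices.
func emptyIntersection(lists [][]string) bool {
	if len(lists) == 0 {
		return true
	}
	common := toSet(lists[0])
	for _, l := range lists[1:] {
		cur := toSet(l)
		for v := range common {
			if !cur[v] {
				delete(common, v)
			}
		}
	}
	return len(common) == 0
}

// pairwiseDisjoint models PairwiseDisjoint: ∀i≠j, v_i ∩ v_j = ∅.
func pairwiseDisjoint(lists [][]string) bool {
	seen := map[string]bool{}
	for _, l := range lists {
		cur := toSet(l)
		for v := range cur {
			if seen[v] {
				return false
			}
		}
		for v := range cur {
			seen[v] = true
		}
	}
	return true
}

func main() {
	// d1=["a","b"], d2=["b","c"], d3=["c","a"]
	ring := [][]string{{"a", "b"}, {"b", "c"}, {"c", "a"}}
	fmt.Println(allDistinct(ring, false), pairwiseDisjoint(ring), emptyIntersection(ring)) // true false true

	two := [][]string{{"a", "b"}, {"a", "c"}}
	fmt.Println(emptyIntersection(two))                             // false: ∩ = {"a"}
	fmt.Println(emptyIntersection(append(two, []string{"b", "c"}))) // true: ∩ = ∅
}
```

The last two lines show that EmptyIntersection, unlike PairwiseDisjoint, is not monotonic: a false result can become true when another device is added, which conflicts with the monotonicity goal stated in this KEP.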
Pros/Cons
- Pros:
- Flexible
- Declarative
- Extensible
- Cons:
  - Too complex, given that there are no use cases yet that justify the complexity
Unified semantics field instead of matchSemantics/distinctSemantics
We could consider a unified semantics field for both matchAttribute and distinctAttribute like below:
semantics:
  mode: NonEmptyIntersection | EmptyIntersection | Identical | AllDistinct | PairwiseDisjoint
- Pros:
  - Simple
- Cons:
  - Confusing which mode is valid for matchAttribute or distinctAttribute
  - Extra validation logic