KEP-5313: PlacementDecision API
- Release Signoff Checklist
- Summary
- Motivation
- Proposal
- Design Details
- Production Readiness Review Questionnaire
- Implementation History
- Drawbacks
- Alternatives
- Infrastructure Needed (Optional)
Release Signoff Checklist
Items marked with (R) are required prior to targeting to a milestone / release.
- (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
- (R) KEP approvers have approved the KEP status as implementable
- (R) Design details are appropriately documented
- (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- e2e Tests for all Beta API Operations (endpoints)
- (R) Ensure GA e2e tests meet requirements for Conformance Tests
- (R) Minimum Two Week Window for GA e2e tests to prove flake free
- (R) Graduation criteria is in place
- (R) all GA Endpoints must be hit by Conformance Tests
- (R) Production readiness review completed
- (R) Production readiness review approved
- “Implementation History” section is up-to-date for milestone
- User-facing documentation has been created in kubernetes/website , for publication to kubernetes.io
- Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
Summary
Today every multicluster scheduler publishes its own API to convey where a workload should run, forcing downstream tools such as GitOps engines, workload orchestrators, progressive rollout controllers, or AI/ML pipelines to understand a scheduler-specific API.
This KEP introduces a vendor-neutral PlacementDecision API that standardizes
the output of multicluster placement calculations. A PlacementDecision object is data only:
a namespaced list of chosen clusters whose referenced names must map one-to-one to ClusterProfiles
as defined by the ClusterProfile API.
Any scheduler can emit the object and any consumer can watch it.
What this API standardizes: The “which clusters” answer (the decision output).
What this API does NOT standardize: How to request placement (the input remains vendor-specific), or how consumers deploy workloads to the selected clusters.
This focused scope means:
- Consumers write ONE integration to read decisions from ANY scheduler
- Schedulers write ONE output format that works with ALL consumers
- Integration reduction: Instead of every consumer needing custom code for every scheduler, each component only integrates once with the standard API
- Example: 10 consumers and 10 schedulers need 100 separate integrations today (10 × 10), but only 20 integrations (10 + 10) with this API
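The arithmetic behind that example can be written out as a small sketch. The function names are hypothetical and exist only to make the two counting rules explicit: one integration per consumer-scheduler pair today, versus one integration per component against a shared API.

```go
package main

// pairwiseIntegrations counts integrations when every consumer needs
// custom code for every scheduler it wants to support.
func pairwiseIntegrations(consumers, schedulers int) int {
	return consumers * schedulers
}

// standardIntegrations counts integrations when each consumer and each
// scheduler integrates once with a shared decision API.
func standardIntegrations(consumers, schedulers int) int {
	return consumers + schedulers
}
```

With 10 consumers and 10 schedulers, the first function yields 100 and the second yields 20, matching the example above.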
Important clarifications:
- This is interface standardization, not full decoupling: Schedulers still need vendor-specific APIs for placement requests, and consumers still need to understand how to deploy workloads. What’s standardized is ONLY the “which clusters” output.
- Both schedulers and consumers remain workload-aware: Schedulers need to understand workload requirements (GPUs, storage, etc.) to make decisions, and consumers need to know how to deploy those workloads. This API is the simple “handoff” between them.
- Plugin reduction: Instead of “Argo CD plugin for OCM + Argo CD plugin for Clusternet + Argo CD plugin for Fleet”, you get “Argo CD reads PlacementDecision” (works with all).
Workload correlation is optional. When a decision is tied to a specific workload,
producers may label the PlacementDecision with the workload’s placement key.
Decisions not tied to a workload are supported
(for example, a controller continuously publishes a reusable decision stream for consumers).
flowchart LR
Initiator["Initiator (Users, higher level systems)"] -->
PL["Placement (concept/vendor-specific and not specified in this KEP)"]
PL --> S["Scheduler/PlacementController"]
S -- writes --> PD["PlacementDecisions (this KEP)"]
Initiator--> WL["Workload (with the placement key)"]
WL --> Tools["AI/ML Pipeline<br>GitOps Engine<br>Progressive Rollout Controller<br>Workload Orchestrator"]
Tools -- reads from --> PD
Tools -- performs actions on -->
Spokes["Spoke/Managed Clusters<br>- cluster1<br>- cluster2<br>- cluster3<br>- etc"]
Motivation
A typical multicluster setup involves two key components:
- Scheduler: examines the fleet (ClusterProfile objects) and other signals/metrics, and decides where a workload should land.
- Consumer: a GitOps engine, workload orchestrator, progressive rollout controller, or AI/ML pipeline reads that decision and acts (for example, by creating Work objects).
Today’s problem: Every scheduler publishes its own API to convey placement decisions, forcing each consumer to write custom integration code for every scheduler they want to support.
Without this API, each consumer must implement separate code to read each scheduler’s decision format:
- Argo CD needs custom code for OCM, separate code for Clusternet, separate code for Fleet, etc.
- MultiKueue needs the same separate integrations for each scheduler
- Flux needs the same separate integrations for each scheduler
- Every new consumer added must write all these integrations again
- Every new scheduler added requires updates to all existing consumers
This creates an integration explosion where the total integration burden grows with every new consumer or scheduler added to the ecosystem.
With this API, the burden is reduced:
- Each consumer writes ONE integration that reads the standard PlacementDecision format
- Each scheduler writes ONE output format following the standard
- New consumers work with all existing schedulers automatically
- New schedulers work with all existing consumers automatically
Problems this solves:
- Vendor lock-in: Switching schedulers no longer requires rewriting consumer integrations
- Slow adoption: New consumers can support all schedulers by implementing one standard interface
- Duplicated effort: The same “read cluster list” logic no longer needs to be written separately for every consumer-scheduler pair
- RBAC complexity: One resource type to secure instead of different permissions for each scheduler’s API
Concrete example:
- Today: Argo CD must write and maintain separate code to understand OCM’s PlacementDecision, Clusternet’s FeedInventory, and Fleet’s ClusterResourcePlacement, each with a different schema
- With this KEP: Argo CD writes one integration that reads standard PlacementDecision objects and works with all schedulers
Goals
This API aims to solve the following problems:
- Reduce integration burden: Enable consumers to write ONE integration that works with ANY scheduler implementing this API, instead of writing separate integrations for each scheduler.
- Enable scheduler portability: Allow organizations to switch schedulers (e.g., from OCM to Clusternet) without rewriting consumer code.
- Enable consumer portability: Allow new consumers (e.g., Argo CD, MultiKueue) to work with all schedulers by implementing one standard API.
- Simplify RBAC: Provide one resource schema for consumers to get/list/watch, instead of different permissions for each scheduler’s API.
- Support placement delegation: Enable external placement controllers to be plugged into any consumer.
- Align with ClusterProfile: Ensure direct mapping to ClusterProfile inventory for cluster references.
Technical goals:
- Define a namespaced, minimalistic, data-only PlacementDecision API that lists selected clusters.
- Support continuous rescheduling where decision lists may be updated over time.
- Guarantee that every (clusterNamespace, clusterName) pair matches a ClusterProfile in the fleet (enforced via admission).
- Provide label and naming conventions so consumers can retrieve all slices of one decision via label selector or deterministic naming.
- Leave room for scheduler-specific implementations and workload-aware placement logic.
Non-Goals
This API intentionally does NOT solve the following:
- NOT standardizing placement requests: How to request placement (the input) remains vendor-specific. Each scheduler keeps its own API (OCM Placement, Clusternet Subscription, Fleet ClusterResourcePlacement, etc.). Organizations will still need to use their chosen scheduler’s API to request placement.
- NOT eliminating scheduler workload awareness: Schedulers still need to understand workload requirements (GPUs, storage, regions, compliance, etc.) to make placement decisions. This API only standardizes the output format.
- NOT eliminating consumer workload awareness: Consumers still need to understand how to deploy specific workload types to the selected clusters. They still need “one integration per consumer type” (e.g., Argo CD knows how to deploy Applications), but they don’t need “one integration per scheduler.”
- NOT describing scheduling logic: Internal details of how a scheduler made its choice remain implementation-specific.
- NOT describing cluster access: How consumers access selected clusters or authenticate to them is out of scope.
- NOT replacing Work API: This API does not replace the Work API, which is responsible for actually applying workloads to clusters.
- NOT embedding orchestration logic: Consumer feedback, rollout strategies, or deployment orchestration logic do not belong in PlacementDecision.
- NOT handling custom CRD scheduling: For custom CRDs with special scheduling needs, the scheduler handles the custom logic and still outputs a standard PlacementDecision. The scheduler must understand the custom workload requirements.
- NOT providing full decoupling: This is interface standardization. Both schedulers and consumers still need to understand the workload being placed, but they no longer need to understand each other’s proprietary decision APIs. You can’t eliminate all integration code, but you standardize the interface.
Why define PlacementDecision before a standardized Placement API?
The producer/consumer swap we want most is at the decision interface:
any scheduler can publish the same simple, data-only result and any consumer can read it and act.
How to request scheduling (the Placement spec) is much more complex since it needs to cover all the scheduling
scenarios and will take much longer to define.
Defining PlacementDecision first allows for the following:
- Consumers can adopt one reader that works for all the vendors that support this API.
- Vendors can define their own custom placement spec/logic without coupling consumers.
- Simple RBAC, because there is only one resource schema on which to secure get/list/watch for consumers.
Proposal
This section describes the high-level approach to solving the integration explosion problem.
The proposal is to introduce a standard, read-only PlacementDecision API that acts as the interface between schedulers and consumers. Schedulers write decisions using this format; consumers read from it. This creates a clean separation: schedulers focus on intelligent cluster selection, consumers focus on reliable workload deployment, and neither needs to know about the other’s implementation details.
Key characteristics:
- Data-only resource: No business logic, just a list of selected clusters
- Namespace-scoped: Aligns with Work and ClusterProfile for consistent RBAC
- Sliceable: Can represent decisions with hundreds of clusters using multiple slice objects
- Optional correlation: Can be tied to specific workloads via labels, or published as generic streams
- Read-only for consumers: Clear ownership model prevents conflicts
User Stories
Story 1: GPU-aware AI training
Initiator: ML platform / pipeline for a specific training job
Workload: PyTorch training job requiring 8 GPUs
Flow:
- Initiator creates a vendor-specific placement request (e.g., OCM Placement with GPU requirements)
- Initiator creates/labels the training Job manifest with multicluster.x-k8s.io/placement-key="training-job-resnet50-123"
- ML scheduler scores ClusterProfiles by available GPUs, GPU type, cost, network latency, etc.
- Scheduler writes PlacementDecision with label multicluster.x-k8s.io/placement-key="training-job-resnet50-123" listing chosen clusters (e.g., gpu-cluster-west, gpu-cluster-east)
- GitOps/Work-API syncer watches decisions, finds the training Job by matching placement-key, deploys to selected clusters
- If GPUs become unavailable or cost spikes, scheduler updates the PlacementDecision with new clusters; syncer reconciles
Key point: Scheduler understands GPU requirements (workload-aware), consumer just reads “deploy to these clusters”
Story 2: Progressive rollout
Initiator: Progressive rollout controller managing canary deployment of my-service v2.0
Workload: Kubernetes Deployment with new version (release: “v2.0-canary”)
Flow:
- Rollout controller creates placement request for phase 1: “10% of clusters” (vendor-specific API)
- Labels the v2.0 Deployment with multicluster.x-k8s.io/placement-key="my-service-v2.0-canary"
- Scheduler (or rollout controller acting as scheduler) creates PlacementDecision with:
  - Label: multicluster.x-k8s.io/placement-key="my-service-v2.0-canary" (this ties decision to the specific release)
  - Label: multicluster.x-k8s.io/decision-key="my-service-v2.0-rollout" (for correlating multiple slices if needed)
  - Clusters: [cluster-us-west-1] (10% of fleet)
- Consumer (GitOps engine) watches decisions, finds the one matching placement-key “my-service-v2.0-canary”, deploys v2.0 to cluster-us-west-1
- After validation, rollout controller updates placement request to “50% of clusters”
- Scheduler updates the SAME PlacementDecision object (still keyed to “my-service-v2.0-canary”) with more clusters: [cluster-us-west-1, cluster-us-east-1, cluster-eu-west-1, cluster-ap-south-1, cluster-ap-east-1]
- Consumer reconciles: deploys to 4 new clusters
Key point: The release identifier (“v2.0-canary”) is encoded in the placement-key value, allowing the consumer to match the correct workload to its placement decision. Progressive expansion happens by updating the cluster list in the same keyed decision object.
Story 3: Disaster recovery
Initiator: DR controller / policy owner
Workload: Critical database service with primary/standby pattern
Flow:
- DR policy labels the database Deployment with multicluster.x-k8s.io/placement-key="db-primary"
- DR controller (acting as scheduler) creates PlacementDecision with:
  - Label: multicluster.x-k8s.io/placement-key="db-primary"
  - Clusters: [prod-us-east-1] (healthy primary)
- DR controller continuously monitors ClusterProfile status for health signals
- Primary cluster fails, then DR controller updates PlacementDecision:
  - Clusters: [prod-us-west-2] (promoted standby)
- Consumer (workload syncer) reconciles: deletes workload from failed cluster, creates on standby
- When primary recovers, DR controller may update decision again to fail back
Key point: No workload-specific request needed; DR controller directly writes decisions based on cluster health
Story 4: Self produce and self consume (Argo CD)
Initiator: Argo CD ApplicationSet with custom placement logic
Workload: ApplicationSet generates Applications for frontend-app
Flow:
- ApplicationSet generator includes placement logic (acts as scheduler)
- Generator creates PlacementDecision with:
  - Label: multicluster.x-k8s.io/placement-key="frontend-app-prod"
  - Label: multicluster.x-k8s.io/decision-key="frontend-app-placement"
  - Clusters: [cluster-1, cluster-2, cluster-3]
- Custom ApplicationSet generator (or controller) reads the same
PlacementDecision - Generates Argo Applications for each cluster in the decision
- Argo CD’s standard reconciliation deploys to those clusters
Key point: Single tool can produce and consume decisions; enables modular architecture within one system
Story 5: Multiple consumers fan-out
Initiator: Platform team managing multi-region deployment
Workload: Microservice with observability, security scanning, and deployment needs
Flow:
- Scheduler creates ONE PlacementDecision with:
  - Label: multicluster.x-k8s.io/placement-key="payment-service"
  - Clusters: [prod-us-1, prod-eu-1, prod-ap-1]
- GitOps consumer reads decision and deploys application manifests to all 3 clusters
- Security scanner consumer reads decision and schedules vulnerability scans on all 3 clusters
- Observability consumer reads decision and configures monitoring dashboards for all 3 clusters
- Backup consumer reads decision and sets up backup policies for all 3 clusters
Key point: Multiple independent consumers act on ONE decision; no need to run placement logic 4 times or keep 4 lists in sync
Notes/Constraints/Caveats
Additional Benefits: Placement Delegation and Modular Architecture
An important secondary benefit of this API is enabling placement delegation across different architectural patterns:
External Placement Controllers:
- Systems like Argo CD or MultiKueue can delegate placement decisions to external specialized schedulers
- Example: MultiKueue can set DispatcherName to reference an external scheduler, which then writes PlacementDecision objects
- The consumer doesn’t need to implement its own placement logic; it just consumes the decision
- Multiple consumers can leverage the same sophisticated placement algorithm without duplicating code
Pluggable Scheduler Architectures:
- Organizations can develop specialized schedulers (cost-optimizing, GPU-aware, compliance-aware) as separate controllers
- Any consumer tool (Argo CD, Flux, custom orchestrators) can benefit from these schedulers through the standard API
- No need to fork or modify consumer tools to add new scheduling capabilities
Separation of Concerns:
- Scheduler teams focus on optimal cluster selection algorithms
- Consumer teams focus on reliable workload deployment and management
- Clear API boundary prevents coupling between these layers
This architectural flexibility wasn’t the primary goal but emerges naturally from the standardized interface.
Risks and Mitigations
Design Details
This section provides the technical specification of the PlacementDecision API.
Terminology
Placement: A scheduler request that asks “where should this workload run?”. Not standardized here and may not exist as a resource.
Scheduling decision: The resolved set of target clusters at a point in time.
Placement key: A correlation string to associate the placement request/decision with a workload when applicable. It is carried in the multicluster.x-k8s.io/placement-key label and applied on the workload and its children. Producers may also put this label on PlacementDecision slices when the decision is workload scoped. (Decisions not tied to a workload need not set this label.)
Decision key: An opaque correlation string chosen by implementers to group decision slices. When used, it is carried in the multicluster.x-k8s.io/decision-key label.
Scheduler: A controller that writes PlacementDecisions based on ClusterProfiles and scheduling/placement requirements/specs.
Consumer: Any controller (GitOps engine, workload orchestrator, progressive rollout controller, AI/ML pipeline) that watches PlacementDecisions and acts.
API Specification
Scope: Namespace scoped for RBAC parity with Work and ClusterProfile.
Design principle: The resource is pure data following EndpointSlice convention.
Size limits: Maximum 100 ClusterDecision entries per slice keeps objects well below etcd limit.
Validation: A webhook may verify that every (clusterNamespace, clusterName)
pair has a matching ClusterProfile in the fleet.
If multicluster.x-k8s.io/decision-index is set, its value should be an integer >= 0.
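The checks listed above could be expressed roughly as follows. This is a minimal sketch rather than the webhook's actual implementation: `clusterRef`, `validateSlice`, and the `known` lookup are hypothetical stand-ins, and a real webhook would resolve ClusterProfiles through the API server instead of an in-memory map.

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	maxDecisionsPerSlice = 100
	decisionIndexLabel   = "multicluster.x-k8s.io/decision-index"
)

// clusterRef is a simplified stand-in for a (clusterNamespace, clusterName) pair.
type clusterRef struct {
	Namespace string
	Name      string
}

// validateSlice applies the size, index, and ClusterProfile checks to one slice.
// known is a hypothetical set of ClusterProfiles that exist in the fleet.
func validateSlice(labels map[string]string, refs []clusterRef, known map[clusterRef]bool) error {
	if len(refs) > maxDecisionsPerSlice {
		return fmt.Errorf("slice has %d entries, limit is %d", len(refs), maxDecisionsPerSlice)
	}
	// The index label is optional; validate it only when present.
	if raw, ok := labels[decisionIndexLabel]; ok {
		idx, err := strconv.Atoi(raw)
		if err != nil || idx < 0 {
			return fmt.Errorf("%s must be an integer >= 0, got %q", decisionIndexLabel, raw)
		}
	}
	for _, ref := range refs {
		if !known[ref] {
			return fmt.Errorf("no ClusterProfile %s/%s in the fleet", ref.Namespace, ref.Name)
		}
	}
	return nil
}
```

Returning on the first failure keeps the sketch short; an admission webhook might prefer to collect all violations before rejecting the object.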
API Definition
// metav1 is k8s.io/apimachinery/pkg/apis/meta/v1; corev1 is k8s.io/api/core/v1.

// PlacementDecision publishes the set of clusters chosen by a scheduler at a point in time.
type PlacementDecision struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	// Up to 100 ClusterDecisions per object (slice) to stay well below the etcd limit.
	// +kubebuilder:validation:MinItems=0
	// +kubebuilder:validation:MaxItems=100
	Decisions []ClusterDecision `json:"decisions"`

	// Optional: Name of the scheduler that created this decision.
	// +optional
	SchedulerName string `json:"schedulerName,omitempty"`
}

// Optional: when a decision spans multiple slices, this label links all slices to the same decision.
const DecisionKeyLabel = "multicluster.x-k8s.io/decision-key"

// Optional: label that indicates the index position of this slice when order matters.
const DecisionIndexLabel = "multicluster.x-k8s.io/decision-index"

// Optional: label that links a decision to an originating workload when applicable.
const PlacementKeyLabel = "multicluster.x-k8s.io/placement-key"

// ClusterDecision references a target ClusterProfile to apply workloads to.
type ClusterDecision struct {
	// Reference to the target ClusterProfile.
	ClusterProfileRef corev1.ObjectReference `json:"clusterProfileRef"`

	// Optional: Reason why this cluster was chosen.
	// +optional
	Reason string `json:"reason,omitempty"`
}
API Example
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: PlacementDecision
metadata:
  name: app-placement-decision-0
  namespace: argocd
  labels:
    # Optional: present when the decision is tied to a workload
    multicluster.x-k8s.io/placement-key: "my-app"
    # Optional: if this logical decision spans multiple slices
    multicluster.x-k8s.io/decision-key: "argocd-app-placement-decision"
    # Optional: ordering hint when order matters across slices
    multicluster.x-k8s.io/decision-index: "0"
schedulerName: multicluster-placement-controller
decisions:
  - clusterProfileRef:
      apiVersion: multicluster.x-k8s.io/v1alpha1
      kind: ClusterProfile
      namespace: fleet1
      name: cluster1
    reason: "GPUs available"
  - clusterProfileRef:
      apiVersion: multicluster.x-k8s.io/v1alpha1
      kind: ClusterProfile
      namespace: fleet1
      name: cluster2
    reason: "GPUs available"
Implementation Details
This section describes how the API should be implemented in practice.
Consumer Discovery and Usage
Consumers can discover and use PlacementDecision in one of the following ways:
Label selector (recommended)
- If the decision is workload scoped, the producer may set multicluster.x-k8s.io/placement-key=<placement-key> on slices. Consumers can list/watch with labelSelector=multicluster.x-k8s.io/placement-key=<placement-key> in the namespace.
- If ordering matters and results span multiple slices, the producer should set multicluster.x-k8s.io/decision-index=<0..N> and consumers can sort by that label.
- When multiple slices exist for one logical decision, the producer MUST set the same multicluster.x-k8s.io/decision-key=<decision-key> on all slices.
- To avoid assembling partially updated sets during reschedules, consumers SHOULD also group by a common multicluster.x-k8s.io/decision-revision value across slices.
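As an illustration of the label-selector path, the sketch below builds the selector string and reassembles an ordered cluster list from slices that share a decision key. `decisionSlice`, `placementSelector`, and `assemble` are hypothetical names, the slice type is a simplified stand-in for the real object, and grouping by decision-revision is left out to keep the example short.

```go
package main

import (
	"sort"
	"strconv"
)

const (
	decisionKeyLabel   = "multicluster.x-k8s.io/decision-key"
	decisionIndexLabel = "multicluster.x-k8s.io/decision-index"
	placementKeyLabel  = "multicluster.x-k8s.io/placement-key"
)

// decisionSlice is a simplified stand-in for one PlacementDecision object.
type decisionSlice struct {
	Name     string
	Labels   map[string]string
	Clusters []string
}

// placementSelector builds the labelSelector value a consumer could pass to list/watch.
func placementSelector(placementKey string) string {
	return placementKeyLabel + "=" + placementKey
}

// sliceIndex reads the decision-index label; a missing or invalid label counts as 0.
func sliceIndex(s decisionSlice) int {
	n, err := strconv.Atoi(s.Labels[decisionIndexLabel])
	if err != nil {
		return 0
	}
	return n
}

// assemble concatenates the clusters of all slices that carry decisionKey,
// ordered by their decision-index label.
func assemble(slices []decisionSlice, decisionKey string) []string {
	var group []decisionSlice
	for _, s := range slices {
		if s.Labels[decisionKeyLabel] == decisionKey {
			group = append(group, s)
		}
	}
	sort.SliceStable(group, func(i, j int) bool {
		return sliceIndex(group[i]) < sliceIndex(group[j])
	})
	var clusters []string
	for _, s := range group {
		clusters = append(clusters, s.Clusters...)
	}
	return clusters
}
```

Sorting numerically rather than lexically matters once there are ten or more slices, since label values are strings and "10" would otherwise sort before "2".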
Deterministic naming
- Producer uses a predictable naming scheme (<base>-<slice-index>), and the consumer gets slices by name, or lists the namespace and filters by name prefix.
- When using naming for grouping, the consumer is responsible for correlating all slices that share the same base.
Controllers may implement both options simultaneously.
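The deterministic naming option can be sketched as a pair of helpers, one for the producer and one for the consumer. The function names are hypothetical, and the sketch assumes the slice index is always the final dash-separated segment of the name.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// sliceName builds a name following the <base>-<slice-index> scheme.
func sliceName(base string, index int) string {
	return fmt.Sprintf("%s-%d", base, index)
}

// parseSliceName splits a name into its base and slice index.
// ok is false when the name does not end in "-<number>".
func parseSliceName(name string) (base string, index int, ok bool) {
	i := strings.LastIndex(name, "-")
	if i <= 0 || i == len(name)-1 {
		return "", 0, false
	}
	n, err := strconv.Atoi(name[i+1:])
	if err != nil {
		return "", 0, false
	}
	return name[:i], n, true
}
```

A consumer that lists a namespace could call parseSliceName on each object name and keep the slices whose base matches the decision it is interested in.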
Slicing
- Following EndpointSlice design, a single scheduling decision can fan out to N PlacementDecision slices, each limited to 100 clusters (EndpointSlice’s default).
- To correlate slices, producers MUST:
  - set the same multicluster.x-k8s.io/decision-key=<decision-key> on all slices when more than one slice exists.
- Producers may also:
  - set multicluster.x-k8s.io/placement-key=<placement-key> on slices when the decision is workload scoped.
- If a scheduler needs to preserve the order of selected clusters and the result spans multiple slices, it should label each PlacementDecision with multicluster.x-k8s.io/decision-index=<index> where <index> starts at 0 and increments by 1. Consumers that require ordering can sort by this label.
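On the producer side, the slicing rules above amount to chunking an ordered cluster list and labeling each chunk. The following is a sketch under simplifying assumptions: `decisionSlice` and `splitIntoSlices` are hypothetical, clusters are plain names rather than ClusterProfile references, and both labels are always set even when a single slice would suffice.

```go
package main

import "strconv"

const (
	maxPerSlice        = 100
	decisionKeyLabel   = "multicluster.x-k8s.io/decision-key"
	decisionIndexLabel = "multicluster.x-k8s.io/decision-index"
)

// decisionSlice is a simplified stand-in for one PlacementDecision object.
type decisionSlice struct {
	Labels   map[string]string
	Clusters []string
}

// splitIntoSlices partitions clusters into chunks of at most maxPerSlice,
// preserving order. Every chunk carries the shared decision key and a
// zero-based index. An empty input produces no slices.
func splitIntoSlices(decisionKey string, clusters []string) []decisionSlice {
	var out []decisionSlice
	for start := 0; start < len(clusters); start += maxPerSlice {
		end := start + maxPerSlice
		if end > len(clusters) {
			end = len(clusters)
		}
		out = append(out, decisionSlice{
			Labels: map[string]string{
				decisionKeyLabel:   decisionKey,
				decisionIndexLabel: strconv.Itoa(len(out)),
			},
			Clusters: clusters[start:end],
		})
	}
	return out
}
```

For 250 clusters this yields three slices holding 100, 100, and 50 entries with indexes 0, 1, and 2.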
Lifecycle
Create: The scheduler creates one or more slices with the list of clusters in the decision. To enable discovery, it should choose either or both:
- Label selector correlation: set multicluster.x-k8s.io/decision-key=<decision-key> on every slice when there are multiple slices; optionally set multicluster.x-k8s.io/placement-key=<placement-key> when workload scoped, and multicluster.x-k8s.io/decision-index when order matters.
- Deterministic naming correlation: use a deterministic naming pattern and set multicluster.x-k8s.io/decision-index when order matters (label is optional).
The scheduler may populate the reason for each decision for debugging/auditing.
Update / Reschedule: The scheduler may add or remove clusters in decisions at any time. If the number of target clusters crosses the 100 limit, it must create or delete slices to maintain the slicing rule. If order changes, update decision-index values accordingly so consumers can detect the new order.
Consumer Actions on Updates:
- Clusters Added: Consumer should deploy workloads to the newly added clusters (for example, create Work objects targeting new clusters).
- Clusters Removed: Consumer should remove workloads from clusters no longer in the decision list (for example, delete Work objects, drain workloads).
If heavy churn is a concern, a scheduler may treat decisions as an unordered set and maintain it in a deterministic order (for example, alphabetical sorting). When the cluster set itself has not changed, this stable ordering produces an identical list of clusters, so the API server skips the write and no extra change events reach consumers.
Delete: When a scheduling decision is no longer required (application/workload lifecycle ended, policy changes, or scheduler shutdown/replacement), the scheduler deletes every related PlacementDecision slice. Consumers should react to the delete event and remove any workload previously applied to the listed clusters.
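The consumer actions on updates reduce to a set difference between the previous and the current cluster list. A minimal sketch follows, assuming cluster names are unique within each list; `diffClusters` is a hypothetical helper, and its results are sorted so repeated runs produce the same output.

```go
package main

import "sort"

// diffClusters compares the previously observed cluster list with the
// current one and reports which clusters were added and which were removed.
// Both results are sorted alphabetically.
func diffClusters(prev, next []string) (added, removed []string) {
	prevSet := make(map[string]bool, len(prev))
	for _, c := range prev {
		prevSet[c] = true
	}
	nextSet := make(map[string]bool, len(next))
	for _, c := range next {
		nextSet[c] = true
		if !prevSet[c] {
			added = append(added, c) // deploy workloads here
		}
	}
	for _, c := range prev {
		if !nextSet[c] {
			removed = append(removed, c) // remove workloads here
		}
	}
	sort.Strings(added)
	sort.Strings(removed)
	return added, removed
}
```

A delete event can be handled with the same helper by passing an empty current list, which reports every previously listed cluster as removed.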
Ownership
- The scheduler that creates the PlacementDecision owns the object. It is solely responsible for all writes (create, update, patch, delete). The consumers of the PlacementDecision MUST treat the object as read only (get, list, watch).
- RBAC will enforce this contract by granting the scheduler write verbs on PlacementDecisions, while limiting consumers to read-only access.
Relationship to other SIG-Multicluster (SIG-MC) APIs
- ClusterProfile: The inventory. Each decision must reference a ClusterProfile with a matching name.
- Work API: The workload. A consumer may read a PlacementDecision and then create a Work for each cluster.
Consumer Feedback
Consumer feedback is intentionally out of scope for PlacementDecision. The PlacementDecision object’s sole purpose is to publish the scheduler’s chosen cluster list. Once it has been created, it should be treated as read-only by consumers.
Allowing consumers to update the same PlacementDecision would complicate lifecycle ownership (whether the scheduler or the consumer is responsible for adding, updating, or removing cluster entries). It would also complicate security/permissions because a malicious consumer could update the decision and move workloads to unintended clusters.
When consumers need to provide feedback to the scheduler, they should do so through a separate channel like events, metrics, or a purpose-built PlacementFeedback API so they have clear write authority and the scheduler can decide what to do with that feedback.
Test Plan
[x] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.
Unit tests for CRD defaults/validation.
Ensuring slice size <= 100 and required labels exist.
Prerequisite testing updates
Unit tests
<package>:<date>-<test coverage>
Integration tests
:
e2e tests
:
Graduation Criteria
Alpha
- A CRD definition and generated client.
- A dummy controller and unit test to validate the CRD and client.
Beta
- Gather feedback from users during the Alpha stage to identify any issues, limitations, or areas for improvement. Address this feedback by making the necessary changes to the API and iterating on its design and functionality.
- At least two providers and one consumer using the PlacementDecision API.
- Conformance test suite for schedulers.
- Metrics for slice count and QPS.
- Backwards compatible field/label stability.
GA
- N examples of real-world usage
- N installs
- More rigorous forms of testing, e.g., downgrade tests and scalability tests
- Allowing time for feedback
- Stability: The API should demonstrate stability in terms of its reliability.
- Functionality: The API should provide the necessary functionality for multicluster scheduling, including the ability to distribute workloads across clusters. This should be validated through a series of functional tests and real-world use cases.
- Integration: Ensure that the API can be easily integrated with popular workload distribution tools, such as GitOps and Work API. This may involve developing plugins or extensions for these tools or providing clear guidelines on how to integrate them with the unified API.
- Performance and Scalability: Conduct performance and scalability tests to ensure that the API can handle a large number of clusters and workloads without degrading its performance. This may involve stress testing the API with a high volume of requests or simulating large-scale deployments.
Note: Generally we also wait at least two releases between beta and GA/stable, because there’s no opportunity for user feedback, or even bug reports, in back-to-back releases.
Upgrade / Downgrade Strategy
Additive-only until GA; optional fields carry defaults; no disruptive schema changes.
Version Skew Strategy
Older consumers ignore unknown fields; older schedulers remain valid. Label contracts stable from alpha.
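One way a Go consumer gets this tolerance is through the default behavior of encoding/json, which silently skips fields that the target struct does not declare. The sketch below is illustrative only: `oldDecision` and `decodeOld` are hypothetical and model a consumer built against an earlier schema that knows nothing beyond the cluster references.

```go
package main

import "encoding/json"

// oldDecision models a consumer compiled against an earlier schema that
// only knows the decisions list and the cluster reference inside it.
type oldDecision struct {
	Decisions []struct {
		ClusterProfileRef struct {
			Namespace string `json:"namespace"`
			Name      string `json:"name"`
		} `json:"clusterProfileRef"`
	} `json:"decisions"`
}

// decodeOld parses an object that may carry fields this consumer does not
// know about. json.Unmarshal ignores such fields by default.
func decodeOld(data []byte) (oldDecision, error) {
	var d oldDecision
	err := json.Unmarshal(data, &d)
	return d, err
}
```

A consumer that opts into strict decoding (for example with json.Decoder.DisallowUnknownFields) would lose this tolerance, so strict mode is best avoided on the read path.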
Production Readiness Review Questionnaire
Feature Enablement and Rollback
How can this feature be enabled / disabled in a live cluster?
- Feature gate (also fill in values in kep.yaml)
- Feature gate name:
- Components depending on the feature gate:
- Other
- Describe the mechanism:
- Will enabling / disabling the feature require downtime of the control plane?
- Will enabling / disabling the feature require downtime or reprovisioning of a node?
Does enabling the feature change any default behavior?
- No default Kubernetes behavior is currently planned to be based on this feature; it is designed to be used by the separately installed, out-of-tree, multicluster management providers and consumers.
Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
- Yes, as this feature only describes a CRD, it can most directly be disabled by uninstalling the CRD.
What happens if we reenable the feature if it was previously rolled back?
Are there any tests for feature enablement/disablement?
- As a dependency only for an out-of-tree component, there will not be e2e tests for feature enablement/disablement of this CRD in core Kubernetes. The e2e test can be provided by multicluster management providers who support this API.
Rollout, Upgrade and Rollback Planning
How can a rollout or rollback fail? Can it impact already running workloads?
What specific metrics should inform a rollback?
Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
Monitoring Requirements
How can an operator determine if the feature is in use by workloads?
How can someone using this feature know that it is working for their instance?
- Events
- Event Reason:
- API .status
- Condition name:
- Other field:
- Other (treat as last resort)
- Details:
What are the reasonable SLOs (Service Level Objectives) for the enhancement?
What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
- Metrics
- Metric name:
- [Optional] Aggregation method:
- Components exposing the metric:
- Other (treat as last resort)
- Details:
Are there any missing metrics that would be useful to have to improve observability of this feature?
Dependencies
Does this feature depend on any specific services running in the cluster?
Scalability
Will enabling / using this feature result in any new API calls?
Will enabling / using this feature result in introducing new API types?
Will enabling / using this feature result in any new calls to the cloud provider?
Will enabling / using this feature result in increasing size or count of the existing API objects?
Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, …) in any components?
Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
Troubleshooting
How does this feature react if the API server and/or etcd is unavailable?
What are other known failure modes?
What steps should be taken if SLOs are not being met to determine the problem?
Implementation History
Drawbacks
Alternatives
Status quo: every multicluster provider/scheduler ships its own API, which leads to consumer bloat and vendor lock-in.
Extending Work API: overloads a workload syncer API with scheduling details, which couples the where with the what.