KEP-2149: ClusterID for ClusterSet Identification

Implementation History
BETA Implementable
Created 2020-11-13
Latest v1.28
Milestones
Alpha v1.26
Beta v1.28
Ownership

Release Signoff Checklist

Items marked with (R) are required prior to targeting to a milestone / release.

  • (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
  • (R) KEP approvers have approved the KEP status as implementable
  • (R) Design details are appropriately documented
  • (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
  • (R) Graduation criteria is in place
  • (R) Production readiness review completed
  • Production readiness review approved
  • “Implementation History” section is up-to-date for milestone
  • User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
  • Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

Summary

The new multi-cluster services API (see KEP-1645) expanded the ways clusters can communicate with each other and organized them into ClusterSets, but as of now there is no way for a cluster to be uniquely identified in a Kubernetes-native way. This document by SIG-Multicluster proposes a standard for how cluster IDs should be stored and managed, based on concrete use cases discussed and observed in ClusterSet deployments. While existing implementations may not currently abide by this standard, or plan to, future expansions to the Multi-Cluster API will be designed on top of this standard and existing MCS API implementations are encouraged to adopt it.

Motivation

That there must be some way to identify individual clusters in a multi-cluster deployment has felt like a given to SIG-Multicluster; it has been discussed in a broad sense previously (see this doc), and was scoped down in response to actual observed use cases in the latest community discussion on which this KEP is based (doc). The motivation of this KEP is to provide a flexible but useful baseline for cluster id that can work with the known use cases (see the User Stories section).

Existing implementations of the MCS API may have addressed the need for a cluster id in their own ways, inconsistent with this current standard. It is the perspective of SIG-Multicluster that future additions to the MCS API will depend when necessary on the proposal laid out here, and existing implementations are encouraged to migrate any existing cluster id assignment and storage mechanism to fit within the specifications of this KEP.

Goals

  • Propose a standard for how cluster identification metadata should be stored and managed as Kubernetes resources
  • Define the standard to be strict enough to be useful in the following user stories:
    • Establish reliable coordinates for determining clusterset membership and identity of a cluster within its cluster set
    • Enable disambiguation of DNS names for multicluster Headless services with the same hostnames
    • Facilitate enrichment of log / event / metrics data with cluster id / set coordinates

Non-Goals

  • Define any characteristics of the system that tracks cluster ids within a ClusterSet (i.e. a cluster registry)
  • Solve any problems without specific, tangible use cases (though we will leave room for extension).
  • In particular, this KEP explicitly does not consider
    • a cluster joining multiple ClusterSets
    • how or whether users should be able to specify aliases for cluster ids and what they could be used for

Proposal

Overview

This proposal defines a new cluster-scoped ClusterProperty resource for storing cluster-level metadata. The primary justification is to enable identification of a cluster and its relevant properties within a cluster set, but there is no intention to limit general use of ClusterProperty to multi-cluster scenarios.

Each cluster in a ClusterSet must be assigned a unique identifier, that lives at least as long as that cluster is a member of the given ClusterSet, and must not be changed for that same lifetime. This identifier will be stored in a ClusterProperty CR with the well known name cluster.clusterset.k8s.io that may be referenced by workloads within the cluster. The identifier must be a valid RFC-1123 DNS subdomain, and should be less than 128 characters in total.

While it is a member of a ClusterSet, a cluster must also have an additional clusterset.k8s.io ClusterProperty which describes its current membership. This property must be present as long as the cluster’s membership in a ClusterSet lasts, and removed when the cluster is no longer a member.

More detail and examples of the uniqueness, lifespan, immutability, and content requirements for both the cluster.clusterset.k8s.io ClusterProperty and clusterset.k8s.io ClusterProperty are described further below. The goal of these requirements is to provide the MCS API with a cluster id useful enough to address known user stories without being too restrictive or prescriptive.

User Stories

ClusterSet membership

I have some set of clusters working together and need a way to uniquely identify them within the system that I use to track membership, or determine if a given cluster is in a ClusterSet.

For example, SIG-Cluster-Lifecycle’s Cluster API subproject uses a management cluster to deploy resources to member workload clusters, but today member workload clusters do not have a way to identify their own management cluster or any interesting metadata about it, such as what cloud provider it is hosted on.

Joining or moving between ClusterSets

I want the ability to add a previously-isolated cluster to a ClusterSet, or to move a cluster from one ClusterSet to another and be aware of this change.

Multi-Cluster Services

I have a headless multi-cluster service deployed across clusters in my ClusterSet with similarly named pods in each cluster. I need a way to disambiguate each backend pod via DNS.

For example, an exported headless service with service name myservice in namespace test, backed by pods in two clusters with cluster ids clusterA and clusterB, could be disambiguated by different DNS names following the pattern <clusterid>.<svc>.<ns>.svc.clusterset.local: clusterA.myservice.test.svc.clusterset.local. and clusterB.myservice.test.svc.clusterset.local.. This way the user can implement whatever load balancing they want (as is usually the case with headless services) by targeting each cluster’s available backends directly.
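
The naming pattern above can be sketched in a few lines of Python. This is a minimal illustration rather than part of the MCS API: the function name `headless_backend_name` is hypothetical, and the cluster ids, service name, and namespace are the ones used in the example above.

```python
def headless_backend_name(cluster_id: str, service: str, namespace: str) -> str:
    """Compose <clusterid>.<svc>.<ns>.svc.clusterset.local for one cluster (hypothetical helper)."""
    return f"{cluster_id}.{service}.{namespace}.svc.clusterset.local"


# One DNS name per member cluster backing the exported headless service.
names = [
    headless_backend_name(cluster_id, "myservice", "test")
    for cluster_id in ("clusterA", "clusterB")
]
for name in names:
    print(name)
```

A client that wants its own load-balancing policy could resolve each of these names separately and choose among the per-cluster backends.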

Diagnostics

Clusters within my ClusterSet send logs/metrics to a common monitoring solution and I need to be able to identify the cluster from which a given set of events originated.

Multi-tenant controllers

My controller interacts with multiple clusters and needs to disambiguate between them to process its business logic.

For example, CAPN’s virtualcluster project is implementing a multi-tenant scheduler that schedules tenant namespaces only in certain parent clusters, and a separate syncer running in each parent cluster controller needs to compare the name of the parent cluster to determine whether the namespace should be synced (ref).

ClusterProperty CRD

The ClusterProperty Kind provides a way to store cluster scoped information while creating flexibility for implementations. The initial use case is to support multi-cluster tooling, but a ClusterProperty may be used to store any cluster-scoped data. A cluster may have multiple ClusterPropertys, each holding a different identification related value. Each property contains the following information:

  • Name - a well known or custom name to identify the property. This is the metadata.Name of the resource.
  • Value - a property-dependent string, up to 128k Unicode code points (see Notes/Constraints/Caveats section). This is the one and only field in this Kind.

The schema for ClusterProperty is intentionally loose to support multiple forms of information, including arbitrary additional identification related properties described by users (see “Additional Properties”, below), but certain well-known properties will add additional schema constraints, such as those described in the next section.

Well known properties

The ClusterProperty CRD will support two specific properties under the well known names cluster.clusterset.k8s.io and clusterset.k8s.io. Being “well known” means that they must conform to the requirements described below, and therefore can be depended on by multi-cluster implementations to achieve use cases dependent on knowledge of a cluster’s id or ClusterSet membership.

The requirements below use the keywords must, should, and may purposefully in accordance with RFC-2119.

Property: cluster.clusterset.k8s.io

Contains a unique identifier for the containing cluster.

Uniqueness
  • The identifier must be unique within the ClusterSet to which its cluster belongs for the duration of the cluster’s membership.
  • The identifier may be globally unique beyond the scope of its ClusterSet.
  • The identifier may be unique beyond the span of its cluster’s membership and lifetime.
Lifespan
  • The identifier must exist and not be changed for the duration of a cluster’s membership in a ClusterSet, and as long as a clusterset.k8s.io property referring to that cluster in that ClusterSet exists.
Contents
  • The identifier must be a valid RFC-1123 DNS subdomain and should be less than 128 characters in total. This may be used to compose larger DNS names (e.g. in the case of multi-cluster services), so care should be taken to ensure that the final names fit into the limit of 253 characters.
  • The identifier may be used as a component in MCS DNS.
  • The identifier may be a human readable description of its cluster.
Consumers
  • May rely on the identifier existing, unmodified for the entire duration of its membership in a ClusterSet.
  • Should watch the cluster.clusterset.k8s.io property to handle potential changes if they live beyond the ClusterSet membership.
  • May rely on the existence of an identifier for clusters that do not belong to a ClusterSet so long as the implementation provides one.
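
The content requirements above can be sketched as a small validator. This is an illustration under stated assumptions, not a reference implementation: the function name `check_cluster_id` is hypothetical, and the regular expression follows the lowercase alphanumeric, "-" and "." pattern that Kubernetes conventionally uses for DNS subdomain names. Violations of a "must" are returned as errors and violations of a "should" as warnings.

```python
import re

# Assumption: lowercase labels of alphanumerics and '-', joined by '.',
# each label starting and ending with an alphanumeric character.
_LABEL = r"[a-z0-9]([-a-z0-9]*[a-z0-9])?"
_SUBDOMAIN_RE = re.compile(rf"{_LABEL}(\.{_LABEL})*")

MAX_DNS_NAME_LENGTH = 253        # overall limit for a DNS name
RECOMMENDED_ID_LENGTH = 128      # "should be less than 128 characters"


def check_cluster_id(value):
    """Return (errors, warnings) for a candidate cluster id (hypothetical helper)."""
    errors, warnings = [], []
    if len(value) > MAX_DNS_NAME_LENGTH:
        errors.append("longer than 253 characters")
    if not _SUBDOMAIN_RE.fullmatch(value):
        errors.append("not a valid RFC-1123 DNS subdomain")
    if len(value) >= RECOMMENDED_ID_LENGTH:
        warnings.append("should be less than 128 characters")
    return errors, warnings


print(check_cluster_id("cluster-1"))
print(check_cluster_id("Cluster_1"))
```

Keeping the 128-character guidance as a warning rather than an error reflects that the text above uses "should" for it and "must" only for the subdomain format.
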
Notable scenarios

Reusing cluster names: Since a cluster.clusterset.k8s.io ClusterProperty has no restrictions on whether or not its value can be reused, a cluster that unregisters from a ClusterSet is permitted under this standard to rejoin later with the same cluster.clusterset.k8s.io ClusterProperty it had before. Similarly, a different cluster could join a ClusterSet with the same cluster.clusterset.k8s.io ClusterProperty that had been used by another cluster previously, as long as both do not have membership in the same ClusterSet at the same time. Finally, two or more clusters may have the same cluster.clusterset.k8s.io ClusterProperty concurrently (though they should not; see “Uniqueness” above) as long as they do not have membership in the same ClusterSet.

Property: clusterset.k8s.io

Contains an identifier that relates the containing cluster to the ClusterSet to which it belongs.

Lifespan
  • The identifier must exist and be immutable for the duration of a cluster’s membership in a ClusterSet.
  • The identifier must not exist when the cluster is not a member of a ClusterSet.
Contents
  • The identifier must associate the cluster with a ClusterSet.
Consumers
  • May rely on the identifier existing, unmodified for the entire duration of its membership in a ClusterSet.
  • Should watch the clusterset property to detect the span of a cluster’s membership in a ClusterSet.
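
As a rough sketch of how a consumer might read the two well known properties together (for example, to tag logs or metrics with cluster coordinates), the following Python works on a plain dictionary that maps ClusterProperty names to their values. The function name `membership_coordinates` and the dictionary representation are assumptions made for illustration; a real consumer would read the objects through a Kubernetes client.

```python
CLUSTER_ID_PROPERTY = "cluster.clusterset.k8s.io"
CLUSTERSET_PROPERTY = "clusterset.k8s.io"


def membership_coordinates(properties):
    """Return (cluster_id, clusterset) from a {name: value} mapping (hypothetical helper).

    clusterset is None when the cluster is not a member of a ClusterSet.
    """
    cluster_id = properties.get(CLUSTER_ID_PROPERTY)
    clusterset = properties.get(CLUSTERSET_PROPERTY)
    # The cluster id must exist for as long as the membership property exists.
    if clusterset is not None and cluster_id is None:
        raise ValueError("clusterset.k8s.io is set but cluster.clusterset.k8s.io is missing")
    return cluster_id, clusterset


print(membership_coordinates({
    "cluster.clusterset.k8s.io": "cluster-1",
    "clusterset.k8s.io": "environ-1",
}))
```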

Additional Properties

Implementers are free to add additional properties as they see fit, so long as they do not conflict with the well known properties and utilize a suffix. The following suffixes are reserved for Kubernetes and related projects: .k8s.io, .kubernetes.io. For example, an implementation may utilize the Kind ClusterProperty to store objects with the name fingerprint.example.com but not fingerprint.k8s.io. Cluster operators are free to use non-namespaced properties (e.g. fingerprint) as they see fit, but any shared tooling should use appropriately namespaced names.
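
The naming rules above can be sketched as a small classifier. This is only an illustration: the function name `classify_property_name` and the four category labels are hypothetical, and treating any name that contains a "." as "namespaced" is a simplifying assumption.

```python
WELL_KNOWN_NAMES = {"cluster.clusterset.k8s.io", "clusterset.k8s.io"}
RESERVED_SUFFIXES = (".k8s.io", ".kubernetes.io")


def classify_property_name(name):
    """Classify a ClusterProperty name under the rules above (hypothetical helper)."""
    if name in WELL_KNOWN_NAMES:
        return "well-known"
    if name.endswith(RESERVED_SUFFIXES):
        return "reserved"        # reserved for Kubernetes and related projects
    if "." in name:
        return "namespaced"      # e.g. an implementer's own domain suffix
    return "non-namespaced"      # left to cluster operators


for name in ("fingerprint.example.com", "fingerprint.k8s.io", "fingerprint"):
    print(name, "->", classify_property_name(name))
```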

Notes/Constraints/Caveats

Note: On ClusterProperty.value max length validation

Prior Kubernetes API constructs in core k/k containing arbitrary string values, such as annotations, are limited by a byte length. The CRD system exposes two built-in (as in, non-webhook) methods for expressing validation rules against CRDs: CustomResourceValidation, also known as structural schema, via OpenAPIv3 schema validation (as of Kubernetes version 1.14.7), and CEL, also known as the x-kubernetes-validations extension (as of Kubernetes version 1.25). Both systems define strings as Unicode code points, so any validation for maxLength will be based on number of code points, NOT on input byte count. As a result, this specification can only express the limits on ClusterProperty.value length in terms of Unicode code points, regardless of which of these two validation methods is used (and, to maximize Kubernetes version compatibility, using structural schema over CEL is advised). Note that this may not be the same as the number of perceived characters (for example, flag emojis such as “🇺🇸” appear as 1 character but take up 2 code points) nor the number of bytes used to represent it in a given encoding (that same emoji uses 8 bytes in both UTF-8 and UTF-16, or 10 bytes in UTF-16 when a byte order mark is included).

Practically, the encoded length of the string in bytes as observed on input or output by the user may vary depending on which of the valid JSON encodings are used (UTF-8, UTF-16, or UTF-32). Therefore, the value limit of 128k code points could take up to 512KB using the least space efficient allowable encoding, UTF-32, which uses 4 bytes per code point.
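
The difference between code points and encoded bytes can be checked directly in Python, where `len()` on a string counts code points. The sketch below uses the flag emoji from the note above; reading "128k" as 128 × 1024 code points is an assumption made for the arithmetic.

```python
# The flag is two "regional indicator" code points rendered as one glyph.
flag = "\U0001F1FA\U0001F1F8"

code_points = len(flag)                          # Python counts code points
utf8_bytes = len(flag.encode("utf-8"))
utf16_bytes = len(flag.encode("utf-16-le"))      # no byte order mark
utf16_bom_bytes = len(flag.encode("utf-16"))     # Python prepends a 2-byte BOM
utf32_bytes = len(flag.encode("utf-32-le"))      # no byte order mark

# Worst case for the value limit: every code point stored as 4 bytes (UTF-32).
MAX_VALUE_CODE_POINTS = 128 * 1024               # assumption: "128k" = 128 * 1024
worst_case_bytes = MAX_VALUE_CODE_POINTS * 4

print(code_points, utf8_bytes, utf16_bytes, utf16_bom_bytes, utf32_bytes)
print(worst_case_bytes // 1024, "KiB")
```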

Strings occupy their encoded length in bytes both in storage and in transit over REST. Regarding storage limits, 512KB is within the 1.5 MiB default maximum request size for etcd. There is no apparent enforcement of request size limits by a vanilla Kubernetes API server outside of the PodSecurity admission controller (which only applies to Pod.Spec, and for reference is 3MiB). The most comparable upstream limit is for resource annotation values, which must be within 256KB (enforced with custom validation in k/k), and which supports a use case amenable to smaller value sizes than ClusterProperty.value.

Risks and Mitigations

Design Details

Rationale behind the ClusterProperty CRD

This proposal suggests a CRD composed of objects all of the same Kind ClusterProperty, and that are distinguished using certain well known values in their metadata.name fields. This design avoids cluster-wide singleton Kinds for each property, reduces access competition for the same metadata by making each property its own resource (instead of all in one), allows for RBAC to be applied in a targeted way to individual properties, and supports the user prerogative to store other simple metadata in one centralized CRD by creating CRs of the same Kind ClusterProperty but with their own names.

Storing arbitrary facts about a cluster can be implemented in other ways. For example, the Cluster API subproject stopgapped their need for cluster name metadata by leveraging the existing Node Kind and storing metadata there via annotations, such as cluster.x-k8s.io/cluster-name (ref). While practical for their case, this KEP avoids adding cluster-level info as annotations on child resources so as not to be dependent on a child resource’s existence, to avoid issues maintaining parity across multiple resources of the same Kind for identical metadata, and to maintain RBAC separation between the cluster-level metadata and the child resources. Even within the realm of implementing as a CRD, the API design could focus on distinguishing each fact by utilizing different spec.Types (as Service objects do e.g. spec.type=ClusterIP or spec.type=ExternalName), or even more strictly, each as a different Kind. The former provides no specific advantages since multiple differently named properties for the same fact are unnecessary, and is less expressive to query (it is easier to query by name directly like kubectl get clusterproperties cluster.clusterset.k8s.io). The latter would result in the proliferation of cluster-wide singleton Kind resources, and be burdensome for users to create their own custom properties.

Implementing the ClusterProperty CRD and its admission controllers

cluster.clusterset.k8s.io ClusterProperty

The actual implementation to select and store the identifier of a given cluster could occur local to the cluster. It does not necessarily ever need to be deleted, particularly if the identifier selection mechanism chooses an identifier that is compliant with this specification’s most broad restrictions – namely, being immutable for a cluster’s lifetime and unique beyond just the scope of the cluster’s membership. A recommended option that meets these broad restrictions is a cluster’s kube-system.uuid.

That being said, for less stringent identifiers, for example a user-specified and human-readable value, a given cluster.clusterset.k8s.io ClusterProperty may need to change if an identical identifier is in use by another member of the ClusterSet it wants to join. It is likely this would need to happen outside the cluster-local boundary; for example, whatever manages memberships would likely need to deny the incoming cluster, and potentially assign (or prompt the cluster to assign itself) a new id.

Since this KEP does not formally mandate that the cluster id must be immutable for the lifetime of the cluster, only for the lifetime of its membership in a ClusterSet, any dependent tooling explicitly cannot assume the cluster.clusterset.k8s.io ClusterProperty for a given cluster will stay constant on its own merit. For example, log aggregation of a given cluster id based on this property should only be trusted to be referring to the same cluster for as long as it has one ClusterSet membership; similarly, controllers whose logic depends on distinguishing clusters by cluster id can only trust this property to disambiguate the same cluster for as long as the cluster has one ClusterSet membership.
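
The immutability rule described above (the id may change over the cluster's lifetime, but not while it is a member of a ClusterSet) can be sketched as a single check. The function name `validate_id_change` and its arguments are hypothetical; an actual implementation would typically enforce this in an admission step.

```python
def validate_id_change(old_id, new_id, clusterset):
    """Return None if the change is allowed, else an error string (hypothetical helper).

    clusterset is the current value of the clusterset.k8s.io property,
    or None when the cluster is not a member of any ClusterSet.
    """
    if clusterset is not None and old_id != new_id:
        return (
            f"cluster id cannot change from {old_id!r} to {new_id!r} "
            f"while the cluster is a member of ClusterSet {clusterset!r}"
        )
    return None


print(validate_id_change("cluster-1", "cluster-2", None))         # allowed
print(validate_id_change("cluster-1", "cluster-2", "environ-1"))  # rejected
```

Dependent tooling that keys data on the cluster id can apply the same reasoning in reverse: two records with the same id are only known to refer to the same cluster if they fall within one ClusterSet membership.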

Despite this flexibility in the KEP, cluster ids may still be useful before ClusterSet membership needs to be established; again, particularly if the implementation chooses the broadest restrictions regarding immutability and uniqueness. Therefore, having a controller that initializes it early in the lifecycle of the cluster, and possibly as part of cluster creation, may be a useful place to implement it, though within the bounds of this KEP that is not strictly necessary.

The most common discussion point within the SIG regarding whether an implementation should favor a UUID or a human-readable cluster id string is when it comes to DNS. Since DNS names are originally intended to be a human readable technique of address, clunky DNS names composed from long UUIDs seems like an anti-pattern, or at least unfinished. While some extensions to this spec have been discussed as ways to leverage the best parts of both (ex. using labels on the cluster.clusterset.k8s.io ClusterProperty to store aliases for DNS), an actual API specification to allow for this is outside the scope of this KEP at this time (see the Non-Goals section).

# An example object of `cluster.clusterset.k8s.io ClusterProperty` 
# using a kube-system ns uuid as the id value (recommended above):

apiVersion: about.k8s.io/v1
kind: ClusterProperty
metadata:
  name: cluster.clusterset.k8s.io
spec:
  value: 721ab723-13bc-11e5-aec2-42010af0021e

# An example object of `cluster.clusterset.k8s.io ClusterProperty`
# using a human-readable string as the id value:

apiVersion: about.k8s.io/v1
kind: ClusterProperty
metadata:
  name: cluster.clusterset.k8s.io
spec:
  value: cluster-1

clusterset.k8s.io ClusterProperty

A cluster in a ClusterSet is expected to be authoritatively associated with that ClusterSet by an external process and storage mechanism with a purview above the cluster local boundary, whether that is some form of a cluster registry, some peer-to-peer distributed consensus and membership tracking, or just a human running kubectl. (The details of any specific mechanism are out of scope for the MCS API and this KEP; see the Non-Goals section.) Mirroring this information in the cluster-local ClusterProperty CRD will necessarily need to be managed above the level of the cluster itself, since the properties of clusterset.k8s.io extend beyond the boundaries of a single cluster, and will likely be handled by something that has access to whatever cluster registry-esque concept is implemented for that multicluster setup. It is expected that the mcs-controller (as described in the MCS API KEP) will act as an admission controller to verify individual objects of this property.

Because there are obligations of the cluster.clusterset.k8s.io ClusterProperty that are not meaningfully verifiable until a cluster tries to join a ClusterSet and set its clusterset.k8s.io ClusterProperty, the admission controller responsible for setting a clusterset.k8s.io ClusterProperty will need the ability to reject such an attempt when it is invalid, and alert [UNRESOLVED] or possibly effect changes to that cluster’s cluster.clusterset.k8s.io ClusterProperty to make it valid [/UNRESOLVED]. Two symptomatic cases of this would be:

  1. When a cluster with a given cluster.clusterset.k8s.io ClusterProperty tries to join a ClusterSet, but a cluster with that same cluster.clusterset.k8s.io ClusterProperty appears to already be in the set.
  2. When a cluster that does not have a cluster.clusterset.k8s.io ClusterProperty tries to join a ClusterSet.

In situations like these, the admission controller will need to fail to add the invalid cluster to the ClusterSet by refusing to set its clusterset.k8s.io ClusterProperty, and surface an error that is actionable to make the property valid.
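
The two rejection cases above can be sketched as a single membership check. This is a hypothetical outline only: the function name `validate_join`, the dictionary of property values, and the set of ids already in the ClusterSet are all assumptions, since this KEP does not define how membership is tracked.

```python
CLUSTER_ID_PROPERTY = "cluster.clusterset.k8s.io"


def validate_join(candidate_properties, member_cluster_ids):
    """Return None if the join may proceed, else an actionable error (hypothetical helper).

    candidate_properties: {ClusterProperty name: value} for the joining cluster.
    member_cluster_ids:   ids of clusters already in the target ClusterSet.
    """
    cluster_id = candidate_properties.get(CLUSTER_ID_PROPERTY)
    if cluster_id is None:
        # Case 2: the joining cluster has no id at all.
        return f"set a {CLUSTER_ID_PROPERTY} ClusterProperty before joining"
    if cluster_id in member_cluster_ids:
        # Case 1: another member already uses this id.
        return f"cluster id {cluster_id!r} is already in use in this ClusterSet; choose another"
    return None


print(validate_join({CLUSTER_ID_PROPERTY: "cluster-1"}, {"cluster-2"}))
print(validate_join({CLUSTER_ID_PROPERTY: "cluster-1"}, {"cluster-1"}))
print(validate_join({}, {"cluster-1"}))
```

In both failure cases the clusterset.k8s.io property would simply not be set, and the returned message gives the operator something to act on.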

# An example object of `clusterset.k8s.io ClusterProperty`:

apiVersion: about.k8s.io/v1
kind: ClusterProperty
metadata:
  name: clusterset.k8s.io
spec:
  value: environ-1

CRD upgrade path

To CRD or not to CRD?

That is the question.

While this document has thus far referred to the ClusterProperty resource as being implemented as a CRD, another implementation point of debate has been whether this belongs in the core Kubernetes API, particularly the cluster.clusterset.k8s.io ClusterProperty, and especially while it was being discussed under the more general naming convention of id.k8s.io. A dependable cluster ID or cluster name has previously been discussed in other forums (such as this SIG-Architecture thread from 2018, or, as mentioned above, the Cluster API subproject, which implemented its own solution). While today the current well-known properties described in this KEP address specific needs for multicluster setups, it is the opinion of SIG-Multicluster that the function of the proposed ClusterProperty CRD is of broad utility and becomes more useful the more ubiquitous it is.

This has led to the discussion of whether or not we should pursue adding this resource type not as a CRD associated with SIG-Multicluster, but as a core Kubernetes API implemented in kubernetes/kubernetes. A short pro/con list is enclosed at the end of this section.

One effect of that decision is related to the upgrade path. Implementing this resource only in k/k will restrict the types of clusters that can use cluster id to only ones on the target version (or above) of Kubernetes, unless a separate backporting CRD is made available to them. At that point, with two install options, other issues arise. How do backported clusters deal with migrating their CRD data to the core k/k objects during upgrade – will the code around the formal k/k implementation be sensitive to the backport CRD and migrate itself? Will users have to handle upgrades in a bespoke manner?

| | CRD | k/k |
|---|---|---|
| Ubiquitous | No | Yes |
| Default always set | No | Yes |
| Deployment | Must be installed by the cluster lifecycle management, or as a manual setup step | In every cluster over target milestone |
| Schema validation | OpenAPI v3 validation | Can use the built-in Kubernetes schema validation |
| Blockers | Official API review if using *.k8s.io | Official API review |
| Conformance testing | Not possible now, and no easy path forward | Standard |

In the end, SIG-Multicluster discussed this with SIG-Architecture and it was decided to stick with the plan to use a CRD. Notes from this conversation are in the SIG-Architecture meeting agenda for 3/25/2021. A graduation criterion was set for the Alpha->Beta stage to fully immortalize this decision, intended to be the last chance to consider including this design in k/k or not.

The largest concern within SIG-Multicluster regarding a CRD based implementation was the added difficulty of deployment. While at that time efforts were underway to address some of these concerns by providing a better ecosystem for CRD bootstrapping, there is as of yet no centralized solution for bootstrapping out-of-tree CRDs. For now, users of About API (and by extension, MCS API which depends on it) will need to manage their CRD installations carefully, until that is addressed out of scope of this KEP. This may become easier for the community to address as a whole as other CRD-based implementations of Kubernetes features also reach maturity.

Test Plan

This KEP proposes an out-of-tree CRD that is not expected to integrate with any of the Kubernetes CI infrastructure. In addition, it explicitly provides only the CRD definition and generated clients for use by third party implementers, and does not provide a controller or any other binary with business logic to test. For these reasons, we only expect to provide unit tests for a dummy controller to confirm that the generated CRD can be installed and the generated clients can be instantiated. Today those tests are available here.

However, similar to other out-of-tree CRDs that serve third party implementers, such as Gateway API and MCS API, there is rationale for the project to provide conformance tests for implementers to use to confirm they adhere to the restrictions set forth in this KEP that are not otherwise enforced by the CRD definition; in this case, the constraints defined on the well-known properties clusterset.k8s.io and cluster.clusterset.k8s.io. Providing these tests is not considered a blocking graduation requirement for the maturity level of this API.

These tests will be provided in such a way that implementers can expose one or more clusters that have the About API CRD installed in them, and run a series of tests that confirms any well-known properties stored in those clusters' ClusterProperty objects conform to the constraints in Well known properties.

Graduation Criteria

Alpha -> Beta Graduation

  • Determine whether a cluster.clusterset.k8s.io ClusterProperty must be strictly a valid DNS label, or is allowed to be a subdomain.
  • To CRD or not to CRD (see section above)
  • Determine if CRD implementation should use CEL validation to limit byte length instead of code points; this would make it only compatible with 1.23+ where CEL validation is behind a feature gate for alpha.

Beta -> GA criteria

  • At least one headless implementation using cluster id for MCS DNS

Upgrade / Downgrade Strategy

Any changes to the API definition will follow the official Kubernetes API groups and versioning guidance here and here. In short, the API will be provided in order through v1alphaX, v1betaX, to v1, where compatibility will be preserved from v1beta1 and onwards; clients will be expected to eventually migrate to the v1 implementation of the API as the prior versions are deprecated.

Version Skew Strategy

As a CRD, this API is dependent on any changes in the version and compatibility of the CRD feature itself on which it is built. As the CRD system is in v1 as of Kubernetes 1.14, and the Kubernetes versioning guarantees v1 APIs to be maintained through the Kubernetes major release, and as the About API does not depend on any new features of the CRD system since then, there is no expected coordination required with any core Kubernetes components until and unless Kubernetes proceeds to version 2.X.

This CRD /is/ a direct dependency of the MCS API and any mcs-controller implementation as defined by that KEP. As discussed later in the PRR, it is expected that the mcs-controller (or any other controller taking this CRD as its dependency) would manage the lifecycle of this CRD, including any version skew.

As also mentioned below, we are aware that other features (in or out of tree) may want to use this CRD (as debated in “To CRD or Not to CRD” section, above) but we believe it is in the scope of those future features to assess the impact of this CRD’s version strategy on their component’s version skew and their feature’s stability if they do.

Production Readiness Review Questionnaire

NOTE: While this KEP represents only the schema of a CRD that will be implemented out-of-tree and maintained separately from core Kubernetes, a best effort on the PRR questionnaire is enclosed below.

Feature Enablement and Rollback

This section must be completed when targeting alpha to a release.

  • How can this feature be enabled / disabled in a live cluster?

    • Feature gate (also fill in values in kep.yaml)
      • Feature gate name:
      • Components depending on the feature gate:
    • Other
      • Describe the mechanism:
        • This feature is independently installed via a CRD hosted on the kubernetes-sigs GitHub.
      • Will enabling / disabling the feature require downtime of the control plane?
        • No
      • Will enabling / disabling the feature require downtime or reprovisioning of a node?
        • No
  • Does enabling the feature change any default behavior? Any change of default behavior may be surprising to users or break existing automations, so be extremely careful here.

    • No default Kubernetes behavior is currently planned to be based on this feature; it is designed to be used by the separately installed, out-of-tree, MCS controller. That being said, we are of the opinion that future features (default or not) may want to use this CRD (as debated in “To CRD or Not to CRD” section, above) but we believe it is in the scope of those future features to assess the impact that requiring CRD bootstrapping has on their feature stability if they do.
  • Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)? Also set disable-supported to true or false in kep.yaml. Describe the consequences on existing workloads (e.g., if this is a runtime feature, can it break the existing applications?).

    • Yes, as this feature only describes a CRD, it can most directly be disabled by uninstalling the CRD. However, in practice it is expected that the bootstrapping of this CRD and the management of the well known property CRs themselves will be managed by the mcs-controller, and the recommended way to disable this feature will be to disable the mcs-controller. It is expected the mcs-controller will be responsible for detecting the presence of this CRD to gracefully fail or otherwise raise error messages that can be acted on if the CRD has been disabled by a mechanism other than the mcs-controller’s lifecycle management of the CRD.
  • What happens if we reenable the feature if it was previously rolled back?

    • Purely from this KEP’s standpoint, feature reenablement - namely, reinstallation of the CRD - will do no more than reinstall the CRD schema. In relation to the expected lifecycle manager of this CRD (the mcs-controller), it is expected that on reenablement the mcs-controller will reinstall the CRD and reestablish lifecycle management of the well known properties it is dependent on, including re-creating any relevant CRs.
  • Are there any tests for feature enablement/disablement? The e2e framework does not currently support enabling or disabling feature gates. However, unit tests in each component dealing with managing data, created with and without the feature, are necessary. At the very least, think about conversion tests if API types are being modified.

    • As a dependency only for an out-of-tree component, there will not be e2e tests for feature enablement/disablement of this CRD in core Kubernetes, but e2e tests for this can be implemented in the kubernetes-sigs/mcs-api repo where a basic mcs-controller implementation lives. In reality, multiple mcs-controller implementations are expected to be produced outside of core and these production-ready mcs-controllers are responsible for their own e2e testing.

Rollout, Upgrade and Rollback Planning

This section must be completed when targeting beta graduation to a release.

  • How can a rollout fail? Can it impact already running workloads?

    CRDs themselves are Kubernetes objects, and can fail to be applied if the schema definition is corrupt or incompatible with the CustomResourceDefinition schema. Unit tests and manual tests continuously confirm that the built CRD YAML produced by this project is valid against the stable v1 CustomResourceDefinition. (A rollout could also fail if the CRD is applied to a version of Kubernetes that does not have the CRD system (<1.14), or if the API server is unreachable, but both are considered catastrophic failures out of scope of this KEP.)

    Ultimately, the failure of a rollout of any CRD has the potential to disrupt all features or workloads that depend on it. Watches in controllers will fail to receive updates because the client cannot find the CRD. A concrete known example for this CRD is the CoreDNS multicluster DNS plugin, which would fail to program new DNS records, so CoreDNS would answer SERVFAIL to any request for a Kubernetes record that has not yet been synchronized. Features or workloads that depend on this CRD should plan to manage the lifecycle of this CRD or to provide transparent failure modes if the CRD is not present.

  • What specific metrics should inform a rollback?

    Metrics should be configured using a metrics solution that implements the Custom Metrics API, for example the metrics plugin for Custom Resources in kube-state-metrics. Kubernetes does not provide default metrics for CRDs.

  • Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested? Unit tests and manual tests confirm that the CRD is capable of being uninstalled and reinstalled.

  • Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? No.
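
The recommendation above, that dependent features provide a transparent failure mode when the CRD is not present, could look roughly like the following. This is a minimal sketch and not part of the KEP: the CRD name `clusterproperties.about.k8s.io` is an assumption, and `list_crd_names` is a hypothetical hook that a real component would back with a call to the API server.

```python
from typing import Callable, Iterable

# Assumption: the CRD is registered as "clusterproperties.about.k8s.io".
REQUIRED_CRD = "clusterproperties.about.k8s.io"


class MissingCRDError(RuntimeError):
    """Raised when a CRD this component depends on is not installed."""


def ensure_crd_present(
    list_crd_names: Callable[[], Iterable[str]],
    required: str = REQUIRED_CRD,
) -> None:
    """Fail fast with an actionable message if the required CRD is absent.

    list_crd_names is a hypothetical hook; a real component would back it
    with a call to the API server that lists the installed CRDs.
    """
    installed = set(list_crd_names())
    if required not in installed:
        raise MissingCRDError(
            f"CRD {required} is not installed; reinstall it or re-enable "
            "the controller that manages its lifecycle"
        )
```

A component might run such a check at startup and before starting its watches, so that a missing CRD surfaces as one clear error rather than as silently stalled updates.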

Monitoring Requirements

This section must be completed when targeting beta graduation to a release.

  • How can an operator determine if the feature is in use by workloads?

    Kubernetes does not provide default metrics for CRDs, so an operator would need to depend on custom metrics or filter 404 responses from the Kubernetes API server for requests against this CRD.

  • What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

    N/A: This KEP does not propose a service; it only leverages the existing Kubernetes API service and CRD extension mechanism.

  • What are the reasonable SLOs (Service Level Objectives) for the above SLIs?

    N/A: This KEP does not propose a service; it only leverages the existing Kubernetes API service and CRD extension mechanism.

  • Are there any missing metrics that would be useful to have to improve observability of this feature?

    Default metrics for CRDs in general, such as the number of requests by workload source, would improve observability of this feature.
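
As one way to filter 404s against this CRD, as mentioned above, an operator could scan the API server audit log. The sketch below assumes audit logging is enabled with JSON-lines output, and that ClusterProperty is served under the API group `about.k8s.io` (an assumption, not stated in this section). It relies on the `stage`, `requestURI` and `responseStatus.code` fields of audit events.

```python
import json
from typing import Iterable

# Assumption: ClusterProperty is served under the API group "about.k8s.io".
CRD_URI_PREFIX = "/apis/about.k8s.io/"


def count_crd_404s(audit_lines: Iterable[str], uri_prefix: str = CRD_URI_PREFIX) -> int:
    """Count completed requests under uri_prefix that returned HTTP 404."""
    count = 0
    for line in audit_lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(event, dict):
            continue
        # Only ResponseComplete events are counted, so a request that was
        # logged at several stages is not counted more than once.
        if event.get("stage") != "ResponseComplete":
            continue
        uri = event.get("requestURI") or ""
        code = (event.get("responseStatus") or {}).get("code")
        if uri.startswith(uri_prefix) and code == 404:
            count += 1
    return count
```

A sustained non-zero count suggests that some workload is requesting ClusterProperty objects while the CRD is not installed.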

Dependencies

This section must be completed when targeting beta graduation to a release.

  • Does this feature depend on any specific services running in the cluster? This feature depends only on the CustomResourceDefinition v1 in Kubernetes API server, available in Kubernetes versions 1.14+.

Scalability

For alpha, this section is encouraged: reviewers should consider these questions and attempt to answer them.

For beta, this section is required: reviewers must answer these questions.

For GA, this section is required: approvers should be able to confirm the previous answers based on experience in the field.

  • Will enabling / using this feature result in any new API calls?

    Installing the CRD will require a single API call to POST the new CustomResourceDefinition resource that represents it.

  • Will enabling / using this feature result in introducing new API types?

    Yes, installing the CRD introduces the cluster-scoped ClusterProperty Kind. As there is no related service proposed as part of this KEP, there are no specific limits on the supported number of objects per cluster outside of Kubernetes API server storage limits.

  • Will enabling / using this feature result in any new calls to the cloud provider?

    No.

  • Will enabling / using this feature result in increasing size or count of the existing API objects?

    Besides the trivial single CustomResourceDefinition required to install this CRD, no other size or count of existing API objects will be affected by this KEP.

  • Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

    No, this KEP does not affect any of the operations covered by existing SLIs/SLOs, particularly since CustomResourceDefinitions are excluded from those SLOs.

  • Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, …) in any components?

    This CRD will use the structural schema validation mechanism provided by the CRD extension, which requires some amount of resources to validate a CR on create or update. However, the number of expected resources (2 as of this KEP) and their rate of change (tied to ClusterSet membership changes, which are expected to be human decisions that rarely change state) are expected to be trivial.
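
To make the single installation call described above concrete, the sketch below builds the method, path and body of a POST that creates a cluster-scoped CustomResourceDefinition for the ClusterProperty Kind. The group `about.k8s.io`, the version `v1alpha1` and the `spec.value` string field are illustrative assumptions; the authoritative schema is the generated CRD YAML. The function only constructs the request and does not send it.

```python
import json
from typing import Tuple

# Collection endpoint for CustomResourceDefinition objects (apiextensions v1).
CRD_COLLECTION_PATH = "/apis/apiextensions.k8s.io/v1/customresourcedefinitions"


def build_crd_request(
    group: str = "about.k8s.io",   # assumed API group
    version: str = "v1alpha1",     # assumed version name
) -> Tuple[str, str, str]:
    """Return (method, path, JSON body) for the one-time CRD installation."""
    manifest = {
        "apiVersion": "apiextensions.k8s.io/v1",
        "kind": "CustomResourceDefinition",
        "metadata": {"name": f"clusterproperties.{group}"},
        "spec": {
            "group": group,
            "scope": "Cluster",  # ClusterProperty is cluster-scoped
            "names": {
                "kind": "ClusterProperty",
                "plural": "clusterproperties",
                "singular": "clusterproperty",
            },
            "versions": [
                {
                    "name": version,
                    "served": True,
                    "storage": True,
                    # Minimal structural schema; the field layout is assumed.
                    "schema": {
                        "openAPIV3Schema": {
                            "type": "object",
                            "properties": {
                                "spec": {
                                    "type": "object",
                                    "properties": {"value": {"type": "string"}},
                                }
                            },
                        }
                    },
                }
            ],
        },
    }
    return "POST", CRD_COLLECTION_PATH, json.dumps(manifest)
```

In practice the same effect is usually achieved by applying the generated CRD YAML, which results in the same single create call against the API server.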

Troubleshooting

The Troubleshooting section currently serves the Playbook role. We may consider splitting it into a dedicated Playbook document (potentially with some monitoring details). For now, we leave it here.

This section must be completed when targeting beta graduation to a release.

  • How does this feature react if the API server and/or etcd is unavailable?

    This KEP itself proposes a CRD applied to the API server; if the API server and/or etcd is unavailable, so is this CRD. Features dependent on this CRD must assess the impact of this CRD’s availability on their component’s availability. Most concretely today, components of the mcs-controller are expected to serve as an admission controller to this CRD or are dependent on this CRD to program DNS. If the API server and/or etcd is unavailable, those controllers will be unable to update a cluster’s ClusterProperty data regarding its well-known properties as part of a ClusterSet, or to program any updates to DNS, respectively.

  • What are other known failure modes?

    • [CRD cannot be installed]
      • Detection: Custom metrics or dependent feature metrics; increased 404 rate on Kube API server for the CRD.
      • Mitigations: What can be done to stop the bleeding, especially for already running user workloads?
      • Diagnostics: What are the useful log messages and their required logging levels that could help debug the issue? Warning and above, as this is the level at which 404s against the CRD will be seen.
      • Testing: Unit tests against generated CRD schema installation and usage of generated client.
  • What steps should be taken if SLOs are not being met to determine the problem?

    N/A: SLOs are not defined as there is no service provided by this KEP.

Implementation History

Drawbacks

Alternatives

Infrastructure Needed (Optional)