KEP-2149: ClusterID for ClusterSet Identification
- Release Signoff Checklist
- Summary
- Motivation
- Proposal
- Design Details
- Production Readiness Review Questionnaire
- Implementation History
- Drawbacks
- Alternatives
- Infrastructure Needed (Optional)
Release Signoff Checklist
Items marked with (R) are required prior to targeting to a milestone / release.
- (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
- (R) KEP approvers have approved the KEP status as implementable
- (R) Design details are appropriately documented
- (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
- (R) Graduation criteria is in place
- (R) Production readiness review completed
- Production readiness review approved
- “Implementation History” section is up-to-date for milestone
- User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
- Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
Summary
The new multi-cluster services API (see KEP-1645)
expanded the ways clusters can communicate with each other and organized them
into ClusterSets, but as of now there is no way for a cluster to be uniquely
identified in a Kubernetes-native way. This document by SIG-Multicluster
proposes a standard for how cluster IDs should be stored and managed, based on
concrete use cases discussed and observed in ClusterSet deployments. While
existing implementations may not currently abide by this standard, or plan to,
future expansions to the Multi-Cluster API will be designed on top of this
standard and existing MCS API implementations are encouraged to adopt it.
Motivation
That there must be some way to identify individual clusters in a multi-cluster deployment has felt like a given to SIG-Multicluster; it has been discussed in a broad sense previously (see this doc), and was scoped down in response to actual observed use cases in the latest community discussion on which this KEP is based (doc). The motivation of this KEP is to provide a flexible but useful baseline for cluster id that can work with the known use cases (see the User Stories section).
Existing implementations of the MCS API may have addressed the need for a cluster id in their own ways, inconsistent with this current standard. It is the perspective of SIG-Multicluster that future additions to the MCS API will depend when necessary on the proposal laid out here, and existing implementations are encouraged to migrate any existing cluster id assignment and storage mechanism to fit within the specifications of this KEP.
Goals
- Propose a standard for how cluster identification metadata should be stored and managed as Kubernetes resources
- Define the standard to be strict enough to be useful in the following user stories:
  - Establish reliable coordinates for determining clusterset membership and identity of a cluster within its cluster set
  - Enable disambiguation of DNS names for multicluster Headless services with the same hostnames
  - Facilitate enrichment of log / event / metrics data with cluster id / set coordinates
Non-Goals
- Define any characteristics of the system that tracks cluster ids within a ClusterSet (i.e. a cluster registry)
- Solve any problems without specific, tangible use cases (though we will leave room for extension).
- In particular, this KEP explicitly does not consider
  - a cluster joining multiple ClusterSets
  - how or whether users should be able to specify aliases for cluster ids and what they could be used for
Proposal
Overview
This proposal defines a new cluster-scoped ClusterProperty resource for storing
cluster-level metadata. The primary justification is to enable identification of
a cluster and its relevant properties within a cluster set, but there is no
intention to limit general use of ClusterProperty to multi-cluster scenarios.
Each cluster in a ClusterSet must be assigned a unique identifier, that lives at
least as long as that cluster is a member of the given ClusterSet, and must not
be changed for that same lifetime. This identifier will be stored in a
ClusterProperty CR with the well known name cluster.clusterset.k8s.io that
may be referenced by workloads within the cluster. The identifier must be a valid
RFC-1123 DNS subdomain, and should be less than 128 characters in total.
While it is a member of a ClusterSet, a cluster must also have an additional
clusterset.k8s.io ClusterProperty which describes its current membership. This
property must be present as long as the cluster’s membership in a
ClusterSet lasts, and removed when the cluster is no longer a member.
More detail and examples of the uniqueness, lifespan, immutability, and content
requirements for both the cluster.clusterset.k8s.io ClusterProperty and
clusterset.k8s.io ClusterProperty are described further below. The goal of
these requirements is to provide the MCS API with a cluster id that is useful
enough to address known user stories without being too restrictive or
prescriptive.
User Stories
ClusterSet membership
I have some set of clusters working together and need a way to uniquely identify them within the system that I use to track membership, or determine if a given cluster is in a ClusterSet.
For example, SIG-Cluster-Lifecycle’s Cluster API subproject uses a management cluster to deploy resources to member workload clusters, but today member workload clusters do not have a way to identify their own management cluster or any interesting metadata about it, such as what cloud provider it is hosted on.
Joining or moving between ClusterSets
I want the ability to add a previously-isolated cluster to a ClusterSet, or to move a cluster from one ClusterSet to another and be aware of this change.
Multi-Cluster Services
I have a headless multi-cluster service deployed across clusters in my ClusterSet with similarly named pods in each cluster. I need a way to disambiguate each backend pod via DNS.
For example, an exported headless service with the service name myservice in
namespace test, backed by pods in two clusters with cluster ids clusterA
and clusterB, could be disambiguated by different DNS names following the
pattern <clusterid>.<svc>.<ns>.svc.clusterset.local:
clusterA.myservice.test.svc.clusterset.local. and
clusterB.myservice.test.svc.clusterset.local.. This way the user can implement
whatever load balancing they want (as is usually the case with headless
services) by targeting each cluster’s available backends directly.
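As a minimal sketch of the naming pattern above, the following composes the per-cluster DNS name and checks it against the 253 character limit mentioned elsewhere in this KEP. The function name `headless_backend_dns_name` is hypothetical and is not part of the MCS API or any implementation.

```python
MAX_DNS_NAME_LENGTH = 253  # overall DNS name limit cited in this KEP


def headless_backend_dns_name(cluster_id: str, service: str, namespace: str) -> str:
    """Compose <clusterid>.<svc>.<ns>.svc.clusterset.local for one cluster's backends.

    Hypothetical helper for illustration only.
    """
    name = f"{cluster_id}.{service}.{namespace}.svc.clusterset.local"
    if len(name) > MAX_DNS_NAME_LENGTH:
        raise ValueError(
            f"composed DNS name is {len(name)} characters, "
            f"over the {MAX_DNS_NAME_LENGTH} character limit"
        )
    return name


# The two clusters from the example above.
names = [
    headless_backend_dns_name(cluster_id, "myservice", "test")
    for cluster_id in ("clusterA", "clusterB")
]
```

A client doing its own load balancing could then resolve each name in `names` separately to reach one cluster's backends at a time.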
Diagnostics
Clusters within my ClusterSet send logs/metrics to a common monitoring solution and I need to be able to identify the cluster from which a given set of events originated.
Multi-tenant controllers
My controller interacts with multiple clusters and needs to disambiguate between them to process its business logic.
For example, CAPN’s virtualcluster project is implementing a multi-tenant scheduler that schedules tenant namespaces only in certain parent clusters, and a separate syncer running in each parent cluster controller needs to compare the name of the parent cluster to determine whether the namespace should be synced (ref).
ClusterProperty CRD
The ClusterProperty Kind provides a way to store
cluster scoped information while creating flexibility
for implementations. The initial use case is to support multi-cluster tooling, but a ClusterProperty may be used to store any cluster-scoped data. A cluster may have multiple ClusterPropertys, each
holding a different identification related value. Each property contains the
following information:
- Name - a well known or custom name to identify the property. This is the metadata.Name of the resource.
- Value - a property-dependent string, up to 128k Unicode code points (see Notes/Constraints/Caveats section). This is the one and only field in this Kind.
The schema for ClusterProperty is intentionally loose to support multiple
forms of information, including arbitrary additional identification related
properties described by users (see “Additional Properties”, below), but certain
well-known properties will add additional schema constraints, such as those
described in the next section.
Well known properties
The ClusterProperty CRD will support two specific properties under the well
known names cluster.clusterset.k8s.io and clusterset.k8s.io. Being “well
known” means that they must conform to the requirements described below, and
therefore can be depended on by multi-cluster implementations to achieve use
cases dependent on knowledge of a cluster’s id or ClusterSet membership.
The requirements below use the keywords must, should, and may purposefully in accordance with RFC-2119.
Property: cluster.clusterset.k8s.io
Contains a unique identifier for the containing cluster.
Uniqueness
- The identifier must be unique within the ClusterSet to which its cluster belongs for the duration of the cluster’s membership.
- The identifier may be globally unique beyond the scope of its ClusterSet.
- The identifier may be unique beyond the span of its cluster’s membership and lifetime.
Lifespan
- The identifier must exist and not be changed for the duration of a
cluster’s membership in a ClusterSet, and as long as a
clusterset.k8s.io property referring to that cluster in that ClusterSet exists.
Contents
- The identifier must be a valid RFC-1123 DNS subdomain and should be less than 128 characters in total. This may be used to compose larger DNS names (e.g. in the case of multi-cluster services), so care should be taken to ensure that the final names fit into the limit of 253 characters.
- The identifier may be used as a component in MCS DNS.
- The identifier may be a human readable description of its cluster.
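The content requirements above could be checked along the following lines. This is a sketch under stated assumptions, not a reference validator: the regular expression follows the lowercase DNS subdomain pattern commonly used for Kubernetes object names, the function name `check_cluster_id` is hypothetical, and per-label length limits are not checked. The "must" rule decides validity, while the "should" rule only produces a warning.

```python
import re
from typing import List, Tuple

# Assumption: lowercase alphanumerics, '-' and '.', with each label starting
# and ending on an alphanumeric character.
_LABEL = r"[a-z0-9]([-a-z0-9]*[a-z0-9])?"
_SUBDOMAIN_RE = re.compile(rf"{_LABEL}(\.{_LABEL})*")

MAX_SUBDOMAIN_LENGTH = 253   # hard limit for a DNS subdomain
RECOMMENDED_ID_LENGTH = 128  # "should be less than 128 characters"


def check_cluster_id(value: str) -> Tuple[bool, List[str]]:
    """Return (valid, warnings) for a candidate cluster.clusterset.k8s.io value."""
    if not value or len(value) > MAX_SUBDOMAIN_LENGTH:
        return False, []
    if _SUBDOMAIN_RE.fullmatch(value) is None:
        return False, []
    warnings = []
    if len(value) >= RECOMMENDED_ID_LENGTH:
        warnings.append("identifier should be less than 128 characters")
    return True, warnings
```

Both example values used later in this KEP, a UUID such as `721ab723-13bc-11e5-aec2-42010af0021e` and a human-readable string such as `cluster-1`, pass this check without warnings.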
Consumers
- May rely on the identifier existing, unmodified for the entire duration of its membership in a ClusterSet.
- Should watch the cluster.clusterset.k8s.io property to handle potential changes if they live beyond the ClusterSet membership.
- May rely on the existence of an identifier for clusters that do not belong to a ClusterSet so long as the implementation provides one.
Notable scenarios
Reusing cluster names: Since a cluster.clusterset.k8s.io ClusterProperty
has no restrictions on whether or not its value can be reused, if a
cluster unregisters from a ClusterSet it is permitted under this standard to
rejoin later with the same cluster.clusterset.k8s.io ClusterProperty it had
before. Similarly, a different cluster could join a ClusterSet with the same
cluster.clusterset.k8s.io ClusterProperty that had been used by another
cluster previously, as long as both do not have membership in the same
ClusterSet at the same time. Finally, two or more clusters may have the same
cluster.clusterset.k8s.io ClusterProperty concurrently (though they should
not; see “Uniqueness” above) as long as they both do not have membership in
the same ClusterSet.
Property: clusterset.k8s.io
Contains an identifier that relates the containing cluster to the ClusterSet in which it belongs.
Lifespan
- The identifier must exist and be immutable for the duration of a cluster’s membership in a ClusterSet.
- The identifier must not exist when the cluster is not a member of a ClusterSet.
Contents
- The identifier must associate the cluster with a ClusterSet.
Consumers
- May rely on the identifier existing, unmodified for the entire duration of its membership in a ClusterSet.
- Should watch the clusterset property to detect the span of a cluster’s membership in a ClusterSet.
Additional Properties
Implementers are free to add additional properties as they see fit, so long as
they do not conflict with the well known properties and utilize a suffix. The
following suffixes are reserved for Kubernetes and related projects: .k8s.io,
.kubernetes.io. For example, an implementation may utilize the Kind
ClusterProperty to store objects with the name
fingerprint.example.com but not fingerprint.k8s.io. Cluster operators are
free to use non-namespaced properties (e.g. fingerprint) as they see fit, but
any shared tooling should use appropriately namespaced names.
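The naming rules in this section can be sketched as a small classifier. The function name `classify_property_name` and the four category strings are hypothetical labels chosen for illustration; only the two well known names and the two reserved suffixes come from this KEP.

```python
WELL_KNOWN_NAMES = {"cluster.clusterset.k8s.io", "clusterset.k8s.io"}
RESERVED_SUFFIXES = (".k8s.io", ".kubernetes.io")


def classify_property_name(name: str) -> str:
    """Classify a ClusterProperty name according to the rules in this section."""
    if name in WELL_KNOWN_NAMES:
        return "well-known"      # defined by this KEP
    if name.endswith(RESERVED_SUFFIXES):
        return "reserved"        # suffix reserved for Kubernetes and related projects
    if "." in name:
        return "namespaced"      # e.g. fingerprint.example.com, fine for shared tooling
    return "non-namespaced"      # e.g. fingerprint, left to cluster operators
```

Under this sketch, an implementation would only create custom properties whose names classify as `namespaced`, and would treat `reserved` as an error.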
Notes/Constraints/Caveats
Note: On ClusterProperty.value max length validation
Prior Kubernetes API constructs in core k/k containing arbitrary string values,
such as annotations, are limited by a byte length. The CRD system exposes two
built-in (as in, non-webhook) methods for expressing validation rules against
CRDs: CustomResourceValidation, also known as structural schema, via OpenAPIv3
schema validation (as of Kubernetes version 1.14.7),
and CEL, also known as the x-kubernetes-validations extension (as of
Kubernetes version 1.25).
Both systems define strings as Unicode code points, so any validation for
maxLength will be based on number of code points, NOT on input byte count. As a
result, this specification can only express the limits on
ClusterProperty.value length in terms of Unicode code points, regardless of
which of these two validation methods are used (and, to maximize Kubernetes
version compatibility, using structural schema over CEL is advised). Note that
this may not be the same as number of perceived characters (for example, flag
emojis such as “🇺🇸” appear as 1 character but take up 2 code points) nor the
number of bytes used to represent it in a given encoding (that same emoji uses 8
bytes in UTF-8 and 8 bytes in UTF-16, or 10 in UTF-16 with a byte order mark).
Practically, the encoded length of the string in bytes as observed on input or output by the user may vary depending on which of the valid JSON encodings are used (UTF-8, UTF-16, or UTF-32). Therefore, the value limit of 128k code points could take up to 512KB using the least space efficient allowable encoding, UTF-32, which uses 4 bytes per code point.
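The difference between code points and encoded bytes can be seen directly in a language whose strings are sequences of code points. The sketch below uses Python purely for illustration; it assumes "128k" means 128 * 1024 code points, which is how the 512KB figure above is reached.

```python
# The US flag emoji: two regional indicator code points, U+1F1FA and U+1F1F8.
flag = "\U0001F1FA\U0001F1F8"

code_points = len(flag)                       # len() counts code points, not bytes
utf8_bytes = len(flag.encode("utf-8"))        # 4 bytes per code point for this emoji
utf16_bytes = len(flag.encode("utf-16-le"))   # one surrogate pair per code point, no BOM
utf16_bom_bytes = len(flag.encode("utf-16"))  # same, plus a 2-byte byte order mark
utf32_bytes = len(flag.encode("utf-32-le"))   # always 4 bytes per code point, no BOM

# Assumption: "128k" is read as 128 * 1024 code points.
MAX_VALUE_CODE_POINTS = 128 * 1024
worst_case_bytes = MAX_VALUE_CODE_POINTS * 4  # UTF-32 upper bound on encoded size

print(code_points, utf8_bytes, utf16_bytes, utf16_bom_bytes, utf32_bytes)
print(worst_case_bytes)
```

The first line printed is `2 8 8 10 8` and the second is `524288`, i.e. 512KB for a maximum-length value in UTF-32.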
Strings take up their encoded length in bytes both in storage and in transit
over REST. Regarding storage limits, the 512KB is within the 1.5
MiB default maximum request size for etcd. There is no apparent
enforcement of request limit sizes to a vanilla Kubernetes API server outside of
the PodSecurity admission controller (which only applies to Pod.Spec, and for
reference is 3MiB).
The most comparable upstream limit is for resource annotation values, which must
be within 256KB (enforced with custom validation in k/k),
and which supports a use case amenable to smaller value sizes than
ClusterProperty.value.
Risks and Mitigations
Design Details
Rationale behind the ClusterProperty CRD
This proposal suggests a CRD composed of objects all of the same Kind
ClusterProperty, and that are distinguished using certain well known values in
their metadata.name fields. This design avoids cluster-wide singleton Kinds
for each property, reduces access competition for the same metadata by making
each property its own resource (instead of all in one), allows for RBAC to be
applied in a targeted way to individual properties, and supports the user
prerogative to store other simple metadata in one centralized CRD by creating
CRs of the same Kind ClusterProperty but with their own names.
Storing arbitrary facts about a cluster can be implemented in other ways. For
example, the Cluster API subproject stopgapped their need for cluster name metadata
by leveraging the existing Node Kind and storing metadata there via
annotations, such as cluster.x-k8s.io/cluster-name (ref). While
practical for their case, this KEP avoids adding cluster-level info as
annotations on child resources so as not to be dependent on a child resource’s
existence, to avoid issues maintaining parity across multiple resources of the
same Kind for identical metadata, and maintain RBAC separation between the
cluster-level metadata and the child resources. Even within the realm of
implementing as a CRD, the API design could focus on distinguishing each fact by
utilizing different spec.Types (as Service objects do e.g.
spec.type=ClusterIP or spec.type=ExternalName), or even more strictly, each
as a different Kind. The former provides no specific advantages since
multiple differently named properties for the same fact are unnecessary, and is
less expressive to query (it is easier to query by name directly like kubectl get clusterproperties cluster.clusterset.k8s.io). The latter would result in
the proliferation of cluster-wide singleton Kind resources, and be burdensome
for users to create their own custom properties.
Implementing the ClusterProperty CRD and its admission controllers
cluster.clusterset.k8s.io ClusterProperty
The actual implementation to select and store the identifier of a given cluster could occur local to the cluster. It does not necessarily ever need to be deleted, particularly if the identifier selection mechanism chooses an identifier that is compliant with this specification’s most broad restrictions – namely, being immutable for a cluster’s lifetime and unique beyond just the scope of the cluster’s membership. A recommended option that meets these broad restrictions is a cluster’s kube-system.uuid.
That being said, for less stringent identifiers, for example a user-specified
and human-readable value, a given cluster.clusterset.k8s.io ClusterProperty
may need to change if an identical identifier is in use by another member of the
ClusterSet it wants to join. It is likely this would need to happen outside the
cluster-local boundary; for example, whatever manages memberships would likely
need to deny the incoming cluster, and potentially assign (or prompt the cluster
to assign itself) a new id.
Since this KEP does not formally mandate that the cluster id must be immutable
for the lifetime of the cluster, only for the lifetime of its membership in a
ClusterSet, any dependent tooling explicitly cannot assume the
cluster.clusterset.k8s.io ClusterProperty for a given cluster will stay
constant on its own merit. For example, log aggregation of a given cluster id
based on this property should only be trusted to be referring to the same
cluster for as long as it has one ClusterSet membership; similarly, controllers
whose logic depends on distinguishing clusters by cluster id can only trust this
property to disambiguate the same cluster for as long as the cluster has one
ClusterSet membership.
Despite this flexibility in the KEP, cluster ids may still be useful before ClusterSet membership needs to be established; again, particularly if the implementation chooses the broadest restrictions regarding immutability and uniqueness. Therefore, having a controller that initializes it early in the lifecycle of the cluster, and possibly as part of cluster creation, may be a useful place to implement it, though within the bounds of this KEP that is not strictly necessary.
The most common discussion point within the SIG regarding whether an
implementation should favor a UUID or a human-readable cluster id string is when
it comes to DNS. Since DNS names are originally intended to be a human readable
technique of address, clunky DNS names composed from long UUIDs seem like an
anti-pattern, or at least unfinished. While some extensions to this spec have
been discussed as ways to leverage the best parts of both (ex. using labels on
the cluster.clusterset.k8s.io ClusterProperty to store aliases for DNS), an
actual API specification to allow for this is outside the scope of this KEP at
this time (see the Non-Goals section).
# An example object of `cluster.clusterset.k8s.io ClusterProperty`
# using a kube-system ns uuid as the id value (recommended above):
apiVersion: about.k8s.io/v1
kind: ClusterProperty
metadata:
  name: cluster.clusterset.k8s.io
spec:
  value: 721ab723-13bc-11e5-aec2-42010af0021e
---
# An example object of `cluster.clusterset.k8s.io ClusterProperty`
# using a human-readable string as the id value:
apiVersion: about.k8s.io/v1
kind: ClusterProperty
metadata:
  name: cluster.clusterset.k8s.io
spec:
  value: cluster-1
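As an illustration of the recommended kube-system option, the sketch below builds the first example object from a kube-system Namespace represented as a plain dictionary (for instance, the parsed output of `kubectl get namespace kube-system -o json`). The helper names `cluster_property` and `cluster_id_from_kube_system` are hypothetical, and no Kubernetes client library is assumed.

```python
WELL_KNOWN_CLUSTER_ID = "cluster.clusterset.k8s.io"


def cluster_property(name: str, value: str) -> dict:
    """Build a ClusterProperty manifest shaped like the examples in this KEP."""
    return {
        "apiVersion": "about.k8s.io/v1",
        "kind": "ClusterProperty",
        "metadata": {"name": name},
        "spec": {"value": value},
    }


def cluster_id_from_kube_system(namespace_obj: dict) -> dict:
    """Derive the cluster id property from the kube-system Namespace's UID."""
    metadata = namespace_obj["metadata"]
    if metadata["name"] != "kube-system":
        raise ValueError("expected the kube-system Namespace")
    return cluster_property(WELL_KNOWN_CLUSTER_ID, metadata["uid"])


# Example input using the UID from the first example object.
kube_system = {
    "kind": "Namespace",
    "metadata": {"name": "kube-system", "uid": "721ab723-13bc-11e5-aec2-42010af0021e"},
}
manifest = cluster_id_from_kube_system(kube_system)
```

A controller that initializes the id early in the cluster's lifecycle could apply the resulting `manifest` once and never need to change it.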
clusterset.k8s.io ClusterProperty
A cluster in a ClusterSet is expected to be authoritatively associated with that
ClusterSet by an external process and storage mechanism with a purview above the
cluster local boundary, whether that is some form of a cluster registry, some
peer-to-peer distributed consensus and membership tracking, or just
a human running kubectl. (The details of any specific mechanism are out of scope
for the MCS API and this KEP – see the Non-Goals section.) Mirroring this
information in the cluster-local ClusterProperty CRD will necessarily need to
be managed above the level of the cluster itself, since the properties of
clusterset.k8s.io extend beyond the boundaries of a single cluster, and will
likely be something that has access to whatever cluster registry-esque concept
is implemented for that multicluster setup. It is expected that the
mcs-controller (as described in the MCS API KEP)
will act as an admission controller to verify individual objects of this
property.
Because there are obligations of the cluster.clusterset.k8s.io ClusterProperty
that are not meaningfully verifiable until a cluster tries to join a ClusterSet
and set its clusterset.k8s.io ClusterProperty, the admission controller
responsible for setting a clusterset.k8s.io ClusterProperty will need the
ability to reject such an attempt when it is invalid, and alert [UNRESOLVED]
or possibly effect changes to that cluster’s cluster.clusterset.k8s.io ClusterProperty to make it valid [/UNRESOLVED]. Two symptomatic cases of this
would be:
- When a cluster with a given cluster.clusterset.k8s.io ClusterProperty tries to join a ClusterSet, but a cluster with that same cluster.clusterset.k8s.io ClusterProperty appears to already be in the set.
- When a cluster that does not have a cluster.clusterset.k8s.io ClusterProperty tries to join a ClusterSet.
In situations like these, the admission controller will need to fail to add the
invalid cluster to the ClusterSet by refusing to set its clusterset.k8s.io ClusterProperty, and surface an error that is actionable to make the property
valid.
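The two rejection cases above can be sketched as follows. This is not the mcs-controller's logic; it is a minimal in-memory model in which `members_by_set`, `admit_join`, `leave`, and `JoinRejected` are all hypothetical names, and the membership registry is reduced to a dictionary mapping each ClusterSet to the ids of its current members.

```python
class JoinRejected(Exception):
    """Raised with an actionable message when a join attempt is invalid."""


def admit_join(members_by_set: dict, clusterset: str, cluster_id) -> dict:
    """Admit a cluster into a ClusterSet, or refuse to set clusterset.k8s.io."""
    if not cluster_id:
        raise JoinRejected(
            "cluster has no cluster.clusterset.k8s.io ClusterProperty; "
            "create one before joining"
        )
    members = members_by_set.setdefault(clusterset, set())
    if cluster_id in members:
        raise JoinRejected(
            f"cluster id {cluster_id!r} is already in use in ClusterSet "
            f"{clusterset!r}; choose a different id"
        )
    members.add(cluster_id)
    # The property that would be written into the joining cluster.
    return {"name": "clusterset.k8s.io", "value": clusterset}


def leave(members_by_set: dict, clusterset: str, cluster_id: str) -> None:
    """Drop a member; its id may be reused afterwards, as this KEP permits."""
    members_by_set.get(clusterset, set()).discard(cluster_id)
```

Because only current members are tracked, an id becomes available again as soon as its cluster leaves, which matches the reuse scenarios described under "Notable scenarios".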
# An example object of `clusterset.k8s.io ClusterProperty`:
apiVersion: about.k8s.io/v1
kind: ClusterProperty
metadata:
  name: clusterset.k8s.io
spec:
  value: environ-1
CRD upgrade path
To CRD or not to CRD?
That is the question.
While this document has thus far referred to the ClusterProperty resource as
being implemented as a CRD, another implementation point of debate has been
whether this belongs in the core Kubernetes API, particularly the
cluster.clusterset.k8s.io ClusterProperty and especially while it was being
discussed under the more general naming convention of id.k8s.io. A dependable
cluster ID or cluster name has previously been discussed in other forums (such
as this SIG-Architecture thread from 2018, or, as mentioned above, the Cluster API
subproject, which implemented their own solution). While
today the use cases for the current well-known properties described in this KEP
address specific needs for multicluster setups, it is the opinion of
SIG-Multicluster that the function of the proposed ClusterProperty CRD is of
broad utility and becomes more useful the more ubiquitous it is.
This has led to the discussion of whether or not we should pursue adding this
resource type not as a CRD associated with SIG-Multicluster, but as a core
Kubernetes API implemented in kubernetes/kubernetes. A short pro/con list is
enclosed at the end of this section.
One effect of that decision is related to the upgrade path. Implementing this resource only in k/k will restrict the types of clusters that can use cluster id to only ones on the target version (or above) of Kubernetes, unless a separate backporting CRD is made available to them. At that point, with two install options, other issues arise. How do backported clusters deal with migrating their CRD data to the core k/k objects during upgrade – will the code around the formal k/k implementation be sensitive to the backport CRD and migrate itself? Will users have to handle upgrades in a bespoke manner?
| | CRD | k/k |
|---|---|---|
| Ubiquitous | No | Yes |
| Default always set | No | Yes |
| Deployment | Must be installed by the cluster lifecycle management, or as a manual setup step | In every cluster over target milestone |
| Schema validation | OpenAPI v3 validation | Can use the built-in Kubernetes schema validation |
| Blockers | Official API review if using *.k8s.io | Official API review |
| Conformance testing | Not possible now, and no easy path forward | Standard |
In the end, SIG-Multicluster discussed this with SIG-Architecture and it was decided to stick with the plan to use a CRD. Notes from this conversation are in the SIG-Architecture meeting agenda for 3/25/2021. A graduation criterion was set for the Alpha->Beta stage to fully immortalize this decision, intended to be the last chance to consider including this design in k/k or not.
The largest concern within SIG-Multicluster regarding a CRD based implementation was the added difficulty of deployment. While at that time efforts were underway to address some of these concerns by providing a better ecosystem for CRD bootstrapping, there is as of yet no centralized solution for bootstrapping out-of-tree CRDs. For now, users of About API (and by extension, MCS API which depends on it) will need to manage their CRD installations carefully, until that is addressed out of scope of this KEP. This may become easier for the community to address as a whole as other CRD-based implementations of Kubernetes features also reach maturity.
Test Plan
This KEP proposes an out-of-tree CRD that is not expected to integrate with any of the Kubernetes CI infrastructure. In addition, it explicitly provides only the CRD definition and generated clients for use by third party implementers, and does not provide a controller or any other binary with business logic to test. For these reasons, we only expect to provide unit tests for a dummy controller to confirm that the generated CRD can be installed and the generated clients can be instantiated. Today those tests are available here.
However, similar to other out-of-tree CRDs that serve third party implementers,
such as Gateway API and MCS API, there is rationale for the project to provide
conformance tests for implementers to use to confirm they adhere to the
restrictions set forth in this KEP that are not otherwise enforced by the CRD
definition; in this case, the constraints defined on the well-known properties
clusterset.k8s.io and cluster.clusterset.k8s.io. Providing these tests is
not considered a blocking graduation requirement for the maturity level of this
API.
These tests will be provided in such a way that implementers can expose one or
more clusters that have the About API CRD installed in them, and run a series of
tests that confirms any well-known properties stored in those clusters'
ClusterProperty objects conform to the constraints in Well known properties.
Graduation Criteria
Alpha -> Beta Graduation
- Determine if a cluster.clusterset.k8s.io ClusterProperty must be strictly a valid DNS label, or is allowed to be a subdomain.
- To CRD or not to CRD (see section above)
- Determine if CRD implementation should use CEL validation to limit byte length instead of code points; this would make it only compatible with 1.23+ where CEL validation is behind a feature gate for alpha.
Beta -> GA criteria
- At least one headless implementation using cluster id for MCS DNS
Upgrade / Downgrade Strategy
Any changes to the API definition will follow the official Kubernetes API groups
and versioning guidance here and here. In
short, the API will be provided in order through v1alphaX, v1betaX, to v1,
where compatibility will be preserved from v1beta1 and onwards; clients will
be expected to eventually migrate to the v1 implementation of the API as the
prior versions are deprecated.
Version Skew Strategy
As a CRD, this API is dependent on any changes in the version and compatibility
of the CRD feature itself on which it is built. As the CRD system is in v1 as
of Kubernetes 1.16, and the Kubernetes versioning guarantees v1 APIs to be
maintained through the Kubernetes major release, and as the About API does not
depend on any new features of the CRD system since then, there is no expected
coordination required with any core Kubernetes components until and unless
Kubernetes proceeds to version 2.X.
This CRD /is/ a direct dependency of the MCS API and any mcs-controller implementation as defined by that KEP. As discussed later in the PRR, it is expected that the mcs-controller (or any other controller taking this CRD as its dependency) would manage the lifecycle of this CRD, including any version skew.
As also mentioned below, we are aware that other features (in or out of tree) may want to use this CRD (as debated in “To CRD or Not to CRD” section, above) but we believe it is in the scope of those future features to assess the impact of this CRD’s version strategy on their component’s version skew and their feature’s stability if they do.
Production Readiness Review Questionnaire
NOTE: While this KEP represents only the schema of a CRD that will be implemented out-of-tree and maintained separately from core Kubernetes, a best effort on the PRR questionnaire is enclosed below.
Feature Enablement and Rollback
This section must be completed when targeting alpha to a release.
How can this feature be enabled / disabled in a live cluster?
- Feature gate (also fill in values in kep.yaml)
  - Feature gate name:
  - Components depending on the feature gate:
- Other
  - Describe the mechanism:
    - This feature is independently installed via a CRD hosted on the kubernetes-sigs GitHub.
  - Will enabling / disabling the feature require downtime of the control plane?
    - No
  - Will enabling / disabling the feature require downtime or reprovisioning of a node?
    - No
Does enabling the feature change any default behavior? Any change of default behavior may be surprising to users or break existing automations, so be extremely careful here.
- No default Kubernetes behavior is currently planned to be based on this feature; it is designed to be used by the separately installed, out-of-tree, MCS controller. That being said, we are of the opinion that future features (default or not) may want to use this CRD (as debated in “To CRD or Not to CRD” section, above) but we believe it is in the scope of those future features to assess the impact that requiring CRD bootstrapping has on their feature stability if they do.
Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)? Also set disable-supported to true or false in kep.yaml. Describe the consequences on existing workloads (e.g., if this is a runtime feature, can it break the existing applications?).
- Yes, as this feature only describes a CRD, it can most directly be disabled by uninstalling the CRD. However in practice it is expected that the bootstrapping of this CRD and the management of the well known property CRs themselves will be managed by the mcs-controller, and the recommended way to disable this feature will be to disable the mcs-controller. It is expected the mcs-controller will be responsible for detecting the presence of this CRD to gracefully fail or otherwise raise error messages that can be acted on if the CRD has been disabled by a mechanism other than the mcs-controller’s lifecycle management of the CRD.
What happens if we reenable the feature if it was previously rolled back?
- Purely from this KEP’s standpoint, feature reenablement - namely, reinstallation of the CRD - will do no more than reinstall the CRD schema. In relation to the expected lifecycle manager of this CRD (the mcs-controller), it is expected that on reenablement the mcs-controller will reinstall the CRD and reestablish lifecycle management of the well known properties it is dependent on, including re-creating any relevant CRs.
Are there any tests for feature enablement/disablement? The e2e framework does not currently support enabling or disabling feature gates. However, unit tests in each component dealing with managing data, created with and without the feature, are necessary. At the very least, think about conversion tests if API types are being modified.
- As a dependency only for an out-of-tree component, there will not be e2e tests for feature enablement/disablement of this CRD in core Kubernetes, but e2e tests for this can be implemented in the kubernetes-sigs/mcs-api repo where a basic mcs-controller implementation lives. In reality, multiple mcs-controller implementations are expected to be produced outside of core and these production-ready mcs-controllers are responsible for their own e2e testing.
Rollout, Upgrade and Rollback Planning
This section must be completed when targeting beta graduation to a release.
How can a rollout fail? Can it impact already running workloads?
CRDs themselves are Kubernetes objects, and can fail to be applied if the schema definition is corrupt or incompatible with the CustomResourceDefinition schema. Unit tests and manual tests continuously confirm that the built CRD yaml produced by this project is valid against the stable v1 CustomResourceDefinition. (A rollout could also fail if the CRD is applied to a version of Kubernetes that does not have the CRD system (<1.14), or if the API server is unreachable, but both are considered catastrophic failures out of scope of this KEP.)
Ultimately, the failure of a rollout of any CRD has the potential to disrupt all features or workloads that depend on it. Watches in controllers will fail to receive updates because the client cannot find the CRD. A concrete known example for this CRD is the CoreDNS multicluster DNS plugin, which would fail to program new DNS records, and CoreDNS would answer SERVFAIL to any request for a Kubernetes record that has not yet been synchronized. Features or workloads that depend on this CRD should plan to manage the lifecycle of this CRD or to provide transparent failure modes if the CRD is not present.
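A pre-flight check in the spirit of the unit tests mentioned above might look like the following sketch. It checks only a handful of the rules for apiextensions.k8s.io/v1 CustomResourceDefinition objects, and the API server remains the authority on validity. The function name and the example manifest (group, version, and plural name) are assumptions for illustration, not the project's actual test code.

```python
def crd_manifest_problems(crd):
    """Return a list of structural problems found in a v1 CRD manifest dict."""
    problems = []
    if crd.get("apiVersion") != "apiextensions.k8s.io/v1":
        problems.append("apiVersion must be apiextensions.k8s.io/v1")
    if crd.get("kind") != "CustomResourceDefinition":
        problems.append("kind must be CustomResourceDefinition")

    spec = crd.get("spec", {})
    names = spec.get("names", {})
    expected_name = f"{names.get('plural')}.{spec.get('group')}"
    if crd.get("metadata", {}).get("name") != expected_name:
        problems.append("metadata.name must be <plural>.<group>")
    if spec.get("scope") not in ("Cluster", "Namespaced"):
        problems.append("spec.scope must be Cluster or Namespaced")

    versions = spec.get("versions", [])
    if sum(1 for v in versions if v.get("storage")) != 1:
        problems.append("exactly one version must set storage: true")
    for v in versions:
        # v1 CRDs require a schema for every listed version.
        if "openAPIV3Schema" not in v.get("schema", {}):
            problems.append(
                f"version {v.get('name')} lacks schema.openAPIV3Schema"
            )
    return problems


# Hand-written example manifest; identifiers are assumed, not authoritative.
example = {
    "apiVersion": "apiextensions.k8s.io/v1",
    "kind": "CustomResourceDefinition",
    "metadata": {"name": "clusterproperties.about.k8s.io"},
    "spec": {
        "group": "about.k8s.io",
        "scope": "Cluster",
        "names": {"plural": "clusterproperties", "kind": "ClusterProperty"},
        "versions": [{
            "name": "v1alpha1",
            "served": True,
            "storage": True,
            "schema": {"openAPIV3Schema": {"type": "object"}},
        }],
    },
}

print(crd_manifest_problems(example))
```

Running a check like this before applying the manifest surfaces obvious mistakes early, but it does not replace applying the CRD to a real or test API server.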
What specific metrics should inform a rollback?
Metrics should be configured using a metrics solution implementing the Custom Metrics API, for example, the metrics plugin for Custom Resources in kube-state-metrics. Kubernetes does not provide default metrics for CRDs.
Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested? Unit tests and manual tests confirm that the CRD is capable of being uninstalled and reinstalled.
Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? No.
Monitoring Requirements
This section must be completed when targeting beta graduation to a release.
How can an operator determine if the feature is in use by workloads?
Kubernetes does not provide default metrics for CRDs so an operator would need to depend on custom metrics, or filter 404s from Kubernetes API server against this CRD.
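One way to filter 404s against this CRD is to scan API server audit events. The sketch below assumes audit logging is enabled and written as one JSON event per line, and that the CRD's resources are served under the URI prefix shown. The prefix, the helper name, and the sample usernames are assumptions for illustration.

```python
import json
from collections import Counter

# Assumed URI prefix for the CRD's resources; adjust to the installed group.
ASSUMED_PREFIX = "/apis/about.k8s.io/"


def count_404s_by_user(audit_lines, prefix=ASSUMED_PREFIX):
    """Count 404 responses for URIs under `prefix`, keyed by username."""
    counts = Counter()
    for line in audit_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(event, dict):
            continue  # skip JSON values that are not audit event objects
        uri = event.get("requestURI", "")
        code = event.get("responseStatus", {}).get("code")
        if uri.startswith(prefix) and code == 404:
            username = event.get("user", {}).get("username", "unknown")
            counts[username] += 1
    return counts


# Made-up audit events standing in for a real audit log.
sample_lines = [
    json.dumps({"requestURI": "/apis/about.k8s.io/v1alpha1/clusterproperties",
                "responseStatus": {"code": 404},
                "user": {"username": "system:serviceaccount:mcs:controller"}}),
    json.dumps({"requestURI": "/apis/about.k8s.io/v1alpha1/clusterproperties",
                "responseStatus": {"code": 200},
                "user": {"username": "system:serviceaccount:mcs:controller"}}),
    json.dumps({"requestURI": "/api/v1/pods",
                "responseStatus": {"code": 404},
                "user": {"username": "someone-else"}}),
    "not a json line",
]

print(count_404s_by_user(sample_lines))
```

Grouping by username gives a rough view of which workloads are calling the CRD, which is otherwise unavailable without custom metrics.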
What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
N/A: This KEP does not propose a service; it only leverages the existing Kubernetes API service and CRD extension mechanism.
What are the reasonable SLOs (Service Level Objectives) for the above SLIs?
N/A: This KEP does not propose a service; it only leverages the existing Kubernetes API service and CRD extension mechanism.
Are there any missing metrics that would be useful to have to improve observability of this feature?
Default metrics for CRDs in general, such as the number of requests by workload source, would improve observability of this feature.
Dependencies
This section must be completed when targeting beta graduation to a release.
- Does this feature depend on any specific services running in the cluster? This feature depends only on the CustomResourceDefinition v1 in Kubernetes API server, available in Kubernetes versions 1.14+.
Scalability
For alpha, this section is encouraged: reviewers should consider these questions and attempt to answer them.
For beta, this section is required: reviewers must answer these questions.
For GA, this section is required: approvers should be able to confirm the previous answers based on experience in the field.
Will enabling / using this feature result in any new API calls?
Installing the CRD will require a single API call to POST the new CustomResourceDefinition resource that represents it.
Will enabling / using this feature result in introducing new API types?
Yes, installing the CRD introduces the cluster-scoped ClusterProperty Kind. As there is no related service proposed as part of this KEP, there are no specific limits on the supported number of objects per cluster outside of Kubernetes API server storage limits.
Will enabling / using this feature result in any new calls to the cloud provider?
No.
Will enabling / using this feature result in increasing size or count of the existing API objects?
Besides the trivial single CustomResourceDefinition required to install this CRD, no other size or count of existing API objects will be affected by this KEP.
Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
No, this KEP does not affect any of the operations covered by existing SLIs/SLOs, particularly since CustomResourceDefinitions are excluded from those SLOs.
Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, …) in any components?
This CRD uses the structural schema validation provided by the CRD extension mechanism, which consumes some resources whenever a CR is created or updated. However, the number of expected resources (2 as of this KEP) and their rate of change (tied to clusterset membership changes, which are expected to be rare, human-driven decisions) are expected to be trivial.
Troubleshooting
The Troubleshooting section currently serves the Playbook role. We may
consider splitting it into a dedicated Playbook document (potentially with
some monitoring details). For now, we leave it here.
This section must be completed when targeting beta graduation to a release.
How does this feature react if the API server and/or etcd is unavailable?
This KEP itself proposes a CRD applied to the API server; if the API server and/or etcd is unavailable, so is this CRD. Features dependent on this CRD must assess the impact of this CRD’s availability on their component’s availability. Most concretely today, components of the mcs-controller are expected to serve as an admission controller to this CRD or are dependent on this CRD to program DNS. If the API server and/or etcd is unavailable, those controllers will be unable to update a cluster’s ClusterProperty data regarding its well-known properties as part of a ClusterSet, or to program any updates to DNS, respectively.
What are other known failure modes?
- [CRD cannot be installed]
- Detection: Custom metrics or dependent feature metrics; increased 404 rate on Kube API server for the CRD.
- Mitigations: What can be done to stop the bleeding, especially for already running user workloads?
- Diagnostics: What are the useful log messages and their required logging levels that could help debug the issue? Warning and above, as this is the level at which 404s against the CRD will be seen.
- Testing: Unit tests against generated CRD schema installation and usage of generated client.
What steps should be taken if SLOs are not being met to determine the problem?
N/A: SLOs are not defined as there is no service provided by this KEP.