KEP-1209: Metrics Stability Framework
- Release Signoff Checklist
- Summary
- Motivation
- Proposal
- Design Details
- Production Readiness Review Questionnaire
- Implementation History
Release Signoff Checklist
Items marked with (R) are required prior to targeting to a milestone / release.
- (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
- (R) KEP approvers have approved the KEP status as implementable
- (R) Design details are appropriately documented
- (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
- (R) Graduation criteria are in place
- (R) Production readiness review completed
- (R) Production readiness review approved
- “Implementation History” section is up-to-date for milestone
- User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
- Supporting documentation (e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes)
Summary
This proposal covers the implementation of metrics stability in the kubernetes/kubernetes repo (and anywhere else that consumes component-base/metrics).
Historically, the implementation was split into four documents:
- Metrics Stability Framework
- Metrics Stability Migration
- Metrics Validation and Verification
- Metrics Stability to Beta
This document introduces no new design; it ties the four together to document the lifecycle of this feature.
Motivation
See:
- Metrics Stability Framework#Motivation
- Metrics Stability Migration#Motivation
- Metrics Validation and Verification#Motivation
- Metrics Stability to Beta#Motivation
Proposal
See:
- Metrics Stability Framework#Proposal
- Metrics Stability Migration#General Migration Strategy
- Metrics Validation and Verification#Proposal
- Metrics Stability to Beta#Proposal
Design Details
See:
Graduation Criteria
Net New -> Alpha Graduation
See:
Alpha -> Beta Graduation
See:
- Metrics Validation and Verification#Graduation Criteria
- Metrics Stability to Beta#Graduation Criteria
Beta -> GA Graduation
- Metrics are now eligible to be promoted to STABLE status (we have some candidates in kube-apiserver):
  - apiserver_storage_object_counts
  - apiserver_request_total will also be promoted (as discussed in the biweekly SIG API Machinery meeting)
- Implement the ability to turn off individual metrics.
  - This is needed to contain problems such as unbounded value sets for metric labels.
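As a rough illustration of why per-metric disabling helps, here is a minimal Go sketch of a registry that drops any metric whose fully qualified name appears in an operator-supplied deny list, so a metric with an unbounded label set can be switched off without a code change. The `Metric`, `Registry`, and `NewRegistry` names, the comma-separated input format, and the metric name `some_unbounded_metric` are hypothetical and are not the component-base/metrics API.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Metric is a hypothetical, minimal stand-in for a registered metric.
type Metric struct {
	FQName string // fully qualified name, e.g. "apiserver_request_total"
}

// Registry is a hypothetical registry that skips metrics an operator has disabled.
type Registry struct {
	disabled   map[string]bool
	registered map[string]Metric
}

// NewRegistry builds a registry from a comma-separated list of fully qualified
// metric names to disable (loosely modeled on a command-line flag value).
func NewRegistry(disabledList string) *Registry {
	r := &Registry{disabled: map[string]bool{}, registered: map[string]Metric{}}
	for _, name := range strings.Split(disabledList, ",") {
		if name = strings.TrimSpace(name); name != "" {
			r.disabled[name] = true
		}
	}
	return r
}

// Register adds m unless it has been disabled, and reports whether it was added.
func (r *Registry) Register(m Metric) bool {
	if r.disabled[m.FQName] {
		return false // disabled metrics are never registered, so never exposed
	}
	r.registered[m.FQName] = m
	return true
}

// Names returns the registered metric names in sorted order.
func (r *Registry) Names() []string {
	names := make([]string, 0, len(r.registered))
	for n := range r.registered {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

func main() {
	r := NewRegistry("some_unbounded_metric, ")
	r.Register(Metric{FQName: "apiserver_request_total"})
	r.Register(Metric{FQName: "some_unbounded_metric"})
	fmt.Println(r.Names()) // [apiserver_request_total]
}
```

Filtering at registration time means a disabled metric never allocates per-label series at all, which is what makes this an effective escape hatch for label cardinality problems.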
Upgrade / Downgrade Strategy
See:
Version Skew Strategy
N/A
Production Readiness Review Questionnaire
How can this feature be enabled / disabled in a live cluster?
The metrics stability framework adds developer tooling around the commit pipeline and is not a user-facing feature per se. The only user-facing part is the stability level annotated on each metric.
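For illustration, here is a minimal Go sketch of how a stability level can be surfaced to users. It assumes the level is exposed as a bracketed prefix on the metric's help text in the Prometheus text exposition format, and that an unset level defaults to ALPHA. The `StabilityLevel`, `MetricOpts`, and `helpLine` names and the help string are hypothetical stand-ins, not the component-base/metrics API.

```go
package main

import "fmt"

// StabilityLevel is a hypothetical per-metric stability level.
type StabilityLevel string

const (
	ALPHA  StabilityLevel = "ALPHA"
	STABLE StabilityLevel = "STABLE"
)

// MetricOpts is a hypothetical options struct for declaring a metric.
type MetricOpts struct {
	Name           string
	Help           string
	StabilityLevel StabilityLevel
}

// helpLine renders the Prometheus text-format HELP line with the stability
// level prefixed to the help text, treating an unset level as ALPHA.
func helpLine(o MetricOpts) string {
	level := o.StabilityLevel
	if level == "" {
		level = ALPHA
	}
	return fmt.Sprintf("# HELP %s [%s] %s", o.Name, level, o.Help)
}

func main() {
	fmt.Println(helpLine(MetricOpts{
		Name:           "apiserver_request_total",
		Help:           "Counter of apiserver requests.",
		StabilityLevel: STABLE,
	}))
	// # HELP apiserver_request_total [STABLE] Counter of apiserver requests.
}
```

Putting the level in the help text keeps the metric name and labels unchanged, so scrapers need no changes while operators can still see which metrics carry stability guarantees.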
The framework is intended to increase reliability in control-plane management, so its features tend to ‘fix’ aspects of the development process that lead to downstream breakages.
Rollout, Upgrade and Rollback Planning
This section must be completed when targeting beta graduation to a release.
N/A; this is not a feature per se.
What specific metrics should inform a rollback?
N/A
Monitoring Requirements
How can an operator determine if the feature is in use by workloads?
N/A
What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
N/A
Metrics
The stability framework applies to all metrics which originate directly from the control-plane.
Dependencies
This section must be completed when targeting beta graduation to a release.
Does this feature depend on any specific services running in the cluster?
N/A
For GA, this section is required: approvers should be able to confirm the previous answers based on experience in the field.
Will enabling / using this feature result in any new API calls? Describe them, providing:
No.
Troubleshooting
How does this feature react if the API server and/or etcd is unavailable?
N/A (although if the component is unavailable, its metrics cannot be scraped).
What are other known failure modes?
At worst, the framework can clog the commit pipeline, since it is effectively a conformance test for metric stability guarantees. In that case, we can turn off the verification and validation mechanism (i.e. the hack/verify_generated_stable_metrics.sh script), which effectively returns us to where we were before the framework existed. Note, however, that doing so allows developers to commit breaking changes to metrics and violate the stability guarantees.
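To make this failure mode concrete, here is a minimal Go sketch of the kind of check a stable-metrics verification step performs. It assumes, hypothetically, that STABLE metrics are recorded in a checked-in golden list of name, type, and label names; the `StableMetric` type and `verifyStable` function are illustrative and do not reflect how hack/verify_generated_stable_metrics.sh is actually implemented. The label sets in the example are shortened placeholders.

```go
package main

import (
	"fmt"
	"sort"
)

// StableMetric is a hypothetical record of a STABLE metric in a golden list.
type StableMetric struct {
	Name   string
	Type   string   // e.g. "counter", "gauge", "histogram"
	Labels []string // label names; order does not matter
}

// sameLabels reports whether a and b contain the same label names, ignoring order.
func sameLabels(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	x := append([]string(nil), a...)
	y := append([]string(nil), b...)
	sort.Strings(x)
	sort.Strings(y)
	for i := range x {
		if x[i] != y[i] {
			return false
		}
	}
	return true
}

// verifyStable lists every way current breaks the guarantees recorded in golden:
// a stable metric was removed, or its type or label names changed.
// Metrics present only in current are not reported.
func verifyStable(golden, current []StableMetric) []string {
	byName := make(map[string]StableMetric, len(current))
	for _, m := range current {
		byName[m.Name] = m
	}
	var problems []string
	for _, g := range golden {
		c, ok := byName[g.Name]
		switch {
		case !ok:
			problems = append(problems, fmt.Sprintf("%s: stable metric removed", g.Name))
		case c.Type != g.Type:
			problems = append(problems, fmt.Sprintf("%s: type changed from %s to %s", g.Name, g.Type, c.Type))
		case !sameLabels(c.Labels, g.Labels):
			problems = append(problems, fmt.Sprintf("%s: labels changed from %v to %v", g.Name, g.Labels, c.Labels))
		}
	}
	sort.Strings(problems)
	return problems
}

func main() {
	golden := []StableMetric{
		{Name: "apiserver_request_total", Type: "counter", Labels: []string{"code", "verb"}},
	}
	current := []StableMetric{
		{Name: "apiserver_request_total", Type: "counter", Labels: []string{"verb"}},
	}
	for _, p := range verifyStable(golden, current) {
		fmt.Println(p)
	}
	// apiserver_request_total: labels changed from [code verb] to [verb]
}
```

A CI wrapper around a check like this would fail the build whenever the returned slice is non-empty; bypassing that wrapper is the escape hatch described above, with the same loss of guarantees.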
Implementation History
- Metrics Stability Framework#Implementation History
- Metrics Stability Migration#Implementation History
- Metrics Validation and Verification#Implementation History
- Metrics Stability to Beta#Implementation History
Status: currently implemented.