KEP-1209: Metrics Stability Framework

Implementation History
STABLE Implemented
Created 2019-04-04
Latest v1.21
Milestones
Alpha v1.15
Beta v1.17
Stable v1.21

Release Signoff Checklist

Items marked with (R) are required prior to targeting to a milestone / release.

  • (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
  • (R) KEP approvers have approved the KEP status as implementable
  • (R) Design details are appropriately documented
  • (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
  • (R) Graduation criteria is in place
  • (R) Production readiness review completed
  • (R) Production readiness review approved
  • “Implementation History” section is up-to-date for milestone
  • User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
  • Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

Summary

This proposal covers the implementation of metrics stability in the kubernetes/kubernetes repo (and anywhere else that consumes component-base/metrics).

Historically, the implementation was split into four documents:

  1. Metrics Stability Framework
  2. Metrics Stability Migration
  3. Metrics Validation and Verification
  4. Metrics Stability to Beta

This document introduces nothing new; it ties the four documents together in order to record the lifecycle of this feature.

Motivation

See:

  1. Metrics Stability Framework#Motivation
  2. Metrics Stability Migration#Motivation
  3. Metrics Validation and Verification#Motivation
  4. Metrics Stability to Beta#Motivation

Proposal

See:

  1. Metrics Stability Framework#Proposal
  2. Metrics Stability Migration#General Migration Strategy
  3. Metrics Validation and Verification#Proposal
  4. Metrics Stability to Beta#Proposal

https://github.com/kubernetes/enhancements/blob/77a84d2d55b5802a615f3fe98e7e7c9bd26c9efc/keps/sig-instrumentation/1209-metrics-stability/keps/sig-instrumentation/1209-metrics-stability/20190404-kubernetes-control-plane-metrics-stability.md#implementation-history

Design Details

See:

  1. Metrics Stability Framework#Design Details
  2. Metrics Validation and Verification#Design Details

Graduation Criteria

Net New -> Alpha Graduation

See:

  1. Metrics Stability Framework#Graduation Criteria
  2. Metrics Stability Migration#Graduation Criteria

Alpha -> Beta Graduation

See:

  1. Metrics Validation and Verification#Graduation Criteria
  2. Metrics Stability to Beta#Graduation Criteria

Beta -> GA Graduation

  • Metrics are now eligible to be promoted to STABLE status (we have some candidates in kube-apiserver).
  • Implement the ability to turn off individual metrics (see here)
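
As a rough illustration of what turning off an individual metric could look like, here is a minimal sketch. It does not reflect how component-base/metrics implements this: the Registry type, its methods, and the metric name example_expensive_metric_total are all hypothetical. The idea is simply that a metric named in a deny list is never recorded and therefore never exposed.

```go
package main

import "fmt"

// Registry is a hypothetical, in-memory stand-in for a metrics registry.
// It is not the component-base/metrics registry.
type Registry struct {
	disabled map[string]bool
	counts   map[string]int
}

// NewRegistry builds a registry in which the named metrics are turned off.
func NewRegistry(disabled ...string) *Registry {
	r := &Registry{disabled: map[string]bool{}, counts: map[string]int{}}
	for _, name := range disabled {
		r.disabled[name] = true
	}
	return r
}

// Inc increments the named counter unless it has been turned off.
// It reports whether the observation was recorded.
func (r *Registry) Inc(name string) bool {
	if r.disabled[name] {
		return false
	}
	r.counts[name]++
	return true
}

// Exposed returns a copy of the counters that would be served to a scraper.
func (r *Registry) Exposed() map[string]int {
	out := make(map[string]int, len(r.counts))
	for name, v := range r.counts {
		out[name] = v
	}
	return out
}

func main() {
	// example_expensive_metric_total is a made-up metric name.
	r := NewRegistry("example_expensive_metric_total")
	r.Inc("example_expensive_metric_total") // dropped: metric is turned off
	r.Inc("apiserver_request_total")        // recorded
	fmt.Println(r.Exposed())
}
```

Dropping the observation at record time, rather than filtering at scrape time, means a disabled metric also costs nothing to maintain; that is a design choice of this sketch, not a statement about the actual implementation.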

Upgrade / Downgrade Strategy

See:

https://github.com/kubernetes/enhancements/blob/0f5bb1138a6dfd7f3d52fa901c2fba7abb7fb731/keps/sig-instrumentation/1209-metrics-stability/keps/sig-instrumentation/1209-metrics-stability/20190404-kubernetes-control-plane-metrics-stability.md#implementation-history

Version Skew Strategy

N/A

Production Readiness Review Questionnaire

How can this feature be enabled / disabled in a live cluster?

The metrics stability framework adds developer tooling around commit pipelines and is not a user-facing feature per se. The part that is user-facing is the annotation on metrics with a stability level.

The framework aims to make control-plane management more reliable, so its features tend to ‘fix’ aspects of the development process that lead to downstream breakages.
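
To make the user-facing part concrete, the following is a minimal sketch of a metric definition carrying a stability level. The types below are local stand-ins written for this example and are not imported from component-base/metrics. Two behaviours are assumptions of the sketch: an unannotated metric is treated as ALPHA, and the level is surfaced as a prefix on the help text.

```go
package main

import "fmt"

// StabilityLevel is a local stand-in for the stability annotation.
type StabilityLevel string

const (
	Alpha  StabilityLevel = "ALPHA"
	Stable StabilityLevel = "STABLE"
)

// CounterOpts is a hypothetical, trimmed-down options struct for a counter.
type CounterOpts struct {
	Name           string
	Help           string
	StabilityLevel StabilityLevel
}

// EffectiveStability returns the declared level, or Alpha when the metric
// carries no annotation (an assumption of this sketch).
func EffectiveStability(o CounterOpts) StabilityLevel {
	if o.StabilityLevel == "" {
		return Alpha
	}
	return o.StabilityLevel
}

// HelpText prefixes the help string with the stability level so that the
// annotation is visible to whoever reads the scraped output.
func HelpText(o CounterOpts) string {
	return fmt.Sprintf("[%s] %s", EffectiveStability(o), o.Help)
}

func main() {
	stable := CounterOpts{
		Name:           "apiserver_request_total",
		Help:           "Counter of apiserver requests.", // illustrative help text
		StabilityLevel: Stable,
	}
	// example_widget_total is a made-up metric with no annotation.
	unannotated := CounterOpts{Name: "example_widget_total", Help: "Widgets seen."}

	fmt.Println(HelpText(stable))
	fmt.Println(HelpText(unannotated))
}
```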

Rollout, Upgrade and Rollback Planning

This section must be completed when targeting beta graduation to a release.

N/A, this isn’t a feature per se.

What specific metrics should inform a rollback?

N/A

Monitoring Requirements

How can an operator determine if the feature is in use by workloads?

N/A

What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

N/A

Metrics

The stability framework applies to all metrics which originate directly from the control-plane.

Dependencies

This section must be completed when targeting beta graduation to a release.

Does this feature depend on any specific services running in the cluster?

N/A

For GA, this section is required: approvers should be able to confirm the previous answers based on experience in the field.

Will enabling / using this feature result in any new API calls? Describe them, providing:

No.

Troubleshooting

How does this feature react if the API server and/or etcd is unavailable?

N/A (but if the component isn’t available, no metrics are being scraped).

What are other known failure modes?

At worst, the framework can clog the commit pipeline, since it is effectively a conformance test for metric stability guarantees. In that case, we can turn off the verification and validation mechanism (i.e. the hack/verify_generated_stable_metrics.sh script), which puts us back where we were before the framework. Note, however, that doing so allows developers to commit breaking changes to metrics and violate the guarantees.
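
The kind of check that can block a commit may be sketched as a comparison between a checked-in "golden" list of stable metrics and the list derived from the current code. The sketch below is not what the verification script does internally; the StableMetric type, the Verify function, and the label sets are hypothetical and only illustrate why a breaking change to a stable metric would fail the pipeline.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// StableMetric holds the parts of a metric that this sketch treats as
// covered by the stability guarantee.
type StableMetric struct {
	Name   string
	Type   string
	Labels []string
}

// signature renders a metric as a string that ignores label ordering.
func signature(m StableMetric) string {
	labels := append([]string(nil), m.Labels...)
	sort.Strings(labels)
	return m.Name + "|" + m.Type + "|" + strings.Join(labels, ",")
}

// Verify reports every golden metric that was removed or changed in current.
// Metrics that appear only in current are not flagged by this sketch.
func Verify(golden, current []StableMetric) []string {
	byName := make(map[string]StableMetric, len(current))
	for _, m := range current {
		byName[m.Name] = m
	}
	var violations []string
	for _, g := range golden {
		c, ok := byName[g.Name]
		switch {
		case !ok:
			violations = append(violations, g.Name+": removed")
		case signature(c) != signature(g):
			violations = append(violations, g.Name+": changed")
		}
	}
	sort.Strings(violations)
	return violations
}

func main() {
	// The label sets here are simplified and hypothetical.
	golden := []StableMetric{
		{Name: "apiserver_request_total", Type: "counter", Labels: []string{"verb", "code"}},
	}
	current := []StableMetric{
		{Name: "apiserver_request_total", Type: "counter", Labels: []string{"verb"}},
	}
	for _, v := range Verify(golden, current) {
		fmt.Println("stability violation:", v)
	}
}
```

Disabling such a check removes the only gate between a developer's change and the guarantee, which is the trade-off described above.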

Implementation History

  1. Metrics Stability Framework#Implementation History
  2. Metrics Stability Migration#Implementation History
  3. Metrics Validation and Verification#Implementation History
  4. Metrics Stability to Beta#Implementation History

Status: currently implemented.

Current list of stable metrics:

  1. apiserver_request_total
  2. apiserver_storage_object_counts
  3. apiserver_request_duration_seconds