KEP-4785: Resource State Metrics

Release Signoff Checklist
Summary
Motivation
- Goals
- Non-Goals
Proposal
Design Details
Production Readiness Review Questionnaire
Implementation History
Drawbacks
Alternatives
Infrastructure Needed (Optional)

Release Signoff Checklist

N/A since the KEP proposes a controller external to the core Kubernetes codebase.

Summary

Resource State Metrics is a Kubernetes controller that builds on the ideas behind Kube State Metrics’ Custom Resource State and generates metrics for custom resources based on the configuration specified in its managed resource, ResourceMetricsMonitor.

Motivation

kubernetes/kube-state-metrics#1710 introduced the Custom Resource State API to Kube State Metrics, which allowed for generating metrics from Custom Resources’ schemata. This was a highly appreciated and much needed feature-set, since before this, the only way to generate metrics for a particular resource was to get that logic merged upstream. It has been two years since the patch was merged, and yet even today the feature-set is arguably one of the most contributed-to parts of one of the most active repositories that fall under SIG Instrumentation.

However, during the recent releases, the maintainers realized that, owing to its almost dependency-free parsing logic, the feature-set became more prone to side effects as the configuration scaled to include more fields to cater to the needs raised by the community. After many cycles of trading off maintenance complexity for more features, and seeing its maturity, the maintainers agreed on putting it in a “maintenance-only” mode in favor of a better solution that prioritizes scalability in the long run, while ensuring that it meets the same expectations as its predecessor and more.

Such a “solution” should allow the maintainers to deprecate and drop the Custom Resource State API from Kube State Metrics and replace it with Resource State Metrics. In addition to its own benefits, this would allow Kube State Metrics to drop all Custom Resource State API-specific behaviors that can crash it and thereby directly affect the availability of the native metrics defined in the codebase. Native metrics would then no longer experience significant downtime due to an unavoidable error during the generation of an entirely different set of metrics. Also, clusters requiring only native metrics will not needlessly incur the additional Custom Resource State API overhead when metrics stores are being built.

The presence of native and custom resource metrics under the same hood also encouraged folks to infer implicit expectations that we do not officially support. This in turn has driven them to patch the two with shared logic, even though the two anticipate varying degrees of engagement and should not be (and are not designed to be) interdependent. Kube State Metrics, owing to its original purpose of native metrics generation, is meant to be comparatively much more stable. As such, the two feature-sets should be able to grow and release independently, instead of a change in one part of the codebase, which the other does not depend upon, warranting a release.

Goals

The KEP targets a controller which allows for:

Custom Resource metrics generation based on their schemata,
using admission webhooks to avoid ingesting conflicting or bad configuration from managed resources,
defining cluster-scoped managed resources (ResourceMetricsMonitor) that allow defining the collection configuration for generating metrics on-the-fly,
accommodating multiple configuration parsing techniques and expressions using Turing-incomplete (or Turing-complete if Go-bindings are available) languages,
conforming to, improving on, and eventually deprecating the existing Custom Resource State API offered by Kube State Metrics without hindering the maintainability and the scalability of the controller.

Non-Goals

The KEP does not target a controller which:

overlaps with Kube State Metrics’ goals in any way except for the Custom Resource State API, or,
offers any stability and performance guarantees for the metrics generated using its managed resource(s).

Proposal

The proposal targets the incubation and incorporation of a controller capable of essentially doing all that Kube State Metrics’ Custom Resource State API offers, in addition to various benefits of its own owing to the controller lifecycle it is based upon.

The controller aims to replace the existing Kube State Metrics’ Custom Resource State feature-set and be significantly more maintainable and scalable than its predecessor. It also gives the community the freedom to extend the supported set of DSLs and parse the configuration in a language that they are familiar with, instead of forcing them up a steep learning curve for a self-defined DSL that they have to live with, as is the case with Kube State Metrics’ Custom Resource State feature-set.

User Stories (Optional)

Story 1

As a cluster admin, I want to be able to express my Kube State Metrics’ Custom Resource State configurations in dedicated CRs, so I can benefit from everything the controller lifecycle has to offer, e.g., not having to mount volumes and redeploy on every modification, event emissions on metrics generation, multiple managed resources to isolate configuration logic, etc.

Story 2

As a cluster admin, I want to be able to express my Kube State Metrics’ Custom Resource State configurations in languages that are well-known throughout the ecosystem and that enable me to get going without learning another DSL (domain-specific language).

Story 3

As a cluster admin, I want to expose additional metrics from core resources that Kube State Metrics does not provide out of the box, so I can observe custom-tailored metrics for my use case.

Story 4

As a maintainer of a project that makes use of Custom Resources, I want to provide configuration on how to generate metrics for them via an externally managed resource, so my users can deploy them along-side on the same cluster.

Notes/Constraints/Caveats (Optional)

The controller abides by the following principles:

Garbage in, garbage out: Invalid configurations will generate invalid metrics. The exception is that certain checks that ensure metric structure are still present (e.g., the value should be a float64).
Library support: The module is not intended to be used as a library, and as such, does not export any functions or types, with pkg/ being an exception (for e.g., managed resource types).
Metrics stability: There are no metrics stability guarantees, as the metrics are dynamically generated.
No middle-ware: The configuration is unmarshalled into a set of stores that the codebase directly operates on. Unlike Kube State Metrics, there is no controller-defined parsing middle-ware that processes the configuration before it is used, in order to cut down on unnecessary complexity as much as possible.

Risks and Mitigations

N/A since the managed resource offered by Resource State Metrics provides the ability to define metric configurations that are a super-set of the expressibility that Kube State Metrics’ Custom Resource State configurations have to offer. However, users will need to familiarize themselves with the supported DSLs to be able to translate their configuration into a format that the controller can understand; failing to do so could result in invalid metrics.

Also, since this aims to deprecate the Custom Resource State API, a timeline will be published for users on the Kube State Metrics’ repository and a PSA on the appropriate channel(s), once the controller graduates to stable.

Design Details

The controller offers a number of improvements over Kube State Metrics’ Custom Resource State API, while maintaining a 3x faster round trip time for metric generation.

At its core, the controller relies on its managed resource, ResourceMetricsMonitor, to fetch the metric generation configuration. Parts of the configuration may be defined using different resolvers, such as unstructured or CEL.
Once fetched, the controller unmarshals the configuration YAML directly into stores which are a set of metric families, which in turn are a set of metrics.
Metric stores are created per GVKR (a type that embeds schema.GroupVersionKind and schema.GroupVersionResource to avoid plural ambiguities). Reflectors for the specified resource are then initialized, and they populate the stores whenever the resource is updated.
All generated metrics are hardcoded to gauges by design, as Prometheus currently does not support some OpenMetrics-specified metrics’ types, such as Info and StateSets, but more importantly, because these metrics can be expressed using gauges.
/metrics pings on RSM_MAIN_PORT trigger the server to write the raw metrics defined in the configuration, combined with its appropriate header(s), in the response.
/external pings on RSM_MAIN_PORT trigger the server to write the raw metrics defined in the ./external directory, combined with its appropriate header(s), in the response.
/metrics pings on RSM_SELF_PORT trigger the server to write the raw metrics about the process itself, combined with its appropriate header(s), in the response.
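
As a rough illustration of the exposition step described above, the sketch below renders one metric family as a gauge, with its HELP and TYPE header lines, and serves it from an HTTP handler. The names renderFamily and metricsHandler are assumptions made for this example and are not taken from the controller's codebase. The handler is exercised in-process rather than bound to RSM_MAIN_PORT.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// renderFamily writes the header lines for a metric family followed by its
// raw samples. Every family is typed as a gauge, as described above.
// (Hypothetical helper, not the controller's actual API.)
func renderFamily(name, help string, samples []string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "# HELP %s %s\n", name, help)
	fmt.Fprintf(&b, "# TYPE %s gauge\n", name)
	for _, s := range samples {
		b.WriteString(s)
		b.WriteByte('\n')
	}
	return b.String()
}

// metricsHandler responds to every scrape with the pre-rendered families.
func metricsHandler(families []string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", "text/plain; version=0.0.4")
		for _, f := range families {
			fmt.Fprint(w, f)
		}
	})
}

func main() {
	family := renderFamily(
		"platform_info",
		"Information about a MyPlatform instance",
		[]string{`platform_info{name="test-sample",static_foo="static_foo_value"} 42`},
	)

	// Simulate a single /metrics scrape without opening a real port.
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/metrics", nil)
	metricsHandler([]string{family}).ServeHTTP(rec, req)
	fmt.Print(rec.Body.String())
}
```

Pre-rendering the families keeps the scrape path down to a plain write, which is one plausible reading of "raw metrics" in the steps above.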

At the moment, the spec houses a single configuration field, which defines the metric generation configuration as follows (please note that the schema is fast-moving at this point and may be subject to change):

generators: # Set of metrics stores for each CR we want to generate metrics for.
  - group: "contoso.com" # CR's group.
    version: "v1alpha1"    # CR's version.
    # Both kind and resource names are required to avoid plural ambiguities, see
    # https://github.com/kubernetes-sigs/kubebuilder/issues/3402.
    kind: "MyPlatform"  # CR's kind.
    resource: "myplatforms" # CR's resource.
    selectors: # Set of filters to narrow down the selected CRs, may be:
      field: "metadata.namespace=default" # field selector(s), and (/or),
      label: "app.kubernetes.io/part-of=sample-controller" # label selector(s).
    families: # Set of metrics families to generate for the specified CR.
      - name: "platform_info" # The metric family name, plugged in as-is.
        help: "Information about a MyPlatform instance" # The help text for the
                                                        # metric family, plugged
                                                        # in as-is.
        metrics: # Set of metrics to generate under the current metrics family.
          - resolver: "cel" # Preferred resolver to parse the value expressions.
                            # MAY BE SPECIFIED AT ANY LEVEL.
                            # INNER RESOLVERS OVERRIDE THE OUTER ONES.
                            # NO RESOLVER DEFAULTS TO THE "UNSTRUCTURED" ONE.
            # Set of label-sets to generate for the current metric.
            labelKeys: # Set of ordered label-keys, static in nature.
              - "name"
              - "static_foo"
            labelValues: # Set of ordered label values, dynamic in nature.
                         # Therefore, these may contain static values or
                         # parse-able expressions.
              - "o.metadata.name" # Parse-able CEL expression.
              - "static_foo_value" # Static value.
            value: "42" # `float64` static value, or a dynamic (resolver) path
                        # that MUST resolve to one. A non-cast-able `float64`
                        # will skip the current metric generation and log an
                        # error.
          - labelKeys: # Set of ordered label-keys, static in nature.
              - "environmentType"
              - "static_foo"
            labelValues: # Set of ordered label values, dynamic in nature.
                         # Therefore, these may contain static values or
                         # parse-able expressions.
              - "spec.environmentType" # Parse-able unstructured expression.
              - "static_foo_value" # Static value.
            value: "metadata.labels.foo" # `float64` static value, or a dynamic
                                         # (resolver) path that MUST resolve to
                                         # one. A non-cast-able `float64` will
                                         # skip the current metric generation
                                         # and log an error.
      - name: "platform_replicas" # The metric family name, plugged in as-is.
        help: "Number of replicas for a MyPlatform instance" # The help text for
                                                             # the metric family
                                                             # plugged in as-is.
        metrics: # Set of metrics to generate under the current metrics family.
          - labelKeys: # Set of ordered label-keys, static in nature.
              - "name"
              - "dynamicNoResolveShouldOutputMapRepr_CompositeUnsupportedUpstreamForUnstructured"
            labelValues: # Set of ordered label values, dynamic in nature.
                         # Therefore, these may contain static values or
                         # parse-able expressions.
              - "metadata.name"
              - "metadata.labels"
            value: "spec.replicas" # `float64` static value, or a dynamic
                                   # (resolver) path that MUST resolve to one.
                                   # A non-cast-able `float64` will skip the
                                   # current metric generation and log an error.
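
Two rules from the comments in the configuration above lend themselves to a short sketch: inner resolvers override outer ones (with "unstructured" as the fallback), and a value that cannot be cast to float64 causes the metric to be skipped. The helpers effectiveResolver and castValue below are hypothetical names used for illustration only. They assume the resolver may be set at the generator, family, and metric levels.

```go
package main

import (
	"fmt"
	"strconv"
)

// defaultResolver is used when no level names a resolver.
const defaultResolver = "unstructured"

// effectiveResolver picks the innermost non-empty resolver. Levels are passed
// outermost first, e.g. generator, family, metric. (Hypothetical helper.)
func effectiveResolver(levels ...string) string {
	resolver := defaultResolver
	for _, level := range levels {
		if level != "" {
			resolver = level
		}
	}
	return resolver
}

// castValue converts a resolved value to float64. The boolean is false when
// the value cannot be cast, in which case a caller would skip the metric and
// log an error.
func castValue(resolved string) (float64, bool) {
	v, err := strconv.ParseFloat(resolved, 64)
	if err != nil {
		return 0, false
	}
	return v, true
}

func main() {
	// The metric-level resolver wins over the generator-level one.
	fmt.Println(effectiveResolver("unstructured", "", "cel"))
	// Nothing set at any level falls back to the default.
	fmt.Println(effectiveResolver("", "", ""))

	if _, ok := castValue("not-a-number"); !ok {
		fmt.Println("skipping metric: value is not a float64")
	}
}
```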

It’s also worth mentioning that unlike Kube State Metrics’ Custom Resource State, Resource State Metrics supports recursively generating samples from nested data structures, all from a single expression. Assuming we have a query,

o.spec

for the object,

...
spec:
  appId: test-sample
  language: csharp
  os: linux
  instanceSize: small
  environmentType: dev
  tags:
    - frontend
    - middleware
    - backend
  features:
    - monitoring
    - alerting
  versions:
    - "1.0"
    - "2.0"
    - "3.0"
    - "4.0"
  xProps:
    nonComposite: "example-value"
    compositeArray:
      - "value1"
      - "value2"
    compositeMap:
      key1: "value1"
      key2: "value2"

the resulting metric would look like,

test_metric{os="linux", tags="backend",    key_1="value1", key_2="value2", app_id="test-sample", features="alerting",   language="csharp", versions="1.0", instance_size="small", non_composite="example-value", compositeArray="value1", environment_type="dev"} 2.000000
test_metric{os="linux", tags="frontend",   key_1="value1", key_2="value2", app_id="test-sample", features="monitoring", language="csharp", versions="2.0", instance_size="small", non_composite="example-value", compositeArray="value2", environment_type="dev"} 2.000000
test_metric{os="linux", tags="middleware", key_1="value1", key_2="value2", app_id="test-sample", features="",           language="csharp", versions="3.0", instance_size="small", non_composite="example-value", compositeArray="",       environment_type="dev"} 2.000000
test_metric{os="linux", tags="",           key_1="value1", key_2="value2", app_id="test-sample", features="",           language="csharp", versions="4.0", instance_size="small", non_composite="example-value", compositeArray="",       environment_type="dev"} 2.000000

Note that the order of samples, as well as their labelsets, are guaranteed to be stable across runs.
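
The expansion shown above can be approximated with the sketch below, which repeats scalar fields in every sample and takes the i-th element of each list for the i-th sample, padding exhausted lists with empty strings. This is a simplified guess at the behavior and not the controller's implementation. The flatten and expand helpers are hypothetical, and the sketch does not reproduce the label-name normalization or the ordering visible in the sample output.

```go
package main

import "fmt"

// flatten collects scalar fields into scalars and list fields into lists,
// descending into nested maps. (Hypothetical helper.)
func flatten(obj map[string]any, scalars map[string]string, lists map[string][]string) {
	for k, v := range obj {
		switch t := v.(type) {
		case map[string]any:
			flatten(t, scalars, lists)
		case []any:
			items := make([]string, 0, len(t))
			for _, item := range t {
				items = append(items, fmt.Sprint(item))
			}
			lists[k] = items
		default:
			scalars[k] = fmt.Sprint(t)
		}
	}
}

// expand returns one label set per sample. Scalars are repeated in every
// sample, while the i-th sample carries the i-th element of each list, or an
// empty string once that list is exhausted.
func expand(obj map[string]any) []map[string]string {
	scalars := map[string]string{}
	lists := map[string][]string{}
	flatten(obj, scalars, lists)

	// The longest list decides how many samples are generated.
	n := 1
	for _, l := range lists {
		if len(l) > n {
			n = len(l)
		}
	}

	samples := make([]map[string]string, n)
	for i := range samples {
		labels := make(map[string]string, len(scalars)+len(lists))
		for k, v := range scalars {
			labels[k] = v
		}
		for k, l := range lists {
			if i < len(l) {
				labels[k] = l[i]
			} else {
				labels[k] = ""
			}
		}
		samples[i] = labels
	}
	return samples
}

func main() {
	spec := map[string]any{
		"os":   "linux",
		"tags": []any{"frontend", "middleware", "backend"},
		"xProps": map[string]any{
			"compositeArray": []any{"value1", "value2"},
		},
	}
	for i, labels := range expand(spec) {
		fmt.Println(i, labels)
	}
}
```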

The status, on the other hand, is a set of metav1.Conditions, like so:

status:
  conditions:
    - lastTransitionTime: "2024-11-11T22:43:30Z"
      message: 'Resource configuration has been processed successfully: Event
        handler successfully processed event: addEvent'
      observedGeneration: 1
      reason: EventHandlerSucceeded
      status: "True"
      type: Processed

Test Plan

[x] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.

Prerequisite testing updates

We assess that since the proposed controller resides out-of-tree, no prerequisite testing suites need to be updated.

Unit tests

- `internal`: 20/1/2025 - 21.1% of statements
- `pkg/apis/resourcestatemetrics/v1alpha1`: 20/1/2025 - 18.3% of statements
- `pkg/resolver`: 20/1/2025 - 47.8% of statements

Integration tests

N/A.

e2e tests

End-to-end tests reside in the resourcestatemetrics_test package, and aim to test the following behaviors:

conformance with the existing Custom Resource State API (kubernetes/kube-state-metrics#1710) feature-set,
handling all supported events on the managed resource, which should exhibit deterministic outcomes for each of them,
fuzzing the configuration with invalid values to ensure that the controller does not crash, and
fuzzing the configuration with valid values to ensure there are no side effects.

Graduation Criteria

Upgrade / Downgrade Strategy

ResourceMetricsMonitor API(s) will follow the hub-spoke interconversion model, and as such, an outdated hub or spoke version will be able to upgrade to the latest spoke version when the controller itself is upgraded. This will in turn convert the spoke to the hub version, and the hub version into the latest available spoke version, depending on the controller’s version.

Version Skew Strategy

The controller will follow the same n-1 release strategy as Kube State Metrics’ compatibility matrix.

Production Readiness Review Questionnaire

Feature Enablement and Rollback

How can this feature be enabled / disabled in a live cluster?

Other
- Describe the mechanism: The controller can be enabled by deploying it in the cluster, and disabled by deleting it. The managed resources may be left in the cluster to be picked up by a future deployment. However, if that’s not intended, cluster-admins will need to drop them separately (either all CRs or the defining CRD).
- Will enabling / disabling the feature require downtime of the control plane? No.
- Will enabling / disabling the feature require downtime or reprovisioning of a node? No.

Does enabling the feature change any default behavior?

The controller has RBAC permissions over its managed resources (ResourceMetricsMonitor instances) only and does not attempt to modify or break any existing in-cluster functionality.

Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

Yes, cluster-admins can roll back the controller by simply deleting it, as there are no finalizers or owner references linking back to the controller (this may be subject to change at a later stage). Consequently, doing so does not remove the managed resources automatically, which will need to be dropped separately (or by dropping the defining CRD).

This is because cluster-scoped resources cannot have owner references linking back to namespace-scoped resources, in which case the garbage collection is a no-op. See 819a80 for more details.

What happens if we re-enable the feature if it was previously rolled back?

In the context of the controller, pre-existing managed resources in the cluster will be picked back up after a (re-)deploy.

Are there any tests for feature enablement/disablement?

N/A.

Rollout, Upgrade and Rollback Planning

Future API versions for managed resources will follow the hub-spoke interconversion model.

How can a rollout or rollback fail? Can it impact already running workloads?

In the context of the controller, failed controller deployments simply do not make any changes to the existing managed resources’ metric configurations (spec.configuration), and as such, the metrics will be available again once a deployed instance is up and discovers all existing managed resources in the cluster. As long as no healthy instance is running, however, the metrics exposition will be absent.

What specific metrics should inform a rollback?

At the moment any disruptions will be indicated by the managed resources’ status field, like so,

status:
  conditions:
    - lastTransitionTime: "2024-08-27T19:46:13Z"
      message: 'Resource failed to process'
      observedGeneration: 1
      reason: EventHandlerFailed
      status: "True"
      type: Failed
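
One way tooling could act on this status is sketched below. The Condition struct is a local stand-in for the subset of metav1.Condition fields shown above, and failedAt is a hypothetical helper. Comparing observedGeneration against the resource's current generation is an assumption made here to avoid acting on a stale condition.

```go
package main

import "fmt"

// Condition mirrors the subset of metav1.Condition fields used in this
// sketch. It is a local stand-in so that the example has no dependencies.
type Condition struct {
	Type               string
	Status             string // "True", "False", or "Unknown"
	Reason             string
	Message            string
	ObservedGeneration int64
}

// failedAt reports whether a Failed condition with status "True" was
// recorded for the given generation of the managed resource.
func failedAt(conditions []Condition, generation int64) bool {
	for _, c := range conditions {
		if c.Type == "Failed" && c.Status == "True" && c.ObservedGeneration == generation {
			return true
		}
	}
	return false
}

func main() {
	conditions := []Condition{{
		Type:               "Failed",
		Status:             "True",
		Reason:             "EventHandlerFailed",
		Message:            "Resource failed to process",
		ObservedGeneration: 1,
	}}
	if failedAt(conditions, 1) {
		fmt.Println("managed resource failed to process; consider rolling back")
	}
}
```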

Additionally, diagnostic metrics and traces are planned for the near-future.

Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

TBD (when targeting beta).

Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

The controller aims to completely replace Kube State Metrics’ Custom Resource State feature-set, and as such, cause it to be deprecated once the KEP graduates to stable.

Monitoring Requirements

How can an operator determine if the feature is in use by workloads?

This is not a workload feature, but an out-of-tree telemetry solution.

How can someone using this feature know that it is working for their instance?

Events: Events are emitted in the controller’s namespace, e.g., OwnerRefInvalidNamespace when an owner reference to the controller is defined on a ResourceMetricsMonitor.
API .status: The status for a successfully processed ResourceMetricsMonitor looks as follows:

  status:
    conditions:
    - lastTransitionTime: "2024-08-27T19:46:13Z"
      message: 'Resource configuration has been processed successfully: Event handler successfully processed event: addEvent'
      observedGeneration: 1
      reason: EventHandlerSucceeded
      status: "True"
      type: Processed

Other: http_request_duration_seconds is a histogram metric exposed on the telemetry port which is useful for observing the trends in requests for the generated metrics.

What are the reasonable SLOs (Service Level Objectives) for the enhancement?

TBD.

What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

Metrics: Telemetry metrics exposed on RSM_SELF_PORT.
Other: healthz, readyz, and livez endpoints are exposed on RSM_SELF_PORT, RSM_MAIN_PORT, and RSM_SELF_PORT, respectively.

Are there any missing metrics that would be useful to have to improve observability of this feature?

TBD.

Dependencies

Does this feature depend on any specific services running in the cluster?

The controller relies on core Kubernetes serving-stack components:

kube-apiserver and etcd
- Usage description: The controller needs to make API calls in order to reconcile the managed resources in the cluster.
  - Impact of its outage on the feature: The controller will log errors but keep attempting to reconcile until etcd comes back up.
  - Impact of its degraded performance or high-error rates on the feature: Same as above.

There’s an indirect dependency on critical components such as kube-controller-manager and kubelet, which have been left out above for the sake of brevity.

Scalability

Will enabling / using this feature result in any new API calls?

Yes, the controller needs to make API calls in order to reconcile the managed resources in the cluster. There are no resync polls done by the controller (ResyncPeriod: 0 for all reflectors). The reflectors will do a LIST and/or WATCH on the associated resources’ modification. The same applies for managed resource(s), however, their case will be accompanied by additional GET and UPDATE calls to ensure their ObjectMeta and Status are synced.

Will enabling / using this feature result in introducing new API types?

Yes, the controller currently has one cluster-scoped managed resource of ResourceMetricsMonitor type. There is currently no upper-limit on the number of managed resource instances that could be defined.

Will enabling / using this feature result in any new calls to the cloud provider?

No.

Will enabling / using this feature result in increasing size or count of the existing API objects?

The controller will not create any object on its own. Only when a managed resource is deployed will it try to reconcile (modify) it, if needed.

Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

N/A.

Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, …) in any components?

The telemetry metrics show no considerable jump in memory or CPU usage under any supported operation.

Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?

No.

Troubleshooting

How does this feature react if the API server and/or etcd is unavailable?

The controller will log errors but keep attempting to reconcile until etcd comes back up.

What are other known failure modes?

It is encouraged to observe any symptoms using the telemetry metrics, while monitoring any failure using the available set of health probes.

What steps should be taken if SLOs are not being met to determine the problem?

The controller provides a /debug endpoint (on RSM_SELF_PORT) which exposes all available pprof data to help diagnose any issue with the binary. Additionally, the telemetry metrics can provide more details about the runtime resource consumption of the binary. The health probes take into account the health of the respective set of components associated with that probe and will fail if any such component is not healthy (?verbose may be appended when querying the health probes’ endpoint to know exactly which component(s) are not healthy).

Implementation History

Drawbacks

Alternatives

We considered refactoring the Kube State Metrics’ Custom Resource State API, but that has been done multiple times in the past, and each time we often ended up in the same position, owing to its limited scalability. It’s also worth mentioning kubernetes/kube-state-metrics#1978 here, an in-house effort that had similar goals.

Infrastructure Needed (Optional)

We request a repository (kubernetes-sigs/resource-state-metrics) to migrate rexagod/resource-state-metrics to.