KEP-3515: Kubectl Explain OpenAPIv3

KEP-3515: OpenAPI v3 for kubectl explain

Release Signoff Checklist
Summary
Motivation
Proposal
Design Details
Production Readiness Review Questionnaire
Implementation History
Drawbacks
Alternatives
- Implement proto.Models for OpenAPI V3 data
- Custom User Templates
Future Work
- Other template outputs
  - HTML
  - Markdown

Release Signoff Checklist

Items marked with (R) are required prior to targeting to a milestone / release.

(R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
(R) KEP approvers have approved the KEP status as implementable
(R) Design details are appropriately documented
(R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- e2e Tests for all Beta API Operations (endpoints)
- (R) Ensure GA e2e tests meet requirements for Conformance Tests
- (R) Minimum Two Week Window for GA e2e tests to prove flake free
(R) Graduation criteria is in place
- (R) all GA Endpoints must be hit by Conformance Tests
(R) Production readiness review completed
(R) Production readiness review approved
“Implementation History” section is up-to-date for milestone
User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

Summary

This KEP proposes an enhancement to kubectl explain:

Switch data source from OpenAPI v2 to OpenAPI v3
Replace the hand-written kubectl explain printer with a go/template implementation.

Motivation

OpenAPI v3 is a richer API description than OpenAPI v2

OpenAPI v3 support in Kubernetes has been in beta since version 1.24. OpenAPI v3 is a richer representation of the Kubernetes API for our users, who have been asking for visibility into things like:

nullable
default
validation fields like oneOf, anyOf, etc.

Being able to show any one of these additional data points is, by itself, a strong reason to switch to OpenAPI v3.

CRD schemas expressed as OpenAPI v2 are lossy

Today CRDs specify their schemas in OpenAPI v3 format. To serve the /openapi/v2 document used today by kubectl, there is an expensive conversion from the v3 down to v2 format.

This conversion is very lossy, so kubectl explain gives a poor experience against CRDs that use v3 features: information is inaccurate, or fields are removed altogether.

The conversion also causes bugs. For example, when a user attempts to explain a field that is nullable, kubectl shows nothing, because the lossy conversion wipes nullable fields.

Goals

Provide the new richer type information specified by OpenAPI v3 within kubectl explain
Have a more maintainable text/template based approach to printing
Fall back to the old explain implementation if the cluster does not expose OpenAPI v3 data.
Provide multiple new output formats for kubectl explain:
- human-readable plaintext
- maybe others
(Optional?) Allow users to specify their own templates for use with kubectl explain (there may be interesting use cases for this)
Improve discoverability of API Resources and endpoints, and provide a platform for richer information to be included in the future.

Non-Goals

“Fix” the OpenAPI v3 to OpenAPI v2 conversion. This is a non-goal for two reasons:
- These formats are not compatible, and there WILL be data loss and inaccuracy
- This negates the benefits of using OpenAPI v3 for the richer type information
Provide general-purpose OpenAPI visualization.

Proposal

Basic Usage

The following user experience should be possible with kubectl explain

kubectl explain pods.spec

Output should be familiar to users of today’s kubectl explain, except new information from the OpenAPI v3 spec is now populated.

Note: During development, the feature will be gated by an experimental flag. The commands shown here elide the experimental flag for clarity.

Built-in Template Options

Plaintext

kubectl explain pods

kubectl explain pods --output plaintext

The plaintext output format is the default and should be crafted to be as close as possible to the explain output in use before this KEP.

OpenAPIV3 (raw json)

kubectl explain pods --output openapiv3

Getting raw OpenAPI v3 data for a particular resource today involves: 1.) setting up kubectl proxy, 2.) fetching the correct path at /openapi/v3/<group>/<version>, and 3.) filtering out unwanted results.

This command is useful not only for its convenience but also because other visualizations may be built upon the raw output if we opt not to support a first-class custom template solution in the future.
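As a rough illustration of step 2 of the manual workflow, the sketch below builds the URL a user would fetch through kubectl proxy. The helper name openAPIV3URL is hypothetical. It assumes the proxy listens on its default address 127.0.0.1:8001, and that the core group is served under api/<version> while named groups are served under apis/<group>/<version>; both are assumptions layered on top of the abbreviated /openapi/v3/<group>/<version> notation used in this KEP.

```go
package main

import "fmt"

// proxyBase assumes `kubectl proxy` is running with its default address.
const proxyBase = "http://127.0.0.1:8001"

// openAPIV3URL is a hypothetical helper that returns the URL of the
// OpenAPI v3 document for a single group-version. The core group has an
// empty name and is assumed to live under "api"; all other groups are
// assumed to live under "apis".
func openAPIV3URL(group, version string) string {
	if group == "" {
		return proxyBase + "/openapi/v3/api/" + version
	}
	return proxyBase + "/openapi/v3/apis/" + group + "/" + version
}

func main() {
	fmt.Println(openAPIV3URL("", "v1"))     // core group, e.g. pods
	fmt.Println(openAPIV3URL("apps", "v1")) // named group, e.g. deployments
}
```

The returned document still covers every type in the group-version, which is why step 3 (filtering) is needed in the manual workflow.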

Risks and Mitigations

OpenAPI V3 Not Available

Risk

OpenAPI v3 data is not available in the current cluster.

Mitigation

If the user does not provide an --output argument

In alpha in particular, if --output is not specified, the old explain behavior using openapi v2 data will be used.

In beta, kubectl will test whether the server publishes /openapi/v3. If it does, kubectl will proceed with the new renderer. If no endpoint is published, kubectl will fall back to the old v2 implementation.

After GA, --output plaintext will be assumed and behave as below.

If the user does provide an --output argument

If a user specifies an --output argument and the server returns a 404 when kubectl attempts to fetch the correct OpenAPI version for the template, a new error message should be returned to the effect of: server missing openapi data for version: %v.%v.%v.

Internal templates should strive to support the latest OpenAPI version enabled by default by versions of kubernetes within their skew. With that policy, templates will always render with the latest spec-version of the data, if it is available.

Other network errors should be handled using normal kubectl error handling.
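The mitigation rules above can be summarized as a small decision function. This is only a sketch: the stage type, the chooseDataSource name, and the "v2"/"v3" return values are hypothetical and are not taken from the kubectl code base. The plaintext-openapiv2 name and the error text come from this KEP.

```go
package main

import "fmt"

// stage is a hypothetical enum for the maturity of the feature.
type stage int

const (
	alpha stage = iota
	beta
	ga
)

// legacyOutput is the name this KEP gives to the old v2-backed printer in beta.
const legacyOutput = "plaintext-openapiv2"

// missingOpenAPIError formats the error text suggested in this KEP.
func missingOpenAPIError(major, minor, patch int) error {
	return fmt.Errorf("server missing openapi data for version: %v.%v.%v", major, minor, patch)
}

// chooseDataSource returns "v2" or "v3" for a given stage, --output value
// (empty when the flag is not set), and whether the server publishes
// /openapi/v3. After GA the legacy name no longer exists; the flag
// validation error that would result is not modeled here.
func chooseDataSource(s stage, output string, v3Published bool) (string, error) {
	switch {
	case output == legacyOutput && s != ga:
		// Explicit opt-out of the new implementation.
		return "v2", nil
	case output != "" || s == ga:
		// An explicit --output (or the GA default) never falls back silently.
		if !v3Published {
			return "", missingOpenAPIError(3, 0, 0)
		}
		return "v3", nil
	case s == beta && v3Published:
		return "v3", nil
	default:
		// Alpha without --output, or beta against a server without v3.
		return "v2", nil
	}
}

func main() {
	for _, s := range []stage{alpha, beta, ga} {
		src, err := chooseDataSource(s, "", false)
		fmt.Printf("stage=%d source=%q err=%v\n", s, src, err)
	}
}
```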

Design Details

Current High-level Approach

1. User types kubectl explain pods
2. kubectl resolves ‘pods’ to GVR core v1 pods using cluster discovery information
3. kubectl resolves GVR to its GVK using restmapper
4. kubectl fetches /openapi/v2 as protobuf
5. kubectl parses the protobuf into gnostic_v2.Document
6. kubectl converts gnostic_v2.Document into proto.Models
7. kubectl searches the document’s Definitions for a schema with the extension x-kubernetes-group-version-kind matching the interested GVK
8. If a field path was used, kubectl traverses the definition’s fields to the subschema specified by the user’s path.
9. kubectl renders the definition using its hardcoded printer
10. If --recursive was used, repeat step 9 for the transitive closure of object-typed fields of the top-level object. Concat the results together.

Proposed High-level Approach

1. User types kubectl explain pods
2. kubectl resolves ‘pods’ to GVR core v1 pods using cluster discovery information
3. kubectl attempts to fetch /openapi/v3, which indexes where to find specs for each GV
4. If failure and fallback to v2 is allowed, falls back to Step #3 of the “Current High-level Approach”.
5. Otherwise, kubectl fetches OpenAPIV3 path for GV: /openapi/v3/<group>/<version>
6. kubectl parses the result as map[string]any
7. kubectl locates the schema of the return type for the Path /apis/<group>/<version>/<resource>
8. If a field path was used, kubectl traverses the definition’s fields to the subschema specified by the user’s path.
9. kubectl renders the type using its built-in template for human-readable plaintext
10. If --recursive was used, repeat step 9 for the transitive closure of object-typed fields of the top-level object. Concat the results together.
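The parsing and field-path traversal steps above might look roughly like the following. The function names (loadSchemas, resolve, walk) and the simplified sample document are hypothetical. The sketch only handles plain "$ref", "properties", and "items"; a real document may wrap references in other constructs that are not handled here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

const refPrefix = "#/components/schemas/"

// sampleDocument is a heavily simplified stand-in for an OpenAPI v3 document.
const sampleDocument = `{"components": {"schemas": {
  "Pod": {"type": "object", "properties": {"spec": {"$ref": "#/components/schemas/PodSpec"}}},
  "PodSpec": {"type": "object", "properties": {
    "containers": {"type": "array", "items": {"$ref": "#/components/schemas/Container"}}}},
  "Container": {"type": "object", "properties": {
    "image": {"type": "string", "description": "Container image name."}}}
}}}`

// loadSchemas parses a document into map[string]any and returns components.schemas.
func loadSchemas(doc []byte) (map[string]any, error) {
	var parsed map[string]any
	if err := json.Unmarshal(doc, &parsed); err != nil {
		return nil, err
	}
	components, _ := parsed["components"].(map[string]any)
	schemas, ok := components["schemas"].(map[string]any)
	if !ok {
		return nil, fmt.Errorf("document has no components.schemas")
	}
	return schemas, nil
}

// resolve follows local "$ref" pointers until it reaches a concrete schema.
// It assumes references do not form a cycle of bare "$ref" entries.
func resolve(schemas, schema map[string]any) (map[string]any, error) {
	for {
		ref, ok := schema["$ref"].(string)
		if !ok {
			return schema, nil
		}
		target, ok := schemas[strings.TrimPrefix(ref, refPrefix)].(map[string]any)
		if !ok {
			return nil, fmt.Errorf("unresolved reference %q", ref)
		}
		schema = target
	}
}

// walk descends from root along fieldPath, stepping through array items
// transparently, and returns the subschema the path points at.
func walk(schemas, root map[string]any, fieldPath []string) (map[string]any, error) {
	cur, err := resolve(schemas, root)
	if err != nil {
		return nil, err
	}
	for _, field := range fieldPath {
		if items, ok := cur["items"].(map[string]any); ok {
			if cur, err = resolve(schemas, items); err != nil {
				return nil, err
			}
		}
		props, _ := cur["properties"].(map[string]any)
		next, ok := props[field].(map[string]any)
		if !ok {
			return nil, fmt.Errorf("field %q does not exist", field)
		}
		if cur, err = resolve(schemas, next); err != nil {
			return nil, err
		}
	}
	return cur, nil
}

func main() {
	schemas, err := loadSchemas([]byte(sampleDocument))
	if err != nil {
		panic(err)
	}
	root, _ := schemas["Pod"].(map[string]any)
	field, err := walk(schemas, root, []string{"spec", "containers", "image"})
	if err != nil {
		panic(err)
	}
	fmt.Println(field["type"], "-", field["description"])
}
```

Working on map[string]any keeps the traversal independent of any generated types, at the cost of type assertions at every step.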

Template rendering

Go’s text/template will be used due to its familiarity, stability, and virtue of being in stdlib.

Test Plan

[x] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.

Prerequisite testing updates

Unit tests

k8s.io/kubectl/pkg/explain: 09/29/2022 - 75.6

Integration tests

Tests should include

Expected Output tests
Show correct OpenAPI v3 endpoints are hit
Tests that show default/nullability information is being included in plaintext output
Tests that update the backing openapi in between calls to explain

e2e tests

Existing e2e tests should be adapted for the new system. An e2e test should also show that every definition in the OpenAPI document can be retrieved via explain.

Graduation Criteria

Defined using feature gate

Alpha 1

Feature implemented behind a command line flag --output and environment variable
Existing explain tests are working or adapted for new implementation
Plaintext output roughly matches explain output
OpenAPIV3 (raw json) output implemented

Beta

OpenAPI V3 is enabled by default on at least one version within kubectl’s support window. As of Kubernetes 1.24, OpenAPIV3 entered beta and became enabled by default, therefore meeting this requirement. In kubectl for release 1.25, all k8s versions within the support window will be able to have OpenAPIV3 enabled. However, the fallback is kept around since it may not always be enabled.
--output plaintext is on-by-default and environment variable is removed/on by default
--output plaintext-openapiv2 added as a name for the old explain implementation, so the feature may be positively disabled.

GA

OpenAPIV3 is GA and has been GA since at least the minimum apiserver version supported by kubectl.
OpenAPIV3 should be stable for all k8s versions within skew.
Old kubectl explain implementation is removed, as is support for OpenAPIV2-backed kubectl explain

Upgrade / Downgrade Strategy

N/A

Version Skew Strategy

This feature only requires that the target cluster has enabled the OpenAPIV3 feature.

OpenAPIV3 is Beta as of Kubernetes 1.24. Thus every version of Kubernetes within skew should be reasonably expected to have the feature on, unless it has been explicitly disabled.

This feature should not be on-by-default without an automatic fallback until OpenAPIV3 is GA.

Users of the --output plaintext flag who attempt to use it against a cluster for which OpenAPI v3 is not enabled will be shown an error informing them of missing openapi version upon 404.

In Beta, if no output is specified, OpenAPIV3 will be tried first, with a fallback to V2 if it is not available. In GA, the fallback will be removed (since all clusters in skew should publish the V3 endpoint by then).

Built-in templates supported by kubectl should aim to support at least one OpenAPI version which is GA for an apiserver version within the support window. kubectl will support trying to fetch each of these versions, so one is guaranteed to be able to render.

Production Readiness Review Questionnaire

Feature Enablement and Rollback

How can this feature be enabled / disabled in a live cluster?

Feature gate (also fill in values in kep.yaml)
- Feature gate name:
- Components depending on the feature gate:
Other
- Describe the mechanism: disablement via --output plaintext-openapiv2 CLI argument for explain subcommand. Beta may also be disabled with KUBECTL_EXPLAIN_OPENAPIV3=false environment variable.
- Will enabling / disabling the feature require downtime of the control plane? No
- Will enabling / disabling the feature require downtime or reprovisioning of a node? No

Does enabling the feature change any default behavior?

Enabling the feature changes the data source of kubectl explain to OpenAPI v3. Ideally, the output should be familiar to users, who may be delighted to see new information populated.

Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

Yes, providing --output plaintext-openapiv2 will disable the feature.

Alternatively, during the Alpha and Beta phases, the environment variable KUBECTL_EXPLAIN_OPENAPIV3=false may be used to disable the feature without the backwards-incompatible argument.
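One possible way to interpret the environment variable is sketched below. The function name openAPIV3Enabled and the choice to ignore unparsable values are assumptions made for illustration; they do not describe how kubectl actually reads the variable.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// envName is the variable named in this KEP.
const envName = "KUBECTL_EXPLAIN_OPENAPIV3"

// openAPIV3Enabled is a hypothetical helper. It returns defaultOn when the
// variable is unset or cannot be parsed as a boolean, and the parsed value
// otherwise. The lookup function is injected so the logic can be tested
// without touching the process environment.
func openAPIV3Enabled(lookup func(string) (string, bool), defaultOn bool) bool {
	raw, ok := lookup(envName)
	if !ok {
		return defaultOn
	}
	enabled, err := strconv.ParseBool(raw)
	if err != nil {
		return defaultOn
	}
	return enabled
}

func main() {
	// In beta the feature would be on by default, so defaultOn is true here.
	fmt.Println(openAPIV3Enabled(os.LookupEnv, true))
}
```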

This feature has no persistent effect on data that is viewed. It is just a viewer of cluster data.

What happens if we reenable the feature if it was previously rolled back?

There is no persistence to using the feature. It is only used for viewing data. So it behaves as normal.

Are there any tests for feature enablement/disablement?

Plan to add more tests for enablement/disablement for beta. PR started here with tests that toggle feature on and off and show feature works in both cases.

https://github.com/kubernetes/kubernetes/blob/a62e52cf2ede71a7219c04569ee09f0410f709f0/staging/src/k8s.io/kubectl/pkg/cmd/explain/explain_test.go#L157-L219

Rollout, Upgrade and Rollback Planning

How can a rollout or rollback fail? Can it impact already running workloads?

No, this is a user-interactive CLI feature. If users don’t like it, they can use the old functionality by providing the argument --output plaintext-openapiv2.

What specific metrics should inform a rollback?

Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

This feature has no state in the cluster. Using explain on a cluster cannot affect other users.

Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

No.

Monitoring Requirements

How can an operator determine if the feature is in use by workloads?

There are no direct metrics for explain usage, but operators can indirectly gauge usage by watching OpenAPI v3 metrics.

How can someone using this feature know that it is working for their instance?

kubectl explain pods --output plaintext

User should see OpenAPI v3 JSON Schema for pods type printed to console.

What are the reasonable SLOs (Service Level Objectives) for the enhancement?

N/A

What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

Metrics
- Metric name:
- [Optional] Aggregation method:
- Components exposing the metric:
Other (treat as last resort)
- Details: N/A for client-side user-interactive CLI feature

Are there any missing metrics that would be useful to have to improve observability of this feature?

N/A

Dependencies

None

Does this feature depend on any specific services running in the cluster?

To reap the benefits of this feature, OpenAPI v3 is required; however, OpenAPI v2 data can be used as a fallback. OpenAPI V3 is GA as of Kubernetes 1.27.

Scalability

Will enabling / using this feature result in any new API calls?

Yes, the feature replaces a single GET of /openapi/v2, which returns a large (megabytes) OpenAPI document for all types, with a more targeted call to /openapi/v3/<group>/<version>.

The /openapi/v3/<group>/<version> endpoint implements ETag caching, so if the document has not changed, the server incurs a cheap, almost negligible cost in serving the request.

The document returned by calls to /openapi/v3/... is expected to be far smaller than the megabytes-scale openapi v2 document, since it only includes information for a single group-version. Additionally, this new mechanism is far more cache-friendly so the expectation is that far less data will need to be transferred.

Will enabling / using this feature result in introducing new API types?

No.

Will enabling / using this feature result in any new calls to the cloud provider?

No.

Will enabling / using this feature result in increasing size or count of the existing API objects?

No.

Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

No.

Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, …) in any components?

No. We expect roughly the same amount of resource usage for kubectl.

Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?

No, it is client-side only and only uses a single standard HTTP connection.

Troubleshooting

How does this feature react if the API server and/or etcd is unavailable?

The feature uses kubectl’s normal error handling. There is no lasting effect on data or on the user.

What are other known failure modes?

What steps should be taken if SLOs are not being met to determine the problem?

N/A

Implementation History

Drawbacks

Alternatives

Implement proto.Models for OpenAPI V3 data

The current hard-coded printer is capable of printing any objects in proto.Models form.

We already have a way to express OpenAPI v3 data as proto.Models, so this can be seen as a path of least resistance for plugging OpenAPI v3 into kubectl explain.

This approach is undesirable for a few different reasons:

1.) We would like to update the explain printer to include new OpenAPI v3 information, but the current design makes that time-consuming and hard to maintain.

2.) API-Machinery wants to deprecate proto.Models. We see the proto.Models conversion as unnecessary and costly bureaucracy that contributes to high OpenAPI overhead. We are seeking to deprecate the type in favor of the kube-openapi types for future usage.

Custom User Templates

Users might also like to be able to specify a path to a custom template file for the resource information to be written to:

human-readable plaintext form:

kubectl explain pods --template /path/to/template.tmpl

Since the API surface for this sort of feature remains very unclear and will likely be very unstable, this sort of feature should be delayed until the internal templates have proven the API surface to be used. To do otherwise would risk breaking users’ templates.

Future Work

This work was specced out as part of this KEP but was not added. SIG-CLI is open to improvements in these areas.

Other template outputs

This KEP makes it easy to extend the explain output. Requirements for built-in md and html outputs might be:

md output implemented (or dropped from design due to continued debate)
- Table of contents all GVKs grouped by Group then Version.
- Section for each individual GVK
- All types hyperlink to specific section
basic html output (or dropped from design due to continued debate)
- Table of contents all GVKs grouped by Group then Version.
- Page for each individual GVK.
- All types hyperlink to their specific page
- Searchable by name, description, field name.

This was removed from the scope of the KEP in order to focus only on the feature users rely on, which is the plaintext explain. These templates may be added in the future.

HTML

kubectl explain pods --output html

Similar to godoc, we suggest providing a searchable, navigable, generated webpage for the Kubernetes types of whatever cluster kubectl is talking to.

Only the fields selected in the command line (and their subfields’ types, etc) will be included in the resultant page.

Possible idea: If user types kubectl explain --output html with no specific target, then all types in the cluster are included.

Markdown

kubectl explain pods --output md

When using the md template, a markdown document is printed to stdout, so it might be saved and used for a documentation website, for example.

As with the html output, only the fields selected in the command line (and their subfields’ types, etc) will be included in the resultant page.

Possible idea: If user types kubectl explain --output md with no specific target, then all types in the cluster are included.