KEP-2985: Public KRM Functions Registry

Release Signoff Checklist
Summary
Motivation
- Goals
- Non-Goals
Proposal
Design Details
Production Readiness Review Questionnaire
Implementation History
Drawbacks
Alternatives
Infrastructure Needed (Optional)

Release Signoff Checklist

Items marked with (R) are required prior to targeting to a milestone / release.

(R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
(R) KEP approvers have approved the KEP status as implementable
(R) Design details are appropriately documented
(R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- e2e Tests for all Beta API Operations (endpoints)
- (R) Ensure GA e2e tests meet requirements for Conformance Tests
- (R) Minimum Two Week Window for GA e2e tests to prove flake free
(R) Graduation criteria are in place
- (R) all GA Endpoints must be hit by Conformance Tests
(R) Production readiness review completed
(R) Production readiness review approved
“Implementation History” section is up-to-date for milestone
User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

Summary

This KEP proposes to create a public KRM functions registry where the community can contribute and discover useful KRM functions. There will be a repo hosting the centralized index of KRM functions and a website that presents the KRM functions to users.

Motivation

KRM functions have been steadily gaining interest and adoption in the Kubernetes configuration management space.

Google has a GitHub repository under GoogleContainerTools that hosts the source of its functions, and a website that presents the functions to end users. However, not everyone in the community can contribute to it, for various reasons. For example, a company's policy may not allow its employees to contribute to a repo owned by another company.

To have a thriving ecosystem of KRM functions, we must enable all community members to contribute to a vendor-neutral registry. A public KRM function registry can significantly improve the Day 0 and Day 2 user experiences. On Day 0, users can discover KRM functions that fit their needs. On Day 2, users can query the registry to discover new function versions.

Goals

Enable end users of orchestrators (e.g. Kustomize, Kpt) that support KRM functions to discover and leverage a common ecosystem of compatible functions.
Enable end users to discover and use sets of functions specifically from publishers they trust.
Enable function authors from any company to expose their function in a well-known index for discovery by end users.
Provide a central place for first-party (SIG-sponsored) plugins to be built and added to the index.

Non-Goals

Replace or compete with Catalog as the publication format for collections of functions. This Registry should use Catalog, and the details of Catalog's internal format should be discussed in the Catalog KEP.
Support building non-SIG-sponsored functions.
Support SIG-sponsored functions written in a language other than Go or published in a format other than containerized.

Proposal

This KEP proposes to create a public KRM functions registry. There will be 2 components to back the registry:

A repo sponsored by SIG-CLI to host the index of the functions in the registry.
A website that presents documentation and examples to end users and lets them search for and discover KRM functions.

Publisher

When publishing a function, the contributor MUST publish it on behalf of a publisher. We can revisit this decision later if there are requests to relax it when the publisher is an individual.

A publisher can be one of the following:

A project, community or SIG in Kubernetes: e.g. Kustomize, KubeFlow or SIG-CLI.
A company: e.g. Apple or Google.
A GitHub organization: e.g. github.com/myorg
An individual: e.g. foo@example.com

All publishers must list their maintainers in an OWNERS file, following the Kubernetes convention. Whenever changes (e.g. adding new functions) are made to a publisher, at least one of its maintainers must approve them.

To publish on behalf of a company:

The commit must be a verified commit from a person with that company’s email address.
If there are maintainers for this publisher, at least one of the maintainers must approve the change.

To publish on behalf of a GitHub organization, the contributor must be a member of the organization.

To publish as an individual, the commit must be a verified commit from the same person’s email.

Kubernetes already has tooling to enforce approval via OWNERS files, which we can leverage. CI can be set up to enforce verified commits from the expected email domain.
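A minimal sketch of such a CI check, assuming CI already knows each commit's author email and GitHub "Verified" status, the contributor's GitHub org memberships, the publisher's maintainers (read from OWNERS) and who approved the PR. The `Commit`, `PublishRequest` and `check_publish` names are hypothetical, and identifying a company publisher by its email domain is an assumption; the KEP does not specify the CI implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    author_email: str
    verified: bool  # e.g. GitHub's "Verified" signature status


@dataclass
class PublishRequest:
    publisher_type: str  # "companies", "github-orgs", "individuals" or "communities"
    publisher: str       # hypothetical: email domain, org name or personal email
    commits: list
    contributor_orgs: set = field(default_factory=set)  # GitHub orgs of the contributor
    maintainers: set = field(default_factory=set)       # approvers from the publisher's OWNERS
    approvals: set = field(default_factory=set)         # GitHub users who approved the PR


def check_publish(req: PublishRequest) -> list:
    """Return rule violations for a publishing PR; an empty list means it may merge."""
    errors = []
    if req.publisher_type == "companies":
        # Every commit must be verified and come from the company's email domain.
        domain = "@" + req.publisher.lower()
        for c in req.commits:
            if not c.verified or not c.author_email.lower().endswith(domain):
                errors.append(f"commit by {c.author_email} is not a verified {domain} commit")
    elif req.publisher_type == "github-orgs":
        # The contributor must be a member of the GitHub organization.
        if req.publisher not in req.contributor_orgs:
            errors.append(f"contributor is not a member of GitHub org {req.publisher}")
    elif req.publisher_type == "individuals":
        # Every commit must be a verified commit from the publisher's own email.
        for c in req.commits:
            if not c.verified or c.author_email.lower() != req.publisher.lower():
                errors.append(f"commit by {c.author_email} is not a verified commit from {req.publisher}")
    # When the publisher already has maintainers, at least one of them must approve.
    if req.maintainers and not (req.maintainers & req.approvals):
        errors.append("no approval from a publisher maintainer")
    return errors
```

Skipping the approval rule when `maintainers` is empty mirrors the "if there are maintainers" wording for company publishers; a stricter CI could reject publishers without an OWNERS file outright.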

Security

SIG-CLI is responsible for the security of the SIG-sponsored KRM functions but not all KRM functions in the registry.

Publishers are responsible for the security of their KRM functions and for clearly communicating expectations (e.g. maturity) to their users. For example, Kustomize can provide a small set of carefully vetted KRM functions published under the kustomize publisher.

We strongly recommend that users run KRM functions in containers as a sandboxing mechanism.

Trust

A user should NOT trust every KRM function in the registry.

Trust can be established at the publisher level. Users can choose to trust a publisher and use the KRM functions provided by this publisher.

Publisher information can be used to aggregate KRM functions. We can support both dynamic aggregation of KRM functions and static, versioned collections of KRM functions. Publishers can choose to snapshot the dynamic aggregation of their KRM functions at a point in time. The snapshot must be versioned, but SemVer is not necessary, since it is meaningless for a catalog. The snapshot can later be accessed as a static catalog.
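The difference between dynamic aggregation and a static snapshot can be sketched as follows. This assumes a simplified, hypothetical function record (`publisher`, `name`, `image`) rather than the full metadata format; `aggregate` and `snapshot` are illustrative names only.

```python
import copy


def aggregate(functions, publishers):
    """Dynamic aggregation: every function currently published by the given publishers."""
    wanted = set(publishers)
    return sorted(
        (f for f in functions if f["publisher"] in wanted),
        key=lambda f: (f["publisher"], f["name"]),
    )


def snapshot(functions, publisher, version):
    """Freeze one publisher's current functions into a versioned static catalog."""
    return {
        "publisher": publisher,
        "version": version,  # any unique, non-SemVer label, e.g. a date
        # Deep copy so later registry changes cannot alter the snapshot.
        "functions": copy.deepcopy(aggregate(functions, [publisher])),
    }
```

The dynamic view changes whenever the registry changes, while the snapshot stays fixed once taken, which is what makes it usable as a static catalog.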

User Stories (Optional)

Story 1

As a KRM functions user, I can browse the function registry website (e.g. https://krm-functions.io) and search for a KRM function by name (e.g. set-labels). I can then find everything about the function, including its documentation, examples, home page, maintainers and publisher information.

Story 2

As a KRM function user, I can query https://krm-functions.io/catalogs/aggregate/latest.yaml?publisher=kubeflow,kustomize to find a real-time aggregation of all KRM functions published by Kustomize and KubeFlow.
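On the server side, the comma-separated `publisher` query parameter in such a URL could be split into a publisher list as sketched below. The parameter name comes from the example URL; the parsing behavior (trimming whitespace, dropping empty entries) is an assumption.

```python
from urllib.parse import parse_qs, urlparse


def requested_publishers(url):
    """Extract the publisher filter from a dynamic catalog URL, in request order."""
    query = parse_qs(urlparse(url).query)
    publishers = []
    for value in query.get("publisher", []):
        # Each value may itself be a comma-separated list, e.g. "kubeflow,kustomize".
        publishers.extend(p.strip() for p in value.split(",") if p.strip())
    return publishers
```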

Story 3

«[UNRESOLVED]» This KEP proposes using the date as the version of a static catalog. Alternatively, a hash of the contents can be used as the version. This is still TBD. «[/UNRESOLVED]»

As a KRM function user, I can find the versioned catalog published by Pineapple Co. at https://krm-functions.io/catalogs/pineapple/v20210924.yaml, which Pineapple Co. snapshotted on 09/24/2021.
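The two candidate versioning schemes can be compared with a short sketch. The `v<YYYYMMDD>` format follows the example URL; the `sha256-` prefix, the 12-character truncation and the use of canonical JSON for hashing are hypothetical choices.

```python
import hashlib
import json
from datetime import date


def date_version(on: date) -> str:
    """Date-based catalog version, e.g. v20210924."""
    return on.strftime("v%Y%m%d")


def content_version(catalog: dict) -> str:
    """Content-hash catalog version: identical contents always get the same version."""
    # Canonical JSON (sorted keys, no extra whitespace) so key order does not affect the hash.
    canonical = json.dumps(catalog, sort_keys=True, separators=(",", ":"))
    return "sha256-" + hashlib.sha256(canonical.encode()).hexdigest()[:12]
```

Date versions are human-readable and naturally ordered, but two snapshots taken on the same day would collide; content hashes are unique per content but carry no ordering.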

Story 4

As a kustomize user, I want to use a KRM functions catalog provided by a publisher in kustomize. Per the Catalog KEP, the kustomization.yaml file may look like the following:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
catalogs:
- https://krm-functions.io/catalogs/kustomize/v20210924.yaml
resources:
- input.yaml
transformers:
- ...

Story 5

As a KRM function user, I want tab completion for function image names when running functions imperatively. The suggested image names should come from the registry.

When using kustomize:

kustomize fn run --image <tab><tab>

When using kpt:

kpt fn eval --image <tab><tab>
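A completion backend could collect container images from the registry's KRMFunctionDefinition objects (already parsed from YAML into dicts) and filter them by the typed prefix. The field path `spec.versions[].runtime.container.image` follows the metadata example in this KEP; how the CLI fetches the definitions is left open, and both function names are hypothetical.

```python
def registry_images(definitions):
    """Collect container images from parsed KRMFunctionDefinition objects."""
    images = set()
    for d in definitions:
        for v in d.get("spec", {}).get("versions", []):
            container = v.get("runtime", {}).get("container")
            if container and "image" in container:
                images.add(container["image"])
    return images


def complete_image(prefix, images):
    """Candidates for `--image <tab><tab>`: registry images starting with the typed prefix."""
    return sorted(i for i in images if i.startswith(prefix))
```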

Story 6

As a Kustomize maintainer, I want to develop a small, well-vetted set of functions and publish them from the SIG's registry. These plugins should behave identically to built-ins from the end-user perspective.

Furthermore, I can release a version of Kustomize that trusts these functions. This should be supported by kustomize, but building kustomize is not in scope for this KEP.

Notes/Constraints/Caveats (Optional)

Risks and Mitigations

Security is a potential risk. We have introduced the publisher concept to mitigate it.

Dynamically generated catalogs (e.g. from a versionless URL with a query param), if supported and used in declarative configs, would lead to non-reproducible builds. To mitigate this, we strongly recommend that users use versioned catalogs from the registry in production.
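A linter or CI step could flag non-reproducible catalog references in declarative configs. This sketch assumes the date-based static catalog naming from Story 3 (`.../v<YYYYMMDD>.yaml`) and treats anything else, including URLs with a query string, as dynamic; `is_reproducible` is a hypothetical helper.

```python
import re
from urllib.parse import urlparse

# Assumed static-catalog naming: a path ending in /v<YYYYMMDD>.yaml.
_VERSIONED = re.compile(r"/v\d{8}\.yaml$")


def is_reproducible(catalog_url):
    """True if the URL points at a versioned (static) catalog rather than a dynamic one."""
    parsed = urlparse(catalog_url)
    return not parsed.query and bool(_VERSIONED.search(parsed.path))
```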

Design Details

Repo Location

The ideal repo location is kubernetes-sigs. It should be sponsored by SIG-CLI.

The repo name can be krm-functions, krm-function-registry or something similar.

Management Model

Centralized Index and Release Management

The source code, documentation, releases and function metadata are completely managed in one repo. A website can be built from the repo.

Prior Art

Kpt-functions-catalog has been using this management model. Everything about the functions is managed in the kpt-functions-catalog repo, and the website is rendered from the information in the repo.

Pros and Cons

Pros:

Easier to manage.
Quality (e.g. test coverage) of the code can be enforced, since the maintainers can do the gatekeeping.

Cons:

May have scalability issues: repo maintainers may become the bottleneck.
A release pipeline (at least for containers) is required.
Supporting binary releases (exec mode in kustomize) will be challenging, since there are many programming languages and tools for building binaries, and it is almost impossible to meet everyone's needs.

Centralized Index with Distributed Release Management

The source code, documentation and releases are managed in repositories that are owned by the original function authors. The function metadata is managed in the central index repository owned by CNCF.

Prior Art

A similar model is used by Krew Index (kubectl's plugin index) and the Terraform Registry (a public registry for Terraform modules).

In Krew Index, only the manifest files (plugin metadata) are published to the krew-index repo. Here is an example plugin manifest file.

In the Terraform Registry, when contributors want to publish a module, they need to allow the Terraform Registry to register a webhook in their public GitHub repository to detect git tags as releases.

Pros and Cons

Pros:

Scales well. Maintainers only need to review function metadata files when contributors publish their functions.
Encourages the KRM functions community to grow larger and faster.
It’s possible to support binaries as KRM functions in the registry, since they are built and released by the publishers.

Cons:

Code quality (e.g. test coverage) is hard to enforce.
Security may be a concern if we want to support binaries as KRM functions in the registry.

Mixture Model

We can mix the 2 management models above.

We can manage the source code of the Kustomize-provided KRM functions in-tree. Generic KRM functions like set-labels, set-annotations and set-namespace can be included. All KRM functions provided by Kustomize must go through a security audit. These in-tree KRM functions can serve as examples for other publishers of how to organize their functions.

The model of centralized index with distributed release is more flexible and more suitable for all vendors and other contributors. We can manage the source code of the functions contributed by the community out-of-tree.

All KRM functions in the registry must provide the metadata in the repo. We must standardize the metadata format for KRM functions, since we will require all contributors to follow it. A website can be built using the metadata information.

Website

The website code will live in the registry repo as well. Ideally, generated HTML files should not need to be checked into the repo; tools can generate the site from Markdown files.

The kpt site uses docsify and the kubebuilder site uses mdBook.

Function Metadata

We will use KRMFunctionDefinition kind whose schema is defined in KEP-2906 to capture the metadata for a single KRM function.

The following is an example function metadata for a container-based KRM function. We will support it starting from the alpha phase.

apiVersion: config.kubernetes.io/v1alpha1
kind: KRMFunctionDefinition
spec:
  group: example.com
  names:
    kind: SetNamespace
  description: "A short description of the KRM function"
  publisher: example.com
  versions:
    - name: v1
      schema:
        openAPIV3Schema: ... # inline schema like CRD
      idempotent: true|false
      runtime:
        container:
          image: docker.example.co/functions/set-namespace:v1.2.3
          sha256: a428de... # The digest of the image which can be verified against. This field is required if the version is semver.
          requireNetwork: true|false
          requireStorageMount: true|false
      configMap: true|false # Support ConfigMap as functionConfig. Default is false if omitted.
      usage: <a URL pointing to a README.md>
      examples:
        - <a URL pointing to a README.md>
        - <another URL pointing to another README.md>
      license: Apache 2.0
    - name: v1beta1
      ...
  maintainers: # The maintainers for this function. It doesn't need to be the same as the publisher OWNERS. 
    - foo@example.com
  tags: # keywords of the KRM functions
    - mutator
    - namespace
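Since every contributor must follow the metadata format, CI could validate each metadata file before merge. This sketch checks a few fields from the example above on the parsed YAML (as a dict) and interprets "required if the version is semver" as "required when the image tag is a SemVer version"; that interpretation, the chosen required fields, and the `validate` and `image_tag` helpers are all assumptions, not the KEP-2906 schema.

```python
import re

# SemVer-style tag with optional leading "v", e.g. v1.2.3 or 1.2.3-rc.1.
SEMVER = re.compile(r"^v?\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$")


def image_tag(image):
    """Return the tag of an image reference, or None if it has no tag."""
    name = image.split("@", 1)[0]      # drop any @sha256:... digest
    last = name.rsplit("/", 1)[-1]     # a registry port like host:5000 is not a tag
    return last.split(":", 1)[1] if ":" in last else None


def validate(defn):
    """Return a list of problems found in a parsed KRMFunctionDefinition."""
    errors = []
    spec = defn.get("spec", {})
    for key in ("group", "publisher", "versions"):
        if not spec.get(key):
            errors.append(f"spec.{key} is required")
    if not spec.get("names", {}).get("kind"):
        errors.append("spec.names.kind is required")
    seen = set()
    for i, v in enumerate(spec.get("versions", [])):
        name = v.get("name")
        if name in seen:
            errors.append(f"spec.versions[{i}]: duplicate version {name}")
        seen.add(name)
        container = v.get("runtime", {}).get("container")
        if container:
            tag = image_tag(container.get("image", ""))
            if tag and SEMVER.match(tag) and not container.get("sha256"):
                errors.append(f"spec.versions[{i}]: sha256 is required for semver tag {tag}")
    return errors
```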

The following is an example for exec-based KRM function. We will support it starting from the beta phase.

apiVersion: config.kubernetes.io/v1alpha1
kind: KRMFunctionDefinition
spec:
  group: example.com
  names:
    kind: SetNamespace
  description: "A short description of the KRM function"
  publisher: example.com
  versions:
    - name: v1
      schema:
        openAPIV3Schema: ...
      idempotent: true|false
      runtime:
        exec:
          platforms:
          - bin: foo-amd64-linux
            os: linux
            arch: amd64
            uri: https://example.com/foo-amd64-linux.tar.gz
            sha256: <hash>
          - bin: foo-amd64-darwin
            os: darwin
            arch: amd64
            uri: https://example.com/foo-amd64-darwin.tar.gz
            sha256: <hash>
      configMap: true|false # Support ConfigMap as functionConfig. Default is false if omitted.
      usage: <a URL pointing to a README.md>
      examples:
        - <a URL pointing to a README.md>
        - <another URL pointing to another README.md>
      license: Apache 2.0
    - name: v1beta1
      ...
  home: <a URL pointing to the home page>
  maintainers: # The maintainers for this function. It doesn't need to be the same as the publisher OWNERS. 
    - foo@example.com
  tags: # keywords of the KRM functions
    - mutator
    - namespace
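For exec-based functions, a client would pick the entry in `platforms` matching the local OS and architecture, download its `uri`, and check the archive against `sha256` before extracting `bin`. The sketch below covers selection and verification only; mapping Python's platform names onto the Go-style `os`/`arch` values in the example is an assumption, and downloading and extraction are omitted.

```python
import hashlib
import platform
import sys

# Map Python's machine names to the Go-style arch values used in the metadata (assumed).
_ARCH = {"x86_64": "amd64", "amd64": "amd64", "aarch64": "arm64", "arm64": "arm64"}


def current_platform():
    """Return (os, arch) for this machine, e.g. ("linux", "amd64")."""
    os_name = {"win32": "windows"}.get(sys.platform, sys.platform)
    machine = platform.machine().lower()
    return os_name, _ARCH.get(machine, machine)


def select_platform(platforms, os_name, arch):
    """Pick the published binary entry for the given os/arch."""
    for p in platforms:
        if p["os"] == os_name and p["arch"] == arch:
            return p
    raise LookupError(f"no binary published for {os_name}/{arch}")


def verify_sha256(data: bytes, expected: str) -> None:
    """Raise if the downloaded archive does not match the published digest."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected.lower():
        raise ValueError(f"sha256 mismatch: got {actual}, want {expected}")
```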

Publishing Workflow

We only support publishing container-based KRM functions in the public registry, so only that workflow is covered here.

The developer needs to do the following:

Build a container image and push it to a publicly accessible container registry.
Ensure the usage doc is a Markdown file that is up-to-date and publicly accessible.
Create a file called krm-function-metadata.yaml containing metadata that satisfies the KRM function metadata schema above.
Check out the KRM function registry repo.
Move the krm-function-metadata.yaml file to the desired location in the repo, following the repo layout convention (discussed below).
Commit the change with the right email address, according to the requirements for the publisher type (discussed in the Publisher section above).
Create a PR and get it reviewed and approved by the publisher's OWNERS.

Repo Layout Convention

├── publishers
│   ├── communities
│   │   ├── kustomize
│   │   │   ├── fn-foo
│   │   │   │   └── krm-function-metadata.yaml
│   │   │   ├── fn-bar
│   │   │   └── OWNERS # OWNERS of the publisher
│   │   ├── kubeflow
│   │   ├── sig-cli
│   │   └── OWNERS # OWNERS to approve new community publishers
│   ├── companies
│   │   ├── apple
│   │   │   ├── fn-baz
│   │   │   └── OWNERS # OWNERS of the publisher
│   │   ├── google
│   │   └── OWNERS # OWNERS to approve new company publishers
│   ├── github-orgs
│   └── individuals
├── krm-functions # in-tree functions implementation
│   ├── kustomize
│   │   ├── fn-foo
│   │   └── OWNERS # OWNERS to approve code change to the function
│   └── sig-cli
├── site # Stuff related to the site
└── OWNERS
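Given this layout, the location of a function's metadata file and the OWNERS files that govern it can be derived mechanically. The sketch walks up from the metadata file and collects every OWNERS file on the way, nearest first, which is similar in spirit to how Prow's approval handling inherits OWNERS from parent directories; the helper names are hypothetical.

```python
from pathlib import PurePosixPath

PUBLISHER_TYPES = {"communities", "companies", "github-orgs", "individuals"}


def metadata_path(publisher_type, publisher, function):
    """Path of a function's metadata file under the repo layout convention."""
    if publisher_type not in PUBLISHER_TYPES:
        raise ValueError(f"unknown publisher type {publisher_type!r}")
    return PurePosixPath("publishers", publisher_type, publisher, function,
                         "krm-function-metadata.yaml")


def owners_files(path, existing):
    """OWNERS files that apply to `path`, nearest first, given all OWNERS paths in the repo."""
    result = []
    for parent in PurePosixPath(path).parents:
        candidate = parent / "OWNERS"
        if candidate in existing:
            result.append(candidate)
    return result
```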

Test Plan

SIG-sponsored KRM functions should be tested in-tree. If we develop a test harness, it should also live in-tree. If kustomize has an existing test harness, we can leverage it or move it to the registry repo.

For KRM functions that are not sig-sponsored, the maintainers are responsible for testing them.
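A minimal golden-file harness is sketched below. KRM functions read a ResourceList on stdin and write a ResourceList on stdout, so a test can run the function with a known input and compare the result to an expected output. To stay dependency-free, this sketch assumes the function under test reads and writes JSON-encoded ResourceLists (JSON is valid YAML); a real harness would parse YAML. `run_function` and `assert_golden` are hypothetical names.

```python
import json
import subprocess


def run_function(cmd, resource_list, timeout=60):
    """Run a KRM function: ResourceList on stdin, ResourceList on stdout."""
    proc = subprocess.run(
        cmd,
        input=json.dumps(resource_list),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # a non-zero exit fails the test with CalledProcessError
    )
    return json.loads(proc.stdout)


def assert_golden(cmd, input_list, expected_list):
    """Golden test: the function's output must equal the expected ResourceList."""
    actual = run_function(cmd, input_list)
    if actual != expected_list:
        raise AssertionError(f"output mismatch:\n{json.dumps(actual, indent=2)}")
```

For a container-based function, `cmd` could be something like `["docker", "run", "-i", "--rm", image]`, since `docker run -i` forwards stdin to the container.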

Graduation Criteria

Alpha

Set up the repo in kubernetes-sigs.
Set up build and release pipeline for sig-sponsored functions (including kustomize’s).
Set up CI for publishers.
Support container-based KRM functions.

Beta

Gather feedback from developers and contributors.
Support exec-based KRM functions.

GA

TBD

Upgrade / Downgrade Strategy

Version Skew Strategy

Production Readiness Review Questionnaire

Feature Enablement and Rollback

How can this feature be enabled / disabled in a live cluster?

Feature gate (also fill in values in kep.yaml)
- Feature gate name:
- Components depending on the feature gate:
Other
- Describe the mechanism:
- Will enabling / disabling the feature require downtime of the control plane?
- Will enabling / disabling the feature require downtime or reprovisioning of a node?

Does enabling the feature change any default behavior?

Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

What happens if we reenable the feature if it was previously rolled back?

Are there any tests for feature enablement/disablement?

Rollout, Upgrade and Rollback Planning

How can a rollout or rollback fail? Can it impact already running workloads?

What specific metrics should inform a rollback?

Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

Monitoring Requirements

How can an operator determine if the feature is in use by workloads?

How can someone using this feature know that it is working for their instance?

Events
- Event Reason:
API .status
- Condition name:
- Other field:
Other (treat as last resort)
- Details:

What are the reasonable SLOs (Service Level Objectives) for the enhancement?

What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

Metrics
- Metric name:
- [Optional] Aggregation method:
- Components exposing the metric:
Other (treat as last resort)
- Details:
