KEP-4080: Add generic control plane staging repository(ies)
- Release Signoff Checklist
- Summary
- Motivation
- Proposal
- Design Details
- Production Readiness Review Questionnaire
- Implementation History
- Drawbacks
- Alternatives
- Infrastructure Needed (Optional)
Release Signoff Checklist
Items marked with (R) are required prior to targeting to a milestone / release.
- (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
- (R) KEP approvers have approved the KEP status as implementable
- (R) Design details are appropriately documented
- (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- e2e Tests for all Beta API Operations (endpoints)
- (R) Ensure GA e2e tests meet requirements for Conformance Tests
- (R) Minimum Two Week Window for GA e2e tests to prove flake free
- (R) Graduation criteria is in place
- (R) all GA Endpoints must be hit by Conformance Tests
- (R) Production readiness review completed
- (R) Production readiness review approved
- “Implementation History” section is up-to-date for milestone
- User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
- Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
Summary
This KEP proposes factoring kube-apiserver and kube-controller-manager to build
on one or multiple new staging repositories that consume k/apiserver but have a
bigger, carefully chosen subset of the functionality of kube-apiserver and
kube-controller-manager such that it is reusable.
The factoring will be progressive: we will start with new repo(s) adding
nothing to k/apiserver, refactor in-place and then progressively move generic
functionality from kube-apiserver and kube-controller-manager to the new
repositories.
The suggested naming of the new repository(ies) is k/generic-controlplane
(-apiserver/-controllers; for simplicity we drop these suffixes in this document
until the names are finalized). Choosing the exact name(s) and split of
packages will be part of the process of implementation.
Motivation
A working kube-based control plane is more than just an apiserver component
built on k/apiserver. It includes standard resources (depending on the context:
namespaces, CRDs, RBAC, secrets, configmaps) and standard controllers (think
of garbage collection, namespace deletion, etc.). Today, kube-apiserver bundles
those resources with container orchestration, and kube-controller-manager
likewise bundles the corresponding controllers.
Separating the generic parts from container orchestration will enable new
use-cases built upon k/apimachinery and k/apiserver while keeping a unified
codebase and ecosystem. It will also improve the factoring of kube-apiserver,
easing maintenance by reducing complexity through clear layering.
Goals
- As always: every PR transforms a working system into a working system, and PRs are of manageable size.
- Improve factoring of kube-apiserver through layering on top of
k/generic-controlplane, reducing complexity through more explicit structure and reduction of code in k/k.
- k/generic-controlplane will provide
  - a sample-generic-controlplane binary
  - a modular, further customizable (in code) library suitable to build a working kube-based control plane without vendoring k/k.
- k/generic-controlplane will (optionally) include the ability to define resources by CustomResourceDefinition objects.
- k/generic-controlplane will be able to (optionally) delegate handling of some kinds of objects to another server, as directed by APIService objects.
- k/generic-controlplane will allow customization (in code) of which generic (native) resources like secrets, configmaps, admission webhooks, RBAC, etc. are served.
- k/generic-controlplane will not include the definitions of the resources in Kubernetes for the management of containerized workloads. For example, the excluded resources include: nodes, pods, daemonsets, ingresses, services, persistentvolumes.
- k/generic-controlplane as a library will be agnostic to being used in separate binaries or in an all-in-one binary, both in a hyperkube-like subcommand way and in an all-in-one k3s-like way.
Non-Goals
- provide and ship a de-facto standard, full-featured generic-controlplane binary:
  this is clearly a library approach, and consumer projects will define the feature set of their control plane. There is no new deliverable beyond a staging repository with a library and a sample binary, whose limited scope is demonstrating the plumbing.
- change anything noticeable to the user for existing binaries.
- change compatibility guarantees of (server-side) staging repositories.
- create k/kube-apiserver or anything similar, although this work can pave the way by defining package structures suitable for k/kube-apiserver.
Proposal
The desired outcome is a useful new library, while of course keeping everything working during iterative development. Success will be measured by community members saying that the new library is useful to them.
User Stories (Optional)
Story 1
Project kube-hyper-mini wants to maintain a main program that bundles
1. the subset of kube-apiserver that is not concerned with the management of containerized workloads,
2. a single-member etcd cluster, and
3. the subset of kube-controller-manager that is not concerned with the management of containerized workloads.
This makes a convenient platform for hosting kube-style APIs defined by CRDs and/or resources served by their own
extension apiserver. They use k/generic-controlplane to get part (1).
Story 2
Project kube-core recognizes that the three parts of kube-hyper-mini scale out
differently, and wants instead to maintain a main program that is just the
desired subset of kube-apiserver. Their main program is very little more than
a use of k/generic-controlplane.
Notes/Constraints/Caveats (Optional)
Risks and Mitigations
- This KEP is about code refactoring that introduces another layer to the staging repositories from which kube-apiserver and kube-controller-manager are built. Every code refactoring carries a risk of bugs. The mitigation is small, easily reviewable, “obvious” PRs that iteratively move from the old to the new structure.
Design Details
The first steps split the existing kube-apiserver and kube-controller-manager
packages in-place, i.e. inside k/k. This includes:
- cmd/kube-apiserver
- pkg/kubeapiserver
- pkg/controlplane
Early sketch of an end-state. By far most changes towards this goal will be code moves only:
- k/generic-controlplane
  - pkg/apis
  - pkg/apiserver/options
  - pkg/apiserver/server
  - pkg/apiserver/registry
  - pkg/apiserver/admission
  - pkg/controllers/garbagecollection
  - pkg/controllers/namespacedeletion
  - cmd/sample-generic-controlplane
Potentially, we will split the apiserver and controller parts into two separate repositories. This will be decided once the initial in-place steps are done and it has become clearer how best to structure the repositories hosting the new packages.
Test Plan
[ ] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.
Prerequisite testing updates
Unit tests
<package>:<date>-<test coverage>
Integration tests
e2e tests
Graduation Criteria
Upgrade / Downgrade Strategy
Version Skew Strategy
Production Readiness Review Questionnaire
Feature Enablement and Rollback
There will be no changes to system behavior. The typical alpha/beta/GA stages and requirements do not apply, as this KEP proposes moving existing code without changing its alpha/beta/GA status.
How can this feature be enabled / disabled in a live cluster?
- Feature gate (also fill in values in kep.yaml)
  - Feature gate name:
  - Components depending on the feature gate:
- Other
  - Describe the mechanism: Does not apply. This is a code move of existing code without functional changes.
Does enabling the feature change any default behavior?
Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
What happens if we reenable the feature if it was previously rolled back?
Are there any tests for feature enablement/disablement?
Rollout, Upgrade and Rollback Planning
How can a rollout or rollback fail? Can it impact already running workloads?
What specific metrics should inform a rollback?
Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
Monitoring Requirements
How can an operator determine if the feature is in use by workloads?
How can someone using this feature know that it is working for their instance?
- Events
  - Event Reason:
- API .status
  - Condition name:
  - Other field:
- Other (treat as last resort)
  - Details:
What are the reasonable SLOs (Service Level Objectives) for the enhancement?
What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
- Metrics
  - Metric name:
  - [Optional] Aggregation method:
  - Components exposing the metric:
- Other (treat as last resort)
  - Details: