KEP-2436: Controller Manager Leader Migration
Table of Contents
- Release Signoff Checklist
- Summary
- Motivation
- Proposal
- Design Details
- Production Readiness Review Questionnaire
- Implementation History
- Drawbacks
- Alternatives
Release Signoff Checklist
Items marked with (R) are required prior to targeting to a milestone / release.
- (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
- (R) KEP approvers have approved the KEP status as implementable
- (R) Design details are appropriately documented
- (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
- (R) Graduation criteria is in place
- (R) Production readiness review completed
- (R) Production readiness review approved
- “Implementation History” section is up-to-date for milestone
- User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
- Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
Summary
Support a migration process for large scale and highly available Kubernetes clusters using the in-tree cloud providers (via kube-controller-manager and kubelet) to their out-of-tree equivalents (via cloud-controller-manager).
Motivation
SIG Cloud Provider is in the process of migrating the cloud specific code from the core Kubernetes tree to external packages and removing them from the kube-controller-manager, where they are today embedded. Once the extraction has been completed, existing users running older versions of Kubernetes need a process to migrate their existing clusters to use the new cloud-controller-manager component with minimal risk.
This KEP proposes a mechanism in which HA clusters can safely migrate “cloud specific” controllers between the kube-controller-manager and the cloud-controller-manager via a shared resource lock between the two components. The pattern proposed in this KEP should be reusable by other components in the future if desired.
The migration mechanism outlined in this KEP should only be used for Kubernetes clusters that have very strict requirements on control plane availability. If a cluster can tolerate short intervals of downtime, it is recommended to update your cluster with in-tree cloud providers disabled, and then deploy the respective out-of-tree cloud-controller-manager.
Goals
- Define migration process for large scale, highly available clusters to migrate from the in-tree cloud provider mechanism, to their out-of-tree equivalents.
Non-Goals
- Removing cloud provider code from the core Kubernetes tree. This effort is separate and is covered in KEP-removing-in-tree-providers.
- Improving the scalability of controllers by running controllers across multiple components (with or without leader election).
- Migrating cloud-based volume plugins to CSI. This is a separate effort led by SIG Storage. See this proposal for more details.
Proposal
Today, the kube-controller-manager (KCM) and cloud-controller-manager (CCM) run independently of each other. This means that either the KCM or the CCM can run the cloud specific control loops for a given cluster. Migrating a highly available control plane from running only the KCM to running both the KCM and the CCM requires that only one process in the control plane runs the cloud specific controllers at any time. This becomes non-trivial when introducing a CCM whose controllers overlap with those of the KCM.
For environments that can tolerate downtime, the control plane can be taken down in order to reconfigure components to use the CCM, and then bring the control plane back up. This ensures that only 1 component can be running the set of cloud controllers. For environments that have stricter requirements for uptime, some level of coordination is required between the two components to ensure that upgrading control planes does not result in running the same controller in more than one place while also accounting for version skew.
In order to coordinate the cloud-specific controllers across the KCM and the CCM, this KEP proposes a primary and N configurable secondary (a.k.a migration) leader election locks in the KCM and the CCM. The primary lock represents the current leader election resource lock in the KCM and the CCM. The set of secondary locks are defined by the cloud provider and run in parallel to the primary locks. For a migration lock defined by the cloud provider, the cloud provider also determines the set of controllers run within the migration lock and the controller manager it will run in - either the CCM or the KCM.
The properties of the migration lock are:
- must have a unique name
- the set of controllers in the lock is immutable.
- no two migration locks should have overlapping controllers
- the controller manager where the lock runs can change across releases.
- for a minor release it must run exclusively in one type of controller manager - KCM or CCM.
During migration, either the KCM or the CCM may have multiple migration locks, though for performance reasons no more than 2 locks are recommended.
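The properties above that can be checked statically (unique lock names, one component per lock, no overlapping controllers) could be validated along the following lines. This is a minimal sketch, not the KEP's implementation: `MigrationLock` and `ValidateLocks` are hypothetical names introduced only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// MigrationLock is a hypothetical, simplified model of one migration lock.
type MigrationLock struct {
	Name        string   // must be unique across all locks
	Component   string   // the single controller manager the lock runs in
	Controllers []string // controllers guarded by this lock
}

// ValidateLocks checks unique lock names, a known component per lock, and
// that no controller is claimed by more than one lock.
func ValidateLocks(locks []MigrationLock) error {
	names := map[string]bool{}
	owner := map[string]string{} // controller name -> lock name
	for _, l := range locks {
		if l.Name == "" {
			return errors.New("lock name must not be empty")
		}
		if names[l.Name] {
			return fmt.Errorf("duplicate lock name %q", l.Name)
		}
		names[l.Name] = true
		if l.Component != "kube-controller-manager" && l.Component != "cloud-controller-manager" {
			return fmt.Errorf("lock %q: unknown component %q", l.Name, l.Component)
		}
		for _, c := range l.Controllers {
			if prev, ok := owner[c]; ok {
				return fmt.Errorf("controller %q is listed in both %q and %q", c, prev, l.Name)
			}
			owner[c] = l.Name
		}
	}
	return nil
}

func main() {
	locks := []MigrationLock{{
		Name:        "cloud-network-controller-migration",
		Component:   "kube-controller-manager",
		Controllers: []string{"service", "route", "nodeipam"},
	}}
	fmt.Println(ValidateLocks(locks)) // prints <nil>
}
```

Immutability of the controller set and the "may change across releases" property are release-process constraints, so they are not covered by this check.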
Let’s say we are migrating the service, route, and nodeipam controllers from the KCM to the CCM across Kubernetes versions, say v1.17 to v1.18.
In v1.17, the cloud provider would define a new migration lock called cloud-network-controller-migration which specifies those controllers to run
inside the KCM (see Figure 1). As a result, in v1.17 those controllers would run in the KCM but under the cloud-network-controller-migration leader election.
To migrate to the CCM for v1.18, the cloud provider would update the cloud-network-controller-migration lock to now run in the CCM (see Figure 2).
During a control plane upgrade, the cloud network controllers may still run in one of the KCMs that are still on v1.17. A v1.17 KCM holding the lock
will prevent any of the v1.18 CCMs from claiming the lock. When the current holder of the lock goes down, one of the eligible controller managers will acquire the lock.

Figure 1: Example of migrating cloud network controllers in v1.17

Figure 2: Example of migrating cloud network controllers in v1.18
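The handover described in this example can be illustrated with a toy, in-memory lock. This is only a sketch of the exclusivity property: a real migration lock is a resource renewed through the API server, and the `Lock` type and the identities used here are hypothetical.

```go
package main

import "fmt"

// Lock is a hypothetical in-memory stand-in for the migration lock.
type Lock struct {
	holder string // empty when nobody holds the lock
}

// TryAcquire grants the lock only if it is free or already held by id.
func (l *Lock) TryAcquire(id string) bool {
	if l.holder == "" || l.holder == id {
		l.holder = id
		return true
	}
	return false
}

// Release frees the lock if id is the current holder, as happens when the
// holding process is shut down during the upgrade.
func (l *Lock) Release(id string) {
	if l.holder == id {
		l.holder = ""
	}
}

func main() {
	lock := &Lock{}
	fmt.Println(lock.TryAcquire("kcm-v1.17")) // true: the old KCM holds the lock
	fmt.Println(lock.TryAcquire("ccm-v1.18")) // false: the new CCM must wait
	lock.Release("kcm-v1.17")                 // the v1.17 node goes down
	fmt.Println(lock.TryAcquire("ccm-v1.18")) // true: the CCM takes over
}
```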
Notes/Constraints/Caveats (Optional)
Migration Configuration
The migration lock will be configured by defining new API types that will then be passed into the KCM and CCM.
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

// LeaderMigrationConfiguration provides versioned configuration for all migrating leader locks.
type LeaderMigrationConfiguration struct {
	metav1.TypeMeta `json:",inline"`
	// LeaderName is the name of the resource under which the controllers will be run.
	LeaderName string `json:"leaderName"`
	// ResourceLock indicates the resource object type that will be used to lock.
	// Must be "leases", defaults to "leases". This field is retained only for
	// compatibility with previous releases.
	// This field will be removed in stable (v1) API.
	ResourceLock string `json:"resourceLock"`
	// ControllerLeaders contains a list of migrating leader lock configurations.
	ControllerLeaders []ControllerLeaderConfiguration `json:"controllerLeaders"`
}

// ControllerLeaderConfiguration provides the configuration for a migrating leader lock.
type ControllerLeaderConfiguration struct {
	// Name is the name of the controller being migrated.
	// E.g. service-controller, route-controller, cloud-node-controller, etc.
	Name string `json:"name"`
	// Component is the name of the component in which the controller will be running.
	// E.g. kube-controller-manager, cloud-controller-manager, etc.
	// Or '*', meaning the controller can be run under any component that participates in the migration.
	Component string `json:"component"`
}
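As a rough illustration of how a configuration document maps onto these types, the sketch below decodes a JSON document using only the standard library. It assumes simplified local copies of the types (metav1.TypeMeta is replaced by two plain string fields) and a hypothetical `ParseConfig` helper; the real components read a YAML file and use the Kubernetes API machinery for decoding.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified local copies of the configuration types, for illustration only.
type ControllerLeaderConfiguration struct {
	Name      string `json:"name"`
	Component string `json:"component"`
}

type LeaderMigrationConfiguration struct {
	Kind              string                          `json:"kind"`
	APIVersion        string                          `json:"apiVersion"`
	LeaderName        string                          `json:"leaderName"`
	ControllerLeaders []ControllerLeaderConfiguration `json:"controllerLeaders"`
}

// ParseConfig is a hypothetical helper that decodes a JSON document and
// rejects a configuration without a leader name.
func ParseConfig(data []byte) (*LeaderMigrationConfiguration, error) {
	var cfg LeaderMigrationConfiguration
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	if cfg.LeaderName == "" {
		return nil, fmt.Errorf("leaderName must not be empty")
	}
	return &cfg, nil
}

func main() {
	doc := []byte(`{
		"kind": "LeaderMigrationConfiguration",
		"leaderName": "cloud-provider-extraction-migration",
		"controllerLeaders": [{"name": "route", "component": "*"}]
	}`)
	cfg, err := ParseConfig(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.LeaderName, len(cfg.ControllerLeaders))
}
```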
Default LeaderMigrationConfiguration
The staging/controller-manager package will provide a default LeaderMigrationConfiguration which has the
component field of all controllers that need migrating set to *, indicating the controllers can be run under
either party of the migration.
Please refer to the walkthrough of an example cloud controller migration from KCM to CCM that uses the default configuration.
The default values must be used only when no configuration file is specified. If a custom configuration file is specified to either controller manager, the specified configuration will completely replace the default values for the corresponding controller manager.
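The "replace, do not merge" rule can be sketched as follows. The `Config` type, `DefaultConfig`, and `Resolve` are hypothetical names, and the default values shown are illustrative rather than the ones shipped by the package.

```go
package main

import "fmt"

type ControllerLeader struct{ Name, Component string }

type Config struct {
	LeaderName        string
	ControllerLeaders []ControllerLeader
}

// DefaultConfig returns an illustrative default in which every listed
// controller may run under either component ("*").
func DefaultConfig() Config {
	return Config{
		LeaderName: "cloud-provider-extraction-migration",
		ControllerLeaders: []ControllerLeader{
			{"route", "*"}, {"service", "*"}, {"cloud-node-lifecycle", "*"},
		},
	}
}

// Resolve returns the default only when no custom configuration is given.
// A custom configuration is used as-is and is never merged with the default.
func Resolve(custom *Config) Config {
	if custom == nil {
		return DefaultConfig()
	}
	return *custom
}

func main() {
	custom := &Config{
		LeaderName:        "my-migration",
		ControllerLeaders: []ControllerLeader{{"route", "cloud-controller-manager"}},
	}
	fmt.Println(len(Resolve(nil).ControllerLeaders))    // 3: defaults apply
	fmt.Println(len(Resolve(custom).ControllerLeaders)) // 1: defaults fully replaced
}
```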
Component Flags
Both kube-controller-manager and cloud-controller-manager will get support for the following two flags for Leader
Migration. First, --enable-leader-migration is a boolean flag which defaults to false that indicates whether Leader
Migration is enabled. Second, --leader-migration-config is an optional flag that accepts a path to a file containing
the LeaderMigrationConfiguration type serialized in YAML.
If --enable-leader-migration is true but --leader-migration-config flag is empty or not set, the
default LeaderMigrationConfiguration will be used.
If --enable-leader-migration is not set or set to false, but --leader-migration-config is set and not empty, the
controller manager will print an error at FATAL level and exit immediately. Additionally,
if --leader-migration-config is set but the configuration file cannot be read or parsed, the controller manager will
log the failure at FATAL level and exit immediately.
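The flag combinations above amount to a small decision table. The sketch below expresses it as a function that returns an error where the controller managers would log at FATAL level and exit. `CheckFlags` and its return values are hypothetical, and parsing of the file content is omitted.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// CheckFlags reports which configuration source applies for the given flag
// values: "disabled", "default", or "file". It returns an error for the
// combinations on which the real components would exit.
func CheckFlags(enableLeaderMigration bool, configPath string) (string, error) {
	switch {
	case !enableLeaderMigration && configPath != "":
		return "", errors.New("--leader-migration-config requires --enable-leader-migration")
	case !enableLeaderMigration:
		return "disabled", nil
	case configPath == "":
		return "default", nil
	}
	// Only readability is checked here; a real component also parses the file.
	if _, err := os.ReadFile(configPath); err != nil {
		return "", fmt.Errorf("cannot read leader migration config: %w", err)
	}
	return "file", nil
}

func main() {
	fmt.Println(CheckFlags(false, ""))
	fmt.Println(CheckFlags(true, ""))
	fmt.Println(CheckFlags(false, "migration.yaml"))
}
```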
Example Walkthrough of Controller Migration with Default Configuration
This is an example of migrating a KCM-only Kubernetes 1.21 control plane to KCM + CCM 1.22.
After the upgrade, all cloud controllers will be moved from the KCM to the CCM. We assume KCM and CCM are running with
default controller assignments, namely, in 1.21, KCM runs route-controller, service-controller,
cloud-node-controller, and cloud-nodelifecycle-controller, and in 1.22, CCM instead will run all 4
controllers.
If KCM and CCM are not running with the default controller assignments, a custom configuration file can be specified
with --leader-migration-config. However, this example only covers the simple case of using default configuration.
At the beginning, KCM should not have --enable-leader-migration or --leader-migration-config set, but it should
have --cloud-provider already set to an existing cloud provider (e.g. --cloud-provider=gce). At this point, KCM
runs route-controller, service-controller, cloud-node-controller, and cloud-nodelifecycle-controller. CCM is not
yet deployed.
Enable Leader Migration on Components
The default LeaderMigrationConfiguration can be represented as follows:
kind: LeaderMigrationConfiguration
apiVersion: controllermanager.config.k8s.io/v1
leaderName: cloud-provider-extraction-migration
resourceLock: leases
controllerLeaders:
  - name: route
    component: "*"
  - name: service
    component: "*"
  - name: cloud-node-lifecycle
    component: "*"
First, within the 1.21 control plane, update the kube-controller-manager to set --enable-leader-migration
and --feature-gates=ControllerManagerLeaderMigration=true (this enables the ControllerManagerLeaderMigration feature gate), but
not --leader-migration-config. This enables Leader Migration with the default configuration, which prepares KCM to
participate in the migration.
Upgrade the Control Plane
Upgrade each node of the control plane to 1.22 with the following updates:
- KCM has neither --enable-leader-migration nor --leader-migration-config
- KCM has no cloud provider enabled with --cloud-provider=
- CCM deployed with --enable-leader-migration
- CCM has its --cloud-provider set to the correct cloud provider
Starting from version 1.22, a proper default that represents the most recent migration will be provided.
* in the component field indicates that the controllers can be run under either kube-controller-manager or cloud-controller-manager. Because the migration happens between
the 1.21 kube-controller-manager with built-in cloud provider and 1.22 cloud-controller-manager, the controllers
can run under either controller manager, which is exactly as described in the default configuration.
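How a component might interpret the component field, including the * wildcard, is sketched below. The `Decision` values and `Decide` function are hypothetical and only model the rule stated here; they are not the API of the controller-manager package.

```go
package main

import "fmt"

type ControllerLeader struct{ Name, Component string }

// Decision describes how a component should treat a controller.
type Decision int

const (
	RunUnderMainLock      Decision = iota // not listed: the controller is not being migrated
	RunUnderMigrationLock                 // listed, and this component may own it
	DoNotRun                              // listed, but assigned to another component
)

// Decide applies the component field, treating "*" as "any participating component".
func Decide(leaders []ControllerLeader, component, controller string) Decision {
	for _, l := range leaders {
		if l.Name != controller {
			continue
		}
		if l.Component == "*" || l.Component == component {
			return RunUnderMigrationLock
		}
		return DoNotRun
	}
	return RunUnderMainLock
}

func main() {
	leaders := []ControllerLeader{
		{"route", "*"},
		{"service", "cloud-controller-manager"},
	}
	kcm := "kube-controller-manager"
	fmt.Println(Decide(leaders, kcm, "route") == RunUnderMigrationLock) // true
	fmt.Println(Decide(leaders, kcm, "service") == DoNotRun)            // true
	fmt.Println(Decide(leaders, kcm, "deployment") == RunUnderMainLock) // true
}
```

With the wildcard, both controller managers answer RunUnderMigrationLock for the same controller, and the shared migration lease is what keeps only one of them active at a time.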
During the upgrade, either KCM or CCM may hold the migration lease and thus run migrated controllers.
However, at any moment, there is one and only one of them running the migrated controllers.
After upgrade, only CCM will run the route, service, cloud-node-lifecycle controllers.
Disable Leader Migration
Once all nodes in the control plane are upgraded to 1.22, disable leader migration on the cloud-controller-manager by
unsetting the --enable-leader-migration flag.
Risks and Mitigations
- Increased apiserver load due to new leader election resource per migration configuration.
- User error could result in cloud controllers not running in any component at all.
Design Details
Test Plan
- Unit Testing
  - test resource reading, parsing, validation
  - test calculation of leader differences
  - test all helpers
- Integration Testing
  - test resource registration, parsing, and validation against the Schema APIs
  - test interactions with the leader election APIs
- E2E Testing
  - In a replicated control plane, test control plane upgrade, assert controller managers become healthy and ready after upgrade
  - In a replicated control plane, test control plane upgrade, assert no controllers become active in both controller managers
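The last E2E assertion could rest on a helper like the one below, which compares the sets of controllers observed as active in each controller manager. How those sets are collected is left open; `ActiveInBoth` is a hypothetical helper, not part of the existing test suite.

```go
package main

import (
	"fmt"
	"sort"
)

// ActiveInBoth returns the sorted names of controllers that appear in both
// lists. An empty result means no controller is running in two places.
func ActiveInBoth(kcmActive, ccmActive []string) []string {
	inKCM := map[string]bool{}
	for _, c := range kcmActive {
		inKCM[c] = true
	}
	reported := map[string]bool{}
	var both []string
	for _, c := range ccmActive {
		if inKCM[c] && !reported[c] {
			both = append(both, c)
			reported[c] = true
		}
	}
	sort.Strings(both)
	return both
}

func main() {
	kcm := []string{"deployment", "route"}
	ccm := []string{"route", "service"}
	fmt.Println(ActiveInBoth(kcm, ccm)) // prints [route]
}
```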
Graduation Criteria
Alpha -> Beta Graduation
Leader migration configuration is tested end-to-end on at least 2 cloud providers. The default migration configuration is implemented and tested.
Beta -> GA Graduation
- Leader Migration works on all in-tree cloud providers that require migration.
- Leader Migration has an automated upgrade test on a replicated control plane, with Leader Migration enabled, of the following cases
- Upgrade from KCM only to KCM + CCM
- Rollback from KCM + CCM to KCM only
Upgrade / Downgrade Strategy
See Example Walkthrough of Controller Migration for upgrade strategy. Clusters can be downgraded and migration can be disabled by reversing the steps in the upgrade strategy assuming the behavior of each controller does not change incompatibly across those versions.
Version Skew Strategy
Version skew is handled as long as the leader name is consistent across all control plane nodes during upgrade.
Production Readiness Review Questionnaire
Feature Enablement and Rollback
How can this feature be enabled / disabled in a live cluster?
- Feature gate (also fill in values in kep.yaml)
  - Feature gate name: ControllerManagerLeaderMigration
  - Components depending on the feature gate: cloud-controller-manager and kube-controller-manager
Does enabling the feature change any default behavior?
No. The user must explicitly add the --enable-leader-migration flag to enable this feature. If the user enables this
feature without providing a configuration, the default configuration reflects the default controller assignment and “just works”.
Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
Yes. Once the feature is enabled via feature gate, it can be disabled by unsetting --enable-leader-migration on KCM
and CCM.
What happens if we reenable the feature if it was previously rolled back?
This feature can be re-enabled without side effects.
Are there any tests for feature enablement/disablement?
Yes. Unit & integration tests include flag/configuration parsing. E2E test will have cases with the feature enabled and disabled.
Rollout, Upgrade and Rollback Planning
How can a rollout or rollback fail? Can it impact already running workloads?
The rollout may fail if the configuration file does not represent the correct controller-to-manager assignment or if configurations mismatch between controller managers. This can cause controllers referred to in the configuration file to either be unavailable or run in multiple instances.
The rollback may fail if the leader election of the controller manager is not properly configured. For example, multiple instances of the same controller manager are running without election, or none of the instances become the leader. In these situations, all controllers will either be unavailable or conflict across multiple instances.
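A pre-rollout check for mismatched configurations might look like the sketch below. It rests on assumptions that are not stated in this KEP: both controller managers are able to run every listed controller, a controller missing from a component's configuration runs under that component's primary lock, and the primary locks of KCM and CCM are separate. `Diagnose` and `mode` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"sort"
)

type ControllerLeader struct{ Name, Component string }

const (
	kcm = "kube-controller-manager"
	ccm = "cloud-controller-manager"
)

// mode reports how a component treats a controller under its own config:
// "main" (not listed), "migration" (listed for it or "*"), or "none".
func mode(cfg []ControllerLeader, component, controller string) string {
	for _, l := range cfg {
		if l.Name == controller {
			if l.Component == "*" || l.Component == component {
				return "migration"
			}
			return "none"
		}
	}
	return "main"
}

// Diagnose compares the configuration given to each controller manager and
// returns controllers nobody would run and controllers that could run twice.
func Diagnose(kcmCfg, ccmCfg []ControllerLeader) (unavailable, duplicated []string) {
	names := map[string]bool{}
	for _, l := range kcmCfg {
		names[l.Name] = true
	}
	for _, l := range ccmCfg {
		names[l.Name] = true
	}
	for n := range names {
		k, c := mode(kcmCfg, kcm, n), mode(ccmCfg, ccm, n)
		switch {
		case k == "none" && c == "none":
			unavailable = append(unavailable, n)
		case (k == "main" && c != "none") || (c == "main" && k != "none"):
			// One side ignores the migration lock, so nothing serializes the two.
			duplicated = append(duplicated, n)
		}
	}
	sort.Strings(unavailable)
	sort.Strings(duplicated)
	return unavailable, duplicated
}

func main() {
	kcmCfg := []ControllerLeader{{"route", ccm}, {"service", "*"}}
	ccmCfg := []ControllerLeader{{"route", kcm}} // "service" is missing here
	fmt.Println(Diagnose(kcmCfg, ccmCfg))        // prints [route] [service]
}
```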
What specific metrics should inform a rollback?
If neither controller manager shows leader_active for the main leader lock or the migration lock, Leader Migration may have failed to activate and thus needs rollback.
If any of the controllers indicate they are unavailable through their per-controller metric, Leader Migration may need reconfiguration.
The metrics of each controller are specific to the implementation of each cloud provider and are out of scope of this KEP.
Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
Manual testing showed a clean takeover of the migration leader during both the upgrade and the downgrade process. This process will be tested as part of the e2e suite, as required by Graduation Criteria.
Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
None.
Monitoring Requirements
How can an operator determine if the feature is in use by workloads?
N/A. This feature is never used by any user workloads.
How can someone using this feature know that it is working for their instance?
- Other (treat as last resort)
  - The Lease resource used in the migration can be watched for transition of leadership and timing information.
  - Logs and metrics can directly indicate the status of migration.
Note that this feature is intended for cluster administrators, who should have access to metrics during the upgrade.
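Given a series of observations of the migration Lease, leadership handovers can be derived as sketched below. `LeaseObservation` is a hypothetical snapshot type whose two fields mirror holderIdentity and leaseTransitions in the Lease spec, simplified to plain values. How the observations are collected (for example by watching the API server) is outside this sketch.

```go
package main

import "fmt"

// LeaseObservation is a hypothetical snapshot of the migration Lease.
type LeaseObservation struct {
	HolderIdentity   string
	LeaseTransitions int32
}

// Handovers lists every change of holder between consecutive observations.
func Handovers(obs []LeaseObservation) []string {
	var out []string
	for i := 1; i < len(obs); i++ {
		prev, cur := obs[i-1], obs[i]
		if cur.HolderIdentity != prev.HolderIdentity {
			out = append(out, fmt.Sprintf("%s -> %s (transitions=%d)",
				prev.HolderIdentity, cur.HolderIdentity, cur.LeaseTransitions))
		}
	}
	return out
}

func main() {
	obs := []LeaseObservation{
		{"kcm-node-a", 0},
		{"kcm-node-a", 0},
		{"ccm-node-b", 1},
	}
	for _, h := range Handovers(obs) {
		fmt.Println(h) // prints kcm-node-a -> ccm-node-b (transitions=1)
	}
}
```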
What are the reasonable SLOs (Service Level Objectives) for the enhancement?
Leader Migration is designed to ensure availability of controller managers during upgrade, and this feature will not affect SLOs of controller managers.
What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
- Metrics
- per-controller health checks in both controller managers.
- Components exposing the metric: kube-controller-manager, cloud-controller-manager
Are there any missing metrics that would be useful to have to improve observability of this feature?
Status of the migration lease, provided by the API server, can help observe the transition of holders if exposed as resource metrics.
Dependencies
Does this feature depend on any specific services running in the cluster?
- API Server
  - needed for leader election
  - Impact of its outage on the feature: when leader election times out, controller managers will lose leadership and exit, causing an outage.
  - Impact of its degraded performance or high-error rates on the feature: delayed or retried operations of leader election.
Scalability
Will enabling / using this feature result in any new API calls?
Leader Migration uses exactly one more resource of coordination.k8s.io/v1.Lease using the standard leader election process.
Both kube-controller-manager and cloud-controller-manager will create, update, and watch on the lease.
If the service accounts are not granted access to the lease resources, the RBAC roles of each controller manager may need to be modified before the upgrade.
Will enabling / using this feature result in introducing new API types?
Type: controllermanager.config.k8s.io/v1.LeaderMigrationConfiguration
This resource is only for configuration file parsing. The resource should never reach the API server.
Will enabling / using this feature result in any new calls to the cloud provider?
No.
Will enabling / using this feature result in increasing size or count of the existing API objects?
This feature uses exactly one more coordination.k8s.io/v1.Lease resource. The RBAC roles of both controller managers will
gain an additional ~50 bytes because of the new lease under resourceName.
Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
No.
Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, …) in any components?
Both kube-controller-manager and cloud-controller-manager run another leader election process,
which causes negligible increases in CPU and memory usage, both during upgrade and under normal operations.
Troubleshooting
How does this feature react if the API server and/or etcd is unavailable?
The existing implementation of controller managers will klog.Fatal once the leader times out, which eventually happens if API server is unavailable.
Leader Migration will not change such behavior.
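The "exit once the leader times out" behavior can be pictured as a renew loop with a deadline. The sketch below is a simplified model using only the standard library. `RenewWithDeadline` and `ErrUnavailable` are hypothetical, and the real leader election code is more involved.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrUnavailable stands in for any failure to reach the API server.
var ErrUnavailable = errors.New("API server unavailable")

// RenewWithDeadline retries renew every retryPeriod and gives up once the
// next attempt would start after renewDeadline has passed.
func RenewWithDeadline(renew func() error, retryPeriod, renewDeadline time.Duration) error {
	deadline := time.Now().Add(renewDeadline)
	for {
		err := renew()
		if err == nil {
			return nil
		}
		if time.Now().Add(retryPeriod).After(deadline) {
			return fmt.Errorf("failed to renew lease before deadline: %w", err)
		}
		time.Sleep(retryPeriod)
	}
}

func main() {
	alwaysDown := func() error { return ErrUnavailable }
	err := RenewWithDeadline(alwaysDown, 2*time.Millisecond, 10*time.Millisecond)
	if err != nil {
		// A controller manager would log at fatal level and exit here.
		fmt.Println("leadership lost:", err)
	}
}
```

Because the migration lock goes through the same leader election process as the primary lock, an unavailable API server ends both in the same way.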
What are other known failure modes?
None. Leader Migration is known to fail only because of misconfiguration or unavailability of the API server, both of which are discussed above.
What steps should be taken if SLOs are not being met to determine the problem?
N/A.
Implementation History
- 07-25-2019 Summary and Motivation sections were merged signaling SIG acceptance
- 01-21-2019 Implementation details are proposed to move KEP to implementable state.
- 09-30-2020 LeaderMigrationConfiguration and ControllerLeaderConfiguration schemas merged as #94205.
- 11-04-2020 Registration of both types merged as #96133
- 12-28-2020 Parsing and validation merged as #96226
- 03-10-2021 Implementation for alpha state completed, released in 1.21.
- 03-30-2021 User guide published as kubernetes/website#26970
- 05-11-2021 KEP updated to target beta.
- 01-21-2022 KEP updated to target GA.
- 01-25-2022 Testing and monitoring revised for GA.
- 03-29-2022 Implementation for GA completed, released in 1.24
Drawbacks
A single-node control plane does not need this feature. If downtime is allowed during control plane upgrade, KCM and CCM can have no migration mechanism at all.
Alternatives
Change all controllers so that they can handle a situation where two instances of the same controller are running in both KCM and CCM. This requires a massive change to all controllers and potentially requires other kinds of synchronization. It would be better for the controller manager to provide the migration mechanism instead of relying on each controller.