KEP-980: Finalizer Protection for Service LoadBalancers
Summary
We will add finalizer protection to ensure that a Service resource is not
fully deleted until its corresponding load balancer resources are deleted. Every
service of type=LoadBalancer (both existing and newly created) will have a
service LoadBalancer finalizer attached, which the service controller removes
once the related load balancer resources have been cleaned up. This finalizer
protection mechanism will be released in phases to ensure that downgrades
can happen safely.
Motivation
There are various cases where the service controller can leave orphaned load balancer resources behind after services are deleted (see the discussion on https://github.com/kubernetes/kubernetes/issues/32157 and https://github.com/kubernetes/kubernetes/issues/53451). We periodically receive bug reports and customer issues that reproduce this problem. It is common enough to warrant a better mechanism for ensuring that load balancer resources are cleaned up.
Goals
Ensure that a Service resource is not fully deleted until its corresponding load balancer resources are deleted.
Proposal
We are going to define a finalizer for service LoadBalancers named
service.kubernetes.io/load-balancer-cleanup. This finalizer will be attached
to any service of type=LoadBalancer if the cluster has the cloud
provider integration enabled. When such a service is deleted, the actual
deletion of the resource is blocked until this finalizer is removed.
The finalizer will not be removed until the service controller considers the
cleanup of the corresponding load balancer resources finished.
Note that this finalizer may also be removed when the service type
changes from LoadBalancer to another type. This does not change the
requirement that the resource cleanup must complete before the service is
fully deleted.
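As an illustration of the bookkeeping this implies, here is a minimal Go sketch of adding and removing the finalizer on a plain list of finalizer names. Only the finalizer name comes from this proposal. The helper names (`hasFinalizer`, `addFinalizer`, `removeFinalizer`) and the use of a bare `[]string` instead of a real Service object are assumptions made for the example, not the service controller's actual code.

```go
package main

import "fmt"

// LoadBalancerCleanupFinalizer is the finalizer name defined in this proposal.
const LoadBalancerCleanupFinalizer = "service.kubernetes.io/load-balancer-cleanup"

// hasFinalizer reports whether name is present in finalizers.
// (Hypothetical helper, for illustration only.)
func hasFinalizer(finalizers []string, name string) bool {
	for _, f := range finalizers {
		if f == name {
			return true
		}
	}
	return false
}

// addFinalizer returns a copy of finalizers with name appended
// if it is not already present, so repeated calls are idempotent.
func addFinalizer(finalizers []string, name string) []string {
	out := append([]string(nil), finalizers...)
	if hasFinalizer(out, name) {
		return out
	}
	return append(out, name)
}

// removeFinalizer returns a copy of finalizers without any occurrence of name.
// Finalizers owned by other controllers are left untouched.
func removeFinalizer(finalizers []string, name string) []string {
	out := make([]string, 0, len(finalizers))
	for _, f := range finalizers {
		if f != name {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	fs := addFinalizer(nil, LoadBalancerCleanupFinalizer)
	fmt.Println(fs)
	fs = removeFinalizer(fs, LoadBalancerCleanupFinalizer)
	fmt.Println(len(fs))
}
```

Both helpers return copies rather than mutating their input, which keeps the sketch safe to use on objects read from a shared cache.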
The lifecycle of a LoadBalancer type service with the finalizer would look like:
- Creation
  - User creates a service.
  - Service controller observes the creation and attaches the finalizer to the service.
  - Provision of load balancer resources.
- Deletion
  - User issues a deletion for the service.
  - Service resource deletion is blocked due to the finalizer.
  - Service controller observes that the deletion timestamp has been added.
  - Cleanup of load balancer resources.
  - Service controller removes the finalizer from the service.
  - Service resource is deleted.
- Update to another type
  - User updates the service from type=LoadBalancer to another type.
  - Service controller observes the update.
  - Cleanup of load balancer resources.
  - Service controller removes the finalizer from the service.
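The lifecycle steps above can be read as a small decision table. The Go sketch below models it under stated assumptions: `serviceState`, `action`, and `nextAction` are hypothetical names introduced for illustration, and the sketch covers only the finalizer-protected paths (it does not model services that never received the finalizer). It is not the service controller's real reconcile loop.

```go
package main

import "fmt"

// serviceState is a simplified, hypothetical view of a Service.
type serviceState struct {
	isLoadBalancer bool // the service has type=LoadBalancer
	beingDeleted   bool // the deletion timestamp has been set
	hasFinalizer   bool // the load balancer cleanup finalizer is attached
}

// action names the next step a controller would take in this sketch.
type action string

const (
	actionNone                       action = "nothing to do"
	actionAddFinalizerThenProvision  action = "attach finalizer, then provision load balancer"
	actionSyncLoadBalancer           action = "keep load balancer in sync"
	actionCleanupThenRemoveFinalizer action = "clean up load balancer, then remove finalizer"
)

// nextAction maps a service state onto the lifecycle described in the text:
// the finalizer is attached before provisioning and is removed only after
// cleanup, both on deletion and on a type change away from LoadBalancer.
func nextAction(s serviceState) action {
	needsLoadBalancer := s.isLoadBalancer && !s.beingDeleted
	switch {
	case needsLoadBalancer && !s.hasFinalizer:
		return actionAddFinalizerThenProvision
	case needsLoadBalancer:
		return actionSyncLoadBalancer
	case s.hasFinalizer:
		return actionCleanupThenRemoveFinalizer
	default:
		return actionNone
	}
}

func main() {
	states := []serviceState{
		{isLoadBalancer: true},                                   // creation
		{isLoadBalancer: true, hasFinalizer: true},               // steady state
		{isLoadBalancer: true, beingDeleted: true, hasFinalizer: true}, // deletion
		{isLoadBalancer: false, hasFinalizer: true},              // type change
	}
	for _, s := range states {
		fmt.Printf("%+v -> %s\n", s, nextAction(s))
	}
}
```

Deletion and a type change share one branch because, from the controller's point of view, both mean the load balancer is no longer wanted while the finalizer is still attached.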
The expected cluster upgrade/downgrade path for services with the finalizer would be:
- Upgrade from pre-finalizer version
  - All existing LoadBalancer services will have a finalizer attached upon startup of the new version of service controller.
  - Newly created LoadBalancer services will have a finalizer attached upon creation.
- Downgrade from with-finalizer version
  - All existing LoadBalancer services will have the attached finalizer removed upon the cleanup of load balancer resources.
  - Newly created LoadBalancer services will not have a finalizer attached.
To ensure that downgrades can happen safely, the first release will include the “remove finalizer” logic unconditionally, with the “add finalizer” logic behind a feature gate. Then in a later release we will remove the feature gate and enable both the “remove” and “add” logic by default.
As such, we propose the following Alpha/Beta/GA phases for this enhancement:
- Alpha: Finalizer cleanup will always be on. Finalizer addition will be off by default but can be enabled via a feature gate.
- Beta: Finalizer cleanup will always be on. Finalizer addition will be on by default but can be disabled via a feature gate.
- GA: Service LoadBalancers Finalizer Protection will always be on.
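The phase table above can be summarized as a small function. This Go sketch is illustrative only: `phase`, `behavior`, and `resolve` are hypothetical names, and the feature gate is modeled as an optional boolean override rather than through any real Kubernetes feature-gate API.

```go
package main

import "fmt"

// phase is the release phase of the enhancement.
type phase int

const (
	alpha phase = iota
	beta
	ga
)

// behavior records which halves of the finalizer logic are active.
type behavior struct {
	addFinalizer    bool
	removeFinalizer bool
}

// resolve returns the active behavior for a phase.
// gateOverride models an explicit feature gate setting; nil means
// the operator did not set the gate and the phase default applies.
func resolve(p phase, gateOverride *bool) behavior {
	// Finalizer removal is on in every phase so that downgrades stay safe.
	b := behavior{removeFinalizer: true}
	switch p {
	case alpha:
		b.addFinalizer = false // off by default, can be enabled
	case beta:
		b.addFinalizer = true // on by default, can be disabled
	case ga:
		b.addFinalizer = true
		return b // the gate is no longer consulted
	}
	if gateOverride != nil {
		b.addFinalizer = *gateOverride
	}
	return b
}

func main() {
	enable, disable := true, false
	fmt.Println(resolve(alpha, nil))
	fmt.Println(resolve(alpha, &enable))
	fmt.Println(resolve(beta, &disable))
	fmt.Println(resolve(ga, &disable))
}
```

Note that `removeFinalizer` is never gated. That is what lets a cluster that attached finalizers in one release still clean them up after rolling back one release.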
Risks and Mitigations
n+2 upgrade/downgrade is not supported
Suppose a user performs an n+2 upgrade from v1.14 to v1.16 and then downgrades
back to v1.14. Finalizers would have been added to their Services, but the
removal logic is lost on the downgrade. As a result, a Service with
type=LoadBalancer cannot be deleted until the finalizer on it is manually removed.
To keep upgrades and downgrades safe, users should always move one minor version at a time (n+1), as stated in https://kubernetes.io/docs/setup/version-skew-policy/#supported-component-upgrade-order .
Other notes
If the cloud provider opts out of LoadBalancer support, the service controller won’t run at all (see here). Hence the finalizer won’t be added or removed by the service controller.
Any other custom controller that watches Services with type=LoadBalancer
should implement its own finalizer protection.
Test Plan
We will implement e2e test cases to ensure:
- Service finalizer protection works with various service lifecycles on a cluster that enables it.
In addition to the above, we should have upgrade/downgrade tests that:
- Verify the downgrade path and ensure service finalizer removal works.
- Verify the upgrade path and ensure finalizer protection works with existing LB services.
Graduation Criteria
Beta: Allow Alpha (“remove finalizer”) to soak for at least one release, then switch the “add finalizer” logic to be on by default.
GA: Allow Beta to soak for at least one release. (There are no behavioral differences from the Beta phase.)
Implementation History
- 2017-10-25 - First attempt at adding a finalizer to service (https://github.com/kubernetes/kubernetes/pull/54569)
- 2018-07-06 - Split finalizer cleanup logic into a separate PR (https://github.com/kubernetes/kubernetes/pull/65912)
- 2019-04-23 - Creation of the KEP
- 2019-05-23 - PR merged for adding finalizer support in LoadBalancer services (https://github.com/kubernetes/kubernetes/pull/78262)