KEP-950: Add pod-startup liveness-probe holdoff for slow-starting pods
Release Signoff Checklist
- kubernetes/enhancements issue in release milestone, which links to KEP (this should be a link to the KEP location in kubernetes/enhancements, not the initial KEP PR)
- KEP approvers have set the KEP status to implementable
- Design details are appropriately documented
- Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
- Graduation criteria is in place
- “Implementation History” section is up-to-date for milestone
- User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
- Supporting documentation e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
Summary
Slow-starting containers are difficult to handle with health probes as they exist today: they are either killed before they are up, or they can be left deadlocked for a very long time before being killed.
This proposal adds a new probe called startupProbe that holds off all the other probes until the pod has finished its startup. In the case of a slow-starting pod, it could poll on a relatively short period with a high failureThreshold. Once it is satisfied, the other probes can start.
Motivation
Slow starting containers here refer to containers that require a significant amount of time (one to several minutes) to start. There can be various reasons for this slow startup:
- long data initialization: only the first startup takes a lot of time
- heavy workload: every startup takes a lot of time
- underpowered/overloaded node: startup times depend on external factors (however, solving node related issues is not a goal of this proposal)
The main problem with this kind of container is that it must be given enough time to start before its livenessProbe fails failureThreshold times, which triggers a kill by the kubelet before the container has a chance to come up.
There are various strategies to handle this situation with the current API:
- Delay the initial livenessProbe sufficiently to permit the container to start up (set initialDelaySeconds greater than startup time). While this ensures no livenessProbe will run and fail during the startup period (triggering a kill), it also delays deadlock detection if the container starts faster than initialDelaySeconds. Also, since the livenessProbe isn’t run at all during startup, there is no feedback loop on the actual startup time of the container.
- Increase the allowed number of livenessProbe failures until kubelet kills the container (set failureThreshold so that failureThreshold times periodSeconds is greater than startup time). While this gives enough time for the container to start up and allows a feedback loop, it prevents the container from being killed in a timely manner if it deadlocks or otherwise hangs after it has initially successfully come up.
However, neither of these strategies provides a timely answer to slow-starting containers stuck in a deadlock, and detecting deadlocks is the primary reason for setting up a livenessProbe.
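To make the trade-off between the two strategies concrete, the following Go sketch estimates how long a deadlock goes unnoticed under each one. The helper names and the sample numbers (a container that may need up to 300 seconds to start, probed every 10 seconds) are assumptions chosen for illustration, and the model ignores probe timeouts and the jitter the kubelet applies in practice.

```go
package main

import "fmt"

// detectionWithInitialDelay estimates, in seconds, how long a deadlock that
// happens deadlockAt seconds after container start goes unnoticed when the
// livenessProbe is held back by initialDelay. Hypothetical helper: probes are
// assumed to fire at initialDelay, initialDelay+period, and so on.
func detectionWithInitialDelay(initialDelay, period, failureThreshold, deadlockAt int) int {
	// Find the first probe tick at or after the deadlock.
	first := initialDelay
	for first < deadlockAt {
		first += period
	}
	// The container is killed once failureThreshold consecutive probes fail.
	killAt := first + (failureThreshold-1)*period
	return killAt - deadlockAt
}

// detectionWithHighThreshold returns an upper bound, in seconds, on the time
// between a deadlock and the kill when failureThreshold has been raised to
// cover the startup time. Hypothetical helper.
func detectionWithHighThreshold(period, failureThreshold int) int {
	return failureThreshold * period
}

func main() {
	// Strategy 1: initialDelaySeconds=300, failureThreshold=1.
	// The container deadlocks 60s after start.
	fmt.Println(detectionWithInitialDelay(300, 10, 1, 60)) // 240

	// Strategy 2: failureThreshold=30, periodSeconds=10.
	fmt.Println(detectionWithHighThreshold(10, 30)) // 300
}
```

Under these assumptions, the first strategy leaves an early deadlock undetected for about four minutes, and the second allows up to five minutes for any deadlock over the whole life of the container.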
Goals
- Allow slow starting containers to run safely during startup with health probes enabled.
- Improve documentation of the Probe structure in core types’ API.
- Improve kubernetes.io/docs section about Pod lifecycle:
  - Clearly state that PostStart handlers do not delay probe executions.
  - Introduce and explain this new probe.
  - Document appropriate use cases for this new probe.
Non-Goals
- This proposal does not address the issue of pod load affecting startup (or any other probe that may be delayed due to load). It is acting strictly at the pod level, not the node level.
- This proposal will only update the official Kubernetes documentation, excluding A Pod’s Life and other well referenced pages explaining probes.
Proposal
Implementation Details
The proposed solution is to add a new probe named startupProbe in the container spec of a pod which will determine whether it has finished starting up.
It also requires keeping the state of the container (has the startupProbe ever succeeded?) using a boolean Started inside the ContainerStatus struct.
Depending on Started the probing mechanism in worker.go might be altered:
- Started == true: the kubelet worker works the same way as today
- Started == false: the kubelet worker only probes the startupProbe
If startupProbe fails failureThreshold times in a row, the result is the same as today when livenessProbe fails: the container is killed and might be restarted depending on restartPolicy.
If no startupProbe is defined, Started is initialized with true.
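The gating described above can be sketched in a few lines of Go. This is a minimal illustration, not the kubelet code: containerState, newContainerState and activeProbes are hypothetical names, and the real worker.go is organised differently.

```go
package main

import "fmt"

// containerState holds the one piece of state the proposal adds: whether the
// startupProbe has ever succeeded. Hypothetical stand-in for the Started
// field of ContainerStatus.
type containerState struct {
	Started bool
}

// newContainerState initializes Started to true when no startupProbe is
// defined, so containers without one behave exactly as they do today.
func newContainerState(hasStartupProbe bool) containerState {
	return containerState{Started: !hasStartupProbe}
}

// activeProbes returns the names of the probes that would run for a
// container in the given state.
func activeProbes(s containerState) []string {
	if !s.Started {
		// Only the startupProbe runs until it has succeeded once.
		return []string{"startupProbe"}
	}
	// After startup, probing works the same way as today.
	return []string{"livenessProbe", "readinessProbe"}
}

func main() {
	s := newContainerState(true)
	fmt.Println(activeProbes(s)) // [startupProbe]

	s.Started = true // the startupProbe succeeded
	fmt.Println(activeProbes(s)) // [livenessProbe readinessProbe]
}
```

Because Started only ever moves from false to true, the startupProbe is never consulted again once it has succeeded.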
Why a new probe instead of initializationFailureThreshold
While trying to merge PR #1014 in time for code-freeze, @thockin made the following points, which I agree with:
I feel pretty strongly that something like a startupProbe would be net simpler to comprehend than a new field on liveness.
In issuecomment-437208330 we looked at a different take on this API - it is more precise in its meaning and rather than add yet another behavior modifier to probe, it can reuse the probe structure directly.
Here is the excerpt of issuecomment-437208330 talking about the design:
An idea that I toyed with but never pursued was a StartupProbe - all the other probes would wait on it at pod startup. It could poll on a relatively short period with a long FailureThreshold. Once it is satisfied, the other probes can start.
I also think the third probe gives more flexibility if we find other good reasons to inhibit livenessProbe or readinessProbe before something occurs during container startup.
Configuration example
This example shows how startupProbe can be used to emulate the functionality of initializationFailureThreshold as it was proposed before:
ports:
- name: liveness-port
  containerPort: 8080
  hostPort: 8080

livenessProbe:
  httpGet:
    path: /healthz
    port: liveness-port
  failureThreshold: 1
  periodSeconds: 10

startupProbe:
  httpGet:
    path: /healthz
    port: liveness-port
  failureThreshold: 30  # = initializationFailureThreshold
  periodSeconds: 10
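As a rough check of how this configuration behaves over time, the Go sketch below simulates the probe ticks for one container. It is a simplified model under stated assumptions: the simulate function is hypothetical, probes are assumed to fire every 10 seconds starting at t=10, a probe is treated as failed when failureThreshold consecutive failures are reached, and timeouts and jitter are ignored.

```go
package main

import "fmt"

const (
	periodSeconds    = 10 // periodSeconds of both probes in the example
	startupThreshold = 30 // failureThreshold of the startupProbe
	liveThreshold    = 1  // failureThreshold of the livenessProbe
)

// simulate walks through probe ticks up to horizon seconds. The container
// answers /healthz successfully from readyAt until deadlockAt. It returns the
// time at which the container would be killed (-1 if it never is) and whether
// the startupProbe ever succeeded.
func simulate(readyAt, deadlockAt, horizon int) (killedAt int, started bool) {
	failures := 0
	for t := periodSeconds; t <= horizon; t += periodSeconds {
		healthy := t >= readyAt && t < deadlockAt

		// Before startup only the startupProbe counts; afterwards the
		// livenessProbe takes over with its own, much lower threshold.
		threshold := liveThreshold
		if !started {
			threshold = startupThreshold
		}

		if healthy {
			failures = 0
			started = true
			continue
		}
		failures++
		if failures >= threshold {
			return t, started
		}
	}
	return -1, started
}

func main() {
	// Ready after 120s, deadlocks at 205s: killed at the next liveness tick.
	fmt.Println(simulate(120, 205, 600)) // 210 true

	// Never becomes ready: killed once the startup budget is used up.
	fmt.Println(simulate(10000, 20000, 600)) // 300 false
}
```

In this model the container gets up to 300 seconds to start, yet a deadlock after startup is caught within one probe period, which is the combination neither of the existing strategies offers.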
Design Details
Test Plan
Unit tests will be implemented with newTestWorker and will check the following:
- proper initialization of Started to false
- Started becomes true as soon as startupProbe succeeds
- livenessProbe and readinessProbe are disabled until Started is true
- startupProbe is disabled after Started becomes true
- failureThreshold exceeded for startupProbe kills the container
E2e tests will also cover the main use-case for this probe:
- startupProbe disables livenessProbe long enough to simulate a slow starting container, using a high failureThreshold
Feature Gate
- Expected feature gate key: StartupProbeEnabled
- Expected default value: false
Graduation Criteria
- Alpha: Initial support for startupProbe added. Disabled by default.
- Beta: startupProbe enabled with no default configuration.
- Stable: startupProbe enabled with no default configuration.
Implementation History
- 2018-11-27: prototype implemented in PR #71449 under review
- 2019-03-05: present KEP to sig-node
- 2019-04-11: open issue in enhancements #950
- 2019-05-01: redesign to additional probe after @thockin proposal
- 2019-05-02: add test plan
Version 1.16
Version 1.17
- Fix startup_probe_test.go failing test #82747
- Add startupProbe result handling to kuberuntime #84279
- Clarify startupProbe e2e tests #84291
Version 1.18
Version 1.19
- Pods which have not “started” can not be “ready” #92196
Version 1.20
- Graduate startupProbe to GA #94160