KEP-6063: Per-Pod PID Limit
- Release Signoff Checklist
- Summary
- Motivation
- Proposal
- Design Details
- Production Readiness Review Questionnaire
- Implementation History
- Drawbacks
- Alternatives
- Infrastructure Needed (Optional)
Release Signoff Checklist
Items marked with (R) are required prior to targeting to a milestone / release.
- (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
- (R) KEP approvers have approved the KEP status as implementable
- (R) Design details are appropriately documented
- (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- e2e Tests for all Beta API Operations (endpoints)
- (R) Ensure GA e2e tests meet requirements for Conformance Tests
- (R) Minimum Two Week Window for GA e2e tests to prove flake free
- (R) Graduation criteria is in place
- (R) all GA Endpoints must be hit by Conformance Tests within one minor version of promotion to GA
- (R) Production readiness review completed
- (R) Production readiness review approved
- “Implementation History” section is up-to-date for milestone
- User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
- Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
Summary
Add support for setting a PID limit at the Pod level so that a Pod can be restricted below the node-level kubelet PID limit. Individual Pods specify their own PID limit using spec.resources.limits.pid in the Pod specification. The effective limit is the lower of the Pod-specified value and the node’s podPidsLimit.
Motivation
Currently, PID limiting in Kubernetes is configured globally at the node level via the kubelet’s podPidsLimit setting. This creates challenges when different workloads on the same node have different PID requirements:
Infrastructure pods affected by low limits: If a cluster administrator sets a low PID limit to protect against runaway processes, this limit applies equally to all pods on the node, including critical infrastructure pods that may legitimately need more PIDs.
Agent and sandbox workloads: Modern workloads like AI agent runtimes (Agent Sandbox, OpenShell) and service mesh sidecars (Istio) often have specific PID requirements that differ from typical application pods.
No granular control: Administrators cannot apply different PID limits to different types of workloads without node-level segregation, which is operationally expensive.
Pod-level resource model: With the introduction of Pod-level resources in Kubernetes, PID limits logically belong alongside other Pod-level resource constraints.
Goals
- Allow Pods to specify PID limits via spec.resources.limits.pid in the Pod specification
- Restrict below the kubelet’s default podPidsLimit when a Pod-level limit is set
- Use the lower value when both kubelet and Pod limits are specified
- Implement PID limiting at the Pod cgroup level for cgroupsv2
- Maintain backward compatibility with existing node-level PID limiting
Non-Goals
- Support for non-Linux operating systems (PID cgroup limits are Linux-specific)
- Support for cgroupsv1 (this enhancement targets cgroupsv2 only)
- Container-level PID limits (this KEP focuses on Pod-level limits only)
- Automatic PID limit calculation based on workload characteristics
- Changes to the existing node-level PID limiting mechanism
- In-place resize of PID limits (in-place resize currently supports only CPU and memory)
Proposal
Add a new pid resource type that can be specified under spec.resources.limits in the Pod specification. When set, the effective PID limit is min(podPidsLimit, pod.spec.resources.limits.pid) — the Pod can only restrict further, not exceed the node limit.
API Changes
Extend the Pod specification to support PID limits:
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  resources:
    limits:
      pid: "2048" # Pod-level PID limit
  containers:
  - name: app
    image: my-app:latest
    resources:
      limits:
        cpu: "2"
        memory: "4Gi"
  - name: sidecar
    image: log-agent:latest
    resources:
      limits:
        cpu: "100m"
        memory: "64Mi"
- Only spec.resources.limits.pid is accepted; spec.resources.requests.pid is forbidden and will be rejected by the API server
- When spec.resources.limits.pid is specified, the kubelet applies this limit to the Pod’s cgroup
- All containers in the Pod share the same PID pool enforced by the pod cgroup
- If both kubelet podPidsLimit and Pod spec.resources.limits.pid are set, the kubelet enforces min(podPidsLimit, spec.resources.limits.pid). The Pod may request a value higher than the node’s podPidsLimit (up to the API maximum of 16384), but the kubelet always caps at the node’s podPidsLimit, ensuring the node administrator retains control.
- If spec.resources.limits.pid is not specified, the kubelet’s podPidsLimit applies (existing behavior)
Valid Values
Following the same constraints as node-level podPidsLimit:
- Minimum: 1024 (conservative lower bound)
- Maximum: 16384 (maximum supported in managed environments)
- Default: If not specified, inherits kubelet’s podPidsLimit
User Stories
Story 1
Agent Sandbox Workloads
As a platform engineer running AI agent workloads (such as Agent Sandbox or OpenShell), I need to restrict PID limits for agent pods because they run untrusted code, while allowing trusted application pods to use the full node default.
Current state: I must set a low podPidsLimit at the node level to
protect against runaway agent processes, but this unnecessarily restricts
all pods on the node, including trusted application pods that may
legitimately need more PIDs.
With this enhancement: I can keep the node’s podPidsLimit at a
comfortable level (e.g., 4096) for trusted application pods. I can then
set spec.resources.limits.pid: "2048" on agent pods to contain untrusted
workloads. Because the kubelet enforces the lower of the two limits, agent
pods are safely restricted to 2048, while trusted application pods can use
the full node default of 4096.
Story 2
Infrastructure Pod Protection
As a cluster administrator, I want to set a conservative PID limit (2048) for tenant workloads while allowing infrastructure pods (e.g., monitoring agents, log collectors, service mesh proxies) to use higher limits (4096) on the same nodes.
Current state: I must choose between protecting the node (a low node limit that starves infrastructure pods) or allowing infrastructure pods to function (a high node limit that risks PID exhaustion from tenant pods).
With this enhancement: I can set the kubelet’s podPidsLimit to 4096 to
support infrastructure pods. I can then enforce
spec.resources.limits.pid: "2048" on tenant pods (via admission control or
default LimitRanges). Since the lower value wins, tenant pods are strictly
capped at 2048, while infrastructure pods can scale up to the node limit of
4096.
Story 3
Service Mesh Sidecar Requirements
As an application developer using Istio, my application pods with Envoy sidecars need more PIDs than standard applications because the sidecar creates multiple worker threads and connections.
Current state: If the default namespace limit is too low, I have no way to
increase my PID limit without the cluster administrator globally changing the
node-level podPidsLimit, which reduces overall cluster safety.
With this enhancement: The cluster administrator sets the node’s
podPidsLimit to 8192. Standard application pods set
spec.resources.limits.pid: "2048" to stay conservative. For my Istio
pods, I set spec.resources.limits.pid: "4096" to give the Envoy sidecar
the headroom it needs. The kubelet enforces min(8192, 4096) = 4096,
granting my pod more PIDs without affecting other workloads or requiring
a node-level configuration change.
Notes/Constraints/Caveats
cgroupsv2 only: This enhancement requires cgroupsv2. On cgroupsv1 nodes, the kubelet will reject pods that specify spec.resources.limits.pid during admission, rather than silently ignoring the field. This prevents a situation where the user expects PID enforcement but the node cannot deliver it.
Lower value precedence: When both kubelet and Pod specify limits, the lower value is enforced. This ensures that node-level protections cannot be bypassed by Pod specifications.
Hierarchical cgroup limits: PID limits are hierarchical in cgroups, so the most restrictive limit in the hierarchy applies.
Resource quota and limits: PID limits should be considered in cluster resource planning, though they do not consume schedulable resources like CPU or memory.
Pod Security Admission (PSA) compatibility: The spec.resources.limits.pid field has no PSA impact. Pods specifying PID limits are admitted under all PSA profiles (Privileged, Baseline, and Restricted) without triggering any policy violations. PSA does not inspect resource limits, so no special configuration or exemptions are needed.
Node Declared Features: This feature integrates with the Node Declared Features framework (KEP-5328) to handle version skew and mixed cgroupsv1/v2 clusters. Nodes that support per-pod PID limits declare PerPodPIDLimit in node.status.declaredFeatures, enabling the scheduler to avoid placing pods with spec.resources.limits.pid on incompatible nodes.
hostPID compatibility: Pods with hostPID: true and spec.resources.limits.pid are fully supported. The PID namespace (process visibility) and PID cgroup controller (process creation accounting) are independent kernel subsystems. hostPID allows the container to see host processes, but those processes belong to different cgroups and are not counted against the pod’s PID limit. The cgroup pids.max limit is enforced regardless of PID namespace configuration.
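As a rough illustration of the hierarchical behavior noted above, the following Go sketch parses cgroup v2 pids.max values (either an integer or the literal "max") and reports the tightest limit along a path from the root to the pod cgroup. The function names and the sample hierarchy are hypothetical. The sketch is also a simplification: the kernel checks each ancestor's limit against that ancestor's own process count rather than computing a single minimum.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// unlimited stands for the literal "max" that cgroup v2 stores in pids.max
// when no limit is configured at that level.
const unlimited int64 = -1

// parsePidsMax converts the contents of a pids.max file into a number,
// returning unlimited for the literal "max".
func parsePidsMax(raw string) (int64, error) {
	s := strings.TrimSpace(raw)
	if s == "max" {
		return unlimited, nil
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid pids.max value %q: %w", s, err)
	}
	if n < 0 {
		return 0, fmt.Errorf("negative pids.max value %d", n)
	}
	return n, nil
}

// mostRestrictive returns the smallest finite limit among the given levels
// (ordered from the root down to the pod cgroup), or unlimited if every
// level is "max".
func mostRestrictive(levels []string) (int64, error) {
	effective := unlimited
	for _, raw := range levels {
		n, err := parsePidsMax(raw)
		if err != nil {
			return 0, err
		}
		if n == unlimited {
			continue
		}
		if effective == unlimited || n < effective {
			effective = n
		}
	}
	return effective, nil
}

func main() {
	// Hypothetical hierarchy: root ("max"), a parent cgroup, the pod cgroup.
	limit, err := mostRestrictive([]string{"max\n", "4096\n", "2048\n"})
	if err != nil {
		panic(err)
	}
	fmt.Println(limit) // prints 2048
}
```

Because hostPID only changes process visibility, the same calculation would apply to a pod with hostPID: true.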
Risks and Mitigations
Users expect to raise PID limits above node default
- Risk: The “lower value wins” design means Pods can only restrict, not raise. Users may expect pid: 8192 to override a node podPidsLimit of 4096.
- Mitigation: Clear documentation. The node admin retains control: raise podPidsLimit if workloads need more.
Interaction with existing PID exhaustion protections
- Risk: A Pod could set a very low PID limit, making itself non-functional.
- Mitigation: API validation enforces a minimum of 1024 and a maximum of 16384, so extremely low or high values are rejected. Even at the minimum (1024), the pod has sufficient PIDs to function.
Version skew: apiserver accepts but kubelet ignores
- Risk: If only apiserver enables the gate, users think their limit is enforced but kubelet applies the node default.
- Mitigation: Documented in Version Skew Strategy. Both components must enable the gate for enforcement.
Design Details
Feature Gate
The feature is controlled by a new feature gate: PerPodPIDLimit.
This feature depends on the PodLevelResources feature gate (Beta, enabled by default
since v1.34) because pid is specified under pod.spec.resources.limits. If
PodLevelResources is disabled, PerPodPIDLimit cannot be enabled — the kubelet
and API server will reject the configuration at startup.
When the feature gate is disabled:
- The kube-apiserver rejects Pods that specify pid under spec.resources.limits
- Existing Pods that already contain pid are preserved and remain unchanged
- The kubelet ignores the pid limit and continues using the node-level podPidsLimit setting instead
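The gate rules in this section can be sketched in Go as two small checks: one for the dependency on PodLevelResources at startup, and one for whether the pid field is accepted on an object. The gate names come from the text; the function names, the map-based gate representation, and the error wording are assumptions made for illustration and do not reflect the actual Kubernetes feature-gate machinery.

```go
package main

import (
	"errors"
	"fmt"
)

// Gate names are taken from the text above.
const (
	gatePerPodPIDLimit    = "PerPodPIDLimit"
	gatePodLevelResources = "PodLevelResources"
)

// validateGates (hypothetical) rejects a configuration that enables
// PerPodPIDLimit while PodLevelResources is disabled.
func validateGates(gates map[string]bool) error {
	if gates[gatePerPodPIDLimit] && !gates[gatePodLevelResources] {
		return errors.New("PerPodPIDLimit requires PodLevelResources to be enabled")
	}
	return nil
}

// pidFieldAllowed (hypothetical) reports whether a Pod may carry the pid
// limit: always when the gate is on, and otherwise only when the stored
// object already contained it, so existing Pods are preserved.
func pidFieldAllowed(gateEnabled, existingObjectHasPID bool) bool {
	return gateEnabled || existingObjectHasPID
}

func main() {
	err := validateGates(map[string]bool{
		gatePerPodPIDLimit:    true,
		gatePodLevelResources: false,
	})
	fmt.Println("startup check:", err)
	fmt.Println("new Pod, gate off:", pidFieldAllowed(false, false))
	fmt.Println("existing Pod, gate off:", pidFieldAllowed(false, true))
}
```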
API Validation
Add pid to the list of valid resource types
Validate that pid values are integers within the allowed range (1024-16384)
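A minimal Go sketch of these validation rules follows. It assumes a simplified podResources type with plain integer values; the real API represents resources as quantities, and the type, function name, and error messages here are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Bounds taken from the Valid Values section.
const (
	minPodPIDLimit int64 = 1024
	maxPodPIDLimit int64 = 16384
)

// podResources is a simplified, hypothetical stand-in for the pod-level
// resources stanza; values are plain integers rather than quantities.
type podResources struct {
	Limits   map[string]int64
	Requests map[string]int64
}

// validatePodPID rejects requests.pid outright and checks that limits.pid,
// when present, falls within the allowed range.
func validatePodPID(r podResources) error {
	if _, ok := r.Requests["pid"]; ok {
		return errors.New("spec.resources.requests.pid is forbidden")
	}
	limit, ok := r.Limits["pid"]
	if !ok {
		return nil // no pod-level limit: the node-level podPidsLimit applies
	}
	if limit < minPodPIDLimit || limit > maxPodPIDLimit {
		return fmt.Errorf("spec.resources.limits.pid must be between %d and %d, got %d",
			minPodPIDLimit, maxPodPIDLimit, limit)
	}
	return nil
}

func main() {
	for _, v := range []int64{512, 1024, 2048, 16384, 20000} {
		err := validatePodPID(podResources{Limits: map[string]int64{"pid": v}})
		fmt.Printf("pid=%d valid=%t\n", v, err == nil)
	}
}
```

In this sketch only 512 and 20000 are reported as invalid, since both bounds are inclusive.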
Kubelet Implementation
Parse Pod specification: The kubelet reads spec.resources.limits.pid from the Pod spec during Pod admission.
Limit enforcement: When creating the Pod cgroup, the kubelet sets effective PID limits based on:
- If Pod spec.resources.limits.pid is set: min(podPidsLimit, pod.spec.resources.limits.pid)
- If Pod spec.resources.limits.pid is not set: podPidsLimit (current behavior)
Validation: The kubelet validates that the PID limit is within acceptable bounds (1024 to 16384).
cgroupsv2 requirement: The kubelet checks the cgroup version and only applies Pod-level limits on cgroupsv2 systems. On cgroupsv1 systems, the kubelet rejects pods that specify spec.resources.limits.pid during admission.
Event on capping: When a pod’s requested PID limit exceeds the node’s podPidsLimit, the kubelet emits a PIDLimitCapped event on the pod, indicating the effective limit was capped at the node-defined maximum. This gives users visibility when their requested PID limit is not fully honored.
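The enforcement steps above could look roughly like the following Go sketch. It is not the kubelet's getPodPIDLimit; the function name, the pidDecision type, and the error strings are hypothetical. It also assumes the node-level podPidsLimit is a positive number, because the text does not say how an unset node limit should combine with a pod limit.

```go
package main

import (
	"errors"
	"fmt"
)

// pidDecision (hypothetical) carries the value to write to the pod cgroup
// and whether a PIDLimitCapped event should be emitted.
type pidDecision struct {
	Effective int64
	Capped    bool
}

// resolvePodPIDLimit applies the rules from the text: fall back to the node
// limit when the pod sets none, reject pod limits on cgroupsv1, and
// otherwise enforce min(nodeLimit, podLimit).
func resolvePodPIDLimit(nodeLimit int64, podLimit *int64, cgroupV2 bool) (pidDecision, error) {
	if nodeLimit <= 0 {
		// Assumption: an unset or unlimited node limit is out of scope here.
		return pidDecision{}, errors.New("sketch assumes a positive node-level podPidsLimit")
	}
	if podLimit == nil {
		return pidDecision{Effective: nodeLimit}, nil
	}
	if !cgroupV2 {
		return pidDecision{}, errors.New("spec.resources.limits.pid requires cgroupsv2")
	}
	if *podLimit > nodeLimit {
		return pidDecision{Effective: nodeLimit, Capped: true}, nil
	}
	return pidDecision{Effective: *podLimit}, nil
}

func main() {
	low, high := int64(2048), int64(8192)
	for _, podLimit := range []*int64{nil, &low, &high} {
		d, err := resolvePodPIDLimit(4096, podLimit, true)
		if err != nil {
			panic(err)
		}
		fmt.Printf("effective=%d capped=%t\n", d.Effective, d.Capped)
	}
}
```

With a node limit of 4096, the three cases resolve to 4096 (no pod limit), 2048 (pod limit is lower), and 4096 with the capped flag set (pod limit is higher).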
Pod Security Admission (PSA)
This feature has no PSA interaction. Pod Security Admission does not inspect
resource limits (spec.resources.limits), so pods specifying
spec.resources.limits.pid are admitted under all PSA profiles (Privileged,
Baseline, and Restricted) without triggering any policy violations. No special
configuration, exemptions, or PSA policy changes are required.
Eviction Manager Interaction
The kubelet eviction manager monitors node-level PID availability and can set
the PIDPressure node condition when available PIDs fall below configured
eviction thresholds. When PID pressure is detected, the eviction manager may
evict pods to recover node stability.
Per-pod PID limits do not modify the eviction manager’s behavior, eviction
thresholds, or pod ranking logic. PID pressure detection continues to be based
on node-level PID availability (pid.available) and operates independently of
pod-level PID limits.
Per-pod PID limits act as a preventative control by limiting the maximum number
of processes that an individual pod may create. This can reduce the likelihood
of node-wide PID exhaustion and therefore reduce the frequency of
PIDPressure conditions.
Node Declared Features Integration
This feature integrates with the Node Declared Features framework (KEP-5328, GA since v1.36) to handle version skew and mixed cgroupsv1/v2 clusters gracefully.
When the PerPodPIDLimit feature gate is enabled and the node supports
cgroupsv2, the kubelet declares PerPodPIDLimit in
node.status.declaredFeatures during bootstrap. This enables:
- Scheduler filtering: The scheduler infers that a pod with spec.resources.limits.pid requires PerPodPIDLimit and only places it on nodes that declare the feature. This prevents pods from being scheduled to cgroupsv1 nodes or nodes without the feature gate, avoiding kubelet admission rejections that would put the pod into a non-retriable Failed state.
- Kubelet admission validation: Before admitting a pod, the kubelet validates the pod’s feature requirements against its declared features as a secondary check (handles node restart with a feature gate flip).
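The scheduler-side inference and filtering described above can be sketched in Go as follows. The pod and node types, the function names, and the node names are hypothetical simplifications; the actual Node Declared Features framework defines its own types and plugin wiring.

```go
package main

import "fmt"

// Feature name taken from the text above.
const featurePerPodPIDLimit = "PerPodPIDLimit"

// pod and node are simplified, hypothetical stand-ins for the API objects.
type pod struct {
	Name     string
	PIDLimit *int64 // nil when spec.resources.limits.pid is not set
}

type node struct {
	Name             string
	DeclaredFeatures []string // mirrors node.status.declaredFeatures
}

// requiredFeatures infers which declared features a pod depends on.
func requiredFeatures(p pod) []string {
	if p.PIDLimit != nil {
		return []string{featurePerPodPIDLimit}
	}
	return nil
}

// declaresAll reports whether the node declares every required feature.
func declaresAll(n node, required []string) bool {
	declared := make(map[string]bool, len(n.DeclaredFeatures))
	for _, f := range n.DeclaredFeatures {
		declared[f] = true
	}
	for _, f := range required {
		if !declared[f] {
			return false
		}
	}
	return true
}

// feasibleNodes returns the names of nodes that can host the pod.
func feasibleNodes(p pod, nodes []node) []string {
	required := requiredFeatures(p)
	var names []string
	for _, n := range nodes {
		if declaresAll(n, required) {
			names = append(names, n.Name)
		}
	}
	return names
}

func main() {
	limit := int64(2048)
	nodes := []node{
		{Name: "node-cgv2", DeclaredFeatures: []string{featurePerPodPIDLimit}},
		{Name: "node-cgv1"},
	}
	fmt.Println(feasibleNodes(pod{Name: "agent", PIDLimit: &limit}, nodes))
	fmt.Println(feasibleNodes(pod{Name: "plain"}, nodes))
}
```

A pod without a pid limit requires nothing and stays schedulable on both nodes, while the pod with a limit is restricted to the node that declares the feature. The kubelet's secondary admission check could reuse the same declaresAll comparison against its own declared features.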
Test Plan
[x] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.
Prerequisite testing updates
No prerequisite testing updates are required. The existing PID limiting tests
in test/e2e_node/pids_test.go (PodPidsLimit suite) cover the node-level
podPidsLimit behavior which this feature builds upon.
Unit tests
This feature touches API validation, field stripping, and kubelet cgroup enforcement. For Alpha, unit test coverage for the following packages is added:
- pkg/apis/core/validation will be updated with validation rules for the pid resource (range 1024-16384, container-level rejection, requests.pid forbidden).
- pkg/api/pod will be updated with field stripping logic when the PerPodPIDLimit gate is disabled (requests.pid is always stripped).
- pkg/kubelet/cm will be updated with getPodPIDLimit logic (pod limit lower/higher than node, no pod limit fallback), cgroupsv1 rejection error message, PIDLimitCapped event emission when the effective limit is capped, and static pod PID limit enforcement (kubelet applies getPodPIDLimit uniformly for both regular and static pods).
- pkg/apis/core/helper will be updated to exclude ResourcePID from standardContainerResources.
- pkg/kubelet/eviction will be updated to verify that PID pressure detection, eviction ranking, and pod selection are unaffected by spec.resources.limits.pid when the PerPodPIDLimit gate is enabled.
- k8s.io/component-helpers/nodedeclaredfeatures will be updated to verify that the kubelet declares PerPodPIDLimit in node.status.declaredFeatures when the feature gate is enabled and cgroupsv2 is available, and does not declare it when the gate is disabled or on cgroupsv1 nodes.
k8s.io/kubernetes/pkg/apis/core/validation: API validation of pid resource
k8s.io/kubernetes/pkg/api/pod: Field stripping when gate is disabled
k8s.io/kubernetes/pkg/kubelet/cm: getPodPIDLimit enforcement logic
k8s.io/kubernetes/pkg/apis/core/helper: ResourcePID helper functions
k8s.io/kubernetes/pkg/kubelet/eviction: Eviction manager unaffected by per-pod PID limits
k8s.io/component-helpers/nodedeclaredfeatures: Node Declared Features registration and inference
Integration tests
e2e tests provide good test coverage of the interaction between the API server (validation of pid resource) and the kubelet (cgroup enforcement). We may add integration tests before Beta to cover API admission edge cases that are not covered by the planned unit and e2e tests.
e2e tests
- Pod PID limit lower than node default is applied.
- Pod PID limit higher than node default is capped at node default and a PIDLimitCapped event is emitted.
- Multi-container pod shares a single pod-level PID limit.
- No pod PID limit falls back to node default.
- Pod with pid limit is admitted under Baseline PSA profile and limit is enforced.
- Pod with pid limit is admitted under Restricted PSA profile and limit is enforced.
- Pod specifying spec.resources.limits.pid on a cgroupsv1 node is rejected during admission.
- Static pod with spec.resources.limits.pid has PID limit enforced at the pod cgroup level (bypasses apiserver).
- Static pod without spec.resources.limits.pid falls back to node-level podPidsLimit.
- Node enters PIDPressure when available PIDs fall below configured eviction thresholds, regardless of pod-level PID limits.
- Eviction manager continues to evict pods under PIDPressure; pod-level PID limits do not alter eviction behavior or pod selection.
- Pod-level PID limits reduce aggregate PID consumption and can prevent a workload from triggering PIDPressure compared to the node-level default limit.
- PID pressure calculation is based on node-level PID availability and is unaffected by the configured value of spec.resources.limits.pid.
- Node with PerPodPIDLimit enabled and cgroupsv2 declares PerPodPIDLimit in node.status.declaredFeatures.
- Pod with spec.resources.limits.pid is not scheduled to a node that does not declare PerPodPIDLimit.
- Pod with hostPID: true and spec.resources.limits.pid has the pod cgroup pids.max set correctly; spawning processes up to the limit succeeds and fork fails with EAGAIN at the limit; host processes are visible via ps but are not counted against the pod’s cgroup PID limit.
- PerPodPIDLimit: SIG Node, triage search
- PerPodPIDLimit PSA Compatibility: validates admission and enforcement under Baseline and Restricted PSA profiles
Graduation Criteria
Alpha (target 1.37)
- Feature implemented behind PerPodPIDLimit feature gate (disabled by default)
- Kubelet enforces min(podPidsLimit, pod.spec.resources.limits.pid) on Pod cgroup
- Unit tests and initial node e2e tests completed
- kubelet_pod_pid_limit_applied_total metric exposed
Beta (target 1.38)
- Feature enabled by default
- Node e2e tests stable in Testgrid for at least one release
- Downgrade and upgrade testing completed
- Address feedback and bugs reported during Alpha
GA (target 1.40)
- At least 2 releases in Beta without major bugs
- Remove feature gate
- Conformance tests added
- No negative user feedback from production usage
Upgrade / Downgrade Strategy
Upgrade: No changes required for existing workloads. Enable the PerPodPIDLimit
feature gate to start using the feature.
Downgrade: Existing Pods with spec.resources.limits.pid continue running with
node-level podPidsLimit. The field is preserved on existing objects but
rejected on new Pods.
Version Skew Strategy
This feature involves coordination between the kube-apiserver (API validation), the kubelet (cgroup enforcement), and the scheduler (node filtering via Node Declared Features). The following version skew scenarios are possible during rolling upgrades:
Apiserver ON, kubelet OFF: The apiserver accepts spec.resources.limits.pid.
The kubelet does not declare PerPodPIDLimit in node.status.declaredFeatures,
so the scheduler avoids placing the pod on this node. The node-level
podPidsLimit applies as a safe fallback.
Apiserver OFF, kubelet ON: Two scenarios:
Regular Pods: The apiserver rejects new Pods that specify
spec.resources.limits.pid, so the field never reaches the kubelet. The kubelet
applies the node-level podPidsLimit.
Static Pods: Static pods are defined directly on the node and bypass
the apiserver. Since the kubelet has the gate enabled, if
spec.resources.limits.pid is present, the kubelet will enforce the
configured validation and apply the pod-level PID limit logic accordingly.
Both ON: Full enforcement. The kubelet declares PerPodPIDLimit in
node.status.declaredFeatures, the scheduler places pods on compatible nodes,
the apiserver validates, and the kubelet enforces min(podPidsLimit, spec.resources.limits.pid).
Both OFF: Feature disabled, existing behavior.
Enable on apiserver and kubelet simultaneously. If rolling, enable apiserver
first. The node-level podPidsLimit provides a safe fallback until kubelets
are upgraded.
In clusters with mixed node versions or mixed cgroupsv1/v2 nodes, the Node
Declared Features framework (KEP-5328) mitigates version skew automatically.
Only nodes that declare PerPodPIDLimit receive pods with
spec.resources.limits.pid, preventing kubelet admission rejections that would
put the pod into a non-retriable Failed state.
Production Readiness Review Questionnaire
Feature Enablement and Rollback
How can this feature be enabled / disabled in a live cluster?
- Feature gate (also fill in values in kep.yaml)
  - Feature gate name: PerPodPIDLimit
  - Components depending on the feature gate: kube-apiserver, kubelet
Does enabling the feature change any default behavior?
No.
Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
Yes; disable gate and restart components. Existing Pods retain their cgroup PID settings until restarted.
What happens if we reenable the feature if it was previously rolled back?
Pods can again specify spec.resources.limits.pid; no state loss.
Are there any tests for feature enablement/disablement?
Unit tests cover gate toggle: validation accepts/rejects pid based on gate state, and field stripping preserves pid on existing objects when gate is off.
Rollout, Upgrade and Rollback Planning
How can a rollout or rollback fail? Can it impact already running workloads?
A rollout or rollback (enabling/disabling the feature gate and restarting components) does not impact already running workloads.
What specific metrics should inform a rollback?
Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
No. This feature adds a new optional field (spec.resources.limits.pid)
and does not deprecate or remove any existing fields, flags, or APIs.
Monitoring Requirements
How can an operator determine if the feature is in use by workloads?
An operator can check the kubelet_pod_pid_limit_applied_total metric,
which increments each time the kubelet applies a per-pod PID limit from
spec.resources.limits.pid. A non-zero and increasing value indicates
workloads are using the feature. As a fallback, operators can query the
API for pods with the field set:
kubectl get pods --all-namespaces -o json | jq '.items[] | select(.spec.resources.limits.pid != null) | .metadata.name'
How can someone using this feature know that it is working for their instance?
- Events
  - Event Reason: PIDLimitCapped, emitted when a pod’s requested PID limit exceeds the node’s podPidsLimit, causing the effective limit to be capped at the node-defined maximum.
- API .status
  - Condition name:
  - Other field:
- Other (treat as last resort)
  - Details: The kubelet_pod_pid_limit_applied_total Prometheus metric tracks how many pods have had per-pod PID limits applied.
What are the reasonable SLOs (Service Level Objectives) for the enhancement?
- 100% of pods with a valid spec.resources.limits.pid should have their pod-level cgroup pids.max set to the correct effective value (min(podPidsLimit, spec.resources.limits.pid)) within the normal pod startup time.
- No increase in pod startup latency attributable to PID limit enforcement (the cgroup update is a single small write to pids.max during pod setup).
What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
- Metrics
  - Metric name: kubelet_pod_pid_limit_applied_total
  - [Optional] Aggregation method: counter (monotonically increasing)
  - Components exposing the metric: kubelet
Are there any missing metrics that would be useful to have to improve observability of this feature?
No
Dependencies
Does this feature depend on any specific services running in the cluster?
- cgroupsv2
  - Usage description: PID limits are enforced via the cgroupsv2 PID controller (pids.max).
  - Impact of its outage on the feature: On cgroupsv1 nodes, pods specifying spec.resources.limits.pid are rejected during admission.
  - Impact of its degraded performance or high-error rates on the feature: N/A; cgroup is a kernel interface, not a service.
- PodLevelResources feature gate (Beta, enabled by default since v1.34)
  - Usage description: pid is specified under pod.spec.resources.limits, which requires PodLevelResources.
  - Impact of its outage on the feature: If PodLevelResources is disabled, PerPodPIDLimit cannot be enabled.
  - Impact of its degraded performance or high-error rates on the feature: N/A; feature gate is a binary on/off.
- NodeDeclaredFeatures feature gate (GA since v1.36)
  - Usage description: The kubelet declares PerPodPIDLimit in node.status.declaredFeatures when the feature gate is enabled and cgroupsv2 is available. The scheduler uses this to filter nodes for pods with spec.resources.limits.pid.
  - Impact of its outage on the feature: If NodeDeclaredFeatures is disabled, the scheduler cannot filter nodes based on per-pod PID limit support. The feature still works via graceful degradation (kubelet admission rejection on incompatible nodes).
  - Impact of its degraded performance or high-error rates on the feature: N/A; feature gate is a binary on/off.
Scalability
Will enabling / using this feature result in any new API calls?
No. The kubelet reads spec.resources.limits.pid from the existing Pod spec it
already fetches. No new API calls, watches, or controllers are introduced.
Will enabling / using this feature result in introducing new API types?
No. This feature adds a new field to the existing Pod API type. No new API types are introduced.
Will enabling / using this feature result in any new calls to the cloud provider?
No.
Will enabling / using this feature result in increasing size or count of the existing API objects?
- API type(s): Pod
- Estimated increase in size: ~20 bytes per Pod when spec.resources.limits.pid is set. Pods that do not use this field are unchanged.
- Estimated amount of new objects: None. No new API objects are created.
Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
No. The kubelet performs one additional integer comparison (min) during pod
cgroup setup, which is negligible.
Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, …) in any components?
No. This feature adds one integer field read and a min() comparison per pod
admission. No additional in-memory state, disk I/O, or network traffic.
Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
No. This feature restricts PID usage — it can only lower the effective PID limit
for a pod, never raise it above the node’s podPidsLimit. It reduces the risk
of PID exhaustion rather than increasing it.
Troubleshooting
How does this feature react if the API server and/or etcd is unavailable?
What are other known failure modes?
What steps should be taken if SLOs are not being met to determine the problem?
Implementation History
- 2026-05-05: Initial discussion in sig-node weekly meeting
- 2026-05-06: Enhancement issue created
- 2026-05-06: KEP PR created
- 2026-05-25: Alpha implementation PR created targeting v1.37
Drawbacks
This feature only works on cgroupsv2 nodes. Pods specifying spec.resources.limits.pid will be rejected on cgroupsv1 nodes, which may cause confusion in mixed clusters where some nodes have been upgraded to cgroupsv2 and others have not. Workload authors need to be aware of the underlying cgroup version to avoid unexpected admission failures.
Additionally, the “lower value wins” design means pods can only restrict PID limits below the node default, not raise them. Users who expect pid: 8192 to override a node podPidsLimit of 4096 may find this counterintuitive, though this is consistent with how Kubernetes treats node-level resource limits as the administrator’s ceiling.
Alternatives
ulimit nproc via KEP-5758: A potential alternative would be to introduce support for the nproc option in ulimit as part of KEP-5758. However, because ulimit is not enforced at the cgroup level, we chose not to include this approach in KEP-5758. For additional context and design considerations, see the KEP documentation: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/5758-per-container-ulimits-configuration
Both approaches below use NRI plugin implementations that intercept container creation to adjust the pids.max cgroup limit. Both were tested on a Kind cluster (see testing details).
- NRI plugin with annotations: The developer puts the desired PID limit directly on the pod as an annotation. The plugin reads the annotation, parses it as an integer, and injects a ContainerAdjustment with LinuxPids.Limit set accordingly. This gives per-pod granularity but means any developer can set an arbitrary value, and there’s no admission validation unless you add a webhook separately.
- PriorityClass-based PID limits: An admin defines a config file mapping PriorityClass names to PID limits. The plugin reads the pod’s PriorityClass and applies the corresponding limit. This is more admin-controlled but semantically overloads PriorityClass, which is meant for scheduling preemption decisions, with resource-limit behavior.