KEP-3619: Fine grained SupplementalGroups control

Implementation History
STABLE Implemented
Created 2022-10-14
Latest v1.35
Milestones
Alpha v1.31
Beta v1.33
Stable v1.35
Ownership
Owning SIG
SIG Node
Participating SIGs
Primary Authors

Release Signoff Checklist

Items marked with (R) are required prior to targeting to a milestone / release.

  • (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
  • (R) KEP approvers have approved the KEP status as implementable
  • (R) Design details are appropriately documented
  • (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
    • e2e Tests for all Beta API Operations (endpoints)
    • (R) Ensure GA e2e tests meet requirements for Conformance Tests
    • (R) Minimum Two Week Window for GA e2e tests to prove flake free
  • (R) Graduation criteria is in place
  • (R) Production readiness review completed
  • (R) Production readiness review approved
  • “Implementation History” section is up-to-date for milestone
  • User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
  • Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

Summary

This KEP seeks to provide a way to choose the correct behavior for how container runtimes (containerd and CRI-O) apply SupplementalGroups to the first container process. It describes the work needed in Kubernetes and connected projects to give customers a clear migration path, including detection and safe upgrade, if any of their workflows depend on this arguably erroneous behavior.

The issue

The supplemental groups attached to container processes are defined at two levels in Kubernetes: the OCI image level and the Kubernetes API level.

In the OCI image spec, the config.User OCI image configuration (which mirrors the USER directive in a Dockerfile) is defined as follows:

The username or UID which is a platform-specific structure that allows specific control over which user the process run as. This acts as a default value to use when the value is not specified when creating a container. For Linux based systems, all of the following are valid: user, uid, user:group, uid:gid, uid:group, user:gid. If group/gid is not specified, the default group and supplementary groups of the given user/uid in /etc/passwd from the container are applied.

At the Kubernetes API level, PodSecurityContext.{RunAsUser, RunAsGroup, SupplementalGroups} relate to this. This API was designed to override the config.User configuration of OCI images. However, in the current implementation, as described in kubernetes/kubernetes#112879, even when a manifest defines RunAsGroup, the group memberships defined in the container image for the UID are attached to the container process (see the next section for details). This behavior clearly diverges from the OCI image configuration specification, in particular from the following sentence of the config.User definition:

If group/gid is not specified, the default group and supplementary groups of the given user/uid in /etc/passwd from the container are applied.

As described in kubernetes/kubernetes#112879, this behavior is poorly documented and not widely known among Kubernetes administrators and users. Moreover, it raises security concerns in some cases.

Steps to reproduce

Assume you have an image and a Pod manifest:

# Dockerfile
FROM ubuntu:22.04
# This generates /etc/group entry --> "group-in-image:x:50000:alice"
RUN groupadd -g 50000 group-in-image \
    && useradd -m -u 1000 alice \
    && gpasswd -a alice group-in-image
USER alice

# Pod manifest (excerpt)
spec:
  # This overrides
  # - the USER directive in the Dockerfile above, by setting runAsUser and runAsGroup to "1000:1000", and
  # - sets supplementalGroups.
  # This spec expects gids defined in the image (/etc/group) NOT to be attached to the container process
  # because it specifies the gid explicitly with runAsGroup.
  securityContext: { runAsUser: 1000, runAsGroup: 1000, supplementalGroups: [60000] }
  containers:
    # Expected output: "uid=1000(alice) gid=1000(alice) groups=1000(alice),60000"
    # NOTE: "group-in-image" is not included here
    #       because groups defined in /etc/group should not be attached
    #       when the gid is specified in runAsGroup
  - image: the-image-above
    command: ["id"]

However, with the current combination of Kubernetes and major container runtimes (at least containerd and CRI-O), the output includes the “group-in-image” group for the first container process (see here for more detailed reproduction code):

uid=1000(alice) gid=1000(alice) groups=1000(alice),50000(group-in-image),60000

Motivation

As described above, how supplemental groups are attached to the first container process is complicated and does not comply with the OCI image spec.

Moreover, this raises the following security concerns. When a cluster enforces a security policy for pods that restricts the values of RunAsGroup and SupplementalGroups, the effect of the enforcement is limited: cluster users can easily bypass it just by using a custom image. Such a bypass would be unexpected for most cluster administrators because it makes the enforcement almost useless. Moreover, the bypass grants unexpected file access permissions, which is a security concern in some use cases. For example, it could be a severe problem with hostPath volumes because UIDs/GIDs matter when accessing files and directories in those volumes.

Kubernetes provides no API surface to prevent this bypass, although it can lead to a security concern. Because the behavior is actually implemented in CRI implementations, cluster administrators who want to mitigate it need to deploy a custom low-level container runtime (e.g., pfnet-research/strict-supplementalgroups-container-runtime) that modifies the OCI container runtime spec (config.json) produced by CRI implementations (e.g., containerd, CRI-O), and introduce a custom RuntimeClass for it. This is an extra operational burden for cluster administrators.

Thus, this KEP proposes to offer a new API field named SupplementalGroupsPolicy that enables users to control supplemental groups attached to the first container process by following “principle of least surprise”. The new API allows cluster administrators to deploy security policies that protect the SupplementalGroupsPolicy field in the cluster to avoid the unexpected bypass of SupplementalGroups described above. This KEP also proposes a way for users to detect which groups are actually attached to container processes. This helps users/administrators identify which pods have unexpected group permissions and choose the best SupplementalGroupsPolicy for them.

Goals

  • Provide a new API field to control exactly which groups the container process belongs to
  • Ensure there are clear steps documented for end users to detect if their workload is affected
  • (Optional) provide helper APIs and/or tooling to simplify the detection

Non-Goals

  • To provide a cluster-wide control method.
  • To change the default behavior (a potentially breaking change)

Proposal

This KEP proposes changes at both the Kubernetes API and CRI levels.

Kubernetes API

See also Alternatives section for rejected alternative plans.

SupplementalGroupsPolicy in PodSecurityContext

A new field named SupplementalGroupsPolicy will be introduced to PodSecurityContext. This field defines how supplemental groups of the first container process are calculated.

Allowed values are:

  • Merge (default if not specified): This policy always merges the provided SupplementalGroups (including FsGroup) with the groups of the primary user from the image (/etc/group in the image).
    • Note: The primary user is specified with RunAsUser. If not specified, the user from the image config is used. Otherwise, the runtime default is used.
  • Strict: This policy uses only the provided SupplementalGroups (including FsGroup) as supplemental groups for the first container process. No groups from the image are extracted.

Note that both policies diverge from the semantics of the config.User OCI image configuration. The purpose is to follow the “principle of least surprise” as described in the previous section.
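The difference between the two policies can be sketched in a few lines of Go. This is only an illustration under stated assumptions, not the code of any container runtime: effectiveGroups, the Policy type, and the input slices are hypothetical names, and the primary gid (which `id` also lists under "groups") is left out for brevity.

```go
package main

import (
	"fmt"
	"sort"
)

// Policy mirrors the two values proposed for SupplementalGroupsPolicy.
type Policy string

const (
	Merge  Policy = "Merge"
	Strict Policy = "Strict"
)

// effectiveGroups is a hypothetical helper, not runtime code. It returns the
// sorted, de-duplicated supplemental groups of the first container process.
// apiGroups holds SupplementalGroups plus FsGroup from the Pod spec;
// imageGroups holds the primary user's groups from /etc/group in the image.
func effectiveGroups(policy Policy, apiGroups, imageGroups []int64) []int64 {
	seen := map[int64]bool{}
	var out []int64
	add := func(gids []int64) {
		for _, g := range gids {
			if !seen[g] {
				seen[g] = true
				out = append(out, g)
			}
		}
	}
	add(apiGroups)
	if policy != Strict {
		// Merge (also the default when the field is unset) adds image groups.
		add(imageGroups)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	api := []int64{60000}   // supplementalGroups from the manifest
	image := []int64{50000} // "group-in-image" from the Dockerfile
	fmt.Println(effectiveGroups(Merge, api, image))  // [50000 60000]
	fmt.Println(effectiveGroups(Strict, api, image)) // [60000]
}
```

With the values from "Steps to reproduce", Merge keeps gid 50000 from the image while Strict drops it.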

User in ContainerStatus

To let users and administrators know which identities are actually attached to the container process, this KEP proposes to introduce a new User field in ContainerStatus. User is an object that consists of Uid, Gid, and SupplementalGroups fields for Linux containers. This will help users identify unexpected identities. The field is derived from the CRI response (see the “user in ContainerStatus” section).
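As a rough sketch of how the reported identities could be used for detection, the following Go snippet compares the groups reported for a container with the groups declared in the Pod spec. The function name unexpectedGroups and the sample values are assumptions made for illustration; they are not part of the proposed API.

```go
package main

import "fmt"

// unexpectedGroups returns the gids that were reported for the container
// process but were not declared in the Pod spec (hypothetical helper).
func unexpectedGroups(reported, declared []int64) []int64 {
	allowed := make(map[int64]bool, len(declared))
	for _, g := range declared {
		allowed[g] = true
	}
	var extra []int64
	for _, g := range reported {
		if !allowed[g] {
			extra = append(extra, g)
		}
	}
	return extra
}

func main() {
	// Reported groups as they might appear in ContainerStatus.User.
	reported := []int64{1000, 50000, 60000}
	// runAsGroup plus supplementalGroups from the manifest.
	declared := []int64{1000, 60000}
	fmt.Println(unexpectedGroups(reported, declared)) // [50000]
}
```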

NodeFeatures in NodeStatus which contains SupplementalGroupsPolicy field

Because the actual calculation of the supplementary groups attached to the first container process happens inside CRI implementations (container runtimes), this KEP proposes to add a Features field of type NodeFeatures to NodeStatus, containing a SupplementalGroupsPolicy field as shown below, so that Kubernetes can correctly determine whether the underlying CRI implementation supports the feature. The field is populated from the CRI response.

type NodeStatus struct {
	// Features describes the set of features implemented by the CRI implementation.
	Features *NodeFeatures
}
type NodeFeatures struct {
	// SupplementalGroupsPolicy is set to true if the runtime supports SupplementalGroupsPolicy and ContainerUser.
	SupplementalGroupsPolicy *bool
}
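Because both Features and SupplementalGroupsPolicy are pointers, a consumer has to treat "unset" as "not supported". A minimal sketch of such a check, using local copies of the two types above and a hypothetical helper name, could look like this:

```go
package main

import "fmt"

// Local copies of the types sketched above, for a self-contained example.
type NodeFeatures struct {
	SupplementalGroupsPolicy *bool
}

type NodeStatus struct {
	Features *NodeFeatures
}

// supportsSupplementalGroupsPolicy is a hypothetical helper. It reports true
// only when the runtime explicitly advertised support; a nil Features or a
// nil SupplementalGroupsPolicy is treated as "not supported".
func supportsSupplementalGroupsPolicy(s NodeStatus) bool {
	return s.Features != nil &&
		s.Features.SupplementalGroupsPolicy != nil &&
		*s.Features.SupplementalGroupsPolicy
}

func main() {
	yes := true
	fmt.Println(supportsSupplementalGroupsPolicy(NodeStatus{})) // false
	fmt.Println(supportsSupplementalGroupsPolicy(NodeStatus{
		Features: &NodeFeatures{SupplementalGroupsPolicy: &yes},
	})) // true
}
```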

Recently, KEP-3857: Recursive Read-only (RRO) mounts introduced RuntimeHandlers[].Features. It is not a good fit for this KEP, because RRO mounts require inspecting the OCI runtime spec’s Features to learn whether the low-level OCI runtime supports RRO. This KEP (SupplementalGroupsPolicy) needs no such inspection because it only affects Process.User.additionalGids and does not depend on the OCI runtime spec’s Features. Introducing a new NodeFeatures in NodeStatus therefore does not conflict with RuntimeHandlerFeatures, as their uses can be clearly separated:

  • NodeFeatures(added in this KEP):
    • focuses on features that depend only on the CRI implementation and are independent of runtime handlers (low-level container runtimes), i.e., they should not require inspecting any information from the OCI runtime spec’s features.
  • RuntimeHandlerFeatures (introduced in KEP-3857):
    • focuses on features that depend on the runtime handlers, i.e., on the information exposed by the OCI runtime spec’s features.

See this section for details.

CRI

SupplementalGroupsPolicy in SecurityContext

Symmetrical changes are needed. See Design Details section.

user in ContainerStatus

To propagate the identities of the container process to ContainerStatus in the Kubernetes API, CRI changes are needed. This KEP proposes to define a ContainerUser data type and add a user field to ContainerStatus, which is used in the response of the ContainerStatus method. ContainerUser consists of Uid, Gid, and SupplementalGroups fields.

// service RuntimeService {
//   rpc ContainerStatus(ContainerStatusRequest) returns (ContainerStatusResponse) {}
//  ...
// }
// message ContainerStatusResponse {
//  ContainerStatus status = 1;
//  ...
// }

message ContainerStatus {
  ...
  // user information of the container process
  ContainerUser user = ?;
}

message ContainerUser {
  // details in "Design Details" section
}

features in StatusResponse which contains supplemental_groups_policy field

To propagate whether the runtime supports fine-grained supplemental group control to NodeFeatures.SupplementalGroupsPolicy, this KEP proposes to add a corresponding features field to StatusResponse.

// service RuntimeService {
// ...
//     rpc Status(StatusRequest) returns (StatusResponse) {}
// }
message StatusResponse {
...
    // features describes the set of features implemented by the CRI implementation.
    // This field is supposed to propagate to NodeFeatures in Kubernetes API.
    RuntimeFeatures features = ?;
}
message RuntimeFeatures {
    // supplemental_groups_policy is set to true if the runtime supports SupplementalGroupsPolicy and ContainerUser.
    bool supplemental_groups_policy = 1;
}

As discussed in the Kubernetes API section, RuntimeHandlerFeatures introduced in KEP-3857 should cover only features that require inspecting the OCI runtime spec’s Features, while the RuntimeFeatures proposed in this KEP should cover features that do NOT require inspecting it.

User Stories (Optional)

Story 1: Deploy a Security Policy to enforce SupplementalGroupsPolicy field

Assume a multi-tenant Kubernetes cluster with hostPath volumes in the following situation:

  • Multi-tenant model is namespace-based (namespace per tenant(user/group) model)
    • access to each namespace is controlled by RBAC
  • PSP(or other policy engines) is enforced in each namespace which protects
    • runAsUser, runAsGroup, fsGroup, supplementalGroups values
  • A hostPath volume (say /mnt/hostpath) is maintained in all the nodes by administrators
    • with permission drwxr-xr-x nobody nogroup /mnt/hostpath
    • the directory mounts an NFS volume shared by all the tenants, and UIDs/GIDs are managed by the cluster administrators
    • Any tenant CAN create a directory under this directory
  • There is a /mnt/hostpath/private-to-gid-60000 which is fully private to gid=60000
    • i.e. its permission is drwxrwx--- nobody 60000 /mnt/hostpath/private-to-gid-60000
  • There is a user-alice namespace for alice (uid=1000), and alice belongs only to group-a (gid=50000)
  • cluster administrator enforces a policy for Pods with /mnt/hostpath hostPath volumes in user-alice namespace such that
    • runAsUser, runAsGroup must be 1000
    • supplementalGroups must be [60000]
    • fsGroup must be one of 1000, 60000
    • i.e. cluster administrator expects that all the container processes can only have 60000 as supplementary groups in user-alice namespace

As described in the Motivation section, alice can bypass the restriction by using a custom image. To mitigate the scenario, cluster administrators can deploy a security policy restricting supplementalGroupsPolicy in the user-alice namespace such that:

  • runAsUser, runAsGroup must be 1000
  • supplementalGroups must be [60000]
    • this alone is not enough to prevent container processes from getting additional supplementary groups
  • supplementalGroupsPolicy must be Strict
    • this is required to prevent the bypass completely
  • fsGroup must be one of 1000, 60000

Please note that a security policy without supplementalGroupsPolicy would lead to unexpected groups for the first process in the containers.
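The policy above can be expressed as a small validation routine. The sketch below is a hypothetical illustration in Go, not the rule format of any particular policy engine; the PodSecurityContext struct is a trimmed local mirror of the Pod-level fields, and treating an unset fsGroup as acceptable is an assumption made here.

```go
package main

import "fmt"

// PodSecurityContext is a trimmed, hypothetical mirror of the Pod-level
// fields that the policy in Story 1 restricts.
type PodSecurityContext struct {
	RunAsUser                *int64
	RunAsGroup               *int64
	FSGroup                  *int64
	SupplementalGroups       []int64
	SupplementalGroupsPolicy *string
}

// violations lists every rule of the user-alice policy that sc breaks.
func violations(sc PodSecurityContext) []string {
	var v []string
	if sc.RunAsUser == nil || *sc.RunAsUser != 1000 {
		v = append(v, "runAsUser must be 1000")
	}
	if sc.RunAsGroup == nil || *sc.RunAsGroup != 1000 {
		v = append(v, "runAsGroup must be 1000")
	}
	if len(sc.SupplementalGroups) != 1 || sc.SupplementalGroups[0] != 60000 {
		v = append(v, "supplementalGroups must be [60000]")
	}
	// Without this rule, image-defined groups could still be merged in.
	if sc.SupplementalGroupsPolicy == nil || *sc.SupplementalGroupsPolicy != "Strict" {
		v = append(v, "supplementalGroupsPolicy must be Strict")
	}
	// Assumption: an unset fsGroup is accepted; only wrong values are rejected.
	if sc.FSGroup != nil && *sc.FSGroup != 1000 && *sc.FSGroup != 60000 {
		v = append(v, "fsGroup must be one of 1000, 60000")
	}
	return v
}

func main() {
	uid, gid, strict := int64(1000), int64(1000), "Strict"
	ok := PodSecurityContext{
		RunAsUser:                &uid,
		RunAsGroup:               &gid,
		SupplementalGroups:       []int64{60000},
		SupplementalGroupsPolicy: &strict,
	}
	fmt.Println(len(violations(ok))) // 0

	bad := ok
	bad.SupplementalGroupsPolicy = nil // policy omitted, so Merge would apply
	fmt.Println(violations(bad))
}
```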

Notes/Constraints/Caveats (Optional)

The proposal affects CRI implementations (e.g., containerd, CRI-O, gVisor).

Risks and Mitigations

  • How to track the support status in CRI implementations of this proposal?
    • This feature is mainly implemented inside each CRI implementation.
  • How to feature-gate this feature in CRI implementations?

Design Details

Kubernetes API

SupplementalGroupsPolicy in PodSecurityContext

A new field named SupplementalGroupsPolicy will be introduced to PodSecurityContext:

type PodSecurityContext struct {
	...
	// A list of groups applied to the first process run in each container. 
	// supplementalGroupsPolicy can control how groups will be calculated.
	// Note that this field cannot be set when spec.os.name is windows.
	// +optional
	SupplementalGroups []int64
	// supplementalGroupsPolicy defines how supplemental groups of the first 
	// container processes are calculated.
	// Valid values are "Merge" and "Strict". 
	// If not specified, "Merge" is used.
	// Note that this field cannot be set when spec.os.name is windows.
	// +optional
	SupplementalGroupsPolicy *PodSecurityGroupsPolicy 
}

type PodSecurityGroupsPolicy string
const (
	// SecurityGroupsPolicyMerge policy always merges 
	// the provided SupplementalGroups (including FsGroup) 
	// with groups of the primary user from the container image(`/etc/group`).
	// Note: The primary user is specified with RunAsUser. 
	//       If not specified, the user from the image config is used. 
	//       Otherwise, the runtime default is used.
	SecurityGroupsPolicyMerge PodSecurityGroupsPolicy = "Merge"

	// SecurityGroupsPolicyStrict policy uses only 
	// the provided SupplementalGroups(including FsGroup) 
	// as supplemental groups for the first container process. 
	// No groups extracted from the container image.
	SecurityGroupsPolicyStrict PodSecurityGroupsPolicy = "Strict"
)

User in ContainerStatus

type ContainerStatus struct {
...
	// User indicates identities of the container process
	User ContainerUser
}
type ContainerUser struct {
	// Linux holds identity information of the process of the containers in Linux.
	// Note that this field cannot be set when spec.os.name is windows.
	Linux *LinuxContainerUser

	// Windows holds identity information of the process of the containers in Windows
	// This is just reserved for future use.
	// Windows *WindowsContainerUser
}

type LinuxContainerUser struct {
	// Uid is the primary uid of the container process
	Uid int64
	// Gid is the primary gid of the container process
	Gid int64
	// SupplementalGroups are the supplemental groups attached to the container process
	SupplementalGroups []int64
}

// This is just reserved for future use.
// type WindowsContainerUser struct {
// 	T.B.D.
// }

NodeFeatures in NodeStatus which contains SupplementalGroupsPolicy field

type NodeStatus struct {
	// Features describes the set of features implemented by the CRI implementation.
	// +featureGate=SupplementalGroupsPolicy
	// +optional
	Features *NodeFeatures

	// The available runtime handlers.
	// +featureGate=RecursiveReadOnlyMounts
	// +optional
	RuntimeHandlers []NodeRuntimeHandler
}

// NodeFeatures describes the set of features implemented by the CRI implementation.
// THE FEATURES CONTAINED IN NodeFeatures SHOULD DEPEND ONLY ON THE CRI IMPLEMENTATION AND BE INDEPENDENT OF RUNTIME HANDLERS
// (I.E., THEY SHOULD NOT REQUIRE INSPECTING ANY INFORMATION FROM THE OCI RUNTIME-SPEC'S FEATURES).
type NodeFeatures struct {
	// SupplementalGroupsPolicy is set to true if the runtime supports SupplementalGroupsPolicy and ContainerUser.
	// +optional
	SupplementalGroupsPolicy *bool
}

// NodeRuntimeHandler is a set of runtime handler information.
type NodeRuntimeHandler struct {
	// Runtime handler name.
	// Empty for the default runtime handler.
	// +optional
	Name string
	// Supported features in the runtime handlers.
	// +optional
	Features *NodeRuntimeHandlerFeatures
}

// NodeRuntimeHandlerFeatures is a set of features implemented by the runtime handler.
// THE FEATURES CONTAINED IN NodeRuntimeHandlerFeatures SHOULD DEPEND ON THE RUNTIME HANDLERS
// (I.E., ON THE INFORMATION EXPOSED BY THE OCI RUNTIME-SPEC'S FEATURES).
type NodeRuntimeHandlerFeatures struct {
	// RecursiveReadOnlyMounts is set to true if the runtime handler supports RecursiveReadOnlyMounts.
	// +featureGate=RecursiveReadOnlyMounts
	// +optional
	RecursiveReadOnlyMounts *bool
	// Reserved: UserNamespaces *bool
}

CRI

SupplementalGroupsPolicy in SecurityContext

The CRI spec (v1) also needs to be updated similarly, as follows. Comments are omitted because they mirror those of the Pod API.

enum SupplementalGroupsPolicy {
    Merge = 0;
    Strict = 1;
}

message LinuxContainerSecurityContext {
...
    repeated int64 supplemental_groups;
    optional SupplementalGroupsPolicy supplemental_groups_policy;
}

message LinuxSandboxSecurityContext {
...
    repeated int64 supplemental_groups;
    optional SupplementalGroupsPolicy supplemental_groups_policy;
}

user in ContainerStatus


message ContainerStatus {
    ...
    // User holds user information of the container process
    ContainerUser user = ??;
}

message ContainerUser {
    // User information of Linux containers.
    LinuxContainerUser linux = 1;
    // User information of Windows containers.
    // This is just reserved for future use.
    // WindowsContainerUser windows = 2;
}


message LinuxContainerUser {
    // uid is the primary uid of the container process
    Int64Value uid = 1;
    // gid is the primary gid of the container process
    Int64Value gid = 2;
    // supplemental_groups are the supplemental groups attached to the container process
    repeated int64 supplemental_groups = 3;
}

// message WindowsContainerUser {
//     T.B.D.
// }

features in StatusResponse which contains supplemental_groups_policy field

// service RuntimeService {
// ...
//     rpc Status(StatusRequest) returns (StatusResponse) {}
// }
message StatusResponse {
...
    // Runtime handlers.
    repeated RuntimeHandler runtime_handlers = 3;

    // features describes the set of features implemented by the CRI implementation.
    // This field is supposed to propagate to NodeFeatures in Kubernetes API.
    RuntimeFeatures features = ?;
}

// RuntimeFeatures describes the set of features implemented by the CRI implementation.
// THE FEATURES CONTAINED IN RuntimeFeatures SHOULD DEPEND ONLY ON THE CRI IMPLEMENTATION AND BE INDEPENDENT OF RUNTIME HANDLERS
// (I.E., THEY SHOULD NOT REQUIRE INSPECTING ANY INFORMATION FROM THE OCI RUNTIME-SPEC'S FEATURES).
message RuntimeFeatures {
    // supplemental_groups_policy is set to true if the runtime supports SupplementalGroupsPolicy and ContainerUser.
    bool supplemental_groups_policy = 1;
}

// message RuntimeHandler {
//     // Name must be unique in StatusResponse.
//     // An empty string denotes the default handler.
//     string name = 1;
//     // Supported features.
//     RuntimeHandlerFeatures features = 2;
// }

// RuntimeHandlerFeatures is a set of features implemented by the runtime handler.
// THE FEATURES CONTAINED IN RuntimeHandlerFeatures SHOULD DEPEND ON THE RUNTIME HANDLERS
// (I.E., ON THE INFORMATION EXPOSED BY THE OCI RUNTIME-SPEC'S FEATURES).
message RuntimeHandlerFeatures {
    bool recursive_read_only_mounts = 1;
    bool user_namespaces = 2;
}

Test Plan

[x] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.

Prerequisite testing updates
Unit tests
  • k8s.io/kubernetes/pkg/api/pod/util.go: 2024-08-13 - 68.7%
    • It tests dropDisabledFields for the PodSecurityContext.SupplementalGroupsPolicy and ContainerStatus.User fields
    • Note: The tests check these field values with the feature both enabled and disabled.
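For readers unfamiliar with the pattern under test, the following Go sketch shows the usual shape of such field dropping: a gated field is cleared on write when the feature is off, unless the existing object already uses it. It is a simplified illustration with hypothetical names and a trimmed local type; the real dropDisabledFields in pkg/api/pod/util.go is more involved.

```go
package main

import "fmt"

// PodSecurityContext is a trimmed local stand-in holding only the gated field.
type PodSecurityContext struct {
	SupplementalGroupsPolicy *string
}

// dropDisabledPolicy is a hypothetical, simplified version of the pattern.
// It clears the gated field on the incoming object when the feature gate is
// off, but preserves it when the stored (old) object already uses the field.
func dropDisabledPolicy(gateEnabled bool, newSC, oldSC *PodSecurityContext) {
	if gateEnabled || newSC == nil {
		return
	}
	if oldSC != nil && oldSC.SupplementalGroupsPolicy != nil {
		return // field already in use: keep it so updates do not lose data
	}
	newSC.SupplementalGroupsPolicy = nil
}

func main() {
	strict := "Strict"

	created := &PodSecurityContext{SupplementalGroupsPolicy: &strict}
	dropDisabledPolicy(false, created, nil)
	fmt.Println(created.SupplementalGroupsPolicy == nil) // true: dropped on create

	updated := &PodSecurityContext{SupplementalGroupsPolicy: &strict}
	existing := &PodSecurityContext{SupplementalGroupsPolicy: &strict}
	dropDisabledPolicy(false, updated, existing)
	fmt.Println(updated.SupplementalGroupsPolicy == nil) // false: preserved on update
}
```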
Integration tests

See e2e tests below.

e2e tests
  • Kubernetes: https://github.com/kubernetes/kubernetes/blob/v1.31.0/test/e2e/node/security_context.go
    • When creating a Pod with SupplementalGroupsPolicy=Strict
      • the containers in the pod will run with only groups specified by the API, and
      • once it starts, ContainerStatus.User contains the correct identities of the containers
    • When creating a Pod with SupplementalGroupsPolicy=Merge
      • the containers in the pod will run with groups specified by API and groups from the container image, and
      • once it starts, ContainerStatus.User contains the correct identities of the containers
    • When creating a Pod without SupplementalGroupsPolicy (behaves the same as Merge)
      • the pod will run with groups specified by the API and groups from the image, and
      • once it starts, ContainerStatus.User contains the correct identities of the containers
    • Note: the above e2e tests self-skip if the node does not support the SupplementalGroupsPolicy feature, as detected by the Node.Status.Features.SupplementalGroupsPolicy field.
  • critools(critest): https://github.com/kubernetes-sigs/cri-tools/blob/v1.31.0/pkg/validate/security_context_linux.go
    • Test cases symmetric to the Kubernetes e2e tests, except for the case without SupplementalGroupsPolicy, because in CRI SupplementalGroupsPolicy always has a value (the default is Merge).
    • Note: the above tests self-skip if the runtime does not support the SupplementalGroupsPolicy feature, as detected by the StatusResponse.features.supplemental_groups_policy field.

Graduation Criteria

Because this KEP’s core implementation (i.e., SupplementalGroupsPolicy handling) lies inside CRI implementations (e.g., containerd, CRI-O), the graduation criteria include the support status of the updated CRI in container runtimes.

Alpha

  • At least one of the most popular container runtimes (e.g., containerd) implements the updated CRI and has released it
  • Feature implemented behind a feature flag based on the Container Runtime
  • Unit tests and initial e2e tests completed and enabled

Beta

  • Several popular container runtimes (e.g., containerd and CRI-O) support the updated CRI and have released it
  • Reported bugs from the community are fixed
  • Additional integration tests and e2e tests are in Testgrid and linked in KEP

GA

  • No negative user feedback based on production experience, promote after 2 releases in beta.

Upgrade / Downgrade Strategy

Version Skew Strategy

Existing pods will still work as intended, as the new field is missing there (i.e. no SupplementalGroupsPolicy fields in existing Pods’ spec).

An upgrade does not change any current behavior. However, if you plan to use the Strict SupplementalGroupsPolicy after the upgrade, the CRI runtime in the cluster must also support this feature (see the “Dependencies” section). If there are nodes whose CRI runtime does NOT support this feature,

  • pods with the Strict policy will be rejected if the feature level of the upgraded version is beta or above,
  • the Strict policy will silently fall back to Merge if the feature level of the upgraded version is alpha.

Please see the matrix below for more details.

A downgrade has no effect if the functionality has not been used yet. However, if the functionality, especially the Strict SupplementalGroupsPolicy, has already been used, caution is needed:

  • running containers continue to run with their effective policy as long as they are not recreated.
  • However, when the containers in such pods are recreated on the node, the behavior depends on the downgraded version, the downgraded feature gate value, and the CRI runtime support status (see the matrix below).

The matrix below summarizes what happens for each combination of upgraded/downgraded target version, target feature gate, and target CRI runtime support status:

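| Target kubelet version | Target feature gate | Target CRI runtime supports the feature? | Pod’s policy | Effective policy | Rejected by kubelet? | .containerStatuses.user reported? |
|---|---|---|---|---|---|---|
| <1.31 (does not know the field) | N/A | Yes/No | Strict | Merge (fallback silently) | NO | NO |
| <1.31 (does not know the field) | N/A | Yes/No | Merge / (not set) | Merge | NO | NO |
| 1.31 or 1.32 (Alpha) | True | YES | Strict | Strict | NO | YES |
| 1.31 or 1.32 (Alpha) | True | YES | Merge / (not set) | Merge | NO | YES |
| 1.31 or 1.32 (Alpha) | True | NO | Strict | Merge (fallback silently) | NO | NO |
| 1.31 or 1.32 (Alpha) | True | NO | Merge / (not set) | Merge | NO | NO |
| 1.31 or 1.32 (Alpha) | False | YES | Strict (set when the feature was on) | Strict | NO | NO |
| 1.31 or 1.32 (Alpha) | False | YES | Merge / (not set) | Merge | NO | NO |
| 1.31 or 1.32 (Alpha) | False | NO | Strict (set when the feature was on) | Merge (fallback silently) | NO | NO |
| 1.31 or 1.32 (Alpha) | False | NO | Merge / (not set) | Merge | NO | NO |
| >=1.33 (Beta or above) | True (default) | YES | Strict | Strict | NO | YES |
| >=1.33 (Beta or above) | True (default) | YES | Merge / (not set) | Merge | NO | YES |
| >=1.33 (Beta or above) | True (default) | NO | Strict | - | REJECTED(*) | NO |
| >=1.33 (Beta or above) | True (default) | NO | Merge / (not set) | Merge | NO | NO |
| >=1.33 (Beta or above) | False | YES | Strict (set when the feature was on) | Strict | NO | NO |
| >=1.33 (Beta or above) | False | YES | Merge / (not set) | Merge | NO | NO |
| >=1.33 (Beta or above) | False | NO | Strict (set when the feature was on) | - | REJECTED(*) | NO |
| >=1.33 (Beta or above) | False | NO | Merge / (not set) | Merge | NO | NO |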

(*): See “What specific metrics should inform a rollback?” for details
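The matrix can be condensed into a few rules, sketched below in Go. This is a reading aid derived from the rows above, not kubelet code; the Stage and Outcome types and the resolve function are hypothetical names.

```go
package main

import "fmt"

// Stage is the feature level of the target kubelet version.
type Stage int

const (
	PreAlpha    Stage = iota // kubelet < 1.31, does not know the field
	Alpha                    // 1.31 or 1.32
	BetaOrAbove              // >= 1.33
)

// Outcome corresponds to the last three columns of the matrix.
type Outcome struct {
	Effective    string // empty when the pod is rejected
	Rejected     bool
	UserReported bool
}

// resolve is a hypothetical summary of the matrix rows.
func resolve(stage Stage, gate, criSupports bool, podPolicy string) Outcome {
	if podPolicy == "" {
		podPolicy = "Merge" // an unset policy behaves like Merge
	}
	if stage == PreAlpha {
		// The field is unknown, so every pod effectively runs with Merge.
		return Outcome{Effective: "Merge"}
	}
	// .containerStatuses.user needs both the gate and runtime support.
	out := Outcome{Effective: podPolicy, UserReported: gate && criSupports}
	if podPolicy == "Strict" && !criSupports {
		if stage == BetaOrAbove {
			return Outcome{Rejected: true} // kubelet rejects the pod
		}
		out.Effective = "Merge" // alpha: silent fallback
	}
	return out
}

func main() {
	fmt.Printf("%+v\n", resolve(Alpha, true, false, "Strict"))
	fmt.Printf("%+v\n", resolve(BetaOrAbove, true, false, "Strict"))
	fmt.Printf("%+v\n", resolve(BetaOrAbove, false, true, "Strict"))
}
```

Note that, per the matrix, rejection on beta or above depends only on runtime support and the pod's policy, not on the feature gate value.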

Production Readiness Review Questionnaire

Feature Enablement and Rollback

How can this feature be enabled / disabled in a live cluster?
  • Feature gate (also fill in values in kep.yaml)
    • Feature gate name: SupplementalGroupsPolicy
    • Components depending on the feature gate: kube-apiserver, kubelet, (and CRI implementations(e.g. containerd, cri-o))
  • Other
    • Describe the mechanism:
    • Will enabling / disabling the feature require downtime of the control plane?
    • Will enabling / disabling the feature require downtime or reprovisioning of a node?
Does enabling the feature change any default behavior?

No. It only introduces new API fields in the Pod spec and CRI, which do NOT change the default behavior.

Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

Yes. It can be disabled after being enabled, up to and including Beta. When it is disabled, you cannot create pods with the SupplementalGroupsPolicy field, and no .status.containerStatuses[*].user is reported in pod status. Note that for pods that were already created with the Strict policy, the policy of their containers remains enforced even after the feature is disabled.

See “Version Skew Strategy” for more complex cases (including upgrading/downgrading).

However, starting with v1.35 this feature graduates to GA: the SupplementalGroupsPolicy feature gate is locked to true and can no longer be disabled.

What happens if we reenable the feature if it was previously rolled back?

The SupplementalGroupsPolicy field in pod spec and .status.containerStatuses[*].user in pod status become available again. As described in the section above, for pods that were created with the Strict policy earlier, the policy of their containers remains enforced after re-enablement.

See “Version Skew Strategy” for more complex cases (including upgrading/downgrading).

Are there any tests for feature enablement/disablement?

Yes, see Unit tests section.

Rollout, Upgrade and Rollback Planning

How can a rollout or rollback fail? Can it impact already running workloads?

As long as you do not use the SupplementalGroupsPolicy field, rollout or rollback is safe, and there is no impact on already running workloads because the feature is backward compatible.

However, if pods with the SupplementalGroupsPolicy field exist at the time of rollout or rollback, caution is needed. Please see the matrix in the “Version Skew Strategy” section for details.

What specific metrics should inform a rollback?

As long as you do not use the SupplementalGroupsPolicy field, rollout or rollback is safe, as described in the section above.

However, if pods with the SupplementalGroupsPolicy field exist at the time of rollout or rollback, pod creation may be rejected when

  • the feature level of the rolled-out or rolled-back version is beta or above, and
  • pods with the Strict policy (set while the feature gate was previously on) are scheduled to nodes whose CRI runtime does NOT support this feature.

In that case, look for an event indicating that SupplementalGroupsPolicy is not supported by the node as the rollback signal.

$ kubectl get events -o json -w
...
{
    ...
    "kind": "Event",
    "reason": "SupplementalGroupsPolicyNotSupported",
    "message": "Error: SupplementalGroupsPolicy is not supported in this node.",
    ...
}
...

You can also follow the kubelet_admission_rejections_total{reason='SupplementalGroupsPolicyNotSupported'} metric to track such events.

The following kubelet metrics are also useful to check:
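As a rough illustration of watching for this signal programmatically, the Go sketch below counts matching events in a stream of JSON objects. It assumes the stream is a plain sequence of individual Event objects carrying the kind and reason fields shown above; the function name countRejections and the sample input are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// event holds only the fields of an Event object that this sketch needs.
type event struct {
	Kind   string `json:"kind"`
	Reason string `json:"reason"`
}

// countRejections counts Event objects whose reason signals that the node
// does not support SupplementalGroupsPolicy. The input is assumed to be a
// sequence of concatenated JSON objects, one per event.
func countRejections(stream string) (int, error) {
	dec := json.NewDecoder(strings.NewReader(stream))
	n := 0
	for {
		var ev event
		err := dec.Decode(&ev)
		if err == io.EOF {
			return n, nil
		}
		if err != nil {
			return n, err
		}
		if ev.Kind == "Event" && ev.Reason == "SupplementalGroupsPolicyNotSupported" {
			n++
		}
	}
}

func main() {
	// Hypothetical sample input with one matching and one unrelated event.
	stream := `{"kind":"Event","reason":"SupplementalGroupsPolicyNotSupported"}
{"kind":"Event","reason":"Scheduled"}`
	n, err := countRejections(stream)
	fmt.Println(n, err) // 1 <nil>
}
```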

  • kubelet_running_pods: Shows the actual number of pods running
  • kubelet_desired_pods: The number of pods the kubelet is trying to run

If these metrics differ, there are desired pods that cannot be set to running. In that case, check the pod events to see whether they are failing for SupplementalGroupsPolicy reasons (like the error shown above); if so, a rollback is recommended.

Although this KEP does NOT include kube-scheduler integration that would make the scheduler place pods requiring the feature (Strict policy) only on nodes that support it, you can use node labels and the pod’s nodeSelector/nodeAffinity to mitigate pod rejections or error events. Please see the “Are there any missing metrics that would be useful to have to improve observability of this feature?” section below for details.

Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

During the beta phase, the following test will be manually performed:

  • Enable the SupplementalGroupsPolicy feature gate for kube-apiserver and kubelet.
  • Create a pod with supplementalGroupsPolicy specified.
  • Disable the SupplementalGroupsPolicy feature gate for kube-apiserver, and confirm that the pod gets rejected.
  • Enable the SupplementalGroupsPolicy feature gate again, and confirm that the pod gets scheduled again.
  • Do the same for kubelet too.

Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

No.

Monitoring Requirements

How can an operator determine if the feature is in use by workloads?

Inspect the supplementalGroupsPolicy fields in Pods. The feature is in use if the following jq command prints a non-zero number:

kubectl get pods -A -o json | jq '[.items[].spec.securityContext? | select(.supplementalGroupsPolicy)] | length'

How can someone using this feature know that it is working for their instance?
  • Events
    • Event Reason:
  • API .status
    • Condition name: containerStatuses.user
    • Other field:
  • Other (treat as last resort)
    • Details:

What are the reasonable SLOs (Service Level Objectives) for the enhancement?

What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
  • Metrics
    • Metric name:
    • [Optional] Aggregation method: kubectl get events -o json -w
    • Components exposing the metric: kubelet -> kube-apiserver
  • Other (treat as last resort)
    • Details:

Are there any missing metrics that would be useful to have to improve observability of this feature?

Potentially, kube-scheduler could implement a rule to avoid scheduling a pod with supplementalGroupsPolicy: Strict to a node not supporting this feature.

However, this is not covered by this KEP, because a more generic mechanism would be preferable in Kubernetes: one that lets the scheduler place pods requiring node feature X on nodes that support node feature X.

As of v1.33, Kubernetes does not offer such a generic mechanism, but cluster admins can maintain node labels and use nodeSelector/nodeAffinity in pods instead.

There are several ways to automate this:

  • By Mutating Webhook:
    • for nodes, a webhook that translates the Node.Status.Features.SupplementalGroupsPolicy field into a node label (say supplementalgroupspolicy-supported: "true" | "false"),
    • for pods, a webhook that adds .spec.nodeSelector: { "supplementalgroupspolicy-supported": "true" } when the pod specifies Strict policy.
  • By Mutating Admission Policy:
    • Although the feature is still alpha as of v1.32, you can write an equivalent policy to do this.

If you manage the node labels and the pods’ nodeSelector/nodeAffinity appropriately, the error events and pod rejections are not expected to happen. Instead, you will need to watch for Pending pods to check whether the cluster has a sufficient number of nodes supporting SupplementalGroupsPolicy.
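The decision logic of such a pair of mutating webhooks could look roughly like the following Python sketch. It covers only the mutation logic, not the webhook server or the AdmissionReview handling, and it is not an implementation provided by this KEP. The function names are hypothetical, and the label key is simply the example label mentioned above.

```python
# Example label key taken from the text above; any key could be used.
LABEL_KEY = "supplementalgroupspolicy-supported"


def node_label_for(node):
    """Derive the label from Node .status.features.supplementalGroupsPolicy."""
    features = (node.get("status") or {}).get("features") or {}
    supported = features.get("supplementalGroupsPolicy") is True
    return {LABEL_KEY: "true" if supported else "false"}


def pod_patch_for(pod):
    """Return JSON Patch operations that pin Strict pods to labeled nodes.

    Pods that do not request the Strict policy are left untouched.
    """
    spec = pod.get("spec") or {}
    security_context = spec.get("securityContext") or {}
    if security_context.get("supplementalGroupsPolicy") != "Strict":
        return []
    selector = spec.get("nodeSelector")
    if selector is None:
        # No nodeSelector yet: create the whole map.
        return [{"op": "add", "path": "/spec/nodeSelector",
                 "value": {LABEL_KEY: "true"}}]
    if selector.get(LABEL_KEY) == "true":
        return []  # already pinned
    # "add" on an existing object member replaces its value (RFC 6902).
    return [{"op": "add", "path": "/spec/nodeSelector/" + LABEL_KEY,
             "value": "true"}]
```

For example, a pod whose spec.securityContext sets supplementalGroupsPolicy: Strict and has no nodeSelector yields a single "add" operation for /spec/nodeSelector, while a pod using the Merge policy yields an empty patch. A real webhook would additionally need to base64-encode the patch into its AdmissionReview response.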

Dependencies

Does this feature depend on any specific services running in the cluster?

Container runtimes supporting CRI API v0.31.0 or above.

For example,

  • containerd: v2.0 or later
  • CRI-O: v1.31 or later

Scalability

A pod with supplementalGroupsPolicy: Strict may be rejected by kubelet with a probability of $$B/A$$, where $$A$$ is the number of nodes that may potentially accept the pod, and $$B$$ is the number of those nodes that do not support this feature. This may affect scalability.

To evaluate this risk, users may run kubectl get nodes -o json | jq '[.items[].status.features]' to see how many nodes report supplementalGroupsPolicy: true before using the Strict policy.
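As a rough illustration, the ratio $$B/A$$ can be estimated from the output of kubectl get nodes -o json with a short Python sketch. The function name is hypothetical, and the sketch makes a simplifying assumption: every node in the list is treated as a candidate for the pod, so $$A$$ is the total node count. In a real cluster, taints, selectors and resource constraints narrow the candidate set.

```python
import json


def strict_rejection_estimate(nodes_json):
    """Return (A, B, B/A) for the output of `kubectl get nodes -o json`.

    Assumption: all listed nodes are candidates for the pod (A = node count).
    A node counts toward B unless it reports
    .status.features.supplementalGroupsPolicy == true.
    """
    items = json.loads(nodes_json).get("items", [])
    total = len(items)
    unsupported = 0
    for node in items:
        features = (node.get("status") or {}).get("features") or {}
        if features.get("supplementalGroupsPolicy") is not True:
            unsupported += 1
    ratio = unsupported / total if total else 0.0
    return total, unsupported, ratio
```

For instance, in a four-node cluster where one node reports no features and one reports supplementalGroupsPolicy: false, the estimate is $$B/A = 2/4 = 0.5$$.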

To reduce this probability, you can also manage node labels and the pod’s nodeSelector/nodeAffinity to ensure that pods with Strict policy are scheduled to nodes that support the SupplementalGroupsPolicy feature. Please see the “Are there any missing metrics that would be useful to have to improve observability of this feature?” section.

Will enabling / using this feature result in any new API calls?

No. It only introduces new API fields in the Pod spec and CRI, which do NOT change the default behavior.

Will enabling / using this feature result in introducing new API types?

No.

Will enabling / using this feature result in any new calls to the cloud provider?

No.

Will enabling / using this feature result in increasing size or count of the existing API objects?

Strictly speaking, yes, because the KEP introduces new API fields in Pods. However, the increase in size is negligible.

Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

No.

Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, …) in any components?

No.

Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?

No.

Troubleshooting

How does this feature react if the API server and/or etcd is unavailable?

A pod cannot be created, just as with any other pod.

What are other known failure modes?

None.

What steps should be taken if SLOs are not being met to determine the problem?

  • Make sure that the node is running a CRI runtime that supports this feature.
  • Make sure that crictl info (with the latest crictl) reports that supplemental_groups_policy is supported. Otherwise, upgrade the CRI runtime, and make sure that no relevant error is printed in the CRI runtime’s log.
  • Make sure that kubectl get nodes -o json | jq '[.items[].status.features]' (with the latest kubectl and control plane) reports that supplementalGroupsPolicy is supported. Otherwise, upgrade the CRI runtime, and make sure that no relevant error is printed in kubelet’s log.

Implementation History

  • 2023-02-10: Initial KEP published.
  • v1.31.0 (2024-08-13): Alpha
  • v1.33.0 (2025-04-23): Beta
  • v1.35.0 (2025-12-17): GA

Drawbacks

N/A

Alternatives

Introducing RuntimeClass

As described in the Motivation section, cluster administrators would need to deploy a custom low-level container runtime (e.g., pfnet-research/strict-supplementalgroups-container-runtime) that modifies the OCI container runtime spec (config.json) produced by CRI implementations (e.g., containerd, CRI-O). A custom RuntimeClass would be introduced for it.

Adjusting container image by users

Users could modify their container images to control the supplemental groups (i.e., by modifying the group memberships of the container’s uid). However, this is more work, and users will not always have the option to do so.

Just fixing CRI implementations

We could just fix CRI implementations directly without introducing new APIs. The advantage is that no API changes are needed at either the Kubernetes or the CRI level. However, the main downside of this approach is that it is a breaking change that could confuse users.

Infrastructure Needed (Optional)

N/A