KEP-4601: Authorize with Selectors

Release Signoff Checklist
Summary
Motivation
- Goals
- Non-Goals
Proposal
Design Details
Production Readiness Review Questionnaire
Implementation History
Drawbacks
Alternatives
Infrastructure Needed (Optional)

Release Signoff Checklist

Items marked with (R) are required prior to targeting to a milestone / release.

(R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
(R) KEP approvers have approved the KEP status as implementable
(R) Design details are appropriately documented
(R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- e2e Tests for all Beta API Operations (endpoints)
- (R) Ensure GA e2e tests meet requirements for Conformance Tests
- (R) Minimum Two Week Window for GA e2e tests to prove flake free
(R) Graduation criteria is in place
- (R) all GA Endpoints must be hit by Conformance Tests
(R) Production readiness review completed
(R) Production readiness review approved
“Implementation History” section is up-to-date for milestone
User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

Summary

The authorization attributes will be extended to include field selectors and label selectors from List, Watch, and DeleteCollection. This will allow authorizers to use these selectors when making an authorization decision.

Motivation

Security for per-node workloads could be improved by exposing field and label selectors to authorizers. Adding them as authorization attributes allows the development of new kinds of authorizers that leverage this information to provide security. In particular, it enables out-of-tree authorizers to experiment with ways to express restrictions based on field and label selectors.

Goals

Add field and label selectors to authorization attributes for List, Watch, and DeleteCollection verbs.
Add field and label selectors to webhook authorization types.
Add field and label selectors to SelfSubjectAccessReview (SSAR), SubjectAccessReview (SAR), and LocalSubjectAccessReview.
Update node authorizer to restrict on nodeName field selector.
Add field and label selectors to CEL authorizer implementation.

Non-Goals

Create a generic in-tree authorizer that manages field or label selectors.
Expand the audit surface area, since requestURI is already included.
Expand the admission surface area (admission.Attributes, AdmissionReview, available to admission), since admission verbs don’t support field/label selectors.

Proposal

List, Watch, and DeleteCollection requests directly have field and label selector options. A single-item List or Watch request is still a list as normal (including selectors), but also includes a name.

Authorization Attributes changes

The authorization attributes have easy access to the query parameter field and label selectors. To avoid confusion, field and label selectors will not be included in authorization attributes for kube-apiserver requests with verbs where the field selector has no semantic meaning. In practice this means that (for now), only List, Watch, and DeleteCollection have field and label selectors.

The kube-apiserver will NOT modify the field and label selector attributes of SubjectAccessReviews submitted with verbs that do not honor the selectors. The client is trusted to send only combinations that will be honored.

Any authorizer that gets an error from GetFieldSelector or GetLabelSelector may attempt to authorize without field or label selectors since that will authorize using a wider permission (field and label selectors can only reduce access).

type Attributes interface {
  // GetFieldSelector is lazy, thread-safe, and stores the parsed result and error.
  // It can return an error if the field selector cannot be parsed.
  // Remember that field selector formats vary based on the version of the API being used!
  GetFieldSelector() (fields.Requirements, error)

  // GetLabelSelector is lazy, thread-safe, and stores the parsed result and error.
  // It can return an error if the label selector cannot be parsed.
  GetLabelSelector() (labels.Requirements, error)
}
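The fallback rule for parse errors can be illustrated with a minimal sketch. It does not use the real k8s.io/apiserver types: Requirement, request, Decision, and authorize are hypothetical stand-ins, and the interface is trimmed to a single method. The point it shows is that a selector that fails to parse is dropped and the request is evaluated as an unrestricted list, which can only demand wider permission.

```go
package main

import (
	"errors"
	"fmt"
)

// Requirement is a simplified stand-in for one parsed field selector term.
type Requirement struct {
	Field, Operator, Value string
}

// Attributes is a trimmed-down, hypothetical version of the interface above.
type Attributes interface {
	GetFieldSelector() ([]Requirement, error)
}

// request is a hypothetical Attributes implementation for this sketch.
type request struct {
	reqs []Requirement
	err  error
}

func (r request) GetFieldSelector() ([]Requirement, error) { return r.reqs, r.err }

// Decision models the authorizer outcomes used in this sketch.
type Decision string

const (
	Allow     Decision = "Allow"
	NoOpinion Decision = "NoOpinion"
)

var errBadSelector = errors.New("field selector cannot be parsed")

// authorize allows a pod list when the caller may list all pods, or when the
// selector restricts the request to allowedNode. A selector that fails to
// parse is dropped, so the request is evaluated as an unrestricted list.
func authorize(a Attributes, canListAll bool, allowedNode string) Decision {
	reqs, err := a.GetFieldSelector()
	if err != nil {
		reqs = nil // fall back to the wider, selector-free check
	}
	if canListAll {
		return Allow
	}
	for _, r := range reqs {
		if r.Field == "spec.nodeName" && r.Operator == "=" && r.Value == allowedNode {
			return Allow
		}
	}
	return NoOpinion
}

func main() {
	bad := request{err: errBadSelector}
	scoped := request{reqs: []Requirement{{"spec.nodeName", "=", "node-a"}}}
	fmt.Println(authorize(bad, true, "node-a"))     // Allow
	fmt.Println(authorize(bad, false, "node-a"))    // NoOpinion
	fmt.Println(authorize(scoped, false, "node-a")) // Allow
}
```

A caller with broad permission is still allowed when the selector is unparseable, while a caller who only holds node-scoped permission gets no opinion.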

Webhook authors: remember that the list of verbs accepting field and label selectors may change over time. If the kube-apiserver sends the FieldSelector or LabelSelector to a webhook, the kube-apiserver intends to honor the selector attributes.

Future-proofing your authorization webhook for future verbs

As of 1.31, the only verbs with field and label selectors are List, Watch, and DeleteCollection. In the future, the kube-apiserver may add field and label selectors to Get, Create, Update, Patch, and Delete.

For Get, this means the field and label selector of the retrieved object must match.
For Create, this means that the resource after all mutation is complete (finalObject) must match the field and label selector.
For Update/Patch, this means that the finalNewObject and oldObject must match the field and label selector.
For Delete, this means that the oldObject must match the field and label selector.
For subresources, if the storage layer cannot verify the parent object matches the selector (both old and new), the request must be rejected.

We do not allow field and label selectors for Get, because if a client is specifying a selector, they can add a .metadata.name field selector and use a List to get equivalent functionality.

SubjectAccessReview Changes

SubjectAccessReview is used for two purposes:

Authorization webhook calls from the kube-apiserver to a webhook. This usage likely benefits from a serialization with []Requirement.
Authorization checks from a client (often a server process using in-cluster authorization like kube-rbac-proxy). This usage likely benefits from a serialization that matches the query parameter.

Their needs are best met with two different serializations (see User Stories).


type SubjectAccessReviewSpec struct {
	ResourceAttributes *ResourceAttributes
}

type ResourceAttributes struct {
	FieldSelector *FieldSelectorAttributes

	LabelSelector *LabelSelectorAttributes
}

// FieldSelectorAttributes indicates a field limited access.
// For webhooks:
// The kube-apiserver will never send a request with rawSelector set, but we cannot control what other clients directly send.
// * If rawSelector is empty and requirements are empty, the request is not limited.
// * If rawSelector is present and requirements are empty, the request is not limited.
// * If rawSelector is empty and requirements are present, the requirements should be honored
// * If rawSelector is present and requirements are present, the request is invalid.
// Webhook authors are encouraged to
// * ensure rawSelector and requirements are not both set
// * consider the requirements field if set
// * not try to parse or consider the rawSelector field if set.
//   This is to avoid another CVE-2022-2880 (i.e. getting different systems to agree on how exactly to parse
//   a query is not something we want), see https://www.oxeye.io/resources/golang-parameter-smuggling-attack for more details.
// For the kube-apiserver:
// * If rawSelector is empty and requirements are empty, the request is not limited.
// * If rawSelector is present and requirements are empty, the rawSelector will be parsed and limited if the parsing succeeds.
// * If rawSelector is empty and requirements are present, the requirements should be honored
// * If rawSelector is present and requirements are present, the request is invalid.
type FieldSelectorAttributes struct {
	// rawSelector is the serialization of a field selector that would be included in a query parameter.
	// Webhook implementations are encouraged to ignore rawSelector.
	// The kube-apiserver's SubjectAccessReview will parse the rawSelector.
	RawSelector string

	// requirements is the parsed interpretation of a field selector.
	// All requirements must be met for a resource instance to match the selector.
	// Webhook implementations should handle requirements, but how to handle them is up to the webhook.
	// Since requirements can only limit the request, it is safe to authorize as unlimited request if the requirements
	// are not understood.
	Requirements []FieldSelectorRequirement
}

// LabelSelectorAttributes indicates a label limited access.
// For webhooks:
// The kube-apiserver will never send a request with rawSelector set, but we cannot control what other clients directly send.
// * If rawSelector is empty and requirements are empty, the request is not limited.
// * If rawSelector is present and requirements are empty, the request is not limited.
// * If rawSelector is empty and requirements are present, the requirements should be honored
// * If rawSelector is present and requirements are present, the request is invalid.
// Webhook authors are encouraged to
// * ensure rawSelector and requirements are not both set
// * consider the requirements field if set
// * not try to parse or consider the rawSelector field if set.
//   This is to avoid another CVE-2022-2880 (i.e. getting different systems to agree on how exactly to parse
//   a query is not something we want), see https://www.oxeye.io/resources/golang-parameter-smuggling-attack for more details.
// For the kube-apiserver:
// * If rawSelector is empty and requirements are empty, the request is not limited.
// * If rawSelector is present and requirements are empty, the rawSelector will be parsed and limited if the parsing succeeds.
// * If rawSelector is empty and requirements are present, the requirements should be honored
// * If rawSelector is present and requirements are present, the request is invalid.
type LabelSelectorAttributes struct {
	// rawSelector is the serialization of a label selector that would be included in a query parameter.
	// Webhook implementations are encouraged to ignore rawSelector.
	// The kube-apiserver's SubjectAccessReview will parse the rawSelector.
	RawSelector string

	// requirements is the parsed interpretation of a label selector.
	// All requirements must be met for a resource instance to match the selector.
	// Webhook implementations should handle requirements, but how to handle them is up to the webhook.
	// Since requirements can only limit the request, it is safe to authorize as unlimited request if the requirements
	// are not understood.
	Requirements []metav1.LabelSelectorRequirement
}

type FieldSelectorRequirement struct {
	// key is the field selector key that the requirement applies to.
	Key string `json:"key" protobuf:"bytes,1,opt,name=key"`
	// operator represents a key's relationship to a set of values.
	// Valid operators are In, NotIn, Exists, DoesNotExist
	// The list of operators may grow in the future.
	// Webhook authors are encouraged to ignore unrecognized operators and assume they don't limit the request.
	// The semantics of "all requirements are AND'd" will not change, so other requirements can continue to be enforced.
	Operator LabelSelectorOperator `json:"operator" protobuf:"bytes,2,opt,name=operator,casttype=LabelSelectorOperator"`
	// values is an array of string values. If the operator is In or NotIn,
	// the values array must be non-empty. If the operator is Exists or DoesNotExist,
	// the values array must be empty.
	// +optional
	// +listType=atomic
	Values []string `json:"values,omitempty" protobuf:"bytes,3,rep,name=values"`
}

Importantly, if old webhook authorizers do not honor these new fields, they will assume the broadest possible access and fail closed. If old in-cluster authorization does not include field and label selectors, the kube-apiserver will assume the broadest possible access and fail closed.
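The webhook-side rules in the comments above can be sketched as a small helper. This is an illustration only: the structs are simplified local copies without JSON tags, and effectiveRequirements and errBothSet are hypothetical names. The helper rejects a request that sets both rawSelector and requirements, never parses rawSelector, and skips operators it does not recognize so that the remaining AND'd requirements can still be enforced.

```go
package main

import (
	"errors"
	"fmt"
)

// FieldSelectorRequirement is a simplified local copy of the type above.
type FieldSelectorRequirement struct {
	Key      string
	Operator string
	Values   []string
}

// FieldSelectorAttributes is a simplified local copy of the type above.
type FieldSelectorAttributes struct {
	RawSelector  string
	Requirements []FieldSelectorRequirement
}

var errBothSet = errors.New("rawSelector and requirements are both set")

// effectiveRequirements returns the requirements a webhook might enforce.
// A nil result means the request is treated as not limited.
func effectiveRequirements(attrs *FieldSelectorAttributes) ([]FieldSelectorRequirement, error) {
	if attrs == nil {
		return nil, nil
	}
	if attrs.RawSelector != "" && len(attrs.Requirements) > 0 {
		return nil, errBothSet // invalid request
	}
	// rawSelector on its own is deliberately not parsed: not limited.
	var out []FieldSelectorRequirement
	for _, r := range attrs.Requirements {
		switch r.Operator {
		case "In", "NotIn", "Exists", "DoesNotExist":
			out = append(out, r)
		default:
			// Unrecognized operator: assume it does not limit the request.
		}
	}
	return out, nil
}

func main() {
	reqs, err := effectiveRequirements(&FieldSelectorAttributes{
		Requirements: []FieldSelectorRequirement{
			{Key: "spec.nodeName", Operator: "In", Values: []string{"node-a"}},
			{Key: "metadata.name", Operator: "Matches", Values: []string{"x"}}, // made-up operator
		},
	})
	fmt.Println(len(reqs), err) // 1 <nil>

	_, err = effectiveRequirements(&FieldSelectorAttributes{RawSelector: "a=b", Requirements: reqs})
	fmt.Println(err) // rawSelector and requirements are both set
}
```

The same shape would apply to LabelSelectorAttributes.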

Node Authorizer Changes

The node authorizer will be modified to only authorize node clients to list and watch pods with fieldSelectors containing spec.nodeName=$nodeName. The node authorizer will be modified to authorize pod get requests based on the graph.
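As a rough illustration of the pod list/watch rule, the check might look like the sketch below. It is not the node authorizer's actual code: Requirement and authorizePodListWatch are hypothetical, and accepting both "=" and "==" as equality operators is an assumption. Because all requirements are AND'd, a single matching spec.nodeName equality term is enough to confine the request to the node's own pods.

```go
package main

import "fmt"

// Requirement is a simplified stand-in for one parsed field selector term.
type Requirement struct {
	Field, Operator, Value string
}

// authorizePodListWatch reports whether a node client may list or watch pods
// with the given field selector requirements, plus a reason when it may not.
func authorizePodListWatch(nodeName string, reqs []Requirement) (bool, string) {
	for _, r := range reqs {
		isEquality := r.Operator == "=" || r.Operator == "=="
		if r.Field == "spec.nodeName" && isEquality && r.Value == nodeName {
			return true, ""
		}
	}
	return false, "can only list/watch pods with spec.nodeName field selector"
}

func main() {
	ok, _ := authorizePodListWatch("node-a", []Requirement{{"spec.nodeName", "=", "node-a"}})
	fmt.Println(ok) // true

	ok, reason := authorizePodListWatch("node-a", nil)
	fmt.Println(ok, reason)
}
```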

CEL Authorizer Changes

While admission isn’t supported on List, Watch, or DeleteCollection, it is reasonable to expect that secondary authorization checks may desire to use those verbs and leverage the field and label selector capabilities. To support this we will add two congruent options similar to

	"fieldSelector": {
		cel.MemberOverload("resourcecheck_fieldselector", []*cel.Type{ResourceCheckType, cel.StringType}, ResourceCheckType,
			cel.BinaryBinding(resourceCheckName))},
    }

This will allow usage like authorizer.group('').resource('pods').fieldSelector('spec.nodeName=foo').check('list').allowed(). Parsing happens during the call to allowed, where errors are already tracked and handled. Field and label selectors that fail to parse will be ignored. Verb/selector pairs are not checked for validity.
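The deferred-parsing behavior can be modeled outside CEL with a plain Go builder. Everything in this sketch is hypothetical (resourceCheck, decision, onlyNodeFoo, and a parser that only understands comma-separated key=value terms); it is not the cel-go binding. It shows the selector string being stored by FieldSelector, parsed only inside Allowed, and ignored when parsing fails.

```go
package main

import (
	"fmt"
	"strings"
)

type requirement struct{ key, value string }

// parseFieldSelector is a deliberately tiny parser for "k=v,k=v" selectors.
func parseFieldSelector(s string) ([]requirement, error) {
	if s == "" {
		return nil, nil
	}
	var reqs []requirement
	for _, part := range strings.Split(s, ",") {
		k, v, ok := strings.Cut(part, "=")
		if !ok || k == "" {
			return nil, fmt.Errorf("invalid selector term %q", part)
		}
		reqs = append(reqs, requirement{k, v})
	}
	return reqs, nil
}

type authorizeFunc func(verb, resource string, reqs []requirement) bool

type resourceCheck struct {
	authz         authorizeFunc
	resource      string
	fieldSelector string
}

// FieldSelector only records the raw string; nothing is parsed yet.
func (c resourceCheck) FieldSelector(s string) resourceCheck {
	c.fieldSelector = s
	return c
}

type decision struct {
	check resourceCheck
	verb  string
}

func (c resourceCheck) Check(verb string) decision { return decision{check: c, verb: verb} }

// Allowed parses the stored selector; a selector that fails to parse is ignored.
func (d decision) Allowed() bool {
	reqs, err := parseFieldSelector(d.check.fieldSelector)
	if err != nil {
		reqs = nil
	}
	return d.check.authz(d.verb, d.check.resource, reqs)
}

// onlyNodeFoo is a made-up policy: list pods only when scoped to node "foo".
func onlyNodeFoo(verb, resource string, reqs []requirement) bool {
	if verb != "list" || resource != "pods" {
		return false
	}
	for _, r := range reqs {
		if r.key == "spec.nodeName" && r.value == "foo" {
			return true
		}
	}
	return false
}

func main() {
	pods := resourceCheck{authz: onlyNodeFoo, resource: "pods"}
	fmt.Println(pods.FieldSelector("spec.nodeName=foo").Check("list").Allowed()) // true
	fmt.Println(pods.FieldSelector("spec.nodeName").Check("list").Allowed())     // false
}
```

In the second call the malformed selector is dropped, so the policy sees an unrestricted list and denies it.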

User Stories (Optional)

As a SAR client, I want to check a request with a field or label selector

This type of usage probably finds the stringified serialization format used in the query parameters the most convenient format to build their request with. Providing the query parameter serialization format avoids the need for a client to grow a decently complex lexer/parser.

As an authorization webhook author, I want to easily consume the field and label selectors

This type of usage probably finds a serialized []Requirement to be the most convenient way to consume the field and label selector. Providing the parsed value avoids the need for every consumer to grow a decently complex lexer/parser.

Notes/Constraints/Caveats (Optional)

Remember to update these places in existing code:

authorization webhook matchConditions, which evaluate the v1 SubjectAccessReview that would be sent to the webhook: ref
v1 / v1beta1 SAR translation function ref
v1 SubjectAccessReview construction function ref
cache size decision ref

Risks and Mitigations

client provides field or label selector to kube-apiserver that does not parse

The kube-apiserver may still authorize the request without considering the selectors (system:masters for instance). It will be up to the REST handler to accept or reject requests for bad selectors. This approach also allows an aggregated API server to have extended field and label selector syntax, though we strongly discourage doing so. The kube-apiserver will attempt to authorize without the selector information.

If the client is authorized without the selector, then Allow since they have broader permission.
If the client is not authorized without the selector then either NoOpinion or Fail depending on intent.

client provides field or label selector to kube-apiserver with improper verb

Consider a client that sends an Update request with a field selector on it. The metav1.UpdateOptions type does not allow this, but imagine a devious user with an alternative library. The ResolveRequestInfo method will not add field and label selectors to the requestInfo, so they will not appear in the authorization.Attributes, and the spurious selectors are not passed to the authorizer. This keeps authorization behavior exactly as it was previously.

SubjectAccessReviews are not modified prior to calling the kube-apiserver authorizer. This allows skew in support between the kube-apiserver and other apiservers.

client provides SAR where field rawSelector does not match field requirements

The request is rejected. Only one of rawSelector and requirements can be specified.

Design Details

Test Plan

[x] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.

Prerequisite testing updates

Unit tests

k8s.io/kubernetes/pkg/registry/authorization/subjectaccessreview: 61.9% of statements
k8s.io/kubernetes/pkg/registry/authorization/util: 82.6% of statements
k8s.io/kubernetes/plugin/pkg/auth/authorizer/node: 77.0% of statements
k8s.io/kubernetes/pkg/apis/admissionregistration/validation: 87.6% of statements
k8s.io/kubernetes/pkg/apis/authorization/validation: 97.0% of statements
k8s.io/apiserver/pkg/admission/plugin/cel: 83.6% of statements
k8s.io/apiserver/pkg/authorization/cel: 53.9% of statements
k8s.io/apiserver/pkg/endpoints/filters: 77.2% of statements
k8s.io/apiserver/pkg/endpoints/request: 65.4% of statements
k8s.io/apiserver/plugin/pkg/authorizer/webhook: 86.6% of statements

Unit tests exercise node authorization, CEL compilation for authorization webhook and admission matchConditions, and CEL compilation for authorizer use with and without the feature enabled:

https://github.com/kubernetes/kubernetes/blob/0b1d123fd040359da11dc772947a7908ee907910/plugin/pkg/auth/authorizer/node/node_authorizer_test.go#L75-L81

https://github.com/kubernetes/kubernetes/blob/0b1d123fd040359da11dc772947a7908ee907910/staging/src/k8s.io/apiserver/pkg/authorization/cel/compile_test.go#L34

https://github.com/kubernetes/kubernetes/blob/0b1d123fd040359da11dc772947a7908ee907910/staging/src/k8s.io/apiserver/plugin/pkg/authorizer/webhook/webhook_v1_test.go#L806

https://github.com/kubernetes/kubernetes/blob/0b1d123fd040359da11dc772947a7908ee907910/staging/src/k8s.io/apiserver/pkg/admission/plugin/cel/filter_test.go#L503-L620

Integration tests

test/integration/apiserver/cel/authorizerselector/... - triage history
- Fully exercise the new CEL authorizer functions with the feature enabled and disabled
test/integration/auth TestMultiWebhookAuthzConfig - triage history
- Positive and negative match tests for a webhook matchCondition using selector matching, on actual API requests using selectors and on SubjectAccessReview requests

Test history

e2e tests

This feature is fully tested with unit and integration tests

Graduation Criteria

Alpha

Feature implemented behind a feature flag
Unit tests demonstrating wiring and fallback
Integration test demonstrating field selector wiring
- must include fallback on parsing error as well

Beta

Determine if additional tests are necessary
Ensure reliability of existing tests

GA

All bugs resolved and no new bugs requiring code change since the previous shipped release

Upgrade / Downgrade Strategy

On upgrade to a version that enables the feature, no configuration changes are required to maintain previous behavior of CEL expressions and authorization webhooks. All existing CEL expressions and authorization webhook responses behave identically.

On upgrade to a version that enables the feature, to make use of the new feature:

authorization webhooks can inspect incoming SubjectAccessReview requests for field and label selector information
authorization webhook configuration files can include matchConditions that inspect field and label selector information
admission webhook API matchConditions can use authorizer fieldSelector / labelSelector functions
SubjectAccessReview API requests can specify fieldSelector / labelSelector fields

On downgrade to a version that does not enable the feature by default, or if the feature is disabled:

field and label selector information will no longer be sent to authorization webhooks
authorization webhook configuration files can no longer include matchConditions that inspect field and label selector information
admission webhook API matchConditions that use authorizer fieldSelector / labelSelector functions will not error, but those functions will no-op
SubjectAccessReview API requests that specify fieldSelector / labelSelector fields will drop those fields

Version Skew Strategy

New kube-apiserver, old webhook authorizer

The new kube-apiserver will include the field and label selectors, but the old webhook authorizer will ignore them. The old authorizer will assume the broadest possible action and authorize accordingly. Because the old authorizer will only allow the action if the user has permission to act on the entire collection, this fails safely. There may be more rejections than expected, but this behavior matches previous behavior.
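One way to picture this skew case: a webhook compiled against the older types simply never sees the new fields. The sketch below assumes a Go webhook that decodes with encoding/json, which ignores unknown JSON fields by default; the old* struct names, decodeOld, and the sample payload are hypothetical and heavily trimmed.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// oldResourceAttributes models a webhook built before selectors were added.
type oldResourceAttributes struct {
	Verb     string `json:"verb"`
	Resource string `json:"resource"`
}

type oldSpec struct {
	ResourceAttributes *oldResourceAttributes `json:"resourceAttributes"`
}

type oldSAR struct {
	Spec oldSpec `json:"spec"`
}

// newSAR is a trimmed example of what a newer kube-apiserver might send.
const newSAR = `{"spec":{"resourceAttributes":{"verb":"list","resource":"pods",
  "fieldSelector":{"requirements":[{"key":"spec.nodeName","operator":"In","values":["node-a"]}]}}}}`

// decodeOld decodes a SubjectAccessReview using only the old fields.
func decodeOld(data []byte) (oldSAR, error) {
	var sar oldSAR
	err := json.Unmarshal(data, &sar)
	return sar, err
}

func main() {
	sar, err := decodeOld([]byte(newSAR))
	if err != nil {
		panic(err)
	}
	// The selector is silently dropped, so the old webhook evaluates an
	// unrestricted "list pods" request.
	fmt.Println(sar.Spec.ResourceAttributes.Verb, sar.Spec.ResourceAttributes.Resource) // list pods
}
```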

Old kube-apiserver, new in-cluster authorizer (or any SAR client)

The new client will include the field and label selectors, but the kube-apiserver will ignore them. The kube-apiserver will assume the broadest possible action and authorize accordingly. Because the kube-apiserver will only allow the action if the user has permission to act on the entire collection, this fails safely. There may be more rejections than expected, but this behavior matches previous behavior.

Production Readiness Review Questionnaire

Feature Enablement and Rollback

How can this feature be enabled / disabled in a live cluster?

Feature gate (also fill in values in kep.yaml)
- Feature gate name: AuthorizeWithSelectors
- Components depending on the feature gate:
  - kube-apiserver
- Feature gate name: AuthorizeNodeWithSelectors
- Components depending on the feature gate:
  - kube-apiserver

Does enabling the feature change any default behavior?

Yes. The kube-apiserver will send field and label selector information to authorization webhooks. The node authorizer will start preventing kubelets from listing pods that are not on their node.

Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

Yes. Set the FeatureGate to false and restart the kube-apiserver. The kube-apiserver will stop sending field and label selector information to authorization webhooks. Persisted CEL expressions using fieldSelector and labelSelector authorization functions will still function.

What happens if we reenable the feature if it was previously rolled back?

The kube-apiserver will send field and label selector information to authorization webhooks.

Are there any tests for feature enablement/disablement?

Yes. Integration tests exercise behavior of CEL expressions with the feature enabled and disabled.

https://github.com/kubernetes/kubernetes/tree/0b1d123fd040359da11dc772947a7908ee907910/test/integration/apiserver/cel/authorizerselector

Rollout, Upgrade and Rollback Planning

How can a rollout or rollback fail? Can it impact already running workloads?

Non-kubelet clients using kubelet credentials to make API requests could be forbidden if they are listing/watching pods without filtering to pods scheduled to the node, or if they are listing/watching nodes other than their own node.

What specific metrics should inform a rollback?

Use of kubelet credentials to make API requests the kubelet is not authorized to make is unexpected, but could be detected in the authorization_attempts_total{result=denied} metric increasing and audit events showing requests from a user in the system:nodes group with an authorization.k8s.io/decision=forbid audit annotation.

Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

Handling of persisted CEL expressions using selector features was tested with the feature disabled, and with a compatibility version of 1.30, to ensure that a previous version API server would not have to handle CEL expressions it did not understand.

Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

Monitoring Requirements

None

How can an operator determine if the feature is in use by workloads?

Workloads do not use this feature directly.

Audit events of SubjectAccessReview API requests would show if selector information was being provided.

Authorization webhooks would be able to observe selector information provided in requests.

How can someone using this feature know that it is working for their instance?

Most of the uses are internal to cluster administrators:

authorization webhooks configured with matchConditions using fieldSelector/labelSelector pass validation and only route requests passing those conditions to the webhook (apiserver_authorization_match_condition_exclusions_total metric will increment if match conditions skip)
authorization webhooks can inspect the SubjectAccessReview requests sent to them to observe selector information
admission webhooks and validating admission policies can use fieldSelector and labelSelector authorizer methods and pass API validation.

What are the reasonable SLOs (Service Level Objectives) for the enhancement?

Use of this feature should not change existing API SLOs.

What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

Use of this feature should not change existing API SLIs.

Are there any missing metrics that would be useful to have to improve observability of this feature?

There are already metrics for the layers this feature is adding to:

authorization latency
authorization success
webhook authorizer match condition latency
webhook authorizer match condition success
webhook admission match condition latency
webhook admission match condition success
validating admission policy match condition latency
validating admission policy match condition success

Dependencies

Does this feature depend on any specific services running in the cluster?

Scalability

Will enabling / using this feature result in any new API calls?

No.

Will enabling / using this feature result in introducing new API types?

No.

Will enabling / using this feature result in any new calls to the cloud provider?

No.

Will enabling / using this feature result in increasing size or count of the existing API objects?

Existing API fields containing CEL expressions support additional CEL functions.

SubjectAccessReview types (which are not persisted) add new fields for fieldSelector and labelSelector data.

Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

Enabling the feature adds negligible size to authorization webhook payloads.

Using the authorization selector functions in CEL expressions in authorization webhook matchConditions, admission webhook matchConditions, and validating admission policies can take additional time, though this is no different from increasing the complexity or number of CEL expressions generally. CEL expressions that can be set via REST APIs are subject to cost estimation to limit the complexity and size of the input data used for selectors.

Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, …) in any components?

Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?

No, this feature does not touch nodes.

Troubleshooting

How does this feature react if the API server and/or etcd is unavailable?

This feature is fully contained within the API server.

What are other known failure modes?

Non-kubelet clients using kubelet credentials are forbidden
- Detection: logs of non-kubelet client, authorization_attempts_total{result=denied} metric increasing, audit events showing requests from a user in the system:nodes group with an authorization.k8s.io/decision=forbid audit annotation
- Mitigations:
  - change the non-kubelet client to use its own credential (preferred)
  - adjust the non-kubelet client to use field selectors on pods and nodes
  - temporarily disable the AuthorizeNodeWithSelectors feature gate in kube-apiserver
- Diagnostics: the node authorizer logs the following messages at verbosity level 2 when a client attempts to use kubelet credentials to read nodes or pods without using the expected field selector:
  - node '...' cannot read all nodes, only its own Node object
  - node '...' cannot read '...', only its own Node object
  - can only list/watch pods with spec.nodeName field selector
- Testing: There are tests ensuring the node authorizer forbids these overly broad read requests. Use of kubelet credentials by non-kubelet clients to make API requests the kubelet is not authorized to make is unexpected and unwanted.

What steps should be taken if SLOs are not being met to determine the problem?

Determine if webhook latency or matchCondition latency of matchConditions using these selector functions is the primary contributor, and if that change correlates with enablement of this feature. Test if eliminating use of the CEL selector functions in the offending CEL expression resolves the issue.

Implementation History

v1.31: Alpha release
v1.32: Beta release
v1.34: Stable release

Drawbacks

None considered

Alternatives

None considered

Infrastructure Needed (Optional)

None