KEP-3141: Prevent unauthorised volume mode conversion
- Release Signoff Checklist
- Summary
- Motivation
- Proposal
- Design Details
- Production Readiness Review Questionnaire
- Implementation History
- Drawbacks
- Alternatives
- Infrastructure Needed (Optional)
Release Signoff Checklist
Items marked with (R) are required prior to targeting to a milestone / release.
- (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
- (R) KEP approvers have approved the KEP status as implementable
- (R) Design details are appropriately documented
- (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- e2e Tests for all Beta API Operations (endpoints)
- (R) Ensure GA e2e tests meet requirements for Conformance Tests
- (R) Minimum Two Week Window for GA e2e tests to prove flake free
- (R) Graduation criteria is in place
- (R) all GA Endpoints must be hit by Conformance Tests
- (R) Production readiness review completed
- (R) Production readiness review approved
- “Implementation History” section is up-to-date for milestone
- User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
- Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
Summary
Users can leverage the VolumeSnapshot feature, which GA’d in Kubernetes 1.20, to
create a PersistentVolumeClaim or PVC from a previously taken VolumeSnapshot.
This is done by pointing the Spec.dataSource parameter of the PVC to an existing
VolumeSnapshot instance. There is no logic that validates whether the volume
mode of the original PVC, whose snapshot was taken, matches the volume mode of
the new PVC being created from the existing VolumeSnapshot.
This KEP proposes a solution to prevent unauthorized conversion of the volume
mode during such an operation.
Motivation
Malicious users may expose a vulnerability in the kernel by exploiting this gap. Here is an example of how a malicious user can exploit this gap to crash the kernel.
- User creates a PVC with volumeMode: Block and runs a pod with it.
- User writes malformed ext4 data to it (simple dd)
- User takes snapshot of this volume.
- User creates a PVC with volumeMode: Filesystem from the above snapshot.
- User uses this PVC in a pod.
  - kubelet tries to mount it during pod creation. If there is a CVE in the kernel, the user can crash it.
Note that, as of this writing, there is no known CVE in the kernel that a malicious user can exploit. However, CVEs that affect filesystems are regularly discovered. For example, https://access.redhat.com/security/cve/cve-2020-12655 allows an attacker to trigger a DoS attack on the kernel. This proposal aims to prevent a security vulnerability in the event that such a CVE is discovered.
We cannot simply block this operation as some backup vendors try to create a volume with the exact same mode as the original volume but may need to do the conversion for efficiency. An example workflow of a backup vendor could look like:
- Assume the original PVC is created with volumeMode: Filesystem.
- During backup, the backup software will create a PVC from a VolumeSnapshot with volumeMode: Block. This step needs volume mode conversion. The purpose of creating this PVC with block mode is to be able to copy data efficiently and save it to a backup target.
- The PVC created in the previous step is temporary and will be deleted after data is copied.
- Finally, at restore time, another PVC will be created with volumeMode: Filesystem.
Goals
Define a mechanism to mitigate the vulnerability of restoring volumes without hampering valid use cases.
Non-Goals
Design that is generic and can be extended to other storage related security aspects.
Proposal
The proposal aims to mitigate this issue by modifying the VolumeSnapshotContent
API spec as well as the control flows of snapshot-controller and external-provisioner.
VolumeSnapshotContent API will include a field that denotes the volume mode of
the volume that the snapshot was created from.
This proposal also introduces a new annotation on the VolumeSnapshotContent resource
that a trusted user (such as backup software) needs to apply before converting the
volume mode of a volume created from the corresponding VolumeSnapshot.
By introducing these changes, we will leverage existing user access rights to determine
whether the volume mode of a volume can be altered when a PVC is being created
from a VolumeSnapshot.
User Stories (Optional)
Story 1
When a VolumeSnapshot is created from an existing PVC, a corresponding
VolumeSnapshotContent is created by the snapshot-controller.
Alternatively, a VolumeSnapshotContent can be manually created by an admin
if the Spec.Source.SnapshotHandle refers to a pre-existing snapshot on the
underlying storage system. In either case, VolumeSnapshots and VolumeSnapshotContents
maintain a 1:1 mapping.
Backup vendors that need to convert the volume mode when creating a PVC
need to identify the VolumeSnapshotContent mapped to the VolumeSnapshot
from which the PVC is being created.
Either through software or via manual intervention, the annotation
snapshot.storage.kubernetes.io/allow-volume-mode-change: true needs to be applied
to the VolumeSnapshotContent. If the backup software is a privileged user,
it will have Update and Patch permissions on VolumeSnapshotContents.
Then the backup software can continue with the operation by creating a PVC
with Spec.DataSource pointing to the VolumeSnapshot instance.
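As a rough illustration of the annotation step in this story, the sketch below builds a JSON merge patch body that sets the annotation on a VolumeSnapshotContent. The helper name buildAllowConversionPatch is hypothetical, and how the patch is sent (which client, which credentials) is left out; it only assumes the caller has Patch permission on VolumeSnapshotContents.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AllowVolumeModeChangeAnnotation is the annotation key named in this KEP.
const AllowVolumeModeChangeAnnotation = "snapshot.storage.kubernetes.io/allow-volume-mode-change"

// buildAllowConversionPatch returns a JSON merge patch body that sets the
// annotation to "true". The function name is hypothetical and only used for
// illustration; it is not part of any Kubernetes client library.
func buildAllowConversionPatch() ([]byte, error) {
	patch := map[string]interface{}{
		"metadata": map[string]interface{}{
			"annotations": map[string]string{
				AllowVolumeModeChangeAnnotation: "true",
			},
		},
	}
	return json.Marshal(patch)
}

func main() {
	body, err := buildAllowConversionPatch()
	if err != nil {
		panic(err)
	}
	// The body would be sent as a merge patch against the VolumeSnapshotContent
	// that is bound to the VolumeSnapshot being restored.
	fmt.Println(string(body))
}
```

A merge patch only touches the listed annotation, so other annotations already present on the object are left alone.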
Story 2
Here is an example of how this change prevents a malicious user from exploiting this vulnerability.
- User creates a PVC with volumeMode: Block and runs a pod with it.
- User writes malformed ext4 data to it (simple dd)
- User takes snapshot of this volume.
- User attempts to create a PVC with volumeMode: Filesystem from the snapshot.
  - This is blocked as the user does not have Update or Patch permissions on VolumeSnapshotContent resources.
Notes/Constraints/Caveats (Optional)
Risks and Mitigations
Design Details
A new out-of-tree flag named PreventVolumeModeConversion will be introduced on
snapshot-controller and csi-provisioner. Both of these components are
out-of-tree so this proposal will not require any in-tree feature gates.
Changes to VolumeSnapshotContent API
With this design, we will introduce two new changes to the VolumeSnapshotContent API:
- A new optional field, called SourceVolumeMode, will be added to the Spec of VolumeSnapshotContents. This field will be immutable.
type VolumeSnapshotContentSpec struct {
	...
	// SourceVolumeMode is the mode of the volume whose snapshot is taken.
	// Can be either "Filesystem" or "Block".
	// If left empty, will be treated as "Unknown".
	// +optional
	SourceVolumeMode *SourceVolumeMode
	...
}
- A new annotation on VolumeSnapshotContent objects. The onus is on the backup vendor (via software or manually) to add this annotation to the VolumeSnapshotContent if they intend to alter the volume mode. The VolumeSnapshotContent must look like below after this change:
kind: VolumeSnapshotContent
metadata:
  annotations:
    snapshot.storage.kubernetes.io/allow-volume-mode-change: "true"
...
Changes to Snapshot Controller
There are two cases to consider:
- Dynamic Provisioning
  - (i) VolumeSnapshot is created by the user, with VolumeSnapshotClass optionally specified in the spec.
  - (ii) VolumeSnapshotContent is created by the snapshot-controller in response to (i).
  - (iii) snapshot-controller populates the Spec of the given VolumeSnapshotContent.
    - With this change, the controller will fetch the volume mode of the PV (Spec.VolumeMode, of type PersistentVolumeMode) and add it to the newly introduced Spec.SourceVolumeMode field of the VolumeSnapshotContent to be created.
- Static Provisioning
  - VolumeSnapshotContent is created by the admin. With this change, the admin will be expected to fill the Spec.SourceVolumeMode field appropriately. If left nil, Unknown mode will be assumed to preserve existing behavior.
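The dynamic provisioning case can be sketched as follows. This is a minimal illustration and not the snapshot-controller code: the struct types are simplified stand-ins for the real API types, the helper newContentSpecFromPV is hypothetical, and it assumes the source mode is carried as the same mode type the PV uses.

```go
package main

import "fmt"

// PersistentVolumeMode is a simplified stand-in for the core API type.
type PersistentVolumeMode string

const (
	PersistentVolumeBlock      PersistentVolumeMode = "Block"
	PersistentVolumeFilesystem PersistentVolumeMode = "Filesystem"
)

// PersistentVolumeSpec and VolumeSnapshotContentSpec are trimmed-down
// stand-ins that keep only the fields relevant to this proposal.
type PersistentVolumeSpec struct {
	VolumeMode *PersistentVolumeMode
}

type VolumeSnapshotContentSpec struct {
	SourceVolumeMode *PersistentVolumeMode
}

// newContentSpecFromPV (hypothetical helper) copies the volume mode of the
// source PV into the content spec. A nil mode stays nil, which the proposal
// treats as "Unknown".
func newContentSpecFromPV(pv PersistentVolumeSpec) VolumeSnapshotContentSpec {
	spec := VolumeSnapshotContentSpec{}
	if pv.VolumeMode != nil {
		mode := *pv.VolumeMode // copy so the content does not alias the PV
		spec.SourceVolumeMode = &mode
	}
	return spec
}

func main() {
	block := PersistentVolumeBlock
	spec := newContentSpecFromPV(PersistentVolumeSpec{VolumeMode: &block})
	fmt.Println(*spec.SourceVolumeMode)
}
```

For static provisioning there is no PV to read from, so the equivalent value has to be supplied by the admin who creates the VolumeSnapshotContent.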
Changes to external-provisioner
This design leverages the access rights of a user on VolumeSnapshotContents to
determine whether the volume mode can be modified when a PVC is being created
with a VolumeSnapshot as the source.
The volume mode can be altered if the requesting user has Update and Patch rights
on VolumeSnapshotContents (which is a cluster scoped resource).
The control flow for creating a PVC from a VolumeSnapshot will look like below:
- A user attempts to create a PVC from a VolumeSnapshot by specifying the Spec.DataSource parameter of the PVC YAML.
- external-provisioner receives a callback to dynamically create the volume. As part of the preprocessing steps, it will:
  - Get the Spec.SourceVolumeMode of the VolumeSnapshotContent.
    - If Spec.SourceVolumeMode doesn't exist or is nil, then continue with volume provisioning to preserve existing behavior.
  - Get the Spec.VolumeMode of the PVC being created. If they do not match:
    - Get all annotations on the VolumeSnapshotContent and verify if snapshot.storage.kubernetes.io/allow-volume-mode-change: true exists. If it does not exist, block volume provisioning by returning an error.
- In all other cases, let volume provisioning continue.
NOTE: external-provisioner maintains a reference to PVC and VolumeSnapshotContent
during volume creation. This proposal leverages those references to make additional
decisions.
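The preprocessing steps above can be sketched as a single check. This is an illustrative sketch under stated assumptions, not the external-provisioner implementation: SnapshotContent is a simplified stand-in type, checkVolumeModeConversion is a hypothetical name, and the PVC volume mode is assumed to have already been defaulted by the API server before the check runs.

```go
package main

import "fmt"

// VolumeMode is a simplified stand-in for the volume mode API type.
type VolumeMode string

const (
	Block      VolumeMode = "Block"
	Filesystem VolumeMode = "Filesystem"
)

// AllowVolumeModeChangeAnnotation is the annotation key named in this KEP.
const AllowVolumeModeChangeAnnotation = "snapshot.storage.kubernetes.io/allow-volume-mode-change"

// SnapshotContent keeps only the VolumeSnapshotContent fields the check needs.
type SnapshotContent struct {
	Name             string
	Annotations      map[string]string
	SourceVolumeMode *VolumeMode
}

// checkVolumeModeConversion (hypothetical helper) returns an error when the
// PVC would change the volume mode and the content does not carry the
// allow annotation set to "true".
func checkVolumeModeConversion(content SnapshotContent, pvcMode VolumeMode) error {
	// Unknown source mode: keep the existing behavior and allow provisioning.
	if content.SourceVolumeMode == nil {
		return nil
	}
	// Modes match: no conversion is taking place.
	if *content.SourceVolumeMode == pvcMode {
		return nil
	}
	// Modes differ: only allowed when a privileged user added the annotation.
	// Looking up a key in a nil map is safe in Go and yields "".
	if content.Annotations[AllowVolumeModeChangeAnnotation] == "true" {
		return nil
	}
	return fmt.Errorf("volume mode conversion from %s to %s is not allowed: annotation %s is not present on snapshot content %s",
		*content.SourceVolumeMode, pvcMode, AllowVolumeModeChangeAnnotation, content.Name)
}

func main() {
	src := Block
	content := SnapshotContent{Name: "snapcontent-example", SourceVolumeMode: &src}
	fmt.Println(checkVolumeModeConversion(content, Filesystem)) // blocked

	content.Annotations = map[string]string{AllowVolumeModeChangeAnnotation: "true"}
	fmt.Println(checkVolumeModeConversion(content, Filesystem)) // allowed
}
```

The check reads only objects the provisioner already holds, which is consistent with the note above that no additional API calls are needed.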
Test Plan
[x] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.
Prerequisite testing updates
None. New E2E tests will be added for the transition to beta.
Unit tests
The unit tests were added to the CSI external-provisioner repo.
Integration tests
- No integration tests added.
e2e tests
The feature flag will be enabled for e2e tests. The tests will attempt to convert volume
mode when creating a PVC from a VolumeSnapshot:
- With Spec.SourceVolumeMode populated and snapshot.storage.kubernetes.io/allow-volume-mode-change: true annotation present - https://github.com/kubernetes-csi/external-provisioner/pull/867/files : https://testgrid.k8s.io/sig-storage-csi-external-provisioner#canary
- With Spec.SourceVolumeMode populated but no snapshot.storage.kubernetes.io/allow-volume-mode-change: true annotation - https://github.com/kubernetes-csi/external-provisioner/pull/832 : https://testgrid.k8s.io/sig-storage-csi-external-provisioner#canary
- With Spec.SourceVolumeMode set to nil - https://github.com/kubernetes-csi/external-provisioner/pull/867/files : https://testgrid.k8s.io/sig-storage-csi-external-provisioner#canary
Graduation Criteria
Alpha
- Feature implemented behind an out-of-tree feature flag.
- Feedback from users.
- Implementation of unit and e2e tests.
Alpha -> Beta
- One release with positive feedback from users.
Beta -> GA
- Deployed in production and in use by backup software.
- Gone through at least one Kubernetes upgrade.
Upgrade / Downgrade Strategy
- Upgrading external-snapshotter and external-provisioner with PreventVolumeModeConversion enabled:
  - VolumeSnapshots created after the upgrade will maintain a reference to the source volume mode. Newly created PVCs will undergo an additional check before the provisioning is performed on the storage backend.
  - VolumeSnapshots created before the upgrade will leave the new API field unpopulated.
- Downgrading external-snapshotter and external-provisioner with PreventVolumeModeConversion disabled:
  - VolumeSnapshots created prior to the downgrade will still maintain a reference to the source volume mode, but PVCs can be created from them without the additional check.
Version Skew Strategy
This proposal requires changes to three components - VolumeSnapshotContent API,
external-snapshotter and external-provisioner.
If any of these components is not upgraded to a version supporting this feature, the feature will not work as expected. From an end user perspective, the existing behavior will continue, i.e., there will be no check to prevent unauthorized conversion of the volume mode.
Production Readiness Review Questionnaire
Feature Enablement and Rollback
How can this feature be enabled / disabled in a live cluster?
- Other
  - Describe the mechanism: Out-of-tree flag named PreventVolumeModeConversion, which will be enabled in external-provisioner and external-snapshotter.
  - Will enabling / disabling the feature require downtime of the control plane? external-provisioner and external-snapshotter will need to be restarted for the changes to take effect. This means that there will be a few seconds of downtime until the newer Pods are Running. There will not be any effect on the previously running applications.
  - Will enabling / disabling the feature require downtime or reprovisioning of a node? No
Does enabling the feature change any default behavior?
Yes. Users without requisite privileges cannot alter the volume mode of VolumeSnapshot
when it is being used to create a PVC. Users with privileges need to add an
annotation to the corresponding VolumeSnapshotContent instance if they
require the volume mode to be converted.
The existing default behavior performs no validation prior to provisioning a volume.
The volume mode can be converted by any user when a PVC is created from a
VolumeSnapshot.
Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
Yes. Disabling the feature is supported and will fall back to the existing behavior.
What happens if we reenable the feature if it was previously rolled back?
The new behavior will be re-enabled. VolumeSnapshots created when the feature
was disabled will not have the new capabilities.
Are there any tests for feature enablement/disablement?
We will add unit tests with and without the feature flag enabled. The expectation
is for new fields in VolumeSnapshotContent to be dropped when the feature flag
is disabled.
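The expectation about dropped fields could be exercised by a unit test along the lines of the sketch below. The helper dropDisabledFields and the simplified spec type are hypothetical; the real components may gate the field differently (for example, only on create, or only when the old object does not already use it).

```go
package main

import "fmt"

// VolumeMode is a simplified stand-in for the volume mode API type.
type VolumeMode string

// VolumeSnapshotContentSpec keeps only the field introduced by this KEP.
type VolumeSnapshotContentSpec struct {
	SourceVolumeMode *VolumeMode
}

// dropDisabledFields (hypothetical helper) clears the new field when the
// PreventVolumeModeConversion flag is off, so that the object looks as it
// did before the feature existed.
func dropDisabledFields(spec *VolumeSnapshotContentSpec, preventVolumeModeConversion bool) {
	if !preventVolumeModeConversion {
		spec.SourceVolumeMode = nil
	}
}

func main() {
	mode := VolumeMode("Block")

	enabled := VolumeSnapshotContentSpec{SourceVolumeMode: &mode}
	dropDisabledFields(&enabled, true)
	fmt.Println(enabled.SourceVolumeMode != nil) // field kept

	disabled := VolumeSnapshotContentSpec{SourceVolumeMode: &mode}
	dropDisabledFields(&disabled, false)
	fmt.Println(disabled.SourceVolumeMode == nil) // field dropped
}
```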
Rollout, Upgrade and Rollback Planning
How can a rollout or rollback fail? Can it impact already running workloads?
Due to the feature flag on the external-provisioner, rolling out this feature does not affect existing Pods that use PVCs. It also does not affect VolumeSnapshots that were created prior to rolling out the feature, i.e., the volume mode of an existing VolumeSnapshot can still be modified while creating a PVC.
What specific metrics should inform a rollback?
- persistentvolumeclaim_provision_failed_total
Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
Yes. The feature flag was enabled and disabled separately in the csi-provisioner and snapshot-controller.
Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
No.
Monitoring Requirements
How can an operator determine if the feature is in use by workloads?
If the feature gate is enabled in the external-provisioner and snapshot-controller, this feature will always be in use when creating a PVC from a VolumeSnapshot.
How can someone using this feature know that it is working for their instance?
- Events
- Event Reason: ProvisioningFailed
- Event Message: Failed to provision volume with StorageClass “csi-hostpath-sc”: error getting handle for DataSource Type VolumeSnapshot by Name new-snapshot-demo: requested volume default/hpvc-restore modifies the mode of the source volume but does not have permission to do so. snapshot.storage.kubernetes.io/allow-volume-mode-change annotation is not present on snapshotcontent snapcontent-8d709f2e-db04-444f-aae2-e17d6c5398dd
What are the reasonable SLOs (Service Level Objectives) for the enhancement?
We will add new labels to the existing persistentvolumeclaim_provision_failed_total metric for the volume data source and status code. The per-day percentage of calls with an error status code should be <= 1%. However, provisioning will always fail while the feature is enabled and the annotation has not been applied to the relevant VolumeSnapshotContent objects, so such failures are expected.
What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
- Metrics
- Metric name: persistentvolumeclaim_provision_failed_total
- [Optional] Aggregation method:
- Components exposing the metric: external-provisioner
Are there any missing metrics that would be useful to have to improve observability of this feature?
There are no metrics for persistentvolumeclaims created from volumesnapshots. This KEP aims to add those metrics to the external-provisioner.
Dependencies
Does this feature depend on any specific services running in the cluster?
- [external-provisioner]
  - Usage description: Failures are emitted as events by the external-provisioner.
  - Impact of its outage on the feature: Outage of this component will prevent error reporting to users.
  - Impact of its degraded performance or high-error rates on the feature: Outage of this component will prevent error reporting to users.
Scalability
Will enabling / using this feature result in any new API calls?
This feature adds an event write to the API server when PVC creation is blocked.
Will enabling / using this feature result in introducing new API types?
This feature adds a new field to the existing VolumeSnapshotContent API.
Will enabling / using this feature result in any new calls to the cloud provider?
No.
Will enabling / using this feature result in increasing size or count of the existing API objects?
The size of VolumeSnapshotContents will increase as we will introduce a new
field to the API. Also, users will be adding an annotation to individual
objects on a need basis.
Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
The latency of CSI’s CreateVolume may increase due to this change, when the
Spec.DataSource field points to a VolumeSnapshot instance. This is because
there is an additional check to determine whether volume provisioning must
continue. However, this increase is expected to be minimal as there are no new
API calls and the volume spec has already been loaded into memory of the external-provisioner.
Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, …) in any components?
No.
Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
No. This feature does not introduce any resource exhaustive operations.
Troubleshooting
How does this feature react if the API server and/or etcd is unavailable?
In case PVC creation is blocked due to this feature, the failure event will not be emitted due to the unavailability of the API server. Users will need to refer to the external-provisioner logs to determine why PVC creation is failing.
What are other known failure modes?
There are no other known failure modes.
What steps should be taken if SLOs are not being met to determine the problem?
The user needs to read the logs of the external-provisioner to determine why PVC creation is failing.
Implementation History
- 2023-02-06: KEP updated to mark transition to beta off-by-default
- 2023-12-23: KEP updated to mark transition to beta on-by-default
- 2024-01-24: KEP updated to mark transition to stable
Drawbacks
Alternatives
VolumeSecurityPolicy
Proposal to create a new policy called VolumeSecurityPolicy, which will be used
to control access for creation of PVCs.
This proposal also includes an admission controller that prevents PVCs from being
restored with the wrong volume mode, unless the user that attempts to do so is a
privileged user (as defined by the VolumeSecurityPolicy).
As part of this proposal, there will be only a single field in the Spec -
allowVolumeModeModification, which can be set to true or false.
Once a VolumeSecurityPolicy is created, it must be tied to a user or a service
account, similar to tying a PSP to a user/service account.
An admission controller will be introduced that intercepts requests to create a
PVC. In case the PVC is being restored from a snapshot and is modifying the
volumeMode, it validates that the user requesting the PVC has the allowed
privileges. If not, the admission controller rejects the PVC create request.
Rejected as PSP was recently deprecated in favor of PodSecurityStandards. If we need a standard for storage security, we should follow that approach.
VolumeSecurityStandard
Introduce VolumeSecurityStandards that are enforceable by any mechanism, including
webhooks, similar to PodSecurityStandards.
We will define two policies as part of this design:
- Privileged - least restrictive policy that allows the widest level of permissions.
- Restricted - most restrictive policy that follows security best practices.
A Mode defines how a violation of the given security policy is handled.
There are three modes:
- Enforce: violations of the policy are not allowed.
- Audit: violations trigger an audit annotation, but are otherwise allowed.
- Warn: violations trigger a user-facing warning, but are otherwise allowed.
A VolumeSecurityStandard is applied on a per-namespace basis. This gives an
admin the ability to apply different standards based on the users of a namespace.
An admission controller will be introduced that intercepts requests to create a PVC. The VolumeSecurityStandards will be hardcoded into this admission controller.
Rejected as the solution was too generic for a very specific use case. If and when there are more storage related security aspects that need a generic solution, we can reconsider this approach.
Annotation on VolumeSnapshotClass
This proposal introduces a new annotation on the VolumeSnapshotClass object:
allowModeConversionForUsers: <comma separated list of allowed users>.
The above comma separated list of users are set by the admin. They will be allowed
to modify the volume mode when restoring a PVC from a Snapshot.
The annotation allowModeConversionForUsers will be copied to the VolumeSnapshotContent
by the snapshot-controller from the VolumeSnapshotClass.
VolumeSnapshotClass is cluster-scoped therefore applying this annotation is
restricted to privileged users only.
An admission controller will be introduced that intercepts requests to create a PVC.
Rejected due to issues with the immutability of these lists. For example, if a user's access is revoked, does the admin need to modify all existing resources that allow this user to modify the volume mode? There were also concerns about introducing a new mechanism for access control.