KEP-5943: Topology For Volume Snapshots
- Release Signoff Checklist
- Summary
- Motivation
- Proposal
- Design Details
- Feature Gate
- CSI Spec Changes
- Volume Snapshot Components Changes
- Scheduler Plugin (at kubernetes-sigs/scheduler-plugins repo)
- External-Provisioner Changes (Immediate Volume Binding)
- What happens with statically provisioned snapshots?
- Error Handling
- Test Plan
- Graduation Criteria
- Upgrade / Downgrade Strategy
- Version Skew Strategy
- Production Readiness Review Questionnaire
- Implementation History
- Drawbacks
- Alternatives
- Infrastructure Needed (Optional)
Release Signoff Checklist
Items marked with (R) are required prior to targeting to a milestone / release.
- (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
- (R) KEP approvers have approved the KEP status as implementable
- (R) Design details are appropriately documented
- (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- e2e Tests for all Beta API Operations (endpoints)
- (R) Ensure GA e2e tests meet requirements for Conformance Tests
- (R) Minimum Two Week Window for GA e2e tests to prove flake free
- (R) Graduation criteria is in place
- (R) all GA Endpoints must be hit by Conformance Tests within one minor version of promotion to GA
- (R) Production readiness review completed
- (R) Production readiness review approved
- “Implementation History” section is up-to-date for milestone
- User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
- Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
Summary
This KEP proposes adding topology support to volume snapshots in Kubernetes by extending the VolumeSnapshotContentSpec object with a NodeAffinity field and enforcing topology compatibility when restoring volumes from snapshots. Currently, volume snapshots lack any topology information, creating a gap that prevents topology-aware scheduling and features.
The enhancement has two parts. First, topology information is added to volume snapshot contents: the CSI driver returns the topology in the CreateSnapshotResponse, and it is stored in the resulting VolumeSnapshotContent. Second, topology enforcement ensures that volumes restored from snapshots are provisioned in compatible topologies:
- For WaitForFirstConsumer volume binding: an out-of-tree scheduler plugin (in kubernetes-sigs/scheduler-plugins) will filter out nodes whose topology is incompatible with the snapshot’s NodeAffinity, ensuring the selected node leads to provisioning in a compatible topology.
- For Immediate volume binding: the external-provisioner will intersect the snapshot’s NodeAffinity with the StorageClass’s AllowedTopologies to select a compatible topology for provisioning.
This prevents provisioning failures or silent data inaccessibility regardless of volume binding mode.
Motivation
This enhancement emerged from discussions in the 11/27/2025 Kubernetes CSI Implementation Meeting (see notes) around implementing cross-region snapshot cloning capabilities in Kubernetes. During the discussions, it became clear that topology support in snapshots is a prerequisite for a feature of this kind. Rather than implementing snapshot cloning without the proper topology foundations, it was decided to first establish topology for snapshots, creating a solid base for future enhancements.
A PR adding topology for snapshots to the CSI spec was previously in review (Add Topology for Snapshot #274). At the time, features such as cross-region snapshot cloning or specifying a snapshot’s final placement were not being considered, so there was no concrete use case to justify the change, and further development and review were halted.
While CSI volumes have comprehensive topology support, snapshots operate without any topology context. The lack of topology information in snapshots is the problem that this KEP addresses by establishing topology awareness as an optional capability of the volume snapshot.
Goals
- Add topology information to VolumeSnapshotContent objects via a mutable NodeAffinity field.
- Extend CSI spec to support optional topology fields in CreateSnapshotRequest and CreateSnapshotResponse.
- Provide an out-of-tree scheduler plugin (in kubernetes-sigs/scheduler-plugins) that enforces topology compatibility when scheduling pods with PVCs that reference a snapshot as their data source with WaitForFirstConsumer volume binding, ensuring the scheduler only selects nodes whose topology is compatible with the snapshot’s NodeAffinity.
- Ensure the external-provisioner intersects the snapshot’s NodeAffinity with StorageClass.AllowedTopologies when provisioning volumes from snapshot data sources with Immediate volume binding mode.
Non-Goals
- Implementing cross-region/AZ snapshot cloning functionality.
- Adding the ability to modify any existing volume snapshot fields.
- Modifying the in-tree kube-scheduler or adding snapshot CRD dependencies to core Kubernetes components.
- Updating existing snapshot objects with topology information once the feature is enabled.
Proposal
User Stories (Optional)
Story 1
As a cluster operator, I want to see topology information for each volume snapshot, so that I can audit snapshot distribution, understand disaster recovery exposure, and identify zone-specific risks in my backup strategy.
Story 2
As a cluster administrator, I want to specify a target location (regions, zones, racks, etc.) when creating a snapshot via the VolumeSnapshotClass, so that the snapshot is stored in a location that aligns with my disaster recovery or data locality requirements.
Story 3
As a developer, I want to restore a volume from a snapshot and have my pod automatically scheduled to a node that is topology-compatible with the snapshot, so that provisioning succeeds without manual intervention.
Notes/Constraints/Caveats (Optional)
N/A
Risks and Mitigations
N/A
Design Details
Feature Gate
A new feature gate, VolumeSnapshotTopology, will be introduced to control the functionality implemented by this KEP. Since VolumeSnapshotContent is a CRD, the field cannot be dropped at the API level the way core types handle feature-gated fields. For alpha, the feature gate controls controller behavior only:
- When disabled, the csi-snapshotter sidecar does not patch NodeAffinity onto VolumeSnapshotContent objects.
- The scheduler plugin does not enforce topology constraints.
- The external-provisioner does not read the NodeAffinity field for topology intersection.
The field exists in the CRD schema but remains empty. If someone manually patches it while the gate is disabled, no component acts on it since all consumers also check the gate.
For Beta, we will investigate using ValidatingAdmissionPolicy/MutatingAdmissionPolicy to reject writes to the field when the gate is disabled, providing API-level enforcement.
CSI Spec Changes
message PluginCapability {
  message Service {
    enum Type {
      // ... Existing capabilities

      // SNAPSHOT_ACCESSIBILITY_CONSTRAINTS indicates that the snapshots
      // for this plugin MAY NOT be equally accessible from all
      // topologies in the cluster. The CO MUST use the topology
      // information returned in the CreateSnapshotResponse to ensure
      // that a desired volume can be provisioned from a given snapshot
      // when scheduling workloads.
      SNAPSHOT_ACCESSIBILITY_CONSTRAINTS = 5;
    }
  }
}

message CreateSnapshotRequest {
  // ... Existing CreateSnapshotRequest fields

  // Specifies where (regions, zones, racks, etc.) the provisioned
  // snapshot MUST be accessible from.
  // An SP SHALL advertise the requirements for topological
  // accessibility information in documentation. COs SHALL only specify
  // topological accessibility information supported by the SP.
  // This field is OPTIONAL.
  // This field SHALL NOT be specified unless the SP has the
  // SNAPSHOT_ACCESSIBILITY_CONSTRAINTS plugin capability.
  // If this field is not specified and the SP has the
  // SNAPSHOT_ACCESSIBILITY_CONSTRAINTS plugin capability, the SP MAY
  // choose where the provisioned snapshot is accessible from.
  TopologyRequirement accessibility_requirements = 5;
}

message Snapshot {
  // ... Existing fields

  // Specifies where (regions, zones, racks, etc.) the provisioned
  // snapshot is accessible from.
  // A plugin that returns this field MUST also set the
  // SNAPSHOT_ACCESSIBILITY_CONSTRAINTS plugin capability.
  // An SP MAY specify multiple topologies to indicate the snapshot is
  // accessible from multiple locations.
  // COs MAY use this information to ensure that a desired volume can
  // be provisioned from a given snapshot when scheduling workloads.
  // This field is OPTIONAL. If it is not specified, the CO MAY assume
  // the snapshot is equally accessible from all topologies in the
  // cluster and MAY provision volumes referencing the snapshot as a
  // source without topology constraints.
  repeated Topology accessible_topology = 7;
}
Additionally, a new error condition is added to the CreateSnapshot Errors table:
| Condition | gRPC Code | Description | Recovery Behavior |
|---|---|---|---|
| Unable to create snapshot in accessibility_requirements | 8 RESOURCE_EXHAUSTED | Indicates that although the accessibility_requirements field is valid, a snapshot cannot be created with the specified topology constraints. More human-readable information MAY be provided in the gRPC status.message field. | Caller MUST ensure that whatever is preventing snapshots from being created in the specified location (e.g. quota issues) is addressed before retrying with exponential backoff. |
This mirrors the existing error condition defined for CreateVolume when topology cannot be satisfied.
Volume Snapshot Components Changes
There will be changes required in the Volume Snapshot CRDs and the CSI snapshotter sidecar controller.
Volume Snapshot CRD
Add topology field to VolumeSnapshotContentSpec object:
type VolumeSnapshotContentSpec struct {
	// ... existing fields ...

	// nodeAffinity defines the node topologies from which a volume can
	// be provisioned using this snapshot as a source. This is derived from the
	// CSI driver's CreateSnapshotResponse. For WaitForFirstConsumer volume
	// binding, the scheduler plugin compares these terms against node labels
	// to filter candidate nodes (alongside the built-in volumebinding plugin
	// which also considers StorageClass.AllowedTopologies). For Immediate
	// volume binding, the external-provisioner intersects these terms with
	// StorageClass.AllowedTopologies to select a compatible provisioning
	// topology.
	// This field is mutable to support topology changes during the snapshot
	// lifecycle (e.g., snapshot replication to additional regions, admin
	// corrections for statically provisioned snapshots). Changes only affect
	// future scheduling and provisioning - already-running workloads are not
	// impacted.
	// +optional
	NodeAffinity []v1.TopologySelectorTerm `json:"nodeAffinity,omitempty" protobuf:"bytes,7,rep,name=nodeAffinity"`
}
Example VolumeSnapshotContent with topology:
Name: snapcontent-123-456-789
Namespace:
Labels: <none>
Annotations: <none>
API Version: snapshot.storage.k8s.io/v1
Kind: VolumeSnapshotContent
Metadata:
  Creation Timestamp: 2030-02-12T00:06:57Z
  Finalizers:
    snapshot.storage.kubernetes.io/volumesnapshotcontent-bound-protection
  Generation: 1
  Resource Version: 345-678-912
Spec:
  Deletion Policy: Delete
  Driver: ebs.csi.aws.com
  Source:
    Volume Handle: vol-123456789
  Source Volume Mode: Filesystem
  Volume Snapshot Class Name: csi-aws-vsc
  Volume Snapshot Ref:
    API Version: snapshot.storage.k8s.io/v1
    Kind: VolumeSnapshot
    Name: ebs-volume-snapshot
    Namespace: default
    Resource Version: 111691
    UID: 123-456-789
  # Topology field - snapshot accessible from two zones
  Node Affinity:
    - matchLabelExpressions:
        - key: topology.kubernetes.io/region
          values:
            - us-west-2
        - key: topology.kubernetes.io/zone
          values:
            - us-west-2a
            - us-west-2b
Status:
  Creation Time: 1234567890000000
  Ready To Use: true
  Restore Size: 4294967296
  Snapshot Handle: snap-123456789
Events: <none>
VolumeSnapshotClass CRD
Add an optional AllowedTopologies field to VolumeSnapshotClass so that admins can restrict the node topologies from which snapshots created with this class are accessible. This mirrors the AllowedTopologies field on StorageClass and maps to the AccessibilityRequirements field in the CSI CreateSnapshotRequest.
type VolumeSnapshotClass struct {
	// ... existing fields ...

	// allowedTopologies restricts the node topologies where snapshots created
	// using this class are accessible from. Each volume plugin defines its own
	// supported topology specifications. An empty list means there is no
	// topology restriction. This is passed to the CSI driver as
	// AccessibilityRequirements in the CreateSnapshotRequest.
	// +optional
	AllowedTopologies []v1.TopologySelectorTerm `json:"allowedTopologies,omitempty" protobuf:"bytes,5,rep,name=allowedTopologies"`
}
Example VolumeSnapshotClass with allowed topologies:
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-aws-vsc
driver: ebs.csi.aws.com
deletionPolicy: Delete
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.kubernetes.io/region
        values:
          - us-west-2
When present, the sidecar controller reads AllowedTopologies from the class and converts it into CSI CreateSnapshotRequest.AccessibilityRequirements. If not specified, the request is sent without topology requirements.
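The conversion from AllowedTopologies to CSI topology requirements could look roughly like the following. This is a minimal sketch, not the sidecar's implementation: the types are local stand-ins for v1.TopologySelectorTerm and csi.Topology so that the example compiles on its own, and expanding every combination of label values into its own segment map is an assumption about how a helper such as allowedTopologiesToCSI might behave.

```go
package main

// TopologySelectorLabelRequirement and TopologySelectorTerm are local
// stand-ins for the k8s.io/api/core/v1 types of the same name.
type TopologySelectorLabelRequirement struct {
	Key    string
	Values []string
}

type TopologySelectorTerm struct {
	MatchLabelExpressions []TopologySelectorLabelRequirement
}

// Topology is a local stand-in for the CSI Topology message.
type Topology struct {
	Segments map[string]string
}

// allowedTopologiesToCSI (hypothetical) expands each term into one Topology
// per combination of label values, e.g. region X with zones {a, b} becomes
// two segment maps: {region: X, zone: a} and {region: X, zone: b}.
func allowedTopologiesToCSI(terms []TopologySelectorTerm) []*Topology {
	var out []*Topology
	for _, term := range terms {
		// Start from one empty segment map and extend it key by key.
		combos := []map[string]string{{}}
		for _, expr := range term.MatchLabelExpressions {
			var next []map[string]string
			for _, base := range combos {
				for _, v := range expr.Values {
					seg := make(map[string]string, len(base)+1)
					for k, bv := range base {
						seg[k] = bv
					}
					seg[expr.Key] = v
					next = append(next, seg)
				}
			}
			combos = next
		}
		for _, seg := range combos {
			if len(seg) > 0 { // skip terms that carried no expressions
				out = append(out, &Topology{Segments: seg})
			}
		}
	}
	return out
}
```

Under this assumption, a term listing one region and two zones yields two topologies, each carrying both the region and one zone.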
Snapshotter (pkg/snapshotter)
Update the Snapshotter interface and implementation to accept topology requirements as input and return topology from the CSI driver response.
// Updated Snapshotter interface
type Snapshotter interface {
	CreateSnapshot(ctx context.Context, snapshotName string, volumeHandle string, parameters map[string]string, snapshotterCredentials map[string]string, /* new parameter */ accessibilityRequirements *csi.TopologyRequirement) (driverName string, snapshotId string, timestamp time.Time, size int64, readyToUse bool, /* new return value */ accessibleTopology []*csi.Topology, err error)
	// ... existing methods unchanged ...
}

func (s *snapshot) CreateSnapshot(ctx context.Context, snapshotName string, volumeHandle string, parameters map[string]string, snapshotterCredentials map[string]string, /* new parameter */ accessibilityRequirements *csi.TopologyRequirement) (string, string, time.Time, int64, bool, /* new return value */ []*csi.Topology, error) {
	klog.V(5).Infof("CSI CreateSnapshot: %s", snapshotName)
	client := csi.NewControllerClient(s.conn)
	// ... Existing Code ...
	req := csi.CreateSnapshotRequest{
		SourceVolumeId:            volumeHandle,
		Name:                      snapshotName,
		Parameters:                parameters,
		Secrets:                   snapshotterCredentials,
		AccessibilityRequirements: accessibilityRequirements, // optional new field.
	}
	rsp, err := client.CreateSnapshot(ctx, &req)
	if err != nil {
		return "", "", time.Time{}, 0, false, nil, err
	}
	// ... Existing Code ...
	creationTime := rsp.Snapshot.CreationTime.AsTime()
	return driverName, rsp.Snapshot.SnapshotId, creationTime, rsp.Snapshot.SizeBytes, rsp.Snapshot.ReadyToUse, /* new return value containing topology from CreateSnapshotResponse */ rsp.Snapshot.AccessibleTopology, nil
}
CSI Handler (pkg/sidecar-controller/csi_handler.go)
Update Handler interface and implementation to accept topology requirements and return topology.
type Handler interface {
	CreateSnapshot(content *crdv1.VolumeSnapshotContent, parameters map[string]string, snapshotterCredentials map[string]string, /* new parameter */ accessibilityRequirements *csi.TopologyRequirement) (string, string, time.Time, int64, bool, /* new return value */ []*csi.Topology, error)
	// ... existing methods unchanged ...
}

func (handler *csiHandler) CreateSnapshot(content *crdv1.VolumeSnapshotContent, parameters map[string]string, snapshotterCredentials map[string]string, /* new parameter */ accessibilityRequirements *csi.TopologyRequirement) (string, string, time.Time, int64, bool, /* new return value */ []*csi.Topology, error) {
	// ... existing validation ...
	return handler.snapshotter.CreateSnapshot(ctx, snapshotName, *content.Spec.Source.VolumeHandle, parameters, snapshotterCredentials, accessibilityRequirements)
}
CSI Snapshotter (pkg/sidecar-controller/snapshot_controller.go)
Update createSnapshotWrapper to read topology requirements from the VolumeSnapshotClass, pass them to the CSI handler, and persist the resulting topology to the VolumeSnapshotContent.
func (ctrl *csiSnapshotSideCarController) createSnapshotWrapper(content *crdv1.VolumeSnapshotContent) (*crdv1.VolumeSnapshotContent, error) {
	klog.Infof("createSnapshotWrapper: Creating snapshot for content %s through the plugin ...", content.Name)
	// ... existing code ...

	// Try to build requirements from VolumeSnapshotClass if feature gate is enabled
	var accessibilityRequirements *csi.TopologyRequirement
	if feature.DefaultFeatureGate.Enabled(features.VolumeSnapshotTopology) && class != nil && len(class.AllowedTopologies) > 0 {
		accessibilityRequirements = &csi.TopologyRequirement{
			Preferred: allowedTopologiesToCSI(class.AllowedTopologies),
		}
	}

	driverName, snapshotID, creationTime, size, readyToUse, accessibleTopology, err := ctrl.handler.CreateSnapshot(content, parameters, snapshotterCredentials, accessibilityRequirements)
	if err != nil {
		// ... existing error handling ...
	}
	// ... existing status update and annotation removal ...

	// Patch VolumeSnapshotContent Spec with topology from CSI driver response
	if feature.DefaultFeatureGate.Enabled(features.VolumeSnapshotTopology) && len(accessibleTopology) > 0 {
		terms := convertCSITopologyToTerms(accessibleTopology)
		if len(terms) > 0 {
			patches := []utils.PatchOp{
				{
					Op:    "add",
					Path:  "/spec/nodeAffinity",
					Value: terms,
				},
			}
			content, err = utils.PatchVolumeSnapshotContent(content, patches, ctrl.clientset, "")
			if err != nil {
				return content, fmt.Errorf("failed to patch topology for volume snapshot content %s: %v", content.Name, err)
			}
		}
	}
	return content, nil
}

// convertCSITopologyToTerms converts CSI Topology segments into the
// []v1.TopologySelectorTerm shape used by VolumeSnapshotContentSpec.NodeAffinity.
func convertCSITopologyToTerms(csiTopology []*csi.Topology) []v1.TopologySelectorTerm {
	var terms []v1.TopologySelectorTerm
	// ... any conversion necessary
	return terms
}
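The body of convertCSITopologyToTerms is left open above. One possible shape is sketched below, under stated assumptions: the types are local stand-ins for csi.Topology and v1.TopologySelectorTerm, and each returned topology becomes its own term with one single-valued expression per segment key. A real implementation might instead merge topologies that differ in only one key into a single term with several values; that choice is not settled by this KEP.

```go
package main

import "sort"

// Topology is a local stand-in for the CSI Topology message.
type Topology struct {
	Segments map[string]string
}

// TopologySelectorLabelRequirement and TopologySelectorTerm are local
// stand-ins for the k8s.io/api/core/v1 types of the same name.
type TopologySelectorLabelRequirement struct {
	Key    string
	Values []string
}

type TopologySelectorTerm struct {
	MatchLabelExpressions []TopologySelectorLabelRequirement
}

// convertCSITopologyToTerms maps each CSI topology to one selector term.
// Keys are sorted so that repeated calls produce identical output, which
// avoids spurious patches to the VolumeSnapshotContent.
func convertCSITopologyToTerms(csiTopology []*Topology) []TopologySelectorTerm {
	var terms []TopologySelectorTerm
	for _, t := range csiTopology {
		if t == nil || len(t.Segments) == 0 {
			continue // nothing to express for this entry
		}
		keys := make([]string, 0, len(t.Segments))
		for k := range t.Segments {
			keys = append(keys, k)
		}
		sort.Strings(keys)

		var term TopologySelectorTerm
		for _, k := range keys {
			term.MatchLabelExpressions = append(term.MatchLabelExpressions,
				TopologySelectorLabelRequirement{Key: k, Values: []string{t.Segments[k]}})
		}
		terms = append(terms, term)
	}
	return terms
}
```

With this shape, a driver that reports two zones produces two terms, and a node needs to match only one of them.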
Scheduler Plugin (at kubernetes-sigs/scheduler-plugins repo)
Why a scheduler plugin is necessary
Storing topology on VolumeSnapshotContent alone is not sufficient to prevent provisioning failures. When a PVC references a snapshot as its data source and uses WaitForFirstConsumer volume binding, the scheduler selects a node before provisioning begins. The external-provisioner then provisions the volume in a topology derived from the selected node. If that topology is incompatible with the snapshot’s NodeAffinity, the CSI CreateVolume call may fail because the source snapshot is not accessible from the chosen topology.
The scheduler’s built-in volumebinding plugin has no awareness of snapshot topology today. It only considers the StorageClass’s allowedTopologies and the node’s own constraints. This means it can select a node in foo-bar-1 even though the snapshot source is only accessible from foo-bar-2.
Attempting to solve this at the provisioner level (e.g., in the external-provisioner) would only allow us to fail after the scheduler has already made its decision. The provisioner could reject the request and trigger a reschedule, but this creates a retry loop: the scheduler picks a node, provisioning fails, the scheduler picks another node, and so on with no guarantee of convergence.
The proposed solution with this scheduler plugin is to filter out incompatible nodes during scheduling, before provisioning begins. Because the plugin needs to read VolumeSnapshot and VolumeSnapshotContent CRDs (which are not core Kubernetes API types), it cannot be added to the in-tree kube-scheduler. The plugin therefore lives as an out-of-tree plugin along with the various other scheduler plugins in kubernetes-sigs/scheduler-plugins.
High Level Plugin Design (Based on Scheduling Framework)
The plugin implements two extension points: PreFilter and Filter.
PreFilter runs once per scheduling cycle. It inspects the pod’s PVC volumes, resolves any snapshot data sources to their VolumeSnapshotContent, and caches the NodeAffinity in the CycleState. This avoids repeating the lookup for every candidate node.
Filter runs once per candidate node. It retrieves the cached snapshot topology from CycleState and checks whether the node’s labels satisfy any of the snapshot’s NodeAffinity terms (the same label-expression matching used for StorageClass.AllowedTopologies). If no term matches, the node is marked Unschedulable.
If a VolumeSnapshotContent has no NodeAffinity set (e.g., the CSI driver does not support topology, or the feature gate was disabled when the snapshot was created), the plugin does not filter any nodes and the core scheduler behaves as usual.
// SnapshotTopology is an out-of-tree scheduler plugin that filters nodes
// based on snapshot accessible topology when a PVC references a snapshot
// as its data source.
type SnapshotTopology struct { ... }
// PreFilter resolves PVC -> VolumeSnapshot -> VolumeSnapshotContent for all
// snapshot-sourced PVCs in the pod, reads NodeAffinity, and caches the
// result in CycleState.
func (pl *SnapshotTopology) PreFilter(ctx context.Context, state *framework.CycleState, pod *v1.Pod) (*framework.PreFilterResult, *framework.Status) { ... }
// Filter checks whether the candidate node's labels satisfy any of the
// cached snapshot NodeAffinity terms (via TopologySelectorTerm
// match expressions). Returns Unschedulable if no term matches.
func (pl *SnapshotTopology) Filter(ctx context.Context, state *framework.CycleState, pod *v1.Pod, nodeInfo *framework.NodeInfo) *framework.Status { ... }
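The matching step inside Filter can be illustrated in isolation. The sketch below is an assumption about how the check might be written, not the plugin's code: the types are local stand-ins for v1.TopologySelectorTerm, the function names are hypothetical, and the node is represented only by its label map. Terms are treated as alternatives (OR), while the expressions inside one term must all hold (AND).

```go
package main

// TopologySelectorLabelRequirement and TopologySelectorTerm are local
// stand-ins for the k8s.io/api/core/v1 types of the same name.
type TopologySelectorLabelRequirement struct {
	Key    string
	Values []string
}

type TopologySelectorTerm struct {
	MatchLabelExpressions []TopologySelectorLabelRequirement
}

// contains reports whether want appears in values.
func contains(values []string, want string) bool {
	for _, v := range values {
		if v == want {
			return true
		}
	}
	return false
}

// termMatches reports whether every expression in the term is satisfied
// by the node labels. A term without expressions matches nothing.
func termMatches(term TopologySelectorTerm, nodeLabels map[string]string) bool {
	if len(term.MatchLabelExpressions) == 0 {
		return false
	}
	for _, expr := range term.MatchLabelExpressions {
		val, ok := nodeLabels[expr.Key]
		if !ok || !contains(expr.Values, val) {
			return false
		}
	}
	return true
}

// nodeSatisfiesSnapshotTopology (hypothetical) returns true when the snapshot
// carries no NodeAffinity, or when at least one term matches the node.
func nodeSatisfiesSnapshotTopology(terms []TopologySelectorTerm, nodeLabels map[string]string) bool {
	if len(terms) == 0 {
		return true // no topology recorded: do not filter
	}
	for _, term := range terms {
		if termMatches(term, nodeLabels) {
			return true
		}
	}
	return false
}
```

A Filter implementation would return an Unschedulable status for a node when this check is false, and success otherwise.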
External-Provisioner Changes (Immediate Volume Binding)
For PVCs with Immediate volume binding mode that reference a snapshot as their data source, the scheduler is not involved in topology selection. Provisioning happens immediately without waiting for a pod. In this case, the external-provisioner is responsible for selecting a compatible topology.
This behavior is gated behind the same VolumeSnapshotTopology feature gate in the external-provisioner. When the gate is disabled, the provisioner ignores the NodeAffinity field and uses existing behavior regardless of whether topology data is present on the VolumeSnapshotContent.
When the feature gate is enabled and the external-provisioner handles a PVC with a snapshot data source and Immediate binding, it will:
- Resolve the PVC’s dataSource to the VolumeSnapshotContent (it already does this to get the snapshot handle).
- Read the VolumeSnapshotContent.Spec.NodeAffinity field.
- Intersect the snapshot’s NodeAffinity with the StorageClass.AllowedTopologies to determine a set of compatible topologies for provisioning.
- Pass the intersected topology as AccessibilityRequirements in the CSI CreateVolumeRequest.
If the intersection is empty (no topology satisfies both constraints), the provisioner will fail the provision with an event indicating topology incompatibility between the snapshot and the StorageClass. This fails with a clear actionable error rather than deferring to a CSI driver failure.
If the VolumeSnapshotContent has no NodeAffinity set, the provisioner falls back to existing behavior (using only StorageClass.AllowedTopologies).
// In the external-provisioner's Provision() flow for Immediate + snapshot dataSource:
func (p *csiProvisioner) buildTopologyRequirements(sc *storagev1.StorageClass, vsc *snapshotv1.VolumeSnapshotContent) (*csi.TopologyRequirement, error) {
	scTopology := sc.AllowedTopologies
	snapTopology := vsc.Spec.NodeAffinity

	if len(snapTopology) == 0 {
		// No snapshot topology then fall back to existing behavior
		return buildFromAllowedTopologies(scTopology), nil
	}
	if len(scTopology) == 0 {
		// No StorageClass restriction then use snapshot topology directly
		return buildFromAllowedTopologies(snapTopology), nil
	}

	// Intersect: only topologies that satisfy both constraints
	intersected := intersectTopologySelectorTerms(scTopology, snapTopology)
	if len(intersected) == 0 {
		return nil, fmt.Errorf("no compatible topology: StorageClass.AllowedTopologies and snapshot NodeAffinity have no intersection")
	}
	return buildFromAllowedTopologies(intersected), nil
}
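The intersection helper used above is not spelled out in this KEP. A minimal sketch of one way it might work follows, with local stand-in types and the assumption that each key appears at most once per term. Every pair of terms (one from each list) is combined: keys present on both sides keep only their common values, keys present on one side are carried over, and a pair with an empty common value set is dropped.

```go
package main

// TopologySelectorLabelRequirement and TopologySelectorTerm are local
// stand-ins for the k8s.io/api/core/v1 types of the same name.
type TopologySelectorLabelRequirement struct {
	Key    string
	Values []string
}

type TopologySelectorTerm struct {
	MatchLabelExpressions []TopologySelectorLabelRequirement
}

// intersectValues returns the values of x that also appear in y, in x's order.
func intersectValues(x, y []string) []string {
	inY := make(map[string]bool, len(y))
	for _, v := range y {
		inY[v] = true
	}
	var out []string
	for _, v := range x {
		if inY[v] {
			out = append(out, v)
		}
	}
	return out
}

// intersectTerm combines two terms; ok is false when they cannot both hold.
func intersectTerm(a, b TopologySelectorTerm) (merged TopologySelectorTerm, ok bool) {
	bByKey := make(map[string][]string, len(b.MatchLabelExpressions))
	for _, e := range b.MatchLabelExpressions {
		bByKey[e.Key] = e.Values
	}
	seen := make(map[string]bool, len(a.MatchLabelExpressions))
	for _, e := range a.MatchLabelExpressions {
		seen[e.Key] = true
		values := e.Values
		if bv, shared := bByKey[e.Key]; shared {
			values = intersectValues(e.Values, bv)
			if len(values) == 0 {
				return TopologySelectorTerm{}, false // disjoint on this key
			}
		}
		merged.MatchLabelExpressions = append(merged.MatchLabelExpressions,
			TopologySelectorLabelRequirement{Key: e.Key, Values: values})
	}
	for _, e := range b.MatchLabelExpressions {
		if !seen[e.Key] {
			merged.MatchLabelExpressions = append(merged.MatchLabelExpressions, e)
		}
	}
	return merged, len(merged.MatchLabelExpressions) > 0
}

// intersectTopologySelectorTerms returns every satisfiable pairwise combination.
func intersectTopologySelectorTerms(a, b []TopologySelectorTerm) []TopologySelectorTerm {
	var out []TopologySelectorTerm
	for _, ta := range a {
		for _, tb := range b {
			if merged, ok := intersectTerm(ta, tb); ok {
				out = append(out, merged)
			}
		}
	}
	return out
}
```

An empty result corresponds to the failure case described above, where no topology satisfies both the StorageClass and the snapshot.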
What happens with statically provisioned snapshots?
Since there is no CreateSnapshot call for snapshots that are statically provisioned, operators must manually set their desired topology specifications on the VolumeSnapshotContent.
Error Handling
Topology requirement cannot be satisfied (e.g., no capacity in requested region/zone):
The CSI driver returns gRPC error code 8 RESOURCE_EXHAUSTED from CreateSnapshot, indicating that although the accessibility_requirements field is valid, a snapshot cannot be created with the specified topology constraints. This mirrors the same error condition defined for CreateVolume when topology cannot be satisfied. The sidecar controller sets the error on VolumeSnapshotContent.Status.Error, emits a warning event, and retries with exponential backoff. No topology is written to the Spec since the snapshot was not created. The caller MUST ensure that whatever is preventing snapshots from being created in the specified location (e.g., quota issues) is addressed before the retry will succeed.
- User-facing: The user sees the error on VolumeSnapshotContent.Status.Error and a warning event with the gRPC RESOURCE_EXHAUSTED message. The user should verify capacity/quota in the requested topology or recreate the VolumeSnapshot using a VolumeSnapshotClass without the topology constraints.
CSI driver does not return topology in the response:
If the driver does not populate accessible_topology in the CreateSnapshotResponse, the sidecar controller simply skips the topology patch. The VolumeSnapshotContent is created successfully without topology information. This ensures backward compatibility with drivers that do not support topology.
No nodes match the snapshot’s NodeAffinity (scheduler plugin):
All candidate nodes are filtered out by the plugin because no node’s labels satisfy any of the snapshot’s NodeAffinity terms. The pod remains unschedulable with an event indicating the topology mismatch. This is a permanent failure unless new nodes are added in a compatible topology.
- User-facing: The pod remains in Pending state with a scheduler event indicating no nodes satisfy the snapshot topology constraints. The user should add nodes in a compatible topology or use a different snapshot whose NodeAffinity matches available nodes.
Snapshot NodeAffinity incompatible with StorageClass.AllowedTopologies (Immediate binding):
The external-provisioner detects that the intersection of StorageClass.AllowedTopologies and the snapshot’s NodeAffinity is empty. Provisioning fails immediately with an event indicating the topology incompatibility.
- User-facing: The user sees a warning event on the PVC indicating topology incompatibility between the snapshot and StorageClass. The user should use a StorageClass with AllowedTopologies compatible with the snapshot’s NodeAffinity, or use a different snapshot.
VolumeSnapshotContent has an empty topology field: The default scheduler behavior applies, so snapshot topology is not taken into account when filtering nodes.
Test Plan
[X] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.
Prerequisite testing updates
Unit tests
This enhancement is entirely out-of-tree, so testgrid coverage data does not apply. The packages we will be touching and their current coverage:
- github.com/kubernetes-csi/external-snapshotter/pkg/sidecar-controller: 2026-06-15 - 69.8%
- github.com/kubernetes-csi/external-snapshotter/pkg/snapshotter: 2026-06-15 - 90.2%
- github.com/kubernetes-csi/external-provisioner/pkg/controller: 2026-06-15 - 80.9%
- sigs.k8s.io/scheduler-plugins/pkg/<snapshot-topology>: New package (does not exist yet)
Since the e2e framework does not currently support enabling or disabling feature gates, unit tests will exercise both states of the feature gate and the handling of the relevant data.
Additionally, unit tests for the scheduler plugin will cover:
- PreFilter correctly resolves PVC → VolumeSnapshot → VolumeSnapshotContent and caches NodeAffinity.
- PreFilter is a no-op when the pod has no snapshot-sourced PVCs.
- Filter rejects nodes whose labels do not satisfy any term in the snapshot’s NodeAffinity.
- Filter passes nodes whose labels satisfy at least one NodeAffinity term.
- Filter is a no-op when NodeAffinity is empty (backward compatibility).
Integration tests
This enhancement is entirely out-of-tree (external-snapshotter, external-provisioner, and kubernetes-sigs/scheduler-plugins). There are no changes to kubernetes/kubernetes, so integration tests in test/integration/ do not apply. The equivalent integration-level testing will be done in the respective out-of-tree repositories’ test suites, covering the interaction between the sidecar controller, CRDs, and the scheduler plugin.
e2e tests
- Test the e2e workflow with the feature gate enabled and verify that the NodeAffinity field of VolumeSnapshotContent objects is populated.
- Test that a pod with a PVC referencing a snapshot as its data source is scheduled to a node whose topology is compatible with the snapshot’s NodeAffinity.
- Test that a pod remains unschedulable when no nodes match the snapshot’s NodeAffinity.
- Test that Immediate binding mode with a snapshot data source provisions in a topology compatible with the snapshot’s NodeAffinity.
- Test multi-zone cluster with partial topology coverage: snapshot accessible in zone-a only, nodes exist in zone-a and zone-b. Verify pod is scheduled to zone-a.
- Test topology changes during snapshot lifecycle: update NodeAffinity on a VolumeSnapshotContent to add a new zone, verify subsequent scheduling considers the updated topology.
- Test scheduler plugin failure modes: VolumeSnapshot or VolumeSnapshotContent not found (e.g., deleted mid-scheduling), plugin should not block scheduling of unrelated pods.
- Test that CSI drivers which do not advertise SNAPSHOT_ACCESSIBILITY_CONSTRAINTS are unaffected: NodeAffinity is not populated, scheduling behavior is unchanged.
- Test that snapshots accessible in any zone (driver returns empty accessible_topology) do not restrict scheduling.
Graduation Criteria
Alpha
- Feature implemented behind a feature flag.
- Scheduler plugin implemented in kubernetes-sigs/scheduler-plugins.
- Initial unit/e2e tests completed and enabled.
Beta
- Allowing time for feedback (at least 2 releases between beta and GA).
- All unit tests/integration/e2e tests completed and enabled.
- Validate that the NodeAffinity field is being accurately populated.
- Validate snapshot-controller behavior with and without volume snapshot topology enabled.
- Validate scheduler plugin correctly filters nodes based on snapshot topology.
- Evaluate whether a size limit on the NodeAffinity field is necessary to prevent object bloat from misbehaving CSI drivers, and implement validation/truncation if needed based on alpha feedback.
- Add sidecar-level validation of topology data from CSI drivers before patching NodeAffinity (e.g., valid label key format, valid label value length, non-empty keys). For alpha, the sidecar passes through what the driver returns.
- Investigate compatibility with Topology Aware Workload Scheduling (TAS/WAS) and the scheduler plugin.
- Complete detailed feature gate skew analysis across all three components (csi-snapshotter, external-provisioner, scheduler plugin).
- Investigate using ValidatingAdmissionPolicy/MutatingAdmissionPolicy to enforce API-level gating of the NodeAffinity field on the CRD when the feature gate is disabled.
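As an illustration of the sidecar-level validation listed above, the following Python sketch checks topology segments against the standard Kubernetes label syntax (optional DNS-subdomain prefix of at most 253 characters, name and value of at most 63 characters). The function name `validate_segments` and the error strings are hypothetical, and the checks the sidecar will actually perform are still to be decided for beta.

```python
import re
from typing import Dict, List

# Kubernetes label name/value syntax: alphanumeric at both ends,
# with '-', '_' and '.' allowed in between.
_NAME_RE = re.compile(r"([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]")
# DNS subdomain syntax used for the optional key prefix.
_DNS_SUBDOMAIN_RE = re.compile(
    r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*"
)


def validate_segments(segments: Dict[str, str]) -> List[str]:
    """Return a list of problems found in one CSI topology segment map."""
    errors: List[str] = []
    for key, value in segments.items():
        if not key:
            errors.append("empty topology key")
            continue
        # Split "prefix/name"; keys without '/' have no prefix.
        prefix, sep, name = key.rpartition("/")
        if sep and (len(prefix) > 253 or not _DNS_SUBDOMAIN_RE.fullmatch(prefix)):
            errors.append(f"invalid key prefix: {key!r}")
        if len(name) > 63 or not _NAME_RE.fullmatch(name):
            errors.append(f"invalid key name: {key!r}")
        # Label values may be empty, but must otherwise follow the name syntax.
        if len(value) > 63 or (value and not _NAME_RE.fullmatch(value)):
            errors.append(f"invalid value for {key!r}: {value!r}")
    return errors


print(validate_segments({"topology.kubernetes.io/zone": "zone-a"}))  # []
print(validate_segments({"zone": "x" * 64}))
```

Whether invalid data should cause the patch to be rejected, truncated, or only logged is a design decision this sketch leaves open.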
GA
- No bug reports / feedback / improvements to address.
- Scheduler plugin has been deployed and validated in production environments.
Deprecation
No deprecation plan.
Upgrade / Downgrade Strategy
Upgrade Strategy:
- Upgrade to the external-snapshotter version that has the updated controller behavior and CRDs.
- Make sure to have the VolumeSnapshotTopology feature gate enabled.
- Deploy the scheduler with the SnapshotTopology plugin enabled (from kubernetes-sigs/scheduler-plugins) to enforce topology-aware scheduling.
Downgrade Strategy:
- Disable the VolumeSnapshotTopology feature gate and restart snapshot-controller. New snapshots will no longer have NodeAffinity populated.
- Remove the SnapshotTopology scheduler plugin from the scheduler configuration. Scheduling will revert to the default behavior with no snapshot topology filtering. Existing VolumeSnapshotContent objects that already have NodeAffinity set will retain the field, but it will not be enforced.
Version Skew Strategy
Since the changes of this enhancement span three independent components (the csi-snapshotter sidecar, the external-provisioner, and the out-of-tree scheduler plugin), the following version skew scenarios apply:
- Scheduler plugin deployed without updated csi-snapshotter: The NodeAffinity field will not be populated on VolumeSnapshotContent objects. The scheduler plugin will see no topology data and will not filter any nodes. Scheduling behavior is unchanged; this is safe.
- Updated csi-snapshotter deployed without scheduler plugin: The NodeAffinity field will be populated on new VolumeSnapshotContent objects, but no scheduling enforcement occurs for WaitForFirstConsumer binding. The topology data is informational only. Provisioning may still fail if the selected node's topology is incompatible with the snapshot, which is the same behavior as today.
- Updated csi-snapshotter deployed without updated external-provisioner: The NodeAffinity field will be populated on VolumeSnapshotContent objects, but Immediate binding mode will not perform topology intersection. The provisioner continues with existing behavior. This is safe and matches today's behavior.
- Updated external-provisioner deployed without updated csi-snapshotter: The provisioner checks for NodeAffinity, but the field will be empty on all VolumeSnapshotContent objects. The provisioner falls back to existing behavior (using only StorageClass.AllowedTopologies). This is safe.
- All components deployed: Full functionality. Topology is populated by the csi-snapshotter, enforced during scheduling by the plugin (WaitForFirstConsumer), and enforced during provisioning by the external-provisioner (Immediate binding).
All components must be at the minimum version that includes the topology changes for full functionality. The recommended deployment order is to upgrade the csi-snapshotter first (so topology data is available), then deploy the scheduler plugin and update the external-provisioner.
A more detailed analysis of feature gate skew scenarios (e.g., feature gate enabled on one component but disabled on another) will be completed as a Beta graduation requirement.
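The deployment-skew scenarios in this section can be condensed into a small decision function. The Python sketch below is only a summary aid: `Behavior` and `skew_behavior` are hypothetical names, and feature gate skew (a gate enabled on one component but not another) is deliberately not modeled.

```python
from typing import NamedTuple


class Behavior(NamedTuple):
    topology_populated: bool   # NodeAffinity is written on new VolumeSnapshotContent objects
    wffc_enforced: bool        # scheduler filters nodes for WaitForFirstConsumer binding
    immediate_enforced: bool   # provisioner intersects topology for Immediate binding


def skew_behavior(snapshotter_updated: bool,
                  plugin_deployed: bool,
                  provisioner_updated: bool) -> Behavior:
    """Summarize cluster behavior for a given combination of deployed components."""
    # Only the updated csi-snapshotter populates topology; without it the
    # other components see no data and fall back to existing behavior.
    populated = snapshotter_updated
    return Behavior(
        topology_populated=populated,
        wffc_enforced=populated and plugin_deployed,
        immediate_enforced=populated and provisioner_updated,
    )


# Scheduler plugin deployed without the updated csi-snapshotter: nothing changes.
print(skew_behavior(snapshotter_updated=False, plugin_deployed=True, provisioner_updated=False))
# All components deployed: full functionality.
print(skew_behavior(snapshotter_updated=True, plugin_deployed=True, provisioner_updated=True))
```

Every combination either enforces topology or degrades to today's behavior, which is why upgrading the csi-snapshotter first is the recommended order.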
Production Readiness Review Questionnaire
Feature Enablement and Rollback
How can this feature be enabled / disabled in a live cluster?
- Feature gate (also fill in values in kep.yaml)
- Feature gate name: VolumeSnapshotTopology
- Components depending on the feature gate: snapshot-controller, csi-snapshotter, and external-provisioner.
Does enabling the feature change any default behavior?
Enabling the feature gate does not change any default behavior of the external-snapshotter components. All existing VolumeSnapshotContent (VSC) fields remain, and users will see topology information added to the VSC. However, if the scheduler plugin is deployed, scheduling behavior changes: pods with PVCs referencing snapshot data sources may be restricted to nodes whose topology is compatible with the snapshot's NodeAffinity, so pods that previously could schedule on any node may now be limited to a subset of nodes. Additionally, for Immediate binding mode with snapshot data sources, the external-provisioner will enforce topology compatibility. Provisioning may therefore fail if the StorageClass's AllowedTopologies are incompatible with the snapshot; previously this case would have surfaced as a less clear CSI driver error.
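The Immediate-binding check described above amounts to intersecting two sets of topology constraints. The Python sketch below shows one way such an intersection could work; it is not the external-provisioner's code. The `Term` model, `intersect_terms`, `intersect_topologies`, and `TopologyConflict` are all hypothetical, and both inputs are simplified to label-key-to-allowed-values maps.

```python
from typing import Dict, List, Optional, Set

# Hypothetical simplified model: each term maps a label key to allowed values.
Term = Dict[str, Set[str]]


class TopologyConflict(Exception):
    """Raised when StorageClass and snapshot topologies do not overlap."""


def intersect_terms(a: Term, b: Term) -> Optional[Term]:
    """Combine two terms; return None if they contradict on a shared key."""
    merged: Term = {}
    for key in a.keys() | b.keys():
        if key in a and key in b:
            values = a[key] & b[key]
            if not values:
                return None
        else:
            values = a[key] if key in a else b[key]
        merged[key] = set(values)
    return merged


def intersect_topologies(allowed: List[Term], snapshot: List[Term]) -> List[Term]:
    """Intersect StorageClass allowed topologies with snapshot topology."""
    if not snapshot:
        # No NodeAffinity on the snapshot: fall back to existing behavior.
        return allowed
    if not allowed:
        return snapshot
    result: List[Term] = []
    for a in allowed:
        for s in snapshot:
            merged = intersect_terms(a, s)
            if merged is not None:
                result.append(merged)
    if not result:
        raise TopologyConflict("allowed topologies do not overlap with snapshot topology")
    return result


ZONE = "topology.kubernetes.io/zone"
print(intersect_topologies([{ZONE: {"zone-a", "zone-b"}}], [{ZONE: {"zone-a"}}]))
```

Raising a dedicated error on an empty intersection is what lets the failure be reported clearly before any CSI call is made.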
Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
- Yes, the feature can be disabled by turning off the feature gate and restarting the snapshot-controller. New snapshots will no longer have NodeAffinity populated. The scheduler plugin can be independently removed from the scheduler configuration to disable topology enforcement. The external-provisioner will also stop enforcing topology intersection for Immediate binding. Existing VolumeSnapshotContent objects will retain their NodeAffinity field.
What happens if we reenable the feature if it was previously rolled back?
VolumeSnapshotContent objects created after the feature is re-enabled will again have topology information populated from the CreateSnapshotResponse. Objects created while the feature was disabled may not have it.
Are there any tests for feature enablement/disablement?
Yes, there will be a combination of e2e tests and unit tests. Tests will verify that when the VolumeSnapshotTopology feature gate is disabled: (1) the sidecar controller does not patch NodeAffinity onto VolumeSnapshotContent objects even if the CSI driver returns topology in the CreateSnapshotResponse, and (2) existing VolumeSnapshotContent objects that already have NodeAffinity set are not stripped of the field (data is preserved but no new topology is written).
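The two gate-disabled expectations can be expressed as a small reconcile function plus assertions. This Python sketch is illustrative only: `reconcile_node_affinity` is a hypothetical stand-in for the sidecar logic, objects are plain dicts, and `driver_topology` stands for the already-converted value derived from the CreateSnapshotResponse.

```python
import copy
from typing import Any, Dict, Optional


def reconcile_node_affinity(content: Dict[str, Any],
                            driver_topology: Optional[Dict[str, Any]],
                            gate_enabled: bool) -> Dict[str, Any]:
    """Return an updated copy of a VolumeSnapshotContent-like dict."""
    updated = copy.deepcopy(content)
    spec = updated.setdefault("spec", {})
    if not gate_enabled:
        # Gate off: write nothing new, and do not strip an existing value.
        return updated
    if driver_topology:
        spec["nodeAffinity"] = copy.deepcopy(driver_topology)
    return updated


topo = {"zones": ["zone-a"]}  # placeholder payload, not the real field schema

# (1) Gate disabled: driver-reported topology is not written.
fresh = reconcile_node_affinity({"spec": {}}, topo, gate_enabled=False)
print("nodeAffinity" in fresh["spec"])  # False

# (2) Gate disabled: an existing value is preserved.
kept = reconcile_node_affinity({"spec": {"nodeAffinity": topo}}, None, gate_enabled=False)
print(kept["spec"]["nodeAffinity"] == topo)  # True
```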
Rollout, Upgrade and Rollback Planning
How can a rollout or rollback fail? Can it impact already running workloads?
TBD after alpha.
What specific metrics should inform a rollback?
If users see failures when creating VolumeSnapshotContent objects, that is a red flag. Additionally, if pods with snapshot-sourced PVCs are unexpectedly stuck in a Pending state after the scheduler plugin is deployed, this may indicate that the plugin is filtering out nodes incorrectly.
Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
TBD after alpha.
Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
No deprecations and/or removals of features, APIs, fields of API types, or flags as part of this rollout.
Monitoring Requirements
How can an operator determine if the feature is in use by workloads?
An operator can determine that the feature is in use by checking whether the NodeAffinity field is set in the spec of VolumeSnapshotContent objects. For the scheduler plugin, operators can verify that it is working by confirming that pods with PVCs referencing snapshots are assigned to nodes that match the snapshot's NodeAffinity.
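One way to perform this check offline is to scan a JSON listing of VolumeSnapshotContent objects (for example, the parsed output of `kubectl get volumesnapshotcontents -o json`). The Python sketch below assumes the field is serialized as `spec.nodeAffinity`; the helper name `contents_with_topology` and the sample payload are hypothetical.

```python
from typing import Any, Dict, List


def contents_with_topology(content_list: Dict[str, Any]) -> List[str]:
    """Return names of VolumeSnapshotContent objects that carry topology."""
    names: List[str] = []
    for item in content_list.get("items", []):
        # Assumed JSON field name: spec.nodeAffinity.
        if item.get("spec", {}).get("nodeAffinity"):
            names.append(item["metadata"]["name"])
    return names


# Hypothetical sample data standing in for a real cluster listing.
sample = {
    "items": [
        {"metadata": {"name": "snapcontent-1"}, "spec": {"nodeAffinity": {"required": {}}}},
        {"metadata": {"name": "snapcontent-2"}, "spec": {}},
    ]
}
print(contents_with_topology(sample))  # ['snapcontent-1']
```

A non-empty result indicates that at least one snapshot was created with the feature enabled and a driver that reports topology.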
How can someone using this feature know that it is working for their instance?
- Events
- Event Reason:
- API .spec
- Field name: NodeAffinity
- Other (treat as last resort)
- Details:
What are the reasonable SLOs (Service Level Objectives) for the enhancement?
TBD after alpha.
What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
TBD after alpha.
- Metrics
- Metric name:
- [Optional] Aggregation method:
- Components exposing the metric:
- Other (treat as last resort)
- Details:
Are there any missing metrics that would be useful to have to improve observability of this feature?
TBD after alpha.
Dependencies
Does this feature depend on any specific services running in the cluster?
- Topology population depends on the external-snapshotter sidecar controller and a CSI driver that supports the SNAPSHOT_ACCESSIBILITY_CONSTRAINTS capability.
- Topology enforcement depends on the SnapshotTopology out-of-tree scheduler plugin being deployed. Without it, topology data is populated but not enforced.
Scalability
Will enabling / using this feature result in any new API calls?
Yes, one additional PATCH call to VolumeSnapshotContent per snapshot creation to set the NodeAffinity field on the Spec. This is a one-time call per snapshot, originating from the CSI snapshotter sidecar controller.
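To make the extra call concrete, the following Python sketch builds a JSON merge patch body from a driver's accessible topology. The field layout is an assumption: it mirrors PersistentVolume's `spec.nodeAffinity.required.nodeSelectorTerms`, with one term per topology and one "In" expression per segment, and may differ from the final VolumeSnapshotContent schema. `build_node_affinity_patch` is a hypothetical helper.

```python
import json
from typing import Dict, List, Optional


def build_node_affinity_patch(accessible_topology: List[Dict[str, str]]) -> Optional[str]:
    """Return a JSON merge patch body, or None when there is nothing to set."""
    if not accessible_topology:
        # Driver reported no constraints: skip the PATCH entirely.
        return None
    terms = [
        {
            "matchExpressions": [
                {"key": key, "operator": "In", "values": [value]}
                for key, value in sorted(segments.items())
            ]
        }
        for segments in accessible_topology
    ]
    # Assumed layout, modeled on PersistentVolume's VolumeNodeAffinity.
    patch = {"spec": {"nodeAffinity": {"required": {"nodeSelectorTerms": terms}}}}
    return json.dumps(patch, sort_keys=True)


body = build_node_affinity_patch([{"topology.kubernetes.io/zone": "zone-a"}])
print(body)
print(len(body.encode("utf-8")), "bytes")
```

Printing the encoded length gives a rough sense of the per-object size increase for a given number of topology segments.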
Will enabling / using this feature result in introducing new API types?
No, this feature adds a new field (NodeAffinity) to the existing VolumeSnapshotContentSpec type.
Will enabling / using this feature result in any new calls to the cloud provider?
No, the topology information is returned as part of the existing CSI CreateSnapshot response. No additional cloud provider calls are made.
Will enabling / using this feature result in increasing size or count of the existing API objects?
Yes, VolumeSnapshotContent objects will have an additional nodeAffinity field. The estimated increase is small and depends on the number of topology key-value pairs (e.g., region and zone).
Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
No, the additional PATCH call to set topology will likely add negligible latency to the snapshot creation flow.
Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, …) in any components?
No, the additional topology data is a small map stored on existing objects. There are no new computations, watchers, or reconciliation loops introduced.
Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
No, this feature only adds a small amount of data to existing API objects and does not affect node resources.
Troubleshooting
How does this feature react if the API server and/or etcd is unavailable?
TBD after alpha.
What are other known failure modes?
TBD after alpha.
What steps should be taken if SLOs are not being met to determine the problem?
TBD after alpha.
Implementation History
- 2025-11-19 - Enhancement was discussed in CSI Implementation Meeting.
- 2026-03-03 - Initial KEP PR is published along with corresponding issue.
Drawbacks
Alternatives
Inherit topology from the source PersistentVolume
An alternative approach considered was to have the common snapshot controller copy topology information from the source PersistentVolume's NodeAffinity into the VolumeSnapshotContent at creation time, rather than receiving it from the storage plugin (SP, i.e., the CSI driver) in the CreateSnapshotResponse.
This was ruled out because:
- Snapshot topology may differ from volume topology. A storage backend may have the option to store snapshots in a different location than the source volume.
- Single source of truth. Having the storage backend report topology directly avoids assumptions about the relationship between volume and snapshot placement, making the design more accurate and portable across different storage systems.