KEP-3299: KMS v2 Improvements

Implementation History
STABLE Implemented
Created 2022-05-09
Latest v1.29
Milestones
Alpha v1.25
Beta v1.27
Stable v1.29
Ownership
Owning SIG
SIG Auth
Participating SIGs
Primary Authors

Release Signoff Checklist

Items marked with (R) are required prior to targeting to a milestone / release.

  • (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
  • (R) KEP approvers have approved the KEP status as implementable
  • (R) Design details are appropriately documented
  • (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
    • e2e Tests for all Beta API Operations (endpoints)
    • (R) Ensure GA e2e tests meet requirements for Conformance Tests
    • (R) Minimum Two Week Window for GA e2e tests to prove flake free
  • (R) Graduation criteria is in place
  • (R) Production readiness review completed
  • (R) Production readiness review approved
  • “Implementation History” section is up-to-date for milestone
  • User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
  • Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

Summary

This KEP proposes the new v2 KeyManagementService service contract to:

  • enable partially automated key rotation for the latest key without API server restarts
  • improve KMS plugin health check reliability
  • improve observability of envelope operations between kube-apiserver, KMS plugins and KMS

Motivation

Performance: Today, when the kube-apiserver is restarted and a LIST call is made for secrets (or for any other resource encrypted at rest, which almost always includes secrets), LIST requests are processed serially and the data encryption key (DEK) cache is empty. As a result, informer initialization may take significant time because of the large number of consecutive trips to the KMS plugin -> external KMS for all the DEKs that have been generated so far. These serial calls can cause the KMS plugin to hit the external KMS rate limit and delay the overall readiness of the cluster. Currently, a DEK is generated for each object and is then encrypted using a KEK. This 1:1 mapping means that a burst of secret creation can also cause the KMS plugin to hit the external KMS rate limit for encrypt operations.

Rotation: Currently, rotating a KMS key for Kubernetes requires many manual steps and the process is error-prone. It requires deploying another instance of the KMS plugin with the new key, running side by side with the old instance, and adding a second entry for the new plugin to EncryptionConfiguration. Any change to the EncryptionConfiguration requires a kube-apiserver restart to take effect. For a single kube-apiserver configuration, this can lead to a brief period when the kube-apiserver is unavailable. The current rotation process requires multiple restarts of all kube-apiserver processes to ensure each server can decrypt and then encrypt using the new key. It requires multiple updates to the EncryptionConfiguration to move the new key to the second and then first entry in the keys array so that it is used for encryption in the config. It also requires running storage migration (either via the storage version migrator or a manual invocation of kubectl get secrets --all-namespaces -o json | kubectl replace -f -) to encrypt all existing Secrets with the new key, which can time out and leave the cluster in a state where it is still dependent on the old key.

Health Check & Status: Today, the health check from kube-apiserver to KMS plugin is an Encrypt operation followed by a Decrypt operation. These operations cost money in cloud environments and do not allow the plugin to perform more holistic checks of whether it is healthy. Furthermore, a plugin has no way to inform the API server if its underlying key encryption key (KEK) has been rotated. If we provide a separate status RPC call with its own StatusRequest and StatusResponse, the KMS plugin can indicate the change in KEK version as part of the response. This could be an indication that the KEK has been rotated and storage migration is now required.

Observability: The only way to correlate a successful/failed envelope operation today is to use the approximate timestamp of the operation to check events in kube-apiserver, kms-plugin and KMS. There is no guarantee that the timestamp of the operation is the same as the timestamp of the corresponding event in KMS. This KEP proposes extending the signature of the kms-plugin interface to include the transaction ID (to be generated by the kube-apiserver), which kms-plugin could pass to KMS. This transaction ID will be logged in the kube-apiserver with additional metadata such as secret name and namespace for the envelope operation. Similarly, the transaction ID will be logged in the kms-plugin and optionally passed to KMS.

Goals

  • improve readiness times for clusters with a large number of encrypted resources
  • reduce the likelihood of hitting the KMS rate limit
  • enable partially automated key rotation for the latest key without API server restarts
  • improve KMS plugin health check reliability
  • improve observability of envelope operations between kube-apiserver, KMS plugins and KMS
  • if this v2 API reaches beta in release M, the existing v1beta1 gRPC API will be deprecated at release M (or any later release)
  • if this v2 API reaches GA in release N, the existing v1beta1 gRPC API will be disabled by default at N, stop supporting writes at N+3, and removed at release N+6 (the existing key rotation dance of using multiple providers will be used to migrate from v1beta1 to v2)

Non-Goals

  • Prevent KMS rate limiting
  • Recovery when KMS KEK is deleted
  • Using the proposed transaction ID for audit logging

Proposal

Performance, Health Check, Observability and Rotation:

  • Support re-using DEK when the KEK key ID is stable
  • Expand EncryptionConfiguration to support a new KMSv2 configuration
  • Add v2alpha1 KeyManagementService proto service contract in Kubernetes to include
    • key_id and additional metadata in annotations to support key rotation
    • key_id: the KMS Key ID, stable identifier, changed to trigger key rotation and storage migration
    • annotations: structured data, can be used for debugging, recovery, opaque to API server, stored unencrypted, etc. Validation similar to how K8s labels are validated today. Labels have good size limits and restrictions today.
    • A status request and response periodically (order of minutes) returns version, healthz, and key_id
    • The key_id in status can be used on decrypt operations to compare and validate the key ID stored in the DEK cache and the latest EncryptResponse key_id to detect if an object is stale in terms of storage migration
    • Generate a new UID for each envelope operation in kube-apiserver
    • Add a new UID field to EncryptRequest and DecryptRequest
  • Add support for hot reload of the EncryptionConfiguration:
    • Watch on the EncryptionConfiguration
    • When changes are detected, process the EncryptionConfiguration resource, and add new transformers and update existing ones atomically.
    • If there is an issue with creating or updating any of the transformers, retain the current configuration in the kube-apiserver and generate an error in logs.
  • Enable partially automated rotation for latest key in KMS:

    NOTE: Prerequisite: EncryptionConfiguration is set up to always use the latest key version in KMS and the values can be interpreted dynamically at runtime by the KMS plugin to automatically reload the current write key. Rotation process sequence:

    • record initial key ID across all API servers
    • cause key rotation in KMS (user action in the remote KMS)
    • observe the change across the stack using metrics from API server
    • storage migration (run storage migrator)
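
The hot-reload behavior listed above (build the new transformers, swap them in as one unit, and keep the running configuration if anything fails) could be sketched roughly as follows. This is a minimal illustration, not the kube-apiserver implementation: `Transformers`, `build`, and `Reloader` are hypothetical names, and the validation rules in `build` are assumptions made only so that a failure path exists.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Transformers is a hypothetical stand-in for the set of storage
// transformers built from one EncryptionConfiguration.
type Transformers struct {
	Providers []string
}

// build is a hypothetical loader. It rejects an empty provider list and
// empty provider names so that the sketch has a failure path.
func build(providers []string) (*Transformers, error) {
	if len(providers) == 0 {
		return nil, errors.New("no providers configured")
	}
	for _, p := range providers {
		if p == "" {
			return nil, errors.New("provider name must not be empty")
		}
	}
	return &Transformers{Providers: append([]string(nil), providers...)}, nil
}

// Reloader holds the active transformers and replaces them as one unit.
type Reloader struct {
	mu      sync.RWMutex
	current *Transformers
}

// Reload builds the new transformers first and swaps only on success.
// On failure the current configuration is left untouched and an error
// is returned for logging.
func (r *Reloader) Reload(providers []string) error {
	next, err := build(providers)
	if err != nil {
		return fmt.Errorf("keeping current configuration: %w", err)
	}
	r.mu.Lock()
	r.current = next
	r.mu.Unlock()
	return nil
}

// Current returns the active transformers.
func (r *Reloader) Current() *Transformers {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.current
}

func main() {
	r := &Reloader{}
	fmt.Println(r.Reload([]string{"kms-v2", "identity"}))
	fmt.Println(r.Reload(nil)) // rejected, the previous configuration stays active
	fmt.Println(r.Current().Providers)
}
```

Building before locking keeps the critical section to a single pointer assignment, so readers never observe a partially updated set of transformers.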

Design Details

v2 API

EncryptionConfiguration will be expanded to support the new v2 API:

diff --git a/staging/src/k8s.io/apiserver/pkg/apis/config/v1/types.go b/staging/src/k8s.io/apiserver/pkg/apis/config/v1/types.go
index d7d68d2584d..84c1fa6546f 100644
--- a/staging/src/k8s.io/apiserver/pkg/apis/config/v1/types.go
+++ b/staging/src/k8s.io/apiserver/pkg/apis/config/v1/types.go
@@ -98,3 +99,10 @@ type KMSConfiguration struct {
+    // apiversion of KeyManagementService
+    APIVersion string `json:"apiVersion"`

Add v2 KeyManagementService proto service contract in Kubernetes to include key_id, annotations, and status.

DEK re-use in API server

In case of KMS v1, a new DEK is generated for every encryption. This means that for every write request, the API server makes a call to the KMS plugin to encrypt the DEK using the remote KEK. The API server also has to cache the DEKs to avoid making a call to the KMS plugin for every read request. When the API server restarts, it has to populate the cache by making a call to the KMS plugin for every DEK in the etcd store based on the cache size. This is a significant overhead for the API server.

With KMS v2, the API server will generate a DEK seed at startup and cache it. The API server also makes a call to the KMS plugin to encrypt the DEK seed using the remote KEK. This is a one-time call at startup and on KEK rotation. The API server then uses the cached DEK seed to generate single use DEKs via a Key Derivation Function (KDF). Each DEK is used once (and only once) to encrypt a resource. This reduces the number of calls to the KMS plugin and improves the overall latency of the API server requests.
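
As a rough sketch of the "one call at startup and on KEK rotation" behavior described above, the following keeps a cached seed and only calls the plugin when no seed exists or the KEK key_id has changed. The names `seedState`, `ensureSeed`, and `encryptFunc` are hypothetical, and the fake `encrypt` closure in `main` stands in for the KMS plugin Encrypt RPC.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// encryptFunc is a hypothetical stand-in for the KMS plugin Encrypt RPC.
type encryptFunc func(plaintext []byte) (ciphertext []byte, keyID string, err error)

// seedState holds the cached DEK seed and the key_id of the KEK that wrapped it.
type seedState struct {
	seed          []byte
	encryptedSeed []byte
	keyID         string
}

// ensureSeed returns the cached state unless it is missing or the key_id
// reported by Status differs from the one that wrapped the current seed.
func ensureSeed(cur *seedState, statusKeyID string, encrypt encryptFunc) (*seedState, error) {
	if cur != nil && cur.keyID == statusKeyID {
		return cur, nil // steady state: no plugin call
	}
	seed := make([]byte, 32)
	if _, err := rand.Read(seed); err != nil {
		return nil, err
	}
	ct, keyID, err := encrypt(seed)
	if err != nil {
		return nil, err
	}
	if keyID != statusKeyID {
		// Status is treated as authoritative, so a mismatch is an error.
		return nil, fmt.Errorf("encrypt key_id %q does not match status key_id %q", keyID, statusKeyID)
	}
	return &seedState{seed: seed, encryptedSeed: ct, keyID: keyID}, nil
}

func main() {
	calls := 0
	currentKEK := "kek-1"
	encrypt := func(pt []byte) ([]byte, string, error) {
		calls++
		return append([]byte("wrapped:"), pt...), currentKEK, nil
	}

	s, _ := ensureSeed(nil, "kek-1", encrypt) // startup
	s, _ = ensureSeed(s, "kek-1", encrypt)    // cached, no call
	currentKEK = "kek-2"
	s, _ = ensureSeed(s, "kek-2", encrypt) // rotation
	fmt.Println(calls, s.keyID)
}
```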

key_id and rotation

What is required of the kube-apiserver is to be able to tell the KMS plugin which KEK (KMS KEK) it should use to decrypt the incoming DEK. To do so, upon encryption, the KMS plugin needs to provide the key_id for the KEK used as part of EncryptResponse. The kube-apiserver would then store it in etcd next to the DEK. Upon decryption, the kube-apiserver provides the key_id from the last encryption when calling Decrypt.

The key_id is the public, non-secret name of the remote KMS KEK that is currently in use. It may be logged during regular operation of the API server, and thus must not contain any private data. Plugin implementations are encouraged to use a hash to avoid leaking any data. The KMS v2 metrics take care to hash this value before exposing it via the /metrics endpoint.
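
One way to hash a key_id before it reaches logs or metric labels is sketched below. The `sha256:` prefix and hex encoding are assumptions chosen for illustration; the source only states that the value is hashed. `hashKeyID` is a hypothetical helper name.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashKeyID returns a fixed-length label derived from a key_id.
// The "sha256:" prefix and hex encoding are assumptions of this sketch.
func hashKeyID(keyID string) string {
	if keyID == "" {
		return ""
	}
	sum := sha256.Sum256([]byte(keyID))
	return "sha256:" + hex.EncodeToString(sum[:])
}

func main() {
	// The key_id below is a made-up example value.
	fmt.Println(hashKeyID("projects/demo/keys/kek-1"))
}
```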

The API server considers the key_id returned from the Status procedure call to be authoritative. Thus, a change to this value signals to the API server that the remote KEK has changed, and data encrypted with the old KEK should be marked stale when a no-op write is performed. If an Encrypt procedure call returns a key_id that is different from Status, the response is thrown away and the plugin is considered unhealthy. NOTE: Implementations must therefore guarantee that the key_id returned from Status is the same as the one returned in EncryptResponse. Furthermore, plugins must ensure that the key_id is stable and does not flip-flop between values (e.g., during a remote KEK rotation).

Plugins must not re-use key_ids, even in situations where a previously used remote KEK has been reinstated. For example, if a plugin was using key_id=A, switched to key_id=B, and then went back to key_id=A - instead of reporting key_id=A the plugin should report some derivative value such as key_id=A_001 or use a new value such as key_id=C.
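
The staleness rule and the "never re-use a key_id" rule can be illustrated with a small sketch. `isStale` and `keyIDTracker` are hypothetical names. The tracker is a plugin-side idea only: it assumes its state survives plugin restarts and that no remote KEK is literally named like a generated suffix (for example `A_001`), neither of which this sketch enforces.

```go
package main

import "fmt"

// isStale reports whether stored data was encrypted under a key_id other
// than the one currently reported by Status.
func isStale(storedKeyID, statusKeyID string) bool {
	return storedKeyID != statusKeyID
}

// keyIDTracker is a hypothetical plugin-side helper that never reports the
// same key_id for two separate activations of a remote KEK.
type keyIDTracker struct {
	activations map[string]int // remote KEK name -> times it became current
	lastRemote  string
	current     string
}

// observe returns the key_id to report for the currently active remote KEK.
func (k *keyIDTracker) observe(remote string) string {
	if k.activations == nil {
		k.activations = make(map[string]int)
	}
	if k.current != "" && remote == k.lastRemote {
		return k.current // same KEK still active: key_id stays stable
	}
	n := k.activations[remote]
	k.activations[remote] = n + 1
	k.lastRemote = remote
	if n == 0 {
		k.current = remote
	} else {
		// A reinstated KEK gets a derivative value such as A_001.
		k.current = fmt.Sprintf("%s_%03d", remote, n)
	}
	return k.current
}

func main() {
	k := &keyIDTracker{}
	for _, remote := range []string{"A", "A", "B", "A"} {
		fmt.Println(k.observe(remote))
	}
	fmt.Println(isStale("A", k.observe("A")))
}
```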

Since the API server polls Status about every minute, key_id rotation is not immediate. Furthermore, the API server will coast on the last valid state for about three minutes. Thus if a user wants to take a passive approach to storage migration (i.e. by waiting), they must schedule a migration to occur at 3 + N + M minutes after the remote KEK has been rotated (N is how long it takes the plugin to observe the key_id change and M is the desired buffer to allow config changes to be processed - a minimum M of five minutes is recommended). Note that no API server restart is required to perform KEK rotation.

NOTE: Because you don’t control the number of writes performed with the DEK, we will recommend rotating the KEK at least every 90 days.

message EncryptResponse {
    // The encrypted data.
    bytes ciphertext = 1;
    // The KMS key ID used to encrypt the data. This must always refer to the KMS KEK.
    // This can be used to inform staleness of data updated via value.Transformer.TransformFromStorage.
    string key_id = 2;
    // Additional metadata to be stored with the encrypted data.
    // This data is stored in plaintext in etcd. KMS plugin implementations are responsible for pre-encrypting any sensitive data.
    map<string, bytes> annotations = 3;
}

The DecryptRequest passes the same key_id and annotations returned by the previous EncryptResponse of this data as its key_id and annotations for the decryption request.

message DecryptRequest {
    // The data to be decrypted.
    bytes ciphertext = 1;
    // UID is a unique identifier for the request.
    string uid = 2;
    // The keyID that was provided to the apiserver during encryption.
    // This represents the KMS KEK that was used to encrypt the data.
    string key_id = 3;
    // Additional metadata that was sent by the KMS plugin during encryption.
    map<string, bytes> annotations = 4;
}

message DecryptResponse {
    // The decrypted data.
    bytes plaintext = 1;
}

message EncryptRequest {
    // The data to be encrypted.
    bytes plaintext = 1;
    // UID is a unique identifier for the request.
    string uid = 2;
}

In terms of storage, a new structured protobuf format is proposed. The prefix for the new format is k8s:enc:kms:v2:<config name>:.

// EncryptedObject is the representation of data stored in etcd after envelope encryption.
type EncryptedObject struct {
	// EncryptedData is the encrypted data.
	EncryptedData []byte `protobuf:"bytes,1,opt,name=encryptedData,proto3" json:"encryptedData,omitempty"`
	// KeyID is the KMS key ID used for encryption operations.
	KeyID string `protobuf:"bytes,2,opt,name=keyID,proto3" json:"keyID,omitempty"`
	// EncryptedDEKSource is the ciphertext of the source of the DEK used to encrypt the data stored in encryptedData.
	// encryptedDEKSourceType defines the process of using the plaintext of this field to determine the aforementioned DEK.
	EncryptedDEKSource []byte `protobuf:"bytes,3,opt,name=encryptedDEKSource,proto3" json:"encryptedDEKSource,omitempty"`
	// Annotations is additional metadata that was provided by the KMS plugin.
	Annotations map[string][]byte `protobuf:"bytes,4,rep,name=annotations,proto3" json:"annotations,omitempty" protobuf_key:"bytes,1,opt,name=key,proto3" protobuf_val:"bytes,2,opt,name=value,proto3"`
	// encryptedDEKSourceType defines the process of using the plaintext of encryptedDEKSource to determine the DEK.
	EncryptedDEKSourceType EncryptedDEKSourceType `protobuf:"varint,5,opt,name=encryptedDEKSourceType,proto3,enum=v2.EncryptedDEKSourceType" json:"encryptedDEKSourceType,omitempty"`
}

type EncryptedDEKSourceType int32

const (
	// AES_GCM_KEY means that the plaintext of encryptedDEKSource is the DEK itself, with AES-GCM as the encryption algorithm.
	EncryptedDEKSourceType_AES_GCM_KEY EncryptedDEKSourceType = 0
	// HKDF_SHA256_XNONCE_AES_GCM_SEED means that the plaintext of encryptedDEKSource is the pseudo random key
	// (referred to as the seed throughout the code) that is fed into HKDF expand.  SHA256 is the hash algorithm
	// and first 32 bytes of encryptedData are the info param.  The first 32 bytes from the HKDF stream are used
	// as the DEK with AES-GCM as the encryption algorithm.
	EncryptedDEKSourceType_HKDF_SHA256_XNONCE_AES_GCM_SEED EncryptedDEKSourceType = 1
)

This object simply provides a structured format to store the EncryptResponse data with the plugin name and encrypted object data. New fields can easily be added to this format. EncryptedDEKSourceType was added to support a KDF based approach with the security properties of single use DEKs with the performance properties of a long lived DEK (HKDF_SHA256_XNONCE_AES_GCM_SEED).
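
To make the storage layout concrete, the sketch below prepends the `k8s:enc:kms:v2:<config name>:` prefix to an already serialized EncryptedObject and splits it off again. The helper names are hypothetical, and the parser assumes the config name contains no colon. A real transformer would more likely match against the prefixes of its configured providers than parse the name out of the stored value.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

const kmsV2Prefix = "k8s:enc:kms:v2:"

// addPrefix prepends k8s:enc:kms:v2:<config name>: to the serialized object.
func addPrefix(configName string, encoded []byte) []byte {
	out := []byte(kmsV2Prefix + configName + ":")
	return append(out, encoded...)
}

// splitPrefix returns the config name and the serialized object.
// It assumes the config name itself contains no ':' character.
func splitPrefix(stored []byte) (string, []byte, error) {
	if !bytes.HasPrefix(stored, []byte(kmsV2Prefix)) {
		return "", nil, errors.New("not a KMS v2 value")
	}
	rest := stored[len(kmsV2Prefix):]
	i := bytes.IndexByte(rest, ':')
	if i <= 0 {
		return "", nil, errors.New("missing config name")
	}
	return string(rest[:i]), rest[i+1:], nil
}

func main() {
	// "my-kms" and the payload bytes are made-up example values.
	stored := addPrefix("my-kms", []byte{0x0a, 0x03, 0x01, 0x02, 0x03})
	name, payload, err := splitPrefix(stored)
	fmt.Println(name, len(payload), err)
}
```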

Status API

To improve health check reliability, the new StatusResponse provides version, healthz information, and can trigger key rotation via storage version status updates.

message StatusRequest {}

message StatusResponse {
    // Version of the KMS plugin API.  Must match the configured .resources[].providers[].kms.apiVersion
    string version = 1;
    // Any value other than "ok" is failing healthz.  On failure, the associated API server healthz endpoint will contain this value as part of the error message.
    string healthz = 2;
    // the current write key, used to determine staleness of data updated via value.Transformer.TransformFromStorage.
    string key_id = 3;
}

The key_id will be funneled into API server metrics.
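
A toy plugin that satisfies the three calls in the contract might look like the sketch below. The Go structs are hand-written mirrors of the proto messages above, not generated gRPC code, and there is no gRPC server here. The base64 "encryption" provides no confidentiality and is used only to keep the example self-contained. The `"v2"` version string is an assumed value for the configured apiVersion.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// Hand-written mirrors of the proto messages (not generated code).
type EncryptRequest struct {
	Plaintext []byte
	UID       string
}

type EncryptResponse struct {
	Ciphertext  []byte
	KeyID       string
	Annotations map[string][]byte
}

type DecryptRequest struct {
	Ciphertext  []byte
	UID         string
	KeyID       string
	Annotations map[string][]byte
}

type DecryptResponse struct {
	Plaintext []byte
}

type StatusResponse struct {
	Version string
	Healthz string
	KeyID   string
}

// mockPlugin "encrypts" with base64 and knows a single KEK.
type mockPlugin struct {
	keyID string
}

// Status reports the current write key; "v2" is an assumed apiVersion.
func (p *mockPlugin) Status() StatusResponse {
	return StatusResponse{Version: "v2", Healthz: "ok", KeyID: p.keyID}
}

// Encrypt returns the same key_id that Status reports.
func (p *mockPlugin) Encrypt(req EncryptRequest) (EncryptResponse, error) {
	ct := []byte(base64.StdEncoding.EncodeToString(req.Plaintext))
	return EncryptResponse{Ciphertext: ct, KeyID: p.keyID}, nil
}

// Decrypt uses the key_id recorded at encryption time to pick the KEK.
func (p *mockPlugin) Decrypt(req DecryptRequest) (DecryptResponse, error) {
	if req.KeyID != p.keyID {
		return DecryptResponse{}, fmt.Errorf("unknown key_id %q", req.KeyID)
	}
	pt, err := base64.StdEncoding.DecodeString(string(req.Ciphertext))
	if err != nil {
		return DecryptResponse{}, err
	}
	return DecryptResponse{Plaintext: pt}, nil
}

func main() {
	p := &mockPlugin{keyID: "kek-1"}
	enc, _ := p.Encrypt(EncryptRequest{Plaintext: []byte("seed"), UID: "uid-1"})
	dec, err := p.Decrypt(DecryptRequest{Ciphertext: enc.Ciphertext, UID: "uid-2", KeyID: enc.KeyID})
	fmt.Println(p.Status().Healthz, string(dec.Plaintext), err)
}
```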

Observability

To improve observability, this design also generates a new UID for each envelope operation, similar to UID generation in admission review requests: https://github.com/kubernetes/kubernetes/blob/e9e669aa6037c380469b45200e59cff9b52d6d68/staging/src/k8s.io/apiserver/pkg/admission/plugin/webhook/request/admissionreview.go#L137

This UID field is included in the EncryptRequest and DecryptRequest of the v2 API. It will always be present. It is generated in the kube-apiserver and will be used:

  1. For logging in the kube-apiserver. All envelope operations to the kms-plugin will be logged with the corresponding UID.
    1. The UID will be logged using a wrapper in the kube-apiserver to ensure that the UID is logged in the same format and is always logged.
    2. In addition to the UID, the kube-apiserver will also log at log level 6+ non-sensitive metadata such as name, namespace and GroupVersionResource of the object that triggered the envelope operation.
  2. Sent to the kms-plugin as part of the EncryptRequest and DecryptRequest structs.

Metrics

  • apiserver_encryption_config_controller_automatic_reload_last_timestamp_seconds - Timestamp of the last successful or failed automatic reload of encryption configuration split by apiserver identity.
  • apiserver_encryption_config_controller_automatic_reload_success_total - Total number of successful automatic reloads of encryption configuration split by apiserver identity.
  • apiserver_envelope_encryption_dek_source_cache_size - Number of records in data encryption key (DEK) source cache. On a restart, this value is an approximation of the number of decrypt RPC calls the server will make to the KMS plugin.
  • apiserver_envelope_encryption_key_id_hash_last_timestamp_seconds - The last time in seconds when a keyID was used.
  • apiserver_envelope_encryption_key_id_hash_status_last_timestamp_seconds - The last time in seconds when a keyID was returned by the Status RPC call.
  • apiserver_envelope_encryption_key_id_hash_total - Number of times a keyID is used split by transformation type, provider, and apiserver identity.
  • apiserver_envelope_encryption_kms_operations_latency_seconds - KMS operation duration with gRPC error code status total.
  • apiserver_storage_envelope_transformation_cache_misses_total - Total number of cache misses while accessing key decryption key(KEK).
  • apiserver_storage_transformation_duration_seconds - Latencies in seconds of value transformation operations.
  • apiserver_storage_transformation_operations_total - Total number of transformations. Successful transformation will have a status ‘OK’ and a varied status string when the transformation fails. This status and transformation_type fields may be used for alerting on encryption/decryption failure using transformation_type from_storage for decryption and to_storage for encryption

Sequence Diagram

Encrypt Request

%%{init:{"sequence": {"mirrorActors":true},
    "themeVariables": {
        "actorBkg":"royalblue",
        "actorTextColor":"white"
}}}%%

sequenceDiagram
    participant user
    participant kube_api_server
    participant kms_plugin
    participant external_kms
    participant etcd
    alt Generate DEK seed at startup
        Note over kube_api_server,external_kms: Refer to Generate Data Encryption Key (DEK) Seed diagram for details
    end
    user->>kube_api_server: create/update resource that's to be encrypted
    kube_api_server->>kube_api_server: generate DEK using DEK seed
    kube_api_server->>kube_api_server: encrypt resource with DEK
    kube_api_server->>etcd: store encrypted object

Decrypt Request

%%{init:{"sequence": {"mirrorActors":true},
    "themeVariables": {
        "actorBkg":"royalblue",
        "actorTextColor":"white"
}}}%%

sequenceDiagram
    participant user
    participant kube_api_server
    participant kms_plugin
    participant external_kms
    participant etcd
    user->>kube_api_server: get/list resource that's encrypted
    kube_api_server->>etcd: get encrypted resource
    etcd->>kube_api_server: encrypted resource
    alt Encrypted DEK seed not in cache
        kube_api_server->>kms_plugin: decrypt request
        kms_plugin->>external_kms: decrypt DEK seed with remote KEK
        external_kms->>kms_plugin: decrypted DEK seed
        kms_plugin->>kube_api_server: return decrypted DEK seed
        kube_api_server->>kube_api_server: cache decrypted DEK seed
    end
    kube_api_server->>kube_api_server: generate DEK using DEK seed
    kube_api_server->>kube_api_server: decrypt resource with DEK
    kube_api_server->>user: return decrypted resource

Status Request

%%{init:{"sequence": {"mirrorActors":true},
    "themeVariables": {
        "actorBkg":"royalblue",
        "actorTextColor":"white"
}}}%%

sequenceDiagram
    participant kube_api_server
    participant kms_plugin
    participant external_kms
    alt Generate DEK seed at startup
        Note over kube_api_server,external_kms: Refer to Generate Data Encryption Key (DEK) Seed diagram for details
    end
    loop every minute (or every 10s if error or unhealthy)
        kube_api_server->>kms_plugin: status request
        kms_plugin->>external_kms: validate remote KEK
        external_kms->>kms_plugin: KEK status
        kms_plugin->>kube_api_server: return status response <br/> {"healthz": "ok", "key_id": "<remote KEK ID>", "version": "v2beta1"}
        alt KEK rotation detected (key_id changed), rotate DEK seed
            Note over kube_api_server,external_kms: Refer to Generate Data Encryption Key (DEK) Seed diagram for details
        end
    end

Generate Data Encryption Key (DEK) Seed

%%{init:{"sequence": {"mirrorActors":true},
    "themeVariables": {
        "actorBkg":"royalblue",
        "actorTextColor":"white"
}}}%%

sequenceDiagram
    participant kube_api_server
    participant kms_plugin
    participant external_kms
        kube_api_server->>kube_api_server: generate DEK seed
        kube_api_server->>kms_plugin: encrypt request
        kms_plugin->>external_kms: encrypt DEK seed with remote KEK
        external_kms->>kms_plugin: encrypted DEK seed
        kms_plugin->>kube_api_server: return encrypt response <br/> {"ciphertext": "<encrypted DEK seed>", "key_id": "<remote KEK ID>", "annotations": {}}

Cryptography Details

We propose to extend the limited 12 byte AES-GCM nonce with a 32 byte info (randomly generated per write) that is fed into HKDF-Expand (the secret is the DEK seed and the hash is SHA-256). We read 32 bytes from HKDF-Expand to use as the AES-GCM DEK.

We want the crypto properties of KMS v1 (one DEK per write) without the network overhead. The DEK seed (32 random bytes) is generated on server start up and automatically rotated whenever the remote KEK changes. Note that the HKDF-Extract step is skipped because we already have a good pseudo random key (thus there is no salt, only info).

This allows us to use a purely per-write random 12 byte nonce for AES-GCM because each generated DEK+nonce combination is unique (the chance of collision is negligible). VM state restores are not an issue in this model.

While not strictly necessary, a cache will be used to memoize the HKDF operations as they are fully deterministic based on the inputs. This significantly reduces the overhead of the key generation both in terms of CPU time and memory allocations.

Note that the info must be stored (in the clear) with the ciphertext, meaning we increase the storage overhead by 32 bytes.

stateDiagram-v2
KEK
note right of KEK
   accessed via plugin
end note
KEK --> DEK_seed: encrypts
DEK_seed --> etcd: Encrypted_DEK_seed stored

stateDiagram-v2
etcd_path
note left of etcd_path
   unique per object in etcd
   /PATH_PREFIX/secrets/NAMESPACE/NAME
end note
resource
note right of resource
   stored in EncryptedObject.EncryptedData as
   info|nonce|ciphertext
end note
DEK_seed --> hkdf_expand: pseudo random key
sha256 --> hkdf_expand: hash
rand_nonce_32 --> hkdf_expand: info param
hkdf_expand --> DEK: generates
DEK --> aes_gcm: key
rand_nonce_12 --> aes_gcm: nonce
etcd_path --> aes_gcm: additional_data
aes_gcm --> resource: encrypts

Benchmarks

Extensive benchmarks were performed to compare the impact of having KMS v2 encryption enabled. The most relevant run is included below. It shows that there is no significant increase in the amount of time it takes to perform a REST call, but that the cost of encryption can be as high as 14% in terms of memory usage. This is considered an acceptable tradeoff.

             │     rest_none1.txt │       rest_kdf_cache1.txt           │
             │       sec/op       │   sec/op     vs base                │
KMSv2REST-10       18.51 ± 41%       20.39 ± 99%  ~ (p=0.353 n=10)

             │     rest_none1.txt │           rest_kdf_cache1.txt       │
             │        B/op        │     B/op      vs base               │
KMSv2REST-10      23.95Gi ± 0%       27.25Gi ± 0%  +13.77% (p=0.000 n=10)

             │     rest_none1.txt │          rest_kdf_cache1.txt        │
             │      allocs/op     │  allocs/op   vs base                │
KMSv2REST-10       3.119M ± 0%       3.268M ± 1%  +4.78% (p=0.000 n=10)

Comparing Metrics

Multiple runs of e2e tests were performed to compare the impact of having KMS v2 encryption of all resources vs no encryption at all. The results are included below.

It shows that there is no significant increase in the following API server metrics: apiserver_request_duration_seconds, apiserver_request_terminations_total, apiserver_request_aborts_total.

run                        post*       get*        delete*     list*
run w/o encrypt
1                          0.0225      0.0086      0.0103      0.0046
2                          0.0336      0.0076      0.0119      0.0058
3                          0.0205      0.0081      0.0117      0.0047
average w/o encrypt        0.025533    0.0081      0.0113      0.005033
run w/ encrypt
4                          0.0219      0.0071      0.0109      0.0051
5                          0.0229      0.0062      0.01        0.0045
6                          0.0279      0.0082      0.0119      0.005
average w/ encrypt         0.024233    0.007167    0.010933    0.004867
% diff between averages    -5.09138    -11.5226    -3.24484    -3.31126

*average apiserver_request_duration_seconds = apiserver_request_duration_seconds_sum / apiserver_request_duration_seconds_count

Both apiserver_request_terminations_total and apiserver_request_aborts_total resulted in no difference.
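
The averages and percentage differences in the table above, and the 20% threshold from the test plan, can be reproduced with a few lines of arithmetic. The sketch below uses the `post` column as input; `mean` and `percentDiff` are hypothetical helper names.

```go
package main

import "fmt"

// mean returns the arithmetic mean, or 0 for an empty slice.
func mean(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// percentDiff returns the change from base to cmp as a percentage of base.
func percentDiff(base, cmp float64) float64 {
	return (cmp - base) / base * 100
}

func main() {
	without := []float64{0.0225, 0.0336, 0.0205} // post, runs 1-3
	with := []float64{0.0219, 0.0229, 0.0279}    // post, runs 4-6
	d := percentDiff(mean(without), mean(with))
	fmt.Printf("%.2f%% within 20%%: %v\n", d, d > -20 && d < 20)
}
```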

Test Plan

[x] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.

Prerequisite testing updates
Unit tests
  • Validate DEK seed re-use behavior in the API server
    • DEK seed is generated at startup and re-used for all encryption operations
    • DEK seed is rotated after KEK rotation
    • KMS plugin is only called when cache is empty
    • Ensure the logs and metrics are generated as expected
    • At least 75% code coverage
  • Staleness check based on keyID in the StatusResponse
  • Unit test for gRPC request/response validation
    • Serialize/Deserialize EncryptedObject
    • Validate keyID in StatusResponse and EncryptResponse
    • Validate annotations in EncryptResponse
    • Validate logs are generated with the correct UID and additional metadata
Integration tests
  • Integration tests to validate
    • Encryption of custom resources and custom resource definitions
    • No-op writes cause rewrite of stale data (data that has correct schema but was encrypted with keyID that is not the latest)
    • Health checks
      • single health check for v2 at /kms-providers
      • individual health checks for v1 and v2 with /kms-provider-0 and /kms-provider-1
  • Integration tests with base64 plugin to validate the encryption and decryption of data
  • Integration tests to check rotation is possible without restarting API server
  • Integration tests that exercise the feature enablement/disablement flow
e2e tests

With this e2e test suite, we want to do the following:

  1. Run the e2e suite against a kind cluster without kms encryption enabled.
  2. Run the e2e suite against a kind cluster that has kms v2 encryption enabled (as defined below).
  3. Compare request_duration_seconds, request_terminations_total, request_aborts_total API server metrics between the two runs. The acceptable delta should be less than 20%.
  4. Observe metrics from the mock implementation to determine time taken at each step of the encryption/decryption process.
  5. Observe API server startup time with and without kms encryption enabled.
  • KMSv2 config would use the mock implementation
    • Validate all resources are encrypted
    • The “remote” kms would be a local encryption key
      • that adds 100 ms latency
      • that has rate limiting

Graduation Criteria

Alpha

  • Feature implemented behind a feature flag
  • Initial unit and integration tests completed and enabled

Beta

  • Feature is enabled by default
  • All of the above documented tests are complete
  • Metrics in API server to gauge performance impact

GA

  • Tracing is added to the API server to assess transformation timings
  • At least 2 KMSv2 plugin implementations are available
    • We will gather feedback from these implementations to determine if API is sufficient
  • Reference implementation using PKCS11

Production Readiness Review Questionnaire

Feature Enablement and Rollback

How can this feature be enabled / disabled in a live cluster?
  • Feature gate
    • Feature gate name: KMSv2
    • Components depending on the feature gate:
      • kube-apiserver

Alpha

FeatureSpec{
	Default: false,
	LockToDefault: false,
	PreRelease: featuregate.Alpha,
}

Beta

FeatureSpec{
	Default: true,
	LockToDefault: false,
	PreRelease: featuregate.Beta,
}

GA

FeatureSpec{
	Default: true,
	LockToDefault: true,
	PreRelease: featuregate.GA,
}

Does enabling the feature change any default behavior?

No. The v2 API is new in the v1.25 release. Furthermore, even with the feature enabled by default, the user needs to explicitly configure a KMSv2 provider to use this.

Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

Yes. To disable encryption at rest using the v2 API:

  1. Add new identity provider at the top of encryption config
  2. Restart kube-apiserver
  3. Run storage migration to migrate all the existing encrypted data to use the identity provider
    1. If running kubectl get <resource> --all-namespaces -o json | kubectl replace -f - to migrate data, the user can confirm that the migration is complete by observing the kube-apiserver metrics apiserver_envelope_encryption_key_id_hash_last_timestamp_seconds and apiserver_envelope_encryption_key_id_hash_total. These metrics will no longer contain the keyID hash of the old KEK after storage migration and kube-apiserver restart.
    2. If running storage version migrator to migrate data, the user can confirm that the migration is complete by observing the conditions in storageversionmigrations. Refer to doc for more details. Using the storage version migrator is recommended.
  4. Remove the KMS provider from the encryption config and restart kube-apiserver
  5. At the end of these steps, all the data in etcd will be unencrypted.

More details are available here
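
As a rough illustration of step 1, the sketch below builds an encryption config in which identity is listed first, so it becomes the write provider while the KMS v2 provider stays available for reads during migration. The Go types are a hypothetical local mirror of a small subset of the EncryptionConfiguration schema, not the real API types, and the provider name and socket path are placeholders. JSON output is used only because it is available in the Go standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical local mirror of a small subset of the EncryptionConfiguration
// schema; the real types carry more fields.
type KMSConfig struct {
	APIVersion string `json:"apiVersion"`
	Name       string `json:"name"`
	Endpoint   string `json:"endpoint"`
}

type Provider struct {
	Identity *struct{}  `json:"identity,omitempty"`
	KMS      *KMSConfig `json:"kms,omitempty"`
}

type ResourceConfig struct {
	Resources []string   `json:"resources"`
	Providers []Provider `json:"providers"`
}

type EncryptionConfiguration struct {
	APIVersion string           `json:"apiVersion"`
	Kind       string           `json:"kind"`
	Resources  []ResourceConfig `json:"resources"`
}

// withIdentityFirst returns a copy of cfg in which identity is the first
// (write) provider for every resource entry. Existing providers are kept
// after it so previously encrypted data stays readable during migration.
func withIdentityFirst(cfg EncryptionConfiguration) EncryptionConfiguration {
	out := cfg
	out.Resources = make([]ResourceConfig, len(cfg.Resources))
	for i, rc := range cfg.Resources {
		providers := []Provider{{Identity: &struct{}{}}}
		for _, p := range rc.Providers {
			if p.Identity == nil { // skip any existing identity entry to avoid duplicates
				providers = append(providers, p)
			}
		}
		out.Resources[i] = ResourceConfig{Resources: rc.Resources, Providers: providers}
	}
	return out
}

func main() {
	cfg := EncryptionConfiguration{
		APIVersion: "apiserver.config.k8s.io/v1",
		Kind:       "EncryptionConfiguration",
		Resources: []ResourceConfig{{
			Resources: []string{"secrets"},
			Providers: []Provider{{KMS: &KMSConfig{
				APIVersion: "v2",
				Name:       "example-kms",                  // placeholder name
				Endpoint:   "unix:///tmp/example-kms.sock", // placeholder socket path
			}}},
		}},
	}
	b, err := json.MarshalIndent(withIdentityFirst(cfg), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

The KMS provider is deliberately kept in the list at this stage. It is removed only in step 4, after the storage migration has rewritten all data.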

Disabling this gate without first doing a storage migration to use a different encryption at rest mechanism will result in data loss.

  • For secrets that are mounted in pods, if the DEK used to encrypt the secret is not present in the kube-apiserver cache, the pods will fail to start as the secret will not be able to be decrypted.
What happens if we reenable the feature if it was previously rolled back?

After the feature is reenabled, if a v2 KMS provider is still configured in the EncryptionConfiguration:

  • All new data will be encrypted with the external kms provider.
  • Existing data can be decrypted if the key used for encryption before feature rollback still exists.
Are there any tests for feature enablement/disablement?
  • We will add unit and integration tests to validate the enablement/disablement flow.
  • When the feature is disabled, data stored in etcd will no longer be encrypted using the external kms provider with v2 API.
  • If the feature is disabled incorrectly (i.e., without performing a storage migration), existing data that is encrypted with the external kms provider can no longer be decrypted. This will cause list and get operations to fail for the resources that were encrypted.

Rollout, Upgrade and Rollback Planning

How can a rollout or rollback fail? Can it impact already running workloads?
  • Rolling back the feature without first doing a storage migration to a different encryption at rest mechanism will result in data loss.
    • Workloads relying on existing data in etcd will no longer be able to access it.
    • Access to the data can be restored by reenabling the feature gate. Otherwise, the data must be deleted and recreated.
  • The rollout of the feature can fail if there are too many calls to the external kms provider.
    • API server will not report healthy.
  • For highly available clusters, the feature can be enabled on some API servers for read purposes only.
    • For rollout, add KMSv2 providers as read across all API servers first before adding the provider for write.
    • For rollback, move KMSv2 providers from write to read position across all API servers.
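
The ordering constraint for highly available clusters can be sketched as a small check. It assumes that the first provider listed for a resource is the one used for writes and that every listed provider can be used for reads. The names `providerOrder`, `safeToPromote`, and `promote` are hypothetical helpers for illustration, not part of kube-apiserver.

```go
package main

import "fmt"

// providerOrder lists provider names as configured on one API server.
// Index 0 is the write provider; every entry can be used for reads.
type providerOrder []string

// canRead reports whether the server has the named provider at any position.
func (p providerOrder) canRead(name string) bool {
	for _, n := range p {
		if n == name {
			return true
		}
	}
	return false
}

// safeToPromote reports whether name can be moved to the write position:
// every API server must already be able to read data written with it.
func safeToPromote(servers []providerOrder, name string) bool {
	for _, s := range servers {
		if !s.canRead(name) {
			return false
		}
	}
	return true
}

// promote returns a copy of p with name in the write position. If name was
// not configured before, it is added.
func promote(p providerOrder, name string) providerOrder {
	out := providerOrder{name}
	for _, n := range p {
		if n != name {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	servers := []providerOrder{
		{"identity", "kmsv2"}, // already has the KMSv2 provider for reads
		{"identity"},          // not yet updated
	}
	fmt.Println(safeToPromote(servers, "kmsv2")) // false

	servers[1] = append(servers[1], "kmsv2")
	fmt.Println(safeToPromote(servers, "kmsv2")) // true
	fmt.Println(promote(servers[0], "kmsv2"))    // [kmsv2 identity]
}
```

Rollback is the mirror image: promoting the previous write provider back to index 0 leaves the KMSv2 provider in a read position on every server.
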
What specific metrics should inform a rollback?
  • Latency metrics transformation_duration_seconds
  • Transformation error count metric apiserver_storage_transformation_duration_seconds_bucket{transformation_type="from_storage", transformer_prefix="k8s:enc:kms:v2:"}
  • After rollback is complete, you should no longer see the keyID metric apiserver_envelope_encryption_key_id_hash_total increment.
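
When judging latency from a Prometheus histogram, two quick summaries are the mean (from the `_sum` and `_count` series) and the share of observations at or below a bucket bound. The sketch below shows that arithmetic. The sample numbers are copied from the KMS Encrypt histogram of the CI run quoted in the Alternatives section; the helper names are hypothetical.

```go
package main

import "fmt"

// bucket is one cumulative histogram bucket: count observations took <= le seconds.
type bucket struct {
	le    float64
	count uint64
}

// mean derives the average observation from the histogram's _sum and _count series.
func mean(sum float64, count uint64) float64 {
	if count == 0 {
		return 0
	}
	return sum / float64(count)
}

// fractionAtOrBelow returns the share of observations in the first bucket whose
// upper bound is at least bound. Buckets must be sorted by le. The result is an
// upper estimate unless bound is exactly a bucket boundary.
func fractionAtOrBelow(buckets []bucket, bound float64, total uint64) (float64, bool) {
	if total == 0 {
		return 0, false
	}
	for _, b := range buckets {
		if b.le >= bound {
			return float64(b.count) / float64(total), true
		}
	}
	return 0, false
}

func main() {
	// Sample values copied from the Encrypt histogram in the Alternatives section.
	buckets := []bucket{{0.0128, 9448}, {0.1024, 13442}, {0.4096, 14426}}
	const sum, count = 433.331096833, 14544

	fmt.Printf("mean latency: %.4fs\n", mean(sum, count))
	if f, ok := fractionAtOrBelow(buckets, 0.1024, count); ok {
		fmt.Printf("share of calls within 0.1024s: %.3f\n", f)
	}
}
```

For that sample the mean is roughly 0.03 s and about 92% of calls completed within 0.1024 s. What threshold should trigger a rollback is a judgment for the cluster operator.
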
Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

This will be covered by integration tests.

Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
  • The cacheSize field in EncryptionConfiguration is no longer valid for KMS v2.
  • When KMSv2 is used without a KMSv1 provider, the health endpoints do not report each KMS provider individually.

Monitoring Requirements

How can someone using this feature know that it is working for their instance?
  • Other (treat as last resort)
    • Details:
      • Log entries in kube-apiserver, kms-plugin and KMS will include the corresponding key_id, annotations, and UID.
      • Number of times a keyID is used for encryption/decryption
      • Metric recording the last time in seconds when a keyID was returned in the StatusResponse e.g. apiserver_envelope_encryption_key_id_hash_status_last_timestamp_seconds{key_id_hash="sha256", provider_name="providerName"} 1.674865558833728e+09
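
The value of the timestamp metric is a Unix time in seconds, so an operator can turn it into a wall-clock time and compare it against an acceptable age. The following is a minimal sketch of that conversion using the sample value from the metric line above. The `maxAge` thresholds are hypothetical and should be chosen per deployment.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// lastSeen converts a gauge value in (fractional) Unix seconds to a UTC time.
func lastSeen(metricValue float64) time.Time {
	sec, frac := math.Modf(metricValue)
	return time.Unix(int64(sec), int64(frac*1e9)).UTC()
}

// isStale reports whether the key ID was last reported longer ago than maxAge.
func isStale(metricValue float64, now time.Time, maxAge time.Duration) bool {
	return now.Sub(lastSeen(metricValue)) > maxAge
}

func main() {
	v := 1.674865558833728e+09 // sample value from the metric line above

	fmt.Println(lastSeen(v).Format(time.RFC3339)) // 2023-01-28T00:25:58Z

	now := lastSeen(v).Add(5 * time.Minute)      // pretend "now" is five minutes later
	fmt.Println(isStale(v, now, 10*time.Minute)) // false
	fmt.Println(isStale(v, now, time.Minute))    // true
}
```
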
What are the reasonable SLOs (Service Level Objectives) for the enhancement?

There should be no impact on the SLO with this change.

What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
  • Other (treat as last resort)
    • Details:
      • Log entries in kube-apiserver, kms-plugin and KMS will include the corresponding key_id, annotations, and UID.
      • Metrics for latency of encryption/decryption requests.

Dependencies

Does this feature depend on any specific services running in the cluster?

This feature requires the KMS plugin to be running.

Scalability

Will enabling / using this feature result in any new API calls?

Yes, the new KMS v2 gRPC API.

Will enabling / using this feature result in introducing new API types?

Yes, the new KMS v2 gRPC types.

Will enabling / using this feature result in any new calls to the cloud provider?

No.

Will enabling / using this feature result in increasing size or count of the existing API objects?

No, the v2 API is new.

Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

No.

Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, …) in any components?

No.

Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?

No. One socket is used per KMS plugin and old connections are closed after new connections have been created/validated for health during an automatic reload of EncryptionConfiguration.

Troubleshooting

How does this feature react if the API server and/or etcd is unavailable?
  • This feature is part of the API server, so it is unavailable whenever the API server is unavailable. In that case, encryption of etcd data through the external kms-plugin is also unavailable.
  • If the API server is unavailable, clients will be unable to create/get data that’s stored in etcd. There will be no requests from the API server to the kms-plugin.
  • If the EncryptionConfiguration file configured in the control plane node is not valid:
    • When restarted, the API server will fail at startup because it is unable to load the EncryptionConfiguration. This behavior is consistent with the KMS v1 API. The encryption configuration needs to be fixed to allow the API server to start properly.
  • If the KMS plugin is unavailable:
    • When restarted, the API server will fail its health check because it is unable to connect to the KMS plugin. The /healthz and /readyz endpoints (but not /livez, which ignores kms) will show a failed health check for the kms provider. This behavior is consistent with the KMS v1 API. Refer to docs for the health API endpoints and how to exclude individual endpoints from causing the API server to fail health check.
    • To resolve the issue, the kms plugin must be fixed to be available. The logs in the kms-plugin should be indicative of the issue.

Implementation History

Alternatives

Performance and rotation:

We considered the following approaches, each of which has its own drawbacks:

  1. cacheSize field in EncryptionConfiguration. It is used by the API server to initialize an LRU cache of the given size, with the encrypted ciphertext used as the index. A higher cacheSize value reduces the number of calls to the plugin for decryption operations. However, it does not reduce the number of calls to the KMS plugin when encryption traffic is bursty.
  2. Key hierarchy in the KMS plugin.
    • No changes to the API server, keep 1:1 DEK mapping

      • Assumption: A KMS plugin that was implemented using a local HSM would not need any changes because it would be able to handle the amount of encryption calls with ease since it would not need to perform network IO
      • Assumption: local gRPC calls to the KMS plugin do not represent significant overhead
    • KMS plugin generates its own local KEK in-memory

    • External KMS is used to encrypt the local KEK

    • Local KEK is used for encryption of DEKs sent by API server

    • Local KEK is used for encryption based on policy (N events, X time, etc)

    We tested this approach and the metrics in CI indicated that the gRPC calls to the KMS plugin added significant overhead. The gRPC call latency was on the order of 0.1s (refer to apiserver_envelope_encryption_kms_operations_latency_seconds_bucket here).

      API server encryption metric from CI run
      # HELP apiserver_envelope_encryption_kms_operations_latency_seconds [ALPHA] KMS operation duration with gRPC error code status total.
      # TYPE apiserver_envelope_encryption_kms_operations_latency_seconds histogram
      apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="0.0001"} 0
      apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="0.0002"} 0
      apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="0.0004"} 60
      apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="0.0008"} 2947
      apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="0.0016"} 5090
      apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="0.0032"} 6639
      apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="0.0064"} 8076
      apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="0.0128"} 9448
      apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="0.0256"} 10875
      apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="0.0512"} 12236
      apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="0.1024"} 13442
      apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="0.2048"} 14153
      apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="0.4096"} 14426
      apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="0.8192"} 14533
      apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="1.6384"} 14544
      apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="3.2768"} 14544
      apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="6.5536"} 14544
      apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="13.1072"} 14544
      apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="26.2144"} 14544
      apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="52.4288"} 14544
      apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="+Inf"} 14544
      apiserver_envelope_encryption_kms_operations_latency_seconds_sum{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider"} 433.331096833
      apiserver_envelope_encryption_kms_operations_latency_seconds_count{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider"} 14544
      
      Encrypt Request
      sequenceDiagram
          participant etcd
          participant kubeapiserver
          participant kmsplugin
          participant externalkms
          kubeapiserver->>kmsplugin: encrypt request
          alt using key hierarchy
              kmsplugin->>kmsplugin: encrypt DEK with local KEK
              kmsplugin->>externalkms: encrypt local KEK with remote KEK
              externalkms->>kmsplugin: encrypted local KEK
              kmsplugin->>kmsplugin: cache encrypted local KEK
              kmsplugin->>kubeapiserver: return encrypt response <br/> {"ciphertext": "<encrypted DEK>", key_id: "<remote KEK ID>", <br/> "annotations": {"kms.kubernetes.io/local-kek": "<encrypted local KEK>"}}
          else not using key hierarchy
              %% current behavior
              kmsplugin->>externalkms: encrypt DEK with remote KEK
              externalkms->>kmsplugin: encrypted DEK
              kmsplugin->>kubeapiserver: return encrypt response <br/> {"ciphertext": "<encrypted DEK>", key_id: "<remote KEK ID>", "annotations": {}}
          end
          kubeapiserver->>etcd: store encrypt response and encrypted DEK
      Decrypt Request
      sequenceDiagram
          participant kubeapiserver
          participant kmsplugin
          participant externalkms
          %% if local KEK in annotations, then using hierarchy
          alt encrypted local KEK is in annotations
            kubeapiserver->>kmsplugin: decrypt request <br/> {"ciphertext": "<encrypted DEK>", key_id: "<key_id gotten as part of EncryptResponse>", <br/> "annotations": {"kms.kubernetes.io/local-kek": "<encrypted local KEK>"}}
              alt encrypted local KEK in cache
                  kmsplugin->>kmsplugin: decrypt DEK with local KEK
              else encrypted local KEK not in cache
                  kmsplugin->>externalkms: decrypt local KEK with remote KEK
                  externalkms->>kmsplugin: decrypted local KEK
                  kmsplugin->>kmsplugin: decrypt DEK with local KEK
                  kmsplugin->>kmsplugin: cache decrypted local KEK
              end
              kmsplugin->>kubeapiserver: return decrypt response <br/> {"plaintext": "<decrypted DEK>", key_id: "<remote KEK ID>", <br/> "annotations": {"kms.kubernetes.io/local-kek": "<encrypted local KEK>"}}
          else encrypted local KEK is not in annotations
              kubeapiserver->>kmsplugin: decrypt request <br/> {"ciphertext": "<encrypted DEK>", key_id: "<key_id gotten as part of EncryptResponse>", <br/> "annotations": {}}
              kmsplugin->>externalkms: decrypt DEK with remote KEK (same behavior as today)
              externalkms->>kmsplugin: decrypted DEK
              kmsplugin->>kubeapiserver: return decrypt response <br/> {"plaintext": "<decrypted DEK>", key_id: "<remote KEK ID>", <br/> "annotations": {}}
          end
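
The key hierarchy branch of the diagrams above can be sketched in a single process. This is a toy model under stated assumptions, not the plugin we tested: AES-GCM is assumed for both layers, the external KMS is replaced by an in-memory remote KEK, and `plugin`, `seal`, `open`, and `mustAEAD` are hypothetical names. Only the hierarchy branch is modeled; a request without the local KEK annotation returns an error.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

const localKEKAnnotation = "kms.kubernetes.io/local-kek" // key used in the diagrams above

// mustAEAD returns AES-GCM for key and panics on an invalid key length.
func mustAEAD(key []byte) cipher.AEAD {
	block, err := aes.NewCipher(key)
	if err != nil {
		panic(err)
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		panic(err)
	}
	return gcm
}

// seal encrypts plaintext and stores the random nonce in front of the result.
func seal(a cipher.AEAD, plaintext []byte) ([]byte, error) {
	nonce := make([]byte, a.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return a.Seal(nonce, nonce, plaintext, nil), nil
}

// open reverses seal.
func open(a cipher.AEAD, sealed []byte) ([]byte, error) {
	n := a.NonceSize()
	if len(sealed) < n {
		return nil, fmt.Errorf("ciphertext too short")
	}
	return a.Open(nil, sealed[:n], sealed[n:], nil)
}

type plugin struct {
	remote       cipher.AEAD            // remote KEK held by the "external KMS"
	local        cipher.AEAD            // local KEK generated in memory
	wrappedLocal []byte                 // local KEK encrypted with the remote KEK
	cache        map[string]cipher.AEAD // encrypted local KEK -> usable local KEK
	remoteCalls  int                    // simulated calls to the external KMS
}

func newPlugin() (*plugin, error) {
	keys := make([]byte, 64) // first half: remote KEK, second half: local KEK
	if _, err := rand.Read(keys); err != nil {
		return nil, err
	}
	p := &plugin{remote: mustAEAD(keys[:32]), local: mustAEAD(keys[32:]), remoteCalls: 1}
	wrapped, err := seal(p.remote, keys[32:]) // the one call that wraps the local KEK
	if err != nil {
		return nil, err
	}
	p.wrappedLocal = wrapped
	p.cache = map[string]cipher.AEAD{string(wrapped): p.local}
	return p, nil
}

// encrypt wraps the DEK with the local KEK, so no external KMS call is made.
func (p *plugin) encrypt(dek []byte) ([]byte, map[string][]byte, error) {
	ct, err := seal(p.local, dek)
	if err != nil {
		return nil, nil, err
	}
	return ct, map[string][]byte{localKEKAnnotation: p.wrappedLocal}, nil
}

// decrypt finds the local KEK in the cache or via the external KMS, then opens the DEK.
func (p *plugin) decrypt(ct []byte, annotations map[string][]byte) ([]byte, error) {
	wrapped := annotations[localKEKAnnotation]
	kek, cached := p.cache[string(wrapped)]
	if !cached { // cold cache: one external KMS call to decrypt the local KEK
		raw, err := open(p.remote, wrapped)
		if err != nil {
			return nil, err
		}
		p.remoteCalls++
		kek = mustAEAD(raw)
		p.cache[string(wrapped)] = kek
	}
	return open(kek, ct)
}

func main() {
	p, err := newPlugin()
	if err != nil {
		panic(err)
	}
	ct, ann, _ := p.encrypt([]byte("example DEK bytes"))
	p.cache = map[string]cipher.AEAD{} // simulate a plugin restart with a cold cache
	dek, err := p.decrypt(ct, ann)
	fmt.Println(string(dek), err, "external KMS calls:", p.remoteCalls)
}
```

In this model the external KMS is contacted once when the local KEK is created and once more after a cold start, regardless of how many DEKs are encrypted in between. The sketch does not model the gRPC hop between the API server and the plugin, which is where the overhead reported above was observed.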

Observability:

We considered using the AuditID from the kube-apiserver request that generated the envelope operation. This approach has the following drawbacks:

  1. AuditID can be configured by the user with the Audit-ID header in the API server request. Multiple requests can be sent to the kube-apiserver with the same Audit-ID.
  2. Not all API server requests will generate an envelope operation. The API server caches DEKs, and for a DEK that is available in the cache, the kube-apiserver will not generate an envelope operation.
  3. Since not all calls to the KMS correspond to an audit log, using audit ID is not complete for correlating calls from kube-apiserver->kms-plugin->KMS.

Infrastructure Needed

We need a new git repo for the KMS plugin reference implementation. It will need to be synced from the k/k staging dir.