KEP-3299: KMS v2 Improvements
- Release Signoff Checklist
- Summary
- Motivation
- Proposal
- Design Details
- Production Readiness Review Questionnaire
- Implementation History
- Alternatives
- Infrastructure Needed
Release Signoff Checklist
Items marked with (R) are required prior to targeting to a milestone / release.
- (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
- (R) KEP approvers have approved the KEP status as implementable
- (R) Design details are appropriately documented
- (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- e2e Tests for all Beta API Operations (endpoints)
- (R) Ensure GA e2e tests meet requirements for Conformance Tests
- (R) Minimum Two Week Window for GA e2e tests to prove flake free
- (R) Graduation criteria is in place
- (R) all GA Endpoints must be hit by Conformance Tests
- (R) Production readiness review completed
- (R) Production readiness review approved
- “Implementation History” section is up-to-date for milestone
- User-facing documentation has been created in kubernetes/website , for publication to kubernetes.io
- Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
Summary
This KEP proposes the new v2 KeyManagementService service contract to:
- enable partially automated key rotation for the latest key without API server restarts
- improve KMS plugin health check reliability
- improve observability of envelope operations between kube-apiserver, KMS plugins and KMS
Motivation
Performance: Today, when the kube-apiserver is restarted in a cluster and a LIST secret call is made (this applies to all resources encrypted at rest, which secrets tend to always be part of), due to the serial processing of LIST requests and the data encryption key (DEK) cache being empty, the initialization of informers may take significant time as a result of the large number of consecutive trips to the KMS plugin -> external KMS for all the DEKs that have been generated so far. This serial call can cause the KMS plugin to hit the external KMS rate limit and delay the overall readiness of the cluster. Currently, a DEK is generated for each object and is then encrypted using a KEK. This 1:1 mapping means if there is a burst of secret creation, then the KMS plugin can also hit the external KMS rate limit for encrypt operations.
Rotation: Currently, rotating a KMS key for Kubernetes requires many manual steps, and the process is error prone. It requires deployment of another instance of the KMS plugin with the new key running side by side with the old instance while adding a second entry of the new plugin to EncryptionConfiguration. Any change to the EncryptionConfiguration requires a kube-apiserver restart for the changes to take effect. For a single kube-apiserver configuration, this can lead to a brief period when the kube-apiserver is unavailable. The current rotation process requires multiple restarts of all kube-apiserver processes to ensure each server can decrypt and then encrypt using the new key. It requires multiple updates to the EncryptionConfiguration to move the new key to the second and then first entry in the keys array so that it is used for encryption in the config. It also requires running storage migration (either via the storage version migrator or a manual invocation of kubectl get secrets --all-namespaces -o json | kubectl replace -f - ) to encrypt all existing Secrets with the new key, which can time out and leave the cluster in a state where it is still dependent on the old key.
Health Check & Status: Today, the health check from kube-apiserver to KMS plugin is an Encrypt operation followed by Decrypt operation. These operations cost money in cloud environments and do not allow the plugin to perform more holistic checks on if it is healthy. Furthermore, a plugin has no way to inform the API server if its underlying key encryption key (KEK) has been rotated. If we provide a separate status RPC call with its own StatusRequest and StatusResponse, the KMS plugin can indicate the change in KEK version as part of response. This could be an indication that the KEK is now rotated and storage migration is now required.
Observability: The only way to correlate a successful/failed envelope operation today is to use the approximate timestamp of the operation to check events in kube-apiserver, kms-plugin and KMS. There is no guarantee that the timestamp of the operation is the same as the timestamp of the corresponding event in KMS. This KEP proposes extending the signature of the kms-plugin interface to include the transaction ID (to be generated by the kube-apiserver), which kms-plugin could pass to KMS. This transaction ID will be logged in the kube-apiserver with additional metadata such as secret name and namespace for the envelope operation. Similarly, the transaction ID will be logged in the kms-plugin and optionally passed to KMS.
Goals
- improve readiness times for clusters with a large number of encrypted resources
- reduce the likelihood of hitting the KMS rate limit
- enable partially automated key rotation for the latest key without API server restarts
- improve KMS plugin health check reliability
- improve observability of envelope operations between kube-apiserver, KMS plugins and KMS
- if this v2 API reaches beta in release M, the existing v1beta1 gRPC API will be deprecated at release M (or any later release)
- if this v2 API reaches GA in release N, the existing v1beta1 gRPC API will be disabled by default at N, stop supporting writes at N+3, and removed at release N+6 (the existing key rotation dance of using multiple providers will be used to migrate from v1beta1 to v2)
Non-Goals
- Prevent KMS rate limiting
- Recovery when KMS KEK is deleted
- Using the proposed transaction ID for audit logging
Proposal
Performance, Health Check, Observability and Rotation:
- Support re-using DEK when the KEK key ID is stable
- Expand EncryptionConfiguration to support a new KMSv2 configuration
- Add v2alpha1 KeyManagementService proto service contract in Kubernetes to include key_id and additional metadata in annotations to support key rotation
  - key_id: the KMS Key ID, stable identifier, changed to trigger key rotation and storage migration
  - annotations: structured data, can be used for debugging, recovery, opaque to API server, stored unencrypted, etc. Validation similar to how K8s labels are validated today. Labels have good size limits and restrictions today.
  - A status request and response periodically (order of minutes) returns version, healthz, and key_id
  - The key_id in status can be used on decrypt operations to compare and validate the key ID stored in the DEK cache and the latest EncryptResponse key_id to detect if an object is stale in terms of storage migration
- Generate a new UID for each envelope operation in kube-apiserver
- Add a new UID field to EncryptRequest and DecryptRequest
- Add support for hot reload of the EncryptionConfiguration:
  - Watch on the EncryptionConfiguration
  - When changes are detected, process the EncryptionConfiguration resource, and add new transformers and update existing ones atomically.
  - If there is an issue with creating or updating any of the transformers, retain the current configuration in the kube-apiserver and generate an error in logs.
- Enable partially automated rotation for latest key in KMS:
  NOTE: Prerequisite: EncryptionConfiguration is set up to always use the latest key version in KMS and the values can be interpreted dynamically at runtime by the KMS plugin to automatically reload the current write key.
  Rotation process sequence:
  - record initial key ID across all API servers
  - cause key rotation in KMS (user action in the remote KMS)
  - observe the change across the stack using metrics from API server
  - storage migration (run storage migrator)
Design Details
v2 API
EncryptionConfiguration will be expanded to support the new v2 API:
diff --git a/staging/src/k8s.io/apiserver/pkg/apis/config/v1/types.go b/staging/src/k8s.io/apiserver/pkg/apis/config/v1/types.go
index d7d68d2584d..84c1fa6546f 100644
--- a/staging/src/k8s.io/apiserver/pkg/apis/config/v1/types.go
+++ b/staging/src/k8s.io/apiserver/pkg/apis/config/v1/types.go
@@ -98,3 +99,10 @@ type KMSConfiguration struct {
+ // apiversion of KeyManagementService
+ APIVersion string `json:"apiVersion"`
Add v2 KeyManagementService proto service contract in Kubernetes to include key_id, annotations, and status.
DEK re-use in API server
In case of KMS v1, a new DEK is generated for every encryption. This means that for every write request, the API server makes a call to the KMS plugin to encrypt the DEK using the remote KEK. The API server also has to cache the DEKs to avoid making a call to the KMS plugin for every read request. When the API server restarts, it has to populate the cache by making a call to the KMS plugin for every DEK in the etcd store based on the cache size. This is a significant overhead for the API server.
With KMS v2, the API server will generate a DEK seed at startup and cache it. The API server also makes a call to the KMS plugin to encrypt the DEK seed using the remote KEK. This is a one-time call at startup and on KEK rotation. The API server then uses the cached DEK seed to generate single use DEKs via a Key Derivation Function (KDF). Each DEK is used once (and only once) to encrypt a resource. This reduces the number of calls to the KMS plugin and improves the overall latency of the API server requests.
key_id and rotation
What is required of the kube-apiserver is to be able to tell the KMS plugin which KEK (KMS KEK) it should use to decrypt the incoming DEK. To do so, upon encryption, the KMS plugin needs to provide the key_id for the KEK used as part of EncryptResponse. The kube-apiserver would then store it in etcd next to the DEK. Upon decryption, the kube-apiserver provides the key_id from the last encryption when calling Decrypt.
The key_id is the public, non-secret name of the remote KMS KEK that is currently in use. It may be logged during regular operation of the API server, and thus must not contain any private data. Plugin implementations are encouraged to use a hash to avoid leaking any data. The KMS v2 metrics take care to hash this value before exposing it via the /metrics endpoint.
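As a minimal sketch of the hashing step described above, the following shows one way a key_id could be turned into a fixed-length label before it is exposed. The helper name hashKeyID and the "sha256:" prefix are assumptions made for illustration; this document does not specify the exact format used by the metrics.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashKeyID is a hypothetical helper: it returns a fixed-length,
// non-reversible label for a key_id so the raw value never appears
// on the /metrics endpoint. The "sha256:" prefix is illustrative only.
func hashKeyID(keyID string) string {
	if keyID == "" {
		return "" // nothing to hash; the caller decides how to treat an empty key_id
	}
	sum := sha256.Sum256([]byte(keyID))
	return "sha256:" + hex.EncodeToString(sum[:])
}

func main() {
	// The key name below is a made-up example value.
	fmt.Println(hashKeyID("example-remote-kek-v1"))
}
```

A plugin that follows the recommendation to hash its own key_id could use the same approach on its side, so that the value stored in etcd already reveals nothing about the remote key name.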
The API server considers the key_id returned from the Status procedure call to be authoritative. Thus, a change to this value signals to the API server that the remote KEK has changed, and data encrypted with the old KEK should be marked stale when a no-op write is performed. If an EncryptRequest procedure call returns a key_id that is different from Status, the response is thrown away and the plugin is considered unhealthy.
NOTE: Thus implementations must guarantee that the key_id returned from Status will be the same as the one returned by EncryptRequest. Furthermore, plugins must ensure that the key_id is stable and does not flip-flop between values (i.e. during a remote KEK rotation).
Plugins must not re-use key_ids, even in situations where a previously used remote KEK has been reinstated. For example, if a plugin was using key_id=A, switched to key_id=B, and then went back to key_id=A - instead of reporting key_id=A the plugin should report some derivative value such as key_id=A_001 or use a new value such as key_id=C.
Since the API server polls Status about every minute, key_id rotation is not immediate. Furthermore, the API server will coast on the last valid state for about three minutes. Thus if a user wants to take a passive approach to storage migration (i.e. by waiting), they must schedule a migration to occur at 3 + N + M minutes after the remote KEK has been rotated (N is how long it takes the plugin to observe the key_id change and M is the desired buffer to allow config changes to be processed; a minimum M of five minutes is recommended). Note that no API server restart is required to perform KEK rotation.
NOTE: Because you don’t control the number of writes performed with the DEK, we will recommend rotating the KEK at least every 90 days.
message EncryptResponse {
// The encrypted data.
bytes ciphertext = 1;
// The KMS key ID used to encrypt the data. This must always refer to the KMS KEK.
// This can be used to inform staleness of data updated via value.Transformer.TransformFromStorage.
string key_id = 2;
// Additional metadata to be stored with the encrypted data.
// This data is stored in plaintext in etcd. KMS plugin implementations are responsible for pre-encrypting any sensitive data.
map<string, bytes> annotations = 3;
}
The DecryptRequest passes the same key_id and annotations returned by the previous EncryptResponse of this data as its key_id and annotations for the decryption request.
message DecryptRequest {
// The data to be decrypted.
bytes ciphertext = 1;
// UID is a unique identifier for the request.
string uid = 2;
// The keyID that was provided to the apiserver during encryption.
// This represents the KMS KEK that was used to encrypt the data.
string key_id = 3;
// Additional metadata that was sent by the KMS plugin during encryption.
map<string, bytes> annotations = 4;
}
message DecryptResponse {
// The decrypted data.
bytes plaintext = 1;
}
message EncryptRequest {
// The data to be encrypted.
bytes plaintext = 1;
// UID is a unique identifier for the request.
string uid = 2;
}
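To make the round trip concrete, here is a toy, in-memory stand-in for a plugin. It is a sketch under loud assumptions: the Go structs only mirror the proto messages above (a real plugin would use the generated gRPC types), toyPlugin and newToyPlugin are hypothetical, the annotation key is made up, and base64 is used in place of real encryption purely to keep the example self-contained.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
)

// Plain Go mirrors of the proto messages; not the generated types.
type EncryptRequest struct {
	Plaintext []byte
	UID       string
}

type EncryptResponse struct {
	Ciphertext  []byte
	KeyID       string
	Annotations map[string][]byte
}

type DecryptRequest struct {
	Ciphertext  []byte
	UID         string
	KeyID       string
	Annotations map[string][]byte
}

type DecryptResponse struct {
	Plaintext []byte
}

// toyPlugin is a hypothetical plugin. Base64 is NOT encryption.
type toyPlugin struct {
	currentKeyID string
	knownKeyIDs  map[string]bool
}

func newToyPlugin(keyID string) *toyPlugin {
	return &toyPlugin{currentKeyID: keyID, knownKeyIDs: map[string]bool{keyID: true}}
}

// Encrypt always reports the key_id of the key it used.
func (p *toyPlugin) Encrypt(req EncryptRequest) (EncryptResponse, error) {
	ct := base64.StdEncoding.EncodeToString(req.Plaintext)
	return EncryptResponse{
		Ciphertext:  []byte(ct),
		KeyID:       p.currentKeyID,
		Annotations: map[string][]byte{"version.example.com": []byte("1")}, // made-up key
	}, nil
}

// Decrypt relies on the key_id that the API server stored at encryption time.
func (p *toyPlugin) Decrypt(req DecryptRequest) (DecryptResponse, error) {
	if !p.knownKeyIDs[req.KeyID] {
		return DecryptResponse{}, errors.New("unknown key_id")
	}
	pt, err := base64.StdEncoding.DecodeString(string(req.Ciphertext))
	if err != nil {
		return DecryptResponse{}, fmt.Errorf("decode ciphertext: %w", err)
	}
	return DecryptResponse{Plaintext: pt}, nil
}

func main() {
	p := newToyPlugin("kek-1")
	enc, _ := p.Encrypt(EncryptRequest{Plaintext: []byte("dek seed"), UID: "uid-1"})
	// The key_id and annotations from the EncryptResponse are passed back unchanged.
	dec, err := p.Decrypt(DecryptRequest{Ciphertext: enc.Ciphertext, UID: "uid-2", KeyID: enc.KeyID, Annotations: enc.Annotations})
	fmt.Println(string(dec.Plaintext), err)
}
```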
In terms of storage, a new structured protobuf format is proposed. The prefix for the new format is k8s:enc:kms:v2:<config name>:.
// EncryptedObject is the representation of data stored in etcd after envelope encryption.
type EncryptedObject struct {
// EncryptedData is the encrypted data.
EncryptedData []byte `protobuf:"bytes,1,opt,name=encryptedData,proto3" json:"encryptedData,omitempty"`
// KeyID is the KMS key ID used for encryption operations.
KeyID string `protobuf:"bytes,2,opt,name=keyID,proto3" json:"keyID,omitempty"`
// EncryptedDEKSource is the ciphertext of the source of the DEK used to encrypt the data stored in encryptedData.
// encryptedDEKSourceType defines the process of using the plaintext of this field to determine the aforementioned DEK.
EncryptedDEKSource []byte `protobuf:"bytes,3,opt,name=encryptedDEKSource,proto3" json:"encryptedDEKSource,omitempty"`
// Annotations is additional metadata that was provided by the KMS plugin.
Annotations map[string][]byte `protobuf:"bytes,4,rep,name=annotations,proto3" json:"annotations,omitempty" protobuf_key:"bytes,1,opt,name=key,proto3" protobuf_val:"bytes,2,opt,name=value,proto3"`
// encryptedDEKSourceType defines the process of using the plaintext of encryptedDEKSource to determine the DEK.
EncryptedDEKSourceType EncryptedDEKSourceType `protobuf:"varint,5,opt,name=encryptedDEKSourceType,proto3,enum=v2.EncryptedDEKSourceType" json:"encryptedDEKSourceType,omitempty"`
}
type EncryptedDEKSourceType int32
const (
// AES_GCM_KEY means that the plaintext of encryptedDEKSource is the DEK itself, with AES-GCM as the encryption algorithm.
EncryptedDEKSourceType_AES_GCM_KEY EncryptedDEKSourceType = 0
// HKDF_SHA256_XNONCE_AES_GCM_SEED means that the plaintext of encryptedDEKSource is the pseudo random key
// (referred to as the seed throughout the code) that is fed into HKDF expand. SHA256 is the hash algorithm
// and first 32 bytes of encryptedData are the info param. The first 32 bytes from the HKDF stream are used
// as the DEK with AES-GCM as the encryption algorithm.
EncryptedDEKSourceType_HKDF_SHA256_XNONCE_AES_GCM_SEED EncryptedDEKSourceType = 1
)
This object simply provides a structured format to store the EncryptResponse data with the plugin name and encrypted object data. New fields can easily be added to this format. EncryptedDEKSourceType was added to support a KDF based approach with the security properties of single use DEKs with the performance properties of a long lived DEK (HKDF_SHA256_XNONCE_AES_GCM_SEED).
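The storage prefix can be illustrated with a small sketch that builds it and splits a stored value back into the config name and the serialized EncryptedObject bytes. The helper names are hypothetical, and the sketch assumes that a config name never contains a colon.

```go
package main

import (
	"bytes"
	"fmt"
)

// kmsV2Prefix is the fixed part of the storage prefix for the new format.
const kmsV2Prefix = "k8s:enc:kms:v2:"

// storagePrefix returns the full prefix for a provider config name.
func storagePrefix(configName string) string {
	return kmsV2Prefix + configName + ":"
}

// splitStored separates a stored value into the config name and the
// payload that follows the prefix. Assumes the name contains no ':'.
func splitStored(stored []byte) (configName string, payload []byte, ok bool) {
	if !bytes.HasPrefix(stored, []byte(kmsV2Prefix)) {
		return "", nil, false
	}
	rest := stored[len(kmsV2Prefix):]
	i := bytes.IndexByte(rest, ':')
	if i <= 0 {
		return "", nil, false // missing or empty config name
	}
	return string(rest[:i]), rest[i+1:], true
}

func main() {
	stored := append([]byte(storagePrefix("my-kms")), 0x0a, 0x01, 'x')
	name, payload, ok := splitStored(stored)
	fmt.Println(name, len(payload), ok)
}
```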
Status API
To improve health check reliability, the new StatusResponse provides version, healthz information, and can trigger key rotation via storage version status updates.
message StatusRequest {}
message StatusResponse {
// Version of the KMS plugin API. Must match the configured .resources[].providers[].kms.apiVersion
string version = 1;
// Any value other than "ok" is failing healthz. On failure, the associated API server healthz endpoint will contain this value as part of the error message.
string healthz = 2;
// the current write key, used to determine staleness of data updated via value.Transformer.TransformFromStorage.
string key_id = 3;
}
The key_id will be funneled into API server metrics.
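A sketch of how a StatusResponse might be checked against the rules in the message comments follows. The struct mirrors the proto message rather than using generated code, validateStatus is a hypothetical helper, and treating an empty key_id as an error is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// StatusResponse mirrors the proto message above.
type StatusResponse struct {
	Version string
	Healthz string
	KeyID   string
}

// validateStatus returns the first violation it finds, or nil.
// configuredVersion is the apiVersion from the provider's kms configuration.
func validateStatus(resp StatusResponse, configuredVersion string) error {
	if resp.Version != configuredVersion {
		return fmt.Errorf("version %q does not match configured apiVersion %q", resp.Version, configuredVersion)
	}
	if resp.Healthz != "ok" {
		// Any value other than "ok" fails healthz and is surfaced in the error.
		return fmt.Errorf("plugin unhealthy: %s", resp.Healthz)
	}
	if resp.KeyID == "" {
		// Assumption: without a key_id the staleness check cannot work.
		return errors.New("status response has empty key_id")
	}
	return nil
}

func main() {
	fmt.Println(validateStatus(StatusResponse{Version: "v2", Healthz: "ok", KeyID: "kek-1"}, "v2"))
	fmt.Println(validateStatus(StatusResponse{Version: "v2", Healthz: "remote KMS unreachable", KeyID: "kek-1"}, "v2"))
}
```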
Observability
To improve observability, this design also generates a new UID for each envelope operation, similar to UID generation in admission review requests: https://github.com/kubernetes/kubernetes/blob/e9e669aa6037c380469b45200e59cff9b52d6d68/staging/src/k8s.io/apiserver/pkg/admission/plugin/webhook/request/admissionreview.go#L137
This UID field is included in the EncryptRequest and DecryptRequest of the v2 API. It will always be present. It is generated in the kube-apiserver and will be used:
- For logging in the kube-apiserver. All envelope operations to the kms-plugin will be logged with the corresponding UID.
  - The UID will be logged using a wrapper in the kube-apiserver to ensure that the UID is logged in the same format and is always logged.
  - In addition to the UID, the kube-apiserver will also log at log level 6+ non-sensitive metadata such as name, namespace and GroupVersionResource of the object that triggered the envelope operation.
- Sent to the kms-plugin as part of the EncryptRequest and DecryptRequest structs.
Metrics
- apiserver_encryption_config_controller_automatic_reload_last_timestamp_seconds - Timestamp of the last successful or failed automatic reload of encryption configuration split by apiserver identity.
- apiserver_encryption_config_controller_automatic_reload_success_total - Total number of successful automatic reloads of encryption configuration split by apiserver identity.
- apiserver_envelope_encryption_dek_source_cache_size - Number of records in data encryption key (DEK) source cache. On a restart, this value is an approximation of the number of decrypt RPC calls the server will make to the KMS plugin.
- apiserver_envelope_encryption_key_id_hash_last_timestamp_seconds - The last time in seconds when a keyID was used.
- apiserver_envelope_encryption_key_id_hash_status_last_timestamp_seconds - The last time in seconds when a keyID was returned by the Status RPC call.
- apiserver_envelope_encryption_key_id_hash_total - Number of times a keyID is used split by transformation type, provider, and apiserver identity.
- apiserver_envelope_encryption_kms_operations_latency_seconds - KMS operation duration with gRPC error code status total.
- apiserver_storage_envelope_transformation_cache_misses_total - Total number of cache misses while accessing key decryption key(KEK).
- apiserver_storage_transformation_duration_seconds - Latencies in seconds of value transformation operations.
- apiserver_storage_transformation_operations_total - Total number of transformations. Successful transformation will have a status ‘OK’ and a varied status string when the transformation fails. This status and transformation_type fields may be used for alerting on encryption/decryption failure using transformation_type from_storage for decryption and to_storage for encryption.
Sequence Diagram
Encrypt Request
%%{init:{"sequence": {"mirrorActors":true},
"themeVariables": {
"actorBkg":"royalblue",
"actorTextColor":"white"
}}}%%
sequenceDiagram
participant user
participant kube_api_server
participant kms_plugin
participant external_kms
alt Generate DEK seed at startup
Note over kube_api_server,external_kms: Refer to Generate Data Encryption Key (DEK) Seed diagram for details
end
user->>kube_api_server: create/update resource that's to be encrypted
kube_api_server->>kube_api_server: generate DEK using DEK seed
kube_api_server->>kube_api_server: encrypt resource with DEK
kube_api_server->>etcd: store encrypted object
Decrypt Request
%%{init:{"sequence": {"mirrorActors":true},
"themeVariables": {
"actorBkg":"royalblue",
"actorTextColor":"white"
}}}%%
sequenceDiagram
participant user
participant kube_api_server
participant kms_plugin
participant external_kms
participant etcd
user->>kube_api_server: get/list resource that's encrypted
kube_api_server->>etcd: get encrypted resource
etcd->>kube_api_server: encrypted resource
alt Encrypted DEK seed not in cache
kube_api_server->>kms_plugin: decrypt request
kms_plugin->>external_kms: decrypt DEK seed with remote KEK
external_kms->>kms_plugin: decrypted DEK seed
kms_plugin->>kube_api_server: return decrypted DEK seed
kube_api_server->>kube_api_server: cache decrypted DEK seed
end
kube_api_server->>kube_api_server: generate DEK using DEK seed
kube_api_server->>kube_api_server: decrypt resource with DEK
kube_api_server->>user: return decrypted resource
Status Request
%%{init:{"sequence": {"mirrorActors":true},
"themeVariables": {
"actorBkg":"royalblue",
"actorTextColor":"white"
}}}%%
sequenceDiagram
participant kube_api_server
participant kms_plugin
participant external_kms
alt Generate DEK seed at startup
Note over kube_api_server,external_kms: Refer to Generate Data Encryption Key (DEK) Seed diagram for details
end
loop every minute (or every 10s if error or unhealthy)
kube_api_server->>kms_plugin: status request
kms_plugin->>external_kms: validate remote KEK
external_kms->>kms_plugin: KEK status
kms_plugin->>kube_api_server: return status response <br/> {"healthz": "ok", key_id: "<remote KEK ID>", "version": "v2beta1"}
alt KEK rotation detected (key_id changed), rotate DEK seed
Note over kube_api_server,external_kms: Refer to Generate Data Encryption Key (DEK) Seed diagram for details
end
end
Generate Data Encryption Key (DEK) Seed
%%{init:{"sequence": {"mirrorActors":true},
"themeVariables": {
"actorBkg":"royalblue",
"actorTextColor":"white"
}}}%%
sequenceDiagram
participant kube_api_server
participant kms_plugin
participant external_kms
kube_api_server->>kube_api_server: generate DEK seed
kube_api_server->>kms_plugin: encrypt request
kms_plugin->>external_kms: encrypt DEK seed with remote KEK
external_kms->>kms_plugin: encrypted DEK seed
kms_plugin->>kube_api_server: return encrypt response <br/> {"ciphertext": "<encrypted DEK seed>", key_id: "<remote KEK ID>", "annotations": {}}
Cryptography Details
We propose to extend the limited 12 byte AES-GCM nonce with a 32 byte info (randomly generated per write) that is fed into HKDF-Expand (the secret is the DEK seed and the hash is SHA-256). We read 32 bytes from HKDF-Expand to use as the AES-GCM DEK.
We want the crypto properties of KMS v1 (one DEK per write) without the network overhead. The DEK seed (32 random bytes) is generated on server start up and automatically rotated whenever the remote KEK changes. Note that the HKDF-Extract step is skipped because we already have a good pseudo random key (thus there is no salt, only info).
This allows us to use a purely per-write random 12 byte nonce for AES-GCM because each generated DEK+nonce combination is unique (the chance of collision is negligible). VM state restores are not an issue in this model.
While not strictly necessary, a cache will be used to memoize the HKDF operations as they are fully deterministic based on the inputs. This significantly reduces the overhead of the key generation both in terms of CPU time and memory allocations.
Note that the info must be stored (in the clear) with the ciphertext, meaning we increase the storage overhead by 32 bytes.
stateDiagram-v2
KEK
note right of KEK
accessed via plugin
end note
KEK --> DEK_seed: encrypts
DEK_seed --> etcd: Encrypted_DEK_seed stored
stateDiagram-v2
etcd_path
note left of etcd_path
unique per object in etcd
/PATH_PREFIX/secrets/NAMESPACE/NAME
end note
resource
note right of resource
stored in EncryptedObject.EncryptedData
as info|nonce|ciphertext
end note
DEK_seed --> hkdf_expand: pseudo random key
sha256 --> hkdf_expand: hash
rand_nonce_32 --> hkdf_expand: info param
hkdf_expand --> DEK: generates
DEK --> aes_gcm: key
rand_nonce_12 --> aes_gcm: nonce
etcd_path --> aes_gcm: additional_data
aes_gcm --> resource: encrypts
Benchmarks
Extensive benchmarks were performed to compare the impact of having KMS v2 encryption enabled. The most relevant run is included below. It shows that there is no significant increase in the amount of time it takes to perform a REST call, but that the cost of encryption can be as high as 14% in terms of memory usage. This is considered an acceptable tradeoff.
│ rest_none1.txt │ rest_kdf_cache1.txt │
│ sec/op │ sec/op vs base │
KMSv2REST-10 18.51 ± 41% 20.39 ± 99% ~ (p=0.353 n=10)
│ rest_none1.txt │ rest_kdf_cache1.txt │
│ B/op │ B/op vs base │
KMSv2REST-10 23.95Gi ± 0% 27.25Gi ± 0% +13.77% (p=0.000 n=10)
│ rest_none1.txt │ rest_kdf_cache1.txt │
│ allocs/op │ allocs/op vs base │
KMSv2REST-10 3.119M ± 0% 3.268M ± 1% +4.78% (p=0.000 n=10)
Comparing Metrics
Multiple runs of e2e tests were performed to compare the impact of having KMS v2 encryption of all resources vs no encryption at all. The results are included below.
It shows that there is no significant increase in the following API server metrics: apiserver_request_duration_seconds, apiserver_request_terminations_total, apiserver_request_aborts_total.
| | post* | get* | delete* | list* |
|---|---|---|---|---|
| run w/o encrypt | | | | |
| 1 | 0.0225 | 0.0086 | 0.0103 | 0.0046 |
| 2 | 0.0336 | 0.0076 | 0.0119 | 0.0058 |
| 3 | 0.0205 | 0.0081 | 0.0117 | 0.0047 |
| average w/o encrypt | 0.025533 | 0.0081 | 0.0113 | 0.005033 |
| run w/ encrypt | | | | |
| 4 | 0.0219 | 0.0071 | 0.0109 | 0.0051 |
| 5 | 0.0229 | 0.0062 | 0.01 | 0.0045 |
| 6 | 0.0279 | 0.0082 | 0.0119 | 0.005 |
| average w/ encrypt | 0.024233 | 0.007167 | 0.010933 | 0.004867 |
| % diff between averages | -5.09138 | -11.5226 | -3.24484 | -3.31126 |
*average apiserver_request_duration_seconds = apiserver_request_duration_seconds_sum / apiserver_request_duration_seconds_count
Both apiserver_request_terminations_total and apiserver_request_aborts_total resulted in no difference.
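The "% diff between averages" row can be reproduced from the per-run values. The short sketch below recomputes it for the post column; the helper names are hypothetical, and the 20% threshold is the acceptable delta named in the e2e test plan.

```go
package main

import "fmt"

// mean returns the arithmetic mean of xs (xs must be non-empty).
func mean(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// pctDiff returns the change from base to other as a percentage of base.
func pctDiff(base, other float64) float64 {
	return (other - base) / base * 100
}

// withinDelta reports whether the absolute percentage change is below limit.
func withinDelta(pct, limit float64) bool {
	if pct < 0 {
		pct = -pct
	}
	return pct < limit
}

func main() {
	// post* column: runs 1-3 without encryption, runs 4-6 with encryption.
	without := []float64{0.0225, 0.0336, 0.0205}
	with := []float64{0.0219, 0.0229, 0.0279}
	d := pctDiff(mean(without), mean(with))
	fmt.Printf("post: %.2f%%\n", d) // prints post: -5.09%
	fmt.Println(withinDelta(d, 20))
}
```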
Test Plan
[x] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.
Prerequisite testing updates
Unit tests
- Validate DEK seed re-use behavior in the API server
- DEK seed is generated at startup and re-used for all encryption operations
- DEK seed is rotated after KEK rotation
- KMS plugin is only called when cache is empty
- Ensure the logs and metrics are generated as expected
- At least 75% code coverage
- Staleness check based on keyID in the StatusResponse
- Unit test for gRPC request/response validation
  - Serialize/Deserialize EncryptedObject
  - Validate keyID in StatusResponse and EncryptResponse
  - Validate annotations in EncryptResponse
  - Validate logs are generated with the correct UID and additional metadata
Integration tests
- Integration tests to validate
- Encryption of custom resources and custom resource definitions
- No-op writes cause rewrite of stale data (data that has correct schema but was encrypted with keyID that is not the latest)
- Health checks
  - single health check for v2 at /kms-providers
  - individual health checks for v1 and v2 with /kms-provider-0 and /kms-provider-1
- Integration tests with base64 plugin to validate the encryption and decryption of data
- Integration tests to check rotation is possible without restarting API server
- Integration tests that exercise the feature enablement/disablement flow
e2e tests
With this e2e test suite, we want to do the following:
- Run the e2e suite against a kind cluster without kms encryption enabled.
- Run the e2e suite against a kind cluster that has kms v2 encryption enabled (as defined below).
- Compare request_duration_seconds, request_terminations_total, request_aborts_total API server metrics between the two runs. The acceptable delta should be less than 20%.
- Observe metrics from the mock implementation to determine time taken at each step of the encryption/decryption process.
- Observe API server startup time with and without kms encryption enabled.
- KMSv2 config would use the mock implementation
- Validate all resources are encrypted
- The “remote” kms would be a local encryption key
- that adds 100 ms latency
- that has rate limiting
Graduation Criteria
Alpha
- Feature implemented behind a feature flag
- Initial unit and integration tests completed and enabled
Beta
- Feature is enabled by default
- All of the above documented tests are complete
- Metrics in API server to gauge performance impact
GA
- Tracing is added to the API server to assess transformation timings
- At least 2 KMSv2 plugin implementations are available
- We will gather feedback from these implementations to determine if API is sufficient
- Reference implementation using PKCS11
Production Readiness Review Questionnaire
Feature Enablement and Rollback
How can this feature be enabled / disabled in a live cluster?
- Feature gate
  - Feature gate name: KMSv2
  - Components depending on the feature gate:
    - kube-apiserver
Alpha
FeatureSpec{
Default: false,
LockToDefault: false,
PreRelease: featuregate.Alpha,
}
Beta
FeatureSpec{
Default: true,
LockToDefault: false,
PreRelease: featuregate.Beta,
}
GA
FeatureSpec{
Default: true,
LockToDefault: true,
PreRelease: featuregate.GA,
}
Does enabling the feature change any default behavior?
No. The v2 API is new in the v1.25 release. Furthermore, even with the feature enabled by default, the user needs to explicitly configure a KMSv2 provider to use this.
Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
Yes. To disable encryption at rest using the v2 API:
- Add new identity provider at the top of encryption config
- Restart kube-apiserver
- Run storage migration to migrate all the existing encrypted data to use the identity provider
  - If running kubectl get <resource> --all-namespaces -o json | kubectl replace -f - to migrate data, the user can confirm that the migration is complete by observing the kube-apiserver metrics apiserver_envelope_encryption_key_id_hash_last_timestamp_seconds and apiserver_envelope_encryption_key_id_hash_total. These metrics will no longer contain the keyID hash of the old KEK after storage migration and kube-apiserver restart.
  - If running storage version migrator to migrate data, the user can confirm that the migration is complete by observing the conditions in storageversionmigrations. Refer to doc for more details. Using the storage version migrator is recommended.
- Remove the KMS provider from the encryption config and restart kube-apiserver
- At the end of these steps, all the data in etcd will be unencrypted.
More details are available here
Disabling this gate without first doing a storage migration to use a different encryption at rest mechanism will result in data loss.
- For secrets that are mounted in pods, if the DEK used to encrypt the secret is not present in the kube-apiserver cache, the pods will fail to start as the secret will not be able to be decrypted.
What happens if we reenable the feature if it was previously rolled back?
After the feature is reenabled, if a v2 KMS provider is still configured in the EncryptionConfiguration:
- All new data will be encrypted with the external kms provider.
- Existing data can be decrypted if the key used for encryption before feature rollback still exists.
Are there any tests for feature enablement/disablement?
- We will add unit and integration tests to validate the enablement/disablement flow.
- When the feature is disabled, data stored in etcd will no longer be encrypted using the external kms provider with v2 API.
- If the feature is disabled incorrectly (i.e., without performing a storage migration), existing data that is encrypted with the external kms provider cannot be decrypted. This will cause list and get operations to fail for the resources that were encrypted.
Rollout, Upgrade and Rollback Planning
How can a rollout or rollback fail? Can it impact already running workloads?
- Rolling back the feature without first doing a storage migration to use a different encryption at rest mechanism will result in data loss.
- Workloads relying on existing data in etcd will no longer be able to access it.
- The data can be retrieved by reenabling the feature gate or deleting and recreating the data.
- The rollout of the feature can fail if there are too many calls to the external kms provider.
- API server will not report healthy.
- For highly-available clusters, the feature can be enabled on some API servers only for read purpose.
- For rollout, add KMSv2 providers as read across all API servers first before adding the provider for write.
- For rollback, move KMSv2 providers from write to read position across all API servers.
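The read-before-write ordering for highly-available clusters can be illustrated with a small model. It relies on the EncryptionConfiguration rule that the first provider in the list is used for writes while every listed provider can be used for reads. This is a sketch only: the function names, the provider names ("identity", "kmsv2") and the API server names are hypothetical.

```python
def add_as_read(providers, name):
    """Rollout step 1: append the provider so it is available for reads only."""
    if name in providers:
        return list(providers)
    return list(providers) + [name]


def promote_to_write(providers, name):
    """Rollout step 2: move the provider to the first (write) position."""
    if name not in providers:
        raise ValueError(f"{name} must be readable before it is used for writes")
    return [name] + [p for p in providers if p != name]


def demote_to_read(providers, name):
    """Rollback: move the provider out of the write position but keep it readable."""
    others = [p for p in providers if p != name]
    if name not in providers or not others:
        raise ValueError("another provider must take over writes")
    return [others[0], name] + others[1:]


# Provider order per API server before the rollout (placeholder names).
servers = {"apiserver-a": ["identity"], "apiserver-b": ["identity"]}

# Step 1 on every API server: KMSv2 becomes readable everywhere.
stage_1 = {s: add_as_read(p, "kmsv2") for s, p in servers.items()}

# Step 2, only after step 1 has reached every API server: switch writes.
stage_2 = {s: promote_to_write(p, "kmsv2") for s, p in stage_1.items()}

# Rollback: writes return to the previous provider, reads keep working.
rolled_back = {s: demote_to_read(p, "kmsv2") for s, p in stage_2.items()}

print(stage_1, stage_2, rolled_back, sep="\n")
```

Completing step 1 everywhere before step 2 means no API server is asked to read data written through a provider it has not yet loaded.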
What specific metrics should inform a rollback?
- Latency metrics transformation_duration_seconds
- Transformation error count metric apiserver_storage_transformation_duration_seconds_bucket{transformation_type="from_storage", transformer_prefix="k8s:enc:kms:v2:"}
- After rollback is complete, you should no longer see the keyID metric apiserver_envelope_encryption_key_id_hash_total increment.
Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
This will be covered by integration tests.
Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
- The cacheSize field in EncryptionConfiguration is no longer valid for KMS v2.
- When KMSv2 is used without a KMSv1 provider, the health endpoints do not report each KMS provider individually.
Monitoring Requirements
How can someone using this feature know that it is working for their instance?
- Other (treat as last resort)
  - Details:
    - Logs in kube-apiserver, kms-plugin and KMS will be logged with the corresponding key_id, annotations, and UID.
    - Number of times a keyID is used for encryption/decryption
    - Metric recording the last time in seconds when a keyID was returned in the StatusResponse e.g. apiserver_envelope_encryption_key_id_hash_status_last_timestamp_seconds{key_id_hash="sha256", provider_name="providerName"} 1.674865558833728e+09
What are the reasonable SLOs (Service Level Objectives) for the enhancement?
There should be no impact on the SLO with this change.
What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
- Other (treat as last resort)
  - Details:
    - Logs in kube-apiserver, kms-plugin and KMS will be logged with the corresponding key_id, annotations, and UID.
    - Metrics for latency of encryption/decryption requests.
Dependencies
Does this feature depend on any specific services running in the cluster?
This feature requires the KMS plugin to be running.
Scalability
Will enabling / using this feature result in any new API calls?
Yes, the new KMS v2 gRPC API.
Will enabling / using this feature result in introducing new API types?
Yes, the new KMS v2 gRPC types.
Will enabling / using this feature result in any new calls to the cloud provider?
No.
Will enabling / using this feature result in increasing size or count of the existing API objects?
No, the v2 API is new.
Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
No.
Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, …) in any components?
No.
Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
No. One socket is used per KMS plugin and old connections are closed after new connections have been created/validated for health during an automatic reload of EncryptionConfiguration.
Troubleshooting
How does this feature react if the API server and/or etcd is unavailable?
- This feature is part of the API server. The feature is unavailable if the API server is unavailable. etcd data encryption with the external kms-plugin will be unavailable.
- If the API server is unavailable, clients will be unable to create/get data that’s stored in etcd. There will be no requests from the API server to the kms-plugin.
- If the EncryptionConfiguration file configured in the control plane node is not valid:
  - API server when restarted will fail at startup as it’s unable to load the EncryptionConfig. This behavior is consistent with the KMS v1 API. The encryption configuration needs to be fixed to allow the API server to start properly.
- If the KMS plugin is unavailable:
  - API server when restarted will fail health check as it’s unable to connect to the KMS plugin. The /healthz and /readyz (but not the /livez which ignores kms) endpoints will show a failed health check for the kms provider. This behavior is consistent with the KMS v1 API. Refer to docs for the health API endpoints and how to exclude individual endpoints from causing the API server to fail health check.
  - To resolve the issue, the kms plugin must be fixed to be available. The logs in the kms-plugin should be indicative of the issue.
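When triaging a failed health check, the verbose output of /readyz or /healthz can be scanned for the failing KMS entry. The following is a rough sketch that assumes the usual verbose layout of one "[+]name ok" or "[-]name failed: ..." line per check. The helper names are hypothetical, and the check names in the sample are illustrative rather than taken from this KEP.

```python
def failed_checks(verbose_output):
    """Return the names of checks reported as failed in verbose health output."""
    failed = []
    for line in verbose_output.splitlines():
        line = line.strip()
        if line.startswith("[-]"):
            # "[-]name failed: reason withheld" -> "name"
            failed.append(line[3:].split(" ", 1)[0])
    return failed


def kms_failures(verbose_output):
    """Return only the failed checks that look KMS-related (assumed name prefix)."""
    return [n for n in failed_checks(verbose_output) if n.startswith("kms-provider")]


# Illustrative verbose output; the check names are assumptions.
SAMPLE_OUTPUT = """\
[+]ping ok
[+]etcd ok
[-]kms-providers failed: reason withheld
"""

print(kms_failures(SAMPLE_OUTPUT))
```

If a KMS check shows up here, the kms-plugin logs mentioned above are the next place to look.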
Implementation History
- 2022-05-09: Initial KEP draft submitted.
- 2022-09-09: KMSv2 Alpha (v1.25) implemented https://github.com/kubernetes/kubernetes/pull/111126
- 2023-02-28: re-use DEK while key ID is unchanged https://github.com/kubernetes/kubernetes/pull/116155
- This was a change in design from alpha to beta. The key hierarchy approach initially suggested in the KEP was dropped in favor of re-using DEKs based on metrics numbers from CI.
- 2023-03-14: Generate proto API and update feature gate for beta https://github.com/kubernetes/kubernetes/pull/115123
- 2023-07-21: KDF based nonce extension https://github.com/kubernetes/kubernetes/pull/118828
Alternatives
Performance and rotation:
We considered the following approaches and each has its own drawbacks:
- cacheSize field in EncryptionConfiguration. It is used by the API server to initialize a LRU cache of the given size with the encrypted ciphertext used as index. Having a higher value for the cacheSize will prevent calls to the plugin for decryption operations. However, this does not solve the issue with the number of calls to KMS plugin when encryption traffic is bursty.
- Key hierarchy in the KMS plugin.
  - No changes to the API server, keep 1:1 DEK mapping
    - Assumption: A KMS plugin that was implemented using a local HSM would not need any changes because it would be able to handle the amount of encryption calls with ease since it would not need to perform network IO
    - Assumption: local gRPC calls to the KMS plugin do not represent significant overhead
  - KMS plugin generates its own local KEK in-memory
  - External KMS is used to encrypt the local KEK
  - Local KEK is used for encryption of DEKs sent by API server
  - Local KEK is used for encryption based on policy (N events, X time, etc)

We tested this approach and the metrics in CI indicated the gRPC calls to the KMS plugin added significant overhead. The gRPC call latency was in the order of 0.1s (refer to apiserver_envelope_encryption_kms_operations_latency_seconds_bucket here).

API server encryption metric from CI run
# HELP apiserver_envelope_encryption_kms_operations_latency_seconds [ALPHA] KMS operation duration with gRPC error code status total.
# TYPE apiserver_envelope_encryption_kms_operations_latency_seconds histogram
apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="0.0001"} 0
apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="0.0002"} 0
apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="0.0004"} 60
apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="0.0008"} 2947
apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="0.0016"} 5090
apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="0.0032"} 6639
apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="0.0064"} 8076
apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="0.0128"} 9448
apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="0.0256"} 10875
apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="0.0512"} 12236
apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="0.1024"} 13442
apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="0.2048"} 14153
apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="0.4096"} 14426
apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="0.8192"} 14533
apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="1.6384"} 14544
apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="3.2768"} 14544
apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="6.5536"} 14544
apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="13.1072"} 14544
apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="26.2144"} 14544
apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="52.4288"} 14544
apiserver_envelope_encryption_kms_operations_latency_seconds_bucket{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider",le="+Inf"} 14544
apiserver_envelope_encryption_kms_operations_latency_seconds_sum{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider"} 433.331096833
apiserver_envelope_encryption_kms_operations_latency_seconds_count{grpc_status_code="OK",method_name="/v2alpha1.KeyManagementService/Encrypt",provider_name="kmsprovider"} 14544

Encrypt Request
sequenceDiagram
    participant etcd
    participant kubeapiserver
    participant kmsplugin
    participant externalkms
    kubeapiserver->>kmsplugin: encrypt request
    alt using key hierarchy
        kmsplugin->>kmsplugin: encrypt DEK with local KEK
        kmsplugin->>externalkms: encrypt local KEK with remote KEK
        externalkms->>kmsplugin: encrypted local KEK
        kmsplugin->>kmsplugin: cache encrypted local KEK
        kmsplugin->>kubeapiserver: return encrypt response <br/> {"ciphertext": "<encrypted DEK>", key_id: "<remote KEK ID>", <br/> "annotations": {"kms.kubernetes.io/local-kek": "<encrypted local KEK>"}}
    else not using key hierarchy
        %% current behavior
        kmsplugin->>externalkms: encrypt DEK with remote KEK
        externalkms->>kmsplugin: encrypted DEK
        kmsplugin->>kubeapiserver: return encrypt response <br/> {"ciphertext": "<encrypted DEK>", key_id: "<remote KEK ID>", "annotations": {}}
    end
    kubeapiserver->>etcd: store encrypt response and encrypted DEK

Decrypt Request
sequenceDiagram
    participant kubeapiserver
    participant kmsplugin
    participant externalkms
    %% if local KEK in annotations, then using hierarchy
    alt encrypted local KEK is in annotations
        kubeapiserver->>kmsplugin: decrypt request <br/> {"ciphertext": "<encrypted DEK>", key_id: "<key_id gotten as part of EncryptResponse>", <br/> "annotations": {"kms.kubernetes.io/local-kek": "<encrypted local KEK>"}}
        alt encrypted local KEK in cache
            kmsplugin->>kmsplugin: decrypt DEK with local KEK
        else encrypted local KEK not in cache
            kmsplugin->>externalkms: decrypt local KEK with remote KEK
            externalkms->>kmsplugin: decrypted local KEK
            kmsplugin->>kmsplugin: decrypt DEK with local KEK
            kmsplugin->>kmsplugin: cache decrypted local KEK
        end
        kmsplugin->>kubeapiserver: return decrypt response <br/> {"plaintext": "<decrypted DEK>", key_id: "<remote KEK ID>", <br/> "annotations": {"kms.kubernetes.io/local-kek": "<encrypted local KEK>"}}
    else encrypted local KEK is not in annotations
        kubeapiserver->>kmsplugin: decrypt request <br/> {"ciphertext": "<encrypted DEK>", key_id: "<key_id gotten as part of EncryptResponse>", <br/> "annotations": {}}
        kmsplugin->>externalkms: decrypt DEK with remote KEK (same behavior as today)
        externalkms->>kmsplugin: decrypted DEK
        kmsplugin->>kubeapiserver: return decrypt response <br/> {"plaintext": "<decrypted DEK>", key_id: "<remote KEK ID>", <br/> "annotations": {}}
    end
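For readers who want to interpret the CI histogram quoted earlier in this section, the cumulative bucket counts can be turned into a mean and approximate percentiles. The sketch below copies the bucket values from that CI output and uses linear interpolation within a bucket, which is one common way to estimate quantiles from cumulative histograms. It is an illustration rather than the analysis the KEP authors performed, and the function names are hypothetical.

```python
# Cumulative counts per upper bound in seconds, copied from the CI run
# (buckets above 1.6384s add no further samples and are omitted).
BUCKETS = [
    (0.0001, 0), (0.0002, 0), (0.0004, 60), (0.0008, 2947), (0.0016, 5090),
    (0.0032, 6639), (0.0064, 8076), (0.0128, 9448), (0.0256, 10875),
    (0.0512, 12236), (0.1024, 13442), (0.2048, 14153), (0.4096, 14426),
    (0.8192, 14533), (1.6384, 14544),
]
LATENCY_SUM = 433.331096833  # ..._latency_seconds_sum
LATENCY_COUNT = 14544        # ..._latency_seconds_count


def mean_latency(total_seconds, count):
    """Average latency per Encrypt call in seconds."""
    return total_seconds / count


def estimate_quantile(q, buckets, count):
    """Estimate the q-quantile by interpolating inside the bucket that holds it."""
    rank = q * count
    prev_bound, prev_count = 0.0, 0
    for bound, cumulative in buckets:
        if cumulative >= rank:
            in_bucket = cumulative - prev_count
            if in_bucket == 0:
                return bound
            fraction = (rank - prev_count) / in_bucket
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, cumulative
    return buckets[-1][0]


def share_at_or_below(bound, buckets, count):
    """Fraction of calls that completed within the given bucket bound."""
    for upper, cumulative in buckets:
        if upper == bound:
            return cumulative / count
    raise KeyError(bound)


print(f"mean: {mean_latency(LATENCY_SUM, LATENCY_COUNT):.4f}s")
print(f"p50:  {estimate_quantile(0.50, BUCKETS, LATENCY_COUNT):.4f}s")
print(f"p99:  {estimate_quantile(0.99, BUCKETS, LATENCY_COUNT):.4f}s")
print(f"<= 0.1024s: {share_at_or_below(0.1024, BUCKETS, LATENCY_COUNT):.1%}")
```

On these numbers the mean works out to roughly 30 ms per call, about 92% of calls finish within the 0.1024s bucket, and the estimated 99th percentile falls in the 0.2048s to 0.4096s bucket.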
Observability:
We considered using the AuditID from the kube-apiserver request that generated the envelope operation. This approach has the following drawbacks:
- AuditID can be configured by the user with the Audit-ID header in the API server request. Multiple requests can be sent to the kube-apiserver with the same Audit-ID.
- Not all API server requests will generate an envelope operation. The API server caches DEKs and for the DEK that’s available in the cache, the kube-apiserver will not generate an envelope operation.
- Since not all calls to the KMS correspond to an audit log, using audit ID is not complete for correlating calls from kube-apiserver->kms-plugin->KMS.
Infrastructure Needed
We need a new git repo for the KMS plugin reference implementation. It will need to be synced from the k/k staging dir.
- repo created: https://github.com/kubernetes/kms