KEP-4222: CBOR Serializer

Release Signoff Checklist
Summary
Motivation
- Goals
- Non-Goals
Proposal
Design Details
Production Readiness Review Questionnaire
Implementation History
Drawbacks
Alternatives
Infrastructure Needed (Optional)

Release Signoff Checklist

Items marked with (R) are required prior to targeting to a milestone / release.

(R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
(R) KEP approvers have approved the KEP status as implementable
(R) Design details are appropriately documented
(R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- e2e Tests for all Beta API Operations (endpoints)
- (R) Ensure GA e2e tests meet requirements for Conformance Tests
- (R) Minimum Two Week Window for GA e2e tests to prove flake free
(R) Graduation criteria is in place
- (R) all GA Endpoints must be hit by Conformance Tests
(R) Production readiness review completed
(R) Production readiness review approved
“Implementation History” section is up-to-date for milestone
User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

Summary

Under this proposal, Kubernetes API servers and clients will support the Concise Binary Object Representation (CBOR) data format. CBOR will be available to clients as an alternative to JSON for serializing resources in request and response bodies. It will supersede JSON in apiextensions-apiserver for storage serialization of custom resources.

Motivation

In the course of processing a single request to the Kubernetes API, various representations of a resource may be encoded and decoded several times by both the client (encode request body, decode response body) and the server (decode request body, decode from storage, encode response body, encode to storage). For years, Kubernetes has supported a Protobuf format requiring dramatically less CPU time and heap churn than its JSON (or YAML) format. The reduction in codec overhead resulting from the adoption of Protobuf has made Kubernetes clusters more efficient and able to handle increasingly heavy API traffic.

The Kubernetes community has embraced CustomResourceDefinitions (CRDs) as a declarative extension mechanism for the Kubernetes API. Unlike native types, custom resources cannot trivially be serialized as Protobuf for serving or storage. Protobuf is dependent on code generation, requires careful schema evolution, and requires clients and servers to have compile-time knowledge of any Protobuf definitions they will use.

High-object-count and high-traffic custom resources are at a serious efficiency disadvantage versus comparable native resources. Benchmarks suggest that custom resource and dynamic client encode and decode operations can be made up to approximately 8x and 2x faster, respectively, with a substantial reduction in heap allocations, by adopting CBOR as a self-describing binary format.

Goals

Reduce CPU time and heap churn of encode and decode along the request-response path when Protobuf cannot be used, especially:
- custom resource storage
- custom resource serving
- dynamic clients
- apply configurations
- strategic merge patches

Non-Goals

Replace existing usage of Protobuf.
Substantially reduce the size of encoded objects (a modest size reduction is anticipated).
Replace all usage of YAML or JSON.

Proposal

Format

The output of the CBOR encoder is a single tagged data item as specified in “Self-Described CBOR”, with no additional envelope. Self-described CBOR, a tagged data item with tag number 55799, has the same semantics as the same data item with no tag, with the convenient property that its encoded form is always prefixed by 0xd9d9f7. By design, this prefix is never found at the beginning of a JSON text and can be used as a “magic number” to distinguish the data format of a stored object at rest.

To support decoding custom resources that have been stored as a mixture of JSON and CBOR, the CBOR serializer will implement RecognizingDecoder by checking for the prefix 0xd9d9f7.
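The prefix check described above can be sketched as follows. This is a minimal illustration using only the Go standard library; the helper name recognizesCBOR is hypothetical and is not the serializer's actual RecognizingDecoder implementation.

```go
package main

import (
	"bytes"
	"fmt"
)

// selfDescribedCBORPrefix is the encoded head of tag number 55799 (RFC 8949).
var selfDescribedCBORPrefix = []byte{0xd9, 0xd9, 0xf7}

// recognizesCBOR reports whether data begins with the self-described CBOR
// prefix. Hypothetical helper for illustration only.
func recognizesCBOR(data []byte) bool {
	return bytes.HasPrefix(data, selfDescribedCBORPrefix)
}

func main() {
	stored := [][]byte{
		[]byte(`{"kind":"Example"}`),  // JSON text at rest
		{0xd9, 0xd9, 0xf7, 0xa0},      // tag 55799 wrapping an empty CBOR map
	}
	for _, data := range stored {
		fmt.Printf("%q recognized as CBOR: %v\n", data, recognizesCBOR(data))
	}
}
```

Because a JSON text can never begin with the byte 0xd9, a stored object that fails this check can be handed to the JSON decoder.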

Streaming responses (i.e. watches) will be serialized as CBOR Sequences. A CBOR Sequence is a concatenation of zero or more CBOR data items, with no additional framing. This is effectively equivalent to the existing JSON stream serialization behavior and takes advantage of the property that, like JSON objects – and unlike Protobuf messages or non-object JSON documents, e.g. numbers – CBOR data items are self-delimiting.

At the time of writing, watch events are encoded to a temporary buffer before being passed to the frame writer. Frame writers can also assume that the byte slice passed to each call of Write represents the complete contents of one frame. The Protobuf frame writer takes advantage of both in order to determine a frame’s length prefix “for free”. If this proposal were to require encoding events using the effectively length-prefixed approach described in Optimizing CBOR Sequences for Skipping Elements, the CBOR frame writer would similarly need to know each event’s encoded size.

One useful property of a self-delimiting encoding is described in the CBOR standard:

the self-delimiting nature of the CBOR encoding means that there are no two well-formed CBOR encoded data items where one is a prefix of the other

In other words, CBOR (and the existing JSON framing) can stream directly to and from the wire without incurring additional copies on both sides of the connection. If an encoding fails or is otherwise not completely received on the other end, the fragment that is received will not be well-formed and will produce a decode error.

Negotiation

Proactive content negotiation will be supported for clients that want to receive CBOR-encoded responses using the MIME type “application/cbor” in the Accept request header. For compatibility with API servers that don’t support CBOR, clients should also accept “application/json” (with a lower quality factor) and choose the appropriate decoder based on the Content-Type response header.
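A client-side sketch of this negotiation, assuming only the Go standard library. The function names and the quality factor of 0.9 are illustrative choices, not values taken from this proposal or from client-go.

```go
package main

import (
	"fmt"
	"mime"
	"net/http"
)

// setAccept asks for CBOR first and JSON as a lower-preference fallback.
// The q value is an arbitrary illustrative choice.
func setAccept(h http.Header) {
	h.Set("Accept", "application/cbor, application/json;q=0.9")
}

// decoderFor picks a decoder name from the Content-Type response header.
func decoderFor(contentType string) (string, error) {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return "", err
	}
	switch mediaType {
	case "application/cbor":
		return "cbor", nil
	case "application/json":
		return "json", nil
	default:
		return "", fmt.Errorf("unsupported response media type %q", mediaType)
	}
}

func main() {
	h := http.Header{}
	setAccept(h)
	fmt.Println("Accept:", h.Get("Accept"))

	for _, ct := range []string{"application/cbor", "application/json; charset=utf-8"} {
		name, err := decoderFor(ct)
		fmt.Println(ct, "->", name, err)
	}
}
```

A server without CBOR support ignores the unknown media type and answers with JSON, which the client then decodes based on the Content-Type it actually received.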

Streaming requests should use the MIME type for CBOR Sequences, “application/cbor-seq”.

A new “+cbor” suffix will be accepted for the existing Server-Side Apply media type “application/apply-patch” and identifies a CBOR-encoded apply configuration. Similarly, “application/strategic-merge-patch+cbor” will be accepted as the content type of a CBOR-encoded strategic merge patch.

CBOR will not be a supported encoding for JSON Patches or JSON Merge Patches because both types are JSON documents by definition; supporting them would require either defining parallel CBOR variants of each patch type, or sacrificing the efficiency benefit of CBOR by transcoding to JSON on the server side.

Clients can send CBOR-encoded request bodies with the appropriate Content-Type to API servers that support CBOR. API servers that don’t support CBOR will return status 415 (Unsupported Media Type). In client-go, for alpha, when a RESTClient configured to encode requests with CBOR receives a 415, it will permanently (for the life of the RESTClient) fall back to JSON for subsequent requests. For GA, this fallback behavior will be changed to operate on a per-(method, target resource) basis, and to consider acceptable fallback content-types based on the value of the Accept header in a 415 response, as described in RFC 9110.

The client’s mapping of (method, target resource) pairs to acceptable request content type can be pre-populated from the request media types in OpenAPI documents. This allows clients to bypass the initial request in the content-type fallback mechanism, but is not required.
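The per-(method, target resource) fallback could be tracked with a small lookup table like the one below. This is a sketch under stated assumptions: all type and method names are hypothetical, and it always falls back to JSON rather than consulting the Accept header of the 415 response.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

// requestKey identifies a (method, target resource) pair.
type requestKey struct {
	method   string
	resource string
}

// contentTypeSelector remembers which request content type to use per pair.
// Hypothetical type; not part of client-go.
type contentTypeSelector struct {
	mu        sync.Mutex
	preferred string
	overrides map[requestKey]string
}

func newContentTypeSelector(preferred string) *contentTypeSelector {
	return &contentTypeSelector{preferred: preferred, overrides: map[requestKey]string{}}
}

// prepopulate records a known-acceptable content type, for example one read
// from the request media types in an OpenAPI document.
func (s *contentTypeSelector) prepopulate(method, resource, contentType string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.overrides[requestKey{method, resource}] = contentType
}

// contentTypeFor returns the content type to use for the next request.
func (s *contentTypeSelector) contentTypeFor(method, resource string) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	if ct, ok := s.overrides[requestKey{method, resource}]; ok {
		return ct
	}
	return s.preferred
}

// observe falls back to JSON for the pair after a 415 response.
func (s *contentTypeSelector) observe(method, resource string, statusCode int) {
	if statusCode == http.StatusUnsupportedMediaType {
		s.prepopulate(method, resource, "application/json")
	}
}

func main() {
	s := newContentTypeSelector("application/cbor")
	fmt.Println(s.contentTypeFor("POST", "widgets")) // preferred encoding
	s.observe("POST", "widgets", http.StatusUnsupportedMediaType)
	fmt.Println(s.contentTypeFor("POST", "widgets")) // falls back for this pair
	fmt.Println(s.contentTypeFor("PUT", "widgets"))  // other pairs are unaffected
}
```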

Client Enablement

Clients can be explicitly configured to prefer CBOR as a request encoding, just as they can today be configured to prefer Protobuf or JSON. In client-go, this involves setting the ContentType field of rest.ClientContentConfig. The default request content-type will remain JSON for a period of time post-GA (a minimum of two minor versions) so that the oldest kube-apiserver within the supported kubectl version skew will have CBOR support. The supported version skew for aggregated API servers is much wider (infinite?). Encoding and decoding resources from aggregated API servers that don’t support CBOR will rely on the content-type negotiation mechanisms described above.

Two client-side gates will be added as follows, using a common client-go gating mechanism with specific details to be agreed by sig-api-machinery:

AllowCBOR: If disabled, clients configured to accept “application/cbor” will instead accept “application/json” with the same preference. Clients configured to write “application/cbor” will instead write “application/json”. Patch requests with content types “application/apply-patch+cbor” or “application/strategic-merge-patch+cbor” will instead use “application/apply-patch+yaml” and “application/strategic-merge-patch+json”, respectively.
PreferCBOR: If enabled and AllowCBOR is enabled, the default request content-type (if not explicitly configured) becomes “application/cbor” and the dynamic client’s request content-type becomes “application/cbor”.
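The interaction of the two gates on the request content type can be sketched as a pure function. The gates struct and function name are hypothetical, since the proposal leaves the concrete gating mechanism to be agreed by sig-api-machinery.

```go
package main

import "fmt"

// gates mirrors the two proposed client-side gates. Hypothetical type.
type gates struct {
	AllowCBOR  bool
	PreferCBOR bool
}

// requestContentType resolves the effective request content type.
// An empty configured value means the client did not set one explicitly.
func requestContentType(g gates, configured string) string {
	if configured == "" {
		if g.AllowCBOR && g.PreferCBOR {
			return "application/cbor"
		}
		return "application/json"
	}
	if g.AllowCBOR {
		return configured
	}
	// AllowCBOR is disabled: rewrite CBOR content types to their fallbacks.
	switch configured {
	case "application/cbor":
		return "application/json"
	case "application/apply-patch+cbor":
		return "application/apply-patch+yaml"
	case "application/strategic-merge-patch+cbor":
		return "application/strategic-merge-patch+json"
	}
	return configured
}

func main() {
	fmt.Println(requestContentType(gates{AllowCBOR: true, PreferCBOR: true}, ""))
	fmt.Println(requestContentType(gates{AllowCBOR: false, PreferCBOR: true}, ""))
	fmt.Println(requestContentType(gates{AllowCBOR: false}, "application/apply-patch+cbor"))
}
```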

User Stories (Optional)

Story 1

Story 2

Notes/Constraints/Caveats (Optional)

Risks and Mitigations

Phased Implementation

Introducing a new data format comes with risks to most API endpoints. Errors that lose or modify parts of a resource during encode/decode are a special concern, as is the risk of being unable to decode an object from its encoded form. Additionally, as soon as it becomes possible for users to enable the new encoding, it must always remain possible to decode any custom resource that may have been persisted (barring a forced storage encoding migration).

Before allowing it to be enabled in kube-apiserver at all, there will be a phased implementation to establish confidence in the safety and correctness of the serializer.

Make it a fatal error if kube-apiserver starts with support for CBOR (same with apiextensions-apiserver storage codec?).
Add CBOR library dependency and incrementally implement all unit and fuzz tests enumerated in the alpha criteria.
Make it possible, by code injection only, to allow CBOR in kube-apiserver. Keep the fatal error condition.
Implement all integration tests.
Complete other alpha criteria.
Expose using feature gate.

Library Dependency

Kubernetes will take a new dependency on a CBOR library, with associated risks:

The library may become unmaintained or undermaintained, or our use cases may require a change/addition to the library that its maintainers are unwilling to accept.
- Mitigation: Contribute features, fixes, and testing upstream. If necessary, accept owning a fork.
Since the library will be used to decode untrusted input, it is a potential source of security vulnerabilities.
- Mitigation: New fuzz tests.
- Mitigation: Manual review of library source.

Design Details

Why CBOR?

CBOR is a binary data format initially developed in 2013, specified in RFC 8949, and assigned Internet Standard number 94 by the IETF.

In addition to its mature specification, the stated design objectives of CBOR are interesting to Kubernetes. In particular:

All JSON data types are convertible to and from CBOR. It should be possible to represent all existing API objects in a CBOR encoding.

Decoding does not require a schema (“self-describing”). No need to build supporting machinery to generate and manage schemas, distribute them to clients, and associate them with persisted objects.

Encoding and decoding is “reasonably frugal” in CPU usage. Not efficiency at all costs, but suitability for “high-volume” applications is an explicit goal.

Serialization is “reasonably compact”. Smaller than JSON, but not at the expense of codec implementation complexity. Exploratory testing showed a fuzzed v1 Pod was nearly 20% smaller than JSON. Like JSON, field names are present in the encoded form due to the self-describing nature of CBOR.

Duplicate Map Keys and Unrecognized or Duplicate Field Names

Existing serializers handle decoding of duplicate fields / map keys differently.

The JSON serializer:

keeps the last duplicate entry
records the duplicated key
continues decoding
returns a strict decoding error along with the decoded object
1. the recognizing decoder treats data as recognized on strict decoding error
2. field validation configures the handling of strict decoding errors encountered while decoding request bodies
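The “keep the last entry, record the duplicate, continue” behavior can be illustrated with the Go standard library's encoding/json as a stand-in for the Kubernetes JSON serializer (an assumption: the real serializer is a separate implementation). The helper names are hypothetical, and only top-level keys are inspected.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// duplicateTopLevelKeys returns the keys that appear more than once at the
// top level of a JSON object, in the order the repeats are encountered.
func duplicateTopLevelKeys(data []byte) ([]string, error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	tok, err := dec.Token()
	if err != nil {
		return nil, err
	}
	if d, ok := tok.(json.Delim); !ok || d != '{' {
		return nil, errors.New("input is not a JSON object")
	}
	seen := map[string]bool{}
	var dups []string
	for dec.More() {
		keyTok, err := dec.Token()
		if err != nil {
			return nil, err
		}
		key, ok := keyTok.(string)
		if !ok {
			return nil, fmt.Errorf("unexpected key token %v", keyTok)
		}
		if seen[key] {
			dups = append(dups, key)
		}
		seen[key] = true
		// Skip over the value, whatever its type.
		var skipped json.RawMessage
		if err := dec.Decode(&skipped); err != nil {
			return nil, err
		}
	}
	return dups, nil
}

// lastValue decodes the object and returns the value stored under key.
// encoding/json lets a later duplicate overwrite an earlier one.
func lastValue(data []byte, key string) (any, error) {
	var obj map[string]any
	if err := json.Unmarshal(data, &obj); err != nil {
		return nil, err
	}
	return obj[key], nil
}

func main() {
	data := []byte(`{"a":1,"b":true,"a":2}`)
	dups, err := duplicateTopLevelKeys(data)
	fmt.Println("duplicates:", dups, err)
	v, err := lastValue(data, "a")
	fmt.Println("decoded a:", v, err)
}
```

A caller could return the decoded object together with a strict decoding error built from the recorded duplicates, which is the shape of the behavior listed above.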

The generated Protobuf marshalers keep the last duplicate entry (for both fields and map entries) without producing a strict decoding error.

As a text format, JSON (or YAML) is more commonly edited by hand and so is more prone to this sort of error. And although Kubernetes consistently decodes JSON objects containing duplicate keys, the presence of duplicate keys indicates a mistake. Protobuf is typically machine-generated, and decoders are expected to be “last one wins” in the case of duplicated fields. So while it would be unexpected for a Protobuf-encoded object to contain duplicate fields, the interpretation of such an object is unambiguous.

A map containing duplicate keys is well-formed but invalid according to the CBOR specification. Decoding a map containing duplicate keys will produce a decode error.

Decoding a map with unrecognized fields (map keys that do not correspond to the json tag name of any struct field) is expected in cases where the client is newer than the server, or where an object containing an unrecognized field was transcoded from YAML or JSON to CBOR. A strict decoding error (as in JSON) will be generated in this case. In the custom resource path, where objects are decoded into unstructured.Unstructured, a schema-aware decoder wrapper is responsible for reporting unknown fields as strict decoding errors.

Note that clients (e.g. kubectl) may choose to decode an object from a JSON or YAML text representation containing duplicate keys, then encode to CBOR to populate the body of an API request. Since the text-encoded content (potentially containing duplicates) is not literally transcoded to CBOR, this use case is supported. Depending on the strictness mode, duplicate keys would either be removed or produce an error at decode time.

Encoding Determinism

It is possible for a single object to be encodable as multiple distinct but valid and semantically-equivalent CBOR byte strings. The CBOR specification does not require encoder implementations to produce deterministic output, although it does include recommendations for implementing deterministic encoding.

The etcd3 storage implementation of GuaranteedUpdate relies on deterministic encoding to skip writes if the stored bytes would not change. The existing JSON and Protobuf encoders produce deterministic output.

Other potential use cases for deterministic encoding of response bodies might include:

caching
- The existing WithCacheControl filter sets the response header “Cache-Control: no-cache, private” to prevent shared caches from storing responses (since requests are subject to authn/authz), and to prevent responses in non-shared caches from being reused without validation. Deterministic encoding could allow an API server to generate strong ETags by hashing the encoded form of the resource.
- Even for the existing data formats, there should be no caching proxies storing API responses.
diffing
- The human-readable text formats (JSON and YAML) are not changing under this proposal.

Encode benchmarks for the two evaluated Go CBOR libraries show a 2.4x speedup and a 1.8x speedup by disabling map key sorting. According to the spec, “the CBOR data model for maps does not allow ascribing semantics to the order of the key/value pairs in the map representation.” And since the CBOR decoder will reject maps containing duplicate keys, a CBOR map represents exactly the same set of key-value pairs regardless of the order they are encoded.

In order to take advantage of the available speedup, the CBOR encoder will support separate deterministic and nondeterministic modes. The deterministic mode will be used for storage serialization only. The nondeterministic mode should introduce randomness into the order of map item encoding (as with map iteration in Go) to make it easier to detect invalid assumptions about the order, but not in a way that adds significant overhead.

To further mitigate the risk that the output of the nondeterministic encoder mode will be accidentally used in cases that require determinism (bytewise equality, hashing, etc.), and because output determinism is implicitly part of the contract of runtime.Encoder, the CBOR encoder will also implement a new interface:

type NondeterministicEncoder interface {
  NondeterministicEncode(runtime.Object, io.Writer) error
}

Callers that don’t require output determinism will perform a conditional type assertion and invoke NondeterministicEncode in place of Encode.
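The conditional type assertion might look like the sketch below. To keep the example self-contained, Object and Encoder are local stand-ins for runtime.Object and runtime.Encoder, and the two encoder types exist only to show which method is chosen.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// Object and Encoder are local stand-ins for runtime.Object and
// runtime.Encoder so that this sketch compiles without k8s.io/apimachinery.
type Object interface{}

type Encoder interface {
	Encode(obj Object, w io.Writer) error
}

// NondeterministicEncoder has the shape of the interface proposed above.
type NondeterministicEncoder interface {
	NondeterministicEncode(obj Object, w io.Writer) error
}

// encodeWithoutDeterminism is what a caller that does not need deterministic
// output would do: prefer NondeterministicEncode when it is available.
func encodeWithoutDeterminism(e Encoder, obj Object, w io.Writer) error {
	if nd, ok := e.(NondeterministicEncoder); ok {
		return nd.NondeterministicEncode(obj, w)
	}
	return e.Encode(obj, w)
}

// dualModeEncoder and plainEncoder are fakes that report which path ran.
type dualModeEncoder struct{}

func (dualModeEncoder) Encode(_ Object, w io.Writer) error {
	_, err := io.WriteString(w, "deterministic")
	return err
}

func (dualModeEncoder) NondeterministicEncode(_ Object, w io.Writer) error {
	_, err := io.WriteString(w, "nondeterministic")
	return err
}

type plainEncoder struct{}

func (plainEncoder) Encode(_ Object, w io.Writer) error {
	_, err := io.WriteString(w, "deterministic")
	return err
}

// encodeToString is a small helper for demonstration.
func encodeToString(e Encoder) (string, error) {
	var sb strings.Builder
	err := encodeWithoutDeterminism(e, nil, &sb)
	return sb.String(), err
}

func main() {
	for _, e := range []Encoder{dualModeEncoder{}, plainEncoder{}} {
		out, err := encodeToString(e)
		fmt.Printf("%T -> %s (err=%v)\n", e, out, err)
	}
}
```

Storage paths keep calling Encode directly, so an encoder that lacks the new method, or a caller that is unaware of it, stays on the deterministic path.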

Unicode

CBOR supports distinct major types for text strings and byte strings. Text strings that do not contain a valid UTF-8 sequence are well-formed but invalid CBOR. Unlike JSON strings, CBOR text strings do not support any escape sequences.

The JSON serializer replaces invalid UTF-8 sequences with the Unicode replacement character (U+FFFD) during both encode and decode. This is consistent with the behavior of encoding/json in the Go standard library. Generated Protobuf marshal and unmarshal code neither validates nor coerces strings; the byte sequence is directly copied on both encode and decode.

To avoid accepting invalid CBOR, the decoder will produce an error if a text string is not a valid UTF-8 sequence. Strings will follow the precedent established by Protobuf and be encoded using CBOR’s byte string type, except in cases where the encoder can be sure that the string is a valid UTF-8 sequence. This ensures the serializer will not encode an object to a byte sequence that it will not successfully decode.
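The choice between the two CBOR string types can be sketched with a hand-written head encoder. This is an illustrative assumption about how such an encoder might decide, not the behavior of any particular CBOR library. It uses the RFC 8949 major types 2 (byte string) and 3 (text string).

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

const (
	majorTypeByteString byte = 2
	majorTypeTextString byte = 3
)

// appendHead appends a CBOR head: the major type in the top three bits,
// followed by the shortest encoding of the argument n.
func appendHead(dst []byte, major byte, n uint64) []byte {
	switch {
	case n < 24:
		return append(dst, major<<5|byte(n))
	case n <= 0xff:
		return append(dst, major<<5|24, byte(n))
	case n <= 0xffff:
		return append(dst, major<<5|25, byte(n>>8), byte(n))
	case n <= 0xffffffff:
		return append(dst, major<<5|26, byte(n>>24), byte(n>>16), byte(n>>8), byte(n))
	default:
		return append(dst, major<<5|27,
			byte(n>>56), byte(n>>48), byte(n>>40), byte(n>>32),
			byte(n>>24), byte(n>>16), byte(n>>8), byte(n))
	}
}

// encodeString emits a text string only when s is valid UTF-8, and a byte
// string otherwise, so the output never contains an invalid text string.
func encodeString(s string) []byte {
	major := majorTypeByteString
	if utf8.ValidString(s) {
		major = majorTypeTextString
	}
	out := appendHead(nil, major, uint64(len(s)))
	return append(out, s...)
}

func main() {
	fmt.Printf("% x\n", encodeString("foo"))  // text string head 0x63
	fmt.Printf("% x\n", encodeString("\xff")) // byte string head 0x41
}
```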

Libraries

	github.com/ugorji/go/codec	github.com/fxamacker/cbor/v2
license	MIT	MIT
text string utf-8 coercion	none	none
decode: text string utf-8 validation	{error, ignore}	{error, ignore}
decode: duplicate map key	ignore	{error, ignore}
decode: unknown field name	{error, ignore}	{error, ignore}
decode: case-sensitivity	yes	no
unsafe	yes (disable by build tag)	no
fuzzed	no	maybe

Benchmarks TODO: inline

RawExtension

The RawExtension type in k8s.io/apimachinery/pkg/runtime allows extension types to be handled opaquely within external versioned types, as long as they are syntactically valid.

The type declaration is:

type RawExtension struct {
  Raw []byte
  Object Object
}

Using JSON, marshalling and unmarshalling of RawExtension is comparable to that of the standard library’s RawMessage type. For unmarshalling, if the input serialized JSON value is null, the destination RawExtension is not modified. Otherwise, its Raw field is set to a verbatim copy of the provided serialized JSON value. The contract of json.Unmarshaler states that implementations can assume that the input is valid encoding of a JSON value. Absent a bug in the caller (typically via json.Marshal or (*json.Decoder).Decode), a RawExtension’s Raw field will contain a valid JSON text after unmarshaling.

In general, for an encoding that supports Unstructured, the encoding of a RawExtension value must always be the same as the overall encoding of the request or response body. This is not the case for Protobuf. Protobuf can encode RawExtension fields with any encoding since both the writer and reader of a Protobuf message have the type information to know that they are serializing or deserializing a RawExtension message.

There are three cases when marshalling RawExtension to JSON:

If both Raw and Object are nil, null is returned.
If Raw is not nil, return it verbatim.
Otherwise (Raw is nil and Object is not nil), return the result of marshalling Object.

Note that, in the second case, the bytes of the Raw field must be a valid JSON text in order to successfully serialize an object containing a RawExtension to JSON.
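The three cases can be sketched with a local stand-in type (rawExtension below is not the real runtime.RawExtension). The wrapper struct and marshalExt helper are hypothetical and exist only to exercise the marshaler through encoding/json.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rawExtension is a local stand-in for runtime.RawExtension.
type rawExtension struct {
	Raw    []byte
	Object any
}

// MarshalJSON follows the three cases listed above.
func (re rawExtension) MarshalJSON() ([]byte, error) {
	switch {
	case re.Raw != nil:
		return re.Raw, nil // returned verbatim
	case re.Object != nil:
		return json.Marshal(re.Object)
	default:
		return []byte("null"), nil
	}
}

// wrapper embeds the extension in an enclosing object.
type wrapper struct {
	Ext rawExtension `json:"ext"`
}

// marshalExt marshals an object containing the given extension.
func marshalExt(re rawExtension) (string, error) {
	out, err := json.Marshal(wrapper{Ext: re})
	return string(out), err
}

func main() {
	cases := []rawExtension{
		{},
		{Raw: []byte(`{"a":1}`)},
		{Object: map[string]int{"b": 2}},
		{Raw: []byte("not json")}, // fails: Raw must be a valid JSON text
	}
	for _, re := range cases {
		out, err := marshalExt(re)
		fmt.Println(out, err)
	}
}
```

The last case fails because encoding/json validates the bytes returned by a Marshaler, which is why Raw must hold a valid JSON text for the enclosing object to serialize.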

Usage

Transient External Types

External versioned types may use RawExtension to exchange arbitrary objects and plugins without persisting them to storage. In these cases, only a single object encoding is involved. When preparing to send, or handle a received object containing RawExtension, callers can assume that the Raw bytes are in the same encoding as the negotiated request or response encoding.

Stored External Types

Storing the verbatim Raw bytes of a RawExtension received from a client introduces additional considerations on top of the transient (transmit-only) case. The encoding of the Raw bytes is determined by encoding of the request that wrote the value of the RawExtension, which may or may not be the same as the object’s storage encoding.

Types as Canonical Definition of Custom Resources

Throughout the ecosystem, it is common practice to maintain Go structs as the canonical definition for API extensions. In many cases, controller-gen is used to mechanically translate such types from Go sources to CustomResourceDefinition manifests. Similarly, client-gen can produce typed Go clients that use the canonical Go types directly. These Go struct types can and sometimes do include fields of type RawExtension (example).

Scenarios

The following tables enumerate API request and response flows that can involve RawExtension.

The Client and Server columns indicate the types the named component uses to process API objects. If “dynamic”, it uses Unstructured (e.g. a custom resource handler or a dynamic client). If “typed”, it uses API-specific Go types that may include RawExtension (e.g. clients generated by client-gen, kube-apiserver built-in types, aggregated apiservers). The table omits cases where both the client and the server are dynamic (e.g. a dynamic client and a custom resource handler), since neither side should be dealing with RawExtension values. The edge case where a client program makes a RawExtension a child of an Unstructured value’s map[string]interface{} can be considered a static client case for the purposes of this evaluation.

The Encoding column is the client’s encoding of the request body (for requests) or the server’s encoding of the response body (for responses).

Marshalled Unstructured

N	Client	Server	Direction	Encoding
1	dynamic	typed	request	json
2	dynamic	typed	request	cbor
3	typed	dynamic	response	json
4	typed	dynamic	response	cbor

In these cases, the marshalling side acts on an Unstructured object and is not aware that the unmarshalling side may decode some of the payload into a RawExtension. The bytes stored in the RawExtension by unmarshalling ultimately depend on the negotiated content type, which can vary with the enablement of the CBOR serializer. Existing programs have so far been able to assume that unmarshalled RawExtensions always have either nil or a valid JSON text in their Raw field.

Marshalled RawExtension

N	Client	Server	Direction	Encoding
1	typed	typed	request	json
2	typed	typed	request	cbor
3	typed	typed	response	json
4	typed	typed	response	cbor
5	dynamic	typed	response	json
6	dynamic	typed	response	cbor
7	typed	dynamic	request	json
8	typed	dynamic	request	cbor
9	typed	typed	request	protobuf
10	typed	typed	response	protobuf

In these cases, if the marshalling side populates Raw with a non-nil slice, it is responsible for ensuring that the encoding of the slice contents matches the encoding that will be used to serialize the object containing the RawExtension. This is trivially ensured in cases 9 and 10 because Protobuf is capable of representing RawExtension values containing arbitrary bytes. Protobuf is not a supported encoding for Unstructured objects. Existing programs have in practice stored JSON in the Raw field of RawExtension.

Compatibility

If the RawExtension marshalling and unmarshalling behavior for CBOR were to be implemented in exactly the same way as the existing JSON behaviors, the assumptions in many existing programs that the Raw field can be assigned to a slice of JSON bytes, or that the Raw bytes of an unmarshalled RawExtension are valid JSON, would be broken.

The simple approach of automatically transcoding JSON to CBOR during CBOR marshalling, and transcoding CBOR to JSON during CBOR unmarshalling, would avoid breaking existing programs. However, the expense of transcoding to or from JSON would negate any performance advantage of a binary encoding. This expense would not be limited to a few API types: significant examples include the use of a RawExtension field in metav1.WatchEvent to represent each watch event’s object state, or the arbitrary objects embedded in admissionv1.AdmissionRequest.

A new ContentType string field will be added to RawExtension to indicate the IANA media type of the Raw bytes. If empty, the assumed content type is “application/json”. In existing usage, if a RawExtension’s Raw field does not contain valid JSON, the RawExtension itself cannot be marshalled to JSON.

ContentType will not be serialized to JSON or CBOR, but it will be serialized to Protobuf. When unmarshalling either JSON or CBOR into a RawExtension, the content type is implicitly the same as that of the input. This is not true for Protobuf, which is capable of embedding RawExtensions using any encoding, since in all cases both the writer and reader of a Protobuf message are aware that they are handling an extension.

The proposed behavior for both MarshalJSON and MarshalCBOR is:

If both Raw and Object are nil, null is returned.
If Object is not nil, return the result of marshalling Object to the target encoding.
If the ContentType matches the media type of the target encoding (or if ContentType is the empty string and the target encoding is JSON), return the Raw bytes verbatim.
Otherwise, return the result of transcoding the Raw bytes from the encoding indicated by ContentType to the target encoding.
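The proposed decision order can be sketched as a single function shared by both target encodings. Everything here is a stand-in: rawExtension mirrors the proposed shape of the type, targetEncoding is a hypothetical abstraction, and the transcoder is a stub that returns an error because no CBOR library is assumed.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rawExtension is a stand-in with the proposed ContentType field added.
type rawExtension struct {
	Raw         []byte
	Object      any
	ContentType string
}

// targetEncoding describes the encoding being marshalled to. Hypothetical.
type targetEncoding struct {
	mediaType string
	null      []byte // encoded null: "null" for JSON, 0xf6 for CBOR
	marshal   func(any) ([]byte, error)
	transcode func(fromMediaType string, raw []byte) ([]byte, error)
}

// marshalRawExtension applies the four proposed cases in order.
func marshalRawExtension(re rawExtension, target targetEncoding) ([]byte, error) {
	if re.Raw == nil && re.Object == nil {
		return target.null, nil
	}
	if re.Object != nil {
		return target.marshal(re.Object)
	}
	contentType := re.ContentType
	if contentType == "" {
		contentType = "application/json" // empty means JSON
	}
	if contentType == target.mediaType {
		return re.Raw, nil // already in the target encoding
	}
	return target.transcode(contentType, re.Raw)
}

// jsonTarget marshals with encoding/json; its transcoder is a stub.
var jsonTarget = targetEncoding{
	mediaType: "application/json",
	null:      []byte("null"),
	marshal:   json.Marshal,
	transcode: func(from string, _ []byte) ([]byte, error) {
		return nil, fmt.Errorf("transcoding from %q is not implemented in this sketch", from)
	},
}

func main() {
	out, err := marshalRawExtension(rawExtension{Raw: []byte(`{"a":1}`)}, jsonTarget)
	fmt.Println(string(out), err)

	// 0xa0 is an empty CBOR map; marshalling it to JSON requires transcoding.
	_, err = marshalRawExtension(rawExtension{Raw: []byte{0xa0}, ContentType: "application/cbor"}, jsonTarget)
	fmt.Println(err)
}
```

Unlike the existing JSON behavior, a non-nil Object takes precedence over Raw in this ordering.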

Unmarshalling will behave the same for CBOR as it currently does for JSON and the input bytes will be copied verbatim to the Raw field. The ContentType will be set to “application/json” by a successful call to UnmarshalJSON and to “application/cbor” by a successful call to UnmarshalCBOR.

Additionally, by default, the Raw bytes of a decoded RawExtension will be automatically transcoded to JSON to preserve compatibility with programs that assume an unmarshalled RawExtension contains valid JSON. The CBOR serializer available through serializer.CodecFactory will be wired to use this, allowing existing programs to continue to assume that unmarshalled Raw bytes contain JSON. The stream serializer will not. In practice, the watch decoder assumes that the non-stream serializer can directly decode the Raw bytes of a metav1.WatchEvent decoded by the stream serializer.

There will be a migration period during which it will remain possible to disable automatic transcoding of RawExtension by clients via client-go feature gate. API servers based on the generic API server module (k8s.io/apiserver) must remove hardcoded assumptions about the format of RawExtension bytes decoded from request bodies. It will remain possible to disable the CBORServingAndStorage feature gate for at least 3 releases at beta in order to provide sufficient time to update API servers without blocking k8s.io/apiserver dependency updates.

Migration

Naive Clients

Client assumes received RawExtension is JSON.
Client receives CBOR response body. The response bytes that represent the RawExtension are CBOR.
During decoding, the RawExtension’s Raw field is transcoded from CBOR to JSON.
Client continues processing RawExtension bytes as JSON.

Advanced Clients

Client tolerates RawExtensions containing either JSON or CBOR.
Client receives CBOR response body. The response bytes that represent the RawExtension are CBOR.
No transcoding is performed during decoding.
Client detects the format of the RawExtension bytes and processes it accordingly. RawExtension will implement UnstructuredConverter, providing a one-liner to get an Unstructured from a RawExtension.

Post-GA, CBOR as Default Preferred Request/Response Encoding for One Year

Automatic transcoding client feature gate becomes disabled by default. The feature gate is unlocked and transcoding can be re-enabled without code changes using the existing client feature gate environment variable mechanism.

Post-GA, CBOR as Default Preferred Request/Response Encoding for Two Years

Automatic transcoding client feature gate is removed and requires code changes to enable.

All existent clusters will support CBOR. Existing programs continue to work unmodified. Updating client libraries in existing programs may cause them to break if they have not changed how they are handling RawExtensions.

Test Plan

[x] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.

Prerequisite testing updates

Unit tests

Tests for the following behaviors will be added:

decoding a map containing duplicate keys into a Go map produces an error
decoding a map containing duplicate keys into a Go struct produces an error
roundtripping preserves the distinction between integers and floating-point numbers
decoding a text string containing an invalid UTF-8 sequence produces an error
decoding a map into a Go struct matches json field tag names case-sensitively
when decoding a map into a Go struct, a case-insensitive match between a key and a json field tag name is treated the same as no match
encoding a struct with duplicate field names (json tag names) does not result in a map containing duplicate keys (https://go.dev/issue/17913)
pooled buffers should not grow and be retained forever (https://go.dev/issue/23199)
decoding into a Go interface{} stores only either nil concrete values or concrete values of type bool, string, int64, float64, []interface{}, or map[string]interface{} (no special treatment of tagged content producing time.Time, math/big.Int, etc.)
conformance to CBOR specification (adopt existing suite and/or develop as necessary)
- this should be demonstrated to run against implementations in at least some of the non-Go client languages (Python)
Go strings that are not valid UTF-8 sequences can be roundtripped through CBOR without error
decoding a map into a Go struct produces a strict decoding error if the map contains a key that does not correspond to JSON tag name of one of the struct’s fields
roundtripping preserves the distinction between absent, present-but-null, and present-and-empty for slices and maps
runtime.RawExtension
- re-encoding preserves the original raw bytes
- encoding a runtime.Object with no existing raw bytes defaults to JSON
- decoding JSON-in-CBOR, JSON-in-Protobuf, CBOR-in-JSON, and CBOR-in-Protobuf is supported

As well as fuzz tests covering:

for all native types, native-to-JSON-to-unstructured and native-to-CBOR-to-unstructured is identical
the number of bytes allocated per decode does not exceed a reasonable upper limit
roundtrip JSON-to-CBOR-to-JSON and CBOR-to-JSON-to-CBOR
roundtrip through implementations in at least some of the non-Go client languages

Integration tests

custom resources storage encoding is CBOR with feature gate enabled
custom resources storage encoding is JSON with feature gate disabled
response content-type negotiation works and honors indicated preference (Protobuf > CBOR > JSON)
get, list, watch, update, delete, deletecollection, and scale support CBOR using dynamic and generated clients for all native types
mixed CBOR and JSON encodings in storage for a single custom resource can be retrieved with feature gate disabled
client gating mechanism:
- can force clients otherwise configured with a CBOR request encoding to use JSON
- can change the default request encoding to CBOR if not explicitly configured
- can be disabled programmatically
request content-type falls back to JSON and does not try CBOR again for a given (method, target resource) pair

e2e tests

request and response content-type negotiation with 1.17 sample API server

Custom JSON Marshalers

If a type implements json.Marshaler or json.Unmarshaler without corresponding CBOR behaviors, serializing values of that type to and from CBOR using default behaviors risks mangling the data.

As an example, consider the structure of a marshalled IntOrString with the custom behavior versus the default behavior:

Go	Custom	Default
IntOrString{Type: Int, IntVal: 7}	7	{"IntVal":7,"StrVal":"","Type":0}
IntOrString{Type: String, StrVal: "foo"}	"foo"	{"IntVal":0,"StrVal":"foo","Type":1}
IntOrString{Type: -1}		{"IntVal":0,"StrVal":"","Type":-1}

Imagine a similar type is declared out-of-tree. It has a similar implementation of json.Marshaler, but no corresponding custom implementation for CBOR. From this type, a CRD and typed client are generated. This typed client is used in a program to write to a custom resource, using JSON to encode the request body as either a JSON number or a JSON string. On the server side, the request body is decoded into an Unstructured object, and within that object, the IntOrString value is represented by either a string or an int64.

Now imagine that the same request is repeated, but with CBOR as the negotiated content type of the request body, and that the CBOR serializer implementation does not recognize types that implement json.Marshaler or json.Unmarshaler. By changing the request content type from JSON to CBOR, the actual bytes of the request body represent a structurally different object. Referencing the table above, instead of the “Custom” encoding, the encoded CBOR would look like the “Default” encoding.

On the server side, the value is represented within the decoded Unstructured as a map[string]interface{} with three keys, "IntVal", "StrVal", and "Type". A change in the request encoding resulted in a structural change to the object the client intended to send.
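
The structural change described above can be reproduced with a short sketch. The intOrString type below is a hypothetical out-of-tree type modeled on IntOrString, and encoding/json's default struct encoding stands in for a CBOR serializer that does not recognize json.Marshaler; both are assumptions made for illustration. Note that encoding/json decodes numbers into float64, whereas the Unstructured decoding described above yields int64.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type valueKind int

const (
	kindInt valueKind = iota
	kindString
)

// intOrString is a hypothetical out-of-tree type modeled on IntOrString.
type intOrString struct {
	Type   valueKind
	IntVal int32
	StrVal string
}

// MarshalJSON collapses the struct to a bare JSON number or string.
func (v intOrString) MarshalJSON() ([]byte, error) {
	switch v.Type {
	case kindInt:
		return json.Marshal(v.IntVal)
	case kindString:
		return json.Marshal(v.StrVal)
	default:
		return nil, fmt.Errorf("impossible intOrString.Type %d", v.Type)
	}
}

// withoutMethods has the same fields but none of the methods, so encoding/json
// uses its default struct encoding, as a serializer unaware of json.Marshaler would.
type withoutMethods intOrString

// decodedShape encodes v and reports the shape a generic decoder sees.
func decodedShape(v interface{}) (string, error) {
	data, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	var out interface{}
	if err := json.Unmarshal(data, &out); err != nil {
		return "", err
	}
	switch out.(type) {
	case map[string]interface{}:
		return "map", nil
	case string:
		return "string", nil
	case float64:
		return "number", nil
	default:
		return "other", nil
	}
}

func main() {
	v := intOrString{Type: kindInt, IntVal: 7}
	custom, _ := decodedShape(v)
	def, _ := decodedShape(withoutMethods(v))
	fmt.Println(custom, def) // prints "number map"
}
```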

The CBOR serializer must not use the default behaviors to marshal and unmarshal values that implement only custom JSON behaviors. Rejecting them with an error is a minimum requirement for alpha, since it prevents corruption. This would support in-tree types, server-side custom resource serialization, and typical dynamic client usage. A second alpha release will support these types automatically by invoking the JSON methods and transcoding to or from CBOR.

All of the above also applies to types implementing encoding.TextMarshaler (which is used if implemented unless json.Marshaler is also implemented) and encoding.TextUnmarshaler (which is used if implemented when the input is a JSON string unless json.Unmarshaler is also implemented).
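
One way to picture the "reject with an error" requirement is a reflection check that runs before encoding. This is a minimal sketch under stated assumptions: the cborMarshaler interface and the checkEncodable helper are hypothetical names, not the API of any particular CBOR library, and the check inspects only the top-level type rather than walking nested fields.

```go
package main

import (
	"encoding"
	"encoding/json"
	"fmt"
	"reflect"
)

// cborMarshaler is a hypothetical interface standing in for whatever custom
// CBOR hook the serializer recognizes.
type cborMarshaler interface {
	MarshalCBOR() ([]byte, error)
}

var (
	jsonMarshalerType = reflect.TypeOf((*json.Marshaler)(nil)).Elem()
	textMarshalerType = reflect.TypeOf((*encoding.TextMarshaler)(nil)).Elem()
	cborMarshalerType = reflect.TypeOf((*cborMarshaler)(nil)).Elem()
)

// implements reports whether t or *t implements iface, so that both value
// and pointer receivers are detected.
func implements(t, iface reflect.Type) bool {
	return t.Implements(iface) || reflect.PointerTo(t).Implements(iface)
}

// checkEncodable returns an error if v's type customizes JSON or text
// marshaling without also providing a custom CBOR behavior.
func checkEncodable(v interface{}) error {
	t := reflect.TypeOf(v)
	if t == nil || implements(t, cborMarshalerType) {
		return nil
	}
	if implements(t, jsonMarshalerType) || implements(t, textMarshalerType) {
		return fmt.Errorf("%v customizes JSON or text marshaling but not CBOR", t)
	}
	return nil
}

// jsonOnly customizes only its JSON encoding and should be rejected.
type jsonOnly struct{}

func (jsonOnly) MarshalJSON() ([]byte, error) { return []byte(`"x"`), nil }

// jsonAndCBOR customizes both encodings and should be accepted.
type jsonAndCBOR struct{}

func (jsonAndCBOR) MarshalJSON() ([]byte, error) { return []byte(`"x"`), nil }

// MarshalCBOR returns the one-character CBOR text string "x".
func (jsonAndCBOR) MarshalCBOR() ([]byte, error) { return []byte{0x61, 'x'}, nil }

// plainStruct has no custom behaviors and should be accepted.
type plainStruct struct{ A int }

func main() {
	fmt.Println(checkEncodable(jsonOnly{}))
	fmt.Println(checkEncodable(jsonAndCBOR{}))
	fmt.Println(checkEncodable(plainStruct{}))
}
```

The same check would apply symmetrically to json.Unmarshaler and encoding.TextUnmarshaler on the decode path.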

Graduation Criteria

Alpha

All new tests enumerated in “Test Plan” are implemented.
Feature gate wired to kube-apiserver.
Dynamic client updated to support CBOR behind client-side gates.
Client generation updated to support CBOR behind client-side gates.
Runtime gating mechanism added to client-go.
Maintenance of CBOR library is understood.
Types that implement json.Marshaler or json.Unmarshaler without corresponding custom CBOR behaviors are either rejected with an error on Encode and Decode or automatically transcoded from JSON.

Beta

Review of nondeterministic encoding mode and final decision on whether to keep or remove it.
To support rollback from beta to alpha, at least one alpha release has supported automatic transcoding of types that implement json.Marshaler or json.Unmarshaler without corresponding custom CBOR behaviors.
All Kubernetes components have opted out of automatic transcoding to JSON for FieldsV1 and RawExtension.
Collection (i.e. List) object encoding supports “true” streaming (i.e. buffer size is not proportional to output size).
Structured endpoints like discovery, statusz, flagz, etc., support CBOR.

GA

Granular content-type fallback behavior on HTTP 415.
Ability to bypass content-type fallback behavior using OpenAPI.

Upgrade / Downgrade Strategy

API servers will be able to decode resources that have been stored with a CBOR encoding, even when the feature gate permitting the CBOR storage encoding is disabled. The feature gate will remain disabled by default during alpha. The default storage encoding will not change for built-in API types. The default storage encoding for custom resources will not change in the first version to support decoding CBOR-encoded objects from storage, so it will remain possible after a downgrade for kube-apiserver to decode any resources that may have been stored with the CBOR encoding.

Version Skew Strategy

Server-side support for accepting CBOR as a request encoding and returning CBOR as a response encoding is in addition to the existing support for JSON and Protobuf. CBOR is never selected as a response encoding unless the client has included a CBOR media type in the “Accept” request header. Older components will continue to use the existing encodings in their interactions with API servers that support CBOR.

Clients that proactively send a CBOR-encoded request to an API server without CBOR support will receive an HTTP 415 (Unsupported Media Type) response status and fall back to JSON. The test plan includes an end-to-end test covering a CBOR request made to the sample 1.17 API server to mitigate the risk of regressing this client-side fallback behavior.
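
The client-side fallback can be sketched as a small in-memory record of which (method, target resource) pairs have been rejected. The type and method names below are hypothetical and do not reflect the actual client-go implementation; the sketch assumes the caller reports only the status codes of requests that were sent with a CBOR body.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

const (
	contentTypeCBOR = "application/cbor"
	contentTypeJSON = "application/json"
)

// requestKey identifies a (method, target resource) pair.
type requestKey struct {
	method   string
	resource string
}

// cborFallback remembers which pairs were rejected with HTTP 415. The state
// is held in memory only, so it resets when the client program restarts.
type cborFallback struct {
	mu       sync.Mutex
	rejected map[requestKey]bool
}

func newCBORFallback() *cborFallback {
	return &cborFallback{rejected: make(map[requestKey]bool)}
}

// ContentType returns the request encoding to use for the given pair.
func (f *cborFallback) ContentType(method, resource string) string {
	f.mu.Lock()
	defer f.mu.Unlock()
	if f.rejected[requestKey{method, resource}] {
		return contentTypeJSON
	}
	return contentTypeCBOR
}

// Observe records the status code of a request that was sent as CBOR.
func (f *cborFallback) Observe(method, resource string, statusCode int) {
	if statusCode != http.StatusUnsupportedMediaType {
		return
	}
	f.mu.Lock()
	defer f.mu.Unlock()
	f.rejected[requestKey{method, resource}] = true
}

func main() {
	f := newCBORFallback()
	fmt.Println(f.ContentType("POST", "bars")) // application/cbor
	f.Observe("POST", "bars", http.StatusUnsupportedMediaType)
	fmt.Println(f.ContentType("POST", "bars")) // application/json
	fmt.Println(f.ContentType("PUT", "bars"))  // application/cbor
}
```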

Clients that include the CBOR media type in the “Accept” header will also include the JSON media type. API servers without CBOR support will select JSON as the response encoding through content negotiation.
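
The negotiation outcome described above can be illustrated with a simplified sketch of server-side selection from an "Accept" header. The header value, the q-values, and the selectResponseType helper are assumptions made for illustration; real negotiation also handles wildcards and media type parameters, which this sketch ignores.

```go
package main

import (
	"fmt"
	"mime"
	"sort"
	"strconv"
	"strings"
)

// acceptEntry is one media type from an Accept header with its quality value.
type acceptEntry struct {
	mediaType string
	q         float64
}

// parseAccept splits an Accept header into entries ordered by descending q.
// Entries with equal q keep the order in which they appeared.
func parseAccept(header string) []acceptEntry {
	var entries []acceptEntry
	for _, part := range strings.Split(header, ",") {
		mediaType, params, err := mime.ParseMediaType(strings.TrimSpace(part))
		if err != nil {
			continue // skip malformed or empty entries
		}
		q := 1.0
		if s, ok := params["q"]; ok {
			if v, err := strconv.ParseFloat(s, 64); err == nil {
				q = v
			}
		}
		entries = append(entries, acceptEntry{mediaType: mediaType, q: q})
	}
	sort.SliceStable(entries, func(i, j int) bool { return entries[i].q > entries[j].q })
	return entries
}

// selectResponseType returns the most preferred media type that the server
// supports, or "" if none of the requested types is supported.
func selectResponseType(header string, supported map[string]bool) string {
	for _, e := range parseAccept(header) {
		if e.q > 0 && supported[e.mediaType] {
			return e.mediaType
		}
	}
	return ""
}

func main() {
	accept := "application/cbor, application/json;q=0.9"
	withCBOR := map[string]bool{"application/json": true, "application/cbor": true}
	withoutCBOR := map[string]bool{"application/json": true}
	fmt.Println(selectResponseType(accept, withCBOR))    // application/cbor
	fmt.Println(selectResponseType(accept, withoutCBOR)) // application/json
}
```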

Production Readiness Review Questionnaire

Feature Enablement and Rollback

How can this feature be enabled / disabled in a live cluster?

Feature gate (also fill in values in kep.yaml)
- Feature gate name: CBORServingAndStorage
- Components depending on the feature gate:
  - kube-apiserver

Does enabling the feature change any default behavior?

Enabling the feature changes the default storage encoding of custom resources to CBOR, but this should be invisible to clients.

Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

Yes, with the exception of support for CBOR decoding of custom resources from storage. That cannot be disabled because it must remain possible to decode any resource that has already been persisted.

With CBOR disabled on the server side, resources that have been persisted using the CBOR encoding can be replaced with their JSON encoding by retrieving the resource as JSON and writing it back unaltered. This is the same process used for storage version migrations and can be automated using the Storage Version Migrator.

What happens if we reenable the feature if it was previously rolled back?

No additional considerations. Custom resource storage will support recognition and decoding of both JSON and CBOR whether the feature is enabled or disabled.

Are there any tests for feature enablement/disablement?

There is an integration test that ensures custom resources that have been stored with a mixture of CBOR and JSON encodings continue to be accessible with the feature gate disabled.

Rollout, Upgrade and Rollback Planning

How can a rollout or rollback fail? Can it impact already running workloads?

Existing releases are capable of decoding CBOR-encoded custom resources from storage whether or not the feature gate is enabled. During a rollout or rollback where only a subset of API servers have the CBOR storage encoding enabled, all API servers will be able to decode all custom resources.

A client program that attempts to send a CBOR-encoded request body and receives an HTTP 415 response from an API server with CBOR disabled will fall back to the JSON request body encoding until the client program in question is restarted.

What specific metrics should inform a rollback?

Failure to decode custom resource objects from storage is reported directly through storage_decode_errors_total and should not occur. Extensive serialization roundtrip testing exists to mitigate the risk of bugs that prevent persisted CBOR from being decoded. The established API server request latency and error metrics are not expected to regress.

Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

Upgrade and rollback were simulated manually by enabling and disabling the CBORServingAndStorage feature gate on a local kube-apiserver.

Start with feature disabled.
Create a CRD.

$ curl -X POST "$API/apis/apiextensions.k8s.io/v1/customresourcedefinitions" \
    -H "Content-Type: application/json" -d '{
    "apiVersion": "apiextensions.k8s.io/v1",
    "kind": "CustomResourceDefinition",
    "metadata": {"name": "bars.mygroup.example.com"},
    "spec": {
      "group": "mygroup.example.com",
      "versions": [{"name": "v1beta1", "served": true, "storage": true,
        "schema": {"openAPIV3Schema": {"type": "object", "x-kubernetes-preserve-unknown-fields": true}}}],
      "names": {"plural": "bars", "singular": "bar", "kind": "Bar", "listKind": "BarList"},
      "scope": "Cluster"
    }
  }'

Create the CR test-storage-json.

$ curl -X POST "$API/apis/mygroup.example.com/v1beta1/bars" \
    -H "Content-Type: application/json" -d '{
    "apiVersion": "mygroup.example.com/v1beta1",
    "kind": "Bar",
    "metadata": {"name": "test-storage-json"}
  }'

Verify that test-storage-json is JSON-encoded in storage.

$ ETCDCTL_API=3 etcdctl get /registry/mygroup.example.com/bars/test-storage-json --print-value-only
{"apiVersion":"mygroup.example.com/v1beta1","kind":"Bar",...}

Restart kube-apiserver with the feature enabled.
Create another CR test-storage-cbor.

$ curl -X POST "$API/apis/mygroup.example.com/v1beta1/bars" \
    -H "Content-Type: application/json" -d '{
    "apiVersion": "mygroup.example.com/v1beta1",
    "kind": "Bar",
    "metadata": {"name": "test-storage-cbor"}
  }'

Verify that test-storage-cbor is CBOR-encoded in storage.

$ ETCDCTL_API=3 etcdctl get /registry/mygroup.example.com/bars/test-storage-cbor --print-value-only | od -A n -t x1
d9 d9 f7 .. .. ..

Verify that both resources can be read from the API (existing JSON can be decoded after upgrade).

$ curl "$API/apis/mygroup.example.com/v1beta1/bars/test-storage-json"
{...}
$ curl "$API/apis/mygroup.example.com/v1beta1/bars/test-storage-cbor"
{...}

Restart kube-apiserver with the feature again disabled.
Verify that both resources can be read from the API (persisted CBOR can be decoded after rollback).

$ curl "$API/apis/mygroup.example.com/v1beta1/bars/test-storage-json"
{...}
$ curl "$API/apis/mygroup.example.com/v1beta1/bars/test-storage-cbor"
{...}

Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

No.

Monitoring Requirements

How can an operator determine if the feature is in use by workloads?

Any workload interacting with custom resources is implicitly exercising CBOR as a storage encoding.

Since the request encoding and response encoding are both primarily determined by client behavior (through proactive negotiation), and the existing request metrics don’t provide the ability to distinguish between JSON, YAML, and Protobuf, an operator can’t determine from apiserver metrics that CBOR is being actively used for serving. Programs built with client-go that have the ClientsAllowCBOR and ClientsPreferCBOR client-go feature gates enabled will use CBOR request and response encodings when interacting with custom resources.

How can someone using this feature know that it is working for their instance?

Other (treat as last resort)
- Details: Serving is verifiable by making a request to any API resource with the appropriate content negotiation headers (i.e. “Accept: application/cbor”) and inspecting the first three bytes of the response. They will be the encoding of the Self-Described CBOR tag, 0xd9d9f7. The storage encoding used for custom resources is transparent to end users.
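
The three-byte check described above is small enough to sketch directly. The helper name hasSelfDescribedCBORTag is an assumption made for illustration; in practice the bytes would come from the body of a response to a request sent with "Accept: application/cbor".

```go
package main

import (
	"bytes"
	"fmt"
)

// selfDescribedCBORTag is the encoding of the Self-Described CBOR tag.
var selfDescribedCBORTag = []byte{0xd9, 0xd9, 0xf7}

// hasSelfDescribedCBORTag reports whether body begins with the tag.
func hasSelfDescribedCBORTag(body []byte) bool {
	return bytes.HasPrefix(body, selfDescribedCBORTag)
}

func main() {
	cborBody := []byte{0xd9, 0xd9, 0xf7, 0xa0} // tag followed by an empty CBOR map
	jsonBody := []byte(`{"kind":"Bar"}`)
	fmt.Println(hasSelfDescribedCBORTag(cborBody)) // true
	fmt.Println(hasSelfDescribedCBORTag(jsonBody)) // false
}
```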

What are the reasonable SLOs (Service Level Objectives) for the enhancement?

Enabling the feature gate shouldn’t regress established SLOs for request latency and error rate.

What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

Metrics
- Metric name: apiserver_request_total
- Components exposing the metric: kube-apiserver
- Metric name: apiserver_request_duration_seconds
- Components exposing the metric: kube-apiserver

Are there any missing metrics that would be useful to have to improve observability of this feature?

Additional labels on apiserver_request_total to indicate request and response encoding would allow segmenting error rates by serializer and would show adoption by clients. This increases time series cardinality (16 pairs of request and response serializations) and the capability hasn’t been considered necessary for existing serializers.

Dependencies

Does this feature depend on any specific services running in the cluster?

No.

Scalability

Will enabling / using this feature result in any new API calls?

If a client is configured to encode a request body using CBOR, and that request is handled by an API server that does not have CBOR enabled, the API server will send response status 415 (Unsupported Media Type) and the client will repeat the request using JSON. This is not expected to produce a substantial number of additional requests because:

the default request encoding for clients will not be modified until CBOR support is widespread (beyond GA and accounting for version skew)
individual clients will limit failed attempts at using CBOR as request content-type for any given verb and target resource

Will enabling / using this feature result in introducing new API types?

No.

Will enabling / using this feature result in any new calls to the cloud provider?

No.

Will enabling / using this feature result in increasing size or count of the existing API objects?

No. Object counts will not be affected. Storage and most serving of native types will continue to use Protobuf and will be unaffected. Traffic from dynamic clients, and storage of custom resources, should be modestly more compact. Although not a goal of this proposal, pods encoded as part of benchmarking were approximately 20% smaller with CBOR than with JSON.

Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

No.

Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, …) in any components?

No.

Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?

No.

Troubleshooting

How does this feature react if the API server and/or etcd is unavailable?

N/A. The serialization itself doesn’t depend on communication between API servers or between an API server and etcd.

What are other known failure modes?

Essentially any failure mode is a bug, whether it results in corruption, pathological resource consumption or request latency, or serialization roundtrip failures (inability of a kube-apiserver to decode custom resources from storage, failure of a client to decode a response body, or failure of an API server to decode a valid CBOR-encoded request body).

Client operators can prevent further CBOR usage in API requests by disabling the client-go feature gate ClientsAllowCBOR and restarting the client program.

API server operators can stop the use of CBOR in requests and storage encoding by disabling CBORServingAndStorage and restarting the API server. Clients attempting to send CBOR request bodies to an API server with the feature gate disabled will trip a circuit breaker and fall back to JSON. After restarting the API server, any custom resources that are CBOR-encoded in storage can be migrated back to the JSON storage encoding by performing a no-change get/put on each custom resource.
