Internet-Draft | Ciphertext Format | January 2021 |
Sheffer, et al. | Expires 19 July 2021 | [Page] |
This document defines a set of structured headers for encrypted data. The main goal of this format is to enable detection of encrypted data in large data stores, and to associate each item with the system where it was created and the key with which it was encrypted. This allows organizations to extend the concept of data governance to encrypted data, and to manage such data even when it is encrypted by multiple different systems and cloud providers.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 19 July 2021.¶
Copyright (c) 2021 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.¶
Organizations that manage sensitive data often employ application-level encryption to protect data at rest. When this solution is used, it is common that very large numbers of encrypted data items are stored, potentially for a long time. Security best practices, complicated organizational structures, as well as the existence of modern key management systems, lead to the proliferation of large numbers of encryption keys. After a while it becomes difficult to identify the encryption key that was used for a particular piece of data, with the situation becoming even more complicated when multiple key management systems are used by the same organization.¶
Application-level encryption can be deployed at different scales: in some cases a multi-megabyte file may be encrypted with a single key. In other cases, we may want to deploy encryption for specific database fields, which can easily result in millions of keys for a single database table.¶
Tagging encrypted data with metadata supports a number of important use cases: it allows the organization to better catalog the data (a.k.a. "data governance"), to discover the owner of each piece of encrypted data, and to detect data encrypted with outdated keys.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
Our main goal in defining a common ciphertext format is to allow organizations to manage large scale data, encrypted at rest using multiple key management and encryption services. Additional motivations for an enterprise to use a common format are:¶
Some of the goals behind this design include:¶
A few notable formats are:¶
The ciphertext is prefixed by a header, which in turn consists of a short fixed header and a variable header. The variable header is a CBOR [RFC8949] map.¶
Following the header is the body of the ciphertext. The format (including length) of the body is out of scope for this document.¶
The fixed header consists of:¶
The variable header is a CBOR map consisting of elements from the following table.¶
Field Name | Map Key | Value Type | Meaning | Mandatory |
---|---|---|---|---|
Key Provider | 1 | Unsigned integer | The organization responsible for the key management system. | Y |
Key ID | 2 | Byte string | An encryption key identifier, where the key is stored in a key management system. This must denote a unique key, even if the Provider supports multiple tenants. Encoding of this field is Provider-specific. The field must appear once. | Y |
Key Version | 3 | Unsigned integer | A version of a key, where the key is rotated on a periodic basis. Encoding of this field is Provider-specific. The field must appear at most once. | N |
Auxiliary Data | 4 | Byte string | Additional data required to derive a specific key from the referenced key (and key version, if any), see also Section 3.1.3. The field must appear at most once. | N |
Nonce | 5 | Byte string | A nonce or initialization vector (IV), if required by the cipher algorithm. We note that an implementation may prefer to store the nonce and authentication tag in-line with the ciphertext. | N |
Authentication Tag | 6 | Byte string | An authentication tag or integrity check value (ICV), if required by the cipher algorithm. | N |
Additional Authenticated Data | 7 | Byte string | Additional authenticated data (AAD), which is integrity-protected but not encrypted by the cipher. | N |
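As an illustration of the table above, the following non-normative Python sketch serializes a variable header as a CBOR map and prepends a fixed header. It is a minimal sketch, not a complete CBOR implementation: it handles only unsigned integers and byte strings, which are the value types listed in the table. The names `encode_var_header`, `build_header` and `FIXED_HEADER` are hypothetical, and the two fixed header bytes are an assumption copied from the example header in this document (08 01).

```python
import struct

# Assumption: the two fixed header bytes are copied from this document's example.
FIXED_HEADER = bytes([0x08, 0x01])


def _cbor_head(major: int, value: int) -> bytes:
    """Encode a CBOR initial byte and its argument in the shortest form."""
    if value < 0:
        raise ValueError("only non-negative arguments are supported")
    prefix = major << 5
    if value < 24:
        return bytes([prefix | value])
    if value < 1 << 8:
        return bytes([prefix | 24, value])
    if value < 1 << 16:
        return bytes([prefix | 25]) + struct.pack(">H", value)
    if value < 1 << 32:
        return bytes([prefix | 26]) + struct.pack(">I", value)
    if value < 1 << 64:
        return bytes([prefix | 27]) + struct.pack(">Q", value)
    raise ValueError("integer too large for CBOR")


def encode_var_header(fields: dict) -> bytes:
    """Encode {map key: int | bytes} as a CBOR map, keys in ascending order."""
    out = _cbor_head(5, len(fields))                  # major type 5: map
    for key in sorted(fields):
        value = fields[key]
        out += _cbor_head(0, key)                     # major type 0: unsigned int
        if isinstance(value, bool):
            raise TypeError("booleans are not valid header values")
        if isinstance(value, int):
            out += _cbor_head(0, value)
        elif isinstance(value, (bytes, bytearray)):
            out += _cbor_head(2, len(value)) + bytes(value)  # major type 2: bstr
        else:
            raise TypeError(f"unsupported value type for key {key}")
    return out


def build_header(fields: dict) -> bytes:
    """Return the fixed header followed by the encoded variable header."""
    return FIXED_HEADER + encode_var_header(fields)


header = build_header({1: 65535, 2: bytes.fromhex("1122334455"), 3: 6})
print(header.hex(" "))
```

With Key Provider 65535, Key ID h'1122334455' and Key Version 6, the output matches the example header bytes given in this document.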
The Auxiliary Data field is used to support derivation of a key, specific to the ciphertext being managed. There are two common ways to obtain this specific key:¶
The exact algorithm is implementation dependent, and should be uniquely defined by the combination of Key Provider, Key ID and (if given) Key Version.¶
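Because the algorithm is implementation dependent, the following is only one conceivable illustration, not a method defined by this document. It is a minimal sketch that assumes the Key Provider derives a per-ciphertext key by running HKDF-SHA256 [RFC5869] over the referenced key, with the Auxiliary Data field used as the HKDF "info" input and an all-zero salt. The function name `derive_data_key` and all parameter choices are hypothetical.

```python
import hashlib
import hmac

HASH_LEN = 32  # output size of SHA-256 in bytes


def derive_data_key(referenced_key: bytes, aux_data: bytes, length: int = 32) -> bytes:
    """Derive a per-ciphertext key with HKDF-SHA256 (assumed scheme, see lead-in)."""
    if not 0 < length <= 255 * HASH_LEN:
        raise ValueError("invalid output length for HKDF-SHA256")

    # HKDF-Extract with the default salt (HASH_LEN zero bytes).
    prk = hmac.new(b"\x00" * HASH_LEN, referenced_key, hashlib.sha256).digest()

    # HKDF-Expand: T(n) = HMAC(PRK, T(n-1) || info || n), with info = aux_data.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + aux_data + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


# Two ciphertexts that reference the same key but carry different Auxiliary Data
# end up with different data keys.
key_a = derive_data_key(b"\x01" * 32, b"record-0001")
key_b = derive_data_key(b"\x01" * 32, b"record-0002")
print(key_a != key_b)
```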
Correct interpretation of the format may have security implications, making it important to define the exact semantics even when the entity that receives a ciphertext may not understand parts of the header.¶
We chose the initial byte 0x08, since strings are very unlikely to start with it, as we explain below. Automated tools can detect encrypted data in structured contexts (e.g., a SQL database column) by sampling a number of data items: if all of the sampled items start with this byte, the tool can conclude that they are encrypted with high probability.¶
The byte 0x08 encodes the ASCII control character "backspace". It has the same meaning in UTF-8, and the 08 block of UTF-16 characters is only populated by two very small languages and rarely used extended Arabic characters.¶
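The sampling heuristic described above can be sketched as follows. This is a non-normative illustration: the function name `looks_encrypted` and the `min_samples` threshold are hypothetical, and a real tool would choose its own sample size and decide how to treat empty or NULL values.

```python
MARKER = 0x08  # initial byte of the fixed header


def looks_encrypted(samples, min_samples: int = 20) -> bool:
    """Return True if enough non-empty samples exist and all start with 0x08."""
    # Skip empty and missing values; they say nothing about the column.
    items = [bytes(s) for s in samples if s]
    if len(items) < min_samples:
        return False  # too little evidence to decide
    return all(item[0] == MARKER for item in items)


column = [bytes([0x08, 0x01, 0xA0]) + b"body"] * 25
print(looks_encrypted(column))              # all 25 samples start with 0x08
print(looks_encrypted(column + [b"plain"])) # one sample does not
```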
08 01
¶
{1: 65535, 2: h'1122334455', 3: 6}
¶
a3 01 19 ff ff 02 45 11 22 33 44 55 03 06
¶
08 01 a3 01 19 ff ff 02 45 11 22 33 44 55 03 06
¶
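To show how a consumer might read the example header above, here is a non-normative Python sketch of a parser. It is deliberately minimal: it accepts only definite-length maps whose keys are unsigned integers and whose values are unsigned integers or byte strings, and it rejects everything else. The function name `parse_header` is hypothetical, and treating the second byte as an opaque value is an assumption, since only the initial byte 0x08 is used here for recognition.

```python
def _read_head(buf: bytes, pos: int):
    """Return (major type, argument, next position) for the CBOR item at pos."""
    if pos >= len(buf):
        raise ValueError("truncated header")
    major, info = buf[pos] >> 5, buf[pos] & 0x1F
    pos += 1
    if info < 24:
        return major, info, pos
    if info > 27:
        raise ValueError("indefinite-length or reserved encoding not supported")
    size = 1 << (info - 24)  # 1, 2, 4 or 8 argument bytes
    if pos + size > len(buf):
        raise ValueError("truncated header")
    return major, int.from_bytes(buf[pos:pos + size], "big"), pos + size


def parse_header(data: bytes):
    """Return (second fixed byte, variable header dict, offset of the body)."""
    if len(data) < 3 or data[0] != 0x08:
        raise ValueError("data does not start with the 0x08 marker")
    major, count, pos = _read_head(data, 2)
    if major != 5:
        raise ValueError("variable header is not a CBOR map")
    fields = {}
    for _ in range(count):
        key_major, key, pos = _read_head(data, pos)
        if key_major != 0:
            raise ValueError("map key is not an unsigned integer")
        value_major, arg, pos = _read_head(data, pos)
        if value_major == 0:        # unsigned integer
            value = arg
        elif value_major == 2:      # byte string of length arg
            if pos + arg > len(data):
                raise ValueError("truncated byte string")
            value, pos = data[pos:pos + arg], pos + arg
        else:
            raise ValueError("unsupported value type in this sketch")
        fields[key] = value
    return data[1], fields, pos


blob = bytes.fromhex("08 01 a3 01 19 ff ff 02 45 11 22 33 44 55 03 06") + b"body"
second_byte, fields, body_offset = parse_header(blob)
print(second_byte, fields, blob[body_offset:])
```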
The following non-normative snippet defines the format of the variable header using CDDL [RFC8610].¶
var_header = {
  K_KEY_PROVIDER: uint,
  K_KEY_ID: bstr,
  ? K_KEY_VERSION: uint,
  ? K_AUX_DATA: bstr,
  ? K_NONCE: bstr,
  ? K_AUTH_TAG: bstr,
  ? K_AAD: bstr,
  *uint => any ; extensions
}

K_RESERVED = 0
K_KEY_PROVIDER = 1
K_KEY_ID = 2
K_KEY_VERSION = 3
K_AUX_DATA = 4
K_NONCE = 5
K_AUTH_TAG = 6
K_AAD = 7
; extend here
¶
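The constraints expressed by the CDDL above can also be checked on an already decoded map. The following non-normative Python sketch assumes the variable header has been decoded into a `dict` with `int` keys, `int` values for unsigned integers and `bytes` values for byte strings. The function name `check_var_header` and the wording of the problem messages are hypothetical.

```python
K_KEY_PROVIDER, K_KEY_ID, K_KEY_VERSION, K_AUX_DATA = 1, 2, 3, 4
K_NONCE, K_AUTH_TAG, K_AAD = 5, 6, 7

# Expected Python type for each defined map key (uint -> int, bstr -> bytes).
_FIELD_TYPES = {
    K_KEY_PROVIDER: int, K_KEY_ID: bytes, K_KEY_VERSION: int,
    K_AUX_DATA: bytes, K_NONCE: bytes, K_AUTH_TAG: bytes, K_AAD: bytes,
}
_MANDATORY = (K_KEY_PROVIDER, K_KEY_ID)


def _is_uint(value) -> bool:
    """True for non-negative integers, excluding booleans."""
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


def check_var_header(header: dict) -> list:
    """Return a list of problems; an empty list means the map matches the CDDL."""
    problems = [f"missing mandatory key {k}" for k in _MANDATORY if k not in header]
    for key, value in header.items():
        if not _is_uint(key):
            problems.append(f"key {key!r} is not an unsigned integer")
            continue
        expected = _FIELD_TYPES.get(key)
        if expected is None:
            continue  # matches the `*uint => any` extension rule
        ok = _is_uint(value) if expected is int else isinstance(value, bytes)
        if not ok:
            problems.append(f"key {key} has a value of the wrong type")
    return problems


print(check_var_header({1: 65535, 2: bytes.fromhex("1122334455"), 3: 6}))
print(check_var_header({1: 65535}))
```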
TBD: establish a registry for Types, with 128-255 as private use.¶
TBD: establish a registry of Key Providers.¶
The format defined here does not include integrity protection for the header, nor does it mandate that the encrypted item's integrity protection should include the header.¶
Data encrypted at rest is typically vulnerable to denial of service attacks, since (assuming the data is integrity protected) an attacker that can change the ciphertext can trivially cause it to fail validation.¶
There are cases where it is convenient to manipulate the ciphertext header even if the data itself remains encrypted and unmodified, for example when migrating between formats or when bulk-changing metadata associated with the ciphertext. On the other hand, it is a best practice to protect cryptographic metadata against malicious modification. We are currently not aware of a specific threat vector associated with malicious changes to the proposed format, at least assuming the use of AEAD ciphers.¶