Internet DRAFT - draft-ietf-keytrans-architecture
draft-ietf-keytrans-architecture
Key Transparency B. McMillion
Internet-Draft 4 March 2024
Intended status: Informational
Expires: 5 September 2024
Key Transparency Architecture
draft-ietf-keytrans-architecture-01
Abstract
This document defines the terminology and interaction patterns
involved in the deployment of Key Transparency (KT) in a general
secure group messaging infrastructure, and specifies the security
properties that the protocol provides. It also gives more general,
non-prescriptive guidance on how to securely apply KT to a number of
common applications.
About This Document
This note is to be removed before publishing as an RFC.
The latest revision of this draft can be found at https://ietf-wg-
keytrans.github.io/draft-arch/draft-ietf-keytrans-architecture.html.
Status information for this document may be found at
https://datatracker.ietf.org/doc/draft-ietf-keytrans-architecture/.
Discussion of this document takes place on the Key Transparency
Working Group mailing list (mailto:keytrans@ietf.org), which is
archived at https://mailarchive.ietf.org/arch/browse/keytrans/.
Subscribe at https://www.ietf.org/mailman/listinfo/keytrans/.
Source for this draft and an issue tracker can be found at
https://github.com/ietf-wg-keytrans/draft-arch.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
McMillion Expires 5 September 2024 [Page 1]
Internet-Draft Key Transparency Architecture March 2024
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 5 September 2024.
Copyright Notice
Copyright (c) 2024 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components
extracted from this document must include Revised BSD License text as
described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Revised BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Conventions and Definitions . . . . . . . . . . . . . . . . . 4
3. Protocol Overview . . . . . . . . . . . . . . . . . . . . . . 5
4. User Interactions . . . . . . . . . . . . . . . . . . . . . . 5
4.1. Out-of-Band Communication . . . . . . . . . . . . . . . . 8
5. Deployment Modes . . . . . . . . . . . . . . . . . . . . . . 9
5.1. Contact Monitoring . . . . . . . . . . . . . . . . . . . 10
5.2. Third-Party Auditing . . . . . . . . . . . . . . . . . . 12
5.3. Third-Party Management . . . . . . . . . . . . . . . . . 12
6. Combining Logs . . . . . . . . . . . . . . . . . . . . . . . 13
6.1. Gradual Migration . . . . . . . . . . . . . . . . . . . . 14
6.2. Immediate Migration . . . . . . . . . . . . . . . . . . . 14
6.3. Federation . . . . . . . . . . . . . . . . . . . . . . . 15
7. Pruning . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
8. Security Guarantees . . . . . . . . . . . . . . . . . . . . . 16
8.1. Privacy Guarantees . . . . . . . . . . . . . . . . . . . 18
8.1.1. Leakage to Third-Party . . . . . . . . . . . . . . . 18
9. Privacy Law Considerations . . . . . . . . . . . . . . . . . 19
10. Implementation Guidance . . . . . . . . . . . . . . . . . . . 20
11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 22
12. References . . . . . . . . . . . . . . . . . . . . . . . . . 22
12.1. Normative References . . . . . . . . . . . . . . . . . . 22
12.2. Informative References . . . . . . . . . . . . . . . . . 22
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 22
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 22
McMillion Expires 5 September 2024 [Page 2]
Internet-Draft Key Transparency Architecture March 2024
1. Introduction
Before any information can be exchanged in an end-to-end encrypted
system, two things must happen. First, participants in the system
must provide the service operator with any public keys they wish to
use to receive messages. Second, the service operator must somehow
distribute these public keys amongst the participants that wish to
communicate with each other.
Typically this is done by having users upload their public keys to a
simple directory where other users can download them as necessary, or
by providing public keys in-band with the communication being
secured. With this approach, the service operator needs to be
trusted to provide the correct public keys, which means that the
underlying encryption protocol can only protect users against passive
eavesdropping on their messages.
However most messaging systems are designed such that all messages
exchanged between users flow through the service operator's servers,
so it's extremely easy for an operator to launch an active attack.
That is, the service operator can provide fake public keys which it
knows the private keys for, associate those public keys with a user's
account without the user's knowledge, and then use them to
impersonate or eavesdrop on conversations with that user.
Key Transparency (KT) solves this problem by requiring the service
operator to store user public keys in a cryptographically-protected
append-only log. Any malicious entries added to such a log will
generally be equally visible to both the key's owner and the owner's
contacts, in which case a user can detect that they are being
impersonated by viewing the public keys attached to their account.
If the service operator attempts to conceal some entries of the log
from some users but not others, this creates a "forked view" which is
permanent and easily detectable with out-of-band communication.
The critical improvement of KT over related protocols like
Certificate Transparency [RFC6962] is that KT includes an efficient
protocol to search the log for entries related to a specific
participant. This means users don't need to download the entire log,
which may be substantial, to find all entries that are relevant to
them. It also means that KT can better preserve user privacy by only
showing entries of the log to participants that genuinely need to see
them.
McMillion Expires 5 September 2024 [Page 3]
Internet-Draft Key Transparency Architecture March 2024
2. Conventions and Definitions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here.
*End-to-end Encrypted Communication Service:* A communications
service that allows end-users to engage in text, voice, video, or
other forms of communication over the Internet, that uses public
key cryptography to ensure that communications are only accessible
to their intended recipients.
*End-user Device:* The device at the final point in a digital
communication, which may either send or receive encrypted data in
an end-to-end encrypted communication service.
*End-user Identity:* A unique and user-visible identity associated
with an account (and therefore one or more end-user devices) in an
end-to-end encrypted communication service. In the case where an
end-user explicitly requests to communicate with (or is informed
they are communicating with) an end-user uniquely identified by
the name "Alice", the end-user identity is the string "Alice".
*User / Account:* A single end-user of an end-to-end encrypted
communication service, which may be represented by several end-
user identities and end-user devices. For example, a user may be
represented simultaneously by multiple identities (email, phone
number, username) and interact with the service on multiple
devices (phone, laptop).
*Service Operator:* The primary organization that provides the
infrastructure and software resources necessary to operate an end-
to-end encrypted communication service.
*Transparency Log:* A specialized service capable of securely
attesting to the information (such as public keys) associated with
a given end-user identity. The transparency log is usually run
either entirely or partially by the service operator.
McMillion Expires 5 September 2024 [Page 4]
Internet-Draft Key Transparency Architecture March 2024
3. Protocol Overview
From a networking perspective, KT follows a client-server
architecture with a central _Transparency Log_, acting as a server,
which holds the authoritative copy of all information and exposes
endpoints that allow users to query or modify stored data. Users
coordinate with each other through the server by uploading their own
public keys and/or downloading the public keys of other users. Users
are expected to maintain relatively little state, limited only to
what is required to interact with the log and ensure that it is
behaving honestly.
From an application perspective, KT works as a versioned key-value
database. Users insert key-value pairs into the database where, for
example, the key is their username and the value is their public key.
Users can update a key by inserting a new version with new data.
They can also look up the most recent version of a key or any past
version. Users are considered to *own* a key if, in the normal
operation of the application, they should be the only one making
changes to it. From this point forward, "key" will refer to a lookup
key in a key-value database and "public key" or "private key" will be
specified if otherwise.
KT does not require the use of a specific transport protocol. This
is intended to allow applications to layer KT on top of whatever
transport protocol their application already uses. In particular,
this allows applications to continue relying on their existing access
control system.
Applications may enforce arbitrary access control rules on top of KT
such as requiring a user to be logged in to make KT requests, only
allowing a user to lookup the keys of another user if they're
"friends", or simply applying a rate limit. Applications SHOULD
prevent users from modifying keys that they don't own. The exact
mechanism for rejecting requests, and possibly explaining the reason
for rejection, is left to the application.
4. User Interactions
As discussed in Section 3, KT follows a client-server architecture.
This means users generally interact directly with the transparency
log. The operations that can be executed by a user are as follows:
1. *Search:* Performs a lookup on a specific key in the most recent
version of the log. Users may request either a specific version
of the key, or the most recent version available. If the key-
version pair exists, the server returns the corresponding value
and a proof of inclusion.
McMillion Expires 5 September 2024 [Page 5]
Internet-Draft Key Transparency Architecture March 2024
2. *Update:* Adds a new key-value pair to the log, for which the
server returns a proof of inclusion. Note that this means that
new values are added to the log immediately and no provisional
inclusion proof, such as an SCT as defined in Section 3 of
[RFC6962], is provided.
3. *Monitor:* While Search and Update are run by the user as
necessary, monitoring is done in the background on a recurring
basis. It both checks that the log is continuing to behave
honestly (all previously returned keys remain in the tree) and
that no changes have been made to keys owned by the user without
the user's knowledge.
These operations are executed over an application-provided transport
layer, where the transport layer enforces access control by blocking
queries which are not allowed:
Alice Transparency Log
| |
| (Valid / Accepted Requests) |
| |
| Search(Alice) ---------------------------> |
| <--------------------- SearchResponse(...) |
| |
| Search(Bob) -----------------------------> |
| <--------------------- SearchResponse(...) |
| |
| Update(Alice, ...) ----------------------> |
| <--------------------- UpdateResponse(...) |
| |
| |
| (Rejected / Blocked Requests) |
| |
| Search(Fred) ----------------------> X |
| Update(Bob, ...) ------------------> X |
| |
Figure 1: Example request/response flow. Valid requests receive
a response while invalid requests are blocked by the transport
layer.
An important caveat to the client-server architecture is that many
end-to-end encrypted communication services require the ability to
provide _credentials_ to their users. These credentials convey a
binding between an end-user identity and potentially several
encryption or signature public keys, and are meant to be verified
with no/minimal network requests by the receiving users.
McMillion Expires 5 September 2024 [Page 6]
Internet-Draft Key Transparency Architecture March 2024
In particular, credentials that can be verified with minimal network
access are often required by applications provide anonymous
communication. These applications provide end-to-end encryption with
a protocol like the Messaging Layer Security protocol [RFC9420] (with
the encryption of handshake messages required), or Sealed Sender
[sealed-sender]. When a user receives a message, these protocols
have senders provide their own credential in an encrypted portion of
the message. Encrypting the sender's credential prevents it from
being visible to the service provider, while still assuring the
recipient of the sender's identity. If users were to authenticate
the sender's public key directly with the service provider, they
would leak to the service provider who the they are communicating
with.
Key Transparency credentials can be created by serializing one or
more Search request-response pairs. These Search operations would
correspond to the lookups a user needs to do to prove the
relationship between their end-user identity and their cryptographic
keys. Recipients can verify the request-response pairs themselves
without contacting the Transparency Log.
Any future monitoring that may be required can be provided to
recipients proactively by the sender. However if this fails, the
recipient can still perform the monitoring themselves (including over
an anonymous channel if necessary).
Transparency Log Alice Anonymous Group
| | |
| <--------------- Search(Alice) | |
| SearchResponse(...) ---------> | Encrypt(Anon Group, |
| | SearchResponse || |
| | Message || |
| | Signature) ---------> |
| | |
| <-------------- Monitor(Alice) | |
| MonitorResponse(...) --------> | Encrypt(Anon Group, |
| | MonitorResponse) ---> |
| | |
Figure 2: Example message flow in an anonymous deployment. Users
request their own key from the Transparency Log and provide the
serialized response, functioning as a credential, in encrypted
messages to other users. Required monitoring is provided
proactively.
McMillion Expires 5 September 2024 [Page 7]
Internet-Draft Key Transparency Architecture March 2024
4.1. Out-of-Band Communication
It is sometimes possible for a Transparency Log to present forked
views of data to different users. This means that, from an
individual user's perspective, a log may appear to be operating
correctly in the sense that all of a user's requests succeed and
proofs verify correctly. However, the Transparency Log has presented
a view to the user that's not globally consistent with what it has
shown other users. As such, the log may be able to associate data
with keys without the key owner's awareness.
The protocol is designed such that users always require subsequent
queries to prove consistency with previous queries. As such, users
always stay on a linearizable view of the log. If a user is ever
presented with a forked view, they hold on to this forked view
forever and reject the output of any subsequent queries that are
inconsistent with it.
This provides ample opportunity for users to detect when a fork has
been presented, but isn't in itself sufficient for detection. To
detect forks, users must either use *peer-to-peer communication* or
*anonymous communication* with the Transparency Log.
With peer-to-peer communication, two users gossip with each other to
establish that they both have the same view of the log's data. This
gossip is able to happen over any supported out-of-band channel, even
if it is heavily bandwidth-limited, such as scanning a QR code or
talking over the phone.
With anonymous communication, a single user accesses the Transparency
Log over an anonymous channel and tries to establish that the log is
presenting the same view of data over the anonymous channel as it
does over authenticated channels.
In the event that a fork is successfully detected, the user is able
to produce non-repudiable proof of log misbehavior which can be
published.
McMillion Expires 5 September 2024 [Page 8]
Internet-Draft Key Transparency Architecture March 2024
Alice Bob Transparency Log
| | |
| | (Normal reqs over authenticated channel) |
| | |
| | Search(Bob) ---------------------------> |
| | <---------- Response{Head: 6c063bb, ...} |
| | |
| | |
| | |
| | |
| (OOB check with peer) | (OOB check over anonymous channel) |
| | |
| <------ DistinguishedHead | DistinguishedHead ~~~~~~~~~~~~~~~~~~~~~> |
| 6c063bb ----------------> | <~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 6c063bb |
| | |
| | Search(Bob) ~~~~~~~~~~~~~~~~~~~> X |
| | |
Figure 3: Users receive tree heads while making authenticated
requests to a Transparency Log. Users ensure consistency of tree
heads by either comparing amongst themselves, or by contacting
the Transparency Log over an anonymous channel. Requests that
require authorization are not available over the anonymous
channel.
5. Deployment Modes
In the interest of satisfying the widest range of use-cases possible,
three different modes for deploying a Transparency Log are supported.
Each mode has slightly different requirements and efficiency
considerations for both the transparency log and the end-user.
*Third-Party Management* and *Third-Party Auditing* are two
deployment modes that require the transparency log to delegate part
of its operation to a third party. Users are able to run more
efficiently as long as they can assume that the transparency log and
the third party won't collude to trick them into accepting malicious
results.
With both third-party modes, all requests from end-users are
initially routed to the transparency log and the log coordinates with
the third party itself. End-users never contact the third party
directly, however they will need a signature public key from the
third party to verify its assertions.
With Third-Party Management, the third party performs the majority of
the work of actually storing and operating the service, and the
transparency log only signs new entries as they're added. With
McMillion Expires 5 September 2024 [Page 9]
Internet-Draft Key Transparency Architecture March 2024
Third-Party Auditing, the transparency log performs the majority of
the work of storing and operating the service, and obtains signatures
from a lightweight third-party auditor at regular intervals asserting
that the tree has been constructed correctly.
*Contact Monitoring*, on the other hand, supports a single-party
deployment with no third party. The cost of this is that executing
the background monitoring protocol requires an amount of work that's
proportional to the number of keys a user has looked up in the past.
As such, it's less suited to use-cases where users look up a large
number of ephemeral keys, but would work ideally in a use-case where
users look up a limited number of keys repeatedly (for example, the
keys of regular contacts).
+========================+==========================+===============+
| Deployment Mode | Supports ephemeral keys? | Single party? |
+========================+==========================+===============+
| Contact Monitoring | No | Yes |
+------------------------+--------------------------+---------------+
| Third-Party Auditing | Yes | No |
+------------------------+--------------------------+---------------+
| Third-Party | Yes | No |
| Management | | |
+------------------------+--------------------------+---------------+
Table 1: Comparison of deployment modes
Applications that rely on a Transparency Log deployed in Contact
Monitoring mode MUST regularly engage in out-of-band communication
(Section 4.1) to ensure that they detect forks in a timely manner.
Applications that rely on a Transparency Log deployed in either of
the third-party modes SHOULD allow users to enable a "Contact
Monitoring Mode". This mode, which affects only the individual
client's behavior, would cause the client to behave as if its
Transparency Log was deployed in Contact Monitoring mode. As such,
it would start retaining state about previously looked-up keys and
regularly engaging in out-of-band communication. Enabling this
higher-security mode allows users to double-check that the third-
party is not colluding with the Transparency Log to serve malicious
data.
5.1. Contact Monitoring
With the Contact Monitoring deployment mode, the monitoring burden is
split between both the owner of a key and those that look up the key.
Stated as simply as possible, the monitoring obligations of each
party are:
McMillion Expires 5 September 2024 [Page 10]
Internet-Draft Key Transparency Architecture March 2024
1. The key owner, on a regular basis, searches for the most recent
version of the key in the log. They verify that the key has not
changed unexpectedly.
2. The users that looked up a key, at some point in the future,
verify that the key-value pair they observed is still properly
represented in the tree such that other users would find it if
they searched for it.
This guarantees that if a malicious key-value pair is added to the
log, then either it is detected by the key owner, or if it is
removed/obscured from the log before the key owner can detect it,
then any users that observed it will detect its removal.
Alice Transparency Log Bob
| | |
| Search(Bob) --------------------> | |
| <------------ SearchResponse(...) | |
| | |
| | |
| (1 day later) | |
| | |
| Monitor(Bob) -------------------> | |
| <----------- MonitorResponse(...) | |
| | |
| | |
| (2 days later) | |
| | |
| Monitor(Bob) -------------------> | |
| <----------- MonitorResponse(...) | |
| | |
| | <------------------ Monitor(Bob) |
| (4 days later) | MonitorResponse(...) ----------> |
| | |
| Monitor(Bob) -------------------> | |
| <----------- MonitorResponse(...) | |
| | |
| ... | |
| | |
Figure 4: Contact Monitoring. When users make a Search request,
they must check back in with the Transparency Log several times.
These checks ensure that the data in the Search response wasn't
later removed from the log. Overlap with the key owner's own
monitoring guarantees a consistent view of data.
McMillion Expires 5 September 2024 [Page 11]
Internet-Draft Key Transparency Architecture March 2024
5.2. Third-Party Auditing
With the Third-Party Auditing deployment mode, the transparency log
obtains signatures from a lightweight third-party auditor attesting
to the fact that the tree has been constructed correctly. These
signatures are provided to users along with the responses for their
queries.
The third-party auditor is expected to run asynchronously,
downloading and authenticating a log's contents in the background, so
as not to become a bottleneck for the transparency log.
Many Users Transparency Log Auditor
| | |
| Update(Alice, ...) ------------------> | |
| Update(Bob, ...) --------------------> | |
| Update(Carol, ...) ------------------> | |
| <===== Response{AuditorSig: 66bf, ...} | |
| | |
| | |
| | BatchUpdate --------------> |
| | <---------- NewSig: 53c1035 |
| | |
Figure 5: Third-Party Auditing. A recent signature from the
auditor is provided to users. The auditor is updated on changes
to the tree in the background.
5.3. Third-Party Management
With the Third-Party Management deployment mode, a third party is
responsible for the majority of the work of storing and operating the
log, while the transparency log serves mainly to enforce access
control and authenticate the addition of new entries to the log. All
user queries are initially sent by users directly to the transparency
log, and the log operator proxies them to the third-party manager if
they pass access control.
McMillion Expires 5 September 2024 [Page 12]
Internet-Draft Key Transparency Architecture March 2024
Alice Transparency Log Manager
| | |
| Search(Alice) ------------> | -----------------------------> |
| <-------------------------- | <--------- SearchResponse(...) |
| | |
| Update(Alice, ...) -------> | -----------------------------> |
| <-------------------------- | <--------- UpdateResponse(...) |
| | |
| Search(Fred) ----------> X | |
| Update(Bob, ...) ------> X | |
| | |
Figure 6: Third-Party Management. Valid requests are proxied by the
Transparency Log to the Manager. Rejected requests are blocked.
6. Combining Logs
There are many cases where it makes sense to operate multiple
cooperating log instances, for example:
* A service provider may decide that it's prudent to rotate its
cryptographic keys, or migrate to a new deployment mode. They can
do this by creating a new log instance with new cryptographic
keys, operating under a new deployment mode if desired, and
migrating their data from the old log to the new log while users
are able to query both.
* A service provider may choose to operate multiple logs to improve
their ability to scale or provide higher availability.
* A federated system may allow each participant in the federation to
operate their own log for their own users.
Client implementations should generally be prepared to interact with
multiple logs simultaneously. In particular, clients SHOULD
namespace any configuration or state related to a particular log,
such that information related to different logs do not conflict.
When multiple logs are used, all users in the system MUST have a
consistent policy for executing Search, Update, and Monitor queries
against the logs in a way that maintains the high-level security
guarantees of KT:
* If all logs behave honestly, then users observe a globally-
consistent view of the data associated with each key.
McMillion Expires 5 September 2024 [Page 13]
Internet-Draft Key Transparency Architecture March 2024
* If any log behaves dishonestly such that the prior guarantee is
not met (some users observe data associated with a key that others
do not), this will be detected either immediately or in a timely
manner by background monitoring.
6.1. Gradual Migration
In the case of gradually migrating from an old log to a new one, this
policy may look like:
1. Search queries should be executed against the old log first, and
then against the new log only if the most recent version of a key
in the old log is a special application-defined 'tombstone'
entry.
2. Update queries should only be executed against the new log,
adding a tombstone entry to the old log if one hasn't been
already created.
3. Both logs should be monitored as they would be if they were run
individually. Once the migration has completed and the old log
has stopped accepting changes, the old log MUST stay operational
long enough for all users to complete their monitoring of it
(keeping in mind that some users may be offline for a significant
amount of time).
Placing a tombstone entry for each key in the old log gives users a
clear indication as to which log contains the most recent version of
a key and prevents them from incorrectly accepting a stale version if
the new log rejects a search query.
6.2. Immediate Migration
In the event of a key compromise, the service provider may instead
choose to stop adding new entries to a log immediately and provide a
new log that is pre-populated with the most recent versions of all
keys. In this case, the policy may look like:
1. Search queries must be executed against the new log.
2. Update queries must be executed against the new log.
3. The final tree size and root hash of the old log should be
provided to users over a trustworthy channel. Users will use
this to do any final monitoring of the old log, and then ensure
that the most recent versions of the keys they own are properly
represented in the new log. From then on, users will monitor
only the new log.
McMillion Expires 5 September 2024 [Page 14]
Internet-Draft Key Transparency Architecture March 2024
The final tree size and root hash of the prior log must be
distributed to users in a way that guarantees all users have a
globally-consistent view. This can be done either by storing them in
a well-known key of the new log, or with the application's code
distribution mechanism.
6.3. Federation
In a federated application, many servers that are owned and operated
by different entities will cooperate to provide a single end-to-end
encrypted communication service. Each entity in a federated system
provides its own infrastructure (in particular, a transparency log)
to serve the users that rely on it. Given this, there must be a
consistent policy for directing KT requests to the correct
transparency log. Typically in such a system, the end-user identity
directly specifies which entity requests should be directed to. For
example, with an email end-user identity like alice@example.com, the
controlling entity is example.com.
A controlling entity like example.com may act as an anonymizing proxy
for its users when querying transparency logs run by other entities
(in the manner of [RFC9458]), but should not attempt to 'mirror' or
combine other transparency logs with its own.
7. Pruning
As part of the core infrastructure of an end-to-end encrypted
communication service, Transparency Logs are required to operate
seamlessly for several years. This presents a problem for general
append-only logs, as even moderate usage can cause the log to grow to
an unmanageable size. This issue is further compounded by the fact
that a substantial portion of the entries added to a log may be fake,
having been added solely for the purpose of obscuring short-term
update rates (as discussed in Section 8.1). Given this, Transparency
Logs need to be able manage their footprint by pruning data which is
no longer required by the communication service.
Broadly speaking, a Transparency Log's database will contain two
types of data:
1. Serialized user data (the values corresponding to keys in the
log), and
2. Cryptographic data, such as pre-computed portions of hash trees
or commitment openings.
McMillion Expires 5 September 2024 [Page 15]
Internet-Draft Key Transparency Architecture March 2024
The first type, serialized user data, can be pruned by removing any
entries that the service operator's access control policy would never
permit access to. For example, a service operator may only permit
clients to search for the most recent version (or n versions) of a
key. Any entries that don't meet this criteria can be deleted
without consideration to the rest of the protocol.
The second type, cryptographic data, can also be pruned, but only
after considering which parts are no longer required by the protocol
for producing proofs. For example, even though the key-value pair
inserted at a particular entry in the append-only log may have been
deleted, parts of the log entry may still be needed to produce proofs
for Search / Update / Monitor queries on other keys. The exact
mechanism for determining which data is safe to delete will depend on
the implementation.
The distinction between user data and cryptographic data provides a
valuable separation of concerns, given that the protocol document
does not provide a mechanism for a service operator to convey its
access control policy to a Transparency Log. That is: pruning user
data can be done entirely by application-defined code, while pruning
cryptographic data can be done entirely by KT-specific code as a
subsequent operation.
8. Security Guarantees
A user that correctly verifies a proof from the Transparency Log (and
does any required monitoring afterwards) receives a guarantee that
the Transparency Log operator executed the key-value lookup
correctly, and in a way that's globally consistent with what it has
shown all other users. That is, when a user searches for a key,
they're guaranteed that the result they receive represents the same
result that any other user searching for the same key would've seen.
When a user modifies a key, they're guaranteed that other users will
see the modification the next time they search for the key.
If the Transparency Log operator does not execute a key-value lookup
correctly, then either:
1. The user will detect the error immediately and reject the proof,
or
2. The user will permanently enter an invalid state.
McMillion Expires 5 September 2024 [Page 16]
Internet-Draft Key Transparency Architecture March 2024
Depending on the exact reason that the user enters an invalid state,
it will either be detected by background monitoring or the next time
that out-of-band communication is available. Importantly, this means
that users must stay online for some bounded amount of time after
entering an invalid state for it to be successfully detected.
Alternatively, instead of executing a lookup incorrectly, the
Transparency Log can attempt to prevent a user from learning about
more recent states of the log. This would allow the log to continue
executing queries correctly, but on outdated versions of data. To
prevent this, applications configure an upper bound on how stale a
query response can be without being rejected.
The exact caveats of the above guarantees depend naturally on the
security of underlying cryptographic primitives, and also the
deployment mode that the Transparency Log relies on:
* Third-Party Management and Third-Party Auditing require an
assumption that the transparency log and the third-party manager/
auditor do not collude to trick users into accepting malicious
results.
* Contact Monitoring requires an assumption that the user that owns
a key and all users that look up the key do the necessary
monitoring afterwards.
In short, assuming that the underlying cryptographic primitives used
are secure, any deployment-specific assumptions hold (such as non-
collusion), and that user devices don't go permanently offline, then
malicious behavior by the Transparency Log is always detected within
a bounded amount of time. The parameters that determine the maximum
amount of time before malicious behavior is detected are as follows:
* How stale an application allows query responses to be (ie, how
long an application is willing to go without seeing updates to the
tree).
* How frequently users execute background monitoring.
* How frequently users exercise out-of-band communication.
* For third-party auditing: the maximum amount of lag that an
auditor is allowed to have, with respect to the most recent tree
head.
McMillion Expires 5 September 2024 [Page 17]
Internet-Draft Key Transparency Architecture March 2024
8.1. Privacy Guarantees
For applications deploying KT, service operators expect to be able to
control when sensitive information is revealed. In particular, an
operator can often only reveal that a user is a member of their
service, and information about that user's account, to that user's
friends or contacts.
KT only allows users to learn whether or not a lookup key exists in
the Transparency Log if the user obtains a valid search proof for
that key. Similarly, KT only allows users to learn about the
contents of a log entry if the user obtains a valid search proof for
the exact key and version stored at that log entry.
Applications are primarily able to manage the privacy of their data
in KT by relying on these properties when they enforce access control
policies on the queries issued by users, as discussed in Section 3.
For example if two users aren't friends, an application can block
these users from searching for each other's lookup keys. This
prevents the two users from learning about each other's existence.
If the users were previously friends but no longer are, the
application can prevent the users from searching for each other's
keys and learning the contents of any subsequent account updates.
Service operators also expect to be able to control sensitive
population-level metrics about their users. These metrics include
the size of their userbase, the frequency with which new users join,
and the frequency with which existing users update their keys.
KT allows a service operator to obscure the size of its userbase by
padding the tree with fake entries. Similarly, it also allows a
service operator to obscure the rate at which changes are made by
padding real changes with fake ones, causing outsiders to observe a
baseline constant rate of changes.
8.1.1. Leakage to Third-Party
In the event that a third-party auditor or manager is used, there's
additional information leaked to the third-party that's not visible
to outsiders.
In the case of a third-party auditor, the auditor is able to learn
the total number of distinct changes to the log. It is also able to
learn the order and approximate timing with which each change was
made. However, auditors are not able to learn the plaintext of any
keys or values. This is because keys are masked with a VRF, and
values are only provided to auditors as commitments. They are also
not able to distinguish between whether a change represents a key
McMillion Expires 5 September 2024 [Page 18]
Internet-Draft Key Transparency Architecture March 2024
being created for the first time or being updated, or whether a
change represents a "real" change from an end-user or a "fake"
padding change.
In the case of a third-party manager, the manager generally learns
everything that the service operator would know. This includes the
total set of plaintext keys and values and their modification
history. It also includes traffic patterns, such as how often a
specific key is looked up.
9. Privacy Law Considerations
Consumer privacy laws often provide a 'right to erasure', meaning
that when a consumer requests that a service provider delete their
personal information, the service provider is legally obligated to do
so. This may seem to be incompatible with the description of KT in
Section 1 as an 'append-only log'. Once an entry is added to a
transparency log, it indeed can not be removed.
The important caveat here is that user data is not directly stored in
the append-only log. Instead, the log consists of privacy-preserving
cryptographic commitments. By logging commitments instead of
plaintext user data, users interacting with the log are unable to
infer anything about an entry's contents until the service provider
explicitly provides the commitment's opening. A service provider
responding to an erasure request can delete the commitment opening
and the associated data, effectively anonymizing the entry.
Other than the log, the second place where user information is stored
is in the _prefix tree_. This is a cryptographic index provided to
users to allow them to efficiently query the log, which contains
information about which lookup keys exist and where. These lookup
keys are usually serialized end-user identifiers, although it varies
by application. To minimize leakage, all lookup keys are processed
through a Verifiable Random Function, or VRF [RFC9381].
A VRF deterministically maps each lookup key to the fixed-length
pseudorandom value. The VRF can only be executed by the service
operator, who holds a private key. But critically, VRFs can still
provide a proof that an input-output pair is valid, which users
verify with a public key. When a user tries to search for or update
a key, the service operator first executes its VRF on the input
lookup key to obtain the output key that will actually be looked up
or stored in the prefix tree. The service operator then provides the
output key, along with a proof that the output key is correct, in its
response to the user.
McMillion Expires 5 September 2024 [Page 19]
Internet-Draft Key Transparency Architecture March 2024
The pseudorandom output of VRFs means that even if a user indirectly
observes that a search key exists in the prefix tree, they can't
immediately learn which user the search key identifies. The
inability of users to execute the VRF themselves also prevents
offline "password cracking" approaches, where an attacker tries all
possibilities in a low entropy space (like the set of phone numbers)
to find the input that produces a given search key.
A service provider responding to an erasure request can 'trim' the
prefix tree, by no longer storing the full VRF output for any lookup
keys corresponding to an end-user's identifiers. With only a small
amount of the VRF output left in storage, even if the transparency
log is later compromised, it would be unable to recover deleted
identifiers. If the same lookup keys were reinserted into the log at
a later time, it would appear as if they were being inserted for the
first time.
As an example, consider the information stored in a transparency log
after inserting a key K with value V. The value stored in the prefix
tree would roughly correspond to VRF(key K) = pseudorandom bytes, and
the value stored in the append-only log would roughly correspond to:
Commit(nonce: random bytes, body: version N of key K is V)
After receiving an erasure request, the transparency log deletes the
key, value, and random commitment nonce. It also trims the VRF
output to the minimum size necessary. The commitment scheme
guarantees that, without the high-entropy random nonce, the remaining
commitment reveals nothing about the key or value.
Assuming that the prefix tree is well-balanced (which is extremely
likely due to VRFs being pseudorandom), the number of VRF output bits
retained is approximately equal to the logarithm of the total number
of keys logged. This means that while the VRF's full output may be
256 bits, in a log with one million keys, only 20 output bits would
need to be retained. This would be insufficient for recovering even
a very low-entropy identifier like a phone number.
10. Implementation Guidance
Fundamentally, KT can be thought of as guaranteeing that all the
users of a service agree on the contents of a key-value database.
Using this guarantee, that all users agree on a set of keys and
values, to authenticate the relationship between end-user identities
and the end-users of a communication service takes special care.
Critically, in order to authenticate an end-user identity, it must be
both _unique_ and _user-visible_. However, what exactly constitutes a
unique and user-visible identifier varies greatly from application to
McMillion Expires 5 September 2024 [Page 20]
Internet-Draft Key Transparency Architecture March 2024
application.
Consider, for example, a communication service where users are
uniquely identified by a fixed username, but KT has been deployed
using an internal UUID as the lookup key. While the UUID might be
unique, it is not user-visible. When a user attempts to lookup a
contact by username, the service operator must translate the username
into its UUID. Since this mapping (from username to UUID) is
unauthenticated, the service operator can manipulate it to eavesdrop
on conversations by returning the UUID for an account that it
controls. From a security perspective, this is equivalent to not
using KT at all. An example of this kind of application would be
email.
However in other applications, the use of internal UUIDs in KT may be
appropriate. For example, many applications don't have this type of
fixed username and instead use their UI (underpinned internally by a
UUID) to indicate to users whether a conversation is with a new
person or someone they've previously contacted. The fact that the UI
behaves in this way makes the UUID a user-visible identifer, even if
a user may not be able to actually see it written out. An example of
this kind of application would be Slack.
A *primary end-user identity* is one that is unique, user-visible,
and unable to change. (Or equivalently, if it changes, it appears in
the application UI as a new conversation with a new user.) A primary
end-user identity should always be a lookup key in KT, with the end-
user's public keys as the associated value.
A *secondary end-user identity* is one that is unique, user-visible,
and able to change without being interpreted as a different account
due to its association with a primary identity. Examples of this
type of identity include phone numbers, or most usernames. These
identities are used solely for initial user discovery, in which
they're converted to a primary identity that's used by the
application from then on. A secondary end-user identity should be a
lookup key in KT, for the purpose of authenticating user discovery,
with the primary end-user identity as the associated value.
While likely helpful to most common applications, the distinction
between handling primary and secondary identities is not a hard-and-
fast rule. Applications must be careful to ensure they fully capture
the semantics of identity in their application with the key-value
structure they put in KT.
McMillion Expires 5 September 2024 [Page 21]
Internet-Draft Key Transparency Architecture March 2024
11. IANA Considerations
This document has no IANA actions.
12. References
12.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/rfc/rfc2119>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/rfc/rfc8174>.
12.2. Informative References
[RFC6962] Laurie, B., Langley, A., and E. Kasper, "Certificate
Transparency", RFC 6962, DOI 10.17487/RFC6962, June 2013,
<https://www.rfc-editor.org/rfc/rfc6962>.
[RFC9381] Goldberg, S., Reyzin, L., Papadopoulos, D., and J. Včelák,
"Verifiable Random Functions (VRFs)", RFC 9381,
DOI 10.17487/RFC9381, August 2023,
<https://www.rfc-editor.org/rfc/rfc9381>.
[RFC9420] Barnes, R., Beurdouche, B., Robert, R., Millican, J.,
Omara, E., and K. Cohn-Gordon, "The Messaging Layer
Security (MLS) Protocol", RFC 9420, DOI 10.17487/RFC9420,
July 2023, <https://www.rfc-editor.org/rfc/rfc9420>.
[RFC9458] Thomson, M. and C. A. Wood, "Oblivious HTTP", RFC 9458,
DOI 10.17487/RFC9458, January 2024,
<https://www.rfc-editor.org/rfc/rfc9458>.
[sealed-sender]
"Technology preview: Sealed sender for Signal", 29 October
2018, <https://signal.org/blog/sealed-sender/>.
Acknowledgments
TODO acknowledge.
Author's Address
Brendan McMillion
McMillion Expires 5 September 2024 [Page 22]
Internet-Draft Key Transparency Architecture March 2024
Email: brendanmcmillion@gmail.com
McMillion Expires 5 September 2024 [Page 23]