Internet DRAFT - draft-hallambaker-compressedcrlset
draft-hallambaker-compressedcrlset
Internet Engineering Task Force (IETF) Phillip Hallam-Baker
Internet-Draft Comodo Group Inc.
Intended Status: Standards Track Rob Stradling
Expires: April 10, 2015 Comodo CA Ltd.
October 7, 2014
Compressed CRL Sets
draft-hallambaker-compressedcrlset-00
Abstract
An efficient encoding for Certificate Revocation Lists is described
based on a Disjoint Set encoding technique. When applied to the
population of currently issued certificates enroled in the
Certificate Transparency Notary network, the data set was reduced by
97% without introduction of false positives or false negatives.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
Copyright Notice
Copyright (c) 2014 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Hallam-Baker April 10, 2015 [Page 1]
Internet-Draft Compressed CRLSets October 2014
Table of Contents
1. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1. Requirements Language . . . . . . . . . . . . . . . . . . 3
2. The Revocation Problem . . . . . . . . . . . . . . . . . . . . 3
2.1. PKIX Certificate Status Mechanisms . . . . . . . . . . . 3
2.2. Short Lived Certificates . . . . . . . . . . . . . . . . 4
2.3. CRL Sets . . . . . . . . . . . . . . . . . . . . . . . . 4
2.4. Compressed CRL Sets . . . . . . . . . . . . . . . . . . . 5
3. Efficient Encoding of Disjoint Sets . . . . . . . . . . . . . 5
3.1. Problem Statement . . . . . . . . . . . . . . . . . . . . 5
3.2. Standard Hash Function. . . . . . . . . . . . . . . . . . 6
3.3. Parameterized Hash Function. . . . . . . . . . . . . . . 6
3.4. Shortest Discrimination Prefix . . . . . . . . . . . . . 6
3.4.1. Sorted List of Hashes . . . . . . . . . . . . . . . 7
3.4.2. Difference Encoding . . . . . . . . . . . . . . . . 7
3.4.3. Further optimizations. . . . . . . . . . . . . . . . 7
4. Compressed CRL Set Encoding . . . . . . . . . . . . . . . . . 8
5. Experimental Results . . . . . . . . . . . . . . . . . . . . . 8
6. References . . . . . . . . . . . . . . . . . . . . . . . . . . 8
6.1. Normative References . . . . . . . . . . . . . . . . . . 8
6.2. Informative References . . . . . . . . . . . . . . . . . 8
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 8
Hallam-Baker April 10, 2015 [Page 2]
Internet-Draft Compressed CRLSets October 2014
1. Definitions
1.1. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
2. The Revocation Problem
Certificate revocation is a critical element of the Web PKI design. A
certificate issuer may revoke a certificate for many reasons
including:
* The certificate was issued in error or due to a CA breach.
* The certificate holder has breached some condition set by the
issuer.
* The certificate is being used for or in connection with illegal
purposes.
* The certificate holder has requested the certificate be
revoked.
* The private key corresponding to the public key in the
certificate has been compromised.
While certificates are occasionally issued in error or due to an
issuer breach, these cases are very rare. Impersonating another
company is much more difficult than stealing a private key from a
legitimate company. The ability to revoke certificates is therefore a
critical element in the WebPKI risk mitigation strategy.
At present, there are approximately TBS certificates unexpired
certificates issued in the WebPKI of which TBS (approximately 10%)
have been revoked.
2.1. PKIX Certificate Status Mechanisms
PKIX [RFC5280] describes two mechanisms for communicating certificate
status.
* Certificate Revocation Lists (CRLs) list the serial numbers of
certificates that a CA has issued that are not currently valid.
* Online Certificate Status Protocol (OCSP) describes an online
service that a relying party can query to request the current
status of a certificate.
Hallam-Baker April 10, 2015 [Page 3]
Internet-Draft Compressed CRLSets October 2014
The scaling problems inherent in the use of CRLs quickly became
apparent as the size of the WebPKI grew. The length of the CRLs
issued by the large CAs quickly grew to several Mb. While techniques
such as delta encoding and partitioning had the protential to provide
temporary relief, the scaling issue was only mitigated, not solved.
OCSP [RFC6960] provides a scalable solution but in its original
conception for TLS introduces a new party to every communication.
Each time a client attempts to establish a TLS connection it looks
for an unexpired OCSP token to establish the validity of the
certificate presented. If no unexpired cached token is available, a
fresh one is obtained from the OCSP distribution. This introduces a
risk of traffic analysis by either the OCSP service provider itself
or any party that can view the traffic to the OCSP service.
OCSP Stapling [RFC6066] was introduced to the TLS protocol as a
mechanism to avoid the need to obtain OCSP tokens from a third party
by allowing them to be passed in band in the TLS conversation.
2.2. Short Lived Certificates
While OCSP stapling removes the privacy, reliability and performance
concerns of the original OCSP design, the certificate issuer is still
required to issue the OCSP token and the server using the certificate
is still required to fetch it. Since issuing an OCSP token on a daily
basis is tantamount to issuing a new certificate, why not reissue the
certificate on a daily basis and do away with the OCSP token
entirely?
In principle, use of short lived certificates is simpler for all the
parties involved: CAs, subjects and relying parties. But before short
lived certificates can be used in practice, new infrastructure is
required to support their use. In particular client software has to
be rewritten so that it recognizes a certificate with a short
validity interval (e.g. 36 hours) as carrying an implicit assertion
of validity and a mechanism by which the server obtains fresh
certificates must be specified and implemented. For work on the
latter problem, see [I-D.hallambaker-omnipublish].
2.3. CRL Sets
CRL Sets are curated CRLs, typically issued by an application
provider that list the revoked certificates regarded to pose the
greatest concern of abuse.
Use of CRL Sets has the advantage that they may allow an application
provider to react to a security incident more quickly than the
issuing CA. But they only provide revocation information for a
limited number of certificates.
Hallam-Baker April 10, 2015 [Page 4]
Internet-Draft Compressed CRLSets October 2014
As with traditional CRLs, various techniques such as delta encoding
may be used to limit the size of individual CRLs but these do not
reduce the total volume of data that must be exchanged.
2.4. Compressed CRL Sets
Compressed CRL Sets are an alternative encoding for CRLs that permit
very much higher encoding density than traditional CRLs. Rather than
encoding the set of revoked certificates, Compressed CRLs encode the
difference between the set of valid unexpired certificates and the
set of invalid unexpired certificates. This allows the average number
of bits required to encode the set to be reduced from 96 bits per
certificate (using the certificate serial number) to approximately 6.
While Compressed CRL Sets currently offer a practical answer to the
problem of revocation in the WebPKI, short lived certificates
actually scale better.
Although Compressed CRL Sets provide much better scaling properties
than traditional CRLs, the size of the CRL Set still increases
linearly with the product of the number of revoked certificates and
the log of the revocation ratio. Thus while use of Compressed CRL
Sets will remain practical as long as the growth in the number of
WebPKI certificates issued grows at a slower rate than additional
communications bandwidth becomes available, this cannot be
guaranteed.
Short lived certificates in contrast do not require any additional
revocation data and may be regarded as offering perfect scaling. The
chief drawback to relying on short lived certificates is that the
minimum practical validity interval is 48 hours which is also the
time period in which the vast majority of the harm caused in a
criminal attack occurs. Also deployment of short lived certificates
relies on new certificate lifecycle management infrastructure that is
not yet developed.
Use of Compressed CRLs in combination with Short Lived certificates
offers advantages over both. In the short term Compressed CRLs
provide a mechanism for reporting status on the population of long
expiry certificates until the infrastructure is deployed to enable
use of the short lived approach. In the long term, Compressed CRLs
provide an efficient mechanism to narrow the residual vulnerability
window from days to minutes.
3. Efficient Encoding of Disjoint Sets
CRL Sets are compressed using a selected variant of the approach
described in [[HBS2014 - To be specified]].
Hallam-Baker April 10, 2015 [Page 5]
Internet-Draft Compressed CRLSets October 2014
3.1. Problem Statement
Given two disjoint sets of data entries A and B in which |A| >= |B|,
provide a boolean function f (x) that is efficient in both memory and
time such that:
* f(x) is true if x is a member of A
* f(x) is false if x is a member of B
* f(x) can return true or false otherwise
For ease of comparison we will consider an example in which set A has
10 million members and set B has 1 million. This corresponds to a
revocation use case in which there are 11 million unexpired
certificates and 10% have been revoked.
Note that while it is in theory possible for the revocation ratio to
exceed 50% and thus for |A| < |B|, this case is easily handled with a
binary flag to indicate that Set A represents the revoked
certificates and B those that are valid.
3.2. Standard Hash Function.
One approach to this problem is to first reduce the data entries
using a hash function and compile a list of the hash values
corresponding to the entries of set B. This has the advantage of
simplicity but the corresponding data set is very large. For the
example case, the implementation of f(x) requires 32MB of data if a
256 bit hash is used.
3.3. Parameterized Hash Function.
While using a 256 bit hash leaves an infintesimal probability of
collisions, the same is true of 128 bit or even shorter hashes. If a
parameterized hash function (e.g. MAC-SHA1) is used we can repeat the
hash construction process multiple times with different parameters
and check to see the shortest truncation that allows the sets to be
distinguished. This allows the example case to be addressed using
only 6MB of data.
3.4. Shortest Discrimination Prefix
The parameterized hash function encodes enough information to
distinguish any member of Set A from any member of set B, but this is
more information than is necessary. It is sufficient to distinguish a
member of set B from the two members of set A that are adjacent to
it.
Hallam-Baker April 10, 2015 [Page 6]
Internet-Draft Compressed CRLSets October 2014
We begin by calculating a hash value H(x) of each member of A and B
using a hash function that returns a unique value for each entry to
create the corresponding hash sets AH, BH.
The hash function used for this purpose does not need to meet any
special cryptographic properties such as first or second pre-image
resistance.
Let P(x,y) be a function of two bit strings, x, y such that P(x,y) is
true if and only if y is a prefix of x. That if y is n bits long, the
first n bits of x and y are identical.
Let M(b, AH) be a function that returns the shortest prefix of b that
is not a prefix of any member of AH.
We apply the function M to each member of BH and remove duplicates to
create the set BP.
The function f(A,B) can now be implemented as follows: f(x) = false
if there is a member p of BP such that p is a prefix of the hash
value of x (i/e/ P(H(x),p) is true)f(x) = true otherwise
Note that the value of f(x) returned if x is not a member of A or B
is undefined.
A space and time efficient implementation of the function f(x) is
arrived at by an efficient encoding of the set BP.
3.4.1. Sorted List of Hashes
The set BP may be encoded as a simple list.
Sorting the list permits efficient access using standard techniques
applicable to searching a hash table.
For the example case, the average shortest prefix is 24?? bits giving
a total data set size of 3Mb.
3.4.2. Difference Encoding
In a sorted list of hashes, adjacent hashes will typically only vary
in the last few bits. Since it is only the difference between the
hashes that contains information, we can encode this information
without loss of information overall.
For the example case, the average number of bits required for a
difference encoding is 6 bits and the total data volume required is
750KB.
Hallam-Baker April 10, 2015 [Page 7]
Internet-Draft Compressed CRLSets October 2014
3.4.3. Further optimizations.
Further optimizations have been developed permitting an additional
reduction in the data volume by half over the difference technique.
4. Compressed CRL Set Encoding
TBS specify an encoding of CRL data based on the above.
Following the general trend for JSON and the need for compactness,
the use of JSON-B is ideal.
5. Experimental Results
TBS here extract the lab data from the prototype implementations
using the encoding described above.
6. References
6.1. Normative References
[I-D.hallambaker-omnipublish] Hallam-Baker, P, "OmniBroker
Publication Protocol", Internet-Draft draft-hallambaker-
omnipublish-00, 22 May 2014.
[RFC6960] Santesson, S.,Myers, M.,Ankney, R.,Malpani, A.,Galperin,
S.,Adams, C., "X.509 Internet Public Key Infrastructure
Online Certificate Status Protocol - OCSP", RFC 6960, June
2013.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
6.2. Informative References
[RFC6066] Eastlake, D., "Transport Layer Security (TLS) Extensions:
Extension Definitions", RFC 6066, January 2011.
[RFC5280] Cooper, D.,Santesson, S.,Farrell, S.,Boeyen, S.,Housley,
R.,Polk, W., "Internet X.509 Public Key Infrastructure
Certificate and Certificate Revocation List (CRL)
Profile", RFC 5280, May 2008.
Authors' Addresses
Phillip Hallam-Baker
Comodo Group Inc.
philliph@comodo.com
Hallam-Baker April 10, 2015 [Page 8]
Internet-Draft Compressed CRLSets October 2014
Rob Stradling
Comodo CA Ltd.
rob.stradling@comodo.com
Hallam-Baker April 10, 2015 [Page 9]