Network Working Group | T. Bruijnzeels |
Internet-Draft | O. Muravskiy |
Intended status: Informational | RIPE NCC |
Expires: October 13, 2013 | B. Weber |
Cobenian | |
April 11, 2013 |
RPKI Repository Analysis and Requirements
draft-tbruijnzeels-sidr-repo-analysis-00
The current RPKI Resource Certificate Repository Structure (RFC6480 & RFC6481) uses rsync as both a delta and transfer protocol. Concerns have been raised about the scalability of this repository and the reliance on rsync. This document provides an analysis of these concerns and formulates requirements for future work.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on October 13, 2013.
Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119] .
The current RPKI Resource Certificate Repository Structure (RFC6480 & RFC6481) uses rsync as both a delta and transfer protocol and recommends that repositories be set up in a hierarchical way such that relying parties (validation tools) can fetch all updates in a single repository efficiently while performing top-down validation.
This structure has its benefits. In particular it has allowed for early deployment of RPKI without the need to re-invent a delta protocol, and this has allowed early adopters of RPKI to build up operational experience more quickly. The delta protocol also has benefits for relying party tools, allowing them to quickly retrieve what’s new in a repository limiting fetch time and bandwidth usage.
Having said this, operational experience, as well as lab testing, have shown that there are concerns with regards to the current infrastructure that justify that the WG thinks about improvements in this space.
Rsync is a very efficient tool when used 1:1 between a client and server. The problem is that in a globally deployed RPKI we can expect in the order of 40k clients, roughly corresponding to the number of ASNs, to connect regularly to a repository server.
When the rsync built-in delta protocol is used (recursive fetching), the server is computationally involved in calculating the delta for each connected client. Performance measurements in a lab have shown that the maximum number of clients that can be processed per second (throughput), and the maximum number of concurrent clients are both linearly dependent on the repository size. The throughput is limited by server CPU. Concurrency is limited by server memory.
As a result a sufficiently large repository has to invest heavily in running multiple rsyncd instances to cope with the expected regular load of a large number of clients, and to counter the risks of DDoS attacks.
The retrieval of signed objects is described in RFC6480 (section 6). There are no formal limits imposed by this informational RFC on the update frequency, but to prevent the overloading of repository servers as described above, the typical update interval of current tools is between 1-24 hours.
In previous discussions in the WG it was suggested that human scale propagation times (i.e. up to 24 hours) are good enough for the problem that we are trying to solve. There are however good reasons why much faster retrieval of newly signed objects is desirable.
In this scenario, a ROA exists for a prefix and ASN, but the prefix needs to be announced from another ASN.
In many cases the RP can foresee this and create an appropriate ROA well in advance, but there are also failure cases possible where this is not foreseeable.
The CA operator made a mistake when it created a ROA. The ROA causes announcements that should be considered VALID to appear as INVALID, or vice versa. The CA would like to take appropriate action and revoke the ROA or issue additional ROAs. However, RPs that have received the mistaken ROA may not see these updates for some time.
The BGPSec protocol is still being discussed. However there are indications that it would be desirable if propagation times for new router certificates and CRLs could be reduced. In particular in the context of:
There is only one known implementation of rsync and the standard is not described by any RFC. The implementation is non-modular, making it impossible to use the code as a library even when coding in the same language.
As a result all current implementations of relying party tooling have had no option but to use rsync as a pre-installed external process.
This has several major draw backs for the quality of implementations:
An 'inconsistent' set of objects is a set of retrieved objects for a CA Certificate where there differences between the objects retrieved and the objects mentioned on the corresponding manifest. If any objects are missing, or if additional objects not mentioned on the manifest are found, or if any of the objects does not match the sha256 hash mentioned on the manifest, then the set as a whole is considered inconsistent. RFC6486 has text advising RPs on possible ways to treat each of these cases. However, there is a large degree of uncertainty as to how different RP tools, and operators, will deal with these corner cases because most decisions are left to local policy. This can lead to inconsistent and possibly surprising differences in the validation of RPKI data.
Missing ROA objects can be particularly problematic because other ROAs, that can be found and validated, may invalidate announcements that would have been marked as valid by these missing ROAS. Additional ROA objects are confusing because to the RP it's not clear whether this ROA was intentional and the MFT is out of date, or not.
The use of rsync as a delta protocol is problematic in this context, because rsync is non-transactional. As a result an RP may get partially updated CA repository objects if it happens to fetch while the objects on disk are being updated. This is confusing to the RP who can not tell the difference between this and a persistent error on the publisher side, or an attack involving partial replay or withholding of objects.
All of this adds up to a repository infrastructure and corresponding validation rules that leave a high degree of uncertainty in case of corner cases. The authors believe that it would be better to (1) improve the standards so that these corner cases are less likely to occur, and (2) formulate much stricter validation rules so that the uncertainty with regards to how RPs may deal with corner cases is further reduced.
In the current design only publication point per CA is envisioned.
Even though such a publication point may employ various techniques to achieve high-availability, this leaves concerns with regards to:
The notion that child CAs can publish in a sub-directory of their parent CA publication point has been suggested as mitigation strategy for scalability of fetching RPKI data using rsync.
There are a number of reasons why this hierarchical model may not be advisable or even possible:
A future delta protocol should be transport agnostic, allowing agility in future transport protocols between RPs and repositories, and sharing of deltas between RPs.
A future delta protocol should enable CAs to publish new objects as a set so that errors in evaluating route origin validity as a result of incomplete information may be avoided as much as possible.
The scalability of a future delta protocol should not depend on a hierarchical repository lay-out. This is particularly important if one considers the possibility of third party publication servers and/or the possible use of mirror repositories. In both cases the CAs for which objects are published can most likely not be considered children of the publication server, or each other.
The numbers cited below reflect our best current estimates based on relevant statistics currently at our disposal. They are intended to provide context for load testing proposed solutions.
We are of course open any suggestions and real world statistics that can improve these estimates.
For the purpose of scalability it would be prudent to assume that mirroring should be supported in the RPKI to the point where one publication server can, in principle, mirror the complete global RPKI. In reality this may not happen to this extent, but any design that can support this should be adequate to support smaller numbers.
Assuming that only current RIR member organisations that are holding IPv4, IPv6 and/or ASN resources would act as Certificate Authorities (CA), the expected number of CAs in the global RPKI is expected to be around 50.000. However, if one also considers holders of Provider Independent (PI) resources this number may be larger. For reference: the RIPE NCC has roughly 25.000 PI prefixes registered, vs just roughly 9000 regular members. If these numbers are similar for all regions, and we assume the 'worst case' where all PI holders have their own CAs, then we are looking at number that is roughly four times larger: i.e. 200.000 CAs.
Note that each organisation will most likely find all their resources on one certificate, however in case an organisation holds resources from multiple parent sources more than certificate may be needed. For the moment we will assume that the number of CA certificates per organisation will be close to 1.
For each CA certificate 4 objects will be published in the global RPKI: the CA certificate itself (by the parent CA), one manifest, one CRL and one ghostbuster record.
The number of ROAs in the global RPKI does not depend on the number of CAs. A small organisation may have only 1 ROA, while a large organisation will need many. Instead it is expected that the number of ROAs is related to the number of intended announcements that are seen in the global BGP. The current routing table has roughly 500.000 such announcements, but the size of the table has been growing steadily.
It should be noted that ROAs can be used to authorise more than one announcement, but there are restrictions:
Statistics for the RIPE region indicate that an aggregation factor of 3 announcements per ROA is reasonable. This would put the expected number of ROAs in the order of 200.000.
The number of router certificates depends on the number of keys that will be used by BGPSec speaking routers.
At a specific ASN, different physical BGPSec speaking routers MAY use the same key, and therefore may require only one certificate for that key. On the other hand to support BGPSec roll-overs it may be advisable to publish not one, but two keys at the same time. Plus some operators may choose to use unique keys per physical router.
All in all it is not entirely clear to the authors how many certified keys may be, but on list numbers as high as 2.000.000 have been mentioned.
Using the number of objects cited in the previous sections, we can describe the total number of objects in the RPKI with the formula:
Ototal = #CAorganisations * #Avg_CAcert_per_organisation * 4 + #ROAs + #Router Certs
Ototal = 200k * ~1 * 4 + 200k + 2M = 3M
Based on the current repositories deployed by the RIRs we find these average sizes for different object types:
type | size (bytes) | size in model |
---|---|---|
CA certificate | 1416 | 1.5 kB |
Manifest | 1951 | 2 kB |
CRL | 692 | 0.7 kB |
ROA | 1846 | 2 kB |
Ghostbuster record | unknown, expect similar to ROA | 1.5 kB |
Router certificate | unknown, expect similar to CA certificate | 2 kB |
We use rounded off decimal numbers for our calculations for simplicity, and because our predictions are intended to give an idea of the expected order of magnitude of the repository size only.
Using these numbers we can predict a global repository size with the formula:
Stotal = #CAorganisations * ( CA_certificate_size + MFT_size + CRL_size + GB_size) + #ROAs * ROA_size + #Router Certs * Router_Cert_size
Stotal = 200k * ( 1.5k + 2k + 0.7k + 2k) + 200k * 2k + 2M * 2k
Stotal = 4.6G
The daily churn in the RPKI, i.e. the amount of new objects we're expected to see, per 24 hours is another important factor to consider in the context of scalability
The current RIR managed RPKI services typically update MFT and CRLs for each CA every 8 hours, accounting for a churn of 200k * 2 * (24 hours / 8 hours) = 1.2 M objects per 24 hour (833 per minute). The volume of this churn is expected to amount to 200k * (2kB + 0.7kB) * (24 hours / 8 hours) = 1.6 GB per 24 hour = 1.1 MB per minute.
The expected churn in ROAs and router certificates are expected to depend on:
We have no clear idea about these numbers at this time, but we expect this number to be relatively small compared to the churn rate caused by republishing MFTs and CRLs.
It seems plausible that in a full deployment scenario each ASN will run at least two RP tool instance (one back-up).
There are currently around 40.000 ASNs in the global BGP, so this would suggest a number of 80.000 distinct client RPs accessing repositories.
Others have suggested that RPs can use (file) sharing techniques to reduce their dependency on central repository servers. If this approach would be deployed this could reduce the number of RPs that central repository servers have to serve. We expect though that this sharing will be mostly used to ensure redundancy with an ASN, and much less between ASNs.
The repository servers have little control of the fetch frequencies used by Relying Parties. As mentioned in section 3.2 Relying Parties have an interest in fetching new information much more frequently than they do currently. It's not clear right now what frequency will be most common in a full deployment scenario. We expect though that the desired update frequency will be in the order of every ten minutes. This seems to be in-line with operator time scale changes that would have to be made in BGP and the RPKI.
It should be noted that repository servers have no control over relying parties. RPs are responsible for their own infrastructure and keeping up to date. Well functioning RPs will try to stay up to date at all times, while avoiding to overload the server(s). Badly configured RPs may however fail to retrieve updates, or they may insist on checking for updates at a well-above average rate and cause additional server load.
Having said that we believe that we can stipulate some ball park parameters that large repositories should be prepared to deal with based on the estimates mentioned in the previous section.
The numbers below all assume averages of well behaved RPs. In reality repositories will have to deal with peak loads that may result from a number of different factors, like:
Defining these factors in formulas is however fairly complicated. The authors believe that it is a more pragmatic and useful strategy to take the naive estimates defined below as a starting point and require that new protocols are load tested to a degree where we can be confident that new implementations will be able to meet the normal load requirements easily, as well as peak load conditions that may exceed normal load by factors of 5-10.
We define "throughput" as the total number of RP connections that the repository can server per minute. Note that this does not imply anything about the time each connection takes.
By definition the server has to be able to process a number of connections per time unit that is at least equal, and preferably comfortably bigger, than the number of new connections that are expected over that time unit. Failure to meet this number will inevitably lead to a build-up of client connections to the point where the server will no longer be able to accept new connections
Based on 80k RP tools fetching updates every 10 minutes we may assume that a throughput number of 8k connections / min. is the bare minimum that needs to be supported.
Another interesting load factor is given by the number of expected concurrent connections. A naive formula for this number is given by:
#Concurrent connections = #New Connections (conn / minute) * # Avg processing time (min / conn)
E.g. if it would take 30 seconds to process the average connection, we would need to support:
8k conn/min * 0.5 min/conn = 4k concurrent RPs
Based un RPs getting deltas alone we expect that the volume of data that the repository server has to serve per minute can be determined by the formula:
Vol_min = Churn_vol_min * #RPs
Vol_min = 1.1 M/min * 80k = 88 GB/min
In a full deployment scenario a large number of RPs are expected to approach a single repository server regularly. Estimates of how large this number of RPs is, and how regularly they will fetch updates, vary. However as a starting point one might expect one RP tool for each ASN, currently 40k, to fetch updates every five minutes. Whatever the real numbers may be it should be noted that the repository server has very little control over these numbers.
Because of this it makes sense to look into a delta protocol where the number of clients and frequency of fetching, has the least possible effect on the central repository server. E.g by enabling pre-computing of updates and offloading to caches or CDNs.
Although this may result in a protocol that causes the Relying Party to do more work, the trade-off of offloading CPU cycles to a large number of frequently polling RPs as opposed to spending CPU on the server is expected to scale much better.
Higher update frequencies and shorter propagation times are desired. On the other hand it would also be good if unnecessary synchronisation attempts were prevented to reduce load. For this reason a delta protocol would do well to support update notifications to RPs. Both push and poll based strategies may be used for this purpose.
A large part of the churn in the RPKI is caused by the regular republishing of Manifests and CRLs. If this frequency could be reduced without compromising security, such as an RP's sensitivity to replay attacks, then the total load on repository servers could be reduced significantly.
It should be noted that although RPs retrieve objects from untrusted sources, these objects are cryptographically validated. In other words a publisher, or monkey-in-the-middle, can not mislead the RP and generate valid objects without having access to the associated private keys. Having said that, this still leaves RPs vulnerable to attacks where information is withheld or replayed. RPs can notice such attacks if they rely on manifests to inform them about:
If deltas were signed it would be possible for RPs to detect attacks or transport errors sooner. However, signing deltas comes at a cost of complexity. In particular it will be difficult to communicate keys used for signing in a secure and dynamic (allow rolls) way. The benefit seems too limited to warrant this.
TBD
TBD
[RFC2119] | Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. |