Interdomain Working Group                                   S. Litkowski
Internet-Draft                                   Orange Business Service
Intended status: Standards Track                                K. Patel
Expires: January 4, 2015                                   Cisco Systems
                                                                 J. Haas
                                                        Juniper Networks
                                                            July 3, 2014
Timestamp support for BGP paths
draft-litkowski-idr-bgp-timestamp-00
BGP is increasingly used to transport routing information for critical services. Some BGP updates must be received as quickly as possible: for example, in a layer 3 VPN scenario where a dual-attached site loses its primary connection, the BGP withdraw message should be propagated as fast as possible to restore the service. The same criticality exists for other address families such as multicast VPNs, where "join" messages should also be propagated very quickly.
Service provider experience shows that BGP path propagation time may vary depending on network conditions (especially the load of the BGP Speakers on the path), and that excessive propagation times affect customer service.
It is important for service providers to keep track of BGP update propagation time in order to monitor the quality of service delivered to customers. It is also important to be able to identify the BGP Speakers that are slowing down the propagation.
This document presents a solution to transport timestamps of a BGP path.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 4, 2015.
Copyright (c) 2014 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
   CE3----PE3                  PE4 --- CE4 (Source)
             \                /
              RR3          RR4
                 \        /
                    RR5
                   /   \
                RR1     RR2
               / |        \
              /  |         \
   CE1----PE1   PE5        PE2 --- CE2
                 |
                CE5

                      Figure 1
Figure 1 describes a typical hierarchical RR design where PEs are meshed to local RRs and local RRs are meshed to more central RRs. We consider a single multicast VPN between all CEs. CE4 is the source; all others may be receivers. The BGP control plane also supports other BGP services such as L3VPN.
We consider an event in the L3VPN service that temporarily overloads RR1 (for example, RR1 is processing massive updates due to a router failure, or formatting updates for a route refresh). In the same timeframe, CE1 wants to join the multicast flow from CE4. PE1 propagates the C-multicast route to RR1, but RR1 fails to propagate the route to RR5 because it is busy processing L3VPN updates. Only when RR1 finishes the L3VPN job does it send the C-multicast route to RR5, after which the update is imported by PE4. The long time needed to join the flow may cause CE1 to miss part of the multicast flow.
BGP implementations all differ in their internal processing within an address family or between address families. The issue described above is given only as an example, and this document does not presume that all implementations suffer from this exact issue. Whatever the implementation, however, there will always be cases where BGP update processing could be delayed.
Service providers currently lack an efficient solution to keep track of BGP update propagation time, as well as a solution to identify the BGP Speakers causing issues.
BMP (BGP Monitoring Protocol) may be a solution but has several drawbacks (see Section 6).
Our proposal is based on the path vector property of BGP. Each hop within the path would add a tuple (ID, timestamp) to the BGP path. An ordered list of timestamps would thus be built along the path.
   BGP Update       BGP Update       BGP Update       BGP Update
   10.0.0.0/8       10.0.0.0/8       10.0.0.0/8       10.0.0.0/8
   Timestamp:       Timestamp:       Timestamp:       Timestamp:
    R1:T1            R1:T1            R1:T1            R1:T1
                     R2:T2            R2:T2            R2:T2
                                      R3:T3            R3:T3
                                                       R4:T4

R1 ------------> R2 ------------> R3 ------------> R4 ------------> R5
Using this mechanism, we can easily identify if a hop within a path is slowing down the propagation.
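As an illustration only, the following Python sketch shows how a monitoring tool might derive per-hop delays from such an ordered list. The tuple representation, the function names, and the millisecond values are assumptions made for this example and are not defined by this document.

```python
from typing import List, Tuple

# Hypothetical representation: one (speaker_id, timestamp_ms) tuple per hop,
# in the order in which the entries were added along the path.
TimestampVector = List[Tuple[str, int]]


def hop_delays(vector: TimestampVector) -> List[Tuple[str, str, int]]:
    """Return (from_id, to_id, delay_ms) for each pair of consecutive entries."""
    return [
        (prev_id, next_id, next_ts - prev_ts)
        for (prev_id, prev_ts), (next_id, next_ts) in zip(vector, vector[1:])
    ]


def slowest_hop(vector: TimestampVector) -> Tuple[str, str, int]:
    """Return the pair of consecutive entries separated by the largest delay."""
    delays = hop_delays(vector)
    if not delays:
        raise ValueError("at least two timestamp entries are required")
    return max(delays, key=lambda d: d[2])


# Invented values: R1..R4 stamped the path at these times (milliseconds).
vector = [("R1", 0), ("R2", 20), ("R3", 1520), ("R4", 1545)]
print(slowest_hop(vector))  # ('R2', 'R3', 1500)
```

In this invented example, most of the propagation time elapses between the entry recorded by R2 and the entry recorded by R3, which is where an operator would start investigating.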
We propose to use a new BGP attribute, the BGP timestamp attribute, to encode timestamp information.
The BGP timestamp (BGP-TS) Attribute is an optional transitive BGP Path Attribute. The attribute type code is TBD.
The value field of the BGP timestamp attribute is defined as follows:
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     OType     |             Originator (variable)             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Timestamp #1 (variable)                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Timestamp #2 (variable)                    |
   ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Timestamp #n (variable)                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Receive Timestamp #x                      |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |A|P|S|T| Rsvd  |   SyncType    |        AS#x (variable)        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Peer#x (variable)                       |
   |                                                               |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
A BGP Speaker supporting BGP-TS can decide to timestamp only some specific BGP paths. An inspection list (filter) may be configured by the user to apply timestamping to a specific set of BGP prefixes or paths. By default, we suggest that a BGP Speaker supporting BGP-TS SHOULD NOT timestamp any BGP paths.
When a BGP Speaker supporting BGP-TS originates a new path in BGP that matches the inspection list, it MUST add the BGP-TS attribute to the BGP path and MUST set the receive timestamp field to the time the path was originated in BGP. If the BGP Speaker is synchronized to an external system when originating the route, the S-bit MUST be set in the attribute and the SyncType MUST be set to the current stratum.
When a BGP Speaker supporting BGP-TS receives a BGP path that matches the inspection list and does not contain a BGP-TS attribute, it MUST add a BGP-TS attribute containing:
When a BGP Speaker supporting BGP-TS receives a BGP path that matches the inspection list and contains a BGP-TS attribute, it MUST append its own timestamp entry to the existing attribute. If the BGP Speaker is synchronized to an external system when receiving the route, the S-bit MUST be set in the attribute and the SyncType MUST be set to the current stratum.
When a BGP Speaker supporting BGP-TS receives a BGP path that does not match the inspection list and contains a BGP-TS attribute, it MUST NOT change the existing attribute.
When a BGP Speaker not supporting BGP-TS receives a BGP path that contains a BGP-TS attribute, it MUST follow the standard BGP procedures described in [RFC4271].
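The rules above for a speaker that supports BGP-TS can be sketched as follows. This is a simplified model rather than a wire-format implementation: the class names, the field names, and the inspection-list callback are hypothetical, and the attribute is reduced to a plain list of entries.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class TimestampEntry:
    speaker_id: str            # identifier of the speaker adding the entry
    receive_timestamp: float   # origination or reception time
    synchronized: bool         # models the S-bit
    sync_type: Optional[int]   # models SyncType (stratum) when synchronized


@dataclass
class Path:
    prefix: str
    bgp_ts: Optional[List[TimestampEntry]] = None  # None: no BGP-TS attribute


def stamp(
    path: Path,
    speaker_id: str,
    now: float,
    matches_inspection_list: Callable[[Path], bool],
    stratum: Optional[int],
) -> Path:
    """Apply the timestamping rules on origination or reception.

    `stratum` is None when the speaker is not synchronized (assumption of
    this sketch); otherwise the S-bit is set and SyncType carries the stratum.
    """
    if not matches_inspection_list(path):
        return path  # any existing BGP-TS attribute is left unchanged
    entry = TimestampEntry(speaker_id, now, stratum is not None, stratum)
    if path.bgp_ts is None:
        path.bgp_ts = [entry]      # no attribute yet: add one
    else:
        path.bgp_ts.append(entry)  # attribute present: append own entry
    return path
```

A speaker whose inspection list does not match the path returns it untouched, which mirrors the MUST NOT rule above; the default of not timestamping anything corresponds to a callback that always returns False.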
For manageability and security purposes, the authors suggest that the BGP timestamp attribute SHOULD NOT be sent to a peer unless this has been explicitly configured. This would prevent, for example, timestamps and internal address information from being propagated to external peers. See Section 4.5 for more information.
If a BGP path containing a BGP-TS attribute must be sent to a peer that is not configured with the BGP timestamp option, the BGP-TS attribute should be dropped when the update message is sent to that peer.
 BGP update                                                         CE2 add timestamp
 10.0.0.0/8                                                         when receiving path
 TS:
 CE1:T1

CE1--------->R1 ------------> R2 ------------> R3 ------------> R4 ------------> CE2
             |                |                |                |
             |                |                |                |
                    AS1                               AS2

                                    Figure 2
In the figure above, we consider that a customer wants to monitor BGP update propagation time between its two sites.
If the AS1 and AS2 BGP Speakers do not support BGP-TS, the attribute will be transported transparently across the ASes without any processing. CE2 will therefore receive the BGP path with only a single timestamp entry, from CE1.
If the AS1 and AS2 BGP Speakers do support BGP-TS, three different options are offered: drop, summarize, propagate.
If AS1 and/or AS2 BGP Speakers support BGP-TS, they may not want to expose their timestamps or internal BGP topology to other ASes. If a service provider does not want to propagate timestamp information to external peers, it can decide not to activate the "timestamp" option in the peer configuration, as explained in Section 4.4.
 BGP update      BGP update       BGP update       BGP update       BGP update
 10.0.0.0/8      10.0.0.0/8       10.0.0.0/8       10.0.0.0/8       10.0.0.0/8
 TS:             TS:
 CE1:T1          CE1:T1

CE1--------->R1 ------------> R2 ------------> R3 ------------> R4 ------------> CE2
             |                |     no TS      |                |
             |                |                |                |
                    AS1                               AS2

                                    Figure 3
If AS1 and/or AS2 BGP Speakers support BGP-TS, they may want to offer a timestamp service to their customers while hiding their internal topology. To achieve this behavior, AS1/AS2 can activate a timestamp summary option on the external peer.
 BGP update      BGP update       BGP update       BGP update       BGP update
 10.0.0.0/8      10.0.0.0/8       10.0.0.0/8       10.0.0.0/8       10.0.0.0/8
 TS:             TS:              TS:              TS:              TS:
 CE1:T1          CE1:T1           CE1:T1           CE1:T1           CE1:T1
                 R1:T2            AS1:T3           AS1:T3           AS1:T3
                                                   R3:T4            AS2:T5

CE1--------->R1 ------------> R2 ------------> R3 ------------> R4 ------------> CE2
             |                |   TS summary   |                |   TS summary
             |                |                |                |
                    AS1                               AS2

                                    Figure 4
When the summary option is used, the BGP-TS attribute is modified as follows when the route is exported:
If AS1 and/or AS2 BGP Speakers support BGP-TS, they may want to offer a timestamp service with a full view to their customers. This behavior is the default intra-AS behavior.
 BGP update      BGP update       BGP update       BGP update       BGP update
 10.0.0.0/8      10.0.0.0/8       10.0.0.0/8       10.0.0.0/8       10.0.0.0/8
 TS:             TS:              TS:              TS:              TS:
 CE1:T1          CE1:T1           CE1:T1           CE1:T1           CE1:T1
                 R1:T2            R1:T2            R1:T2            R1:T2
                                  R2:T3            R2:T3            R2:T3
                                                   R3:T4            R3:T4
                                                                    R4:T5

CE1--------->R1 ------------> R2 ------------> R3 ------------> R4 ------------> CE2
             |                |                |                |
             |                |                |                |
                    AS1                               AS2

                                    Figure 5
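The three export options (drop, summarize, propagate) can be illustrated with the hedged Python sketch below. The summarization rule used here is an assumption inferred from Figure 4 only: entries added by speakers of the local AS are replaced by a single entry that carries the AS identifier and the timestamp of the last local entry. All names are hypothetical.

```python
from enum import Enum
from typing import List, Optional, Set, Tuple

Entry = Tuple[str, str]  # hypothetical: (identifier, timestamp label)


class ExportMode(Enum):
    DROP = "drop"
    SUMMARIZE = "summarize"
    PROPAGATE = "propagate"


def export_timestamps(
    entries: List[Entry],
    mode: ExportMode,
    local_speakers: Set[str],
    local_as: str,
) -> Optional[List[Entry]]:
    """Return the entries sent to an external peer, or None if BGP-TS is dropped."""
    if mode is ExportMode.DROP:
        return None  # attribute removed from the update
    if mode is ExportMode.PROPAGATE:
        return list(entries)  # full view, same as intra-AS behavior
    # SUMMARIZE (assumed rule): hide local speakers behind one AS-level entry.
    # This sketch assumes local entries are contiguous at the end of the list.
    kept = [e for e in entries if e[0] not in local_speakers]
    local = [e for e in entries if e[0] in local_speakers]
    if local:
        kept.append((local_as, local[-1][1]))
    return kept


# Entries held by R2 before exporting to R3, using the labels of the figures.
at_r2 = [("CE1", "T1"), ("R1", "T2"), ("R2", "T3")]
print(export_timestamps(at_r2, ExportMode.SUMMARIZE, {"R1", "R2"}, "AS1"))
# [('CE1', 'T1'), ('AS1', 'T3')]
```

With these assumptions, applying the same function at R4 with local speakers R3 and R4 yields CE1:T1, AS1:T3, AS2:T5, which matches the last update shown in Figure 4.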
When receiving a BGP Update message containing a malformed BGP-TS attribute, an "attribute-discard" action MUST be applied as defined in .
              ---------           ---------
             /         \         /         \
RTR_SRC ---- |   AS1   | ------- |   AS2   | ---- RTR_DST1
             \         /         \         /
              ---------           ---------
                  |                   |
                  |                   |
              ---------           ---------
             /         \         /         \
RTR_DST2 --- |   AS4   |         |   AS3   | ---- RTR_DST3
             \         /         \         /
              ---------           ---------

                        Figure 6
                     Single AS
    -------------------------------------------
   /                                           \
  |          RR1 ---------- RR2                 |
  |         /   \              \                |
  | RTR_SRC1     \              RTR_DST1        |
  |               \                             |
  |                RR3                          |
  |                 |                           |
  |              RTR_DST2                       |
  |                                             |
   \                                           /
    -------------------------------------------

                     Figure 7
Figure 6 and Figure 7 describe an inter-AS scenario and a single-AS scenario where a service provider wants to monitor BGP update propagation time from one router to multiple routers. In Figure 6, multiple probing routers are attached to multiple ASes. In Figure 7, all probing routers are in the same AS.
An external tool should command RTR_SRC to originate a probing BGP path. Each probing router is configured to match the path in its inspection list. The BGP path would propagate across ASes whether or not they support BGP-TS. Each probing router would receive the BGP path and add timestamp information. The authors suggest that implementors use a local wrapping buffer on each node and record an entry in the buffer each time a BGP path is timestamped. An external tool should then retrieve the timestamp information from RTR_DSTx. How the information is retrieved is out of the scope of this document, but we can imagine using:
For the solution to be accurate, BGP Speakers must be synchronized. This can be achieved easily within a single AS, but in an inter-domain scenario it is hard to ensure that all Speakers are synchronized to a good clock source.
The S bit and SyncType fields are set to help operators understand the accuracy of the timestamp measurements and compare timestamps with one another.
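The local wrapping buffer suggested above, together with a check based on the S bit, might look like the following Python sketch. The record layout, the class and function names, and the rule that two timestamps are compared only when both are flagged as synchronized are assumptions of this example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass(frozen=True)
class Record:
    prefix: str
    speaker_id: str
    timestamp: float
    synchronized: bool        # value of the S bit
    sync_type: Optional[int]  # stratum, meaningful only when synchronized


class WrappingBuffer:
    """Fixed-size buffer: the oldest record is overwritten when full."""

    def __init__(self, capacity: int) -> None:
        self._records: Deque[Record] = deque(maxlen=capacity)

    def record(self, rec: Record) -> None:
        self._records.append(rec)

    def dump(self) -> List[Record]:
        """Return the stored records, oldest first, for an external tool."""
        return list(self._records)


def comparable(a: Record, b: Record) -> bool:
    """Hypothetical policy: only compare timestamps that are both synchronized."""
    return a.synchronized and b.synchronized
```

A bounded buffer keeps memory use constant on the router; an external tool that polls too slowly simply loses the oldest records.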
                     Single AS
    -------------------------------------------
   /                           RTR_SRC2- 10/8  \
  |                           /                 |
  |          RR1 ---------- RR2                 |
  |         /   \              \                |
  | RTR_SRC1     \              RTR_DST1        |
  |    |          \                             |
  |   10/8         RR3                          |
  |                 |                           |
  |              RTR_DST2                       |
  |                                             |
   \                                           /
    -------------------------------------------

                     Figure 8
RTR_SRC1 starts to propagate 10/8 within the BGP control plane. All BGP Speakers consider the path as best, and this path is propagated within the whole control plane. Each BGP Speaker adds its timestamp information, and RTR_DST1 and RTR_DST2 are able to record the timestamp vector. In this case, the timestamp vector is quite accurate because it represents an end-to-end propagation.
Now RTR_SRC2 starts to propagate its own path. RR2 has two paths for 10/8 and will choose the best one. Let us consider that the RTR_SRC2 path is the best one: the RTR_SRC2 path will therefore be propagated and the timestamp vector will be updated. RR1 will also have two paths, and we consider that RR1 prefers the RTR_SRC1 path, so the RTR_SRC2 path will not be propagated by RR1. In this situation, RTR_DST1 will receive the path from RR2 with accurate timestamps (end-to-end propagation), but RTR_DST2 will never receive it.
We could also consider a stable network situation, where both paths have been advertised for a long time. A network event may occur (e.g., an IGP metric change) that causes a BGP Speaker within a path vector to change its best path. In Figure 8, an IGP event may cause RR1 to change its decision and prefer the path originated by RTR_SRC2 as best. The path will then be propagated with previously received timestamp information that is no longer accurate. RTR_DST2 will receive a BGP timestamp vector containing stale timestamp information as well as new entries.
The case of sending stale timestamp information can also appear with a single originator as soon as some redundancy in the BGP design is involved (multiple RRs, multiple ASBRs ...).
An external tool that monitors BGP timestamps should take care to analyse only end-to-end propagation scenarios.
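One possible way for such a tool to filter out vectors that contain stale entries is sketched below. The heuristic is an assumption of this example and is not specified by this document: the tool knows when it triggered the probe, knows the expected originator and listener, and assumes that the speakers involved are synchronized.

```python
from typing import List, Tuple

Vector = List[Tuple[str, float]]  # hypothetical: (speaker_id, timestamp)


def is_end_to_end(
    vector: Vector, originator: str, listener: str, probe_start: float
) -> bool:
    """Accept a vector only if it looks like one originator-to-listener run.

    Hypothetical checks: the first entry comes from the expected originator,
    the last from the listener, no entry predates the probe, and timestamps
    never decrease along the path.
    """
    if len(vector) < 2:
        return False
    if vector[0][0] != originator or vector[-1][0] != listener:
        return False
    times = [ts for _, ts in vector]
    if times[0] < probe_start:
        return False  # origin timestamp is older than the probe: stale entry
    return all(a <= b for a, b in zip(times, times[1:]))


# Invented values based on the topology of Figure 8.
fresh = [("RTR_SRC1", 100.0), ("RR1", 100.2), ("RR3", 100.3), ("RTR_DST2", 100.4)]
stale = [("RTR_SRC2", 10.0), ("RR2", 10.1), ("RR1", 100.2), ("RR3", 100.3),
         ("RTR_DST2", 100.4)]
```

Here `fresh` would be accepted for a probe started at time 100.0, whereas `stale` would be rejected because its first entries were recorded long before the event that caused RR1 to re-advertise the path.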
BMP (BGP Monitoring Protocol) [I-D.ietf-grow-bmp] is a solution to monitor BGP sessions and provides a convenient interface for obtaining route views. BMP is a complete suite of messages to exchange information regarding a BGP session.
We can imagine using BMP as a solution to monitor BGP update propagation time, but there are multiple drawbacks associated with such a solution:
Using BMP to monitor BGP update propagation may complicate the design of the monitoring solution.
This solution is not intended to perform timestamp imposition on all BGP updates.
A service provider implementing the BGP timestamp attribute must be aware of the propagation rules of the NLRIs to be inspected. Consider a scenario where a path for an NLRI has already been propagated, and a new path appears and starts to be propagated: propagation of this new path may stop at a certain point because a BGP Speaker still considers the old path as the best one. In another scenario, both paths are installed and, for a BGP Speaker within the path vector, the best path changes because of an IGP metric change. This BGP Speaker will send a new BGP update, and the timestamp information of the path will be updated but will no longer be meaningful: the origin timestamp will be quite old, whereas the timestamps recorded after this BGP Speaker will be recent. This kind of scenario is complex to interpret.
The deployment scenario we are targeting is the inspection of specific NLRIs, identified by the service provider, whose propagation rules are well known (see Section 5 for an example). The service provider may rely on existing NLRIs (real routes) or on ephemeral NLRIs (dedicated NLRIs for beaconing). Whatever the NLRI used, the tool used by the service provider to collect and interpret the timestamps must be aware of the propagation rules and must record events only if propagation is end to end (from originator to listener).
The inspection list should be kept as small as possible so as not to introduce processing overhead that would itself slow down propagation. Implementors should minimize the processing overhead introduced by the inspection list and by timestamp imposition.
Depending on the implementation and router capacity, adding timestamps to BGP paths may consume router resources. As proposed in Section 4.1, by default a BGP Speaker will not timestamp any path, and an inspection list must be configured to activate timestamping on a subset of paths. With this approach, we consider that the overhead that may be introduced by timestamping BGP paths is well controlled by operators. An external router cannot force an internal router to timestamp.
Providing detailed timestamp information to other ASes may introduce security issues by exposing internal data (part of the BGP topology, IP addresses, internal performance) to external entities. The proposal made in Section 4.5 addresses this security issue by giving operators flexibility over the level of information they expose to external peers.
IANA shall assign a codepoint for the BGP Timestamp attribute. This codepoint will come from the "BGP Path Attributes" registry.
[I-D.ietf-grow-bmp]  Scudder, J., Fernando, R., and S. Stuart, "BGP Monitoring Protocol", Internet-Draft draft-ietf-grow-bmp-07, October 2012.
[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway Protocol 4 (BGP-4)", RFC 4271, January 2006.
[RFC5905]  Mills, D., Martin, J., Burbank, J., and W. Kasch, "Network Time Protocol Version 4: Protocol and Algorithms Specification", RFC 5905, June 2010.