Internet DRAFT - draft-sun-idr-bgp-ls-notification
draft-sun-idr-bgp-ls-notification
INTERNET-DRAFT M. Sun
Intended Status: <Standard Track> B.Pithawala
Expires: December 30,2017 HUAWEI Technologies
F.Gao
Baidu Inc
June 28,2017
<BGP Support for Fast Link Status Notification>
draft-sun-idr-bgp-ls-notification-00
Abstract
This document describes the use of Border Gateway Protocol (BGP)
community. This optional transitive community will instruct router to
monitor itself ports . With this community, controller only needs to
send route update message once and will get the feedback only if link
status changes. In particular this community can help controller get
the link status changing notification much faster than current
method.
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
Copyright and License Notice
Copyright (c) 2017 IETF Trust and the persons identified as the
document authors. All rights reserved.
Marcus, et al. Expires December 30,2017 [Page 1]
INTERNET DRAFT Fast Link Status Notification June 28,2017
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1 Large-scale DC Routing Solution . . . . . . . . . . . . . . 3
1.2 BFD protocol and Hellos Protocol . . . . . . . . . . . . . . 5
2. Another Centralized Link Detection Method Based on BGP . . . . 5
2.1 Basic Principle . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Advantages and Benefits of this solution . . . . . . . . . . 7
3 IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 7
4 References . . . . . . . . . . . . . . . . . . . . . . . . . . 8
4.1 Normative References . . . . . . . . . . . . . . . . . . . 8
4.2 Informative References . . . . . . . . . . . . . . . . . . 8
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 8
Marcus, et al. Expires December 30,2017 [Page 2]
INTERNET DRAFT Fast Link Status Notification June 28,2017
1 Introduction
With the advent of micro services application architecture and the
continued advances in massively scaled distributed systems, majority
of traffic traversing the data center network is within the data
center (east-west). This necessitates the data center network to have
deterministic latency (preferably ultra-low), high scalability, high
availability and low cost. For those requirements, current large-
scale data center network is mostly based on CLOS architecture,
[RFC7938] shows a typical 3 layer(5 stages) CLOS architecture(in
Figure 1,3 layer means Leaf-Agg-Spine ).
Spine
+-----+
| |
+--| |--+
| +-----+ | +-----------------------+
Agg | | | Agg POD |
+-----+ | +-----+ | |+-----+ |
+-------------| DEV |------+--| |--+-+| |-------------+ |
| +-----| C |------+ | | +-+| |-----+ | |
| | +-----+ +-----+ |+-----+ | | |
| | | | | |
| | +-----+ +-----+ |+-----+ | | |
| +-----------| DEV |------+ | | +-+| |-----------+ | |
| | | +---| D |------+--| |--+-+| |---+ | | | |
| | | | +-----+ | +-----+ | |+-----+ | | | | |
| | | | | | | | | | | |
+-----+ +-----+ | +-----+ | | +-----+ +-----+|
| DEV | | DEV | +--| |--+ | | | | ||
| A | | B |Leaf | | | Leaf | | | ||
+-----+ +-----+ +-----+ | +-----+ +-----+|
| | | | +----------+-+-----+-+--+
O O O O O O O O
Servers Servers
Figure 1 3-Layer Clos Topology
Note: Leaf is switching node that is connected with servers, Agg is
exchange node that aggregates Leaf, and Spine is core exchange node.
Nowadays, the scale of this architecture can support 100k servers.
The number of links in network is nearly up to 200k links. Managing
the large number of switches and links in a data center from a
Controller is a difficult scale problem.
1.1 Large-scale DC Routing Solution
Marcus, et al. Expires December 30,2017 [Page 3]
INTERNET DRAFT Fast Link Status Notification June 28,2017
[RFC7938] introduces a link detection solution based on BGP.This RFC
uses ebgp to connect switches (physical link) and use ibgp to connect
switches and controller (logical link). The ebgp connections are made
using the local loopback addresses of the Routers/Switches.Since this
solution does not have any IGP in the network to convey the local
loopback addresses to form the EBGP connection, the solution uses a
centralized controller to initiate the messages to convey loopback
address of a Router to its neighbor. It uses a combination of ibgp
and ebgp connections and messages to achieve the following as Figure
2.
+----------+
inject Prefix +-----+Controller+----+
for R1 with | +----------+ | expect Prefix
one-hop | | for R1 from R2
community +-++ +-++
|R1+------------------+R2|
+--+ Prefix for R1 +-++
relayed | Prefix for R1
+-++NOT relayed
|R3|
+--+
Figure 2 one kind of link detection method
In Figure 2, the controller periodically updates the packets to the
source of the link, determines link status (status of link connecting
to routers/switches) according to whether controller receives update
message from destination link node.The controller sends route message
to switch R1 periodically, which only contains one-hop community
attribute.R1 publishes this message to its neighbor R2 through ebgp
with no_export attribute in it.R2 sends this message to controller
through ibgp instead of sending message to R3 because of no_export
attribute.If controller receives route message from R2 within
specified time, it is assumed that R1->R2 status is normal.
Otherwise, R1->R2 status is down.
But when link detection packets sending frequency is high, the
controller load is heavy, i.e. controller processing capacity is not
enough, and firewall device does not accept this large flow of
traffic.On the other hand,when link detection packets sending
frequency is low, the convergence speed of network is slow, that will
lead to loop or network interruption and other issues. Network
reliability is unacceptable.With single controller multi-threaded
Marcus, et al. Expires December 30,2017 [Page 4]
INTERNET DRAFT Fast Link Status Notification June 28,2017
exabgp + virtual router vyatta, experimental test data shows that
this solution can only support 1k links and 512 servers in non-block
network.
1.2 BFD protocol and Hellos Protocol
Existing mainstream distributed link monitoring methods are Protocol
Hellos [RFC 2328]and BFD protocol[RFC 5880].
Protocol Hellos: Since a protocol (ebgp) is initiated over the link,
the status of the link could be inferred by receiving periodic hellos
(or the lack of hellos).Protocol hellos are generally regarded as a
slow link detection mechanism. Increasing the frequency of hellos
only creates a scale issues at many points in the network without
really providing sub-second link detection.
BFD solution configures BFD session at both ends of the link which
need to be detected. Each end sends detection BFD messages and link
will report failure if the detection message is not received on
time.BFD needs plenty of configurations to different devices and
different ports. In VRRP track, 100k servers need to configure 200k
links and 200k ends. At the same time, 100k servers use BFD need to
configure 200k links and 400k ends which may cause some unexpectable
errors with high cost.
2. Another Centralized Link Detection Method Based on BGP
2.1 Basic Principle
Considering current large-scale DCN link detection method, there are
many problems of periodical detection method. When the frequency of
sending and receiving messages is high, the controller load will be
too heavy. The controller processing capacity is not enough and
firewall devices cannot accept this large flow of traffic. On the
other hand, when the frequency is low, the convergence speed of
network will decrease. This may cause network interruption and worse
network reliability.
Compared with traditional link detection method, this solution
propose an efficient optimization method which can monitor links
automatically. This method can reduce lots of manual configuration
work, avoid various types of errors and high cost. Furthermore, it
also eases the collection of link status notifications for the
controller.
In Figure 3, if the controller need to detect link status from R1 to
R2, the process is as following.
Marcus, et al. Expires December 30,2017 [Page 5]
INTERNET DRAFT Fast Link Status Notification June 28,2017
+------------+
+-+ +-----------+ Controller +------+ +-+
|1| | ibgp1 +------------+ ibgp2| |3|
+-+ | | +-+
+--+--+ +--+--+
| R1 | ebgp | R2 |
| AS1 +------------------------/+ AS2 |
+--+--+ +-+ / +--+--+
| |2| / |
|ebgp +-+ / |ebgp
+--+--+ / +--+--+
|R5 | / | R3 |
|AS5 | port is / | AS3 |
+-----+ automatically +-----+
monitored
Figure 3 the principle of this solution
Step 1:
a) Controller sends route update message A1 to R1 (nonperiodic, just
once) then they can establish a peer. In A1, there's instructions
that can enable R1's port (link) status monitoring function.
b) is the same as a>, only the objective is R2.
c) The A1 message only contains one-hop community attribute and its
prefix is used to identify device R1.
Step 2:
When R1 receives route update message A1 from controller, it will add
a no_export attribute so it can only publish to egbp neighbor R2. R2
will publish this route message to controller through ibgp instead of
its ebgp neighbor device R3.
a) R2 finds that message A1 comes from R1 according to the community
in A1.
b) Here we need to define a dedicated bit in communities to specify
that R2 should start to monitor its link when it receives this
indication. Hence, start to monitor all the links from R1 to R2 in
this step.
step 3
If it detects ports (links) status has changed in step 2 b), on the
Marcus, et al. Expires December 30,2017 [Page 6]
INTERNET DRAFT Fast Link Status Notification June 28,2017
one hand, if the port status switches from normal to fault, R2 will
tell controller a withdraw message through ibgp. On the other hand,
R2 will tell controller a announce message through ibgp.
step 4
When controller receives route A1 update message from R2:
a) Find corresponding link based on received A1 update message
<prefix, srcIP>. Prefix marks network device R1 and srcIP means
device R2. The <prefix, srcIP> can tell controller this is the link
from R1 to R2.
b) If the message is route announce type, link status is normal,
otherwise, the withdraw type means link status is fault.
It is important to notice here that we do not prefer any link
detection mechanism and the BGP implementation on a vendor's device
is free to activate any link detection mechanism it chooses (some
examples are BFD, either auto-sensing feature etc.).
2.2 Advantages and Benefits of this solution
Generally speaking, we need a dedicated bit of communities that can
notify R2 to start monitoring the link between R1 and R2. It's quite
simple but there are many advantages of this solution.
1. It needs no extra configuration and can monitor corresponding
ports (links) automatically. It helps controller know about every
link status with existing BGP protocols. It can avoid lots of manual
configuration and unnecessary errors and costs caused by manual
configuration.
2. It can solve the conflict that network needs fast convergence time
but controller capacity constraint. Using this solution, network with
single controller can support 100k servers while other method can
only support 512 servers.
3. The performance of real-time link failure recovery is better. With
experiments, link failure report time reduces from 3s to less than
50ms, link failure recovery time decreases from 1s to less than 50ms.
3 IANA Considerations
The IANA has registered Transitive Extended Community Types in
RFC7153. This registry contains values of the high-order octet (the
"Type" field) of a Transitive Extended Community.
Marcus, et al. Expires December 30,2017 [Page 7]
INTERNET DRAFT Fast Link Status Notification June 28,2017
This method only needs one unassigned type value to notify device
monitoring corresponding links(ports).
4 References
4.1 Normative References
[RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998.
[RFC4271] Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
Border Gateway Protocol 4 (BGP-4)", RFC 4271, January
2006.
[RFC5880] Katz, D. and D. Ward, "Bidirectional Forwarding Detection
(BFD)", RFC 5880, June 2010.
[RFC7153] E. Rosen, Y. Rekhter, "IANA Registries for BGP Extended
Communities", RFC 7153, March 2014.
[RFC7938] P. Lapukhov, A. Premji, J. Mitchell, Ed., "Use of BGP for
Routing in Large-Scale Data Centers", RFC 7938, August 2016.
4.2 Informative References
[RFC3765] Huston, G., "NOPEER Community for Border Gateway Protocol
(BGP) Route Scope Control", RFC 3765, April 2004.
[RFC6286] E. Chen, J. Yuan, "Autonomous-System-Wide Unique BGP
Identifier for BGP-4", RFC 6286, June 2011.
[RFC6608] J. Dong, M. Chen, A. Suryanarayana, "Subcodes for BGP Finite
State Machine Error", RFC 6608, May 2012.
[RFC7606] E. Chen, Ed., J. Scudder, Ed., P. Mohapatra, K. Patel, "Revised
Error Handling for BGP UPDATE Messages", RFC 7606, August 2015.
[RFC7705] W. George, S. Amante, "Autonomous System Migration
Mechanisms and Their Effects on the BGP AS_PATH Attribute",
RFC 7705, November 2015.
[RFC7752] H. Gredler, Ed., J. Medved, S. Previdi, A. Farrel, S. Ray,
"North-Bound Distribution of Link-State and Traffic Engineering
(TE) Information Using BGP", RFC 7752, March 2016.
Authors' Addresses
Marcus Sun
HUAWEI TECHNOLOGIES CO.,LTD
12 E. Mozhou Rd.Nanjing,Jiangsu
China
EMail: marcus.sun@huawei.com
Burjiz Pithawala
HUAWEI TECHNOLOGIES CO.,LTD
2330 Central Expressway, Santa Clara, CA 95050
US
EMail: burjiz.pithawala1@huawei.com
Feng Gao
BAIDU Inc.
10 shangdi shijie Haidian, Beijing
Email:gaofeng04@baidu.com
Marcus, et al. Expires December 30,2017 [Page 8]
INTERNET DRAFT Fast Link Status Notification June 28,2017
Marcus, et al. Expires December 30,2017 [Page 9]