nachum-sarp-11.txt

Internet DRAFT - draft-nachum-sarp

draft-nachum-sarp

Last Version:	draft-nachum-sarp-11.txt
Date:	09-Apr-2015
Disposition:	RFC7586
Previous Versions:	draft-nachum-sarp-10.txt - 26-Dec-2014
	draft-nachum-sarp-09.txt - 16-Dec-2014
	draft-nachum-sarp-08.txt - 30-Jun-2014
	draft-nachum-sarp-07.txt - 13-Jan-2014
	draft-nachum-sarp-06.txt - 16-Jul-2013
	draft-nachum-sarp-05.txt - 12-Jul-2013
	draft-nachum-sarp-04.txt - 25-Feb-2013
	draft-nachum-sarp-03.txt - 10-Oct-2012
	draft-nachum-sarp-02.txt - 05-Jun-2012
	draft-nachum-sarp-01.txt - 12-Mar-2012
	draft-nachum-sarp-00.txt - 05-Mar-2012

Network Working Group                              Youval Nachum
Internet Draft                                              Ixia
Intended status: Experimental                       Linda Dunbar
Expires: October 2015                                     Huawei

                                                 Ilan Yerushalmi
                                                     Tal Mizrahi
                                                         Marvell

                                                   April 8, 2015

Scaling the Address Resolution Protocol for Large Data Centers
(SARP)
draft-nachum-sarp-11.txt

Abstract

This document introduces SARP, an architecture that uses proxy
gateways to scale large data center networks. SARP is based on
fast proxies that significantly reduce switches' Filtering
Database (FDB) table sizes and ARP/ND impact on network
elements in an environment where hosts within one subnet (or
VLAN) can spread over various locations. SARP is targeted for
massive data centers with a significant number of Virtual
Machines (VMs) that can move across various physical
locations.

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance
with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet
Engineering Task Force (IETF), its areas, and its working
groups. Note that other groups may also distribute working
documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-
Drafts as reference material or to cite them other than as
"work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed
at http://www.ietf.org/shadow.html.

Nachum, et al. Expires October 8, 2015 [Page 1]

Internet-Draft SARP April 2015

This Internet-Draft will expire on October 8, 2015.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date
of publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this
document must include Simplified BSD License text as described
in Section 4.e of the Trust Legal Provisions and are provided
without warranty as described in the Simplified BSD License.

Table of Contents

1. Introduction
   1.1. SARP Motivation
   1.2. SARP Overview
   1.3. SARP Deployment Options
   1.4. Comparing with Existing Solutions
2. Terms and Abbreviations Used in this Document
3. SARP - Theory of Operation
   3.1. Control Plane: ARP/ND
      3.1.1. ARP/NS Request for a Local VM
      3.1.2. ARP/NS Request for a Remote VM
      3.1.3. Gratuitous ARP and Unsolicited Neighbor
             Advertisement (UNA)
   3.2. Data Plane: Packet Transmission
      3.2.1. Local Packet Transmission
      3.2.2. Packet Transmission Between Sites
   3.3. VM Migration
      3.3.1. VM Local Migration
      3.3.2. VM Migration from One Site to Another
         3.3.2.1. Impact on IP<->MAC Mapping Cache Table of
                  Migrated VMs
   3.4. Multicast and Broadcast
   3.5. Non IP packet
   3.6. High availability and load balancing
   3.7. SARP Interaction with Overlay networks
4. Security Considerations
5. IANA Considerations
6. References
   6.1. Normative References
   6.2. Informative References
7. Acknowledgments

1. Introduction

This document describes a proxy gateway technique, called
Scalable Address Resolution Protocol (SARP), which reduces
switches' Filtering Data Base (FDB) size and ARP/Neighbor
Discovery impact on network elements in an environment where
hosts within one subnet (or VLAN) can spread over various
access domains in data centers.

The main idea of SARP is to represent all VMs (or hosts) under
each access domain by their corresponding access (or
aggregation) node's MAC address. For example (Figure 1), when
host A in the west site needs to communicate with host B,
which is on the same VLAN but connected to a different access
domain (east site), SARP requires A to use the MAC address of
SARP proxy 2, rather than the address of host B. By doing so,
switches in each domain do not need to maintain a list of MAC
addresses for all the VMs (hosts) in different access domains;
every switch only needs to be familiar with MAC addresses that
reside in the current domain, and addresses of remote SARP
proxy gateways. Therefore, the switches' FDB size is limited
regardless of the number of access domains.

+-------+     +-------+     _   __       +-------+     +-------+
|       |     | SARP  |    / \_/  \_     | SARP  |     |       |
|host A |<===>| proxy |<=>\_        \<==>| proxy |<===>|host B |
|       |     |   1   |    /       _/    |   2   |     |       |
+-------+     +-------+    \__   _/      +-------+     +-------+
                              \_/
<------west site------>                  <------east site------>
                  Figure 1 SARP in a nutshell
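
The bound described above can be illustrated with a short
Python sketch. It is only a rough model, not part of SARP: the
function names and the four-domain example are hypothetical,
and it assumes one MAC address per host and a single SARP proxy
MAC address per remote access domain.

```python
def fdb_entries_without_sarp(hosts_per_domain):
    """Worst case: a switch may learn every host MAC in every domain."""
    return sum(hosts_per_domain)


def fdb_entries_with_sarp(hosts_per_domain, local_index,
                          proxies_per_domain=1):
    """With SARP: local host MACs plus the remote SARP proxy MACs."""
    remote_domains = len(hosts_per_domain) - 1
    local_hosts = hosts_per_domain[local_index]
    return local_hosts + remote_domains * proxies_per_domain


# Hypothetical deployment: four access domains of 4000 hosts each.
domains = [4000, 4000, 4000, 4000]
without_sarp = fdb_entries_without_sarp(domains)
with_sarp = fdb_entries_with_sarp(domains, local_index=0)
print(without_sarp, with_sarp)
```

In this model, adding an access domain adds one entry per remote
proxy to each switch, rather than one entry per remote host.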

1.1. SARP Motivation

[RFC6820] discusses the impacts and scaling issues that arise
in data center networks when subnets span across multiple
L2/L3 boundary routers.

Unfortunately, when the combined number of VMs (or hosts) in
all those subnets is large, this can lead to switches' MAC
table size explosion and heavy impact on network elements.

There are four major issues associated with subnets spanning
across multiple L2/L3 boundary router ports:

1) Intermediate switches' MAC address table (FDB) explosion.

When hosts in a VLAN (or subnet) span across multiple access
domains and each access domain has hosts belonging to
different VLANs, each access switch has to enable multiple
VLANs. Thus, those access switches are exposed to all MAC
addresses across all VLANs.

For example, for an access switch with 40 attached physical
servers, where each server has 100 VMs, the access switch
has 4000 attached MAC addresses. If hosts/VMs can indeed be
moved anywhere, the worst case for the access switch is when
all those 4000 VMs belong to different VLANs, i.e., the
access switch has 4000 VLANs enabled. If each VLAN has 200
hosts, this access switch's MAC table potentially has
200*4000 = 800,000 entries.

It is important to note that the example above is relevant
regardless of whether IPv4 or IPv6 is used.

The example illustrates a scenario that is worse than what
today's L2/L3 gateway has to face. In today's environment,
where each subnet is limited to a few access switches, the
number of MAC addresses the gateway has to learn is of a
significantly smaller scale.

2) ARP/ND processing load impact on the L2/L3 boundary routers.

All VMs periodically send ND messages to their corresponding
gateway nodes to get the gateway nodes' MAC addresses. When
the combined number of VMs across all the VLANs is large,
processing the responses to the ND requests from those VMs
can easily exhaust the gateway's CPU capacity.

An L2/L3 boundary router could be hit with ARP/ND twice when
the originating and destination stations are in different
subnets attached to the same router and when those hosts do
not communicate with external peers very frequently. The
first hit is when the originating station in subnet 1
initiates an ARP/ND request to the L2/L3 boundary router.
The second hit is when the L2/L3 boundary router initiates
an ARP/ND request to the target in subnet 2 if the target is
not in the router's ARP/ND cache.

3) In IPv4, every end station in a subnet receives ARP
broadcast messages from all other end stations in the
subnet. IPv6 ND has eliminated this issue by using
multicast.

However, most devices support a limited number of multicast
addresses, due to multicast filtering scaling. Once the
number of multicast addresses exceeds the multicast filter
limit, the multicast addresses have to be processed by
devices' CPU (i.e. the slow path).

It is less of an issue in data centers without VM mobility,
since each port is only dedicated to one (or a small number
of) VLANs. Thus, the number of multicast addresses hitting
each port is significantly lower.

4) The ARP/ND messages are flooded to many physical link
segments, which can reduce the bandwidth available for user
traffic.

ARP/ND flooding is, in most cases, an insignificant issue in
today's data center networks as the majority of data center
servers are shifting towards 1G or 10G ports. The bandwidth
used by ARP/ND, even when flooded to all physical links,
becomes negligible compared to the link bandwidth.
Furthermore, IGMP/MLD snooping [RFC4541] can further reduce
the ND multicast traffic to some physical link segments.
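
The worst-case estimate in issue (1) above can be reproduced
with a few lines of Python. The variable names are
illustrative only; the figures are the ones used in the example
(40 servers, 100 VMs per server, 200 hosts per VLAN).

```python
servers_per_switch = 40   # physical servers attached to the access switch
vms_per_server = 100      # VMs hosted on each server
hosts_per_vlan = 200      # hosts in each VLAN

# MAC addresses directly attached to the access switch.
attached_macs = servers_per_switch * vms_per_server

# Worst case: every attached VM belongs to a different VLAN, so the
# switch has as many VLANs enabled as it has attached VMs.
enabled_vlans = attached_macs

# The switch is exposed to every host in every enabled VLAN.
worst_case_fdb = hosts_per_vlan * enabled_vlans

print(attached_macs, worst_case_fdb)
```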

Statistics gathered by Merit Network [ARMDStats] have shown
that the major impact of a large number of VMs in data centers
is on the L2/L3 boundary routers, i.e., issue (2) above. An
L2/L3 boundary router could be hit with ARP/ND twice when the
originating and destination stations are in different subnets
attached to the same router and those hosts do not communicate
with external peers often enough.

Overlay approaches, e.g., [RFC7364], can hide host (VM)
addresses in the core but do not prevent the MAC table
explosion problem (issue (1)) unless the NVE is on a server.

The scaling practices documented in [ARP-ND-PRACTICE] can only
reduce some ARP impact to L2/L3 boundary routers in some
scenarios, but not all.

In order to protect router CPUs from being overburdened by
target resolution requests, some routers rate limit the target
MAC resolution requests to the router's CPU. When the rate
limit is exceeded, the incoming data frames are dropped. In
traditional data centers, this issue is less significant,
since the number of hosts attached to one L2/L3 boundary
router is limited by the number of physical ports of the
switches/routers. When servers are virtualized to support 30+
VMs, the number of hosts under one router can grow by a factor
of 30+. Furthermore, in traditional data center networks each
subnet is neatly bound to a limited number of server racks,
i.e., switches only need to be familiar with MAC addresses of
hosts that reside in this small number of subnets. In
contemporary data center networks, as subnets are spread
across many server racks, switches are exposed to VLAN/MAC
addresses of many subnets, greatly increasing the size of
switches' FDB tables.

The solution proposed in this document can eliminate or reduce
the likelihood of inter-subnet data frames being dropped and
reduce the number of host MAC addresses that intermediate
switches are exposed to, thus reducing switches' FDB table
sizes.

1.2. SARP Overview

The SARP approach uses proxy gateways to address the problems
discussed above.

Note: The Guidelines to proxy developers [RFC4389] have been
carefully considered for the SARP protocols. Section 3.3
discusses how SARP works when VMs are moved from one segment
to another.

In order to enable VMs to be moved across servers while
maintaining their MAC/IP addresses unchanged, the Layer 2
network (e.g. VLAN) which interconnects those VMs may spread
across different server racks, different rows of server racks,
or even different data center sites.

A multi-site data center network comprises two main building
blocks: an interconnecting segment and an access segment.
While the access segment is, in most cases, a Layer 2
network, the interconnecting segment is not necessarily a
Layer 2 network.

The SARP proxies are located at the boundaries where the
access segment connects to its interconnecting segment. The
boundary node can be a hypervisor virtual switch, a top-of-rack
switch, an aggregation switch (or end of row switch), or
a data center core switch. Figure 2 depicts an example of two
remote data centers that are managed as a single flat Layer 2
domain. SARP proxies are implemented at the edge devices
connecting the data center to the transport network. SARP
significantly reduces the ARP/ND transmissions over the
interconnecting network.

                  *-------------------*
                  |                   |
          +-------|  Interconnecting  |-------+
          |       |      network      |       |
          |       *-------------------*       |
          |                                   |
 *-----------------*                  *----------------*
 |  SARP Proxies   |                  |  SARP Proxies  |
 *-----------------*                  *----------------*
    |           |                        |           |
*-------*   *-------*                *-------*   *-------*
|Access |   |Access |                |Access |   |Access |
*-------*   *-------*                *-------*   *-------*
                |
           *----------*
           |Hypervisor|
           *----------*
                |
            *--------*
            |Virtual |
            |Machine |
            *--------*

     (West Site)                         (East Site)

        Figure 2 SARP: Network Architecture Example

1.3. SARP Deployment Options

SARP deployment is tightly coupled with the data center
architecture. SARP proxies are located at the point where the
Layer 2 infrastructure connects to its Layer 2 cloud using
overlay networks. SARP proxies can be located at the data
center edge (as Figure 2 depicts), data center core, or data
center aggregation (denoted by Agg in Figure 3). SARP can
also be implemented by the hypervisor (as Figure 3 depicts).

To simplify the description, we will focus on data centers
that are managed as a single flat Layer 2 network, where SARP
proxies are located at the boundary where the data center
connects to the transport network (as Figure 2 depicts).

                  *-------------------*
                  |                   |
          +-------|     TRANSPORT     |-------+
          |       |                   |       |
          |       *-------------------*       |
          |                                   |
 *-----------------*                  *----------------*
 |   Edge Device   |                  |  Edge Device   |
 *-----------------*                  *----------------*
          |                                   |
 *-----------------*                  *----------------*
 |      Core       |                  |      Core      |
 *-----------------*                  *----------------*
    |           |                        |           |
*-------*   *-------*                *-------*   *-------*
|  Agg  |   |  Agg  |                |  Agg  |   |  Agg  |
*-------*   *-------*                *-------*   *-------*
                |
           *----------*
           |Hypervisor|
           *----------*

     (West Site)                         (East Site)

              Figure 3 SARP deployment options

1.4. Comparing with Existing Solutions

The IETF has developed several mechanisms to address issues
associated with Layer 2 networks over multiple geographic
locations, for example, Layer 2 VPN [RFC4664], proxy ARP
[RFC925], proxy Neighbor Discovery [RFC4389], IGMP and MLD
snooping [RFC4541], and ARP mediation for IP interworking of
Layer 2 VPNs [RFC6575].

However, all those solutions work well when hosts within one
subnet are placed together under one access domain, so that
the intermediate switches in each access domain are only
exposed to host addresses from a limited number of subnets.

SARP provides a solution for the case where hosts within one
subnet are spread across multiple access domains and each
access domain has hosts from many subnets. In this environment,
the intermediate switches in each access domain are exposed to
the combined hosts of all the subnets that are enabled in the
access domain.

2. Terms and Abbreviations Used in this Document

ARP: Address Resolution Protocol [ARP]

FDB: Filtering Data Base, which is used for Layer 2 switches
[802.1Q]. Layer 2 switches flood data frames when the
destination address (DA) is not in the FDB, whereas routers
drop data frames when the DA is not in the Forwarding
Information Base (FIB). That is why the term Filtering Data
Base (FDB) is used for Layer 2 switches.

FIB: Forwarding Information Base

Hypervisor: a software layer that creates and runs virtual
machines on a server.

IP-D: IP address of the destination virtual machine

IP-S: IP address of the source virtual machine

MAC-D: MAC address of the destination virtual machine

MAC-E: MAC address of the East Proxy SARP Device

MAC-S: MAC address of the source virtual machine

MAC-W: MAC address of the West Proxy SARP Device

NA: IPv6 ND's Neighbor Advertisement

ND: IPv6 Neighbor Discovery Protocol [ND]. In this document,
ND also refers to the Neighbor Solicitation, Neighbor
Advertisement, and Unsolicited Neighbor Advertisement
messages defined in RFC 4861.

NS: IPv6 ND's Neighbor Solicitation

SARP Proxy: The component that participates in the SARP
protocol.

UNA: IPv6 ND's Unsolicited Neighbor Advertisement [ND]

VM: Virtual Machine

3. SARP - Theory of Operation

3.1. Control Plane: ARP/ND

This section describes the ARP/ND procedure scenarios. The
first scenario addresses a case where both the source and
destination VMs reside in the same access segment. In the
second scenario, the source VM is in the local access segment
and the destination VM is located at the remote access
segment.

In all scenarios, the VMs (source and destination) share the
same L2 broadcast domain.

3.1.1. ARP/NS Request for a Local VM

When source and destination VMs are located at the same access
segment (Figure 4), the address resolution process is as
described in [ARP] and [ND]; host A sends an ARP request or an
IPv6 Neighbor Solicitation (NS) to learn the IP-to-MAC mapping
of host B, and receives a reply from host B with the IP-D to
MAC-D mapping.

+-------+       _   __       +-------+       _   __
|host A |      / \_/  \_     | SARP  |      / \_/  \_
| IP-S  |<--->\_access  \<==>| proxy |<===>\_interc. \
| MAC-S |      /network_/    |   1   |      /network_/
+-------+  +->\__   _/       +-------+     \__   _/
           |     \_/                          \_/
+-------+  |
|host B |<-+
| IP-D  |
| MAC-D |
+-------+

<--------------west site------------>
  Figure 4 SARP: two hosts in the same access segment

3.1.2. ARP/NS Request for a Remote VM

When the source and destination VMs are located at different
access segments, the address resolution process is as follows.

+-------+     +-------+     _   __       +-------+     +-------+
|host A |     | SARP  |    / \_/  \_     | SARP  |     |host B |
| IP-S  |<===>|proxy 1|<=>\_        \<==>|proxy 2|<===>| IP-D  |
| MAC-S |     | MAC-W |    /       _/    | MAC-E |     | MAC-D |
+-------+     +-------+    \__   _/      +-------+     +-------+
                              \_/
<------west site------>                  <------east site------>
   Figure 5 SARP: two hosts that reside at different segments

In the example illustrated in Figure 5, the source VM is
located at the west access segment and the destination VM is
located at the east access segment.

When host A sends an ARP/NS request to find out the IP-to-MAC
mapping of host B:

1. If SARP proxy 1 does not have IP-D in its ARP cache, the
ARP/NS request is propagated to all access segments which
might have VMs in the same virtual network as the
originating VM, including the east access segment.

2. As SARP proxy 1 forwards the ARP/NS message, it replaces
the source MAC address, MAC-S, with its own MAC address,
MAC-W. Thus, all switches that reside in the interconnecting
segment are not exposed to MAC-S.

3. The ARP/NS request reaches SARP proxy 2.

4. If SARP proxy 2 does not have IP-D in its ARP cache, the
ARP/NS request is forwarded to the east access network. Host
B responds with an ARP reply (IPv4) or a Neighbor
Advertisement (IPv6) to the request with MAC-D.

5. When the response message reaches SARP proxy 2, it replaces
MAC-D with MAC-E, and thus the response reaches SARP proxy 1
with MAC-E.

6. As SARP proxy 1 forwards the response to host A, it
replaces the destination address from MAC-W to MAC-S.

SARP Proxy ARP/ND Cache

SARP proxies maintain a cache of the IP<->MAC mapping. This
cache is based on ARP/ND messages that are sent by hosts and
traverse the SARP proxies.

In steps 1 and 4 above, if the SARP proxy has IP-D in its
ARP cache, it responds with MAC-E, without forwarding the
ARP/NS request.

This caching approach significantly reduces the volume of the
ARP/ND transmission over the network, and reduces the round
trip time of ARP/ND requests.

When the west SARP proxy caches the IP<->MAC mapping entries
for remote VMs, the expiration timers should be set to a
relatively low value to prevent stale entries due to remote
VMs being moved or deleted. In environments where VMs move
more frequently, it is not recommended for SARP proxies to
cache the IP<->MAC mapping entries of remote VMs.
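
Steps 1 to 6 and the caching behavior above can be sketched in
Python as follows. This is a simplified model under stated
assumptions, not a normative implementation: the SarpProxy
class, the resolve_remote function, the 30-second lifetime, and
the explicit "now" argument are all hypothetical, and the
sketch assumes that the east proxy caches the real MAC-D of its
local host while always answering remote requests with MAC-E.

```python
from dataclasses import dataclass, field


@dataclass
class SarpProxy:
    mac: str                       # the proxy's own MAC (MAC-W or MAC-E)
    ttl: float = 30.0              # hypothetical cache lifetime, seconds
    cache: dict = field(default_factory=dict)  # IP -> (MAC, expiry time)

    def lookup(self, ip, now):
        """Return the cached MAC for ip, or None if absent or expired."""
        entry = self.cache.get(ip)
        if entry is None:
            return None
        mac, expiry = entry
        if expiry <= now:
            del self.cache[ip]     # drop the stale entry
            return None
        return mac

    def learn(self, ip, mac, now):
        """Cache a mapping seen in a traversing ARP/ND message."""
        self.cache[ip] = (mac, now + self.ttl)


def resolve_remote(target_ip, requester_mac, west, east, east_hosts, now):
    """Walk steps 1-6 for a west-site request about an east-site host.

    Returns (reply_dst_mac, resolved_mac, macs_seen_on_interconnect).
    """
    seen = []
    # Step 1: the west proxy answers from its cache when it can.
    cached = west.lookup(target_ip, now)
    if cached is not None:
        return requester_mac, cached, seen
    # Step 2: forward the request with MAC-S replaced by MAC-W.
    seen.append(west.mac)
    # Steps 3-4: the east proxy consults its cache, else asks host B.
    if east.lookup(target_ip, now) is None:
        east.learn(target_ip, east_hosts[target_ip], now)
    # Step 5: the reply crosses the interconnect carrying MAC-E.
    seen.append(east.mac)
    west.learn(target_ip, east.mac, now)
    # Step 6: the west proxy restores the destination MAC-W -> MAC-S.
    return requester_mac, east.mac, seen


west = SarpProxy(mac="MAC-W")
east = SarpProxy(mac="MAC-E")
east_hosts = {"IP-D": "MAC-D"}     # hosts attached to the east site

first = resolve_remote("IP-D", "MAC-S", west, east, east_hosts, now=0.0)
second = resolve_remote("IP-D", "MAC-S", west, east, east_hosts, now=10.0)
third = resolve_remote("IP-D", "MAC-S", west, east, east_hosts, now=100.0)
```

In this model the first request crosses the interconnecting
segment, the second is answered from the west proxy's cache,
and the third crosses again because the cached entry has
expired. Neither MAC-S nor MAC-D appears on the interconnecting
segment in any of the three cases.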

3.1.3. Gratuitous ARP and Unsolicited Neighbor Advertisement
(UNA)

Hosts (or VMs) send out Gratuitous ARP (IPv4) [TcpIp] and
Unsolicited Neighbor Advertisement - UNA (IPv6) to allow other
nodes to refresh IP<->MAC entries in their caches.

The local SARP proxy processes the Gratuitous ARP or UNA in
the same way as the ARP reply or IPv6 NA, i.e. replaces the
MAC addresses in the same manner.

3.2. Data Plane: Packet Transmission

3.2.1. Local Packet Transmission

When a VM transmits packets to a destination VM that is
located at the same site (Figure 4), the data plane is
unaffected by SARP; packets are sent from (IP-S, MAC-S) to
(IP-D, MAC-D).

3.2.2. Packet Transmission Between Sites

Packets that are sent between sites (Figure 5) traverse the
SARP proxies of both sites.

A packet sent from host A to host B undergoes the following
procedure:

1. Host A sends a packet to IP-D, and based on its ARP table
it uses the MAC addresses {MAC-E, MAC-S}, written here and in
the following steps as {destination, source}.

2. SARP proxy 1 receives the packet and replaces the source
MAC address, such that the packet includes {MAC-E, MAC-W}.

3. SARP proxy 2 receives the packet and replaces the
destination MAC address, and the packet is sent to host B
with {MAC-D, MAC-W}.

SARP proxy 1 replaces the source MAC address with its own
since switches in the interconnecting segment are only
familiar with SARP proxy MAC addresses, and are not familiar
with host addresses.
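
The three data plane steps above can be sketched in Python as
follows. The Frame type and the two rewrite functions are
hypothetical names introduced for illustration, and the sketch
assumes that the destination-site proxy selects the host MAC
address by looking up the destination IP address of the packet.

```python
from typing import NamedTuple


class Frame(NamedTuple):
    dst_mac: str
    src_mac: str
    dst_ip: str


def egress_rewrite(frame, proxy_mac):
    """Source-site proxy: replace the source MAC with its own MAC."""
    return frame._replace(src_mac=proxy_mac)


def ingress_rewrite(frame, local_arp_table):
    """Destination-site proxy: replace its own MAC in the destination
    field with the MAC of the local host that owns the destination IP."""
    return frame._replace(dst_mac=local_arp_table[frame.dst_ip])


# Step 1: host A addresses the frame to MAC-E, as learned via ARP/ND.
sent = Frame(dst_mac="MAC-E", src_mac="MAC-S", dst_ip="IP-D")
# Step 2: SARP proxy 1 (MAC-W) rewrites the source MAC.
on_interconnect = egress_rewrite(sent, proxy_mac="MAC-W")
# Step 3: SARP proxy 2 rewrites the destination MAC for host B.
delivered = ingress_rewrite(on_interconnect, {"IP-D": "MAC-D"})
```

Only proxy MAC addresses (MAC-E and MAC-W) appear in the frame
while it crosses the interconnecting segment.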

Note: it is a common security practice in data center networks
to use access lists, allowing each VM to communicate only with
a list of authorized peer VMs. In most cases, such access
control lists are based on IP addresses, and hence are not
affected by the MAC address replacement in SARP.

3.3. VM Migration

3.3.1. VM Local Migration

When a VM migrates locally within its access segment, the SARP
protocol does not require any special behavior. VM migration
is resolved entirely by the Layer 2 mechanisms.

3.3.2. VM Migration from One Site to Another

This section focuses on a scenario where a VM migrates from
the west site to the east site while maintaining its MAC and
IP addresses.

VM migration might affect networking elements based on their
respective location:

- Origin site (west site)

- Destination site (east site)

- Other sites

+-------+     +-------+     _   __       +-------+     +-------+
|host A |     | SARP  |    / \_/  \_     | SARP  |     |host A |
| IP-D  |<===>|proxy 1|<=>\_        \<==>|proxy 2|<===>| IP-D  |
| MAC-D |     | MAC-W |    /       _/    | MAC-E |     | MAC-D |
+-------+     +-------+    \__   _/      +-------+     +-------+
                              \_/
<------west site------>                  <------east site------>
      Origin site                           Destination site
   Figure 6 SARP: host A migrates from west site to east site

Origin site

The Origin site is the site where the VM resides before the
migration (west site).

Before the VM (IP=IP-D, MAC=MAC-D) is moved, all VMs at the
west site that have an ARP entry of IP-D in their ARP table
have the IP-D -> MAC-D mapping. VMs on other access segments
have an ARP entry of IP-D -> MAC-W mapping where MAC-W is the
MAC address of the SARP proxy on the west access segment.

After the VM (IP-D) in the west site moves to the east site,
if a Gratuitous ARP (IPv4) or an Unsolicited Neighbor
Advertisement (IPv6) is sent out by the destination hypervisor
on behalf of the VM (IP-D), then the IP<->MAC mapping cache of
the VMs in all access segments is updated by IP-D -> MAC-E
where MAC-E is the MAC address of the SARP proxy on the east
site. If no Gratuitous ARP or Unsolicited Neighbor
Advertisement is sent out by the destination hypervisor, the
IP<->MAC cache on the VMs in the west site (and other sites)
is eventually aged out.

Until the IP<->MAC mapping cache tables are updated, the
source VMs from the west site continue sending packets locally
to MAC-D, and switches at the west site are still configured
with the old location of MAC-D. This transient condition can
be resolved by having the VM manager send out a fake
Gratuitous ARP or Unsolicited Neighbor Advertisement on behalf
of the destination hypervisor. Another alternative is to have
a shorter aging timer configured for the IP<->MAC cache table.

Destination Site

The destination site is the site to which the VM migrated,
i.e., the east site in Figure 6.

Before any Gratuitous ARP or Unsolicited Neighbor
Advertisement messages are sent out by the destination
hypervisor, all VMs at the east site (and all other sites)
might have IP-D -> MAC-W mapping in their IP<->MAC mapping
cache. The IP<->MAC mapping cache is updated by aging or by a
Gratuitous ARP or UNA message sent by the destination
hypervisor. Until the IP<->MAC mapping caches are updated, VMs
from the east site continue to send packets to MAC-W. This can
be resolved by having the VM manager send out a fake
Gratuitous ARP/UNA immediately after the VM migration, or by
redirecting the packets from the SARP proxy of the east site
back to the migrated VM by updating the destination MAC of the
packets to MAC-D.

Other Sites

All VMs at the other sites that have an ARP entry of IP-D in
their ARP table have the IP-D -> MAC-W mapping. The ARP
mapping is updated by aging or by a Gratuitous ARP message
sent by the destination hypervisor of the migrated VM and
modified by the SARP proxy of the east site to an IP-D -> MAC-E
mapping. Until ARP tables are updated, VMs from other sites
continue sending packets to MAC-W.
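
The mappings described for the origin, destination, and other
sites can be summarized in a small Python sketch. The function
name and data layout are hypothetical. The sketch assumes,
following Sections 3.1.1 and 3.1.3, that hosts in the VM's own
access segment see its real MAC address, while hosts elsewhere
see the MAC address of the SARP proxy of the VM's site.

```python
def mapping_seen_at(observer_site, vm_site, vm_mac, proxy_macs):
    """MAC that hosts at observer_site associate with a VM at vm_site."""
    if observer_site == vm_site:
        return vm_mac                  # same access segment: no rewriting
    return proxy_macs[vm_site]         # remote: the VM's site proxy MAC


proxy_macs = {"west": "MAC-W", "east": "MAC-E"}
sites = ["west", "east", "other"]

# Cache contents while the VM (IP-D, MAC-D) is still at the west site.
before = {s: mapping_seen_at(s, "west", "MAC-D", proxy_macs) for s in sites}
# Cache contents once every site has processed the Gratuitous ARP/UNA
# sent after the move to the east site (or has aged out and re-resolved).
after = {s: mapping_seen_at(s, "east", "MAC-D", proxy_macs) for s in sites}

# Sites whose entry for IP-D is stale until it is refreshed.
stale = [s for s in sites if before[s] != after[s]]
print(before, after, stale)
```

Under these assumptions every site holds a stale entry for IP-D
immediately after the migration, which is why the text above
relies on a Gratuitous ARP/UNA or on cache aging.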

3.3.2.1. Impact on IP<->MAC Mapping Cache Table of Migrated VMs

When a VM (IP-D) is moved from one site to another, its
IP<->MAC mapping entries for VMs located at other sites (i.e.,
neither the east site nor the west site) are still valid, even
though most guest OSs (or VMs) will refresh their IP<->MAC
cache after migration.

The migrated VM's IP<->MAC mapping entries for VMs located at
the east site, if not refreshed after migration, can be kept
with no change until the ARP aging time since they are mapped
to MAC-E. All traffic originated from the migrated VM in its
new location to VMs located at the east site traverses the
SARP proxy of the east site, which can redirect the traffic
back to the corresponding destinations on the east site.
Furthermore, an ARP/UNA sent by the SARP proxy of the east
site or by the VMs on the east site can refresh the
corresponding entries in the migrated VM's IP<->MAC cache.

The migrated VM's ARP entries for VMs located at the west site
remain unchanged until either the ARP entries age out or new
data frames are received from the remote sites. Since all MAC
addresses of the VMs located at the west site are unknown at
the east site, all unknown traffic from the VM is intercepted
by the SARP proxy of the east site and forwarded to the SARP
proxy of the west site (during the transient period before the
ARP entries age out). This transient behavior is avoided if
the SARP proxy has the destination IP address in its ARP
cache, and upon receiving a packet with an unknown destination
MAC address it can send a Gratuitous ARP/UNA to the migrated
VM.

Note that overlay networks providing Layer 2 network
virtualization services configure their edge device MAC aging
timers to be greater than the ARP request interval.

3.4. Multicast and Broadcast

Multicast and broadcast traffic is forwarded by SARP proxies
as follows:

o SARP proxies modify the source MAC address of multicast and
broadcast packets as described in Section 3.2.

o SARP proxies do not modify the destination MAC address of
multicast and broadcast packets.
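
The two rules above can be sketched in Python as follows. The
function names are hypothetical. The group-address test relies
on the IEEE 802 convention that the least significant bit of
the first octet is set for multicast and broadcast MAC
addresses; the example addresses are illustrative.

```python
def is_group_address(mac):
    """True for multicast and broadcast MACs (I/G bit of first octet)."""
    return int(mac.split(":")[0], 16) & 0x01 == 1


def rewrite_for_interconnect(dst_mac, src_mac, proxy_mac):
    """Sending-site proxy: return (dst, src) as sent between sites.

    The source MAC is always replaced by the proxy's own MAC; the
    destination MAC is left as received.
    """
    return dst_mac, proxy_mac


def rewrite_for_delivery(dst_mac, src_mac, host_mac):
    """Receiving-site proxy: return (dst, src) as delivered locally.

    A group destination is left untouched; a unicast destination
    (the proxy's own MAC) is replaced by the local host's MAC.
    """
    if is_group_address(dst_mac):
        return dst_mac, src_mac
    return host_mac, src_mac


broadcast = "ff:ff:ff:ff:ff:ff"
proxy_w = "02:00:00:00:00:0a"      # hypothetical MAC-W
proxy_e = "02:00:00:00:00:0e"      # hypothetical MAC-E
host_s = "02:00:00:00:00:01"       # hypothetical MAC-S
host_d = "02:00:00:00:00:02"       # hypothetical MAC-D

bcast_out = rewrite_for_interconnect(broadcast, host_s, proxy_w)
bcast_in = rewrite_for_delivery(*bcast_out, host_mac=host_d)
ucast_in = rewrite_for_delivery(proxy_e, proxy_w, host_mac=host_d)
```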

3.5. Non IP packet

The L2/L3 boundary routers in the current document are capable
of forwarding non-IP IEEE802.1 Ethernet frames (Layer 2)
without MAC header change. When subnets span across multiple
ports of those routers, they are still under the category of a
single link, or a multi-access link model recommended by
[RFC4903]. They differ from the "multi-link" subnets described
in [MultLinkSub] and [RFC4903], which refer to a different
physical media with the same prefix connected to a router,
where the Layer 2 frames cannot be natively forwarded without
header change.

3.6. High availability and load balancing

The SARP proxy is located at the boundary where the local
Layer 2 infrastructure connects to the interconnecting
network. All traffic from the local site to the remote sites
traverses the SARP proxy. The SARP proxy is subject to high
availability and bandwidth requirements.

The SARP architecture supports multiple SARP proxies
connecting a single site to the transport network. In the SARP
architecture all proxies can be active and can back up one
another. The SARP architecture is robust and allows network
administrators to allocate proxies according to bandwidth and
high availability requirements.

Traffic is segregated between SARP proxies by using VLANs. An
SARP proxy is the Master-SARP proxy of a set of VLANs and the
Backup-SARP proxy of another set of VLANs.

For example, assume the SARP proxies of the west site are SARP
proxy 1 and SARP proxy 2. The west site supports VLAN 1 and
VLAN 2 while SARP proxy 1 is the Master SARP proxy of VLAN 1
and the Backup proxy of VLAN 2 and SARP proxy 2 is the Master
SARP proxy of VLAN 2 and the Backup SARP proxy of VLAN 1. Both
proxies are members of VLAN 1 and VLAN 2.

The Master SARP proxy updates its Backup proxy with all the
ARP reply messages. The Backup SARP proxy maintains a backup
database for all the VLANs for which it is the Backup SARP
proxy.

The Master and the Backup SARP proxies maintain a keepalive
mechanism. In case of a failure the Backup proxy becomes the
Master SARP proxy. The failure decision is per VLAN. When the
Master and the Backup proxies switch over, the Backup SARP
proxy can use the MAC address of the Master SARP proxy. The
Backup SARP proxy locally sends a Gratuitous ARP message with
the MAC address of the Master SARP proxy to update the
forwarding tables on the local switches. The Backup SARP proxy
also updates the remote SARP proxies on the change.
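
The per-VLAN Master/Backup arrangement described above can be
sketched in Python as follows. The SiteProxies class and its
method names are hypothetical. The sketch models only the role
assignment, the backup database, and the per-VLAN switch-over;
the keepalive exchange, the Gratuitous ARP, and the
notification to remote proxies are not modeled.

```python
class SiteProxies:
    """Tracks which proxy acts as Master for each VLAN of one site."""

    def __init__(self, master_of, backup_of):
        self.master_of = dict(master_of)    # VLAN -> Master proxy name
        self.backup_of = dict(backup_of)    # VLAN -> Backup proxy name
        # Backup database, kept per VLAN: VLAN -> {IP: MAC}
        self.backup_db = {vlan: {} for vlan in self.master_of}

    def sync_reply(self, vlan, ip, mac):
        """Master passes a learned ARP reply to the VLAN's Backup."""
        self.backup_db[vlan][ip] = mac

    def fail(self, proxy):
        """Promote the Backup for every VLAN whose Master has failed.

        Returns the list of VLANs that switched over.
        """
        switched = []
        for vlan, master in list(self.master_of.items()):
            if master == proxy:
                self.master_of[vlan] = self.backup_of[vlan]
                switched.append(vlan)
        return switched


# The west-site example from the text: two proxies, two VLANs.
west = SiteProxies(master_of={1: "proxy1", 2: "proxy2"},
                   backup_of={1: "proxy2", 2: "proxy1"})
west.sync_reply(1, "IP-D", "MAC-E")
switched = west.fail("proxy1")
```

After the failure of proxy 1 in this model, proxy 2 is Master
for both VLANs and already holds the mappings that were synced
for VLAN 1.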

3.7. SARP Interaction with Overlay networks

SARP can be used over overlay networks that provide L2 network
virtualization (such as IP, VPLS, TRILL, OTV, NVGRE, and
VXLAN). The mapping of SARP to overlay networks is
straightforward: the VM maps the destination IP address to the
SARP proxy MAC address, and the overlay network maps the proxy
MAC address to the correct tunnel.

SARP significantly reduces the complexity of the overlay
networks and transport networks by reducing the number of
mapping table entries to the number of SARP proxies.
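
The two-step mapping can be illustrated with a small Python
sketch. The addresses, site names, and tunnel names below are
hypothetical sample data, and the two helper functions are
assumptions made for this example.

```python
# Hypothetical sample data: remote hosts and the site each one lives in.
REMOTE_HOSTS = {
    "10.0.0.11": "east",
    "10.0.0.12": "east",
    "10.0.0.21": "north",
    "10.0.0.22": "north",
    "10.0.0.23": "north",
}

# One SARP proxy MAC address per remote site (illustrative values).
PROXY_MAC = {
    "east": "02:00:00:00:0e:01",
    "north": "02:00:00:00:0a:01",
}

# Overlay table: keyed by proxy MAC address, not by host MAC address.
TUNNEL = {
    "02:00:00:00:0e:01": "tunnel-east",
    "02:00:00:00:0a:01": "tunnel-north",
}


def resolve(dst_ip):
    """Step 1 (VM): destination IP address -> SARP proxy MAC address."""
    return PROXY_MAC[REMOTE_HOSTS[dst_ip]]


def tunnel_for(dst_ip):
    """Step 2 (overlay): SARP proxy MAC address -> tunnel."""
    return TUNNEL[resolve(dst_ip)]


if __name__ == "__main__":
    for ip in REMOTE_HOSTS:
        print(ip, resolve(ip), tunnel_for(ip))
    print("overlay entries:", len(TUNNEL), "remote hosts:", len(REMOTE_HOSTS))
```

In this sample the overlay table holds one entry per remote
proxy regardless of how many hosts sit behind each proxy.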

4. Security Considerations

SARP proxies are located at the boundaries of access networks,
where the local Layer 2 infrastructure connects to its Layer 2
cloud. SARP proxies interoperate with overlay network
protocols that extend the Layer 2 subnet across data centers
or between different systems within a data center.

SARP does not expose the network to security threats beyond
those that already exist in the absence of SARP.

SARP proxies may be exposed to Denial of Service (DoS) attacks
by means of ARP/ND message flooding. Thus, SARP proxies must
have sufficient resources to support the SARP control plane
without making the network more vulnerable to DoS than without
SARP proxies.

SARP improves data-plane security with respect to network
reconnaissance by hiding all the local Layer 2 MAC addresses
from potential attackers located in the interconnecting
network and by significantly limiting the number of addresses
exposed to an attacker at a remote site.

5. IANA Considerations

There are no IANA actions required by this document.

RFC Editor: please delete this section before publication.

6. References

6.1. Normative References

[ARP] Plummer, D., "An Ethernet Address Resolution
Protocol", RFC 826, November 1982.

[ND] Narten, T., Nordmark, E., Simpson, W., and H.
Soliman, "Neighbor Discovery for IP version 6
(IPv6)", RFC 4861, September 2007.

[ProxyARP] Carl-Mitchell, S., Quarterman, J., "Using ARP to
Implement Transparent Subnet Gateways", RFC
1027, October 1987.

[RFC4389] Thaler, D., Talwar, M., Patel, C., "Neighbor
Discovery Proxies (ND Proxy)", RFC 4389, April
2006.

[RFC925] Postel, J., "Multi-LAN Address Resolution", RFC
925, October 1984.

[RFC4541] Christensen, M., et al., "Considerations for
Internet Group Management Protocol (IGMP) and
Multicast Listener Discovery (MLD) Snooping
Switches", RFC 4541, May 2006.

[RFC4664] Andersson, L., et al., "Framework for Layer 2
Virtual Private Networks (L2VPNs)", RFC 4664,
September 2006.

[RFC6575] Shah, H., et al., "Address Resolution Protocol
(ARP) Mediation for IP Interworking of Layer 2
VPNs", RFC 6575, June 2012.

6.2. Informative References

[802.1Q] IEEE, "IEEE Standard for Local and metropolitan
area networks -- Bridges and Bridged Networks",
IEEE Std 802.1Q, December 2014.

[RFC6820] Narten, T., Karir, M., Foo, I., "Address
Resolution Problems in Large Data Center
Networks", RFC 6820, January 2013.

[ARMDStats] Karir, M., Rees, J., "Address Resolution
Statistics", draft-karir-armd-statistics-01
(expired), July 2011.

[RFC7364] Narten, T., Gray, E., Black, D., Fang, L.,
Kreeger, L., Napierala, M., "Problem Statement:
Overlays for Network Virtualization", RFC 7364,
October 2014.

[RFC4903] Thaler, D., "Multilink Subnet Issues", RFC 4903,
June 2007.

[MultLinkSub] Thaler, D., Huitema, C., "Multi-link Subnet
Support in IPv6", draft-ietf-ipv6-multi-link-
subnets-00 (expired), June 2002.

[TcpIp] W. Stevens, "TCP/IP Illustrated, Volume 1: The
Protocols", Addison-Wesley, 1994.

7. Acknowledgments

The authors thank Ted Lemon, Eric Gray and Adrian Farrel for
providing valuable comments and suggestions to the draft.

Authors' Addresses

Youval Nachum
Email: youval.nachum@gmail.com

Linda Dunbar
Huawei Technologies
5430 Legacy Drive, Suite #175
Plano, TX 75024, USA
Phone: (469) 277 5840
Email: ldunbar@huawei.com

Ilan Yerushalmi
Marvell
6 Hamada St.
Yokneam, 20692 Israel
Email: yilan@marvell.com

Tal Mizrahi
Marvell
6 Hamada St.
Yokneam, 20692 Israel
Email: talmi@marvell.com
