Internet DRAFT - draft-ietf-bess-weighted-hrw
draft-ietf-bess-weighted-hrw
BESS Working Group S R. Mohanty
Internet-Draft M. Misra
Intended status: Standards Track A. Lindem
Expires: 26 August 2023 A. Sajassi
Cisco Systems, Inc.
J. Drake
Juniper Networks, Inc.
22 February 2023
Weighted HRW and its applications
draft-ietf-bess-weighted-hrw-00
Abstract
Rendezvous Hashing also known as Highest Random Weight (HRW) has been
used in many load balancing applications where the central problem is
how to map an object to as server such that the mapping is uniform
and also minimally affected by the change in the server set.
Recently, it has found use in DF election algorithms in the EVPN
context and load balancing using DMZ. This draft deals with the
problem of achieving load balancing with minimal disruption when the
servers have different weights. It provides an algorithm to do so
and also describes a few use-case scenarios where this algorithmic
technique can apply.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 26 August 2023.
Copyright Notice
Copyright (c) 2023 IETF Trust and the persons identified as the
document authors. All rights reserved.
Mohanty, et al. Expires 26 August 2023 [Page 1]
Internet-Draft Weighted HRW and its Applications February 2023
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components
extracted from this document must include Revised BSD License text as
described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Revised BSD License.
Table of Contents
1. Requirements Language . . . . . . . . . . . . . . . . . . . . 2
2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
3. HRW Introduction . . . . . . . . . . . . . . . . . . . . . . 3
4. HRW with weights . . . . . . . . . . . . . . . . . . . . . . 4
5. HRW and Consistent Hashing . . . . . . . . . . . . . . . . . 5
6. Weighted HRW and its application to the EVPN DF Election . . 5
7. Weighted HRW and its application to Resilient Hashing . . . . 7
8. Weighted HRW and its application to Multicast DR Election . . 8
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 8
10. Operational Considerations . . . . . . . . . . . . . . . . . 8
11. Security Considerations . . . . . . . . . . . . . . . . . . . 8
12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 8
13. References . . . . . . . . . . . . . . . . . . . . . . . . . 8
13.1. Normative References . . . . . . . . . . . . . . . . . . 8
13.2. Informative References . . . . . . . . . . . . . . . . . 9
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 10
1. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
2. Introduction
Given an object O, a set of servers and a set of clients, a
fundamental problem is how do the set of clients, independently and
unanimously agree in a distributed framework, which server to assign
O? This is the distributed hash table problem. The assignment
should be "minimally disruptive" which means that there should be a
minimal remapping of objects whenever a server is down or a new
server comes up or the object set changes. This is a very common
problem in practice in the Internet load balancing and web caching as
described in the 'Akamai' paper [CHASH], database [DYNAMODB] and
networking context.
Mohanty, et al. Expires 26 August 2023 [Page 2]
Internet-Draft Weighted HRW and its Applications February 2023
+----+ +----+ +----+ +-----+
| | | | | | | |
| S0 | | S1 | | S2 | | Sn |
| | | | | | | |
+----+ +----+ +----+ +-----+
| | | |
| | | |
| | | |
| | | |
| | | |
+----------+----------+-------------|
O0, O1, O2 ... ON
Set of Objects need to be assigned to the set of servers.
All the servers are of same capacities
Figure 1 The object to server assignment problem
Figure 1
In the Fig 1, we show a set of servers, S0,..,Sn and object pool
O0,..,On and the requirement is to assign Oi to Sj such that the
servers are uniformly loaded. In addition, when any server goes down
or a new one is introduced, there should be minimal reassignments.
There are two standard techniques to address this problem.
1. Consistent Hashing
2. Rendezvous Hashing
3. HRW Introduction
Highest Random Weight (HRW) as defined in [HRW1999]is originally
proposed in the context of Internet Caching and proxy Server load
balancing. Given an object name and a set of servers, HRW maps a
request to a server using the object-id (Oi) and server-id(Sj) rather
than the state of the server states. HRW computes a hash, Hash(Oi,
Sj) from the server-id and the object-id; this hash value can be
considered as a score, and forms an ordered list of the servers based
on the hash value (i.e. score) in decreasing order. The server for
which the score is the highest, serves as the primary responsible for
that particular object, and the server with the next highest score
serves as the backup server. HRW always maps a given object object
Mohanty, et al. Expires 26 August 2023 [Page 3]
Internet-Draft Weighted HRW and its Applications February 2023
name to the same server within a given cluster; consequently it can
be used at client sites to achieve global consensus on object-server
mappings. When that server goes down, the backup server becomes the
responsible designate.
Choosing an appropriate hash function that is statistically oblivious
to the key distribution and imparts a good uniform distribution of
the hash output is an important aspect of the algorithm. The
original HRW [HRW1999] provides pseudorandom functions based on Unix
utilities rand and srand and easily constructed XOR functions that
perform considerably well. Any good uniform hash function like the
Jenkins hash for instance will also work. HRW already finds use in
multicast and ECMP [RFC2991],[RFC2992].
4. HRW with weights
The issue when the servers are not of the same capacity is also quite
a common problem. However this problem has not gained as much
attention as it should. In such a case, an obvious approach is to
take the normalized weight factor into account, fi=wi/Sum(wi)and
multiply the Hash(Oi, Sj) with that value i.e. the value fi*Hash(Oi,
Sj). The Cache Array Routing Protocol [CARP] used this method.
However there is a problem with this approach, since any change in
weight of any of the servers, will result in a change in the
normalized weights for everyone. This will necessitate re-computing
all the weighted hash values all over again. Therefore this approach
does not have the minimal disruption property of the HRW. We address
this issue of the weighted HRW with minimal disruption in this draft.
Instead of re-normalizing the weights, or, in other words relatively
scaling them, the approach taken by [WHRW] is to adjust the score
before weighing them. When a server is added, removed or modified
(its weight changes), only the score for that server changes. That
server may win or lose some objects. Other servers remain affected.
There is no needless transfer of objects between servers whose weight
did not change. [WHRW] uses a clever way to accomplish this by
defining the score function as:
1. Score(Oi, Sj) = -wi/log(Hash(Oi, Sj)/Hmax); where Hmax is the
maximum hash value.
The author provides a mathematical proof as to why this choice of the
Score function works with very mild assumptions on the probability
distribution of the hash function.
Mohanty, et al. Expires 26 August 2023 [Page 4]
Internet-Draft Weighted HRW and its Applications February 2023
+----+ +----+ +----+ +----+
| | | | | | | |
| S0 | | S1 | | S2 |-------| Sn |
| w0 | | w1 | | w2 | | wn |
+----+ +----+ +----+ +----+
| | | |
| | | |
| | | |
| | | |
| | | |
+----------+----------+-------------|
O1, O2 ... ON
Set of Objects need to be assigned to the set of servers.
Each server is now associated with a weight
Figure 1 The object to server assignment problem
Figure 2
5. HRW and Consistent Hashing
HRW is not the only algorithm that addresses the object to server
mapping problem with goals of fair load distribution, redundancy and
fast access. There is another family of algorithms that also
addresses this problem; these fall under the umbrella of the
Consistent Hashing Algorithms [CHASH]. These will not be considered
here.
6. Weighted HRW and its application to the EVPN DF Election
The notion and need for the Designated Forwarder is described in
[RFC7432]. Consider a CE that is a host or a router that is multi-
homed directly to more than one PE in an EVPN instance on a given
Ethernet segment. One or more Ethernet Tags may be configured on the
Ethernet segment. In this scenario only one of the PEs, referred to
as the Designated Forwarder (DF), is responsible for certain actions:
a. Sending multicast and broadcast traffic, on a given Ethernet Tag
on a particular Ethernet segment, to the CE.
Mohanty, et al. Expires 26 August 2023 [Page 5]
Internet-Draft Weighted HRW and its Applications February 2023
b. Flooding unknown unicast traffic (i.e. traffic for which an PE
does not know the destination MAC address), on a given Ethernet
Tag on a particular Ethernet segment to the CE, if the
environment requires flooding of unknown unicast traffic.
+---------------+
| IP/MPLS |
| CORE |
+----+ ES1 +----+ +----+
| CE1|-----| |-----------| |____ES2
+----+ | PE1| | PE2| \
| |-------- +----+ \+----+
+----+ | | | CE2|
| | +----+ /+----+
| |__| |____/ |
| | PE3| ES2 /
| +----+ /
| | /
+-------------+----+ /
| PE4|____/ES2
| |
+----+
Figure 3 Multi-homing Network of EVPN
Figure 3
Figure 3 illustrates a case where there are two Ethernet Segments,
ES1 and ES2. PE1 is attached to CE1 via Ethernet Segment ES1 whereas
PE2, PE3 and PE4 are attached to CE2 via ES2 i.e. PE2, PE3 and PE4
form a redundancy group. Since CE2 is multi-homed to different PEs
on the same Ethernet Segment, it is necessary for PE2, PE3 and PE4 to
agree on a DF to satisfy the above mentioned requirements.
The use of HRW in the EVPN DF Election is described in [RFC8584]. In
that RFC it is explained how the HRW DF Election performs better than
the modulo DF Election algorithm in [RFC7432]. However, it is
implicitly assumed there that all the PEs are of the same capacity
(weights equal).
DMZ link bandwidth for load balancing flows across multiple EBGP
egress points is described in [I-D.ietf-idr-link-bandwidth]. It has
been extended to the case of cumulative DMZ load balancing
Mohanty, et al. Expires 26 August 2023 [Page 6]
Internet-Draft Weighted HRW and its Applications February 2023
[I-D.ietf-bess-ebgp-dmz] in the case of an all EBGP network in the
data center. [I-D.ietf-bess-evpn-unequal-lb] describes the use of
the DMZ in the EVPN DF Election. The argument is made that ideally
one should be able to change the link bandwidth in one or more of the
multi-homed PEs rather than have to change in all of the multi-homed
PEs simultaneously. The draft describes the bandwidth increments to
be taken into consideration and proposes an iterative way to assign
the score function. The description in Section 4.3.2 of
[I-D.ietf-bess-evpn-unequal-lb] is an non-optimal solution and
somewhat empirical. It does not obey the minimal disruption property
of the HRW.
In contrast to the procedures for weighted HRW in 4.3.2 of
[I-D.ietf-bess-evpn-unequal-lb], we can achieve an optimal solution
for weighted HRW in [I-D.ietf-bess-evpn-unequal-lb] using the score
function as described in Section 4 above and obviating the need to
take bandwidth increments. It is an order of magnitude faster and
efficient and minimally disruptive.
7. Weighted HRW and its application to Resilient Hashing
With the exponential increase in the number of physical links used in
data centers, there is also the potential for an increase in the
number of failed physical links. In systems that employ static
hashing for load balancing flows across members of port channels or
Equal Cost Multipath (ECMP) groups, each flow is hashed to a link.
When a link fails, all flows including those that were previously
mapped to the non-failed links are rehashed across the remaining
working links. This causes packet reordering of flows that were in
fact not mapped to the link that failed. A similar rehashing with
packet re-ordering also happens when a link is added to the port
channel or Equal Cost Multipath (ECMP) group. With the ever
increasing number of physical links used in the data centers there
the possibility for increasing number of failed links only increases.
Hence the resilient hashing is very important.
However when the links are not of the same speed, Resilient hashing
for ECMP does not apply per-se. However, one can use the method
explained in Section 4 to achieve resilient hashing even in the
Unequal Cost Multipath (UCMP)case or when member links are of
different bandwidths.
Mohanty, et al. Expires 26 August 2023 [Page 7]
Internet-Draft Weighted HRW and its Applications February 2023
8. Weighted HRW and its application to Multicast DR Election
[I-D.mankamana-pim-bdr]propose a mechanism to elect backup DR on a
shared LAN. A backup DR on LAN would be useful for faster
convergence. When the access bandwidth is different for the PIM
routers and we want to do a load balancing among the PIM routers for
DR/backup DR functionality with regards to the various (S,G) flow,
technique similar to Section 4 can be applied. The details of the
problem is out of the scope of the current draft and is being worked
on separately at this time.
9. IANA Considerations
A request needs to registered with IANA registry for the weighted HRW
EVPN DF Election Algorithm in the DF Alg field in the DF Election
Extended Community in [RFC8584].
10. Operational Considerations
TBD.
11. Security Considerations
This document raises no new security issues for EVPN.
12. Acknowledgements
The authors would like to thank Shyam Sethuram and Peter Psenak for
useful discussions related to this draft.
13. References
13.1. Normative References
[HRW1999] Thaler, D. and C. Ravishankar, "Using Name-Based Mappings
to Increase Hit Rates", IEEE/ACM Transactions in
networking Volume 6 Issue 1, February 1998.
[I-D.ietf-bess-evpn-unequal-lb]
Malhotra, N., Sajassi, A., Rabadan, J., Drake, J.,
Lingala, A., and S. Thoria, "Weighted Multi-Path
Procedures for EVPN Multi-Homing", Work in Progress,
Internet-Draft, draft-ietf-bess-evpn-unequal-lb-17, 28
November 2022, <https://datatracker.ietf.org/doc/html/
draft-ietf-bess-evpn-unequal-lb-17>.
Mohanty, et al. Expires 26 August 2023 [Page 8]
Internet-Draft Weighted HRW and its Applications February 2023
[I-D.ietf-idr-link-bandwidth]
Mohapatra, P. and R. Fernando, "BGP Link Bandwidth
Extended Community", Work in Progress, Internet-Draft,
draft-ietf-idr-link-bandwidth-07, 5 March 2018,
<https://datatracker.ietf.org/doc/html/draft-ietf-idr-
link-bandwidth-07>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>.
[RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February
2015, <https://www.rfc-editor.org/info/rfc7432>.
[RFC8584] Rabadan, J., Ed., Mohanty, S., Ed., Sajassi, A., Drake,
J., Nagaraj, K., and S. Sathappan, "Framework for Ethernet
VPN Designated Forwarder Election Extensibility",
RFC 8584, DOI 10.17487/RFC8584, April 2019,
<https://www.rfc-editor.org/info/rfc8584>.
[WHRW] Resch, J., "New hashing Algorithms for Data Storage",
Storage Developer Conference 18, November 2015.
13.2. Informative References
[CARP] Valloppillil, V. and K. Ross, "Cache Array Routing
Protocol v1.1", IEEE/ACM Transactions in networking Volume
6 Issue 1, February 1998.
[CHASH] Karger, D., Lehman, E., Leighton, T., Panigrahy, R.,
Levine, M., and D. Lewin, "Consistent Hashing and Random
Trees: Distributed Caching Protocols for Relieving Hot
Spots on the World Wide Web", ACM Symposium on Theory of
Computing ACM Press New York, May 1997.
[DYNAMODB] Decennia, G., Hastorun, D., Jampani, M., Kakulapati, G.,
Lakshman, A., Pilchin, A., Sivasubramanian, S., Vosshall,
P., and W. Vogels, "Dynamo: Amazon’s Highly Available Key-
value Store", SOSP 07, October 2007.
Mohanty, et al. Expires 26 August 2023 [Page 9]
Internet-Draft Weighted HRW and its Applications February 2023
[I-D.ietf-bess-ebgp-dmz]
satyamoh@cisco.com, Vayner, A., Gattani, A., Kini, A., and
J. Tantsura, "Cumulative DMZ Link Bandwidth and load-
balancing", Work in Progress, Internet-Draft, draft-ietf-
bess-ebgp-dmz-02, 4 January 2023,
<https://datatracker.ietf.org/doc/html/draft-ietf-bess-
ebgp-dmz-02>.
[I-D.mankamana-pim-bdr]
Mishra, M. P., Santhanam, S., Paramasivam, A., Goh, J.,
and G. S. Mishra, "PIM Backup Designated Router
Procedure", Work in Progress, Internet-Draft, draft-
mankamana-pim-bdr-05, 8 April 2021,
<https://datatracker.ietf.org/doc/html/draft-mankamana-
pim-bdr-05>.
[RFC2991] Thaler, D. and C. Hopps, "Multipath Issues in Unicast and
Multicast Next-Hop Selection", RFC 2991,
DOI 10.17487/RFC2991, November 2000,
<https://www.rfc-editor.org/info/rfc2991>.
[RFC2992] Hopps, C., "Analysis of an Equal-Cost Multi-Path
Algorithm", RFC 2992, DOI 10.17487/RFC2992, November 2000,
<https://www.rfc-editor.org/info/rfc2992>.
Authors' Addresses
Satya Ranjan Mohanty
Cisco Systems, Inc.
225 West Tasman Drive
San Jose, CA 95134
United States of America
Email: satyamoh@cisco.com
Mankamana Misra
Cisco Systems, Inc.
170 W. Tasman Drive
San Jose, CA 95134
United States of America
Email: mankamis@cisco.com
Acee Lindem
Cisco Systems, Inc.
170 West Tasman Drive
San Jose, CA 95134
United States of America
Mohanty, et al. Expires 26 August 2023 [Page 10]
Internet-Draft Weighted HRW and its Applications February 2023
Email: acee@cisco.com
Ali Sajassi
Cisco Systems, Inc.
170 West Tasman Drive
San Jose, CA 95134
United States of America
Email: sajassi@cisco.com
John Drake
Juniper Networks, Inc.
1194 N. Mathilda Drive
Sunnyvale, CA 94089
United States of America
Email: jdrake@juniper.net
Mohanty, et al. Expires 26 August 2023 [Page 11]