Internet DRAFT - draft-nandy-utkarsh-pim-mcast-path-mtu
INTERNET-DRAFT Tathagata Nandy
Intended Status: Proposed Standard HPE
Utkarsh Srivastava
HPE
Expires: 18 July 2021 January 18, 2021
Multicast Path MTU
draft-nandy-utkarsh-pim-mcast-path-mtu-00
Abstract
Path MTU discovery (RFC 1191) is a standard technique to determine
the supported MTU between two Internet Protocol (IP) hosts in order
to avoid fragmentation. In a multicast distribution tree, the source
does not know where the receivers are located, so the technique used
to compute the path MTU for a unicast stream does not work in a
multicast network. This document describes a method to discover the
multicast path MTU, with the goal of avoiding the traffic loss that
multicast streams suffer because of incorrect MTU settings and the
lack of path MTU support in multicast networks.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-Drafts
as reference material or to cite them other than as "work in
progress."
This Internet-Draft will expire on 18 July 2021.
Copyright Notice
Copyright (c) 2020 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this
document. Please review these documents carefully, as they
describe your rights and restrictions with respect to this
document. Code Components extracted from this document must include
Simplified BSD License text as described in Section 4.e of the
Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Tathagata, et al. Expires 12 June 2021 [Page 1]
Internet-Draft PIM Multicast Path MTU December 2020
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 3
2. Conventions used in this document . . . . . . . . . . . . . 3
3. Problem Statement . . . . . . . . . . . . . . . . . . . . . 4
4. Multicast Path MTU . . . . . . . . . . . . . . . . . . . . 5
5. IANA Considerations . . . . . . . . . . . . . . . . . . . . 6
6. Security Considerations . . . . . . . . . . . . . . . . . . 6
7. References . . . . . . . . . . . . . . . . . . . . . . . . 7
7.1. Normative References . . . . . . . . . . . . . . . . . 7
7.2. Informative References . . . . . . . . . . . . . . . . 7
8. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . 8
9. Authors' Addresses . . . . . . . . . . . . . . . . . . . . 8
1. Introduction
When one IP host has a large amount of data to send to another
host, the data is transmitted as a series of IP datagrams. It is
usually preferable that these datagrams be of the largest size that
does not require fragmentation anywhere along the path from the
source to the destination. (For the case against fragmentation,
see [5].) This datagram size is referred to as the Path MTU (PMTU),
and it is equal to the minimum of the MTUs of each hop in the path.
For unicast paths, [1] provides a standard mechanism for a host to
discover the PMTU; no equivalent mechanism exists for multicast
paths.

Note: The Path MTU is what RFC 1122 calls the "Effective MTU for
sending" (EMTU_S). A PMTU is associated with a path, which is a
particular combination of IP source and destination address and
perhaps a Type-of-Service (TOS). In the absence of PMTU discovery,
the practice described in RFC 1122 is to use the lesser of 576 and
the first-hop MTU as the PMTU for any destination that is not
connected to the same network or subnet as the source.

In computer networking, multicast is group communication where data
transmission is addressed to a group of destination computers
simultaneously. Multicast can be one-to-many or many-to-many
distribution. Multicast should not be confused with physical layer
point-to-multipoint communication.

Ethernet frames with a value of 1 in the least-significant bit of
the first octet of the destination address are treated as multicast
frames and are flooded to all points on the network. This mechanism
constitutes multicast at the data link layer, and it is used by IP
multicast to achieve one-to-many transmission for IP on Ethernet
networks. Modern Ethernet controllers filter received packets to
reduce CPU load by looking up the hash of a multicast destination
address in a table, initialized by software, which controls whether
a multicast packet is dropped or fully received.

IP multicast is a technique for one-to-many communication over an
IP network. The destination nodes send Internet Group Management
Protocol (IGMP) join and leave messages, for example in the case of
IPTV when the user changes from one TV channel to another.
Multicast uses network infrastructure efficiently by requiring the
source to send a packet only once, even if it needs to be delivered
to a large number of receivers. The nodes in the network take care
of replicating the packet to reach multiple receivers only when
necessary.
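The two definitions used in this section can be illustrated with a
short sketch: the Path MTU as the minimum over the hop MTUs, and the
Ethernet multicast test on the least-significant bit of the first
octet. The function names and the example hop MTUs and addresses are
hypothetical and chosen only for illustration.

```python
def path_mtu(hop_mtus):
    """Return the Path MTU: the minimum MTU over all hops on the path."""
    if not hop_mtus:
        raise ValueError("a path needs at least one hop")
    return min(hop_mtus)


def is_ethernet_multicast(mac):
    """True if the least-significant bit of the first octet is 1."""
    first_octet = int(mac.split(":")[0], 16)
    return (first_octet & 0x01) == 1


# Hypothetical path: two 1500-byte LANs joined by a 1400-byte hop.
print(path_mtu([1500, 1400, 1500]))                # 1400
print(is_ethernet_multicast("01:00:5e:00:00:01"))  # True
print(is_ethernet_multicast("00:1a:2b:3c:4d:5e"))  # False
```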
2. Conventions used in this document
2.1. Terminology
The reader is assumed to be familiar with the terminology,
reference models, and taxonomy defined in [RFC4664] and [RFC4665].
For readability purposes, we repeat some of the terms here.
Moreover, we also propose some other terms needed when IP multicast
support is discussed.
Multicast domain
An area in which multicast data is transmitted. In this
document, this term has a generic meaning that can refer to
Layer-2 and Layer-3. Generally, the Layer-3 multicast domain is
determined by the Layer-3 multicast protocol used to establish
reachability between all potential receivers in the
corresponding domain. The Layer-2 multicast domain can be the
same as the Layer-2 broadcast domain (i.e., VLAN), but it may be
restricted to being smaller than the Layer-2 broadcast domain if
an additional control protocol is used.
PIM-SM
Protocol Independent Multicast Sparse Mode (PIM-SM) is a
multicast routing protocol for Internet Protocol (IP) networks
that provides one-to-many and many-to-many distribution of data
over a LAN, WAN or the Internet. It explicitly builds
unidirectional shared trees rooted at a rendezvous point (RP)
per group, and optionally creates shortest-path trees per
source. PIM-SM uses shared trees by default and implements
source-based trees for efficiency; it assumes that no hosts want
the multicast traffic unless they specifically ask for it.
Senders first send the multicast data to the RP, which in turn
sends the data down the shared tree to the receivers.
PIM-SSM
PIM source-specific multicast (PIM-SSM) uses a subset of PIM
sparse mode and IGMP version 3 (IGMPv3) to allow a client to
receive multicast traffic directly from the source. PIM-SSM uses
the PIM sparse-mode functionality to create a shortest-path tree
(SPT) between the receiver and the source, but builds the SPT
without the help of an RP.
2.2. Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
this document are to be interpreted as described in [RFC2119].
3. Problem Statement
3.1. Motivation
Path MTU discovery computes the lowest MTU supported between two
hosts to avoid IP fragmentation. For a unicast packet, the source
device sends out a packet with the Don't Fragment (DF) flag bit set
in the IP header [1]. Any device along the path whose MTU is
smaller than the packet will drop the packet and send back an ICMP
Destination Unreachable message (Type 3, Code 4, "fragmentation
needed and DF set") containing its MTU, allowing the source host to
reduce its Path MTU appropriately. (The ICMPv6 equivalent is the
Packet Too Big message, Type 2.) The process is repeated until the
MTU is small enough to traverse the entire path without
fragmentation. In a multicast distribution tree, the source does
not know the receiving hosts of a multicast group until the
complete multicast tree is built. Hosts in different branches of
the tree use IGMP/MLD followed by PIM to become part of the
multicast tree. Generally, the process starts at the host, which
sends a request to become part of a multicast tree through IGMP
joins. The same request is sent to the RP, and thereby the source
and the group develop a common path. So the technique mentioned
above may not work for multicast flows.
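The unicast discovery loop described above can be sketched as a small
simulation. This is a minimal illustration and not an implementation
of [1]: the path is modeled as a list of hop MTUs, and the helper
names send_with_df and discover_path_mtu are hypothetical.

```python
def send_with_df(packet_size, hop_mtus):
    """Simulate sending one packet with the DF bit set along a path.

    Returns None if the packet reaches the destination, otherwise the
    MTU reported by the first hop that cannot forward it.
    """
    for mtu in hop_mtus:
        if packet_size > mtu:
            return mtu  # this hop drops the packet and reports its MTU
    return None


def discover_path_mtu(first_hop_mtu, hop_mtus):
    """Repeat the probe until a packet crosses the whole path."""
    size = first_hop_mtu
    while True:
        reported = send_with_df(size, hop_mtus)
        if reported is None:
            return size
        size = reported  # retry with the smaller, reported MTU


# Hypothetical four-hop path.
print(discover_path_mtu(1500, [1500, 1400, 1280, 1500]))  # 1280
```

The loop needs a known destination to probe, which is what a
multicast source lacks.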
3.2. Scalability
Most routers do not send ICMP (unreachable; fragmentation needed)
messages in response to too-big IPv4 multicast packets with the DF
bit set. They silently drop these packets, breaking PMTUD. This
behavior is by design: Section 7.2 of RFC 1112 specifies that an
ICMP error message (Destination Unreachable, Time Exceeded,
Parameter Problem, Source Quench, or Redirect) is never generated
in response to a datagram destined to an IP host group. The same
document also describes why RFC 1112 prohibits sending ICMP error
messages in response to multicast datagrams. The processing done on
ICMP error replies by the *nix socket API might block the sender
socket if an error comes back from a single receiver or if the TTL
expires while traversing a particularly long branch of the
multicast tree, which is undesirable in a multicast environment.
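The RFC 1112 rule quoted above amounts to a single check on the
destination address before an ICMP error is generated. A minimal
sketch, using Python's standard ipaddress module; the function name
may_send_icmp_error is hypothetical, and other cases in which ICMP
errors are suppressed are not modeled.

```python
import ipaddress


def may_send_icmp_error(destination):
    """Return False for datagrams destined to an IP host group."""
    return not ipaddress.ip_address(destination).is_multicast


print(may_send_icmp_error("192.0.2.10"))  # True
print(may_send_icmp_error("232.1.1.1"))   # False
```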
4. Multicast Path MTU
The path MTU of a multicast stream between a source and the hosts
of a particular group is discovered as follows.
1. The router connected to the sender device periodically sends
probe messages to a well-known multicast group that falls in the
PIM-SSM range. The probe packets are small packets whose
destination IP address falls in the SSM group range. This
address should be reserved and should not be used for any other
regular multicast stream.
2. The probe packets are different from the actual packets that
the source is sending. This algorithm runs on the routers and
not on the actual source sending the stream.
3. The receiver routers also run periodic probing toward the
source(s). As part of the probe, the receiving routers run the
Path MTU protocol toward the source device. PMTU discovery runs
only for active sources, that is, when the receiver routers
receive the probe packets. This is why the sender device needs
to send periodic probe packets.
4. This is performed at all the receiver routers (Designated
Routers). All these receiver routers also use the same source,
which is specifically reserved for PMTU computation. This is
the PIM-SSM source for the specified group.
5. There are two options: either the receiver router (the
host-connected DR) itself sends a PIM Join for these groups to
the sources, or it acts on receiving an IGMPv3 join. In the
latter case, the host device needs to send IGMPv3 joins to the
sources for computing the Path MTU.
6. The receiver DR (host-connected) computes the PMTU to the
source by sending probe packets of different sizes.
7. Once the receiver router has computed the PMTU to the
source-connected DR, the PMTU is sent to the source router via
a new option in the PIM Join packet or a new type of PIM
packet. A new ICMP packet is not chosen for this because the
algorithm is meant to run inside the PIM application.
8. Once the source-connected Designated Router receives the PMTU
for all the connected paths, it computes the minimum MTU and
sends it back to the source device. This relieves the source
device of the computation. The source device gets the periodic
MTU update from the routers and should never send a packet
larger than this MTU. The assumption is that a TCP/IP stack
with ICMP support is implemented in all the sources, so that
they can handle the ICMP packets internally.
9. The probe packets sent by the sender device can be sent at a
reduced frequency to prevent congestion.
10. The receiver router can keep sending the probe packets as long
as it has a host interested in the group.
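Steps 6 and 8 of the procedure above can be sketched as follows.
This is only an illustration under stated assumptions: the probe is
abstracted as a hypothetical callback probe_fits(size) that reports
whether a probe of that size was delivered; a binary search is one
possible way to try "different sizes" and is not mandated by this
document; the 1500-octet upper bound, the router names, and the hop
MTUs are invented. The IPv4 SSM range 232.0.0.0/8 is taken from RFC
4607; this document does not name a specific probe group. The PIM
encoding of step 7 is not shown because it is not defined here.

```python
import ipaddress

SSM_RANGE = ipaddress.ip_network("232.0.0.0/8")  # IPv4 SSM range, RFC 4607
MIN_MTU = 68  # lowest PMTU estimate permitted by RFC 1191


def in_ssm_range(group):
    """Check that a candidate probe group lies in the IPv4 SSM range."""
    return ipaddress.ip_address(group) in SSM_RANGE


def probe_pmtu(probe_fits, low=MIN_MTU, high=1500):
    """Step 6: search for the largest probe size that is delivered.

    probe_fits(size) is a hypothetical callback that sends one probe
    of the given size and returns True if it was delivered.
    """
    if not probe_fits(low):
        raise RuntimeError("even the smallest probe was dropped")
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop terminates
        if probe_fits(mid):
            low = mid
        else:
            high = mid - 1
    return low


def aggregate_pmtu(reports):
    """Step 8: the source-connected DR takes the minimum of all reports."""
    return min(reports.values())


# Hypothetical hop MTUs from each receiver DR toward the source.
paths = {"dr-a": [1500, 1400], "dr-b": [1500, 1500], "dr-c": [9000, 1280]}
reports = {
    dr: probe_pmtu(lambda size, hops=hops: size <= min(hops))
    for dr, hops in paths.items()
}
print(in_ssm_range("232.1.1.1"))  # True
print(reports)                    # {'dr-a': 1400, 'dr-b': 1500, 'dr-c': 1280}
print(aggregate_pmtu(reports))    # 1280
```

In this sketch the source would be told to send no packet larger
than 1280 octets, the smallest value reported by any receiver DR.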
5. IANA Considerations
This memo includes no request to IANA.
6. Security Considerations
This Path MTU Discovery mechanism makes possible two
denial-of-service attacks, both based on a malicious party sending
false Datagram Too Big messages to an Internet host. In the first
attack, the false message indicates a PMTU much smaller than
reality. This should not entirely stop data flow, since the victim
host should never set its PMTU estimate below the absolute minimum,
but at 8 octets of IP data per datagram, progress could be slow.
In the other attack, the false message indicates a PMTU greater
than reality. If believed, this could cause temporary blockage as
the victim sends datagrams that will be dropped by some router.
Within one round-trip time, the host would discover its mistake
(receiving Datagram Too Big messages from that router), but
frequent repetition of this attack could cause lots of datagrams to
be dropped. A host, however, should never raise its estimate of the
PMTU based on a Datagram Too Big message, so should not be
vulnerable to this attack. A malicious party could also cause
problems if it could stop a victim from receiving legitimate
Datagram Too Big messages, but in this case there are simpler
denial-of-service attacks available. In another case, if the
packets are always rejected because they exceed the MTU and the
sender does not change the packet size or the administrator does
not adjust the MTU, there is a risk of a DoS attack on the switch
sending the ICMP error packets. Multicast packets sent at a high
rate can consume the CPU resources of all the routers implementing
PMTU for multicast.
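The two host-side safeguards mentioned above (never raising the
estimate on a Datagram Too Big message, and never lowering it below
the absolute minimum) can be sketched as a single update rule. The
68-octet floor is the minimum stated in [1]; the function name
apply_too_big is hypothetical.

```python
MIN_PMTU = 68  # RFC 1191: never reduce the estimate below 68 octets


def apply_too_big(current_pmtu, reported_mtu):
    """Update a PMTU estimate from a Datagram Too Big message."""
    if reported_mtu >= current_pmtu:
        return current_pmtu  # never raise the estimate on such a message
    return max(reported_mtu, MIN_PMTU)


print(apply_too_big(1500, 1400))  # 1400
print(apply_too_big(1500, 9000))  # 1500
print(apply_too_big(1500, 8))     # 68
```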
7. References
7.1. Normative References
[1] Mogul, J. and S. Deering, "Path MTU Discovery", RFC 1191,
November 1990.
[2] Postel, J., "Internet Control Message Protocol", RFC 792,
September 1981.
7.2. Informative References
[3] <https://blog.ipspace.net/2015/09/
path-mtu-discovery-doesnt-work-with-ip.html>
[4] <https://en.wikipedia.org/wiki/Multicast>
[5] <https://www.cisco.com/c/en/us/products/collateral/
ios-nx-os-software/ip-multicast/whitepaper_c11-508498.html>
8. Acknowledgments
The authors thank the contributors of [RFC1191] and [RFC5501], since
the structure and content of this document were, for some sections,
largely inspired by them. The authors also thank Mark Pearson and
others for their valuable reviews and feedback.

THIS SOFTWARE IS
PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.
9. Authors' Addresses
Tathagata Nandy
Hewlett Packard India Software Operations Pvt. Ltd.
Survey # 192, Whitefield Road,
Mahadevapura Post, Bangalore 560048. India
Phone: (+91) 9611895857
EMail: tathagata.nandy@hpe.com
Utkarsh Srivastava
Hewlett Packard India Software Operations Pvt. Ltd.
Survey # 192, Whitefield Road,
Mahadevapura Post, Bangalore 560048. India
Phone: (+91) 8948794936
EMail: usrivastava@hpe.com