Internet-Draft | Performance and Liveness Monitoring in S | September 2020 |
Gandhi, et al. | Expires 30 March 2021 |
Segment Routing (SR) leverages the source routing paradigm. SR is applicable to both Multiprotocol Label Switching (SR-MPLS) and IPv6 (SRv6) data planes. This document defines a procedure for Enhanced Performance Delay and Liveness Monitoring (PDLM) in Segment Routing networks. The procedure leverages probe messages compatible with the delay measurement message formats defined in RFC 5357 (Two-Way Active Measurement Protocol (TWAMP)) and RFC 8762 (Simple Two-Way Active Measurement Protocol (STAMP)) and is applicable to end-to-end SR Paths, including SR Policies, for both SR-MPLS and SRv6 data planes.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 30 March 2021.¶
Copyright (c) 2020 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.¶
Segment Routing (SR) leverages the source routing paradigm and greatly simplifies network operations for Software Defined Networks (SDNs). SR is applicable to both Multiprotocol Label Switching (SR-MPLS) and IPv6 (SRv6) data planes [RFC8402]. SR takes advantage of the Equal-Cost Multipaths (ECMPs) between source and transit nodes, between transit nodes, and between transit and destination nodes. SR Policies as defined in [I-D.ietf-spring-segment-routing-policy] are used to steer traffic through specific, user-defined paths using a stack of Segments. Built-in Performance Delay Measurement (DM) as well as Liveness Monitoring for Connectivity Verification (CV) and Continuity Check (CC) are essential requirements for providing Service Level Agreements (SLAs) in SR networks.¶
The One-Way Active Measurement Protocol (OWAMP) defined in [RFC4656] and the Two-Way Active Measurement Protocol (TWAMP) defined in [RFC5357] provide capabilities for the measurement of various performance metrics in IP networks using probe messages. TWAMP Light [Appendix I in RFC5357] and the Simple Two-way Active Measurement Protocol (STAMP) [RFC8762] provide simplified mechanisms for active performance measurement in IP networks, alleviating the need for control-channel signaling by using a configuration data model to provision a test channel.¶
[I-D.gandhi-spring-twamp-srpm] defines a procedure for performance measurement using TWAMP Light messages with user-defined IP/UDP paths in SR networks. [I-D.gandhi-spring-stamp-srpm] defines a similar procedure using STAMP messages in SR networks. The procedures for one-way and two-way modes defined for delay measurement can also be applied to liveness monitoring of SR Paths. However, this approach limits the number of PM sessions that can be supported and the achievable fault detection interval, since the probe query messages need to be punted from the forwarding path (to the slow path or control plane) and the response messages need to be injected.¶
For Liveness Monitoring, Seamless Bidirectional Forwarding Detection (S-BFD) [RFC7880] can be used in Segment Routing networks. However, S-BFD requires protocol support on the reflector node to process the S-BFD packets, as packets need to be punted from the forwarding path in order to send the reply, thereby limiting the number of sessions that can be supported and the achievable fault detection interval. In addition, the S-BFD protocol does not currently have the capability to enable performance delay monitoring in SR networks. Enabling multiple protocols in SR networks, S-BFD for liveness monitoring and TWAMP Light or STAMP for performance delay monitoring, increases the deployment and operational complexity.¶
This document defines a procedure for Enhanced Performance Delay and Liveness Monitoring (PDLM) in Segment Routing networks. The procedure leverages probe messages compatible with the delay measurement message formats defined in RFC 5357 (Two-Way Active Measurement Protocol (TWAMP)) and RFC 8762 (Simple Two-Way Active Measurement Protocol (STAMP)) and is applicable to end-to-end SR Paths, including SR Policies, for both SR-MPLS and SRv6 data planes.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
BFD: Bidirectional Forwarding Detection.¶
BSID: Binding Segment ID.¶
DM: Delay Measurement.¶
ECMP: Equal Cost Multi-Path.¶
LM: Loss Measurement.¶
MPLS: Multiprotocol Label Switching.¶
OWAMP: One-Way Active Measurement Protocol.¶
PDLM: Performance Delay and Liveness Monitoring.¶
PM: Performance Measurement.¶
PTP: Precision Time Protocol.¶
SID: Segment ID.¶
SL: Segment List.¶
SR: Segment Routing.¶
SRH: Segment Routing Header.¶
SR-MPLS: Segment Routing with MPLS data plane.¶
SRv6: Segment Routing with IPv6 data plane.¶
STAMP: Simple Two-way Active Measurement Protocol.¶
TWAMP: Two-Way Active Measurement Protocol.¶
In the reference topology shown in Figure 1, the nodes R1 and R5 are connected via a Point-to-Point (P2P) SR Path, such as an SR Policy [I-D.ietf-spring-segment-routing-policy] originating on node R1 with its endpoint on node R5.¶
In loopback mode, the sender node R1 initiates probe messages and the reflector node R5 forwards them back to the sender node R1 just like data packets for normal traffic. The probe messages are not punted at the reflector node, and the reflector node neither processes them nor generates response messages. The reflector node must not drop the loopback probe messages, for example, due to a local policy provisioned on the node. No PM session is created on the reflector node.¶
The Source and Destination IP addresses in the probe messages are set to the reflector and the sender node addresses, respectively (representing the reverse path). Both Source and Destination UDP ports in the probe messages are allocated dynamically or user-configured from the range specified in [RFC8762]. The UDP ports used in loopback mode are different from the ports used for TWAMP and STAMP sessions. The IPv4 Time To Live (TTL) and IPv6 Hop Limit (HL) are set to 255.¶
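As a rough illustration of the addressing rules above, the following Python sketch fills in the outer IP/UDP fields of a loopback probe. The `LoopbackProbeHeader` class, the `build_loopback_header` function, and the example addresses and port numbers are hypothetical and are not defined by this document; port range enforcement per [RFC8762] is left out.

```python
from dataclasses import dataclass
import ipaddress

MAX_HOPS = 255  # IPv4 TTL / IPv6 Hop Limit value stated in the text


@dataclass(frozen=True)
class LoopbackProbeHeader:
    """Hypothetical container for the IP/UDP fields of a loopback probe."""
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int
    hop_limit: int


def build_loopback_header(sender_addr, reflector_addr, src_port, dst_port):
    """Address the probe for the reverse path: source = reflector, destination = sender."""
    sender = ipaddress.ip_address(sender_addr)
    reflector = ipaddress.ip_address(reflector_addr)
    if sender.version != reflector.version:
        raise ValueError("sender and reflector must use the same IP version")
    for port in (src_port, dst_port):
        if not 0 < port <= 65535:
            raise ValueError(f"invalid UDP port: {port}")
    return LoopbackProbeHeader(
        src_addr=str(reflector),
        dst_addr=str(sender),
        src_port=src_port,
        dst_port=dst_port,
        hop_limit=MAX_HOPS,
    )


# Example with documentation addresses standing in for R1 (sender) and R5 (reflector).
header = build_loopback_header("192.0.2.1", "192.0.2.5", 50000, 50001)
```

Because the destination address is the sender's own address, the reflector can return the probe with ordinary forwarding once the SR encapsulation has been removed.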
In "loopback mode enabled with network programming function", both the transmit (t1) and receive (t2) timestamps are collected in the data plane by the probe messages sent in loopback mode, as shown in Figure 2. Because timestamping is implemented in hardware, the network programming function avoids the "punt and inject" operations for the probe packet on the reflector node. This allows a higher session scale and a faster probe rate, resulting in faster failure detection.¶
The sender node adds the transmit (t1) timestamp in the payload of the probe message and clears the receive (t2) timestamp. The reflector node adds the receive timestamp in the payload of the received probe message in hardware, without punting the message to the slow path (or control plane). The reflector node adds the receive timestamp only if the source or destination address in the probe message matches the local node address, to ensure that the probe message reaches the intended reflector node and that the receive timestamp is returned by that node. The payload of the probe message is not modified by any intermediate nodes.¶
The network programming function enables the node to add the receive timestamp in the payload of the probe message at a specific offset, which is locally provisioned consistently across the network. In the probe message defined in Figure 4 for delay measurement, the 64-bit receive timestamp is added at byte offset 16 from the start of the payload.¶
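The reflector behaviour described above can be sketched in Python as follows. This is only a software model of what the text says is done in hardware. The function name, the use of plain strings for addresses, and the timestamp encoding (32-bit seconds followed by 32-bit nanoseconds) are assumptions made for illustration.

```python
import struct

RX_TIMESTAMP_OFFSET = 16  # byte offset from the start of the payload, per the text
RX_TIMESTAMP_LEN = 8      # 64-bit receive timestamp


def stamp_receive_time(payload, local_addr, src_addr, dst_addr, seconds, nanoseconds):
    """Return a copy of payload with the receive timestamp written at offset 16.

    The timestamp is written only when the source or destination address of
    the probe matches the local node address. Otherwise the payload is
    returned unchanged, as an intermediate node would leave it.
    """
    if local_addr not in (src_addr, dst_addr):
        return bytes(payload)
    if len(payload) < RX_TIMESTAMP_OFFSET + RX_TIMESTAMP_LEN:
        raise ValueError("payload too short to hold the receive timestamp")
    stamped = bytearray(payload)
    # Assumed encoding: 32-bit seconds followed by 32-bit nanoseconds.
    struct.pack_into("!II", stamped, RX_TIMESTAMP_OFFSET, seconds, nanoseconds)
    return bytes(stamped)
```

Writing at a fixed, pre-provisioned offset means the node does not have to parse the probe message format, which is what makes the operation practical in the forwarding path.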
An example provisioning model and typical measurement parameters are shown in Figure 3:¶
Examples of message formats are TWAMP and STAMP; examples of timestamp formats are 64-bit PTPv2 [IEEE1588] and NTP.¶
The mechanisms to provision the sender and reflector nodes are outside the scope of this document.¶
The probe messages compatible with the delay measurement message formats defined in TWAMP [RFC5357] and STAMP [RFC8762] are specified in Figure 4.¶
Sequence Number is the sequence number of the probe packet according to its transmit order. It starts with zero and is incremented by one for each subsequent packet.¶
Transmit Timestamp and Transmit Error Estimate are the Sender's transmit timestamp and error estimate for the probe packet, respectively. Similarly, Receive Timestamp and Receive Error Estimate are the Reflector's receive timestamp and error estimate, respectively. The timestamp and error estimate fields follow the definitions and formats in Section 4.1.2 of [RFC4656]. The preferred timestamp format is 64-bit PTPv2 [IEEE1588] as specified in [RFC8186], implemented in hardware.¶
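For readers unfamiliar with these field encodings, the following Python sketch shows one way to encode and decode them. It assumes the truncated 64-bit PTPv2 layout of [RFC8186] (32-bit seconds followed by 32-bit nanoseconds) and the 16-bit Error Estimate layout of Section 4.1.2 of [RFC4656] (S bit, Z bit, 6-bit Scale, 8-bit Multiplier). The function names are hypothetical.

```python
import struct


def encode_ptp64(seconds, nanoseconds):
    """Encode a truncated 64-bit PTPv2 timestamp: 32-bit seconds, 32-bit nanoseconds."""
    if not 0 <= seconds < 2**32:
        raise ValueError("seconds out of range")
    if not 0 <= nanoseconds < 1_000_000_000:
        raise ValueError("nanoseconds out of range")
    return struct.pack("!II", seconds, nanoseconds)


def decode_ptp64(data):
    """Decode 8 bytes into a (seconds, nanoseconds) tuple."""
    seconds, nanoseconds = struct.unpack("!II", data)
    return seconds, nanoseconds


def decode_error_estimate(data):
    """Decode a 16-bit Error Estimate into its S, Z, Scale and Multiplier parts."""
    (value,) = struct.unpack("!H", data)
    return {
        "s": bool(value >> 15),          # S bit (most significant bit)
        "z": bool((value >> 14) & 0x1),  # Z bit
        "scale": (value >> 8) & 0x3F,    # 6-bit Scale
        "multiplier": value & 0xFF,      # 8-bit Multiplier
    }
```

For example, `decode_ptp64(encode_ptp64(1600000000, 123456789))` round-trips to the original pair of values.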
For performance delay and liveness monitoring of an end-to-end SR Path, including an SR Policy, PM probes in loopback mode are used.¶
For an SR Policy, the probe messages are sent using the Segment List (SL) of the Candidate-path [I-D.ietf-spring-segment-routing-policy]. When a Candidate-path has more than one Segment List, multiple probe messages are sent, one using each Segment List. By default, the return probe messages are received by the sender node via an IP/UDP [RFC0768] return path. To receive the return probe message on a specific path, the Segment List of the return SR path can be added in the probe message header using the Binding SID [I-D.ietf-pce-binding-label-sid] or the Segment List of the Reverse SR Policy [I-D.ietf-pce-sr-bidir-path].¶
The probe messages are sent using the MPLS header containing the label stack of the SR Policy, as shown in Figure 5. In the case of an IP/UDP return path, the MPLS header is removed by the reflector node. The label stack can contain a reverse SR-MPLS path to receive the return probe message on a specific path. In this case, the MPLS header is not removed by the reflector node.¶
The probe messages for the SRv6 data plane are sent using the Segment Routing Header (SRH) [RFC8754] containing the Segment List of the SR Policy, as shown in Figure 6. In the case of an IP/UDP return path, the SRH is removed by the reflector node. The Segment List can contain a reverse SRv6 path to receive the return probe message on a specific path. In this case, the SRH is not removed by the reflector node. When the return probe message contains an SRH at the sender node, the procedure defined for upper-layer header processing for SRv6 SIDs in [I-D.ietf-spring-srv6-network-programming] is used to process the UDP header in the received probe messages.¶
The enhanced performance delay and liveness monitoring of an end-to-end SR Path, including an SR Policy, is defined using the PM probes in "loopback mode enabled with network programming function".¶
In this document, a new Timestamp Label (Extended Special-Purpose value TBA1) is defined for the SR-MPLS data plane to enable the network programming function to "timestamp, pop and forward" the received packet.¶
In the probe message for SR-MPLS, the Timestamp Label is added in the MPLS header, as shown in Figure 7, to collect the "Receive Timestamp" field in the payload of the probe message. The label stack for the reverse SR-MPLS path can be added after the Timestamp Label to receive the return probe message on a specific path. When a node receives a message with the Timestamp Label, it timestamps the packet at the specific offset, pops the Timestamp Label, and forwards the message using the next label or IP header in the message (just like data packets for normal traffic).¶
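The "timestamp, pop and forward" operation can be modelled with a short Python sketch. This is an illustration only: the label stack is represented as a list with the top label first, `TIMESTAMP_LABEL` is a placeholder because the actual value has not been assigned, and the timestamp encoding (32-bit seconds, 32-bit nanoseconds at byte offset 16) is an assumption. Any Extension Label handling that applies to extended special-purpose labels is not modelled.

```python
import struct

TIMESTAMP_LABEL = "TIMESTAMP"  # placeholder; the actual label value is not yet assigned
RX_TIMESTAMP_OFFSET = 16       # locally provisioned offset (byte 16 in the text's example)


def timestamp_pop_forward(label_stack, payload, seconds, nanoseconds):
    """Apply "timestamp, pop and forward" when the top label is the Timestamp Label.

    Returns (remaining_label_stack, payload). If the remaining stack is
    empty, the packet would be forwarded using its IP header.
    """
    if not label_stack or label_stack[0] != TIMESTAMP_LABEL:
        # Not a Timestamp Label: leave the packet untouched in this sketch.
        return list(label_stack), bytes(payload)
    if len(payload) < RX_TIMESTAMP_OFFSET + 8:
        raise ValueError("payload too short to hold the receive timestamp")
    stamped = bytearray(payload)
    struct.pack_into("!II", stamped, RX_TIMESTAMP_OFFSET, seconds, nanoseconds)
    # Pop the Timestamp Label; the next label (if any) steers the return path.
    return list(label_stack[1:]), bytes(stamped)
```

With a stack of `[TIMESTAMP_LABEL, 16001]`, where 16001 is a made-up label for the reverse path, the function returns `[16001]` together with the stamped payload.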
The Timestamp Label can be allocated using one of the following methods:¶
The ingress node needs to know if the egress node can process the Timestamp Label to avoid dropping probe packets. The signaling extension for this capability exchange is outside the scope of this document.¶
In this document, a Timestamp Endpoint function for "Timestamp and Forward (TSF)" (SRv6 Endpoint Behavior value TBA2) is defined for the Segment Routing Header (SRH) [RFC8754] for the SRv6 data plane to enable the network programming function to "timestamp and forward" the received packet.¶
In the probe message for SRv6, the End.TSF function is added for the target Segment Identifier (SID) in the SRH [RFC8754], as shown in Figure 8, to collect the "Receive Timestamp" field in the payload of the probe message. The Segment List for the reverse path can be added after the target SID to receive the return probe message on a specific path. When a reflector node receives a message with the End.TSF function for a target SID that is local, it timestamps the packet at the specific offset and forwards the packet using the next SID or IP header in the message (just like data packets for normal traffic).¶
The Timestamp Endpoint function for "Timestamp and Forward" can be signaled using one of the following methods:¶
The ingress node needs to know if the egress node can process the Timestamp Endpoint Function to enable the monitoring. The signaling extension for this capability exchange is outside the scope of this document.¶
An SR Policy can have ECMPs between the source and transit nodes, between transit nodes, and between transit and destination nodes. To monitor the liveness of an end-to-end SR Policy, the PM probe messages need to traverse the different ECMP paths.¶
The forwarding plane has various hashing functions available to forward packets on specific ECMP paths. In the IPv4 header of the PM probe messages, sweeping of the Destination Address in the 127/8 range can be used to exercise different ECMP paths in loopback mode, as long as the return path is also SR-MPLS. The Flow Label field in the outer IPv6 header can also be used for sweeping to exercise different ECMP paths.¶
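The two sweeping techniques above can be illustrated with a minimal Python sketch. The function names, the starting values, and the simple sequential sweep order are assumptions; an implementation may choose values differently depending on the hashing functions in use.

```python
import ipaddress
from itertools import islice


def sweep_ipv4_loopback(start="127.0.0.1"):
    """Yield successive IPv4 destination addresses within 127.0.0.0/8."""
    network = ipaddress.ip_network("127.0.0.0/8")
    addr = ipaddress.ip_address(start)
    if addr not in network:
        raise ValueError("start address must be within 127.0.0.0/8")
    last_host = network.broadcast_address - 1
    while addr <= last_host:
        yield str(addr)
        addr += 1


def sweep_flow_labels(count, start=1):
    """Return `count` IPv6 Flow Label values, wrapping within the 20-bit field."""
    return [(start + i) % (1 << 20) for i in range(count)]


# One probe would be sent per value so that the ECMP hash sees different inputs.
first_destinations = list(islice(sweep_ipv4_loopback(), 3))
first_flow_labels = sweep_flow_labels(3)
```

Each generated value changes an input to the ECMP hash, so successive probes are likely, though not guaranteed, to be spread over different paths.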
Liveness success for an SR Path is initially notified as soon as one or more return probe messages are received at the sender node.¶
Liveness failure for an SR Path is notified when N consecutive return probe messages are not received at the sender node, where N (Missed Probe Message Count) is a locally provisioned value. Similarly, delay metrics are notified, for example, when M consecutive probe messages have measured delay values that exceed user-configured thresholds (absolute and percentage), where M is also a locally provisioned value.¶
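The notification rules above can be sketched as two small state trackers in Python. The class names, state names, and notification strings are hypothetical, and only an absolute delay threshold is modelled; percentage thresholds are left out for brevity.

```python
class LivenessMonitor:
    """Tracks liveness state from per-probe outcomes (hypothetical helper)."""

    def __init__(self, missed_probe_count):
        if missed_probe_count < 1:
            raise ValueError("missed_probe_count must be at least 1")
        self.missed_probe_count = missed_probe_count  # N in the text
        self.consecutive_missed = 0
        self.state = "unknown"  # becomes "up" or "down"

    def on_probe_result(self, received):
        """Record one probe outcome; return a notification string or None."""
        if received:
            self.consecutive_missed = 0
            if self.state != "up":
                self.state = "up"
                return "liveness-success"
            return None
        self.consecutive_missed += 1
        if self.consecutive_missed >= self.missed_probe_count and self.state != "down":
            self.state = "down"
            return "liveness-failure"
        return None


class DelayThresholdMonitor:
    """Signals when M consecutive delay samples exceed an absolute threshold."""

    def __init__(self, threshold_seconds, consecutive_count):
        self.threshold_seconds = threshold_seconds
        self.consecutive_count = consecutive_count  # M in the text
        self.consecutive_exceeded = 0

    def on_delay(self, delay_seconds):
        """Return True exactly when the M-th consecutive exceedance is seen."""
        if delay_seconds > self.threshold_seconds:
            self.consecutive_exceeded += 1
        else:
            self.consecutive_exceeded = 0
        return self.consecutive_exceeded == self.consecutive_count
```

With N set to 3, a single received probe moves the path to "up", and three missed probes in a row are needed before "liveness-failure" is returned.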
For the probe messages generated by the Sender node R1 in loopback mode, a failure on the reverse direction path can also prevent the return probe messages from reaching the Sender node. The same is true for probe response messages generated by the Reflector node R5, for example, to notify node R1 of a failure on the forward direction path. As such, probe-based methods share this limitation when monitoring the liveness of the forward direction path.¶
In loopback mode, the timestamps t1 and t4 are used to measure round-trip delay. In loopback mode enabled with network programming function, the timestamps t1 and t2 are used to measure one-way delay.¶
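The two delay computations can be written out as a short Python sketch. It assumes that timestamps are held as (seconds, nanoseconds) pairs, that t4 denotes the time at which the sender receives the returned probe, and, for the one-way case, that the sender and reflector clocks are synchronized. The function names are hypothetical.

```python
NANOS_PER_SECOND = 1_000_000_000


def to_nanoseconds(timestamp):
    """Convert a (seconds, nanoseconds) pair into a single nanosecond count."""
    seconds, nanoseconds = timestamp
    return seconds * NANOS_PER_SECOND + nanoseconds


def round_trip_delay_ns(t1, t4):
    """Loopback mode: round-trip delay = t4 - t1."""
    return to_nanoseconds(t4) - to_nanoseconds(t1)


def one_way_delay_ns(t1, t2):
    """Loopback mode with network programming function: one-way delay = t2 - t1."""
    return to_nanoseconds(t2) - to_nanoseconds(t1)
```

The round-trip value depends only on the sender's clock, whereas the one-way value is only as accurate as the synchronization between the two nodes.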
Performance Delay and Liveness Monitoring is intended for deployment in well-managed private and service provider networks. As such, it assumes that a node involved in a monitoring operation has previously verified the integrity of the path and the identity of the reflector node. If desired, attacks can be mitigated by performing basic validation and sanity checks, at the sender, of the timestamp fields in received probe messages. The minimal state associated with these protocols also limits the extent of disruption that can be caused by a corrupt or invalid message to a single probe cycle. Cryptographic measures may be supplemented by the correct configuration of access-control lists and firewalls.¶
IANA maintains the "Special-Purpose Multiprotocol Label Switching (MPLS) Label Values" registry (see <https://www.iana.org/assignments/mpls-label-values/mpls-label-values.xml>). IANA is requested to allocate a Timestamp Label value from the "Extended Special-Purpose MPLS Label Values" registry:¶
+-------------+---------------------------------+---------------+
| Value       | Description                     | Reference     |
+-------------+---------------------------------+---------------+
| TBA1        | Timestamp Label                 | This document |
+-------------+---------------------------------+---------------+¶
IANA is requested to allocate, within the "SRv6 Endpoint Behaviors Registry" sub-registry belonging to the top-level "Segment Routing Parameters" registry [I-D.ietf-spring-srv6-network-programming], the following allocation:¶
+-------------+---------------------------------+---------------+
| Value       | Endpoint Behavior               | Reference     |
+-------------+---------------------------------+---------------+
| TBA2        | End.TSF (Timestamp and Forward) | This document |
+-------------+---------------------------------+---------------+¶
The authors would like to thank Greg Mirsky, Mach Chen, Kireeti Kompella, and Adrian Farrel for providing review comments.¶