Internet Engineering Task Force | E. Grossman, Ed. |
Internet-Draft | DOLBY |
Intended status: Informational | C. Gunther |
Expires: August 12, 2016 | HARMAN |
| P. Thubert |
| P. Wetterwald |
| CISCO |
| J. Raymond |
| HYDRO-QUEBEC |
| J. Korhonen |
| BROADCOM |
| Y. Kaneko |
| Toshiba |
| S. Das |
| Applied Communication Sciences |
| Y. Zha |
| HUAWEI |
| B. Varga |
| J. Farkas |
| Ericsson |
| F. Goetz |
| J. Schmitt |
| Siemens |
| February 9, 2016 |
Deterministic Networking Use Cases
draft-ietf-detnet-use-cases-01
This draft documents requirements in several diverse industries to establish multi-hop paths for characterized flows with deterministic properties. In this context, deterministic implies that streams can be established which provide guaranteed bandwidth and latency, which can be established from either a Layer 2 or Layer 3 (IP) interface, and which can co-exist on an IP network with best-effort traffic.
Additional requirements include optional redundant paths, very high reliability paths, time synchronization, and clock distribution. Industries considered include wireless for industrial applications, professional audio, electrical utilities, building automation systems, radio/mobile access networks, automotive, and gaming.
For each case, this document will identify the application, identify representative solutions used today, and describe what new uses an IETF DetNet solution may enable.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on August 12, 2016.
Copyright (c) 2016 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
This draft presents use cases from diverse industries which have in common a need for deterministic streams, but which also differ notably in their network topologies and specific desired behavior. Together, they provide broad industry context for DetNet and a yardstick against which proposed DetNet designs can be measured (to what extent does a proposed design satisfy these various use cases?)
For DetNet, use cases explicitly do not define requirements; the DetNet WG will consider the use cases, decide which elements are in scope for DetNet, and incorporate the results into future drafts. Similarly, the DetNet use case draft explicitly does not suggest any specific design, architecture or protocols, which will be topics of future drafts.
We present for each use case the answers to the following questions:
The level of detail in each use case should be sufficient to express the relevant elements of the use case, but not more.
At the end we consider the use cases collectively, and examine the most significant goals they have in common.
(This section was derived from draft-gunther-detnet-proaudio-req-01)
The professional audio and video industry includes music and film content creation, broadcast, cinema, and live exposition as well as public address, media and emergency systems at large venues (airports, stadiums, churches, theme parks). These industries have already gone through the transition of audio and video signals from analog to digital; however, the interconnect systems remain primarily point-to-point with a single signal (or a small number of signals) per link, interconnected with purpose-built hardware.
These industries are now attempting to transition to packet based infrastructure for distributing audio and video in order to reduce cost, increase routing flexibility, and integrate with existing IT infrastructure.
However, there are several requirements for making a network the primary infrastructure for audio and video which are not met by today's networks, and these are our concern in this draft.
The principal requirement is that pro audio and video applications become able to establish streams that provide guaranteed (bounded) bandwidth and latency from the Layer 3 (IP) interface. Such streams can be created today within standards-based layer 2 islands; however, these are not sufficient to enable effective distribution over wider areas (for example broadcast events that span wide geographical areas).
Some proprietary systems have been created which enable deterministic streams at layer 3; however, they are engineered networks in that they require careful configuration to operate, often require that the system be over-designed, and implicitly assume that all devices on the network voluntarily play by the rules of that network. Enabling these industries to successfully transition to an interoperable multi-vendor packet-based infrastructure requires effective open standards, and we believe that establishing relevant IETF standards is a crucial factor.
It would be highly desirable if such streams could be routed over the open Internet; however, even intermediate solutions with more limited scope (such as enterprise networks) can provide a substantial improvement over today's networks, and a solution that only provides for the enterprise network scenario is an acceptable first step.
We also present more fine-grained requirements of the audio and video industries such as safety and security, redundant paths, support for devices with limited computing resources on the network, and the requirement that reserved stream bandwidth be available for use by other best-effort traffic when that stream is not currently in use.
The fundamental stream properties are guaranteed bandwidth and deterministic latency as described in this section. Additional stream requirements are described in a subsequent section.
Transmitting audio and video streams is unlike common file transfer activities because guaranteed delivery cannot be achieved by re-trying the transmission; by the time the missing or corrupt packet has been identified it is too late to execute a re-try operation and stream playback is interrupted, which is unacceptable in, for example, a live concert. In some contexts large amounts of buffering can be used to provide enough delay to allow time for one or more retries; however, this is not an effective solution when live interaction is involved, and is not considered an acceptable general solution for pro audio and video. (Have you ever tried speaking into a microphone through a sound system that has an echo coming back at you? It makes it almost impossible to speak clearly.)
Providing a way to reserve a specific amount of bandwidth for a given stream is a key requirement.
Latency in this context means the amount of time that passes between when a signal is sent over a stream and when it is received, for example the amount of time delay between when you speak into a microphone and when your voice emerges from the speaker. Any delay longer than about 10-15 milliseconds is noticeable by most live performers, and greater latency makes the system unusable because it prevents them from playing in time with the other players (see slide 6 of [SRP_LATENCY]).
The 15ms latency bound is made even more challenging because network-based music production with live electric instruments often uses multiple stages of signal processing connected in series (for example, from a guitar through a series of digital effects processors). In that case the latencies add, so the sum of the latencies of all the individual stages must remain less than 15ms.
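The additive nature of series latency can be illustrated with a minimal sketch. The stage names and per-stage latency values below are hypothetical and chosen only for illustration; the 15ms budget is the upper end of the range cited above.

```python
# Hypothetical per-stage latencies in milliseconds (illustrative values only).
BUDGET_MS = 15.0  # upper end of the 10-15 ms range discussed in the text


def chain_latency_ms(stage_latencies_ms):
    """Latency of stages connected in series is the sum of the stage latencies."""
    return sum(stage_latencies_ms)


def within_budget(stage_latencies_ms, budget_ms=BUDGET_MS):
    """True if the whole chain stays strictly below the latency budget."""
    return chain_latency_ms(stage_latencies_ms) < budget_ms


stages = {
    "guitar_interface": 2.0,
    "effects_1": 3.0,
    "effects_2": 3.0,
    "mixer": 4.0,
    "amplifier": 2.0,
}

total = chain_latency_ms(stages.values())
print(f"total chain latency: {total} ms, within budget: {within_budget(stages.values())}")
```

Each stage on its own is well under the bound, yet the chain as a whole leaves only 1ms of margin, which is why the budget has to be managed end to end rather than per device.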
In some situations it is acceptable at the local location for content from the live remote site to be delayed to allow for a statistically acceptable amount of latency in order to reduce jitter. However, once the content begins playing in the local location any audio artifacts caused by the local network are unacceptable, especially in those situations where a live local performer is mixed into the feed from the remote location.
In addition to being bounded to within some predictable and acceptable amount of time (which may be 15 milliseconds or more or less depending on the application) the latency also has to be consistent. For example when playing a film consisting of a video stream and audio stream over a network, those two streams must be synchronized so that the voice and the picture match up. A common tolerance for audio/video sync is one NTSC video frame (about 33ms) and to maintain the audience perception of correct lip sync the latency needs to be consistent within some reasonable tolerance, for example 10%.
A common architecture for synchronizing multiple streams that have different paths through the network (and thus potentially different latencies) is to enable measurement of the latency of each path, and have the data sinks (for example speakers) buffer (delay) all packets on all but the slowest path. Each packet of each stream is assigned a presentation time which is based on the longest required delay. This implies that all sinks must maintain a common time reference of sufficient accuracy, which can be achieved by any of various techniques.
This type of architecture is commonly implemented using a central controller that determines path delays and arbitrates buffering delays.
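The buffering arithmetic described above can be sketched as follows. This is an illustration under stated assumptions, not a description of any particular controller: the sink names, latency values, and function names are hypothetical.

```python
def buffering_delays_ms(path_latency_ms):
    """Extra delay each sink adds so that all sinks present at the same time.

    path_latency_ms maps a sink name to its measured path latency in ms.
    The slowest path sets the common presentation offset and buffers 0 ms.
    """
    presentation_offset = max(path_latency_ms.values())
    return {
        sink: presentation_offset - latency
        for sink, latency in path_latency_ms.items()
    }


def presentation_time_ms(send_time_ms, path_latency_ms):
    """Presentation time is based on the longest required delay."""
    return send_time_ms + max(path_latency_ms.values())


# Hypothetical measured path latencies (ms) from one source to three sinks.
paths = {"speaker_a": 2.0, "speaker_b": 5.0, "speaker_c": 3.5}
delays = buffering_delays_ms(paths)
print(delays)
```

Only the sink on the slowest path needs no added delay; every other sink buffers the difference, which requires all sinks to share a common time reference.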
The controller might also perform optimizations based on the individual path delays, for example sinks that are closer to the source can inform the controller that they can accept greater latency since they will be buffering packets to match presentation times of farther away sinks. The controller might then move a stream reservation on a short path to a longer path in order to free up bandwidth for other critical streams on that short path. See slides 3-5 of [SRP_LATENCY].
Additional optimization can be achieved in cases where sinks have differing latency requirements, for example in a live outdoor concert the speaker sinks have stricter latency requirements than the recording hardware sinks. See slide 7 of [SRP_LATENCY].
Device cost can be reduced in a system with guaranteed reservations with a small bounded latency due to the reduced requirements for buffering (i.e. memory) on sink devices. For example, a theme park might broadcast a live event across the globe via a layer 3 protocol; in such cases the size of the buffers required is proportional to the latency bounds and jitter caused by delivery, which depends on the worst case segment of the end-to-end network path. For example, on today's open Internet the latency is typically unacceptable for audio and video streaming without many seconds of buffering. In such scenarios a single gateway device at the local network that receives the feed from the remote site would provide the expensive buffering required to mask the latency and jitter issues associated with long distance delivery. Sink devices in the local location would have no additional buffering requirements, and thus no additional costs, beyond those required for delivery of local content. The sink device would receive packets identical to those sent by the source and would be unaware that there were any latency or jitter issues along the path.
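As a rough illustration of why buffer size tracks the delay variation to be absorbed, the following sketch estimates the memory needed for an uncompressed audio stream. The stream parameters (48 kHz, 24-bit, two channels), the delay-variation figures, and the function names are assumptions chosen for illustration, and packet overhead is ignored.

```python
import math


def stream_rate_bps(sample_rate_hz, bits_per_sample, channels):
    """Payload bit rate of an uncompressed PCM audio stream."""
    return sample_rate_hz * bits_per_sample * channels


def jitter_buffer_bytes(rate_bps, delay_variation_ms):
    """Bytes needed to hold delay_variation_ms worth of stream payload."""
    # bits = rate_bps * ms / 1000, bytes = bits / 8
    return math.ceil(rate_bps * delay_variation_ms / 8000)


rate = stream_rate_bps(48_000, 24, 2)

# Hypothetical cases: a tightly bounded local path versus a path that
# needs seconds of buffering.
local_bytes = jitter_buffer_bytes(rate, 2)
wide_area_bytes = jitter_buffer_bytes(rate, 2000)
print(rate, local_bytes, wide_area_bytes)
```

Under these assumptions the buffer grows linearly with the delay variation, which is the motivation for concentrating large buffers in a single gateway rather than in every sink.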
The requirements in this section are more specific yet are common to multiple audio and video industry applications.
Some audio systems installed in public environments (airports, hospitals) have unique requirements with regards to health, safety and fire concerns. One such requirement is a maximum of 3 seconds for a system to respond to an emergency detection and begin sending appropriate warning signals and alarms without human intervention. For this requirement to be met, the system must support a bounded and acceptable time from a notification signal to specific stream establishment. For further details see [ISO7240-16].
Similar requirements apply when the system is restarted after a power cycle, cable re-connection, or system reconfiguration.
In many cases such re-establishment of streaming state must be achieved by the peer devices themselves, i.e. without a central controller (since such a controller may only be present during initial network configuration).
Video systems introduce related requirements, for example when transitioning from one camera feed to another. Such systems currently use purpose-built hardware to switch feeds smoothly; however, there is a current initiative in the broadcast industry to switch to a packet-based infrastructure (see [STUDIO_IP] and the ESPN DC2 use case described below).
In cases where stream bandwidth is reserved but not currently used (or is under-utilized) that bandwidth must be available to best-effort (i.e. non-time-sensitive) traffic. For example a single stream may be nailed up (reserved) for specific media content that needs to be presented at different times of the day, ensuring timely delivery of that content, yet in between those times the full bandwidth of the network can be utilized for best-effort tasks such as file transfers.
This also addresses a concern of IT network administrators who are considering adding reserved-bandwidth traffic to their networks: that users will reserve a large amount of bandwidth and never release it even though they are not using it, so that soon no bandwidth is left.
As an intermediate step (short of providing guaranteed bandwidth across the open internet) it would be valuable to provide a way to connect multiple Layer 2 networks. For example layer 2 techniques could be used to create a LAN for a single broadcast studio, and several such studios could be interconnected via layer 3 links.
Digital Rights Management (DRM) is very important to the audio and video industries. Any time protected content is introduced into a network there are DRM concerns that must be addressed (see [CONTENT_PROTECTION]). Many aspects of DRM are outside the scope of network technology; however, there are cases when a secure link supporting authentication and encryption is required by content owners to carry their audio or video content when it is outside their own secure environment (for example see [DCI]).
As an example, two techniques are Digital Transmission Content Protection (DTCP) and High-Bandwidth Digital Content Protection (HDCP). HDCP content is not approved for retransmission within any other type of DRM, while DTCP may be retransmitted under HDCP. Therefore if the source of a stream is outside of the network and it uses HDCP protection it is only allowed to be placed on the network with that same HDCP protection.
On-air and other live media streams must be backed up with redundant links that seamlessly act to deliver the content when the primary link fails for any reason. In point-to-point systems this is provided by an additional point-to-point link; the analogous requirement in a packet-based system is to provide an alternate path through the network such that no individual link can bring down the system.
For transmitting streams that require more bandwidth than a single link in the target network can support, link aggregation is a technique for combining (aggregating) the bandwidth available on multiple physical links to create a single logical link of the required bandwidth. However, if aggregation is to be used, the network controller (or equivalent) must be able to determine the maximum latency of any path through the aggregate link (see Bounded and Consistent Latency section above).
Sink devices may be low cost devices with limited processing power. In order to not overwhelm the CPUs in these devices it is important to limit the amount of traffic that these devices must process.
As an example, consider the use of individual seat speakers in a cinema. These speakers are typically required to be cost reduced since the quantities in a single theater can reach hundreds of seats. Discovery protocols alone in a one thousand seat theater can generate enough broadcast traffic to overwhelm a low powered CPU. Thus an installation like this will benefit greatly from some type of traffic segregation that can define groups of seats to reduce traffic within each group. All seats in the theater must still be able to communicate with a central controller.
There are many techniques that can be used to support this requirement including (but not limited to) the following examples.
Packet forwarding rules can be used to prevent some extraneous streaming traffic from reaching potentially low-powered sink devices; however, there may be other types of broadcast traffic that should be eliminated using other means, for example VLANs or IP subnets.
Multicast addressing is commonly used to keep bandwidth utilization of shared links to a minimum.
Because of the MAC Address forwarding nature of Layer 2 bridges it is important that a multicast MAC address is only associated with one stream. This will prevent reservations from forwarding packets from one stream down a path that has no interested sinks simply because there is another stream on that same path that shares the same multicast MAC address.
Since each multicast MAC address can represent 32 different IPv4 multicast addresses, there must be a process in place to make sure this does not occur. Requiring the use of IPv6 addresses can achieve this; however, because IPv4 installations remain prevalent, solutions that are effective for IPv4 installations are also required.
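The 32-to-1 ambiguity follows from the standard IPv4-to-Ethernet multicast mapping, in which only the low-order 23 bits of the 28-bit group address are copied into the MAC address after the 01:00:5e prefix. The sketch below illustrates this; the function names and the example group address are chosen for illustration only.

```python
import ipaddress


def ipv4_multicast_to_mac(addr):
    """Map an IPv4 multicast address to its Ethernet multicast MAC address."""
    ip = ipaddress.IPv4Address(addr)
    if not ip.is_multicast:
        raise ValueError(f"{addr} is not an IPv4 multicast address")
    low23 = int(ip) & 0x7FFFFF           # only the low 23 bits survive
    mac = (0x01005E << 24) | low23       # fixed 01:00:5e prefix
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


def colliding_ipv4_groups(addr):
    """List every IPv4 multicast address that shares addr's MAC address."""
    low23 = int(ipaddress.IPv4Address(addr)) & 0x7FFFFF
    # The 5 group-address bits that are dropped can take 2**5 = 32 values.
    return [
        str(ipaddress.IPv4Address(0xE0000000 | (high5 << 23) | low23))
        for high5 in range(32)
    ]


groups = colliding_ipv4_groups("224.1.1.1")
print(ipv4_multicast_to_mac("224.1.1.1"), len(groups), groups[:3])
```

A stream allocator could use a check of this kind to refuse a new IPv4 group address whose MAC address is already associated with another stream.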
A commonly cited goal of moving to a packet-based media infrastructure is that costs can be reduced by using off-the-shelf, commodity network hardware. In addition, economy of scale can be realized by combining media infrastructure with IT infrastructure. In keeping with these goals, stream reservation technology should be compatible with existing protocols, and not compromise use of the network for best-effort (non-time-sensitive) traffic.
Many industries that are moving from the point-to-point world to the digital network world have little understanding of the pitfalls that they can create for themselves with improperly implemented network infrastructure. DetNet should consider ways to provide security against DoS attacks in solutions directed at these markets. Some considerations are given here as examples of ways that we can help new users avoid common pitfalls.
One security pitfall that this author is aware of involves the use of technology that allows a presenter to throw the content from their tablet or smart phone onto the A/V system that is then viewed by all those in attendance. The facility introducing this technology was quite excited to allow such modern flexibility to those who came to speak. One thing they hadn't realized was that since no security was put in place around this technology it left a hole in the system that allowed other attendees to "throw" their own content onto the A/V system.
Professional audio systems can include amplifiers that are capable of generating hundreds or thousands of watts of audio power which if used incorrectly can cause hearing damage to those in the vicinity. Apart from the usual care required by the systems operators to prevent such incidents, the network traffic that controls these devices must be secured (as with any sensitive application traffic). In addition, it would be desirable if the configuration protocols that are used to create the network paths used by the professional audio traffic could be designed to protect devices that are not meant to receive high-amplitude content from having such potentially damaging signals routed to them.
ESPN recently constructed a state-of-the-art 194,000 sq ft, $125 million broadcast studio called DC2. The DC2 network is capable of handling 46 Tbps of throughput with 60,000 simultaneous signals. Inside the facility are 1,100 miles of fiber feeding four audio control rooms. (See details at [ESPN_DC2] ).
In designing DC2 they replaced as much point-to-point technology as they possibly could with packet-based technology. They constructed seven individual studios using layer 2 LANs (using IEEE 802.1 AVB) that were entirely effective at routing audio within the LANs, and they were very happy with the results. However, to interconnect these layer 2 LAN islands they ended up using dedicated links because there is no standards-based routing solution available.
This kind of deployment is the motivation for developing these standards: customers are ready and able to use them.
The editors would like to acknowledge the help of the following individuals and the companies they represent:
Jeff Koftinoff, Meyer Sound
Jouni Korhonen, Associate Technical Director, Broadcom
Pascal Thubert, CTAO, Cisco
Kieran Tyrrell, Sienda New Media Technologies GmbH
(This section was derived from draft-wetterwald-detnet-utilities-reqs-02)
[I-D.finn-detnet-problem-statement] defines the characteristics of a deterministic flow as a data communication flow with a bounded latency, extraordinarily low frame loss, and a very narrow jitter. This document intends to define the utility requirements for deterministic networking.
Utility Telecom Networks
The business and technology trends that are sweeping the utility industry will drastically transform the utility business from the way it has been for many decades. At the core of many of these changes is a drive to modernize the electrical grid with an integrated telecommunications infrastructure. However, interoperability concerns, legacy networks, disparate tools, and stringent security requirements all add complexity to the grid transformation. Given the range and diversity of the requirements that should be addressed by the next generation telecommunications infrastructure, utilities need to adopt a holistic architectural approach to integrate the electrical grid with digital telecommunications across the entire power delivery chain.
Many utilities still rely on complex environments formed of multiple application-specific, proprietary networks. Information is siloed between operational areas. This prevents utility operations from realizing the operational efficiency benefits, visibility, and functional integration of operational information across grid applications and data networks. The key to modernizing grid telecommunications is to provide a common, adaptable, multi-service network infrastructure for the entire utility organization. Such a network serves as the platform for current capabilities while enabling future expansion of the network to accommodate new applications and services.
To meet this diverse set of requirements, both today and in the future, the next generation utility telecommunications network will be based on an open-standards-based IP architecture. An end-to-end IP architecture takes advantage of nearly three decades of IP technology development, facilitating interoperability across disparate networks and devices, as has already been demonstrated in many mission-critical and highly secure networks.
IEC (International Electrotechnical Commission) and different National Committees have mandated a specific ad hoc group (AHG8) to define the migration strategy to IPv6 for all the IEC TC57 power automation standards. IPv6 is seen as the obvious future telecommunications technology for the Smart Grid. The ad hoc group disclosed its conclusions to the IEC coordination group at the end of 2014.
It is imperative that utilities participate in standards development bodies to influence the development of future solutions and to benefit from shared experiences of other utilities and vendors.
These general telecommunications requirements are over and above the specific requirements of the use cases that have been addressed so far. These include both current and future telecommunications related requirements that should be factored into the network architecture and design.
Throughout the world, utilities are increasingly planning for a future based on smart grid applications requiring advanced telecommunications systems. Many of these applications utilize packet connectivity for communicating information and control signals across the utility's Wide Area Network (WAN), made possible by technologies such as multiprotocol label switching (MPLS). The data that traverses the utility WAN includes:
WANs support this wide variety of traffic to and from substations, the transmission and distribution grid, generation sites, between control centers, and between work locations and data centers. To maintain this rapidly expanding set of applications, many utilities are taking steps to evolve present time-division multiplexing (TDM) based and frame relay infrastructures to packet systems. Packet-based networks are designed to provide greater functionalities and higher levels of service for applications, while continuing to deliver reliability and deterministic (real-time) traffic support.
Among the numerous applications and use cases that a utility deploys today, many rely on high availability and deterministic behaviour of the telecommunications networks. Protection use cases and generation control are the most demanding and can't rely on a best effort approach.
Protection means not only the protection of the human operator but also the protection of the electrical equipment and the preservation of the stability and frequency of the grid. If a fault occurs in the transmission or distribution of electricity, serious harm could come to the human operator and severe damage could occur to very costly electrical equipment, and the grid could be perturbed, leading to blackouts. The time and reliability requirements are very strict in order to avoid dramatic impacts on the electrical infrastructure.
The key criteria for measuring Teleprotection performance are command transmission time, dependability and security. These criteria are defined by the IEC standard 60834 as follows:
Additional key elements that may impact Teleprotection performance include the bandwidth rate of the Teleprotection system and its resiliency or failure recovery capacity. Transmission time, bandwidth utilization and resiliency are directly linked to the telecommunications equipment and the connections that are used to transfer the commands between relays.
Delay requirements for utility networks may vary depending upon a number of parameters, such as the specific protection equipment used. Most power line equipment can tolerate short circuits or faults for up to approximately five power cycles before sustaining irreversible damage or affecting other segments in the network. This translates to a total fault clearance time of 100ms. As a safety precaution, however, the actual operation time of protection systems is limited to 70-80 percent of this period, including fault recognition time, command transmission time and line breaker switching time. Some system components, such as large electromechanical switches, require a particularly long time to operate and take up the majority of the total clearance time, leaving only a 10ms window for the telecommunications part of the protection scheme, independent of the distance to travel. Given the sensitivity of the issue, new networks impose requirements that are even more stringent: IEC standard 61850 limits the transfer time for protection messages to 1/4 - 1/2 cycle or 4 - 8ms (for 60Hz lines) for the most critical messages.
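The cycle-based figures above can be checked with simple arithmetic. The sketch below is illustrative only; the function names are assumptions, and it is also an assumption that the 100ms clearance figure refers to a 50Hz system, since five cycles at 60Hz would be roughly 83ms.

```python
def cycles_to_ms(cycles, line_frequency_hz):
    """Duration of a number of power cycles, in milliseconds."""
    return cycles * 1000.0 / line_frequency_hz


def operating_window_ms(clearance_ms, low_pct=70, high_pct=80):
    """Portion of the clearance time allowed for protection system operation."""
    return (clearance_ms * low_pct / 100, clearance_ms * high_pct / 100)


clearance_50hz = cycles_to_ms(5, 50)        # five cycles, assuming 50 Hz
window = operating_window_ms(clearance_50hz)
quarter_cycle_60hz = cycles_to_ms(0.25, 60)
half_cycle_60hz = cycles_to_ms(0.5, 60)

print(clearance_50hz, window)
print(round(quarter_cycle_60hz, 2), round(half_cycle_60hz, 2))
```

The quarter-cycle and half-cycle values at 60Hz come out slightly above 4ms and 8ms, consistent with the rounded 4 - 8ms range quoted for IEC 61850.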
In addition to minimal transmission delay, a differential protection telecommunications channel must be synchronous, i.e., experiencing symmetrical channel delay in transmit and receive paths. This requires special attention in jitter-prone packet networks. While optimally Teleprotection systems should support zero asymmetric delay, typical legacy relays can tolerate discrepancies of up to 750us.
The main tools available for lowering delay variation below this threshold are:
The following table captures the main network requirements (based on the IEC 61850 standard):
Teleprotection Requirement | Attribute |
---|---|
One way maximum delay | 4-10 ms |
Asymmetric delay required | Yes |
Maximum jitter | less than 250 us (750 us for legacy IED) |
Topology | Point to point, point to Multi-point |
Availability | 99.9999% |
Precise timing required | Yes |
Recovery time on node failure | less than 50ms - hitless |
Performance management | Yes, Mandatory |
Redundancy | Yes |
Packet loss | 0.1% to 1% |
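To give a sense of scale for the availability figure in the table, the following sketch converts an availability into permitted downtime per year. It assumes that the availability value is a percentage and that a year is 365 days; the function name is chosen for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumes a 365-day year


def max_downtime_seconds_per_year(availability_percent):
    """Downtime budget per year implied by an availability percentage."""
    return SECONDS_PER_YEAR * (100.0 - availability_percent) / 100.0


downtime = max_downtime_seconds_per_year(99.9999)
print(f"about {downtime:.1f} seconds of downtime per year")
```

Under these assumptions the budget is roughly half a minute per year, which helps explain why the table pairs this figure with redundancy and hitless recovery.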
Inter-tripping is the controlled tripping of a circuit breaker to complete the isolation of a circuit or piece of apparatus in concert with the tripping of other circuit breakers. The main use of such schemes is to ensure that protection at both ends of a faulted circuit will operate to isolate the equipment concerned. Inter-tripping schemes use signaling to convey a trip command to remote circuit breakers to isolate circuits.
Inter-Trip protection Requirement | Attribute |
---|---|
One way maximum delay | 5 ms |
Asymmetric delay required | No |
Maximum jitter | Not critical |
Topology | Point to point, point to Multi-point |
Bandwidth | 64 Kbps |
Availability | 99.9999% |
Precise timing required | Yes |
Recovery time on node failure | less than 50ms - hitless |
Performance management | Yes, Mandatory |
Redundancy | Yes |
Packet loss | 0.1% |
Current differential protection is commonly used for line protection, and is typical for protecting parallel circuits. A main advantage of differential protection is that, compared to overcurrent protection, it allows only the faulted circuit to be de-energized in case of a fault. At both ends of the line, the current is measured by the differential relays, and based on Kirchhoff's law, both relays will trip the circuit breaker if the current going into the line does not equal the current going out of the line. This type of protection scheme assumes some form of communications being present between the relays at both ends of the line, to allow both relays to compare measured current values. A fault in line 1 will cause overcurrent to flow in both lines, but because the current in line 2 is a through following current, this current is measured equal at both ends of the line, and therefore the differential relays on line 2 will not trip line 2. Line 1 will be tripped, as the relays will not measure the same currents at both ends of the line. Line differential protection schemes assume a very low telecommunications delay between both relays, often as low as 5ms. Moreover, as those systems are often not time-synchronized, they also assume symmetric telecommunications paths with constant delay, which allows comparing current measurement values taken at the exact same time.
Current Differential protection Requirement | Attribute |
---|---|
One way maximum delay | 5 ms |
Asymmetric delay required | Yes |
Maximum jitter | less than 250 us (750 us for legacy IED) |
Topology | Point to point, point to Multi-point |
Bandwidth | 64 Kbps |
Availability | 99.9999% |
Precise timing required | Yes |
Recovery time on node failure | less than 50ms - hitless |
Performance management | Yes, Mandatory |
Redundancy | Yes |
Packet loss | 0.1% |
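The comparison performed by current differential relays, as described above, can be sketched as follows. This is a simplified illustration, not a relay algorithm: it compares current magnitudes sampled at the same instant, and the tolerance threshold, current values, and function name are hypothetical.

```python
def differential_trip(current_in_a, current_out_a, threshold_a):
    """Trip if current entering the line differs from current leaving it.

    A small threshold (hypothetical here) absorbs measurement error, so
    only a real mismatch between the two line ends causes a trip.
    """
    return abs(current_in_a - current_out_a) > threshold_a


THRESHOLD_A = 50.0  # hypothetical tolerance in amperes

# Line 2 carries overcurrent, but the same current is seen at both ends.
line2_trips = differential_trip(1200.0, 1198.0, THRESHOLD_A)

# Line 1 is faulted: much of the current entering never reaches the far end.
line1_trips = differential_trip(1200.0, 300.0, THRESHOLD_A)

print(line1_trips, line2_trips)
```

The comparison is only meaningful if both measurements refer to the same instant, which is why the scheme depends on low, symmetric, and constant telecommunications delay between the relays.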
The distance (impedance relay) protection scheme is based on voltage and current measurements. A fault on a circuit will generally create a sag in the voltage level. If the ratio of voltage to current measured at the protection relay terminals, which equates to an impedance element, falls within a set threshold, the circuit breaker will operate. The operating characteristics of this protection are based on the line characteristics. This means that when a fault appears on the line, the impedance setting in the relay is compared to the apparent impedance of the line from the relay terminals to the fault. If the apparent impedance is determined to be below the relay setting, it is determined that the fault is within the zone of protection. When the transmission line length is under a minimum length, distance protection becomes more difficult to coordinate. In these instances the best choice of protection is current differential protection.
Distance protection Requirement | Attribute |
---|---|
One way maximum delay | 5 ms |
Asymmetric delay required | No |
Maximum jitter | Not critical |
Topology | Point to point, point to Multi-point |
Bandwidth | 64 Kbps |
Availability | 99.9999 % |
precise timing required | Yes |
Recovery time on node failure | less than 50ms - hitless |
performance management | Yes, Mandatory |
Redundancy | Yes |
Packet loss | 0.1% |
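The impedance comparison at the heart of distance protection can be sketched as follows. This is a simplified illustration that assumes a circular characteristic centered on the origin of the impedance plane; the function names, the 8 ohm reach, and the sample phasors are hypothetical and do not come from any relay specification.

```python
def apparent_impedance_ohm(voltage_v: complex, current_a: complex) -> complex:
    """Impedance seen from the relay terminals (voltage phasor / current phasor)."""
    return voltage_v / current_a


def fault_in_zone(voltage_v: complex, current_a: complex, reach_ohm: float) -> bool:
    """The relay operates when the apparent impedance magnitude falls below
    the configured reach (the relay's impedance setting)."""
    return abs(apparent_impedance_ohm(voltage_v, current_a)) < reach_ohm


# Normal load: high voltage, moderate current, large apparent impedance.
print(fault_in_zone(60000 + 0j, 600 + 0j, reach_ohm=8.0))    # False
# Fault on the line: the voltage sags and the current rises.
print(fault_in_zone(20000 + 0j, 4000 + 0j, reach_ohm=8.0))   # True
```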
This use case describes the exchange of Sampled Value and/or GOOSE (Generic Object Oriented Substation Events) messages between Intelligent Electronic Devices (IEDs) in two substations for protection and tripping coordination. The two IEDs are in a master-slave mode.
The Current Transformer or Voltage Transformer (CT/VT) in one substation sends the sampled analog voltage or current value to the Merging Unit (MU) over hard wire. The merging unit sends the time-synchronized 61850-9-2 sampled values to the slave IED. The slave IED forwards the information to the Master IED in the other substation. The master IED makes the determination (for example based on sampled value differentials) to send a trip command to the originating IED. Once the slave IED/Relay receives the GOOSE trip for breaker tripping, it opens the breaker. It then sends a confirmation message back to the master. All data exchanges between IEDs are either through Sampled Value and/or GOOSE messages.
Inter-Substation protection Requirement | Attribute |
---|---|
One way maximum delay | 5 ms |
Asymmetric delay required | No |
Maximum jitter | Not critical |
Topology | Point to point, point to Multi-point |
Bandwidth | 64 Kbps |
Availability | 99.9999 % |
precise timing required | Yes |
Recovery time on node failure | less than 50ms - hitless |
performance management | Yes, Mandatory |
Redundancy | Yes |
Packet loss | 1% |
This use case describes the data flow from the CT/VT to the IEDs in the substation via the Merging Unit (MU). The CT/VT in the substation sends the sampled value (analog voltage or current) to the MU over hard wire. The MU sends the time-synchronized 61850-9-2 sampled values to the IEDs in the substation in GOOSE message format. The GPS Master Clock can send 1PPS or IRIG-B format to the MU through a serial port, or use the IEEE 1588 protocol via the network. Process bus communication using 61850 simplifies connectivity within the substation, removing both the requirement for multiple serial connections and the slow serial bus architectures that are typically used. It also increases flexibility and speed through the use of multicast messaging between multiple devices.
Intra-Substation protection Requirement | Attribute |
---|---|
One way maximum delay | 5 ms |
Asymmetric delay required | No |
Maximum jitter | Not critical |
Topology | Point to point, point to Multi-point |
Bandwidth | 64 Kbps |
Availability | 99.9999 % |
precise timing required | Yes |
Recovery time on Node failure | less than 50ms - hitless |
performance management | Yes, Mandatory |
Redundancy | Yes - No |
Packet loss | 0.1% |
The application of synchrophasor measurement data from Phasor Measurement Units (PMUs) to Wide Area Monitoring and Control Systems promises to provide important new capabilities for improving system stability. Access to PMU data enables more timely situational awareness over larger portions of the grid than has been possible historically with normal SCADA (Supervisory Control and Data Acquisition) data. Handling the volume and real-time nature of synchrophasor data presents unique challenges for existing application architectures. A Wide Area Management System (WAMS) makes it possible for the condition of the bulk power system to be observed and understood in real time so that protective, preventative, or corrective action can be taken. Because of the very high sampling rate of measurements and the strict requirement for time synchronization of the samples, WAMS has stringent telecommunications requirements in an IP network, which are captured in the following table:
WAMS Requirement | Attribute |
---|---|
One way maximum delay | 50 ms |
Asymmetric delay required | No |
Maximum jitter | Not critical |
Topology | Point to point, point to Multi-point, Multi-point to Multi-point |
Bandwidth | 100 Kbps |
Availability | 99.9999 % |
precise timing required | Yes |
Recovery time on Node failure | less than 50ms - hitless |
performance management | Yes, Mandatory |
Redundancy | Yes |
Packet loss | 1% |
The IEC (International Electrotechnical Commission) has recently published a Technical Report which offers guidelines on how to define and deploy Wide Area Networks for the interconnection of electric substations, generation plants, and SCADA operation centers. IEC 61850-90-12 classifies WAN communication requirements into 4 classes. The following table summarizes these requirements:
WAN Requirement | Class WA | Class WB | Class WC | Class WD |
---|---|---|---|---|
Application field | EHV (Extra High Voltage) | HV (High Voltage) | MV (Medium Voltage) | General purpose |
Latency | 5 ms | 10 ms | 100 ms | > 100 ms |
Jitter | 10 us | 100 us | 1 ms | 10 ms |
Latency asymmetry | 100 us | 1 ms | 10 ms | 100 ms |
Time Accuracy | 1 us | 10 us | 100 us | 10 to 100 ms |
Bit Error rate | 10^-7 to 10^-6 | 10^-5 to 10^-4 | 10^-3 | |
Unavailability | 10^-7 to 10^-6 | 10^-5 to 10^-4 | 10^-3 | |
Recovery delay | Zero | 50 ms | 5 s | 50 s |
Cyber security | extremely high | High | Medium | Medium |
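As a hedged illustration of how the latency row of this classification might be used, the following Python sketch picks the most relaxed class whose latency bound still satisfies an application's one-way delay requirement. The function name and the selection rule are assumptions made for this example and are not defined by IEC 61850-90-12; Class WD, listed as "> 100 ms", is treated here as having no latency bound.

```python
from typing import Optional

# Latency bounds in milliseconds, copied from the "Latency" row of the table.
# Ordered from the most relaxed bounded class to the strictest one.
LATENCY_BOUND_MS = [("WC", 100.0), ("WB", 10.0), ("WA", 5.0)]


def most_relaxed_class(required_latency_ms: float) -> Optional[str]:
    """Return the least demanding WAN class that meets the latency requirement,
    or None if even Class WA (5 ms) is not strict enough."""
    if required_latency_ms > 100.0:
        return "WD"  # Assumption: any application tolerating > 100 ms fits WD.
    for name, bound_ms in LATENCY_BOUND_MS:
        if bound_ms <= required_latency_ms:
            return name
    return None


print(most_relaxed_class(50.0))   # WB (a 50 ms requirement is not met by WC's 100 ms)
print(most_relaxed_class(5.0))    # WA
print(most_relaxed_class(500.0))  # WD
```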
As the name implies, Fault Location, Isolation, and Service Restoration (FLISR) refers to the ability to automatically locate the fault, isolate the fault, and restore service in the distribution network. It is a self-healing feature whose purpose is to minimize the impact of faults by serving portions of the loads on the affected circuit by switching to other circuits. It reduces the number of customers that experience a sustained power outage by reconfiguring distribution circuits. This will likely be the first widespread application of distributed intelligence in the grid. Secondary substations can be connected to multiple primary substations. Normally, static power switch statuses (open/closed) in the network dictate the power flow to secondary substations. Reconfiguring the network in the event of a fault is typically done manually on site by operating switchgear to energize or de-energize alternate paths. Automating the operation of substation switchgear allows the utility to have a more dynamic network where the flow of power can be altered under fault conditions but also during times of peak load. It allows the utility to shift peak loads around the network or, more precisely, to alter the configuration of the network to move loads between different primary substations. The FLISR capability can be enabled in two modes:
There are 3 distinct sub-functions that are performed:
1. Fault Location Identification
This sub-function is initiated by SCADA inputs, such as lockouts and fault indications/location, by input from the Outage Management System (OMS), and in the future by inputs from fault-predicting devices. It determines the specific protective device that has cleared the sustained fault, identifies the de-energized sections, and estimates the probable location of the actual or the expected fault. It distinguishes faults cleared by controllable protective devices from those cleared by fuses, and identifies momentary outages and inrush/cold load pick-up currents. This step is also referred to as Fault Detection Classification and Location (FDCL). It helps to expedite the restoration of faulted sections through fast fault location identification and improved diagnostic information available for crew dispatch. It also provides visualization of fault information to design and implement a switching plan to isolate the fault.
2. Fault Type Determination
I. Indicates faults cleared by controllable protective devices by distinguishing between:
a. Faults cleared by fuses
b. Momentary outages
c. Inrush/cold load current
II. Determines the faulted sections based on SCADA fault indications and protection lockout signals
III. Increases the accuracy of the fault location estimation based on SCADA fault current measurements and real-time fault analysis
3. Fault Isolation and Service Restoration
Once the location and type of the fault have been pinpointed, the system will attempt to isolate the fault and restore the non-faulted sections of the network. This can have three modes of operation:
I. Closed-loop mode: This is initiated by the fault location sub-function. It generates a switching order (i.e., sequence of switching) for the remotely controlled switching devices to isolate the faulted section and restore service to the non-faulted sections. The switching order is automatically executed via SCADA.
II. Advisory mode: This is initiated by the fault location sub-function. It generates a switching order for remotely and manually controlled switching devices to isolate the faulted section and restore service to the non-faulted sections. The switching order is presented to the operator for approval and execution.
III. Study mode: The operator initiates this function. It analyzes a saved case modified by the operator, and generates a switching order under the operating conditions specified by the operator.
With the increasing volume of data collected through fault sensors, utilities will use Big Data query and analysis tools to study outage information. By detecting failure patterns and their correlation with asset age, type, load profiles, time of day, weather conditions, and other factors, they can discover the conditions that lead to faults and take the necessary preventive and corrective measures to anticipate and prevent outages.
FLISR Requirement | Attribute |
---|---|
One way maximum delay | 80 ms |
Asymmetric delay required | No |
Maximum jitter | 40 ms |
Topology | Point to point, point to Multi-point, Multi-point to Multi-point |
Bandwidth | 64 Kbps |
Availability | 99.9999 % |
precise timing required | Yes |
Recovery time on Node failure | Depends on customer impact |
performance management | Yes, Mandatory |
Redundancy | Yes |
Packet loss | 0.1% |
The system frequency should be maintained within a very narrow band. Deviations from the acceptable frequency range are detected and forwarded to the Load Frequency Control (LFC) system so that the required generation increase or decrease pulses can be sent to the power plants for frequency regulation. The trend in system frequency is a measure of the mismatch between demand and generation, and is a necessary parameter for load control in interconnected systems.
Automatic generation control (AGC) is a system for adjusting the power output of generators at different power plants, in response to changes in the load. Since a power grid requires that generation and load closely balance moment by moment, frequent adjustments to the output of generators are necessary. The balance can be judged by measuring the system frequency; if it is increasing, more power is being generated than used, and all machines in the system are accelerating. If the system frequency is decreasing, more demand is on the system than the instantaneous generation can provide, and all generators are slowing down.
Where the grid has tie lines to adjacent control areas, automatic generation control helps maintain the power interchanges over the tie lines at the scheduled levels. The AGC takes into account various parameters including the most economical units to adjust, the coordination of thermal, hydroelectric, and other generation types, and even constraints related to the stability of the system and capacity of interconnections to other power grids.
For the purpose of AGC, static frequency measurements and averaging methods are used to get a more precise measure of system frequency in steady-state conditions.
During disturbances, more real-time dynamic measurements of system frequency are taken using PMUs, especially when different areas of the system exhibit different frequencies. But that is outside the scope of this use case.
FCAG (Frequency Control Automatic Generation) Requirement | Attribute |
---|---|
One way maximum delay | 500 ms |
Asymmetric delay required | No |
Maximum jitter | Not critical |
Topology | Point to point |
Bandwidth | 20 Kbps |
Availability | 99.999 % |
precise timing required | Yes |
Recovery time on Node failure | N/A |
performance management | Yes, Mandatory |
Redundancy | Yes |
Packet loss | 1% |
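The relationship between averaged system frequency and the direction of the generation adjustment described in this use case can be sketched as follows. This is a minimal illustration only: the 50 Hz nominal frequency, the 0.02 Hz deadband, the function name, and the sample readings are assumptions made for the example, and a real AGC also accounts for tie-line interchange and economic dispatch.

```python
from statistics import fmean
from typing import Sequence


def generation_adjustment(frequency_samples_hz: Sequence[float],
                          nominal_hz: float = 50.0,
                          deadband_hz: float = 0.02) -> str:
    """Average the static frequency measurements and decide the pulse direction.

    Rising frequency means more power is generated than used, so generation
    is lowered; falling frequency means demand exceeds generation, so it is raised.
    """
    deviation_hz = fmean(frequency_samples_hz) - nominal_hz
    if deviation_hz > deadband_hz:
        return "lower"
    if deviation_hz < -deadband_hz:
        return "raise"
    return "hold"


print(generation_adjustment([50.05, 50.06, 50.04]))  # lower
print(generation_adjustment([49.93, 49.95, 49.94]))  # raise
print(generation_adjustment([50.00, 50.01, 49.99]))  # hold
```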
Utilities often have very large private telecommunications networks, which can cover an entire territory or country. Until now, the main purpose of these networks has been to support transmission network monitoring, control, and automation, remote control of generation sites, and the provision of FCAPS (Fault, Configuration, Accounting, Performance, Security) services from centralized network operation centers.
Going forward, one network will support operation and maintenance of electrical networks (generation, transmission, and distribution), voice and data services for tens of thousands of employees and for exchange with neighboring interconnections, and administrative services. To meet those requirements, a utility may deploy several physical networks leveraging different technologies across the country, for instance an optical network and a microwave network. Each protection and automation system between two points has two telecommunications circuits, one on each network. Path diversity between two substations is key. Regardless of the event type (hurricane, ice storm, etc.), one path shall stay available so the SPS can still operate.
In the optical network, signals are transmitted over tens of thousands of circuits using fiber optic links, microwave, and telephone cables. This network is the nervous system of the utility's power transmission operations. The optical network represents tens of thousands of km of cable deployed along the power lines.
Because of the vast distances between transmission substations (for example, as far as 280 km apart), the fiber signal can be amplified so that it reaches a distance of 280 km without excessive attenuation.
Some utilities do not use GPS clocks in generation substations. One of the main reasons is that some of the generation plants are 30 to 50 meters deep underground, where the GPS signal can be weak and unreliable. Instead, atomic clocks are used, and the clocks are synchronized with each other. Rubidium clocks provide clock and 1 ms timestamps for IRIG-B. Some companies plan to transition to the Precision Time Protocol (IEEE 1588), distributing the synchronization signal over the IP/MPLS network.
The Precision Time Protocol (PTP) is defined in IEEE standard 1588. PTP is applicable to distributed systems consisting of one or more nodes, communicating over a network. Nodes are modeled as containing a real-time clock that may be used by applications within the node for various purposes such as generating time-stamps for data or ordering events managed by the node. The protocol provides a mechanism for synchronizing the clocks of participating nodes to a high degree of accuracy and precision.
PTP operates based on the following assumptions:
A time-stamp event is generated at the time of transmission and reception of any event message. The time-stamp event occurs when the message's timestamp point crosses the boundary between the node and the network.
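The way these transmission and reception timestamps are turned into a clock correction can be illustrated with the standard delay request-response calculation. The sketch below is a simplified illustration, not an implementation of IEEE 1588: the function name and the sample timestamps (in nanoseconds) are hypothetical, and the calculation assumes that the path delay is the same in both directions.

```python
from typing import Tuple


def ptp_offset_and_delay(t1: int, t2: int, t3: int, t4: int) -> Tuple[float, float]:
    """Estimate (slave clock offset, one-way delay) from four timestamps.

    t1: master sends Sync         (master clock)
    t2: slave receives Sync       (slave clock)
    t3: slave sends Delay_Req     (slave clock)
    t4: master receives Delay_Req (master clock)
    Assumes a symmetric path: forward delay == reverse delay.
    """
    forward = t2 - t1   # forward delay + offset
    reverse = t4 - t3   # reverse delay - offset
    offset = (forward - reverse) / 2
    delay = (forward + reverse) / 2
    return offset, delay


# Symmetric 100 us path, slave clock 5 us ahead: the estimate is exact.
print(ptp_offset_and_delay(0, 105_000, 200_000, 295_000))  # (5000.0, 100000.0)
# Asymmetric path (100 us forward, 300 us reverse), clocks actually aligned:
# half of the asymmetry appears as a false offset.
print(ptp_offset_and_delay(0, 100_000, 200_000, 500_000))  # (-100000.0, 200000.0)
```

The second call shows why path asymmetry matters: the protocol cannot distinguish delay asymmetry from clock offset, so half of any uncorrected asymmetry becomes a synchronization error.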
IEC 61850 will recommend the use of the IEEE PTP 1588 Utility Profile (as defined in IEC 62439-3 Annex B), which supports redundant attachment of clocks to Parallel Redundancy Protocol (PRP) and High-availability Seamless Redundancy (HSR) networks.
This memo includes no request to IANA.
Grid monitoring and control devices are already targets for cyber attacks, and legacy telecommunications protocols have many intrinsic network-related vulnerabilities. DNP3, Modbus, PROFIBUS/PROFINET, and other protocols are designed around a common paradigm of request and respond. Each protocol is designed for a master device such as an HMI (Human Machine Interface) system to send commands to subordinate slave devices to retrieve data (reading inputs) or control (writing to outputs). Because many of these protocols lack authentication, encryption, or other basic security measures, they are prone to network-based attacks, allowing a malicious actor or attacker to utilize the request-and-respond system as a mechanism for command-and-control-like functionality. Specific security concerns common to most industrial control protocols, including utility telecommunication protocols, include the following:
These inherent vulnerabilities, along with increasing connectivity between IT and OT networks, make network-based attacks very feasible. Simple injection of malicious protocol commands provides control over the target process. Altering legitimate protocol traffic can also alter information about a process and disrupt the legitimate controls that are in place over that process. A man-in-the-middle attack could provide both control over a process and misrepresentation of data back to operator consoles.
Although advanced telecommunications networks can assist in transforming the energy industry, playing a critical role in maintaining high levels of reliability, performance, and manageability, they also introduce the need for an integrated security infrastructure. Many of the technologies being deployed to support smart grid projects such as smart meters and sensors can increase the vulnerability of the grid to attack. Top security concerns for utilities migrating to an intelligent smart grid telecommunications platform center on the following trends:
This development of a diverse set of networks to support the integration of microgrids, open-access energy competition, and the use of network-controlled devices is driving the need for a converged security infrastructure for all participants in the smart grid, including utilities, energy service providers, large commercial and industrial, as well as residential customers. Securing the assets of electric power delivery systems, from the control center to the substation, to the feeders and down to customer meters, requires an end-to-end security infrastructure that protects the myriad of telecommunications assets used to operate, monitor, and control power flow and measurement. Cyber security refers to all the security issues in automation and telecommunications that affect any functions related to the operation of the electric power systems. Specifically, it involves the concepts of:
When designing and deploying new smart grid devices and telecommunications systems, it is imperative to understand the various impacts of these new components under a variety of attack situations on the power grid. The consequences of a cyber attack on the grid telecommunications network can be catastrophic. This is why security for the smart grid is not just an ad hoc feature or product; it is a complete framework integrating both physical and cyber security requirements and covering the entire smart grid network from generation to distribution. Security has therefore become one of the main foundations of the utility telecom network architecture and must be considered at every layer with a defense-in-depth approach. Migrating to IP-based protocols is key to addressing these challenges for two reasons:
1. IP enables a rich set of features and capabilities to enhance the security posture
2. IP is based on open standards, which allows interoperability between different vendors and products, driving down the costs associated with implementing security solutions in OT networks.
Securing OT (Operational Technology) telecommunications over packet-switched IP networks follows the same principles that are foundational for securing the IT infrastructure, i.e., consideration must be given to enforcing electronic access control for both person-to-machine and machine-to-machine communications, and providing the appropriate levels of data privacy, device and platform integrity, and threat detection and mitigation.
Faramarz Maghsoodlou, Ph. D. IoT Connected Industries and Energy Practice Cisco
Pascal Thubert, CTAO Cisco
A Building Automation System (BAS) is a system that manages various equipment and sensors in buildings (e.g., heating, cooling, and ventilation) to improve residents' comfort, reduce energy consumption, and respond automatically in case of failure or emergency. For example, a BAS measures the temperature of a room by using various sensors and then controls the HVAC (Heating, Ventilating, and Air Conditioning) system automatically to maintain the temperature level and minimize energy consumption.
There are typically two layers of network in a BAS. The upper one is called the management network and the lower one is called the field network. In management networks, an IP-based communication protocol is used, while in field networks, non-IP-based communication protocols (a.k.a. field protocols) are mainly used.
There are many field protocols used in today's deployments, in which some medium access control and physical layer protocols are standards-based and others are proprietary. Therefore the BAS needs to have multiple MAC/PHY modules and interfaces to make use of devices based on multiple field protocols. This situation not only makes the BAS more expensive, with a long development cycle for multiple devices, but also creates the issue of vendor lock-in with multiple types of management applications.
The other issue with some of the existing field networks and protocols is security. When these protocols and networks were developed, it was assumed that the field networks were physically isolated from external networks, and therefore network and protocol security was not a concern. However, in today's world many BASes are managed remotely and are connected to shared IP networks, and it is not uncommon for the same IT infrastructure to be used, be it in office, home, or enterprise networks. Adding network and protocol security to an existing system is a non-trivial task.
This document first describes the BAS functionalities, its architecture and current deployment models. Then we discuss the use cases and field network requirements that need to be satisfied by deterministic networking.
Building Automation System (BAS) is a system that manages various devices in buildings automatically. BAS primarily performs the following functions:
A typical BAS architecture is described below in Figure 1. There are several elements in a BAS.
+----------------------------+
|                            |
|       BMS        HMI       |
|        |          |        |
|  +----------------------+  |
|  |  Management Network  |  |
|  +----------------------+  |
|        |          |        |
|        LC         LC       |
|        |          |        |
|  +----------------------+  |
|  |     Field Network    |  |
|  +----------------------+  |
|     |     |     |     |    |
|    Dev   Dev   Dev   Dev   |
|                            |
+----------------------------+

BMS := Building Management Server
HMI := Human Machine Interface
LC  := Local Controller
Figure 1: BAS architecture
Human Machine Interface (HMI): It is commonly a computing platform (e.g., desktop PC) used by operators. Operators perform the following operations through HMI.
The Building Management Server (BMS) collects device states from LCs (Local Controllers) and stores them in a database. According to its configuration, the BMS executes the following operation automatically.
The BMS and HMI communicate with Local Controllers (LCs) via IP-based communication protocols such as BACnet/IP [bacnetip] and KNX/IP [knx]. These protocols are commonly called management protocols. LCs measure device states and provide the information to the BMS or HMI. These devices may include HVAC units, fans, doors, valves, lights, and sensors (e.g., temperature, humidity, and illuminance). An LC can also set control values on the devices. An LC sometimes has additional functions, for example, sending a device state to the BMS or HMI if the device state exceeds a certain threshold value, or applying feedback control to a device to keep the device state at a certain value. A typical example of an LC is a PLC (Programmable Logic Controller).
Each LC is connected to a different field network and communicates with several tens or hundreds of devices via the field network. Today there are many field protocols used in the field network. Depending on the type of field protocol used, the LC interfaces and hardware/software can differ. Field protocols are currently non-IP-based; some of them are standards-based (e.g., LonTalk [lontalk], Modbus [modbus], Profibus [profibus], FL-net [flnet]) and others are proprietary.
An example BAS system deployment model for medium and large buildings is depicted in Figure 2 below. In this case the physical layout of the entire system spans across multiple floors in which there is normally a monitoring room where the BAS management entities are located. Each floor will have one or more LCs depending upon the number of devices connected to the field network.
+--------------------------------------------------+
|                                          Floor 3 |
|     +----LC~~~~+~~~~~+~~~~~+                     |
|     |          |     |     |                     |
|     |         Dev   Dev   Dev                    |
|     |                                            |
|---  |  ------------------------------------------|
|     |                                    Floor 2 |
|     +----LC~~~~+~~~~~+~~~~~+  Field Network      |
|     |          |     |     |                     |
|     |         Dev   Dev   Dev                    |
|     |                                            |
|---  |  ------------------------------------------|
|     |                                    Floor 1 |
|     +----LC~~~~+~~~~~+~~~~~+   +-----------------|
|     |          |     |     |   | Monitoring Room |
|     |         Dev   Dev   Dev  |                 |
|     |                          |    BMS   HMI    |
|     |   Management Network     |     |     |     |
|     +--------------------------------+-----+     |
|                                                  |
+--------------------------------------------------+
Figure 2: Deployment model for Medium/Large Buildings
Each LC is then connected to the monitoring room via the management network. In this scenario, the management functions are performed locally and reside within the building. In most cases, fast Ethernet (e.g., 100BASE-TX) is used for the management network. In the field network, a variety of physical interfaces such as RS232C and RS485 are used. Since the management network is non-real-time, Ethernet without quality of service is sufficient for today's deployments. However, the requirements are different for field networks when they are replaced by either Ethernet or any wireless technologies supporting real-time requirements (Section 3.4).
Figure 3 depicts a deployment model in which the management can be hosted remotely. This deployment is becoming popular for small office and residential buildings, where a standalone monitoring system is not a cost-effective solution. In such a scenario, multiple buildings are managed by a remote management monitoring system.
                                         +---------------+
                                         | Remote Center |
                                         |               |
                                         |  BMS     HMI  |
+------------------------------------+   |   |       |   |
|                            Floor 2 |   |   +---+---+   |
|    +----LC~~~~+~~~~~+ Field Network|   |       |       |
|    |          |     |              |   |     Router    |
|    |         Dev   Dev             |   +-------|-------+
|    |                               |           |
|--- | ------------------------------|           |
|    |                       Floor 1 |           |
|    +----LC~~~~+~~~~~+              |           |
|    |          |     |              |           |
|    |         Dev   Dev             |           |
|    |                               |           |
|    |   Management Network          |     WAN   |
|    +------------------------Router-------------+
|                                    |
+------------------------------------+
Figure 3: Deployment model for Small Buildings
In either case, interoperability today is limited to the management network and its protocols. In existing deployments, there is limited opportunity for interoperability in the field network due to its non-IP-based design and requirements.
In this section, we describe several use cases and corresponding network requirements.
In this use case, LCs measure environmental data (e.g., temperature, humidity, illuminance, CO2) from several sensor devices at each measurement interval. LCs keep the latest value of each sensor. The BMS sends data requests to LCs to collect the latest values, then stores the collected values in a database. Operators check the latest environmental data displayed by the HMI. The BMS also checks the collected data automatically to notify the operators if a room condition is deteriorating (e.g., too hot or too cold). The following table lists the field network requirements, assuming that the number of devices in a typical building is in the 100s per LC.
Metric | Requirement |
---|---|
Measurement interval | 100 msec |
Availability | 99.999 % |
In some deployments, the BMS sends data requests every 1 second in order to draw a historical chart with 1 second granularity. A 100 msec measurement interval is therefore sufficient for this use case, because a granularity 10 times finer than the data request interval is typically considered accurate enough. An LC needs to measure the values of all sensors connected to it at each measurement interval. The delay of each individual communication is not critical in this scenario; the important requirement is completing the measurements of all sensor values within the specified measurement interval. The availability required in this use case is very high (99.999 %, i.e., three 9s after the decimal point).
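The arithmetic behind this requirement can be made explicit with a short sketch. It assumes, purely for illustration, that an LC polls its sensors one after another, so that the measurement interval is shared equally among them; the function names and the figure of 200 sensors are hypothetical.

```python
def measurements_per_request(request_interval_ms: int, measurement_interval_ms: int) -> int:
    """How many measurement rounds fit between two BMS data requests."""
    return request_interval_ms // measurement_interval_ms


def per_sensor_budget_ms(measurement_interval_ms: float, sensors_per_lc: int) -> float:
    """Time available per sensor if all sensors are polled sequentially
    within one measurement interval (an assumption of this sketch)."""
    return measurement_interval_ms / sensors_per_lc


# 1 second requests with a 100 msec measurement interval: 10 times finer.
print(measurements_per_request(1000, 100))  # 10
# 200 sensors sharing a 100 msec interval.
print(per_sensor_budget_ms(100, 200))       # 0.5
```

Under the sequential polling assumption, an LC with a few hundred sensors has well under 1 msec per sensor, even though no individual exchange has a strict delay bound.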
In the case of fire detection, the HMI needs to show a popup window with an alert message within a few seconds after an abnormal state is detected. The BMS needs to perform some operations if it detects fire, for example, stopping the HVAC, closing fire shutters, and turning on fire sprinklers. The following table describes the requirements, assuming that the number of devices in a typical building is in the 10s per LC.
Metric | Requirement |
---|---|
Measurement interval | 10s of msec |
Communication delay | < 10s of msec |
Availability | 99.9999 % |
In order to perform the above operations within a few seconds (1 or 2 seconds) after detecting fire, LCs should measure sensor values at a regular interval of less than 10s of msec. If an LC detects an abnormal sensor value, it sends alarm information to the BMS and HMI immediately. The BMS then controls the HVAC, fire shutters, or fire sprinklers. The HMI then displays a popup window and generates the alert message. Since the management network does not operate in real time, and software running on the BMS or HMI requires 100s of msec, the communication delay should be less than ~10s of msec. The availability required in this use case is very high (99.9999 %, i.e., four 9s after the decimal point).
Feedback control is used to keep a device state at a certain value, for example, keeping a room temperature at 27 degrees Celsius or keeping a water flow rate at 100 L/min. The target device state is normally pre-defined in LCs or provided by the BMS or the HMI.
In the feedback control procedure, an LC repeats the following actions at a regular interval (feedback interval).
The feedback interval highly depends on the characteristics of the device and the target quality of control. While a feedback interval of several tens of milliseconds is sufficient to control a valve that regulates a water flow, controlling DC motors requires an interval of several milliseconds. The following table describes the field network requirements, assuming that the number of devices in a typical building is in the 10s per LC.
Metric | Requirement |
---|---|
Feedback interval | ~10ms - 100ms |
Communication delay | < 10s of msec |
Communication jitter | < 1 msec |
Availability | 99.9999 % |
Small communication delay and jitter are required in this use case in order to provide high-quality feedback control. This is currently offered in production environments with high availability (99.9999 %, i.e., four 9s after the decimal point).
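To make the notion of a feedback interval concrete, the following sketch simulates a generic measure, compute, actuate loop with a simple proportional controller. The three-step structure, the controller gain, the toy device model, and the function name are assumptions chosen for illustration; they are not taken from any field protocol or from this document.

```python
from typing import List


def simulate_feedback(target: float, initial: float, gain: float, steps: int) -> List[float]:
    """Run `steps` feedback intervals and return the device state after each one."""
    state = initial
    history = []
    for _ in range(steps):
        measured = state                     # 1. measure the device state
        control = gain * (target - measured) # 2. compute a control value
        state = state + control              # 3. apply it (toy device model)
        history.append(state)
    return history


# Drive a room temperature from 20 to 27 degrees Celsius.
trace = simulate_feedback(target=27.0, initial=20.0, gain=0.5, steps=20)
print(trace[0])             # 23.5
print(round(trace[-1], 3))  # 27.0
```

In a real deployment each iteration would involve at least one exchange over the field network, so delay and jitter on that exchange translate directly into variation of the effective feedback interval.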
Both network and physical security of BAS are important. While physical security is present in today's deployments, adequate network security and access control are either not implemented or not configured properly. This was acceptable while networks were isolated and not connected to the IT or other infrastructure networks, but when IT and OT (Operational Technology) are connected in the same infrastructure network, network security is essential. The management network, being an IP-based network, does have the protocols and knobs to enable network security, but in many cases a BAS, for example, does not use device authentication or encryption for data in transit. Furthermore, many of today's field networks do not provide any security at all. The following are the high-level security requirements that the network should provide:
(This section was derived from draft-thubert-6tisch-4detnet-01)
The emergence of wireless technology has enabled a variety of new devices to be interconnected, at a very low marginal cost per device, at any distance ranging from Near Field to interplanetary, and in circumstances where wiring may not be practical, for instance on fast-moving or rotating devices.
At the same time, a new breed of Time Sensitive Networks is being developed to enable traffic that is highly sensitive to jitter, quite sensitive to latency, and with a high degree of operational criticality so that loss should be minimized at all times. Such traffic is not limited to professional Audio/Video networks, but is also found in command and control operations such as industrial automation and vehicular sensors and actuators.
At IEEE802.1, the Audio/Video Task Group [IEEE802.1TSNTG] was renamed Time Sensitive Networking (TSN) to address Deterministic Ethernet. The Medium Access Control (MAC) of IEEE802.15.4 [IEEE802154] has evolved with the new TimeSlotted Channel Hopping (TSCH) [RFC7554] mode for deterministic industrial-type applications. TSCH was introduced with the IEEE802.15.4e [IEEE802154e] amendment and will be wrapped up in the next revision of the IEEE802.15.4 standard. For all practical purposes, this document is expected to be insensitive to future versions of the IEEE802.15.4 standard, which is thus referenced undated.
Though at a different time scale, both TSN and TSCH standards provide Deterministic capabilities to the point that a packet that pertains to a certain flow crosses the network from node to node following a very precise schedule, as a train that leaves intermediate stations at precise times along its path. With TSCH, time is formatted into timeSlots, and an individual cell is allocated to unicast or broadcast communication at the MAC level. The time-slotted operation reduces collisions, saves energy, and makes it possible to engineer the network more closely for deterministic properties. The channel hopping aspect is a simple and efficient technique to combat multi-path fading and co-channel interference (for example by Wi-Fi emitters).
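As an illustration of the channel hopping aspect, the following minimal sketch maps a cell to a channel in the manner described for TSCH in [RFC7554], where the channel is looked up from a hopping sequence using the sum of a running timeSlot counter (the ASN) and the channelOffset of the cell. The hopping sequence, the slotFrame length, and all function names are assumptions chosen for the example; they are not taken from this document.

```python
# Assumption: the 16 channels of the 2.4 GHz band, in a fixed illustrative order.
HOPPING_SEQUENCE = list(range(11, 27))
# Assumption: a hypothetical slotFrame size, in timeSlots.
SLOTFRAME_LENGTH = 101


def effective_channel(asn: int, channel_offset: int) -> int:
    """Return the channel used by a cell in the timeSlot counted by asn."""
    index = (asn + channel_offset) % len(HOPPING_SEQUENCE)
    return HOPPING_SEQUENCE[index]


def channels_for_cell(slot_offset: int, channel_offset: int, iterations: int) -> list:
    """Channels used by one cell over successive iterations of the slotFrame."""
    return [
        effective_channel(k * SLOTFRAME_LENGTH + slot_offset, channel_offset)
        for k in range(iterations)
    ]


if __name__ == "__main__":
    # The same cell lands on a different channel at each schedule iteration.
    print(channels_for_cell(slot_offset=5, channel_offset=3, iterations=4))
```

With a slotFrame length that shares no factor with the number of channels, as assumed here, a given cell rotates through every channel of the sequence before repeating, which is what spreads the effect of a single bad channel over time.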
The 6TiSCH Architecture [I-D.ietf-6tisch-architecture] defines a remote monitoring and scheduling management of a TSCH network by a Path Computation Element (PCE), which cooperates with an abstract Network Management Entity (NME) to manage timeSlots and device resources in a manner that minimizes the interaction with and the load placed on the constrained devices.
This Architecture applies the concepts of Deterministic Networking on a TSCH network to enable the switching of timeSlots in a G-MPLS manner. This document details the dependencies that 6TiSCH has on PCE [PCE] and DetNet [I-D.finn-detnet-architecture] to provide the necessary capabilities that may be specific to such networks. In turn, DetNet is expected to integrate and maintain consistency with the work that has taken place and is continuing at IEEE802.1TSN and AVnu.
Readers are expected to be familiar with all the terms and concepts that are discussed in "Multi-link Subnet Support in IPv6" [I-D.ietf-ipv6-multilink-subnets].
The draft uses terminology defined or referenced in [I-D.ietf-6tisch-terminology] and [I-D.ietf-roll-rpl-industrial-applicability].
The draft also conforms to the terms and models described in [RFC3444] and uses the vocabulary and the concepts defined in [RFC4291] for the IPv6 Architecture.
The scope of the present work is a subnet that, in its basic configuration, is made of a TSCH [RFC7554] MAC Low Power Lossy Network (LLN).
            ---+-------- ............ ------------
               |      External Network       |
               |                          +-----+
            +-----+                       | NME |
            |     | LLN Border            |     |
            |     | router                +-----+
            +-----+
          o    o   o
      o     o   o     o
         o   o LLN   o    o
            o   o   o       o
                    o
Figure 4: Basic Configuration of a 6TiSCH Network
In the extended configuration, a Backbone Router (6BBR) federates multiple 6TiSCH LLNs in a single subnet over a backbone. 6TiSCH 6BBRs synchronize with one another over the backbone, so as to ensure that the multiple LLNs that form the IPv6 subnet stay tightly synchronized.
               ---+-------- ............ ------------
                  |      External Network       |
                  |                          +-----+
                  |             +-----+      | NME |
               +-----+          |  +-----+   |     |
               |     | Router   |  | PCE |   +-----+
               |     |          +--|     |
               +-----+             +-----+
                  |                   |
                  | Subnet Backbone   |
            +--------------------+------------------+
            |                    |                  |
         +-----+             +-----+             +-----+
         |     | Backbone    |     | Backbone    |     | Backbone
    o    |     | router      |     | router      |     | router
         +-----+             +-----+             +-----+
    o                  o                   o                 o   o
        o    o   o         o   o  o   o         o  o   o    o
   o             o        o  LLN      o      o         o      o
      o   o    o      o      o o     o  o   o    o    o     o
Figure 5: Extended Configuration of a 6TiSCH Network
If the Backbone is Deterministic, then the Backbone Router ensures that the end-to-end deterministic behavior is maintained between the LLN and the backbone. This SHOULD be done in conformance to the DetNet Architecture [I-D.finn-detnet-architecture], which studies Layer-3 aspects of Deterministic Networks and covers networks that span multiple Layer-2 domains. One particular requirement is that the PCE MUST be able to compute a deterministic path end-to-end across the TSCH network and an IEEE802.1 TSN Ethernet backbone, and DetNet MUST enable end-to-end deterministic forwarding.
6TiSCH defines the concept of a Track, which is a complex form of a uni-directional Circuit ([I-D.ietf-6tisch-terminology]). As opposed to a simple circuit that is a sequence of nodes and links, a Track is shaped as a directed acyclic graph towards a destination to support multi-path forwarding and route around failures. A Track may also branch off and rejoin, for the purpose of the so-called Packet Replication and Elimination (PRE), over non congruent branches. PRE may be used to complement layer-2 Automatic Repeat reQuest (ARQ) to meet industrial expectations in Packet Delivery Ratio (PDR), in particular when the Track extends beyond the 6TiSCH network.
                  +-----+
                  | IoT |
                  | G/W |
                  +-----+
                     ^  <---- Elimination
                    | |
     Track branch   | |
            +-------+ +--------+ Subnet Backbone
            |                  |
         +--|--+            +--|--+
         |  |  | Backbone   |  |  | Backbone
    o    |  |  | router     |  |  | router
         +--/--+            +--|--+
    o     /    o     o---o----/       o
        o    o---o--/   o      o   o  o   o
   o     \  /     o               o   LLN    o
      o   v  <---- Replication
          o
Figure 6: End-to-End deterministic Track
In the example above, a Track is laid out from a field device in a 6TiSCH network to an IoT gateway that is located on an IEEE802.1 TSN backbone.
The Replication function in the field device sends a copy of each packet over two different branches, and the PCE schedules each hop of both branches so that the two copies arrive in due time at the gateway. In case of a loss on one branch, hopefully the other copy of the packet still makes it in due time. If two copies make it to the IoT gateway, the Elimination function in the gateway ignores the extra packet and presents only one copy to upper layers.
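The Replication and Elimination functions described above can be sketched as follows. The sketch assumes that each packet carries a flow identifier and a sequence number so that duplicates can be recognized; this is purely an assumption for illustration, and the class and function names are hypothetical.

```python
class Eliminator:
    """Keeps the first copy of each packet and discards later duplicates."""

    def __init__(self):
        # Assumption: unbounded history for simplicity; a constrained
        # device would keep only a small window of recent sequence numbers.
        self._seen = set()

    def accept(self, flow_id: str, seq: int) -> bool:
        key = (flow_id, seq)
        if key in self._seen:
            return False  # extra copy, ignored
        self._seen.add(key)
        return True  # first copy, presented to upper layers


def replicate(seq: int, branches: list) -> list:
    """Send one copy of the packet over each branch of the Track."""
    return [(branch, seq) for branch in branches]


def deliver(copies: list, lost_branches: set, eliminator: Eliminator,
            flow_id: str = "track-1") -> list:
    """Return the sequence numbers presented to upper layers at the gateway."""
    delivered = []
    for branch, seq in copies:
        if branch in lost_branches:
            continue  # this copy was lost on its branch
        if eliminator.accept(flow_id, seq):
            delivered.append(seq)
    return delivered
```

In this model, a packet reaches the upper layers exactly once as long as at least one branch delivers its copy, and it is lost only when every branch fails.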
At each 6TiSCH hop along the Track, the PCE may schedule more than one timeSlot for a packet, so as to support Layer-2 retries (ARQ). It is also possible that the field device only uses the second branch if sending over the first branch fails.
In current deployments, a TSCH Track does not necessarily support PRE but is systematically multi-path. This means that a Track is scheduled so as to ensure that each hop has at least two forwarding solutions, and the forwarding decision is to try the preferred one and use the other in case of Layer-2 transmission failure as detected by ARQ.
6top is a logical link control sitting between the IP layer and the TSCH MAC layer, which provides the link abstraction that is required for IP operations. The 6top operations are specified in [I-D.wang-6tisch-6top-sublayer].
The 6top data model and management interfaces are further discussed in [I-D.ietf-6tisch-6top-interface] and [I-D.ietf-6tisch-coap].
The architecture defines "soft" cells and "hard" cells. "Hard" cells are owned and managed by a separate scheduling entity (e.g. a PCE) that specifies the slotOffset/channelOffset of the cells to be added/moved/deleted, in which case 6top can only act as instructed, and may not move hard cells in the TSCH schedule on its own.
A slotFrame is the base object that the PCE needs to manipulate to program a schedule into an LLN node. Elaboration on that concept can be found in section "SlotFrames and Priorities" of the 6TiSCH architecture [I-D.ietf-6tisch-architecture]. The architecture also details how the schedule is constructed and how transmission resources called cells can be allocated to particular transmissions so as to avoid collisions.
6TiSCH supports a mixed model of centralized routes and distributed routes. Centralized routes can for example be computed by an entity such as a PCE. Distributed routes are computed by RPL.
Both methods may inject routes in the Routing Tables of the 6TiSCH routers. In either case, each route is associated with a 6TiSCH topology that can be a RPL Instance topology or a Track. The 6TiSCH topology is indexed by an Instance ID, in a format that reuses the RPLInstanceID as defined in RPL [RFC6550].
Both RPL and PCE rely on shared sources such as policies to define Global and Local RPLInstanceIDs that can be used by either method. It is possible for centralized and distributed routing to share the same topology. Generally they will operate in different slotFrames, and centralized routes will be used for scheduled traffic and will have precedence over distributed routes in case of conflict between the slotFrames.
Section "Schedule Management Mechanisms" of the 6TiSCH architecture describes 4 paradigms to manage the TSCH schedule of the LLN nodes: Static Scheduling, neighbor-to-neighbor Scheduling, remote monitoring and scheduling management, and Hop-by-hop scheduling. The Track operation for DetNet corresponds to a remote monitoring and scheduling management by a PCE.
The 6top interface document [I-D.ietf-6tisch-6top-interface] specifies the generic data model that can be used to monitor and manage resources of the 6top sublayer. Abstract methods are suggested for use by a management entity in the device. The data model also enables remote control operations on the 6top sublayer.
[I-D.ietf-6tisch-coap] defines a mapping of the 6top set of commands, which is described in [I-D.ietf-6tisch-6top-interface], to CoAP resources. This allows an entity to interact with the 6top layer of a node that is multiple hops away in a RESTful fashion.
[I-D.ietf-6tisch-coap] also defines a basic set of CoAP resources and associated RESTful access methods (GET/PUT/POST/DELETE). The payload (body) of the CoAP messages is encoded using the CBOR format. The PCE commands are expected to be issued directly as CoAP requests or to be mapped back and forth into CoAP by a gateway function at the edge of the 6TiSCH network. For instance, it is possible that a mapping entity on the backbone transforms a non-CoAP protocol such as PCEP into the RESTful interfaces that the 6TiSCH devices support. This architecture will be refined to comply with DetNet [I-D.finn-detnet-architecture] when the work is formalized.
By forwarding, this specification means the per-packet operation that delivers a packet to a next hop or to an upper layer in this node. Forwarding is based on pre-existing state that was installed as a result of the routing computation of a Track by a PCE. The 6TiSCH architecture supports three different forwarding models: G-MPLS Track Forwarding (TF), 6LoWPAN Fragment Forwarding (FF), and IPv6 Forwarding (6F), which is the classical IP operation. The DetNet case relates to the Track Forwarding operation under the control of a PCE.
A Track is a unidirectional path between a source and a destination. In a Track cell, the normal operation of IEEE802.15.4 Automatic Repeat-reQuest (ARQ) usually happens, though the acknowledgment may be omitted in some cases, for instance if there is no scheduled cell for a retry.
Track Forwarding is the simplest and fastest. A bundle of cells set to receive (RX-cells) is uniquely paired to a bundle of cells that are set to transmit (TX-cells), representing a layer-2 forwarding state that can be used regardless of the network layer protocol. This model can effectively be seen as a Generalized Multi-protocol Label Switching (G-MPLS) operation in that the information used to switch a frame is not an explicit label, but rather related to other properties of the way the packet was received, a particular cell in the case of 6TiSCH. As a result, as long as the TSCH MAC (and Layer-2 security) accepts a frame, that frame can be switched regardless of the protocol, whether this is an IPv6 packet, a 6LoWPAN fragment, or a frame from an alternate protocol such as WirelessHART or ISA100.11a.
A data frame that is forwarded along a Track normally has a destination MAC address that is set to broadcast - or a multicast address depending on MAC support. This way, the MAC layer in the intermediate nodes accepts the incoming frame and 6top switches it without incurring a change in the MAC header. In the case of IEEE802.15.4, this means effectively broadcast, so that along the Track the short address for the destination of the frame is set to 0xFFFF.
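The pairing of receive and transmit bundles can be pictured as a small lookup table keyed by the cell in which a frame arrives. The sketch below is only an illustration of that idea: cells are modeled as (slotOffset, channelOffset) tuples, the class and method names are hypothetical, and the rule that a cell is paired only once is enforced here as an assumption of the model.

```python
BROADCAST_SHORT_ADDR = 0xFFFF  # IEEE802.15.4 short broadcast address


class TrackSwitch:
    """Layer-2 forwarding state: RX-cell -> paired bundle of TX-cells."""

    def __init__(self):
        self._rx_to_tx = {}

    def pair(self, rx_bundle: list, tx_bundle: list) -> None:
        """Pair every cell of a receive bundle with one transmit bundle."""
        for cell in rx_bundle:
            if cell in self._rx_to_tx:
                raise ValueError(f"cell {cell} is already paired")
            self._rx_to_tx[cell] = list(tx_bundle)

    def switch(self, rx_cell: tuple, frame: bytes):
        """Return (TX-cells, frame) for a frame received in rx_cell.

        The frame is returned untouched: the switching decision depends
        only on the cell, not on the MAC header or the payload protocol.
        """
        tx_bundle = self._rx_to_tx.get(rx_cell)
        if tx_bundle is None:
            return None  # the cell does not belong to a Track on this node
        return tx_bundle, frame
```

Because the destination address along the Track stays at the broadcast value, an intermediate node in this model never rewrites the MAC header; it only selects the next transmit opportunity.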
A Track is thus formed end-to-end as a succession of paired bundles, a receive bundle from the previous hop and a transmit bundle to the next hop along the Track, and a cell in such a bundle belongs to at most one Track. For a given iteration of the device schedule, the effective channel of the cell is obtained by adding a pseudo-random number to the channelOffset of the cell, which results in a rotation of the frequency that is used for transmission. The bundles may be computed so as to accommodate both variable rates and retransmissions, so they might not be fully used at a given iteration of the schedule. The 6TiSCH architecture provides additional means to avoid waste of cells as well as overflows in the transmit bundle, as follows:
On the one hand, a TX-cell that is not needed for the current iteration may be reused opportunistically on a per-hop basis for routed packets. When all of the frames that were received for a given Track are effectively transmitted, any available TX-cell for that Track can be reused for upper layer traffic for which the next-hop router matches the next hop along the Track. In that case, the cell that is being used is effectively a TX-cell from the Track, but the short address for the destination is that of the next-hop router. As a result, a frame that is received in a RX-cell of a Track with a destination MAC address set to this node as opposed to broadcast must be extracted from the Track and delivered to the upper layer (a frame with an unrecognized MAC address is dropped at the lower MAC layer and thus is not received at the 6top sublayer).
On the other hand, it might happen that there are not enough TX-cells in the transmit bundle to accommodate the Track traffic, for instance if more retransmissions are needed than provisioned. In that case, the frame can be placed for transmission in the bundle that is used for layer-3 traffic towards the next hop along the Track as long as it can be routed by the upper layer, that is, typically, if the frame transports an IPv6 packet. The MAC address should be set to the next-hop MAC address to avoid confusion. As a result, a frame that is received over a layer-3 bundle may in fact be associated with a Track. On a classical IP link such as Ethernet, off-track traffic is typically traffic in excess of the reservation, which is routed along the non-reserved path based on its QoS setting. But with 6TiSCH, since the use of the layer-3 bundle may be due to transmission failures, it makes sense for the receiver to recognize a frame that should be re-tracked, and to place it back on the appropriate bundle if possible. A frame should be re-tracked if the Per-Hop-Behavior group indicated in the Differentiated Services Field in the IPv6 header is set to Deterministic Forwarding, as discussed in Section 5.4.1. A frame is re-tracked by scheduling it for transmission over the transmit bundle associated with the Track, with the destination MAC address set to broadcast.
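The receive-side decisions implied by these two cases can be summarized in a small classifier. This is a hedged sketch rather than a specified algorithm: the function signature, the Action names, and the way the receiver learns which Track a layer-3 frame belongs to (the flow_track argument) are all assumptions made for the example.

```python
from enum import Enum, auto
from typing import Optional, Tuple

BROADCAST = 0xFFFF  # IEEE802.15.4 short broadcast address


class Action(Enum):
    SWITCH_ON_TRACK = auto()  # forward over the paired transmit bundle
    DELIVER_UP = auto()       # hand the frame to the upper layer
    RETRACK = auto()          # place the frame back on its Track
    DROP = auto()             # unrecognized destination MAC address


def classify(dmac: int, my_addr: int, rx_track: Optional[str],
             is_deterministic_phb: bool,
             flow_track: Optional[str]) -> Tuple[Action, Optional[str]]:
    """Decide what to do with a received frame.

    rx_track is the Track owning the RX-cell, or None when the frame
    arrived over a layer-3 bundle. flow_track is the Track that the flow
    information of the packet maps to, if any (hypothetical lookup).
    """
    if dmac not in (BROADCAST, my_addr):
        return Action.DROP, None
    if rx_track is not None:
        if dmac == BROADCAST:
            return Action.SWITCH_ON_TRACK, rx_track
        # Opportunistic reuse of a Track cell for routed traffic.
        return Action.DELIVER_UP, None
    if dmac == my_addr and is_deterministic_phb and flow_track is not None:
        # Overflow traffic that left the Track at the previous hop.
        return Action.RETRACK, flow_track
    return Action.DELIVER_UP, None
```

A re-tracked frame would then be queued on the transmit bundle of the returned Track with its destination MAC address set back to BROADCAST.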
There are 2 modes for a Track, transport mode and tunnel mode.
In transport mode, the Protocol Data Unit (PDU) is associated with flow-dependent meta-data that refers uniquely to the Track, so the 6top sublayer can place the frame in the appropriate cell without ambiguity. In the case of IPv6 traffic, this flow identification is transported in the Flow Label of the IPv6 header. Associated with the source IPv6 address, the Flow Label forms a globally unique identifier for that particular Track that is validated at egress before restoring the destination MAC address (DMAC) and punting to the upper layer.
                       | ^
   +--------------+    | |
   |     IPv6     |    | |
   +--------------+    | |
   |  6LoWPAN HC  |    | |
   +--------------+  ingress                 egress
   |     6top     |   sets     +----+          +----+  restores
   +--------------+  dmac to   |    |          |    |  dmac to
   |   TSCH MAC   |   brdcst   |    |          |    |   self
   +--------------+    |       |    |          |    |       |
   |   LLN PHY    |    +-------+    +--...-----+    +-------+
   +--------------+
Track Forwarding, Transport Mode
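The transport-mode flow identification can be illustrated with a table keyed by the pair formed by the source IPv6 address and the Flow Label. The table class, its method names, and the Track identifiers are hypothetical; only the 20-bit width of the IPv6 Flow Label field is a fixed property of the IPv6 header.

```python
import ipaddress
from typing import Optional

FLOW_LABEL_BITS = 20  # width of the Flow Label field in the IPv6 header


def track_key(src: str, flow_label: int) -> tuple:
    """Build the (source address, Flow Label) key that identifies a Track."""
    if not 0 <= flow_label < (1 << FLOW_LABEL_BITS):
        raise ValueError("Flow Label must fit in 20 bits")
    # Parsing normalizes different textual forms of the same address.
    return (ipaddress.IPv6Address(src), flow_label)


class TrackTable:
    """Maps flow identification to a locally known Track (illustrative)."""

    def __init__(self):
        self._tracks = {}

    def install(self, src: str, flow_label: int, track_id: str) -> None:
        self._tracks[track_key(src, flow_label)] = track_id

    def lookup(self, src: str, flow_label: int) -> Optional[str]:
        """Return the Track for this flow, or None if validation fails."""
        return self._tracks.get(track_key(src, flow_label))
```

An egress node in this model would call lookup() before restoring the destination MAC address, and would not deliver the packet as Track traffic when the result is None.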
In tunnel mode, the frames originate from an arbitrary protocol over a compatible MAC that may or may not be synchronized with the 6TiSCH network. An example of this would be a router with a dual radio that is capable of receiving and sending WirelessHART or ISA100.11a frames with the second radio, by presenting itself as an Access Point or a Backbone Router, respectively.
In that mode, some entity (e.g. PCE) can coordinate with a WirelessHART Network Manager or an ISA100.11a System Manager to specify the flows that are to be transported transparently over the Track.
                      +--------------+
                      |     IPv6     |
                      +--------------+
                      |  6LoWPAN HC  |
                      +--------------+             set            restore
                      |     6top     |            +dmac+          +dmac+
                      +--------------+          to|brdcst       to|nexthop
                      |   TSCH MAC   |            |    |          |    |
                      +--------------+            |    |          |    |
                      |   LLN PHY    |    +-------+    +--...-----+    +-------+
                      +--------------+    |   ingress                 egress   |
                                          |                                    |
                      +--------------+    |                                    |
                      |   LLN PHY    |    |                                    |
                      +--------------+    |                                    |
                      |   TSCH MAC   |    |                                    |
                      +--------------+    | dmac =                             | dmac =
                      |ISA100/WiHART |    | nexthop                            v nexthop
                      +--------------+
Figure 7: Track Forwarding, Tunnel Mode
In that case, the flow information that identifies the Track at the ingress 6TiSCH router is derived from the RX-cell. The dmac is set to this node but the flow information indicates that the frame must be tunneled over a particular Track so the frame is not passed to the upper layer. Instead, the dmac is forced to broadcast and the frame is passed to the 6top sublayer for switching.
At the egress 6TiSCH router, the reverse operation occurs. Based on metadata associated to the Track, the frame is passed to the appropriate link layer with the destination MAC restored.
Metadata coming with the Track configuration is expected to provide the destination MAC address of the egress endpoint as well as the tunnel mode and specific data depending on the mode, for instance a service access point for frame delivery at egress. If the tunnel egress point does not have a MAC address that matches the configuration, the Track installation fails.
In transport mode, if the final layer-3 destination is the tunnel termination, then it is possible that the IPv6 address of the destination is compressed at the 6LoWPAN sublayer based on the MAC address. It is thus mandatory at the ingress point to validate that the MAC address that was used at the 6LoWPAN sublayer for compression matches that of the tunnel egress point. For that reason, the node that injects a packet on a Track checks that the destination is effectively that of the tunnel egress point before it overwrites it to broadcast. The 6top sublayer at the tunnel egress point reverts that operation to the MAC address obtained from the tunnel metadata.
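The MAC address checks described for Track installation and for transport-mode injection can be sketched as follows. The metadata structure and the function names are hypothetical, MAC addresses are modeled as plain integers, and raising an exception stands in for whatever failure handling an implementation would actually use.

```python
from dataclasses import dataclass

BROADCAST = 0xFFFF  # IEEE802.15.4 short broadcast address


@dataclass(frozen=True)
class TrackMetadata:
    egress_mac: int  # destination MAC address of the egress endpoint
    mode: str        # "transport" or "tunnel"


def install_track(meta: TrackMetadata, egress_node_mac: int) -> None:
    """Installation fails if the egress point does not own the configured MAC."""
    if meta.egress_mac != egress_node_mac:
        raise ValueError("Track installation fails: egress MAC mismatch")


def inject_transport_mode(frame_dmac: int, meta: TrackMetadata) -> int:
    """Ingress: validate the destination, then overwrite it with broadcast."""
    if frame_dmac != meta.egress_mac:
        raise ValueError("destination is not the tunnel egress point")
    return BROADCAST


def restore_at_egress(meta: TrackMetadata) -> int:
    """Egress: revert the broadcast address to the MAC from the metadata."""
    return meta.egress_mac
```

The ingress check matters because 6LoWPAN may have compressed the IPv6 destination based on the MAC address; once the address is overwritten with broadcast, only the Track metadata can bring it back.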
In a classical system, the 6TiSCH device does not place the request for bandwidth between itself and another device in the network. Rather, an Operation Control System invoked through a Human/Machine Interface (HMI) indicates the Traffic Specification, in particular in terms of latency and reliability, and the end nodes. With this, the PCE must compute a Track between the end nodes and provision the network with per-flow state that describes the per-hop operation for a given packet, the corresponding timeSlots, and the flow identification that makes it possible to recognize when a certain packet belongs to a certain Track, sort out duplicates, etc.
For a static configuration that serves a certain purpose for a long period of time, it is expected that a node will be provisioned in one shot with a full schedule, which incorporates the aggregation of its behavior for multiple Tracks. 6TiSCH expects that the programming of the schedule will be done over CoAP as discussed in 6TiSCH Resource Management and Interaction using CoAP [I-D.ietf-6tisch-coap].
But a Hybrid mode may be required as well whereby a single Track is added, modified, or removed, for instance if it appears that a Track does not perform as expected for, say, PDR. For that case, the expectation is that a protocol that flows along a Track (to be), in a fashion similar to classical Traffic Engineering (TE) [CCAMP], may be used to update the state in the devices. 6TiSCH provides means for a device to negotiate a timeSlot with a neighbor, but in general that flow was not designed and no protocol was selected, and it is expected that DetNet will determine the appropriate end-to-end protocols to be used in that case.
                      Operational System and HMI

   -+-+-+-+-+-+-+ Northbound -+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

             PCE         PCE              PCE              PCE

   -+-+-+-+-+-+-+ Southbound -+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

           --- 6TiSCH------6TiSCH------6TiSCH------6TiSCH--
  6TiSCH /     Device      Device      Device      Device   \
  Device-                                                    - 6TiSCH
         \     6TiSCH      6TiSCH      6TiSCH      6TiSCH   /  Device
           ----Device------Device------Device------Device--
Figure 8: Stream Management Entity
Section "Packet Marking and Handling" of [I-D.ietf-6tisch-architecture] describes the packet tagging and marking that is expected in 6TiSCH networks.
For packets that are routed by a PCE along a Track, the tuple formed by the IPv6 source address and a local RPLInstanceID is tagged in the packets to identify uniquely the Track and associated transmit bundle of timeSlots.
As a result, the tagging that is used for a DetNet flow outside the 6TiSCH LLN MUST be swapped into 6TiSCH formats and back as the packet enters and then leaves the 6TiSCH network.
Note: The method and format used for encoding the RPLInstanceID at 6lo is generalized to all 6TiSCH topological Instances, which includes Tracks.
6TiSCH expects elimination and replication of packets along a complex Track, but has no position about how the sequence numbers would be tagged in the packet.
As it goes, 6TiSCH expects that timeSlots corresponding to copies of a same packet along a Track are correlated by configuration, and does not need to process the sequence numbers.
The semantics of the configuration MUST enable correlated timeSlots to be grouped for transmit (and respectively receive) with an 'OR' relation, and then an 'AND' relation MUST be configurable between groups. The semantics is that if the transmit (and respectively receive) operation succeeded in one timeSlot in an 'OR' group, then all the other timeSlots in the group are ignored. Now, if there are at least two groups, the 'AND' relation between the groups indicates that one operation must succeed in each of the groups.
On the transmit side, timeSlots provisioned for retries along the same branch of a Track are placed in the same 'OR' group. The 'OR' relation indicates that if a transmission is acknowledged, then further transmissions SHOULD NOT be attempted for timeSlots in that group. There are as many 'OR' groups as there are branches of the Track departing from this node. Different 'OR' groups are programmed for the purpose of replication, each group corresponding to one branch of the Track. The 'AND' relation between the groups indicates that transmission over each of the branches MUST be attempted regardless of whether a transmission succeeded in another branch. It is also possible to place cells to different next-hop routers in the same 'OR' group. This makes it possible to route along multi-path Tracks, trying one next-hop and then another only if sending to the first fails.
On the receive side, all timeSlots are programmed in the same 'OR' group. Retries of the same copy as well as converging branches for elimination are converged, meaning that the first successful reception is enough and that all the other timeSlots can be ignored.
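The 'OR' and 'AND' semantics above can be expressed in a few lines. In this minimal sketch a timeSlot is any value that an attempt callback understands, and the callback returns True when the operation in that timeSlot succeeds (for transmit, when the frame is acknowledged); the function names and the callback convention are assumptions made for the example.

```python
from typing import Callable, List, Sequence


def run_or_group(slots: Sequence, attempt: Callable[[object], bool]) -> bool:
    """Try the timeSlots of one 'OR' group in order.

    After the first success, the remaining timeSlots of the group are
    ignored. Returns False only if every timeSlot in the group failed.
    """
    for slot in slots:
        if attempt(slot):
            return True
    return False


def transmit(groups: Sequence[Sequence],
             attempt: Callable[[object], bool]) -> List[bool]:
    """'AND' between groups: every group (branch) is attempted.

    A success in one group never suppresses the attempts in another.
    """
    return [run_or_group(group, attempt) for group in groups]


def receive(slots: Sequence, attempt: Callable[[object], bool]) -> bool:
    """Receive side: all timeSlots form a single 'OR' group."""
    return run_or_group(slots, attempt)
```

For example, with two branches provisioned as groups [1, 2, 3] and [4, 5], a failure in timeSlot 1 followed by a success in timeSlot 2 leaves timeSlot 3 unused, while timeSlot 4 is still attempted for the second branch.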
Additionally, an IP packet that is sent along a Track uses the Differentiated Services Per-Hop-Behavior Group called Deterministic Forwarding, as described in [I-D.svshah-tsvwg-deterministic-forwarding].
6TiSCH nodes are usually IoT devices, characterized by a very limited amount of memory, just enough buffers to store one or a few IPv6 packets, and limited bandwidth between peers. As a result, a node will maintain only a small amount of peering information, and will not be able to store many packets waiting to be forwarded. Peers can be identified through MAC or IPv6 addresses, but a Cryptographically Generated Address [RFC3972] (CGA) may also be used.
Neighbors can be discovered over the radio using mechanisms such as beacons, but, though the neighbor information is available in the 6TiSCH interface data model, 6TiSCH does not describe a protocol to pro-actively push the neighborhood information to a PCE. This protocol should be described and should operate over CoAP. The protocol should be able to carry multiple metrics, in particular the same metrics as used for RPL operations [RFC6551].
The energy that the device consumes in sleep, transmit and receive modes can be evaluated and reported. So can the amount of energy that is stored in the device and the power that can be scavenged from the environment. The PCE SHOULD be able to compute Tracks that will implement policies on how the energy is consumed, for instance balancing consumption between nodes, ensuring that the spent energy does not exceed the scavenged energy over a period of time, etc.
On top of the classical protection of control signaling that can be expected to support DetNet, it must be noted that 6TiSCH networks operate on limited resources that can be depleted rapidly if an attacker manages to operate a DoS attack on the system, for instance by placing a rogue device in the network, or by obtaining management control and setting up extra paths.
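One of the policies mentioned above, that the spent energy does not exceed the scavenged energy over a period, reduces to simple arithmetic once the per-mode power figures are reported. The sketch below assumes a hypothetical report structure with power expressed in milliwatts and times in seconds, so that products are in millijoules; none of the field names or numbers come from 6TiSCH.

```python
from dataclasses import dataclass


@dataclass
class EnergyReport:
    sleep_mw: float      # power drawn while sleeping
    tx_mw: float         # power drawn while transmitting
    rx_mw: float         # power drawn while receiving
    scavenged_mw: float  # average power harvested from the environment


def spent_mj(report: EnergyReport, sleep_s: float, tx_s: float,
             rx_s: float) -> float:
    """Energy spent over one period, in millijoules (mW x s = mJ)."""
    return (report.sleep_mw * sleep_s
            + report.tx_mw * tx_s
            + report.rx_mw * rx_s)


def is_sustainable(report: EnergyReport, sleep_s: float, tx_s: float,
                   rx_s: float) -> bool:
    """True if the schedule spends no more than is scavenged in the period."""
    period_s = sleep_s + tx_s + rx_s
    return spent_mj(report, sleep_s, tx_s, rx_s) <= report.scavenged_mw * period_s
```

A PCE applying such a policy could evaluate candidate schedules for a node and discard those for which is_sustainable() returns False, for instance by provisioning fewer retry timeSlots on that node.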
This specification derives from the 6TiSCH architecture, which is the result of multiple interactions, in particular during the 6TiSCH (bi)Weekly Interim call, relayed through the 6TiSCH mailing list at the IETF.
The authors wish to thank: Kris Pister, Thomas Watteyne, Xavier Vilajosana, Qin Wang, Tom Phinney, Robert Assimiti, Michael Richardson, Zhuo Chen, Malisa Vucinic, Alfredo Grieco, Martin Turon, Dominique Barthel, Elvis Vogli, Guillaume Gaillard, Herman Storey, Maria Rita Palattella, Nicola Accettura, Patrick Wetterwald, Pouria Zand, Raghuram Sudhaakar, and Shitanshu Shah for their participation and various contributions.
(This section was derived from draft-korhonen-detnet-telreq-00)
The recent developments in telecommunication networks, especially in the cellular domain, are heading towards transport networks where precise time synchronization support has to be one of the basic building blocks. While the transport networks themselves have practically transitioned to all-IP packet based networks to meet the bandwidth and cost requirements, highly accurate clock distribution has become a challenge. Earlier, the transport networks in the cellular domain were typically time-division multiplexing (TDM) based and provided frequency synchronization capabilities as a part of the transport media. Alternatively, other technologies such as Global Positioning System (GPS) or Synchronous Ethernet (SyncE) [SyncE] were used. New radio access network deployment models and architectures may require time sensitive networking services with strict requirements on other parts of the network that previously were not considered to be packetized at all. Time and synchronization support is already topical for backhaul and midhaul packet networks [MEF], and is becoming a real issue for fronthaul networks. Specifically in the fronthaul networks, the timing and synchronization requirements can be extreme for packet based technologies, for example, on the order of sub +-20 ns packet delay variation (PDV) and frequency accuracy of +0.002 PPM [Fronthaul].
Both Ethernet and IP/MPLS [RFC3031] (and PseudoWires (PWE) [RFC3985] for legacy transport support) have become popular tools to build and manage new all-IP radio access networks (RAN) [I-D.kh-spring-ip-ran-use-case]. Although various timing and synchronization optimizations have already been proposed and implemented, including 1588 PTP enhancements [I-D.ietf-tictoc-1588overmpls][I-D.mirsky-mpls-residence-time], these solutions are not necessarily sufficient for the forthcoming RAN architectures, nor do they guarantee the higher time-synchronization requirements [CPRI]. There are also existing solutions for TDM over IP [RFC5087] [RFC4553] or Ethernet transports [RFC5086]. Important existing work for time sensitive networking has been done for Ethernet [TSNTG], which specifies the use of the IEEE 1588 Precision Time Protocol (PTP) [IEEE1588] in the context of IEEE 802.1D and IEEE 802.1Q. While IEEE 802.1AS [IEEE8021AS] specifies a Layer-2 time synchronizing service, other specifications, such as IEEE 1722 [IEEE1722], specify Ethernet-based Layer-2 transport for time-sensitive streams. New promising work seeks to enable the transport of time-sensitive fronthaul streams in Ethernet bridged networks [IEEE8021CM]. Similarly to IEEE 1722, there is an ongoing standardization effort to define a Layer-2 transport encapsulation format for transporting radio over Ethernet (RoE) in the IEEE 1904.3 Task Force [IEEE19043].
As already mentioned, all-IP RANs and various "haul" networks would benefit from time synchronization and time-sensitive transport services. Although Ethernet appears to be the unifying technology for the transport, there is still a disconnect in providing Layer-3 services. The protocol stack typically has a number of layers below the Ethernet Layer-2 that shows up to the Layer-3 IP transport. It is not uncommon that on top of the lowest layer (optical) transport there is the first layer of Ethernet followed by one or more layers of MPLS, PseudoWires and/or other tunneling protocols, finally carrying the Ethernet layer visible to the user plane IP traffic. While there are existing technologies, especially in the MPLS/PWE space, to establish circuits through the routed and switched networks, there is no way to signal the time synchronization and time-sensitive stream requirements/reservations for Layer-3 flows such that the entire transport stack, including the Ethernet layers that need to be configured, is addressed. Furthermore, not all "user plane" traffic will be IP. Therefore, the same solution also needs to address the use cases where the user plane traffic is again another layer of Ethernet frames. There is existing work describing the problem statement [I-D.finn-detnet-problem-statement] and the architecture [I-D.finn-detnet-architecture] for deterministic networking (DetNet) that eventually aims to provide solutions for time-sensitive (IP/transport) streams with deterministic properties over Ethernet-based switched networks.
This document describes requirements for deterministic networking in the context of cellular telecom transport networks. The requirements include time synchronization, clock distribution and ways of establishing time-sensitive streams for both Layer-2 and Layer-3 user plane traffic using IETF protocol solutions.
Recent developments in telecommunication networks, especially in the cellular domain, are heading towards transport networks where precise time synchronization support has to be one of the basic building blocks. While the transport networks themselves have practically transitioned to all-IP packet-based networks to meet bandwidth and cost requirements, highly accurate clock distribution has become a challenge. Earlier, transport networks in the cellular domain were typically time-division multiplexing (TDM) based and provided frequency synchronization capabilities as a part of the transport media. Alternatively, other technologies such as Global Positioning System (GPS) or Synchronous Ethernet (SyncE) [SyncE] were used. New radio access network deployment models and architectures may require time-sensitive networking services with strict requirements on other parts of the network that previously were not considered to be packetized at all. Time and synchronization support are already topical for backhaul and midhaul packet networks [MEF], and are becoming a real issue for fronthaul networks. Specifically, in fronthaul networks the timing and synchronization requirements can be extreme for packet-based technologies, for example, on the order of sub +-20 ns packet delay variation (PDV) and a frequency accuracy of +-0.002 PPM [Fronthaul].
Both Ethernet and IP/MPLS [RFC3031] (and PseudoWires (PWE) [RFC3985] for legacy transport support) have become popular tools to build and manage new all-IP radio access networks (RAN) [I-D.kh-spring-ip-ran-use-case]. Although various timing and synchronization optimizations have already been proposed and implemented, including 1588 PTP enhancements [I-D.ietf-tictoc-1588overmpls][I-D.mirsky-mpls-residence-time], these solutions are not necessarily sufficient for the forthcoming RAN architectures, nor do they guarantee the higher time-synchronization requirements [CPRI]. There are also existing solutions for TDM over IP [RFC5087] [RFC4553] or Ethernet transports [RFC5086]. Important existing work on time-sensitive networking has been done for Ethernet [TSNTG], which specifies the use of the IEEE 1588 Precision Time Protocol (PTP) [IEEE1588] in the context of IEEE 802.1D and IEEE 802.1Q. While IEEE 802.1AS [IEEE8021AS] specifies a Layer-2 time synchronization service, other specifications, such as IEEE 1722 [IEEE1722], specify Ethernet-based Layer-2 transport for time-sensitive streams. New promising work seeks to enable the transport of time-sensitive fronthaul streams in Ethernet bridged networks [IEEE8021CM]. Similarly to IEEE 1722, there is an ongoing standardization effort to define a Layer-2 transport encapsulation format for transporting radio over Ethernet (RoE) in the IEEE 1904.3 Task Force [IEEE19043].
As already mentioned, all-IP RANs and various "haul" networks would benefit from time synchronization and time-sensitive transport services. Although Ethernet appears to be the unifying technology for the transport, there is still a disconnect in providing Layer-3 services. The protocol stack typically has a number of layers below the Ethernet Layer-2 that shows up to the Layer-3 IP transport. It is not uncommon that on top of the lowest layer (optical) transport there is a first layer of Ethernet, followed by one or more layers of MPLS, PseudoWires and/or other tunneling protocols, finally carrying the Ethernet layer visible to the user plane IP traffic. While there are existing technologies, especially in the MPLS/PWE space, to establish circuits through routed and switched networks, there is no way to signal the time synchronization and time-sensitive stream requirements/reservations for Layer-3 flows such that the entire transport stack, including the Ethernet layers that need to be configured, is addressed. Furthermore, not all "user plane" traffic will be IP. Therefore, the same solution also needs to address the use cases where the user plane traffic is yet another layer of Ethernet frames. There is existing work describing the problem statement [I-D.finn-detnet-problem-statement] and the architecture [I-D.finn-detnet-architecture] for deterministic networking (DetNet) that eventually aims to provide solutions for time-sensitive (IP/transport) streams with deterministic properties over Ethernet-based switched networks.
This document describes requirements for deterministic networking in the context of cellular telecom transport networks. The requirements include time synchronization, clock distribution and ways of establishing time-sensitive streams for both Layer-2 and Layer-3 user plane traffic using IETF protocol solutions.
Figure 9 illustrates a typical 3GPP-defined cellular network architecture, which also has fronthaul and midhaul network segments. The fronthaul refers to the network connecting base stations (base band processing units) to the remote radio heads (antennas). The midhaul network typically refers to the network interconnecting base stations (or small/pico cells).
Fronthaul networks build on the available excess time after the base band processing of the radio frame has completed. Therefore, the available time for networking is actually very limited, which in practice determines how far the remote radio heads can be from the base band processing units (i.e., base stations). For example, in the case of LTE radio, the Hybrid ARQ processing of a radio frame is allocated 3 ms. Typically the processing completes much earlier (say up to 400 us, though it could be much less), thus allowing the remaining time to be used, e.g., for the fronthaul network. 200 us equals roughly 40 km of optical-fiber-based transport (assuming the round trip time would be 2*200 us in total). The base band processing time and the available "delay budget" for the fronthaul are subject to change, possibly dramatically, in the forthcoming "5G" to meet, for example, the envisioned reduced radio round trip times and other architectural and service requirements [NGMN].
The maximum "delay budget" is then consumed by all nodes and the required buffering between the remote radio head and the base band processing, in addition to the distance-incurred delay. Packet delay variation (PDV) is problematic for fronthaul networks and must be minimized. If the transport network cannot guarantee a low enough PDV, additional buffering has to be introduced at the edges of the network to buffer out the jitter. Any buffering will eat into the total available delay budget, though. Section 6.3 will discuss the PDV requirements in more detail.
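The relation between the 200 us figure and the roughly 40 km reach can be checked with a small calculation. The following is a minimal sketch, assuming a propagation speed in optical fiber of about 2*10^8 m/s (roughly 5 us per km); that speed and the function names are illustrative assumptions, not values taken from this document.

```python
# Assumption: light propagates through optical fiber at roughly 2e8 m/s
# (about 5 us per km). This value is illustrative, not from the draft.
FIBER_SPEED_M_PER_S = 2.0e8


def max_fiber_reach_km(one_way_budget_us: float) -> float:
    """Fiber length (km) whose one-way propagation delay equals the budget."""
    return one_way_budget_us * 1e-6 * FIBER_SPEED_M_PER_S / 1000.0


def round_trip_delay_us(distance_km: float) -> float:
    """Round-trip propagation delay (us) over a fiber of the given length."""
    return 2.0 * distance_km * 1000.0 / FIBER_SPEED_M_PER_S * 1e6


if __name__ == "__main__":
    reach = max_fiber_reach_km(200.0)
    print(f"one-way budget 200 us -> about {reach:.0f} km of fiber")
    print(f"round trip over that distance: {round_trip_delay_us(reach):.0f} us")
```

Under this assumption a 200 us one-way budget corresponds to about 40 km, and the round trip over that distance uses the full 2*200 us. Node processing and buffering are ignored here, so the practical reach would be shorter.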
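How the delay budget is consumed can be sketched as simple bookkeeping. The example below is only an illustration: the per-node delays, the de-jitter buffer size, the 20 km distance and the 5 us/km propagation figure are hypothetical values chosen for the example, and only the 200 us one-way budget comes from the text above.

```python
from typing import Sequence


def remaining_budget_us(
    total_budget_us: float,
    distance_km: float,
    node_delays_us: Sequence[float],
    dejitter_buffer_us: float,
    us_per_km: float = 5.0,  # assumed fiber propagation delay, not from the draft
) -> float:
    """Delay budget left after propagation, per-node delay and edge buffering.

    A negative result means the path does not fit into the budget.
    """
    propagation_us = distance_km * us_per_km
    return total_budget_us - propagation_us - sum(node_delays_us) - dejitter_buffer_us


if __name__ == "__main__":
    # Hypothetical path: 20 km of fiber, three nodes adding 10 us each,
    # and a 25 us de-jitter buffer at the network edge.
    left = remaining_budget_us(200.0, 20.0, [10.0, 10.0, 10.0], 25.0)
    print(f"remaining one-way budget: {left:.1f} us")
```

The sketch makes the trade-off visible: every microsecond spent on buffering out jitter is a microsecond (about 200 m of fiber under the stated assumption) that is no longer available for distance.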
              Y (remote radios)
               \
           Y__  \.--.                   .--.       +------+
              \_(    `.     +---+     _(Back`.     | 3GPP |
       Y------( Front  )----|eNB|----( Haul   )----| core |
             ( `  .Haul )   +---+   ( `  .  )  )   | netw |
             /`--(___.-'      \      `--(___.-'    +------+
          Y_/     /            \.--.     \
               Y_/            _( Mid`.    \
                             (   Haul )    \
                            ( `  .  )  )    \
                             `--(___.-'\_____+---+    (small cells)
                                   \         |SCe|__Y
                                  +---+      +---+
                               Y__|eNB|__Y
                                  +---+
                                Y_/   \_Y ("local" radios)
Figure 9: Generic 3GPP-based cellular network architecture with Front/Mid/Backhaul networks
In cellular networks starting from Long Term Evolution (LTE) [TS36300] [TS23401] radio, phase synchronization is also needed in addition to frequency synchronization. The commonly referenced fronthaul network synchronization requirements are typically drawn from the Common Public Radio Interface (CPRI) [CPRI] specification, which defines the transport protocol between the base band processing unit, the radio equipment controller (REC), and the remote antenna, the radio equipment (RE). However, the fundamental requirements still originate from the respective cellular system and radio specifications, such as the 3GPP ones [TS25104][TS36104][TS36211] [TS36133].
The fronthaul time synchronization requirements for the current 3GPP LTE-based networks are listed below:
The time synchronization requirements listed above are hard to meet even with point-to-point connected networks, not to mention cases where the underlying transport network actually consists of multiple hops. It is expected that network deployments will have to deal with the jitter requirements by buffering at the very ends of the connections, since trying to meet the jitter requirements in every intermediate node is likely to be too costly. However, every measure that reduces jitter and delay on the path is valuable, as it makes the end-to-end requirements easier to meet.
In order to meet the timing requirements, both senders and receivers must be in perfect sync. This calls for a very accurate clock distribution solution. Basically, all means and hardware support for guaranteeing accurate time synchronization in the network are needed. As an example, support for 1588 transparent clocks (TC) in every intermediate node would be helpful.
In addition to the time synchronization requirements listed in Section 6.3, fronthaul networks assume practically error-free transport. The maximum bit error rate (BER) has been defined to be 10^-12. When packetized, that corresponds to a packet error rate (PER) of roughly 2.4*10^-9 (assuming ~300-byte packets). Retransmitting lost packets and/or using forward error correction (FEC) to circumvent bit errors is practically impossible due to the additional delay incurred. Using redundant streams for better delivery guarantees is also practically impossible due to the high bandwidth requirements of fronthaul networks. For instance, the current uncompressed CPRI bandwidth expansion ratio is roughly 20:1 compared to the IP layer user payload it carries in a "radio sample form".
The other fundamental assumption is that fronthaul links are symmetric. Last, all fronthaul streams (carrying radio data) have equal priority and cannot delay or pre-empt each other. This implies that the network must always be sufficiently undersubscribed to guarantee that each time-sensitive flow meets its schedule.
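The step from a BER of 10^-12 to a PER of about 2.4*10^-9 follows from the number of bits per packet. The sketch below assumes independent bit errors (an assumption of this example, not a statement of the referenced specifications) and treats a packet as lost if any of its bits is corrupted.

```python
import math


def packet_error_rate(ber: float, packet_bytes: int) -> float:
    """Probability that at least one bit of a packet is in error.

    Assumes independent bit errors: PER = 1 - (1 - BER)^bits.
    expm1/log1p keep the result accurate for very small BER values.
    """
    bits = packet_bytes * 8
    return -math.expm1(bits * math.log1p(-ber))


if __name__ == "__main__":
    per = packet_error_rate(1e-12, 300)
    print(f"PER for 300-byte packets at BER 1e-12: {per:.3e}")
```

For such small error rates the result is practically equal to the linear approximation bits * BER, i.e., 2400 * 10^-12 = 2.4*10^-9.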
When mapping the fronthaul requirements to [I-D.finn-detnet-architecture] Section 3 "Providing the DetNet Quality of Service", the following appear usable:
The current time-sensitive networking features may still not be sufficient for fronthaul traffic. Therefore, having specific profiles that take the requirements of fronthaul into account is deemed to be useful [IEEE8021CM].
The actual transport protocols and/or solutions to establish the required transport "circuits" (pinned-down paths) for fronthaul traffic are still undefined. They are likely to include, but are not limited to, solutions directly over Ethernet, over IP, and over MPLS/PseudoWire transport.
Establishing time-sensitive streams in the network entails reserving networking resources, sometimes for a considerably long time. It is important that these reservation requests be authenticated to prevent malicious reservation attempts from hostile nodes or even accidental misconfiguration. This is especially important where the reservation requests span administrative domains. Furthermore, the reservation information itself should be digitally signed to reduce the risk of a legitimate node pushing a stale or hostile configuration into a networking node.
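As an illustration of authenticating a reservation request and rejecting stale ones, the following minimal sketch attaches a keyed tag and an issue time to a request. It is not a DetNet mechanism: the request fields, the shared key and the 30-second freshness window are hypothetical, and an HMAC (a symmetric message authentication code) is used here as a simple stand-in for a true digital signature, which would use asymmetric keys.

```python
import hashlib
import hmac
import json

MAX_AGE_S = 30  # hypothetical freshness window for reservation requests


def sign_request(key: bytes, request: dict) -> str:
    """Return a hex HMAC-SHA256 tag over a canonical encoding of the request."""
    payload = json.dumps(request, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_request(key: bytes, request: dict, tag: str, now: float) -> bool:
    """Accept only requests with a valid tag that are not stale."""
    if not hmac.compare_digest(sign_request(key, request), tag):
        return False  # forged or modified request
    age = now - request["issued_at"]
    return 0 <= age <= MAX_AGE_S  # reject stale or future-dated requests


if __name__ == "__main__":
    key = b"shared-secret-for-illustration-only"
    # Hypothetical reservation request; field names are made up for the example.
    req = {"flow": "fronthaul-1", "bandwidth_mbps": 2500, "issued_at": 1000.0}
    tag = sign_request(key, req)
    print(verify_request(key, req, tag, now=1010.0))  # fresh and authentic
    print(verify_request(key, req, tag, now=2000.0))  # stale
```

Including the issue time inside the signed payload is what allows a receiver to reject a replayed or outdated configuration even when the tag itself is valid.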
(This section was derived from draft-varga-industrial-m2m-00)
Traditional "industrial automation" terminology usually refers to automation of manufacturing, quality control and material processing. In practice, it means that machine units on a plant floor need cyclic control data exchange with upstream or downstream machine modules or with a supervisory control in a local network, which today is often based on proprietary networking technologies.
For such communication between industrial entities, it is critical to ensure proper and deterministic end to end delivery time of messages with (very) high reliability and robustness, especially in closed loop automation control.
Moreover, the recent trend is to use standard networking technologies in the local network and for connecting remote industrial automation sites, e.g., over an enterprise or metro network which also carries other types of traffic. Therefore, deterministic flows should be guaranteed regardless of the amount of other flows in those networks for the deployment of future industrial automation.
This document covers a selected industrial application, identifies representative solutions used today, and points out new use cases that an IETF DetNet solution may enable.
In the case of industrial automation, the actors of Machine to Machine (M2M) communication are Programmable Logic Controllers (PLCs). The communication between PLCs, and between PLCs and the supervisory PLC (S-PLC), is achieved via critical Control-Data-Streams (Figure 10). This draft focuses on PLC-related communications; communication to the Manufacturing-Execution-System (MES) is out of scope. The PLC-related Control-Data-Streams are transmitted periodically, and they are established either with (i) a pre-configured payload or (ii) a payload configured during runtime.
           S (Sensor)
            \                                   +-----+
      PLC__  \.--.                  .--.     ---| MES |
           \_(    `.             _(    `./      +-----+
      A------( Local )-----------(  L2   )
            (     Net )          (    Net  )     +-------+
            /`--(___.-'           `--(___.-' ----| S-PLC |
         S_/     /      PLC   .--.  /            +-------+
              A_/          \_(    `.
           (Actuator)      (  Local )
                          (     Net  )
                           /`--(___.-'\
                          /       \    A
                         S         A
Figure 10: Current generic industrial M2M network architecture
The network topologies used today by industrial automation applications are (i) daisy chain, (ii) ring and (iii) hub and spoke. Such topologies are often used in telecommunication networks too. In industrial networks, a comb topology (a subset of daisy chain) is also used.
Some industrial applications require Time Synchronization (Sync) to end nodes, which is also similar to some telecommunication networks, e.g., mobile Radio Access Networks. For such time-coordinated PLCs, an accuracy of 1 microsecond is required. In the case of non-time-coordinated PLCs, a requirement for Time Sync may still exist, e.g., for time stamping of collected measurement (sensor) data.
The requirements listed here refer to critical Control-Data-Streams. Non-critical traffic of industrial automation applications can be served with currently available prioritizing techniques.
In an industrial environment, non-time-critical traffic is related to (i) communication of state, configuration, set-up, etc., (ii) connection to the Manufacturing-Execution-System (MES) and (iii) database communication. This type of traffic can use up to 80% of the available bandwidth. There is a subset of non-time-critical traffic whose bandwidth should be guaranteed.
The rest of this section deals only with time-critical traffic.
The Cycle Time defines the frequency of message(s) between industrial entities. The Cycle Time is application dependent; it is in the range of 1ms - 100ms for critical Control-Data-Streams.
As industrial applications assume deterministic transport for critical Control-Data-Streams instead of defining latency and delay variation parameters, it is enough to fulfill the upper bound of latency (maximum latency). The communication must ensure a maximum end-to-end delivery time of messages in the range of 100 microseconds to 50 milliseconds, depending on the control loop application.
Bandwidth requirements of Control-Data-Streams are usually calculated directly from the bytes-per-cycle parameter of the control loop. For PLC to PLC communication one can expect 2 - 32 streams with packet sizes in the range of 100 - 700 bytes. For S-PLC to PLCs the number of streams is higher: up to 256 streams need to be supported. Usually no more than 20% of the available bandwidth is used for critical Control-Data-Streams in today's networks, which comprise Gbps links.
Typical PLC control loops are rather tolerant of packet loss. However, critical Control-Data-Streams accept the loss of no more than one packet per consecutive communication cycles (i.e., if a packet is lost in cycle n, the next cycle n+1 must be lossless). The required network availability is rather high, reaching five nines (99.999%).
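The bandwidth calculation from the bytes-per-cycle parameter can be sketched as follows. The example assumes one packet per stream per cycle and ignores framing and header overhead; the chosen combination (32 streams of 700 bytes at a 1 ms cycle on a 1 Gbps link) is an illustrative worst case picked from the ranges given above, not a configuration stated in this document.

```python
def stream_bandwidth_bps(bytes_per_cycle: int, cycle_time_s: float) -> float:
    """Bandwidth of one Control-Data-Stream sending bytes_per_cycle every cycle."""
    return bytes_per_cycle * 8 / cycle_time_s


def link_utilization(
    num_streams: int, bytes_per_cycle: int, cycle_time_s: float, link_bps: float
) -> float:
    """Fraction of the link consumed by num_streams identical streams."""
    return num_streams * stream_bandwidth_bps(bytes_per_cycle, cycle_time_s) / link_bps


if __name__ == "__main__":
    # Illustrative worst case: 32 streams, 700 bytes each, 1 ms cycle, 1 Gbps link.
    per_stream = stream_bandwidth_bps(700, 1e-3)
    share = link_utilization(32, 700, 1e-3, 1e9)
    print(f"per stream: {per_stream / 1e6:.1f} Mbps, link share: {share:.1%}")
```

With these inputs each stream needs 5.6 Mbps and the 32 streams together use about 18% of the link, which is in line with the statement that critical streams usually stay below 20% of the available bandwidth.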
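The loss and availability figures can be made concrete with a short sketch. It reads the loss requirement as "no two consecutive communication cycles may both lose a packet", which is an interpretation assumed for this example, and it converts the five-nines availability into allowed downtime per (365-day) year.

```python
from typing import Sequence


def loss_pattern_acceptable(lost: Sequence[bool]) -> bool:
    """True if no two consecutive cycles both lost their packet.

    lost[i] is True when the packet of communication cycle i was lost.
    """
    return not any(a and b for a, b in zip(lost, lost[1:]))


def yearly_downtime_minutes(availability: float) -> float:
    """Allowed downtime per 365-day year for a given availability (0..1)."""
    return (1.0 - availability) * 365 * 24 * 60


if __name__ == "__main__":
    print(loss_pattern_acceptable([False, True, False, True]))  # isolated losses
    print(loss_pattern_acceptable([False, True, True, False]))  # back-to-back losses
    print(f"{yearly_downtime_minutes(0.99999):.2f} minutes of downtime per year")
```

Five nines leaves only a little over five minutes of downtime per year, which helps explain why some form of redundancy may be required.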
Based on the above parameters, it can be concluded that some form of redundancy might be required for M2M communication. The actual solution depends on several parameters, like cycle time, delivery time, etc.
Most critical Control-Data-Streams are created at startup; however, flexibility is also needed during runtime (e.g., to add or remove a machine). In an industrial environment, critical Control-Data-Streams are created rather infrequently: on the order of ~10 times per day / week / month. With the future advent of flexible production systems, flow maintenance parameters are expected to increase significantly.
This document specifies an industrial machine-to-machine use-case in the DetNet context.
Industrial network scenarios require advanced security solutions. Many of the current industrial production networks are physically separated. Protection of critical flows is handled today by gateways / firewalls.
The authors would like to thank Feng Chen and Marcel Kiessling for their comments and suggestions.
(This section was derived from draft-zha-detnet-use-case-00)
The rapid growth of today's communication systems and their reach into almost all aspects of daily life have led to great dependency on the services they provide. The communication network, as it is today, has applications such as multimedia and peer-to-peer file sharing that require Quality of Service (QoS) guarantees in terms of delay and jitter to maintain a certain level of performance. Meanwhile, mobile wireless communications have become an important part of supporting modern society, with increasing importance over the last years. A communication network with hard real-time behavior and high reliability is essential for current and next-generation mobile wireless networks, as well as for their bearer networks, to meet end-to-end (E2E) performance requirements.
Conventional transport networks are IP-based because of bandwidth and cost requirements. However, delay and jitter guarantees become a challenge in case of contention, since the service is not deterministic but best effort. With increasingly stringent demands on latency control in future networks [METIS], deterministic networking [I-D.finn-detnet-architecture] is a promising solution for ultra-low-delay applications and use cases. Delay-sensitive networking requirements are already typical in midhaul and backhaul networks supporting LTE and future 5G networks [net5G]. Beyond the telecom industry, other vertical industries also have an increasing demand for delay-sensitive communications as automation becomes critical.
More specifically, CoMP techniques, D-2-D, industrial automation and gaming/media services all depend heavily on low-delay communications as well as high reliability to guarantee service performance. Note that deterministic networking is not the same as low latency, as it focuses on the worst-case delay bound over the duration of a given application or service. It can be argued that without high certainty and an absolute delay guarantee, low-delay provisioning is just relative [rfc3393], which is not sufficient for some delay-critical services, since a delay violation in even a single instance cannot be tolerated. Overall, the requirements from vertical industries seem to be well aligned with the expected low latency and highly deterministic performance of future networks.
This document describes several use cases and scenarios with requirements on deterministic delay guarantees within the scope of the deterministic network [I-D.finn-detnet-problem-statement].
Delay and jitter requirements have been taken into account as a major component of QoS provisioning since the birth of the Internet. Delay-sensitive networking, with its increasing importance, has become fundamental to mobile wireless communications as well as to the application areas that rely heavily on low-delay communications. Due to the best-effort nature of IP networking, mitigating contention and buffering is the main way to serve delay-sensitive services. More bandwidth is assigned to keep the links lightly loaded or, in other words, to reduce the probability of congestion. However, keeping links lightly loaded is not deterministic, has limitations in serving the applications of future communication systems, and cannot provide a deterministic delay guarantee. Take [METIS], which documents the fundamental challenges as well as the overall technical goals of the 5G mobile and wireless system, as the starting point. The system should support:
- 1000 times higher mobile data volume per area,
- 10 times to 100 times higher typical user data rate,
- 10 times to 100 times higher number of connected devices,
- 10 times longer battery life for low power devices, and
- 5 times reduced End-to-End (E2E) latency,
at similar cost and energy consumption levels as today's system. Considering the latency-related part of these requirements, the current LTE networking system has an E2E latency of less than 20ms [LTE-Latency], which leads to an E2E latency of around 5ms for 5G networks. It has been argued that fulfilling such a stringent latency demand at similar cost will be most challenging, as the system also requires 100 times the bandwidth as well as 100 times the number of connected devices. As a result, simply provisioning redundant bandwidth can no longer be an efficient solution, because bandwidth requirements are higher than ever before. In addition to bandwidth provisioning, a critical flow within its reserved resources should not be affected by other flows, no matter how much pressure the network is under. Robust protection of critical flows also does not depend on redundant bandwidth allocation. Deterministic networking techniques in both Layer-2 and Layer-3 using IETF protocol solutions are promising for serving these scenarios.
In wireless communication systems, coordinated multipoint processing (CoMP) is considered an effective technique to solve the inter-cell interference problem and improve cell-edge user throughput [CoMP].
                    +--------------------------+
                    |           CoMP           |
                    +--+--------------------+--+
                       |                    |
                 +----------+         +------------+
                 |  Uplink  |         |  Downlink  |
                 +-----+----+         +--------+---+
                       |                       |
          -------------------          -----------------------
          |        |        |          |            |        |
     +---------+ +----+  +-----+ +------------+  +-----+  +-----+
     |  Joint  | | CS |  | DPS | |   Joint    |  | CS/ |  | DPS |
     |Reception| |    |  |     | |Transmission|  | CB  |  |     |
     +---------+ +----+  +-----+ +------------+  +-----+  +-----+
          |                            |
          |-----------                 |-------------
          |          |                 |            |
   +------------+ +---------+  +----------+ +------------+
   |   Joint    | |  Soft   |  | Coherent | |    Non-    |
   |Equalization| |Combining|  |    JT    | | Coherent JT|
   +------------+ +---------+  +----------+ +------------+
Figure 11: Framework of CoMP Technology
As shown in Figure 11, CoMP reception and transmission is a framework in which multiple geographically distributed antenna nodes cooperate to improve the performance of the users served in the common cooperation area. The design principle of CoMP is to extend the current single-cell to multi-UE transmission to a multi-cell to multi-UE transmission through base station cooperation. In contrast to the single-cell scenario, CoMP has critical issues such as backhaul latency, CSI (Channel State Information) reporting and accuracy, and network complexity. Clearly the first two are very delay sensitive and will be discussed in the next section.
As an essential feature of CoMP, signaling is exchanged between eNBs, and the backhaul latency is the dominating limitation of CoMP performance. Generally, JT and JP may benefit from coordinating the scheduling (distributed or centralized) of different cells if the signaling exchange between eNBs is limited to 4-10ms. For C-RAN the backhaul latency requirement is 250us, while for D-RAN it is 4-15ms. This delay requirement is not only stringent but also absolute, since any uncertainty in delay degrades performance significantly. Note that some operators' transport networks are not built to support Layer-3 transfer in the aggregation layer. In such cases, the signaling is exchanged through the EPC, which means the delay is expected to be larger. CoMP has high requirements on delay and reliability, which current mobile network systems lack, and this may impact the architecture of the mobile network.
Traditional "industrial automation" terminology usually refers to automation of manufacturing, quality control and material processing. "Industrial internet" and "Industry 4.0" [EA12] are becoming hot topics based on the Internet of Things. This highly flexible and dynamic engineering and manufacturing will result in many so-called smart approaches such as Smart Factory, Smart Products, Smart Mobility, and Smart Home/Buildings. Ultra-high reliability and robustness are a must in data transmission, especially in closed loop automation control applications, where the delay requirement is below 1ms and the packet loss requirement is less than 10E-9. These critical requirements on both latency and loss cannot be fulfilled by current 4G communication networks. Moreover, the collaboration of industrial automation at remote campuses with cellular and fixed networks has to be built on an integrated, cloud-based platform. In this way, deterministic flows should be guaranteed regardless of the amount of other flows in the network. The lack of such a mechanism is the main obstacle to the deployment of industrial automation.
V2V communication has gained more and more attention in the last few years and will continue to grow in the future. In addition to a short-range direct communication system, V2V communication also requires wireless cellular networks to cover a wide range and more sophisticated services. V2V applications in the area of autonomous driving have very stringent requirements on latency and reliability. The timely arrival of information is critical for safety. In addition, due to the limited processing capability of an individual vehicle, passing information to the cloud can provide more functions such as video processing, audio recognition or navigation systems. All of those requirements call for highly reliable connectivity to the cloud. On the other hand, the provisioning of low-latency communication is naturally one of the main challenges to be overcome, as a result of the high mobility and the high penetration losses caused by the vehicle itself. As a result, data transmission with latency below 5ms and a high reliability of PER below 10E-6 is demanded. This can benefit from the deployment of deterministic networking with high reliability.
Online gaming and cloud gaming dominate the gaming market, since they allow multiple players to play together in more challenging and competitive ways. Over the current Internet, latency can be a big issue that degrades the end users' experience. There are different types of games, and FPS (First Person Shooting) games are considered the most latency-sensitive online games due to their high requirements on timing precision and on computing moving targets. Virtual reality is also receiving more interest than ever before as a novel gaming experience. Delay here can be very critical to interaction in the virtual world. Disagreement between what is seen and what is felt can cause motion sickness and affect what happens in the game. Supporting fast, real-time and reliable communications in the PHY/MAC layer, the network layer and the application layer is the main bottleneck for such use cases.
Media content delivery has been an important use of the Internet and will become even more so. Not only high bandwidth demand but also critical delay and jitter requirements have to be taken into account to meet user demand. To keep video and audio smooth, delay and jitter have to be guaranteed to avoid interruptions, which are the killer of all online on-demand media services. With 4K video now, and 8K video in the near future, the delay guarantee becomes more challenging than ever before. 4K/8K UHD video service requires 6Gbps-100Gbps for uncompressed video, with compressed video starting from 60Mbps. The delay requirement is 100ms, while some specific interactive applications may require 10ms delay [UHD-video].
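To give a feel for where uncompressed UHD rates in the 6Gbps-100Gbps range can come from, the sketch below multiplies resolution, bits per pixel and frame rate. The specific parameter combinations (4K at 12 bits per pixel and 60 frames per second; 8K at 24 bits per pixel and 120 frames per second) are assumptions chosen for illustration; this document does not state which video formats the quoted range is based on.

```python
def uncompressed_video_bps(width: int, height: int, bits_per_pixel: int, fps: int) -> int:
    """Raw video bit rate: pixels per frame * bits per pixel * frames per second."""
    return width * height * bits_per_pixel * fps


if __name__ == "__main__":
    # Assumed formats, for illustration only.
    rate_4k = uncompressed_video_bps(3840, 2160, 12, 60)   # e.g. 8-bit 4:2:0
    rate_8k = uncompressed_video_bps(7680, 4320, 24, 120)  # e.g. 8-bit 4:4:4
    print(f"4K: {rate_4k / 1e9:.2f} Gbps")
    print(f"8K: {rate_8k / 1e9:.2f} Gbps")
```

Under these assumptions the two cases come out at roughly 6 Gbps and 96 Gbps, which is consistent with the order of magnitude quoted above; audio, blanking intervals and transport overhead are not included.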
Looking at the use cases collectively, the following common desires for the DetNet-based networks of the future emerge:
This document has benefited from reviews, suggestions, comments and proposed text provided by the following members, listed in alphabetical order: Jing Huang, Junru Lin, Lehong Niu and Oilver Huang.