Internet Engineering Task Force C. Gunther, Ed.
Internet-Draft HARMAN
Intended status: Informational March 4, 2015
Expires: September 5, 2015

Deterministic Networking Professional Audio Requirements
draft-gunther-detnet-proaudio-req-00

Abstract

This draft documents the needs of the Professional Audio industry to establish multi-hop paths, and optional redundant paths, for characterized flows with deterministic properties.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on September 5, 2015.

Copyright Notice

Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.


1. Introduction

Professional Audio (Pro-A) networks range from the small, simple network used by a garage band, which may contain only a handful of devices, to the network of a large theme park spread across 25,000 acres or more. It is worth noting that these theme parks may exist on multiple continents and share content around the world.

Some examples of Pro-A networks include:

While many of these uses have common requirements there are some unique usage models that will be highlighted in this document.

2. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

3. Stream Characteristics

All streams of interest to the Pro-A world have the same requirements related to establishing a path and allocating bandwidth as any other type of network application. This section of the draft is meant to introduce other concerns associated with streams in a Pro-A network.

3.1. Emergency Notifications

Audio systems installed in public environments have unique requirements with regard to health, safety and fire concerns. For example, [ISO7240-16] subjects equipment to tests that can simulate an emergency situation. The purpose of this section is to provide a very basic set of requirements that an underlying network must meet if it is to be used in public areas. It would be advantageous to establish a liaison with the International Organization for Standardization (ISO) so that the referenced ISO 7240 standards could be made available for Deterministic Networking (DetNet) review for the specific details.

The remainder of this section is simply a synopsis of some of the requirements found in the ISO 7240 standard. The wording in that standard supersedes anything specified in this section and it should be referenced for the specific requirements.

Any numbers in this section surrounded by braces refer to the specific section within ISO 7240-16:2007 (for example, {7.1.1} is a reference to section 7.1.1).

One such requirement is a maximum of 3 seconds {7.1.1} for a system to respond to an emergency detection and begin sending appropriate warning signals and alarms. When these conditions occur the audio system must be able to disable normal functions {7.1.4} not associated with emergency functionality, without the need for human intervention.

Announcements must be able to be made within 20 seconds of a system reset {7.9.2.2}.

In the event of equipment failure the backup equipment must be able to take over within 10 seconds {14.4.1}. This would include detection time, new path configuration, etc.

3.2. Content Protection

Digital Rights Management (DRM) is very important to the Pro-A and Professional Video industries. Any time that protected content is introduced into a network there are DRM requirements that must be honored (see [CONTENT_PROTECTION]).

As an example, two techniques are Digital Transmission Content Protection (DTCP) and High-Bandwidth Digital Content Protection (HDCP). HDCP content is not approved for retransmission within any other type of DRM, while DTCP may be retransmitted under HDCP. Therefore if the source of a stream is outside of the network and it uses HDCP protection it is only allowed to be placed on the network with that same HDCP protection.

3.3. Multiple Sinks

Pro-A systems often have multiple sinks (e.g., speakers) connected to a single source. In order to keep bandwidth utilization of shared links to a minimum, multicast addressing is commonly used.

3.4. Super Stream = Two or More Serial Streams

Audio content delivered from a source (e.g., microphone or guitar) can be sent through one or more stages of processing before it reaches the sink(s). For example, one stream may be used to send audio from a microphone hub to a digital processor that will match the singer's pitch to that of a guitar. A second stream will then take that processed audio to a mixing console. A third stream is then required to move the mixed audio to an amplified speaker. Not only does this one super "stream" require three physical streams to be created, but the overall latency of all three streams plus the digital processing at each hop must not exceed 10-15 msec. See slide 6 of [SRP_LATENCY].
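
As an illustration only, the end-to-end budget described above can be checked by summing the worst-case network latency of each serial stream and the processing delay of the stage it feeds. The sketch below is not part of any specification. The Stage type, the function names, and every latency figure in the example chain are hypothetical values chosen for the example; only the 10 msec bound (the tight end of the 10-15 msec range) comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One serial stream plus the processing stage it feeds (hypothetical model)."""
    name: str
    network_latency_ms: float     # worst-case latency of the stream itself
    processing_latency_ms: float  # worst-case delay added by the receiving stage


def super_stream_latency_ms(stages):
    """Worst-case latency of the whole chain: every stream plus every processing step."""
    return sum(s.network_latency_ms + s.processing_latency_ms for s in stages)


def within_budget(stages, budget_ms=10.0):
    """True if the chain fits the end-to-end bound (10 msec assumed by default)."""
    return super_stream_latency_ms(stages) <= budget_ms


# Illustrative numbers only; real values depend on the installation.
chain = [
    Stage("microphone hub -> pitch processor", 2.0, 1.5),
    Stage("pitch processor -> mixing console", 2.0, 1.0),
    Stage("mixing console -> amplified speaker", 2.0, 0.5),
]

if __name__ == "__main__":
    print(super_stream_latency_ms(chain))  # 9.0
    print(within_budget(chain))            # True
```

The point of the sketch is that no single stream can be admitted in isolation: the bound applies to the sum, so a reservation that is acceptable on its own may still push the super stream over budget.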

3.5. Unused Reservations and Best-Effort Traffic

Often reservations are created but not used until some time later in a live show. This is really more of a comfort issue for the show's producers; they want to know that an important reservation request cannot be refused during a live performance.

In other situations a single reservation may be used for different content at different times throughout the day. It is convenient to create a single reservation that is large enough for the biggest bandwidth consumer although that could be wasteful on smaller streams.

In both of these cases it is advantageous for other best-effort traffic to be able to use that unused bandwidth so that the full bandwidth of the network can be utilized at all times. This best-effort traffic could consist of "meter data" which helps an operator understand what is going on at the other end of a Pro-A system in an amusement park. Or it could be used for file transfers or venue updates. Regardless of the reason, Pro-A installations will want to be able to use any reserved bandwidth that is unused.

3.6. Maximum and Acceptable Latency

In order to synchronize speakers throughout a venue, it is critical for each sink (amplified speaker) to know the maximum latency it can expect from the network. Each sink sends that maximum latency back to the source, or to an associated Controller, so that the presentation time of the Pro-A audio data samples can be set. In addition, sinks that are fewer hops away from the source will know how much memory they need to provide in order to buffer content that will be presented at some later time.

A Controller may also collect the various maximum latency numbers and decide to exclude the sinks that are too many hops away since they will place unrealistic buffering requirements on the sinks that are very few hops from the source.

Additionally, sinks that are closer to the source can inform the network that they can accept more latency than the network is currently offering, since they will be buffering packets to match the play-out time of sinks that are farther away. This acceptable latency can be used by the network to move a reservation on a short path to a longer path in order to free up bandwidth for other critical streams on that short path. See slides 3-5 of [SRP_LATENCY].
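
The relationship between reported maximum latency, presentation time, and sink buffering can be sketched as follows. This is a simplified model under stated assumptions, not a description of any particular protocol: it assumes the presentation offset is simply the largest maximum latency reported by any sink, and that a sink must hold data from its earliest possible arrival until the presentation time. The function names, the per-sink minimum latency, and all numeric values are hypothetical.

```python
def presentation_offset_us(max_latency_us_by_sink):
    """Offset added to the capture time of each sample: the largest
    worst-case latency reported by any sink (assumed policy)."""
    return max(max_latency_us_by_sink.values())


def buffer_bytes(offset_us, min_latency_us, bytes_per_second):
    """Bytes a sink must be able to hold if data can arrive as early as
    min_latency_us after capture but is not played until offset_us."""
    hold_us = offset_us - min_latency_us
    # Integer ceiling division so the buffer is never undersized.
    return -(-hold_us * bytes_per_second // 1_000_000)


def sinks_within_limit(max_latency_us_by_sink, limit_us):
    """Sinks a Controller might keep if it excludes those whose reported
    maximum latency exceeds limit_us (hypothetical exclusion policy)."""
    return {name for name, lat in max_latency_us_by_sink.items() if lat <= limit_us}


# Illustrative values: two sinks, 8 channels of 24-bit audio at 48 kHz.
reported = {"near-speaker": 250, "far-speaker": 2000}
stream_rate = 48_000 * 3 * 8  # 1,152,000 bytes per second

if __name__ == "__main__":
    offset = presentation_offset_us(reported)
    print(offset)                                  # 2000
    print(buffer_bytes(offset, 100, stream_rate))  # 2189
```

In this model the near sink could also report the full offset (2000 usec here) as its acceptable latency, which is the headroom the network could use to move its reservation onto a longer path.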

3.7. Latency Per Sink

As previously mentioned a single stream may be sent to multiple sinks. This use case introduces the concept of more stringent latency requirements for some sinks, whereas other sinks have more flexible latency requirements. A live outdoor concert has stringent requirements for delivering the audio to the speaker systems, yet can have very flexible requirements for that same audio content that is delivered to a mobile recording studio that is set up nearby. See slide 7 of [SRP_LATENCY].

3.8. Layer 3 Interconnecting Layer 2 Islands

The DetNet solution for Layer 3 networks should support Layer 3 segments that can connect to Layer 2 networks that do not support Layer 3 protocols.

3.9. Link Aggregation

If any type of link aggregation is proposed as part of the DetNet solution there must be a technique used that can determine the maximum latency that a packet may experience when flowing across any links in that aggregation.

Alternatively, the maximum latency of a single link within the link aggregation could be reported, with the stream then constrained to use only that link when the path is established.
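
The two options above can be contrasted with a small sketch. It assumes that a worst-case latency is already known for each member link of the aggregation; how that figure is obtained is outside the scope of this example. The function names, link names, and latency values are hypothetical.

```python
def aggregate_max_latency_us(member_max_latency_us):
    """Option 1: report the worst case over all member links, since a
    packet may be sent over any of them."""
    return max(member_max_latency_us.values())


def pin_to_link(member_max_latency_us, budget_us):
    """Option 2: pick one member link (here, the lowest-latency one) and
    report only its latency; returns None if even that link is too slow."""
    name, latency = min(member_max_latency_us.items(), key=lambda kv: kv[1])
    return name if latency <= budget_us else None


# Illustrative per-link worst-case latencies in microseconds.
members = {"lag-member-1": 120, "lag-member-2": 95, "lag-member-3": 140}

if __name__ == "__main__":
    print(aggregate_max_latency_us(members))  # 140
    print(pin_to_link(members, 100))          # lag-member-2
```

Option 1 keeps the load-sharing benefit of the aggregation at the cost of a looser latency bound; option 2 gives a tighter bound but requires that the stream actually be restricted to the chosen link.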

3.10. Layer 3 Multicast

Because of the MAC Address forwarding nature of Layer 2 bridges it is important that a multicast MAC Address is only associated with one stream. This will prevent reservations from forwarding packets from one stream down a path that has no interested sinks simply because there is another stream on that same path that shares the same multicast MAC address.

Since each multicast MAC Address can represent 32 different IPv4 multicast addresses, a process must be put in place to ensure that two streams do not end up sharing the same multicast MAC Address. Optionally it could be stated that Deterministic Networking will recommend the use of IPv6, although the impact of such a decision upon existing IPv4 installations should be discussed.
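
The 32-to-1 ambiguity follows from the standard IPv4-to-Ethernet multicast mapping, in which only the low-order 23 bits of the IPv4 group address are copied into the MAC address after the fixed prefix 01:00:5e, so the remaining 5 variable bits of the group address are lost. The sketch below shows that mapping and one possible collision check an allocator might run before assigning group addresses to streams. The function names are hypothetical, and the check itself is only an example of the kind of process the text calls for.

```python
import ipaddress
from collections import defaultdict


def multicast_mac(group):
    """Map an IPv4 multicast group address to its Ethernet multicast MAC."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF          # only the low 23 bits survive
    mac = (0x01005E << 24) | low23        # fixed prefix 01:00:5e
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


def find_mac_collisions(groups):
    """Return {mac: [groups...]} for every MAC shared by more than one group."""
    by_mac = defaultdict(list)
    for group in groups:
        by_mac[multicast_mac(group)].append(group)
    return {mac: members for mac, members in by_mac.items() if len(members) > 1}


if __name__ == "__main__":
    print(multicast_mac("224.1.1.1"))    # 01:00:5e:01:01:01
    print(multicast_mac("225.1.1.1"))    # 01:00:5e:01:01:01
    print(find_mac_collisions(["224.1.1.1", "225.1.1.1", "239.255.0.1"]))
```

In the example, 224.1.1.1 and 225.1.1.1 are distinct groups at Layer 3 but are indistinguishable to a Layer 2 bridge, which is exactly the case in which packets from one stream could be forwarded along the reservation of another.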

3.11. Segregate Traffic

Sink devices may have limited processing power. In order to not overwhelm the CPUs in these devices it is important to limit the amount of traffic that these devices must process. Packet forwarding rules should eliminate extraneous streaming traffic from reaching these devices; however there may be other types of broadcast traffic that should be eliminated where possible. This is often done by VLANs or IP subnets.

3.12. Elapsed Time to Build a Reservation

During a venue change in a show various modifications to reservations may be required. Some existing reservations may be torn down and other reservations may be established. On the Pro-A side this may be a simple reconfiguration of the speakers so the sound field can be created in a different way, or inclusion or exclusion of certain areas in the physical environment.

When video is added to the mix this may be switching from one camera to another. Currently video systems use expensive switching hardware to switch inputs at the head-end of the final feed. Interest has been expressed from the Broadcast industry to the IEEE AVB group for using the network as the video switch (see [STUDIO_IP]).

There is also the issue of the time between power-on and establishment of the first set of reservations. In many situations the appropriate thing to do is simply reestablish all paths and bandwidth reservations as were in place when the power was turned off, doing this as quickly as possible. This is particularly true when recovering from a power failure, or accidental removal of an Ethernet cable or power cord.

4. Use Cases

4.1. Singularity of IT and AV Networks

A recent large installation of a Pro-A network based on IEEE 802.1 AVB technology encompassed a 194,000 sq ft, $125 million facility. The network is capable of handling 46 Tbps of throughput with 60,000 simultaneous signals. Inside the facility are 1,100 miles of fiber feeding four audio control rooms. Phase I of this project was for audio; the next phase will include video as well. One of the future goals of this project is to have the capability to integrate IT infrastructure with the audio streaming technology. Details of this installation can be found in [ESPN_DC2].

4.2. Combining Local and Remote Content

One advantage of a guaranteed reservation with a small bounded latency is the reduced buffering requirements on sink devices. As mentioned earlier there are large theme parks, megachurches, and other venues that wish to broadcast a live event from one physical location to another physical location. These may be across town or across the globe and the content would be delivered via a layer 3 protocol. Depending on the technology available, latency bounds and jitter caused by Internet delivery of content can have a huge impact on the buffering requirements at the receiving site.

In these situations it is acceptable at the local location for content from the live remote site to be delayed (buffered) a reasonable amount to allow for a statistically acceptable amount of latency in order to reduce jitter. However, once the content begins playing in the local location any audio artifacts caused by the local network are unacceptable, especially in those situations where a live local performer is "mixed" into the feed from the remote location.

With these scenarios a single gateway device at the local network that is receiving the feed from the remote site would provide the expensive buffering required to mask the latency and jitter issues associated with long distance delivery. Sink devices in the local location would have no additional buffering requirements, and thus no additional costs, beyond those required for delivery of local content. The sink device would be receiving the identical packets as those sent by the source and would be unaware that there were any latency or jitter issues along the path.

4.3. Lots of Small Devices

Consumers expect more and more from their theater experiences. One example is the use of individual theater seat speakers and effects systems. In order to be cost effective these systems must be inexpensive per seat since the quantities in a single theater can reach hundreds or thousands of seats.

Discovery protocols alone in a one thousand seat theater can generate a lot of broadcast traffic that can put an unnecessary load on a low powered CPU. An installation like this will require some type of traffic segregation that can create groups of seats to reduce traffic within that group. All seats in the theater must still be able to communicate with a central controller.

5. Acknowledgements

The editor would like to acknowledge the help of the following individuals and the companies they represent:

Jeff Koftinoff, Meyer Sound

Jouni Korhonen, Associate Technical Director, Broadcom

Pascal Thubert, CTAO, Cisco

Kieran Tyrrell, Sienda New Media Technologies GmbH

6. IANA Considerations

This memo includes no request to IANA.

7. Security Considerations

7.1. Content Protection

As mentioned earlier any solutions that would be recommended for the Professional A/V space must support DRM.

7.2. Denial of Service

Many industries that are moving from the analog wire world to the digital network world have little understanding of the pitfalls that they can create for themselves by an improperly installed system. DetNet should consider ways to provide security against DoS attacks in solutions directed at these markets.

One example this author is aware of involved the use of technology that allows a presenter to "throw" the content from their tablet or smart phone onto the A/V system that is then viewed by all those in attendance. The facility introducing this technology was quite excited to allow such modern flexibility to those who came to speak. One thing they hadn't realized was that since no security was put in place around this technology it left a hole in the system that allowed other attendees to "throw" their own content onto the A/V system.

7.3. Control Protocols

Pro-A systems can include amplifiers that are capable of generating several hundreds or thousands of watts of audio power. If used incorrectly these systems can cause hearing damage to those in the vicinity of the speaker arrays. The traffic that controls these devices must be protected and that is mostly a concern of those providing that service. However, the configuration protocols that create the network paths used by the Pro-A traffic should be protected as well so that high-volume content cannot be sent to areas that are not meant to receive it.

8. References

8.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

8.2. Informative References

[CONTENT_PROTECTION] Olsen, D., "1722a Content Protection", 2012.
[ESPN_DC2] Daley, D., "ESPN's DC2 Scales AVB Large", 2014.
[ISO7240-16] ISO, "ISO 7240-16:2007 Fire detection and alarm systems -- Part 16: Sound system control and indicating equipment", 2007.
[SRP_LATENCY] Gunther, C., "Specifying SRP Latency", 2014.
[STUDIO_IP] Mace, G., "IP Networked Studio Infrastructure for Synchronized & Real-Time Multimedia Transmissions", 2007.

Author's Address

Craig Gunther (editor)
Harman International
10653 South River Front Parkway
South Jordan, UT 84095
USA

Phone: +1 801 568-7675
EMail: craig.gunther@harman.com
URI:   http://www.harman.com