Network Working Group Greg Bernstein
Internet Draft Grotto Networking
Intended status: Informational Young Lee
Huawei
July 16, 2012
Use Cases for High Bandwidth Query and Control of Core Networks
draft-bernstein-alto-large-bandwidth-cases-02.txt
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with
the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on January 16, 2013.
Copyright Notice
Copyright (c) 2012 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
Bernstein & Lee, et al. Expires January 16, 2013 [Page 1]
Internet-Draft Cross Stratum Optimization Use-cases July 2012
carefully, as they describe your rights and restrictions with
respect to this document.
Abstract
This draft describes two generic use-cases that illustrate
application layer traffic optimization applied to high bandwidth
core networks. The type of information and interactions needed to
perform various optimizations is described. In addition, extensions
to the existing ALTO protocol that are widely applicable to high
bandwidth applications are suggested. These include bandwidth
constraint representations for a diverse range of control and data
plane technologies as well as advanced filtering based on
constraints.
Table of Contents
1. Introduction
1.1. Computing Clouds, Data Centers, and End Systems
2. End System Aggregate Networking
2.1. Aggregated Bandwidth Scaling
2.2. Cross Stratum Optimization Example
2.3. Data Center and Network Faults and Recovery
3. Data Center to Data Center Networking
3.1. Cross Stratum Optimization Examples
3.2. Network and Data Center Faults and Reliability
4. Cross Stratum Control Interfaces
5. Potential ALTO Protocol Extensions
6. Bandwidth Constraint Information
6.1. Introduction
6.1.1. Example Network: Provider's View
6.2. Data and Control Plane Path Choices
6.3. ALTO Extensions
6.3.1. Mutually Constrained Paths
6.3.1.1. Simple IP Network Example
6.3.1.2. TDM Network Example
6.3.1.3. JSON Encoding
6.3.2. Cost-Capacity Graphs
6.3.2.1. Simple TDM Example with Graph Reduction
6.3.2.2. Ethernet MSTP Example with Multiple Graphs
6.3.2.3. JSON Encoding
7. Constraint Based Filtering
8. Conclusion
9. Security Considerations
10. IANA Considerations
11. References
11.1. Informative References
Author's Addresses
Intellectual Property Statement
Disclaimer of Validity
1. Introduction
Cloud computing, network applications, Software as a Service (SaaS),
Platform as a Service (PaaS), and Infrastructure as a Service
(IaaS) are just a few of the terms used to describe situations
where multiple computation entities interact with one another across
a network. When the communication resources consumed by these
interacting entities are significant compared with link or network
capacity, opportunities may exist for more efficient utilization
of available computation and network resources if both computation
and network stratums cooperate in some way. The application layer
traffic optimization (ALTO) working group is tackling the similar
problem of "better-than-random peer selection" for distributed
applications based on peer to peer (P2P) or client server
architectures [1]. In addition, such optimization is important in
content distribution networks (CDNs) as illustrated in [2].
In the network stratum, particularly at the lower layers such as
MPLS and optical, there are many restoration and recovery mechanisms
to deal with network faults. The emergence of network based
applications or cloud based disaster recovery/business recovery
brings a new dimension to fault management, but also opportunities
to more efficiently deliver higher levels of reliability. For
example, the reliability requirements for mission critical
applications are typically quantified by two key time parameters.
The first is the Recovery Time Objective (RTO) which is the time to
get the application back up and functioning and is similar to
network recovery time notions. The second is the Recovery Point
Objective (RPO) which quantifies in terms of time the amount of data
loss that can be tolerated when a disaster occurs. Different
applications and organizations can have greatly different demands,
from milliseconds to 12 hours. In addition, the amount of data that
may need to be transferred to meet these objectives can vary greatly
amongst different application types. With recovery point objectives
of, say, an hour or more, a dynamic optical network layer could be
very efficiently shared so as to reduce the overall cost to achieve
a given level of reliability. However, doing so requires cooperation
between the application and network stratums.
Generalized multi-protocol label switching (GMPLS) [3] can be, and is
being, applied to various core networking technologies such as SONET/SDH
and wavelength division multiplexing (WDM) [4]. GMPLS provides
dynamic network topology and resource information, and the
capability to dynamically allocate resources (provision label
switched paths). Furthermore, the path computation element (PCE) [5]
provides for traffic engineered path optimization.
However, neither GMPLS nor PCE provides interfaces that are
appropriate for an application layer entity to use for the following
reasons:
. GMPLS routing exposes full network topology information which
tends to be proprietary to a carrier or require specialized
knowledge and techniques to make use of, e.g., the routing and
wavelength assignment (RWA) problem in WDM networks [4].
. Core networks typically consist of two or more layers, while
applications typically know only about the IP layer and
above. Hence applications would not be able to make direct use
of PCE capabilities.
. GMPLS signaling interfaces are defined for either peer GMPLS
nodes or via a user network interface (UNI) [6]. Neither of
these is appropriate for direct use by an application entity.
In this draft we discuss two general use-cases that can generate
core network flows that consume significant bandwidth and may vary
significantly over time. The "cross stratum optimization" problems
generated by these use cases are discussed. Finally, we look at
interfaces between the application and network "stratums" that can
enable these types of optimizations and how they can be created via
extensions to the current ALTO protocol [7].
1.1. Computing Clouds, Data Centers, and End Systems
While the definition of cloud computing or compute clouds is
somewhat nebulous (or "foggy" if you will) [8], the physical
instantiation of compute resources with network connectivity is very
real and bounded by physical and logical constraints. For the
purposes of this draft, we will call any network connected compute
resource a data center if its network connectivity is significant
compared either to the bandwidth of an individual WDM wavelength or
with respect to the network links in which it is located. Hence we
include in our definition very large data centers that feature
multiple fiber access and consume more than 10MW of power, moderate
to large content distribution network (CDN) installations located in
or near major internet exchange points, medium sized business
centers, etc.
We will refer to those computational entities that don't meet our
bandwidth criteria for a data center as "end systems".
2. End System Aggregate Networking
In this section we consider the fundamental use case of end systems
communicating with data centers as shown in Figure 1. In this figure
the "clients" are end systems with relatively small access bandwidth
compared to a WDM wavelength, e.g., under 100Mbps. We show these
clients roughly partitioned into three network related end user
regions ("A", "B", and "C"). For a given network application in a
static situation, each client in a region would be associated with a
particular data center.
Region B
+---------+ +------+
| Data | |Client|
|Center 2 | | B1 |+------+
+------+ +----+----+ +--+---+|Client|
|Client| | / | B2 |
| A1 `. _.-+--------+-. +--+---+
Region A +------+ `-. ,-'' `--. / ...
+------+ ,`: `+. +------+
|Client| / \ |Client|
| A2 +------+ \---+ BM |
+------+ ( Network ) +------+
... .-' /
+------+ _.-' \ `.
|Client|.-' `=. ,-' `.
| AN | _.-'' `--. _.-\ +---`.----+
+------+ +----'----+ `----+------+'' \ | Data |
| Data | | \ | |Center 3 |
|Center 1 | +--+---+ +--+---+ \ +---------+
+---------+ |Client| |Client| \------+
| C1 | | C2 | |Client|
+------+ +------+ | CK |
Region C +------+
Figure 1. End system to data center communications.
2.1. Aggregated Bandwidth Scaling
One of the simplest examples where the aggregation of end system
bandwidth can quickly become significant to the "network" is for
video on demand (VoD) streaming services. Unlike a live streaming
service where IP or lower layer multicast techniques can be
generally applied, in VoD the transmissions are unique between the
data center and clients. For regular quality VoD we'll use an
estimate of 1.5Mbps per stream (assuming H.264 coding), for HD VoD
we'll use an estimate of 10Mbps per stream. To fill up a 10Gbps
capacity optical wavelength requires either 6,666 or 1,000 clients
for regular or high definition respectively. Note that special
multicasting techniques such as those discussed in [9] and peer
assistance techniques such as provided in some commercial systems
[10] can reduce the overall network bandwidth requirements.
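The client counts above follow from dividing the wavelength capacity by
the per-stream rate. The following is a minimal sketch of that
arithmetic using only the estimates quoted in the text; the function
and constant names are illustrative and not part of any specification.

```python
# Rates as estimated in the text, expressed in Mbps.
WAVELENGTH_MBPS = 10_000   # one 10 Gbps optical wavelength
REGULAR_VOD_MBPS = 1.5     # regular quality VoD (H.264)
HD_VOD_MBPS = 10.0         # high definition VoD


def streams_per_wavelength(capacity_mbps: float, stream_mbps: float) -> int:
    """Largest number of unicast streams that fit within the capacity."""
    return int(capacity_mbps // stream_mbps)


regular_clients = streams_per_wavelength(WAVELENGTH_MBPS, REGULAR_VOD_MBPS)
hd_clients = streams_per_wavelength(WAVELENGTH_MBPS, HD_VOD_MBPS)
print(regular_clients, hd_clients)
```

Multicast or peer assistance would change the per-client rate seen by
the network, so this bound applies only to purely unicast delivery.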
With current high speed internet deployment such numbers of clients
are easily achieved. In addition, demand for VoD services can vary
significantly over time, e.g., with new video releases or inclement
weather (which increases the number of viewers).
2.2. Cross Stratum Optimization Example
In an ideal world both data centers and networks would have
unlimited capacity; in actuality both can have constraints
and marginal costs that vary with load or time of
day. For example, suppose that in Figure 1 Data Center 3 has
been primarily serving VoD to region "C" but that it has, at a
particular period in time, run out of computation capacity to serve
all the client requests coming from region "C". At this point we
have a fundamental cross stratum optimization (CSO) problem. We want
to see if we can accommodate additional client requests from region
"C" by using a different data center than the fully utilized data
center #3. To answer this question we need to know (a) the available
capacity on other data centers to meet a request, (b) the marginal
(incremental) cost of servicing the request on a particular data
center with spare capacity, (c) the ability of the network to
provide bandwidth between region "C" and a data center, and (d) the
incremental cost of bandwidth from region "C" to a data center.
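As a rough illustration of how items (a) through (d) combine, the
sketch below picks the cheapest alternative data center that satisfies
both the compute and the network constraint. It is only a sketch under
stated assumptions: the class, the function, the additive cost model,
and every number are hypothetical and are not taken from this draft.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """An alternative data center as seen from region "C" (hypothetical values)."""
    name: str
    spare_compute: int       # (a) requests the data center can still accept
    compute_cost: float      # (b) marginal cost per request
    spare_bandwidth: float   # (c) Mbps available from region "C"
    bandwidth_cost: float    # (d) incremental cost per Mbps


def pick_data_center(candidates: list[Candidate], requests: int,
                     mbps_per_request: float) -> Optional[Candidate]:
    """Return the cheapest candidate feasible in both strata, or None."""
    needed_mbps = requests * mbps_per_request
    feasible = [c for c in candidates
                if c.spare_compute >= requests
                and c.spare_bandwidth >= needed_mbps]

    def total_cost(c: Candidate) -> float:
        # Assumed cost model: compute and bandwidth costs simply add.
        return requests * c.compute_cost + needed_mbps * c.bandwidth_cost

    return min(feasible, key=total_cost, default=None)


candidates = [
    Candidate("DC1", spare_compute=500, compute_cost=1.0,
              spare_bandwidth=2_000, bandwidth_cost=0.02),
    Candidate("DC2", spare_compute=800, compute_cost=1.2,
              spare_bandwidth=5_000, bandwidth_cost=0.01),
]
# 300 extra HD streams at 10 Mbps need 3,000 Mbps; DC1 lacks the bandwidth.
choice = pick_data_center(candidates, requests=300, mbps_per_request=10.0)
print(choice.name if choice else "no feasible data center")
```

In this toy input DC1 has the lower compute cost but is excluded by the
network constraint, which is the kind of outcome that neither stratum
could determine on its own.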
Region B
+---------+ +------+
| Data | |Client|
|Center 2 | | B1 |+------+
+------+ +----+----+ +--+---+|Client|
|Client| | / | B2 |
| A1 `. _.-+--------+-. +--+---+
Region A +------+ `-. ,-'' XXXXX XX `--. / ...
+------+ ,`: ``---..__ XXXX `+. +------+
|Client| / X | ```--XX \ |Client|
| A2 +------+..X`. \ XX--+---+ BM |
+------+ ( X `-/ \ ) +------+
... .-' .' | +----.X /
+------+ _.-' \ X/ \ | X `.
|Client|.-' `=.X \ XXXX ,-' `.
| AN | _.-'' `--. XXXXXXXXX _.-\ +---`.----+
+------+ +----'----+ `----+------+'' \ | Data |
| Data | | \ | |Center 3 |
|Center 1 | +--+---+ +--+---+ \ +---------+
+---------+ |Client| |Client| \------+
| C1 | | C2 | |Client|
+------+ +------+ | CK |
Region C +------+
Figure 2. Aggregated flows between end systems and data centers.
In Figure 2 we show a possible result of solving the previously
mentioned CSO problem. Here we show the additional client requests
from region "C" being serviced by data center #2 across the network.
Figure 2 also illustrates the possibility of setting up "express"
routes across the network at the MPLS level or below. Such
techniques, known as "optical grooming" or "optical bypass" [11], [12]
at the optical layer, can result in significant equipment and power
savings for the network by "bypassing" higher level routers and
switches.
2.3. Data Center and Network Faults and Recovery
Data center failures, whether partial or complete, can have a major
impact on revenues in the VoD example previously described. If there
is excess capacity in other data centers within the network
associated with the same application then clients could be
redirected to those other centers if the network has the capacity.
Moreover, MPLS and GMPLS controlled networks have the ability to
reroute traffic very quickly while preserving QoS. As with general
network recovery techniques [13], various combinations of pre-planning
and "on the fly" approaches can be used to trade off between
recovery time and the excess network capacity needed for recovery.
In the case of network failures there is the potential for clients
to be redirected to other data centers to avoid failed or
overutilized links.
3. Data Center to Data Center Networking
There are a number of motivations for data center to data center
communications: on demand capacity expansion ("cloud bursting"),
cooperative exchanges between business partners, offsite data
backup, "rent before building", etc. In Figure 3 we show an
example where a number of businesses, each with an "internal data
center", contract with a large external data center for additional
computational (which may include storage) capacity. The data centers
may connect to each other via IP transit type services or, more
typically, via some type of Ethernet virtual private line or LAN
service.
+-------------------+
| |
| Large Data Center |
| |
+----------+--------+
|
_.+-----------.
,--'' `---.
,-' `-.
,' `.
,' `.
+--------+ ; Network :
|Business| __..+ |
| #1 DC +-' : ;
+--------+ `. ,'
`. ;:
`-. ,-' \
`---. _.--' +--`.----+
`+-----------'' |Business|
/ | #N DC |
| +--------+
+----+---+
|Business|
| #2 DC |
+--------+
Figure 3. Basic data center to data center networking.
3.1. Cross Stratum Optimization Examples
In the DC-to-DC example of Figure 3 we can have computational
constraints/limits at both local and remote data centers; fixed and
marginal computational costs at local and remote data centers; and
network bandwidth costs and constraints between data centers. Note
that computing costs could vary by the time of day along with the
cost of power and demand. Some cloud providers have quite
sophisticated compute pricing models including: reserved, on demand,
and spot (auction) variants.
In addition to possibly dynamically changing pricing, traffic
loads between data centers can be quite dynamic. Data
movement between data centers is another source of large network
usage variation. Such peaks can be due to scheduled daily or weekly
offsite data backup, bulk VM migration to a new data center,
periodic virtual machine migration, etc.
3.2. Network and Data Center Faults and Reliability
For networked applications that require high levels of
reliability/availability the network diagram of Figure 3 could be
enhanced with redundant business locations and external data centers
as shown in Figure 4. For example, cell phone subscriber databases
and financial transactions generally require what is called
geographic database replication, which results in extra communication
between sites supporting high availability. If business
#1 in Figure 4 required a highly available database related service
then there would be additional communication flows from
data center "1a" to data center "1b". Furthermore, if business #1 has
outsourced some of its computation and storage needs to independent
data center X then for resilience it may want/need to replicate
(hot-hot redundancy) this information at independent data center Y.
+-------------+ +-------------+
|Independent | |Independent |
|Data Center X| |Data Center Y|
+-----+-------+ +------+------+
\ /
`. _.------------. .'
\--'' `-+-.
,-' `-. +--------+
,' `. .'Business|
,' `.-' |#N DC-a |
; Network : +--------+
+--------+ | |
|Business+--- ;
|#1 DC-a | `. +:
+--------+ `. ;/ \
`-. ,-' `.
.'`---. _.--' +--`.----+
+--------+ / `+-+---------\' |Business|
|Business| .' | \ |#N DC-a |
|#1 DC-b .' / \ +--------+
+--------+ | \
+----+---+ +--------+
|Business| |Business|
|#2 DC-a | |#2 DC-b |
+--------+ +--------+
Figure 4. Data center to data center networking with redundancy.
4. Cross Stratum Control Interfaces
Two types of load balancing techniques are currently utilized in
cloud computing. The first is load balancing within a data center
and is sometimes referred to as local load balancing. Here one is
concerned with distributing requests to appropriate machines (or
virtual machines) in a pool based on the current machine
utilization. The second type of load balancing is known as global
load balancing and is used to assign clients to a particular data
center out of a choice of more than one within the network and is
our concern here. A number of commercial vendors offer both local
and global load balancing products. Currently global load balancing
systems have very little knowledge of the underlying network. To
make better assignments of clients to data centers many of these
systems use geographic information based on IP addresses. Hence we
see that current systems are attempting to perform cross stratum
optimization albeit with very coarse network information. A more
complete interface for CSO in the client aggregation case that is
also applicable in the "data center to data center" case would be:
1. A Network Query Interface - Where the global load balancer
can inquire as to the bandwidth availability between "client
regions" and data centers.
2. A Network Resource Reservation Interface - Where the global
load balancer can make explicit requests for bandwidth
between client regions and data centers.
3. A Fault Recovery Interface - For the global load balancer to
make requests for expedited bulk rerouting of client traffic
from one data center to another. Or for the network layer to
make requests to the application to help deal with network
faults.
The network query interface can be considered a superset of the
functionality supported by the current ALTO protocol [7]. Potential
extensions to ALTO for this purpose are given in the next section.
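The three interfaces listed above could be pictured as an abstract API
offered by the network to a global load balancer. The sketch below is
purely illustrative: the class names, method names, and signatures are
hypothetical, are not defined by ALTO or by this draft, and cover only
the application-to-network direction of the fault recovery interface.

```python
from abc import ABC, abstractmethod


class CrossStratumInterface(ABC):
    """Hypothetical shape of the three interfaces described in the text."""

    @abstractmethod
    def query_bandwidth(self, region: str, data_center: str) -> float:
        """1. Network query: bandwidth available between region and data center."""

    @abstractmethod
    def reserve_bandwidth(self, region: str, data_center: str, mbps: float) -> bool:
        """2. Resource reservation: explicit bandwidth request; True if granted."""

    @abstractmethod
    def request_reroute(self, region: str, old_dc: str, new_dc: str) -> bool:
        """3. Fault recovery: expedited bulk rerouting of a region's traffic."""


class InMemoryNetwork(CrossStratumInterface):
    """Toy stand-in for a network provider, for illustration only."""

    def __init__(self, available: dict[tuple[str, str], float]) -> None:
        self.available = dict(available)

    def query_bandwidth(self, region, data_center):
        return self.available.get((region, data_center), 0.0)

    def reserve_bandwidth(self, region, data_center, mbps):
        if self.query_bandwidth(region, data_center) < mbps:
            return False
        self.available[(region, data_center)] -= mbps
        return True

    def request_reroute(self, region, old_dc, new_dc):
        # Assumption: a reroute is accepted if the new route has spare bandwidth.
        return self.query_bandwidth(region, new_dc) > 0.0


net = InMemoryNetwork({("C", "DC2"): 5_000.0})
granted = net.reserve_bandwidth("C", "DC2", 3_000.0)
print(granted, net.query_bandwidth("C", "DC2"))
```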
5. Potential ALTO Protocol Extensions
This section discusses the applicability of the ALTO protocol and
necessary extensions to support a network query interface suitable
for high bandwidth consuming applications. Before doing so we
discuss general properties of the high bandwidth scenarios that may
differ significantly from other uses of the ALTO protocol.
The first has to do with scope and scale. The consumer of high
bandwidth ALTO extensions is typically some type of application
controller within a data center, as opposed to an individual end
user. The number of such entities with a need for the high bandwidth
related information is orders of magnitude smaller than, say, peer
to peer networking users, or applications closer to the end user.
Since a network provider may consider this information sensitive,
there may be a desire to limit its distribution to a "pre-
registered" set of entities. Hence these extensions would be
applicable to controlled or partially controlled environments.
Secondly, there is the notion of time scales. In cloud services we
already see variants such as "on demand" compute instances and
"reserved" compute instances. For network resource queries we may be
concerned with (a) current bandwidth availability, (b) bandwidth
availability at a future time, or (c) bandwidth for a bulk data
transfer of a given amount that must take place within a given time
window.
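For query type (c), the application side first needs the average rate
implied by the transfer size and the time window. A minimal sketch of
that conversion follows; the function name, the decimal units, and the
example figures are assumptions made for illustration.

```python
def required_rate_mbps(volume_gigabytes: float, window_hours: float) -> float:
    """Average rate (Mbps) needed to move the volume within the window.

    Decimal units are assumed: 1 GB = 8e9 bits, 1 Mbps = 1e6 bits/s.
    """
    bits = volume_gigabytes * 8e9
    seconds = window_hours * 3600
    return bits / seconds / 1e6


# Hypothetical backup of 9,000 GB that must complete within 4 hours.
rate = required_rate_mbps(volume_gigabytes=9_000, window_hours=4)
print(rate)
```

A network able to schedule the transfer could trade a higher rate for a
shorter occupancy of the path, so this figure is a lower bound on the
peak rate rather than a prescription.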
Time-dependent bandwidth information can be, and typically is,
considered in network planning and provisioning systems. For
example, a VoD provider knows ahead of time when the latest
"blockbuster" film will be available via its service and can use
historical data to estimate the bandwidth that it will
need to deal with the subsequent demand. The following discussions,
however, are restricted to "current time" for now.
Finally, another goal in the design of an interface between the
application and networking stratums is to minimize the need for
either stratum to know too much about the inner workings of the
other. Hence as much as possible it is desired to insulate the
application stratum from technology specifics of the network. That
said, data centers providing IaaS may prefer to specify flows and
connectivity at a layer below IP such as Ethernet.
The key ALTO extensions useful for querying the network for high
bandwidth consuming applications are:
(a) Bandwidth Constraint Information
(b) Constraint Based Filtering
(c) Multi-cost information [MultiCost]
(d) Endpoint Access Bandwidth Capacity (a new endpoint property)
In the following sections we discuss (a) and (b).
6. Bandwidth Constraint Information
6.1. Introduction
The amount of bandwidth available between two entities or two
sets of entities can be of prime interest to applications that have
stringent bandwidth requirements relative to a network's capacity.
Such entities can be communicating across a WAN, a metro area, a
LAN, or even within a compute cluster.
One may want to query the network as to the available bandwidth in a
number of different cases:
(a) Bandwidth available between a single source destination pair
(b) Bandwidth between one particular source and several other
destinations
(c) Bandwidth between one set of sources and another set of
destinations.
Case (a), bandwidth between two points, is well defined; however, in
cases (b) and (c) there is some ambiguity. In cases (b) and (c) one
may want to query for the bandwidth available to a single "flow"
at a time, or for multiple simultaneous "flows" between sources and
destinations.
If the bandwidth query is for potentially simultaneous flows then
there is the possibility that the flows of interest would (or could)
share network resources, e.g., link capacity. Such a situation leads
to what is known as a multi-commodity flow problem [NetOpt]. General
formulations of this problem [NetOpt] allow for arbitrary path
selection and can permit splitting of user demands across multiple
paths if inverse multiplexing like techniques are available.
Alternative formulations of multi-commodity flow problems exist
[RWA] when path choices between a source and destination are
restricted to an explicit list of paths (or a single path). In both
formulations link capacities form a key optimization constraint.
To perform better application layer traffic optimization, the
presence and capacity of such "mutual bottleneck" links would need
to be considered by "large bandwidth applications". This draft shows
how a combination of abstract path link vectors and/or constrained
cost graphs can be used to enable enhanced application layer traffic
optimization. These techniques are illustrated with connectionless
technologies such as IP and Ethernet, as well as MPLS and circuit
switched technologies that can be controlled via GMPLS.
6.1.1. Example Network: Provider's View
In Figure 1 we show an example network consisting of five nodes and
six links. This is the network provider's view of the network and
not necessarily information to be shared in detail with
applications. We will use this same network to illustrate bandwidth
constraint representations for different technologies. For
illustrative purposes we only consider a single weight (cost) and
bandwidth constraint per link. The units of bandwidth could be Mbps,
Gbps, or wavelengths depending upon the technology. These costs and
constraints are from the network provider's perspective and may or
may not be the sole guidance in path selection, e.g., non-shortest
paths may be chosen depending upon data and control plane
technologies. However, when considering a path between a source and
destination across this network we sum the weights for each link
along the path to obtain the total cost for the path.
+----+            L0 Wt=10, BW=50              +----+
| N0 |-----------------------------------------| N3 |
+----+`.                                       +----+
   |    `.   L4 Wt=7                              |
   |      `-.   BW=40                             |
   |         `.  +----+                           |
   |           `.| N4 |                           |
   | L1        .'+----+                           | L2
   | Wt=10    /                                   | Wt=12
   | BW=45   /  L5 Wt=10                          | BW=30
   |       .'      BW=45                          |
   |      /                                       |
   |     /                                        |
+----+ .'         L3 Wt=15, BW=42              +----+
| N1 |.........................................| N2 |
+----+                                         +----+
Figure 1 Generic Constrained Network Example
6.2. Data and Control Plane Path Choices
In this section we survey common data and control plane technologies
with respect to the path choices that they may allow as well as the
methods one can use to infer available paths. Methods for inferring
paths influence how efficiently the network layer can convey cost and
constraint information to the application layer. For example, even if
the control plane limits us to a single fixed path between a source and
destination, if we need many paths between many sources and
destinations it can be very efficient to derive such information
from a simple graph representation.
Technologies that allow arbitrary placement of paths across a
network include: circuit switched technologies (WDM, TDM), strictly
connection oriented packet technologies (MPLS, ATM, and Frame
Relay), and connection oriented modes of multi-purpose protocols
such as InfiniBand's CO service. In these cases a network provider
can furnish a graph representation of the network suitable for the
application optimizer to choose routes. In some cases, for example
in WDM networks with optical impairments, the usable paths may be
restricted in a way not readily discerned from a simple graph
representation. In such a case a list of possible paths would need
to be furnished.
For IP, a connectionless technology, one typically thinks of a
single path between each source and destination (not considering
equal cost multipath). Although no choice in path selection is
available, in the case of single area OSPF the paths can be derived
from a graph, while BGP [BGP4] uses techniques based on policies and
path vectors (AS_PATH) as part of its route selection process and
these are not derived from graphs. Multi-Topology Routing
enhancements to OSPF [MT-OSPF] can allow multiple path choices
between a source and destination and such paths could be derived
from their corresponding graphs.
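To illustrate how single-area OSPF paths can be derived from a graph,
the sketch below runs a shortest path computation over the example
network of Section 6.1.1. The link endpoints are read from that figure
(L4 is taken to join N0 and N4, and L5 to join N1 and N4); the data
layout and function name are assumptions made for this example, and
equal cost multipath is ignored.

```python
import heapq

# Example network: link id -> (endpoint, endpoint, weight, bandwidth).
LINKS = {
    "L0": ("N0", "N3", 10, 50),
    "L1": ("N0", "N1", 10, 45),
    "L2": ("N3", "N2", 12, 30),
    "L3": ("N1", "N2", 15, 42),
    "L4": ("N0", "N4", 7, 40),
    "L5": ("N1", "N4", 10, 45),
}


def shortest_path(src: str, dst: str) -> tuple[int, list[str]]:
    """Dijkstra over link weights; returns (path cost, list of link ids)."""
    adjacency: dict[str, list[tuple[str, int, str]]] = {}
    for link_id, (a, b, weight, _bandwidth) in LINKS.items():
        adjacency.setdefault(a, []).append((b, weight, link_id))
        adjacency.setdefault(b, []).append((a, weight, link_id))

    queue = [(0, src, [])]          # (cost so far, node, links used)
    visited = set()
    while queue:
        cost, node, links = heapq.heappop(queue)
        if node == dst:
            return cost, links
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight, link_id in adjacency[node]:
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, links + [link_id]))
    raise ValueError(f"no path from {src} to {dst}")


print(shortest_path("N0", "N2"))
print(shortest_path("N0", "N3"))
```

The link lists returned here are the kind of path vector that the
network could expose instead of the full graph.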
Ethernet switching offers the greatest variety of path selection
capabilities depending upon the control plane employed. The basic
Ethernet bridge specification in 802.1D [802.1D] utilizes a single
tree structure as the communication backbone between all nodes.
Hence, one has no choice in path between nodes and the paths can be
easily derived from a graph of the spanning tree. We will also see
that such graphs are easy to reduce. IEEE 802.1Q [802.1Q] includes
virtual LANs (VLANs) and allows for multiple spanning trees. The
multiple spanning tree protocol (MSTP) allows for the assignment of
VLANs to trees. Hence we have more than one choice in paths but all
flows within the same VLAN have to share the same tree. Note that
trees can be given as graphs so this is a case where we may want
multiple graphs.
OpenFlow [OpenFlow] capable switches permit general forwarding
behavior based on general packet header matching. These can include
Ethernet destination and source addresses, IP destination and source
addresses, as well as other protocol related fields. Since both
source and destination information can be utilized in forwarding,
OpenFlow can enable traffic engineering like a connection oriented
packet switching technology. Hence arbitrary path selection based on
a graph is possible.
6.3. ALTO Extensions
In this section we give two different models for representing
bandwidth constraints, give several examples of both approaches, and
furnish an initial JSON encoding for both approaches. We end this
section with a discussion of which approach a network provider may
want to choose within a given context.
6.3.1. Mutually Constrained Paths
As discussed in Section 6.2, the network's data or control plane may
dictate the paths taken between a source and destination. Even if
such paths could be derived from a graph, the network provider may
choose to provide information about the paths to promote information
hiding or to minimize the amount of information needed to be
transferred via ALTO. For example, if the application is asking for
cost/capacity information between a few sources and destinations,
providing path information for these few paths may take much less
space than a corresponding graph.
In the following we give examples of paths with shared link
bandwidth constraints for two different technologies; we then provide
a tentative JSON encoding for use with the ALTO protocol.
6.3.1.1. Simple IP Network Example
Consider Figure 1 as a single OSPF area with N0 representing a large
data center and nodes N2 and N3 as potential clients. The
corresponding path link vectors, with their costs (sum of link
weights) and link bandwidth constraints, are shown in Table 1.
Path   Src-Dest   Path Vector   Path Cost
P1     N0-N2      {L0, L2}      22
P2     N0-N3      {L0}          10
-----------------------------------------
Link   Bandwidth
L0     50
L2     30
Table 1. Path Vectors for paths P1 and P2, and used link capacities.
From an optimization perspective each (capacitated) link is a
potential traffic constraint. From Table 1, since the paths N0-N2
and N0-N3 share a common link, L0, the sum of their bandwidth
flows must not exceed the capacity of L0 (50 units). In addition,
the capacity constraint on link L2 tells us that the bandwidth of the
traffic from N0 to N2 must not exceed 30 units. This information, as
well as the total costs of the two paths, is all that is needed for
a constrained joint optimization to proceed. Detailed information on
link costs (as seen by the network) is not necessary, nor is
information on unused links.
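As an illustration, the constraints implied by Table 1 can be checked
mechanically. The following is a minimal Python sketch, not part of
any ALTO encoding; the function names are hypothetical, and we assume
that a link is satisfied when its total load is less than or equal to
its capacity.

```python
# Path vectors and link capacities copied from Table 1.
PATH_LINKS = {"P1": ["L0", "L2"], "P2": ["L0"]}
LINK_BW = {"L0": 50, "L2": 30}


def link_loads(flows):
    """Sum the bandwidth of every path flow that crosses each link."""
    loads = {link: 0 for link in LINK_BW}
    for path, bw in flows.items():
        for link in PATH_LINKS[path]:
            loads[link] += bw
    return loads


def is_feasible(flows):
    """Return True when no link carries more than its capacity.

    Assumption: load equal to capacity is treated as acceptable.
    """
    loads = link_loads(flows)
    return all(loads[link] <= LINK_BW[link] for link in LINK_BW)
```

For example, is_feasible({"P1": 30, "P2": 20}) holds because L0
carries 50 units and L2 carries 30, while raising P2 to 25 units
overloads the shared link L0.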
6.3.1.2. TDM Network Example
Now suppose the network of Figure 1 is a TDM network controlled by
GMPLS. Once again N0 represents a large data center and nodes N2
and N3 are potential clients. However, in this case the network
provider offers an additional path, P3, for getting from N0 to N2.
Path   Src-Dest   Path Vector   Path Cost
P1     N0-N2      {L0, L2}      22
P2     N0-N3      {L0}          10
P3     N0-N2      {L1, L3}      25
-----------------------------------------
Link   Bandwidth
L0     50
L1     45
L2     30
L3     42
Table 2. Path Vectors for P1-P3 and used link capacities.
Once again no information in addition to that shown in Table 2 is
required to perform a constrained optimization. However, path P3 is
the only path using links L1 and L3. Link L3's capacity is 42 units,
which is less than link L1's capacity of 45 units. Satisfying link
L3's capacity constraint (for the set of paths P1-P3) implies that
link L1's capacity constraint is always satisfied and hence no
information on link L1 needs to be sent from the network. In
particular the network could send the information shown in Table 3
where we have replaced links L1 and L3 with an "abstract link"
(AL13) with capacity equal to that of link L3.
Path   Src-Dest   Path Vector   Path Cost
P1     N0-N2      {L0, L2}      22
P2     N0-N3      {L0}          10
P3     N0-N2      {AL13}        25
-----------------------------------------
Link   Bandwidth
L0     50
L2     30
AL13   42
Table 3. Path Vectors for P1-P3 and abstract link capacities.
Note that simplifications such as the one above can frequently be
performed and can result in significant information savings. Also,
this constraint information reduction was performed without the
network provider having knowledge of the application layer's traffic
demands. Methods for performing these reductions may be specific to
service providers and not subject to standardization.
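One possible reduction, given here only as a sketch, groups links
that are used by exactly the same set of paths. Such links always
carry the same total load, so only the smallest capacity in the group
can ever be binding. The Python below applies this idea to Table 2.
The function name and the abstract link naming rule are hypothetical
and chosen to reproduce the name AL13; a provider may use an entirely
different method.

```python
from collections import defaultdict

# Path vectors and link capacities copied from Table 2.
PATH_LINKS = {"P1": ["L0", "L2"], "P2": ["L0"], "P3": ["L1", "L3"]}
LINK_BW = {"L0": 50, "L1": 45, "L2": 30, "L3": 42}


def reduce_links(path_links, link_bw):
    """Merge links used by the same set of paths into one abstract link."""
    users = defaultdict(set)              # link -> paths that cross it
    for path, links in path_links.items():
        for link in links:
            users[link].add(path)

    groups = defaultdict(list)            # set of paths -> links they share
    for link, paths in users.items():
        groups[frozenset(paths)].append(link)

    rename, new_bw = {}, {}
    for links in groups.values():
        links.sort()
        # Hypothetical naming rule: L1 and L3 become AL13.
        if len(links) == 1:
            name = links[0]
        else:
            name = "AL" + "".join(link[1:] for link in links)
        new_bw[name] = min(link_bw[link] for link in links)
        for link in links:
            rename[link] = name

    new_paths = {}
    for path, links in path_links.items():
        merged = []
        for link in links:
            if rename[link] not in merged:
                merged.append(rename[link])
        new_paths[path] = merged
    return new_paths, new_bw
```

Applied to the data of Table 2, this yields the path vectors and
capacities of Table 3: links L0 and L2 are kept, and L1 and L3 are
replaced by AL13 with a capacity of 42 units.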
6.3.1.3. JSON Encoding
In some cases there may be more than one path given between a source
and destination. In this case the network needs to furnish with
each path the following information: (source, destination), (path id
if more than one between source and destination), costs, overall
path constraint (if any), and list of mutual abstract links for this
path. In addition we need to furnish capacities for all mutual
abstract links mentioned.
object {
    PIDName source;
    PIDName dest;
    JSONNumber wt;     // A numerical path cost
    JSONNumber delay;  // A numerical path latency, optional
    JSONNumber bw;     // A numerical bandwidth constraint, optional
    LIDName mutual-links<0..*>;  // Shared constrained links, optional
} PathData;
Note that "mutual-links" is a JSON array that contains the names of
the shared links that this path depends upon (may be empty). Note
that all costs are associated with path entities, while constraints
may be associated with paths or links.
object {
    JSONNumber bw;  // A numerical bandwidth constraint, optional
} SharedAbstractLink;
Note that the shared abstract link only contains capacity
information. This is much different from the case where a graph is
shared.
object {
    PathData [pathname]<0..*>;            // The individual path info
    SharedAbstractLink [linkname]<0..*>;  // Shared link info
} NetworkPathData;
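To make the tentative encoding concrete, the sketch below builds a
NetworkPathData instance for the data of Table 3 and serializes it
with Python's standard json module. Several points are assumptions
rather than part of the encoding above: the node names are used
directly as PID names, path and link entries are placed in one object
keyed by name (a literal reading of the notation, which requires path
and link names not to collide), and the helper undefined_links is
hypothetical.

```python
import json

# One possible NetworkPathData instance for Table 3.
network_path_data = {
    "P1": {"source": "N0", "dest": "N2", "wt": 22,
           "mutual-links": ["L0", "L2"]},
    "P2": {"source": "N0", "dest": "N3", "wt": 10,
           "mutual-links": ["L0"]},
    "P3": {"source": "N0", "dest": "N2", "wt": 25,
           "mutual-links": ["AL13"]},
    "L0": {"bw": 50},
    "L2": {"bw": 30},
    "AL13": {"bw": 42},
}


def undefined_links(data):
    """List mutual links that a path mentions but the object omits.

    Assumption: an entry is a path when it has a "source" member.
    """
    missing = set()
    for entry in data.values():
        if "source" in entry:
            for link in entry.get("mutual-links", []):
                if link not in data:
                    missing.add(link)
    return sorted(missing)


encoded = json.dumps(network_path_data, indent=2)
```

A receiver could run a check such as undefined_links on the decoded
object to confirm that a capacity was furnished for every mutual
abstract link mentioned by a path.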
6.3.2. Cost-Capacity Graphs
As discussed in Section 6.2, the network's data or control plane may
allow arbitrary path selection and hence a cost-capacity graph
representation would be needed for the optimization to fully take
advantage of this network flexibility.
In the case where path choice is limited, but the paths can be
derived from a graph, it may be useful for the network to supply a
graph to reduce the amount of information transferred via the ALTO
protocol. Suppose the application is interested in many source
destination pairs. In this case the amount of path information
including abstract link constraints could significantly exceed the
information size of a graph.
In the following we give examples of cost-capacity graphs for a
technology (TDM) that can offer arbitrary path choice, and for a
technology (MSTP Ethernet) that offers limited path choice but where
specifying graphs can result in significant efficiencies. We then
provide a tentative JSON encoding of cost-capacity graphs for use
with the ALTO protocol.
6.3.2.1. Simple TDM Example with Graph Reduction
Consider again the case where Figure 1 represents a TDM network, and
in this case the provider will permit the application to make path
choices. Suppose that the application only involves nodes N0, N1, and
N2, and not N3 or N4. By studying the structure of the graph of
Figure 1 one can derive the reduced graph shown in Figure 2 that
maintains all relevant cost and capacity information from the point
of view of nodes N0, N1, and N2. In particular we were able to remove
nodes N3 and N4, substitute abstract link AL0M2 for links L0 and L2,
and substitute abstract link AL4M5 for links L4 and L5. Note that any
such reductions, approximate or exact, are at the network provider's
discretion.
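The substitution of one abstract link for two links in series can be
sketched as follows: the costs add, and the usable bandwidth is that
of the bottleneck link. The Python below is only an illustration with
a hypothetical function name. The weight of L0 is the cost of path P2
in Table 1, and the weight of L2 is inferred as the difference
between the costs of P1 and P2; this inference is our assumption.

```python
def series_merge(links):
    """Collapse links in series into one abstract link.

    Costs are additive; bandwidth is limited by the smallest link.
    """
    return {
        "wt": sum(link["wt"] for link in links),
        "bw": min(link["bw"] for link in links),
    }


L0 = {"wt": 10, "bw": 50}   # cost of P2 = {L0} in Table 1
L2 = {"wt": 12, "bw": 30}   # inferred: cost of P1 (22) minus cost of P2

AL0M2 = series_merge([L0, L2])
```

The result has a weight of 22 and a bandwidth of 30, the values shown
for AL0M2 in Figure 2.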
+----+
| N0 |-------------------------------------------+
+----+ `. AL0M2 |
| `. Wt=22,BW=30 |
| `-. |
| `. |
| | AL4M5 |
| L1 . Wt=17,BW=40 |
| Wt=10 / |
| BW=45 / |
| / |
| .' |
| / |
| / |
+----+ .' L3 Wt=15 BW=42 +----+
| N1 |.........................................| N2 |
+----+ +----+
Figure 2. Reduced graph of Figure 1 from the perspective of nodes
N0-N2.
The resulting information to be conveyed concerning this reduced
graph is shown in Table 4.
Link    End Nodes   Bandwidth   Cost
AL0M2   (N0, N2)    30          22
L1      (N0, N1)    45          10
L3      (N1, N2)    42          15
AL4M5   (N0, N1)    40          17
Table 4. Representation of the graph of Figure 2.
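To show how an application might use such a graph, the sketch below
selects the least cost path that can carry a given bandwidth demand:
links with insufficient bandwidth are pruned, and a standard Dijkstra
search is run over what remains. This is a generic technique and not
something the ALTO protocol prescribes. The link data follow Figure
2 (AL0M2 is taken as 30 units), the links are assumed to be
bidirectional, and the function name is hypothetical.

```python
import heapq

# Link name -> (end node, end node, bandwidth, cost), per Figure 2.
LINKS = {
    "AL0M2": ("N0", "N2", 30, 22),
    "L1": ("N0", "N1", 45, 10),
    "L3": ("N1", "N2", 42, 15),
    "AL4M5": ("N0", "N1", 40, 17),
}


def cheapest_path(src, dst, demand):
    """Return (cost, [link names]) or None if the demand cannot be met."""
    # Keep only links able to carry the demand; treat them as undirected.
    adj = {}
    for name, (a, z, bw, wt) in LINKS.items():
        if bw >= demand:
            adj.setdefault(a, []).append((z, wt, name))
            adj.setdefault(z, []).append((a, wt, name))

    heap = [(0, src, [])]       # (cost so far, node, links used)
    done = set()
    while heap:
        cost, node, used = heapq.heappop(heap)
        if node == dst:
            return cost, used
        if node in done:
            continue
        done.add(node)
        for nxt, wt, name in adj.get(node, []):
            if nxt not in done:
                heapq.heappush(heap, (cost + wt, nxt, used + [name]))
    return None
```

For a demand of 20 units between N0 and N2 the abstract link AL0M2 is
chosen at cost 22. For a demand of 35 units AL0M2 is too small, and
the path over L1 and L3 is chosen at cost 25.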
6.3.2.2. Ethernet MSTP Example with Multiple Graphs
Consider the Ethernet network shown in Figure 3 running MSTP with
three multiple spanning tree instances (MSTIs) defined. Suppose the
application is interested in connectivity between nodes N1, N3, N5,
N6, and N7. In Figures 4-6 we show the spanning tree instances along
with a high fidelity graph reduction that removes nodes that are not
of interest and abstracts links as needed.
Let's compare these reduced graph representations with a path
representation. Since we have n=5 communicating nodes of interest,
there are n*(n-1)/2 = 10 potential paths per MSTI for which the
network would need to furnish cost and constraint information as
in section 6.3.1. In the case of graphs reduced for the nodes of
interest from tree structures it can be proved that the number of
links in the graph is equal to (n-1), e.g., the reduced graph
consists of 5 nodes and 4 links.
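The arithmetic of this comparison can be written out as a short
Python sketch. The function names are hypothetical, and the count of
three MSTIs is taken from this example.

```python
def path_entries(n):
    """Unordered pairs of n nodes, one path entry each: n*(n-1)/2."""
    return n * (n - 1) // 2


def reduced_tree_links(n):
    """Links in a tree reduced to n nodes of interest: n-1."""
    return n - 1


N_NODES = 5   # N1, N3, N5, N6, and N7
N_MSTI = 3    # spanning tree instances in this example

per_msti = (path_entries(N_NODES), reduced_tree_links(N_NODES))
total = (N_MSTI * per_msti[0], N_MSTI * per_msti[1])
```

Here per_msti is (10, 4) and total is (30, 12), so across the three
MSTIs the path representation needs 30 entries where the reduced
graphs need 12 links.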
+----+ L4
/| N3 |..______ +----+
| +----+ `````----| N4 |..__ L6
/ .-'+----+ ``--.__ +----+
/ .-' | ``--..| N7 |
| L2 .-' | +----+
/ .-' / .' |
/ .' | / /
| .-' / .' |
/ .-' L9 | .' |
+-+--+ .-' | L11 / /
| N2 |.-' L5 / .' |
+----+ | / /L8
\ | .' |
\ L1 / .' |
\ | / /
\ / .' |
+----+ | .' /
| N1 |.__ L3 | / +----+
+----+ `--._ / .' __..| N6 |
``-.._ +----+ __..--'' +----+
``-.| N5 |.--'' L7
+----+
Figure 3. Ethernet Network supporting MSTP.
L4 AL4M6
+--+ +--+
+--+ __..--|N4|`. +--+ __..--|N7|
|N3|--' +--+ \ L6 |N3|--' +--+
+--+ `. +--+ |
/ `. / \
L2 / +--+ / |
.' |N7| .'AL1M2 \ L8
/ +--+ / |
+--+ MSTI #1 / +--+ \
|N2| / |N1| |
+--+ L8| +--+ \
\ (a) / (b) +--+
| L1 / .'|N6|
\ +--+ +--+ .' +--+
\ .'|N6| |N5|.' L7
+--+ +--+ .' +--+ +--+
|N1| |N5|.' L7
+--+ +--+
Figure 4. (a) Spanning tree instance #1, (b) Reduced graph from the
perspective of nodes N1, N3, N5, N6, N7.
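A reduction of the kind shown in Figure 4 can be sketched in Python
as follows. Nodes that are not of interest are pruned when they are
leaves and contracted when exactly two links meet at them; nodes of
no interest where three or more links meet would be retained as
branch points. The link end points are our reading of Figure 4(a),
and the function names and the abstract link naming rule are
hypothetical. Costs and capacities are omitted because the figure
gives none.

```python
# Link name -> end nodes, as read from Figure 4(a) (an assumption).
TREE = {
    "L1": ("N1", "N2"), "L2": ("N2", "N3"), "L4": ("N3", "N4"),
    "L6": ("N4", "N7"), "L7": ("N5", "N6"), "L8": ("N6", "N7"),
}
KEEP = {"N1", "N3", "N5", "N6", "N7"}


def label(members):
    """Hypothetical naming rule: L4 and L6 in series become AL4M6."""
    if len(members) == 1:
        return members[0]
    return "AL" + "M".join(name[1:] for name in members)


def reduce_tree(tree, keep):
    """Prune and contract nodes that are not in the set of interest."""
    # Each link is tracked by the tuple of original links it stands for.
    links = {(name,): ends for name, ends in tree.items()}
    changed = True
    while changed:
        changed = False
        incident = {}
        for members, (a, z) in links.items():
            incident.setdefault(a, []).append(members)
            incident.setdefault(z, []).append(members)
        for node, touching in incident.items():
            if node in keep:
                continue
            if len(touching) == 1:        # unwanted leaf: drop its link
                del links[touching[0]]
                changed = True
                break
            if len(touching) == 2:        # pass-through node: merge links
                m1, m2 = touching
                far = [e for m in (m1, m2) for e in links[m] if e != node]
                del links[m1], links[m2]
                merged = tuple(sorted(m1 + m2, key=lambda s: int(s[1:])))
                links[merged] = (far[0], far[1])
                changed = True
                break
    return {label(m): ends for m, ends in links.items()}


reduced = reduce_tree(TREE, KEEP)
```

With the inputs above, nodes N2 and N4 are contracted, giving the
four links AL1M2, AL4M6, L7, and L8 of Figure 4(b).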
+--+
+--+ L4_..-|N4| +--+
|N3|.--'' +--+ |N3||
+--+ .-' | +--+\
.-' / |
_.-' | +--+ \ +--+
.-' L9 | |N7| | |N7|
.-' / +--+ \ +--+
+--+ | + AL4M5 \ +
|N2| L5 / | | |
+--+ MSTI #2 | L8 / \ L8 /
| / | /
(a) / / (b) \ /
| +--+ | +--+
L3 / .'|N6| \ .'|N6|
+--+ +--+ .' +--+ +--+ L3 +--+ .' +--+
|N1|-------|N5|.' L7 |N1|-------|N5|.' L7
+--+ +--+ +--+ +--+
Figure 5. (a) Spanning tree instance #2, (b) Reduced graph from the
perspective of nodes N1, N3, N5, N6, N7.
+--+
+--+ L4 __.|N4|`. +--+ AL4M6
|N3|---' +--+ \L6 |N3|.__
+--+ `. +--+ ``--...__
/ `. ``--..
L2 / +--+ +--+
.' MSTI #3 /|N7| /|N7|
/ .' +--+ .' +--+
+--+ L11 / | L11 / |
|N2| / / / /
+--+ (a) .' L8/ (b) .' L8/
/ | / |
/ / / /
.' +--+ .' +--+
/ |N6| / |N6|
+--+ L3 +--+ +--+ +--+ L3 +--+ +--+
|N1|.......|N5| |N1|.......|N5|
+--+ +--+ +--+ +--+
Figure 6. (a) Spanning tree instance #3, (b) Reduced graph from the
perspective of nodes N1, N3, N5, N6, N7.
In many data center applications all communicating virtual machines
(VMs) need to be placed within the same VLAN. MSTP allows the
assignment of VLANs to MSTIs; hence a reduced graph representation
can provide a very good mechanism for determining an optimum fit
between communicating VM traffic patterns and MSTI VLAN assignment.
6.3.2.3. JSON Encoding
Like the current ALTO filtered cost map, a request for a
cost-capacity graph would take source and destination PIDs as inputs.
In JSON notation we could represent the returned graph or graphs as a
JSON object containing link objects. As we saw in the Ethernet case
it may be useful to supply more than one graph. In addition,
restrictions on routing may need to be conveyed, for example that
only the shortest path between source and destination is a valid
route (e.g., OSPF routing for IP), or that all routes come from the
same graph (e.g., VLAN assignment to MSTI in MSTP Ethernet).
Hence we are led to a tentative JSON encoding which includes named
link objects, named graph objects, and a versioned container for
holding graphs and any other general information such as the
previously mentioned restrictions.
object {
    NIDName aend;      // Node ids are similar to PIDs but
    NIDName zend;      // may not have end points
    JSONNumber wt;     // A numerical routing cost
    JSONNumber delay;  // A numerical latency cost, optional
    JSONNumber bw;     // A numerical bandwidth "cost", optional
    // Other private or experimental costs could be added,
    // for example costs related to reliability or economic cost.
    // Only one cost of each type would be permitted.
    // Note a multi-cost like mechanism could be used.
} LinkData;
// Collection of links each identified by link id (LID) name.
object {
    LinkData [lidname]<0..*>;  // Link id (LID) would be an identifier
    ...                        // similar to a PID or NID and identifies
                               // the link
} NetworkGraphData;
// Finally, multiple graph encapsulation and versioning
object {
    VersionTag map-vtag;
    NetworkGraphData [graphname]<1..*>;  // Named graphs
    ...  // Other information such as graph choice restrictions
         // or routing restrictions.
} InfoResourceNetwork;
Where a graph name is formatted like a PIDName, but names a graph.
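As an illustration of this tentative encoding, the sketch below
builds an InfoResourceNetwork instance for the reduced TDM graph of
Figure 2 and serializes it with Python's standard json module. The
graph name "reduced-tdm", the version tag value, and the helper
graph_names are hypothetical. We also assume that the version tag is
an opaque string and that node names can serve directly as NID names.

```python
import json

# Links of the reduced graph of Figure 2 (AL0M2 bandwidth per Figure 2).
reduced_tdm = {
    "AL0M2": {"aend": "N0", "zend": "N2", "wt": 22, "bw": 30},
    "L1": {"aend": "N0", "zend": "N1", "wt": 10, "bw": 45},
    "L3": {"aend": "N1", "zend": "N2", "wt": 15, "bw": 42},
    "AL4M5": {"aend": "N0", "zend": "N1", "wt": 17, "bw": 40},
}

info_resource_network = {
    "map-vtag": "example-vtag-1",   # hypothetical version tag
    "reduced-tdm": reduced_tdm,     # hypothetical graph name
}


def graph_names(resource):
    """Return the names of the graphs carried in the container.

    Assumption: every member other than "map-vtag" whose value is an
    object is a named graph.
    """
    return sorted(
        name for name, value in resource.items()
        if name != "map-vtag" and isinstance(value, dict)
    )


encoded = json.dumps(info_resource_network, indent=2)
```

In the MSTP example a provider could carry one such named graph per
spanning tree instance in the same container.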
7. Constraint Based Filtering
Young's stuff here.
8. Conclusion
In this draft we have discussed two generic use cases that motivate
the usefulness of general interfaces for cross stratum optimization
in the network core. In our first use case network resource usage
became significant due to the aggregation of many individually
unique client demands. In the second use case, where data centers
were communicating with each other, bandwidth usage was already
significant enough to warrant the use of private line/LAN type
network services.
Both use cases result in optimization problems that trade off
computational versus network costs and constraints. Both featured
scenarios where advance reservation, on-demand, and recovery type
service interfaces could prove beneficial. In the later sections of
this document we showed how ALTO concepts [1] and the ALTO protocol
could be used and extended to support joint application network
optimization for large network bandwidth consuming applications.
9. Security Considerations
TBD
10. IANA Considerations
This informational document does not make any requests for IANA
action.
11. References
11.1. Informative References
[1] "draft-ietf-alto-reqs-09." [Online]. Available:
http://datatracker.ietf.org/doc/draft-ietf-alto-reqs/. [Accessed:
17-May-2011].
[2] J. Medved, N. Bitar, S. Previdi, B. Niven-Jenkins, and G. Watson,
"Use Cases for ALTO within CDNs." [Online]. Available:
http://tools.ietf.org/html/draft-jenkins-alto-cdn-use-cases-02.
[Accessed: 06-Mar-2012].
[3] E. Mannie, Ed., "Generalized Multi-Protocol Label Switching (GMPLS)
Architecture, RFC 3945." Oct-2004.
[4] Y. Lee, G. Bernstein, and W. Imajuku, Eds., "Framework for GMPLS
and PCE Control of Wavelength Switched Optical Networks (WSON), RFC
6163." Apr-2011.
[5] A. Farrel, J. P. Vasseur, and J. Ash, "A Path Computation Element
(PCE)-Based Architecture, RFC 4655." Aug-2006.
[6] G. Swallow, J. Drake, H. Ishimatsu, and Y. Rekhter, "Generalized
Multiprotocol Label Switching (GMPLS) User-Network Interface (UNI):
Resource ReserVation Protocol-Traffic Engineering (RSVP-TE) Support
for the Overlay Model, RFC 4208." Oct-2005.
[7] Y. R. Yang, R. Alimi, and R. Penno, "ALTO Protocol." [Online].
Available: http://tools.ietf.org/html/draft-ietf-alto-protocol-10.
[Accessed: 05-Mar-2012].
[8] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A.
Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M.
Zaharia, "A view of cloud computing," Commun. ACM, vol. 53, pp. 50-
58, Apr. 2010.
[9] K. A. Hua and S. Sheu, "Skyscraper broadcasting: a new broadcasting
scheme for metropolitan video-on-demand systems," in Proceedings of
the ACM SIGCOMM '97 conference on Applications, technologies,
architectures, and protocols for computer communication, Cannes,
France, 1997, pp. 89-100.
[10] "Adobe Flash Media Server 4.0 * Building peer-assisted networking
applications." [Online]. Available:
http://help.adobe.com/en_US/flashmediaserver/devguide/WSa4cb07693d12
3884520b86f312a354ba36d-8000.html. [Accessed: 13-May-2011].
[11] Rudra Dutta and George N. Rouskas, "Traffic grooming in WDM
networks: Past and future," IEEE Network, vol. 16, no. 6, pp. 46 -
56, 2002.
[12] Keyao Zhu and B. Mukherjee, "Traffic grooming in an optical WDM
mesh network," Selected Areas in Communications, IEEE Journal on,
vol. 20, no. 1, pp. 122-133, 2002.
[13] G. Bernstein, B. Rajagopalan, and D. Saha, Optical Network
Control: Architecture, Protocols, and Standards. Addison-Wesley
Professional, 2003.
[14] B. Awerbuch and Y. Shavitt, "Topology aggregation for directed
graphs," Networking, IEEE/ACM Transactions on, vol. 9, no. 1, pp.
82-90, 2001.
[15] S. Uludag, K.-S. Lui, K. Nahrstedt, and G. Brewster, "Analysis of
Topology Aggregation techniques for QoS routing," ACM Comput. Surv.,
vol. 39, Sep. 2007.
[16] K. Nichols, D. L. Black, S. Blake, and F. Baker, "Definition of
the Differentiated Services Field (DS Field) in the IPv4 and IPv6
Headers." RFC2474. Available: http://tools.ietf.org/html/rfc2474.
[17] D. O. Awduche and J. Agogbua, "Requirements for Traffic
Engineering Over MPLS." RFC2702. Available:
http://tools.ietf.org/html/rfc2702.
Authors' Addresses
Greg M. Bernstein
Grotto Networking
Fremont California, USA
Phone: (510) 573-2237
Email: gregb@grotto-networking.com
Young Lee
Huawei Technologies
1700 Alma Drive, Suite 500
Plano, TX 75075
USA
Phone: (972) 509-5599
Email: ylee@huawei.com
Intellectual Property Statement
The IETF Trust takes no position regarding the validity or scope of
any Intellectual Property Rights or other rights that might be
claimed to pertain to the implementation or use of the technology
described in any IETF Document or the extent to which any license
under such rights might or might not be available; nor does it
represent that it has made any independent effort to identify any
such rights.
Copies of Intellectual Property disclosures made to the IETF
Secretariat and any assurances of licenses to be made available, or
the result of an attempt made to obtain a general license or
permission for the use of such proprietary rights by implementers or
users of this specification can be obtained from the IETF on-line
IPR repository at http://www.ietf.org/ipr
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
any standard or specification contained in an IETF Document. Please
address the information to the IETF at ietf-ipr@ietf.org.
Disclaimer of Validity
All IETF Documents and the information contained therein are
provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION
HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY,
THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
WARRANTY THAT THE USE OF THE INFORMATION THEREIN WILL NOT INFRINGE
ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
FOR A PARTICULAR PURPOSE.
Acknowledgment
Funding for the RFC Editor function is currently provided by the
Internet Society.