Network Working Group A. Dalela
Internet Draft Cisco Systems
Intended status: Standards Track December 30, 2011
Expires: June 2012
Datacenter Network and Operations Requirements
draft-dalela-dc-requirements-00.txt
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
This Internet-Draft will expire on June 30, 2012.
Copyright Notice
Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document.
Abstract
The problems of modern datacenters are rooted in virtualized host
scaling. Virtualization implies VM mobility, which may be intra-
datacenter or inter-datacenter. Mobility may cross administrative
boundaries, such as across private-public domains. A public
datacenter may be multi-tenant, extending several private networks
into the public domain. Running these massively scaled, virtualized,
multi-tenant datacenters poses some unique networking and operational
challenges. This document describes these challenges.
Table of Contents
1. Introduction...................................................3
2. Conventions used in this document..............................5
3. Terms and Acronyms.............................................5
4. Cloud Characteristics..........................................5
4.1. Network Fragmentation.....................................5
4.2. Hardware-Software Reliability.............................6
4.3. Data is Relatively Immobile...............................6
5. Problem Statement..............................................7
5.1. The Basic Forwarding Problem..............................7
5.2. The Datacenter Inter-Connectivity Problem.................7
5.3. The Multi-Tenancy Problem.................................8
5.4. The Technology-Topology Separation Problem................8
5.5. The Network Convergence Problem...........................9
5.6. The East-West Traffic Problem............................10
5.7. The Network SLA Problem..................................11
5.8. The Broadcast and Multicast Problem......................12
5.9. The Cloud Control Problem................................13
5.10. The Forwarding Plane Scale Problem......................14
6. Security Considerations.......................................15
7. IANA Considerations...........................................15
8. Conclusions...................................................15
9. References....................................................15
9.1. Normative References.....................................15
10. Acknowledgments..............................................15
1. Introduction
Cloud computing promises to lead a revolution in computing provided
we can solve some of its most pressing problems. At a high-level, the
challenge is to connect and operate massively scaled virtualized
datacenters for multiple tenants, while guaranteeing the same level
of reliability, performance, security and control that is available
in the private domain. Virtualization enables mobility - including
mobility of network devices (switches, firewalls, load-balancers,
etc.) within and across datacenters. The Mobile IP approach does not
handle datacenter mobility optimally. New approaches are therefore
needed to handle these problems in a scalable way. The challenge is
also to solve cloud problems with as little disruption to existing
infrastructure, protocols and architectures as possible.
Since changes to existing technology will be needed, it is important
to state the reasons for this change, which then forms the basis for
the development of new technology for datacenters.
The discussion of datacenter problems is today complicated by various
architectural approaches that are being proposed in the industry to
address them. For example, there are questions about whether datacenter
networks should be hierarchical or flat; whether the network should have
fat edges and a lean core, or lean edges and a fat core; whether packet
forwarding should use flat addressing or overlays; whether the problems
should be solved exclusively in the network, exclusively in the host, or
through a combination of both; and whether the datacenter needs a
centralized or a distributed control plane.
Given these many alternative approaches, it is not clear that the
problem itself is well understood; if it were, we would not have so
many competing solutions. Articulating the problem has therefore become
a challenge in its own right. Proliferation of solutions without an
adequate discussion of the entire problem set creates a lot of
confusion.
The easiest way to move forward is to identify a broader set of
problems rather than looking at "how do we move a VM in the network"
or "what is the way to connect datacenters" in isolation.
The need for identifying a broader set of problems is reinforced by a
recognition (that may be yet to dawn in some cases) that some
solutions only shift the burden of a problem to another point that
will become obvious after the solution has been deployed.
For instance, if we use flat addressing and propagate all datacenter
addresses into the Internet, the address fragmentation caused by VM
mobility would require an impossible scale in the Internet core. If,
however, we implement overlays to connect multiple datacenters,
additional signaling and encapsulation overheads are incurred at the
datacenter edge. These overheads will grow rapidly as the size of
datacenters and the number of interconnected datacenters grow. Overlay
solutions thus shift the problem from the Internet core to the
datacenter edge, where it will reappear later. This might be
acceptable, but it needs to be understood. Also, an approach that
solves scaling both in the Internet core and at the datacenter edge
should be preferred over one that only shifts the scaling problem.
Without a broader consideration of all the problems involved, it is
likely that we will adopt solutions that lead to other problems. These
second-order problems will be harder to solve, and will lead to much
more complexity than if careful consideration were given to the entire
problem set from the start.
The key point of this document is that the datacenter problem is
multi-faceted. There are many ways to solve each of these problems
individually. But, taken together, the solutions may be less than
optimal. At scale, less than optimal solutions quickly break down.
Without scale, any solution will work perfectly well. A key criterion
for selecting amongst many possible solution alternatives should be
whether the entire solution set scales very well.
Scalability has generally not been a standards consideration, and the
problems of scaling have been left to implementations. But in the case
of cloud datacenters, scaling is the basic requirement, and all
problems of cloud datacenters arise from scaling. Solution development
therefore cannot ignore the scaling and optimality problem.
As an initial step, therefore, we need to state the multiple facets of
the datacenter problem. These problem statements can then be used to
identify an approach that solves all the problems collectively, without
worsening the solution to any one of them. The facets of the overall
approach can then be divided among individual working groups to build a
consistent solution. This kind of approach is needed for the cloud
datacenter domain precisely because the problem is multi-faceted.
This document identifies a set of networking and operational problems
in the datacenter that need to be solved collectively.
2. Conventions used in this document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC-2119 [RFC2119].
In this document, these words will appear with that interpretation
only when in ALL CAPS. Lower case uses of these words are not to be
interpreted as carrying RFC-2119 significance.
3. Terms and Acronyms
NA
4. Cloud Characteristics
Most cloud characteristics are well-known - on-demand, elastic,
large-scale, etc. We will not repeat them here. What we will briefly
discuss is the impact of an on-demand, elastic, large-scale design on
traditional assumptions about the network.
4.1. Network Fragmentation
By definition, cloud involves rapid provisioning of resources. When a
user requests VMs incrementally, these VMs will generally be allocated
in whichever locations currently have capacity. When these VMs
communicate with each other, they consume a large amount of
cross-sectional bandwidth. Over time, these machines need to be
consolidated to conserve cross-sectional bandwidth.
Fragmentation is also caused by maintenance windows, because during
that time no additional configuration is allowed while hardware or
software is being upgraded. Providers will not turn away customers
because they are upgrading hardware or software! That means new
requests must land in locations that are not being upgraded.
Subsequently, these fragmented islands need to be consolidated. At
massive scale, something will almost always be undergoing an upgrade.
A provider has no good way of predicting or planning for the behavior
of the various users of its cloud services. A provider can only react
to these behaviors, and react it must, in order to optimize resources.
Network fragmentation therefore has to be fixed reactively. Over time,
predictive trends could be devised, but these are not likely to be very
consistent. We will probably discover that, like network traffic
trends, cloud usage trends are statistical and self-similar over time.
4.2. Hardware-Software Reliability
At 5-nines reliability, one out of 100,000 hardware devices will fail
every 5.25 minutes. At 3-nines reliability, one out of 1000 software
elements will fail every 5.25 minutes. At the scale being targeted
for cloud, there are easily a million hardware devices and several
hundred million software instances. This implies a massive rate of
on-going failure that needs to be recovered from. For instance, at 3-
nines reliability, and 100 million VM/applications, a VM/application
will fail every 3 milliseconds on average. At 5-nines reliability, a
VM/application will fail every 300 milliseconds on average.
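The figures above can be reproduced with a simple back-of-the-envelope
calculation. The sketch below (in Python) is illustrative only; it
assumes, as the figures above implicitly do, that each failure takes
roughly 5.25 minutes to detect and recover, and that failures are
independent and spread evenly over time.
   # Back-of-the-envelope failure-rate estimate for a large cloud.
   # Assumption (not stated explicitly above): each outage lasts about
   # 5.25 minutes, roughly the annual downtime budget of a five-nines
   # element.

   MINUTES_PER_YEAR = 365.25 * 24 * 60     # ~525,960 minutes
   OUTAGE_MINUTES = 5.25                   # assumed mean outage/recovery time

   def failures_per_element_per_year(availability):
       """Failures per element per year, given availability and outage length."""
       downtime = (1.0 - availability) * MINUTES_PER_YEAR
       return downtime / OUTAGE_MINUTES

   def fleet_failure_interval_seconds(availability, fleet_size):
       """Average time between failures anywhere in a fleet of elements."""
       per_year = failures_per_element_per_year(availability) * fleet_size
       return MINUTES_PER_YEAR * 60 / per_year

   # 100 million VMs/applications at three-nines and five-nines:
   print(fleet_failure_interval_seconds(0.999,   100_000_000))  # ~0.003 s, every ~3 ms
   print(fleet_failure_interval_seconds(0.99999, 100_000_000))  # ~0.3 s, every ~300 ms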
Large web companies with experience in running massive datacenters
echo this view. They design large clusters that run common services
and are dispensable as a whole, because there are standby clusters
that can take over their work. That isn't true of consumer clouds in
general, and can be very expensive. The best recourse is to detect
failures and recreate services as quickly as possible.
Recovery from on-going failures forces churn in the network. The
provider needs to quickly find new locations where services can be
recreated. Detecting these failures and recreating the affected
services requires a significant monitoring effort across these
hardware and software entities.
The situation is further complicated by the fact that new devices are
constantly being provisioned. Every network operator knows from
experience that their systems generally work correctly until
configuration is changed. Configuration changes are confined to
maintenance windows when existing configuration is backed up and
administrators prepare users for possible failures. There are no such
time windows in the cloud; changes are ever-present and on-going. This
implies a higher rate of failure and churn.
4.3. Data is Relatively Immobile
To optimize disk usage, it is preferable to use network storage rather
than local storage. This also simplifies storage backup. But it also
implies that data is relatively immobile. When a VM moves, it does not
carry its data along; instead, big data pipes are needed to haul the
bits from one location to another. A VM can easily be recreated, but
its data cannot, so this hauling is unavoidable.
This implies that the network has to treat traffic caused by data
immobility as a major source of load, besides the server-to-server and
user-to-server traffic that is already recognized. When data sits
outside the VM, security enforcement also needs to be more effective.
In effect, we need to consider both static and in-flight data security.
In-flight data security comes from encryption and static security
from authentication. Both of these traditional approaches are
expensive. Network segmentation needs to be used to address them.
5. Problem Statement
This section captures the various types of datacenter problems, one per
subsection, at a high level. These problems are described in relation
to widely deployed and standardized L2 and L3 network protocols and
practices. A complete analysis of proposed or emerging technologies,
including on-going work in other IETF working groups, is out of scope
for this document. However, the problem statement given here can be
used to subsequently discuss that work.
5.1. The Basic Forwarding Problem
Traditionally, datacenter networks have used L2 or L3 technologies. The
need to massively scale virtualized hosts breaks both of these
approaches. L2 networks cannot be made to scale because of the high
volume of broadcast traffic. L3 networks cannot support host mobility,
since routing is based on subnets and an IP address cannot be moved out
of its subnet. Moving an IP address in a natively L3 network requires
installing host routes at one or more points in the path, an approach
that does not scale.
The failure of traditional L2 and L3 networking approaches means that a
new forwarding approach needs to be devised that scales massively and
supports host mobility to different points in the datacenter.
5.2. The Datacenter Inter-Connectivity Problem
There are limits to how much a single datacenter can be scaled.
Workloads need to be placed closer to clients to reduce latency and
bandwidth consumption. Hence, datacenters need to be split across
geographical locations and connected over the Internet. Some of these
datacenters may be owned by different administrators, as in the case of
private and public cloud interconnectivity. Workloads can move between
these datacenters, just as they move within a datacenter.
This problem is similar to the mobility problem within a datacenter,
except that intra-datacenter mobility can use modifications to L2 or L3
technologies, whereas the inter-datacenter connection must run over an
L3 Internet. This is because pushing datacenter routes into the
Internet is not a scalable solution.
Regardless of these differences, treating inter- and intra-datacenter
forwarding as entirely independent problems leads to new issues at the
edge, which arise from trying to map the forwarding approach used
within the datacenter to the forwarding approach used between
datacenters. In some cases, both L2 and L3 approaches may be needed to
connect two datacenters.
Further, ideally, customer segmentation in the Internet should be done
in the same way as segmentation in the datacenter. This simplifies the
identification of a customer's packets in the Internet just as in the
datacenter. Common QoS and security policies can then be applied in
both domains if there is a common way to identify packets.
5.3. The Multi-Tenancy Problem
Datacenters have thus far been wholly used by a single tenant. To
separate departments within a tenant, VLANs have been used. This seemed
sufficient for the number of segments an enterprise would need. But
this approach cannot be extended to cloud datacenters.
First, the number of tenants in the public cloud will far exceed 4096,
which is the maximum number of VLANs possible. Second, many such
tenants may want to use more than one VLAN to segment their network in
ways similar to what they have been doing in the private domain. Third,
some tenants would like to extend their private VLAN space into the
public cloud using the same VLAN IDs. Since many tenants use the same
VLANs in their private domains, extending these VLANs into the public
domain will cause VLAN overlap. These issues limit the use of VLANs as
an isolation mechanism to segment tenants in the cloud.
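The scale mismatch can be seen by comparing the 12-bit VLAN ID space
against even a modest public-cloud tenant population. The sketch below
(in Python) is purely illustrative; the tenant counts are assumptions,
and the 24-bit segment identifier is a hypothetical example of a larger
ID space, not a reference to any specific encapsulation.
   # Illustrative comparison of VLAN ID space versus tenant demand.

   VLAN_ID_BITS = 12
   usable_vlans = 2**VLAN_ID_BITS - 2       # IDs 0 and 4095 are reserved: 4094

   tenants = 10_000                          # hypothetical public-cloud tenant count
   vlans_per_tenant = 4                      # hypothetical segments per tenant

   demand = tenants * vlans_per_tenant
   print(f"usable VLANs: {usable_vlans}, demand: {demand}")
   print("VLANs sufficient?", demand <= usable_vlans)     # False

   # A larger, hypothetical 24-bit segment identifier removes this limit:
   print(f"24-bit segment IDs: {2**24}")                   # 16,777,216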
The use of L3 VRFs poses similar scaling challenges. VRFs will bloat a
routing table if addresses within a subnet belong to different tenants
(even when there is no mobility), because routes to one tenant's hosts
must be separated from other tenants' routes. The problem worsens
further when mobility is added to multi-tenancy: host-specific routes
must then be present at every point the VRF spans. With VRFs, these
entries will be present even if there is no traffic from a host to
other hosts in the VRF.
Regardless of whether L2 or L3 techniques are used, there is a need for
a mechanism to segment networks by tenant, and for the ability to
identify a tenant on the network from a packet. This is also required
for enforcing security and QoS policies per tenant.
5.4. The Technology-Topology Separation Problem
While large datacenters are becoming common, medium and small
datacenters will continue to exist. These may include a branch office
connected to a central office, or a small enterprise datacenter that
is connected to a huge public cloud. To move workloads across these
networks, the technologies used in the datacenter must be agnostic of
the topology employed in datacenters of various sizes.
A small datacenter may use a mesh topology. A medium datacenter may
use a three-tier topology. And a large datacenter may use a two-tier
multi-path architecture. It has to be recognized that all these
datacenters of various sizes need to interoperate. In particular, it
should be possible to use a common technology to connect large and
small datacenters, two large datacenters, or two small datacenters.
The technology devised to support massive scale and mobility should
therefore be topology agnostic. This has been true of L2 and L3
technologies; for instance, both STP and OSPF are topology agnostic.
This separates the question of technology from the question of
topology, and has been a key reason for their success. A similar
separation of technology and topology is needed in datacenters.
This of course does not imply that questions of architecture are
unimportant. Quite the contrary, since different architectures
facilitate different cross-sectional bandwidths. However, concerns
about cross-sectional bandwidth (which arise from oversubscription)
are orthogonal to the other issues about forwarding. Problems of
oversubscription have to be handled at massive scale, and the
architecture of choice depends on application traffic patterns. Given
a certain network design (based on application traffic patterns) it
should be possible to use the same technology everywhere.
5.5. The Network Convergence Problem
Cloud datacenters will be characterized by elasticity. That means
that virtual resources are constantly created and destroyed. Typical
hardware and software reliabilities of today mean that failures at
scale will be fairly common, and automated recovery mechanisms will
need to be put in place. When combined with workload mobility for the
sake of resource optimization and improving utilization, the churn in
the network forwarding tables can be very significant.
To keep the network stable, this churn needs to be minimized. Frequent
route changes and frequent insertion and removal of devices add
instability to the entire network and can lead to large-scale outages.
At scale, minor issues may propagate across the entire network, making
them harder to debug and fix.
Current L2 and L3 technologies solve this in different ways. An L2
network recovers automatically through the efforts of the hosts, which
ARP for each other. An L3 network remains stable by forwarding based on
subnets, and subnet changes in the datacenter are infrequent. A lot of
effort has been invested in reducing the convergence times of L2 and L3
technologies. These mechanisms need to be extended to whatever new
forwarding approach is designed for the datacenter.
Any approach that propagates every network state change everywhere
(like traditional routing protocols) is unlikely to work. Note,
however, that such changes may often be unavoidable in the case of
multicast and broadcast traffic, where the movement and creation of
resources changes the boundaries of the multicast and broadcast
domains.
Mobility also affects virtualized network devices, such as virtual
switches, firewalls, load-balancers, etc. For instance, when a server
fails and all its VMs are relocated, the associated virtual switch and
firewall must also be relocated. This means that any assumption that
the network is a static firmament to which hosts dynamically attach
becomes false. We have to assume that the network is as dynamic as the
hosts themselves.
5.6. The East-West Traffic Problem
Datacenter traffic patterns are changing. Instead of predominantly
north-south traffic, the traffic patterns are now largely
server-to-server. This change is driven by the fact that computation
now deals with massive amounts of data; to compute a result in a short
period of time, the data needs to be distributed across many compute
nodes. When data in one of these locations changes, those changes have
to be propagated to the other nodes. Likewise, a single request has to
be forked into many requests and the results have to be collated, in a
design often called Map-Reduce. These application design patterns
influence the traffic patterns in datacenters.
The predominant traffic pattern in the datacenter is no longer 1-to-1;
it has been replaced by 1-to-N and N-to-1. For instance, when a data
node is replicated to many nodes, or when a problem is "mapped" to many
servers, the traffic pattern is 1-to-N. When the problem is "reduced",
the traffic pattern is N-to-1. To deal with 1-to-N and N-to-1 traffic
patterns, there must be sufficient bandwidth at ingress and egress, as
the example below illustrates. This is possible through the use of
multiple paths.
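The bandwidth implication of 1-to-N and N-to-1 patterns can be shown
with a small worked example. The numbers in the sketch below (in
Python) are assumptions chosen only to illustrate the fan-out and
fan-in effect.
   # Illustrative fan-out/fan-in bandwidth for a Map-Reduce style job.
   # All figures are hypothetical.

   workers = 50                # "map" nodes a request is forked to
   chunk_gbit = 8              # data shipped to each worker, in gigabits
   result_gbit = 0.5           # partial result returned by each worker
   deadline_s = 10             # time budget for distribution / collection

   # 1-to-N: the source must push chunks to all workers within the deadline.
   egress_gbps_at_source = workers * chunk_gbit / deadline_s    # 40 Gbps
   # N-to-1: the collector must absorb all partial results within the deadline.
   ingress_gbps_at_sink = workers * result_gbit / deadline_s    # 2.5 Gbps

   print(egress_gbps_at_source, ingress_gbps_at_sink)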
Distributed application models are predominantly used today by large
web companies. Increasingly, however, these models will be used for any
application that deals with large amounts of data or very complex
processing needs. This includes a variety of HPC applications,
telecommunications, Big Data analytics, etc. Today, the cloud may be
used mainly for hosting web servers or simple one- or two-VM
applications, and traditional enterprise applications have also run on
a single machine. This model will increasingly give way to new
application architectures that distribute processing across compute
nodes. The data in these nodes may be distributed as well.
To deal with distributed applications, there need to be many physical
paths to a destination. If these physical paths are available, then
forwarding technologies must enable the use of all of them. That
implies that the STP traditionally used in L2 networks cannot be used
in datacenters, because STP converts a physical mesh into a logical
tree, turning off most of the ports to avoid packet flooding. L3
unicast routing protocols do not suffer from this problem. However, the
issue returns for L3/L2 multicast and L2 broadcast, which use trees.
The use of multiple paths should be allowed for both L2 and L3 traffic;
in particular, packets that cross a VLAN boundary should also be able
to use multiple paths. This becomes difficult if the default gateway
for a VLAN is pinned at some point in the network, because then all
packets that cross the VLAN boundary must go through that gateway.
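One common way to use many physical paths simultaneously, while keeping
the packets of a single flow in order, is hash-based selection over
equal-cost paths. The sketch below (in Python) is a generic
illustration of the idea, not a description of any particular protocol
or product.
   # Minimal illustration of hash-based selection over equal-cost paths.
   # Packets of the same flow always hash to the same path, preserving
   # ordering, while different flows spread across all available paths.

   import hashlib

   def pick_path(src_ip, dst_ip, src_port, dst_port, proto, num_paths):
       """Map a 5-tuple to one of num_paths equal-cost next hops."""
       key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
       digest = hashlib.sha256(key).digest()
       return int.from_bytes(digest[:4], "big") % num_paths

   # Two flows between the same pair of hosts may use different paths:
   print(pick_path("10.0.0.1", "10.0.1.1", 45000, 80, "tcp", num_paths=4))
   print(pick_path("10.0.0.1", "10.0.1.1", 45001, 80, "tcp", num_paths=4))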
5.7. The Network SLA Problem
Multi-tenant networks need to protect all tenants from any tenant
overusing network resources. For example, a high traffic load from one
tenant should not starve another tenant of bandwidth. Note that in a
multi-tenant environment, no tenant has full control or visibility of
what other tenants are doing, or of how problems can be fixed.
Real-time debugging of such problems is very hard for a provider.
There are two alternatives to address this problem.
First, a provider can overprovision the network at the core and
aggregation layers to ensure that all tenants get good bandwidth. This
scheme is not fool-proof. It fails when many flows converge onto the
same aggregation or core link while other links remain idle. The total
cross-sectional bandwidth may be the same, but flow convergence will
result in buffer overflows and packet drops. When this happens, all the
affected flows will crawl. Given the statistical randomness of flows,
the situation will most likely ease soon, but it may leave behind a
data backlog that causes other issues. What makes this incredibly hard
to debug is that it depends on a network-wide flow pattern that would
be nearly impossible to reproduce in a multi-tenant environment.
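The flow-convergence failure mode can be seen with simple arithmetic:
even when aggregate capacity comfortably exceeds aggregate demand, a
handful of flows hashed onto the same link can exceed that one link's
capacity. The figures in the sketch below (in Python) are assumptions
used only for illustration.
   # Illustration: aggregate capacity is ample, but flow convergence
   # still overflows one link. All numbers are hypothetical.

   core_links = 8
   link_capacity_gbps = 40
   total_capacity = core_links * link_capacity_gbps     # 320 Gbps

   flows = [15, 12, 10, 8, 7, 6]                         # Gbps per flow, 58 Gbps total
   # Suppose hashing happens to place four of these flows on one core link:
   converged_load = sum(flows[:4])                       # 45 Gbps on a 40 Gbps link

   print(total_capacity, sum(flows), converged_load)
   print("buffer overflow / drops likely:", converged_load > link_capacity_gbps)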
The above scenario can be made infrequent by increasing the extent of
over-provisioning, but it cannot be eliminated. Over-provisioning
reduces the probability of it happening, but provides no way to control
its effects when it does happen. The adverse consequences depend
entirely on the application, which the provider cannot control, and can
range from short-lived slow application performance to long-lived data
out-of-sync issues. Over-provisioning is also expensive.
Second, mechanisms to measure and guarantee network SLAs will have to
employ active flow management to guarantee bandwidth to all tenants
and keep the network provisioned only to the level required. Flow
management can be integrated as part of existing forwarding
techniques or may need new techniques. Network SLAs can play an
important role in determining if sufficient bandwidth is available
before a VM is moved to a new location.
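One well-known building block for the active flow management described
above is a per-tenant rate limiter such as a token bucket. The sketch
below (in Python) is a minimal, generic illustration of the idea, not a
proposal for a specific mechanism; the rates and sizes are assumptions.
   # Minimal token-bucket rate limiter, one instance per tenant, as one
   # possible building block for enforcing a bandwidth SLA.

   class TokenBucket:
       def __init__(self, rate_bps, burst_bits):
           self.rate = rate_bps        # guaranteed/contracted rate in bits/s
           self.burst = burst_bits     # maximum burst size in bits
           self.tokens = burst_bits
           self.last = 0.0             # timestamp of last update

       def allow(self, packet_bits, now):
           """Return True if the packet conforms to the tenant's SLA."""
           self.tokens = min(self.burst,
                             self.tokens + (now - self.last) * self.rate)
           self.last = now
           if packet_bits <= self.tokens:
               self.tokens -= packet_bits
               return True
           return False                # non-conformant: drop, mark or queue

   # Example: a tenant contracted for 1 Gbps with a 10 Mbit burst allowance.
   bucket = TokenBucket(rate_bps=1_000_000_000, burst_bits=10_000_000)
   print(bucket.allow(12_000 * 8, now=0.001))   # a 12 KB packet at t = 1 ms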
5.8. The Broadcast and Multicast Problem
Traffic forwarding needs to take into account broadcast control and
multicast optimization. ARP broadcast control is already recognized as
an important problem, because hosts tend to refresh their ARP tables
every 15-30 seconds. Depending on the number of VMs in a datacenter,
the number of other VMs each VM interacts with, the number of VLANs a
physical host is part of, whether these VLANs span subnets, and where
the L2-L3 boundary is placed in the network, ARP broadcasts can range
from manageable to unmanageable. Note that L2-L3 boundaries conflict
with VM mobility across L3 domains.
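An order-of-magnitude estimate shows how quickly ARP refresh traffic
adds up. The sketch below (in Python) uses the 15-30 second refresh
interval mentioned above; the VM count and peer count are assumptions
for illustration only.
   # Rough estimate of ARP broadcast load from periodic cache refresh.
   # VM count and peers-per-VM are hypothetical; the refresh interval
   # follows the 15-30 second range mentioned above.

   vms = 100_000                # VMs in the datacenter (assumed)
   peers_per_vm = 10            # other VMs each VM talks to (assumed)
   refresh_interval_s = 20      # ARP cache refresh period (within 15-30 s)

   arp_requests_per_second = vms * peers_per_vm / refresh_interval_s
   print(arp_requests_per_second)      # 50,000 broadcasts per second

   # Each broadcast is flooded to every port in the VLAN, so the packet
   # replication seen by the network is far higher than this source rate.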
Broadcast control is also very important at the datacenter edges if a
VLAN spans different sites. Periodic broadcasts, if flooded over the
datacenter interconnect, can be a significant overhead.
DHCP broadcasts in a multi-tenant environment also pose a challenge,
since the IP assignment for each tenant may be different and may
therefore involve different DHCP servers. Mapping a tenant to a DHCP
server requires additional intelligence and configuration.
Alternatively, tenant separation needs to be built into DHCP.
Flood control also needs to be devised for both multicast and broadcast
traffic across datacenters. Note that the inter-datacenter connectivity
is generally an overlay. This implies that a VLAN spanning tree (for
broadcast) and a multicast distribution tree (for multicast) must be
constructed within the overlay. The overlay itself can be a mesh
(ECMP), but multicast and broadcast distribution must follow trees. The
roots of these trees need to be distributed across the various
datacenter edges to load-balance the multicast and broadcast traffic.
The multicast and broadcast trees should also be constructed in a way
that minimizes latency and conserves bandwidth. That means that the
distribution trees may need to be reconstructed with VM mobility. The
reconstruction of trees must be such that the locations of maximum
workload density are placed closest on the distribution tree.
5.9. The Cloud Control Problem
Moving into a multi-tenant environment depends on opening up sufficient
controls for a user to "look inside" the cloud and control it at a
sufficiently low level of granularity. A cloud that appears fuzzy (in
terms of control) will not be preferred by users who are accustomed to
designing and controlling their own networks. This includes most
high-end users, whose needs require customization of the network. A
"one size fits all" approach cannot be used.
The "control" problem also translates into a user being able to
access the same service from more than one provider in the same way,
similar to how most enterprises multi-home their networks into more
than one network provider. The cloud infrastructure is no less
important than the internet connection that is used to reach to it,
and concerns of high-availability and lock-in prevention are as
important for cloud as they were for networks. Further, to
interoperate the public and private domains, the control mechanism
must be interoperable across private and public domains. This is
practically achieved only if the private and public cloud control
mechanisms are made to converge into a single mechanism.
From a provider perspective, there are many infrastructure "domains"
that need to be combined to deliver complete services. These include
compute, network, storage, security, firewalls, load balancers, WAN
optimizers, etc. Standards for orchestrating these domains are needed
to manage them in a common way. If the orchestration interfaces are
standardized, then they can be delivered with the infrastructure
itself, similar to how management interfaces have been provided so far.
A new set of standardized management interfaces is desired that allows
users to create, modify, move and delete virtual devices.
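As a purely hypothetical illustration of the kind of interface
described above, the sketch below (in Python) shows create, modify,
move and delete operations on a virtual device. The class and method
names are invented for this example and do not correspond to any
existing API or standard.
   # Hypothetical orchestration interface for virtual devices (VMs,
   # virtual switches, firewalls, load balancers). Names are illustrative.

   from dataclasses import dataclass, field

   @dataclass
   class VirtualDevice:
       device_id: str
       kind: str                  # e.g. "vm", "vswitch", "firewall"
       location: str              # e.g. a rack or datacenter identifier
       properties: dict = field(default_factory=dict)  # VLANs, bandwidth, ACLs

   class Orchestrator:
       def __init__(self):
           self.inventory = {}

       def create(self, device: VirtualDevice):
           self.inventory[device.device_id] = device

       def modify(self, device_id: str, **changes):
           self.inventory[device_id].properties.update(changes)

       def move(self, device_id: str, new_location: str):
           # A real orchestrator would first check bandwidth, latency and
           # policy at the new location, as discussed below.
           self.inventory[device_id].location = new_location

       def delete(self, device_id: str):
           del self.inventory[device_id]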
Mobility of devices in particular poses some challenges. For instance,
when a VM moves, it needs to drag its network configuration along with
it. This may include the VLANs that the host was part of, and other
network-interface-level policies such as bandwidth. Similarly, if there
were firewall rules protecting a VM, these need to be moved as well,
into the path of all packets at the new location. Prior to the move, it
must be determined whether there is sufficient bandwidth available at
the new location. The orchestrator has to be aware of the current
packet flows and the latencies at the new location. For user-facing
VMs, as a general rule, the VM must be equidistant from storage and
user for optimal performance.
These requirements imply that the orchestrator has to be, at the very
least, aware of the network topology, the various paths to a
destination, and the current load on the network. A tight linkage
between the network and the orchestrator is needed to achieve this.
This is a major problem today, because we currently treat network
topology and the orchestration of resources as unrelated domains.
5.10. The Forwarding Plane Scale Problem
The need to support VM mobility has led to many encapsulation schemes.
These schemes have variable scaling properties. In the best case, the
network scales like L3 routing: if there is no mobility, only one
hardware entry is needed per subnet (which can be aggregated). In the
worst case, the network scales like L2 forwarding: when every host is
mobile, one hardware entry is needed per host (which cannot be
aggregated). As the network grows very large, the gap between the best
and worst cases can be 5-6 orders of magnitude. When a device runs out
of forwarding-entry space, packets are dropped in the L3 routing case
and flooded in the L2 forwarding case. Both can have serious
side-effects depending on the nature of the applications involved.
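The gap between the best and worst cases can be illustrated with simple
arithmetic. The host and subnet counts in the sketch below (in Python)
are assumptions chosen only to show the orders of magnitude involved.
   # Forwarding-table size: subnet routes versus per-host entries.
   # Host and subnet counts are hypothetical.

   import math

   hosts = 10_000_000           # virtual hosts in the datacenter (assumed)
   hosts_per_subnet = 250       # hosts per subnet (assumed)

   best_case_entries = math.ceil(hosts / hosts_per_subnet)  # one route per subnet
   worst_case_entries = hosts                                # one entry per mobile host

   print(best_case_entries)                                  # 40,000
   print(worst_case_entries)                                 # 10,000,000
   print(f"gap: ~{worst_case_entries / best_case_entries:.0f}x")

   # With aggregation of subnets into larger prefixes the best case
   # shrinks further, so at very large scale the gap can reach several
   # orders of magnitude, as noted above.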
Note that host routes are in addition to what already exists on network
devices and will continue to exist. This includes (a) network routes,
(b) local host-port bindings, and (c) access lists, port security,
port-based authentication, etc. The network of today is sized to handle
these types of entries. Adding host routes would imply a dramatic
increase beyond current network capabilities.
But, how widespread is VM mobility? If it is widespread, then
encapsulation schemes will not scale. If it is not widespread, then
encapsulation schemes will work, although sub-optimally.
Cases Where VMs Would Not Be Moved:
a. When a VM has a significantly large local storage
b. When a VM uses a consistent amount of CPU at all times
c. When a VM holds proprietary data that cannot be moved
d. When moving a VM with network storage would degrade performance
e. When the users of the VM are always in one location
Cases Where VMs Would Be Moved:
a. During maintenance cycles for hardware or software upgrades
b. During disaster recovery of sites, or hardware/software failures
c. When a VM has different CPU and memory requirements at different times
d. When VMs do not need to run continuously and are job-based
e. When the users of the VM are in different locations at different times
The above is not an exhaustive list, but it shows that while VM
mobility is not applicable everywhere, there are many cases where it
is. Also, when private and public domains are connected, the IP and MAC
addresses across these domains could overlap. The mobility approaches
must therefore be segmented between tenants. This further grows table
sizes, because the route to an IP address depends on which tenant is
asking for it. Duplicated IP addresses also need duplicate routes.
The encapsulation problem is further worsened by the fact that
encapsulation must be performed at the access layer. In a datacenter,
the ratio of access devices to core devices is generally about 1000 to
1. That means the cost of encapsulation must be summed over roughly
1000 devices to arrive at the total cost. This is different from the
cost being limited to a single device or a few devices in the core.
6. Security Considerations
NA
7. IANA Considerations
NA
8. Conclusions
This document described a set of high-level cloud-related problems.
Potential solutions to these problems need to be evaluated in terms of
their impact on the other problems and their solutions.
9. References
9.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
10. Acknowledgments
This document was prepared using 2-Word-v2.0.template.dot.
Authors' Addresses
Ashish Dalela
Cisco Systems
Cessna Business Park
Bangalore
India 560037
Email: adalela@cisco.com