Network Working Group A. Dalela
Internet Draft Cisco Systems
Intended status: Standards Track December 30, 2011
Expires: June 2012
Datacenter Network and Operations Requirements
draft-dalela-dc-requirements-00.txt
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
This Internet-Draft will expire on June 30, 2012.
Copyright Notice
Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document.
Abstract
The problems of modern datacenters are rooted in virtualized host
scaling. Virtualization implies VM mobility, which may be intra-
datacenter or inter-datacenter. Mobility may cross administrative
boundaries, such as across private-public domains. A public
datacenter may be multi-tenant, extending several private networks
into the public domain. Running these massively scaled, virtualized,
multi-tenant datacenters poses some unique networking and operational
challenges. This document describes these challenges.
Table of Contents
1. Introduction...................................................3
2. Conventions used in this document..............................5
3. Terms and Acronyms.............................................5
4. Cloud Characteristics..........................................5
4.1. Network Fragmentation.....................................5
4.2. Hardware-Software Reliability.............................6
4.3. Data is Relatively Immobile...............................6
5. Problem Statement..............................................7
5.1. The Basic Forwarding Problem..............................7
5.2. The Datacenter Inter-Connectivity Problem.................7
5.3. The Multi-Tenancy Problem.................................8
5.4. The Technology-Topology Separation Problem................8
5.5. The Network Convergence Problem...........................9
5.6. The East-West Traffic Problem............................10
5.7. The Network SLA Problem..................................11
5.8. The Broadcast and Multicast Problem......................12
5.9. The Cloud Control Problem................................13
5.10. The Forwarding Plane Scale Problem......................14
6. Security Considerations.......................................15
7. IANA Considerations...........................................15
8. Conclusions...................................................15
9. References....................................................15
9.1. Normative References.....................................15
10. Acknowledgments..............................................15
1. Introduction
Cloud computing promises to lead a revolution in computing provided
we can solve some of its most pressing problems. At a high-level, the
challenge is to connect and operate massively scaled virtualized
datacenters for multiple tenants, while guaranteeing the same level
of reliability, performance, security and control that is available
in the private domain. Virtualization enables mobility - including
mobility of network devices (switches, firewalls, load-balancers,
etc.) within and across datacenters. The Mobile IP approach does not
handle datacenter mobility optimally. New approaches are therefore
needed to handle these problems in a scalable way. The challenge is
also to solve cloud problems with as little disruption to existing
infrastructure, protocols and architectures as possible.
Since changes to existing technology will be needed, it is important
to state the reasons for this change, which then forms the basis for
the development of new technology for datacenters.
The discussion of datacenter problems is today complicated by various
architectural approaches that are being proposed in the industry to
address them. For example, there are questions about whether datacenter
networks should be hierarchical or flat; whether the network should have
fat edges and a lean core, or lean edges and a fat core; whether packet
forwarding should use flat addressing or overlays; whether the problems
should be solved exclusively in the network, exclusively in the host, or
through a combination of both; and whether the datacenter needs a
centralized or a distributed control plane.
Given these many alternative approaches, it is not clear that the
problem itself is well understood; if it were, we would not have so
many competing solutions. Articulating the problem has therefore become
a challenge in its own right. Proliferation of solutions without an
adequate discussion of the entire problem set creates a lot of
confusion.
The easiest way to move forward is to identify a broader set of
problems rather than looking at "how do we move a VM in the network"
or "what is the way to connect datacenters" in isolation.
The need for identifying a broader set of problems is reinforced by a
recognition (that may be yet to dawn in some cases) that some
solutions only shift the burden of a problem to another point that
will become obvious after the solution has been deployed.
For instance, if we use flat addressing and propagate all datacenter
addresses into the Internet, the address fragmentation caused by VM
mobility would require an impossible scale in the Internet core. If,
however, we implement overlays to connect multiple datacenters,
additional signaling and encapsulation overheads are incurred at the
datacenter edge. These overheads will grow rapidly as the size of
datacenters and the number of interconnected datacenters grow. Overlay
solutions thus shift the problem from the Internet core to the
datacenter edge, where it will reappear later. This might be
acceptable, but it needs to be understood. Also, an approach that
solves scaling both in the Internet core and at the datacenter edge
should be preferred over one that only shifts the scaling problem.
Without a broader consideration of all the problems involved, it is
likely that we will adopt solutions that lead to other problems. These
second-order problems will be harder to solve, and will lead to much
more complexity than if careful consideration were given to the entire
problem set from the start.
The key point of this document is that the datacenter problem is
multi-faceted. There are many ways to solve each of these problems
individually. But, taken together, the solutions may be less than
optimal. At scale, less than optimal solutions quickly break down.
Without scale, any solution will work perfectly well. A key criterion
for selecting amongst many possible solution alternatives should be
whether the entire solution set scales very well.
Scalability has generally not been a standards consideration, and the
problems of scaling have been left to implementations. But in the case
of cloud datacenters, scaling is the basic requirement, and all
problems of cloud datacenters arise from scaling. Solution development
therefore cannot ignore the scaling and optimality problem.
As an initial step, therefore, we need to state the multiple facets of
the datacenter problem. These problem statements can then be used to
identify an approach that solves all the problems collectively, without
worsening the solution to any one of them. The facets of the overall
approach can then be divided among individual working groups to build a
consistent solution. This kind of approach is needed for the cloud
datacenter domain precisely because the problem is multi-faceted.
This document identifies a set of networking and operational problems
in the datacenter that need to be solved collectively.
2. Conventions used in this document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC-2119 [RFC2119].
In this document, these words will appear with that interpretation
only when in ALL CAPS. Lower case uses of these words are not to be
interpreted as carrying RFC-2119 significance.
3. Terms and Acronyms
NA
4. Cloud Characteristics
Most cloud characteristics are well-known - on-demand, elastic,
large-scale, etc. We will not repeat them here. What we will briefly
discuss is the impact of an on-demand, elastic, large-scale design on
traditional assumptions about the network.
4.1. Network Fragmentation
By definition, cloud involves rapid provisioning of resources. When a
user requests VMs incrementally, these VMs will generally be allocated
in whichever locations currently have capacity. When these VMs
communicate with each other, they consume a large amount of
cross-sectional bandwidth. Over time, these machines need to be
consolidated to conserve cross-sectional bandwidth.
Fragmentation is also caused by maintenance windows, because during
that time no additional configuration is allowed while hardware or
software is being upgraded. Providers will not turn away customers
because they are upgrading hardware or software! That means new
requests must land in locations that are not being upgraded.
Subsequently, these fragmented islands need to be consolidated. At
massive scale, something will almost always be undergoing an upgrade.
A provider has no good way of predicting or planning for the behavior
of the various users of its cloud services. A provider can only react
to these behaviors, and react it must, in order to optimize resources.
Network fragmentation therefore has to be fixed reactively. Over time,
predictive trends could be devised, but these are not likely to be very
consistent. We will probably discover that, like network traffic
trends, cloud usage trends are statistical and self-similar over time.
4.2. Hardware-Software Reliability
At 5-nines reliability, one out of 100,000 hardware devices will fail
every 5.25 minutes. At 3-nines reliability, one out of 1000 software
elements will fail every 5.25 minutes. At the scale being targeted
for cloud, there are easily a million hardware devices and several
hundred million software instances. This implies a massive rate of
on-going failure that needs to be recovered from. For instance, at 3-
nines reliability, and 100 million VM/applications, a VM/application
will fail every 3 milliseconds on average. At 5-nines reliability, a
VM/application will fail every 300 milliseconds on average.
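The figures above can be reproduced with a simple back-of-the-envelope
calculation. The sketch below (in Python) is illustrative only; it
assumes, as the figures above implicitly do, that each failure takes
roughly 5.25 minutes to detect and recover, and that failures are
independent and spread evenly over time.
   # Back-of-the-envelope failure-rate estimate for a large cloud.
   # Assumption (not stated explicitly above): each outage lasts about
   # 5.25 minutes, roughly the annual downtime budget of a five-nines
   # element.

   MINUTES_PER_YEAR = 365.25 * 24 * 60     # ~525,960 minutes
   OUTAGE_MINUTES = 5.25                   # assumed mean outage/recovery time

   def failures_per_element_per_year(availability):
       """Failures per element per year, given availability and outage length."""
       downtime = (1.0 - availability) * MINUTES_PER_YEAR
       return downtime / OUTAGE_MINUTES

   def fleet_failure_interval_seconds(availability, fleet_size):
       """Average time between failures anywhere in a fleet of elements."""
       per_year = failures_per_element_per_year(availability) * fleet_size
       return MINUTES_PER_YEAR * 60 / per_year

   # 100 million VMs/applications at three-nines and five-nines:
   print(fleet_failure_interval_seconds(0.999,   100_000_000))  # ~0.003 s, every ~3 ms
   print(fleet_failure_interval_seconds(0.99999, 100_000_000))  # ~0.3 s, every ~300 ms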
Large web companies with experience in running massive datacenters
echo this view. They design large clusters that run common services
and are dispensable as a whole, because there are standby clusters
that can take over their work. That isn't true of consumer clouds in
general, and can be very expensive. The best recourse is to detect
failures and recreate services as quickly as possible.
Recovery from on-going failures forces churn in the network. The
provider needs to quickly find new locations where services can be
recreated. Detecting these failures and recreating the affected
services requires a significant monitoring effort across these
hardware and software entities.
The situation is further complicated by the fact that new devices are
constantly being provisioned. Every network operator knows from
experience that their systems generally work correctly until
configuration is changed. Configuration changes are confined to
maintenance windows when existing configuration is backed up and
administrators prepare users for possible failures. There are no such
time windows in the cloud; changes are ever-present and on-going. This
implies a higher rate of failure and churn.
4.3. Data is Relatively Immobile
To optimize disk usage, it is preferable to use network storage rather
than local storage. This also simplifies storage backup. But it also
implies that data is relatively immobile. When a VM moves, it does not
carry its data along; instead, big data pipes are needed to haul the
bits from one location to another. A VM can easily be recreated, but
its data cannot, so this hauling is unavoidable.
This implies that the network has to treat traffic caused by data
immobility as a major source of load, besides the server-to-server and
user-to-server traffic that is already recognized. When data sits
outside the VM, security enforcement also needs to be more effective.
In effect, we need to consider both static and in-flight data security.
In-flight data security comes from encryption and static security
from authentication. Both of these traditional approaches are
expensive. Network segmentation needs to be used to address them.
5. Problem Statement
This section captures the various types of datacenter problems, one per
subsection, at a high level. These problems are described in relation
to widely deployed and standardized L2 and L3 network protocols and
practices. A complete analysis of proposed or emerging technologies,
including on-going work in other IETF working groups, is out of scope
for this document. However, the problem statement given here can be
used to subsequently discuss that work.
5.1. The Basic Forwarding Problem
Traditionally, datacenter networks have used L2 or L3 technologies. The
need to massively scale virtualized hosts breaks both of these
approaches. L2 networks cannot be made to scale because of the high
volume of broadcast traffic. L3 networks cannot support host mobility,
since routing is based on subnets and an IP address cannot be moved out
of its subnet. Moving an IP address in a natively L3 network requires
installing host routes at one or more points in the path, an approach
that does not scale.
The failure of traditional L2 and L3 networking approaches means that a
new forwarding approach needs to be devised that scales massively and
supports host mobility to different points in the datacenter.
5.2. The Datacenter Inter-Connectivity Problem
There are limits to how much a single datacenter can be scaled.
Workloads need to be placed closer to clients to reduce latency and
bandwidth consumption. Hence, datacenters need to be split across
geographical locations and connected over the Internet. Some of these
datacenters may be owned by different administrators, as in the case of
private and public cloud interconnectivity. Workloads can move between
these datacenters, just as they move within a datacenter.
This problem is similar to the mobility problem within a datacenter,
except that intra-datacenter mobility can use modifications to L2 or L3
technologies, whereas the inter-datacenter connection must run over an
L3 Internet. This is because pushing datacenter routes into the
Internet is not a scalable solution.
Regardless of these differences, treating inter- and intra-datacenter
forwarding as entirely independent problems leads to new issues at the
edge, which arise from trying to map the forwarding approach used
within the datacenter to the forwarding approach used between
datacenters. In some cases, both L2 and L3 approaches may be needed to
connect two datacenters.
Further, ideally, customer segmentation in the Internet should be done
in the same way as segmentation in the datacenter. This simplifies the
identification of a customer's packets in the Internet just as in the
datacenter. Common QoS and security policies can then be applied in
both domains if there is a common way to identify packets.
5.3. The Multi-Tenancy Problem
Datacenters have thus far been wholly used by a single tenant. To
separate departments within a tenant, VLANs have been used. This seemed
sufficient for the number of segments an enterprise would need. But
this approach cannot be extended to cloud datacenters.
First, the number of tenants in the public cloud will far exceed 4096,
which is the maximum number of VLANs possible. Second, many such
tenants may want to use more than one VLAN to segment their network in
ways similar to what they have been doing in the private domain. Third,
some tenants would like to extend their private VLAN space into the
public cloud using the same VLAN IDs. Since many tenants use the same
VLANs in their private domains, extending these VLANs into the public
domain will cause VLAN overlap. These issues limit the use of VLANs as
an isolation mechanism to segment tenants in the cloud.
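The scale mismatch can be seen by comparing the 12-bit VLAN ID space
against even a modest public-cloud tenant population. The sketch below
(in Python) is purely illustrative; the tenant counts are assumptions,
and the 24-bit segment identifier is a hypothetical example of a larger
ID space, not a reference to any specific encapsulation.
   # Illustrative comparison of VLAN ID space versus tenant demand.

   VLAN_ID_BITS = 12
   usable_vlans = 2**VLAN_ID_BITS - 2       # IDs 0 and 4095 are reserved: 4094

   tenants = 10_000                          # hypothetical public-cloud tenant count
   vlans_per_tenant = 4                      # hypothetical segments per tenant

   demand = tenants * vlans_per_tenant
   print(f"usable VLANs: {usable_vlans}, demand: {demand}")
   print("VLANs sufficient?", demand <= usable_vlans)     # False

   # A larger, hypothetical 24-bit segment identifier removes this limit:
   print(f"24-bit segment IDs: {2**24}")                   # 16,777,216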
The use of L3 VRFs poses similar scaling challenges. VRFs will bloat a
routing table if addresses within a subnet belong to different tenants
(even when there is no mobility), because routes to one tenant's hosts
must be separated from other tenants' routes. The problem worsens
further when mobility is added to multi-tenancy: host-specific routes
must then be present at every point the VRF spans. With VRFs, these
entries will be present even if there is no traffic from a host to
other hosts in the VRF.
Regardless of whether L2 or L3 techniques are used, there is a need for
a mechanism to segment networks by tenant, and for the ability to
identify a tenant on the network from a packet. This is also required
for enforcing security and QoS policies per tenant.
5.4. The Technology-Topology Separation Problem
While large datacenters are becoming common, medium and small
datacenters will continue to exist. These may include a branch office
connected to a central office, or a small enterprise datacenter that
is connected to a huge public cloud. To move workloads across these
networks, the technologies used in the datacenter must be agnostic of
the topology employed in datacenters of various sizes.
A small datacenter may use a mesh topology. A medium datacenter may
use a three-tier topology. And a large datacenter may use a two-tier
multi-path architecture. It has to be recognized that all these
datacenters of various sizes need to interoperate. In particular, it
should be possible to use a common technology to connect large and
small datacenters, two large datacenters, or two small datacenters.
The technology devised to support massive scale and mobility should
therefore be topology agnostic. This has been true of L2 and L3
technologies; for instance, both STP and OSPF are topology agnostic.
This separates the question of technology from the question of
topology, and has been a key reason for their success. A similar
separation of technology and topology is needed in datacenters.
This of course does not imply that questions of architecture are
unimportant. Quite the contrary, since different architectures
facilitate different cross-sectional bandwidths. However, concerns
about cross-sectional bandwidth (which arise from oversubscription)
are orthogonal to the other issues about forwarding. Problems of
oversubscription have to be handled at massive scale, and the
architecture of choice depends on application traffic patterns. Given
a certain network design (based on application traffic patterns) it
should be possible to use the same technology everywhere.
5.5. The Network Convergence Problem
Cloud datacenters will be characterized by elasticity. That means
that virtual resources are constantly created and destroyed. Typical
hardware and software reliabilities of today mean that failures at
scale will be fairly common, and automated recovery mechanisms will
need to be put in place. When combined with workload mobility for the
sake of resource optimization and improving utilization, the churn in
the network forwarding tables can be very significant.
To keep the network stable, this churn needs to be minimized. Frequent
route changes and frequent insertion and removal of devices add
instability to the entire network and can lead to large-scale outages.
At scale, minor issues may propagate across the entire network, making
them harder to debug and fix.
Current L2 and L3 technologies solve this in different ways. An L2
network recovers automatically through the efforts of the hosts, which
ARP for each other. An L3 network remains stable by forwarding based on
subnets, and subnet changes in the datacenter are infrequent. A lot of
effort has been invested in reducing the convergence times of L2 and L3
technologies. These mechanisms need to be extended to whatever new
forwarding approach is designed for the datacenter.
Any approach that propagates every network state change everywhere
(like traditional routing protocols) is unlikely to work. Note,
however, that such changes may often be unavoidable in the case of
multicast and broadcast traffic, where the movement and creation of
resources changes the boundaries of the multicast and broadcast
domains.
Mobility also affects virtualized network devices, such as virtual
switches, firewalls, load-balancers, etc. For instance, when a server
fails and all its VMs are relocated, the associated virtual switch and
firewall must also be relocated. This means that any assumption that
the network is a static firmament to which hosts dynamically attach
becomes false. We have to assume that the network is as dynamic as the
hosts themselves.
5.6. The East-West Traffic Problem
Datacenter traffic patterns are changing. Instead of predominantly
north-south traffic, the traffic patterns are now largely
server-to-server. This change is driven by the fact that computation
now deals with massive amounts of data; to compute a result in a short
period of time, the data needs to be distributed across many compute
nodes. When data in one of these locations changes, those changes have
to be propagated to the other nodes. Likewise, a single request has to
be forked into many requests and the results have to be collated, in a
design often called Map-Reduce. These application design patterns
influence the traffic patterns in datacenters.
The predominant traffic pattern in the datacenter is no longer 1-to-1;
it has been replaced by 1-to-N and N-to-1. For instance, when a data
node is replicated to many nodes, or when a problem is "mapped" to many
servers, the traffic pattern is 1-to-N. When the problem is "reduced",
the traffic pattern is N-to-1. To deal with 1-to-N and N-to-1 traffic
patterns, there must be sufficient bandwidth at ingress and egress, as
the example below illustrates. This is possible through the use of
multiple paths.
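The bandwidth implication of 1-to-N and N-to-1 patterns can be shown
with a small worked example. The numbers in the sketch below (in
Python) are assumptions chosen only to illustrate the fan-out and
fan-in effect.
   # Illustrative fan-out/fan-in bandwidth for a Map-Reduce style job.
   # All figures are hypothetical.

   workers = 50                # "map" nodes a request is forked to
   chunk_gbit = 8              # data shipped to each worker, in gigabits
   result_gbit = 0.5           # partial result returned by each worker
   deadline_s = 10             # time budget for distribution / collection

   # 1-to-N: the source must push chunks to all workers within the deadline.
   egress_gbps_at_source = workers * chunk_gbit / deadline_s    # 40 Gbps
   # N-to-1: the collector must absorb all partial results within the deadline.
   ingress_gbps_at_sink = workers * result_gbit / deadline_s    # 2.5 Gbps

   print(egress_gbps_at_source, ingress_gbps_at_sink)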
Distributed application models are predominantly used today by large
web companies. Increasingly, however, these models will be used for any
application that deals with large amounts of data or very complex
processing needs. This includes a variety of HPC applications,
telecommunications, Big Data analytics, etc. Today, the cloud may be
used mainly for hosting web servers or simple one- or two-VM
applications, and traditional enterprise applications have also run on
a single machine. This model will increasingly give way to new
application architectures that distribute processing across compute
nodes. The data in these nodes may be distributed as well.
To deal with distributed applications, there need to be many physical
paths to a destination. If these physical paths are available, then
forwarding technologies must enable the use of all of them. That
implies that the STP traditionally used in L2 networks cannot be used
in datacenters, because STP converts a physical mesh into a logical
tree, turning off most of the ports to avoid packet flooding. L3
unicast routing protocols do not suffer from this problem. However, the
issue returns for L3/L2 multicast and L2 broadcast, which use trees.
The use of multiple paths should be allowed for both L2 and L3 traffic;
in particular, packets that cross a VLAN boundary should also be able
to use multiple paths. This becomes difficult if the default gateway
for a VLAN is pinned at some point in the network, because then all
packets that cross the VLAN boundary must go through that gateway.
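One common way to use many physical paths simultaneously, while keeping
the packets of a single flow in order, is hash-based selection over
equal-cost paths. The sketch below (in Python) is a generic
illustration of the idea, not a description of any particular protocol
or product.
   # Minimal illustration of hash-based selection over equal-cost paths.
   # Packets of the same flow always hash to the same path, preserving
   # ordering, while different flows spread across all available paths.

   import hashlib

   def pick_path(src_ip, dst_ip, src_port, dst_port, proto, num_paths):
       """Map a 5-tuple to one of num_paths equal-cost next hops."""
       key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
       digest = hashlib.sha256(key).digest()
       return int.from_bytes(digest[:4], "big") % num_paths

   # Two flows between the same pair of hosts may use different paths:
   print(pick_path("10.0.0.1", "10.0.1.1", 45000, 80, "tcp", num_paths=4))
   print(pick_path("10.0.0.1", "10.0.1.1", 45001, 80, "tcp", num_paths=4))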
5.7. The Network SLA Problem
Multi-tenant networks need to protect all tenants from any tenant
overusing network resources. For example, a high traffic load from one
tenant should not starve another tenant of bandwidth. Note that in a
multi-tenant environment, no tenant has full control or visibility of
what other tenants are doing, or of how problems can be fixed.
Real-time debugging of such problems is very hard for a provider.
There are two alternatives to address this problem.
First, a provider can overprovision the network at the core and
aggregation layers to ensure that all tenants get good bandwidth. This
scheme is not fool-proof. It fails when many flows converge onto the
same aggregation or core link while other links remain idle. The total
cross-sectional bandwidth may be the same, but flow convergence will
result in buffer overflows and packet drops. When this happens, all the
affected flows will crawl. Given the statistical randomness of flows,
the situation will most likely ease soon, but it may leave behind a
data backlog that causes other issues. What makes this incredibly hard
to debug is that it depends on a network-wide flow pattern that would
be nearly impossible to reproduce in a multi-tenant environment.
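The flow-convergence failure mode can be seen with simple arithmetic:
even when aggregate capacity comfortably exceeds aggregate demand, a
handful of flows hashed onto the same link can exceed that one link's
capacity. The figures in the sketch below (in Python) are assumptions
used only for illustration.
   # Illustration: aggregate capacity is ample, but flow convergence
   # still overflows one link. All numbers are hypothetical.

   core_links = 8
   link_capacity_gbps = 40
   total_capacity = core_links * link_capacity_gbps     # 320 Gbps

   flows = [15, 12, 10, 8, 7, 6]                         # Gbps per flow, 58 Gbps total
   # Suppose hashing happens to place four of these flows on one core link:
   converged_load = sum(flows[:4])                       # 45 Gbps on a 40 Gbps link

   print(total_capacity, sum(flows), converged_load)
   print("buffer overflow / drops likely:", converged_load > link_capacity_gbps)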
The above scenario can be made infrequent by increasing the extent of
over-provisioning, but it cannot be eliminated. Over-provisioning
reduces the probability of it happening, but provides no way to control
its effects when it does happen. The adverse consequences depend
entirely on the application, which the provider cannot control, and can
range from short-lived slow application performance to long-lived data
out-of-sync issues. Over-provisioning is also expensive.
Second, mechanisms to measure and guarantee network SLAs will have to
employ active flow management to guarantee bandwidth to all tenants
and keep the network provisioned only to the level required. Flow
management can be integrated as part of existing forwarding
techniques or may need new techniques. Network SLAs can play an
important role in determining if sufficient bandwidth is available
before a VM is moved to a new location.
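One well-known building block for the active flow management described
above is a per-tenant rate limiter such as a token bucket. The sketch
below (in Python) is a minimal, generic illustration of the idea, not a
proposal for a specific mechanism; the rates and sizes are assumptions.
   # Minimal token-bucket rate limiter, one instance per tenant, as one
   # possible building block for enforcing a bandwidth SLA.

   class TokenBucket:
       def __init__(self, rate_bps, burst_bits):
           self.rate = rate_bps        # guaranteed/contracted rate in bits/s
           self.burst = burst_bits     # maximum burst size in bits
           self.tokens = burst_bits
           self.last = 0.0             # timestamp of last update

       def allow(self, packet_bits, now):
           """Return True if the packet conforms to the tenant's SLA."""
           self.tokens = min(self.burst,
                             self.tokens + (now - self.last) * self.rate)
           self.last = now
           if packet_bits <= self.tokens:
               self.tokens -= packet_bits
               return True
           return False                # non-conformant: drop, mark or queue

   # Example: a tenant contracted for 1 Gbps with a 10 Mbit burst allowance.
   bucket = TokenBucket(rate_bps=1_000_000_000, burst_bits=10_000_000)
   print(bucket.allow(12_000 * 8, now=0.001))   # a 12 KB packet at t = 1 ms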
5.8. The Broadcast and Multicast Problem
Traffic forwarding needs to take into account broadcast control and
multicast optimization. ARP broadcast control is already recognized as
an important problem, because hosts tend to refresh their ARP tables
every 15-30 seconds. Depending on the number of VMs in a datacenter,
the number of other VMs each VM interacts with, the number of VLANs a
physical host is part of, whether these VLANs span subnets, and where
the L2-L3 boundary is placed in the network, ARP broadcasts can range
from manageable to unmanageable. Note that L2-L3 boundaries conflict
with VM mobility across L3 domains.
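An order-of-magnitude estimate shows how quickly ARP refresh traffic
adds up. The sketch below (in Python) uses the 15-30 second refresh
interval mentioned above; the VM count and peer count are assumptions
for illustration only.
   # Rough estimate of ARP broadcast load from periodic cache refresh.
   # VM count and peers-per-VM are hypothetical; the refresh interval
   # follows the 15-30 second range mentioned above.

   vms = 100_000                # VMs in the datacenter (assumed)
   peers_per_vm = 10            # other VMs each VM talks to (assumed)
   refresh_interval_s = 20      # ARP cache refresh period (within 15-30 s)

   arp_requests_per_second = vms * peers_per_vm / refresh_interval_s
   print(arp_requests_per_second)      # 50,000 broadcasts per second

   # Each broadcast is flooded to every port in the VLAN, so the packet
   # replication seen by the network is far higher than this source rate.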
Broadcast control is also very important at the datacenter edges if a
VLAN spans different sites. Periodic broadcasts, if flooded over the
datacenter interconnect, can be a significant overhead.
DHCP broadcasts in a multi-tenant environment also pose a challenge,
since the IP assignment for each tenant may be different and may
therefore involve different DHCP servers. Mapping a tenant to a DHCP
server requires additional intelligence and configuration.
Alternatively, tenant separation needs to be built into DHCP.
Flood control also needs to be devised for both multicast and broadcast
traffic across datacenters. Note that the inter-datacenter connectivity
is generally an overlay. This implies that a VLAN spanning tree (for
broadcast) and a multicast distribution tree (for multicast) must be
constructed within the overlay. The overlay itself can be a mesh
(ECMP), but multicast and broadcast distribution must follow trees. The
roots of these trees need to be distributed across the various
datacenter edges to load-balance the multicast and broadcast traffic.
The multicast and broadcast trees should also be constructed in a way
that minimizes latency and conserves bandwidth. That means that the
distribution trees may need to be reconstructed with VM mobility. The
reconstruction of trees must be such that the locations of maximum
workload density are placed closest on the distribution tree.
5.9. The Cloud Control Problem
Moving into a multi-tenant environment depends on opening up sufficient
controls for a user to "look inside" the cloud and control it at a
sufficiently low level of granularity. A cloud that appears fuzzy (in
terms of control) will not be preferred by users who are accustomed to
designing and controlling their own networks. This includes most
high-end users, whose needs require customization of the network. A
"one size fits all" approach cannot be used.
The "control" problem also translates into a user being able to
access the same service from more than one provider in the same way,
similar to how most enterprises multi-home their networks into more
than one network provider. The cloud infrastructure is no less
important than the internet connection that is used to reach to it,
and concerns of high-availability and lock-in prevention are as
important for cloud as they were for networks. Further, to
interoperate the public and private domains, the control mechanism
must be interoperable across private and public domains. This is
practically achieved only if the private and public cloud control
mechanisms are made to converge into a single mechanism.
From a provider perspective, there are many infrastructure "domains"
that need to be combined to deliver complete services. These include
compute, network, storage, security, firewalls, load balancers, WAN
optimizers, etc. Standards for orchestrating these domains are needed
to manage them in a common way. If the orchestration interfaces are
standardized, then they can be delivered with the infrastructure
itself, similar to how management interfaces have been provided so far.
A new set of standardized management interfaces is desired that allows
users to create, modify, move and delete virtual devices.
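As a purely hypothetical illustration of the kind of interface
described above, the sketch below (in Python) shows create, modify,
move and delete operations on a virtual device. The class and method
names are invented for this example and do not correspond to any
existing API or standard.
   # Hypothetical orchestration interface for virtual devices (VMs,
   # virtual switches, firewalls, load balancers). Names are illustrative.

   from dataclasses import dataclass, field

   @dataclass
   class VirtualDevice:
       device_id: str
       kind: str                  # e.g. "vm", "vswitch", "firewall"
       location: str              # e.g. a rack or datacenter identifier
       properties: dict = field(default_factory=dict)  # VLANs, bandwidth, ACLs

   class Orchestrator:
       def __init__(self):
           self.inventory = {}

       def create(self, device: VirtualDevice):
           self.inventory[device.device_id] = device

       def modify(self, device_id: str, **changes):
           self.inventory[device_id].properties.update(changes)

       def move(self, device_id: str, new_location: str):
           # A real orchestrator would first check bandwidth, latency and
           # policy at the new location, as discussed below.
           self.inventory[device_id].location = new_location

       def delete(self, device_id: str):
           del self.inventory[device_id]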
Mobility of devices in particular poses some challenges. For instance,
when a VM moves, it needs to drag its network configuration along with
it. This may include the VLANs that the host was part of, and other
network-interface-level policies such as bandwidth. Similarly, if there
were firewall rules protecting a VM, these need to be moved as well,
into the path of all packets at the new location. Prior to the move, it
must be determined whether there is sufficient bandwidth available at
the new location. The orchestrator has to be aware of the current
packet flows and the latencies at the new location. For user-facing
VMs, as a general rule, the VM must be equidistant from storage and
user for optimal performance.
These requirements imply that the orchestrator has to be, at the very
least, aware of the network topology, the various paths to a
destination, and the current load on the network. A tight linkage
between the network and the orchestrator is needed to achieve this.
This is a major problem today, because we currently treat network
topology and the orchestration of resources as unrelated domains.
5.10. The Forwarding Plane Scale Problem
The need to support VM mobility has led to many encapsulation schemes.
These schemes have variable scaling properties. In the best case, the
network scales like L3 routing: if there is no mobility, only one
hardware entry is needed per subnet (which can be aggregated). In the
worst case, the network scales like L2 forwarding: when every host is
mobile, one hardware entry is needed per host (which cannot be
aggregated). As the network grows very large, the gap between the best
and worst cases can be 5-6 orders of magnitude. When a device runs out
of forwarding-entry space, packets are dropped in the L3 routing case
and flooded in the L2 forwarding case. Both can have serious
side-effects depending on the nature of the applications involved.
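The gap between the best and worst cases can be illustrated with simple
arithmetic. The host and subnet counts in the sketch below (in Python)
are assumptions chosen only to show the orders of magnitude involved.
   # Forwarding-table size: subnet routes versus per-host entries.
   # Host and subnet counts are hypothetical.

   import math

   hosts = 10_000_000           # virtual hosts in the datacenter (assumed)
   hosts_per_subnet = 250       # hosts per subnet (assumed)

   best_case_entries = math.ceil(hosts / hosts_per_subnet)  # one route per subnet
   worst_case_entries = hosts                                # one entry per mobile host

   print(best_case_entries)                                  # 40,000
   print(worst_case_entries)                                 # 10,000,000
   print(f"gap: ~{worst_case_entries / best_case_entries:.0f}x")

   # With aggregation of subnets into larger prefixes the best case
   # shrinks further, so at very large scale the gap can reach several
   # orders of magnitude, as noted above.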
Note that host routes are in addition to what already exists on network
devices and will continue to exist. This includes (a) network routes,
(b) local host-port bindings, and (c) access lists, port security,
port-based authentication, etc. The network of today is sized to handle
these types of entries. Adding host routes would imply a dramatic
increase beyond current network capabilities.
But, how widespread is VM mobility? If it is widespread, then
encapsulation schemes will not scale. If it is not widespread, then
encapsulation schemes will work, although sub-optimally.
Cases Where VMs Would Not Be Moved:
a. When a VM has a significantly large local storage
b. When a VM uses a consistent amount of CPU at all times
c. When a VM holds proprietary data that cannot be moved
d. When moving a VM with network storage would degrade performance
e. When the users of the VM are always in one location
Cases Where VMs Would Be Moved:
a. During maintenance cycles for hardware or software upgrades
b. During disaster recovery of sites, or hardware/software failures
c. When a VM has different CPU and memory requirements at different times
d. When VMs do not need to run continuously and are job-based
e. When the users of the VM are in different locations at different times
The above is not an exhaustive list, but it shows that while VM
mobility is not applicable everywhere, there are many cases where it
is. Also, when private and public domains are connected, the IP and MAC
addresses across these domains could overlap. The mobility approaches
must therefore be segmented between tenants. This further grows table
sizes, because the route to an IP address depends on which tenant is
asking for it. Duplicated IP addresses also need duplicate routes.
The encapsulation problem is further worsened by the fact that
encapsulation must be performed at the access layer. In a datacenter,
the ratio of access devices to core devices is generally about 1000 to
1. That means the cost of encapsulation must be summed over roughly
1000 devices to arrive at the total cost. This is different from the
cost being limited to a single device or a few devices in the core.
6. Security Considerations
NA
7. IANA Considerations
NA
8. Conclusions
This document described a set of high-level cloud-related problems.
Potential solutions to these problems need to be evaluated in terms of
their impact on the other problems and their solutions.
9. References
9.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
10. Acknowledgments
This document was prepared using 2-Word-v2.0.template.dot.
Authors' Addresses
Ashish Dalela
Cisco Systems
Cessna Business Park
Bangalore
India 560037
Email: adalela@cisco.com