ALTO S. Kiesel
Internet-Draft University of Stuttgart
Intended status: Standards Track M. Scharf
Expires: August 17, 2014 Alcatel-Lucent Bell Labs
February 13, 2014
ALTO metrics for expressing availability information
draft-kiesel-alto-availability-metrics-00
Abstract
This document specifies new metrics to be used with the ALTO
protocol. The goal is to provide information about the availability
of physical network, host, and storage infrastructures to management
systems that orchestrate virtual infrastructures on top of them.
Kiesel & Scharf Expires August 17, 2014 [Page 1]
Internet-Draft ALTO availability metrics February 2014
Terminology and Requirements Language
This document makes use of the ALTO terminology defined in [RFC5693]
and [RFC6708].
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on August 17, 2014.
Copyright Notice
Copyright (c) 2014 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction
2. Classification of availability-related parameters
2.1. Identification of physical resources
2.2. Classification of cost types and properties
2.2.1. Static vs. dynamic facts vs. probabilities
2.2.2. Causality and Correlation
3. Specification of new Endpoint Address types
4. Specification of new Cost and Property types
5. Obtaining Availability Information
6. IANA Considerations
7. Security Considerations
8. References
8.1. Normative References
8.2. Informative References
Authors' Addresses
1. Introduction
Various virtualization technologies make it possible to instantiate
virtual hosts, virtual storage, and virtual networks on top of their
physical counterparts. They can be combined to build complex virtual
infrastructures. Management systems automate the task of mapping
virtual to physical resources, considering various optimization
goals. Mechanisms such as live migration of virtual machines or
reshaping the topology of overlay networks make it possible to react
dynamically to changing conditions, both in the virtual
infrastructure (e.g., change in demand) and in the underlying
physical infrastructures (e.g., change in available resources).
A typical example is a cluster of several physical servers in which,
in times of low demand, all virtual machines are migrated away from
one node. This node can then be powered down in order to save
energy. If resource utilization is the only optimization goal, the
input for the placement/scheduling manager can be gathered by
measurements.
If, however, other optimization goals have to be considered, the
management system needs external information. For example, if all
but two nodes of the cluster are to be shut down, the remaining two
nodes should be selected in a way that minimizes the risk of both
nodes failing at the same time due to a single root cause. This
optimization problem becomes more difficult if not only hosts and
storage but also network resources are considered.
This document shows that the ALTO protocol [I-D.ietf-alto-protocol]
offers the required base mechanisms for providing a standardized
interface to virtual infrastructure orchestration managers, for
conveying information about the availability / reliability of the
underlying physical infrastructure. This document further defines
appropriate metrics for this use case.
2. Classification of availability-related parameters
Important concepts of the ALTO protocol are Endpoints, which are
identified by Endpoint Addresses. Endpoints can be grouped into
PIDs. Endpoints (and, by means of protocol extensions
[I-D.roome-alto-pid-properties], also PIDs) may have properties that
can be queried using the ALTO protocol. Paths between PIDs may have
one or more path costs according to some cost metric. These path
costs can be queried for individual pairs of PIDs, or a whole cost
map (i.e., a "PID x PID -> cost" matrix) can be downloaded. The path
cost concept can easily be generalized to a path property concept.
This section discusses how these base mechanisms can be used to
convey information related to availability of physical
infrastructures to systems that manage virtual infrastructures on top
of them.
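As an illustration only, a cost map can be thought of as a nested
mapping from source PID to destination PID to cost. The following
Python sketch uses hypothetical PID names, cost values, and helper
functions; it does not reproduce the ALTO wire format and is not part
of the protocol specification.

```python
# Hypothetical "PID x PID -> cost" matrix. PID names and values are
# illustrative assumptions, not taken from any real ALTO server.
cost_map = {
    "PID1": {"PID1": 0, "PID2": 5, "PID3": 10},
    "PID2": {"PID1": 5, "PID2": 0, "PID3": 15},
    "PID3": {"PID1": 10, "PID2": 15, "PID3": 0},
}


def path_cost(cost_map, src_pid, dst_pid):
    """Return the cost for one (source, destination) pair of PIDs,
    or None if the map has no entry for that pair."""
    return cost_map.get(src_pid, {}).get(dst_pid)


def cheapest_destination(cost_map, src_pid, candidates):
    """Return the candidate PID with the lowest cost from src_pid,
    ignoring candidates for which no cost is known."""
    known = [
        pid for pid in candidates
        if path_cost(cost_map, src_pid, pid) is not None
    ]
    if not known:
        return None
    return min(known, key=lambda pid: path_cost(cost_map, src_pid, pid))
```

Querying an individual pair corresponds to path_cost(), while
downloading a whole cost map corresponds to obtaining the complete
nested mapping at once.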
2.1. Identification of physical resources
In order to identify physical resources within the ALTO protocol, an
appropriate endpoint address type has to be used. The ALTO base
protocol specification [I-D.ietf-alto-protocol] only defines IPv4 and
IPv6 addresses, and establishes a process to register further types.
In fact, IP addresses may be used to identify physical resources in
many cases, e.g., the loopback address of a router, or the management
address of a physical server, etc. For a discussion of VPNs and ALTO
see [I-D.scharf-alto-vpn-service].
TBD: discussion of further options for endpoint addresses.
2.2. Classification of cost types and properties
Information related to availability of physical resources may be of
different fundamental natures, requiring different encodings and
different update intervals. This section itemizes several criteria.
2.2.1. Static vs. dynamic facts vs. probabilities
Information may be static facts that never change or change only very
infrequently. For example: "Electrical power outlets A and B are both
connected to circuit breaker F1".
Information may also be more frequently changing, e.g.,
"Uninterruptible power supply UPS1 is now running on battery power,
82% capacity left". TBD: further investigation and guidance is
needed on the maximum update frequency that can reasonably be done
using ALTO.
Another type of information consists of statistical measures such as
the average relative availability of a subsystem, e.g., the famous
"five nines" (99.999% availability).
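To make such a statistical measure concrete, the following Python
sketch converts an average availability into the expected
unavailable time per year. The function name and the choice of a
365-day year are assumptions made for this illustration only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525600, ignoring leap years


def expected_downtime_minutes(availability, period_minutes=MINUTES_PER_YEAR):
    """Expected unavailable time, in minutes, for a given average
    availability expressed as a fraction between 0 and 1."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * period_minutes
```

Under these assumptions, "five nines" (an availability of 0.99999)
corresponds to an expected downtime of roughly 5.3 minutes per year.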
2.2.2. Causality and Correlation
Many initial incidents can cause a series of events, according to
some kind of "failure propagation topology", which is independent of
the IP network topology. There may even be hierarchies.
For example, "Servers S1 and S2 are connected via circuit breaker F1
to uninterruptible power supply UPS1 while S3 and S4 are connected
via F2 to UPS1" implies that a failure in S1 triggering F1 will also
interrupt operation of S2. Furthermore, shutting down S2, S3, and S4
in case of a power grid failure could stretch UPS1's battery lifetime
and thereby prolong S1's survivability time. Similar considerations
can be made for different kinds of problems, e.g., the impact of a
fire.
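The power-feed example above can be sketched as a small "failure
propagation topology" in Python. The data structure and the function
names below are hypothetical and chosen only for illustration; the
topology is the one from the example (S1 and S2 behind F1, S3 and S4
behind F2, both breakers fed by UPS1).

```python
# Each resource maps to the component it is fed from (assumed model).
FEEDS_FROM = {
    "S1": "F1", "S2": "F1",
    "S3": "F2", "S4": "F2",
    "F1": "UPS1", "F2": "UPS1",
}


def upstream(resource, feeds_from):
    """Return the chain of components a resource depends on,
    nearest component first."""
    chain = []
    seen = {resource}
    while resource in feeds_from:
        resource = feeds_from[resource]
        if resource in seen:
            raise ValueError("cycle in failure propagation topology")
        seen.add(resource)
        chain.append(resource)
    return chain


def affected_by(component, feeds_from):
    """Return all resources that lose power if the component fails
    or trips, in sorted order."""
    return sorted(
        r for r in feeds_from if component in upstream(r, feeds_from)
    )


def nearest_shared(a, b, feeds_from):
    """Return the nearest component both resources depend on,
    or None if they share none."""
    b_chain = set(upstream(b, feeds_from))
    for component in upstream(a, feeds_from):
        if component in b_chain:
            return component
    return None
```

In this model, affected_by("F1", FEEDS_FROM) yields S1 and S2, which
matches the statement that a failure in S1 triggering F1 also
interrupts S2, while S1 and S3 only share UPS1.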
Modeling different risk types (e.g., power outage, fire, flooding,
physical intruders, etc.) in their respective terminology would
require the definition of many new data types.
A more generic approach is to use an ALTO cost map as a matrix, which
indicates the level of isolation against "fate sharing" of any two
PIDs with respect to a given (physical) risk. In other words, for
every specific risk R the coefficients of that matrix could be
calculated as
C_R(x,y) = 1 - P( y fails due to R | x fails due to R )
For example, if the risk type is "fire", then a coefficient of 0
could mean "these two physical resources are in the same rack. If
one is on fire for any reason, the other one will almost inevitably
fail within seconds, too.", a value of 0.3 could mean "the resources
are in adjacent buildings" and 0.99999 could mean "these two
resources are on different continents and only a natural disaster
causing global destruction could disable both of them in one single
event".
Note that these conditional properties only indicate how likely it is
that the second resource will become unavailable due to the same
event that disabled the first resource. They do not indicate how
likely it is that the event will actually occur.
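A minimal Python sketch of how such a matrix could be derived and
used is given below. The resource names (A and B in the same rack, C
in an adjacent building, D on another continent), the conditional
probabilities, and the helper functions are all assumptions made for
this illustration. Because C_R(x,y) need not equal C_R(y,x), the
sketch rates a pair by the weaker of its two directions.

```python
from itertools import combinations

# Hypothetical values of P(y fails due to fire | x fails due to fire),
# keyed by the ordered pair (x, y).
COND_FAIL_FIRE = {
    ("A", "B"): 1.0, ("B", "A"): 1.0,
    ("A", "C"): 0.7, ("C", "A"): 0.7,
    ("B", "C"): 0.7, ("C", "B"): 0.7,
    ("A", "D"): 0.00001, ("D", "A"): 0.00001,
    ("B", "D"): 0.00001, ("D", "B"): 0.00001,
    ("C", "D"): 0.00001, ("D", "C"): 0.00001,
}


def isolation_matrix(cond_fail):
    """Compute C_R(x,y) = 1 - P(y fails due to R | x fails due to R)
    for every ordered pair in cond_fail."""
    for pair, p in cond_fail.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"invalid probability for {pair}: {p}")
    return {pair: 1.0 - p for pair, p in cond_fail.items()}


def most_isolated_pair(matrix, candidates):
    """Return the pair of candidates with the highest isolation,
    rating each pair by the weaker of its two directions."""
    def pair_isolation(pair):
        x, y = pair
        return min(matrix[(x, y)], matrix[(y, x)])

    return max(combinations(sorted(candidates), 2), key=pair_isolation)
```

For instance, a manager that has to keep exactly two of the nodes A,
B, and C running could call most_isolated_pair() to avoid selecting A
and B, which share the same rack. The result says nothing about how
likely a fire is in the first place.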
TBD: discuss to which extent a single "endpoint address to PID"
network map is useful when considering different risk types. The
idea behind PIDs is to reduce map size by grouping topologically
close endpoints, but the "failure propagation topologies" may differ
considerably between risk types. We will probably end up with
many very small PIDs.
3. Specification of new Endpoint Address types
TBD.
4. Specification of new Cost and Property types
TBD.
We need: the "isolation level against fate sharing" matrix, and a
list of risk types, in order to give the absolute probability of that
risk for a given resource.
5. Obtaining Availability Information
For any ALTO information, it is important to consider whether the
ALTO service can realistically discover that information, whether
the distribution of that information is allowed, whether the data is
useful, whether a client can get that information without excessive
privacy concerns, and whether the information cannot easily be
obtained in some other way.
Availability-related parameters can refer both to properties of the
network infrastructure (e.g., network resiliency mechanisms) and to
non-networking effects (e.g., redundancy of power supply). In both
cases, an application typically cannot measure that information,
either by passive monitoring or by active probing. Yet availability
information and insight into the impact of incidents matter to many
applications and can be an important criterion for resource
selection decisions. Since typical use cases would be limited to one
administrative domain, privacy is not a major concern; in addition,
the suggested correlation metrics provide an abstraction over the
actual physical infrastructure.
Gathering availability information may be more challenging than
gathering, for instance, IP routing topologies; it may, for example,
require access to inventory databases. Yet, within one domain, the
organization that is responsible for the physical network topology
may also take care of other parts of the physical infrastructure,
such as the power supply or hardware installation. An organization
that operates an ALTO server for exposing network topology
information could therefore also have access to other inventory data.
Providing availability information to an ALTO server as described in
this document is thus realistic.
6. IANA Considerations
TBD.
7. Security Considerations
TBD.
8. References
8.1. Normative References
[I-D.ietf-alto-protocol]
Alimi, R., Penno, R., and Y. Yang, "ALTO Protocol",
draft-ietf-alto-protocol-25 (work in progress),
January 2014.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
8.2. Informative References
[I-D.roome-alto-pid-properties]
Roome, B. and Y. Yang, "PID Property Extension for ALTO
Protocol", draft-roome-alto-pid-properties-00 (work in
progress), October 2013.
[I-D.scharf-alto-vpn-service]
Scharf, M., Gurbani, V., Soprovich, G., and V. Hilt, "The
Virtual Private Network (VPN) Service in ALTO: Use Cases,
Requirements and Extensions",
draft-scharf-alto-vpn-service-01 (work in progress),
July 2013.
[RFC5693] Seedorf, J. and E. Burger, "Application-Layer Traffic
Optimization (ALTO) Problem Statement", RFC 5693,
October 2009.
[RFC6708] Kiesel, S., Previdi, S., Stiemerling, M., Woundy, R., and
Y. Yang, "Application-Layer Traffic Optimization (ALTO)
Requirements", RFC 6708, September 2012.
Authors' Addresses
Sebastian Kiesel
University of Stuttgart Information Center
Networks and Communication Systems Department
Allmandring 30
Stuttgart 70550
Germany
Email: ietf-alto@skiesel.de
URI: http://www.rus.uni-stuttgart.de/nks/
Michael Scharf
Alcatel-Lucent Bell Labs
Lorenzstrasse 10
Stuttgart 70435
Germany
Email: michael.scharf@alcatel-lucent.com
URI: www.alcatel-lucent.com/bell-labs