NFVRG P. Martinez-Julia, Ed.
Internet-Draft NICT
Intended status: Informational February 20, 2018
Expires: August 24, 2018
Exploiting External Event Detectors to Anticipate Resource Requirements
for the Elastic Adaptation of SDN/NFV Systems
draft-pedro-anticipated-adaptation-00
Abstract
The adoption of SDN/NFV technologies by current computer and network
system infrastructures is constantly increasing, and it is becoming
essential for the particular case of edge/branch network systems.
The systems supported by these infrastructures must adapt to changes
in their environment within a short period of time. The complexity
of new systems and the speed at which management and control
operations must be performed go beyond human limits, so management
systems must be automated. However, in several situations current
automation techniques are not enough to respond to requirement
changes. Here we propose to anticipate changes in the operation
environments of SDN/NFV systems in response to external events, and
to reflect those changes in an anticipated estimate of the amount of
resources those systems will require for their subsequent
adaptation. The final objective is to avoid service degradation or
disruption while keeping resource allocation close to the optimum,
reducing monetary and operative cost as much as possible. We
discuss how to achieve such capabilities by integrating the
Autonomic Resource Control Architecture (ARCA) into the management
and orchestration (MANO) of NFV systems. We showcase it by building
a multi-domain SDN/NFV infrastructure based on OpenStack and
deploying ARCA to adapt a virtual system based on the edge/branch
network concept to the operational conditions of an emergency
support service, which is rarely used but cannot leave any user
unattended.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
Martinez-Julia Expires August 24, 2018 [Page 1]
Internet-Draft Anticipated Adaptation of SDN/NFV Systems February 2018
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on August 24, 2018.
Copyright Notice
Copyright (c) 2018 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction
2. Terminology
3. Background
3.1. Virtual Computer and Network Systems
3.2. SDN and NFV
3.3. Management and Control
3.4. The Autonomic Resource Control Architecture (ARCA)
4. External Event Detectors
5. Anticipating Requirements
6. ARCA Integration With ETSI-NFV-MANO
6.1. Functional Integration
6.2. Target Experiment and Scenario
6.3. OpenStack Platform
6.4. Initial Results
7. Relation to Other IETF/IRTF Initiatives
8. IANA Considerations
9. Security Considerations
10. Acknowledgements
11. References
11.1. Normative References
11.2. Informative References
Author's Address
1. Introduction
The incorporation of Software Defined Networking (SDN) and Network
Function Virtualization (NFV) into current infrastructures to build
virtual computer and network systems is constantly increasing. The
need to automate the management and control of such systems has
motivated us to design the Autonomic Resource Control Architecture
(ARCA), as presented in ICIN 2018 [ICIN-2018]. Automation
requirements are sufficiently justified by the increasing size and
complexity of systems, which in turn are essential in the current
digital world. Moreover, the particular requirements and market
benefits of network virtualization have crystallized in the rise of
SDN/NFV infrastructures. The broad reception of the combined
SDN/NFV technology represents a huge leap towards the empowerment
and homogenization of virtualization technologies. Therefore, we
have modeled ARCA to fit within the reference architecture for the
management and orchestration of NFV elements, the Virtual Network
Functions (VNFs).
Behind the scenes, NFV is based on a highly distributed and
network-empowered version of the well-known Cloud infrastructures
and platforms, complemented by their centralized counterparts. This
brings to virtual networks the high degree of flexibility already
found in computer systems. Such flexibility is highly desirable now
that NFV is being exploited by many organizations to build their
private infrastructures, as well as by network service providers to
build the services they later commercialize. However, to actually
realize the potential monetary and operative cost reduction
associated with such infrastructures, the amount of resources used
by production services must be kept close to the optimum, so that
physical resources are exploited as much as possible.
The fast detection of changes in the requirements of the virtual
systems deployed on the aforementioned SDN/NFV infrastructures, and
the consequent adaptation of allocated resources to the new
situations, becomes essential to actually exploit their cost and
operative benefits while also avoiding service unresponsiveness due
to the overloading of underlying resources. It is widely accepted
that the size and complexity of systems and services make it
difficult for humans to accomplish such tasks within their target
time boundaries. Therefore, these tasks must be automated. Luckily,
the architecture and underlying platforms supporting the SDN/NFV
technologies enable the required automation. In fact, some
solutions already exist to perform several batched or scripted tasks
without human intervention. However, those solutions still depend
heavily on low-level human involvement. This highlights the
challenge found in control and management automation, which is
continuously revised and enlarged.
ARCA provides a small step towards the resolution of the
aforementioned problem. It advances the state of the art in the
automation of resource control and management by providing a
supervised but autonomous mechanism that reduces the time required to
perform corrective and/or adaptive changes in virtual computer and
network systems from hours/minutes to seconds/milliseconds.
Moreover, it is able to take advantage of the event notifications
provided by external detectors to anticipate the amount of resources
that the controlled SDN/NFV system will require in response to such
events. We propose to bring this benefit to the reference
architecture promoted by ETSI for the management and orchestration of
NFV services (see ETSI-NFV-MANO [ETSI-NFV-MANO]) by integrating ARCA
as the Virtual Infrastructure Manager (VIM). We showcase this
proposal by discussing the evaluation results obtained by ARCA when
running on a real, physical experimentation infrastructure based
on OpenStack [OPENSTACK]. We thus justify the need to adapt the
interfaces supported by the NFV-MANO to include real-world event
detectors, which are external to the virtualization platform and
virtual resources.
2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
3. Background
3.1. Virtual Computer and Network Systems
The continuous search for efficiency and cost reduction to get the
best exploitation of available resources (e.g. CPU power and
electricity) has driven current physical infrastructures to move
towards virtualization infrastructures. This trend also enables end
systems to be centralized and/or distributed, so that they are
deployed to best meet customer requirements in terms of resources
and qualities.
One of the key functional requirements imposed on computer and
network virtualization is a high degree of flexibility and
reliability. Both qualities are subject to the underlying
technologies but, while the latter has always been required of
computer and network systems, flexibility is a relatively new
requirement, which would not have been imposed without the backing
of virtualization and cloud technologies.
3.2. SDN and NFV
SDN and NFV are conceived to bring a high degree of flexibility and
conceptual centralization to the network. On the one hand, with
SDN, the network can be programmed to implement a dynamic behavior
that changes its topology and overall qualities. On the other hand,
with NFV, the functions that are typically provided by physical
network equipment are implemented as virtual appliances that can be
deployed and linked together to provide customized network services.
SDN and NFV complement each other to actually implement the network
aspect of the aforementioned virtual computer and network systems.
Although centralization may bring to mind the concept of a single
point of failure, this is not the case for these technologies.
Conceptual centralization differs greatly from centralized
deployment. It brings all the benefits of having a single point of
decision while retaining the benefits of distributed systems. For
instance, control decisions in SDN can be centralized while the
mechanisms that enforce such decisions in the network (SDN
controllers) can be implemented as highly distributed systems. The
same approach can be applied to NFV. Although network functions can
be implemented in a central computing facility, they can take
advantage of several replication and distribution techniques to
achieve the properties of distributed systems. In addition, NFV
also allows the deployment of functions on top of distributed
systems, so they benefit from both distribution alternatives at the
same time.
3.3. Management and Control
The introduction of virtualization into the computer and network
system landscape has increased the complexity of both underlying and
overlying systems. On the one hand, virtualizing underlying systems
adds extra functions that must be managed properly to ensure the
correct operation of the whole system, which encompasses not just
the underlying elements but also the virtual elements running on top
of them. Such functions are used to actually host the overlying
virtual elements, so there is an indirect management operation that
involves virtual systems. Moreover, such complexities are inherited
by the final systems that get virtualized and deployed on top of
those virtualization infrastructures.
In parallel, virtual systems are empowered with additional, and
widely exploited, functionality that must be managed correctly.
This is the case for the dynamic adaptation of virtual resources to
the specific needs of their operation environments, or even the
composition of distributed elements across heterogeneous underlying
infrastructures, and possibly providers.
Taking both complex functions into account, either separately or
jointly, makes it clear that management requirements have greatly
surpassed the limits of humans, so automation has become essential
to accomplish most common tasks.
3.4. The Autonomic Resource Control Architecture (ARCA)
As discussed in depth in ICIN 2018 [ICIN-2018], ARCA enables the
elastic adaptation of resources assigned to virtual computer and
network systems by calculating or estimating their requirements from
the analysis of load measurements and the detection of external
events. These events can be notified by physical elements (things,
sensors) that detect changes in the environment, as well as by
software elements that analyze digital information, such as
connectors to sources or analyzers of Big Data. For instance, ARCA
is able to consider the detection of an earthquake or a heavy
rainfall to overcome the damage it can cause to the controlled
system.
The policies that ARCA must enforce are specified by administrators
during the configuration of the control/management engine. ARCA
then runs autonomously, with no further human involvement unless
some parameter must be changed, and applies the control and
management operations required to adapt the controlled system to
each new situation or requirement. The main goal of ARCA is thus to
reduce the time required for resource adaptation from hours/minutes
to seconds/milliseconds. Through the aforementioned policies,
system administrators are able to specify the general operational
boundaries in terms of lower and upper system load thresholds, as
well as the minimum and maximum amount of resources that can be
allocated to the controlled system to overcome any eventual
situation, including the natural crossing of such thresholds.
ARCA's functional goal is to run autonomously, while its performance
goal is to keep the resources assigned to the controlled system as
close as possible to the optimum (e.g. within 5 % of the optimum)
while avoiding service disruption as much as possible, keeping the
client request discard rate as low as possible (e.g. below 1 %). To
achieve both goals, ARCA relies on the Autonomic Computing (AC)
paradigm, in the form of interconnected micro-services. Therefore,
ARCA includes the four main elements and activities defined by AC,
embodied as:
Collector Is responsible for gathering and formatting the
heterogeneous observations that will be used in the control
cycle.
Analyzer Correlates the observations to each other in order to find
the situation of the controlled system, especially the
current load of the resources allocated to the system and
the occurrence of an incident that can affect the normal
operation of the system, such as an earthquake that
increases the traffic in an emergency-support system, which
is the main target scenario studied in this document.
Decider Determines the necessary actions to adjust the resources to
the load of the controlled system.
Enforcer Requests the underlying and overlying infrastructure, such
as OpenStack, to make the necessary changes to reflect the
effects of the decided actions into the system.
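The four activities above can be pictured as one iteration of a control cycle. The following Python fragment is only an illustrative sketch and not the ARCA implementation: the Policy fields, the threshold values, the one-step scaling rule, and the "attach"/"detach" request names are all assumptions introduced here, and a real collector and enforcer would talk to the underlying platform instead of working on in-memory values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Policy:
    # Hypothetical operational boundaries set by the administrator.
    low_load: float = 0.3      # lower load threshold (fraction of capacity)
    high_load: float = 0.7     # upper load threshold
    min_resources: int = 1     # minimum amount of allocated resources
    max_resources: int = 10    # maximum amount of allocated resources


def collect(raw_samples: List[float]) -> List[float]:
    """Collector: clamp heterogeneous observations to the range [0, 1]."""
    return [min(max(s, 0.0), 1.0) for s in raw_samples]


def analyze(samples: List[float]) -> float:
    """Analyzer: summarize the situation as the mean load of the resources."""
    return sum(samples) / len(samples)


def decide(load: float, current: int, policy: Policy) -> int:
    """Decider: return the target amount of resources for the observed load."""
    target = current
    if load > policy.high_load:
        target = current + 1
    elif load < policy.low_load:
        target = current - 1
    return min(max(target, policy.min_resources), policy.max_resources)


def enforce(current: int, target: int) -> List[str]:
    """Enforcer: list the platform requests needed to reach the target."""
    if target > current:
        return ["attach"] * (target - current)
    return ["detach"] * (current - target)


def control_cycle(raw_samples: List[float], current: int, policy: Policy) -> int:
    """Run one collect/analyze/decide/enforce iteration; return the new amount."""
    load = analyze(collect(raw_samples))
    target = decide(load, current, policy)
    for request in enforce(current, target):
        print(request)  # placeholder for a call to the infrastructure
    return target
```

In this sketch the loop is closed by calling control_cycle again with fresh observations, which is where the effect of the previous action would be checked.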
Being a micro-service architecture means that the different
components are executed in parallel. This allows such components to
operate in two ways. First, their operation can be dispatched by
receiving a message from the previous service or an external service.
Second, the services can be self-dispatched, so they can activate
some action or send some message without being previously stimulated
by any message. The overall control process loops indefinitely and
it is closed by checking that the expected effects of an action are
actually taking place. The coherence among the distributed services
involved in the ARCA control process is ensured by enforcing a common
semantic representation and ontology to the messages they exchange.
ARCA semantics are built with the Resource Description Framework
(RDF) and the Web Ontology Language (OWL), which are well-known and
widely used standards for the semantic representation and management
of knowledge. They provide the ability to represent new concepts
without changing the software, just by plugging extensions into the
ontology. ARCA stores all its knowledge in the Knowledge Base (KB),
which is queried and kept up-to-date by the analyzer and decider
micro-services. The KB is implemented with Apache Jena Fuseki, which
is a high-performance RDF data store that supports SPARQL through an
HTTP/REST interface. Being de-facto standards, both technologies
enable ARCA to be easily integrated with virtualization platforms
like OpenStack.
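As an illustration of how a micro-service could query such a KB over HTTP, the sketch below builds a SPARQL query request with the Python standard library. The host, the dataset name "arca", the ontology namespace, and the class and property names (Servant, cpuLoad) are hypothetical placeholders and are not taken from ARCA; the request is only constructed, not sent, so the fragment runs without a server.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical values: host, dataset name, and ontology namespace are
# placeholders chosen for this sketch.
FUSEKI_QUERY_URL = "http://localhost:3030/arca/query"

QUERY = """
PREFIX arca: <http://example.org/arca#>
SELECT ?resource ?load
WHERE {
  ?resource a arca:Servant ;
            arca:cpuLoad ?load .
}
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol query request (POST with a form body)."""
    body = urlencode({"query": query}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


request = build_sparql_request(FUSEKI_QUERY_URL, QUERY)
# Sending it would be urllib.request.urlopen(request); that step is omitted
# here because it needs a running data store.
```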
4. External Event Detectors
As mentioned above, current mechanisms used to achieve automated
management and control rely only on the continuous monitoring of the
resources they control or the underlying infrastructure that hosts
them. However, there are several other sources of information that
can be exploited to make the systems more robust and efficient.
This is the case for the notifications that can be provided by
physical or virtual elements or devices that are watching for
specific events, hereafter called external event detectors.
More specifically, although the notifications provided by these
external event detectors are related to events that occur outside
the boundaries of the controlled system, such events can affect
the typical operation of controlled systems. For instance, a heavy
rainfall or snowfall can be detected and correlated to a huge
increase in the amount of requests experienced by some emergency
support service.
5. Anticipating Requirements
One of the main goals of the MANO mechanisms is to ensure that the
virtual computer and network system they manage meets the
requirements established by its owners and administrators. This is
currently achieved by observing and analyzing the performance
measurements obtained either by directly asking the resources
forming the managed system or by asking the controllers of the
underlying infrastructure that hosts such resources. Thus, under
changing or eventual situations, the managed system must be adapted
either to cope with the new requirements, by increasing the amount
of resources assigned to it, or to make efficient use of available
infrastructures, by reducing the amount of resources assigned to it.
However, the time required by the infrastructure to put into effect
the adaptations requested by the MANO mechanisms is longer than the
time required by client requests to overload the system and make it
discard further client requests. This situation is generally
undesirable but particularly dangerous for some systems, such as the
emergency support system mentioned above. Therefore, in order to
avoid the disruption of the service, the change in requirements must
be anticipated to ensure that any adaptation has finished as soon as
possible, preferably before the target system gets overloaded or
underloaded.
Here we propose to integrate ARCA with NFV-MANO to take advantage of
the notifications provided by the aforementioned external event
detectors, by correlating them to the target amount of resources
required by the managed system and enforcing the necessary
adaptations beforehand, particularly before the system performance
metrics have actually changed.
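The difference between reacting to measurements and anticipating from an event notification can be sketched as follows. This is a minimal illustration under stated assumptions, not the ARCA decision logic: the event names, the EXPECTED_RESOURCES table, and the threshold values are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical correlation between event types and the amount of resources
# the managed system is expected to need once the event takes effect.
EXPECTED_RESOURCES: Dict[str, int] = {
    "earthquake": 8,
    "heavy_rainfall": 5,
}


def reactive_target(load: float, current: int, high: float, low: float) -> int:
    """Reactive rule: change the allocation only after the load has moved."""
    if load > high:
        return current + 1
    if load < low and current > 1:
        return current - 1
    return current


def anticipated_target(event: Optional[str], load: float, current: int,
                       high: float = 0.7, low: float = 0.3) -> int:
    """Prefer the event-based estimate when a known event is notified."""
    reactive = reactive_target(load, current, high, low)
    if event in EXPECTED_RESOURCES:
        # Never go below what the measured load already demands.
        return max(EXPECTED_RESOURCES[event], reactive)
    return reactive
```

With these assumed values, anticipated_target("earthquake", 0.1, 2) requests eight resources while the measured load is still low, whereas the reactive rule alone would have released one.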
6. ARCA Integration With ETSI-NFV-MANO
In this section we describe how to fit ARCA into a general SDN/NFV
underlying infrastructure and introduce a showcase experiment that
demonstrates its operation on an OpenStack-based experimentation
platform. We first describe the integration of ARCA with the NFV-
MANO reference architecture. We contextualize the significance of
this integration by describing an emergency support scenario that
clearly benefits from it. Then we proceed to detail the elements
forming the OpenStack platform and finally we discuss some initial
results obtained from them.
6.1. Functional Integration
The most important functional blocks of the NFV reference
architecture promoted by ETSI (see ETSI-NFV-MANO [ETSI-NFV-MANO]) are
the system support functions for operations and business (OSS/BSS),
the element management (EM) and, obviously, the Virtual Network
Functions (VNFs). However, these functions cannot exist without
being instantiated on a specific infrastructure, the NFV
infrastructure (NFVI), and all of them must be coordinated,
orchestrated, and managed by the general NFV-MANO functions.
Both the NFVI and the NFV-MANO elements are subdivided into several
sub-components. The NFVI has the underlying physical computing,
storage, and network resources, which are sliced (see
draft-qiang-coms-netslicing-information-model-02
[draft-qiang-coms-netslicing-information-model-02] and
draft-geng-coms-architecture-01 [draft-geng-coms-architecture-01])
and virtualized to form the virtual computing, storage, and network
resources that will host the VNFs. In addition, the NFV-MANO is
subdivided into the NFV Orchestrator (NFVO), the VNF manager (VNFM),
and the Virtual Infrastructure Manager (VIM). As their names
indicate, all high-level elements and sub-components have their own
very specific objective in the NFV architecture.
During the design of ARCA we incorporated both operational and
interfacing aspects into its main objectives. From the operational
point of view, ARCA processes observations to manage virtual
resources, so it plays the role of the VIM mentioned above.
Therefore, ARCA has been designed with appropriate interfaces to fit
in the place of the VIM. This way, ARCA provides the NFV reference
architecture with the ability to react to external events to adapt
virtual computer and network systems, and even to anticipate such
adaptations, as performed by ARCA itself. However, some interfaces
must be extended to fully enable ARCA to perform its work within the
NFV architecture.
Once ARCA is placed in the position of the VIM, it enhances the
general NFV architecture with its autonomic management capabilities.
In particular, it relieves the VNFM and NFVO of some
responsibilities, so they can focus on their own business while the
virtual resources behave as they expect (and request). Moreover,
ARCA improves the scalability and reliability of the managed system
in case of disconnection from the orchestration layer due to some
failure, network split, etc. This is also achieved by the autonomic
capabilities, which, as described above, are guided by the rules and
policies specified by the administrators and, here, communicated to
ARCA through the NFVO. However, ARCA will not be limited to such
operation; more generally, it will fulfill the requirements
established by the Virtual Network Operators (VNOs), which are the
owners of the slice of virtual resources that is managed by a
particular instance of NFV-MANO, and therefore by ARCA.
In addition to the operational functions, ARCA incorporates the
necessary mechanisms to engage the interfaces that enable it to
interact with other elements of the NFV-MANO reference architecture.
More specifically, ARCA is bound to the Or-Vi (see ETSI-NFV-IFA-005
[ETSI-NFV-IFA-005]) and the Nf-Vi (see ETSI-NFV-IFA-004
[ETSI-NFV-IFA-004] and ETSI-NFV-IFA-019 [ETSI-NFV-IFA-019]). The
former is the point of attachment between the NFVO and the VIM while
the latter is the point of attachment between the NFVI and the VIM.
In our current design we decided not to support the point of
attachment between the VNFM and the VIM, called Vi-Vnfm (see
ETSI-NFV-IFA-006 [ETSI-NFV-IFA-006]). We leave it for future
evolutions of the proposed integration, which will be enabled by a
possible solution that provides the functions of the VNFM required
by ARCA.
Through the Or-Vi, ARCA receives the instructions it will enforce on
the virtual computer and network system it is controlling. As
mentioned above, these are specified in the form of rules and
policies, which are in turn formatted as several statements and
embedded into the Or-Vi messages. In general, these will be
high-level objectives, so ARCA will use its reasoning capabilities to
translate them into more specific, low-level objectives. For
instance, the Or-Vi can carry a high-level statement to avoid CPU
overloading, and ARCA will use its innate and acquired knowledge to
translate it into specific statements that specify which parameters
it has to measure (CPU load from assigned servers) and their desired
boundaries, in the form of a high threshold and a low threshold.
Moreover, the Or-Vi will be used by the NFVO to specify which
actions can be used by ARCA to overcome the violation of the
mentioned policies.
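The translation from a high-level objective to low-level statements can be pictured with a small lookup. This is only a sketch: the goal identifier, the metric name, the threshold values, and the action names are hypothetical, and the static KNOWLEDGE table merely stands in for the reasoning that ARCA performs over its knowledge.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LowLevelObjective:
    metric: str            # parameter to measure
    low_threshold: float   # lower desired boundary
    high_threshold: float  # upper desired boundary
    actions: List[str]     # actions allowed when a boundary is violated


# Hypothetical stand-in for the "innate and acquired knowledge": a static
# table from high-level goals to metrics and boundaries.
KNOWLEDGE: Dict[str, dict] = {
    "avoid-cpu-overload": {"metric": "cpu_load", "low": 0.30, "high": 0.70},
}


def translate(goal: str, allowed_actions: List[str]) -> LowLevelObjective:
    """Translate a high-level goal into a measurable low-level objective."""
    entry = KNOWLEDGE[goal]
    return LowLevelObjective(
        metric=entry["metric"],
        low_threshold=entry["low"],
        high_threshold=entry["high"],
        actions=list(allowed_actions),
    )
```

For example, translate("avoid-cpu-overload", ["attach-servant", "detach-servant"]) yields an objective on cpu_load bounded by the assumed low and high thresholds, together with the actions the NFVO allowed.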
All information flowing through the Or-Vi interface is encoded and
formatted by following a simple but highly extensible ontology and
exploiting the aforementioned semantic formats. This ensures that
the interconnected system is able to evolve, including the
replacement of components, the update (addition or removal) of the
supported concepts to understand new scenarios, and the connection
of external tools to further enhance the management process. The
only requirement for this feature is that all elements support the
mentioned ontology and semantic formats. Although it is not a
finished task, the development of semantic technologies allows the
easy adaptation and
translation of existing information formats, so it is expected that
more and more software pieces become easily integrable with the ETSI-
NFV-MANO [ETSI-NFV-MANO] architecture.
In contrast to the Or-Vi interface, the Nf-Vi interface exposes more
precise and low-level operations. Although this makes it easier to
integrate with ARCA, it also ties it to specific implementations.
In other words, building a proxy that enforces the aforementioned
ontology on different interface instances to homogenize them would
add undesirable complexity. Therefore, new components have been
specifically developed for ARCA to be able to interact with
different NFVIs. Nevertheless, this specialization is limited to
the collector and enforcer. Moreover, it allows ARCA to have
optimized low-level operations, which greatly improves the overall
performance. This is the case of the specific implementations of
the collector and enforcer used with Mininet and Docker, which were
used as underlying infrastructures in previous experiments described
in ICIN 2017 [ICIN-2017]. Moreover, as discussed in the following
section, this is also the case of the implementations of the
collector and enforcer tied to the OpenStack telemetry and compute
interfaces, respectively.
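One way to picture how platform-specific code stays confined to the collector and enforcer is to express both roles as small interfaces. The sketch below is an assumption made for illustration: the method names (read_load, attach, detach) and the InMemoryPlatform class are hypothetical, and the in-memory class only stands in for a real backend such as Mininet, Docker, or OpenStack.

```python
from typing import Dict, List, Protocol


class Collector(Protocol):
    def read_load(self) -> Dict[str, float]:
        """Return the current load of each resource, keyed by its name."""
        ...


class Enforcer(Protocol):
    def attach(self) -> str:
        """Add one resource and return its name."""
        ...

    def detach(self, name: str) -> None:
        """Remove the named resource."""
        ...


class InMemoryPlatform:
    """Hypothetical stand-in for a real platform; implements both roles."""

    def __init__(self) -> None:
        self.loads: Dict[str, float] = {}
        self._counter = 0

    def read_load(self) -> Dict[str, float]:
        return dict(self.loads)

    def attach(self) -> str:
        self._counter += 1
        name = f"servant-{self._counter}"
        self.loads[name] = 0.0
        return name

    def detach(self, name: str) -> None:
        del self.loads[name]


def scale_to(target: int, collector: Collector, enforcer: Enforcer) -> List[str]:
    """Platform-independent logic: reach the target using only the interfaces."""
    names = sorted(collector.read_load())
    while len(names) < target:
        names.append(enforcer.attach())
    while len(names) > target:
        enforcer.detach(names.pop())
    return names
```

Under this arrangement, supporting another NFVI would mean writing a new class with the same three methods, while scale_to and the rest of the control logic stay unchanged.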
Although OpenStack still lacks some functionality regarding the
construction of specific virtual networks, we use it as the NFVI
functional block in the integrated approach. Therefore, OpenStack is
the provider of the underlying SDN/NFV infrastructure and we
exploited its APIs and SDK to achieve the integration. More
specifically, in our showcase we use the APIs provided by the
Ceilometer, Gnocchi, and Compute services, as well as the SDK
provided for Python. All of them are gathered within the Nf-Vi
interface. Moreover, we have extended the Or-Vi interface to connect
external elements, such as the physical or environmental event
detectors and Big Data connectors. Such a connection is becoming a
mandatory requirement of the current virtualization ecosystem, and it
constitutes our main extension to the NFV architecture.
6.2. Target Experiment and Scenario
From the beginning of our work on the design of ARCA we have targeted
real-world scenarios, so that we obtain better-suited requirements.
In particular, we work with a scenario that represents an emergency
support service that is hosted on a virtual computer and network
system, which is in turn hosted on the distributed virtualization
infrastructure of a medium-sized organization. The objective is to
clearly represent an application that requires high dynamicity and
a high degree of reliability. The emergency support service meets
this objective because it is barely used when there is no incident
but heavily loaded when there is one.
Both the underlying infrastructure and the virtual network share the
same topology. They have four independent but interconnected network
domains that form part of the same administrative domain
(organization). The first domain hosts the systems of the
headquarters (HQ) of the owner organization, so the VNFs it hosts
(servants) implement the emergency support service. We call them
"servants" because they are Virtual Machine (VM) instances that
work together to provide a single service by backing the Load
Balancer (LB) instances deployed in the separate domains. The
amount of resources (servants) assigned to the service will be
adjusted by ARCA, attaching or detaching servants to meet the load
boundaries specified by administrators.
The other domains represent different buildings of the organization
and will host the clients that access the service when an incident
occurs. They also host the necessary LB instances, which are also
VNFs that are controlled by ARCA to regulate the access of clients to
servants. All domains will have physical detectors to provide
external information that can (and will) be correlated to the load of
the controlled virtual computer and network system and thus will
affect the amount of servants assigned to it. Although the
underlying infrastructure, the servants, and the ARCA instance are
the same as those used in the real world, both clients and detectors
will be emulated. This does not reduce the transferability of the
results obtained from our experiments, as it allows us to expand the
amount of clients beyond the limits of most physical
infrastructures.
Each underlying OpenStack domain will be able to host a maximum of
100 clients, as they will be deployed on a low-profile virtual
machine (flavor in OpenStack). In general, clients will be
performing requests at a rate of one request every ten seconds, so
the 300 clients hosted by the three client domains would generate a
maximum of 30 requests per second. However, under the simulated
incident, the clients will raise their load to reach a common
maximum of 1200 requests per second. This mimics the shape and size
of a real medium-sized organization of about 300 users that perform
a maximum of four requests per second each when they need some
support.
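The load figures above follow from simple arithmetic, reproduced here as a short check. The variable names are chosen for this sketch; the only assumption is that the clients are spread over the three non-HQ domains, which is what the total of about 300 users implies.

```python
CLIENT_DOMAINS = 3               # domains that host clients (the fourth is the HQ)
CLIENTS_PER_DOMAIN = 100         # maximum clients per OpenStack domain
SECONDS_BETWEEN_REQUESTS = 10    # normal operation: one request every 10 s
INCIDENT_RATE_PER_CLIENT = 4     # requests per second per client in an incident

clients = CLIENT_DOMAINS * CLIENTS_PER_DOMAIN
normal_load = clients / SECONDS_BETWEEN_REQUESTS     # requests per second
incident_load = clients * INCIDENT_RATE_PER_CLIENT   # requests per second

print(clients, normal_load, incident_load)
```

The incident therefore multiplies the aggregate request rate by a factor of 40 with respect to normal operation.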
The topology of the underlying network is simplified by connecting
the four domains to the same, high-performance switch. However, the
topology of the virtual network is built by using direct links
between the HQ domain and the other three domains. These are
complemented by links between domains 2 and 3, and between domains 3
and 4. This way, each of the three domains has three loop-free paths
to reach the HQ domain: a direct path with just one hop and two
indirect paths, which have two and three hops for domains 2 and 4 and
two hops each for domain 3.
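The paths offered by this virtual topology can be enumerated
mechanically. The sketch below assumes that domain 1 is the HQ and
that the only virtual links are the five listed above; function names
are illustrative. Under this reading, domains 2 and 4 obtain paths of
one, two, and three hops, whereas domain 3 obtains one direct path
and two two-hop paths.

```python
# Virtual topology as described in the text; domain 1 is the HQ.
LINKS = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4)]
HQ = 1


def neighbours(node):
    """Domains directly linked to `node` (links are bidirectional)."""
    return sorted({b for a, b in LINKS if a == node} |
                  {a for a, b in LINKS if b == node})


def simple_paths(src, dst, visited=()):
    """Yield every loop-free path from src to dst as a tuple of domains."""
    visited = visited + (src,)
    if src == dst:
        yield visited
        return
    for nxt in neighbours(src):
        if nxt not in visited:
            yield from simple_paths(nxt, dst, visited)


def hop_counts(src):
    """Sorted hop counts of all loop-free paths from src to the HQ."""
    return sorted(len(path) - 1 for path in simple_paths(src, HQ))


for domain in (2, 3, 4):
    print(domain, hop_counts(domain))
```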
Martinez-Julia Expires August 24, 2018 [Page 12]
Internet-Draft Anticipated Adaptation of SDN/NFV Systems 2 2018
During the execution of the experiment, the detectors notify the
controller of the incident as soon as it happens. However, although
the clients are stimulated at the same time, there is some delay
between the occurrence of the incident and the moment the network
service receives the increase in the load. One of the main targets
of our experiment is to study such delay and take advantage of it to
anticipate the amount of servants required by the system. We discuss
it below.
In summary, this scenario highlights the main benefits of having ARCA
play the role of VIM and interact with the underlying OpenStack
platform. It is a step towards a more efficient use of resources,
which reduces the CAPEX of the system. Moreover, as the operation of
the system is autonomic, the involvement of human administrators is
reduced and, therefore, the OPEX is also reduced.
6.3. OpenStack Platform
The implementation of the scenario described above reflects the
requirements of any edge/branch networking infrastructure, which is
composed of several distributed micro-data-centers deployed in the
wiring centers of the buildings and/or storeys. We chose OpenStack
to meet such requirements because it is widely used in production
infrastructures, so the resulting infrastructure will have the
necessary robustness to accomplish our objectives while also
reflecting the typical underlying platform found in any SDN/NFV
environment.
We have deployed four separate network domains, each one with its own
OpenStack instantiation. All domains are fully capable of running
regular OpenStack workloads, i.e., executing VMs and networks, but, as
mentioned above, we designate domain 1 to be the headquarters of
the organization. The different underlying networks required by this
(quite complex) deployment are provided by several VLANs within a
high-end L2 switch. This switch represents the distributed network
of the organization. Four separate VLANs are used to isolate the
traffic within each domain, each connecting an interface of the
OpenStack controller and compute nodes of its domain. These VLANs
therefore form the distributed data plane. Moreover, another VLAN is
used to carry the control plane as well as the management plane,
which are used by the NFV-MANO, and thus by ARCA. ARCA is
instantiated in the physical machine called ARCA Node and uses this
VLAN to exchange control and management operations in relation to the
collector and enforcer defined in ARCA. This VLAN is shared among
all OpenStack domains to implement the global control of the
virtualization environment pertaining to the organization.
Finally, another VLAN is used by the infrastructure to interconnect
the data planes of the separate domains and also to allow all elements
of the infrastructure to access the Internet to perform software
installation and updates.
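The VLAN layout described above can be summarized as a small data
structure. This is only an illustrative sketch: the VLAN IDs and role
labels are placeholders that do not come from the testbed, and the
attachment of every domain to the shared VLANs is our reading of the
description.

```python
# Hypothetical VLAN plan; the VLAN IDs are placeholders, not testbed values.
VLAN_PLAN = {
    101: {"role": "data", "domains": [1]},
    102: {"role": "data", "domains": [2]},
    103: {"role": "data", "domains": [3]},
    104: {"role": "data", "domains": [4]},
    200: {"role": "control+management", "domains": [1, 2, 3, 4]},
    300: {"role": "interconnect+internet", "domains": [1, 2, 3, 4]},
}


def vlans_for(domain):
    """VLAN IDs that the nodes of the given domain are attached to."""
    return sorted(vid for vid, cfg in VLAN_PLAN.items()
                  if domain in cfg["domains"])


def vlans_with_role(role):
    """VLAN IDs that carry the given role."""
    return sorted(vid for vid, cfg in VLAN_PLAN.items()
                  if cfg["role"] == role)


print(vlans_for(1), vlans_with_role("data"))
```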
OpenStack is installed with the Red Hat OpenStack Platform, which is
tightly dependent on the Linux operating system and closely related
to the software developed by the OpenStack open source project. It
provides a comprehensive way to install the whole platform, is easily
customized to meet our specific requirements, and is backed by
operational-quality support.
The ARCA node is also based on Linux but, since it is not directly
related to the OpenStack deployment, it is not based on the same
distribution. It is just configured to be able to access the control
and management interfaces offered by OpenStack, and therefore it is
connected to the VLAN that hosts the control and management planes.
On this node we deploy the NFV-MANO components, including the micro-
services that form an ARCA instance.
In summary, we dedicate nine physical computers to the OpenStack
deployment, all of them Dell PowerEdge R610 with 2 x Xeon 5670 2.96 GHz
(6 core / 12 thread) CPU, 48 GiB RAM, 6 x 146 GiB HD at 10 kRPM, and
4 x 1 GE NIC. Moreover, we dedicate an additional computer with the
same specification to the ARCA Node. We dedicate a less powerful
computer to implement the physical router because it will not be
involved in the general execution of OpenStack nor in the specific
experiments carried out with it. Finally, as detailed above, we
dedicate a high-end physical switch, an HP ProCurve 1810G-24, to
build the interconnection networks.
6.4. Initial Results
Using the platform described above we execute an initial but long-
lasting experiment based on the target scenario introduced at the
beginning of this section. The objective of this experiment is
twofold. First, we aim to demonstrate how ARCA behaves in a real
environment. Second, we aim to stress the coupling points between
ARCA and OpenStack, which will reveal the limitations of the existing
interfaces.
With such objectives in mind, we define a timeline that will be
followed by both clients and external event detectors. It forces the
virtualized system to experience different situations, including
incidents of various severities. When an incident is found in the
timeline, the detectors notify the ARCA-based VIM and the clients
change their request rates, which depend on the severity of the
incident. This behavior is discussed at length in ICIN 2018
[ICIN-2018], which describes how users behave after a disaster or
another similar incident occurs.
The ARCA-based VIM will learn of the occurrence of the incident from
two sources. First, it will receive the notification from the event
detectors. Second, it will notice the change in the CPU load of the
servants assigned to the target service. In this situation, ARCA has
different options to overcome the possible overload (or underload) of
the system. We explore the anticipation approach discussed in depth
in ICIN 2018 [ICIN-2018]. Its operation is enclosed in the analyzer
and decider, and it is based on an algorithm that is divided into two
sub-algorithms.
The first sub-algorithm reacts to the detection of the incident and
subsequently correlates its severity to the amount of servants
required by the system. This sub-algorithm hosts the regression of
the learner, which is based on the SVM/SVR technique, and predicts
the necessary resources from two features: the severity of the
incident and the time elapsed since it happened. The resulting
amount of servants is established as the minimum amount that the VIM
can use.
The second sub-algorithm is fed with the CPU load measurements of the
servants assigned to the service, as reported by the OpenStack
platform. With this information it checks whether the system is
within the operating parameters established by the NFVO. If not, it
adjusts the resources assigned to the system. It also uses the
minimum amount established by the other sub-algorithm as the basis
for the assignment. After every correction, this sub-algorithm
learns the behavior by adding new correlation vectors to the SVM/SVR
structure.
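The interplay of the two sub-algorithms can be sketched as follows.
This is not the ARCA implementation: a nearest-entry lookup table
stands in for the SVM/SVR learner, and the load boundaries, the
single-servant adjustment step, the base minimum, and all table
entries other than severity 3 at 10 seconds (taken from the example
given later in this section) are hypothetical. Only the hard maximum
of 15 servants comes from the text.

```python
HARD_MAX = 15            # hard maximum number of servants (from the text)
BASE_MIN = 1             # hypothetical minimum when there is no incident
LOW, HIGH = 0.30, 0.70   # hypothetical CPU load boundaries set by the NFVO

# Hypothetical stand-in for the trained SVM/SVR learner:
# (severity, seconds elapsed) -> minimum server assignation (MSA).
LEARNED = {(3, 10): 6, (3, 15): 8, (3, 25): 1}


def anticipate_msa(severity, elapsed):
    """First sub-algorithm: map an incident to the minimum servant count."""
    if severity == 0:
        return BASE_MIN
    # Use the entry with the closest elapsed time for this severity, if any.
    known = [(abs(t - elapsed), n)
             for (s, t), n in LEARNED.items() if s == severity]
    return min(known)[1] if known else BASE_MIN


def adjust(current, avg_cpu, msa):
    """Second sub-algorithm: react to measured load, never below the MSA."""
    if avg_cpu > HIGH:
        current += 1
    elif avg_cpu < LOW:
        current -= 1
    return max(msa, min(HARD_MAX, current))


def learn(severity, elapsed, servants):
    """Record a corrected assignment as a new correlation vector."""
    LEARNED[(severity, elapsed)] = servants


msa = anticipate_msa(severity=3, elapsed=10)
print(msa, adjust(current=2, avg_cpu=0.5, msa=msa))
```

In this sketch the anticipated minimum acts as a floor: even when the
measured load is still within bounds, the assignment is raised to the
MSA before the burst of requests arrives.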
When the experiment is running, the collector component of the ARCA-
based VIM is attached to the Telemetry interface of OpenStack by
using the SDK to access the measurement data generated by Ceilometer
and stored by Gnocchi. In addition, it is attached to the external
event detectors in order to receive their notifications. On the
other hand, the enforcer component is attached to the Compute
interface of OpenStack by also using its SDK to request the
infrastructure to create, destroy, query, or change the status of a
VM that hosts a servant of the controlled system. Finally, the
enforcer also updates the lists of servers used by the load balancers
to distribute the clients among the available resources.
During the execution of the experiment we have the ARCA-based VIM
report the severity of the last incident, if any, the time elapsed
since it occurred, the amount of servants assigned to the controlled
system, the minimum amount of servants to be assigned, as determined
by the anticipation algorithm, and the average load of all servants.
In this instance, the severities are spread between 0 (no incident)
and 4 (strongest incident), the elapsed times are less than 35
seconds, and the minimum server assignation (MSA) is below 10,
although the hard maximum is 15.
With such measurements we illustrate how the learned correlation of
the three features (dimensions) mentioned above is achieved. Thus,
when there is no incident (severity = 0), the MSA is kept to the
minimum. In parallel, regardless of the severity level, the
algorithm learned that there is no need to increase the MSA for the
first 5 to 10 seconds. This shows the behavior discussed in this
document, namely that there is a delay between the occurrence of an
event and the actual need for an updated amount of resources, and it
forms one fundamental aspect of our research.
By inspecting the results, we know that there is a burst of client
demands that peaks around 15 seconds after the
occurrence of an incident or any other change in the accounted
severity. We also know that the burst lasts longer for higher
severities, and it fluctuates a bit for the highest severities.
Finally, we can also notice that for the majority of severities, the
increased MSA is no longer required after 25 seconds from the time
the severity change was notified.
All that information becomes part of the knowledge of ARCA and it is
stored both by the internal structures of the SVM/SVR and, once
represented semantically, in the semantic database that manages the
knowledge base of ARCA. Thus, it is used to predict any future
behavior. For instance, if an incident of severity 3 occurred 10
seconds ago, ARCA knows that it will need to set the MSA to 6
servants. In fact, this information has been used during the
experiment, so we can also know the accuracy of the algorithm by
comparing the anticipated MSA value with the required value (or even
the best value). However, the analysis of such information is left
for the future.
While preparing and executing the experiment we found several
limitations intrinsic to the current OpenStack platform. First,
regardless of the CPU and memory resources assigned to the underlying
controller nodes, the platform is unable to record and deliver
performance measurements at intervals shorter than 10 seconds, so it
is currently not suitable for real-time operations, which are
important for our long-term research objectives. Moreover, we found
that the time required by the infrastructure to create a server that
hosts a somewhat heavy servant is around 10 seconds, which is too far
from our targets. Although these limitations may be reduced in the
future, they clearly justify that our anticipation approach is
essential for the proper working of a virtual system and, thus, the
integration of external information becomes mandatory for future
system management technologies, especially considering the
virtualization environments.
Finally, we found it difficult to have the required measurements
pushed to external components, so we had to poll for them. The
alternative is to instantiate some component of ARCA alongside the
main OpenStack components and services so it has first-hand and
prompt access to such features. This way, ARCA could receive push
notifications with the measurements, as it does from the external
detectors. This is a key aspect that affects the placement of the
NFV-VIM, or some subpart of it, in the general architecture.
Therefore, for future iterations of the NFV reference architecture,
an integrated view of the VIM and the NFVI could be required to
reflect the future reality.
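The polling workaround mentioned above can be pictured with the
following minimal sketch. It is not the ARCA collector: `fetch` is a
hypothetical callable that would wrap a query to the telemetry
service, `handle` is a hypothetical consumer of new samples, and
samples are assumed to arrive ordered by timestamp. The default
interval mirrors the 10-second measurement granularity reported
above.

```python
import time


def poll(fetch, handle, interval=10.0, rounds=3, sleep=time.sleep):
    """Call fetch() every `interval` seconds; pass unseen samples to handle().

    `fetch` is a hypothetical callable returning (timestamp, value) tuples
    ordered by timestamp; `handle` receives each sample exactly once.
    """
    last_seen = None
    for i in range(rounds):
        for ts, value in fetch():
            # Skip samples already delivered in a previous round.
            if last_seen is None or ts > last_seen:
                handle(ts, value)
                last_seen = ts
        if i < rounds - 1:
            sleep(interval)


# Usage with canned data and no real waiting.
_batches = iter([[(0, 0.2)], [(0, 0.2), (10, 0.8)], [(10, 0.8), (20, 0.9)]])
received = []
poll(lambda: next(_batches), lambda ts, v: received.append((ts, v)),
     sleep=lambda seconds: None)
print(received)
```

With polling, a load change can go unnoticed for up to one full
interval, which is one reason why the push notifications from the
external detectors are valuable.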
7. Relation to Other IETF/IRTF Initiatives
TBD
8. IANA Considerations
This memo includes no request to IANA.
9. Security Considerations
The major security concern of the integration of external event
detectors and ARCA to manage SDN/NFV systems is that the boundaries
of the control and management planes are crossed to introduce
information from outside. Such communications must be strongly
secured, since malfunctions or explicit attacks might compromise the
integrity and execution of the controlled system. However, it is up
to implementers to deploy the necessary countermeasures to avoid such
situations. From the design point of view, since all operations are
performed within the control and/or management planes, the security
level of the current solution is inherited from, and thus determined
by, the security measures established by the systems that make up
such planes.
10. Acknowledgements
TBD
11. References
11.1. Normative References
[draft-geng-coms-architecture-01]
"COMS Architecture", 2018, <https://www.ietf.org/id/
draft-geng-coms-architecture-01.txt>.
[draft-qiang-coms-netslicing-information-model-02]
"Technology Independent Information Model for Network
Slicing", 2018, <https://www.ietf.org/id/
draft-qiang-coms-netslicing-information-model-02.txt>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>.
11.2. Informative References
[ETSI-NFV-IFA-004]
ETSI NFV GS NFV-IFA 004, "Network Functions Virtualisation
(NFV); Acceleration Technologies; Management Aspects
Specification", 2016.
[ETSI-NFV-IFA-005]
ETSI NFV GS NFV-IFA 005, "Network Functions Virtualisation
(NFV); Management and Orchestration; Or-Vi reference point
- Interface and Information Model Specification", 2016.
[ETSI-NFV-IFA-006]
ETSI NFV GS NFV-IFA 006, "Network Functions Virtualisation
(NFV); Management and Orchestration; Vi-Vnfm reference
point - Interface and Information Model Specification",
2016.
[ETSI-NFV-IFA-019]
ETSI NFV GS NFV-IFA 019, "Network Functions Virtualisation
(NFV); Acceleration Technologies; Management Aspects
Specification; Release 3", 2017.
[ETSI-NFV-MANO]
ETSI NFV GS NFV-MAN 001, "Network Functions Virtualisation
(NFV); Management and Orchestration", 2014.
[ICIN-2017]
P. Martinez-Julia, V. P. Kafle, and H. Harai, "Achieving
the autonomic adaptation of resources in virtualized
network environments", in Proceedings of the 20th ICIN
Conference (Innovations in Clouds, Internet and Networks,
ICIN 2017), Washington, DC, USA: IEEE, pp. 1-8, 2017.
[ICIN-2018]
P. Martinez-Julia, V. P. Kafle, and H. Harai,
"Anticipating minimum resources needed to avoid service
disruption of emergency support systems", in Proceedings of
the 21st ICIN Conference (Innovations in Clouds, Internet
and Networks, ICIN 2018), Washington, DC, USA: IEEE,
pp. 1-8, 2018.
[OPENSTACK]
The OpenStack Project, "http://www.openstack.org/", 2018.
Author's Address
Pedro Martinez-Julia (editor)
NICT
4-2-1, Nukui-Kitamachi
Koganei, Tokyo 184-8795
Japan
Phone: +81 42 327 7293
Email: pedro@nict.go.jp