Internet DRAFT - draft-mtoy-anima-self-faultmang-framework
draft-mtoy-anima-self-faultmang-framework
Internet Engineering Task Force M. Toy
Internet-Draft Comcast
Intended status: Informational June 29, 2015
Expires: December 31, 2015
Architectural Framework for Self-Managed Networks with Fault Management
Hierarchy
draft-mtoy-anima-self-faultmang-framework-00.txt
Abstract
This document describes a self-managed network identifying network
problems during failures and repairing them. Self-managed Network
Element (sNE) architectures and Network Management System (sNMS)
architectures for centrally and distributedly managed networks are
described. A hierarchy among repairing entities is defined. An in-
band message format for Metro Ethernet networks is proposed for the
fault management communication.
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on December 31, 2015.
Copyright Notice
Copyright (c) 2015 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
Toy Expires December 31, 2015 [Page 1]
Internet-Draft self-faultmang-framework June 2015
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. sNE Architecture . . . . . . . . . . . . . . . . . . . . . . . 3
3. Self-Managing Network Management System (sNMS) Architecture . 6
4. Intelligent Agent Architecture . . . . . . . . . . . . . . . . 9
5. Self and Centrally Managed Networks . . . . . . . . . . . . . 9
6. Self and Distributedly Managed Networks . . . . . . . . . . . 10
7. In-band Communications of Failure types, Estimated Fix
Time and Fix . . . . . . . . . . . . . . . . . . . . . . . . . 11
8. Failure Fixing Hierarchy in Centrally Managed Networks . . . . 13
9. Failure Fixing Hierarchy in Distributedly Managed Networks . . 14
10. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . 15
11. Security Considerations . . . . . . . . . . . . . . . . . . . 16
12. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 16
13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 16
13.1. Normative References . . . . . . . . . . . . . . . . . . 16
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 16
Toy Expires December 31, 2015 [Page 2]
Internet-Draft self-faultmang-framework June 2015
1. Introduction
The industry is focused on auto-configuration [GANA], [SUSEREQ],
[SELFMAN] and monitoring of network resources and services, isolating
problems when there are failures, and fixing them by sending
technicians to the sites most of the time or downloading certain
configuration files remotely for configuration related problems. The
concept of network identifying problems by itself and fixing them and
only sending technicians to the failure site only when there are
single-point of hardware failures (i.e. there is no hardware
redundancy) is not practiced [SMN,SMCEN]. Tools for self-managed
networks are not developed either. On the other hand, auto-
configuration of network elements (NEs) such as cable modem (CM) and
cable modem termination system (CMTS) is being practiced by Multiple
System Operators (MSOs) using Data Over Cable Service Interface
Specification (DOCSIS) back-office systems. Similar procedures are
also used by DPoE networks [DPoE] for auto-configuration of NEs and
services. This draft does not discuss the auto-configuration, but
focuses on fault management aspects of self and centrally or
distributedly managed networks.
This draft describes a self-managed network where each self-managed
NE (sNE) in a network monitors its hardware and software resources
periodically, runs diagnostics tests during failures in a
hierarchical fashion, identifies problems if they are local to the
sNE and fixable by the sNE, and reports failures and fixes to a
centralized network management system (sNMS) to be accessed by
network operators, field technicians, customers, and other sNEs in
the network. If the problem is not locally fixable by the sNE, the
Self-Managing Regional NMS (sNMRn) or sNMS runs its own rule-based
logic to determine if the problem is fixable remotely by the sNMRn or
sNMS. If it is not, a message (i.e.notification) is sent to a
network operator or field technician to fix the problem.
Failure type; if the problem is fixable locally by sNE, remotely by
sNMRn or sNMS, or remotely by a technician; and estimated fix time
are communicated with a newly defined message format. The hierarchy
of fixing failures is network architecture dependent, as discussed in
sections 8 and 9.
2. sNE Architecture
An sNE (Figure 1) consists of an intelligent NE (iNE) and intelligent
agents.
The intelligent NE (iNE) is built to have redundant hardware and
software components as depicted in Figure 1, where each hardware or
Toy Expires December 31, 2015 [Page 3]
Internet-Draft self-faultmang-framework June 2015
software component is intelligent enough to run its own diagnostics
and identify faulty subcomponents. Self-managing agents (i.e.
intelligent agents) may take over after internal diagnostics of each
component is completed. Furthermore, iNE keeps a redundant copy of
its current or default configuration.
The intelligent subcomponents can be smallest replaceable units such
as chips, operating system, and protocol software that are capable of
periodic self-checking, declaring a failure when it is unable to
perform its functions, running diagnostics and identifying whether
the faulty entity is within the subcomponent or not, escalating the
diagnostics to the next level in the hierarchy when the diagnostics
are inconclusive.
When there is a failure, if failed entity is unidentified as a result
of the diagnostics tests run by the intelligent subcomponents, the
iNE is able to run diagnostics for a pre-defined set of subcomponents
that are collectively performing a specific function. A pre-defined
set of subcomponents can be a collection of components that are
contributing to the realization of a main function such as packet
forwarding, deep packet inspection, event forwarding, etc.
If the diagnostics tests ran for a pre-defined set of subcomponents
cannot identify the failed entity, the iNE is able to run diagnostics
at NE level to determine the failure. After the failure is
identified to the smallest replaceable hardware (e.g. chips, wires
connecting chips, backplane, etc.) and/or software entity (e.g.
kernel, log, protocol software, event forwarding discriminator,
etc.), the responsible intelligent agents determine if the failure is
fixable and initiates a message to related parties with estimated fix
time to repair. If the iNE diagnostics are inconclusive, then that
will be communicated as well.
Each self-managing agent (i.e. intelligent agent) monitors the entity
that it belongs to, and may run additional diagnostic tests to
identify problems during failures, initiates a failure message, fixes
problems, and initiates a fix notification to the central or regional
self-managing systems and other related entities. The message (i.e.
notification) indicating that the fixing entity is sNE, is
communicated to other sNEs, regional and central network management
systems systems, field technicians and customers (if desired). If
the problem is determined to be not fixable locally after two-three
tries or without a try, depending on the problem, a message is sent
to the regional or central network management systems by the sNE
indicating that the fixing entity is unidentified.
The intelligent agents are one or more intelligent Hardware
Maintenance Agent(s) (iHMA(s)), intelligent Operating System
Toy Expires December 31, 2015 [Page 4]
Internet-Draft self-faultmang-framework June 2015
Maintenance Agent (s) (iOMA (s)), intelligent Application Maintenance
Agent (s) (iAMA (s)), and intelligent Capacity Management Agent (s)
(iCMA (s)), depending on the implementation.
The iHMA is capable of periodically monitoring hardware entities such
as CPU, memory, physical ports, communication channels, buffers,
backplane, power supplies, etc., and initiating pre-defined
maintenance actions during hardware failures. iOMA is capable of
periodically monitoring operating system and initiating pre-defined
maintenance actions during Operating System failures. The iAMA is
capable of periodically monitoring application software and protocol
software, and initiating pre-defined maintenance actions during
application and protocol software failures. The iCMA is capable of
periodically monitoring system capacity, load and performance, and
collecting measurements. When capacity thresholds are exceeded, the
iCMA initiates pre-defined maintenance actions.
+----------------------------------------------------------------+
| +--------------------------------------------------+ |
| | iNE | |
| +-------+ | +------------------+ +---------------------+ | |
| | iCMA | | | intelligent and | | intelligent and | | |
| | | | | redundant HW | | redundant SW | | |
| +-------+ | | subcomponent 1 | | subcomponent 1+ | | |
| +-------+ | +---------+--------+ +----------+----------+ | |
| | iOMA | | | | | |
| | | | | |
| +-------+ | | | | |
| +-------+ | +---------+---------+ +----------+-----------+ | |
| | iAMA | | | intelligent and | | intelligent and | | |
| | | | | redundant HW | | redundant SW | | |
| +-------+ | | subcomponent N | | subcomponent N | | |
| +-------+ | +-------------------+ +----------------------+ | |
| |iHMA | +--------------------------------------------------+ |
| | | +------------------------+ |
| +-------+ |backup copy of current | |
| |configuration files | |
| +------------------------+ |
+----------------------------------------------------------------+
Figure 1: Self-Managed NE Architecture
Toy Expires December 31, 2015 [Page 5]
Internet-Draft self-faultmang-framework June 2015
+----------------------------------------------------------------------+
| +--------------------------------------------------------+ |
| | Hierarchical Diagnostic and Trouble Isolation Logic | |
| +--------------------------------------------------------+ |
| |---------------------------------------------------------+ +--+ +--+|
| |------------------------------------------------------+ | | | | ||
| || +---------------------+ +---------------------+ | | |S | |S ||
| || |intelligent smalles | |intelligent smallest | | | |u | |u ||
| || |replaceable unit | ++ |repleacable unit | | | |b | |b ||
| || | ISRU) | |(ISRU) | | | + | + ||
| || +---------------------+ +---------------------+ | | |C | |C ||
| || a group of ISRUs providing functions collectively | | |o | |o ||
| +-----------------------------+------------------------+ | |m | |m ||
| | | | |p | |p ||
| | | | |o | |o ||
| +------------------------------------------------------+ | |n | |n ||
| || +--------------------+ +----------------------+ | | |e | |e ||
| || |intelligent smallest| |intelligent smallest | | | |n | |n ||
| || |replaceable unit | ++ |replaceable unit | | | |t | |t ||
| || |(ISRU) | |(ISRU) | | | | | | ||
| || +--------------------+ +----------------------+ | | |2 | |N ||
| || a group of ISRUs providing functions collectively | | | | | ||
| +------------------------------------------------------+ | | | | ||
| | SubComponent 1 | | | | ||
| +---------------------------------------------------------+ +--+ +--+|
++ --------------------------------------------------------- +---------+
Figure 2: Intelligent NE Architecture
3. Self-Managing Network Management System (sNMS) Architecture
A Central or Regional sNMS consists of an intelligent NMS (iNMS) that
mainly deals with remote fixes, a Task Manager (TM) to manage tasks
to be executed, copies of software modules for each type of sNE, a
Traffic Manager (TrfMgr) to deal with network level traffic
management issues such as routing policies, load balancing,
connection admission control, congestion control, Event Forwarding
Discriminator (EFD) to forward failures and fixes to network
operators and customers, data base(DB) to store data, and a user
interface such as a Graphical User Interface (GUI) (Figure 3). The
sNMS is redundant where the active sNMS is protected by a stand-by
sNMS. The iNMSs in active and stand-by units perform periodic self-
checking. When the active sNMS fails, the stand-by sNMS takes over
the responsibilities.
The user interface provides human and machine interfaces. A Database
(DB) stores user interface events and data collected from network. A
Toy Expires December 31, 2015 [Page 6]
Internet-Draft self-faultmang-framework June 2015
Task Manager prioritizes and schedules execution of the tasks
including repair and configuration of activities that can be
performed remotely using a Rule Based Logic module. A Data Handler
collects end-to-end connection level measurements and sNE level
capacity measurements, and stores them in the DB to support the
TrfMgr.
The Task Manager (TM) of sNMS manages tasks to be executed by the
sNMS. The Rule-Based Logic determines if the problem is remotely
fixable by the iNMS.
The iNMS is expected to include a Fix Manager (FixMgr) for each sNE
type to fix the sNE problems remotely; store software modules
specific to the sNE; and capable of running network level traffic
management algorithms such as routing policies, load balancing,
connection admission control and congestion control. Furthermore,
the iNMS holds a copy of each sNE agent and remotely loads into sNEs
when needed.
Toy Expires December 31, 2015 [Page 7]
Internet-Draft self-faultmang-framework June 2015
+---------------------------------------------------------------------+
| GUI |
+--------------------------------------------------------+--+---------+
| iNMS |event |
+--------------------------------------------------------+ |forwarder|
|| hierarchical network level diagnostics and Trouble || +--+------+
|| identification logic || |
|-------------------+--------------------------+----------| +--+--++--+
||copies of iNE | |copies of iNE | |Fix Mgr|| |rule ||DB|
||Type 1 Software: | + ++Type N Software: | |for iNE|--|based|--+|
||Operating System, | |Operatinng System, | |Type 1 || |logic|+--+
||Applications, etc.| |Applications,etc. | +---+----+ +-----++---+
|-------------------+ +---------------- ----+ | | |Task ||Trf|
|------------------+ ++ +----------------------+ +---+----+ | mgr ++Mgr|
||periodic network | |traffic management | |Fix Mgr|| | |+---+
||level monitoring, | |algorithms and policies |for iNE|| +--+--+
||network level troubl| |, connection admission| |Type N || |
||isolation and fixing| |control, load balancing +--------+ +--+----+
||network level | |, congestion control | | |data |
||troubles | | | | |handler|
|---------------------+ +----------------------+ | +-------+
|---------------------+ +----------------------+ |
||periodic self checking |rule based logic to | |
||and switchover to | | verify iNE reporting | |
||back-up iNMS during | |of that failure is not |
|| failures. | |local and if it is | |
+---------------------+ |fixable by iNMS | |
| +----------------------+ |
+---------------------------------------------------+-----+
Figure 3: Self Managing NMS Architecture
An intelligent NMS (iNMS) (Figure 3) periodically monitors the
network that sNMS is managing, identifies network level failures,
estimates and communicates the fix time to related parties, and fixes
them. When the sNE reports that the failure is not local (i.e.
either tests are inconclusive or sNE is not capable of fixing it),
the Rule Based Logic of the sNMS verifies if the sNE failure is not
local.
There are no changes introduced to interfaces between the management
systems and the network for self-management. The well-known
protocols such as SNMP, IPDR (IP Detail Record) for usage
information, Network Configuration (NETCONF) for manipulating
configuration data and examining state information, and YANG modeling
can be employed.
Toy Expires December 31, 2015 [Page 8]
Internet-Draft self-faultmang-framework June 2015
4. Intelligent Agent Architecture
The intelligent agent architecture is depicted in Figure 4. Its Rule
Based Logic module determines problems and initiates fixes if the
problems are local to sNE, initiates tests for the fixes, determines
if the fix procedure or a step or some of the steps are to be
repeated, and initiates a message to all related parties about the
fix. If the problem is not local to the sNE, the agent informs all
related parties including the sNMS for its conclusion which is that
the fixing entity is unidentified. If the result of diagnostics
cannot identify the failed component which is inconclusive, that will
be conveyed as well.
A Scheduler module determines the priority and order of the tasks for
each functional entity within the sNE that it belongs to. An
Application Programming Interface (API) provides an interface to
various types of Software and Hardware entities within the sNE. A
Data Handler module collects necessary data for the sNE, performs the
fix, and keeps the data associated with the task. The Authorization
(AUTH) module authenticates local user access and remote user access
from the sNMS interface to sNE agents. The Utilities module supports
various file operations.
+-----------------------------------------------------------+
| |
| +------+ +----------------------------+ +-------+ | +--------+
| |Rule | | Scheduler +-+ | | |NE |
| |Based | +----------------------------+ | | | | |
| +Logic + | API | | |SW/HW |
| ------- +---------------------------+ | | +-+Modules |
| +-------------+ | Data Handler | | | | | |
| |Authorization| | +-+ | | +--------+
| +-------------+ +---------------------------+ +---+---+ |
| | Task|| Fix ||Data | | |
| | Data|| Delivery+|Collector| | |
| | || Agent | ---------+ +---+-------+
| +----------------+ |Utilities ||
| +-----------|
+-----------------------------------------------------------+
Figure 4: Intelligent Agent Architecture
5. Self and Centrally Managed Networks
A self and centrally managed network architecture consisting of self-
managed NEs and self-managing NMS is depicted in Figure 5.
Toy Expires December 31, 2015 [Page 9]
Internet-Draft self-faultmang-framework June 2015
sNE related failures are handled locally by the sNE. If the problem
is determined to be not fixable by the sNE after two or three tries
or without a try, depending on the problem, a message is sent to the
sNMS by the sNE indicating that the fixing entity is unidentified.
If the problem is locally fixable, sNE send a message to SNMS, other
sNEs, field technicians and users, indicating the fixing entity and
how long the fix is going to take.
+-----------+ +-----------+
| iNE | SNMP/YANG + iNMS |
| +-----+XXXXXXXXXXXX +---------- | TrfMgr, |
|iHMA, iCMA | X|X XX+XXXX |GUI,TM, EFD|
|iOMA, iAMA | XX| XXXX |DB, FixMgr |
+-----------+ XX + XX +-----------+
sNE X XX sNMS
X Network X
XX X
X XX
+-----------+ X XX +-----------+
|iNE | X+X XXX+---+iNE |
| +---+XXXXXXXXXXXX X XX+ | |
|iHMA, iCMA | XXX |iHMA, iCMA |
|iOMA, iAMA | |iOMA, iAMA |
+-----------+ +-----------+
sNE sNE
Figure 5: Self and Centrally Managed Network Architecture
6. Self and Distributedly Managed Networks
A self and distributedly managed network architecture is given in
Figure 6. The network is divided into multiple regions where each
region is managed by a self-managing NMS (sNMRn ). One of the sNMRs
in the network acts as the central sNMS. The regional self-managing
sNMRs and central self-managing NMS are connected to each other via
in-band and/or out-of-band communications facilities.
Toy Expires December 31, 2015 [Page 10]
Internet-Draft self-faultmang-framework June 2015
+--------------+ +--------------+
|Regional | | Self-Managing|
+---+ |Self-Managing +--------+ NMS +--+
|sNE| +---+NMS | +-------+------+ |
+---+ XXXXXXXX+ +--------------+ | |
XXXXX XX XXXXXXX++ |
+---+ XX Network + Links Connecting XXXXX XXXXXX+ |
|sNE| X Region 1 +-----------------------+ Network X |
+---+ XXXX XXXXXXXXXX Regional Networks X Region N X |
XXX +------------+X XXX |
+ | XXXXXXXXXXXXXXXX |
| | |
+---+ XXXXX+XXXXXXXXXX+--------+ in-band or out-of |
|sNE| XX Network X| +--------------------+
+---+ X Region 2 X| +--------------+ | band Connectivity
+---+ X X+---+Regional +---+ among NMSs
|sNE| XXXXXXXXXXXXXXXXX |Self-Managing |
+---+ X |NMS 2 |
+--------------+
Figure 6: Self Distributedly Managed Network Architecture
sNMRn provides all the centralized management functions for its own
subnet and informs the central sNMS about its activities. End-to-end
network level activities beyond region boundaries will be left to
sNMS. These activities can be Connection Admission Control (CAC),
load balancing, and congestion control at end-to-end network level.
7. In-band Communications of Failure types, Estimated Fix Time and Fix
In today's networks, failures related to equipment, ports and
connections are mostly reported to an NMS via SNMP traps or in-band
communications to NEs via AIS (Alarm Indication Signal), RDI (Remote
Defect Indicator), Connectivity Check Message (CCM) related events
such as Loss of Continuity (LoC) [Y.1731], etc. These alarms and
traps identify the failed NE, port, or connection, but don't identify
the component contributing to the failure. Furthermore, each has a
different format.
For self-management, it is necessary to identify faulty components,
estimate the time for fix, and communicate that to all parties
involved (i.e. sNEs, sNMRn , sNMS, field technicians, and customers),
so that working sNEs can store (if desired) data routed to the failed
sNE(s) for the duration of fix or re-route traffic around the failed
sNE(s) or port(s). For simplicity, all messages should have the same
format.
Toy Expires December 31, 2015 [Page 11]
Internet-Draft self-faultmang-framework June 2015
Figure 7 depicts a possible Ethernet frame for Ethernet networks to
carry all the information described above. Similar messages are to
be created for other types of networks such as IP, MPLS and IMS.
+-----------------------------------+--------------+-----+------------+
|IFG|P |SFD|DA|SA|L/T|fNE|fComp|Op | Failure| Fix | Fix | PAD (25|CRC|
| | | | | | |ID |ID |Code| Code | Code| Time|bytes 0)| |
+---+--+---+--+--+---------------------------+--------------------+---+
IFG: Interframe Gap, 12 bytes
P/SFD (Preamble/Start of Frame Delimiter)-8 Bytes(P-7 bytes, SFD-1 byte)
L/T (Length/Type) : Length of frame or data type, 2 bytes (0x8808)
CRC: 4 bytes
DA: 01:80:C2:00:00:02 (6 bytes)-Slow protocol multicast address
fNE ID: 6 bytes, Failed sNE Identifier
fComp ID: 4 bytes, Failed Component Identifier
Op Code: 2 bytes-0x0202 for Disabled and 0x0303 for Enabled status
Failure Code : 4 bytes
Fix Code: 1 byte identifying fixing entity, NE (x00), sNMS (x01),
sRMS (x02), sNMS-v (x03), RNMS-v (x04), sNMS-s (x05),
sRNMS-s (x06), field technician (x07),
unidentified entity or inconclusive diag(x08)
Fix Time: 4 bytes indicating fix time in seconds by NE, NMS,
or field technician
Figure 7: Self-Managing message frame format for Self-managed
Ethernet networks
For Ethernet networks, slow protocol multicast address can be used to
inform sNEs, sNMS, and field technician devices connected to the
network. fNE ID indicates MAC address of the failed sNE. fComp ID
indicates the failed component identifier within the sNE. Op Code
indicates whether the sNE or port is operationally disabled or
enabled.
This operational status is disabled during failures and becomes
enabled after the failure is fixed. Failure Code indicates failure
type. If failure type is unidentified thru diagnostics, Failure Code
will be unidentified or inconclusive or the failure is not-local to
sNE. Fix Code identifies repairing entity whether it is sNE, sNMRn ,
sNMS, or a field technician.
It is possible to allocate six bytes to Fix Code field to indicate
MAC address of the fixing entity. It is also possible to identify
the failure type and not fix it. In this case, fixing entity is
unidentified. It is also possible that both failure code and fix
code are unidentified. Fix time indicates the estimated time in
Toy Expires December 31, 2015 [Page 12]
Internet-Draft self-faultmang-framework June 2015
seconds for repair which is set by the repairing entity. In order
for sNE, sNMRn , or sNMS to provide the estimated fix time, the fix
time for each type of failure needs to be stored in sNE and sNMRn or
sNMS. If the failure is going to be fixed by a field technician, the
technician may enter fix time manually into the related management
system to communicate that to all related parties.
Given the sNMRn and sNMS interface uses network management protocols
such as SNMP, the information in the message (Figure 5) needs to be
conveyed to sNMS via an SNMP trap. Similarly the SNMP trap from
sNMRn and sNMS needs to be converted into an in-band message to
convey the information to self-managing NEs.
8. Failure Fixing Hierarchy in Centrally Managed Networks
+---------+
| Failure |
| in sNE |
+----+----+
|
|
XXX+XX
No XXXX XXXXX Yes
+--------+XXXis it locallyXXX+-------+
| XXX fixable by XXXX |
| XXXX sNE? XXXX |
XX+XXX XXXX XXX |
No XXXXXX XXXXXXX Yes +-----------+-----------+
+----+XX is it remotelyX+------+ |sNMS,field technicians,|
| XX fixable by XX | |sNEs and users wait for|
| X sNMS? XXXXX | |estimated Fix Time for |
| XXX XXXXX | |notification from sNE |
| XXXXX | +-----------------------+
| |
| |
| |
+-----+-------------------+ +------+-----------------+
|Field technician sets fix| |sNMS sets Fix Time in |
|time in notification and | |notification and sends |
|nitiate the notification | |the notification to a |
|to sNEs, sNMS and users | |sNE to communicate that |
+-------------------------+ |to other sNEs, field |
|technicians and users |
+------------------------+
Figure 8: Fault Management Hierarchy for Self and Centrally Managed
Toy Expires December 31, 2015 [Page 13]
Internet-Draft self-faultmang-framework June 2015
Networks
In a centrally managed network, when there is a failure, sNE
determines if the failure is local to the sNE or not. If the failure
is local, then the sNE informs other sNEs, sNMS, field technicians
and customers about failure type and fix time. If NE decides that
the failure is not local to sNE, then sNE escalates the problem to
the sNMS. The sNMS verifies that it is not local to the sNE and
determines if it can fix the problem. If the sNMS can fix the
problem, the sNMS communicates the failure type and fix time to sNEs,
field technicians and customers. If the sNMS determines the failure
is not fixable, the sNMS escalates the problem to field technicians.
The field technician communicates fix time to sNEs, the sNMS and
customers. After the fix is completed, the fixing entity initiates a
self-managed notification with Enabled status (i.e. Opcode is set to
Eanabled) to other sNEs, the sNMS, and customers. Both sNMS and
field technicians use one of the sNEs to send notifications to the
remaining interested parties.
The sNMS and field technician communicates failures and fixes via a
message from the sNMS. If there is a node failure (i.e. sNE
completely fails due to a power failure for example), neither the
sNMS nor field technicians is able to communicate with the sNE.
Therefore, the sNMS and field technicians would use another sNE to
communicate the failure.
9. Failure Fixing Hierarchy in Distributedly Managed Networks
In distributed architecture, the network is divided into sub-networks
(I.e. regional networks), where each sub-net has its own sNMRn .
sNMRn provides all the centralized management functions for its own
subnet and informs sNMS about its activities.
End-to-end network level monitoring and problem fixing beyond
regional boundaries are left to sNMS. These activities can be
Connection Admission Control (CAC), load balancing, and congestion
control at network level.
Toy Expires December 31, 2015 [Page 14]
Internet-Draft self-faultmang-framework June 2015
+---------+
| Failure |
| in sNE |
+----+----+
|
|
XXX+XX
No XXXX XXXXX Yes
+--------+XXXis it locallyXXX+-+
| XXX fixable by XXXX |
| XXXX sNE? XXXX |
XX+XXX XXXX XXX |
No XXXXXX XXXXXXX Yes +-------------+------------+
+---+XX is it remotelyX+----+ |sNMS,sNMRs, sNMS, field |
| XX fixable by XX | |technicians and users wait|
| X sNMR? XXXXX | |for estimated Fix Time for|
| XXX XXXXX | |notification from sNE |
| XXXXX | +--------------------------+
No XXXXXXXXXXXXXXXXXX Yes +-+----------------------+
+---+XXis it remotelyXX+----------+ |sNMR sets Fix Time in |
| XX fixable by XXX | |notification and sends |
| X sNMS XXXX | |the notification to |
| XXXXXXXXXXXX | | a sNE to |
| | | communicate that to |
| | |other sNEs, sNMRs, sNMS |
| | | and users |
| | +------------------------+
+--+---------------------- +---+---------------------+
|Field technician sets fix |sNMS sets Fix Time in |
|time in notification and |notification to a sNE to |
|nitiate the notification |communicate it to other |
|to sNEs,sNMRs, sNMS, and + |sNEs, sNMRs, field |
| users | |technicians and users |
+-------------------------+ +-------------------------+
Figure 9: Fault Management Hierarchy for Self and Distributedly
Managed Networks
10. Conclusion
Self-managed network concept for fault management, self-managed NE
and self-managing NMS architectures, and a fault management
communication mechanism for centrally and distributedly self-managed
networks are introduced. A hierarchy for fault management for these
networks are described.
Toy Expires December 31, 2015 [Page 15]
Internet-Draft self-faultmang-framework June 2015
11. Security Considerations
It is expected that all sNEs, sNMS, and sNMRn are authenticated
during the network configuration manually or automatically. If there
are security mechanisms established among sNEs, sNMS, sNMRn for
exchanging messages, they would apply for exchanging the fault
messages described here. There is no need for additional security
procedures for the fault management messages described here.
12. IANA Considerations
This document does not request any action from IANA.
13. References
13.1. Informative References
[GANA] ETSI GS AFI 002 V1.1.1 : Autonomic network engineering for
the self-managing Future Internet; Generic Autonomic Network
Architecture, 2013-04
[SUSEREQ] ETSI GS AFI 001 V1.1.1 Group Specification Autonomic
network engineering for the self-managing Future
Internet (AFI); Scenarios, Use Cases and Requirements
for Autonomic/Self-Managing Future Internet, 2011-06
[SELFMAN] Keller, Alexander; et al. (Eds.), Self-Managed Networks,
Systems, and Services Second IEEE International
Workshops, SelfMan 2006, Dublin, Ireland, June 16, 2006,
Proceedings
[DPoE] E. Malette and M. Hajduczenia, Automating provisioning of
Demarcation Devices in DOCSIS Provisioning of EPON (DPoE),
IEEE Comm. Magazine, September, 2012
[SMN] M. Toy, Self-Managed Networks, Comcast internal document,
November, 2012.
[SMCEN] M. Toy, Self-Managed Carrier Ethernet Networks, April 2014,
MEF Meeting in Budapest, self-managed-networks-comcast-
mtoy.pdf.,
https://wiki.metroethernetforum.com/display/OWG/New+Work
[Y.1731] ITU-T Y.1731, OAM functions and mechanisms for Ethernet
based networks, 2008
Toy Expires December 31, 2015 [Page 16]
Internet-Draft self-faultmang-framework June 2015
Author's Address
Mehmet Toy
Comcast
1800 Bishops Gate Blvd.
Mount Laurel, NJ 08054
USA
Email: mehmet_toy@cable.comcast.com
Toy Expires December 31, 2015 [Page 17]