NML Research Group                                             M-S. Kim
Internet-Draft                                                 Y-G. Hong
Intended status: Informational                                     ETRI
Expires: September 14, 2017                              March 13, 2017
Collaborative Intelligent Multi-agent Reinforcement Learning over a
Network
draft-kim-nmlrg-network-00
Abstract
This document describes multi-agent reinforcement learning (RL) in a
distributed environment, where agents transfer or share information
for autonomous shortest path-planning over a communication network.
A centralized node, which manages the agent workflow in a hybrid
peer-to-peer environment, provides a cumulative reward for each
action that a given agent takes with respect to an optimal path,
based on a policy acquired over the course of the learning process.
The reward from the centralized node is applied as agents explore
toward their destinations for autonomous shortest path-planning
across distributed nodes.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on September 14, 2017.
Copyright Notice
Copyright (c) 2017 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction
2. Conventions and Terminology
3. Motivation
   3.1. General Motivation for Reinforcement Learning (RL)
   3.2. Reinforcement Learning (RL) in networks
   3.3. Motivation in our work
4. Related Work
   4.1. Autonomous Driving System
   4.2. Game Theory
   4.3. Wireless Sensor Network (WSN)
   4.4. Routing Enhancement
5. Multi-agent Reinforcement Learning (RL) Technologies
   5.1. Reinforcement Learning (RL)
   5.2. Reward of Distance and Frequency
   5.3. Distributed Computing Node
   5.4. Agent Sharing Information
   5.5. Sub-goal Selection
   5.6. Clutter-index-based scheme
6. Proposed Architecture for Reinforcement Learning (RL)
7. Use Cases of Multi-agent Reinforcement Learning (RL)
   7.1. Distributed Multi-agent Reinforcement Learning: Sharing
        Information
   7.2. Use case of Shortest Path-planning via sub-goal selection
   7.3. Use case of Asynchronously Triggered Multi-agent Learning
        with a Terrain Clutter Index
8. IANA Considerations
9. Security Considerations
10. References
   10.1. Normative References
   10.2. Informative References
Authors' Addresses
1. Introduction
In large-scale surveillance applications, information about Critical
Key Infrastructures and Resources (CKIR) must be protected and shared
across large ground, maritime, and airborne areas, where there is a
particular need for collaborative intelligent distributed systems
with intelligent learning schemes. These applications also require
the development of computational multi-agent learning systems on
large numbers of distributed networking nodes, where the agents have
limited, incomplete knowledge and access only to local information in
distributed computing nodes over a communication network.
Reinforcement Learning (RL) is an effective technique for
transferring and sharing information among agents for autonomous
shortest path-planning, as it does not require a priori knowledge of
the agent's behavior or environment to accomplish its tasks
[Megherbi]. Such knowledge is instead acquired automatically and
autonomously by trial and error.
In Reinforcement Learning (RL), actions involve interacting with a
given environment, and the environment supplies the agent learning
process with the following elements (a minimal sketch follows this
list):

o  A starting agent state, one or more obstacles, and one or more
   agent destinations

o  An initial phase in which the agent explores randomly within a
   given node

o  Agent actions that avoid obstacles and move to one or more
   available positions in order to reach the goal(s)

o  Reuse, after an agent reaches its goal, of the information
   collected during the initial random path-planning phase to
   improve the agent's learning speed

o  Optimal paths in the following phase and in further exploratory
   learning trials
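
The following sketch, for illustration only, shows one way these
elements could be represented in Python; the names (GridTerrain,
explore_randomly) and the grid representation are assumptions, not
part of this draft.

   import random

   class GridTerrain:
       """Illustrative terrain: a grid with obstacles and one goal."""

       def __init__(self, width, height, obstacles, goal):
           self.width, self.height = width, height
           self.obstacles = set(obstacles)   # blocked (x, y) cells
           self.goal = goal                  # destination (x, y)

       def actions(self, state):
           """Moves that stay on the terrain and avoid obstacles."""
           x, y = state
           steps = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
           return [(nx, ny) for nx, ny in steps
                   if 0 <= nx < self.width and 0 <= ny < self.height
                   and (nx, ny) not in self.obstacles]

       def explore_randomly(self, start):
           """Initial random exploration; assumes a reachable goal."""
           state, path = start, [start]
           while state != self.goal:
               state = random.choice(self.actions(state))
               path.append(state)
           return path   # reused later to improve learning speed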
Reinforcement Learning (RL) is one of the machine learning techniques
expected to be adapted to various networking environments for
autonomic networks [I-D.jiang-nmlrg-network-machine-learning]. This
document therefore provides the motivation, learning techniques, and
use cases for network machine learning.
2. Conventions and Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
3. Motivation
3.1. General Motivation for Reinforcement Learning (RL)
Reinforcement Learning (RL) enables a system to acquire and
incorporate knowledge autonomously. Such a system can continuously
improve its learning speed with experience, and it attempts to
maximize the cumulative reward so as to find an optimal path faster,
as in multi-agent-based monitoring systems [Teiralbar].
3.2. Reinforcement Learning (RL) in networks
In large-scale surveillance applications, it is necessary to protect
and share information across many infrastructure and resource areas.
At the wireless networking layer, Reinforcement Learning (RL) is an
emerging technology for monitoring network dynamics to achieve fair
resource allocation among nodes in a wireless mesh setting.
Monitoring network parameters and adjusting to network dynamics has
been shown to improve fairness in wireless infrastructures and
resources [Nasim].
3.3. Motivation in our work
Networks face many different issues, such as latency, traffic, and
management. Reinforcement Learning (RL) is one of the machine
learning mechanisms that can be applied in multiple cases to solve
diverse networking problems beyond human operating capacity. Doing
so is challenging for a multitude of reasons, such as the large state
space to search, the complexity of assigning rewards, the difficulty
of agent action selection, and the difficulty of sharing and merging
learned information among agents on distributed-memory nodes, which
must be transferred over a communication network [Minsuk].
4. Related Work
4.1. Autonomous Driving System
An autonomous vehicle is capable of driving itself without human
supervision, relying on a trust-region policy optimized by
Reinforcement Learning (RL), which enables the training of more
complex and specialized neural networks. Such a vehicle can provide
a comfortable user experience safely and reliably over an interactive
communication network [April][Markus].
4.2. Game Theory
Adaptive multi-agent systems, which combine the complexities of
interacting game players, have developed within the field of
Reinforcement Learning (RL). Early interdisciplinary work in game
theory focused only on competitive games, but Reinforcement Learning
(RL) has since developed into a general framework for analyzing
strategic interaction and has attracted interest from fields as
diverse as psychology, economics, and biology [Ann].
4.3. Wireless Sensor Network (WSN)
A wireless sensor network (WSN) consists of a large number of sensor
and sink nodes for monitoring systems with event parameters such as
temperature, humidity, and air conditioning. Reinforcement Learning
(RL) in WSNs has been applied to a wide range of schemes, such as
cooperative communication, routing, and rate control. The sensor and
sink nodes are able to observe their respective operating
environments and carry out optimal actions for network and
application performance enhancement [Kok-Lim].
4.4. Routing Enhancement
Reinforcement Learning (RL) has been used to enhance multicast
routing protocols in wireless ad hoc networks, where each node has
different capabilities. Routers in the multicast routing protocol
discover the optimal route using a predicted reward and then create
the optimal path using multicast transmissions, reducing the overhead
of Reinforcement Learning (RL) [Kok-Lim].
5. Multi-agent Reinforcement Learning (RL) Technologies
5.1. Reinforcement Learning (RL)
Reinforcement Learning (RL) is a machine learning algorithm based on
an agent learning process. Here, RL is used with a reward from the
centralized node and is capable of autonomously acquiring and
incorporating knowledge. It continuously self-improves and becomes
more efficient as it learns from the agent's experience, increasing
the agent's learning speed for autonomous shortest path-planning
[Sutton][Madera]. A minimal sketch of one such learning update
follows.
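
This document does not mandate a particular RL algorithm. As one
common instantiation, the following sketch shows a tabular Q-learning
update with an epsilon-greedy policy, where the reward value would be
supplied by the centralized node; the hyperparameter values are
assumptions.

   from collections import defaultdict
   import random

   ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2  # assumed hyperparameters
   Q = defaultdict(float)                 # Q[(state, action)] -> value

   def choose_action(state, actions):
       """Epsilon-greedy choice among the available actions."""
       if random.random() < EPSILON:
           return random.choice(actions)
       return max(actions, key=lambda a: Q[(state, a)])

   def update(state, action, reward, next_state, next_actions):
       """One learning step: move Q toward the discounted return."""
       best_next = max((Q[(next_state, a)] for a in next_actions),
                       default=0.0)
       Q[(state, action)] += ALPHA * (
           reward + GAMMA * best_next - Q[(state, action)])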
5.2. Reward of Distance and Frequency
In general, an agent uses the return values of its current state and
the next available states to decide on and carry out an action, but
the learning process in Reinforcement Learning (RL) has significant
limitations, since it provides the agents with only a single level of
exploratory learning. This limitation reduces the agent's learning
speed toward an optimal path, so the Distance-and-Frequency
technique, based on the Euclidean distance, was derived to enhance
the agent's optimal learning speed. Distance-and-Frequency adds more
levels of agent visibility and enhances the learning algorithm with
an additional mechanism that uses the state occurrence frequency
[Al-Dayaa]. A hedged sketch of such a reward follows.
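
The exact Distance-and-Frequency formulation is given in [Al-Dayaa];
the following is only an illustrative sketch of how a reward might
combine an inverse Euclidean-distance term with a state-occurrence-
frequency penalty. The weights and functional form are assumptions.

   import math
   from collections import Counter

   visits = Counter()   # state occurrence frequency

   def distance_frequency_reward(state, goal, w_dist=1.0, w_freq=0.5):
       """Reward closeness to the goal; penalize repeated visits."""
       visits[state] += 1
       distance = math.hypot(state[0] - goal[0], state[1] - goal[1])
       # The inverse-distance term draws the agent toward the goal;
       # the frequency term discourages states it keeps revisiting.
       return w_dist / (1.0 + distance) - w_freq * (visits[state] - 1)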
5.3. Distributed Computing Node
Autonomous path-planning in a multi-agent environment depends on
agents transferring path information, as the agents require this
information to achieve efficient path-planning on a given local node
or on distributed-memory nodes over a communication network.
5.4. Agent Sharing Information
The quality of agent decision-making often depends on the willingness
of agents to share learned information with other agents for optimal
path-planning. Sharing information means that an agent shares and
communicates the knowledge it has learned and acquired with other
agents using the Message Passing Interface (MPI). When sharing
information, each agent attempts to explore its environment, and all
agents explore toward their destinations via a distributed
reinforcement reward-based learning method on the existing local
distributed-memory nodes. The agents can run on the same node or on
different nodes over a communication network (via shared
information). The agents have limited resources and incomplete
knowledge of their environments. Even if the agents do not have the
capabilities and resources to monitor an entire given large terrain,
they are able to share the information needed for collaborative
path-planning across distributed networking nodes
[Chowdappa][Minsuk]. A minimal sketch of such sharing follows.
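
As an illustration, the following sketch shows agents exchanging and
merging learned Q-values with MPI, here via the mpi4py binding; the
binding, the message layout, and the max-merge policy are assumptions
rather than part of this draft. It would be launched with, e.g.,
"mpiexec -n 4 python share.py".

   from mpi4py import MPI

   comm = MPI.COMM_WORLD
   rank = comm.Get_rank()

   # Each node learns its own partial table (illustrative stand-in).
   local_q = {("state-%d" % rank, "move"): float(rank)}

   # Exchange tables with all other agents/nodes, then merge by
   # keeping the highest value seen for each (state, action) pair --
   # one simple merge policy among several possible ones.
   for table in comm.allgather(local_q):
       for key, value in table.items():
           if value > local_q.get(key, float("-inf")):
               local_q[key] = value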
5.5. Sub-goal Selection
A new technique for agent sub-goal selection in distributed nodes is
introduced to reduce the agent's initial random exploration by means
of a given selected sub-goal.
[TBD]
5.6. Clutter-index-based scheme
We propose a learning algorithm to optimize agent sub-goal selection:
a clutter-index-based technique that provides a new reinforcement
learning reward scheme and an improved method to optimize multi-agent
learning speed over a communication network.
[TBD]
6. Proposed Architecture for Reinforcement Learning (RL)
The architecture using Reinforcement Learning (RL) describes a
collaborative multi-agent-based system in distributed environments,
as shown in Figure 1. The architecture is a hybrid one, making use
of both a master/slave architecture and a peer-to-peer one. The
centralized node assigns each slave computing node a portion of the
distributed terrain and an initial number of agents. The network
communication layer handles all communication among components and
agents in the distributed networking environment, with the components
deployed on different nodes. The communication handler runs in a
separate thread on each node with two message queues, an incoming
queue and an outgoing queue; it alternately sends one message from
the outgoing queue and delivers one message from the incoming queue
to the destination agent or component. A minimal sketch of such a
handler follows Figure 1.
+--------------------------------------+
+------------|----------+ | +------------|----------+
| Communication Handler | | | Communication Handler |
+-----------------------+ | +-----------------------+
| Terrain | | | Terrain |
+-----------------------+ | +-----------------------+
|
+--------------------------------------+
+------------|----------+ | +------------|----------+
| Communication Handler | | | Communication Handler |
+-----------------------+ | +-----------------------+
| Terrain | | | Terrain |
+-----------------------+ | +-----------------------+
|
+-----------------------+
| Communication Handler |
+-----------------------+
|Centralized Global Node|
+-----------------------+
Figure 1: Top level components, deployment and agent communication
handler
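
The following is an illustrative sketch of such a per-node
communication handler; the class name, timeouts, and callback
signatures are assumptions, not the system's actual implementation.

   import queue
   import threading

   class CommunicationHandler(threading.Thread):
       """One handler thread per node, with two message queues."""

       def __init__(self, send_fn, deliver_fn):
           super().__init__(daemon=True)
           self.incoming = queue.Queue()
           self.outgoing = queue.Queue()
           self.send_fn = send_fn        # transmits over the network
           self.deliver_fn = deliver_fn  # hands a message to an agent

       def run(self):
           while True:
               # Alternate: one network send, then one local delivery.
               try:
                   self.send_fn(self.outgoing.get(timeout=0.1))
               except queue.Empty:
                   pass
               try:
                   self.deliver_fn(self.incoming.get(timeout=0.1))
               except queue.Empty:
                   pass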
Figure 2 shows the relationship among action, state, and reward
between an agent and its destination in the environment for
reinforcement learning. The agent takes an action that leads to a
reward for progress along an optimal path toward its destination.
+-------------------------+
States & Reward ------| Centralized Global Node |<----------------+
| +-------------------------+ |
| |
| |
| States
| |
| |
+-------------+ +-------------+ |
| Multi-agent |-------------Action----------->| Destination |-----+
+-------------+ +-------------+
Figure 2: Architecture Overview
7. Use Cases of Multi-agent Reinforcement Learning (RL)
7.1. Distributed Multi-agent Reinforcement Learning: Sharing
Information
In this section, we deal with the case of collaborative distributed
multi-agent learning, where each agent has the same or a different
individual destination in a distributed environment. Since the
information-sharing scheme among the agents is a problematic one, we
need to expand on the work described above by solving the challenging
cases.
The main proposed algorithm, distributed multi-agent reinforcement
learning, is presented below:
+--Proposed Algorithm------------------------------------------+
|                                                              |
| Let N, A and D denote the numbers of nodes, agents and       |
| destinations                                                 |
+--------------------------------------------------------------+
| Place N, A and D at random positions (x, y)                  |
+--------------------------------------------------------------+
| For every agent A on every node N                            |
+--------------------------------------------------------------+
| Do initial (random) exploration toward D                     |
|  (1) Let S denote the current state                          |
|  (2) Relinquish S so other agents can occupy the position    |
|  (3) Assign the agent's new position                         |
|  (4) Update the current state S <- Sn                        |
+--------------------------------------------------------------+
| Do optimized exploration (RL) for a number of trials         |
|  (1) Let S denote the current state                          |
|  (2) Let P denote the action                                 |
|  (3) Let R denote the discounted reward value                |
|  (4) Choose the action P <- Policy(S, P) in RL               |
|  (5) Move the agent in an available direction                |
|  (6) Update the learning model with the new value            |
|  (7) Update the current state S <- Sn                        |
+--------------------------------------------------------------+
Figure 3: Use case of Multi-agent Reinforcement Learning
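
For illustration, the following sketch renders the two phases of
Figure 3 in Python, reusing the illustrative helpers from the earlier
sketches (GridTerrain, choose_action, update, and
distance_frequency_reward); the control flow and trial count are
assumptions, not the draft's implementation.

   def run_agent(terrain, start, trials=100):
       # Phase 1: initial random exploration toward the destination D.
       terrain.explore_randomly(start)

       # Phase 2: optimized exploration (RL) for a number of trials.
       for _ in range(trials):
           state = start
           while state != terrain.goal:
               # Each available action is the next position the agent
               # may occupy, so the chosen action is also Sn.
               actions = terrain.actions(state)
               action = choose_action(state, actions)  # P <- Policy
               reward = distance_frequency_reward(action, terrain.goal)
               update(state, action, reward,
                      action, terrain.actions(action))
               state = action                          # S <- Sn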
Multi-agent reinforcement learning (RL) in distributed nodes can
improve overall system performance when information is transferred or
shared from one node to another in the following cases: expanding the
complexity of the RL technique across various experimental factors
and conditions, and analyzing multi-agent information sharing with
respect to agent learning speed.
7.2. Use case of Shortest Path-planning via sub-goal selection
Sub-goal selection is a distributed multi-agent RL scheme based on
selected intermediary agent sub-goal(s), with the aim of reducing the
initial random trials. The scheme improves multi-agent system
performance by using asynchronously triggered exploratory phase(s)
with selected agent sub-goal(s) for autonomous shortest
path-planning.
[TBD]
7.3. Use case of Asynchronously Triggered Multi-agent Learning with a
Terrain Clutter Index
This is a newly proposed reward scheme based on the proposed
environment clutter index for fast path-planning learning.
[TBD]
8. IANA Considerations
There are no IANA considerations related to this document.
9. Security Considerations
[TBD]
10. References
10.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<http://www.rfc-editor.org/info/rfc2119>.
10.2. Informative References
[I-D.jiang-nmlrg-network-machine-learning]
           Jiang, S., "Network Machine Learning", draft-jiang-nmlrg-
           network-machine-learning-02 (work in progress), October
           2016.
[Megherbi]
           Megherbi, D. B., Kim, M., and M. Madera, "A Study of
           Collaborative Distributed Multi-Goal and Multi-agent based
           Systems for Large Critical Key Infrastructures and
           Resources (CKIR) Dynamic Monitoring and Surveillance",
           IEEE International Conference on Technologies for Homeland
           Security, 2013.
[Teiralbar]
           Megherbi, D. B., Teiralbar, A., and J. Boulenouar, "A
           Time-varying Environment Machine Learning Technique for
           Autonomous Agent Shortest Path Planning", Proceedings of
           SPIE International Conference on Signal and Image
           Processing, Orlando, Florida, 2001.
[Nasim]    Arianpoo, N. and V. C. M. Leung, "How network monitoring
           and reinforcement learning can improve TCP fairness in
           wireless multi-hop networks", EURASIP Journal on Wireless
           Communications and Networking, 2016.
[Minsuk]   Megherbi, D. B. and M. Kim, "A Hybrid P2P and Master-Slave
           Cooperative Distributed Multi-Agent Reinforcement Learning
           System with Asynchronously Triggered Exploratory Trials
           and Clutter-index-based Selected Sub-goals", IEEE CIG
           Conference, 2016.
[April]    Yu, A., Palefsky-Smith, R., and R. Bedi, "Deep
           Reinforcement Learning for Simulated Autonomous Vehicle
           Control", Stanford University, 2016.
[Markus]   Kuderer, M., Gulati, S., and W. Burgard, "Learning Driving
           Styles for Autonomous Vehicles from Demonstration",
           Robotics and Automation (ICRA), 2015.
[Ann]      Nowe, A., Vrancx, P., and Y. De Hauwere, "Game Theory and
           Multi-agent Reinforcement Learning", in Reinforcement
           Learning: State of the Art, Adaptation, Learning, and
           Optimization, Volume 12, 2012.
[Kok-Lim]  Yau, K.-L. A., Goh, H. G., Chieng, D., and K. H. Kwong,
           "Application of reinforcement learning to wireless sensor
           networks: models and algorithms", Computing, Volume 97,
           Issue 11, pp. 1045-1075, November 2015.
[Sutton]   Sutton, R. S. and A. G. Barto, "Reinforcement Learning: An
           Introduction", MIT Press, 1998.
[Madera]   Madera, M. and D. B. Megherbi, "An Interconnected
           Dynamical System Composed of Dynamics-based Reinforcement
           Learning Agents in a Distributed Environment: A Case
           Study", Proceedings of the IEEE International Conference
           on Computational Intelligence for Measurement Systems and
           Applications, Italy, 2012.
[Al-Dayaa]
           Al-Dayaa, H. S. and D. B. Megherbi, "Towards A Multiple-
           Lookahead-Levels Reinforcement-Learning Technique and Its
           Implementation in Integrated Circuits", Journal of
           Artificial Intelligence, Journal of Supercomputing, Vol.
           62, Issue 1, pp. 588-61, 2012.
[Chowdappa]
           Chowdappa, A., Skjellum, A., and N. Doss, "Thread-Safe
           Message Passing with P4 and MPI", Technical Report
           TR-CS-941025, Computer Science Department and NSF
           Engineering Research Center, Mississippi State University,
           1994.
Authors' Addresses
Min-Suk Kim
ETRI
218 Gajeongno, Yuseong
Daejeon 305-700
Korea
Phone: +82 42 860 5930
Email: mskim16@etri.re.kr
Yong-Geun Hong
ETRI
161 Gajeong-Dong Yuseung-Gu
Daejeon 305-700
Korea
Phone: +82 42 860 6557
Email: yghong@etri.re.kr