Internet DRAFT - draft-wang-alto-large-data-framework
draft-wang-alto-large-data-framework
ALTO WG X. Wang
Internet-Draft S. Dong
Intended status: Informational Tongji University
Expires: January 21, 2016 G. Chen
Huawei Technologies
July 20, 2015
Design and Implementation of Large Data Transfer Coordinator
draft-wang-alto-large-data-framework-01.txt
Abstract
The Application-Layer Traffic Optimization (ALTO) protocol provides
network information with the goal of improving both application
performance and network resource utilization. As data transfers
become larger (e.g., due to big data analysis), more data transfers
are concurrent but with service requirements, and more network
capabilities are emerging (e.g., SDN allowing a data transfer to
request specific routes or Qos), the management of large data
transfers has become an increasingly challenging issue. This
document introduces Data Transfer Coordinator (DTC), a centralized
data transfer scheduling framework which provides Scheduling Hub
Service (SHS) to coordinate and schedule large data transfers. DTC
considers all three components: data transfer requirements, (ALTO)
network information, and SDN control capabilities. This document
specifies not only the basic framework of DTC, but also a key
component, service API for SHS to specify data transfers and their
relations.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 21, 2016.
Wang, et al. Expires January 21, 2016 [Page 1]
Internet-Draft Large Data Transfer Coordinator July 2015
Copyright Notice
Copyright (c) 2015 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Requirements Language . . . . . . . . . . . . . . . . . . . . 3
3. Terminology and Notation . . . . . . . . . . . . . . . . . . 3
4. Data Transfer Coordinator Framework . . . . . . . . . . . . . 4
4.1. Architecture . . . . . . . . . . . . . . . . . . . . . . 4
4.2. Job Collector . . . . . . . . . . . . . . . . . . . . . . 5
4.3. ALTO Client . . . . . . . . . . . . . . . . . . . . . . . 5
4.3.1. PASSIVE and ACTIVE Mode . . . . . . . . . . . . . . . 6
4.4. Task Scheduler . . . . . . . . . . . . . . . . . . . . . 6
4.4.1. Priority Model . . . . . . . . . . . . . . . . . . . 6
4.5. DTN Controller . . . . . . . . . . . . . . . . . . . . . 7
5. Scheduling Hub Service . . . . . . . . . . . . . . . . . . . 7
5.1. Application Compute-Transfer Structure . . . . . . . . . 8
5.2. Abstract Computation . . . . . . . . . . . . . . . . . . 8
5.3. DataTransferTask and SyncTask . . . . . . . . . . . . . . 9
5.4. Service API . . . . . . . . . . . . . . . . . . . . . . . 10
6. Example . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
7. Security Considerations . . . . . . . . . . . . . . . . . . . 12
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 12
9. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 12
10. References . . . . . . . . . . . . . . . . . . . . . . . . . 12
10.1. Normative References . . . . . . . . . . . . . . . . . . 12
10.2. Informative References . . . . . . . . . . . . . . . . . 12
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 13
1. Introduction
There is substantial need to manage large data transfers.
Considering limited network resources such as bandwidth,
inappropriate handling large data transfer would reduce performance
significantly. It could be easier to cause network congestion than
Wang, et al. Expires January 21, 2016 [Page 2]
Internet-Draft Large Data Transfer Coordinator July 2015
low traffic. Congested network can result in higher rate of packet
loss, then triggers retransmissions, which can cripple already
heavily loaded networks. It's necessary to manage large data
transfer not only for high network resource utilization but also for
users' experience aspect.
Scheduling data flows needs network information such as available
bandwidth between two transfer nodes. ALTO defines cost maps
providing cost between two pids and endpoint cost service for two
endpoints. By utilizing these network information, application can
determine how to allocate bandwidth for each data flow. However, to
archive such scheduling, there needs a centralized coordinator that
can be aware of every data flow requirements. Moreover, to get the
customized requirements for each data transfer, a general interface
is need to obtain the correlation among data flows besides single
data flow requirements.
This document introduces a centralized framework, Data Transfer
Coordinator (DTC), which provides Scheduling Hub Service (SHS) for
applications. SHS implements common functionalities for data
transfers and provides cross-app coordination for achieving better
network-wide utility. Also SHS provides a general API for
applications to express data transfer relations by using two basic
structures, DataTransferTask and SyncTask.
This document is organized as follows: Section 3 defines the
Terminology and Notation in this document. Section 4 gives the
details of SHS for scheduling large data transfer. Section 5 gives
details of service API designed. Section 6 gives a MapReduce example
for specifying relations between data transfers.
2. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
3. Terminology and Notation
This document uses the following additional terms:DTC, SHS, Job,
Task.
o DTC
Data Transfer Coordinator. A centralized framework includes Job
Collector, Task Scheduler, ALTO Client, and DTN Controller to
provide data transfer scheduling service to applications. See
more detailed description in Section 4.
Wang, et al. Expires January 21, 2016 [Page 3]
Internet-Draft Large Data Transfer Coordinator July 2015
o SHS
Scheduling Hub Service. Data transfer scheduling service
considers both network information and data transfer requests.
Data transfer requests are captured by two basic structures,
DataTransferTask and SyncTask.See more detailed description in
Section 5.
o Job
Data transfer job that is registered by applications. A job
includes tasks indicating data transfers and their relations
submitted by one application. See more detailed description in
Section 5.
o Task
Including DataTransferTask and SyncTask that specifies data
transfer information and their relations, respectively. See more
detailed description in Section 5.
4. Data Transfer Coordinator Framework
4.1. Architecture
This section describes the design details of four components of the
DTC framework, 1. Job Collector; 2. ALTO Client; 3. Task
Scheduler; 4. Data Transfer Nodes (DTN) Controller. Among these
four modules, task scheduler is the core of the framework. Job
Collector provides interface to users for submitting data transfer
requests, which will be passed to task scheduler for further process.
Task scheduler makes scheduling based on the network information
generated by ALTO client as well as the requirements of each data
transfer from tasks. After computing allocation of bandwidth for
each task, task scheduler will send transfer commands to DTN
controller to start data transmission. Figure 1 shows the whole
process.
Wang, et al. Expires January 21, 2016 [Page 4]
Internet-Draft Large Data Transfer Coordinator July 2015
.-----------.
| Users |
'-----------'
| submit jobs
.- - - - - - - - - - - - - - - | - - - - - - - - - - - - - - .
| .-----------. |
| | Job | |
| DTC | Collector | |
| '-----------' |
| | pass user defined tasks |
| | to Task Scheduler |
| .-----------. .-----------. .---------. |
| | DTN |----------| Task |----------| ALTO | |
| | Controller| send | Scheduler | get | Client | |
| '-----------' transfer '-----------' network '---------' |
| commands state |
' - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -'
The benefits of DTC include:
o 1. It can achieve better network resource (bandwidth) allocation
since it manages all data transfer requirements in a centralized
framework.
o 2. It takes customized data transfer requirement into
consideration by introducing DataTransferTask and SyncTask to
capture correlation among data flows.
o 3. It's modular to support different scheduler algorithm
implementations.
4.2. Job Collector
The job collector is responsible to manage data transfer requests
from user and pass them to task scheduler for further process. It is
important that the requests are dynamic and hence the API of the job
collector allows dynamic insertion and deletion of data transfers.
Details of the data transfer description and APIs for users are
described in Section 5.3: Service API.
4.3. ALTO Client
ALTO client will be responsible to get network state to task
scheduler for further usage. Although different scheduling
algorithms may request different ALTO services, cost map and endpoint
cost map seems to be the most useful services for scheduling tasks.
Wang, et al. Expires January 21, 2016 [Page 5]
Internet-Draft Large Data Transfer Coordinator July 2015
4.3.1. PASSIVE and ACTIVE Mode
ALTO client should support two modes according to the way it
perceives network state changes, PASSIVE and ACTIVE. In PASSIVE
mode, ALTO client will query ALTO server periodically to get latest
network states. If the network state changes after one query, the
ALTO client will not be aware of the change until next query. In
ACTIVE mode, ALTO client will only query ALTO server once to get the
initial network state. If network state changes after that, the ALTO
client will be notified by ALTO server so it does not have to query
ALTO server again. Note that ACTIVE mode will only be supported by
ALTO server with ALTO SSE implemented.
4.4. Task Scheduler
The duty of task scheduler is to assign tasks from job collector to
proper data transfer nodes (DTNs), splitting a file to several
partial files to different DTNs if necessary, and notify the DTN
controller to initiate the transfer. We will not discuss specific
algorithm in this document but we assume algorithms used by scheduler
should take network states provided by ALTO client into
consideration. Different schedulers may obey different principles,
some schedulers aims to maximize the number of finished tasks while
some try to transfer as much data as possible.
4.4.1. Priority Model
In this section, we proposed a schedule model based on priority. In
this model, every task will be set a predefined priority value, e.g.
LOW, MEDIUM and HIGH, to indicate how important it is. The principle
of this model is that tasks with higher priority have the privilege
to occupy more resources such as available bandwidth. If the
priority is not set, the task must be set a default one. Things
become tricky when user does not specify priority but an expected
finish time instead. However, in this model, it is easy to be solved
by transforming expected finish time to priority by following steps:
o 01. Assign the lowest priority to the task and schedule the task.
o 02. Calculate the task's estimated finish time. If the estimated
finish time is longer than user specified finish time, increase
the task priority by one and reschedule the task, else the
schedule procedure completes.
o 03. Keep doing step 2 until either the schedule procedure
completes or the task is assigned as highest priority. If the
task is still not able to be finished, we will keep it as highest
priority and transfer as much data as possible.
Wang, et al. Expires January 21, 2016 [Page 6]
Internet-Draft Large Data Transfer Coordinator July 2015
The specific algorithm used to adjust the resources according to the
priority is not described in this document.
4.5. DTN Controller
DTN controller is only responsbile for two following functions:
o 01. Receive and process instructions from task scheduler, e.g.
starting a new transfer, aborting a running transfer and adjusting
transfer parameters such as transfer rate or number of
connections.
o 02. Monitor transfer status and update status changes to task
scheduler. If a transfer failed or finished, it should notify
task scheduler the details for further scheduling.
If we assume task scheduler is a manager, then DTN controller are
workers who focusing on its own job without caring anything else.
DTN controllers are not able to communicate with each other, which
means it does not have a global view. Since the DTN controller has
to utilize DTNs to transfer data, it should be deployed either in a
server able to access DTNs or in the DTNs themselves.
5. Scheduling Hub Service
Introducing a systematic description of data transfer for SHS is
challenging. Although it is easy to describe each individual data
transfer, this simple description method is not sufficient for a
centralized data transfer coordinator because it is not capable of
representing relations, e.g. dependencies, between different data
transfers. To solve this problem, this section first introduces the
concept of Application Compute-Transfer Structure (ACTS) that
captures the computation logic of application. ACTS includes the two
basic components, data computation and data transfer. We find that
for many data processing applications, they are composed of several
data computations and several data transfers by which data
computations are linked as a complete data processing. For example,
MapReduce job includes mappers and reducers as data computation
components, and data transfers act as connections between mappers and
reducers.
However, for SHS, it doesn't need the exact computation at data
computation nodes, but the enough knowledge to reflect the dependency
between data transfers. Hence, we provide the ability of abstracting
computation to applications for expressing dependency anf
coordination between data transfers. By abstracting data
computation, application can define the relation between data
Wang, et al. Expires January 21, 2016 [Page 7]
Internet-Draft Large Data Transfer Coordinator July 2015
transfers to/from one data computation node or a cluster of nodes,
for expressing coarse grained dependency.
Finally, to map the concept to the design, SHS service API includes
two transfer task types, DataTransferTask and SyncTask, which defines
the basic data transfer information and relations between data
transfers, respectively.
5.1. Application Compute-Transfer Structure
For many applications, the whole data processing would be divided
into several pieces of small data computations depending on the
different roles of servers, e.g., the MapReduce job is divided into
two types of tasks, mapper and reducer, based on the role of servers.
All partial data computations are linked by data transfers which
transmit the result of computation from one place to another. By the
joint collaboration of all small data computations, the application
achieves the specific data processing. Then we use Application
Compute-Transfer Structure (ACTS) which includes data computation and
data transfer to convey the computation pattern of application. The
mapping from computation logic of application to ACTS should be very
obvius since it only includes data computation and data transfer.
By using ACTS, the computation logic of application can be defined as
several data computations and several data transfers which link data
computations, i.e., a Directed Acyclic Graph (DAG), in which each
node is data computation and each link is data transfer.
5.2. Abstract Computation
For SHS, it doesn't need to know the exact computation of each data
computation nodes in ACTS. But to schedule data transfers submitted
by different applications, SHS needs the information about the
relation between data transfers, such as dependency and coordiantion.
The relation between data transfers is defined at data computation
nodes. To achieve a collaboration of multiple data computation, each
data computation must rely on the result of others. The dependency
of data computations defines the relation of data transfers which is
needed by SHS. Hence, to express the relation of data transfers, for
a better scheduling, application should abstract its data
computations
In this document, we define some attributes (dependency type,
throughput matching, pipelining or blocking, and deadline) that can
be used for abstract computation. Dependency type includes two
values, all and one, to specify when to start the output data trnsfer
at data computation nodes. All indicates the output data transfer
cannot start until all input data transfers (at the same data
Wang, et al. Expires January 21, 2016 [Page 8]
Internet-Draft Large Data Transfer Coordinator July 2015
computation node) finishes, and one indicates if one input data
transfer finishes, it can start output data transfer instead of
waiting for other input data transfers. Throughput matching will
defines the throughout relation between input data transfers and
output data transfers. E.g., application needs a higher throughput
for output data transfers than input ones. Pipelining and blocking
indicates whether should the output data transfers wait the finishing
of input data transfers or not. Deadline specifies the deadline for
add dependent data transfers.
5.3. DataTransferTask and SyncTask
In this section, we define two types of task for mapping the concept
to design of service API. DataTransferTask defines the basic
information of data transfers while SyncTask defines the relation
between data tansfers, i.e., abstract computation.
The schema for DataTransferTask (dtt) representation is described as
following:
object {
ResourcePath src;
ResourcePath dst;
JSONNumber dataSize;
JSONNumber offset;
[JSONString deadline;]
} DataTransferTask;
object {
JSONString dependencies<1..*>;
Attributes attributes<1..*>;
} SyncTask;
object {
JSONString ss_id;
JSONString path;
} ResourcePath
object {
JSONString -> JSONString;
} Attributes;
with fields:
o src
Wang, et al. Expires January 21, 2016 [Page 9]
Internet-Draft Large Data Transfer Coordinator July 2015
This field specifies the source of data transfer.
o dst
This field specifies the destination of data transfer.
o ResourcePath
This field identifies a unique resource in multiple storege
systems. Since a storage system could be connected by multiple
data transfer nodes, it is not accurate to identify a resource by
server host and file path anymore. To solve this problem, DTC
will assign every connected storage system a unique id. Thus,
users can combine ss_id, which is the unique storage system id,
and file_path, which indicates location of the file in the
corresponding storage system, to identify a unique resource.
o dataSize
This field specifies the size of data to transport.
o offset
This field specifies the offset of data. This provides the
flexibility to application to split the data and transport them
separately.
o dependencies
This field specifies the dependencies of the SyncTask. Mapping to
the ACTS, dependency of a SyncTask is the input data transfer of a
data computation node.
o attributes
This field specifies the attributes of the SyncTask. Attributes
is key-value that key is the attributes name and value is the
attributes value. Attributes can be dependency type of throughput
matching as described.
5.4. Service API
Normally, users will register transfer jobs to include all
conrresponding DataTransferTasks and SyncTasks. While a transfer jon
is running, the user should be able to add tasks to or remove tasks
from the job dynamically. To enable these features, a job collector
should provide the following five functions for user:
Wang, et al. Expires January 21, 2016 [Page 10]
Internet-Draft Large Data Transfer Coordinator July 2015
o register()
This function creates a new transfer job. It must return a job id
for user to identify the job created. If the creation fails, it
must throw an error.
o unregister(job_id)
This function aborts a running transfer job. It accepts a job_id
parameter and must abort all tasks belonging to the job. The
function return value should indicate if the abort action succeeds
or not. If the job does not exist, it must throw an error.
o createTaskDesc(type, [args])
This function creates a task description satisfying the structure
defined above. Type argument specifies the type of task,
DataTransferTask or SyncTask. Args list specifies the content of
the task, for DataTransferTask, it includes src, dst, dataSize,
offset, and deadline; for SyncTask, it includes dependencies and
attributes. This function returns the specified task for further
operations.
o addTask(job_id, task)
This function adds a new task to a existing job. This function
accepts a job_id and a task as parameters. It must return a task
id for user to identify the added task. If the creation fails, it
must throw an error.
o removeTaskS(job_id, task_id,)
This function removes a task from a existing job. This function
accepts a job_id and a task_id. The job_id and task_id will
identify a unique task to be removed. The function return value
should indicate if the remove action succeeds or not.
6. Example
Suppose a MapReduce job has 10 mappers and 5 reducers. Each mapper
transfers data to each reducer. There will be 50 data transfers in
all. Application wants to express its requirements that minimize the
finishing time of all transfers, not one individual transfer. Here
we give a JSON example to show what should be sent to job collector
for adding a DataTransferTask and a SyncTask to existing transfer
job. After application added a DataTransferTask to transfer job, it
will receive a task_id to identify the task (task_01, ..., task_50).
Then it will use those task_id to add a SyncTask.
Wang, et al. Expires January 21, 2016 [Page 11]
Internet-Draft Large Data Transfer Coordinator July 2015
{
"job-id": "job_00",
"task": {
"type": "data-transfer-task",
"src": "http://192.168.0.0/bigdata/mapreduce/map0.data",
"dst": "http://192.168.1.0/bigdata/mapreduce/reduce0.data",
"data-size": "100",
"offset": "0"
}
}
{
"job-id": "job_00",
"task": {
"type": "sync-task",
"dependencies": [ "task_01", "task_02",..., "task_50" ],
"dependency_type": "all"
}
}
7. Security Considerations
This document has not conducted its security analysis.
8. IANA Considerations
This document does not specified its IANA considerations, yet.
9. Acknowledgments
The authors thank discussions with Yicheng Qian.
10. References
10.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/
RFC2119, March 1997,
<http://www.rfc-editor.org/info/rfc2119>.
10.2. Informative References
Wang, et al. Expires January 21, 2016 [Page 12]
Internet-Draft Large Data Transfer Coordinator July 2015
[RFC7285] Alimi, R., Ed., Penno, R., Ed., Yang, Y., Ed., Kiesel, S.,
Previdi, S., Roome, W., Shalunov, S., and R. Woundy,
"Application-Layer Traffic Optimization (ALTO) Protocol",
RFC 7285, DOI 10.17487/RFC7285, September 2014,
<http://www.rfc-editor.org/info/rfc7285>.
Authors' Addresses
Xin Wang
Tongji University
4800 Cao'an Road, Jiading District
Shanghai
China
Email: xinwang2014@hotmail.com
Shu Dong
Tongji University
4800 Cao'an Road, Jiading District
Shanghai
China
Email: dongs2011@gmail.com
Guohai Chen
Huawei Technologies
101 Software Avenue, Yuhua District
Nanjing
China
Email: chenguohai@huawei.com
Wang, et al. Expires January 21, 2016 [Page 13]