Internet Engineering Task Force C. Yang, Ed.
Internet-Draft Y. Liu, Y. Wang & S.Y. Pan, Ed.
Intended status: Standards Track South China University of Technology
Expires: November 28, 2020 C. Chen
Inspur
G. Chen
GSTA
Y. Wei
Huawei
May 27, 2020
A Massive Data Migration Framework
draft-yangcan-ietf-data-migration-standards-04
Abstract
This document describes a standardized framework for implementing
massive data migration between traditional databases and big-data
platforms on the cloud via the Internet, especially for an instance
of the Hadoop data architecture. The main goal of the framework is
to provide concise and friendly interfaces so that users can more
easily and quickly migrate massive data from a relational database
to a distributed platform for a variety of requirements, in order to
make full use of distributed storage resources and distributed
computing capability and thus solve the bottleneck problems of both
storage and computing performance in traditional enterprise-level
applications. This document covers the fundamental architecture,
data element specification, operations, and interfaces related to
massive data migration.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on November 28, 2020.
Copyright Notice
Copyright (c) 2020 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction
2. Definitions and Terminology
3. Specific Framework Implementation Standards
   3.1. System Architecture Diagram
   3.2. Source and Target of Migration
        3.2.1. The Data Sources of Migration
        3.2.2. The Connection Testing of Relational Data Sources
        3.2.3. The Target Storage Container of Data Migration
        3.2.4. Specifying Target Cloud Platform
        3.2.5. Data Migration to third-party Web Applications
   3.3. Type of Migrated Database
   3.4. Scale of Migrated Table
        3.4.1. Full Table Migration
        3.4.2. Single Table Migration
        3.4.3. Multi-table migration
   3.5. Split-by
        3.5.1. Single Column
        3.5.2. Multiple Columns
        3.5.3. Non-linear Segmentation
   3.6. Conditional Query Migration
   3.7. Dynamic Detection of Data Redundancy
   3.8. Data Migration with Compression
   3.9. Updating Mode of Data Migration
        3.9.1. Appending Migration
        3.9.2. Overwriting the Import
   3.10. The Encryption and Decryption of Data Migration
   3.11. Incremental Migration
   3.12. Real-Time Synchronization Migration
   3.13. The Direct Mode of Data Migration
   3.14. The Storage Format of Data files
   3.15. The Number of Map Tasks
   3.16. The Selection of Elements in a Table to Be Migrated
   3.17. Visualization of Migration
        3.17.1. Dataset Visualization
        3.17.2. Visualization of Data Migration Progress
   3.18. Smart Analysis of Migration
   3.19. Task Scheduling
   3.20. The Alarm of Task Error
   3.21. Data Export From Cloud to RDBMS
        3.21.1. Data Export Diagram
        3.21.2. Full Export
        3.21.3. Partial Export
   3.22. The Merger of Data
   3.23. Column Separator
   3.24. Record Line Separator
   3.25. The Mode of Payment
   3.26. Web Shell for Migration
        3.26.1. Linux Web Shell
        3.26.2. HBase Shell
        3.26.3. Hive Shell
        3.26.4. Hadoop Shell
        3.26.5. Spark Shell
        3.26.6. Spark Shell Programming Language
4. Security Considerations
5. IANA Considerations
6. References
   6.1. Normative References
   6.2. Informative References
   6.3. URL References
Authors' Addresses
1. Introduction
With the widespread popularization of cloud computing and big data
technology, the scale of data is increasing rapidly, and the demand
for distributed computing is more significant than before. For a
long time, a majority of companies have usually used relational
databases to store and manage their data, and a great amount of
structured data still exists in legacy systems and accumulates as
the business develops. With the daily growth of data size, the
storage bottleneck and the degradation of analysis and processing
performance have become serious problems that need to be solved in
global enterprise-level applications. A distributed platform refers
to a software platform that builds data storage, data analysis, and
computation on a cluster of multiple hosts. Its core architecture
involves distributed storage and distributed computing. In terms of
storage, it is theoretically possible to expand capacity
indefinitely, and storage can be dynamically expanded horizontally
as data grows. In terms of computing, key computing frameworks such
as MapReduce can be used to perform parallel computing on
large-scale datasets to improve the efficiency of massive data
processing. Therefore, when the data size exceeds the storage
capacity of a single system or the computation exceeds the computing
capacity of a stand-alone system, massive data can be migrated to a
distributed platform. The resource sharing and collaborative
computing capabilities provided by a distributed platform can solve
large-scale data processing problems well.

The document focuses on putting forward a standard for implementing
a big data migration framework through web access via the Internet,
and on how to help users more easily and quickly migrate massive
data from a traditional relational database to a cloud platform
under multiple requirements. By using the distributed storage and
distributed computing technologies highlighted by the cloud
platform, the framework solves the storage bottleneck and the
problem of low data analysis and processing performance of
relational databases. Because it is accessed through the web, the
framework supports an open working mode and promotes global
applications of data migration.

Note: It is also permissible to implement this framework in a
non-web environment.
2. Definitions and Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
The following definitions are for terms used in the context of this
document.
o "DMOW": Its full name is "Data Migration on Web",it means data
migration based on web.
o "Cloud Computing": Cloud computing is a pay-per-use model that
provides available, convenient, on-demand network access, the user
enters a configurable computing resource sharing pool (resources
include network, server, storage, application software, services),
these resources can be provided quickly, with little
administrative effort or little interaction with service
providers.
o "Big Data": A collection of data that cannot be captured, managed,
and processed using conventional software tools within a certain
time frame. That is a massive, high growth rate and diversified
information assets that require new processing modes to have
stronger decision-making power, insight and process optimization
capabilities.
o "Data Migration": The data migration described in this document is
aimed at the data transfer process between a relational database
and a cloud platform.
o "Data Storage": Data is recorded in a format on the computer's
internal or external storage media.
o "Data Cleansing": It is a process of re-examining and verifying
data. The purpose is to remove duplicate information, correct
existing errors, and provide data consistency.
o "Extraction-Transformation-Loading(ETL)": The processing of user
database or data warehouse. That is, data is extracted from
various data sources, converted into data that meets the needs of
the business, and finally loaded into the database.
o "Distributed Platform": A software platform that builds data
storage, data analysis, and calculations on a cluster of multiple
hosts.
o "Distributed File System": The physical storage resources managed
by the file system are not directly connected to the local node.
Instead, they are distributed on a group of machine nodes
connected by a high-speed internal network. These machine nodes
together form a cluster.
o "Distributed Computing": A computer science discipline studies how
to divide a problem that requires a very large amount of computing
power into many small parts and it is coordinated by many
independent computers to get the final result.
o "Apache Hadoop": An open source distributed system infrastructure
that can be used to develop distributed programs for large data
operations and storage.
o "Apache HBase": An open source, non-relational, distributed
database. Used with the Hadoop framework.
o "Apache Hive": It is a data warehouse infrastructure built on
Hadoop. It can be used for data extraction-transformation-
loading(ETL), it is a mechanism that can store, query, and analyze
large-scale data stored in Hadoop.
o "HDFS": A Hadoop distributed file system designed to run on
general-purpose hardware.
o "MapReduce": A programming model for parallel computing of large-
scale data sets (greater than 1 TB).
o "Spark": It is a fast and versatile computing engine designed for
large-scale data processing.
o "MongoDB":It is a database based on distributed file storage
designed to provide scalable, high-performance data storage
solutions for web applications.
o "GHTs":the sender to send the granular header information index
table of the information granules.
o "GDBs": the sender to send the granular information database of
the information granules.
o "GHTr": the receiver to receive the granular header information
index table of the information granules.
o "GDBr":the receiver to receive the granular information database
of the information granules.
3. Specific Framework Implementation Standards
The main goal of this data migration framework is to help companies
migrate their massive data stored in relational databases to cloud
platforms through web access. We propose a series of rules and
constraints on the implementation of the framework, by which users
can conduct massive data migration from a multi-requirement
perspective.

Note: The cloud platforms mentioned in this document refer to the
Hadoop platform by default. All standards on the operations and the
environment of the framework assume web-based access by default.
3.1. System Architecture Diagram
Figure 1 shows the working diagram of the framework.
+---------+ +----------------+
| | (1) | WebServer |
| Browser |-------->| |---------------------
| | | +-----------+ | |
+---------+ | | DMOW | | |
| +-----------+ | |
+----------------+ |
|(2)
|
|
+-------------+ +-----------------------+ |
| | (3) | | |
| Data Source |--------> | Cloud Platform | |
| | | +-----------------+ |<----
+-------------+ | | Migration Engine| |
| +-----------------+ |
+-----------------------+
Figure 1: Reference Architecture
The workflow of the framework is as follows:
Step (1) in the figure means that users submit a data migration
request to DMOW through a browser (the request includes data
source information, target cloud platform information, and related
migration parameter settings);

Step (2) in the figure means that DMOW submits the user's data
migration request to the cloud platform's migration engine;

Step (3) in the figure means that the migration engine performs
data migration tasks based on the migration requests it receives,
migrating data from the relational database to the cloud platform.
3.2. Source and Target of Migration
3.2.1. The Data Sources of Migration
This framework MUST support data migration between relational
databases and cloud platforms over the web, and MUST meet the
following requirements:
1. The framework supports connecting to data sources in relational
   databases. The relational database MUST be at least one of the
   following:
   * SQLSERVER
   * MYSQL
   * ORACLE
2. This framework MUST support the dynamic perception of data
   information in relational databases under a normal connection; in
   other words (an illustrative example follows this list):
* It MUST support dynamic awareness of all tables in a
relational database;
* It MUST support dynamic awareness of all columns corresponding
to all tables in a relational database;
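For example, a migration engine built on Apache Sqoop [sqoop] (one
possible implementation, not mandated by this document) could
discover databases, tables, and columns with commands like the
following sketch; the host name, credentials, database name, and
table name are placeholders:

    sqoop list-databases \
        --connect jdbc:mysql://dbhost:3306/ \
        --username migrator -P

    sqoop list-tables \
        --connect jdbc:mysql://dbhost:3306/sales \
        --username migrator -P

    sqoop eval \
        --connect jdbc:mysql://dbhost:3306/sales \
        --username migrator -P \
        --query "DESCRIBE orders"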
3.2.2. The Connection Testing of Relational Data Sources
Before conducting data migration, the framework MUST support testing
the connection to the data sources to be migrated, and then deciding
whether to migrate.
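As an illustrative sketch only (a Sqoop-based engine and placeholder
connection parameters are assumed), the connection test can issue a
trivial query against the source before any migration is scheduled:

    sqoop eval \
        --connect jdbc:mysql://dbhost:3306/sales \
        --username migrator -P \
        --query "SELECT 1"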
3.2.3. The Target Storage Container of Data Migration
This framework MUST allow users to migrate large amounts of data from
a relational database to at least two of the following types of
target storage containers:
o HDFS
o HBASE
o HIVE
3.2.4. Specifying Target Cloud Platform
This framework MUST allow an authorized user to specify the target
cloud platform to which the data will be migrated.
3.2.5. Data Migration to third-party Web Applications
This framework SHALL support the migration of large amounts of data
from relational databases to one or multiple data containers used by
third-party Web applications. The target storage containers of the
third-party Web application systems can be:
o MONGODB
o MYSQL
o SQLSERVER
o ORACLE
3.3. Type of Migrated Database
This framework needs to meet the following requirements:
o It MAY support migrating the entire relational database to the
cloud platform;
o It MAY support homogeneous migration (for example, migration from
ORACLE to ORACLE);
o It MAY support heterogeneous migrations between different
databases (for example, migration from ORACLE to SQLServer);
o It SHALL support the migration to the MONGODB database;
o It is OPTIONAL that, if the migration process is interrupted, the
  framework support automatically restarting the migration process
  and continuing the migration from where it left off. Additionally,
  the framework needs to be able to inform the user of such an
  abnormal interruption in the following ways:
  * It MUST support popping up an alert box on the screen of the
    user;
  * It SHALL support notifying users by email;
  * It is OPTIONAL to notify users by an instant messenger such as
    WeChat or QQ;
3.4. Scale of Migrated Table
3.4.1. Full Table Migration
This framework MUST support the migration of all tables in a
relational database to at least two of the following types of target
storage containers:
o HDFS
o HBASE
o HIVE
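For instance, assuming an Apache Sqoop based migration engine (an
implementation choice, not a requirement of this document; all names
are placeholders), a full-database migration into HDFS could be
expressed as:

    sqoop import-all-tables \
        --connect jdbc:mysql://dbhost:3306/sales \
        --username migrator -P \
        --warehouse-dir /user/migration/sales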
3.4.2. Single Table Migration
This framework MUST allow users to specify a single table in a
relational database and migrate it to at least two of the following
types of target storage containers:
o HDFS
o HBASE
o HIVE
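A single-table migration into HBase might look like the following
sketch (Sqoop-based engine assumed; table name, column family, and
row key are placeholders):

    sqoop import \
        --connect jdbc:mysql://dbhost:3306/sales \
        --username migrator -P \
        --table orders \
        --hbase-table orders \
        --column-family cf \
        --hbase-row-key order_id \
        --hbase-create-table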
3.4.3. Multi-table migration
This framework MUST allow users to specify multiple tables in a
relational database and migrate them to at least two of the following
types of target storage containers:
o HDFS
o HBASE
o HIVE
3.5. Split-by
This framework needs to meet the following requirements on split-by.
3.5.1. Single Column
1. The framework MUST allow the user to specify a single column of
the data table (usually the table's primary key), then slice the
data in the table into multiple parallel tasks based on this
column, and migrate the sliced data to one or more of the
following target data containers respectively:
* HDFS
* HBASE
* HIVE
The specification of the data table column can be based on the
following methods:
+ Users can specify freely;
+ Users can specify linearly;
+ Users can select an appropriate column for the segmentation
based on the information entropy of the selected column
data;
2. The framework SHALL allow the user to query the boundaries of the
   specified split-by column, then slice the data into multiple
   parallel tasks and migrate the data to one or more of the
   following target data containers:
* HDFS
* HBASE
* HIVE
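As an illustration (Sqoop-based engine assumed; column and table
names are placeholders), the split column and an explicit boundary
query could be supplied as follows, so that the table is sliced into
parallel import tasks:

    sqoop import \
        --connect jdbc:mysql://dbhost:3306/sales \
        --username migrator -P \
        --table orders \
        --split-by order_id \
        --boundary-query "SELECT MIN(order_id), MAX(order_id) FROM orders" \
        --target-dir /user/migration/orders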
3.5.2. Multiple Columns
This framework MAY allow the user to specify multiple columns in the
data table to slice the data linearly into multiple parallel tasks
and then migrate the data to one or more of the following target data
containers:
o HDFS
o HBASE
o HIVE
3.5.3. Non-linear Segmentation
It is OPTIONAL for this framework to support non-linear intelligent
segmentation of data on one or more columns and then migrate the
data to one or more of the target data containers listed below.
The non-linear intelligent segmentations refer to:
   * Adaptive segmentation based on the distribution (density) of
     the values of numerical columns;
* Adaptive segmentation based on the distribution of entropy of
subsegments of a column;
* Adaptive Segmentation Based on Neural Network Predictor;
The target data containers include:
* HDFS
* HBASE
* HIVE
3.6. Conditional Query Migration
This framework SHALL allow users to specify query conditions, then
query the corresponding data records and migrate them.
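For example (Sqoop-based engine assumed; the condition is a
placeholder), a conditional migration could push the query condition
down to the source database:

    sqoop import \
        --connect jdbc:mysql://dbhost:3306/sales \
        --username migrator -P \
        --table orders \
        --where "order_date >= '2020-01-01'" \
        --target-dir /user/migration/orders_2020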
3.7. Dynamic Detection of Data Redundancy
It is OPTIONAL for the framework to allow users to add data
redundancy labels and a label communication mechanism, so that
redundant data can be detected dynamically during data migration to
achieve non-redundant migration.
The specific requirements are as follows:
o The framework SHALL be able to perform deep granulation processing
  on the data content to be sent; that is, the content segment to be
  sent is further divided into smaller data sub-blocks.
o The framework SHALL be able to perform feature calculation and
  form a grain header for each of the decomposed granules. The
  granular header information includes, but is not limited to, the
  grain feature value, grain data fingerprint, unique grain ID,
  granule generation time, source address, and destination address.
o The framework SHALL be able to examine the granular header
  information to determine the transmission status of each
  decomposed information granule: if the current information granule
  to be sent is already present at the receiving end, the content of
  the granule is not sent; otherwise the current granule is sent
  out.
o The framework SHALL be able to set a Cache at the sending port to
  hold the granular header information index table (GHTs) and the
  granular information database (GDBs) for the information granules;
  the receiver SHALL be able to set a Cache to hold the granular
  header information index table (GHTr) and the granular information
  database (GDBr) for the information granules that have been
  successfully received.
o After all the fragments of the data have been transferred, the
framework SHALL be able to reassemble all the fragments and store
the data on the receiving disk.
o The framework SHALL be able to set a granular encoder at the
  sending port, which is responsible for encoding and compressing
  the information granule content generated by the granular
  resolver. The encoder generates a coded version of the
  corresponding information granule and calculates the granule
  header content of the compressed information granule, then
  performs transmission processing by sending the granule header,
  accurately detecting redundant granules, and synthesizing the
  granules.
o The framework SHALL be able to set a granular decoder at the
  receiving port, which is responsible for decoding the encoded and
  compressed granular content at the receiving port and merging it
  with the grain synthesizer, whether the content comes from the
  sending-port Cache or the receiving-port Cache.
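The following shell sketch illustrates the sender-side idea only; it
is not a normative mechanism. The grain size, the file names, the
SHA-256 fingerprint, and the transport command are illustrative
assumptions, and ghtr.txt stands for the receiver's fingerprint
index (GHTr) obtained out of band:

    # Granulate the content to be sent into fixed-size sub-blocks.
    split -b 4M export.dat grain_

    # Build the sender-side index (GHTs): "fingerprint  filename".
    sha256sum grain_* > ghts.txt

    # ghtr.txt holds the receiver's fingerprints (GHTr), one per line.
    # Transmit only the granules whose fingerprints the receiver lacks.
    while read fp file; do
        grep -qx "$fp" ghtr.txt || scp "$file" receiver:/staging/
    done < ghts.txt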
3.8. Data Migration with Compression
During the data migration process, the data is not compressed by
default. This framework MUST support at least one of the following
data compression encoding formats, allowing the user to compress and
migrate the data:
o GZIP
o BZIP2
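A compressed import might be requested as in the following sketch
(Sqoop-based engine assumed; for BZIP2 the codec class would be
org.apache.hadoop.io.compress.BZip2Codec):

    sqoop import \
        --connect jdbc:mysql://dbhost:3306/sales \
        --username migrator -P \
        --table orders \
        --compress \
        --compression-codec org.apache.hadoop.io.compress.GzipCodec \
        --target-dir /user/migration/orders_gz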
3.9. Updating Mode of Data Migration
3.9.1. Appending Migration
This framework SHALL support the migration of appending data to
existing datasets in HDFS.
3.9.2. Overwriting the Import
When importing data into HIVE, the framework SHALL support
overwriting the original dataset and saving it.
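For example (Sqoop-based engine assumed; names are placeholders),
the two updating modes could be expressed as follows, the first
appending to an existing HDFS dataset and the second overwriting an
existing Hive table:

    sqoop import \
        --connect jdbc:mysql://dbhost:3306/sales \
        --username migrator -P \
        --table orders \
        --append \
        --target-dir /user/migration/orders

    sqoop import \
        --connect jdbc:mysql://dbhost:3306/sales \
        --username migrator -P \
        --table orders \
        --hive-import --hive-table orders \
        --hive-overwrite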
3.10. The Encryption and Decryption of Data Migration
This framework needs to meet the following requirements:
o It MAY support data encryption at the source, and then the
received data should be decrypted and stored on the target
platform;
o It MUST support authentication when accessing the source data of
  the migration;
o It SHALL support the verification of identity and permission when
accessing the target platform of data migration;
o During the process of data migration, it SHOULD support data
consistency;
o During the process of data migration, it MUST support data
integrity;
3.11. Incremental Migration
The framework SHOULD support incremental migration of table records
in a relational database, and it MUST allow the user to specify a
field value as "last_value" in the table in order to characterize the
row record increment. Then, the framework SHOULD migrate those
records in the table whose field value is greater than the specified
"last_value", and then update the last_value.
3.12. Real-Time Synchronization Migration
The framework SHALL support real-time synchronous migration of
updated data and incremental data from a relational database to one
or more of the following target data containers:
o HDFS
o HBASE
o HIVE
3.13. The Direct Mode of Data Migration
This framework MUST support data migration in direct mode, which can
increase the data migration rate.
Note: This mode is supported only for MYSQL and POSTGRESQL.
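For example (Sqoop-based engine assumed), the direct mode bypasses
the generic JDBC path and uses the database's native dump or copy
utilities where available to speed up the transfer:

    sqoop import \
        --connect jdbc:mysql://dbhost:3306/sales \
        --username migrator -P \
        --table orders \
        --direct \
        --target-dir /user/migration/orders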
3.14. The Storage Format of Data files
This framework MUST allow saving the migrated data in at least one
of the following data file formats:
o SEQUENCE
o TEXTFILE
o AVRO
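As a sketch (Sqoop-based engine assumed), the file format can be
selected at import time; --as-sequencefile or --as-textfile would
select the other formats listed above:

    sqoop import \
        --connect jdbc:mysql://dbhost:3306/sales \
        --username migrator -P \
        --table orders \
        --as-avrodatafile \
        --target-dir /user/migration/orders_avro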
3.15. The Number of Map Tasks
This framework MUST allow the user to specify the number of map
tasks, so that the corresponding number of map tasks is started to
migrate large amounts of data in parallel.
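For instance (Sqoop-based engine assumed; names are placeholders),
the degree of parallelism could be requested as follows, starting
eight map tasks split on the order_id column:

    sqoop import \
        --connect jdbc:mysql://dbhost:3306/sales \
        --username migrator -P \
        --table orders \
        --split-by order_id \
        --num-mappers 8 \
        --target-dir /user/migration/orders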
3.16. The Selection of Elements in a Table to Be Migrated
o The specification of columns
  This framework MUST allow the user to specify the data of one or
  multiple columns in a table to be migrated.
o The specification of rows
  This framework SHOULD allow the user to specify the range of rows
  in a table to be migrated.
o The combination of column and row specification
  This framework MAY optionally allow the user to specify both the
  range of rows and the set of columns in a table to be migrated.
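For example (Sqoop-based engine assumed; column names and the row
condition are placeholders), a column subset and a row range could
be combined in one import:

    sqoop import \
        --connect jdbc:mysql://dbhost:3306/sales \
        --username migrator -P \
        --table orders \
        --columns "order_id,customer_id,amount" \
        --where "order_id BETWEEN 1 AND 500000" \
        --target-dir /user/migration/orders_subset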
3.17. Visualization of Migration
3.17.1. Dataset Visualization
After the framework has migrated the data from the relational
database, it MUST support the visualization of the dataset in the
cloud platform.
3.17.2. Visualization of Data Migration Progress
The framework SHOULD support dynamically showing the migration
progress to users in a graphical mode.
3.18. Smart Analysis of Migration
The framework MAY provide automated migration proposals to facilitate
the user's estimation of migration workload and costs.
3.19. Task Scheduling
The framework SHALL allow the user to set various migration
parameters (such as the number of map tasks, the storage format of
data files, the type of data compression, and so on) and the task
execution time, and then schedule and perform offline/online
migration tasks.
3.20. The Alarm of Task Error
When a task fails, the framework MUST at least support notifying
stakeholders in a predefined way.
3.21. Data Export From Cloud to RDBMS
3.21.1. Data Export Diagram
Figure 2 shows the framework's working diagram of exporting data.
+---------+ +----------------+
| | (1) | WebServer |
| Browser |-------->| |---------------------
| | | +-----------+ | |
+---------+ | | DMOW | | |
| +-----------+ | |
+----------------+ |
|(2)
|
|
+-------------+ +-----------------------+ |
| | (3) | | |
| Data Source |<-------- | Cloud Platform | |
| | | +-----------------+ |<----
+-------------+ | | Migration Engine| |
| +-----------------+ |
+-----------------------+
Figure 2: Reference Diagram
The workflow of exporting data through the framework is as follows:
Step (1) in the figure means that users submit a data migration
request to DMOW through a browser (the request includes cloud
platform information, the information of the target relational
database, and related migration parameter settings);

Step (2) in the figure means that DMOW submits the user's data
migration request to the cloud platform's migration engine;

Step (3) in the figure means that the migration engine performs
data migration tasks based on the requests it receives, migrating
data from the cloud platform to the relational database.
3.21.2. Full Export
The framework MUST at least support exporting data from HDFS to one
of the following relational databases:
o SQLSERVER
o MYSQL
o ORACLE
The framework SHALL support exporting data from HBASE to one of the
following relational databases:
o SQLSERVER
o MYSQL
o ORACLE
The framework SHALL support exporting data from HIVE to one of the
following relational databases:
o SQLSERVER
o MYSQL
o ORACLE
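For the HDFS case, an export might be expressed as in the following
sketch (Sqoop-based engine assumed; the target table must already
exist in the relational database, and the field separator must match
the one used when the data was written):

    sqoop export \
        --connect jdbc:mysql://dbhost:3306/sales \
        --username migrator -P \
        --table orders_restored \
        --export-dir /user/migration/orders \
        --input-fields-terminated-by ','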
3.21.3. Partial Export
The framework SHALL allow the user to specify a range of keys on the
cloud platform and export the elements in the specified range to a
relational database. It SHALL also support exporting only a subset
of columns.
3.22. The Merger of Data
The framework SHALL support merging data from different directories
in HDFS and storing it in a specified directory.
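As a sketch (Sqoop-based engine assumed; the record class jar would
typically come from a prior import or codegen run), a newer delta
directory could be merged onto an older base directory keyed on
order_id:

    sqoop merge \
        --new-data /user/migration/orders_delta \
        --onto /user/migration/orders_base \
        --target-dir /user/migration/orders_merged \
        --merge-key order_id \
        --jar-file orders.jar \
        --class-name orders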
3.23. Column Separator
The framework MUST allow the user to specify the separator between
fields in the migration process.
3.24. Record Line Separator
The framework MUST allow the user to specify the separator between
the record lines after the migration is complete.
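For instance (Sqoop-based engine assumed), the column separator of
Section 3.23 and the record line separator of Section 3.24 could be
set together on an import:

    sqoop import \
        --connect jdbc:mysql://dbhost:3306/sales \
        --username migrator -P \
        --table orders \
        --fields-terminated-by '\t' \
        --lines-terminated-by '\n' \
        --target-dir /user/migration/orders_tsv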
3.25. The Mode of Payment
1. One-way payment mode
   * In the framework by default, users SHALL pay for downloading
     data from the cloud platform. It is free to upload data from a
     relational database to the cloud platform;
   * Alternatively, users SHALL pay for uploading data from a
     relational database to the cloud platform. It is free to
     download data from the cloud;
2. Two-way payment mode
   In the framework, users SHALL pay a fee for data migration in
   both directions between the relational database and the cloud
   platform;
3.26. Web Shell for Migration
The framework provides the following character-interface shells that
can be operated through web access.
3.26.1. Linux Web Shell
The framework SHALL support a Linux shell through web access, which
allows users to execute basic Linux commands for the configuration
and management of the migrated data on the web.
3.26.2. HBase Shell
The framework SHALL support the HBase shell through web access,
which allows users to perform basic operations such as adding,
deleting, and modifying the data migrated to HBase through the web
shell.
3.26.3. Hive Shell
The framework SHALL support the Hive shell through web access, which
allows users to perform basic operations such as adding, deleting,
and modifying the data migrated to Hive through the web shell.
3.26.4. Hadoop Shell
The framework SHALL support the Hadoop shell through web access so
that users can perform basic Hadoop command operations through the
web shell.
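For instance, a user might run basic commands such as the following
in the Hadoop web shell to inspect a migrated dataset (paths and
file names are placeholders):

    hadoop fs -ls /user/migration/orders
    hadoop fs -du -h /user/migration/orders
    hadoop fs -cat /user/migration/orders/part-m-00000 | head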
3.26.5. Spark Shell
The framework SHALL support spark shell through web access and
provide an interactive way to analyze and process the data in the
cloud platform.
3.26.6. Spark Shell Programming Language
In spark web shell, the framework SHALL support at least one of the
following programming languages:
o Scala
o Java
o Python
4. Security Considerations
The framework SHOULD support securing the data migration process.
During the data migration process, it should support encrypting the
data before transmission and decrypting it for storage on the target
after the transfer is complete. At the same time, it must support
authentication when accessing the source data of the migration, and
it shall support the verification of identity and permission when
accessing the target platform.
5. IANA Considerations
This memo includes no request to IANA.
6. References
6.1. Normative References
[RFC2026] Bradner, S., "The Internet Standards Process -- Revision
3", BCP 9, RFC 2026, DOI 10.17487/RFC2026, October 1996,
<https://www.rfc-editor.org/info/rfc2026>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>.
[RFC2578] McCloghrie, K., Ed., Perkins, D., Ed., and J.
Schoenwaelder, Ed., "Structure of Management Information
Version 2 (SMIv2)", STD 58, RFC 2578,
DOI 10.17487/RFC2578, April 1999,
<https://www.rfc-editor.org/info/rfc2578>.
6.2. Informative References
[RFC2629] Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629,
DOI 10.17487/RFC2629, June 1999,
<https://www.rfc-editor.org/info/rfc2629>.
[RFC4710] Siddiqui, A., Romascanu, D., and E. Golovinsky, "Real-time
Application Quality-of-Service Monitoring (RAQMON)
Framework", RFC 4710, DOI 10.17487/RFC4710, October 2006,
<https://www.rfc-editor.org/info/rfc4710>.
[RFC5694] Camarillo, G., Ed. and IAB, "Peer-to-Peer (P2P)
Architecture: Definition, Taxonomies, Examples, and
Applicability", RFC 5694, DOI 10.17487/RFC5694, November
2009, <https://www.rfc-editor.org/info/rfc5694>.
6.3. URL References
[hadoop] The Apache Software Foundation,
"http://hadoop.apache.org/".
[hbase] The Apache Software Foundation,
"http://hbase.apache.org/".
[hive] The Apache Software Foundation, "http://hive.apache.org/".
[idguidelines]
IETF Internet Drafts editor,
"http://www.ietf.org/ietf/1id-guidelines.txt".
[idnits] IETF Internet Drafts editor,
"http://www.ietf.org/ID-Checklist.html".
[ietf] IETF Tools Team, "http://tools.ietf.org".
[ops] the IETF OPS Area, "http://www.ops.ietf.org".
[spark] The Apache Software Foundation,
"http://spark.apache.org/".
[sqoop] The Apache Software Foundation,
"http://sqoop.apache.org/".
[xml2rfc] XML2RFC tools and documentation,
"http://xml.resource.org".
Authors' Addresses
Can Yang (editor)
South China University of Technology
382 Zhonghuan Road East
Guangzhou Higher Education Mega Centre
Guangzhou, Panyu District
P.R.China
Phone: +86 18602029601
Email: cscyang@scut.edu.cn
Yu Liu, Ying Wang & ShiYing Pan (editors)
South China University of Technology
382 Zhonghuan Road East
Guangzhou Higher Education Mega Centre
Guangzhou, Panyu District
P.R.China
Email: 201820132798@scut.edu.cn
Cong Chen
Inspur
163 Pingyun Road
Guangzhou, Tianhe District
P.R.China
Email: chen_cong@inspur.com
Ge Chen
GSTA
No. 109 Zhongshan Road West, Guangdong Telecom Technology Building
Guangzhou, Tianhe District
P.R.China
Email: cheng@gsta.com
Yukai Wei
Huawei
Putian Huawei base
Shenzhen, Longgang District
P.R.China
Email: weiyukai@huawei.com