Internet Engineering Task Force | S. Yang |
Internet-Draft | CUHK(SZ) |
Intended status: Informational | X. Huang |
Expires: November 26, 2020 | R.W. Yeung |
CUHK | |
J.K. Zao | |
NCTU | |
May 25, 2020 |
BATS Coding Scheme for Multi-hop Data Transport
draft-yang-nwcrg-bats-03
This document describes a BATS coding scheme for communication through multi-hop networks. BATS code is a class of efficient linear network coding scheme with a matrix generalization of fountain codes as the outer code, and batch-based linear network coding as the inner code.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on November 26, 2020.
Copyright (c) 2020 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
This document specifies a BATS code scheme for data delivery in multi-hop networks. The BATS code described here includes an outer code and an inner code. The outer code is a matrix generalization of fountain codes (see also the RapterQ code described in RFC 6330), which inherits the advantages of reliability and efficiency and possesses the extra desirable property of being network coding compatible. The inner code, also called recoding, is formed by linear network coding for combating packet loss, improving the multicast efficiency, etc. A detailed design and analysis of BATS codes are provided in the BATS monograph.
The BATS coding scheme can be applied in multi-hop networks formed by wireless communication links, which are inherently unreliable due to interference. Existing network protocols like TCP/IP use end-to-end retransmission and store-and-forward at the relays, so that packet loss would accumulate along the way.
The BATS coding scheme can be used for various data delivery applications like file transmission, video streaming over wireless multi-hop networks, etc. Different from traditional forward error correcting (FEC) schemes that are applied either hop-by-hop or end-to-end, the BATS coding scheme combines the end-to-end coding (the outer code) with certain hop-by-hop coding (the inner code), and hence can potentially achieve better performance.
The coding scheme described here can be used in a network with multiple communication flows. For each flow, the source node encodes the data for transmission separately. Inside the network, however, it is possible to mix the packets from different flows for recoding. In this document, we describe a simple case where recoding is performed within each flow. Note that the same encoding/decoding scheme described here can be used with different recoding schemes as long as they follow the principle as we illustrate in this document.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
A BATS coding scheme includes an outer code encoder (also called encoder), an inner code encoder (also called recoder) and a decoder. The BATS coding scheme can be used for a single data flow that includes a single source and one or multiple destinations. Thus there exists only one encoder with multiple recoders and decoders. The BATS coding scheme described in this document can be used by a Data Delivery Protocol (DDP) with the following procedures.
Suppose that the DDP has F octets of data for transmission. We describe the procedures of one BATS session for transmitting the F octets. There is a limit on F of a single BATS session. If the total data has more than the limit, the data needs to be transmitted using multiple BATS sessions. The limit on F of a single BATS session depends on the MTU (maximum transmission unit) of the network, which MUST be known by the DDP. We have F is no more than (MTU-10)2^16-1 octets.
The DDP first determines the following parameters:
Based on the above parameters, the parameters T, O and K are calculated as follows:
The data MUST be padded to have T*K octets, which will be partitioned into K source packets b[0], ..., b[K-1], each of T octets. In our padding scheme, b[0], ..., b[K-2] are filled with data bits, and b[K-1] is filled with the remaining data octets and padding octets. Let P = K*T-F denote the number of padding octets. We use b[K-1, 0], ..., b[K-1, T-P-1] to denote the T-P source octets and b[K-1, T-P], ..., b[K-1, T-1] to denote the P padding octets in b[K-1], respectively. The padding process is shown in Figure 1.
Z = T - P Let bl be the last source packet b[K-1] for i = 1, 2, ... do if Z + i >= T - 1 do bl[Z...T-1] = i break bl[Z...Z+i-1] = i Z = Z + i
Figure 1: Data Padding Process
The DDP provides the BATS encoder with the following information:
Using this information, the (outer code) encoder generates a batch for each batch ID. For the batch ID j, the encoder returns the DDP that contains
The DDP will use the batches to form DDP packets to be transmitted to other network nodes towards the destination nodes. The DDP MUST deliver with each coded packet its
The DDP MUST deliver the following information to each recoder:
The DDP MUST deliver the following information to each decoder:
The BATS payload size TO MUST be known by all the nodes.
Both the source node and the intermediate nodes perform recoding on the batches before transmission. At the source node, the recoder receives the batches from the outer code encoding procedure. At an intermediate node, the DDP receives the DDP packets from the other network nodes, and should be able to extract coded packets and the corresponding batch information from these packets.
The DDP provides the recoder with the following information:
For a received batch, the recoder determines a positive integer Mr, the number of recoded packets to be transmitted for the batch. The recoder uses the information provided by the DDP to generate Mr recoded packets, each containing TO octets. The DDP uses the Mr recoded packets to form the DDP packets for transmitting.
A destination node needs the data transmitted by the source node. At the destination node, the DDP receives DDP packets from the other network nodes, and should be able to extract coded packets and the corresponding batch information from these packets.
The DDP provides the decoder with the following information:
The decoder uses this information to decode the K source packets. If successful, the decoder returns the recovered K source packets to the DDP, which will use the K source packets to form the F octets data. The recommended padding process is shown as follows:
// this procedure returns the number P of padding octets // at the end of b[K-1] Let bl be the last decoded source packet b[K-1] PL = bl[T-1] if PL == 1 do return P = 1 WI = T - 1 while bl[WI] == PL do WI = WI - 1 return P = (1 + bl[WI]) * bl[WI] + T - WI - 1
Figure 2: Data Depadding Process
The recommendation for the parameters M and q is shown as follows:
It is RECOMMENDED that K is at least 128. However, the encoder/decoder SHALL support an arbitrary positive integer value less than 2^16.
A DDP can form a DDP packet with a header (5 octets), a footer (3 octets) and a payload (TO octets). A DDP packet has totally 8+TO octets.
The BATS packet header has 40 bits (5 octets) and includes fields Packet_Count, Mq, Batch_ID, and Degree.
0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Packet_Count | Mq | Batch_ID | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Degree | +-+-+-+-+-+-+-+-+
Figure 3: BATS packet header format.
Mq | M | q | O |
---|---|---|---|
0010 | 16 | 2 | 2 |
0100 | 32 | 2 | 4 |
0110 | 64 | 2 | 8 |
0001 | 8 | 256 | 8 |
0011 | 16 | 256 | 16 |
0101 | 32 | 256 | 32 |
0111 | 64 | 256 | 64 |
O T +-----------------------+-------------------------------+ | coefficient vector | coded data | +-----------------------+-------------------------------+
Figure 4: BATS packet payload format.
The payload has TO octets, where the first O octets contain the coefficient vector and the remaining T octets contain the coded data. Information in both fields MAY be encoded in JSON (ASCII) or protobuf (binary) formats.
0 1 2 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | signature | parity check | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 5: BATS packet footer format.
The footer has three octets.
The T octets of a source packets are treated as a column vector of T elements in GF(256). Linear algebra and matrix operations over finite fields are assumed in this section.
Suppose that a pseudorandom number generator Rand() which generate an unsigned integer of 32 bits is shared by both encoding and decoding. The pseudorandom generator can be initialized by Rand_Init(S) with seed S. When S is not provided, the pseudorandom generator is initialized arbitrarily. One example of such a pseudorandom generator is defined in RFC 8682.
A function called BatchSampler is used in both encoding and decoding. The function takes two integers j and d as input, and generates an array idx of d integers and a d x M matrix G. The function first initializes the pseudorandom generator with j, sample d distinct integers from 0 to K-1 as idx, and sample d*M integers from 0 to 255 as G. See the pseudocode in Figure 6.
function BatchSampler(j,d) // initialize the pseudorandom generator by seed j. Rand_Init(j) // sample d distinct integers between 0 and K-1. for k = 0, ..., d-1 do r = Rand() % K while r already exists in idx do r = Rand() % K idx[k] = r // sample d x M matrix for r = 0, ..., d-1 do for c = 0,...,M-1 do G[r,c] = Rand() % 256 return idx, G
Figure 6: Batch Sampler Function
Define a function called DegreeSampler that return an integer d using the degree distribution DD. We expect that the empirical distribution of the returning d converges to DD(d) when d < K. One design of DegreeSampler is illustrated in Figure 7.
function DegreeSampler(j, DD) Let CDF be an array CDF[0] = 0 for i = 1, ..., MAX_DEG do CDF[i] = CDF[i-1] + DD[i] Rand_Init() r = Rand() % CDF[MAX_DEG] for d = 1, ..., MAX_DEG do if r >= CDF[d] do return min(d,K) return min(MAX_DEG,K)
Figure 7: Degree Sampler Function
Let b[0], b[1], ..., b[K-1] be the K source packets. A batch with BID j is generated using the following steps.
See the pseudocode of the batch generating process in Figure 8.
function GenBatch(j) d = DegreeSampler(j) (idx, G) = BatchSampler(j,d) B = (b[idx[0]], b[idx[i]], ..., b[idx[d-1]]) X = B * G Xr = [I_M; X] return Xr
Figure 8: Batch Generation Function
The inner code comprises (random) linear network coding applied on the coded packets belonging to the same batch. At a particular network node, recoded packets are generated by (random) linear combinations of the received coded packets of a batch. The recoded packets have the same BID, sparse degree and coded packet length.
The number Mr of recoded packets for a batch is decided first by the recoder. Mr can be set as M. When the link statistics is known, the recoder can try to obtain the link packet loss rate e for the link to transmit the recoded batch, and set Mr to be (1+e)M.
Suppose that coded packets xr[i], i = 0, 1, ..., r-1, which have the same BID j, have been received at an intermediate node. Using the recommended packet format, it can be verified whether the corresponding packet headers of these coded packets are the same. Then a recoded packet can be generated by one of the following two approaches:
A recoder can combine these two approaches to generate recoded packets. For example, the recoder will output xr[i], i = 0, 1, ..., r-1 as r systematic recoded packets and generate Mr-r recoded packets using linear combinations of randomly chosen coefficients.
The decoder receives a sequence of batches Yr[j], j = 0, 1, ..., n-1, each of which is a TO-row matrix over GF(256). The degree d[j] of batch j is also known. Let Y[j] be the submatrix of the last T rows of Yr[j]. When q = 256, let H[j] be the first M rows of Yr[j]; when q = 2, let H[j] be the matrix over GF(256) formed by embedding each bit in the first M/8 rows of Yr[j] into GF(256).
By calling BatchSampler with input j and d[j], we obtain idx[j] and G[j]. According to the encoding and recoding processes described in Section 3.2 and Section 3.3, we have the system of linear equations Y[j] = B[j]G[j]H[j] for each received batch with ID j, where B[j] = (b[idx[j,0]], b[idx[j,1]], ..., b[idx[j,d-1]]) is unknown.
We describe a belief propagation (BP) decoder that can efficiently solve the source packets when a sufficient number of batches have been received. A batch j is said to be decodable if rank(G[j]H[j]) = d[j] (i.e., the system of linear equations Y[j] = B[j]G[j]H[j] with B[j] as the variable matrix has a unique solution). The BP decoding algorithm has multiple iterations. Each iteration is formed by the following steps:
The BP decoder repeats the above steps until no batches are decodable during the decoding step.
This memo includes no request to IANA.
Subsuming both Random Linear Network Codes (RLNC) and fountain codes, BATS codes naturally inherit both their desirable capability of offering confidentiality protection as well as their vulnerability towards pollution attacks.
Since the transported messages are linearly combined with random coefficients at each recoding node, it is statistically impossible to recover the individual messages by capturing the coded messages at any one or small number of nodes. As long as the coding matrices of the transported messages cannot be fully recovered, any attempt of decoding is equivalent to randomly guessing the transported messages. Thus, with the use of BATS codes, information confidentiality throughout the data transport process is assured.
The only threat towards confidentiality exists in the form of eavesdropping on the initial encoding process, which takes place at the encoding nodes. In these nodes, the transported data are presented in plain text and can be read along their transfer paths. Hence, information isolation between the encoding process and all other user processes running on the node must be assured.
In addition, the authenticity and trustworthiness of the encoding, recoding and decoding program running on all the nodes must be attested by a trusted authority. Such a measure is also necessary in countering pollution attacks.
Like all network codes, BATS codes are vulnerable under pollution attacks. In these attacks, one or more compromised coding node(s) can pollute the coded messages or inject forged messages into the coding network. These attacks can prevent the receivers from recovering the transported data correctly. Although error detection mechanisms can be put in place to prevent the receivers from getting the wrong messages, detection and discard of the polluted messages still constitute a form of denial-of-service (DoS) attack.
The research community has long been investigating the use of various signature schemes (including homomorphic signatures) to identify the forged messages and stall the attacks (see Zhao07, Yu08, Agrawal09). Nevertheless, these countermeasures are regarded as being computationally too expensive to be employed in broadband communications. A practical approach to protect against pollution attacks consist of the following system-level countermeasures:
[RFC2119] | Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997. |
[RFC8682] | Saito, M., Matsumoto, M., Roca, V. and E. Baccelli, "TinyMT32 Pseudorandom Number Generator (PRNG)", RFC 8682, DOI 10.17487/RFC8682, January 2020. |
[Agrawal09] | Agrawal, S. and D. Boneh, "Homomorphic MACs: MAC-based integrity for network coding", International Conference on Applied Cryptography and Network Security , 2009. |
[BATS] | Yang, S. and R. Yeung, "Batched Sparse Codes", IEEE Transactions on Information Theory 60(9), 5322-5346, 2014. |
[BATSMonograph] | Yang, S. and R. Yeung, "BATS Codes: Theory and Practice", Morgan & Claypool Publishers , 2017. |
[RFC6330] | Luby, M., Shokrollahi, A., Watson, M., Stockhammer, T. and L. Minder, "RaptorQ Forward Error Correction Scheme for Object Delivery", RFC 6330, DOI 10.17487/RFC6330, August 2011. |
[Yu08] | Yu, Z., Wei, Y., Ramkumar, B. and Y. Guan, "An Efficient Signature-Based Scheme for Securing Network Coding Against Pollution Attacks", INFOCOM , 2008. |
[Zhao07] | Zhao, F., Kalker, T., Medard, M. and K. Han, "Signatures for content distribution with network coding", ISIT , 2007. |
This becomes an Appendix.