Internet DRAFT - draft-wang-cfrg-sda-wsn
draft-wang-cfrg-sda-wsn
Network Working Group L. Wang
Internet-Draft L. Chen
Intended status: Informational SoutnEast University
Expires: December 21, 2014 J. Huang
SouthEast University
June 19, 2014
A Secure Data Aggregation Scheme Based on Key Vector Sharing in Wireless
Sensor Network
draft-wang-cfrg-sda-wsn-00
Abstract
This document specifies a secure data aggregation scheme based on key
vector sharing. New key set is negotiated among sink and sensor
nodes, and then each sensor node uses one key in the set to encrypt
and decrypt sensory data. A key vector structure is constructed and
shared among them. Since the size of the shared key vector structure
is constant, the transmission overhead is saved in the proposed
scheme while security is ensured. Moreover, MAC integrity mechanism
is also included in the proposed scheme. The recommended scheme
reduces the transmitted overhead so as to save the energy consumption
of each sensor node.Meanwhile, the scheme is data-loss resilient.
Distribution of this memo is unlimited.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on December 21, 2014.
Copyright Notice
Copyright (c) 2014 IETF Trust and the persons identified as the
document authors. All rights reserved.
Wang, et al. Expires December 21, 2014 [Page 1]
Internet-Draft CTA June 2014
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1. Requirements of the secure data aggregation in WSNs . . . 3
1.2. Meaning of Symbols . . . . . . . . . . . . . . . . . . . 4
2. Network model and specify methods . . . . . . . . . . . . . . 4
2.1. Network model . . . . . . . . . . . . . . . . . . . . . . 4
2.2. Secure data aggregation . . . . . . . . . . . . . . . . . 4
2.3. MAC processing . . . . . . . . . . . . . . . . . . . . . 5
3. Secure data aggregation scheme based on Key Vector Sharing . 5
3.1. Preparation of the scheme . . . . . . . . . . . . . . . . 6
3.2. Process for the node . . . . . . . . . . . . . . . . . . 7
4. Security Considerations . . . . . . . . . . . . . . . . . . . 8
5. Overhead Considerations . . . . . . . . . . . . . . . . . . . 8
6. iana consideration . . . . . . . . . . . . . . . . . . . . . 9
7. Robustness analysis . . . . . . . . . . . . . . . . . . . . . 9
8. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 9
9. References . . . . . . . . . . . . . . . . . . . . . . . . . 10
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 10
1. Introduction
As an important part of Internet of Things (IOT), Wireless Sensor
Networks (WSNs) are used in collecting useful information by means of
deploying mounts of sensors in the real scene.Then, the sink and data
center can use this information for further analysis and process.
Nowadays WSNs are widely applied in many fields, such as military
surveillance, factory or environmental monitoring, battlefield and
health care.
Due to limited energy of sensor node in WSNs, data aggregation
technology is widely used in wireless sensor networks. The privacy
preserving of sensor node and secure transmission of WSNs have become
two of the key problems needed to be solved in data aggregation.More
information of constrained-node networks could be found in RFC
7228[RFC7228]. [RFC7228]
Wang, et al. Expires December 21, 2014 [Page 2]
Internet-Draft CTA June 2014
In typical WSNs, large numbers of nodes are distributed in a
particular area and self-organize into a network. Each sensor node
sends its own sensory data to a parent node. Then the parent node
performs the aggregation on its own sensory data and the data
received from its child node, and sends the aggregated result to its
own parent node too. After layer by layer uploading, ultimate
aggregated data reach the sink. In many practical applications, the
sink node is only interested in the statistics of those sensor data,
such as sum of the data, average and variance of those data, etc. As
these results can be all obtained in a SUM model of data aggregation
with some additional compute, we will mainly consider the addition
model of data aggregation.have been proposed and used, and can
basically meet the corresponding requirements. But due to the open
feature of WSNs, the exposure of context information can cause the
user's secret leak to the attacker, especially attacker can locate
the source node or destination node (base station) through traffic
analysis and hop track packet stream. The location of source node
and base station are very sensitive in many WSN applications such as
in precious animals detection system, the position of animals (source
node) can't be exposed to illegal hunters; in battlefield information
collection system, the position of soldier (sink node) who accepts a
variety of information can't be exposed to the enemy. Because of the
importance and necessity of location privacy, this paper has a
research on principles, advantages and disadvantages of current main
location privacy protection agreement.
1.1. Requirements of the secure data aggregation in WSNs
In the worst case, the attacker eavesdropped or captured an
aggregation node, more valuable data will be achieved.
This document aims to recommend a secure data aggregation scheme to
meet the following requirements.
1) Security: the proposed scheme should protect the data collected by
the sensor nodes from being easily obtained by eavesdroppers or
attackers.
2) Energy-saving: the proposed scheme should ensure not only that the
computation complexity in sensor nodes and aggregation nodes is as
simple as possible, but also the size of transmitted data among nodes
is short enough too.
3) Robustness: Since the sensor nodes in WSNs are often deployed in
the scene where nodes can't be replaced or charged artificially,
attack and steal is unavoidable. The proposed scheme should still
have the flexibility to work in the case of several node failure or
data loss.
Wang, et al. Expires December 21, 2014 [Page 3]
Internet-Draft CTA June 2014
In order to meet the above requirements, this document recommends a
Key Vector Sharing data aggregation scheme, which uses additional
homomorphic algorithm to encrypt each node's data and Key Vector
Sharing to achieve secure data aggregation.
1.2. Meaning of Symbols
N:Number of nodes in the whole network;
Ni:The ith Node;
P:Maximum number of child nodes in one aggregation node;
M:Modulus of modular arithmetic in the scheme;
G:Number of keys in overall key pool;
K:Number of keys in one key group;
ki:The ith key in the key group;
seed:Random number shared by sink and nodes;
Hash1:Hash function 1;
Hash2:Hash function 2 (different from Hash1).
2. Network model and specify methods
2.1. Network model
Firstly, we present the network model of secure data aggregation.
It is a multi-level tree topology, and has one sink to collect the
aggregation data result. Each parent node can have p child node at
most. Here, we assume there are 7 levels in total and p=3. Thus the
total number of nodes is N, while N=3279.
2.2. Secure data aggregation
In encryption process, we assume there are plaintext m [0, M-1],
random generated key stream k [0, M-1], then, ciphertext c=Enc(m, k,
M) =(m+ k) mod M.
In decryption process, the plaintext m=Dec(c, k, M) =(c-k) mod M.
When the encrypted data reaches the aggregation node, c1=Enc(m1, k1,
M), c2=Enc(m2, k2, M), the aggregated data is C=c1+c2=Enc(m1+m2,
k1+k2, M). However, when the sink receives the aggregation data, it
Wang, et al. Expires December 21, 2014 [Page 4]
Internet-Draft CTA June 2014
makes the decryption as Dec(c1+c2, k1+k2, M) =m1+m2. Then, we get
the aggregated plaintext.
Here, mod means a modulo operation, Enc(m, k, M) is the function
which uses k as a key to encrypt plaintext m. Dec(c, k, M)
represents the function to decrypt ciphertext c with the key k.
Obviously we have got an additional homomorphic scheme now.
However, we must know that if n different ciphertexts are added
together, module M has to be larger than the summary of all the mi,
otherwise the decryption will fail. In practical applications,
assuming that m=max(mi), then M can be chosen as M= N*m. For
example, if we have N=3279 nodes in our network, M should be not less
than 3279*m.
2.3. MAC processing
In end to end data aggregation scheme, integrity verification is a
good method to resist attacks. There are some schemes which have
been proposed to realize integrity verification in data aggregation.
The SIE scheme requires each node to compute the MAC [RFC2104]
[RFC2104]value (e.g. SHA hash value) of the key shared with sink,
and transmit the MAC value together with the encrypted sensor data to
the aggregation nodes and sink. The sink then calculates the sum of
response nodes' MAC, and compares the sum value with the received
hash value to verify integrity. Although this scheme has realized
integrity verification for CMT scheme, it does not address the ID
expansion problem and data-loss problem yet. For example, in CMT
scheme, since the sink just receives those nodes' IDs which don't
need to respond, then sink does not know the node IDs which drops
down suddenly or when data loss problems happen, this will result to
erroneous verification.
In this document, we improved the scheme as follows. Each node
appends the used key's MAC to the sensory data, aggregation nodes
only need to simply add these MACs. The sum result of the MACs will
be received by the sink finally. Meanwhile, vector V is send to the
sink too. V shows the used times of each key in the topology. Since
the sink knows all the shared keys, it can calculate the MAC value
based on the keys vector V. Therefore the sink compares the
calculated MAC with the MAC received to make sure that the data
hasn't been modified.
3. Secure data aggregation scheme based on Key Vector Sharing
Wang, et al. Expires December 21, 2014 [Page 5]
Internet-Draft CTA June 2014
3.1. Preparation of the scheme
Initially, the sink will randomly choose K keys from a big key pool
to build a new key group, here G is a great number but K need not be
set too big (e.g. G=2100, K=16). Then, each node will get one
unique key from the sink, which is randomly chosen from the key
group. Besides, the key of one node gotten from the sink is unknown
to another. The proposed secure data aggregation based on key vector
sharing has the following processing phases.
Phase 2: When an aggregating round begins, both the leaf nodes and
aggregation nodes collect the sensory data and encrypt their own
data, then leaf nodes send their encrypted data to aggregation nodes,
aggregation nodes perform aggregation over the encrypted data. The
processing details are shown as follows:
If Ni is a leaf node, it first calculates Hash1(seed,ki), here ki is
the unique key assigned from the sink. Then it uses the hash value
as the encryption key to be added with collected data di, the result
of the addition is the ciphertext C. Here, the addition is based on
modular M. Meanwhile, the node generates a vector V=(v1,...vi-
1,vi,...vK), which represents each key's used times, here vi will be
set 1 and others will be set 0 because node has just used ki one
time. The value of Hash2(seed,ki) will also be calculated by the
node, and denoted by H. In the end, the node connects C, V and H to
a data structure(C,V,H), which will be sent to its parent node.
If Nj is an aggregation node, it first calculates Hash1(seed,kj),
here kj is the unique key assigned to this node. Then it uses the
hash value as the encryption key to add with collected data dj, thus
ciphertext C is obtained. Meanwhile, the aggregation node generates
a vector V=(v1,...vi-1,vi,...vK) which represents each key's used
times, here vj will be set 1 and others will be set 0 because the
node has just used kj one time. The Hash2(seed,kj) value will also
be calculated by the node, and denoted by H. Then, the aggregation
node adds its own C with other Cs received from its children nodes,
adds its own V vector with other Vs received from its children nodes,
and adds its own H with other Hs received from its children modes.
When addition of Hs is computed, the overflow bits in the modular
addition can be ignored.
Phase 3: After receiving the aggregated data, the sink first
calculates all the values of Hash1(seed,kj)(i=1,2,...K) and
Hash2(seed,kj)(i=1,2,...K), because it has all the keys in the key
group. According to the received vector V, the sink can retrieve the
sum of all sensed data' sum d1+d2+...dN. The plaintext of
aggregation data result will be regained by modular subtracting vi
times of each Hash1(seed,kj) from the received ciphertext C.
Wang, et al. Expires December 21, 2014 [Page 6]
Internet-Draft CTA June 2014
Meanwhile, the sink sums vi times of each Hash2(seed,kj) to get a
value denoted as H'. Then it compares the H'value with the received
H to verify the transmitted data integrity.
3.2. Process for the node
Phase 1: In the beginning, the sink broadcasts a request to all
sensors to generate network topology (same as TAG scheme). After
that, the sink broadcasts a seed for a new aggregating round.
However, it's optional. As we can use the synchronous time or a
serial number as the seed, which needn't be broadcast every round.
Phase 2: When an aggregating round begins, both the leaf nodes and
aggregation nodes collect the sensory data and encrypt their own
data, then leaf nodes send their encrypted data to aggregation nodes,
aggregation nodes perform aggregation over the encrypted data. The
processing details are shown as follows:
If Ni is a leaf node, it first calculates Hash1(seed,ki), here ki is
the unique key assigned from the sink. Then it uses the hash value
as the encryption key to be added with collected data di, the result
of the addition is the ciphertext C. Here, the addition is based on
modular M. Meanwhile, the node generates a vector V=(v1,...vi-
1,vi,...vK), which represents each key's used times, here vi will be
set 1 and others will be set 0 because node has just used ki one
time. The value of Hash2(seed,ki) will also be calculated by the
node, and denoted by H. In the end, the node connects C, V and H to
a data structure(C,V,H), which will be sent to its parent node.
If Nj is an aggregation node, it first calculates Hash1(seed,kj),
here kj is the unique key assigned to this node. Then it uses the
hash value as the encryption key to add with collected data dj, thus
ciphertext C is obtained. Meanwhile, the aggregation node generates
a vector V=(v1,...vi-1,vi,...vK) which represents each key's used
times, here vj will be set 1 and others will be set 0 because the
node has just used kj one time. The Hash2(seed,kj) value will also
be calculated by the node, and denoted by H. Then, the aggregation
node adds its own C with other Cs received from its children nodes,
adds its own V vector with other Vs received from its children nodes,
and adds its own H with other Hs received from its children modes.
When addition of Hs is computed, the overflow bits in the modular
addition can be ignored.
Phase 3: After receiving the aggregated data, the sink first
calculates all the values of Hash1(seed,kj)(i=1,2,...K) and
Hash2(seed,kj)(i=1,2,...K), because it has all the keys in the key
group. According to the received vector V, the sink can retrieve the
sum of all sensed data' sum d1+d2+...dN. The plaintext of
Wang, et al. Expires December 21, 2014 [Page 7]
Internet-Draft CTA June 2014
aggregation data result will be regained by modular subtracting vi
times of each Hash1(seed,kj) from the received ciphertext C.
Meanwhile, the sink sums vi times of each Hash2(seed,kj) to get a
value denoted as H'. Then it compares the H'value with the received
H to verify the transmitted data integrity.
4. Security Considerations
We take the tree network topology with seven levels for instance. In
this network topology, each parent node can have 3 child nodes at
most, so the whole network has not more than 3279 nodes in total. We
assume that key group's size is 16(K=16). Thus, for the sink, there
are 16 numbers in the final V it received, and each number could be
anyone range between 1 and 3279. Since different V represents
different used times of each key in the encryption process, the
vector's space is so huge (more than 2200) for an attacker to crack
the ciphertext. Due to the addition of V in the aggregation process,
two adjacent aggregation nodes have totally different V vector, thus
one node cannot decrypt the other node's ciphertext. Moreover, for a
leaf node, because it does not know which key its adjacent node has
got from the key group and it holds only one key, it can't decrypt
the data of an adjacent node too. In summary, the proposed scheme
provides good privacy protection for the nodes and the transmitted
data.
Furthermore, if an attacker captures a node, all he can get is the
data structure (C,V,H) of the node. What he may want is to find
another node with the same vector V which will help him to analysis
the ciphertext C. In the worst case, assuming that the attacker can
eavesdrop on all nodes in the entire network and get all nodes'
transmitted data, there will be two cases when the attacker looks for
the nodes having the same key vector V: 1) If the attacker captures a
leaf node, then he may find 1/16 of all the leaf nodes, which is just
4% of the nodes in whole network. Because these nodes are randomly
located in the area, they don't represent any actually useful
information. 2) If the attacker captures an aggregation node, the
probability of finding other nodes having the same vector V will be
greatly reduced. For example, the probability is 1/103 in layer 6,
and is 1/106 in layer 5. It is very hard for the attacker to find
another aggregation node having same vector V.
5. Overhead Considerations
The recommended scheme only requires a vector V having a limited
number of bits instead of transmission of all nodes' ID. Vector V
won't rapidly expand in the process of aggregation. For example,
based on the communication overhead discussion of CMT scheme,
transmission overhead of each node increases from 75 bits in bottom
Wang, et al. Expires December 21, 2014 [Page 8]
Internet-Draft CTA June 2014
level to 950 bits in level 1 in the case that 10% nodes didn't
respond. It becomes worse in the case that 30% nodes didn't respond,
while communication overhead increases from 75 bits to 2700 bits.
However, in this document, the communication overhead ranges from 75
bits to 187 bits, and most of the nodes' overhead is not larger than
75 bits.
Since the sensory data's measure range is m=27, in order to decrypt
the ciphertext properly, modulus M should be at least N*m. Hence
each node's encrypted data segment needs 12+7=19bits (here we assume
n=212=4096>3279). For consistency, we also plus 56 additional load
bits like the CMT scheme, thus the basic data segment is 56+19=75bits
in size. In CMT scheme, each node in the highest level of the
aggregation tree will transmit at least 950 bits in the case that 10%
nodes didn't respond. However, in the proposed scheme, each node in
the highest level only transmits at most 187 bits in the average
case. Because the average using times of ki after the last
aggregation is:3276(1-10%)/3*16=61<26<<27, each vi in the V needs 7
bits at most to represent the number of used times, V's length is
7*16=112 bits, and the total bits required for transmitting is
75+112=187 bits. Obviously, the proposed scheme saves over 80% of
the overhead in the high level, and for the nodes in other lower
levels, the length of transmitted data is also shorter than those
related schemes too.
6. iana consideration
To be completed.
7. Robustness analysis
The recommended scheme has solved the data-loss problem as well. For
example, when a node dropped due to a sudden accident, the sink
didn't need to know its ID. Because once there is a node dropped or
data lost in the transferring process, we will lose not only C but
also V and H, the lost data's V and H will not reach the aggregation
node, so the correctness of the decrypted data is guaranteed. Thus,
the proposed scheme is robust, which is very important for a sensor
network which changes topology frequently.
8. Conclusions
In this document, the recommended secure data aggregation scheme
based on Key Vector Sharing overcomes the problem of rapid expansion
of transferred IDs, reduces the additional overhead due to
transferring of ID information, and improves the endurance of the
network. Moreover, by using hash function, the scheme guarantees the
Wang, et al. Expires December 21, 2014 [Page 9]
Internet-Draft CTA June 2014
integrity of the aggregation. In addition, solving the data-loss
problem makes the proposed scheme more robust and practical.
9. References
[RFC7228] Bormann, C., Ersue, M., and A. Keranen, "Terminology for
Constrained-Node Networks", May 2014.
[RFC2104] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-
Hashing for Message Authentication", February 1997.
Authors' Addresses
Likun Wang
SoutnEast University
N0.9, Mo Zhoudong street Nan Jing, Jiang Su Province 210096
Phone: +86 18795858986
Email: kunliw@aliyun.com
Liquan Chen
SoutnEast University
N0.9, Mo Zhoudong street Nan Jing, Jiang Su Province 210096
Phone: +86 13813852253
Email: Lqchen@seu.edu.cn
Jie Huang
SouthEast University
N0.9, Mo Zhoudong street Nan Jing, Jiang Su Province 210096
Phone: +86 13675178016
Email: jhuang@seu.edu.cn
Wang, et al. Expires December 21, 2014 [Page 10]