Network Working Group | L. Wang
Internet-Draft | L. Chen
Intended status: Informational | SouthEast University
Expires: December 21, 2014 | J. Huang
 | SouthEast University
 | June 19, 2014
A Secure Data Aggregation Scheme Based on Key Vector Sharing in Wireless Sensor Network
draft-wang-cfrg-sda-wsn-00
This document specifies a secure data aggregation scheme based on key vector sharing. A new key set is negotiated between the sink and the sensor nodes, and each sensor node then uses one key in the set to encrypt and decrypt its sensory data. A key vector structure is constructed and shared among them. Since the size of the shared key vector structure is constant, the proposed scheme reduces transmission overhead while ensuring security. A MAC-based integrity mechanism is also included. By reducing the transmitted overhead, the scheme saves energy on each sensor node; it is also resilient to data loss. Distribution of this memo is unlimited.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on December 21, 2014.
Copyright (c) 2014 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
As an important part of the Internet of Things (IoT), Wireless Sensor Networks (WSNs) collect useful information by deploying large numbers of sensors in the field. The sink and the data center can then use this information for further analysis and processing. Nowadays WSNs are widely applied in many fields, such as military surveillance, factory and environmental monitoring, battlefields, and health care.
Due to the limited energy of sensor nodes, data aggregation technology is widely used in WSNs. Preserving the privacy of sensor nodes and securing the transmission in WSNs have become two of the key problems to be solved in data aggregation. More information on constrained-node networks can be found in RFC 7228 [RFC7228].
In typical WSNs, large numbers of nodes are distributed in a particular area and self-organize into a network. Each sensor node sends its own sensory data to a parent node. The parent node performs aggregation on its own sensory data and the data received from its child nodes, and sends the aggregated result on to its own parent node. After layer-by-layer uploading, the final aggregated data reach the sink. In many practical applications, the sink is only interested in statistics of the sensor data, such as the sum, average, and variance of the data. Since all of these results can be obtained from a SUM model of data aggregation with some additional computation, this document mainly considers the addition model of data aggregation.
In the worst case, where an attacker eavesdrops on or captures an aggregation node, even more valuable data can be obtained.
This document aims to recommend a secure data aggregation scheme to meet the following requirements.
1) Security: the proposed scheme should protect the data collected by the sensor nodes from being easily obtained by eavesdroppers or attackers.
2) Energy saving: the proposed scheme should ensure not only that the computation in sensor nodes and aggregation nodes is as simple as possible, but also that the size of the data transmitted among nodes is as small as possible.
3) Robustness: since the sensor nodes in WSNs are often deployed in scenes where nodes cannot be replaced or recharged manually, attacks and node failures are unavoidable. The proposed scheme should remain able to work in the presence of several node failures or data losses.
To meet the above requirements, this document recommends a key vector sharing data aggregation scheme, which uses an additively homomorphic algorithm to encrypt each node's data and key vector sharing to achieve secure data aggregation.
N:     Number of nodes in the whole network;
Ni:    The i-th node;
P:     Maximum number of child nodes of one aggregation node;
M:     Modulus of the modular arithmetic in the scheme;
G:     Number of keys in the overall key pool;
K:     Number of keys in one key group;
ki:    The i-th key in the key group;
seed:  Random number shared by the sink and the nodes;
Hash1: Hash function 1;
Hash2: Hash function 2 (different from Hash1).
Firstly, we present the network model of secure data aggregation.
It is a multi-level tree topology with one sink that collects the aggregated result. Each parent node can have at most p child nodes. Here, we assume there are 7 levels of sensor nodes below the sink and p = 3; the total number of nodes is therefore N = 3 + 3^2 + ... + 3^7 = 3279.
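The size of this example topology can be checked with a short computation; the following Python sketch is purely illustrative and not part of the scheme itself:

   # Size of the example tree: 7 levels of sensor nodes below the
   # sink, each parent having at most p = 3 children.
   p, levels = 3, 7
   N = sum(p ** i for i in range(1, levels + 1))
   print(N)  # 3279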
In the encryption process, we assume a plaintext m in [0, M-1] and a randomly generated key stream k in [0, M-1]; the ciphertext is then c = Enc(m, k, M) = (m + k) mod M.
In the decryption process, the plaintext is m = Dec(c, k, M) = (c - k) mod M. When encrypted data c1 = Enc(m1, k1, M) and c2 = Enc(m2, k2, M) reach an aggregation node, the aggregated data is C = (c1 + c2) mod M = Enc(m1 + m2, k1 + k2, M). When the sink receives the aggregated data, it decrypts it as Dec(c1 + c2, k1 + k2, M) = m1 + m2, which yields the aggregated plaintext.
Here, mod denotes the modulo operation, Enc(m, k, M) is the function that uses k as a key to encrypt plaintext m, and Dec(c, k, M) is the function that decrypts ciphertext c with the key k. This gives an additively homomorphic scheme.
However, note that if n different ciphertexts are added together, the modulus M has to be larger than the sum of all the mi, otherwise decryption will fail. In practical applications, assuming that m = max(mi), M can be chosen as M = N*m. For example, with N = 3279 nodes in our network, M should be no less than 3279*m.
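A minimal Python sketch of this additively homomorphic encryption follows; the parameter values are illustrative and not specified by this document:

   M = 4096 * 128  # modulus; must exceed the sum of all plaintexts

   def enc(m, k):
       # c = Enc(m, k, M) = (m + k) mod M
       return (m + k) % M

   def dec(c, k):
       # m = Dec(c, k, M) = (c - k) mod M
       return (c - k) % M

   # Additive homomorphism: the sum of ciphertexts decrypts, under the
   # sum of the keys, to the sum of the plaintexts.
   m1, k1 = 25, 1000
   m2, k2 = 31, 2000
   C = (enc(m1, k1) + enc(m2, k2)) % M
   assert dec(C, k1 + k2) == m1 + m2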
In end-to-end data aggregation schemes, integrity verification is a good method to resist attacks, and several schemes have been proposed to realize it. The SIE scheme requires each node to compute the MAC [RFC2104] value (e.g., a SHA hash value) of the key shared with the sink, and to transmit the MAC value together with the encrypted sensor data to the aggregation nodes and the sink. The sink then calculates the sum of the responding nodes' MACs and compares it with the received hash value to verify integrity. Although this scheme realizes integrity verification for the CMT scheme, it does not address the ID expansion problem or the data-loss problem. For example, in the CMT scheme the sink only receives the IDs of nodes that need not respond, so it does not know the IDs of nodes that drop out suddenly; when data loss happens, this results in erroneous verification.
In this document, we improve the scheme as follows. Each node appends the MAC of its used key to the sensory data, and aggregation nodes simply add these MACs; the sum of the MACs finally reaches the sink. Meanwhile, a vector V is sent to the sink as well; V records how many times each key was used in the topology. Since the sink knows all the shared keys, it can calculate the expected MAC value from the key vector V and compare it with the received MAC to verify that the data has not been modified.
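The additive MAC check in miniature (a hedged sketch: the MAC values and widths below are placeholders, not values defined by this document):

   B = 2 ** 64                     # MACs are added modulo a fixed width
   macs = {0: 11, 1: 22, 2: 33}    # placeholder MAC value per key index
   used = [0, 0, 1, 2]             # key index used by each of four nodes
   agg = sum(macs[i] for i in used) % B      # accumulated along the tree
   V = [used.count(j) for j in range(3)]     # key-usage vector: [2, 1, 1]
   expected = sum(v * macs[j] for j, v in enumerate(V)) % B
   assert agg == expected          # the sink's recomputation matches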
Initially, the sink randomly chooses K keys from a big key pool to build a new key group; here G is a great number, but K need not be set very large (e.g., G = 2^100, K = 16). Then each node obtains one key from the sink, chosen at random from the key group; the key one node obtains is unknown to the others. The proposed secure data aggregation based on key vector sharing proceeds in the phases described below.
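Before the phase descriptions, here is a minimal sketch of this setup, assuming Python's secrets module; the large pool of G keys is modeled by drawing fresh random keys:

   import secrets

   K = 16
   # Stands in for drawing K keys from the large pool of G keys.
   key_group = [secrets.token_bytes(16) for _ in range(K)]

   def assign_key(key_group):
       # Each node receives one key, chosen at random from the group;
       # the index i determines which counter vi the node increments.
       i = secrets.randbelow(len(key_group))
       return i, key_group[i]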
Phase 1: In the beginning, the sink broadcasts a request to all sensors to generate the network topology (as in the TAG scheme). After that, the sink broadcasts a seed for a new aggregating round. This broadcast is optional: a synchronized time value or a serial number can serve as the seed, so it need not be broadcast every round.
Phase 2: When an aggregating round begins, both the leaf nodes and the aggregation nodes collect sensory data and encrypt their own data; leaf nodes then send their encrypted data to aggregation nodes, and aggregation nodes perform aggregation over the encrypted data. The processing details are as follows:
If Ni is a leaf node, it first calculates Hash1(seed, ki), where ki is the key assigned to it by the sink. It then uses the hash value as the encryption key, adding it to the collected data di modulo M; the result of the addition is the ciphertext C. Meanwhile, the node generates a vector V = (v1, ..., vi, ..., vK) that records how many times each key was used; vi is set to 1 and all other entries to 0, because the node has used ki exactly once. The node also calculates Hash2(seed, ki), denoted H. Finally, the node concatenates C, V, and H into a data structure (C, V, H), which is sent to its parent node.
If Nj is an aggregation node, it first calculates Hash1(seed, kj), where kj is the key assigned to it, and likewise uses the hash value as the encryption key to add to its collected data dj modulo M, obtaining its own ciphertext C. It also generates its vector V = (v1, ..., vj, ..., vK) with vj set to 1 and all other entries to 0, and calculates H = Hash2(seed, kj). Then the aggregation node adds its own C to the Cs received from its child nodes, adds its own vector V to the received Vs, and adds its own H to the received Hs. When the Hs are added, the overflow bits of the modular addition can be ignored.
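The Phase 2 processing above can be summarized in the following sketch; Hash1 and Hash2 are modeled with domain-separated SHA-256, and the widths are assumptions rather than values fixed by this document:

   import hashlib

   M = 2 ** 19    # modulus for ciphertext addition
   B = 2 ** 64    # modulus for MAC addition (overflow ignored)
   K = 16

   def h1(seed, key):  # stands in for Hash1(seed, ki)
       return int.from_bytes(
           hashlib.sha256(b"H1" + seed + key).digest()[:4], "big") % M

   def h2(seed, key):  # stands in for Hash2(seed, ki)
       return int.from_bytes(
           hashlib.sha256(b"H2" + seed + key).digest()[:8], "big") % B

   def node_message(seed, key, key_index, d):
       # Build (C, V, H) for a node's own reading d.
       C = (d + h1(seed, key)) % M
       V = [0] * K
       V[key_index] = 1              # this node used its key once
       H = h2(seed, key)
       return C, V, H

   def aggregate(own, children):
       # Combine a node's own (C, V, H) with its children's.
       C, V, H = own
       for c, v, h in children:
           C = (C + c) % M
           V = [a + b for a, b in zip(V, v)]
           H = (H + h) % B           # overflow bits ignored
       return C, V, H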
Phase 3: After receiving the aggregated data, the sink first calculates all the values Hash1(seed, ki) and Hash2(seed, ki) for i = 1, 2, ..., K, since it holds all the keys in the key group. Using the received vector V, the sink recovers the sum of all sensed data, d1 + d2 + ... + dN: the plaintext of the aggregated result is obtained by modularly subtracting vi times each Hash1(seed, ki) from the received ciphertext C. Meanwhile, the sink sums vi times each Hash2(seed, ki) to obtain a value denoted H'. It then compares H' with the received H to verify the integrity of the transmitted data.
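Continuing the sketch above, the sink-side recovery and verification might look as follows (same assumed h1, h2, M, and B):

   def sink_decrypt_and_verify(seed, key_group, C, V, H):
       # Remove the V-weighted key-stream pad to recover the sum.
       pad = sum(v * h1(seed, k) for k, v in zip(key_group, V)) % M
       total = (C - pad) % M   # d1 + d2 + ... (mod M)
       # Recompute the expected MAC sum H' from V and the shared keys.
       H_expected = sum(v * h2(seed, k) for k, v in zip(key_group, V)) % B
       return total, H_expected == H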
We take the tree network topology with seven levels as an example. In this topology, each parent node can have at most 3 child nodes, so the whole network has no more than 3279 nodes in total. We assume the key group's size is 16 (K = 16). Thus, the final V received by the sink contains 16 numbers, each of which can range up to 3279. Since different Vs represent different usage counts of each key in the encryption process, the vector space is so huge (more than 2^200) that an attacker cannot crack the ciphertext. Because Vs are added during aggregation, two adjacent aggregation nodes have totally different V vectors, so one node cannot decrypt the other node's ciphertext. Moreover, a leaf node does not know which key its neighbor obtained from the key group and holds only one key itself, so it cannot decrypt an adjacent node's data either. In summary, the proposed scheme provides good privacy protection for the nodes and the transmitted data.
Furthermore, if an attacker captures a node, all he can get is that node's data structure (C, V, H). What he may want is to find another node with the same vector V, which would help him analyze the ciphertext C. In the worst case, assuming the attacker can eavesdrop on all nodes in the entire network and obtain all transmitted data, there are two cases when he looks for nodes having the same key vector V: 1) If the attacker captures a leaf node, he may find 1/16 of all the leaf nodes, which is just about 4% of the nodes in the whole network. Because these nodes are randomly located in the area, they do not reveal any actually useful information. 2) If the attacker captures an aggregation node, the probability of finding other nodes with the same vector V is greatly reduced: for example, it is about 1/10^3 in level 6 and about 1/10^6 in level 5. It is very hard for the attacker to find another aggregation node with the same vector V.
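The arithmetic behind the 4% figure can be checked directly (an illustrative computation using the example topology):

   leaves = 3 ** 7              # 2187 bottom-level nodes
   same_key = leaves / 16       # about 137 leaves share any given key
   print(same_key / 3279)       # about 0.042, i.e., roughly 4% of all nodes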
The recommended scheme only requires a vector V with a limited number of bits instead of transmitting all nodes' IDs, and V does not rapidly expand during aggregation. For example, based on the communication overhead discussion of the CMT scheme, the transmission overhead of each node grows from 75 bits at the bottom level to 950 bits at level 1 when 10% of nodes do not respond, and from 75 bits to 2700 bits when 30% of nodes do not respond. In the scheme of this document, however, the communication overhead ranges from 75 bits to 187 bits, and most nodes' overhead is no larger than 75 bits.
Since the sensory data's measurement range is m = 2^7, modulus M should be at least N*m in order to decrypt the ciphertext properly. Hence each node's encrypted data segment needs 12 + 7 = 19 bits (here we assume n = 2^12 = 4096 > 3279). For consistency with the CMT scheme, we also add 56 bits of additional load, so the basic data segment is 56 + 19 = 75 bits. In the CMT scheme, each node at the highest level of the aggregation tree transmits at least 950 bits when 10% of nodes do not respond. In the proposed scheme, however, each node at the highest level transmits at most 187 bits in the average case: the average number of times ki is used after the last aggregation is 3276 * (1 - 10%) / (3 * 16) = 61 < 2^6 << 2^7, so each vi in V needs at most 7 bits to represent its usage count, V's length is 7 * 16 = 112 bits, and the total number of bits to transmit is 75 + 112 = 187. The proposed scheme thus saves over 80% of the overhead at the high level, and for nodes at the lower levels the transmitted data is also shorter than in the related schemes.
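The bit-count arithmetic above can be reproduced as follows (values taken from this section; the 56-bit load follows the CMT comparison):

   import math

   N, m_bits, load = 3279, 7, 56
   id_bits = math.ceil(math.log2(N + 1))   # 12, since 2^12 = 4096 > 3279
   basic = load + id_bits + m_bits         # 56 + 19 = 75 bits
   v_bits = 7 * 16                         # 7 bits per counter, K = 16
   print(basic, basic + v_bits)            # 75 187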
The recommended scheme also solves the data-loss problem. For example, when a node drops out due to a sudden failure, the sink does not need to know its ID: once a node drops out or data is lost in transit, C, V, and H are lost together, so the lost data's V and H never reach the aggregation node and the correctness of the decrypted data is preserved. The proposed scheme is therefore robust, which is very important for a sensor network whose topology changes frequently.
In this document, the recommended secure data aggregation scheme based on key vector sharing overcomes the rapid expansion of transferred IDs, reduces the additional overhead caused by transferring ID information, and improves the endurance of the network. Moreover, by using hash functions, the scheme guarantees the integrity of the aggregation. In addition, solving the data-loss problem makes the proposed scheme more robust and practical.
[RFC7228] Bormann, C., Ersue, M., and A. Keranen, "Terminology for Constrained-Node Networks", RFC 7228, May 2014.
[RFC2104] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-Hashing for Message Authentication", RFC 2104, February 1997.