wang-cfrg-sda-wsn-00.txt

Internet DRAFT - draft-wang-cfrg-sda-wsn
draft-wang-cfrg-sda-wsn

Last Version:	draft-wang-cfrg-sda-wsn-00.txt	Tracker Entry
Date:	`21-Jun-2014`
Disposition:	expired

Network Working Group                                            L. Wang
Internet-Draft                                                   L. Chen
Intended status: Informational                      SoutnEast University
Expires: December 21, 2014                                      J. Huang
                                                    SouthEast University
                                                           June 19, 2014


A Secure Data Aggregation Scheme Based on Key Vector Sharing in Wireless
                             Sensor Network
                       draft-wang-cfrg-sda-wsn-00

Abstract

   This document specifies a secure data aggregation scheme based on key
   vector sharing.  New key set is negotiated among sink and sensor
   nodes, and then each sensor node uses one key in the set to encrypt
   and decrypt sensory data.  A key vector structure is constructed and
   shared among them.  Since the size of the shared key vector structure
   is constant, the transmission overhead is saved in the proposed
   scheme while security is ensured.  Moreover, MAC integrity mechanism
   is also included in the proposed scheme.  The recommended scheme
   reduces the transmitted overhead so as to save the energy consumption
   of each sensor node.Meanwhile, the scheme is data-loss resilient.
   Distribution of this memo is unlimited.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 21, 2014.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.




Wang, et al.            Expires December 21, 2014               [Page 1]

Internet-Draft                     CTA                         June 2014


   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
     1.1.  Requirements of the secure data aggregation in WSNs . . .   3
     1.2.  Meaning of Symbols  . . . . . . . . . . . . . . . . . . .   4
   2.  Network model and specify methods . . . . . . . . . . . . . .   4
     2.1.  Network model . . . . . . . . . . . . . . . . . . . . . .   4
     2.2.  Secure data aggregation . . . . . . . . . . . . . . . . .   4
     2.3.  MAC processing  . . . . . . . . . . . . . . . . . . . . .   5
   3.  Secure data aggregation scheme based on Key Vector Sharing  .   5
     3.1.  Preparation of the scheme . . . . . . . . . . . . . . . .   6
     3.2.  Process for the node  . . . . . . . . . . . . . . . . . .   7
   4.  Security Considerations . . . . . . . . . . . . . . . . . . .   8
   5.  Overhead Considerations . . . . . . . . . . . . . . . . . . .   8
   6.  iana consideration  . . . . . . . . . . . . . . . . . . . . .   9
   7.  Robustness analysis . . . . . . . . . . . . . . . . . . . . .   9
   8.  Conclusions . . . . . . . . . . . . . . . . . . . . . . . . .   9
   9.  References  . . . . . . . . . . . . . . . . . . . . . . . . .  10
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  10

1.  Introduction

   As an important part of Internet of Things (IOT), Wireless Sensor
   Networks (WSNs) are used in collecting useful information by means of
   deploying mounts of sensors in the real scene.Then, the sink and data
   center can use this information for further analysis and process.
   Nowadays WSNs are widely applied in many fields, such as military
   surveillance, factory or environmental monitoring, battlefield and
   health care.

   Due to limited energy of sensor node in WSNs, data aggregation
   technology is widely used in wireless sensor networks.  The privacy
   preserving of sensor node and secure transmission of WSNs have become
   two of the key problems needed to be solved in data aggregation.More
   information of constrained-node networks could be found in RFC
   7228[RFC7228].  [RFC7228]





Wang, et al.            Expires December 21, 2014               [Page 2]

Internet-Draft                     CTA                         June 2014


   In typical WSNs, large numbers of nodes are distributed in a
   particular area and self-organize into a network.  Each sensor node
   sends its own sensory data to a parent node.  Then the parent node
   performs the aggregation on its own sensory data and the data
   received from its child node, and sends the aggregated result to its
   own parent node too.  After layer by layer uploading, ultimate
   aggregated data reach the sink.  In many practical applications, the
   sink node is only interested in the statistics of those sensor data,
   such as sum of the data, average and variance of those data, etc.  As
   these results can be all obtained in a SUM model of data aggregation
   with some additional compute, we will mainly consider the addition
   model of data aggregation.have been proposed and used, and can
   basically meet the corresponding requirements.  But due to the open
   feature of WSNs, the exposure of context information can cause the
   user's secret leak to the attacker, especially attacker can locate
   the source node or destination node (base station) through traffic
   analysis and hop track packet stream.  The location of source node
   and base station are very sensitive in many WSN applications such as
   in precious animals detection system, the position of animals (source
   node) can't be exposed to illegal hunters; in battlefield information
   collection system, the position of soldier (sink node) who accepts a
   variety of information can't be exposed to the enemy.  Because of the
   importance and necessity of location privacy, this paper has a
   research on principles, advantages and disadvantages of current main
   location privacy protection agreement.

1.1.  Requirements of the secure data aggregation in WSNs

   In the worst case, the attacker eavesdropped or captured an
   aggregation node, more valuable data will be achieved.

   This document aims to recommend a secure data aggregation scheme to
   meet the following requirements.

   1) Security: the proposed scheme should protect the data collected by
   the sensor nodes from being easily obtained by eavesdroppers or
   attackers.

   2) Energy-saving: the proposed scheme should ensure not only that the
   computation complexity in sensor nodes and aggregation nodes is as
   simple as possible, but also the size of transmitted data among nodes
   is short enough too.

   3) Robustness: Since the sensor nodes in WSNs are often deployed in
   the scene where nodes can't be replaced or charged artificially,
   attack and steal is unavoidable.  The proposed scheme should still
   have the flexibility to work in the case of several node failure or
   data loss.



Wang, et al.            Expires December 21, 2014               [Page 3]

Internet-Draft                     CTA                         June 2014


   In order to meet the above requirements, this document recommends a
   Key Vector Sharing data aggregation scheme, which uses additional
   homomorphic algorithm to encrypt each node's data and Key Vector
   Sharing to achieve secure data aggregation.

1.2.  Meaning of Symbols

   N&#65306;Number of nodes in the whole network;

   Ni&#65306;The ith Node;

   P&#65306;Maximum number of child nodes in one aggregation node;

   M&#65306;Modulus of modular arithmetic in the scheme;

   G&#65306;Number of keys in overall key pool;

   K&#65306;Number of keys in one key group;

   ki&#65306;The ith key in the key group;

   seed&#65306;Random number shared by sink and nodes;

   Hash1&#65306;Hash function 1;

   Hash2&#65306;Hash function 2 (different from Hash1).

2.  Network model and specify methods

2.1.  Network model

   Firstly, we present the network model of secure data aggregation.

   It is a multi-level tree topology, and has one sink to collect the
   aggregation data result.  Each parent node can have p child node at
   most.  Here, we assume there are 7 levels in total and p=3.  Thus the
   total number of nodes is N, while N=3279.

2.2.  Secure data aggregation

   In encryption process, we assume there are plaintext m [0, M-1],
   random generated key stream k [0, M-1], then, ciphertext c=Enc(m, k,
   M) =(m+ k) mod M.

   In decryption process, the plaintext m=Dec(c, k, M) =(c-k) mod M.
   When the encrypted data reaches the aggregation node, c1=Enc(m1, k1,
   M), c2=Enc(m2, k2, M), the aggregated data is C=c1+c2=Enc(m1+m2,
   k1+k2, M).  However, when the sink receives the aggregation data, it



Wang, et al.            Expires December 21, 2014               [Page 4]

Internet-Draft                     CTA                         June 2014


   makes the decryption as Dec(c1+c2, k1+k2, M) =m1+m2.  Then, we get
   the aggregated plaintext.

   Here, mod means a modulo operation, Enc(m, k, M) is the function
   which uses k as a key to encrypt plaintext m.  Dec(c, k, M)
   represents the function to decrypt ciphertext c with the key k.
   Obviously we have got an additional homomorphic scheme now.

   However, we must know that if n different ciphertexts are added
   together, module M has to be larger than the summary of all the mi,
   otherwise the decryption will fail.  In practical applications,
   assuming that m=max(mi), then M can be chosen as M= N*m.  For
   example, if we have N=3279 nodes in our network, M should be not less
   than 3279*m.

2.3.  MAC processing

   In end to end data aggregation scheme, integrity verification is a
   good method to resist attacks.  There are some schemes which have
   been proposed to realize integrity verification in data aggregation.
   The SIE scheme requires each node to compute the MAC [RFC2104]
   [RFC2104]value (e.g.  SHA hash value) of the key shared with sink,
   and transmit the MAC value together with the encrypted sensor data to
   the aggregation nodes and sink.  The sink then calculates the sum of
   response nodes' MAC, and compares the sum value with the received
   hash value to verify integrity.  Although this scheme has realized
   integrity verification for CMT scheme, it does not address the ID
   expansion problem and data-loss problem yet.  For example, in CMT
   scheme, since the sink just receives those nodes' IDs which don't
   need to respond, then sink does not know the node IDs which drops
   down suddenly or when data loss problems happen, this will result to
   erroneous verification.

   In this document, we improved the scheme as follows.  Each node
   appends the used key's MAC to the sensory data, aggregation nodes
   only need to simply add these MACs.  The sum result of the MACs will
   be received by the sink finally.  Meanwhile, vector V is send to the
   sink too.  V shows the used times of each key in the topology.  Since
   the sink knows all the shared keys, it can calculate the MAC value
   based on the keys vector V.  Therefore the sink compares the
   calculated MAC with the MAC received to make sure that the data
   hasn't been modified.

3.  Secure data aggregation scheme based on Key Vector Sharing







Wang, et al.            Expires December 21, 2014               [Page 5]

Internet-Draft                     CTA                         June 2014


3.1.  Preparation of the scheme

   Initially, the sink will randomly choose K keys from a big key pool
   to build a new key group, here G is a great number but K need not be
   set too big (e.g.  G=2100, K=16).  Then, each node will get one
   unique key from the sink, which is randomly chosen from the key
   group.  Besides, the key of one node gotten from the sink is unknown
   to another.  The proposed secure data aggregation based on key vector
   sharing has the following processing phases.

   Phase 2: When an aggregating round begins, both the leaf nodes and
   aggregation nodes collect the sensory data and encrypt their own
   data, then leaf nodes send their encrypted data to aggregation nodes,
   aggregation nodes perform aggregation over the encrypted data.  The
   processing details are shown as follows:

   If Ni is a leaf node, it first calculates Hash1(seed,ki), here ki is
   the unique key assigned from the sink.  Then it uses the hash value
   as the encryption key to be added with collected data di, the result
   of the addition is the ciphertext C.  Here, the addition is based on
   modular M.  Meanwhile, the node generates a vector V=(v1,...vi-
   1,vi,...vK), which represents each key's used times, here vi will be
   set 1 and others will be set 0 because node has just used ki one
   time.  The value of Hash2(seed,ki) will also be calculated by the
   node, and denoted by H.  In the end, the node connects C, V and H to
   a data structure(C,V,H), which will be sent to its parent node.

   If Nj is an aggregation node, it first calculates Hash1(seed,kj),
   here kj is the unique key assigned to this node.  Then it uses the
   hash value as the encryption key to add with collected data dj, thus
   ciphertext C is obtained.  Meanwhile, the aggregation node generates
   a vector V=(v1,...vi-1,vi,...vK) which represents each key's used
   times, here vj will be set 1 and others will be set 0 because the
   node has just used kj one time.  The Hash2(seed,kj) value will also
   be calculated by the node, and denoted by H.  Then, the aggregation
   node adds its own C with other Cs received from its children nodes,
   adds its own V vector with other Vs received from its children nodes,
   and adds its own H with other Hs received from its children modes.
   When addition of Hs is computed, the overflow bits in the modular
   addition can be ignored.

   Phase 3: After receiving the aggregated data, the sink first
   calculates all the values of Hash1(seed,kj)(i=1,2,...K) and
   Hash2(seed,kj)(i=1,2,...K), because it has all the keys in the key
   group.  According to the received vector V, the sink can retrieve the
   sum of all sensed data' sum d1+d2+...dN.  The plaintext of
   aggregation data result will be regained by modular subtracting vi
   times of each Hash1(seed,kj) from the received ciphertext C.



Wang, et al.            Expires December 21, 2014               [Page 6]

Internet-Draft                     CTA                         June 2014


   Meanwhile, the sink sums vi times of each Hash2(seed,kj) to get a
   value denoted as H'.  Then it compares the H'value with the received
   H to verify the transmitted data integrity.

3.2.  Process for the node

   Phase 1: In the beginning, the sink broadcasts a request to all
   sensors to generate network topology (same as TAG scheme).  After
   that, the sink broadcasts a seed for a new aggregating round.
   However, it's optional.  As we can use the synchronous time or a
   serial number as the seed, which needn't be broadcast every round.

   Phase 2: When an aggregating round begins, both the leaf nodes and
   aggregation nodes collect the sensory data and encrypt their own
   data, then leaf nodes send their encrypted data to aggregation nodes,
   aggregation nodes perform aggregation over the encrypted data.  The
   processing details are shown as follows:

   If Ni is a leaf node, it first calculates Hash1(seed,ki), here ki is
   the unique key assigned from the sink.  Then it uses the hash value
   as the encryption key to be added with collected data di, the result
   of the addition is the ciphertext C.  Here, the addition is based on
   modular M.  Meanwhile, the node generates a vector V=(v1,...vi-
   1,vi,...vK), which represents each key's used times, here vi will be
   set 1 and others will be set 0 because node has just used ki one
   time.  The value of Hash2(seed,ki) will also be calculated by the
   node, and denoted by H.  In the end, the node connects C, V and H to
   a data structure(C,V,H), which will be sent to its parent node.

   If Nj is an aggregation node, it first calculates Hash1(seed,kj),
   here kj is the unique key assigned to this node.  Then it uses the
   hash value as the encryption key to add with collected data dj, thus
   ciphertext C is obtained.  Meanwhile, the aggregation node generates
   a vector V=(v1,...vi-1,vi,...vK) which represents each key's used
   times, here vj will be set 1 and others will be set 0 because the
   node has just used kj one time.  The Hash2(seed,kj) value will also
   be calculated by the node, and denoted by H.  Then, the aggregation
   node adds its own C with other Cs received from its children nodes,
   adds its own V vector with other Vs received from its children nodes,
   and adds its own H with other Hs received from its children modes.
   When addition of Hs is computed, the overflow bits in the modular
   addition can be ignored.

   Phase 3: After receiving the aggregated data, the sink first
   calculates all the values of Hash1(seed,kj)(i=1,2,...K) and
   Hash2(seed,kj)(i=1,2,...K), because it has all the keys in the key
   group.  According to the received vector V, the sink can retrieve the
   sum of all sensed data' sum d1+d2+...dN.  The plaintext of



Wang, et al.            Expires December 21, 2014               [Page 7]

Internet-Draft                     CTA                         June 2014


   aggregation data result will be regained by modular subtracting vi
   times of each Hash1(seed,kj) from the received ciphertext C.
   Meanwhile, the sink sums vi times of each Hash2(seed,kj) to get a
   value denoted as H'.  Then it compares the H'value with the received
   H to verify the transmitted data integrity.

4.  Security Considerations

   We take the tree network topology with seven levels for instance.  In
   this network topology, each parent node can have 3 child nodes at
   most, so the whole network has not more than 3279 nodes in total.  We
   assume that key group's size is 16(K=16).  Thus, for the sink, there
   are 16 numbers in the final V it received, and each number could be
   anyone range between 1 and 3279.  Since different V represents
   different used times of each key in the encryption process, the
   vector's space is so huge (more than 2200) for an attacker to crack
   the ciphertext.  Due to the addition of V in the aggregation process,
   two adjacent aggregation nodes have totally different V vector, thus
   one node cannot decrypt the other node's ciphertext.  Moreover, for a
   leaf node, because it does not know which key its adjacent node has
   got from the key group and it holds only one key, it can't decrypt
   the data of an adjacent node too.  In summary, the proposed scheme
   provides good privacy protection for the nodes and the transmitted
   data.

   Furthermore, if an attacker captures a node, all he can get is the
   data structure (C,V,H) of the node.  What he may want is to find
   another node with the same vector V which will help him to analysis
   the ciphertext C.  In the worst case, assuming that the attacker can
   eavesdrop on all nodes in the entire network and get all nodes'
   transmitted data, there will be two cases when the attacker looks for
   the nodes having the same key vector V: 1) If the attacker captures a
   leaf node, then he may find 1/16 of all the leaf nodes, which is just
   4% of the nodes in whole network.  Because these nodes are randomly
   located in the area, they don't represent any actually useful
   information. 2) If the attacker captures an aggregation node, the
   probability of finding other nodes having the same vector V will be
   greatly reduced.  For example, the probability is 1/103 in layer 6,
   and is 1/106 in layer 5.  It is very hard for the attacker to find
   another aggregation node having same vector V.

5.  Overhead Considerations

   The recommended scheme only requires a vector V having a limited
   number of bits instead of transmission of all nodes' ID.  Vector V
   won't rapidly expand in the process of aggregation.  For example,
   based on the communication overhead discussion of CMT scheme,
   transmission overhead of each node increases from 75 bits in bottom



Wang, et al.            Expires December 21, 2014               [Page 8]

Internet-Draft                     CTA                         June 2014


   level to 950 bits in level 1 in the case that 10% nodes didn't
   respond.  It becomes worse in the case that 30% nodes didn't respond,
   while communication overhead increases from 75 bits to 2700 bits.
   However, in this document, the communication overhead ranges from 75
   bits to 187 bits, and most of the nodes' overhead is not larger than
   75 bits.

   Since the sensory data's measure range is m=27, in order to decrypt
   the ciphertext properly, modulus M should be at least N*m.  Hence
   each node's encrypted data segment needs 12+7=19bits (here we assume
   n=212=4096>3279).  For consistency, we also plus 56 additional load
   bits like the CMT scheme, thus the basic data segment is 56+19=75bits
   in size.  In CMT scheme, each node in the highest level of the
   aggregation tree will transmit at least 950 bits in the case that 10%
   nodes didn't respond.  However, in the proposed scheme, each node in
   the highest level only transmits at most 187 bits in the average
   case.  Because the average using times of ki after the last
   aggregation is:3276(1-10%)/3*16=61<26<<27, each vi in the V needs 7
   bits at most to represent the number of used times, V's length is
   7*16=112 bits, and the total bits required for transmitting is
   75+112=187 bits.  Obviously, the proposed scheme saves over 80% of
   the overhead in the high level, and for the nodes in other lower
   levels, the length of transmitted data is also shorter than those
   related schemes too.

6.  iana consideration

   To be completed.

7.  Robustness analysis

   The recommended scheme has solved the data-loss problem as well.  For
   example, when a node dropped due to a sudden accident, the sink
   didn't need to know its ID.  Because once there is a node dropped or
   data lost in the transferring process, we will lose not only C but
   also V and H, the lost data's V and H will not reach the aggregation
   node, so the correctness of the decrypted data is guaranteed.  Thus,
   the proposed scheme is robust, which is very important for a sensor
   network which changes topology frequently.

8.  Conclusions

   In this document, the recommended secure data aggregation scheme
   based on Key Vector Sharing overcomes the problem of rapid expansion
   of transferred IDs, reduces the additional overhead due to
   transferring of ID information, and improves the endurance of the
   network.  Moreover, by using hash function, the scheme guarantees the




Wang, et al.            Expires December 21, 2014               [Page 9]

Internet-Draft                     CTA                         June 2014


   integrity of the aggregation.  In addition, solving the data-loss
   problem makes the proposed scheme more robust and practical.

9.  References

   [RFC7228]  Bormann, C., Ersue, M., and A. Keranen, "Terminology for
              Constrained-Node Networks", May 2014.

   [RFC2104]  Krawczyk, H., Bellare, M., and R.  Canetti, "HMAC: Keyed-
              Hashing for Message Authentication", February 1997.

Authors' Addresses

   Likun Wang
   SoutnEast University
   N0.9, Mo Zhoudong street Nan Jing, Jiang Su Province 210096

   Phone: +86 18795858986
   Email: kunliw@aliyun.com


   Liquan Chen
   SoutnEast University
   N0.9, Mo Zhoudong street Nan Jing, Jiang Su Province 210096

   Phone: +86 13813852253
   Email: Lqchen@seu.edu.cn


   Jie Huang
   SouthEast University
   N0.9, Mo Zhoudong street Nan Jing, Jiang Su Province 210096

   Phone: +86 13675178016
   Email: jhuang@seu.edu.cn
















Wang, et al.            Expires December 21, 2014              [Page 10]