KARP                                                          W. Atwood
Internet-Draft                                  R. Bangalore Somanatha
Intended status: Standards Track              Concordia University/CSE
Expires: January 15, 2013                                 July 16, 2012
Automatic Key and Adjacency Management for Routing Protocols
draft-atwood-karp-akam-rp-01
When tightening the security of the core routing infrastructure, two steps are necessary. The first is to secure the routing protocols' packets on the wire. The second is to ensure that the keying material for the routing protocol exchanges is distributed only to the appropriate routers. This document specifies requirements on that distribution and proposes the use of a set of protocols to achieve those requirements.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 15, 2013.
Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.
Within the Keying and Authentication for Routing Protocols working group, there are several goals:
Within the second goal, there is at this time considerable activity on protocols and procedures for creating shared keys, under the assumption that the end points of the exchanges (the routers) are entitled to enter into the conversation. However, there appears to be no work on ensuring that the end points are legitimate.
This document addresses this issue. In particular, it addresses the need to ensure that keying material is distributed only to routers that legitimately form part of the "neighbor set" of a particular speaking router.
Autonomous System ...
Administrative Domain ...
Traffic Encryption Key (TEK) ...
In an AD, all routers sharing the same TEK can be said to form a 'keying group'. Routers can form keying groups as follows:
Alternatively, keying groups can be viewed from another perspective. Instead of looking at the granularity of keying from the point of view of the members, we can look at it from the point of view of the keys. This can be referred to as 'key scope'.
The key scopes corresponding to the above categories of keying groups in the same order could be defined as follows:
The overall aim of this document is to specify a system for automated key management, which will eliminate the disadvantages of the manual method of key updating. The basic function of this automated system is the secure generation of keys and their distribution. The system should also enable key updates at regular intervals so as to protect against both active intruders and passive intruders who may have covertly obtained the keys and be eavesdropping on the traffic.
Along with these basic goals, a key management system should satisfy an additional set of requirements. These requirements ensure, among other things, security, ease of deployment, robustness, and scalability. We have compiled this set after referring to RFC 6518 [RFC6518], [I-D.ietf-karp-threats-reqs], and RFC 5796 [RFC5796].
In this section, we propose an architecture for an automated key management and adjacency management system. In order to build this framework, we have reused parts of some existing proposals and fitted them into their correct places in the overall architecture. We have then extended or modified them so as to handle the key management issues overlooked by them.
Our design deals with securing the control traffic of routers within an AD.
The main entities in our system are the following:
These entities and their functions are explained in the next section.
The entities are based on those in GSAKMP. The difference is that the Group Owner in GSAKMP has been replaced by a Policy Server, and the Subordinate GC/KS has been replaced by a Standby GCKS in our design. We have chosen the term 'Policy Server' in order to be consistent with RFC 6407 [RFC6407], and the term 'Standby GCKS' since it is not a subordinate in our design and is a standby that is capable of performing all operations performed by the active GCKS. Our design conforms to the Multicast Group Security Architecture [RFC3740].
The network administrator configures both the Policy Server and the GCKS: security policies go to the Policy Server, and configurations related to the AD go to the GCKS.
The Policy Server is the entity that manages security policies for the AD. The behavior of the Policy Server described here is drawn from, and is very similar to, that of the 'Group Owner' in GSAKMP. The security policies include general policies such as authorization details for the GCKS, access control for the GMs, and rekey intervals, as well as other specific policies that may be necessary for the group. These policies are put together into a 'Policy Token' (a term taken from GSAKMP) and sent to the GCKS.
The GCKS is either a router or a server chosen by the administrator as the group controller. It is the entity whose major functions are key management and adjacency management. The GCKS should also ensure that the security policies in the policy token are enforced. This implies that whenever a GM requests keys from the GCKS, the GCKS should enforce access control for the GM according to the terms specified in the policy token. The administrator configures the GCKS with information such as the type of keying group to be enforced for the AD and the adjacencies for each router in the AD corresponding to a particular routing protocol (or a set of similar routing protocols). This last point follows from our proposal that there could be one instance of a GCKS per routing protocol or per set of similar routing protocols. This is in fact necessary because the GCKS is the entity that should ensure adjacency management, and adjacencies may be defined differently for different routing protocols. Also, according to [I-D.ietf-karp-ops-model], "KARP must not permit configuration of an inappropriate key scope". This means that each routing protocol may have a different key scope requirement, and that requirement needs to be satisfied. The GCKS may also generate, distribute and update keys, depending on the type of keying group to be enforced in the AD.
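As an illustration only (the field names and values below are hypothetical and are not defined by this document), the per-routing-protocol configuration held by a GCKS instance might be sketched as follows:

   # Hypothetical sketch of the configuration a GCKS instance might hold
   # for one routing protocol (or one set of similar routing protocols).
   gcks_config = {
       "routing_protocol": "OSPFv2",            # example protocol only
       "key_scope": "key-per-sending-router",   # keying group enforced in the AD
       "adjacencies": {
           # router id -> list of (legitimate neighbor, allowed interface)
           "R1": [("R2", "eth0"), ("R3", "eth1")],
           "R2": [("R1", "eth0")],
           "R3": [("R1", "eth0")],
       },
       "rekey_interval_seconds": 86400,         # taken from the policy token
   }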
The standby GCKS is an entity that is always kept in sync with the active GCKS, ready to take over at any time should the active fail. This design eliminates the possibility of a single point of failure in a centralized system.
GMs are the group member routers that communicate with each other as well as with the GCKS. When they request keys from the GCKS, they are given the keys along with the policy token. GMs are required to check the rules specified in the policy token to determine if the GCKS is authorized to act in that role. Each GM has a Local Key Server (LKS) [atwo2009:AKM]. It is a key generation and storage entity within the GM. A GM may sometimes be required to generate keys itself depending on the category of keying group being enforced. This kind of design ensures that the architecture is distributed in the sense that key management responsibility is divided between the GCKS and the LKSes.
From the description above, it can be seen that the architecture we propose is a balance between a completely centralized model and a completely distributed one, developed by combining the strengths of both. It defines the concept of a GCKS, which is a centralized entity, as well as the concept of an LKS, which is distributed, with one instance per router. The design tries to bring in the advantages of both models. A centralized entity is considered necessary mainly to make adjacency management possible. In the absence of a central controller that has information about the adjacencies of each router in the AD, individual routers will not be able to establish the legitimacy of their neighbors. Adjacency management is especially important since we are dealing with control packets, which are usually exchanged with immediate neighbors. At the same time, loading the centralized entity with multiple responsibilities may lead to its failure. Hence we have a localized entity that can take up some of the functions of the central controller as and when the need arises. This enhances scalability, which is so important in a key management system. Another factor contributing to scalability is the presence of the standby GCKS. A centralized system could have the disadvantage of having a single point of failure. Our design tries to eliminate this by defining a standby for the central controller that is always kept in sync with it, ready to take over at any time.
The operations of key management and adjacency management occur at two different levels. To ensure scalability of the system, as many operations as possible need to take place among adjacent routers. However, to ensure overall control, policies need to be set centrally for the entire AD.
We recognize two types of groups, which represent the two levels of operation:
The overall operation proceeds in four steps:
If the key scope corresponds to "same key for the entire AD", then the key management policy in step 2 could be "use this key", where "this key" is the same for all GMs, and is sent as a parameter along with the policy. In this case, the key generation in step 4 is not necessary.
If the key scope corresponds to "key per link", the the key may be mutually determined by the routers on that link, or a "local" GCKS may be elected and asume the task of generating the key, which will then be distributed on the secure paths established in step 3.
If the key scope corresponds to "key per sending router" or "key per sending router per interface", then the sending router assumes the responsibility for generating and distributing the key(s) that it will use to send its routing protocol traffic. Thus each router maintains (n+1) keys, one for each neighbor, for incoming traffic from that neighbor, and one key for outgoing traffic.
Similarly, if the key scope corresponds to "same key for the entire AD", then the adjacency management policy is probably "accept any router that claims to be your neighbor" or "accept any router that presents a valid router identification string".
For other key scopes, the authentication part of step 3 will have to confirm that a match exists between what is presented by the neighbor router and what is specified in the adjacency management policy information.
If IPsec is to be used to protect the routing protocol packets, negotiation of the Security Parameter Index (SPI) to be used will be done as part of step 4. This has to be mutually negotiated among the users of a particular key, because it cannot be arbitrarily set by any particular member of the group of adjacent routers. (This is in contrast with a two-party Security Association, where the SPI can be safely set by the (single) receiver of the incoming packets.)
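One hypothetical way to arrive at a common value, shown purely as an illustration (this document does not mandate any particular derivation), is for all holders of a group key to derive the SPI deterministically from material they already share:

   import hashlib

   def derive_group_spi(tek: bytes, group_id: bytes) -> int:
       """Illustrative only: derive a common SPI from shared key material
       so that every member of the keying group arrives at the same value
       without additional negotiation round trips."""
       digest = hashlib.sha256(b"SPI" + group_id + tek).digest()
       spi = int.from_bytes(digest[:4], "big")
       # SPI values 0-255 are reserved in IPsec; remap if necessary.
       return spi if spi >= 256 else spi + 256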
This section provides a detailed description of the automated key and adjacency management system. This is followed by the details of the communication among the various entities of the system.
This section provides a detailed description of the architecture, showing also the communication among the different entities.
Figure 1 gives a closer view of the entities in our design as described previously and shows the interactions among them.
Ascii art for system architecture
There is a centralized GCKS in the system and a localized LKS within each GM router. The GCKS and the LKS have the ability to generate SA parameters through a KMP, and to store them in a key store. The different scenarios to be considered and the steps of communication are described in this section and the next.
Figure 2 shows an inner view of a GM with interactions among the KMP, a routing protocol and the LKS.
Ascii art for GM
Initially the routing protocol requests keys from the KMP to secure its control traffic. This starts the communication between the GM and the GCKS through the KMP, as shown by the numbered steps in Figure 1. The key generation policy specified by the GCKS is transferred to the GM. Then the keys are generated by the LKS of the GM, and stored into a key store hosted by the LKS. The KMP notifies the routing protocol that new keys are available for its use as shown in Figure 2. The routing protocol then retrieves the keys from the key store. For some categories of keying groups, the LKS is given the keys directly by the GCKS. For others, it may negotiate the keys with its neighbors. These cases are explored in detail in the sections that follow.
The proposed KMP runs between the GCKS and the GMs, and among the GMs themselves. The KMP messages need to be protected, and this can be achieved by first running a protocol that derives keys to protect them. This is similar to the manner in which GDOI messages are protected by keys generated by a phase 1 protocol such as IKE.
The design we propose is a hierarchical one. There are two kinds of groups that can be formed here (not to be confused with keying groups). The first kind is formed by the GCKS with each GM in the AD. The second kind is formed among the GMs. The design can be seen as comprising five main steps. Together, the steps ensure key and adjacency management in a secure manner.
Each step is dependent on the previous ones, leading to a hierarchy and ensuring modularity of design. Our design concentrates on steps 1 through 4 in order to enable a secure step 5.
The details of each of these steps are explained in the next section.
In this section, we give a detailed description of our proposal for a protocol that serves as a solution to the key management problem outlined earlier. To summarize, the intention is to develop a protocol for an automated key management system such that all the requirements listed above are satisfied.
We have seen the set of entities in the proposed design. Now we shall see the exact messages exchanged among them so that the keys required for securing routing protocol control traffic can be generated and distributed to the appropriate routers.
Initially the administrator configures security rules on the Policy Server, and configuration parameters on the GCKS. The security rules include, among other things, access control rules related to the GMs and authorization rules related to the GCKS. The configuration parameters include, among other things, the key scope information pertaining to the AD and adjacency information corresponding to each router in the AD. If required, the Policy Server generates other security policies relevant to the group and puts them together into a policy token. This policy token is sent to the GCKS.
Once this is done, steps 1, 2, 3 and 4 as outlined in the hierarchical design above follow. Step 1 is for GCKS-GM authentication, step 2 is for key and/or policy transfer from the GCKS to each GM, step 3 is for GM-GM authentication, and step 4 is for key exchange between GMs that need to communicate with each other. Steps 2 and 4 have small variations depending on the key scope being enforced for the AD.
Steps 1 and 2 are based on the GDOI GROUPKEY-PULL protocol [RFC6407]. However, step 2 in our case is an extension of GROUPKEY-PULL, in the sense that it accommodates the various cases of keying groups as well as adjacency management. Steps 3 and 4 have been designed such that GROUPKEY-PULL is extended to inter-GM communication.
Now we shall look at each of these steps in detail.
Initially, when a routing protocol instance wishes to start communication, be it unicast or multicast communication, it informs the KMP instance on the router. This information is communicated by the KMP instance on that router to the KMP instance on the router or server it believes to be the GCKS. At this point, the GCKS needs the identity of the requesting router in order to authenticate it. The requesting router also has to authenticate the GCKS. Any of the ISAKMP group of unicast protocols could be used for step 1 communication between the GCKS and each router that requests keys from it. IKE/IKEv2 is an example of such a protocol. This protocol provides peer authentication, and parameters for an SA including a key to help provide confidentiality and message integrity for the next step, where the actual traffic keys are generated. We call the key derived in this phase SKEYID_a (a term taken from GDOI). It is assumed that the routers have agreed upon a way to establish their identity during authentication, whether through pre-shared keys, asymmetric keys or certificates. If peer authentication is successful, the router becomes a GM.
As already mentioned, GM stands for 'Group Member'. When talking about the GCKS-GM interactions, 'group' typically means the entire set of GMs in the AD. When talking about the GM-GM interactions, 'group' typically means the sending router and some set of its neighbors. This set may include all of its neighbors or only a subset, depending on the key scope in use. For example, when the key scope is per link, a 'group' may refer to all routers sharing a link. This will become evident as we see the GM-GM interactions shortly.
The protocol message exchanges for this step are the standard IKE exchanges since we propose using IKE for this step. We would like to mention at this point that whenever we say IKE, we intend to refer to IKE or IKEv2, unless explicitly stated otherwise.
This is the step where the KMP takes over. The goal of the KMP is to provide parameters for an SA to be eventually used by a routing protocol to secure its control traffic.
Messages in this step are secured by the key generated by the step 1 protocol, that is, SKEYID_a. This key helps achieve authentication and confidentiality for step 2. For step 2, we have taken most of the messages from the GROUPKEY-PULL protocol of GDOI. However, there are some modifications and important additions of functionality in our case, with the GCKS passing additional information to the GMs. We shall see this in this section.
We shall initially look at the KMP details for one of the finer-grained cases of keying groups, namely, the group per sending router. This is a flavor of multicast communication. After that, we will see the small variations necessary in order to handle the other categories of keying groups.
In step 2, each GM requests from the GCKS, through the KMP, the SA parameters required to secure its control traffic. In the request to the GCKS, the GM specifies the identity of the routing protocol for which it needs the keys. Although the GCKS corresponding to the routing protocol would have already been selected in step 1, specifying the routing protocol id again here helps to handle the case where the same GCKS may be used for a category of similar routing protocols.
When the GCKS receives this request from the GM, it checks to verify if the GM can be given access to key related information according to the rules in the policy token. If the checks fail, the communication with the GM should not be continued. The exact behavior can be determined from the rules in the policy token. If the checks succeed, the GCKS delivers to the GM the following information:
The protocol message exchanges for step 2 are shown in the figure below.
Ascii art for message exchanges
In the message exchanges, HDR is an ISAKMP header payload. It has a message id M-ID. The '*' indicates that the message contents following the header are encrypted. The encryption is done with SKEYID_a. This ensures authentication (since the key is a secret generated in step 1 and can be possessed only by the GCKS and the GM with which the step 1 has been carried out) as well as secrecy (due to the encryption). Hashes are used for ensuring message integrity and data origin authentication; this will be explained shortly.
In exchange (1), the GM requests SA information from the GCKS to protect its control traffic corresponding to the routing protocol whose id is given by RP_ID. Ni is a nonce used to protect against replay attacks as well as to ensure liveness of the GM.
In exchange (2), the GCKS initially confirms from the rules in the policy token that the GM can be given SA information. It also verifies the freshness of the nonce Ni. If this is successful, the GCKS proceeds to deliver to the GM the following information:
The details of these pieces of information have already been explained. Nr is a nonce used for replay protection and to ensure liveness of the GCKS.
In exchange (3), the GM initially verifies freshness of the nonce Nr so as to detect a replay attack. It then proceeds to confirm the authorization of the GCKS by referring to the policy token. If the GCKS is an authorized entity, the GM uses the key scope information to know how to proceed with respect to key generation. The adjacency list is used to note the list of legitimate neighbors and the allowed interfaces on which they can appear online. Once this is done, the GM sends an acknowledgement. This acknowledgement includes a hash for integrity purposes. If the GCKS is not authorized, the GM needs to end the communication with the GCKS. The behavior in such cases can be determined by the policies specified in the policy token.
The hashes are pseudorandom functions (prf) computed as shown in the figure below.
Ascii art for hashes
According to [RFC6407], "Each HASH calculation is a pseudo-random function ("prf") over the message ID (M-ID) from the ISAKMP header concatenated with the entire message that follows the hash including all payload headers, but excluding any padding added for encryption." SKEYID_a is included in the hashes to ensure that both parties have the step 1 key. The hashes include the nonces from previous messages to ensure that both the parties have the exchanged nonces. This is used for data origin authentication purposes. Hence Ni_b and Nr_b refer to Ni and Nr from exchanges (1) and (2) respectively.
An important function of hashes is to provide message integrity. The receiver computes the hash of the received message and compares it with the hash value received to determine whether the message has been tampered with or not.
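As a concrete but purely illustrative sketch of this style of computation, assuming HMAC-SHA-256 as the prf (the actual prf is negotiated in step 1; the function and parameter names below are not part of this specification):

   import hashlib
   import hmac

   def step2_hash(skeyid_a: bytes, m_id: bytes, nonces: bytes,
                  message_after_hash: bytes) -> bytes:
       """Keyed prf over the ISAKMP message ID, the nonces from the
       previous exchanges (Ni_b / Nr_b, where applicable), and the
       entire message following the hash payload, excluding any
       padding added for encryption."""
       return hmac.new(skeyid_a, m_id + nonces + message_after_hash,
                       hashlib.sha256).digest()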
Once the GM has received this information, it generates the TEK and determines the parameters to be used for its outgoing SA. Here the functionality of the LKS of the GM as a generator of keys comes into play. Since the key scope being discussed now is one key per sending router, the LKS of each GM generates one TEK. The key generation is to be followed by key information exchange with legitimate neighbors so that the incoming SAs can be determined.
After the GM generates TEK-based information, and before exchanging it with its neighbors, it needs to ensure that a secure TEK exchange can take place. This is done in step 3 by each GM engaging in a unicast communication with each of its legitimate neighbors through any of the ISAKMP group of unicast key management protocols, such as IKE. This protocol provides peer authentication as well as a secret key to provide confidentiality, authentication and message integrity for step 4, which is the actual TEK exchange step. We call this secret key SKEYID_b. The legitimate neighbors are determined by referring to the adjacency information given by the GCKS to the GM in step 2. During peer authentication in step 3, the certificate given to the GM by the GCKS could be used.
The protocol message exchanges for this step are the standard IKE exchanges since we propose using IKE for this step.
This is the step where the TEK information is exchanged between GMs that need to communicate with each other. Unicast communication is anyway between two peers. For multicast communication, since we are dealing with control traffic only, and control traffic is typically link-local, each router on a link needs to be aware of the TEK of all other routers on the same link. These legitimate neighbors are determined from the adjacency information received from the GCKS. The LKS of the corresponding GMs communicate to exchange their TEK information in order to help them populate their incoming and outgoing SAs.
Messages in this step are secured by the key generated by the step 3 protocol, that is, SKEYID_b. This key helps provide authentication as well as confidentiality.
In step 4, the LKS of the GM pushes the SA information corresponding to its TEK to each of its neighbors. The LKS also requests TEK information from its neighbors. Each of the neighbors then sends its outgoing TEK information and this is maintained as an incoming key on the querying LKS. As a result of step 4, all GMs have the TEK information corresponding to all their neighbors so that a secure control traffic exchange can start.
Ascii art for exchanges
GMi and GMr depict the initiator and the responder GMs respectively.
The message exchanges in this step are similar to those in step 2 in that the HDR is an ISAKMP header payload with a message id M-ID. The '*' indicates that the message contents following the header are encrypted. The encryption is now done with the key SKEYID_b derived in step 3. This ensures both authentication and secrecy. Hashes are used for ensuring message integrity and data origin authentication. Nonces are used to resist replay attacks and to ensure peer liveness.
In exchanges (4) and (5), we show mutual authentication between GMs through the certificates received from the GCKS in step 2. CERT1 is the certificate received by GMi and CERT2 is the one received by GMr from the GCKS. Authentication would already have happened in step 3, so exchanges (4) and (5) can be eliminated; they are shown here for the sake of completeness.
In exchange (6), the initiator GM communicates to its neighbor its outgoing SA parameters in SA1 as well as the outgoing TEK information explicitly in KD1. This is the TEK that it will be using henceforth to secure its control packets. It also requests the outgoing SA information from the neighboring GM so that it can be installed as incoming SA information on the querying GM. This request is represented by KREQ, which stands for Key Request.
In exchange (7), the neighboring GM responds with its outgoing SA information in SA2 as well as the TEK in KD2. This will be the TEK the neighboring GM will use henceforth to secure its control packets.
As already mentioned, the nonces N1 and N2 help provide replay protection and a confirmation that the peer is alive.
The hashes are pseudorandom functions computed as shown in the figure below.
Ascii art for hashes
Hash computation is similar to that explained in step 2. In step 4, hashes are computed by applying a pseudorandom function to the key SKEYID_b, along with the message id concatenated with the message contents following the hash. Also, nonces from a message exchange are included in the hash computation of the subsequent exchanges in order to ensure that both parties have the nonces just exchanged. This helps in data origin authentication. Hence N1_b and N2_b refer to N1 and N2 in exchanges (4) and (5) respectively. Hashes are essential to ensure message integrity and to confirm that the messages have not been modified (possibly by an intruder) during transit.
All information received by the LKS of a GM from the GCKS as well as from neighboring LKSes is written to stable storage persistent across reboots. This can be used effectively to avoid flooding the GCKS with requests on a router reboot. This is one of the advantages of the proposed design over GDOI [RFC6407], where routers that reboot come back up with no information and the GCKS is flooded with requests. The routing protocol is notified by the KMP about the new SA being available in the key table for it to protect its control traffic.
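A minimal sketch of this persistence behavior (the function names, the JSON encoding and the file handling are assumptions for illustration; in practice the stored keys would also need to be protected at rest):

   import json
   import os
   import tempfile
   from typing import Optional

   def persist_lks_state(path: str, state: dict) -> None:
       """Write the LKS state (key scope, adjacencies, SA parameters,
       TEKs, certificate, policy token) to stable storage so that a
       reboot does not force a fresh exchange with the GCKS.  Binary
       values are assumed to be hex-encoded by the caller."""
       fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
       with os.fdopen(fd, "w") as f:
           json.dump(state, f)
       os.replace(tmp, path)   # atomic replace avoids half-written state

   def restore_lks_state(path: str) -> Optional[dict]:
       # On reboot, read the stored state back instead of querying the GCKS.
       try:
           with open(path) as f:
               return json.load(f)
       except FileNotFoundError:
           return None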
The routing protocol security mechanism would store the incoming and outgoing SA information, and the adjacency information into the relevant databases.
As we can see, confidentiality and authentication have been ensured for all steps by means of secret keys and certificates.
In the following section, we shall see the small variations required in the basic protocol design proposed above, in order to handle the various categories of keying groups.
We have seen the different granularities possible for a keying group, that is, the different key scopes, earlier in this document. We have also seen that the design proposed above is able to handle the keying group where there is a separate key per sending router. This has been achieved by each router generating its own key, which is the same for all its interfaces. Hence each router has a different SA for outgoing traffic and multiple SAs for incoming traffic, one corresponding to each neighbor. It should be noted that, with keys generated locally, there is a small possibility of two routers ending up with the same key when they generate it randomly. However, if a good random number generator is used, the probability of a collision is negligible: if each of n routers independently chooses a k-bit key uniformly at random, the probability that any two of them collide is at most n(n-1)/2^(k+1), which for 128-bit keys is vanishingly small for any realistic AD size. This extremely small possibility can be ignored, since the method has the more important advantage of reducing the load on the GCKS. Also, the GCKS does not need to be aware of the individual keys of each router. This can be considered a tradeoff.
In this section, we shall see how the remaining cases of keying groups can be handled. They can actually be handled by minor variations to the basic design. In essence, these variations can be implemented by the GM interpreting the key scope information given to it by the GCKS in step 2, and thereby knowing whether to expect keys from the GCKS or to derive them itself. This also makes the GM aware of the path to be followed. As we shall see, in a majority of cases it is step 4 that gets slightly altered.
We have mentioned that the SA parameters along with the TEK are either delivered to the GMs by the GCKS (for the single key per AD case) or generated by the GMs themselves, possibly through interactions with other GMs (for the other keying groups, depending on the particular category). A parameter that could have slightly different behavior is the SPI. This is also one of the parameters of an SA. However, the range of SPIs to be used in an AD could be decided by the administrator. Whatever the category of keying group, it could happen that the administrator chooses to have the same SPI for all GMs. In this case, the GCKS could deliver the SPI to the GMs along with the policy for the remaining parameters of the SA. It could also be that the administrator wants each GM to use a different SPI for its outgoing traffic. In this case, the GCKS should not be overloaded with the task of generating a different SPI for each GM. GMs should generate the SPI themselves, possibly with communication with other GMs. If that happens, even for the single key per AD category of keying groups, the SPI is generated by the GMs, although the TEK may be obtained from the GCKS (since the TEK is to be the same for all GMs for this category of key scope). In other words, the key scope may be different from the scope of the SPI used in the AD. Our design is flexible enough to handle this, since the SA policy handed down by the GCKS to the GMs indicates to the GM the exact steps to be followed.
In all cases of keying groups, the LKS stores SA information to persistent storage to be used across reboots. Keys are stored into the key table [I-D.ietf-karp-crypto-key-table] and the KMP informs the routing protocol, which then starts using the keys to secure its control traffic. This is the step 5 mentioned in the explanation of the hierarchical design above.
In this section, we address some of the other important aspects of the key management problem. First, we show how this automated system allows key updates to be done as frequently as desired. We then show how various desirable features have been incorporated in the proposed design. Some of these features are scalability, incremental deployment ability, effective handling of router reboots and smooth key rollover. These features help in achieving the requirements stated earlier.
Keys used by the routing protocols to secure their traffic need to be updated at regular intervals. They may have to be updated at other non-specific times as well depending on the requirement. There are a couple of reasons why key updates are required:
One of the important points to be noted here is that PFS and PBS can be achieved very easily and in a straightforward way for unicast communication. Unicast communication involves a pair of routers that share keys for securing their traffic. Every pair of routers derives its own set of keys, and those keys are known only to that particular pair of routers. Hence a change in either member of the pair of routers would mean that the old keys are no longer valid and new keys are derived for communication. This automatically takes care of PFS and PBS. When a router, say R1, is uninstalled, the keys used by the other routers for pairwise (unicast) communication with R1 are no longer used. This ensures PFS. When a new router, say R2, is installed, all routers engaging in a unicast communication with it derive new pairwise keys with it. This ensures PBS.
For multicast communication, key updates are essential on a router uninstallation or an installation to ensure PFS and PBS respectively. This is because in multicast communication, multiple routers share the same key and a key remains valid even if one of the routers involved in the communication is changed. To achieve PFS and PBS, keys have to be updated so that the leaving or entering routers do not have access to information they are not entitled to.
We now have to determine which keys need to be updated. For regular updates, it is quite obvious that the traffic keys of all the routers have to be changed. The other case to consider is when the routers in an AD change, either due to an installation or an uninstallation. It is interesting to note that when the same traffic key is used for the entire AD, that key should be changed, which has the effect of changing the keys for all the routers. However, for all other key scopes, only the keys corresponding to the neighbors of the leaving or entering router need to be changed. This is because, as far as control traffic is concerned, routers have knowledge of the keys of their neighbors only. Of course the adjacencies, and hence the neighbors, may be defined differently for the various routing protocols.
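A minimal sketch of this rule (the function and parameter names are hypothetical and serve only to illustrate the reasoning above):

   def routers_to_rekey(key_scope, changed_router, neighbors_of, all_routers):
       """Which routers must obtain new traffic keys when 'changed_router'
       is installed or uninstalled: every router for the AD-wide key
       scope, only the neighbors of the changed router otherwise."""
       if key_scope == "same-key-per-AD":
           return set(all_routers)
       # neighbors_of is assumed to return the adjacency list for the
       # relevant routing protocol, as configured on the GCKS.
       return set(neighbors_of(changed_router))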
One of the major problems with the manual method of key management is that keys cannot be updated as frequently as desired. This is due to the lack of authorized people to carry out the task. This issue can be easily overcome by an automated key management system. Let us see how these two cases of regular rekey and a rekey on a router installation/ uninstallation can be handled by the automated key management system we propose.
In this section, we discuss how our design for automated key management aids key updates at regular intervals. The interval at which key updates are to be done is determined from the policies handed down by the Policy Server entity described earlier. These policies are handed down by the Policy Server to the GCKS in the form of a policy token, which in turn is handed down by the GCKS to the GMs in step 2 of the protocol as explained above. We now need to see how key updates for all variations of keying groups can be addressed. As we shall see, when all routers in the AD share the same traffic key, the centralized GCKS is the generator of the new key, whereas in all other cases, the GMs generate the new keys appropriately. This is in fact similar to the process of initial key generation described above.
First, let us take the case of having a single key for the entire AD. Here, when a rekey is required, the GCKS generates the new traffic key and unicasts it to each individual GM. This ensures that all GMs share the same new TEK after the rekey. As an alternative to transferring the new TEK through unicast communication, the GCKS and all GMs in the AD could share a key called a 'TEK Encryption Key'. This key could be used by the GCKS for encrypting the new TEK derived, and multicasting to all GMs. The advantage of this approach over the unicast method is that it eliminates the need to have multiple key update messages sent out by the GCKS, one corresponding to each GM. This in turn reduces the network traffic. However, the downside to the multicast approach is the overhead of maintaining a group key (and appropriately updating it) just for the rekey purposes. This is a case of tradeoff.
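A minimal sketch of the multicast alternative, assuming AES-GCM as the wrapping algorithm (an assumption made only for illustration; this document does not mandate a particular algorithm, and the function names are hypothetical):

   import os
   from cryptography.hazmat.primitives.ciphers.aead import AESGCM

   def wrap_new_tek_for_multicast(tek_encryption_key: bytes,
                                  new_tek: bytes) -> bytes:
       """GCKS side: encrypt the new TEK under the shared 'TEK Encryption
       Key' (128/192/256-bit) and multicast a single message instead of
       unicasting to every GM."""
       nonce = os.urandom(12)                      # 96-bit GCM nonce
       ciphertext = AESGCM(tek_encryption_key).encrypt(nonce, new_tek, None)
       return nonce + ciphertext                   # sent to the whole group

   def unwrap_new_tek(tek_encryption_key: bytes, payload: bytes) -> bytes:
       # GM side: recover the new TEK using the shared TEK Encryption Key.
       nonce, ciphertext = payload[:12], payload[12:]
       return AESGCM(tek_encryption_key).decrypt(nonce, ciphertext, None)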
In this category of keying group, routers sharing a link also share the traffic key for that link. Here when a TEK update is required, GMs on a link execute one of the key agreement protocols such as MRKMP, group Diffie-Hellman or the STS protocol to derive a new TEK. This is similar to the manner in which they interact to derive the initial TEK for the link. The interval after which the TEK should be changed is of course determined from the policy token.
In this case, every router has a different TEK that it uses for securing its control traffic. When a rekey is required, each GM generates a new TEK individually and then communicates the same to all its neighbors. The neighbors update the incoming TEK information corresponding to that router in their databases.
This case is very similar to the previous one. The only difference is that here, each GM generates as many new TEKs as the number of its interfaces, one per interface. The GM then communicates to each of its neighbors the TEK it plans to use on the interface corresponding to that particular neighbor.
This is the unicast case. Keys can be updated just by every pair of routers executing a unicast key management protocol such as IKE.
In all the above cases, the LKS updates the key store as well as its persistent storage with the updated key information. The KMP notifies the routing protocol of a change in the keys used to secure the control traffic.
Along with the regular key updates, keys need to be updated even when an existing router is uninstalled or a new router is installed. These updates are for PFS and PBS purposes respectively, as already explained above. There are a couple of differences between key updates in these cases and the regular key updates.
Apart from these differences, the process of key updates during a router change is very similar to the regular key updates. We shall now discuss briefly how key updates on a router change can be handled for each of the categories of keying groups.
For this category of key scope, the same traffic key is shared by all routers in the AD. When a router is removed or a new router is installed, the GCKS derives a new TEK and unicasts it to each of the routers in the AD.
As an alternative to transferring the new key by the unicast method, the GCKS and all GMs could share a key called the 'TEK Encryption Key'. If this option is followed, the TEK Encryption Key itself must first be changed on a router change. For the case of router installation, the GCKS multicasts the new TEK Encryption Key, encrypted in the old key, to all existing routers. It then unicasts the new TEK Encryption Key to the newly installed router. After this, the GCKS derives a new TEK and multicasts it to all the routers after encrypting it in the new TEK Encryption Key. This can be decoded by the new router as well, since it now possesses the latest TEK Encryption Key. For the case of router uninstallation, the GCKS changes the TEK Encryption Key and unicasts it to all the remaining routers. The new TEK Encryption Key cannot be multicast in this case, since the old router would also be able to decrypt it. Changing the TEK is the same as for router installation: the new TEK is sent in a multicast message to all routers, encrypted in the new TEK Encryption Key.
When compared with the unicast method of key updates, this multicast method has the advantage of low bandwidth consumption. However the disadvantage of the multicast method is that an extra key, the TEK Encryption Key, now needs to be maintained and updated accurately. So the exact method chosen depends on the administrator.
For this case, on a router installation or an uninstallation, the GCKS informs the neighbors of that router. These routers interact with each other (and with the new router if it is a case of router installation) and derive a new traffic key for that particular link where the neighbor change has occurred. Any of the mutual key agreement protocols such as MRKMP, group Diffie-Hellman or the STS protocol can be used.
Here again the GCKS appropriately informs the neighbors of the affected router. Each such neighbor runs a randomized key generation algorithm to derive a new traffic key and communicates the key to its neighbors. This is very similar to the case of regular key updates.
This category of keying group can also be handled in an easy manner. The GCKS informs the neighbors of the affected router. Each such router derives a new traffic key for that interface on which the neighbor change has occurred. The router then communicates the new key to its new set of neighbors on that particular interface.
As already explained, explicit key updates on a router change are not needed for unicast communication. This is because in unicast communication a key is shared by only two routers. A router addition or removal results in a change in a particular pair (or pairs) of routers, and new keys are derived to be shared by the new pair in any case. Thus this can be considered an automatic key update without any explicit processing.
Router reboots form a very important case to be considered in any design pertaining to networks. Especially in a centralized architecture, care should be taken to prevent the central entity from being stormed with requests when multiple routers happen to reboot almost simultaneously. In our architecture, it is the persistent storage of the distributed LKS that plays a major role on a router reboot. As already seen, the LKS of each GM writes to persistent storage some configuration and policy information such as the key scope, adjacencies, SAs, the traffic keys corresponding to itself and its neighbors, the certificate received from the GCKS, and the policy token. Hence on a GM reboot, the LKS retrieves information from the persistent storage. This is an extremely important feature, since it avoids the GCKS being flooded with requests for information when multiple routers in the AD happen to reboot.
However, information retrieval from the persistent storage may not always be sufficient. Occasionally a rekey could have happened when a router was down. This could have been either a regular rekey or a rekey due to a router installation or removal. These cases should be dealt with in an appropriate manner so as to ensure that the rebooted router gets the latest SA and adjacency information.
In order to handle these cases, a router needs to query its neighbors on a reboot. This is done as soon as the router has rebooted and read the relevant information from its persistent store. The neighbors communicate their traffic key and SA information to the rebooted router. Depending on this information as well as the key scope information retrieved from the persistent storage, the rebooted router can handle a rekey appropriately. This interaction with the neighbors for the different cases of key scopes is explained below:
The method described above helps ensure that, in a majority of cases, rekeys that could have happened while a router was down are handled. There are still a couple of cases to be considered.
Firstly, the rebooted router should verify whether the adjacencies retrieved from its persistent storage are still accurate. They could now be stale because a router could have been installed or uninstalled while it was down.
Secondly, in the discussion above regarding the ways in which reboots can be handled for the different categories of keying groups, we have mentioned that a router queries only one neighbor in some cases and one neighbor per link or interface in other cases. A situation could arise wherein the queried neighbor itself had gone through a reboot, resulting in its own key being stale. This in turn would mean that the querying router cannot rely on the information obtained from this single neighbor.
One way in which both of these issues could be addressed is for the rebooted router to query the GCKS to get the updated information. However we do not want the GCKS to be flooded with requests from the various routers in the AD. Hence there are two layers of protection designed as follows:
Due to the randomness introduced, the chances of the GCKS being flooded with requests are reduced. When queried, the GCKS could give the router information corresponding to its new adjacencies, the time at which its adjacencies changed, and any other relevant rekey information. This enables the rebooted router to know whether its traffic keys are stale.
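A minimal sketch of the randomized-timer idea (the function name, the query placeholder and the delay bound are illustrative assumptions):

   import random
   import time

   def query_gcks_after_jitter(query_fn, max_delay_seconds=30.0):
       """A rebooted router waits a random interval before querying the
       GCKS, so that a mass reboot (for example after a power outage)
       does not flood the GCKS with simultaneous requests."""
       time.sleep(random.uniform(0, max_delay_seconds))
       return query_fn()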
Another fine point is that, in rare cases, the rekey process could be in progress when the router comes up. This is a corner case and is left for future work.
Any system that is intended for widespread deployment should be designed with scalability in mind. If scalability is overlooked during the design phase, the system will fail under high loads when actually deployed.
We have designed the automated key management system so as to make it scalable. We have already mentioned that we are limiting the scope of our problem to key and adjacency management within an AD. Even within an AD, since the number of routers is not fixed, the system should be able to handle a variable and potentially large number of routers. The proposed protocol involves a set of GCKS-GM interactions and a set of GM-GM interactions. GM-GM communication is only among neighboring GMs, and hence scalability is not an issue there. Even for the GCKS-GM communication in the normal case, there should not be any issue, since all GMs are not installed or turned on at the same time. However, a situation to be considered is when the GMs reboot. It could happen that, due to a power outage, all GMs in the AD go down and come back up at approximately the same time. It is extremely important to ensure that the GCKS is not stormed with requests at this point.
Our proposal handles this case in a couple of ways. Firstly, we have seen that the LKS of each GM maintains stable storage. All important pieces of information, such as those obtained from the GCKS and from the neighboring GMs, are written to this storage, which is persistent across reboots. Hence a GM, after a reboot, reads information directly from its persistent storage, thereby preventing the GCKS from being flooded with requests. Secondly, after retrieving information from the local storage, when the GMs need to query the GCKS itself, they do so by starting a timer and querying after a random time interval. This plays a major role in preventing the GCKS from being overloaded, thereby contributing to scalability.
Another factor that enables partial distribution of functionality, thereby enhancing scalability, is the presence of the Standby GCKS. If a situation arises such that the active GCKS fails (which could be due to an overload), the Standby GCKS immediately takes over the functionality of the active one. This eliminates a single point of failure and hence allows the system to withstand higher loads, or a larger number of GMs in the AD.
We have already discussed why it is important for an automated key management system to manage adjacencies well. In fact, this is because routing protocol updates are usually exchanged with neighbors, which in turn leads to the requirement that communicating routers should be legitimate neighbors. It is a good practice to have adjacency management turned on in a network so that for any router, only its legitimate neighbors and all of its legitimate neighbors get to know the keys it uses for securing its control traffic.
However, sometimes an administrator may decide to turn off adjacency checks, for example because the network of routers is small and the extra overhead is not considered necessary. This would mean that any router is then allowed to query for and receive the traffic keys of any other router in the network, even though the routers may not be neighbors. If adjacency management is turned off, even routing protocols would respond to all control packets without performing adjacency checks. This definitely reduces security in the network.
If the key scope is such that the same traffic key is used throughout the AD, not much harm is caused if a router gives its key information to any other router in the AD since all routers share the same key. Of course mutual authentication of the routers should happen in order to know if the routers are valid members of the AD. However, an administrator could use the key per sender model, for example, and turn off adjacency management. The administrator then relies on the physical adjacency to ensure that a router far away from another router does not query it for keys.
Whenever a new system is to be deployed in the real world, the ease with which that can be done is of utmost importance. Network operators may not be ready to switch over to a new system if it is not easy to deploy. Also, operators using a certain setup would usually want to deploy a new system on an incremental basis when switching over. This would help them detect problems in the new system, if any, and then decide whether to completely move to the new model or not. We have designed our automated key management system keeping this requirement in mind. The model we have proposed can be deployed on a per-interface basis. This means that initially GMs could be manually configured with the TEKs for some of their interfaces, and made to run the key management protocol to derive TEKs corresponding to the other interfaces. This is for the case of a separate key per interface of each router. The other cases of keying groups can be handled in a similar manner. Secondly, the new system can be used to provide TEKs for one routing protocol at a time. This again makes the transition from the manual method of configuration to the automated method smooth.
Whenever the TEK is changed, smooth key rollover should be ensured so that no packets are dropped during the key transition. In order to achieve this, while transitioning from the old key to the new one, routers have to accept messages secured with either key for a short duration. This allows for the time delay involved in the new keys being received by all routers participating in that particular communication. After a certain time period, as determined by a timer, the old key information can be cleared. For smooth key rollover in multicast communication, these points are explained in more detail in the multicast extensions to the IPsec security architecture. For unicast communication, either this method could be followed or the two participating routers could exchange new keys and acknowledge their receipt just before beginning to use them.
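A minimal sketch of the receive-side behavior during the transition window (the class, the verify_fn placeholder and the timer handling are illustrative assumptions, not part of this specification):

   import time

   class RolloverVerifier:
       """For a short transition window, accept traffic protected with
       either the old or the new TEK; once the timer expires, only the
       new TEK is accepted and the old one can be cleared."""

       def __init__(self, old_key, new_key, window_seconds, verify_fn):
           self.old_key = old_key
           self.new_key = new_key
           self.deadline = time.monotonic() + window_seconds
           self.verify_fn = verify_fn    # routing protocol integrity check

       def accept(self, packet) -> bool:
           if self.verify_fn(self.new_key, packet):
               return True
           # During the transition window, fall back to the old key.
           if self.old_key is not None and time.monotonic() < self.deadline:
               return self.verify_fn(self.old_key, packet)
           return False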
The proposed design for key management describes the use of a centralized GCKS as the controller and coordinator for the entire AD. In any centralized system, there is a possibility of a single point of failure. In such a system, if the central entity goes down, the entire system could stop functioning due to the loss of important data. This can be avoided by having a backup entity take over when the primary controller goes down. This is precisely what is proposed in our design: we propose maintaining a Standby GCKS, which is always kept in sync with the primary GCKS. This can be done by syncing all data from the active to the standby at regular intervals. The appropriate interval could be determined by the policies handed down by the Policy Server to the GCKS. Whenever the active goes down, the standby can immediately take over its responsibility, thereby preventing any interruption in the functioning of the system. This introduces a certain degree of distribution of functionality and hence can successfully eliminate a single point of failure.
TBD
This document has no actions for IANA.
[NOTE TO RFC EDITOR: this section for use during I-D stage only. Please remove before publishing as RFC.]
atwood-karp-akam-rp-01
atwood-karp-akam-rp-00 (original submission, based on Revathi's thesis)
[NOTE TO RFC EDITOR: this section for use during I-D stage only. Please remove before publishing as RFC.]
List of stuff that still needs work