This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress.”
The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.
This Internet-Draft will expire on December 4, 2009.
Copyright (c) 2009 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents in effect on the date of publication of this document (http://trustee.ietf.org/license-info). Please review these documents carefully, as they describe your rights and restrictions with respect to this document.
A trusted database, located on the first switch and storing the binding between end-nodes' Link-Layer Addresses (LLA) and their IPv6 addresses, would be an essential part of source address validation. To build such a database, one must:
and solutions would differ by one or more of these elements.
While also getting its binding data from NDP, this draft proposes an alternative to the "first-come first-serve" basis [fcfs] (Nordmark, E. and M. Bagnulo, “First-Come First-Serve Source-Address Validation Implementation,” March 2009), by specifying a preference algorithm to deal with collisions. Instead of the simplistic first-come first-serve collision handling, the proposed algorithm relies on the following criteria to choose between two colliding entries:
Since the state of the entry is one of the elements of the algorithm, this draft also describes a tracking mechanism to maintain entries in states where the preference algorithm can enable end-node movement.
Table of Contents

   1.  Introduction
   2.  Goals and assumptions
       2.1.  Definitions and Terminology
       2.2.  Scenarios considered
   3.  Source of information
   4.  Binding table
       4.1.  Data model
       4.2.  Entry preference algorithm
             4.2.1.  Preference Level
             4.2.2.  Entry update algorithm
             4.2.3.  Enabling slow movement
       4.3.  Binding entry tracking
       4.4.  Binding table state machine
   5.  Configuration
       5.1.  Switch port configuration
       5.2.  Binding table configuration
   6.  Bridging NDP traffic
       6.1.  Bridging DAD NS
       6.2.  Bridging other NDP messages
   7.  Normative References
   Appendix A.  Contributors and Acknowledgments
   Author's Address
1.  Introduction
To populate the first-switch binding table, this document proposes a scheme based on NDP snooping, and introduces a preference level algorithm to deal with collisions. It is organized as follows:
2.  Goals and assumptions
The primary goal of the proposed approach is for the layer-2 switch to maintain an accurate view of the nodes attached to it, directly or via another layer-2 switch. This view is referred to as the switch "binding table". The following goals are also considered:
The binding table includes the nodes' IPv6 addresses, their link-layer addresses, and the switch port they were learnt from, whether an access port or a trunk port (a port to another switch).
This binding table is the keystone for detecting and arbitrating collisions. It also brings a couple of interesting by-products: it can provide some address spoofing mitigation, and it can be used to limit multicast traffic forwarding.
2.1.  Definitions and Terminology
The following terminology is used:
- plb-switch:
- A switch that implements the algorithms described in this draft
2.2.  Scenarios considered
Three main scenarios are considered in this document:
   +------+
   |HostA +-----------------+
   +------+                 |
                            |
                      +-----+------+
   +------+           |            |
   |HostB +-----------+  SWITCHA   |
   +------+           |            |
                      +-----+------+
                            |
   +------+                 |
   |HostC +-----------------+
   +------+
   +------+                                                   +------+
   |HostA +-----------------+                  +--------------+HostD |
   +------+                 |                  |              +------+
                            |                  |
                      +-----+------+     +-----+------+
   +------+           |            |     |            |       +------+
   |HostB +-----------+  SWITCHA   +-----+  SWITCHB   +-------+HostE |
   +------+           |            |     |            |       +------+
                      +-----+------+     +-----+------+
                            |                  |
   +------+                 |                  |              +------+
   |HostC +-----------------+                  +--------------+HostF |
   +------+                                                   +------+
3.  Source of information
Basically, the following sources of data should be available for filling the table:
Note that the binding information can also be learnt from other protocol sources such as DHCP, or even be configured statically on the switch. It is outside the scope of this document to detail how this would be performed. However, binding table entries learnt by non-NDP methods might collide with entries learnt via NDP snooping, and section [address collision resolution] describes how to prefer one entry over another.
4.  Binding table
A table is maintained on the switch(es) that binds layer-3 (IPv6) addresses to link-layer (MAC) addresses.
4.1.  Data model
A record of the binding table should contain the following information:
A global scope address should be unique across ports and vlans. A link-local scope address is unique within a vlan. Therefore, the database is a collection of l3-entries, keyed by ipv6-address and zoneid. A zoneid is a function of the scope of the address (LINK-LOCAL, GLOBAL) and the vlanid:
A collision between an existing entry and a candidate entry would occur if the two entries have the same v6addr and zoneid. These fields are referred to as the "key".
The fields of an entry other than the key (port, vlanid, lla, etc.) will be referred to as attributes. Changing the attributes of an entry requires complying with the entry update algorithm described in Section 4.2 (Entry preference algorithm).
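As a non-normative illustration of this data model, the following Python sketch keys the table by (ipv6-address, zoneid) and keeps port, vlanid, lla and preflevel as attributes; the field names and the zoneid mapping are assumptions made for the example only.

   # Non-normative sketch of the binding table data model.  Field
   # names and the zoneid mapping are illustrative assumptions.
   from dataclasses import dataclass
   from enum import Enum

   class Scope(Enum):
       LINK_LOCAL = 1
       GLOBAL = 2

   def zoneid(scope: Scope, vlanid: int) -> int:
       # A zoneid is a function of the address scope and the vlanid:
       # link-local addresses are only unique per vlan, so the vlanid
       # is part of the zone; global addresses share a single zone (0).
       return vlanid if scope is Scope.LINK_LOCAL else 0

   @dataclass
   class BindingEntry:
       # Key fields: two entries with the same (v6addr, zoneid) collide.
       v6addr: str
       zoneid: int
       # Attributes: may only change per the entry update algorithm.
       lla: str              # link-layer address of the end-node
       port: str             # switch port the entry was learnt from
       vlanid: int
       preflevel: int
       state: str            # INCOMPLETE / REACHABLE / VERIFY / STALE

   class BindingTable:
       def __init__(self):
           self._entries = {}        # (v6addr, zoneid) -> BindingEntry

       def lookup(self, v6addr: str, zid: int):
           return self._entries.get((v6addr, zid))

       def install(self, entry: BindingEntry):
           self._entries[(entry.v6addr, entry.zoneid)] = entry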
4.2.  Entry preference algorithm

4.2.1.  Preference Level
The preference level (preflevel) is an attribute of an entry in the binding table. It is set when the entry is learnt, based on where it is learnt from, the credentials associated with it, and other criteria to be defined. The preflevel is used to arbitrate between two candidate entries (with identical keys) in the binding table. The higher the preference level, the more preferred the entry.
One of the key elements of the preflevel associated with an entry is the port it was learnt from. For example, an entry would have different preflevels if it is learnt from:
Another important element is the credentials associated with this learning. An entry could be associated with cryptographic proof (CGA), and/or the LLA learnt could match the source MAC of the frame from which it was learnt.
The following preflevel values have been identified (from lowest to highest):
An entry can sum up preference values; for instance, it could be TRUNK_PORT + LLA_MAC_MATCH. However, the preference level values should be encoded in such a way that the sum of preferences 1 to N-1 is smaller than preference N. For example:
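As a non-normative illustration, encoding each base preference as a distinct power of two satisfies that property, since the sum of all lower values always stays below the next one. The names below are taken from examples in this document; the numeric values and the relative ordering are assumptions:

   # Non-normative example of preflevel encoding.  Values and relative
   # ordering are illustrative assumptions.
   TRUNK_PORT    = 0x01
   ACCESS_PORT   = 0x02   # assumed name for an access-port learning
   LLA_MAC_MATCH = 0x04
   TRUSTED_TRUNK = 0x08
   CGA_PROOF     = 0x10   # assumed name for a CGA-verified binding

   # An entry can sum up preference values:
   preflevel = TRUNK_PORT + LLA_MAC_MATCH          # 0x05

   # Any single higher preference still outranks the sum of all the
   # lower ones (sum of preferences 1..N-1 < preference N):
   assert TRUNK_PORT + ACCESS_PORT + LLA_MAC_MATCH < TRUSTED_TRUNK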
4.2.2.  Entry update algorithm
Once an entry is installed in the binding table, its attributes cannot be changed without complying with this “entry update algorithm”.
The algorithm is as follows, starting with rule_1, up to rule_5, in that order until one rule is satisfied:
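Without restating the rules themselves, the evaluation structure can be sketched as an ordered chain of predicates where the first satisfied rule decides the outcome. The sketch below is non-normative and its rule bodies are placeholders, not the actual rule_1 to rule_5:

   # Non-normative skeleton: rules are evaluated in order and the
   # first rule that is satisfied decides the outcome.  The predicates
   # below are placeholders, not the normative rule_1..rule_5.

   def rule_1(existing, candidate):
       # Placeholder example: a strictly higher preflevel may update.
       if candidate.preflevel > existing.preflevel:
           return "accept"
       return None               # rule not satisfied, try the next one

   def rule_2(existing, candidate):
       # Placeholder example: a strictly lower preflevel is rejected.
       if candidate.preflevel < existing.preflevel:
           return "reject"
       return None

   # ... placeholders for rule_3, rule_4 and rule_5 ...

   RULES = (rule_1, rule_2)          # normally rule_1 through rule_5

   def entry_update(existing, candidate):
       for rule in RULES:
           verdict = rule(existing, candidate)
           if verdict is not None:   # first satisfied rule decides
               return verdict
       return "reject"               # assumed default if no rule fires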
4.2.3.  Enabling slow movement
It is quite a common scenario that an end-node moves from one port of the switch to another, or to a different switch. It is also possible that the end-node updates its hardware and starts using a different MAC address. There are two paradoxical goals with the trusted binding table: ensure entry ownership and enable movement. The former drives the locking of the address, MAC, and port altogether, and prevents updates other than on the basis of preference. It also works a lot better when the entry lifetime is very long or infinite. The latter requires that a node can easily move from one port to another, or from one MAC to another. Enforcing address ownership will tend to lead to rejection of any movement, classifying it as an attack.
The algorithm described in Section 4.2.2 (Entry update algorithm), combined with the capability to manage entry states reviewed in Section 4.4 (Binding table state machine), enables end-nodes to move from one switch port to another (or from one MAC to another) under three scenarios:
Note that movement driven by T1 is tied to the accuracy of the REACHABLE state. Maintaining this state with an entry tracking mechanism, as described in Section 4.3 (Binding entry tracking), is going to make it a lot more efficient.
4.3.  Binding entry tracking
In order to maintain an accurate view of device location and state, which is a key element of the binding table entry preference algorithm, an entry tracking mechanism can be enabled. The tracking of entries is performed on a per-port, per-IPv6-address basis, by “layer-2 unicasting” a DAD NS on the port the address was first learnt from, to the Destination MAC (DMAC) known to be bound to that address.
The DMAC can be learnt from the LLA option carried in some NDP messages, configured statically, or, as a last resort, from the source MAC (SMAC) address of NDP messages referring to that address. In the case of NDP messages not sourced with the UNSPECIFIED address, that would be the source address of the message. In the case of a DAD NS, that would be the target address.
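As a non-normative illustration of such a “layer-2 unicast” probe, the sketch below builds a DAD NS with Scapy: the IPv6 source is the unspecified address and the IPv6 destination is the solicited-node multicast group of the tracked address, but the Ethernet destination is the DMAC bound to that address rather than the solicited-node multicast MAC. The interface name and addresses are placeholders.

   # Non-normative sketch: layer-2 unicast DAD NS used for tracking.
   import ipaddress
   from scapy.all import Ether, IPv6, ICMPv6ND_NS, sendp

   def solicited_node(addr: str) -> str:
       # Solicited-node multicast address ff02::1:ffXX:XXXX, where
       # XX:XXXX are the low-order 24 bits of the tracked address.
       low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
       base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
       return str(ipaddress.IPv6Address(base | low24))

   def track_probe(target: str, dmac: str, iface: str) -> None:
       probe = (
           Ether(dst=dmac)                    # L2 unicast to the DMAC
           / IPv6(src="::",                   # DAD: unspecified source
                  dst=solicited_node(target))
           / ICMPv6ND_NS(tgt=target)
       )
       sendp(probe, iface=iface, verbose=False)

   # Hypothetical usage, on the port the address was learnt from:
   # track_probe("2001:db8::1", "00:11:22:33:44:55", "eth1")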
4.4.  Binding table state machine
The entry lifecycle is driven by the switch, not by NDP: this is especially important to ensure that entries are kept as long as needed in the table, rather than following the rules of the ND cache, which are dictated by other requirements.
Typically, an entry will be created INCOMPLETE, move to REACHABLE when the binding is known, move back and forth between REACHABLE and VERIFY if tracking is enabled, and at some point move to STALE when the device (the address owner) stops talking on the link. The entry could stay in that state for a very long time, sometimes forever, depending on the configuration (see the Configuration section).
Four states are defined:
The binding table state machine is as follows:
   -  INCOMPLETE, on T0: send a DAD NS, increment r0 (remain
      INCOMPLETE).
   -  INCOMPLETE, on E1: move to REACHABLE.
   -  INCOMPLETE, on R0 (retry limit reached): delete the entry.
   -  REACHABLE, on E1: remain REACHABLE.
   -  REACHABLE, on T1: move to VERIFY (or to STALE when tracking is
      not enabled).
   -  VERIFY, on T2: send a DAD NS, increment r2 (remain VERIFY).
   -  VERIFY, on E1: move back to REACHABLE.
   -  VERIFY, on R2 (retry limit reached): move to STALE.
   -  STALE, on E1: move back to REACHABLE.
   -  STALE, on T3: delete the entry.
The following events drive the state transitions:
Default values are as follows:
All the default values should be overridable by configuration.
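As a non-normative illustration of the lifecycle above, the following sketch models the four states and the transitions of the state machine; timer scheduling is reduced to event callbacks, and the interpretation of the events (E1 as NDP evidence of the binding, R0/R2 as retry limits being reached) is an assumption of the example.

   # Non-normative sketch of the binding entry state machine.
   from enum import Enum, auto

   class State(Enum):
       INCOMPLETE = auto()
       REACHABLE = auto()
       VERIFY = auto()
       STALE = auto()
       DELETED = auto()          # pseudo-state: entry removed

   class EntryFsm:
       def __init__(self, send_dad_ns, tracking=True):
           self.state = State.INCOMPLETE
           self.send_dad_ns = send_dad_ns   # callback probing the owner
           self.tracking = tracking
           self.r0 = 0                      # INCOMPLETE retry counter
           self.r2 = 0                      # VERIFY retry counter

       def on_E1(self):                     # evidence of the binding
           if self.state is not State.DELETED:
               self.state = State.REACHABLE
               self.r0 = self.r2 = 0

       def on_T0(self):                     # INCOMPLETE probe timer
           if self.state is State.INCOMPLETE:
               self.send_dad_ns()
               self.r0 += 1

       def on_R0(self):                     # INCOMPLETE retry limit
           if self.state is State.INCOMPLETE:
               self.state = State.DELETED

       def on_T1(self):                     # REACHABLE lifetime timer
           if self.state is State.REACHABLE:
               self.state = (State.VERIFY if self.tracking
                             else State.STALE)

       def on_T2(self):                     # VERIFY probe timer
           if self.state is State.VERIFY:
               self.send_dad_ns()
               self.r2 += 1

       def on_R2(self):                     # VERIFY retry limit
           if self.state is State.VERIFY:
               self.state = State.STALE

       def on_T3(self):                     # STALE lifetime timer
           if self.state is State.STALE:
               self.state = State.DELETED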
5.  Configuration

5.1.  Switch port configuration
Qualifying a port of the switch is of primary importance, as it influences the “entry update algorithm” (see Section 4.2 (Entry preference algorithm)). The switch configuration should allow the following values to be configured on a per-port basis:
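Without restating those values, a per-port configuration would at least carry the port qualification used by the preference algorithm (access, trunk, trusted trunk). The non-normative sketch below illustrates this; the structure and the extra knob shown are assumptions, not the normative list of values:

   # Non-normative sketch of per-port configuration.  The port
   # qualifications are the ones used in the scenarios of this
   # document; other knobs are assumptions.
   from dataclasses import dataclass
   from enum import Enum

   class PortRole(Enum):
       ACCESS = "access"                  # end-nodes attached directly
       TRUNK = "trunk"                    # link to a non plb-switch
       TRUSTED_TRUNK = "trusted-trunk"    # link to a trusted plb-switch

   @dataclass
   class PortConfig:
       name: str
       role: PortRole = PortRole.ACCESS
       tracking_enabled: bool = True      # assumed per-port knob

   # Hypothetical usage:
   ports = {
       "port1": PortConfig("port1", PortRole.ACCESS),
       "port24": PortConfig("port24", PortRole.TRUSTED_TRUNK),
   }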
5.2.  Binding table configuration
The following elements, acting on the binding table behavior, should be configurable, globally or on a per-port basis:
6.  Bridging NDP traffic
One important aspect of an “NDP-aware” switch is to efficiently bridge the NDP traffic to its destinations. In some areas, the switch might behave differently from a regular, non-plb switch:
The general bridging algorithm is as follows. When an NDP message is received by the layer-2 switch, the switch extracts the link-layer information, if any. If no LLA is provided, the switch should bridge the message normally to its destination. If an LLA is provided, the switch can look up the corresponding entry in its binding table. If no entry is found, it creates one and bridges the message normally. If an entry is found with attributes consistent with the ones received (port, zoneid, etc.), it should bridge the message normally. If the attributes are not consistent and a change is allowed (see Section 4.2 (Entry preference algorithm)), it should update the attributes and bridge the message. If the change is disallowed, it should drop the message.
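The following non-normative sketch restates that bridging decision; the message representation and the update_allowed() check (which stands for the Section 4.2 rules) are simplified placeholders:

   # Non-normative sketch of the NDP bridging decision described above.
   from dataclasses import dataclass
   from typing import Optional

   @dataclass
   class NdpMsg:                  # simplified parsed NDP message
       v6addr: str                # address the binding refers to
       zoneid: int
       lla: Optional[str]         # LLA option, None if absent
       port: str                  # port the message was received on

   def update_allowed(entry: dict, observed: dict) -> bool:
       # Placeholder for the entry update algorithm of Section 4.2:
       # here a higher preference level may replace the entry.
       return observed["preflevel"] > entry["preflevel"]

   def bridge_ndp(table: dict, msg: NdpMsg, preflevel: int,
                  bridge, drop) -> None:
       # bridge() and drop() are the switch forwarding primitives.
       if msg.lla is None:
           bridge(msg)                        # no LLA: bridge normally
           return

       key = (msg.v6addr, msg.zoneid)
       observed = {"lla": msg.lla, "port": msg.port,
                   "preflevel": preflevel}
       entry = table.get(key)

       if entry is None:
           table[key] = observed              # learn, then bridge
           bridge(msg)
       elif (entry["lla"], entry["port"]) == (msg.lla, msg.port):
           bridge(msg)                        # attributes consistent
       elif update_allowed(entry, observed):  # change allowed: update
           table[key] = observed
           bridge(msg)
       else:
           drop(msg)                          # change disallowed: drop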
6.1.  Bridging DAD NS
Bridging DAD NS is critical to both security and binding table distribution. The flows below study some relevant cases.
In scenario A, the switch SWITCH_A has only end-nodes connected to it.
   Scenario A (participants: host 1, SWITCH_A, host 2, host 3):

   With the switch already up, host 2 sends a DAD NS for target X.
   SWITCH_A finds no hit, stores X with pref=ACCESS, and conditionally
   forwards the DAD NS (1) to host 1 and host 3.  An NA from host 1
   claiming X hits the existing entry with newpref=ACCESS, is not
   allowed to replace it, and is dropped.  Later, a DAD NS for target
   X from host 3 hits the entry (newpref=ACCESS) and is forwarded to
   the owner (host 2) only.
When nodes come up, the switch is assumed to be already up. As a result, since the switch stores entries for all the addresses it snoops, it is going to have a fairly accurate view of the nodes (addresses). Host 2 comes up, and sends a DAD NS for target X, intercepted by the switch. SWITCH_A does not have X in its binding table, stores it (INCOMPLETE), and bridges it to the other nodes, host 1 and host 3. If MLD snooping is in effect, the switch might decide not to forward it at all (no other known group listener for the solicited-node multicast group), or only to a few hosts. Regardless of MLD snooping, flow (1) is not absolutely "useful" and could even be harmful. If we assume the switch knows all the addresses of the link/vlan, then it knows that nobody owns this address yet. In that case, sending the message to other hosts would be an invitation for an attack. There is a tradeoff between two issues which are not equally probable: the risk of breaking DAD and the risk of being vulnerable to a DoS on address resolution.
The latter is well understood: should the switch broadcast the DAD NS, an attacker can immediately claim ownership with an NA. As for the former, it would happen if the following conditions are met:
In scenario B, SWITCH_A is also connected to a second switch, SWITCH_B, which runs the same logic to populate its own binding table.
   Scenario B (participants: host 1, SWITCH_A, SWITCH_B, host 2):

   While SWITCH_A is still down, host 2 sends a DAD NS for target X;
   SWITCH_B finds no hit (no trunk is up) and stores X with
   pref=ACCESS.  SWITCH_A then comes up and receives a DAD NS for
   target X from host 1: no hit, X is stored with pref=ACCESS, and
   the NS is forwarded on the trunk (2).  SWITCH_B has a hit (host 2)
   and forwards the NS to the owner, which answers with an NA.
   SWITCH_B, hitting the owner's entry, forwards the NA on the trunk;
   SWITCH_A hits its own entry with newpref=TRUSTED_TRUNK, replaces
   it, and forwards the NA to host 1.
When SWITCH_A comes up, it may come up after SWITCH_B. In this case, it is unaware of the end-nodes attached to SWITCH_B. SWITCH_B, however, knows all of them, with the same assumptions as in scenario A. Upon receiving a DAD NS for target X, and in the absence of a hit, SWITCH_A creates an INCOMPLETE entry and forwards the message to SWITCH_B.
Scenario C connects SWITCH_A to a SWITCH_B that does not run the same binding table algorithm (referred to as a non plb-switch). In this scenario, SWITCH_A forwards a DAD NS for target X on the trunk. Configuration should tell whether any response coming from SWITCH_B is to be trusted (in the absence of better credentials such as a CGA/RSA proof). If SWITCH_B is fully trusted, then the trunk is configured as "TRUSTED_TRUNK" and scenario B applies. Otherwise, the trunk is configured as "TRUNK" and the response is ignored.
   Scenario C (participants: host 1, SWITCH_A, SWITCH_B (non
   plb-switch), host 2):

   Host 2 first performs DAD for target X through SWITCH_B.  SWITCH_A
   then comes up and receives a DAD NS for target X from host 1: no
   hit, X is stored with pref=ACCESS, and the NS is forwarded to
   SWITCH_B, which delivers it to the solicited-node group (host 2).
   Host 2 answers with an NA, relayed by SWITCH_B to SWITCH_A; the NA
   hits the entry with newpref=TRUNK, is not allowed to replace it,
   and is dropped.
6.2.  Bridging other NDP messages
When running the proposed binding table population algorithm, switches are expected to have an accurate view of the end-nodes attached to them. While scenario C is problematic, scenarios A and B are clearer. If a switch has an entry in its table that conflicts with the binding observed in an NDP message just received, it should drop the message (if the new data has a smaller preflevel) or update its entry and bridge the message.
If the switch does not have such an entry, it should create the entry and bridge the message, including to trunks.
In the case of multicast messages, it should bridge them on trunks regardless of group registration, to give other switches a chance to build up a more accurate binding table.
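A non-normative sketch of that port selection: group registration (MLD snooping) may prune access ports, but trunk ports always receive the message so that neighbouring switches can keep learning. The names below are illustrative:

   # Non-normative sketch: output ports for a multicast NDP message.
   from collections import namedtuple

   Port = namedtuple("Port", ["name", "is_trunk"])

   def multicast_ndp_out_ports(ports, mld_listeners, group, in_port):
       out = set()
       for port in ports:
           if port.name == in_port:
               continue                      # never echo back
           if port.is_trunk:
               out.add(port.name)            # always bridge on trunks
           elif port.name in mld_listeners.get(group, set()):
               out.add(port.name)            # registered listeners only
       return out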
7.  Normative References
   [RFC3971]  Arkko, J., Kempf, J., Zill, B., and P. Nikander, "SEcure
              Neighbor Discovery (SEND)", RFC 3971, March 2005.

   [RFC3972]  Aura, T., "Cryptographically Generated Addresses (CGA)",
              RFC 3972, March 2005.

   [RFC4861]  Narten, T., Nordmark, E., Simpson, W., and H. Soliman,
              "Neighbor Discovery for IP version 6 (IPv6)", RFC 4861,
              September 2007.

   [RFC4862]  Thomson, S., Narten, T., and T. Jinmei, "IPv6 Stateless
              Address Autoconfiguration", RFC 4862, September 2007.

   [fcfs]     Nordmark, E. and M. Bagnulo, "First-Come First-Serve
              Source-Address Validation Implementation",
              draft-ietf-savi-fcfs-01 (work in progress), March 2009.
Appendix A.  Contributors and Acknowledgments
This draft benefited from input from Pascal Thubert.
Author's Address
   Eric Levy-Abegnoli
   Cisco Systems
   Village d'Entreprises Green Side - 400, Avenue Roumanille
   Biot-Sophia Antipolis - 06410
   France

   Email: elevyabe@cisco.com