Internet DRAFT - draft-crowcroft-multirouting
draft-crowcroft-multirouting
HTTP/1.1 200 OK
Date: Mon, 08 Apr 2002 23:25:54 GMT
Server: Apache/1.3.20 (Unix)
Last-Modified: Wed, 09 Dec 1992 04:36:00 GMT
ETag: "3ddd7e-14c82-2b2577b0"
Accept-Ranges: bytes
Content-Length: 85122
Connection: close
Content-Type: text/plain
Core Based Trees (CBT)
- Scalable Multicast Routing -
by
A. Ballardie (UCL), P. Tsuchiya (Bellcore), J. Crowcroft (UCL)
Status of this Draft
This document is an Internet Draft. Internet Drafts are working
documents of the Internet Engineering Task Force (IETF), its Areas,
and its Working Groups. Note that other groups may also distribute
working documents as Internet Drafts. Internet Drafts are draft
documents valid for a maximum of six months. Internet Drafts may be
updated, replaced, or obsoleted by other documents at any time. It is
not appropriate to use Internet Drafts as reference material or to
cite them other than as a ``working draft'' or ``work in progress.''
Please check the 1id-abstracts.txt listing contained in the
internet-drafts Shadow Directories on nic.ddn.mil, nnsc.nsf.net,
nic.nordu.net, ftp.nisc.sri.com, or munnari.oz.au to learn the current
status of any Internet Draft.
Abstract
Multicasting is a technique used which allows for group
communication either locally or on a wide-area scale. Most
local-area networks such as Ethernet and Token Ring provide a
multicast service, which has since been exploited by many
applications and distributed systems. More recently,
multicast capability has been extended to the internetwork
using a combination of a distance-vector routing algorithm, a
host-to-router group-membership reporting protocol, and the
computation of sender-based multicast trees. This, and other
approaches to internetwork multicasting, are not scalable to
large groups, especially so for large numbers of groups.
This document provides a proposal for a new multicast
routing algorithm which provides multicast capability in a
datagram internetwork. It is scalable, low-cost and
efficient - properties lacking in current internetwork
multicast routing protocols.
Distribution of this draft is unlimited. A mailing list
is in place in idmr@cs.ucl.ac.uk (subscribe to
idmr-request@cs.ucl.ac.uk) to discuss this and other
inter-domain multicast routing issues.
(NOTE: DIAGRAMS HAVE BEEN OMITTED FROM THIS .txt VERSION OF CBT. HOWEVER,
THEY ARE INCLUDED IN THE .ps VERSION OF THIS DRAFT WHICH IS
AVAILABLE IN THE "internet-drafts" DIRECTORIES UNDER THE SAME
NAME).
CBT Expires April 6th, 1993.
i
Contents
1 Introduction. . . . . . . 1
2 CBT Design Goals . . . . . . 2
3 Assumptions of the CBT Model . . . . 4
4 CBT Design Overview. . . . . . 5
4.1 Some Terminology . . . . . 5
4.2 Packet Types . . . . . 7
4.3 CBT Introduction . . . . . 10
4.4 CBT Algorithm . . . . . 13
4.5 Some Examples . . . . . 16
5 The LAN Issue . . . . . . 20
5.1 Further Details on the LAN Issue . . . 22
5.2 Summary . . . . . . 25
6 More on how a Router becomes a Major Core . . . 26
7 Core Assignment and Optimality . . . . 27
7.1 Dynamically Improving Optimality . . . 27
8 Major Cores and Minor Cores: The core-join request . 29
9 Core Tree Maintenance Protocol . . . . 30
9.1 The quit-request . . . . . 34
10 core-state-notification (CSN) packet . . . 35
10.1 How the CSN Packet Works . . . . 35
11 core-list packet . . . . . . 37
12 Acknowledgements . . . . . . 42
A Authors' Addresses . . . . . 43
CBT - Internet DRAFT, November 1992 1
1 Introduction
Multicast transmission is an increasingly important function in data
networks. For instance, SMDS specifies multicast as part of its
service, and multicast is an important element for ATM networking,
particularly for, but not limited to, video broadcast. Multicast is
also increasingly used in the IP internet.
One of the central problems in multicast transmission is forming the
multicast tree - the collection of nodes and links that the multicast
packet traverses. In spite of the fact that substantial work has been
done in the area of algorithms for forming multicast trees,
significant problems remain to be solved. Paramount among these is
the problem of scaling. Current multicast techniques either scale
well but form grossly suboptimal trees, or they form good trees but
scale poorly. (By scale poorly, we mean consume too many memory,
bandwidth, or processing resources.)
For example, the multicast model that has been adopted for
internetwork multicasting is based onformulating a sender-based
multicast spanning tree for each member of a group to which a host is
subscribed, using information extracted from the distance-vecto
routing algorithm. This imposes considerable storage costs on
participating routers. These costs include storing parent and child
attributes (and corresponding next-hop link information) in existing
route table entries, as well as some other protocol-dependent
information. Furthermore, all routers in the internet that have
multicast capability are required to participate in establishing a
multicast tree for each group present in the internet, whether they
are on the multicast tree or not1. This is clearly not scalable as
the number of groups to which a host subscribes, and the number of
hosts per group, increases.
The number of applications taking advantage of the internetwork
multicast service is growing rapidly. Example applications include
video-conferencing, audio-conferencing, distributed database
updating/querying, and location mechanisms for distributed network
services. It is therefore important that any internetwork multicast
routing algorithm be scalable, efficient (in terms of multicast packet
forwarding), low-cost (in terms of computational overhead and storage
requirements), and be adaptable to security features - an important
issue so far not addressed in multicasting.
This draft proposes a new algorithm for multicast group
communication in the global internetwork, called CBT (Core Based Trees).
It satisfies all of the requirements stated above.
----------------------------
1. This overhead can be reduced by using an appropriate TTL value
in the IP header, especially so for localised groups
CBT - Internet DRAFT, November 1992 2
2 CBT Design Goals
1. The design of any multicast routing algorithm should satisfy the
requirements of the Host Group Model as specified in [1]. These
mostly have to do with the flexibility of multicasting, for
example:
o Senders need not be members.
o Groups may have any number of members.
o There are no topological restrictions on group membership.
o Membership is dynamic and autonomous.
o Host groups may be permanent or transient.
2. Scalability is of paramount importance in any multicast routing
algorithm. Good multicast routing algorithm scaling
characteristics include low memory, low bandwidth, and low
processing resouce requirements. Furthermore, the less routers
need to know about the location of remote group members, the
better.
3. Robust Multicast Routing. CBT uses a core-based approach to
building the multicast spanning tree. Most groups will be small
at the beginning of their lifetime, and so the multicast spanning
tree for a group will be single-core in the beginning. The
destination address in all multicast packets will be the address
of the core. As groups become larger and more widespread,
additional core(s) may be assigned to the group. Core placement
will be chosen so as to enhance the optimality and robustness of
the multicast delivery tree.
4. Invisible Multicast Routing and Addressing to routers not on the
CBT tree. CBT allows for multicast routing and addressing to be
completely invisible to underlying unicast routing and addressing
to routers not on the CBT tree. No separate multicast routing
table is required by routers (although minor extensions to the
unicast routing table are required), and looking at the IP
address field alone, multicast addresses are indistinguishable
from IP unicast addresses.
CBT - Internet DRAFT, November 1992 3
Multicast addresses are only recognised as such if the router
in question is a member of the same group as the group identified
in an incoming multicast packet, i.e. it is a router on the CBT
tree for that group. The router in question can establish this
by matching its group identifier with that present in the IP Options
field. The group identifier is 32 bits long. If there is no match,
then the packet is routed as a normal unicast packet towards the core
addressed in the packet. We considered this addressing
flexibility an important design goal since multicast addresses
are no longer restricted to class D addresses, the administrative
overhead of assigning them is reduced, and we see a degree of
multicast address and routing transparency introduced not seen
before in multicasting. Furthermore, by having a separate name
space it is concievable to apply additional semantics to the
name.
Routers that are part of a CBT tree need only know which routers
are its parent (any router has only one parent per CBT tree) and
which are its children (the exception to this rule is for certain
cores on the core tree which need to know about each other
although they are not directly connected neighbours). Thus, CBT
allows for a considerable amount of information hiding when
compared with other network-level multicast approaches.
5. Routing Algorithm/Protocol Independence. CBT's flexible
addressing method allows CBT to be installable anywhere, even
across different routing algorithms. Furthermore, it is easy to
incorporate into new protocols that may not have a separate
address space set aside for multicasting (for example, ATM with
E.164 addresses). This is obviously advantageous for
inter-domain multicast routing, and also means it can be applied
to technologies such as SMDS and ATM networks
6. Independence from any new IP addressing scheme. One of the
biggest headaches currently facing the internet research
community is that the 32-bit IP address space is fast exhausting.
A considerable number of researchers are frantically trying to
find the right addressing solution to satisfy future growth
expectations and beyond. As far as we can foresee, CBT should
function independent of addressing method. This is certainly the
case for the proposals so far reviewed by the authors.
CBT - Internet DRAFT, November 1992 4
3 Assumptions of the CBT Model
o CBT is discussed here in the context of a connectionless datagram
internetwork, although as we have already stated, the design of
CBT should allow it to run over heterogeneous networks.
o CBT uses both ICMP and IGMP protocols, and although the both are
standard parts of IP, the latter is present in many, but not all,
implementations. We assume its presence for CBT.
o CBT requires certain messages not originating on the CBT tree, to
traverse only tree branches once encountering the tree.
Furthermore, messages originating in certain parts of the tree
must only traverse certain branches, and any acknowledgements to
these messages must similarly traverse the same branch(es) on the
return-path. We have not assumed symmetric routes and have
therefore made explicit provisions to create symmetric routes
where necessary.
o CBT routers are not dedicated to CBT multicast routing. They are
normal IP routers with normal IP routing capabilities. A router
may be a CBT router for more than one group at any one time. CBT
functionality then, is considered an extension to normal IP
routing functionality. Ideally, all routers in the internetwork
will have CBT functionality and each will make itself available
to become a core for a group(s) at any time.
o Last, but by no means least, we assume the presence of a
multicast core-assignment service, or core server. The
specification of such a core-assignment service is outside the
context of this draft. However, we make numerous
assumptions/suggestions as to how it would/should operate and
some of the services it should provide. We also include some of
the messages that might be exchanged between such a service and
CBT in order to provide a clearer picture, but these message are
NOT part of CBT. Whether a dedicated core-assignment service is
required, or directory services should assume this role, is open
to debate, but clearly a core-assignment service would require
global topology information in order to be able to assign cores
reasonably optimally. For simplicity, we will refer to a
dedicated core-assignment service throughout this draft as
multicast directory service.
We feel some kind of dedicated multicast service is necessary not
only for CBT, but for all multicast protocols. If such a service
were presently in operation, maybe it could assign the class D
addresses required by current internet multicast routing
protocols, which at present, come from ``thin air''.
CBT - Internet DRAFT, November 1992 5
4 CBT Design Overview
We will first of all provide a description of some of the terminology
used in CBT. This is followed by a subsection on CBT describing what
it is, what it aims to provide, and briefly how it works. Finally, a
more detailed description of the algorithm is given. CBT encompasses
a rather more complicated protocol for maintaining the connectivity of
the backbone of the CBT tree (core-tree), called the Core-tree
Maintenance Protocol. This may be mentioned in the description of the
algorithm, but we have dedicated a complete section to it later on
because of its importance and for reasons of clarity.
4.1 Some Terminology
CBT (core-based tree) is a multicast routing algorithm which results
in a centre-based (core-based) multicast delivery tree. There is one
CBT tree per group. A core-based tree can be of two types:
o single-core
o multi-core
Single-core CBT trees are for small, geographically local groups. As
a group becomes larger and more widespread, additional cores may be
assigned to the group concerned, thus making it a multi-core CBT tree.
We distinguish between three different types of router that make up a
CBT tree:
o major core - these are the only routers to be assigned by
directory service. They become the centre (or part of the centre
in the case of multi-core trees) of a multicast CBT spanning
tree. They are assigned so as to be strategically placed to form
as near an optimal multicast spanning tree as possible for the
group. All multicast packets contain, as their destination
address, the (unicast) address of a major core.
o minor core - router(s) which form a virtual link between major
core routers on a CBT tree. These routers are only present in
multi-core CBT trees. When an additoinal major core is assigned
to a single-core CBT tree, that major core must make a special
core-join request to the existing major core. The path of
routers the core-join request-ACK traverses become minor cores on
the CBT tree.
CBT - Internet DRAFT, November 1992 6
o non-core router - all routers part of the CBT tree that are not
major or minor cores are non-core routers.
The part(s) of the CBT spanning tree that link major core routers
together, i.e. the branches of the tree that only contain major and
minor cores, is known as the core tree.
o major child - the major core that has sent (and received and ACK
for) a core-join request to another major core on the core tree.
o major parent - the major core which has received and ACK'd a
core-join request from another major core, now on the core tree.
CBT - Internet DRAFT, November 1992 7
4.2 Packet Types
The following is a list of all CBT and CBT-related control packet
types. Brief descriptions are given here, with fuller implementation
details available in the appropriate later sections.
o CBT control packet types.
1. send-join request - sent from a host to its local router
requesting that router to formulate and send a join request.
It is an indication of the host's firm intention to join a
particular group2.
2. send-join-ACK - positive response from the local router to
the host, formulated and sent by it after forwarding a
join-request to the major core as specified in the send-join
request. The host's group membership is pending.
3. join-request - originated by (if received a send-join request
from host on attached subnet), or forwarded by, a router that
is not already part of the CBT tree for the group identified
by group-id inside the packet. Non-core router status is
pending.
4. join-ACK - formulated by any router on the CBT tree that
receives a join-request. All routers traversed by a join-ACK
become non-core routers on the CBT tree for the group
identified by group-id.
5. core-join request - formulated and sent by a major core
router that has been newly-assigned to a group. It will have
been assigned as such by directory service, which also
supplies it with the address of an existing major core to
which the core-join request must be forwarded.
Alternatively, a core-join request is formulated and sent by
an existing major core to another existing major core when
reachability between the two has been lost, or one of the
major core's parents has become unavailable. Both scenarios
amount to a core-tree partition.
6. core-join-ACK - formulated and sent by either a major core
(in the case it actually received the core-join request), or
by a minor core (if that minor core terminated the core-join
CBT - Internet DRAFT, November 1992 8
request), towards the source of the core-join request. All
routers traversed by a core-join-ACK on its way to its
destination become minor core routers, if are not already
such.
7. termination-notification - formulated by a minor core that
has received (and thus terminated) a core-join request. It
is forwarded to the major core addressed as the target of the
of the core-join request.
8. termination-notification-ACK - formulated by the major core
that was the target of a core-join request terminated by a
minor core. It is forwarded to that minor core.
9. core-state-notification - originated by a major core when
that major core has sent either a core-join-ACK, or a
termination-notification-ACK. Disemminated on all core-tree
interfaces.
Alternatively, a major core will generate a
core-state-notification when it loses reachability to a
virtual neighbour.
10. core-list - generated by the originating major core of a
core-state-notification packet. It is the result of a
core-state-notification packet having stabilized. The
core-list is multicast on the CBT tree.
----------------------------
2. This will incur minor host modifications. Any host
modifications are obviously undesirable, but we feel that if we are
going to solve the problem of inter-domain multicasting for the
long-term future to provide a scalable and simple approach, host
modifications will be eventually necessary. The minor host changes
will be with respect to IGMP
CBT - Internet DRAFT, November 1992 9
11. quit-request - can be composed by all three types of CBT
router. In the case of major cores, a quit-request is sent
when a new core-join request arrives from a currently
unreachable major child via a different interface than that
leading to the child previously.
In the case of minor cores, a quit-request is sent only when
one is received on a child core-tree interface AND there are
no member hosts on any attached subnets.
In the case of non-core routers, a quit-request is sent
whenever the non-core router's parent changes and the path
leading to the new parent is different from the old, or a
quit request is received from a downstream non-core AND the
current router has no member hosts on any attached subnets.
Note: All CBT control packets will be sent encapsulated in IP
datagrams.
o ICMP.
1. ICMP echo request - sent both ways between major child and
major parent, traversing only the core-tree, to test the
``liveness'' of each.
2. ICMP echo reply - a positive response to the above, again,
sent only via the core-tree.
o IGMP.
1. group-membership query - composed by routers and sent locally
to attached subnets to monitor the presence of group members.
Failure to receive a response for a particular group query
after a certain timeout period results in bf non-core router
sending a quit-request to its parent.
2. group-membership report - composed in response to receiving a
group-membership query from the local designated router, to
indicate that the group in question still has membership on
this subnet.
o CBT - Directory Service.
1. group-initiation request - composed by a host that wishes to
create a group for a particular multicast application. It
contains the address(es) of the other parties with which it
wishes to communicate, and is forwarded to directory service
as a request for a major core to be assigned. It expects the
address of a new core to be returned.
2. group-join request - composed by a host that wishes to join
an existing group. It is a request for a major core address.
3. core-request - a request from directory service aimed at a
router for it to become a major core for a group.
CBT - Internet DRAFT, November 1992 10
4.3 CBT Introduction
The aim of CBT is to provide a core-based tree for each multicast
group. This is a centre-based shortest-path spanning tree, with
branches emanating from the centre. The tree will normally start out
by being a single-core tree but as the group becomes larger and more
widespread, additional core(s) will be assigned to the tree for
reasons of routing efficiency and robustness. All cores are assigned
by a multicast directory service, the operation of which is outside
the context of this draft. When more than a single core is present in
a CBT tree, the branch(es) linking the cores together form the
core-tree. The assigned cores for a particular CBT tree, known as
major core routers, are likely to be geographically distant apart, and
so the core tree which links these cores is likely to contain other
routers, known as minor core routers. For reasons of brevity, we
describe first the single-core case.
All branches of the tree consist of point-to-point links. When a host
on a subnet somewhere in the internet, wishes to join an existing
group, it sends a request to its local router to send a request to
join the group. This join-request contains as its destination
address, the address of the major core for the group, and is forwarded
using the underlying unicast routing algorithm to the next hop on its
way to the core. Every router that is traversed en-route will either
be a router that is already part of the CBT tree, known as non-core
routers, or will not be part of the CBT tree. If the join-request
encounters a non-core router, that router terminates the join-request,
and sends a join-ACK back to the source. All routers traversed by the
join-ACK become non-core routers on receiving it. The join-ACK
continues to be forwarded until it reaches the original source.
Routers that receive a join-request that are not non-cores for the CBT
tree will forward it on its way to the destination core using normal
unicast routing, and then await a join-ACK before becoming non-cores
for the CBT tree (group).
Whenever a join-ACK is sent or received via an interface, that
interface is marked as being a multicast interface for the
corresponding CBT tree (group). In this way, branches are formed on
the CBT tree. The diagram over illustrates a single-core CBT tree.
CBT - Internet DRAFT, November 1992 11
Figure 1: Single-core CBT Tree
Should the group become larger or/and more widespread, an additional
core(s) may be assigned by directory service. This new major core
must make a special join-request to the existing major core, to become
part of the core-tree. This request is called a core-join request.
All routers traversed en-route by the core-join request will become
minor core routers on receiving a core-join-ACK. Existing minor cores
on a CBT core-tree are obliged to terminate core-join requests and
send a core-join-ACK back to the source, which, on receipt of it,
becomes a new major core for the group. The new and existing major
cores become (virtual) neighbours (major neighbours), the latter being
the major parent and the former being the major child. On receiving
the core-join-ACK, both parent and child majors can begin exchanging
Core-tree Maintenance Protocol messages. The diagram over illustrates
a multi-core CBT tree.
CBT - Internet DRAFT, November 1992 12
Figure 2: Multi-core CBT Tree
Should an existing minor core terminate a core-join request, it must
send a termination-notification packet to the target of the join.
When it receives a termination-notification-ACK in response, it may
then generate a core-join-ACK which is forwarded to the source of the
core-join request. The termination-notification packet indicates to
the major parent that it has a new major child.
Multicast data packets can originate from members of the group or
non-members outside the group that are not part of the CBT tree. In
the former case, multicast data packets will originate from a member
host on a CBT router's attached subnet and be disemminated on reciept
by that local on all outgoing multicast interfaces identified by
group-id. Eventually, the data packets will reach all nodes on the
CBT tree.
In the latter case, multicast data packets will be forwarded as normal
data packets, with the destination being a major core for the group,
using the underlying unicast routing algorithm. The multicast data
packet will first encounter the CBT tree either when the destination
is reached, i.e. the major core, or it will hit another router (minor
or non-core) that is part of the CBT tree. In either case, the
multicast data packet will be forwarded on all outgoing interfaces
associated with the group-id present in the data packet(s). In this
way, the multicast data packets will eventually reach all nodes on the
CBT tree.
CBT - Internet DRAFT, November 1992 13
4.4 CBT Algorithm
As specified above, it is assumed that hosts and routers are running
the Internet Group Management Protocol - routers running the router
part, and hosts the host part [1].
Whenever a host anywhere in the internet wants to initiate a group,
there must be at least one other entity at the initial stage with
which it wishes to communicate. Taking as an example, two hosts in
different parts of the internet have agreed, by some external means
(e.g. e-mail), that they wish to partake in a video conference.
On group initiation, one host must make a group-initiation-request to
multicast directory service for a group initiation. If, as in this
case, there is no CBT tree yet present for the application concerned,
the host makes its group initiation request supplying all the host
addresses it currently knows about with which it wishes to take part
in the group communication. In this way, directory service can
``magically'' select a major core which is optimally placed between
those participants and supply the address of that major core, together
with the group-identifier for that group, to all hosts identified in
the initial reqest.
If the requesting host knows, by some external means, that a group
with which it wishes to communicate already exists, then it makes a
group-join reqest to directory service which reqires no arguments.
Directory service replies with the address of a major core and
group-identifier (group-id) for the group.
Once the requesting host has received the address of a major core for
the group, it must now send a send-join-reqest, which includes the
group-id and major core address, to its local router. There are two
scenarios to consider:
o The local router is not already a member for the group identified
by the group-id. In this case, the local router formulates a
join-request containing as its destination address, the address
of the core supplied in the send-join request and also containing
the group-id. This is forwarded using normal unicast routing on
the next hop to its destination. At this point, the local router
sends a send-join-ACK back to the local host, indicating to it
that the join-request has been sent and group membership is
pending. When a join-ACK is received by the local router, it is
forwarded to the local host that initiated the join-request.
o The local router is already a member for the group identified by
the group-id. In this case, the send-join request is not
actually honoured, and the local router formulates a
send-join-ACK and forwards it locally back to the host.
Following this, the router formulates a ``bogus'' join-ACK (since
no join-request was actually sent) and forwards it to the local
host.
CBT - Internet DRAFT, November 1992 14
Only at this point, can an end-system host consider itself a member
for the group, and as such can send/receive multicast packets for that
group, as well as send positive replies to IGMP group membership
queries.
Design decision: We decided a member host should not be permitted to
forward multicast packets to the CBT tree for a group until it
receives the join-ACK. Here we trade-off between reliablility and low
join-latency. By forwarding multicast packets to an upstream router
before being ``ACK'd'' would further reduce the degree of delivery
probability associated with a connectionless datagram network.
When a router receives a join-request from a neighbour router, there
are two scenarios to consider:
o Scenario 1: The router is already part of the CBT tree for the
group identified by group-id, but it is not the major core to
which the join was addressed. Thus, the router encountered is
either a minor core or a non-core router on the tree. At this
point, the join-request terminates.
The join-request is terminated at this point because there is
already a shortest-path from this router to the core tree. So,
the interface over which the join-reqest arrives is marked as a
child interface for the corresponding CBT tree, and all CBT
packets for the group arriving over it are forwarded on that
router's one and only interface leading towards the core tree.
If the router receiving the join-request is indeed the major core
to which it is addressed, then the major core must be the first
router encountered on the CBT tree for that group. In this case,
the major core simply returns a join-ACK which may traverse other
routers before it reaches its destination.
All routers that forward/originate a join-request that
recieve/forward a join-ACK for that request, set the following
flags:
1. Router status flag - if not already set, is set to non-core
in the flag table. Other router status flags are major core
and minor core, but these are set under different
circumstances.
Note: Only one router status flag can be set at any one time
for any one CBT tree in the flag table. The setting of one
flag may however, lead to the clearing of another.
2. parent-branch - this flag is set in routing table, and there
can only be one branch for which this flag is set. It is set
against the interface indicating the parent branch in a
particular CBT tree identified by group-id. The routing
table is a slightly modified unicast routing table.
CBT - Internet DRAFT, November 1992 15
3. child-branch - this flag is also set in the routing table
against the interface indicating a child branch in a
particular CBT tree identified by group-id. For any one
group, this flag may be set on multiple interfaces.
The flag table is illustrated below:
++++++++++++++++++++++++++++++++++++++++
+ ROUTER STATUS FLAG +
++++++++++++++++++++++++++++++++++++++++
+ Non-core Minor-core Major-core +
+ -------- ---------- ---------- +
+ group-id - - +
+ - group-id +
+ - group-id - +
++++++++++++++++++++++++++++++++++++++++
o Scenario 2: The join-request encounters a router that is not
part of the CBT tree identified by group id. In this case the
router forwards the join-request on the next hop towards the
addressed core using the underlying unicast routing algorithm.
As the join-ACK passes back along the point-to-point links that
the join-request traversed, the routers encountered become
non-cores and the appropriate flags are set in the flag
table/routing table, as described above.
Note: All join-request's contain an identifier in addition to the
group-id to help distinguish between different join's for the same
group, and match join-request's with join-ACK's. When a join-request
is received by a router not already on the CBT tree, a copy of the
join-request is cached, together with the incoming interface address.
When it is forwarded, the outgoing interface address is added to the
information contained in the cache. This is to ensure that the
join-ACK is received and sent out on the correct interfaces.
CBT - Internet DRAFT, November 1992 16
4.5 Some Examples
The diagram below shows a CBT tree for a group with group-id 12345.
We will demonstrate two examples. The first will show the messages
generated when a host, shown as ``X'' next to the bottom right router
in the diagram, wishes to become a member of the group 12345.
Figure 3: Example Multi-Core CBT Tree
The second example demonstrates how a new major core becomes part of
the core-tree. The part of the diagram relevant for this is the top
centre indicated by thick dotted lines.
CBT - Internet DRAFT, November 1992 17
o Example 1: This example demonstrates the sequence of events
required for host ``X'' (bottom right in diagram), when it wishes
to become a member for the group 12345. The diagram illustrates
the present state of the CBT tree for group 12345, but the dashed
lines indicate the branches not yet part of the tree. The
routers on the dotted lines are those traversed by any messages
exchanged with the tree.
We assume the message exchange with directory service has already
taken place, and has supplied major core B's address. The
sequence of events is as follows:
1. Host ``X'' composes a send-join request containing the
address of major core B, and the group-id 12345. It then
forwards it to the local router shown in the diagram.
2. The local router composes a join-request with major core B as
the destination address, and forwards it on the next-hop
towards the core. There are two further hops to traverse
(shown on the dotted line on bottom right of diagram) before
the join-request hits the CBT tree. These two routers
forward the join-request as a normal unicast packet, i.e.
CBT is transparent to them as long as they are not part of
the group identified by group-id in the OPTIONS field of the
IP header.
3. After the local router has forwarded the join-request, it
returns a send-join-ACK to the local host. This indicates to
the local host that group membership is pending.
4. The non-core router at the end of the dashed line receives
the join-request. It processes the IP header which includes
the group-id in the OPTIONS field of the header, and realises
it is a CBT packet pertaining to group 12345, of which it is
a current member.
5. Since the non-core router already has a shortest-path branch
to the core tree, it terminates the join-request and composes
a join-ACK. This is forwarded back over the interface the
join was received, towards the originator of the
join-request. The current non-core router sets another child
interface flag in its routing table, for the new branch of
the CBT tree.
6. Each router that receives the join-ACK sets the parent/child
flags in their routing tables, indicating the CBT tree
interfaces. It also sets the non-core flag in the flag
table.
CBT - Internet DRAFT, November 1992 18
7. When the originator receives the join-ACK, it marks the
appropriate parent interface in its routing table, sets its
non-core flag in its flag table, and forwards the join-ACK to
the local host that requested membership.
8. From this point the host can send/receive multicast packets
for the group identified by group-id. The host will also
begin to answer group-membership queries as part of IGMP.
CBT - Internet DRAFT, November 1992 19
o Example 2: This example demonstrates the sequence of events
required for a newly-assigned major core router to become part of
the CBT core-tree for the group 12345. The CBT tree for group
12345 is the same as one used in Example 1 above.
We assume the initial exchange with directory service has been
completed, and the major core has been supplied with either the
address of one existing major core (in this case, B), or it has
been supplied with the complete list of all current major cores
for the group, and makes its own choice (in this case, B). The
path between the newly-assigned major core and major core B is
shown by the thick dotted line. The sequence of events is as
follows:
1. The newly-assigned major core C now knows the address of
major core B, to which it must forward a core-join request.
It composes the request, and forwards it on the next hop
towards B. The sequence of hops before encountering the CBT
tree is shown by the thick dotted line.
2. The next-hop recieves the core-join request. It forwards it
using normal unicast routing, since it does not recognise CBT
packets unless it is a router on the CBT tree for that group.
Similarly for the next two hops, since they are not part of
the CBT tree (yet) either.
3. The core-join request hits the core-tree on its next-hop.
All minor cores terminate core-join requests. The minor core
caches the received core-join request and forwards a
termination-notification to major core B. When the minor core
receives a termination-notification-ACK it can then compose a
core-join-ACK and forwards it on the next-hop to the new
major core.
4. Once major core B has sent the termination-notification-ACK
it marks the interface over which it was sent in its routing
table as the interface for major child C. It can proceed to
send ICMP echo request/replies over that interface, but first
waits to receive an echo request before itself sending one.
5. The routers traversed by the core-join-ACK record over which
interface it was received, set the appropriated parent flag
in the routing table, set the minor core flag in the flag
table, then set the child flags in the routing table after
forwarding the join-ACK.
6. When the new major core receives the join-ACK, it records the
interface over which it was received, together with its new
major parent, in the parent/child table. It then proceeds to
send an ICMP echo request on the interface leading to its new
major parent. A new core-tree branch is now complete.
CBT - Internet DRAFT, November 1992 20
5 The LAN Issue
Multicast packets will nearly always originate from a host on some
local area network (LAN). For LAN's with only one router attached,
there are no apparent problems with receiving/forwarding CBT multicast
packets onto the LAN on thier way to their destination.
Today however, multi-access links are becoming more common on LAN's,
i.e. LAN's now tend to have more than one router attached to them.
Although all routers on a multi-access link will be exchanging EGP/IGP
messages to establish which are the unicast shortest-cost paths to
destinations, there is often more than one valid route to a
destination, for example, for different types of service (ToS) or/and
policy considerations for inter-domain routing.
The latter points are of relevance to CBT; consider what would happen
if a join-request from a particular major core from downstream (in
relation to that core), arrived at a router on the multi-access LAN.
To which router should the join-request be forwarded if there is more
than one with a shortest-path to the core? If the router receiving
the join-request were to select (or elect) a forwarding router on the
LAN for that and all subsequent multicast packets, the problem is not
solved, as we can see from the scenario that could then follow;
consider another join-request for the same group arriving from
downstream (in relation to the core tree) at a different router on the
LAN (which is a perfectly valid scenario). To exacerbate the problems
of our example, the join-request is for the same group, but addressed
to a different major core.
Further assume that the router at which the join-request arrived,
selects/elects a different router on the LAN to the one previously
chosen by the router that received the first join-request. The LAN
topology could be as shown in the diagram over ....
CBT - Internet DRAFT, November 1992 21
Figure 4: The Multi-Access LAN Problem
This obviously cannot be allowed to happen since not only would it
cause ``cycles'' in the tree (resulting in looping), but would also
result in duplicate multicast packets arriving on the LAN from
upstream, and duplicates being sent from the LAN upstream.
The problem can be alleviated rather simply since we can take
advantage of the way CBT uses unicast addressing to perform
multicasting. This would certainly not be the case if CBT used class
D addresses, since modern router interfaces are programmed to pick up
all multicast packets.
What we must do is imagine a multi-access LAN as one large node
(router) on the CBT tree. Any node on the tree has only one interface
leading to/from the core-tree, and zero, one or more interfaces
leading away from the core-tree.
The two problems described in our example are solved as follows:
Problem 1: A router has received a join-request from downstream (in
relation to the core tree) and must decide between two or more
routers on the LAN which each have equal-cost paths to the
addressed major core.
The problem is solved by electing the router with the lowest
address from those with equal-cost paths. Then, the join-request
and all subsequent multicast packets pertaining to that group
that arrive from downstream are encapsulated at the link level
and addressed to the elected router.
Problem 2: As we have already explained, it is perfectly valid for a
second join-request for the same group to arrive at a different
router on the LAN than the previous one, and be addressed to a
different major core for the same group.
If we imagine the LAN to be one large node (router) on the CBT
tree, the solution should be obvious. As we explained in section
4.4, when a join-request arrives at a node that is already on the
tree, whichever major core for the group the join-request is
addressed to, the join is terminated at that node, and the path
towards the core tree is via that node's parent interface for the
tree. Thus, that node can ACK the join-request.
More details on the solutions to the above problems are given in the
following subsection.
CBT - Internet DRAFT, November 1992 22
5.1 Further Details on the LAN Issue
As we described in the preceding section, we must imagine a
multi-access LAN as one large node (router) on the CBT tree, with one
interface leading to the core from the LAN, and from the core to the
LAN.
There may be zero, one or more legitimate LAN interfaces (routers)
leading away from the LAN for any particular group tree.
We now describe the details of two scenarios with respect to
multi-access LAN CBT multicasting:
Scenario 1: There are no members for CBT group g on multi-access LAN
l, and some host(s) on the LAN is/are sending to group g. In
this case, there are no complications involved and the CBT
multicast packet will be encapsulated as normal at the link level
and addressed to some local router that has a shortest-path to
the core. This router then forwards the packet on its way to the
addressed major core.
CBT - Internet DRAFT, November 1992 23
If the chosen router happens not to have the shortest-path to the
addressed core, that router will nevertheless forward the packet,
and then send an ICMP-Redirect message to the originating host so
that subsequent packets will be addressed to the more appropriate
router3.
Scenario 2: This is the more complicated scenario where a
join-request is received by a router on a multi-access LAN. As we
have stressed, there must be only one LAN interface to/from the
core. Thus, when a join-request arrives at a router on a
multi-access LAN, that router must take immediate action to avoid
the problems occurring that we outlined in the previous section.
Exactly what happens is described below.
----------------------------
3. The router to which packets are sent for forwarding maybe DOWN.
In the absence of a router-discovery protocol, hosts may never
dynamically discover that the packets they are sending to it are
disappearing ``into a black hole''.
CBT - Internet DRAFT, November 1992 24
On receipt of a join-request, router r performs the following:
o elects a router on the LAN that has the shortest-path to the
addressed major core. If there is a tie, the router with the
lowest address is elected as designated core multicast router4.
o Router r sends a special CBT LAN broadcast containing the
following information:
1. group-id contained in the join-request
2. designated core multicast router IP address
o The join-request is forwarded to the designated router.
All routers on the LAN are expected to store this information,
together with a timer (of say, three minutes) for reference purposes,
in case another join-request is received before the previous
join-request has been ACK'd. When the join-ACK arrives from upstream,
it is forwarded onto the LAN by the designated router, and is
eventually picked up by the sending LAN router, which may forward it
which may or may not forward it downstream, depending on whether it
received a join, or originated one.
CBT LAN broadcast messages are sent by a single router on the LAN say,
once every three minutes. They are stored by all LAN routers with a
timer. Thus, the router responsible for broadcasting must continue to
do so once within the timeout period, as long as it is a node on the
CBT tree. Information in successive broadcasts overwrites previous
information with respect to the group identified by group-id provided
it is broadcast by the same source.
If another router on the LAN recieves a join-request for the same
group, it may not have received a previously broadcast message from
some other router on the LAN w.r.t. the same group (however, this is
considered highly unlikely given LAN media bandwidths and
reliability). If this second router begins broadcasting the same or
different information w.r.t. the router on the LAN with the
shortest-path to the core, all other routers would receive it and
recognise it as being generated by a different source than the first.
In this (again unlikely) case, the two broadcasting routers will
choose between themselves, which is going to elect the designated core
multicast router and be responsible for LAN broadcasts.
The diagram over shows a LAN with a designated router:
----------------------------
4. The designated core multicast router will be responsible for
operating the ``Host Membership Protocol'' (IGMP)
CBT - Internet DRAFT, November 1992 25
Figure 5: The Solution to the Multi-Access LAN Problem
If a quit-request is received between broadcasts, and there is more
than one downstream LAN interface for the CBT tree, then multicasts
received via those interfaces will continue to use the designated core
multicast router until the entry expires for that group. This, and
the case where the designated router fails, are now considered:
Scenario 1: A downstream CBT tree router interface has received a
quit-request. It will encapsulate the quit-request and address
it to the designated router. Since the designated router is also
responsible for running IGMP, it will know if there are any more
CBT routers on the LAN pertaining to that group. If there are,
the quit-request is not forwarded further. If there are not,
then it forwards the quit-request upstream and removes itself
from the tree.
Note: If the quit-request has been sent by the router
responsible for CBT LAN broadcasting, then assuming there are
still members remaining on the LAN, it assumes responsibility for
subsequent broadcasts.
Scenario 2: The router responsible for CBT broadcasts is DOWN. This
could potentially happen at any time. It is a fairly certain
assumption that a dynamic routing protocol will be operating
between the routers on the LAN which should indicate the status
of the CBT LAN broadcasting router. If it is indeed DOWN, the
designated router on the LAN assumes responsibility for that
role.
5.2 Summary
In summary then, once a join-request has been received by a router on
a multi-access LAN, all other routers on that LAN are informed of the
group-id and the designated router, which is considered the only LAN
interface to the core tree. After the join-ACK has been received and
forwarded, all routers on the LAN have the information necessary to
allow the LAN to simulate a single node on the CBT tree, with one
interface leading to/from the core, and (potentially) several
interfaces leading away from the core.
CBT - Internet DRAFT, November 1992 26
6 More on how a Router becomes a Major Core
Any available IP router can be requested to become a major core at any
time by directory service. It is assumed that directory service will
have a full list of most recently available routers (subject to
routing update period), each of which it can request to become a major
core for a particular group(s) at any time. Directory service make
such a request by means of a core-request. Network managers have the
authority to configure their routers to reject core-request's if they
so wish.
When a core-request is successfully accepted (by means of an ACK as
opposed to a N-ACK), directory service makes an entry in its group
table. This contains a link list for every group present in the
internetwork. As long as a group is in existence it will have at
least one link (for the case of a single-core tree) containing the
address of the major core. As more major core's become assigned to a
group, their addresses are added to the corresponding link list. An
entry in the group table of directory service would be as follows:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ Group-id || core1_addr | core2_addr | core3_addr | NULL +
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
On receipt of a successful ACK to a core-request, directory service
will send a packet containing the address of an existing major core
(or alternatively all the core addresses) that the new core should
join. If this packet is empty, the new core assumes it is the core
for a single-core tree and awaits join-requests that will be pending,
since cores are only assigned as such when some end-system requested
the presence of the group.
As the group becomes larger and more widespread, directory service can
assign further major cores for the group. The assignment of these is
left at the discretion of directory service. Additional core
assignment becomes necessary when potential members request the
address of a major core for the group but are geographically distant
from present members. Alternatively, a new major core may be assigned
for reasons of robustness.
Any newly-assigned core must send a core-join request to the major
core determined by directory service (or alternatively determined by
itself from a list supplied by directory service). This is described
in the section 8.
CBT - Internet DRAFT, November 1992 27
7 Core Assignment and Optimality
It is the assumption of this draft that major cores will be optimally
placed (or rather, near optimally placed) with respect to group
members, whether groups be small or large. We have left this
``magical assignment'' to multicast directory service, but have not
said how directory service should/could achieve near-optimal core
placement. There is no doubt that such a specification is outside the
context of a multicast routing protocol, but it is clearly a subject
of future work.
However, in our opinion, the heuristic required to achieve near
optimal core placement is likely to be simpler rather than the result
of complex mathematics or graph theory. We may well propose a
solution before a CBT implementation is complete.
Even with optimal core placement, it should be obvious that CBT does
not always provide shortest paths between a multicast source and all
destinations on the tree, as opposed to source-based multicast. There
are some for whom this will be unacceptable, but we must focus on the
problems at hand; firstly, it is highly probable, given optimal or
near-optimal core placement, that the paths traversed by CBT multicast
packets will be mostly shortest-paths, and so will compare favourably
with source-based multicast trees. We have yet to carry out
experimentation to prove this, but we don't expect an optimality
margin any greater than 10-15%.
Second, we have to think of the problems at hand. As far as multicast
routing is concerned, we should first and foremost ensure scalability,
as well as simplicity, robustness and information hiding.
Thirdly (and rather more cynically), most routing algorithms/protocols
are in one way or another sub-optimal, whether it be how they handle
congestion, or react to topology changes. What, in the area of
routing, is optimal in all cases!
Finally, there is the interesting case of locating network services
distributed throughout the internetwork. So called resource discovery
is a topic of current research, but there is no doubt that multicast
is an important aspect of it, with for example, one multicast group
linking together all locations relevant to a particular service. Such
an application would require permanent group trees to be maintained.
Again, assuming reasonably optimal core placement between the members
of such a group, a client need only send a request by means of a CBT
multicast packet, and unicast it towards a core for that particular
tree. The first available service receiving the request on the tree
would reply by means of a unicast to the sender. Used in this way,
CBT must surely offer one of the most attractive solutions to the
resource discovery problem.
7.1 Dynamically Improving Optimality
It may well be that when a join-request arrives at a router that is
not already on the tree, that router's most optimal path to the
addressed core is temporarily not available. In order to keep the
join latency low, we decided that all routers must immediately forward
CBT - Internet DRAFT, November 1992 28
join-requests, and in the above case, the join would be forwarded on
the next-best path, which may be considerably less optimal.
We have made a provision for the tree to be able to re-configure
itself so that the paths traversed to the core tree are mostly
optimal. The overhead required to do this is negligible and the
packet loss resulting from the transition may well be zero.
Consider the following, where router X (not yet on the tree) must
forward an incoming join-request to next-hop Y on its way to major
core A, since it's best next-hop, Z, is temporarily not available.
Figure 6: CBT branches may change dynamically
In the above case, the branch formed will be X, Y, A. However, the
best path is X, Z, A. When Z becomes available again, X can send a
second join-request to Z, which forwards it on towards A. A will then
send a join-ACK back along the path A, Z, X. When X receives it, it
can ``keep it on hold'' before making a new entry (parent interface in
routing table) for that CBT tree.
In an instant when all the of X's CBT interfaces are silent, it could
then make the transition to shortest-path by simply replacing the its
parent entry in it's routing table. After the transtion it sends a
quit-request out on the old interface.
CBT - Internet DRAFT, November 1992 29
8 Major Cores and Minor Cores: The core-join request
A core-join request will always be forwarded by non-core routers.
However, minor cores will always terminate a core-join request. There
are 5 scenarios to consider when a core-join request is sent to an
existing major core router:
o Scenario 1: There is a point-to-point link between the sending
major core and the recipient major core. In this case the
core-join request arrives directly at the destination major core.
The recipient core sends a core-join-ACK back to the originating
core, and proceed to exchange Core Tree Maintenance Protocol
messages.
o Scenario 2: The core-join request encounters a router that is
not any part of the CBT tree for the group identified by group-id
in the core-join request. The receiving router will cache the
request (which contains additional identifying information) and
the address of the incoming interface, forward it using normal
unicast routing to the next hop on its way to its destination
(major core), then add the outgoing interface address to the
cached information. When/if a matching core-join-ACK is
received, the cached information is used to ensure that it
arrived on the correct interface. If so, the appropriate flags
are set in the flag table/routing table, and a core-join-ACK is
forwarded on the interface as specified in the cached
information. After a core-join-ACK has been forwarded, it
becomes a minor core on the core-tree for that group. Minor
cores are obliged to forward Core Tree Maintenance Protocol
messages.
If, for any reason, a check fails, the packet is dropped.
Note: The interfaces over which a core-join-ACK is received/sent
are recorded in the routing table by flags, one for each
interface that corresponds to the core tree. It is necessary for
a minor core to know its parent core-tree interface and
child(ren) core-tree interface(s), and these are marked as such
in the routing table.
o Scenario 3: The core-join request encounters a non-core router
on the CBT tree. Non-core routers are obliged to forward
core-join requests to an upstream neighbour that is the next-hop
on the CBT tree. Once a core-join request hits a non-core
router, irrespective of the major core it is addressed to, it is
forwarded on the parent interface of that non-core router
towards. Thus, the core-join request may no longer be destined
for the major core addressed in the IP header. The core-join
request continues it's journey on a branch of the CBT tree for
the appropriate group, and will eventually hit either a minor
core or a major core.
CBT - Internet DRAFT, November 1992 30
If a major core is encountered, whether it is the one addressed
or not, it is ACK'd by it. The ACK traverses the same branch
back to the source and all routers along the way change their
status to minor core, except the final recipient router, which
now becomes a new major core. In this way, a new branch of the
core tree is formed.
Design decision: In this way, a core-join request will only
traverse existing branches of the tree and in doing so prevent
core-join requests ``crossing over'' non-core routers, which
could potentially lead to ``cycles'' in the tree.
Note: The core-join ACK must contain an explicit field as to the
true identity of the parent major core. The reason should be
apparent from scenario 5.
o Scenario 4: The first router encountered by a core-join request
is a major core different from that in the destination field of
the request. This is a somewhat unlikely scenario since it would
indicate that directory service had made an error in its choice
of new major core. Major cores are expected to be geographically
isolated from each other, and so it is unlikely that directory
service would assign a router to become a major core so near to
an existing major core, for any one group.
However, if the first router encountered were indeed a different
major core than the one addressed in the core-join request, it
will terminate it and ACK it. The source address of the ACK is
the true identity of the parent.
o Scenario 5: The core-join request encounters is a minor core for
the group. The core-join request is terminated at this point.
The recipient minor core next formulates a
termination-notification packet which contains among other
things, the address of the source (major core), and forwards it
over the core tree interface which leads to the addressed core
(as indicated in its routing table). By doing this, Core Tree
Maintenance Protocol message exchanges may be more evenly
distributed. The termination-notification packet serves to
indicate to the addressed major core that it has a new major
child with which to exchange Core Tree Maintenance Protocol
messages.
On receipt of the termination-notification-ACK, the minor core
formulates a core-join-ACK and forwards it on the interface over
which it received the core-join request. As already mentioned,
any routers on or off the tree that the core-join-ACK traverses
becomes a minor core for the group, so the necessary updates must
be made in the flag table of each router. In this way, new
branches of the core-tree are formed, and the core-tree branch
interfaces are recorded by each minor core by setting the
appropriate flags in the routing table.
Note: The core-join-ACK formulated by the minor core must have a
field containing the identity of the true major parent of the new
child. By assuming the source address of the ACK would be clearly
incorrect.
CBT - Internet DRAFT, November 1992 31
9 Core Tree Maintenance Protocol
An introduction to the Core Tree Maintenance Protocol is given below,
but the actual detailed working of the protocol is made clear in the
sections on the core-state-notification packet and core-list packet.
ICMP messages form the basis of the Core Tree Maintenance Protocol.
However, other types of message are generated as part core-tree
maintenance, so we shall refer to these messages collectively as Core
Tree Maintenance Protocol messages. Only major cores are ever the
source/destination of Core Tree Maintenance Protocol messages. More
precisely, only major parents and thier respective major child(ren)
exchange Core Tree Maintenance Protocol messages.
For any group, say group A, the Core Tree Maintenance Protocol
messages must traverse only the core-tree for group A. Only in this
way will core-tree partitions be noticable.
Design decision: This is an example of why it is necessary to
distinguish between minor cores and non-cores. The latter will never
receive Core Tree Maintenance Protocol messages, and so need not know
how to deal with them.
Whenever a major core sends/receives a core-join request, it records
the destination address present in the core-join-ACK (or the source
address in the case of receiving a core-join-ACK) and the interface
over which it forwards the ACK (or received it). This information is
recorded by major cores in the major parent-child table. This not
only gives information on who a major core's major parent/major
child(ren) are, but also the interfaces over which they are reachable,
since major parent/child(ren) will not be directly connected
neighbours. The interfaces will obviously be the CBT core-tree
interfaces.
Note: If a major core receives a termination-notification packet from
some minor core, and replies with a termination-notification-ACK,
address and interface information is similarly recorded, as above, in
the major parent-child table. The major parent-child table is
illustrated below:
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ Group-id || major-parent = i/f | major_child1 = i/f | major_child2 = i/f +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A core-list packet contains the addresses of the major cores for a CBT
tree that are considered to be currently available. A core-list
packet is only generated in response to the addition or removal of
major cores and is the result the result of a core-state-notification
packet having stabilized. A core-list packet is multicast by one
major core. This process will be more fully explained in the sections
10 and 11.
Non-core routers cache and forward core-list packets on all outgoing
CBT tree interfaces, i.e. multicast. Non-core routers need take no
action on receiving a core-list packet unless its parent is a major
CBT - Internet DRAFT, November 1992 32
core and has become unavailable. The parent may have become
unavailable since the generation of the core-list, or the core-list
was generated as a result of that major core being removed. In either
case, it is the responsibility of the non-core router to send a new
join-request to the ``nearest'' major core listed in the core-list
packet.
If a major core, to which a non-core router is attached becomes
unreachable, it is the responsibility of that non-core router to send
a new join-request towards another major core selected from its
current core-list packet. In this way, only the non-core router in
the immediate vacinity of a CBT partition need take action to ensure
that the branch becomes re-attached to the tree. Thus, the partition
remains transparent to other non-cores not in the direct vacinity of
the partition.
Minor cores ignore core-list packets. Minor cores play no direct part
in repairing partitions of a core-tree, but do serve to help prune the
core-tree after a failure has occured.
Major cores cache the most recently received core-list packet. A
major core must use its cached core-list if it establishes that it's
parent is DOWN, in order to restore core tree connectivity. When
reachability is lost between major child and major parent because of
an intermediate link failure, it is the responsibility of the major
child only, to re-establish a (virtual) link with its major parent.
The core-list is not required for this.
Design decision: Indeed, if both major cores did try to re-establish
connectivity with each other, each would become the parent (and child)
of the other.
Major-parent and major-child(ren) constantly test each others
``liveness'' by exchanging ICMP echo request/replies. If a reply is
not received after a certain timeout period, one of two kinds of
failure is assumed:
o Scenario 1: Major core failure.
o Scenario 2: Minor core failure (or some link between parent and
child major cores is down.
Scenario 1: Major core failure - A major parent (or child) can
establish to a high degree of probability the reason for not
receiving a response to a ping message. It can assume failure
after trying to reach the parent/child via all other possible
routes. Again, if no reply is received within a small timeout
period, the parent/child is assumed DOWN.
Only the major child(ren) of the failed major core is responsible
for re-establishing full connectivity of the core-tree. Each
child does so by making core-join request to another major core
as described in the section 11 on the core-list packet. After a
major core has received the core-join-ACK, its new parent-core
generates a core-state-notification packet which is disemminated
on all core-tree interfaces for that group. When the
core-state-notification packet stabilizes, i.e. the packet sent
CBT - Internet DRAFT, November 1992 33
out on a core-tree interface is the same as that received, the
contents of it are copied to a core-list packet and multicast on
the CBT tree. Only the major core that originated the
core-state-notification packet generates and multicasts the
core-list packet.
Scenario 2: Minor core (or link) failure - if, on the other hand, an
ICMP reply is received via a route not on the core tree, then the
core tree link is assumed broken. As in the previous case, it is
the responsiblility of the child to re-establish core tree
connectivity. The underlying unicast routing algorithm will
provide a new route to the parent major core. The new route may
or may not include hops that were part of the old core-tree.
What follows is described in the next section on the
quit-request.
CBT - Internet DRAFT, November 1992 34
9.1 The quit-request
If it has been decided that a core tree link is broken, the major
child sends a new core-join request towards its major parent. If the
first next-hop of the new route is different from the old, it must
also send a quit-request to the old next-hop which formed part of the
original core-tree.
Quit-requests can have a cascading effect in that whenever any router
on the CBT tree receives one from a child router, it too sends one to
its parent iff it has no further children and no member hosts on
attached subnets. The exception to this is when a quit-request
arrives at a non-core router from a downstream non-core router, and
the directly connected parent is either a minor or major core. In
both cases the quit-request is sent to the parent which removes the
child flags as appropriate from the routing table, but no further
quit-requests are generated from this point.
A quit-request is only permitted in the direction child to parent, and
never in the reverse direction. Furthermore, quit-requests are always
terminated at a major core.
CBT - Internet DRAFT, November 1992 35
10 core-state-notification (CSN) packet
A core-state-notification packet is generated only by major cores as
the result of a major core having accepted a new core-join request, or
by a major core as the result of that core sending a quit-request.
The CSN packet contains a unique identifier as well as the group-id
and originator's address, both of which remain constant until the CSN
packet stabilizes.
Once a new major core has been ACK'd by its major parent (either
explicitly or by having received a core-join-ACK from a minor core in
response to the minor core having received a
termination-notification-ACK packet), the parent generates a
core-state-notification packet containing a list of its major core
children.
Alternatively, a major core has sent one of it's children a
quit-request. In this case too, the sending core generates a CSN
packet.
The CSN packet is distributed on each core-tree interface for the
group. Exactly how the CSN packet works is described in the next
section.
10.1 How the CSN Packet Works
In a multi-core CBT tree, all major cores have at least one major core
(virtual) neighbour. When the state of the core tree changes as the
result of a new core addition or core removal, a mechanism is needed
to inform the rest of the CBT tree of the new state. This mechanism
is the core-state notification packet exchange which results in a
core-list packet being multicast on the tree.
It works as follows; whenever a major core accepts a new child, or
sends a quit-request to remove one, it must take action on reporting
it. The parent to the new addition/removal generates a core-state
notification packet reporting itself and its children that are UP.
This CSN packet is broadcast on all core tree interfaces, but not on
other CBT tree interfaces. The aim of the CSN packet is for it to
arrive in a state where each child is terminated by a ``NULL'' - this
``NULL'' entry can only be made by a child if it has itself no
children. If a major core has no children, then a ``NULL'' follows
it's own entry. When a major core cannot append information to a
broadcast CSN packet is it forwarded on outgoing interfaces only if it
is as large or larger than the CSN packet the node currently knows
about. If it is not the CSN packet arriving is dropped. In this way,
a slow responder cannot flood the the core tree with out-of-date
packets. If however, a broadcast arrives which is as big or bigger
than the one it currently knows about AND can modify it, the resulting
CSN packet is broadcast on all outgoing core tree interfaces.
The CSN packet is considered to have stabilized when all child entries
in it are suffixed with ``NULL'', and when childless parent entries
are also suffixed with ``NULL''. A stabilized CSN packet is copied to
a core-list packet and multicast on the CBT tree by the originator of
the first CSN packet. A diagram is given in section 11.
CBT - Internet DRAFT, November 1992 36
A diagram may have been appropriate here, but because of the potential
for each major core to be slow/faster than the others, and as
broadcasts can arrive in different orderings because of faster/slower
nodes, we thought a diagram may have been misleading, so we hope the
explanation suffices.
CBT - Internet DRAFT, November 1992 37
11 core-list packet
As far as major cores are concerned, the core-list packet is crucial
to core-tree maintenance, allowing major cores, where appropriate, to
re-establish core-tree connectivity after it has failed.
The structure of the core-list packet is simple, but is catalytic in
repairing core-tree connectivity. As we do not envisage the number of
major cores to be more than a handful for very large groups, the
core-list is uncomplicated and remains even in the worst case, quite
small. It consists of a concatenation of major cores and their major
core children in the order they joined the parent.For example, take
the following (large for demonstration purposes) core-tree:
Figure 7: CBT Core Tree
The numbers only serve to illustrate the example and indicate the
order in which major cores were assigned to the group. The core-list
packet for this group will be as shown over:
CBT - Internet DRAFT, November 1992 38
Figure 8: CBT core-list packet
If a major core fails and it has no major core children, the failed
major core is a leaf on the CBT core-tree. In this case, no action is
necessary since core-tree connectivity is maintained. If there are
non-core router children hanging off it, it will be the responsibility
of the non-core whose parent has failed, to make a join-request
elsewhere.
Assuming that the failed major core has at least one major child, it
may be necessary for the child/children to restore core-tree
connectivity. The first major child that joins a major core is tagged
to indicate it is the oldest child for that core. While there is only
one child, it is also the youngest, so it is also tagged as such. The
oldest child will obviously retain the oldest tag (as long as it is
alive), but as new major children join, the youngest tag changes.
Returning to the example of the failed major core. Its youngest major
child must send a core-join request to the next-oldest, which then
does the same. This continues until the oldest child has been
reached. The major core this child sends a core-join request to
depends on whether the failed parent was the child of any other major
core. There are two scenarios:
o If the failed major core has a parent, the youngest child must
join that parent.
o If the failed major core has no parent, then the youngest child
of the failed core need not make a core-join request to any other
major core, since original core-tree connectivity will have been
re-established simply by having linked the children together, if
indeed, there were any present.
Design decision: When a core-tree partition occurs, simply having a
major core in the vacinity of the partition send a new core-join
request to any of the other major cores listed in the core-list packet
would NOT ensure full core-tree connectivity, since a request could
potentially be made to a major core on the same side of the partition.
The method described above ensures that full core-tree connectivity is
restored after a partition.
CBT - Internet DRAFT, November 1992 39
Let us take the previous example again to show this:
Figure 9: CBT Core Tree
When 5 fails: From the core-list, major core 5 has 3 major core
children (6, 7, and 8). Sequence of events:
o 8 joins 7, 7 joins 6.
o 6 scans the core-list to establish whether 5 is the child of any
other major core.
o There is a hit. 3 is the parent to 5.
o 6 joins 3 to re-establish core-tree connectivity.
The new core-tree is shown over:
CBT - Internet DRAFT, November 1992 40
Figure 10: Repaired CBT Core Tree
Another example is given over ....
CBT - Internet DRAFT, November 1992 41
Another example. Assume major core 1 fails. Sequence of events:
o 4 joins 3, 3 joins 2.
o 2 scans the core-list to establish whether 1 is the parent of any
other major core.
o There is no hit. Thus, no action is taken, and core-tree
connectivity is re-established.
The resulting core-tree is as shown over:
Figure 11: Repaired CBT Core Tree
CBT - Internet DRAFT, November 1992 42
12 Acknowledgements
There are several people whom we would especially like to thank for
their valuable contributions to CBT. Firstly, Ian Wakeman (UCL) whose
suggestion it was to separate the group identifier from the IP address
and place it in the options field of the IP header. We considered
this an important design issue offering several advantages over the
original design.
Secondly, Joel Halpern (Network Systems Corporation) and Steve Deering
(Xerox PARC). Joel identified many of the problems encountered when
CBT multicast packets traverse or originate on a multi-access LAN.
Both he and Steve offered valuable insights into the LAN multicast
problems and made many comments and suggestions as to how these
problems could be solved.
Finally, we would like to thank all those on the IDMR mailing-list who
contributed to, or/and made comments on, CBT, however small these may
have been. We consider these, and all ongoing comments and
suggestions important in devoloping a standard inter-domain multicast
routing protocol that will cater for the requirements of the internet
community for the forseeable future and beyond.
We encourage continued participation via IDMR on CBT and all other
inter-domain multicast routing issues.
CBT Expires April 6th, 1993.
CBT - Internet DRAFT, November 1992 43
A Authors' Addresses
Anthony J. Ballardie,
Department of Computer Science,
University College London,
Gower St.,
London WC1E 6BT,
ENGLAND.
e-mail: A.Ballardie@uk.ac.ucl.cs
Office Tel: ++44 (0)71 387 7050 ext. 3701
Paul F. Tsuchiya,
Bellcore,
445 South St. 2L-281,
Morristown,
New Jersey 07960,
U.S.A.
e-mail: tsuchiya@thumper.bellcore.com
Office Tel: 1-201-829-4484
Jon Crowcroft,
Department of Computer Science,
University College London,
Gower St.,
London WC1E 6BT,
ENGLAND.
e-mail: J.Crowcroft@uk.ac.ucl.cs
Office Tel: ++44 (0)71 380 7296
REFERENCES
[1] S. E. Deering. Multicast Routing in a Datagram Internetwork. PhD thesis,
Stanford University, California, U.S.A., 1991.