Internet-Draft Diameter Overload Control Solution Issue October 2020
Campbell Expires 4 April 2021 [Page]
Workgroup:
Network Working Group
Internet-Draft:
draft-campbell-dime-overload-issues-01
Published:
Intended Status:
Informational
Expires:
Author:
B. Campbell
Tekelec

Diameter Overload Control Solution Issues

Abstract

The Diameter Maintenance and Extensions (DIME) working group has undertaken an "overload control" work item, with the goal of standardizing a mechanism to allow Diameter nodes to report overload information among themselves. Requirements currently include, among others, the need to accurately report the scope of overload conditions, and the ability to report overload information between nodes that are not directly connected at the transport layer. These requirements introduce complex issues. This document describes those issues, in the hope that it will assist the working group's decision process.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 4 April 2021.


1. Introduction

When a Diameter [RFC6733] server or agent becomes overloaded, it needs to be able to gracefully reduce its load, typically by requesting other nodes to reduce the number of Diameter requests for some period of time.

The Diameter Overload Control Requirements [I-D.ietf-dime-overload-reqs] describe requirements for overload control mechanisms. Requirement 31 states that Diameter nodes must be able to report overload with sufficient granularity to avoid forcing available capacity to go unused. Requirement 34 requires the ability to report overload across Diameter nodes that do not support the mechanism. These requirements introduce significant and interrelated complexities to potential solutions. This document describes the related issues. The author hopes that this document will assist the working group's decision process related to these requirements.

At the time of this writing, there have been two proposals for Diameter overload control solutions. "A Mechanism for Diameter Overload Control" (MDOC) [I-D.roach-dime-overload-ctrl] defines a solution that piggybacks overload and load state information over existing Diameter messages. "The Diameter Overload Control Application" (DOCA) [I-D.korhonen-dime-ovl] defines a solution that uses a new dedicated Diameter application to communicate similar information.

This document serves two purposes. The primary purpose is to explore the issues related to Requirement 34, that is, the requirement for the overload control mechanism to support sending load and overload information across intermediaries that do not support the mechanism (referred to herein as "non-adjacent" overload reporting). The document describes two use cases for non-adjacent overload reporting. It does not, however, attempt to describe the use cases for Diameter agents in general. For a more thorough treatment of Diameter agent use cases in the context of overload control, please see [I-D.ietf-dime-overload-reqs].

The secondary purpose is to help the reader understand the concept of overload scopes, and make recommendations about what kinds of overload scope should be supported by the mechanism. These purposes are interrelated, since an understanding of overload scopes is necessary to fully understand some of the issues with non-adjacent overload reporting.

2. Document Conventions

This document uses terms defined in [RFC6733] and [I-D.ietf-dime-overload-reqs]. In particular, the terms "client", "server", "upstream", and "downstream" are used as defined in RFC 6733. In addition, this document uses the following terms:

Overload:
A condition where a Diameter node needs a reduction in the number of requests that it must handle.
Overload Report:
A request to reduce traffic that contributes to an overload condition.
Overload Scope:
A classifier that defines the set of requests that may contribute to a particular overload condition. Alternatively, the purposes for which a node may be overloaded. For example, if a server is overloaded for the purposes of one Diameter application but not another, the overload condition can be considered "scoped" to that application.
Reporting Node:
The node that sends an overload report. Also known as an "overloaded node".
Reacting Node:
A node that consumes and possibly acts on an overload report.
Adjacent Overload Reporting:
Overload reports exchanged between adjacent Diameter peers.
Non-Adjacent Overload Reporting:
Overload reports sent between Diameter nodes separated by one or more intermediate Diameter agents (i.e. relays or proxies).
Piggybacked Overload Reporting:
The inclusion of overload reports in existing Diameter messages.
Application-Based Overload Reporting:
The sending of overload reports in a separate, dedicated Diameter application.

3. Non-adjacent Overload Information

Requirement 34 of [I-D.ietf-dime-overload-reqs] says that the selected Diameter overload control mechanism "SHOULD" be able to communicate overload and load information across intermediaries that do not support the mechanism. This requirement complicates the solution effort: it affects how Diameter nodes negotiate support for overload control, how they address and route overload reports to the right places, and how they act on received overload reports.

While the requirement does not explicitly say it, we interpret "intermediaries" in this context to mean Diameter agents. The requirement is irrelevant for lower layer intermediaries (e.g. routers), and cannot be reasonably applied for non-Diameter entities, or hybrid entities such as gateways between Diameter and other protocols.

The requirement to traverse non-supporting intermediaries is not necessarily the same thing as a requirement for end-to-end communication of overload reports between Diameter clients and servers. Non-adjacent reporting can include client-to-server scenarios. They can also include server-to-agent scenarios and agent-to-client scenarios. All such scenarios may include one or more intervening agents. Since Diameter allows transactions to be sent from server to client, all scenarios may be reversed. Therefore, we refer to this requirement as "Non-adjacent Overload Control".

3.1. Use-Cases for Non-adjacent Overload Control

There are two primary use-cases for non-adjacent overload control.

3.1.1. Interconnect

The first significant non-adjacent use-case is the interconnect scenario described in section 2.3 of the overload control requirements [I-D.ietf-dime-overload-reqs]. Two or more Diameter network operators communicate with each other across a third-party interconnect provider that brokers Diameter traffic between the operators. Figure 1 illustrates the interconnect use case.

                        +-------------------------------------------+
                        |               Interconnect                |
                        |                                           |
                        |   +--------------+      +--------------+  |
                        |   |     Agent    |------|     Agent    |  |
                        |   +--------------+      +--------------+  |
                        |         .'                      `.        |
                        +------.-'--------------------------`.------+
                             .'                               `.
                          .-'                                   `.
        ------------.'-----+                             +----`.------------
           +----------+    |                             |    +----------+
           |Edge Agent|    |                             |    |Edge Agent|
           +----------+    |                             |    +----------+
                           |                             |
            Operator 1     |                             |     Operator 2
        -------------------+                             +------------------
Figure 1: Two Operator Interconnect Scenario

If the interconnect provider does not support Diameter overload control, each operator network becomes an island of overload control, similar to those in the non-supporting agent use-case (Section 3.1.2). Even if the interconnect provider does support overload control, the operators may not trust it to generate and act on overload reports on the operators' behalf, and may prefer to exchange overload and load information directly with each other.

The interconnect use-case may introduce additional security concerns. While the non-supporting agent use case typically (but not necessarily) occurs inside a single administrative domain, the interconnect case will almost always involve sending overload reports across multiple administrative domains. Since a malicious or incorrect overload report can effectively shut down Diameter processing, the current lack of a viable solution for end-to-end integrity protection of Diameter messages may be a problem.

3.1.2. Non-Supporting Agents

[I-D.ietf-dime-overload-reqs] requires the solution to function in networks where not all Diameter elements support it. That is, the solution must allow gradual deployment, and must not require a flag-day cutover. If non-adjacent overload control is not supported, one or more non-supporting Diameter Agents can divide a network into overload control islands, where overload information is communicated inside each island, but not among separate islands.

  • In the author's strictly personal opinion, the non-supporting agent use case is less compelling than the interconnect case. The non-supporting agent case would typically occur inside one administrative domain. The operator of that domain has considerably more control over the implementations used in the domain than it might have for third-party domains.

3.2. Issues with Non-Adjacent Overload Control

3.2.1. Topology Issues

Many of the issues with non-adjacent overload control derive from the fact that a Diameter node is unlikely to know the topology of the Diameter network past its immediate peers. In a trivial topology, that is, a Diameter network with only clients and servers, this is not a problem. But if the immediate peer is a Diameter agent, a node is unlikely to know what next hop the agent will select for a given Diameter message. This is particularly difficult if the agent hides topology in either direction, or uses dynamic peer discovery. While a node may be able to infer the path a given message will take in some specific cases (e.g. for mid-session messages), it cannot do this in general. And even those specific cases may fail if an agent on the message path performs topology hiding.

This lack of topology knowledge affects the ways that nodes can negotiate overload-control support, the ways they send overload reports, and the ways a reacting node can act to mitigate overload. A non-adjacent overload-control mechanism will need to solve the topology issues, either by offering ways to discover non-adjacent topologies, or by constraining the overload-control-relevant parts of such topologies so that a node can reasonably know them in advance.

3.2.2. Support Negotiation

Diameter nodes need to negotiate or otherwise indicate their support for overload control to other nodes. This includes indicating support for overload control in general, as well as potentially indicating support of certain parameters of the overload control solution. For example, a node may need to indicate which overload algorithms it supports. This becomes complex if two non-adjacent nodes need to negotiate support.

In a Diameter application-based solution, support for the overload control application would be indicated during the capabilities exchange between peers. Diameter capabilities exchange occurs strictly between peers; Diameter offers no mechanism for indicating support of a given Application-Id between non-adjacent nodes.

Diameter allows non-negotiated use of an arbitrary Application-Id between non-adjacent nodes across Diameter agents that implement the Diameter Relay application. In theory, this means that an application-based, non-adjacent overload control mechanism could only traverse Diameter relays, or Diameter proxies that explicitly support the overload-control Application-Id. In the latter case, we assume that a proxy will not indicate support for the overload-control Application-Id unless it supports the overload-control mechanism; such a proxy cannot be considered a non-supporting agent.

In practice, a Diameter agent can act as a proxy for some purposes and a relay for others. If a Diameter proxy indicates support for the Diameter relay application, we assume that it will relay any arbitrary application. This means it can be considered a relay for the purposes of overload control.
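As a rough illustration of the traversal rule above, the following Python sketch decides whether a peer, given the Application-Ids it advertised during capabilities exchange, could pass application-based overload reports onward. The Relay Application-Id (0xffffffff) comes from RFC 6733; the overload-control Application-Id is a hypothetical placeholder, since none has been assigned, and the function names are illustrative only.

```python
RELAY_APPLICATION_ID = 0xFFFFFFFF  # Relay Application-Id from RFC 6733

# Hypothetical placeholder: no Application-Id has been assigned to an
# overload-control application; this value is for illustration only.
OVERLOAD_CONTROL_APP_ID = 0x00FFFFFE


def can_forward_overload_app(advertised_app_ids: set) -> bool:
    """True if a peer that advertised these Application-Ids in CER/CEA could
    carry overload-control application messages toward non-adjacent nodes."""
    # A relay forwards any application; a proxy must list the Application-Id.
    return (RELAY_APPLICATION_ID in advertised_app_ids
            or OVERLOAD_CONTROL_APP_ID in advertised_app_ids)


def is_non_supporting_relay(advertised_app_ids: set) -> bool:
    """True when the peer forwards overload-control messages only because it
    relays everything, i.e. the non-supporting agent case."""
    # A proxy listing the overload-control Application-Id is assumed to
    # support the mechanism, so it does not count as non-supporting.
    return (RELAY_APPLICATION_ID in advertised_app_ids
            and OVERLOAD_CONTROL_APP_ID not in advertised_app_ids)
```

A peer that advertises only, say, Credit-Control (Application-Id 4) would block such reports, while one that advertises the Relay Application-Id would pass them without supporting the mechanism, which is exactly the non-supporting agent case.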

For both application-based and piggybacked solutions, a supporting node needs to know the other nodes with which it should negotiate. For overload-control between Diameter peers, this is easy; a node exchanges support information with its immediate peers. But for non-adjacent overload control, this is more difficult for reasons discussed in Section 3.2.1.

Therefore, for non-adjacent overload control negotiation, each supporting node either needs advance knowledge of all nodes with which it may negotiate overload-control support, or it needs a mechanism for discovering that knowledge dynamically.

3.2.3. Overload Report Delivery

With adjacent overload control reporting, overload report addressing and delivery is relatively simple. A node sends overload reports directly to its peers. This becomes more complex for non-adjacent overload-control.

For application-based overload control, nodes could address overload reports to specific endpoint nodes using the Destination-Host AVP. Doing so would be subject to the same non-adjacent topology issues described in Section 3.2.1. That is, a node can only send overload reports to non-adjacent clients or servers that it knows about, either from prior knowledge (i.e. provisioning) or from which it has observed previous Diameter messages.

An application-based mechanism could possibly address reports to non-adjacent Diameter agents using the Destination-Host AVP. This would effectively make the agent into an endpoint for the overload-control application.

A piggy-backed mechanism will have more difficulty addressing non-adjacent overload reports. A piggy-backed mechanism sends overload reports in already existing Diameter messages; that is, messages that have their own purposes and destinations independent of the overload report. Thus, nodes can only select the destination of an overload report by bundling it into a Diameter message that was already going to that destination. While a piggy-backed mechanism might be able to send overload reports across quiescent transport connections using watchdog (DWR/DWA) messages, these messages cannot be exchanged between non-adjacent nodes.

  • In some cases, the limitation of sending overload reports only to destinations to which existing traffic is bound may be acceptable. If a node is contributing to an overload condition, then it's reasonable to assume that node is regularly exchanging traffic with the overloaded node. However, there may be cases where an overload report causes a connection to become quiescent. If the reporting node needed to tell a reacting node that the condition has resolved or improved, it would need to send a new report across the now quiescent connection. There may also be cases where a reacting node redirects traffic along a different path, causing a previously quiescent node to suddenly start sending requests to the overloaded node. Thus, without careful selection of the overload report scope, an overloaded node may find itself engaged in a game of Whack-a-Mole [Whac-a-Mole] with previously quiescent non-adjacent nodes.
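The delivery constraint of a piggy-backed mechanism can be sketched in Python as follows. The class, its methods, and the "Overload-Report" AVP name are hypothetical; the point is only that a queued report leaves the node when, and only when, some existing message is already bound for the target.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Message:
    bound_for: str                        # node this existing message is going to
    avps: dict = field(default_factory=dict)


class PiggybackReporter:
    """Hypothetical piggy-backed reporter: reports cannot be sent on their
    own and must ride on messages that are already going to the target."""

    def __init__(self) -> None:
        self.pending = defaultdict(list)  # target node -> queued reports

    def queue_report(self, target: str, report: str) -> None:
        self.pending[target].append(report)

    def send(self, msg: Message) -> Message:
        # Attach whatever is waiting for this message's destination, if anything.
        reports = self.pending.pop(msg.bound_for, [])
        if reports:
            msg.avps["Overload-Report"] = reports  # hypothetical AVP name
        return msg
```

If the reacting node has gone quiescent, no message is ever bound for it, and a queued "condition resolved" report simply stays pending.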

For both piggy-backed and application-based solutions, non-adjacent overload control introduces a need to identify the sender of a report, or at least determine whether the report is from an adjacent or non-adjacent node. This is not required for purely adjacent solutions, since the sender could always be assumed to be the peer.

For example, a non-adjacent report with a "Connection" scope does not make sense. If a node receives one, it should ignore it. But in order to make that decision, it must be able to distinguish a non-adjacent report from an adjacent one. For example, in an application-based mechanism,

3.2.4. Non-Adjacent Overload Scopes

A reacting node will typically attempt to mitigate an overload condition either by reducing the number of requests that contribute to the condition, or by rerouting part of that traffic to avoid the problem. In both cases, the reacting node is limited by its ability to determine which Diameter requests contribute to the overload condition in the first place. The overload scope concept (Section 4) offers a way for overloaded nodes to indicate what traffic is likely to contribute to an overload condition and should be abated.

Not all of the scope-types described in Section 4 make sense for non-adjacent overload control. The "Connection" scope-type is an obvious example, since the reacting node will never share a transport connection with a non-adjacent node; this is the very definition of non-adjacent nodes.

Since a Diameter node cannot control how requests are forwarded to non-adjacent nodes, the "Peer" scope-type also does not work well, especially when there are multiple possible destinations upstream or downstream of the adjacent peer. For example, in Figure 2, Node A sends Diameter requests to Nodes B and C across a non-supporting agent. If Node B becomes overloaded but Node C does not, Node A cannot reroute requests to Node C, since it has very little ability to influence where the agent will forward any given request. If Node A tries to reduce traffic by 50%, the agent will likely still send half of the remaining traffic to Node B. If B and C are endpoints, Node A may in some cases be able to use the Destination-Host AVP for this purpose (in which case the "Destination-Host" scope-type would be more appropriate), but this does not help if B and C are agents rather than servers.

                      +--------+       +--------+
                      | Node B |       | Node C |
                      +----+---+       +---+----+
                           |               |
                           +-------+-------+
                                   |
                           +-------+--------+
                           | Non-Supporting |
                           |  Agent         |
                           +-------+--------+
                                   |
                                   |
                              +----+----+
                              | Node  A |
                              +---------+
Figure 2: Non-Adjacent Routing
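To put numbers on the Figure 2 example, the following Python sketch assumes that the non-supporting agent splits Node A's traffic evenly between B and C, and uses hypothetical capacities and load. Because Node A can only scale its total traffic, relieving B by 50% also halves the traffic to C.

```python
def agent_split(total: float, shares: dict) -> dict:
    """Model a non-supporting agent that forwards fixed shares of traffic."""
    return {node: total * share for node, share in shares.items()}


shares = {"B": 0.5, "C": 0.5}        # assumed even split by the agent
capacity = {"B": 250.0, "C": 600.0}  # hypothetical requests per second
offered = 1000.0                     # hypothetical load generated by Node A

before = agent_split(offered, shares)   # B: 500 (overloaded), C: 500
# B needs a 50% reduction, but Node A can only scale all of its traffic.
after = agent_split(offered * (1 - 0.5), shares)
unused = {node: capacity[node] - after[node] for node in after}
```

In this model, after the reduction B is exactly at its capacity of 250 requests per second, while C carries only 250 of the 600 it could handle, so 350 requests per second of capacity go unused.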

Scope-types that classify traffic by origin or final destination, such as "Origin-Host", "Destination-Realm", "Application-Id", and "Destination-Host", can be used for non-adjacent overload control. In general, scope-types that may denote non-adjacent intermediary devices, such as "Peer", cannot, nor can scope-types that refer only to peers, such as "Connection".
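The following sketch combines two checks discussed in this section: deciding whether a report is adjacent, and discarding non-adjacent reports whose scope-type refers only to peers. It assumes, hypothetically, that each report names its reporting node explicitly and that the receiver learned its peer's identity from the Origin-Host in CER/CEA; an agent that performs topology hiding could defeat this comparison. All names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

# Scope-types that only make sense between adjacent peers.
ADJACENT_ONLY_SCOPES = {"Connection", "Peer"}


@dataclass(frozen=True)
class OverloadReport:
    reporting_node: str                 # hypothetical field: originator's identity
    scope_type: str
    scope_value: Optional[str] = None


def is_adjacent(report: OverloadReport, peer_origin_host: str) -> bool:
    # Assumes the peer's identity was learned from Origin-Host in CER/CEA and
    # that no agent on the path rewrote the reporting node's identity.
    return report.reporting_node == peer_origin_host


def accept_report(report: OverloadReport, peer_origin_host: str) -> bool:
    """Accept adjacent reports; drop non-adjacent ones with peer-only scopes."""
    if is_adjacent(report, peer_origin_host):
        return True
    return report.scope_type not in ADJACENT_ONLY_SCOPES
```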

Even for destination-oriented scope-types, the sender of an overload report must be authoritative for the indicated scope. That is, it must have full knowledge of the congestion state for the scope. For example, if Nodes B and C both serve the realm "example.com", and B becomes 50% overloaded while C does not, B cannot simply report 50% overload at realm scope. If it did, Node A would reduce its generated traffic by 50%. Assuming B and C each carry half of the realm's traffic, the realm as a whole can still handle at least 75% of its current load, so it needs at most a 25% reduction; a 50% reduction would leave the realm operating beneath available capacity.

  • The need to be authoritative for an indicated scope is also true for strictly adjacent reporting mechanisms. But in an adjacent mechanism, it is easier for an intervening agent to learn the overload state of upstream nodes. In the example, if the agent supported the overload control mechanism, it would most likely receive reports from Nodes B and C, and could then construct downstream reports that incorporate the state of B, C, and its own local state. This contrasts with the non-adjacent case where B must understand the current state of C even though it is not in the path of overload reports from C.

Therefore, a given node must only report overload for scopes for which it has full knowledge of the load and overload state. That is, it must be a "scope authority" for any scope it reports. In the example, nodes B and C (and any other nodes serving "example.com") would be required to share current load and overload state. The state-sharing requirement could be substantial for high-capacity nodes.
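A scope authority holding the shared state described above could derive a realm-wide figure roughly as follows. This is a sketch, not a mechanism from MDOC or DOCA: it assumes each server shares its current load and the fraction by which it needs its own traffic reduced, and it counts a non-overloaded server's current load as its known capacity. The function name is hypothetical.

```python
def realm_reduction(load: dict, needed: dict) -> float:
    """Fraction by which total realm traffic should drop.

    load[n]:   current requests per second at server n
    needed[n]: fraction by which server n needs its own traffic reduced
    A server reporting no overload contributes only its current load as known
    capacity, so the result is an upper bound on the reduction really needed.
    """
    total = sum(load.values())
    sustainable = sum(load[n] * (1.0 - needed[n]) for n in load)
    return 1.0 - sustainable / total
```

With B and C each at 500 requests per second and B needing a 50% reduction, the function returns 0.25, matching the 25% figure in the example. This figure describes the realm as a whole; behind an agent that splits traffic evenly, a uniform 25% reduction would still send 375 requests per second to B.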

When a node reports overload for a certain scope, reacting nodes will treat the overload condition as uniform across the entire scope. For example, if a node reports overload for an entire realm, reacting nodes will reduce traffic equally for all servers that serve that realm. If the servers are unequally overloaded, they must use a more granular scope-type, for example, "Destination-Host".

3.3. Non-adjacent Overload Control Recommendations

An adjacent reporting mechanism allows for very flexible and fine-grained overload control. It solves or simplifies a number of issues, such as negotiation of support and parameters, requirements for topology knowledge, end-to-end security, etc., by avoiding them in the first place. Adding non-adjacent support to such a mechanism would complicate it considerably.

Non-adjacent overload control mechanisms are better suited to connecting islands of overload control. Such a mechanism works well for larger scopes and relatively static topologies.

The author believes that we are unlikely to find a single solution that works well for both adjacent and non-adjacent overload control. While a single solution is more desirable in general, a single solution that works well for both cases is likely to be extremely complicated. Therefore, the working group should consider a separate mechanism for the non-adjacent delivery of overload reports.

If the group chooses to accept two separate solutions, we should be able to specify a single data model and set of AVPs that work for both, with some restrictions. (For example, the non-adjacent solution would likely forbid the use of the "Connection" scope-type.)

If the working group chooses to add non-adjacent features to MDOC or DOCA, we will need to change the support negotiation mechanisms to allow for the non-adjacent case, specify how a node can determine whether a report is adjacent or non-adjacent, and state what subset of scope-types are allowed in non-adjacent reports. We will also need to study how we can meet the security-related requirements in [I-D.ietf-dime-overload-reqs] given the current lack of end-to-end security features in Diameter.

4. Overload Scopes

Diameter overload does not necessarily affect all kinds of Diameter traffic. A node may become overloaded for some requests but not others. For example, a Diameter agent may handle requests for more than one Diameter Application, and may route requests to a different set of servers for each application. If one server set becomes overloaded, but the other does not, then the agent itself is effectively overloaded for one application, but can process the other at normal capacity.

The Diameter overload requirements [I-D.ietf-dime-overload-reqs] list several scenarios that illustrate overload that affects some requests but not others. We refer to the set of requests affected by a particular overload event as the "scope" of the overload event. The overload requirements require the mechanism to be able to send overload reports that are "scoped" to (that is, that affect requests targeted to) a particular Diameter node, a Realm, or a Diameter Application.

A scope indication in an overload report is a set of classifiers that identify requests likely to contribute to the overload condition. In general, this could include any aspect of a Diameter message that a reacting node can observe. For example, requests could be classified by Attribute Value Pair (AVP) values or next-hop routing decisions.

The ability to express the scope of an overload condition is only useful when reacting nodes can act on the information. There are only a small number of actions a reacting node may take to mitigate overload. Essentially these actions boil down to reducing the number of requests that "match" the scope, either by sending fewer requests in the first place, or by routing around the problem. The former is limited by the node's ability to distinguish between requests that match the overload scope and requests that do not. The latter is limited by the node's ability to predict or influence how a request will be routed.
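The first kind of action, sending fewer matching requests, can be sketched as a scoped throttle. In this Python sketch, requests are modeled as dictionaries of AVP names to values (a simplification), the match predicate stands in for whatever scope classifier the mechanism defines, and a deterministic credit counter is used instead of random dropping; all names are illustrative rather than taken from MDOC or DOCA.

```python
from typing import Callable


class ScopedThrottle:
    """Drop a fixed fraction of requests that match an overload scope and
    pass all other requests through unchanged."""

    def __init__(self, matches: Callable[[dict], bool], reduction: float) -> None:
        if not 0.0 <= reduction <= 1.0:
            raise ValueError("reduction must be between 0 and 1")
        self.matches = matches      # scope classifier supplied by the caller
        self.reduction = reduction  # e.g. 0.25 means "send 25% fewer"
        self.credit = 0.0           # accumulated permission to send

    def admit(self, request: dict) -> bool:
        if not self.matches(request):
            return True             # outside the scope: unaffected
        # Each matching request earns (1 - reduction) of a send; spend whole ones.
        self.credit += 1.0 - self.reduction
        if self.credit >= 1.0:
            self.credit -= 1.0
            return True
        return False                # withheld to abate the overload
```

For example, a 25% reduction at Destination-Realm scope admits six of every eight requests to "example.com" while leaving requests to other realms untouched.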

This section discusses the meanings of the required scope-types, and analyses their implications for the selected mechanism.

4.1. Explicit vs Implicit Indication of Scopes

Both MDOC and DOCA use explicit scope indication. That is, the scope of an overload report is not, in general, implied by the type of message that carries the report. For example, if an overload report is scoped to a particular Diameter Application-Id, the report explicitly indicates the affected Application-Id, rather than leaving the reacting node to infer the Application-Id from that of the message that carries the report. There are a few exceptions to this; for example, MDOC supports a "Connection" scope that, when specified, pertains to requests to be sent over the same transport connection over which the overload report arrived.

  • List discussions have shown a common assumption that overload reports sent over a piggy-backed solution such as MDOC would only affect requests associated with the same Diameter Application-Id. For MDOC, this is a false assumption. MDOC's explicit use of scopes allows overload reports sent over one application to affect requests for any arbitrary application. On the other hand, solutions that use a dedicated Application-Id (such as DOCA) necessarily require the ability to report overload for arbitrary applications; otherwise it would only be possible for an overload control application to report overload on itself.

Some list participants have suggested that the solution include a concept of a default scope, that is, a scope that is implied if no other scope is explicitly indicated. The concept of default or implicit scopes requires further study by the working group.

4.2. Types of Overload Scopes

There are several different kinds, or types, of overload scopes. The type of a scope defines how the reacting node interprets it. Table 1 gives a summary of the scope types discussed in this document. The "Scope Type" column gives the name of the scope. The "Affected Traffic" column describes what Diameter requests are impacted by the scope-type. The "Reacting-Node" column describes which Diameter nodes may be able to take action on an overload report with the respective scope-type. Finally, the "Draft" column describes which proposed solution includes the respective scope-type.

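Table 1: Summary of Overload Scope Types
+----------------------+-------------------------------+---------------+------------+
| Scope Type           | Affected Traffic              | Reacting-Node | Draft      |
+----------------------+-------------------------------+---------------+------------+
| Connection           | Requests sent directly to the | Adjacent Peer | MDOC, DOCA |
|                      | reporting-node on a           |               |            |
|                      | particular transport          |               |            |
|                      | connection                    |               |            |
+----------------------+-------------------------------+---------------+------------+
| Peer                 | Requests routed directly to   | Adjacent Peer | MDOC, DOCA |
|                      | the reporting-node            |               |            |
+----------------------+-------------------------------+---------------+------------+
| Destination-Host     | Requests with a matching      | Any           | MDOC       |
|                      | Destination-Host AVP          |               |            |
+----------------------+-------------------------------+---------------+------------+
| Origin-Host          | Requests including a matching | Any           | DOCA?      |
|                      | Origin-Host AVP               |               |            |
+----------------------+-------------------------------+---------------+------------+
| Diameter Application | Requests with a matching      | Any           | MDOC, DOCA |
|                      | Application-Id AVP            |               |            |
+----------------------+-------------------------------+---------------+------------+
| Destination-Realm    | Requests with a matching      | Any           | MDOC, DOCA |
|                      | Destination-Realm AVP         |               |            |
+----------------------+-------------------------------+---------------+------------+
| Session              | Requests with a matching      | Any           | MDOC       |
|                      | Session-Id AVP                |               |            |
+----------------------+-------------------------------+---------------+------------+
| Session-Group        | Requests belonging to         | Any           | MDOC       |
|                      | sessions assigned matching    |               |            |
|                      | labels                        |               |            |
+----------------------+-------------------------------+---------------+------------+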
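Table 1 can be read as a set of request classifiers. The Python sketch below models it under simplifying, hypothetical assumptions: AVPs are a name-to-value dictionary (the Application-Id is treated as a single AVP for simplicity), and each request knows the next hop and transport connection it will use. Session-Group is left out because label assignment is not described here, and all names are illustrative.

```python
from dataclasses import dataclass, field

# AVP examined for each value-carrying scope-type in Table 1.
SCOPE_AVP = {
    "Destination-Host": "Destination-Host",
    "Origin-Host": "Origin-Host",
    "Diameter Application": "Application-Id",
    "Destination-Realm": "Destination-Realm",
    "Session": "Session-Id",
}


@dataclass
class Request:
    connection_id: str                          # transport connection used
    next_hop: str                               # peer the request goes to
    avps: dict = field(default_factory=dict)    # simplified: AVP name -> value


@dataclass
class Scope:
    kind: str
    value: object = None   # None for "Connection" and "Peer"


def matches(scope: Scope, req: Request, report_conn: str, reporting_peer: str) -> bool:
    """Does req fall within scope, for a report that arrived on report_conn
    from reporting_peer?"""
    if scope.kind == "Connection":
        return req.connection_id == report_conn  # implicit "this connection"
    if scope.kind == "Peer":
        return req.next_hop == reporting_peer
    avp = SCOPE_AVP.get(scope.kind)
    if avp is None:
        raise ValueError(f"unsupported scope-type: {scope.kind}")
    # A request lacking the AVP (e.g. no Destination-Host) never matches.
    return avp in req.avps and req.avps[avp] == scope.value
```

The explicit scope value is compared against the request about to be sent, not against the message that carried the report, in line with the explicit scope indication used by MDOC and DOCA.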

4.2.1. Connection Scope-Type

The "Connection" scope-type indicates that the reacting node should reduce traffic sent on the transport connection on which it received the overload report. A Connection scope indication does not include an explicit value; rather, it implies "this connection".

4.2.2. Peer Scope-Type

The "Peer" scope-type indicates that a particular Diameter node is overloaded. Other nodes should mitigate the overload by reducing the number of requests that will land on the overloaded node, either by sending fewer requests, or by attempting to route requests around the overloaded node.

  • In both MDOC and DOCA, the "Peer" scope-type is named "Host". In practice, only immediate peers can act as the reacting node for a Host scoped overload report. This is due to the fact that non-adjacent nodes have limited ability to influence routing decisions beyond the immediate next hop. This document uses the term "Peer" to illustrate that fact.

Large-scale Diameter nodes are often implemented as clusters of IP hosts, which may or may not share their knowledge about upstream overload conditions. Certain IP hosts in a cluster could become overloaded when others do not. Furthermore, if the reacting-node is also clustered, it may be difficult for the cluster members to share real-time knowledge of the reporting-node's overload state. This can make it difficult for a node to know conclusively whether any two connections that appear to connect to the same peer can be treated as such for the purposes of overload control. The working group should study whether the Peer scope-type should be deprecated in favor of the "Connection" scope-type.

4.2.3. Destination-Host Scope-Type

The "Destination-Host" scope type pertains to requests that contain a Destination-Host AVP that matches the indicated Destination-Host value. Destination-Host always refers to the endpoint for a given Diameter request.

The best the reacting node can do is reduce the number of requests that contain a Destination-Host AVP that matches the overloaded node. Rerouting will not help in general, since the requests will simply take different routes to arrive at the same overloaded server. Unless the destination node is also a direct peer, the reacting node cannot do much about requests that don't contain a Destination-Host AVP in the first place, since it cannot predict whether these requests will land on the overloaded endpoint. The Destination-Host scope type is useful for requests bound to a particular server, for example, mid-session requests for a session-stateful application.

Go ahead and cover details for "session" and "session-groups", and argue for removal of "session".

4.2.4. Origin-Host Scope-Type

While most scope-types refer to where a request is likely to go, the "Origin-Host" scope-type refers to where the request originates. That is, any request with a matching Origin-Host AVP would match. The Origin-Host scope type is useful for situations where a specific client or set of clients sends an excessive number of requests. An overload report with an Origin-Host scope would tell matching clients to reduce traffic, or agents to throttle requests that came from matching clients.

  • Note that the Origin-Host scope-type is not explicitly mentioned in the requirements document. The author includes it here because others have mentioned the need in conversation.

4.2.5. Diameter-Application Scope-Type

The "Diameter Application" scope-type indicates overload for a particular Diameter application. That is, it impacts all requests with the matching value in an Application-Id AVP.

The Diameter Application scope-type is useful for declaring an overload condition that affects a specific Diameter service, typically, but not necessarily, in a specific realm.

Since the Diameter Application scope-type indicates overload for an entire application, reacting nodes should reduce the number of requests sent for that application. As with the Realm scope-type, it will rarely if ever make sense for a Diameter node to reroute traffic to a different Diameter application.

4.2.6. Destination-Realm Scope-Type

The "Destination-Realm" scope-type indicates overload for all servers that handle requests for the particular Diameter realm. That is, it impacts all requests with the particular realm in the Destination-Realm AVP.

The Realm scope-type is useful for declaring a global overload condition within a network serving a single realm. It is also useful for requesting third parties to reduce Diameter traffic sent to a particular realm, for example, in roaming scenarios.

Since the Realm scope-type indicates overload for an entire realm, reacting nodes should reduce the number of requests sent to that realm. Rerouting traffic does not make sense for the Realm scope-type, since it would probably never be useful for Diameter nodes to reroute traffic destined for an overloaded realm to a different, non-overloaded realm. Client applications might, however, choose to use services from a different operator if the Diameter realm of one operator reports an overload condition.
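The guidance in this and the preceding scope-type sections, that rerouting does not help for Destination-Host, Diameter-Application, or Destination-Realm scopes, can be captured in a small decision function. This is a sketch; the scope-type strings follow the section headings, and treating every other scope-type as reroutable when an alternate path exists is an assumption.

```python
# Scope-types for which rerouting does not help: the affected requests would
# reach the same overloaded server, application, or realm by another path.
REDUCE_ONLY = {"Destination-Host", "Diameter-Application", "Destination-Realm"}


def choose_reaction(scope_type: str, alternate_paths: int) -> str:
    """Pick "reroute" or "reduce" for traffic matching an overload report.

    Assumption: scope-types outside REDUCE_ONLY (for example "Connection")
    may be rerouted when at least one non-overloaded alternate path exists.
    """
    if scope_type in REDUCE_ONLY or alternate_paths == 0:
        return "reduce"
    return "reroute"
```

A client application switching to a different operator's realm happens above Diameter routing and is outside what this function decides.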

MDOC currently makes the Realm scope-type mandatory to implement. List participants have indicated that there may be use cases where all Diameter traffic on a network uses the same Realm, and that the use of the Realm scope-type would be redundant in such networks. Whether the Realm scope-type should remain mandatory or become optional to implement requires further study.

4.2.7. Session Scope-Type

MDOC currently includes a "Session" scope-type. This scope-type refers to messages that include a matching Session-Id. Conceptually, this applies to all requests that are part of a previously established session. This scope-type could potentially be useful for a session-stateful agent that assigns session-establishing requests to a certain server, and then sends all future requests in that session to the same server. If that server became overloaded, the agent could send an overload report scoped to the assigned session.

However, the Session scope-type is likely to become unwieldy for anything other than very small-scale installations. Since the number of sessions assigned to any specific server is likely to be quite large, the number of Session scope values needed to cover that server would also be large. The working group should consider deprecating the Session scope-type. For agents that do not perform topology hiding, the Destination-Host scope-type can be used to affect all sessions assigned to a particular server. For topology-hiding agents, the session-group mechanism can do the same.

4.2.8. Session-Group Scope-Type

Diameter agents that implement certain topology-hiding schemes may modify Origin-Host AVPs inserted by servers, and use some local mechanism to bind sessions to specific servers. The "Destination-Host" scope-type may not function correctly in this case. MDOC specifies a "session-group" scope-type, where an agent or server can assign a common identifier to sessions that are fate-shared in some way, such as being bound to the same server. If that server becomes overloaded, the agent can send an overload report that matches requests in all sessions with the matching identifier.
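The session-group idea can be sketched as an agent that hands out one opaque group identifier per backend server, so that a single scope value covers every session bound to that server, in contrast to the one-value-per-session growth of the Session scope-type. The class, the "grp-N" identifier format, and the assumption that the reacting node already knows each session's group are hypothetical; for brevity, one object plays both the agent and the reacting-node roles.

```python
import itertools


class SessionGroupAgent:
    """Hypothetical topology-hiding agent that binds sessions to servers."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        self._group_for_server: dict[str, str] = {}
        self._group_for_session: dict[str, str] = {}

    def bind(self, session_id: str, server: str) -> str:
        # Reuse the server's group identifier, or allocate a new opaque one
        # that does not reveal the server's Origin-Host.
        group = self._group_for_server.get(server)
        if group is None:
            group = f"grp-{next(self._next_id)}"
            self._group_for_server[server] = group
        self._group_for_session[session_id] = group
        return group

    def overload_report_for(self, server: str) -> tuple[str, str]:
        # One (scope-type, scope-value) pair instead of one Session value per session.
        return ("Session-Group", self._group_for_server[server])

    def report_matches(self, report: tuple[str, str], session_id: str) -> bool:
        scope_type, value = report
        return (scope_type == "Session-Group"
                and self._group_for_session.get(session_id) == value)
```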

This scope-type may be useful under certain circumstances, but may also be complex to implement. Since the mechanism is required to allow extensible scope-types, session-groups could still be added in the future. The working group should study whether the Session-Group scope-type should be included in the base overload control solution, or removed with the potential to add it later as an extension scope-type.

4.3. Scope Values

Scope labels in an overload report will typically take the form of a scope-type and a value. For example, if the "example.com" realm is overloaded for all services, the overload report would indicate a scope-type of "Realm" and a scope-value of "example.com".

The Connection scope-type is an exception. Since an overload report with a Connection scope is only actionable by one of the peers connected via the specified connection, it makes sense to treat the Connection scope-type as always having a value of "this connection".

4.4. Combining Scopes

Diameter nodes will commonly need to construct overload reports that apply to a combination of scopes. For example, if a given realm is overloaded for a subset of the applications it supports, it might indicate both a Realm scope and one or more Diameter Application scopes.

Logically, combining multiple scopes of different types reduces the overall set of requests to which the overload report would apply. Combining multiple scopes of the same type increases the applicable set. A function that determines the requests affected by an overload report could model this as a logical "and" or "intersection" operator for combining scopes of different types, and a logical "or" or "union" operator for combining scopes of the same type.
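The combination rule above (union within a scope-type, intersection across scope-types) might look like the following. The request is modeled, as an assumption, as a mapping from each scope-type to the value the request carries for it; the Connection scope-type, whose value is implicit, is left out of this sketch.

```python
from collections import defaultdict
from typing import Iterable, Mapping, Tuple

Scope = Tuple[str, str]  # (scope-type, scope-value)


def report_applies(scopes: Iterable[Scope], request: Mapping[str, str]) -> bool:
    """Decide whether an overload report with the given scopes covers a request.

    Values of the same scope-type are OR'ed (union); the resulting groups of
    different scope-types are AND'ed (intersection). With no scopes at all,
    all() is vacuously true, so the report would match every request.
    """
    by_type: dict[str, set[str]] = defaultdict(set)
    for scope_type, value in scopes:
        by_type[scope_type].add(value)
    return all(request.get(scope_type) in values
               for scope_type, values in by_type.items())
```

For the realm example in the previous paragraph, a report carrying a Destination-Realm scope of "example.com" plus Diameter-Application scopes "4" and "5" (example Application-Id values) applies to a request for application 4 in that realm, but not to a request for application 6 or for another realm.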

The working group should study whether all possible combinations should be allowed. For example, it may or may not make sense to combine a "Connection" scope with other scopes, or to allow more than one "Connection" scope-value for a single overload report.
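One possible outcome of that study can be sketched as a validation step a reporting node might run before sending a report. Both rules in the sketch (at most one Connection scope, and no mixing of Connection with other scope-types unless explicitly allowed) are hypothetical policies used for illustration, not positions of the working group.

```python
from typing import List, Tuple


def validate_scopes(scopes: List[Tuple[str, str]],
                    allow_connection_combination: bool = False) -> List[str]:
    """Return a list of problems with a proposed scope combination.

    Both rules below are hypothetical policies, not working group decisions.
    """
    problems = []
    connection_count = sum(1 for scope_type, _ in scopes if scope_type == "Connection")
    if connection_count > 1:
        # Connection always means "this connection", so a second
        # Connection scope adds nothing.
        problems.append("more than one Connection scope")
    other_types = {scope_type for scope_type, _ in scopes if scope_type != "Connection"}
    if connection_count and other_types and not allow_connection_combination:
        problems.append("Connection combined with: " + ", ".join(sorted(other_types)))
    return problems
```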

4.5. Scope Extensibility

[I-D.ietf-dime-overload-reqs] requires scope-types to be extensible. This requirement implies that the chosen mechanism or mechanisms must discuss how new scope-types can be added, how support for specific scope-types should be declared or negotiated, and which scope-types might be mandatory to support.
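As an illustration of what such a discussion might cover, the sketch below assumes a hypothetical scheme in which each node advertises the scope-types it supports, reports may use only scope-types both sides support, and a node that omits a mandatory scope-type is rejected. The mandatory set mirrors the minimum recommended in Section 4.6; the advertisement scheme itself is an assumption.

```python
# Hypothetical mandatory set, mirroring the minimum recommended in Section 4.6.
MANDATORY = frozenset({"Connection", "Destination-Host",
                       "Destination-Realm", "Diameter-Application"})


def negotiate_scope_types(local: set, peer: set) -> set:
    """Return the scope-types both sides may use in overload reports.

    Raises ValueError if either side fails to advertise a mandatory scope-type.
    """
    for name, advertised in (("local", local), ("peer", peer)):
        missing = MANDATORY - advertised
        if missing:
            raise ValueError(f"{name} lacks mandatory scope-types: {sorted(missing)}")
    # Optional scope-types (e.g. Origin-Host) are usable only if both support them.
    return local & peer
```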

4.6. Scope Recommendations

In the author's opinion, the selected solution or solutions should support, at a minimum, the "Connection", "Destination-Host", "Realm" and "Application-ID" scope-types. The working group should consider also adding the "Origin-Host" scope-type.

The working group should consider whether the advantages of the "session-group" concept and scope-type are worth the complexity. The group should also study whether the Peer scope-type adds sufficient utility over the Connection scope-type to warrant its inclusion.

5. IANA Considerations

This draft makes no requests of IANA.

6. Security Considerations

Overload reports induce Diameter nodes to reduce or reroute traffic. For large scopes, a single erroneous or malicious overload report could effectively shut down Diameter processing for an entire realm. A Diameter overload control solution needs mechanisms to ensure that overload reports are only accepted from trusted sources, and that nothing tampers with the reports en route.

For adjacent approaches, the transport connection can be protected with TLS or IPsec. However, this does not help for non-adjacent reporting, since no transport connection exists between the reporting and reacting nodes.

Although work on such a mechanism is in progress in the DIME working group, Diameter currently has no viable mechanism for end-to-end authentication and integrity protection. The working group should consider either making non-adjacent overload control contingent on a generic Diameter end-to-end protection mechanism, or adding a specialized protection mechanism to any resulting non-adjacent overload control solution.

7. References

7.1. Normative References

[RFC6733]
Fajardo, V., Ed., Arkko, J., Loughney, J., and G. Zorn, Ed., "Diameter Base Protocol", RFC 6733, DOI 10.17487/RFC6733, <https://www.rfc-editor.org/info/rfc6733>.
[I-D.ietf-dime-overload-reqs]
McMurry, E. and B. Campbell, "Diameter Overload Control Requirements", Work in Progress, Internet-Draft, draft-ietf-dime-overload-reqs-13, <http://www.ietf.org/internet-drafts/draft-ietf-dime-overload-reqs-13.txt>.

7.2. Informative References

[I-D.roach-dime-overload-ctrl]
Roach, A. and E. McMurry, "A Mechanism for Diameter Overload Control", Work in Progress, Internet-Draft, draft-roach-dime-overload-ctrl-03, <http://www.ietf.org/internet-drafts/draft-roach-dime-overload-ctrl-03.txt>.
[I-D.korhonen-dime-ovl]
Korhonen, J. and H. Tschofenig, "The Diameter Overload Control Application (DOCA)", Work in Progress, Internet-Draft, draft-korhonen-dime-ovl-01, <http://www.ietf.org/internet-drafts/draft-korhonen-dime-ovl-01.txt>.
[Whac-a-Mole]
"Whack-a-Mole Colloquial Usage", <http://en.wikipedia.org/wiki/Whack-a-mole#Colloquial_usage>.

Appendix A. Contributors

Eric McMurry and Robert Sparks made significant contributions to the concepts in this draft.

Author's Address

Ben Campbell
Tekelec
17210 Campbell Rd.
Suite 250
Dallas, TX 75252
United States of America