Diameter Maintenance and Extensions (DIME) | S. Donovan |
Internet-Draft | Oracle |
Intended status: Standards Track | October 21, 2013 |
Expires: April 24, 2014 |
Diameter Agent Overload
draft-donovan-dime-agent-overload-00.txt
This specification documents an extension to the Diameter Overload Control (DOC) base solution. The extension addresses the handling of agent overload.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 24, 2014.
Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
This document defines the behavior of Diameter nodes when Diameter agents become overloaded.
The base Diameter overload specification [I-D.docdt-dime-ovli] addresses the handling of overload when a Diameter endpoint (a Diameter Client or Diameter server as defined in [RFC6733]) becomes overloaded.
In the base specification case, the goal is to react to the overload as close to the generator of the Diameter traffic as is feasible. When possible this is done at the originator of the traffic, generally referred to as a Diameter Client. A Diameter agent can also do the overload mitigation. For instance, a Diameter agent can handle Diameter overload mitigation when it knows that a Diameter client does not support the ability to do the mitigation.
This document extends the base Diameter endpoint overload specification to address the case when Diameter agents become overloaded. Just as is the case with other Diameter nodes, both clients and servers, surges in Diameter traffic can cause a Diameter agent to be asked to handle more Diameter traffic than it was configured to handle. For a more detailed discussion of what can cause the overload of Diameter nodes, refer to the Diameter Overload Requirements [I-D.ietf-dime-overload-reqs].
This document builds on the "Loss" overload mitigation algorithm defined in [I-D.docdt-dime-ovli]. The handling of endpoint overload and agent overload is very similar. The primary differences are the following:
Editors note – These definitions need to be made consistent with the base Diameter overload specification.
The following use cases illustrate the cases where agent overload must be handled.
This use case is illustrated in Figure 1. In this case, the client sends all traffic through the single agent. If there is a failure in the agent then the client is unable to send Diameter traffic toward the server.
+-+ +-+ +-+ |c|----|a|----|s| +-+ +-+ +-+
Figure 1
A more likely case for the use of agents is illustrated in Figure 2. In this case, there are multiple servers behind the single agent. The client sends all traffic through the agent and the agent determines how to distribute the traffic to the servers based on local routing and load distribution policy.
+-+ --|s| +-+ +-+ / +-+ |c|----|a|- ... +-+ +-+ \ +-+ --|s| +-+
Figure 2
In both of these cases, the occurrence of overload in the single agent must by handled by the client in a similar fashion as if the client were handling the overload of a directly connected server. When the agent becomes overloaded it will insert an agent overload report in answer messages flowing to the client. This overload report will contain a requested reduction in the amount of traffic being sent to the agent. The client will apply overload abatement behavior as defined in the base Diameter overload specification [I-D.docdt-dime-ovli]. This will result in the requested percentage of the requests that would have been sent to the agent being dropped with the appropriate indication given to the request that resulted in the need for the Diameter transaction.
Figure 3 and Figure 4 illustrate a second, and more likely, type of deployment scenario involving agents. In both of these cases, the client has connections to two agents.
Figure 3 illustrates a client that has a primary connection to one of the agents (agent a1) and a secondary connection to the other agent (agent a2). In this scenario, the client will use the primary connection for all traffic. The secondary connection is used when there is a failure scenario of some sort.
+--+ +-+ --|a1|---|s| +-+ / +--+\ /+-+ |c|- x +-+ . +--+/ \+-+ ..|a2|---|s| +--+ +-+
Figure 3
The second case, in Figure 4, illustrates the case where the connections to the agents are both actively used. In this case, the client will have a local distribution policy to determine the percentage of the traffic sent through each client.
+--+ +-+ --|a1|---|s| +-+ / +--+\ /+-+ |c|- x +-+ \ +--+/ \+-+ --|a2|---|s| +--+ +-+
Figure 4
In the case where a single agent in the above scenarios become overloaded, the client should reduce the amount of traffic sent to the overloaded agent by the amount requested. This traffic should instead be routed through the non-overloaded agent. For example, assume that the overloaded agent requests a reduction of 10 percent. The client should send 10 percent of the traffic that would have been routed to the overloaded agent through the non-overloaded agent.
In the case where both agents are reporting overload, the client will need to start dropping traffic in a similar fashion as discussed in section 3.1. The amount of traffic depends on the combined reduction requested by the two agents.
There are also deployment scenarios where there can be multiple agents between clients and servers. Examples of this type of deployment include when there are edge agents between Diameter networks. Another example of this type of deployment is when there are multiple sets of servers, each supporting a subset of the Diameter traffic.
Figure 5 illustrates one such network deployment case. Note that while this figure shows a maximum of two agents being involved in a Diameter transaction, it is possible that more than two agents could be in the path of a transaction.
+---+ +---+ +-+ --|a11|-----|a21|---|s| +-+ / +---+ \ / +---+\ /+-+ |c|- x x +-+ \ +---+ / \ +---+/ \+-+ --|a12|-----|a22|---|s| +---+ +---+ +-+
Figure 5
Handling of overload of one or both of agents a11 or a12 in this case is equivalent to that discussed in section 2.2.
Overload of agents a21 and a22 must be handled by the previous hop agents. As such, agents a11 and a12 must handle the overload mitigation logic when receiving an agent overload report from agents a21 and a22.
Editor's note: Probably need to elaborate the reasoning behind the need for the agent overload report being handled by the previous hop agent.
The handling of the overload reports is similar to that discussed in section 2.2. If the overload can be addressed by adjusting the amount of traffic sent to the next hop agents, then this approach should be taken.
If both of the agents have requested a reduction in traffic then the previous hop agent must start rejecting the appropriate amount of transactions. When rejecting requests, the agent must use the same mechanism as defined in the base overload specification [I-D.docdt-dime-ovli].
It is possible that both an agent and a server can be overloaded at the same time. When this occurs, the Diameter entity will need to handle both overload reports. When this occurs the overload reactor should first handle the throttling of the overloaded end-point. Any messages that survive that throttling should then be throttled (or routed) based on the reduction requested in the agent overload report.
Editors Note: This section depends upon the completion of the base Diameter Overload specification. As such, it cannot be complete until the data model and extension mechanism are finalized in the based DOC specification. Details for any new AVPs or modifications to existing AVPs will be added in a future version of the draft after the base DOC specification has stabilized.
This extension adds the following capabilities to the OC-Feature-Vector AVP.
This extension makes no changes to the TimeStamp, ValidityDuration and OC-Algorithm AVPs.
The agent overload function extends the base Diameter overload specification by defining a new overload report type of “peer”. See section [4.5] in [I-D.docdt-dime-ovli] for a description of the overload report type AVP.
The following extension is proposed for the ReportType AVP.
The overload report must also include the Diameter identity of the agent that generated the report. This is necessary to handle the case where there is a non supporting agent between the requesting node and the reacting node. Without the indication of the agent that generated the overload request, the reacting node would erroneously assume that the report applied to the non supporting node. This could, in turn, result in unnecessary traffic being either redistributed or throttled.
This extension adds the Reporting-Node AVP.
The OC-Reporting-Node AVP (AVP code TBD) is of type DiameterIdentity and is inserted by the reporting node. It contains the Diameter Identity of the inserting node. This is used by the reacting node to determine if the peer report came from a true peer. Behavior associated with this AVP is discussed in Section 5.2
+---------+ |AVP flag | |rules | +----+----+ AVP Section | |MUST| Attribute Name Code Defined Value Type |MUST| NOT| +--------------------------------------------------+----+----+ |OC-Reporting-Node TBD1 x.x Unsigned64 | | V | +--------------------------------------------------+----+----+
An agent that supports this specification must have the ability to determine when it is appropriate to send an overload report. This is based on local policy but, as is the case with Diameter end-point overload, this will generally be done as a way to attempt to avoid the agent actually entering an overloaded state.
Once the agent determines that there is need to request a reduction in traffic then it SHOULD include the overload report in all answer messages handled by the agent.
The overload report must include a type of peer.
The amount of reduction requested MUST be included in the overload report.
The requested duration of the report MUST be included in the overload report.
The overload report must include a timestamp indicating when the report was first sent. The reacting node uses the timestamp to determine differentiate an already received report from a new report.
Editor's note: These statements might turn out to be repeats of normative requirements in the DOC baseline specification. If this is so then they likelly can be removed from this document.
The overload report must include the DiameterIdentity of the reporting node in the OC-Reporting-Node AVP. This is used by DOC end-points to determine if the report came from a true peer or from a non adjacent reporting node.
The reporting agent must follow all other overload reporting node behaviors outlined in the base overload specification. This includes sending a report with a reduction of zero when the need for a reduction has been abated. It also includes sending a new overload report, with a new timestamp, to refresh the abatement duration.
A DOC reacting node receiving an overload report of type "peer" must first verify that the report came from an adjacent node or from a non-adjacent reporting node.
If the report came from a non-adjacent reporting node then the reacting node must strip the overload report and take no other action as a result of the report.
If the peer report came from an adjacent node then the reacting node should attempt to distribute subsequent traffic through available routes, with a reduction of the amount of traffic sent to the reporting node. The reasoning behind re-distributing the requests through other routes is the general thought that it is best to attempt to complete requests when there is capacity in the network. In the case of agent overload, the targetted servers will not necessarily be overloaded. As such, re-distributed requests are likely to be successfully handled.
If there is not sufficient capacity to route offered traffic through the available routes then the reacting node must throttle traffic.
If the reacting node is throttling traffic then it must select the throttled traffic using the loss algorithm defined in [I-D.docdt-dime-ovli].
If the Diameter node is a Diameter end-point then the throttling action results in the Diameter request not being sent and presenting the appropriate application level response to the request that caused the need for the Diameter transaction.
If the Diameter node is a Diameter agent then the throttling action involves generating the error response in an answer message for the throttled transactions. The error response must be the same as defined for agent throttling actions in [I-D.docdt-dime-ovli].
Editors note: This section will be completed once the base overload document has finished the definition of extension IANA requirements.
Agent overload is an extension to the based Diameter overload mechanism. As such, all of the security considerations outlined in [I-D.docdt-dime-ovli] apply to the agent overload scenarios.
It is possible that the malicious insertion of an agent overload report could have a bigger impact on a Diameter network as agents can be concentration points in a Diameter network. Where an end-point report would impact the traffic sent to a single Diameter server, for example, an agent overload report could throttle all traffic to the Diameter network.
This impact is amplified in an agent that sits at the edge of a Diameter network that serves as the entry point from all other Diameter networks.
Adam Roach and Eric McMurry for the work done in defining a comprehensive Diameter overload solution in draft-roach-dime-overload-ctrl-03.txt.
Ben Campbell for his insights and review of early versions of this document.
[RFC2119] | Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. |
[RFC6733] | Fajardo, V., Arkko, J., Loughney, J. and G. Zorn, "Diameter Base Protocol", RFC 6733, October 2012. |
[RFC5226] | Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA Considerations Section in RFCs", BCP 26, RFC 5226, May 2008. |
[I-D.docdt-dime-ovli] | Korhonen, J., "Diameter Overload Indication Conveyance", October 2013. |
[I-D.ietf-dime-overload-reqs] | McMurry, E. and B. Campbell, "Diameter Overload Control Requirements", Internet-Draft draft-ietf-dime-overload-reqs-13, September 2013. |