Internet DRAFT - draft-ietf-dime-doic-rate-control

draft-ietf-dime-doic-rate-control







Diameter Maintenance and Extensions (DIME)               S. Donovan, Ed.
Internet-Draft                                                    Oracle
Intended status: Standards Track                                 E. Noel
Expires: August 15, 2019                                       AT&T Labs
                                                       February 11, 2019


                     Diameter Overload Rate Control
                  draft-ietf-dime-doic-rate-control-11

Abstract

   This specification documents an extension to the Diameter Overload
   Indication Conveyance (DOIC) [RFC7683] base solution.  This extension
   adds a new overload control abatement algorithm.  This abatement
   algorithm allows for a DOIC reporting node to specify a maximum rate
   at which a DOIC reacting node sends Diameter requests to the DOIC
   reporting node.

Requirements

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 15, 2019.








Donovan & Noel           Expires August 15, 2019                [Page 1]

Internet-Draft       Diameter Overload Rate Control        February 2019


Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
   2.  Terminology . . . . . . . . . . . . . . . . . . . . . . . . .   5
   3.  Interaction with DOIC Report Types  . . . . . . . . . . . . .   5
   4.  Capability Announcement . . . . . . . . . . . . . . . . . . .   6
   5.  Overload Report Handling  . . . . . . . . . . . . . . . . . .   6
     5.1.  Reporting Node Overload Control State . . . . . . . . . .   6
     5.2.  Reacting Node Overload Control State  . . . . . . . . . .   7
     5.3.  Reporting Node Maintenance of Overload Control State  . .   7
     5.4.  Reacting Node Maintenance of Overload Control State . . .   8
     5.5.  Reporting Node Behavior for Rate Abatement Algorithm  . .   8
     5.6.  Reacting Node Behavior for Rate Abatement Algorithm . . .   9
   6.  Rate Abatement Algorithm AVPs . . . . . . . . . . . . . . . .   9
     6.1.  OC-Supported-Features AVP . . . . . . . . . . . . . . . .   9
       6.1.1.  OC-Feature-Vector AVP . . . . . . . . . . . . . . . .   9
     6.2.  OC-OLR AVP  . . . . . . . . . . . . . . . . . . . . . . .   9
       6.2.1.  OC-Maximum-Rate AVP . . . . . . . . . . . . . . . . .  10
     6.3.  Attribute Value Pair Flag Rules . . . . . . . . . . . . .  10
   7.  Rate-Based Abatement Algorithm  . . . . . . . . . . . . . . .  10
     7.1.  Overview  . . . . . . . . . . . . . . . . . . . . . . . .  11
     7.2.  Reporting Node Behavior . . . . . . . . . . . . . . . . .  11
     7.3.  Reacting Node Behavior  . . . . . . . . . . . . . . . . .  12
       7.3.1.  Default Algorithm for Rate-based Control  . . . . . .  12
       7.3.2.  Priority Treatment  . . . . . . . . . . . . . . . . .  15
       7.3.3.  Optional Enhancement: Avoidance of Resonance  . . . .  17
   8.  IANA Consideration  . . . . . . . . . . . . . . . . . . . . .  18
     8.1.  AVP Codes . . . . . . . . . . . . . . . . . . . . . . . .  18
     8.2.  OC-Supported-Features . . . . . . . . . . . . . . . . . .  18
     8.3.  New DOIC report types . . . . . . . . . . . . . . . . . .  19
   9.  Security Considerations . . . . . . . . . . . . . . . . . . .  19
   10. Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .  19
   11. References  . . . . . . . . . . . . . . . . . . . . . . . . .  19



Donovan & Noel           Expires August 15, 2019                [Page 2]

Internet-Draft       Diameter Overload Rate Control        February 2019


     11.1.  Normative References . . . . . . . . . . . . . . . . . .  19
     11.2.  Informative References . . . . . . . . . . . . . . . . .  20
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  20

1.  Introduction

   This document defines a new Diameter overload control abatement
   algorithm, the "rate" algorithm.

   The base Diameter overload specification [RFC7683] defines the "loss"
   algorithm as the default Diameter overload abatement algorithm.  The
   loss algorithm allows a reporting node (see Section 2) to instruct a
   reacting node (see Section 2) to reduce the amount of traffic sent to
   the reporting node by abating (diverting or throttling) a percentage
   of requests sent to the server.  While this can effectively decrease
   the load handled by the server, it does not directly address cases
   where the rate of arrival of service requests changes quickly.  For
   instance, if the service requests that result in Diameter
   transactions increase quickly then the loss algorithm cannot
   guarantee the load presented to the server remains below a specific
   rate level.  The loss algorithm can be slow to ensure the stability
   of reporting nodes when subjected to rapidly-changing loads.  The
   "loss" algorithm errs both in throttling too much when there is a dip
   in offered load, and throttling not enough when there is a spike in
   offered load.

   Consider the case where a reacting node is handling 100 service
   requests per second, where each of these service requests results in
   one Diameter transaction being sent to a reporting node.  If the
   reporting node is approaching an overload state, or is already in an
   overload state, it will send a Diameter overload report requesting a
   percentage reduction in traffic sent when the loss algorithm is used
   as Diameter overload abatement algorithm.  Assume for this discussion
   that the reporting node requests a 10% reduction.  The reacting node
   will then abate (diverting or throttling) ten Diameter transactions a
   second, sending the remaining 90 transactions per second to the
   reporting node.

   Now assume that the reacting node's service requests spikes to 1000
   requests per second.  The reacting node will continue to honor the
   reporting node's request for a 10% reduction in traffic.  This
   results, in this example, in the reacting node sending 900 Diameter
   transactions per second, abating the remaining 100 transactions per
   second.  This spike in traffic is significantly higher than the
   reporting node is expecting to handle and can result in negative
   impacts to the stability of the reporting node.





Donovan & Noel           Expires August 15, 2019                [Page 3]

Internet-Draft       Diameter Overload Rate Control        February 2019


   The reporting node can, and likely would, send another overload
   report requesting that the reacting node abate 91% of requests to get
   back to the desired 90 transactions per second.  However, once the
   spike has abated and the reacting node handled service requests
   returns to 100 per second, this will result in just 9 transactions
   per second being sent to the reporting node, requiring a new overload
   report setting the reduction percentage back to 10%.  This control
   feedback loop has the potential to make the situation worse by
   causing wide fluctuations in traffic on multiple nodes in the
   Diameter network.

   One of the benefits of a rate-based algorithm over the loss algorithm
   is that it better handles spikes in traffic.  Instead of sending a
   request to reduce traffic by a percentage, the rate approach allows
   the reporting node to specify the maximum number of Diameter requests
   per second that can be sent to the reporting node.  For instance, in
   this example, the reporting node could send a rate-based request
   specifying the maximum transactions per second to be 90.  The
   reacting node will send the 90 regardless of whether it is receiving
   100 or 1000 service requests per second.

   It should be noted that one of the implications of the rate-based
   algorithm is that the reporting node needs to determine how it wants
   to distribute its load over the set of reacting nodes from which it
   is receiving traffic.  For instance, if the reporting node is
   receiving Diameter traffic from 10 reacting nodes and has a capacity
   of 100 transactions per second then the reporting node could choose
   to set the rate for each of the reacting nodes to 10 transactions per
   second.  This, of course, is assuming that each of the reacting nodes
   has equal performance characteristics.  The reporting node could also
   choose to have a high capacity reacting node send 55 transactions per
   second and the remaining 9 low capacity reacting nodes send 5
   transactions per second.  The ability of the reporting node to
   specify the amount of traffic on a per-reacting-node basis implies
   that the reporting node must maintain state for each of the reacting
   nodes.  This state includes the current allocation of Diameter
   traffic to that reacting node.  If the number of reacting nodes
   changes, either because new nodes are added, nodes are removed from
   service or nodes fail, then the reporting node will need to
   redistribute the maximum Diameter transactions over the new set of
   reacting nodes.

   This document extends the base Diameter Overload Indication
   Conveyance (DOIC) solution [RFC7683] to add support for the rate-
   based overload abatement algorithm.

   This document draws heavily on work in the SIP Overload Control
   working group.  The definition of the rate abatement algorithm is



Donovan & Noel           Expires August 15, 2019                [Page 4]

Internet-Draft       Diameter Overload Rate Control        February 2019


   copied almost verbatim from the SIP Overload Control (SOC) document
   [RFC7415], with changes focused on making the wording consistent with
   the DOIC solution and the Diameter protocol.

2.  Terminology

   Diameter Node

      A Diameter Client, Diameter Server, or Diameter Agent.  [RFC6733]

   Diameter Endpoint

      A Diameter Client or Diameter Server.  [RFC6733]

   DOIC Node

      A Diameter Node that supports the DOIC solution defined in
      [RFC7683].

   Reporting Node

      A DOIC Node that sends a DOIC overload report.

   Reacting Node

      A DOIC Node that receives and acts on a DOIC overload report.

3.  Interaction with DOIC Report Types

   As of the publication of this specification, there are two DOIC
   report types defined with the specification of a third in progress:

   HOST_REPORT 0  Overload of a specific Diameter Application at a
      specific Diameter Node as defined in [RFC7683]

   REALM_REPORT 1  Overload of a specific Diameter Application at a
      specific Diameter Realm as defined in [RFC7683]

   PEER_REPORT 2  Overload of a specific Diameter peer as defined in
      [I-D.ietf-dime-agent-overload]

   The rate algorithm MAY be selected by reporting nodes for any of
   these report types.

   It is expected that all report types defined in the future will
   indicate whether or not the rate algorithm can be used with that
   report type.




Donovan & Noel           Expires August 15, 2019                [Page 5]

Internet-Draft       Diameter Overload Rate Control        February 2019


4.  Capability Announcement

   This document defines the rate abatement algorithm (referred to as
   rate in this document) feature.  Support for the rate feature by a
   DOIC node will be indicated by a new value of the OC-Feature-Vector
   AVP, as described in Section 6.1.1, per the rules defined in
   [RFC7683].

   Since all nodes that support DOIC are required to support the loss
   algorithm, DOIC nodes supporting the rate feature will support both
   the loss and rate-based abatement algorithms.

   DOIC reacting nodes supporting the rate feature MUST indicate support
   for both the loss and rate algorithms in the OC-Feature-Vector AVP
   and MAY indicate support for other algorithms.

   As defined in [RFC7683], a DOIC reporting node supporting the rate
   feature selects a single abatement algorithm in the OC-Feature-Vector
   AVP and OC-Peer-Algo AVP in the answer message sent to the DOIC
   reacting nodes.

   A reporting node can select one abatement algorithm to apply to host
   and realm reports and a different algorithm to apply to peer reports.

      For host or realm reports the selected algorithm is reflected in
      the OC-Feature-Vector AVP sent as part of the OC-Supported-
      Features AVP included in answer messages for transaction where the
      request contained an OC-Supported-Features AVP.  This is per the
      procedures defined in [RFC7683].

      For peer reports the selected algorithm is reflected in the OC-
      Peer-Algo AVP sent as part of the OC-Supported-Features AVP
      included answer messages for transactions where the request
      contained an OC-Supported-Features AVP.  This is per the
      procedures defined in [I-D.ietf-dime-agent-overload].

5.  Overload Report Handling

   This section describes any changes to the behavior defined in
   [RFC7683] for handling of overload reports when the rate overload
   abatement algorithm is used.

5.1.  Reporting Node Overload Control State

   A reporting node that uses the rate abatement algorithm SHOULD
   maintain reporting node Overload Control State (OCS) for each
   reacting node to which it sends a rate Overload Report (OLR).




Donovan & Noel           Expires August 15, 2019                [Page 6]

Internet-Draft       Diameter Overload Rate Control        February 2019


      This is different from the behavior defined in [RFC7683] where a
      reporting node sends a single loss percentage to all reacting
      nodes.

   A reporting node SHOULD maintain OCS entries when using the rate
   abatement algorithm per supported Diameter application, per targeted
   reacting node and per report type.

   A rate OCS entry is identified by the tuple of Application-Id, report
   type and DiameterIdentity of the target of the rate OLR.

   The rate OCS entry SHOULD include the rate allocated to the reacting
   note.

   A reporting node that has selected the rate overload abatement
   algorithm MUST indicate the rate requested to be applied by DOIC
   reacting nodes in the OC-Maximum-Rate AVP included in the OC-OLR AVP.

   All other elements for the OCS defined in [RFC7683] and
   [I-D.ietf-dime-agent-overload] also apply to the reporting nodes OCS
   when using the rate abatement algorithm.

5.2.  Reacting Node Overload Control State

   A reacting node that supports the rate abatement algorithm MUST
   indicate rate as the selected abatement algorithm in the reacting
   node OCS based on the OC-Feature-Vector AVP or the OC-Peer-Algo AVP
   in the received OC-Supported-Features AVP.

   A reacting node that supports the rate abatement algorithm MUST
   include the rate specified in the OC-Maximum-Rate AVP included in the
   OC-OLR AVP as an element of the abatement-algorithm-specific portion
   of reacting node OCS entries.

   All other elements for the OCS defined in [RFC7683] and
   [I-D.ietf-dime-agent-overload] also apply to the reporting nodes OCS
   when using the rate abatement algorithm.

5.3.  Reporting Node Maintenance of Overload Control State

   A reporting node that has selected the rate overload abatement
   algorithm and enters an overload condition MUST indicate rate as the
   abatement algorithm and MUST indicate the selected rate in the
   resulting reporting node OCS entries.

   When selecting the rate algorithm in the response to a request that
   contained an OC-Supporting-Features AVP with an OC-Feature-Vector AVP
   indicating support for the rate feature, a reporting node MUST ensure



Donovan & Noel           Expires August 15, 2019                [Page 7]

Internet-Draft       Diameter Overload Rate Control        February 2019


   that a reporting node OCS entry exists for the target of the overload
   report.  The target is defined as follows:

   o  For Host reports, the target is the DiameterIdentity contained in
      the Origin-Host AVP received in the request.

   o  For Realm reports, the target is the DiameterIdentity contained in
      the Origin-Realm AVP received in the request.

   o  For Peer reports, the target is the DiameterIdentity of the
      Diameter Peer from which the request was received.

   A reporting node that receives a capability announcement from a new
   reacting node, meaning a reacting node for which it does not have an
   OCS entry, and the reporting node chooses the rate algorithm for that
   reacting node may need to recalculate the rate to be allocated to all
   reacting nodes.  Any changed rate values will be communicated in the
   next OLR sent to each reacting node.

5.4.  Reacting Node Maintenance of Overload Control State

   When receiving an answer message indicating that the reporting node
   has selected the rate algorithm, a reacting node MUST indicate the
   rate abatement algorithm in the reacting node OCS entry for the
   reporting node.

   A reacting node receiving an overload report for the rate abatement
   algorithm MUST save the rate received in the OC-Maximum-Rate AVP
   contained in the OC-OLR AVP in the reacting node OCS entry.

5.5.  Reporting Node Behavior for Rate Abatement Algorithm

   When in an overload condition with rate selected as the overload
   abatement algorithm and when handling a request that contained an OC-
   Supported-Features AVP that indicated support for the rate abatement
   algorithm, a reporting node SHOULD include an OC-OLR AVP for the rate
   algorithm using the parameters stored in the reporting node OCS for
   the target of the overload report.

      Note: It is also possible for the reporting node to send overload
      reports with the rate algorithm indicated even when the reporting
      node is not in an overloaded state.  This could be a strategy to
      proactively avoid entering into an overloaded state.  Whether to
      do so is up to local policy.

   When sending an overload report for the rate algorithm, the OC-
   Maximum-Rate AVP MUST be included in the OC-OLR AVP and the OC-
   Reduction-Percentage AVP MUST NOT be included.



Donovan & Noel           Expires August 15, 2019                [Page 8]

Internet-Draft       Diameter Overload Rate Control        February 2019


5.6.  Reacting Node Behavior for Rate Abatement Algorithm

   When determining if abatement treatment should be applied to a
   request being sent to a reporting node that has selected the rate
   overload abatement algorithm, the reacting node can choose to use the
   algorithm detailed in Section 7.

   Other algorithms for controlling the rate MAY be implemented by the
   reacting node.  Any algorithm implemented MUST correctly limit the
   maximum rate of traffic being sent to the reporting node.

   Once a determination is made by the reacting node that an individual
   Diameter request is to be subjected to abatement treatment then the
   procedures for throttling and diversion defined in [RFC7683] and
   [I-D.ietf-dime-agent-overload] apply.

6.  Rate Abatement Algorithm AVPs

6.1.  OC-Supported-Features AVP

   The rate algorithm does not add any new AVPs to the OC-Supported-
   Features AVP.

   The rate algorithm does add a new feature bit to be carried in the
   OC-Feature-Vector AVP.

6.1.1.  OC-Feature-Vector AVP

   This extension adds the following capability to the OC-Feature-Vector
   AVP.

   OLR_RATE_ALGORITHM (bit 2)

      Bit 2 is assigned to the rate overload abatement algorithm.  When
      this flag is set by the overload control endpoint it indicates
      that the DOIC Node supports the rate overload abatement algorithm.

6.2.  OC-OLR AVP

   This extension defines the OC-Maximum-Rate AVP to be an optional part
   of the OC-OLR AVP.










Donovan & Noel           Expires August 15, 2019                [Page 9]

Internet-Draft       Diameter Overload Rate Control        February 2019


      OC-OLR ::= < AVP Header: TBD2 >
                 < OC-Sequence-Number >
                 < OC-Report-Type >
                 [ OC-Reduction-Percentage ]
                 [ OC-Validity-Duration ]
                 [ SourceID ]
                 [ OC-Maximum-Rate ]
               * [ AVP ]


   This extension makes no changes to the other AVPs that are part of
   the OC-OLR AVP.

   This extension does not define new overload report types.  The
   existing report types of host and realm defined in [RFC7683] apply to
   the rate control algorithm.  The peer report type defined in
   [I-D.ietf-dime-agent-overload] also applies to the rate control
   algorithm.

6.2.1.  OC-Maximum-Rate AVP

   The OC-Maximum-Rate AVP (AVP code TBD1) is of type Unsigned32 and
   describes the maximum rate that the sender is requested to send
   traffic.  This is specified in terms of requests per second.

   A value of zero indicates that no traffic is to be sent.

6.3.  Attribute Value Pair Flag Rules


                                                             +---------+
                                                             |AVP flag |
                                                             |rules    |
                                                             +----+----+
                            AVP   Section                    |    |MUST|
    Attribute Name          Code  Defined  Value Type        |MUST| NOT|
   +---------------------------------------------------------+----+----+
   |OC-Maximum-Rate         TBD1    6.2    Unsigned32        |    | V  |
   +---------------------------------------------------------+----+----+


7.  Rate-Based Abatement Algorithm

   This section is pulled from [RFC7415], with minor changes needed to
   make it apply to the Diameter protocol.






Donovan & Noel           Expires August 15, 2019               [Page 10]

Internet-Draft       Diameter Overload Rate Control        February 2019


7.1.  Overview

   The reporting node is the one protected by the overload control
   algorithm defined here.  The reacting node is the one that abates
   traffic towards the server.

   Following the procedures defined in [RFC7683], the reacting node and
   reporting node signal their support for rate-based overload control.

   Then periodically, the reporting node relies on internal measurements
   (e.g.  CPU utilization or queuing delay) to evaluate its overload
   state and estimate a target maximum Diameter request rate in number
   of requests per second (as opposed to target percent reduction in the
   case of loss-based abatement).

   When in an overloaded state, the reporting node uses the OC-OLR AVP
   to inform reacting nodes of its overload state and of the target
   Diameter request rate.

   Upon receiving the overload report with a target maximum Diameter
   request rate, each reacting node applies overload abatement for new
   Diameter requests towards the reporting node.

7.2.  Reporting Node Behavior

   The actual algorithm used by the reporting node to determine its
   overload state and estimate a target maximum Diameter request rate is
   beyond the scope of this document.

   However, the reporting node MUST periodically evaluate its overload
   state and estimate a target Diameter request rate beyond which it
   would become overloaded.  The reporting node must allocate a portion
   of the target Diameter request rate to each of its reacting nodes.
   The reporting node may set the same rate for every reacting node, or
   may set different rates for different reacting node.

   The maximum rate determined by the reporting node for a reacting node
   applies to the entire stream of Diameter requests, even though
   abatement may only affect a particular subset of the requests, since
   the reacting node might apply priority as part of its decision of
   which requests to abate.

   When setting the maximum rate for a particular reacting node, the
   reporting node may need take into account the workload (e.g.  CPU
   load per request) of the distribution of message types from that
   reacting node.  Furthermore, because the reacting node may prioritize
   the specific types of messages it sends while under overload
   restriction, this distribution of message types may be different from



Donovan & Noel           Expires August 15, 2019               [Page 11]

Internet-Draft       Diameter Overload Rate Control        February 2019


   the message distribution for that reacting node under non-overload
   conditions (e.g., either higher or lower CPU load).

   Note that the value of OC-Maximum-Rate AVP (in request messages per
   second) for the rate algorithm provides a loose upper bound on the
   traffic sent by the reacting node to the reporting node.

   In other words, when multiple reacting nodes are being controlled by
   an overloaded reporting node, at any given time, some reporting nodes
   may receive requests at a rate below its target maximum Diameter
   request rate while others above that target rate.  But the resulting
   request rate presented to the overloaded reporting node will converge
   towards the target Diameter request rate or a lower rate.

   Upon detection of overload, and the determination to invoke overload
   controls, the reporting node follows the specifications in [RFC7683]
   to notify its clients of the allocated target maximum Diameter
   request rate and to notify them that the rate overload abatement is
   in effect.

   The reporting node uses the OC-Maximum-Rate AVP defined in this
   specification to communicate a target maximum Diameter request rate
   to each of its clients.

7.3.  Reacting Node Behavior

7.3.1.  Default Algorithm for Rate-based Control

   A reference algorithm is shown below.

   Note that use of // below inidcates a comment.

   No priority case:


















Donovan & Noel           Expires August 15, 2019               [Page 12]

Internet-Draft       Diameter Overload Rate Control        February 2019


        // T: inter-transmission interval, set to 1 / OC-Maximum-Rate
        // TAU: tolerance parameter
        // ta: arrival time of the most recent arrival
        // LCT: arrival time of last Diameter request that
        //      was sent to the server
        //      (initialized to the first arrival time)
        // X: current value of the leaky bucket counter (initialized to
        //    TAU0)

        // After most recent arrival, calculate auxiliary variable Xp
        Xp = X - (ta - LCT);

        if (Xp <= TAU) {
          // Transmit Diameter request
          // Update X and LCT
          X = max (0, Xp) + T;
          LCT = ta;
        } else {
          // Reject Diameter request
          // Do not update X and LCT
        }

   In determining whether or not to transmit a specific message, the
   reacting node can use any algorithm that limits the message rate to
   the OC-Maximum-Rate AVP value in units of messages per second.  For
   ease of discussion, we define T = 1/[OC-Maximum-Rate] as the target
   inter-Diameter request interval.  It may be strictly deterministic,
   or it may be probabilistic.  It may, or may not, have a tolerance
   factor, to allow for short bursts, as long as the long term rate
   remains below 1/T.

   The algorithm may have provisions for prioritizing traffic.

   If the algorithm requires other parameters (in addition to "T", which
   is 1/OC-Maximum-Rate), they may be set autonomously by the reacting
   node, or they may be negotiated independently between reacting node
   and reporting node.

   In either case, the coordination is out of scope for this document.
   The default algorithms presented here (one with and one without
   provisions for prioritizing traffic) are only examples.

   To apply abatement treatment to new Diameter requests at the rate
   specified in the OC-Maximum-Rate AVP value sent by the reporting node
   to its reacting nodes, the reacting node MAY use the proposed default
   algorithm for rate-based control or any other equivalent algorithm
   that forward messages in conformance with the upper bound of 1/T
   messages per second.



Donovan & Noel           Expires August 15, 2019               [Page 13]

Internet-Draft       Diameter Overload Rate Control        February 2019


   The default Leaky Bucket algorithm presented here is based on [ITU-T
   Rec. I.371] Appendix A.2.  The algorithm makes it possible for
   reacting nodes to deliver Diameter requests at a rate specified in
   the OC-Maximum-Rate value with tolerance parameter TAU (preferably
   configurable).

   Conceptually, the Leaky Bucket algorithm can be viewed as a finite
   capacity bucket whose real-valued content drains out at a continuous
   rate of 1 unit of content per time unit and whose content increases
   by the increment T for each forwarded Diameter request.  T is
   computed as the inverse of the rate specified in the OC-Maximum-Rate
   AVP value, namely T = 1 / OC-Maximum-Rate.

   Note that when the OC-Maximum-Rate value is 0 with a non-zero OC-
   Validity-Duration, then the reacting node should apply abatement
   treatment to 100% of Diameter requests destined to the overloaded
   reporting node.  However, when the OC-Validity-Duration value is 0,
   the reacting node should stop applying abatement treatment.

   If, at a new Diameter request arrival, the content of the bucket is
   less than or equal to the limit value TAU, then the Diameter request
   is forwarded to the server; otherwise, the abatement treatment is
   applied to the Diameter request.

   Note that the capacity of the bucket (the upper bound of the counter)
   is (T + TAU).

   The tolerance parameter TAU determines how close the long-term
   admitted rate is to an ideal control that would admit all Diameter
   requests for arrival rates less than 1/T and then admit Diameter
   requests precisely at the rate of 1/T for arrival rates above 1/T.
   In particular at mean arrival rates close to 1/T, it determines the
   tolerance to deviation of the inter-arrival time from T (the larger
   TAU the more tolerance to deviations from the inter-departure
   interval T).

   This deviation from the inter-departure interval influences the
   admitted rate burstyness, or the number of consecutive Diameter
   requests forwarded to the reporting node (burst size proportional to
   TAU over the difference between 1/T and the arrival rate).

   In situations where reacting nodes are configured with some knowledge
   about the reporting node and other traffic sources (e.g., operator
   pre-provisioning), it can be beneficial to choose a value of TAU
   based on how many reacting nodes will be sending requests to the
   reporting node.





Donovan & Noel           Expires August 15, 2019               [Page 14]

Internet-Draft       Diameter Overload Rate Control        February 2019


   Reporting nodes with a very large number of reacting nodes, each with
   a relatively small arrival rate, will generally benefit from a
   smaller value for TAU in order to limit queuing (and hence response
   times) at the reporting node when subjected to a sudden surge of
   traffic from all reacting nodes.  Conversely, a reporting node with a
   relatively small number of reacting nodes, each with proportionally
   larger arrival rate, will benefit from a larger value of TAU.

   Once the control has been activated, at the arrival time of the k-th
   new Diameter request, ta(k), the content of the bucket is
   provisionally updated to the value

   X' = X - (ta(k) - LCT)

   where X is the value of the leaky bucket counter after arrival of the
   last forwarded Diameter request, and LCT is the time at which the
   last Diameter request was forwarded.

   If X' is less than or equal to the limit value TAU, then the new
   Diameter request is forwarded and the leaky bucket counter X is set
   to X' (or to 0 if X' is negative) plus the increment T, and LCT is
   set to the current time ta(k).  If X' is greater than the limit value
   TAU, then the abatement treatment is applied to the new Diameter
   request and the values of X and LCT are unchanged.

   When the first response from the reporting node has been received
   indicating control activation (OC-Validity-Duration>0), LCT is set to
   the time of activation, and the leaky bucket counter is initialized
   to the parameter TAU0 (preferably configurable) which is 0 or larger
   but less than or equal to TAU.

   TAU can assume any positive real number value and is not necessarily
   bounded by T.

   TAU=4*T is a reasonable compromise between burst size and abatement
   rate adaptation at low offered rate.

   Note that specification of a value for TAU, and any communication or
   coordination between servers, is beyond the scope of this document.

7.3.2.  Priority Treatment

   A reference algorithm is shown below.

   Priority case:






Donovan & Noel           Expires August 15, 2019               [Page 15]

Internet-Draft       Diameter Overload Rate Control        February 2019


     // T: inter-transmission interval, set to 1 / OC-Maximum-Rate
     // TAU1: tolerance parameter of no priority Diameter requests
     // TAU2: tolerance parameter of priority Diameter requests
     // ta: arrival time of the most recent arrival
     // LCT: arrival time of last Diameter request that
     //      was sent to the server
     //      (initialized to the first arrival time)
     // X: current value of the leaky bucket counter (initialized to
     //    TAU0)

     // After most recent arrival, calculate auxiliary variable Xp
     Xp = X - (ta - LCT);

    if (AnyRequestReceived && Xp <= TAU1) || (PriorityRequestReceived &&
     Xp <= TAU2 && Xp > TAU1) {
       // Transmit Diameter request
       // Update X and LCT
       X = max (0, Xp) + T;
       LCT = ta;
     } else {
       // Apply abatement treatment to Diameter request
       // Do not update X and LCT
     }

   The reacting node is responsible for applying message priority and
   for maintaining two categories of requests: Request candidates for
   reduction, requests not subject to reduction (except under
   extenuating circumstances when there aren't any messages in the first
   category that can be reduced).

   Accordingly, the proposed Leaky bucket implementation is modified to
   support priority using two thresholds for Diameter requests in the
   set of request candidates for reduction.  With two priorities, the
   proposed Leaky bucket requires two thresholds TAU1 < TAU2:

   o  All new requests would be admitted when the leaky bucket counter
      is at or below TAU1,

   o  Only higher priority requests would be admitted when the leaky
      bucket counter is between TAU1 and TAU2,

   o  All requests would be rejected when the bucket counter is above
      TAU2.

   This can be generalized to n priorities using n thresholds for n>2.

   With a priority scheme that relies on two tolerance parameters (TAU2
   influences the priority traffic, TAU1 influences the non-priority



Donovan & Noel           Expires August 15, 2019               [Page 16]

Internet-Draft       Diameter Overload Rate Control        February 2019


   traffic), always set TAU1 <= TAU2 (TAU is replaced by TAU1 and TAU2).
   Setting both tolerance parameters to the same value is equivalent to
   having no priority.  TAU1 influences the admitted rate the same way
   as TAU does when no priority is set.  And the larger the difference
   between TAU1 and TAU2, the closer the control is to strict priority
   queuing.

   TAU1 and TAU2 can assume any positive real number value and is not
   necessarily bounded by T.

   Reasonable values for TAU0, TAU1 & TAU2 are:

   o  TAU0 = 0,

   o  TAU1 = 1/2 * TAU2, and

   o  TAU2 = 10 * T.

   Note that specification of a value for TAU1 and TAU2, and any
   communication or coordination between servers, is beyond the scope of
   this document.

7.3.3.  Optional Enhancement: Avoidance of Resonance

   As the number of reacting node sources of traffic increases and the
   throughput of the reporting node decreases, the maximum rate admitted
   by each reacting node needs to decrease, and therefore the value of T
   becomes larger.  Under some circumstances, e.g. if the traffic arises
   very quickly simultaneously at many sources, the occupancies of each
   bucket can become synchronized, resulting in the admissions from each
   source being close in time and batched or very 'peaky' arrivals at
   the reporting node, which not only gives rise to control instability,
   but also very poor delays and even lost messages.  An appropriate
   term for this is 'resonance' [Erramilli].

   If the network topology is such that resonance can occur, then a
   simple way to avoid resonance is to randomize the bucket occupancy at
   two appropriate points -- at the activation of control and whenever
   the bucket empties -- as described below.

   After updating the value of the leaky bucket to X', generate a value
   u as follows:

   if X' > 0, then u=0

   else if X' <= 0, then let u be set to a random value uniformly
   distributed between -1/2 and +1/2




Donovan & Noel           Expires August 15, 2019               [Page 17]

Internet-Draft       Diameter Overload Rate Control        February 2019


   Then (only) if the arrival is admitted, increase the bucket content
   by an amount T + uT, which will therefore be just T if the bucket
   hadn't emptied, or lie between T/2 and 3T/2 if it had.

   This randomization should also be done when control is activated,
   i.e. instead of simply initializing the leaky bucket counter to TAU0,
   initialize it to TAU0 + uT, where u is uniformly distributed as
   above.  Since activation would have been a result of response to a
   request sent by the reacting node, the second term in this expression
   can be interpreted as being the bucket increment following that
   admission.

   This method has the following characteristics:

   o  If TAU0 is chosen to be equal to TAU and all sources activate
      control at the same time due to an extremely high request rate,
      then the time until the first request admitted by each reacting
      node would be uniformly distributed over [0,T];

   o  The maximum occupancy is TAU + (3/2)T, rather than TAU + T without
      randomization;

   o  For the special case of 'classic gapping' where TAU=0, then the
      minimum time between admissions is uniformly distributed over
      [T/2, 3T/2], and the mean time between admissions is the same,
      i.e. T+1/R where R is the request arrival rate.

   o  At high load randomization rarely occurs, so there is no loss of
      precision of the admitted rate, even though the randomized
      'phasing' of the buckets remains.

8.  IANA Consideration

8.1.  AVP Codes

   New AVPs defined by this specification are listed in Section 6.  All
   AVP codes are allocated from the 'Authentication, Authorization, and
   Accounting (AAA) Parameters' AVP Codes registry.

8.2.  OC-Supported-Features

   As indicated in Section 6.1.1, a new allocation is required in the
   OC-Feature-Vector AVP.








Donovan & Noel           Expires August 15, 2019               [Page 18]

Internet-Draft       Diameter Overload Rate Control        February 2019


8.3.  New DOIC report types

   All DOIC report types defined in the future MUST indicate whether or
   not the rate algorithm can be used with that report type.

9.  Security Considerations

   The rate overload abatement mechanism is an extension to the base
   Diameter overload mechanism.  As such, all of the security
   considerations outlined in [RFC7683] apply to the rate overload
   abatement mechanism.

   In addition, the rate algorithm could be used to handle DoS attacks
   more effectively than the loss algorithm.

10.  Acknowledgements

   Lionel Morand for his contributions to the document.

11.  References

11.1.  Normative References

   [I-D.ietf-dime-agent-overload]
              Donovan, S., "Diameter Agent Overload", draft-ietf-dime-
              agent-overload-00 (work in progress), December 2014.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC6733]  Fajardo, V., Ed., Arkko, J., Loughney, J., and G. Zorn,
              Ed., "Diameter Base Protocol", RFC 6733,
              DOI 10.17487/RFC6733, October 2012,
              <https://www.rfc-editor.org/info/rfc6733>.

   [RFC7683]  Korhonen, J., Ed., Donovan, S., Ed., Campbell, B., and L.
              Morand, "Diameter Overload Indication Conveyance",
              RFC 7683, DOI 10.17487/RFC7683, October 2015,
              <https://www.rfc-editor.org/info/rfc7683>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.






Donovan & Noel           Expires August 15, 2019               [Page 19]

Internet-Draft       Diameter Overload Rate Control        February 2019


11.2.  Informative References

   [Erramilli]
              Erramilli, A. and L. Forys, "Traffic Synchronization
              Effects In Teletraffic Systems", 1991.

   [RFC7415]  Noel, E. and P. Williams, "Session Initiation Protocol
              (SIP) Rate Control", RFC 7415, DOI 10.17487/RFC7415,
              February 2015, <https://www.rfc-editor.org/info/rfc7415>.

Authors' Addresses

   Steve Donovan (editor)
   Oracle
   7460 Warren Pkwy # 300
   Frisco, Texas  75034
   United States

   Email: srdonovan@usdonovans.com


   Eric Noel
   AT&T Labs
   200s Laurel Avenue
   Middletown, NJ  07747
   United States

   Email: ecnoel@research.att.com























Donovan & Noel           Expires August 15, 2019               [Page 20]