Benchmarking Methodology Working Group | C. Davids |
Internet-Draft | Illinois Institute of Technology |
Intended status: Informational | V. Gurbani |
Expires: July 11, 2013 | Bell Laboratories, Alcatel-Lucent |
 | S. Poretsky |
 | Allot Communications |
 | January 07, 2013 |
Methodology for Benchmarking SIP Networking Devices
draft-ietf-bmwg-sip-bench-meth-07
This document describes the methodology for benchmarking Session Initiation Protocol (SIP) performance as described in the SIP benchmarking terminology document. The methodology and terminology are to be used for benchmarking signaling plane performance with varying signaling and media load. Both scale and establishment rate are measured by signaling plane performance. The SIP Devices to be benchmarked may be a single device under test (DUT) or a system under test (SUT). Benchmarks can be obtained and compared for different types of devices such as a SIP Proxy Server, an SBC, and a server paired with a media relay or Firewall/NAT device.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on July 11, 2013.
Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
In this document, the key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as described in BCP 14, conforming to [RFC2119] and indicate requirement levels for compliant implementations.
Terms specific to SIP [RFC3261] performance benchmarking are defined in [I-D.sip-bench-term].
RFC 2119 defines the use of these key words to help make the intent of standards track documents as clear as possible. While this document uses these keywords, this document is not a standards track document. The term Throughput is defined in [RFC2544].
This document describes the methodology for benchmarking Session Initiation Protocol (SIP) performance as described in the terminology document [I-D.sip-bench-term]. The methodology and terminology are to be used for benchmarking signaling plane performance with varying signaling and media load. Both scale and establishment rate are measured by signaling plane performance.
The SIP Devices to be benchmarked may be a single device under test (DUT) or a system under test (SUT). The DUT is a SIP Server, which may be any [RFC3261] conforming device. The SUT can be any device or group of devices containing RFC 3261 conforming functionality along with Firewall and/or NAT functionality. This enables benchmarks to be obtained and compared for different types of devices such as a SIP Proxy Server, an SBC, and a SIP proxy server paired with a media relay or Firewall/NAT device. SIP Associated Media benchmarks can also be made when testing SUTs.
The test cases covered in this methodology document provide benchmark metrics for Registration Rate, SIP Session Establishment Rate, Session Capacity, and IM Rate. These can be benchmarked with or without Associated Media. Some cases are also included to cover Forking, Loop detection, Encrypted SIP, and SIP Flooding. The test topologies that can be used are described in the Test Setup section. Topologies are provided for benchmarking of a DUT or SUT. Benchmarking with Associated Media can be performed when using a SUT.
SIP permits a wide range of configuration options, which are also explained in the Test Setup section. Benchmark metrics may be affected by Associated Media. The values selected for Session Duration and Media Streams Per Session also allow the metrics to be measured without Associated Media. Session Setup Rate may be affected by the value selected for Maximum Sessions Attempted. The benchmark for Session Establishment Rate is measured with a fixed value for maximum Session Attempts.
Finally, the overall value of these tests is to provide a basis for comparing multiple SIP implementations. One way to use these tests is to derive benchmarks with SIP devices from Vendor-A, derive a new set of benchmarks with similar SIP devices from Vendor-B, and compare the results of Vendor-A and Vendor-B. This document does not make any claims about the interpretation of such results.
Familiarity with the benchmarking models in Section 2.2 of [I-D.sip-bench-term] is assumed. Figures 1 through 10 in [I-D.sip-bench-term] contain the canonical topologies that can be used to perform the benchmarking tests listed in this document.
Test cases may be performed with any transport protocol supported by SIP. This includes, but is not limited to, SIP TCP, SIP UDP, and TLS. The SIP transport protocol used must be reported with benchmarking results.
The Signaling Server is defined in the companion terminology document ([I-D.sip-bench-term], Section 3.2.2). It is a SIP-speaking device that complies with RFC 3261. Conformance to [RFC3261] is assumed for all tests. The Signaling Server may be the DUT or a component of a SUT. The Signaling Server may include Firewall and/or NAT functionality. The components of the SUT may be a single physical device or separate devices.
Some tests require Associated Media to be present for each SIP session. The test topologies to be used when benchmarking SUT performance for Associated Media are shown in [I-D.sip-bench-term], Figures 4 and 5.
The test cases specified in this document provide SIP performance independent of the protocol used for the media stream. Any media protocol supported by SIP may be used. This includes, but is not limited to, RTP, RTSP, and SRTP. The protocol used for Associated Media MUST be reported with benchmarking results.
Benchmarking results may vary with the number of media streams per SIP session. When benchmarking a SUT for voice, a single media stream is used. When benchmarking a SUT for voice and video, two media streams are used. The number of Associated Media Streams MUST be reported with benchmarking results.
SUT performance benchmarks may vary with the duration of SIP sessions. Session Duration MUST be reported with benchmarking results. A Session Duration of zero seconds indicates transmission of a BYE immediately following successful SIP establishment, indicated by receipt of a 200 OK. An infinite Session Duration indicates that a BYE is never transmitted.
DUT and SUT performance benchmarks may vary with the rate of attempted sessions offered by the Tester. Attempted Sessions per Second MUST be reported with benchmarking results.
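The two boundary values of Session Duration can be made concrete with a minimal sketch. The function name bye_time and its arguments are hypothetical and are not defined by this document or by [I-D.sip-bench-term]; the sketch only assumes that the Tester records the time at which the 200 OK was received.

```python
import math
from typing import Optional


def bye_time(ok_received_at: float, session_duration: float) -> Optional[float]:
    """Return the time (in seconds) at which the Tester sends BYE.

    ok_received_at   -- time the 200 OK for the INVITE was received
    session_duration -- configured Session Duration in seconds;
                        0 means "BYE immediately", math.inf means "never"
    Returns None when no BYE is ever transmitted.
    """
    if session_duration < 0:
        raise ValueError("Session Duration cannot be negative")
    if math.isinf(session_duration):
        return None  # infinite duration: the session is never torn down
    return ok_received_at + session_duration
```

With a duration of zero the BYE is scheduled at the instant the 200 OK arrives; with an infinite duration the session stays established for the rest of the test.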
The purpose of this document is to benchmark SIP performance; this document does not benchmark stability of SIP systems under stressful conditions such as a high rate of Attempted Sessions per Second.
In order to benchmark the test cases uniformly in Section 6, the algorithm described in this section should be used. Both a prose description of the algorithm and a pseudo-code description are provided.
The goal is to find the largest value of a SIP session-request-rate, measured in sessions-per-second, which the DUT/SUT can process with zero errors. To discover that number, an iterative process (defined below) is used to find a candidate for this rate. Once the candidate rate has been found, the DUT/SUT is subjected to an offered load whose arrival rate is set to that of the candidate rate. This test is run for an extended period of time, which is referred to as infinity, and which is, itself, a parameter of the test labeled T in the pseudo-code. This latter phase of testing is called the steady-state phase. If errors are encountered during this steady-state phase, then the candidate rate is reduced by a defined percent, also a parameter of the test, and the steady-state phase is entered again until a final (new) steady-state rate is achieved.
The iterative process itself is defined as follows: a starting rate of 100 sessions per second (sps) is selected. The test is executed for the time period identified by t in the pseudo-code below. If no failures occur, the rate is increased to 150 sps and again tested for time period t. The attempt rate is ramped up in this way, by 50% at each step, until a failure is encountered before the end of the test time t. Then an attempt rate is calculated that is higher than the last successful attempt rate by a quantity equal to half the difference between the rate at which failures occurred and the last successful rate. If this new attempt rate also results in errors, the procedure is repeated: the failing rate becomes the new upper bound, and the next attempt rate again lies halfway between the last successful rate and that bound. Continuing in this way, an attempt rate without errors is found. The operator can specify the margin of error using the parameter G, measured in units of sessions per second.
The pseudo-code corresponding to the description above follows.
   ; ---- Parameters of test, adjust as needed
   t  := 5000   ; local maximum; used to figure out largest
                ; value
   T  := 50000  ; global maximum; once largest value has been
                ; figured out, pump this many requests before calling
                ; the test a success
   m  := {...}  ; other attributes that affect testing, such
                ; as media streams, etc.
   s  := 100    ; Initial session attempt rate (in sessions/sec)
   G  := 5      ; granularity of results - the margin of error in sps
   C  := 0.05   ; calibration amount: How much to back down if we
                ; have found candidate s but cannot send at rate s for
                ; time T without failures

   ; ---- End of parameters of test
   ; ---- Initialization of flags, candidate values and upper bounds

   f   := false ; indicates that you had a success after the upper limit
   F   := false ; indicates that test is done
   c   := 0     ; indicates that we have found an upper limit
   s'  := 0     ; last successful rate (none yet)
   s'' := 0     ; upper bound, i.e. lowest failing rate (none yet)

   proc main
      find_largest_value  ; First, figure out the largest value.

      ; Now that the largest value (saved in s) has been figured out,
      ; use it for sending out s requests/s and send out T requests.

      do {
         send_traffic(s, m, T)   ; send_traffic not shown
         if (all requests succeeded) {
            F := true            ; test is done
         } else if (one or more requests fail) {
            s := s - (C * s)     ; Reduce s by calibration amount
                                 ; and repeat the steady-state phase
         }
      } while (F == false)
   end proc

   proc find_largest_value
      ; Iterative process to figure out the largest value we can
      ; handle with no failures
      do {
         send_traffic(s, m, t)   ; Send s requests/sec with m
                                 ; characteristics until t requests have
                                 ; been sent
         if (all requests succeeded) {
            s' := s              ; save candidate value of metric
            if (c == 0) {
               s := s + (0.5 * s)
            } else if ((c == 1) && ((s'' - s') > 2*G)) {
               s := s + (0.5 * (s'' - s))
            } else if ((c == 1) && ((s'' - s') <= 2*G)) {
               f := true
            }
         } else if (one or more requests fail) {
            c := 1               ; we have found an upper bound for the metric
            s'' := s             ; save new upper bound
            s := s - (0.5 * (s - s'))
         }
      } while (f == false)
   end proc
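As an illustration only, the search described above can be sketched in Python. This is a minimal sketch under stated assumptions, not a normative implementation: the callable send stands in for send_traffic, which this document does not show; fake_dut is a hypothetical simulated device that succeeds whenever the offered rate is at or below a fixed capacity; and max_rounds is a safety limit added here so that the sketch cannot loop forever. The parameter m is omitted for brevity.

```python
from typing import Callable, Optional

# Hypothetical stand-in for send_traffic: offers `count` requests at `rate`
# sessions/sec and returns True only if every request succeeded.
SendTraffic = Callable[[float, int], bool]


def find_largest_value(send: SendTraffic, s: float = 100.0, t: int = 5000,
                       G: float = 5.0, max_rounds: int = 1000) -> float:
    """Ramp up by 50% until a failure, then bisect until the gap is <= 2*G."""
    good = 0.0                    # last rate that passed (s' in the pseudo-code)
    bad: Optional[float] = None   # lowest failing rate (s''), None until found
    for _ in range(max_rounds):
        if send(s, t):
            good = s
            if bad is None:
                s = s + 0.5 * s            # no upper bound yet: keep ramping
            elif bad - good > 2 * G:
                s = s + 0.5 * (bad - s)    # move halfway toward the bound
            else:
                return good                # within the margin of error
        else:
            bad = s                        # new upper bound
            s = s - 0.5 * (s - good)       # fall back halfway to last success
    raise RuntimeError("no error-free rate found within max_rounds")


def benchmark(send: SendTraffic, s: float = 100.0, t: int = 5000,
              T: int = 50000, G: float = 5.0, C: float = 0.05,
              max_rounds: int = 1000) -> float:
    """Find a candidate rate, then confirm it in the steady-state phase."""
    s = find_largest_value(send, s, t, G, max_rounds)
    for _ in range(max_rounds):
        if send(s, T):
            return s                       # steady state reached
        s = s - C * s                      # back off by the calibration amount
    raise RuntimeError("steady state not reached within max_rounds")


def fake_dut(capacity: float) -> SendTraffic:
    """Simulated DUT (an assumption): fails whenever rate exceeds capacity."""
    return lambda rate, count: rate <= capacity


if __name__ == "__main__":
    print(benchmark(fake_dut(240.0)))
```

Against the simulated device, the returned rate never exceeds the capacity and lies within 2*G of it. A real Tester would replace fake_dut with code that generates SIP traffic and inspects the responses.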
   SIP Transport Protocol = ___________________________
   (valid values: TCP|UDP|TLS|SCTP|specify-other)
   Session Attempt Rate = _____________________________
   (session attempts/sec)
   IS Media Attempt Rate = ____________________________
   (IS media attempts/sec)
   Total Sessions Attempted = _________________________
   (total sessions to be created over duration of test)
   Media Streams Per Session = _______________________
   (number of streams per session)
   Associated Media Protocol = _______________________
   (RTP|RTSP|specify-other)
   Media Packet Size = _______________________________
   (bytes)
   Media Offered Load = ______________________________
   (packets per second)
   Media Session Hold Time = _________________________
   (seconds)
   Establishment Threshold time = ____________________
   (seconds)
   Loop Detecting Option = ___________________________
   (on|off)
   Forking Option
     Number of endpoints request sent to = ___________
     (1, means forking is not enabled)
     Type of forking = _______________________________
     (serial|parallel)
   Authentication option = ___________________________________
   (on|off; if on, please see Notes 2 and 3 below).
Note 1: Total Sessions Attempted is used in the calculation of the Session Establishment Performance ([I-D.sip-bench-term], Section 3.4.5). It is the number of session attempts ([I-D.sip-bench-term], Section 3.1.6) that will be made over the duration of the test.
Note 2: When the Authentication Option is "on" the test tool must be set to ignore 401 and 407 failure responses in any test described as a "test to failure." If this is not done, all such tests will yield trivial benchmarks, as all attempt rates will lead to a failure after the first attempt.
Note 3: When the Authentication Option is "on" the DUT/SUT uses two transactions instead of one when it is establishing a session or accomplishing a registration. The first transaction ends with the 401 or 407. The second ends with the 200 OK or another failure message. The Test Organization interested in knowing how many times the EA was intended to send a REGISTER as distinct from how many times the EA wound up actually sending a REGISTER may wish to record the following data as well: Number of responses of the following type:
   401: _____________ (if authentication turned on; N/A otherwise)
   407: _____________ (if authentication turned on; N/A otherwise)
   Registration Rate = _______________________________
   (registrations per second)
   Re-registration Rate = ____________________________
   (registrations per second)
   Session Capacity = _________________________________
   (sessions)
   Session Overload Capacity = ________________________
   (sessions)
   Session Establishment Rate = ______________________
   (sessions per second)
   Session Establishment Performance = _______________
   (total established sessions/total sessions attempted)(no units)
   Session Attempt Delay = ___________________________
   (seconds)
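Notes 2 and 3 can be illustrated with a short sketch of how a test tool might tally final responses. The function summarize and its return layout are hypothetical, and treating every final response of 400 or above as a failure is an assumption of this sketch rather than a definition taken from this document.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

AUTH_CHALLENGES = (401, 407)  # responses that end the first transaction


def summarize(final_responses: Iterable[int],
              auth_on: bool) -> Dict[str, Optional[object]]:
    """Count failures and authentication challenges in a test run.

    final_responses -- status codes of the final responses seen by the tool
    auth_on         -- value of the Authentication option for this run
    """
    counts = Counter(final_responses)
    failures = 0
    for code, n in counts.items():
        if auth_on and code in AUTH_CHALLENGES:
            continue              # Note 2: challenges are not failures
        if code >= 400:           # assumption: 4xx-6xx count as failures
            failures += n
    # Note 3: report the 401/407 tallies only when authentication is on
    challenges = {c: counts[c] for c in AUTH_CHALLENGES} if auth_on else None
    return {"failures": failures, "challenges": challenges}
```

With authentication on, a run such as [401, 200, 401, 200, 503] yields one failure (the 503) and two recorded 401 challenges. With authentication off, the same 401 responses would count as failures and the challenge fields would be reported as N/A.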
IM Rate = _______________________________ (IM messages per second)
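Of the reported results, Session Establishment Performance is the only one given as a ratio. A minimal sketch of that calculation follows; the function name and the input checks are assumptions added for illustration.

```python
def session_establishment_performance(established: int, attempted: int) -> float:
    """Ratio of total established sessions to total sessions attempted.

    The result is unitless and lies between 0.0 and 1.0.
    """
    if attempted <= 0:
        raise ValueError("Total Sessions Attempted must be positive")
    if not 0 <= established <= attempted:
        raise ValueError("established must be between 0 and attempted")
    return established / attempted
```

For example, 49000 established sessions out of 50000 attempted gives a ratio of 0.98.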
This document does not require any IANA considerations.
Documents of this type do not directly affect the security of Internet or corporate networks as long as benchmarking is not performed on devices or systems connected to production networks. Security threats and how to counter them in SIP and the media layer are discussed in RFC3261, RFC3550, and RFC3711 and various other drafts. This document attempts to formalize a common methodology for benchmarking the performance of SIP devices in a lab environment.
The authors would like to thank Keith Drage and Daryl Malas for their contributions to this document. Dale Worley provided an extensive review that led to improvements in the document.
[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2544]  Bradner, S. and J. McQuaid, "Benchmarking Methodology for Network Interconnect Devices", RFC 2544, March 1999.
[I-D.sip-bench-term]  Davids, C., Gurbani, V. and S. Poretsky, "SIP Performance Benchmarking Terminology", Internet-Draft draft-ietf-bmwg-sip-bench-term-07, March 2012.
[RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M. and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002.