Network Working Group                                       Matt Mathis
INTERNET-DRAFT                         Pittsburgh Supercomputing Center
Expiration Date: May 1997                                      Nov 1996
Empirical Bulk Transfer Capacity
< draft-ietf-bmwg-ippm-treno-btc-00.txt >
Status of this Document
This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its
areas, and its working groups. Note that other groups may also
distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-
Drafts as reference material or to cite them other than as
``work in progress.''
To learn the current status of any Internet-Draft, please check
the ``1id-abstracts.txt'' listing contained in the Internet-
Drafts Shadow Directories on ftp.is.co.za (Africa),
nic.nordu.net (Europe), munnari.oz.au (Pacific Rim),
ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).
Abstract:
Bulk Transport Capacity (BTC) is a measure of a network's ability
to transfer significant quantities of data with a single
congestion-aware transport connection (e.g. state-of-the-art
TCP). For many applications the BTC of the underlying network
dominates the overall elapsed time for the application, and
thus dominates the performance as perceived by a user.
The BTC is a property of an IP cloud (links, routers, switches,
etc) between a pair of hosts. It does not include the hosts
themselves (or their transport-layer software). However,
congestion control is crucial to the BTC metric because the
Internet depends on the end systems to fairly divide the
available bandwidth on the basis of common congestion behavior.
The BTC metric is based on the performance of a reference
congestion control algorithm that has particularly uniform and
stable behavior.
Introduction
This Internet-draft is likely to become one section of some
future, larger document covering several different metrics.
Motivation:
Bulk Transport Capacity (BTC) is a measure of a network's ability
to transfer significant quantities of data with a single
congestion-aware transport connection (e.g. state-of-the-art
TCP). For many applications the BTC of the underlying network
dominates the overall elapsed time for the application, and
thus dominates the performance as perceived by a user. Examples
of such applications include ftp and other network copy
utilities.
The BTC is a property of an IP cloud (links, routers, switches,
etc) between a pair of hosts. It does not include the hosts
themselves (or their transport-layer software). However,
congestion control is crucial to the BTC metric because the
Internet depends on the end systems to fairly divide the
available bandwidth on the basis of common congestion behavior.
The BTC metric is based on the performance of a reference
congestion control algorithm that has particularly uniform and
stable behavior. The reference algorithm is documented in
appendix A, and can be implemented in TCP using the SACK option
[RFC2018]. It is similar in style and behavior to the congestion
control algorithms which have been in standard use [Jacobson88,
Stevens94, Stevens96] in the Internet.
Since the behavior of the reference congestion control algorithm
is well defined and implementation independent, it will be
possible to confirm that different measurements only reflect
properties of the network and not the end-systems. As such, BTC
will be a true network metric. [A strong definition of "network
metric" belongs in the framework document. A network metric should be:
   - truly indicative of what *could* be done with TCP or another
     good transport layer
   - sensitive to weaknesses in the routers, switches, links, etc.
     of the IP cloud that would also cause problems for production
     transport layers
   - *not* sensitive to weaknesses in common host hardware or
     software, such as current production TCP implementations, that
     can be removed by doing transport right on the hosts
   - complete as a methodology, in that little or no additional deep
     knowledge of state-of-the-art measurement technology is needed
Others may come to mind. - Guy Almes]
Implementing standard congestion control algorithms within the
diagnostic eliminates calibration problems associated with the
non-uniformity of current TCP implementations. However, like all
empirical metrics it introduces new problems, most notably the
need to certify the correctness of the implementation and to
verify that there are not systematic errors due to limitations of
the tester.
This version of the metric is based on the tool TReno (pronounced
tree-no), which implements the reference congestion control
algorithm over either traceroute-style UDP and ICMP messages or
ICMP ping packets.
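As a rough illustration of this probing style (a sketch only, not
TReno itself), the following Python fragment sends a single
TTL-limited UDP probe and treats the resulting ICMP message as the
"acknowledgment" that would drive the congestion control algorithm.
All names and parameters are illustrative, and the raw ICMP socket
requires administrative privileges.

   # Sketch only: one traceroute-style probe.  The ICMP reply (TTL
   # exceeded or port unreachable) stands in for a TCP ACK; a timeout
   # stands in for a lost segment.
   import socket
   import time

   def probe_once(dst, size=1024, ttl=255, port=33434, timeout=2.0):
       udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
       udp.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
       icmp = socket.socket(socket.AF_INET, socket.SOCK_RAW,
                            socket.getprotobyname("icmp"))
       icmp.settimeout(timeout)
       try:
           sent_at = time.time()
           udp.sendto(b"\0" * size, (dst, port))
           reply, addr = icmp.recvfrom(1500)
           return addr[0], time.time() - sent_at
       except socket.timeout:
           return None, None
       finally:
           udp.close()
           icmp.close()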
Many of the calibration checks can be included in the measurement
process itself. The TReno program includes error and warning
messages for many conditions which indicate either problems with
the infrastructure or in some cases problems with the measurement
process. Other checks need to be performed manually.
Metric Name: TReno-Type-P-Bulk-Transfer-Capacity
(e.g. TReno-UDP-BTC)
Metric Parameters: A pair of IP addresses, Src (aka "tester")
and Dst (aka "target"), a start time T and initial
MTU.
[The framework document needs a general way to address additional
constraints that may be applied to metrics: E.g. for a
NetNow-style test between hosts on two exchange points, some
indication of/control over the first hop is needed.]
Definition: The average data rate attained by the reference
congestion control algorithm, while using type-P
packets to probe the forward (Src to Dst) path.
In the case of ICMP ping, these messages also probe
the return path.
Metric Units: bits per second
Ancillary results and output used to verify
the proper measurement procedure and calibration:
- Statistics over the entire test
(data transferred, duration and average rate)
- Statistics from the equilibrium portion of the test
(data transferred, duration, average rate, and number
of equilibrium congestion control cycles)
- Path property statistics (MTU, Min RTT, max cwnd in
equilibrium and max cwnd during Slow-start)
- Statistics from the non-equilibrium portion of the
test (nature and number of non-equilibrium events).
- Estimated load/BW/buffering used on the return path.
- Warnings about data transmission abnormalities.
(e.g. packets out-of-order)
- Warnings about conditions which may affect metric
accuracy. (e.g. insufficient tester buffering)
- Alarms about serious data transmission abnormalities.
(e.g. data duplicated in the network)
- Alarms about tester internal inconsistencies and events
which might invalidate the results.
- IP address/name of the responding target.
- TReno version.
Method: Run the treno program on the tester with the chosen
packet type addressed to the target. Record both the
BTC and the ancillary results.
Manual calibration checks. (See detailed explanations below).
- Verify that the tester and target have sufficient raw
bandwidth to sustain the test.
- Verify that the tester and target have sufficient
buffering to support the window needed by the test.
- Verify that there is no other system activity on the
tester or target.
- Verify that the return path is not a bottleneck at the
load needed to sustain the test.
- Verify that the IP address reported in the replies is some
interface of the selected target.
Version control:
- Record the precise TReno version (-V switch)
- Record the precise tester OS version, CPU version and
speed, interface type and version.
Discussion:
We do not use existing TCP implementations due to a number of
problems which make them difficult to calibrate as metrics. The
Reno congestion control algorithms are subject to a number of
chaotic or turbulent behaviors which introduce non-uniform
performance [Floyd95, Hoe95, mathis96]. Non-uniform performance
introduces substantial non-calibratable uncertainty when used as
a metric. Furthermore, a number of people [Paxson:testing,
Comer:testing, ??others??] have observed extreme diversity
between different TCP implementations, raising doubts about
repeatability and consistency between different TCP-based
measures.
There are many possible reasons why a TReno measurement might not
agree with the performance obtained by a TCP based application.
Some key ones include: older TCPs missing key algorithms such as
MTU discovery, support for large windows or SACK, or mistuning of
either the data source or sink. Some network conditions which
need the newer TCP algorithms are detected by TReno and reported
in the ancillary results. Other documents will cover methods to
diagnose the difference between TReno and TCP performance.
Note that the BTC metric is defined specifically to be the
average data rate between the source and destination hosts. The
ancillary results are designed to detect a number of possible
measurement problems, and in a few cases pathological behaviors in
the network. The ancillary results should not be used as metrics
in their own right. The discussion below assumes that the TReno
algorithm is implemented as a user-mode program running under a
standard operating system. Other implementations, such as a
dedicated measurement instrument, can have stronger built-in
calibration checks.
The raw performance (bandwidth) limitations of both the tester
and target SHOULD be measured by running TReno in a controlled
environment (e.g. a bench test). Ideally the observed
performance limits should be validated by diagnosing the nature
of the bottleneck and verifying that it agrees with other
benchmarks of the tester and target (e.g. that TReno performance
agrees with direct measures of backplane or memory bandwidth or
other bottleneck as appropriate.) These raw performance
limitations MAY be obtained in advance and recorded for later
reference. Currently no routers are reliable targets, although
under some conditions they can be used for meaningful
measurements. For most people testing between a pair of modern
computer systems at a few megabits per second or less, the tester
and target are unlikely to be the bottleneck.
TReno may not be accurate, and SHOULD NOT be used as a formal
metric at rates above half of the known tester or target limits.
This is because during Slow-start TReno needs to be able to send
bursts which are twice the average data rate.
[need exception if the 1st hop LAN is the limit in all cases?]
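For example (a hypothetical helper, not part of TReno), the ceiling
below which TReno results may be treated as a formal metric follows
directly from the bench-test limits and the factor-of-two burst
requirement above:

   # Hypothetical helper: the highest rate (bits per second) at which
   # TReno SHOULD be used as a formal metric, given the raw limits of
   # the tester and target.  Slow-start bursts reach twice the average
   # data rate, so the ceiling is half of the weaker of the two limits.
   def formal_metric_ceiling(tester_limit_bps, target_limit_bps):
       return min(tester_limit_bps, target_limit_bps) / 2.0

   # e.g. a 20 Mbit/s tester and a 45 Mbit/s target give a 10 Mbit/s
   # ceiling.
   print(formal_metric_ceiling(20e6, 45e6))   # -> 10000000.0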
Verifying that the tester and target have sufficient buffering is
difficult. If they do not have sufficient buffer space, then
losses at their own queues may contribute to the apparent losses
along the path. There are several difficulties in verifying the
tester and target buffer capacity. First, there are no good
tests of the target's buffer capacity at all. Second, all
validation of the tester's buffering depends in some way on the
accuracy of reports by the tester's own operating system. Third,
there is the confusing result that in many circumstances
(particularly when there is more than sufficient average
performance) insufficient buffering does not adversely
impact measured performance.
TReno separately instruments the performance of the equilibrium
and non-equilibrium portions of the test. This is because
TReno's behavior is intrinsically more accurate during equilibrium.
If TReno cannot sustain equilibrium, it suggests either serious
problems with the network or that the expected performance is
lower than can be accurately measured by TReno.
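The distinction can be made concrete with a small sketch (the record
format here is hypothetical, not TReno's actual output): the BTC is
the overall average data rate, while the equilibrium statistics are
reported separately as ancillary results.

   # Sketch only; the (bytes_acked, seconds, in_equilibrium) record
   # format is hypothetical.  The BTC is the overall average data rate
   # in bits per second; the equilibrium rate is an ancillary result.
   def summarize(portions):
       total_bytes = sum(b for b, s, eq in portions)
       total_secs  = sum(s for b, s, eq in portions)
       eq_bytes    = sum(b for b, s, eq in portions if eq)
       eq_secs     = sum(s for b, s, eq in portions if eq)
       return {
           "btc_bps":         8.0 * total_bytes / total_secs,
           "equilibrium_bps": 8.0 * eq_bytes / eq_secs if eq_secs else None,
       }

   # e.g. a 2 s Slow-start moving 500 kB followed by 18 s of
   # equilibrium moving 9 MB:
   print(summarize([(500e3, 2.0, False), (9e6, 18.0, True)]))
   # -> {'btc_bps': 3800000.0, 'equilibrium_bps': 4000000.0}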
TReno reports (as calibration alarms) any events where transmit
packets were refused due to insufficient buffer space. It
reports a warning if the maximum measured congestion window is
larger than the reported buffer space. Although these checks are
likely to be sufficient in most cases, they are probably not
sufficient in all cases, and will be the subject of future research.
Note that on a timesharing or multi-tasking system, other
activity on the tester introduces burstiness due to operating
system scheduler latency. Therefore, it is very important that
there be no other system activity during a test. This SHOULD be
confirmed with other operating system specific tools.
In traceroute mode, TReno computes and reports the load on the
return path. Unlike real TCP, TReno cannot distinguish between
losses on the forward and return paths, so ideally we want the
return path to introduce as little loss as possible. The best
way to test the return path is with TReno ICMP mode using
ACK-sized messages, and to verify that the measured packet rate
improves by a factor of two. [More research needed]
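As one way of expressing that check (a hypothetical helper, and
subject to the same caveat), the ratio of the two measured packet
rates should be close to two when the return path is not a
bottleneck:

   # Hypothetical helper for the factor-of-two check above: compare
   # the packet rate measured with ACK-sized ICMP probes against the
   # packet rate of the regular measurement.
   def return_path_ratio(ack_sized_pps, regular_pps):
       return ack_sized_pps / regular_pps

   # e.g. 900 packets/s with ACK-sized probes vs. 450 packets/s
   # regularly suggests the return path is not the bottleneck.
   print(return_path_ratio(900.0, 450.0))   # -> 2.0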
In ICMP mode TReno measures the net effect of both the forward
and return paths on a single data stream. Bottlenecks and packet
losses in the forward and return paths are treated equally.
It would raise the accuracy of TReno traceroute mode if the ICMP
TTL exceeded messages were generated at the target and
transmitted along the return path with elevated priority (reduced
losses and queuing delays).
People using the TReno metric as part of procurement documents
should be aware that in many circumstances MTU has an intrinsic
and large impact on overall path performance. Under some
conditions the difficulty in meeting a given performance
specification is inversely proportional to the square of the
path MTU. (e.g. halving the specified MTU makes meeting the
bandwidth specification 4 times harder.)
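This relationship can be expressed as a small sketch (illustrative
only), treating "difficulty" as the factor by which the requirement
tightens relative to a reference MTU:

   # Illustrative only: relative difficulty of meeting a fixed
   # bandwidth specification as the path MTU changes, using the
   # inverse-square relationship described above.
   def relative_difficulty(new_mtu, reference_mtu):
       return (reference_mtu / float(new_mtu)) ** 2

   # e.g. halving the MTU from 1500 to 750 bytes makes meeting the
   # specification 4 times harder.
   print(relative_difficulty(750, 1500))   # -> 4.0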
In metric mode, TReno presents exactly the same load to the
network as a properly tuned state-of-the-art TCP between the same
pair of hosts. Although the connection is not transferring
useful data, it is no more wasteful than fetching an unwanted
web page that takes the same time to transfer.
References
[RFC2018] Mathis, M., Mahdavi, J., Floyd, S., Romanow, A., "TCP
Selective Acknowledgment Options",
ftp://ds.internic.net/rfc/rfc2018.txt
[Jacobson88] Jacobson, V., "Congestion Avoidance and Control",
Proceedings of SIGCOMM '88, Stanford, CA., August 1988.
[Stevens94] Stevens, W., "TCP/IP Illustrated, Volume 1: The
Protocols", Addison-Wesley, 1994.
[Stevens96] Stevens, W., "TCP Slow Start, Congestion Avoidance,
Fast Retransmit, and Fast Recovery Algorithms", Work in progress
ftp://ietf.org/internet-drafts/draft-stevens-tcpca-spec-01.txt
[Floyd95] Floyd, S., "TCP and successive fast retransmits",
February 1995, Obtain via ftp://ftp.ee.lbl.gov/papers/fastretrans.ps.
[Hoe95] Hoe, J., "Startup dynamics of TCP's congestion control
and avoidance schemes". Master's thesis, Massachusetts Institute
of Technology, June 1995.
[mathis96] Mathis, M., and Mahdavi, J., "Forward Acknowledgment:
Refining TCP Congestion Control", Proceedings of ACM SIGCOMM '96,
Stanford, CA., August 1996.
Author's Address
Matt Mathis
email: mathis@psc.edu
Pittsburgh Supercomputing Center
4400 Fifth Ave.
Pittsburgh PA 15213
----------------------------------------------------------------
Appendix A:
Currently the best existing description of the algorithm is in
the "FACK technical note" below http://www.psc.edu/networking/tcp.html.
Within TReno, all invocations of "bounding parameters" will be
reported as warnings.
The FACK technical note will be revised for TReno, supplemented by a
code fragment and included here.
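Until that revision is available, the following Python sketch (an
illustration of the general style of window adjustment, not the
normative reference algorithm) shows Slow-start growth of the
congestion window followed by additive-increase/multiplicative-
decrease congestion avoidance of the kind described in [Jacobson88,
Stevens96].

   # Sketch only: the normative reference algorithm is the one in the
   # FACK technical note, not this fragment.  cwnd and ssthresh are in
   # units of segments.
   class WindowState:
       def __init__(self, initial_ssthresh=64):
           self.cwnd = 1.0
           self.ssthresh = float(initial_ssthresh)

       def on_ack(self):
           if self.cwnd < self.ssthresh:
               self.cwnd += 1.0              # Slow-start: roughly doubles per RTT
           else:
               self.cwnd += 1.0 / self.cwnd  # Congestion avoidance: +1 per RTT

       def on_loss(self):
           # Multiplicative decrease: halve the window, then resume
           # congestion avoidance from the reduced window.
           self.ssthresh = max(self.cwnd / 2.0, 2.0)
           self.cwnd = self.ssthresh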