Internet DRAFT - draft-hurtig-tsvwg-transport-apis
draft-hurtig-tsvwg-transport-apis
Transport Area P. Hurtig, Ed.
Internet-Draft Karlstad University
Intended status: Informational S. Gjessing
Expires: June 18, 2014 M. Welzl
University of Oslo
M. Sustrik
December 15, 2013
Transport APIs
draft-hurtig-tsvwg-transport-apis-00
Abstract
Commonly used networking APIs are currently limited by the transport
layer's inability to expose services instead of protocols. An API/
application/user is therefore forced to use exactly the services that
are implemented by the selected transport. This document surveys
networking APIs and discusses how they can be improved by a more
expressive transport layer that hides and automatizes the choice of
the transport protocol.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on June 18, 2014.
Copyright Notice
Copyright (c) 2013 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
Hurtig, et al. Expires June 18, 2014 [Page 1]
Internet-Draft Transport APIs December 2013
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1. Requirements Language . . . . . . . . . . . . . . . . . . 3
2. Services Offered by IETF Transports . . . . . . . . . . . . . 3
3. General Networking APIs . . . . . . . . . . . . . . . . . . . 4
3.1. ZeroMQ . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.2. nanomsg . . . . . . . . . . . . . . . . . . . . . . . . . 6
3.3. enet . . . . . . . . . . . . . . . . . . . . . . . . . . 6
3.4. Java Message Service . . . . . . . . . . . . . . . . . . 7
3.5. Chrome Network Stack . . . . . . . . . . . . . . . . . . 7
3.6. CFNetwork . . . . . . . . . . . . . . . . . . . . . . . . 8
3.7. Apache Portable Runtime . . . . . . . . . . . . . . . . . 8
3.8. VirtIO . . . . . . . . . . . . . . . . . . . . . . . . . 8
4. Networking APIs with Exposed Transport . . . . . . . . . . . 8
4.1. Berkeley Sockets . . . . . . . . . . . . . . . . . . . . 8
4.2. Java Libraries . . . . . . . . . . . . . . . . . . . . . 8
4.3. Netscape Portable Runtime . . . . . . . . . . . . . . . . 9
4.4. Infiniband Verbs . . . . . . . . . . . . . . . . . . . . 10
4.5. Input/Output Completion Port . . . . . . . . . . . . . . 10
5. Security Considerations . . . . . . . . . . . . . . . . . . . 10
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10
7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 10
8. Comments Solicited . . . . . . . . . . . . . . . . . . . . . 10
9. References . . . . . . . . . . . . . . . . . . . . . . . . . 10
9.1. Normative References . . . . . . . . . . . . . . . . . . 10
9.2. Informative References . . . . . . . . . . . . . . . . . 11
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 12
1. Introduction
The intention of this document is to create an understanding of some
commonly used network APIs and how the mechanisms they provide could
possibly be enhanced via a richer set of transport services. A non-
comprehensive list of APIs is given, along with a brief description
and a discussion of how they relate to services provided by current
transports.
To understand what tools a transport system could have available to
better realize mechanisms that higher level APIs offer, the next
section gives a high-level (and most certainly incomplete) overview
Hurtig, et al. Expires June 18, 2014 [Page 2]
Internet-Draft Transport APIs December 2013
of services offered by transports that have been published by the
IETF or are currently being proposed.
This overview is followed by two sections describing different types
of transport APIs: general APIs and APIs exposing the underlying
transport.
The general APIs can intuitively benefit from a richer set of
transport services as they do not expose the underlying transport to
the application. Section 3 describe a subset of these APIs and
analyze how they can benefit from transport services. The complexity
of these APIs range from providing simple transport interfaces to
providing advanced communication libraries utilizing message-oriented
middleware. API-wise there are two broad classes of such middleware:
centralized solutions where a server manages the communication and
decentralized ones where the endpoints communicate directly.
Although there is no standard interface for these types of middleware
the JMS API (see Section 3.4) can be thought of as the canonical API
for centralized solutions and the BSD socket API, as implemented by
nanomsg (see Section 3.2), for the decentralized.
APIs that expose the underlying transport, including e.g. BSD
sockets, differ a lot from general APIs as they both require an
explicit choice of transport, and then expose this choice. This is a
significant limitation in the context of transport services, as an
explicit choice of transport also limits the amount of services that
can be used. It is, however, possible to enhance this type of APIs
as some transports provide services that are not fully exposed to
applications. Section 4 explains how such services can be used and
provides descriptions of the most common APIs and how they can be
enhanced.
1.1. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
2. Services Offered by IETF Transports
From [WJG11], TCP [RFC0793] [RFC5681], UDP [RFC0768], UDP-Lite
[RFC3828], SCTP [RFC4960] and DCCP [RFC4340] offer various
combinations of: TCP-like congestion control / "smooth" congestion
control (which is expected to have less jitter); application PDU
bundling (which is the mechanism called "Nagle" in TCP); error
detection (using a checksum with full or partial payload coverage);
reliability (yes/no); delivery order. The point of not always
requiring full reliability and ordered delivery is that these
Hurtig, et al. Expires June 18, 2014 [Page 3]
Internet-Draft Transport APIs December 2013
mechanisms can come at the cost of extra delay which is unnecessary
if these properties of the data transmission are not needed. After
the publication of [WJG11], some more features were defined, e.g.
SCTP now also offers partial reliability using a timer.
MPTCP [RFC6824] and SCTP offer multihoming for improved robustness
(as a backup in case a path fails), which is a mechanism that is
listed in [WJG11] but could perhaps be hidden from an application.
Similarly, it was shown in [WNG11] that the benefits of multi-
streaming (mapping multiple application streams onto one connection,
or "association" in SCTP terminology) can be exploited without
exposing this functionality to an application. Because of this
assumption, multi-streaming was not included as a service in [WJG11].
MPTCP and CMT-SCTP also use multiple paths to achieve better
performance, at the possible cost of some extra delay and jitter; as
discussed in Appendix A.2 of [RFC6897], an advanced MPTCP API could
allow applications to provide high-level guidance about its
requirements in terms of high bandwidth, low latency and jitter
stability, or high reliability.
The newly proposed Minion [MINION] has a somewhat different way of
translating some of the above mentioned lower-level transport
mechanisms (e.g. multi-streaming or partial reliability) into
application services. It provides message cancellation and has a
notion of superseding messages, i.e. a later message rendering a
prior one unnecessary. Ordered delivery is provided according to
pre-specified message dependencies, and a request-reply communication
model is offered (i.e. a message can be a reply to another message,
i.e. address the original message's reply-handler).
When applying multi-streaming, priorities between streams become a
mere scheduling decision. In the absence of multi-streaming, there
is at least one congestion control method in an RFC that is more
aggressive than standard Reno-like TCP (HighSpeed TCP [RFC3649]), and
there is also the more recent LEDBAT [RFC6817] which is specifically
designed for low-priority "scavenger" traffic. All in all, it is
probably correct to say that IETF transports are likely to be able to
honor priorities between data streams in one way or another.
3. General Networking APIs
This section introduces and provides an analysis of commonly used
networking APIs in the context of transport services. That is, how
are these APIs currently designed and how, if at all, can these APIs
be simplified and/or enhanced given a transport API that exposes all
services provided by the operating system.
Hurtig, et al. Expires June 18, 2014 [Page 4]
Internet-Draft Transport APIs December 2013
Please note that the current list of APIs is incomplete and rather
arbitrary. Feedback is very welcome!
3.1. ZeroMQ
3.1.1. Description
ZeroMQ is a messaging library that simplifies and improves the usage
of sockets. It operates on messages, and has embedded support for a
variety of communication styles including e.g. request/reply or pub/
sub. What this means is that, for instance, a socket of type
"request" can issue one request, and then a reply must arrive on that
socket; any other sequence of communication will produce an error
message. ZeroMQ tries to be transport agnostic and currently works
on top of IPC, TCP and PGM.
Internally, ZeroMQ's functionality largely depends on buffering
mechanisms. For instance, in contrast to native Berkeley sockets, a
single server socket can be used to read and respond to requests from
multiple clients. To achieve this, ZeroMQ must accept incoming
requests and read their data as they arrive from multiple clients,
buffer them, and upon the application's request hand the data over to
the application using fair queuing.
3.1.2. Analysis
Like Minion, ZeroMQ introduces delimiters into a TCP stream to send
frames of a given size using the ZeroMQ Message Transport Protocol
[ZMTP]. Some form of multi-streaming is intended for the future:
According to the FAQ [ZMQFAQ] page, having multiple sockets share a
single TCP connection is being added to the next version of the ZMTP
protocol. Today one can accomplish this "using a proxy that sits
between the external TCP address, and your tasks".
Multi-streaming over standard TCP creates an RTT of HOL blocking
delay for all out-of-order packets that arrive at the receiver's
buffer. This problem also occurs with e.g. SPDY [SPDYWP] [SPDYID]
over TCP; just like SPDY works better over QUIC [QUIC], ZeroMQ can be
made to work better over a transport that natively supports multi-
streaming.
Because ZeroMQ is implemented as a user space library, it cannot
multiplex streams from multiple processes. This can be a significant
drawback when many small stand-alone services are co-located on the
same host. In contrast, in line with the way TCP and UDP are
currently implemented, it is likely that broader transport services
would be provided monolithically, e.g. in the system's kernel,
thereby eliminating this problem.
Hurtig, et al. Expires June 18, 2014 [Page 5]
Internet-Draft Transport APIs December 2013
The notion of request and reply sockets seems to be similar in Minion
and in ZeroMQ. Hence, mapping such ZeroMQ sockets onto Minion is
probably an efficient way to implement them. One may wonder where to
draw the boundaries between a transport like Minion and a middleware
or library like ZeroMQ, i.e. is it really more efficient to provide
request-reply functionality in the transport layer? Conceptually,
many of Minion's functions (e.g., message cancellation and
superseding messages) relate to having direct access to the sender
and receiver-side buffers, which is otherwise limited depending on
the TCP implementation, and by standard TCP's in-order-delivery
requirement. At the same time, ZeroMQ's functions have to do with
controlling the sender and receiver-side buffers; it therefore seems
natural that transports such as Minion could improve the performance
of ZeroMQ.
Notably, some transports might turn out to be a poor match for
ZeroMQ. For example, MPTCP requires a larger receiver buffer than
standard TCP due to the larger expected reordering. However, if
ZeroMQ's ZMTP protocol does or will (in accordance with the FAQ
mentioned above) multiplex data from several sockets over a single
TCP stream, this might create extra delay before the the receiver-
side ZeroMQ instance can take the data from the buffer and hand it
over to the application.
3.2. nanomsg
3.2.1. Description
3.2.2. Analysis
3.3. enet
3.3.1. Description
enet started out as a networking layer for a first-person shooter
where low latency communication with very frequent data transmission
was needed. It is a lightweight library that is entirely based on
UDP, which it extends with a set of optional features such as
reliability and in-order packet delivery.
Its features include connection management (monitoring of a
connection with frequent pings), optional reliability, sequencing
(mandatory for reliable transmission), fragmentation and reassembly,
aggregation, flow control. It gives its user control over the packet
size (a function call allows a packet to be resized), and sequential
delivery is enforced.
Hurtig, et al. Expires June 18, 2014 [Page 6]
Internet-Draft Transport APIs December 2013
Reliability in enet is a binary choice; it does not allow providing a
deadline or maximum number of retransmissions per packet; if a per-
host-configurable number of retries is exceeded, the host is
disconnected.
Because HOL blocking delay can arise when guaranteeing sequential
delivery, enet also has a form of multi-streaming (called
"channels").
enet provides window-based flow control for reliable packets and a
dynamic throttle that drops packets from the send buffer if the
network is congested based on a given probability. This probability
is based on measuring the RTT to a peer; if the current RTT is
significantly greater than the mean RTT, the probability is increased
up to a configurable maximum value. Each host's bandwidth limits are
taken into account as an upper bound for the bandwidth used by enet.
A broadcast function can be used to send a packet to all currently
connected peers on a host.
3.3.2. Analysis
Many of the functions in enet resemble functions found in SCTP and
Minion -- e.g., control over the packet size, optional reliability,
multi-streaming. Since enet intends to be "thin", simply using these
protocols instead probably would not make it better. However, enet's
goal being low latency, it could benefit from other functions such as
SCTP's and MPTCP's multi-path capability (picking the lower latency
path). The congestion control also appears to be rather rudimentary
-- there are known issues with using the RTT as a congestion signal
(for one, it is incapable of distinguishing between congestion on the
forward and backward path). Probably, using the congestion control
embedded in an IETF-standardized protocol could improve enet's
performance under certain situations. Finally, the "broadcast"
functionality could benefit from multicast.
3.4. Java Message Service
3.4.1. Description
3.4.2. Analysis
3.5. Chrome Network Stack
3.5.1. Description
3.5.2. Analysis
Hurtig, et al. Expires June 18, 2014 [Page 7]
Internet-Draft Transport APIs December 2013
3.6. CFNetwork
3.6.1. Description
3.6.2. Analysis
3.7. Apache Portable Runtime
3.7.1. Description
3.7.2. Analysis
3.8. VirtIO
3.8.1. Description
3.8.2. Analysis
4. Networking APIs with Exposed Transport
Much of the motivation behind the transport services concept comes
from the limitations posed by networking APIs that require the user
to explicitly chose a transport, and thus confine itself to a certain
number of "services". It is, however, possible to include such APIs
in the transport services concept if mechanisms can be hidden from
the application [WNG11].
This section describes a number of commonly used APIs that expose the
underlying transport and analyzes how these particular APIs could be
improved with transport services.
4.1. Berkeley Sockets
4.1.1. Description
4.1.2. Analysis
4.2. Java Libraries
4.2.1. Description
The Java library has classes to handle TCP and UDP sockets. There is
also a separate library, not included with the regular Java
distribution, that interfaces SCTP.
The java.net library contains the two classes Socket and ServerSocket
that handle TCP sockets. These sockets write a message at a time,
but read character streams. A ServerSocket contains a method called
Hurtig, et al. Expires June 18, 2014 [Page 8]
Internet-Draft Transport APIs December 2013
"accept", that waits for a connection request from a client. The
class DatagramSocket handles UDP-sockets. It "receive"s and "send"s
objects of the class DatagramPacket that contain characters. The
"close" method closes the connection. Finally the library contains a
class called NetworkInterface that can be used to query the operating
system about available network interfaces.
The separate Java library that handle SCTP a is called
com.sun.nio.sctp. Similar to the TCP-sockets there are classes
called SctpChannel and SctpServerChannel. An instance of the former
can control a single association only, while an instance of the
latter can control multiple associations. Instances of the class
SctpMultiChannel can also control multiple associations.
4.2.2. Analysis
The Java socket api is very similar to the Berkeley socket api. A
main difference is that the transport to be used is defined as a
parameter to the socket() call in the Berkeley socket api, while in
Java different classes is used for the different protocols. There is
no well known support for DCCP in Java.
When a socket object is created it can either be connected
immediately, or the "connect" method can be called later. If not
already bound, a socket is bound to a local address by calling the
method "bind". To shut down the connection, "close" is called. If
an application calls "receive" on a datagram socket, the method call
will block the application until a packet is received, which may
never happen using an unreliable transfer. When operations on
Sockets fail, an exception is thrown.
The SCTP interface is event driven. When the SCTP stack wants to
notify the applications, it generates a Notification object. This
object is passed as parameter to the method "handleNotification" in
an instance of the class NotificationHandler. An association will be
implicitly set up by a send or receive method call if there is no
current association. The SCTP library is only supporter at run time
by Linux and Solaris.
4.3. Netscape Portable Runtime
4.3.1. Description
Hurtig, et al. Expires June 18, 2014 [Page 9]
Internet-Draft Transport APIs December 2013
4.3.2. Analysis
4.4. Infiniband Verbs
4.4.1. Description
4.4.2. Analysis
4.5. Input/Output Completion Port
4.5.1. Description
4.5.2. Analysis
5. Security Considerations
TBD
6. IANA Considerations
At this point, the memo includes no request to IANA.
7. Acknowledgments
Hurtig, Gjessing, and Welzl are supported by RITE, a research project
(ICT-317700) funded by the European Community under its Seventh
Framework Program. The views expressed here are those of the
author(s) only. The European Commission is not liable for any use
that may be made of the information in this document.
8. Comments Solicited
To be removed by RFC Editor: This draft is a part of the first steps
towards an IETF BoF on Transport Services. Comments and questions
are encouraged and very welcome. They can be addressed to the
current mailing list <transport-services@ifi.uio.no> and/or to the
authors.
9. References
9.1. Normative References
[RFC0768] Postel, J., "User Datagram Protocol", STD 6, RFC 768,
August 1980.
[RFC0793] Postel, J., "Transmission Control Protocol", STD 7, RFC
793, September 1981.
Hurtig, et al. Expires June 18, 2014 [Page 10]
Internet-Draft Transport APIs December 2013
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3208] Speakman, T., Crowcroft, J., Gemmell, J., Farinacci, D.,
Lin, S., Leshchiner, D., Luby, M., Montgomery, T., Rizzo,
L., Tweedly, A., Bhaskar, N., Edmonstone, R.,
Sumanasekera, R., and L. Vicisano, "PGM Reliable Transport
Protocol Specification", RFC 3208, December 2001.
[RFC3649] Floyd, S., "HighSpeed TCP for Large Congestion Windows",
RFC 3649, December 2003.
[RFC3828] Larzon, L-A., Degermark, M., Pink, S., Jonsson, L-E., and
G. Fairhurst, "The Lightweight User Datagram Protocol
(UDP-Lite)", RFC 3828, July 2004.
[RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram
Congestion Control Protocol (DCCP)", RFC 4340, March 2006.
[RFC4960] Stewart, R., "Stream Control Transmission Protocol", RFC
4960, September 2007.
[RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
Control", RFC 5681, September 2009.
[RFC6817] Shalunov, S., Hazel, G., Iyengar, J., and M. Kuehlewind,
"Low Extra Delay Background Transport (LEDBAT)", RFC 6817,
December 2012.
[RFC6824] Ford, A., Raiciu, C., Handley, M., and O. Bonaventure,
"TCP Extensions for Multipath Operation with Multiple
Addresses", RFC 6824, January 2013.
[RFC6897] Scharf, M. and A. Ford, "Multipath TCP (MPTCP) Application
Interface Considerations", RFC 6897, March 2013.
9.2. Informative References
[MINION] Iyengar, J., Cheshire, S., and J. Graessley, "Minion -
Service Model and Conceptual API", draft-iyengar-minion-
concept-02.txt (work in progress), October 2013.
[QUIC] Roskind, J., "QUIC: Design Document and Specification
Rational", April 2012, <https://bitly.com/Hm0DyX>.
[SPDYID] Belshe, M. and R. Peon, "SPDY Protocol", draft-mbelshe-
httpbis-spdy-00.txt (work in progress), February 2012.
Hurtig, et al. Expires June 18, 2014 [Page 11]
Internet-Draft Transport APIs December 2013
[SPDYWP] Belshe, M., "SPDY: An Experimental Protocol for a Faster
Web", April 2012,
<http://www.chromium.org/spdy/spdy-whitepaper>.
[WJG11] Welzl, M., Jorer, S., and S. Gjessing, "Towards a
Protocol-Independent Internet Transport API", IEEE ICC
2011., June 2011.
[WNG11] Welzl, M., Niederbacher, F., and S. Gjessing, "Beneficial
Transparent Deployment of SCTP: the Missing Pieces", IEEE
GLOBECOM 2011, December 2011.
[ZMQFAQ] Sustrik, M., "Frequently Asked Questions - zeromq", July
2008, <http://zeromq.org/area:faq>.
[ZMTP] Hintjens, P., Hurton, M., and I. Barber, "ZMTP - ZeroMQ
Message Transport Protocol", June 2013,
<http://rfc.zeromq.org/spec:23>.
Authors' Addresses
Per Hurtig (editor)
Karlstad University
Universitetsgatan 2
Karlstad 651 88
Sweden
Phone: +46 54 700 23 35
Email: per.hurtig@kau.se
Stein Gjessing
University of Oslo
PO Box 1080 Blindern
Oslo N-0316
Norway
Phone: +47 22 85 24 44
Email: stein.gjessing@ifi.uio.no
Hurtig, et al. Expires June 18, 2014 [Page 12]
Internet-Draft Transport APIs December 2013
Michael Welzl
University of Oslo
PO Box 1080 Blindern
Oslo N-0316
Norway
Phone: +47 22 85 24 20
Email: michawe@ifi.uio.no
Martin Sustrik
Phone: +421 908 714 885
Email: sustrik@250bpm.com
Hurtig, et al. Expires June 18, 2014 [Page 13]