Network File System Version 4 D. Noveck
Internet-Draft NetApp
Intended status: Informational February 22, 2018
Expires: August 26, 2018
Issues Related to RPC-over-RDMA Internode Round Trips
draft-dnoveck-nfsv4-rpcrdma-rtissues-05
Abstract
As currently designed and implemented, the RPC-over-RDMA protocol
requires use of multiple internode round trips to process some common
operations. For example, NFS WRITE operations require use of three
internode round trips. This document looks at this issue and
discusses what can and what should be done to address it, both within
the context of an extensible version of RPC-over-RDMA and potentially
outside that framework.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on August 26, 2018.
Copyright Notice
Copyright (c) 2018 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
Noveck Expires August 26, 2018 [Page 1]
Internet-Draft RPC/RDMA Round-trip Issues February 2018
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Preliminaries
1.1. Requirements Language
1.2. Introduction
2. Review of the Current Situation
2.1. Potentially Troublesome Requests
2.2. WRITE Request Processing Details
2.3. READ Request Processing Details
3. Near-term Work
3.1. Target Performance
3.2. Message Continuation
3.3. Send-based Data Placement
3.4. Feature Synergy
3.5. Feature Selection and Negotiation
4. Possible Future Development of RPC-over-RDMA
5. Other Possible Approaches
6. Summary
7. Security Considerations
8. IANA Considerations
9. References
9.1. Normative References
9.2. Informative References
Appendix A. Acknowledgements
Author's Address
1. Preliminaries
1.1. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
1.2. Introduction
When many common operations are performed using RPC-over-RDMA,
additional internode round-trip latencies are required to take
advantage of the performance benefits provided by RDMA functionality.
While the latencies involved are generally small, they are a cause
for concern for two reasons:
o With the ongoing improvement of persistent memory technologies,
such internode latencies, being fixed, can be expected to consume
an increasing portion of the total latency required for processing
NFS requests using RPC-over-RDMA.
o High-performance transfers using NFS may be needed outside of a
machine-room environment. As RPC-over-RDMA is used in networks of
campus and metropolitan scale, the internode round-trip time,
roughly sixteen microseconds per mile of separation, becomes an
issue.
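As a rough check on the sixteen-microsecond figure, the round-trip
propagation delay can be estimated from the speed of light in optical
fiber. The sketch below assumes a refractive index of about 1.47 (a
typical value for silica fiber, not taken from this document) and
ignores switching, serialization, and queuing delays.

```python
# Rough estimate of round-trip propagation delay over optical fiber.
# Assumption: refractive index of ~1.47 for silica fiber; switch,
# serialization, and queuing delays are ignored.
SPEED_OF_LIGHT_M_PER_S = 299_792_458
FIBER_REFRACTIVE_INDEX = 1.47
METERS_PER_MILE = 1609.344

def round_trip_us(miles: float) -> float:
    """Return the round-trip propagation delay in microseconds."""
    one_way_s = (miles * METERS_PER_MILE * FIBER_REFRACTIVE_INDEX
                 / SPEED_OF_LIGHT_M_PER_S)
    return 2 * one_way_s * 1e6

for distance in (1, 10, 50):
    print(f"{distance:>3} mile(s): {round_trip_us(distance):7.1f} us")
```

Under these assumptions, a separation of tens of miles adds hundreds
of microseconds per round trip from propagation alone, which is why
each round trip beyond the minimum becomes significant.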
Given this background, round trips beyond the minimum necessary need
to be justified by corresponding benefits. If they are not, work
needs to be done to eliminate those excess round trips.
We are going to look at the existing situation with regard to round-
trip latency and make some suggestions as to how the issue might be
best addressed. We will consider things that could be done in the
near future and also explore further possibilities that would require
a longer-term approach to be adopted.
2. Review of the Current Situation
2.1. Potentially Troublesome Requests
We will be looking at four sorts of situations:
o An RPC operation involving Direct Data Placement of request data
(e.g., an NFSv3 WRITE or corresponding NFSv4 COMPOUND).
o An RPC operation involving Direct Data Placement of response data
(e.g., an NFSv3 READ or corresponding NFSv4 COMPOUND).
o An RPC operation where the request data is longer than the inline
buffer limit.
o An RPC operation where the response data is longer than the inline
buffer limit.
These are all simple examples of situations in which explicit RDMA
operations are used, either to effect Direct Data Placement or to
respond to message size limits that derive from a limited receive
buffer size.
We will survey the resulting latency and overhead issues in an RPC-
over-RDMA Version One environment in Sections 2.2 and 2.3 below.
2.2. WRITE Request Processing Details
We'll start with the case of a request involving direct placement of
request data. In this case, an RDMA READ is used to transfer a DDP-
eligible data item (e.g., the data to be written) from its location in
requester memory to a location selected by the responder.
Processing proceeds as described below. Although we are focused on
internode latency, the time to perform a request also includes such
things as interrupt latency, overhead involved in interacting with
the RNIC, and the time for the server to execute the requested
operation.
o First, the memory to be accessed remotely is registered. This is
a local operation.
o Once the registration has been done, the initial send of the
request can proceed. Since this is in the context of connected
operation, there is an internode round trip involved. However,
the next step can proceed after the initial transmission is
received by the responder. As a result, only the responder-bound
side of the transmission contributes to overall operation latency.
o The responder, after being notified of the receipt of the request,
uses RDMA READ to fetch the bulk data. This involves an internode
round-trip latency. After the fetch of the data, the responder
needs to be notified of the completion of the explicit RDMA
operation.
o The responder (after performing the requested operation) sends the
response. Again, as this is in the context of connected
operation, there is an internode round trip involved. However,
the next step can proceed after the initial transmission is
received by the requester.
o The memory registered before the request was issued needs to be
deregistered before the request is considered complete and the
sending process restarted. When remote invalidation is not
available, the requester, after being notified of the receipt of
the response, performs a local operation to deregister the memory
in question. When remote invalidation is available, the responder
uses Send With Invalidate, and the requester's RNIC effects the
deregistration before notifying the requester of the response
that has been received.
To summarize, if we exclude the actual server execution of the
request, the latency consists of two internode round-trip latencies
plus two responder-side interrupt latencies plus one requester-side
interrupt latency plus any necessary registration/de-registration
overhead. This is in contrast to a request not using explicit RDMA
operations in which there is a single inter-node round-trip latency
and one interrupt latency on the requester and the responder.
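The accounting above can be expressed as a simple additive model. In
the sketch below, the component latencies (round-trip time, interrupt
latencies, registration cost) are hypothetical parameters chosen only
for illustration, not measurements; server execution time is
excluded, as in the summary above.

```python
from dataclasses import dataclass

@dataclass
class Costs:
    """Hypothetical per-component latencies, in microseconds."""
    round_trip: float           # one internode round trip
    requester_interrupt: float  # interrupt latency on the requester
    responder_interrupt: float  # interrupt latency on the responder
    registration: float         # register + deregister, when needed

def baseline_latency(c: Costs) -> float:
    # Request send (responder-bound half) plus reply send
    # (requester-bound half) add up to one round trip, plus one
    # interrupt on each side.
    return c.round_trip + c.requester_interrupt + c.responder_interrupt

def write_with_rdma_read_latency(c: Costs) -> float:
    # Request send + RDMA READ round trip + reply send: two round
    # trips. The responder takes a second interrupt when the RDMA READ
    # completes, and registration/deregistration is added.
    return (2 * c.round_trip
            + c.requester_interrupt
            + 2 * c.responder_interrupt
            + c.registration)

costs = Costs(round_trip=2.0, requester_interrupt=1.0,
              responder_interrupt=1.0, registration=0.5)
print("baseline:", baseline_latency(costs))
print("WRITE via RDMA READ:", write_with_rdma_read_latency(costs))
```

Whatever values are chosen, the difference between the two functions
is one round trip, one responder-side interrupt, and the registration
cost, matching the summary above.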
The processing of the other sorts of requests mentioned in
Section 2.1 shows both similarities and differences:
o Handling of a long request is similar to the above. The memory
associated with a position-zero read chunk is registered,
transferred using RDMA READ, and deregistered. As a result, we
have the same overhead and latency issues noted in the case of
direct data placement, without the corresponding benefits.
o The case of direct data placement of response data follows a
similar pattern. The important difference is that the transfer of
the bulk data is performed using RDMA WRITE, rather than RDMA
READ. However, because of the way that RDMA WRITE is effected
over the wire, the latency consequences are different. See
Section 2.3 for a detailed discussion.
o Handling of a long response is similar to the previous case.
2.3. READ Request Processing Details
We'll now discuss the case of a request involving direct placement of
response data. In this case, an RDMA WRITE is used to transfer a
DDP-eligible data item (e.g., the data being read) from its location
in responder memory to a location previously selected by the
requester.
Processing proceeds as described below. Although we are focused on
internode latency, the time to perform a request also includes such
things as interrupt latency, overhead involved in interacting with
the RNIC, and the time for the server to execute the requested
operation.
o First, the memory to be accessed remotely is registered. This is
a local operation.
o Once the registration has been done, the initial send of the
request can proceed. Since this is in the context of connected
operation, there is an internode round trip involved. However,
the next step can proceed after the initial transmission is
received. As a result, only the responder-bound side of the
transmission contributes to overall operation latency.
o The responder, after being notified of the receipt of the request,
proceeds to process the request until the data to be read is
available in its own memory, with its location determined and
fixed. It then uses RDMA WRITE to transfer the bulk data to the
location in requester memory selected previously. This involves
an internode latency, but there is no round trip and thus no
round-trip latency.
o The responder continues processing and sends the inline portion of
the response. Again, as this is in the context of connected
operation, there is an internode round trip involved. However,
the next step can proceed immediately. If the RDMA WRITE or the
send of the inline portion of the response were to fail, the
responder would be notified subsequently.
o The requester, after being notified of the receipt of the
response, can analyze it and can access the data written into its
memory. Deregistration of the memory originally registered before
the request was issued can be done using remote invalidation or
can be done by the requester as a local operation.
To summarize, in this case the additional latency that we saw in
Section 2.2 does not arise. Except for the additional overhead due
to memory registration and invalidation, the situation is the same as
for a request not using explicit RDMA operations in which there is a
single inter-node round-trip latency and one interrupt latency on the
requester and the responder.
3. Near-term Work
We are going to consider how the latency and overhead issues
discussed in Section 2 might be addressed in the context of an
extensible version of RPC-over-RDMA, such as that proposed in
[I-D.cel-nfsv4-rpcrdma-version-two].
In Section 3.1, we will establish a performance target for the
troublesome requests, based on the performance of requests that do
not involve long messages or direct data placement.
We will then consider how extensions might be defined to bring
latency and overhead for the requests discussed in Section 2.1 into
line with those for other requests. There will be two specific
classes of requests to address:
o Those that do not involve direct data placement will be addressed
in Section 3.2. In this case, there are no compensating benefits
justifying the higher overhead and, in some cases, latency.
o The more complicated case of requests that do involve direct data
placement is discussed in Section 3.3. In this case, direct data
placement could serve as a compensating benefit, and the important
question to be addressed is whether Direct Data Placement can be
effected without the use of explicit RDMA operations.
The optional features to deal with each of the classes of messages
discussed above could be implemented separately. However, in the
handling of RPCs with very large amounts of bulk data, the two
features are synergistic. This fact makes it desirable to define the
features as part of the same extension. See Sections 3.4 and 3.5 for
details.
3.1. Target Performance
As our target, we will look at the latency and overhead associated
with other sorts of RPC requests, i.e., those that do not use direct
data placement and whose request and response messages fit within
the receive buffer limit.
Processing proceeds as follows:
o The initial send of the request is done. Since this is in the
context of connected operation, there is an internode round-trip
involved. However, the next step can proceed after the initial
transmission is received. As a result, only the responder-bound
side of the transmission contributes to overall operation latency.
o The responder, after being notified of the receipt of the request,
performs the requested operation and sends the reply. As in the
case of the request, there is an internode round trip involved.
However, the request can be considered complete upon receipt of
the requester-bound transmission. The responder-bound
acknowledgment does not contribute to request latency.
In this case, there is only a single internode round-trip latency
necessary to effect the RPC. Total request latency includes this
round-trip latency plus interrupt latency on the requester and
responder, plus the time for the responder to actually perform the
requested operation.
Thus the delta between the operations discussed in Section 2 and our
baseline consists of two portions, one of which applies to all the
requests we are concerned with and the second of which only applies
to requests that involve use of RDMA READ, as discussed in
Section 2.2. The latter portion consists of:
o One additional internode round-trip latency.
o One additional instance of responder-side interrupt latency.
The additional overhead necessary to do memory registration and
deregistration applies to all requests using explicit RDMA
operations. The costs will vary with implementation characteristics.
As a result, in some implementations, it may be desirable to replace use
of RDMA Write with send-based alternatives, while in others, use of
RDMA Write may be preferable.
3.2. Message Continuation
Using multiple RPC-over-RDMA transmissions, in sequence, to send a
single RPC message avoids the additional latency and overhead
associated with the use of explicit RDMA operations to transfer
position-zero read chunks. In the case of reply chunks, only
overhead is reduced.
Although transfer of a single request or reply in N transmissions
will involve N+1 internode latencies, overall request latency is not
correspondingly increased, since operations involving multiple nodes
need not be serialized. Generally, these transmissions are pipelined.
As an illustration, let's consider the case of a request involving a
response consisting of two RPC-over-RDMA transmissions. Even though
each of these transmissions is acknowledged, that acknowledgement
does not contribute to request latency. The second transmission can
be received by the requester and acted upon without waiting for
either acknowledgment.
This situation would require multiple receive-side interrupts but it
is unlikely to result in extended interrupt latency. With 1K sends
(Version One), the second receive will complete about 200 nanoseconds
after the first assuming a 40Gb/s transmission rate. Given likely
interrupt latencies, the first interrupt routine would be able to
note that the completion of the second receive had already occurred.
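The 200-nanosecond figure follows from the serialization time of a 1K
send at 40Gb/s. The sketch below takes 1K as 1024 bytes and ignores
link-layer framing overhead; both are simplifying assumptions.

```python
def serialization_ns(message_bytes: int, link_gbps: float) -> float:
    """Time to clock a message onto the wire, in nanoseconds.

    Assumes no framing or header overhead is added by the link layer.
    """
    bits = message_bytes * 8
    # bits divided by Gbit/s yields nanoseconds directly.
    return bits / link_gbps

print(f"{serialization_ns(1024, 40):.1f} ns")
```

At 40Gb/s, 8192 bits take about 205 nanoseconds, consistent with the
"about 200 nanoseconds" gap between successive receive completions.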
3.3. Send-based Data Placement
In order to effect proper placement of request or reply data within
the context of individual RPC-over-RDMA transmissions, receive
buffers need to be structured to accommodate this function.
To illustrate the considerations that could lead clients and servers
to choose particular buffer structures, we will use, as examples, the
cases of NFS READs and WRITEs of 8K data blocks (or the corresponding
NFSv4 COMPOUNDs).
In such cases, the client and server need to have the DDP-eligible
bulk data placed in appropriately aligned 8K buffer segments. Rather
than being transferred in separate transmissions using explicit RDMA
operations, a message can be sent so that bulk data is received into
an appropriate buffer segment. In this case, it would be excised
from the XDR payload stream, just as it is in the case of existing
DDP facilities.
Consider a server expecting write requests which are usually X bytes
long or less, exclusive of an 8K bulk data area. In this case the
payload stream will most likely be no longer than X bytes and will
fit in a buffer segment devoted to that purpose. The bulk data needs
to be placed in the subsequent buffer segment so that it is properly
aligned in the data placement target buffer. In order to place the
data appropriately, the sender (in this case, the client) needs to
add padding of length X-Y bytes, where Y is the length of the payload
stream for the current request. The case of reads is exactly the
same except that the sender adding the padding is the server.
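The padding rule can be illustrated with a short sketch. Here X is the
size of the buffer segment reserved for the payload stream and Y the
length of the current message's payload stream, as in the text. The
function name, the value X=512, and the payload length of 180 bytes
are hypothetical choices for illustration, and the layout is a sketch,
not a wire format defined by any specification.

```python
BULK_SEGMENT = 8192  # size of the DDP-targetable segment (8K example)

def layout_send(payload_len: int, payload_segment: int,
                bulk_len: int = BULK_SEGMENT) -> tuple[int, int, int]:
    """Return (padding, bulk_offset, total_len) for one transmission.

    payload_segment is X, the bytes reserved for the payload stream in
    the receiver's buffer; payload_len is Y for the current message.
    Padding of X - Y bytes makes the bulk data start exactly at the
    beginning of the next buffer segment.
    """
    if payload_len > payload_segment:
        raise ValueError("payload stream longer than X; not handled here")
    if bulk_len > BULK_SEGMENT:
        raise ValueError("bulk data exceeds the target segment")
    padding = payload_segment - payload_len
    bulk_offset = payload_len + padding  # always equals payload_segment
    return padding, bulk_offset, bulk_offset + bulk_len

print(layout_send(payload_len=180, payload_segment=512))
```

The check on the payload length reflects the assumption in the text
that the payload stream usually fits in its segment; messages whose
payload stream is longer need some other treatment.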
To provide send-based data placement as an RPC-over-RDMA extension,
the framework defined in [I-D.cel-nfsv4-rpcrdma-version-two] could be
used. A new "transport characteristic" could be defined which
allowed a participant to expose the structure of its receive buffers
and to identify the buffer segments capable of being used as data
placement targets. In addition, a new optional message header would
have to be defined. It would be defined to provide:
o A way to designate a DDP-eligible data item as corresponding to
target buffer segments, rather than memory registered for RDMA.
o A way to indicate to the responder that it should place DDP-
eligible data items in DDP-targetable buffer segments, rather than
in memory registered for RDMA.
o A way to designate a limited portion of an RPC-over-RDMA
transmission as constituting the payload stream.
3.4. Feature Synergy
While message continuation and send-based data placement each address
an important class of commonly used messages, their combination
allows simpler handling of some important classes of messages:
o READs and WRITEs transferring larger IOs
o COMPOUNDs containing multiple IO operations.
o Operations whose associated payload stream is longer than the
typical value.
To accommodate these situations, it would be best to have the
definition of the headers to support message continuation interact
with data structures to support send-based data placement as follows:
o The header type used for the initial transmission of a message
continued across multiple transmissions would contain placement-
directing structures which support both send-based data placement
as well as DDP using Explicit RDMA operations.
o Buffer references for Send-based data placement should be relative
to the start of the group of transmissions and should allow
transitions between buffer segments in different receive buffers.
o The header type for messages continuing a group of transmissions
should not have DDP-related fields but should rely on the initial
transmission of the group for DDP-related functions.
o The portion of each received transmission devoted to the payload
stream should be part of the header for each message within a
group of transmissions devoted to a single RPC message. The
payload stream for the message as a whole should be the
concatenation of the streams for each transmission.
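The last two points can be made concrete with a sketch of how a
receiver might interpret a group of transmissions. The record layout
below (payload portion at the start of each transmission, and receive
buffers of uniform size treated as one contiguous group-relative
address space) is a hypothetical simplification, not the header
format defined in [I-D.dnoveck-nfsv4-rpcrdma-rtrext].

```python
from dataclasses import dataclass

@dataclass
class Transmission:
    """One received transmission in a group (hypothetical layout).

    Assumes the first payload_len bytes of data carry this transmission's
    share of the payload stream; the rest is padding or placed bulk data.
    """
    data: bytes
    payload_len: int

def payload_stream(group: list[Transmission]) -> bytes:
    # The message's payload stream is the in-order concatenation of
    # each transmission's payload portion.
    return b"".join(t.data[:t.payload_len] for t in group)

def spans(group_offset: int, length: int,
          buffer_size: int) -> list[tuple[int, int, int]]:
    """Split a group-relative byte range into (buffer, offset, length).

    Assumes every receive buffer in the group has buffer_size bytes, so
    a data item may cross from one receive buffer into the next.
    """
    pieces = []
    while length > 0:
        index, offset = divmod(group_offset, buffer_size)
        chunk = min(length, buffer_size - offset)
        pieces.append((index, offset, chunk))
        group_offset += chunk
        length -= chunk
    return pieces

group = [Transmission(b"HDR1" + bytes(4), 4), Transmission(b"TAIL", 4)]
print(payload_stream(group))
print(spans(3000, 8192, 4096))
```

Because references are relative to the start of the group, an 8K data
item beginning partway through the first receive buffer is simply
described as three pieces in consecutive buffers.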
A potential extension supporting these features interacting as
described above can be found in [I-D.dnoveck-nfsv4-rpcrdma-rtrext].
3.5. Feature Selection and Negotiation
Given that an appropriate extension is likely to support multiple
OPTIONAL features, special attention will have to be given to
defining how implementations which might not support the same subset
of OPTIONAL features can successfully interact. The goal is to allow
interacting implementations to get the benefit of features that they
both support, while allowing implementation pairs that do not share
support for any of the OPTIONAL features to operate just as base
Version Two implementations could do in the absence of the potential
extension.
It is helpful if each implementation provides characteristics
defining its level of feature support which the peer implementation
can test before attempting to use a particular feature. In other
similar contexts, the support level concerns the implementation in
its role as responder, i.e. whether it is prepared to execute a given
request. In the case of the potential extension discussed here, most
characteristics concern an implementation in its role as receiver.
One might define characteristics which indicate,
o The ability of the implementation, in its role as receiver, to
process messages continued across multiple RPC-over-RDMA
transmissions.
o The ability of the implementation, in its role as receiver, to
process messages containing DDP-eligible data items, placed using
a send-based data placement approach.
Use of such characteristics might allow asymmetric implementations.
For example, a client might send requests containing DDP-eligible
data items using send-based data placement without being able to
accept messages containing data items using send-based data
placement. That is a likely implementation pattern, given the
greater performance benefits of avoiding use of RDMA Read.
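A requester's choice of mechanism might then be driven by the peer's
advertised receiver characteristics. The sketch below uses
hypothetical flag names and mechanism labels; the actual
characteristics, their names, and their encoding would be defined by
the extension itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PeerCharacteristics:
    """Hypothetical receiver-side characteristics advertised by a peer."""
    accepts_continuation: bool    # can receive multi-transmission messages
    accepts_send_based_ddp: bool  # can receive send-based placed data

def choose_request_mechanisms(peer: PeerCharacteristics,
                              has_ddp_data: bool,
                              exceeds_inline_limit: bool) -> list[str]:
    """Pick how a requester sends one request to this peer.

    exceeds_inline_limit refers to the payload stream, excluding any
    DDP-eligible data items.
    """
    choices = []
    if has_ddp_data:
        # Prefer send-based placement: it avoids an RDMA Read round trip.
        choices.append("send-based DDP" if peer.accepts_send_based_ddp
                       else "read chunk (RDMA Read)")
    if exceeds_inline_limit:
        choices.append("message continuation" if peer.accepts_continuation
                       else "position-zero read chunk")
    return choices or ["inline"]

both = PeerCharacteristics(accepts_continuation=True,
                           accepts_send_based_ddp=True)
legacy = PeerCharacteristics(accepts_continuation=False,
                             accepts_send_based_ddp=False)
print(choose_request_mechanisms(both, True, False))
print(choose_request_mechanisms(legacy, True, True))
```

A peer that advertises neither characteristic falls back to the
chunk-based mechanisms, so a pair of implementations sharing no
OPTIONAL features behaves as base implementations would.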
Further useful characteristics would apply to the implementation in
its role of responder. For instance,
o The ability of the implementation, in its role as responder, to
accept and process requests which REQUIRE that DDP-eligible data
items in the response be sent using send-based DDP. The presence
of this characteristic would allow a requester to avoid
registering memory to be used to accommodate DDP-eligible data
items in the response.
o The ability of the implementation, in its role as responder, to
send responses using message continuation, as opposed to using a
reply chunk.
Because of the potentially different needs of operations in the
forward and backward directions, it may be desirable to separate the
receiver-based characteristics according to the direction of operation
that they apply to.
A further issue relates to the role of explicit RDMA operations in
connection with backward operation. Although no current protocols
require support for DDP or transfer of large messages when operating
in the backward direction, the protocol is designed to allow such
support to be developed in the future. Since the protocol, with the
extension discussed here, is likely to have multiple methods of
providing these functions, we have a number of possible choices
regarding the role of chunk-based methods of providing these
functions:
o Support for chunk-based operation remains a REQUIREMENT for
responders, and requesters always have the option of using it,
regardless of the direction of operation.
Requesters could select alternatives to the use of explicit RDMA
operations when these are supported by the responder.
o When operating in the forward direction, support for chunk-based
operation remains a REQUIREMENT for responders (i.e., servers) and
requesters (i.e., clients).
When operating in the backward direction, support for chunk-based
operation is OPTIONAL for responders (i.e., clients), allowing
requesters (i.e., servers) to select use of explicit RDMA operations
or alternatives when each of these is supported by the responder.
o Support for chunk-based operation is treated as OPTIONAL for
responders, regardless of the direction of operation.
In this case, requesters would select use of explicit RDMA
operations or alternatives when each of these is supported by the
responder. For a considerable time, support for explicit RDMA
operations would be a practical necessity, even if not a
REQUIREMENT, for operation in the forward direction.
4. Possible Future Development of RPC-over-RDMA
Although reducing the use of explicit RDMA operations reduces the
number of internode round trips and eliminates sequences of
operations in which multiple round-trip latencies are serialized with
server interrupt latencies, the use of connected operations means
that round-trip latencies will always be present, since each message
is acknowledged.
One avenue that has been considered is use of unreliable-datagram
(UD) transmission in environments where the "unreliable" transmission
is sufficiently reliable that RPC replay can deal with a very low
rate of message loss. For example, UD in InfiniBand specifies a low
enough rate of frame loss to make this a viable approach,
particularly for use in supporting protocols, such as NFSv4.1, that
contain their own facilities to ensure exactly-once semantics.
With this sort of arrangement, request latency is still the same.
However, since the acknowledgements are not serving any substantial
function, it is tempting to consider removing them, as they take up
transmission bandwidth that could otherwise be used if the protocol
were to approach full use of the underlying medium.
The size of such wasted transmission bandwidth depends on the average
message size and many implementation considerations regarding how
acknowledgments are done. In any case, given expected message sizes,
the wasted transmission bandwidth will be very small.
When RPC messages are quite small, acknowledgments may be of concern.
However, in that situation, a better response would be to transfer
multiple RPC messages within a single RPC-over-RDMA transmission.
When multiple RPC messages are combined into a single transmission,
the overhead of interfacing with the RNIC, particularly the interrupt
handling overhead, is amortized over multiple RPC messages.
Although this technique is quite outside the spirit of existing RPC-
over-RDMA implementations, it appears possible to define new header
types capable of supporting this sort of transmission, using the
extension framework described in [I-D.cel-nfsv4-rpcrdma-version-two].
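To show what combining messages could look like, here is a minimal
sketch of packing several small RPC messages into one transmission and
recovering them at the receiver. The framing (a 4-byte big-endian
length before each message, with no XDR alignment padding) and the
function names are hypothetical; no such header type is currently
defined.

```python
import struct

def pack_messages(messages: list[bytes],
                  limit: int) -> tuple[bytes, list[bytes]]:
    """Pack whole RPC messages into one transmission of at most limit bytes.

    Hypothetical framing: each message is preceded by a 4-byte
    big-endian length. Returns the packed transmission and the messages
    that did not fit, which would go in a later transmission.
    """
    out = bytearray()
    for i, msg in enumerate(messages):
        if len(out) + 4 + len(msg) > limit:
            if not out:
                raise ValueError("message too large for one transmission")
            return bytes(out), messages[i:]
        out += struct.pack(">I", len(msg)) + msg
    return bytes(out), []

def unpack_messages(buf: bytes) -> list[bytes]:
    """Recover the individual RPC messages from a packed transmission."""
    messages, pos = [], 0
    while pos < len(buf):
        (length,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        messages.append(buf[pos:pos + length])
        pos += length
    return messages

batch = [b"a" * 100, b"b" * 200, b"c" * 900]
packed, rest = pack_messages(batch, limit=1024)
print(len(unpack_messages(packed)), "messages packed,", len(rest), "deferred")
```

A single receive completion, and so a single interrupt, then delivers
several RPC messages, which is the amortization described above.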
5. Other Possible Approaches
It is possible that the additional round-trips associated with
writing data to the server might be addressed outside the context of
RPC-over-RDMA, by avoiding use of the RDMA paradigm for such
transfers.
One possibility that has been discussed is the use of an RDMA-based
pNFS mapping type, in which areas in server memory are presented via
RDMA-based layouts so that the client could obtain file data using
RDMA Read and modify it using RDMA Write. In each case, only a
single round trip would be required to effect each transfer, assuming
that the appropriate layouts have been obtained. Although some I-Ds
have been written presenting the outlines of this approach, none are
currently active.
6. Summary
We've examined the issue of round-trip latency and concluded:
o That the number of round trips per se is not as important as the
contribution of any extra round trips to overall request latency.
o That the latency issue can be addressed using the extension
mechanism provided for in [I-D.cel-nfsv4-rpcrdma-version-two].
o That in many cases in which latency is not an issue, there may be
overhead issues that can be addressed using the same sorts of
techniques as those useful in latency reduction, again using the
extension mechanism provided for in
[I-D.cel-nfsv4-rpcrdma-version-two].
As it seems that the features sketched out could put internode
latencies and overhead for a large class of requests back to the
baseline value for the RPC paradigm, more detailed definition of the
required extension functionality is in order.
We've also looked at round trips at the physical level, in that
acknowledgments are sent in circumstances where there is no obvious
need for them. With regard to these, we have concluded:
o That these acknowledgements do not contribute to request latency.
o That while UD transmission can remove acknowledgements of limited
value, the performance benefits are not sufficient to justify the
disruption that this would entail.
o That issues with transmission bandwidth overhead in a small-
message environment are better addressed by combining multiple RPC
messages in a single RPC-over-RDMA transmission. This is
particularly so because such a step is likely to reduce overhead
in such environments as well.
As the features described involve the use of alternatives to explicit
RDMA operations, in performing direct data placement and in
transferring messages that are larger than the receive buffer limit,
it is appropriate to understand the role that such operations are
expected to have once the extensions discussed in this document are
fully specified and implemented.
It is important to note that these extensions are OPTIONAL and are
expected to remain so, while support for explicit RDMA operations
will remain an integral part of RPC-over-RDMA.
Given this framework, the degree to which explicit RDMA operations
will be used will reflect future implementation choices and needs.
While we have been focusing on cases in which other options might be
more efficient, it is worth looking also at the cases in which
explicit RDMA operations are likely to remain preferable:
o In some environments in which direct data placement to memory of a
certain alignment does not meet application requirements and in
which data needs to be read into a particular address on the
client. Also, large physically contiguous buffers may be required
in some environments. In these situations, send-based data
placement is not an option.
o Where large transfers are to be done, there will be limits to the
capacity of send-based data placement to provide the required
functionality, since the basic pattern using send/receive is to
allocate a pool of memory to contain receive buffers in advance of
issuing requests. While this issue can be mitigated by use of
message continuation, tying up large numbers of credits for a
single request can cause difficult issues as well. As a result,
send-based data placement may be restricted to IOs of limited
size, although the specific limits will depend on the details of
the specific implementation.
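A rough count of the receive buffers, and thus credits, that a single
send-based transfer would consume shows why such a limit arises. The
model below is hypothetical: the message occupies a payload-stream
segment followed by the bulk data, spread across consecutive receive
buffers of one fixed size, and per-transmission header overhead is
ignored. The 1K payload segment and 9K buffer size are illustrative
values, not recommendations.

```python
import math

def credits_needed(io_size: int, payload_segment: int,
                   buffer_size: int) -> int:
    """Receive buffers (credits) consumed by one send-based transfer.

    Hypothetical model: a payload-stream segment of payload_segment
    bytes followed by io_size bytes of bulk data, spread across
    consecutive receive buffers of buffer_size bytes each.
    """
    return math.ceil((payload_segment + io_size) / buffer_size)

for io in (8 * 1024, 64 * 1024, 1024 * 1024):
    print(io, credits_needed(io, payload_segment=1024,
                             buffer_size=9 * 1024))
```

Under these assumptions an 8K IO fits in one receive buffer, while a
1M IO would tie up more than a hundred receive buffers for a single
request.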
7. Security Considerations
This document does not raise any security issues.
8. IANA Considerations
This document does not require any actions by IANA.
9. References
9.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>.
[RFC8166] Lever, C., Ed., Simpson, W., and T. Talpey, "Remote Direct
Memory Access Transport for Remote Procedure Call Version
1", RFC 8166, DOI 10.17487/RFC8166, June 2017,
<https://www.rfc-editor.org/info/rfc8166>.
[RFC8267] Lever, C., "Network File System (NFS) Upper-Layer Binding
to RPC-over-RDMA Version 1", RFC 8267,
DOI 10.17487/RFC8267, October 2017,
<https://www.rfc-editor.org/info/rfc8267>.
9.2. Informative References
[I-D.cel-nfsv4-rpcrdma-version-two]
Lever, C. and D. Noveck, "RPC-over-RDMA Version 2
Protocol", draft-cel-nfsv4-rpcrdma-version-two-06 (work in
progress), January 2018.
[I-D.dnoveck-nfsv4-rpcrdma-rtrext]
Noveck, D., "RPC-over-RDMA Extensions to Reduce Internode
Round-trips", draft-dnoveck-nfsv4-rpcrdma-rtrext-03 (work
in progress), December 2017.
[RFC5666] Talpey, T. and B. Callaghan, "Remote Direct Memory Access
Transport for Remote Procedure Call", RFC 5666,
DOI 10.17487/RFC5666, January 2010,
<https://www.rfc-editor.org/info/rfc5666>.
Appendix A. Acknowledgements
The author gratefully acknowledges the work of Brent Callaghan and
Tom Talpey producing the original RPC-over-RDMA Version One
specification [RFC5666] and also Tom's work in helping to clarify
that specification.
The author also wishes to thank Chuck Lever for his work reviving
RDMA support for NFS in [RFC8166] and [RFC8267], for providing a path
for incremental improvement of that support by his work on
[I-D.cel-nfsv4-rpcrdma-version-two], and for helpful discussions
regarding RPC-over-RDMA latency issues.
Author's Address
David Noveck
NetApp
1601 Trapelo Road
Waltham, MA 02451
United States of America
Phone: +1 781-572-8038
Email: davenoveck@gmail.com