Network File System Version 4 W. Adamson
Internet-Draft NetApp
Intended status: Standards Track C. Lever, Ed.
Expires: August 13, 2017 Oracle
February 9, 2017
Trunking Discovery For Network File System Version 4.1
draft-andros-nfsv4-client-multipath-discovery-00
Abstract
Connection trunking is the use of multiple transport connections to
increase data and request throughput between one NFS client and
server pair. This document describes a means for an NFS version 4.1
client to discover NFS version 4.1 server multipath addresses that
may be used for connection trunking.
Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on August 13, 2017.
Copyright Notice
Copyright (c) 2017 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
Adamson & Lever Expires August 13, 2017 [Page 1]
Internet-Draft NFSv4.1 Trunking Discovery February 2017
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction
2. Terminology
3. Discovering Multipath Addresses
4. Trunking Support For Other NFS Versions
5. Security Considerations
6. IANA Considerations
7. Normative References
Appendix A. Acknowledgments
Authors' Addresses
1. Introduction
Multiple transport connections can be established between an NFS
client and server pair to improve the throughput of RPC operations or
data transfer. These connections leverage the bandwidth of multiple
network paths, potentially making use of more than one network
interface or execution engine on both the client and server.
NFS version 4.1 defines two mechanisms for managing multiple
transport connections between a single client-server pair.
Section 2.10.5 of [RFC5661] defines "trunking" as the use of multiple
transport connections to increase the speed of data transfer.
Sections 12 and 13 of that document introduce Parallel NFS (pNFS),
wherein multiple transport connections may be established to pNFS
Data Servers (DSs). This document refers to these multiple DS
connections as "multipathing".
The NFSv4.1 GETDEVICEINFO operation enables multipathing among
multiple pNFS Data Server (DS) network addresses. As noted in
Section 13.5 of [RFC5661], if multiple network addresses appear in a
multipath list, they designate the same Data Server. Given such a
list of multipath addresses, a client tests further for trunking
support by sending an EXCHANGE_ID operation to each address in a
multipath list and comparing the results.
The NFS version 4.1 protocol does not specify a similar means for an
NFS version 4.1 client to discover multipath addresses to enable
trunking for a pNFS Meta Data Server (MDS), nor for an NFS version
4.1 server where pNFS is not in use.
This document describes a mechanism for an NFS version 4.1 server to
advertise multipath addresses that may be used for "connection
trunking": establishing multiple transport connections outside the
auspices of pNFS. This document does not discuss how an NFS client
utilizes connection trunking to achieve better performance.
1.1. Clientid And Session Trunking
The initial interaction between an NFSv4.1 client and server is an
exchange of the unique identities of both peers. During that
exchange, the server presents the client with a token which the
client uses as a shorthand for its identity during subsequent
interactions with the server. This token is known as a client ID,
which is returned to the client as a result in the NFSv4.1
EXCHANGE_ID operation.
The NFS version 4.1 protocol introduces the concept of a session. A
session enables a server to manage state associated with each client
independent of that client's transport connections, which are
transient. Section 2.10.1 of [RFC5661] provides a detailed overview
of sessions.
Each NFSv4.1 client is typically associated with one client ID. A
client is allowed to instantiate multiple sessions, which are all
associated with its client ID. This is referred to as client ID
trunking.
An NFS version 4.1 client associates an otherwise unbound transport
connection to an existing session by sending a BIND_CONN_TO_SESSION
operation on that connection. It might do this if, for instance, a
network partition caused the original transport connection associated
with a session to be lost. Using BIND_CONN_TO_SESSION operations,
more than one transport connection can be associated with, or trunked
to, the same session. This is referred to as session trunking.
An NFS client can employ either client ID trunking or session
trunking to trunk connections to a pNFS Meta Data Server or non-pNFS
server.
2. Terminology
Client ID trunking
The association of multiple sessions to the same client ID.
Connection trunking
The use of multiple transport connections between a single NFS
client and server pair, outside the context of pNFS. Includes
client ID and session trunking.
fs_locations and fs_locations_info
File system attributes, retrieved via a GETATTR operation, that
describe NFS server locations where a file system may be found.
Multipath address
A network address of an NFS version 4.1 server that may be used
for connection trunking.
Multipathing
The use of multiple transport connections between a single NFS
client and server pair, in the context of a pNFS layout.
pNFS Data Server
A storage service that stores only file data.
pNFS Meta Data Server
A storage service that manages pNFS layouts, which direct clients
to pNFS Data Servers.
Pseudo file system
A read-only file system that bridges the non-accessible portions
of a server's externally accessible file system namespace.
Replicas
Alternative locations to be used to access data in place of, or in
addition to, the current file system instance.
Session trunking
The association of multiple transport connections to the same
session.
3. Discovering Multipath Addresses
3.1. Querying Locations
The fs_locations attribute (Section 11.9 [RFC5661]), and the
fs_locations_info attribute (Section 11.10 [RFC5661]) provide a list
of replica servers for an externally accessible file system.
Section 11.4 of [RFC5661] defines replication as follows:
Under some circumstances, multiple alternative locations may be
used simultaneously to provide higher-performance access to the
file system in question. Provision of such alternate locations is
referred to as "replication" although there are cases in which
replicated sets of data are not in fact present, and the replicas
are instead different paths to the same data.
3.2. Pseudo File Systems
Section 7.3 of [RFC5661] describes the "pseudo file system" as a
framework to present all exports for an NFS version 4.1 server in a
single local namespace. The pseudo file system bridges the
unexported portions of a server's local file system namespace
providing a view of only externally accessible exported directories.
Because a pseudo file system holds a dynamically-constructed read-
only local traversal path to all externally accessible file systems
specific to that server, it is not normally a candidate for any
fs_locations or fs_locations_info query. This includes queries for
replication or migration information, as a server's pseudo file
system is never replicated or migrated because it is unique to that
server.
3.3. Obtaining Multipath Information For Connection Trunking
Multipath addresses suitable for connection trunking are a server-
wide resource, as they provide a means to reach all exported file
systems on a server. The pseudo file system is a server-wide file
system in the sense that it provides a traversal path to all exported
file systems on a server.
Thus we define an fs_locations and fs_locations_info replica list on
the pseudo file system as a list of multipath addresses for the
server to be tested for connection trunking.
This scheme relies on a new restriction on the pseudo file system.
The root "/" of the pseudo file system exported by an NFSv4.1 server,
as seen by clients, MUST NOT be migrated or replicated in a way that
is visible to NFS clients.
To guarantee a client is getting the location information from a
server's pseudo file system, and not from a real file system on that
server, the client MUST probe the root directory of the pseudo file
system using GETATTR with the fs_locations or fs_locations_info
attribute.
Clients can make good use of information about what transport type to
use (e.g., RDMA or TCP) for each multipath address, and some idea of
the relative performance of each multipath address (e.g., 10GbE, 40GbE,
FDR RDMA, and so on). This class of information can be encoded in an
fs_locations_info attribute, but is not conveyed in fs_locations.
The text in Section 11.10 of [RFC5661] suggests that the fs_locations
attribute may be deprecated in favor of fs_locations_info.
Therefore, the use of fs_locations_info rather than fs_locations is
RECOMMENDED to convey the list of multipath addresses.
3.3.1. Constructing The Multipath List
A multi-homed server knows neither the connectivity nor the
performance characteristics of the network path between a client and
each of its network interfaces. As such, the server SHOULD
enumerate all of its network interfaces in constructing the
connection trunking multipath address list for the pseudo file
system. This allows each client to test each multipath address and
make a connectivity and performance determination.
Mixing slow and fast transports in connection trunking can be
problematic if the client algorithm for choosing which trunked
transport to use does not take transport characteristics into
account. Indeed, Section 13.5 of [RFC5661] notes that for DS
multipath addresses the MDS SHOULD NOT mix slow and fast transports.
When constructing a connection trunking multipath address list, the
server should take transport speed into consideration. An
fs_locations_info multipath list can use fls_info flags
(Section 3.3.1.2) to communicate transport characteristics. An
fs_locations multipath list depends on the following ordering of
interfaces to convey some notion of transport characteristics:
o Place TCP transports first, followed by RDMA transports.
o Order the transports by performance, with highest performance
transports first. E.g., for TCP: 40GbE, 10GbE, then 1GbE.
o For each transport with equal performance, group by address
family. E.g., for TCP 10GbE, group IPv4 addresses, then IPv6
addresses.
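The three ordering rules above can be expressed as a single sort
key. The following is a minimal sketch, not a normative algorithm;
the Endpoint type, its field names, and the labels are hypothetical
and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical description of one server address (illustration only)."""
    label: str       # human-readable name, not a protocol field
    transport: str   # "tcp" or "rdma"
    gbps: int        # nominal link speed in gigabits per second
    family: int      # 4 for IPv4, 6 for IPv6


def order_multipath(endpoints):
    """Return endpoints ordered by the three rules listed above."""
    def key(ep):
        return (
            0 if ep.transport == "tcp" else 1,  # TCP before RDMA
            -ep.gbps,                           # faster transports first
            0 if ep.family == 4 else 1,         # IPv4 group, then IPv6
        )
    # sorted() is stable, so endpoints that tie keep their input order.
    return sorted(endpoints, key=key)


example = [
    Endpoint("rdma40-v6", "rdma", 40, 6),
    Endpoint("tcp1-v4", "tcp", 1, 4),
    Endpoint("tcp10a-v6", "tcp", 10, 6),
    Endpoint("tcp10a-v4", "tcp", 10, 4),
    Endpoint("tcp10b-v4", "tcp", 10, 4),
    Endpoint("rdma40-v4", "rdma", 40, 4),
    Endpoint("tcp1-v6", "tcp", 1, 6),
    Endpoint("tcp10b-v6", "tcp", 10, 6),
]
ordered = [ep.label for ep in order_multipath(example)]
```

Because the sort is stable, two interfaces of equal speed and address
family keep whatever relative order the server listed them in.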
3.3.1.1. Constructing An fs_locations Multipath List
When creating an fs_locations pseudofs multipath replica list, the
server fs_locations4 locations list SHOULD be ordered as described
above in Section 3.3.1.
An entry in the fs_location4 server array is formed as defined in
Section 11.9 [RFC5661].
The fs_locations4 fs_root and each fs_location4 rootpath MUST be set
to "/" to indicate this fs_locations replica list is on the pseudo
file system.
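A client might check the "/" requirement before treating a reply as
a multipath list. The sketch below assumes the decoded fs_locations4
value is modeled as a Python dict with path names already rendered
as strings; that representation, the function name, and the
documentation-range addresses are assumptions, not part of the
protocol.

```python
def multipath_addresses(fs_locations):
    """Return the server strings of a pseudo-fs multipath list.

    Returns None when fs_root or any rootpath is not "/", meaning the
    reply does not describe the pseudo file system root.
    """
    if fs_locations["fs_root"] != "/":
        return None
    addresses = []
    for location in fs_locations["locations"]:
        if location["rootpath"] != "/":
            return None
        # Each fs_location4 carries an array of server strings.
        addresses.extend(location["server"])
    return addresses


# Hypothetical decoded reply using documentation address ranges.
reply = {
    "fs_root": "/",
    "locations": [
        {"server": ["192.0.2.10"], "rootpath": "/"},
        {"server": ["2001:db8::10"], "rootpath": "/"},
    ],
}
```

An empty result list means no multipath addresses were advertised,
while None means the reply was not about the pseudo file system.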
3.3.1.2. Use of fs_locations_info FSLI4BX Flags With Connection
Trunking
As noted in Section 3.1 both the fs_locations and fs_locations_info
attributes are designed to describe alternative locations for
exported file systems. The pseudo file system replica list describes
a server-wide resource, so file system specific information encoded
in the fs_locations_info attribute has no meaning.
When creating an fs_locations_info pseudofs multipath replica list,
the server SHOULD NOT set the FSLI4BX_GFLAGS, FSLI4BX_CLSIMUL,
FSLI4BX_CLHANDLE, FSLI4BX_CLFILEID, FSLI4BX_CLWRITEVER,
FSLI4BX_CLCHANGE, or FSLI4BX_CLREADDIR fs_locations_server4 fls_info
flag fields. The client MUST ignore these flags.
File system specific information, such as the FSLI4BX rank and order
values and the distinction between read-only and writable file
systems, has no meaning for the connection trunking fs_locations_info
multipath list. However, information beyond the multipath address
that is useful to the client can be expressed in the rank and order
values. We arbitrarily choose the FSLI4BX_READRANK and
FSLI4BX_READORDER values and redefine their meaning below for use
with connection trunking.
The server SHOULD NOT set the fields at byte indices
FSLI4BX_WRITERANK and FSLI4BX_WRITEORDER. The client MUST ignore
these byte fields when interpreting the fs_locations_info multipath
list.
Section 11.10.1 of [RFC5661] describes the use of server-imposed
rank and order file system values, which override client preferences.
The client connectivity characteristics of a multipath address are
typically not visible to the server, so connection trunking multipath
lists do not interpret the FSLI4BX_READRANK or FSLI4BX_READORDER
values as overriding client preferences, but rather as additional
information that the client can use to set up connection trunking.
The server SHOULD set the FSLI4BX_READRANK and FSLI4BX_READORDER
fs_locations_server4 fls_info flag fields for each entry as follows.
The FSLI4BX_READRANK value is redefined as the server "interface
index" with a unique value for each server interface. Two connection
trunking fs_locations_server4 entries that carry the same fls_info
FSLI4BX_READRANK value refer to the same server interface. This can
occur, for example, if a server
interface has multiple IPv4 addresses, or an IPv4 and an IPv6 address
assigned and entered in the connection trunking multipath list.
The FSLI4BX_READORDER value is redefined as the "relative interface
performance". For connection trunking, the FSLI4BX_READORDER is no
longer used for ordering within the FSLI4BX_READRANK value but
instead orders the fs_locations_server4 fli_entries list.
FSLI4BX_READORDER is a value that orders the server interface's
relative performance with the higher performing interfaces having a
larger FSLI4BX_READORDER value. This value MAY equal the nominal
link speed of the Network Interface Card (NIC) in gigabits per
second, e.g., a value of 40 for a 40G NIC.
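On the client side, the redefined values could be used to group
addresses by server interface and to prefer faster interfaces. The
sketch below is one possible interpretation, not a required client
algorithm. The byte indices are those given for fls_info in
Section 11.10 of [RFC5661]; the function names, the make_info helper,
and the addresses are hypothetical.

```python
FSLI4BX_READRANK = 8    # byte index: "interface index" for trunking
FSLI4BX_READORDER = 10  # byte index: "relative interface performance"


def group_by_interface(entries):
    """Map interface index -> (relative performance, [addresses]).

    entries is a list of (address, fls_info) pairs in list order.
    """
    groups = {}
    for address, fls_info in entries:
        rank = fls_info[FSLI4BX_READRANK]
        perf = fls_info[FSLI4BX_READORDER]
        known_perf, addresses = groups.get(rank, (perf, []))
        groups[rank] = (max(known_perf, perf), addresses + [address])
    return groups


def fastest_first(groups):
    """Return interface indices, highest relative performance first."""
    return sorted(groups, key=lambda rank: groups[rank][0], reverse=True)


def make_info(rank, order):
    """Build a 12-byte fls_info with only rank and order set (helper)."""
    info = bytearray(12)
    info[FSLI4BX_READRANK] = rank
    info[FSLI4BX_READORDER] = order
    return bytes(info)


entries = [
    ("192.0.2.10", make_info(1, 10)),
    ("2001:db8::10", make_info(1, 10)),
    ("192.0.2.30", make_info(3, 1)),
    ("192.0.2.40", make_info(4, 40)),
]
groups = group_by_interface(entries)
```

Grouping by interface index lets a client avoid opening several
connections that would all land on the same physical interface.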
3.3.1.3. Constructing An fs_locations_info Multipath List
When creating an fs_locations_info connection trunking multipath
list, the server fs_locations_item4 fli_entries list SHOULD be
ordered as described above in Section 3.3.1 with the appropriate
FSLI4BX_READRANK and FSLI4BX_READORDER fls_info values.
There is no FSLI4BX_TFLAGS transport flag for Ethernet, so for
Ethernet fs_locations_server4 entries the FSLI4BX_TFLAGS byte is not
set. The server MUST set the FSLI4BX_TFLAGS fls_info byte value to
FSLI4TF_RDMA on an RDMA fs_locations_server4 entry.
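As an illustration, a server might fill in the fls_info byte array
for one entry as sketched below. The constant values are the ones
given in Section 11.10 of [RFC5661] and should be checked against
that document. The function name and the choice of a 12-byte array
(one byte for each defined index) are assumptions of this sketch.

```python
FSLI4BX_TFLAGS = 1       # byte index of the transport flags
FSLI4BX_READRANK = 8     # byte index: interface index for trunking
FSLI4BX_READORDER = 10   # byte index: relative interface performance
FSLI4BX_WRITEORDER = 11  # highest byte index defined for fls_info
FSLI4TF_RDMA = 0x10      # transport flag: RDMA is supported


def encode_fls_info(interface_index, relative_perf, rdma=False):
    """Return fls_info bytes for one connection trunking entry."""
    for value in (interface_index, relative_perf):
        if not 0 <= value <= 255:
            raise ValueError("rank and order must each fit in one byte")
    # All bytes start at zero, so the flags and the write rank/order
    # fields that have no meaning for trunking are left unset.
    info = bytearray(FSLI4BX_WRITEORDER + 1)
    info[FSLI4BX_READRANK] = interface_index
    info[FSLI4BX_READORDER] = relative_perf
    if rdma:
        info[FSLI4BX_TFLAGS] = FSLI4TF_RDMA
    return bytes(info)


rdma_entry = encode_fls_info(4, 40, rdma=True)
tcp_entry = encode_fls_info(1, 10)
```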
The fs_locations_server4 fls_currency field has no meaning for a
multipath list, and so SHOULD be set to zero. The client MUST ignore
the fls_currency field.
The fs_locations_info4 fli_flags and fli_valid_for fields have no
meaning for a multipath list, and so SHOULD be set to zero. The
client MUST ignore the fli_flags and fli_valid_for fields.
The fs_locations_server4 fls_server is formed as described in
Section 11.10.1 of [RFC5661].
The fs_locations_info connection trunking multipath list will consist
of a single fs_locations_info4 fli_items entry, as all entries share
a common rootpath, that of the pseudo file system. The
fs_locations_info4 fli_fs_root and the fs_locations_item4
fli_rootpath MUST be set to "/" to confirm this fs_locations_info
replica list is on the pseudo file system.
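The overall shape of such a list can be sketched as nested
dictionaries whose keys mirror the XDR field names of [RFC5661]. This
is only a model for illustration: the dict representation, the
function names, and the rendering of path names as strings are
assumptions, and no XDR encoding is performed.

```python
def build_multipath_info(servers):
    """Model an fs_locations_info4 multipath list.

    servers is an ordered list of (fls_server, fls_info) pairs.
    """
    return {
        "fli_flags": 0,        # no meaning for a multipath list
        "fli_valid_for": 0,    # no meaning for a multipath list
        "fli_fs_root": "/",    # pseudo file system root
        "fli_items": [{        # exactly one item: a common rootpath
            "fli_rootpath": "/",
            "fli_entries": [
                {"fls_currency": 0, "fls_info": info, "fls_server": server}
                for server, info in servers
            ],
        }],
    }


def looks_like_multipath_list(info):
    """Client-side check that the list is on the pseudo file system."""
    items = info["fli_items"]
    return (
        info["fli_fs_root"] == "/"
        and len(items) == 1
        and items[0]["fli_rootpath"] == "/"
    )


example_info = build_multipath_info([
    ("192.0.2.10", bytes(12)),
    ("2001:db8::10", bytes(12)),
])
```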
3.3.2. Querying for Multipath Information
Unlike the DS multipath list provided by GETDEVICEINFO, neither the
fs_locations nor the fs_locations_info attribute has a client cache
coherency feature. The client SHOULD query for multipath information
on mount and reboot. The client SHOULD refresh the connection
trunking multipath information whenever the connection goes away on
one or more addresses without a reboot. The client MAY query every
couple of hours or so to discover new multipath addresses.
The client MAY query every hour or so when a multipath list is not
present, to detect a newly instantiated list.
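The refresh guidance above might be reduced to a single decision
function, sketched below. The two interval constants are assumptions
chosen to stand in for "every couple of hours or so" and "every hour
or so"; the function and parameter names are hypothetical.

```python
HOUR = 3600.0
REFRESH_WITH_LIST = 2 * HOUR     # assumed value for "a couple of hours"
REFRESH_WITHOUT_LIST = 1 * HOUR  # assumed value for "every hour or so"


def should_query(now, last_query, has_list,
                 mounted_or_rebooted=False, connection_lost=False):
    """Decide whether to re-read the multipath list.

    now and last_query are timestamps in seconds; last_query is None
    if the list has never been queried.
    """
    if mounted_or_rebooted or connection_lost or last_query is None:
        return True
    interval = REFRESH_WITH_LIST if has_list else REFRESH_WITHOUT_LIST
    return now - last_query >= interval
```

For example, should_query(3600.0, 0.0, has_list=False) is true, while
the same call with has_list=True is false until two hours have
elapsed.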
A multipath-capable client can query a (legacy) server that supports
the fs_locations or fs_locations_info attribute but does not support
the connection trunking multipath list on the pseudo file system. In
this case, the server SHOULD behave as Section 11.9 of [RFC5661]
describes: for an fs_locations attribute query, the server SHOULD
return an fs_locations4 data type with a zero-length locations array
of fs_location4 structures and the fs_root set to "/". For an
fs_locations_info attribute query, the server SHOULD return a
zero-length fli_items array of fs_locations_item4 structures with the
fli_fs_root set to "/" and the fli_flags and fli_valid_for both set to
zero.
3.3.3. Resolving Server Identity
Section 2.10.5 of [RFC5661] describes how a client uses EXCHANGE_ID to
resolve server identity ambiguity, and test for session and/or client
ID trunking. Connection trunking uses these methods.
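The comparison of EXCHANGE_ID results can be sketched as follows.
This is a simplified reading of Section 2.10.5 of [RFC5661], which
remains the authoritative description; the ExchangeIdResult type, its
field names, and the returned strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExchangeIdResult:
    """Subset of EXCHANGE_ID results used for the comparison."""
    clientid: int        # eir_clientid
    so_major_id: bytes   # eir_server_owner.so_major_id
    so_minor_id: int     # eir_server_owner.so_minor_id
    server_scope: bytes  # eir_server_scope


def trunking_type(a, b):
    """Classify the trunking suggested by two EXCHANGE_ID results."""
    same_server = (
        a.clientid == b.clientid
        and a.so_major_id == b.so_major_id
        and a.server_scope == b.server_scope
    )
    if not same_server:
        return "none"
    if a.so_minor_id == b.so_minor_id:
        return "session"    # session trunking may be attempted
    return "client ID"      # only client ID trunking is suggested


first = ExchangeIdResult(0x1234, b"srv-a", 7, b"example-scope")
same_minor = ExchangeIdResult(0x1234, b"srv-a", 7, b"example-scope")
other_minor = ExchangeIdResult(0x1234, b"srv-a", 8, b"example-scope")
other_scope = ExchangeIdResult(0x1234, b"srv-a", 7, b"another-scope")
```

A matching result is only a claim by the server. A client would still
need to verify that claim before relying on it.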
3.3.4. Connection Trunking Example
Here we provide an example exchange between a client and a multi-
homed server. The example server has two 10G interfaces, a 1G
interface, and a 40G RDMA interface. All interfaces have both IPv4
and IPv6 addresses assigned to them.
Following the rules in Section 3.3.1, the server orders its
interfaces and associated addresses to construct the connection
trunking multipath address list as follows: the first 10G(IPv4)
address, the second 10G(IPv4) address, the first 10G(IPv6) address,
the second 10G(IPv6) address, the 1G(IPv4) address, the 1G(IPv6)
address, the RDMA(IPv4) address, and finally, the RDMA(IPv6) address.
This example server interface ordering is used for both the
fs_locations and the fs_locations_info lists.
The fs_locations list consists of an fs_locations4 structure with the
fs_root set to "/" and a locations list where each fs_location4 entry
has a rootpath value set to "/" and a server string representation of
the interface addresses in the above example server interface list
order.
The fs_locations_info list consists of an fs_locations_info4 struct
with the fli_flags and fli_valid_for fields set to zero, the
fli_fs_root set to "/" and an fli_items list with one entry.
The single fli_items fs_locations_item4 struct has the fli_rootpath
set to "/" and an fs_locations_server4 struct for each item in the
above example server interface list order. Each fs_locations_server4
structure in the list has the fls_currency set to zero, the
fls_server set to the same value as the fs_locations server string,
and the fls_info array set as described in Section 3.3.1.2 and shown
here:
o The first 10G interface IPv4 address: FSLI4BX_READRANK=1,
FSLI4BX_READORDER=10
o The second 10G interface IPv4 address: FSLI4BX_READRANK=2,
FSLI4BX_READORDER=10
o The first 10G interface IPv6 address: FSLI4BX_READRANK=1,
FSLI4BX_READORDER=10
o The second 10G interface IPv6 address: FSLI4BX_READRANK=2,
FSLI4BX_READORDER=10
o The 1G interface IPv4 address: FSLI4BX_READRANK=3,
FSLI4BX_READORDER=1
o The 1G interface IPv6 address: FSLI4BX_READRANK=3,
FSLI4BX_READORDER=1
o The RDMA interface IPv4 address: FSLI4BX_TFLAGS=FSLI4TF_RDMA,
FSLI4BX_READRANK=4, FSLI4BX_READORDER=40
o The RDMA interface IPv6 address: FSLI4BX_TFLAGS=FSLI4TF_RDMA,
FSLI4BX_READRANK=4, FSLI4BX_READORDER=40
Note that the fs_locations_info list provides more information than
the fs_locations list: the FSLI4BX_READRANK value identifies the
server interface, and the FSLI4BX_READORDER value conveys the
relative performance of that interface (here, the link speed of the
network interface card in gigabits per second).
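The values in the list above follow mechanically from the ordered
address list. The sketch below assigns an interface index in order
of first appearance and uses the link speed as the relative
performance. The function name, the interface names, and the
address labels are hypothetical.

```python
def assign_rank_and_order(entries):
    """Return (label, READRANK, READORDER) for each ordered entry.

    entries is an ordered list of (interface, gbps, label) tuples.
    """
    rank_of = {}
    result = []
    for interface, gbps, label in entries:
        # A new interface gets the next index; a repeated interface
        # reuses the index it was given on first appearance.
        rank = rank_of.setdefault(interface, len(rank_of) + 1)
        result.append((label, rank, gbps))
    return result


ordered_entries = [
    ("eth10a", 10, "first 10G IPv4"),
    ("eth10b", 10, "second 10G IPv4"),
    ("eth10a", 10, "first 10G IPv6"),
    ("eth10b", 10, "second 10G IPv6"),
    ("eth1", 1, "1G IPv4"),
    ("eth1", 1, "1G IPv6"),
    ("rdma0", 40, "RDMA IPv4"),
    ("rdma0", 40, "RDMA IPv6"),
]
assigned = assign_rank_and_order(ordered_entries)
```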
The client queries the server as described in Section 3.3.2 and
parses the returned fs_locations or fs_locations_info multipath
address list. The client may decide to ping a multipath address with
a NULLPROC RPC to determine connectivity and round trip performance.
An EXCHANGE_ID is then sent to each address that the client wants to
test for connection trunking as described in Section 3.3.3.
4. Trunking Support For Other NFS Versions
NFS versions other than NFSv4.1 can also support trunking if they
provide the following protocol features:
o A place to pin the multipath list on the server. For NFSv4.1,
this is the pseudo file system fs_locations or fs_locations_info
multipath list as described in Section 3.3.1.
o A mechanism for the client to retrieve the multipath list. For
NFSv4.1, this is an fs_locations or fs_locations_info query as
described in Section 3.3.2.
o A client recipe for determining whether trunking is supported on a
multipath address. For NFSv4.1, this is the use of an EXCHANGE_ID
query as described in Section 3.3.3.
For example, NFSv4.2 can directly use the NFSv4.1 trunking support
described in this document.
NFSv4.0 can provide client ID trunking by pinning the multipath list
on the server's pseudo file system and using an fs_locations query as
a retrieval mechanism as described for NFSv4.1 in this document.
NFSv4.0 can then use SETCLIENTID and SETCLIENTID_CONFIRM calls as
described in Section 5.8 of [RFC7931] to determine whether trunking is
supported on a multipath address.
5. Security Considerations
The traditional NFS security model controls access to shared file
systems based on a client's IP address. When multiple transport
connections are in play, a client request can appear from any one of
its network interfaces. Therefore, clients should rely on
authentication of individual users to ensure share access is
controlled appropriately. The client's IP address becomes ever less
meaningful as a mode of access control.
An injection of the IP address of a man-in-the-middle system is
easily done by replacing an IP address in a multipath list as a
GETATTR(fs_locations) reply is conveyed back to a client.
Recommendations to protect GETATTR(fs_locations) [RFC5661] and
SETCLIENTID [RFC7530] (and EXCHANGE_ID for NFSv4.1) with an
integrity-protecting security service are key to preventing such an
attack.
As an additional step, Section 2.10.5.1 of [RFC5661] recommends that
clients reliably verify a server's claims of trunking support for a
session or client ID using strong authentication of the server that
responds on each IP address in a multipath list.
6. IANA Considerations
There are no IANA considerations for this document.
7. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/
RFC2119, March 1997,
<http://www.rfc-editor.org/info/rfc2119>.
[RFC5661] Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed.,
"Network File System (NFS) Version 4 Minor Version 1
Protocol", RFC 5661, DOI 10.17487/RFC5661, January 2010,
<http://www.rfc-editor.org/info/rfc5661>.
[RFC7530] Haynes, T., Ed. and D. Noveck, Ed., "Network File System
(NFS) Version 4 Protocol", RFC 7530, DOI 10.17487/RFC7530,
March 2015, <http://www.rfc-editor.org/info/rfc7530>.
[RFC7931] Noveck, D., Ed., Shivam, P., Ed., Lever, C., Ed., and B.
Baker, Ed., "NFSv4.0 Migration: Specification Update", RFC
7931, DOI 10.17487/RFC7931, July 2016,
<http://www.rfc-editor.org/info/rfc7931>.
Appendix A. Acknowledgments
Andy Adamson would like to thank NetApp, Inc. for its funding of his
time on this project.
Authors' Addresses
William A. (Andy) Adamson
NetApp
3629 Wagner Ridge Ct
Ann Arbor, MI 48103
USA
Email: andros@netapp.com
Charles Lever (editor)
Oracle Corporation
1015 Granger Avenue
Ann Arbor, MI 48104
USA
Phone: +1 248 816 6463
Email: chuck.lever@oracle.com