Internet DRAFT - draft-hellwig-nfsv4-scsi-layout
draft-hellwig-nfsv4-scsi-layout
NFSv4 C. Hellwig
Internet-Draft March 23, 2015
Intended status: Standards Track
Expires: September 24, 2015
Parallel NFS (pNFS) SCSI Layout
draft-hellwig-nfsv4-scsi-layout-01.txt
Abstract
Parallel NFS (pNFS) extends Network File Sharing version 4 (RFC5661)
to allow clients to directly access file data on the storage used by
the NFSv4 server. This ability to bypass the server for data access
can increase both performance and parallelism, but requires
additional client functionality for data access, some of which is
dependent on the class of storage used. The main pNFS operations
document specifies storage-class-independent extensions to NFS, the
pNFS Block/Volume Layout (RFC5663) specifies the additional
extensions for use of pNFS with block-and volume-based storage, while
this document provides extensions to the pNFS Block/Volume Layout
document to provide reliable fencing and better device
discoverability for SCSI based shared storage devices.
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on September 24, 2015.
Copyright Notice
Copyright (c) 2015 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
Hellwig Expires September 24, 2015 [Page 1]
Internet-Draft pNFS SCSI Layout March 2015
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2. Conventions Used in This Document . . . . . . . . . . . . 3
1.3. Code Components Licensing Notice . . . . . . . . . . . . . 3
1.4. XDR Description . . . . . . . . . . . . . . . . . . . . . 3
2. SCSI Layout Description . . . . . . . . . . . . . . . . . . . 5
2.1. GETDEVICEINFO . . . . . . . . . . . . . . . . . . . . . . 6
2.1.1. Model . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.2. Volume Identification . . . . . . . . . . . . . . . . 7
2.1.3. Volume Topology . . . . . . . . . . . . . . . . . . . 7
2.2. Client Fencing . . . . . . . . . . . . . . . . . . . . . . 9
2.2.1. PRs - Key Generation . . . . . . . . . . . . . . . . . 9
2.2.2. PRs - MDS Registration and Reservation . . . . . . . . 9
2.2.3. PRs - Client Registration . . . . . . . . . . . . . . 9
2.2.4. PRs - Fencing Action . . . . . . . . . . . . . . . . . 10
2.2.5. Client Recovery After a Fence Action . . . . . . . . . 10
3. Security Considerations . . . . . . . . . . . . . . . . . . . 10
4. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11
5. Normative References . . . . . . . . . . . . . . . . . . . . . 11
Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . . 11
Appendix B. RFC Editor Notes . . . . . . . . . . . . . . . . . . 12
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 12
Hellwig Expires September 24, 2015 [Page 2]
Internet-Draft pNFS SCSI Layout March 2015
1. Introduction
In the parallel Network File System (pNFS), the metadata server
returns Layout Type structures that describe where file data is
located. There are different Layout Types for different storage
systems and methods of arranging data on storage devices. This
document extends the pNFS Block/Volume Layout [RFC5663] with a closer
integration into the the SCSI Architecture Model ([SAM-4]) to provide
a generic fencing method and more scalable device discovery.
1.1. Scope
This document only specifies an updated version of the layout-
specific GETDEVICEINFO XDR response, and a new mandatory fencing
method for SCSI devices, but refers to [RFC5663] for the basic
principle of operation, as well as the layout specific XDR data
structures for the LAYOUTGET and LAYOUTCOMMIT operations. This
document does not directly interact with [RFC6688], although the
mechanisms described in this document also achieve the goals of
[RFC6688], and do so in a more robust fashion that does not depend on
the cooperation of the systems involved. Thus, the mechanisms
specified in [RFC6688] are not necessary for a pNFS SCSI layout type
implementation.
1.2. Conventions Used in This Document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
1.3. Code Components Licensing Notice
The external data representation (XDR) description and scripts for
extracting the XDR description are Code Components as described in
Section 4 of "Legal Provisions Relating to IETF Documents" [LEGAL].
These Code Components are licensed according to the terms of Section
4 of "Legal Provisions Relating to IETF Documents".
1.4. XDR Description
This document contains the XDR [RFC4506] description of the NFSv4.1
SCSI layout protocol. The XDR description is embedded in this
document in a way that makes it simple for the reader to extract into
a ready-to-compile form. The reader can feed this document into the
following shell script to produce the machine readable XDR
description of the NFSv4.1 SCSI layout:
#!/bin/sh
Hellwig Expires September 24, 2015 [Page 3]
Internet-Draft pNFS SCSI Layout March 2015
grep '^ *///' $* | sed 's?^ */// ??' | sed 's?^ *///$??'
That is, if the above script is stored in a file called "extract.sh",
and this document is in a file called "spec.txt", then the reader can
do:
sh extract.sh < spec.txt > flex_files_prot.x
The effect of the script is to remove leading white space from each
line, plus a sentinel sequence of "///".
The embedded XDR file header follows. Subsequent XDR descriptions,
with the sentinel sequence are embedded throughout the document.
Note that the XDR code contained in this document depends on types
from the NFSv4.1 nfs4_prot.x file [RFC5662]. This includes both nfs
types that end with a 4, such as offset4, length4, etc., as well as
more generic types such as uint32_t and uint64_t.
/// /*
/// * This code was derived from draft-hellwig-nfsv4-scsi-layout
/// * Please reproduce this note if possible.
/// */
/// /*
/// * Copyright (c) 2015 IETF Trust and the persons identified
/// * as the document authors. All rights reserved.
/// *
/// * Redistribution and use in source and binary forms, with
/// * or without modification, are permitted provided that the
/// * following conditions are met:
/// *
/// * - Redistributions of source code must retain the above
/// * copyright notice, this list of conditions and the
/// * following disclaimer.
/// *
/// * - Redistributions in binary form must reproduce the above
/// * copyright notice, this list of conditions and the
/// * following disclaimer in the documentation and/or other
/// * materials provided with the distribution.
/// *
/// * - Neither the name of Internet Society, IETF or IETF
/// * Trust, nor the names of specific contributors, may be
/// * used to endorse or promote products derived from this
/// * software without specific prior written permission.
/// *
/// * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS
/// * AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED
/// * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
Hellwig Expires September 24, 2015 [Page 4]
Internet-Draft pNFS SCSI Layout March 2015
/// * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
/// * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
/// * EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
/// * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
/// * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
/// * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
/// * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
/// * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
/// * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
/// * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
/// * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
/// * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
/// */
///
/// /*
/// * nfs4_scsi_layout_prot.x
/// */
///
/// %#include "nfs4_block_layout_prot.x"
/// %#include "nfsv41.h"
///
2. SCSI Layout Description
The layout4 type defined in [RFC5662] is extended with a new value as
follows:
enum layouttype4 {
LAYOUT4_NFSV4_1_FILES = 1,
LAYOUT4_OSD2_OBJECTS = 2,
LAYOUT4_BLOCK_VOLUME = 3,
LAYOUT4_SCSI = 0x80000005
[[RFC Editor: please modify the LAYOUT4_SCSI
to be the layouttype assigned by IANA]]
};
struct layout_content4 {
layouttype4 loc_type;
opaque loc_body<>;
};
struct layout4 {
offset4 lo_offset;
length4 lo_length;
layoutiomode4 lo_iomode;
layout_content4 lo_content;
};
Hellwig Expires September 24, 2015 [Page 5]
Internet-Draft pNFS SCSI Layout March 2015
This document defines structure associated with the layouttype4 value
LAYOUT4_SCSI. [RFC5661] specifies the loc_body structure as an XDR
type "opaque". The opaque layout is uninterpreted by the generic
pNFS client layers, but obviously must be interpreted by the Layout
Type implementation. All structures behind this opaque value are
identical to those defined in [RFC5663].
2.1. GETDEVICEINFO
/// /*
/// * Code sets from SPC-3.
/// */
/// enum pnfs_scsi_code_set {
/// PS_CODE_SET_BINARY = 1,
/// PS_CODE_SET_ASCII = 2,
/// PS_CODE_SET_UTF8 = 3
/// };
///
/// /*
/// * Designator types from taken from SPC-3.
/// *
/// * Other values are allocated in SPC-3, but not mandatory to
/// * implement or aren't logical unit names.
/// */
/// enum pnfs_scsi_designator_type {
/// PS_DESIGNATOR_EUI64 = 2,
/// PS_DESIGNATOR_NAA = 3,
/// PS_DESIGNATOR_NAME = 8
/// };
///
/// /*
/// * Logical unit name + reservation key.
/// */
/// struct pnfs_scsi_base_volume_info4 {
/// pnfs_scsi_code_set sbv_code_set;
/// pnfs_scsi_designator_type sbv_designator_type;
/// opaque sbv_designator<>;
/// uint32_t sbv_pr_key;
/// };
///
2.1.1. Model
GETDEVICEINFO calls are handled exactly the same way as specified in
[RFC5663]. The "pnfs_scsi_volume_type4" data structure returned by
the server as the storage-protocol-specific opaque field da_addr_body
in the "device_addr4" structure by a successful GETDEVICEINFO
operation [RFC5661] is a strict superset of the
Hellwig Expires September 24, 2015 [Page 6]
Internet-Draft pNFS SCSI Layout March 2015
"pnfs_block_volume_type" structured defined by [RFC5663].
2.1.2. Volume Identification
SCSI targets implementing [SPC3] export unique logical unit names for
each logical unit through the Device Identification VPD page which
can be obtained using the INQUIRY command. This document uses a
subset of this information to identify logical units backing pNFS
SCSI layouts. It is similar to the "Identification Descriptor Target
Descriptor" specified in [SPC3], but limits the allowed values to
those that uniquely identify a logical unit. Device Identification
VPD page descriptors used to identify logical units for use with pNFS
SCSI layouts must adhere to the following restrictions:
1. The "ASSOCIATION" must be set to 0 (The DESIGNATOR field is
associated with the addressed logical unit).
2. The "DESIGNATOR TYPE" must be set to one of three values
explicitly listed in the "pnfs_scsi_designator_type"
enumerations.
The "CODE SET" VPD page field is stored in the "sbv_code_set" field
of the "pnfs_scsi_base_volume_info4" structure, the "DESIGNATOR TYPE"
is stored in "sbv_designator_type", and the DESIGNATOR is stored in
"sbv_designator". Due to the use of a XDR array the "DESIGNATOR
LENGTH" field does not need to be set separately. Only certain
combinations of "sbv_code_set" and "sbv_designator_type" are valid,
please refer to [SPC3] for details, and note that ASCII may be used
as the code set for UTF-8 text that contains only ASCII characters.
Note that a Device Identification VPD page MAY contain multiple
descriptors with the same association, code set and designator type.
NFS clients thus MUST iterate the descriptors until a match for
"sbv_code_set", "sbv_designator_type" and "sbv_designator" is found,
or until the end of VPD page.
Additionally the server returns a Persistent Reservation key in the
"sbv_pr_key" field. See Section 2.2 for more details on the use of
Persistent Reservations.
2.1.3. Volume Topology
The pNFS SCSI server volume topology is expressed as an arbitrary
combination of base volume types enumerated in the following data
structures. The individual components of the topology are contained
in an array and components may refer to other components by using
array indices.
Hellwig Expires September 24, 2015 [Page 7]
Internet-Draft pNFS SCSI Layout March 2015
/// enum pnfs_scsi_volume_type4 {
/// PNFS_SCSI_VOLUME_SIMPLE =
/// PNFS_BLOCK_VOLUME_SIMPLE , /* invalid */
/// PNFS_SCSI_VOLUME_SLICE = /* see RFC5663 */
/// PNFS_BLOCK_VOLUME_SLICE,
/// PNFS_SCSI_VOLUME_CONCAT = /* see RFC5663 */
/// PNFS_BLOCK_VOLUME_CONCAT,
/// PNFS_SCSI_VOLUME_STRIPE = /* see RFC5663 */
/// PNFS_BLOCK_VOLUME_STRIPE,
/// PNFS_SCSI_VOLUME_BASE = 4 /* SCSI LU */
/// };
///
///
/// union pnfs_scsi_volume4 switch (pnfs_scsi_volume_type4 type) {
/// case PNFS_SCSI_VOLUME_SIMPLE:
/// pnfs_block_simple_volume_info4 sv_simple_info;
/// case PNFS_SCSI_VOLUME_SLICE:
/// pnfs_block_slice_volume_info4 sv_slice_info;
/// case PNFS_SCSI_VOLUME_CONCAT:
/// pnfs_block_concat_volume_info4 sv_concat_info;
/// case PNFS_SCSI_VOLUME_STRIPE:
/// pnfs_block_stripe_volume_info4 sv_stripe_info;
/// case PNFS_SCSI_VOLUME_BASE:
/// pnfs_scsi_base_volume_info4 sv_base_info;
/// };
///
/// /* scsi layout specific type for da_addr_body */
/// struct pnfs_scsi_deviceaddr4 {
/// pnfs_scsi_volume4 sda_volumes<>; /* array of volumes */
/// };
///
All rules for ordering and formation of a "pnfs_scsi_deviceaddr4"
structure are identical to those for a "pnfs_block_deviceaddr4"
structure in [RFC5663], except that the new
pnfs_scsi_base_volume_info4 PNFS_SCSI_VOLUME_BASE case is used in
place of the pnfs_block_simple_volume_info4 PNFS_BLOCK_VOLUME_SIMPLE
case as the base structure. A PNFS_BLOCK_VOLUME_SIMPLE element MUST
NOT be referenced by a pnfs_scsi_deviceaddr4, but is preserved for
XDR level compatibility.
Hellwig Expires September 24, 2015 [Page 8]
Internet-Draft pNFS SCSI Layout March 2015
2.2. Client Fencing
[RFC5663] suggests using either LUN masking or cooperative clients to
implement client fencing. The first implementation requires the
server and the storage device to have a common way to address a
client, which is impossible when the NFS and storage connection don't
share a network, and requires a non-standardized control protocol
between the MDS and the storage device. The second implementation
relies on a cooperative client, which is not robust.
Instead this document specifies a new SCSI-specific fencing protocol
using Persistent Reservations (PRs), similar to the fencing method
used by existing shared disk file systems. By placing a PR of type
"Exclusive Access - All Registrants" on each SCSI logical unit
exported to pNFS clients the MDS prevents access from any client that
does not have an outstanding device device ID that gives the client a
reservation key to access the logical unit, and allows the MDS to
revoke access to the logic unit at any time.
2.2.1. PRs - Key Generation
To allow fencing individual systems, each system must use a unique
Persistent Reservation key. [SPC3] does not specify a way to
generate keys. This document assigns the burden to generate unique
keys to the MDS, which must generate a key for itself before
exporting a volume, and one for each client that accesses a volume.
The MDS MAY either generate a key for each client that accesses logic
units exported by the MDS, or generate a key for each [logical unit,
client] combination. If using a single key per client, the MDS needs
to be aware of the per-client fencing granularity.
2.2.2. PRs - MDS Registration and Reservation
Before returning a PNFS_SCSI_VOLUME_BASE volume to the client, the
MDS needs to prepare the volume for fencing using PRs. This is done
by registering the reservation generated for the MDS with the device
using the "PERSISTENT RESERVE OUT" command with a service action of
"REGISTER", followed by a "PERSISTENT RESERVE OUT" command, with a
service action of "RESERVE" and the type field set to 8h (Exclusive
Access - All Registrants). To make sure all I_T nexuses are
registered, the MDS SHOULD set the "All Target Ports" (ALL_TG_PT) bit
when registering the key, or otherwise ensure the registration is
performed for each initiator port.
2.2.3. PRs - Client Registration
Before performing the first IO to a device returned from a
GETDEVICEINFO operation the client will register the registration key
Hellwig Expires September 24, 2015 [Page 9]
Internet-Draft pNFS SCSI Layout March 2015
returned in sbv_pr_key with the storage device by issuing a
"PERSISTENT RESERVE OUT" command with a service action of REGISTER
with the "SERVICE ACTION RESERVATION KEY" set to the reservation key
returned in sbv_pr_key. To make sure all I_T nexus are registered,
the client SHOULD set the "All Target Ports" (ALL_TG_PT) bit when
registering the key, or otherwise ensure the registration is
performed for each initiator port.
When a client stops using a device earlier returned by GETDEVICEINFO
it MUST unregister the earlier registered key by issuing a
"PERSISTENT RESERVE OUT" command with a service action of "REGISTER"
with the "RESERVATION KEY" set to the earlier registered reservation
key.
2.2.4. PRs - Fencing Action
In case of a non-responding client the MDS MUST fence the client by
issuing a "PERSISTENT RESERVE OUT" command with the service action
set to "PREEMPT" or "PREEMPT AND ABORT", the reservation key field
set to the server's reservation key, the service action reservation
key field set to the reservation key associated with the non-
responding client, and the type field set to 8h (Exclusive Access -
All Registrants).
After the MDS preempts a client, all client I/O to the logical unit
fails. The client should at this point return any layout that refers
to the device ID that points to the logical unit. Note that the
client can distinguish I/O errors due to fencing from other errors
based on the "RESERVATION CONFLICT" status. Refer to [SPC3] for
details.
2.2.5. Client Recovery After a Fence Action
A client that detects I/O errors on the storage devices MUST commit
through the MDS, return all outstanding layouts for the device,
forget the device ID and unregister the reservation key. Future
GETDEVICEINFO calls may refer to the storage device again, in which
case a new registration will be performed.
3. Security Considerations
The security considerations in [RFC5663] apply to this document as
well.
Hellwig Expires September 24, 2015 [Page 10]
Internet-Draft pNFS SCSI Layout March 2015
4. IANA Considerations
IANA is requested to assign a new pNFS layout type in the pNFS Layout
Types Registry as follows (the value 5 is suggested): Layout Type
Name: LAYOUT4_SCSI Value: 0x00000005 RFC: RFCTBD10 How: L (new layout
type) Minor Versions: 1
5. Normative References
[LEGAL] IETF Trust, "Legal Provisions Relating to IETF Documents",
November 2008, <http://trustee.ietf.org/docs/
IETF-Trust-License-Policy.pdf>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", March 1997.
[RFC4506] Eisler, M., "XDR: External Data Representation Standard",
STD 67, RFC 4506, May 2006.
[RFC5661] Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed.,
"Network File System (NFS) Version 4 Minor Version 1
Protocol", RFC 5661, January 2010.
[RFC5662] Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed.,
"Network File System (NFS) Version 4 Minor Version 1
External Data Representation Standard (XDR) Description",
RFC 5662, January 2010.
[RFC5663] Black, D., Ed., Fridella, S., Ed., and J. Glasgow, Ed.,
"Parallel NFS (pNFS) Block/Volume Layout", RFC 5663,
January 2010.
[RFC6688] Black, D., Ed., Glasgow, J., and S. Faibish, "Parallel NFS
(pNFS) Block Disk Protection", RFC 6688, July 2012.
[SAM-4] INCITS Technical Committee T10, "SCSI Architecture Model -
4 (SAM-4)", ANSI INCITS 447-2008, ISO/IEC 14776-414, 2008.
[SPC3] INCITS Technical Committee T10, "SCSI Primary Commands-3",
ANSI INCITS 408-2005, ISO/IEC 14776-453, 2005.
Appendix A. Acknowledgments
David Black, Robert Elliott and Tom Haynes provided a throughout
review of early drafts of this document, and their input lead to the
current form of the document.
Hellwig Expires September 24, 2015 [Page 11]
Internet-Draft pNFS SCSI Layout March 2015
Appendix B. RFC Editor Notes
[RFC Editor: please remove this section prior to publishing this
document as an RFC]
[RFC Editor: prior to publishing this document as an RFC, please
replace all occurrences of RFCTBD10 with RFCxxxx where xxxx is the
RFC number of this document]
Author's Address
Christoph Hellwig
Email: hch@lst.de
Hellwig Expires September 24, 2015 [Page 12]