Network File System Version 4                               T. Myklebust
Internet-Draft                                   Network Appliance, Inc.
Expires: April 19, 2006                                        J. Fields
                                                              W. Adamson
                                                             P. Honeyman
                                                                    CITI
                                                        October 16, 2005


       Network File System (NFS) version 4 byte range delegations
            draft-myklebust-nfsv4-byte-range-delegations-00

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 19, 2006.

Copyright Notice

   Copyright (C) The Internet Society (2005).

Abstract

   This document describes a set of extensions to the NFS version 4
   protocol that enable the client to cache file data when caching
   conflicts prevent the server from handing out a file delegation.

   The proposed extensions enable the caching of only those specific



Myklebust, et al.        Expires April 19, 2006                 [Page 1]

Internet-Draft        NFSv4 byte range delegations          October 2005


   byte ranges of data which the user application is reading or writing.

   As in the case of full delegations, a callback mechanism enables the
   server to request that the client flush cached data when a caching
   conflict occurs.

Keywords

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

Table of Contents

   1.  Introduction
     1.1   File caching in NFS versions 2 and 3
     1.2   File caching in NFS version 4
     1.3   Motivation for extending the NFSv4 delegation model
   2.  Description of the proposed caching model
     2.1   File data
       2.1.1   Read delegations
       2.1.2   Write delegations
     2.2   Upgrading and downgrading byte ranges
     2.3   File truncation and extension
     2.4   Byte range locks
   3.  Stateids and byte range delegations
     3.1   The current delegation stateid
   4.  Callback model
     4.1   Revocation
     4.2   Client recovery from a recalled byte range delegation
     4.3   Client recovery from a recalled file delegation
     4.4   Use of CB_GETATTR for querying the size attribute
   5.  Crash recovery
     5.1   Client reboot scenario
     5.2   Server reboot scenario
     5.3   Network partition
   6.  New client operations
     6.1   DELEG_OPEN - request new byte-range delegation stateid
     6.2   DELEG_RANGE - extend delegation to cover a byte range
     6.3   DELEG_DOWNGRADE - downgrades a write delegation on a byte range
     6.4   DELEG_RELEASE - release a delegation on a byte range
     6.5   DELEG_PUT_STATEID - set the current delegation stateid
   7.  New callback operations
     7.1   CB_RECALL_RANGE - recall a byte range delegation
   8.  References
       Authors' Addresses
       Intellectual Property and Copyright Statements

1.  Introduction


1.1  File caching in NFS versions 2 and 3

   The NFS protocol versions 2 and 3 do not offer any caching guarantees
   to clients.  The most commonly implemented caching model is the so-
   called close-to-open model, which relies on user applications
   providing their own assurances of exclusive access to file data.  In
   this model, the clients limit themselves to checking cache
   consistency when the user opens and closes the file.  In the case
   where the NLM locking extensions are implemented, checks are also
   performed upon taking and releasing advisory locks.

1.2  File caching in NFS version 4

   With the introduction of delegations, NFS version 4 [RFC3530]
   strengthens file caching guarantees at the protocol level under
   limited circumstances that mirror those under which the close-to-open
   model is valid.

   When the client opens a file for reading, the server is permitted to
   offer a "file read delegation" after having determined that no other
   clients have been granted write access.  This is a guarantee that the
   file data and meta-data will not change until the client gives up the
   delegation.  A file read delegation also gives the client the
   opportunity to cache byte range read locks and READ open share locks.

   When the client opens a file with READ or WRITE share semantics, and
   the server determines that the client is the exclusive user of that
   file, it may offer a "file write delegation".  In doing so it
   guarantees that no other client may read or modify the file until the
   delegation is returned.  A write delegation also enables the caching
   of all byte range locks and open share locks.

   The key difference in functionality between a file delegation and a
   lock lies in the fact that the server is able to recall the
   delegation at any time by means of a callback channel.  When a
   delegation is recalled, the client is expected to flush its cache,
   establish its cached locks on the server, and return the delegation,
   and to do all this as quickly as possible.  If the server notes that
   the client has failed to return the delegation within a grace time of
   1 lease period, then the server may unilaterally revoke the
   delegation.

1.3  Motivation for extending the NFSv4 delegation model

   Problems arise when multiple clients wish to access the file, and one
   (or more) has it open for writing.  Delegations are ruled out for
   this case, so unless an application uses byte range locking, a client
   is unable to tell whether cached data is valid.  Perforce, clients
   fall back to either not caching data or checking cache validity
   frequently, increasing the I/O burden on the server.

   One long-standing problem that the NFSv4 delegation model therefore
   fails to solve is that of providing cache consistency guarantees as
   strong as those provided by local file-systems.  This failure has a
   broad impact, e.g. it interferes with porting applications from a
   single machine environment to a cluster of machines that share files
   with NFS.

   Among the applications that require stronger caching semantics than
   NFSv4 provides are those that use shared files for synchronisation
   and communication between processes on different clients but do no
   supplementary locking.  Another example is shared append-only files
   such as logs.

   Even applications that use byte range locking for synchronisation are
   affected.  Unless a peek at the change attribute shows that no one
   has written anywhere in the file, a client may be forced to ignore
   otherwise valid cached data.

2.  Description of the proposed caching model

   Except for the special case of the size attribute, this document does
   not address the issue of file meta-data consistency.

   The proposed model resembles that of file delegations in that the
   client can register with the server to provide synchronous
   notification of changes to locks and cached data.  It also provides
   synchronisation guarantees between writers by allowing them to
   request temporary exclusive access to byte ranges of the file.

   The model is required to operate consistently in a mixed environment
   in which some clients may be using older versions of the NFS protocol
   together with uncached I/O. To the older clients, those that are
   using byte range delegations should appear to behave as if they too
   are using uncached I/O.

2.1  File data

2.1.1  Read delegations

   A server that grants a read delegation on a byte range guarantees
   that no other client may change the data or acquire a write-lock in
   the covered region until the delegation is released.  Note that a
   SETATTR that modifies the size of a file effectively changes the data
   in the region between the old and new sizes.

   The client may request a read delegation on a byte range using the
   DELEG_RANGE operation with a lock type argument of READ_LT or
   READW_LT.  In the case where the READ_LT argument is used, the
   DELEG_RANGE call should fail without triggering a recall if another
   client holds a write delegation for that range.  Clients can use this
   mechanism in order to issue speculative requests that might fail,
   e.g. read-ahead requests.  The server MUST, however, initiate the
   recall of any conflicting write delegation when the READW_LT variant
   is used, whether or not the request is granted.

   In the proposed model, if a current delegation stateid has been set
   using a previous DELEG_PUT_STATEID or DELEG_RANGE operation, then a
   READ request implicitly requests a read delegation on the byte range
   covered by its arguments.  In this case, the server should treat the
   READ request as if it had been immediately preceded by a DELEG_RANGE
   call with a READW_LT argument.

   A server MUST refuse to grant a read delegation on a range that would
   overlap with a write delegation held by another client.  In order to
   allow the caching of byte range locks, the server MUST also refuse to
   grant a read delegation for a range that overlaps with a WRITE lock
   held by another client.

   If another client attempts to write into the region covered by the
   delegation, the server should initiate an immediate recall.  It may
   then optionally return an error of NFS4ERR_DELAY to the write
   request.

2.1.2  Write delegations

   A server that grants a write delegation on a byte range guarantees
   that no other client may change the data in that region until the
   delegation has been released.  In addition, it guarantees that no
   other client may read data or hold a read delegation in that region
   until the write delegation has been downgraded or released.

   The client may request a write delegation on a byte range using the
   DELEG_RANGE operation with a lock type argument of WRITE_LT or
   WRITEW_LT.  In the case where the WRITE_LT argument is used, the
   DELEG_RANGE call should fail without triggering a recall if another
   client holds a read or write delegation for that range.  The server
   MUST, however, initiate the recall of any conflicting read or write
   delegation when the WRITEW_LT variant is used.

   A server MUST refuse to grant a write delegation that would overlap
   with a read or write delegation held by another client.  In order to
   allow the caching of byte range locks, the server MUST also refuse to
   grant a write delegation for a range that overlaps with a READ or
   WRITE lock held by another client.

   To avoid lock starvation for write delegations, the server is
   encouraged to implement the same queueing scheme that is described
   for byte range locks in Section 8.4 of [RFC3530].
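
   The grant rules of Sections 2.1.1 and 2.1.2 can be sketched in Python
   as follows.  This is an illustration under stated assumptions, not
   part of the protocol: the Delegation record and request_delegation()
   are hypothetical names, byte range locks and the all-ones length are
   left out, and the sketch only reports which delegations a server
   would recall.

```python
from dataclasses import dataclass

READ_LT, WRITE_LT = "READ_LT", "WRITE_LT"
READW_LT, WRITEW_LT = "READW_LT", "WRITEW_LT"


@dataclass(frozen=True)
class Delegation:
    """Hypothetical server-side record of one granted byte range."""
    client: str
    write: bool   # True for a write delegation, False for a read one
    offset: int
    length: int


def overlaps(a_off, a_len, b_off, b_len):
    """True if the two byte ranges share at least one byte."""
    return a_off < b_off + b_len and b_off < a_off + a_len


def request_delegation(held, client, locktype, offset, length):
    """Return (status, recalls) for a DELEG_RANGE-style request.

    held    -- delegations currently granted to all clients
    recalls -- conflicting delegations the server would recall
    """
    want_write = locktype in (WRITE_LT, WRITEW_LT)
    waiting = locktype in (READW_LT, WRITEW_LT)
    conflicts = [
        d for d in held
        if d.client != client
        and overlaps(d.offset, d.length, offset, length)
        and (want_write or d.write)   # two read delegations never conflict
    ]
    if not conflicts:
        return "NFS4_OK", []
    # READ_LT and WRITE_LT fail without triggering a recall; the "W"
    # variants are denied as well, but the conflicts are recalled.
    return "NFS4ERR_DENIED", (conflicts if waiting else [])
```

   With a write delegation held by another client, a READ_LT request on
   an overlapping range is denied with an empty recall list, while the
   same request with READW_LT is denied and lists the write delegation
   for recall.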

2.2  Upgrading and downgrading byte ranges

   In the proposed model, a client may request to upgrade a read
   delegation to a write delegation at any time using the DELEG_RANGE
   operation.  If successful, the upgrade must be performed atomically
   by the server so that the client that requested the upgrade can keep
   any cached data.

   Similarly, a client that is holding a write delegation on a byte
   range may, once it is done flushing out any dirty data, request that
   the server atomically downgrade it to a read delegation using the
   DELEG_DOWNGRADE operation.  It is expected that clients will take
   advantage of this as part of a COMMIT compound to obviate recalls.

2.3  File truncation and extension

   Changes to the file size MUST trigger a recall of all byte range
   delegations held by other clients in the region between the old and
   new end of file.

   A useful consequence of this rule is that a client wishing to be
   notified of changes to the size attribute may achieve this by
   requesting a read or write delegation that covers the 2 byte range
   starting at the offset (size - 1).

   If a client holds a write delegation in the region of the end of file
   marker, then it is guaranteed that no other clients can append to the
   file until the client holding the write delegation has finished
   writing out its modifications and released the delegation in that
   region.
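
   The arithmetic behind the 2 byte range can be checked with a short
   Python sketch.  The function names are hypothetical, ranges are
   modelled as simple integers, and the file is assumed to be non-empty
   so that (size - 1) is a valid offset.

```python
def resize_recall_region(old_size, new_size):
    """Half-open region [start, end) between the old and new end of file."""
    return min(old_size, new_size), max(old_size, new_size)


def size_watch_range(size):
    """(offset, length) of the 2 byte range starting at size - 1."""
    if size < 1:
        raise ValueError("sketch assumes a non-empty file")
    return size - 1, 2


def hits(region, offset, length):
    """True if the delegated range shares a byte with the recall region."""
    start, end = region
    return max(start, offset) < min(end, offset + length)
```

   For a 100 byte file the watched range covers bytes 99 and 100.  A
   truncation recalls a region ending at byte 99 and an extension
   recalls a region starting at byte 100, so both overlap the watched
   range.  A 1 byte range at offset 99 would miss the extension case.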

2.4  Byte range locks

   A client holding a write delegation may cache read or write byte
   range lock requests, provided they are fully included in the range
   covered by the write delegation.

   A client holding a read delegation may cache read byte range lock
   requests provided they are fully included in the region covered by
   the read delegation.

   If a delegation is recalled or downgraded, the client is responsible
   for establishing any cached locks to the server as part of the
   process of recovery.
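
   The two caching rules above reduce to a containment test plus a check
   on the delegation type.  A minimal Python sketch, using hypothetical
   function names and ignoring the all-ones length value:

```python
def within(inner_off, inner_len, outer_off, outer_len):
    """True if the inner byte range is fully included in the outer one."""
    return (outer_off <= inner_off
            and inner_off + inner_len <= outer_off + outer_len)


def can_cache_lock(deleg_is_write, deleg_off, deleg_len,
                   lock_is_write, lock_off, lock_len):
    """Decide whether a byte range lock request may be cached locally."""
    if lock_is_write and not deleg_is_write:
        return False   # a read delegation only covers read locks
    return within(lock_off, lock_len, deleg_off, deleg_len)
```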

3.  Stateids and byte range delegations

   One of the goals of the delegation model is to allow clients to cache
   data without having to tie that delegation to a particular open
   stateid.  Although the DELEG_OPEN operation uses an open stateid and
   sequence to guarantee only-once semantics, the resulting stateid is
   not considered to be associated with this particular open stateid.

   To allow it to be reused with other open stateids, therefore, the
   byte range delegation stateid does not carry any share or lock
   information.  A client holding a write delegation on a particular
   byte range has no guarantee that the share reservations on that file
   allow write access.

3.1  The current delegation stateid

   To allow the server to check that a given operation does not violate
   the requested caching semantics, we add the notion of a "current
   delegation stateid".

   Rather than replacing the usual open stateid argument, the current
   delegation stateid is set in a separate operation that precedes the
   READ, WRITE, or SETATTR operation that it protects.  It is set either
   implicitly using a DELEG_RANGE operation, or by using the dedicated
   operation DELEG_PUT_STATEID.  The current delegation stateid is
   automatically cleared by any operation that changes the current
   filehandle.  It may also be cleared by explicitly calling
   DELEG_PUT_STATEID with a special stateid argument consisting of all
   zeros.

   If set, the current delegation stateid applies to all subsequent
   READ, WRITE and SETATTR operations within the same COMPOUND.  The
   server is required to check the current delegation stateid in
   addition to the READ/WRITE/SETATTR's stateid argument, and should
   return NFS4ERR_OLD_STATEID if either stateid has been superseded due
   to a state change.  This may, for instance, occur in the case of a
   race with another DELEG_DOWNGRADE or DELEG_RELEASE request on the
   same file.
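
   The lifetime of the current delegation stateid within a COMPOUND can
   be sketched in Python as follows.  The representation is an
   assumption made for illustration: operations are (name, stateid)
   pairs, a stateid is a 16 byte string, the set of filehandle-changing
   operations is an illustrative subset, and DELEG_OPEN is not modelled.

```python
ZERO_STATEID = bytes(16)   # special stateid consisting of all zeros

# Illustrative subset of operations that change the current filehandle.
FH_CHANGING_OPS = {"PUTFH", "PUTROOTFH", "PUTPUBFH", "LOOKUP",
                   "LOOKUPP", "RESTOREFH", "OPEN", "CREATE"}
PROTECTED_OPS = {"READ", "WRITE", "SETATTR"}


def run_compound(ops):
    """Return, for each protected op, the delegation stateid in force."""
    current = None
    checked = []
    for name, stateid in ops:
        if name in FH_CHANGING_OPS:
            current = None                 # cleared automatically
        elif name == "DELEG_PUT_STATEID":
            # An all-zero argument clears the current delegation stateid.
            current = None if stateid == ZERO_STATEID else stateid
        elif name == "DELEG_RANGE":
            current = stateid              # set implicitly
        elif name in PROTECTED_OPS:
            checked.append((name, current))
    return checked
```

   A READ that follows DELEG_PUT_STATEID is checked against that
   stateid, whereas a READ that follows an intervening LOOKUP is not
   covered by any delegation stateid.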

4.  Callback model

4.1  Revocation

   Servers are permitted to recall a byte range delegation at any time
   and for any reason.  Typical scenarios that trigger such a recall
   include:
   o  Resolving a caching conflict due to a request from another client.
      Operations that may require a recall of the byte range delegation
      include READ, WRITE, LOCK, LOCKT, SETATTR, OPEN or DELEG_RANGE.
   o  Another client's read patterns trigger speculative read-ahead on
      the server.
   o  The amount of delegation state being managed by the server grows
      too large, triggering a reclaim of resources.

   There are two ways for a server to recall a byte range delegation:
   o  As for file delegations, the server can use CB_RECALL to request
      that a client flush all writes and locks affected by the
      delegation, and return the delegation using the DELEGRETURN
      operation.  If the client later wishes to re-establish a
      delegation, then it must first call DELEG_OPEN to obtain a new
      delegation stateid.
   o  The new CB_RECALL_RANGE gives the server finer-grained control
      over which region of the file it recalls.
      CB_RECALL_RANGE also allows the server to request a downgrade
      rather than a full recall of a region that holds cached writes.
      By requesting a downgrade, the server signals that the client may
      convert its write delegations into read delegations after it has
      finished flushing the cached writes to disk.

   Clients that request byte range delegations MUST be able to handle
   both CB_RECALL and CB_RECALL_RANGE recall requests.

4.2  Client recovery from a recalled byte range delegation

   When the server recalls a byte range or part of a byte range that has
   been delegated, the client recovery process is very similar to that
   of file delegation:
   o  If the client holds a read delegation on the recalled byte range,
      then it should recover any cached byte range read locks and mark
      the read cache as invalid.
   o  If a write delegation is held on all or part of the byte range
      being recalled, then the client should recover any cached read or
      write locks, flush out all pending writes, and mark the read cache
      as invalid.

   The recovery process ends when the client returns the delegation on
   the recalled range using either the DELEG_RELEASE or DELEGRETURN
   operations.

   If the server requests a downgrade of a write delegation, then the
   client may choose to use DELEG_DOWNGRADE instead of
   returning the entire delegation.  If it chooses to do so then it need
   not mark the read cache as invalid on that range.
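
   The recovery sequence described in this section can be summarised by
   a small Python sketch.  The function name and the step labels are
   hypothetical, and the order of steps simply follows the prose above.

```python
def recovery_steps(holds_write, downgrade_requested=False):
    """List the client actions for a recalled byte range, in order."""
    if holds_write:
        steps = ["recover cached read and write locks",
                 "flush pending writes"]
    else:
        steps = ["recover cached read locks"]
    if holds_write and downgrade_requested:
        # The read cache stays valid when the client downgrades.
        steps.append("DELEG_DOWNGRADE")
    else:
        steps += ["invalidate read cache",
                  "DELEG_RELEASE or DELEGRETURN"]
    return steps
```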

4.3  Client recovery from a recalled file delegation

   If the server recalls a file write delegation, then the client may
   request read or write byte range delegations as part of the usual
   process of recovering cached locks and flushing out writes.

   The server is under no obligation to honour these requests, but it
   may choose to do so in order to allow the client to continue to cache
   read data or writes that are not causing any immediate cache
   consistency conflicts.

   Likewise, in the case where the server recalls a file read
   delegation, then the client may issue requests for byte range read
   delegations during the recovery phase.

4.4  Use of CB_GETATTR for querying the size attribute

   If a client holds a write delegation that extends across the end of
   file, then it may cache SETATTR or WRITE operations that will cause
   the size attribute to change.  Rather than recall the delegation when
   a second client attempts to query the size attribute, the server MAY
   choose to send a CB_GETATTR callback to the client holding the
   delegation in order to determine the true file size.

   Note that the server MUST NOT issue a CB_GETATTR query for any
   attributes other than size.

5.  Crash recovery

   As usual under NFS, the recovery of byte range delegations after a
   crash is driven by clients.

5.1  Client reboot scenario

   If the client reboots using the standard calls to SETCLIENTID and
   SETCLIENTID_CONFIRM then the server is expected to clear the byte
   range delegations as part of the usual operation of breaking the
   lease state owned by the previous incarnation of the client.

5.2  Server reboot scenario

   The client discovers a server reboot in the usual fashion by
   receiving a NFS4ERR_STALE_CLIENTID or NFS4ERR_STALE_STATEID.  If the
   server supports a grace period, the client may then attempt to
   recover byte range delegations as part of the normal process of state
   recovery.

   During the grace period, the client recovers the byte range
   delegation by issuing requests with the reclaim flag set to true.
   In the usual fashion, the server guarantees that the file will not
   change by rejecting any conflicting non-reclaim delegation, locking,
   OPEN, READ, WRITE, and SETATTR requests.
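
   A minimal Python sketch of the grace period check follows.  It
   assumes the simplest server policy, in which every non-reclaim
   request is rejected during the grace period rather than only the
   conflicting ones.  The function name is hypothetical; the error codes
   are those listed for DELEG_RANGE.

```python
def check_grace(in_grace, reclaim):
    """Return the status a server would give before processing a request."""
    if in_grace and not reclaim:
        return "NFS4ERR_GRACE"      # new state is refused during grace
    if reclaim and not in_grace:
        return "NFS4ERR_NO_GRACE"   # reclaim arrived too late
    return "NFS4_OK"
```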

5.3  Network partition

   If a network partition causes the client to fail to renew its leases
   within the usual lease expiration period, the server MAY choose to
   hold the byte range delegation on behalf of the client until a
   conflict forces a revocation.  Once the delegation has been revoked,
   the server should return NFS4ERR_EXPIRED in response to any attempts
   to use the delegation.

   If the client sees that the change attribute on the file has not been
   modified, it may attempt to re-establish its byte range delegations
   by requesting a DELEG_OPEN, and then replaying the DELEG_RANGE
   requests to the server.  After recovery is complete, the client should
   revalidate its cache again using the change attribute to make sure
   that the cache is still valid.
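
   The client side of this recovery can be sketched in Python as a
   function that plans the operations to send.  The names and the tuple
   encoding are hypothetical, and discarding the cache when the change
   attribute differs is an assumption of this sketch; the handling of
   cached writes is not modelled.

```python
def partition_recovery_ops(cached_change, server_change, cached_ranges):
    """Plan the recovery operations after a lease has expired.

    cached_ranges -- list of (locktype, offset, length) held before
    """
    if cached_change != server_change:
        # The file was modified: cached data cannot be trusted.
        return [("invalidate cache",)]
    ops = [("DELEG_OPEN",)]
    ops += [("DELEG_RANGE", lt, off, ln) for lt, off, ln in cached_ranges]
    # Revalidate once more after the delegations are re-established.
    ops.append(("GETATTR", "change"))
    return ops
```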

   The reader is referred to the section "Revocation Recovery for Write
   Open Delegation" in [RFC3530] for a discussion on how to deal with
   cached writes in regions where recovery of the byte range delegation
   has failed.

6.  New client operations

6.1  DELEG_OPEN - request new byte-range delegation stateid

   SYNOPSIS

     (cfh), open_seqid, open_stateid, deleg_seqid -> stateid, delegation

   ARGUMENT

     struct DELEG_OPEN4args {
             /* CURRENT_FH: opened file */
             seqid4             open_seqid;
             stateid4           open_stateid;
             seqid4             deleg_seqid;
     };

   RESULT

     struct DELEG_OPEN4resok {
             stateid4           stateid;     /* byte range delegation */
             open_delegation4   delegation;  /* open delegation */
     };

     union DELEG_OPEN4res switch (nfsstat4 status) {
      case NFS4_OK:
              /* CURRENT_STATEID: Stateid for byte range delegation */
              DELEG_OPEN4resok   resok4;
      default:
              void;
     };

   DESCRIPTION

   DELEG_OPEN requests a byte-range delegation stateid for a given file.
   The open stateid and sequence id are used to ensure only-once
   semantics in the absence of sessions [draft-ietf-nfsv4-sess-01].  The
   delegation sequence identifier should be initialised to zero upon the
   first call to DELEG_OPEN for a given file and each time the user
   gives up the byte range delegation stateid.

   If the client attempts to call DELEG_OPEN using the special stateids
   consisting of all zero bits or all one bits, the server should deny
   the request using the error NFS4ERR_OPENMODE.

   The server is also required to deny this request with a
   NFS4ERR_CB_PATH_DOWN if the callback path cannot be established.

   On success, the current filehandle retains its value.  The current
   delegation stateid is replaced with the stateid corresponding to the
   byte range delegation.

   IMPLEMENTATION

   The client gives up the byte range delegation stateid using the
   DELEGRETURN operation.

   At any given time there should be at most one byte-range delegation
   stateid in existence per (file, client) pair.  A client is permitted
   to send multiple DELEG_OPEN requests; however, the server should then
   reply with the same stateid.

   The server may additionally choose to grant the client an ordinary
   file delegation.
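
   The rule of at most one byte-range delegation stateid per (file,
   client) pair can be sketched in Python as a table keyed on that pair.
   The class and method names are hypothetical, and stateids are
   modelled as plain integers rather than stateid4 values.

```python
import itertools


class DelegStateidTable:
    """Hypothetical server table of byte-range delegation stateids."""

    def __init__(self):
        self._by_pair = {}
        self._next_id = itertools.count(1)

    def deleg_open(self, file_id, client_id):
        """Return the stateid for the pair, creating it on first use."""
        key = (file_id, client_id)
        if key not in self._by_pair:
            self._by_pair[key] = next(self._next_id)
        return self._by_pair[key]

    def delegreturn(self, file_id, client_id):
        """Give up the stateid; a later DELEG_OPEN yields a new one."""
        self._by_pair.pop((file_id, client_id), None)
```

   Repeated deleg_open() calls for the same pair return the same value
   until delegreturn() is called for that pair.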

   ERRORS
      NFS4ERR_ACCESS
      NFS4ERR_ADMIN_REVOKED
      NFS4ERR_BADHANDLE
      NFS4ERR_BAD_SEQID
      NFS4ERR_BAD_STATEID
      NFS4ERR_BADXDR
      NFS4ERR_CB_PATH_DOWN
      NFS4ERR_DELAY
      NFS4ERR_DENIED
      NFS4ERR_EXPIRED
      NFS4ERR_FHEXPIRED
      NFS4ERR_ISDIR
      NFS4ERR_LEASE_MOVED
      NFS4ERR_MOVED
      NFS4ERR_NOFILEHANDLE
      NFS4ERR_NOTSUPP
      NFS4ERR_OLD_STATEID
      NFS4ERR_OPENMODE
      NFS4ERR_RESOURCE
      NFS4ERR_SERVERFAULT
      NFS4ERR_STALE
      NFS4ERR_STALE_CLIENTID
      NFS4ERR_STALE_STATEID


6.2  DELEG_RANGE - extend delegation to cover a byte range

   SYNOPSIS

     (cfh), locktype, reclaim, stateid, offset, length ->
     (cstateid), offset, length, recall

   ARGUMENT

     struct DELEG_RANGE4args {
             /* CURRENT_FH: file */
             nfs_lock_type4     locktype;
             bool               reclaim;
             stateid4           stateid;
             offset4            offset;
             length4            length;
     };

   RESULT

     enum delegreturn4 {
            NORECALL            = 0,
            DOWNGRADE           = 1,
            RECALL              = 2
     };

     struct DELEG_RANGE4resok {
             offset4            offset;
             length4            length;
             delegreturn4       recall;
     };

     union DELEG_RANGE4res switch (nfsstat4 status) {
       case NFS4_OK:
             DELEG_RANGE4resok  resok4;
       default:
             void;
     };

   DESCRIPTION

   The DELEG_RANGE operation requests a delegation for the byte range
   specified by the offset and length parameters.  The locktype
   specifies the type of caching semantics that are requested.  A
   reclaim request is signalled by setting the reclaim parameter to
   TRUE.

   If the locktype is set to READ_LT or WRITE_LT, and another client
   holds a conflicting delegation, the server should return
   NFS4ERR_DENIED.  If, however, locktype is either READW_LT or
   WRITEW_LT, the server should initiate a recall of all conflicting
   delegations prior to returning NFS4ERR_DENIED.

   If a client requests a locktype of WRITE_LT or WRITEW_LT on a region
   for which it already holds a read delegation, then the server should
   attempt to atomically upgrade the existing delegation.  A server that
   does not support atomic upgrades or downgrades of the byte range
   delegation should return NFS4ERR_LOCK_NOTSUPP.

   On success, the server returns the range covered by the delegation.
   Note that the server may choose to extend the range requested by the
   client in order to decrease the administrative burden by merging
   noncontiguous delegation ranges.  It MUST NOT, however, return a
   range that is smaller than that requested by the client.
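
   One possible extension policy is sketched below in Python: the server
   grants a single range spanning the request and the ranges the same
   client already holds.  This policy and the function names are
   assumptions for illustration; a real server would also have to make
   sure that the extended range conflicts with no other client.

```python
def span_with_held(held, req_off, req_len):
    """Return (offset, length) spanning the request and the held ranges.

    held -- list of (offset, length) already delegated to this client
    """
    start = min([req_off] + [off for off, _ in held])
    end = max([req_off + req_len] + [off + ln for off, ln in held])
    return start, end - start


def covers(granted, requested):
    """True if the granted range is no smaller than the requested one."""
    g_off, g_len = granted
    r_off, r_len = requested
    return g_off <= r_off and r_off + r_len <= g_off + g_len
```

   A client can use a check such as covers() to confirm that the range
   in the reply includes the whole range it asked for.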

   The "recall" flag is an optimisation that can be used by the server
   to notify the client that a conflicting request is already queued.
   If this flag is set to DOWNGRADE, then the client should
   downgrade the write delegation to a read delegation.  If it is set to
   RECALL, then the client should release the delegation.
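
   A client might act on the "recall" value as in the following Python
   sketch.  The enumeration mirrors delegreturn4 from the RESULT section
   above; the function name and the choice of DELEG_RELEASE as the
   release operation are assumptions of this sketch.

```python
from enum import IntEnum


class DelegReturn4(IntEnum):
    """Python mirror of the delegreturn4 enumeration."""
    NORECALL = 0
    DOWNGRADE = 1
    RECALL = 2


def follow_up_op(recall):
    """Name the operation the client should send next, if any."""
    return {
        DelegReturn4.NORECALL: None,
        DelegReturn4.DOWNGRADE: "DELEG_DOWNGRADE",
        DelegReturn4.RECALL: "DELEG_RELEASE",
    }[DelegReturn4(recall)]
```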

   On success the current filehandle retains its value, and the current
   delegation stateid is set to the new value.

   IMPLEMENTATION

   DELEG_RANGE may be called on a given stateid as many times as
   desired.  The server may represent the resulting set of covered bytes
   internally as a list of noncontiguous byte ranges.  Alternatively, it
   may use a simpler representation, for example a single range covering
   all of the bytes ever requested.  A server is free to reject
   DELEG_RANGE requests and to recall delegations for any reason, so at
   worst, the simpler representation might cause the server to deny
   requests (or recall delegations) more often than is strictly
   necessary.

   The READW_LT and WRITEW_LT lock types cause the server to recall any
   conflicting delegations from other clients.  A client will want to
   use these variants in situations where strong cache consistency
   guarantees are needed.

   A length field with all bits one extends the delegation through the
   end of file, regardless of how long the file actually is.
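
   The treatment of an all-ones length can be sketched in Python as
   follows.  ALL_ONES_LENGTH and covers_byte() are hypothetical names,
   and length4 is assumed to be a 64 bit unsigned quantity.

```python
ALL_ONES_LENGTH = 0xFFFFFFFFFFFFFFFF   # length4 with all bits set


def covers_byte(offset, length, byte_pos):
    """True if the delegated range includes the byte at byte_pos."""
    if length == ALL_ONES_LENGTH:
        # Open-ended: every byte from offset onwards, whatever the size.
        return byte_pos >= offset
    return offset <= byte_pos < offset + length
```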

   If mandatory file locking is on for the file, and if a lockowner on a
   client other than the one from which this DELEG_RANGE request
   originated holds a conflicting lock, then the server should return
   NFS4ERR_LOCKED.

   ERRORS
      NFS4ERR_ACCESS
      NFS4ERR_ADMIN_REVOKED
      NFS4ERR_BADHANDLE
      NFS4ERR_BAD_RANGE
      NFS4ERR_BAD_STATEID
      NFS4ERR_BADXDR
      NFS4ERR_DELAY
      NFS4ERR_DENIED
      NFS4ERR_EXPIRED
      NFS4ERR_FHEXPIRED
      NFS4ERR_GRACE
      NFS4ERR_INVAL
      NFS4ERR_ISDIR
      NFS4ERR_LEASE_MOVED
      NFS4ERR_LOCKED
      NFS4ERR_LOCK_NOTSUPP
      NFS4ERR_MOVED
      NFS4ERR_NOFILEHANDLE
      NFS4ERR_NO_GRACE
      NFS4ERR_NOTSUPP
      NFS4ERR_OLD_STATEID
      NFS4ERR_RECLAIM_BAD
      NFS4ERR_RECLAIM_CONFLICT
      NFS4ERR_RESOURCE
      NFS4ERR_SERVERFAULT
      NFS4ERR_STALE
      NFS4ERR_STALE_STATEID

6.3  DELEG_DOWNGRADE - downgrades a write delegation on a byte range

   SYNOPSIS

     (cfh), stateid, deleg_seqid, offset, length -> stateid, recall

   ARGUMENT

     struct DELEG_DOWNGRADE4args {
             /* CURRENT_FH: file */
             stateid4           stateid;
             seqid4             deleg_seqid;
             offset4            offset;
             length4            length;
     };

   RESULT

     struct DELEG_DOWNGRADE4resok {
             stateid4           stateid;
             bool               recall;
     };

     union DELEG_DOWNGRADE4res switch (nfsstat4 status) {
       case NFS4_OK:
               DELEG_DOWNGRADE4resok resok;
       default:
               void;
     };

   DESCRIPTION

   DELEG_DOWNGRADE is used by the client to downgrade all write
   delegations held over a given byte range and convert them into read
   delegations.

   The server may piggyback a request to have the client release the
   delegation onto the reply by setting the "recall" flag to true.

   On success the current filehandle retains its value, and the current
   delegation stateid is set to the new value.

   If the client holds no write delegations in the range
   (offset,length), then the server should treat this operation as a
   no-op and simply return NFS4_OK.

   If the server is unable to atomically convert the existing write
   delegations into read delegations, then the request should fail with
   the error NFS4ERR_LOCK_NOTSUPP.
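
   The effect of a downgrade on a set of held delegations might be
   modeled as below.  The sketch assumes that a write delegation only
   partly covered by (offset,length) is split, with the uncovered part
   remaining a write delegation; the text above does not mandate this
   behavior, and all names are hypothetical.

```python
def downgrade(delegs, offset, length):
    """Convert write delegations inside [offset, offset+length) to read.

    'delegs' is a list of (start, end, kind) tuples with half-open byte
    ranges and kind either "read" or "write".  Splitting a partially
    covered write delegation is an assumption of this sketch.
    """
    lo, hi = offset, offset + length
    out = []
    for s, e, kind in delegs:
        if kind != "write" or e <= lo or s >= hi:
            out.append((s, e, kind))     # unaffected by the downgrade
            continue
        if s < lo:
            out.append((s, lo, "write"))             # left remainder
        out.append((max(s, lo), min(e, hi), "read"))  # downgraded part
        if e > hi:
            out.append((hi, e, "write"))             # right remainder
    return out
```

   When no write delegation overlaps the range, the list comes back
   unchanged, which corresponds to the no-op case that returns
   NFS4_OK.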

   ERRORS
      NFS4ERR_ADMIN_REVOKED
      NFS4ERR_BADHANDLE
      NFS4ERR_BAD_RANGE
      NFS4ERR_BAD_STATEID
      NFS4ERR_BADXDR
      NFS4ERR_DELAY
      NFS4ERR_EXPIRED
      NFS4ERR_FHEXPIRED
      NFS4ERR_GRACE
      NFS4ERR_INVAL
      NFS4ERR_ISDIR
      NFS4ERR_LEASE_MOVED
      NFS4ERR_LOCK_NOTSUPP
      NFS4ERR_MOVED
      NFS4ERR_NOFILEHANDLE
      NFS4ERR_NOTSUPP
      NFS4ERR_OLD_STATEID
      NFS4ERR_RESOURCE
      NFS4ERR_SERVERFAULT
      NFS4ERR_STALE
      NFS4ERR_STALE_STATEID

6.4  DELEG_RELEASE - release a delegation on a byte range

   SYNOPSIS

     (cfh), stateid, deleg_seqid, offset, length -> stateid

   ARGUMENT

     struct DELEG_RELEASE4args {
             /* CURRENT_FH: file */
             stateid4           stateid;
             seqid4             deleg_seqid;
             offset4            offset;
             length4            length;
     };

   RESULT

     struct DELEG_RELEASE4resok {
             stateid4           stateid;
     };

     union DELEG_RELEASE4res switch (nfsstat4 status) {
       case NFS4_OK:
               DELEG_RELEASE4resok resok;
       default:
               void;
     };

   DESCRIPTION

   The DELEG_RELEASE operation notifies the server that the client is no
   longer caching any data in the specified range, and returns any byte
   range delegations that may be held in that range.
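
   As an illustration, releasing a range from the middle of a larger
   delegation could leave two smaller delegations behind.  The sketch
   below assumes half-open (start, end) tuples and a hypothetical
   helper name; it does not describe any particular implementation.

```python
def release(ranges, offset, length):
    """Remove [offset, offset+length) from a list of held byte ranges."""
    lo, hi = offset, offset + length
    out = []
    for s, e in ranges:
        if e <= lo or s >= hi:
            out.append((s, e))           # no overlap: still held
            continue
        if s < lo:
            out.append((s, lo))          # part before the released range
        if e > hi:
            out.append((hi, e))          # part after the released range
    return out
```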

   ERRORS
      NFS4ERR_ADMIN_REVOKED
      NFS4ERR_BADHANDLE
      NFS4ERR_BAD_RANGE
      NFS4ERR_BAD_STATEID
      NFS4ERR_BADXDR
      NFS4ERR_DELAY
      NFS4ERR_EXPIRED
      NFS4ERR_FHEXPIRED
      NFS4ERR_INVAL
      NFS4ERR_ISDIR
      NFS4ERR_LEASE_MOVED
      NFS4ERR_MOVED
      NFS4ERR_NOFILEHANDLE
      NFS4ERR_NOTSUPP
      NFS4ERR_OLD_STATEID
      NFS4ERR_RESOURCE
      NFS4ERR_SERVERFAULT
      NFS4ERR_STALE
      NFS4ERR_STALE_STATEID

6.5  DELEG_PUT_STATEID - set the current delegation stateid

   SYNOPSIS

     (cfh), stateid -> (cstateid)

   ARGUMENT

     struct DELEG_PUT_STATEID4args {
             /* CURRENT_FH: file */
             stateid4           stateid;
     };

   RESULT

     struct DELEG_PUT_STATEID4res {
             nfsstat4           status;
     };

   DESCRIPTION

   The DELEG_PUT_STATEID operation is used by the client to set the
   current delegation stateid.

   If the client specifies the special stateid consisting of all zeros,
   then the server is expected to clear the current delegation stateid.

   IMPLEMENTATION

   This operation is used in order to apply a byte range delegation to
   any subsequent READ or WRITE requests within the same COMPOUND.
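
   A server might track this per-COMPOUND state roughly as follows.
   The sketch assumes that a stateid4 can be modeled as 16 opaque
   bytes (a 4-byte seqid followed by 12 other bytes, as in [RFC3530]);
   the class and function names are hypothetical, and stateid
   validation is omitted.

```python
# Assumption: a stateid4 is modeled as 16 opaque bytes, so the special
# all-zeros stateid is 16 zero bytes.
ZERO_STATEID = bytes(16)


class CompoundState:
    """State carried from one operation to the next in a COMPOUND."""

    def __init__(self):
        self.current_fh = None
        self.current_deleg_stateid = None


def deleg_put_stateid(state, stateid):
    """Set or clear the current delegation stateid."""
    if state.current_fh is None:
        return "NFS4ERR_NOFILEHANDLE"
    if stateid == ZERO_STATEID:
        state.current_deleg_stateid = None    # all zeros: clear it
    else:
        state.current_deleg_stateid = stateid
    return "NFS4_OK"
```

   A client would typically send PUTFH, then DELEG_PUT_STATEID, then
   the READ or WRITE operations that should be covered by the
   delegation.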

   ERRORS
      NFS4ERR_ADMIN_REVOKED
      NFS4ERR_BADHANDLE
      NFS4ERR_BAD_STATEID
      NFS4ERR_BADXDR
      NFS4ERR_DELAY
      NFS4ERR_EXPIRED
      NFS4ERR_FHEXPIRED
      NFS4ERR_ISDIR
      NFS4ERR_LEASE_MOVED
      NFS4ERR_MOVED
      NFS4ERR_NOFILEHANDLE
      NFS4ERR_OLD_STATEID
      NFS4ERR_RESOURCE
      NFS4ERR_SERVERFAULT
      NFS4ERR_STALE_STATEID

7.  New callback operations

7.1  CB_RECALL_RANGE - recall a byte range delegation

   SYNOPSIS

     stateid, offset, length, downgrade, truncate, fh -> ()

   ARGUMENT

     struct CB_RECALL_RANGE4args {
             stateid4           stateid;
             offset4            offset;
             length4            length;
             bool               downgrade;
             bool               truncate;
             nfs_fh4            fh;
     };

   RESULT

     struct CB_RECALL_RANGE4res {
             nfsstat4           status;
     };

   DESCRIPTION

   The CB_RECALL_RANGE operation is used to compel a client to
   relinquish a delegated byte range and return it to the server.

   IMPLEMENTATION

   The downgrade flag is used by the server to inform the client about
   the nature of the caching conflict that triggered the callback.  If
   set, it indicates that it would suffice to resolve the conflict if
   the client were to downgrade all write delegations in the range to
   read delegations.

   If the downgrade flag is not set, the client MUST prepare to release
   all delegations in the specified range.

   The truncate flag is used to inform the client that the byte range
   being recalled is about to be truncated as a result of an incoming
   SETATTR or OPEN.  The client may use this information to discard any
   queued writes that may otherwise have had to be transferred to disk.

   If a race causes the client to believe that it is not holding any
   delegations in the range specified by the server and there are no
   outstanding requests for this range, then it may signal this to the
   server using the error NFS4ERR_BAD_RANGE.  This may happen, for
   instance, if the server's CB_RECALL_RANGE call raced with a
   DELEG_RELEASE from the client.
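
   One way a client might act on the flags is sketched below.  The
   function name, the string-valued plan, and the choice to flush
   queued writes before returning or downgrading a delegation are
   assumptions of the example, not requirements stated above.

```python
def handle_cb_recall_range(held, pending, offset, length,
                           downgrade, truncate):
    """Decide how to answer a CB_RECALL_RANGE callback.

    'held' is a list of (start, end, kind) delegations with half-open
    byte ranges; 'pending' is True if requests for the range are still
    outstanding.  Returns (status, plan), where plan lists the steps
    the client would take next.
    """
    lo, hi = offset, offset + length
    overlapping = [d for d in held if d[0] < hi and d[1] > lo]
    if not overlapping and not pending:
        # Probably a race with an earlier DELEG_RELEASE.
        return "NFS4ERR_BAD_RANGE", []
    plan = []
    if truncate:
        plan.append("discard queued writes")   # range is being truncated
    else:
        plan.append("flush queued writes")
    # With the downgrade flag set, downgrading is sufficient; the
    # client could still choose to release instead.
    plan.append("DELEG_DOWNGRADE" if downgrade else "DELEG_RELEASE")
    return "NFS4_OK", plan
```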

   ERRORS
      NFS4ERR_BADHANDLE
      NFS4ERR_BAD_STATEID
      NFS4ERR_BADXDR
      NFS4ERR_BAD_RANGE
      NFS4ERR_RESOURCE
      NFS4ERR_SERVERFAULT

8.  References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", RFC 2119.

   [RFC3530]  Shepler, S., "Network File System (NFS) version 4
              Protocol", RFC 3530.

   [draft-ietf-nfsv4-sess-01]
              Talpey, T. and J. Bauman, "NFSv4 Session Extensions".


Authors' Addresses

   Trond Myklebust
   Network Appliance, Inc.
   535 W. William St., Suite 3100
   Ann Arbor, MI  48103
   US

   Phone: +1 734-764-5207
   Email: Trond.Myklebust@netapp.com


   J. Bruce Fields
   U. of Michigan Center for Information Technology Integration
   535 W. William St., Suite 3100
   Ann Arbor, MI  48103
   US

   Email: bfields@citi.umich.edu


   William A. Adamson
   U. of Michigan Center for Information Technology Integration
   535 W. William St., Suite 3100
   Ann Arbor, MI  48103
   US

   Email: andros@citi.umich.edu


   Peter Honeyman
   U. of Michigan Center for Information Technology Integration
   535 W. William St., Suite 3100
   Ann Arbor, MI  48103
   US

   Email: honey@citi.umich.edu


Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.


Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Copyright Statement

   Copyright (C) The Internet Society (2005).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.


Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.