Network File System Version 4 T. Myklebust
Internet-Draft Network Appliance, Inc.
Expires: April 19, 2006 J. Fields
W. Adamson
P. Honeyman
CITI
October 16, 2005
Network File System (NFS) version 4 byte range delegations
draft-myklebust-nfsv4-byte-range-delegations-00
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on April 19, 2006.
Copyright Notice
Copyright (C) The Internet Society (2005).
Abstract
This document describes a set of extensions to the NFS version 4
protocol that enable the client to cache file data when caching
conflicts prevent the server from handing out a file delegation.
The proposed extensions enable the caching of only those specific
Myklebust, et al. Expires April 19, 2006 [Page 1]
Internet-Draft NFSv4 byte range delegations October 2005
byte ranges of data which the user application is reading or writing.
As in the case of full delegations, a callback mechanism enables the
server to request that the client flush cached data when a caching
conflict occurs.
Keywords
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
Table of Contents
1. Introduction
1.1 File caching in NFS versions 2 and 3
1.2 File caching in NFS version 4
1.3 Motivation for extending the NFSv4 delegation model
2. Description of the proposed caching model
2.1 File data
2.1.1 Read delegations
2.1.2 Write delegations
2.2 Upgrading and downgrading byte ranges
2.3 File truncation and extension
2.4 Byte range locks
3. Stateids and byte range delegations
3.1 The current delegation stateid
4. Callback model
4.1 Revocation
4.2 Client recovery from a recalled byte range delegation
4.3 Client recovery from a recalled file delegation
4.4 Use of CB_GETATTR for querying the size attribute
5. Crash recovery
5.1 Client reboot scenario
5.2 Server reboot scenario
5.3 Network partition
6. New client operations
6.1 DELEG_OPEN - request new byte-range delegation stateid
6.2 DELEG_RANGE - extend delegation to cover a byte range
6.3 DELEG_DOWNGRADE - downgrades a write delegation on a byte range
6.4 DELEG_RELEASE - release a delegation on a byte range
6.5 DELEG_PUT_STATEID - set the current delegation stateid
7. New callback operations
7.1 CB_RECALL_RANGE - recall a byte range delegation
8. References
Authors' Addresses
Intellectual Property and Copyright Statements
1. Introduction
1.1 File caching in NFS versions 2 and 3
The NFS protocol versions 2 and 3 do not offer any caching guarantees
to clients. The most commonly implemented caching model is the so-
called close-to-open model, which relies on user applications
providing their own assurances of exclusive access to file data. In
this model, the clients limit themselves to checking cache
consistency when the user opens and closes the file. In the case
where the NLM locking extensions are implemented, checks are also
performed upon taking and releasing advisory locks.
1.2 File caching in NFS version 4
With the introduction of delegations, NFS version 4 [RFC3530]
strengthens file caching guarantees at the protocol level under
limited circumstances that mirror those under which the close-to-open
model is valid.
When the client opens a file for reading, the server is permitted to
offer a "file read delegation" after having determined that no other
clients have been granted write access. This is a guarantee that the
file data and meta-data will not change until the client gives up the
delegation. A file read delegation also gives the client the
opportunity to cache byte range read locks and READ open share locks.
When the client opens a file with READ or WRITE share semantics, and
the server determines that the client is the exclusive user of that
file, it may offer a "file write delegation". In doing so it
guarantees that no other client may read or modify the file until the
delegation is returned. A write delegation also enables the caching
of all byte range locks and open share locks.
The key difference in functionality between a file delegation and a
lock lies in the fact that the server is able to recall the
delegation at any time by means of a callback channel. When a
delegation is recalled, the client is expected to flush its cache,
establish its cached locks on the server, and return the delegation,
and to do all this as quickly as possible. If the server notes that
the client has failed to return the delegation within a grace time of
1 lease period, then the server may unilaterally revoke the
delegation.
1.3 Motivation for extending the NFSv4 delegation model
Problems arise when multiple clients wish to access the file, and one
(or more) has it open for writing. Delegations are ruled out for this
case, so unless an application uses byte range locking, a client is
unable to tell whether cached data is valid. Perforce, clients fall
back to not caching data or checking cache validity frequently,
increasing the I/O burden on the server.
One long-standing problem that the NFSv4 delegation model therefore
fails to solve is that of providing cache consistency guarantees as
strong as those provided by local file-systems. This failure has a
broad impact, e.g. it interferes with porting applications from a
single machine environment to a cluster of machines that share files
with NFS.
Among the applications that require stronger caching semantics than
NFSv4 provides are those that use shared files for synchronisation
and communication between processes on different clients but do no
supplementary locking. Another example is shared append-only files
such as logs.
Even applications that use byte range locking for synchronisation are
affected. Unless a check of the change attribute shows that no one
has written anywhere in the file, a client may be forced
to ignore otherwise valid cached data.
2. Description of the proposed caching model
Except for the special case of the size attribute, this document does
not address the issue of file meta-data consistency.
The proposed model resembles that of file delegations in that the
client can register with the server to provide synchronous
notification of changes to locks and cached data. It also provides
synchronisation guarantees between writers by allowing them to
request temporary exclusive access to byte ranges of the file.
The model is required to operate consistently in a mixed environment
in which some clients may be using older versions of the NFS protocol
together with uncached I/O. To the older clients, those that are
using byte range delegations should appear to behave as if they too
are using uncached I/O.
2.1 File data
2.1.1 Read delegations
A server that grants a read delegation on a byte range guarantees
that no other client may change the data or acquire a write-lock in
the covered region until the delegation is released. Note that a
SETATTR that modifies the size of a file effectively changes the data
in the region between the old and new sizes.
The client may request a read delegation on a byte range using the
DELEG_RANGE operation with a lock type argument of READ_LT or
READW_LT. In the case where the READ_LT argument is used, the
DELEG_RANGE call should fail without triggering a recall if another
client holds a write delegation for that range. Clients can use this
mechanism in order to issue speculative requests that might fail,
e.g. read-ahead requests. The server MUST, however, initiate the
recall of any conflicting write delegation when the READW_LT variant
is used, whether or not the request is granted.
In the proposed model, if a current delegation stateid has been set
using a previous DELEG_PUT_STATEID or DELEG_RANGE operation, then a
READ request implicitly requests a read delegation on the byte range
covered by its arguments. In this case, the server should treat the
READ request as if it had been immediately preceded by a DELEG_RANGE
call with a READW_LT argument.
A server MUST refuse to grant a read delegation on a range that would
overlap with a write delegation held by another client. In order to
allow the caching of byte range locks, the server MUST also refuse to
grant a read delegation for a range that overlaps with a WRITE lock
held by another client.
If another client attempts to write into the region covered by the
delegation, the server should initiate an immediate recall. It may
then optionally return an error of NFS4ERR_DELAY to the write
request.
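The grant rules for read delegations can be sketched as a small decision function. This is an illustration under stated assumptions, not part of the protocol: the names Held, overlaps and request_read_delegation are hypothetical, byte ranges are treated as half-open intervals, and the sketch denies the request whenever a conflict exists, initiating recalls of conflicting write delegations only for READW_LT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Held:
    """State held on the file by some client (hypothetical record type)."""
    client: str
    kind: str      # "write_deleg", "read_deleg", "write_lock" or "read_lock"
    offset: int
    length: int


def overlaps(a_off, a_len, b_off, b_len):
    """True if the half-open ranges [off, off+len) share at least one byte."""
    return a_off < b_off + b_len and b_off < a_off + a_len


def request_read_delegation(client, locktype, offset, length, held):
    """Return (granted, delegations_to_recall) for READ_LT or READW_LT."""
    blockers = [h for h in held
                if h.client != client
                and h.kind in ("write_deleg", "write_lock")
                and overlaps(offset, length, h.offset, h.length)]
    if not blockers:
        return True, []
    # Only the waiting variant initiates a recall, and only delegations
    # (not byte range locks) can be recalled.
    if locktype == "READW_LT":
        return False, [h for h in blockers if h.kind == "write_deleg"]
    return False, []
```

With client B holding a write delegation on bytes 100 to 149, a READ_LT request from client A for bytes 90 to 109 fails quietly, while the same request with READW_LT fails and also names B's delegation for recall.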
2.1.2 Write delegations
A server that grants a write delegation on a byte range guarantees
that no other client may change the data in that region until the
delegation has been released. In addition, it guarantees that no
other client may read data or hold a read delegation in that region
until the write delegation has been downgraded or released.
The client may request a write delegation on a byte range using the
DELEG_RANGE operation with a lock type argument of WRITE_LT or
WRITEW_LT. In the case where the WRITE_LT argument is used, the
DELEG_RANGE call should fail without triggering a recall if another
client holds a read or write delegation for that range. The server
MUST, however, initiate the recall of any conflicting read or write
delegation when the WRITEW_LT variant is used.
A server MUST refuse to grant a write delegation that would overlap
with a read or write delegation held by another client. In order to
allow the caching of byte range locks, the server MUST also refuse to
grant a write delegation for a range that overlaps with a READ or
WRITE lock held by another client.
To avoid lock starvation for write delegations, the server is
encouraged to implement the same queueing scheme that is described
for byte range locks in Section 8.4 of [RFC3530].
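The refusal and recall rules of Sections 2.1.1 and 2.1.2 can be summarised as a pair of lookup tables. The following is a minimal sketch, not a normative definition: the names LOCKTYPES, REFUSED_BY, RECALLABLE and evaluate are hypothetical, and the input is assumed to be the kinds of state that other clients hold on an overlapping range.

```python
# Requested delegation type and "waiting" behaviour per lock type argument.
LOCKTYPES = {
    "READ_LT":   ("read", False),
    "READW_LT":  ("read", True),
    "WRITE_LT":  ("write", False),
    "WRITEW_LT": ("write", True),
}

# Overlapping state held by ANOTHER client that forces a refusal.
REFUSED_BY = {
    "read":  {"write_deleg", "write_lock"},
    "write": {"read_deleg", "write_deleg", "read_lock", "write_lock"},
}

# Only delegations can be recalled through the callback channel.
RECALLABLE = {"read_deleg", "write_deleg"}


def evaluate(locktype, other_client_state):
    """Return (must_refuse, kinds_to_recall) for a DELEG_RANGE request."""
    kind, waits = LOCKTYPES[locktype]
    blockers = set(other_client_state) & REFUSED_BY[kind]
    to_recall = blockers & RECALLABLE if waits else set()
    return bool(blockers), to_recall
```

For example, evaluate("WRITEW_LT", ["read_deleg", "read_lock"]) reports a refusal and selects only the read delegation for recall, since the read lock is not held through the delegation mechanism.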
2.2 Upgrading and downgrading byte ranges
In the proposed model, a client may request to upgrade a read
delegation to a write delegation at any time using the DELEG_RANGE
operation. If successful, the upgrade must be performed atomically
by the server so that the client that requested the upgrade can keep
any cached data.
Similarly, a client that is holding a write delegation on a byte
range may, once it is done flushing out any dirty data, request that
the server atomically downgrade it to a read delegation using the
DELEG_DOWNGRADE operation. It is expected that clients will take
advantage of this as part of a COMMIT compound to obviate recalls.
2.3 File truncation and extension
Changes to the file size MUST trigger a recall of all byte range
delegations held by other clients in the region between the old and
new end of file.
A useful consequence of this rule is that a client wishing to be
notified of changes to the size attribute may achieve this by
requesting a read or write delegation that covers the 2-byte range
starting at the offset (size - 1).
If a client holds a write delegation in the region of the end of file
marker, then it is guaranteed that no other clients can append to the
file until the client holding the write delegation has finished
writing out its modifications and released the delegation in that
region.
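The size-watching trick can be checked with a little interval arithmetic. The sketch below is illustrative only: resized_region, size_watch_range and overlaps are hypothetical helper names, ranges are (offset, length) pairs, and the file is assumed to be at least one byte long so that (size - 1) is a valid offset.

```python
def overlaps(a_off, a_len, b_off, b_len):
    """True if the half-open ranges [off, off+len) share at least one byte."""
    return a_off < b_off + b_len and b_off < a_off + a_len


def resized_region(old_size, new_size):
    """(offset, length) of the bytes between the old and new end of file."""
    lo, hi = sorted((old_size, new_size))
    return lo, hi - lo


def size_watch_range(size):
    """The 2-byte range starting at offset (size - 1)."""
    return size - 1, 2


def size_change_recalls_watch(size, new_size):
    """True if resizing from size to new_size hits the watch range."""
    return overlaps(*size_watch_range(size), *resized_region(size, new_size))
```

The watch range straddles the end-of-file marker: its first byte is the last byte of the file, so any truncation touches it, and its second byte is the first byte past the end, so any extension touches it.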
2.4 Byte range locks
A client holding a write delegation may cache read or write byte
range lock requests, provided they are fully included in the range
covered by the write delegation.
A client holding a read delegation may cache read byte range lock
requests provided they are fully included in the region covered by
the read delegation.
If a delegation is recalled or downgraded, the client is responsible
for establishing any cached locks on the server as part of the
process of recovery.
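The two caching rules reduce to a containment test plus a type check. A minimal sketch follows; may_cache_lock is a hypothetical name, and both the delegation and the lock are given as (offset, length) pairs with finite lengths.

```python
def may_cache_lock(deleg_kind, deleg, lock_kind, lock):
    """Decide whether a byte range lock request may be cached locally.

    deleg and lock are (offset, length) pairs; kinds are "read" or "write".
    """
    d_off, d_len = deleg
    l_off, l_len = lock
    fully_inside = d_off <= l_off and l_off + l_len <= d_off + d_len
    if not fully_inside:
        return False
    if deleg_kind == "write":
        return True             # read and write locks may both be cached
    return lock_kind == "read"  # a read delegation covers read locks only
```

A lock that extends even one byte past the delegated range has to be sent to the server, because the delegation gives no guarantee about the uncovered bytes.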
3. Stateids and byte range delegations
One of the goals of the delegation model is to allow clients to cache
data without having to tie that delegation to a particular open
stateid. Although the DELEG_OPEN operation uses an open stateid and
sequence to guarantee only-once semantics, the resulting stateid is
not considered to be associated with this particular open stateid.
To allow it to be reused with other open stateids, therefore, the
byte range delegation stateid does not carry any share or lock
information. A client holding a write delegation on a particular
byte range has no guarantee that the share reservations on that file
allow write access.
3.1 The current delegation stateid
To allow the server to check that a given operation does not violate
the requested caching semantics, we add the notion of a "current
delegation stateid".
Rather than replacing the usual open stateid argument, the current
delegation stateid is set in a separate operation that precedes the
READ, WRITE, or SETATTR operation that it protects. It is set either
implicitly using a DELEG_RANGE operation, or by using the dedicated
operation DELEG_PUT_STATEID. The current delegation stateid is
automatically cleared by any operation that changes the current
filehandle. It may also be cleared by explicitly calling
DELEG_PUT_STATEID with a special stateid argument consisting of all
zeros.
If set, the current delegation stateid applies to all subsequent
READ, WRITE and SETATTR operations within the same COMPOUND. The
server is required to check the current delegation stateid in
addition to the READ/WRITE/SETATTR's stateid argument, and should
return NFS4ERR_OLD_STATEID if either stateid has been superseded due
to a state change. This may, for instance, occur in the case of a
race with another DELEG_DOWNGRADE or DELEG_RELEASE request on the
same file.
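The lifetime of the current delegation stateid within a COMPOUND can be modelled as a simple walk over the operation list. This sketch makes several assumptions that are not in the text: a COMPOUND is represented as (operation, argument) pairs, FH_CHANGING is a partial and purely illustrative list of operations that change the current filehandle, the special stateid is modelled as 16 zero bytes, and a DELEG_RANGE entry carries the stateid that results from a successful call.

```python
ZERO_STATEID = bytes(16)  # special stateid consisting of all zeros

# Partial, illustrative list of operations that change the current filehandle.
FH_CHANGING = {"PUTFH", "PUTROOTFH", "LOOKUP", "RESTOREFH"}


def protected_ops(compound):
    """Return (op, current_delegation_stateid) for each READ/WRITE/SETATTR."""
    current = None
    seen = []
    for op, arg in compound:
        if op in FH_CHANGING:
            current = None                    # cleared automatically
        elif op == "DELEG_PUT_STATEID":
            current = None if arg == ZERO_STATEID else arg
        elif op == "DELEG_RANGE":
            current = arg                     # set implicitly on success
        elif op in ("READ", "WRITE", "SETATTR"):
            seen.append((op, current))
    return seen
```

In a COMPOUND of PUTFH, DELEG_PUT_STATEID, READ, WRITE, PUTFH, READ, the first READ and the WRITE are checked against the delegation stateid, while the final READ is not, because the second PUTFH cleared it.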
4. Callback model
4.1 Revocation
Servers are permitted to recall a byte range delegation at any time
and for any reason. Typical scenarios that trigger such a recall
include:
o Resolving a caching conflict due to a request from another client.
Operations that may require a recall of the byte range delegation
include READ, WRITE, LOCK, LOCKT, SETATTR, OPEN or DELEG_RANGE.
o Another client's read patterns trigger speculative read-ahead on
the server.
o The amount of delegation state being managed by the server grows
too large, triggering a reclaim of resources.
There are two ways for a server to recall a byte range delegation:
o As for file delegations, the server can use CB_RECALL to request
that a client flush all writes and locks affected by the
delegation, and return the delegation using the DELEGRETURN
operation. If the client later wishes to re-establish a
delegation, then it must first call DELEG_OPEN to obtain a new
delegation stateid.
o The new CB_RECALL_RANGE gives the server finer-grained control over
which region of the file it wishes to recall.
CB_RECALL_RANGE also allows the server to request a downgrade
rather than a full recall of a region that holds cached writes.
By requesting a downgrade, the server signals that the client may
convert its write delegations into read delegations after it has
finished flushing the cached writes to disk.
Clients that request byte range delegations MUST be able to handle
both CB_RECALL and CB_RECALL_RANGE recall requests.
4.2 Client recovery from a recalled byte range delegation
When the server recalls a byte range or part of a byte range that has
been delegated, the client recovery process is very similar to that
of file delegation:
o If the client holds a read delegation on the recalled byte range,
then it should recover any cached byte range read locks and mark
the read cache as invalid.
o If a write delegation is held on all or part of the byte range
being recalled, then the client should recover any cached read or
write locks, flush out all pending writes, and mark the read cache
as invalid.
The recovery process ends when the client returns the delegation on
the recalled range using either the DELEG_RELEASE or DELEGRETURN
operations.
If the server requests a downgrade of a write delegation, then the
client may choose to use DELEG_DOWNGRADE instead of
returning the entire delegation. If it chooses to do so then it need
not mark the read cache as invalid on that range.
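The recovery sequences described in this section can be written down as an ordered list of steps. The sketch below is only a restatement of the prose: recovery_steps and its parameter names are hypothetical, and the step strings are descriptive labels, not protocol operations (apart from DELEG_DOWNGRADE).

```python
def recovery_steps(held_kind, downgrade_requested=False, use_downgrade=False):
    """Ordered client actions after a recall of a delegated byte range.

    held_kind is "read" or "write". use_downgrade only has an effect when
    the server actually asked for a downgrade.
    """
    if held_kind == "read":
        return ["recover cached read locks",
                "invalidate read cache",
                "return delegation"]
    steps = ["recover cached read and write locks", "flush pending writes"]
    if downgrade_requested and use_downgrade:
        steps.append("DELEG_DOWNGRADE")   # read cache stays valid
    else:
        steps += ["invalidate read cache", "return delegation"]
    return steps
```

Here "return delegation" stands for either DELEG_RELEASE or DELEGRETURN, whichever the client uses to end the recovery process.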
4.3 Client recovery from a recalled file delegation
If the server recalls a file write delegation, then the client may
request read or write byte range delegations as part of the usual
process of recovering cached locks and flushing out writes.
The server is under no obligation to honour these requests, but it
may choose to do so in order to allow the client to continue to cache
read data or writes that are not causing any immediate cache
consistency conflicts.
Likewise, in the case where the server recalls a file read
delegation, then the client may issue requests for byte range read
delegations during the recovery phase.
4.4 Use of CB_GETATTR for querying the size attribute
If a client holds a write delegation that extends across the end of
file, then it may cache SETATTR or WRITE operations that will cause
the size attribute to change. Rather than recall the delegation when
a second client attempts to query the size attribute, the server MAY
choose to send a CB_GETATTR callback to the client holding the
delegation in order to determine the true file size.
Note that the server MUST NOT issue a CB_GETATTR query for any
attributes other than size.
5. Crash recovery
As usual under NFS, the recovery of byte range delegations after a
crash is driven by clients.
5.1 Client reboot scenario
If the client reboots using the standard calls to SETCLIENTID and
SETCLIENTID_CONFIRM then the server is expected to clear the byte
range delegations as part of the usual operation of breaking the
lease state owned by the previous incarnation of the client.
5.2 Server reboot scenario
The client discovers a server reboot in the usual fashion by
receiving an NFS4ERR_STALE_CLIENTID or NFS4ERR_STALE_STATEID error. If the
server supports a grace period, the client may then attempt to
recover byte range delegations as part of the normal process of state
recovery.
During the grace period, the client recovers the byte range
delegation by issuing requests with the reclaim flag set to true.
In the usual fashion, the server guarantees that the file will not
change by rejecting any conflicting non-reclaim delegation, locking,
OPEN, READ, WRITE, and SETATTR requests.
5.3 Network partition
If a network partition causes the client to fail to renew its leases
within the usual lease expiration period, the server MAY choose to
hold the byte range delegation on behalf of the client until a
conflict forces a revocation. Once the delegation has been revoked, the server should
return NFS4ERR_EXPIRED in response to any attempts to use the
delegation.
If the client sees that the change attribute on the file has not been
modified, it may attempt to re-establish its byte range delegations
by requesting a DELEG_OPEN, and then replaying the DELEG_RANGE
requests to the server. The client should also revalidate
its cache using the change attribute once recovery is complete,
in order to make sure that the cache is still valid.
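The re-establishment procedure can be sketched as a function that builds the list of requests a client would send. Everything here is an assumption made for illustration: the name partition_recovery_plan, the tuple encoding of requests, and the choice to replay each range with its original lock type. When the change attribute differs, the sketch returns an empty plan and leaves the handling of cached data to the caller.

```python
def partition_recovery_plan(cached_change, server_change, held_ranges):
    """Return the ordered requests a client might send after a partition.

    held_ranges: (locktype, offset, length) tuples held before the partition.
    """
    if cached_change != server_change:
        # The file changed while the client was cut off, so the old
        # delegations cannot be re-established on top of the old cache.
        return []
    plan = [("DELEG_OPEN",)]
    plan += [("DELEG_RANGE", lt, off, ln) for lt, off, ln in held_ranges]
    plan.append(("GETATTR", "change"))  # revalidate once recovery completes
    return plan
```

The trailing GETATTR reflects the advice to check the change attribute again after recovery, since the file could have been modified between the first check and the point at which the delegations were re-established.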
The reader is referred to the section "Revocation Recovery for Write
Open Delegation" in [RFC3530] for a discussion on how to deal with
cached writes in regions where recovery of the byte range delegation
has failed.
6. New client operations
6.1 DELEG_OPEN - request new byte-range delegation stateid
SYNOPSIS
(cfh), open_seqid, open_stateid, deleg_seqid -> stateid, delegation
ARGUMENT
struct DELEG_OPEN4args {
    /* CURRENT_FH: opened file */
    seqid4 open_seqid;
    stateid4 open_stateid;
    seqid4 deleg_seqid;
};
RESULT
struct DELEG_OPEN4resok {
    stateid4 stateid; /* byte range delegation */
    open_delegation4 delegation; /* open delegation */
};
union DELEG_OPEN4res switch (nfsstat4 status) {
case NFS4_OK:
    /* CURRENT_STATEID: Stateid for byte range delegation */
    DELEG_OPEN4resok resok4;
default:
    void;
};
DESCRIPTION
DELEG_OPEN requests a byte-range delegation stateid for a given file.
The open stateid and sequence id are used to ensure only-once
semantics in the absence of sessions [draft-ietf-nfsv4-sess-01]. The
delegation sequence identifier should be initialised to zero upon the
first call to DELEG_OPEN for a given file and each time the user
gives up the byte range delegation stateid.
If the client attempts to call DELEG_OPEN using the special stateids
consisting of all zero bits or all one bits, the server should deny
the request using the error NFS4ERR_OPENMODE.
The server is also required to deny this request with
NFS4ERR_CB_PATH_DOWN if the callback path cannot be established.
On success, the current filehandle retains its value. The current
delegation stateid is replaced with the stateid corresponding to the
byte range delegation.
IMPLEMENTATION
The client gives up the byte range delegation stateid using the
DELEGRETURN operation.
At any given time there should be at most one byte-range delegation
stateid in existence per (file, client) pair. A client is permitted
to send multiple DELEG_OPEN requests; however, the server should then
reply with the same stateid.
The server may additionally choose to grant the client an ordinary
file delegation.
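The "at most one stateid per (file, client) pair" rule amounts to an idempotent lookup on the server. A minimal sketch, assuming a hypothetical in-memory table DelegOpenTable and using small integers in place of real stateid4 values:

```python
import itertools


class DelegOpenTable:
    """Hypothetical server-side table of byte-range delegation stateids."""

    def __init__(self):
        self._by_pair = {}
        self._next = itertools.count(1)

    def deleg_open(self, client, fh):
        """Return the stateid for (client, fh), creating it on first use."""
        key = (client, fh)
        if key not in self._by_pair:
            self._by_pair[key] = next(self._next)
        return self._by_pair[key]

    def delegreturn(self, client, fh):
        """Forget the stateid once the client gives it up."""
        self._by_pair.pop((client, fh), None)
```

Repeated DELEG_OPEN calls from the same client on the same file return the same value; only after DELEGRETURN does a later call produce a new one.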
ERRORS
NFS4ERR_ACCESS
NFS4ERR_ADMIN_REVOKED
NFS4ERR_BADHANDLE
NFS4ERR_BAD_SEQID
NFS4ERR_BAD_STATEID
NFS4ERR_BADXDR
NFS4ERR_CB_PATH_DOWN
NFS4ERR_DELAY
NFS4ERR_DENIED
NFS4ERR_EXPIRED
NFS4ERR_FHEXPIRED
NFS4ERR_ISDIR
NFS4ERR_LEASE_MOVED
NFS4ERR_MOVED
NFS4ERR_NOFILEHANDLE
NFS4ERR_NOTSUPP
NFS4ERR_OLD_STATEID
NFS4ERR_OPENMODE
NFS4ERR_RESOURCE
NFS4ERR_SERVERFAULT
NFS4ERR_STALE
NFS4ERR_STALE_CLIENTID
NFS4ERR_STALE_STATEID
6.2 DELEG_RANGE - extend delegation to cover a byte range
SYNOPSIS
(cfh), locktype, reclaim, stateid, offset, length ->
(cstateid), offset, length, recall
ARGUMENT
struct DELEG_RANGE4args {
    /* CURRENT_FH: file */
    nfs_lock_type4 locktype;
    bool reclaim;
    stateid4 stateid;
    offset4 offset;
    length4 length;
};
RESULT
enum delegreturn4 {
    NORECALL = 0,
    DOWNGRADE = 1,
    RECALL = 2
};
struct DELEG_RANGE4resok {
    offset4 offset;
    length4 length;
    delegreturn4 recall;
};
union DELEG_RANGE4res switch (nfsstat4 status) {
case NFS4_OK:
    DELEG_RANGE4resok resok4;
default:
    void;
};
DESCRIPTION
The DELEG_RANGE operation requests a delegation for the byte range
specified by the offset and length parameters. The locktype
specifies the type of caching semantics that are requested. A
reclaim request is signalled by setting the reclaim parameter to
TRUE.
If the locktype is set to READ_LT or WRITE_LT, and another client
holds a conflicting delegation, the server should return
NFS4ERR_DENIED. If, however, the locktype is either READW_LT or
WRITEW_LT, the server should initiate a recall of all conflicting
delegations prior to returning NFS4ERR_DENIED.
If a client requests a locktype of WRITE_LT or WRITEW_LT on a region
for which it already holds a read delegation, then the server should
attempt to atomically upgrade the existing delegation. A server that
does not support atomic upgrades or downgrades of the byte range
delegation should return NFS4ERR_LOCK_NOTSUPP.
On success, the server returns the range covered by the delegation.
Note that the server may choose to extend the range requested by the
client in order to decrease the administrative burden by merging
noncontiguous delegation ranges. It MUST NOT, however, return a
range that is smaller than that requested by the client.
The "recall" flag is an optimisation that can be used by the server
to notify the client that a conflicting request is already queued.
If this flag is set to DOWNGRADE, then the client should
downgrade the write delegation to a read delegation. If it is set to
RECALL, then the client should release the delegation.
On success the current filehandle retains its value, and the current
delegation stateid is set to the new value.
IMPLEMENTATION
DELEG_RANGE may be called on a given stateid as many times as
desired. The server may represent the resulting set of covered bytes
internally as a list of noncontiguous byte ranges. Alternatively, it
may choose a simpler representation, for example a
single range covering all of the bytes ever requested. A server is
free to reject DELEG_RANGE requests and to recall delegations for any
reason, so at worst, a simpler representation might cause the server
to deny requests (or
recall delegations) more often than is strictly necessary.
The READW_LT and WRITEW_LT lock types cause the server to recall any
conflicting delegations from other clients. A client will want to
use these variants in situations where strong cache consistency
guarantees are needed.
A length field with all bits one extends the delegation through the
end of file, regardless of how long the file actually is.
If mandatory file locking is on for the file, and if a lockowner on a
client other than the one from which this DELEG_RANGE request
originated holds a conflicting lock, then the server should return
NFS4ERR_LOCKED.
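The range rules for DELEG_RANGE (an all-ones length meaning "through end of file", the single covering range as the simplest server representation, and the requirement that a reply never be smaller than the request) can be sketched as follows. The helper names are hypothetical, length4 is taken to be a 64-bit unsigned value as in [RFC3530], and "not smaller" is interpreted here as "contains the requested range".

```python
ALL_ONES = 2**64 - 1   # length4 with all bits set: through end of file


def range_end(offset, length):
    """Exclusive end offset, or None when the range runs to end of file."""
    return None if length == ALL_ONES else offset + length


def covering_range(ranges):
    """Single (offset, length) range covering every requested byte."""
    start = min(off for off, _ in ranges)
    ends = [range_end(off, ln) for off, ln in ranges]
    if any(end is None for end in ends):
        return start, ALL_ONES
    return start, max(ends) - start


def reply_is_valid(req_off, req_len, got_off, got_len):
    """True if the granted range contains the requested range."""
    req_end = range_end(req_off, req_len)
    got_end = range_end(got_off, got_len)
    if got_off > req_off:
        return False
    if got_end is None:
        return True
    return req_end is not None and req_end <= got_end
```

A client can use a check like reply_is_valid to detect a server that returned less than was asked for, while a server that tracks only covering_range trades precision for less state.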
ERRORS
NFS4ERR_ACCESS
NFS4ERR_ADMIN_REVOKED
NFS4ERR_BADHANDLE
NFS4ERR_BAD_RANGE
NFS4ERR_BAD_STATEID
NFS4ERR_BADXDR
NFS4ERR_DELAY
NFS4ERR_DENIED
NFS4ERR_EXPIRED
NFS4ERR_FHEXPIRED
NFS4ERR_GRACE
NFS4ERR_INVAL
NFS4ERR_ISDIR
NFS4ERR_LEASE_MOVED
NFS4ERR_LOCKED
NFS4ERR_LOCK_NOTSUPP
NFS4ERR_MOVED
NFS4ERR_NOFILEHANDLE
NFS4ERR_NO_GRACE
NFS4ERR_NOTSUPP
NFS4ERR_OLD_STATEID
NFS4ERR_RECLAIM_BAD
NFS4ERR_RECLAIM_CONFLICT
NFS4ERR_RESOURCE
NFS4ERR_SERVERFAULT
NFS4ERR_STALE
NFS4ERR_STALE_STATEID
6.3 DELEG_DOWNGRADE - downgrades a write delegation on a byte range
SYNOPSIS
(cfh), stateid, deleg_seqid, offset, length -> stateid, recall
ARGUMENT
struct DELEG_DOWNGRADE4args {
    /* CURRENT_FH: file */
    stateid4 stateid;
    seqid4 deleg_seqid;
    offset4 offset;
    length4 length;
};
RESULT
struct DELEG_DOWNGRADE4resok {
    stateid4 stateid;
    bool recall;
};
union DELEG_DOWNGRADE4res switch (nfsstat4 status) {
case NFS4_OK:
    DELEG_DOWNGRADE4resok resok;
default:
    void;
};
DESCRIPTION
DELEG_DOWNGRADE is used by the client to downgrade all write
delegations held over a given byte range and convert them into read
delegations.
The server may piggyback a request to have the client release the
delegation onto the reply by setting the "recall" flag to true.
On success the current filehandle retains its value, and the current
delegation stateid is set to the new value.
If the client holds no write delegations in the range
(offset,length), then the server should treat this operation as a
no-op and simply return NFS4_OK.
If the server is unable to atomically convert the existing write
delegations into read delegations, then the request should fail with
the error NFS4ERR_LOCK_NOTSUPP.
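One way a server might apply a downgrade to its bookkeeping is to split each affected write delegation at the boundaries of the requested range. This is a sketch of one possible representation, not a required one: the function name downgrade is hypothetical, and a client's delegations are assumed to be stored as non-overlapping (start, end, kind) tuples with an exclusive end.

```python
def downgrade(delegations, offset, length):
    """Convert write-delegated bytes inside [offset, offset+length) to read.

    delegations: non-overlapping (start, end, kind) tuples, end exclusive.
    Returns a new list; the input is left unmodified.
    """
    lo, hi = offset, offset + length
    result = []
    for start, end, kind in delegations:
        if kind != "write" or end <= lo or hi <= start:
            result.append((start, end, kind))   # untouched by the request
            continue
        if start < lo:
            result.append((start, lo, "write"))  # left remainder
        result.append((max(start, lo), min(end, hi), "read"))
        if hi < end:
            result.append((hi, end, "write"))    # right remainder
    return result
```

If no write delegation intersects the range, the list comes back unchanged, which matches the no-op behaviour described above.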
ERRORS
NFS4ERR_ADMIN_REVOKED
NFS4ERR_BADHANDLE
NFS4ERR_BAD_RANGE
NFS4ERR_BAD_STATEID
NFS4ERR_BADXDR
NFS4ERR_DELAY
NFS4ERR_EXPIRED
NFS4ERR_FHEXPIRED
NFS4ERR_GRACE
NFS4ERR_INVAL
NFS4ERR_ISDIR
NFS4ERR_LEASE_MOVED
NFS4ERR_LOCK_NOTSUPP
NFS4ERR_MOVED
NFS4ERR_NOFILEHANDLE
NFS4ERR_NOTSUPP
NFS4ERR_OLD_STATEID
NFS4ERR_RESOURCE
NFS4ERR_SERVERFAULT
NFS4ERR_STALE
NFS4ERR_STALE_STATEID
6.4 DELEG_RELEASE - release a delegation on a byte range
SYNOPSIS
(cfh), stateid, deleg_seqid, offset, length -> stateid
ARGUMENT
struct DELEG_RELEASE4args {
    /* CURRENT_FH: file */
    stateid4 stateid;
    seqid4 deleg_seqid;
    offset4 offset;
    length4 length;
};
RESULT
struct DELEG_RELEASE4resok {
    stateid4 stateid;
};
union DELEG_RELEASE4res switch (nfsstat4 status) {
case NFS4_OK:
    DELEG_RELEASE4resok resok;
default:
    void;
};
DESCRIPTION
The DELEG_RELEASE operation notifies the server that the client is no
longer caching any data in the specified range, and returns any byte
range delegations that may be held in that range.
ERRORS
NFS4ERR_ADMIN_REVOKED
NFS4ERR_BADHANDLE
NFS4ERR_BAD_RANGE
NFS4ERR_BAD_STATEID
NFS4ERR_BADXDR
NFS4ERR_DELAY
NFS4ERR_EXPIRED
NFS4ERR_FHEXPIRED
NFS4ERR_INVAL
NFS4ERR_ISDIR
NFS4ERR_LEASE_MOVED
NFS4ERR_MOVED
NFS4ERR_NOFILEHANDLE
NFS4ERR_NOTSUPP
NFS4ERR_OLD_STATEID
NFS4ERR_RESOURCE
NFS4ERR_SERVERFAULT
NFS4ERR_STALE
NFS4ERR_STALE_STATEID
6.5 DELEG_PUT_STATEID - set the current delegation stateid
SYNOPSIS
(cfh), stateid -> (cstateid)
ARGUMENT
struct DELEG_PUT_STATEID4args {
    /* CURRENT_FH: file */
    stateid4 stateid;
};
RESULT
struct DELEG_PUT_STATEID4res {
    nfsstat4 status;
};
DESCRIPTION
The DELEG_PUT_STATEID operation is used by the client to set the
current delegation stateid.
If the client specifies the special stateid consisting of all zeros,
then the server is expected to clear the current delegation stateid.
IMPLEMENTATION
This operation is used to apply a byte range delegation to any
subsequent READ or WRITE requests within the same COMPOUND.
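The following toy model illustrates how a current delegation stateid
might be carried through a COMPOUND. It is a sketch, not a server
implementation: the function run_compound and the (name, argument)
tuple form of an operation are hypothetical, and the 16-byte stateid
length is an assumption based on the stateid4 layout in RFC 3530.

```python
from typing import List, Optional, Tuple

# Assumption: stateid4 is 16 bytes (4-byte seqid + 12-byte opaque), per RFC 3530.
ZERO_STATEID = bytes(16)


def run_compound(ops: List[Tuple[str, bytes]]) -> List[Tuple[str, Optional[bytes]]]:
    """Return, for each READ or WRITE, the delegation stateid in effect for it."""
    current_deleg_stateid: Optional[bytes] = None
    applied: List[Tuple[str, Optional[bytes]]] = []
    for name, arg in ops:
        if name == "DELEG_PUT_STATEID":
            # The all-zeros stateid clears the current delegation stateid.
            current_deleg_stateid = None if arg == ZERO_STATEID else arg
        elif name in ("READ", "WRITE"):
            # Later READ/WRITE operations pick up whatever is currently set.
            applied.append((name, current_deleg_stateid))
        # Other operations (e.g. PUTFH) are ignored by this model.
    return applied
```

In this model, a READ that follows DELEG_PUT_STATEID is associated
with the supplied stateid, and a WRITE that follows a second
DELEG_PUT_STATEID carrying the all-zeros stateid is associated with
none.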
ERRORS
NFS4ERR_ADMIN_REVOKED
NFS4ERR_BADHANDLE
NFS4ERR_BAD_STATEID
NFS4ERR_BADXDR
NFS4ERR_DELAY
NFS4ERR_EXPIRED
NFS4ERR_FHEXPIRED
NFS4ERR_ISDIR
NFS4ERR_LEASE_MOVED
NFS4ERR_MOVED
NFS4ERR_NOFILEHANDLE
NFS4ERR_OLD_STATEID
NFS4ERR_RESOURCE
NFS4ERR_SERVERFAULT
NFS4ERR_STALE_STATEID
7. New callback operations
7.1 CB_RECALL_RANGE - recall a byte range delegation
SYNOPSIS
stateid, offset, length, downgrade, truncate, fh -> ()
ARGUMENT
struct CB_RECALL_RANGE4args {
stateid4 stateid;
offset4 offset;
length4 length;
bool downgrade;
bool truncate;
nfs_fh4 fh;
};
RESULT
struct CB_RECALL_RANGE4res {
nfsstat4 status;
};
DESCRIPTION
The CB_RECALL_RANGE operation is used to compel a client to
relinquish a delegated byte range and return it to the server.
IMPLEMENTATION
The downgrade flag is used by the server to inform the client about
the nature of the caching conflict that triggered the callback. If
set, it indicates that the conflict can be resolved by the client
downgrading all write delegations in the range to read delegations.
If the downgrade flag is not set, the client MUST prepare to release
all delegations in the specified range.
The truncate flag is used to inform the client that the byte range
being recalled is about to be truncated as a result of an incoming
SETATTR or OPEN. The client may use this information to discard any
queued writes that would otherwise have had to be transferred to disk.
If a race causes the client to believe that it is not holding any
delegations in the range specified by the server and there are no
outstanding requests for this range, then it may signal this to the
server by returning the error NFS4ERR_BAD_RANGE. This may, for
instance, be the case if the server's CB_RECALL_RANGE call raced with
a DELEG_RELEASE from the client.
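The client-side decisions described above can be sketched as follows.
This is a simplified model under stated assumptions, not a
prescribed implementation: the class ClientRangeState, its fields,
and the function handle_cb_recall_range are hypothetical; delegations
are treated as whole units when they overlap the recalled range; and
the actual return of the delegation to the server is reduced to
dropping it from a local table.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # hypothetical (offset, length) form


def overlaps(a: Range, b: Range) -> bool:
    return a[0] < b[0] + b[1] and b[0] < a[0] + a[1]


def within(inner: Range, outer: Range) -> bool:
    return inner[0] >= outer[0] and inner[0] + inner[1] <= outer[0] + outer[1]


@dataclass
class ClientRangeState:
    delegations: Dict[Range, str] = field(default_factory=dict)  # "read"/"write"
    queued_writes: List[Range] = field(default_factory=list)
    outstanding: List[Range] = field(default_factory=list)  # requests in flight


def handle_cb_recall_range(state: ClientRangeState, offset: int, length: int,
                           downgrade: bool, truncate: bool) -> str:
    recalled = (offset, length)
    hits = [r for r in state.delegations if overlaps(r, recalled)]
    pending = any(overlaps(r, recalled) for r in state.outstanding)
    if not hits and not pending:
        # Nothing held and nothing in flight, e.g. a race with DELEG_RELEASE.
        return "NFS4ERR_BAD_RANGE"
    if truncate:
        # Writes wholly inside a range about to be truncated may be discarded.
        state.queued_writes = [w for w in state.queued_writes
                               if not within(w, recalled)]
    for r in hits:
        if downgrade:
            state.delegations[r] = "read"  # write becomes read; read is unchanged
        else:
            del state.delegations[r]  # stand-in for releasing the delegation
    return "NFS4_OK"
```

A client is free to release a delegation even when the downgrade flag
is set; the sketch downgrades only to show the minimum that the flag
permits.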
ERRORS
NFS4ERR_BADHANDLE
NFS4ERR_BAD_STATEID
NFS4ERR_BADXDR
NFS4ERR_BAD_RANGE
NFS4ERR_RESOURCE
NFS4ERR_SERVERFAULT
8. References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", RFC 2119.
[RFC3530] Shepler, S., "Network File System (NFS) version 4
Protocol", RFC 3530.
[draft-ietf-nfsv4-sess-01]
Talpey, T. and J. Bauman, "NFSv4 Session Extensions".
Authors' Addresses
Trond Myklebust
Network Appliance, Inc.
535 W. William St., Suite 3100
Ann Arbor, MI 48103
US
Phone: +1 734-764-5207
Email: Trond.Myklebust@netapp.com
J. Bruce Fields
U. of Michigan Center for Information Technology Integration
535 W. William St., Suite 3100
Ann Arbor, MI 48103
US
Email: bfields@citi.umich.edu
William A. Adamson
U. of Michigan Center for Information Technology Integration
535 W. William St., Suite 3100
Ann Arbor, MI 48103
US
Email: andros@citi.umich.edu
Peter Honeyman
U. of Michigan Center for Information Technology Integration
535 W. William St., Suite 3100
Ann Arbor, MI 48103
US
Email: honey@citi.umich.edu
Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Disclaimer of Validity
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Copyright Statement
Copyright (C) The Internet Society (2005). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights.
Acknowledgment
Funding for the RFC Editor function is currently provided by the
Internet Society.