NFSv4 D. Noveck, Ed.
Internet-Draft P. Shivam
Intended status: Informational C. Lever
Expires: September 16, 2014 B. Baker
ORACLE
March 15, 2014

NFSv4 migration: Implementation experience and spec issues to resolve
draft-ietf-nfsv4-migration-issues-05

Abstract

The migration feature of NFSv4 provides for moving responsibility for a single filesystem from one server to another, without disruption to clients. Recent implementation experience has shown problems in the existing specification for this feature. This document discusses the issues that have arisen, explores the options available for curing them, and explains the choices made in updating the NFSv4.0 and NFSv4.1 specifications to address migration.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on September 16, 2014.

Copyright Notice

Copyright (c) 2014 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.


1. Introduction

This document is in the informational category, and while the facts it reports may have normative implications, any such normative significance reflects the readers' preferences. For example, we may report that the reboot of a client with migrated state results in state not being promptly cleared and that this will prevent granting of conflicting lock requests at least for the lease time, which is a fact. While it is to be expected that client and server implementers will judge this to be a situation that is best avoided, the judgment as to how pressing this issue is remains one for the reader, and eventually the nfsv4 working group, to make.

We do explore possible ways in which such issues can be avoided, with minimal negative effects, given that the working group has decided to address these issues, but the choice of exactly how to address these is best given effect in one or more standards-track documents and/or errata.

This document focuses on NFSv4.0, since that is where the majority of implementation experience has been. Nevertheless, there is discussion of the implications of the NFSv4.0 experience for migration in NFSv4.1, as well as discussion of other issues with regard to the treatment of migration in NFSv4.1.

2. Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

In the context of this informational document, these normative keywords will always occur in the context of a quotation, most often direct but sometimes indirect. The context will make it clear whether the quotation is from:

3. NFSv4.0 Implementation Experience

3.1. Implementation issues

Note that the examples below reflect current experience which arises from clients implementing the recommendation to use different nfs_client_id4 id strings for different server addresses, i.e. using what is later referred to herein as the "non-uniform client-string approach."

This is simply because that is the experience implementers have had. The reader should not assume that in all cases, this practice is the source of the difficulty. It may be so in some cases but clearly it is not in all cases.
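The difference between the two client-string approaches can be illustrated with a minimal sketch. The function names and the exact string layout are assumptions made for illustration only; real clients construct their id strings in implementation-specific ways.

```python
def non_uniform_client_string(client_name: str, server_addr: str) -> bytes:
    """Hypothetical non-uniform construction: the server address is folded
    into the id string, so each server address sees a different string."""
    return f"{client_name}-{server_addr}".encode("utf-8")


def uniform_client_string(client_name: str) -> bytes:
    """Hypothetical uniform construction: the same id string is presented
    to every server address."""
    return client_name.encode("utf-8")


if __name__ == "__main__":
    # Documentation-range addresses, used purely as examples.
    for addr in ("192.0.2.1", "192.0.2.2"):
        print(non_uniform_client_string("C", addr), uniform_client_string("C"))
```

With the non-uniform form, a server that receives both strings has no way to tell that they name the same client, because the protocol treats the id string as opaque.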

3.1.1. Failure to free migrated state on client reboot

The following sort of situation has proved troublesome:

Note here that while it seems clear to us in this example that C-XYZ and C-ABC are from the same client, the server has no way to determine the structure of the "opaque" id string. In the protocol, it really is treated as opaque. Only the client knows which nfs_client_id4 values designate the same client on a different server.

3.1.2. Server reboots resulting in a confused lease situation

Further problems arise from scenarios like the following.

Note that if the client used "C" (rather than "C-ABC") as the nfs_client_id4 id string, the exact same situation would arise.

One of the first cases in which this sort of situation has resulted in difficulties is in connection with doing a SETCLIENTID for callback update.

The SETCLIENTID for callback update only includes the nfs_client_id4, assuming there can only be one such with a given nfs_client_id4 value. If there were multiple, confirmed client records with identical nfs_client_id4 id string values, there would be no way to map the callback update request to the correct client record. Apart from the migration handling specified in [RFC3530] and [RFC3530bis], such a situation cannot arise.

One possible accommodation for this particular issue that has been used is to add a RENEW operation along with SETCLIENTID (on a callback update) to disambiguate the client.

When the client updates the callback info to the destination, the client would, by convention, send a compound like this:

{ RENEW clientid4, SETCLIENTID nfs_client_id4,verf,cb }

The presence of the clientid4 in the compound would allow the server to differentiate among the various leases that it knows of, all with the same nfs_client_id4 value.
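A minimal sketch of how a server might use the clientid4 from the RENEW to pick among several confirmed records sharing one nfs_client_id4 id string follows. The ClientRecord type, its fields, and the function name are hypothetical; actual servers keep considerably more state per client.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ClientRecord:
    clientid: int     # stands in for the server-assigned clientid4
    id_string: bytes  # opaque nfs_client_id4 id string
    callback: str     # stands in for the callback information


def record_for_callback_update(records: List[ClientRecord],
                               renew_clientid: int,
                               id_string: bytes) -> Optional[ClientRecord]:
    """Return the record designated by a {RENEW, SETCLIENTID} compound.

    The id string alone may match several records; the clientid4 carried
    by the RENEW narrows the match to at most one of them.
    """
    for rec in records:
        if rec.id_string == id_string and rec.clientid == renew_clientid:
            return rec
    return None
```

Without the RENEW, the lookup would have only the id string to go on, and two records with the same string would be indistinguishable.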

While this would be a reasonable patch for an isolated protocol weakness, interoperable clients and servers would require that the protocol truly be updated to allow such a situation, specifically that of multiple clientid4's with the same nfs_client_id4 value. The protocol is currently designed and implemented assuming this cannot happen. We need to either prevent the situation from happening, or fully adapt to the possibilities which can arise. See Section 4 for a discussion of such issues.

3.1.3. Client complexity issues

Consider the following situation:

Now, instead of a clientid4 identifying a client-server pair, we have many more entities for the client to deal with. In addition, it isn't clear how new state is to be incorporated in this structure.

The limitations of the migrated state (inability to be freed on reboot) would argue against adding more such state but trying to avoid that would run into its own difficulties. For example, a single lockowner string presented under two different clientids would appear as two different entities.
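The lockowner point can be made concrete with a small sketch, assuming a server that indexes lock state by the pair (clientid4, lockowner string). The table layout and names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical key: lock state is scoped by clientid4 and lockowner string.
LockOwnerKey = Tuple[int, bytes]


def record_lock(table: Dict[LockOwnerKey, List[str]],
                clientid: int, owner: bytes, lock: str) -> None:
    """Record a lock under the (clientid, owner) pair that requested it."""
    table.setdefault((clientid, owner), []).append(lock)


if __name__ == "__main__":
    table: Dict[LockOwnerKey, List[str]] = {}
    record_lock(table, 1, b"owner-1", "fs1:/a [0,100)")
    record_lock(table, 2, b"owner-1", "fs2:/b [0,100)")
    # The same owner string now appears as two unrelated entities.
    print(len(table))
```

Under this assumption, the same lockowner string presented under two clientids yields two table entries, which is the behavior described above.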

Thus we have to choose between:

In any case, we have gone (in adding migration as it was described) from a situation in which

To one in which

This sort of additional client complexity is troublesome and needs to be eliminated.

3.2. Sources of Protocol difficulties

3.2.1. Issues with nfs_client_id4 generation and use

The current definitive definitions of the NFSv4.0 protocol, [RFC3530] and [RFC3530bis] both agree. The section entitled "Client ID" says:

There are two possible interpretations of the phrase "uniquely defines" in the above:

The first interpretation would make these client-strings like phone numbers (a single person can have several) while the second would make them like social security numbers.

Endless debate about the true meaning of "uniquely defines" in this context is quite possible but not very helpful. The following points should be noted though:

Given the need for the server to be aware of client identity with regard to migrated state, either client-string construction rules will have to change or there will be a need to get around current issues, or perhaps a combination of these two will be required. Later sections will examine the options and propose a solution.

One consideration that may indicate that this cannot remain exactly as it is today has to do with the fact that the current explanation for this behavior is not correct. The current definitive definitions of the NFSv4.0 protocol, [RFC3530] and [RFC3530bis] both agree. The section entitled "Client ID" says:

In point of fact, a "SETCLIENTID with the same id string" sent to multiple network addresses will be treated as all from the same client but will not "cause the server to begin the process of removing the client's previous leased state" unless the server believes it is a different instance of the same client, i.e. if the id string is the same and there is a different boot verifier. If the client does not reboot, the verifier should not change. If it does reboot, the verifier will change, and the server should "begin the process of removing the client's previous leased state."

The situation of multiple SETCLIENTID requests received by a server on multiple network addresses is exactly the same, from the protocol design point of view, as when multiple (i.e. duplicate) SETCLIENTID requests are received by the server on a single network address. The same protocol mechanisms that prevent erroneous state deletion in the latter case prevent it in the former case. There is no reason for special handling of the multiple-network-appearance case, in this regard.
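The decision described above can be sketched as follows. This is a deliberately simplified illustration: it ignores principals, SETCLIENTID_CONFIRM, and unconfirmed records, and the function name, table layout, and return strings are assumptions rather than anything defined by the protocol.

```python
from typing import Dict


def setclientid_outcome(known_verifiers: Dict[bytes, bytes],
                        id_string: bytes, verifier: bytes) -> str:
    """Classify a SETCLIENTID by id string and boot verifier only.

    Note that the network address the request arrived on is not an input:
    it plays no part in deciding whether previous state is removed.
    """
    previous = known_verifiers.get(id_string)
    if previous is None:
        return "new client"
    if previous == verifier:
        # Same client instance (no reboot): leased state is kept.
        return "same instance, state kept"
    # Same id string with a different verifier: the client has rebooted.
    return "new instance, begin removing previous state"
```

A request with an unchanged verifier is handled the same way whether it is a duplicate on one address or arrives on a second address.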

3.2.2. Issues with lease proliferation

It is often felt that lease proliferation is a consequence of the client-string construction issues, and it is certainly the case that the two are closely connected in that non-uniform client-strings make it impossible for the server to appropriately combine leases from the same client.

However, even where the server could combine leases from the same client, it needs to be clear how and when it will do so, so that the client will be prepared. These issues will have to be addressed at various places in the spec.

This could be enough only if we are prepared to do away with the "should" recommending non-uniform client-strings and replace it with a "should not" or even a "SHOULD NOT". Current client implementation patterns make this an unpalatable choice for use as a general solution, but it is reasonable to "RECOMMEND" this choice for a well-defined subset of clients. One alternative would be to create a way for the server to infer from client behavior which leases are held by the same client and use this information to do appropriate lease mergers. Prototyping and detailed specification work has shown that this could be done but the resulting complexity is such that a better choice is to "RECOMMEND" use of the uniform client-string approach for clients supporting the migration feature.

Because of the discussion of client-string construction in [RFC3530] and [RFC3530bis], most existing clients implement the non-uniform client-string approach. As a result, existing servers may not have been tested with clients implementing uniform client-strings. As a consequence, care must be taken to preserve interoperability between clients capable of using uniform client-strings (UCS-capable clients) and servers that don't tolerate uniform client-strings for one reason or another.

4. Issues to be resolved in NFSv4.0

4.1. Possible changes to nfs_client_id4 client-string

The fact that the reason given in client-string-BP3 is not valid makes the existing "should" insupportable. We can't either

What are often presented as reasons that motivate use of the non-uniform approach always turn out to be cases in which, if the uniform approach were used, the server would treat a client that accesses that server via two different IP addresses as a single client, as it in fact is. This may be disconcerting to a client unaware that the two IP addresses connect to the same server. This is not a reason to use the non-uniform approach but is better thought of as an illustration of the fact that those using the uniform approach need to be aware of the possibility of server trunking and its effect on server behavior.

If it is possible to reliably infer the existence of trunking of server IP addresses from observed server behavior, use of the uniform approach would be more desirable, although compatibility issues would have to be dealt with.

An alternative to having the client infer the existence of trunking of server IP addresses is to make this information available to the client directly. See Section 4.3 for details.

It is always possible that a valid new reason will be found, but so far none has been proposed. Given the history, the burden of proof should be on those asserting the validity of a proposed new reason.

So we will assume for now that the "should" will have to go. The question is what to replace it with.

4.2. Possible changes to handle differing nfs_client_id4 string values

Given the difficulties caused by having different nfs_client_id4 client-string values for the same client, we have two choices:

4.3. Possible changes to add a new operation

It might be possible to return server-identity information to the client, just as is done in NFSv4.1 by the response to the EXCHANGE_ID operation. This could be done by a SETCLIENTID_PLUS optional operation, which acts like SETCLIENTID, except that it returns server identity information. Such information could be used by clients, making it possible for them to be aware of server trunking relationships, rather than having to infer them from server behavior.

It has been generally thought that protocol extensions such as this are not appropriate in bis documents and other documents updating NFSv4 protocol definition RFCs. However, it is argued in [NFS-ext] that protocol extensions, similar to those allowed between minor versions, should be acceptable to correct mistakes within a minor version.

A decision to adopt this approach will require considerable nfsv4 working group discussion and would probably best be effected by means of a standards-track document laying out a modified NFSv4 extension/versioning model applying to all minor versions, as has been proposed.

In view of the time to effect such changes, this approach is not likely to be adopted in an RFC updating [RFC3530] or [RFC3530bis], such as [migr-v4.0-update]. Still, it is worth keeping in mind, if implementers have difficulties inferring trunking relationships using the techniques discussed there.

4.4. Other issues within migration-state sections

There are a number of issues where the existing text is unclear and/or wrong and needs to be fixed in some way.

4.5. Issues within other sections

There are a number of cases in which certain sections, not specifically related to migration, require additional clarification. This is generally because text that is clear in a context in which leases and clientids are created in one place and live there forever may need further refinement in the more dynamic environment that arises as part of migration.

Some examples:

5. Proposed resolution of NFSv4.0 protocol difficulties

This section lists the changes which we believe are necessary to resolve the difficulties mentioned above. Such changes, along with other clarifications found to be desirable during drafting and review, are contained in [migr-v4.0-update].

5.1. Proposed changes: nfs_client_id4 client-string

We propose replacing client-string-BP3 with the following text and adding the following proposed text to provide implementation guidance.

In addition, given the importance of the issue of client identity and the fact that both client-string approaches are to be considered valid, a greatly expanded treatment of client identity is desirable. It should have the following major elements.

5.2. Proposed changes: merged (vs. synchronized) leases

The current definitive definitions of the NFSv4.0 protocol, [RFC3530] and [RFC3530bis] both agree. The section entitled "Migration and State" says:

There are a number of problems with this and any resolution of our difficulties must address them somehow.

To avoid client complexity, we need to have no more than one lease between a single client and a single server. This requires merger of leases since there is no real help from synchronizing them at a single instant.

For the uniform approach, the destination server would simply merge leases as part of state transfer, since two leases with the same nfs_client_id4 values must be for the same client.
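A minimal sketch of this merger, assuming the destination server keeps one lease per nfs_client_id4 id string and represents the state under a lease as a simple list, might look like the following. The function name and data layout are hypothetical.

```python
from typing import Dict, List


def merge_transferred_state(dest_leases: Dict[bytes, List[str]],
                            id_string: bytes,
                            transferred: List[str]) -> None:
    """Fold state transferred from the source server into the lease the
    destination already holds for the same id string, creating that lease
    if the client was previously unknown to the destination."""
    dest_leases.setdefault(id_string, []).extend(transferred)


if __name__ == "__main__":
    leases = {b"C": ["open fs1:/a"]}
    merge_transferred_state(leases, b"C", ["open fs2:/b"])
    # Still a single lease for client "C", now covering both filesystems.
    print(leases)
```

Because the key is the id string itself, this only works when the client presents the same string to both servers, which is exactly what the uniform approach provides.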

We have made the following decisions as far as proposed normative statements regarding state merger. They reflect the facts that we want to provide full migration support in the simplest way possible and that we can't say MUST since we have older clients and servers to deal with.

If the clients and the servers obey the SHOULD's, having more than a single lease for a given client-server pair will be a transient situation, cleaned up as part of adapting to use of migrated state.

Since clients and servers will be a mixture of old and new and because nothing is a MUST we have to ensure that no combination will show worse behavior than is exhibited by current (i.e. old) clients and servers.

5.3. Other proposed changes to migration-state sections

5.3.1. Proposed changes: Client ID migration

The current definitive definitions of the NFSv4.0 protocol, [RFC3530] and [RFC3530bis] both agree. The section entitled "Migration and State" says:

This poses some difficulties, mostly because the part about "client ID" is not clear:

We have decided that it is best to address this issue as follows:

5.3.2. Proposed changes: Callback re-establishment

The current definitive definitions of the NFSv4.0 protocol, [RFC3530] and [RFC3530bis] both agree. The section entitled "Migration and State" says:

The above will need to be fixed to reflect the possibility of merging of leases,

5.3.3. Proposed changes: NFS4ERR_LEASE_MOVED rework

The current definitive definitions of the NFSv4.0 protocol, [RFC3530] and [RFC3530bis] both agree. The section entitled "Notification of Migrated Lease" says:

There is a lack of clarity that is prompted by ambiguity about what exactly probing is and what the interlock between client and server must be. This has led to some worry about the scalability of the probing process, and although the time required does scale linearly with the number of filesystems that the client may have state for with respect to a given server, the actual process can be done efficiently.

To address these issues we propose rewriting the above to be more clear and to give suggestions about how to do the required scanning efficiently.
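The linear cost of probing can be seen in a minimal sketch, assuming that a probe of a migrated filesystem returns NFS4ERR_MOVED as in the NFSv4.0 specification. The probe callable, the use of strings for status codes, and the filesystem identifiers are hypothetical stand-ins for whatever operation and bookkeeping a real client uses.

```python
from typing import Callable, Iterable, List

NFS4ERR_MOVED = "NFS4ERR_MOVED"  # status name only; numeric value not modeled
NFS4_OK = "NFS4_OK"


def find_migrated_filesystems(fsids: Iterable[str],
                              probe: Callable[[str], str]) -> List[str]:
    """Probe each filesystem the client holds state for on one server and
    return those reported as moved. One probe is issued per filesystem,
    so the cost grows linearly with the number of filesystems."""
    return [fsid for fsid in fsids if probe(fsid) == NFS4ERR_MOVED]


if __name__ == "__main__":
    moved = {"fs2"}
    fake_probe = lambda fsid: NFS4ERR_MOVED if fsid in moved else NFS4_OK
    print(find_migrated_filesystems(["fs1", "fs2", "fs3"], fake_probe))
```

Each probe is independent of the others, so a client is free to issue them concurrently or to batch them, which is one reason the scan need not be slow in practice.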

5.4. Proposed changes to other sections

5.4.1. Proposed changes: callback update

Some changes are necessary to reduce confusion about the process of callback information update and in particular to make it clear that no state is freed as a result:

5.4.2. Proposed changes: clientid4 handling

To address both of the clientid4-related issues mentioned in Section 4.5, we propose replacing the last three paragraphs of the section entitled "Client ID" with the following:

5.4.3. Proposed changes: NFS4ERR_CLID_INUSE

It appears to be the intention that only a single principal be used for client establishment between any client-server pair. However:

As a result, servers exist which reject a SETCLIENTID simply because there already exists a clientid for the same client, established using a different IP address. Although this is generally understood to be erroneous, such servers still exist and the spec should make the correct behavior clear.

Although the error name cannot be changed, the following changes should be made to avoid confusion:

6. Results of proposed changes for NFSv4.0

The purpose of this section is to examine the troubling results reported in Section 3.1. We will look at the scenarios as they would be handled within the proposal.

Because the choice of uniform vs. non-uniform nfs_client_id4 id strings is a "SHOULD" in these cases, we will designate clients that follow this recommendation by SHOULD-UF-CID.

We will also have to take account of any merger-related "SHOULD" clauses to better understand how they have addressed the issues seen. We abbreviate as follows:

6.1. Results: Failure to free migrated state on client reboot

Let's look at the troublesome situation cited in Section 3.1.1. We have already seen what happens when SHOULD-UF-CID does not hold. Now let's look at the situation in which SHOULD-UF-CID holds, whether SHOULD-SVR-AM is in effect or not.

The correctness signature for this issue is

so if you have clients and servers that obey the SHOULD clauses, the problem is gone regardless of the choice on the MAY.

6.2. Results: Server reboots resulting in confused lease situation

Now let's consider the scenario given in Section 3.1.2. We have already seen what happens when SHOULD-UF-CID does not hold. Now let's look at the situation in which SHOULD-UF-CID holds and SHOULD-SVR-AM holds as well.

Now let's consider the same scenario in the situation in which SHOULD-UF-CID holds and SHOULD-SVR-AM holds as well.

The correctness signature for this issue is

so if you have clients and servers that obey the SHOULD clauses, the problem is gone regardless of the choice on the MAY.

6.3. Results: Client complexity issues

Consider the following situation:

Now look what will happen under various scenarios:

The correctness signature for this issue is

so if you have clients and servers that obey the SHOULD clauses, the problem is gone regardless of the choice on the MAY.

6.4. Result summary

We have seen that (SHOULD-SVR-AM & SHOULD-UF-CID) are sufficient to solve the problems people have experienced.

7. Issues for NFSv4.1

Because NFSv4.1 embraces the uniform client-string approach, addressing migration issues is simpler. In the terms of Section 6, we already have SHOULD-UF-CID, for NFSv4.1, as advised by section 2.4 of [RFC5661], simplifying the work to be done.

Nevertheless, there are some issues that will have to be addressed. Some examples:

Discussion of how to resolve these issues will appear in the sections below.

7.1. Addressing state merger in NFSv4.1

The existing treatment of state transfer in [RFC5661] has similar problems to that in [RFC3530] and [RFC3530bis] in that it assumes that the state for multiple filesystems on different servers will not be merged so that it appears under a single common clientid. We've already seen the reasons that this is a problem, with regard to NFSv4.0.

Although we don't have the problems stemming from the non-uniform client-string approach, there are a number of complexities in the existing treatment of state management in the section entitled "Lock State and File System Transitions" in [RFC5661] that make this non-trivial to address:

7.2. Addressing pNFS relationship with migration

This is made difficult because, within the pNFS framework, migration might mean any of several things:

Migration needs to support both the first and last of these models.

7.3. Addressing server owner changes in NFSv4.1

Section 2.10.5 of [RFC5661] states the following.

While this paragraph is literally true in that such reconfiguration events can happen and clients have to deal with them, it is confusing in that it can be read as suggesting that clients have to deal with them without disruption, which in general is impossible.

A clearer alternative would be:

8. Security Considerations

The current definitive definitions of the NFSv4.0 protocol, [RFC3530] and [RFC3530bis] both agree. The section entitled "Security Considerations" encourages that clients protect the integrity of the SECINFO operation, any GETATTR operation for the fs_locations attribute, and the operations SETCLIENTID/SETCLIENTID_CONFIRM. A migration recovery event can use any or all of these operations. We do not recommend any change here.

9. IANA Considerations

This document does not require actions by IANA.

10. Acknowledgements

The editor and authors of this document gratefully acknowledge the contributions of Trond Myklebust of NetApp and Robert Thurlow of Oracle. We also thank Tom Haynes of NetApp and Spencer Shepler of Microsoft for their guidance and suggestions.

Special thanks go to members of the Oracle Solaris NFS team, especially Rick Mesta and James Wahlig, for their work implementing an NFSv4.0 migration prototype and identifying many of the issues documented here.

11. References

11.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3530] Shepler, S., Callaghan, B., Robinson, D., Thurlow, R., Beame, C., Eisler, M. and D. Noveck, "Network File System (NFS) version 4 Protocol", RFC 3530, April 2003.
[RFC5661] Shepler, S., Eisler, M. and D. Noveck, "Network File System (NFS) Version 4 Minor Version 1 Protocol", RFC 5661, January 2010.
[RFC3530bis] Haynes, T. and D. Noveck, "Network File System (NFS) Version 4 Protocol", Work in progress, 2014.

11.2. Informative References

[NFS-ext] Noveck, D., "NFS Protocol Extension: Retrospect and Prospect", Work in progress, 2014.

[migr-v4.0-update] Noveck, D., Shivam, P., Lever, C. and B. Baker, "NFSv4.0 migration: Specification Update", Work in progress, 2013.

Authors' Addresses

David Noveck (editor) 26 Locust Avenue Lexington, MA 02421 US Phone: +1 781 572 8038 EMail: david.noveck@emc.com
Piyush Shivam Oracle Corporation 5300 Riata Park Ct. Austin, TX 78727 US Phone: +1 512 401 1019 EMail: piyush.shivam@oracle.com
Charles Lever Oracle Corporation 1015 Granger Avenue Ann Arbor, MI 48104 US Phone: +1 248 614 5091 EMail: chuck.lever@oracle.com
Bill Baker Oracle Corporation 5300 Riata Park Ct. Austin, TX 78727 US Phone: +1 512 401 1081 EMail: bill.baker@oracle.com