Network File System Version 4                                  T. Haynes
Internet-Draft                                               Hammerspace
Intended status: Standards Track                             25 May 2026
Expires: 26 November 2026

Proxy-Driven Server for Flexible Files Version 2
draft-haynes-nfsv4-flexfiles-v2-proxy-server-02

Abstract

Parallel NFS (pNFS) with the Flexible Files Version 2 layout type supports client-side erasure coding and per-chunk repair between clients and data servers. This document extends that architecture with a proxy server role: a registered peer of the metadata server that polls the metadata server for work assignments and carries them out -- moving a file from one layout to another, reconstructing a whole file from surviving shards, or translating between codecs for clients that cannot participate in the file's native encoding (including NFSv3 clients). All proxy-server-to-metadata-server coordination is fore-channel: the metadata server returns work assignments inline in the response to a proxy-server-initiated PROXY_PROGRESS poll, and the proxy server reports completion via a fore-channel PROXY_DONE. No callback operations are required for the proxy server protocol.

Note to Readers

Discussion of this draft takes place on the NFSv4 working group mailing list (nfsv4@ietf.org), which is archived at https://mailarchive.ietf.org/arch/search/?email_list=nfsv4. Source code and issues list for this draft can be found at https://github.com/ietf-wg-nfsv4/flexfiles-v2-proxy-server.

Working Group information can be found at https://github.com/ietf-wg-nfsv4.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.
Haynes Expires 26 November 2026 [Page 1] Internet-Draft FFv2 Proxy Server May 2026 Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on 26 November 2026. Copyright Notice Copyright (c) 2026 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/ license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 2. Requirements Language . . . . . . . . . . . . . . . . . . . . 5 2.1. Relation to the Main Draft . . . . . . . . . . . . . . . 6 3. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 3.1. In Scope . . . . . . . . . . . . . . . . . . . . . . . . 6 3.2. Out of Scope . . . . . . . . . . . . . . . . . . . . . . 7 4. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . . . 8 4.1. Administrative Ingest . . . . . . . . . . . . . . . . . . 9 4.2. Policy-Driven Layout Transition . . . . . . . . . . . . . 9 4.3. Data Server Maintenance / Evacuation . . . . . . . . . . 10 4.4. Whole-File Repair . . . . . . . . . . . . . . . . . . . . 10 4.5. TLS Coverage Transition . . . . . . . . . . . . . . . . . 10 4.6. Filehandle / Storage-Backend Transition . . . . . . . . . 11 4.7. Codec Translation for Codec-Ignorant Clients . . . . . . 11 4.7.1. Mechanism . 
. . . . . . . . . . . . . . . . . . . . . 12 4.7.2. Why the same PROXY_REGISTRATION machinery . . . . . . 12 5. Design Model . . . . . . . . . . . . . . . . . . . . . . . . 13 5.1. Roles . . . . . . . . . . . . . . . . . . . . . . . . . . 13 5.2. Layout Model . . . . . . . . . . . . . . . . . . . . . . 14 5.2.1. Single-Layout Model . . . . . . . . . . . . . . . . . 14 5.2.2. Two-Layout State on the Metadata Server Side . . . . 14 5.2.3. Pinned definitions . . . . . . . . . . . . . . . . . 15 5.3. Session Between Metadata Server and Proxy Server . . . . 16 5.4. Flow Summary . . . . . . . . . . . . . . . . . . . . . . 17 5.5. Message Sequence: Policy-Driven Move . . . . . . . . . . 17 Haynes Expires 26 November 2026 [Page 2] Internet-Draft FFv2 Proxy Server May 2026 5.6. Message Sequence: Whole-File Repair . . . . . . . . . . . 19 5.7. Message Sequence: metadata-server-initiated Cancellation . . . . . . . . . . . . . . . . . . . . . . 19 6. New NFSv4.2 Operations . . . . . . . . . . . . . . . . . . . 20 6.1. proxy_stateid4: A New Stateid Type . . . . . . . . . . . 21 6.1.1. Value Space . . . . . . . . . . . . . . . . . . . . . 22 6.1.2. Metadata Server Minting . . . . . . . . . . . . . . . 22 6.1.3. Lifetime . . . . . . . . . . . . . . . . . . . . . . 22 6.1.4. Renewal Semantics . . . . . . . . . . . . . . . . . . 23 6.1.5. Authorization . . . . . . . . . . . . . . . . . . . . 23 6.2. Operation 92: PROXY_REGISTRATION - Register as Proxy Server . . . . . . . . . . . . . . . . . . . . . . . . . 23 6.2.1. ARGUMENTS . . . . . . . . . . . . . . . . . . . . . . 23 6.2.2. RESULTS . . . . . . . . . . . . . . . . . . . . . . . 23 6.2.3. DESCRIPTION . . . . . . . . . . . . . . . . . . . . . 24 6.3. Operation 93: PROXY_PROGRESS - Heartbeat and Receive Work Assignments . . . . . . . . . . . . . . . . . . . . . . . 26 6.3.1. ARGUMENTS . . . . . . . . . . . . . . . . . . . . . . 26 6.3.2. RESULTS . . . . . . . . . . . . . . . . . . . . . . . 26 6.3.3. DESCRIPTION . . . . . . . . . 
. . . . . . . . . . . . 27 7. New Fore-Channel Operations: PROXY_DONE and PROXY_CANCEL . . 29 7.1. Operation 94: PROXY_DONE - Commit or Roll Back a Proxy Operation . . . . . . . . . . . . . . . . . . . . . . . . 30 7.1.1. ARGUMENTS . . . . . . . . . . . . . . . . . . . . . . 30 7.1.2. RESULTS . . . . . . . . . . . . . . . . . . . . . . . 30 7.1.3. DESCRIPTION . . . . . . . . . . . . . . . . . . . . . 30 7.2. Operation 95: PROXY_CANCEL - Abort a Proxy Operation . . 32 7.2.1. ARGUMENTS . . . . . . . . . . . . . . . . . . . . . . 32 7.2.2. RESULTS . . . . . . . . . . . . . . . . . . . . . . . 32 7.2.3. DESCRIPTION . . . . . . . . . . . . . . . . . . . . . 32 8. Multi-Proxy Server Assignment Fan-out . . . . . . . . . . . . 33 9. Layout Shape During a Proxy Operation . . . . . . . . . . . . 34 9.1. Atomic commit on PROXY_DONE . . . . . . . . . . . . . . . 35 9.2. The swap window . . . . . . . . . . . . . . . . . . . . . 36 9.3. The CLAIM_PROXY open claim . . . . . . . . . . . . . . . 36 9.4. Drain interaction . . . . . . . . . . . . . . . . . . . . 38 9.5. Per-instance migration deltas (informative) . . . . . . . 39 10. Client Behavior . . . . . . . . . . . . . . . . . . . . . . . 40 10.1. When the Layout Is Recalled . . . . . . . . . . . . . . 40 10.2. In-Flight I/O When the Proxy Server Changes . . . . . . 40 11. State Machine . . . . . . . . . . . . . . . . . . . . . . . . 41 11.1. Transitions . . . . . . . . . . . . . . . . . . . . . . 42 12. Proxy Server Failure and Recovery . . . . . . . . . . . . . . 44 12.1. Proxy Server Crash During PROXY_ACTIVE . . . . . . . . . 44 12.2. Cascading Proxy Server Failure . . . . . . . . . . . . . 45 12.3. Source Data Server Crash During PROXY_ACTIVE . . . . . . 45 12.4. Destination Data Server Crash During PROXY_ACTIVE . . . 45 13. Metadata Server Crash Recovery . . . . . . . . . . . . . . . 45 Haynes Expires 26 November 2026 [Page 3] Internet-Draft FFv2 Proxy Server May 2026 13.1. Proxy Server Recovery Sequence . . . . . . . . . 
. . . . 45 13.1.1. Retention Requirement . . . . . . . . . . . . . . . 47 13.2. Proxy Server Identity Continuity . . . . . . . . . . . . 47 13.3. Lost Migration Records . . . . . . . . . . . . . . . . . 48 14. Security Considerations . . . . . . . . . . . . . . . . . . . 48 14.1. Credential Forwarding and the Privilege Boundary . . . . 49 14.2. Namespace Traversal Privilege . . . . . . . . . . . . . 52 14.3. Proxy-server-side Policy Enforcement (informative) . . . 54 15. Implementations . . . . . . . . . . . . . . . . . . . . . . . 54 15.1. reffs . . . . . . . . . . . . . . . . . . . . . . . . . 55 15.2. Demonstration . . . . . . . . . . . . . . . . . . . . . 55 16. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 56 17. Interaction with the Main Draft . . . . . . . . . . . . . . . 56 17.1. chunk_guard4 . . . . . . . . . . . . . . . . . . . . . . 57 17.2. CHUNK_LOCK . . . . . . . . . . . . . . . . . . . . . . . 57 17.3. CB_CHUNK_REPAIR . . . . . . . . . . . . . . . . . . . . 57 17.4. TRUST_STATEID / REVOKE_STATEID . . . . . . . . . . . . . 57 18. Open Questions . . . . . . . . . . . . . . . . . . . . . . . 57 19. Deferred . . . . . . . . . . . . . . . . . . . . . . . . . . 60 20. References . . . . . . . . . . . . . . . . . . . . . . . . . 60 20.1. Normative References . . . . . . . . . . . . . . . . . . 60 20.2. Informative References . . . . . . . . . . . . . . . . . 61 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 62 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 62 1. Introduction The Flexible Files Version 2 layout type ([I-D.haynes-nfsv4-flexfiles-v2]) introduces client-side erasure coding for pNFS and a per-chunk repair protocol (CB_CHUNK_REPAIR) that lets the metadata server direct an active client to reconstruct individual damaged chunks. That mechanism is sufficient for repairs whose scope is a handful of chunks in a file that has at least one live client. 
Three classes of work are outside the per-chunk repair model. The first is whole-file repair: the case in which enough data servers have failed that per-chunk reconstruction would require visiting every chunk, or in which no live client is available to drive the repair at all. The second is layout transitions: a file must move from one layout geometry to another for policy reasons (migrating to a new coding type, or re-mirroring), for maintenance reasons (evacuating a data server ahead of decommission), or for environmental reasons (moving between transport-security profiles or between filehandle backends). The third is codec translation: a client that cannot participate in the file's native codec -- including every NFSv3 client, and any legacy or minimal NFSv4 client that does not implement the file's encoding type -- still needs to read and write the file.

This document specifies a proxy server role to address those three cases with a single mechanism: the proxy server opens a session to the metadata server and registers its capabilities via PROXY_REGISTRATION; the proxy server then polls the metadata server using PROXY_PROGRESS for work assignments (move, repair), which the metadata server returns inline in the poll response; the proxy server carries out each assignment and signals completion via PROXY_DONE (or abort via PROXY_CANCEL). All of the proxy-server-to-metadata-server coordination is fore-channel; the proxy server does not require a back-channel callback program.

A client reaches a proxied file either by contacting the proxy server directly, as an ordinary NFS server, or through a pNFS layout whose data server is the proxy server. While a migration is in progress every client routes its I/O through the proxy server; for codec translation, only a client that cannot encode the file does.
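As a rough illustration of the register / poll / report cycle described above, the following Python sketch models the metadata server as an in-memory object. Everything in it is an assumption made for illustration: the class and method names (FakeMetadataServer, proxy_registration, proxy_progress, proxy_done), the Assignment fields, and the codec string are hypothetical and are not the wire encoding defined by this document. PROXY_CANCEL and in-flight progress reports are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Assignment:
    """Hypothetical stand-in for one work assignment (not the draft's XDR)."""
    assignment_id: int
    kind: str  # "MOVE" or "REPAIR"
    path: str


@dataclass
class FakeMetadataServer:
    """In-memory stand-in for the metadata server end of the session."""
    queue: List[Assignment] = field(default_factory=list)
    registered_codecs: List[str] = field(default_factory=list)
    done: Dict[int, str] = field(default_factory=dict)

    def proxy_registration(self, codecs: List[str]) -> None:
        # The proxy server declares the codec set it supports.
        self.registered_codecs = list(codecs)

    def proxy_progress(self) -> List[Assignment]:
        # Heartbeat poll: queued assignments are returned inline in the reply.
        assignments, self.queue = self.queue, []
        return assignments

    def proxy_done(self, assignment_id: int, ok: bool) -> None:
        # Completion report: commit on success, roll back otherwise.
        self.done[assignment_id] = "committed" if ok else "rolled back"


def run_proxy(mds: FakeMetadataServer, codecs: List[str],
              carry_out: Callable[[Assignment], bool], max_polls: int) -> None:
    """Register once, then poll for work and report each result."""
    mds.proxy_registration(codecs)
    for _ in range(max_polls):
        for assignment in mds.proxy_progress():
            mds.proxy_done(assignment.assignment_id, carry_out(assignment))


mds = FakeMetadataServer(
    queue=[Assignment(1, "MOVE", "/a"), Assignment(2, "REPAIR", "/b")]
)
# Pretend the MOVE succeeds and the REPAIR fails, to exercise both outcomes.
run_proxy(mds, ["mirrored"], lambda a: a.kind == "MOVE", max_polls=2)
print(mds.done)
```

The point of the sketch is only the direction of the calls: every exchange is initiated by the proxy server, so no callback path appears anywhere in the loop.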
Flex Files v1 provides no standardized mechanism for migrating a file's layout while the file remains in use. Without such primitives, migration is left to implementation-specific machinery and cannot be performed safely across implementations. This design codifies that mechanism, closing what is today the single biggest interop gap between pNFS and parallel-filesystem competitors that already expose migration primitives.

2. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

2.1. Relation to the Main Draft

[I-D.haynes-nfsv4-flexfiles-v2] defines CB_CHUNK_REPAIR and the per-chunk repair model. This document is the companion whole-file and per-client mechanism. Per-chunk and whole-file operations are mutually exclusive for a given file at a given time; coexistence rules are in Section 17.

3. Scope

This section draws the boundary between the wire-level mechanism defined here and the much larger space of useful behaviours a proxy implementation might support. Drawing the boundary tightly keeps the protocol small enough to specify and implement in a single revision; everything beyond the boundary is either future work or implementation latitude, and both categories are called out below so later readers know which is which.

3.1. In Scope

This document defines a new protocol role, a new session, and a small set of operations that flow on that session.
Around that core it specifies the layout conventions a client observes while a proxy operation is active, the credential-forwarding rules a translating proxy must follow, and the recovery semantics for the three actor failures that matter during an operation (proxy server, metadata server, data server).

The new role is the proxy server, distinct from the metadata server and data server roles defined in [I-D.haynes-nfsv4-flexfiles-v2]. A proxy server registers with a metadata server and, on receipt of a directive from that metadata server, performs a move, a repair, or an ongoing codec translation on behalf of a client. The proxy server is opaque to most clients and is visible to the metadata server through a dedicated NFSv4.1+ session that the proxy server itself opens. All proxy-server-to-metadata-server coordination is fore-channel: the proxy server issues ops to the metadata server, and the metadata server returns work assignments inline in its responses. No callback channel is required for the proxy server protocol.

The fore-channel surface is deliberately small. PROXY_REGISTRATION (Section 6.2) lets the proxy server declare the codec set it supports and its lease. PROXY_PROGRESS (Section 6.3) is the proxy server's poll-and-report op: the proxy server sends it as a heartbeat (with optional progress reports on in-flight migrations) and the metadata server replies with zero or more new work assignments inline. PROXY_DONE (Section 7.1) commits or rolls back a migration when the proxy server finishes it; PROXY_CANCEL (Section 7.2) lets the proxy server abort early. All four ops are fore-channel, sent from the proxy server to the metadata server.

Around the operation set, the document specifies the layout conventions a client sees during a proxy operation and how a client discovers the proxy server in the first place.
Codec translation for codec-ignorant clients, including NFSv3 clients, is in scope, and is the one case that stretches the move/repair vocabulary. The same proxy machinery that handles move and repair also provides the persistent per-client translation that lets a client incapable of participating in a file's native codec still read and write the file. Unlike move and repair, which are transient transitions on the file, codec translation is an ongoing routing arrangement that persists as long as the codec-ignorant client is active.

Credential-forwarding rules for a proxy that translates on behalf of a client are defined in the Security Considerations section. The proxy is a translator, not an authority: authorization decisions MUST remain with the metadata server, using the client's forwarded credentials. Getting this boundary wrong turns the proxy server into a privilege-elevation vehicle, so the rules are stated normatively and enforced at the metadata server rather than policed on the wire.

Finally, the document defines recovery semantics for the three actor failures that matter during a proxy operation -- proxy server failure, metadata server failure, and data server failure -- each with its own fencing and re-registration rules so that a mid-operation crash does not leave a file in an unrecoverable state.

3.2. Out of Scope

The items below are deliberately deferred. They split into three groups: features whose absence was an explicit design decision (delta journaling, partial-range moves), orchestration that belongs to a layer above a single proxy (multi-proxy pipelines, automated load balancing), and proxy-internal behaviour that does not surface on the wire and therefore needs no standardisation. Nothing on this list is precluded by the current design; each is a reasonable future extension.

Journaling and partial moves: Move assignments in this revision are always whole-file.
The proxy server performs a CSM-style write to all mirrors (source D, destination G, and any other mirrors in the file's mirror set) while reading source bytes from any mirror in the source set; the two-layout state on the metadata server keeps client traffic on L1 throughout, with an atomic swap to L2 at PROXY_DONE time (Section 5.2.2). Delta-journaling mechanisms -- capturing writes against an otherwise-offline source, replaying them on completion, or maintaining reference integrity across detached clones -- are a future extension, as is a partial-range move that would move a byte range while the rest of the file stays on the source. The whole-file two-layout commit covers every motivating scenario the design currently has.

Orchestration beyond a single proxy: Multi-proxy pipelines (staged moves for very large files) and automated load balancing or predictive selection across registered proxies are out of scope. A metadata server in this revision selects a single proxy server per operation; load distribution across many proxies, when it matters, is expected to be handled by the metadata server's selection policy and does not surface as new wire protocol.

Server-side copy as an alternative path: Integration with server-side copy ([RFC7862] Section 4) as an alternative to proxy-server-driven moves for single-file moves within one namespace is adjacent work. The two mechanisms are complementary (server-side copy is a client-directed intra-server operation; the proxy-server-driven move is a metadata-server-directed inter-server operation), and their intersection -- for example, using server-side copy under the hood of a proxy server move assignment -- is better specified in its own extension rather than bolted into this document.
Proxy-internal features that do not surface on the wire: A proxy MAY implement content-integrity and error-correction layers, encryption and compression pass-through, log-structured write staging, and sector-alignment normalisation. These are useful motivating scenarios for the move/repair vocabulary but do not require new protocol surface beyond what the PROXY_PROGRESS / PROXY_DONE / PROXY_CANCEL fore-channel set already provides, and so they are left to implementation rather than standardised here.

4. Use Cases

Seven motivating scenarios converge on the same mechanism. Six of them are transient transitions on the file itself: the file is changing state (being ingested, re-coded, evacuated, reconstructed, migrated between transport-security profiles, or migrated between filehandle backends), and the transition is what the proxy server drives to completion. The seventh is qualitatively different: the file is not changing, but specific clients -- those that cannot participate in the file's native codec -- are routed through a proxy server persistently, for as long as they are active.

The distinction matters for how the metadata server schedules work. Transient transitions have a terminal state; the metadata server expects each one to complete (via PROXY_DONE) and then to retire the associated layout. The persistent routing case has no terminal state for the file as a whole; the proxy server stays in the layouts of codec-ignorant clients as long as those clients are open.

In every case, a registered proxy server becomes the source of truth for a file's data during the operation, and clients are redirected to route I/O through that proxy server rather than directly to the original layout's data servers. The individual scenarios are described below.

4.1. Administrative Ingest

An administrator rsyncs a file from an external source into the cluster as a single-copy file.
Server policy requires the file to be mirrored or erasure coded. The metadata server queues a MOVE assignment for the file; the next PROXY_PROGRESS poll from a registered proxy whose codec set covers the destination layout returns the assignment in its response. The proxy populates the destination from the source, while any client that opens the file during the move sees a layout that routes I/O through the proxy.

The source "layout" may not even be a Flex Files layout; it could be a non-pNFS NFS mount that the proxy reads as an NFSv4.2 client. Throughout the move the proxy presents the file to pNFS clients as if the move had not started, while populating the destination in the background.

4.2. Policy-Driven Layout Transition

A server-objective or policy change ("files older than 30 days must be erasure coded", "high-access-rate files must have additional mirrors") requires transforming a file's layout without user visibility. The transformation is purely a layout change; the file contents are unchanged except at the shard level.

The MOVE assignment carries the new layout's geometry and coding type via the destination deviceid in proxy_assignment4; the proxy reshapes the file's shards to match. Because the transformation type (encode / decode / transcode) is entirely specified by the (source, destination) deviceid pair plus the file's recorded layout type, the assignment does not need a separate transformation-class field.

4.3. Data Server Maintenance / Evacuation

A data server is scheduled for maintenance (hardware replacement, software upgrade, decommission). All files whose layouts reference that data server must be evacuated to replacement data servers before it is taken offline. The metadata server queues a MOVE assignment per file (source = the outgoing data server, target = a replacement); registered proxies pick the assignments up via PROXY_PROGRESS polls.
Evacuation can be large-scale (thousands or millions of files); running per-client per-chunk repair over every file would be prohibitively expensive, but a single registered proxy can drive many concurrent migrations subject to the per-proxy server in-flight cap (Section 8).

4.4. Whole-File Repair

Multiple data servers have failed such that per-chunk repair cannot reconstruct the file in place. The metadata server constructs a new layout backed by replacement data servers and queues a REPAIR assignment. The next PROXY_PROGRESS poll from a registered proxy server receives the assignment; the proxy drives reconstruction from whatever surviving shards remain.

If fewer than k shards survive across the mirror set, the proxy reports terminal failure via PROXY_DONE with pd_status set to NFS4ERR_PAYLOAD_LOST, matching the per-chunk repair semantics in the Repair Client Selection section of [I-D.haynes-nfsv4-flexfiles-v2].

4.5. TLS Coverage Transition

A file whose layout currently points at non-TLS-capable data servers needs to be migrated to TLS-capable data servers, or vice versa (an inventory change, a policy change mandating transport security, onboarding a new storage class whose data servers are TLS-only). A PROXY_OP_MOVE assignment applies: the destination data servers have the required transport security profile, and the source data servers are retired. A client that arrives mid-transition is routed through the proxy and does not directly see the heterogeneous data server set.

The proxy establishes its own RPC connections to source and destination data servers, potentially with different transport security profiles (non-TLS to source, mutual TLS [RFC9289] to destination, or any other combination). The proxy's per-data server security is independent of the client's security to the proxy.

4.6. Filehandle / Storage-Backend Transition

A data server changes the filehandles it issues for a file; this happens when the data server's underlying storage is migrated (e.g., from one backend object store to another) and the old filehandles become unresolvable on the new backend. Without a proxy, every client holding a layout has to be individually recalled and re-issued. With a proxy, the metadata server points all clients at the proxy (keeping their existing stateids and FHs intact), the proxy reconciles old-to-new FHs internally, and clients are recalled only at the end.

This same mechanism covers several related situations: an NFSv3-to-NFSv4.2 data server protocol upgrade where the data server FHs change as a side effect of migrating from [RFC1813] to [RFC8881] semantics; a data-server-side format change that invalidates existing FHs (for example, a transition from a local POSIX store to an object store); and backend-opaque FH migration where the data server's FH structure is internally versioned and old clients hold stale versions.

4.7. Codec Translation for Codec-Ignorant Clients

The coding-type registry defined in the IANA Considerations of [I-D.haynes-nfsv4-flexfiles-v2] is expected to grow. Not every client is required to implement every registered codec; a minimal client, a legacy client, or an NFSv3 client typically cannot participate in erasure-coded files at all. Per the codec-negotiation rules in [I-D.haynes-nfsv4-flexfiles-v2], such a client either retries with a different supported_types hint, falls back to metadata-server-terminated I/O, or (this case) is routed through a proxy that translates on its behalf.

Unlike the move / repair / evacuation / transition use cases above, codec translation is persistent per client. The file itself is not changing state.
What changes is the layout the metadata server hands to a codec-ignorant client: that client gets a single-data server layout naming a translating proxy server, with a coding_type the client does support (typically FFV2_ENCODING_MIRRORED, or for NFSv3 clients just a flat NFSv3 data surface). The proxy encodes and decodes on the fly against the real data servers; the client sees a flat file.

The same file may be accessed directly by codec-aware clients (with a normal layout naming the real data servers) and through the proxy by codec-ignorant clients (with a proxy layout) simultaneously. The metadata server issues a different layout per request; only the codec-ignorant case routes through the proxy server.

4.7.1. Mechanism

A translating proxy runs two sides that meet internally. On its client-facing side it speaks the protocol the codec-ignorant client can speak: for an NFSv3 [RFC1813] client that is an NFSv3 server that re-exports the metadata server's namespace; for a legacy NFSv4.2 client that understands only some codecs, it is an NFSv4.2 data-server surface presenting FFV2_ENCODING_MIRRORED (or an equivalent codec the client supports). On its metadata-server-facing side it is an NFSv4.2 client to the metadata server plus whatever data server protocol the metadata server's real data servers speak.

The proxy translates each client-facing op into the corresponding metadata server or data server op, applies the codec transformation between the two, and returns results. For an NFSv3 client, a read flows:

* Client: NFSv3 READ against the proxy.

* Proxy: if it does not hold a layout for the file, issues LAYOUTGET on the metadata server with the client's forwarded credentials (see Security Considerations).

* Proxy: issues CHUNK_READ (or v3 READ if the data server is NFSv3) against the real data servers, decodes the shards back to plaintext.
* Proxy: returns the plaintext bytes in the NFSv3 READ reply.

A write flows:

* Client: NFSv3 WRITE with stable_how and a byte range.

* Proxy: encodes the bytes per the file's codec, issues CHUNK_WRITE / CHUNK_FINALIZE / CHUNK_COMMIT against the real data servers.

* Proxy: returns NFSv3 WRITE ok with the stable_how it was able to honor (which may be downgraded based on the back-end data server's own stable_how behavior).

4.7.2. Why the same PROXY_REGISTRATION machinery

The registered-proxy server mechanism gives the metadata server the information it needs for translation-proxy selection: prr_codecs enumerates the codecs the proxy server can translate between, the metadata server <-> proxy server session carries the fore-channel control-plane traffic, and the lease bounds the relationship. No new op is required for the translation case -- the existing PROXY_REGISTRATION covers it. PROXY_OP_MOVE and PROXY_OP_REPAIR assignments are not used for pure translation (the file is not moving or being repaired); the proxy server simply serves the codec-ignorant client's I/O requests against the unchanged source layout.

5. Design Model

5.1. Roles

This document introduces a third role alongside the pNFS metadata server and data server:

Proxy server: A persistent, registered peer of the metadata server that carries out whole-file operations on the metadata server's behalf -- moving file content between layouts, reconstructing files whose source layout has been damaged, and translating codecs on behalf of clients that cannot participate in the file's native encoding. A proxy server is a distinct role from a data server; a given server MAY implement both, and typically does, but the protocol does not require that. The metadata server sees the proxy server through a dedicated session whose direction is defined in Section 5.3.
The existing roles are unchanged:

Metadata server: As defined in [I-D.haynes-nfsv4-flexfiles-v2]: the coordinator for each file, and the authority that issues layouts, manages stateids, and selects repair participants.

Data server: As defined in [I-D.haynes-nfsv4-flexfiles-v2]: serves the CHUNK data path to pNFS clients.

Only one of the three pairs carries new ops in this document. The metadata server <-> proxy server pair gains the new proxy-server-to-metadata-server regular ops for registration, progress, and terminal reporting (Section 6). No new callback ops are introduced; the metadata server returns work assignments to the proxy server in the PROXY_PROGRESS reply on the fore-channel, and the proxy server reports completion or cancellation back via PROXY_DONE / PROXY_CANCEL on the same session, so no callback program is required for this protocol. The client <-> proxy server pair gains no new ops: clients reach a proxy server through the normal pNFS data path, seeing it as the data server named in a single-data server layout (Section 9). The metadata server <-> data server pair is also unchanged; the tight-coupling control session in [I-D.haynes-nfsv4-flexfiles-v2] carries over as defined there.

5.2. Layout Model

5.2.1. Single-Layout Model

This design uses a single layout naming the proxy server as the sole data server rather than two linked layouts. Three considerations drive that choice. pNFS clients already handle single layouts cleanly, so no new layout-linkage mechanism needs to be invented or implemented on the client. The client's view of the file -- "the file's data server is the proxy server" -- is the truth during the operation, so exposing the source and destination data servers directly would invite confusion about which entry to address rather than clarify.
And late-arriving clients see the proxy layout from the start, without any separate setup path to join an operation already in progress. The alternative -- a source layout plus a destination layout linked by a redirector record -- was considered and rejected on those three grounds. Routing all client I/O through the proxy server has a cost deployments MUST weigh. For the duration of a migration the proxy server is a data-path single point of failure for the file: the client sees one data server, the proxy server, and the usual Flex Files mitigation -- client-side mirroring across several data servers -- is unavailable to it. A file under migration therefore has lower availability than a normally-mirrored file until PROXY_DONE. The proxy server is also a throughput funnel: all client read and write traffic for the file, plus the proxy server's own copy traffic, passes through one proxy server, adding a latency hop and a bandwidth bottleneck. These costs are inherent to the single-layout choice; they argue against migrating very large files in a single proxy operation, and motivate the multi-proxy server and partial-range extensions listed as out of scope (Section 3.2). 5.2.2. Two-Layout State on the Metadata Server Side For each file F whose mirror on a draining data server D is being migrated, the metadata server keeps three logical layout records. Only L3 backs a client-facing layout; L1 and L2 are metadata-server- internal bookkeeping for the duration of the migration. L1: the pre-migration mirror set, including D. This record is metadata-server-internal: it preserves what the file's layout was before the migration and is handed to no client while the migration is active. L2: the post-migration mirror set: L1 with D replaced by the target G. Also metadata-server-internal; it becomes the file's layout after the PROXY_DONE swap. Haynes Expires 26 November 2026 [Page 14] Internet-Draft FFv2 Proxy Server May 2026 L3: the composite the proxy server works from. 
It backs the layout every client is served during the migration: that client-facing layout names the proxy server as its data server, and the proxy server does the real I/O against L3's two mirror entries:

*  M1 (read source): the L1 mirror set. The proxy server reads source bytes from any mirror in M1.

*  M2 (write target): the L1 mirror set plus G. The proxy server writes via CSM to every mirror in M2.

During the migration every client of F -- whichever front door it used -- is served a layout naming the proxy server; the proxy server is the sole writer to M1 and M2. No client addresses D, E, or G directly. Because the proxy server is the sole writer, it MUST NOT acknowledge a client write until that write is durable on every mirror in M2: an acknowledged write has reached the whole target set, so a proxy server crash cannot leave a write acknowledged to the client but applied to only part of the mirror set.

D's presence in M2 (alongside G) is intentional: the proxy server keeps D a current mirror until the PROXY_DONE swap, so a cancelled migration can fall back to L1 with no data loss. The proxy server writes both D and G, and because the proxy server is the only writer they converge to the same byte image with no inter-writer race.

A source mirror MAY be an NFSv3 data server. When it is, the proxy server reads from it using NFSv3 semantics and writes the NFSv4.2 destination using CHUNK semantics; the asymmetric-protocol bridging is the proxy server's responsibility and is not visible to the client.

5.2.3. Pinned definitions

*  L1.mirrors = the file's pre-migration mirror set, includes D

*  L2.mirrors = (L1.mirrors \ {D}) union {G}

*  L3.M1 = L1.mirrors (proxy server's read-source set)

*  L3.M2 = L1.mirrors union {G} (proxy server's CSM write-target set)

5.3. Session Between Metadata Server and Proxy Server

The proxy server opens an NFSv4.1+ session to the metadata server as a normal client.
All proxy-server-to-metadata-server coordination flows on the fore-channel of that session: PROXY_REGISTRATION establishes the relationship, PROXY_PROGRESS heartbeats and pulls work assignments, PROXY_DONE / PROXY_CANCEL report terminal state. No callback program is required for the proxy server protocol -- the session's back-channel is unused by this draft (the proxy server may still establish one for unrelated NFSv4.1 callbacks if it wishes, but no proxy-server-protocol op rides on it). The session direction is intentionally opposite to the metadata server -> data server tight-coupling control session in [I-D.haynes-nfsv4-flexfiles-v2]: that session is opened by the metadata server to carry metadata-server-originated stateid management to a data server. The metadata server <-> proxy server session is opened by the proxy server because registration is a proxy-server-initiated act -- the proxy server is saying "here I am, with these capabilities." Without a proxy-server-to-metadata-server direction the capability-advertisement would have to be inferred from session-setup flags alone, which is inadequate for the range of capabilities a proxy server can usefully advertise (codec set and -- as the DEVICEID_REGISTRATION open question anticipates -- fault-zone coordinates and other deployment attributes). A consequence of this direction choice is that a server that implements both the data server and proxy server roles toward the same metadata server runs two sessions between the same pair of hosts: the metadata server opens the data server tight-coupling session toward the box, and the box's proxy server opens the proxy server session toward the metadata server. That is two EXCHANGE_ID exchanges, two CREATE_SESSION exchanges, and two TCP connections. 
In deployments that use RPCSEC_GSS ([RFC7861]) or RPC-over-TLS ([RFC9289]) on the proxy server session -- which the credential- forwarding rules in Section 14 recommend for any proxy server that translates on behalf of clients -- reserved-port trust is not in use and the doubled connection has no security cost. In a strict AUTH_SYS-only deployment the second outbound reserved port is a real but typically negligible cost, because a storage box's outbound NFS traffic is usually limited to one connection per metadata server it is registered with. Haynes Expires 26 November 2026 [Page 16] Internet-Draft FFv2 Proxy Server May 2026 5.4. Flow Summary The proxy server opens a session to the metadata server and issues PROXY_REGISTRATION, declaring its supported codecs; the metadata server records the registration and returns a registration id with a granted lease. The proxy server then polls the metadata server via PROXY_PROGRESS at lease/2 cadence (or as the metadata server's ppr_lease_remaining_sec hint directs). When the metadata server decides to move or repair a file, it selects a registered proxy server whose capabilities match the operation and queues an assignment for that proxy server; the next PROXY_PROGRESS reply delivers the assignment in its ppr_assignments<> array. The proxy server picks the work up by issuing OPEN + LAYOUTGET on the assignment's pa_file_fh, drives the byte-shoveling phase to completion, and reports terminal status by issuing LAYOUTRETURN + PROXY_DONE in a single fore-channel compound on the same session. The metadata server may at any time retract an assignment that the proxy server has not yet acknowledged via OPEN+LAYOUTGET by including a PROXY_OP_CANCEL_PRIOR assignment for the same (pa_file_fh, pa_target_deviceid) pair in a subsequent PROXY_PROGRESS reply; the proxy server may cancel work it has already started via the fore- channel PROXY_CANCEL (Section 7.2). Clients interact with the proxy server through the normal layout path. 
During a proxy operation the metadata server hands out a single-data-server layout naming the proxy server; clients route CHUNK I/O to that data server. Clients that arrive mid-operation see the proxy layout from the start and need no additional signalling; clients that held an older (non-proxy) layout are recalled via CB_LAYOUTRECALL and reacquire.

5.5. Message Sequence: Policy-Driven Move

The simplest flow is a quiesced whole-file move for a policy transition. It is shown as a wire-level message sequence between the three protocol actors; clients are elided because in the quiesced case they are recalled before the proxy server work starts.

   proxy server                  metadata server
   |                                 |
   | ---- CREATE_SESSION ----------> |   (PS opens session to MDS)
   | <--- session est. ------------- |
   |                                 |
   | ---- PROXY_REGISTRATION ------> |   (advertise codecs)
   | <--- reg_id, granted_lease ---- |
   |                                 |
   | ---- PROXY_PROGRESS ----------> |   (heartbeat poll)
   | <--- NFS4_OK, ppr_assignments   |   (zero or more entries; one
   |      includes MOVE assignment   |    delivers the MOVE work)
   |                                 |
   | ---- OPEN(pa_file_fh) --------> |   (PS picks up the work)
   | ---- LAYOUTGET (L3 composite) > |
   |                                 |
   | [PS drives move: reads source   |
   |  DSes, encodes per destination  |
   |  codec, writes destination      |
   |  DSes via L3 fan-out]           |
   |                                 |
   | ---- PROXY_PROGRESS ----------> |   (heartbeat; lease renewal,
   | <--- NFS4_OK, ppr_lease_rem.... |    no new assignment needed)
   |                                 |
   | ...                             |
   |                                 |
   | ---- SEQUENCE                   |
   |      PUTFH(pa_file_fh)          |
   |      LAYOUTRETURN(L3_stid)      |
   |      PROXY_DONE(L3_stid, OK) -> |   (terminal: commit L1 -> L2)
   | <--- NFS4_OK ------------------ |
   |                                 |
   |                                 | --- CB_LAYOUTRECALL --->
   |                                 |   (to clients holding L1)
   |                                 |
   |                                 | <-- LAYOUTRETURN ------
   |                                 |   (from each client)
   |                                 |
   |                                 | (MDS retires source DSes;
   |                                 |  next LAYOUTGET on this
   |                                 |  file returns L2)
   v                                 v

          Figure 1: Message sequence for a policy-driven move

5.6.
Message Sequence: Whole-File Repair Same shape as a move, but the assignment in PROXY_PROGRESS carries pa_kind = PROXY_OP_REPAIR and the source layout is degraded. Terminal outcomes: * NFS4_OK in pd_status: the proxy server reconstructed the file; the metadata server proceeds as in Figure 1. * NFS4ERR_PAYLOAD_LOST in pd_status: fewer than k shards survived across the mirror set; the metadata server marks the affected byte ranges lost and rolls back to L1. No CB_LAYOUTRECALL is issued because there is no valid destination layout to issue. 5.7. Message Sequence: metadata-server-initiated Cancellation The metadata server may decide to retract an assignment. Two cases: Assignment not yet acknowledged by the proxy server: The metadata server includes a PROXY_OP_CANCEL_PRIOR assignment in the next PROXY_PROGRESS reply, naming the same (pa_file_fh, pa_target_deviceid) pair as the prior MOVE / REPAIR assignment. The proxy server, which has not yet OPEN'd the file, simply drops the prior assignment from its in-flight queue. Assignment acknowledged and in flight: The metadata server internally aborts the migration and discards the in-flight record; the proxy server's eventual PROXY_DONE returns NFS4ERR_BAD_STATEID (the L3 layout stateid no longer resolves to a record), and the proxy server abandons the work and releases its OPEN. The metadata server may also let the proxy server's registration lease expire as a coarser cancellation. The proxy-server-initiated cancellation case uses the fore-channel PROXY_CANCEL (Section 7.2). 
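As a non-normative illustration of the not-yet-acknowledged case, the proxy server's handling of a PROXY_OP_CANCEL_PRIOR entry might be sketched in Python as follows. The Assignment and PendingQueue classes, the string-valued kinds, and the opened flag are hypothetical bookkeeping invented for this sketch; only the matching on the (pa_file_fh, pa_target_deviceid) pair and the rule that un-OPENed work is simply dropped come from the text above.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for the proxy_op_kind4 values.
MOVE, REPAIR, CANCEL_PRIOR = "MOVE", "REPAIR", "CANCEL_PRIOR"


@dataclass
class Assignment:
    kind: str
    file_fh: bytes            # pa_file_fh
    target_deviceid: bytes    # pa_target_deviceid
    opened: bool = False      # hypothetical: set once OPEN+LAYOUTGET was issued


@dataclass
class PendingQueue:
    items: list = field(default_factory=list)

    def apply_reply(self, assignments):
        """Apply one PROXY_PROGRESS reply; return the assignments dropped."""
        dropped = []
        for a in assignments:
            if a.kind != CANCEL_PRIOR:
                self.items.append(a)      # new MOVE / REPAIR work
                continue
            key = (a.file_fh, a.target_deviceid)
            keep = []
            for p in self.items:
                # Only work that has not been OPENed is dropped silently;
                # no PROXY_DONE / PROXY_CANCEL is sent for it.
                if (p.file_fh, p.target_deviceid) == key and not p.opened:
                    dropped.append(p)
                else:
                    keep.append(p)
            self.items = keep
        return dropped
```

Work that has already been OPENed is deliberately left in the queue by this sketch: per the second case above, the proxy server learns of that cancellation from the status of its eventual PROXY_DONE.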
   proxy server                  metadata server
   |                                 |
   | [MOVE assignment delivered in   |
   |  prior PROXY_PROGRESS reply,    |
   |  not yet acknowledged]          |
   |                                 |
   |                                 | <-- (cancel decision)
   |                                 |
   | ---- PROXY_PROGRESS ----------> |   (next heartbeat poll)
   | <--- NFS4_OK, ppr_assignments   |   (CANCEL_PRIOR for the
   |      includes CANCEL_PRIOR      |    same pa_file_fh/target)
   |                                 |
   | [PS drops the prior assignment  |
   |  from its in-flight queue;      |
   |  no PROXY_DONE / PROXY_CANCEL   |
   |  is issued for this work]       |
   v                                 v

   Figure 2: Message sequence for metadata-server-initiated cancellation (assignment not yet acknowledged)

6. New NFSv4.2 Operations

This document defines four new NFSv4.2 operations that a proxy server issues to the metadata server on the fore-channel of the proxy server -> metadata server session defined in Section 5.3. PROXY_REGISTRATION (92) is issued once at session setup and on renewal. PROXY_PROGRESS (93) is issued by the proxy server as a heartbeat-with-poll: the proxy server reports periodic and terminal progress for in-flight migrations and optionally requests new work; the metadata server replies inline with zero or more new work assignments. PROXY_DONE (94) commits or rolls back an individual migration when the proxy server finishes it; PROXY_CANCEL (95) lets the proxy server abort early. None of these operations is sent by pNFS clients.

/// /* New operations for the proxy server (proxy server -> metadata server) */
///
/// OP_PROXY_REGISTRATION = 92;
/// OP_PROXY_PROGRESS     = 93;
/// OP_PROXY_DONE         = 94;
/// OP_PROXY_CANCEL       = 95;

               Figure 3: Proxy server operation numbers

Opcodes 92 through 95 continue the metadata-server-to-data-server control-plane range that [I-D.haynes-nfsv4-flexfiles-v2] opens at 89 (TRUST_STATEID through BULK_REVOKE_STATEID at 89-91).
Haynes Expires 26 November 2026 [Page 20] Internet-Draft FFv2 Proxy Server May 2026 The following amendment blocks extend the nfs_argop4 and nfs_resop4 dispatch unions from [RFC7863] with the new ops. A consumer that combines this document's extracted XDR with the [RFC7863] XDR applies the amendments at the unions' extension point. /// /* nfs_argop4 amendment block */ /// /// case OP_PROXY_REGISTRATION: /// PROXY_REGISTRATION4args opproxyregistration; /// case OP_PROXY_PROGRESS: /// PROXY_PROGRESS4args opproxyprogress; /// case OP_PROXY_DONE: /// PROXY_DONE4args opproxydone; /// case OP_PROXY_CANCEL: /// PROXY_CANCEL4args opproxycancel; Figure 4: nfs_argop4 amendment block /// /* nfs_resop4 amendment block */ /// /// case OP_PROXY_REGISTRATION: /// PROXY_REGISTRATION4res opproxyregistration; /// case OP_PROXY_PROGRESS: /// PROXY_PROGRESS4res opproxyprogress; /// case OP_PROXY_DONE: /// PROXY_DONE4res opproxydone; /// case OP_PROXY_CANCEL: /// PROXY_CANCEL4res opproxycancel; Figure 5: nfs_resop4 amendment block 6.1. proxy_stateid4: A New Stateid Type This document introduces proxy_stateid4, a new server-issued stateid type used as the canonical handle for an in-flight proxy migration. The wire shape reuses the standard NFSv4 stateid4 from [RFC8881] S3.3.12; no new XDR type is added: /// typedef stateid4 proxy_stateid4; Figure 6: proxy_stateid4 wire shape Haynes Expires 26 November 2026 [Page 21] Internet-Draft FFv2 Proxy Server May 2026 6.1.1. Value Space The proxy_stateid value space is disjoint from the open, lock, layout, and delegation stateid value spaces defined in [RFC8881]. Disjointness is enforced by context, not by an in-band tag. A proxy_stateid4 appears in exactly four places: the PROXY_PROGRESS, PROXY_DONE, and PROXY_CANCEL arguments, and the open_claim_proxy4 operand of an OPEN(CLAIM_PROXY) (Section 9.3) -- where it is carried inside the open claim, not in a stateid argument slot. 
An implementation MUST NOT use an open, lock, layout, or delegation stateid lookup table to resolve a proxy_stateid. Conversely, a leaked proxy_stateid presented in the stateid argument of an ordinary operation (e.g., READ, WRITE, SETATTR, CLOSE) MUST be rejected with NFS4ERR_BAD_STATEID. 6.1.2. Metadata Server Minting The metadata server mints a fresh proxy_stateid each time it accepts a work assignment for delivery to a proxy server, and includes it as the pa_stateid field of the proxy_assignment4 carried in the next PROXY_PROGRESS reply (Section 6.3). The metadata server guarantees that no two proxy_stateids in the same (server_state, boot_seq) are equal. An implementation MAY embed the metadata server boot_seq in the high-order bytes of other[12] to enable cheap NFS4ERR_STALE_STATEID detection across reboots; this is informative implementation guidance, not a wire requirement. One known implementation uses the layout { uint16_t boot_seq | uint16_t reserved | uint64_t opaque } where the opaque tail is getrandom(2) output. The reserved field is zero in this revision; implementations MUST emit zero and MUST NOT reject non-zero on receipt (left as a forward-compat slot for widening boot_seq). 6.1.3. Lifetime A proxy_stateid is valid from the instant the metadata server mints it until either: * The proxy server issues PROXY_DONE(proxy_stateid, ...) or PROXY_CANCEL(proxy_stateid) and the metadata server acknowledges it. On acknowledgment the proxy_stateid is retired; subsequent references return NFS4ERR_BAD_STATEID. * The proxy server's registration lease expires (Section 6.2), at which point all proxy_stateids minted for that proxy server are abandoned. Subsequent references return NFS4ERR_BAD_STATEID (or NFS4ERR_STALE_CLIENTID if the registration itself has been purged). Haynes Expires 26 November 2026 [Page 22] Internet-Draft FFv2 Proxy Server May 2026 * The metadata server reboots. 
Subsequent references to a proxy_stateid minted in a prior boot return NFS4ERR_STALE_STATEID. 6.1.4. Renewal Semantics PROXY_PROGRESS may carry a proxy_stateid in its arguments to renew an in-flight assignment. The seqid field of proxy_stateid4 follows the standard NFSv4 stateid seqid semantics in [RFC8881] S8.2.4: * The metadata server bumps seqid on each issuance, including renewal acknowledgments. * The proxy server sends the most recent seqid it has received. * Out-of-order seqids are rejected with NFS4ERR_OLD_STATEID. 6.1.5. Authorization Possession of a proxy_stateid is not sufficient to drive PROXY_DONE or PROXY_CANCEL on the corresponding migration. The metadata server additionally validates that the calling session's registered-proxy server identity owns that migration (see the "Authorization" subsection of Section 7.1 for the full normative rule). Without this check, a proxy server that learned another proxy server's proxy_stateid (through a packet capture, a leaked log, or any other channel) could drive its PROXY_DONE / PROXY_CANCEL on a migration it does not own. 6.2. Operation 92: PROXY_REGISTRATION - Register as Proxy Server 6.2.1. ARGUMENTS /// struct PROXY_REGISTRATION4args { /// ffv2_coding_type4 prr_codecs<>; /// uint32_t prr_lease; /// uint32_t prr_flags; /// }; Figure 7: XDR for PROXY_REGISTRATION4args 6.2.2. RESULTS Haynes Expires 26 November 2026 [Page 23] Internet-Draft FFv2 Proxy Server May 2026 /// struct PROXY_REGISTRATION4resok { /// uint64_t prr_registration_id; /// uint32_t prr_granted_lease; /// }; /// /// union PROXY_REGISTRATION4res switch (nfsstat4 prr_status) { /// case NFS4_OK: /// PROXY_REGISTRATION4resok prr_resok4; /// default: /// void; /// }; Figure 8: XDR for PROXY_REGISTRATION4res 6.2.3. DESCRIPTION A proxy server (proxy server) calls PROXY_REGISTRATION on the fore- channel of its session to the metadata server (Section 5.3) to declare its capabilities. 
The metadata server records the registration and MAY select that proxy server for subsequent MOVE / REPAIR work assignments delivered inline in the response to PROXY_PROGRESS. The prr_codecs field lists the ffv2_coding_type4 values the proxy server supports. The proxy server MUST be able to encode, decode, and transcode between any pair of values in this list. Because the transformation class of a PROXY_OP_MOVE assignment is inherent in the (source, destination) layout pair, this codec-set membership is all the capability information the metadata server needs to match. An empty list results in NFS4ERR_INVAL in this revision. The prr_lease field is the lease duration the proxy server requests in seconds. The metadata server MAY grant a shorter one, returned in prr_granted_lease. The proxy server MUST renew before the granted lease expires; on expiry the metadata server drops the registration and any in-flight migration record owned by this proxy server is abandoned (committed to L1 per Section 8). The prr_flags field is reserved for future use. In this revision the proxy server MUST set prr_flags to 0, and a metadata server that receives a PROXY_REGISTRATION with any bit of prr_flags set MUST reject it with NFS4ERR_INVAL. The "reject, don't ignore" rule follows the NFSv4 extension model in [RFC8178]. Section 8 of [RFC8178] specifies that when a flag bit is used that is not known in the specified minor version, NFS4ERR_INVAL is returned; Section 4.4.3 of [RFC8178] then explains that this same error is how a requester determines whether the responder understands Haynes Expires 26 November 2026 [Page 24] Internet-Draft FFv2 Proxy Server May 2026 the bit. Silently ignoring an unknown bit would break that discovery contract: a proxy server that sets a future capability bit against a metadata server that pre-dates the bit could not tell whether the metadata server honored the capability or simply dropped it. 
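The "reject, don't ignore" rule, together with the empty-codec-list rule stated earlier in this section, can be illustrated with a short non-normative Python sketch of the argument checks a metadata server might apply. The KNOWN_PRR_FLAGS mask and the function name are assumptions made for this sketch; this revision defines no prr_flags bits, so the mask is zero. The sketch covers argument validation only, not caller authorization or lease handling.

```python
NFS4_OK = 0
NFS4ERR_INVAL = 22  # value from the NFSv4 nfsstat4 enumeration

# Hypothetical mask of flag bits this metadata server understands.
# No bits are defined in this revision, so nothing is known.
KNOWN_PRR_FLAGS = 0x0


def check_registration_args(prr_codecs, prr_flags):
    """Return the status for the PROXY_REGISTRATION argument checks."""
    if not prr_codecs:
        # An empty codec list is rejected in this revision.
        return NFS4ERR_INVAL
    if prr_flags & ~KNOWN_PRR_FLAGS & 0xFFFFFFFF:
        # Any unknown bit is rejected rather than silently ignored, so the
        # caller can tell that the bit was not understood.
        return NFS4ERR_INVAL
    return NFS4_OK
```

A later revision that defines a flag bit would, under this sketch, only need to add that bit to the mask; a metadata server that pre-dates the bit keeps returning NFS4ERR_INVAL for it.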
A future revision of this specification (or a successor document that updates it) MAY define new bit values in prr_flags, following the extension rules of Section 4.2 of [RFC8178]. A proxy server that understands a newly defined bit MAY set it when registering with a metadata server that supports it; on NFS4ERR_INVAL the proxy server MAY retry with the bit cleared, treating the response as the [RFC8178] Section 4.4.3 signal that the metadata server does not recognize the bit. On success, the metadata server returns a prr_registration_id that identifies this registration. The proxy server uses it to renew the registration before the granted lease expires (by re-issuing PROXY_REGISTRATION with the same prr_registration_id) and to identify itself across reconnects (see the squat-guard text below). Registration conveys capabilities only; the proxy server's network endpoint is conveyed through the same deviceinfo channel as any other data server's address. When the metadata server selects a proxy server for an operation, the layout issued to clients includes a ffv2_data_server4 entry pointing at the proxy server's existing deviceinfo. PROXY_REGISTRATION is issued on the fore-channel of the metadata server <-> proxy server session. That session is opened by the proxy server, not by the metadata server; it is distinct from the metadata server -> data server tight-coupling control session defined by [I-D.haynes-nfsv4-flexfiles-v2] even when the same host acts as both data server and proxy server. The proxy server MUST present EXCHGID4_FLAG_USE_NON_PNFS on the session so that the metadata server can distinguish it from a regular pNFS client. A metadata server that receives PROXY_REGISTRATION on a session whose owning client did not present EXCHGID4_FLAG_USE_NON_PNFS MUST reject it with NFS4ERR_PERM. Before recording the registration, the metadata server MUST authorize the caller as a registered proxy server for this metadata server. 
How that authorization is established -- a deployment allowlist, a per-metadata-server provisioning step, integration with a directory service, or any other mechanism -- is implementation-defined. The metadata server MUST reject an unauthorized PROXY_REGISTRATION with NFS4ERR_PERM.

The authorization MUST be applied to a cryptographically authenticated identity, per the metadata server <-> proxy server transport-security requirements in Section 14.1. AUTH_SYS is never sufficient for PROXY_REGISTRATION; the metadata server MUST reject a PROXY_REGISTRATION authenticated only by AUTH_SYS. Because one proxy server proxies one metadata server, a successful rogue registration displaces the legitimate proxy server and causes NFS4ERR_STALE to be returned to every client holding cached filehandles against the previous proxy server.

To guard against registration squatting, the metadata server MUST refuse a new PROXY_REGISTRATION from an authorized identity while an existing registration from that same identity still holds a valid lease; the metadata server returns NFS4ERR_DELAY and SHOULD log the conflict. A renewal -- distinguished by the proxy server re-presenting the same prr_registration_id it received on the prior registration -- is not squatting and the metadata server MUST accept it (refreshing the granted lease).

Registration revocation before lease expiry is not a dedicated operation in this revision. A metadata server that needs to revoke a proxy server before its lease expires MUST cease delivering work assignments to that proxy server in PROXY_PROGRESS replies; MUST return NFS4ERR_STALE_CLIENTID on subsequent PROXY_PROGRESS or PROXY_REGISTRATION-renewal from the revoked proxy server; and MUST handle the proxy server's in-flight migration records as if the lease had expired (see the lease-expiry paragraph above): the records are abandoned and the affected layouts revert to the pre-migration state.
The revoked proxy server, on its next PROXY_PROGRESS, sees NFS4ERR_STALE_CLIENTID and may either re-register (if the deployment policy allows) or shut down. A future revision MAY define a dedicated PROXY_REVOKE operation if operational experience shows lease revocation through silence is insufficient. 6.3. Operation 93: PROXY_PROGRESS - Heartbeat and Receive Work Assignments 6.3.1. ARGUMENTS /// struct PROXY_PROGRESS4args { /// uint32_t ppa_flags; /// }; Figure 9: XDR for PROXY_PROGRESS4args 6.3.2. RESULTS Haynes Expires 26 November 2026 [Page 26] Internet-Draft FFv2 Proxy Server May 2026 /// enum proxy_op_kind4 { /// PROXY_OP_MOVE = 0, /// PROXY_OP_REPAIR = 1, /// PROXY_OP_CANCEL_PRIOR = 2 /// }; /// /// struct proxy_assignment4 { /// proxy_op_kind4 pa_kind; /// proxy_stateid4 pa_stateid; /// nfs_fh4 pa_file_fh; /// deviceid4 pa_source_deviceid; /// deviceid4 pa_target_deviceid; /// opaque pa_descriptor<>; /// }; /// /// struct PROXY_PROGRESS4resok { /// uint32_t ppr_lease_remaining_sec; /// proxy_assignment4 ppr_assignments<>; /// }; /// /// union PROXY_PROGRESS4res switch (nfsstat4 ppr_status) { /// case NFS4_OK: /// PROXY_PROGRESS4resok ppr_resok; /// default: /// void; /// }; Figure 10: XDR for PROXY_PROGRESS4res 6.3.3. DESCRIPTION A registered proxy server calls PROXY_PROGRESS on the fore-channel of its session to the metadata server for two purposes: 1. *Heartbeat*: extend the proxy server's registration lease. The metadata server responds with ppr_lease_remaining_sec so the proxy server can size its next poll interval. 2. *Receive work assignments*: pick up zero or more units of work the metadata server has queued for this proxy server. Each assignment is a proxy_assignment4 describing one migration or repair the metadata server wants this proxy server to drive. Per [RFC8178] S4.4.3, ppa_flags is a reserved-for-future-use flag word; the metadata server MUST reject any non-zero bit with NFS4ERR_INVAL. 
The slot allows future revisions to add proxy-server- side appetite signaling (e.g., "do not give me more assignments right now") without an XDR break. Haynes Expires 26 November 2026 [Page 27] Internet-Draft FFv2 Proxy Server May 2026 The metadata server returns work assignments inline in ppr_assignments<>. A proxy server that does not want new work simply ignores the assignments past its in-flight cap; the metadata server does not retract assignments once delivered. Each assignment names a single file (pa_file_fh), the source and target data servers the migration moves data between (pa_source_deviceid / pa_target_deviceid), and a kind-specific opaque descriptor (pa_descriptor<>) for future extensions (for example, a precomputed source-layout descriptor so the proxy server can dial source data servers without a second LAYOUTGET). The pa_stateid field carries the proxy_stateid4 (Section 6.1) the metadata server has minted for this migration; the proxy server presents it in the OPEN(CLAIM_PROXY) that binds it to the file (Section 9.3) and references it as the handle in the eventual PROXY_DONE / PROXY_CANCEL. The pa_kind discriminates the work type: PROXY_OP_MOVE: drain or migrate the file's data between the named data servers. pa_stateid is the proxy_stateid the proxy server will reference in PROXY_DONE / PROXY_CANCEL. PROXY_OP_REPAIR: reconstruct a missing or corrupt mirror on pa_target_deviceid from the surviving mirrors. pa_stateid is the proxy_stateid the proxy server will reference in PROXY_DONE / PROXY_CANCEL. PROXY_OP_CANCEL_PRIOR: the metadata server rescinds an assignment it delivered in a prior PROXY_PROGRESS reply, before the proxy server acknowledged it via OPEN+LAYOUTGET. 
pa_stateid is the proxy_stateid of the assignment being rescinded; the proxy server MUST drop any in-progress work tagged with this proxy_stateid and MUST NOT issue PROXY_DONE / PROXY_CANCEL for it (the metadata server has already cleaned up the in-flight migration record on its side and retired the proxy_stateid).

For each MOVE / REPAIR assignment, the proxy server picks the work up by issuing a normal NFSv4 OPEN+LAYOUTGET against pa_file_fh (the L3 composite layout), shovels bytes per the kind-specific protocol, and reports terminal status via PROXY_DONE(pa_stateid, ...) (Section 7.1) or PROXY_CANCEL(pa_stateid) (Section 7.2).

pa_file_fh is an nfs_fh4 minted by the metadata server and presented to the proxy server for use against the same metadata server. Per [RFC8881] Section 4.2.3, NFSv4 filehandles are server-private opaque tokens; the receiving server treats the byte string as opaque, validates it only by attempting the lookup, and returns NFS4ERR_STALE or NFS4ERR_BADHANDLE if the bytes do not resolve. The proxy server MUST NOT inspect, mutate, or shape-check pa_file_fh; it forwards the filehandle verbatim in PUTFH on the same metadata server that issued it, and the existing PUTFH semantics apply unchanged.

The ppr_lease_remaining_sec field is the metadata server's acknowledgment of this PROXY_PROGRESS as a registration lease renewal. It is the number of seconds remaining until the proxy server's registration would expire absent further PROXY_PROGRESS. A well-behaved proxy server treats it as an upper bound on the delay before its next poll; the metadata server MAY return a smaller value than the standard NFSv4 lease period to drive a busy proxy server to poll more often or to encourage a quiet one to back off.

Polling cadence: lease/2 in steady state. Adaptive backoff to lease and then 2*lease after K consecutive empty replies; reset on any non-empty reply.
The metadata server may override the cadence via ppr_lease_remaining_sec.

The metadata-server-initiated cancellation case (the metadata server abandons an in-flight assignment before the proxy server has driven it to terminal state) is signaled via the PROXY_OP_CANCEL_PRIOR assignment kind described above. There is no separate cancel callback; the proxy-server-initiated cancel is handled by the fore-channel PROXY_CANCEL (Section 7.2).

7. New Fore-Channel Operations: PROXY_DONE and PROXY_CANCEL

The proxy-server-to-metadata-server protocol uses two new fore-channel operations in addition to the extended PROXY_PROGRESS:

PROXY_DONE (op 94):  The proxy server reports terminal success or failure on a specific in-flight migration. The metadata server uses pd_status to atomically commit (success: swap the file's active layout from L1 to L2) or roll back (failure: keep L1, drop L2/G).

PROXY_CANCEL (op 95):  The proxy server aborts a work item it was assigned but cannot complete (e.g., a source data server becomes unreachable, or the proxy server exhausts its resources). The metadata server treats this as PROXY_DONE with a fail-equivalent status: it rolls back to L1, drops L2/G, and frees the assignment for re-assignment by a later PROXY_PROGRESS poll.

Both operations identify the affected migration by layout stateid. The proxy server acquired this stateid earlier when it issued LAYOUTGET against the migration layout (L3) for this file; the metadata server keys its in-flight migration record on the (clientid, pa_file_fh, layout_stid) triple. No new stateid type is required.

7.1. Operation 94: PROXY_DONE - Commit or Roll Back a Proxy Operation

7.1.1. ARGUMENTS

/// struct PROXY_DONE4args {
///     proxy_stateid4  pd_stateid;
///     nfsstat4        pd_status;
/// };

                  Figure 11: PROXY_DONE arguments

7.1.2. RESULTS

/// struct PROXY_DONE4res {
///     nfsstat4        pdr_status;
/// };

                   Figure 12: PROXY_DONE results

7.1.3.
DESCRIPTION PROXY_DONE signals the terminal outcome of a migration the proxy server was assigned via PROXY_PROGRESS. pd_stateid is the proxy_stateid the metadata server minted when it delivered the corresponding proxy_assignment4 (Section 6.1). A pd_status of NFS4_OK directs the metadata server to commit the migration (swap the file's active layout from the pre-migration shape L1 to the post- migration shape L2); any other value directs the metadata server to roll back (keep L1, discard L2 and the proxy-server-only composite L3). The proxy server compounds PROXY_DONE after the byte-shoveling phase completes (or fails): SEQUENCE PUTFH(pa_file_fh) LAYOUTRETURN(L3_stateid) PROXY_DONE(pd_stateid, status) LAYOUTRETURN runs FIRST per [RFC8881] S18.51, releasing the proxy server's reference to the L3 layout cleanly via the standard mechanism. PROXY_DONE then operates on the in-flight migration record keyed by the proxy_stateid; the record is the single source of truth for migration state, so PROXY_DONE remains valid even though L3 Haynes Expires 26 November 2026 [Page 30] Internet-Draft FFv2 Proxy Server May 2026 has just been returned. The proxy server MAY issue PROXY_DONE in a subsequent compound, but the single-compound shape is RECOMMENDED to keep the recovery window short. 7.1.3.1. Authorization The metadata server MUST validate, in this priority order, returning the first failure encountered: 1. The calling session belongs to a registered proxy server (i.e., the session's owning client has nc_is_registered_ps set). Otherwise: NFS4ERR_PERM. 2. pd_stateid.other was minted in the current (server_state, boot_seq) tuple. A proxy_stateid minted in a prior boot returns NFS4ERR_STALE_STATEID. 3. pd_stateid.other identifies a proxy operation currently in flight at the metadata server. Otherwise: NFS4ERR_BAD_STATEID. 4. The proxy operation identified by pd_stateid is owned by the calling session's registered-proxy server identity. 
   The identity captured at PROXY_REGISTRATION time -- the prr_registration_id if non-empty, or the matched GSS principal / mTLS fingerprint otherwise -- is the authorization principal, not the per-EXCHANGE_ID clientid4. This makes PROXY_DONE / PROXY_CANCEL tolerant of proxy server reconnect: a proxy server that drops its session and reconnects with a fresh EXCHANGE_ID but the same prr_registration_id retains authority over its in-flight migrations. Mismatch returns NFS4ERR_PERM.

5. The current filehandle (set by the preceding PUTFH) is the pa_file_fh of the proxy operation identified by pd_stateid. Otherwise: NFS4ERR_BAD_STATEID.

6. pd_stateid.seqid matches the most recently issued seqid for this proxy_stateid (per [RFC8881] S8.2.4 stateid sequence semantics). Otherwise: NFS4ERR_OLD_STATEID.

If all validations succeed, the metadata server atomically:

* For a pd_status of NFS4_OK: commits the migration -- promotes L2 to be the file's layout (D dropped, G promoted), drops L1 and L3, issues CB_LAYOUTRECALL on the prior layout to external clients still holding cached L1 references, and defers final removal of the decommissioned mirror D until all L1 holders return their layouts. See Section 9.1 for the full mechanics.

* For any other pd_status: leaves the file's layout unchanged -- L1 stays in force; L2 and L3 are discarded. No CB_LAYOUTRECALL is needed (external clients never saw the post-image). The proxy server owns cleanup of any half-written data it placed on the target G.

In both cases the metadata server retires the proxy operation; pd_stateid is thereafter invalid.

Atomicity is critical: external client traffic must transition cleanly across this op; either the per-instance deltas commit fully or they do not commit at all.

7.2. Operation 95: PROXY_CANCEL - Abort a Proxy Operation

7.2.1. ARGUMENTS

   /// struct PROXY_CANCEL4args {
   ///     proxy_stateid4  pc_stateid;
   /// };

Figure 13: PROXY_CANCEL arguments

7.2.2. RESULTS

   /// struct PROXY_CANCEL4res {
   ///     nfsstat4        pcr_status;
   /// };

Figure 14: PROXY_CANCEL results

7.2.3. DESCRIPTION

PROXY_CANCEL discards an assigned-but-unfinished migration. The proxy server uses it when it knows it cannot complete the assignment (the proxy server is being shut down gracefully, the source data server is unreachable, the destination data server rejected the writes, etc.) and wants to release the work item back to the metadata server without computing a specific failure status.

pc_stateid is the proxy_stateid the metadata server minted when it delivered the corresponding proxy_assignment4.

Compound shape:

   SEQUENCE
   PUTFH(pa_file_fh)
   LAYOUTRETURN(L3_stateid)
   PROXY_CANCEL(pc_stateid)

LAYOUTRETURN runs first (standard [RFC8881] S18.44 release of the L3 layout); PROXY_CANCEL then operates on the in-flight migration record only.

7.2.3.1. Authorization

The same priority-ordered validation as PROXY_DONE (Section 7.1) applies, with pc_stateid substituted for pd_stateid. In particular, the registered-proxy server identity that owns the proxy operation identified by pc_stateid MUST match the caller's, or the metadata server returns NFS4ERR_PERM; a proxy server cannot cancel another proxy server's migration.

7.2.3.2. Side effects

If validation succeeds, layout-side effects mirror PROXY_DONE with a failing pd_status -- L1 stays in force; L2 and L3 are discarded; the half-filled target G is the proxy server's to clean up. The metadata server retires the proxy operation, invalidates pc_stateid, and (informatively) updates its operator-facing telemetry to record the cancellation. No CB_LAYOUTRECALL is needed.
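The priority-ordered validation that PROXY_DONE and PROXY_CANCEL share (Section 7.1.3.1) can be sketched as a chain of early returns. This is a non-normative illustration: the ProxyStateid and Migration shapes are hypothetical, and boot_epoch is an assumed stand-in for however an implementation recovers the (server_state, boot_seq) tuple from the stateid.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProxyStateid:
    other: bytes
    seqid: int
    boot_epoch: int      # assumption: stands in for (server_state, boot_seq)


@dataclass
class Migration:         # hypothetical in-flight migration record
    seqid: int           # most recently issued seqid
    owner_identity: str  # identity captured at PROXY_REGISTRATION time
    file_fh: bytes       # pa_file_fh of the assignment


def validate(stateid, caller_registered, caller_identity,
             current_fh, current_boot, in_flight):
    """Return the first failing status in priority order, else NFS4_OK."""
    if not caller_registered:                      # check 1
        return "NFS4ERR_PERM"
    if stateid.boot_epoch != current_boot:         # check 2
        return "NFS4ERR_STALE_STATEID"
    record = in_flight.get(stateid.other)          # check 3
    if record is None:
        return "NFS4ERR_BAD_STATEID"
    if record.owner_identity != caller_identity:   # check 4
        return "NFS4ERR_PERM"
    if record.file_fh != current_fh:               # check 5
        return "NFS4ERR_BAD_STATEID"
    if stateid.seqid != record.seqid:              # check 6
        return "NFS4ERR_OLD_STATEID"
    return "NFS4_OK"
```

Because each check returns immediately, a request that fails several checks always reports the highest-priority failure, which is the property the ordering exists to guarantee.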
The distinction between PROXY_DONE(FAIL) and PROXY_CANCEL is purely intent / accounting: PROXY_DONE(FAIL) records that the proxy server attempted the migration and ran into a recoverable error; PROXY_CANCEL records that the proxy server abandoned the assignment without attempting it (or while attempting, decided not to report a specific failure cause). A metadata server implementation MAY surface the distinction in operator telemetry but MUST NOT make any behavioral distinction on the wire.

8. Multi-Proxy Server Assignment Fan-out

When multiple proxy servers are registered against the same metadata server, the metadata server coordinates assignment fan-out under one hard invariant: at any time, there MUST be at most one migration in flight for a given (pa_file_fh, pa_target_deviceid) pair. The metadata server MUST NOT assign a migration whose (pa_file_fh, pa_target_deviceid) matches that of an in-flight migration.

How the metadata server chooses among eligible proxy servers -- by load, locality, fault domain, capability match, or any combination -- is an implementation matter. The protocol constrains only the outcome: the proxy server that receives the assignment is registered at delivery time, and the (pa_file_fh, pa_target_deviceid) invariant holds.

When a registered proxy server loses its session -- its lease expires, or a competing proxy server takes its registration slot in a squat -- the metadata server MUST treat each migration the proxy server had in flight as if PROXY_DONE(FAIL) had been issued: L1 stays in force, L2 and L3 are discarded. The metadata server MAY then reassign the work to another eligible proxy server; the (pa_file_fh, pa_target_deviceid) invariant continues to hold across reassignment.
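One way to hold the (pa_file_fh, pa_target_deviceid) invariant is a table keyed on that pair, as in the non-normative sketch below. The AssignmentTable class and its method names are hypothetical; proxy selection policy is deliberately left out because the text treats it as an implementation matter.

```python
class AssignmentTable:
    """Tracks in-flight migrations keyed by (file_fh, target_deviceid)."""

    def __init__(self):
        self._in_flight = {}   # (file_fh, target_deviceid) -> proxy identity

    def try_assign(self, file_fh, target_deviceid, proxy_id):
        """Record an assignment unless one is already in flight for the pair."""
        key = (file_fh, target_deviceid)
        if key in self._in_flight:
            return False       # invariant: at most one migration per pair
        self._in_flight[key] = proxy_id
        return True

    def retire(self, file_fh, target_deviceid):
        """Drop the record after PROXY_DONE or PROXY_CANCEL."""
        self._in_flight.pop((file_fh, target_deviceid), None)

    def on_session_loss(self, proxy_id):
        """Treat every migration owned by proxy_id as PROXY_DONE(FAIL).

        Returns the freed keys so the caller may reassign them.
        """
        freed = [k for k, owner in self._in_flight.items() if owner == proxy_id]
        for key in freed:
            del self._in_flight[key]
        return freed
```

Because a freed key is removed before any reassignment is attempted, the invariant holds across reassignment without a separate handoff state.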
A proxy server that reconnects with the same client_owner4, and so recovers the same clientid via EXCHANGE_ID, retains ownership of its in-flight migrations; no reassignment is needed. The proxy server reclaims its layouts via the metadata-server-recovery path in Section 13.

A host that does not implement the proxy server role simply does not call PROXY_REGISTRATION and is never selected for a MOVE or REPAIR assignment. A deployment with no registered proxy server falls back to per-chunk CB_CHUNK_REPAIR for single-shard repair, to admin-coordinated offline procedures for policy transitions and data server evacuation, and to blocking data server maintenance -- the data server cannot drain through a proxy server, so it must remain reachable to clients throughout its service life.

Deployments SHOULD ensure at least one registered proxy server exists per failure domain to avoid a single point of failure on move operations.

9. Layout Shape During a Proxy Operation

The layout the metadata server hands out to clients while a proxy operation is active is the mechanism's sole client-facing surface. Everything else in this document -- the session, the ops, the credential-forwarding rules -- is between the metadata server and the proxy server. The layout shape is therefore what a client implementer needs to read to know how its code interacts with a proxied file.

On a LAYOUTGET, the metadata server chooses one of three outcomes:

* A direct-data server layout, when no proxy operation is in flight for the file and the client's coding-type support set includes the file's coding. This is the unchanged FFv2 path.

* A single-data server layout naming the proxy server, when a MOVE or REPAIR migration is in flight for the file, or when the client's coding-type support set does not include the file's coding and a registered proxy server can translate. The client uses this layout as it would any FFv2 layout, sending CHUNK ops to the named data server; the proxy server internally dispatches reads and writes to the source and destination data servers.

* NFS4ERR_CODING_NOT_SUPPORTED (see [I-D.haynes-nfsv4-flexfiles-v2]), when the client's coding-type support set does not include the file's coding and no registered proxy server can translate.

A client that supports FFv2 -- which is the precondition for any of this -- needs no proxy-specific code: the proxy case arrives as a single-data server layout, indistinguishable from any other FFv2 layout.

9.1. Atomic commit on PROXY_DONE

When the proxy server issues PROXY_DONE(pd_stateid, pd_status=NFS4_OK), the metadata server atomically (in one transaction):

1. Promotes L2 to be the file's layout (D dropped, G promoted)

2. Drops L1 and L3 from the file's layout records

3. Retires the in-flight migration record

4. Issues CB_LAYOUTRECALL for the file's outstanding client-facing (proxy-server-naming) layouts

5. Defers REMOVE_MIRROR(D) until those layouts are returned

On its next LAYOUTGET each client receives the post-migration layout (L2): the real data servers, with no proxy server.

When PROXY_DONE indicates failure (or PROXY_CANCEL is issued):

1. L1 is promoted unchanged -- the file falls back to its pre-migration mirror set

2. L2 is dropped; the half-filled G instance is internally unlinked

3. L3 is dropped and the migration record retired

4. CB_LAYOUTRECALL is issued for the proxy-server-naming layouts; on the next LAYOUTGET clients receive L1

9.2. The swap window

Because every client wrote through the proxy server, no client ever addressed D directly, and there are no client writes in flight to D when the swap occurs.
The deferred REMOVE_MIRROR(D) covers only the proxy server's own trailing writes: by the time it issues PROXY_DONE the proxy server, as the sole writer, has quiesced its M2 fan-out, so the deferral window is short and contains no client-visible activity.

A client holding a proxy-server-naming layout when CB_LAYOUTRECALL arrives returns it and re-LAYOUTGETs in the usual way. In-flight client I/O to the proxy server across that boundary is handled by the in-flight-I/O rules for a proxy server change (see Section 10.2).

9.3. The CLAIM_PROXY open claim

The proxy server opens the file with a new OPEN claim, CLAIM_PROXY. A bare OPEN(CLAIM_NULL) cannot serve here: it would be indistinguishable from an ordinary or racing OPEN on the registered-proxy server session, leaving the metadata server to infer proxy intent from session state -- which it cannot do reliably. CLAIM_PROXY makes the proxy OPEN explicit and carries, in one step, what the metadata server needs to bind the proxy server to the file.

open_claim_type4 gains one enumerant and open_claim4 one union arm:

   /// /* New OPEN claim for a proxy server; extends the
   ///    open_claim_type4 enumeration of [RFC8881]. */
   ///
   /// const CLAIM_PROXY = 7;
   ///
   /// struct open_claim_proxy4 {
   ///     proxy_stateid4  ocp_proxy_stateid;
   ///     nfs_fh4         ocp_ps_fh;
   /// };
   ///
   /// /* open_claim4 gains the arm:
   ///    case CLAIM_PROXY:
   ///        open_claim_proxy4 ocp_proxy;
   /// */

Figure 15: The CLAIM_PROXY open claim

CLAIM_PROXY is filehandle-based and opens an existing file: the proxy server issues PUTFH of the assignment's pa_file_fh followed by OPEN(CLAIM_PROXY), with no directory filehandle and no component name, in the manner of CLAIM_FH. CLAIM_PROXY MUST NOT be combined with OPEN4_CREATE.
The operand carries two values:

ocp_proxy_stateid: the proxy_stateid4 the metadata server minted for this assignment and returned in the PROXY_PROGRESS work assignment (Section 6.3). It is the correlator that identifies this OPEN as the proxy OPEN for a specific assignment; the metadata server does not infer proxy intent from the session.

ocp_ps_fh: the filehandle under which the proxy server will serve the file to clients: the data-server filehandle that appears in the layout the metadata server hands a codec-incapable client, and the filehandle a non-pNFS client obtains by LOOKUP against the proxy server. Only the proxy server can mint this filehandle; it is opaque to the metadata server, which records it and copies it verbatim into the layouts it issues. Carrying it in the OPEN binds it atomically with the proxy OPEN.

The metadata server MUST verify that ocp_proxy_stateid is valid, that it names an outstanding assignment, that the assignment was made to the calling clientid, and that the current filehandle is that assignment's file. An invalid or stale stateid draws NFS4ERR_BAD_STATEID. A proxy server MUST NOT issue OPEN(CLAIM_PROXY) unless it holds a successful PROXY_REGISTRATION (Section 6.2); successful registration is what establishes that the metadata server implements this extension.

OPEN(CLAIM_PROXY) returns the ordinary OPEN result: an open stateid, and an open_delegation4 which the metadata server MUST set to OPEN_DELEGATE_NONE -- a delegation to the proxy server would conflict with the migration the proxy server is itself driving. The proxy server opens with OPEN4_SHARE_ACCESS_BOTH and OPEN4_SHARE_DENY_NONE. A retransmitted or re-issued OPEN(CLAIM_PROXY) is handled exactly as for any other claim: the session replay cache absorbs retransmits, and a genuine repeated OPEN by the same open-owner is an ordinary share-state operation.
The proxy server then obtains the L3 composite layout with an ordinary LAYOUTGET; the metadata server serves L3 because the calling clientid holds an in-flight migration record for the file. The L3 layout stateid is a normal NFSv4 layout stateid, used for CHUNK / WRITE / READ I/O against the source and target data servers in the standard way. It is distinct from proxy_stateid4 (Section 6.1), which is a control-plane handle for the migration as a whole and is never presented to LAYOUTGET; the metadata server keys its in-flight migration record on the proxy_stateid. Separating the two -- one for I/O on a layout, one for the migration -- keeps the migration record's lifetime independent of any LAYOUTGET / LAYOUTRETURN cycle the proxy server performs during the byte-shoveling phase.

9.4. Drain interaction

The DRAINING state on D is observable to external clients only through the absence of new layouts naming D: while D is DRAINING, the metadata server does not place D in any new mirror set.

Before the migration becomes active for an existing file whose layout names D, the metadata server issues CB_LAYOUTRECALL on every outstanding layout for the file whose mirror set includes D. Once those layouts have been returned -- or administratively revoked when a client's CB back-channel fails to ack within the recall window -- the migration is in flight. From that point until the assigned proxy server completes its OPEN(CLAIM_PROXY) (Section 9.3) and registers the filehandle under which it will serve the file, the metadata server cannot yet build a client-facing layout, and answers every LAYOUTGET for the file with NFS4ERR_DELAY; clients retry. This window MUST be bounded: if the assigned proxy server does not complete its OPEN(CLAIM_PROXY) within a deadline tied to its registration lease, the metadata server reassigns or abandons the assignment and stops returning NFS4ERR_DELAY.
Once the proxy server has opened, subsequent LAYOUTGETs for the file return a layout naming the proxy server.

This omit-and-replace ordering guarantees that no client write hits D after the migration has started. The alternative -- keep-and-shadow, in which the layout view continues to include D and the proxy server shadows client writes from D to G as they happen -- requires the proxy server to expose itself as a flex-files data server (an INTERPOSED instance taking the place of D in the visible layout, with the proxy server funneling writes to both D and G). This shape is defined in the per-instance delta model below (informative) but is not exercised by the wire ops in this revision.

9.5. Per-instance migration deltas (informative)

The L1/L2/L3 framing above describes one valid implementation approach -- whole-layout swap -- that captures the simplest case (single mirror replacement under a Client Side Mirroring codec). A metadata server implementation that supports more general migrations (e.g., a single shard add to an erasure-coded file, or a partial mirror-set rotation under FFv2 RS) MAY record migration state as per-instance deltas on the file's existing layout records, rather than as a complete L2/L3 pair.

In this informative model, each migration record carries an array of per-instance deltas, each delta describing a transformation on one position within one segment of layout_segments. Four instance states are useful:

STABLE: unchanged; client writes go here directly.

DRAINING: a slot being decommissioned; under omit-and-replace, the LAYOUTGET view-build path omits this slot and replaces it with the matching INCOMING.

INCOMING: a new slot the proxy server is filling; under omit-and-replace, the LAYOUTGET view-build path emits this slot in place of the matching DRAINING.

INTERPOSED: a slot whose visible endpoint is the proxy server, with the proxy server internally fanning writes out to one or more target data servers. Used by keep-and-shadow (forward-compat; not produced by the wire ops in this revision).

The current published layout (layout_segments) is built through the deltas: when LAYOUTGET runs while a migration is active, the layout-build path consults the migration record and emits the during-migration view by applying the deltas to the base segments. layout_segments itself is never mutated until PROXY_DONE(NFS4_OK) collapses the deltas into the base records permanently.

This per-instance model and the L1/L2/L3 swap model agree on the wire-visible behavior in the simplest case (single mirror replacement, omit-and-replace). The wire ops in this draft do not require either implementation; a metadata server chooses whichever matches its layout-record machinery. The per-instance delta model is one known implementation strategy that has been used to track the four record-builder invariants and a lease-aware reaper for the migration record / proxy_stateid tables across the lifecycle described above.

10. Client Behavior

During a proxy operation the layout the metadata server hands a client is a single-data server FFv2 layout naming the proxy server. The client treats it as any other FFv2 layout, sending CHUNK ops to the named data server under its existing layout stateid. Nothing in the client's path distinguishes "the data server is a proxy server" from "the data server is a real data server"; that distinction lives entirely on the metadata server side.
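The metadata-server-side choice that produces this single-data server layout follows the three LAYOUTGET outcomes of Section 9 plus the NFS4ERR_DELAY window of Section 9.4. The sketch below is non-normative; the function name, its parameters, and the DIRECT_LAYOUT / PROXY_LAYOUT labels are hypothetical and stand in for real layout construction.

```python
def layoutget_outcome(migration_in_flight, proxy_has_opened,
                      client_codings, file_coding, translator_registered):
    """Pick the LAYOUTGET outcome for one client request.

    migration_in_flight:   a MOVE or REPAIR migration is in flight for the file
    proxy_has_opened:      the assigned proxy completed OPEN(CLAIM_PROXY)
    client_codings:        the client's coding-type support set
    file_coding:           the file's coding
    translator_registered: a registered proxy server can translate
    """
    if migration_in_flight:
        # Until the proxy has opened, no client-facing layout can be built.
        if not proxy_has_opened:
            return "NFS4ERR_DELAY"
        return "PROXY_LAYOUT"
    if file_coding in client_codings:
        return "DIRECT_LAYOUT"           # unchanged FFv2 path
    if translator_registered:
        return "PROXY_LAYOUT"            # codec translation through a proxy
    return "NFS4ERR_CODING_NOT_SUPPORTED"
```

The client-side code needs no counterpart to this function, since both layout outcomes arrive as ordinary FFv2 layouts.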
The proxy server accepts those CHUNK ops under the client's existing layout stateid because the metadata server has registered the stateid via TRUST_STATEID on the proxy server, per the tight-coupling semantics in [I-D.haynes-nfsv4-flexfiles-v2].

The client handles proxy-server-side errors (NFS4ERR_DELAY, connection loss, NFS4ERR_BAD_STATEID) exactly as it would any other data server error: report LAYOUTERROR to the metadata server and expect either a new layout or a proxy server reassignment in return.

10.1. When the Layout Is Recalled

If the metadata server recalls the layout mid-operation (the proxy server failed and is being replaced, or the operation completed and normal data server layouts are being reissued), the client LAYOUTRETURNs as usual and reacquires via LAYOUTGET. The new layout may name a different proxy server, a different mirror set, or -- if the proxy operation has completed -- the real data servers directly.

10.2. In-Flight I/O When the Proxy Server Changes

In-flight I/O to the old proxy server when the metadata server recalls the layout MAY complete at the old proxy server; results remain valid under the old proxy server's authority. New I/O issued after LAYOUTRETURN MUST go through the data server(s) the new layout names: a replacement proxy server, or the real data servers if the proxy operation has completed.

11. State Machine

A file's participation in a proxy operation passes through five states: READY (no operation in flight), ASSIGNED (the metadata server has queued an assignment for a proxy server but the proxy server has not acknowledged it via OPEN+LAYOUTGET), PROXY_ACTIVE (the proxy server is driving a move or repair), COMMITTING (the proxy server has issued PROXY_DONE(OK) and the metadata server is recalling the old layout from external clients), and DONE (clients are on the post-move layout, source data servers retired).
The state is metadata-server-local: clients never observe these state names directly, but a client's behaviour is shaped by which layout the metadata server is currently handing out.

A given file spends most of its lifetime in READY; a proxy operation is a relatively short excursion through the other four states, after which the file returns to READY with a new layout in place (or, on cancellation or failure, with the old layout preserved). The diagram below shows the states and the principal transitions, including the failure exits from ASSIGNED and PROXY_ACTIVE back to READY. The Transitions table that follows enumerates each transition with its trigger and effect.

        (admin, policy, repair, or
         maintenance trigger)
            |
            v
      +------------+
      |   READY    |
      |   source   |
      |   layout   |
      |    only    |
      +-----+------+
            |
            | MDS selects registered PS;
            | queues a proxy_assignment4
            | for delivery in next
            | PROXY_PROGRESS reply
            v
      +--------------+
      |   ASSIGNED   |---> back to READY on
      | MDS has the  |     cancellation
      | in-flight    |     (MDS-initiated, or
      | record; PS   |     lease expires before
      | has not yet  |     the PS picks up)
      | OPEN'd file  |
      +-----+--------+
            |
            | PS picks up the assignment:
            | OPEN(pa_file_fh) + LAYOUTGET
            | (L3 composite layout)
            v
      +--------------+
      | PROXY_ACTIVE |---> back to READY on
      | clients see  |     PROXY_DONE(FAIL),
      | single-DS    |     PROXY_CANCEL, or
      | layout       |     PS lease expiry;
      | naming PS;   |     layout reverts to L1
      | PS drives    |
      | source->dest |
      +-----+--------+
            |
            | PS issues SEQUENCE
            | PUTFH LAYOUTRETURN
            | PROXY_DONE(stid, OK)
            v
      +------------+
      | COMMITTING |
      | MDS issues |
      | CB_LAYOUT- |
      | RECALL for |
      | old layout |
      +-----+------+
            |
            | all clients have
            | LAYOUTRETURNed
            v
      +------------+
      |    DONE    |
      | new layout |
      | live;      |
      | source     |
      | DSes       |
      | retired    |
      +------------+

Figure 16: File state during a proxy operation

11.1. Transitions

+============+============+=====================+===================+
|From        |To          |Trigger              | Actions           |
+============+============+=====================+===================+
|READY       |ASSIGNED    |metadata server      | metadata server   |
|            |            |decides to move or   | queues a          |
|            |            |repair               | proxy_assignment4 |
|            |            |                     | (kind=MOVE or     |
|            |            |                     | REPAIR) for       |
|            |            |                     | delivery in the   |
|            |            |                     | next              |
|            |            |                     | PROXY_PROGRESS    |
|            |            |                     | reply to the      |
|            |            |                     | selected proxy    |
|            |            |                     | server; creates   |
|            |            |                     | the in-flight     |
|            |            |                     | migration record  |
+------------+------------+---------------------+-------------------+
|ASSIGNED    |PROXY_ACTIVE|proxy server picks up| proxy server      |
|            |            |the assignment       | issues            |
|            |            |                     | OPEN(CLAIM_PROXY) |
|            |            |                     | + LAYOUTGET       |
|            |            |                     | against           |
|            |            |                     | pa_file_fh;       |
|            |            |                     | metadata server   |
|            |            |                     | begins serving    |
|            |            |                     | clients a layout  |
|            |            |                     | naming the proxy  |
|            |            |                     | server            |
+------------+------------+---------------------+-------------------+
|PROXY_ACTIVE|COMMITTING  |proxy server issues  | metadata server   |
|            |            |PROXY_DONE with      | begins            |
|            |            |pd_status=NFS4_OK    | CB_LAYOUTRECALL   |
|            |            |                     | fan-out to        |
|            |            |                     | clients still on  |
|            |            |                     | the old layout    |
+------------+------------+---------------------+-------------------+
|COMMITTING  |DONE        |All clients have     | metadata server   |
|            |            |LAYOUTRETURNed       | issues post-move  |
|            |            |                     | layouts (L2);     |
|            |            |                     | source DSes       |
|            |            |                     | retired           |
+------------+------------+---------------------+-------------------+
|ASSIGNED    |READY       |metadata-server-     | metadata server   |
|            |            |initiated            | drops the in-     |
|            |            |cancellation:        | flight record;    |
|            |            |metadata server      | proxy server      |
|            |            |includes a           | drops the         |
|            |            |PROXY_OP_CANCEL_PRIOR| assignment from   |
|            |            |assignment in the    | its in-flight     |
|            |            |next PROXY_PROGRESS  | queue             |
|            |            |reply                |                   |
+------------+------------+---------------------+-------------------+
|PROXY_ACTIVE|READY       |proxy server failed  | metadata server   |
|            |            |and no replacement   | reverts layouts   |
|            |            |available; or proxy- | to pre-move       |
|            |            |server-initiated     | source set (L1)   |
|            |            |cancellation via     |                   |
|            |            |PROXY_CANCEL; or     |                   |
|            |            |PROXY_DONE with a    |                   |
|            |            |failing pd_status    |                   |
+------------+------------+---------------------+-------------------+

Table 1

12. Proxy Server Failure and Recovery

12.1. Proxy Server Crash During PROXY_ACTIVE

When a proxy server crashes mid-operation, client I/O routed through it receives NFS4ERR_DELAY (if the proxy server is reachable but unhealthy) or connection errors (if unreachable), and the affected clients report LAYOUTERROR to the metadata server.

The metadata server MAY select a replacement proxy server from the registered pool and queue a fresh proxy_assignment4 (kind MOVE or REPAIR) for that proxy server in its next PROXY_PROGRESS reply, with the source layout updated to reflect current reality -- destination data servers that the failed proxy server populated are now part of the source set -- and the destination layout unchanged; the replacement proxy server resumes from wherever the failed proxy server left off.

Before the replacement proxy server's layout becomes live, the metadata server MUST fence the failed proxy server: it revokes the failed proxy server's L3 layout stateid (REVOKE_STATEID) and, where the data server protocol supports it, fences the failed proxy server at the source and target data servers. Fencing closes the window in which a delayed write from a failed-but-not-dead proxy server could land after the replacement proxy server has taken over -- a two-proxy server instance of the write race the single-writer model otherwise prevents. The metadata server then issues CB_LAYOUTRECALL on the old layout and the replacement proxy server's layout becomes live for new LAYOUTGETs.
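The transitions of Table 1, including the failure exits from PROXY_ACTIVE that this section exercises, can be sketched as a lookup table. The sketch is non-normative: the event names are hypothetical labels rather than protocol elements, and the two lease-expiry exits are taken from Figure 16 rather than from Table 1.

```python
# (state, event) -> next state; event names are illustrative labels only.
TRANSITIONS = {
    ("READY", "assign"): "ASSIGNED",
    ("ASSIGNED", "proxy_open_and_layoutget"): "PROXY_ACTIVE",
    ("ASSIGNED", "cancel_prior"): "READY",
    ("ASSIGNED", "lease_expired"): "READY",           # from Figure 16
    ("PROXY_ACTIVE", "proxy_done_ok"): "COMMITTING",
    ("PROXY_ACTIVE", "proxy_done_fail"): "READY",
    ("PROXY_ACTIVE", "proxy_cancel"): "READY",
    ("PROXY_ACTIVE", "proxy_failed_no_replacement"): "READY",
    ("PROXY_ACTIVE", "lease_expired"): "READY",       # from Figure 16
    ("COMMITTING", "all_layouts_returned"): "DONE",
}


def is_legal(state, event):
    """True if the event is a defined transition out of the state."""
    return (state, event) in TRANSITIONS


def step(state, event):
    """Return the next state, or raise ValueError for an undefined transition."""
    if not is_legal(state, event):
        raise ValueError(f"no transition from {state} on {event}")
    return TRANSITIONS[(state, event)]
```

Keeping the table as data makes it straightforward to check that every failure exit from ASSIGNED and PROXY_ACTIVE lands in READY, with the old layout preserved.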
If the metadata server cannot find a replacement within a policy timeout, it MUST cancel the operation: revert to the pre-move source layout, do not issue a destination layout, and mark the destination data servers for cleanup or retry.

12.2. Cascading Proxy Server Failure

A second proxy server failure on the same operation SHOULD escalate to deployment management rather than trigger another automatic replacement. Recurring failures across multiple proxy servers indicate an environmental issue no proxy server can work around -- an unreachable source data server, a misconfiguration, or a starved replacement pool -- that operator attention will resolve sooner than another retry.

12.3. Source Data Server Crash During PROXY_ACTIVE

A source data server crash reduces the proxy server's read parallelism but does not block forward progress as long as the erasure code can still reconstruct the file from the surviving source data servers. If the source set degrades past reconstructibility, the operation transitions to whole-file repair semantics automatically: ranges that can still be reconstructed succeed; ranges that cannot terminate the operation with NFS4ERR_PAYLOAD_LOST.

12.4. Destination Data Server Crash During PROXY_ACTIVE

A destination data server crash is handled as a normal data server failure on the destination side. The proxy server, acting as a client to the destination data servers, reports LAYOUTERROR to the metadata server, which MAY substitute a spare or mark the destination FFV2_DS_FLAGS_REPAIR. The proxy server continues pushing to the remaining destinations, and clients are unaffected.

13. Metadata Server Crash Recovery

Clients and the proxy server detect metadata server session loss and enter RECLAIM per [RFC8881] S8.4 / S10.2.1.
The proxy server's recovery extends the standard NFSv4.1 client-recovery sequence with one proxy-specific step -- re-registration -- and one explicit safety rule: if the proxy server cannot reclaim a layout for a migration it had in flight, the proxy server drops that migration rather than continuing with stale state. 13.1. Proxy Server Recovery Sequence The proxy server, after detecting metadata server session loss, performs the following steps in order: 1. *EXCHANGE_ID + CREATE_SESSION* with the proxy server's prior client_owner4. Standard NFSv4.1 client recovery; the proxy server's clientid4 is restored when the metadata server recognizes the prior client_owner4. Haynes Expires 26 November 2026 [Page 45] Internet-Draft FFv2 Proxy Server May 2026 2. *PROXY_REGISTRATION* with the proxy server's prior prr_registration_id. Re-registration is idempotent: the proxy server sends the same fields it would for a first-time registration and the metadata server accepts them as re- establishing the proxy server role on this session. The metadata server does not assume that it retained any record of this proxy server being registered previously; PROXY_REGISTRATION is the proxy server's explicit assertion of proxy-server role for this session. Until this step completes, the metadata server treats the client as an ordinary NFSv4 client and MUST NOT deliver proxy assignments to it. 3. *PROXY_PROGRESS* to pull the assignment queue. If the metadata server has retained any in-flight migrations owned by this proxy server, it delivers them in the reply -- each as a fresh proxy_assignment4 carrying a fresh proxy_stateid (the prior proxy_stateid is from a prior boot and is permanently stale; see Section 6.1). For each re-delivered assignment, the proxy server attempts to match it to a retained sidecar entry by (pa_file_fh, pa_target_deviceid). Matched assignments proceed to step 4; unmatched assignments are handled as fresh assignments. 4. 
*Per-file layout reclaim*, for each matched assignment: OPEN_RECLAIM(CLAIM_PREVIOUS, pa_file_fh) per [RFC8881] S9.11.1, followed by LAYOUTGET(reclaim=true) per [RFC8881] S18.43.3 with the proxy server's retained layout stateid as the reclaim key. On success the metadata server returns a fresh layout stateid for the resumed migration and the proxy server continues from where it left off. On NFS4ERR_NO_GRACE, NFS4ERR_RECLAIM_BAD, or any other reclaim failure, the proxy server MUST drop the migration: it issues PROXY_CANCEL with the fresh proxy_stateid delivered in step 3 to signal the metadata server that the migration cannot be resumed, discards the retained sidecar entry, and halts any pending I/O for the file.

5.  *Sidecar entries the metadata server did not re-deliver.* For any migration the proxy server retained in its sidecar but the metadata server did not include in step 3, the proxy server MUST drop that migration: discard the sidecar entry and halt any pending I/O for the file. No signal to the metadata server is needed -- the metadata server already has no record of this migration.

6.  *RECLAIM_COMPLETE* when all per-file reclaims and drops are done.

Dropped migrations rejoin the metadata server's normal assignment path: a data server still DRAINING when recovery settles will, in due course, attract a fresh proxy_assignment4 -- to this proxy server or another -- under the (pa_file_fh, pa_target_deviceid) invariant of Section 8.

The sidecar match in step 3 is best-effort across a metadata server reboot. deviceid4 is server-scoped and MAY change across a restart per [RFC8881] S3.3.7; a target deviceid the metadata server reassigns on restart will not match the proxy server's retained sidecar entry, the proxy server will not recognize the re-delivery as the same migration, and the affected sidecar entries are dropped per step 5.
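
Steps 3 through 5 of the recovery sequence amount to partitioning two sets: the assignments the metadata server re-delivered and the sidecar entries the proxy server retained. The Python sketch below is illustrative only and is not part of the protocol or of any implementation. The names Assignment, SidecarEntry, RecoveryPlan, and plan_recovery are hypothetical, and filehandles, deviceids, and stateids are modeled as opaque bytes. It covers only the match on (pa_file_fh, pa_target_deviceid); the per-file reclaim itself, and the PROXY_CANCEL issued when a reclaim fails in step 4, are left out.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# Hypothetical match key: (pa_file_fh, pa_target_deviceid), both opaque bytes.
Key = Tuple[bytes, bytes]


@dataclass(frozen=True)
class Assignment:
    """A proxy_assignment4 re-delivered in the PROXY_PROGRESS reply (step 3)."""
    pa_file_fh: bytes
    pa_target_deviceid: bytes
    proxy_stateid: bytes  # fresh stateid; the pre-restart one is stale


@dataclass(frozen=True)
class SidecarEntry:
    """State the proxy server kept locally across the restart (assumed layout)."""
    pa_file_fh: bytes
    pa_target_deviceid: bytes
    layout_stateid: bytes  # retained reclaim key for LAYOUTGET(reclaim=true)


@dataclass
class RecoveryPlan:
    reclaim: List[Tuple[Assignment, SidecarEntry]]  # proceed to step 4
    fresh: List[Assignment]                         # handled as new work
    drop: List[SidecarEntry]                        # step 5: discard, halt I/O


def plan_recovery(redelivered: Iterable[Assignment],
                  sidecar: Iterable[SidecarEntry]) -> RecoveryPlan:
    """Partition re-delivered assignments and retained sidecar entries."""
    by_key: Dict[Key, SidecarEntry] = {
        (e.pa_file_fh, e.pa_target_deviceid): e for e in sidecar
    }
    plan = RecoveryPlan(reclaim=[], fresh=[], drop=[])
    for a in redelivered:
        entry = by_key.pop((a.pa_file_fh, a.pa_target_deviceid), None)
        if entry is None:
            plan.fresh.append(a)             # no retained state to resume
        else:
            plan.reclaim.append((a, entry))  # attempt per-file layout reclaim
    # Whatever the metadata server did not re-deliver is dropped silently.
    plan.drop.extend(by_key.values())
    return plan
```

Because the match is an exact comparison on the key, a target deviceid that the metadata server reassigned on restart lands the re-delivered assignment in fresh and the retained entry in drop, which is the outcome described for step 5.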
The autopilot re-drives the work as a fresh assignment. Implementations that preserve target deviceids across restart will resume mid-flight; those that do not will re-drive from scratch -- both are conformant.

13.1.1.  Retention Requirement

A proxy server MUST be able to supply each in-flight migration's layout stateid as the reclaim key after a proxy-server-process restart. The natural implementation retains it in proxy-server-local storage (e.g., a small sidecar file or DB table keyed by pa_file_fh) when the metadata server grants the L3 layout; how it is retained is an implementation matter. A proxy server that cannot supply the prior stateid cannot reclaim its layouts after a restart, and the affected migrations are dropped per step 5 above.

13.2.  Proxy Server Identity Continuity

A proxy server implementation SHOULD retain its client_owner4 across proxy-server-process restart so that post-restart EXCHANGE_ID recovers the same clientid and the in-flight migration records remain valid.

If the proxy server's client_owner4 rotates (e.g., because proxy server process state was lost), the new EXCHANGE_ID gets a fresh clientid4. The proxy server's prr_registration_id (if matching the prior registration) identifies it as the same operator-meaningful proxy server instance via the squat-guard, but the in-flight migration records keyed on the OLD clientid cannot be claimed by the NEW clientid. Those migrations are dropped per the safety rules in Section 13.1; the autopilot re-issues fresh assignments at its discretion.

13.3.
Lost Migration Records

The drop rules in Section 13.1 cover the lost-state case end-to-end: a migration the metadata server did not retain is dropped by the proxy server -- either because the metadata server does not re-deliver it in PROXY_PROGRESS, or because the per-file reclaim returns NFS4ERR_NO_GRACE / NFS4ERR_RECLAIM_BAD -- and the autopilot is free to re-drive the work as a fresh assignment. No proxy-specific signalling beyond standard NFSv4 reclaim errors and PROXY_CANCEL is needed.

14.  Security Considerations

The security surface added by this document sits in two places: the session the proxy server establishes with the metadata server, and the data path clients take through the proxy server during a proxy operation. The session is narrower than the data path -- only the metadata server talks to the proxy server over it, and the metadata server has long been a trusted coordinator in the pNFS model -- but it carries operations that affect every client whose layouts reach a proxy server. The data path is broader, because it exposes the proxy server to every client whose layout names it; a compromised proxy server on that path has the same observational and modification reach as a compromised data server, and in the translating-proxy-server case a larger reach because of the elevated identity the proxy server typically runs with.

Each threat the design addresses or explicitly leaves out of scope is named below. Credential forwarding, the most consequential and the most easily implemented incorrectly, is expanded in Section 14.1.

proxy server authority:  A proxy server in PROXY_ACTIVE sees all client I/O for the proxied file. A compromised proxy server can observe or modify file data. Deployments MUST treat proxy-server-capable hosts as at least as trusted as the data servers they proxy for. PROXY_REGISTRATION SHOULD be gated by deployment-level authorization; arbitrary hosts that present the op without prior provisioning SHOULD be rejected.
Transport security across the operation:  The proxy server's connections to source and destination data servers are independent of the client's connection to the proxy server. A proxy server MAY read from an AUTH_SYS source and write to a TLS destination (or any other combination). The proxy server is responsible for enforcing the effective security policy (e.g., do not downgrade encrypted data to a plaintext data server).

Principal binding during a proxy operation:  For proxy-server-to-data-server traffic (the proxy server reading source data servers and writing destination data servers to carry out a MOVE or REPAIR assignment), the proxy server presents a principal to those data servers that they will accept; this is the proxy server's own service identity unless constrained delegation or equivalent is arranged. Forwarding the client's identity to the peer data servers for proxy-server-driven data movement is NOT required and is typically NOT practical (the client is not in the conversation at that point). See Section 14.1 for the case of client-initiated file I/O through a translating proxy server, where the credential-forwarding rule is different and stricter.

proxy server impersonation:  A malicious metadata server could register a hostile entity as a proxy server. The existing metadata server trust model already grants the metadata server this capability via CB_LAYOUTRECALL and the ability to issue any layout it chooses; PROXY_REGISTRATION does not weaken it. Clients that require stronger proxy server identity verification SHOULD apply deployment-level authorization to the proxy server's transport-security credentials.
Registration lease expiry:  If a proxy server's lease expires mid-operation, the metadata server MUST abandon the operation: discard the in-flight migration record, revert the affected layouts to the pre-operation state, and arrange cleanup of any half-populated destination data servers. The metadata server MUST NOT continue to route client I/O to a proxy server whose registration has lapsed.

14.1.  Credential Forwarding and the Privilege Boundary

A translating proxy server (see Section 4.7) has structurally elevated privilege by design. To perform its management tasks -- moves, repairs, evacuations, cross-tenant re-exports -- the deployment grants the proxy server's service identity broad access: typically not-root-squashed, often read/write to every file in the namespace, and session authority to every data server. That privilege is intentional.

A codec-ignorant client that reaches the proxy server, however, arrives with its own RPC credentials that the proxy server does not itself need in order to function. An NFSv3 client's uid/gid, an AUTH_SYS-squashed identity, an RPCSEC_GSS principal -- none of these are the proxy server's own. If the proxy server ignores the client's credentials and issues metadata server or data server operations under its own service identity when translating client I/O, every client that reaches the proxy server silently inherits the proxy server's privilege. This is a protocol-level privilege-escalation vector, and this document calls it out rather than hiding it.

The normative requirements below apply whenever a proxy server is translating client-initiated file I/O (as distinct from proxy-server-driven move / repair work, which runs under the proxy server's own authority on directives from the metadata server).
They form a cohesive set: credential pass-through is the core requirement; no-squash-inversion closes the most common way pass-through can be implemented incorrectly; authorization-remains-with-metadata-server names the responsibility on the metadata server side of the same contract; service-identity-is-for-the-control-plane draws the line between the op paths where the proxy server uses its own credentials and the op paths where it does not; and the failure-mode rule specifies the correct refusal behaviour rather than letting a silent fall-through become the escape hatch.

Credential pass-through:  The proxy server MUST present the client's credentials (RPC auth flavor and principal) on every metadata server or data server operation it issues as a consequence of a client-initiated request. Specifically, a client READ that the proxy server expands into LAYOUTGET + CHUNK_READ MUST carry the client's credentials on both the LAYOUTGET against the metadata server and the CHUNK_READ against the data servers. The proxy server MUST NOT substitute its own service identity for client-initiated operations.

No squash inversion:  If the client arrives with a root-squashed identity (for example, uid 0 mapped to nobody by the NFSv3 export configuration on the client-facing side of the proxy server), the proxy server MUST preserve the squashed identity when forwarding. The proxy server MUST NOT translate a client's squashed credentials back into unsquashed root, even though the proxy server's own identity is typically unsquashed.

Authorization remains with the metadata server:  When a client-initiated operation reaches the metadata server over a proxy server <-> metadata server session, the metadata server MUST use the RPC credentials carried on that compound for authorization and MUST NOT substitute the proxy server's session-level identity.
Equivalently: the metadata server performs access-control checks against the forwarded client credentials, not against the proxy server's service identity, for any client-initiated file operation. The proxy server is a translator, not an authority. This is what prevents proxy server deployment from becoming a blanket ACL override.

proxy server service identity is for the control plane only:  The proxy server MAY, and typically MUST, use its own service identity for:

*  The metadata server <-> proxy server session (the session the proxy server opens to the metadata server, on which PROXY_REGISTRATION, PROXY_PROGRESS, PROXY_DONE, and PROXY_CANCEL all flow on the fore-channel; the session's back-channel is not used by this draft).

*  Peer-data-server session setup for proxy-server-driven data movement (reading source data servers, writing destination data servers under a MOVE assignment the metadata server has delivered via PROXY_PROGRESS).

*  proxy server housekeeping.

The proxy server's service identity MUST NOT be used for client-initiated file data operations.

Failure mode on missing credentials:  If the proxy server cannot forward a client's credentials for some reason (e.g., the client presented AUTH_NONE, or the client-facing side used a security flavor the proxy server cannot propagate), the proxy server MUST reject the client operation with the equivalent of NFS4ERR_ACCESS (or NFS3ERR_ACCES for NFSv3 clients). The proxy server MUST NOT fall back to serving the operation under its own identity.

Deployment-level requirements:

*  PROXY_REGISTRATION MUST be deployment-authorized. An unknown host presenting PROXY_REGISTRATION MUST be rejected. This is the only wire-level defense against a hostile entity registering as a proxy server and then receiving client-forwarded credentials.
*  The metadata server <-> proxy server session MUST use RPCSEC_GSS [RFC7861] or RPC-over-TLS [RFC9289] with mutual authentication. AUTH_SYS on the metadata server <-> proxy server session is forbidden.

*  Deployments SHOULD audit both the proxy server's credential-forwarding behavior (the proxy server logs what it forwards) and the metadata server's authorization checks (the metadata server logs what principal authorized each operation). Divergence between the two indicates a credential-forwarding bug or compromise.

What the protocol cannot defend against:

*  A compromised proxy server has direct access to whatever credentials pass through it. Credential confidentiality collapses the moment the proxy server is under adversary control. Mitigation is operational: restrict which hosts can register as a proxy server, audit PROXY_REGISTRATION events, rotate deployment-level keys.

*  A deployment that configures a proxy server to run as root while the client is root-squashed has already violated rule 2 above; no wire mechanism detects a proxy server deliberately mis-implementing credential forwarding. Deployments SHOULD verify their proxy server implementation's credential-forwarding behavior through conformance testing before production use.

Future work (noted as an Open Question below):  RPCSEC_GSSv3 structured privilege assertion per [RFC7861] Section 2.5.2 is the natural strong-authentication mechanism for proxy-server-forwarded credentials. This revision does not require GSSv3 because the broader NFSv4 deployment base does not yet support it; deployments that can use GSSv3 SHOULD prefer it over AUTH_SYS passthrough for the credential-forwarding channel.

14.2.
Namespace Traversal Privilege

A proxy server that translates client I/O has to know how the metadata server's namespace is shaped: which paths are exported, what filehandle each path resolves to, how the exports mount within one another. The proxy server acquires this information by traversing the metadata server's namespace -- LOOKUP, LOOKUPP, PUTFH, PUTROOTFH, GETFH on the proxy server <-> metadata server session.

This traversal cannot always run under forwarded client credentials: at the point the proxy server needs to discover a new export (a client has not yet asked for it, or the proxy server has just restarted and has no FH cache) there is no client whose credentials the proxy server could forward.

Deployments have two choices for how the proxy server acquires namespace shape:

Grant a narrow traversal privilege:  The metadata server MAY treat a registered proxy server's service identity as authorized for LOOKUP, LOOKUPP, PUTFH, PUTROOTFH, GETFH, and SEQUENCE on the proxy server <-> metadata server session without applying the metadata server's export-rule filtering that would normally gate those names. This is strictly a structural privilege: it permits the proxy server to see that paths exist and to obtain their filehandles, but grants no data access. All operations that carry or require data authorization (OPEN, READ, WRITE, LAYOUTGET, GETATTR of privileged attributes, etc.) MUST still run under the rules of Section 14.1: forwarded client credentials for client-initiated operations, and proxy server service identity only for control-plane operations.

A deployment that grants this privilege discloses the metadata server's namespace shape to the proxy server's service identity -- specifically, names that the proxy server's source address would not be able to see through the metadata server's normal export filtering.
Deployments SHOULD audit traversal compounds on registered-proxy-server sessions so the disclosure is reviewable; the metadata server SHOULD log each LOOKUP / GETFH that benefits from the bypass.

Do not grant the privilege:  The proxy server is required to translate every client-originated LOOKUP into a separate LOOKUP against the metadata server under the forwarding client's credentials, caching only what the client's credentials authorized the metadata server to return. This eliminates the namespace-shape disclosure but costs an extra metadata server round-trip per client LOOKUP-miss and leaves the proxy server unable to pre-discover exports.

This document does not normatively prefer one approach over the other. Implementations SHOULD document which they use; deployment guidance for the common combined data server+proxy server case is that granting the privilege is the expected choice, acceptable given the proxy server is already a trusted control-plane peer of the metadata server.

The traversal privilege is distinct from and narrower than root_squash bypass. A forwarded-uid-0 client operation (OPEN, READ, etc.), even when the privilege is granted, is still subject to normal root_squash handling on the proxy server's source-address rule at the metadata server; the privilege applies only to the six ops enumerated above.

14.3.  Proxy-server-side Policy Enforcement (informative)

A proxy server implementation MAY perform per-client export-rule enforcement locally, rejecting operations the metadata server would also reject before forwarding them. This is a performance optimization: it keeps bad requests off the proxy server <-> metadata server wire and lets the proxy server return NFS4ERR_WRONGSEC / NFS4ERR_ACCESS without paying a round-trip.

Local enforcement is not a security boundary.
Rule 3 of Section 14.1 names the metadata server as the authority for every client-initiated file operation. A proxy server that performs local enforcement is checking its own cached copy of the metadata server's per-client rules; if the copy is stale, wrong, or absent, the proxy server MUST forward the operation and let the metadata server decide. A proxy server implementation that declines to perform local enforcement is conformant with this specification.

Deployments that want local enforcement need a mechanism for the proxy server to acquire the metadata server's per-export client-rule list. This document does not standardise such a mechanism; implementation-specific options include a control-plane probe-protocol extension, out-of-band admin distribution, or a future revision of this specification. Any such mechanism MUST limit distribution to proxy servers that the deployment has authorized (the rules are sensitive deployment policy) and MUST support a refresh path so proxy servers see rule changes within a bounded time.

15.  Implementations

This section is to be removed before publishing as an RFC.

This section records the status of known implementations of the protocol defined by this specification at the time of posting of this Internet-Draft, and is based on a proposal described in [RFC7942]. The description of implementations in this section is intended to assist the IETF in its decision processes in progressing drafts to RFCs. Please note that the listing of any individual implementation here does not imply endorsement by the IETF. Furthermore, no effort has been spent to verify the information presented here that was supplied by IETF contributors. This is not intended as, and must not be construed to be, a catalog of available implementations or their features. Readers are advised to note that other implementations may exist.

15.1.
reffs

reffs is an open-source NFSv4.2 server, metadata server, and erasure-coding client. The reffs source ships a metadata server, a proxy server, and a multi-codec client harness used as the working implementation for this draft. reffs is licensed AGPL-3.0-or-later.

The proxy server surface implemented in reffs covers, at the time of writing:

*  The proxy listener model (one process serving its native NFS port and a per-[[proxy_mds]] proxy server port from independent SB namespaces, see Section 5.3).

*  PROXY_REGISTRATION over RPCSEC_GSS-class auth, presently exercised via mutually-authenticated RPC-over-TLS ([RFC9289]) with a client-cert SHA-256 fingerprint allowlist. AUTH_SYS over plain TCP is rejected with NFS4ERR_PERM per Section 14.

*  PROXY_PROGRESS lease renewal and the empty-assignment idle path used by every steady-state proxy server poll.

*  Forwarding of LOOKUP, OPEN, READ, WRITE, GETATTR, CLOSE, LAYOUTGET, GETDEVICEINFO, LAYOUTRETURN, LAYOUTERROR through the proxy server to the upstream metadata server using the end-client's credentials.

Fore-channel ops not yet exercised end-to-end in the public implementation include PROXY_DONE / PROXY_CANCEL, which are issued only after a PROXY_PROGRESS reply that delivers a proxy_assignment4. The metadata-server-driven assignment model (move, repair) is wire-implemented but the only assignment kind exercised by the published demo is the implicit no-assignment heartbeat that every PROXY_PROGRESS produces.

15.2.  Demonstration

A reproducible demonstration of cross-proxy server proxying, exercising the layout-passthrough data path through proxy server A and proxy server B against a shared metadata server + 6 data servers, lives in the reffs source under deploy/sanity/. The demo does not exercise migration, repair, or any proxy_assignment4; its purpose is to show that a client's codec-encoded write through one proxy server is recoverable byte-for-byte through a peer proxy server that shares the same metadata server.
The matrix:

+===============+========+==========================+========+
| Path          | Layout | Codec                    | Result |
+===============+========+==========================+========+
| /ffv1-csm     | FF v1  | plain mirror             | PASS   |
+---------------+--------+--------------------------+--------+
| /ffv1-stripes | FF v1  | stripe k=6, m=0          | PASS   |
+---------------+--------+--------------------------+--------+
| /ffv2-csm     | FF v2  | plain mirror, CHUNK      | PASS   |
+---------------+--------+--------------------------+--------+
| /ffv2-rs      | FF v2  | RS(4,2), CHUNK           | PASS   |
+---------------+--------+--------------------------+--------+
| /ffv2-mj      | FF v2  | Mojette systematic (4,2) | PASS   |
+---------------+--------+--------------------------+--------+

                           Table 2

For each row the client opens /codec_