N/A | T.E.K. Keiser |
Internet-Draft | S.J. Jenkins |
Intended status: Informational | A.P.D. Deason, Ed. |
Expires: March 14, 2013 | Sine Nomine |
September 12, 2012 |
AFSVol Tag-Length-Value Remote Procedure Call Extensions
draft-tkeiser-afs3-volser-tlv-04
AFS-3 is a distributed file system based upon prototypes developed at Carnegie Mellon University during the 1980s. AFS-3 heavily leverages Remote Procedure Calls (RPCs) as the foundation for its distributed architecture. This memo extends the volume management interface to support getting and setting of AFS volume attributes via an extensible Tag-Length-Value (TLV) encoding, which is based upon AFS-3 extensible discriminated unions. TLV-based get and set RPCs are specified, along with a tag enumeration RPC.
In addition, tags are allocated for existing volume and transaction metadata, and implementation-private tags are allocated for metadata related to the OpenAFS Demand Attach File Server, and the RxOSD protocol suite.
Comments regarding this draft are solicited. Please include the AFS-3 protocol standardization mailing list (afs3-standardization@openafs.org) as a recipient of any comments.
This document is in state "draft", as per the document state definitions set forth in [I-D.wilkinson-afs3-standardisation].
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on March 14, 2013.
Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.
This document may not be modified, and derivative works of it may not be created, and it may not be published except as an Internet-Draft.
AFS-3 [CMU-ITC-88-062] [CMU-ITC-87-068] is a distributed file system that has its origins in the VICE project [CMU-ITC-84-020] [CMU-ITC-85-039] at the Carnegie Mellon University Information Technology Center [CMU-ITC-83-025], a joint venture between CMU and IBM. VICE later became AFS when CMU moved development to a new commercial venture called Transarc Corporation, which later became IBM Pittsburgh Labs. AFS-3 is a suite of unstandardized network protocols based on a remote procedure call (RPC) suite known as Rx. While no de jure standards for AFS-3 exist, the various AFS-3 implementations have agreed upon certain de facto standards, largely helped by the existence of an open source fork called OpenAFS that has served the role of reference implementation. In addition to using OpenAFS as a reference, IBM wrote and donated developer documentation that contains somewhat outdated specifications for the Rx protocol and all AFS-3 remote procedure calls, as well as a detailed description of the AFS-3 system architecture.
The AFS-3 architecture consists of many administrative domains called "cells" [CMU-ITC-88-070] which are glued together to form a globally distributed file system. Each cell consists of: client nodes, which run cache manager daemons; file servers, which run file server daemons and volume server daemons; and database server nodes, which can run volume location database servers, protection database servers, backup database servers, or several other obscure and/or deprecated database services.
This memo focuses on the volume server [AFS3-VVL] component of AFS-3. The volume server provides an RPC interface for managing AFS volumes. Volumes are the unit of storage administration in AFS-3. Each volume contains a subtree of the file system, along with special directory entries called mount points, which are used to link volumes together into a (potentially cyclic) directed graph. Mount points can cross cell boundaries, thus permitting construction of a cross-organizational, globally distributed, location-transparent file system. The file system is location-transparent because mount points contain volume names and cell names (which are resolved to locations by contacting the appropriate cell's volume location database), rather than encoding the data's physical location directly in the pointer. This memo extends the AFS-3 volume server RPC interface with a suite of new RPCs that provide extensible volume metadata get and set operations.
The current AFSVol volume metadata introspection routines use hard-coded XDR [RFC4506] structure definitions. This significantly limits protocol extensibility because new remote procedure calls and structure definitions must be defined during each protocol revision. To some degree, this has been due to the lack of protocol standards documents: certain sites co-opted unused protocol fields for private uses, thus precluding the standards process from reclaiming these fields (without breaking existing deployments). Hence, each time new functionality is required, a new RPC (and, typically, new XDR data structures) needs to be defined. This is a rather expensive process, both in terms of standardization and implementation. Frequently, this leads to a desire to postpone protocol feature enhancements until many changes can be aggregated into a monolithic protocol upgrade.
This memo introduces a new tag-length-value (TLV) encoding mechanism based upon the AFS-3 extensible discriminated union primitive type [I-D.keiser-afs3-xdr-union]. This TLV encoding is utilized for getting and setting AFS-3 volume metadata. The key advantage of this design is that new TLV tuples can be allocated without defining a new RPC. Furthermore, because ext-union includes a length field in the encoding, it is always possible for a peer to decode the remainder of the XDR stream--even when a tag (and, thus, its XDR encoding) is unrecognized. Hence, decode error handling can happen at the application layer, instead of deep within the Rx protocol stack.
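The skip-over property described above can be illustrated with a minimal Python sketch. The wire layout assumed here (a 4-byte big-endian discriminator, a 4-byte big-endian payload length, then the payload padded to a 4-byte boundary) is a simplification chosen for illustration; the normative ext-union encoding is the one defined in [I-D.keiser-afs3-xdr-union]. The function names and the decoder table are hypothetical.

```python
import struct


def encode_element(disc, payload):
    """Build one (discriminator, length, payload) element in the assumed layout."""
    pad = (-len(payload)) % 4
    return struct.pack(">II", disc, len(payload)) + payload + b"\x00" * pad


def decode_elements(buf, decoders):
    """Walk a buffer of elements, stepping over unrecognized discriminators.

    Assumed layout (illustration only): 4-byte big-endian discriminator,
    4-byte big-endian payload length, payload padded to a 4-byte boundary.
    """
    out = []
    pos = 0
    while pos < len(buf):
        disc, length = struct.unpack_from(">II", buf, pos)
        pos += 8
        payload = buf[pos:pos + length]
        if len(payload) != length:
            raise ValueError("truncated payload")
        pos += (length + 3) & ~3  # advance past payload and padding
        decoder = decoders.get(disc)
        if decoder is None:
            # Unknown discriminator: the length field still lets the
            # decoder reach the next element, so report it and move on.
            out.append((disc, None))
        else:
            out.append((disc, decoder(payload)))
    return out


# 3 is AFSVOL_TLV_TYPE_UINT64 in this memo; 999 is an arbitrary unknown code.
DECODERS = {3: lambda p: struct.unpack(">Q", p)[0]}
stream = (
    encode_element(3, struct.pack(">Q", 42))
    + encode_element(999, b"abcde")
    + encode_element(3, struct.pack(">Q", 7))
)
result = decode_elements(stream, DECODERS)
```

In this sketch the unknown element is surfaced to the caller as `(999, None)` rather than aborting the decode, which mirrors the idea that error handling can be left to the application layer.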
As the TLV changes require the addition of several new RPC interfaces (that are intended to eventually supplant extant interfaces), it is logical to add AFSVol capabilities [I-D.keiser-afs3-capabilities] for these new interfaces.
This memo aims to standardize a new TLV encoding mechanism for volume metadata. In addition, this memo will standardize the TLV encoding of volume metadata which is currently available via several AFSVol XDR structures, as well as specify the encoding of several new pieces of AFS-3 volume metadata that are not currently available via the AFSVol interface. For example, metadata specific to the OpenAFS Demand Attach File Server [DAFS] will be made available via the AFSVol service, whereas in the past it was only available locally on the file server machine via a proprietary interprocess communication mechanism.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].
When this capability bit is asserted, the file server is advertising that it supports Demand Attach File Server version 1 protocol semantics. Specifically, DAFS v1 semantics imply that the following invariants MAY be violated by the fileserver:
When this capability bit is asserted, the volserver is advertising that it supports Demand Attach File Server version 1 protocol semantics. Specifically, DAFS v1 semantics imply that the following invariants MAY be violated by the volserver:
In addition, the combination of AFSVOL_CAPABILITY_DAFS and AFSVOL_CAPABILITY_TLV MAY imply that the tag AFSVOL_TLV_TAG_VOL_STATE_DAFS_RAW exists. However, this implication SHOULD NOT be relied upon, as DAFS may evolve to the point where AFSVOL_TLV_TAG_VOL_STATE_DAFS_RAW has to be deprecated. When both of these capabilities are asserted, the client SHOULD still gracefully handle the VOLSER_TAG_UNSUPPORTED error for AFSVOL_TLV_TAG_VOL_STATE_DAFS_RAW.
Assertion of this capability bit indicates the ability to service the RPC calls described in Section 4.
A new suite of RPCs will be standardized to get/set tag-length-value tuples, and to enumerate supported tags. The tag namespace will be controlled by the AFS Assigned Numbers Registrar as an assigned numbers namespace.
The TLV data will be encoded using the AFS-3 extensible discriminated union [I-D.keiser-afs3-xdr-union]. The RPC-L specification for the base TLV types is as follows (NB: this is pseudocode; a less-concise, but machine-parseable version of this specification is included in Appendix A).
   /* registrar-controlled tag namespace */
   enum AFSVol_TLV_tag {
       ...
   };

   const AFSVOL_TLV_TAG_MAX    = 1024;    /* upper-bound on number of
                                           * TLV tuples per RPC */
   const AFSVOL_TLV_OPAQUE_MAX = 262144;  /* upper-bound on size of
                                           * value payload */
   const AFSVOL_TLV_TIME_MAX   = 21845;   /* upper-bound on length of
                                           * AFSTime vector payload */
   const AFSVOL_TLV_UINT64_MAX = 32768;   /* upper-bound on length of
                                           * uint64 vector payload */

   enum AFSVol_TLV_type {
       AFSVOL_TLV_TYPE_NULL         = 0,
       AFSVOL_TLV_TYPE_TRUE         = 1,
       AFSVOL_TLV_TYPE_FALSE        = 2,
       AFSVOL_TLV_TYPE_UINT64       = 3,
       AFSVOL_TLV_TYPE_UINT64_VEC   = 4,
       AFSVOL_TLV_TYPE_INT64        = 5,
       AFSVOL_TLV_TYPE_INT64_VEC    = 6,
       AFSVOL_TLV_TYPE_UUID         = 7,
       AFSVOL_TLV_TYPE_STRING       = 8,
       AFSVOL_TLV_TYPE_TIME_ABS     = 9,
       AFSVOL_TLV_TYPE_TIME_ABS_VEC = 10,
       AFSVOL_TLV_TYPE_TIME_REL     = 11,
       AFSVOL_TLV_TYPE_TIME_REL_VEC = 12,
       AFSVOL_TLV_TYPE_VOL_ID       = 13,
       AFSVOL_TLV_TYPE_VOL_ID_VEC   = 14,
       AFSVOL_TLV_TYPE_PART_ID      = 15,
       AFSVOL_TLV_TYPE_PART_ID_VEC  = 16,
       AFSVOL_TLV_TYPE_DISK_BLOCKS  = 17,
       AFSVOL_TLV_TYPE_STAT_COUNTER = 18,
       AFSVOL_TLV_TYPE_STAT_GAUGE   = 19,
       AFSVOL_TLV_TYPE_BIT64        = 20,
       AFSVOL_TLV_TYPE_VOL_DOW_USE  = 21,
       AFSVOL_TLV_TYPE_OPAQUE       = 22
   };

   ext-union AFSVol_TLV_value switch(AFSVol_TLV_type type) {
   case AFSVOL_TLV_TYPE_NULL:
   case AFSVOL_TLV_TYPE_TRUE:
   case AFSVOL_TLV_TYPE_FALSE:
       void;

   case AFSVOL_TLV_TYPE_UINT64:
   case AFSVOL_TLV_TYPE_VOL_ID:
   case AFSVOL_TLV_TYPE_PART_ID:
   case AFSVOL_TLV_TYPE_DISK_BLOCKS:
   case AFSVOL_TLV_TYPE_STAT_COUNTER:
   case AFSVOL_TLV_TYPE_BIT64:
       afs_uint64 u_u64;

   case AFSVOL_TLV_TYPE_INT64:
   case AFSVOL_TLV_TYPE_STAT_GAUGE:
       afs_int64 u_s64;

   case AFSVOL_TLV_TYPE_UINT64_VEC:
   case AFSVOL_TLV_TYPE_VOL_ID_VEC:
   case AFSVOL_TLV_TYPE_PART_ID_VEC:
       afs_uint64 u_u64_vec<AFSVOL_TLV_UINT64_MAX>;

   case AFSVOL_TLV_TYPE_INT64_VEC:
       afs_int64 u_s64_vec<AFSVOL_TLV_UINT64_MAX>;

   case AFSVOL_TLV_TYPE_TIME_ABS:
       AFSTime u_time_abs;

   case AFSVOL_TLV_TYPE_TIME_REL:
       AFSRelTimestamp u_time_rel;

   case AFSVOL_TLV_TYPE_TIME_ABS_VEC:
       AFSTime u_time_abs_vec<AFSVOL_TLV_TIME_MAX>;

   case AFSVOL_TLV_TYPE_TIME_REL_VEC:
       AFSRelTimestamp u_time_rel_vec<AFSVOL_TLV_UINT64_MAX>;

   case AFSVOL_TLV_TYPE_UUID:
       afsUUID u_uuid;

   case AFSVOL_TLV_TYPE_STRING:
       string u_string<AFSVOL_TLV_OPAQUE_MAX>;

   case AFSVOL_TLV_TYPE_VOL_DOW_USE:
       /* type defined later in this memo */
       AFSVol_stat_use_per_dow u_vol_dow_use;

   case AFSVOL_TLV_TYPE_OPAQUE:
       opaque u_opaque<AFSVOL_TLV_OPAQUE_MAX>;
   };

   const AFSVOL_TLV_FLAG_UNSUPPORTED        = 0x1;
   const AFSVOL_TLV_FLAG_READ_ERROR         = 0x2;
   const AFSVOL_TLV_FLAG_CRITICAL           = 0x4;
   const AFSVOL_TLV_FLAG_QUALIFIER_NO_MATCH = 0x8;
   const AFSVOL_TLV_FLAG_MORE               = 0x10;
   const AFSVOL_TLV_FLAG_OBJ_NOT_SUPP       = 0x20;

   struct AFSVol_TLV {
       afs_uint32 tlv_tag;
       afs_uint32 tlv_flags;
       AFSVol_TLV_value tlv_value;
   };
TLV XDR pseudocode
Figure 1
The core of the TLV definition above is the AFS-3 extensible discriminated union primitive type. The following discriminators are initially defined in this memo:
The AFSVol_TLV structure contains a 32-bit flags field for communication of various ancillary boolean values. This memo defines and allocates the following flag bits:
When possible, future protocol augmentations requiring the definition of new data types should request allocation of a new standards-track payload type code. Allocation of a type code should coincide with standardization of the payload encoding associated with the type code allocation. However, in limited circumstances where:
use of the type code AFSVOL_TLV_TYPE_OPAQUE may be an acceptable alternative.
In some cases the value associated with a tag will be large, structured data. A qualifier is a tag-specific parameter which allows a caller to address a subset of the value stored in a tag. For TLV get interfaces, specifying a qualifier can reduce the amount of data sent over the wire. For TLV set interfaces, specifying a qualifier permits a client to modify a subset of a structured value without endangering cache coherence. Qualifiers are marshalled over the wire as type AFSVol_TLV_value. Unless otherwise noted, it should be assumed that a tag only supports the null qualifier (ext-union discriminator set to AFSVOL_TLV_TYPE_NULL). The null qualifier always references the entire value for a given tag.
A collection of new wire error codes is required to support these interfaces. The following new error codes are defined:
In order for clients to determine which tags are supported by a given server, an RPC is provided for obtaining the list of tags. In addition to returning the tags, this RPC also returns a tag namespace version (TSV) ordinal that can be used by clients to determine whether a consistent set of tags was fetched from the server. Additionally, this version ordinal can be compared against the value returned by all other TLV RPC calls to determine whether the tag cache remains coherent. The Rx procedure specification for the tag enumeration RPC is as follows:
   typedef afs_uint64 AFSVol_TLV_TSV;
   typedef AFSVol_TLV_tag AFSVol_TLV_tag_vec<AFSVOL_TLV_TAG_MAX>;

   proc GetVolumeTLVTags(
       IN AFSVol_TLV_tag offset,
       OUT AFSVol_TLV_tag_vec * tags,
       OUT AFSVol_TLV_TSV * tsv
   ) = XXX;
Figure 2
A compliant implementation MUST implement this RPC. The call parameters are defined as follows:
Because the AFSVol interface is stateless, cache coherence cannot be maintained via the normal AFS mechanism. Thus, AFSVol clients MUST treat enumerated tags as ephemeral with a TTL of two hours.
The Tag nameSpace Version (TSV) ordinal is used to communicate the current version of the tag namespace to the caller. Version zero is a reserved value, so the server must initialize this version ordinal to a value greater than 1. Servers MUST ensure that version ordinals are unique within any 2 hour TTL period.
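The client-side rules above (a two-hour TTL on enumerated tags, plus comparison of the TSV returned by every other TLV RPC) can be sketched as a small cache. This is an illustration in Python under stated assumptions, not part of the protocol: the class name, method names, and the injectable clock are hypothetical, and the sketch simply discards the cached tag list whenever a different TSV is observed.

```python
import time

TAG_TTL_SECONDS = 2 * 60 * 60  # enumerated tags are ephemeral: two-hour TTL


class TagCache:
    """Hypothetical client-side cache of the tags enumerated from one server."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._tags = None
        self._tsv = 0  # zero is reserved, so it doubles as "nothing cached"
        self._fetched_at = 0.0

    def store(self, tags, tsv):
        """Record the result of a tag enumeration call."""
        if tsv == 0:
            raise ValueError("TSV zero is reserved")
        self._tags = frozenset(tags)
        self._tsv = tsv
        self._fetched_at = self._clock()

    def is_fresh(self):
        """True while a tag list is cached and younger than the TTL."""
        if self._tags is None:
            return False
        return (self._clock() - self._fetched_at) < TAG_TTL_SECONDS

    def observe_tsv(self, tsv):
        """Check the TSV returned by another TLV RPC; drop the cache on mismatch."""
        if self._tags is not None and tsv != self._tsv:
            self._tags = None
            self._tsv = 0
        return self._tags is not None

    def supports(self, tag):
        return self.is_fresh() and tag in self._tags
```

A caller would re-run the tag enumeration RPC whenever `is_fresh()` or `observe_tsv()` returns false.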
The Rx procedure specification for the TLV get interface will be as follows:
   struct AFSVol_TLV_query {
       AFSVol_TLV_tag tq_tag;
       AFSVol_TLV_value tq_qualifier;
   };

   typedef AFSVol_TLV_query AFSVol_TLV_query_vec<AFSVOL_TLV_TAG_MAX>;
   typedef AFSVol_TLV AFSVol_TLV_vec<AFSVOL_TLV_TAG_MAX>;

   proc GetOneVolumeTLV(
       IN afs_uint64 partId,
       IN afs_uint64 volId,
       IN AFSVol_TLV_query_vec * queries,
       OUT AFSVol_TLV_vec * tuples,
       OUT AFSVol_TLV_TSV * tsv
   ) = XXX;
Figure 3
A compliant implementation MUST implement this RPC. The call parameters are defined as follows:
This call is similar to the call described in the previous section, with the exception that TLV tuples will be returned for multiple volumes at once using an Rx split call interface. The Rx procedure specification is as follows:
   const AFSVOL_BULK_GETVOLUME_MAX = 1024;

   typedef afs_uint64 AFSVol_TLV_part_id_vec<AFSVOL_BULK_GETVOLUME_MAX>;
   typedef afs_uint64 AFSVol_TLV_vol_id_vec<AFSVOL_BULK_GETVOLUME_MAX>;

   struct AFSVol_TLV_vol_list {
       afs_uint64 partId;
       AFSVol_TLV_vol_id_vec * volIds;
   };

   typedef struct AFSVol_TLV_vol_list
       AFSVol_TLV_get_filter<AFSVOL_BULK_GETVOLUME_MAX>;

   proc GetVolumesTLV(
       IN AFSVol_TLV_get_filter * filter,
       IN AFSVol_TLV_query_vec * queries,
       OUT AFSVol_TLV_TSV * tsv
   ) split = XXX;
Figure 4
Implementation of this RPC is OPTIONAL. The call parameters are defined as follows:
The contents of the split call stream shall be an xdrrec stream containing a finite sequence of XDR-encoded AFSVol_TLV structures, each of which shall be marked as a separate record (typically by calling xdrrec_endofrecord). End of sequence will be annotated by a dummy tuple containing the special tag type AFSVOL_TLV_TAG_EOS.
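The end-of-sequence convention can be sketched from the reader's side as follows. This Python fragment assumes the records have already been decoded into (tag, flags, value) triples by some lower layer; the numeric value given to AFSVOL_TLV_TAG_EOS is a placeholder for illustration only, and the function name is hypothetical.

```python
# Placeholder value for illustration; the real tag value is a registry allocation.
AFSVOL_TLV_TAG_EOS = 0xFFFFFFFF


def tuples_until_eos(records):
    """Yield decoded (tag, flags, value) triples up to the EOS marker.

    Raises EOFError if the record stream ends before the dummy EOS
    tuple is seen, since the sequence is then incomplete.
    """
    for tag, flags, value in records:
        if tag == AFSVOL_TLV_TAG_EOS:
            return
        yield tag, flags, value
    raise EOFError("split call stream ended without AFSVOL_TLV_TAG_EOS")
```

Treating a missing marker as an error lets the caller distinguish a complete result set from a stream that was cut short.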
The Rx procedure specification for the TLV set interface will be as follows:
   struct AFSVol_TLV_store {
       AFSVol_TLV ts_tuple;
       AFSVol_TLV_value ts_qualifier;
   };

   typedef AFSVol_TLV_store AFSVol_TLV_store_vec<AFSVOL_TLV_TAG_MAX>;
   typedef afs_int32 AFSVol_TLV_result_vec<AFSVOL_TLV_TAG_MAX>;

   proc SetVolumeTLV(
       IN afs_int32 trans,
       IN AFSVol_TLV_TSV assert_tsv,
       IN AFSVol_TLV_store_vec * tuples,
       OUT AFSVol_TLV_result_vec * results,
       OUT AFSVol_TLV_TSV * server_tsv
   ) = XXX;
Figure 5
Implementation of this RPC is OPTIONAL. The call parameters are defined as follows:
The SetVolumeTLV call begins by scanning all elements within the tuples array. If any elements have the AFSVOL_TLV_FLAG_CRITICAL bit asserted in tuples[i].ts_tuple.tlv_flags, then preprocessing of the tuple must occur. For each tuple with the critical bit set, several preprocessing validation steps will be taken.
The tag stored in tuples[i].ts_tuple.tlv_tag is checked to ensure that the server supports it. In the event that the tag is not supported, then the corresponding array index in the results array will be set to VOLSER_TAG_UNSUPPORTED, and the RPC call will abort at the conclusion of critical tuple preprocessing with error code VOLSERFAILEDOP.
The tag stored in tuples[i].ts_tuple.tlv_tag is checked to ensure that it is a writeable property. In the event that the tag is read-only, then the corresponding array index in the results array will be set to VOLSER_TAG_READ_ONLY, and the RPC call will abort at the conclusion of critical tuple preprocessing with error code VOLSERFAILEDOP.
The ext-union discriminator in tuples[i].ts_tuple.tlv_value is checked to make sure that it is a supported type. If the discriminator is not a supported type, then the corresponding array index in the results array will be set to VOLSER_TAG_UNSUPPORTED_ENCODING, and the RPC call will abort at the conclusion of critical tuple preprocessing with error code VOLSERFAILEDOP.
The value stored in tuples[i].ts_tuple.tlv_value is checked to make sure that it can be decoded. If the wire-encoded data cannot be decoded, then the corresponding array index in the results array will be set to VOLSER_TAG_DECODE_FAILED, and the RPC call will abort at the conclusion of critical tuple preprocessing with error code VOLSERFAILEDOP.
Qualifiers are specific to a given tag. If for any reason the tag-specific validation logic determines that the qualifier is invalid, it may set the corresponding array index in the results array to one of VOLSER_TLV_QUALIFIER_UNSUPPORTED_ENCODING, VOLSER_TLV_QUALIFIER_DECODE_FAILED, or VOLSER_TLV_QUALIFIER_INVALID. As with the other validation steps, if a critical tuple fails qualifier validation, then the RPC call will abort at the conclusion of critical tuple preprocessing with error code VOLSERFAILEDOP.
Once the necessary validation steps have been performed, the call will perform the set operations for each tuple. Errors encountered during the processing of each tuple will be recorded in the appropriate array index of the results array. At the conclusion the RPC will either return 0 if all set operations succeeded, or VOLSERFAILEDOP if any failed.
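The two-phase behaviour described in this section can be summarised in a short Python sketch. It is illustrative only: error codes are represented by symbolic strings rather than their allocated numeric values, only the "supported" and "writeable" checks are modelled, and the function signature, the `supported` and `writable` sets, and the `apply` callback are hypothetical. The sketch also assumes that non-critical tuples are validated when they are applied, which this memo does not spell out.

```python
AFSVOL_TLV_FLAG_CRITICAL = 0x4  # flag value from the TLV pseudocode

# Symbolic stand-ins for the wire error codes (numeric values not modelled).
VOLSERFAILEDOP = "VOLSERFAILEDOP"
VOLSER_TAG_UNSUPPORTED = "VOLSER_TAG_UNSUPPORTED"
VOLSER_TAG_READ_ONLY = "VOLSER_TAG_READ_ONLY"


def set_volume_tlv(tuples, supported, writable, apply):
    """Return (call_code, results) for a list of (tag, flags, value) tuples.

    `apply(tag, value)` performs one set operation and returns 0 on
    success or an error code on failure.
    """
    results = [0] * len(tuples)

    def validate(tag):
        if tag not in supported:
            return VOLSER_TAG_UNSUPPORTED
        if tag not in writable:
            return VOLSER_TAG_READ_ONLY
        return 0

    # Phase 1: preprocess critical tuples only. Every critical tuple is
    # checked before aborting, so all failures are reported at once.
    critical_failed = False
    for i, (tag, flags, _value) in enumerate(tuples):
        if flags & AFSVOL_TLV_FLAG_CRITICAL:
            results[i] = validate(tag)
            if results[i] != 0:
                critical_failed = True
    if critical_failed:
        return VOLSERFAILEDOP, results  # no set operation has been performed

    # Phase 2: perform each set operation, recording per-tuple errors.
    for i, (tag, _flags, value) in enumerate(tuples):
        results[i] = validate(tag) or apply(tag, value)
    return (VOLSERFAILEDOP if any(results) else 0), results
```

The point of the first phase is that a failing critical tuple prevents any modification, whereas a failing non-critical tuple only marks its own slot in the results array.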
Existing metadata available from several interfaces will also be exported as TLV tuples. This is being done not only for completeness, but also to prevent data races between AFSVolGetOneVolumeTLV and the various legacy introspection interfaces.
All metadata exported via the volintXInfo XDR structure will now be exported as TLV tuples. Unless otherwise specified, the values associated with each tag shall be identical to that returned for the associated field in volintXInfo by the AFSVolXListOneVolume interface. The following tuples will be allocated to export existing members of volintXInfo:
All metadata exported via the transDebugInfo XDR structure will now be exported as TLV tuples. Unless otherwise specified, the values associated with each tag shall be identical to that returned for the associated field in transDebugInfo by the AFSVolMonitor interface. The following tuples will be allocated to export existing members of transDebugInfo:
Certain fields from the IBM AFS and OpenAFS file server's VolumeDiskData header are generally useful. In particular, several fields exported via the AFSVolGetFlags and AFSVolSetFlags RPCs should be exported via the TLV interface. The full list of supported TLV tuples is:
The day-of-week usage statistics accessed via tag AFSVOL_TLV_TAG_VOL_STAT_USE_PER_DOW provide access to historic data for the 7 days prior to the current access counter available via tag AFSVOL_TLV_TAG_VOL_STAT_USE_TODAY. Depending on the desired mode of statistics collection, two qualifier types are supported by this tag.
When the qualifier is of type AFSVOL_TLV_TYPE_NULL, then a custom payload of type AFSVOL_TLV_TYPE_VOL_DOW_USE will be used to deliver day-of-week usage data for the past week. This type is defined as follows:
   struct AFSVol_stat_use_per_dow {
       afs_uint64 stat_dow[7];
       afs_uint32 stat_flags;
   };
Figure 6
Seven bits in the stat_flags field are used to assert data validity for each day of week. These bits are present to help monitoring applications distinguish between days for which no data was collected (e.g. due to the volume being less than eight days old) and days when there were exactly zero accesses. These bits are defined as follows:
   Flag                          Description
   -----                         -----------
   AFSVOL_VOL_STAT_DOW0_VALID    stat_dow[0] is valid
   AFSVOL_VOL_STAT_DOW1_VALID    stat_dow[1] is valid
   AFSVOL_VOL_STAT_DOW2_VALID    stat_dow[2] is valid
   AFSVOL_VOL_STAT_DOW3_VALID    stat_dow[3] is valid
   AFSVOL_VOL_STAT_DOW4_VALID    stat_dow[4] is valid
   AFSVOL_VOL_STAT_DOW5_VALID    stat_dow[5] is valid
   AFSVOL_VOL_STAT_DOW6_VALID    stat_dow[6] is valid
   AFSVOL_VOL_STAT_DOW_FUZZY     server incapable of guaranteeing validity
Day-of-week statistics flags
Server implementations that cannot distinguish between days on which there was no usage and days for which there is no data SHOULD make a best effort to populate the seven per-day bits, and MUST assert the 0x80 stat_flags bit.
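A monitoring client might interpret the structure and its validity bits along the following lines. This Python sketch rests on two assumptions that are not stated in this part of the memo: that AFSVOL_VOL_STAT_DOWn_VALID occupies bit n of stat_flags, and that AFSVOL_VOL_STAT_DOW_FUZZY is the 0x80 bit mentioned above. The function names are hypothetical.

```python
AFSVOL_VOL_STAT_DOW_FUZZY = 0x80  # assumed to be the 0x80 bit named in the text


def dow_valid_bit(day):
    """Assumption for illustration: AFSVOL_VOL_STAT_DOW<n>_VALID is bit n."""
    return 1 << day


def interpret_dow(stat_dow, stat_flags):
    """Return (per_day, fuzzy).

    per_day[i] is the access count for day i, or None when the validity
    bit for that day is clear (no data collected). fuzzy is True when
    the server cannot guarantee that the validity bits are accurate.
    """
    if len(stat_dow) != 7:
        raise ValueError("stat_dow must have exactly 7 entries")
    per_day = [
        count if stat_flags & dow_valid_bit(i) else None
        for i, count in enumerate(stat_dow)
    ]
    return per_day, bool(stat_flags & AFSVOL_VOL_STAT_DOW_FUZZY)
```

Returning None for days without data keeps "no data" distinct from "exactly zero accesses", which is the distinction the validity bits exist to preserve.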
When the qualifier is of type AFSVOL_TLV_TYPE_UINT64, then a payload of type AFSVOL_TLV_TYPE_UINT64 will be used to deliver day-of-week usage data for the day of week specified in the uint64 qualifier. Valid qualifiers are in the range 0 to 6, where 0 means the day prior to the current day, and 6 means 7 days prior to the current day.
Clients who need to poll AFSVOL_TLV_TAG_VOL_STAT_USE_TODAY or AFSVOL_TLV_TAG_VOL_STAT_USE_PER_DOW, and need to correlate this statistical data with specific calendar days, SHOULD simultaneously query for the value stored at tag AFSVOL_TLV_TAG_VOL_STAT_USE_TODAY_DATE. By querying these tags in the same RPC invocation, the caller will be able to correlate the usage statistics with calendar days in a race-free manner. Querying AFSVOL_TLV_TAG_VOL_STAT_USE_TODAY_DATE in a separate RPC invocation is not guaranteed to yield correct results, as there is no way to guarantee the value didn't change between the two RPC invocations.
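As a worked example of the correlation step, the following Python sketch maps a day-of-week qualifier onto a calendar date. It assumes the value of AFSVOL_TLV_TAG_VOL_STAT_USE_TODAY_DATE has already been converted to a `datetime.date` by the caller (the wire type of that tag is not modelled here), and the function name is hypothetical.

```python
from datetime import date, timedelta


def dow_index_to_date(today, index):
    """Map a day-of-week qualifier (0..6) to the calendar day it describes.

    Qualifier 0 means the day prior to `today`; qualifier 6 means
    seven days prior to `today`.
    """
    if not 0 <= index <= 6:
        raise ValueError("qualifier must be in the range 0 to 6")
    return today - timedelta(days=index + 1)


# Example: if the server reports "today" as 12 September 2012, then
# qualifier 0 refers to 11 September and qualifier 6 to 5 September.
example_first = dow_index_to_date(date(2012, 9, 12), 0)
example_last = dow_index_to_date(date(2012, 9, 12), 6)
```

Because `today` comes from the same RPC invocation as the counters, the mapping stays consistent even if the server rolls over to a new day immediately afterwards.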
In addition to exporting the existing volser state, DAFS state metadata will also be exported via the TLV interface. Specifically, an extended volume state field, and a raw DAFS state debugging tag, will be exported.
Given that volume state information is useful across all server implementations, a collection of generic state explanations shall be standardized. These standardized enumeration values shall be published via a special volume state explanation tag. The following states are initially defined in the namespace:
   enum AFSVol_vol_state_expl {
       AFSVOL_VOL_STATE_EXPL_NONE             = 0,
       AFSVOL_VOL_STATE_EXPL_UNKNOWN          = 1,
       AFSVOL_VOL_STATE_EXPL_OUT_OF_SERVICE   = 2,
       AFSVOL_VOL_STATE_EXPL_DELETED          = 3,
       AFSVOL_VOL_STATE_EXPL_READY            = 4,
       AFSVOL_VOL_STATE_EXPL_ATTACHING        = 5,
       AFSVOL_VOL_STATE_EXPL_DETACHING        = 6,
       AFSVOL_VOL_STATE_EXPL_BUSY             = 7,
       AFSVOL_VOL_STATE_EXPL_IO_BUSY          = 8,
       AFSVOL_VOL_STATE_EXPL_SALVAGING        = 9,
       AFSVOL_VOL_STATE_EXPL_SALVAGE_NEEDED   = 10,
       AFSVOL_VOL_STATE_EXPL_ERROR            = 11,
       AFSVOL_VOL_STATE_EXPL_VOLUME_OPERATION = 12
   };
XDR definition of Volume State Enumeration
It is useful to be able to track volume ownership by process type. In order to do this, a new program type namespace must be defined. The following types are initially defined in the program type namespace:
   enum AFSVol_program_type {
       AFSVOL_PROGRAM_TYPE_NONE           = 0,
       AFSVOL_PROGRAM_TYPE_FILE_SERVER    = 1,
       AFSVOL_PROGRAM_TYPE_VOLUME_SERVER  = 2,
       AFSVOL_PROGRAM_TYPE_SALVAGER       = 3,
       AFSVOL_PROGRAM_TYPE_SALVAGE_SERVER = 4,
       AFSVOL_PROGRAM_TYPE_VOLUME_UTILITY = 5,
       AFSVOL_PROGRAM_TYPE_UNKNOWN        = 6
   };
XDR definition of Program Type Enumeration
Volume state will be exported via five new TLV tuples:
RxOSD [AFS-OSD08] [AFS-OSD09] requires two TLV tuples to encode new quota types:
AFSVol services providing extended Tag-Length-Value RPCs MUST provide backwards compatible interfaces to both legacy clients and servers. Additionally, interoperability between TLV versions must also be specified if they do not comply with the following requirements:
We would like to thank all of the participants at the 2009 Edinburgh AFS hackathon for their input into the design of this TLV mechanism. Alistair Ferguson has provided much useful feedback, especially with regard to backwards compatibility and discriminated union type identifier namespace allocations. Andrew Deason and Michael Meffie have provided considerable input with regard to the discriminated union XDR decoding problem, AFS registrar and namespace allocation concerns, what metadata should be exported in the initial revision, the notion of data qualifiers, as well as commentary about how they envision this extension being used to support future protocol extensions. Derrick Brashear has provided helpful feedback with regard to restructuring the volume state reporting tags. Thanks to Christof Hanke and Hartmut Reuter for collaborating to make this memo compatible with their RxOSD protocol enhancements, and, furthermore, for providing helpful feedback regarding the language in this draft. Finally, special thanks to Jeffrey Hutzelman for providing considerable help with restructuring this memo to improve readability and limit its scope to something tractable.
This memo includes no request to IANA.
The AFS Assigned Numbers Registrar will need to consider several assigned numbers requests.
First and foremost, this memo requests that the AFS Registrar assume control over several new registries:
This memo requests the allocation of a new registry with the formal name "AFSVol TLV Payloads". This registry will be used to track allocations of enumeration values in the AFSVol_TLV_type XDR enum, and the mapping of these values onto their respective XDR type definitions. This is a 32-bit unsigned namespace. Allocations can fall into one of a few categories:
   Range                         Description
   -----                         -----------
   0 to 0xfeffffff               AFS-STDS Early Assignment
   0xff000000 to 0xfffeffff      Private Assignment
   0xffff0000 to 0xffffffff      Reserved
Subdivision into allocation policy regions
In the table above, "AFS-STDS Early Assignment" refers to the allocation policy described in [I-D.wilkinson-afs3-standardisation]; "Private Assignment", and "Reserved" are as-described in [RFC5226].
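The range subdivision above amounts to a simple classification of a 32-bit code point. A minimal Python sketch follows; the function name is hypothetical, and the boundaries are taken directly from the table.

```python
def allocation_region(code):
    """Classify a 32-bit code point by allocation policy region."""
    if not 0 <= code <= 0xFFFFFFFF:
        raise ValueError("code must fit in an unsigned 32-bit namespace")
    if code <= 0xFEFFFFFF:
        return "AFS-STDS Early Assignment"
    if code <= 0xFFFEFFFF:
        return "Private Assignment"
    return "Reserved"
```

For instance, type code 22 (AFSVOL_TLV_TYPE_OPAQUE) falls in the "AFS-STDS Early Assignment" region, while 0xff000000 is the first code point available for private assignment.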
Allocation requests for the "AFS-STDS Early Assignment" region MUST contain the following information:
In addition, an "AFS-STDS Early Assignment" allocation request MAY include the following optional elements:
This memo requests the allocation of a new registry with the formal name "AFSVol TLV Tags". This registry will be used to track allocations of enumeration values in the AFSVol_TLV_tag XDR enum, and the mapping of these values onto legal tags and qualifiers. This is a 32-bit unsigned namespace. Allocations can fall into one of a few categories:
   Range                         Description
   -----                         -----------
   0 to 0xfeffffff               AFS-STDS Early Assignment
   0xff000000 to 0xfffeffff      Private Assignment
   0xffff0000 to 0xffffffff      Reserved
Subdivision into allocation policy regions
In the table above, "AFS-STDS Early Assignment" refers to the allocation policy described in [I-D.wilkinson-afs3-standardisation]; "Private Assignment", and "Reserved" are as-described in [RFC5226].
Allocation requests for the "AFS-STDS Early Assignment" region MUST contain the following information:
In addition, an "AFS-STDS Early Assignment" allocation request MAY include the following optional elements:
This memo requests the allocation of a new registry with the formal name "AFSVol TLV Flags". This registry will be used to track allocations of flag bits in the AFSVol_TLV.tlv_flags field. This is a 32-bit flag namespace. All flag bit allocations shall fall under the "AFS-STDS Early Assignment" allocation policy, as described in [I-D.wilkinson-afs3-standardisation]. Flag bit allocation requests MUST contain the following information:
In addition, an allocation request MAY include the following optional elements:
This memo requests the allocation of a new registry with the formal name "AFSVol DoW Stats Flags". This registry will be used to track allocations of flag bits in the AFSVol_stat_use_per_dow.stat_flags field. This is a 32-bit flag namespace. All flag bit allocations shall fall under the "AFS-STDS Early Assignment" allocation policy, as described in [I-D.wilkinson-afs3-standardisation]. Flag bit allocation requests MUST contain the following information:
In addition, an allocation request MAY include the following optional elements:
This memo requests the allocation of a new registry with the formal name "AFSVol Vol State Expls". This registry will be used to track allocations of enumeration values in the AFSVol_vol_state_expl enum (see Section 7.1). This is a 32-bit unsigned namespace. Allocations can fall into one of a few categories:
   Range                         Description
   -----                         -----------
   0 to 0xfeffffff               AFS-STDS Early Assignment
   0xff000000 to 0xffffffff      Private Assignment
Subdivision into allocation policy regions
In the table above, "AFS-STDS Early Assignment" refers to the allocation policy described in [I-D.wilkinson-afs3-standardisation]; "Private Assignment" is as-described in [RFC5226].
Allocation requests for the "AFS-STDS Early Assignment" region MUST contain the following information:
In addition, an "AFS-STDS Early Assignment" allocation request MAY include the following optional elements:
This memo requests the allocation of a new registry with the formal name "AFSVol Program Types". This registry will be used to track allocations of enumeration values in the AFSVol_program_type enum (see Section 7.2). This is a 32-bit unsigned namespace. Allocations can fall into one of a few categories:
   Range                         Description
   -----                         -----------
   0 to 0xfeffffff               AFS-STDS Early Assignment
   0xff000000 to 0xffffffff      Private Assignment
Subdivision into allocation policy regions
In the table above, "AFS-STDS Early Assignment" refers to the allocation policy described in [I-D.wilkinson-afs3-standardisation]; "Private Assignment" is as-described in [RFC5226].
Allocation requests for the "AFS-STDS Early Assignment" region MUST contain the following information:
In addition, an "AFS-STDS Early Assignment" allocation request MAY include the following optional elements:
In addition to requesting the allocation of new registries, this memo also requests several new allocations within existing assigned numbers registries.
One new capability bit is requested:
The following allocations are requested in the "AFSVol Capabilities" registry [I-D.keiser-afs3-capabilities]:
The following initial allocations are requested in the newly-created registry "AFSVol TLV Payloads":
The following initial allocations are requested in the newly-created registry "AFSVol TLV Tags":
The following initial allocations are requested within the newly-created registry "AFSVol TLV Flags":
The following initial allocations are requested within the newly-created registry "AFSVol DoW Stats Flags":
Within the VOLS error table (offset 1492325120), several new codes need to be allocated:
The following initial allocations are requested within the newly-created registry "AFSVol Vol State Expls":
Within the new AFS program type namespace, the following allocations are requested:
Security and authorization issues are tag-specific. Most known implementations of the legacy AFSVol RPCs permitted rxnull (unauthenticated) connections to perform the four ListVolume RPCs and AFSVolMonitor. Arguably, it is time to re-evaluate this decision and to restrict access to some tags, because they allow potentially sensitive volume or operational metadata to leak onto public networks.
[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA Considerations Section in RFCs", BCP 26, RFC 5226, May 2008.
[I-D.deason-afs3-type-time]  Deason, A., "Base Types for Time in AFS-3", Internet-Draft draft-deason-afs3-type-time-03, August 2011.
[I-D.keiser-afs3-capabilities]  Keiser, T., Jenkins, S., and A. Deason, "AFS-3 Protocol Capabilities Query Mechanism", Internet-Draft draft-keiser-afs3-capabilities-00, September 2012.
[I-D.keiser-afs3-xdr-union]  Keiser, T. and A. Deason, "Extensible XDR Discriminated Union Primitive Type", Internet-Draft draft-keiser-afs3-xdr-union-06, September 2012.
[I-D.keiser-afs3-xdr-primitive-types]  Keiser, T. and A. Deason, "AFS-3 Rx RPC XDR Primitive Type Definitions", Internet-Draft draft-keiser-afs3-xdr-primitive-types-01, September 2012.
[I-D.wilkinson-afs3-standardisation]  Wilkinson, S., "Options for AFS Standardisation", Internet-Draft draft-wilkinson-afs3-standardisation-00, June 2010.
const AFSVOL_TLV_TAG_MAX = 1024;          /* upper-bound on number of
                                           * TLV tuples per RPC */
const AFSVOL_TLV_OPAQUE_MAX = 262144;     /* upper-bound on size of
                                           * value payload */
const AFSVOL_TLV_UINT64_MAX = 32768;      /* upper-bound on length of
                                           * uint64 vector payload */
const AFSVOL_TLV_TIME_MAX = 21845;        /* upper-bound on length of
                                           * AFSTime vector payload */
const AFSVOL_BULK_GETVOLUME_MAX = 1024;   /* upper-bound on
                                           * (partition, volume)
                                           * tuples per RPC */

const AFSVOL_TLV_FLAG_UNSUPPORTED        = 0x1;
const AFSVOL_TLV_FLAG_READ_ERROR         = 0x2;
const AFSVOL_TLV_FLAG_CRITICAL           = 0x4;
const AFSVOL_TLV_FLAG_QUALIFIER_NO_MATCH = 0x8;
const AFSVOL_TLV_FLAG_MORE               = 0x10;
const AFSVOL_TLV_FLAG_OBJ_NOT_SUPP       = 0x20;

enum AFSVol_TLV_type {
    AFSVOL_TLV_TYPE_NULL = 0,
    AFSVOL_TLV_TYPE_TRUE = 1,
    AFSVOL_TLV_TYPE_FALSE = 2,
    AFSVOL_TLV_TYPE_UINT64 = 3,
    AFSVOL_TLV_TYPE_UINT64_VEC = 4,
    AFSVOL_TLV_TYPE_INT64 = 5,
    AFSVOL_TLV_TYPE_INT64_VEC = 6,
    AFSVOL_TLV_TYPE_UUID = 7,
    AFSVOL_TLV_TYPE_STRING = 8,
    AFSVOL_TLV_TYPE_TIME_ABS = 9,
    AFSVOL_TLV_TYPE_TIME_ABS_VEC = 10,
    AFSVOL_TLV_TYPE_TIME_REL = 11,
    AFSVOL_TLV_TYPE_TIME_REL_VEC = 12,
    AFSVOL_TLV_TYPE_VOL_ID = 13,
    AFSVOL_TLV_TYPE_VOL_ID_VEC = 14,
    AFSVOL_TLV_TYPE_PART_ID = 15,
    AFSVOL_TLV_TYPE_PART_ID_VEC = 16,
    AFSVOL_TLV_TYPE_DISK_BLOCKS = 17,
    AFSVOL_TLV_TYPE_STAT_COUNTER = 18,
    AFSVOL_TLV_TYPE_STAT_GAUGE = 19,
    AFSVOL_TLV_TYPE_BIT64 = 20,
    AFSVOL_TLV_TYPE_VOL_DOW_USE = 21,
    AFSVOL_TLV_TYPE_OPAQUE = 22
};

const AFSVOL_VOL_STAT_DOW0_VALID = 0x1;
const AFSVOL_VOL_STAT_DOW1_VALID = 0x2;
const AFSVOL_VOL_STAT_DOW2_VALID = 0x4;
const AFSVOL_VOL_STAT_DOW3_VALID = 0x8;
const AFSVOL_VOL_STAT_DOW4_VALID = 0x10;
const AFSVOL_VOL_STAT_DOW5_VALID = 0x20;
const AFSVOL_VOL_STAT_DOW6_VALID = 0x40;
const AFSVOL_VOL_STAT_DOW_FUZZY  = 0x80;

struct AFSVol_stat_use_per_dow {
    afs_uint64 stat_dow[7];
    afs_uint32 stat_flags;
};

enum AFSVol_vol_state_expl {
    AFSVOL_VOL_STATE_EXPL_NONE = 0,
    AFSVOL_VOL_STATE_EXPL_UNKNOWN = 1,
    AFSVOL_VOL_STATE_EXPL_OUT_OF_SERVICE = 2,
    AFSVOL_VOL_STATE_EXPL_DELETED = 3,
    AFSVOL_VOL_STATE_EXPL_READY = 4,
    AFSVOL_VOL_STATE_EXPL_ATTACHING = 5,
    AFSVOL_VOL_STATE_EXPL_DETACHING = 6,
    AFSVOL_VOL_STATE_EXPL_BUSY = 7,
    AFSVOL_VOL_STATE_EXPL_IO_BUSY = 8,
    AFSVOL_VOL_STATE_EXPL_SALVAGING = 9,
    AFSVOL_VOL_STATE_EXPL_SALVAGE_NEEDED = 10,
    AFSVOL_VOL_STATE_EXPL_ERROR = 11,
    AFSVOL_VOL_STATE_EXPL_VOLUME_OPERATION = 12
};

enum AFSVol_program_type {
    AFSVOL_PROGRAM_TYPE_NONE = 0,
    AFSVOL_PROGRAM_TYPE_FILE_SERVER = 1,
    AFSVOL_PROGRAM_TYPE_VOLUME_SERVER = 2,
    AFSVOL_PROGRAM_TYPE_SALVAGER = 3,
    AFSVOL_PROGRAM_TYPE_SALVAGE_SERVER = 4,
    AFSVOL_PROGRAM_TYPE_VOLUME_UTILITY = 5,
    AFSVOL_PROGRAM_TYPE_UNKNOWN = 6
};

ext-union AFSVol_TLV_value switch(AFSVol_TLV_type type) {
case AFSVOL_TLV_TYPE_NULL:
    void;
case AFSVOL_TLV_TYPE_TRUE:
    void;
case AFSVOL_TLV_TYPE_FALSE:
    void;

case AFSVOL_TLV_TYPE_UINT64:
    afs_uint64 u_u64;
case AFSVOL_TLV_TYPE_VOL_ID:
    afs_uint64 u_vol_id;
case AFSVOL_TLV_TYPE_PART_ID:
    afs_uint64 u_part_id;
case AFSVOL_TLV_TYPE_DISK_BLOCKS:
    afs_uint64 u_disk_blocks;
case AFSVOL_TLV_TYPE_STAT_COUNTER:
    afs_uint64 u_stat_counter;
case AFSVOL_TLV_TYPE_BIT64:
    afs_uint64 u_bit64;

case AFSVOL_TLV_TYPE_INT64:
    afs_int64 u_s64;
case AFSVOL_TLV_TYPE_STAT_GAUGE:
    afs_int64 u_stat_gauge;

case AFSVOL_TLV_TYPE_UINT64_VEC:
    afs_uint64 u_u64_vec<AFSVOL_TLV_UINT64_MAX>;
case AFSVOL_TLV_TYPE_VOL_ID_VEC:
    afs_uint64 u_vol_id_vec<AFSVOL_TLV_UINT64_MAX>;
case AFSVOL_TLV_TYPE_PART_ID_VEC:
    afs_uint64 u_part_id_vec<AFSVOL_TLV_UINT64_MAX>;

case AFSVOL_TLV_TYPE_INT64_VEC:
    afs_int64 u_s64_vec<AFSVOL_TLV_UINT64_MAX>;

case AFSVOL_TLV_TYPE_TIME_ABS:
    AFSTime u_time_abs;
case AFSVOL_TLV_TYPE_TIME_REL:
    AFSRelTimestamp u_time_rel;

case AFSVOL_TLV_TYPE_TIME_ABS_VEC:
    AFSTime u_time_abs_vec<AFSVOL_TLV_TIME_MAX>;
case AFSVOL_TLV_TYPE_TIME_REL_VEC:
    AFSRelTimestamp u_time_rel_vec<AFSVOL_TLV_UINT64_MAX>;

case AFSVOL_TLV_TYPE_UUID:
    afsUUID u_uuid;

case AFSVOL_TLV_TYPE_STRING:
    string u_string<AFSVOL_TLV_OPAQUE_MAX>;

case AFSVOL_TLV_TYPE_VOL_DOW_USE:
    /* type defined later in this memo */
    AFSVol_stat_use_per_dow u_vol_dow_use;

case AFSVOL_TLV_TYPE_OPAQUE:
    opaque u_opaque<AFSVOL_TLV_OPAQUE_MAX>;
};

/* registrar-controlled tag namespace */
enum AFSVol_TLV_tag {
    AFSVOL_TLV_TAG_EOS = 0,
    AFSVOL_TLV_TAG_VOL_NAME = 1,
    AFSVOL_TLV_TAG_VOL_STATUS = 2,
    AFSVOL_TLV_TAG_VOL_IN_USE = 3,
    AFSVOL_TLV_TAG_VOL_ID = 4,
    AFSVOL_TLV_TAG_VOL_TYPE = 5,
    AFSVOL_TLV_TAG_VOL_CLONE_ID = 6,
    AFSVOL_TLV_TAG_VOL_BACKUP_ID = 7,
    AFSVOL_TLV_TAG_VOL_PARENT_ID = 8,
    AFSVOL_TLV_TAG_VOL_COPY_DATE = 9,
    AFSVOL_TLV_TAG_VOL_CREATE_DATE = 10,
    AFSVOL_TLV_TAG_VOL_ACCESS_DATE = 11,
    AFSVOL_TLV_TAG_VOL_UPDATE_DATE = 12,
    AFSVOL_TLV_TAG_VOL_BACKUP_DATE = 13,
    AFSVOL_TLV_TAG_VOL_SIZE = 14,
    AFSVOL_TLV_TAG_VOL_FILE_COUNT = 15,
    AFSVOL_TLV_TAG_VOL_QUOTA_BLOCKS = 16,
    AFSVOL_TLV_TAG_VOL_STAT_USE_TODAY = 17,
    AFSVOL_TLV_TAG_VOL_STAT_USE_PER_DOW = 18,
    AFSVOL_TLV_TAG_VOL_STAT_READS = 19,
    AFSVOL_TLV_TAG_VOL_STAT_WRITES = 20,
    AFSVOL_TLV_TAG_VOL_STAT_FILE_SAME_AUTHOR = 21,
    AFSVOL_TLV_TAG_VOL_STAT_FILE_DIFFERENT_AUTHOR = 22,
    AFSVOL_TLV_TAG_VOL_STAT_DIR_SAME_AUTHOR = 23,
    AFSVOL_TLV_TAG_VOL_STAT_DIR_DIFFERENT_AUTHOR = 24,
    AFSVOL_TLV_TAG_VOL_TRANS_ID = 25,
    AFSVOL_TLV_TAG_VOL_TRANS_TIME = 26,
    AFSVOL_TLV_TAG_VOL_TRANS_CREATE_TIME = 27,
    AFSVOL_TLV_TAG_VOL_TRANS_RETURN_CODE = 28,
    AFSVOL_TLV_TAG_VOL_TRANS_ATTACH_MODE = 29,
    AFSVOL_TLV_TAG_VOL_TRANS_STATUS = 30,
    AFSVOL_TLV_TAG_VOL_TRANS_FLAGS = 31,
    AFSVOL_TLV_TAG_VOL_TRANS_LAST_PROC_NAME = 32,
    AFSVOL_TLV_TAG_VOL_TRANS_CALL_VALID = 33,
    AFSVOL_TLV_TAG_VOL_TRANS_READ_NEXT = 34,
    AFSVOL_TLV_TAG_VOL_TRANS_XMIT_NEXT = 35,
    AFSVOL_TLV_TAG_VOL_TRANS_LAST_RECV_TIME = 36,
    AFSVOL_TLV_TAG_VOL_TRANS_LAST_SEND_TIME = 37,
    AFSVOL_TLV_TAG_VOL_IN_SERVICE = 38,
    AFSVOL_TLV_TAG_VOL_BLESSED = 39,
    AFSVOL_TLV_TAG_VOL_RESTORED_FROM_ID = 40,
    AFSVOL_TLV_TAG_VOL_DESTROYED = 41,
    AFSVOL_TLV_TAG_VOL_NEEDS_SALVAGE = 42,
    AFSVOL_TLV_TAG_VOL_OFFLINE_MESSAGE = 43,
    AFSVOL_TLV_TAG_VOL_EXPIRATION_DATE = 44,
    AFSVOL_TLV_TAG_VOL_QUOTA_RESERVATION = 45,
    AFSVOL_TLV_TAG_VOL_STAT_USE_TODAY_DATE = 46,
    AFSVOL_TLV_TAG_VOL_STATE_ONLINE = 47,
    AFSVOL_TLV_TAG_VOL_STATE_AVAILABLE = 48,
    AFSVOL_TLV_TAG_VOL_STATE_EXPL = 49,
    AFSVOL_TLV_TAG_VOL_STATE_DAFS_RAW = 50,
    AFSVOL_TLV_TAG_VOL_STATE_OWNING_PROCESS = 51,
    AFSVOL_TLV_TAG_VOL_QUOTA_BLOCKS_STORED_LOCALLY = 52,
    AFSVOL_TLV_TAG_VOL_QUOTA_FILES = 53
};

struct AFSVol_TLV {
    afs_uint32 tlv_tag;
    afs_uint32 tlv_flags;
    AFSVol_TLV_value tlv_value;
};

struct AFSVol_TLV_query {
    AFSVol_TLV_tag tq_tag;
    AFSVol_TLV_value tq_qualifier;
};

struct AFSVol_TLV_store {
    AFSVol_TLV ts_tuple;
    AFSVol_TLV_value ts_qualifier;
};

typedef afs_uint64 AFSVol_TLV_TSV;

typedef AFSVol_TLV_tag AFSVol_TLV_tag_vec<AFSVOL_TLV_TAG_MAX>;
typedef AFSVol_TLV_query AFSVol_TLV_query_vec<AFSVOL_TLV_TAG_MAX>;
typedef AFSVol_TLV AFSVol_TLV_vec<AFSVOL_TLV_TAG_MAX>;
typedef afs_uint64 AFSVol_TLV_part_id_vec<AFSVOL_BULK_GETVOLUME_MAX>;
typedef afs_uint64 AFSVol_TLV_vol_id_vec<AFSVOL_BULK_GETVOLUME_MAX>;
typedef AFSVol_TLV_store AFSVol_TLV_store_vec<AFSVOL_TLV_TAG_MAX>;
typedef afs_int32 AFSVol_TLV_result_vec<AFSVOL_TLV_TAG_MAX>;

struct AFSVol_TLV_vol_list {
    afs_uint64 partId;
    AFSVol_TLV_vol_id_vec * volIds;
};

typedef struct AFSVol_TLV_vol_list
    AFSVol_TLV_get_filter<AFSVOL_BULK_GETVOLUME_MAX>;

proc GetVolumeTLVTags(
    IN  AFSVol_TLV_tag offset,
    OUT AFSVol_TLV_tag_vec * tags,
    OUT AFSVol_TLV_TSV * tsv
) = XXX;

proc GetOneVolumeTLV(
    IN  afs_uint64 partId,
    IN  afs_uint64 volId,
    IN  AFSVol_TLV_query_vec * queries,
    OUT AFSVol_TLV_vec * tuples,
    OUT AFSVol_TLV_TSV * tsv
) = XXX;

proc GetVolumesTLV(
    IN  AFSVol_TLV_get_filter * filter,
    IN  AFSVol_TLV_query_vec * queries,
    OUT AFSVol_TLV_TSV * tsv
) split = XXX;

proc SetVolumeTLV(
    IN  afs_int32 trans,
    IN  AFSVol_TLV_TSV assert_tsv,
    IN  AFSVol_TLV_store_vec * tuples,
    OUT AFSVol_TLV_result_vec * results,
    OUT AFSVol_TLV_TSV * server_tsv
) = XXX;
Figure 7
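As a non-normative illustration of how a client might interpret the bit-flag constants in Figure 7, the following Python sketch decodes a tlv_flags word into flag names and lists which stat_dow entries a stat_flags word marks as valid. The helper names (`decode_tlv_flags`, `valid_dow_indices`, `is_dow_fuzzy`) are hypothetical. The sketch also assumes that AFSVOL_VOL_STAT_DOWn_VALID refers to stat_dow[n], which the constant names suggest but the definitions above do not state.

```python
from typing import List

# Flag values copied from the AFSVOL_TLV_FLAG_* constants in Figure 7.
TLV_FLAGS = {
    0x1: "AFSVOL_TLV_FLAG_UNSUPPORTED",
    0x2: "AFSVOL_TLV_FLAG_READ_ERROR",
    0x4: "AFSVOL_TLV_FLAG_CRITICAL",
    0x8: "AFSVOL_TLV_FLAG_QUALIFIER_NO_MATCH",
    0x10: "AFSVOL_TLV_FLAG_MORE",
    0x20: "AFSVOL_TLV_FLAG_OBJ_NOT_SUPP",
}
AFSVOL_VOL_STAT_DOW_FUZZY = 0x80


def decode_tlv_flags(flags: int) -> List[str]:
    """Return the names of the flag bits set in a tlv_flags word."""
    names = [name for bit, name in TLV_FLAGS.items() if flags & bit]
    known_mask = sum(TLV_FLAGS)  # OR of all known bits (they are disjoint)
    unknown = flags & ~known_mask
    if unknown:
        names.append(f"unknown bits 0x{unknown:x}")
    return names


def valid_dow_indices(stat_flags: int) -> List[int]:
    """Return each index n whose AFSVOL_VOL_STAT_DOWn_VALID bit is set.

    Assumption: bit n (0x1 << n) corresponds to stat_dow[n].
    """
    return [n for n in range(7) if stat_flags & (1 << n)]


def is_dow_fuzzy(stat_flags: int) -> bool:
    """Report whether AFSVOL_VOL_STAT_DOW_FUZZY is set."""
    return bool(stat_flags & AFSVOL_VOL_STAT_DOW_FUZZY)
```

Reporting unrecognized bits separately, rather than discarding them, keeps the sketch usable if further flags are allocated in the "AFSVol TLV Flags" registry.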