Network Working Group                                       M. V. Kerwin
Internet-Draft                                                       QUT
Intended status: Standards Track                           July 03, 2013
Expires: January 04, 2014
The file URI Scheme
draft-kerwin-file-scheme-05
This document specifies the file Uniform Resource Identifier (URI) scheme that was originally specified in [RFC1738]. The purpose of this document is to keep the information about the scheme on standards track, since [RFC1738] has been made obsolete.
This draft should be discussed on its github project page [github].
This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 04, 2014.
Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.
URIs were previously defined in [RFC1738], which was updated by [RFC3986]. Those documents also specify how to define schemes for URIs.
The first definition for many URI schemes appeared in [RFC1738]. Because that document has been made obsolete, this document copies the file URI scheme from it to allow that material to remain on standards track.
This section is non-normative.
The file URI scheme was first defined in [RFC1630], which, being an informational RFC, does not specify an Internet standard. The definition was standardised in [RFC1738], and the scheme was registered with the Internet Assigned Numbers Authority (IANA) [IANA-URI-Schemes]; however, the latter definition omitted certain language included by the former that clarified aspects such as:
The Internet draft [I-D.draft-hoffman-file-uri] was written in an effort to keep the file URI scheme on standards track when [RFC1738] was made obsolete, but that draft expired in 2005. It enumerated concerns arising from the various, often conflicting implementations of the scheme. It serves as the basis of this document.
The file URI scheme defined in [RFC1738] is referenced three times in the current URI Generic Syntax standard [RFC3986], despite the former's obsoletion:
Finally the WHATWG defines a living URL standard [WHATWG-URL], which includes algorithms for interpreting file URIs.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
The file URI scheme is used to identify files accessible on a particular host computer, where a file is a named resource which can be accessed through the computer's filesystem interface. This scheme, unlike most other URI schemes, does not identify a resource that is universally accessible over the Internet.
Note well that "file" refers to filesystem names from the perspective of the user of a reference, rather than in relation to a globally-defined naming authority, so care should be taken to ensure that such references are actually intended to be interpreted in relation to the user's filesystem interface.
The file URI scheme has historically had little or no interoperability between platforms. Further, implementers on a single platform have often disagreed on the syntax to use for a particular filesystem. This document attempts to resolve those problems, and define a standard scheme which is interoperable between different extant and future implementations. Additionally, it aims to ease implementation by conforming to a general syntax that allows existing URI parsing machinery to parse file URIs.
Note that file and ftp URIs are not the same, even when the target of the ftp URI is the local host.
file:///usr/local/bin/
The syntax of a file URI conforms with the generic syntax presented in [RFC3986], with the following components:
Previous definitions of the file URI scheme required two slashes between the scheme and path, so implementations may wish to include an authority component in any file URIs they generate, in order to remain interoperable.
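Because a file URI follows the generic syntax, a general-purpose URI parser can split it into its components. The following is a minimal sketch using Python's standard urllib.parse module; the helper name describe_file_uri is hypothetical and is not defined by this document.

```python
from urllib.parse import urlsplit


def describe_file_uri(uri: str) -> dict:
    """Split a file URI into generic-syntax components (hypothetical helper)."""
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError("not a file URI: " + uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,  # empty string when no host is given
        "path": parts.path,
    }


print(describe_file_uri("file:///usr/local/bin/"))
print(describe_file_uri("file://server.example.com/Share/path/to/file.doc"))
```

Note that this parser reports an empty authority both for "file:///usr/local/bin/" and for a URI written without the two slashes, so it cannot by itself tell those two spellings apart.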
Systems exhibit different levels of case-sensitivity. Implementations SHOULD maintain the case of file and directory names when translating file URIs to and from the local system's representation of file paths, and any systems or devices that transport file URIs SHOULD NOT alter the case of file URIs they transport.
Most implementations of the file URI scheme do a reasonable job of mapping the hierarchical part of a directory structure into the slash ("/") delimited hierarchy of the URI syntax, independent of the native platform's delimiter.
For example, on Microsoft Windows platforms, it is typical that the file system presents backslash ("\") as the delimiter for file names, yet the URI's forward slash ("/") can be used in file URIs. Similarly, on (some) Macintosh OS versions, at least in some contexts, the colon (":") is used as the delimiter in the native presentation of file path names. Unix systems natively use the same forward slash ("/") delimiter for hierarchy, so there is a closer mapping between file URI paths and native path names.
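The delimiter mapping described above can be sketched as follows. This is an illustration only: the function name native_to_uri_path and the sample paths (including the volume name "Macintosh HD") are assumptions, and the sketch deliberately ignores drive letters and other root designators.

```python
from urllib.parse import quote


def native_to_uri_path(native_path: str, delimiter: str) -> str:
    """Map a native hierarchical path onto slash-delimited URI path segments."""
    segments = native_path.split(delimiter)
    # Percent-encode each segment on its own so that a literal "/" inside a
    # segment cannot be mistaken for a hierarchy delimiter.
    return "/".join(quote(segment, safe="") for segment in segments)


print(native_to_uri_path("TMP\\test.txt", "\\"))         # TMP/test.txt
print(native_to_uri_path("/usr/local/bin", "/"))         # /usr/local/bin
print(native_to_uri_path("Macintosh HD:Users:me", ":"))  # Macintosh%20HD/Users/me
```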
In accordance with Section 3.3 of [RFC3986], the path segments . and .., also known as dot-segments, are only interpreted within the URI path hierarchy and are removed as part of the resolution process ([RFC3986], Section 5.2). Implementations operating on or interacting with systems that allow dot-segments in their resolved native path representation may be required to escape those segments using some other means.
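As an illustration of dot-segment removal during reference resolution, the sketch below relies on urljoin from Python's standard library, which handles the file scheme as a hierarchical scheme. The wrapper name resolve and the sample URIs are assumptions made for this example.

```python
from urllib.parse import urljoin


def resolve(base: str, reference: str) -> str:
    """Resolve a relative reference against a base file URI."""
    # urljoin merges the paths and removes "." and ".." segments.
    return urljoin(base, reference)


base = "file:///a/b/c"
print(resolve(base, "../d"))    # file:///a/d
print(resolve(base, "./e"))     # file:///a/b/e
print(resolve(base, "x/../y"))  # file:///a/b/y
```

After resolution no dot-segments remain in the URI path, which is why a native path that legitimately contains such a segment would need some other form of escaping.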
As relative references are resolved into their respective (absolute) target URIs according to Section 5 of [RFC3986], this document does not describe that resolution. However, a fully resolved file URI may contain a non-absolute file path. For example, the URI:
file:a/b/c
identifies a file named c, in directory b, in directory a, on the machine on which the URI is being interpreted (i.e. localhost); however, there is no indication of the location of the directory a on that machine. By convention an absolute file path would begin with a slash ("/") character on a Unix-based system, or a drive letter (e.g. "c:\") on a Microsoft Windows system, etc.
Resolution of relative file paths is left undefined by this specification.
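Since resolution of relative file paths is undefined here, an implementation may at least want to detect them. The sketch below checks only the leading-slash convention; the function name has_absolute_path is hypothetical, and drive-letter forms are intentionally not considered.

```python
from urllib.parse import unquote, urlsplit


def has_absolute_path(uri: str) -> bool:
    """Report whether a file URI's path starts at the root of the hierarchy."""
    path = unquote(urlsplit(uri).path)
    return path.startswith("/")


print(has_absolute_path("file:a/b/c"))              # False
print(has_absolute_path("file:///usr/local/bin/"))  # True
```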
Historically there has been considerable difference, in practice, for handling of the syntax for the "top" of the hierarchy. The file URI syntax provides one simple place for designating the root of the file hierarchy, and implementations have diverged, even on the same platform, sometimes even within a single application.
For example, Microsoft DOS- and Windows-based systems support the notion of a "drive letter", a single character which represents a (virtual) drive, mount point, or device. Native representations of file paths start with the drive letter, a colon, and then the path; e.g., c:\TMP\test.txt.
Drive letters are mapped into the top of a file URI in various ways. On systems running some versions of Microsoft Windows, the drive letter may be specified with a colon (":") character; however, sometimes the colon is replaced with a pipe ("|") character, and in some implementations the colon is omitted entirely. The three representations
file:///c:/TMP/test.txt
file:///c|/TMP/test.txt
file:///c/TMP/test.txt
have all been used to refer to the native path:
c:\TMP\test.txt
Implementations
Note that some systems running some versions of Microsoft Windows are known to omit the slash before the drive letter, effectively replacing the authority component with the drive specification; for example, file://c:/TMP/test.txt. In line with Postel's robustness principle ("an implementation must be conservative in its sending behavior, and liberal in its receiving behavior" [RFC791]) implementations that are likely to encounter such a URI
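A liberal receiver for the drive-letter forms discussed in this section could look like the sketch below. It is only one possible reading: the function name windows_path_from_file_uri is hypothetical, and treating a single-letter first path segment as a drive letter is an assumption that is ambiguous on systems where such a directory can exist.

```python
import re
from urllib.parse import unquote, urlsplit

# "/c:", "/c|" or "/c" at the start of the path, followed by "/" or the end.
DRIVE_IN_PATH = re.compile(r"^/([A-Za-z])[:|]?(?=/|$)")
# "c:" or "c|" found where the authority component would normally be.
DRIVE_IN_AUTHORITY = re.compile(r"^([A-Za-z])[:|]$")


def windows_path_from_file_uri(uri: str) -> str:
    """Translate the known drive-letter spellings into one native path."""
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError("not a file URI: " + uri)
    path = unquote(parts.path)
    in_authority = DRIVE_IN_AUTHORITY.match(parts.netloc)
    if in_authority:
        # Form such as file://c:/TMP/test.txt (slash before drive omitted).
        drive, rest = in_authority.group(1), path
    elif parts.netloc in ("", "localhost"):
        in_path = DRIVE_IN_PATH.match(path)
        if in_path is None:
            raise ValueError("no drive letter found in: " + uri)
        drive, rest = in_path.group(1), path[in_path.end():]
    else:
        raise ValueError("URI names a remote host: " + uri)
    return drive + ":" + rest.replace("/", "\\")


for candidate in (
    "file:///c:/TMP/test.txt",
    "file:///c|/TMP/test.txt",
    "file:///c/TMP/test.txt",
    "file://c:/TMP/test.txt",
):
    print(windows_path_from_file_uri(candidate))  # c:\TMP\test.txt each time
```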
The Microsoft Windows Universal Naming Convention (UNC) [MS-DTYP] defines a convention for specifying the location of resources such as shared files or devices, for example Windows shares accessed via the SMB/CIFS protocol [MS-SMB2]. The general structure of a UNC file path, given in Augmented Backus-Naur Form (ABNF) [RFC5234], is:
UNC = "\\" hostname "\" sharename *( "\" objectname )
hostname = <NetBIOS name, FQDN, or IP address of a server>
sharename = <name of a share or resource to be accessed>
objectname = <the name of an object>
The canonical representation of a UNC file path as a file URI copies the UNC hostname into the URI host field, and the UNC sharename and objectnames, concatenated with forward slash ("/") characters, into the path. For example, the following UNC path:
\\server.example.com\Share\path\to\file.doc
is represented as a file URI canonically as:
file://server.example.com/Share/path/to/file.doc
       \________________/\_____________________/
            hostname       sharename+objectnames
Historically some implementations have translated UNC file paths entirely into the path segment of a file URI, including both leading slashes. For example, the UNC path:
\\server.example.com\Share\path\to\file.doc
has been translated as:
file:////server.example.com/Share/path/to/file.doc
       \_________________________________________/
                   translated UNC path
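The two UNC translations discussed in this section can be sketched as follows: one function produces the canonical form with the hostname in the authority, and the other accepts either that form or the historical form with the whole UNC path in the URI path. Both function names are hypothetical, and the sketch makes no attempt to validate hostnames (for example, IPv6 literals are not handled).

```python
from urllib.parse import quote, unquote, urlsplit


def unc_to_file_uri(unc: str) -> str:
    """Build the canonical file URI: hostname in the authority component."""
    if not unc.startswith("\\\\"):
        raise ValueError("not a UNC path: " + unc)
    hostname, _, rest = unc[2:].partition("\\")
    if not hostname or not rest:
        raise ValueError("UNC path needs a hostname and a sharename: " + unc)
    # sharename and objectnames become percent-encoded path segments.
    segments = [quote(segment, safe="") for segment in rest.split("\\")]
    return "file://" + hostname + "/" + "/".join(segments)


def file_uri_to_unc(uri: str) -> str:
    """Accept the canonical form and the historical four-slash form."""
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError("not a file URI: " + uri)
    if parts.netloc:
        hostname, path = parts.netloc, parts.path
    elif parts.path.startswith("//"):
        # Historical form: the whole UNC path sits in the URI path.
        hostname, _, tail = parts.path[2:].partition("/")
        path = "/" + tail
    else:
        raise ValueError("URI does not carry a UNC hostname: " + uri)
    return "\\\\" + hostname + unquote(path).replace("/", "\\")


unc = "\\\\server.example.com\\Share\\path\\to\\file.doc"
print(unc_to_file_uri(unc))
print(file_uri_to_unc("file://server.example.com/Share/path/to/file.doc"))
print(file_uri_to_unc("file:////server.example.com/Share/path/to/file.doc"))
```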
The file URI scheme is unusual in that it does not specify an Internet protocol or access method for shared files; as such, its utility in network protocols between hosts is limited. Examples of file server protocols that do define such access methods include SMB/CIFS [MS-SMB2], NFS [RFC3530], and NCP [NOVELL].
The Microsoft Windows API defines Win32 Namespaces [Win32-Namespaces] for interacting with files and devices using Windows API functions. These namespaced paths are prefixed by \\?\ for Win32 File Namespaces and \\.\ for Win32 Device Namespaces. There is also a special case for UNC file paths [MS-DTYP] in Win32 File Namespaces, referred to as "Long UNC", using the prefix \\?\UNC\.
This document does not define a mechanism for translating namespaced file paths into file URIs.
Local file systems sometimes use many different encodings for representing file names. The URI syntax defined in [RFC3986] provides a method of encoding data, presumably for the sake of identifying a resource, as a sequence of characters. The URI characters are, in turn, frequently encoded as octets for transport or presentation. This specification does not mandate any particular character encoding for mapping between URI characters and the octets used to store or transmit those characters; however, for the sake of interoperability, file URI libraries SHOULD translate file names to and from Unicode [UNICODE] encoded as UTF-8 [RFC3629] and then percent-encoded into valid ASCII [RFC20].
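The encoding chain mentioned above (Unicode characters, encoded as UTF-8, then percent-encoded into ASCII) can be illustrated with Python's standard library. The function names and the sample file name are assumptions made for this sketch.

```python
from urllib.parse import quote, unquote


def encode_file_path(path: str) -> str:
    """Encode a path as UTF-8 octets, then percent-encode non-ASCII octets."""
    # "/" is kept as-is because it delimits the path hierarchy.
    return quote(path, safe="/", encoding="utf-8")


def decode_file_path(encoded: str) -> str:
    """Reverse the percent-encoding and interpret the octets as UTF-8."""
    return unquote(encoded, encoding="utf-8")


name = "/tmp/r\u00e9sum\u00e9.txt"  # U+00E9 is LATIN SMALL LETTER E WITH ACUTE
encoded = encode_file_path(name)
print(encoded)  # /tmp/r%C3%A9sum%C3%A9.txt
print(decode_file_path(encoded) == name)  # True
```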
A protocol or system that utilises the file URI scheme file URIs used in that protocol or system, and
There are many security considerations for URI schemes discussed in [RFC3986].
File access and the granting of privileges for specific operations are complex topics, and the use of file URIs can complicate the security model in effect for file privileges. Under no circumstance should software using file URIs grant greater access than would be available for other file access methods.
This document does not modify the existing entry in the URI Schemes registry [IANA-URI-Schemes], except by updating its reference RFC.
This specification is derived from RFC 1738 [RFC1738], RFC 3986 [RFC3986], and I-D draft-hoffman-file-uri (expired) [I-D.draft-hoffman-file-uri]; the acknowledgements in those documents still apply.
[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3986]  Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, January 2005.