TOC |
|
This document contains a precise specification of how browsers process URLs. The behavior specified in this document might or might not match any particular browser, but browsers might be well-served by adopting the behavior defined herein.
If you have suggestions for improving this document, please send email to mailto:public-iri@w3.org. Further Working Group information is available from https://tools.ietf.org/wg/iri/.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress.”
This Internet-Draft will expire on May 13, 2011.
Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
1.
Open Issues
2.
Definitions
3.
Parsing a URL
3.1.
Finding the scheme
3.2.
Finding the authority, path, query, and fragment
3.3.
Finding the user information, host, and port
3.4.
Find the user name and password
4.
Resolving a string relative to a base URL
4.1.
Resolving a string as a relative URL
4.2.
Resolving a string as a scheme-relative URL
4.3.
Resolving a string as an authority-relative URL
4.4.
Resolving a string as a query-relative URL
4.5.
Resolving a string as a fragment-relative URL
4.6.
Resolving a string as a path-relative URL
Appendix A.
Acknowledgements
§
Author's Address
TOC |
Browsers parse URLs differently depending on which operating system they're running on. The problem is that they want to do sensible things for file paths, but file paths look different on Windows and Unix systems.
How should we handle cases where browsers disaggree with the regular expression in RFC 3986? Currently, this document aims to describe how browsers behave, but we'll likely need to compare that to RFC 3986 at some point. Some specific differences that have been brought up on the mailing list:
TOC |
A /control character/ is a character whose value is less than or equal to U+0020 (" ").
A /slash character/ is either U+???? ("/") or U+???? ("\").
An /authority terminating characters/ is either a slash charcter, U+???? ("?"), U+???? ("#"), or U+???? (";").
During a parsing algorithm, the /remaining string/ are the characters of the input that have not yet been consumed.
TOC |
Given a string of characters, find the scheme, as described in Section ??.
If the URL is invalid:
-> Abort these steps.
If the scheme is a single upper or lower case ASCII character:
-> TODO: Windows drive specs!
If the scheme is a ASCII case-insensitive match for "file":
-> TODO: File URLs!
If the scheme is a ASCII case-insensitive match for "mailto":
-> TODO: I think mailto URLs are special, but more testing is required.
If the scheme is hierarchical:
-> In the after-scheme, if any, find the authority, path, query, and fragment, as decribed in Section ??.
-> In the authority, if any, find the user information, host, and port, as described in Section ??.
-> In the user-info, if any, find the user name and password, as described in Section ??.
-> Abort these steps.
The remaining string is the /path/.
TOC |
Consume all leading and trailing control characters.
If the remaining string does not contain a ":" character:
-> The URL is invalid.
-> Abort these steps.
Consume characters up to, but not including, the first ":" character. These characters are the /scheme/.
Consume the ":" character.
The reamining characters are the /after-scheme/.
TOC |
Consume any number of slash characters.
If the remaining string does not contain any authority terminating characters:
-> The remaining string is the /authority/.
-> Abort these steps.
Consume characters up to, but not including, the first authority terminating character. The consumed characters are /authority/.
If the remaining string does not contain a "?" character or a "# character:
-> The remaining string is the /path/.
-> Abort these steps.
Consume characters up to, but not including, the first "?" or "#" charcter. The consumed characters are the /path/.
If the first character of the remaining string is a "?" character:
-> Consume the "?" character.
-> If the remaining string does not contain a "#" character:
-> The remaining string is the /query/.
-> Abort these steps.
-> Consume characters up to, but not including, the first "#" charcter. The consumed characters are the /query/.
Consume the "#" character.
The remaining string is the /fragment/.
TOC |
If the remaining string contains an "@" character:
-> Consume characters up to, but not including the *last* "@" character. The consumed characters are the /user-info/.
-> Consume the "@" character.
If the remaining string does not contain an ":" character:
-> The remaining string is the /host/.
-> Abort these steps.
If the first character of the remaining string is a "[" character, the remaining string contains a "]" character, and the last ":" character in the remaining string occurs before the last "]" character in the remaining string:
-> The remaining string is the /host/.
-> Abort these steps.
Consume characters up to, but not including, the last ":" character. The consumed characters are the /host/.
Consume the ":" character.
The remaining string is the /port/.
TOC |
If the remaining string does not contain a ":" character:
-> The remaining string is the /user/.
-> Abort these steps.
Consume characters up to, but not including, the first ":" character. The consumed characters are the /user/.
Consume the ":" character.
The remaining string is the /password/.
TOC |
Given a string /spec/ and a ParsedURL /base-url/, find the scheme of spec.
TODO: We probably need to trim leading and trailing control characters.
If spec is an invalid URL:
-> The resolved URL is spec resolved as relative URL.
-> Abort these steps.
If spec's scheme contains any characters which are not "valid scheme characters" (TODO: Define valid scheme characters):
-> The resolved URL is spec resolved as relative URL.
-> Abort these steps.
If base-url's scheme is an ASCII case insensitive match for spec's scheme and the shared scheme is hierarchical:
-> The resolved URL is spec's after-scheme resolved as a relative URL.
-> Abort these steps.
The resolved URL is spec parsed as an absolute URL.
TOC |
Given a string /spec/ and a ParsedURL /base-url/...
TODO: If base-url's scheme is not hierarchical, we can't resolve as a relative URL. We'll probably want to return an invalid URL. Check what happens when resolving an empty string as a relative URL with a non-hierarchical base.
If spec is empty:
-> The resolved URL is identical to base-url, with the fragment, if any, removed.
-> Abort these steps.
If the first character of spec is a slash character:
-> If spec has at least two characters and the second character is also a slash character:
Otherwise:-> The resolved URL is spec resolved as a scheme-relative URL.
-> The resolved URL is spec resolved as an authority-relative URL.
-> Abort these steps.
If the first character of spec is a "?" character:
-> The resolved URL is spec resolved as a query-relative URL.
-> Abort these steps.
If the first character of spec is a "#" character:
-> The resolved URL is spec resolved as a fragment-relative URL.
-> Abort these steps.
The resolved URL is spec resolved as a path-relative URL.
TOC |
Given a string /spec/ and a ParsedURL /base-url/, let resolved-url be
The resolved URL is resolved-url parsed as an absolute URL.
TOC |
Given a string /spec/ and a ParsedURL /base-url/, let resolved-url be
The resolved URL is resolved-url parsed as an absolute URL.
TOC |
Given a string /spec/ and a ParsedURL /base-url/, let resolved-url be
The resolved URL is resolved-url parsed as an absolute URL.
TOC |
Given a string /spec/ and a ParsedURL /base-url/, let resolved-url be
The resolved URL is resolved-url parsed as an absolute URL.
TOC |
TODO: Handle path-relative URLs. This requires a bunch of path and dot semantics.
TOC |
TODO
TOC |
Adam Barth | |
Google, Inc. | |
Email: | ietf@adambarth.com |
URI: | http://www.adambarth.com/ |