Internet DRAFT - draft-daigle-appidarch
draft-daigle-appidarch
Internet Engineering Task Force L. Daigle
Internet-Draft TCE
Intended status: Informational March 2015
Expires: August 31, 2015
Internet Application Identifier Architecture
draft-daigle-appidarch-00.txt
Abstract
This document outlines a general architecture for Internet
applications, through the perspective of applications identifiers.
It provides a survey of past approaches, drawing out common elements
and highlighting common traps and roadblocks.
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on August 31, 2015.
Copyright Notice
Copyright (c) 2015 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (http://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components
extracted from this document must include Simplified BSD License text
as described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Simplified BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Basic components of Application Identifier Architecture . . . 2
3. Application Identifier Architectures in More Detail . . . . . 2
3.1. System . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Daigle Expires August 31, 2015 [Page 1]
Internet-Draft App ID Arch March 2015
3.2. Identifiers . . . . . . . . . . . . . . . . . . . . . . . 3
3.3. Identified . . . . . . . . . . . . . . . . . . . . . . . . 3
4. Survey of existing work . . . . . . . . . . . . . . . . . . . 5
4.1. Domain Name System . . . . . . . . . . . . . . . . . . . . 5
4.2. World Wide Web . . . . . . . . . . . . . . . . . . . . . . 6
4.3. Classic URIs . . . . . . . . . . . . . . . . . . . . . . . 8
4.4. IP addresses . . . . . . . . . . . . . . . . . . . . . . . 10
5. Common design choices and challenges . . . . . . . . . . . . . 10
5.1. Identifiers . . . . . . . . . . . . . . . . . . . . . . . 10
5.2. Identified . . . . . . . . . . . . . . . . . . . . . . . . 10
6. Issues in (mis)using identifiers . . . . . . . . . . . . . . . 11
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11
8. Security Considerations . . . . . . . . . . . . . . . . . . . 11
9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 11
1. Introduction
This document posits a high level architecture of Internet
application identifier systems, as well as a survey of IETF efforts
dealing in standardization of Internet applications and services
using those identifier systems.
Status of this revision: this is a very drafty -00 document. The
hope and expectation is that it is enough to stimulate some thought
and discussion to flesh out future documents.
2. Basic components of Application Identifier Architecture
There are 3 basic components of an Application Identifier
Architecture:
o System
o Identifiers
o Identified
The System is the context in which the identifiers are created, used
and in which they are intended to make sense. This is usually
transparent, except when identifiers are used outside of this
context, causing greater or lesser problems over time. This is
discussed in more detail, below.
Identifiers are typically strings of bits or characters. They may
have multiple representations.
The concept of what is being Identified is also dependent on the
System -- whether it's Internet hosts, services, documents, parts of
documents, people or other actors from the physical world, their
representatives in the Internet, etc.
3. Application Identifier Architectures in More Detail
3.1. System
Daigle Expires August 31, 2015 [Page 2]
Internet-Draft App ID Arch March 2015
As noted above, the System is the context in which the identifiers
make sense. In the Domain Name System, for example, the system
initially consisted of the set of hosts attached to the Internet, and
it has generalized to the set of addressable Internet services.
These are organized into ?domains?, which are operated under the
control of a single entity, and individual domains are completely
independent of each other.
3.2. Identifiers
Identifiers may identify a thing that exists ? content, service,
location ? or is posited to exist. Typical actions on identifiers
include:
o "Minting" -- creation of an identifier, usually including
association with the identified thing
o Transformation -- changing the bits (characters) of the identifier
by some set of rules and/or to conform to some structure; relative
or absolute
o Comparison -- of identifiers. Are 2 different identifiers the
same? Identifying the same thing? Expressing relationship
between things?
o Resolution -- using the identifier to access what it identifies
o Validation -- confirmation that the identifier conforms to the
system?s rules (syntax)
o Status check -- has the identifier been minted? Is it active
within the system?
o Authentication -- confirmation that the identifier association is
valid (as minted)
o Lookup -- some systems support look up ? finding identifier
entries based on partial fragments, typically leading characters
(bits)
o Search -- some systems support searches for identifiers based on
fragments of the related data
o Subscribe -- subscribing to an identifier allows you to get
periodic updates as to state of the identifier/identified.
3.3. Identified
Identifiers can be associated with just about any level of concept,
construct, network or software element, or entity in the physical
world. The range of possible identified things is generally scoped
by the System.
Daigle Expires August 31, 2015 [Page 3]
Internet-Draft App ID Arch March 2015
From the perspective of application architectures, there are 4 levels
of things that may be identified, and may have individual
identifiers:
o Entity/resource -- the thing itself. For example, the published
work "Moby Dick"
o Instance -- a specific copy of the thing, e.g., a copy of "Moby
Dick".
o Properties -- the characteristics of the thing. The set of
properties discussed is generally constrained by the System.
o Relationship -- an identifier may identify something as "part-of"
a larger entity.
There are actions that may be performed on the things identified:
o Assign properties -- associate values with properties of
identified item
o Get properties
o Intrinsic -- E.g., format, number of words
o Applied -- director's cut
o Publish -- put copy somewhere
o First instance
o Cache/replica
o Get (a copy)
o Any copy
o Specific service
o Closest
o Cheapest
o Authenticate -- confirm (cryptographically) that the resource is
genuinely the one expected / related to identifier
o Comparison
o Equivalence
o Send -- a reflection of "get"?
o Subscribe
Daigle Expires August 31, 2015 [Page 4]
Internet-Draft App ID Arch March 2015
o Search
4. Survey of existing work
4.1. Domain Name System
The Domain Name System (DNS) was designed to provide identifiers to
allow storage and retrieval of (sets of) properties associated with
Internet hosts and services ? real and virtual.
DNS identifiers are hierarchical, dot-separated labels, typically
expressed as characters. Host names are a subset of domain name
identifiers, with some restrictions on the permissible characters.
o "Minting" -- the authority for a domain can create labels within
that domain. So-called "synthetic" domain labels are created
dynamically.
o Transformation -- relative domain names can be understood within
the context of a ?search domain?
o Comparison -- domain names are matched on an octet-by-octet basis
(IDNs are not considered here)
o Resolution -- DNS resolution means "DNS lookup" -- using the DNS
infrastructure to retrieve resource records associated with the
domain name. Resolution can be tailored to retrieve particular
types of resource records (e.g., A records, or AAAA records, or MX
records)
o Validation -- any string that conforms to DNS syntax may be
considered valid.
o Status check -- DNS does not distinguish between "inactive" and
"not minted". That is, either every label in the hierarchical
domain name is accessible in an authoritative domain server (in
which case the domain name is "minted" and "active") or the DNS
will return the value that it does not exist. (Not true in
DNSSEC?)
o Authentication -- domain names are not authenticated (see below
for DNSSEC).
o Lookup -- DNS resolution is lookup of domain names
o Search -- DNS does not support search (wildcards?)
o Subscribe -- N/A
DNS identifies resource records. The resource records are
themselves descriptions of Internet hosts, services, and other
Daigle Expires August 31, 2015 [Page 5]
Internet-Draft App ID Arch March 2015
information stored in the DNS, but fundamentally the DNS identifier
is for resource records.
o Entity/resource -- a set of resource records
o Instance -- copies of resource records may be stored in caching
servers; there are no special identifiers to distinguish primary
or cached results
o Properties -- N/A
o Relationship -- N/A
There are actions that may be performed on the things identified:
o Assign properties -- resource records have time to live (TTL) and
serial numbers included
o Get properties -- parsed as part of the response from the server.
o Publish -- publishing a DNS resource record amounts to updating a
DNS zone file with the record.
o Get (a copy) -- resolve the domain name identifier
o Authenticate -- DNSSEC is used to authenticate that the resource
records/response received for domain name resolution are as they
were published.
o Comparison -- of RRs?
o Send -- N/A
o Search -- N/A
4.2. World Wide Web
The World Wide Web (WWW) is largely defined by the HTTP protocol.
"Pages" defined in HTML are the primary design target, though these
days much content of varying formats is delivered via the HTTP
protocol. For the sake of discussion, we'll say that WWW identifiers
are HTTP scheme URIs.
o "Minting" -- typically, HTTP URIs are not composed consciously, so
much as assembled practically with components of the domain name
hosting the web server and some path components based on how the
website is laid out hierarchically (which may or may not relate to
Daigle Expires August 31, 2015 [Page 6]
Internet-Draft App ID Arch March 2015
an underlying file structure)
o Transformation -- HTTP URIs may be relative (to current page in
the "hierarchy", current domain authority etc)
o Comparison -- HTTP URIs are not inherently comparable except by
characterwise comparision or determining relative relationships
o Resolution -- HTTP URIs are resolved by parsing the authority
component from the URI, connecting to the server, and requesting
the resource associated with the path part of the URI.
o Validation -- any string that conforms to HTTP URI syntax may be
considered valid.
o Status check -- HTTP does not distinguish between "inactive" and
"not minted". That is, either the HTTP server is available and
the resource is found on it (in which case the URI is "minted" and
"active") or the server (or path) are not found.
o Authentication -- HTTP URIs are not authenticated (see below for
authentication of servers).
o Lookup -- N/A
o Search -- HTTP does not support search (within server?)
o Subscribe -- N/A
HTTP URIs identify "web pages" or "resources".
o Entity/resource -- web content (page)
o Instance -- copies of pages may be stored in caching servers;
there are no special identifiers to distinguish primary or cached
results
o Properties -- properties may be embedded within the HTML document,
but there are no special identifiers to query/retrieve properties;
as part of the HTTP protocol, capabilities may be negotiated
o Relationship -- relative URIs (?)
There are actions that may be performed on the things identified:
o Assign properties -- the web server may assign properties to a web
page.
o Get properties --parsed as part of the response from the server.
o Publish -- publishing a web page is done on the backend, out of
band of the WWW system
o Get (a copy) -- resolve the URI
Daigle Expires August 31, 2015 [Page 7]
Internet-Draft App ID Arch March 2015
o Authenticate -- certs are used, within HTTP, to authenticate the
server is empowered to operate for a given domain name.
Individual pages are not authenticated.
o Comparison -- many web pages look alike -- but there is no
inherent way to claim two web pages (documents) are "the same".
o Send -- N/A
o Search -- within the WWW there is no support for search (all
search is achieved as an external system).
4.3. Classic URIs
The advent of the WWW heralded a burst of development of standards
for applications and content on the Internet. Much work was done to
elaborate a general system of identifiers, supporting a broad range
of application needs -- Uniform Resource Identifiers in the large,
encompassing Locators (identifiers of Internet "location"), Names
(persistent, location-independent identifiers for resources),
Characteristics (metadata about resources), Agents (for composing
actions).
In the large, the classic perspective on URIs was that they would
identify all resources (documents, services, media, components,
classes, parameters etc) that would be referenced within Internet
protocols.
o "Minting" -- dependent on the URI scheme. The HTTP URI scheme is
outlined above as a dynamic URI creation example. Some
(namespaces of) URNs require more explicit minting of identifiers
to be used in URNs (e.g., ISBN URNs).
o Transformation -- dependent on the URI scheme. URNs were
intended to be (authoritatively) transformed into URLs identifying
the location of the desired resource at a given point in time.
o Comparison -- Scheme-dependent -- there is no URI-wide support for
comparing URIs (except byte-wise equality).
o Resolution -- Scheme-dependent. Some URI schemes do not have
Internet-based resolution capabilities.
o Validation -- any string that conforms to URI syntax may be
considered valid. Individual schemes may include validation
services (e.g., out of band lookup services, built-in checksums,
etc).
o Status check -- Generally, URIs do not distinguish between
"inactive" and "not minted". That is, successful resolution
Daigle Expires August 31, 2015 [Page 8]
Internet-Draft App ID Arch March 2015
implies minted, unsuccessful resolution is ambiguous. Individual
schemes may provide more refined methods of confirmation of status
(SIP URIs?)
o Authentication -- There is no URI-wide support for URI
authentication.
o Lookup -- N/A
o Search -- URIs are not inherently searchable
o Subscribe -- N/A
Classically, URIs identify Internet "resources". However, URIs
have been found in contexts that are disjoint from the Internet
(e.g., XML component identification). "Resource" is deliberately
general -- could be documents, services, etc. Each URI scheme
refines its scope of intended resources.
Entity/resource -- typically Internet content or service, but see
comment above.
Instance -- Most URI schemes do not support identification of
instances. However, URNs are intended to identify locations of
multiple instances of a given resource.
Properties -- URIs may identify URCs (Uniform Resource
Characteristics) an articulation of the properties of a given
resource. (URCs were never finally standardized).
Relationship -- relative URIs (?)
There are actions that may be performed on the things identified:
o Assign properties -- publish a URC
o Get properties -- retrieve a URC associated with the URI.
o Publish -- scheme-dependent
o Get (a copy) -- resolve the URI
o Authenticate -- certs are used, within some URI schemes, to
authenticate the server is empowered to operate for a given domain
name. Individual resources are not authenticated. (Unless you
have a URC with a checksum?)
o Comparison -- many resources look alike -- but there is no
inherent way to claim two resources are "the same".
o Send -- N/A
o Search -- within the URI space there is no support for search (all
search is achieved as an external system).
Daigle Expires August 31, 2015 [Page 9]
Internet-Draft App ID Arch March 2015
4.4. IP addresses
To be added... the interesting thing about IP addresses is
considering the context in which they are defined, and then
contrasting that with the places they turn up.
5. Common design choices and challenges
In creating a system, there are important common design choices that
need to be made. Sometimes the answer is implicit within the overall
design constraints of the system. Other times, considerable effort
is required to refine specifications and make appropriate choices.
As noted, these are common design questions. This is an area where
understanding of previous systems? design discussions can be
particularly helpful (in order not to repeat them needlessly).
5.1. Identifiers
In defining identifiers for a system, is the intention that they:
o Name something -- the identifier will be associated with one
entity (or instance of an entity), wherever that entity may be
located.
o Locate something -- identify the location of an entity (at some
point in time).
o Are Smart or dumb identifiers -- "smart" identifiers have
structures that can be parsed to determine something about the
thing identified (e.g., domain in which it is stored); "dumb"
identifiers are opaque and must be resolved within the system.
o Have uniqueness -- is the identifier/resource binding unique?
o Have scope -- is the uniqueness (or other properties) only
maintained within some limited scope, or is it global?
o Permanent -- what is the expected level of permanence of the
identifier's relevance (the ability to use it, the binding between
the identifier and the identified resource). Or, are they
transient identifiers?
5.2. Identified
The specifics of the identified resource need to be defined, as well.
o Instances -- can there be multiple instances of a single resource?
How can they be distinguished and/or how can two instances be
equated. This is important if one needs to be able to cache
instances or otherwise validate "local" copies.
o Scope of applicability -- what constitutes a "resource" in this
system?
Daigle Expires August 31, 2015 [Page 10]
Internet-Draft App ID Arch March 2015
6. Issues in (mis)using identifiers
Things like using IP addressed out of context of the routing system ?
assumptions about uniqueness and volatility may be improper.
7. IANA Considerations
This memo includes no request to IANA.
8. Security Considerations
This document is about considering applications systems. Security is
important to applications, but is not specifically called out here.
9. References
[1] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997, <http:/
/xml.resource.org/public/rfc/html/rfc2119.html>.
Author's Address
Leslie Daigle
ThinkingCat Enterprises
Leesburg, VA 20176
US
Email: ldaigle@thinkingcat.com
Daigle Expires August 31, 2015 [Page 11]