Dublin Core Workshop Series S. Weibel
Internet-Draft J. Kunze
draft-kunze-dc-00.txt C. Lagoze
9 February 1997
Expire in six months
Dublin Core Metadata for Simple Resource Description
1. Status of this Document
This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups. Note that other groups may also distribute working
documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as ``work in progress.''
To learn the current status of any Internet-Draft, please check the
``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).
Distribution of this document is unlimited. Please send comments
to weibel@oclc.org, or to the discussion list meta2@mrrl.lut.ac.uk.
2. Introduction
Finding information on the World Wide Web has become increasingly
problematic in proportion to the explosive growth of available resources.
Web indexing evolved rapidly to fill the demand for resource discovery
tools, but indexing, while enormously useful, is a poor substitute for
richer varieties of resource description.
An invitational workshop in March of 1995 brought together librarians,
digital library researchers, and text-markup specialists to address the
problem of resource description for networked resources. This activity
evolved into a series of related workshops and ancillary activities
that have become known collectively as the Dublin Core Metadata
Workshop Series. This report summarizes the state of this effort.
The initial motivation for the first workshop was simply to do
something that would improve the prospects for resource discovery on
the Web. Specifically, the goal was to identify a simple set of common
description elements that authors (or content managers) could embed in
their documents to promote their discovery -- something like a catalog
card for a network resource. The term "Dublin Core" applies to
this simple core of descriptive elements.
3. Simple Resource Description
The goals that motivate the Dublin Core effort are:
Simplicity of creation and maintenance
Commonly understood semantics
International scope and applicability
Extensibility
These requirements work at cross purposes to some degree, but all are
desirable goals. The ensuing two years of discussion have been to some
degree an exercise in minimizing the tensions among them.
The development of formal ontologies is currently a prominent line of
research in digital library communities, aimed at identifying the
structure of knowledge in a given discipline, and linking these
structures into a larger whole. In contrast, one might think of this
workshop series as an attempt to identify an "emergent ontology",
that is, a consensus among experienced practitioners across many
disciplines about the basic elements of resource discovery.
4. Description of Dublin Core Elements
The following comprises the reference definition of the Dublin Core
Metadata Element Set as of December 1996. The elements or their names
are not expected to change substantively from this list, though the
application of some of them is currently experimental and subject
to interpretation. Further, it is expected that practice will evolve
to include qualifiers for certain of the elements. The reference
description of the elements resides at
http://purl.org/metadata/dublin_core_elements
Note that elements have a descriptive name intended to convey a common
semantic understanding of the element. In addition, a formal, single-
word label is specified to make syntactic specification of elements
simpler in encoding schemes. Each element is optional and repeatable.
Element descriptions follow.
4.1. Title Label: TITLE
The name given to the resource by the CREATOR or PUBLISHER.
4.2. Author or Creator Label: CREATOR
The person(s) or organization(s) primarily responsible for the
intellectual content of the resource. For example, authors in the
case of written documents, artists, photographers, or illustrators
in the case of visual resources.
4.3. Subject and Keywords Label: SUBJECT
The topic of the resource, or keywords or phrases that describe
the subject or content of the resource. The intent of the
specification of this element is to promote the use of controlled
vocabularies and keywords. This element might well include
scheme-qualified classification data (for example, Library of
Congress Classification Numbers or Dewey Decimal numbers) or
scheme-qualified controlled vocabularies (such as Medical Subject
Headings or Art and Architecture Thesaurus descriptors) as well.
4.4. Description Label: DESCRIPTION
A textual description of the content of the resource, including
abstracts in the case of document-like objects or content
descriptions in the case of visual resources. Future metadata
collections might well include computational content description
(spectral analysis of a visual resource, for example) that may not
be embeddable in current network systems. In such a case this
field might contain a link to such a description rather than the
description itself.
4.5. Publisher Label: PUBLISHER
The entity responsible for making the resource available in its
present form, such as a publisher, a university department, or a
corporate entity. The intent of specifying this field is to
identify the entity that provides access to the resource.
4.6. Other Contributor Label: CONTRIBUTOR
Person(s) or organization(s) in addition to those specified in the
CREATOR element who have made significant intellectual contributions
to the resource but whose contribution is secondary to the individuals
or entities specified in the CREATOR element (for example, editors,
transcribers, illustrators, and convenors).
4.7. Date Label: DATE
The date the resource was made available in its present form. The
recommended best practice is an 8-digit number in the form YYYYMMDD
as defined by ANSI X3.30-1985. In this scheme, the date element for
the day this is written would be 19961203, or December 3, 1996.
Many other schemes are possible, but if used, they should be
identified in an unambiguous manner.
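As a minimal sketch of the YYYYMMDD practice recommended above, the
following Python fragment formats and parses such a value. The function
names to_dc_date and from_dc_date are hypothetical and chosen only for
this illustration; the sample date is the one given in the text.

```python
from datetime import date, datetime


def to_dc_date(d: date) -> str:
    """Format a date as an 8-digit YYYYMMDD string."""
    return d.strftime("%Y%m%d")


def from_dc_date(text: str) -> date:
    """Parse an 8-digit YYYYMMDD string; raise ValueError if malformed."""
    if len(text) != 8 or not (text.isascii() and text.isdigit()):
        raise ValueError(f"expected 8 digits (YYYYMMDD), got {text!r}")
    # strptime also rejects impossible dates such as month 13.
    return datetime.strptime(text, "%Y%m%d").date()


# The example from the text: December 3, 1996.
print(to_dc_date(date(1996, 12, 3)))  # 19961203
```

Values in any other scheme would need their own parser, which is why the
text asks that the scheme be identified unambiguously.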
4.8. Resource Type Label: TYPE
The category of the resource, such as home page, novel, poem, working
paper, technical report, essay, dictionary. It is expected that
RESOURCE TYPE will be chosen from an enumerated list of types.
4.9. Format Label: FORMAT
The data representation of the resource, such as text/html, ASCII,
PostScript file, executable application, or JPEG image. The intent
of specifying this element is to provide information necessary to
allow people or machines to make decisions about the usability of
the encoded data (what hardware and software might be required to
display or execute it, for example). As with RESOURCE TYPE, FORMAT
will be assigned from enumerated lists such as registered Internet
Media Types (MIME types). In principle, formats can include
physical media such as books, serials, or other non-electronic media.
4.10. Resource Identifier Label: IDENTIFIER
String or number used to uniquely identify the resource. Examples
for networked resources include URLs and URNs (when implemented).
Other globally-unique identifiers, such as International Standard
Book Numbers (ISBN) or other formal names would also be candidates
for this element.
4.11. Source Label: SOURCE
The work, either print or electronic, from which this resource
is derived, if applicable. For example, an HTML encoding of a
Shakespearean sonnet might identify the paper version of the
sonnet from which the electronic version was transcribed.
4.12. Language Label: LANGUAGE
Language(s) of the intellectual content of the resource. Where
practical, the content of this field should coincide with the
NISO Z39.53 three-character codes for written languages.
4.13. Relation Label: RELATION
Relationship to other resources. The intent of specifying this
element is to provide a means to express relationships among
resources that have formal relationships to others but exist as
discrete resources themselves (for example, images in a document,
chapters in a book, or items in a collection). A formal
specification of RELATION is currently under development. Users
and developers should understand that use of this element should
currently be considered experimental.
4.14. Coverage Label: COVERAGE
The spatial locations and temporal durations characteristic of the
resource. Formal specification of COVERAGE is currently under
development. Users and developers should understand that use of
this element should currently be considered experimental.
4.15. Rights Management Label: RIGHTS
The content of this element is intended to be a link (a URL or
other suitable URI as appropriate) to a copyright notice, a
rights-management statement, or perhaps a server that would
provide such information in a dynamic way. The intent of
specifying this field is to allow providers a means to associate
terms and conditions or copyright statements with a resource or
collection of resources. No assumptions should be made by users
if such a field is empty or not present.
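As an illustrative sketch only, and not part of the reference
definition, the element set above can be modeled as a fixed list of
labels plus a record in which every element is optional and repeatable.
The names DC_LABELS and make_record are hypothetical; the sample values
are taken from the header of this document.

```python
# The fifteen single-word labels defined in sections 4.1 through 4.15.
DC_LABELS = (
    "TITLE", "CREATOR", "SUBJECT", "DESCRIPTION", "PUBLISHER",
    "CONTRIBUTOR", "DATE", "TYPE", "FORMAT", "IDENTIFIER",
    "SOURCE", "LANGUAGE", "RELATION", "COVERAGE", "RIGHTS",
)


def make_record(pairs):
    """Build a record from (label, value) pairs.

    Each element is optional and repeatable, so the record maps a label
    to a list of values and omits labels that were never supplied.
    Labels outside the element set are rejected.
    """
    record = {}
    for label, value in pairs:
        key = label.upper()
        if key not in DC_LABELS:
            raise ValueError(f"not a Dublin Core label: {label!r}")
        record.setdefault(key, []).append(value)
    return record


record = make_record([
    ("TITLE", "Dublin Core Metadata for Simple Resource Description"),
    ("CREATOR", "S. Weibel"),
    ("CREATOR", "J. Kunze"),
    ("CREATOR", "C. Lagoze"),
    ("DATE", "19970209"),
])
print(record["CREATOR"])  # ['S. Weibel', 'J. Kunze', 'C. Lagoze']
```

Storing a list per label keeps repeated elements in the order supplied,
and an absent label simply has no entry.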
5. Security Considerations
The Dublin Core element set poses no risk to computers and networks.
It poses minimal risk to searchers who obtain incorrect or private
information due to careless mapping from rich data descriptions to
the simple Dublin Core scheme. No other security concerns are likely
to be affected by the element description consensus documented here.
6. References
[1] Weibel, S., Miller, E., "Dublin Core Metadata Element Set:
Reference Description",
http://purl.org/metadata/dublin_core_elements
7. Authors' Addresses
Stuart L. Weibel
OCLC Online Computer Library Center, Inc.
Office of Research
6565 Frantz Rd.
Dublin, Ohio, 43017, USA
Email: weibel@oclc.org
Voice: +1 614-764-6081
Fax: +1 614-764-2344
John A. Kunze
Center for Knowledge Management
University of California, San Francisco
530 Parnassus Ave, Box 0840
San Francisco, CA 94143-0840, USA
Email: jak@ckm.ucsf.edu
Voice: +1 415-502-6660
Fax: +1 415-476-4653
Carl Lagoze
Digital Library Research Group
Department of Computer Science
Cornell University
Ithaca, NY 14853, USA
Email: lagoze@cs.cornell.edu
Voice: +1-607-255-6046
Fax: +1-607-255-4428
APPENDIX: A Proposed Convention for Embedding Metadata in HTML.
The following proposed convention reflects the consensus of a break-out
group at the W3C Distributed Indexing and Searching Workshop, May 28-29,
1996, concerning tagging of meta information in HTML. This break-out
group included representatives of the Dublin Core/Warwick Framework
Metadata meetings, Lycos, Microsoft, WebCrawler, the IEEE metadata effort,
Verity Software, and the W3C.
Attendees (alphabetically):
Nick Arnett          narnett@verity.com
Mic Bowman           bowman@transarc.com
Eliot Christian      echristi@usgs.gov
Dan Connolly         conolly@w3.org
Martijn Koster       m.koster@webcrawler.com
John Kunze           jak@ckm.ucsf.edu
Carl Lagoze          lagoze@cs.cornell.edu
Michael Mauldin      fuzzy@lycos.com
Christian Mogensen   christian@vivid.com
Wick Nichols         wickn@microsoft.com
Timothy Niesen       tmn@swl.msd.ray.com
Stuart Weibel        weibel@oclc.org
Andrew Wood          woody@dstc.edu.au
1. The Problem
The problem is to identify a simple means of embedding metadata within HTML
documents without requiring additional tags or changes to browser software,
and without unnecessarily compromising current practices for robot
collection of data.
While metadata is intended for display in some situations, it is judged
undesirable for such embedded metadata to display on browser screens as
a side effect of displaying a document. Therefore, any solution requires
encoding information in tag attributes rather than as container element
content.
The goal was to agree on a simple convention for encoding structured
metadata information of a variety of types (which may or may not be
registered with a central registry analogous to the MIME type registry).
It was judged that a registry may be a necessary feature of the metadata
infrastructure as alternative schemas are elaborated, but that deployment
in the short-term could go forward without such a registry, especially
in light of the proposed use of the LINK tag to link descriptions to a
standard schema description as described below.
2. A Proposed Convention
The solution agreed upon is to encode schema elements in META tags, one
element per META tag, and as many META tags as are necessary. Grouping of
schema elements is achieved by a prefix schema identifier associated with
each schema element. The convention agreed upon is as follows:
Thus, a partial Dublin Core citation might be encoded as follows:
And a collection of Microsoft Word metadata might be encoded as follows:
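As a minimal sketch of the convention described above (one META tag per
element, with each element name prefixed by a schema identifier), the
following Python function renders such tags. The prefix "DC", the dot
used to join prefix and element name, and the function name meta_tags
are assumptions made for this illustration; the element labels are
those of section 4 and the sample values come from this document's
header.

```python
from html import escape


def meta_tags(schema_id, elements):
    """Render one META tag per (element, value) pair.

    ASSUMPTION: the schema identifier and element name are joined with
    a dot; the exact separator is illustrative only.
    """
    lines = []
    for name, value in elements:
        lines.append(
            '<META NAME="%s.%s" CONTENT="%s">'
            % (escape(schema_id, quote=True),
               escape(name, quote=True),
               escape(value, quote=True))
        )
    return "\n".join(lines)


print(meta_tags("DC", [
    ("TITLE", "Dublin Core Metadata for Simple Resource Description"),
    ("CREATOR", "S. Weibel"),
    ("CREATOR", "J. Kunze"),
]))
```

Because all values travel in attributes, nothing is rendered on the
browser screen, and repeated elements simply produce repeated tags.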
3. Linkage to the Reference Description of a Metadata Schema
It is judged useful to provide a means for linking to the reference
definition of the metadata schema (or schemata) used in a document. Doing
so serves as a primitive registration mechanism for metadata schemata, and
lays the foundation for a more formal, machine-readable linkage mechanism
in the future. The proposed convention for doing so is as follows:
Thus, the reference description of one metadata scheme, the Dublin Core
Metadata Element Set, would be referenced in the LINK HREF as follows:
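A sketch of how such a LINK tag might be generated is given here in
Python. The HREF is the reference description cited in this document;
the REL value of the form "SCHEMA.DC", the prefix "DC", and the
function name schema_link are assumptions made for this illustration
only.

```python
from html import escape


def schema_link(schema_id, href):
    """Render a LINK tag pointing at a schema's reference description.

    ASSUMPTION: the REL value "SCHEMA.<identifier>" is hypothetical and
    used here only to tie the link to the prefix of the META names.
    """
    return '<LINK REL="SCHEMA.%s" HREF="%s">' % (
        escape(schema_id, quote=True),
        escape(href, quote=True),
    )


print(schema_link("DC", "http://purl.org/metadata/dublin_core_elements"))
```

One such tag would be emitted per schema used in the document.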
The description of an element could be accessed by constructing a URL
using the # token to identify a named anchor. Thus, the derived URL below
actually links to the title element in the reference description of the
Dublin Core Metadata Element Set.
http://purl.org/metadata/dublin_core_elements#title
This URL would correspond to the human-readable description of the title
element within the document by a NAME anchor such as:
<A NAME="title">Title</A>
The name of the work provided by the author or publisher.
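The derivation of an element URL from the schema URL can be sketched in
Python as follows. The function name element_url is hypothetical; the
schema URL and the anchor name "title" are the ones used above.

```python
from urllib.parse import urldefrag


def element_url(schema_url, element_name):
    """Append a named anchor to the schema's reference URL."""
    # Drop any fragment already present so only one # token remains.
    base, _ = urldefrag(schema_url)
    return base + "#" + element_name


print(element_url("http://purl.org/metadata/dublin_core_elements", "title"))
# http://purl.org/metadata/dublin_core_elements#title
```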
While use of the LINK tag is not required for a given schema, when used,
it will make possible retrieval of the reference definition of a given
schema element, and will therefore reduce the need for a formal metadata
scheme registry. Multiple LINK tags can be used so that elements derived
from multiple schemas can be referenced within a single document.
4. Consistency of Description Schemas
To promote consistency among resource description schemas, it is suggested
that the semantics for metadata elements be related to existing well-known
schemas whenever feasible.