Internet DRAFT - draft-reddy-dasl-requirements
DAV Searching and Locating March 1998
INTERNET-DRAFT S. Reddy
draft-reddy-dasl-requirements-02.txt Microsoft Corporation
March, 1998 J. Slein
Expires July, 1998 Xerox Corporation
Requirements for DAV Searching and Locating
Status of this Memo
This document is an Internet draft. Internet drafts are working
documents of the Internet Engineering Task Force (IETF), its areas
and its working groups. Note that other groups may also distribute
working information as Internet drafts.
Internet Drafts are draft documents valid for a maximum of six
months and can be updated, replaced or obsoleted by other documents
at any time. It is inappropriate to use Internet drafts as
reference material or to cite them other than as "work in
progress".
To learn the current status of any Internet draft please check the
"1id-abstracts.txt" listing contained in the Internet drafts shadow
directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East coast) or
ftp.isi.edu (US West coast). Further information about the IETF can
be found at URL: http://www.ietf.org/
Distribution of this document is unlimited. Editorial comments
should be sent to the author (saveenr@microsoft.com).
Abstract
The Distributed Authoring and Versioning protocol [WEBDAV] defines
simple mechanisms to assign and retrieve values for properties.
This document presents requirements for a WEBDAV extension to
support efficient searching for resources based on WEBDAV
properties and content. These requirements are intended to be the
basis for the DAV Searching and Locating (DASL) protocol.
1 Introduction
1.1 Existing DAV searching mechanisms
WEBDAV and HTTP provide support for client-side search, but not
server-side search. The GET method defined in [HTTP] allows
clients to retrieve a resource’s content; the PROPFIND method
defined in [WEBDAV] allows clients to retrieve a resource’s
properties. Having retrieved a resource’s properties and / or
content, the client can compare them to its search criteria to
determine whether the resource is of interest.
1.2 Limitations of Client-side Searching
Client-side searching requires no modifications to the server.
However, simplicity for the server comes at a cost:
(1) It makes inefficient use of network resources. Clients must
retrieve properties and content for each resource under
consideration.
(2) It does not take advantage of server intelligence. Servers
capable of searching can use sophisticated mechanisms to
generate results: internal caching of intermediate search
results, content-indexing, etc.
Even simple, common queries may expose these limitations. Consider
the query "find all text files modified during the last week." When
such a query is extended to a large number of clients searching
against a single server, the limitations become more apparent.
Client-side searching has difficulties scaling in these cases.
1.3 Server-side Searching
DASL allows for server-side searching. Server-side searching allows
the client to formulate a query and have the server perform the task of
selecting the resources that fit the criteria. This overcomes both
of the limitations of client-side searching described above. The
benefit is a searching solution that scales; the cost is that the
server software becomes more complex.
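The trade-off described in sections 1.2 and 1.3 can be illustrated with
a minimal sketch. It is not a WEBDAV client: the in-memory RESOURCES
table, the property names, and the fixed dates are all hypothetical,
and each dictionary lookup merely stands in for one retrieval of a
resource's properties. The sketch counts simulated round trips for the
query "find all text files modified during the last week".

```python
from datetime import datetime, timedelta

# Hypothetical in-memory stand-in for a server's resources and properties.
RESOURCES = {
    "/docs/a.txt": {"content-type": "text/plain", "modified": datetime(1998, 3, 9)},
    "/docs/b.gif": {"content-type": "image/gif", "modified": datetime(1998, 3, 8)},
    "/docs/c.txt": {"content-type": "text/plain", "modified": datetime(1998, 1, 2)},
}

def matches(props, since):
    """Search criteria: a text file modified on or after 'since'."""
    return props["content-type"] == "text/plain" and props["modified"] >= since

def client_side_search(since):
    """Client retrieves every resource's properties, then filters locally."""
    round_trips = 0
    hits = []
    for uri in sorted(RESOURCES):
        props = RESOURCES[uri]  # stands in for one property retrieval
        round_trips += 1
        if matches(props, since):
            hits.append(uri)
    return hits, round_trips

def server_side_search(since):
    """Client sends one query; the server filters and returns the matches."""
    hits = [uri for uri in sorted(RESOURCES) if matches(RESOURCES[uri], since)]
    return hits, 1

NOW = datetime(1998, 3, 10)          # fixed so the example is repeatable
WEEK_AGO = NOW - timedelta(days=7)
```

Both functions return the same hits; under these assumptions the
client-side variant needs one round trip per resource in the scope,
while the server-side variant needs one in total.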
2 Terminology
2.1 DASL Terms
2.1.1 Search Criteria
Search criteria are an expression against which each resource in
the search scope is evaluated. Those resources for which the
expression evaluates to True are included in the result set.
2.1.2 Search Expression
A Search Expression is a Search Term, the negation of an expression
(using the Boolean NOT operator), or two expressions joined by one of
the Boolean operators (AND or OR). An expression evaluates to True,
False, or Unknown.
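This document does not define how NOT, AND, and OR treat Unknown. The
following sketch assumes the three-valued (Kleene) logic familiar from
SQL, and uses Python's None as a stand-in for Unknown; both choices are
assumptions made for illustration only.

```python
UNKNOWN = None  # assumption: None represents the "Unknown" truth value

def not3(a):
    """NOT: Unknown stays Unknown."""
    return UNKNOWN if a is UNKNOWN else (not a)

def and3(a, b):
    """AND: False dominates; otherwise any Unknown makes the result Unknown."""
    if a is False or b is False:
        return False
    if a is UNKNOWN or b is UNKNOWN:
        return UNKNOWN
    return True

def or3(a, b):
    """OR: True dominates; otherwise any Unknown makes the result Unknown."""
    if a is True or b is True:
        return True
    if a is UNKNOWN or b is UNKNOWN:
        return UNKNOWN
    return False
```

Under these rules, "False AND Unknown" is False and "True OR Unknown"
is True, because the known operand already decides the outcome.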
2.1.3 Search Term
A Search Term is an assertion about a resource. The term may assert
that: (1) a property has a relationship to some value, (2) a
property exists, or (3) the content of a resource has a
relationship to some value.
2.1.4 Result Set
The Result Set is a response to a search request. This is a set of
result records, one record for each resource that matches the
search criteria.
2.1.5 Result Record Definition
The Result Record Definition is the set of properties specified by
the client that it requests the server to transmit for each
resource that matches the criteria.
2.1.6 Result Record
A Result Record is a unit of information in the result set that corresponds
to a resource that matches the search criteria. The record consists
of those properties listed in the Result Record Definition.
2.1.7 Search Scope
The Search Scope is the set of resources to be searched.
Comparison Operator
A comparison operator is a function used in a search term that
evaluates the relationship between two values. Examples of
comparison operators are <, <=, >=, >, ==, and !=.
2.1.8 Sort Specification
A sort specification tells the server how to sort the result set.
2.1.9 Search Attribute
A Search Attribute is an instruction that governs the execution of
the query but is not part of the search scope, the result record
definition, the search criteria, or the sort specification. An
example of a search attribute is one that controls how much time the
server can spend on the query before giving a response.
2.1.10 Query
The Query is the combination of search criteria, search scope,
result record definition, sort specification, and search
attributes.
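The five parts of a Query can be pictured as one record. The sketch
below is only a mental model, not a proposed wire format: the class
name, field names, the representation of criteria as a callable, and
the "time_limit_seconds" attribute are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Properties = Dict[str, object]

@dataclass
class Query:
    # Search scope: the URIs whose resources are to be searched.
    scope: List[str]
    # Search criteria: returns True, False, or None (standing in for Unknown).
    criteria: Callable[[Properties], Optional[bool]]
    # Result record definition: names of the properties to return.
    record_definition: List[str]
    # Sort specification: (property name, ascending?) pairs.
    sort: List[Tuple[str, bool]] = field(default_factory=list)
    # Search attributes: instructions that govern execution of the query.
    attributes: Dict[str, object] = field(default_factory=dict)

example = Query(
    scope=["http://foo/bar"],
    criteria=lambda props: props["size"] < 10 * 1024,
    record_definition=["author"],
    sort=[("author", True)],
    attributes={"time_limit_seconds": 30},
)
```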
2.2 Additional Terms
In addition to the terms defined above, this document uses
terminology consistent with [HTTP] and [WEBDAV].
3 Query Semantics
3.1 General Requirements
3.1.1 Simple Searches on Content
It must be possible to perform simple searches on content of any
media type.
Searching for specific content inside a resource is a common
operation. DASL must provide a mechanism for searching the content
of a resource to support this scenario.
3.1.2 Variants
It must be possible for searches to occur across multiple variants
of a resource and to target specific variants.
The WEBDAV working group is addressing the standardization of
mechanisms for authors to use when submitting variants to the
server. DASL must provide mechanisms that can intelligently query
on those variants.
3.1.3 Versioning
It must be possible for searches to occur across multiple versions
of a resource and to target specific versions.
The WEBDAV working group is addressing the standardization of
mechanisms for authors to use when submitting versions to the
server. DASL must provide mechanisms that can intelligently query
on those versions.
3.2 Result Record Definition
The client must be able to identify the properties or content to be
returned in the result records.
Search criteria and search result records are not required to
overlap. For example, a query might ask for "the authors of those
documents under 10K in size". In this case, the criterion relates
only to the size, but the desired result record contains only the
author.
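The separation between criteria and result record definition can be
sketched as follows. The resource table, the property names "author"
and "size", and the search() helper are hypothetical and exist only to
illustrate the query "the authors of those documents under 10K in
size".

```python
# Hypothetical property sets for three resources.
RESOURCES = {
    "/docs/spec.txt": {"author": "Saveen", "size": 4096},
    "/docs/big.doc": {"author": "Judith", "size": 65536},
    "/docs/memo.txt": {"author": "Judith", "size": 2048},
}

def search(resources, criteria, record_definition):
    """Return one result record per matching resource, holding only the
    properties named in the result record definition."""
    results = {}
    for uri, props in resources.items():
        if criteria(props):
            results[uri] = {name: props.get(name) for name in record_definition}
    return results

def under_10k(props):
    """Search criteria: relates only to the size property."""
    return props["size"] < 10 * 1024

records = search(RESOURCES, under_10k, ["author"])
```

The criterion reads only "size", yet each record in the result holds
only "author".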
3.3 Scope
3.3.1 Scope Identification & Multiple Scopes
It must be possible for the client to specify a number of
different, unrelated URIs over which the search is to range.
3.3.2 Resource-Based Scopes
It should be possible to perform scoping within a resource. For
example, one may wish to limit a search to a single chapter within
a document.
3.3.3 Depth
It must be possible for the client to specify the "depth" of a
search for a search scope URI.
Users often intend to scope their searches either to the immediate
children of a container or to extend the search recursively to all
of the container's descendants. Furthermore, depth control is needed
to prevent servers from performing unnecessary work.
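One way to picture depth control is as a test on how many path levels
separate a candidate resource from the scope URI. The sketch below
assumes depth values of 0, 1, and "infinity", borrowed from the WEBDAV
Depth header; this document does not itself fix those values, and the
function name and path handling are illustrative only.

```python
def in_scope(scope_uri, candidate_uri, depth):
    """Return True if candidate_uri lies within 'depth' levels of scope_uri.

    depth is an int (0 = the scope resource only, 1 = it and its immediate
    children) or the string "infinity" for all descendants.
    """
    base = scope_uri.rstrip("/")
    cand = candidate_uri.rstrip("/")
    if cand == base:
        return True
    if not cand.startswith(base + "/"):
        return False  # not beneath the scope at all
    # Number of path segments below the scope URI.
    levels = cand[len(base) + 1:].count("/") + 1
    if depth == "infinity":
        return True
    return levels <= depth
```

With a scope of "/docs/", a depth of 1 admits "/docs/a.txt" but not
"/docs/sub/b.txt"; a depth of "infinity" admits both.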
3.4 Search Criteria
3.4.1 Simple Terms
3.4.1.1 Exact Matching
A query term must be able to compare the entire value of a property
to some constant value.
3.4.1.2 Regular Expression Matching
A query term must be able to compare a property to an expression
with the expressive power of regular expressions.
The power and frequent use of the UNIX utility GREP highlights the
value of regular expressions for searching large bodies of content.
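A regular expression term might be sketched with Python's standard re
module, as below. The helper name regex_term, the "title" property, and
the choice to report a missing property as None (standing in for
Unknown) are assumptions; the pattern syntax a DASL grammar would
adopt is not specified here.

```python
import re

def regex_term(property_name, pattern):
    """Build a search term that tests a property value against a pattern."""
    compiled = re.compile(pattern)

    def term(props):
        value = props.get(property_name)
        if value is None:
            return None  # assumption: missing property yields Unknown
        # search() matches anywhere in the value; anchor with ^ and $ to
        # require that the whole value match.
        return compiled.search(str(value)) is not None

    return term

is_numbered_draft = regex_term("title", r"^Draft-\d+$")
```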
3.4.1.3 Property Comparisons
It must be possible to specify criteria on "equal to" and "not
equal to" for all property values that can be compared. It must be
possible to support relative comparison operators (>, >=, <=, and <)
on those properties that can be ordered (for example, those
having numerical values).
Many common searches involve such comparisons. For example, a
stereotypical query might ask for "those documents under 10K in
size" or "those text files authored by Saveen".
DASL must support the ability to compare property values against
literal values, other property values, and expressions.
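The three kinds of operand named above (literal values, other property
values, and expressions) can be treated uniformly if each operand is a
function of the resource's properties. The sketch below takes that
approach; the helper names and the sample properties are hypothetical.

```python
import operator

# Comparison operators from section 2.1, mapped to Python functions.
OPERATORS = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}

def literal(value):
    """Operand that ignores the resource and yields a constant."""
    return lambda props: value

def prop(name):
    """Operand that yields the named property of the resource."""
    return lambda props: props[name]

def compare(left, op, right):
    """Build a search term from two operands and an operator symbol."""
    func = OPERATORS[op]
    return lambda props: func(left(props), right(props))

doc = {"size": 4096, "author": "Saveen", "editor": "Saveen", "quota": 2048}

under_10k = compare(prop("size"), "<", literal(10 * 1024))     # literal
self_edited = compare(prop("author"), "==", prop("editor"))    # property
within_double_quota = compare(                                 # expression
    prop("size"), "<=", lambda props: props["quota"] * 2)
```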
3.4.1.4 Content Comparisons
It must be possible to specify searches using content-based operators
such as NEAR, IN, CONTAINS, LIKE.
It must be possible to specify how linguistic stemming, phonetic
searching, truncation, keyword expansion, and case-sensitivity will
play a role in the search.
It must be possible to specify the relevance and ranking criteria
for content-based searches.
3.4.1.5 Existence Assertions
It must be possible to test for the existence or non-existence of a
property.
3.4.2 Complex Expressions
3.4.2.1 Logical Boolean Operators
It must be possible to use the logical Boolean operators (AND, OR,
NOT) in the search criteria to combine search expressions.
Often criteria involve the evaluation of several conditions
simultaneously. For example, a stereotypical query might ask for
"those documents modified by user X within some period of time Y."
Boolean operations are necessary to support these criteria.
3.4.2.2 Undefined properties and values
The behavior of a query when properties or their values are
undefined must be specified.
Undefined properties are those that do not exist. Their role in
query evaluation needs to be specified. Undefined values can occur
when properties are calculated from expressions like "x/y" where
y=0.
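One possible behavior, shown here purely as an assumption, is for a
term to evaluate to Unknown when a property is missing or when a
calculated value is undefined, and for only those resources whose
criteria evaluate to True to enter the result set. None stands in for
Unknown, and the property names "x" and "y" follow the example above.

```python
UNKNOWN = None  # assumption: None represents "Unknown"

def ratio_exceeds(props, numerator, denominator, threshold):
    """Term: props[numerator] / props[denominator] > threshold."""
    if numerator not in props or denominator not in props:
        return UNKNOWN  # undefined property
    if props[denominator] == 0:
        return UNKNOWN  # undefined value: "x/y" where y=0
    return props[numerator] / props[denominator] > threshold

def select(resources, term):
    """Keep only resources for which the term is True (not False, not Unknown)."""
    return [uri for uri, props in resources.items() if term(props) is True]

resources = {
    "/a": {"x": 10, "y": 2},   # 5.0 > 1: True
    "/b": {"x": 10, "y": 0},   # division by zero: Unknown
    "/c": {"x": 10},           # y does not exist: Unknown
    "/d": {"x": 1, "y": 2},    # 0.5 > 1: False
}

selected = select(resources, lambda p: ratio_exceeds(p, "x", "y", 1))
```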
3.4.2.3 Sort Order
DASL must define a mechanism to allow clients to specify a sort
order for the result set.
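A sort specification could be modeled as an ordered list of
(property, direction) pairs. The sketch below assumes that shape and
assumes every record carries every sort property; neither assumption
comes from this document.

```python
def sort_results(records, sort_spec):
    """Sort result records by a list of (property, ascending) pairs,
    most significant key first."""
    ordered = list(records)
    # Python's sort is stable, so sorting by the least significant key
    # first and the most significant key last gives a multi-key ordering.
    for name, ascending in reversed(sort_spec):
        ordered.sort(key=lambda rec: rec[name], reverse=not ascending)
    return ordered

records = [
    {"author": "Judith", "size": 2048},
    {"author": "Saveen", "size": 4096},
    {"author": "Judith", "size": 65536},
]

# Author ascending, then size descending within each author.
by_author_then_size = sort_results(records, [("author", True), ("size", False)])
```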
3.4.3 Other Query Attributes
3.4.3.1 Maximum Result Set
It must be possible to indicate that the search result must not
exceed some fixed number of records.
3.4.3.2 Paged Results
It must be possible to request paged results.
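The two attributes in this section can be combined in a short sketch:
a cap on the total number of records, followed by selection of one
page from what remains. The parameter names and the zero-based page
numbering are assumptions made for illustration.

```python
def page(results, page_size, page_number, max_results=None):
    """Return one page of an ordered result list.

    page_number starts at 0. If max_results is given, records beyond
    that limit are never returned, whatever page is requested.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    capped = results if max_results is None else results[:max_results]
    start = page_number * page_size
    return capped[start:start + page_size]

results = list(range(10))  # stand-in for ten result records
```

For ten records and a page size of four, pages 0 and 1 are full, page
2 holds the last two records, and page 3 is empty.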
3.5 Query Syntax
3.5.1 Standard Query Grammar
The DASL extensions must define a query grammar that provides
simple searching functionality.
For the sake of interoperability, DASL servers are expected to
offer a basic set of searching capabilities. Likewise, clients need
a standard, simple syntax by which to access those capabilities.
3.5.2 Support for Other Query Grammars
DASL extensions must allow servers to support other grammars.
A particular query grammar may not expose useful searching
functionality of a server. Clients should be allowed to query a
server using any grammar that takes advantage of those special
server capabilities.
3.5.3 Natural Language Queries
It must be possible to support natural language queries.
3.6 Results
3.6.1 Standard format
DASL must define a standard format for search results.
For the sake of interoperability, it is desirable that server
result formats be standardized so that regardless of the type of
query syntax used, clients are guaranteed to successfully
understand the results of a query.
3.6.2 Paged Results
DASL search results must be conducive to paged retrieval.
Paged retrieval is necessary if result sets are very large and if
clients must also present a responsive interface to a user. In this
scenario clients need to access portions of the search result at
specific times. DASL search results must be defined so that paged
search results are possible.
3.7 Discovery Mechanisms
3.7.1 Grammar Discovery
It must be possible for clients to discover which query grammars a
server supports.
If a server is capable of supporting several search grammars, the
client needs to determine which grammars are supported.
3.7.2 Operator Discovery
It must be possible for a client to discover which operators are
available for a given query grammar.
3.7.3 Scope Information Discovery
It should be possible for a client to determine searching
information about a scope, if that information is available.
Examples of such information include which properties can be
searched in a scope, indexing statistics for the scope, etc.
3.8 Redirecting a Query
It must be possible for the server to refer the client to other
resources in order to continue a search.
For example, a client may ask the resource http://ren/stimpy to
perform a search over http://foo/bar and http://blah/mumble.
However http://ren/stimpy may not be able to perform the search
itself and so will need to be able to inform the client that it
should submit its search request directly to http://foo/bar and
http://blah/mumble.
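The exchange in this example can be simulated as follows. The sketch
invents its own response shape (a pair of hits and referrals) and its
own table of local hits; how a real server would express a referral is
left to the DASL protocol and is not defined here.

```python
# Hypothetical: scopes that can be searched by the resource at the same URI,
# and the hits each would return.
LOCAL_HITS = {
    "http://foo/bar": ["http://foo/bar/a.txt"],
    "http://blah/mumble": ["http://blah/mumble/b.txt"],
}

def submit(server, scopes):
    """Simulated search request.

    Returns (hits, referrals): hits for the scopes this server can search
    itself, and the scopes the client should query directly instead.
    """
    hits, referrals = [], []
    for scope in scopes:
        if scope == server and scope in LOCAL_HITS:
            hits.extend(LOCAL_HITS[scope])
        else:
            referrals.append(scope)
    return hits, referrals

def search_following_referrals(server, scopes):
    """Submit the query, then resubmit each referred scope to its own URI."""
    hits, referrals = submit(server, scopes)
    for target in referrals:
        more_hits, _ = submit(target, [target])
        hits.extend(more_hits)  # scopes still unresolved are dropped here
    return hits

found = search_following_referrals(
    "http://ren/stimpy", ["http://foo/bar", "http://blah/mumble"])
```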
3.9 Hit Highlighting
DASL must define a mechanism to allow clients to request and
receive "hit highlighting". Hit highlighting allows clients to
provide visual cues to a user to identify the segments in a text
resource that cause it to match a content-based query.
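As a rough illustration, hit information could take the form of
character offsets that the client then uses to mark up the text. The
sketch below assumes a simple case-insensitive word match and bracket
markers; the form that hit information would actually take in DASL is
not defined by this document.

```python
import re

def hit_spans(text, word):
    """Return (start, end) offsets of each case-insensitive occurrence."""
    pattern = re.compile(re.escape(word), re.IGNORECASE)
    return [match.span() for match in pattern.finditer(text)]

def highlight(text, spans, open_mark="[", close_mark="]"):
    """Wrap each span of text in markers; spans must be ordered and disjoint."""
    pieces, last = [], 0
    for start, end in spans:
        pieces.append(text[last:start])
        pieces.append(open_mark + text[start:end] + close_mark)
        last = end
    pieces.append(text[last:])
    return "".join(pieces)

text = "DAV search: Search the DAV store"
spans = hit_spans(text, "search")
marked = highlight(text, spans)
```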
4 Authentication
The DASL specification should state how the DASL extensions to
WEBDAV interoperate with existing authentication schemes, and
should make recommendations for using those schemes.
5 Access Control
The DASL specification should state how the DASL extensions to
WEBDAV interoperate with the ACL mechanisms supported by WEBDAV,
and should make recommendations for using those schemes.
6 Internationalization
DASL extensions must describe how to perform searches on
internationalized content and properties. Information intended for
user comprehension must conform to the IETF Character Set Policy
[CHAR].
7 Related Work
Z39.50: "Information Retrieval (Z39.50): Application Service
Definition and Protocol Specification".
http://lcweb.loc.gov/z3950/agency/
Z39.50 Profile for Simple Distributed Search and Ranked Retrieval
http://lcweb.loc.gov/z3950/agency/profiles/zdsr.html
The STARTS Protocol
http://www-db.stanford.edu/~gravano/starts.html
The Harvest Information Discovery and Access System
http://mordor.transarc.com/afs/transarc.com/public/trg/Harvest/
8 References
[CHAR] H.T. Alvestrand, "IETF Policy on Character Sets and
Languages", June 1997, internet-draft, work-in-progress, draft-
alvestrand-charset-policy-02.txt.
[HTTP] R. Fielding, J. Gettys, J. C. Mogul, H. Frystyk, and T.
Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2068,
U.C. Irvine, DEC, MIT/LCS, January 1997.
[WEBDAV] Y. Y. Goland, E. J. Whitehead, Jr., A. Faizi, S. R.
Carter, D. Jensen, "Extensions for Distributed Authoring and
Versioning on the World Wide Web", October, 1997, internet-draft,
work-in-progress, draft-ietf-webdav-protocol-04.txt.
9 Authors' Addresses
Saveen Reddy
Microsoft Corporation
One Microsoft Way
Redmond WA, 9085-6933
EMail: saveenr@microsoft.com
Judith Slein
Xerox Corporation
800 Phillips Road 105-50C
Webster, NY 14580
EMail: slein@wrc.xerox.com
Expires July 1998