IETF Email List Archiving and Search Tool Requirements
draft-sparks-genarea-mailarch-00
The IETF makes heavy use of email lists to conduct its work. Participants frequently need to search and browse the archives of these lists, and have asked for improved search capabilities. The current archive mechanism could also be made more efficient. This memo captures the requirements for improved email list archiving and searching systems.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
The IETF makes heavy use of email lists to conduct its work. Participants frequently need to search the archives of these lists, and have asked for improved search capabilities. The current archive mechanism could also be made more efficient. This memo captures the requirements for improved email list archiving and searching systems.
The requirments captured in this version of this memo (-00) are an initial brainstorm. While intended to be thorough, they are probably incomplete. Please keep "What's missing?" in mind while reviewing them. Discussion of this memo should take place on the ietf@ietf.org mailing list.
2. List Search and Archive Requirements
2.1. Search and Browsing
- o
- The system must provide a web interface for search and browsing archived messages.
- o
- The system must allow browsing the entire archive of a given list by thread or by date.
- o
- The system must allow browsing the results of a search by thread or by date.
- Both threading based on Message-Id/References/In-Reply-To and threading based on same subject line (modulo short prefixes like re: and fwd:) shoul dbe taken into account.
- o
- The system must allow searching across any subset of the archived lists (one list, a selection of lists, or all lists).
- o
- The system must allow searching of any combination (using AND and OR operators) of the following attributes. Richer search capabilites are highly desirable.
- -
- string occurring in sender name
- -
- date range
- -
- string occurring in Subject
- -
- string occurring in message body
- -
- string occuring in message header (in particular, exact match of Message-Id)
- For instance, it would be nice to search the entire archive for instances of a message with a given Message-ID with a URL like <http://datatracker.ietf.org/mlarchive/msg?id=4EA6E023.6010603@nostrum.com>
- o
- Individual messages must be representable by a long-term stable URI that can be shared between users. That is, the URI must be suitable for reference in an email message.
- o
- Searches should be representable by a URI that can be shared between users
- -
- Such URIs should be long-term stable.
- -
- The search may be re-executed when the URI is referenced. It is acceptable for the same URI to produce different results if accessed at different times (reflecting additional messages that may match the search criteria for example.)
2.2. Archiving Active Lists
- o
- The archive system must accept messages handled by various mail reflector packages.
- -
- Lists hosted on the IETF systems are served by mailman [mailman].
- -
- Lists hosted at other organizations may use other packages.
- *
- The archive system must accept messages through subscribing to such an external list.
- *
- The archive system may support other mechanisms for accepting messages into the archive
2.3. Importing Messages from Other Archives
Lists hosted at other systems are sometimes moved to the IETF servers, and their archive is moved with them. The archiving system must be able to import these archives.
- o
- At a minimum the archive system must be able to import mbox formatted archives [RFC4155][mbox].
- o
- The archive system should be able to import maildir formatted archives [maildir].
- o
- It is acceptable to use a separate utility to convert between these formats before import as long as the conversion is lossless
2.4. Exporting messages from the Archives
- o
- The archive system must support exporting messages in the mbox format
- o
- The archive system should support exporting messages in mailder format
- o
- The archive system must support exporting the entire archive of a given list
- o
- The archive system must support exporting all messages from a given list within a given daterange
- o
- The archive system should allow exporting the results of any supported search query
2.5. Redundancy
- o
- The systems must facilitate providing archive, search, and browse functions through geographically distributed servers
- -
- The systems must support a single active an dsingle standby server. This reflects the current operating configuration and is expected to be the initial deployment model.
- -
- The systems should support a single active and multiple standby servers.
- -
- The systems should support multiple active servers for the search and browse functions. Multiple active archive servers are not a requirement.
- -
- The amount of data replication between servers should be on the order of the size of any new/changed messages in the archives.
- *
- It is acceptable for replication to be part of the archival system itself (such as using the replication mechanisms from an underlying database).
- *
- It is acceptable to rely on replication of the underlying filesystem objects (using rsync of one or more directory trees for example), but only if the objects in the underlying filesystem are formatted such that the size of the replication data is on the order of the size of any new/changed messages in the archives.
2.6. Archive Administration
- o
- The archive system must support adding and removing lists to be archived
- o
- The system must allow the administrator to add messages to and delete messages from an archived list. The system should log such actions.
- o
- The system must allow the administrator to delete messages from an archived list
2.7. Transition Requirements
There are many existing archived messages containing embedded links into the existing MHonArc mail archive. These links must continue to work, but should reach the message as archived in the new system.
3. Future Considerations
The archive and search functions should anticipate internationalized email addresses as discussed in the EAI working group. Since the EAI WG has not finished their work, there is no firm requirement at this time.
4. Security Considerations
Creating a new tool for searching and archiving IETF email lists does not affect the security of the Internet in any significant fashion.
5. IANA Considerations
This document has no actions for IANA.
6. Acknowledgements
The Tools Development team provided input into this initial brainstorm.
7. References
[RFC4155] |
Hall, E., "The application/mbox Media Type", RFC 4155, September 2005. |
[mbox] |
Mbox", . | , "
[maildir] |
Maildir", . | , "
[mailman] |
Mailman", . | , "
Robert Sparks
Sparks
Tekelec
17210 Campbell Road
Suite 250
Dallas,
Texas
75254-4203
USA
EMail: RjS@nostrum.com