Internet DRAFT - draft-seantek-text-markdown-media-type
Network Working Group S. Leonard
Internet-Draft Penango, Inc.
Intended Status: Informational July 4, 2014
Expires: January 5, 2015
The text/markdown Media Type
draft-seantek-text-markdown-00.txt
Abstract
This document registers the text/markdown media type for use with
Markdown, a family of plain text formatting syntaxes that optionally
can be converted to formal markup languages such as HTML.
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
Copyright Notice
Copyright (c) 2014 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Leonard Exp. January 5, 2015 [Page 1]
Internet-Draft The text/markdown Media Type July 4, 2014
1. Introduction
In computer systems, textual data is stored and processed using a
continuum of techniques. On the one end is plain text: a linear
sequence of characters in some character set (code), possibly
interrupted by line breaks, page breaks, or other control characters.
Plain text provides /some/ fixed facilities for formatting
instructions, namely codes in the character set that have meanings
other than "represent this character on the output medium"; however,
these facilities are not particularly extensible. Compare with
[RFC6838] Section 4.2.1. (Applications may neuter the effects of
these special characters by prohibiting them or by ignoring their
dictated meanings, as is the case with how modern applications treat
most control characters in US-ASCII.) On this end, any text reader or
editor that interprets the character set can be used to see or
manipulate the text. If some characters are corrupted, the corruption
is unlikely to affect the ability of a computer system to process the
text (even if the human meaning is changed).
On the other end is binary format: a sequence of instructions
intended for some computer application to interpret and act upon.
Binary formats are flexible in that they can store non-textual data
efficiently (perhaps storing no text at all, or only storing certain
kinds of text for very specialized purposes). Binary formats require
an application to be coded specifically to handle the format; no
partial interoperability is possible. Furthermore, if even one byte
or bit is corrupted in a binary format, the corruption may prevent an
application from processing any of the data correctly.
Between these two extremes lies formatted text, i.e., text that
includes non-textual information coded in a particular way that
affects the interpretation of the text by computer programs.
Formatted text is distinct from plain text and binary format in that
the non-textual information is encoded into textual characters, which
are assigned specialized meanings /not/ defined by the character set.
With a regular text editor and a standard keyboard (or other standard
input mechanism), a user can enter these textual characters to
express the non-textual meanings. For example, a character like "<"
no longer means "LESS-THAN SIGN"; it means the start of a tag or
element that affects the document in some way.
On the formal end of the spectrum is markup, a family of languages
for annotating a document in such a way that the annotations are
syntactically distinguishable from the text. Markup languages are
(reasonably) well-specified and tend to follow (mostly) standardized
syntax rules. Examples of markup languages include SGML, HTML, XML,
and LaTeX. Standardized rules lead to interoperability between markup
processors, but they also impose a skill requirement on new (human)
users of the language, who must learn these rules in order to do
useful work. This
imposition makes markup less accessible for non-technical users
(i.e., users who are unwilling or unable to invest in the requisite
skill development).
informal   /---------formatted text----------\             formal
<------v-------------v-------------v-----------------------v---->
 plain text  informal markup  formal markup         binary format
                (Markdown) (HTML, XML, etc.)
Figure 1: Degrees of Formality in Data Storage Formats for Text
On the informal end of the spectrum are lightweight markup languages.
In comparison with formal markup like XML, lightweight markup uses
simple syntax, and is designed to be easy for humans to enter with
basic text editors. Markdown, the subject of this document, is an
/informal/ plain text formatting syntax that is intentionally
targeted at non-technical users (i.e., users upon whom little to no
skill development is imposed) using unspecialized tools (i.e., text
boxes). Jeff Atwood once described these informal markup languages as
/humane/ [HUMANE].
Markdown specifically is a family of syntaxes that are based on the
original work of John Gruber with substantial contributions from
Aaron Swartz, released in 2004 [MARKDOWN]. Since its release, a number
of web or web-facing applications have incorporated Markdown into
their text entry systems, frequently with proprietary extensions. Fed
up with the complexity and security pitfalls of formal markup
languages (e.g., HTML5) and proprietary binary formats (e.g.,
commercial word processing software), yet unwilling to be confined to
the restrictions of plain text, many users have turned to Markdown
for document processing. Whole toolchains now exist to support
Markdown for online and offline projects.
Due to Markdown's intentional informality, there is no standard
specifying the Markdown syntax, and no governing body that guides or
impedes its development. Markdown works for users for two key
reasons. First, the markup instructions (in text) look similar to the
markup that they represent; therefore the cognitive burden to learn
the syntax is very low. Second, the primary arbiter of the syntax's
success is *running code*. The tool that converts the Markdown to a
presentable format, and not a series of formal pronouncements by a
standards body, is the basis for whether syntactic elements matter.
To support identifying and conveying Markdown (as distinguished from
plain text), this document defines a media type and a "flavor"
parameter that indicates, in broad strokes, the author's intent on
how to interpret the Markdown.
1.1. Requirements Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
2. Markdown Media Type Registration Application
This section provides the media type registration application for the
text/markdown media type (see [RFC6838], Section 5.6).
Type name: text
Subtype name: markdown
Required parameters: charset. Per Section 4.2.1 of [RFC6838], charset
is REQUIRED. The default value is UTF-8. If omitted, parsers MAY
reject the input; if parsers accept the input, they MUST interpret
the content as UTF-8.
Optional parameters:
flavor=f; where f is an identifier that specifies the "flavor", or
variation, of the Markdown syntax. The parameter represents the
intent of the author, namely, that the Markdown will be interpreted
"best" (i.e., as the author intended) when processed with tools
associated with the identified flavor.
The flavor parameter is opaque and case-sensitive. Valid flavor
values can be any sequence of characters or bytes; in practice,
however, virtually all will be alphanumeric (US-ASCII) and
registered in the IANA Markdown Flavors Registry, discussed in
Section 4. Implementations checking flavor parameters MUST only
compare them for exact equality.
Encoding considerations: Text.
Security considerations:
Markdown interpreted as plain text is relatively harmless. A text
editor need only display the text. The editor SHOULD take care to
handle control characters appropriately, and to limit the effect of
the Markdown to the text editing area itself; malicious Unicode-
based Markdown could, for example, surreptitiously change the
directionality of the text. An editor for normal text would already
take these control characters into consideration, however.
Markdown interpreted as a precursor to other formats, such as HTML,
carries all of the security considerations of the target formats. For
example, HTML can contain instructions to execute scripts, redirect
the user to other webpages, download remote content, and upload
personally identifiable information. Markdown also can contain
islands of formal markup, such as HTML. These islands of formal
markup may be passed as-is, transformed, or ignored (perhaps
because the islands are conditional or incompatible) when the
Markdown is interpreted into the target format. Since Markdown may
have different interpretations depending on the tool and the
environment, a better approach is to analyze (and sanitize or
block) the output markup, rather than attempting to analyze the
Markdown.
Interoperability considerations:
Markdown flavors are designed to be broadly compatible with humans
("humane"), but not necessarily with each other. Therefore, syntax
in one Markdown flavor may be ignored or treated differently in
another flavor. The overall effect is a general degradation of the
output, proportional to the quantity of flavor-specific Markdown
used in the text. When it is desirable to reflect the author's
intent in the output, stick with the flavor identified in the
flavor parameter.
Published specification: This specification.
Applications that use this media type:
Markdown conversion tools, Markdown WYSIWYG editors, and plain text
editors and viewers; target markup processors indirectly use
Markdown (e.g., web browsers for Markdown converted to HTML).
Additional information:
Magic number(s): None
File extension(s): .md, .markdown
Macintosh File Type Code(s): TEXT
Person & email address to contact for further information:
Sean Leonard <dev+ietf@seantek.com>
Restrictions on usage: None.
Author: Sean Leonard <dev+ietf@seantek.com>
Intended usage: COMMON
Change controller: The IESG <iesg@ietf.org>
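As a non-normative illustration of the charset and flavor rules in
the registration above, the following Python sketch reads a
Content-Type header value with the standard library's email package.
The names interpret_markdown_type and flavor_matches are hypothetical,
and the choice to accept (rather than reject) input that lacks a
charset parameter is an assumption of this sketch, not a requirement
of the registration.

```python
from email.message import Message

DEFAULT_CHARSET = "UTF-8"  # the registration's fallback when charset is omitted


def interpret_markdown_type(header_value):
    """Return (charset, flavor) for a text/markdown Content-Type value.

    Hypothetical helper: raises ValueError for any other media type.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    if msg.get_content_type() != "text/markdown":
        raise ValueError("not text/markdown: %r" % header_value)
    # A parser MAY reject input without charset; this sketch accepts it
    # and therefore has to interpret the content as UTF-8.
    charset = msg.get_param("charset", DEFAULT_CHARSET)
    # flavor is optional and opaque; None means it was not supplied.
    flavor = msg.get_param("flavor")
    return charset, flavor


def flavor_matches(flavor, wanted):
    """Compare flavors for exact equality only (no case folding)."""
    return flavor == wanted


charset, flavor = interpret_markdown_type(
    "text/markdown; charset=UTF-8; flavor=GitHub")
```

Because the flavor is opaque and case-sensitive, flavor_matches
deliberately performs no normalization: "GitHub" and "github" are
treated as different flavors.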
3. Example
The following is an example of Markdown as an e-mail attachment:
MIME-Version: 1.0
Content-Type: text/markdown; charset=UTF-8; flavor=GitHub
Content-Disposition: attachment; filename=readme.md

Sample GitHub Markdown
=============

This is some sample GitHub Flavored Markdown (*GFM*).
The generated HTML is then run through filters in the
[html-pipeline](https://github.com/jch/html-pipeline)
to perform things like [sanitization](#html-sanitization) and
[syntax highlighting](#syntax-highlighting).

Bulleted Lists
-------

Here are some bulleted lists...

* One Potato
* Two Potato
* Three Potato

- One Tomato
- Two Tomato
- Three Tomato

More Information
-----------

[.markdown, .md](http://daringfireball.net/projects/markdown/)
has more information.
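For illustration only, a message like the one above could be
assembled with Python's standard email package, as in the following
sketch. The variable names are hypothetical and the body is
abbreviated. The library chooses its own parameter quoting and adds a
Content-Transfer-Encoding header, so the serialized headers are not
expected to match the example byte for byte.

```python
from email.message import EmailMessage

# Abbreviated Markdown body; the full text appears in the example above.
body = (
    "Sample GitHub Markdown\n"
    "=============\n"
    "\n"
    "This is some sample GitHub Flavored Markdown (*GFM*).\n"
)

msg = EmailMessage()
msg.set_content(
    body,
    subtype="markdown",           # yields Content-Type: text/markdown
    charset="utf-8",
    params={"flavor": "GitHub"},  # extra Content-Type parameter
    disposition="attachment",
    filename="readme.md",
)

wire_form = msg.as_string()  # headers, a blank line, then the body
```

A receiver can recover the same values with msg.get_content_type(),
msg.get_param("flavor"), and msg.get_filename().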
4. IANA Considerations
IANA is asked to register the media type text/markdown in the
Standards tree using the application provided in Section 2 of this
document.
IANA is also asked to establish a subtype registry called "Markdown
Flavors". Registration in this registry is by Expert Review [RFC5226].
The Expert will determine whether the registration represents a
bona fide variation of the Markdown syntax (i.e., neither a duplicate
of an existing registration nor a syntax that is something other than
Markdown; [MARKDOWN] SHALL be treated as a normative basis) and
whether it supplies a brief description, one or more responsible
parties, a statement of whether the flavor is being maintained at the
time of registration, and evidence of at least one complete tool (with
or without documentation) that processes the Markdown syntax into a
formal document language.
A responsible party can be an individual author or maintainer, a
corporate author or maintainer (plus an individual contact), or a
representative of a community of interest dedicated to the Markdown
syntax.
The registry shall have one initial value, "Standard", with the
following data:
Description:
The Markdown syntax as it exists in the Markdown 1.0.1 Perl script
at <http://daringfireball.net/projects/markdown/>, with accompanying
documentation at
<http://daringfireball.net/projects/markdown/syntax>.
Responsible Parties:
(individual)
John Gruber <http://daringfireball.net/>
<comments@daringfireball.net>
Currently Maintained? No
Tool:
Name: Markdown 1.0.1
Reference: <http://daringfireball.net/projects/markdown/>
Purpose: Converts to HTML or XHTML circa 2004.
5. Security Considerations
See the answer to the Security Considerations template questions in
Section 2.
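The recommendation in Section 2 to analyze the output markup rather
than the Markdown can be sketched as follows. This Python example
assumes the Markdown has already been converted to HTML by some tool
and filters that HTML against an allowlist using the standard
library's html.parser module. The tag list, the permitted URL
prefixes, and the names AllowlistSanitizer and sanitize_html are all
assumptions chosen for illustration. It is a minimal sketch, not a
vetted sanitizer, and a deployment would more likely rely on a
maintained sanitization library.

```python
from html import escape
from html.parser import HTMLParser

# Illustrative policy only; a real deployment would choose its own lists.
ALLOWED_TAGS = {"p", "em", "strong", "ul", "ol", "li",
                "h1", "h2", "a", "code", "pre"}
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_URL_PREFIXES = ("http://", "https://", "#")
DROP_WITH_CONTENT = {"script", "style"}


class AllowlistSanitizer(HTMLParser):
    """Re-emit only allowlisted tags and attributes; escape all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        kept = []
        for name, value in attrs:
            if (name in ALLOWED_ATTRS.get(tag, ())
                    and value
                    and value.lower().startswith(SAFE_URL_PREFIXES)):
                kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and tag in ALLOWED_TAGS:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize_html(markup):
    """Filter converter output; the input is HTML, not Markdown."""
    parser = AllowlistSanitizer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

Filtering the converter's output means the same check applies
regardless of which Markdown flavor or tool produced the HTML,
including any islands of formal markup that were passed through as-is.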
6. References
6.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC5226] Narten, T., and H. Alvestrand, "Guidelines for Writing an
IANA Considerations Section in RFCs", RFC 5226, May 2008.
[RFC6838] Freed, N., Klensin, J., and T. Hansen, "Media Type
Specifications and Registration Procedures", BCP 13, RFC
6838, January 2013.
6.2. Informative References
[HUMANE] Atwood, J., "Is HTML a Humane Markup Language?", WWW
http://blog.codinghorror.com/is-html-a-humane-markup-language/,
May 2008.
[MARKDOWN] Gruber, J., "Daring Fireball: Markdown", WWW
http://daringfireball.net/projects/markdown/, December
2004.
Author's Address
Sean Leonard
Penango, Inc.
5900 Wilshire Boulevard
21st Floor
Los Angeles, CA 90036
USA
EMail: dev+ietf@seantek.com
URI: http://www.penango.com/