Network Working Group S. Leonard
Internet-Draft Penango, Inc.
Updates: 5234 (if approved) C. Newman
Intended Status: Experimental Oracle
Expires: September 14, 2017 March 13, 2017
Unicode in ABNF
draft-seantek-unicode-in-abnf-03
Abstract
This experimental document adds support for Unicode strings in ABNF
(Augmented Backus-Naur Form), and provides certain symbols related to
Unicode code point ranges.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute working
documents as Internet-Drafts. The list of current Internet-Drafts is
at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft is a fork of
draft-seantek-abnf-more-core-rules-05.
Copyright Notice
Copyright (c) 2017 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Leonard & Newman Experimental [Page 1]
Internet-Draft Unicode in ABNF March 13, 2017
1. Introduction
Augmented Backus-Naur Form (ABNF) [RFC5234] is a formal syntax that
is popular among many Internet specifications. Many Internet
documents employ this syntax along with the Core Rules defined in
Appendix B.1 of [RFC5234]. ABNF is defined in terms of ASCII
[ASCII86, RFC0020]; however, Unicode [UNICODE] has become
increasingly popular--even required--as the Internet has evolved over
the last two decades. Unicode (as UTF-8) will be permitted in the RFC
series [IABNA], while [RFC5198] established Net-Unicode as the
standard form for the use of Unicode as "network text". Protocols
that originally were ASCII-based have been, or are being, extended to
support Unicode. However, protocols that use Unicode in some way
(e.g., permit UTF-8 content in a production) use different ABNF
expressions, some of which do not conform to the modern Unicode
Standard 9.0.0, and therefore could introduce interoperability or
security problems.
Many parties have expressed interest in incorporating [UNICODE] into
ABNF, yet the questions remain: "How?" and "To what extent?"
This document proposes standardized techniques for expressing Unicode
code points using ABNF. This document intends to be very conservative
in its approach: a conforming implementation only needs to know how
to map between the Unicode scalar values and any Unicode encoding
form. The Unicode Character Database (UCD, Section 4.1 of [UNICODE])
is intentionally not required. ABNF text that uses the syntax in
this document needs to be in a Unicode encoding form (Conformance
Clause D89 of [UNICODE]), but ABNF text that just uses the rules or
terminal values can be expressed in ASCII [RFC0020].
2. Unicode Code Points in ABNF
(Consult Section 2.3 of [RFC5234] in relation to this paragraph.)
Unicode has been expressed in several different ways in RFCs to date.
This document establishes that in contexts where Unicode is specified
as the coded character set [RFC2130], the terminal values %x00-10FFFF
are to be used to represent the Unicode code points. Only the Unicode
scalar values are to be used in specifications that follow this
document; surrogate code points (%xD800-DFFF) are not to be used
[[NB: directly]]. This technique aligns ABNF with W3C EBNF [XMLEBNF]
and Unicode EBNF [UNICODE].
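For illustration only, the restriction to Unicode scalar values can
be sketched in Python as follows. The function name is_scalar_value
is hypothetical and is not defined by this document; the ranges are
those of the proposed Core Rule <UNICODE> in Appendix A.

```python
def is_scalar_value(cp: int) -> bool:
    """Return True if cp is a Unicode scalar value (definition D76).

    Hypothetical helper: accepts %x00-10FFFF except the surrogate
    code points %xD800-DFFF.
    """
    return 0x0000 <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF
```

Under this sketch, 0x1F430 is accepted, while 0xD800 and 0x110000 are
rejected.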
(Consult Section 2.4 and Appendix B.2 of [RFC5234] in relation to
this paragraph.)
In contexts where Unicode is specified as the character set, the
ABNF-based grammar may have multiple external encodings. This
document does not fix the encoding scheme. The obvious external
encoding is UTF-8 (see Net-Unicode [RFC5198]), but other encodings
are possible. This document neither restricts productions to NFC, nor
provides a syntax for normalization to NFC.
3. Unicode Core Rule Update
Appendix A furnishes Unicode Core Rules that include comprehensive
support for certain Unicode ranges and characters. These Unicode Core
Rules supplement the Core Rules of [RFC5234] and [ABNFMORE]; they are
intended to be available whenever this document is invoked.
The rules reflect broad categories of characters that protocols allow
or disallow for interchange between systems, as practice in the
Internet community has evolved and as of Unicode 9.0.0 (August 2016)
[UNICODE]. It is a design goal that a general-purpose ABNF
grammar should not need to delve into the minutiae of Unicode
character properties, which can be tailorable (i.e., language-
specific), overridable, and unstable (between Unicode versions). It
is a further design goal that a general-purpose ABNF grammar should
not need to rely on sizeable external sources, namely the Unicode
Character Database (Section 4.1 of [UNICODE]). To constrain this
document's scope, character properties are not addressed further.
According to a survey of all RFCs published through August 2016, many
widely used Internet protocols rely on horizontal whitespace (HT and
SP, or occasionally SP alone) and line breaks (usually CRLF,
sometimes LF) as delimiters. Therefore, the rules specifically
address horizontal whitespace and line breaks.
Two sets of rules are provided: those that include the private-use
characters (Section 23.5 of [UNICODE]) and those that exclude them.
Private-use characters "are
intended for open interchange, subject to interpretation by private
agreement" (Section 23.7 of [UNICODE]). Therefore, there is no way
within [UNICODE] itself to provide for a common interpretation of
these code points. See also Section 4 of [RFC5198]. A protocol
designer needs to establish that common interpretation in prose,
provide for protocol elements that establish the common
interpretation, or (explicitly) accept that a common interpretation
is done outside of the designer's protocol.
4. Case-Sensitive Unicode String Syntax
This document extends ABNF with a new case-sensitive Unicode string
literal. The type is denoted using a type prefix similar to the type
prefixes used with numeric values and case-sensitive ASCII string
literals. No syntax is provided for a case-insensitive Unicode string
literal because doing so would require implementing Unicode caseless
matching [UNICODE], which is language-dependent, Unicode version-
dependent, and very complicated overall. Caseless matching also
requires the UCD.
Add the contents of Section 4.1 to [RFC5234].
4.1. Terminal Values - Literal Text Strings
Literal case-sensitive text strings in ABNF may be in the Unicode
character set [UNICODE]. The following prefix is used:
%su = case-sensitive, Unicode
To be consistent with prior implementations of ABNF, having no prefix
means that the string is case-insensitive and in ASCII.
[[ALT/DISCUSS: [RFC7405] %s"text" could be extended to support
characters beyond ASCII. It is a strict superset of [RFC7405] and
thus simpler. This document would leave [%i]"text" undefined for the
time being, or, a collation from [RFC4790] could be identified.]]
The case-sensitive Unicode string can consist of any Graphic,
Format, or Reserved code point. Control, Private-Use, Surrogate, and
Noncharacter code points are excluded. Newline (line breaking)
characters are also omitted. (See Table 2-3 of [UNICODE].)
An example:
rulename = %su"!100Q$"
where the character ! is actually the Unicode code point U+00A5 YEN
SIGN, and the character $ is actually the Unicode code point U+1F39F
ADMISSION TICKETS, is equivalent to the rule:
rulename = %xA5.31.30.30.51.1F39F
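The equivalence above can be reproduced mechanically. The following
Python sketch renders the code points of a string as a concatenated
hexadecimal terminal value. The helper name to_numeric_terminal is an
assumption of this example and is not part of ABNF.

```python
def to_numeric_terminal(text: str) -> str:
    """Render text as an ABNF hex terminal value such as %xA5.31.

    Hypothetical helper. Python 3 strings iterate by code point, so
    ord() yields the terminal value described in Section 2.
    """
    if not text:
        raise ValueError("a numeric terminal needs at least one value")
    return "%x" + ".".join(format(ord(ch), "X") for ch in text)


# U+00A5 YEN SIGN, "100Q", U+1F39F ADMISSION TICKETS
print(to_numeric_terminal("\u00A5100Q\U0001F39F"))
# prints %xA5.31.30.30.51.1F39F
```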
4.2. ABNF Definition of ABNF - char-val
char-val =/ case-sensitive-Unicode-string
; ALT/DISCUSS: "%s", modify 7405
case-sensitive-Unicode-string =
"%su" quoted-Unicode-string
quoted-Unicode-string = DQUOTE *(%x20-21 / %x23-7E /
UVCHARBEYONDASCII) DQUOTE
; quoted string of SP and VCHAR
; without DQUOTE, and UVCHAR
; beyond the ASCII range
5. Terminal Value Transformation Syntax for UTF-8 and UTF-16
While Section 2 establishes terminal values %x00-10FFFF for Unicode,
many Internet protocols incorporate Unicode using UTF-8 and define
protocol elements using UTF-8 terminal values (i.e., values in the 8-
bit range of %x00-FF, or more specifically, %x00-BF and %xC2-F4); see
[RFC3629]. A smaller yet notable set of protocols use UTF-16.
Writing out Unicode code points or ranges in UTF-8 or UTF-16 can be
cumbersome and error-prone. This document therefore provides a
"terminal value transformation syntax", so that the code points %x00-
10FFFF can be written out natively, but the resulting ABNF represents
8-bit or 16-bit units at the level of ABNF syntax. From there, a
protocol can supply a specific mapping (encoding) of those values
into a character set or other representation, consistent with Section
2.3 of [RFC5234].
The syntax is:
%t8(...) for 8-bit UTF-8 (transform to %x00-BF and %xC2-F4)
%t16(...) for 16-bit UTF-16 (transform to %x00-D7FF,
%xD800-DBFF %xDC00-DFFF, and %xE000-FFFF)
%t16le(...) for 8-bit UTF-16LE (transform to %x00.00-%xFF.FF,
little-endian)
%t16be(...) for 8-bit UTF-16BE (transform to %x00.00-%xFF.FF,
big-endian)
[[NB: Other possibilities: !t8 ~t8 $t8 #t8 -t8]]
A transform is applied by recursively driving it into the elements,
transforming terminal values from the original code point to the
corresponding Unicode Transformation Format over an 8-bit (or 16-bit)
field. The transforms in this document distribute over ABNF
operators. "%t16" outputs 16-bit terminal values from %x00-FFFF,
meaning that the endianness is not specified: a protocol needs to
specify this or furnish a protocol slot for 16-bit code units. In
contrast, "%t16be" and "%t16le" output 8-bit terminal values: each
terminal value in the input will correspond to two or four terminal
values in the output.
If a transform is used on a terminal value outside the Unicode scalar
value range (see the proposed Core Rule <UNICODE>), the resulting
terminal value can be neither satisfied nor produced.
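As a non-normative sketch of how the four transforms act on a single
terminal value, the following Python function leans on the standard
"utf-8", "utf-16-be", and "utf-16-le" codecs. The function name
transform and the form labels passed to it are assumptions of this
example; distribution over ABNF operators is not modeled.

```python
def transform(cp: int, form: str) -> list:
    """Return the terminal values produced for one code point.

    Hypothetical helper; form is one of "t8", "t16", "t16le",
    "t16be", mirroring the prefixes proposed in this section.
    """
    if not (0 <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF):
        # Outside the scalar value range nothing can be produced.
        raise ValueError("not a Unicode scalar value")
    ch = chr(cp)
    if form == "t8":
        return list(ch.encode("utf-8"))        # 8-bit values
    if form == "t16be":
        return list(ch.encode("utf-16-be"))    # 8-bit values, 2 or 4
    if form == "t16le":
        return list(ch.encode("utf-16-le"))    # 8-bit values, 2 or 4
    if form == "t16":
        data = ch.encode("utf-16-be")
        # Regroup octet pairs into 16-bit code units; endianness is
        # left to the protocol, so only the unit values matter here.
        return [int.from_bytes(data[i:i + 2], "big")
                for i in range(0, len(data), 2)]
    raise ValueError("unknown transform: " + form)
```

For U+1F430 this yields four octets under "t8", two 16-bit units
under "t16", and four octets under "t16be" and "t16le".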
A "reverse transformation syntax" to go from 8-bit or 16-bit terminal
values to reassembled Unicode code points is not proposed at this
time.
5.1. Examples
Example 1: The following rules are equivalent; see [RFC3629]:
UTF8-MB = UTF8-2 / UTF8-3 / UTF8-4 ; from RFC 3629
; %x80-D7FF / %xE000-10FFFF
UTF8-MB = %t8( BEYONDASCII )
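The equivalence in Example 1 can be spot-checked. The Python sketch
below transcribes the UTF8-2, UTF8-3, and UTF8-4 productions of
[RFC3629] into a byte-oriented regular expression and tests whether
the UTF-8 encoding of a code point matches it. The names UTF8_MB and
matches_utf8_mb are assumptions of this example, and checking a few
boundary values is an illustration rather than a proof.

```python
import re

# UTF8-2 / UTF8-3 / UTF8-4 from RFC 3629, Section 4; UTF8-tail is
# written out as [\x80-\xBF].
UTF8_MB = re.compile(
    rb"[\xC2-\xDF][\x80-\xBF]"
    rb"|\xE0[\xA0-\xBF][\x80-\xBF]"
    rb"|[\xE1-\xEC][\x80-\xBF]{2}"
    rb"|\xED[\x80-\x9F][\x80-\xBF]"
    rb"|[\xEE-\xEF][\x80-\xBF]{2}"
    rb"|\xF0[\x90-\xBF][\x80-\xBF]{2}"
    rb"|[\xF1-\xF3][\x80-\xBF]{3}"
    rb"|\xF4[\x80-\x8F][\x80-\xBF]{2}"
)


def matches_utf8_mb(cp: int) -> bool:
    """True if the UTF-8 form of scalar value cp matches UTF8-MB."""
    return UTF8_MB.fullmatch(chr(cp).encode("utf-8")) is not None


# Boundary values of BEYONDASCII all match; ASCII does not.
edges = [0x80, 0x7FF, 0x800, 0xD7FF, 0xE000, 0xFFFF, 0x10000, 0x10FFFF]
print(all(matches_utf8_mb(cp) for cp in edges), matches_utf8_mb(0x7F))
```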
Example 2: The code point U+1F430 RABBIT FACE can be represented as
%x1F430. It can also be represented as %xD83D.DC30 or %t16( %x1F430 )
when UTF-16 is intended.
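The surrogate pair in Example 2 follows from the usual UTF-16
arithmetic, sketched here in Python. The function name surrogate_pair
is hypothetical.

```python
def surrogate_pair(cp: int) -> tuple:
    """Split a supplementary code point into UTF-16 code units."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only supplementary code points need a pair")
    offset = cp - 0x10000            # 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low


print("%x{:X}.{:X}".format(*surrogate_pair(0x1F430)))
# prints %xD83D.DC30
```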
5.2. Advantages and Features
Using transformation syntax offers several advantages:
The generic ABNF syntax of a textual protocol can take full advantage
of the Unicode character set; the syntax is not dependent on a
particular encoding form.
Specifying ranges of characters becomes unwieldy when explicitly
defined in terms of code units in a Unicode encoding form, e.g., as
UTF-8 code units (octets) for characters beyond ASCII, or as UTF-16
code units (16-bit words) for supplementary characters. Trying to
specify Punycode in ABNF, while not strictly impossible, would be
very difficult and not particularly useful.
Protocols that have arbitrary binary slots (e.g., BINARYMIME) are
inherently incompatible with Section 2 syntax, but compatibility can
be achieved by using transformation syntax.
Protocol designers can effectively exploit the "holes" in UTF-8,
because octets C0, C1, and F5-FF are never seen in UTF-8. These
octets provide natural delimiters for arbitrary runs of UTF-8. An
advantage of using such octets as delimiters is that checking for
these octets has to be done anyway for security reasons, so a
designer can save cycles by incorporating this part of a check for
well-formed Unicode into a protocol. Such delimiters can only be
expressed outside of "%t8", since a "%t8" transform will never
produce those terminal values.
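A minimal Python sketch of the delimiter idea follows. It assumes a
hypothetical framing in which octet FF separates runs of UTF-8; the
names UTF8_HOLES, DELIMITER, and split_utf8_runs are inventions of
this example and are not drawn from any protocol.

```python
# Octets that never occur in well-formed UTF-8 (the "holes").
UTF8_HOLES = frozenset([0xC0, 0xC1]) | frozenset(range(0xF5, 0x100))

# Hypothetical delimiter for this sketch; any hole octet would do.
DELIMITER = 0xFF


def split_utf8_runs(data: bytes) -> list:
    """Split data on DELIMITER and strictly decode each run.

    bytes.decode("utf-8") raises UnicodeDecodeError on ill-formed
    input, which covers any hole octet left inside a run.
    """
    assert DELIMITER in UTF8_HOLES
    return [run.decode("utf-8")
            for run in data.split(bytes([DELIMITER]))]
```

Because the well-formedness check and the delimiter scan reject the
same octets, a production parser could fold both into one pass; the
sketch keeps them separate for clarity.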
(UTF-16 also has such "holes", namely, in unpaired surrogates. But
using unpaired surrogates as delimiters may suffer from other
security pitfalls; in any event, UTF-16 is far less common in IETF
usage.)
6. Comment Syntax
This document extends ABNF to have Unicode comments. Comments are
treated as specification prose, so they may be normative depending on
the context. Comment text allows for the same repertoire of
characters as RFC text. The RFC Editor can regulate comments to the
same extent as specification prose, including disallowing certain
characters or code points.
6.1. Comment: ; Comment
(No changes to the text of Section 3.9 of [RFC5234] are needed.)
6.2. ABNF Definition of ABNF - comment
; given:
comment = ";" *(WSP / VCHAR) CRLF
; increment (unambiguous grammar):
comment =/ ";" *(UWSP / UVCHAR / PUACHAR)
(UWSPBEYONDASCII / UVCHARBEYONDASCII / PUACHAR)
*(UWSP / UVCHAR / PUACHAR) CRLF
; or redefine:
comment = ";" *(UWSP / UVCHAR / PUACHAR) CRLF
7. Notational Conventions
For readability it is advisable to express a Unicode code point as
the character itself, the numeric terminal value, and the name or a
name alias. Only one expression is used for the formal ABNF notation:
either the character itself (Section 4) or the numeric terminal value
(Section 2). The other expressions can be incorporated into an
adjacent comment.
The suggested notational convention for the adjacent comment follows
Appendix A of [UNICODE]. The comment text consists of one or more
WSP characters, optionally either the character itself or "U+" syntax
followed by exactly one SP, and the name or a name alias in ALL-CAPS
ASCII. Multiple characters can be notated in sequence on multiple
comment lines or on a single comment line. It is neither advisable
nor necessary to notate characters in the ASCII range. Examples of
the notation include:
; U+2206 INCREMENT
; U+2030 PER MILLE SIGN
change-in-temp = %su"$" 3DIGIT %su"%"
; # EURO SIGN ZWJ / VULGAR FRACTION ONE HALF
euros = %x20AC 3DIGIT [%x200D.BD]
where the characters $, %, #, and / are actually the respective
Unicode characters mentioned in the comments.
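As an authoring aid only, comment lines in the "U+" form can be
generated with Python's standard unicodedata module, as sketched
below. The function name notation_comments is hypothetical. Note that
unicodedata draws on the UCD version bundled with the Python runtime,
which ABNF processing under this document deliberately does not
require.

```python
import unicodedata


def notation_comments(text: str) -> list:
    """Return one '; U+XXXX NAME' comment line per non-ASCII char."""
    lines = []
    for ch in text:
        if ord(ch) < 0x80:
            continue  # characters in the ASCII range are not notated
        # The fallback string for unnamed code points is a
        # placeholder chosen for this sketch.
        name = unicodedata.name(ch, "<no name>")
        lines.append("; U+{:04X} {}".format(ord(ch), name))
    return lines


for line in notation_comments("\u2206\u2030"):
    print(line)
# prints:
# ; U+2206 INCREMENT
# ; U+2030 PER MILLE SIGN
```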
8. Effects on RFC 5234
Formally, this document updates [RFC5234] but does not modify it in
situ. Authors need to reference this document if they want to include
these enhancements; bare references to [RFC5234] do not include this
specification (or, for that matter, [RFC7405]). This directive
follows a model whereby document authors can choose whether to invoke
particular enhancements to ABNF. As time goes on, the IETF can
determine how often these enhancements are invoked, and can decide
whether to include them as part of a revision to the base [RFC5234].
A bare reference to this document invokes the case-sensitive Unicode
literal string syntax enhancement, the Unicode comment syntax
enhancement, and the Unicode Core Rules of Appendix A (i.e., the Core
Rules do not have to be further referenced). Nevertheless, document
authors are free to qualify a reference to this document to invoke
each feature selectively.
Appendix A of this document is meant to supplement Appendix B.1 of
[RFC5234] and Appendix A of [ABNFMORE]; therefore, concurrently
referencing those documents is a good idea. Document authors who
reference this document should use the rules of Appendix A, and
should not attempt to redefine or provide incremental alternatives to
them (except for backwards compatibility with prior documents).
9. IANA Considerations
This document implies no IANA considerations.
10. Security Considerations
While the Unicode Core Rules themselves may not be security-relevant,
the use of C1 control characters could very well be security-
relevant, because they may trigger special functions on various
devices, while being invisible in other contexts. Similarly, case-
sensitive Unicode string syntax allows for a broad range of code
points, many of which represent characters that are confusable with
other characters, or can only be inferred by visible yet subtle
changes in the surrounding graphemes (or worse, semantic changes that
do not have visual representations).
Protocols using Unicode should evaluate the applicability of Unicode
security considerations [UTR#36].
11. References
11.1. Normative References
[ASCII86] American National Standards Institute, "Coded Character
Set -- 7-bit American Standard Code for Information
Interchange", ANSI X3.4, 1986.
[RFC0020] Cerf, V., "ASCII format for network interchange", RFC 20,
October 1969.
[RFC5198] Klensin, J. and M. Padlipsky, "Unicode Format for Network
Interchange", RFC 5198, March 2008.
[RFC5234] Crocker, D. and P. Overell, "Augmented BNF for Syntax
Specifications: ABNF", STD 68, RFC 5234, January 2008.
[UNICODE] The Unicode Consortium, "The Unicode Standard, Version
9.0.0", The Unicode Consortium, August 2016.
11.2. Informative References
[IABNA] Flanagan, H., "The Use of Non-ASCII Characters in RFCs",
draft-iab-rfc-nonascii-02 (work in progress), April 2016.
[RFC1345] Simonsen, K., "Character Mnemonics and Character Sets",
RFC 1345, June 1992.
[RFC2130] Weider, C., Preston, C., Simonsen, K., Alvestrand, H.,
Atkinson, R., Crispin, M., and P. Svanberg, "The Report of
the IAB Character Set Workshop held 29 February - 1 March,
1996", RFC 2130, April 1997.
[RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO
10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November
2003.
[RFC4790] Newman, C., Duerst, M., and A. Gulbrandsen, "Internet
Application Protocol Collation Registry", RFC 4790, March
2007.
[RFC7405] Kyzivat, P., "Case-Sensitive String Support in ABNF", RFC
7405, December 2014.
[UTR#36] Davis, M. and M. Suignard, "Unicode Security
Considerations", Unicode Technical Report #36, September
2014, <http://unicode.org/reports/tr36/>.
[XMLEBNF] Bray, T., Paoli, J., Sperberg-McQueen, M., Maler, E., and
F. Yergeau, "Extensible Markup Language (XML) 1.0 (Fifth
Edition)", Section 6, W3C Recommendation REC-xml-20081126,
November 2008, <http://www.w3.org/TR/2008/REC-xml-
20081126>.
Appendix A. Comprehensive Unicode Core Rules
Certain basic rules are in uppercase, such as SP, HTAB, CRLF, DIGIT,
ALPHA, etc.
; D76 Unicode scalar value
UNICODE = <U+0000-U+D7FF / U+E000-U+10FFFF>
BEYONDASCII = <U+0080-U+D7FF / U+E000-U+10FFFF>
BEYONDG0 = <U+0080-U+D7FF / U+E000-U+10FFFF>
C1 = <U+0080-U+009F>
BEYONDC1 = <U+00A0-U+D7FF / U+E000-U+10FFFF>
G1 = <U+00A0-U+00FF> ; 96-set
BEYONDG1 = <U+0100-U+D7FF / U+E000-U+10FFFF>
LATIN1 = <U+0000-U+00FF>
BEYONDLATIN1 = <U+0100-U+D7FF / U+E000-U+10FFFF>
; C2 D14 noncharacter (sentinel)
; Section 23.7 Noncharacters, see also NUL
NONUCHAR = <U+FDD0-U+FDEF / U+FFFE-U+FFFF /
U+1FFFE-U+1FFFF / U+2FFFE-U+2FFFF /
U+3FFFE-U+3FFFF / U+4FFFE-U+4FFFF /
U+5FFFE-U+5FFFF / U+6FFFE-U+6FFFF /
U+7FFFE-U+7FFFF / U+8FFFE-U+8FFFF /
U+9FFFE-U+9FFFF / U+AFFFE-U+AFFFF /
U+BFFFE-U+BFFFF / U+CFFFE-U+CFFFF /
U+DFFFE-U+DFFFF / U+EFFFE-U+EFFFF /
U+FFFFE-U+FFFFF / U+10FFFE-U+10FFFF>
; UCHAR rules are analogous to CHAR
UCHARBEYONDBMP = <U+10000-U+1FFFD / U+20000-U+2FFFD /
U+30000-U+3FFFD / U+40000-U+4FFFD /
U+50000-U+5FFFD / U+60000-U+6FFFD /
U+70000-U+7FFFD / U+80000-U+8FFFD /
U+90000-U+9FFFD / U+A0000-U+AFFFD /
U+B0000-U+BFFFD / U+C0000-U+CFFFD /
U+D0000-U+DFFFD / U+E0000-U+EFFFD /
U+F0000-U+FFFFD / U+100000-U+10FFFD>
UCHARBEYONDLATIN1 = <U+0100-U+D7FF / U+E000-U+FDCF /
U+FDF0-U+FFFD> / UCHARBEYONDBMP
UCHARBEYONDC1 = <U+00A0-U+D7FF / U+E000-U+FDCF / U+FDF0-U+FFFD>
/ UCHARBEYONDBMP
UCHARBEYONDASCII = C1 / UCHARBEYONDC1
UCHAR = <U+0001-U+D7FF / U+E000-U+FDCF / U+FDF0-U+FFFD> /
UCHARBEYONDBMP
; D49 private-use
; Section 23.5 Private-Use Characters
; Primary Private Use Area (in BMP)
PPUACHAR = <U+E000-U+F8FF>
; Supplementary Private Use Area-A
SPUAACHAR = <U+F0000-U+FFFFF>
; Supplementary Private Use Area-B
SPUABCHAR = <U+100000-U+10FFFF>
; TODO: possible alternates: PUCHAR, PUA
PUACHAR = PPUACHAR / SPUAACHAR / SPUABCHAR
; Unicode-y VCHAR: like VCHAR, attempts to capture
; "all standardized graphic and formatting
; characters/code points for open interchange,
; excluding white space and controls"
; EXCLUDES: Noncharacters (some Cn), Cs, Co, Cc, Z (Zs, Zl, Zp)
UVCHARBEYONDBMP = <U+10000-U+1FFFD / U+20000-U+2FFFD /
U+30000-U+3FFFD / U+40000-U+4FFFD /
U+50000-U+5FFFD / U+60000-U+6FFFD /
U+70000-U+7FFFD / U+80000-U+8FFFD /
U+90000-U+9FFFD / U+A0000-U+AFFFD /
U+B0000-U+BFFFD / U+C0000-U+CFFFD /
U+D0000-U+DFFFD / U+E0000-U+EFFFD>
UVCHARBEYONDLATIN1 = <U+0100-U+167F / U+1681-U+1FFF /
U+200B-U+2027 / U+202A-U+202E /
U+2030-U+205E / U+2060-U+2FFF /
U+3001-U+D7FF /
U+F900-U+FDCF / U+FDF0-U+FFFD> /
UVCHARBEYONDBMP
UVCHARBEYONDASCII = <U+00A1-U+167F / U+1681-U+1FFF /
U+200B-U+2027 / U+202A-U+202E /
U+2030-U+205E / U+2060-U+2FFF /
U+3001-U+D7FF /
U+F900-U+FDCF / U+FDF0-U+FFFD> /
UVCHARBEYONDBMP
UVCHARBEYONDC1 = UVCHARBEYONDASCII
UVCHAR = VCHAR / UVCHARBEYONDASCII
; horizontal white space only (Zs beyond ASCII),
; NO line breaks (Cc, Zl, Zp)
; cf Section 5.8 Newline Guidelines with RFC 5198
; see also SP
UWSPBEYONDASCII = <U+00A0 / U+1680 / U+2000-U+200A /
U+202F / U+205F / U+3000>
; includes HT
UWSP = WSP / UWSPBEYONDASCII
; C1 Controls
PAD = <U+0080> ; gov't health warning: figment
HOP = <U+0081> ; gov't health warning: figment
BPH = <U+0082>
NBH = <U+0083>
IND = <U+0084>
NEL = <U+0085>
; NLF CRLF, CR, LF, NEL (not LS or PS)
; --probably unnecessary for Internet usage:
; CRLF is already the standard
SSA = <U+0086>
ESA = <U+0087>
HTS = <U+0088>
HTJ = <U+0089>
VTS = <U+008A>
PLD = <U+008B>
PLU = <U+008C>
RI = <U+008D>
SS2 = <U+008E>
SS3 = <U+008F>
DCS = <U+0090>
PU1 = <U+0091>
PU2 = <U+0092>
STS = <U+0093>
CCH = <U+0094>
MW = <U+0095>
SPA = <U+0096>
EPA = <U+0097>
SOS = <U+0098>
SGCI = <U+0099> ; or SGC, gov't health warning: figment
SCI = <U+009A>
CSI = <U+009B>
ST = <U+009C>
OSC = <U+009D>
PM = <U+009E>
APC = <U+009F>
; Latin1
NBSP = <U+00A0>
SHY = <U+00AD>
; Zl, Zp
; NB: These are excluded from both UVCHAR and UWSP
LS = <U+2028>
PS = <U+2029>
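The range-based rules above lend themselves to simple table lookups
with no need for the UCD. The following Python sketch transcribes
UWSPBEYONDASCII and UVCHARBEYONDASCII and derives NONUCHAR from its
regular pattern (U+FDD0-U+FDEF plus the last two code points of each
of the 17 planes). The table layout and the helper name in_ranges are
assumptions of this example.

```python
# Inclusive (low, high) pairs; names mirror the rule names.
UWSPBEYONDASCII = [
    (0x00A0, 0x00A0), (0x1680, 0x1680), (0x2000, 0x200A),
    (0x202F, 0x202F), (0x205F, 0x205F), (0x3000, 0x3000),
]

UVCHARBEYONDASCII = [
    (0x00A1, 0x167F), (0x1681, 0x1FFF), (0x200B, 0x2027),
    (0x202A, 0x202E), (0x2030, 0x205E), (0x2060, 0x2FFF),
    (0x3001, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
] + [
    # UVCHARBEYONDBMP: planes 1-14 without their last two code points
    (plane << 16, (plane << 16) | 0xFFFD) for plane in range(1, 15)
]

NONUCHAR = [(0xFDD0, 0xFDEF)] + [
    ((plane << 16) | 0xFFFE, (plane << 16) | 0xFFFF)
    for plane in range(17)
]


def in_ranges(cp: int, ranges: list) -> bool:
    """True if cp falls within any inclusive (low, high) pair."""
    return any(low <= cp <= high for low, high in ranges)


# U+1F39F is a UVCHAR; U+00A0 is white space; U+3FFFE is neither.
print(in_ranges(0x1F39F, UVCHARBEYONDASCII),
      in_ranges(0x00A0, UWSPBEYONDASCII),
      in_ranges(0x3FFFE, NONUCHAR))
# prints True True True
```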
Authors' Addresses
Sean Leonard
Penango, Inc.
5900 Wilshire Boulevard
21st Floor
Los Angeles, CA 90036
USA
EMail: dev+ietf@seantek.com
URI: http://www.penango.com/
Chris Newman
Oracle
440 E. Huntington Dr., Suite 400
Arcadia, CA 91006
USA
EMail: chris.newman@oracle.com