Network Working Group                                         S. Leonard
Internet-Draft                                             Penango, Inc.
Updates: 5234 (if approved)                               March 13, 2017
Intended Status: Standards Track                                        
Expires: September 14, 2017                                             


            Comprehensive Core Rules and References for ABNF
                 draft-seantek-abnf-more-core-rules-08
                                    
Abstract

   This document extends the base definition of ABNF (Augmented Backus-
   Naur Form) to include a reference syntax, along with core rules that
   provide comprehensive support for certain symbols related to ASCII.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute working
   documents as Internet-Drafts. The list of current Internet-Drafts is
   at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 14, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

 


Leonard                     Standards Track                     [Page 1]

Internet-Draft              More Core Rules                   March 2017


1.  Introduction

   Augmented Backus-Naur Form (ABNF) [RFC5234] is a formal syntax that
   is popular among many Internet specifications. Many Internet
   documents employ this syntax along with the Core Rules defined in
   Appendix B.1 of [RFC5234]. However, the Core Rules do not specify
   many symbols in the ASCII range that are also needed by these relying
   documents, forcing document authors to define them as local rules.
   Sometimes different documents define these common symbols in
   different ways, resulting in confusion or incompatibility when the
   rules are misread or are combined with other sets of rules.
   Furthermore, [RFC5234] does not clarify whether referencing [RFC5234]
   for ABNF automatically defines its Core Rules.

   [RFC5234] also lacks a syntax for referring to rules from other
   specifications. Instead, authors have been required to name the rules
   and sources in the specification prose. While this method has served
   authors well, it has hampered machine-readable ABNF efforts for
   services such as syntax highlighting, automatic grammar checking, and
   compiling into target computer languages.

   This document addresses these problems by introducing a reference
   syntax for rules taken from other ABNF grammars, as well as an
   enhanced set of "Core Rules" based on ASCII that are usable without
   needing to be referenced.

2.  Comprehensive Core Rule Update

   This document provides Core Rules that include comprehensive support
   for certain symbols, namely DELETE (DEL) and the C0 controls in
   [ASCII86], which for purposes of this document is equivalent to
   [RFC0020]. The Comprehensive Core Rules are listed in Appendix A as a
   drop-in replacement for the Core Rules of [RFC5234].

3.  Reference Syntax

   The purpose of reference syntax is to provide a uniform way to refer
   to rules in other ABNF grammars, without needing to "import",
   "recognize", or "take" all of the rules from those ABNF grammars into
   the subject grammar. The syntax in this section essentially replaces
   the verbiage: "{RULE} is taken from {[RFCXXXX]}" in text that
   describes the ABNF. This verbiage traditionally has appeared in the
   specification prose adjacent to ABNF, in ABNF prose-val productions,
   or in ABNF comments. The varying verbiage has made it difficult for
   both human readers and machine parsers to validate the ABNF. At the
   same time, the presence of such verbiage in the vast majority of
   published ABNF specifications in the RFC series demonstrates the need
   for a general-purpose referencing facility.
 


   To reference a rule in another ABNF grammar, use the syntax
   rulename@REF. The referenced rule resolves to terminal values in the
   context of the referenced ABNF grammar. The following enhancement to
   [RFC5234] permits this referenced-rule syntax as a change to the
   <element> production:

      element    =  rulename [ "@" ruleref ] / group / option /
                    char-val / num-val / prose-val

      ruleref    =  ref-doc / ref-path

      ref-doc    =  "[" 1*(SP / %x21-5A / %x5C / %x5E-7E) "]"
                        ; bracketed string of SP / VCHAR without [ or ]
                        ; TODO: are leading and trailing SP ok?

      ref-path   =  "<" 1*(SP / %x21-3B / "\<" / "=" / "\>" / %x3F-7E) ">"
                        ; angle-bracketed string of SP / VCHAR;
                        ; < and > permitted with \ prefix

   In a referenced-rule production the <rulename> production preceding
   the "@" specifies the name of the rule in the reference containing
   ABNF. The <ruleref> production following the "@" specifies the
   reference containing the rule. This specification does not define the
   semantics if a rule is found in a grammar that is not ABNF. (This
   limitation is because rule names in ABNF are case-insensitive and
   drawn from a limited character repertoire. Some rule names in other
   BNFs may be unreachable or ambiguous, even though the productions
   named by the rules are linguistically compatible.)

   The <ref-doc> production is a document reference of a resource
   containing ABNF. The term "document reference" refers to "the
   document containing this ABNF (i.e., the instance of these production
   rules)". In IETF-related publications, ref-doc conveniently is of the
   same form as document references, such as "[RFC1605]". [[NB: in this
   draft:]] Arbitrary spaces (not tabs) are allowed; interior brackets
   "[" and "]" are prohibited.

   The <ref-path> production is a path to a resource containing ABNF.
   The ABNF might be in a text file or MIME entity, for example. The
   intent of this production is to accommodate file paths and Uniform
   Resource Identifiers [RFC3986] (including fragment identifier
   components), but this specification imposes no requirement to
   validate conformance to those syntaxes. If the characters "<" or ">"
   are present in the path, they are syntactically distinguishable from
   the ref-path terminators by being escaped with a preceding backslash.
   The assumption is that ref-doc rather than ref-path productions will
   be used in published standards documents.
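
   As an illustration only, the referenced-rule syntax above might be
   recognized with regular expressions along the following lines. This
   is a minimal sketch in Python, not part of this specification: the
   function name parse_referenced_rule and the returned tuple layout
   are assumptions made for the example, and the sketch handles a
   single <element> rather than a whole rule list.

```python
import re

# rulename per RFC 5234: ALPHA *(ALPHA / DIGIT / "-")
RULENAME = r"[A-Za-z][A-Za-z0-9-]*"
# ref-doc: "[" 1*(SP / %x21-5A / %x5C / %x5E-7E) "]"
REF_DOC = r"\[(?P<doc>[\x20-\x5A\x5C\x5E-\x7E]+)\]"
# ref-path: "<" ... ">", where "\<" and "\>" stand for literal brackets
REF_PATH = r"<(?P<path>(?:\\[<>]|[\x20-\x3B\x3D\x3F-\x7E])+)>"

REFERENCED_RULE = re.compile(
    rf"(?P<rulename>{RULENAME})@(?:{REF_DOC}|{REF_PATH})"
)


def parse_referenced_rule(element):
    """Return (rulename, kind, reference), or None if not a referenced rule.

    Hypothetical helper for illustration; 'kind' is "ref-doc" or "ref-path".
    """
    match = REFERENCED_RULE.fullmatch(element)
    if match is None:
        return None
    if match.group("doc") is not None:
        return (match.group("rulename"), "ref-doc", match.group("doc"))
    # Remove the backslash that escapes "<" and ">" inside a ref-path.
    path = re.sub(r"\\([<>])", r"\1", match.group("path"))
    return (match.group("rulename"), "ref-path", path)
```

   Because ABNF rule names are case-insensitive, a caller would
   normally compare the returned rule name without regard to case.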

 


   [[NB: in this draft:]] This document only proposes referenced-rule
   syntax in <element> productions, that is, on the right-hand side of a
   rule definition. The referenced-rule syntax is not proposed to appear
   on the left-hand side at this time.

   Stylistically, authors are encouraged to put reference syntax at the
   top of a list of rules, and to limit usage of the reference syntax to
   the single element of a rule definition. For example:

                   You      =  Edward@[FFIV]
                   spoony   =  spoony@[FFIV]
                   bard     =  bard@[FF-JOB-CLASS]
                   chara    =  Tellah@[FFIV]

                   insult   =  chara ":" You spoony bard "!"

   Appendix B provides some tips on how to think about combining
   referenced ABNF rules with the subject ABNF grammar of a
   specification.

5.  Effects on RFC 5234

   Formally, this document updates [RFC5234] but does not modify it in
   situ. Authors need to reference this document if they want to include
   these enhancements; bare references to [RFC5234] do not include this
   specification (or, for that matter, [RFC7405]). This directive
   follows a model whereby document authors can choose whether to invoke
   particular enhancements to ABNF. As time goes on, the IETF can
   determine how often these enhancements are invoked, and can decide
   whether to include them as part of a revision to the base [RFC5234].

   A bare reference to this document invokes the reference syntax
   enhancement and the Core Rules of Appendix A (i.e., the Core Rules do
   not have to use reference syntax).

   Appendix A of this document is meant to mirror Appendix B.1 of
   [RFC5234]; therefore, concurrently referencing Appendix B.1 of
   [RFC5234] is redundant yet harmless. Document authors who reference
   this document should use the rules of Appendix A, and should not
   attempt to redefine or provide incremental alternatives to them
   (except for backwards compatibility with prior documents).

6.  IANA Considerations

   This document implies no IANA considerations.

7.  Security Considerations

   While the Core Rules themselves may not be security-relevant, the use
   of such control characters could very well be security-relevant,
   because they may trigger special functions on various devices, while
   being invisible in other contexts.

   Unfortunately, security is relevant to the reference syntax in this
   document. Using the reference syntax facilitates automated processing
   of ABNF. A malicious source could supply different ABNF as an attack
   vector on a compiled program. Furthermore, referring to a mutable
   resource (e.g., a document series such as BCP) permits the resource
   to change its contained ABNF, which may be well-intentioned but have
   side-effects when combined with the referring ABNF. Authors should
   stick to persistent, durable references, whose integrity can be
   validated easily.

8.  Acknowledgements

   The author wishes to thank Paul Kyzivat and Chris Newman for ongoing
   discussion and comments during the development of this draft.

9.  References

9.1.  Normative References

   [ASCII86] American National Standards Institute, "Coded Character Set
              -- 7-bit American Standard Code for Information
              Interchange", ANSI X3.4, 1986.

   [RFC0020]  Cerf, V., "ASCII format for network interchange", RFC 20,
              October 1969.

   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234, January 2008.

9.2.  Informative References

   [UNICODE]  The Unicode Consortium, "The Unicode Standard, Version
              9.0.0", The Unicode Consortium, August 2016.

   [RFC1345]  Simonsen, K., "Character Mnemonics and Character Sets",
              RFC 1345, June 1992.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66, RFC
              3986, January 2005.

   [RFC5198]  Klensin, J. and M. Padlipsky, "Unicode Format for Network
              Interchange", RFC 5198, March 2008.


 


Appendix A.  Comprehensive Core Rules

   Certain basic rules are in uppercase, such as SP, HTAB, CRLF, DIGIT,
   ALPHA, etc.

         ALPHA          =  %x41-5A / %x61-7A   ; A-Z / a-z

         BIT            =  "0" / "1"

         CHAR           =  %x01-7F
                                ; any 7-bit US-ASCII character,
                                ;  excluding NUL

         CR             =  %x0D
                                ; carriage return

         CRLF           =  CR LF
                                ; Internet standard newline

         CTL            =  %x00-1F / %x7F
                                ; controls

         DIGIT          =  %x30-39
                                ; 0-9

         DQUOTE         =  %x22
                                ; " (Double Quote)

         HEXDIG         =  DIGIT / "A" / "B" / "C" / "D" / "E" / "F"

         HTAB           =  %x09
                                ; horizontal tab

         LF             =  %x0A
                                ; linefeed

         LWSP           =  *(WSP / CRLF WSP)
                                ; Use of this linear-white-space rule
                                ;  permits lines containing only white
                                ;  space that are no longer legal in
                                ;  mail headers and have caused
                                ;  interoperability problems in other
                                ;  contexts.
                                ; Do not use when defining mail
                                ;  headers and use with caution in
                                ;  other contexts.

         OCTET          =  %x00-FF
                                ; 8 bits of data

         SP             =  %x20

         VCHAR          =  %x21-7E
                                ; visible (printing) characters

         WSP            =  SP / HTAB
                                ; white space

         NUL            =  %d0
         SOH            =  %d1
         STX            =  %d2
         ETX            =  %d3
         EOT            =  %d4
         ENQ            =  %d5
         ACK            =  %d6
         BEL            =  %d7
         BS             =  %d8
         HT             =  %d9   ; also defined as HTAB

         VT             =  %d11
         FF             =  %d12  ; (literally used in every RFC)

         SO             =  %d14
         SI             =  %d15
         DLE            =  %d16
         DC1            =  %d17
         DC2            =  %d18
         DC3            =  %d19
         DC4            =  %d20
         NAK            =  %d21
         SYN            =  %d22
         ETB            =  %d23
         CAN            =  %d24
         EM             =  %d25
         SUB            =  %d26
         ESC            =  %d27
         FS             =  %d28
         GS             =  %d29
         RS             =  %d30
         US             =  %d31

         DEL            =  %d127

         ASCII          =  %x00-7F
         C0             =  %x00-1F
         G0             =  VCHAR  ; 94-set
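
   For readers who want to check the rules above mechanically, the
   following Python sketch tabulates the named controls and partitions
   the ASCII range into C0, SP, G0, and DEL. It is illustrative only:
   the names NAMED_CONTROLS and classify are assumptions of this
   example, and the LF and CR entries are taken from the Core Rules
   earlier in this appendix rather than from the %d list.

```python
# Rule names for code points 0 through 31, in order. LF (10) and CR (13)
# are defined among the earlier Core Rules, not in the %d list.
C0_NAMES = (
    "NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI "
    "DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US"
).split()

# Map each rule name to its code point; DEL sits apart at 127.
NAMED_CONTROLS = {name: value for value, name in enumerate(C0_NAMES)}
NAMED_CONTROLS["DEL"] = 127


def classify(code_point):
    """Return "C0", "SP", "G0", or "DEL" for a code point in ASCII."""
    if not 0x00 <= code_point <= 0x7F:
        raise ValueError("outside the ASCII rule (%x00-7F)")
    if code_point <= 0x1F:
        return "C0"
    if code_point == 0x20:
        return "SP"
    if code_point == 0x7F:
        return "DEL"
    return "G0"  # same range as VCHAR, %x21-7E
```

   Under these definitions, the CTL rule corresponds to the code
   points classified as either "C0" or "DEL".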





 


Appendix B.  Guidance for Automated Referenced Rule Conversion

   ABNF is a formal notation for describing the syntax of languages used
   in (Internet-connected) computing. Emphasis is therefore placed on
   human interpretation of ABNF grammars in the context of prose
   specifications, over formal computer languages that require machine
   tools to interpret. Nevertheless, as a formal syntactic metalanguage,
   tools can interpret ABNF grammars and validate conformance of
   grammars to ABNF as well as conformance of language instances to
   ABNF-defined grammars. This informative appendix provides guidance on
   how an automated tool might convert between referenced rules and
   terminal values.

   [[TODO: Discuss and put content here.]]

   Assume the existence of an "ABNF extractor", a tool that takes as
   input a document, and provides as output a stream of ABNF conforming
   to the <rulelist> production of ABNF.

   Extract the document reference from the <refrule>.

   Match the document reference to a reference in the References section
   of an RFC or conforming Internet-Draft.

   Parse the reference for an identifier that can be dereferenced, e.g.,
   a file path or URI.

   Dereference the identifier.

   Use the ABNF extractor to extract ABNF from the dereferenced
   document.

   Identify the <rulename> that matches the <rulename> from the
   <refrule>.

   If the ABNF in the dereferenced document is resolved to terminal
   values, it is resolved in its own context, not in the context of the
   original <refrule>'s ABNF.
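
   The steps above might be sketched as follows. This Python example
   is a rough illustration under several assumptions, none of which
   come from this document: <refrule> is read as a referenced-rule
   element (rulename@ruleref), the References section is modeled as a
   plain mapping from document reference to identifier, "fetch" is a
   caller-supplied function that dereferences an identifier, and the
   toy extractor only recognizes single-line rules and does not cope
   with ";" inside quoted strings.

```python
import re

# Toy pattern for one single-line rule: name, "=" or "=/", then elements.
RULE_LINE = re.compile(
    r"^\s*(?P<name>[A-Za-z][A-Za-z0-9-]*)\s*=/?\s*(?P<elements>.+?)\s*$"
)


def extract_abnf(document_text):
    """Hypothetical 'ABNF extractor': map lowercase rule name to elements."""
    rules = {}
    for line in document_text.splitlines():
        line = line.split(";", 1)[0]  # naive comment removal (assumption)
        match = RULE_LINE.match(line)
        if match:
            rules[match.group("name").lower()] = match.group("elements")
    return rules


def resolve(rulename, ref_doc, references, fetch):
    """Find the referenced rule; return (definition, referenced rules).

    The referenced rules are returned as well, so that any further
    resolution happens in the context of the referenced grammar.
    """
    identifier = references[ref_doc]      # match and parse the reference
    document_text = fetch(identifier)     # dereference the identifier
    rules = extract_abnf(document_text)   # extract ABNF from the document
    return rules[rulename.lower()], rules  # rule names: case-insensitive
```

   A lookup that fails at any step raises KeyError in this sketch; a
   real tool would presumably report which step could not be
   completed.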










 


Author's Address

   Sean Leonard
   Penango, Inc.
   5900 Wilshire Boulevard
   21st Floor
   Los Angeles, CA  90036
   USA

   EMail: dev+ietf@seantek.com
   URI:   http://www.penango.com/