CBOR C. Bormann
Internet-Draft Universität Bremen TZI
Updates: 8610 (if approved) 2 March 2024
Intended status: Standards Track
Expires: 3 September 2024
Updates to the CDDL grammar of RFC 8610
draft-ietf-cbor-update-8610-grammar-04
Abstract
The Concise Data Definition Language (CDDL), as defined in RFC 8610
and RFC 9165, provides an easy and unambiguous way to express
structures for protocol messages and data formats that are
represented in CBOR or JSON.
The present document updates RFC 8610 by addressing errata and making
other small fixes for the ABNF grammar defined for CDDL there.
About This Document
This note is to be removed before publishing as an RFC.
The latest revision of this draft can be found at
https://cbor-wg.github.io/update-8610-grammar/. Status information
for this document may be found at
https://datatracker.ietf.org/doc/draft-ietf-cbor-update-8610-grammar/.
Discussion of this document takes place on the CBOR Working Group
mailing list (mailto:cbor@ietf.org), which is archived at
https://mailarchive.ietf.org/arch/browse/cbor/. Subscribe at
https://www.ietf.org/mailman/listinfo/cbor/.
Source for this draft and an issue tracker can be found at
https://github.com/cbor-wg/update-8610-grammar.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Bormann Expires 3 September 2024 [Page 1]
Internet-Draft CDDL grammar updates March 2024
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 3 September 2024.
Copyright Notice
Copyright (c) 2024 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components
extracted from this document must include Revised BSD License text as
described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Revised BSD License.
Table of Contents
1. Introduction
1.1. Conventions and Definitions
2. Clarifications and Changes based on Errata Reports
2.1. Err6527 (text string literals)
2.2. Err6543 (byte string literals)
Change proposed by Errata Report 6543
No change needed after addressing Err6527 (text string literals)
(Section 2.1)
3. Small Enabling Grammar Changes
3.1. Empty data models
3.2. Non-literal Tag Numbers, Simple Values
4. Security Considerations
5. IANA Considerations
6. References
6.1. Normative References
6.2. Informative References
Appendix A. Updated Collected ABNF for CDDL
Acknowledgments
Author's Address
1. Introduction
The Concise Data Definition Language (CDDL), as defined in [RFC8610]
and [RFC9165], provides an easy and unambiguous way to express
structures for protocol messages and data formats that are
represented in CBOR or JSON.
The present document updates [RFC8610] by addressing errata and
making other small fixes for the ABNF grammar defined for CDDL there.
1.1. Conventions and Definitions
The terminology from [RFC8610] applies. The grammar in [RFC8610] is
based on ABNF, which is defined in [STD68] and [RFC7405].
2. Clarifications and Changes based on Errata Reports
_Compatibility_: errata fix
Two errata reports, [Err6527] and [Err6543], concern details of the
text string and byte string literal syntax. This section addresses
them by updating the ABNF for these literal syntaxes. In addition,
[Err6526] needs to be applied (backslashes were lost during RFC
processing in some text explaining backslash escaping).
2.1. Err6527 (text string literals)
The ABNF used in [RFC8610] for the content of text string literals is
rather permissive:
; RFC 8610 ABNF:
text = %x22 *SCHAR %x22
SCHAR = %x20-21 / %x23-5B / %x5D-7E / %x80-10FFFD / SESC
SESC = "\" (%x20-7E / %x80-10FFFD)
Figure 1: Old ABNF for strings with permissive ABNF for SESC, but
not allowing hex escapes
This allows almost any non-C0 character to be escaped by a backslash,
but critically omits the \uXXXX and \uHHHH\uLLLL forms that JSON
provides for specifying characters in hex (and which should apply
here according to Bullet 6 of Section 3.1 of [RFC8610]). (Note that
we import from JSON the unwieldy \uHHHH\uLLLL syntax, which
represents Unicode code points beyond U+FFFF by making them look like
UTF-16 surrogate pairs; CDDL text strings do not use UTF-16 or
surrogates.)
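As an illustration of the \uHHHH\uLLLL form, the following Python
sketch combines the two 16-bit values of such a pair into the single
Unicode scalar value they stand for. The function name
combine_surrogates is hypothetical and not taken from [RFC8610]; the
arithmetic is the usual UTF-16 surrogate pair calculation.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a \\uHHHH\\uLLLL pair into one Unicode scalar value."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("first value is not a high surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("second value is not a low surrogate")
    # Each surrogate contributes 10 bits; the result starts at U+10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

For example, the escape sequence \uD83D\uDE00 stands for the single
code point U+1F600.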
Both issues (the overly permissive escaping and the missing hex
forms) can be solved by updating the SESC production. We use the
opportunity to add a popular form of directly specifying characters
in strings using hexadecimal escape sequences of the form \u{hex},
where hex is the hexadecimal representation of the Unicode scalar
value. The result is the new set of rules defining SESC in Figure 2:
; new rules collectively defining SESC:
SESC = "\" ( %x22 / "/" / "\" /                 ; \" \/ \\
             %x62 / %x66 / %x6E / %x72 / %x74 / ; \b \f \n \r \t
             (%x75 hexchar) )                   ; \uXXXX
hexchar = "{" (1*"0" [ hexscalar ] / hexscalar) "}" /
          non-surrogate / (high-surrogate "\" %x75 low-surrogate)
non-surrogate = ((DIGIT / "A"/"B"/"C" / "E"/"F") 3HEXDIG) /
                ("D" %x30-37 2HEXDIG )
high-surrogate = "D" ("8"/"9"/"A"/"B") 2HEXDIG
low-surrogate = "D" ("C"/"D"/"E"/"F") 2HEXDIG
hexscalar = "10" 4HEXDIG / HEXDIG1 4HEXDIG
          / non-surrogate / 1*3HEXDIG
HEXDIG1 = DIGIT1 / "A" / "B" / "C" / "D" / "E" / "F"
Figure 2: Updated string ABNF to allow hex escapes
(Notes: In ABNF, strings such as "A", "B" etc. are case-insensitive,
as is intended here. We could have written %x62 as %s"b", but
didn't, in order to maximize ABNF tool compatibility.)
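To make the effect of the updated SESC rules more tangible, the
following Python sketch transcribes them into a regular expression.
This is a hand translation for illustration only, not a normative
part of the grammar; the names SESC_RE and is_sesc are hypothetical.
Hex digits are matched case-insensitively, while the escape letters
b, f, n, r, t, and u are matched as lowercase only.

```python
import re

HEX = "[0-9A-Fa-f]"
# non-surrogate: four hex digits outside the range D800..DFFF
NON_SURROGATE = f"(?:[0-9A-CEFa-cef]{HEX}{{3}}|[Dd][0-7]{HEX}{{2}})"
HIGH_SURROGATE = f"[Dd][89ABab]{HEX}{{2}}"
LOW_SURROGATE = f"[Dd][C-Fc-f]{HEX}{{2}}"
# hexscalar: one to six hex digits naming a Unicode scalar value
HEXSCALAR = (
    f"(?:10{HEX}{{4}}|[1-9A-Fa-f]{HEX}{{4}}|{NON_SURROGATE}|{HEX}{{1,3}})"
)
# hexchar: {hex} form, four-digit form, or surrogate pair form
HEXCHAR = (
    rf"(?:[{{](?:0+(?:{HEXSCALAR})?|{HEXSCALAR})[}}]"
    rf"|{NON_SURROGATE}"
    rf"|{HIGH_SURROGATE}\\u{LOW_SURROGATE})"
)
SESC_RE = re.compile(rf'\\(?:["/\\bfnrt]|u{HEXCHAR})')


def is_sesc(s: str) -> bool:
    """Return True if s is exactly one escape sequence matching SESC."""
    return SESC_RE.fullmatch(s) is not None
```

With this transcription, is_sesc accepts \n, \u00e9, \uD83D\uDE00,
and \u{1F600}, and rejects \' as well as the lone surrogate \uD83D.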
Now that SESC is more restrictively formulated, this also requires an
update to the BCHAR production used in the ABNF syntax for byte
string literals:
; RFC 8610 ABNF:
bytes = [bsqual] %x27 *BCHAR %x27
BCHAR = %x20-26 / %x28-5B / %x5D-10FFFD / SESC / CRLF
bsqual = "h" / "b64"
Figure 3: Old ABNF for BCHAR
With the SESC updated as above, \' is no longer allowed in BCHAR;
this now needs to be explicitly included.
Updating BCHAR also provides an opportunity to address [Err6278],
which points to an inconsistency in treating U+007F (DEL) between
SCHAR and BCHAR. As U+007F is not printable, including it in a byte
string literal is as confusing as for a text string literal, and it
should therefore be excluded from BCHAR as it is from SCHAR. The
same reasoning also applies to the C1 control characters, so we
actually exclude the entire range from U+007F to U+009F. The same
reasoning then also applies to text in comments (PCHAR). For
completeness, all these should also explicitly exclude the code
points that have been set aside for UTF-16's surrogates.
; new rules for BCHAR and SCHAR:
SCHAR = %x20-21 / %x23-5B / %x5D-7E / NONASCII / SESC
BCHAR = %x20-26 / %x28-5B / %x5D-7E / NONASCII / SESC / "\'" / CRLF
PCHAR = %x20-7E / NONASCII
NONASCII = %xA0-D7FF / %xE000-10FFFD
Figure 4: Updated ABNF for BCHAR, SCHAR, and PCHAR
(Note that, apart from addressing the inconsistencies, there is no
attempt to further exclude non-printable characters from the ABNF;
doing this properly would draw in complexity from the ongoing
evolution of the Unicode standard that is not needed here.)
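The character ranges of Figure 4 can be sketched as simple code point
tests in Python. The function names below are hypothetical, and the
sketch covers only the single-code-point alternatives: the SESC
escapes, the "\'" alternative of BCHAR, and the two-character CR LF
form of CRLF are deliberately left out.

```python
def is_nonascii(cp: int) -> bool:
    """NONASCII = %xA0-D7FF / %xE000-10FFFD"""
    return 0xA0 <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFD


def is_pchar(cp: int) -> bool:
    """PCHAR = %x20-7E / NONASCII"""
    return 0x20 <= cp <= 0x7E or is_nonascii(cp)


def is_literal_schar(cp: int) -> bool:
    """Unescaped SCHAR: printable ASCII except '"' and '\\', or NONASCII."""
    return (0x20 <= cp <= 0x21 or 0x23 <= cp <= 0x5B
            or 0x5D <= cp <= 0x7E or is_nonascii(cp))


def is_literal_bchar(cp: int) -> bool:
    """Unescaped BCHAR: printable ASCII except "'" and '\\', NONASCII, or LF."""
    return (0x20 <= cp <= 0x26 or 0x28 <= cp <= 0x5B
            or 0x5D <= cp <= 0x7E or is_nonascii(cp) or cp == 0x0A)
```

Under these tests, U+007F through U+009F and the surrogate code
points U+D800 through U+DFFF are rejected by all four functions.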
2.2. Err6543 (byte string literals)
The ABNF used in [RFC8610] for the content of byte string literals
lumps together byte strings notated as text with byte strings notated
in base16 (hex) or base64 (but see also updated BCHAR production
above):
; RFC 8610 ABNF:
bytes = [bsqual] %x27 *BCHAR %x27
BCHAR = %x20-26 / %x28-5B / %x5D-10FFFD / SESC / CRLF
Figure 5: Old ABNF for BCHAR
Change proposed by Errata Report 6543
Errata report 6543 proposes to handle the two cases in separate
productions (where, with an updated SESC, BCHAR obviously needs to be
updated as above):
; Err6543 proposal:
bytes = %x27 *BCHAR %x27
      / bsqual %x27 *QCHAR %x27
BCHAR = %x20-26 / %x28-5B / %x5D-10FFFD / SESC / CRLF
QCHAR = DIGIT / ALPHA / "+" / "/" / "-" / "_" / "=" / WS
Figure 6: Errata Report 6543 Proposal to Split the Byte String Rules
This potentially causes a subtle change, which is hidden in the WS
production:
; RFC 8610 ABNF:
WS = SP / NL
SP = %x20
NL = COMMENT / CRLF
COMMENT = ";" *PCHAR CRLF
PCHAR = %x20-7E / %x80-10FFFD
CRLF = %x0A / %x0D.0A
Figure 7: ABNF definition of WS from RFC 8610
This allows any non-C0 character in a comment, so this fragment
becomes possible:
foo = h'
43424F52 ; 'CBOR'
0A ; LF, but don't use CR!
'
The current text does not state unambiguously whether the three
apostrophes need to be escaped with a \ or not, as in:
foo = h'
43424F52 ; \'CBOR\'
0A ; LF, but don\'t use CR!
'
... which would be supported by the existing ABNF in [RFC8610].
No change needed after addressing Err6527 (text string literals)
(Section 2.1)
This document takes the simpler approach of leaving the processing of
the content of the byte string literal to a semantic step after
processing the syntax of the bytes/BCHAR rules as updated by Figure 2
and Figure 4.
The rules in Figure 7 are therefore applied to the result of this
processing where bsqual is given as h or b64.
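The two-step processing described above can be sketched in Python for
the h case. This is a minimal illustration under stated assumptions,
not a complete implementation: the helper names are hypothetical, the
first step undoes only the \' and \\ escapes (the other SESC forms
are omitted), and the second step removes comments and whitespace
without validating PCHAR.

```python
import re

COMMENT_RE = re.compile(r";[^\n]*\n")  # COMMENT = ";" *PCHAR CRLF


def unescape_simple(raw: str) -> str:
    """Step 1 (simplified): undo only the \\' and \\\\ escapes."""
    return re.sub(r"\\(['\\])", r"\1", raw)


def decode_h_literal(raw: str) -> bytes:
    """Step 2: apply the WS rules to the unescaped text, then decode hex."""
    text = unescape_simple(raw)
    text = COMMENT_RE.sub("", text)           # drop comments up to end of line
    hexdigits = re.sub(r"[ \r\n]", "", text)  # drop SP and CRLF
    return bytes.fromhex(hexdigits)
```

Applied to the content of the escaped example above (the text between
the outer apostrophes), decode_h_literal yields the five bytes of
"CBOR" followed by LF.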
Note that this approach also works well with the use of byte strings
in Section 3 of [RFC9165]. It does require some care when copy-
pasting into CDDL models from ABNF that contains single quotes (which
may also hide as apostrophes in comments); these need to be escaped
or possibly replaced by %x27.
Finally, our approach would lend support to extending bsqual in CDDL
similar to the way this is done for CBOR diagnostic notation in
[I-D.ietf-cbor-edn-literals].
3. Small Enabling Grammar Changes
The two subsections in this section specify two small changes to the
grammar that are intended to enable certain kinds of specifications.
3.1. Empty data models
_Compatibility_: backward (not forward)
[RFC8610] requires a CDDL file to have at least one rule.
; RFC 8610 ABNF:
cddl = S 1*(rule S)
Figure 8: Old ABNF for top-level rule cddl
This makes sense when the file has to stand alone, as a CDDL data
model needs to have at least one rule to provide an entry point
(start rule).
With CDDL modules [I-D.ietf-cbor-cddl-modules], CDDL files can also
include directives, and these might be the source of all the rules
that ultimately make up the module created by the file. Any other
rule content in the file has to be available for directive
processing, making the requirement for at least one rule cumbersome.
Therefore, we extend the grammar as in Figure 9 and make the
existence of at least one rule a semantic constraint, to be fulfilled
after processing of all directives.
; new top-level rule:
cddl = S *(rule S)
Figure 9: Updated ABNF for top-level rule cddl
3.2. Non-literal Tag Numbers, Simple Values
_Compatibility_: backward (not forward)
The existing ABNF syntax for expressing tags in CDDL is:
; extracted from RFC 8610 ABNF:
type2 =/ "#" "6" ["." uint] "(" S type S ")"
Figure 10: Old ABNF for tag syntax
This means tag numbers can only be given as literal numbers (uints).
Some specifications operate on ranges of tag numbers, e.g., [RFC9277]
has a range of tag numbers 1668546817 (0x63740101) to 1668612095
(0x6374FFFF) to tag specific content formats. This cannot currently
be expressed in CDDL. Similar considerations apply to simple values
(#7.xx).
This update extends the syntax to:
; new rules collectively defining the tagged case:
type2 =/ "#" "6" ["." head-number] "(" S type S ")"
       / "#" "7" ["." head-number]
head-number = uint / ("<" type ">")
Figure 11: Updated ABNF for tag and simple value syntaxes
For #6, the head-number stands for the tag number. For #7, the head-
number stands for the simple value if it is in the ranges 0..23 or
32..255 (as per Section 3.3 of RFC 8949 [STD94] the simple values
24..31 are not used). For 24..31, the head-number stands for the
"additional information", e.g., #7.25 or #7.<25> is a float16, etc.
(All ranges mentioned here are inclusive.)
So the above range can be expressed in a CDDL fragment such as:
ct-tag<content> = #6.<ct-tag-number>(content)
ct-tag-number = 1668546817..1668612095
; or use 0x63740101..0x6374FFFF
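The meaning of head-number for #6 and #7 can be sketched in Python as
follows. The names used here are hypothetical and serve only to
illustrate the inclusive ranges given above; the sketch does not
parse CDDL.

```python
# Tag numbers 1668546817..1668612095 (inclusive), as in ct-tag-number.
CT_TAG_NUMBERS = range(0x63740101, 0x6374FFFF + 1)


def is_ct_tag_number(tag_number: int) -> bool:
    """Check a #6 head-number against the ct-tag-number range."""
    return tag_number in CT_TAG_NUMBERS


def describe_simple_head(head_number: int) -> str:
    """Say what a #7 head-number stands for."""
    if 0 <= head_number <= 23 or 32 <= head_number <= 255:
        return "simple value"
    if 24 <= head_number <= 31:
        return "additional information"
    raise ValueError("head-number out of range for #7")
```

For example, describe_simple_head(25) reports "additional
information", matching the float16 case mentioned above.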
Notes:
1. This syntax reuses the angle bracket syntax for generics; this
reuse is innocuous as a generic parameter/argument only ever
occurs after a rule name (id), while it occurs after . here.
(Whether there is potential for human confusion can be debated;
the above example deliberately uses generics as well.)
2. The updated ABNF grammar makes it a bit more explicit that the
number given after the optional dot is special, not giving the
CBOR "additional information" for tags and simple values as it is
with other uses of # in CDDL. (Adding this observation to
Section 2.2.3 of [RFC8610] is the subject of [Err6575]; it is
correctly noted in Section 3.6 of [RFC8610].) In hindsight,
maybe a different character than the dot should have been chosen
for this special case; however, changing the grammar now would
be too disruptive.
4. Security Considerations
The grammar fixes and updates in this document are not believed to
create additional security considerations. The security
considerations in Section 5 of [RFC8610] do apply. Specifically, the
potential for confusion is increased in an environment that combines
CDDL tools that have been updated with tools that have not, in
particular with respect to the changes in Section 2.
5. IANA Considerations
This document has no IANA actions.
6. References
6.1. Normative References
[RFC8610] Birkholz, H., Vigano, C., and C. Bormann, "Concise Data
Definition Language (CDDL): A Notational Convention to
Express Concise Binary Object Representation (CBOR) and
JSON Data Structures", RFC 8610, DOI 10.17487/RFC8610,
June 2019, <https://www.rfc-editor.org/rfc/rfc8610>.
[STD68] Internet Standard 68,
<https://www.rfc-editor.org/info/std68>.
At the time of writing, this STD comprises the following:
Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
Specifications: ABNF", STD 68, RFC 5234,
DOI 10.17487/RFC5234, January 2008,
<https://www.rfc-editor.org/info/rfc5234>.
[STD94] Internet Standard 94,
<https://www.rfc-editor.org/info/std94>.
At the time of writing, this STD comprises the following:
Bormann, C. and P. Hoffman, "Concise Binary Object
Representation (CBOR)", STD 94, RFC 8949,
DOI 10.17487/RFC8949, December 2020,
<https://www.rfc-editor.org/info/rfc8949>.
6.2. Informative References
[Err6278] "Errata Report 6278", RFC 8610,
<https://www.rfc-editor.org/errata/eid6278>.
[Err6526] "Errata Report 6526", RFC 8610,
<https://www.rfc-editor.org/errata/eid6526>.
[Err6527] "Errata Report 6527", RFC 8610,
<https://www.rfc-editor.org/errata/eid6527>.
[Err6543] "Errata Report 6543", RFC 8610,
<https://www.rfc-editor.org/errata/eid6543>.
[Err6575] "Errata Report 6575", RFC 8610,
<https://www.rfc-editor.org/errata/eid6575>.
[I-D.ietf-cbor-cddl-modules]
Bormann, C., "CDDL Module Structure", Work in Progress,
Internet-Draft, draft-ietf-cbor-cddl-modules-01, 18 December 2023,
<https://datatracker.ietf.org/doc/html/draft-ietf-cbor-cddl-modules-01>.
[I-D.ietf-cbor-edn-literals]
Bormann, C., "CBOR Extended Diagnostic Notation (EDN):
Application-Oriented Literals, ABNF, and Media Type", Work
in Progress, Internet-Draft, draft-ietf-cbor-edn-literals-08,
1 February 2024,
<https://datatracker.ietf.org/doc/html/draft-ietf-cbor-edn-literals-08>.
[RFC7405] Kyzivat, P., "Case-Sensitive String Support in ABNF",
RFC 7405, DOI 10.17487/RFC7405, December 2014,
<https://www.rfc-editor.org/rfc/rfc7405>.
[RFC9165] Bormann, C., "Additional Control Operators for the Concise
Data Definition Language (CDDL)", RFC 9165,
DOI 10.17487/RFC9165, December 2021,
<https://www.rfc-editor.org/rfc/rfc9165>.
[RFC9277] Richardson, M. and C. Bormann, "On Stable Storage for
Items in Concise Binary Object Representation (CBOR)",
RFC 9277, DOI 10.17487/RFC9277, August 2022,
<https://www.rfc-editor.org/rfc/rfc9277>.
Appendix A. Updated Collected ABNF for CDDL
This appendix provides the full ABNF from [RFC8610] with the updates
applied in the present document.
cddl = S *(rule S)
rule = typename [genericparm] S assignt S type
     / groupname [genericparm] S assigng S grpent
typename = id
groupname = id
assignt = "=" / "/="
assigng = "=" / "//="
genericparm = "<" S id S *("," S id S ) ">"
genericarg = "<" S type1 S *("," S type1 S ) ">"
type = type1 *(S "/" S type1)
type1 = type2 [S (rangeop / ctlop) S type2]
; space may be needed before the operator if type2 ends in a name
type2 = value
      / typename [genericarg]
      / "(" S type S ")"
      / "{" S group S "}"
      / "[" S group S "]"
      / "~" S typename [genericarg]
      / "&" S "(" S group S ")"
      / "&" S groupname [genericarg]
      / "#" "6" ["." head-number] "(" S type S ")"
      / "#" "7" ["." head-number]
      / "#" DIGIT ["." uint] ; major/ai
      / "#" ; any
head-number = uint / ("<" type ">")
rangeop = "..." / ".."
ctlop = "." id
group = grpchoice *(S "//" S grpchoice)
grpchoice = *(grpent optcom)
grpent = [occur S] [memberkey S] type
       / [occur S] groupname [genericarg] ; preempted by above
       / [occur S] "(" S group S ")"
memberkey = type1 S ["^" S] "=>"
          / bareword S ":"
          / value S ":"
bareword = id
optcom = S ["," S]
occur = [uint] "*" [uint]
      / "+"
      / "?"
uint = DIGIT1 *DIGIT
     / "0x" 1*HEXDIG
     / "0b" 1*BINDIG
     / "0"
value = number
      / text
      / bytes
int = ["-"] uint
; This is a float if it has fraction or exponent; int otherwise
number = hexfloat / (int ["." fraction] ["e" exponent ])
hexfloat = ["-"] "0x" 1*HEXDIG ["." 1*HEXDIG] "p" exponent
fraction = 1*DIGIT
exponent = ["+"/"-"] 1*DIGIT
text = %x22 *SCHAR %x22
SCHAR = %x20-21 / %x23-5B / %x5D-7E / NONASCII / SESC
SESC = "\" ( %x22 / "/" / "\" /                 ; \" \/ \\
             %x62 / %x66 / %x6E / %x72 / %x74 / ; \b \f \n \r \t
             (%x75 hexchar) )                   ; \uXXXX
hexchar = "{" (1*"0" [ hexscalar ] / hexscalar) "}" /
          non-surrogate / (high-surrogate "\" %x75 low-surrogate)
non-surrogate = ((DIGIT / "A"/"B"/"C" / "E"/"F") 3HEXDIG) /
                ("D" %x30-37 2HEXDIG )
high-surrogate = "D" ("8"/"9"/"A"/"B") 2HEXDIG
low-surrogate = "D" ("C"/"D"/"E"/"F") 2HEXDIG
hexscalar = "10" 4HEXDIG / HEXDIG1 4HEXDIG
          / non-surrogate / 1*3HEXDIG
bytes = [bsqual] %x27 *BCHAR %x27
BCHAR = %x20-26 / %x28-5B / %x5D-7E / NONASCII / SESC / "\'" / CRLF
bsqual = "h" / "b64"
id = EALPHA *(*("-" / ".") (EALPHA / DIGIT))
ALPHA = %x41-5A / %x61-7A
EALPHA = ALPHA / "@" / "_" / "$"
DIGIT = %x30-39
DIGIT1 = %x31-39
HEXDIG = DIGIT / "A" / "B" / "C" / "D" / "E" / "F"
HEXDIG1 = DIGIT1 / "A" / "B" / "C" / "D" / "E" / "F"
BINDIG = %x30-31
S = *WS
WS = SP / NL
SP = %x20
NL = COMMENT / CRLF
COMMENT = ";" *PCHAR CRLF
PCHAR = %x20-7E / NONASCII
NONASCII = %xA0-D7FF / %xE000-10FFFD
CRLF = %x0A / %x0D.0A
Figure 12: ABNF for CDDL as updated
Acknowledgments
Many thanks go to the submitters of the errata reports addressed in
this document. In one of the ensuing discussions, Doug Ewell
proposed to define an ABNF rule NONASCII, of which we have included
the essence.
Author's Address
Carsten Bormann
Universität Bremen TZI
Postfach 330440
D-28359 Bremen
Germany
Phone: +49-421-218-63921
Email: cabo@tzi.org