Internet-Draft CDDL grammar updates December 2023
Bormann Expires 17 June 2024
Workgroup: CBOR
Internet-Draft: draft-ietf-cbor-update-8610-grammar-01
Updates: 8610 (if approved)
Published: December 2023
Intended Status: Standards Track
Expires: 17 June 2024
Author: C. Bormann, Universität Bremen TZI

Updates to the CDDL grammar of RFC 8610

Abstract

The Concise Data Definition Language (CDDL), as defined in RFC 8610 and RFC 9165, provides an easy and unambiguous way to express structures for protocol messages and data formats that are represented in CBOR or JSON.

The present document updates RFC 8610 by addressing errata and making other small fixes for the ABNF grammar defined for CDDL there.

About This Document

This note is to be removed before publishing as an RFC.

The latest revision of this draft can be found at https://cbor-wg.github.io/update-8610-grammar/. Status information for this document may be found at https://datatracker.ietf.org/doc/draft-ietf-cbor-update-8610-grammar/.

Discussion of this document takes place on the CBOR Working Group mailing list (mailto:cbor@ietf.org), which is archived at https://mailarchive.ietf.org/arch/browse/cbor/. Subscribe at https://www.ietf.org/mailman/listinfo/cbor/.

Source for this draft and an issue tracker can be found at https://github.com/cbor-wg/update-8610-grammar.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 17 June 2024.

1. Introduction

The Concise Data Definition Language (CDDL), as defined in [RFC8610] and [RFC9165], provides an easy and unambiguous way to express structures for protocol messages and data formats that are represented in CBOR or JSON.

The present document updates [RFC8610] by addressing errata and making other small fixes for the ABNF grammar defined for CDDL there.

1.1. Conventions and Definitions

The terminology from [RFC8610] applies. The grammar in [RFC8610] is based on ABNF, which is defined in [STD68] and [RFC7405].

2. Clarifications and Changes based on Errata Reports

Compatibility: errata fix

Two errata reports concern details of the text string and byte string literal syntax: [Err6527] and [Err6543]. They are addressed in this section, which updates details of the ABNF for these literal syntaxes. In addition, [Err6526] needs to be applied: backslashes were lost during RFC processing in some text explaining backslash escaping.

2.1. Err6527 (text string literals)

The ABNF used in [RFC8610] for the content of text string literals is rather permissive:

; RFC 8610 ABNF:
text = %x22 *SCHAR %x22
SCHAR = %x20-21 / %x23-5B / %x5D-7E / %x80-10FFFD / SESC
SESC = "\" (%x20-7E / %x80-10FFFD)
Figure 1: Old ABNF for strings with permissive ABNF for SESC, but not allowing hex escapes

This allows almost any non-C0 character to be escaped by a backslash, but it critically lacks the \uXXXX and \uHHHH\uLLLL forms that JSON provides for specifying characters in hex (which should apply here according to Bullet 6 of Section 3.1 of [RFC8610]). (Note that we import from JSON the unwieldy \uHHHH\uLLLL syntax, which represents Unicode code points beyond U+FFFF by making them look like UTF-16 surrogate pairs; CDDL text strings do not use UTF-16 or surrogates.)
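To make the surrogate-pair form concrete, the following minimal Python sketch (illustrative only; the function name is hypothetical) computes the single code point denoted by a \uHHHH\uLLLL pair, using the usual UTF-16 surrogate arithmetic:

```python
def combine_surrogates(high: int, low: int) -> int:
    """Return the code point denoted by the escape pair \\uHHHH\\uLLLL."""
    # High surrogates are D800..DBFF, low surrogates are DC00..DFFF.
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a high/low surrogate pair")
    # Each half contributes 10 bits of the offset above U+FFFF.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

For example, \uD83D\uDE00 denotes U+1F600; the resulting CDDL text string contains that one code point, not two surrogates.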

Both problems, the overly permissive escaping and the missing hex escapes, can be solved by updating the SESC production to:

; new rules collectively defining SESC:
SESC = "\" ( %x22 / "/" / "\" /                 ; \" \/ \\
             %x62 / %x66 / %x6E / %x72 / %x74 / ; \b \f \n \r \t
             (%x75 hexchar) )                   ; \uXXXX
hexchar = non-surrogate / (high-surrogate "\" %x75 low-surrogate)
non-surrogate = ((DIGIT / "A"/"B"/"C" / "E"/"F") 3HEXDIG) /
                ("D" %x30-37 2HEXDIG )
high-surrogate = "D" ("8"/"9"/"A"/"B") 2HEXDIG
low-surrogate = "D" ("C"/"D"/"E"/"F") 2HEXDIG
Figure 2: Updated string ABNF to allow hex escapes

(Notes: In ABNF, strings such as "A", "B" etc. are case-insensitive, as is intended here. We could have written %x62 as %s"b", but didn't, in order to maximize ABNF tool compatibility.)
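The following Python sketch shows one way a tool might decode the escapes in the content of a text string literal according to Figure 2. It is an illustration under simplifying assumptions, not a reference implementation: the function names are hypothetical, the input is assumed to be the content between the double quotes, and unescaped characters are only checked against the SCHAR ranges.

```python
# Single-character escapes allowed by the updated SESC rule (Figure 2).
# %x62 etc. are case-sensitive: only \b, \f, \n, \r, \t are accepted.
SIMPLE_ESCAPES = {'"': '"', "/": "/", "\\": "\\",
                  "b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t"}
HEXDIGITS = set("0123456789abcdefABCDEF")  # HEXDIG is case-insensitive


def read_hex4(s: str, i: int) -> int:
    """Read exactly four hex digits starting at s[i]."""
    digits = s[i:i + 4]
    if len(digits) != 4 or not set(digits) <= HEXDIGITS:
        raise ValueError(f"expected four hex digits at offset {i}")
    return int(digits, 16)


def unescape_text(body: str) -> str:
    """Decode the content between the double quotes of a CDDL text literal."""
    out, i = [], 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            cp = ord(ch)
            # Unescaped characters: %x20-21 / %x23-5B / %x5D-7E / %x80-10FFFD
            if cp < 0x20 or ch == '"' or cp == 0x7F or cp > 0x10FFFD:
                raise ValueError(f"U+{cp:04X} not allowed unescaped")
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("backslash at end of literal")
        esc = body[i + 1]
        if esc in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[esc])
            i += 2
        elif esc == "u":
            cp = read_hex4(body, i + 2)
            i += 6
            if 0xDC00 <= cp <= 0xDFFF:
                raise ValueError("low surrogate without high surrogate")
            if 0xD800 <= cp <= 0xDBFF:
                # hexchar: high-surrogate "\" %x75 low-surrogate
                if body[i:i + 2] != "\\u":
                    raise ValueError("high surrogate not followed by \\u")
                low = read_hex4(body, i + 2)
                if not 0xDC00 <= low <= 0xDFFF:
                    raise ValueError("high surrogate without low surrogate")
                i += 6
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00)
            out.append(chr(cp))
        else:
            # e.g. \' or \a: accepted by the RFC 8610 SESC, not by Figure 2
            raise ValueError(f"escape \\{esc} not allowed in text strings")
    return "".join(out)
```

With this sketch, a literal written as "caf\u00E9" decodes to "café", while "don\'t" is rejected, since \' is no longer part of SESC.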

Now that SESC is more restrictively formulated, this also requires an update to the BCHAR production used in the ABNF syntax for byte string literals:

; RFC 8610 ABNF:
bytes = [bsqual] %x27 *BCHAR %x27
BCHAR = %x20-26 / %x28-5B / %x5D-10FFFD / SESC / CRLF
bsqual = "h" / "b64"
Figure 3: Old ABNF for BCHAR

In BCHAR, the updated version explicitly allows \', which is no longer allowed in the updated SESC:

; new rule for BCHAR:
BCHAR = %x20-26 / %x28-5B / %x5D-10FFFD / SESC / "\'" / CRLF
Figure 4: Updated ABNF for BCHAR
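Why BCHAR needs an explicit "\'" alternative can be seen from how a scanner finds the end of a byte string literal. The Python sketch below (hypothetical function, simplified) treats a backslash as consuming the next character, so an escaped apostrophe does not terminate the literal; it does not validate the escapes it skips, which would be checked against SESC and "\'" separately. It matches bsqual case-insensitively, as quoted strings are in ABNF.

```python
def split_bytes_literal(src: str) -> tuple[str, str, str]:
    """Split src, which starts with a byte string literal, into
    (bsqual, content, rest) per bytes = [bsqual] %x27 *BCHAR %x27."""
    prefix = src[:4].lower()  # "h" / "b64" are case-insensitive in ABNF
    for qual in ("b64", "h", ""):
        if prefix.startswith(qual + "'"):
            break
    else:
        raise ValueError("not a byte string literal")
    start = len(qual) + 1
    i = start
    while i < len(src):
        if src[i] == "\\":
            i += 2  # skip the escaped character, e.g. the ' in \'
        elif src[i] == "'":
            # An unescaped apostrophe is not a BCHAR: it ends the literal.
            return src[:len(qual)], src[start:i], src[i + 1:]
        else:
            i += 1
    raise ValueError("unterminated byte string literal")
```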

2.2. Err6543 (byte string literals)

The ABNF used in [RFC8610] for the content of byte string literals lumps together byte strings notated as text with byte strings notated in base16 (hex) or base64 (but see also updated BCHAR production above):

; RFC 8610 ABNF:
bytes = [bsqual] %x27 *BCHAR %x27
BCHAR = %x20-26 / %x28-5B / %x5D-10FFFD / SESC / CRLF
Figure 5: Old ABNF for BCHAR

Change proposed by Errata Report 6543

Errata report 6543 proposes to handle the two cases in separate productions (where, with an updated SESC, BCHAR obviously needs to be updated as above):

; Err6543 proposal:
bytes = %x27 *BCHAR %x27
      / bsqual %x27 *QCHAR %x27
BCHAR = %x20-26 / %x28-5B / %x5D-10FFFD / SESC / CRLF
QCHAR = DIGIT / ALPHA / "+" / "/" / "-" / "_" / "=" / WS
Figure 6: Errata Report 6543 Proposal to Split the Byte String Rules

This potentially causes a subtle change, which is hidden in the WS production:

; RFC 8610 ABNF:
WS = SP / NL
SP = %x20
NL = COMMENT / CRLF
COMMENT = ";" *PCHAR CRLF
PCHAR = %x20-7E / %x80-10FFFD
CRLF = %x0A / %x0D.0A
Figure 7: ABNF definition of WS from RFC 8610

This allows any non-C0 character in a comment, so this fragment becomes possible:

foo = h'
   43424F52 ; 'CBOR'
   0A       ; LF, but don't use CR!
'

The current text does not say unambiguously whether the three apostrophes need to be escaped with a \ or not, as in:

foo = h'
   43424F52 ; \'CBOR\'
   0A       ; LF, but don\'t use CR!
'

... which would be supported by the existing ABNF in [RFC8610].

No change needed after addressing Err6527 (text string literals) (Section 2.1)

Note that the HTML rendering of the heading is butchered by xml2rfc, as noted in https://github.com/ietf-tools/xml2rfc/issues/683; we expect this to have been fixed before this document is published.

This document takes the simpler approach of leaving the processing of the content of the byte string literal to a semantic step after processing the syntax of the bytes/BCHAR rules as updated by Figure 2 and Figure 4.

Where bsqual is given as h or b64, the rules in Figure 7 (in particular, WS with its comments) are therefore applied to the result of this processing.
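A minimal Python sketch of this semantic step for the h case follows. It assumes the content has already passed the bytes/BCHAR syntax and had its escapes processed (so \' has become '); the function name is hypothetical, and the b64 case, which would strip WS the same way before base64 decoding, is not shown.

```python
import re


def decode_h_content(content: str) -> bytes:
    """Apply the Figure 7 rules to unescaped h'...' content, then decode."""
    # COMMENT = ";" *PCHAR CRLF: drop each comment up to its line end.
    # (Simplification: a final comment lacking a line end is also dropped.)
    without_comments = re.sub(r";[^\n]*", "", content)
    # WS = SP / NL: remove spaces and line ends (LF or CR LF).
    digits = re.sub(r" |\r?\n", "", without_comments)
    # Whatever remains must be pairs of hex digits.
    if not re.fullmatch(r"(?:[0-9A-Fa-f]{2})*", digits):
        raise ValueError("content is not an even number of hex digits")
    return bytes.fromhex(digits)
```

Applied to the content of the h'...' example above (with its apostrophes escaped as \' in the literal, and therefore plain apostrophes after escape processing), this yields the five bytes 43 42 4F 52 0A.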

Note that this approach also works well with the use of byte strings in Section 3 of [RFC9165]. It does require some care when copy-pasting into CDDL models from ABNF that contains single quotes (which may also hide as apostrophes in comments); these need to be escaped or possibly replaced by %x27.

Finally, this approach would support extending bsqual in CDDL in a way similar to what is done for CBOR diagnostic notation in [I-D.ietf-cbor-edn-literals].

3. Small Enabling Grammar Changes

The two subsections of this section specify small grammar changes intended to enable certain kinds of specifications: data models whose rules come from directives, and non-literal tag numbers.

3.1. Empty data models

Compatibility: backward (not forward)

[RFC8610] requires a CDDL file to have at least one rule.

; RFC 8610 ABNF:
cddl = S 1*(rule S)
Figure 8: Old ABNF for top-level rule cddl

This makes sense when the file has to stand alone, as a CDDL data model needs to have at least one rule to provide an entry point (start rule).

With CDDL modules [I-D.ietf-cbor-cddl-modules], CDDL files can also include directives, and these might be the source of all the rules that ultimately make up the module created by the file. Any other rule content in the file has to be available for directive processing, making the requirement for at least one rule cumbersome.

Therefore, we extend the grammar as in Figure 9 and make the existence of at least one rule a semantic constraint, to be fulfilled after processing of all directives.

; new top-level rule:
cddl = S *(rule S)
Figure 9: Updated ABNF for top-level rule cddl
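The following Python sketch illustrates the resulting split between syntax and semantics. It assumes the file has already been parsed successfully with cddl = S *(rule S), so it either consists of S alone or contains at least one rule; it also assumes, for illustration only, that directives look like comments to the base grammar. The function name and the rules_from_directives input are hypothetical.

```python
import re

# S = *WS, WS = SP / NL, NL = COMMENT / CRLF, COMMENT = ";" *PCHAR CRLF,
# PCHAR = %x20-7E / %x80-10FFFD, CRLF = %x0A / %x0D.0A
PCHAR = "[\x20-\x7e\x80-\U0010fffd]"
ONLY_S = re.compile("(?: |\r?\n|;" + PCHAR + "*\r?\n)*")


def check_rule_count(source: str, rules_from_directives: list[str]) -> None:
    """Semantic check replacing the old 1*(rule S) syntax requirement.

    rules_from_directives is a hypothetical input: the rules contributed
    by directive processing for this file.
    """
    has_own_rules = ONLY_S.fullmatch(source) is None
    if not has_own_rules and not rules_from_directives:
        raise ValueError("data model has no rules after directive processing")
```

A comment-only file is thus accepted by the grammar and rejected only if directive processing contributes no rules either.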

3.2. Non-literal Tag Numbers

Compatibility: backward (not forward)

The existing ABNF syntax for expressing tags in CDDL is:

; extracted from RFC 8610 ABNF:
type2 /= "#" "6" ["." uint] "(" S type S ")"
Figure 10: Old ABNF for tag syntax

This means tag numbers can only be given as literal numbers (uints). Some specifications operate on ranges of tag numbers; e.g., [RFC9277] uses the tag numbers from 1668546817 (0x63740101) to 1668612095 (0x6374FFFF) to tag specific content formats. Such a range currently cannot be expressed in CDDL.

This update extends this to:

; new rules collectively defining the tagged case:
type2 /= "#" "6" ["." tag-number] "(" S type S ")"
tag-number = uint / ("<" type ">")
Figure 11: Updated ABNF for tag syntax

So the above range can be expressed in a CDDL fragment such as:

ct-tag<content> = #6.<ct-tag-number>(content)
ct-tag-number = 1668546817..1668612095
; or use 0x63740101..0x6374FFFF
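A quick check of the fragment above: the hexadecimal bounds in the comment denote the same numbers as the decimal ones, and since the CDDL range operator ".." includes both ends, a tag number matches ct-tag-number exactly when it lies between them. The Python sketch below (names hypothetical) expresses that test.

```python
# Bounds of ct-tag-number, as in the CDDL fragment (the [RFC9277] range).
CT_TAG_FIRST = 0x63740101
CT_TAG_LAST = 0x6374FFFF

# The hexadecimal and decimal spellings of the bounds agree.
assert (CT_TAG_FIRST, CT_TAG_LAST) == (1668546817, 1668612095)


def is_ct_tag_number(tag_number: int) -> bool:
    """Mirror ct-tag-number = 1668546817..1668612095 (".." is inclusive)."""
    return CT_TAG_FIRST <= tag_number <= CT_TAG_LAST
```

With the exclusive range operator "...", the same set would be written 1668546817...1668612096.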

Note that this syntax reuses the angle bracket syntax for generics; this reuse is innocuous, as a generic parameter/argument only ever occurs directly after a rule name (id), while the tag number occurs directly after the "." of "#6.". (Whether there is potential for human confusion can be debated; the above example deliberately uses generics as well.)
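This can be checked locally: neither genericparm/genericarg nor tag-number allows whitespace before "<", and an id always ends in EALPHA or DIGIT, never in ".", so the characters immediately preceding "<" tell the two uses apart. The Python sketch below (hypothetical function) illustrates this, assuming pos indexes a "<" outside any text, byte string, or comment.

```python
def angle_bracket_role(src: str, pos: int) -> str:
    """Classify the "<" at src[pos] by what immediately precedes it."""
    if src[pos] != "<":
        raise ValueError("no '<' at this position")
    if src[max(0, pos - 3):pos] == "#6.":
        return "tag-number"      # "#" "6" "." tag-number
    prev = src[pos - 1] if pos > 0 else ""
    # id = EALPHA *(*("-" / ".") (EALPHA / DIGIT)) ends in EALPHA or DIGIT
    if prev and prev.isascii() and (prev.isalnum() or prev in "@_$"):
        return "generic"         # typename/groupname [genericparm/genericarg]
    raise ValueError("'<' in a position the CDDL grammar does not allow")
```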

4. Security Considerations

The grammar fixes and updates in this document are not believed to create additional security considerations. The security considerations in Section 5 of [RFC8610] do apply, and specifically the potential for confusion is increased in an environment that uses a combination of CDDL tools some of which have been updated and some of which have not been, in particular with respect to the changes in Section 2.

5. IANA Considerations

This document has no IANA actions.

6. References

6.1. Normative References

[RFC8610]
Birkholz, H., Vigano, C., and C. Bormann, "Concise Data Definition Language (CDDL): A Notational Convention to Express Concise Binary Object Representation (CBOR) and JSON Data Structures", RFC 8610, DOI 10.17487/RFC8610, June 2019, <https://www.rfc-editor.org/rfc/rfc8610>.
[STD68]
Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", STD 68, RFC 5234, DOI 10.17487/RFC5234, January 2008, <https://www.rfc-editor.org/rfc/rfc5234>.

6.2. Informative References

[Err6526]
"Errata Report 6526", RFC 8610, <https://www.rfc-editor.org/errata/eid6526>.
[Err6527]
"Errata Report 6527", RFC 8610, <https://www.rfc-editor.org/errata/eid6527>.
[Err6543]
"Errata Report 6543", RFC 8610, <https://www.rfc-editor.org/errata/eid6543>.
[I-D.ietf-cbor-cddl-modules]
Bormann, C., "CDDL Module Structure", Work in Progress, Internet-Draft, draft-ietf-cbor-cddl-modules-00, <https://datatracker.ietf.org/doc/html/draft-ietf-cbor-cddl-modules-00>.
[I-D.ietf-cbor-edn-literals]
Bormann, C., "CBOR Extended Diagnostic Notation (EDN): Application-Oriented Literals, ABNF, and Media Type", Work in Progress, Internet-Draft, draft-ietf-cbor-edn-literals-06, <https://datatracker.ietf.org/doc/html/draft-ietf-cbor-edn-literals-06>.
[RFC7405]
Kyzivat, P., "Case-Sensitive String Support in ABNF", RFC 7405, DOI 10.17487/RFC7405, December 2014, <https://www.rfc-editor.org/rfc/rfc7405>.
[RFC9165]
Bormann, C., "Additional Control Operators for the Concise Data Definition Language (CDDL)", RFC 9165, DOI 10.17487/RFC9165, December 2021, <https://www.rfc-editor.org/rfc/rfc9165>.
[RFC9277]
Richardson, M. and C. Bormann, "On Stable Storage for Items in Concise Binary Object Representation (CBOR)", RFC 9277, DOI 10.17487/RFC9277, June 2022, <https://www.rfc-editor.org/rfc/rfc9277>.

Appendix A. Updated Collected ABNF for CDDL

This appendix provides the full ABNF from [RFC8610] with the updates applied in the present document.

cddl = S *(rule S)
rule = typename [genericparm] S assignt S type
     / groupname [genericparm] S assigng S grpent

typename = id
groupname = id

assignt = "=" / "/="
assigng = "=" / "//="

genericparm = "<" S id S *("," S id S ) ">"
genericarg = "<" S type1 S *("," S type1 S ) ">"

type = type1 *(S "/" S type1)

type1 = type2 [S (rangeop / ctlop) S type2]
; space may be needed before the operator if type2 ends in a name

type2 = value
      / typename [genericarg]
      / "(" S type S ")"
      / "{" S group S "}"
      / "[" S group S "]"
      / "~" S typename [genericarg]
      / "&" S "(" S group S ")"
      / "&" S groupname [genericarg]
      / "#" "6" ["." tag-number] "(" S type S ")"
      / "#" DIGIT ["." uint]                ; major/ai
      / "#"                                 ; any
tag-number = uint / ("<" type ">")


rangeop = "..." / ".."

ctlop = "." id

group = grpchoice *(S "//" S grpchoice)

grpchoice = *(grpent optcom)

grpent = [occur S] [memberkey S] type
       / [occur S] groupname [genericarg]  ; preempted by above
       / [occur S] "(" S group S ")"

memberkey = type1 S ["^" S] "=>"
          / bareword S ":"
          / value S ":"

bareword = id

optcom = S ["," S]

occur = [uint] "*" [uint]
      / "+"
      / "?"

uint = DIGIT1 *DIGIT
     / "0x" 1*HEXDIG
     / "0b" 1*BINDIG
     / "0"

value = number
      / text
      / bytes

int = ["-"] uint

; This is a float if it has fraction or exponent; int otherwise
number = hexfloat / (int ["." fraction] ["e" exponent ])
hexfloat = ["-"] "0x" 1*HEXDIG ["." 1*HEXDIG] "p" exponent
fraction = 1*DIGIT
exponent = ["+"/"-"] 1*DIGIT

text = %x22 *SCHAR %x22
SCHAR = %x20-21 / %x23-5B / %x5D-7E / %x80-10FFFD / SESC

SESC = "\" ( %x22 / "/" / "\" /                 ; \" \/ \\
             %x62 / %x66 / %x6E / %x72 / %x74 / ; \b \f \n \r \t
             (%x75 hexchar) )                   ; \uXXXX
hexchar = non-surrogate / (high-surrogate "\" %x75 low-surrogate)
non-surrogate = ((DIGIT / "A"/"B"/"C" / "E"/"F") 3HEXDIG) /
                ("D" %x30-37 2HEXDIG )
high-surrogate = "D" ("8"/"9"/"A"/"B") 2HEXDIG
low-surrogate = "D" ("C"/"D"/"E"/"F") 2HEXDIG

bytes = [bsqual] %x27 *BCHAR %x27
BCHAR = %x20-26 / %x28-5B / %x5D-10FFFD / SESC / "\'" / CRLF
bsqual = "h" / "b64"

id = EALPHA *(*("-" / ".") (EALPHA / DIGIT))
ALPHA = %x41-5A / %x61-7A
EALPHA = ALPHA / "@" / "_" / "$"
DIGIT = %x30-39
DIGIT1 = %x31-39
HEXDIG = DIGIT / "A" / "B" / "C" / "D" / "E" / "F"
BINDIG = %x30-31

S = *WS
WS = SP / NL
SP = %x20
NL = COMMENT / CRLF
COMMENT = ";" *PCHAR CRLF
PCHAR = %x20-7E / %x80-10FFFD
CRLF = %x0A / %x0D.0A
Figure 12: ABNF for CDDL as updated
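As a small illustration of one production in this collected ABNF, the following Python sketch (hypothetical function) accepts exactly the uint forms allowed above, which is also what a literal tag-number must match. Because quoted strings in ABNF are case-insensitive, "0X" and "0B" prefixes are accepted as well. Python's int(..., 0) is used only after the pattern check, since on its own it would also accept forms such as 0o17 or 1_000.

```python
import re

# uint = DIGIT1 *DIGIT / "0x" 1*HEXDIG / "0b" 1*BINDIG / "0"
UINT = re.compile(r"[1-9][0-9]*|0[xX][0-9A-Fa-f]+|0[bB][01]+|0")


def parse_uint(token: str) -> int:
    """Return the value of a CDDL uint token, rejecting other spellings."""
    if not UINT.fullmatch(token):
        raise ValueError(f"not a CDDL uint: {token!r}")
    # After the check, int(..., 0) interprets the 0x/0b prefixes.
    return int(token, 0)
```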

Acknowledgments

TODO acknowledge.

Author's Address

Carsten Bormann
Universität Bremen TZI
Postfach 330440
D-28359 Bremen
Germany