Network Working Group | P. Hallam-Baker |
Internet-Draft | Comodo Group Inc. |
Expires: April 20, 2015 | October 17, 2014 |
Protocol Specification Tool
draft-hallambaker-protogen-00
This document describes the syntax of the PROTOGEN protocol specification tool and the use of the tool to generate protocol specifications and prototype and production implementations. While the primary focus of PROTOGEN is to develop protocols using JSON message syntax, the PROTOGEN framework has been successfully applied to generate prototypes using ASN.1, TLS, XML and RFC822 style syntax.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 20, 2015.
Copyright (c) 2014 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
The use of schemas to describe communication protocols is well established and plays a central role in the development of ASN.1 and XML based protocols. No such tools are currently widely used for writing JSON based protocols.
It is the view of the author that the first, last and only purpose of a protocol schema language is to enable the use of tools to support the development effort. A schema language that delays rather than advances the development of correct code and consistent documentation has become a liability, not an enabler.
One of the main reasons for the lack of such a tool has been the widespread concern as to the complexity of traditional schema tools and in particular the tendency of such tools to impose a complex data model on simple problems.
One major difference between the design of the Protogen schema language and that of its predecessors is that it does not attempt to support every feature of the JSON data model. Protogen is designed to allow programmers to design and implement network service protocols quickly using widely used programming languages such as C, C# and Java. JSON features that do not map conveniently to the majority of widely used languages are best ignored.
The XML Schema language is particularly obtuse, presenting a two level type system in which element definitions provide typing for data and element types provide a type system for elements. At least three different inheritance mechanisms are supported.
The ASN.1 schema language introduces a distinction between lists and sets that is entirely frivolous in a serialization format and gratuitous distinctions between implicit and explicit tagging.
The lesson to be drawn from these abominations is clear: The primary purpose of a schema language should be to allow the programmer to forget and ignore the wireline representation of protocol messages. Features that allow fine tuning of the wireline representation should be avoided.
While the notion of validating input data against a schema prior to passing data to an application is superficially attractive, schema constraints are rarely sufficient for this purpose. Thus applied to protocol design, schema validation rarely provides a meaningful benefit over checking that an encoding is well formed.
The following is based on an example from [RFC4627].
    [
      {
        "precision": "zip",
        "Latitude": 37.7668,
        "Longitude": -122.3959,
        "Address": "",
        "City": "SAN FRANCISCO",
        "State": "CA",
        "Zip": "94107",
        "Country": "US"
      },
      {
        "precision": "zip",
        "Latitude": 37.371991,
        "Longitude": -122.026020,
        "Address": "",
        "City": "SUNNYVALE",
        "State": "CA",
        "Zip": "94085",
        "Country": "US"
      }
    ]
The corresponding Protogen schema is:
    Structure SiteList
        Description
            |A list of sites
        Struct Site Sites
            Multiple

    Structure Site
        Description
            |A site location
        String Country
            Description
                |ISO ALPHA-2 Country Code.
        String precision
        Decimal Latitude
        Decimal Longitude
        String Address
        String City
        String State
        String Zip
For the sake of example, the description of the site structure entries is elided. While Protogen does not require description elements to be provided to produce code, descriptions are of course essential if useful documentation is to be generated.
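To make the relationship between the schema and the JSON data concrete, the following Python sketch shows the kind of classes a code generator could produce for SiteList and Site. It is an illustration only, not the output of Protogen: the class layout and the decode_sites helper are assumptions, and the sketch wraps the top-level JSON array from the example into the Sites entry.

```python
import json
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Optional


@dataclass
class Site:
    # One attribute per schema entry; every entry is optional here.
    Country: Optional[str] = None
    precision: Optional[str] = None
    Latitude: Optional[Decimal] = None
    Longitude: Optional[Decimal] = None
    Address: Optional[str] = None
    City: Optional[str] = None
    State: Optional[str] = None
    Zip: Optional[str] = None


@dataclass
class SiteList:
    Sites: List[Site] = field(default_factory=list)


def decode_sites(text: str) -> SiteList:
    # Hypothetical helper: parse numbers as Decimal and build Site objects.
    raw = json.loads(text, parse_float=Decimal)
    return SiteList(Sites=[Site(**entry) for entry in raw])


EXAMPLE = """
[ {"precision": "zip", "Latitude": 37.7668, "Longitude": -122.3959,
   "Address": "", "City": "SAN FRANCISCO", "State": "CA",
   "Zip": "94107", "Country": "US"},
  {"precision": "zip", "Latitude": 37.371991, "Longitude": -122.026020,
   "Address": "", "City": "SUNNYVALE", "State": "CA",
   "Zip": "94085", "Country": "US"} ]
"""

sites = decode_sites(EXAMPLE)
print(sites.Sites[1].City, sites.Sites[1].Longitude)  # SUNNYVALE -122.026020
```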
Protogen is built using the Goedel code metasynthesizer which attempts to eliminate all unnecessary clutter from the code specification to minimize error. By default, indentation and the off-side rule are used to denote block structure following the approach used in occam and Python. Punctuation characters are only used to delimit strings ("), text blocks (|) and comments (!).
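The off-side rule can be illustrated with a few lines of Python. The sketch below is not the Goedel parser; it is a minimal reader, written under the assumptions that indentation uses spaces only, that a line beginning with "!" is a whole-line comment, and that text block lines beginning with "|" can be kept as ordinary child lines.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    text: str
    children: List["Node"] = field(default_factory=list)


def parse_blocks(source: str) -> Node:
    """Build a tree from indentation alone (the off-side rule)."""
    root = Node("<root>")
    stack = [(-1, root)]  # pairs of (indent, node), innermost block last
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("!"):
            continue  # skip blank lines and '!' comment lines (assumption)
        indent = len(line) - len(line.lstrip(" "))
        # Close every open block that is indented at least as far as this line.
        while stack[-1][0] >= indent:
            stack.pop()
        node = Node(stripped)
        stack[-1][1].children.append(node)
        stack.append((indent, node))
    return root


SCHEMA = """\
Structure Site
    Description
        |A site location
    ! coordinates use Decimal
    Decimal Latitude
    Decimal Longitude
"""

tree = parse_blocks(SCHEMA)
site = tree.children[0]
print([child.text for child in site.children])
# ['Description', 'Decimal Latitude', 'Decimal Longitude']
```

No braces or terminators are needed: a block ends as soon as a line appears at the same or a shallower indent.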
Note that the Latitude and Longitude are specified using the type Decimal rather than Float. This allows an implementation to avoid the loss of precision that inevitably occurs when converting between a binary floating point representation such as IEEE 754 binary64 and the decimal encoding used in JSON.
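The effect is easy to reproduce. The following Python sketch parses the same JSON text twice, once into binary floating point and once into the standard library Decimal type; the field name Total is made up for the illustration.

```python
import json
from decimal import Decimal

text = '{"Longitude": -122.026020, "Total": 0.1}'

as_float = json.loads(text)
as_decimal = json.loads(text, parse_float=Decimal)

# Binary floating point cannot represent 0.1 exactly, so arithmetic drifts.
print(as_float["Total"] * 3 == 0.3)               # False
print(as_decimal["Total"] * 3 == Decimal("0.3"))  # True

# The decimal value keeps the digits exactly as they appeared on the wire.
print(as_decimal["Longitude"])                    # -122.026020
```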
The example fragment is sufficient to describe a data structure and generate methods for JSON serialization and deserialization. It is not, however, sufficient to generate a useful implementation of a Web service or client access library. To do this we must define a protocol with services, transactions and messages.
To build a service using the Site structure, we prepend the following declaration:
    Protocol Sitefinder STFND
        Service Finder "_siteFinder._wks" "SiteFinder" Request Response
            Description
                |Find sites for new donut stores.
        Message Request
            Struct Site WhereIAm
        Message Response
            Struct Site WhereAreDonuts
                Multiple
We can now run Protogen to generate any of the following:
Support for partial classes makes C# a particularly attractive target language for code generation as it allows classes produced by generated code to be conveniently extended. Support for other modern languages aligned with the Java/.NET data model requires only straightforward modification of the code generator.
While the C# generator is optimized for development of protocols and production code, the generator for C is intended for developing production code after the protocol architecture is largely static. The generator is intentionally biased towards flexibility rather than functionality since a modern programmer using C is most likely to be doing so to build on a legacy code base. The ability to easily adapt the output of the generator to the existing coding style(s) is likely to be more highly valued than minimizing implementation effort.
Top level specification of a protocol. The Protocol element contains two attributes and a list of entries as follows:
Describes the parent element. Multiple description elements may be specified in which case the first SHOULD be a standalone short description. The description element has one attribute:
A service is a named set of transactions within a protocol namespace.
At present, due to an implementation limitation, all request and response messages used in a service MUST inherit from a single message type. This is bogus and should be fixed.
The service element has the following attributes:
Specifies a Request-Response transaction supported by a specified service.
At present, transactions are specific to a service, which is kind of bogus if multiple services are defined.
The Transaction element has the following attributes:
Specifies a protocol message. This is almost the same as a structure except that the name of a request message is a command to a server and the name of a response message identifies a response.
This feature is not yet implemented, the idea being that status codes should be represented at both the HTTP layer and JSON layer so that appropriate handling can be specified at either.
Specifies a message or structure defined in another schema.
Protogen recognizes ten intrinsic data types. While this is considerably more than the three intrinsic types supported in JSON, the additional expressive power allows the tools to do more work for the programmer. For example, distinguishing strings that represent date-time values from other strings allows the tool to perform the work of encoding/decoding these values.
The following table summarizes the Protogen schema types and their (default) corresponding C#/C equivalents.
Schema | JSON | C# | C |
---|---|---|---|
Boolean | true / false | bool | bool |
Float | number | double | double |
Decimal | number | Int64 | long long |
Integer | number | Int64 | int |
Binary | string (base64 encoded) | byte[] Data | BinaryType |
Label | string | string | StringType |
Name | string | string | StringType |
String | string | string | StringType |
URI | string | string | StringType |
DateTime | string | DateTime | struct tm |
Every data type supports the following options:
While the Protogen schema definition does include additional options for some data types (e.g. LengthBits, LengthFixed) these are only used in the TLS encoding generator and are ignored when JSON encoding is being used.
Messages and structures may be marked Abstract which means that they may be used as base classes for inheritance from other messages or structures but cannot appear on the wire.
Specifies that a message or structure inherits from another message or structure.
Note that inheritance relationships are represented in the generated code for languages that support inheritance (e.g. C#) and flattened out in languages that do not (e.g. C).
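Flattening can be pictured as copying the inherited entries into each derived type, base entries first. The Python sketch below assumes a hypothetical in-memory form of a schema; the dictionary layout, the SiteMessage base type and its Status entry are invented for the illustration and are not part of Protogen.

```python
# Hypothetical in-memory schema: name -> parent, abstract flag, own entries.
SCHEMA = {
    "SiteMessage": {"inherits": None, "abstract": True, "entries": ["Status"]},
    "Request": {"inherits": "SiteMessage", "abstract": False, "entries": ["WhereIAm"]},
    "Response": {"inherits": "SiteMessage", "abstract": False, "entries": ["WhereAreDonuts"]},
}


def flatten(name, schema):
    """Return every entry of a type, inherited entries first."""
    entries = []
    while name is not None:
        # Prepend so that entries of the most distant ancestor come first.
        entries = schema[name]["entries"] + entries
        name = schema[name]["inherits"]
    return entries


print(flatten("Request", SCHEMA))   # ['Status', 'WhereIAm']
print(flatten("Response", SCHEMA))  # ['Status', 'WhereAreDonuts']
```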
No distinction is made between a value that is not present and a value that is present with the value null. Thus the following JSON documents are considered to specify the same object.
{ "Value": 1 }
{ "Value": 1, "Optional": null }
An entry that has the 'Required' option set MUST always be specified even if the value is null.
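A decoder can implement both rules in a few lines. The Python sketch below is one possible approach, not the behavior of the generated code: the REQUIRED set and the decode_object function are hypothetical names, and the sketch simply drops null values after checking that required entries are present.

```python
import json

REQUIRED = {"Value"}  # hypothetical: entries carrying the 'Required' option


def decode_object(text: str) -> dict:
    raw = json.loads(text)
    # A 'Required' entry must appear on the wire, even if its value is null.
    missing = REQUIRED - raw.keys()
    if missing:
        raise ValueError(f"missing required entries: {sorted(missing)}")
    # Dropping nulls makes "absent" and "present but null" indistinguishable.
    return {key: value for key, value in raw.items() if value is not None}


a = decode_object('{ "Value": 1 }')
b = decode_object('{ "Value": 1, "Optional": null }')
print(a == b)  # True
```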
No distinction is made between a list that is not present, a list with the null value and an empty list. Thus the following encodings describe the same object:
{ "Value": 1 }
{ "Value": 1, "List": null }
{ "Value": 1, "List": [] }
To simplify scripting language implementation an entry that has the 'Multiple' option MUST be encoded as a list.
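The list rules can be sketched the same way. In the Python fragment below, MULTIPLE and decode_lists are hypothetical names; the sketch normalizes a missing or null list to an empty list and rejects a bare value where a list is required.

```python
import json

MULTIPLE = {"List"}  # hypothetical: entries carrying the 'Multiple' option


def decode_lists(text: str) -> dict:
    raw = json.loads(text)
    result = {key: value for key, value in raw.items() if value is not None}
    for name in MULTIPLE:
        value = result.get(name, [])  # absent or null becomes an empty list
        if not isinstance(value, list):
            raise ValueError(f"{name} must be encoded as a list")
        result[name] = value
    return result


forms = ['{ "Value": 1 }',
         '{ "Value": 1, "List": null }',
         '{ "Value": 1, "List": [] }']
decoded = [decode_lists(form) for form in forms]
print(decoded[0] == decoded[1] == decoded[2])  # True
```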
The decimal encoding provides an alternative to use of floating point to represent decimal fractions.
Since 10 is not a power of 2, conversion between decimal and binary fractions is inexact and using Real32 or Real64 values for this purpose introduces an unnecessary loss of precision.
Since modern programming languages lack support for a Decimal intrinsic type, this is implemented by mapping the datum to a 64 bit integer scaled by a factor of 1,000,000,000. This approach allows numbers up to 9,223,372,036 to be represented with nine digits of precision after the decimal point.
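The arithmetic can be checked with a short Python sketch. It assumes that the figure of 1,000,000,000 acts as a multiplier, so that the stored integer is the decimal value times 10^9; the function names are hypothetical. The first line printed is the largest whole number that fits in a signed 64 bit integer under that assumption.

```python
from decimal import Decimal

SCALE = 10**9          # assumed multiplier: nine digits after the point
INT64_MAX = 2**63 - 1


def encode_decimal(text: str) -> int:
    """Map a decimal string to a scaled 64 bit integer."""
    scaled = Decimal(text) * SCALE
    if scaled != scaled.to_integral_value():
        raise ValueError("more than nine digits after the decimal point")
    value = int(scaled)
    if not -INT64_MAX - 1 <= value <= INT64_MAX:
        raise OverflowError("value does not fit in a signed 64 bit integer")
    return value


def decode_decimal(value: int) -> Decimal:
    """Recover the decimal value by shifting the point nine places left."""
    return Decimal(value).scaleb(-9)


print(INT64_MAX // SCALE)            # 9223372036
print(encode_decimal("37.7668"))     # 37766800000
print(decode_decimal(37766800000))   # 37.766800000
```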
Date Time Values are encoded as strings in IETF format.
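The text does not name the exact format. Assuming that it means the RFC 3339 timestamp profile with a UTC "Z" suffix and whole seconds, which is an assumption of this sketch and not a statement of the specification, encoding and decoding could look like this in Python:

```python
from datetime import datetime, timezone

# Assumed wire format: RFC 3339, UTC only, whole seconds.
RFC3339_UTC = "%Y-%m-%dT%H:%M:%SZ"


def encode_datetime(value: datetime) -> str:
    # Expects a timezone-aware value; it is converted to UTC before formatting.
    return value.astimezone(timezone.utc).strftime(RFC3339_UTC)


def decode_datetime(text: str) -> datetime:
    return datetime.strptime(text, RFC3339_UTC).replace(tzinfo=timezone.utc)


stamp = decode_datetime("2014-10-17T12:00:00Z")
print(encode_datetime(stamp))  # 2014-10-17T12:00:00Z
```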
Binary values are encoded using BASE64URL encoding.
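BASE64URL differs from ordinary base64 only in using "-" and "_" in place of "+" and "/". The Python sketch below uses the standard library; whether padding characters are kept on the wire is not stated in the text, so the hypothetical encoder keeps them and the decoder accepts input with or without them.

```python
import base64


def encode_binary(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode("ascii")


def decode_binary(text: str) -> bytes:
    # Restore any stripped '=' padding before decoding (assumption: the
    # sender may or may not have removed it).
    padded = text + "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(padded)


print(encode_binary(b"\xfb\xff\xfe"))  # -__-
print(decode_binary("AQI"))            # b'\x01\x02'
```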
[RFC4627] | Crockford, D., "The application/json Media Type for JavaScript Object Notation (JSON)", RFC 4627, July 2006. |