Network Working Group A. Mouat
Internet-Draft diffxml
Expires: April 19, 2006 October 16, 2005
A delta format for XML documents
draft-mouat-xml-patch-00
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on April 19, 2006.
Copyright Notice
Copyright (C) The Internet Society (2005).
Abstract
This document specifies an implementation-independent format for
expressing a set of changes between two XML documents. This set of
changes is commonly referred to as a "delta" in computing
terminology. The delta can be used to automatically transform (or
"patch") one XML document into another.
Mouat Expires April 19, 2006 [Page 1]
Internet-Draft diffxml October 2005
Table of Contents
1. Requirements notation
2. Introduction
3. Hierarchical vs Line Based Differencing
4. Structure of DUL Document
4.1. Insert Operation
4.1.1. Attributes
4.1.2. Content
4.1.3. Example
4.2. Insert Attribute Operation
4.2.1. Attributes
4.2.2. Content
4.2.3. Example
4.3. Delete Operation
4.3.1. Attributes
4.3.2. Examples
4.4. Update Operation
4.4.1. Attributes
4.4.2. Content
4.4.3. Examples
4.5. Move Operation
4.5.1. Attributes
4.5.2. Example
4.6. Complete Example
4.7. Context Information
5. Formal Definitions
6. Security Considerations
7. IANA Consideration
7.1. MIME type registration
8. URN Sub-Namespace Registration
9. Normative References
Author's Address
Intellectual Property and Copyright Statements
1. Requirements notation
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [5].
2. Introduction
The Delta Update Language (DUL) is an application-agnostic format for
describing changes to XML documents. It has potential uses in many
applications, including reducing network traffic by removing the need
to send an entire XML document in order to convey a potentially small
change within it.
3. Hierarchical vs Line Based Differencing
Standard UNIX tools exist for comparing (diff) and patching (patch)
files, which operate on a line-by-line basis using well-studied
methods for computing the longest common subsequence. Using these
tools on hierarchically structured data, such as XML, leads to
suboptimal results, as they are incapable of recognizing the
tree-based structure of the files.
For example, the XML fragments in Figure 1 and Figure 2 are identical
in XML terms but substantially different in line-by-line terms:
<doc attr1="2">x</doc>
Figure 1
<doc
attr1= '2'
>x</doc>
Figure 2
For this reason, a DUL delta is itself an XML document, expressed in
terms of tree operations upon the nodes of a given document.
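The difference between the two views can be checked mechanically. The following is a minimal sketch, not part of the format itself: it parses the fragments of Figure 1 and Figure 2 with Python's standard library and compares them both as XML and as lines. The helper name same_xml is hypothetical, and the comparison only covers a single element without child elements, which is all these figures need.

```python
import difflib
import xml.etree.ElementTree as ET

FIGURE_1 = '<doc attr1="2">x</doc>\n'
FIGURE_2 = "<doc\n attr1= '2'\n>x</doc>\n"


def same_xml(a, b):
    """Compare two childless elements by name, attributes and text."""
    ea, eb = ET.fromstring(a), ET.fromstring(b)
    return (ea.tag, ea.attrib, ea.text) == (eb.tag, eb.attrib, eb.text)


# A line-based comparison reports changes for every line of the input.
line_diff = list(difflib.unified_diff(
    FIGURE_1.splitlines(), FIGURE_2.splitlines(), lineterm=""))

print("identical as XML:", same_xml(FIGURE_1, FIGURE_2))
print("line-based diff is empty:", not line_diff)
```

A line-based tool would therefore report a change where an XML-aware tool should report none.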
4. Structure of DUL Document
DUL deltas are XML [9] documents that MUST be well-formed and SHOULD
be valid. DUL documents MUST be based on XML 1.0.
This specification makes use of XML namespaces for identifying DUL
documents and document fragments. The namespace URI for elements
defined by this specification is a URN [6], using the namespace
identifier 'ietf' defined by [7] and extended by [8]. This URN is:
urn:ietf:params:xml:ns:dul
The prefix "dul" is used throughout to specify elements in the DUL
namespace.
This document assumes the reader has a working knowledge of XML [9],
XPath [10] and DOM Level 2 [11].
There are five basic operations represented by XML elements. These
are insert, insertAttr, delete, move and update, defined as follows:
4.1. Insert Operation
<dul:insert
parent = "xpathexpr"
childno = "cn"
charpos = "char"
>xml fragment</dul:insert>
Figure 3
Represents the insertion of an XML fragment into the document.
4.1.1. Attributes
4.1.1.1. The parent Attribute
The parent attribute represents the parent of the node to insert
under. The variable "xpathexpr" is an XPath expression that MUST
uniquely identify the node. The XPath expression SHOULD be
restricted to using node tests of the form "node()", which matches
any XPath node, followed by an abbreviated position predicate of the
form ["x"] where "x" is the position number of the node. The parent
attribute MUST be present.
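As an illustration of the recommended restricted form, the sketch below turns an expression such as "/node()[1]/node()[3]" into a list of position numbers. It is an assumption-laden helper, not a general XPath parser: the function name parse_restricted_path is hypothetical, and any expression outside the restricted form is simply rejected, even though this specification only says such expressions SHOULD be avoided.

```python
import re

# One step of the restricted form: /node()[x] with x >= 1.
_STEP = r"/node\(\)\[([1-9][0-9]*)\]"


def parse_restricted_path(expr):
    """Return the position numbers of a path like /node()[1]/node()[3]."""
    if not re.fullmatch("(?:%s)+" % _STEP, expr):
        raise ValueError("not in restricted node()[x] form: %r" % expr)
    return [int(pos) for pos in re.findall(_STEP, expr)]


print(parse_restricted_path("/node()[1]/node()[3]"))
```

A processor that only supports the restricted form can walk the document one position at a time using the returned list.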
4.1.1.2. The childno Attribute
The childno attribute represents the position at which to insert the
new node. The variable "cn" is the child number of the parent node
that the new node is to be inserted as. If there is already a node
at this position, that node will be moved to position "cn+1". The
number represents the XPath "node()" position which is not
necessarily the same as the DOM node index. When inserting
attributes the child number is unused and MAY be omitted, as
attributes have no defined order.
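One case where the XPath "node()" position and the DOM node index differ is adjacent character data: a DOM may hold a Text node, a CDATASection node and another Text node side by side, whereas XPath sees a single text node. The sketch below, using Python's xml.dom.minidom, groups DOM children into XPath nodes under that rule. The function name xpath_child_groups is hypothetical and the sketch ignores other differences such as entity reference nodes.

```python
from xml.dom import minidom, Node


def xpath_child_groups(parent):
    """Group DOM children so that each group is one XPath node.

    Adjacent Text and CDATASection nodes form a single XPath text
    node; DocumentType nodes have no XPath counterpart and are skipped.
    """
    groups = []
    prev_was_text = False
    for child in parent.childNodes:
        if child.nodeType == Node.DOCUMENT_TYPE_NODE:
            continue
        is_text = child.nodeType in (Node.TEXT_NODE,
                                     Node.CDATA_SECTION_NODE)
        if is_text and prev_was_text:
            groups[-1].append(child)
        else:
            groups.append([child])
        prev_was_text = is_text
    return groups


# Build <a>x<![CDATA[y]]>z<b/></a> node by node.
doc = minidom.getDOMImplementation().createDocument(None, "a", None)
root = doc.documentElement
root.appendChild(doc.createTextNode("x"))
root.appendChild(doc.createCDATASection("y"))
root.appendChild(doc.createTextNode("z"))
root.appendChild(doc.createElement("b"))

groups = xpath_child_groups(root)
print("DOM children:", len(root.childNodes))
print("XPath children:", len(groups))
```

Here the element b is the fourth DOM child (index 3 counting from 0) but is node()[2] in XPath terms, so a childno of 2 refers to it.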
4.1.1.3. The charpos Attribute
The charpos attribute is required for cases where an insert is made
in the middle of, immediately after or immediately before character
data. It holds the character position at which to insert the node.
The variable "char" is the numeric position at which to insert the
node. The first character of a text node is 1, in accordance with
the XPath standard. Setting the attribute to 1 is equivalent to
inserting before the text. The charpos attribute MAY be omitted, and
in such cases "char" defaults to 1.
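The 1-based counting can be sketched as a split of the existing character data into the part before and the part after the insertion point. The helper name split_at_charpos is hypothetical. Treating a charpos of one more than the text length as "immediately after the text" is an assumption of this sketch; the specification above only states the meaning of 1.

```python
def split_at_charpos(text, charpos=1):
    """Split text at a 1-based character position.

    The new node goes between the two returned strings.
    """
    if charpos < 1:
        raise ValueError("charpos counts from 1")
    cut = charpos - 1
    return text[:cut], text[cut:]


before, after = split_at_charpos("Coleridge", 5)
print(repr(before), repr(after))
```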
4.1.2. Content
The content of an insert element is an XML fragment to be inserted
verbatim. All whitespace, comments and processing instructions which
are children of the element MUST be inserted exactly as they appear.
Any XML included as content MUST be well formed.
4.1.3. Example
The following delta fragment represents the insertion of the element
<section title="poetry"/> followed by the text "Coleridge" into a
document.
<dul:insert parent="/node()[1]/node()[3]" childno="2"><section
title="poetry"/>Coleridge</dul:insert>
Figure 4
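A processor might read such an operation as follows. This is a sketch using Python's xml.dom.minidom; the function name read_insert and the shape of the returned dictionary are hypothetical. Because Figure 4 is a fragment, the sketch adds the namespace declaration that an enclosing DUL document would normally supply.

```python
from xml.dom import minidom

DUL_NS = "urn:ietf:params:xml:ns:dul"

# Figure 4, with an explicit namespace declaration added.
FIGURE_4 = (
    '<dul:insert xmlns:dul="urn:ietf:params:xml:ns:dul" '
    'parent="/node()[1]/node()[3]" childno="2">'
    '<section title="poetry"/>Coleridge</dul:insert>'
)


def read_insert(xml_text):
    """Extract the attributes and content nodes of a dul:insert."""
    op = minidom.parseString(xml_text).documentElement
    if op.namespaceURI != DUL_NS or op.localName != "insert":
        raise ValueError("not a dul:insert element")
    return {
        "parent": op.getAttribute("parent"),
        "childno": int(op.getAttribute("childno")),
        # charpos is optional and defaults to 1.
        "charpos": int(op.getAttribute("charpos") or 1),
        # Each child is kept as serialized XML, in document order.
        "content": [child.toxml() for child in op.childNodes],
    }


print(read_insert(FIGURE_4))
```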
4.2. Insert Attribute Operation
<dul:insertAttr
parent = "xpathexpr"
name = "attrname"
>value</dul:insertAttr>
Figure 5
Represents the insertion of an attribute into the document.
4.2.1. Attributes
4.2.1.1. The parent Attribute
The parent attribute specifies the parent of the attribute being
inserted. The variable "xpathexpr" is an XPath expression that MUST
uniquely identify the node. The XPath expression SHOULD be
restricted to using node tests of the form "node()", which matches
any XPath node, followed by an abbreviated position predicate of the
form ["x"] where "x" is the position number of the node. The parent
attribute MUST be present.
4.2.1.2. The name Attribute
The name attribute represents the name of the new attribute. It MUST
be a valid name for an XML attribute. The name attribute MUST be
present.
4.2.2. Content
The content of the insertAttr element is the value to be given to the
attribute. It MUST be a valid value for an XML attribute.
4.2.3. Example
The following delta fragment represents the insertion of the
attribute title with the value "poetry" into the document.
<dul:insertAttr parent="/node()[1]/node()[3]"
name="title">poetry</dul:insertAttr>
Figure 6
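Once the parent element has been located, applying the operation amounts to setting one attribute. The sketch below uses Python's xml.dom.minidom; the function name apply_insert_attr is hypothetical. Refusing to overwrite an existing attribute is an assumption of this sketch, since this specification does not say what happens when the attribute is already present.

```python
from xml.dom import minidom


def apply_insert_attr(element, name, value):
    """Add the attribute name=value to an already located element."""
    if element.hasAttribute(name):
        # Assumption: an insert never replaces an existing attribute.
        raise ValueError("attribute %r already present" % name)
    element.setAttribute(name, value)


doc = minidom.parseString("<doc><section/></doc>")
section = doc.documentElement.firstChild
apply_insert_attr(section, "title", "poetry")
print(section.toxml())
```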
4.3. Delete Operation
<dul:delete
node = "xpathexpr"
charpos = "char"
length = "len"/>
Figure 7
Represents the deletion of a subtree, text or attribute from the
document.
4.3.1. Attributes
4.3.1.1. The node Attribute
The node attribute identifies the node to perform the delete
operation on. The variable "xpathexpr" is an XPath expression which
uniquely identifies the node to be deleted. Attributes may be
deleted by an appropriate XPath expression which specifies their
name. The variable "xpathexpr" is subject to the same restrictions
as for an insert, with the exception that when an attribute is being
deleted it is specified as the last step of "xpathexpr" (for example,
"/@title"). The node attribute MUST NOT be omitted.
4.3.1.2. The charpos Attribute
The charpos attribute is only used when character data is being
deleted, and is used in conjunction with the length attribute. The
variable "char" is the index of the first character to delete,
counting in the same way as for the insert operation. It is unused
in cases where the node is not a text node. The charpos attribute
MAY be omitted, and in such cases "char" defaults to 1.
4.3.1.3. The length Attribute
The length attribute is only used when character data is being
deleted and identifies how many characters to delete. The variable
"len" is the number of characters to delete, from and including the
character specified by the charpos attribute. The length attribute
MAY be omitted, and in such cases "len" defaults to 0. Therefore if
length is unspecified when deleting a text node then no deletion will
occur. If the length specified is greater than the length to the end
of the node, the length is treated as being equal to the length to
the end of the node. Note that entity references may be changed by
this operation. Specifying a greater length does not allow deletion
of other nodes.
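The defaults and the clamping rule for character deletion can be sketched on a plain string standing in for the text node's character data. The function name delete_chars is hypothetical; the behavior follows the rules above, with charpos defaulting to 1, length defaulting to 0, and an over-long length stopping at the end of the node.

```python
def delete_chars(text, charpos=1, length=0):
    """Delete `length` characters starting at 1-based `charpos`."""
    start = charpos - 1
    # Slicing past the end of the string stops at the end of the
    # text, so an over-long length never reaches beyond this node.
    return text[:start] + text[start + length:]


print(delete_chars("sometext", 1, 4))
print(delete_chars("sometext"))
print(delete_chars("sometext", 5, 100))
```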
4.3.2. Examples
The following delta fragment represents the deletion of the title
attribute of an element.
<dul:delete node="/node()[1]/node()[2]/node()[3]/@title"/>
Figure 8
The following delta fragment represents the deletion of text from a
text node, removing the first 7 characters from the node identified.
<dul:delete node="/node()[1]/node()[4]" charpos="1"
length="7"/>
Figure 9
4.4. Update Operation
<dul:update
node = "xpathexpr"
charpos = "char"
length = "len"
>value</dul:update>
Figure 10
Represents the updating of a value associated with the given node.
4.4.1. Attributes
4.4.1.1. The node Attribute
The node attribute identifies the node to be updated. The variable
"xpathexpr" is an XPath expression which uniquely identifies the node
to be updated. Attributes may be updated by an appropriate XPath
expression which specifies their name. The variable "xpathexpr" is
subject to the same restrictions as for the insert operation, with
the exception that when an attribute is being updated it is specified
as the last step of "xpathexpr" (for example, "/@title"). Also, the
XPath expression may not point to an element, as elements have no
associated value that can be updated. The names of attributes and
elements cannot be changed with this operation. The node attribute
MUST NOT be omitted.
4.4.1.2. The charpos Attribute
The charpos attribute is used when character data is being updated,
and is used in conjunction with the length attribute. The variable
"char" is the index of the first character to replace, counting in
the same way as for the insert operation. It is unused in cases
where the node is not a text node. The charpos attribute MAY be
omitted and in such cases defaults to 1.
4.4.1.3. The length Attribute
The length attribute is used when character data is being updated and
identifies how many characters to replace. The variable "len"
represents the number of characters to replace, from and including
the character specified by the charpos attribute. The length
attribute MAY be omitted and in such cases defaults to 0. Exactly
"len" characters of the old text are always replaced: if the new text
is shorter than "len" characters, the old text is truncated, and if
the new text is longer than "len" characters, the excess text is
inserted without overwriting. Hence if the length attribute is
unspecified or 0 when updating a text node, the new text is inserted
at the appropriate position, without overwriting the old text.
4.4.2. Content
The variable "value" represents the new value for the node. The
meaning of the variable is dependent on the type of node being
updated. The content MUST NOT contain XML elements. In cases where
character data is being updated, the new text overwrites the "len"
characters beginning at position "char", that is, positions "char"
through "char + len - 1". Excess characters in "value" are inserted
without overwriting.
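For character data, the update rule can be read as "remove len characters at char, then put value in their place". The sketch below expresses that reading on a plain string; the function name update_chars is hypothetical.

```python
def update_chars(text, value, charpos=1, length=0):
    """Replace `length` characters at 1-based `charpos` with `value`."""
    start = charpos - 1
    # The replaced span is always `length` characters long, whatever
    # the length of the new value.
    return text[:start] + value + text[start + length:]


print(update_chars("sometext", "any", 1, 4))
print(update_chars("text", "some"))
```

With the default length of 0 nothing is overwritten, so the second call is a pure insertion at the start of the text.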
4.4.3. Examples
The following delta fragment represents an update of a non-attribute
node:
<dul:update node="/node()[1]/node()[2]/node()[3]"
>this is a comment</dul:update>
Figure 11
The following delta fragment updates the value of an attribute called
"title" to "Arch Bishop":
<dul:update node="/node()[1]/node()[2]/node()[3]/@title"
>Arch Bishop</dul:update>
Figure 12
4.5. Move Operation
<dul:move
node = "xpathexpr"
oldCharpos = "ochar"
length = "len"
parent = "parxpathexpr"
childno = "cn"
newCharpos = "nchar" />
Figure 13
Represents the move of a subtree or leaf node within a document.
4.5.1. Attributes
4.5.1.1. The node Attribute
The node attribute identifies the node or subtree to be moved. The
variable "xpathexpr" is an XPath expression which uniquely identifies
the node or subtree to be moved. Attributes may not be moved. The
variable "xpathexpr" is subject to the same restrictions as for the
insert operation. The node attribute MUST NOT be omitted.
4.5.1.2. The oldCharpos Attribute
The "oldCharpos" attribute is used in cases where a move is made from
the middle of, immediately after or immediately before character
data. It holds the character position of the node to be moved. The
variable "ochar" is the numeric position of the node or the first
text character to move. The first character of a text node is 1, in
accordance with the XPath standard. The oldCharpos attribute MAY be
omitted, and in such cases defaults to 1.
4.5.1.3. The length Attribute
The length attribute identifies the number of characters to move. It
is unnecessary when not moving a text node. The length attribute MAY
be omitted, and in such cases defaults to 0. When moving a text
node, no move will take place if the variable "len" is 0.
4.5.1.4. The parent Attribute
The parent attribute identifies the new parent for the node or
subtree. The "parxpathexpr" variable uniquely identifies the element
that the node identified by xpathexpr is to become a child of. The
XPath expression is restricted as for the insert operation. The
parent attribute MUST NOT be omitted.
4.5.1.5. The childno Attribute
The childno attribute identifies the node position at which to insert
the moved node or subtree. The variable "cn" is the child number of
the node identified by "parxpathexpr" that the moved node or subtree
is to be inserted as. Any existing node at this position becomes the
"cn+1" node. The variable "cn" is the XPath "node()" position that
the node will have (as opposed to the DOM node index). The childno
attribute MUST NOT be omitted.
4.5.1.6. The newCharpos Attribute
The newCharpos attribute is used when a move is made to a position in
the middle of, immediately after or immediately before character
data. The variable "nchar" is the numeric character position at
which to insert the node, counting in the same way as for the insert
operation. The first character of a text node is 1, in accordance
with the XPath standard. Setting the attribute to 1 represents an
insertion before the text. The newCharpos attribute MAY be omitted,
and in such cases defaults to 1.
4.5.2. Example
The following delta fragment represents the move of the subtree at
the 2nd child of the 3rd child of the root element, to the 2nd child
of the 2nd child of the root element.
<dul:move node="/node()[1]/node()[3]/node()[2]"
parent="/node()[1]/node()[2]"
childno="2"/>
Figure 14
4.6. Complete Example
The DUL document in Figure 17 represents the changes required to
transform the document in Figure 15 into the document in Figure 16.
<?xml version="1.0"?>
<a><b>sometext</b><x/></a>
Figure 15
<?xml version="1.0"?>
<a><x bute="new"/><b>text</b>moretext</a>
Figure 16
<?xml version="1.0"?>
<dul:dul xmlns:dul="urn:ietf:params:xml:ns:dul">
<!-- move <x/> to be first child of root -->
<dul:move node="/node()[1]/node()[2]"
parent="/node()[1]" childno="1"/>
<!-- add the bute attr to <x/> -->
<dul:insertAttr parent="/node()[1]/node()[1]"
name="bute">new</dul:insertAttr>
<!-- delete the text "some" from sometext -->
<dul:delete node="/node()[1]/node()[2]/node()[1]"
charpos="1" length="4"/>
<!-- add the text "moretext" -->
<dul:insert parent="/node()[1]"
childno="3">moretext</dul:insert>
</dul:dul>
Figure 17
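The four operations of this example can be traced with a small patch routine. What follows is a sketch under stated assumptions, not a reference implementation: it uses Python's xml.dom.minidom, all function names are hypothetical, it handles only the restricted "node()[x]" path form, it only deletes character data, it ignores the charpos attributes of insert and move, and it treats DOM child order as XPath "node()" order, which holds for these inputs because they contain no doctype and no adjacent text nodes. For a move, the node is detached before the child number is applied; this ordering is also an assumption.

```python
import re
from xml.dom import minidom, Node

DUL_NS = "urn:ietf:params:xml:ns:dul"
SOURCE = '<?xml version="1.0"?><a><b>sometext</b><x/></a>'
DELTA = """<?xml version="1.0"?>
<dul:dul xmlns:dul="urn:ietf:params:xml:ns:dul">
<dul:move node="/node()[1]/node()[2]" parent="/node()[1]" childno="1"/>
<dul:insertAttr parent="/node()[1]/node()[1]" name="bute">new</dul:insertAttr>
<dul:delete node="/node()[1]/node()[2]/node()[1]" charpos="1" length="4"/>
<dul:insert parent="/node()[1]" childno="3">moretext</dul:insert>
</dul:dul>"""


def resolve(doc, path):
    """Follow a restricted path such as /node()[1]/node()[2]."""
    node = doc
    for pos in re.findall(r"/node\(\)\[(\d+)\]", path):
        node = node.childNodes[int(pos) - 1]
    return node


def insert_at(parent, childno, node):
    """Insert node as the childno-th (1-based) child of parent."""
    kids = parent.childNodes
    ref = kids[childno - 1] if childno <= len(kids) else None
    parent.insertBefore(node, ref)  # ref None means append


def text_of(op):
    """Concatenate the character data directly inside an operation."""
    return "".join(c.data for c in op.childNodes
                   if c.nodeType in (Node.TEXT_NODE,
                                     Node.CDATA_SECTION_NODE))


def apply_delta(doc, delta):
    """Apply each operation of the delta to doc, in document order."""
    for op in delta.documentElement.childNodes:
        if op.nodeType != Node.ELEMENT_NODE or op.namespaceURI != DUL_NS:
            continue  # whitespace and comments between operations
        if op.localName == "move":
            node = resolve(doc, op.getAttribute("node"))
            parent = resolve(doc, op.getAttribute("parent"))
            node.parentNode.removeChild(node)
            insert_at(parent, int(op.getAttribute("childno")), node)
        elif op.localName == "insertAttr":
            target = resolve(doc, op.getAttribute("parent"))
            target.setAttribute(op.getAttribute("name"), text_of(op))
        elif op.localName == "delete":
            node = resolve(doc, op.getAttribute("node"))
            start = int(op.getAttribute("charpos") or 1) - 1
            length = int(op.getAttribute("length") or 0)
            node.data = node.data[:start] + node.data[start + length:]
        elif op.localName == "insert":
            parent = resolve(doc, op.getAttribute("parent"))
            childno = int(op.getAttribute("childno"))
            for offset, child in enumerate(op.childNodes):
                copy = doc.importNode(child, True)
                insert_at(parent, childno + offset, copy)
    return doc


patched = apply_delta(minidom.parseString(SOURCE),
                      minidom.parseString(DELTA))
result = patched.documentElement.toxml()
print(result)
```

The operations are applied in document order, so each path is evaluated against the document as modified by the preceding operations. This is why the insertAttr operation addresses the x element as "/node()[1]/node()[1]" after the move.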
4.7. Context Information
As with the UNIX diff and patch utilities, it would be useful to
support patching of arbitrary files via context matching. This would
require DUL documents to contain extra data pertaining to the context
of nodes. This is considered to be a future concern and is not
currently supported.
5. Formal Definitions
6. Security Considerations
There are no special security considerations for this specification.
Security considerations are more appropriate in documents describing
protocols that might use the delta format described in this
specification.
7. IANA Consideration
7.1. MIME type registration
To: ietf-types@iana.org
Subject: Registration of MIME media type application/xml-diff
MIME media type name: application
MIME subtype name: xml-diff
Required parameters: none
Optional parameters: none
Encoding Considerations: Same considerations as for XML.
Security Considerations: See Section 6.
Interoperability Considerations: TODO
Published Specification: This document is the published specification
for the MIME type being registered.
Applications which use this media type: Applications maintaining
configuration or application information on HTTP/WebDAV servers are
expected to use this media type.
Additional Information: There is no magic number or file extension
associated with this MIME type.
Person & email address to contact for further information: Adrian
Mouat (amouat@postmaster.co.uk).
Intended usage: Common
Author/Change Controller: TODO
TODO: also register as an instance manipulation for use in RFC 3229
8. URN Sub-Namespace Registration
URN Sub-Namespace Registration for urn:ietf:params:xml:ns:dul
This section registers a new XML namespace, as per the guidelines in
[8].
URI: The URI for this namespace is urn:ietf:params:xml:ns:dul.
TODO: Fill out the rest of this section as per the guidelines in [8].
9. Normative References
[1] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L.,
Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol --
HTTP/1.1", RFC 2616, June 1999.
[2] Clemm, G., Amsden, J., Ellison, T., Kaler, C., and J. Whitehead,
"Versioning Extensions to WebDAV (Web Distributed Authoring and
Versioning)", RFC 3253, March 2002.
[3] Goland, Y., Whitehead, E., Faizi, A., Carter, S., and D. Jensen,
"HTTP Extensions for Distributed Authoring -- WEBDAV", RFC 2518,
February 1999.
[4] Mogul, J., Krishnamurthy, B., Douglis, F., Feldmann, A., Goland,
Y., van Hoff, A., and D. Hellerstein, "Delta encoding in HTTP",
RFC 3229, January 2002.
[5] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997.
[6] Moats, R., "URN Syntax", RFC 2141, May 1997.
[7] Moats, R., "A URN Namespace for IETF Documents", RFC 2648,
August 1999.
[8] Mealling, M., "The IETF XML Registry", BCP 81, RFC 3688,
January 2004.
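[9] W3C, "Extensible Markup Language (XML) 1.0",
<http://www.w3.org/TR/REC-xml>.
[10] W3C, "XML Path Language (XPath) Version 1.0",
<http://www.w3.org/TR/xpath>.
[11] W3C, "Document Object Model (DOM) Level 2 Core Specification",
<http://www.w3.org/TR/DOM-Level-2-Core/>.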
Author's Address
Adrian Mouat
diffxml.sf.net
Filsa
Quarff, Shetland
ZE2 9EY
Email: amouat@postmaster.co.uk
Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Disclaimer of Validity
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Copyright Statement
Copyright (C) The Internet Society (2005). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights.
Acknowledgment
Funding for the RFC Editor function is currently provided by the
Internet Society.