Internet-Draft | Gateway Crash Recovery | March 2021 |
Belchior, et al. | Expires 11 September 2021 |
This memo describes the crash recovery mechanism for the Open Digital Asset Protocol (ODAP), entitled ODAP-2PC. ODAP-2PC assures that gateways running ODAP are crash-fault tolerant, meaning that the atomicity of asset transfers is assured even if gateways crash. This memo describes the messaging and logging flow that gateways need to keep track of the current state, the crash recovery protocol, and a rollback protocol.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 11 September 2021.¶
Copyright (c) 2021 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.¶
Gateway systems that perform virtual asset transfers among DLTs must possess a degree of resiliency and fault tolerance in the face of possible crashes. A key component of crash recovery is maintaining logs that enable either the same or other backup gateways to resume partially completed transfers. Another key component is an atomic commit protocol (ACP) that guarantees that the source and target DLTs are modified consistently (atomicity) and permanently (durability), e.g., that assets that are taken from the source DLT are persisted into the recipient DLT.¶
This memo proposes: (i) the parameters that a gateway must retain in the form of logs concerning message flows within asset transfers; (ii) a JSON-based format for logs related to asset transfers.¶
The following terminology is used in this document:¶
Logs are associated with a process running operations on a certain gateway, and they can be stored on several kinds of storage: 1) off-chain storage (with the possibility of a hash of the logs being stored on-chain), where logs are stored on the hard drive of the computer system performing the role of a gateway; 2) cloud storage; 3) on-chain storage, storing the logs either on the blockchains that the gateways are connected to or on a third blockchain.¶
To manipulate the log, we define a set of log primitives that translate log entry requests from a process into log entries, realized by the log storage API presented later:¶
A log entry request typically comes from a single event in a given protocol. Log entry requests have the format (phase, step, operation, gateways), where the field operation corresponds to an arbitrary command, and the field gateways corresponds to the parties involved in the protocol. We define four operation types to provide context to the protocol being executed. Operation type (init-) states the intention of a gateway to execute a particular operation, and operation type (exec-) expresses that the gateway is executing an operation. The operation type (done-) states that an agent successfully executed a step of the protocol, while (ack-) refers to a gateway acknowledging a message received from another. Conversely, we use the type (fail-) to indicate that an agent failed to execute a specific step.¶
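The request format above can be illustrated with a minimal Python sketch. The class name `LogEntryRequest`, its attribute names, and the convention that an operation is written as a type prefix followed by a hyphen and a command (as in `init-validate`) are assumptions made for illustration; this memo does not prescribe a concrete data structure.

```python
from dataclasses import dataclass
from typing import Tuple

# Operation-type prefixes named in the text. The command that follows the
# prefix (e.g. "validate") is protocol-specific and not checked here.
OPERATION_TYPES = ("init", "exec", "done", "ack", "fail")


@dataclass(frozen=True)
class LogEntryRequest:
    """Hypothetical container for (phase, step, operation, gateways)."""

    phase: str
    step: int
    operation: str
    gateways: Tuple[str, ...]

    def __post_init__(self) -> None:
        # Reject operations that do not start with a known type prefix.
        prefix, separator, command = self.operation.partition("-")
        if prefix not in OPERATION_TYPES or not separator or not command:
            raise ValueError(f"unknown operation type in {self.operation!r}")

    @property
    def operation_type(self) -> str:
        """Return the type prefix of the operation, e.g. 'init'."""
        return self.operation.partition("-")[0]


request = LogEntryRequest("p1", 1, "init-validate", ("GS", "GR"))
print(request.operation_type)  # init
```

Validating the prefix at construction time keeps malformed requests out of the log, which matters because recovery later relies on the operation type of the latest entry.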
From step 1 to 7, the generated logs are:¶
The gateway architecture [ODAP] defines two gateway nodes belonging to distinct DLT systems as a means to conduct a virtual asset transfer in a secure and non-repudiable manner while ensuring the asset does not exist simultaneously on both blockchains.¶
One of the key deployment requirements of gateways for asset transfers is a high degree of gateway availability. In this document, we consider two common strategies to increase availability: (1) to support the recovery of the gateways and (2) to employ backup gateways with the ability to resume a stalled transfer.¶
To this end, gateways must retain relevant log information regarding incoming protocol messages (parameters, payloads, etc.) and transmitted messages. In particular, logs are written before operations (write-ahead) to provide atomicity and durability to the asset exchange protocol. Log data is considered an internal resource of the DLT system, accessible to the backup gateway and possibly to other gateway nodes.¶
The Open Digital Asset Protocol (ODAP) is a DLT-agnostic gateway-to-gateway protocol used by a sender gateway and a target gateway to perform a unidirectional transfer of a virtual asset [ODAP]. The transfer process is started by a client (application) that interacts with the source gateway or both (source and recipient) gateways to provide instructions regarding actions, related resources located in the source DLT system, and resources located in the remote DLT system. The protocol has two modes, but here we consider only the Relay Mode: Client-initiated Gateway to Gateway asset transfer. When we refer to the ODAP protocol in this document, we refer to the ODAP protocol in Relay Mode, although the logging model specified in this memo can also support the Direct mode.¶
ODAP has to be instantiated with an ACP to guarantee that the source and target DLTs are modified consistently, a property designated Atomicity [BHG87]. ACPs consider two roles: a Coordinator that manages the execution of the protocol and Participants that manage the resources that must be kept consistent. In Relay Mode, the source gateway plays the ACP role of Coordinator, and the recipient gateway plays the Participant role. Gateways exchange messages corresponding to the protocol execution, generating log entries for each one. The message exchange and the corresponding logging procedure are represented in Figure 1.¶
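The write-ahead discipline can be sketched as follows. This is a minimal illustration rather than a normative procedure: the function `run_logged` is hypothetical, an in-memory Python list stands in for persistent log storage, and the entry layout (phase, step, operation, gateways) is assumed from the log entry request format defined in this memo.

```python
from typing import Callable, List, Tuple

# Assumed entry layout: (phase, step, operation, gateways).
LogEntry = Tuple[str, int, str, str]


def run_logged(log: List[LogEntry], phase: str, step: int, command: str,
               gateways: str, action: Callable[[], None]) -> bool:
    """Run `action`, writing log entries before and after it."""
    # Write-ahead: the intention is logged before anything is executed.
    log.append((phase, step, f"init-{command}", gateways))
    log.append((phase, step, f"exec-{command}", gateways))
    try:
        action()
    except Exception:
        # The failure is recorded so that recovery can find the failed step.
        log.append((phase, step, f"fail-{command}", gateways))
        return False
    log.append((phase, step, f"done-{command}", gateways))
    return True
```

If the gateway crashes inside `action`, the latest entry is the `exec-` entry for that step, so a recovering or backup gateway can tell that the step was started but never confirmed.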
The simplified message flow format is in the form < ODAP_PHASE, STEP, COMMAND, GATEWAY >, where ODAP_PHASE corresponds to the current phase of ODAP, STEP corresponds to a monotonically increasing integer, COMMAND to the command type being issued by a set of gateways (GATEWAY). However, both two-phase commit and three-phase commit can block in case nodes fail. The protocol being blocking means that if the coordinator crashes, then gateways may not finish transactions. When a crash happens, gateways will be waiting for a confirmation/abort, and possibly holding the lock regarding a specific digital asset.¶
We assume gateways fail by crashing, i.e., by becoming silent; we do not consider arbitrary (Byzantine) faults. We assume authenticated reliable channels obtained using TLS/HTTPS [TLS]. To recover from these crashes, gateways store data about the current step of the protocol in persistent storage. This allows the system to recover by retrieving from the log the first step that may have failed. We consider two recovery models:¶
In Self-healing mode, when a gateway restarts after a crash, it reads the state from the log and continues executing the protocol from that point on. We assume the gateway does not lose its long-term keys (public-private key pair) and can reestablish all TLS connections.¶
In Primary-backup mode, we assume that after a period T of the primary gateway failure, a backup gateway detects that failure unequivocally and takes the role of the primary gateway. The failure is detected using heartbeat messages and a conservative value for T. The backup gateway does virtually the same as the gateway in self-healing mode: reads the log and continues the process. The difference is that the log must be shared between the primary and the backup gateways. If there is more than one backup, a leader-election protocol may be executed to decide which backup will take the primary role.¶
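The timeout-based detection described above can be sketched in a few lines. The class `HeartbeatMonitor` and its method names are hypothetical, and timestamps are passed in explicitly (in seconds) so that the logic is independent of any particular clock; this memo does not specify the heartbeat format or a value for T.

```python
class HeartbeatMonitor:
    """Hypothetical failure detector run by a backup gateway."""

    def __init__(self, timeout_t: float, now: float) -> None:
        self.timeout_t = timeout_t    # the conservative period T
        self.last_heartbeat = now     # time of the latest heartbeat seen

    def record_heartbeat(self, now: float) -> None:
        """Call whenever a heartbeat from the primary gateway arrives."""
        self.last_heartbeat = now

    def primary_suspected(self, now: float) -> bool:
        """True once more than T has elapsed without a heartbeat."""
        return now - self.last_heartbeat > self.timeout_t
```

In a deployment, `now` would typically come from a monotonic clock such as Python's `time.monotonic()`. A larger T lowers the risk of a backup taking over while the primary is merely slow, at the cost of a longer stall before recovery starts.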
Gateways can crash at several points of the protocol.¶
In 2PC and 3PC, recovery requires that the protocol steps are recorded in a log immediately before sending a message and immediately after receiving a message. Thus, at every step k of the protocol, each gateway writes a log entry indicating its current state. When a node crashes:¶
Upon recovery, the recovered node attempts to retrieve the most recent log of operations. Based on the latest log entry last(log), it derives the current state of the asset transfer. This can be confirmed by querying all other nodes involved in such transfer by sending a recovery message rm. After the current state is fetched and agreed upon by all parties, the ODAP protocol continues. There are several situations when a crash may occur. The first one is immediately after starting the transfer, as shown below:¶
The source gateway (G1) crashes right before it issues an init command to the recipient gateway (G2). The gateway eventually recovers in self-healing mode, querying the last log entry from the log storage API. After that, it sends a recovery message to G2, advertising that the recovery has been completed and asking for an updated version of the log, i.e., the current state. In this case, the latest version of the log corresponds to G1's log. After synchronization has been achieved, the process can continue.¶
The second scenario requires further synchronization (figure below). At the retrieval of the latest log entry, G1 notices its log is outdated. It updates it upon necessary validation and then communicates its recovery to G2. The process then continues as defined.¶
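The two scenarios above can be sketched as a single comparison between the recovered gateway's log and the log returned by its counterpart. This is a simplified illustration under stated assumptions: both logs are treated as views of one append-only sequence, `peer_log` stands in for the reply to the recovery message rm, and the function name `recover` is hypothetical. The validation that G1 performs before updating its log is reduced here to a check that one log extends the other.

```python
from typing import List, Tuple

# Assumed entry layout: (phase, step, operation, gateways).
LogEntry = Tuple[str, int, str, str]


def recover(local_log: List[LogEntry],
            peer_log: List[LogEntry]) -> List[LogEntry]:
    """Return the log from which the transfer should resume."""
    # Count the entries that both logs share from the start.
    common = 0
    for mine, theirs in zip(local_log, peer_log):
        if mine != theirs:
            break
        common += 1
    # If neither log is a prefix of the other, the histories diverge and
    # this sketch refuses to pick one automatically.
    if common < min(len(local_log), len(peer_log)):
        raise ValueError("logs diverge; further validation is required")
    # First scenario: the local log is already the latest version.
    if len(local_log) >= len(peer_log):
        return list(local_log)
    # Second scenario: the local log is outdated and adopts the peer's.
    return list(peer_log)
```

The last entry of the returned log then plays the role of last(log), from which the current state of the transfer is derived.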
Log primitives are translated into log entries, persisted by the log storage API in the format <phase, step, operation, gateways>, where the gateway issuing the operation is implicit. For example, when GS initiates ODAP's first phase by sending a message to GR, the log primitive specifying the init command given to GR, in the first step of phase p1, is translated into the log entry <p1,1,init-validate,GS-GR>. After that, the log entry is persisted via the log storage API. Thus, log primitives are also translated into log storage API requests.¶
We consider the log file to be a stack of log entries. Each time a log entry is added, it goes to the top of the stack (the highest index). Logs can be saved locally (on the computer's disk), in an external service (e.g., a cloud storage service), or in the DLT on which the gateway is operating. Saving logs locally is faster than saving them on the respective ledger but delivers weaker integrity and availability guarantees. Saving log entries on a DLT may slow down the protocol because issuing a transaction is several orders of magnitude slower than writing to disk or accessing a cloud service. Self-healing mode is compatible with the three types of logs, but Primary-backup mode requires storage in an external service or the DLT. For critical scenarios where strong accountability and traceability are needed (e.g., financial institution gateways), blockchain-based log storage may be appropriate. Conversely, for gateways that implement interoperability between blockchains belonging to the same organization (i.e., a legal framework protects the legal entities involved), local storage might suffice.¶
We assume the storage service used provides the means necessary to assure the logs' confidentiality and integrity, stored and in transit. The service must provide an authentication and authorization scheme, e.g., based on OAuth and OIDC [OIDC], and use secure channels based on TLS/HTTPS [TLS].¶
We consider a log storage API that allows developers to abstract from the storage details (e.g., relational vs. non-relational, local vs. cloud) and handles access control if needed. This is API-TYPE 1, as the gateway uses it to store off-chain resources.¶
The log storage API serves two purposes: 1) it provides a reliable means to store logs created by all gateways involved in an asset transfer; and 2) it promotes accountability across parties.¶
The log storage API MUST respond with return codes indicating the failure (error 5XX) or success (200) of the operation. The application may carry out further operations in the future to determine the ultimate status of the operation.¶
Persists a log entry at the default storage environment, by appending it to the current log. Returns the index of the saved log entry.¶
Response example:¶
Obtains the latest log entry from the log.¶
Response example:¶
Obtains a log entry with specified ID.¶
Response example:¶
Obtains the whole log.¶
Response example:¶
Updates the current log. The log is updated if there are new log entries.¶
Returns the index of the last common log entry (common prefix).¶
Response example:¶
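Independently of the wire format, the behaviour of the five operations described above can be sketched with an in-memory stand-in. Everything named here is an assumption for illustration: the class `InMemoryLogStore`, its method names, the use of the list index as the entry ID, the choice of 500 for every failure, and the rule that an update only extends the local log when the local log is a prefix of the incoming one. It is not the API defined by this memo and does not reproduce its response examples.

```python
from typing import Dict, List, Tuple

# (status code, body): 200 on success, 500 on failure (assumed mapping).
Response = Tuple[int, object]


class InMemoryLogStore:
    """Hypothetical in-memory stand-in for a log storage backend."""

    def __init__(self) -> None:
        self._entries: List[Dict] = []

    def append(self, entry: Dict) -> Response:
        """Persist an entry on top of the log and return its index."""
        self._entries.append(dict(entry))
        return 200, len(self._entries) - 1

    def get_last(self) -> Response:
        """Return the latest entry, or a failure if the log is empty."""
        if not self._entries:
            return 500, None
        return 200, dict(self._entries[-1])

    def get(self, index: int) -> Response:
        """Return the entry stored at the given index."""
        if not 0 <= index < len(self._entries):
            return 500, None
        return 200, dict(self._entries[index])

    def get_all(self) -> Response:
        """Return a copy of the whole log."""
        return 200, [dict(entry) for entry in self._entries]

    def update(self, other: List[Dict]) -> Response:
        """Adopt new entries from `other`; return the last common index."""
        common = 0
        for mine, theirs in zip(self._entries, other):
            if mine != theirs:
                break
            common += 1
        # Extend only when the local log is a prefix of the incoming one.
        if common == len(self._entries):
            self._entries.extend(dict(entry) for entry in other[common:])
        # -1 signals that the two logs share no common prefix.
        return 200, common - 1
```

Returning the index of the last common entry lets the caller see how far the two logs agreed before the update, which is the information a recovering gateway needs to decide where to resume.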
The log entries are stored by a gateway in its log. Entries account for the current status of one of the three ODAP flows: Transfer Initiation flow, Lock-Evidence flow, and Commitment Establishment flow. The recommended format for log entries is JSON [xxx], with protocol-specific mandatory fields and support for a free-format field for plaintext or encrypted payloads directed at the DLT gateway or an underlying DLT. Although the recommended format is JSON, other formats can be used (e.g., XML).¶
The mandatory fields of a log entry are:¶
Optional log entry fields are:¶
Example of a log entry created by G1, corresponding to locking an asset (phase 2.3 of the ODAP protocol):¶
Example of a log entry created by G2, acknowledging G1 locking an asset (phase 2.4 of the ODAP protocol):¶
We assume a trusted, secure communication channel between gateways (i.e., messages cannot be spoofed and/or altered by an adversary) using TLS 1.3 or higher. Clients support "acceptable" credential schemes such as OAuth2.0.¶
The present protocol is crash fault-tolerant, meaning that it handles gateways that crash for several reasons (e.g., power outage). The present protocol does not support Byzantine faults, where gateways can behave arbitrarily (including being malicious). This implies that both gateways are considered trusted. We assume logs are not tampered with or lost.¶
Log entries need integrity, availability, and confidentiality guarantees, as they are an attractive point of attack [BVC19]. Every log entry contains a hash of its payload to guarantee integrity. If extra guarantees are needed (e.g., non-repudiation), a log entry might be signed by its creator. Availability is guaranteed by the use of the log storage API, which connects a gateway to dependable storage (local, external, or DLT-based). Each underlying storage provides different guarantees. Access control can be enforced via the access control profile that can be associated with each log, i.e., the profile can be resolved, indicating who can access the log entry and under which conditions. Access control profiles can be implemented with access control lists for simple authorization. The authentication of the entities accessing the logs is done at the log storage API level (e.g., username+password authentication in local storage vs. blockchain-based access control in a DLT).¶
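The payload hash mentioned above can be sketched as follows. The choice of SHA-256, the canonical JSON serialization, and the field names `payload` and `payload_hash` are assumptions made for illustration; this memo does not fix a hash algorithm here. Signing an entry for non-repudiation is out of scope for this sketch.

```python
import hashlib
import json


def payload_hash(payload: dict) -> str:
    """Hash a canonical JSON encoding of the payload (SHA-256 assumed)."""
    # Sorted keys and fixed separators make the encoding deterministic.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_entry(payload: dict) -> dict:
    """Build a log entry carrying the payload and its hash."""
    return {"payload": payload, "payload_hash": payload_hash(payload)}


def verify_entry(entry: dict) -> bool:
    """Recompute the hash and compare it with the stored one."""
    return entry.get("payload_hash") == payload_hash(entry.get("payload", {}))
```

A hash stored next to the payload only detects accidental corruption or tampering by a party that cannot also rewrite the hash, which is why stronger guarantees call for a signature or DLT-based storage.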
For extra guarantees, the nodes running the log storage API (or the gateway nodes themselves) can be protected by hardening technologies such as Intel SGX [CD16].¶