Network Working Group                                            H. Wang
Internet-Draft                                                   L. Zhao
Intended status: Standards Track                                 S. Chen
Expires: 8 September 2022                                         Huawei
                                                            7 March 2022


                   NVMe over Fabric Network Framework
                         draft-nof-framework-00

Abstract

   NVMe over Fabrics (NoF) defines a common architecture for carrying
   the NVMe block storage protocol over a range of storage networking
   fabrics, such as Ethernet, Fibre Channel, and InfiniBand.  For
   Ethernet-based networks, RDMA or TCP can be used to transport NVMe,
   but the network management mechanisms are simple and fault detection
   is weak.

   This document defines the architecture of the Ethernet-based NVMe
   control optimization technology, including service processes between
   hosts, storage devices and network switches, and fast fault-aware
   switchover.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 8 September 2022.





Wang, et al.            Expires 8 September 2022                [Page 1]

Internet-Draft              Abbreviated-Title                 March 2022


Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Reference Models
     3.1.  Basic Model
     3.2.  CLOS Model
   4.  Functional Components
     4.1.  Storage Device
     4.2.  Host
     4.3.  Network Device
   5.  Procedures
     5.1.  IP Domain Management
     5.2.  Network Deployment
     5.3.  Storage and Host Access
     5.4.  NoF Information Advertisement
   6.  Reliability Considerations
     6.1.  Storage Failure
     6.2.  Host Failure
     6.3.  Access Link Failure
     6.4.  Network Link Failure
     6.5.  Network Device Failure
   7.  Security Considerations
   8.  IANA Considerations
   9.  Acknowledgements
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Introduction

   For a long time, key storage applications with high performance
   requirements have relied mainly on Fibre Channel (FC) networks.  As
   transmission rates have increased, storage media have evolved from
   HDDs to solid-state storage, and the access protocol has evolved
   from SCSI to NVMe.  The emergence of NVMe technologies brings new
   opportunities.

   Ethernet-based NVMe is the implementation of NVMe over Fabrics that
   best fits NVMe semantics.  It surpasses FC in terms of performance,
   cost, and network management, and it represents the development
   trend of future high-speed storage networks.  Ethernet-based NVMe
   has been defined by NVM Express.  This document specifies
   optimizations to network control in terms of ease of use,
   maintainability, and reliability, making Ethernet-based NVMe more
   suitable for key applications with high reliability requirements.

   [ODCC-2020-05016] defines the basic specifications for NVMe over
   RoCEv2, and this document draws on that definition.

2.  Terminology

   NoF : NVMe over Fabrics

   FC : Fibre Channel

   NVMe : Non-Volatile Memory Express

3.  Reference Models

   An Ethernet-based NVMe network mainly includes three types of roles:
   an initiator (referred to as a host), a switch, and a target
   (referred to as a storage device).  Initiators and targets are also
   referred to as endpoint devices.  Hosts and storage devices use the
   Ethernet-based NVMe protocol to transmit data over the network to
   provide high-performance storage services.

3.1.  Basic Model

                  +--+       +--+
       Host       |H1|       |H2|
    (Initiator)   +-,+       +_.+
                   | `',   _-` |
                   |    _-`    |
                   | _-`   `', |
      Ethernet  +----+       +----+
      Network   | SW |       | SW |
                +---,+       +_.--+
                   | `',   _-` |
                   |    `',    |
                   | _-`   `', |
      Storage     +-`+       +`'+
      (Target)    |S1|       |S2|
                  +--+       +--+
        Figure 1 : Basic Model

   This is the basic model for small-scale storage access networks.
   Hosts and storage devices are dual-homed to different switches.

   After a host or a storage device connects to a switch, it registers
   its information with the switch and obtains the registration
   information of other hosts and storage devices from that switch.

3.2.  CLOS Model

                           +--+      +--+      +--+      +--+
               Host        |H1|      |H2|      |H3|      |H4|
            (Initiator)    +/-+      +-,+      +.-+      +/-+
                            |         | '.   ,-`|         |
                            |         |   `',   |         |
                            |         | ,-`  '. |         |
                          +-\--+    +--`-+    +`'--+    +-\--+
                          | SW |    | SW |    | SW |    | SW |
                          +--,-+    +---,,    +,.--+    +-.--+
                              `.          `'.,`         .`
                                `.   _,-'`    ``'.,   .`
              Ethernet          +--'`+            +`-`-+
              Network           | SW |            | SW |
                                +--,,+            +,.,-+
                                .`   `'.,     ,.-``   ',
                              .`         _,-'`          `.
                          +--`-+    +--'`+    `'---+    +-`'-+
                          | SW |    | SW |    | SW |    | SW |
                          +-.,-+    +-..-+    +-.,-+    +-_.-+
                            | '.   ,-` |        | `.,   .' |
                            |   `',    |        |    '.`   |
                            | ,-`  '.  |        | ,-`  `', |
              Storage      +-`+      `'\+      +-`+      +`'+
              (Target)     |S1|      |S2|      |S3|      |S4|
                           +--+      +--+      +--+      +--+
                           Figure 2 : CLOS Model

   This model applies to relatively large-scale storage access
   networks.

   Hosts and storage devices connect to different access switches and
   register with them.  Each switch needs to flood the registration
   information it receives locally to the other switches in the
   network.
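
   The flooding behavior described above can be illustrated with the
   following non-normative Python sketch.  It assumes that a switch
   forwards a given registration only the first time it sees it; this
   duplicate suppression, the flood() helper, and the two-tier topology
   are illustrative assumptions and are not defined by this document.

```python
from collections import deque


def flood(topology, origin):
    """Return the order in which switches learn a registration.

    topology maps a switch name to the list of its neighbor switches.
    A switch forwards a registration only the first time it sees it,
    so the flood terminates even though the fabric contains loops.
    """
    seen = {origin}
    order = [origin]
    queue = deque([origin])
    while queue:
        switch = queue.popleft()
        for neighbor in topology[switch]:
            if neighbor not in seen:      # duplicate suppression
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order


# Hypothetical fabric with four leaf switches and two spine switches.
topology = {
    "leaf1": ["spine1", "spine2"],
    "leaf2": ["spine1", "spine2"],
    "leaf3": ["spine1", "spine2"],
    "leaf4": ["spine1", "spine2"],
    "spine1": ["leaf1", "leaf2", "leaf3", "leaf4"],
    "spine2": ["leaf1", "leaf2", "leaf3", "leaf4"],
}
learned = flood(topology, "leaf1")
```

   In this sketch, every switch in the fabric learns the registration
   exactly once, even though the topology contains redundant paths.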

4.  Functional Components

   The Ethernet-based NVMe network consists of storage devices, hosts
   and switches.

4.1.  Storage Device

   As the server side, storage devices provide storage access services
   for hosts.  When a storage device connects to a switch, it must
   register its storage service information with the switch and
   periodically refresh that registration so that the information held
   by the switch remains valid.

   If the storage device is interested in information about other
   storage devices or hosts in the storage network, it may also receive
   notifications of such information from the switch.

     +-------+                  +------+
     |Storage|                  |Switch|
     +-------+                  +------+
         |      Register Msg       |
         | ----------------------->|
         |                         |
         |     Notification Msg    |
         | <-----------------------|
         |                         |
         |                         |
         Figure 3 : Storage Device
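
   As a non-normative illustration of periodic registration, the
   sketch below models a switch-side table in which a registration
   expires unless the device refreshes it.  The RegistrationTable
   class, the 30-second lifetime, and the expiry rule are assumptions
   made for this example; this document does not define registration
   timers.

```python
class RegistrationTable:
    """Switch-side table of registrations with a validity lifetime."""

    def __init__(self, lifetime):
        self.lifetime = lifetime   # seconds a registration stays valid
        self.last_seen = {}        # device IP -> time of last Register Msg

    def register(self, ip, now):
        """Record an initial or periodic Register Msg from a device."""
        self.last_seen[ip] = now

    def expire(self, now):
        """Drop and return devices whose registration was not refreshed."""
        stale = sorted(ip for ip, seen in self.last_seen.items()
                       if now - seen > self.lifetime)
        for ip in stale:
            del self.last_seen[ip]
        return stale


table = RegistrationTable(lifetime=30)
table.register("192.0.2.10", now=0)    # first storage device registers
table.register("192.0.2.20", now=0)    # second storage device registers
table.register("192.0.2.10", now=25)   # only the first device refreshes
expired = table.expire(now=40)
```

   Here only the device that stopped refreshing is removed from the
   table, which is the point at which a switch could notify
   subscribers that the device is no longer valid.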

4.2.  Host

   The host is the client of the storage device.  When a host connects
   to a switch, it needs to register its host information with the
   switch and periodically refresh it.

   As the client side, a host needs to quickly learn the service status
   of the storage devices that serve it.  When the host receives a
   notification message from the switch indicating that a storage
   device has come online, the host may establish a connection to that
   storage device.  When the host receives a notification message from
   the switch indicating that a storage device is faulty, the host
   needs to quickly disconnect from that storage device and attempt to
   establish a connection to a redundant storage device.

   +-------+                  +------+
   |  HOST |                  |Switch|
   +-------+                  +------+
       |       Register Msg      |
       | ----------------------->|
       |                         |
       |     Notification Msg    |
       | <-----------------------|
       |                         |
       |                         |
        Figure 4 : Host Device
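
   The host behavior described in this section can be sketched as a
   small state holder.  This is a non-normative example: the Host
   class, the "online" and "faulty" status strings, and the
   preference-ordered candidate list are assumptions, since this
   document does not define the notification encoding or a selection
   policy.

```python
class Host:
    """Host-side reaction to switch notifications (illustrative only)."""

    def __init__(self, candidates):
        self.candidates = list(candidates)  # storage devices, by preference
        self.online = set()                 # devices reported online
        self.connected = None               # device currently in use

    def on_notification(self, storage, status):
        if status == "online":
            self.online.add(storage)
        elif status == "faulty":
            self.online.discard(storage)
            if self.connected == storage:
                self.connected = None       # disconnect quickly on fault
        if self.connected is None:
            self._connect_to_first_available()

    def _connect_to_first_available(self):
        for storage in self.candidates:
            if storage in self.online:
                self.connected = storage
                return


host = Host(candidates=["S1", "S2"])
host.on_notification("S1", "online")
host.on_notification("S2", "online")
first = host.connected                  # S1 is preferred while healthy
host.on_notification("S1", "faulty")
second = host.connected                 # the host fails over to S2
```

   The host stays on its current storage device while that device is
   healthy and moves to a redundant device only after a fault
   notification arrives.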

4.3.  Network Device

   Switches manage the registration information of hosts and storage
   devices and monitor the network status.  Each switch synchronizes
   this information with the other switches in the network.

   +------+                  +------+
   |Switch|                  |Switch|
   +------+                  +------+
      |    Information Sync     |
      | ----------------------->|
      |                         |
      |                         |
      |                         |
       Figure 5 : Network Device

5.  Procedures


5.1.  IP Domain Management

   On an FCoE network, users can control access between nodes through
   zones, which improves network security.  Zones provide inter-domain
   isolation and intra-domain communication.

   An Ethernet-based NVMe network also needs a mechanism equivalent to
   FC zones to isolate and control services between storage devices and
   hosts.  On such a network, IP addresses are used as the unique
   identifiers of hosts and storage devices, and domains are attributes
   of IP addresses.  Hosts and storage devices in the same domain can
   access each other, whereas hosts and storage devices in different
   domains are isolated.  Each IP address needs to be assigned to one
   or more domains.  There is also a default domain: if no isolation is
   required, the IP addresses of the hosts and storage devices belong
   to the default domain.  A domain may also be called a zone.

                _,.---.,,         ,,.--.,,
             .'`         `'.,  .'`        `'.
          ,-`                ,'              `\
         /    +--------+   ,'  \     +--------+`.
       .'     |StorageA|  /     `,   |StorageB|  \
      /       +---,----+ /        \  +-_.-----+   \
     /             `.,  /          ,_-`            \
     '                '/         _-\                ,
    |                  |`',   _-`   |               |
   /                   / +-`-`--+   \               \
   |                  |  |Switch|    |               |
   |                  |  +- .-,,+    |               |
   |                  |  ,'` |  '.   |               |
   |                  |-`    |    `',|               |
   |                .'|      |       |.,             |
    ,            ,-`   \     |      /   ',          /
    |     +-----`-+    | +---\---+  |   +-`'----+   |
     ,    | HostA |    \ | HostB | /    | HostC |   `
     \    +-------+     \+-------+ `    +-------+  /
      \                  \        /               /
       `.                 \      '               /
         \                 `,  ,'               `
          `.     Zone1       `.    Zone2      ,'
            `'.,         _.-`  '.,        _.'`
                `'''--''`         `''--''`

                Figure 6 : Zone Management

   As shown in the figure above, HostA and StorageA belong to Zone1,
   HostC and StorageB belong to Zone2, and HostB belongs to Zone1 and
   Zone2.

   StorageA can be accessed by HostA but not by HostC.  StorageB can be
   accessed by HostC but not by HostA.  Because HostB belongs to both
   Zone1 and Zone2, HostB can access StorageA in Zone1 and StorageB in
   Zone2.
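
   The access rule illustrated by Figure 6 amounts to a set
   intersection on domain membership.  The following non-normative
   sketch assumes that device names stand in for IP addresses, that a
   device with no explicit assignment falls into the default domain,
   and that the default domain is named "default"; the can_access()
   helper is hypothetical.

```python
# Domain (zone) membership taken from Figure 6.  Device names stand in
# for the IP addresses that identify hosts and storage devices.
zones = {
    "HostA": {"Zone1"},
    "HostB": {"Zone1", "Zone2"},
    "HostC": {"Zone2"},
    "StorageA": {"Zone1"},
    "StorageB": {"Zone2"},
}
DEFAULT_ZONE = "default"   # hypothetical name for the default domain


def can_access(host, storage):
    """Two devices may communicate if they share at least one domain."""
    host_zones = zones.get(host, {DEFAULT_ZONE})
    storage_zones = zones.get(storage, {DEFAULT_ZONE})
    return bool(host_zones & storage_zones)
```

   With this membership, can_access("HostB", "StorageA") and
   can_access("HostB", "StorageB") both hold, while
   can_access("HostA", "StorageB") does not.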

5.2.  Network Deployment

   The NoF network uses standard Ethernet technology, and the typical
   deployment model is the CLOS architecture.  Network deployments
   typically use existing IP technologies.  For example, OSPF is
   commonly deployed as the underlay protocol.

5.3.  Storage and Host Access

   Hosts and storage devices are connected to the Ethernet network.
   The administrator assigns access IP addresses to the hosts and
   storage devices.  In most scenarios, routes to these addresses can
   be advertised through the underlay protocol.  In addition, after
   hosts and storage devices go online, they need to register their
   information with the switches.  It is recommended that registration
   messages be carried in LLDP.

   The registration information includes the IP address type, whether to
   subscribe to host or storage device information changes, device role,
   service protocol type and version number, protocol service port
   number, protocol identifier, etc.
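
   For illustration only, the registration information listed above
   could be modeled as the record below.  The field names, the types,
   the inclusion of the IP address itself, and all example values are
   assumptions; this document does not define an encoding for the
   registration message.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    HOST = "host"
    STORAGE = "storage"


@dataclass(frozen=True)
class Registration:
    ip_address: str         # access IP address (assumed to be carried)
    ip_address_type: str    # IP address type, e.g. "ipv4" or "ipv6"
    subscribe: bool         # subscribe to host/storage info changes
    role: Role              # device role
    protocol_type: str      # service protocol type
    protocol_version: str   # service protocol version number
    service_port: int       # protocol service port number
    protocol_id: str        # protocol identifier


# Example values are placeholders chosen for this sketch.
reg = Registration(
    ip_address="192.0.2.10",
    ip_address_type="ipv4",
    subscribe=False,
    role=Role.STORAGE,
    protocol_type="nvme-rdma",
    protocol_version="1.0",
    service_port=4420,
    protocol_id="example",
)
```

   Making the record immutable reflects the idea that a changed
   registration is sent as a new message rather than edited in place,
   which is a design choice of this sketch only.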

   The switch receives and saves the registration information of hosts
   and storage devices.  For a host or storage device that subscribes
   to host and storage device information changes, the switch also
   needs to advertise the collected registration information to that
   subscriber.  The advertised information includes the device status,
   the reason for a device status change, and device attachment
   information.  When advertising to a subscriber, the switch must
   advertise only the registration information of the domains to which
   the subscriber belongs.  It is recommended that a new protocol be
   defined to carry this notification message.
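
   The domain restriction on advertised information can be sketched as
   a filter applied by the switch before it notifies a subscriber.
   This example is non-normative; the notifications_for() helper, the
   status strings, and the use of device names in place of IP addresses
   are assumptions.

```python
def notifications_for(subscriber, registrations, domains):
    """Return the registrations a subscriber is allowed to learn about.

    registrations maps a device name to its status string; domains maps
    a device name to the set of domains it belongs to.  Only devices
    that share a domain with the subscriber are advertised to it.
    """
    own = domains[subscriber]
    return {
        device: status
        for device, status in registrations.items()
        if device != subscriber and domains[device] & own
    }


domains = {
    "HostA": {"Zone1"},
    "HostC": {"Zone2"},
    "StorageA": {"Zone1"},
    "StorageB": {"Zone2"},
}
registrations = {
    "HostA": "online",
    "HostC": "online",
    "StorageA": "online",
    "StorageB": "faulty",
}
for_host_a = notifications_for("HostA", registrations, domains)
for_host_c = notifications_for("HostC", registrations, domains)
```

   In this sketch HostA learns only about StorageA, and HostC learns
   only about the fault of StorageB.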

5.4.  NoF Information Advertisement

   Users assign domains to hosts and storage devices.  The domain
   information must be available on all access switches in the storage
   network.  It can be configured on each access switch, or it can be
   configured on some switches and then synchronized to all other
   access switches in the storage network.

   In addition, the registration information of local hosts and storage
   devices stored on each access switch needs to be synchronized across
   the entire switch network so that hosts and storage devices attached
   to other access switches can obtain it.

   The synchronized information about hosts and storage devices is
   application-layer information.  A new protocol should be defined to
   implement this synchronization.

   +-------+           +----+      +------+      +----+      +-------+
   |  HOST |-----------|TOR1|------|Spine1|------|TOR3|------|Storage|
   +---/---+           +-/--+      +--/---+      +-/--+      +---/---+
       |---------------->|            |            |<------------|
       |  Register Msg   |----------->|<-----------| Register Msg|
       |                 |<-----------|----------->|             |
       |<----------------|  Info Sync |  Info Sync |             |
       |Notification Msg |            |            |             |
       |                 |            |            |             |
               Figure 7 : Information Advertisement

6.  Reliability Considerations


6.1.  Storage Failure

   When a storage device is faulty, the access switch detects the fault
   and floods the fault information through the network.  After
   receiving the fault notification, a host that subscribes to the
   storage device can switch to another storage device.  The switchover
   is performed by the host; the network needs to notify the host of
   the fault quickly.

6.2.  Host Failure

   When a host is faulty, the access switch detects the fault and
   floods the fault information through the network.  Whether hosts and
   storage devices subscribe to the fault status of a given host is
   implementation-specific.

6.3.  Access Link Failure

   When an access link is faulty, the access switch detects the fault
   and floods the fault information through the network.  After
   receiving the fault notification, a host that subscribes to the
   affected storage device can switch to another storage device.

   BFD or other fast detection technologies can be used to accelerate
   fault detection.
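
   The effect of a fast detection mechanism can be sketched as follows.
   The example is not an implementation of BFD; it only models the
   general idea of declaring a peer down when no packet has been
   received for a detection time equal to a transmit interval
   multiplied by a detection multiplier.  The LivenessDetector class
   and the 10 ms and 3 values are assumptions made for this example.

```python
class LivenessDetector:
    """Declare a peer down after `multiplier` missed intervals."""

    def __init__(self, interval_ms, multiplier):
        # Detection time: how long silence is tolerated before the
        # peer is considered down.
        self.detection_time_ms = interval_ms * multiplier
        self.last_rx_ms = 0

    def on_packet(self, now_ms):
        """Record the arrival time of a liveness packet from the peer."""
        self.last_rx_ms = now_ms

    def is_down(self, now_ms):
        """Return True once the silence reaches the detection time."""
        return now_ms - self.last_rx_ms >= self.detection_time_ms


detector = LivenessDetector(interval_ms=10, multiplier=3)
detector.on_packet(now_ms=100)
still_up = not detector.is_down(now_ms=120)   # 20 ms of silence
declared_down = detector.is_down(now_ms=130)  # 30 ms of silence
```

   With these example values, a failed access link would be reported
   after 30 ms of silence, which bounds how soon the access switch can
   begin flooding the fault information.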

6.4.  Network Link Failure

   ECMP or redundant link protection is usually deployed to protect
   against this failure.

   When multiple links fail on the network side, the switch network may
   be split into partitions.  In that case, the hosts in each partition
   receive the corresponding notification and handle the affected
   storage devices accordingly.
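
   The split scenario can be illustrated by computing which storage
   devices remain reachable from each host after a set of links has
   failed.  This sketch is non-normative; the reachable() helper, the
   node names, and the topology with two leaf and two spine switches
   are assumptions made for the example.

```python
def reachable(links, start):
    """Return every node reachable from start over the given links."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def storage_seen_by(host, links):
    """List the storage devices (names starting with "S") a host reaches."""
    return sorted(n for n in reachable(links, host) if n.startswith("S"))


# Hypothetical fabric: two leaf switches joined by two spine switches.
links = [
    ("H1", "leaf1"), ("S1", "leaf1"),
    ("H2", "leaf2"), ("S2", "leaf2"),
    ("leaf1", "spine1"), ("leaf2", "spine1"),
    ("leaf1", "spine2"), ("leaf2", "spine2"),
]
one_failure = [l for l in links if l != ("leaf2", "spine1")]
both_uplinks = {("leaf2", "spine1"), ("leaf2", "spine2")}
split = [l for l in links if l not in both_uplinks]

h1_after_one = storage_seen_by("H1", one_failure)
h1_after_split = storage_seen_by("H1", split)
h2_after_split = storage_seen_by("H2", split)
```

   A single uplink failure leaves both storage devices reachable
   through the remaining spine switch, whereas losing both uplinks of
   one leaf switch splits the network so that each host reaches only
   the storage device in its own partition.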




6.5.  Network Device Failure

   A network device failure is equivalent to a network link failure, an
   access link failure, or both.

7.  Security Considerations

   TBD

8.  IANA Considerations

   This document makes no request of IANA.

9.  Acknowledgements

   NA

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

10.2.  Informative References

   [ODCC-2020-05016]
              Open Data Center Committee, "NVMe over RoCEv2 Network
              Control Optimization Technical Requirements and Test
              Specifications", 2020.

Authors' Addresses

   Haibo Wang
   Huawei
   No. 156 Beiqing Road
   Beijing
   100095
   P.R. China
   Email: rainsword.wang@huawei.com


   Lily Zhao
   Huawei
   No. 3 Shangdi Information Road
   Beijing
   100085
   P.R. China
   Email: Lily.zhao@huawei.com


   Shuanglong Chen
   Huawei
   No. 156 Beiqing Road
   Beijing
   100095
   P.R. China
   Email: chenshuanglong@huawei.com