Thing-to-Thing Research Group M. Burgess
Internet-Draft Independent Researcher
Intended status: Informational H. Wildfeuer
Expires: April 21, 2016 Cisco Systems
October 19, 2015

Federated Multi-Tenant Service Architecture for an Internet of Things
draft-burgess-promise-iot-arch-00

Abstract

This draft describes architectural recommendations for an Internet of Things scenario, based on tried and tested principles from infrastructure science. We describe a functional service architecture that may be applied in the manner of a platform, from the smallest scale to the largest scale, using vendor agnostic principles. The current draft is rooted in the principles of Promise Theory [Bergstra1] and voluntary cooperation.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on April 21, 2016.

Copyright Notice

Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.


1. Introduction

The scenario we call the Internet of Things (IoT) is an inflection point in the development of local and global information infrastructure. The facilitation of a platform for the next generation of global commerce presents a challenge of both technological and human dimensions. This is a challenge that spans every layer of the software and networking stacks, but it can be described in general terms without the need to refer to specific implementations. That is our goal in this draft. Only a few new ideas are needed to synthesize this infrastructure; however, several old technology practices must be deprecated for scaling and security considerations.

A platform for society must be vendor agnostic at its root, and must leave ample space for vendor specific creativity on top. What distinguishes IoT from past scenarios is the prolific contact surface it will expose to the physical world, embedding devices pervasively in our close environments, and touching every part of human life. At the time of writing, IoT has barely begun to emerge in domestic and industrial settings; however, choices we make now could help or hinder the development of an adequate platform over the coming decades. The proposed architecture not only scales up to large numbers, it also scales down to small devices of low capability; from the largest installations to the smallest, and from the tiniest amounts of data, to vast data-stores collected by scientific computing at the limits of possibility.

2. Requirements and Promises Language

The terms "PROMISE" and "PROMISES" in this document are to be interpreted as described in Promise Theory [Bergstra1].

When used, the key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

3. Definitions and concepts

IP endpoint
A hardware or software agent that is IP addressable, via a TCP/IP capable interface.
Static endpoint
A hardware or software agent with an IP address (prefix and subnet) that is fixed over the timescale of application service interactions.
Mobile endpoint
A hardware or software agent whose IP address location can change on the timescale of application service interactions.
Application server/service
Any agent that promises to respond to requests, from external parties, and perform services of any kind, on a timescale that we may call the application service timescale.
Multi-tenant application service
A collection of agents housed as tenants within a single host device, each offering different services, with potentially different timescales.
Client application
An agent that consumes data from an application service, requested either by imposed query or by promised schedule.
Standalone Thing (FFD)
A full function device (FFD) [OneM2M], with an IP address, that can present its own service gateway or interface to the IP network.
Peripheral Thing (RFD)
A reduced function device (RFD) [OneM2M], with no IP address, that attaches to a host gateway device as a peripheral, over an arbitrary network (USB, PCIe, CANbus, Profibus, ModBus, wireless sensor network, etc). Such devices are addressable only through the gateway service. This includes portmapped devices.
Embedded network
Any network (IP or non-IP) that is non-IP routed, i.e. contained within a host endpoint as part of a black box, e.g. isolated NAT, device bus, serial channels.
Transducer
An agent that consumes a service from another agent, and provides a new service based on the consumed service, e.g. a router, encrypter, compressor, etc.
Trust
A unilateral policy assessment of one agent by another, concerning its reliability in honouring promises. Trust is not necessarily a transitive property.
Partial connectivity
A device is said to have partial connectivity if it is unavailable for intervals of time, e.g. due to loss of connectivity, mobility, or power napping.

4. Device interconnection

All devices are assumed to live in a partially connected environment. They MUST be fault tolerant to loss of communications, both with other agents in the course of providing application services, and with trusted sources of information. A minimum interdependency design may be recommended to facilitate this.

For a nascent Internet of Things, the focus is naturally drawn to the specialized leaf devices, where data may be produced or consumed. However, these are only half the picture. `Thing' devices, by design, also communicate with online services deployed `higher up', or `Northbound' in the system, to offload analysis and decision-making. Their physical capabilities thus place them into the following broad categories:

Standalone devices
These are assumed to connect by an IP addressable underlay network. Connectivity is assumed end-to-end, without reference to tunnels or software defined overlays. Routing is assumed to be provided end-to-end, and fully decoupled from the registration of devices. Segregation and firewalling of certain network regions may be included in network design, but will not be considered here.
Peripherals
These include bare sensors and actuators, which do not possess sufficient onboard resources or software interfaces, and which attach to hosting standalone devices that act as a gateway and IP endpoint on their behalf.
Transducers
These are pass-through devices: transformers, converters, encapsulation services, etc.

      
  +------------------+
  | FFD / Standalone |--> IP Endpoint
  +------------------+

  +------------------+
  | RFD / Peripheral |--+
  +------------------+  |      +------------------+
                        +------| FFD / Standalone |--> IP Endpoint
  +------------------+  |      +------------------+
  | RFD / Peripheral |--+   
  +------------------+

      
    

Devices may be standalone (FFD), with service interfaces, or hosted peripherals (RFD), where data are exposed through service interfaces from other buses, e.g. USB, CANbus, MODbus, Profibus, etc.

Figure 1

Standalone devices are full stack devices that provide data oriented services to data clients.

Standalone devices and transducers can vary considerably in their processing, memory and connectivity resources and constraints. This architecture assumes a minimum resource level at the standalone device, but the device must support `full stack' implementations. In practice, this implies that they contain an embedded OS (e.g. Linux), and are capable of running an agent providing secure service and connectivity interfaces.
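
As a rough illustration of the gateway relationship between standalone and peripheral devices, the following sketch shows a standalone device that exposes its attached peripherals only through its own service interface. The class and method names (Gateway, attach, serve) are hypothetical and are not defined by this draft; the `read' callables merely stand in for whatever bus driver a real device would use.

```python
class Gateway:
    """Hypothetical standalone device (FFD) hosting peripherals (RFDs)."""

    def __init__(self, address: str):
        self.address = address      # the only IP address involved
        self.peripherals = {}       # local name -> callable that reads the device

    def attach(self, name: str, read) -> None:
        # `read` stands in for a bus driver (USB, CANbus, etc.); an assumption.
        self.peripherals[name] = read

    def serve(self, name: str):
        # Peripherals have no address of their own; they are reachable
        # only by name, through the gateway's service interface.
        return self.peripherals[name]()
```

A client would thus address a sensor as a named resource of the gateway, never as a network endpoint in its own right.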

5. Federation of agency

Centralization of intent or control is not practical in environments with the density of devices and overlapping concerns exhibited by a pervasive Internet of Things.

5.1. Ownership

Device ownership is an important issue in a multi-tenant consumer environment. While some devices will be centrally managed by providers, many devices in an Internet of Things will be personally owned, and would not be managed completely by centralized services. Devices may thus be managed by:

Their owners
This applies in particular to personal consumer electronics, phones, cars, domestic appliances, etc, where users need to retain trusted ownership of their personal belongings.
A service provider
This applies to managed services, factory machinery, fleet vehicles, set-top boxes placed in the home, power controllers, etc, where users do not need to interact with the devices on a management level, but there is an advantage to placing a device as a local presence in a smart environment.

5.2. Tenancy and separation of concerns

Federation of intent, multi-tenancy, and diversity all point to the need for Special Interest Groups (SIG) or work groups. For this, we introduce the notion of workspaces: places that are set aside for a particular purpose, and that act as umbrellas for special interest groups.

Federation can be along a number of lines:

See sections below for further information.

5.3. Proximity of services to Things

Although devices will be separate from the agencies processing their sensory data, and feeding their guidance systems (policies and renderers), it is impractical to transport data over long distances between leaf devices and `cloud' services. The logical outcome is therefore a decentralization of the cloud itself to insert converged resources close to the data sources themselves. To scale such a distribution, the data services will naturally associate with private workspaces, which bound the scope of data generated by Things.

6. Workspaces

Workspaces may be thought of as a modernization of the domain concept. Domains are typically linked to directory services (DNS, Active Directory, LDAP etc). The demands of multi-tenant environments, where shared resources and separate business-processes mix and compete, make these older services less than optimal, though not inherently flawed.

Workspaces are related to the more familiar notion of namespaces in information technology; however, namespaces refer only to a priority in name-referencing of objects, without underlying resource segmentation. Workspaces MUST support multi-tenant separation of concerns within a hosted space. Today, workspace facilities are commonly offered by user logins on computing devices, and quasi-workspace-like facilities are offered by virtual private networks, and VLANs, etc, in networking.

For a collaborative Internet of Things, where interests span many issues from manufacturer interests, to personal ownership, functional responsibility, and security, the technologies for inter-group collaboration must be modernized to support logical, authenticated segmentation, shared directory information, as well as private naming, across converged resources: compute, network, and storage.

  1. Workspaces may or may not be private, but they must be self-contained and separable, in the manner of namespaces.
  2. Workspaces may or may not be associated with multiple tenants; but they are associated with multiple issues.
  3. They represent a context for human activity, or separation of concerns, e.g. human contexts that might be modelled as workspaces include: the home, a children's playground, a squash court, an office, a shop, a factory floor, building, district, city, emergency channel frequency, hot and cold water pipes, dining room, drinks cabinet, etc.
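
The difference between a namespace and a workspace can be sketched as follows: a namespace only scopes names, whereas a workspace also enforces a membership boundary around them. This is a minimal sketch under our own assumptions; the Workspace class, its fields, and the example addresses (taken from the IPv6 documentation prefix) are hypothetical, and authentication of members is omitted.

```python
class Workspace:
    """Hypothetical workspace: private names plus a membership boundary."""

    def __init__(self, name: str):
        self.name = name
        self.members = set()   # agents admitted to this workspace
        self.names = {}        # private naming, scoped to this workspace

    def resolve(self, member: str, key: str):
        # Names are visible only to members: separation of concerns,
        # not merely separation of names.
        if member not in self.members:
            raise PermissionError(f"{member} is not a member of {self.name}")
        return self.names[key]
```

Two workspaces may both define a name such as "printer" without conflict, and a member of one cannot resolve names in the other.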

Ubiquitous computing (the Internet of Things) is all about how networked devices support a wider variety of workspaces. As the density of device resources (compute, storage, sensors, actuators) in a workplace or home environment increases, isolation of regions, and mapping of resources to responsible or interested parties become more difficult problems, both to implement and to understand.

A detailed description of workspaces will be given separately [WORKSPC].

7. Generic Promise-Oriented Architecture

A promise-oriented architecture is described implicitly in [DSOM2005] and [Bergstra1]. It lays out a generic `bottom up' management concept, in which devices each have the responsibility for their own state and roles. It resembles Service Oriented Architecture (SOA) superficially, without reference to specific technologies, implementations or protocols, and relates to the modern notion of microservices [MicroS].

By formulating architecture from the bottom up, one can easily account for multi-contextual concerns, from developer concerns about realtime software updates (Continuous Delivery and DevOps etc), to operational service scaling, governance, and security, in a way that top-down schemes cannot easily achieve.

7.1. Control

A promise-oriented architecture communicates (e.g. intent and data) by authenticated publish-subscribe (aka "pull") methods, for security and predictability. Thing devices MUST NOT accept control commands imposed upon them by "push" methods, as this exposes a security risk and may lead to inconclusive results if there are uncoordinated pushes. In the vernacular usage of "control plane" and "data plane", control is asserted through agreed service level policies, and data are exchanged within services to carry out functions.

Every standalone device operates autonomously, with direct policy input from its owner, without being managed from an external collective. Similarly, any standalone device can give up that autonomy to a trusted manager, offering policy updates as a service.
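
The pull-only rule can be sketched as an agent that accepts a message only if it answers a request the agent itself issued. This is an illustration under our own assumptions, not a protocol definition: the class PullOnlyAgent, its methods, and the request identifiers are hypothetical, and authentication of the source is omitted for brevity.

```python
class PullOnlyAgent:
    """Hypothetical agent that accepts data only in reply to its own requests."""

    def __init__(self):
        self.pending = set()   # (source, request_id) pairs this agent has issued
        self.accepted = []     # payloads received by invitation

    def request(self, source: str, request_id: str) -> None:
        # Record the invitation; a real agent would also send the request.
        self.pending.add((source, request_id))

    def receive(self, source: str, request_id: str, payload) -> bool:
        if (source, request_id) not in self.pending:
            return False       # uninvited push: dropped without processing
        self.pending.discard((source, request_id))
        self.accepted.append(payload)
        return True
```

An imposed command is thus discarded before its content is examined, while a reply to an outstanding request is accepted exactly once.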

7.2. Services

All devices provide services in varying degrees of sophistication. Peripheral devices serve data or actuators to host devices, and standalone devices expose functions to one another as software services. Each server plays a role to be composed into the wider system.

Services may be used both for basic infrastructure support, and for driving user applications. No limitations need be stated about applications. Each fully functional, standalone device is free to host any application services. The result is superficially similar to the Service Oriented Architecture [SOA], but without reference to a specific technology or methodology. In modern parlance, the model is an example of microservices [MicroS].

Data services are also best implemented with pull methods, for resource-light scalability and security, but extremely limited application devices might initially struggle to support this mode.

7.3. Promises

The basic atom of bottom-up policy is a promise. Each promise consists of three things:

A `promiser'
i.e. a resource that will effect a change by keeping its promise to the system, e.g. a file, a process, a transaction, a measurement, device settings, etc.
A description body
i.e. the desired-outcome that is achieved when the promise is kept. This SHOULD be implemented in a convergent, idempotent manner [CFENGINE], [CONVERGE].
A context
in which the promise applies, based on time, location, type and group membership of the devices referred to in the model.
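
For illustration only, the three parts of a promise can be modelled as a small record. The names Promise, promiser, body, context, and applies_in are hypothetical and not defined by this draft; we assume here that a context is a set of labels, all of which must hold for the promise to apply.

```python
from dataclasses import dataclass


@dataclass
class Promise:
    """Hypothetical record for the three parts of a promise."""
    promiser: str                      # the resource that keeps the promise
    body: dict                         # the desired outcome, attribute -> value
    context: frozenset = frozenset()   # labels that must all hold

    def applies_in(self, labels: set) -> bool:
        # An empty context is a subset of any label set,
        # so such a promise applies everywhere.
        return self.context <= labels
```

For example, a promise about a configuration file with context {"linux", "office"} applies on a device labelled {"linux", "office", "night"}, but not on one labelled only {"linux"}.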

7.4. Agents and their promises

In a promise architecture, every device is contextually evaluated and integrated from the bottom up, according to the promises it keeps, e.g. the services it provides, its behaviours and properties, etc. Thus every device is modelled by its individual degree of agency to act as a proxy for human intent (policy).

Standalone devices are assumed to be equipped with policy-keeping software agents. Peripheral devices, such as sensors or actuators, are assumed to be integral parts of the standalone devices, and hence maintainable by their software agents.

No system must push changes or data to such agents ad hoc, without a documented promise to accept; thereafter, `fault tolerance' demands that we reject the word `must' from most descriptions, and replace it with `promise of best effort', as to rely on perfect behaviour leads to brittle systems with unrealistic expectations. For human safety in a rapidly expanding sphere of human involvement, the only `must' is for each agent to be stable and self-correcting, subject to the guidance of policy.

7.5. Standard promises

The following characteristics describe the cooperation between agents:

  1. Standalone devices promise to bootstrap to some trusted bootservice, i.e. register to one or more workspaces.
  2. Standalone devices promise to refuse direct commands imposed from network peers (as mentioned above).
  3. Policy consists of a collection of promises that apply in labelled contexts, each of which describes a unique desired end-state.
  4. Promises are kept in a convergent manner, so that all promise-keeping actions lead to the desired end-state, no matter what the initial state of the device.
  5. Agents that live on every device have drivers/renderers and make all changes without remote communication.
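
Convergent promise-keeping, as described in the list above, can be sketched as a repair function that acts only on the parts of the state that deviate from the desired end-state. This is a minimal sketch under our own assumptions: device state and desired outcome are both represented as plain dictionaries, and the function name keep_promise is hypothetical.

```python
def keep_promise(state: dict, desired: dict) -> list:
    """Repair `state` in place to match `desired`; return the keys repaired."""
    repaired = []
    for key, value in desired.items():
        # Act only where the promise is not yet kept, so that repeated
        # runs from any starting state end in the same place.
        if key not in state or state[key] != value:
            state[key] = value
            repaired.append(key)
    return repaired
```

Running the function a second time on an already compliant state performs no changes and returns an empty list, which is what makes the operation idempotent.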

7.6. Contextual policy-based adaptation

Each policy agent promises to maintain a context evaluator that computes a set of classifying `tags' or `labels' that characterize the state of the agent. This is updated every time the agent verifies policy, as its state may change as a result of repairs. These may be used as conditionals for distributed policy-based decision-making.

Contextual labels characterize the device, its environment, and its location and time. The labels can then be used in policy to make certain promises apply only in specific contexts.

When promises, within a policy, are tagged by issue or context, each agent can select those that apply to its condition, within a larger trust relationship implied by policy sourcing. This simplifies logic and promotes stability, as evidenced by experience with software agents [CFENGINE].
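
A context evaluator of this kind might look as follows. This is a sketch under stated assumptions: the device facts (type, location, groups), the `day' and `night' labels with their hour boundaries, and both function names are hypothetical choices made for the example, and policy is reduced to a list of (name, context labels) pairs.

```python
from datetime import datetime


def evaluate_context(device: dict, now: datetime) -> set:
    """Compute classifying labels from device facts and the current time."""
    labels = {device["type"], device["location"]}
    labels.update(device.get("groups", []))
    # Assumed boundary: 22:00 to 06:00 counts as night.
    labels.add("night" if now.hour < 6 or now.hour >= 22 else "day")
    return labels


def select_promises(policy: list, labels: set) -> list:
    """Return the names of promises whose context labels all hold."""
    return [name for name, context in policy if set(context) <= labels]
```

A thermostat in a kitchen at 23:00 would then select promises tagged {"thermostat", "night"} and untagged promises, but not those tagged {"day"}.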

7.7. Workspace maintenance

The following characteristics describe compatible policy update processes:

  1. Devices subscribe to policy from a trusted source, download changes to the policy model when they can, and cache it locally so that it is always available.
  2. Local agents implement cached policy, without any dependence on remote communication, and in a fault tolerant fashion. The failure to keep one promise should have minimal impact on the ability to keep others.
  3. By verifying promises continuously, the agent that runs on each standalone device will know (or be able to calculate) its operational context, and can decide which promises are needed from the policy model, and whether or not to keep the promises. This scales O(1), i.e. without bottleneck.
  4. Each promise that documents an intended outcome of the system is verified and measured in the process, providing immediate and statistical feedback to policy designers about the success of the policy in describing a stable desired outcome.
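
The subscribe-and-cache behaviour in the list above can be sketched as a refresh step that tolerates loss of connectivity. The function name refresh_policy, the `fetch' callable, and the integer `version' field are assumptions made for this example; verification of the source's identity is omitted.

```python
def refresh_policy(fetch, cache: dict) -> dict:
    """Pull policy from a trusted source; keep the cached copy if unreachable."""
    try:
        update = fetch()   # hypothetical callable returning a policy dict
    except OSError:
        # Loss of connectivity is expected in a partially connected
        # environment: carry on with the cached policy.
        return cache
    if update.get("version", 0) > cache.get("version", 0):
        cache.clear()
        cache.update(update)
    return cache
```

Because the agent always acts on the cached copy, a failed download delays a policy change but never prevents existing promises from being kept.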

7.8. Change of policy (system intent)

Policy change can be initiated from within a workspace, subject to a defined quality assurance, or fit-for-purpose review. Thus change of infrastructure may be instigated from the bottom-up also, as a self-service request.

  1. Human operators (owners or managers) decide on a policy model for all devices in an organization or policy group. This may be informed by the feedback about the success rate of previously kept promises.
  2. The changes are edited into a model, which consists of a collection of promises that should be kept by all resources on all devices.
  3. Changes are checked and tested before publishing.
  4. Once changes are approved, they are published by a policy service for download at the convenience of the standalone device.

7.9. Separation of concerns versus timescales

Infrastructure stability is supported by a separation of systems into agencies that act in alignment with specific, separable timescales. Separation of fast and slow timescales avoids tight coupling and associated complex behaviours and should be considered a priority for maintaining safe, stable systems for human dependence.

Systems scale along two broad lines, which a promise-oriented architecture helps to resolve:

Dynamical scaling
Workload timescales concern the quantitative activity of the system: how fast requests are handled, how quickly service is delivered, and promises are kept.
Semantic (functional) scaling
Semantics are normally the concern of software engineers and system designers. This facilitates functional understanding. It is a form of human interface or knowledge management. It is sometimes at odds with the needs of dynamical scaling.

Changes to semantics should generally be slow compared to the workload related dynamical activity, in order to maintain functional stability. Cooperative design of workspaces may observe this principle to foster functional stability and workload efficiency.

7.10. Device roles per workspace or region

A number of functional roles are required to maintain a service lifecycle in a distributed environment. Making these roles self-managed within each workspace is how one scales the diversity of human intent and concerns. Roles are defined by the kinds of promises kept by devices:

Bootstrap server
To provide trusted need-to-know data and local contacts so that clients can begin working within a policy domain.
Bootstrap client
To accept essential directory information on trust in order to join a local policy domain.
Policy server
To deliver current policy from an authorized source, appropriate for each client (tenancy terms) from its global perspective.
Policy client
To subscribe to the policy, selectively, depending on context from its local perspective.
Data server
(aka `Thing') To offer a catalogue of data streams to different tenants. This includes sensors and actuators.
Data client
To subscribe to data streams, selectively, depending on context from its local perspective.
Identity server
A Manufacturer Usage Description service is promised by all Things, providing a URI that points to a description of the device, its serial number, characteristics, service details, etc.
Identity client
Identity clients promise to make use of data schemas and encodings involved in the interpretation of data pertaining to the device.

      
                         "Control data"                            "Application data"
 +--------------------------------------------------------------+
 |+------------------+ +------------------+ +------------------+| +-----------------+
 || Bootstrap server | | Policy server    | | Directory server || |  Data client(s) |
 |+------------------+ +------------------+ +------------------+| +-----------------+
 +--------|---------------------|----------------------|--------+          |
          |                     |                      |                   |
          +----------------+    |                      |                   |
                           |    |                      |                   |
     +------------------+  |    |                      |                   |
     | FFD / Standalone |  |    |                      |                   |
     |  Bootstrap client|--+    |                      |                   |
     |  Policy client   |-------+                      |                   |
     |  Directory server|------------------------------+                   |
     |  Data client     |--------------------------------------------------+
     +------------------+

        "Thing(s)"


      
    

The roles in each collaborative workspace. Devices at the bottom of the figure typically coordinate through workspace services hosted in the "cloud" or any nearby compute resource. Efficiency suggests avoiding long data paths, instead moving computational resources closer to data collection points.

Figure 2

Bootstrapping new devices into a workspace represents the beginning of a device lifecycle. Devices must begin with the location of a known bootstrap server. Devices must also promise to advertise their nature and capabilities, called `identification'. This may include Manufacturer Usage Description (MUD) identifiers [MUD].

7.11. Connectivity and Network Policy

So far, much has been said on how the application devices provide services via promises, and how system intent can be described and orchestrated via policy. There is also a connectivity (transport) fabric for these devices that operates on a set of promises that underlie the described service framework, i.e. the network. Each network endpoint can be seen as providing its own set of promises that are used by other network elements to deliver routing and switching capabilities [PromiseNet].

Intent driven networking is becoming more relevant as Software Defined Networking (SDN) deployments proliferate. In the described IoT architecture, service policies that describe the IoT system intent can be used as an input to derive partial network policies (e.g. Group Based Policy or some other model-based approach), with modulation by other data discovered from bootstrapping, etc. The figure below illustrates the relationship between the service and network layer policies for IoT.

      
                +--------------------+
                | IoT Service Policy |
                +--------------------+
                          |
  +---------------------+ | +--------------------+
  | Topology / Location | | |   Orchestration    |
  |                     +-+-+
  |   Bootstrap data    | | | Organization policy|
  +---------------------+ | +--------------------+
                          |
                         \|/
                          v
              +--------------------+
              |   IoT SDN policy   |
              +--------------------+


      
    

Service policy could be partially rendered as an SDN baseline for simplifying dependency management.

Figure 3
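
One way to picture the partial rendering of service policy into network policy is as a translation from promised subscriptions into permitted flows, using addresses learned at bootstrap. The sketch below is an assumption-laden illustration, not a description of Group Based Policy or of any SDN controller interface: the function name, the (client, server, port) tuples, and the address table are all hypothetical.

```python
def derive_network_policy(service_policy: list, addresses: dict) -> set:
    """Translate promised service subscriptions into permitted network flows."""
    allowed = set()
    for client, server, port in service_policy:
        # Agents that have not yet bootstrapped have no known address,
        # so no flow is permitted for them.
        if client in addresses and server in addresses:
            allowed.add((addresses[client], addresses[server], port))
    return allowed
```

Any flow absent from the resulting set would be denied by default, so the network baseline follows from the service promises rather than being maintained separately.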

8. Characteristics

The architecture described in this draft enables densely clustered IT resources to form arbitrary self-service communities that span local or wide area networks. This decouples a logical patchwork of segments on top of a plain end-to-end IP network. By basing this on principles of fault tolerance, including publish-subscribe dissemination semantics, it may be scaled, without bottleneck, using only the well-known methods currently employed by the World Wide Web.

IPv6 and successors will play a key role in recapturing network simplicity from the many workarounds that have been stacked on top of IPv4 and its limitations. However, currently missing are adequate directory services to support a transparent workgroup concept. The present Internet architecture is still geared principally towards a crudely shared single-tenant, top-down management model, with authority at the top. Top down methods require the leaf domains to be exposed to attack from high up in the network. However, by shrink-wrapping workspace boundaries more closely around their private resources, management could be simplified and speeded up, and could become less exposed.

9. Summary and Outlook

The principles laid out in this draft address key issues of scalability, fault tolerance, separation of concerns, and federation of intent within networked information systems. The platform is a synthesis of well-known techniques, and is deliberately aligned with the needs of agile commercial spaces, as well as large industrial distributions, and small domestic needs. We purposely leave open vendor specific concerns, which can easily fit into the described architecture, on top of this common set of principles.

10. Acknowledgments

We are grateful for helpful conversations with K. Burns, M. Dvorkin, D. Maluf, and E. Lear.

11. Security Considerations

With a pervasive contact surface onto both the Internet and the real world, security is obviously a major concern. Experience with pervasive frameworks like [CFENGINE], as well as theoretical studies of pull-based architectures, suggest that the promise-oriented pull-only architecture can reduce the exposure to denial of service attacks and data-based overflow attacks, by rejecting all external data sent without invitation. Moreover, the tie-in between service and network policy reduces the likelihood of errors in policy across the layers.

Workspaces can play a role here too, as a shrink-wrapping of service scope around a minimal set of endpoints, thus reducing the logical contact surface for data communications, and publishing information purely on a need-to-know basis. We take it for granted that workspace data are encrypted with workspace authorized credentials.

12. Normative References

[Bergstra1] Bergstra, J. and M. Burgess, "Promise Theory: Principles and Applications", 2013.
[CFENGINE] Burgess, M., "A site configuration engine", Computing Systems, 1995.
[CONVERGE] Burgess, M., "Configurable immunity model of evolving configuration management", Science of Computer Programming, 2004.
[DSOM2005] Burgess, M., "An Approach to Understanding Policy Based on Autonomy and Voluntary Cooperation", Lecture Notes in Computer Science, 2005.
[MicroS] Richardson, C., "Pattern: Microservices Architecture", 2014.
[MUD] Lear, E., "Manufacturer Usage Description", 2015.
[OneM2M] OneM2M, "Standards for M2M and the Internet of Things", 2015.
[PromiseNet] Borrill, P., Burgess, M., Craw, T. and M. Dvorkin, "A Promise Theory of Networking", 2014.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.
[SOA] Open Group, "SOA Reference Architecture Technical Standard : Basic Concepts", 2016.
[WORKSPC] Burgess, M., Dvorkin, M. and K. Burns, "Self-Service Workspaces for Federated IT Infrastructure", 2016.

Authors' Addresses

Mark Burgess Independent Researcher Oslo, Norway
Herb Wildfeuer Cisco Systems San Jose, USA EMail: hwildfeu@cisco.com