Thing-to-Thing Research Group | M. Burgess |
Internet-Draft | Independent Researcher |
Intended status: Informational | H. Wildfeuer |
Expires: April 21, 2016 | Cisco Systems |
October 19, 2015 |
Federated Multi-Tenant Service Architecture for an Internet of Things
draft-burgess-promise-iot-arch-00
This draft describes architectural recommendations for an Internet of Things scenario, based on tried and tested principles from infrastructure science. We describe a functional service architecture that may be applied in the manner of a platform, from the smallest scale to the largest scale, using vendor agnostic principles. The current draft is rooted in the principles of Promise Theory[Bergstra1] and voluntary cooperation.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 21, 2016.
Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
The scenario we call the Internet of Things (IoT) is an inflection point in the development of information local and global infrastructure. The facilitation of a platform for the next generation of global commerce presents a challenge of both technological and human dimensions. This is a challenge that spans every layer of the software and networking stacks, but can be described in general terms without the need to specific implementations. That is our goal in this draft. Only a few new ideas are needed to synthesize this infrastructure, however several old technology practices must be deprecated for scaling and security considerations.
A platform for society must be vendor agnostic at its root, and must leave ample space for vendor specific creativity on top. What distinguishes IoT from past scenarios is the prolific contact surface it will expose to the physical world, embedding devices pervasively in our close environments, and touching every part of human life. At the time of writing, IoT has barely begun to emerge in domestic and industrial settings; however, choices we make now could help or hinder the development of an adequate platform over the coming decades. The proposed architecture not only scales up to large numbers, it also scales down to small devices of low capability; from the largest installations to the smallest, and from the tiniest amounts of data, to vast data-stores collected by scientific computing at the limits of possibility.
The term "PROMISE", "PROMISES" in this document are to be interpreted as described in Promise Theory [Bergstra1]
When used, the key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].
All devices are assumed to live in a partially connected environment. They MUST be fault tolerant to loss of communications, both with other agents in the course of providing application services, and with trusted sources of information. A minimum interdependency design may be recommended to facilitate this.
For a nascent Internet of Things, the focus is naturally drawn to the specialized leaf devices, where data may be produced or consumed. However, these are only half the picture. `Thing' devices, by design, also communicate with online services deployed `higher up', or `Northbound' in the system, to offload analysis and decision-making. Their physical capabilites thus place them into two broad categories:
+------------------+ | FFD / Standalone |--> IP Endpoint +------------------+ +------------------+ | RFD / Peripheral |--+ +------------------+ | +------------------+ +------| FFD / Standalone |--> IP Endpoint +------------------+ | +------------------+ | RFD / Peripheral |--+ +------------------+
Devices may be standalone (FFD), with service interfaces, or hosted peripherals (RFD), where data are exposed through service interfaces from other buses, e.g. USB, CANbus, MODbus, Profibus, etc.
Figure 1
Standalone devices are full stack devices that provide data oriented services to data clients
Stand-alone devices and transducers can vary considerably in their processing, memory and connectivity resources and constraints. This architecture assumes a minimum resource level at the stand-alone device, but the device must support `full stack' implementations. In practice, this implies that they contain an embedded OS (e.g. Linux), and are capable of running an agent providing secure service and connectivity interfaces.
Centralization of intent or control is not practical in environments with the density of devices and overlapping concerns exhibited by a pervasive Internet of Things.
Device ownership is an important issue in a multi-tenant consumer environment. While some devices will be centrally managed by providers, many devices in an Internet of Things will be personally owned, and would not be managed completely by centralized services. Devices may thus be managed by:
Federation of intent, aka multi-tenancy or diversity, all point to the need for Special Interest Groups (SIG) or work groups. Workspaces are places that are set aside for a particular purpose, that act as umbrellas for special interest groups. For this, we introduce the notion of workspaces.
Federation can be along a number of lines:
See sections below for further information.
Although devices will be separate from the agencies processing their sensory data, and feeding their guidance systems (policies and renderers), it is impractical to transport data over long distances between leaf devices and `cloud' services. The logical outcome is therefore a decentralization of the cloud itself to insert converged resources close to the data sources themselves. To scale such a distribution, the data services will naturally associate with private workspaces, which bound the scope of data generated by Things.
Workspaces may be thought of as a modernization of the domain concept. Domains are typically linked to directory services (DNS, Active Directory, LDAP etc). The demands of multi-tenant environments, where shared resources and separate business-processes mix and compete, make these older services less than optimal, though not inherently flawed.
Workspaces are related to the more familiar notion of namespaces in information technology; however, namespaces refer only to a priority in name-referencing of objects, without underlying resource segmentation. Workspaces MUST support multi-tenant separation of concerns within a hosted space. Today, workspace facilities are commonly offered by user logins on computing devices, and quasi-workspace-like facilities are offered by virtual private networks, and VLANs, etc, in networking.
For a collaborative Internet of Things, where interests span many issues from manufacturer interests, to personal ownership, functional responsibility, and security, the technologies for inter-group collaboration must be modernized to support logical, authenticated segmentation, shared directory information, as well as private naming, across converged resources: compute, network, and storage.
Ubiquitous computing (the Internet of Things) is all about how networked devices support a wider variety of workspaces. As the density of device resources (compute, storage, sensors, actuators) in a workplace or home environment increases, isolation of regions, and mapping of resources to responsible or interested parties become more difficult problems, both to implement and to understand.
A detailed description of workspaces will be given separately [WORKSPC].
A promise-oriented architecture is described implicitly in [DSOM2005] and [Bergstra1]. It lays out a generic `bottom up' management concept, in which devices each have the responsibility for their own state and roles. It resembles Service Oriented Architecture (SOA) superficially, without reference to specific technologies, implementations or protocols, and relates to the modern notion of microservices [MicroS]
By formulating architecture from the bottom up, one can easily account for multi-contextual concerns, from developer concerns about realtime software updates (Continuous Delivery and DevOps etc), to operational service scaling, governance, and security, in a way that top-down schemes cannot easily achieve.
A promise-oriented architecture communicates (e.g. intent and data) by authenticated publish-subscribe (aka "pull") methods, for security and predictability. Thing devices MUST not accept control commands imposed upon them by "push" methods, as this exposes a security risk and may lead to inconclusive results if there are uncoordinated pushes. In the vernacular usage of "control plane" and "data plane", control is asserted through agreed service level policies, and data are exchanged within services to carry out functions.
Every standalone device operates autonomously, with direct policy input from its owner, without being managed from an external collective. Similarly, any standalone device can give up that autonomy to a trusted manager, offering policy updates as a service.
All devices provide services in varying degrees of sophistication. Peripheral devices serve data or actuators to host devices, and standalone devices expose functions to one another as software services. Each server plays a role to be composed into the wider system.
Services may be used both for basic infrastructure support, and for driving user applications. No limitations need be stated about applications. Each fully functional, standalone device is free to host any application services. The result is superficially similar to the Service Oriented Architecture [SOA], but without reference to a specific technology or methodology. In modern parlance, the model is an example of microservices [MicroS].
Data services are also best implemented as with pull methods, for resource-light scalability and security, but extremely limited application devices might initially struggle to support this mode.
The basic atom of bottom-up policy is a promise. Each promise consists of three things:
In a promise architecture, every device is contextually evaluated and integrated from the bottom up, according to the promises is keeps, e.g. the services it provides, its behaviours and properties, etc. Thus every device is modelled by its individual degree of agency to act as a proxy for human intent (policy).
Standalone devices are assumed to be equipped with policy-keeping software agents. Peripheral devices, such as sensors or actuators, are assumed to be integral parts of the standalone devices, and hence maintainable by the their software agents.
No system must push changes or data to such agents ad hoc, without a documented promise to accept; thereafter, `fault tolerance' demands that we reject the word `must' from most descriptions, and replace it with `promise of best effort', as to reply on perfect behaviour leads to brittle systems with unrealistic expectations. For human safety in a rapidly expanding sphere of human involvement, the only `must' is for each agent to be stable and self-correcting, subject to the guidance of policy.
The following characteristics describe the cooperation between agents:
Each policy agent promises to maintain a context evaluator that computes a set of classifying `tags' or `labels' that characterize the state of the agent. This is updated every time the agent verifies policy, as its state may change as a result of repairs. These may be used as conditionals for distributed policy-based decision-making.
Contextual labels characterize the device, its environment, and its location and time. The labels can then be used in policy to make certain promises apply only in specific contexts.
When promises, within a policy, are tagged by issue or context, agents can select those that apply to its condition, within a larger trust relationship implied by policy sourcing. This simplifies logic and promotes stability, as evidenced by experience with software agents [CFENGINE].
The following characteristics describe compatible policy update processes:
Policy change can be initiated from within a workspace, subject to a defined quality assurance, or fit-for-purpose review. Thus change of infrastructure may be instigated from the bottom-up also, as a self-service request.
infrastructure stability is supported by a separation of systems into agencies that act in alignment with specific, separable timescales. Separation of fast and slow timescales avoids tight coupling and associated complex behaviours and should be considered a priority for maintaining safe, stable systems for human dependence.
Systems scale along two broad lines, which a promise-oriented architecture helps to resolve:
Changes to semantics should generally be slow compared to the workload related dynamical activity, in order to maintain functional stability. Cooperative design of workspaces may observe this principle to foster functional stability and workload efficiency.
A number of functional roles are required to maintain a service lifecycle in a distributed environment. Making these roles self-managed within each workspace is how one scales the diversity of human intent and concerns. Roles are defined by the kinds of promises kept by devices:
"Control data" "Application data" +--------------------------------------------------------------+ |+------------------+ +------------------+ +----------------- +| +-----------------+ || Bootstrap server | | Policy server | | Directory server || | Data client(s) | |+------------------+ +------------------+ +----------------- +| +-----------------+ +--------|---------------------|----------------------|--------+ | | | | | +----------------+ | | | | | | | +------------------+ | | | | | FFD / Standalone | | | | | | Bootstrap client|--+ | | | | Policy client |-------+ | | | Directory server|------------------------------+ | | Data client |--------------------------------------------------+ +------------------+ "Thing(s)"
The roles in each collaborative workspace. Devices at the bottom of the figure typically coordinate through workspace services hosted in the "cloud" or any nearby compute resource. Efficiency suggests avoiding long data paths, instead moving computational resources closer to data collection points.
Figure 2
Bootstrapping new devices into a workspace represents the beginning of a device lifecycle. Devices must begin with the location of a known bootstrap server. Devices must also promise to advertise their nature and capabilities, called `identification'. This may include Manufacturer Usage Description (MUD) identifiers [MUD].
So far, much as been said on how the application devices provide services via promises, and how system intent can be described and orchestrated via policy. There is also a connectivity (transport) fabric for these devices that operates on a set of promises that underly the described service framework, i.e. the network. Each network endpoint can be seen as providing its own set of promises that are used by other network elements to deliver routing and switching capabilities [PromiseNet].
Intent driven networking is becoming more relevant as Software Defined Networking (SDN) deployments proliferate. In the described IoT architecture, service policies that describe the IoT system intent can be used as an input to derive partial network policies (e.g. Group Based Policy or some other model-based approach), with modulation by other data discovered from bootstrapping, etc. The figure below illustrates the relationship between the service and network layer policies for IoT.
+--------------------+ | IoT Service Policy | +--------------------+ | +---------------------+ | +--------------------+ | Topology / Location | | | Orchestration | | +-+-+ | Bootstrap data | | | Organization policy| +---------------------+ | +--------------------+ | \|/ v +--------------------+ | IoT SDN policy | +--------------------+
Service policy could be partially rendered as an SDN baseline for simplifying dependency management.
Figure 3
The architecture, described in this draft, enables densely clustered IT resources to form arbitrary self-service communities that span local or wide area networks. This is decouples a logical patchwork of segments on top of a plain end-to-end IP network. By basing on principles of fault-tolerance, including publish-subscribe dissemination semantics, this may be scaled, without bottleneck, by only the well-known methods currently employed by the World Wide Web.
IPv6 and successors will play a key role in recapturing network simplicity from the many workarounds that have been stacked on top of IPv4 and its limitations. However, currently missing are adequate directory services to support a transparent workgroup concept. The present Internet architecture is still geared principally towards a crudely shared single-tenant, top-down management model, with authority at the top. Top down methods require the leaf domains to be exposed to attack from high up in the network. However, shrink-wrapping workspace boundaries closer around their private resources, their management could be simplified, speeded up, and become less exposed.
The issues discussed and laid out in this draft address key issues of scalability, fault tolerance, separation of concerns, and federation of intent within networked information systems. The platform is a synthesis of well-known techniques, and is deliberately aligned with the needs of agile commercial spaces, as well as large industrial distributions, and small domestic needs. We purposely leave open vendor specific concerns, which can easily fit into the described architecture, on top of this common set of principles.
We are grateful for helpful conversations with K. Burns, M. Dvorkin, D. Maluf, and E. Lear.
With a pervasive contact surface onto both the Internet and the real world, security is obvious a major concern. Experience with pervasive frameworks like [CFENGINE], as well as theoretical studies of pull-based architectures, suggest that the promise-oriented pull-only architecture can reduce the exposure to denial of service attacks and data-based overflow attacks, by rejecting all external data sent without invitation. Moreover, the tie-in between service and network policy reduces the likelihood of errors in policy across the layers.
Workspaces can play a role too here, as a shrink-wrapping of service scope around minimal set of endpoints, thus reducing the logical contact surface for data communications, and publishing information purely on a need-to-know basis. We take is for granted that workspace data are encrypted with workspace authorized credentials.