Thing-to-Thing Research Group                                 M. Burgess
Internet-Draft                                    Independent Researcher
Intended status: Informational                              H. Wildfeuer
Expires: October 22, 2016                                  Cisco Systems
                                                          April 20, 2016
Federated Multi-Tenant Service Architecture for an Internet of Things
draft-burgess-promise-iot-arch-01
This draft describes architectural recommendations for a unified concept of Cloud Computing and Internet of Things, based on tried and tested principles from infrastructure science. We describe a functional service architecture that may be applied in the manner of a platform, from the smallest scale to the largest scale, using vendor-agnostic principles. The current draft is rooted in the principles of Promise Theory [Bergstra1] and voluntary cooperation.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on October 22, 2016.
Copyright (c) 2016 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
The scenario we call the Internet of Things (IoT) is an inflection point in the development of local and global information infrastructure. We know cloud computing as a commoditization of primary infrastructure resources (also `things') for flexible datacentre hosting. The facilitation of a common platform for the next generation of global commerce presents a challenge of both technological and human dimensions. Not only do we have to solve the matter of technology at scale, we must also solve the matter of human dignity and participation. This is a challenge that spans every layer of the software and networking stacks, yet can be described in general terms without the need for specific implementations. That is our goal in this (revised) draft. Only a few new ideas are needed to synthesize this infrastructure; however, several old technology practices must be deprecated for scaling and security considerations.
A platform for society as a whole must be vendor agnostic at its root, and must leave ample space for vendor specific creativity on top. What distinguishes IoT from past scenarios is the prolific contact surface it will expose to the physical world, embedding devices pervasively in our close environments, and touching every part of human life. At the time of writing, IoT has barely begun to emerge in domestic and industrial settings; however, choices we make now could help or hinder the development of an adequate platform over the coming decades. The proposed architecture not only scales up to large numbers, it also scales down to small devices of low capability; from the largest installations to the smallest, and from the tiniest amounts of data, to vast data-stores collected by scientific computing at the limits of possibility.
The terms "PROMISE" and "PROMISES" in this document are to be interpreted as described in Promise Theory [Bergstra1].
When used, the key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].
A platform that brings computing closer to users, away from specialized datacentres, must be based on plausible assumptions. We assume that devices live in a partially connected environment, of limited reliability: they MUST be fault tolerant to loss of communications, both with other devices, in the course of providing application services, and with trusted sources of information. Keeping interdependency between devices to a minimum is recommended to facilitate this.
For a nascent Internet of Things, our focus is naturally drawn to the specialized leaf devices, where data may be produced or consumed. It will take many years to commoditize these sensors and actuators, and their local communication architectures. However, these are only half the picture. `Thing' devices, by design, also communicate with online services deployed `higher up', or `Northbound' in the system, to offload analysis and decision-making. Their physical capabilities thus place them into two broad categories:
+------------------+
| FFD / Standalone |--> IP Endpoint
+------------------+

+------------------+
| RFD / Peripheral |--+
+------------------+  |      +------------------+
                      +------| FFD / Standalone |--> IP or L3 endpoint
+------------------+  |      +------------------+
| RFD / Peripheral |--+
+------------------+
Devices may be standalone (FFD), with service interfaces, or hosted peripherals (RFD), where data are exposed through service interfaces from other buses, e.g. USB, CANbus, MODbus, Profibus, etc.
Figure 1
Standalone devices are full-stack devices that provide data-oriented services to data clients.
Stand-alone devices and transducers can vary considerably in their processing, memory, and connectivity constraints. This architecture assumes a minimum resource level at the stand-alone device, but the device must support `full stack' implementations. In practice, this implies that they contain an embedded OS (e.g. Linux), and are capable of running an agent providing secure service and connectivity interfaces.
Centralization of intent is a natural form of coordination. However, centralization of command and control (e.g. by API) is not practical in environments where the density of devices and overlapping concerns reaches the level of a pervasive Internet of Things. Legacy technology is pervasively centralized and top-down in nature: requests for domain names and name services, IP address assignment, change management of information records, cloud controllers, service entry points, etc. The barriers to scalable autonomous activity are high. The trend has been to delegate these activities to sub-authorities (multiple queues) to limit scaling bottlenecks, which improves queue latency but not minimum transaction time. In the future, resource management can profit by propagating intentions and desires from the bottom-up, i.e. from the many points of service consumption, with localization to minimize queueing and flooding contention of the independent needs.
Infrastructure ownership is an important issue in a multi-tenant consumer environment. While some devices can be centrally managed by providers (regardless of owner), many devices in an Internet of Things will be owned by private individuals who will permit management by centralized services. Devices may be managed by:
Federation of intent, also known as multi-tenancy or diversity, points to the need for Special Interest Groups (SIGs) or workgroups that specialize within organizations to develop expertise. Software architectures following this pattern are sometimes called microservice architectures. We shall introduce the notion of `workspaces' as a federated infrastructure abstraction designed to wrap one or more of these specialized services under an umbrella abstraction that is easy to understand and work with. A goal of the workspace is to expose only the working parts that need to interact with consumers, in the same way that one does not expose the inner components of a television or a car.
Federation is desirable along a number of lines:
See sections below for further information.
Although user-facing devices, deployed in the field, may be separate from the agencies processing their sensory data, or feeding them guidance (e.g. as policies), it becomes increasingly impractical to transport data over long distances between leaf devices and `cloud' services as the density of deployed devices grows. The logical outcome is therefore a decentralization of the processing cloud itself, so as to bring all necessary resources close to the field-deployed data sources themselves. To scale such a distribution, the data services will naturally associate with private workspaces, which bound the scope of data generated by Things.
Workspaces may be thought of as a modernization and generalization of the familiar network domain concept. Workspaces go beyond namespacing, to include federation, collaboration, and segmentation of services. Currently, name domains are typically linked to simple directory services (DNS, Active Directory, LDAP etc) for name-address mapping. These are assigned from some top-down agency, either within an organization or even beyond it, at a regional level. The demands of multi-tenant environments, where shared resources and separate business-processes mix and compete, make these older services less than optimal, though not inherently flawed. It is awkward to separate independent collaborative activities and then manage their interactions on a need-to-know/need-to-do basis, without involving multiple human interventions. Cloud APIs bring some improvement, by exposing arbitrary capabilities to remote operation, but the remoteness also brings risks and inefficiencies by exposing an attack surface, and from lack of situational awareness of actual state.
Workspaces are related to the more familiar notion of namespaces in information technology; however, namespaces refer mainly to priority name-referencing of objects, without necessitating underlying resource access segmentation. Workspaces MUST support multi-tenant separation of concerns within a hosted hardware resource space. Today, workspace-like facilities are commonly offered as user logins on computer operating systems or online services, and quasi-workspace-like facilities are offered by virtual private networks, and VLANs, etc, in networking. However, resource management platforms do not yet bring the same level of flexibility to infrastructure. Typically, resources are managed by top-down agencies who design and grant resource usage manually, leading to the insertion of a human processing timescale into the flow of automation.
Workspaces need to be able to promise segmentation and privacy. This involves some basic capabilities:
At the time of this revision, many of the properties of workspaces are being explored under the aegis of cloud application cluster managers. Google's Kubernetes [K8S], for example, is presently the most ambitious of these. It is plausible to imagine extending such a system to cover the workspace proposal for both cloud and IoT scenarios.
For a collaborative Internet of Things, where interests span many issues from manufacturer interests, to personal ownership, service provider concerns, functional responsibility, and security, etc, the technologies for inter-group collaboration need to be modernized to support logical segmentation, authenticated access, instrumented delegation, shared name-service information, as well as private naming, all across a converged palette of resources: compute, network, storage and sensor-actuators. This is somewhat reminiscent of, but not identical to, the goals of Named Data Networking (NDN) [NDN], which promotes the semantics of space above the details of its interconnectivity.
Ubiquitous computing (the Internet of Things) is all about how networked devices support a wider variety of workspaces than industrial scale central services. As the density of device resources (compute, storage, sensors, actuators) in a workplace or home environment increases, isolation of regions, and mapping of resources to responsible or interested parties become more difficult problems, both to implement and to understand.
A detailed description of workspaces will be given separately [WORKSPC].
The following characteristics describe compatible policy update processes:
The properties we are looking for in workspaces suggest an architecture based on the principles of promise theory; such a promise-oriented architecture is described implicitly in [DSOM2005] and [Bergstra1]. It lays out a generic `bottom up' management concept, in which devices each have the responsibility for their own state and roles. It resembles Service Oriented Architecture (SOA) superficially, without reference to specific technologies, implementations or protocols, and relates to the modern notion of microservices [MicroS].
By formulating architecture from the bottom up, one can easily account for multi-contextual concerns, from developer concerns about realtime software updates (Continuous Delivery and DevOps etc), to operational service scaling, governance, and security, in a way that top-down schemes cannot easily achieve.
The relationship between a generic promise-oriented architecture and the concept of a workspace is that the former provides a necessary and sufficient basis for implementing the latter. Workspaces are expected to be a friendly interface to the underlying promise architecture, separating interior and exterior promises cleanly and intuitively.
A promise-oriented architecture communicates (e.g. intent and data) by authenticated publish-subscribe (aka "pull") methods, for security and predictability. In a workspace, devices MUST NOT accept control commands imposed upon them by remote "push" methods, as this exposes a security risk and may lead to inconclusive results during uncoordinated pushes (multithreaded access). In the vernacular usage of "control plane" and "data plane", control is asserted through agreed service level policies, and data are exchanged within services to carry out functions.
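The pull-only rule can be illustrated with a minimal sketch in Python. It is not part of this draft or of any implementation: the names `PolicySource`, `Agent`, `pull_policy` and `receive_push` are hypothetical, and authentication is reduced to an HMAC over the policy body with a pre-shared key, purely for illustration. The agent fetches policy on its own schedule, keeps its last known good copy when the source is unreachable, and refuses anything sent to it uninvited.

```python
import hashlib
import hmac
from typing import Optional, Tuple


def sign(key: bytes, body: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for a policy body (illustrative only)."""
    return hmac.new(key, body, hashlib.sha256).hexdigest()


class PolicySource:
    """Hypothetical trusted source that publishes signed policy."""

    def __init__(self, key: bytes, body: bytes) -> None:
        self.key = key
        self.body = body
        self.reachable = True

    def fetch(self) -> Tuple[bytes, str]:
        if not self.reachable:
            raise ConnectionError("policy source unreachable")
        return self.body, sign(self.key, self.body)


class Agent:
    """Hypothetical device agent: it pulls policy and never accepts pushes."""

    def __init__(self, key: bytes, source: PolicySource) -> None:
        self.key = key
        self.source = source
        self.policy: Optional[bytes] = None  # last known good policy

    def pull_policy(self) -> bool:
        """Fetch on the agent's own schedule; keep the cached copy on failure."""
        try:
            body, tag = self.source.fetch()
        except ConnectionError:
            return False  # tolerate loss of communications
        if not hmac.compare_digest(tag, sign(self.key, body)):
            return False  # unauthenticated content is ignored
        self.policy = body
        return True

    def receive_push(self, body: bytes) -> bool:
        """Unsolicited commands are always refused."""
        return False


key = b"workspace-shared-secret"  # assumption: a pre-shared workspace key
source = PolicySource(key, b"promise: report temperature every 60s")
agent = Agent(key, source)
pulled = agent.pull_policy()
pushed = agent.receive_push(b"command: open valve")
source.reachable = False
still_cached = (not agent.pull_policy()) and agent.policy == source.body
```

In this sketch a failed pull leaves the cached policy in force, which reflects the assumption of a partially connected environment stated earlier.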
Every standalone device operates autonomously, with policy guidance from its owner, without direct external intervention. To form a workspace, any standalone device can give up that autonomy to a trusted manager, offering policy updates as a service. Workspaces separate interior and exterior promises about resources and their accessibility: they are a priori opaque from the exterior, and transparent from the interior. By joining a workspace, any device (whether a cloud server or an IoT embedded FFD) subordinates itself to a bounded policy domain with a private namespace. Policy determines whether a given member of the workspace will expose public service entry points, or will be entirely anonymous to exterior agents. This is very close to the community namespace idea currently being implemented in the Kubernetes cloud cluster manager [K8S]. One can imagine a meta-cluster of such cloud clusters, which are designed to tolerate partial network reliability, as forming a basis for the Internet of Things alongside more traditional high reliability datacentre environments. Currently, cluster managers assume top-down ownership, rather than autonomous self-management, and expose more complexity through APIs than is appropriate for average engineers.
+--------------------------+-----------------------+
| Workspaces               | Cloud clusters        |
+--------------------------+-----------------------+
| Bootstrap server         | Cluster master node   |
| Standalone device        | Application Pod       |
| Intermittent network     | Reliable network      |
| Network by directory     | Network by overlay    |
| Converged infrastructure | Siloed infrastructure |
| Bottom up                | Top down              |
| Unreliable network       | Reliable network      |
| Node members anywhere    | Nodes in datacentre   |
| Master policy source     | Master controller     |
| Need to know             | Full consistency      |
+--------------------------+-----------------------+
Rough correspondence between contemporary cloud clusters and proposed workspace concept: workspaces are principally a bottom-up, self-service collaboration over multiple clusters with more diverse hardware and software.
Figure 2
All devices provide services with varying degrees of sophistication. Peripheral devices serve data or actuators to host devices, and standalone devices expose functions to one another as software services. Each server plays a role to be composed into the wider system.
Services may be used both for basic infrastructure support, and for driving user applications. No limitations need be stated about applications. Each fully functional, standalone device is free to host any application services. The result is superficially similar to the Service Oriented Architecture [SOA], but without reference to a specific technology or methodology. In modern parlance, the model is an example of microservices [MicroS].
Data collection services are also best implemented with pull methods, for resource-light scalability and security. However, extremely limited application devices might initially struggle to support this mode of operation.
Service scaling is a task for workspaces. Public (exterior) services can be provided in a standardized manner, through accessible points of entry, whose name information is propagated publicly, analogous to a DNS directory. Workspaces can hide internals (e.g. vendor or implementation specific details, private internal services, load sharing parallelism).
Interior name services deal with the registration and propagation of information between workspace members, on a need-to-know basis. This is never visible to the exterior network.
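The separation of interior and exterior name services might be sketched as follows. This is an illustrative Python model only: the `Workspace` class, its `register` and `resolve` methods, and the example names and addresses are assumptions made for this sketch and are not defined by the draft. Members of the workspace can resolve every registered service, while exterior agents see only those entry points that policy has marked as public.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Workspace:
    """Hypothetical workspace directory with interior and exterior views."""

    name: str
    members: Set[str] = field(default_factory=set)
    _interior: Dict[str, str] = field(default_factory=dict)
    _exterior: Dict[str, str] = field(default_factory=dict)

    def register(self, service: str, address: str, public: bool = False) -> None:
        """Record a service; only entries marked public are exposed outside."""
        self._interior[service] = address
        if public:
            self._exterior[service] = address

    def resolve(self, service: str, requester: str) -> Optional[str]:
        """Members see all interior names; outsiders see public entry points."""
        if requester in self.members:
            return self._interior.get(service)
        return self._exterior.get(service)


ws = Workspace("building-7", members={"thermostat-1"})
ws.register("hvac-gateway", "2001:db8::10", public=True)  # exterior entry point
ws.register("load-balancer-a", "2001:db8::11")            # interior only

inside_view = ws.resolve("load-balancer-a", "thermostat-1")
outside_view = ws.resolve("load-balancer-a", "stranger")
public_view = ws.resolve("hvac-gateway", "stranger")
```

Here membership is a simple set of names; a real workspace would establish membership through authenticated bootstrapping rather than by assertion.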
The basic atom of bottom-up policy is a promise. Each promise consists of three things:
In a promise architecture, every device is contextually evaluated and integrated from the bottom up, according to the promises it keeps, e.g. the services it provides, its behaviours and properties, etc. Thus every device is modelled by its individual degree of agency to act as a proxy for human intent (policy).
Standalone devices are assumed to be equipped with policy-keeping software agents. Peripheral devices, such as sensors or actuators, are assumed to be integral parts of the standalone devices, and hence maintainable by their software agents.
Systems MUST NOT push changes or data to such agents ad hoc, without a documented promise to accept; thereafter, `fault tolerance' demands that we reject the word `must' from most descriptions, and replace it with `promise of best effort', as to rely on perfect behaviour leads to brittle systems with unrealistic expectations. For human safety in a rapidly expanding sphere of human involvement, the only `must' is for each agent to be stable and self-correcting, subject to the guidance of policy.
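One way to read "stable and self-correcting" is as a convergent repair operation: the agent compares its actual state with what it has promised and repairs only the deviations, so that repeating the operation on a compliant system changes nothing. The Python sketch below illustrates that reading; the function `converge` and the setting names are hypothetical and are not specified by this draft.

```python
from typing import Dict, List


def converge(actual: Dict[str, str], promised: Dict[str, str]) -> List[str]:
    """Repair only the settings that deviate from what the agent promised.

    Returns the names of the settings that were repaired. Running it again
    on an already compliant state changes nothing (idempotence).
    """
    repaired = []
    for setting, value in promised.items():
        if actual.get(setting) != value:
            actual[setting] = value
            repaired.append(setting)
    return repaired


promised = {"sample_interval": "60s", "uplink": "encrypted"}
state = {"sample_interval": "5s", "uplink": "encrypted", "led": "on"}

first_pass = converge(state, promised)   # repairs the one deviation
second_pass = converge(state, promised)  # nothing left to repair
```

Settings that policy does not mention (`led` in this example) are left untouched, which keeps the agent's behaviour bounded by its promises.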
The following characteristics describe the cooperation between agents:
Each policy agent promises to maintain a context evaluator that computes a set of classifying `tags' or `labels' that characterize the state of the agent. This is updated every time the agent verifies policy, as its state may change as a result of repairs. These may be used as conditionals for distributed policy-based decision-making.
Contextual labels characterize the device, its environment, and its location and time. The labels can then be used in policy to make certain promises apply only in specific contexts.
When promises, within a policy, are tagged by issue or context, each agent can select those that apply to its condition, within a larger trust relationship implied by policy sourcing. This simplifies logic and promotes stability, as evidenced by experience with software agents [CFENGINE].
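The context evaluator and label-based selection described above can be sketched in a few lines of Python. The label format, the `evaluate_context` and `applicable` functions, and the policy layout (a `when` list of required labels per promise) are assumptions made for this illustration; the draft does not prescribe them.

```python
from datetime import datetime
from typing import Dict, List, Set


def evaluate_context(facts: Dict[str, str], now: datetime) -> Set[str]:
    """Compute classifying labels from device facts and the current time."""
    labels = {f"{key}_{value}" for key, value in facts.items()}
    labels.add("day" if now.hour in range(6, 22) else "night")
    return labels


def applicable(policy: List[dict], labels: Set[str]) -> List[str]:
    """Select the promises whose required labels are all currently true."""
    return [p["promise"] for p in policy if set(p["when"]).issubset(labels)]


policy = [
    {"when": ["role_sensor"], "promise": "publish readings"},
    {"when": ["role_sensor", "night"], "promise": "reduce sampling rate"},
    {"when": ["role_actuator"], "promise": "accept setpoints from gateway"},
]

labels = evaluate_context(
    {"role": "sensor", "site": "lab"}, datetime(2016, 4, 20, 23, 0)
)
selected = applicable(policy, labels)
```

Every agent receives the same policy but keeps only the promises that match its own labels, so no central controller needs to know the state of each device.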
Policy change can be initiated from within a workspace, subject to a defined quality assurance, or fit-for-purpose review. Thus change of infrastructure may be instigated from the bottom-up also, as a self-service request.
Infrastructure stability is supported by a separation of systems into agencies that act in alignment with specific, separable timescales. Separation of fast and slow timescales avoids tight coupling and associated complex behaviours and should be considered a priority for maintaining safe, stable systems for human dependence.
Systems scale along two broad lines, which a promise-oriented architecture helps to resolve:
Changes to semantics should generally be slow compared to the workload related dynamical activity, in order to maintain functional stability. Cooperative design of workspaces may observe this principle to foster functional stability and workload efficiency.
A number of functional roles are required to maintain a service lifecycle in a distributed environment. Making these roles self-managed within each workspace is how one scales the diversity of human intent and concerns. Roles are defined by the kinds of promises kept by devices:
                         "Control data"                           "Application data"
+--------------------------------------------------------------+
|+------------------+ +------------------+ +------------------+| +-----------------+
|| Bootstrap server | | Policy server    | | Directory server || | Data client(s)  |
|+------------------+ +------------------+ +------------------+| +-----------------+
+--------|---------------------|----------------------|--------+          |
         |                     |                      |                   |
         +----------------+    |                      |                   |
                          |    |                      |                   |
    +------------------+  |    |                      |                   |
    | FFD / Standalone |  |    |                      |                   |
    |  Bootstrap client|--+    |                      |                   |
    |  Policy client   |-------+                      |                   |
    |  Directory server|------------------------------+                   |
    |  Data client     |--------------------------------------------------+
    +------------------+
         "Thing(s)"
The roles in each collaborative workspace. Devices at the bottom of the figure typically coordinate through workspace services hosted in the "cloud" or any nearby compute resource. Efficiency suggests avoiding long data paths, instead moving computational resources closer to data collection points.
Figure 3
Bootstrapping new devices into a workspace represents the beginning of a device lifecycle. Devices must begin with the location of a known bootstrap server. Devices must also promise to advertise their nature and capabilities, called `identification'. This may include Manufacturer Usage Description (MUD) identifiers [MUD].
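As an illustration of the identification promise, the following Python sketch builds the message a device might present to its known bootstrap server. The record layout, the field names, the `bootstrap_request` function and the example host names are all hypothetical; the draft does not define a wire format, and the MUD field is shown only as an optional identifier.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Identification:
    """Hypothetical record a device advertises when bootstrapping."""

    device_id: str
    kind: str  # "FFD" or "RFD"
    capabilities: List[str] = field(default_factory=list)
    mud_url: str = ""  # optional Manufacturer Usage Description identifier


def bootstrap_request(bootstrap_server: str, ident: Identification) -> dict:
    """Build the identification message for the device's known server."""
    if not bootstrap_server:
        raise ValueError("a device must start with a known bootstrap server")
    return {"to": bootstrap_server, "identification": asdict(ident)}


ident = Identification(
    device_id="sensor-42",
    kind="FFD",
    capabilities=["temperature", "humidity"],
    mud_url="https://manufacturer.example/mud/sensor-42.json",
)
request = bootstrap_request("bootstrap.workspace.example", ident)
wire = json.dumps(request, sort_keys=True)  # JSON chosen only for the example
```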
So far, much has been said on how the application devices provide services via promises, and how system intent can be described and orchestrated via policy. There is also a connectivity (transport) fabric for these devices that operates on a set of promises that underlie the described service framework, i.e. the network. Each network endpoint can be seen as providing its own set of promises that are used by other network elements to deliver routing and switching capabilities [PromiseNet]. The simplest form of SDN is simply name registration and route management.
Intent driven networking is becoming more relevant as Software Defined Networking (SDN) deployments proliferate. In the described IoT architecture, service policies that describe the IoT system intent can be used as an input to derive partial network policies (e.g. Group Based Policy or some other model-based approach), with modulation by other data discovered from bootstrapping, etc. The figure below illustrates the relationship between the service and network layer policies for IoT.
             +--------------------+
             | IoT Service Policy |
             +--------------------+
                        |
+---------------------+ | +--------------------+
| Topology / Location | | | Orchestration      |
|                     +-+-+                    |
| Bootstrap data      | | | Organization policy|
+---------------------+ | +--------------------+
                        |
                       \|/
                        v
             +--------------------+
             | IoT SDN policy     |
             +--------------------+
Service policy could be partially rendered as an SDN baseline for simplifying dependency management. The simplest form of SDN is simply name registration and route management.
Figure 4
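A partial rendering of service policy into a network baseline might look like the Python sketch below. It assumes, for illustration only, that service policy is a list of (client role, server role) pairs and that bootstrapping has discovered a role and address for each device; neither format, nor the function `render_network_policy`, comes from this draft or from any SDN controller API. The result is the set of flows the network would permit, with everything else denied by default.

```python
from typing import Dict, List, Set, Tuple

# Service policy: which role consumes which service (hypothetical format).
service_policy: List[Tuple[str, str]] = [
    ("sensor", "data-collector"),
    ("sensor", "policy-server"),
]

# Bootstrap data: role and network address discovered for each device.
bootstrap_data: Dict[str, Tuple[str, str]] = {
    "sensor-1": ("sensor", "2001:db8::1"),
    "collector-1": ("data-collector", "2001:db8::100"),
    "policy-1": ("policy-server", "2001:db8::200"),
}


def render_network_policy(
    policy: List[Tuple[str, str]], devices: Dict[str, Tuple[str, str]]
) -> Set[Tuple[str, str]]:
    """Derive the set of permitted (client address, server address) flows."""
    by_role: Dict[str, List[str]] = {}
    for role, address in devices.values():
        by_role.setdefault(role, []).append(address)
    return {
        (client, server)
        for client_role, server_role in policy
        for client in by_role.get(client_role, [])
        for server in by_role.get(server_role, [])
    }


allowed = render_network_policy(service_policy, bootstrap_data)
```

Because flows are derived from the same policy that describes the services, a change in service intent propagates to the network baseline without a separate manual step.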
The architecture described in this draft enables densely clustered IT resources to form arbitrary self-service communities that span local or wide area networks. This decouples a logical patchwork of segments on top of a plain end-to-end IP network. Because it is based on principles of fault tolerance, including publish-subscribe dissemination semantics, it may be scaled, without bottleneck, using only the well-known methods currently employed by the World Wide Web.
IPv6 and successors will play a key role in recapturing network simplicity from the many workarounds that have been stacked on top of IPv4 and its limitations. However, currently missing are adequate directory services to support a transparent workspace concept. The present Internet architecture is still geared principally towards a shared single-tenant, top-down management model, with host authority at the top. Top-down methods require the leaf domains to trust (and hence always be exposed to attack from) the layers high up in the network. However, by shrink-wrapping workspace boundaries closer around their private resources, this management can be simplified, sped up, and made less exposed.
The proposals laid out in this draft address key issues of scalability, fault tolerance, separation of concerns, and federation of intent within networked information systems. The platform described here is a synthesis of well-known techniques, and is deliberately aligned with the needs of agile commercial spaces, as well as large industrial distributions, and small domestic needs. We purposely leave open vendor specific concerns, which can easily fit into the described architecture, on top of this common set of principles.
Interest in using IT to stimulate smart spaces (homes, buildings, vehicles, cities, etc.) necessitates a scalable approach to interactive services, in which the service clients are not only humans but sensors and actuators. This cannot be scaled reliably without the segmentation of spacetime itself. Centralized cloud controllers, as we understand them today, cannot plausibly manage stable services for a society to rely on. We propose extending the notion of cloud and IoT to become a single seamless vision, without centralization as the core paradigm.
We are grateful for helpful conversations with K. Burns, M. Dvorkin, D. Maluf, and E. Lear.
With a pervasive contact surface onto both the Internet and the real world, security is obviously a major concern. Experience with pervasive frameworks like [CFENGINE], as well as theoretical studies of pull-based architectures, suggests that the promise-oriented pull-only architecture can reduce the exposure to denial of service attacks and data-based overflow attacks, by rejecting all external data sent without invitation. Moreover, the tie-in between service and network policy reduces the likelihood of errors in policy across the layers.
Workspaces can play a role here too, as a shrink-wrapping of service scope around a minimal set of endpoints, thus reducing the logical contact surface for data communications, and publishing information purely on a need-to-know basis. We take it for granted that workspace data are encrypted with workspace authorized credentials.