Network Working Group J. Yasskin
Internet-Draft Google
Intended status: Informational F. Erias
Expires: 28 April 2022 Igalia
25 October 2021
Use Cases and Requirements for Web Packages
draft-ietf-wpack-use-cases-00
Abstract
This document lists use cases for signing and/or bundling collections
of web pages, and extracts a set of requirements from them.
Discussion Venues
This note is to be removed before publishing as an RFC.
Discussion of this document takes place on the Web Packaging Working
Group mailing list (wpack@ietf.org), which is archived at
https://mailarchive.ietf.org/arch/browse/wpack/.
Source for this draft and an issue tracker can be found at
https://github.com/wpack-wg/use-cases.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 28 April 2022.
Copyright Notice
Copyright (c) 2021 IETF Trust and the persons identified as the
document authors. All rights reserved.
Yasskin & Erias Expires 28 April 2022 [Page 1]
Internet-Draft Use Cases and Requirements for Web Packa October 2021
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components
extracted from this document must include Simplified BSD License text
as described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Simplified BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Use cases . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1. Essential . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1.1. Offline installation . . . . . . . . . . . . . . . . 4
2.1.2. Offline browsing . . . . . . . . . . . . . . . . . . 6
2.1.3. Save and share a web page . . . . . . . . . . . . . . 6
2.1.4. Privacy-preserving prefetch . . . . . . . . . . . . . 7
2.2. Nice-to-have . . . . . . . . . . . . . . . . . . . . . . 7
2.2.1. Packaged Web Publications . . . . . . . . . . . . . . 8
2.2.2. Avoiding Censorship . . . . . . . . . . . . . . . . . 9
2.2.3. Third-party security review . . . . . . . . . . . . . 9
2.2.4. Building packages from multiple libraries . . . . . . 10
2.2.5. Cross-CDN Serving . . . . . . . . . . . . . . . . . . 10
2.2.6. Pre-installed applications . . . . . . . . . . . . . 11
2.2.7. Protecting Users from a Compromised Frontend . . . . 12
2.2.8. Installation from a self-extracting executable . . . 13
2.2.9. Packages in version control . . . . . . . . . . . . . 13
2.2.10. Subresource bundling . . . . . . . . . . . . . . . . 13
2.2.11. Archival . . . . . . . . . . . . . . . . . . . . . . 14
3. Requirements . . . . . . . . . . . . . . . . . . . . . . . . 15
3.1. Essential . . . . . . . . . . . . . . . . . . . . . . . . 15
3.1.1. Indexed by URL . . . . . . . . . . . . . . . . . . . 15
3.1.2. Request headers . . . . . . . . . . . . . . . . . . . 15
3.1.3. Response headers . . . . . . . . . . . . . . . . . . 15
3.1.4. Signing as an origin . . . . . . . . . . . . . . . . 15
3.1.5. Random access . . . . . . . . . . . . . . . . . . . . 16
3.1.6. Resources from multiple origins in a package . . . . 16
3.1.7. Cryptographic agility . . . . . . . . . . . . . . . . 16
3.1.8. Unsigned content . . . . . . . . . . . . . . . . . . 16
3.1.9. Certificate revocation . . . . . . . . . . . . . . . 16
3.1.10. Downgrade prevention . . . . . . . . . . . . . . . . 16
3.1.11. Metadata . . . . . . . . . . . . . . . . . . . . . . 17
3.1.12. Implementations are hard to get wrong . . . . . . . . 17
3.2. Nice to have . . . . . . . . . . . . . . . . . . . . . . 17
3.2.1. Streamed loading . . . . . . . . . . . . . . . . . . 17
3.2.2. Signing without origin trust . . . . . . . . . . . . 17
3.2.3. Additional signatures . . . . . . . . . . . . . . . . 17
3.2.4. Binary . . . . . . . . . . . . . . . . . . . . . . . 18
3.2.5. Deduplication of diamond dependencies . . . . . . . . 18
3.2.6. Old crypto can be removed . . . . . . . . . . . . . . 18
3.2.7. Compress transfers . . . . . . . . . . . . . . . . . 18
3.2.8. Compress stored packages . . . . . . . . . . . . . . 18
3.2.9. Subsetting and reordering . . . . . . . . . . . . . . 18
3.2.10. Packaged validity information . . . . . . . . . . . . 18
3.2.11. Signing uses existing TLS certificates . . . . . . . 18
3.2.12. External dependencies . . . . . . . . . . . . . . . . 19
3.2.13. Trailing length . . . . . . . . . . . . . . . . . . . 19
3.2.14. Time-shifting execution . . . . . . . . . . . . . . . 19
3.2.15. Service Worker integration . . . . . . . . . . . . . 19
4. Non-goals . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.1. Store confidential data . . . . . . . . . . . . . . . . . 19
4.2. Generate packages on the fly . . . . . . . . . . . . . . 20
4.3. Non-origin identity . . . . . . . . . . . . . . . . . . . 20
4.4. DRM . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.5. Ergonomic replacement for HTTP/2 PUSH . . . . . . . . . . 20
5. Security Considerations . . . . . . . . . . . . . . . . . . . 21
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 21
7. Informative References . . . . . . . . . . . . . . . . . . . 21
Appendix A. Acknowledgements . . . . . . . . . . . . . . . . . . 23
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 23
1. Introduction
People would like to use content offline and in other situations
where there isn't a direct connection to the server where the content
originates. However, it's difficult to distribute and verify the
authenticity of applications and content without a connection to the
network. The W3C has addressed running applications offline with
Service Workers ([ServiceWorkers]), but not the problem of
distribution.
Previous attempts at packaging web resources (e.g. Resource Packages
(https://www.mnot.net/blog/2010/02/18/resource_packages) and the W3C
TAG's packaging proposal (https://w3ctag.github.io/packaging-on-the-
web/)) were motivated by speeding up the download of resources from a
single server, which is probably better achieved through other
mechanisms like HTTP/2 PUSH, possibly augmented with a simple
manifest of URLs a page plans to use
(https://lists.w3.org/Archives/Public/public-web-
perf/2015Jan/0038.html). This attempt is instead motivated by
avoiding a connection to the origin server at all. It may still be
useful for the earlier use cases, so they're still listed, but
they're not primary.
2. Use cases
These use cases are in rough descending priority order. If use cases
have conflicting requirements, the design should favor the more
important use cases.
2.1. Essential
2.1.1. Offline installation
Alex can download a file containing a website (a PWA
(https://developers.google.com/web/progressive-web-apps/checklist))
including a Service Worker from origin O, and transmit it to their
peer Bailey, and then Bailey can install the Service Worker with a
proof that it came from O. This saves Bailey the bandwidth costs of
transferring the website.
There are roughly two ways to accomplish this:
1. Package just the Service Worker JavaScript and any other
JavaScript that it loads with importScripts() (https://w3c.github.io/
ServiceWorker/#importscripts), with their URLs and enough
metadata to synthesize a
navigator.serviceWorker.register(scriptURL, options) call
(https://w3c.github.io/ServiceWorker/#navigator-service-worker-
register), along with an uninterpreted but signature-checked blob
of data that the Service Worker can interpret to fill in its
caches.
2. Package the resources so that the Service Worker can fetch() them
to populate its cache.
Associated requirements for just the Service Worker:
* Indexed by URL: The register() and importScripts() calls have
semantics that depend on the URL.
* Signing as an origin: To prove that the file came from O.
* Signing uses existing TLS certificates: So O doesn't have to spend
lots of money buying a specialized certificate.
* Cryptographic agility: Today's algorithms will eventually be
obsolete and will need to be replaced.
* Certificate revocation: O's certificate might be compromised or
mis-issued, and the attacker shouldn't then get an infinite
ability to mint packages.
* Downgrade prevention: O's site might have an XSS vulnerability,
and attackers with an old signed package shouldn't be able to take
advantage of the XSS forever.
* Metadata: Just enough to generate the register() call, which is
less than a full W3C Application Manifest.
Additional associated requirements for packaged resources:
* Indexed by URL: Resources on the web are addressed by URL.
* Request headers: If Bailey's running a different browser from Alex
or has a different language configured, the accept* headers are
important for selecting which resource to use at each URL.
* Response headers: The meaning of a resource is heavily influenced
by its HTTP response headers.
* Resources from multiple origins in a package: So the site can be
built from multiple components (Section 2.2.4).
* Metadata: The browser needs to know which resource within a
package file to treat as its Service Worker and/or initial HTML
page.
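As a non-normative illustration, the following Python sketch shows one way the "Indexed by URL", "Request headers", and "Response headers" requirements could fit together. All names in it (PackagedResponse, Package, lookup) are hypothetical and are not taken from any packaging format. Header matching is simplified to exact string comparison, which is an assumption made for brevity rather than real HTTP content negotiation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PackagedResponse:
    # Request headers this variant was stored for (hypothetical model).
    variant_headers: Dict[str, str]
    # Response headers are stored with the body because they shape its meaning.
    response_headers: Dict[str, str]
    body: bytes


@dataclass
class Package:
    # Resources are indexed by URL; each URL may hold several variants.
    resources: Dict[str, List[PackagedResponse]] = field(default_factory=dict)

    def add(self, url: str, response: PackagedResponse) -> None:
        self.resources.setdefault(url, []).append(response)

    def lookup(
        self, url: str, request_headers: Dict[str, str]
    ) -> Optional[PackagedResponse]:
        variants = self.resources.get(url, [])
        for variant in variants:
            # Simplification: a variant matches only if every header it was
            # stored for equals the corresponding request header exactly.
            if all(
                request_headers.get(name) == value
                for name, value in variant.variant_headers.items()
            ):
                return variant
        # Assumed fallback: the first stored variant, or None for unknown URLs.
        return variants[0] if variants else None


pkg = Package()
url = "https://o.example/"
pkg.add(url, PackagedResponse(
    {"accept-language": "en"},
    {"content-type": "text/html", "content-language": "en"},
    b"<p>Hello</p>",
))
pkg.add(url, PackagedResponse(
    {"accept-language": "es"},
    {"content-type": "text/html", "content-language": "es"},
    b"<p>Hola</p>",
))
spanish = pkg.lookup(url, {"accept-language": "es"})
```

In this sketch, Bailey's browser would pass its own request headers to lookup(), so the same package can serve a different variant than the one Alex's browser originally displayed.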
2.1.1.1. Online use
Bailey may have an internet connection through which they can, in
real time, fetch updates to the package they received from Alex.
2.1.1.2. Fully offline use
Alternatively, Bailey may not have any internet connection for a
significant fraction of the time, either because they have no internet
at all, because they turn off internet except when intentionally
downloading content, or because they use up their plan partway through
each month.
Associated requirements beyond Offline installation:
* Packaged validity information: Even without a direct internet
connection, Bailey should be able to check that their package is
still valid.
2.1.2. Offline browsing
Alex can download a file containing a large website (e.g. Wikipedia)
from its origin O, save it to transferable storage (e.g. an SD card),
and hand it to their peer Bailey. Then Bailey can browse the website
with a proof that it came from O. Bailey may not have the storage
space to copy the website before browsing it.
This use case is harder for publishers to support if we specialize
Section 2.1.1 for Service Workers since it requires the publisher to
adopt Service Workers before they can sign their site.
Associated requirements beyond Offline installation:
* Random access: To avoid needing a long linear scan before using
the content.
* Compress stored packages: So that more content can fit on the same
storage device.
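To make the "Random access" requirement concrete, the following Python sketch uses a purely hypothetical layout: a 4-byte big-endian index length, a JSON index mapping each URL to an offset and length, and then the concatenated bodies. This layout and the function names are assumptions chosen for illustration only. The point is that a reader can seek directly to one resource instead of scanning the whole file.

```python
import io
import json
import struct
from typing import BinaryIO, Dict


def write_package(resources: Dict[str, bytes]) -> bytes:
    """Serialize resources with a leading index (hypothetical layout)."""
    index = {}
    bodies = b""
    for url, body in resources.items():
        # Record where each body starts, relative to the end of the index.
        index[url] = [len(bodies), len(body)]
        bodies += body
    index_bytes = json.dumps(index).encode("utf-8")
    return struct.pack(">I", len(index_bytes)) + index_bytes + bodies


def read_resource(package: BinaryIO, url: str) -> bytes:
    """Read a single resource by URL without scanning the other bodies."""
    package.seek(0)
    (index_len,) = struct.unpack(">I", package.read(4))
    index = json.loads(package.read(index_len).decode("utf-8"))
    offset, length = index[url]
    # Jump straight to the requested body.
    package.seek(4 + index_len + offset)
    return package.read(length)


data = write_package({
    "https://o.example/a": b"AAA",
    "https://o.example/b": b"BB",
})
second = read_resource(io.BytesIO(data), "https://o.example/b")
```

Only the index and the requested body are read, which matters when the package lives on slow removable storage and Bailey cannot copy it first.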
2.1.3. Save and share a web page
Casey is viewing a web page and wants to save it either for offline
use or to show it to their friend Dakota. Since Casey isn't the web
page's publisher, they don't have the private key needed to sign the
page. Browsers currently allow their users to save pages, but each
browser uses a different format (MHTML, Web Archive, or files in a
directory), so Dakota and Casey would need to be using the same
browser. Casey could also take a screenshot, at the cost of losing
links and accessibility.
Associated requirements:
* Unsigned content: A client can't sign content as another origin.
* Resources from multiple origins in a package: General web pages
include resources from multiple origins.
* Indexed by URL: Resources on the web are addressed by URL.
* Response headers: The meaning of a resource is heavily influenced
by its HTTP response headers.
2.1.4. Privacy-preserving prefetch
Lots of websites link to other websites. Many of these source sites
would like the targets of these links to load quickly. The source
could use <link rel="prefetch"> to prefetch the target of a link, but
if the user doesn't actually click that link, the prefetch leaks the
fact that the user saw a page that linked to the target. This can be true
even if the prefetch is made without browser credentials because of
mechanisms like TLS session IDs.
Because clients have limited data budgets to prefetch link targets,
this use case is probably limited to sites that can accurately
predict which link their users are most likely to click. For
example, search engines can predict that their users will click one
of the first couple results, and news aggregation sites like Reddit
or Slashdot can hope that users will read the article if they've
navigated to its discussion.
Two search engines have built systems to do this with today's
technology: Google's AMP (https://www.ampproject.org/) and Baidu's
MIP (https://www.mipengine.org/) formats and caches allow them to
prefetch search results while preserving privacy, at the cost of
showing the wrong URLs for the results once the user has clicked. A
good solution to this problem would show the right URLs but still
avoid a request to the publishing origin until after the user clicks.
Associated requirements:
* Signing as an origin: To prove the content came from the original
origin.
* Streamed loading: If the user clicks before the target page is
fully transferred, the browser should be able to start loading
early parts before the source site finishes sending the whole
page.
* Compress transfers
* Subsetting and reordering: If a prefetched page includes
subresources, its publisher might want to provide and sign both
WebP and PNG versions of an image, but the source site should be
able to transfer only the best one for each client.
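The "Subsetting and reordering" requirement can be illustrated with a small Python sketch of how a source site might choose which signed variant to forward. The function names, the preference list, and the rule that only explicitly listed media types count as supported are all assumptions of this sketch. It also ignores q-values, so it is not a full implementation of HTTP content negotiation.

```python
from typing import Dict, List


def supported_types(accept_header: str) -> List[str]:
    # Simplification: strip parameters such as q-values and keep the types.
    return [
        part.split(";")[0].strip()
        for part in accept_header.split(",")
        if part.strip()
    ]


def pick_variant(
    variants: Dict[str, bytes], accept_header: str, preference: List[str]
) -> str:
    """Return the content type of the variant to transfer to this client."""
    accepted = set(supported_types(accept_header))
    for content_type in preference:
        # Only explicit mentions count; wildcards such as image/* are not
        # treated as proof that the client can decode a newer format.
        if content_type in variants and content_type in accepted:
            return content_type
    # Assumed fallback: the least-preferred available variant, taken here
    # to be the most widely supported one.
    for content_type in reversed(preference):
        if content_type in variants:
            return content_type
    raise LookupError("no variant available")


variants = {"image/webp": b"webp-bytes", "image/png": b"png-bytes"}
preference = ["image/webp", "image/png"]
modern = pick_variant(variants, "image/webp,image/*;q=0.8", preference)
legacy = pick_variant(variants, "image/png,image/*;q=0.8", preference)
```

For this to work with signed content, each variant would need to be verifiable on its own, so that dropping the unused variant does not invalidate the publisher's signature on the one that is sent.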
2.2. Nice-to-have
2.2.1. Packaged Web Publications
The W3C's Publishing Working Group
(https://www.w3.org/publishing/groups/publ-wg/), merged from the
International Digital Publishing Forum (IDPF) and in charge of EPUB
maintenance, wants to be able to create publications on the web and
then let them be copied to different servers or to other users via
arbitrary protocols. See their Packaged Web Publications use cases
(https://www.w3.org/TR/pwp-ucr/#pwp) for more details.
Associated requirements:
* Indexed by URL: Resources on the web are addressed by URL.
* Signing as an origin: So that readers can be sure their copy is
authentic and so that copying the package preserves the URLs of
the content inside it.
* Downgrade prevention: An early version of a publication might
contain incorrect content, and a publisher should be able to
update that without worrying that an attacker can still show the
old content to users.
* Metadata: A publication can have copyright and licensing concerns;
a title, author, and cover image; an ISBN or DOI name; etc.; which
should be included when that publication is packaged.
Other requirements are similar to those from Offline installation:
* Random access: To avoid needing a long linear scan before using
the content.
* Compress stored packages: So that more content can fit on the same
storage device.
* Request headers: If different users' browsers have different
capabilities or preferences, the accept* headers are important for
selecting which resource to use at each URL.
* Response headers: The meaning of a resource is heavily influenced
by its HTTP response headers.
* Signing uses existing TLS certificates: So a publisher doesn't
have to spend lots of money buying a specialized certificate.
* Cryptographic agility: Today's algorithms will eventually be
obsolete and will need to be replaced.
* Certificate revocation: The publisher's certificate might be
compromised or mis-issued, and an attacker shouldn't then get an
infinite ability to mint packages.
2.2.2. Avoiding Censorship
Some users want to retrieve resources that their governments or
network providers don't want them to see. Right now, it's
straightforward for someone in a privileged network position to block
access to particular hosts, but TLS makes it difficult to block
access to particular resources on those hosts.
Today it's straightforward to retrieve blocked content from a third
party, but there's no guarantee that the third party has sent the
user an accurate representation of the content: the user has to trust
the third party.
With signed web packages, the user can regain assurance that the
content is authentic, while still bypassing the censorship. Packages
don't do anything to help discover this content.
Systems that make censorship more difficult can also make legitimate
content filtering more difficult. Because the client that processes
a web package always knows the true URL, this forces content
filtering to happen on the client instead of on the network.
Associated requirements:
* Indexed by URL: So the user can see that they're getting the
content they expected.
* Signing as an origin: So that readers can be sure their copy is
authentic and so that copying the package preserves the URLs of
the content inside it.
2.2.3. Third-party security review
Some users may want to grant certain permissions only to applications
that have been reviewed for security by a trusted third party. These
third parties could provide guarantees similar to those provided by
the iOS, Android, or Chrome OS app stores, which might allow browsers
to offer more powerful capabilities than have been deemed safe for
unaudited websites.
Binary transparency for websites is similar: as with Certificate
Transparency [RFC6962], the transparency logs would sign the content
of the package to provide assurance that experts had a chance to
audit the exact package a client received.
Associated requirements:
* Additional signatures
2.2.4. Building packages from multiple libraries
Large programs are built from smaller components. In the case of the
web, components can be included either as JavaScript files or as