An Architecture for a Better Internet
Bringing human-centric design principles to lessons learned from the web architecture
In my last post, I wrote about how REST architectural principles feed into the problem of surveillance capitalism that the world faces today. Today, I want to explore how we might approach an architecture for a better, more human-centric internet.
Internet or Web?
First, some quick disambiguation. The Interpeer Project’s aim is to provide infrastructure for a next generation human-centric internet — so why focus on REST, when that is specific to the web?
There is a practical consideration here as well as one rooted in the status quo: we need to work with existing internet infrastructure in order to get software out there, so right now we’re not going to replace the internet in our tech stack. However, where possible we make provisions for that future scenario.
More importantly, though, when you speak with people and when you see how the internet is used nowadays, outside of email and SSH, hardly anything gets done that is not in some way layered on top of HTTP, the web protocol. It’s worthwhile looking at the actual web use cases in order to understand better what the internet as a whole has become.
In the previous post, I highlighted that in the web architecture, scale is achieved by shifting control to server implementors. In particular, this is done by a) allowing for processing semantics to intermix with data transfer semantics, and by b) allowing for data transfer formats to be inextricably tied to server-side and server-provided code.
The good effect of this is that it permits endless adaptability of the overall architecture to the website owner’s needs. But this comes at the expense of effectively dissolving the concept of data ownership by the user.
Now one of these things — client apps that one has to download and run in order to process data — predates the web and internet by quite a bit. Long before it was customary for web apps to run code in our browsers, we installed and ran software on our computers to process data formats that were undocumented and effectively unreadable.
I don’t think that this is something that can be addressed very easily. Short of legally forcing open documentation of every data format and the semantics of associated operations, software providers will always have an incentive to lock users into their ecosystem somehow. And obfuscating data formats is just the most effective way of achieving that.
And since internet bandwidths are what they are these days, this effectively implies that downloadable apps must remain a thing of sorts. Software downloads are by far the most common delivery mechanism for extending a computer with the functionality that permits reading such data.
We’ll have to embrace that, even if it doesn’t help with un-siloing our data. But what we can do is make it significantly easier to default to a mode of operation that does something different.
As explored in the previous post, the hypermedia semantics envisioned by HTTP protocol designers didn’t really come to pass — or rather, they did, and then the world moved on to include other things. So let’s briefly look at the high-level use-cases we have for the web.
1. Publishing. Hypermedia semantics are still around, but they are somewhat subsumed into a more generic publishing use-case. Rather than publish something on a server one owns and wait for people to discover it by following some link, we have added subscription mechanisms in the form of social media, RSS or Atom feeds, or the ActivityPub protocol of the federated web. Each individual piece of content still follows fairly traditional hypermedia rules — but we have added subscription to some content collection as the primary means of discovery.
2. Identity. We have added an identity concept, either of a person or of some other entity, as the owner-author of either content pieces or the collection. This is a direct result of shifting away from the owner of a server as the author of a content piece towards server owners providing a platform for other people.
3. Sharing. We have added a concept of sharing, which is to add someone else’s content into our own collection, optionally making it available to subscribers. This can be with or without commentary.
4. File sharing. Somewhat adjacent to all of these additions, we use web platforms for sharing arbitrary files. The mechanisms are similar in concept to the above, but implementations usually differ strongly. One new aspect is permission management, which is less strongly represented in the content- and identity-centric social media contexts above.
5. Streaming. Orthogonally to the functionality above, we have instituted streaming as a new form of content sharing, in which subscribers receive updates — additions, really — to a single piece of content, to be consumed in a piecemeal fashion.
6. Collaborative editing. Modification of shared resources can also be performed in such a piecemeal fashion, by multiple parties in collaborative editing scenarios.
7. Messaging and video chat. I tend to treat these as special versions of the previous use-cases: messaging is effectively collaborative editing of a message stream, and video chat is just N-way streaming. But both are worth bringing up in their own right.
8. Remote APIs. We have somewhat formalized the concept of remote APIs: URIs with documented semantics and input/output data formats, accessible in principle by anyone with a client that implements complementary behaviour.
9. Software delivery. For completeness’ sake, let’s mention the downloading of client code again. It’s everywhere, but it is really more something done in support of the above use-cases.
It’s not that the web is limited entirely to these use-cases. But these are so prevalent that there are clear patterns visible. And I would be hard pressed to find a reasonably popular web application that does not follow these patterns at all.
But if this is a sufficiently complete list, and I for one believe it is, then in order to build “web scale” systems, we do not actually need the REST architecture with its application-data-silos at all.
File System-Like Semantics
I outlined use-cases 1-3 above as being distinct from 4, but that is only true because applications tend to treat them very differently, and the reasons for this lie in the magic “web scale” requirement.
You see, for content sharing — cases 1-3 — we have long accepted that only eventual consistency can be achieved in such large-scale systems. When I post something on social media right now, I am perfectly content knowing that you may only be able to read it a few minutes later. This is so normal that we don’t even truly notice it unless eventual consistency takes significantly longer than usual.
In case 4, the actual file sharing use-case, we have accepted something similar. Files you upload explicitly or put in a shared folder will eventually appear on the other person’s computer, if they subscribe to the enclosing folder.
There is so much similarity between the content and file sharing use-cases that it takes some effort to pick them apart at all. But let’s do so anyway, and highlight the similarities as we go along.
Permissions are more or less the same in either case, and tend to come in the form of read vs. write/modify permissions. Social media has added “share” permissions, which are tied to the platform in the middle. Nothing prevents you from reading bits and writing the same bits somewhere else; sharing is just that same activity made simpler and moderated by the platform.
The collection that we subscribe to tends to be a bit different in either case. In file sharing, we subscribe to a folder. In social media, we subscribe to a content feed. The content feed tends to have a more specific scope, i.e. “comments on a post” or “public posts of a person”, but that is the semantics the application applies to the feed rather than a feature of the underlying publish-subscribe-mechanism.
Similarly, we may not subscribe to sub-collections within content feeds; in file sharing, likewise, we tend to treat each share as something that has to be synchronized in its entirety.
Lastly, the “comments on a post” example above illustrates that unlike in file sharing, we can easily treat each individual content item as a subscribable collection in itself. In file sharing, that is rarely seen.
All in all, though, use-cases 1-4 really are surprisingly similar to the semantics of a distributed file system. The main differences lie in being able to more specifically zero in on which sub-collection of a shared collection you want to subscribe to, and in accepting eventual consistency.
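To make the parallel concrete, the shared shape of use-cases 1-4 can be sketched as a subscribable collection. This is a toy, single-process illustration; the class and method names here are mine, not any real protocol’s:

```python
# A toy sketch of the "collection" common to folders and feeds.
# All names (Collection, subscribe, publish) are illustrative only.
from typing import Callable


class Collection:
    """A subscribable collection: a folder in file sharing, a feed in social media."""

    def __init__(self, name: str):
        self.name = name
        self.items: list[str] = []
        self.subscribers: list[Callable[[str, str], None]] = []

    def subscribe(self, callback: Callable[[str, str], None]) -> None:
        # Subscribers are notified of additions. In a real distributed system
        # delivery would only be eventually consistent; here it is immediate.
        self.subscribers.append(callback)

    def publish(self, item: str) -> None:
        self.items.append(item)
        for cb in self.subscribers:
            cb(self.name, item)


# Usage: a feed scoped to "comments on a post", as in the example above.
received = []
comments = Collection("post-42/comments")
comments.subscribe(lambda coll, item: received.append((coll, item)))
comments.publish("nice post!")
```

The point is not the trivial code, but that the same shape serves a shared folder and a comment feed alike; only the scoping and consistency guarantees differ.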
So why not model these as a kind of file system?
When we look at 5-7 above, we can see that there is a real-time component to some web use-cases. Streaming can be viewed as a subscription to an incomplete file, or to a collection that keeps having data chunks appended.
Both the messaging and video chat cases can be built on top of that point of view: either could be conceived as a collaboratively edited single data stream. Or what each person produces — messages, video or audio content — could be considered an individual data stream, with the overall conversation modelled via a collection that contains meta information coordinating the individual streams. And if we recall the “comments on a post” situation above, maybe the distinction is a little arbitrary to begin with, and we don’t have to worry too much about that right now.
Either way, what distinguishes these use-cases significantly from the file system-like use-cases above is the ability to very deliberately subscribe to real-time updates of a smaller collection.
We do actually have an equivalent in file systems, and that’s pipes and asynchronous I/O. While the former allows for one process to continuously produce content that the other consumes, asynchronous I/O is as close to subscription semantics as operating systems offer.
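As an illustration of that equivalence, the POSIX flavour is readily available from Python’s standard library: a pipe carries appended chunks, and a readiness selector delivers the “you have an update” notification. A sketch of the analogy, assuming a POSIX system, not a proposal:

```python
# Pipes plus readiness notification as the OS-level analogue of
# subscription semantics: the reader "subscribes" to the pipe and is
# woken when the producer appends data. POSIX-only sketch.
import os
import selectors

r, w = os.pipe()
os.set_blocking(r, False)

sel = selectors.DefaultSelector()
sel.register(r, selectors.EVENT_READ)

# Producer side: append a chunk, much like a streamed content update.
os.write(w, b"chunk-1")

# Consumer side: wait for the update notification, then read the new data.
events = sel.select(timeout=1.0)
chunks = [os.read(key.fd, 4096) for key, _ in events]

sel.unregister(r)
os.close(r)
os.close(w)
```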
It would therefore not be inconceivable to combine both groups of behaviour into a real-time capable file system-like system.
APIs similar in nature to those on the web, that is, with client and server components isolated from each other, have existed outside of the web for quite some time. General-purpose inter-process communication (IPC) is typically modelled atop asynchronous pipes. Here, requests are serialized in a format that translates application-language-specific type information into one specific to the IPC mechanism — not much different from e.g. JSON payloads. Similarly, request metadata, such as the specific functionality to invoke, is encoded in a way that is not dissimilar to e.g. HTTP request URIs and headers.
Where local IPC tends to differ from the HTTP protocol is that it usually acknowledges asynchronicity, and provides each request with an explicit request token that is mirrored in the response. The client can then provide whatever abstraction of synchronicity or asynchronicity the application language expects based on this information.
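A toy sketch of that token mechanism, with the framing (JSON lines) and field names being illustrative assumptions rather than any specific IPC system:

```python
# Token-based request/response matching over an asynchronous byte pipe.
# Framing and field names ("token", "method", ...) are illustrative.
import itertools
import json


class RpcClient:
    def __init__(self, transport):
        self._transport = transport        # anything with a send(bytes) method
        self._tokens = itertools.count(1)
        self._pending = {}                 # token -> result slot

    def call(self, method: str, params: dict) -> int:
        token = next(self._tokens)
        self._pending[token] = None
        # Request metadata (the method name) plays the role of an HTTP
        # URI/headers; the token lets responses arrive out of order.
        self._transport.send(json.dumps(
            {"token": token, "method": method, "params": params}).encode())
        return token

    def on_response(self, raw: bytes) -> None:
        msg = json.loads(raw)
        # The mirrored token routes the response back to its request.
        self._pending[msg["token"]] = msg["result"]


# Usage: a loopback "server" that echoes params, replying in reverse order
# to show that out-of-order responses are matched correctly.
sent = []
client = RpcClient(type("T", (), {"send": staticmethod(sent.append)})())
t1 = client.call("ping", {"n": 1})
t2 = client.call("ping", {"n": 2})
for raw in reversed(sent):
    req = json.loads(raw)
    client.on_response(json.dumps(
        {"token": req["token"], "result": req["params"]}).encode())
```

The client can then layer whatever synchronous or asynchronous abstraction the application language expects on top of this matching.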
What that means is that an API is really nothing but an RPC mechanism layered on top of an asynchronous pipe mechanism. Surely that also fits well onto a more unified view of the above use-cases?
And since we explicitly talked about software delivery as a use-case, let’s just recall that software is nothing but a collection of files. What we may subscribe to in this case is not so much the folder in which these files reside, but checkpoints — author-defined collections of updates that represent a known-good state. In distributed version control, we’d call that tags. The idea is similar enough.
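One way to picture such a checkpoint is as a hash over a manifest of per-file content hashes, much as version control computes a tag. A sketch under that assumption, with all names illustrative:

```python
# A "checkpoint": an author-defined snapshot of a file collection,
# identified by a hash over per-file content hashes, much like a VCS tag.
import hashlib


def file_hash(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def checkpoint(files: dict[str, bytes]) -> str:
    # Hash a canonical listing of (path, content-hash) pairs; subscribers
    # can fetch exactly this known-good state instead of tracking the folder.
    manifest = "".join(f"{path}:{file_hash(data)}\n"
                       for path, data in sorted(files.items()))
    return hashlib.sha256(manifest.encode()).hexdigest()


v1 = checkpoint({"app/main.py": b"print('hi')", "app/README": b"docs"})
v2 = checkpoint({"app/main.py": b"print('hi!')", "app/README": b"docs"})
```

Any change to any file yields a new checkpoint identifier, so subscribing to checkpoints means subscribing to known-good states rather than raw folder churn.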
Is this enough?
As one can hopefully see, the actual common use-cases for the web map neatly onto file system-like semantics. The main differences lie in access control, subscription scope and eventual consistency vs. explicit real-time requirement.
This isn’t a particularly new revelation; in fact, it’s one that coincides with the evolution of the web. The 9P protocol, originally conceived for the Plan 9 operating system, was first published in 1992. Plan 9 is known for describing all of its functionality in the form of “file servers” that use file-like representations for all of their highly specific semantics.
Drawing on the screen, for example, is done via draw commands written to a specific file representing the screen — moreover, since 9P is a distributed file system protocol, one can write code that draws on a remote screen, not connected to the current computer.
It appears that if one can build an entire distributed operating system on top of a distributed file system layer, then an enhanced file system-like layer should also allow for the right mixture of performance and “web scale”. The main thing holding 9P back from achieving this scale is that it operates in an entirely synchronous and consistent model. If we distinguish between immediately consistent real-time operations and eventually consistent non-real-time operations, scale is much easier to achieve.
Privacy vs. Security
In the previous post, I wrote about bringing code and data together. One has the choice of either pushing local data to remote code, or pulling remote code to local data. I also wrote that the former risks privacy, while the latter risks security.
The web’s focus on leaving URI semantics entirely in the hands of the server implementor promotes a use of the web architecture that accumulates data in the hands of server owners.
A more explicit focus on file system-like semantics can help prevent that. Rather than making the data we transfer a representation of a resource, we fully embrace that the resource is the data. And then we encrypt it, so that the server can’t read it.
This prevents some forms of abuse, but also abandons some of the flexibility in both the web architecture and 9P’s API-by-filesystem approach. We have now effectively cut out server implementations entirely from accessing data.
The secret sauce to achieving this while still making access significantly more selective lies in cryptographic methods. They are no panacea for ensuring security and privacy, but there is a collection of methods available that can help reduce abuse.
I’ll have to write about these in a later post; this one is long enough already. But at the heart of it lie two relatively simple ideas (that end up getting somewhat complicated in practice, so let’s not go there now):
The double ratchet algorithm is not only applicable to message streams, but to any stream. And any file or content collection can be represented as a stream of updates to a stable identifier. We can therefore selectively add and remove access to and from file updates, which includes granting temporary access to a server for processing purposes.
Via a certificate-like mechanism, we can not only delegate proof of identity — we can also delegate proof of permissions. A signature over a three-tuple is enough for that, if the tuple contains an identifier for an actor, an identifier for an object to be acted on, and an identifier for an action. Verifying an owner signature over such a tuple is enough to prove that the owner intended for the actor to be able to perform the action on the object.
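As a toy illustration of both ideas, here is a stdlib-only sketch in which a simple hash ratchet stands in for the double ratchet, and an HMAC stands in for a real public-key signature. Everything here is illustrative, not the actual scheme:

```python
# Toy sketches of the two ideas above. A hash ratchet stands in for the
# double ratchet; HMAC stands in for a real signature scheme.
import hashlib
import hmac

# Idea 1: derive a fresh key per update to a stable identifier. Handing
# someone the key at step N grants access from update N onwards, but not
# to earlier updates encrypted under previous keys.
def ratchet(key: bytes) -> bytes:
    return hashlib.sha256(b"ratchet" + key).digest()

k0 = hashlib.sha256(b"root secret").digest()
k1 = ratchet(k0)
k2 = ratchet(k1)   # a server granted k1 can derive k2, but never recover k0

# Idea 2: the owner signs an (actor, object, action) three-tuple; anyone
# holding the owner's verification key can check the delegated permission.
OWNER_KEY = b"owner signing key"   # stand-in for a real private key

def grant(actor: str, obj: str, action: str) -> bytes:
    tuple_bytes = f"{actor}|{obj}|{action}".encode()
    return hmac.new(OWNER_KEY, tuple_bytes, hashlib.sha256).digest()

def verify(actor: str, obj: str, action: str, sig: bytes) -> bool:
    return hmac.compare_digest(sig, grant(actor, obj, action))

sig = grant("alice", "file:report", "read")
```

The tuple signature is only valid for exactly that actor, object, and action; changing any element invalidates it.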
I realize that the exact explanation of the cryptographic parts is missing for the time being. Assume for the moment that these work.
Then we can make the following statements:
A real-time capable (but selectively eventually consistent) file system-like system can provide for the use-cases the web is currently used for.
Delegation of permissions allows for cacheability (not very much discussed here) in the same way the REST architecture does, and therefore enables web scale.
Layering API semantics over the real-time distribution mechanism allows for similar kinds of server flexibility as in 9P and the web, but…
… cryptographic access control would prevent unbounded abuse, since servers by default only have access to encrypted data.
We should, therefore, be able to build a distributed “web scale” system that makes surveillance capitalism that much harder. It still provides for selectively yielding control over data in exchange for benefits, which is arguably the foundation of human collaboration.
The basis of this, however, is at least the ability to have light-weight real-time-ish updates sent in response to subscriptions. That’s what the current protocol work is focused on, with a bunch of other requirements thrown at it that reflect the modern internet.
When that’s sufficiently implemented, I’ll pick up the file system-like behaviour again, including a much harder look at how cryptography can help us here. For the moment, though, this post and the last should provide for a better idea of what the Interpeer Project aims for.
A quick note on subscriptions: I’m trying to keep doing this work, and have set up subscriptions so that people can help me pay my bills. That means some articles are effectively paywalled — if you can’t pay, fear not: there is a special launch promotion open.
You’ll get free lifetime access, and I can keep the paywall up for folks coming from the outside. And if you are in the right space to be extra awesome and pay for a subscription, that’s all the more appreciated!