A look at ActivityPub's foundation

Nov 14, 2022

In this series of posts we’re going to explore ActivityPub, the protocol that powers microblogging across the Fediverse.

This post is going to focus on the technologies ActivityPub is built upon. It doesn’t dive into how ActivityPub itself is used to provide interoperable microblogging. That will be the topic of a future entry.

⚠️ Caveat lector: This post has an air of mild annoyance 😑. If you don’t enjoy reading this type of commentary, I suggest you stop here.

ActivityPub is a W3C standard from the same folks that standardise things like HTML and CSS. It’s built on top of another W3C standard named Activity Streams. When using Activity Streams, we usually make use of a third standard, the Activity Vocabulary.

This is where our troubles start. Activity Streams makes use of JSON-LD, another (nowadays) W3C standard. Yes, it is turtles all the way down. JSON-LD pulls us into the madness that is the world of Semantic Web.

Semantic Web and Linked Data

There are generally three types of reactions people have to “Semantic Web”:

1. What’s that?
2. Woooooooooooooo
3. Fuck me not this shit again

I’m firmly in the third category. Factual opinions on the topic are going to follow for the remainder of this section.

The theory behind Semantic Web and Linked Data is great. At its simplest the idea is to be able to describe all “things” in a machine-readable “vocabulary” and link “things” together through “relationships”. This turns the web into a gigantic connected graph that you can traverse and use to glean and derive useful information from. Even better, if people describe things in a vocabulary you don’t know about you can retrieve that vocabulary and thus start to understand what they’re describing and how things relate.

The Semantic Web, aka Web 3.0 (not web3 the crypto thing, hardest problem in computer science yada yada) makes use of technologies like the Resource Description Framework and the Web Ontology Language to describe and classify things. RDF traditionally does this in XML. JSON-LD is very similar except, you guessed it, in JSON. JSON-LD tries to distance itself a bit from Semantic Web, but it’s not fooling anyone.

In practice, most of this doesn’t work out. The idea of machine-readable self-describing interconnected data is great. But it has no meaning. Machines can’t ascribe meaning to things, so unless you’ve encountered a particular vocabulary before and mapped it to your own domain model it’s essentially noise. So aside from you having to interpret the vocabulary, we also need to standardise the vocabulary. You then build in support for that vocabulary in whatever tool is retrieving that information. It can’t learn by itself.

This now means we’ve got another problem, standardising the vocabulary. Turns out this is genuinely hard and people don’t agree on things. Which creates two more problems. One of them is XKCD 927. It gets worse in that in many cases we’ll now describe the object multiple times in different vocabularies. This means we now bloat an object with potentially multiple representations of it in a way that’s essentially just noise if you don’t understand the other vocabularies. Most of these vocabularies also partially overlap.

The other is that different cultures have different ideas of how things are to be classified and categorised, what properties a thing has and which of those are meaningful when describing said thing. That results in a lot of vocabulary being mostly a lowest common denominator thing, an all properties are optional deal-with-it type thing or a makes sense in Western civilisations only type thing. And sometimes a property existing conveys some additional meaning on another thing in some but not other contexts. Humans aren’t regular and this is where it all comes tumbling down.

There are three places where I’ve seen this provide some modicum of value:

Search engines recognise certain microformats and will use that when displaying search results
Social media will generate those big fancy previews if you add Twitter Card or OpenGraph tags to your website and content. They don’t really understand any relationship between any of the content, just “this is title”, “this is header image” etc.
Ad-tech / surveillance. Guess what happens when you have a massive amount of data that you connect and define relationships between and augment that data by navigating the web of connections between them 😶

Linked Data also brings another challenge. You can be given a resource which in turn contains a collection of links to other data that describe or augment it. You then have to go and fetch all those resources, potentially recursively, to get a complete picture of the object. To me, this is wasteful. If we’re already agreeing on what information we need to transfer, actually transfer it. Don’t make it do a whole bunch of additional requests to get it. From a privacy point of view this also irks me, since me going around dereferencing a bunch of linked data is something that others can observe, especially when they were the one handing me the original document. In many ways it has a whiff of amplification attacks and resource exhaustion.
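To make the fan-out concrete, here is a minimal sketch of what “fetch everything a document links to” looks like. Everything in it is an assumption for illustration: the `FAKE_WEB` dictionary stands in for the network, the URLs are made up, and the property names are merely borrowed from the Activity Vocabulary. A real client would issue an HTTP request for every lookup.

```python
from typing import Dict, List, Set

# Hypothetical in-memory "web": URL -> document. A dict lookup stands in
# for what would be one HTTP request per document in a real client.
FAKE_WEB: Dict[str, dict] = {
    "https://example.org/note/1": {
        "attributedTo": "https://example.org/users/alice",
        "inReplyTo": "https://example.org/note/0",
    },
    "https://example.org/note/0": {
        "attributedTo": "https://example.org/users/bob",
    },
    "https://example.org/users/alice": {"name": "Alice"},
    "https://example.org/users/bob": {"name": "Bob"},
}


def dereference_all(start: str, web: Dict[str, dict]) -> List[str]:
    """Follow every link reachable from `start`; return the URLs fetched."""
    fetched: List[str] = []
    seen: Set[str] = set()
    pending = [start]
    while pending:
        url = pending.pop()
        if url in seen or url not in web:
            continue
        seen.add(url)  # guards against cycles between documents
        fetched.append(url)  # each entry would be one network request
        for value in web[url].values():
            # Treat any string value that names a known document as a link.
            if isinstance(value, str) and value in web:
                pending.append(value)
    return fetched


requests_made = dereference_all("https://example.org/note/1", FAKE_WEB)
print(len(requests_made))
```

Being handed one small document here turns into four lookups, and whoever hosts those documents gets to watch each one arrive.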

Personally I consider caring about Linked Data to just about always be the wrong engineering trade-off. It’s a ton of complexity for benefits that have yet to materialise in any meaningful manner at a scale that makes it remotely useful in practice. Write a spec where we agree on a bunch of key-value pairs that make sense within a domain and let’s leave the rest at the door.

It’s a bird, it’s a plane, it’s an oil tanker

Another annoyance in the Activity Streams spec that ActivityPub inherits, and what seems to be another W3C requirement, is to be able to represent the same field in 3 different ways. This seems to be a thing we got from RDF and its plain literals versus typed literals.

The ways are generally:

String
Object
List of Object and/or String

A concrete example, the context key (which we get from JSON-LD):

{
   "@context": "https://example.org/some/namespace"
}

But it can also be:

{
   "@context": {
      "@vocab": "https://example.org/some/namespace"
   }
}

It can also be BOTH, in case we have an array:

{
   "@context": [
      "https://example.org/some/namespace",
      {
         "ext": "https://example.org/extension"
      }
   ]
}

At this point, anyone working in programming languages with static typing is probably mildly annoyed. It’s not that we can’t define our own format that we then (de)serialise to, but boy would it be nice not to have to jump through extra hoops for the sake of the hoops.

There are more things that are really annoying here. First of all, context is singular even though it may be plural. The Activity Streams spec actually does one extra thing here in calling properties that are only ever singular “functional properties”, whereas the other ones are just properties. I don’t know what was so hard about singular versus plural but here we are. It also feels like properties that aren’t functional should then be called dysfunctional?

Second, since the field may be plural, you’ll always have to handle the case that it can contain more than one element. This means there’s really no advantage to string and object over “array with one string” and “array with one object”. It’s two additional bytes to transfer, and one less case you could forget to handle.
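In practice this means most implementers end up writing a small normalisation step that collapses the three shapes shown above into the array form anyway. A minimal sketch, assuming a hypothetical helper named `normalise_context` and handling only those three shapes (not everything JSON-LD permits):

```python
from typing import Dict, List, Union

# One entry of a context: either a namespace string or a mapping object.
ContextEntry = Union[str, Dict[str, object]]


def normalise_context(value: object) -> List[ContextEntry]:
    """Return the `@context` value as a list, whichever form it arrived in."""
    if isinstance(value, (str, dict)):
        # String form and object form both become a one-element list.
        return [value]
    if isinstance(value, list):
        for entry in value:
            if not isinstance(entry, (str, dict)):
                raise TypeError(f"unsupported @context entry: {entry!r}")
        return list(value)
    raise TypeError(f"unsupported @context value: {value!r}")


print(normalise_context("https://example.org/some/namespace"))
print(normalise_context({"@vocab": "https://example.org/some/namespace"}))
```

After this step the rest of the code only ever deals with a list, which is exactly the shape the spec could have mandated in the first place.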

The string version is just a more compact representation of the object version and if compression wasn’t a thing maybe I would buy into the idea that this is done to limit the amount of data we transfer. In practice it’s highly unlikely to matter and if the aim really was to reduce bytes transferred we wouldn’t use JSON for our encoding. CBOR provides a nice alternative without the toolchain nightmares you get with Protobuf.
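If you want to check the size argument yourself, a rough sketch using only Python’s standard `json` and `zlib` modules is below. The namespace URL is the made-up one from the examples above, and zlib stands in for whatever compression your transport actually uses.

```python
import json
import zlib

NAMESPACE = "https://example.org/some/namespace"

# The same context, once as a bare string and once as a one-element array.
as_string = json.dumps({"@context": NAMESPACE}).encode("utf-8")
as_array = json.dumps({"@context": [NAMESPACE]}).encode("utf-8")

# Uncompressed, the array form costs exactly the "[" and "]" characters.
raw_overhead = len(as_array) - len(as_string)

# Compressed sizes depend on the payload and the compressor settings,
# so this figure is reported rather than assumed.
compressed_overhead = len(zlib.compress(as_array)) - len(zlib.compress(as_string))

print(f"raw overhead: {raw_overhead} bytes")
print(f"compressed overhead: {compressed_overhead} bytes")
```

For a payload this small the compressed figure says very little. The point is only that the uncompressed difference is two bytes per property.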

I personally also find this to be bad spec design. Having 3 different representations or encodings of the same thing is not helpful. It puts more work on me as an implementer and increases the chances of interoperability problems. Which is exactly what we don’t want when we’re trying to get two computers to successfully talk to each other and do something useful for the humans. So please don’t. If it may be an array, let it always be an array. And if it may be an object, let it always be an object.

The client-to-server spec is a lie

ActivityPub defines both a Client-to-Server protocol and a Server-to-Server protocol.

Server-to-Server is implemented by everyone because that’s how we share, i.e. federate, the content out in the Fediverse. It’s how we can consume each other’s posts even though we don’t have an account on the same server.

Client-to-Server is a lie because nobody implements this in practice. Just about everyone mimics the Mastodon and Pleroma client APIs. Mastodon especially existed before ActivityPub was a thing and that’s what most apps were originally written against. It’s the de facto Client-to-Server API.

This actually poses a problem. Since nobody implements the C2S spec, no clients are written that use it, which means there’s no need to implement it, and round and round we go.

Conclusion

ActivityPub is very useful in practice as we’ve seen since the despot took over Twitter. There are a lot of things to like about it that we’ll get into in future posts. But I really wish the W3C wasn’t trying so hard to cram Semantic Web in every spec and down everyone’s throat.

Thankfully, we don’t have to care about it in practice for ActivityPub. Though some implementers like Mastodon understand and use JSON-LD, many other implementations like Pleroma have opted out of this and simply treat it as a defined set of key/values in JSON with no further meaning and won’t bother with any additional contexts. In practice ActivityPub implementers have also gone for a more regular format than what would technically be allowed by Activity Streams. There’s an (ongoing?) effort by some ActivityPub implementers to standardise LitePub to have a more rigid standard and opt out of any meaning conveyed through Semantic Web things entirely.
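To give a feel for the “just key/values” approach, here is a minimal sketch. It is not how Pleroma or any other project actually does it; the `Note` dataclass and `parse_note` function are hypothetical names, and the sample document is made up. The only thing it illustrates is that the `@context` key is never consulted.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Note:
    id: str
    content: str
    attributed_to: Optional[str] = None


def parse_note(raw: str) -> Note:
    """Read a Note by key name alone; `@context` is never looked at."""
    doc = json.loads(raw)
    if doc.get("type") != "Note":
        raise ValueError(f"expected a Note, got {doc.get('type')!r}")
    return Note(
        id=doc["id"],
        content=doc["content"],
        attributed_to=doc.get("attributedTo"),
    )


# Made-up sample document for illustration.
sample = json.dumps({
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Note",
    "id": "https://example.org/note/1",
    "content": "Hello, Fediverse",
    "attributedTo": "https://example.org/users/alice",
})
note = parse_note(sample)
print(note.content)
```

The keys mean what the spec says they mean, and anything the parser doesn’t recognise is simply ignored.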

A lot of W3C standards are all aboard the Semantic Web train. Take a look at the Web of Things family of specs for some serious Enterprise FizzBuzz vibes. Proponents of JSON-LD and Semantic Web in general will also really try to convey to you how amazing it is, it’s the future etc. You’ll be forgiven for mistaking it for a cult. It’s a cult. Personally I wish they’d stop beating this dead horse, but at this point the dead horse has been beaten so much it is now undead and will haunt us forever.

activitypub fediverse