Privacy and the Fediverse (AKA Mastodon, Pleroma, and Friendica)

There are lots of “new” social media sites and software cropping up these days, largely in response to Facebook’s terrible behaviour, but also because a lot of things are coming together that make it easy to do so. Social media in general has reached critical mass and has become so ingrained in modern societies that pretty much any social network has a chance to take hold. On the practical side, programs like Mastodon (and Pleroma, Friendica, etc.) are almost trivial to install and maintain. This makes it possible and fun for almost anyone with basic sysadmin skills to deploy an instance and join the Fediverse.

The Fediverse

I am going to try to avoid turning this into a Fediverse post so I will explain the Fediverse only to the level of detail needed to continue with the point of this post.

The term “Fediverse” is a colloquial portmanteau of “federated universe”. Federation essentially means to join or form together, and that accurately describes how Fediverse applications like Mastodon work. An individual Mastodon instance becomes aware of other Mastodon instances and they start to share posts from their respective users with each other. The instances continue to join with other instances they become aware of over time, and the connections - or federation - continue to grow. Instances that are federated with each other will display the public posts from each other on their own timelines. Therefore, a public post I make on Instance A will appear on the public federated timeline of Instance B automatically. This is done by Instance A actually sending my post to Instance B, so there are now two copies of it, one on each instance.
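To make the "two copies" point concrete, here is a toy model in Python. It is only a sketch of the idea and not how Mastodon or ActivityPub are actually implemented: the Instance class, its methods, and the example domains are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """A toy stand-in for a Fediverse server (hypothetical, not Mastodon code)."""
    domain: str
    peers: set = field(default_factory=set)  # domains this instance knows about
    federated_timeline: list = field(default_factory=list)  # posts held locally

    def federate_with(self, other: "Instance") -> None:
        # In this sketch federation is mutual: each side learns about the other.
        self.peers.add(other.domain)
        other.peers.add(self.domain)

    def publish(self, author: str, text: str, network: dict) -> None:
        post = {"author": f"{author}@{self.domain}", "text": text}
        self.federated_timeline.append(post)
        # Delivery means handing every known peer its own copy of the post.
        for domain in self.peers:
            network[domain].federated_timeline.append(dict(post))


a = Instance("instance-a.example")
b = Instance("instance-b.example")
network = {a.domain: a, b.domain: b}

a.federate_with(b)
a.publish("me", "Hello, Fediverse!", network)

# Both instances now hold their own separate copy of the same post.
print([inst.domain for inst in network.values() if inst.federated_timeline])
```

The thing to notice is that Instance B's copy is an independent object. Once it has been delivered, Instance A has no direct control over it.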

A visual representation of the Fediverse gives some idea of what it looks like. You can easily find the more influential instances by selecting the “Activity” color coding option and observing the level of opacity of the rays coming from each instance. The instances with more opaque rays have more activity.

In practical terms, this means that although there are thousands of individual Fediverse instances, it doesn’t matter terribly much which one you join. You’re able to directly interact with people on other instances due to federation, so being on a different instance from another person doesn’t significantly lessen your ability to interact with them. Assuming your two instances know of each other, your posts will still reach them, and their posts will still reach you.

Fediverse applications use the ActivityPub protocol, which lends itself well to a Twitter-like social media experience rather than a Facebook-like experience. By that I mean it has weak friend support - it’s basically limited to “Following” a user or direct messaging a user, as you’d do on Twitter, rather than the rich two-way private friendship, complete with lots of visual media, that you can establish on Facebook.

The two most common ways Fediverse instances discover each other are:

  1. The admin of the instance purposely federates with another instance using a relay.
  2. A user on an instance follows or mentions a user on another instance.
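The second path can be sketched in a few lines of Python. This is a simplified illustration, assuming handles of the form @user@domain; the regular expression, the function name, and the example domains are mine, and real servers do a proper lookup of the remote account (via WebFinger, in Mastodon's case) that this sketch skips entirely.

```python
import re

# Matches handles of the form @user@domain (a simplified pattern for illustration).
HANDLE = re.compile(r"@([A-Za-z0-9_]+)@([A-Za-z0-9.-]+\.[A-Za-z]{2,})")


def domains_mentioned(text: str) -> set:
    """Return the remote domains referenced by @user@domain handles in text."""
    return {domain.lower() for _user, domain in HANDLE.findall(text)}


known_peers = {"relay.example"}  # path 1: added deliberately by the admin
post = "Thanks @alice@instance-b.example and @bob@instance-c.example!"
known_peers |= domains_mentioned(post)  # path 2: learned from user activity

print(sorted(known_peers))
```

Nothing in the second path requires the admin to do anything, which is why the list of known instances grows on its own.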

OK, that should be enough to get us going.

Social network privacy

Privacy is the ability of an individual or group to seclude themselves, or information about themselves, and thereby express themselves selectively.

I think we can all agree by now that there’s really no such thing as privacy on social networks. Nobody has yet found a way to make users pay money for the use of a social network, therefore the networks have to make their money elsewhere. The most obvious way to make money is to sell advertising targeted at users of the network, which all the social network sites do now. Inherent in the ability to sell targeted ads is the collection of as much user data as possible. A network can’t fulfill an ad request to target “millennial white males making over $50K in Toronto with two or more dogs” if the social network isn’t collecting data such as gender, birthdate, income, and lifestyle data.

At this point in our collective social network evolution, it’s arguably not possible for a social network to survive without harvesting and selling aggregate user data.

Privacy on Facebook, Twitter, and the Fediverse

Facebook and Twitter harvest user data. They make that clear in their respective terms of service and privacy policies. There’s no secret there. The Fediverse, however, isn’t a company. The Fediverse has no terms of service or privacy policy because it is just a protocol and some software. If there is a privacy policy or terms of service to be found anywhere, it is because an individual instance administrator has published it, in which case its scope is limited to that instance alone.

And therein lies probably the biggest privacy issue with the Fediverse: based on how federation works, users have no way to tell where their posts go or how long they live.

To make the issue easier to understand, I’ll use my home instance as an example. My Fediverse account is on Hackers.Town (HT). HT is a private instance that does not allow open registrations. As such, it has a fairly low user count of 145 (at the time of this writing). Let’s look at the HT stats harvested by the Fediverse Network project here.

Hackers.Town stats as of the time of this writing

At only 145 users, HT is federating with 4,849 other instances that it knows about.

Another Fediverse stats site tells me that HT has over 11,000 peers.

Obviously there is some work to be done in stat collection overall, but the point is that my public posts on HT go to thousands of other instances that I do not have a relationship with. I don’t have any agreement with the admins of those other instances and those admins aren’t required to uphold the terms of service of the instance I do have a relationship with, Hackers.Town.
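If you would rather check a peer count yourself than rely on a stats site, Mastodon documents a public API endpoint, /api/v1/instance/peers, that returns the domains an instance is aware of as a JSON array. The Python below is a minimal sketch built on that. Treat it as an assumption that any particular instance, Hackers.Town included, leaves the endpoint enabled, and other Fediverse software may not offer it at all. The function names are mine, and the sample response is made up so the example runs offline.

```python
import json
import urllib.request


def peers_url(domain: str) -> str:
    # Mastodon documents this public endpoint; admins can disable it,
    # and other Fediverse software may not provide it.
    return f"https://{domain}/api/v1/instance/peers"


def count_peers(payload: str) -> int:
    """Count the unique domains in a JSON array of peer domain strings."""
    return len(set(json.loads(payload)))


def fetch_peer_count(domain: str, timeout: float = 10.0) -> int:
    """Fetch and count the peers an instance reports (makes a network request)."""
    with urllib.request.urlopen(peers_url(domain), timeout=timeout) as response:
        return count_peers(response.read().decode("utf-8"))


# Offline demonstration with a made-up response body.
sample = '["instance-b.example", "instance-c.example", "instance-b.example"]'
print(count_peers(sample))
```

Calling fetch_peer_count("hackers.town") would make the live request. A list of known peers is not the same thing as a list of instances that hold copies of my posts, which may be part of why different stats sites report such different numbers.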

The big Fediverse apps (Mastodon and Pleroma) are open source, and presumably the rest are too. That means an admin of an instance has full access to the code, which is a good thing for society at large. However, it does mean an instance admin could do some bad things. In the scope of privacy, one of the most egregious things an admin could do is stub out the ‘post delete’ code (somewhere around here, I think: /app/lib/activitypub/activity/delete.rb), which would have the effect of the instance not honouring delete requests. In that case, the instance becomes a vacuum cleaner, hoovering up every post that comes its way and storing it away for…why?
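Here is a rough Python sketch of what "stubbing out delete" amounts to. It is not Mastodon's code, which is written in Ruby and does a lot more, such as checking who sent the request. The class names are hypothetical, and the Delete activity is reduced to the bare minimum of fields.

```python
def make_delete(actor: str, post_id: str) -> dict:
    # Simplified shape of an ActivityPub Delete activity (many fields omitted).
    return {"type": "Delete", "actor": actor, "object": post_id}


class HonestInstance:
    """Hypothetical instance that removes a post when asked to."""

    def __init__(self) -> None:
        self.posts = {}  # post id -> post text

    def receive_post(self, post_id: str, text: str) -> None:
        self.posts[post_id] = text

    def receive_activity(self, activity: dict) -> None:
        if activity.get("type") == "Delete":
            self.posts.pop(activity["object"], None)


class HoardingInstance(HonestInstance):
    """Same as above, but with the delete handling stubbed out."""

    def receive_activity(self, activity: dict) -> None:
        pass  # silently ignore everything, including Delete


post_id = "https://instance-a.example/posts/1"
honest, hoarder = HonestInstance(), HoardingInstance()
for instance in (honest, hoarder):
    instance.receive_post(post_id, "A post I will regret")
    instance.receive_activity(
        make_delete("https://instance-a.example/users/me", post_id)
    )

# The honest instance dropped the post; the hoarder still has it.
print(post_id in honest.posts, post_id in hoarder.posts)
```

From the sending side the two instances look identical. Both accepted the delete request, and only one of them acted on it.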

How is the Fediverse less private than existing social networks?

Keeping in mind that the definition of privacy I’m working with is the ability to selectively express myself, I suggest that the Fediverse is less private than existing social networks for three main reasons:

My individual data is sent to unknown third parties

On Facebook and Twitter (and the rest), I send my data to Facebook and Twitter. And yes, they can and do stuff with it but at least I know who did that stuff and I gave them permission to do so. The Fediverse offers no such identification method - I do not know who my data is given to, but I do know for certain that it is shared with unknown parties.

The Fediverse shares my individual (not aggregate) data

Facebook and Twitter don’t technically sell user data. They sell aggregate user data, which is what allows them to target my millennial dog owner from earlier in the post. They can’t, or aren’t supposed to, sell me, personally. That is what prevents advertisers from saying “I want Jon Watson to see this ad”.

Conversely, because I have no relationship with most of the parties that hold my Fediverse data, they can make use of my individual data without my consent or even notification that they’re doing so.

The Fediverse provides no way for me to delete all my account data

There is a lot of privacy legislation aimed at social networks these days. A lot of it has to do with ensuring that users have access to their full data upon request and also have the ability to insist that their account and data be deleted if desired. Unfortunately, federation is not an exact science. The vagaries of the internet, different site configurations, malicious sites, and broken instances can all contribute to messages not being federated properly to all sites. That means that not only is a user unable to determine all the instances that may have their data, it’s also not possible to be certain that user data has been deleted completely, across the entire Fediverse, upon request. And that’s not likely to be fixed soon. Because there is no “Fediverse, Inc.”, there is no entity that can be regulated by privacy laws into developing ways to comply with this type of privacy legislation.

So what?

Good question. I use the Fediverse as my primary social media so obviously I am OK with all these issues. Mostly, I use the Fediverse because I actually like it. I find the level of discourse to be higher than on Facebook or Twitter, and because I am aware of the unique privacy issues of the Fediverse, I am able to tailor my interactions accordingly.

The reach of a single Fediverse instance can be wide, even if it is a very small instance. I was given permission by the admin of a single-user Mastodon instance to use his site to show how a very small instance federates as easily as larger instances. It’s a good example of how a single-user instance still shares data with almost 5,000 other sites. While I don’t worry that this particular admin is doing this, it serves to illustrate how anyone can set up a locked down, inaccessible Mastodon instance and just collect data from all over the Fediverse for any reason at all.

Single-user Mastodon instance stats

I am not ignorant of the fact that there’s very little rich user data on the Fediverse to capture. Facebook is an extremely rich trove of user data, so the risks are much higher when that data is shared than the risk of my relatively meagre Fediverse data being shared. However, the richness of Fediverse user data is not inherently restrained in the protocol, so there’s room for that pool of data to get richer over time, and privacy controls may not develop at the same rate.

Finally, I’d like to try to proactively address some comments which may be coming.

  1. I acknowledge that there are privacy controls within the Fediverse apps and protocols, such as direct messages and post-specific settings like “followers only” and “unlisted”. However, because the underlying federation aspect will send at least some of my posts to unknown people without my consent or knowledge, I don’t feel that those controls comprehensively support the ability to “selectively” express myself in the scope of the Fediverse at large.
  2. I’ve heard the argument that a public post on Facebook or Twitter is the same as a public post on the Fediverse and therefore a Fediverse post poses no more risk. However, I disagree with this argument for the three reasons I’ve stated earlier in this post, as well as a more nuanced reason. I’ve already stated that when I give data to Facebook or Twitter, I know I am giving it to them. Therefore, there’s also only one place for someone to harvest that data from Facebook or Twitter and those companies exercise some control over data scraping and API usage. That is a very different circumstance than a rogue Mastodon instance silently copying the entire Fediverse.

You can view comments on this post on the Fediverse here. If you want to participate, you’ll need to create a Fediverse account on an open node.