After last week's embarrassingly handled WebP 0-day, I realized my Synapse instance was sorely out of date. Unfortunately, the Dockerfile I had been using to manage that service on my Wobscale server was also out of date and didn't build with more recent versions of Synapse. Rather than using the upstream Debian-based Dockerfile, I was using one prepared by my dear friend iliana, one which she stopped using quite a while ago and which I was maintaining myself. Welp nyaa~.
After briefly considering migrating to spantaleev/matrix-docker-ansible-deploy, and doing some math on exactly how much data a federating Synapse node passes in a week or a month, I decided I would move the Synapse install onto my home network, with my Wobscale 1U acting as a reverse proxy to my homelab machine over Tailscale.
And so on Friday afternoon I decided to wreck my sleep schedule and migrate across.
Step 1: Migrating Synapse
I started by pulling a 45GB psql dump text file off the server (-rw-r--r--. 1 root root 45G Sep 22 16:29 /srv/files/services/postgres/backups/synapse-1695424808.sql) and populating a DB with that. Unfortunately for me, I set up the database incorrectly in two ways: the first was that everything I imported was owned by the postgres user instead of the synapse user, and the second was that I set up the database with incorrect collation rules.
I did some absolutely stupid things rather than recreate the DB. I'm the only user on this server and Personal Software Can Be Shitty, so Hey Smell This applies.
GRANT ALL ON ALL TABLES IN SCHEMA public TO "matrix-synapse"; was not enough because Synapse wanted to own the tables, so I munged the output of SELECT * FROM pg_tables; into a bunch of ALTER TABLE account_data OWNER TO "matrix-synapse"; statements using Emacs macros. I also had to do the same for the sequences listed by SELECT * FROM pg_sequences;
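For anyone without Emacs macros at hand, the same munging can be sketched in a few lines of Python. This is only a sketch of the idea, not what I actually ran: the helper names (quote_ident, owner_statements) and the sample relation names are made up for illustration, and the real lists would come from the tablename column of pg_tables and the sequencename column of pg_sequences. The role name has a hyphen in it, so it needs double quotes.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def owner_statements(kind: str, names: list[str], new_owner: str) -> list[str]:
    """Build one ALTER <kind> ... OWNER TO statement per relation name."""
    if kind not in ("TABLE", "SEQUENCE"):
        raise ValueError(f"unsupported relation kind: {kind}")
    return [
        f"ALTER {kind} public.{quote_ident(n)} OWNER TO {quote_ident(new_owner)};"
        for n in names
    ]


# Hypothetical sample; the real lists would come from the tablename column
# of pg_tables and the sequencename column of pg_sequences.
tables = ["account_data", "events"]
sequences = ["events_stream_seq"]

statements = owner_statements("TABLE", tables, "matrix-synapse")
statements += owner_statements("SEQUENCE", sequences, "matrix-synapse")
print("\n".join(statements))
```

The printed statements can then be pasted into psql as a superuser.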
I had to enable database.allow_unsafe_locale = true; in my Synapse configuration, which might break some indexing stuff but it seems like it's working fine so far (lol sob). After creating the indexes my disk was pretty full, and it took 4 hours to import the .sql dump. I went to go see a movie and this still hadn't finished when I got back.
At this point I had a Synapse instance which wasn't federating but was running and passing health checks.
Step 2: Migrating mx-puppet-discord
I only run two appservices these days, and luckily both of them are in nixpkgs. mx-puppet-discord lets your Matrix server pretend to be a Discord client on your behalf, and despite it being a pain in the ass to run and set up and manually move all the rooms into spaces and be forced into joining a bunch of rooms you never look at, it's better than using the Discord Electron client to chat in the three or five rooms I care about. So what if it gets me flagged as a bot or force-logged-out every six months? I will do Computer Crimes.
Unfortunately mx-puppet-discord is basically unmaintained/EOL these days, with the suggestion being to move to mautrix-discord, which is not packaged. I might deal with that later on. There is a nixpkgs PR that has been marked as Stale, but anyways.
mx-puppet-discord's nixpkgs package builds on Node 14, which is EOL and marked as Insecure in nixpkgs, so I manually patched my nixpkgs to move it to Node 18.
And then getting the app service registration files to line up was a whole pain in the ass; nixpkgs services increasingly use a systemd feature called Dynamic Users, where the service manager creates an ephemeral high-uid user when the service first starts up and then locks down the user to only being able to access a few filesystem directories, which is basically more secure out of the box. This is nice, but also causes me pain when I want to share a file between a dynamic user and others, like with the registration file. I ended up just copying this thing out of the /var/lib/mx-puppet-discord directory and pointing Synapse to load that one, but if I ever have to rotate these files it'll be a bit of a pain. I couldn't find any other prior art on how people handle this better. With that working, configuring the damn thing should have been this simple, but it never is, is it?
Discord -> Synapse messaging worked. That is, I could receive messages from my guilds and 1-1 conversations. But I could not send messages: Synapse would receive a 401 Unauthorized from the app service, and this was a huge pain in the ass to debug because it's a Node project and anything relevant is 2-3 transitive dependencies away from the service definition. I ended up doing More Crimes to investigate this (remounting /nix/store/ as rw and adding console.log statements to the transitive dep) and found, hilariously, a casing concern in sorunome/matrix-js-bot-sdk. My old Synapse instance must have used an access_token to auth to the appservice, but the new version sends an Authorization header, which Express helpfully downcased to authorization when it stored it. Rather than indexing req.headers with a fixed casing, you should use req.get or req.header, which are case-insensitive.
This is a dep that hasn't been touched in years on an EOL service, so I just patched it myself in my nixpkgs clone by adding this to the postInstall:
shell source:
substituteInPlace $out/lib/node_modules/@mx-puppet/discord/node_modules/@sorunome/matrix-bot-sdk/lib/appservice/Appservice.js \
  --replace 'req.headers["Authorization"]' 'req.get("Authorization")'
At that point the puppet bridge worked.
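The casing bug is easy to reproduce in miniature. The sketch below is Python rather than the bridge's JavaScript, and it only mimics the behaviour described above: header names are stored lowercased, so a lookup with the original casing misses while a case-insensitive lookup hits. The helper names and the token value are made up for illustration.

```python
from typing import Optional


def store_headers(raw: dict[str, str]) -> dict[str, str]:
    """Store header names lowercased, mimicking what the bridge saw in req.headers."""
    return {name.lower(): value for name, value in raw.items()}


def get_header(headers: dict[str, str], name: str) -> Optional[str]:
    """Case-insensitive lookup, playing the role of req.get(name)."""
    return headers.get(name.lower())


# Hypothetical request from the homeserver to the appservice.
headers = store_headers({"Authorization": "Bearer hs_token_placeholder"})

print(headers.get("Authorization"))          # lookup with the original casing misses
print(get_header(headers, "Authorization"))  # case-insensitive lookup finds it
```

The first lookup is the shape of the bug; the second is the shape of the one-line patch.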
Step 3: Heisenbridge
Heisenbridge is such good software; ever since I migrated off of the matrix-appservice-irc service, my quality of life as a single-user instance owner has increased greatly. I enabled the service and started it and it worked.
You can see the downsides of the mx-puppet-discord Dynamic User setup in heisenbridge's lack thereof:
nix source:
services.matrix-synapse.settings.app_service_config_files = ["/var/lib/heisenbridge/registration.yml"];
users.users.matrix-synapse.extraGroups = ["heisenbridge"]; # to access registration file
Step 4: Setting up reverse proxy and re-federating
So I'll admit I have a pretty non-standard setup in my network stack right now. All of my domains point at an edge server which runs nginx and Tailscale and one or two more things now that it isn't running a Synapse stack. Manually configured nginx frontends on that edge route traffic over Tailscale to nginx frontends on the server in my living room, controlled by NixOS, which then route traffic to services.
I set up a simple listen 443 ssl rule on my edge which got me to the point where I could send messages, and I could receive messages, and it only took me an hour to realize this was not the full story: I wasn't properly federating. I could receive inbound federated messages, but my messages were not federating out of my server. The Matrix Federation Tester of course does not test this.
So, reader, a quiz: What is wrong with this location block?
conf source:
location / {
    proxy_pass http://last-bank/;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header X-Forwarded-Host $http_host;
}
I'll give you a hint: BadSignatureError: Signature was forged or corrupt.
Now, I have been using libsodium for work and this raised my hackles. What is it verifying? Someone did the work for me back in 2018 to figure this out: Synapse signs the URL as part of the federation request, and this may be impacted by reverse proxy settings. And it's spelled out in the Synapse docs even!
NOTE: Your reverse proxy must not canonicalise or normalise the requested URI in any way (for example, by decoding %xx escapes). Beware that Apache will canonicalise URIs unless you specify nocanon.
Then in what circumstances does nginx canonicalize the URL? If you specify any URI path in the proxy_pass rule, even just a trailing slash, nginx will silently normalize the request URI! My previous setup did not have this proxy_pass rule in place, and changing it to proxy_pass http://last-bank (no trailing slash) was enough to make it all glue together.
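A rough illustration of why a normalizing proxy breaks signature checks: if the signature covers the exact request URI, decoding the %xx escapes in transit changes the string being verified. In the sketch below the HMAC is only a stand-in for the real federation signature scheme, the key and event ID are made up, and unquote is a simplification that does not model nginx's exact rewriting.

```python
import hashlib
import hmac
from urllib.parse import unquote

KEY = b"not-a-real-signing-key"  # hypothetical shared secret, only for this demo


def sign(method: str, uri: str) -> str:
    """Stand-in signature: an HMAC over the method and the exact URI string."""
    message = f"{method} {uri}".encode("utf-8")
    return hmac.new(KEY, message, hashlib.sha256).hexdigest()


# What the sending server signed and sent: the event ID is percent-encoded.
sent_uri = "/_matrix/federation/v1/event/%24abc%3Aexample.org"
signature = sign("GET", sent_uri)

# A proxy that decodes %xx escapes hands the upstream a different string.
normalized_uri = unquote(sent_uri)

print(normalized_uri)
print(hmac.compare_digest(signature, sign("GET", sent_uri)))        # untouched URI verifies
print(hmac.compare_digest(signature, sign("GET", normalized_uri)))  # normalized URI does not
```

Nothing about the key or the payload changed between the two checks; only the path did, which is why the failure looks like a forged signature rather than a routing problem.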
Step 5: Picking up the pieces
I don't like carrying patches to nixpkgs, and adding two more on top of a handful of cherry-picks from nixpkgs-unstable to nixos-23.05 was a bit of a bummer; these monkey patches for mx-puppet-discord are not things I enjoy carrying. I am going to evaluate moving to mautrix-discord and maybe package some other mautrix services ASAP, but for now this thing is working.
Synapse was one of the last services running on my Wobscale server; it was originally deployed back in like 2017 and has served me well, but it's a geriatric Fedora Linux install that is now only running my Wallabag server. Once I migrate that to The Wobserver, I'll be able to turn this host down and ask ili for a small VM that can do the edge networking and be managed by my NixOS ecosystem instead of hand-tuned nginx configurations. That'll be nice.
This simultaneously took more and less work than I expected it to, and it's certainly not a perfect migration, but it is nice to be done with. It took about 22 hours of downtime all said and done, including some time spent sleeping while the thing was semi-functional.
This migration was a huge fnord for months where I would say "i should update my synapse, ugh, i should migrate my synapse to nixos, ugh, i should just sign up for beeper.com and never touch synapse again" every time my disk would fill up and I would have to do some stupid bullshit to clean it up enough to run VACUUM FULL on the Synapse DB. It's still 65 fucking GB of old events I never want to see, and I recently learned why: "unfortunately the matrix-synapse delete room API does not remove anything from state_groups_state. This is similar to the way that the matrix-synapse message retention policies also do not remove anything from state_groups_state." This kills me, and this is probably why my next step will be to set up matrix-synapse-diskspace-janitor.
Archival resources and references
spantaleev/matrix-docker-ansible-deploy
Matrix (An open network for secure, decentralized communication) server setup using Ansible and Docker.
Ended up not using this, but it's a neat thing to point people towards; it supports a lot of the appservices I'm going to have to package myself.
NixOS 23.05 manual # Matrix setup section
The NixOS manual has a simple configuration for deploying Matrix.org which would have shortcut some of my problems including the database collation shit, alas.
Dynamic Users with systemd
TL;DR: you may now configure systemd to dynamically allocate a UNIX user ID for service processes when it starts them and release it when it stops them. It's pretty secure, mixes well with transient services, socket activated services and service templating.
I do think this is pretty neat and I like it even if it's a pain in the ass to do "unsafe" things like share files between related services as a result.
cyberia/matrix-synapse-diskspace-janitor
move mx-puppet to supported node 18 · 11dae907f0 - nixpkgs - rrix's code with a cup of tea
My nixpkgs commit to change to nodejs_18
mx-puppet-discord: fix header access bug in matrix-bot-sdk ... · a4bac7bfb2 - nixpkgs - rrix's code with a cup of tea
The substituteInPlace hack I put in for the matrix-bot-sdk bug