Do you know that feeling of waking up in the morning only to learn that your website and self-hosted services are down for an unknown reason? That moment when you feel your VPS has literally moved to the cloud?
This is exactly what happened to me one day in March 2021. I woke up, checked the news and learned that there had been a fire in SBG2 (sources: reference1 reference2). My first thought was: this is why we do backups. Then I realized I was really going to have to use them. My main VPS was there and it was not responding.
When the fire started, I already had some automation in place to set up all of the services. I just realized it was not enough. Since I had to redo my server from the ground up, this seemed like the right time to rethink the approach and try doing things a little differently.
The usual thing to do after such a situation is to perform a post-mortem analysis, asking yourself what could be done better to avoid such issues in the future. Hosting things on a single VPS instance has its limitations: the server is obviously the bottleneck. I decided to focus on three things:
- Reliability - can I make sure at least some of the services stay available when the server goes down?
- Ease of setup - situations like this one can and will happen again, so let’s make sure I don’t spend too much time setting it up next time
- Reproducibility & testability - I want to be able to reproduce this setup anywhere and, if possible, test changes on a different server before introducing them to (for lack of a better word) production
Phoenix rises from the ashes
With all those goals in mind, I came up with a set of ideas. For reliability, I decided to rewrite all of my pages (including this blog) as static ones and host them using GitLab Pages. The reason is simple and a bit selfish: if their infrastructure goes down, it’s not my problem. It’s also much more sophisticated than a single VPS, so chances are my pages are much safer there.
What about the other services? A while ago I wrote about hosting the Caddy server (see https://blog.michal.pawlik.dev/posts/caddy-reverse-proxy/). I set up that server, along with a few other things, using an Ansible playbook.
If you haven’t tried it: Ansible is a way to automate deployments using a declarative definition in the form of YAML files. Its modules abstract away the details, so you don’t have to care how to remotely copy files, set up Docker and so on.
With Ansible I can abstract away the implementation details, which lets me deploy to different Linux distributions. The deployment is as easy as running a single command that invokes the Ansible playbook.
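To give an idea of what that looks like, here is a minimal sketch of a playbook. It is an illustration only, not the actual playbook behind this server: the host group `vps`, the file name `site.yml` and the local `files/Caddyfile` are assumptions made for the example, and it assumes a `caddy` package is available in the distribution’s repositories, which is not the case everywhere.

```yaml
# site.yml - illustrative sketch, not the actual playbook.
# Assumptions: the inventory has a host group called "vps", and a file
# named Caddyfile sits in a files/ directory next to this playbook.
- name: Set up a web server
  hosts: vps
  become: true
  tasks:
    # The generic "package" module picks apt, dnf, etc. based on the
    # target distribution, so the task itself stays distro-agnostic.
    - name: Install Caddy
      ansible.builtin.package:
        name: caddy
        state: present

    - name: Copy the Caddy configuration to the server
      ansible.builtin.copy:
        src: Caddyfile
        dest: /etc/caddy/Caddyfile
        owner: root
        group: root
        mode: "0644"
      notify: Reload Caddy

    - name: Make sure Caddy is running and starts on boot
      ansible.builtin.service:
        name: caddy
        state: started
        enabled: true

  handlers:
    # Runs only if the configuration file actually changed.
    - name: Reload Caddy
      ansible.builtin.service:
        name: caddy
        state: reloaded
```

Assuming an inventory file called `inventory.ini` that lists the server under `[vps]`, the single command would be something like `ansible-playbook -i inventory.ini site.yml`. Pointing the inventory at a different machine is all it takes to run the same setup somewhere else.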
When it comes to testing: before buying a new VPS, I first set up a virtual machine on my PC and deployed everything to it. After that I was ready to deploy to my new server. This is how the phoenix rose from the ashes. This was the first time I had no trouble coming up with a name for my server!
That experience taught me that disasters like this can teach you something. So can performing a post-mortem analysis, even if it’s about a minor personal project. Automating things is fun and satisfying when done well, and so is learning new things in general.