De-scaling Geneac
On July 8th, 2018, I made the first commit to Geneac. From the beginning, I made a conscious design decision to use “off-the-shelf” components for the project, such as the Administrate gem for the editor interface. I also optimized for deploying to Heroku, which was free and came with Postgres and Redis. Frickin’ sweet! Heroku also encouraged good web app development patterns that shaped how I think about scalable web apps - for one, don’t save to disk! Use a storage service like AWS S3 or Azure Blob Store!
Unsurprisingly, Heroku stopped offering their free tier a few years ago (you know, right around the time that crypto mining was all the rage). In the time since, I’ve bounced my own copy of Geneac between VMs both in and out of the cloud. The ability to move Geneac around was also a key design goal - or, if not to move it exactly, then to import & export its data easily.
Here’s the thing, though: this is not actually easy. The current version of Geneac can create export files with an HTML file for each note, an image file for each uploaded image, and JSON files for the other structured data. The HTML files need an unfortunate amount of preprocessing to get to a nice-enough state to view on their own - to say nothing of the special handling they need to get back into Geneac, should you try to restore from a backup.
So, I’ve found it easier to just run big SQL dumps with `pg_dump` and full storage dumps with `rclone` to get all the raw data, then execute/re-upload it against the target database and storage bucket. Hmm, can this be the official way to do backups? Kind of, but not really. If you’re using a managed database where you don’t necessarily control which version of Postgres you run, and you try to use the default `pg_dump` inside the application container, it will refuse to run unless the versions match! Dang.
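Concretely, the routine boils down to two commands. A hedged sketch - the hostname, database, and rclone remote below are placeholders, not my real setup:

```bash
# dump everything as plain SQL (pg_dump must be at least as new as the
# server, or it refuses to run - the version papercut mentioned above)
pg_dump --host my-db.example.com -U geneac -d geneac_production > backup.sql

# mirror the uploads bucket locally ("azure-blob" is a hypothetical rclone remote)
rclone sync azure-blob:geneac-uploads ./uploads-backup
```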
There are a lot of these little papercuts that pop up when running a “scalable” web service like Geneac. It’s applying everything (well, most things) that I’ve learned in my professional career as a software engineer working in The Cloud™. But it’s work to maintain, and what if I just want it to be… simple?
SQLite
I switched Geneac to SQLite. In the unlikely event that you are a current user of Geneac:
- Please email me, I am extremely curious
- This is going to f%#* your s#*$ up.
SQLite uses a single file. Well, you know what? I run my service on a single host. I can mount a volume in my Docker container with the file in it for persistence. This will work just fine.
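With Kamal this amounts to a `volumes:` entry in `config/deploy.yml`; the raw Docker equivalent is roughly this (the image name and mount path are placeholders, not Geneac’s actual ones):

```bash
# a named volume holding the directory where Rails keeps its SQLite file
docker volume create geneac-data
docker run -d --name geneac -v geneac-data:/rails/storage geneac:latest
```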
I ran trusty `pg_dump` to dump my production database:
```bash
# technically I did this inside the app container using Kamal
pg_dump --host **********.**********.postgres.database.azure.com -d rysavywinz -U geneacProdAdmin -a --column-inserts --no-owner -N pg_search_documents --no-comments > dump.sql
```
This actually got me most of the way there. I had to turn to find-and-replace for two things in the resulting `dump.sql`:
- Remove all uses of `public.` as the schema name, since that schema doesn’t exist in the SQLite database.
- Change the `active_storage_blobs` table entries to use the `local` storage service rather than the `microsoft` one (this can be done in an `UPDATE` after you migrate too; see the sketch below.)
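Both fixups are mechanical enough to script. A hedged sketch - these aren’t the exact commands I ran, and the `UPDATE` assumes a Rails version where `active_storage_blobs` carries a `service_name` column:

```bash
# blunt schema-prefix removal; eyeball the output, since this will also
# rewrite any literal "public." that happens to appear in your data
sed 's/public\.//g' dump.sql > dump.fixed.sql

# alternative to editing the dump: flip the storage service after the import
sqlite3 my-database.db \
  "UPDATE active_storage_blobs SET service_name = 'local' WHERE service_name = 'microsoft';"
```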
Then you can run `sqlite3 my-database.db < dump.fixed.sql` and you’re good to go. Make sure to update `database.yml` if you have to, but the default Geneac one is configured correctly now.
Full text search
pg_search is one of the killer features of Rails and Postgres, in my opinion. When I started using it, I had no idea what I was doing - I just knew I had functional search without a lot of effort. Of course, that’s the point: it hides a lot of details to make the process look easy! Well, after 3 years of Rails and database experience at GitHub, I found that the nuts and bolts of full text search were not that difficult. This blog post by Mario Alberto Chávez was also a big help for me. Rather than implement a model-specific search like his, though, I replicated the “multisearch”-esque feature of pg_search. This could really be its own gem, but for now, it’s wired up just in Geneac.
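Geneac’s version is wired into the app, but the underlying mechanism is small. As a rough illustration - the table and column names here are made up, and I’m reaching for SQLite’s built-in FTS5 extension rather than showing my actual code:

```bash
# hypothetical pg_search-multisearch-style index using SQLite FTS5
sqlite3 my-database.db <<'SQL'
CREATE VIRTUAL TABLE IF NOT EXISTS search_documents
  USING fts5(searchable_type UNINDEXED, searchable_id UNINDEXED, content);

-- each model writes its searchable text into the one shared table
INSERT INTO search_documents (searchable_type, searchable_id, content)
  VALUES ('Note', '42', 'Great-grandpa arrived at Ellis Island in 1903');

-- a single query then searches across every model at once
SELECT searchable_type, searchable_id
  FROM search_documents
  WHERE search_documents MATCH 'grandpa';
SQL
```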
Kamal and Docker
With no dependency on S3/Azure Blob Store or on a separate database server, deployment to a single VM is super easy. I started using Kamal a while ago, and version 2.0 comes with its own proxy that handles SSL certs - a key pain point of the 1.x series!
Putting it all together
Now, I can run `bundle exec kamal deploy` to deploy the latest version of Geneac. The whole thing runs on a single VM with all of the data stored in a single Docker volume, which I can easily just `tar` up and move somewhere else if needed. It couldn’t be simpler and I love it! Performance is not really a huge concern, but if it becomes one, scaling vertically to a larger VM is really my only option. But if I ever want to scale out to multiple machines again, I should be able to undo these changes without much hassle!
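The move itself is one throwaway container away - a sketch, using the same placeholder volume name as above:

```bash
# archive the volume's contents into the current directory via a scratch container
docker run --rm -v geneac-data:/data:ro -v "$PWD":/backup \
  alpine tar czf /backup/geneac-data.tar.gz -C /data .
```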
Overall, I am happy that this setup increases my confidence in distributing Geneac as an application that others can easily self-host with Docker (or without Docker, if that’s what you want to do, I guess.)