Copycat DB
Copycat DB is a service to take a backup of our database. It uses the Scaleway CLI to take backups of the database, and uploads them to an offsite bucket.
This bucket has an object lock configured, so backups cannot be deleted before expiry. Conversely, the service also deletes backups older than some threshold when it creates a new one to avoid indefinite retention.
In production the service runs as a cron job, scheduled using a systemd timer.
These backups are in addition to the regular snapshots that we take, and are meant as a second layer of replication. For more details, see our Reliability and Replication Specification.
Quick help
View service status (it gets invoked as a timer automatically, doesn't need to be started/stopped manually):
sudo systemctl status copycat-db
View logs locally (they'll also be available on Grafana):
sudo tail /root/var/logs/copycat-db.log
Name
The name copycat-db is a riff on "copycat", which is what we call our museum instance that does the object replication. This one replicates the DB, so, copycat-db.
Required environment variables
SCW_CONFIG_PATH
Path to the config.yaml used by the Scaleway CLI.
This contains the credentials and the default region to use when trying to create and download the database dump.
If needed, this config file can be generated by running the following commands at a shell prompt in the container (using ./test.sh sh):
scw init
scw config dump
SCW_RDB_INSTANCE_ID
The UUID of the Scaleway RDB instance that we wish to back up. If this is missing, then the Docker image falls back to using pg_dump (as outlined next).
PGUSER, PGPASSWORD, PGHOST
Not needed in production when taking a backup (since we use the Scaleway CLI to take backups in production).
These are used when testing a backup using pg_dump, and when restoring backups.
RCLONE_CONFIG
Location of the rclone config file, which names the destination bucket where the backups are saved and holds the credentials needed to access it.
Specifically, the config file contains two remotes:
- The bucket itself, where data will be stored.
- A "crypt" remote that wraps the bucket by applying client side encryption.
The configuration file will contain (lightly) obfuscated versions of the password, and as long as we have the configuration file we can continue using rclone to download and decrypt the backups. Still, it is helpful to also keep the original password separately so that the file can be recreated if needed.
A config file can be generated by running the following commands at a shell prompt in the container (using ./test.sh sh):
rclone config
rclone config show
When generating the config, we keep file (and directory) name encryption off.
Note that rclone creates a backup of the config file, so Docker needs to have write access to the directory where it is mounted.
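As an illustration of the two-remote layout described above, here is a minimal sketch that writes a placeholder rclone config to a temporary file and lists the remotes it defines. The remote name db-backup, the s3 backend settings, and every value in angle brackets are assumptions for illustration only; the real file should come from rclone config.

```shell
#!/bin/sh
# Write an illustrative rclone config to a temporary file.
# All names and values below are placeholders, not the real configuration.
conf="$(mktemp)"
cat > "$conf" <<'EOF'
[db-backup]
type = s3
provider = Other
access_key_id = <access-key>
secret_access_key = <secret-key>
endpoint = <s3-endpoint>

[db-backup-crypt]
type = crypt
remote = db-backup:<bucket-name>
filename_encryption = off
directory_name_encryption = false
password = <obscured-password>
EOF

# Extract the remote names from the section headers, one per line.
remotes="$(sed -n 's/^\[\(.*\)]$/\1/p' "$conf")"
echo "$remotes"

# Clean up the temporary file.
rm -f "$conf"
```

In this sketch the bucket name appears only in the remote line of the crypt remote, and file and directory name encryption are both turned off.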
RCLONE_DESTINATION
Name of the (crypt) remote to which the dump should be saved. Example: db-backup-crypt:
Note that this will not include the bucket - the bucket name will be part of the remote that the crypt remote wraps.
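Before a backup run it can be handy to confirm that the variables described above are set. The following is a minimal sketch, not part of the service itself: missing_env is a hypothetical helper, and the list of names passed to it is an assumption based on the sections above.

```shell
#!/bin/sh
# Hypothetical helper: print the names of any unset or empty variables
# from the list given as arguments, and return non-zero if any are missing.
missing_env() {
    missing=""
    for name in "$@"; do
        # POSIX sh has no indirect expansion, so look the variable up via eval.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    return 0
}

# Assumed list for a production run. SCW_RDB_INSTANCE_ID is left out because
# the image falls back to pg_dump when it is not set.
missing_env SCW_CONFIG_PATH RCLONE_CONFIG RCLONE_DESTINATION \
    || echo "set these before running a backup" >&2
```

When testing with pg_dump or restoring, the same helper could be called with PGUSER, PGPASSWORD and PGHOST instead.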
Logging
The service logs to its standard out/error. The systemd unit is configured to route these to /var/logs/copycat-db.log.
Local testing
The provided test.sh script can be used to do a smoke test for building and running the image. For example,
./test.sh bin/bash
gives us a shell prompt inside the built and running container.
For more thorough testing, run this service as part of a local test-cluster.
Restoring
The service also knows how to restore the latest backup into a Postgres instance. This functionality is used by a separate service (Phoenix) to periodically verify that the backups are restorable.
To invoke this, use "./restore.sh" as the command when running the container (e.g. ./test.sh ./restore.sh). This will restore the latest backup into the Postgres instance whose credentials are provided via the various PG* environment variables.
Preparing the bucket
The database dumps are stored in a bucket that has object lock enabled (compliance mode), and has a default bucket level retention time of 30 days.
Deploying
Ensure that promtail is running, and is configured to scrape /root/var/logs/copycat-db.log.
Create the config and log destination directories
sudo mkdir -p /root/var/config/scw
sudo mkdir -p /root/var/config/rclone
sudo mkdir -p /root/var/logs
Create the env, scw and rclone configuration files
sudo tee /root/copycat-db.env
sudo tee /root/var/config/scw/copycat-db-config.yaml
sudo tee /root/var/config/rclone/copycat-db-rclone.conf
Add the service definition, and start the service
scp copycat-db.{service,timer} instance:
sudo mv copycat-db.{service,timer} /etc/systemd/system
sudo systemctl daemon-reload
To start the cron job
sudo systemctl start copycat-db.timer
The timer will trigger the service on the specified schedule. In addition, if you wish to force the job to run immediately
sudo systemctl start copycat-db.service
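After starting the timer, it may be worth confirming that systemd has actually scheduled it. This is a minimal sketch using standard systemctl subcommands. It assumes the unit name copycat-db.timer from the steps above, and it reports an "unknown" state on machines where systemd is not available.

```shell
#!/bin/sh
unit="copycat-db.timer"
state=""

if command -v systemctl >/dev/null 2>&1; then
    # Show the next and last trigger times for the timer.
    systemctl list-timers "$unit" --no-pager \
        || echo "could not query systemd" >&2
    # is-active prints the state and exits non-zero unless the unit is
    # active, so ignore the exit code here.
    state="$(systemctl is-active "$unit" 2>/dev/null || true)"
fi

# Fall back to "unknown" when systemd is missing or gave no answer.
state="${state:-unknown}"
echo "timer state: $state"
```

A state of active means the timer is loaded and waiting for its next trigger.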
Updating
To update, run the GitHub workflow to build and push the latest image to our Docker Registry, then restart the systemd service on the instance
sudo systemctl restart copycat-db