|
@@ -0,0 +1,144 @@
|
|
|
+## Introduction
|
|
|
+
|
|
|
+Copycat DB is a [service](https://github.com/ente-io/infra) to take a backup of
|
|
|
+our database. It uses the Scaleway CLI to take backups of the database, and
|
|
|
+uploads them to an offsite bucket.
|
|
|
+
|
|
|
+This bucket has an object lock configured, so backups cannot be deleted before
|
|
|
+expiry. To avoid indefinite retention, the service also deletes backups older
|
|
|
+than a threshold whenever it creates a new one.
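
The age check behind that cleanup can be sketched as below. This is only an illustration: the helper name `is_older_than` and its arguments are hypothetical, and the README does not state the actual threshold or how the service implements the deletion.

```shell
# Hypothetical helper (not part of the service): succeeds if a backup is
# older than the given threshold.
# Arguments: backup time and current time (Unix seconds), threshold in days.
is_older_than() {
    backup_epoch=$1
    now_epoch=$2
    max_age_days=$3
    age_seconds=$((now_epoch - backup_epoch))
    [ "$age_seconds" -gt $((max_age_days * 86400)) ]
}

# Example: a backup taken 31 days ago, checked against an assumed 30 day threshold.
if is_older_than 0 $((31 * 86400)) 30; then
    echo "backup is past the threshold"
fi
```

Because the bucket uses object lock, a delete attempted before the lock expires would presumably be rejected, so any such threshold only makes sense if it is at least as long as the bucket's retention period.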
|
|
|
+
|
|
|
+In production the service runs as a cron job, scheduled using a systemd timer.
|
|
|
+
|
|
|
+## Required environment variables
|
|
|
+
|
|
|
+##### SCW_CONFIG_PATH
|
|
|
+
|
|
|
+Path to the `config.yaml` used by Scaleway CLI.
|
|
|
+
|
|
|
+This contains the credentials and the default region to use when trying to
|
|
|
+create and download the database dump.
|
|
|
+
|
|
|
+If needed, this config file can be generated by running the following commands
|
|
|
+on a shell prompt in the container (using `./test.sh sh`)
|
|
|
+
|
|
|
+    scw init
|
|
|
+    scw config dump
|
|
|
+
|
|
|
+##### SCW_RDB_INSTANCE_ID
|
|
|
+
|
|
|
+The UUID of the Scaleway RDB instance that we wish to back up. If this is missing,
|
|
|
+then the Docker image falls back to using `pg_dump` (as outlined next).
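
The fallback described above amounts to a check on whether the variable is set. A minimal sketch, assuming a hypothetical helper named `choose_backup_method` (the real entrypoint script may be structured differently):

```shell
# Hypothetical helper (not taken from the image): report which backup path
# would be used, based only on whether SCW_RDB_INSTANCE_ID is set and non-empty.
choose_backup_method() {
    if [ -n "${SCW_RDB_INSTANCE_ID:-}" ]; then
        echo "scw"
    else
        echo "pg_dump"
    fi
}

echo "Backup method: $(choose_backup_method)"
```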
|
|
|
+
|
|
|
+##### PGUSER, PGPASSWORD, PGHOST
|
|
|
+
|
|
|
+Not needed in production when taking a backup (since we use the Scaleway CLI to
|
|
|
+take backups in production).
|
|
|
+
|
|
|
+These are used when testing a backup using `pg_dump`, and when restoring backups.
|
|
|
+
|
|
|
+##### RCLONE_CONFIG
|
|
|
+
|
|
|
+Location of the rclone config file, which contains the destination bucket where
|
|
|
+the backups are saved and the credentials to access it.
|
|
|
+
|
|
|
+Specifically, the config file contains two remotes:
|
|
|
+
|
|
|
+* The bucket itself, where data will be stored.
|
|
|
+
|
|
|
+* A "crypt" remote that wraps the bucket by applying client side encryption.
|
|
|
+
|
|
|
+The configuration file will contain (lightly) obfuscated versions of the
|
|
|
+password, and as long as we have the configuration file we can continue using
|
|
|
+rclone to download and decrypt the plaintext. Still, it is helpful to retain the
|
|
|
+original password separately as well, so that the file can be recreated if needed.
|
|
|
+
|
|
|
+A config file can be generated by running the following commands in the
+container (using `./test.sh sh`)
|
|
|
+
|
|
|
+    rclone config
|
|
|
+    rclone config show
|
|
|
+
|
|
|
+When generating the config, we keep file (and directory) name encryption off.
|
|
|
+
|
|
|
+Note that rclone creates a backup of the config file, so Docker needs to have
|
|
|
+write access to the directory where it is mounted.
|
|
|
+
|
|
|
+##### RCLONE_DESTINATION
|
|
|
+
|
|
|
+Name of the (crypt) remote to which the dump should be saved. Example:
|
|
|
+`db-backup-crypt:`.
|
|
|
+
|
|
|
+Note that this will not include the bucket - the bucket name will be part of the
|
|
|
+remote that the crypt remote wraps.
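
Before a production run, it can help to confirm that the variables described above are present. The following is a sketch only: the function name `check_required_env` is hypothetical, and the list assumes the production (Scaleway CLI) path, where the `PG*` variables are not needed and `SCW_RDB_INSTANCE_ID` is left optional because of the `pg_dump` fallback.

```shell
# Hypothetical pre-flight check (not part of the image). The variable list
# mirrors this README and assumes the Scaleway CLI path is being used.
check_required_env() {
    missing=""
    for name in SCW_CONFIG_PATH RCLONE_CONFIG RCLONE_DESTINATION; do
        # Look up the variable whose name is stored in $name (POSIX sh compatible).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "Missing required variables:$missing" >&2
        return 1
    fi
}
```

Calling `check_required_env || exit 1` at the top of a wrapper script would stop the run early instead of failing partway through a backup.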
|
|
|
+
|
|
|
+##### Logging
|
|
|
+
|
|
|
+The service logs to its standard out/error. The systemd unit is configured to
|
|
|
+route these to `/root/var/logs/copycat-db.log`.
|
|
|
+
|
|
|
+## Local testing
|
|
|
+
|
|
|
+The provided `test.sh` script can be used to do a smoke test for building and
|
|
|
+running the image. For example,
|
|
|
+
|
|
|
+    ./test.sh bin/bash
|
|
|
+
|
|
|
+gives us a shell prompt inside the built and running container.
|
|
|
+
|
|
|
+For more thorough testing, run this service as part of a local test-cluster.
|
|
|
+
|
|
|
+## Restoring
|
|
|
+
|
|
|
+The service also knows how to restore the latest backup into a Postgres
|
|
|
+instance. This functionality is used to periodically verify that the backups are
|
|
|
+restorable.
|
|
|
+
|
|
|
+To invoke this, use `./restore.sh` as the command when running the container
|
|
|
+(e.g. `./test.sh ./restore.sh`). This will restore the latest backup into the
|
|
|
+Postgres instance whose credentials are provided via the various `PG*`
|
|
|
+environment variables.
|
|
|
+
|
|
|
+## Preparing the bucket
|
|
|
+
|
|
|
+The database dumps are stored in a bucket that has object lock enabled
|
|
|
+(Compliance mode), and has a default bucket level retention time of 30 days.
|
|
|
+
|
|
|
+## Deploying
|
|
|
+
|
|
|
+Ensure that promtail is running, and is configured to scrape
|
|
|
+`/root/var/logs/copycat-db.log`.
|
|
|
+
|
|
|
+Create the config and log destination directories
|
|
|
+
|
|
|
+    sudo mkdir -p /root/var/config/scw
|
|
|
+    sudo mkdir -p /root/var/config/rclone
|
|
|
+    sudo mkdir -p /root/var/logs
|
|
|
+
|
|
|
+Create the env, scw and rclone configuration files
|
|
|
+
|
|
|
+    sudo tee /root/copycat-db.env
|
|
|
+    sudo tee /root/var/config/scw/copycat-db-config.yaml
|
|
|
+    sudo tee /root/var/config/rclone/copycat-db-rclone.conf
|
|
|
+
|
|
|
+Add the service definition, and start the service
|
|
|
+
|
|
|
+    scp copycat-db.{service,timer} instance:
|
|
|
+
|
|
|
+    sudo mv copycat-db.{service,timer} /etc/systemd/system
|
|
|
+    sudo systemctl daemon-reload
|
|
|
+
|
|
|
+To enable the cron job
|
|
|
+
|
|
|
+    sudo systemctl enable --now copycat-db.timer
|
|
|
+
|
|
|
+The timer will trigger the service on the specified schedule. In addition, if
|
|
|
+you wish to force the job to run immediately
|
|
|
+
|
|
|
+    sudo systemctl start copycat-db.service
|
|
|
+
|
|
|
+## Updating
|
|
|
+
|
|
|
+To update, run the [GitHub action](.github/workflows/ci.yaml) to push the latest
|
|
|
+image to our Docker Registry, then restart the systemd service on the instance
|
|
|
+
|
|
|
+    sudo systemctl restart copycat-db
|