# Copycat DB

Copycat DB is a [service](../services/README.md) to take a backup of our
database.

It uses the Scaleway CLI to take backups of the database, and uploads them to
an offsite bucket. This bucket has an object lock configured, so backups cannot
be deleted before expiry. Conversely, the service also deletes backups older
than some threshold when it creates a new one, to avoid indefinite retention.

In production the service runs as a cron job, scheduled using a systemd timer.

> These backups are in addition to the regular snapshots that we take, and are
> meant as a second layer of replication. For more details, see our
> [Reliability and Replication Specification](https://ente.io/reliability).

## Quick help

View service status (it gets invoked automatically by a timer, and doesn't need
to be started/stopped manually):

```sh
sudo systemctl status copycat-db
```

View logs locally (they'll also be available on Grafana):

```sh
sudo tail /root/var/logs/copycat-db.log
```

## Name

The name copycat-db is a riff on "copycat", which is what we call our museum
instance that does the object replication. This one replicates the DB, so,
copycat-db.

## Required environment variables

##### SCW_CONFIG_PATH

Path to the `config.yaml` used by the Scaleway CLI. This contains the
credentials and the default region to use when creating and downloading the
database dump.

If needed, this config file can be generated by running the following commands
on a shell prompt in the container (using `./test.sh sh`):

    scw init
    scw config dump

##### SCW_RDB_INSTANCE_ID

The UUID of the Scaleway RDB instance that we wish to back up. If this is
missing, then the Docker image falls back to using `pg_dump` (as outlined
next).

##### PGUSER, PGPASSWORD, PGHOST

Not needed in production when taking a backup (since we use the Scaleway CLI to
take backups in production).

These are used when testing a backup using `pg_dump`, and when restoring
backups.

##### RCLONE_CONFIG

Location of the rclone config file. This contains the destination bucket where
the backups should be saved, and the credentials to access it.

Specifically, the config file contains two remotes:

- The bucket itself, where data will be stored.
- A "crypt" remote that wraps the bucket by applying client-side encryption.

The configuration file will contain (lightly) obfuscated versions of the
password, and as long as we have the configuration file we can continue using
rclone to download and decrypt the plaintext. Still, it is helpful to retain
the original password too separately, so that the file can be recreated if
needed.

A config file can be generated using `./test.sh sh`:

    rclone config
    rclone config show

When generating the config, we keep file (and directory) name encryption off.

Note that rclone creates a backup of the config file, so Docker needs to have
write access to the directory where it is mounted.

##### RCLONE_DESTINATION

Name of the (crypt) remote to which the dump should be saved. Example:
`db-backup-crypt:`.

Note that this will not include the bucket - the bucket name will be part of
the remote that the crypt remote wraps.

## Logging

The service logs to its standard out/error. The systemd unit is configured to
route these to `/var/logs/copycat-db.log`.

## Local testing

The provided `test.sh` script can be used to do a smoke test for building and
running the image. For example,

    ./test.sh bin/bash

gives us a shell prompt inside the built and running container.

For more thorough testing, run this service as part of a local test-cluster.
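For reference, here is a minimal sketch of what the env file passed to the
container might contain, tying together the variables described above. Every
value below is a hypothetical placeholder; the actual paths, instance ID, and
remote name depend on your setup.

```sh
# Hypothetical example env file; all values are placeholders.
SCW_CONFIG_PATH=/var/config/scw/config.yaml
SCW_RDB_INSTANCE_ID=00000000-0000-0000-0000-000000000000
RCLONE_CONFIG=/var/config/rclone/rclone.conf
RCLONE_DESTINATION=db-backup-crypt:
# Only needed when testing pg_dump based backups, or when restoring.
PGUSER=postgres
PGPASSWORD=changeme
PGHOST=localhost
```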
## Restoring

The service also knows how to restore the latest backup into a Postgres
instance. This functionality is used by a separate service (Phoenix) to
periodically verify that the backups are restorable.

To invoke this, use `./restore.sh` as the command when running the container
(e.g. `./test.sh ./restore.sh`). This will restore the latest backup into the
Postgres instance whose credentials are provided via the various `PG*`
environment variables.

## Preparing the bucket

The database dumps are stored in a bucket that has object lock enabled
(compliance mode), and has a default bucket level retention time of 30 days.

## Deploying

Ensure that promtail is running, and is configured to scrape
`/root/var/logs/copycat-db.log`.

Create the config and log destination directories

    sudo mkdir -p /root/var/config/scw
    sudo mkdir -p /root/var/config/rclone
    sudo mkdir -p /root/var/logs

Create the env, scw and rclone configuration files

    sudo tee /root/copycat-db.env
    sudo tee /root/var/config/scw/copycat-db-config.yaml
    sudo tee /root/var/config/rclone/copycat-db-rclone.conf

Add the service definition, and start the service

    scp copycat-db.{service,timer} instance:

    sudo mv copycat-db.{service,timer} /etc/systemd/system
    sudo systemctl daemon-reload

To start the cron job

    sudo systemctl start copycat-db.timer

The timer will trigger the service on the specified schedule. In addition, if
you wish to run the job immediately

    sudo systemctl start copycat-db.service

## Updating

To update, run the
[GitHub workflow](../../.github/workflows/copycat-db-release.yaml) to build and
push the latest image to our Docker Registry, then restart the systemd service
on the instance

    sudo systemctl restart copycat-db
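After deploying or updating, one way to confirm that the timer is scheduled and
that the latest run succeeded is the sketch below; it assumes the unit and log
file names used in the steps above.

```sh
# Show the last and next trigger times for the backup timer
sudo systemctl list-timers copycat-db.timer

# Check the result of the most recent run and skim its output
sudo systemctl status copycat-db.service
sudo tail -n 50 /root/var/logs/copycat-db.log
```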