[infra] Add more details about services (#1109)

- Import the existing DesktopDistribute notes
- Define services
- Add prometheus and promtail definitions
Manav Rathi 2024-03-14 22:33:26 +05:30 committed by GitHub
commit cd9c6f713a
GPG key ID: B5690EEEBB952194
10 changed files with 398 additions and 13 deletions

desktop/docs/release.md Normal file

@ -0,0 +1,92 @@
## Releases
> [!NOTE]
>
> TODO(MR): This document needs to be audited and changed as we do the first
> release from this new monorepo.

The GitHub Action that builds the desktop binaries is triggered by pushing a tag
matching the pattern `photos-desktop-v1.2.3`. The version in the tag should
match the `version` in `package.json`.
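
Before pushing a tag, this correspondence can be checked locally. Below is a minimal POSIX `sh` sketch of such a check; the function names are hypothetical (not part of the repository), and it assumes `package.json` is formatted with one key per line, as npm writes it, so that the first `"version"` line is the top-level one. It could be run as `check_tag package.json photos-desktop-v1.2.3` right before `git tag`.

```sh
# Print the "version" field of the given package.json.
# Assumes one key per line, with the top-level "version" being the first one.
package_version() {
    sed -n 's/^[[:space:]]*"version":[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Print the release tag matching the given package.json,
# e.g. photos-desktop-v1.2.3 for version 1.2.3.
expected_tag() {
    version=$(package_version "$1")
    [ -n "$version" ] || return 1
    printf 'photos-desktop-v%s\n' "$version"
}

# Succeed only if tag $2 matches the version in package.json $1.
check_tag() {
    expected=$(expected_tag "$1") || {
        echo "no version found in $1" >&2
        return 1
    }
    if [ "$2" != "$expected" ]; then
        echo "tag $2 does not match $1 (expected $expected)" >&2
        return 1
    fi
}
```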

So the process for doing a release is:

1. Create a new branch (it can be named anything). On this branch, include your
   changes.
2. Mention the changes in `CHANGELOG.md`.
3. Change the `version` in `package.json` to `1.x.x`.
4. Commit, tag, and push to remote:

   ```sh
   git add package.json CHANGELOG.md && git commit -m 'Release v1.x.x'
   git tag photos-desktop-v1.x.x
   git push -u origin HEAD && git push --tags
   ```

This by itself will already trigger a new release. The GitHub Action will create
a new draft release that can then be used as described below.

To wrap up, we also need to merge these changes back into main:

5. Open a PR for the branch that we're working on (where the above tag was
   pushed from) to get it merged into main.
6. In this PR, also increase the version number for the next release train.
   That is, suppose we just released `v4.0.1`. Then we'll change the version
   number in main to `v4.0.2-next.0`. Each pre-release will modify the `next.0`
   part. Finally, at the time of the next release, this'll become `v4.0.2`.
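
The version arithmetic in step 6 can be sketched as two small shell functions. These are hypothetical helpers, not part of the repository; they print versions without the leading `v` (assuming that is how `package.json` stores them), and they read "modify the `next.0` part" as incrementing its counter, e.g. `4.0.2-next.0` to `4.0.2-next.1`.

```sh
# Print the version main should move to after releasing $1:
# X.Y.Z (optionally written vX.Y.Z) becomes X.Y.(Z+1)-next.0.
next_train_version() {
    v=${1#v}
    major=${v%%.*}         # X
    rest=${v#*.}           # Y.Z
    minor=${rest%%.*}      # Y
    patch=${rest#*.}       # Z
    case $v in
        *.*.*) ;;
        *) echo "expected X.Y.Z, got $1" >&2; return 1 ;;
    esac
    for part in "$major" "$minor" "$patch"; do
        case $part in
            '' | *[!0-9]*) echo "expected X.Y.Z, got $1" >&2; return 1 ;;
        esac
    done
    printf '%s.%s.%s-next.0\n' "$major" "$minor" "$((patch + 1))"
}

# Bump the counter of a pre-release label, e.g. 4.0.2-next.0 -> 4.0.2-next.1.
next_prerelease() {
    base=${1%.*}           # e.g. 4.0.2-next
    n=${1##*.}             # e.g. 0
    case $1 in
        *-*) ;;
        *) echo "expected X.Y.Z-label.N, got $1" >&2; return 1 ;;
    esac
    case $n in
        '' | *[!0-9]*) echo "expected X.Y.Z-label.N, got $1" >&2; return 1 ;;
    esac
    printf '%s.%s\n' "$base" "$((n + 1))"
}
```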

The GitHub Action runs on Windows, Linux and macOS. It produces the artifacts
defined in the `build` value in `package.json`.

* Windows - An NSIS installer.
* Linux - An AppImage, and 3 other packages (`.rpm`, `.deb`, `.pacman`)
* macOS - A universal DMG

Additionally, the GitHub Action notarizes the macOS DMG. For this it needs
credentials provided via GitHub secrets.

During the build, the Sentry webpack plugin checks whether `SENTRY_AUTH_TOKEN`
is defined. If so, it uploads the sourcemaps for the renderer process to Sentry
(for our GitHub Action, `SENTRY_AUTH_TOKEN` is defined as a GitHub secret).

The sourcemaps for the main (node) process are currently not sent to Sentry.
This works fine in practice since the node process files are not minified; we
only run `tsc`.

Once the build is done, a draft release with all these artifacts attached is
created. The build is idempotent, so if something goes wrong and we need to
re-run the GitHub Action, just delete the draft release (if it got created) and
start a new run by pushing a new tag (if some code changes are required).

If no code changes are required, say the build failed because of some transient
network or Sentry issue, we can even re-run the build by going to the GitHub
Action page and re-running it from there. This will re-trigger the build for the
same tag.

If everything goes well, we'll have a release on GitHub, and the corresponding
source maps for the renderer process uploaded to Sentry. There isn't anything
else to do:

* The website automatically redirects to the latest release on GitHub when
  people try to download.
* The file formats that support auto update (Windows `exe`, the Linux AppImage
  and the macOS DMG) also check the latest GitHub release automatically to
  download and apply the update (the rest of the formats don't support auto
  updates).
* We're not putting the desktop app in other stores currently. It is available
  as a `brew cask`, but we only had to open a PR to add the initial formula;
  now their maintainers automatically bump the SHA, version number and the
  (derived from the version) URL in the formula when their tools notice a new
  release on our GitHub.

We can also publish the draft releases by checking the "pre-release" option.
Such releases don't cause any of the channels (our website, the desktop app
auto updater, or brew) to be notified; instead they are useful for giving links
to pre-release builds to customers. Generally, for these we'll add a label to
the version number, e.g. the "beta.x" in `1.x.x-beta.x`. This should be done
both in `package.json` and in the tag we push for the commit.
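
As a small aid when deciding whether to check the "pre-release" box, here is a hypothetical shell helper (not part of the repository) that treats anything after a `-` in the version as a label, which matches the `1.x.x-beta.x` convention above. It accepts either a bare version or a `photos-desktop-v…` tag.

```sh
# Succeed if a version or release tag carries a pre-release label, e.g.
# 1.7.0-beta.2 or photos-desktop-v1.7.0-beta.2.
is_prerelease() {
    v=${1#photos-desktop-}    # accept a tag as well as a bare version
    v=${v#v}
    case $v in
        *-?*) return 0 ;;     # anything after a "-" is treated as a label
        *) return 1 ;;
    esac
}
```

For example, `is_prerelease "$tag" && echo "publish as pre-release"`. If the GitHub CLI is used instead of the web UI, `gh release edit <tag> --prerelease --draft=false` should have the same effect as checking the option (an assumption worth confirming with `gh release edit --help`).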


@ -1,8 +1,8 @@
 # Copycat DB
-Copycat DB is a [service](../service.md) to take a backup of our database. It
-uses the Scaleway CLI to take backups of the database, and uploads them to an
-offsite bucket.
+Copycat DB is a [service](../services/README.md) to take a backup of our
+database. It uses the Scaleway CLI to take backups of the database, and uploads
+them to an offsite bucket.
 This bucket has an object lock configured, so backups cannot be deleted before
 expiry. Conversely, the service also deletes backups older than some threshold
@ -11,9 +11,8 @@ when it creates a new one to avoid indefinite retention.
 In production the service runs as a cron job, scheduled using a systemd timer.
 > These backups are in addition to the regular snapshots that we take, and are
-> meant as a second layer of replication. For more details, see our [Reliability
-> and Replication Specification](https://ente.io/reliability).
+> meant as a second layer of replication. For more details, see our
+> [Reliability and Replication Specification](https://ente.io/reliability).
 ## Quick help
@ -61,7 +60,8 @@ then the Docker image falls back to using `pg_dump` (as outlined next).
 Not needed in production when taking a backup (since we use the Scaleway CLI to
 take backups in production).
-These are used when testing a backup using `pg_dump`, and when restoring backups.
+These are used when testing a backup using `pg_dump`, and when restoring
+backups.
 ##### RCLONE_CONFIG
@ -70,9 +70,9 @@ to use to save the backups, and the credentials to to access it.
 Specifically, the config file contains two remotes:
-* The bucket itself, where data will be stored.
-* A "crypt" remote that wraps the bucket by applying client side encryption.
+- The bucket itself, where data will be stored.
+- A "crypt" remote that wraps the bucket by applying client side encryption.
 The configuration file will contain (lightly) obfuscated versions of the
 password, and as long as we have the configuration file we can continue using
@ -164,9 +164,9 @@ you wish to force the job to service immediately
 ## Updating
-To update, run the [GitHub
-workflow](../../.github/workflows/copycat-db-release.yaml) to build and push the
-latest image to our Docker Registry, then restart the systemd service on the
-instance
+To update, run the
+[GitHub workflow](../../.github/workflows/copycat-db-release.yaml) to build and
+push the latest image to our Docker Registry, then restart the systemd service
+on the instance
 sudo systemctl restart copycat-db

infra/services/README.md Normal file

@ -0,0 +1,104 @@
# Services
"Services" are Docker images we run on our instances and manage using systemd.
All our services (including museum itself) follow the same pattern:
- They're run on vanilla Ubuntu instances. The only expectation they have is
for Docker to be installed.
- They log to fixed, known locations (`/root/var/logs/foo.log`) so that
  these logs can get ingested by Promtail if needed.
- Each service should consist of a Docker image (or a Docker compose file),
and a systemd unit file.
- To start / stop / schedule the service, we use systemd.
- Each time the service runs it should pull the latest Docker image, so there
is no separate installation/upgrade step needed. We can just restart the
service, and it'll use the latest code.
- Any credentials and/or configuration should be read by mounting the
appropriate file from `/root/service-name` into the running Docker
container.
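
As a rough illustration of this pattern, here is a shell function that prints a unit file for a hypothetical Docker-based service, modelled on the `prometheus.service` and `promtail.service` files in this commit. The service name, the image, and the `/config` path inside the container are placeholders; real services will need their own mounts and arguments.

```sh
# Print a systemd unit for a Docker-based service following the pattern above:
# pull the latest image on every start, replace any old container, mount the
# config from /root/NAME and the log directory from /root/var/logs.
# NAME, IMAGE and the /config path inside the container are placeholders.
service_unit() {
    name=$1
    image=$2
    cat <<EOF
[Unit]
Requires=docker.service
After=docker.service

[Install]
WantedBy=multi-user.target

[Service]
ExecStartPre=docker pull $image
ExecStartPre=-docker stop $name
ExecStartPre=-docker rm $name
ExecStart=docker run --name $name \\
    -v /root/$name:/config:ro \\
    -v /root/var/logs:/var/logs \\
    $image
EOF
}
```

For example, `service_unit example registry.example.com/example > example.service` would produce a starting point to edit. The leading `-` on the `stop` and `rm` lines tells systemd to ignore their failure when no old container exists.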
## Systemd cheatsheet
```sh
sudo systemctl status my-service
sudo systemctl start my-service
sudo systemctl stop my-service
sudo systemctl restart my-service
sudo journalctl --unit my-service
```
## Adding a service
Create a systemd unit file (See the various `*.service` files in this repository
for examples).
If we want the service to start on boot, add an `[Install]` section to its
service file (_note_: starting on boot requires one more step later):
```
[Install]
WantedBy=multi-user.target
```
Copy the service file to the instance where we want to run the service. Services
might also have some additional configuration or env files; copy those to the
instance too.
```sh
scp services/example.service example.env <instance>:
```
SSH into the instance.
```sh
ssh <instance>
```
Move the service file to `/etc/systemd/system`, and any config files to their
expected place. Env and other config files that contain credentials are kept in
`/root`.
```sh
sudo mv example.service /etc/systemd/system
sudo mv example.env /root
```
If you want to start the service on boot (as mentioned in the `[Install]`
section above), then enable it (this only needs to be done once):
```sh
sudo systemctl enable example
```
Reload the systemd configuration so that it picks up the new service file.
```sh
sudo systemctl daemon-reload
```
Now you can manage the service using standard systemd commands.
```sh
sudo systemctl start example
```
To view stdout/stderr, use:
```sh
sudo journalctl --follow --unit example
```
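
The steps above can also be scripted from a local checkout. The sketch below is hypothetical (not a script in this repository); it assumes SSH access to the instance, that `sudo` works there without a password prompt, and that the unit has an `[Install]` section. It runs `daemon-reload` before `enable`/`start`, which is the order the Prometheus and Promtail instructions use. Setting `DRY_RUN=1` only prints the commands.

```sh
# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# install_service INSTANCE UNIT_FILE [ENV_FILE]
# Copy a unit file (and optionally an env file) to INSTANCE, move them into
# place, then enable the service for boot and start it.
install_service() {
    instance=$1
    unit=$2
    env_file=${3:-}
    name=$(basename "$unit" .service)

    run scp "$unit" ${env_file:+"$env_file"} "$instance:" || return 1

    remote="sudo mv $name.service /etc/systemd/system"
    if [ -n "$env_file" ]; then
        remote="$remote && sudo mv $(basename "$env_file") /root"
    fi
    remote="$remote && sudo systemctl daemon-reload"
    remote="$remote && sudo systemctl enable $name && sudo systemctl start $name"
    run ssh "$instance" "$remote"
}
```

For example, `DRY_RUN=1 install_service my-instance services/example.service example.env` prints the `scp` and `ssh` commands without running them (`my-instance` being a hypothetical SSH host alias).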
## Logging
Services should log to files in `/var/logs` within the container. This should be
mounted to `/root/var/logs` on the instance (using the `-v` flag in the service
file which launches the Docker container or the Docker compose cluster).
If these logs need to be sent to Grafana, then ensure that there is an entry for
this log file in the Promtail config on that instance (`promtail/promtail.yaml`
in this repository, installed as `/root/promtail.yaml`). The logs will then get
scraped by Promtail and sent over to Grafana.
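
A small sketch of that check, plus a helper that prints an entry in the same shape as the existing ones in `promtail.yaml`. Both function names are hypothetical. The helper assumes the service writes `/var/logs/NAME.log` inside its container, i.e. `/root/var/logs/NAME.log` on the instance, which Promtail in turn sees as `/var/logs/NAME.log` through its own read-only mount.

```sh
# Print a Promtail scrape entry for service NAME, in the same shape as the
# entries under scrape_configs in promtail.yaml. ${HOSTNAME} is left for
# Promtail to expand (it is started with -config.expand-env=true).
promtail_entry() {
    cat <<EOF
  - job_name: $1
    static_configs:
      - labels:
          job: $1
          host: \${HOSTNAME}
          __path__: /var/logs/$1.log
EOF
}

# Succeed if the Promtail config $1 already scrapes /var/logs/$2.log.
has_promtail_entry() {
    grep -q "__path__: /var/logs/$2\.log\$" "$1"
}
```

For example, `has_promtail_entry /root/promtail.yaml example || promtail_entry example` prints a stanza to paste under `scrape_configs:` only when one is missing.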


@ -0,0 +1,32 @@
# Prometheus
Install `prometheus.service` on an instance if it is running something that
exports custom Prometheus metrics. In particular, museum does.
Also install `node-exporter.service` (after installing
[node-exporter](https://prometheus.io/docs/guides/node-exporter/) itself) if it
is a production instance whose metrics (CPU, disk, RAM etc) we want to monitor.
## Installing
Prometheus doesn't currently support environment variables in its config file,
so remember to change the hardcoded `XX-HOSTNAME` too, in addition to adding the
`remote_write` configuration.
```sh
scp -P 7426 services/prometheus/* <instance>:
nano prometheus.yml
sudo mv prometheus.yml /root/prometheus.yml
sudo mv prometheus.service /etc/systemd/system/prometheus.service
sudo mv node-exporter.service /etc/systemd/system/node-exporter.service
```
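
Instead of editing the placeholder by hand in `nano`, it could be substituted with `sed`. Below is a minimal sketch of a hypothetical helper that defaults to the instance's own hostname, which is an assumption about what `XX-HOSTNAME` should be; the `remote_write` section still has to be filled in by hand. On the instance it would be run as `set_prometheus_hostname prometheus.yml` before the `mv` into `/root`.

```sh
# Replace every XX-HOSTNAME placeholder in a Prometheus config file.
# $1: config file, $2: host name (defaults to this machine's hostname).
# Writes to FILE.tmp and then renames, so no `sed -i` (whose flags differ
# between GNU and BSD sed) is needed.
set_prometheus_hostname() {
    file=$1
    host=${2:-$(hostname)}
    sed "s/XX-HOSTNAME/$host/g" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```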
Tell systemd to pick up new service definitions, enable the units (so that they
automatically start on boot going forward), and start them.
```sh
sudo systemctl daemon-reload
sudo systemctl enable node-exporter prometheus
sudo systemctl start node-exporter prometheus
```


@ -0,0 +1,12 @@
[Unit]
Documentation=https://prometheus.io/docs/guides/node-exporter/
Wants=network-online.target
After=network-online.target
[Install]
WantedBy=multi-user.target
[Service]
User=node_exporter
Group=node_exporter
ExecStart=/usr/local/bin/node_exporter


@ -0,0 +1,16 @@
[Unit]
Documentation=https://prometheus.io/docs/prometheus/
Requires=docker.service
After=docker.service
[Install]
WantedBy=multi-user.target
[Service]
ExecStartPre=docker pull prom/prometheus
ExecStartPre=-docker stop prometheus
ExecStartPre=-docker rm prometheus
ExecStart=docker run --name prometheus \
--add-host=host.docker.internal:host-gateway \
-v /root/prometheus.yml:/etc/prometheus/prometheus.yml:ro \
prom/prometheus


@ -0,0 +1,39 @@
# https://prometheus.io/docs/prometheus/latest/configuration/

global:
  scrape_interval: 30s # Default is 1m

scrape_configs:
  - job_name: museum
    static_configs:
      - targets: ["host.docker.internal:2112"]
    relabel_configs:
      - source_labels: [__address__]
        regex: ".*"
        target_label: instance
        replacement: XX-HOSTNAME

  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
    relabel_configs:
      - source_labels: [__address__]
        regex: ".*"
        target_label: instance
        replacement: XX-HOSTNAME

  - job_name: "node"
    static_configs:
      - targets: ["host.docker.internal:9100"]
    relabel_configs:
      - source_labels: [__address__]
        regex: ".*"
        target_label: instance
        replacement: XX-HOSTNAME

# Grafana Cloud
remote_write:
  - url: https://g/api/prom/push
    basic_auth:
      username: foo
      password: bar


@ -0,0 +1,26 @@
# Promtail
Install `promtail.service` on an instance if it is running something whose logs
we want in Grafana.
## Installing
Replace the `url` under `clients` in the config file with the Loki URL that
Promtail should connect to, and move the files to their expected place.
```sh
scp -P 7426 services/promtail/* <instance>:
nano promtail.yaml
sudo mv promtail.yaml /root/promtail.yaml
sudo mv promtail.service /etc/systemd/system/promtail.service
```
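
Similarly, the Loki URL could be set without an editor. A minimal sketch of a hypothetical helper that assumes the single-client layout of `promtail.yaml`, where the only line starting with `- url:` is the one under `clients:`, and a URL without `|`, `&` or `\` characters (all special in this `sed` expression). It would be run as `set_loki_url promtail.yaml <loki-push-url>` before the `mv`.

```sh
# Rewrite the "- url: ..." line under clients: in a Promtail config.
# $1: config file, $2: Loki push URL.
# Writes to FILE.tmp and then renames, so no `sed -i` is needed.
set_loki_url() {
    file=$1
    url=$2
    sed "s|^\([[:space:]]*- url:\).*|\1 $url|" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```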
Tell systemd to pick up new service definitions, enable the unit (so that it
automatically starts on boot), and start it now.
```sh
sudo systemctl daemon-reload
sudo systemctl enable promtail
sudo systemctl start promtail
```


@ -0,0 +1,19 @@
[Unit]
Documentation=https://grafana.com/docs/loki/latest/clients/promtail/
Requires=docker.service
After=docker.service
[Install]
WantedBy=multi-user.target
[Service]
ExecStartPre=docker pull grafana/promtail
ExecStartPre=-docker stop promtail
ExecStartPre=-docker rm promtail
ExecStart=docker run --name promtail \
--hostname "%H" \
-v /root/promtail.yaml:/config.yaml:ro \
-v /var/log:/var/log \
-v /root/var/logs:/var/logs:ro \
-v /var/lib/docker/containers:/var/lib/docker/containers:ro \
grafana/promtail -config.file=/config.yaml -config.expand-env=true


@ -0,0 +1,45 @@
# https://grafana.com/docs/loki/latest/clients/promtail/configuration/

# We don't want Promtail's HTTP / GRPC server.
server:
  disable: true

# Loki URL
# For Grafana Cloud, it can be found in the integrations section.
clients:
  - url: http://loki:3100/loki/api/v1/push

# Manually add entries for all our services. This is a bit cumbersome, but
# - Retains flexibility in file names.
# - Makes adding job labels easy.
# - Does not get in the way of logrotation.
#
# In addition, also scrape logs from all docker containers.
scrape_configs:
  - job_name: museum
    static_configs:
      - labels:
          job: museum
          host: ${HOSTNAME}
          __path__: /var/logs/museum.log

  - job_name: copycat-db
    static_configs:
      - labels:
          job: copycat-db
          host: ${HOSTNAME}
          __path__: /var/logs/copycat-db.log

  - job_name: phoenix
    static_configs:
      - labels:
          job: phoenix
          host: ${HOSTNAME}
          __path__: /var/logs/phoenix.log

  - job_name: docker
    static_configs:
      - labels:
          job: docker
          host: ${HOSTNAME}
          __path__: /var/lib/docker/containers/*/*-json.log