[infra] Add more details about services (#1109)
- Import the existing DesktopDistribute notes
- Define services
- Add prometheus and promtail definitions

commit cd9c6f713a
10 changed files with 398 additions and 13 deletions

desktop/docs/release.md (new file, 92 lines)

## Releases

> [!NOTE]
>
> TODO(MR): This document needs to be audited and changed as we do the first
> release from this new monorepo.

The GitHub Action that builds the desktop binaries is triggered by pushing a tag
matching the pattern `photos-desktop-v1.2.3`. This value should match the
version in `package.json`.
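
For illustration, the tag trigger in that workflow presumably looks something
like the sketch below (the glob and layout here are illustrative, not the actual
workflow definition):

```yaml
# Hypothetical sketch of the tag trigger, not the actual workflow file.
on:
    push:
        tags:
            - "photos-desktop-v*"
```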

So the process for doing a release would be:

1. Create a new branch (can be named anything). On this branch, include your
   changes.

2. Mention the changes in `CHANGELOG.md`.

3. Change the `version` in `package.json` to `1.x.x`.

4. Commit and push to remote.

```sh
git add package.json && git commit -m 'Release v1.x.x'
git tag v1.x.x
git push && git push --tags
```

This by itself will already trigger a new release. The GitHub action will create
a new draft release that can then be used as described below.

To wrap up, we also need to merge these changes back into main. So for that:

5. Open a PR for the branch that we're working on (where the above tag was
   pushed from) to get it merged into main.

6. In this PR, also increase the version number for the next release train. That
   is, suppose we just released `v4.0.1`. Then we'll change the version number
   in main to `v4.0.2-next.0`. Each pre-release will modify the `next.0` part.
   Finally, at the time of the next release, this'll become `v4.0.2`.
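
For example, assuming we had just released `v4.0.1`, the bump on main could be
done with something like this (using `npm version` is just one way; editing
`package.json` by hand works equally well):

```sh
# Set "version" in package.json to 4.0.2-next.0 without creating a git tag.
npm version 4.0.2-next.0 --no-git-tag-version
git add package.json && git commit -m 'Bump version to v4.0.2-next.0'
```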

The GitHub Action runs on Windows, Linux and macOS. It produces the artifacts
defined in the `build` value in `package.json`.

* Windows - An NSIS installer.
* Linux - An AppImage, and 3 other packages (`.rpm`, `.deb`, `.pacman`)
* macOS - A universal DMG

Additionally, the GitHub action notarizes the macOS DMG. For this it needs
credentials provided via GitHub secrets.

During the build, the Sentry webpack plugin checks to see if SENTRY_AUTH_TOKEN
is defined. If so, it uploads the sourcemaps for the renderer process to Sentry
(for our GitHub action, the SENTRY_AUTH_TOKEN is defined as a GitHub secret).
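
For illustration, the build step in the workflow presumably wires up the secret
along these lines (the step and script names here are made up):

```yaml
# Hypothetical workflow step; only the secrets wiring is the point here.
- name: Build
  run: yarn build
  env:
      SENTRY_AUTH_TOKEN: ${{ secrets.SENTRY_AUTH_TOKEN }}
```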

The sourcemaps for the main (node) process are currently not sent to Sentry
(this works fine in practice since the node process files are not minified; we
only run `tsc`).

Once the build is done, a draft release with all these artifacts attached is
created. The build is idempotent, so if something goes wrong and we need to
re-run the GitHub action, just delete the draft release (if it got created) and
start a new run by pushing a new tag (if some code changes are required).
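
If a code change was needed, one way to restart the run after landing the fix
is to move the tag to the new commit (sketch, assuming the release was `v1.x.x`):

```sh
# Delete the old tag locally and on the remote, then tag the fixed commit.
git tag -d v1.x.x
git push origin :refs/tags/v1.x.x
git tag v1.x.x
git push --tags
```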

If no code changes are required, say the build failed for some transient network
or Sentry issue, we can even re-run the build from the GitHub Actions page; this
will re-trigger it for the same tag.

If everything goes well, we'll have a release on GitHub, and the corresponding
source maps for the renderer process uploaded to Sentry. There isn't anything
else to do:

* The website automatically redirects to the latest release on GitHub when
people try to download.

* The file formats that support auto update (Windows `exe`, the Linux AppImage
and the macOS DMG) also check the latest GitHub release automatically to
download and apply the update (the rest of the formats don't support auto
updates).

* We're not putting the desktop app in other stores currently. It is available
as a `brew cask`, but we only had to open a PR to add the initial formula; now
their maintainers automatically bump the SHA, version number and the (derived
from the version) URL in the formula when their tools notice a new release on
our GitHub.

We can also publish the draft releases by checking the "pre-release" option.
Such releases don't cause any of the channels (our website, or the desktop app
auto updater, or brew) to be notified; instead, these are useful for giving
links to pre-release builds to customers. Generally, we'll add a label to the
version for these, e.g. the "beta.x" in `1.x.x-beta.x`. This should be done
both in `package.json`, and in what we tag the commit with.
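
For example, cutting a beta might look like this (mirroring the commands above,
after setting `version` in `package.json` to `1.x.x-beta.1`):

```sh
git add package.json && git commit -m 'Release v1.x.x-beta.1'
git tag v1.x.x-beta.1
git push && git push --tags
```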

infra/copycat-db/README.md

@@ -1,8 +1,8 @@
 # Copycat DB
 
-Copycat DB is a [service](../service.md) to take a backup of our database. It
-uses the Scaleway CLI to take backups of the database, and uploads them to an
-offsite bucket.
+Copycat DB is a [service](../services/README.md) to take a backup of our
+database. It uses the Scaleway CLI to take backups of the database, and uploads
+them to an offsite bucket.
 
 This bucket has an object lock configured, so backups cannot be deleted before
 expiry. Conversely, the service also deletes backups older than some threshold

@@ -11,9 +11,8 @@ when it creates a new one to avoid indefinite retention.
 In production the service runs as a cron job, scheduled using a systemd timer.
 
 > These backups are in addition to the regular snapshots that we take, and are
-> meant as a second layer of replication. For more details, see our [Reliability
-> and Replication Specification](https://ente.io/reliability).
+> meant as a second layer of replication. For more details, see our
+> [Reliability and Replication Specification](https://ente.io/reliability).
 
-
 ## Quick help
 

@@ -61,7 +60,8 @@ then the Docker image falls back to using `pg_dump` (as outlined next).
 Not needed in production when taking a backup (since we use the Scaleway CLI to
 take backups in production).
 
-These are used when testing a backup using `pg_dump`, and when restoring backups.
+These are used when testing a backup using `pg_dump`, and when restoring
+backups.
 
 ##### RCLONE_CONFIG
 

@@ -70,9 +70,9 @@ to use to save the backups, and the credentials to to access it.
 
 Specifically, the config file contains two remotes:
 
-* The bucket itself, where data will be stored.
+- The bucket itself, where data will be stored.
 
-* A "crypt" remote that wraps the bucket by applying client side encryption.
+- A "crypt" remote that wraps the bucket by applying client side encryption.
 
 The configuration file will contain (lightly) obfuscated versions of the
 password, and as long as we have the configuration file we can continue using

@@ -164,9 +164,9 @@ you wish to force the job to service immediately
 
 ## Updating
 
-To update, run the [GitHub
-workflow](../../.github/workflows/copycat-db-release.yaml) to build and push the
-latest image to our Docker Registry, then restart the systemd service on the
-instance
+To update, run the
+[GitHub workflow](../../.github/workflows/copycat-db-release.yaml) to build and
+push the latest image to our Docker Registry, then restart the systemd service
+on the instance
 
     sudo systemctl restart copycat-db

infra/services/README.md (new file, 104 lines)

# Services

"Services" are Docker images we run on our instances and manage using systemd.

All our services (including museum itself) follow the same pattern (a sketch of
a typical unit file follows the list):

- They're run on vanilla Ubuntu instances. The only expectation they have is
  for Docker to be installed.

- They log to fixed, known locations - `/root/var/logs/foo.log` - so that
  these logs can get ingested by Promtail if needed.

- Each service should consist of a Docker image (or a Docker compose file),
  and a systemd unit file.

- To start / stop / schedule the service, we use systemd.

- Each time the service runs it should pull the latest Docker image, so there
  is no separate installation/upgrade step needed. We can just restart the
  service, and it'll use the latest code.

- Any credentials and/or configuration should be read by mounting the
  appropriate file from `/root/service-name` into the running Docker
  container.
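
For illustration, a unit file for a hypothetical `example` service following
this pattern might look like the sketch below (the image name, env file and log
mount are placeholders, not an actual service we run):

```
[Unit]
Requires=docker.service
After=docker.service

[Service]
# Pull the latest image each time, so a restart is also an upgrade.
ExecStartPre=docker pull example/image
ExecStartPre=-docker stop example
ExecStartPre=-docker rm example
ExecStart=docker run --name example \
    -v /root/example.env:/example.env:ro \
    -v /root/var/logs:/var/logs \
    example/image
```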

## Systemd cheatsheet

```sh
sudo systemctl status my-service
sudo systemctl start my-service
sudo systemctl stop my-service
sudo systemctl restart my-service
sudo journalctl --unit my-service
```

## Adding a service

Create a systemd unit file (see the various `*.service` files in this
repository for examples).

If we want the service to start on boot, add an `[Install]` section to its
service file (_note_: starting on boot requires one more step later):

```
[Install]
WantedBy=multi-user.target
```

Copy the service file to the instance where we want to run the service.
Services might also have some additional configuration or env files; copy
those to the instance too.

```sh
scp services/example.service example.env <instance>:
```

SSH into the instance.

```sh
ssh <instance>
```

Move the service file to `/etc/systemd/system`, and any config files to their
expected place. env and other config files that contain credentials are kept
in `/root`.

```sh
sudo mv example.service /etc/systemd/system
sudo mv example.env /root
```

If you want to start the service on boot (as mentioned in the `[Install]`
section above), then enable it (this only needs to be done once):

```sh
sudo systemctl enable example
```

Reload systemd so that it gets to know of the new service.

```sh
sudo systemctl daemon-reload
```

Now you can manage the service using standard systemd commands.

```sh
sudo systemctl start example
```

To view stdout/err, use:

```sh
sudo journalctl --follow --unit example
```

## Logging

Services should log to files in `/var/logs` within the container. This should be
mounted to `/root/var/logs` on the instance (using the `-v` flag in the service
file which launches the Docker container or the Docker compose cluster).

If these logs need to be sent to Grafana, then ensure that there is an entry for
this log file in the `promtail/promtail.yaml` on that instance. The logs will
then get scraped by Promtail and sent over to Grafana.
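
For illustration, the entry for a hypothetical `example` service in that file
might look like this (job name and log path are placeholders; see
`promtail/promtail.yaml` for the real entries):

```yaml
- job_name: example
  static_configs:
      - labels:
            job: example
            host: ${HOSTNAME}
            __path__: /var/logs/example.log
```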

infra/services/prometheus/README.md (new file, 32 lines)

# Prometheus

Install `prometheus.service` on an instance if it is running something that
exports custom Prometheus metrics. In particular, museum does.

Also install `node-exporter.service` (after installing
[node-exporter](https://prometheus.io/docs/guides/node-exporter/) itself) if it
is a production instance whose metrics (CPU, disk, RAM etc) we want to monitor.
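
Installing node-exporter itself is covered by the linked guide; as a rough
sketch (the release version and paths here are illustrative), it boils down to
something like:

```sh
# Illustrative; pick the current release from the node_exporter releases page.
VERSION=1.8.1
curl -LO "https://github.com/prometheus/node_exporter/releases/download/v${VERSION}/node_exporter-${VERSION}.linux-amd64.tar.gz"
tar xzf "node_exporter-${VERSION}.linux-amd64.tar.gz"
sudo cp "node_exporter-${VERSION}.linux-amd64/node_exporter" /usr/local/bin/
# node-exporter.service runs it as this dedicated user.
sudo useradd --no-create-home --shell /usr/sbin/nologin node_exporter
```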

## Installing

Prometheus doesn't currently support environment variables in the config file,
so remember to change the hardcoded `XX-HOSTNAME` too in addition to adding the
`remote_write` configuration.
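
For illustration, the `remote_write` section being added might look something
like this (the endpoint and credentials are placeholders for the values from
your Grafana Cloud stack):

```yaml
remote_write:
    - url: https://<remote-write-endpoint>/api/prom/push
      basic_auth:
          username: <instance-id>
          password: <api-key>
```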

```sh
scp -P 7426 services/prometheus/* <instance>:

nano prometheus.yml
sudo mv prometheus.yml /root/prometheus.yml
sudo mv prometheus.service /etc/systemd/system/prometheus.service
sudo mv node-exporter.service /etc/systemd/system/node-exporter.service
```

Tell systemd to pick up new service definitions, enable the units (so that they
automatically start on boot going forward), and start them.

```sh
sudo systemctl daemon-reload
sudo systemctl enable node-exporter prometheus
sudo systemctl start node-exporter prometheus
```

infra/services/prometheus/node-exporter.service (new file, 12 lines)

[Unit]
Documentation=https://prometheus.io/docs/guides/node-exporter/
Wants=network-online.target
After=network-online.target

[Install]
WantedBy=multi-user.target

[Service]
User=node_exporter
Group=node_exporter
ExecStart=/usr/local/bin/node_exporter

infra/services/prometheus/prometheus.service (new file, 16 lines)

[Unit]
Documentation=https://prometheus.io/docs/prometheus/
Requires=docker.service
After=docker.service

[Install]
WantedBy=multi-user.target

[Service]
ExecStartPre=docker pull prom/prometheus
ExecStartPre=-docker stop prometheus
ExecStartPre=-docker rm prometheus
ExecStart=docker run --name prometheus \
    --add-host=host.docker.internal:host-gateway \
    -v /root/prometheus.yml:/etc/prometheus/prometheus.yml:ro \
    prom/prometheus

infra/services/prometheus/prometheus.yml (new file, 39 lines)

# https://prometheus.io/docs/prometheus/latest/configuration/

global:
    scrape_interval: 30s # Default is 1m

scrape_configs:
    - job_name: museum
      static_configs:
          - targets: ["host.docker.internal:2112"]
      relabel_configs:
          - source_labels: [__address__]
            regex: ".*"
            target_label: instance
            replacement: XX-HOSTNAME

    - job_name: "prometheus"
      static_configs:
          - targets: ["localhost:9090"]
      relabel_configs:
          - source_labels: [__address__]
            regex: ".*"
            target_label: instance
            replacement: XX-HOSTNAME

    - job_name: "node"
      static_configs:
          - targets: ["host.docker.internal:9100"]
      relabel_configs:
          - source_labels: [__address__]
            regex: ".*"
            target_label: instance
            replacement: XX-HOSTNAME

# Grafana Cloud
remote_write:
    - url: https://g/api/prom/push
      basic_auth:
          username: foo
          password: bar

infra/services/promtail/README.md (new file, 26 lines)

# Promtail

Install `promtail.service` on an instance if it is running something whose logs
we want in Grafana.

## Installing

Replace `client.url` in the config file with the Loki URL that Promtail should
connect to, and move the files to their expected place.
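
For illustration, after that edit the `clients` section might look something
like this (the URL is a placeholder for the actual Loki endpoint):

```yaml
clients:
    - url: https://<user>:<api-key>@<loki-host>/loki/api/v1/push
```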

```sh
scp -P 7426 services/promtail/* <instance>:

nano promtail.yaml
sudo mv promtail.yaml /root/promtail.yaml
sudo mv promtail.service /etc/systemd/system/promtail.service
```

Tell systemd to pick up new service definitions, enable the unit (so that it
automatically starts on boot), and start it this time around.

```sh
sudo systemctl daemon-reload
sudo systemctl enable promtail
sudo systemctl start promtail
```

infra/services/promtail/promtail.service (new file, 19 lines)

[Unit]
Documentation=https://grafana.com/docs/loki/latest/clients/promtail/
Requires=docker.service
After=docker.service

[Install]
WantedBy=multi-user.target

[Service]
ExecStartPre=docker pull grafana/promtail
ExecStartPre=-docker stop promtail
ExecStartPre=-docker rm promtail
ExecStart=docker run --name promtail \
    --hostname "%H" \
    -v /root/promtail.yaml:/config.yaml:ro \
    -v /var/log:/var/log \
    -v /root/var/logs:/var/logs:ro \
    -v /var/lib/docker/containers:/var/lib/docker/containers:ro \
    grafana/promtail -config.file=/config.yaml -config.expand-env=true

infra/services/promtail/promtail.yaml (new file, 45 lines)

# https://grafana.com/docs/loki/latest/clients/promtail/configuration/

# We don't want Promtail's HTTP / GRPC server.
server:
    disable: true

# Loki URL
# For Grafana Cloud, it can be found in the integrations section.
clients:
    - url: http://loki:3100/loki/api/v1/push

# Manually add entries for all our services. This is a bit cumbersome, but
# - Retains flexibility in file names.
# - Makes adding job labels easy.
# - Does not get in the way of logrotation.
#
# In addition, also scrape logs from all docker containers.
scrape_configs:
    - job_name: museum
      static_configs:
          - labels:
                job: museum
                host: ${HOSTNAME}
                __path__: /var/logs/museum.log

    - job_name: copycat-db
      static_configs:
          - labels:
                job: copycat-db
                host: ${HOSTNAME}
                __path__: /var/logs/copycat-db.log

    - job_name: phoenix
      static_configs:
          - labels:
                job: phoenix
                host: ${HOSTNAME}
                __path__: /var/logs/phoenix.log

    - job_name: docker
      static_configs:
          - labels:
                job: docker
                host: ${HOSTNAME}
                __path__: /var/lib/docker/containers/*/*-json.log