[infra] Add more details about services (#1109)
- Import the existing DesktopDistribute notes
- Define services
- Add prometheus and promtail definitions
commit cd9c6f713a
10 changed files with 398 additions and 13 deletions
desktop/docs/release.md (new file)

@@ -0,0 +1,92 @@
## Releases

> [!NOTE]
>
> TODO(MR): This document needs to be audited and changed as we do the first
> release from this new monorepo.

The GitHub Action that builds the desktop binaries is triggered by pushing a tag
matching the pattern `photos-desktop-v1.2.3`. This value should match the
version in `package.json`.

So the process for doing a release would be:

1. Create a new branch (can be named anything). On this branch, include your
   changes.

2. Mention the changes in `CHANGELOG.md`.

3. Change the `version` in `package.json` to `1.x.x`.

4. Commit and push to remote:

   ```sh
   git add package.json && git commit -m 'Release v1.x.x'
   git tag v1.x.x
   git push && git push --tags
   ```

This by itself will already trigger a new release. The GitHub Action will create
a new draft release that can then be used as described below.

To wrap up, we also need to merge these changes back into main. So for that,

5. Open a PR for the branch that we're working on (where the above tag was
   pushed from) to get it merged into main.

6. In this PR, also increase the version number for the next release train. That
   is, suppose we just released `v4.0.1`. Then we'll change the version number
   in main to `v4.0.2-next.0`. Each pre-release will modify the `next.0` part.
   Finally, at the time of the next release, this'll become `v4.0.2`.
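For instance, a minimal sketch of bumping the version on main after releasing
`v4.0.1` (assuming `npm` is available; editing `package.json` by hand works just
as well):

```sh
# Set "version" in package.json without creating a commit or tag.
npm version 4.0.2-next.0 --no-git-tag-version
```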
The GitHub Action runs on Windows, Linux and macOS. It produces the artifacts
defined in the `build` value in `package.json`.

* Windows - An NSIS installer.
* Linux - An AppImage, and 3 other packages (`.rpm`, `.deb`, `.pacman`).
* macOS - A universal DMG.

Additionally, the GitHub Action notarizes the macOS DMG. For this it needs
credentials provided via GitHub secrets.

During the build the Sentry webpack plugin checks to see if SENTRY_AUTH_TOKEN is
defined. If so, it uploads the sourcemaps for the renderer process to Sentry
(for our GitHub Action, the SENTRY_AUTH_TOKEN is defined as a GitHub secret).

The sourcemaps for the main (node) process are currently not sent to Sentry
(this works fine in practice since the node process files are not minified; we
only run `tsc`).

Once the build is done, a draft release with all these artifacts attached is
created. The build is idempotent, so if something goes wrong and we need to
re-run the GitHub Action, just delete the draft release (if it got created) and
start a new run by pushing a new tag (if some code changes are required).

If no code changes are required, say the build failed for some transient network
or Sentry issue, we can also re-run the build by going to the GitHub Actions
page and re-running it from there. This will re-trigger it for the same tag.
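The same can be done from the command line; a sketch assuming the GitHub CLI
(`gh`) is installed (the Actions web UI described above works just as well):

```sh
# List recent workflow runs, then re-run the failed one by its id.
gh run list --limit 10
gh run rerun <run-id>
```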
If everything goes well, we'll have a release on GitHub, and the corresponding
source maps for the renderer process uploaded to Sentry. There isn't anything
else to do:

* The website automatically redirects to the latest release on GitHub when
  people try to download.

* The file formats that support auto update (Windows `exe`, the Linux AppImage
  and the macOS DMG) also check the latest GitHub release automatically to
  download and apply the update (the rest of the formats don't support auto
  updates).

* We're not putting the desktop app in other stores currently. It is available
  as a `brew cask`, but we only had to open a PR to add the initial formula; now
  their maintainers automatically bump the SHA, version number and the (derived
  from the version) URL in the formula when their tools notice a new release on
  our GitHub.

We can also publish the draft releases by checking the "pre-release" option.
Such releases don't cause any of the channels (our website, or the desktop app
auto updater, or brew) to be notified; instead these are useful for giving links
to pre-release builds to customers. Generally, for these we'll add a label to
the version number, e.g. the "beta.x" in `1.x.x-beta.x`. This should be done
both in `package.json`, and in what we tag the commit with.
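For example, a sketch for cutting a hypothetical `1.2.3-beta.1` pre-release,
mirroring the release steps above:

```sh
# After setting "version" in package.json to 1.2.3-beta.1:
git add package.json && git commit -m 'Release v1.2.3-beta.1'
git tag v1.2.3-beta.1
git push && git push --tags
```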
infra/copycat-db/README.md

@@ -1,8 +1,8 @@
 # Copycat DB
 
-Copycat DB is a [service](../service.md) to take a backup of our database. It
-uses the Scaleway CLI to take backups of the database, and uploads them to an
-offsite bucket.
+Copycat DB is a [service](../services/README.md) to take a backup of our
+database. It uses the Scaleway CLI to take backups of the database, and uploads
+them to an offsite bucket.
 
 This bucket has an object lock configured, so backups cannot be deleted before
 expiry. Conversely, the service also deletes backups older than some threshold

@@ -11,9 +11,8 @@ when it creates a new one to avoid indefinite retention.
 In production the service runs as a cron job, scheduled using a systemd timer.
 
 > These backups are in addition to the regular snapshots that we take, and are
-> meant as a second layer of replication. For more details, see our [Reliability
-> and Replication Specification](https://ente.io/reliability).
-
+> meant as a second layer of replication. For more details, see our
+> [Reliability and Replication Specification](https://ente.io/reliability).
 
 ## Quick help
 

@@ -61,7 +60,8 @@ then the Docker image falls back to using `pg_dump` (as outlined next).
 Not needed in production when taking a backup (since we use the Scaleway CLI to
 take backups in production).
 
-These are used when testing a backup using `pg_dump`, and when restoring backups.
+These are used when testing a backup using `pg_dump`, and when restoring
+backups.
 
 ##### RCLONE_CONFIG
 

@@ -70,9 +70,9 @@ to use to save the backups, and the credentials to to access it.
 
 Specifically, the config file contains two remotes:
 
-* The bucket itself, where data will be stored.
+- The bucket itself, where data will be stored.
 
-* A "crypt" remote that wraps the bucket by applying client side encryption.
+- A "crypt" remote that wraps the bucket by applying client side encryption.
 
 The configuration file will contain (lightly) obfuscated versions of the
 password, and as long as we have the configuration file we can continue using

@@ -164,9 +164,9 @@ you wish to force the job to service immediately
 
 ## Updating
 
-To update, run the [GitHub
-workflow](../../.github/workflows/copycat-db-release.yaml) to build and push the
-latest image to our Docker Registry, then restart the systemd service on the
-instance
+To update, run the
+[GitHub workflow](../../.github/workflows/copycat-db-release.yaml) to build and
+push the latest image to our Docker Registry, then restart the systemd service
+on the instance
 
     sudo systemctl restart copycat-db
infra/services/README.md (new file)

@@ -0,0 +1,104 @@
# Services

"Services" are Docker images we run on our instances and manage using systemd.

All our services (including museum itself) follow the same pattern:

- They're run on vanilla Ubuntu instances. The only expectation they have is
  for Docker to be installed.

- They log to fixed, known locations - `/root/var/logs/foo.log` - so that
  these logs can get ingested by Promtail if needed.

- Each service should consist of a Docker image (or a Docker compose file),
  and a systemd unit file.

- To start / stop / schedule the service, we use systemd.

- Each time the service runs it should pull the latest Docker image, so there
  is no separate installation/upgrade step needed. We can just restart the
  service, and it'll use the latest code.

- Any credentials and/or configuration should be read by mounting the
  appropriate file from `/root/service-name` into the running Docker
  container (see the sketch after this list).
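As a concrete illustration of this pattern, here is a minimal sketch of the
`docker run` invocation that such a unit file's `ExecStart` typically wraps. The
service name `example` and the image name are hypothetical:

```sh
# Pull the latest image on every (re)start, mount the config kept under
# /root/example, and write logs to /root/var/logs on the host.
docker pull ghcr.io/example/example:latest
docker run --name example \
    -v /root/example/example.env:/example.env:ro \
    -v /root/var/logs:/var/logs \
    ghcr.io/example/example:latest
```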
## Systemd cheatsheet

```sh
sudo systemctl status my-service
sudo systemctl start my-service
sudo systemctl stop my-service
sudo systemctl restart my-service
sudo journalctl --unit my-service
```

## Adding a service

Create a systemd unit file (see the various `*.service` files in this repository
for examples).

If we want the service to start on boot, add an `[Install]` section to its
service file (_note_: starting on boot requires one more step later):

```
[Install]
WantedBy=multi-user.target
```

Copy the service file to the instance where we want to run the service. Services
might also have some additional configuration or env files; copy those to the
instance too.

```sh
scp services/example.service example.env <instance>:
```

SSH into the instance.

```sh
ssh <instance>
```

Move the service file to `/etc/systemd/system`, and any config files to their
expected place. Env and other config files that contain credentials are kept in
`/root`.

```sh
sudo mv example.service /etc/systemd/system
sudo mv example.env /root
```

If you want to start the service on boot (as mentioned in the `[Install]`
section above), then enable it (this only needs to be done once):

```sh
sudo systemctl enable example
```

Reload systemd so that it gets to know of the service.

```sh
sudo systemctl daemon-reload
```

Now you can manage the service using standard systemd commands.

```sh
sudo systemctl start example
```

To view stdout/err, use:

```sh
sudo journalctl --follow --unit example
```
## Logging

Services should log to files in `/var/logs` within the container. This should be
mounted to `/root/var/logs` on the instance (using the `-v` flag in the service
file which launches the Docker container or the Docker compose cluster).

If these logs need to be sent to Grafana, then ensure that there is an entry for
this log file in the `promtail/promtail.yaml` on that instance. The logs will
then get scraped by Promtail and sent over to Grafana.
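For example, assuming a hypothetical service that writes to
`/var/logs/example.log`, after adding a scrape entry for that file to
`/root/promtail.yaml` on the instance:

```sh
# Restart Promtail so it picks up the updated scrape config.
sudo systemctl restart promtail
```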
infra/services/prometheus/README.md (new file)

@@ -0,0 +1,32 @@
# Prometheus

Install `prometheus.service` on an instance if it is running something that
exports custom Prometheus metrics. In particular, museum does.

Also install `node-exporter.service` (after installing
[node-exporter](https://prometheus.io/docs/guides/node-exporter/) itself) if it
is a production instance whose metrics (CPU, disk, RAM etc) we want to monitor.
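A rough sketch of installing node_exporter itself, assuming an amd64 instance
and an illustrative version number (check the linked guide for the current
release and exact steps):

```sh
VERSION=1.7.0
curl -LO "https://github.com/prometheus/node_exporter/releases/download/v${VERSION}/node_exporter-${VERSION}.linux-amd64.tar.gz"
tar xzf "node_exporter-${VERSION}.linux-amd64.tar.gz"
# node-exporter.service runs the binary from /usr/local/bin as the node_exporter user.
sudo cp "node_exporter-${VERSION}.linux-amd64/node_exporter" /usr/local/bin/
sudo useradd --no-create-home --shell /usr/sbin/nologin node_exporter
```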
## Installing

Prometheus doesn't currently support environment variables in the config file,
so remember to change the hardcoded `XX-HOSTNAME` too, in addition to adding the
`remote_write` configuration.

```sh
scp -P 7426 services/prometheus/* <instance>:

nano prometheus.yml
sudo mv prometheus.yml /root/prometheus.yml
sudo mv prometheus.service /etc/systemd/system/prometheus.service
sudo mv node-exporter.service /etc/systemd/system/node-exporter.service
```
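If preferred, the hostname substitution can also be scripted instead of editing
the file by hand (a small sketch, assuming GNU sed on the instance):

```sh
# Replace the XX-HOSTNAME placeholder with this instance's hostname.
sed -i "s/XX-HOSTNAME/$(hostname)/" prometheus.yml
```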
Tell systemd to pick up new service definitions, enable the units (so that they
automatically start on boot going forward), and start them.

```sh
sudo systemctl daemon-reload
sudo systemctl enable node-exporter prometheus
sudo systemctl start node-exporter prometheus
```
infra/services/prometheus/node-exporter.service (new file)

@@ -0,0 +1,12 @@
[Unit]
Documentation=https://prometheus.io/docs/guides/node-exporter/
Wants=network-online.target
After=network-online.target

[Install]
WantedBy=multi-user.target

[Service]
User=node_exporter
Group=node_exporter
ExecStart=/usr/local/bin/node_exporter
infra/services/prometheus/prometheus.service (new file)

@@ -0,0 +1,16 @@
[Unit]
Documentation=https://prometheus.io/docs/prometheus/
Requires=docker.service
After=docker.service

[Install]
WantedBy=multi-user.target

[Service]
ExecStartPre=docker pull prom/prometheus
ExecStartPre=-docker stop prometheus
ExecStartPre=-docker rm prometheus
ExecStart=docker run --name prometheus \
    --add-host=host.docker.internal:host-gateway \
    -v /root/prometheus.yml:/etc/prometheus/prometheus.yml:ro \
    prom/prometheus
infra/services/prometheus/prometheus.yml (new file)

@@ -0,0 +1,39 @@
# https://prometheus.io/docs/prometheus/latest/configuration/

global:
  scrape_interval: 30s # Default is 1m

scrape_configs:
  - job_name: museum
    static_configs:
      - targets: ["host.docker.internal:2112"]
    relabel_configs:
      - source_labels: [__address__]
        regex: ".*"
        target_label: instance
        replacement: XX-HOSTNAME

  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
    relabel_configs:
      - source_labels: [__address__]
        regex: ".*"
        target_label: instance
        replacement: XX-HOSTNAME

  - job_name: "node"
    static_configs:
      - targets: ["host.docker.internal:9100"]
    relabel_configs:
      - source_labels: [__address__]
        regex: ".*"
        target_label: instance
        replacement: XX-HOSTNAME

# Grafana Cloud
remote_write:
  - url: https://g/api/prom/push
    basic_auth:
      username: foo
      password: bar
infra/services/promtail/README.md (new file)

@@ -0,0 +1,26 @@
# Promtail

Install `promtail.service` on an instance if it is running something whose logs
we want in Grafana.

## Installing

Replace `client.url` in the config file with the Loki URL that Promtail should
connect to, and move the files to their expected place.

```sh
scp -P 7426 services/promtail/* <instance>:

nano promtail.yaml
sudo mv promtail.yaml /root/promtail.yaml
sudo mv promtail.service /etc/systemd/system/promtail.service
```

Tell systemd to pick up new service definitions, enable the unit (so that it
automatically starts on boot), and start it this time around.

```sh
sudo systemctl daemon-reload
sudo systemctl enable promtail
sudo systemctl start promtail
```
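To confirm that it came up and is tailing the expected files, the usual systemd
checks apply:

```sh
sudo systemctl status promtail
sudo journalctl --follow --unit promtail
```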
infra/services/promtail/promtail.service (new file)

@@ -0,0 +1,19 @@
[Unit]
Documentation=https://grafana.com/docs/loki/latest/clients/promtail/
Requires=docker.service
After=docker.service

[Install]
WantedBy=multi-user.target

[Service]
ExecStartPre=docker pull grafana/promtail
ExecStartPre=-docker stop promtail
ExecStartPre=-docker rm promtail
ExecStart=docker run --name promtail \
    --hostname "%H" \
    -v /root/promtail.yaml:/config.yaml:ro \
    -v /var/log:/var/log \
    -v /root/var/logs:/var/logs:ro \
    -v /var/lib/docker/containers:/var/lib/docker/containers:ro \
    grafana/promtail -config.file=/config.yaml -config.expand-env=true
infra/services/promtail/promtail.yaml (new file)

@@ -0,0 +1,45 @@
# https://grafana.com/docs/loki/latest/clients/promtail/configuration/

# We don't want Promtail's HTTP / GRPC server.
server:
  disable: true

# Loki URL
# For Grafana Cloud, it can be found in the integrations section.
clients:
  - url: http://loki:3100/loki/api/v1/push

# Manually add entries for all our services. This is a bit cumbersome, but
# - Retains flexibility in file names.
# - Makes adding job labels easy.
# - Does not get in the way of logrotation.
#
# In addition, also scrape logs from all docker containers.
scrape_configs:
  - job_name: museum
    static_configs:
      - labels:
          job: museum
          host: ${HOSTNAME}
          __path__: /var/logs/museum.log

  - job_name: copycat-db
    static_configs:
      - labels:
          job: copycat-db
          host: ${HOSTNAME}
          __path__: /var/logs/copycat-db.log

  - job_name: phoenix
    static_configs:
      - labels:
          job: phoenix
          host: ${HOSTNAME}
          __path__: /var/logs/phoenix.log

  - job_name: docker
    static_configs:
      - labels:
          job: docker
          host: ${HOSTNAME}
          __path__: /var/lib/docker/containers/*/*-json.log