[museum] Support running behind Nginx (#1130)

- Move deployment related stuff into a separate folder
- Document the overall approach better
- Add an Nginx specific environment and service definition
- Remove the restart limiter
Manav Rathi 2024-03-18 13:56:01 +05:30 committed by GitHub
commit a341f81932
12 changed files with 193 additions and 20 deletions


@ -1,11 +1,19 @@
# Nginx
This is a base Nginx service that terminates TLS, and can be used as a reverse
proxy for arbitrary services by adding new entries in `/root/nginx/conf.d` and
then running `sudo systemctl restart nginx`.
## Installation
Copy the service definition
```sh
scp services/nginx/nginx.service <instance>:
sudo mv nginx.service /etc/systemd/system/nginx.service
```
Create a directory to house service specific configuration
```sh
sudo mkdir -p /root/nginx/conf.d
```
@ -15,7 +23,18 @@ Add the SSL certificate provided by Cloudflare
```sh
sudo tee /root/nginx/cert.pem
sudo tee /root/nginx/key.pem
```
Tell systemd to pick up the new service definition, enable it (so that it
automatically starts on boot going forward), and start it.
```sh
sudo systemctl daemon-reload
sudo systemctl enable --now nginx
```
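Optionally, a couple of quick checks can confirm that the service came up; these
are my own suggestions rather than part of the documented steps:

```sh
# Optional sanity checks (not part of the original instructions).
sudo systemctl status nginx
# Something should now be listening on 443.
sudo ss -tlnp | grep ':443'
```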
## Adding a service
When adding a new service that sits behind Nginx,

1. Add its nginx conf file to `/root/nginx/conf.d`
2. Restart nginx (`sudo systemctl restart nginx`)
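For illustration, a hypothetical entry could look like the sketch below; the
file name, domain, and upstream port are placeholders, and the
`museum.nginx.conf` added elsewhere in this commit is the real example. The
certificate paths mirror the ones used there:

```sh
# Hypothetical example.conf: proxy example.your-domain.dev to a service on port 3000.
sudo tee /root/nginx/conf.d/example.conf <<'EOF'
server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;

    # Paths mirror museum.nginx.conf; adjust if your certs live elsewhere.
    ssl_certificate /etc/ssl/certs/cert.pem;
    ssl_certificate_key /etc/ssl/private/key.pem;

    server_name example.your-domain.dev;

    location / {
        proxy_pass http://host.docker.internal:3000;
    }
}
EOF
sudo systemctl restart nginx
```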


@ -3,9 +3,10 @@
Install `prometheus.service` on an instance if it is running something that
exports custom Prometheus metrics. In particular, museum does.
If it is an instance whose metrics (CPU, disk, RAM etc.) we want to monitor, also
install `node-exporter.service` after installing
[node-exporter](https://prometheus.io/docs/guides/node-exporter/) itself (note
that our prepare-instance script already installs node-exporter).
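Assuming the exporters' default ports (9100 for node-exporter, 9090 for
Prometheus itself) and the 2112 metrics port that museum's service definition
maps, a quick sanity check might look like the following sketch; the `/metrics`
path for museum is the usual Prometheus convention and is my assumption:

```sh
curl -s localhost:9100/metrics | head    # node-exporter
curl -s localhost:9090/-/healthy         # prometheus health endpoint
curl -s localhost:2112/metrics | head    # museum's custom metrics (if running)
```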
## Installing
@ -14,7 +15,8 @@ remember to change the hardcoded `XX-HOSTNAME` too in addition to adding the
`remote_write` configuration.
```sh
scp services/prometheus/prometheus.* <instance>:
scp services/prometheus/node-exporter.service <instance>:
nano prometheus.yml
sudo mv prometheus.yml /root/prometheus.yml


@ -9,7 +9,7 @@ Replace `client.url` in the config file with the Loki URL that Promtail should
connect to, and move the files to their expected place.
```sh
scp services/promtail/promtail.* <instance>:
nano promtail.yaml
sudo mv promtail.yaml /root/promtail.yaml
@ -21,6 +21,5 @@ automatically starts on boot), and start it this time around.
```sh
sudo systemctl daemon-reload
sudo systemctl enable --now promtail
```
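To verify that Promtail started and is shipping logs (my addition, not part of
the original steps), the systemd status and journal are usually enough:

```sh
sudo systemctl status promtail
sudo journalctl -u promtail -n 20 --no-pager
```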


@ -95,9 +95,10 @@ setup we ourselves use in production.
> [!TIP]
>
> On our production servers, we wrap museum in a [systemd
> service](scripts/deploy/museum.service). Our production machines are vanilla
> Ubuntu images, with Docker and Promtail installed. We then plonk in this
> systemd service, and use `systemctl start|stop|status museum` to herd it
> around.
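Concretely, the herding amounts to ordinary systemd commands plus tailing the
log file written under the mounted `/root/var` directory (a sketch of our
setup, not an exhaustive runbook):

```sh
sudo systemctl status museum
sudo systemctl restart museum
# The production config writes to /var/logs/museum.log inside the container,
# which the systemd unit mounts from /root/var on the host.
sudo tail -f /root/var/logs/museum.log
```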
Some people new to Docker/Go/Postgres might have general questions though.
Unfortunately, because of limited engineering bandwidth **we will currently not


@ -712,9 +712,8 @@ func main() {
}
func runServer(environment string, server *gin.Engine) {
    useTLS := viper.GetBool("http.use-tls")
    if useTLS {
        certPath, err := config.CredentialFilePath("tls.cert")
        if err != nil {
            log.Fatal(err)
@ -726,6 +725,8 @@ func runServer(environment string, server *gin.Engine) {
        }
        log.Fatal(server.RunTLS(":443", certPath, keyPath))
    } else {
        server.Run(":8080")
    }
}
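To sanity-check which mode the server came up in, something like the following
should work; it assumes museum's usual `/ping` health endpoint and the default
ports, so treat it as a sketch:

```sh
# http.use-tls unset or false: plain HTTP on port 8080
curl -s http://localhost:8080/ping
# http.use-tls true: TLS on port 443 (-k because the cert is typically not in
# the local trust store)
curl -sk https://localhost/ping
```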


@ -65,6 +65,12 @@
# It must be specified if running in a non-local environment.
log-file: ""
# HTTP connection parameters
http:
    # If true, bind to 443 and use TLS.
    # By default, this is false, and museum will bind to 8080 without TLS.
    # use-tls: true

# Database connection parameters
db:
    host: localhost


@ -1,5 +1,8 @@
log-file: /var/logs/museum.log

http:
    use-tls: true

stripe:
    path:
        success: ?status=success&session_id={CHECKOUT_SESSION_ID}


@ -0,0 +1,92 @@
# Production Deployments
This document outlines how we ourselves deploy museum. Note that this is very
specific to our use case, and while it might be useful as an example, it is
likely overkill for simple self-hosted deployments.
## Overview
We use museum's Dockerfile to build images which we then run on vanilla Ubuntu
servers (+ Docker installed). For ease of administration, we wrap Docker
commands to start/stop/update it in a systemd service.
* The production machines are vanilla Ubuntu instances, with Docker and Promtail
installed.
* There is a [GitHub action](../../../.github/workflows/server-release.yml) to
build museum Docker images using its Dockerfile.
* We wrap the commands to start and stop containers using these images in a
systemd service.
* We call this general concept of standalone Docker images that are managed
using systemd "services". More examples and details
[here](../../../infra/services/README.md).
* So museum is a "service". You can see its systemd unit definition in
[museum.service](museum.service).
* On the running instance, we use `systemctl start|stop|status museum` to manage
it.
* The service automatically updates itself on each start. There's also a
convenience [script](update-and-restart-museum.sh) that pre-downloads the
latest image to further reduce the delay during a restart.
* Alternatively, museum can also be run behind Nginx. This option has a
separate service definition.
## Installation
To bring up an additional museum node:
* Prepare the instance to run our services
* Set up the [promtail](../../../infra/services/promtail/README.md), [prometheus
and node-exporter](../../../infra/services/prometheus/README.md) services
* If running behind Nginx, install the
[nginx](../../../infra/services/nginx/README.md) service.
* Add credentials
```sh
sudo mkdir -p /root/museum/credentials
sudo tee /root/museum/credentials/pst-service-account.json
sudo tee /root/museum/credentials/fcm-service-account.json
sudo tee /root/museum/credentials.yaml
```
* If not running behind Nginx, add the TLS credentials (otherwise add them to
Nginx)
```sh
sudo tee /root/museum/credentials/tls.cert
sudo tee /root/museum/credentials/tls.key
```
* Copy the service definition and restart script to the new instance. The
restart script can remain in the ente user's home directory. Move the service
definition to its proper place.
```sh
# If using nginx
scp scripts/deploy/museum.nginx.service <instance>:museum.service
# otherwise
scp scripts/deploy/museum.service <instance>:

scp scripts/deploy/update-and-restart-museum.sh <instance>:

sudo mv museum.service /etc/systemd/system
sudo systemctl daemon-reload
```
sudo systemctl daemon-reload
* If running behind Nginx, tell it about museum
```sh
scp scripts/deploy/museum.nginx.conf <instance>:

sudo mv museum.nginx.conf /root/nginx/conf.d
sudo systemctl restart nginx
```
## Starting
SSH into the instance, and run
```sh
./update-and-restart-museum.sh
```
This'll ask for sudo credentials, pull the latest Docker image, restart the
museum service and start tailing the logs (as a sanity check).
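For an extra sanity check beyond the logs, the container and the health
endpoint can be probed directly (assuming museum's `/ping` endpoint):

```sh
sudo docker ps --filter name=museum     # the museum container should be up
# When running behind Nginx (museum serves plain HTTP on 8080):
curl -s http://localhost:8080/ping
# When museum terminates TLS itself:
curl -sk https://localhost/ping
```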


@ -0,0 +1,16 @@
server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;

    ssl_certificate /etc/ssl/certs/cert.pem;
    ssl_certificate_key /etc/ssl/private/key.pem;

    server_name api.ente.io;

    location / {
        proxy_pass http://host.docker.internal:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
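To exercise this server block from the instance itself without touching DNS, a
`--resolve` based curl works (a sketch; `-k` because the Cloudflare origin
certificate is not in the local trust store, and `/ping` is assumed to be
museum's health endpoint). Note that `host.docker.internal` resolving from
inside the Nginx container is an assumption about how the base Nginx service
runs Docker; on Linux that typically needs an `--add-host
host.docker.internal:host-gateway` style mapping, which is not shown in this
diff.

```sh
# Hit the api.ente.io server block via the local Nginx.
curl -sk --resolve api.ente.io:443:127.0.0.1 https://api.ente.io/ping
```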


@ -0,0 +1,22 @@
[Unit]
Documentation=https://github.com/ente-io/ente/tree/main/server#readme
Requires=docker.service
After=docker.service
Requires=nginx.service

[Service]
Restart=on-failure
ExecStartPre=docker pull rg.fr-par.scw.cloud/ente/museum-prod
ExecStartPre=-docker stop museum
ExecStartPre=-docker rm museum
ExecStart=docker run --name museum \
    -e ENVIRONMENT=production \
    -e ENTE_HTTP_USE-TLS=1 \
    --hostname "%H" \
    -p 8080:8080 \
    -p 2112:2112 \
    -v /root/museum/credentials:/credentials:ro \
    -v /root/museum/credentials.yaml:/credentials.yaml:ro \
    -v /root/museum/data:/data:ro \
    -v /root/var:/var \
    rg.fr-par.scw.cloud/ente/museum-prod
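Since the container runs in the foreground under systemd, its output ends up in
the journal; generic ways to follow it (my suggestions, not from the original
docs):

```sh
sudo journalctl -u museum -f    # via the systemd journal
sudo docker logs -f museum      # the same output, via Docker
```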


@ -1,10 +1,7 @@
[Unit]
Documentation=https://github.com/ente-io/ente/tree/main/server#readme
Requires=docker.service
After=docker.service

[Service]
Restart=on-failure


@ -0,0 +1,15 @@
#!/bin/sh
# This script is meant to be run on the production instances.
#
# It will pull the latest Docker image, restart the museum process and start
# tailing the logs as a sanity check.
set -o errexit
# The service file also does this, but we also pre-pull here to minimize downtime.
sudo docker pull rg.fr-par.scw.cloud/ente/museum-prod
sudo systemctl restart museum
sudo systemctl status museum | more
sudo tail -f /root/var/logs/museum.log