Also simplify the retry logic: each retry attempt is now
allowed to use the full 30 seconds. This means, a single
webhook delivery can take up to 3*30=90 seconds.
If sendalerts receives this parameter, it reconfigures
settings.DATABASES to enable db connection pooling
(using psycopg_pool with default parameters).
This lets us use many concurrent worker threads but not
run out of database connections. For example, with
`--num-workers 100 --pool`, up to 100 worker threads can run
concurrently, but only 3 threads can get a database connection
from the pool, the rest have to wait. When a worker thread
gives up a connection (by calling `close_old_connections`),
another thread can continue.
A worker thread can give up a db connection before it is fully
finished if it anticipates a long network IO operation ahead.
The Webhook transport does this before making a curl call.
psycopg_pool's default pool size is 4 connections. One
connection is used up by the main thread, so 3 connections
are available for the worker threads.
Fixes: #1057
"PRAGMA busy_timeout" configures the database to wait when a
database is locked instead of giving up immediately.
"transaction_mode IMMEDIATE" starts transactions in read/write
mode, required to make busy_timeout work.
Reference: https://gcollazo.com/optimal-sqlite-settings-for-django/
... and install psycopg-c using instuctions in Dockerfile.
This way, getting a development environment or CI environment ready
is quick and easy, but Docker images still get the C optimizations.
Previously this was done in process_one_flip (so on the main thread).
The advantage of doing this way is the flip gets marked as processed
only when the thread has started and has acquired a db connection.
There is now a smaller pause between a sendalerts process claiming a
flip, and actually starting work on it.
Webhook requests can take 20+ seconds. During that time we hold
on to a database connection. With this commit, the Webhook transport
closes its DB connection before making a curl call.
With psycopg2 this does not have much effect. But with
psycopg 3 & connection pooling we will be able to use more
sendalerts workers than we have database connections. While one
worker is busy making a slow curl call, another worker can
grab its freed up connection and do some work.
Django's test runner is not happy with connections closed
mid-test, so I patched out close_old_connections() in affected tests.
This is a tricky one: the default value for max_workers is
None. But it doesn't mean "unlimited", in Python 3.8+ it
means "min(32, os.cpu_count() + 4)"
For example on 8-core CPU the effective value would be 8 + 4 = 12,
and passing anything above 12 to `--max-workers` would have no effect.
The counter was slightly wrong (it counted lost races as sent
notifications). Rather than complicating code to make it correct,
let's rather just remove it :-)
* Remove the --no-loop and --no-threads arguments
* Use a threadpool to do multiple sends concurrently
* Add a new `--num-workers` argument. It limits how many flips we grab
from the database and process concurrently.
* Do not prioritize flips with historically low send times any more
(not as important now with concurrent sending, and simpler this way)
* Workers close db connections when they finish
(to keep the number of idle connections low)
Note: concurrent.futures.ThreadPoolExecutor internally has an unbounded
queue, it will accept any amount of jobs and keep them queued. We don't
want that. We only want to grab a flip, and commit to processing it,
if we know there's a free worker for it. Therefore we're tracking the
number of jobs in flight using a semaphore (`self.seats`).
Commit 8fed685f12 added a HEALTHCHECK
instruction in the Dockerfile. The healthcheck script calls http://localhost:8000/api/v3/status/, which fails if localhost is not in ALLOWED_HOSTS.
With this change, the healthcheck script is now a Django management
command. It reads Django's ALLOWED_HOSTS setting, grabs the first
element, and uses it in the "Host:" HTTP header when making a HTTP
request.
cc: #1051
* Healthchecks depends on python library "fido2"
* fido2 depends on python library "cryptography"
* building cryptography requires recent (1.65+) rustc
* cryptography has prebuilt binary wheels for most architectures
but not for arm/v7
* Dockerfile uses bookworm as base, which ships rustc 1.63
* So we now install rust using rustup
This is all terrible.
This commit adds a HEALTHCHECK instruction in Dockerfile.
The HEALTHCHECK instruction calls /docker/fetchstatus.sh
which in turn makes a HTTP request to
http://localhost:8000/api/v3/status/
This endpoint makes a test database query and returns non-200
response if the query fails. So, in short, if the Healthchecks
container for any reason is unable to query database, `docker ps`
will now show the container as "unhealthy".
cc: #1045
Previously, if the user enters a weak password like "qwerty",
the score is 0, the password strength bar is empty (all gray).
It is easy to not notice the password strength bar at all.
Now, the lowest score for a non-empty password is 1, meaning
the user will see one red bar. This will hopefully draw more
attention to the password strength bar.
Users are still allowed to choose weak passwords.
I like "sign in" better, but users from time
to time confuse "sign in" and "sign up" forms. To reduce
confusion potential, I'm renaming "sign in" to "log in".