0ct0pu5/healthchecks

Author	SHA1	Message	Date
Pēteris Caune	c91213179f	Fix API to gracefully handle too long slugs	2024-10-16 12:35:30 +03:00
Pēteris Caune	8c210e151f	Update the Signal integration to retry on network errors	2024-10-14 11:19:37 +03:00
Pēteris Caune	d574fa65fc	Update _refresh_last_active_date to also refresh user session. Fixes: #1063	2024-10-10 15:41:00 +03:00
Pēteris Caune	4f9b0b11b9	Update Signal transport to log unexpected signal-cli replies. When signal-cli returns an error that we are not handling yet, log the precise JSON message that signal-cli returns. This is for debug & development: we can look at the logged messages and see what additional special error handling may be needed.	2024-10-10 10:21:08 +03:00
Pēteris Caune	e49b5f8fbd	Remove LINE Notify onboarding form. LINE Notify is shutting down on Apr 1, 2025 (https://notify-bot.line.me/closing-announce). I'm removing the onboarding form so people don't set up new integrations that will stop working in 5 months. The code for sending LINE Notify notifications still exists, and the existing integrations will continue to work (until LINE Notify stops working).	2024-10-08 09:13:03 +03:00
Pēteris Caune	fd96cc794b	Remove unused bits	2024-10-04 17:34:30 +03:00
Pēteris Caune	a51420744c	Add RiskCheck: disable in SMS transport. This is to reduce the chance of hitting Twilio error 30453, "Message couldn't be delivered" (https://www.twilio.com/docs/api/errors/30453).	2024-10-02 17:01:23 +03:00
Pēteris Caune	de4c4897e3	Remove `prunenotifications` management command. Notifications are now cleaned up automatically during pinging.	2024-10-02 09:24:01 +03:00
Pēteris Caune	13f92b90ef	Update settings.py to read SECURE_PROXY_SSL_HEADER from env vars. Also document it, and add a system check to make sure that, if set, it is a tuple with 2 elements. cc: #851	2024-10-01 19:13:26 +03:00
Pēteris Caune	e73d7a1ece	Remove `pruneflips` management command. Flips are now cleaned up automatically during pinging.	2024-10-01 15:33:56 +03:00
Pēteris Caune	12cccaf7d1	Fix Project.num_checks naming collision. The Project model has (well, had) a num_checks() method. In the project admin we are also annotating the project queryset with a "num_checks" property. Using the same name for two different things causes type confusion for mypy and can also lead to coding accidents. This commit removes the Project.num_checks() method. This was easier to do than changing the admin, as the method is very simple and was used in only two places.	2024-09-24 10:18:22 +03:00
Pēteris Caune	2cb47d3742	Make the sorting of null values in Flip.select_channels() explicit	2024-09-12 10:52:06 +03:00
Pēteris Caune	f241d070e1	Update Flip.select_channels() to sort channels by last_notify_duration. If a check has multiple associated channels, and some are slow while others are quick, handle the quick ones first.	2024-09-12 10:44:56 +03:00
Pēteris Caune	f60af9a156	Update ntfy integration to give up db connection before network IO	2024-09-12 10:30:58 +03:00
Pēteris Caune	28af3720f4	Increase outgoing webhook timeout from 10 to 30 seconds. Also simplify the retry logic: each retry attempt is now allowed to use the full 30 seconds. This means a single webhook delivery can take up to 3*30=90 seconds.	2024-09-11 12:37:40 +03:00
Pēteris Caune	13217af304	Add --pool parameter in `manage.py sendalerts`. If sendalerts receives this parameter, it reconfigures settings.DATABASES to enable db connection pooling (using psycopg_pool with default parameters). This lets us use many concurrent worker threads without running out of database connections. For example, with `--num-workers 100 --pool`, up to 100 worker threads can run concurrently, but only 3 threads can get a database connection from the pool; the rest have to wait. When a worker thread gives up a connection (by calling `close_old_connections`), another thread can continue. A worker thread can give up a db connection before it is fully finished if it anticipates a long network IO operation ahead. The Webhook transport does this before making a curl call. psycopg_pool's default pool size is 4 connections. One connection is used up by the main thread, so 3 connections are available for the worker threads.	2024-09-10 14:58:24 +03:00
Pēteris Caune	8eecece0bb	Add db migration for the updated msteams name	2024-09-10 14:45:48 +03:00
Pēteris Caune	fd0c428e29	Update sqlite settings to avoid "Database is locked" errors. Fixes: #1057. "PRAGMA busy_timeout" configures the database to wait when it is locked instead of giving up immediately. "transaction_mode IMMEDIATE" starts transactions in read/write mode, which is required to make busy_timeout work. Reference: https://gcollazo.com/optimal-sqlite-settings-for-django/	2024-09-09 10:11:22 +03:00
Pēteris Caune	6bf588d984	Remove unused import	2024-09-04 10:49:09 +03:00
Pēteris Caune	9d4fc031aa	Fix sendalerts to check the self.shutdown flag more often	2024-09-03 10:30:18 +03:00
Pēteris Caune	3275e0ffaa	Update notify() to return logs instead of printing them	2024-09-03 10:23:15 +03:00
Pēteris Caune	8c56ca6dde	Update sendalerts to mark flip as processed on thread. Previously this was done in process_one_flip (so on the main thread). The advantage of doing it this way is that the flip gets marked as processed only when the thread has started and has acquired a db connection. There is now a smaller pause between a sendalerts process claiming a flip and actually starting work on it.	2024-09-01 15:28:48 +03:00
Pēteris Caune	fd75049e0c	Fix type warnings	2024-08-31 19:23:10 +03:00
Pēteris Caune	a463daa775	Update Webhook transport to close db connection before network IO. Webhook requests can take 20+ seconds. During that time we hold on to a database connection. With this commit, the Webhook transport closes its DB connection before making a curl call. With psycopg2 this does not have much effect. But with psycopg 3 & connection pooling we will be able to use more sendalerts workers than we have database connections. While one worker is busy making a slow curl call, another worker can grab its freed-up connection and do some work. Django's test runner is not happy with connections closed mid-test, so I patched out close_old_connections() in affected tests.	2024-08-31 19:18:17 +03:00
Pēteris Caune	9803d77a1d	Set explicit max_workers value for ThreadPoolExecutor. This is a tricky one: the default value for max_workers is None. But it doesn't mean "unlimited"; in Python 3.8+ it means "min(32, os.cpu_count() + 4)". For example, on an 8-core CPU the effective value would be 8 + 4 = 12, and passing anything above 12 to `--num-workers` would have no effect.	2024-08-31 19:11:39 +03:00
Pēteris Caune	4cd677536d	Remove sent notification counter. The counter was slightly wrong (it counted lost races as sent notifications). Rather than complicating the code to make it correct, let's just remove it :-)	2024-08-31 19:07:25 +03:00
Pēteris Caune	faa1a2c99f	Add logging for exceptions thrown inside notify()	2024-08-31 19:04:41 +03:00
Pēteris Caune	7641f2a9a1	Switch to using close_old_connections() instead of connection.close()	2024-08-31 19:02:11 +03:00
Pēteris Caune	d76dc53e49	Increase Signal send timeout to 60 seconds	2024-08-31 11:07:17 +03:00
Pēteris Caune	b1b0a57033	Tweak sendalerts log format	2024-08-30 17:00:30 +03:00
Pēteris Caune	8a3a9b2a7e	Fix code comments	2024-08-29 16:30:28 +03:00
Pēteris Caune	029881f3b9	Refactor sendalerts. * Remove the --no-loop and --no-threads arguments * Use a threadpool to do multiple sends concurrently * Add a new `--num-workers` argument. It limits how many flips we grab from the database and process concurrently. * Do not prioritize flips with historically low send times any more (not as important now with concurrent sending, and simpler this way) * Workers close db connections when they finish (to keep the number of idle connections low). Note: concurrent.futures.ThreadPoolExecutor internally has an unbounded queue; it will accept any number of jobs and keep them queued. We don't want that. We only want to grab a flip, and commit to processing it, if we know there's a free worker for it. Therefore we're tracking the number of jobs in flight using a semaphore (`self.seats`).	2024-08-29 16:20:36 +03:00
Pēteris Caune	3968a4f9e0	Update MS Teams Connector EOL date	2024-08-27 16:34:59 +03:00
Pēteris Caune	027fcc1097	Simplify and eliminate assert	2024-08-20 14:39:11 +03:00
Pēteris Caune	0a4f038987	Simplify and eliminate assert	2024-08-20 14:13:58 +03:00
Pēteris Caune	b27ffe07a6	Update email_form to use more precise type annotation	2024-08-20 13:58:52 +03:00
Pēteris Caune	001ba8b69b	Fix type warnings	2024-08-20 11:06:55 +03:00
Pēteris Caune	5e051bfc30	Fix AJAX views to better handle user logging out. Rather than redirecting to the login page, return HTTP 403 Forbidden.	2024-08-20 10:57:36 +03:00
Pēteris Caune	70b55a777b	Add migration which updates Channel.kind values. This is to go with `8054191be3`, and should have been in there :-) cc: #1050	2024-08-17 12:12:47 +03:00
Pēteris Caune	d3ae4e7fac	Add support for $SLUG placeholder in webhook payloads. Fixes: #1049	2024-08-16 13:24:12 +03:00
Pēteris Caune	cda744d0c1	Implement search by slug in the checks list. cc: #1048	2024-08-15 14:17:28 +03:00
Pēteris Caune	3fbba0c2f0	Update timezone dropdowns to show frequently used timezones at the top	2024-08-13 13:57:52 +03:00
Pēteris Caune	b859a71920	Rename "sign in" to "log in". I like "sign in" better, but users from time to time confuse the "sign in" and "sign up" forms. To reduce the potential for confusion, I'm renaming "sign in" to "log in".	2024-08-12 15:09:58 +03:00
Pēteris Caune	56862a1c49	Update NotificationsAdmin to use __ lookup in list_display	2024-08-07 17:39:17 +03:00
Pēteris Caune	f7876f67d7	Remove unused code	2024-08-07 17:38:43 +03:00
Pēteris Caune	aa2bd8cf66	Fix a testcase not correctly using sample values	2024-07-29 10:36:29 +03:00
Pēteris Caune	ba8a58a8a7	Fix type annotation	2024-07-29 09:57:28 +03:00
Pēteris Caune	42b733540d	Fix type annotation. It used the wrong model name, and neither mypy nor I noticed until the upgrade to django-stubs 5.0.4.	2024-07-29 09:50:56 +03:00
Pēteris Caune	7346994ae8	Fix field name in TypedDict used for type checking	2024-07-18 18:19:01 +03:00
Pēteris Caune	bdb6f18a3d	Add "uuid" field in API responses when read/write key is used. The API responses already contain ping_url, update_url, resume_url, and pause_url fields from which the UUID can be extracted, so we are not exposing new information. The extraction can be finicky in, say, shell-scripting scenarios, so for API user convenience we will now also provide the check's code (UUID) as a separate field. Fixes: #1007	2024-07-18 18:15:52 +03:00
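
The bounded-submission idea from commits 029881f3b9 and 9803d77a1d (a semaphore of "seats" in front of a ThreadPoolExecutor, plus an explicit max_workers) can be sketched roughly as below. This is a minimal illustration, not the sendalerts code: `run_bounded` and `claim_next` are hypothetical names, and `claim_next` stands in for "grab one flip from the database", returning a job callable, or None when there is nothing left to do. A seat is acquired before claiming, so work is only claimed when a worker is free.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

Job = Callable[[], None]


def run_bounded(claim_next: Callable[[], Optional[Job]], num_workers: int) -> int:
    """Run claimed jobs concurrently, never holding more than num_workers in flight.

    Hypothetical sketch: ThreadPoolExecutor.submit() never blocks because its
    internal queue is unbounded, so a semaphore gates *claiming* new work.
    Returns the number of jobs submitted.
    """
    seats = threading.BoundedSemaphore(num_workers)
    submitted = 0
    # Pass max_workers explicitly: the default (min(32, cpu_count + 4) on
    # Python 3.8+) could silently cap concurrency below num_workers.
    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        while True:
            seats.acquire()  # block until a worker seat is free
            job = claim_next()  # only now commit to a unit of work
            if job is None:
                seats.release()  # nothing claimed, give the seat back
                break
            future = executor.submit(job)
            # Free the seat once the job completes, whether it returned or raised.
            future.add_done_callback(lambda _f: seats.release())
            submitted += 1
    # Leaving the "with" block waits for all submitted jobs to finish.
    return submitted
```

Releasing the seat from a done callback means every job frees its seat regardless of how it ends, without each job having to remember to do so.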
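
The two SQLite settings from commit fd0c428e29 ("PRAGMA busy_timeout" and immediate transactions) can be illustrated outside Django with Python's standard `sqlite3` module. This is a sketch under assumptions: the 5000 ms timeout, the `connect`/`increment` helpers, and the `counter` table are illustrative, not taken from the commit. `isolation_level=None` stops the module from opening transactions implicitly, so the code issues `BEGIN IMMEDIATE` itself.

```python
import sqlite3


def connect(path: str, busy_timeout_ms: int = 5000) -> sqlite3.Connection:
    # timeout=0 disables the sqlite3 module's own busy timeout, so the
    # PRAGMA below is the setting actually in effect.
    conn = sqlite3.connect(path, timeout=0, isolation_level=None)
    # When the database is locked, wait up to busy_timeout_ms for the lock
    # instead of failing immediately with "database is locked".
    conn.execute(f"PRAGMA busy_timeout = {int(busy_timeout_ms)}")
    return conn


def increment(conn: sqlite3.Connection) -> None:
    # BEGIN IMMEDIATE takes the write lock at the start of the transaction.
    # With a plain (deferred) BEGIN, a transaction that reads first and writes
    # later can get SQLITE_BUSY when upgrading to a write lock, without
    # waiting for busy_timeout.
    conn.execute("BEGIN IMMEDIATE")
    try:
        (n,) = conn.execute("SELECT n FROM counter").fetchone()
        conn.execute("UPDATE counter SET n = ?", (n + 1,))
        conn.execute("COMMIT")
    except BaseException:
        conn.execute("ROLLBACK")
        raise
```

With both settings in place, concurrent writers queue up on the lock rather than erroring out, which is the behavior the commit aims for.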

1955 commits