0ct0pu5/healthchecks

Author	SHA1	Message	Date
Pēteris Caune	9d4fc031aa	Fix sendalerts to check the self.shutdown flag more often	2024-09-03 10:30:18 +03:00
Pēteris Caune	3275e0ffaa	Update notify() to return logs instead of printing them	2024-09-03 10:23:15 +03:00
Pēteris Caune	8c56ca6dde	Update sendalerts to mark flip as processed on thread Previously this was done in process_one_flip (so on the main thread). The advantage of doing this way is the flip gets marked as processed only when the thread has started and has acquired a db connection. There is now a smaller pause between a sendalerts process claiming a flip, and actually starting work on it.	2024-09-01 15:28:48 +03:00
Pēteris Caune	fd75049e0c	Fix type warnings	2024-08-31 19:23:10 +03:00
Pēteris Caune	a463daa775	Update Webhook transport to close db connection before network IO Webhook requests can take 20+ seconds. During that time we hold on to a database connection. With this commit, the Webhook transport closes its DB connection before making a curl call. With psycopg2 this does not have much effect. But with psycopg 3 & connection pooling we will be able to use more sendalerts workers than we have database connections. While one worker is busy making a slow curl call, another worker can grab its freed up connection and do some work. Django's test runner is not happy with connections closed mid-test, so I patched out close_old_connections() in affected tests.	2024-08-31 19:18:17 +03:00
Pēteris Caune	9803d77a1d	Set explicit max_workers value for ThreadPoolExecutor This is a tricky one: the default value for max_workers is None. But it doesn't mean "unlimited", in Python 3.8+ it means "min(32, os.cpu_count() + 4)" For example on 8-core CPU the effective value would be 8 + 4 = 12, and passing anything above 12 to `--max-workers` would have no effect.	2024-08-31 19:11:39 +03:00
Pēteris Caune	4cd677536d	Remove sent notification counter The counter was slightly wrong (it counted lost races as sent notifications). Rather than complicating code to make it correct, let's rather just remove it :-)	2024-08-31 19:07:25 +03:00
Pēteris Caune	faa1a2c99f	Add logging for exceptions thrown inside notify()	2024-08-31 19:04:41 +03:00
Pēteris Caune	7641f2a9a1	Switch to using close_old_connections() instead of connection.close()	2024-08-31 19:02:11 +03:00
Pēteris Caune	d76dc53e49	Increase Signal send timeout to 60 seconds	2024-08-31 11:07:17 +03:00
Pēteris Caune	b1b0a57033	Tweak sendalerts log format	2024-08-30 17:00:30 +03:00
Pēteris Caune	8a3a9b2a7e	Fix code comments	2024-08-29 16:30:28 +03:00
Pēteris Caune	029881f3b9	Refactor sendalerts * Remove the --no-loop and --no-threads arguments * Use a threadpool to do multiple sends concurrently * Add a new `--num-workers` argument. It limits how many flips we grab from the database and process concurrently. * Do not prioritize flips with historically low send times any more (not as important now with concurrent sending, and simpler this way) * Workers close db connections when they finish (to keep the number of idle connections low) Note: concurrent.futures.ThreadPoolExecutor internally has an unbounded queue, it will accept any amount of jobs and keep them queued. We don't want that. We only want to grab a flip, and commit to processing it, if we know there's a free worker for it. Therefore we're tracking the number of jobs in flight using a semaphore (`self.seats`).	2024-08-29 16:20:36 +03:00
Pēteris Caune	3968a4f9e0	Update MS Teams Connector EOL date	2024-08-27 16:34:59 +03:00
Pēteris Caune	027fcc1097	Simplify and eliminate assert	2024-08-20 14:39:11 +03:00
Pēteris Caune	0a4f038987	Simplify and eliminate assert	2024-08-20 14:13:58 +03:00
Pēteris Caune	b27ffe07a6	Update email_form to use more precise type annotation	2024-08-20 13:58:52 +03:00
Pēteris Caune	001ba8b69b	Fix type warnings	2024-08-20 11:06:55 +03:00
Pēteris Caune	5e051bfc30	Fix AJAX views to better handle user logging out Rather than redirecting to login page, return HTTP 403 Forbidden	2024-08-20 10:57:36 +03:00
Pēteris Caune	70b55a777b	Add migration which updates Channel.kind values This is to go with `8054191be3`, and should have been in there :-) cc: #1050	2024-08-17 12:12:47 +03:00
Pēteris Caune	d3ae4e7fac	Add support for $SLUG placeholder in webhook payloads Fixes: #1049	2024-08-16 13:24:12 +03:00
Pēteris Caune	cda744d0c1	Implement search by slug in the checks list cc: #1048	2024-08-15 14:17:28 +03:00
Pēteris Caune	3fbba0c2f0	Update timezone dropdowns to show frequently used timezones at the top	2024-08-13 13:57:52 +03:00
Pēteris Caune	b859a71920	Rename "sign in" to "log in" I like "sign in" better, but users from time to time confuse "sign in" and "sign up" forms. To reduce confusion potential, I'm renaming "sign in" to "log in".	2024-08-12 15:09:58 +03:00
Pēteris Caune	56862a1c49	Update NotificationsAdmin to use __ lookup in list_display	2024-08-07 17:39:17 +03:00
Pēteris Caune	f7876f67d7	Remove unused code	2024-08-07 17:38:43 +03:00
Pēteris Caune	aa2bd8cf66	Fix a testcase not correctly using sample values	2024-07-29 10:36:29 +03:00
Pēteris Caune	ba8a58a8a7	Fix type annotation	2024-07-29 09:57:28 +03:00
Pēteris Caune	42b733540d	Fix type annotation It used the wrong model name and neither me nor mypy noticed until upgrade to django-stubs 5.0.4	2024-07-29 09:50:56 +03:00
Pēteris Caune	7346994ae8	Fix field name in TypedDict used for type checking	2024-07-18 18:19:01 +03:00
Pēteris Caune	bdb6f18a3d	Add "uuid" field in API responses when read/write key is used The API responses already contain ping_url, update_url, resume_url, pause_url fields where the UUID can be extracted from, so we are not exposing new information. The extraction can be finicky in, say, shell-scripting scenarios. So for API user convenience we will now also provide the check's code (UUID) as a separate field. Fixes: #1007	2024-07-18 18:15:52 +03:00
Pēteris Caune	8054191be3	Remove HipChat, Pagerteam, Zendesk channel kinds HipChat and Pagerteam products have long been shut down, the Zendesk integration was never fully implemented.	2024-07-18 16:21:45 +03:00
Pēteris Caune	61bdd975e8	Add "(stops working Oct 2024)" note to the old MS Teams integration	2024-07-18 10:27:51 +03:00
Pēteris Caune	9660bc293c	Update hc.lib.s3 to retry failed requests one time	2024-07-17 17:26:49 +03:00
Pēteris Caune	ce5d9bcf56	Re-enable S3 retries	2024-07-17 17:04:33 +03:00
Pēteris Caune	e83f60cc0b	Implement MS Teams Workflows integration We already have a MS Teams integration but MS Teams is discontinuing the incoming webhook feature used by this integration: https://devblogs.microsoft.com/microsoft365dev/retirement-of-office-365-connectors-within-microsoft-teams/ MS Teams now recommends to use Workflows to post messages via webhook. MS Teams does not provide backwards compatibility or an upgrade path for existing integrations. This commit adds a new "msteamsw" integration which uses MS Teams Workflows to post notifications. It also updates the instructions and illustrations in the "Add MS Teams Integration" page. cc: #1024	2024-07-17 13:35:17 +03:00
Pēteris Caune	1877a8324f	Disable S3 API request retries urllib3's default number of retries is 3. If requests to the S3 API are timing out, the retries usually don't help, but a 10-second timeout turns into 10*3=30 seconds of python code being blocked.	2024-07-12 03:09:21 +03:00
Pēteris Caune	70c5be5c4b	Fix type warning	2024-07-11 17:45:51 +03:00
Pēteris Caune	1b695c6970	Improve performance of loading ping body previews Defer loading body_raw, instead load its first 150 bytes as "body_raw_preview". This reduces both network I/O to database, and disk I/O on the database host if the database contains large request bodies. cc: #1023	2024-07-11 17:38:25 +03:00
Pēteris Caune	3e5080d9eb	Remove Ping.body field	2024-07-11 16:34:18 +03:00
Pēteris Caune	997154e3b0	Remove usages of Ping.body	2024-07-11 16:17:21 +03:00
Pēteris Caune	daaee30c88	Add data migration to move Ping.body -> Ping.body_raw We used "body" to store request body as text. In 2022 we added "body_raw" and started to use it to store request body as bytes. In python code we currently need to inspect both fields, because the data could be in "body" (for old pings) or in "body_raw" (for newer pings). My plan is to eventually get rid of the "body" field, and have "body_raw" only. This data migration is a step towards that: for any Ping objects that have non-empty "body" field, it moves the data to the "body_raw" field. After applying this migration, the "body" field should be empty (empty string or null) for all Ping objects.	2024-07-11 14:38:36 +03:00
Pēteris Caune	bc8fb90fed	Update Check.ping() to use select_for_update() Without it, on MariaDB, concurrent pings can lead to a deadlock. This results in OperationalError and HTTP 500 response to the client. cc: #1023	2024-07-10 19:50:39 +03:00
Pēteris Caune	b3de36d15c	Reorder system checks in hc.api.apps	2024-07-04 11:32:28 +03:00
Pēteris Caune	23f3256abc	Rename and clean up the apprise system check	2024-07-04 11:28:58 +03:00
Pēteris Caune	cf619bc68b	Fix hc.api.transports to not alter settings.APPRISE_ENABLED setting. Instead, make it set a local `have_apprise` variable, and use it in the hc.api.transports.Apprise class. If hc.api.transports sets APPRISE_ENABLED to False, then the apprise system check in hc.api.apps will not see the original value and therefore will not run.	2024-07-04 11:28:16 +03:00
Rajesh Kumar	57459b0375	Show warning if apprise is enabled but apprise package is not installed (#1021) * fix: show warning if apprise is enabled and not installed in environment * renamed apprise check register * revert back changes in transport for apprise	2024-07-04 11:12:05 +03:00
Pēteris Caune	8d0930c4b9	Fix unclosed sockets in statsd tests	2024-06-27 11:03:29 +03:00
Pēteris Caune	b5eced26cf	Fix migrations for Django 5.1	2024-06-27 10:20:27 +03:00
Pēteris Caune	324fa10ce7	Fix Check.lock_and_delete() to gracefully handle already deleted check	2024-06-20 15:57:53 +03:00
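
Commit 029881f3b9 above describes tracking the number of jobs in flight with a semaphore (`self.seats`), because concurrent.futures.ThreadPoolExecutor queues any number of submitted jobs. The following is a minimal Python sketch of that pattern under stated assumptions, not the project's actual code: the `BoundedPool` class and its `try_submit` and `shutdown` methods are hypothetical names introduced for illustration; only the `seats` name comes from the commit message.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class BoundedPool:
    """Accept a job only when a worker is free to run it (hypothetical helper)."""

    def __init__(self, num_workers: int) -> None:
        # Explicit max_workers, so the pool size always matches the seat count.
        self.executor = ThreadPoolExecutor(max_workers=num_workers)
        self.seats = threading.BoundedSemaphore(num_workers)

    def try_submit(self, fn, *args) -> bool:
        # Non-blocking acquire: if every seat is taken, refuse the job
        # instead of letting it wait in the executor's internal queue.
        if not self.seats.acquire(blocking=False):
            return False
        try:
            future = self.executor.submit(fn, *args)
        except BaseException:
            # The job never started, so give the seat back.
            self.seats.release()
            raise
        # Free the seat when the job finishes, whether it succeeded or failed.
        future.add_done_callback(lambda _future: self.seats.release())
        return True

    def shutdown(self) -> None:
        # Wait for running jobs to finish before returning.
        self.executor.shutdown(wait=True)
```

With this shape, a caller would only claim a unit of work after `try_submit` has a free seat for it, which is the "only commit to processing it if there's a free worker" behavior the commit message asks for.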


1936 commits
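
The arithmetic in commit 9803d77a1d can be checked with a small Python sketch. The helper names below are hypothetical, and the formula is the one quoted in that commit message for Python 3.8+; other Python versions may compute the default differently.

```python
import os
from typing import Optional


def default_max_workers(cpu_count: Optional[int] = None) -> int:
    """Pool size implied by max_workers=None, per the rule quoted in 9803d77a1d."""
    if cpu_count is None:
        # os.cpu_count() can return None; fall back to 1 in that case.
        cpu_count = os.cpu_count() or 1
    return min(32, cpu_count + 4)


def effective_concurrency(requested: int, cpu_count: Optional[int] = None) -> int:
    """Jobs that can really run at once if the pool keeps its default size."""
    return min(requested, default_max_workers(cpu_count))
```

For an 8-core machine, `default_max_workers(8)` gives 12, so a requested worker count of 20 would still be capped at 12 unless max_workers is passed explicitly.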