0ct0pu5/healthchecks

Author	SHA1	Message	Date
Pēteris Caune	9d4fc031aa	Fix sendalerts to check the self.shutdown flag more often	2024-09-03 10:30:18 +03:00
Pēteris Caune	3275e0ffaa	Update notify() to return logs instead of printing them	2024-09-03 10:23:15 +03:00
Pēteris Caune	8c56ca6dde	Update sendalerts to mark flip as processed on thread Previously this was done in process_one_flip (so on the main thread). The advantage of doing this way is the flip gets marked as processed only when the thread has started and has acquired a db connection. There is now a smaller pause between a sendalerts process claiming a flip, and actually starting work on it.	2024-09-01 15:28:48 +03:00
Pēteris Caune	fd75049e0c	Fix type warnings	2024-08-31 19:23:10 +03:00
Pēteris Caune	a463daa775	Update Webhook transport to close db connection before network IO Webhook requests can take 20+ seconds. During that time we hold on to a database connection. With this commit, the Webhook transport closes its DB connection before making a curl call. With psycopg2 this does not have much effect. But with psycopg 3 & connection pooling we will be able to use more sendalerts workers than we have database connections. While one worker is busy making a slow curl call, another worker can grab its freed up connection and do some work. Django's test runner is not happy with connections closed mid-test, so I patched out close_old_connections() in affected tests.	2024-08-31 19:18:17 +03:00
Pēteris Caune	9803d77a1d	Set explicit max_workers value for ThreadPoolExecutor This is a tricky one: the default value for max_workers is None. But it doesn't mean "unlimited", in Python 3.8+ it means "min(32, os.cpu_count() + 4)" For example on 8-core CPU the effective value would be 8 + 4 = 12, and passing anything above 12 to `--max-workers` would have no effect.	2024-08-31 19:11:39 +03:00
Pēteris Caune	4cd677536d	Remove sent notification counter The counter was slightly wrong (it counted lost races as sent notifications). Rather than complicating code to make it correct, let's rather just remove it :-)	2024-08-31 19:07:25 +03:00
Pēteris Caune	faa1a2c99f	Add logging for exceptions thrown inside notify()	2024-08-31 19:04:41 +03:00
Pēteris Caune	7641f2a9a1	Switch to using close_old_connections() instead of connection.close()	2024-08-31 19:02:11 +03:00
Pēteris Caune	d76dc53e49	Increase Signal send timeout to 60 seconds	2024-08-31 11:07:17 +03:00
Pēteris Caune	b1b0a57033	Tweak sendalerts log format	2024-08-30 17:00:30 +03:00
Pēteris Caune	8a3a9b2a7e	Fix code comments	2024-08-29 16:30:28 +03:00
Pēteris Caune	029881f3b9	Refactor sendalerts * Remove the --no-loop and --no-threads arguments * Use a threadpool to do multiple sends concurrently * Add a new `--num-workers` argument. It limits how many flips we grab from the database and process concurrently. * Do not prioritize flips with historically low send times any more (not as important now with concurrent sending, and simpler this way) * Workers close db connections when they finish (to keep the number of idle connections low) Note: concurrent.futures.ThreadPoolExecutor internally has an unbounded queue, it will accept any amount of jobs and keep them queued. We don't want that. We only want to grab a flip, and commit to processing it, if we know there's a free worker for it. Therefore we're tracking the number of jobs in flight using a semaphore (`self.seats`).	2024-08-29 16:20:36 +03:00
Pēteris Caune	3968a4f9e0	Update MS Teams Connector EOL date	2024-08-27 16:34:59 +03:00
Pēteris Caune	70b55a777b	Add migration which updates Channel.kind values This is to go with `8054191be3`, and should have been in there :-) cc: #1050	2024-08-17 12:12:47 +03:00
Pēteris Caune	d3ae4e7fac	Add support for $SLUG placeholder in webhook payloads Fixes: #1049	2024-08-16 13:24:12 +03:00
Pēteris Caune	56862a1c49	Update NotificationsAdmin to use __ lookup in list_display	2024-08-07 17:39:17 +03:00
Pēteris Caune	42b733540d	Fix type annotation It used the wrong model name and neither me nor mypy noticed until upgrade to django-stubs 5.0.4	2024-07-29 09:50:56 +03:00
Pēteris Caune	7346994ae8	Fix field name in TypedDict used for type checking	2024-07-18 18:19:01 +03:00
Pēteris Caune	bdb6f18a3d	Add "uuid" field in API responses when read/write key is used The API responses already contain ping_url, update_url, resume_url, pause_url fields where the UUID can be extracted from, so we are not exposing new information. The extraction can be finicky in, say, shell-scripting scenarios. So for API user convenience we will now also provide the check's code (UUID) as a separate field. Fixes: #1007	2024-07-18 18:15:52 +03:00
Pēteris Caune	8054191be3	Remove HipChat, Pagerteam, Zendesk channel kinds HipChat and Pagerteam products have long been shut down, the Zendesk integration was never fully implemented.	2024-07-18 16:21:45 +03:00
Pēteris Caune	61bdd975e8	Add "(stops working Oct 2024)" note to the old MS Teams integration	2024-07-18 10:27:51 +03:00
Pēteris Caune	e83f60cc0b	Implement MS Teams Workflows integration We already have a MS Teams integration but MS Teams is discontinuing the incoming webhook feature used by this integration: https://devblogs.microsoft.com/microsoft365dev/retirement-of-office-365-connectors-within-microsoft-teams/ MS Teams now recommends using Workflows to post messages via webhook. MS Teams does not provide backwards compatibility or an upgrade path for existing integrations. This commit adds a new "msteamsw" integration which uses MS Teams Workflows to post notifications. It also updates the instructions and illustrations in the "Add MS Teams Integration" page. cc: #1024	2024-07-17 13:35:17 +03:00
Pēteris Caune	3e5080d9eb	Remove Ping.body field	2024-07-11 16:34:18 +03:00
Pēteris Caune	997154e3b0	Remove usages of Ping.body	2024-07-11 16:17:21 +03:00
Pēteris Caune	daaee30c88	Add data migration to move Ping.body -> Ping.body_raw We used "body" to store request body as text. In 2022 we added "body_raw" and started to use it to store request body as bytes. In python code we currently need to inspect both fields, because the data could be in "body" (for old pings) or in "body_raw" (for newer pings). My plan is to eventually get rid of the "body" field, and have "body_raw" only. This data migration is a step towards that: for any Ping objects that have non-empty "body" field, it moves the data to the "body_raw" field. After applying this migration, the "body" field should be empty (empty string or null) for all Ping objects.	2024-07-11 14:38:36 +03:00
Pēteris Caune	bc8fb90fed	Update Check.ping() to use select_for_update() Without it, on MariaDB, concurrent pings can lead to a deadlock. This results in OperationalError and HTTP 500 response to the client. cc: #1023	2024-07-10 19:50:39 +03:00
Pēteris Caune	b3de36d15c	Reorder system checks in hc.api.apps	2024-07-04 11:32:28 +03:00
Pēteris Caune	23f3256abc	Rename and clean up the apprise system check	2024-07-04 11:28:58 +03:00
Pēteris Caune	cf619bc68b	Fix hc.api.transports to not alter settings.APPRISE_ENABLED setting. Instead, make it set a local `have_apprise` variable, and use it in the hc.api.transports.Apprise class. If hc.api.transports sets APPRISE_ENABLED to False, then the apprise system check in hc.api.apps will not see the original value and therefore will not run.	2024-07-04 11:28:16 +03:00
Rajesh Kumar	57459b0375	Show warning if apprise is enabled but apprise package is not installed (#1021) * fix: show warning if apprise is enabled and not installed in environment * renamed apprise check register * revert back changes in transport for apprise	2024-07-04 11:12:05 +03:00
Pēteris Caune	b5eced26cf	Fix migrations for Django 5.1	2024-06-27 10:20:27 +03:00
Pēteris Caune	324fa10ce7	Fix Check.lock_and_delete() to gracefully handle already deleted check	2024-06-20 15:57:53 +03:00
Viktor Szépe	9a44ef1571	Fix typos	2024-06-20 15:41:42 +03:00
Pēteris Caune	b2c5e91c70	Implement legacy -> canonical timezone conversion There are three related changes: * Removed legacy timezones from hc.lib.tz.all_timezones * Added data migration to update existing Check.tz values * For backwards compatibility, added code to automatically replace a legacy timezone with a canonical timezone when a legacy timezone is passed to an API call I used the timezone mapping on https://en.wikipedia.org/wiki/List_of_tz_database_time_zones	2024-06-14 12:55:57 +03:00
Pēteris Caune	52f2b534a6	Fix API to accept Europe/Kiev but save it as Europe/Kyiv	2024-06-13 15:23:27 +03:00
Pēteris Caune	c5bd666faf	Add data migration to update timezone "Europe/Kiev" to "Europe/Kyiv"	2024-06-13 15:03:51 +03:00
Pēteris Caune	26a57343b1	Add a data migration to fill null api_notification.code values Using model's default didn't quite work, as Django tried to use the same UUID for all rows.	2024-05-17 10:43:46 +03:00
Pēteris Caune	d486d2db14	Add uniqueness constraint to api_notification.code This is primarily to make notification lookups by code efficient. We look up notifications by code in hc.api.views.bounces. This field has a default value (uuid.uuid4), so any null values will be filled with random UUIDs during migration.	2024-05-17 10:30:01 +03:00
Pēteris Caune	99d74d2c2c	Add type hint for view_on_site in channel admin	2024-05-01 11:18:31 +03:00
Pēteris Caune	4ec7a48082	Update the Discord integration to disable channel on HTTP 404 responses	2024-04-26 09:25:42 +03:00
Pēteris Caune	872e4d743e	Increase the timeout for sending Signal messages to 20 seconds We're sometimes overshooting the 15 seconds, so let's try increasing the limit a little.	2024-04-25 14:52:15 +03:00
Pēteris Caune	6fb46aee32	Fix integrations to include oncalendar schedules in notifications	2024-04-24 16:08:55 +03:00
Pēteris Caune	4181399659	Fix Spike integration to not disclose check's code in incident data	2024-04-22 13:01:38 +03:00
Pēteris Caune	ddae6a04bf	Fix VictorOps integration to not disclose check's code in incident data	2024-04-22 12:57:10 +03:00
Pēteris Caune	c08ba1d872	Fix PagerTree integration to not disclose check's code in incident data	2024-04-22 12:46:18 +03:00
Pēteris Caune	53f554df1e	Fix type warning	2024-04-22 12:45:51 +03:00
Pēteris Caune	994bc10857	Update PagerDuty integration to use ping.formatted_kind_created	2024-04-22 12:31:03 +03:00
Pēteris Caune	18bd44a68b	Fix PagerDuty integration to not disclose check's code in incident data	2024-04-22 12:12:22 +03:00
Pēteris Caune	e683496bed	Move reusable ping formatting code to Ping model	2024-04-19 12:38:20 +03:00
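
The note in commit 029881f3b9 describes tracking in-flight jobs with a semaphore so that work is only claimed when a worker is free. The following is a minimal sketch of that idea, not the project's actual code: the class `BoundedSender`, its methods, the `peak_claimed` counter, and `run_demo` are hypothetical names introduced for illustration, and the sleep stands in for a slow notification send.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor


class BoundedSender:
    """Hypothetical stand-in for a sendalerts-style loop (illustration only)."""

    def __init__(self, num_workers: int) -> None:
        self.executor = ThreadPoolExecutor(max_workers=num_workers)
        # One "seat" per worker. A seat is held from the moment a job is
        # claimed until the worker finishes it.
        self.seats = threading.BoundedSemaphore(num_workers)
        self.lock = threading.Lock()
        self.claimed = 0       # jobs claimed but not yet finished
        self.peak_claimed = 0  # highest value of `claimed` seen so far

    def _work(self, job: int) -> int:
        try:
            time.sleep(0.01)  # placeholder for a slow network call
            return job * 2
        finally:
            with self.lock:
                self.claimed -= 1
            self.seats.release()  # free the seat for the next job

    def submit(self, job: int) -> Future:
        # Block here, before claiming the job, until a worker is free.
        # Without this, the executor's unbounded queue would accept the job.
        self.seats.acquire()
        with self.lock:
            self.claimed += 1
            self.peak_claimed = max(self.peak_claimed, self.claimed)
        try:
            return self.executor.submit(self._work, job)
        except BaseException:
            # The job never started, so give the seat back.
            with self.lock:
                self.claimed -= 1
            self.seats.release()
            raise


def run_demo(num_jobs: int = 10, num_workers: int = 3):
    sender = BoundedSender(num_workers)
    futures = [sender.submit(j) for j in range(num_jobs)]
    results = [f.result() for f in futures]
    sender.executor.shutdown()
    return results, sender.peak_claimed
```

In this sketch the number of claimed jobs never exceeds `num_workers`, because `submit()` waits for a seat before handing anything to the executor.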


1021 commits
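
Commit 9803d77a1d points out that leaving `max_workers` unset in `ThreadPoolExecutor` does not mean "unlimited". The sketch below shows one way to observe this. It reads `_max_workers`, a private CPython attribute used here only for illustration, and the helper `effective_max_workers` is a hypothetical name, not part of the project.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


def effective_max_workers(requested: Optional[int] = None) -> int:
    """Return the worker limit the executor actually uses.

    Assumption: reads the private CPython attribute `_max_workers`,
    which is an implementation detail and may change.
    """
    with ThreadPoolExecutor(max_workers=requested) as executor:
        return executor._max_workers


if __name__ == "__main__":
    # With no explicit value, Python 3.8+ caps the pool based on CPU count
    # (at most 32), so a larger concurrency setting elsewhere has no effect.
    print("default:", effective_max_workers())
    # An explicit value is used as given.
    print("explicit:", effective_max_workers(50))
```

Passing the value explicitly, as the commit does, keeps the pool size in step with whatever the command-line argument requests.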