Commit graph

1021 commits

Author SHA1 Message Date
Pēteris Caune
9d4fc031aa
Fix sendalerts to check the self.shutdown flag more often 2024-09-03 10:30:18 +03:00
Pēteris Caune
3275e0ffaa
Update notify() to return logs instead of printing them 2024-09-03 10:23:15 +03:00
Pēteris Caune
8c56ca6dde
Update sendalerts to mark flip as processed on thread
Previously this was done in process_one_flip (so on the main thread).
The advantage of doing it this way is that the flip gets marked as processed
only when the thread has started and has acquired a db connection.
There is now a smaller pause between a sendalerts process claiming a
flip, and actually starting work on it.
2024-09-01 15:28:48 +03:00
Pēteris Caune
fd75049e0c
Fix type warnings 2024-08-31 19:23:10 +03:00
Pēteris Caune
a463daa775
Update Webhook transport to close db connection before network IO
Webhook requests can take 20+ seconds. During that time we hold
on to a database connection. With this commit, the Webhook transport
closes its DB connection before making a curl call.

With psycopg2 this does not have much effect. But with
psycopg 3 & connection pooling we will be able to use more
sendalerts workers than we have database connections. While one
worker is busy making a slow curl call, another worker can
grab its freed up connection and do some work.

Django's test runner is not happy with connections closed
mid-test, so I patched out close_old_connections() in affected tests.
2024-08-31 19:18:17 +03:00
Pēteris Caune
9803d77a1d
Set explicit max_workers value for ThreadPoolExecutor
This is a tricky one: the default value for max_workers is
None. But it doesn't mean "unlimited": in Python 3.8+ it
means "min(32, os.cpu_count() + 4)"

For example on 8-core CPU the effective value would be 8 + 4 = 12,
and passing anything above 12 to `--num-workers` would have no effect.
2024-08-31 19:11:39 +03:00
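The arithmetic in the commit message above can be checked with a minimal sketch. The helper name `default_max_workers` is hypothetical and only restates the formula quoted in the message; the explicit `max_workers=20` is an arbitrary illustrative value, not the one the project uses.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def default_max_workers(cpu_count):
    """Pool size used when max_workers=None, per the formula quoted above."""
    return min(32, (cpu_count or 1) + 4)


# On an 8-core machine the implicit cap is 12 workers.
eight_core_default = default_max_workers(8)

# What the implicit cap would be on the machine running this sketch.
local_default = default_max_workers(os.cpu_count())

# Passing an explicit value makes the pool size independent of the CPU count.
with ThreadPoolExecutor(max_workers=20) as executor:
    results = list(executor.map(lambda n: n * 2, range(5)))

print(eight_core_default)  # 12
print(results)             # [0, 2, 4, 6, 8]
```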
Pēteris Caune
4cd677536d
Remove sent notification counter
The counter was slightly wrong (it counted lost races as sent
notifications). Rather than complicating code to make it correct,
let's just remove it :-)
2024-08-31 19:07:25 +03:00
Pēteris Caune
faa1a2c99f
Add logging for exceptions thrown inside notify() 2024-08-31 19:04:41 +03:00
Pēteris Caune
7641f2a9a1
Switch to using close_old_connections() instead of connection.close() 2024-08-31 19:02:11 +03:00
Pēteris Caune
d76dc53e49
Increase Signal send timeout to 60 seconds 2024-08-31 11:07:17 +03:00
Pēteris Caune
b1b0a57033
Tweak sendalerts log format 2024-08-30 17:00:30 +03:00
Pēteris Caune
8a3a9b2a7e
Fix code comments 2024-08-29 16:30:28 +03:00
Pēteris Caune
029881f3b9
Refactor sendalerts
* Remove the --no-loop and --no-threads arguments
* Use a threadpool to do multiple sends concurrently
* Add a new `--num-workers` argument. It limits how many flips we grab
  from the database and process concurrently.
* Do not prioritize flips with historically low send times any more
  (not as important now with concurrent sending, and simpler this way)
* Workers close db connections when they finish
  (to keep the number of idle connections low)

Note: concurrent.futures.ThreadPoolExecutor internally has an unbounded
queue: it will accept any number of jobs and keep them queued. We don't
want that. We only want to grab a flip, and commit to processing it,
if we know there's a free worker for it. Therefore we're tracking the
number of jobs in flight using a semaphore (`self.seats`).
2024-08-29 16:20:36 +03:00
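The semaphore idea in the note above can be sketched as follows. This is an illustration under assumptions, not the sendalerts code: `NUM_WORKERS`, `process`, `process_and_release` and the `claimed` counter are hypothetical names, and only the `seats` semaphore is taken from the commit message.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 2  # hypothetical pool size

# One "seat" per worker; a job is claimed only after a seat is acquired.
seats = threading.BoundedSemaphore(NUM_WORKERS)

lock = threading.Lock()
claimed = 0      # jobs claimed but not yet finished
max_claimed = 0  # highest value `claimed` ever reached


def process(job):
    """Stand-in for sending one notification."""
    time.sleep(0.01)
    return job


def process_and_release(job):
    global claimed
    try:
        return process(job)
    finally:
        with lock:
            claimed -= 1
        seats.release()  # free the seat even if the job raised


with ThreadPoolExecutor(max_workers=NUM_WORKERS) as executor:
    futures = []
    for job in range(6):
        seats.acquire()  # blocks until a worker is free
        with lock:
            claimed += 1
            max_claimed = max(max_claimed, claimed)
        futures.append(executor.submit(process_and_release, job))
    done = [f.result() for f in futures]

print(done)         # [0, 1, 2, 3, 4, 5]
print(max_claimed)  # never more than NUM_WORKERS
```

Because the main loop blocks on `seats.acquire()`, nothing waits in the executor's internal queue: a job is handed over only when a worker can pick it up.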
Pēteris Caune
3968a4f9e0
Update MS Teams Connector EOL date 2024-08-27 16:34:59 +03:00
Pēteris Caune
70b55a777b
Add migration which updates Channel.kind values
This is to go with 8054191be3,
and should have been in there :-)

cc: #1050
2024-08-17 12:12:47 +03:00
Pēteris Caune
d3ae4e7fac
Add support for $SLUG placeholder in webhook payloads
Fixes: #1049
2024-08-16 13:24:12 +03:00
Pēteris Caune
56862a1c49
Update NotificationsAdmin to use __ lookup in list_display 2024-08-07 17:39:17 +03:00
Pēteris Caune
42b733540d
Fix type annotation
It used the wrong model name, and neither I nor mypy noticed
until the upgrade to django-stubs 5.0.4
2024-07-29 09:50:56 +03:00
Pēteris Caune
7346994ae8
Fix field name in TypedDict used for type checking 2024-07-18 18:19:01 +03:00
Pēteris Caune
bdb6f18a3d
Add "uuid" field in API responses when read/write key is used
The API responses already contain ping_url, update_url, resume_url,
pause_url fields where the UUID can be extracted from, so we are
not exposing new information. The extraction can be finicky in,
say, shell-scripting scenarios. So for API user convenience we will
now also provide the check's code (UUID) as a separate field.

Fixes: #1007
2024-07-18 18:15:52 +03:00
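To illustrate why a separate field is convenient, here is a small sketch that compares extracting the UUID from `ping_url` with reading `uuid` directly. The response dictionary, host name and UUID value are made up, and the assumption that the UUID is the last path segment of `ping_url` is mine; only the field names come from the commit message.

```python
import uuid
from urllib.parse import urlparse

# Hypothetical API response fragment (values are invented for illustration).
check = {
    "ping_url": "https://hc.example.org/ping/3c1169a0-7b50-4d5c-b5a4-9f0f0a5c2b11",
    "uuid": "3c1169a0-7b50-4d5c-b5a4-9f0f0a5c2b11",
}


def uuid_from_ping_url(ping_url):
    """Extract the UUID, assuming it is the last path segment of the URL."""
    last_segment = urlparse(ping_url).path.rstrip("/").split("/")[-1]
    # uuid.UUID raises ValueError if the segment is not a valid UUID.
    return str(uuid.UUID(last_segment))


extracted = uuid_from_ping_url(check["ping_url"])  # the finicky way
direct = check["uuid"]                             # the convenient way

print(extracted == direct)  # True
```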
Pēteris Caune
8054191be3
Remove HipChat, Pagerteam, Zendesk channel kinds
HipChat and Pagerteam products have long been shut down,
the Zendesk integration was never fully implemented.
2024-07-18 16:21:45 +03:00
Pēteris Caune
61bdd975e8
Add "(stops working Oct 2024)" note to the old MS Teams integration 2024-07-18 10:27:51 +03:00
Pēteris Caune
e83f60cc0b
Implement MS Teams Workflows integration
We already have an MS Teams integration, but MS Teams is discontinuing
the incoming webhook feature used by this integration:

https://devblogs.microsoft.com/microsoft365dev/retirement-of-office-365-connectors-within-microsoft-teams/

MS Teams now recommends using Workflows to post messages
via webhook. MS Teams does not provide backwards compatibility or
an upgrade path for existing integrations.

This commit adds a new "msteamsw" integration which uses MS Teams
Workflows to post notifications. It also updates the instructions
and illustrations in the "Add MS Teams Integration" page.

cc: #1024
2024-07-17 13:35:17 +03:00
Pēteris Caune
3e5080d9eb
Remove Ping.body field 2024-07-11 16:34:18 +03:00
Pēteris Caune
997154e3b0
Remove usages of Ping.body 2024-07-11 16:17:21 +03:00
Pēteris Caune
daaee30c88
Add data migration to move Ping.body -> Ping.body_raw
We used "body" to store request body as text.
In 2022 we added "body_raw" and started to use it to store request
body as bytes.

In python code we currently need to inspect both fields,
because the data could be in "body" (for old pings) or in
"body_raw" (for newer pings). My plan is to eventually get rid
of the "body" field, and have "body_raw" only. This data migration
is a step towards that: for any Ping objects that have non-empty
"body" field, it moves the data to the "body_raw" field. After
applying this migration, the "body" field should be empty (empty
string or null) for all Ping objects.
2024-07-11 14:38:36 +03:00
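The effect of the data migration described above can be sketched on plain dictionaries standing in for Ping rows. This is not the actual Django migration: the function name `move_body_to_body_raw` is hypothetical, and encoding the text as UTF-8 is an assumption.

```python
def move_body_to_body_raw(pings):
    """Move non-empty text bodies into body_raw as bytes, then clear body."""
    for ping in pings:
        if ping["body"]:
            # Assumption: the text body is encoded as UTF-8.
            ping["body_raw"] = ping["body"].encode("utf-8")
            ping["body"] = None
    return pings


pings = [
    {"id": 1, "body": "hello", "body_raw": None},   # old ping, text body
    {"id": 2, "body": None, "body_raw": b"world"},  # newer ping, bytes body
    {"id": 3, "body": "", "body_raw": None},        # ping with no body
]

move_body_to_body_raw(pings)

# After the move, "body" is empty (empty string or None) for every row.
print(all(not ping["body"] for ping in pings))  # True
print(pings[0]["body_raw"])                     # b'hello'
```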
Pēteris Caune
bc8fb90fed
Update Check.ping() to use select_for_update()
Without it, on MariaDB, concurrent pings can lead to a deadlock.
This results in OperationalError and HTTP 500 response to the client.

cc: #1023
2024-07-10 19:50:39 +03:00
Pēteris Caune
b3de36d15c
Reorder system checks in hc.api.apps 2024-07-04 11:32:28 +03:00
Pēteris Caune
23f3256abc
Rename and clean up the apprise system check 2024-07-04 11:28:58 +03:00
Pēteris Caune
cf619bc68b
Fix hc.api.transports to not alter settings.APPRISE_ENABLED setting.
Instead, make it set a local `have_apprise` variable, and use
it in the hc.api.transports.Apprise class.

If hc.api.transports sets APPRISE_ENABLED to False,
then the apprise system check in hc.api.apps will not see the
original value and therefore will not run.
2024-07-04 11:28:16 +03:00
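A minimal sketch of the pattern described above: the module records whether the optional package is importable in a local `have_apprise` flag and leaves the setting untouched. The module-level `APPRISE_ENABLED` constant stands in for `settings.APPRISE_ENABLED`, and the two helper functions and the warning text are hypothetical.

```python
try:
    import apprise  # noqa: F401  (optional dependency)

    have_apprise = True
except ImportError:
    have_apprise = False

# Stand-in for settings.APPRISE_ENABLED. It is only read, never reassigned.
APPRISE_ENABLED = True


def can_send_via_apprise():
    """The transport needs both the setting and the installed package."""
    return APPRISE_ENABLED and have_apprise


def apprise_check_warnings():
    """The system check still sees the original value of the setting."""
    if APPRISE_ENABLED and not have_apprise:
        return ["APPRISE_ENABLED is set but the apprise package is not installed"]
    return []


print(can_send_via_apprise())
print(apprise_check_warnings())
```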
Rajesh Kumar
57459b0375
Show warning if apprise is enabled but apprise package is not installed (#1021)
* fix: show warning if apprise is enabled and not installed in environment

* renamed apprise check register

* revert back changes in transport for apprise
2024-07-04 11:12:05 +03:00
Pēteris Caune
b5eced26cf
Fix migrations for Django 5.1 2024-06-27 10:20:27 +03:00
Pēteris Caune
324fa10ce7
Fix Check.lock_and_delete() to gracefully handle already deleted check 2024-06-20 15:57:53 +03:00
Viktor Szépe
9a44ef1571
Fix typos 2024-06-20 15:41:42 +03:00
Pēteris Caune
b2c5e91c70
Implement legacy -> canonical timezone conversion
There are three related changes:

* Removed legacy timezones from hc.lib.tz.all_timezones
* Added data migration to update existing Check.tz values
* For backwards compatibility, added code to automatically
  replace a legacy timezone with a canonical timezone when a
  legacy timezone is passed to an API call

I used the timezone mapping on
https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
2024-06-14 12:55:57 +03:00
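The backwards-compatibility step described above amounts to a lookup with a fallback. The sketch below is illustrative only: `LEGACY_TO_CANONICAL` and `canonical_tz` are hypothetical names, and the table holds just three sample entries rather than the full mapping.

```python
# A few sample legacy -> canonical pairs from the tz database.
LEGACY_TO_CANONICAL = {
    "Europe/Kiev": "Europe/Kyiv",
    "Asia/Calcutta": "Asia/Kolkata",
    "US/Pacific": "America/Los_Angeles",
}


def canonical_tz(name):
    """Return the canonical name for a legacy timezone, else the name unchanged."""
    return LEGACY_TO_CANONICAL.get(name, name)


print(canonical_tz("Europe/Kiev"))  # Europe/Kyiv
print(canonical_tz("Europe/Riga"))  # Europe/Riga
```

The same lookup serves both the data migration (applied to stored values) and the API path (applied to incoming values), so callers that still send a legacy name keep working.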
Pēteris Caune
52f2b534a6
Fix API to accept Europe/Kiev but save it as Europe/Kyiv 2024-06-13 15:23:27 +03:00
Pēteris Caune
c5bd666faf
Add data migration to update timezone "Europe/Kiev" to "Europe/Kyiv" 2024-06-13 15:03:51 +03:00
Pēteris Caune
26a57343b1
Add a data migration to fill null api_notification.code values
Using model's default didn't quite work, as Django tried to use
the same UUID for all rows.
2024-05-17 10:43:46 +03:00
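The problem described above, one default value reused for every row, can be shown in plain Python without Django. The dictionaries stand in for `api_notification` rows; this is a sketch of the idea, not the migration itself.

```python
import uuid

rows = [{"id": 1, "code": None}, {"id": 2, "code": None}, {"id": 3, "code": None}]

# Evaluating the default once and reusing it gives every row the same UUID,
# which a uniqueness constraint would reject.
shared_default = uuid.uuid4()
codes_from_shared_default = [shared_default for _ in rows]

# A data migration can instead generate a fresh UUID for each row.
for row in rows:
    if row["code"] is None:
        row["code"] = uuid.uuid4()

print(len(set(codes_from_shared_default)))    # 1
print(len({row["code"] for row in rows}))     # 3
```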
Pēteris Caune
d486d2db14
Add uniqueness constraint to api_notification.code
This is primarily to make notification lookups by code efficient.
We look up notifications by code in hc.api.views.bounces.

This field has a default value (uuid.uuid4), so any null values
will be filled with random UUIDs during migration.
2024-05-17 10:30:01 +03:00
Pēteris Caune
99d74d2c2c
Add type hint for view_on_site in channel admin 2024-05-01 11:18:31 +03:00
Pēteris Caune
4ec7a48082
Update the Discord integration to disable channel on HTTP 404 responses 2024-04-26 09:25:42 +03:00
Pēteris Caune
872e4d743e
Increase the timeout for sending Signal messages to 20 seconds
We're sometimes overshooting the 15 seconds, so let's try increasing
the limit a little.
2024-04-25 14:52:15 +03:00
Pēteris Caune
6fb46aee32
Fix integrations to include oncalendar schedules in notifications 2024-04-24 16:08:55 +03:00
Pēteris Caune
4181399659
Fix Spike integration to not disclose check's code in incident data 2024-04-22 13:01:38 +03:00
Pēteris Caune
ddae6a04bf
Fix VictorOps integration to not disclose check's code in incident data 2024-04-22 12:57:10 +03:00
Pēteris Caune
c08ba1d872
Fix PagerTree integration to not disclose check's code in incident data 2024-04-22 12:46:18 +03:00
Pēteris Caune
53f554df1e
Fix type warning 2024-04-22 12:45:51 +03:00
Pēteris Caune
994bc10857
Update PagerDuty integration to use ping.formatted_kind_created 2024-04-22 12:31:03 +03:00
Pēteris Caune
18bd44a68b
Fix PagerDuty integration to not disclose check's code in incident data 2024-04-22 12:12:22 +03:00
Pēteris Caune
e683496bed
Move reusable ping formatting code to Ping model 2024-04-19 12:38:20 +03:00