beenull/moby

Author	SHA1	Message	Date
Cory Snider	97d32bb7d7	daemon: stop checkpointing health probes to disk The health status and probe log of containers are not mission-critical data which must survive a crash. It is not worth prematurely wearing out consumer-grade flash storage by overwriting and fsync()ing the container config after every probe. Update only the live Container object and the ViewDB replica on every container health probe instead. It will eventually get checkpointed along with some other state (or config) change. Running containers will not be checkpointed on daemon shutdown when live-restore is enabled, but it does not matter: the health status and probe log will be zeroed out when the daemon starts back up. Signed-off-by: Cory Snider <csnider@mirantis.com>	2024-01-16 14:09:40 -05:00
Brian Goff	02a932d63f	Fix case where health start interval is 0 uses default When the start interval is 0 we should treat that as unset. This is especially important for older API versions where we reset the value to 0. Instead of using the default probe value we should be using the configured `interval` value (which may be a default as well) which gives us back the old behavior before support for start interval was added. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2023-11-02 20:02:16 +00:00
Sebastiaan van Stijn	cff4f20c44	migrate to github.com/containerd/log v0.1.0 The github.com/containerd/containerd/log package was moved to a separate module, which will also be used by upcoming (patch) releases of containerd. This patch moves our own uses of the package to use the new module. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2023-10-11 17:52:23 +02:00
Sebastiaan van Stijn	0f871f8cb7	api/types/events: define "Action" type and consts Define consts for the Actions we use for events, instead of "ad-hoc" strings. Having these consts makes it easier to find where specific events are triggered, makes the events less error-prone, and allows documenting each Action (if needed). Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2023-08-29 00:38:08 +02:00
Sebastiaan van Stijn	10a3a3bc49	daemon: inline some variables when emitting events Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2023-08-29 00:38:08 +02:00
Sebastiaan van Stijn	a3867992b7	daemon: rename max/min as it collides with go1.21 builtin Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2023-08-26 22:02:21 +02:00
Brian Goff	2216d3ca8d	Add health start interval This adds an additional interval to be used by healthchecks during the start period. Typically when a container is just starting you want to check if it is ready more quickly than a typical healthcheck might run. Without this users have to balance between running healthchecks too frequently vs taking a very long time to mark a container as healthy for the first time. Signed-off-by: Brian Goff <cpuguy83@gmail.com> Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2023-07-05 23:44:17 +00:00
Brian Goff	74da6a6363	Switch all logging to use containerd log pkg This unifies our logging and allows us to propagate logging and trace contexts together. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2023-06-24 00:23:44 +00:00
Cory Snider	786c9adaa2	daemon: fix double-unlock in health check probe Signed-off-by: Cory Snider <csnider@mirantis.com>	2023-06-22 17:48:21 -04:00
Sebastiaan van Stijn	0670621291	Merge pull request #43997 from thaJeztah/healthcheck_capture_logs daemon: capture output of killed health checks	2022-09-02 10:48:22 +02:00
Cory Snider	a09f8dbe6e	daemon: Maintain container exec-inspect invariant We have integration tests which assert the invariant that a GET /containers/{id}/json response lists only IDs of execs which are in the Running state, according to GET /exec/{id}/json. The invariant could be violated if those requests were to race the handling of the exec's task-exit event. The coarse-grained locking of the container ExecStore when starting an exec task was accidentally synchronizing (Daemon).ProcessEvent and (Daemon).ContainerExecInspect to it just enough to make it improbable for the integration tests to catch the invariant violation on execs which exit immediately. Removing the unnecessary locking made the underlying race condition more likely for the tests to hit. Maintain the invariant by deleting the exec from its container's ExecCommands before clearing its Running flag. Additionally, fix other potential data races with execs by ensuring that the ExecConfig lock is held whenever a mutable field is read from or written to. Signed-off-by: Cory Snider <csnider@mirantis.com>	2022-08-24 19:35:07 -04:00
Cory Snider	4bafaa00aa	Refactor libcontainerd to minimize c8d RPCs The containerd client is very chatty at the best of times. Because the libcontainerd API is stateless and references containers and processes by string ID for every method call, the implementation is essentially forced to use the containerd client in a way which amplifies the number of redundant RPCs invoked to perform any operation. The libcontainerd remote implementation has to reload the containerd container, task and/or process metadata for nearly every operation. This in turn amplifies the number of context switches between dockerd and containerd to perform any container operation or handle a containerd event, increasing the load on the system which could otherwise be allocated to workloads. Overhaul the libcontainerd interface to reduce the impedance mismatch with the containerd client so that the containerd client can be used more efficiently. Split the API out into container, task and process interfaces which the consumer is expected to retain so that libcontainerd can retain state---especially the analogous containerd client objects---without having to manage any state-store inside the libcontainerd client. Signed-off-by: Cory Snider <csnider@mirantis.com>	2022-08-24 14:59:08 -04:00
Cory Snider	0cbb92bcc5	daemon: capture output of killed health checks Add an integration test to verify that health checks are killed on timeout and that the output is captured. Co-authored-by: Nicolas De Loof <nicolas.deloof@gmail.com> Signed-off-by: Cory Snider <csnider@mirantis.com> Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2022-08-24 13:59:34 +02:00
Cory Snider	4b84a33217	daemon: kill exec process on ctx cancel Terminating the exec process when the context is canceled has been broken since Docker v17.11 so nobody has been able to depend upon that behaviour in five years of releases. We are thus free from backwards-compatibility constraints. Co-authored-by: Nicolas De Loof <nicolas.deloof@gmail.com> Co-authored-by: Sebastiaan van Stijn <github@gone.nl> Signed-off-by: Nicolas De Loof <nicolas.deloof@gmail.com> Signed-off-by: Cory Snider <csnider@mirantis.com> Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2022-08-23 15:35:30 +02:00
Paweł Gronowski	56a20dbc19	container/exec: Support ConsoleSize Clients can now set the console size of the executed process immediately at creation. This makes a difference, for example, when executing commands that output some kind of text user interface which is bounded by the console dimensions. Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>	2022-06-24 11:54:25 +02:00
Cory Snider	bdc6473d2d	health: Start probe timeout after exec starts Starting an exec can take a significant amount of time while under heavy container operation load. In extreme cases the time to start the process can take upwards of a second, which is a significant fraction of the default health probe timeout (30s). With a shorter timeout, the exec start delay could make the difference between a successful probe and a probe timeout! Mitigate the impact of excessive exec start latencies by only starting the probe timeout timer after the exec'ed process has started. Add a metric to sample the latency of starting health-check exec probes. Signed-off-by: Cory Snider <csnider@mirantis.com>	2022-04-28 17:21:03 -04:00
Sebastiaan van Stijn	797ec8e913	daemon: rename all receivers to "daemon" Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2020-04-14 17:22:21 +02:00
Sebastiaan van Stijn	3e6a13ccb8	LCOW: fix using wrong shell for healthchecks As reported in docker/compose#6445, when deploying a Linux container on Windows (LCOW), the daemon made the wrong assumption when deciding which shell to use to execute the healthcheck, looking at the host's platform instead of the container's platform. This patch adds a check for the container's platform when deploying on Windows, and sets the correct shell. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2019-06-21 13:58:25 +02:00
Brian Goff	eaad3ee3cf	Make sure timers are stopped after use. `time.After` keeps a timer running until the specified duration is completed. It also allocates a new timer on each call. This can wind up leaving lots of unnecessary timers running in the background that are not needed and consume resources. Instead of `time.After`, use `time.NewTimer` so the timer can actually be stopped. In some of these cases it's not a big deal since the duration is really short, but in others it is much worse. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2019-01-16 14:32:53 -08:00
Kir Kolyshkin	7d62e40f7e	Switch from x/net/context -> context Since Go 1.7, context is a standard package. Since Go 1.9, everything that is provided by "x/net/context" is a couple of type aliases to types in "context". Many vendored packages still use x/net/context, so vendor entry remains for now. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2018-04-23 13:52:44 -07:00
Daniel Nephin	4f0d95fa6e	Add canonical import comment Signed-off-by: Daniel Nephin <dnephin@docker.com>	2018-02-05 16:51:57 -05:00
Nicolas De Loof	aa6bb5cb69	introduce « exec_die » event Signed-off-by: Nicolas De Loof <nicolas.deloof@gmail.com>	2018-01-08 11:42:25 +01:00
Nicolas De Loof	852a943c77	fix #35843 regression on health check workingdir Signed-off-by: Nicolas De Loof <nicolas.deloof@gmail.com>	2017-12-20 14:04:51 +01:00
Yong Tang	29d6aef393	Merge pull request #35533 from AliyunContainerService/supress-warning-healthcheck-none Suppress warning when NONE was set for healthcheck	2017-11-30 11:06:05 -08:00
Li Yi	e987c554c9	Suppress warning when NONE was set for healthcheck Change-Id: I9ebcf49e9e8ac76beb037779ad02ac6020169849 Signed-off-by: Li Yi <denverdino@gmail.com>	2017-11-17 19:43:59 +08:00
Stephen J Day	7db30ab0cd	container: protect the health status with mutex Adds a mutex to protect the status, as well. When running the race detector with the unit test, we can see that the Status field is written without holding this lock. Adding a mutex to read and set status addresses the issue. Signed-off-by: Stephen J Day <stephen.day@docker.com>	2017-11-16 15:04:01 -08:00
Daniel Nephin	62c1f0ef41	Add deadcode linter Signed-off-by: Daniel Nephin <dnephin@docker.com>	2017-08-21 18:18:50 -04:00
Daniel Nephin	9b47b7b151	Fix golint errors. Signed-off-by: Daniel Nephin <dnephin@docker.com>	2017-08-18 14:23:44 -04:00
Derek McGowan	1009e6a40b	Update logrus to v1.0.1 Fixes case sensitivity issue Signed-off-by: Derek McGowan <derek@mcgstyle.net>	2017-07-31 13:16:46 -07:00
Aaron Lehmann	da28210a15	Merge pull request #33781 from mlaventure/fix-healhcheck-goroutine-leak Prevent a goroutine leak when healthcheck gets stopped	2017-06-26 15:34:43 -07:00
Kenfe-Mickael Laventure	67297ba005	Prevent a goroutine leak when healthcheck gets stopped Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>	2017-06-23 08:06:49 -07:00
Fabio Kung	aacddda89d	Move checkpointing to the Container object Also hide ViewDB behind an interface. Signed-off-by: Fabio Kung <fabio.kung@gmail.com>	2017-06-23 07:52:32 -07:00
Fabio Kung	eed4c7b73f	keep a consistent view of containers rendered Replicate relevant mutations to the in-memory ACID store. Readers will then be able to query container state without locking. Signed-off-by: Fabio Kung <fabio.kung@gmail.com>	2017-06-23 07:52:31 -07:00
Boaz Shuster	5836d86ac4	Add container environment variables correctly to the health check The health check process doesn't have all the environment variables in the container or has them set incorrectly. This patch should fix that problem. Signed-off-by: Boaz Shuster <ripcurld.github@gmail.com>	2017-05-21 21:39:00 +03:00
Elias Faxö	e401f63735	Added start period option to health check. Signed-off-by: Elias Faxö <elias.faxo@gmail.com>	2017-04-06 12:35:34 +02:00
David McKay	647dce9dea	Healthchecks should inherit environment Signed-off-by: David McKay <david@rawkode.com>	2017-03-02 16:23:56 +00:00
Victor Vieux	f6f67891be	Merge pull request #28438 from vdemeester/use-container-shell-instead-of-hardcoded Use Container.Config.Shell instead of hardcoded…	2016-11-18 18:54:36 -08:00
Tonis Tiigi	89b1234737	Fix deadlock on cancelling healthcheck Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>	2016-11-15 20:10:16 -08:00
Vincent Demeester	5f81cf11f6	Use Container.Config.Shell instead of hardcoded… … for healthcheck. It makes the code a little cleaner and more future/usage proof. Signed-off-by: Vincent Demeester <vincent@sbr.pm>	2016-11-15 17:53:24 +01:00
Michael Crosby	3343d234f3	Add basic prometheus support This adds a metrics packages that creates additional metrics. Add the metrics endpoint to the docker api server under `/metrics`. Signed-off-by: Michael Crosby <crosbymichael@gmail.com> Add metrics to daemon package Signed-off-by: Michael Crosby <crosbymichael@gmail.com> api: use standard way for metrics route Also add "type" query parameter Signed-off-by: Alexander Morozov <lk4d4@docker.com> Convert timers to ms Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-10-27 10:34:38 -07:00
Thomas Leonard	b8793cff48	Reset health status to starting when a container is restarted Signed-off-by: Thomas Leonard <thomas.leonard@docker.com>	2016-10-14 15:49:12 +01:00
allencloud	a4a4f3733f	make health check log more readable Signed-off-by: allencloud <allen.sun@daocloud.io>	2016-09-28 14:10:15 +08:00
Stephen Drake	c3319445aa	Prevent stdout / stderr race condition in limitedBuffer. Signed-off-by: Stephen Drake <stephen@xenolith.net>	2016-09-15 13:31:11 +02:00
Michael Crosby	91e197d614	Add engine-api types to docker This moves the types for the `engine-api` repo to the existing types package. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-09-07 11:05:58 -07:00
Tibor Vass	91e9f38313	healthcheck: do not interpret exit code 2 as "starting" Instead reserve exit code 2 to be future proof, document that it should not be used. Implementation-wise, it is considered as unhealthy, but users should not rely on this as it may change in the future. Signed-off-by: Tibor Vass <tibor@docker.com>	2016-07-25 14:28:45 -07:00
Josh Horwitz	4016038bd3	Treat HEALTHCHECK NONE the same as not setting a healthcheck Signed-off-by: Josh Horwitz <horwitzja@gmail.com>	2016-07-25 11:11:14 -04:00
Alexander Morozov	576c9fa200	Merge pull request #23442 from thaJeztah/remove-defaultExitOnUnhealthy remove unused defaultExitOnUnhealthy constant	2016-06-11 16:37:39 -07:00
Yong Tang	a72b45dbec	Fix logrus formatting This fix tries to fix logrus formatting by removing `f` from `logrus.[Error\|Warn\|Debug\|Fatal\|Panic\|Info]f` when formatting string is not present. This fix fixes #23459. Signed-off-by: Yong Tang <yong.tang.github@outlook.com>	2016-06-11 13:16:55 -07:00
Sebastiaan van Stijn	1dd28788f1	remove unused defaultExitOnUnhealthy constant the '--exit-on-unhealthy' option was removed, but we forgot to remove this constant. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2016-06-11 00:04:05 +02:00
Jannick Fahlbusch	e3490cdcc0	Fix some typos Signed-off-by: Jannick Fahlbusch <git@jf-projects.de>	2016-06-08 21:59:34 +02:00

52 commits
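Commits 2216d3ca8d and 02a932d63f describe how the start interval interacts with the regular interval: during the start period probes may run at a separate, usually shorter, start interval, and a start interval of 0 is treated as unset so that the configured interval is used instead. The Go sketch below illustrates that selection rule. The `healthConfig` type, the `nextProbeDelay` function and the 30-second default are hypothetical simplifications, not the daemon's actual types or values, and "within the start period" is approximated as elapsed time since the container started.

```go
package main

import (
	"fmt"
	"time"
)

// healthConfig is a hypothetical, simplified stand-in for a container's
// healthcheck settings; it is not the moby type.
type healthConfig struct {
	Interval      time.Duration // 0 means "use the default interval"
	StartInterval time.Duration // 0 means "unset"
	StartPeriod   time.Duration
}

// defaultInterval is an assumed value for this sketch only.
const defaultInterval = 30 * time.Second

// nextProbeDelay picks how long to wait before the next probe.
// Within the start period a non-zero StartInterval wins; a zero
// StartInterval is treated as unset and falls back to the configured
// Interval (which may itself be the default).
func nextProbeDelay(cfg healthConfig, sinceStart time.Duration) time.Duration {
	interval := cfg.Interval
	if interval == 0 {
		interval = defaultInterval
	}
	if sinceStart < cfg.StartPeriod && cfg.StartInterval != 0 {
		return cfg.StartInterval
	}
	return interval
}

func main() {
	cfg := healthConfig{Interval: 10 * time.Second, StartPeriod: time.Minute}
	fmt.Println(nextProbeDelay(cfg, 5*time.Second)) // 10s (StartInterval unset)

	cfg.StartInterval = 2 * time.Second
	fmt.Println(nextProbeDelay(cfg, 5*time.Second)) // 2s (inside start period)
	fmt.Println(nextProbeDelay(cfg, 2*time.Minute)) // 10s (start period over)
}
```

Treating 0 as "unset" rather than as a literal zero delay is what keeps older API clients, which send 0 for the field, on the pre-start-interval behaviour.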
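Commit bdc6473d2d starts the probe timeout only after the exec'ed process has started, and eaad3ee3cf replaces `time.After` with `time.NewTimer` so that the timer can be stopped. A minimal Go sketch combining both ideas follows. `runProbe`, `slowStart` and `errProbeTimeout` are hypothetical names, and `start` is assumed to block until the probe process is running, then return a channel that later yields its exit code.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errProbeTimeout = errors.New("health probe timed out")

// runProbe runs one probe. The timeout timer is armed only after start
// returns, so time spent starting the exec does not count against the
// probe's timeout budget.
func runProbe(start func() (<-chan int, error), timeout time.Duration) (int, error) {
	done, err := start()
	if err != nil {
		return 0, err
	}

	// time.NewTimer (rather than time.After) lets the timer be stopped as
	// soon as the probe finishes instead of running until it fires.
	timer := time.NewTimer(timeout)
	defer timer.Stop()

	select {
	case code := <-done:
		return code, nil
	case <-timer.C:
		return 0, errProbeTimeout
	}
}

// slowStart is a hypothetical stand-in for starting an exec: it sleeps for
// startDelay, then returns a channel that already holds exitCode.
func slowStart(startDelay time.Duration, exitCode int) func() (<-chan int, error) {
	return func() (<-chan int, error) {
		time.Sleep(startDelay)
		done := make(chan int, 1)
		done <- exitCode
		return done, nil
	}
}

func main() {
	// Startup takes longer than the timeout, yet the probe succeeds because
	// the timer starts only once the process is running.
	code, err := runProbe(slowStart(50*time.Millisecond, 0), 20*time.Millisecond)
	fmt.Println(code, err) // 0 <nil>
}
```

Because `timer.Stop` is deferred, the timer is released whether the probe finishes or times out. On Go 1.23 and later, timers that are no longer referenced can be garbage-collected even if `Stop` is never called, but stopping them explicitly remains harmless.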
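Commit 91e9f38313 reserves probe exit code 2 and treats it, like any other non-zero code, as a failed probe rather than as "starting". The Go sketch below shows that classification for a single probe run. `probePassed` and the constants are hypothetical names, and the meaning of exit code 1 (unhealthy) comes from the Docker `HEALTHCHECK` documentation rather than from this log. It deliberately ignores retries: the daemon also tracks consecutive failures against the configured retry count before reporting a container as unhealthy.

```go
package main

import "fmt"

// Probe exit codes: 0 is healthy, 1 is unhealthy, 2 is reserved.
const (
	exitHealthy   = 0
	exitUnhealthy = 1
	exitReserved  = 2 // reserved; probes should not use it
)

// probePassed reports whether a single probe run counts as a success.
// Every non-zero code, including the reserved 2, counts as a failure.
func probePassed(exitCode int) bool {
	return exitCode == exitHealthy
}

func main() {
	// 137 is an arbitrary non-zero code, e.g. a probe killed by SIGKILL.
	for _, code := range []int{exitHealthy, exitUnhealthy, exitReserved, 137} {
		fmt.Printf("exit %d -> passed=%v\n", code, probePassed(code))
	}
}
```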
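Commit c3319445aa fixes a race in `limitedBuffer` when a probe's stdout and stderr write to it concurrently, and 0cbb92bcc5 ensures that the output of killed probes is still captured. The following is a minimal Go sketch of such a buffer, assuming that a mutex guards every access and that output beyond a fixed cap is dropped and marked with `...`. The field names, the cap and the truncation marker are assumptions, not the daemon's code.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// limitedBuffer collects at most max bytes of probe output. Stdout and
// stderr may write concurrently, so every access holds mu.
type limitedBuffer struct {
	mu        sync.Mutex
	buf       bytes.Buffer
	max       int
	truncated bool
}

// Write implements io.Writer. It always reports len(p) bytes written so the
// probe process is never failed by the cap; bytes past the cap are dropped.
func (b *limitedBuffer) Write(p []byte) (int, error) {
	b.mu.Lock()
	defer b.mu.Unlock()

	n := len(p)
	// room is never negative because the buffer never grows past max.
	if room := b.max - b.buf.Len(); n > room {
		p = p[:room]
		b.truncated = true
	}
	b.buf.Write(p)
	return n, nil
}

// String returns the captured output, marked with "..." if anything was dropped.
func (b *limitedBuffer) String() string {
	b.mu.Lock()
	defer b.mu.Unlock()
	s := b.buf.String()
	if b.truncated {
		s += "..."
	}
	return s
}

func main() {
	out := &limitedBuffer{max: 16}

	// Simulate stdout and stderr writing at the same time.
	var wg sync.WaitGroup
	for _, s := range []string{"stdout line\n", "stderr line\n"} {
		wg.Add(1)
		go func(s string) {
			defer wg.Done()
			fmt.Fprint(out, s)
		}(s)
	}
	wg.Wait()

	fmt.Printf("%q\n", out.String())
}
```

Reporting the full length from `Write` even when bytes are dropped keeps writers such as `io.Copy` from treating the cap as a short-write error.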