beenull/moby

Author	SHA1	Message	Date
Sebastiaan van Stijn	0670621291	Merge pull request #43997 from thaJeztah/healthcheck_capture_logs daemon: capture output of killed health checks	2022-09-02 10:48:22 +02:00
Cory Snider	a09f8dbe6e	daemon: Maintain container exec-inspect invariant We have integration tests which assert the invariant that a GET /containers/{id}/json response lists only IDs of execs which are in the Running state, according to GET /exec/{id}/json. The invariant could be violated if those requests were to race the handling of the exec's task-exit event. The coarse-grained locking of the container ExecStore when starting an exec task was accidentally synchronizing (Daemon).ProcessEvent and (Daemon).ContainerExecInspect to it just enough to make it improbable for the integration tests to catch the invariant violation on execs which exit immediately. Removing the unnecessary locking made the underlying race condition more likely for the tests to hit. Maintain the invariant by deleting the exec from its container's ExecCommands before clearing its Running flag. Additionally, fix other potential data races with execs by ensuring that the ExecConfig lock is held whenever a mutable field is read from or written to. Signed-off-by: Cory Snider <csnider@mirantis.com>	2022-08-24 19:35:07 -04:00
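The ordering fix in a09f8dbe6e can be illustrated with a minimal sketch. The types `execConfig` and `execStore` and the function `handleExit` are hypothetical simplifications, not moby's actual ExecConfig or ExecStore. The exec is removed from its container's store before its running flag is cleared, so the flag is still true for as long as the ID is listed. Every mutable field is read or written only while the exec's own lock is held.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// execConfig is a hypothetical, simplified stand-in for an exec's state.
// Its mutable fields are only read or written while mu is held.
type execConfig struct {
	mu       sync.Mutex
	id       string
	running  bool
	exitCode *int
}

func (ec *execConfig) isRunning() bool {
	ec.mu.Lock()
	defer ec.mu.Unlock()
	return ec.running
}

// execStore is a hypothetical stand-in for a container's set of execs.
type execStore struct {
	mu    sync.Mutex
	execs map[string]*execConfig
}

func newExecStore() *execStore {
	return &execStore{execs: make(map[string]*execConfig)}
}

func (s *execStore) add(ec *execConfig) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.execs[ec.id] = ec
}

func (s *execStore) remove(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.execs, id)
}

// ids returns the exec IDs a container inspect would report, sorted for stable output.
func (s *execStore) ids() []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	out := make([]string, 0, len(s.execs))
	for id := range s.execs {
		out = append(out, id)
	}
	sort.Strings(out)
	return out
}

// handleExit processes an exec's task-exit event. The exec leaves the store
// before running is cleared, so while its ID is listed the flag is still true.
func handleExit(s *execStore, ec *execConfig, code int) {
	s.remove(ec.id)
	ec.mu.Lock()
	ec.running = false
	ec.exitCode = &code
	ec.mu.Unlock()
}

func main() {
	store := newExecStore()
	// running is set before the exec is published in the store.
	ec := &execConfig{id: "exec-1", running: true}
	store.add(ec)
	fmt.Println(store.ids(), ec.isRunning())
	handleExit(store, ec, 0)
	fmt.Println(store.ids(), ec.isRunning())
}
```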
Cory Snider	4bafaa00aa	Refactor libcontainerd to minimize c8d RPCs The containerd client is very chatty at the best of times. Because the libcontainerd API is stateless and references containers and processes by string ID for every method call, the implementation is essentially forced to use the containerd client in a way which amplifies the number of redundant RPCs invoked to perform any operation. The libcontainerd remote implementation has to reload the containerd container, task and/or process metadata for nearly every operation. This in turn amplifies the number of context switches between dockerd and containerd to perform any container operation or handle a containerd event, increasing the load on the system which could otherwise be allocated to workloads. Overhaul the libcontainerd interface to reduce the impedance mismatch with the containerd client so that the containerd client can be used more efficiently. Split the API out into container, task and process interfaces which the consumer is expected to retain so that libcontainerd can retain state (especially the analogous containerd client objects) without having to manage any state-store inside the libcontainerd client. Signed-off-by: Cory Snider <csnider@mirantis.com>	2022-08-24 14:59:08 -04:00
Cory Snider	0cbb92bcc5	daemon: capture output of killed health checks Add an integration test to verify that health checks are killed on timeout and that the output is captured. Co-authored-by: Nicolas De Loof <nicolas.deloof@gmail.com> Signed-off-by: Cory Snider <csnider@mirantis.com> Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2022-08-24 13:59:34 +02:00
Cory Snider	4b84a33217	daemon: kill exec process on ctx cancel Terminating the exec process when the context is canceled has been broken since Docker v17.11 so nobody has been able to depend upon that behaviour in five years of releases. We are thus free from backwards-compatibility constraints. Co-authored-by: Nicolas De Loof <nicolas.deloof@gmail.com> Co-authored-by: Sebastiaan van Stijn <github@gone.nl> Signed-off-by: Nicolas De Loof <nicolas.deloof@gmail.com> Signed-off-by: Cory Snider <csnider@mirantis.com> Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2022-08-23 15:35:30 +02:00
Paweł Gronowski	56a20dbc19	container/exec: Support ConsoleSize Clients can now set the console size of the executed process immediately at creation. This makes a difference, for example, when executing commands that output some kind of text user interface bounded by the console dimensions. Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>	2022-06-24 11:54:25 +02:00
Cory Snider	bdc6473d2d	health: Start probe timeout after exec starts Starting an exec can take a significant amount of time while under heavy container operation load. In extreme cases the time to start the process can take upwards of a second, which is a significant fraction of the default health probe timeout (30s). With a shorter timeout, the exec start delay could make the difference between a successful probe and a probe timeout! Mitigate the impact of excessive exec start latencies by only starting the probe timeout timer after the exec'ed process has started. Add a metric to sample the latency of starting health-check exec probes. Signed-off-by: Cory Snider <csnider@mirantis.com>	2022-04-28 17:21:03 -04:00
Sebastiaan van Stijn	797ec8e913	daemon: rename all receivers to "daemon" Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2020-04-14 17:22:21 +02:00
Sebastiaan van Stijn	3e6a13ccb8	LCOW: fix using wrong shell for healthchecks As reported in docker/compose#6445, when deploying a Linux container on Windows (LCOW), the daemon made the wrong assumption when deciding which shell to use to execute the healthcheck, looking at the host's platform instead of the container's platform. This patch adds a check for the container's platform when deploying on Windows, and sets the correct shell. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2019-06-21 13:58:25 +02:00
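The LCOW fix in 3e6a13ccb8 comes down to selecting the shell from the container's platform rather than the host's. The sketch below assumes the Dockerfile default shells (`["/bin/sh", "-c"]` on Linux, `["cmd", "/S", "/C"]` on Windows). `probeArgs` is a hypothetical helper. Its `configShell` parameter stands in for Container.Config.Shell, and an empty value is assumed to mean "use the default".

```go
package main

import "fmt"

// probeArgs builds the argv for a CMD-SHELL health check. The default shell
// depends on containerOS (the container's platform), never on runtime.GOOS.
// configShell is a hypothetical stand-in for Container.Config.Shell.
func probeArgs(containerOS string, configShell []string, cmd string) []string {
	shell := configShell
	if len(shell) == 0 {
		if containerOS == "windows" {
			shell = []string{"cmd", "/S", "/C"}
		} else {
			shell = []string{"/bin/sh", "-c"}
		}
	}
	// Copy so that appending cmd never mutates the caller's slice.
	args := make([]string, 0, len(shell)+1)
	args = append(args, shell...)
	return append(args, cmd)
}

func main() {
	// A Linux container on a Windows host (LCOW) still gets /bin/sh.
	fmt.Println(probeArgs("linux", nil, "curl -f http://localhost/"))
	fmt.Println(probeArgs("windows", nil, "exit 0"))
}
```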
Brian Goff	eaad3ee3cf	Make sure timers are stopped after use. `time.After` keeps a timer running until the specified duration is completed. It also allocates a new timer on each call. This can wind up leaving lots of unnecessary timers running in the background that are not needed and consume resources. Instead of `time.After`, use `time.NewTimer` so the timer can actually be stopped. In some of these cases it's not a big deal since the duration is really short, but in others it is much worse. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2019-01-16 14:32:53 -08:00
Kir Kolyshkin	7d62e40f7e	Switch from x/net/context -> context Since Go 1.7, context is a standard package. Since Go 1.9, everything that is provided by "x/net/context" is a couple of type aliases to types in "context". Many vendored packages still use x/net/context, so vendor entry remains for now. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2018-04-23 13:52:44 -07:00
Daniel Nephin	4f0d95fa6e	Add canonical import comment Signed-off-by: Daniel Nephin <dnephin@docker.com>	2018-02-05 16:51:57 -05:00
Nicolas De Loof	aa6bb5cb69	introduce « exec_die » event Signed-off-by: Nicolas De Loof <nicolas.deloof@gmail.com>	2018-01-08 11:42:25 +01:00
Nicolas De Loof	852a943c77	fix #35843 regression on health check workingdir Signed-off-by: Nicolas De Loof <nicolas.deloof@gmail.com>	2017-12-20 14:04:51 +01:00
Yong Tang	29d6aef393	Merge pull request #35533 from AliyunContainerService/supress-warning-healthcheck-none Suppress warning when NONE was set for healthcheck	2017-11-30 11:06:05 -08:00
Li Yi	e987c554c9	Suppress warning when NONE was set for healthcheck Change-Id: I9ebcf49e9e8ac76beb037779ad02ac6020169849 Signed-off-by: Li Yi <denverdino@gmail.com>	2017-11-17 19:43:59 +08:00
Stephen J Day	7db30ab0cd	container: protect the health status with mutex Adds a mutex to protect the status, as well. When running the race detector with the unit test, we can see that the Status field is written without holding this lock. Adding a mutex to read and set status addresses the issue. Signed-off-by: Stephen J Day <stephen.day@docker.com>	2017-11-16 15:04:01 -08:00
Daniel Nephin	62c1f0ef41	Add deadcode linter Signed-off-by: Daniel Nephin <dnephin@docker.com>	2017-08-21 18:18:50 -04:00
Daniel Nephin	9b47b7b151	Fix golint errors. Signed-off-by: Daniel Nephin <dnephin@docker.com>	2017-08-18 14:23:44 -04:00
Derek McGowan	1009e6a40b	Update logrus to v1.0.1 Fixes case sensitivity issue Signed-off-by: Derek McGowan <derek@mcgstyle.net>	2017-07-31 13:16:46 -07:00
Aaron Lehmann	da28210a15	Merge pull request #33781 from mlaventure/fix-healhcheck-goroutine-leak Prevent a goroutine leak when healthcheck gets stopped	2017-06-26 15:34:43 -07:00
Kenfe-Mickael Laventure	67297ba005	Prevent a goroutine leak when healthcheck gets stopped Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>	2017-06-23 08:06:49 -07:00
Fabio Kung	aacddda89d	Move checkpointing to the Container object Also hide ViewDB behind an interface. Signed-off-by: Fabio Kung <fabio.kung@gmail.com>	2017-06-23 07:52:32 -07:00
Fabio Kung	eed4c7b73f	keep a consistent view of containers rendered Replicate relevant mutations to the in-memory ACID store. Readers will then be able to query container state without locking. Signed-off-by: Fabio Kung <fabio.kung@gmail.com>	2017-06-23 07:52:31 -07:00
Boaz Shuster	5836d86ac4	Add container environment variables correctly to the health check The health check process doesn't have all the environment variables in the container or has them set incorrectly. This patch should fix that problem. Signed-off-by: Boaz Shuster <ripcurld.github@gmail.com>	2017-05-21 21:39:00 +03:00
Elias Faxö	e401f63735	Added start period option to health check. Signed-off-by: Elias Faxö <elias.faxo@gmail.com>	2017-04-06 12:35:34 +02:00
David McKay	647dce9dea	Healthchecks should inherit environment Signed-off-by: David McKay <david@rawkode.com>	2017-03-02 16:23:56 +00:00
Victor Vieux	f6f67891be	Merge pull request #28438 from vdemeester/use-container-shell-instead-of-hardcoded Use Container.Config.Shell instead of hardcoded…	2016-11-18 18:54:36 -08:00
Tonis Tiigi	89b1234737	Fix deadlock on cancelling healthcheck Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>	2016-11-15 20:10:16 -08:00
Vincent Demeester	5f81cf11f6	Use Container.Config.Shell instead of hardcoded… … for healthcheck. It makes the code a little cleaner and more future/usage proof. Signed-off-by: Vincent Demeester <vincent@sbr.pm>	2016-11-15 17:53:24 +01:00
Michael Crosby	3343d234f3	Add basic prometheus support This adds a metrics package that creates additional metrics. Add the metrics endpoint to the docker api server under `/metrics`. Signed-off-by: Michael Crosby <crosbymichael@gmail.com> Add metrics to daemon package Signed-off-by: Michael Crosby <crosbymichael@gmail.com> api: use standard way for metrics route Also add "type" query parameter Signed-off-by: Alexander Morozov <lk4d4@docker.com> Convert timers to ms Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-10-27 10:34:38 -07:00
Thomas Leonard	b8793cff48	Reset health status to starting when a container is restarted Signed-off-by: Thomas Leonard <thomas.leonard@docker.com>	2016-10-14 15:49:12 +01:00
allencloud	a4a4f3733f	make health check log more readable Signed-off-by: allencloud <allen.sun@daocloud.io>	2016-09-28 14:10:15 +08:00
Stephen Drake	c3319445aa	Prevent stdout / stderr race condition in limitedBuffer. Signed-off-by: Stephen Drake <stephen@xenolith.net>	2016-09-15 13:31:11 +02:00
Michael Crosby	91e197d614	Add engine-api types to docker This moves the types for the `engine-api` repo to the existing types package. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-09-07 11:05:58 -07:00
Tibor Vass	91e9f38313	healthcheck: do not interpret exit code 2 as "starting" Instead reserve exit code 2 to be future proof, document that it should not be used. Implementation-wise, it is considered as unhealthy, but users should not rely on this as it may change in the future. Signed-off-by: Tibor Vass <tibor@docker.com>	2016-07-25 14:28:45 -07:00
Josh Horwitz	4016038bd3	Treat HEALTHCHECK NONE the same as not setting a healthcheck Signed-off-by: Josh Horwitz <horwitzja@gmail.com>	2016-07-25 11:11:14 -04:00
Alexander Morozov	576c9fa200	Merge pull request #23442 from thaJeztah/remove-defaultExitOnUnhealthy remove unused defaultExitOnUnhealthy constant	2016-06-11 16:37:39 -07:00
Yong Tang	a72b45dbec	Fix logrus formatting This fix tries to fix logrus formatting by removing `f` from `logrus.[Error|Warn|Debug|Fatal|Panic|Info]f` when formatting string is not present. This fix fixes #23459. Signed-off-by: Yong Tang <yong.tang.github@outlook.com>	2016-06-11 13:16:55 -07:00
Sebastiaan van Stijn	1dd28788f1	remove unused defaultExitOnUnhealthy constant the '--exit-on-unhealthy' option was removed, but we forgot to remove this constant. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2016-06-11 00:04:05 +02:00
Jannick Fahlbusch	e3490cdcc0	Fix some typos Signed-off-by: Jannick Fahlbusch <git@jf-projects.de>	2016-06-08 21:59:34 +02:00
Sebastiaan van Stijn	50e470fab4	Healthcheck: set default retries to 3 Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2016-06-03 13:28:08 +02:00
Thomas Leonard	b6c7becbfe	Add support for user-defined healthchecks This PR adds support for user-defined health-check probes for Docker containers. It adds a `HEALTHCHECK` instruction to the Dockerfile syntax plus some corresponding "docker run" options. It can be used with a restart policy to automatically restart a container if the check fails. The `HEALTHCHECK` instruction has two forms: * `HEALTHCHECK [OPTIONS] CMD command` (check container health by running a command inside the container) * `HEALTHCHECK NONE` (disable any healthcheck inherited from the base image) The `HEALTHCHECK` instruction tells Docker how to test a container to check that it is still working. This can detect cases such as a web server that is stuck in an infinite loop and unable to handle new connections, even though the server process is still running. When a container has a healthcheck specified, it has a _health status_ in addition to its normal status. This status is initially `starting`. Whenever a health check passes, it becomes `healthy` (whatever state it was previously in). After a certain number of consecutive failures, it becomes `unhealthy`. The options that can appear before `CMD` are: * `--interval=DURATION` (default: `30s`) * `--timeout=DURATION` (default: `30s`) * `--retries=N` (default: `1`) The health check will first run interval seconds after the container is started, and then again interval seconds after each previous check completes. If a single run of the check takes longer than timeout seconds then the check is considered to have failed. It takes retries consecutive failures of the health check for the container to be considered `unhealthy`. There can only be one `HEALTHCHECK` instruction in a Dockerfile. If you list more than one then only the last `HEALTHCHECK` will take effect. The command after the `CMD` keyword can be either a shell command (e.g. `HEALTHCHECK CMD /bin/check-running`) or an _exec_ array (as with other Dockerfile commands; see e.g. `ENTRYPOINT` for details). The command's exit status indicates the health status of the container. The possible values are: - 0: success - the container is healthy and ready for use - 1: unhealthy - the container is not working correctly - 2: starting - the container is not ready for use yet, but is working correctly If the probe returns 2 ("starting") when the container has already moved out of the "starting" state then it is treated as "unhealthy" instead. For example, to check every five minutes or so that a web-server is able to serve the site's main page within three seconds: HEALTHCHECK --interval=5m --timeout=3s \ CMD curl -f http://localhost/ || exit 1 To help debug failing probes, any output text (UTF-8 encoded) that the command writes on stdout or stderr will be stored in the health status and can be queried with `docker inspect`. Such output should be kept short (only the first 4096 bytes are stored currently). When the health status of a container changes, a `health_status` event is generated with the new status. The health status is also displayed in the `docker ps` output. Signed-off-by: Thomas Leonard <thomas.leonard@docker.com> Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2016-06-02 23:58:34 +02:00

43 commits