Commit graph

43 commits

Author SHA1 Message Date
Sebastiaan van Stijn
0670621291
Merge pull request #43997 from thaJeztah/healthcheck_capture_logs
daemon: capture output of killed health checks
2022-09-02 10:48:22 +02:00
Cory Snider
a09f8dbe6e daemon: Maintain container exec-inspect invariant
We have integration tests which assert the invariant that a
GET /containers/{id}/json response lists only IDs of execs which are in
the Running state, according to GET /exec/{id}/json. The invariant could
be violated if those requests were to race the handling of the exec's
task-exit event. The coarse-grained locking of the container ExecStore
when starting an exec task was accidentally synchronizing
(*Daemon).ProcessEvent and (*Daemon).ContainerExecInspect to it just
enough to make it improbable for the integration tests to catch the
invariant violation on execs which exit immediately. Removing the
unnecessary locking made the underlying race condition more likely for
the tests to hit.

Maintain the invariant by deleting the exec from its container's
ExecCommands before clearing its Running flag. Additionally, fix other
potential data races with execs by ensuring that the ExecConfig lock is
held whenever a mutable field is read from or written to.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-08-24 19:35:07 -04:00
Cory Snider
4bafaa00aa Refactor libcontainerd to minimize c8d RPCs
The containerd client is very chatty at the best of times. Because the
libcontained API is stateless and references containers and processes by
string ID for every method call, the implementation is essentially
forced to use the containerd client in a way which amplifies the number
of redundant RPCs invoked to perform any operation. The libcontainerd
remote implementation has to reload the containerd container, task
and/or process metadata for nearly every operation. This in turn
amplifies the number of context switches between dockerd and containerd
to perform any container operation or handle a containerd event,
increasing the load on the system which could otherwise be allocated to
workloads.

Overhaul the libcontainerd interface to reduce the impedance mismatch
with the containerd client so that the containerd client can be used
more efficiently. Split the API out into container, task and process
interfaces which the consumer is expected to retain so that
libcontainerd can retain state---especially the analogous containerd
client objects---without having to manage any state-store inside the
libcontainerd client.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-08-24 14:59:08 -04:00
Cory Snider
0cbb92bcc5
daemon: capture output of killed health checks
Add an integration test to verify that health checks are killed on
timeout and that the output is captured.

Co-authored-by: Nicolas De Loof <nicolas.deloof@gmail.com>
Signed-off-by: Cory Snider <csnider@mirantis.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-08-24 13:59:34 +02:00
Cory Snider
4b84a33217
daemon: kill exec process on ctx cancel
Terminating the exec process when the context is canceled has been
broken since Docker v17.11 so nobody has been able to depend upon that
behaviour in five years of releases. We are thus free from backwards-
compatibility constraints.

Co-authored-by: Nicolas De Loof <nicolas.deloof@gmail.com>
Co-authored-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Nicolas De Loof <nicolas.deloof@gmail.com>
Signed-off-by: Cory Snider <csnider@mirantis.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-08-23 15:35:30 +02:00
Paweł Gronowski
56a20dbc19 container/exec: Support ConsoleSize
Now client have the possibility to set the console size of the executed
process immediately at the creation. This makes a difference for example
when executing commands that output some kind of text user interface
which is bounded by the console dimensions.

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2022-06-24 11:54:25 +02:00
Cory Snider
bdc6473d2d health: Start probe timeout after exec starts
Starting an exec can take a significant amount of time while under heavy
container operation load. In extreme cases the time to start the process
can take upwards of a second, which is a significant fraction of the
default health probe timeout (30s). With a shorter timeout, the exec
start delay could make the difference between a successful probe and a
probe timeout! Mitigate the impact of excessive exec start latencies by
only starting the probe timeout timer after the exec'ed process has
started.

Add a metric to sample the latency of starting health-check exec probes.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-04-28 17:21:03 -04:00
Sebastiaan van Stijn
797ec8e913
daemon: rename all receivers to "daemon"
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-04-14 17:22:21 +02:00
Sebastiaan van Stijn
3e6a13ccb8
LCOW: fix using wrong shell for healthchecks
As reported in docker/compose#6445, when deploying a Linux
container on Windows (LCOW), the daemon made the wrong assumption
when deciding which shell to use to execute the healthcheck, looking
at the host's platform instead of the container's platform.

This patch adds a check for the container's platform when deploying
on Windows, and sets the correct shell.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-06-21 13:58:25 +02:00
Brian Goff
eaad3ee3cf Make sure timers are stopped after use.
`time.After` keeps a timer running until the specified duration is
completed. It also allocates a new timer on each call. This can wind up
leaving lots of uneccessary timers running in the background that are
not needed and consume resources.

Instead of `time.After`, use `time.NewTimer` so the timer can actually
be stopped.
In some of these cases it's not a big deal since the duraiton is really
short, but in others it is much worse.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2019-01-16 14:32:53 -08:00
Kir Kolyshkin
7d62e40f7e Switch from x/net/context -> context
Since Go 1.7, context is a standard package. Since Go 1.9, everything
that is provided by "x/net/context" is a couple of type aliases to
types in "context".

Many vendored packages still use x/net/context, so vendor entry remains
for now.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2018-04-23 13:52:44 -07:00
Daniel Nephin
4f0d95fa6e Add canonical import comment
Signed-off-by: Daniel Nephin <dnephin@docker.com>
2018-02-05 16:51:57 -05:00
Nicolas De Loof
aa6bb5cb69
introduce « exec_die » event
Signed-off-by: Nicolas De Loof <nicolas.deloof@gmail.com>
2018-01-08 11:42:25 +01:00
Nicolas De Loof
852a943c77
fix #35843 regression on health check workingdir
Signed-off-by: Nicolas De Loof <nicolas.deloof@gmail.com>
2017-12-20 14:04:51 +01:00
Yong Tang
29d6aef393
Merge pull request #35533 from AliyunContainerService/supress-warning-healthcheck-none
Suppress warning when NONE was set for healthcheck
2017-11-30 11:06:05 -08:00
Li Yi
e987c554c9 Supress warning when NONE was set for healthcheck
Change-Id: I9ebcf49e9e8ac76beb037779ad02ac6020169849
Signed-off-by: Li Yi <denverdino@gmail.com>
2017-11-17 19:43:59 +08:00
Stephen J Day
7db30ab0cd
container: protect the health status with mutex
Adds a mutex to protect the status, as well. When running the race
detector with the unit test, we can see that the Status field is written
without holding this lock. Adding a mutex to read and set status
addresses the issue.

Signed-off-by: Stephen J Day <stephen.day@docker.com>
2017-11-16 15:04:01 -08:00
Daniel Nephin
62c1f0ef41 Add deadcode linter
Signed-off-by: Daniel Nephin <dnephin@docker.com>
2017-08-21 18:18:50 -04:00
Daniel Nephin
9b47b7b151 Fix golint errors.
Signed-off-by: Daniel Nephin <dnephin@docker.com>
2017-08-18 14:23:44 -04:00
Derek McGowan
1009e6a40b
Update logrus to v1.0.1
Fixes case sensitivity issue

Signed-off-by: Derek McGowan <derek@mcgstyle.net>
2017-07-31 13:16:46 -07:00
Aaron Lehmann
da28210a15 Merge pull request #33781 from mlaventure/fix-healhcheck-goroutine-leak
Prevent a goroutine leak when healthcheck gets stopped
2017-06-26 15:34:43 -07:00
Kenfe-Mickael Laventure
67297ba005
Prevent a goroutine leak when healthcheck gets stopped
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
2017-06-23 08:06:49 -07:00
Fabio Kung
aacddda89d Move checkpointing to the Container object
Also hide ViewDB behind an inteface.

Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
2017-06-23 07:52:32 -07:00
Fabio Kung
eed4c7b73f keep a consistent view of containers rendered
Replicate relevant mutations to the in-memory ACID store. Readers will
then be able to query container state without locking.

Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
2017-06-23 07:52:31 -07:00
Boaz Shuster
5836d86ac4 Add container environment variables correctly to the health check
The health check process doesn't have all the environment
varialbes in the container or has them set incorrectly.

This patch should fix that problem.

Signed-off-by: Boaz Shuster <ripcurld.github@gmail.com>
2017-05-21 21:39:00 +03:00
Elias Faxö
e401f63735 Added start period option to health check.
Signed-off-by: Elias Faxö <elias.faxo@gmail.com>
2017-04-06 12:35:34 +02:00
David McKay
647dce9dea
Healthchecks should inherit environment
Signed-off-by: David McKay <david@rawkode.com>
2017-03-02 16:23:56 +00:00
Victor Vieux
f6f67891be Merge pull request #28438 from vdemeester/use-container-shell-instead-of-hardcoded
Use Container.Config.Shell instead of hardcoded…
2016-11-18 18:54:36 -08:00
Tonis Tiigi
89b1234737 Fix deadlock on cancelling healthcheck
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
2016-11-15 20:10:16 -08:00
Vincent Demeester
5f81cf11f6
Use Container.Config.Shell instead of hardcoded…
… for healthcheck. It make the code a little cleaner and more
future/usage proof.

Signed-off-by: Vincent Demeester <vincent@sbr.pm>
2016-11-15 17:53:24 +01:00
Michael Crosby
3343d234f3 Add basic prometheus support
This adds a metrics packages that creates additional metrics.  Add the
metrics endpoint to the docker api server under `/metrics`.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Add metrics to daemon package

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

api: use standard way for metrics route

Also add "type" query parameter

Signed-off-by: Alexander Morozov <lk4d4@docker.com>

Convert timers to ms

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-10-27 10:34:38 -07:00
Thomas Leonard
b8793cff48 Reset health status to starting when a container is restarted
Signed-off-by: Thomas Leonard <thomas.leonard@docker.com>
2016-10-14 15:49:12 +01:00
allencloud
a4a4f3733f make health check log more readable
Signed-off-by: allencloud <allen.sun@daocloud.io>
2016-09-28 14:10:15 +08:00
Stephen Drake
c3319445aa Prevent stdout / stderr race condition in limitedBuffer.
Signed-off-by: Stephen Drake <stephen@xenolith.net>
2016-09-15 13:31:11 +02:00
Michael Crosby
91e197d614 Add engine-api types to docker
This moves the types for the `engine-api` repo to the existing types
package.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-09-07 11:05:58 -07:00
Tibor Vass
91e9f38313 healthcheck: do not interpret exit code 2 as "starting"
Instead reserve exit code 2 to be future proof, document that it should
not be used. Implementation-wise, it is considered as unhealthy, but
users should not rely on this as it may change in the future.

Signed-off-by: Tibor Vass <tibor@docker.com>
2016-07-25 14:28:45 -07:00
Josh Horwitz
4016038bd3 Treat HEALTHCHECK NONE the same as not setting a healthcheck
Signed-off-by: Josh Horwitz <horwitzja@gmail.com>
2016-07-25 11:11:14 -04:00
Alexander Morozov
576c9fa200 Merge pull request #23442 from thaJeztah/remove-defaultExitOnUnhealthy
remove unused defaultExitOnUnhealthy constant
2016-06-11 16:37:39 -07:00
Yong Tang
a72b45dbec Fix logrus formatting
This fix tries to fix logrus formatting by removing `f` from
`logrus.[Error|Warn|Debug|Fatal|Panic|Info]f` when formatting string
is not present.

This fix fixes #23459.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
2016-06-11 13:16:55 -07:00
Sebastiaan van Stijn
1dd28788f1
remove unused defaultExitOnUnhealthy constant
the '--exit-on-unhealty' option was removed,
but we forgot to remove this constant.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2016-06-11 00:04:05 +02:00
Jannick Fahlbusch
e3490cdcc0 Fix some typos
Signed-off-by: Jannick Fahlbusch <git@jf-projects.de>
2016-06-08 21:59:34 +02:00
Sebastiaan van Stijn
50e470fab4
Healthcheck: set default retries to 3
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2016-06-03 13:28:08 +02:00
Thomas Leonard
b6c7becbfe
Add support for user-defined healthchecks
This PR adds support for user-defined health-check probes for Docker
containers. It adds a `HEALTHCHECK` instruction to the Dockerfile syntax plus
some corresponding "docker run" options. It can be used with a restart policy
to automatically restart a container if the check fails.

The `HEALTHCHECK` instruction has two forms:

* `HEALTHCHECK [OPTIONS] CMD command` (check container health by running a command inside the container)
* `HEALTHCHECK NONE` (disable any healthcheck inherited from the base image)

The `HEALTHCHECK` instruction tells Docker how to test a container to check that
it is still working. This can detect cases such as a web server that is stuck in
an infinite loop and unable to handle new connections, even though the server
process is still running.

When a container has a healthcheck specified, it has a _health status_ in
addition to its normal status. This status is initially `starting`. Whenever a
health check passes, it becomes `healthy` (whatever state it was previously in).
After a certain number of consecutive failures, it becomes `unhealthy`.

The options that can appear before `CMD` are:

* `--interval=DURATION` (default: `30s`)
* `--timeout=DURATION` (default: `30s`)
* `--retries=N` (default: `1`)

The health check will first run **interval** seconds after the container is
started, and then again **interval** seconds after each previous check completes.

If a single run of the check takes longer than **timeout** seconds then the check
is considered to have failed.

It takes **retries** consecutive failures of the health check for the container
to be considered `unhealthy`.

There can only be one `HEALTHCHECK` instruction in a Dockerfile. If you list
more than one then only the last `HEALTHCHECK` will take effect.

The command after the `CMD` keyword can be either a shell command (e.g. `HEALTHCHECK
CMD /bin/check-running`) or an _exec_ array (as with other Dockerfile commands;
see e.g. `ENTRYPOINT` for details).

The command's exit status indicates the health status of the container.
The possible values are:

- 0: success - the container is healthy and ready for use
- 1: unhealthy - the container is not working correctly
- 2: starting - the container is not ready for use yet, but is working correctly

If the probe returns 2 ("starting") when the container has already moved out of the
"starting" state then it is treated as "unhealthy" instead.

For example, to check every five minutes or so that a web-server is able to
serve the site's main page within three seconds:

    HEALTHCHECK --interval=5m --timeout=3s \
      CMD curl -f http://localhost/ || exit 1

To help debug failing probes, any output text (UTF-8 encoded) that the command writes
on stdout or stderr will be stored in the health status and can be queried with
`docker inspect`. Such output should be kept short (only the first 4096 bytes
are stored currently).

When the health status of a container changes, a `health_status` event is
generated with the new status. The health status is also displayed in the
`docker ps` output.

Signed-off-by: Thomas Leonard <thomas.leonard@docker.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2016-06-02 23:58:34 +02:00