Commit graph

441 commits

Author SHA1 Message Date
huang-jl
da643c0b8a libcontainerd: change the digest used when restoring
For current implementation of Checkpoint Restore (C/R) in docker, it
will write the checkpoint to content store. However, when restoring
libcontainerd uses .Digest().Encoded(), which will remove the info
of alg, leading to error.

Signed-off-by: huang-jl <1046678590@qq.com>
2024-02-27 20:17:31 +08:00
Sebastiaan van Stijn
c516804d6f
vendor: OTEL v0.46.1 / v1.21.0
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2024-02-23 10:11:07 +01:00
Cory Snider
dd20bf4862 libcontainerd/supervisor: fix data race
The monitorDaemon() goroutine calls startContainerd() then blocks on
<-daemonWaitCh to wait for it to exit. The startContainerd() function
would (re)initialize the daemonWaitCh so a restarted containerd could be
waited on. This implementation was race-free because startContainerd()
would synchronously initialize the daemonWaitCh before returning. When
the call to start the managed containerd process was moved into the
waiter goroutine, the code to initialize the daemonWaitCh struct field
was also moved into the goroutine. This introduced a race condition.

Move the daemonWaitCh initialization to guarantee that it happens before
the startContainerd() call returns.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2024-02-01 15:53:18 -05:00
Cory Snider
659d7b190f libcontainerd: create unstarted tasks
Split task creation and start into two separate method calls in the
libcontainerd API. Clients now have the opportunity to inspect the
freshly-created task and customize its runtime environment before
starting execution of the user-specified binary.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2024-01-10 13:50:26 -05:00
Sebastiaan van Stijn
54fcd40aa4
Merge pull request #46227 from thaJeztah/supervisor_ignore_errs
libcontainerd/supervisor: explicitly ignore process kill errors
2023-11-22 08:40:45 +01:00
Cory Snider
29ac09ee9d Revert "libcontainerd: work around exec start bug in c8d"
The workaround is no longer required. The bug has been fixed in stable
versions of all supported containerd branches.

This reverts commit fb7ec1555c.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-11-06 13:26:44 -05:00
Sebastiaan van Stijn
cff4f20c44
migrate to github.com/containerd/log v0.1.0
The github.com/containerd/containerd/log package was moved to a separate
module, which will also be used by upcoming (patch) releases of containerd.

This patch moves our own uses of the package to use the new module.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-10-11 17:52:23 +02:00
Sebastiaan van Stijn
3614749b55
Merge pull request #45966 from neersighted/buildkit_0.12
Update to BuildKit 0.12
2023-09-22 02:13:15 +02:00
Bjorn Neergaard
fd6dd6935b
vendor: github.com/containerd/containerd v1.7.6
The DeepEqual ignore required in the daemon tests is a bit ugly, but it
works given the new protoc output.

We also have to ignore lints related to schema1 deprecations; these do
not apply as we must continue to support this schema version.

Signed-off-by: Bjorn Neergaard <bjorn.neergaard@docker.com>
2023-09-21 14:18:40 -06:00
Bjorn Neergaard
0e80073e01
daemon: strongly type containerd log.OutputFormat
This type was introduced in
0a79e67e4f

Make use of it throughout our log-format handling code, and convert back
to a string before we pass it to the containerd client.

Signed-off-by: Bjorn Neergaard <bjorn.neergaard@docker.com>
2023-09-21 05:40:17 -06:00
Sebastiaan van Stijn
3bd3cdd82e
Merge pull request #46476 from vvoland/libcontainerd-windows-reap-fix
libcontainerd/windows: Fix cleanup on `newIOFromProcess` error
2023-09-18 15:06:56 +02:00
Sebastiaan van Stijn
96faee9762
libcontainer: client.processEventStream: use locally scoped variables
- use local variables and remove some intermediate variables
- handle the events inside the switch itself; this makes all the
  switch branches use the same logic, instead of "some" using
  a `continue`, and others falling through to have the event handled
  outside of the switch.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-09-17 14:29:54 +02:00
Sebastiaan van Stijn
bd523abd44
remove more direct uses of logrus
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-09-15 20:12:27 +02:00
Paweł Gronowski
0937aef261
libcontainerd/windows: Don't reap on failure
Synchronize the code to do the same thing as Exec.
reap doesn't need to be called before the start event was sent.
There's already a defer block which cleans up the process in case where
an error occurs.

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2023-09-14 11:11:33 +02:00
Paweł Gronowski
b805599ef6
libcontainer/windows: Remove unneeded var declaration
The cleanup defer uses an `outErr` now, so we don't need to worry about
shadowing.

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2023-09-14 11:10:40 +02:00
Paweł Gronowski
55b664046c
libcontainer/windows: Fix process not being killed after stdio attach failure
Error check in defer block used wrong error variable which is always nil
if the flow reaches the defer. This caused the `newProcess.Kill` to be
never called if the subsequent attemp to attach to the stdio failed.
Although this only happens in Exec (as Start does overwrite the error),
this also adjusts the Start to also use the returned error to avoid this
kind of mistake in future changes.

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2023-09-14 11:10:11 +02:00
Brian Goff
642e9917ff Add otel support
This uses otel standard environment variables to configure tracing in
the daemon.
It also adds support for propagating trace contexts in the client and
reading those from the API server.

See
https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/
for details on otel environment variables.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2023-09-07 18:38:19 +00:00
Sebastiaan van Stijn
178125ae39
libcontainerd/supervisor: explicitly ignore process kill errors
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-08-14 14:02:27 +02:00
Sebastiaan van Stijn
5e2a1195d7
swap logrus types for their containerd/logs aliases
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-08-01 13:02:55 +02:00
Cory Snider
0c2699da27
Merge pull request #45737 from pkwarren/pkw/issue-44940-dockerd-json-logs
Update dockerd to support JSON logging format
2023-07-13 19:00:31 -04:00
Sebastiaan van Stijn
4175a550fd
libcontainerd: format code with gofumpt
Formatting the code with https://github.com/mvdan/gofumpt

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-29 00:31:50 +02:00
Philip K. Warren
a08abec9f8 Update dockerd to support JSON logging format
Update docker to support a '--log-format' option, which accepts either
'text' (default) or 'json'. Propagate the log format to containerd as
well, to ensure that everything will be logged consistently.

Signed-off-by: Philip K. Warren <pkwarren@gmail.com>
2023-06-28 12:46:28 -05:00
Brian Goff
74da6a6363 Switch all logging to use containerd log pkg
This unifies our logging and allows us to propagate logging and trace
contexts together.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2023-06-24 00:23:44 +00:00
Cory Snider
dea870f4ea daemon: stop setting container resources to zero
Many of the fields in LinuxResources struct are pointers to scalars for
some reason, presumably to differentiate between set-to-zero and unset
when unmarshaling from JSON, despite zero being outside the acceptable
range for the corresponding kernel tunables. When creating the OCI spec
for a container, the daemon sets the container's OCI spec CPUShares and
BlkioWeight parameters to zero when the corresponding Docker container
configuration values are zero, signifying unset, despite the minimum
acceptable value for CPUShares being two, and BlkioWeight ten. This has
gone unnoticed as runC does not distingiush set-to-zero from unset as it
also uses zero internally to represent unset for those fields. However,
kata-containers v3.2.0-alpha.3 tries to apply the explicit-zero resource
parameters to the container, exactly as instructed, and fails loudly.
The OCI runtime-spec is silent on how the runtime should handle the case
when those parameters are explicitly set to out-of-range values and
kata's behaviour is not unreasonable, so the daemon must therefore be in
the wrong.

Translate unset values in the Docker container's resources HostConfig to
omit the corresponding fields in the container's OCI spec when starting
and updating a container in order to maximize compatibility with
runtimes.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-06-06 12:13:05 -04:00
Cory Snider
fb7ec1555c libcontainerd: work around exec start bug in c8d
It turns out that the unnecessary serialization removed in
b75246202a happened to work around a bug
in containerd. When many exec processes are started concurrently in the
same containerd task, it takes seconds to minutes for them all to start.
Add the workaround back in, only deliberately this time.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-05-25 16:00:29 -04:00
Sebastiaan van Stijn
bafcfdf8c5
Merge pull request #45484 from thaJeztah/remove_deprecated_stubs
remove deprecated types, fields, and functions
2023-05-12 14:03:26 +01:00
Jeyanthinath Muthuram
307b09e7eb
fixing consistent aliases for OCI spec imports
Signed-off-by: Jeyanthinath Muthuram <jeyanthinath10@gmail.com>
2023-05-08 15:27:52 +05:30
Sebastiaan van Stijn
fb96b94ed0
daemon: remove handling for deprecated "oom-score-adjust", and produce error
This option was deprecated in 5a922dc162, which
is part of the v24.0.0 release, so we can remove it from master.

This patch;

- adds a check to ValidatePlatformConfig, and produces a fatal error
  if oom-score-adjust is set
- removes the deprecated libcontainerd/supervisor.WithOOMScore
- removes the warning from docker info

With this patch:

    dockerd --oom-score-adjust=-500 --validate
    Flag --oom-score-adjust has been deprecated, and will be removed in the next release.
    unable to configure the Docker daemon with file /etc/docker/daemon.json: merged configuration validation from file and command line flags failed: DEPRECATED: The "oom-score-adjust" config parameter and the dockerd "--oom-score-adjust" options have been removed.

And when using `daemon.json`:

    dockerd --validate
    unable to configure the Docker daemon with file /etc/docker/daemon.json: merged configuration validation from file and command line flags failed: DEPRECATED: The "oom-score-adjust" config parameter and the dockerd "--oom-score-adjust" options have been removed.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-05-06 16:36:17 +02:00
Sebastiaan van Stijn
61656464d8
Merge pull request #45315 from thaJeztah/deprecate_oom_score_adjust
daemon: deprecate --oom-score-adjust for the daemon
2023-04-14 00:06:58 +02:00
Sebastiaan van Stijn
5a922dc162
daemon: deprecate --oom-score-adjust for the daemon
The `oom-score-adjust` option was added in a894aec8d8,
to prevent the daemon from being OOM-killed before other processes. This
option was mostly added as a "convenience", as running the daemon as a
systemd unit was not yet common.

Having the daemon set its own limits is not best-practice, and something
better handled by the process-manager starting the daemon.

Commit cf7a5be0f2 fixed this option to allow
disabling it, and 2b8e68ef06 removed the default
score adjust.

This patch deprecates the option altogether, recommending users to set these
limits through the process manager used, such as the "OOMScoreAdjust" option
in systemd units.

With this patch:

    dockerd --oom-score-adjust=-500 --validate
    Flag --oom-score-adjust has been deprecated, and will be removed in the next release.
    configuration OK

    echo '{"oom-score-adjust":-500}' > /etc/docker/daemon.json
    dockerd
    INFO[2023-04-12T21:34:51.133389627Z] Starting up
    INFO[2023-04-12T21:34:51.135607544Z] containerd not running, starting managed containerd
    WARN[2023-04-12T21:34:51.135629086Z] DEPRECATED: The "oom-score-adjust" config parameter and the dockerd "--oom-score-adjust" option will be removed in the next release.

    docker info
    Client:
      Context:    default
      Debug Mode: false
    ...
    DEPRECATED: The "oom-score-adjust" config parameter and the dockerd "--oom-score-adjust" option will be removed in the next release

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-04-13 00:02:39 +02:00
Sebastiaan van Stijn
81e62af94a
use consistent alias for containerd's errdefs package
The signatures of functions in containerd's errdefs packages are very
similar to those in our own, and it's easy to accidentally use the wrong
package.

This patch uses a consistent alias for all occurrences of this import.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-04-08 19:30:33 +02:00
Cory Snider
36935bd869 libcontainerd: close stdin sync if possible
Closing stdin of a container or exec (a.k.a.: task or process) has been
somewhat broken ever since support for ContainerD 1.0 was introduced
back in Docker v17.11: the error returned from the CloseIO() call was
effectively ignored due to it being assigned to a local variable which
shadowed the intended variable. Serendipitously, that oversight
prevented a data race. In my recent refactor of libcontainerd, I
corrected the variable shadowing issue and introduced the aforementioned
data race in the process.

Avoid deadlocking when closing stdin without swallowing errors or
introducing data races by calling CloseIO() synchronously if the process
handle is available, falling back to an asynchronous close-and-log
strategy otherwise. This solution is inelegant and complex, but looks to
be the best that could be done without changing the libcontainerd API.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-04-03 15:25:16 -04:00
Akihiro Suda
e807ae4f2e
vendor: github.com/containerd/cgroups/v3 v3.0.1
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2023-03-08 20:15:17 +09:00
Paweł Gronowski
a8f5c524a0
libcontainerd: Upgrade to typeurl/v2
In preparation for containerd v1.7 which migrates off gogo/protobuf
and changes the protobuf Any type to one that's not supported by our
vendored version of typeurl.

This fixes a compile error on usages of `typeurl.UnmarshalAny` when
upgrading to containerd v1.7.

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2023-03-08 11:26:32 +01:00
Sebastiaan van Stijn
11261594d8
Merge pull request #45032 from corhere/shim-opts
daemon: allow shimv2 runtimes to be configured
2023-03-02 21:45:05 +01:00
Paweł Gronowski
47e9caede7
libcontainerd/client: Rename cp to checkpoint
Make the variable longer to give a hint about it's broader scope.

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2023-03-01 15:07:58 +01:00
Paweł Gronowski
0c751f904f
libcontainerd/client: Fix checkpoint not being set
`cp` variable is used later to populate the `info.Checkpoint` field
option used by Task creation.
Previous changes mistakenly changed assignment of the `cp` variable to
declaration of a new variable that's scoped only to the if block.

Restore the old assignment behavior.

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2023-03-01 15:07:42 +01:00
Cory Snider
b0eed5ade6 daemon: allow shimv2 runtimes to be configured
Kubernetes only permits RuntimeClass values which are valid lowercase
RFC 1123 labels, which disallows the period character. This prevents
cri-dockerd from being able to support configuring alternative shimv2
runtimes for a pod as shimv2 runtime names must contain at least one
period character. Add support for configuring named shimv2 runtimes in
daemon.json so that runtime names can be aliased to
Kubernetes-compatible names.

Allow options to be set on shimv2 runtimes in daemon.json.

The names of the new daemon runtime config fields have been selected to
correspond with the equivalent field names in cri-containerd's
configuration so that users can more easily follow documentation from
the runtime vendor written for cri-containerd and apply it to
daemon.json.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-02-17 18:08:06 -05:00
Cory Snider
843fcc96f7 libc8d/remote: name task fifos after task ID
The ID of the task is known at the time that the FIFOs need to be
created (it's passed into the IO-creator callback, and is also the same
as the container ID) so there is no need to hardcode it to "init". Name
the FIFOs after the task ID to be consistent with the FIFO names of
exec'ed processes. Delete the now-unused InitProcessName constant so it
can never again be used in place of a task/process ID.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-31 17:02:43 -05:00
Cory Snider
719b08313f libc8d/local: set task id to container id
ContainerD unconditionally sets the ID of a task to its container's ID.
Emulate this behaviour in the libcontainerd local_windows implementation
so that the daemon can use ProcessID == ContainerID (in libcontainerd
terminology) to identify that an exit event is for the container's task
and not for another process (i.e. an exec) in the same container.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-31 17:02:43 -05:00
Sebastiaan van Stijn
01365cbd74
libcontainerd/local: use strings.Cut()
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-12-21 11:09:02 +01:00
Sebastiaan van Stijn
200edf8030
libcontainerd/remote: remove stray import comment
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-12-08 13:27:50 +01:00
Sebastiaan van Stijn
cea8e9b583
libcontainerd/supervisor: use pkg/pidfile for reading and writing pidfile
Also updated a variable name that collided with a package const.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-11-04 01:50:26 +01:00
Sebastiaan van Stijn
9d5e754caa
move pkg/system: process to a separate package
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-11-04 01:50:23 +01:00
Cory Snider
1bef9e3fbf Fix containerd task deletion after failed start
Deleting a containerd task whose status is Created fails with a
"precondition failed" error. This is because (aside from Windows)
a process is spawned when the task is created, and deleting the task
while the process is running would leak the process if it was allowed.
libcontainerd and the containerd plugin executor mistakenly try to clean
up from a failed start by deleting the created task, which will always
fail with the aforementined error. Change them to pass the
`WithProcessKill` delete option so the cleanup has a chance to succeed.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-11-02 13:48:13 -04:00
Cory Snider
1f22b15030 Lock OS threads when exec'ing with Pdeathsig
On Linux, when (os/exec.Cmd).SysProcAttr.Pdeathsig is set, the signal
will be sent to the process when the OS thread on which cmd.Start() was
executed dies. The runtime terminates an OS thread when a goroutine
exits after being wired to the thread with runtime.LockOSThread(). If
other goroutines are allowed to be scheduled onto a thread which called
cmd.Start(), an unrelated goroutine could cause the thread to be
terminated and prematurely signal the command. See
https://github.com/golang/go/issues/27505 for more information.

Prevent started subprocesses with Pdeathsig from getting signaled
prematurely by wiring the starting goroutine to the OS thread until the
subprocess has exited. No other goroutines can be scheduled onto a
locked thread so it will remain alive until unlocked or the daemon
process exits.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-10-05 12:18:03 -04:00
Cory Snider
6a2f385aea Share logic to create-or-replace a container
The existing logic to handle container ID conflicts when attempting to
create a plugin container is not nearly as robust as the implementation
in daemon for user containers. Extract and refine the logic from daemon
and use it in the plugin executor.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-08-24 14:59:08 -04:00
Cory Snider
4bafaa00aa Refactor libcontainerd to minimize c8d RPCs
The containerd client is very chatty at the best of times. Because the
libcontained API is stateless and references containers and processes by
string ID for every method call, the implementation is essentially
forced to use the containerd client in a way which amplifies the number
of redundant RPCs invoked to perform any operation. The libcontainerd
remote implementation has to reload the containerd container, task
and/or process metadata for nearly every operation. This in turn
amplifies the number of context switches between dockerd and containerd
to perform any container operation or handle a containerd event,
increasing the load on the system which could otherwise be allocated to
workloads.

Overhaul the libcontainerd interface to reduce the impedance mismatch
with the containerd client so that the containerd client can be used
more efficiently. Split the API out into container, task and process
interfaces which the consumer is expected to retain so that
libcontainerd can retain state---especially the analogous containerd
client objects---without having to manage any state-store inside the
libcontainerd client.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-08-24 14:59:08 -04:00
Cory Snider
57d2d6ef62 Update container OOMKilled flag immediately
The OOMKilled flag on a container's state has historically behaved
rather unintuitively: it is updated on container exit to reflect whether
or not any process within the container has been OOM-killed during the
preceding run of the container. The OOMKilled flag would be set to true
when the container exits if any process within the container---including
execs---was OOM-killed at any time while the container was running,
whether or not the OOM-kill was the cause of the container exiting. The
flag is "sticky," persisting through the next start of the container;
only being cleared once the container exits without any processes having
been OOM-killed that run.

Alter the behavior of the OOMKilled flag such that it signals whether
any process in the container had been OOM-killed since the most recent
start of the container. Set the flag immediately upon any process being
OOM-killed, and clear it when the container transitions to the "running"
state.

There is an ulterior motive for this change. It reduces the amount of
state the libcontainerd client needs to keep track of and clean up on
container exit. It's one less place the client could leak memory if a
container was to be deleted without going through libcontainerd.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-08-24 14:59:07 -04:00
Sebastiaan van Stijn
6560e0b136
cmd/dockerd: initContainerD(): clean-up some logs
Change the log-level for messages about starting the managed containerd instance
to be the same as for the main API. And remove a redundant debug-log.

With this patch:

    dockerd
    INFO[2022-08-11T11:46:32.573299176Z] Starting up
    INFO[2022-08-11T11:46:32.574304409Z] containerd not running, starting managed containerd
    INFO[2022-08-11T11:46:32.575289181Z] started new containerd process                address=/var/run/docker/containerd/containerd.sock module=libcontainerd pid=5370
cmd/dockerd: initContainerD(): clean-up some logs

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-08-11 14:11:08 +02:00