Commit graph

38 commits

Author SHA1 Message Date
Cory Snider
f50cb0c7bd
daemon: stop setting container resources to zero
Many of the fields in LinuxResources struct are pointers to scalars for
some reason, presumably to differentiate between set-to-zero and unset
when unmarshaling from JSON, despite zero being outside the acceptable
range for the corresponding kernel tunables. When creating the OCI spec
for a container, the daemon sets the container's OCI spec CPUShares and
BlkioWeight parameters to zero when the corresponding Docker container
configuration values are zero, signifying unset, despite the minimum
acceptable value for CPUShares being two, and BlkioWeight ten. This has
gone unnoticed as runC does not distingiush set-to-zero from unset as it
also uses zero internally to represent unset for those fields. However,
kata-containers v3.2.0-alpha.3 tries to apply the explicit-zero resource
parameters to the container, exactly as instructed, and fails loudly.
The OCI runtime-spec is silent on how the runtime should handle the case
when those parameters are explicitly set to out-of-range values and
kata's behaviour is not unreasonable, so the daemon must therefore be in
the wrong.

Translate unset values in the Docker container's resources HostConfig to
omit the corresponding fields in the container's OCI spec when starting
and updating a container in order to maximize compatibility with
runtimes.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit dea870f4ea)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-21 22:16:22 +02:00
Jeyanthinath Muthuram
e370f224ae
fixing consistent aliases for OCI spec imports
Signed-off-by: Jeyanthinath Muthuram <jeyanthinath10@gmail.com>
(cherry picked from commit 307b09e7eb)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-01 17:22:49 +02:00
Cory Snider
7a4ea19803
libcontainerd: work around exec start bug in c8d
It turns out that the unnecessary serialization removed in
b75246202a happened to work around a bug
in containerd. When many exec processes are started concurrently in the
same containerd task, it takes seconds to minutes for them all to start.
Add the workaround back in, only deliberately this time.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit fb7ec1555c)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-05-25 22:18:14 +02:00
Sebastiaan van Stijn
81e62af94a
use consistent alias for containerd's errdefs package
The signatures of functions in containerd's errdefs packages are very
similar to those in our own, and it's easy to accidentally use the wrong
package.

This patch uses a consistent alias for all occurrences of this import.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-04-08 19:30:33 +02:00
Cory Snider
36935bd869 libcontainerd: close stdin sync if possible
Closing stdin of a container or exec (a.k.a.: task or process) has been
somewhat broken ever since support for ContainerD 1.0 was introduced
back in Docker v17.11: the error returned from the CloseIO() call was
effectively ignored due to it being assigned to a local variable which
shadowed the intended variable. Serendipitously, that oversight
prevented a data race. In my recent refactor of libcontainerd, I
corrected the variable shadowing issue and introduced the aforementioned
data race in the process.

Avoid deadlocking when closing stdin without swallowing errors or
introducing data races by calling CloseIO() synchronously if the process
handle is available, falling back to an asynchronous close-and-log
strategy otherwise. This solution is inelegant and complex, but looks to
be the best that could be done without changing the libcontainerd API.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-04-03 15:25:16 -04:00
Paweł Gronowski
a8f5c524a0
libcontainerd: Upgrade to typeurl/v2
In preparation for containerd v1.7 which migrates off gogo/protobuf
and changes the protobuf Any type to one that's not supported by our
vendored version of typeurl.

This fixes a compile error on usages of `typeurl.UnmarshalAny` when
upgrading to containerd v1.7.

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2023-03-08 11:26:32 +01:00
Paweł Gronowski
47e9caede7
libcontainerd/client: Rename cp to checkpoint
Make the variable longer to give a hint about it's broader scope.

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2023-03-01 15:07:58 +01:00
Paweł Gronowski
0c751f904f
libcontainerd/client: Fix checkpoint not being set
`cp` variable is used later to populate the `info.Checkpoint` field
option used by Task creation.
Previous changes mistakenly changed assignment of the `cp` variable to
declaration of a new variable that's scoped only to the if block.

Restore the old assignment behavior.

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2023-03-01 15:07:42 +01:00
Cory Snider
843fcc96f7 libc8d/remote: name task fifos after task ID
The ID of the task is known at the time that the FIFOs need to be
created (it's passed into the IO-creator callback, and is also the same
as the container ID) so there is no need to hardcode it to "init". Name
the FIFOs after the task ID to be consistent with the FIFO names of
exec'ed processes. Delete the now-unused InitProcessName constant so it
can never again be used in place of a task/process ID.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-31 17:02:43 -05:00
Sebastiaan van Stijn
200edf8030
libcontainerd/remote: remove stray import comment
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-12-08 13:27:50 +01:00
Cory Snider
1bef9e3fbf Fix containerd task deletion after failed start
Deleting a containerd task whose status is Created fails with a
"precondition failed" error. This is because (aside from Windows)
a process is spawned when the task is created, and deleting the task
while the process is running would leak the process if it was allowed.
libcontainerd and the containerd plugin executor mistakenly try to clean
up from a failed start by deleting the created task, which will always
fail with the aforementined error. Change them to pass the
`WithProcessKill` delete option so the cleanup has a chance to succeed.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-11-02 13:48:13 -04:00
Cory Snider
4bafaa00aa Refactor libcontainerd to minimize c8d RPCs
The containerd client is very chatty at the best of times. Because the
libcontained API is stateless and references containers and processes by
string ID for every method call, the implementation is essentially
forced to use the containerd client in a way which amplifies the number
of redundant RPCs invoked to perform any operation. The libcontainerd
remote implementation has to reload the containerd container, task
and/or process metadata for nearly every operation. This in turn
amplifies the number of context switches between dockerd and containerd
to perform any container operation or handle a containerd event,
increasing the load on the system which could otherwise be allocated to
workloads.

Overhaul the libcontainerd interface to reduce the impedance mismatch
with the containerd client so that the containerd client can be used
more efficiently. Split the API out into container, task and process
interfaces which the consumer is expected to retain so that
libcontainerd can retain state---especially the analogous containerd
client objects---without having to manage any state-store inside the
libcontainerd client.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-08-24 14:59:08 -04:00
Cory Snider
57d2d6ef62 Update container OOMKilled flag immediately
The OOMKilled flag on a container's state has historically behaved
rather unintuitively: it is updated on container exit to reflect whether
or not any process within the container has been OOM-killed during the
preceding run of the container. The OOMKilled flag would be set to true
when the container exits if any process within the container---including
execs---was OOM-killed at any time while the container was running,
whether or not the OOM-kill was the cause of the container exiting. The
flag is "sticky," persisting through the next start of the container;
only being cleared once the container exits without any processes having
been OOM-killed that run.

Alter the behavior of the OOMKilled flag such that it signals whether
any process in the container had been OOM-killed since the most recent
start of the container. Set the flag immediately upon any process being
OOM-killed, and clear it when the container transitions to the "running"
state.

There is an ulterior motive for this change. It reduces the amount of
state the libcontainerd client needs to keep track of and clean up on
container exit. It's one less place the client could leak memory if a
container was to be deleted without going through libcontainerd.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-08-24 14:59:07 -04:00
Sebastiaan van Stijn
4f08346686
fix formatting of "nolint" tags for go1.19
The correct formatting for machine-readable comments is;

    //<some alphanumeric identifier>:<options>[,<option>...][ // comment]

Which basically means:

- MUST NOT have a space before `<identifier>` (e.g. `nolint`)
- Identified MUST be alphanumeric
- MUST be followed by a colon
- MUST be followed by at least one `<option>`
- Optionally additional `<options>` (comma-separated)
- Optionally followed by a comment

Any other format will not be considered a machine-readable comment by `gofmt`,
and thus formatted as a regular comment. Note that this also means that a
`//nolint` (without anything after it) is considered invalid, same for `//#nosec`
(starts with a `#`).

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-07-13 22:31:53 +02:00
Akihiro Suda
658a4b0fec
libcontainerd: remove support for runtime v1 API
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2022-06-05 18:41:44 +09:00
Sebastiaan van Stijn
2ec2b65e45
libcontainerd: SignalProcess(): accept syscall.Signal
This helps reducing some type-juggling / conversions further up
the stack.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-05-05 00:53:49 +02:00
Sebastiaan van Stijn
d13997b4ba
gosec: G601: Implicit memory aliasing in for loop
plugin/v2/plugin.go:141:50: G601: Implicit memory aliasing in for loop. (gosec)
                    updateSettingsEnv(&p.PluginObj.Settings.Env, &s)
                                                                 ^
    libcontainerd/remote/client.go:572:13: G601: Implicit memory aliasing in for loop. (gosec)
                cpDesc = &m
                         ^
    distribution/push_v2.go:400:34: G601: Implicit memory aliasing in for loop. (gosec)
                (metadata.CheckV2MetadataHMAC(&mountCandidate, pd.hmacKey) ||
                                              ^
    builder/dockerfile/builder.go:261:84: G601: Implicit memory aliasing in for loop. (gosec)
            currentCommandIndex = printCommand(b.Stdout, currentCommandIndex, totalCommands, &meta)
                                                                                             ^
    builder/dockerfile/builder.go:278:46: G601: Implicit memory aliasing in for loop. (gosec)
            if err := initializeStage(dispatchRequest, &stage); err != nil {
                                                       ^
    daemon/container.go:283:40: G601: Implicit memory aliasing in for loop. (gosec)
            if err := parser.ValidateMountConfig(&cfg); err != nil {
                                                 ^

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-06-10 13:03:29 +02:00
Sebastiaan van Stijn
08ddbfbdac
libcontainerd: remove LCOW bits
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-06-09 22:05:10 +02:00
Brian Goff
4b981436fe Fixup libnetwork lint errors
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2021-06-01 23:48:32 +00:00
Sebastiaan van Stijn
0f32beb4f8
libcontainerd: remove unused consts
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-03-19 21:52:23 +01:00
Cam
80a5df9c49
Added container ID to containerd task delete event messages
Signed-off-by: Cam <gh@sparr.email>
2020-10-30 20:58:57 -07:00
Brian Goff
f14aea63c9 "Fix" checkpoint on v2 runtime
Checkpoint/Restore is horribly broken all around.
But on the, now default, v2 runtime it's even more broken.

This at least makes checkpoint equally broken on both runtimes.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-10-12 22:35:37 +00:00
Brian Goff
906007f6c1 libcontainerd: use cancellable context for events
The event subscriber can only be cancelled by cancelling the context.
In the case where we have to restart event processing we are never
cancelling the old subscribiption.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-08-12 17:09:21 +00:00
Brian Goff
60d7265803 Use IsServing to determine if c8d client is ready
Instead of sleeping an arbitrary amount of time, using the client to
tell us when it's ready so we can start processing events sooner.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-08-12 17:09:21 +00:00
Brian Goff
f63f73a4a8 Configure shims from runtime config
In dockerd we already have a concept of a "runtime", which specifies the
OCI runtime to use (e.g. runc).
This PR extends that config to add containerd shim configuration.
This option is only exposed within the daemon itself (cannot be
configured in daemon.json).
This is due to issues in supporting unknown shims which will require
more design work.

What this change allows us to do is keep all the runtime config in one
place.

So the default "runc" runtime will just have it's already existing shim
config codified within the runtime config alone.
I've also added 2 more "stock" runtimes which are basically runc+shimv1
and runc+shimv2.
These new runtime configurations are:

- io.containerd.runtime.v1.linux - runc + v1 shim using the V1 shim API
- io.containerd.runc.v2 - runc + shim v2

These names coincide with the actual names of the containerd shims.

This allows the user to essentially control what shim is going to be
used by either specifying these as a `--runtime` on container create or
by setting `--default-runtime` on the daemon.

For custom/user-specified runtimes, the default shim config (currently
shim v1) is used.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-07-13 14:18:02 -07:00
Akihiro Suda
612343618d cgroup2: use shim V2
* Requires containerd binaries from containerd/containerd#3799 . Metrics are unimplemented yet.
* Works with crun v0.10.4, but `--security-opt seccomp=unconfined` is needed unless using master version of libseccomp
  ( containers/crun#156, seccomp/libseccomp#177 )
* Doesn't work with master runc yet
* Resource limitations are unimplemented

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-01-01 02:58:40 +09:00
Sebastiaan van Stijn
5bb4f4818b
libcontainerd: move hcsshim import to windows-only file
This reduces the dependency-graph when building packages for
Linux only.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-12-10 10:58:14 +01:00
Akihiro Suda
de5a67156b
Merge pull request #39082 from ehazlett/opts-for-create
Add NewContainerOpts to libcontainerd.Create
2019-10-04 08:20:47 +09:00
Evan Hazlett
35ac4be5d5 add NewContainerOpts to libcontainerd.Create
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
2019-10-03 11:45:41 -04:00
Sebastiaan van Stijn
07ff4f1de8
goimports: fix imports
Format the source according to latest goimports.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-09-18 12:56:54 +02:00
Brian Goff
1acaf2aabe Sleep before restarting event processing
This prevents restarting event processing in a tight loop.
You can see this with the following steps:

```terminal
$ containerd &
$ dockerd --containerd=/run/containerd/containerd.sock &
$ pkill -9 containerd
```

At this point you will be spammed with logs such as:

```
ERRO[2019-07-12T22:29:37.318761400Z] failed to get event                           error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=plugins.moby
```

Without this change you can quickly end up with gigabytes of log data.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2019-07-12 15:42:19 -07:00
Michael Crosby
b5f28865ef Handle blocked I/O of exec'd processes
This is the second part to
https://github.com/containerd/containerd/pull/3361 and will help process
delete not block forever when the process exists but the I/O was
inherited by a subprocess that lives on.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-06-21 12:02:15 -04:00
Sebastiaan van Stijn
539e72f75b
Fix typo retreive -> retrieve
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-06-04 17:33:04 +02:00
Michael Crosby
b9b5dc37e3 Remove inmemory container map
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-04-05 15:48:07 -04:00
Michael Crosby
adb15c2899 Export WithBundle code
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-04-05 08:41:48 -04:00
Michael Crosby
45e328b0ac Remove libcontainerd status type
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-04-04 15:17:13 -04:00
John Howard
2f27332836 Windows: Implement docker top for containerd
Signed-off-by: John Howard <jhoward@microsoft.com>
2019-03-12 18:41:55 -07:00
John Howard
85ad4b16c1 Windows: Experimental: Allow containerd for runtime
Signed-off-by: John Howard <jhoward@microsoft.com>

This is the first step in refactoring moby (dockerd) to use containerd on Windows.
Similar to the current model in Linux, this adds the option to enable it for runtime.
It does not switch the graphdriver to containerd snapshotters.

 - Refactors libcontainerd to a series of subpackages so that either a
  "local" containerd (1) or a "remote" (2) containerd can be loaded as opposed
  to conditional compile as "local" for Windows and "remote" for Linux.

 - Updates libcontainerd such that Windows has an option to allow the use of a
   "remote" containerd. Here, it communicates over a named pipe using GRPC.
   This is currently guarded behind the experimental flag, an environment variable,
   and the providing of a pipename to connect to containerd.

 - Infrastructure pieces such as under pkg/system to have helper functions for
   determining whether containerd is being used.

(1) "local" containerd is what the daemon on Windows has used since inception.
It's not really containerd at all - it's simply local invocation of HCS APIs
directly in-process from the daemon through the Microsoft/hcsshim library.

(2) "remote" containerd is what docker on Linux uses for it's runtime. It means
that there is a separate containerd service running, and docker communicates over
GRPC to it.

To try this out, you will need to start with something like the following:

Window 1:
	containerd --log-level debug

Window 2:
	$env:DOCKER_WINDOWS_CONTAINERD=1
	dockerd --experimental -D --containerd \\.\pipe\containerd-containerd

You will need the following binary from github.com/containerd/containerd in your path:
 - containerd.exe

You will need the following binaries from github.com/Microsoft/hcsshim in your path:
 - runhcs.exe
 - containerd-shim-runhcs-v1.exe

For LCOW, it will require and initrd.img and kernel in `C:\Program Files\Linux Containers`.
This is no different to the current requirements. However, you may need updated binaries,
particularly initrd.img built from Microsoft/opengcs as (at the time of writing), Linuxkit
binaries are somewhat out of date.

Note that containerd and hcsshim for HCS v2 APIs do not yet support all the required
functionality needed for docker. This will come in time - this is a baby (although large)
step to migrating Docker on Windows to containerd.

Note that the HCS v2 APIs are only called on RS5+ builds. RS1..RS4 will still use
HCS v1 APIs as the v2 APIs were not fully developed enough on these builds to be usable.
This abstraction is done in HCSShim. (Referring specifically to runtime)

Note the LCOW graphdriver still uses HCS v1 APIs regardless.

Note also that this does not migrate docker to use containerd snapshotters
rather than graphdrivers. This needs to be done in conjunction with Linux also
doing the same switch.
2019-03-12 18:41:55 -07:00