Commit graph

384 commits

Author SHA1 Message Date
Cory Snider
4949db7f62
daemon: stop setting container resources to zero
Many of the fields in LinuxResources struct are pointers to scalars for
some reason, presumably to differentiate between set-to-zero and unset
when unmarshaling from JSON, despite zero being outside the acceptable
range for the corresponding kernel tunables. When creating the OCI spec
for a container, the daemon sets the container's OCI spec CPUShares and
BlkioWeight parameters to zero when the corresponding Docker container
configuration values are zero, signifying unset, despite the minimum
acceptable value for CPUShares being two, and BlkioWeight ten. This has
gone unnoticed as runC does not distingiush set-to-zero from unset as it
also uses zero internally to represent unset for those fields. However,
kata-containers v3.2.0-alpha.3 tries to apply the explicit-zero resource
parameters to the container, exactly as instructed, and fails loudly.
The OCI runtime-spec is silent on how the runtime should handle the case
when those parameters are explicitly set to out-of-range values and
kata's behaviour is not unreasonable, so the daemon must therefore be in
the wrong.

Translate unset values in the Docker container's resources HostConfig to
omit the corresponding fields in the container's OCI spec when starting
and updating a container in order to maximize compatibility with
runtimes.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit dea870f4ea)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-08-13 22:44:50 +02:00
Cory Snider
975bdb2c96 daemon: identify container exits by ProcessID
The Pid field of an exit event cannot be relied upon to differentiate
exits of the container's task from exits of other container processes,
i.e. execs. The Pid is reported by the runtime and is implementation-
defined so there is no guarantee that a task's pid is distinct from the
pids of any other process in the same container. In particular,
kata-containers reports the pid of the hypervisor for all exit events.
Update the daemon to differentiate container exits from exec exits by
inspecting the event's ProcessID.

The local_windows libcontainerd implementation already sets the
ProcessID to InitProcessName on container exit events. Update the remote
libcontainerd implementation to match. ContainerD guarantees that the
process ID of a task (container init process) is set to the
corresponding container ID, so use that invariant to distinguish task
exits from other process exits.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-31 12:14:50 -05:00
Cory Snider
1e0f2186a9 Fix containerd task deletion after failed start
Deleting a containerd task whose status is Created fails with a
"precondition failed" error. This is because (aside from Windows)
a process is spawned when the task is created, and deleting the task
while the process is running would leak the process if it was allowed.
libcontainerd mistakenly tries to clean up from a failed start by
deleting the created task, which will always fail with the
aforementioned error. Change it to pass the `WithProcessKill` delete
option so the cleanup has a chance to succeed.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 1bef9e3fbf)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-11-02 16:59:22 -04:00
Sebastiaan van Stijn
6ab3b50a3f
libcontainerd: switch generated containerd.toml to v2 (v1 is deprecated)
Before this patch:

    INFO[2022-07-27T14:30:06.188762628Z] Starting up
    INFO[2022-07-27T14:30:06.190750725Z] libcontainerd: started new containerd process  pid=2028
    ...
    WARN[0000] containerd config version `1` has been deprecated and will be removed in containerd v2.0, please switch to version `2`, see https://github.com/containerd/containerd/blob/main/docs/PLUGINS.md#version-header
    INFO[2022-07-27T14:30:06.220024286Z] starting containerd                           revision=10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1 version=v1.6.6

With this patch:

    INFO[2022-07-27T14:28:04.025543517Z] Starting up
    INFO[2022-07-27T14:28:04.027447105Z] libcontainerd: started new containerd process  pid=1377
    ...
    INFO[2022-07-27T14:28:04.054483270Z] starting containerd                           revision=10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1 version=v1.6.6

And the generated /var/run/docker/containerd/containerd.toml:

```toml
disabled_plugins = ["io.containerd.grpc.v1.cri"]
imports = []
oom_score = 0
plugin_dir = ""
required_plugins = []
root = "/var/lib/docker/containerd/daemon"
state = "/var/run/docker/containerd/daemon"
temp = ""
version = 2

[cgroup]
  path = ""

[debug]
  address = "/var/run/docker/containerd/containerd-debug.sock"
  format = ""
  gid = 0
  level = "debug"
  uid = 0

[grpc]
  address = "/var/run/docker/containerd/containerd.sock"
  gid = 0
  max_recv_message_size = 16777216
  max_send_message_size = 16777216
  tcp_address = ""
  tcp_tls_ca = ""
  tcp_tls_cert = ""
  tcp_tls_key = ""
  uid = 0

[metrics]
  address = ""
  grpc_histogram = false

[plugins]

[proxy_plugins]

[stream_processors]

[timeouts]

[ttrpc]
  address = ""
  gid = 0
  uid = 0
```

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit ba2ff69894)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-07-28 16:45:26 +02:00
Sebastiaan van Stijn
e34ab5200d
fix formatting of "nolint" tags for go1.19
The correct formatting for machine-readable comments is;

    //<some alphanumeric identifier>:<options>[,<option>...][ // comment]

Which basically means:

- MUST NOT have a space before `<identifier>` (e.g. `nolint`)
- Identified MUST be alphanumeric
- MUST be followed by a colon
- MUST be followed by at least one `<option>`
- Optionally additional `<options>` (comma-separated)
- Optionally followed by a comment

Any other format will not be considered a machine-readable comment by `gofmt`,
and thus formatted as a regular comment. Note that this also means that a
`//nolint` (without anything after it) is considered invalid, same for `//#nosec`
(starts with a `#`).

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 4f08346686)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-07-15 13:45:13 +02:00
Sebastiaan van Stijn
cdbca4061b
gofmt GoDoc comments with go1.19
Older versions of Go don't format comments, so committing this as
a separate commit, so that we can already make these changes before
we upgrade to Go 1.19.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 52c1a2fae8)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-07-13 22:42:29 +02:00
Akihiro Suda
658a4b0fec
libcontainerd: remove support for runtime v1 API
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2022-06-05 18:41:44 +09:00
Sebastiaan van Stijn
d733481399
daemon: daemon.ContainerKill() accept stop-signal as string
This allows the postContainersKill() handler to pass values as-is. As part of
the rewrite, I also moved the daemon.GetContainer(name) call later in the
function, so that we can fail early if an invalid signal is passed, before
doing the (heavier) fetching of the container.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-05-05 11:27:47 +02:00
Sebastiaan van Stijn
2ec2b65e45
libcontainerd: SignalProcess(): accept syscall.Signal
This helps reducing some type-juggling / conversions further up
the stack.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-05-05 00:53:49 +02:00
Paul "TBBle" Hampson
31e1fec950 Suport vpci-class-guid in the non-containerd backend
IDType `vpci-class-guid` is a synonym of `class`.

Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
2022-03-27 13:26:47 +11:00
Paul "TBBle" Hampson
cb07afa3cc Implement :// separator for arbitrary Windows Device IDTypes
Arbitrary here does not include '', best to catch that one early as it's
almost certainly a mistake (possibly an attempt to pass a POSIX path
through this API)

Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
2022-03-27 13:26:47 +11:00
Sebastiaan van Stijn
1b3fef5333
Windows: require Windows Server RS5 / ltsc2019 (build 17763) as minimum
Windows Server 2016 (RS1) reached end of support, and Docker Desktop requires
Windows 10 V19H2 (version 1909, build 18363) as a minimum.

This patch makes Windows Server RS5 /  ltsc2019 (build 17763) the minimum version
to run the daemon, and removes some hacks for older versions of Windows.

There is one check remaining that checks for Windows RS3 for a workaround
on older versions, but recent changes in Windows seemed to have regressed
on the same issue, so I kept that code for now to check if we may need that
workaround (again);

085c6a98d5/daemon/graphdriver/windows/windows.go (L319-L341)

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-02-18 22:58:28 +01:00
Eng Zer Jun
c55a4ac779
refactor: move from io/ioutil to io and os package
The io/ioutil package has been deprecated in Go 1.16. This commit
replaces the existing io/ioutil functions with their new definitions in
io and os packages.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2021-08-27 14:56:57 +08:00
Sebastiaan van Stijn
83ec46a7e6
libcontainerd/local: fix GoDoc
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-08-21 20:34:21 +02:00
Sebastiaan van Stijn
c33b9bcfd4
libcontainerd/local: remove LCOW bits
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-07-27 15:12:04 +02:00
Sebastiaan van Stijn
d13997b4ba
gosec: G601: Implicit memory aliasing in for loop
plugin/v2/plugin.go:141:50: G601: Implicit memory aliasing in for loop. (gosec)
                    updateSettingsEnv(&p.PluginObj.Settings.Env, &s)
                                                                 ^
    libcontainerd/remote/client.go:572:13: G601: Implicit memory aliasing in for loop. (gosec)
                cpDesc = &m
                         ^
    distribution/push_v2.go:400:34: G601: Implicit memory aliasing in for loop. (gosec)
                (metadata.CheckV2MetadataHMAC(&mountCandidate, pd.hmacKey) ||
                                              ^
    builder/dockerfile/builder.go:261:84: G601: Implicit memory aliasing in for loop. (gosec)
            currentCommandIndex = printCommand(b.Stdout, currentCommandIndex, totalCommands, &meta)
                                                                                             ^
    builder/dockerfile/builder.go:278:46: G601: Implicit memory aliasing in for loop. (gosec)
            if err := initializeStage(dispatchRequest, &stage); err != nil {
                                                       ^
    daemon/container.go:283:40: G601: Implicit memory aliasing in for loop. (gosec)
            if err := parser.ValidateMountConfig(&cfg); err != nil {
                                                 ^

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-06-10 13:03:29 +02:00
Sebastiaan van Stijn
08ddbfbdac
libcontainerd: remove LCOW bits
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-06-09 22:05:10 +02:00
Brian Goff
4b981436fe Fixup libnetwork lint errors
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2021-06-01 23:48:32 +00:00
Olli Janatuinen
bffa730860 Prepare tests for Windows containerd support
Signed-off-by: Olli Janatuinen <olli.janatuinen@gmail.com>
2021-04-22 10:50:00 +03:00
Sebastiaan van Stijn
2a7c1cc1d6
libcontainerd/supervisor: replace BurntSushi/toml with pelletier/go-toml
Taking the same approach as was taken in containerd

The new library has a slightly different output;

- keys at the same level are sorted alphabetically
- empty sections not omitted (`proxy_plugins`, `stream_processors`, `timeouts`),
  which could possibly be be addressed with an "omitempty" in containerd's struct.
- empty slices are not omitted (`imports`, `required_plugins`)

After sorting the "before" configuration the diff looks like this:

```patch
diff --git a/config-before-sorted.toml b/config-after.toml
index cc771ce7ab..43a727f589 100644
--- a/config-before-sorted.toml
+++ b/config-after.toml
@@ -1,6 +1,8 @@
 disabled_plugins = ["cri"]
+imports = []
 oom_score = 0
 plugin_dir = ""
+required_plugins = []
 root = "/var/lib/docker/containerd/daemon"
 state = "/var/run/docker/containerd/daemon"
 version = 0
@@ -37,6 +39,12 @@ version = 0
     shim = "containerd-shim"
     shim_debug = true

+[proxy_plugins]
+
+[stream_processors]
+
+[timeouts]
+
 [ttrpc]
   address = ""
   gid = 0
```

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-04-02 17:42:57 +02:00
Sebastiaan van Stijn
0f32beb4f8
libcontainerd: remove unused consts
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-03-19 21:52:23 +01:00
Sebastiaan van Stijn
9637be0e9d
libcontainerd: remove unused win32 errors (leftover from TP4)
These were added in 94d70d8355 for Windows TP4,
but no longer used after 331c8a86d4 removed
support for TP4.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-03-19 21:52:21 +01:00
Cam
80a5df9c49
Added container ID to containerd task delete event messages
Signed-off-by: Cam <gh@sparr.email>
2020-10-30 20:58:57 -07:00
Tibor Vass
cf867587b9
Merge pull request #41527 from thaJeztah/no_oom_score_adj
daemon: don't adjust oom-score if score is 0
2020-10-15 15:00:18 -07:00
Brian Goff
f14aea63c9 "Fix" checkpoint on v2 runtime
Checkpoint/Restore is horribly broken all around.
But on the, now default, v2 runtime it's even more broken.

This at least makes checkpoint equally broken on both runtimes.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-10-12 22:35:37 +00:00
Sebastiaan van Stijn
cf7a5be0f2
daemon: don't adjust oom-score if score is 0
This patch makes two changes if --oom-score-adj is set to 0

- do not adjust the oom-score-adjust cgroup for dockerd
- do not set the hard-coded -999 score for containerd if
  containerd is running as child process

Before this change:

oom-score-adj | dockerd       | containerd as child-process
--------------|---------------|----------------------------
-             | -500          | -500 (same as dockerd)
-100          | -100          | -100 (same as dockerd)
 0            |  0            | -999 (hard-coded default)

With this change:

oom-score-adj | dockerd       | containerd as child-process
--------------|---------------|----------------------------
-             | -500          | -500 (same as dockerd)
-100          | -100          | -100 (same as dockerd)
0             | not adjusted  | not adjusted

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-10-05 19:52:02 +02:00
Brian Goff
906007f6c1 libcontainerd: use cancellable context for events
The event subscriber can only be cancelled by cancelling the context.
In the case where we have to restart event processing we are never
cancelling the old subscribiption.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-08-12 17:09:21 +00:00
Brian Goff
60d7265803 Use IsServing to determine if c8d client is ready
Instead of sleeping an arbitrary amount of time, using the client to
tell us when it's ready so we can start processing events sooner.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-08-12 17:09:21 +00:00
Sebastiaan van Stijn
bf7fd015f7
Remove unused useShimV2()
This function was removed in the Linux code as part of
f63f73a4a8, but was not removed in
the Windows code.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-07-15 14:28:48 +02:00
Brian Goff
f63f73a4a8 Configure shims from runtime config
In dockerd we already have a concept of a "runtime", which specifies the
OCI runtime to use (e.g. runc).
This PR extends that config to add containerd shim configuration.
This option is only exposed within the daemon itself (cannot be
configured in daemon.json).
This is due to issues in supporting unknown shims which will require
more design work.

What this change allows us to do is keep all the runtime config in one
place.

So the default "runc" runtime will just have it's already existing shim
config codified within the runtime config alone.
I've also added 2 more "stock" runtimes which are basically runc+shimv1
and runc+shimv2.
These new runtime configurations are:

- io.containerd.runtime.v1.linux - runc + v1 shim using the V1 shim API
- io.containerd.runc.v2 - runc + shim v2

These names coincide with the actual names of the containerd shims.

This allows the user to essentially control what shim is going to be
used by either specifying these as a `--runtime` on container create or
by setting `--default-runtime` on the daemon.

For custom/user-specified runtimes, the default shim config (currently
shim v1) is used.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-07-13 14:18:02 -07:00
Akihiro Suda
3802830989 cgroup2: implement docker stats
The following fields are unsupported:

* BlkioStats: all fields other than IoServiceBytesRecursive
* CPUStats: CPUUsage.PercpuUsage
* MemoryStats: MaxUsage and Failcnt

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-04-02 17:51:34 +09:00
Sebastiaan van Stijn
9f0b3f5609
bump gotest.tools v3.0.1 for compatibility with Go 1.14
full diff: https://github.com/gotestyourself/gotest.tools/compare/v2.3.0...v3.0.1

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-02-11 00:06:42 +01:00
Akihiro Suda
612343618d cgroup2: use shim V2
* Requires containerd binaries from containerd/containerd#3799 . Metrics are unimplemented yet.
* Works with crun v0.10.4, but `--security-opt seccomp=unconfined` is needed unless using master version of libseccomp
  ( containers/crun#156, seccomp/libseccomp#177 )
* Doesn't work with master runc yet
* Resource limitations are unimplemented

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-01-01 02:58:40 +09:00
Brian Goff
36cf709abd
Merge pull request #40146 from thaJeztah/move_hcsshim
libcontainerd: move hcsshim import to windows-only file
2019-12-19 11:55:24 -08:00
Sebastiaan van Stijn
5bb4f4818b
libcontainerd: move hcsshim import to windows-only file
This reduces the dependency-graph when building packages for
Linux only.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-12-10 10:58:14 +01:00
Sebastiaan van Stijn
d29f420424
libcontainerd: normalize comment formatting
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-11-27 15:44:10 +01:00
Sebastiaan van Stijn
9a7e96b5b7
Rename "v1" to "statsV1"
follow-up to 27552ceb15, where this
was left as a review comment, but the PR was already merged.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-11-01 16:18:06 +01:00
Sebastiaan van Stijn
27552ceb15
bump containerd/cgroups 5fbad35c2a7e855762d3c60f2e474ffcad0d470a
full diff: c4b9ac5c76...5fbad35c2a

- containerd/cgroups#82 Add go module support
- containerd/cgroups#96 Move metrics proto package to stats/v1
- containerd/cgroups#97 Allow overriding the default /proc folder in blkioController
- containerd/cgroups#98 Allows ignoring memory modules
- containerd/cgroups#99 Add Go 1.13 to Travis
- containerd/cgroups#100 stats/v1: export per-cgroup stats

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-10-31 01:09:12 +01:00
Sebastiaan van Stijn
6b91ceff74
Use hcsshim osversion package for Windows versions
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-10-22 02:53:00 +02:00
Brian Goff
bef73d8b07 Wait for c8d process exit instead of polling API
In the containerd supervisor, instead of polling the healthcheck API
every 500 milliseconds we can just wait for the process to exit.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2019-10-16 12:23:10 -07:00
Akihiro Suda
de5a67156b
Merge pull request #39082 from ehazlett/opts-for-create
Add NewContainerOpts to libcontainerd.Create
2019-10-04 08:20:47 +09:00
Evan Hazlett
35ac4be5d5 add NewContainerOpts to libcontainerd.Create
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
2019-10-03 11:45:41 -04:00
John Howard
8988448729 Remove refs to jhowardmsft from .go code
Signed-off-by: John Howard <jhoward@microsoft.com>
2019-09-25 10:51:18 -07:00
Sebastiaan van Stijn
07ff4f1de8
goimports: fix imports
Format the source according to latest goimports.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-09-18 12:56:54 +02:00
Sebastiaan van Stijn
e554ab5589
Allow system.MkDirAll() to be used as drop-in for os.MkDirAll()
also renamed the non-windows variant of this file to be
consistent with other files in this package

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-08-08 15:05:49 +02:00
Brian Goff
1acaf2aabe Sleep before restarting event processing
This prevents restarting event processing in a tight loop.
You can see this with the following steps:

```terminal
$ containerd &
$ dockerd --containerd=/run/containerd/containerd.sock &
$ pkill -9 containerd
```

At this point you will be spammed with logs such as:

```
ERRO[2019-07-12T22:29:37.318761400Z] failed to get event                           error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=plugins.moby
```

Without this change you can quickly end up with gigabytes of log data.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2019-07-12 15:42:19 -07:00
Michael Crosby
b5f28865ef Handle blocked I/O of exec'd processes
This is the second part to
https://github.com/containerd/containerd/pull/3361 and will help process
delete not block forever when the process exists but the I/O was
inherited by a subprocess that lives on.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-06-21 12:02:15 -04:00
Sebastiaan van Stijn
c85fe2d224
Merge pull request #38522 from cpuguy83/fix_timers
Make sure timers are stopped after use.
2019-06-07 13:16:46 +02:00
Sebastiaan van Stijn
539e72f75b
Fix typo retreive -> retrieve
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-06-04 17:33:04 +02:00
Sebastiaan van Stijn
c030885e7a
Windows: fix error-type for starting a running container
Trying to start a container that is already running is not an
error condition, so a `304 Not Modified` should be returned instead
of a `409 Conflict`.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-05-22 13:27:55 +02:00