The `--log-level` flag overrides whatever is in the containerd configuration file;
f033f6ff85/cmd/containerd/command/main.go (L339-L352)
Given that we set that flag when we start the containerd binary, there is no need
to write it both to the generated config-file and pass it as flag.
This patch also slightly changes the behavior; as both dockerd and containerd use
"info" as default log-level, don't set the log-level if it's the default.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
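For the `--log-level` change above, a minimal sketch of the argument handling could look as follows. The `containerdArgs` helper and the `defaultLogLevel` constant are hypothetical names used for illustration only; they are not taken from the actual supervisor code.

```go
package main

import "fmt"

// defaultLogLevel is the level that both dockerd and containerd are described
// as using by default (assumption: named constant invented for this sketch).
const defaultLogLevel = "info"

// containerdArgs is a hypothetical helper. It builds the argument list for the
// containerd child process and only appends --log-level when a non-default
// level was requested, so nothing is passed for "info".
func containerdArgs(configFile, logLevel string) []string {
	args := []string{"--config", configFile}
	if logLevel != "" && logLevel != defaultLogLevel {
		args = append(args, "--log-level", logLevel)
	}
	return args
}

func main() {
	const cfg = "/var/run/docker/containerd/containerd.toml"
	fmt.Println(containerdArgs(cfg, "info"))
	fmt.Println(containerdArgs(cfg, "debug"))
}
```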
Adding a remote.configFile to store the location instead of re-constructing its
location each time. Also fixing a minor inconsistency in the error formats.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Adding a remote.pidFile to store the location instead of re-constructing its
location each time. Also performing a small refactor to use `strconv.Itoa`
instead of `fmt.Sprintf`.
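As a rough sketch of the idea (not the actual supervisor code): the path is computed once when the struct is created, and the pid is converted with `strconv.Itoa`. The trimmed-down `remote` type, the `containerd.pid` file name, and the `roundTripPid` helper are assumptions made for this example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// remote is a hypothetical, trimmed-down stand-in for the supervisor type.
// pidFile is computed once and reused instead of being re-constructed.
type remote struct {
	stateDir string
	pidFile  string
}

func newRemote(stateDir string) *remote {
	return &remote{
		stateDir: stateDir,
		// Assumption: the file name is illustrative only.
		pidFile: filepath.Join(stateDir, "containerd.pid"),
	}
}

// writePid stores the pid using strconv.Itoa rather than fmt.Sprintf("%d").
func (r *remote) writePid(pid int) error {
	return os.WriteFile(r.pidFile, []byte(strconv.Itoa(pid)), 0o660)
}

// readPid reads the pid back from the stored location.
func (r *remote) readPid() (int, error) {
	b, err := os.ReadFile(r.pidFile)
	if err != nil {
		return 0, err
	}
	return strconv.Atoi(strings.TrimSpace(string(b)))
}

// roundTripPid writes and reads a pid in a temporary state directory.
func roundTripPid(pid int) (int, error) {
	dir, err := os.MkdirTemp("", "supervisor")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	r := newRemote(dir)
	if err := r.writePid(pid); err != nil {
		return 0, err
	}
	return r.readPid()
}

func main() {
	pid, err := roundTripPid(os.Getpid())
	fmt.Println(pid, err)
}
```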
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Containerd, like dockerd, has an OOMScore configuration option to adjust its own
OOM score. In dockerd, this option was added when default installations were not
yet running the daemon as a systemd unit, which made it more complicated to set
the score, and adding a daemon option was convenient.
A binary adjusting its own score has been frowned upon, as it's more logical to
make that the responsibility of the process manager _starting_ the daemon, which
is what we did for dockerd in 21578530d7.
There have been discussions on deprecating the daemon flag for dockerd, and
similar discussions have been happening for containerd.
This patch changes how we set the OOM score for the containerd child process,
and to have dockerd (supervisor) set the OOM score, as it's acting as process
manager in this case (performing a role similar to systemd otherwise).
With this patch, the score is still adjusted as usual, but not written to the
containerd configuration file;
dockerd --oom-score-adjust=-123
cat /proc/$(pidof containerd)/oom_score_adj
-123
As a follow-up, we may consider adjusting the containerd OOM score based on the
daemon's own score instead of on the `cli.OOMScoreAdjust` configuration so that
we will also adjust the score in situations where dockerd's OOM score was set
through other ways (systemd or manually adjusting the cgroup). A TODO was added
for this.
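A minimal sketch of a process manager setting the score of a child, assuming the standard Linux interface of writing to `/proc/<pid>/oom_score_adj`. The `setOOMScore` and `demoOOMScore` names are hypothetical, and the demo writes into a temporary directory standing in for `/proc` so that it does not need privileges.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

// setOOMScore writes score to <procRoot>/<pid>/oom_score_adj.
// On a real Linux host procRoot would be "/proc"; it is a parameter here
// only so the sketch can be exercised without touching a real process.
func setOOMScore(procRoot string, pid, score int) error {
	f := filepath.Join(procRoot, strconv.Itoa(pid), "oom_score_adj")
	return os.WriteFile(f, []byte(strconv.Itoa(score)), 0o644)
}

// demoOOMScore runs setOOMScore against a temporary directory that stands in
// for /proc and returns what ended up in the file.
func demoOOMScore() (string, error) {
	root, err := os.MkdirTemp("", "fakeproc")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(root)

	const pid = 4242 // arbitrary pid for the fake /proc entry
	if err := os.Mkdir(filepath.Join(root, strconv.Itoa(pid)), 0o755); err != nil {
		return "", err
	}
	if err := setOOMScore(root, pid, -123); err != nil {
		return "", err
	}
	b, err := os.ReadFile(filepath.Join(root, strconv.Itoa(pid), "oom_score_adj"))
	return string(b), err
}

func main() {
	got, err := demoOOMScore()
	fmt.Println(got, err)
}
```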
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Consider Address() (Config.GRPC.Address) to be the source of truth for
the location of the containerd socket.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This RWMutex was added in 9c4570a958, and used in
the `remote.Client()` method. Commit dd2e19ebd5
split the code for client and daemon, but did not remove the mutex.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The existing implementation used a `nil` value for the CRI plugin's configuration
to indicate that the plugin had to be disabled. Effectively, the `Plugins` value
was only used as an intermediate step, only to be removed later on, and to instead
add the given plugin to `DisabledPlugins` in the containerd configuration.
This patch removes the intermediate step; as a result we also don't need to mask
the containerd `Plugins` field, which was added to allow serializing the toml.
A code comment was added as well to explain why we're (currently) disabling the
CRI plugin by default, which may help future visitors of the code to determine
if that default is still needed.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This removes the `WithRemoteAddr()`, `WithRemoteAddrUser()`, `WithDebugAddress()`,
and `WithMetricsAddress()` options, which were added in ddae20c032.
Most of them were never used, and `WithRemoteAddr()` has not been in use since
dd2e19ebd5.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The correct formatting for machine-readable comments is;
//<some alphanumeric identifier>:<options>[,<option>...][ // comment]
Which basically means:
- MUST NOT have a space before `<identifier>` (e.g. `nolint`)
- Identifier MUST be alphanumeric
- MUST be followed by a colon
- MUST be followed by at least one `<option>`
- Optionally additional `<options>` (comma-separated)
- Optionally followed by a comment
Any other format will not be considered a machine-readable comment by `gofmt`,
and thus formatted as a regular comment. Note that this also means that a
`//nolint` (without anything after it) is considered invalid, same for `//#nosec`
(starts with a `#`).
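The rules listed above can be approximated with a regular expression. This is only a sketch of the format as described in this message, not `gofmt`'s actual implementation; in particular, treating an option as any run of characters without whitespace or commas is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

// directiveRe encodes the rules from the list above:
//   - "//" immediately followed by an alphanumeric identifier (no space)
//   - a colon
//   - at least one option, optionally more, comma-separated
//   - optionally a trailing " // comment"
//
// Assumption: an option is any run of characters other than whitespace or commas.
var directiveRe = regexp.MustCompile(`^//[A-Za-z0-9]+:[^\s,]+(,[^\s,]+)*( // .*)?$`)

// isDirective reports whether a comment line matches the format above.
func isDirective(comment string) bool {
	return directiveRe.MatchString(comment)
}

func main() {
	for _, c := range []string{
		"//nolint:gosec",
		"//nolint:gosec,errcheck // reason",
		"// nolint:gosec",
		"//nolint",
		"//#nosec",
	} {
		fmt.Printf("%-40q %v\n", c, isDirective(c))
	}
}
```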
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Older versions of Go don't format comments, so committing this as
a separate commit, so that we can already make these changes before
we upgrade to Go 1.19.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This allows the postContainersKill() handler to pass values as-is. As part of
the rewrite, I also moved the daemon.GetContainer(name) call later in the
function, so that we can fail early if an invalid signal is passed, before
doing the (heavier) fetching of the container.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Arbitrary here does not include ''; it is best to catch that one early, as it's
almost certainly a mistake (possibly an attempt to pass a POSIX path
through this API).
Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
Windows Server 2016 (RS1) reached end of support, and Docker Desktop requires
Windows 10 V19H2 (version 1909, build 18363) as a minimum.
This patch makes Windows Server RS5 / ltsc2019 (build 17763) the minimum version
to run the daemon, and removes some hacks for older versions of Windows.
There is one check remaining that checks for Windows RS3 for a workaround
on older versions, but recent changes in Windows seem to have regressed
on the same issue, so I kept that code for now to check if we may need that
workaround (again);
085c6a98d5/daemon/graphdriver/windows/windows.go (L319-L341)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The io/ioutil package has been deprecated in Go 1.16. This commit
replaces the existing io/ioutil functions with their new definitions in
io and os packages.
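For illustration, a small self-contained example of the replacements (`ioutil.TempDir` to `os.MkdirTemp`, `ioutil.WriteFile` to `os.WriteFile`, `ioutil.ReadAll` to `io.ReadAll`). The `roundTrip` helper is hypothetical and not part of this commit.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// roundTrip writes content to a file in a temporary directory and reads it
// back, using only the io and os replacements for the io/ioutil functions.
func roundTrip(content string) (string, error) {
	dir, err := os.MkdirTemp("", "example") // was ioutil.TempDir
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	name := filepath.Join(dir, "data.txt")
	if err := os.WriteFile(name, []byte(content), 0o600); err != nil { // was ioutil.WriteFile
		return "", err
	}

	f, err := os.Open(name)
	if err != nil {
		return "", err
	}
	defer f.Close()

	b, err := io.ReadAll(f) // was ioutil.ReadAll
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	got, err := roundTrip("hello")
	fmt.Println(got, err)
}
```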
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
Taking the same approach as was taken in containerd
The new library has a slightly different output;
- keys at the same level are sorted alphabetically
- empty sections are not omitted (`proxy_plugins`, `stream_processors`, `timeouts`),
which could possibly be addressed with an "omitempty" in containerd's struct.
- empty slices are not omitted (`imports`, `required_plugins`)
After sorting the "before" configuration the diff looks like this:
```patch
diff --git a/config-before-sorted.toml b/config-after.toml
index cc771ce7ab..43a727f589 100644
--- a/config-before-sorted.toml
+++ b/config-after.toml
@@ -1,6 +1,8 @@
disabled_plugins = ["cri"]
+imports = []
oom_score = 0
plugin_dir = ""
+required_plugins = []
root = "/var/lib/docker/containerd/daemon"
state = "/var/run/docker/containerd/daemon"
version = 0
@@ -37,6 +39,12 @@ version = 0
shim = "containerd-shim"
shim_debug = true
+[proxy_plugins]
+
+[stream_processors]
+
+[timeouts]
+
[ttrpc]
address = ""
gid = 0
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
These were added in 94d70d8355 for Windows TP4,
but no longer used after 331c8a86d4 removed
support for TP4.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Checkpoint/Restore is horribly broken all around.
But on the, now default, v2 runtime it's even more broken.
This at least makes checkpoint equally broken on both runtimes.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This patch makes two changes if `--oom-score-adjust` is set to 0:
- do not adjust the oom-score-adjust cgroup for dockerd
- do not set the hard-coded -999 score for containerd if
containerd is running as child process
Before this change:
oom-score-adj | dockerd | containerd as child-process
--------------|---------------|----------------------------
- | -500 | -500 (same as dockerd)
-100 | -100 | -100 (same as dockerd)
0 | 0 | -999 (hard-coded default)
With this change:
oom-score-adj | dockerd | containerd as child-process
--------------|---------------|----------------------------
- | -500 | -500 (same as dockerd)
-100 | -100 | -100 (same as dockerd)
0 | not adjusted | not adjusted
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The event subscriber can only be cancelled by cancelling the context.
In the case where we have to restart event processing, we never
cancel the old subscription.
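A minimal sketch of the pattern, assuming a hypothetical `processor` type that remembers the cancel function of the current subscription and calls it before starting a new one. The names are illustrative and the sketch is not safe for concurrent calls to `restart`.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// processor is a hypothetical stand-in for the event-processing client.
// It keeps the cancel function of the subscription that is currently active.
type processor struct {
	cancel context.CancelFunc
}

// restart cancels the previous subscription (if any) before starting a new
// one, so that old subscribers do not pile up across restarts.
func (p *processor) restart(parent context.Context, subscribe func(context.Context)) {
	if p.cancel != nil {
		p.cancel()
	}
	ctx, cancel := context.WithCancel(parent)
	p.cancel = cancel
	go subscribe(ctx)
}

// stop cancels the active subscription.
func (p *processor) stop() {
	if p.cancel != nil {
		p.cancel()
		p.cancel = nil
	}
}

// demoRestart reports whether the first subscription observed cancellation
// after a second one was started.
func demoRestart() bool {
	var p processor
	defer p.stop()

	firstDone := make(chan struct{})
	p.restart(context.Background(), func(ctx context.Context) {
		<-ctx.Done()
		close(firstDone)
	})
	p.restart(context.Background(), func(ctx context.Context) { <-ctx.Done() })

	t := time.NewTimer(time.Second)
	defer t.Stop()
	select {
	case <-firstDone:
		return true
	case <-t.C:
		return false
	}
}

func main() {
	fmt.Println("old subscription cancelled:", demoRestart())
}
```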
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Instead of sleeping an arbitrary amount of time, use the client to
tell us when it's ready so we can start processing events sooner.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This function was removed in the Linux code as part of
f63f73a4a8, but was not removed in
the Windows code.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
In dockerd we already have a concept of a "runtime", which specifies the
OCI runtime to use (e.g. runc).
This PR extends that config to add containerd shim configuration.
This option is only exposed within the daemon itself (cannot be
configured in daemon.json).
This is due to issues in supporting unknown shims which will require
more design work.
What this change allows us to do is keep all the runtime config in one
place.
So the default "runc" runtime will just have its already existing shim
config codified within the runtime config alone.
I've also added 2 more "stock" runtimes which are basically runc+shimv1
and runc+shimv2.
These new runtime configurations are:
- io.containerd.runtime.v1.linux - runc + v1 shim using the V1 shim API
- io.containerd.runc.v2 - runc + shim v2
These names coincide with the actual names of the containerd shims.
This allows the user to essentially control what shim is going to be
used by either specifying these as a `--runtime` on container create or
by setting `--default-runtime` on the daemon.
For custom/user-specified runtimes, the default shim config (currently
shim v1) is used.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
The following fields are unsupported:
* BlkioStats: all fields other than IoServiceBytesRecursive
* CPUStats: CPUUsage.PercpuUsage
* MemoryStats: MaxUsage and Failcnt
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
* Requires containerd binaries from containerd/containerd#3799. Metrics are not implemented yet.
* Works with crun v0.10.4, but `--security-opt seccomp=unconfined` is needed unless using the master version of libseccomp
(containers/crun#156, seccomp/libseccomp#177)
* Doesn't work with master runc yet
* Resource limitations are unimplemented
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
In the containerd supervisor, instead of polling the healthcheck API
every 500 milliseconds we can just wait for the process to exit.
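A sketch of waiting for exit instead of polling. The `waiter` interface and the `fakeProcess` type are assumptions made so the example runs anywhere; `*exec.Cmd` from `os/exec` has a matching `Wait() error` method and could be passed in the same way.

```go
package main

import (
	"fmt"
	"time"
)

// waiter is anything that blocks until a process has exited.
// *exec.Cmd satisfies this interface through its Wait method.
type waiter interface {
	Wait() error
}

// exited returns a channel that receives the result of Wait once the process
// has exited. A supervisor can select on it instead of polling a health
// endpoint on a fixed interval.
func exited(w waiter) <-chan error {
	ch := make(chan error, 1) // buffered so the goroutine never blocks
	go func() { ch <- w.Wait() }()
	return ch
}

// fakeProcess is a hypothetical stand-in for a child process that exits
// after the given duration.
type fakeProcess struct{ runFor time.Duration }

func (f fakeProcess) Wait() error {
	time.Sleep(f.runFor)
	return nil
}

func main() {
	start := time.Now()
	err := <-exited(fakeProcess{runFor: 10 * time.Millisecond})
	fmt.Println("process exited, err:", err, "after", time.Since(start).Round(time.Millisecond))
}
```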
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Format the source according to latest goimports.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
also renamed the non-windows variant of this file to be
consistent with other files in this package
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This prevents restarting event processing in a tight loop.
You can see this with the following steps:
```terminal
$ containerd &
$ dockerd --containerd=/run/containerd/containerd.sock &
$ pkill -9 containerd
```
At this point you will be spammed with logs such as:
```
ERRO[2019-07-12T22:29:37.318761400Z] failed to get event error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=plugins.moby
```
Without this change you can quickly end up with gigabytes of log data.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This is the second part to
https://github.com/containerd/containerd/pull/3361 and will help process
delete not block forever when the process exists but the I/O was
inherited by a subprocess that lives on.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Trying to start a container that is already running is not an
error condition, so a `304 Not Modified` should be returned instead
of a `409 Conflict`.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: John Howard <jhoward@microsoft.com>
Fixes #38719
Fixes some subtle bugs on Windows
- Fixes https://github.com/moby/moby/issues/38719. This one is the most important
as failure to start the init process in a Windows container will cause leaked
handles. (ie where the `ctr.hcsContainer.CreateProcess(...)` call fails).
The solution to the leak is to split out the `reapContainer` part of `reapProcess`
into a separate function. This ensures HCS resources are cleaned up correctly and
not leaked.
- Ensuring the reapProcess goroutine is started immediately after the process
is actually started, so we don't leak in the case of failures such as
from `newIOFromProcess` or `attachStdio`
- libcontainerd on Windows (local, not containerd) was not sending the EventCreate
back to the monitor on Windows. Just LCOW. This was just an oversight from
refactoring a couple of years ago by Mikael as far as I can tell. Technically
not needed for functionality except for the logging being missing, but is correct.
Signed-off-by: John Howard <jhoward@microsoft.com>
Also fixes https://github.com/moby/moby/issues/22874
This commit is a pre-requisite to moving moby/moby on Windows to using
Containerd for its runtime.
The reason for this is that the interface between moby and containerd
for the runtime is an OCI spec which must be unambiguous.
It is the responsibility of the runtime (runhcs in the case of
containerd on Windows) to ensure that arguments are escaped prior
to calling into HCS and onwards to the Win32 CreateProcess call.
Previously, the builder was always escaping arguments which has
led to several bugs in moby. Because the local runtime in
libcontainerd had context of whether or not arguments were escaped,
it was possible to hack around in daemon/oci_windows.go with
knowledge of the context of the call (from builder or not).
With a remote runtime, this is not possible as there's rightly
no context of the caller passed across in the OCI spec. Put another
way, as I put above, the OCI spec must be unambiguous.
The other previous limitation (which leads to various subtle bugs)
is that moby is coded entirely from a Linux-centric point of view.
Unfortunately, Windows != Linux. Windows CreateProcess uses a
command line, not an array of arguments. And it has very specific
rules about how to escape a command line. Some interesting reading
links about this are:
https://blogs.msdn.microsoft.com/twistylittlepassagesallalike/2011/04/23/everyone-quotes-command-line-arguments-the-wrong-way/
https://stackoverflow.com/questions/31838469/how-do-i-convert-argv-to-lpcommandline-parameter-of-createprocess
https://docs.microsoft.com/en-us/cpp/cpp/parsing-cpp-command-line-arguments?view=vs-2017
For this reason, the OCI spec has recently been updated to cater
for more natural syntax by including a CommandLine option in
Process.
What does this commit do?
Primary objective is to ensure that the built OCI spec is unambiguous.
It changes the builder so that `ArgsEscaped` as committed in a
layer is only controlled by the use of CMD or ENTRYPOINT.
Subsequently, when calling in to create a container from the builder,
it follows a different path to both `docker run` and `docker create`
using the added `ContainerCreateIgnoreImagesArgsEscaped`. This allows
a RUN from the builder to control how to escape in the OCI spec.
It changes the builder so that when shell form is used for RUN,
CMD or ENTRYPOINT, it builds (for WCOW) a more natural command line
using the original as put by the user in the dockerfile, not
the parsed version as a set of args which loses fidelity.
This command line is put into args[0] and `ArgsEscaped` is set
to true for CMD or ENTRYPOINT. A RUN statement does not commit
`ArgsEscaped` to the committed layer regardless of whether shell
or exec form were used.
Signed-off-by: John Howard <jhoward@microsoft.com>
This is the first step in refactoring moby (dockerd) to use containerd on Windows.
Similar to the current model in Linux, this adds the option to enable it for runtime.
It does not switch the graphdriver to containerd snapshotters.
- Refactors libcontainerd to a series of subpackages so that either a
"local" containerd (1) or a "remote" (2) containerd can be loaded as opposed
to conditional compile as "local" for Windows and "remote" for Linux.
- Updates libcontainerd such that Windows has an option to allow the use of a
"remote" containerd. Here, it communicates over a named pipe using GRPC.
This is currently guarded behind the experimental flag, an environment variable,
and the providing of a pipename to connect to containerd.
- Infrastructure pieces such as under pkg/system to have helper functions for
determining whether containerd is being used.
(1) "local" containerd is what the daemon on Windows has used since inception.
It's not really containerd at all - it's simply local invocation of HCS APIs
directly in-process from the daemon through the Microsoft/hcsshim library.
(2) "remote" containerd is what docker on Linux uses for its runtime. It means
that there is a separate containerd service running, and docker communicates over
GRPC to it.
To try this out, you will need to start with something like the following:
Window 1:
containerd --log-level debug
Window 2:
$env:DOCKER_WINDOWS_CONTAINERD=1
dockerd --experimental -D --containerd \\.\pipe\containerd-containerd
You will need the following binary from github.com/containerd/containerd in your path:
- containerd.exe
You will need the following binaries from github.com/Microsoft/hcsshim in your path:
- runhcs.exe
- containerd-shim-runhcs-v1.exe
For LCOW, it will require an initrd.img and kernel in `C:\Program Files\Linux Containers`.
This is no different to the current requirements. However, you may need updated binaries,
particularly initrd.img built from Microsoft/opengcs as (at the time of writing), Linuxkit
binaries are somewhat out of date.
Note that containerd and hcsshim for HCS v2 APIs do not yet support all the required
functionality needed for docker. This will come in time - this is a baby (although large)
step to migrating Docker on Windows to containerd.
Note that the HCS v2 APIs are only called on RS5+ builds. RS1..RS4 will still use
HCS v1 APIs as the v2 APIs were not developed enough on these builds to be usable.
This abstraction is done in HCSShim. (Referring specifically to runtime)
Note the LCOW graphdriver still uses HCS v1 APIs regardless.
Note also that this does not migrate docker to use containerd snapshotters
rather than graphdrivers. This needs to be done in conjunction with Linux also
doing the same switch.
`time.After` keeps a timer running until the specified duration is
completed. It also allocates a new timer on each call. This can wind up
leaving lots of unnecessary timers running in the background that
consume resources.
Instead of `time.After`, use `time.NewTimer` so the timer can actually
be stopped.
In some of these cases it's not a big deal since the duration is really
short, but in others it is much worse.
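A minimal example of the pattern: the timer is created with `time.NewTimer` and stopped as soon as the function returns, whichever branch wins. The `waitOrTimeout` helper is a hypothetical name used for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// waitOrTimeout waits for done to be closed (or to receive a value) for at
// most d. It reports true if done fired first and false on timeout.
// Unlike a bare time.After(d), the timer is stopped on return.
func waitOrTimeout(done <-chan struct{}, d time.Duration) bool {
	t := time.NewTimer(d)
	defer t.Stop()

	select {
	case <-done:
		return true
	case <-t.C:
		return false
	}
}

func main() {
	done := make(chan struct{})
	close(done)
	fmt.Println(waitOrTimeout(done, time.Hour))

	never := make(chan struct{})
	fmt.Println(waitOrTimeout(never, 10*time.Millisecond))
}
```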
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Implements the --device forwarding for Windows daemons. This maps the physical
device into the container at runtime.
Ex:
docker run --device="class/<clsid>" <image> <cmd>
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
The stdin fifo of the exec process is created on the containerd side after
the client calls Start. If the client calls CloseIO before Start, the
stdin of the exec process is still open and waits to be closed.
For this case, the client closes the stdinCloseSync channel after Start.
Signed-off-by: Wei Fu <fuweid89@gmail.com>
This should eliminate a bunch of new (go-1.11 related) validation
errors telling that the code is not formatted with `gofmt -s`.
No functional change, just whitespace (i.e.
`git show --ignore-space-change` shows nothing).
Patch generated with:
> git ls-files | grep -v ^vendor/ | grep .go$ | xargs gofmt -s -w
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This implements chown support on Windows. Built-in accounts as well
as accounts included in the SAM database of the container are supported.
NOTE: IDPair is now named Identity and IDMappings is now named
IdentityMapping.
The following are valid examples:
ADD --chown=Guest . <some directory>
COPY --chown=Administrator . <some directory>
COPY --chown=Guests . <some directory>
COPY --chown=ContainerUser . <some directory>
On Windows an owner is only granted the permission to read the security
descriptor and read/write the discretionary access control list. This
fix also grants read/write and execute permissions to the owner.
Signed-off-by: Salahuddin Khan <salah@docker.com>
Adds a supervisor package for starting and monitoring containerd.
Separates grpc connection allowing access from daemon.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
Previously, dockerd would always ask containerd to pass --leave-running
to runc/runsc, ignoring the exit boolean value. Hence, even `docker
checkpoint create --leave-running=false ...` would not stop the
container.
Signed-off-by: Brielle Broder <bbroder@google.com>
Disable the CRI plugin by default in containerd and
allow an option to enable the plugin. This only
has an effect on containerd when supervised by
dockerd. When containerd is managed outside of
dockerd, the configuration is not affected.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
dockerd allows the `--log-level` to be specified, but this log-level
was not forwarded to the containerd process.
This patch sets containerd's log-level to the same as dockerd if a
custom level is provided.
Now that `--log-level` is also passed to containerd, the default "info"
is removed, so that containerd's default (or the level configured in containerd.toml)
is still used if no log-level is set.
Before this change:
containerd would always be started without a log-level set (only the level that's configured in `containerd.toml`);
```
root 1014 2.5 2.1 496484 43468 pts/0 Sl+ 12:23 0:00 dockerd
root 1023 1.2 1.1 681768 23832 ? Ssl 12:23 0:00 \_ docker-containerd --config /var/run/docker/containerd/containerd.toml
```
After this change:
when running `dockerd` without options (same as current);
```
root 1014 2.5 2.1 496484 43468 pts/0 Sl+ 12:23 0:00 dockerd
root 1023 1.2 1.1 681768 23832 ? Ssl 12:23 0:00 \_ docker-containerd --config /var/run/docker/containerd/containerd.toml
```
when running `dockerd --debug`:
```
root 600 0.8 2.1 512876 43180 pts/0 Sl+ 12:20 0:00 dockerd --debug
root 608 0.6 1.1 624428 23672 ? Ssl 12:20 0:00 \_ docker-containerd --config /var/run/docker/containerd/containerd.toml --log-level debug
```
when running `dockerd --log-level=panic`
```
root 747 0.6 2.1 496548 43996 pts/0 Sl+ 12:21 0:00 dockerd --log-level=panic
root 755 0.7 1.1 550696 24100 ? Ssl 12:21 0:00 \_ docker-containerd --config /var/run/docker/containerd/containerd.toml --log-level panic
```
combining `--debug` and `--log-level` (`--debug` takes precedence):
```
root 880 2.7 2.1 634692 43336 pts/0 Sl+ 12:23 0:00 dockerd --debug --log-level=panic
root 888 1.0 1.1 616232 23652 ? Ssl 12:23 0:00 \_ docker-containerd --config /var/run/docker/containerd/containerd.toml --log-level debug
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This unblocks the client to take other restore requests and makes sure
that a long/stuck request can't block the client forever.
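One way to bound such a request is a context with a deadline. The sketch below is an assumption about the general approach, not the code of this change; `restoreWithTimeout` and the demo helpers are hypothetical names.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// restoreWithTimeout runs restore in a goroutine and returns either its result
// or the context error once d has elapsed, so the caller is never blocked
// longer than d even if restore itself is stuck.
func restoreWithTimeout(parent context.Context, d time.Duration, restore func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(parent, d)
	defer cancel()

	errCh := make(chan error, 1) // buffered so a late result does not leak the goroutine
	go func() { errCh <- restore(ctx) }()

	select {
	case err := <-errCh:
		return err
	case <-ctx.Done():
		return ctx.Err()
	}
}

// demoStuckRestore simulates a request that hangs and reports whether the
// caller was released with a deadline error.
func demoStuckRestore() bool {
	release := make(chan struct{})
	defer close(release) // let the simulated request finish afterwards

	err := restoreWithTimeout(context.Background(), 20*time.Millisecond, func(context.Context) error {
		<-release
		return nil
	})
	return errors.Is(err, context.DeadlineExceeded)
}

// demoFastRestore simulates a request that completes well within the timeout.
func demoFastRestore() bool {
	err := restoreWithTimeout(context.Background(), time.Second, func(context.Context) error {
		return nil
	})
	return err == nil
}

func main() {
	fmt.Println("stuck request timed out:", demoStuckRestore())
	fmt.Println("fast request succeeded:", demoFastRestore())
}
```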
Signed-off-by: Brian Goff <cpuguy83@gmail.com>