Format the source according to latest goimports.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
also renamed the non-windows variant of this file to be
consistent with other files in this package
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This prevents restarting event processing in a tight loop.
You can see this with the following steps:
```terminal
$ containerd &
$ dockerd --containerd=/run/containerd/containerd.sock &
$ pkill -9 containerd
```
At this point you will be spammed with logs such as:
```
ERRO[2019-07-12T22:29:37.318761400Z] failed to get event error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=plugins.moby
```
Without this change you can quickly end up with gigabytes of log data.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This is the second part to
https://github.com/containerd/containerd/pull/3361 and will help process
delete not block forever when the process exists but the I/O was
inherited by a subprocess that lives on.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Trying to start a container that is already running is not an
error condition, so a `304 Not Modified` should be returned instead
of a `409 Conflict`.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: John Howard <jhoward@microsoft.com>
Fixes#38719
Fixes some subtle bugs on Windows
- Fixes https://github.com/moby/moby/issues/38719. This one is the most important
as failure to start the init process in a Windows container will cause leaked
handles. (ie where the `ctr.hcsContainer.CreateProcess(...)` call fails).
The solution to the leak is to split out the `reapContainer` part of `reapProcess`
into a separate function. This ensures HCS resources are cleaned up correctly and
not leaked.
- Ensuring the reapProcess goroutine is started immediately the process
is actually started, so we don't leak in the case of failures such as
from `newIOFromProcess` or `attachStdio`
- libcontainerd on Windows (local, not containerd) was not sending the EventCreate
back to the monitor on Windows. Just LCOW. This was just an oversight from
refactoring a couple of years ago by Mikael as far as I can tell. Technically
not needed for functionality except for the logging being missing, but is correct.
Signed-off-by: John Howard <jhoward@microsoft.com>
Also fixes https://github.com/moby/moby/issues/22874
This commit is a pre-requisite to moving moby/moby on Windows to using
Containerd for its runtime.
The reason for this is that the interface between moby and containerd
for the runtime is an OCI spec which must be unambigious.
It is the responsibility of the runtime (runhcs in the case of
containerd on Windows) to ensure that arguments are escaped prior
to calling into HCS and onwards to the Win32 CreateProcess call.
Previously, the builder was always escaping arguments which has
led to several bugs in moby. Because the local runtime in
libcontainerd had context of whether or not arguments were escaped,
it was possible to hack around in daemon/oci_windows.go with
knowledge of the context of the call (from builder or not).
With a remote runtime, this is not possible as there's rightly
no context of the caller passed across in the OCI spec. Put another
way, as I put above, the OCI spec must be unambigious.
The other previous limitation (which leads to various subtle bugs)
is that moby is coded entirely from a Linux-centric point of view.
Unfortunately, Windows != Linux. Windows CreateProcess uses a
command line, not an array of arguments. And it has very specific
rules about how to escape a command line. Some interesting reading
links about this are:
https://blogs.msdn.microsoft.com/twistylittlepassagesallalike/2011/04/23/everyone-quotes-command-line-arguments-the-wrong-way/https://stackoverflow.com/questions/31838469/how-do-i-convert-argv-to-lpcommandline-parameter-of-createprocesshttps://docs.microsoft.com/en-us/cpp/cpp/parsing-cpp-command-line-arguments?view=vs-2017
For this reason, the OCI spec has recently been updated to cater
for more natural syntax by including a CommandLine option in
Process.
What does this commit do?
Primary objective is to ensure that the built OCI spec is unambigious.
It changes the builder so that `ArgsEscaped` as commited in a
layer is only controlled by the use of CMD or ENTRYPOINT.
Subsequently, when calling in to create a container from the builder,
if follows a different path to both `docker run` and `docker create`
using the added `ContainerCreateIgnoreImagesArgsEscaped`. This allows
a RUN from the builder to control how to escape in the OCI spec.
It changes the builder so that when shell form is used for RUN,
CMD or ENTRYPOINT, it builds (for WCOW) a more natural command line
using the original as put by the user in the dockerfile, not
the parsed version as a set of args which loses fidelity.
This command line is put into args[0] and `ArgsEscaped` is set
to true for CMD or ENTRYPOINT. A RUN statement does not commit
`ArgsEscaped` to the commited layer regardless or whether shell
or exec form were used.
Signed-off-by: John Howard <jhoward@microsoft.com>
This is the first step in refactoring moby (dockerd) to use containerd on Windows.
Similar to the current model in Linux, this adds the option to enable it for runtime.
It does not switch the graphdriver to containerd snapshotters.
- Refactors libcontainerd to a series of subpackages so that either a
"local" containerd (1) or a "remote" (2) containerd can be loaded as opposed
to conditional compile as "local" for Windows and "remote" for Linux.
- Updates libcontainerd such that Windows has an option to allow the use of a
"remote" containerd. Here, it communicates over a named pipe using GRPC.
This is currently guarded behind the experimental flag, an environment variable,
and the providing of a pipename to connect to containerd.
- Infrastructure pieces such as under pkg/system to have helper functions for
determining whether containerd is being used.
(1) "local" containerd is what the daemon on Windows has used since inception.
It's not really containerd at all - it's simply local invocation of HCS APIs
directly in-process from the daemon through the Microsoft/hcsshim library.
(2) "remote" containerd is what docker on Linux uses for it's runtime. It means
that there is a separate containerd service running, and docker communicates over
GRPC to it.
To try this out, you will need to start with something like the following:
Window 1:
containerd --log-level debug
Window 2:
$env:DOCKER_WINDOWS_CONTAINERD=1
dockerd --experimental -D --containerd \\.\pipe\containerd-containerd
You will need the following binary from github.com/containerd/containerd in your path:
- containerd.exe
You will need the following binaries from github.com/Microsoft/hcsshim in your path:
- runhcs.exe
- containerd-shim-runhcs-v1.exe
For LCOW, it will require and initrd.img and kernel in `C:\Program Files\Linux Containers`.
This is no different to the current requirements. However, you may need updated binaries,
particularly initrd.img built from Microsoft/opengcs as (at the time of writing), Linuxkit
binaries are somewhat out of date.
Note that containerd and hcsshim for HCS v2 APIs do not yet support all the required
functionality needed for docker. This will come in time - this is a baby (although large)
step to migrating Docker on Windows to containerd.
Note that the HCS v2 APIs are only called on RS5+ builds. RS1..RS4 will still use
HCS v1 APIs as the v2 APIs were not fully developed enough on these builds to be usable.
This abstraction is done in HCSShim. (Referring specifically to runtime)
Note the LCOW graphdriver still uses HCS v1 APIs regardless.
Note also that this does not migrate docker to use containerd snapshotters
rather than graphdrivers. This needs to be done in conjunction with Linux also
doing the same switch.
`time.After` keeps a timer running until the specified duration is
completed. It also allocates a new timer on each call. This can wind up
leaving lots of uneccessary timers running in the background that are
not needed and consume resources.
Instead of `time.After`, use `time.NewTimer` so the timer can actually
be stopped.
In some of these cases it's not a big deal since the duraiton is really
short, but in others it is much worse.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Implements the --device forwarding for Windows daemons. This maps the physical
device into the container at runtime.
Ex:
docker run --device="class/<clsid>" <image> <cmd>
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
The stdin fifo of exec process is created in containerd side after
client calls Start. If the client calls CloseIO before Start call, the
stdin of exec process is still opened and wait for close.
For this case, client closes stdinCloseSync channel after Start.
Signed-off-by: Wei Fu <fuweid89@gmail.com>
This should eliminate a bunch of new (go-1.11 related) validation
errors telling that the code is not formatted with `gofmt -s`.
No functional change, just whitespace (i.e.
`git show --ignore-space-change` shows nothing).
Patch generated with:
> git ls-files | grep -v ^vendor/ | grep .go$ | xargs gofmt -s -w
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This implements chown support on Windows. Built-in accounts as well
as accounts included in the SAM database of the container are supported.
NOTE: IDPair is now named Identity and IDMappings is now named
IdentityMapping.
The following are valid examples:
ADD --chown=Guest . <some directory>
COPY --chown=Administrator . <some directory>
COPY --chown=Guests . <some directory>
COPY --chown=ContainerUser . <some directory>
On Windows an owner is only granted the permission to read the security
descriptor and read/write the discretionary access control list. This
fix also grants read/write and execute permissions to the owner.
Signed-off-by: Salahuddin Khan <salah@docker.com>
Adds a supervisor package for starting and monitoring containerd.
Separates grpc connection allowing access from daemon.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
Previously, dockerd would always ask containerd to pass --leave-running
to runc/runsc, ignoring the exit boolean value. Hence, even `docker
checkpoint create --leave-running=false ...` would not stop the
container.
Signed-off-by: Brielle Broder <bbroder@google.com>
Disable cri plugin by default in containerd and
allows an option to enable the plugin. This only
has an effect on containerd when supervised by
dockerd. When containerd is managed outside of
dockerd, the configuration is not effected.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
dockerd allows the `--log-level` to be specified, but this log-level
was not forwarded to the containerd process.
This patch sets containerd's log-level to the same as dockerd if a
custom level is provided.
Now that `--log-level` is also passed to containerd, the default "info"
is removed, so that containerd's default (or the level configured in containerd.toml)
is still used if no log-level is set.
Before this change:
containerd would always be started without a log-level set (only the level that's configured in `containerd.toml`);
```
root 1014 2.5 2.1 496484 43468 pts/0 Sl+ 12:23 0:00 dockerd
root 1023 1.2 1.1 681768 23832 ? Ssl 12:23 0:00 \_ docker-containerd --config /var/run/docker/containerd/containerd.toml
```
After this change:
when running `dockerd` without options (same as current);
```
root 1014 2.5 2.1 496484 43468 pts/0 Sl+ 12:23 0:00 dockerd
root 1023 1.2 1.1 681768 23832 ? Ssl 12:23 0:00 \_ docker-containerd --config /var/run/docker/containerd/containerd.toml
```
when running `dockerd --debug`:
```
root 600 0.8 2.1 512876 43180 pts/0 Sl+ 12:20 0:00 dockerd --debug
root 608 0.6 1.1 624428 23672 ? Ssl 12:20 0:00 \_ docker-containerd --config /var/run/docker/containerd/containerd.toml --log-level debug
```
when running `dockerd --log-level=panic`
```
root 747 0.6 2.1 496548 43996 pts/0 Sl+ 12:21 0:00 dockerd --log-level=panic
root 755 0.7 1.1 550696 24100 ? Ssl 12:21 0:00 \_ docker-containerd --config /var/run/docker/containerd/containerd.toml --log-level panic
```
combining `--debug` and `--log-level` (`--debug` takes precedence):
```
root 880 2.7 2.1 634692 43336 pts/0 Sl+ 12:23 0:00 dockerd --debug --log-level=panic
root 888 1.0 1.1 616232 23652 ? Ssl 12:23 0:00 \_ docker-containerd --config /var/run/docker/containerd/containerd.toml --log-level debug
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This unblocks the client to take other restore requests and makes sure
that a long/stuck request can't block the client forever.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This fixes an issue where the containerd client is cached in a container
object in libcontainerd and becomes stale after containerd is restarted.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
With the ticker this could end up just doing back-to-back checks, which
isn't really what we want here.
Instead use a sleep to ensure we actually sleep for the desired
interval.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Signed-off-by: John Howard <jhoward@microsoft.com>
While debugging #32838, it was found (https://github.com/moby/moby/issues/32838#issuecomment-356005845) that the utility VM in some circumstances was crashing. Unfortunately, this was silently thrown away, and as far as the build step (also applies to docker run) was concerned, the exit code was zero and the error was thrown away. Windows containers operate differently to containers on Linux, and there can be legitimate system errors during container shutdown after the init process exits. This PR handles this and passes the error all the way back to the client, and correctly causes a build step running a container which hits a system error to fail, rather than blindly trying to keep going, assuming all is good, and get a subsequent failure on a commit.
With this change, assuming an error occurs, here's an example of a failure which previous was reported as a commit error:
```
The command 'powershell -Command $ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue'; Install-WindowsFeature -Name Web-App-Dev ; Install-WindowsFeature -Name ADLDS; Install-WindowsFeature -Name Web-Mgmt-Compat; Install-WindowsFeature -Name Web-Mgmt-Service; Install-WindowsFeature -Name Web-Metabase; Install-WindowsFeature -Name Web-Lgcy-Scripting; Install-WindowsFeature -Name Web-WMI; Install-WindowsFeature -Name Web-WHC; Install-WindowsFeature -Name Web-Scripting-Tools; Install-WindowsFeature -Name Web-Net-Ext45; Install-WindowsFeature -Name Web-ASP; Install-WindowsFeature -Name Web-ISAPI-Ext; Install-WindowsFeature -Name Web-ISAPI-Filter; Install-WindowsFeature -Name Web-Default-Doc; Install-WindowsFeature -Name Web-Dir-Browsing; Install-WindowsFeature -Name Web-Http-Errors; Install-WindowsFeature -Name Web-Static-Content; Install-WindowsFeature -Name Web-Http-Redirect; Install-WindowsFeature -Name Web-DAV-Publishing; Install-WindowsFeature -Name Web-Health; Install-WindowsFeature -Name Web-Http-Logging; Install-WindowsFeature -Name Web-Custom-Logging; Install-WindowsFeature -Name Web-Log-Libraries; Install-WindowsFeature -Name Web-Request-Monitor; Install-WindowsFeature -Name Web-Http-Tracing; Install-WindowsFeature -Name Web-Stat-Compression; Install-WindowsFeature -Name Web-Dyn-Compression; Install-WindowsFeature -Name Web-Security; Install-WindowsFeature -Name Web-Windows-Auth; Install-WindowsFeature -Name Web-Basic-Auth; Install-WindowsFeature -Name Web-Url-Auth; Install-WindowsFeature -Name Web-WebSockets; Install-WindowsFeature -Name Web-AppInit; Install-WindowsFeature -Name NET-WCF-HTTP-Activation45; Install-WindowsFeature -Name NET-WCF-Pipe-Activation45; Install-WindowsFeature -Name NET-WCF-TCP-Activation45;' returned a non-zero code: 4294967295: container shutdown failed: container ba9c65054d42d4830fb25ef55e4ab3287550345aa1a2bb265df4e5bfcd79c78a encountered an error during WaitTimeout: failure in a Windows system call: The compute system exited unexpectedly. (0xc0370106)
```
Without this change, it would be incorrectly reported such as in this comment: https://github.com/moby/moby/issues/32838#issuecomment-309621097
```
Step 3/8 : ADD buildtools C:/buildtools
re-exec error: exit status 1: output: time="2017-06-20T11:37:38+10:00" level=error msg="hcsshim::ImportLayer failed in Win32: The system cannot find the path specified. (0x3) layerId=\\\\?\\C:\\ProgramData\\docker\\windowsfilter\\b41d28c95f98368b73fc192cb9205700e21
6691495c1f9ac79b9b04ec4923ea2 flavour=1 folder=C:\\Windows\\TEMP\\hcs232661915"
hcsshim::ImportLayer failed in Win32: The system cannot find the path specified. (0x3) layerId=\\?\C:\ProgramData\docker\windowsfilter\b41d28c95f98368b73fc192cb9205700e216691495c1f9ac79b9b04ec4923ea2 flavour=1 folder=C:\Windows\TEMP\hcs232661915
```
When the daemon restores containers on daemon restart, it syncs up with
containerd to determine the existing state. For stopped containers it
then removes the container metadata from containerd.
In some cases this is not handled properly and causes an error when
someone attempts to start that container again.
In particular, this case is just a bad error check.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Before this patch, when containerd is restarted (due to a crash, or
kill, whatever), the daemon would keep trying to process the event
stream against the old socket handles. This would lead to a CPU spin due
to the error handling when the client can't connect to containerd.
This change makes sure the containerd remote client is updated for all
registered libcontainerd clients.
This is not neccessarily the ideal fix which would likely require a
major refactor, but at least gets things to a working state with a
minimal patch.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Instead of having to create a bunch of custom error types that are doing
nothing but wrapping another error in sub-packages, use a common helper
to create errors of the requested type.
e.g. instead of re-implementing this over and over:
```go
type notFoundError struct {
cause error
}
func(e notFoundError) Error() string {
return e.cause.Error()
}
func(e notFoundError) NotFound() {}
func(e notFoundError) Cause() error {
return e.cause
}
```
Packages can instead just do:
```
errdefs.NotFound(err)
```
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
The event filter used two separate filter-conditions for
"namespace" and "topic". As a result, both events matching
"topic" and events matching "namespace" were subscribed to,
causing events to be handled both by the "plugin" client, and
"container" client.
This patch rewrites the filter to match only if both namespace
and topic match.
Thanks to Stephen Day for providing the correct filter :)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
With the contianerd 1.0 migration we now have strongly typed errors that
we can check for process not found.
We also had some bad error checks looking for `ESRCH` which would only
be returned from `unix.Kill` and never from containerd even though we
were checking containerd responses for it.
Fixes some race conditions around process handling and our error checks
that could lead to errors that propagate up to the user that should not.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
The `docker info` code was shelling out to obtain the
version of containerd (using the `--version` flag).
Parsing the output of this version string is error-prone,
and not needed, as the containerd API can return the
version.
This patch adds a `Version()` method to the containerd Client
interface, and uses this to get the containerd version.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This also update:
- runc to 3f2f8b84a77f73d38244dd690525642a72156c64
- runtime-specs to v1.0.0
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
Current insider builds of Windows have support for mounting individual
named pipe servers from the host to the guest. This allows, for example,
exposing the docker engine's named pipe to a container.
This change allows the user to request such a mount via the normal bind
mount syntax in the CLI:
docker run -v \\.\pipe\docker_engine:\\.\pipe\docker_engine <args>
Signed-off-by: John Starks <jostarks@microsoft.com>
Changes most references of syscall to golang.org/x/sys/
Ones aren't changes include, Errno, Signal and SysProcAttr
as they haven't been implemented in /x/sys/.
Signed-off-by: Christopher Jones <tophj@linux.vnet.ibm.com>
[s390x] switch utsname from unsigned to signed
per 33267e036f
char in s390x in the /x/sys/unix package is now signed, so
change the buildtags
Signed-off-by: Christopher Jones <tophj@linux.vnet.ibm.com>
During container startup we end up spending a fair amount of time
encoding/decoding json.
This cuts out some of that since we already have the decoded object in
memory.
The old flow looked like:
1. Start container request
2. Create file
3. Encode container spec to json
4. Write to file
5. Close file
6. Open file
7. Read file
8. Decode container spec
9. Close file
10. Send to containerd.
The new flow cuts out steps 6-9 completely, and with it a lot of time
spent in reflect and file IO.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Docker use default GRPC backoff strategy to reconnect to containerd when
connection is lost. and the delay time grows exponentially, until reaches 120s.
So Change the max delay time to 2s to avoid docker and containerd
connection failure.
Signed-off-by: Wentao Zhang <zhangwentao234@huawei.com>
It has observed defunct containerd processes accumulating over
time while dockerd was permanently failing to restart containerd.
Due to a bug in the runContainerdDaemon() function, dockerd does not clean up
its child process if containerd already exits very soon after the (re)start.
The reproducer and analysis below comes from docker 1.12.x but bug
still applies on latest master.
- from libcontainerd/remote_linux.go:
329 func (r *remote) runContainerdDaemon() error {
:
: // start the containerd child process
:
403 if err := cmd.Start(); err != nil {
404 return err
405 }
:
: // If containerd exits very soon after (re)start, it is
possible
: // that containerd is already in defunct state at the time
when
: // dockerd gets here. The setOOMScore() function tries to
write
: // to /proc/PID_OF_CONTAINERD/oom_score_adj. However, this
fails
: // with errno EINVAL because containerd is defunct. Please see
: // snippets of kernel source code and further explanation
below.
:
407 if err := setOOMScore(cmd.Process.Pid, r.oomScore); err != nil
{
408 utils.KillProcess(cmd.Process.Pid)
:
: // Due to the error from write() we return here. As
the
: // goroutine that would clean up the child has not
been
: // started yet, containerd remains in the defunct
state
: // and never gets reaped.
:
409 return err
410 }
:
417 go func() {
418 cmd.Wait()
419 close(r.daemonWaitCh)
420 }() // Reap our child when needed
:
423 }
This is the kernel function that gets invoked when dockerd tries to
write
to /proc/PID_OF_CONTAINERD/oom_score_adj.
- from fs/proc/base.c:
1197 static ssize_t oom_score_adj_write(struct file *file, ...
1198 size_t count, loff_t
*ppos)
1199 {
:
1223 task = get_proc_task(file_inode(file));
:
: // The defunct containerd process does not have a virtual
: // address space anymore, i.e. task->mm is NULL. Thus the
: // following code returns errno EINVAL to dockerd.
:
1230 if (!task->mm) {
1231 err = -EINVAL;
1232 goto err_task_lock;
1233 }
:
1253 err_task_lock:
:
1257 return err < 0 ? err : count;
1258 }
The purpose of the following program is to demonstrate the behavior of
the oom_score_adj_write() function in connection with a defunct process.
$ cat defunct_test.c
\#include <unistd.h>
main()
{
pid_t pid = fork();
if (pid == 0)
// child
_exit(0);
// parent
pause();
}
$ make defunct_test
cc defunct_test.c -o defunct_test
$ ./defunct_test &
[1] 3142
$ ps -f | grep defunct_test | grep -v grep
root 3142 2956 0 13:04 pts/0 00:00:00 ./defunct_test
root 3143 3142 0 13:04 pts/0 00:00:00 [defunct_test] <defunct>
$ echo "ps 3143" | crash -s
PID PPID CPU TASK ST %MEM VSZ RSS COMM
3143 3142 2 ffff880035def300 ZO 0.0 0 0
defunct_test
$ echo "px ((struct task_struct *)0xffff880035def300)->mm" | crash -s
$1 = (struct mm_struct *) 0x0
^^^ task->mm is NULL
$ cat /proc/3143/oom_score_adj
0
$ echo 0 > /proc/3143/oom_score_adj
-bash: echo: write error: Invalid argument"
---
This patch fixes the above issue by making sure we start the reaper
goroutine as soon as possible.
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
Description:
Kill docker-containerd continuously, and use kill -SIGUSR1 <dockerpid>
to check docker callstacks. And we will find that event
handler: startEventsMonitor or handleEventStream will exit.
This will only happen when system is busy, containerd need more time to
startup, and the monitor gorotine maybe exit.
Signed-off-by: Wentao Zhang <zhangwentao234@huawei.com>
Signed-off-by: John Howard <jhoward@microsoft.com>
This ensures that any compute processes in HCS are cleanedup
during daemon restore. Note Windows cannot (currently) reconnect
to containers on restore.
Instead of a timeout the context is cancelled on error to ensure
proper cleanup of the associated fifos' goroutines.
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>