When a container is left running after the daemon exits (e.g. the daemon
is SIGKILL'd or crashes), it should stop any running containers when the
daemon starts back up.
What actually happens is the daemon only sends the container's
configured stop signal and does not check if it has exited.
If the container does not actually exit then it is left running.
This fixes this unexpected behavior by calling the same function to shut
down the container that the daemon shutdown process does.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This enables image lookup when creating a container to fail when the
reference exists but it is for the wrong platform. This prevents trying
to run an image for the wrong platform, as can be the case with, for
example binfmt_misc+qemu.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Switch to moby/sys/mount and mountinfo. Keep the pkg/mount for potential
outside users.
This commit was generated by the following bash script:
```
set -e -u -o pipefail
for file in $(git grep -l 'docker/docker/pkg/mount"' | grep -v ^pkg/mount); do
sed -i -e 's#/docker/docker/pkg/mount"#/moby/sys/mount"#' \
-e 's#mount\.\(GetMounts\|Mounted\|Info\|[A-Za-z]*Filter\)#mountinfo.\1#g' \
$file
goimports -w $file
done
```
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Metrics collectors generally don't need the daemon to prime the stats
with something to compare since they already have something to compare
with.
Before this change, the API does 2 collection cycles (which takes
roughly 2s) in order to provide comparison for CPU usage over 1s. This
was primarily added so that `docker stats --no-stream` had something to
compare against.
Really the CLI should have just made a 2nd call and done the comparison
itself rather than forcing it on all API consumers.
That ship has long sailed, though.
With this change, clients can set an option to just pull a single stat,
which is *at least* a full second faster:
Old:
```
time curl --unix-socket
/go/src/github.com/docker/docker/bundles/test-integration-shell/docker.sock
http://./containers/test/stats?stream=false\&one-shot=false > /dev/null
2>&1
real0m1.864s
user0m0.005s
sys0m0.007s
time curl --unix-socket
/go/src/github.com/docker/docker/bundles/test-integration-shell/docker.sock
http://./containers/test/stats?stream=false\&one-shot=false > /dev/null
2>&1
real0m1.173s
user0m0.010s
sys0m0.006s
```
New:
```
time curl --unix-socket
/go/src/github.com/docker/docker/bundles/test-integration-shell/docker.sock
http://./containers/test/stats?stream=false\&one-shot=true > /dev/null
2>&1
real0m0.680s
user0m0.008s
sys0m0.004s
time curl --unix-socket
/go/src/github.com/docker/docker/bundles/test-integration-shell/docker.sock
http://./containers/test/stats?stream=false\&one-shot=true > /dev/null
2>&1
real0m0.156s
user0m0.007s
sys0m0.007s
```
This fixes issues with downstreams ability to use the stats API to
collect metrics.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This makes sure that things like `--tmpfs` mounts over an anonymous
volume don't create volumes uneccessarily.
One method only checks mountpoints, the other checks both mountpoints
and tmpfs... the usage of these should likely be consolidated.
Ideally, processing for `--tmpfs` mounts would get merged in with the
rest of the mount parsing. I opted not to do that for this change so the
fix is minimal and can potentially be backported with fewer changes of
breaking things.
Merging the mount processing for tmpfs can be handled in a followup.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Docker Desktop (on MAC and Windows hosts) allows containers
running inside a Linux VM to connect to the host using
the host.docker.internal DNS name, which is implemented by
VPNkit (DNS proxy on the host)
This PR allows containers to connect to Linux hosts
by appending a special string "host-gateway" to --add-host
e.g. "--add-host=host.docker.internal:host-gateway" which adds
host.docker.internal DNS entry in /etc/hosts and maps it to host-gateway-ip
This PR also add a daemon flag call host-gateway-ip which defaults to
the default bridge IP
Docker Desktop will need to set this field to the Host Proxy IP
so DNS requests for host.docker.internal can be routed to VPNkit
Addresses: https://github.com/docker/for-linux/issues/264
Signed-off-by: Arko Dasgupta <arko.dasgupta@docker.com>
```
internal/test/environment/environment.go:37:23: `useing` is a misspelling of `using`(misspell)
integration/container/wait_test.go:49:9: `waitres` is a misspelling of `waiters`(misspell)
integration/container/wait_test.go:95:9: `waitres` is a misspelling of `waiters`(misspell)
integration-cli/docker_api_containers_test.go:1042:7: `waitres` is a misspelling of `waiters`(misspell)
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Trying to link to a non-existing container is not valid, and should return an
"invalid parameter" (400) error. Returning a "not found" error in this situation
would make the client report the container's image could not be found.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This test is failing on Windows currently:
```
11:59:47 --- FAIL: TestHealthKillContainer (8.12s)
11:59:47 health_test.go:57: assertion failed: error is not nil: Error response from daemon: Invalid signal: SIGUSR1
``
That test was added recently in https://github.com/moby/moby/pull/39454, but
rewritten in a commit in the same PR:
f8aef6a92f
In that rewrite, there were some changes:
- originally it was skipped on Windows, but the rewritten test doesn't have that skip:
```go
testRequires(c, DaemonIsLinux) // busybox doesn't work on Windows
```
- the original test used `SIGINT`, but the new one uses `SIGUSR1`
Analysis:
- The Error bubbles up from: 8e610b2b55/pkg/signal/signal.go (L29-L44)
- Interestingly; `ContainerKill` should validate if a signal is valid for the given platform, but somehow we don't hit that part; f1b5612f20/daemon/kill.go (L40-L48)
- Windows only looks to support 2 signals currently 8e610b2b55/pkg/signal/signal_windows.go (L17-L26)
- Upstream Golang looks to define `SIGINT` as well; 77f9b2728e/src/runtime/defs_windows.go (L44)
- This looks like the current list of Signals upstream in Go; 3b58ed4ad3/windows/types_windows.go (L52-L67)
```go
const (
// More invented values for signals
SIGHUP = Signal(0x1)
SIGINT = Signal(0x2)
SIGQUIT = Signal(0x3)
SIGILL = Signal(0x4)
SIGTRAP = Signal(0x5)
SIGABRT = Signal(0x6)
SIGBUS = Signal(0x7)
SIGFPE = Signal(0x8)
SIGKILL = Signal(0x9)
SIGSEGV = Signal(0xb)
SIGPIPE = Signal(0xd)
SIGALRM = Signal(0xe)
SIGTERM = Signal(0xf)
)
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
```
Line 25: warning: context.Context should be the first parameter of a function (golint)
Line 44: warning: context.Context should be the first parameter of a function (golint)
Line 52: warning: context.Context should be the first parameter of a function (golint)
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This adds both a daemon-wide flag and a container creation property:
- Set the `CgroupnsMode: "host|private"` HostConfig property at
container creation time to control what cgroup namespace the container
is created in
- Set the `--default-cgroupns-mode=host|private` daemon flag to control
what cgroup namespace containers are created in by default
- Set the default if the daemon flag is unset to "host", for backward
compatibility
- Default to CgroupnsMode: "host" for client versions < 1.40
Signed-off-by: Rob Gulewich <rgulewich@netflix.com>
This is enabled for all containers that are not run with --privileged,
if the kernel supports it.
Fixes#38332
Signed-off-by: Rob Gulewich <rgulewich@netflix.com>
This test case requires not just daemon >= 1.40, but also
client API >= 1.40. In case older client is used, we'll
get failure from the very first check:
> ipcmode_linux_test.go:313: assertion failed: shareable (string) != private (string)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Older versions of the daemon would concatenate hostname and
domainname, so hostname "foobar" and domainname "baz.cyphar.com"
would produce `foobar.baz.cyphar.com` as hostname.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Create a new container for each subtest, so that individual
subtests are self-contained, and there's no need to execute
them in the exact order, or resetting the container in between.
This makes the test slower (6.54s vs 3.43s), but reduced the
difference by using `network=host`, which made a substantial
difference (without `network=host`, the test took more than
twice as long: 13.96s).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- Don't set `PidsLimit` when creating a container and
no limit was set (or the limit was set to "unlimited")
- Don't set `PidsLimit` if the host does not have pids-limit
support (previously "unlimited" was set).
- Do not generate a warning if the host does not have pids-limit
support, but pids-limit was set to unlimited (having no
limit set, or the limit set to "unlimited" is equivalent,
so no warning is nescessary in that case).
- When updating a container, convert `0`, and `-1` to
"unlimited" (`0`).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This changes the default ipc mode of daemon/engine to be private,
meaning the containers will not have their /dev/shm bind-mounted
from the host by default. The benefits of doing this are:
1. No leaked mounts. Eliminate a possibility to leak mounts into
other namespaces (and therefore unfortunate errors like "Unable to
remove filesystem for <ID>: remove /var/lib/docker/containers/<ID>/shm:
device or resource busy").
2. Working checkpoint/restore. Make `docker checkpoint`
not lose the contents of `/dev/shm`, but save it to
the dump, and be restored back upon `docker start --checkpoint`
(currently it is lost -- while CRIU handles tmpfs mounts,
the "shareable" mount is seen as external to container,
and thus rightfully ignored).
3. Better security. Currently any container is opened to share
its /dev/shm with any other container.
Obviously, this change will break the following usage scenario:
$ docker run -d --name donor busybox top
$ docker run --rm -it --ipc container:donor busybox sh
Error response from daemon: linux spec namespaces: can't join IPC
of container <ID>: non-shareable IPC (hint: use IpcMode:shareable
for the donor container)
The soution, as hinted by the (amended) error message, is to
explicitly enable donor sharing by using --ipc shareable:
$ docker run -d --name donor --ipc shareable busybox top
Compatibility notes:
1. This only applies to containers created _after_ this change.
Existing containers are not affected and will work fine
as their ipc mode is stored in HostConfig.
2. Old backward compatible behavior ("shareable" containers
by default) can be enabled by either using
`--default-ipc-mode shareable` daemon command line option,
or by adding a `"default-ipc-mode": "shareable"`
line in `/etc/docker/daemon.json` configuration file.
3. If an older client (API < 1.40) is used, a "shareable" container
is created. A test to check that is added.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Move the test case from integration-cli to integration.
The test logic itself has not changed, except these
two things:
* the new test sets default-ipc-mode via command line
rather than via daemon.json (less code);
* the new test uses current API version.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>