This adds both a daemon-wide flag and a container creation property:
- Set the `CgroupnsMode: "host|private"` HostConfig property at
container creation time to control what cgroup namespace the container
is created in
- Set the `--default-cgroupns-mode=host|private` daemon flag to control
what cgroup namespace containers are created in by default
- Default to "host" if the daemon flag is unset, for backward
compatibility
- Default to CgroupnsMode: "host" for client versions < 1.40
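The precedence described in the list above can be sketched roughly as follows. This is a minimal illustration, not the daemon's actual code: `resolveCgroupnsMode` and `versionLessThan` are hypothetical helpers, and the assumption that an explicitly requested mode always wins is mine.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// versionLessThan reports whether API version a is lower than b.
// Both are expected in "major.minor" form, for example "1.39".
func versionLessThan(a, b string) bool {
	parse := func(v string) (int, int) {
		parts := strings.SplitN(v, ".", 2)
		major, _ := strconv.Atoi(parts[0])
		minor := 0
		if len(parts) == 2 {
			minor, _ = strconv.Atoi(parts[1])
		}
		return major, minor
	}
	aMajor, aMinor := parse(a)
	bMajor, bMinor := parse(b)
	if aMajor != bMajor {
		return aMajor < bMajor
	}
	return aMinor < bMinor
}

// resolveCgroupnsMode is a hypothetical helper that picks the effective
// cgroup namespace mode for a new container.
func resolveCgroupnsMode(requested, daemonDefault, clientAPIVersion string) string {
	if requested != "" {
		return requested // assumption: an explicit CgroupnsMode wins
	}
	if versionLessThan(clientAPIVersion, "1.40") {
		return "host" // older clients keep the previous behavior
	}
	if daemonDefault != "" {
		return daemonDefault // --default-cgroupns-mode was set
	}
	return "host" // daemon flag unset: backward-compatible default
}

func main() {
	fmt.Println(resolveCgroupnsMode("", "private", "1.39")) // host
	fmt.Println(resolveCgroupnsMode("", "private", "1.40")) // private
	fmt.Println(resolveCgroupnsMode("", "", "1.40"))        // host
}
```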
Signed-off-by: Rob Gulewich <rgulewich@netflix.com>
This is enabled for all containers that are not run with --privileged,
if the kernel supports it.
Fixes #38332
Signed-off-by: Rob Gulewich <rgulewich@netflix.com>
This test case requires not just daemon >= 1.40, but also
client API >= 1.40. If an older client is used, the
very first check fails:
> ipcmode_linux_test.go:313: assertion failed: shareable (string) != private (string)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Older versions of the daemon would concatenate hostname and
domainname, so hostname "foobar" and domainname "baz.cyphar.com"
would produce `foobar.baz.cyphar.com` as hostname.
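As a small illustration of the old behavior, the concatenation could look like the sketch below. `legacyHostname` is a hypothetical name, and returning the bare hostname when no domainname is set is an assumption.

```go
package main

import "fmt"

// legacyHostname mimics the concatenation described for older daemons.
// It is a hypothetical helper, not taken from the daemon's code.
func legacyHostname(hostname, domainname string) string {
	if domainname == "" {
		return hostname // assumption: nothing to append
	}
	return hostname + "." + domainname
}

func main() {
	fmt.Println(legacyHostname("foobar", "baz.cyphar.com")) // foobar.baz.cyphar.com
}
```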
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
pborman/uuid and google/uuid used to be different versions of
the same package, but now pborman/uuid is a compatibility wrapper
around google/uuid, maintained by the same person.
Clean up some of the usage as the functions differ slightly.
Some uses of pborman/uuid in vendored code have not been removed
yet, but I have PRs in progress for these.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
When copying between stages, or copying from an image,
ownership of the copied files should not be changed, unless
the `--chown` option is set (in which case ownership of copied
files should be updated to the specified user/group).
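The rule can be expressed as a small decision function. This is only a sketch: the `owner` type and `copyOwner` function are hypothetical and stand in for whatever the builder uses internally.

```go
package main

import "fmt"

// owner is a hypothetical uid/gid pair.
type owner struct{ UID, GID int }

// copyOwner returns the ownership to apply to a copied file.
// chown is nil when the --chown option was not set.
func copyOwner(source owner, chown *owner) owner {
	if chown != nil {
		return *chown // --chown set: use the requested user/group
	}
	return source // otherwise keep the ownership of the source file
}

func main() {
	src := owner{UID: 1000, GID: 1000}
	fmt.Println(copyOwner(src, nil))                    // {1000 1000}
	fmt.Println(copyOwner(src, &owner{UID: 0, GID: 0})) // {0 0}
}
```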
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Create a new container for each subtest, so that individual
subtests are self-contained, and there's no need to execute
them in a fixed order or to reset the container in between.
This makes the test slower (6.54s vs 3.43s). Using
`network=host` reduces the slowdown substantially
(without `network=host`, the test took more than
twice as long: 13.96s).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- Don't set `PidsLimit` when creating a container and
no limit was set (or the limit was set to "unlimited")
- Don't set `PidsLimit` if the host does not have pids-limit
support (previously "unlimited" was set).
- Do not generate a warning if the host does not have pids-limit
support, but pids-limit was set to unlimited (having no
limit set, or the limit set to "unlimited" is equivalent,
so no warning is necessary in that case).
- When updating a container, convert `0` and `-1` to
"unlimited" (`0`).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
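The `PidsLimit` rules listed above can be sketched as two small functions. This is an illustration under assumptions: the function names are hypothetical, `0` and `-1` are both treated as "unlimited" at creation time, and a warning is assumed whenever a real limit is requested on a host without pids-limit support.

```go
package main

import "fmt"

// isUnlimited treats 0 and -1 as "no limit".
func isUnlimited(v int64) bool { return v == 0 || v == -1 }

// pidsLimitForCreate returns the limit to store at creation time (nil means
// "do not set") and whether a warning should be generated.
func pidsLimitForCreate(requested *int64, hostSupported bool) (*int64, bool) {
	if requested == nil || isUnlimited(*requested) {
		return nil, false // nothing to set, nothing to warn about
	}
	if !hostSupported {
		return nil, true // assumption: a real limit cannot be applied, so warn
	}
	v := *requested
	return &v, false
}

// pidsLimitForUpdate converts 0 and -1 to "unlimited" (0) on update.
func pidsLimitForUpdate(requested int64) int64 {
	if isUnlimited(requested) {
		return 0
	}
	return requested
}

func main() {
	limit := int64(16)
	stored, warn := pidsLimitForCreate(&limit, false)
	fmt.Println(stored == nil, warn)    // true true
	fmt.Println(pidsLimitForUpdate(-1)) // 0
}
```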
This changes the default ipc mode of daemon/engine to be private,
meaning the containers will not have their /dev/shm bind-mounted
from the host by default. The benefits of doing this are:
1. No leaked mounts. This eliminates the possibility of leaking mounts
into other namespaces (and therefore unfortunate errors like "Unable to
remove filesystem for <ID>: remove /var/lib/docker/containers/<ID>/shm:
device or resource busy").
2. Working checkpoint/restore. `docker checkpoint` no longer
loses the contents of `/dev/shm`: they are saved to
the dump and restored upon `docker start --checkpoint`
(currently they are lost: while CRIU handles tmpfs mounts,
the "shareable" mount is seen as external to the container,
and thus rightfully ignored).
3. Better security. Currently any container is open to sharing
its /dev/shm with any other container.
Obviously, this change will break the following usage scenario:
    $ docker run -d --name donor busybox top
    $ docker run --rm -it --ipc container:donor busybox sh
    Error response from daemon: linux spec namespaces: can't join IPC
    of container <ID>: non-shareable IPC (hint: use IpcMode:shareable
    for the donor container)
The solution, as hinted by the (amended) error message, is to
explicitly enable donor sharing by using --ipc shareable:
    $ docker run -d --name donor --ipc shareable busybox top
Compatibility notes:
1. This only applies to containers created _after_ this change.
Existing containers are not affected and will work fine
as their ipc mode is stored in HostConfig.
2. The old, backward-compatible behavior ("shareable" containers
by default) can be enabled either by using the
`--default-ipc-mode shareable` daemon command line option,
or by adding a `"default-ipc-mode": "shareable"`
line to the `/etc/docker/daemon.json` configuration file.
3. If an older client (API < 1.40) is used, a "shareable" container
is created. A test for this case is added.
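To make the broken scenario and the compatibility notes concrete, here is a rough sketch of how the effective IPC mode and the join check interact. `effectiveIpcMode` and `canJoinIPC` are hypothetical names, and the shortened error text is only modeled on the message quoted above.

```go
package main

import (
	"errors"
	"fmt"
)

// effectiveIpcMode picks the IPC mode for a new container.
// legacyClient stands for a client using API < 1.40.
func effectiveIpcMode(requested, daemonDefault string, legacyClient bool) string {
	if requested != "" {
		return requested // explicit --ipc value
	}
	if legacyClient {
		return "shareable" // older clients get the old default
	}
	if daemonDefault != "" {
		return daemonDefault // --default-ipc-mode or daemon.json
	}
	return "private" // new default introduced by this change
}

// canJoinIPC reports whether another container may join the donor's IPC
// namespace via --ipc container:<donor>.
func canJoinIPC(donorMode string) error {
	if donorMode != "shareable" {
		return errors.New("can't join IPC of container: non-shareable IPC")
	}
	return nil
}

func main() {
	donor := effectiveIpcMode("", "", false)
	fmt.Println(donor, canJoinIPC(donor)) // private, with a non-nil error

	donor = effectiveIpcMode("shareable", "", false)
	fmt.Println(donor, canJoinIPC(donor)) // shareable <nil>
}
```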
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Move the test case from integration-cli to integration.
The test logic itself has not changed, except for these
two things:
* the new test sets default-ipc-mode via command line
rather than via daemon.json (less code);
* the new test uses current API version.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Since container.Create() already initializes HostConfig
to be non-nil, there is no need for this code. Remove it.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Older API clients did not use a pointer for `PidsLimit`, so
API requests would always send `0`, causing any previously set
value to be reset after an update.
Before this patch (using a 17.06 Docker CLI):
```bash
docker run -dit --name test --pids-limit=16 busybox
docker container inspect --format '{{json .HostConfig.PidsLimit}}' test
16
docker container update --memory=100M --memory-swap=200M test
docker container inspect --format '{{json .HostConfig.PidsLimit}}' test
0
docker container exec test cat /sys/fs/cgroup/pids/pids.max
max
```
With this patch applied (using a 17.06 Docker CLI):
```bash
docker run -dit --name test --pids-limit=16 busybox
docker container inspect --format '{{json .HostConfig.PidsLimit}}' test
16
docker container update --memory=100M --memory-swap=200M test
docker container inspect --format '{{json .HostConfig.PidsLimit}}' test
16
docker container exec test cat /sys/fs/cgroup/pids/pids.max
16
```
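The underlying problem can be illustrated with plain JSON encoding: a non-pointer integer cannot express "not set". The two request types below are hypothetical and much smaller than the real API types; they only show the difference on the wire.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// oldUpdateRequest models an older client: PidsLimit is a plain integer,
// so it is always sent, even when the user did not specify it.
type oldUpdateRequest struct {
	Memory    int64
	PidsLimit int64
}

// newUpdateRequest uses a pointer, so "not set" (nil) can be told apart
// from an explicit value.
type newUpdateRequest struct {
	Memory    int64
	PidsLimit *int64 `json:",omitempty"`
}

// encode marshals v to a JSON string and panics on failure.
func encode(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(encode(oldUpdateRequest{Memory: 104857600})) // {"Memory":104857600,"PidsLimit":0}
	fmt.Println(encode(newUpdateRequest{Memory: 104857600})) // {"Memory":104857600}
}
```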
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Some tests were skipped if the local daemon did not have
experimental features enabled; at the same time, some tests
unconditionally created a new (experimental) daemon, even if
the local daemon already had experimental enabled.
This patch:
- Checks if the "testEnv" is an experimental Linux daemon
- If not, and the daemon is running locally, spins up a new
experimental daemon to be used during the test.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Monitoring systems and load balancers are usually configured to use HEAD
requests for health monitoring. The /_ping endpoint currently does not
support this type of request, which means that those systems have to fall
back to GET requests.
This patch adds support for HEAD requests on the /_ping endpoint.
Although optional, this patch also returns `Content-Type` and `Content-Length`
headers in case of a HEAD request. Referring to RFC 7231, section 4.3.2:
    The HEAD method is identical to GET except that the server MUST NOT
    send a message body in the response (i.e., the response terminates at
    the end of the header section). The server SHOULD send the same
    header fields in response to a HEAD request as it would have sent if
    the request had been a GET, except that the payload header fields
    (Section 3.3) MAY be omitted. This method can be used for obtaining
    metadata about the selected representation without transferring the
    representation data and is often used for testing hypertext links for
    validity, accessibility, and recent modification.

    A payload within a HEAD request message has no defined semantics;
    sending a payload body on a HEAD request might cause some existing
    implementations to reject the request.

    The response to a HEAD request is cacheable; a cache MAY use it to
    satisfy subsequent HEAD requests unless otherwise indicated by the
    Cache-Control header field (Section 5.2 of [RFC7234]). A HEAD
    response might also have an effect on previously cached responses to
    GET; see Section 4.3.5 of [RFC7234].
With this patch applied, either `GET` or `HEAD` requests work; the only
difference is that the body is empty in case of a `HEAD` request:
    curl -i --unix-socket /var/run/docker.sock http://localhost/_ping
    HTTP/1.1 200 OK
    Api-Version: 1.40
    Cache-Control: no-cache, no-store, must-revalidate
    Docker-Experimental: false
    Ostype: linux
    Pragma: no-cache
    Server: Docker/dev (linux)
    Date: Mon, 14 Jan 2019 12:35:16 GMT
    Content-Length: 2
    Content-Type: text/plain; charset=utf-8

    OK
    curl --head -i --unix-socket /var/run/docker.sock http://localhost/_ping
    HTTP/1.1 200 OK
    Api-Version: 1.40
    Cache-Control: no-cache, no-store, must-revalidate
    Content-Length: 0
    Content-Type: text/plain; charset=utf-8
    Docker-Experimental: false
    Ostype: linux
    Pragma: no-cache
    Server: Docker/dev (linux)
    Date: Mon, 14 Jan 2019 12:34:15 GMT
The client is also updated to use `HEAD` by default, but falls back to `GET`
if the daemon does not support this method.
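A minimal sketch of the HEAD-first, GET-fallback behavior, using a local test server in place of the daemon. `newPingServer` and `ping` are hypothetical helpers, not the Docker client's API, and the assumption that a daemon without HEAD support answers with 404 is mine.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// newPingServer starts a test server exposing /_ping. When supportsHead is
// false it behaves like an older daemon and rejects HEAD (assumed: 404).
func newPingServer(supportsHead bool) *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/_ping", func(w http.ResponseWriter, r *http.Request) {
		switch r.Method {
		case http.MethodHead:
			if !supportsHead {
				w.WriteHeader(http.StatusNotFound)
				return
			}
			// Same headers as GET would carry, but no body.
			w.Header().Set("Content-Type", "text/plain; charset=utf-8")
			w.Header().Set("Content-Length", "0")
			w.WriteHeader(http.StatusOK)
		case http.MethodGet:
			w.Header().Set("Content-Type", "text/plain; charset=utf-8")
			_, _ = io.WriteString(w, "OK")
		default:
			w.WriteHeader(http.StatusMethodNotAllowed)
		}
	})
	return httptest.NewServer(mux)
}

// ping tries HEAD first and falls back to GET. It returns the method that
// succeeded.
func ping(client *http.Client, baseURL string) (string, error) {
	resp, err := client.Head(baseURL + "/_ping")
	if err == nil {
		resp.Body.Close()
		if resp.StatusCode == http.StatusOK {
			return http.MethodHead, nil
		}
	}
	resp, err = client.Get(baseURL + "/_ping")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("ping failed: %s", resp.Status)
	}
	return http.MethodGet, nil
}

func main() {
	for _, supportsHead := range []bool{true, false} {
		srv := newPingServer(supportsHead)
		method, err := ping(srv.Client(), srv.URL)
		srv.Close()
		fmt.Println(method, err)
	}
}
```

Against the first server the sketch reports `HEAD`; against the second, which stands in for an older daemon, it reports `GET`.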
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>