This adds both a daemon-wide flag and a container creation property:
- Set the `CgroupnsMode: "host|private"` HostConfig property at
container creation time to control what cgroup namespace the container
is created in
- Set the `--default-cgroupns-mode=host|private` daemon flag to control
what cgroup namespace containers are created in by default
- Set the default if the daemon flag is unset to "host", for backward
compatibility
- Default to CgroupnsMode: "host" for client versions < 1.40
Signed-off-by: Rob Gulewich <rgulewich@netflix.com>
lxc-user-nic can eliminate slirp overhead but needs /etc/lxc/lxc-usernet to be configured for the current user.
To use lxc-user-nic, $DOCKERD_ROOTLESS_ROOTLESSKIT_NET=lxc-user-nic also needs to be set.
This commit also bumps up RootlessKit from v0.3.0 to v0.4.0:
70e0502f32...e92d5e772e
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
The `--rootless` flag had a couple of issues:
* #38702: euid=0, $USER="root" but no access to cgroup ("rootful" Docker in rootless Docker)
* #39009: euid=0 but $USER="docker" (rootful boot2docker)
To fix#38702, XDG dirs are ignored as in rootful Docker, unless the
dockerd is directly running under RootlessKit namespaces.
RootlessKit detection is implemented by checking whether `$ROOTLESSKIT_STATE_DIR` is set.
To fix#39009, the non-robust `$USER` check is now completely removed.
The entire logic can be illustrated as follows:
```
withRootlessKit := getenv("ROOTLESSKIT_STATE_DIR")
rootlessMode := withRootlessKit || cliFlag("--rootless")
honorXDG := withRootlessKit
useRootlessKitDockerProxy := withRootlessKit
removeCgroupSpec := rootlessMode
adjustOOMScoreAdj := rootlessMode
```
Close#39024Fix#38702#39009
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Now `docker run -p` ports can be exposed to the host namespace automatically when `dockerd-rootless.sh` is launched with
`--userland-proxy --userland-proxy-path $(which rootlesskit-docker-proxy)`.
This is akin to how Docker for Mac/Win works with `--userland-proxy-path=/path/to/vpnkit-expose-port`.
The port number on the host namespace needs to be set to >= 1024.
SCTP ports are currently unsupported.
RootlessKit changes: 7bbbc48a6f...ed26714429
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
This patch hard-codes support for NVIDIA GPUs.
In a future patch it should move out into its own Device Plugin.
Signed-off-by: Tibor Vass <tibor@docker.com>
This changes the default ipc mode of daemon/engine to be private,
meaning the containers will not have their /dev/shm bind-mounted
from the host by default. The benefits of doing this are:
1. No leaked mounts. Eliminate a possibility to leak mounts into
other namespaces (and therefore unfortunate errors like "Unable to
remove filesystem for <ID>: remove /var/lib/docker/containers/<ID>/shm:
device or resource busy").
2. Working checkpoint/restore. Make `docker checkpoint`
not lose the contents of `/dev/shm`, but save it to
the dump, and be restored back upon `docker start --checkpoint`
(currently it is lost -- while CRIU handles tmpfs mounts,
the "shareable" mount is seen as external to container,
and thus rightfully ignored).
3. Better security. Currently any container is opened to share
its /dev/shm with any other container.
Obviously, this change will break the following usage scenario:
$ docker run -d --name donor busybox top
$ docker run --rm -it --ipc container:donor busybox sh
Error response from daemon: linux spec namespaces: can't join IPC
of container <ID>: non-shareable IPC (hint: use IpcMode:shareable
for the donor container)
The soution, as hinted by the (amended) error message, is to
explicitly enable donor sharing by using --ipc shareable:
$ docker run -d --name donor --ipc shareable busybox top
Compatibility notes:
1. This only applies to containers created _after_ this change.
Existing containers are not affected and will work fine
as their ipc mode is stored in HostConfig.
2. Old backward compatible behavior ("shareable" containers
by default) can be enabled by either using
`--default-ipc-mode shareable` daemon command line option,
or by adding a `"default-ipc-mode": "shareable"`
line in `/etc/docker/daemon.json` configuration file.
3. If an older client (API < 1.40) is used, a "shareable" container
is created. A test to check that is added.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The Swarmkit api specifies a target for configs called called "Runtime"
which indicates that the config is not mounted into the container but
has some other use. This commit updates the Docker api to reflect this.
Signed-off-by: Drew Erny <drew.erny@docker.com>
Please refer to `docs/rootless.md`.
TLDR:
* Make sure `/etc/subuid` and `/etc/subgid` contain the entry for you
* `dockerd-rootless.sh --experimental`
* `docker -H unix://$XDG_RUNTIME_DIR/docker.sock run ...`
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
Monitoring systems and load balancers are usually configured to use HEAD
requests for health monitoring. The /_ping endpoint currently does not
support this type of request, which means that those systems have fallback
to GET requests.
This patch adds support for HEAD requests on the /_ping endpoint.
Although optional, this patch also returns `Content-Type` and `Content-Length`
headers in case of a HEAD request; Refering to RFC 7231, section 4.3.2:
The HEAD method is identical to GET except that the server MUST NOT
send a message body in the response (i.e., the response terminates at
the end of the header section). The server SHOULD send the same
header fields in response to a HEAD request as it would have sent if
the request had been a GET, except that the payload header fields
(Section 3.3) MAY be omitted. This method can be used for obtaining
metadata about the selected representation without transferring the
representation data and is often used for testing hypertext links for
validity, accessibility, and recent modification.
A payload within a HEAD request message has no defined semantics;
sending a payload body on a HEAD request might cause some existing
implementations to reject the request.
The response to a HEAD request is cacheable; a cache MAY use it to
satisfy subsequent HEAD requests unless otherwise indicated by the
Cache-Control header field (Section 5.2 of [RFC7234]). A HEAD
response might also have an effect on previously cached responses to
GET; see Section 4.3.5 of [RFC7234].
With this patch applied, either `GET` or `HEAD` requests work; the only
difference is that the body is empty in case of a `HEAD` request;
curl -i --unix-socket /var/run/docker.sock http://localhost/_ping
HTTP/1.1 200 OK
Api-Version: 1.40
Cache-Control: no-cache, no-store, must-revalidate
Docker-Experimental: false
Ostype: linux
Pragma: no-cache
Server: Docker/dev (linux)
Date: Mon, 14 Jan 2019 12:35:16 GMT
Content-Length: 2
Content-Type: text/plain; charset=utf-8
OK
curl --head -i --unix-socket /var/run/docker.sock http://localhost/_ping
HTTP/1.1 200 OK
Api-Version: 1.40
Cache-Control: no-cache, no-store, must-revalidate
Content-Length: 0
Content-Type: text/plain; charset=utf-8
Docker-Experimental: false
Ostype: linux
Pragma: no-cache
Server: Docker/dev (linux)
Date: Mon, 14 Jan 2019 12:34:15 GMT
The client is also updated to use `HEAD` by default, but fallback to `GET`
if the daemon does not support this method.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- Add support for exact list of capabilities, support only OCI model
- Support OCI model on CapAdd and CapDrop but remain backward compatibility
- Create variable locally instead of declaring it at the top
- Use const for magic "ALL" value
- Rename `cap` variable as it overlaps with `cap()` built-in
- Normalize and validate capabilities before use
- Move validation for conflicting options to validateHostConfig()
- TweakCapabilities: simplify logic to calculate capabilities
Signed-off-by: Olli Janatuinen <olli.janatuinen@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This fix tries to address the issue raised in 37038 where
there were no memory.kernelTCP support for linux.
This fix add MemoryKernelTCP to HostConfig, and pass
the config to runtime-spec.
Additional test case has been added.
This fix fixes 37038.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
This commit contains changes to configure DataPathPort
option. By default we use 4789 port number. But this commit
will allow user to configure port number during swarm init.
DataPathPort can't be modified after swarm init.
Signed-off-by: selansen <elango.siva@docker.com>
This allows non-recursive bind-mount, i.e. mount(2) with "bind" rather than "rbind".
Swarm-mode will be supported in a separate PR because of mutual vendoring.
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
This feature was added in 514ce73391,
and was merged after API v1.39 shipped as part of the Docker 18.09
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This feature was added in 14da20f5e7,
and was merged after API v1.39 shipped as part of the Docker 18.09
release candidates.
This commit moves the feature to the correct API version.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Looks like -i (together with DOCKER_INCREMENTAL_BINARY etc)
were used to get faster incremental builds.
Nowdays (since Go 1.10) this is no longer the case, as
go build cache is used [1]. Here's a quote:
> You do not have to use "go test -i" or "go build -i" or
> "go install" just to get fast incremental builds. We will
> not have to teach new users those workarounds anymore.
> Everything will just be fast.
To enable go cache between builds, add a volume for /root/.cache.
[1] https://groups.google.com/forum/#!msg/golang-dev/qfa3mHN4ZPA/X2UzjNV1BAAJ
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Adds support for sysctl options in docker services.
* Adds API plumbing for creating services with sysctl options set.
* Adds swagger.yaml documentation for new API field.
* Updates the API version history document.
* Changes executor package to make use of the Sysctls field on objects
* Includes integration test to verify that new behavior works.
Essentially, everything needed to support the equivalent of docker run's
`--sysctl` option except the CLI.
Includes a vendoring of swarmkit for proto changes to support the new
behavior.
Signed-off-by: Drew Erny <drew.erny@docker.com>
When requesting information about the daemon's configuration through the `/info`
endpoint, missing features (or non-recommended settings) may have to be presented
to the user.
Detecting these situations, and printing warnings currently is handled by the
cli, which results in some complications:
- duplicated effort: each client has to re-implement detection and warnings.
- it's not possible to generate warnings for reasons outside of the information
returned in the `/info` response.
- cli-side detection has to be updated for new conditions. This means that an
older cli connecting to a new daemon may not print all warnings (due to
it not detecting the new conditions)
- some warnings (in particular, warnings about storage-drivers) depend on
driver-status (`DriverStatus`) information. The format of the information
returned in this field is not part of the API specification and can change
over time, resulting in cli-side detection no longer being functional.
This patch adds a new `Warnings` field to the `/info` response. This field is
to return warnings to be presented by the user.
Existing warnings that are currently handled by the CLI are copied to the daemon
as part of this patch; This change is backward-compatible with existing
clients; old client can continue to use the client-side warnings, whereas new
clients can skip client-side detection, and print warnings that are returned by
the daemon.
Example response with this patch applied;
```bash
curl --unix-socket /var/run/docker.sock http://localhost/info | jq .Warnings
```
```json
[
"WARNING: bridge-nf-call-iptables is disabled",
"WARNING: bridge-nf-call-ip6tables is disabled"
]
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This feature allows user to specify list of subnets for global
default address pool. User can configure subnet list using
'swarm init' command. Daemon passes the information to swarmkit.
We validate the information in swarmkit, then store it in cluster
object. when IPAM init is called, we pass subnet list to IPAM driver.
Signed-off-by: selansen <elango.siva@docker.com>
* Expose license status in Info
This wires up a new field in the Info payload that exposes the license.
For moby this is hardcoded to always report a community edition.
Downstream enterprise dockerd will have additional licensing logic wired
into this function to report details about the current license status.
Signed-off-by: Daniel Hiltgen <daniel.hiltgen@docker.com>
* Code review comments
Signed-off-by: Daniel Hiltgen <daniel.hiltgen@docker.com>
* Add windows autogen support
Signed-off-by: Daniel Hiltgen <daniel.hiltgen@docker.com>