This adds both a daemon-wide flag and a container creation property:
- Set the `CgroupnsMode: "host|private"` HostConfig property at
container creation time to control what cgroup namespace the container
is created in
- Set the `--default-cgroupns-mode=host|private` daemon flag to control
what cgroup namespace containers are created in by default
- Set the default if the daemon flag is unset to "host", for backward
compatibility
- Default to CgroupnsMode: "host" for client versions < 1.40
Signed-off-by: Rob Gulewich <rgulewich@netflix.com>
Running a cluster in a two-manager configuration effectively *doubles*
the chance of loosing control over the cluster (compared to running
in a single-manager setup). Users may have the assumption that having
two managers provides fault tolerance, so it's best to warn them if
they're using this configuration.
This patch adds a warning to the `info` response if Swarm is configured
with two managers:
WARNING: Running Swarm in a two-manager configuration. This configuration provides
no fault tolerance, and poses a high risk to loose control over the cluster.
Refer to https://docs.docker.com/engine/swarm/admin_guide/ to configure the
Swarm for fault-tolerance.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This utility allows a client to convert an API response
back to a typed error; allowing the client to perform
different actions based on the type of error, without
having to resort to string-matching the error.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- Don't set `PidsLimit` when creating a container and
no limit was set (or the limit was set to "unlimited")
- Don't set `PidsLimit` if the host does not have pids-limit
support (previously "unlimited" was set).
- Do not generate a warning if the host does not have pids-limit
support, but pids-limit was set to unlimited (having no
limit set, or the limit set to "unlimited" is equivalent,
so no warning is nescessary in that case).
- When updating a container, convert `0`, and `-1` to
"unlimited" (`0`).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This changes the default ipc mode of daemon/engine to be private,
meaning the containers will not have their /dev/shm bind-mounted
from the host by default. The benefits of doing this are:
1. No leaked mounts. Eliminate a possibility to leak mounts into
other namespaces (and therefore unfortunate errors like "Unable to
remove filesystem for <ID>: remove /var/lib/docker/containers/<ID>/shm:
device or resource busy").
2. Working checkpoint/restore. Make `docker checkpoint`
not lose the contents of `/dev/shm`, but save it to
the dump, and be restored back upon `docker start --checkpoint`
(currently it is lost -- while CRIU handles tmpfs mounts,
the "shareable" mount is seen as external to container,
and thus rightfully ignored).
3. Better security. Currently any container is opened to share
its /dev/shm with any other container.
Obviously, this change will break the following usage scenario:
$ docker run -d --name donor busybox top
$ docker run --rm -it --ipc container:donor busybox sh
Error response from daemon: linux spec namespaces: can't join IPC
of container <ID>: non-shareable IPC (hint: use IpcMode:shareable
for the donor container)
The soution, as hinted by the (amended) error message, is to
explicitly enable donor sharing by using --ipc shareable:
$ docker run -d --name donor --ipc shareable busybox top
Compatibility notes:
1. This only applies to containers created _after_ this change.
Existing containers are not affected and will work fine
as their ipc mode is stored in HostConfig.
2. Old backward compatible behavior ("shareable" containers
by default) can be enabled by either using
`--default-ipc-mode shareable` daemon command line option,
or by adding a `"default-ipc-mode": "shareable"`
line in `/etc/docker/daemon.json` configuration file.
3. If an older client (API < 1.40) is used, a "shareable" container
is created. A test to check that is added.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
There are two if statements checking for exactly same conditions:
> if hostConfig != nil && versions.LessThan(version, "1.40")
Merge these.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Older API clients did not use a pointer for `PidsLimit`, so
API requests would always send `0`, resulting in any previous
value to be reset after an update:
Before this patch:
(using a 17.06 Docker CLI):
```bash
docker run -dit --name test --pids-limit=16 busybox
docker container inspect --format '{{json .HostConfig.PidsLimit}}' test
16
docker container update --memory=100M --memory-swap=200M test
docker container inspect --format '{{json .HostConfig.PidsLimit}}' test
0
docker container exec test cat /sys/fs/cgroup/pids/pids.max
max
```
With this patch applied:
(using a 17.06 Docker CLI):
```bash
docker run -dit --name test --pids-limit=16 busybox
docker container inspect --format '{{json .HostConfig.PidsLimit}}' test
16
docker container update --memory=100M --memory-swap=200M test
docker container inspect --format '{{json .HostConfig.PidsLimit}}' test
16
docker container exec test cat /sys/fs/cgroup/pids/pids.max
16
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The Swarmkit api specifies a target for configs called called "Runtime"
which indicates that the config is not mounted into the container but
has some other use. This commit updates the Docker api to reflect this.
Signed-off-by: Drew Erny <drew.erny@docker.com>
This assists to address a regression where distribution errors were not properly
handled, resulting in a generic 500 (internal server error) to be returned for
`/distribution/name/json` if you weren't authenticated, whereas it should return
a 40x (401).
This patch attempts to extract the HTTP status-code that was returned by the
distribution code, and falls back to returning a 500 status if unable to match.
Before this change:
curl -v --unix-socket /var/run/docker.sock http://localhost/distribution/name/json
* Trying /var/run/docker.sock...
* Connected to localhost (/var/run/docker.sock) port 80 (#0)
> GET /distribution/name/json HTTP/1.1
> Host: localhost
> User-Agent: curl/7.52.1
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< Api-Version: 1.37
< Content-Type: application/json
< Docker-Experimental: false
< Ostype: linux
< Server: Docker/dev (linux)
< Date: Tue, 03 Jul 2018 15:52:53 GMT
< Content-Length: 115
<
{"message":"errors:\ndenied: requested access to the resource is denied\nunauthorized: authentication required\n"}
* Curl_http_done: called premature == 0
* Connection #0 to host localhost left intact
daemon logs:
DEBU[2018-07-03T15:52:51.424950601Z] Calling GET /distribution/name/json
DEBU[2018-07-03T15:52:53.179895572Z] FIXME: Got an API for which error does not match any expected type!!!: errors:
denied: requested access to the resource is denied
unauthorized: authentication required
error_type=errcode.Errors module=api
ERRO[2018-07-03T15:52:53.179942783Z] Handler for GET /distribution/name/json returned error: errors:
denied: requested access to the resource is denied
unauthorized: authentication required
With this patch applied:
curl -v --unix-socket /var/run/docker.sock http://localhost/distribution/name/json
* Trying /var/run/docker.sock...
* Connected to localhost (/var/run/docker.sock) port 80 (#0)
> GET /distribution/name/json HTTP/1.1
> Host: localhost
> User-Agent: curl/7.52.1
> Accept: */*
>
< HTTP/1.1 403 Forbidden
< Api-Version: 1.38
< Content-Type: application/json
< Docker-Experimental: false
< Ostype: linux
< Server: Docker/dev (linux)
< Date: Fri, 03 Aug 2018 14:58:09 GMT
< Content-Length: 115
<
{"message":"errors:\ndenied: requested access to the resource is denied\nunauthorized: authentication required\n"}
* Curl_http_done: called premature == 0
* Connection #0 to host localhost left intact
daemon logs:
DEBU[2018-08-03T14:58:08.018726228Z] Calling GET /distribution/name/json
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Monitoring systems and load balancers are usually configured to use HEAD
requests for health monitoring. The /_ping endpoint currently does not
support this type of request, which means that those systems have fallback
to GET requests.
This patch adds support for HEAD requests on the /_ping endpoint.
Although optional, this patch also returns `Content-Type` and `Content-Length`
headers in case of a HEAD request; Refering to RFC 7231, section 4.3.2:
The HEAD method is identical to GET except that the server MUST NOT
send a message body in the response (i.e., the response terminates at
the end of the header section). The server SHOULD send the same
header fields in response to a HEAD request as it would have sent if
the request had been a GET, except that the payload header fields
(Section 3.3) MAY be omitted. This method can be used for obtaining
metadata about the selected representation without transferring the
representation data and is often used for testing hypertext links for
validity, accessibility, and recent modification.
A payload within a HEAD request message has no defined semantics;
sending a payload body on a HEAD request might cause some existing
implementations to reject the request.
The response to a HEAD request is cacheable; a cache MAY use it to
satisfy subsequent HEAD requests unless otherwise indicated by the
Cache-Control header field (Section 5.2 of [RFC7234]). A HEAD
response might also have an effect on previously cached responses to
GET; see Section 4.3.5 of [RFC7234].
With this patch applied, either `GET` or `HEAD` requests work; the only
difference is that the body is empty in case of a `HEAD` request;
curl -i --unix-socket /var/run/docker.sock http://localhost/_ping
HTTP/1.1 200 OK
Api-Version: 1.40
Cache-Control: no-cache, no-store, must-revalidate
Docker-Experimental: false
Ostype: linux
Pragma: no-cache
Server: Docker/dev (linux)
Date: Mon, 14 Jan 2019 12:35:16 GMT
Content-Length: 2
Content-Type: text/plain; charset=utf-8
OK
curl --head -i --unix-socket /var/run/docker.sock http://localhost/_ping
HTTP/1.1 200 OK
Api-Version: 1.40
Cache-Control: no-cache, no-store, must-revalidate
Content-Length: 0
Content-Type: text/plain; charset=utf-8
Docker-Experimental: false
Ostype: linux
Pragma: no-cache
Server: Docker/dev (linux)
Date: Mon, 14 Jan 2019 12:34:15 GMT
The client is also updated to use `HEAD` by default, but fallback to `GET`
if the daemon does not support this method.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- Add support for exact list of capabilities, support only OCI model
- Support OCI model on CapAdd and CapDrop but remain backward compatibility
- Create variable locally instead of declaring it at the top
- Use const for magic "ALL" value
- Rename `cap` variable as it overlaps with `cap()` built-in
- Normalize and validate capabilities before use
- Move validation for conflicting options to validateHostConfig()
- TweakCapabilities: simplify logic to calculate capabilities
Signed-off-by: Olli Janatuinen <olli.janatuinen@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
`time.After` keeps a timer running until the specified duration is
completed. It also allocates a new timer on each call. This can wind up
leaving lots of uneccessary timers running in the background that are
not needed and consume resources.
Instead of `time.After`, use `time.NewTimer` so the timer can actually
be stopped.
In some of these cases it's not a big deal since the duraiton is really
short, but in others it is much worse.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
The cancellable handler is no longer needed as the context that is
passed with the http request will be cancelled just like the close
notifier was doing.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This commit contains changes to configure DataPathPort
option. By default we use 4789 port number. But this commit
will allow user to configure port number during swarm init.
DataPathPort can't be modified after swarm init.
Signed-off-by: selansen <elango.siva@docker.com>
These options were added in API 1.39, so should be ignored
when using an older version of the API.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This allows non-recursive bind-mount, i.e. mount(2) with "bind" rather than "rbind".
Swarm-mode will be supported in a separate PR because of mutual vendoring.
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
This feature was added in 14da20f5e7,
and was merged after API v1.39 shipped as part of the Docker 18.09
release candidates.
This commit moves the feature to the correct API version.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Adds support for sysctl options in docker services.
* Adds API plumbing for creating services with sysctl options set.
* Adds swagger.yaml documentation for new API field.
* Updates the API version history document.
* Changes executor package to make use of the Sysctls field on objects
* Includes integration test to verify that new behavior works.
Essentially, everything needed to support the equivalent of docker run's
`--sysctl` option except the CLI.
Includes a vendoring of swarmkit for proto changes to support the new
behavior.
Signed-off-by: Drew Erny <drew.erny@docker.com>