Disabling the oom-killer for a container without setting a memory limit
is dangerous, as it can result in the container consuming unlimited memory,
without the kernel being able to kill it. A check for this situation is curently
done in the CLI, but other consumers of the API won't receive this warning.
This patch adds a check for this situation to the daemon, so that all consumers
of the API will receive this warning.
This patch will have one side-effect; docker cli's that also perform this check
client-side will print the warning twice; this can be addressed by disabling
the cli-side check for newer API versions, but will generate a bit of extra
noise when using an older CLI.
With this patch applied (and a cli that does not take the new warning into account);
```
docker create --oom-kill-disable busybox
WARNING: OOM killer is disabled for the container, but no memory limit is set, this can result in the system running out of resources.
669933b9b237fa27da699483b5cf15355a9027050825146587a0e5be0d848adf
docker run --rm --oom-kill-disable busybox
WARNING: Disabling the OOM killer on containers without setting a '-m/--memory' limit may be dangerous.
WARNING: OOM killer is disabled for the container, but no memory limit is set, this can result in the system running out of resources.
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The errors returned from Mount and Unmount functions are raw
syscall.Errno errors (like EPERM or EINVAL), which provides
no context about what has happened and why.
Similar to os.PathError type, introduce mount.Error type
with some context. The error messages will now look like this:
> mount /tmp/mount-tests/source:/tmp/mount-tests/target, flags: 0x1001: operation not permitted
or
> mount tmpfs:/tmp/mount-test-source-516297835: operation not permitted
Before this patch, it was just
> operation not permitted
[v2: add Cause()]
[v3: rename MountError to Error, document Cause()]
[v4: fixes; audited all users]
[v5: make Error type private; changes after @cpuguy83 reviews]
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
As standard mount.Unmount does what we need, let's use it.
In addition, this adds ignoring "not mounted" condition, which
was previously implemented (see PR#33329, commit cfa2591d3f)
via a very expensive call to mount.Mounted().
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This small fix renames `TestfillLicense` to `TestFillLicense`
as otherwise go vet reports:
```
$ go tool vet daemon/licensing_test.go
daemon/licensing_test.go:11: TestfillLicense has malformed name: first letter after 'Test' must not be lowercase
```
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
This fix tries to address the issue raised in 38258
where current RFC5424 sys log format does not zero pad
the time (trailing zeros are removed)
This fix apply the patch to fix the issue. This fix fixes 38258.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
The OCI doesn't have a specific field for an NIS domainname[1] (mainly
because FreeBSD and Solaris appear to have a similar concept but it is
configured entirely differently).
However, on Linux, the NIS domainname can be configured through both the
setdomainname(2) syscall but also through the "kernel.domainname"
sysctl. Since the OCI has a way of injecting sysctls this means we don't
need to have any OCI changes to support NIS domainnames (and we can
always switch if the OCI picks up such support in the future).
It should be noted that because we have to generate this each spec
creation we also have to make sure that it's not clobbered by the
HostConfig. I'm pretty sure making this change generic (so that
HostConfig will not clobber any pre-set sysctls) will not cause other
issues to crop up.
[1]: https://github.com/opencontainers/runtime-spec/issues/592
Signed-off-by: Aleksa Sarai <asarai@suse.de>
This fix tries to address the issue raised in 37038 where
there were no memory.kernelTCP support for linux.
This fix add MemoryKernelTCP to HostConfig, and pass
the config to runtime-spec.
Additional test case has been added.
This fix fixes 37038.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
The v1.10 layout and the migrator was added in 2015 via #17924.
Although the migrator is not marked as "deprecated" explicitly in
cli/docs/deprecated.md, I suppose people should have already migrated
from pre-v1.10 and they no longer need the migrator, because pre-v1.10
version do not support schema2 images (and these versions no longer
receives security updates).
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
This commit contains changes to configure DataPathPort
option. By default we use 4789 port number. But this commit
will allow user to configure port number during swarm init.
DataPathPort can't be modified after swarm init.
Signed-off-by: selansen <elango.siva@docker.com>
Implements the --device forwarding for Windows daemons. This maps the physical
device into the container at runtime.
Ex:
docker run --device="class/<clsid>" <image> <cmd>
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
This allows non-recursive bind-mount, i.e. mount(2) with "bind" rather than "rbind".
Swarm-mode will be supported in a separate PR because of mutual vendoring.
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
4MB client side limit was introduced in vendoring go-grpc#1165 (v1.4.0)
making these requests likely to produce errors
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
The `aufs` storage driver is deprecated in favor of `overlay2`, and will
be removed in a future release. Users of the `aufs` storage driver are
recommended to migrate to a different storage driver, such as `overlay2`, which
is now the default storage driver.
The `aufs` storage driver facilitates running Docker on distros that have no
support for OverlayFS, such as Ubuntu 14.04 LTS, which originally shipped with
a 3.14 kernel.
Now that Ubuntu 14.04 is no longer a supported distro for Docker, and `overlay2`
is available to all supported distros (as they are either on kernel 4.x, or have
support for multiple lowerdirs backported), there is no reason to continue
maintenance of the `aufs` storage driver.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
For some reason, shared mount propagation between the host
and a container does not work for btrfs, unless container
root directory (i.e. graphdriver home) is a bind mount.
The above issue was reproduced on SLES 12sp3 + btrfs using
the following script:
#!/bin/bash
set -eux -o pipefail
# DIR should not be under a subvolume
DIR=${DIR:-/lib}
MNT=$DIR/my-mnt
FILE=$MNT/file
ID=$(docker run -d --privileged -v $DIR:$DIR:rshared ubuntu sleep 24h)
docker exec $ID mkdir -p $MNT
docker exec $ID mount -t tmpfs tmpfs $MNT
docker exec $ID touch $FILE
ls -l $FILE
umount $MNT
docker rm -f $ID
which fails this way:
+ ls -l /lib/my-mnt/file
ls: cannot access '/lib/my-mnt/file': No such file or directory
meaning the mount performed inside a priviledged container is not
propagated back to the host (even if all the mounts have "shared"
propagation mode).
The remedy to the above is to make graphdriver home a bind mount.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
As pointed out in https://github.com/moby/moby/issues/37970,
Docker overlay driver can't work with index=on feature of
the Linux kernel "overlay" filesystem. In case the global
default is set to "yes", Docker will fail with EBUSY when
trying to mount, like this:
> error creating overlay mount to ...../merged: device or resource busy
and the kernel log should contain something like:
> overlayfs: upperdir is in-use by another mount, mount with
> '-o index=off' to override exclusive upperdir protection.
A workaround is to set index=off in overlay kernel module
parameters, or even recompile the kernel with
CONFIG_OVERLAY_FS_INDEX=n in .config. Surely this is not
always practical or even possible.
The solution, as pointed out my Amir Goldstein (as well as
the above kernel message:) is to use 'index=off' option
when mounting.
NOTE since older (< 4.13rc1) kernels do not support "index="
overlayfs parameter, try to figure out whether the option
is supported. In case it's not possible to figure out,
assume it is not.
NOTE the default can be changed anytime (by writing to
/sys/module/overlay/parameters/index) so we need to always
use index=off.
[v2: move the detection code to Init()]
[v3: don't set index=off if stat() failed]
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Discourage users from using deprecated storage-drivers
by skipping them when automatically selecting a storage-
driver.
This change does not affect existing installations, because
existing state will take precedence.
Users can still use deprecated drivers by manually configuring
the daemon to use a specific driver.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The `overlay` storage driver is deprecated in favor of the `overlay2` storage
driver, which has all the benefits of `overlay`, without its limitations (excessive
inode consumption). The legacy `overlay` storage driver will be removed in a future
release. Users of the `overlay` storage driver should migrate to the `overlay2`
storage driver.
The legacy `overlay` storage driver allowed using overlayFS-backed filesystems
on pre 4.x kernels. Now that all supported distributions are able to run `overlay2`
(as they are either on kernel 4.x, or have support for multiple lowerdirs
backported), there is no reason to keep maintaining the `overlay` storage driver.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The `devicemapper` storage driver is deprecated in favor of `overlay2`, and will
be removed in a future release. Users of the `devicemapper` storage driver are
recommended to migrate to a different storage driver, such as `overlay2`, which
is now the default storage driver.
The `devicemapper` storage driver facilitates running Docker on older (3.x) kernels
that have no support for other storage drivers (such as overlay2, or AUFS).
Now that support for `overlay2` is added to all supported distros (as they are
either on kernel 4.x, or have support for multiple lowerdirs backported), there
is no reason to continue maintenance of the `devicemapper` storage driver.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The CloudWatch Logs API defines its limits in terms of bytes, but its
inputs in terms of UTF-8 encoded strings. Byte-sequences which are not
valid UTF-8 encodings are normalized to the Unicode replacement
character U+FFFD, which is a 3-byte sequence in UTF-8. This replacement
can cause the input to grow, exceeding the API limit and causing failed
API calls.
This commit adds logic for counting the effective byte length after
normalization and splitting input without splitting valid UTF-8
byte-sequences into two invalid byte-sequences.
Fixes https://github.com/moby/moby/issues/37747
Signed-off-by: Samuel Karp <skarp@amazon.com>
Using a value such as `--cpuset-mems=1-9223372036854775807` would cause
`dockerd` to run out of memory allocating a map of the values in the
validation code. Set limits to the normal limit of the number of CPUs,
and improve the error handling.
Reported by Huawei PSIRT.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
With containerd reaching 1.0, the runtime now
has a stable API, so there's no need to do a check
if the installed version matches the expected version.
Current versions of Docker now also package containerd
and runc separately, and can be _updated_ separately.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
We check for max value for -default-addr-pool-mask-length param as 32.
But There won't be enough addresses on the overlay network. Hence we are
keeping it 29 so that we would be having atleast 8 addresses in /29 network.
Signed-off-by: selansen <elango.siva@docker.com>
Adds support for sysctl options in docker services.
* Adds API plumbing for creating services with sysctl options set.
* Adds swagger.yaml documentation for new API field.
* Updates the API version history document.
* Changes executor package to make use of the Sysctls field on objects
* Includes integration test to verify that new behavior works.
Essentially, everything needed to support the equivalent of docker run's
`--sysctl` option except the CLI.
Includes a vendoring of swarmkit for proto changes to support the new
behavior.
Signed-off-by: Drew Erny <drew.erny@docker.com>
The `/etc/docker` directory is used both by the dockerd daemon
and the docker cli (if installed on the saem host as the daemon).
In situations where the `/etc/docker` directory does not exist,
and an initial `key.json` (legacy trust key) is generated (at the
default location), the `/etc/docker/` directory was created with
0700 permissions, making the directory only accessible by `root`.
Given that the `0600` permissions on the key itself already protect
it from being used by other users, the permissions of `/etc/docker`
can be less restrictive.
This patch changes the permissions for the directory to `0755`, so
that the CLI (if executed as non-root) can also access this directory.
> **NOTE**: "strictly", this patch is only needed for situations where no _custom_
> location for the trustkey is specified (not overridden with `--deprecated-key-path`),
> but setting the permissions only for the "default" case would make
> this more complicated.
```bash
make binary shell
make install
ls -la /etc/ | grep docker
dockerd
^C
ls -la /etc/ | grep docker
drwxr-xr-x 2 root root 4096 Sep 14 12:11 docker
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Addressing few review comments as part of code refactoring.
Also moved validation logic from CLI to Moby.
Signed-off-by: selansen <elango.siva@docker.com>
As in other similar drivers (jsonlog, local), use a set
(i.e. `map[whatever]struct{}`), making the code simpler.
While at it, make sure we remove the reader from the set
after calling `ProducerGone()` on it.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This should eliminate a bunch of new (go-1.11 related) validation
errors telling that the code is not formatted with `gofmt -s`.
No functional change, just whitespace (i.e.
`git show --ignore-space-change` shows nothing).
Patch generated with:
> git ls-files | grep -v ^vendor/ | grep .go$ | xargs gofmt -s -w
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: John Howard <jhoward@microsoft.com>
This is a fix for a few related scenarios where it's impossible to remove layers or containers
until the host is rebooted. Generally (or at least easiest to repro) through a forced daemon kill
while a container is running.
Possibly slightly worse than that, as following a host reboot, the scratch layer would possibly be leaked and
left on disk under the dataroot\windowsfilter directory after the container is removed.
One such example of a failure:
1. run a long running container with the --rm flag
docker run --rm -d --name test microsoft/windowsservercore powershell sleep 30
2. Force kill the daemon not allowing it to cleanup. Simulates a crash or a host power-cycle.
3. (re-)Start daemon
4. docker ps -a
PS C:\control> docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7aff773d782b malloc "powershell start-sl…" 11 seconds ago Removal In Progress malloc
5. Try to remove
PS C:\control> docker rm 7aff
Error response from daemon: container 7aff773d782bbf35d95095369ffcb170b7b8f0e6f8f65d5aff42abf61234855d: driver "windowsfilter" failed to remove root filesystem: rename C:\control\windowsfilter\7aff773d782bbf35d95095369ffcb170b7b8f0e6f8f65d5aff42abf61234855d C:\control\windowsfilter\7aff773d782bbf35d95095369ffcb170b7b8f0e6f8f65d5aff42abf61234855d-removing: Access is denied.
PS C:\control>
Step 5 fails.
This should test that
- all the messages produced are delivered (i.e. not lost)
- followLogs() exits
Loosely based on the test having the same name by Brian Goff, see
https://gist.github.com/cpuguy83/e538793de18c762608358ee0eaddc197
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
When daemon.ContainerLogs() is called with options.follow=true
(as in "docker logs --follow"), the "loggerutils.followLogs()"
function never returns (even then the logs consumer is gone).
As a result, all the resources associated with it (including
an opened file descriptor for the log file being read, two FDs
for a pipe, and two FDs for inotify watch) are never released.
If this is repeated (such as by running "docker logs --follow"
and pressing Ctrl-C a few times), this results in DoS caused by
either hitting the limit of inotify watches, or the limit of
opened files. The only cure is daemon restart.
Apparently, what happens is:
1. logs producer (a container) is gone, calling (*LogWatcher).Close()
for all its readers (daemon/logger/jsonfilelog/jsonfilelog.go:175).
2. WatchClose() is properly handled by a dedicated goroutine in
followLogs(), cancelling the context.
3. Upon receiving the ctx.Done(), the code in followLogs()
(daemon/logger/loggerutils/logfile.go#L626-L638) keeps to
send messages _synchronously_ (which is OK for now).
4. Logs consumer is gone (Ctrl-C is pressed on a terminal running
"docker logs --follow"). Method (*LogWatcher).Close() is properly
called (see daemon/logs.go:114). Since it was called before and
due to to once.Do(), nothing happens (which is kinda good, as
otherwise it will panic on closing a closed channel).
5. A goroutine (see item 3 above) keeps sending log messages
synchronously to the logWatcher.Msg channel. Since the
channel reader is gone, the channel send operation blocks forever,
and resource cleanup set up in defer statements at the beginning
of followLogs() never happens.
Alas, the fix is somewhat complicated:
1. Distinguish between close from logs producer and logs consumer.
To that effect,
- yet another channel is added to LogWatcher();
- {Watch,}Close() are renamed to {Watch,}ProducerGone();
- {Watch,}ConsumerGone() are added;
*NOTE* that ProducerGone()/WatchProducerGone() pair is ONLY needed
in order to stop ConsumerLogs(follow=true) when a container is stopped;
otherwise we're not interested in it. In other words, we're only
using it in followLogs().
2. Code that was doing (logWatcher*).Close() is modified to either call
ProducerGone() or ConsumerGone(), depending on the context.
3. Code that was waiting for WatchClose() is modified to wait for
either ConsumerGone() or ProducerGone(), or both, depending on the
context.
4. followLogs() are modified accordingly:
- context cancellation is happening on WatchProducerGone(),
and once it's received the FileWatcher is closed and waitRead()
returns errDone on EOF (i.e. log rotation handling logic is disabled);
- due to this, code that was writing synchronously to logWatcher.Msg
can be and is removed as the code above it handles this case;
- function returns once ConsumerGone is received, freeing all the
resources -- this is the bugfix itself.
While at it,
1. Let's also remove the ctx usage to simplify the code a bit.
It was introduced by commit a69a59ffc7 ("Decouple removing the
fileWatcher from reading") in order to fix a bug. The bug was actually
a deadlock in fsnotify, and the fix was just a workaround. Since then
the fsnofify bug has been fixed, and a new fsnotify was vendored in.
For more details, please see
https://github.com/moby/moby/pull/27782#issuecomment-416794490
2. Since `(*filePoller).Close()` is fixed to remove all the files
being watched, there is no need to explicitly call
fileWatcher.Remove(name) anymore, so get rid of the extra code.
Should fix https://github.com/moby/moby/issues/37391
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This test case checks that followLogs() exits once the reader is gone.
Currently it does not (i.e. this test is supposed to fail) due to #37391.
[kolyshkin@: test case Brian Goff, changelog and all bugs are by me]
Source: https://gist.github.com/cpuguy83/e538793de18c762608358ee0eaddc197
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This code has many return statements, for some of them the
"end logs" or "end stream" message was not printed, giving
the impression that this "for" loop never ended.
Make sure that "begin logs" is to be followed by "end logs".
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Similar to a related issue where previously, private Hyper-V networks
would each add 15 secs to the daemon startup, non-hns governed internal
networks are reported by hns as network type "internal" which is not
mapped to any network plugin (and thus we get the same plugin load retry
loop as before).
This issue hits Docker for Desktop because we setup such a network for
the Linux VM communication.
Signed-off-by: Simon Ferquel <simon.ferquel@docker.com>
In case a volume is specified via Mounts API, and SELinux is enabled,
the following error happens on container start:
> $ docker volume create testvol
> $ docker run --rm --mount source=testvol,target=/tmp busybox true
> docker: Error response from daemon: error setting label on mount
> source '': no such file or directory.
The functionality to relabel the source of a local mount specified via
Mounts API was introduced in commit 5bbf5cc and later broken by commit
e4b6adc, which removed setting mp.Source field.
With the current data structures, the host dir is already available in
v.Mountpoint, so let's just use it.
Fixes: e4b6adc
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Commit 5c8da2e967 updated the filtering behavior
to match container-names without having to specify the leading slash.
This change caused a regression in situations where a regex was provided as
filter, using an explicit leading slash (`--filter name=^/mycontainername`).
This fix changes the filters to match containers both with, and without the
leading slash, effectively making the leading slash optional when filtering.
With this fix, filters with and without a leading slash produce the same result:
$ docker ps --filter name=^a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
21afd6362b0c busybox "sh" 2 minutes ago Up 2 minutes a2
56e53770e316 busybox "sh" 2 minutes ago Up 2 minutes a1
$ docker ps --filter name=^/a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
21afd6362b0c busybox "sh" 2 minutes ago Up 2 minutes a2
56e53770e316 busybox "sh" 3 minutes ago Up 3 minutes a1
$ docker ps --filter name=^b
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b69003b6a6fe busybox "sh" About a minute ago Up About a minute b1
$ docker ps --filter name=^/b
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b69003b6a6fe busybox "sh" 56 seconds ago Up 54 seconds b1
$ docker ps --filter name=/a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
21afd6362b0c busybox "sh" 3 minutes ago Up 3 minutes a2
56e53770e316 busybox "sh" 4 minutes ago Up 4 minutes a1
$ docker ps --filter name=a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
21afd6362b0c busybox "sh" 3 minutes ago Up 3 minutes a2
56e53770e316 busybox "sh" 4 minutes ago Up 4 minutes a1
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Since PR 11353 (commit 7804cd36ee "Filter out default mounts that
are override by user") there can be no duplicated mounts in the list,
so the check is redundant.
This should speed up container start by a nanosecond or two.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In case a user wants to have a child reaper inside a container
(i.e. run "docker --init") AND a bind-mounted /dev, the following
error occurs:
> docker run -d -v /dev:/dev --init busybox top
> 088c96808c683077f04c4cc2711fddefe1f5970afc085d59e0baae779745a7cf
> docker: Error response from daemon: OCI runtime create failed: container_linux.go:296: starting container process caused "exec: "/dev/init": stat /dev/init: no such file or directory": unknown.
This happens because if a user-suppled /dev is provided, all the
built-in /dev/xxx mounts are filtered out.
To solve, let's move in-container init to /sbin, as the chance that
/sbin will be bind-mounted to a container is smaller than that for /dev.
While at it, let's give it more unique name (docker-init).
NOTE it still won't work for the case of bind-mounted /sbin.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The remote API allows full privilege escalation and is equivalent to
having root access on the host. Because of this, the API should never
be accessible through an insecure connection (TCP without TLS, or TCP
without TLS verification).
Although a warning is already logged on startup if the daemon uses an
insecure configuration, this warning is not very visible (unless someone
decides to read the logs).
This patch attempts to make insecure configuration more visible by sending
back warnings through the API (which will be printed when using `docker info`).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
When requesting information about the daemon's configuration through the `/info`
endpoint, missing features (or non-recommended settings) may have to be presented
to the user.
Detecting these situations, and printing warnings currently is handled by the
cli, which results in some complications:
- duplicated effort: each client has to re-implement detection and warnings.
- it's not possible to generate warnings for reasons outside of the information
returned in the `/info` response.
- cli-side detection has to be updated for new conditions. This means that an
older cli connecting to a new daemon may not print all warnings (due to
it not detecting the new conditions)
- some warnings (in particular, warnings about storage-drivers) depend on
driver-status (`DriverStatus`) information. The format of the information
returned in this field is not part of the API specification and can change
over time, resulting in cli-side detection no longer being functional.
This patch adds a new `Warnings` field to the `/info` response. This field is
to return warnings to be presented by the user.
Existing warnings that are currently handled by the CLI are copied to the daemon
as part of this patch; This change is backward-compatible with existing
clients; old client can continue to use the client-side warnings, whereas new
clients can skip client-side detection, and print warnings that are returned by
the daemon.
Example response with this patch applied;
```bash
curl --unix-socket /var/run/docker.sock http://localhost/info | jq .Warnings
```
```json
[
"WARNING: bridge-nf-call-iptables is disabled",
"WARNING: bridge-nf-call-ip6tables is disabled"
]
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Blocks the execution of tasks during the Prepare phase until there
exists an IP address for every overlay network in use by the task. This
prevents a task from starting before the NetworkAttachment containing
the IP address has been sent down to the node.
Includes a basic test for the correct use case.
Signed-off-by: Drew Erny <drew.erny@docker.com>
This feature allows user to specify list of subnets for global
default address pool. User can configure subnet list using
'swarm init' command. Daemon passes the information to swarmkit.
We validate the information in swarmkit, then store it in cluster
object. when IPAM init is called, we pass subnet list to IPAM driver.
Signed-off-by: selansen <elango.siva@docker.com>
* Expose license status in Info
This wires up a new field in the Info payload that exposes the license.
For moby this is hardcoded to always report a community edition.
Downstream enterprise dockerd will have additional licensing logic wired
into this function to report details about the current license status.
Signed-off-by: Daniel Hiltgen <daniel.hiltgen@docker.com>
* Code review comments
Signed-off-by: Daniel Hiltgen <daniel.hiltgen@docker.com>
* Add windows autogen support
Signed-off-by: Daniel Hiltgen <daniel.hiltgen@docker.com>
This driver uses protobuf to store log messages and has better defaults
for log file handling (e.g. compression and file rotation enabled by
default).
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Signed-off-by: John Howard <jhoward@microsoft.com>
Fixes#36764
@johnstep PTAL. @jterry75 FYI.
There are two commits in this PR. The first ensure that errors are actually returned to the caller - it was being thrown away.
The second commit changes the LCOW driver to map, on a per service VM basis, "long" container paths such as `/tmp/c8fa0ae1b348f505df2707060f6a49e63280d71b83b7936935c827e2e9bde16d` to much shorter paths, based on a per-service VM counter, so something more like /tmp/d3. This means that the root cause of the failure where the mount call to create the overlay was failing due to command line length becomes something much shorter such as below.
`mount -t overlay overlay -olowerdir=/tmp/d3:/tmp/d4:/tmp/d5:/tmp/d6:/tmp/d7:/tmp/d8:/tmp/d9:/tmp/d10:/tmp/d11:/tmp/d12:/tmp/d13:/tmp/d14:/tmp/d15:/tmp/d16:/tmp/d17:/tmp/d18:/tmp/d19:/tmp/d20:/tmp/d21:/tmp/d22:/tmp/d23:/tmp/d24:/tmp/d25:/tmp/d26:/tmp/d27:/tmp/d28:/tmp/d29:/tmp/d30:/tmp/d31:/tmp/d32:/tmp/d33:/tmp/d34:/tmp/d35:/tmp/d36:/tmp/d37:/tmp/d38:/tmp/d39:/tmp/d40:/tmp/d41:/tmp/d42:/tmp/d43:/tmp/d44:/tmp/d45:/tmp/d46:/tmp/d47:/tmp/d48:/tmp/d49:/tmp/d50:/tmp/d51:/tmp/d52:/tmp/d53:/tmp/d54:/tmp/d55:/tmp/d56:/tmp/d57:/tmp/d58:/tmp/d59:/tmp/d60:/tmp/d61:/tmp/d62,upperdir=/tmp/d2/upper,workdir=/tmp/d2/work /tmp/c8fa0ae1b348f505df2707060f6a49e63280d71b83b7936935c827e2e9bde16d-mount`
For those worrying about overflow (which I'm sure @thaJeztah will mention...): It's safe to use a counter here as SVMs are disposable in the default configuration. The exception is when running the daemon in unsafe LCOW "global" mode (ie `--storage-opt lcow.globalmode=1`) where the SVMs aren't disposed of, but a single one is reused. However, to overflow the command line length, it would require several hundred-thousand trillion (conservative, I should sit down and work it out accurately if I get -really- bored) of SCSI hot-add operations, and even to hit that would be hard as just running containers normally uses the VPMEM path for the containers UVM, not to the global SVM on SCSI. It gets incremented by one per build step (commit more accurately) as a general rule. Hence it would be necessary to have to be doing automated builds without restarting the daemon for literally years on end in unsafe mode. 😇
Note that in reality, the previous limit of ~47 layers before hitting the command line length limit is close to what is possible in the platform, at least as of RS5/Windows Server 2019 where, in the HCS v1 schema, a single SCSI controller is used, and that can only support 64 disks per controller per the Hyper-V VDEV. And remember we have one slot taken up for the SVMs scratch, and another for the containers scratch when committing a layer. So the best you can architecturally get on the platform is around the following (it's also different by 1 depending on whether in unsafe or default mode)
```
PS E:\docker\build\36764\short> docker build --no-cache .
Sending build context to Docker daemon 2.048kB
Step 1/4 : FROM alpine as first
---> 11cd0b38bc3c
Step 2/4 : RUN echo test > /test
---> Running in 8ddfe20e5bfb
Removing intermediate container 8ddfe20e5bfb
---> b0103a00b1c9
Step 3/4 : FROM alpine
---> 11cd0b38bc3c
Step 4/4 : COPY --from=first /test /test
---> 54bfae391eba
Successfully built 54bfae391eba
PS E:\docker\build\36764\short> cd ..
PS E:\docker\build\36764> docker build --no-cache .
Sending build context to Docker daemon 4.689MB
Step 1/61 : FROM alpine as first
---> 11cd0b38bc3c
Step 2/61 : RUN echo test > /test
---> Running in 02597ff870db
Removing intermediate container 02597ff870db
---> 3096de6fc454
Step 3/61 : RUN echo test > /test
---> Running in 9a8110f4ff19
Removing intermediate container 9a8110f4ff19
---> 7691808cf28e
Step 4/61 : RUN echo test > /test
---> Running in 9afb8f51510b
Removing intermediate container 9afb8f51510b
---> e42a0df2bb1c
Step 5/61 : RUN echo test > /test
---> Running in fe977ed6804e
Removing intermediate container fe977ed6804e
---> 55850c9b0479
Step 6/61 : RUN echo test > /test
---> Running in be65cbfad172
Removing intermediate container be65cbfad172
---> 0cf8acba70f0
Step 7/61 : RUN echo test > /test
---> Running in fd5b0907b6a9
Removing intermediate container fd5b0907b6a9
---> 257a4493d85d
Step 8/61 : RUN echo test > /test
---> Running in f7ca0ffd9076
Removing intermediate container f7ca0ffd9076
---> 3baa6f4fa2d5
Step 9/61 : RUN echo test > /test
---> Running in 5146814d4727
Removing intermediate container 5146814d4727
---> 485b9d5cf228
Step 10/61 : RUN echo test > /test
---> Running in a090eec1b743
Removing intermediate container a090eec1b743
---> a7eb10155b51
Step 11/61 : RUN echo test > /test
---> Running in 942660b288df
Removing intermediate container 942660b288df
---> 9d286a1e2133
Step 12/61 : RUN echo test > /test
---> Running in c3d369aa91df
Removing intermediate container c3d369aa91df
---> f78be4788992
Step 13/61 : RUN echo test > /test
---> Running in a03c3ac6888f
Removing intermediate container a03c3ac6888f
---> 6504363f61ab
Step 14/61 : RUN echo test > /test
---> Running in 0c3c2fca3f90
Removing intermediate container 0c3c2fca3f90
---> fe3448b8bb29
Step 15/61 : RUN echo test > /test
---> Running in 828d51c76d3b
Removing intermediate container 828d51c76d3b
---> 870684e3aea0
Step 16/61 : RUN echo test > /test
---> Running in 59a2f7c5f3ad
Removing intermediate container 59a2f7c5f3ad
---> cf84556ca5c0
Step 17/61 : RUN echo test > /test
---> Running in bfb4e088eeb3
Removing intermediate container bfb4e088eeb3
---> 9c8f9f652cef
Step 18/61 : RUN echo test > /test
---> Running in f1b88bb5a2d7
Removing intermediate container f1b88bb5a2d7
---> a6233ad21648
Step 19/61 : RUN echo test > /test
---> Running in 45f70577d709
Removing intermediate container 45f70577d709
---> 1b5cc52d370d
Step 20/61 : RUN echo test > /test
---> Running in 2ce231d5043d
Removing intermediate container 2ce231d5043d
---> 4a0e17cbebaa
Step 21/61 : RUN echo test > /test
---> Running in 52e4b0928f1f
Removing intermediate container 52e4b0928f1f
---> 99b50e989bcb
Step 22/61 : RUN echo test > /test
---> Running in f7ba3da7460d
Removing intermediate container f7ba3da7460d
---> bfa3cad88285
Step 23/61 : RUN echo test > /test
---> Running in 60180bf60f88
Removing intermediate container 60180bf60f88
---> fe7271988bcb
Step 24/61 : RUN echo test > /test
---> Running in 20324d396531
Removing intermediate container 20324d396531
---> e930bc039128
Step 25/61 : RUN echo test > /test
---> Running in b3ac70fd4404
Removing intermediate container b3ac70fd4404
---> 39d0a11ea6d8
Step 26/61 : RUN echo test > /test
---> Running in 0193267d3787
Removing intermediate container 0193267d3787
---> 8062d7aab0a5
Step 27/61 : RUN echo test > /test
---> Running in f41f45fb7985
Removing intermediate container f41f45fb7985
---> 1f5f18f2315b
Step 28/61 : RUN echo test > /test
---> Running in 90dd09c63d6e
Removing intermediate container 90dd09c63d6e
---> 02f0a1141f11
Step 29/61 : RUN echo test > /test
---> Running in c557e5386e0a
Removing intermediate container c557e5386e0a
---> dbcd6fb1f6f4
Step 30/61 : RUN echo test > /test
---> Running in 65369385d855
Removing intermediate container 65369385d855
---> e6e9058a0650
Step 31/61 : RUN echo test > /test
---> Running in d861fcc388fd
Removing intermediate container d861fcc388fd
---> 6e4c2c0f741f
Step 32/61 : RUN echo test > /test
---> Running in 1483962b7e1c
Removing intermediate container 1483962b7e1c
---> cf8f142aa055
Step 33/61 : RUN echo test > /test
---> Running in 5868934816c1
Removing intermediate container 5868934816c1
---> d5ff87cdc204
Step 34/61 : RUN echo test > /test
---> Running in e057f3201f3a
Removing intermediate container e057f3201f3a
---> b4031b7ab4ac
Step 35/61 : RUN echo test > /test
---> Running in 22b769b9079c
Removing intermediate container 22b769b9079c
---> 019d898510b6
Step 36/61 : RUN echo test > /test
---> Running in f1d364ef4ff8
Removing intermediate container f1d364ef4ff8
---> 9525cafdf04d
Step 37/61 : RUN echo test > /test
---> Running in 5bf505b8bdcc
Removing intermediate container 5bf505b8bdcc
---> cd5002b33bfd
Step 38/61 : RUN echo test > /test
---> Running in be24a921945c
Removing intermediate container be24a921945c
---> 8675db44d1b7
Step 39/61 : RUN echo test > /test
---> Running in 352dc6beef3d
Removing intermediate container 352dc6beef3d
---> 0ab0ece43c71
Step 40/61 : RUN echo test > /test
---> Running in eebde33e5d9b
Removing intermediate container eebde33e5d9b
---> 46ca4b0dfc03
Step 41/61 : RUN echo test > /test
---> Running in f920313a1e85
Removing intermediate container f920313a1e85
---> 7f3888414d58
Step 42/61 : RUN echo test > /test
---> Running in 10e2f4dc1ac7
Removing intermediate container 10e2f4dc1ac7
---> 14db9e15f2dc
Step 43/61 : RUN echo test > /test
---> Running in c849d6e89aa5
Removing intermediate container c849d6e89aa5
---> fdb770494dd6
Step 44/61 : RUN echo test > /test
---> Running in 419d1a8353db
Removing intermediate container 419d1a8353db
---> d12e9cf078be
Step 45/61 : RUN echo test > /test
---> Running in 0f1805263e4c
Removing intermediate container 0f1805263e4c
---> cd005e7b08a4
Step 46/61 : RUN echo test > /test
---> Running in 5bde05b46441
Removing intermediate container 5bde05b46441
---> 05aa426a3d4a
Step 47/61 : RUN echo test > /test
---> Running in 01ebc84bd1bc
Removing intermediate container 01ebc84bd1bc
---> 35d371fa4342
Step 48/61 : RUN echo test > /test
---> Running in 49f6c2f51dd4
Removing intermediate container 49f6c2f51dd4
---> 1090b5dfa130
Step 49/61 : RUN echo test > /test
---> Running in f8a9089cd725
Removing intermediate container f8a9089cd725
---> b2d0eec0716d
Step 50/61 : RUN echo test > /test
---> Running in a1697a0b2db0
Removing intermediate container a1697a0b2db0
---> 10d96ac8f497
Step 51/61 : RUN echo test > /test
---> Running in 33a2332c06eb
Removing intermediate container 33a2332c06eb
---> ba5bf5609c1c
Step 52/61 : RUN echo test > /test
---> Running in e8920392be0d
Removing intermediate container e8920392be0d
---> 5b3a95685c7e
Step 53/61 : RUN echo test > /test
---> Running in 4b9298587c65
Removing intermediate container 4b9298587c65
---> d4961a349141
Step 54/61 : RUN echo test > /test
---> Running in 8a0c960c2ba1
Removing intermediate container 8a0c960c2ba1
---> b413197fcfa2
Step 55/61 : RUN echo test > /test
---> Running in 536ee3b9596b
Removing intermediate container 536ee3b9596b
---> fc16b69b224a
Step 56/61 : RUN echo test > /test
---> Running in 8b817b8d7b59
Removing intermediate container 8b817b8d7b59
---> 2f0896400ff9
Step 57/61 : RUN echo test > /test
---> Running in ab0ed79ec3d4
Removing intermediate container ab0ed79ec3d4
---> b4fb420e736c
Step 58/61 : RUN echo test > /test
---> Running in 8548d7eead1f
Removing intermediate container 8548d7eead1f
---> 745103fd5a38
Step 59/61 : RUN echo test > /test
---> Running in 1980559ad5d6
Removing intermediate container 1980559ad5d6
---> 08c1c74a5618
Step 60/61 : FROM alpine
---> 11cd0b38bc3c
Step 61/61 : COPY --from=first /test /test
---> 67f053c66c27
Successfully built 67f053c66c27
PS E:\docker\build\36764>
```
Note also that subsequent error messages once you go beyond current platform limitations kind of suck (such as insufficient resources with a bunch of spew which is incomprehensible to most) and we could do better to detect this earlier in the daemon. That'll be for a (reasonably low-priority) follow-up though as and when I have time. Theoretically we *may*, if the platform doesn't require additional changes for RS5, be able to have bigger platform limits using the v2 schema with up to 127 VPMem devices, and the possibility to have multiple SCSI controllers per SVM/UVM. However, currently LCOW is using HCS v1 schema calls, and there's no plans to rewrite the graphdriver/libcontainerd components outside of the moving LCOW fully over to the containerd runtime/snapshotter using HCS v2 schema, which is still some time off fruition.
PS OK, while waiting for a full run to complete, I did get bored. Turns out it won't overflow line length as max(uint64) is 18446744073709551616 which would still be short enough at 127 layers, double the current platform limit. And I could always change it to hex or base36 to make it even shorter, or remove the 'd' from /tmp/dN. IOW, pretty sure no-one is going to hit the limit even if we could get the platform to 256 which is the current Hyper-V SCSI limit per VM (4x64), although PMEM at 127 would be the next immediate limit.
This implements chown support on Windows. Built-in accounts as well
as accounts included in the SAM database of the container are supported.
NOTE: IDPair is now named Identity and IDMappings is now named
IdentityMapping.
The following are valid examples:
ADD --chown=Guest . <some directory>
COPY --chown=Administrator . <some directory>
COPY --chown=Guests . <some directory>
COPY --chown=ContainerUser . <some directory>
On Windows an owner is only granted the permission to read the security
descriptor and read/write the discretionary access control list. This
fix also grants read/write and execute permissions to the owner.
Signed-off-by: Salahuddin Khan <salah@docker.com>
This makes it so consumers of `LogFile` should pass in how to get an
io.Reader to the requested number of lines to tail.
This is also much more efficient when tailing a large number of lines.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Adds a supervisor package for starting and monitoring containerd.
Separates grpc connection allowing access from daemon.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
Handle the case of systemd-resolved, and if in place
use a different resolv.conf source.
Set appropriately the option on libnetwork.
Move unix specific code to container_operation_unix
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Disable cri plugin by default in containerd and
allows an option to enable the plugin. This only
has an effect on containerd when supervised by
dockerd. When containerd is managed outside of
dockerd, the configuration is not effected.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
* Regex name filters were display undesired behavior due to
names containing the trailing slash when being compared
* Adjusted filterByNameIDMatches and includeContainerInList to
strip the slash prefix before doing name comparisons
* Added test case and helper functions for the test to list_test
* Force failed tests during development to ensure there were
no false positives
Signed-off-by: Chris White <me@cwprogram.com>
When go-1.11beta1 is used for building, the following error is
reported:
> 14:56:20 daemon\graphdriver\lcow\lcow.go:236: Debugf format %s reads
> arg #2, but call has 1 arg
While fixing this, let's also fix a few other things in this
very function (startServiceVMIfNotRunning):
1. Do not use fmt.Printf when not required.
2. Use `title` whenever possible.
3. Don't add `id` to messages as `title` already has it.
4. Remove duplicated colons.
5. Try to unify style of messages.
6. s/startservicevmifnotrunning/startServiceVMIfNotRunning/
...
In general, logging/debugging here is a mess and requires much more
love than I can give it at the moment. Areas for improvement:
1. Add a global var logger = logrus.WithField("storage-driver", "lcow")
and use it everywhere else in the code.
2. Use logger.WithField("id", id) whenever possible (same for "context"
and other similar fields).
3. Revise all the errors returned to be uniform.
4. Make use of errors.Wrap[f] whenever possible.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
There are two build errors when using go-1.11beta1:
> daemon/logger/loggerutils/logfile.go:367: Warningf format %q arg f.Name is a func value, not called
> daemon/logger/loggerutils/logfile.go:564: Debug call has possible formatting directive %v
In the first place, the file name is actually not required as error
message already includes it.
While at it, fix a couple of other places for more correct messages, and
make sure to not add a file name if an error already has it.
Fixes: f69f09f44c
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Fix the following go-1.11beta1 build error:
> daemon/graphdriver/aufs/aufs.go:376: Wrapf format %s reads arg #1, but call has 0 args
While at it, change '%s' to %q.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Go 1.11beta1 (rightfully) complains:
> 15:38:37 daemon/cluster/controllers/plugin/controller.go:183:
> Entry.Debugf format %#T has unrecognized flag #
This debug print was added by commit 72c3bcf2a5.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
In particular, these two:
> daemon/daemon_unix.go:1129: Wrapf format %v reads arg #1, but call has 0 args
> daemon/kill.go:111: Warn call has possible formatting directive %s
and a few more.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Commit c0bc14e8 wrapped the return value of nw.Delete() with some extra
information. However, this breaks the code in
containerAdaptor.removeNetworks() which ignores certain specific
libnetwork error return codes. Said codes actually don't represent
errors, but just regular conditions to be expected in normal operation.
The removeNetworks() call checked for these errors by type assertions
which the errors.Wrap(err...) breaks.
This has a cascading effect, because controller.Remove() invokes
containerAdaptor.removeNetworks() and if the latter returns an error,
then Remove() fails to remove the container itself. This is not
necessarily catastrophic since the container reaper apparently will
purge the container later, but it is clearly not the behavior we want.
Signed-off-by: Chris Telfer <ctelfer@docker.com>
This patch is required for the updated version of libnetwork and entails
two minor changes.
First, it uses the new libnetwork.NetworkDeleteOptionRemoveLB option to
the network.Delete() method to automatically remove the load balancing
endpoint for ingress networks. This allows removal of the
deleteLoadBalancerSandbox() function whose functionality is now within
libnetwork.
The second change is to allocate a load balancer endpoint IP address for
all overlay networks rather than just "ingress" and windows overlay
networks. Swarmkit is already performing this allocation, but moby was
not making use of these IP addresses for Linux overlay networks (except
ingress). The current version of libnetwork makes use of these IP
addresses by creating a load balancing sandbox and endpoint similar to
ingress's for all overlay network and putting all load balancing state
for a given node in that sandbox only. This reduces the amount of linux
kernel state required per node.
In the prior scheme, libnetwork would program each container's network
namespace with every piece of load balancing state for every other
container that shared *any* network with the first container. This
meant that the amount of kernel state on a given node scaled with the
square of the number of services in the cluster and with the square of
the number of containers per service. With the new scheme, kernel state
at each node scales linearly with the number of services and the number
of containers per service. This also reduces the number of system calls
required to add or remove tasks and containers. Previously the number
of system calls required grew linearly with the number of other
tasks that shared a network with the container. Now the number of
system calls grows linearly only with the number of networks that the
task/container is attached to. This results in a significant
performance improvement when adding and removing services to a cluster
that already heavily loaded.
The primary disadvantage to this scheme is that it requires the
allocation of an additional IP address per node per subnet for every
node in the cluster that has a task on the given subnet. However, as
mentioned, swarmkit is already allocating these IP addresses for every
node and they are going unused. Future swarmkit modifications should be
examined to only allocate said IP addresses when nodes actually require
them.
Signed-off-by: Chris Telfer <ctelfer@docker.com>
When using the mounts API, bind mounts are not supposed to be
automatically created.
Before this patch there is a race condition between valiating that a
bind path exists and then actually setting up the bind mount where the
bind path may exist during validation but was removed during mountpooint
setup.
This adds a field to the mountpoint struct to ensure that binds created
over the mounts API are not accidentally created.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This stuff doesn't belong here and is causing imports of libnetwork into
the router, which is not what we want.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Signed-off-by: John Howard <jhoward@microsoft.com>
Addresses https://github.com/moby/moby/pull/35089#issuecomment-367802698.
This change enables the daemon to automatically select an image under LCOW
that can be used if the API doesn't specify an explicit platform.
For example:
FROM supertest2014/nyan
ADD Dockerfile /
And docker build . will download the linux image (not a multi-manifest image)
And similarly docker pull ubuntu will match linux/amd64
This adds MaskedPaths and ReadOnlyPaths options to HostConfig for containers so
that a user can override the default values.
When the value sent through the API is nil the default is used.
Otherwise the default is overridden.
Adds integration tests for MaskedPaths and ReadonlyPaths.
Signed-off-by: Jess Frazelle <acidburn@microsoft.com>
With a full attach, each attach was leaking 4 goroutines.
This updates attach to use errgroup instead of the hodge-podge of
waitgroups and channels.
In addition, the detach event was never being sent.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This makes it a bit simpler to remove this interface for v2 plugins
and not break external projects (libnetwork and swarmkit).
Note that before we remove the `Client()` interface from `CompatPlugin`
libnetwork and swarmkit must be updated to explicitly check for the v1
client interface as is done int his PR.
This is just a minor tweak that I realized is needed after trying to
implement the needed changes on libnetwork.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
If "ps" fails, in many cases it prints a meaningful error message
which a user can benefit from. Let's use it.
While at it, let's use errdefs.System to classify the error,
as well as errors.Wrap.
Before:
> $ docker top $CT <any bad ps options>
> Error response from daemon: Error running ps: exit status 1
After:
> $ docker top $CT auxm
> Error response from daemon: ps: error: thread display conflicts with forest display
or
> $ docker top $CT saur
> Error response from daemon: ps: error: conflicting format options
or, if there's no meaningful error on stderr, same as before:
> $ docker top $CT 1234
> Error response from daemon: ps: exit status 1
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Current ContainerTop (a.k.a. docker top) implementation uses "ps"
to get the info about *all* running processes, then parses it, then
filters the results to only contain PIDs used by the container.
Collecting data only to throw most of it away is inefficient,
especially on a system running many containers (or processes).
For example, "docker top" on a container with a single process
can take up to 0.5 seconds to execute (on a mostly idle system)
which is noticeably slow.
Since the containers PIDs are known beforehand, let's use ps's
"-q" option to provide it with a list of PIDs we want info about.
The problem with this approach is, some ps options can't be used
with "-q" (the only one I'm aware of is "f" ("forest view") but
there might be more). As the list of such options is not known,
in case ps fails, it is executed again without "q" (retaining
the old behavior).
Next, the data produced by "ps" is filtered in the same way as before.
The difference here is, in case "-q" worked, the list is much shorter.
I ran some benchmarks on my laptop, with about 8000 "sleep" processes
running to amplify the savings.
The improvement in "docker top" execution times is 5x to 10x (roughly
0.05s vs 0.5s).
The improvement in ContainerTop() execution time is up to 100x
(roughly 3ms vs 300ms).
I haven't measured the memory or the CPU time savings, guess those
are not that critical.
NOTE that busybox ps does not implement -q so the fallback is always
used, but AFAIK it is not usable anyway and Docker expects a normal
ps to be on the system (say the list of fields produced by
"busybox ps -ef" differs from normal "ps -ef" etc.).
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
It does not make sense to keep looking for PID once
we found it, so let's give it a break.
The side effect of this patch is, if there's more than one column
titled "PID", the last (rightmost) column was used before, and now
the first (leftmost) column is used. Should make no practical
difference whatsoever.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Adds functionality to parse and return network attachment spec
information. Network attachment tasks are phony tasks created in
swarmkit to deal with unmanaged containers attached to swarmkit. Before
this change, attempting `docker inspect` on the task id of a network
attachment task would result in an empty task object. After this change,
a full task object is returned
Fixes#26548 the correct way.
Signed-off-by: Drew Erny <drew.erny@docker.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This prevents the following test case failures "go test" is run
as non-root in the daemon/ directory:
> --- FAIL: TestContainerInitDNS (0.02s)
> daemon_test.go:209: chown /tmp/docker-container-test-054812199/volumes: operation not permitted
>
> --- FAIL: TestDaemonReloadNetworkDiagnosticPort (0.00s)
> reload_test.go:525: mkdir /var/lib/docker/network/files/: permission denied
> --- FAIL: TestRootMountCleanup (0.00s)
> daemon_linux_test.go:240: assertion failed: error is not nil: operation not permitted
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In case aufs driver is not supported because supportsAufs() said so,
it is not possible to get a real reason from the logs.
To fix, log the error returned.
Note we're not using WithError here as the error message itself is the
sole message we want to print (i.e. there's nothing to add to it).
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
ext4 support d_type by default, but filetype feature is a tunable so
there is still a chance to disable it for some reasons. In this case,
print additional message to explicitly tell how to support d_type.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
In cases where a logging plugin has crashed when the daemon tries to
copy the container stdio to the logging plugin it returns a broken pipe
error and any log entries that occurr while the plugin is down are lost.
Fix this by opening read+write in the daemon so logs are not lost while
the plugin is down.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
- Check errors.Cause(err) when comparing errors
- Fix bug where oldest log file is not actually removed. This in
particular causes issues when compression is enabled. On rotate it just
overwrites the data in the log file corrupting it.
- Use O_TRUNC to open new gzip files to ensure we don't corrupt log
files as happens without the above fix.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Closing the log driver was in a defer meanwhile logs are
collected asyncronously, so the log driver was being closed before reads
were actually finished.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>