The following fields are unsupported:
* BlkioStats: all fields other than IoServiceBytesRecursive
* CPUStats: CPUUsage.PercpuUsage
* MemoryStats: MaxUsage and Failcnt
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
1. Sscanf is very slow, and we don't use the first two fields -- get rid of it.
2. Since the field we search for is at the end of line and prepended by
a space, we can just use strings.HaveSuffix.
3. Error checking for bufio.Scanner should be done after the Scan()
loop, not inside it.
Fixes: 885b29df09
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Switch to moby/sys/mount and mountinfo. Keep the pkg/mount for potential
outside users.
This commit was generated by the following bash script:
```
set -e -u -o pipefail
for file in $(git grep -l 'docker/docker/pkg/mount"' | grep -v ^pkg/mount); do
sed -i -e 's#/docker/docker/pkg/mount"#/moby/sys/mount"#' \
-e 's#mount\.\(GetMounts\|Mounted\|Info\|[A-Za-z]*Filter\)#mountinfo.\1#g' \
$file
goimports -w $file
done
```
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Support cgroup as in Rootless Podman.
Requires cgroup v2 host with crun.
Tested with Ubuntu 19.10 (kernel 5.3, systemd 242), crun v0.12.1.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Dont assign the --bip value directly to the subnet
for the default bridge. Instead use the network value
from the ParseCIDR output
Addresses: https://github.com/moby/moby/issues/40392
Signed-off-by: Arko Dasgupta <arko.dasgupta@docker.com>
Docker Desktop (on MAC and Windows hosts) allows containers
running inside a Linux VM to connect to the host using
the host.docker.internal DNS name, which is implemented by
VPNkit (DNS proxy on the host)
This PR allows containers to connect to Linux hosts
by appending a special string "host-gateway" to --add-host
e.g. "--add-host=host.docker.internal:host-gateway" which adds
host.docker.internal DNS entry in /etc/hosts and maps it to host-gateway-ip
This PR also add a daemon flag call host-gateway-ip which defaults to
the default bridge IP
Docker Desktop will need to set this field to the Host Proxy IP
so DNS requests for host.docker.internal can be routed to VPNkit
Addresses: https://github.com/docker/for-linux/issues/264
Signed-off-by: Arko Dasgupta <arko.dasgupta@docker.com>
When IPv6 is enabled, make sure fixed-cidr-ipv6 is set
by the user since there is no default IPv6 local subnet
in the IPAM
Signed-off-by: Arko Dasgupta <arko.dasgupta@docker.com>
* Requires containerd binaries from containerd/containerd#3799 . Metrics are unimplemented yet.
* Works with crun v0.10.4, but `--security-opt seccomp=unconfined` is needed unless using master version of libseccomp
( containers/crun#156, seccomp/libseccomp#177 )
* Doesn't work with master runc yet
* Resource limitations are unimplemented
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
This fix tries to address the issue raised in 39353 where
docker crash when creating namespaces with UID in /etc/subuid and /etc/subgid.
The issue was that, mapping to `/etc/sub[u,g]id` in docker does not
allow numeric ID.
This fix fixes the issue by probing other combinations (uid:groupname, username:gid, uid:gid)
when normal username:groupname fails.
This fix fixes 39353.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
Trying to link to a non-existing container is not valid, and should return an
"invalid parameter" (400) error. Returning a "not found" error in this situation
would make the client report the container's image could not be found.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This allows our tests, which all share a containerd instance, to be a
bit more isolated by setting the containerd namespaces to the generated
daemon ID's rather than the default namespaces.
This came about because I found in some cases we had test daemons
failing to start (really very slow to start) because it was (seemingly)
processing events from other tests.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
While investigating a test failure, I found this in the logs:
```
time="2019-07-04T15:06:32.622506760Z" level=warning msg="Error while setting daemon root propagation, this is not generally critical but may cause some functionality to not work or fallback to less desirable behavior" dir=/go/src/github.com/docker/docker/bundles/test-integration/d1285b8250308/root error="error writing file to signal mount cleanup on shutdown: open /tmp/dxr/d1285b8250308/unmount-on-shutdown: no such file or directory"
```
This path is generated from the daemon's exec-root, which appears to not
exist yet. This change just makes sure it exists before we try to write
a file.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Previously `docker info` had reported "cgroupfs" as the cgroup driver
but the driver wasn't actually used at all.
This PR reports "none" as the cgroup driver so as to avoid confusion.
e.g. kubeadm/kubelet will detect cgroupless-ness by checking this docker
info field. https://github.com/rootless-containers/usernetes/pull/97
Note that user still cannot specify `native.cgroupdriver=none` manually.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
This adds both a daemon-wide flag and a container creation property:
- Set the `CgroupnsMode: "host|private"` HostConfig property at
container creation time to control what cgroup namespace the container
is created in
- Set the `--default-cgroupns-mode=host|private` daemon flag to control
what cgroup namespace containers are created in by default
- Set the default if the daemon flag is unset to "host", for backward
compatibility
- Default to CgroupnsMode: "host" for client versions < 1.40
Signed-off-by: Rob Gulewich <rgulewich@netflix.com>
- Don't set `PidsLimit` when creating a container and
no limit was set (or the limit was set to "unlimited")
- Don't set `PidsLimit` if the host does not have pids-limit
support (previously "unlimited" was set).
- Do not generate a warning if the host does not have pids-limit
support, but pids-limit was set to unlimited (having no
limit set, or the limit set to "unlimited" is equivalent,
so no warning is nescessary in that case).
- When updating a container, convert `0`, and `-1` to
"unlimited" (`0`).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Please refer to `docs/rootless.md`.
TLDR:
* Make sure `/etc/subuid` and `/etc/subgid` contain the entry for you
* `dockerd-rootless.sh --experimental`
* `docker -H unix://$XDG_RUNTIME_DIR/docker.sock run ...`
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
Many startup tasks have to run for each container, and thus using a
WaitGroup (which doesn't have a limit to the number of parallel tasks)
can result in Docker exceeding the NOFILE limit quite trivially. A more
optimal solution is to have a parallelism limit by using a semaphore.
In addition, several startup tasks were not parallelised previously
which resulted in very long startup times. According to my testing, 20K
dead containers resulted in ~6 minute startup times (during which time
Docker is completely unusable).
This patch fixes both issues, and the parallelStartupTimes factor chosen
(128 * NumCPU) is based on my own significant testing of the 20K
container case. This patch (on my machines) reduces the startup time
from 6 minutes to less than a minute (ideally this could be further
reduced by removing the need to scan all dead containers on startup --
but that's beyond the scope of this patchset).
In order to avoid the NOFILE limit problem, we also detect this
on-startup and if NOFILE < 2*128*NumCPU we will reduce the parallelism
factor to avoid hitting NOFILE limits (but also emit a warning since
this is almost certainly a mis-configuration).
Signed-off-by: Aleksa Sarai <asarai@suse.de>
Disabling the oom-killer for a container without setting a memory limit
is dangerous, as it can result in the container consuming unlimited memory,
without the kernel being able to kill it. A check for this situation is curently
done in the CLI, but other consumers of the API won't receive this warning.
This patch adds a check for this situation to the daemon, so that all consumers
of the API will receive this warning.
This patch will have one side-effect; docker cli's that also perform this check
client-side will print the warning twice; this can be addressed by disabling
the cli-side check for newer API versions, but will generate a bit of extra
noise when using an older CLI.
With this patch applied (and a cli that does not take the new warning into account);
```
docker create --oom-kill-disable busybox
WARNING: OOM killer is disabled for the container, but no memory limit is set, this can result in the system running out of resources.
669933b9b237fa27da699483b5cf15355a9027050825146587a0e5be0d848adf
docker run --rm --oom-kill-disable busybox
WARNING: Disabling the OOM killer on containers without setting a '-m/--memory' limit may be dangerous.
WARNING: OOM killer is disabled for the container, but no memory limit is set, this can result in the system running out of resources.
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This fix tries to address the issue raised in 37038 where
there were no memory.kernelTCP support for linux.
This fix add MemoryKernelTCP to HostConfig, and pass
the config to runtime-spec.
Additional test case has been added.
This fix fixes 37038.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
Using a value such as `--cpuset-mems=1-9223372036854775807` would cause
`dockerd` to run out of memory allocating a map of the values in the
validation code. Set limits to the normal limit of the number of CPUs,
and improve the error handling.
Reported by Huawei PSIRT.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This implements chown support on Windows. Built-in accounts as well
as accounts included in the SAM database of the container are supported.
NOTE: IDPair is now named Identity and IDMappings is now named
IdentityMapping.
The following are valid examples:
ADD --chown=Guest . <some directory>
COPY --chown=Administrator . <some directory>
COPY --chown=Guests . <some directory>
COPY --chown=ContainerUser . <some directory>
On Windows an owner is only granted the permission to read the security
descriptor and read/write the discretionary access control list. This
fix also grants read/write and execute permissions to the owner.
Signed-off-by: Salahuddin Khan <salah@docker.com>
In particular, these two:
> daemon/daemon_unix.go:1129: Wrapf format %v reads arg #1, but call has 0 args
> daemon/kill.go:111: Warn call has possible formatting directive %s
and a few more.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This moves the platform specific stuff in a separate package and keeps
the `volume` package and the defined interfaces light to import.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>