Commit graph

447 commits

Author SHA1 Message Date
Sebastiaan van Stijn
af0415257e
Merge pull request #40694 from kolyshkin/moby-sys-mount-part-II
switch to moby/sys/{mount,mountinfo} part II
2020-04-02 21:52:21 +02:00
Akihiro Suda
3802830989 cgroup2: implement docker stats
The following fields are unsupported:

* BlkioStats: all fields other than IoServiceBytesRecursive
* CPUStats: CPUUsage.PercpuUsage
* MemoryStats: MaxUsage and Failcnt

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-04-02 17:51:34 +09:00
Kir Kolyshkin
5b658a0348 daemon.overlaySupportsSelinux: simplify check
1. Sscanf is very slow, and we don't use the first two fields -- get rid of it.

2. Since the field we search for is at the end of line and prepended by
   a space, we can just use strings.HaveSuffix.

3. Error checking for bufio.Scanner should be done after the Scan()
   loop, not inside it.

Fixes: 885b29df09
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-31 14:32:42 -07:00
Kir Kolyshkin
39048cf656 Really switch to moby/sys/mount*
Switch to moby/sys/mount and mountinfo. Keep the pkg/mount for potential
outside users.

This commit was generated by the following bash script:

```
set -e -u -o pipefail

for file in $(git grep -l 'docker/docker/pkg/mount"' | grep -v ^pkg/mount); do
	sed -i -e 's#/docker/docker/pkg/mount"#/moby/sys/mount"#' \
		-e 's#mount\.\(GetMounts\|Mounted\|Info\|[A-Za-z]*Filter\)#mountinfo.\1#g' \
		$file
	goimports -w $file
done
```

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-20 09:46:25 -07:00
Akihiro Suda
92e7f8f67c daemon: fail early if rootless && cgroupdriver == "systemd" && cgroup v1
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-03-11 12:49:03 +09:00
Akihiro Suda
ca4b51868a rootless: support --exec-opt native.cgroupdriver=systemd
Support cgroup as in Rootless Podman.

Requires cgroup v2 host with crun.
Tested with Ubuntu 19.10 (kernel 5.3, systemd 242), crun v0.12.1.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-02-14 15:32:31 +09:00
Arko Dasgupta
f800d5f786 Set the bip network value as the subnet
Dont assign the --bip value directly to the subnet
for the default bridge. Instead use the network value
from the ParseCIDR output

Addresses: https://github.com/moby/moby/issues/40392

Signed-off-by: Arko Dasgupta <arko.dasgupta@docker.com>
2020-02-10 17:38:54 -08:00
Sebastiaan van Stijn
ca20bc4214
Merge pull request #40007 from arkodg/add-host-docker-internal
Support host.docker.internal in dockerd on Linux
2020-01-27 13:42:26 +01:00
Arko Dasgupta
92e809a680 Support host.docker.internal in dockerd on Linux
Docker Desktop (on MAC and Windows hosts) allows containers
running inside a Linux VM to connect to the host using
the host.docker.internal DNS name, which is implemented by
VPNkit (DNS proxy on the host)

This PR allows containers to connect to Linux hosts
by appending a special string "host-gateway" to --add-host
e.g. "--add-host=host.docker.internal:host-gateway" which adds
host.docker.internal DNS entry in /etc/hosts and maps it to host-gateway-ip

This PR also add a daemon flag call host-gateway-ip which defaults to
the default bridge IP
Docker Desktop will need to set this field to the Host Proxy IP
so DNS requests for host.docker.internal can be routed to VPNkit

Addresses: https://github.com/docker/for-linux/issues/264

Signed-off-by: Arko Dasgupta <arko.dasgupta@docker.com>
2020-01-22 13:30:00 -08:00
Sebastiaan van Stijn
be095a1859
Merge pull request #40366 from arkodg/check-cidr-ipv6
Handle the error case when fixed-cidr-ipv6 is empty and ipv6 is enabled
2020-01-14 13:53:45 +01:00
Arko Dasgupta
bdad16b0ee Handle error case when fixed-cidr-ipv6 is empty
When IPv6 is enabled, make sure fixed-cidr-ipv6 is set
by the user since there is no default IPv6 local subnet
in the IPAM

Signed-off-by: Arko Dasgupta <arko.dasgupta@docker.com>
2020-01-13 09:56:41 -08:00
Akihiro Suda
491531c12b cgroup2: mark cpu-rt-{period,runtime} unimplemented
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-01-01 02:58:40 +09:00
Akihiro Suda
19baeaca26 cgroup2: enable cgroup namespace by default
For cgroup v1, we were unable to change the default because of
compatibility issue.

For cgroup v2, we should change the default right now because switching
to cgroup v2 is already breaking change.

See also containers/libpod#4363 containers/libpod#4374

Privileged containers also use cgroupns=private by default.
https://github.com/containers/libpod/pull/4374#issuecomment-549776387

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-01-01 02:58:40 +09:00
Akihiro Suda
612343618d cgroup2: use shim V2
* Requires containerd binaries from containerd/containerd#3799 . Metrics are unimplemented yet.
* Works with crun v0.10.4, but `--security-opt seccomp=unconfined` is needed unless using master version of libseccomp
  ( containers/crun#156, seccomp/libseccomp#177 )
* Doesn't work with master runc yet
* Resource limitations are unimplemented

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-01-01 02:58:40 +09:00
Yong Tang
f09dc2f4fc Fix docker crash when creating namespaces with UID in /etc/subuid and /etc/subgid
This fix tries to address the issue raised in 39353 where
docker crash when creating namespaces with UID in /etc/subuid and /etc/subgid.

The issue was that, mapping to `/etc/sub[u,g]id` in docker does not
allow numeric ID.

This fix fixes the issue by probing other combinations (uid:groupname, username:gid, uid:gid)
when normal username:groupname fails.

This fix fixes 39353.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
2019-11-07 20:17:11 +00:00
Sebastiaan van Stijn
9a7e96b5b7
Rename "v1" to "statsV1"
follow-up to 27552ceb15, where this
was left as a review comment, but the PR was already merged.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-11-01 16:18:06 +01:00
Sebastiaan van Stijn
27552ceb15
bump containerd/cgroups 5fbad35c2a7e855762d3c60f2e474ffcad0d470a
full diff: c4b9ac5c76...5fbad35c2a

- containerd/cgroups#82 Add go module support
- containerd/cgroups#96 Move metrics proto package to stats/v1
- containerd/cgroups#97 Allow overriding the default /proc folder in blkioController
- containerd/cgroups#98 Allows ignoring memory modules
- containerd/cgroups#99 Add Go 1.13 to Travis
- containerd/cgroups#100 stats/v1: export per-cgroup stats

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-10-31 01:09:12 +01:00
Sebastiaan van Stijn
05469b5fa2
daemon: add "isWindows" const
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-10-17 23:49:43 +02:00
Sebastiaan van Stijn
422067ba7b
Return "invalid parameter" when linking to non-existing container
Trying to link to a non-existing container is not valid, and should return an
"invalid parameter" (400) error. Returning a "not found" error in this situation
would make the client report the container's image could not be found.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-09-10 23:06:56 +02:00
Rob Gulewich
530f2d65c3 Explicity set Cgroup NS mode to "host" when running privileged
Signed-off-by: Rob Gulewich <rgulewich@netflix.com>
2019-08-23 11:27:27 -07:00
Sebastiaan van Stijn
1ea8b413d1
initBridgeDriver: minor cleanup and linting fixes
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-08-09 18:34:35 +02:00
Dominic
5f0231bca1
cast Dev and Rdev of Stat_t to uint64 for mips
Signed-off-by: Dominic <yindongchao@inspur.com>
Signed-off-by: Dominic Yin <yindongchao@inspur.com>
2019-08-01 20:22:49 +08:00
Michael Crosby
a4a1e57e9d
Merge pull request #39496 from cpuguy83/fix_missing_dir_cleanup_file
Ensure parent dir exists for mount cleanup file
2019-07-12 13:39:58 -04:00
Brian Goff
24ad2f486d Add (hidden) flags to set containerd namespaces
This allows our tests, which all share a containerd instance, to be a
bit more isolated by setting the containerd namespaces to the generated
daemon ID's rather than the default namespaces.

This came about because I found in some cases we had test daemons
failing to start (really very slow to start) because it was (seemingly)
processing events from other tests.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2019-07-11 17:27:48 -07:00
Brian Goff
7725b88edc Ensure parent dir exists for mount cleanup file
While investigating a test failure, I found this in the logs:

```
time="2019-07-04T15:06:32.622506760Z" level=warning msg="Error while setting daemon root propagation, this is not generally critical but may cause some functionality to not work or fallback to less desirable behavior" dir=/go/src/github.com/docker/docker/bundles/test-integration/d1285b8250308/root error="error writing file to signal mount cleanup on shutdown: open /tmp/dxr/d1285b8250308/unmount-on-shutdown: no such file or directory"
```

This path is generated from the daemon's exec-root, which appears to not
exist yet. This change just makes sure it exists before we try to write
a file.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2019-07-11 13:30:36 -07:00
Akihiro Suda
153466ba0a info: report cgroup driver as "none" when running rootless
Previously `docker info` had reported "cgroupfs" as the cgroup driver
but the driver wasn't actually used at all.

This PR reports "none" as the cgroup driver so as to avoid confusion.
e.g. kubeadm/kubelet will detect cgroupless-ness by checking this docker
info field. https://github.com/rootless-containers/usernetes/pull/97

Note that user still cannot specify `native.cgroupdriver=none` manually.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2019-06-03 00:11:21 +09:00
frankyang
b9f31912de bugfix: fetch the right device number which great than 255
Signed-off-by: frankyang <yyb196@gmail.com>
2019-05-16 15:32:59 +08:00
Rob Gulewich
072400fc4b Make cgroup namespaces configurable
This adds both a daemon-wide flag and a container creation property:
- Set the `CgroupnsMode: "host|private"` HostConfig property at
  container creation time to control what cgroup namespace the container
  is created in
- Set the `--default-cgroupns-mode=host|private` daemon flag to control
  what cgroup namespace containers are created in by default
- Set the default if the daemon flag is unset to "host", for backward
  compatibility
- Default to CgroupnsMode: "host" for client versions < 1.40

Signed-off-by: Rob Gulewich <rgulewich@netflix.com>
2019-05-07 10:22:16 -07:00
Sebastiaan van Stijn
ffa1728d4b
Normalize values for pids-limit
- Don't set `PidsLimit` when creating a container and
  no limit was set (or the limit was set to "unlimited")
- Don't set `PidsLimit` if the host does not have pids-limit
  support (previously "unlimited" was set).
- Do not generate a warning if the host does not have pids-limit
  support, but pids-limit was set to unlimited (having no
  limit set, or the limit set to "unlimited" is equivalent,
  so no warning is nescessary in that case).
- When updating a container, convert `0`, and `-1` to
  "unlimited" (`0`).

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-03-13 00:27:05 +01:00
Sebastiaan van Stijn
dd94555787
Merge pull request #32519 from darkowlzz/32443-docker-update-pids-limit
Add pids-limit support in docker update
2019-02-23 15:20:59 +01:00
Sunny Gogoi
74eb258ffb Add pids-limit support in docker update
- Adds updating PidsLimit in UpdateContainer().
- Adds setting PidsLimit in toContainerResources().

Signed-off-by: Sunny Gogoi <indiasuny000@gmail.com>
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2019-02-21 14:17:38 -08:00
Akihiro Suda
ec87479b7e allow running dockerd in an unprivileged user namespace (rootless mode)
Please refer to `docs/rootless.md`.

TLDR:
 * Make sure `/etc/subuid` and `/etc/subgid` contain the entry for you
 * `dockerd-rootless.sh --experimental`
 * `docker -H unix://$XDG_RUNTIME_DIR/docker.sock run ...`

Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
2019-02-04 00:24:27 +09:00
Akihiro Suda
2cb26cfe9c
Merge pull request #38301 from cyphar/waitgroup-limits
daemon: switch to semaphore-gated WaitGroup for startup tasks
2018-12-22 00:07:55 +09:00
Aleksa Sarai
5a52917e4d
daemon: switch to semaphore-gated WaitGroup for startup tasks
Many startup tasks have to run for each container, and thus using a
WaitGroup (which doesn't have a limit to the number of parallel tasks)
can result in Docker exceeding the NOFILE limit quite trivially. A more
optimal solution is to have a parallelism limit by using a semaphore.

In addition, several startup tasks were not parallelised previously
which resulted in very long startup times. According to my testing, 20K
dead containers resulted in ~6 minute startup times (during which time
Docker is completely unusable).

This patch fixes both issues, and the parallelStartupTimes factor chosen
(128 * NumCPU) is based on my own significant testing of the 20K
container case. This patch (on my machines) reduces the startup time
from 6 minutes to less than a minute (ideally this could be further
reduced by removing the need to scan all dead containers on startup --
but that's beyond the scope of this patchset).

In order to avoid the NOFILE limit problem, we also detect this
on-startup and if NOFILE < 2*128*NumCPU we will reduce the parallelism
factor to avoid hitting NOFILE limits (but also emit a warning since
this is almost certainly a mis-configuration).

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2018-12-21 21:51:02 +11:00
Sebastiaan van Stijn
f6002117a4
Extract container-config and container-hostconfig validation
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2018-12-19 13:09:12 +01:00
Sebastiaan van Stijn
b6e373c525
Rename verifyContainerResources to verifyPlatformContainerResources
This validation function is platform-specific; rename it to be
more explicit.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2018-12-19 10:24:09 +01:00
Sebastiaan van Stijn
e278678705
Remove unused argument from verifyPlatformContainerSettings
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2018-12-19 09:23:09 +01:00
Sebastiaan van Stijn
10c97b9357
Unify logging container validation warnings
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2018-12-19 09:15:21 +01:00
Sebastiaan van Stijn
2e23ef5350
Move port-publishing check to linux platform-check
Windows does not have host-mode networking, so on Windows, this
check was a no-op

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2018-12-18 22:46:05 +01:00
Sebastiaan van Stijn
57f1305e74
Move "OOM Kill disable" warning to the daemon
Disabling the oom-killer for a container without setting a memory limit
is dangerous, as it can result in the container consuming unlimited memory,
without the kernel being able to kill it. A check for this situation is curently
done in the CLI, but other consumers of the API won't receive this warning.

This patch adds a check for this situation to the daemon, so that all consumers
of the API will receive this warning.

This patch will have one side-effect; docker cli's that also perform this check
client-side will print the warning twice; this can be addressed by disabling
the cli-side check for newer API versions, but will generate a bit of extra
noise when using an older CLI.

With this patch applied (and a cli that does not take the new warning into account);

```
docker create --oom-kill-disable busybox
WARNING: OOM killer is disabled for the container, but no memory limit is set, this can result in the system running out of resources.
669933b9b237fa27da699483b5cf15355a9027050825146587a0e5be0d848adf

docker run --rm --oom-kill-disable busybox
WARNING: Disabling the OOM killer on containers without setting a '-m/--memory' limit may be dangerous.
WARNING: OOM killer is disabled for the container, but no memory limit is set, this can result in the system running out of resources.
```

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2018-12-18 22:30:56 +01:00
Andrew Hsu
78045a5419
use empty string as cgroup path to grab first find
Signed-off-by: Andrew Hsu <andrewhsu@docker.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2018-12-07 18:44:00 +01:00
Yong Tang
f023816608 Add memory.kernelTCP support for linux
This fix tries to address the issue raised in 37038 where
there were no memory.kernelTCP support for linux.

This fix add MemoryKernelTCP to HostConfig, and pass
the config to runtime-spec.

Additional test case has been added.

This fix fixes 37038.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
2018-11-26 21:03:08 +00:00
Justin Cormack
f8e876d761
Fix denial of service with large numbers in cpuset-cpus and cpuset-mems
Using a value such as `--cpuset-mems=1-9223372036854775807` would cause
`dockerd` to run out of memory allocating a map of the values in the
validation code. Set limits to the normal limit of the number of CPUs,
and improve the error handling.

Reported by Huawei PSIRT.

Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2018-10-05 15:09:02 +02:00
Tibor Vass
34eede0296 Remove 'docker-' prefix for containerd and runc binaries
This allows to run the daemon in environments that have upstream containerd installed.

Signed-off-by: Tibor Vass <tibor@docker.com>
2018-09-24 21:49:03 +00:00
Salahuddin Khan
763d839261 Add ADD/COPY --chown flag support to Windows
This implements chown support on Windows. Built-in accounts as well
as accounts included in the SAM database of the container are supported.

NOTE: IDPair is now named Identity and IDMappings is now named
IdentityMapping.

The following are valid examples:
ADD --chown=Guest . <some directory>
COPY --chown=Administrator . <some directory>
COPY --chown=Guests . <some directory>
COPY --chown=ContainerUser . <some directory>

On Windows an owner is only granted the permission to read the security
descriptor and read/write the discretionary access control list. This
fix also grants read/write and execute permissions to the owner.

Signed-off-by: Salahuddin Khan <salah@docker.com>
2018-08-13 21:59:11 -07:00
Sebastiaan van Stijn
3737194b9f
daemon/*.go: fix some Wrap[f]/Warn[f] errors
In particular, these two:
> daemon/daemon_unix.go:1129: Wrapf format %v reads arg #1, but call has 0 args
> daemon/kill.go:111: Warn call has possible formatting directive %s

and a few more.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2018-07-11 15:51:51 +02:00
Sebastiaan van Stijn
f23c00d870
Various code-cleanup
remove unnescessary import aliases, brackets, and so on.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2018-05-23 17:50:54 +02:00
Alessandro Boch
173b3c364e Allow user to control the default address pools
- Via daemon flag --default-address-pools base=<CIDR>,size=<int>

Signed-off-by: Elango Siva  <elango@docker.com>
2018-04-30 11:14:08 -04:00
Sebastiaan van Stijn
cf9c48bb3e
Merge pull request #36879 from cpuguy83/extra_unmount_check
Extra check before unmounting on shutdown
2018-04-20 17:08:11 -07:00
Brian Goff
6a70fd222b Move mount parsing to separate package.
This moves the platform specific stuff in a separate package and keeps
the `volume` package and the defined interfaces light to import.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-04-19 06:35:54 -04:00