0ct0pu5/moby

Author	SHA1	Message	Date
Akihiro Suda	51e3cd4761	statsV2: implement Failcnt Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2020-07-30 14:31:20 +09:00
Akihiro Suda	b8ca7de823	Deprecate KernelMemory Kernel memory limit is not supported on cgroup v2. Even on cgroup v1, kernel memory limit (`kmem.limit_in_bytes`) has been deprecated since kernel 5.4. `0158115f70` Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2020-07-24 20:44:29 +09:00
Brian Goff	260c26b7be	Merge pull request #41016 from kolyshkin/cgroup-init	2020-07-16 11:26:52 -07:00
Brian Goff	61b73ee714	Merge pull request #41182 from cpuguy83/runtime_configure_shim	2020-07-14 14:16:04 -07:00
Brian Goff	f63f73a4a8	Configure shims from runtime config In dockerd we already have a concept of a "runtime", which specifies the OCI runtime to use (e.g. runc). This PR extends that config to add containerd shim configuration. This option is only exposed within the daemon itself (cannot be configured in daemon.json). This is due to issues in supporting unknown shims which will require more design work. What this change allows us to do is keep all the runtime config in one place. So the default "runc" runtime will just have its already existing shim config codified within the runtime config alone. I've also added 2 more "stock" runtimes which are basically runc+shimv1 and runc+shimv2. These new runtime configurations are: - io.containerd.runtime.v1.linux - runc + v1 shim using the V1 shim API - io.containerd.runc.v2 - runc + shim v2 These names coincide with the actual names of the containerd shims. This allows the user to essentially control what shim is going to be used by either specifying these as a `--runtime` on container create or by setting `--default-runtime` on the daemon. For custom/user-specified runtimes, the default shim config (currently shim v1) is used. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2020-07-13 14:18:02 -07:00
Sebastiaan van Stijn	d2e23405be	Set minimum memory limit to 6M, to account for higher startup memory use For some time, we defined a minimum limit for `--memory` limits to account for overhead during startup, and to supply a reasonable functional container. Changes in the runtime (runc) introduced a higher memory footprint during container startup, which now lead to obscure error-messages that are unfriendly for users: run --rm --memory=4m alpine echo success docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:415: setting cgroup config for procHooks process caused \\\"failed to write \\\\\\\"4194304\\\\\\\" to \\\\\\\"/sys/fs/cgroup/memory/docker/1254c8d63f85442e599b17dff895f4543c897755ee3bd9b56d5d3d17724b38d7/memory.limit_in_bytes\\\\\\\": write /sys/fs/cgroup/memory/docker/1254c8d63f85442e599b17dff895f4543c897755ee3bd9b56d5d3d17724b38d7/memory.limit_in_bytes: device or resource busy\\\"\"": unknown. ERRO[0000] error waiting for container: context canceled Containers that fail to start because of this limit, will not be marked as OOMKilled, which makes it harder for users to find the cause of the failure. Note that _after_ this memory is only required during startup of the container. After the container was started, the container may not consume this memory, and limits could (manually) be lowered, for example, an alpine container running only a shell can run with 512k of memory; echo 524288 > /sys/fs/cgroup/memory/docker/acdd326419f0898be63b0463cfc81cd17fb34d2dae6f8aa3768ee6a075ca5c86/memory.limit_in_bytes However, restarting the container will reset that manual limit to the container's configuration. 
While `docker container update` would allow for the updated limit to be persisted, (re)starting the container after updating produces the same error message again, so we cannot use different limits for `docker run` / `docker create` and `docker update`. This patch raises the minimum memory limit to 6M, so that a better error-message is produced if a user tries to create a container with a memory-limit that is too low: docker create --memory=4m alpine echo success docker: Error response from daemon: Minimum memory limit allowed is 6MB. Possibly, this constraint could be handled by runc, so that different runtimes could set a best-matching limit (other runtimes may require less overhead). Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2020-07-01 13:29:07 +02:00
Kir Kolyshkin	e3cff19dd1	Untangle CPU RT controller init Commit `56f77d5ade` added code that is doing some very ugly things. In particular, calling cgroups.FindCgroupMountpointAndRoot() and daemon.SysInfoRaw() inside a recursively-called initCgroupsPath() is not a good thing to do. This commit tries to partially untangle this by moving some expensive checks and calls earlier, in a minimally invasive way (meaning I tried hard to not break any logic, however weird it is). This also removes double call to MkdirAll (not important, but it sticks out) and renames the function to better reflect what it's doing. Finally, this wraps some of the errors returned, and fixes the init function to not ignore the error from itself. This could be reworked more radically, but at least with this commit we are calling expensive functions once, and only if necessary. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-06-26 16:19:52 -07:00
Kir Kolyshkin	afbeaf6f29	pkg/sysinfo: rm duplicates The CPU CFS cgroup-aware scheduler is one single kernel feature, not two, so it does not make sense to have two separate booleans (CPUCfsQuota and CPUCfsPeriod). Merge these into CPUCfs. Same for CPU realtime. For compatibility reasons, /info stays the same for now. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-06-26 16:19:52 -07:00
Sebastiaan van Stijn	4534a7afc3	daemon: use containerd/sys to detect UserNamespaces The implementation in libcontainer/system is quite complicated, and we only use it to detect if user-namespaces are enabled. In addition, the implementation in containerd uses a sync.Once, so that detection (and reading/parsing `/proc/self/uid_map`) is only performed once. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2020-06-15 13:06:08 +02:00
Sebastiaan van Stijn	3aac5f0bbb	Merge pull request #41018 from akhilerm/identity-mapping remove group name from identity mapping	2020-06-08 15:15:05 +02:00
Akhil Mohan	7ad0da7051	remove group name from identity mapping NewIdentityMapping took group name as an argument, and used the group name also to parse the /etc/sub{uid,gid}. But as per linux man pages, the sub{uid,gid} file maps username or uid, not a group name. Therefore, all occurrences where mapping is used need to consider only username and uid. Code trying to map using gid and group name in the daemon is also removed. Signed-off-by: Akhil Mohan <akhil.mohan@mayadata.io>	2020-06-03 20:04:42 +05:30
Brian Goff	763f9e799b	Merge pull request #40846 from AkihiroSuda/cgroup2-use-systemd-by-default cgroup2: use "systemd" cgroup driver by default when available	2020-05-28 11:37:39 -07:00
Sebastiaan van Stijn	b453b64d04	Merge pull request #40845 from AkihiroSuda/allow-privileged-cgroupns-private-on-cgroup-v1 support `--privileged --cgroupns=private` on cgroup v1	2020-05-07 21:11:42 +02:00
Brian Goff	f6163d3f7a	Merge pull request #40673 from kolyshkin/scan Simplify daemon.overlaySupportsSelinux(), fix use of bufio.Scanner.Err()	2020-04-29 17:18:37 -07:00
Akihiro Suda	4714ab5d6c	cgroup2: use "systemd" cgroup driver by default when available The "systemd" cgroup driver is always preferred over "cgroupfs" on systemd-based hosts. This commit does not affect cgroup v1 hosts. Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2020-04-22 05:13:37 +09:00
Akihiro Suda	33ee7941d4	support `--privileged --cgroupns=private` on cgroup v1 Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2020-04-21 23:11:32 +09:00
Akihiro Suda	f350b53241	cgroup2: implement `docker info` ref: https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2020-04-17 07:20:01 +09:00
Sebastiaan van Stijn	eb14d936bf	daemon: rename variables that collide with imported package names Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2020-04-14 17:22:23 +02:00
Sebastiaan van Stijn	5d040cbd16	daemon: fix capitalization of some functions Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2020-04-14 17:22:19 +02:00
Sebastiaan van Stijn	af0415257e	Merge pull request #40694 from kolyshkin/moby-sys-mount-part-II switch to moby/sys/{mount,mountinfo} part II	2020-04-02 21:52:21 +02:00
Akihiro Suda	3802830989	cgroup2: implement `docker stats` The following fields are unsupported: * BlkioStats: all fields other than IoServiceBytesRecursive * CPUStats: CPUUsage.PercpuUsage * MemoryStats: MaxUsage and Failcnt Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2020-04-02 17:51:34 +09:00
Kir Kolyshkin	5b658a0348	daemon.overlaySupportsSelinux: simplify check 1. Sscanf is very slow, and we don't use the first two fields -- get rid of it. 2. Since the field we search for is at the end of line and preceded by a space, we can just use strings.HasSuffix. 3. Error checking for bufio.Scanner should be done after the Scan() loop, not inside it. Fixes: `885b29df09` Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-03-31 14:32:42 -07:00
Kir Kolyshkin	39048cf656	Really switch to moby/sys/mount* Switch to moby/sys/mount and mountinfo. Keep the pkg/mount for potential outside users. This commit was generated by the following bash script: ``` set -e -u -o pipefail for file in $(git grep -l 'docker/docker/pkg/mount"' | grep -v ^pkg/mount); do sed -i -e 's#/docker/docker/pkg/mount"#/moby/sys/mount"#' \ -e 's#mount\.\(GetMounts\|Mounted\|Info\|[A-Za-z]*Filter\)#mountinfo.\1#g' \ $file goimports -w $file done ``` Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-03-20 09:46:25 -07:00
Akihiro Suda	92e7f8f67c	daemon: fail early if rootless && cgroupdriver == "systemd" && cgroup v1 Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2020-03-11 12:49:03 +09:00
Akihiro Suda	ca4b51868a	rootless: support `--exec-opt native.cgroupdriver=systemd` Support cgroup as in Rootless Podman. Requires cgroup v2 host with crun. Tested with Ubuntu 19.10 (kernel 5.3, systemd 242), crun v0.12.1. Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2020-02-14 15:32:31 +09:00
Arko Dasgupta	f800d5f786	Set the bip network value as the subnet Don't assign the --bip value directly to the subnet for the default bridge. Instead, use the network value from the ParseCIDR output. Addresses: https://github.com/moby/moby/issues/40392 Signed-off-by: Arko Dasgupta <arko.dasgupta@docker.com>	2020-02-10 17:38:54 -08:00
Sebastiaan van Stijn	ca20bc4214	Merge pull request #40007 from arkodg/add-host-docker-internal Support host.docker.internal in dockerd on Linux	2020-01-27 13:42:26 +01:00
Arko Dasgupta	92e809a680	Support host.docker.internal in dockerd on Linux Docker Desktop (on Mac and Windows hosts) allows containers running inside a Linux VM to connect to the host using the host.docker.internal DNS name, which is implemented by VPNkit (DNS proxy on the host). This PR allows containers to connect to Linux hosts by appending a special string "host-gateway" to --add-host, e.g. "--add-host=host.docker.internal:host-gateway", which adds a host.docker.internal DNS entry in /etc/hosts and maps it to host-gateway-ip. This PR also adds a daemon flag called host-gateway-ip, which defaults to the default bridge IP. Docker Desktop will need to set this field to the Host Proxy IP so DNS requests for host.docker.internal can be routed to VPNkit. Addresses: https://github.com/docker/for-linux/issues/264 Signed-off-by: Arko Dasgupta <arko.dasgupta@docker.com>	2020-01-22 13:30:00 -08:00
Sebastiaan van Stijn	be095a1859	Merge pull request #40366 from arkodg/check-cidr-ipv6 Handle the error case when fixed-cidr-ipv6 is empty and ipv6 is enabled	2020-01-14 13:53:45 +01:00
Arko Dasgupta	bdad16b0ee	Handle error case when fixed-cidr-ipv6 is empty When IPv6 is enabled, make sure fixed-cidr-ipv6 is set by the user since there is no default IPv6 local subnet in the IPAM Signed-off-by: Arko Dasgupta <arko.dasgupta@docker.com>	2020-01-13 09:56:41 -08:00
Akihiro Suda	491531c12b	cgroup2: mark cpu-rt-{period,runtime} unimplemented Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2020-01-01 02:58:40 +09:00
Akihiro Suda	19baeaca26	cgroup2: enable cgroup namespace by default For cgroup v1, we were unable to change the default because of compatibility issue. For cgroup v2, we should change the default right now because switching to cgroup v2 is already breaking change. See also containers/libpod#4363 containers/libpod#4374 Privileged containers also use cgroupns=private by default. https://github.com/containers/libpod/pull/4374#issuecomment-549776387 Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2020-01-01 02:58:40 +09:00
Akihiro Suda	612343618d	cgroup2: use shim V2 * Requires containerd binaries from containerd/containerd#3799 . Metrics are unimplemented yet. * Works with crun v0.10.4, but `--security-opt seccomp=unconfined` is needed unless using master version of libseccomp ( containers/crun#156, seccomp/libseccomp#177 ) * Doesn't work with master runc yet * Resource limitations are unimplemented Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2020-01-01 02:58:40 +09:00
Yong Tang	f09dc2f4fc	Fix docker crash when creating namespaces with UID in /etc/subuid and /etc/subgid This fix tries to address the issue raised in 39353 where docker crashes when creating namespaces with UID in /etc/subuid and /etc/subgid. The issue was that mapping to `/etc/sub[u,g]id` in docker does not allow numeric IDs. This fix addresses the issue by probing other combinations (uid:groupname, username:gid, uid:gid) when the normal username:groupname lookup fails. This fix fixes 39353. Signed-off-by: Yong Tang <yong.tang.github@outlook.com>	2019-11-07 20:17:11 +00:00
Sebastiaan van Stijn	9a7e96b5b7	Rename "v1" to "statsV1" follow-up to `27552ceb15`, where this was left as a review comment, but the PR was already merged. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2019-11-01 16:18:06 +01:00
Sebastiaan van Stijn	27552ceb15	bump containerd/cgroups 5fbad35c2a7e855762d3c60f2e474ffcad0d470a full diff: `c4b9ac5c76...5fbad35c2a` - containerd/cgroups#82 Add go module support - containerd/cgroups#96 Move metrics proto package to stats/v1 - containerd/cgroups#97 Allow overriding the default /proc folder in blkioController - containerd/cgroups#98 Allows ignoring memory modules - containerd/cgroups#99 Add Go 1.13 to Travis - containerd/cgroups#100 stats/v1: export per-cgroup stats Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2019-10-31 01:09:12 +01:00
Sebastiaan van Stijn	05469b5fa2	daemon: add "isWindows" const Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2019-10-17 23:49:43 +02:00
Sebastiaan van Stijn	422067ba7b	Return "invalid parameter" when linking to non-existing container Trying to link to a non-existing container is not valid, and should return an "invalid parameter" (400) error. Returning a "not found" error in this situation would make the client report the container's image could not be found. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2019-09-10 23:06:56 +02:00
Rob Gulewich	530f2d65c3	Explicitly set Cgroup NS mode to "host" when running privileged Signed-off-by: Rob Gulewich <rgulewich@netflix.com>	2019-08-23 11:27:27 -07:00
Sebastiaan van Stijn	1ea8b413d1	initBridgeDriver: minor cleanup and linting fixes Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2019-08-09 18:34:35 +02:00
Dominic	5f0231bca1	cast Dev and Rdev of Stat_t to uint64 for mips Signed-off-by: Dominic <yindongchao@inspur.com> Signed-off-by: Dominic Yin <yindongchao@inspur.com>	2019-08-01 20:22:49 +08:00
Michael Crosby	a4a1e57e9d	Merge pull request #39496 from cpuguy83/fix_missing_dir_cleanup_file Ensure parent dir exists for mount cleanup file	2019-07-12 13:39:58 -04:00
Brian Goff	24ad2f486d	Add (hidden) flags to set containerd namespaces This allows our tests, which all share a containerd instance, to be a bit more isolated by setting the containerd namespaces to the generated daemon ID's rather than the default namespaces. This came about because I found in some cases we had test daemons failing to start (really very slow to start) because it was (seemingly) processing events from other tests. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2019-07-11 17:27:48 -07:00
Brian Goff	7725b88edc	Ensure parent dir exists for mount cleanup file While investigating a test failure, I found this in the logs: ``` time="2019-07-04T15:06:32.622506760Z" level=warning msg="Error while setting daemon root propagation, this is not generally critical but may cause some functionality to not work or fallback to less desirable behavior" dir=/go/src/github.com/docker/docker/bundles/test-integration/d1285b8250308/root error="error writing file to signal mount cleanup on shutdown: open /tmp/dxr/d1285b8250308/unmount-on-shutdown: no such file or directory" ``` This path is generated from the daemon's exec-root, which appears to not exist yet. This change just makes sure it exists before we try to write a file. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2019-07-11 13:30:36 -07:00
Akihiro Suda	153466ba0a	info: report cgroup driver as "none" when running rootless Previously `docker info` had reported "cgroupfs" as the cgroup driver but the driver wasn't actually used at all. This PR reports "none" as the cgroup driver so as to avoid confusion. e.g. kubeadm/kubelet will detect cgroupless-ness by checking this docker info field. https://github.com/rootless-containers/usernetes/pull/97 Note that user still cannot specify `native.cgroupdriver=none` manually. Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2019-06-03 00:11:21 +09:00
frankyang	b9f31912de	bugfix: fetch the right device number when it is greater than 255 Signed-off-by: frankyang <yyb196@gmail.com>	2019-05-16 15:32:59 +08:00
Rob Gulewich	072400fc4b	Make cgroup namespaces configurable This adds both a daemon-wide flag and a container creation property: - Set the `CgroupnsMode: "host|private"` HostConfig property at container creation time to control what cgroup namespace the container is created in - Set the `--default-cgroupns-mode=host|private` daemon flag to control what cgroup namespace containers are created in by default - Set the default if the daemon flag is unset to "host", for backward compatibility - Default to CgroupnsMode: "host" for client versions < 1.40 Signed-off-by: Rob Gulewich <rgulewich@netflix.com>	2019-05-07 10:22:16 -07:00
Sebastiaan van Stijn	ffa1728d4b	Normalize values for pids-limit - Don't set `PidsLimit` when creating a container and no limit was set (or the limit was set to "unlimited") - Don't set `PidsLimit` if the host does not have pids-limit support (previously "unlimited" was set). - Do not generate a warning if the host does not have pids-limit support, but pids-limit was set to unlimited (having no limit set, or the limit set to "unlimited" is equivalent, so no warning is necessary in that case). - When updating a container, convert `0`, and `-1` to "unlimited" (`0`). Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2019-03-13 00:27:05 +01:00
Sebastiaan van Stijn	dd94555787	Merge pull request #32519 from darkowlzz/32443-docker-update-pids-limit Add pids-limit support in docker update	2019-02-23 15:20:59 +01:00
Sunny Gogoi	74eb258ffb	Add pids-limit support in docker update - Adds updating PidsLimit in UpdateContainer(). - Adds setting PidsLimit in toContainerResources(). Signed-off-by: Sunny Gogoi <indiasuny000@gmail.com> Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2019-02-21 14:17:38 -08:00


366 commits
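
Note on commit f800d5f786: the distinction between a `--bip` style value and its network can be seen directly with Go's standard `net.ParseCIDR`, which returns both the address and the network it belongs to. This is a minimal sketch under that assumption; `splitBridgeIP` and the sample address are hypothetical and do not reproduce the daemon's bridge setup code.

```go
package main

import (
	"fmt"
	"net"
)

// splitBridgeIP is a hypothetical helper (not the daemon's function).
// Given a value in CIDR form, it returns the host address and the
// network (subnet) separately.
func splitBridgeIP(bip string) (addr string, subnet string, err error) {
	ip, network, err := net.ParseCIDR(bip)
	if err != nil {
		return "", "", err
	}
	// ip keeps the host bits; network has them masked off, which is
	// the form a subnet is expected to have.
	return ip.String(), network.String(), nil
}

func main() {
	addr, subnet, err := splitBridgeIP("192.168.1.5/24")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("address:", addr)
	fmt.Println("subnet: ", subnet)
}
```

Using the raw input string as the subnet would carry the host bits along, which is the situation the commit message says to avoid.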