beenull/moby

Author	SHA1	Message	Date
Brian Goff	93ac040bf0	Lock down docker root dir perms. Do not use 0701 perms. 0701 dir perms allows anyone to traverse the docker dir. It happens to allow any user to execute, as an example, suid binaries from image rootfs dirs because it allows traversal AND critically container users need to be able to do execute things. 0701 on lower directories also happens to allow any user to modify things in, for instance, the overlay upper dir which neccessarily has 0755 permissions. This changes to use 0710 which allows users in the group to traverse. In userns mode the UID owner is (real) root and the GID is the remapped root's GID. This prevents anyone but the remapped root to traverse our directories (which is required for userns with runc). Signed-off-by: Brian Goff <cpuguy83@gmail.com> (cherry picked from commit ef7237442147441a7cadcda0600be1186d81ac73) Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2021-08-19 20:40:15 +00:00
Brian Goff	df2a989769	Add shim config for custom runtimes for plugins This fixes a panic when an admin specifies a custom default runtime, when a plugin is started the shim config is nil. Signed-off-by: Brian Goff <cpuguy83@gmail.com> (cherry picked from commit `2903863a1d`) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2021-02-17 21:20:03 +01:00
Brian Goff	e908cc3901	Use real root with 0701 perms Various dirs in /var/lib/docker contain data that needs to be mounted into a container. For this reason, these dirs are set to be owned by the remapped root user, otherwise there can be permissions issues. However, this uneccessarily exposes these dirs to an unprivileged user on the host. Instead, set the ownership of these dirs to the real root (or rather the UID/GID of dockerd) with 0701 permissions, which allows the remapped root to enter the directories but not read/write to them. The remapped root needs to enter these dirs so the container's rootfs can be configured... e.g. to mount /etc/resolve.conf. This prevents an unprivileged user from having read/write access to these dirs on the host. The flip side of this is now any user can enter these directories. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2021-01-26 17:23:32 +00:00
Brian Goff	bfedd27259	Do not set DOCKER_TMP to be owned by remapped root The remapped root does not need access to this dir. Having this owned by the remapped root opens the host up to an uprivileged user on the host being able to escalate privileges. While it would not be normal for the remapped UID to be used outside of the container context, it could happen. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2021-01-26 17:23:32 +00:00
Brian Goff	4a175fd050	Cleanup container shutdown check and add test Adds a test case for the case where dockerd gets stuck on startup due to hanging `daemon.shutdownContainer` Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2020-12-23 16:59:03 +00:00
Simon Ferquel	af0665861b	Fix a potential hang when starting after a non-clean shutdown Previous startup sequence used to call "containerStop" on containers that were persisted with a running state but are not alive when restarting (can happen on non-clean shutdown). This call was made before fixing-up the RunningState of the container, and tricked the daemon to trying to kill a non-existing process and ultimately hang. The fix is very simple - just add a condition on calling containerStop. Signed-off-by: Simon Ferquel <simon.ferquel@docker.com>	2020-12-18 10:20:56 +01:00
Sebastiaan van Stijn	7e600eaae0	daemon: improve log messages during startup / shutdown Consistently set "container ID" as a field for log messages, so that logs can be associated with a container. With this logs look like; INFO[2020-12-15T12:30:46.239329903Z] Loading containers: start. DEBU[2020-12-15T12:30:46.239919357Z] processing event stream module=libcontainerd namespace=moby DEBU[2020-12-15T12:30:46.242061458Z] loaded container container=622dec5f737d532da347bc627655ebc351fa5887476e8b8c33e5fbc5d0e48b5c paused=false running=false DEBU[2020-12-15T12:30:46.242185251Z] loaded container container=47f348160645f46a17c758d120dec600967eed4adf08dd28b809725971d062cc paused=false running=false DEBU[2020-12-15T12:30:46.242912375Z] loaded container container=e29c34c14b84810bc1e6cb6978a81e863601bfbe9ffe076c07dd5f6a439289d6 paused=false running=false DEBU[2020-12-15T12:30:46.243165260Z] loaded container container=31d40ee3e591a50ebee790b08c2bec751610d2eca51ca1a371ea1ff66ea46c1d paused=false running=false DEBU[2020-12-15T12:30:46.243585164Z] loaded container container=03dd5b1dc251a12d2e74eb54cb3ace66c437db228238a8d4831a264c9313c192 paused=false running=false DEBU[2020-12-15T12:30:46.244870764Z] loaded container container=b774141975cc511cc61fc5f374793503bb2e8fa774d6580ac47111a089de1b9b paused=false running=false DEBU[2020-12-15T12:30:46.245140276Z] loaded container container=b8a7229824fb84ff6f5af537a8ba987d106bf9a24a9aad3b628605d26b3facc4 paused=false running=false DEBU[2020-12-15T12:30:46.245457025Z] loaded container container=b3256ff87fc6f243d9e044fb3d7988ef61c86bfb957d90c0227e8a9697ffa49c paused=false running=false DEBU[2020-12-15T12:30:46.292515417Z] restoring container container=b3256ff87fc6f243d9e044fb3d7988ef61c86bfb957d90c0227e8a9697ffa49c paused=false running=false DEBU[2020-12-15T12:30:46.292612379Z] restoring container container=31d40ee3e591a50ebee790b08c2bec751610d2eca51ca1a371ea1ff66ea46c1d paused=false running=false DEBU[2020-12-15T12:30:46.292573767Z] restoring container container=b8a7229824fb84ff6f5af537a8ba987d106bf9a24a9aad3b628605d26b3facc4 paused=false running=false DEBU[2020-12-15T12:30:46.292602437Z] restoring container container=b774141975cc511cc61fc5f374793503bb2e8fa774d6580ac47111a089de1b9b paused=false running=false DEBU[2020-12-15T12:30:46.305032730Z] restoring container container=47f348160645f46a17c758d120dec600967eed4adf08dd28b809725971d062cc paused=false running=false DEBU[2020-12-15T12:30:46.305421360Z] restoring container container=622dec5f737d532da347bc627655ebc351fa5887476e8b8c33e5fbc5d0e48b5c paused=false running=false DEBU[2020-12-15T12:30:46.305558773Z] restoring container container=03dd5b1dc251a12d2e74eb54cb3ace66c437db228238a8d4831a264c9313c192 paused=false running=false DEBU[2020-12-15T12:30:46.307662990Z] restoring container container=e29c34c14b84810bc1e6cb6978a81e863601bfbe9ffe076c07dd5f6a439289d6 paused=false running=false ... INFO[2020-12-15T12:30:46.536506204Z] Loading containers: done. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2020-12-15 15:57:39 +01:00
Brian Goff	9ca3bb632e	Store image manifests in containerd content store This allows us to cache manifests and avoid extra round trips to the registry for content we already know about. dockerd currently does not support containerd on Windows, so this does not store manifests on Windows, yet. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2020-11-05 20:02:18 +00:00
Sebastiaan van Stijn	5ca758199d	replace pkg/locker with github.com/moby/locker Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2020-09-10 22:15:40 +02:00
Brian Goff	f63f73a4a8	Configure shims from runtime config In dockerd we already have a concept of a "runtime", which specifies the OCI runtime to use (e.g. runc). This PR extends that config to add containerd shim configuration. This option is only exposed within the daemon itself (cannot be configured in daemon.json). This is due to issues in supporting unknown shims which will require more design work. What this change allows us to do is keep all the runtime config in one place. So the default "runc" runtime will just have it's already existing shim config codified within the runtime config alone. I've also added 2 more "stock" runtimes which are basically runc+shimv1 and runc+shimv2. These new runtime configurations are: - io.containerd.runtime.v1.linux - runc + v1 shim using the V1 shim API - io.containerd.runc.v2 - runc + shim v2 These names coincide with the actual names of the containerd shims. This allows the user to essentially control what shim is going to be used by either specifying these as a `--runtime` on container create or by setting `--default-runtime` on the daemon. For custom/user-specified runtimes, the default shim config (currently shim v1) is used. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2020-07-13 14:18:02 -07:00
Sebastiaan van Stijn	4534a7afc3	daemon: use containerd/sys to detect UserNamespaces The implementation in libcontainer/system is quite complicated, and we only use it to detect if user-namespaces are enabled. In addition, the implementation in containerd uses a sync.Once, so that detection (and reading/parsing `/proc/self/uid_map`) is only performed once. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2020-06-15 13:06:08 +02:00
Tonis Tiigi	fdb71e410c	registry: fix mtls config dir passing Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>	2020-05-14 12:02:09 -07:00
Akihiro Suda	f350b53241	cgroup2: implement `docker info` ref: https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2020-04-17 07:20:01 +09:00
Sebastiaan van Stijn	db669cd117	Merge pull request #40814 from tonistiigi/buildkit-update vendor: update buildkit to ae7ff174	2020-04-15 21:18:51 +02:00
Akihiro Suda	2e5923c547	Merge pull request #39705 from thaJeztah/daemon_nits daemon: various nits and small fixes	2020-04-15 09:11:25 +09:00
Tonis Tiigi	0cdf6ba9c8	vendor: update buildkit to ae7ff174 Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>	2020-04-14 08:26:07 -07:00
Sebastiaan van Stijn	eb14d936bf	daemon: rename variables that collide with imported package names Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2020-04-14 17:22:23 +02:00
Brian Goff	ced91bee4b	On startup, actually shutdown the container. When a container is left running after the daemon exits (e.g. the daemon is SIGKILL'd or crashes), it should stop any running containers when the daemon starts back up. What actually happens is the daemon only sends the container's configured stop signal and does not check if it has exited. If the container does not actually exit then it is left running. This fixes this unexpected behavior by calling the same function to shut down the container that the daemon shutdown process does. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2020-04-13 14:20:12 -07:00
Akihiro Suda	f073ed4187	Merge pull request #39816 from tao12345666333/rm-systeminfo-error-handler Remove `SystemInfo()` error handling.	2020-03-13 01:59:13 +09:00
Akihiro Suda	9a82a9a8ea	vendor containerd, BuildKit, protobuf, grpc, and golang.org/x Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2020-03-03 10:25:20 +09:00
Brian Goff	62bd5a33f7	Merge pull request #40137 from fuweid/me-wait-for-remote-containerd-before-reload daemon: add grpc.WithBlock option	2020-02-21 10:11:10 -08:00
Akihiro Suda	612343618d	cgroup2: use shim V2 * Requires containerd binaries from containerd/containerd#3799 . Metrics are unimplemented yet. * Works with crun v0.10.4, but `--security-opt seccomp=unconfined` is needed unless using master version of libseccomp ( containers/crun#156, seccomp/libseccomp#177 ) * Doesn't work with master runc yet * Resource limitations are unimplemented Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2020-01-01 02:58:40 +09:00
Brian Goff	47c5c67ed8	Merge pull request #40032 from jmartin84/fix-grpc-withdialer-deprecation-warning Fix grpc withdialer deprecation warning	2019-11-05 12:20:33 -08:00
Wei Fu	9f73396dab	daemon: add grpc.WithBlock option WithBlock makes sure that the following containerd request is reliable. In one edge case with high load pressure, kernel kills dockerd, containerd and containerd-shims caused by OOM. When both dockerd and containerd restart, but containerd will take time to recover all the existing containers. Before containerd serving, dockerd will failed with gRPC error. That bad thing is that restore action will still ignore the any non-NotFound errors and returns running state for already stopped container. It is unexpected behavior. And we need to restart dockerd to make sure that anything is OK. It is painful. Add WithBlock can prevent the edge case. And n common case, the containerd will be serving in shortly. It is not harm to add WithBlock for containerd connection. Signed-off-by: Wei Fu <fuweid89@gmail.com>	2019-10-25 12:19:35 +08:00
Sebastiaan van Stijn	05469b5fa2	daemon: add "isWindows" const Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2019-10-17 23:49:43 +02:00
Justen Martin	3b49bd1d84	replaced call to deprecated grpc method WithDialer with WithContextDialer Signed-off-by: Justen Martin <jmart@the-coder.com>	2019-10-10 15:34:42 -05:00
Derek McGowan	bc5484d2dd	bump moby/buildkit f7042823e340d38d1746aa675b83d1aca431cee3 full diff: `588c73e1e4...f7042823e3` Signed-off-by: Sebastiaan van Stijn <github@gone.nl> fix daemon for changes in containerd registry configuration Signed-off-by: Evan Hazlett <ejhazlett@gmail.com> Update buildernext and daemon for buildkit update Signed-off-by: Derek McGowan <derek@mcgstyle.net>	2019-10-04 15:05:35 -07:00
Lukas Heeren	ce61a1ed98	Adding ability to change max download attempts Moby works perfectly when you are in a situation when one has a good and stable internet connection. Operating in area's where internet connectivity is likely to be lost in undetermined intervals, like a satellite connection or 4G/LTE in rural area's, can become a problem when pulling a new image. When connection is lost while image layers are being pulled, Moby will try to reconnect up to 5 times. If this fails, the incompletely downloaded layers are lost will need to be completely downloaded again during the next pull request. This means that we are using more data than we might have to. Pulling a layer multiple times from the start can become costly over a satellite or 4G/LTE connection. As these techniques (especially 4G) quite common in IoT and Moby is used to run Azure IoT Edge devices, I would like to add a settable maximum download attempts. The maximum download attempts is currently set at 5 (distribution/xfer/download.go). I would like to change this constant to a variable that the user can set. The default will still be 5, so nothing will change from the current version unless specified when starting the daemon with the added flag or in the config file. I added a default value of 5 for DefaultMaxDownloadAttempts and a settable max-download-attempts in the daemon config file. It is also added to the config of dockerd so it can be set with a flag when starting the daemon. This value gets stored in the imageService of the daemon when it is initiated and can be passed to the NewLayerDownloadManager as a parameter. It will be stored in the LayerDownloadManager when initiated. This enables us to set the max amount of retries in makeDownoadFunc equal to the max download attempts. I also added some tests that are based on maxConcurrentDownloads/maxConcurrentUploads. You can pull this version and test in a development container. Either create a config `file /etc/docker/daemon.json` with `{"max-download-attempts"=3}``, or use `dockerd --max-download-attempts=3 -D &` to start up the dockerd. Start downloading a container and disconnect from the internet whilst downloading. The result would be that it stops pulling after three attempts. Signed-off-by: Lukas Heeren <lukas-heeren@hotmail.com> Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2019-09-19 13:51:40 +02:00
Jintao Zhang	9134130b39	Remove `SystemInfo()` error handling. Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com>	2019-08-29 07:44:39 +08:00
Tibor Vass	32688a47f3	Merge pull request #39699 from thaJeztah/mkdirall_dropin Allow system.MkDirAll() to be used as drop-in for os.MkDirAll()	2019-08-27 16:27:53 -07:00
Deep Debroy	4d5b6260bc	Fix regression in handling of NotFound err during startup Signed-off-by: Deep Debroy <ddebroy@docker.com>	2019-08-08 16:58:52 -07:00
Sebastiaan van Stijn	e554ab5589	Allow system.MkDirAll() to be used as drop-in for os.MkDirAll() also renamed the non-windows variant of this file to be consistent with other files in this package Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2019-08-08 15:05:49 +02:00
Sebastiaan van Stijn	bad0b4e604	Remove skip evaluation of symlinks to data root on IoT Core This fix was added in `8e71b1e210` to work around a go issue (https://github.com/golang/go/issues/20506). That issue was fixed in `66c03d39f3`, which is part of Go 1.10 and up. This reverts the changes that were made in `8e71b1e210`, and are no longer needed. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2019-07-13 23:44:51 +02:00
Brian Goff	24ad2f486d	Add (hidden) flags to set containerd namespaces This allows our tests, which all share a containerd instance, to be a bit more isolated by setting the containerd namespaces to the generated daemon ID's rather than the default namespaces. This came about because I found in some cases we had test daemons failing to start (really very slow to start) because it was (seemingly) processing events from other tests. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2019-07-11 17:27:48 -07:00
Tibor Vass	f695e98cb7	Revert "Remove the rest of v1 manifest support" This reverts commit `98fc09128b` in order to keep registry v2 schema1 handling and libtrust-key-based engine ID. Because registry v2 schema1 was not officially deprecated and registries are still relying on it, this patch puts its logic back. However, registry v1 relics are not added back since v1 logic has been removed a while ago. This also fixes an engine upgrade issue in a swarm cluster. It was relying on the Engine ID to be the same upon upgrade, but the mentioned commit modified the logic to use UUID and from a different file. Since the libtrust key is always needed to support v2 schema1 pushes, that the old engine ID is based on the libtrust key, and that the engine ID needs to be conserved across upgrades, adding a UUID-based engine ID logic seems to add more complexity than it solves the problems. Hence reverting the engine ID changes as well. Signed-off-by: Tibor Vass <tibor@docker.com>	2019-06-18 00:36:01 +00:00
Sebastiaan van Stijn	28678f2226	Merge pull request #38349 from wk8/wk8/os_version Adding OS version info to nodes' `Info` struct and to the system info's API	2019-06-07 14:54:51 +02:00
Sebastiaan van Stijn	c85fe2d224	Merge pull request #38522 from cpuguy83/fix_timers Make sure timers are stopped after use.	2019-06-07 13:16:46 +02:00
Jean Rouge	d363a1881e	Adding OS version info to the nodes' `Info` struct This is needed so that we can add OS version constraints in Swarmkit, which does require the engine to report its host's OS version (see https://github.com/docker/swarmkit/issues/2770). The OS version is parsed from the `os-release` file on Linux, and from the `ReleaseId` string value of the `SOFTWARE\Microsoft\Windows NT\CurrentVersion` registry key on Windows. Added unit tests when possible, as well as Prometheus metrics. Signed-off-by: Jean Rouge <rougej+github@gmail.com>	2019-06-06 22:40:10 +00:00
Michael Crosby	b9b5dc37e3	Remove inmemory container map Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2019-04-05 15:48:07 -04:00
Michael Crosby	45e328b0ac	Remove libcontainerd status type Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2019-04-04 15:17:13 -04:00
Tonis Tiigi	1a0f04e08e	daemon: fix mirrors validation Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>	2019-04-02 11:38:21 -07:00
John Howard	a3eda72f71	Merge pull request #38541 from Microsoft/jjh/containerd Windows: Experimental: ContainerD runtime	2019-03-19 21:09:19 -07:00
John Howard	85ad4b16c1	Windows: Experimental: Allow containerd for runtime Signed-off-by: John Howard <jhoward@microsoft.com> This is the first step in refactoring moby (dockerd) to use containerd on Windows. Similar to the current model in Linux, this adds the option to enable it for runtime. It does not switch the graphdriver to containerd snapshotters. - Refactors libcontainerd to a series of subpackages so that either a "local" containerd (1) or a "remote" (2) containerd can be loaded as opposed to conditional compile as "local" for Windows and "remote" for Linux. - Updates libcontainerd such that Windows has an option to allow the use of a "remote" containerd. Here, it communicates over a named pipe using GRPC. This is currently guarded behind the experimental flag, an environment variable, and the providing of a pipename to connect to containerd. - Infrastructure pieces such as under pkg/system to have helper functions for determining whether containerd is being used. (1) "local" containerd is what the daemon on Windows has used since inception. It's not really containerd at all - it's simply local invocation of HCS APIs directly in-process from the daemon through the Microsoft/hcsshim library. (2) "remote" containerd is what docker on Linux uses for it's runtime. It means that there is a separate containerd service running, and docker communicates over GRPC to it. To try this out, you will need to start with something like the following: Window 1: containerd --log-level debug Window 2: $env:DOCKER_WINDOWS_CONTAINERD=1 dockerd --experimental -D --containerd \\.\pipe\containerd-containerd You will need the following binary from github.com/containerd/containerd in your path: - containerd.exe You will need the following binaries from github.com/Microsoft/hcsshim in your path: - runhcs.exe - containerd-shim-runhcs-v1.exe For LCOW, it will require and initrd.img and kernel in `C:\Program Files\Linux Containers`. This is no different to the current requirements. However, you may need updated binaries, particularly initrd.img built from Microsoft/opengcs as (at the time of writing), Linuxkit binaries are somewhat out of date. Note that containerd and hcsshim for HCS v2 APIs do not yet support all the required functionality needed for docker. This will come in time - this is a baby (although large) step to migrating Docker on Windows to containerd. Note that the HCS v2 APIs are only called on RS5+ builds. RS1..RS4 will still use HCS v1 APIs as the v2 APIs were not fully developed enough on these builds to be usable. This abstraction is done in HCSShim. (Referring specifically to runtime) Note the LCOW graphdriver still uses HCS v1 APIs regardless. Note also that this does not migrate docker to use containerd snapshotters rather than graphdrivers. This needs to be done in conjunction with Linux also doing the same switch.	2019-03-12 18:41:55 -07:00
Justin Cormack	98fc09128b	Remove the rest of v1 manifest support As people are using the UUID in `docker info` that was based on the v1 manifest signing key, replace with a UUID instead. Remove deprecated `--disable-legacy-registry` option that was scheduled to be removed in 18.03. Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2019-03-02 10:46:37 -08:00
Ryo Nakao	894ecb24d1	Merge the divided loops Signed-off-by: Ryo Nakao <nakabonne@gmail.com>	2019-02-24 16:16:19 +09:00
Akihiro Suda	ec87479b7e	allow running `dockerd` in an unprivileged user namespace (rootless mode) Please refer to `docs/rootless.md`. TLDR: * Make sure `/etc/subuid` and `/etc/subgid` contain the entry for you * `dockerd-rootless.sh --experimental` * `docker -H unix://$XDG_RUNTIME_DIR/docker.sock run ...` Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>	2019-02-04 00:24:27 +09:00
Brian Goff	eaad3ee3cf	Make sure timers are stopped after use. `time.After` keeps a timer running until the specified duration is completed. It also allocates a new timer on each call. This can wind up leaving lots of uneccessary timers running in the background that are not needed and consume resources. Instead of `time.After`, use `time.NewTimer` so the timer can actually be stopped. In some of these cases it's not a big deal since the duraiton is really short, but in others it is much worse. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2019-01-16 14:32:53 -08:00
Yong Tang	7315a2bb11	Fix go vet issue in daemon/daemon.go This fix fixes go vet issue: ``` daemon/daemon.go:273: loop variable id captured by func literal daemon/daemon.go:280: loop variable id captured by func literal ``` Signed-off-by: Yong Tang <yong.tang.github@outlook.com>	2019-01-06 00:18:29 +00:00
Akihiro Suda	2cb26cfe9c	Merge pull request #38301 from cyphar/waitgroup-limits daemon: switch to semaphore-gated WaitGroup for startup tasks	2018-12-22 00:07:55 +09:00
Aleksa Sarai	5a52917e4d	daemon: switch to semaphore-gated WaitGroup for startup tasks Many startup tasks have to run for each container, and thus using a WaitGroup (which doesn't have a limit to the number of parallel tasks) can result in Docker exceeding the NOFILE limit quite trivially. A more optimal solution is to have a parallelism limit by using a semaphore. In addition, several startup tasks were not parallelised previously which resulted in very long startup times. According to my testing, 20K dead containers resulted in ~6 minute startup times (during which time Docker is completely unusable). This patch fixes both issues, and the parallelStartupTimes factor chosen (128 * NumCPU) is based on my own significant testing of the 20K container case. This patch (on my machines) reduces the startup time from 6 minutes to less than a minute (ideally this could be further reduced by removing the need to scan all dead containers on startup -- but that's beyond the scope of this patchset). In order to avoid the NOFILE limit problem, we also detect this on-startup and if NOFILE < 2128NumCPU we will reduce the parallelism factor to avoid hitting NOFILE limits (but also emit a warning since this is almost certainly a mis-configuration). Signed-off-by: Aleksa Sarai <asarai@suse.de>	2018-12-21 21:51:02 +11:00

1 2 3 4 5 ...

868 commits