beenull/moby

Author	SHA1	Message	Date
Daniel Nephin	4f0d95fa6e	Add canonical import comment Signed-off-by: Daniel Nephin <dnephin@docker.com>	2018-02-05 16:51:57 -05:00
Brian Goff	c379d2681f	Fix race in attachable network attachment Attachable networks are networks created on the cluster which can then be attached to by non-swarm containers. These networks are lazily created on the node that wants to attach to that network. When no container is currently attached to one of these networks on a node, and then multiple containers which want that network are started concurrently, this can cause a race condition in the network attachment where essentially we try to attach the same network to the node twice. To easily reproduce this issue you must use a multi-node cluster with a worker node that has lots of CPUs (I used a 36 CPU node). Repro steps: 1. On manager, `docker network create -d overlay --attachable test` 2. On worker, `docker create --restart=always --network test busybox top`, many times... 200 is a good number (but not much more due to subnet size restrictions) 3. Restart the daemon When the daemon restarts, it will attempt to start all those containers simultaneously. Note that you could try to do this yourself over the API, but it's harder to trigger due to the added latency from going over the API. The error produced happens when the daemon tries to start the container upon allocating the network resources: ``` attaching to network failed, make sure your network options are correct and check manager logs: context deadline exceeded ``` What happens here is the worker makes a network attachment request to the manager. This is an async call which in the happy case would cause a task to be placed on the node, which the worker is waiting for to get the network configuration. In the case of this race, the error occurs on the manager like this: ``` task allocation failure" error="failed during network allocation for task n7bwwwbymj2o2h9asqkza8gom: failed to allocate network IP for task n7bwwwbymj2o2h9asqkza8gom network rj4szie2zfauqnpgh4eri1yue: could not find an available IP" module=node node.id=u3489c490fx1df8onlyfo1v6e ``` The task is not created and the worker times out waiting for the task. --- The mitigation for this is to make sure that only one attachment request is in flight for a given network at a time when the network doesn't already exist on the node. If the network already exists on the node there is no need for synchronization because the network is already allocated and on the node so there is no need to request it from the manager. This basically comes down to a race with `Find(network) || Create(network)` without any sort of synchronization. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2018-02-02 13:46:23 -05:00
Allen Sun	de68ac8393	Simplify codes on calculating shutdown timeout Signed-off-by: Allen Sun <shlallen1990@gmail.com> Signed-off-by: Vincent Demeester <vincent@sbr.pm>	2018-01-26 09:18:07 -08:00
John Howard	0cba7740d4	Address feedback from Tonis Signed-off-by: John Howard <jhoward@microsoft.com>	2018-01-18 12:30:39 -08:00
John Howard	afd305c4b5	LCOW: Refactor to multiple layer-stores based on feedback Signed-off-by: John Howard <jhoward@microsoft.com>	2018-01-18 08:31:05 -08:00
John Howard	ce8e529e18	LCOW: Re-coalesce stores Signed-off-by: John Howard <jhoward@microsoft.com> This re-coalesces the daemon stores, which were split as part of the original LCOW implementation. This is part of the work discussed in https://github.com/moby/moby/issues/34617; in particular, see the document linked to in that issue.	2018-01-18 08:29:19 -08:00
Yong Tang	c36274da83	Merge pull request #35638 from cpuguy83/error_helpers2 Add helpers to create errdef errors	2018-01-15 10:56:46 -08:00
Sebastiaan van Stijn	b4a6313969	Golint: remove redundant ifs Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2018-01-15 00:42:25 +01:00
Brian Goff	d453fe35b9	Move api/errdefs to errdefs Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2018-01-11 21:21:43 -05:00
Victor Vieux	745278d242	Merge pull request #35812 from stevvooe/follow-conventions daemon, plugin: follow containerd namespace conventions	2017-12-19 15:55:39 -08:00
Brian Goff	e69127bd5b	Ensure containers are stopped on daemon startup When the containerd 1.0 runtime changes were made, we inadvertently removed the functionality where any running containers are killed on startup when not using live-restore. This change restores that behavior. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2017-12-18 14:33:45 -05:00
Stephen J Day	521e7eba86	daemon, plugin: follow containerd namespace conventions Follow the conventions for namespace naming set out by other projects, such as linuxkit and cri-containerd. Typically, they are some sort of host name, with a subdomain describing functionality of the namespace. In the case of linuxkit, services are launched in `services.linuxkit`. In cri-containerd, pods are launched in `k8s.io`, making it clear that these are from kubernetes. Signed-off-by: Stephen J Day <stephen.day@docker.com>	2017-12-15 17:20:42 -08:00
Kir Kolyshkin	516010e92d	Simplify/fix MkdirAll usage This subtle bug keeps lurking in because error checking for `Mkdir()` and `MkdirAll()` is slightly different with respect to `EEXIST`/`IsExist`: - for `Mkdir()`, `IsExist` error should (usually) be ignored (unless you want to make sure directory was not there before) as it means "the destination directory was already there" - for `MkdirAll()`, `IsExist` error should NEVER be ignored. Mostly, this commit just removes ignoring the IsExist error, as it should not be ignored. Also, there are a couple of cases where IsExist is handled as "directory already exists", which is wrong. As a result, some code that never worked as intended is now removed. NOTE that `idtools.MkdirAndChown()` behaves like `os.MkdirAll()` rather than `os.Mkdir()` -- so its description is amended accordingly, and its usage is handled as such (i.e. IsExist error is not ignored). For more details, a quote from my runc commit 6f82d4b (July 2015): TL;DR: check for IsExist(err) after a failed MkdirAll() is both redundant and wrong -- so two reasons to remove it. Quoting MkdirAll documentation: > MkdirAll creates a directory named path, along with any necessary > parents, and returns nil, or else returns an error. If path > is already a directory, MkdirAll does nothing and returns nil. This means two things: 1. If a directory to be created already exists, no error is returned. 2. If the error returned is IsExist (EEXIST), it means there exists a non-directory with the same name as MkdirAll needs to use for directory. Example: we want to MkdirAll("a/b"), but file "a" (or "a/b") already exists, so MkdirAll fails. The above is a theory, based on quoted documentation and my UNIX knowledge. 3. In practice, though, current MkdirAll implementation [1] returns ENOTDIR in most of the cases described in #2, with the exception when there is a race between MkdirAll and someone else creating the last component of MkdirAll argument as a file. In this very case MkdirAll() will indeed return EEXIST. Because of #1, IsExist check after MkdirAll is not needed. Because of #2 and #3, ignoring IsExist error is just plain wrong, as directory we require is not created. It's cleaner to report the error now. Note this error is all over the tree, I guess due to copy-paste, or trying to follow the same usage pattern as for Mkdir(), or some not quite correct examples on the Internet. [1] https://github.com/golang/go/blob/f9ed2f75/src/os/path.go Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2017-11-27 17:32:12 -08:00
Darren Stahl	ed74ee127f	Increase container default shutdown timeout on Windows The shutdown timeout for containers is insufficient on Windows. If the daemon is shutting down, and a container takes longer than expected to shut down, this can cause the container to remain in a bad state after restart, and never be able to start again. Increasing the timeout makes this less likely to occur. Signed-off-by: Darren Stahl <darst@microsoft.com>	2017-10-23 10:31:31 -07:00
Brian Goff	402540708c	Merge pull request #34895 from mlaventure/containerd-1.0-client Containerd 1.0 client	2017-10-23 10:38:03 -04:00
Yong Tang	ab0eb8fcf6	Merge pull request #35077 from ryansimmen/35076-WindowsDaemonTmpDir Windows Daemon should respect DOCKER_TMPDIR	2017-10-20 08:40:43 -07:00
Kenfe-Mickael Laventure	ddae20c032	Update libcontainerd to use containerd 1.0 Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>	2017-10-20 07:11:37 -07:00
Ryan Simmen	5611f127a7	Windows Daemon should respect DOCKER_TMPDIR Signed-off-by: Ryan Simmen <ryan.simmen@gmail.com>	2017-10-19 10:47:46 -04:00
John Howard	0380fbff37	LCOW: API: Add platform to /images/create and /build Signed-off-by: John Howard <jhoward@microsoft.com> This PR has the API changes described in https://github.com/moby/moby/issues/34617. Specifically, it adds an HTTP header "X-Requested-Platform" which is a JSON-encoded OCI Image-spec `Platform` structure. In addition, it renames (almost all) uses of a string variable platform (and associated) methods/functions to os. This makes it much clearer to disambiguate with the swarm "platform" which is really os/arch. This is a stepping stone to getting the daemon towards fully multi-platform/arch-aware, and makes it clear when "operating system" is being referred to rather than "platform" which is misleadingly used - sometimes in the swarm meaning, but more often as just the operating system.	2017-10-06 11:44:18 -07:00
Pradip Dhara	d00a07b1e6	Updating moby to correspond to naming convention used in https://github.com/docker/swarmkit/pull/2385 Signed-off-by: Pradip Dhara <pradipd@microsoft.com>	2017-09-26 22:08:10 +00:00
Victor Vieux	a971f9c9d7	Merge pull request #34911 from dnephin/new-ci-entrypoint Add a new entrypoint for CI	2017-09-26 11:50:44 -07:00
Sebastiaan van Stijn	2b50b14aeb	Suppress warning for renaming missing tmp directory When starting `dockerd` on a host that has no `/var/lib/docker/tmp` directory, a warning was printed in the logs: $ dockerd --data-root=/no-such-directory ... WARN[2017-09-26T09:37:00.045153377Z] failed to rename /no-such-directory/tmp for background deletion: rename /no-such-directory/tmp /no-such-directory/tmp-old: no such file or directory. Deleting synchronously Although harmless, the warning does not show any useful information, so can be skipped. This patch checks the type of error, so that the warning is not printed. Other errors will still show up: $ touch /i-am-a-file $ dockerd --data-root=/i-am-a-file Unable to get the full path to root (/i-am-a-file): canonical path points to a file '/i-am-a-file' Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2017-09-26 12:04:30 +02:00
Daniel Nephin	dbf580be57	Add a new entrypoint for CI Signed-off-by: Daniel Nephin <dnephin@docker.com>	2017-09-20 17:26:30 -04:00
Brian Goff	c85e8622a4	Decouple plugin manager from libcontainerd package libcontainerd has a bunch of platform dependent code and huge interfaces that are a pain to implement. To make the plugin manager a bit easier to work with, extract the plugin executor into an interface and move the containerd implementation to a separate package. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2017-09-19 12:17:55 -04:00
Victor Vieux	a2ee40b98c	Merge pull request #34674 from pradipd/windows_routingmesh Enabling ILB/ELB on windows using per-node, per-network LB endpoint.	2017-09-18 15:56:17 -07:00
Pradip Dhara	9bed0883e7	Enabling ILB/ELB on windows using per-node, per-network LB endpoint. Signed-off-by: Pradip Dhara <pradipd@microsoft.com>	2017-09-18 20:27:56 +00:00
Akash Gupta	7a7357dae1	LCOW: Implemented support for docker cp + build This enables docker cp and ADD/COPY docker build support for LCOW. Originally, the graphdriver.Get() interface returned a local path to the container root filesystem. This does not work for LCOW, so the Get() method now returns an interface that LCOW implements to support copying to and from the container. Signed-off-by: Akash Gupta <akagup@microsoft.com>	2017-09-14 12:07:52 -07:00
Yong Tang	2dcb77b24c	Merge pull request #34738 from wgliang/optimization1 Optimize some wrong usage and spelling	2017-09-07 09:45:14 -07:00
wangguoliang	94cefa2145	Optimize some wrong usage and spelling Signed-off-by: wgliang <liangcszzu@163.com>	2017-09-07 09:44:08 +08:00
Daniel Nephin	2f007e46d0	Remove libtrust dep from api Signed-off-by: Daniel Nephin <dnephin@docker.com>	2017-09-06 12:05:19 -04:00
Daniel Nephin	b68221c37e	Fix bad import graph from opts/opts.go Signed-off-by: Daniel Nephin <dnephin@docker.com>	2017-08-29 15:32:43 -04:00
John Stephens	3d22daeb83	Merge pull request #34568 from Microsoft/jjh/singletagstore Move to a single tag-store	2017-08-22 17:50:36 -07:00
John Howard	7b9a8f460b	Move to a single tag-store Signed-off-by: John Howard <jhoward@microsoft.com>	2017-08-18 17:09:27 -07:00
Daniel Nephin	9b47b7b151	Fix golint errors. Signed-off-by: Daniel Nephin <dnephin@docker.com>	2017-08-18 14:23:44 -04:00
Derek McGowan	1009e6a40b	Update logrus to v1.0.1 Fixes case sensitivity issue Signed-off-by: Derek McGowan <derek@mcgstyle.net>	2017-07-31 13:16:46 -07:00
Flavio Crisciani	f9f25ca5e4	Allow to set the control plane MTU Add daemon config to allow the user to specify the MTU of the control plane network. The first user of this new parameter is actually libnetwork that can seed the gossip with the proper MTU value allowing to pack multiple messages per UDP packet sent. If the value is not specified or is lower than 1500 the logic will set it to the default. Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>	2017-07-28 13:52:03 -07:00
Brian Goff	9319a8a2dd	Merge pull request #33440 from RenaudWasTaken/genericresource Added support for Generic Resources	2017-07-25 15:32:25 -04:00
Renaud Gaubert	87e1464c43	Added support for Generic Resources Signed-off-by: Renaud Gaubert <rgaubert@nvidia.com>	2017-07-24 17:49:56 -07:00
Jérôme Petazzoni	84aefe8697	Add a log message when the storage driver is overridden through the environment Signed-off-by: Jérôme Petazzoni <jerome.petazzoni@gmail.com>	2017-07-20 17:38:34 +02:00
Aaron Lehmann	1128fc1add	Store container names in memdb Currently, names are maintained by a separate system called "registrar". This means there is no way to atomically snapshot the state of containers and the names associated with them. We can add this atomicity and simplify the code by storing name associations in the memdb. This removes the need for pkg/registrar, and makes snapshots a lot less expensive because they no longer need to copy all the names. This change also avoids some problematic behavior from pkg/registrar where it returns slices which may be modified later on. Note that while this change makes the snapshotting atomic, it doesn't yet do anything to make sure containers are named at the same time that they are added to the database. We can do that by adding a transactional interface, either as a followup, or as part of this PR. Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>	2017-07-13 12:35:00 -07:00
Brian Goff	c3feb046b9	Allow stopping of paused container When a container is paused, signals are sent once the container has been unpaused. Instead of forcing the user to unpause a container before they can ever send a signal, allow the user to send the signals, and in the case of a stop signal, automatically unpause the container afterwards. This is much safer than unpausing the container first then sending a signal (what a user is currently forced to do), as the container may be paused for very good reasons and should not be unpaused except for stopping. Note that not even SIGKILL is possible while a process is paused, but it is killed the instant it is unpaused. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2017-07-12 10:35:48 -04:00
Michael Crosby	9d87e6e0fb	Do not set -1 for swappiness Do not set a default value for swappiness as the default value should be `nil` Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2017-07-03 11:23:15 -07:00
Fabio Kung	37addf0a50	Net operations already hold locks to containers Fix a deadlock caused by re-entrant locks on container objects. Signed-off-by: Fabio Kung <fabio.kung@gmail.com>	2017-06-23 07:52:35 -07:00
Fabio Kung	76d96418b1	avoid saving container state to disk before daemon.Register Migrate legacy volumes (Daemon.verifyVolumesInfo) before containers are registered on the Daemon, so state on disk is not overwritten and legacy fields lost during registration. Signed-off-by: Fabio Kung <fabio.kung@gmail.com>	2017-06-23 07:52:34 -07:00
Fabio Kung	edad52707c	save deep copies of Container in the replica store Reuse existing structures and rely on json serialization to deep copy Container objects. Also consolidate all "save" operations on container.CheckpointTo, which now both saves a serialized json to disk, and replicates state to the ACID in-memory store. Signed-off-by: Fabio Kung <fabio.kung@gmail.com>	2017-06-23 07:52:33 -07:00
Fabio Kung	aacddda89d	Move checkpointing to the Container object Also hide ViewDB behind an interface. Signed-off-by: Fabio Kung <fabio.kung@gmail.com>	2017-06-23 07:52:32 -07:00
Fabio Kung	eed4c7b73f	keep a consistent view of containers rendered Replicate relevant mutations to the in-memory ACID store. Readers will then be able to query container state without locking. Signed-off-by: Fabio Kung <fabio.kung@gmail.com>	2017-06-23 07:52:31 -07:00
Fabio Kung	481a92cb41	Grab a lock to read container.RemovalInProgress Signed-off-by: Fabio Kung <fabio.kung@gmail.com>	2017-06-21 19:11:23 -07:00
John Howard	ed10ac6ee9	LCOW: Create layer folders with correct ACL Signed-off-by: John Howard <jhoward@microsoft.com>	2017-06-20 19:50:12 -07:00
John Howard	87abf34a3d	LCOW: Store integrity checks Signed-off-by: John Howard <jhoward@microsoft.com>	2017-06-20 19:49:53 -07:00


794 commits