Commit graph

794 commits

Author SHA1 Message Date
Daniel Nephin
4f0d95fa6e Add canonical import comment
Signed-off-by: Daniel Nephin <dnephin@docker.com>
2018-02-05 16:51:57 -05:00
Brian Goff
c379d2681f Fix race in attachable network attachment
Attachable networks are networks created on the cluster which can then
be attached to by non-swarm containers. These networks are lazily
created on the node that wants to attach to that network.

When no container is currently attached to one of these networks on a
node, and then multiple containers which want that network are started
concurrently, this can cause a race condition in the network attachment
where essentially we try to attach the same network to the node twice.

To easily reproduce this issue you must use a multi-node cluster with a
worker node that has lots of CPUs (I used a 36 CPU node).

Repro steps:

1. On manager, `docker network create -d overlay --attachable test`
2. On worker, `docker create --restart=always --network test busybox
top`, many times... 200 is a good number (but not much more due to
subnet size restrictions)
3. Restart the daemon

When the daemon restarts, it will attempt to start all those containers
simultaneously. Note that you could try to do this yourself over the API,
but it's harder to trigger due to the added latency from going over
the API.

The error produced happens when the daemon tries to start the container
upon allocating the network resources:

```
attaching to network failed, make sure your network options are correct and check manager logs: context deadline exceeded
```

What happens here is the worker makes a network attachment request to
the manager. This is an async call which in the happy case would cause a
task to be placed on the node, which the worker is waiting for to get
the network configuration.
In the case of this race, the error ocurrs on the manager like this:

```
task allocation failure" error="failed during network allocation for task n7bwwwbymj2o2h9asqkza8gom: failed to allocate network IP for task n7bwwwbymj2o2h9asqkza8gom network rj4szie2zfauqnpgh4eri1yue: could not find an available IP" module=node node.id=u3489c490fx1df8onlyfo1v6e
```

The task is not created and the worker times out waiting for the task.

---

The mitigation for this is to make sure that only one attachment reuest
is in flight for a given network at a time *when the network doesn't
already exist on the node*. If the network already exists on the node
there is no need for synchronization because the network is already
allocated and on the node so there is no need to request it from the
manager.

This basically comes down to a race with `Find(network) ||
Create(network)` without any sort of syncronization.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-02-02 13:46:23 -05:00
Allen Sun
de68ac8393
Simplify codes on calculating shutdown timeout
Signed-off-by: Allen Sun <shlallen1990@gmail.com>
Signed-off-by: Vincent Demeester <vincent@sbr.pm>
2018-01-26 09:18:07 -08:00
John Howard
0cba7740d4 Address feedback from Tonis
Signed-off-by: John Howard <jhoward@microsoft.com>
2018-01-18 12:30:39 -08:00
John Howard
afd305c4b5 LCOW: Refactor to multiple layer-stores based on feedback
Signed-off-by: John Howard <jhoward@microsoft.com>
2018-01-18 08:31:05 -08:00
John Howard
ce8e529e18 LCOW: Re-coalesce stores
Signed-off-by: John Howard <jhoward@microsoft.com>

The re-coalesces the daemon stores which were split as part of the
original LCOW implementation.

This is part of the work discussed in https://github.com/moby/moby/issues/34617,
in particular see the document linked to in that issue.
2018-01-18 08:29:19 -08:00
Yong Tang
c36274da83
Merge pull request #35638 from cpuguy83/error_helpers2
Add helpers to create errdef errors
2018-01-15 10:56:46 -08:00
Sebastiaan van Stijn
b4a6313969
Golint: remove redundant ifs
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2018-01-15 00:42:25 +01:00
Brian Goff
d453fe35b9 Move api/errdefs to errdefs
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-01-11 21:21:43 -05:00
Victor Vieux
745278d242
Merge pull request #35812 from stevvooe/follow-conventions
daemon, plugin: follow containerd namespace conventions
2017-12-19 15:55:39 -08:00
Brian Goff
e69127bd5b Ensure containers are stopped on daemon startup
When the containerd 1.0 runtime changes were made, we inadvertantly
removed the functionality where any running containers are killed on
startup when not using live-restore.
This change restores that behavior.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2017-12-18 14:33:45 -05:00
Stephen J Day
521e7eba86
daemon, plugin: follow containerd namespace conventions
Follow the conventions for namespace naming set out by other projects,
such as linuxkit and cri-containerd. Typically, they are some sort of
host name, with a subdomain describing functionality of the namespace.
In the case of linuxkit, services are launched in `services.linuxkit`.
In cri-containerd, pods are launched in `k8s.io`, making it clear that
these are from kubernetes.

Signed-off-by: Stephen J Day <stephen.day@docker.com>
2017-12-15 17:20:42 -08:00
Kir Kolyshkin
516010e92d Simplify/fix MkdirAll usage
This subtle bug keeps lurking in because error checking for `Mkdir()`
and `MkdirAll()` is slightly different wrt to `EEXIST`/`IsExist`:

 - for `Mkdir()`, `IsExist` error should (usually) be ignored
   (unless you want to make sure directory was not there before)
   as it means "the destination directory was already there"

 - for `MkdirAll()`, `IsExist` error should NEVER be ignored.

Mostly, this commit just removes ignoring the IsExist error, as it
should not be ignored.

Also, there are a couple of cases then IsExist is handled as
"directory already exist" which is wrong. As a result, some code
that never worked as intended is now removed.

NOTE that `idtools.MkdirAndChown()` behaves like `os.MkdirAll()`
rather than `os.Mkdir()` -- so its description is amended accordingly,
and its usage is handled as such (i.e. IsExist error is not ignored).

For more details, a quote from my runc commit 6f82d4b (July 2015):

    TL;DR: check for IsExist(err) after a failed MkdirAll() is both
    redundant and wrong -- so two reasons to remove it.

    Quoting MkdirAll documentation:

    > MkdirAll creates a directory named path, along with any necessary
    > parents, and returns nil, or else returns an error. If path
    > is already a directory, MkdirAll does nothing and returns nil.

    This means two things:

    1. If a directory to be created already exists, no error is
    returned.

    2. If the error returned is IsExist (EEXIST), it means there exists
    a non-directory with the same name as MkdirAll need to use for
    directory. Example: we want to MkdirAll("a/b"), but file "a"
    (or "a/b") already exists, so MkdirAll fails.

    The above is a theory, based on quoted documentation and my UNIX
    knowledge.

    3. In practice, though, current MkdirAll implementation [1] returns
    ENOTDIR in most of cases described in #2, with the exception when
    there is a race between MkdirAll and someone else creating the
    last component of MkdirAll argument as a file. In this very case
    MkdirAll() will indeed return EEXIST.

    Because of #1, IsExist check after MkdirAll is not needed.

    Because of #2 and #3, ignoring IsExist error is just plain wrong,
    as directory we require is not created. It's cleaner to report
    the error now.

    Note this error is all over the tree, I guess due to copy-paste,
    or trying to follow the same usage pattern as for Mkdir(),
    or some not quite correct examples on the Internet.

    [1] https://github.com/golang/go/blob/f9ed2f75/src/os/path.go

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2017-11-27 17:32:12 -08:00
Darren Stahl
ed74ee127f Increase container default shutdown timeout on Windows
The shutdown timeout for containers in insufficient on Windows. If the daemon is shutting down, and a container takes longer than expected to shut down, this can cause the container to remain in a bad state after restart, and never be able to start again. Increasing the timeout makes this less likely to occur.

Signed-off-by: Darren Stahl <darst@microsoft.com>
2017-10-23 10:31:31 -07:00
Brian Goff
402540708c Merge pull request #34895 from mlaventure/containerd-1.0-client
Containerd 1.0 client
2017-10-23 10:38:03 -04:00
Yong Tang
ab0eb8fcf6 Merge pull request #35077 from ryansimmen/35076-WindowsDaemonTmpDir
Windows Daemon should respect DOCKER_TMPDIR
2017-10-20 08:40:43 -07:00
Kenfe-Mickael Laventure
ddae20c032
Update libcontainerd to use containerd 1.0
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
2017-10-20 07:11:37 -07:00
Ryan Simmen
5611f127a7 Windows Daemon should respect DOCKER_TMPDIR
Signed-off-by: Ryan Simmen <ryan.simmen@gmail.com>
2017-10-19 10:47:46 -04:00
John Howard
0380fbff37 LCOW: API: Add platform to /images/create and /build
Signed-off-by: John Howard <jhoward@microsoft.com>

This PR has the API changes described in https://github.com/moby/moby/issues/34617.
Specifically, it adds an HTTP header "X-Requested-Platform" which is a JSON-encoded
OCI Image-spec `Platform` structure.

In addition, it renames (almost all) uses of a string variable platform (and associated)
methods/functions to os. This makes it much clearer to disambiguate with the swarm
"platform" which is really os/arch. This is a stepping stone to getting the daemon towards
fully multi-platform/arch-aware, and makes it clear when "operating system" is being
referred to rather than "platform" which is misleadingly used - sometimes in the swarm
meaning, but more often as just the operating system.
2017-10-06 11:44:18 -07:00
Pradip Dhara
d00a07b1e6 Updating moby to correspond to naming convention used in https://github.com/docker/swarmkit/pull/2385
Signed-off-by: Pradip Dhara <pradipd@microsoft.com>
2017-09-26 22:08:10 +00:00
Victor Vieux
a971f9c9d7 Merge pull request #34911 from dnephin/new-ci-entrypoint
Add a new entrypoint for CI
2017-09-26 11:50:44 -07:00
Sebastiaan van Stijn
2b50b14aeb
Suppress warning for renaming missing tmp directory
When starting `dockerd` on a host that has no `/var/lib/docker/tmp` directory,
a warning was printed in the logs:

    $ dockerd --data-root=/no-such-directory
    ...
    WARN[2017-09-26T09:37:00.045153377Z] failed to rename /no-such-directory/tmp for background deletion: rename /no-such-directory/tmp /no-such-directory/tmp-old: no such file or directory. Deleting synchronously

Although harmless, the warning does not show any useful information, so can be
skipped.

This patch checks thetype of error, so that warning is not printed.
Other errors will still show up:

    $ touch /i-am-a-file
    $ dockerd --data-root=/i-am-a-file
    Unable to get the full path to root (/i-am-a-file): canonical path points to a file '/i-am-a-file'

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2017-09-26 12:04:30 +02:00
Daniel Nephin
dbf580be57 Add a new entrypoint for CI
Signed-off-by: Daniel Nephin <dnephin@docker.com>
2017-09-20 17:26:30 -04:00
Brian Goff
c85e8622a4 Decouple plugin manager from libcontainerd package
libcontainerd has a bunch of platform dependent code and huge interfaces
that are a pain implement.
To make the plugin manager a bit easier to work with, extract the plugin
executor into an interface and move the containerd implementation to a
separate package.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2017-09-19 12:17:55 -04:00
Victor Vieux
a2ee40b98c Merge pull request #34674 from pradipd/windows_routingmesh
Enabling ILB/ELB on windows using per-node, per-network LB endpoint.
2017-09-18 15:56:17 -07:00
Pradip Dhara
9bed0883e7 Enabling ILB/ELB on windows using per-node, per-network LB endpoint.
Signed-off-by: Pradip Dhara <pradipd@microsoft.com>
2017-09-18 20:27:56 +00:00
Akash Gupta
7a7357dae1 LCOW: Implemented support for docker cp + build
This enables docker cp and ADD/COPY docker build support for LCOW.
Originally, the graphdriver.Get() interface returned a local path
to the container root filesystem. This does not work for LCOW, so
the Get() method now returns an interface that LCOW implements to
support copying to and from the container.

Signed-off-by: Akash Gupta <akagup@microsoft.com>
2017-09-14 12:07:52 -07:00
Yong Tang
2dcb77b24c Merge pull request #34738 from wgliang/optimization1
Optimize some wrong usage and spelling
2017-09-07 09:45:14 -07:00
wangguoliang
94cefa2145 Optimize some wrong usage and spelling
Signed-off-by: wgliang <liangcszzu@163.com>
2017-09-07 09:44:08 +08:00
Daniel Nephin
2f007e46d0 Remove libtrust dep from api
Signed-off-by: Daniel Nephin <dnephin@docker.com>
2017-09-06 12:05:19 -04:00
Daniel Nephin
b68221c37e Fix bad import graph from opts/opts.go
Signed-off-by: Daniel Nephin <dnephin@docker.com>
2017-08-29 15:32:43 -04:00
John Stephens
3d22daeb83 Merge pull request #34568 from Microsoft/jjh/singletagstore
Move to a single tag-store
2017-08-22 17:50:36 -07:00
John Howard
7b9a8f460b Move to a single tag-store
Signed-off-by: John Howard <jhoward@microsoft.com>
2017-08-18 17:09:27 -07:00
Daniel Nephin
9b47b7b151 Fix golint errors.
Signed-off-by: Daniel Nephin <dnephin@docker.com>
2017-08-18 14:23:44 -04:00
Derek McGowan
1009e6a40b
Update logrus to v1.0.1
Fixes case sensitivity issue

Signed-off-by: Derek McGowan <derek@mcgstyle.net>
2017-07-31 13:16:46 -07:00
Flavio Crisciani
f9f25ca5e4
Allow to set the control plane MTU
Add daemon config to allow the user to specify the MTU of the control plane network.
The first user of this new parameter is actually libnetwork that can seed the
gossip with the proper MTU value allowing to pack multiple messages per UDP packet sent.
If the value is not specified or is lower than 1500 the logic will set it to the default.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-07-28 13:52:03 -07:00
Brian Goff
9319a8a2dd Merge pull request #33440 from RenaudWasTaken/genericresource
Added support for Generic Resources
2017-07-25 15:32:25 -04:00
Renaud Gaubert
87e1464c43 Added support for Generic Resources
Signed-off-by: Renaud Gaubert <rgaubert@nvidia.com>
2017-07-24 17:49:56 -07:00
Jérôme Petazzoni
84aefe8697 Add a log message when the storage driver is overriden through the environment
Signed-off-by: Jérôme Petazzoni <jerome.petazzoni@gmail.com>
2017-07-20 17:38:34 +02:00
Aaron Lehmann
1128fc1add Store container names in memdb
Currently, names are maintained by a separate system called "registrar".
This means there is no way to atomically snapshot the state of
containers and the names associated with them.

We can add this atomicity and simplify the code by storing name
associations in the memdb. This removes the need for pkg/registrar, and
makes snapshots a lot less expensive because they no longer need to copy
all the names. This change also avoids some problematic behavior from
pkg/registrar where it returns slices which may be modified later on.

Note that while this change makes the *snapshotting* atomic, it doesn't
yet do anything to make sure containers are named at the same time that
they are added to the database. We can do that by adding a transactional
interface, either as a followup, or as part of this PR.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
2017-07-13 12:35:00 -07:00
Brian Goff
c3feb046b9 Allow stopping of paused container
When a container is paused, signals are sent once the container has been
unpaused.
Instead of forcing the user to unpause a container before they can ever
send a signal, allow the user to send the signals, and in the case of a
stop signal, automatically unpause the container afterwards.

This is much safer than unpausing the container first then sending a
signal (what a user is currently forced to do), as the container may be
paused for very good reasons and should not be unpaused except for
stopping.
Note that not even SIGKILL is possible while a process is paused,
but it is killed the instant it is unpaused.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2017-07-12 10:35:48 -04:00
Michael Crosby
9d87e6e0fb Do not set -1 for swappiness
Do not set a default value for swappiness as the default value should be
`nil`

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-07-03 11:23:15 -07:00
Fabio Kung
37addf0a50 Net operations already hold locks to containers
Fix a deadlock caused by re-entrant locks on container objects.

Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
2017-06-23 07:52:35 -07:00
Fabio Kung
76d96418b1 avoid saving container state to disk before daemon.Register
Migrate legacy volumes (Daemon.verifyVolumesInfo) before containers are
registered on the Daemon, so state on disk is not overwritten and legacy
fields lost during registration.

Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
2017-06-23 07:52:34 -07:00
Fabio Kung
edad52707c save deep copies of Container in the replica store
Reuse existing structures and rely on json serialization to deep copy
Container objects.

Also consolidate all "save" operations on container.CheckpointTo, which
now both saves a serialized json to disk, and replicates state to the
ACID in-memory store.

Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
2017-06-23 07:52:33 -07:00
Fabio Kung
aacddda89d Move checkpointing to the Container object
Also hide ViewDB behind an inteface.

Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
2017-06-23 07:52:32 -07:00
Fabio Kung
eed4c7b73f keep a consistent view of containers rendered
Replicate relevant mutations to the in-memory ACID store. Readers will
then be able to query container state without locking.

Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
2017-06-23 07:52:31 -07:00
Fabio Kung
481a92cb41 Grab a lock to read container.RemovalInProgress
Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
2017-06-21 19:11:23 -07:00
John Howard
ed10ac6ee9 LCOW: Create layer folders with correct ACL
Signed-off-by: John Howard <jhoward@microsoft.com>
2017-06-20 19:50:12 -07:00
John Howard
87abf34a3d LCOW: Store integrity checks
Signed-off-by: John Howard <jhoward@microsoft.com>
2017-06-20 19:49:53 -07:00