Commit graph

833 commits

Author SHA1 Message Date
Wei Fu
9ed0504592 daemon: add grpc.WithBlock option
WithBlock makes sure that the following containerd request is reliable.

In one edge case with high load pressure, kernel kills dockerd, containerd
and containerd-shims caused by OOM. When both dockerd and containerd
restart, but containerd will take time to recover all the existing
containers. Before containerd serving, dockerd will failed with gRPC
error. That bad thing is that restore action will still ignore the
any non-NotFound errors and returns running state for
already stopped container. It is unexpected behavior. And
we need to restart dockerd to make sure that anything is OK.

It is painful. Add WithBlock can prevent the edge case. And
n common case, the containerd will be serving in shortly.
It is not harm to add WithBlock for containerd connection.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
(cherry picked from commit 9f73396dab)
Signed-off-by: Wei Fu <fuweid89@gmail.com>
2020-02-22 14:28:28 +08:00
Brian Goff
34418110ec
Add (hidden) flags to set containerd namespaces
This allows our tests, which all share a containerd instance, to be a
bit more isolated by setting the containerd namespaces to the generated
daemon ID's rather than the default namespaces.

This came about because I found in some cases we had test daemons
failing to start (really very slow to start) because it was (seemingly)
processing events from other tests.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit 24ad2f486d)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-09-25 17:26:26 +02:00
Deep Debroy
685565ad18
Fix regression in handling of NotFound err during startup
Signed-off-by: Deep Debroy <ddebroy@docker.com>
(cherry picked from commit 4d5b6260bc)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-08-09 02:09:13 +02:00
Tibor Vass
99cd23cefd Revert "Remove the rest of v1 manifest support"
This reverts commit 98fc09128b in order to
keep registry v2 schema1 handling and libtrust-key-based engine ID.

Because registry v2 schema1 was not officially deprecated and
registries are still relying on it, this patch puts its logic back.

However, registry v1 relics are not added back since v1 logic has been
removed a while ago.

This also fixes an engine upgrade issue in a swarm cluster. It was relying
on the Engine ID to be the same upon upgrade, but the mentioned commit
modified the logic to use UUID and from a different file.

Since the libtrust key is always needed to support v2 schema1 pushes,
that the old engine ID is based on the libtrust key, and that the engine ID
needs to be conserved across upgrades, adding a UUID-based engine ID logic
seems to add more complexity than it solves the problems.

Hence reverting the engine ID changes as well.

Signed-off-by: Tibor Vass <tibor@docker.com>
(cherry picked from commit f695e98cb7)
Signed-off-by: Tibor Vass <tibor@docker.com>
2019-06-18 18:54:57 +00:00
Michael Crosby
b9b5dc37e3 Remove inmemory container map
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-04-05 15:48:07 -04:00
Michael Crosby
45e328b0ac Remove libcontainerd status type
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-04-04 15:17:13 -04:00
Tonis Tiigi
1a0f04e08e daemon: fix mirrors validation
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
2019-04-02 11:38:21 -07:00
John Howard
a3eda72f71
Merge pull request #38541 from Microsoft/jjh/containerd
Windows: Experimental: ContainerD runtime
2019-03-19 21:09:19 -07:00
John Howard
85ad4b16c1 Windows: Experimental: Allow containerd for runtime
Signed-off-by: John Howard <jhoward@microsoft.com>

This is the first step in refactoring moby (dockerd) to use containerd on Windows.
Similar to the current model in Linux, this adds the option to enable it for runtime.
It does not switch the graphdriver to containerd snapshotters.

 - Refactors libcontainerd to a series of subpackages so that either a
  "local" containerd (1) or a "remote" (2) containerd can be loaded as opposed
  to conditional compile as "local" for Windows and "remote" for Linux.

 - Updates libcontainerd such that Windows has an option to allow the use of a
   "remote" containerd. Here, it communicates over a named pipe using GRPC.
   This is currently guarded behind the experimental flag, an environment variable,
   and the providing of a pipename to connect to containerd.

 - Infrastructure pieces such as under pkg/system to have helper functions for
   determining whether containerd is being used.

(1) "local" containerd is what the daemon on Windows has used since inception.
It's not really containerd at all - it's simply local invocation of HCS APIs
directly in-process from the daemon through the Microsoft/hcsshim library.

(2) "remote" containerd is what docker on Linux uses for it's runtime. It means
that there is a separate containerd service running, and docker communicates over
GRPC to it.

To try this out, you will need to start with something like the following:

Window 1:
	containerd --log-level debug

Window 2:
	$env:DOCKER_WINDOWS_CONTAINERD=1
	dockerd --experimental -D --containerd \\.\pipe\containerd-containerd

You will need the following binary from github.com/containerd/containerd in your path:
 - containerd.exe

You will need the following binaries from github.com/Microsoft/hcsshim in your path:
 - runhcs.exe
 - containerd-shim-runhcs-v1.exe

For LCOW, it will require and initrd.img and kernel in `C:\Program Files\Linux Containers`.
This is no different to the current requirements. However, you may need updated binaries,
particularly initrd.img built from Microsoft/opengcs as (at the time of writing), Linuxkit
binaries are somewhat out of date.

Note that containerd and hcsshim for HCS v2 APIs do not yet support all the required
functionality needed for docker. This will come in time - this is a baby (although large)
step to migrating Docker on Windows to containerd.

Note that the HCS v2 APIs are only called on RS5+ builds. RS1..RS4 will still use
HCS v1 APIs as the v2 APIs were not fully developed enough on these builds to be usable.
This abstraction is done in HCSShim. (Referring specifically to runtime)

Note the LCOW graphdriver still uses HCS v1 APIs regardless.

Note also that this does not migrate docker to use containerd snapshotters
rather than graphdrivers. This needs to be done in conjunction with Linux also
doing the same switch.
2019-03-12 18:41:55 -07:00
Justin Cormack
98fc09128b Remove the rest of v1 manifest support
As people are using the UUID in `docker info` that was based on the v1 manifest signing key, replace
with a UUID instead.

Remove deprecated `--disable-legacy-registry` option that was scheduled to be removed in 18.03.

Signed-off-by: Justin Cormack <justin.cormack@docker.com>
2019-03-02 10:46:37 -08:00
Ryo Nakao
894ecb24d1 Merge the divided loops
Signed-off-by: Ryo Nakao <nakabonne@gmail.com>
2019-02-24 16:16:19 +09:00
Akihiro Suda
ec87479b7e allow running dockerd in an unprivileged user namespace (rootless mode)
Please refer to `docs/rootless.md`.

TLDR:
 * Make sure `/etc/subuid` and `/etc/subgid` contain the entry for you
 * `dockerd-rootless.sh --experimental`
 * `docker -H unix://$XDG_RUNTIME_DIR/docker.sock run ...`

Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
2019-02-04 00:24:27 +09:00
Yong Tang
7315a2bb11 Fix go vet issue in daemon/daemon.go
This fix fixes go vet issue:
```
daemon/daemon.go:273: loop variable id captured by func literal
daemon/daemon.go:280: loop variable id captured by func literal
```

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
2019-01-06 00:18:29 +00:00
Akihiro Suda
2cb26cfe9c
Merge pull request #38301 from cyphar/waitgroup-limits
daemon: switch to semaphore-gated WaitGroup for startup tasks
2018-12-22 00:07:55 +09:00
Aleksa Sarai
5a52917e4d
daemon: switch to semaphore-gated WaitGroup for startup tasks
Many startup tasks have to run for each container, and thus using a
WaitGroup (which doesn't have a limit to the number of parallel tasks)
can result in Docker exceeding the NOFILE limit quite trivially. A more
optimal solution is to have a parallelism limit by using a semaphore.

In addition, several startup tasks were not parallelised previously
which resulted in very long startup times. According to my testing, 20K
dead containers resulted in ~6 minute startup times (during which time
Docker is completely unusable).

This patch fixes both issues, and the parallelStartupTimes factor chosen
(128 * NumCPU) is based on my own significant testing of the 20K
container case. This patch (on my machines) reduces the startup time
from 6 minutes to less than a minute (ideally this could be further
reduced by removing the need to scan all dead containers on startup --
but that's beyond the scope of this patchset).

In order to avoid the NOFILE limit problem, we also detect this
on-startup and if NOFILE < 2*128*NumCPU we will reduce the parallelism
factor to avoid hitting NOFILE limits (but also emit a warning since
this is almost certainly a mis-configuration).

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2018-12-21 21:51:02 +11:00
Akihiro Suda
1fea38856a Remove v1.10 migrator
The v1.10 layout and the migrator was added in 2015 via #17924.

Although the migrator is not marked as "deprecated" explicitly in
cli/docs/deprecated.md, I suppose people should have already migrated
from pre-v1.10 and they no longer need the migrator, because pre-v1.10
version do not support schema2 images (and these versions no longer
receives security updates).

Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
2018-11-24 17:45:13 +09:00
Anda Xu
171d51c861 add support of registry-mirrors and insecure-registries to buildkit
Signed-off-by: Anda Xu <anda.xu@docker.com>
2018-09-20 11:53:02 -07:00
Kir Kolyshkin
9b0097a699 Format code with gofmt -s from go-1.11beta1
This should eliminate a bunch of new (go-1.11 related) validation
errors telling that the code is not formatted with `gofmt -s`.

No functional change, just whitespace (i.e.
`git show --ignore-space-change` shows nothing).

Patch generated with:

> git ls-files | grep -v ^vendor/ | grep .go$ | xargs gofmt -s -w

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2018-09-06 15:24:16 -07:00
Anda Xu
58a75cebdd allow features option live reloadable
Signed-off-by: Anda Xu <anda.xu@docker.com>
2018-08-31 12:43:04 -07:00
John Howard
5accd82634 Add containerd.WithTimeout(60*time.Second) to match old calls
Signed-off-by: John Howard <jhoward@microsoft.com>
2018-08-23 12:03:43 -07:00
John Stephens
b3e9f7b13b
Merge pull request #35521 from salah-khan/35507
Add --chown flag support for ADD/COPY commands for Windows
2018-08-17 11:31:16 -07:00
Salahuddin Khan
763d839261 Add ADD/COPY --chown flag support to Windows
This implements chown support on Windows. Built-in accounts as well
as accounts included in the SAM database of the container are supported.

NOTE: IDPair is now named Identity and IDMappings is now named
IdentityMapping.

The following are valid examples:
ADD --chown=Guest . <some directory>
COPY --chown=Administrator . <some directory>
COPY --chown=Guests . <some directory>
COPY --chown=ContainerUser . <some directory>

On Windows an owner is only granted the permission to read the security
descriptor and read/write the discretionary access control list. This
fix also grants read/write and execute permissions to the owner.

Signed-off-by: Salahuddin Khan <salah@docker.com>
2018-08-13 21:59:11 -07:00
Derek McGowan
dd2e19ebd5
libcontainerd: split client and supervisor
Adds a supervisor package for starting and monitoring containerd.
Separates grpc connection allowing access from daemon.

Signed-off-by: Derek McGowan <derek@mcgstyle.net>
2018-08-06 10:23:04 -07:00
Flavio Crisciani
e353e7e3f0
Fixes for resolv.conf
Handle the case of systemd-resolved, and if in place
use a different resolv.conf source.
Set appropriately the option on libnetwork.
Move unix specific code to container_operation_unix

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2018-07-26 11:17:56 -07:00
Sebastiaan van Stijn
3737194b9f
daemon/*.go: fix some Wrap[f]/Warn[f] errors
In particular, these two:
> daemon/daemon_unix.go:1129: Wrapf format %v reads arg #1, but call has 0 args
> daemon/kill.go:111: Warn call has possible formatting directive %s

and a few more.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2018-07-11 15:51:51 +02:00
Tonis Tiigi
157b0b30db builder: lint fixes
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
2018-06-10 10:05:29 -07:00
Tonis Tiigi
ea36c3cbaf daemon: access to distribution internals
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
2018-06-10 10:05:26 -07:00
Brian Goff
e4b6adc88e Extract volume interaction to a volumes service
This cleans up some of the package API's used for interacting with
volumes, and simplifies management.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-05-25 14:21:07 -04:00
Brian Goff
82d9185470
Merge pull request #36396 from selansen/master
Allow user to specify default address pools for docker networks
2018-05-03 06:34:14 -04:00
Alessandro Boch
173b3c364e Allow user to control the default address pools
- Via daemon flag --default-address-pools base=<CIDR>,size=<int>

Signed-off-by: Elango Siva  <elango@docker.com>
2018-04-30 11:14:08 -04:00
Antonio Murdaca
75d3214934
restartmanager: do not apply restart policy on created containers
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2018-04-24 11:41:09 +02:00
Brian Goff
977109d808 Remove use of global volume driver store
Instead of using a global store for volume drivers, scope the driver
store to the caller (e.g. the volume store). This makes testing much
simpler.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-04-17 14:07:08 -04:00
Brian Goff
0023abbad3 Remove old/uneeded volume migration from vers 1.7
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-04-17 14:06:53 -04:00
Daniel Nephin
2b1a2b10af Move ImageService to new package
Signed-off-by: Daniel Nephin <dnephin@docker.com>
2018-02-26 16:49:37 -05:00
Daniel Nephin
0dab53ff3c Move all daemon image methods into imageService
imageService provides the backend for the image API and handles the
imageStore, and referenceStore.

Signed-off-by: Daniel Nephin <dnephin@docker.com>
2018-02-26 16:48:29 -05:00
Tibor Vass
747c163a65
Merge pull request #36303 from dnephin/cleanup-in-daemon-unix
Cleanup unnecessary and duplicate functions in `daemon_unix.go`
2018-02-16 14:55:18 -08:00
Brian Goff
b0b9a25e7e Move log validator logic after plugins are loaded
This ensures that all log plugins are registered when the log validator
is run.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-02-15 11:53:11 -05:00
Daniel Nephin
c502bcff33 Remove unnecessary getLayerInit
Signed-off-by: Daniel Nephin <dnephin@docker.com>
2018-02-14 11:59:10 -05:00
Kir Kolyshkin
195893d381 c.RWLayer: check for nil before use
Since commit e9b9e4ace2 has landed, there is a chance that
container.RWLayer is nil (due to some half-removed container). Let's
check the pointer before use to avoid any potential nil pointer
dereferences, resulting in a daemon crash.

Note that even without the abovementioned commit, it's better to perform
an extra check (even it's totally redundant) rather than to have a
possibility of a daemon crash. In other words, better be safe than
sorry.

[v2: add a test case for daemon.getInspectData]
[v3: add a check for container.Dead and a special error for the case]

Fixes: e9b9e4ace2
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2018-02-09 11:24:09 -08:00
Daniel Nephin
4f0d95fa6e Add canonical import comment
Signed-off-by: Daniel Nephin <dnephin@docker.com>
2018-02-05 16:51:57 -05:00
Brian Goff
c379d2681f Fix race in attachable network attachment
Attachable networks are networks created on the cluster which can then
be attached to by non-swarm containers. These networks are lazily
created on the node that wants to attach to that network.

When no container is currently attached to one of these networks on a
node, and then multiple containers which want that network are started
concurrently, this can cause a race condition in the network attachment
where essentially we try to attach the same network to the node twice.

To easily reproduce this issue you must use a multi-node cluster with a
worker node that has lots of CPUs (I used a 36 CPU node).

Repro steps:

1. On manager, `docker network create -d overlay --attachable test`
2. On worker, `docker create --restart=always --network test busybox
top`, many times... 200 is a good number (but not much more due to
subnet size restrictions)
3. Restart the daemon

When the daemon restarts, it will attempt to start all those containers
simultaneously. Note that you could try to do this yourself over the API,
but it's harder to trigger due to the added latency from going over
the API.

The error produced happens when the daemon tries to start the container
upon allocating the network resources:

```
attaching to network failed, make sure your network options are correct and check manager logs: context deadline exceeded
```

What happens here is the worker makes a network attachment request to
the manager. This is an async call which in the happy case would cause a
task to be placed on the node, which the worker is waiting for to get
the network configuration.
In the case of this race, the error ocurrs on the manager like this:

```
task allocation failure" error="failed during network allocation for task n7bwwwbymj2o2h9asqkza8gom: failed to allocate network IP for task n7bwwwbymj2o2h9asqkza8gom network rj4szie2zfauqnpgh4eri1yue: could not find an available IP" module=node node.id=u3489c490fx1df8onlyfo1v6e
```

The task is not created and the worker times out waiting for the task.

---

The mitigation for this is to make sure that only one attachment reuest
is in flight for a given network at a time *when the network doesn't
already exist on the node*. If the network already exists on the node
there is no need for synchronization because the network is already
allocated and on the node so there is no need to request it from the
manager.

This basically comes down to a race with `Find(network) ||
Create(network)` without any sort of syncronization.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-02-02 13:46:23 -05:00
Allen Sun
de68ac8393
Simplify codes on calculating shutdown timeout
Signed-off-by: Allen Sun <shlallen1990@gmail.com>
Signed-off-by: Vincent Demeester <vincent@sbr.pm>
2018-01-26 09:18:07 -08:00
John Howard
0cba7740d4 Address feedback from Tonis
Signed-off-by: John Howard <jhoward@microsoft.com>
2018-01-18 12:30:39 -08:00
John Howard
afd305c4b5 LCOW: Refactor to multiple layer-stores based on feedback
Signed-off-by: John Howard <jhoward@microsoft.com>
2018-01-18 08:31:05 -08:00
John Howard
ce8e529e18 LCOW: Re-coalesce stores
Signed-off-by: John Howard <jhoward@microsoft.com>

The re-coalesces the daemon stores which were split as part of the
original LCOW implementation.

This is part of the work discussed in https://github.com/moby/moby/issues/34617,
in particular see the document linked to in that issue.
2018-01-18 08:29:19 -08:00
Yong Tang
c36274da83
Merge pull request #35638 from cpuguy83/error_helpers2
Add helpers to create errdef errors
2018-01-15 10:56:46 -08:00
Sebastiaan van Stijn
b4a6313969
Golint: remove redundant ifs
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2018-01-15 00:42:25 +01:00
Brian Goff
d453fe35b9 Move api/errdefs to errdefs
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-01-11 21:21:43 -05:00
Victor Vieux
745278d242
Merge pull request #35812 from stevvooe/follow-conventions
daemon, plugin: follow containerd namespace conventions
2017-12-19 15:55:39 -08:00
Brian Goff
e69127bd5b Ensure containers are stopped on daemon startup
When the containerd 1.0 runtime changes were made, we inadvertantly
removed the functionality where any running containers are killed on
startup when not using live-restore.
This change restores that behavior.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2017-12-18 14:33:45 -05:00