WithBlock makes sure that subsequent containerd requests are reliable.
In one edge case, under high load pressure, the kernel kills dockerd,
containerd and the containerd-shims because of OOM. When both dockerd and
containerd restart, containerd takes some time to recover all the existing
containers, and until containerd is serving, dockerd's requests fail with
gRPC errors. Worse, the restore action still ignores any non-NotFound
errors and reports a running state for containers that have already
stopped. That is unexpected behavior, and dockerd has to be restarted to
get everything back to a consistent state, which is painful.
Adding WithBlock prevents this edge case. In the common case, containerd
is serving shortly after startup, so there is no harm in adding WithBlock
to the containerd connection.
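As a rough sketch (not the exact moby code), a blocking dial with a timeout could look like the following; the socket address and the timeout value are assumptions:
```
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
)

func main() {
	// Block until the connection to containerd is actually established
	// (or the timeout expires), so that requests issued right after
	// dialing do not fail while containerd is still recovering.
	ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
	defer cancel()
	conn, err := grpc.DialContext(ctx, "unix:///run/containerd/containerd.sock",
		grpc.WithInsecure(), // local socket, no TLS (assumption)
		grpc.WithBlock(),
	)
	if err != nil {
		log.Fatalf("containerd is not serving yet: %v", err)
	}
	defer conn.Close()
}
```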
Signed-off-by: Wei Fu <fuweid89@gmail.com>
(cherry picked from commit 9f73396dab)
Signed-off-by: Wei Fu <fuweid89@gmail.com>
This allows our tests, which all share a containerd instance, to be a
bit more isolated by setting the containerd namespaces to the generated
daemon IDs rather than the default namespaces.
This came about because I found that in some cases we had test daemons
failing to start (really, very slow to start) because they were (seemingly)
processing events from other tests.
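For illustration only, scoping a containerd client context to a per-daemon namespace could look roughly like this (the namespace value is a hypothetical daemon ID):
```
package main

import (
	"context"
	"fmt"

	"github.com/containerd/containerd/namespaces"
)

func main() {
	// Use the generated daemon ID as the containerd namespace so that this
	// daemon's containers and events are isolated from other test daemons.
	ctx := namespaces.WithNamespace(context.Background(), "dabcdef123456") // hypothetical daemon ID
	if ns, ok := namespaces.Namespace(ctx); ok {
		fmt.Println("containerd namespace:", ns)
	}
}
```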
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit 24ad2f486d)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This reverts commit 98fc09128b in order to
keep registry v2 schema1 handling and libtrust-key-based engine ID.
Because registry v2 schema1 was not officially deprecated and
registries are still relying on it, this patch puts its logic back.
However, registry v1 relics are not added back, since the v1 logic was
removed a while ago.
This also fixes an engine upgrade issue in a swarm cluster: the cluster
relies on the Engine ID staying the same across upgrades, but the
mentioned commit changed the logic to use a UUID read from a different file.
Since the libtrust key is still needed to support v2 schema1 pushes, the
old engine ID is derived from that libtrust key, and the engine ID needs
to be preserved across upgrades, adding UUID-based engine ID logic adds
more complexity than it solves.
Hence reverting the engine ID changes as well.
Signed-off-by: Tibor Vass <tibor@docker.com>
(cherry picked from commit f695e98cb7)
Signed-off-by: Tibor Vass <tibor@docker.com>
Signed-off-by: John Howard <jhoward@microsoft.com>
This is the first step in refactoring moby (dockerd) to use containerd on Windows.
Similar to the current model on Linux, this adds the option to enable containerd as the runtime.
It does not switch the graphdriver to containerd snapshotters.
- Refactors libcontainerd into a series of subpackages so that either a
"local" containerd (1) or a "remote" (2) containerd can be loaded, as opposed
to being conditionally compiled as "local" for Windows and "remote" for Linux.
- Updates libcontainerd so that Windows has the option to use a
"remote" containerd, communicating over a named pipe using gRPC.
This is currently guarded behind the experimental flag, an environment variable,
and providing a pipe name to connect to containerd.
- Adds infrastructure pieces, such as helper functions under pkg/system, for
determining whether containerd is being used.
(1) "local" containerd is what the daemon on Windows has used since inception.
It's not really containerd at all - it's simply local invocation of HCS APIs
directly in-process from the daemon through the Microsoft/hcsshim library.
(2) "remote" containerd is what docker on Linux uses for it's runtime. It means
that there is a separate containerd service running, and docker communicates over
GRPC to it.
To try this out, you will need to start with something like the following:
Window 1:
containerd --log-level debug
Window 2:
$env:DOCKER_WINDOWS_CONTAINERD=1
dockerd --experimental -D --containerd \\.\pipe\containerd-containerd
You will need the following binary from github.com/containerd/containerd in your path:
- containerd.exe
You will need the following binaries from github.com/Microsoft/hcsshim in your path:
- runhcs.exe
- containerd-shim-runhcs-v1.exe
For LCOW, it requires an initrd.img and kernel in `C:\Program Files\Linux Containers`.
This is no different from the current requirements. However, you may need updated binaries,
particularly an initrd.img built from Microsoft/opengcs, as (at the time of writing) the
LinuxKit binaries are somewhat out of date.
Note that containerd and hcsshim for the HCS v2 APIs do not yet support all the
functionality needed for docker. This will come in time - this is a baby (although large)
step toward migrating Docker on Windows to containerd.
Note that the HCS v2 APIs are only called on RS5+ builds. RS1..RS4 will still use
the HCS v1 APIs, as the v2 APIs were not sufficiently developed on those builds to be usable.
This abstraction is done in hcsshim (referring specifically to the runtime).
Note the LCOW graphdriver still uses HCS v1 APIs regardless.
Note also that this does not migrate docker to use containerd snapshotters
rather than graphdrivers. This needs to be done in conjunction with Linux also
doing the same switch.
As people are using the ID shown in `docker info`, which was based on the v1 manifest
signing key, replace it with a UUID instead.
Remove the deprecated `--disable-legacy-registry` option, which was scheduled to be removed in 18.03.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Please refer to `docs/rootless.md`.
TLDR:
* Make sure `/etc/subuid` and `/etc/subgid` contain an entry for your user (see the example below)
* `dockerd-rootless.sh --experimental`
* `docker -H unix://$XDG_RUNTIME_DIR/docker.sock run ...`
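For example, the subordinate ID entries typically look like this (the username and ID ranges below are placeholders):
```
$ cat /etc/subuid
testuser:231072:65536
$ cat /etc/subgid
testuser:231072:65536
```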
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
This fix addresses the following go vet issues:
```
daemon/daemon.go:273: loop variable id captured by func literal
daemon/daemon.go:280: loop variable id captured by func literal
```
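The general shape of the issue and its fix is sketched below (simplified, not the actual daemon code):
```
package main

import (
	"fmt"
	"sync"
)

func main() {
	ids := []string{"a", "b", "c"}
	var wg sync.WaitGroup

	// Buggy (pre-Go 1.22): every goroutine captures the same loop variable.
	// for _, id := range ids {
	// 	go func() { fmt.Println(id) }()
	// }

	// Fixed: pass the loop variable as an argument so each goroutine
	// gets its own copy.
	for _, id := range ids {
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			fmt.Println(id)
		}(id)
	}
	wg.Wait()
}
```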
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
Many startup tasks have to run for each container, and thus using a
WaitGroup (which doesn't have a limit to the number of parallel tasks)
can result in Docker exceeding the NOFILE limit quite trivially. A better
solution is to impose a parallelism limit by using a semaphore.
In addition, several startup tasks were not parallelised previously
which resulted in very long startup times. According to my testing, 20K
dead containers resulted in ~6 minute startup times (during which time
Docker is completely unusable).
This patch fixes both issues, and the parallelStartupTimes factor chosen
(128 * NumCPU) is based on my own significant testing of the 20K
container case. This patch (on my machines) reduces the startup time
from 6 minutes to less than a minute (ideally this could be further
reduced by removing the need to scan all dead containers on startup --
but that's beyond the scope of this patchset).
In order to avoid the NOFILE limit problem, we also detect this
on-startup and if NOFILE < 2*128*NumCPU we will reduce the parallelism
factor to avoid hitting NOFILE limits (but also emit a warning since
this is almost certainly a mis-configuration).
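A minimal sketch of such a semaphore-based limit, using golang.org/x/sync/semaphore (the limit value and task body here are simplified assumptions, not the actual daemon code):
```
package main

import (
	"context"
	"fmt"
	"runtime"
	"sync"

	"golang.org/x/sync/semaphore"
)

func main() {
	limit := int64(128 * runtime.NumCPU())
	sem := semaphore.NewWeighted(limit)
	var wg sync.WaitGroup

	containers := []string{"c1", "c2", "c3"} // placeholder container IDs
	for _, id := range containers {
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			// Block here if `limit` tasks are already running, keeping the
			// number of parallel restores (and open files) bounded.
			if err := sem.Acquire(context.Background(), 1); err != nil {
				return
			}
			defer sem.Release(1)
			fmt.Println("restoring", id)
		}(id)
	}
	wg.Wait()
}
```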
Signed-off-by: Aleksa Sarai <asarai@suse.de>
The v1.10 layout and the migrator was added in 2015 via #17924.
Although the migrator is not marked as "deprecated" explicitly in
cli/docs/deprecated.md, I suppose people should have already migrated
from pre-v1.10 and no longer need the migrator, because pre-v1.10
versions do not support schema2 images (and those versions no longer
receive security updates).
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
This should eliminate a bunch of new (go-1.11 related) validation
errors reporting that the code is not formatted with `gofmt -s`.
No functional change, just whitespace (i.e.
`git show --ignore-space-change` shows nothing).
Patch generated with:
> git ls-files | grep -v ^vendor/ | grep .go$ | xargs gofmt -s -w
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This implements chown support on Windows. Built-in accounts as well
as accounts included in the SAM database of the container are supported.
NOTE: IDPair is now named Identity and IDMappings is now named
IdentityMapping.
The following are valid examples:
ADD --chown=Guest . <some directory>
COPY --chown=Administrator . <some directory>
COPY --chown=Guests . <some directory>
COPY --chown=ContainerUser . <some directory>
On Windows an owner is only granted the permission to read the security
descriptor and read/write the discretionary access control list. This
fix also grants read/write and execute permissions to the owner.
Signed-off-by: Salahuddin Khan <salah@docker.com>
Adds a supervisor package for starting and monitoring containerd.
Separates the gRPC connection, allowing access from the daemon.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
Handle the case of systemd-resolved and, if it is in use,
read resolv.conf from a different source.
Set the corresponding option on libnetwork appropriately.
Move Unix-specific code to container_operation_unix.
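A rough sketch of the detection (the paths are the conventional systemd-resolved ones; the exact moby logic may differ):
```
package main

import (
	"fmt"
	"os"
)

// resolvConfPath returns the resolv.conf file to use as the source for
// container DNS configuration. When systemd-resolved is in place,
// /etc/resolv.conf only points at the 127.0.0.53 stub resolver, so the
// real upstream servers are read from /run/systemd/resolve/resolv.conf.
func resolvConfPath() string {
	const resolvedPath = "/run/systemd/resolve/resolv.conf"
	if _, err := os.Stat(resolvedPath); err == nil {
		return resolvedPath
	}
	return "/etc/resolv.conf"
}

func main() {
	fmt.Println("using", resolvConfPath())
}
```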
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
In particular, these two:
> daemon/daemon_unix.go:1129: Wrapf format %v reads arg #1, but call has 0 args
> daemon/kill.go:111: Warn call has possible formatting directive %s
and a few more.
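For reference, the shape of those fixes is roughly the following (illustrative call sites, not the actual daemon code):
```
package main

import (
	"errors"
	"fmt"

	pkgerrors "github.com/pkg/errors"
	"github.com/sirupsen/logrus"
)

func example(err error, id string) error {
	// Before: pkgerrors.Wrapf(err, "mount failed: %v")   // %v has no argument
	// After: drop the directive (or supply the missing argument):
	wrapped := pkgerrors.Wrap(err, "mount failed")

	// Before: logrus.Warn("failed to kill container %s", id) // Warn does not format
	// After: use the formatting variant:
	logrus.Warnf("failed to kill container %s", id)

	return wrapped
}

func main() {
	fmt.Println(example(errors.New("boom"), "abc123"))
}
```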
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Instead of using a global store for volume drivers, scope the driver
store to the caller (e.g. the volume store). This makes testing much
simpler.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Since commit e9b9e4ace2 landed, there is a chance that
container.RWLayer is nil (due to some half-removed container). Let's
check the pointer before use to avoid any potential nil pointer
dereference, which would result in a daemon crash.
Note that even without the abovementioned commit, it's better to perform
an extra check (even if it's totally redundant) than to risk a daemon
crash. In other words, better safe than sorry.
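The defensive check is along these lines (a simplified sketch with stand-in types, not the literal daemon code):
```
package main

import "fmt"

// Minimal stand-ins for the real types, just to show the shape of the check.
type RWLayer struct{}

func (l *RWLayer) Size() (int64, error) { return 0, nil }

type Container struct {
	ID      string
	RWLayer *RWLayer
}

func getSize(c *Container) (int64, error) {
	if c.RWLayer == nil {
		// Half-removed or dead container: return a clear error instead of
		// letting a nil dereference crash the daemon.
		return 0, fmt.Errorf("container %s: RW layer is unavailable", c.ID)
	}
	return c.RWLayer.Size()
}

func main() {
	if _, err := getSize(&Container{ID: "abc123"}); err != nil {
		fmt.Println(err)
	}
}
```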
[v2: add a test case for daemon.getInspectData]
[v3: add a check for container.Dead and a special error for the case]
Fixes: e9b9e4ace2
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Attachable networks are networks created on the cluster which can then
be attached to by non-swarm containers. These networks are lazily
created on the node that wants to attach to that network.
When no container is currently attached to one of these networks on a
node, and then multiple containers which want that network are started
concurrently, this can cause a race condition in the network attachment
where essentially we try to attach the same network to the node twice.
To easily reproduce this issue you must use a multi-node cluster with a
worker node that has lots of CPUs (I used a 36 CPU node).
Repro steps:
1. On manager, `docker network create -d overlay --attachable test`
2. On worker, `docker create --restart=always --network test busybox
top`, many times... 200 is a good number (but not much more due to
subnet size restrictions)
3. Restart the daemon
When the daemon restarts, it will attempt to start all those containers
simultaneously. Note that you could try to do this yourself over the API,
but it's harder to trigger due to the added latency from going over
the API.
The error is produced when the daemon tries to start the container,
upon allocating the network resources:
```
attaching to network failed, make sure your network options are correct and check manager logs: context deadline exceeded
```
What happens here is the worker makes a network attachment request to
the manager. This is an async call which, in the happy case, causes a
task to be placed on the node; the worker waits for that task to get
the network configuration.
In the case of this race, the error occurs on the manager like this:
```
task allocation failure" error="failed during network allocation for task n7bwwwbymj2o2h9asqkza8gom: failed to allocate network IP for task n7bwwwbymj2o2h9asqkza8gom network rj4szie2zfauqnpgh4eri1yue: could not find an available IP" module=node node.id=u3489c490fx1df8onlyfo1v6e
```
The task is not created and the worker times out waiting for the task.
---
The mitigation for this is to make sure that only one attachment request
is in flight for a given network at a time *when the network doesn't
already exist on the node*. If the network already exists on the node,
there is no need for synchronization, because the network is already
allocated on the node and there is no need to request it from the
manager.
This basically comes down to a race with `Find(network) ||
Create(network)` without any sort of synchronization.
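A minimal sketch of that per-network synchronization (the type and method names here are illustrative, not the actual moby functions):
```
package main

import (
	"fmt"
	"sync"
)

type attacher struct {
	mu       sync.Mutex
	inFlight map[string]*sync.Mutex // one lock per network being attached
}

// attach ensures only one attachment request per network is in flight.
// If the network already exists locally, no manager round-trip is needed.
func (a *attacher) attach(network string) error {
	a.mu.Lock()
	lock, ok := a.inFlight[network]
	if !ok {
		lock = &sync.Mutex{}
		a.inFlight[network] = lock
	}
	a.mu.Unlock()

	lock.Lock()
	defer lock.Unlock()

	if a.findLocal(network) {
		return nil // already allocated on this node
	}
	return a.requestFromManager(network) // placeholder for the real async call
}

func (a *attacher) findLocal(string) bool           { return false }
func (a *attacher) requestFromManager(string) error { return nil }

func main() {
	a := &attacher{inFlight: map[string]*sync.Mutex{}}
	fmt.Println(a.attach("test"))
}
```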
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Signed-off-by: John Howard <jhoward@microsoft.com>
This re-coalesces the daemon stores which were split as part of the
original LCOW implementation.
This is part of the work discussed in https://github.com/moby/moby/issues/34617,
in particular see the document linked to in that issue.
When the containerd 1.0 runtime changes were made, we inadvertently
removed the functionality where any running containers are killed on
startup when not using live-restore.
This change restores that behavior.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>