Commit graph

922 commits

Author SHA1 Message Date
Brian Goff
eaad3ee3cf Make sure timers are stopped after use.
`time.After` keeps a timer running until the specified duration is
completed. It also allocates a new timer on each call. This can wind up
leaving lots of uneccessary timers running in the background that are
not needed and consume resources.

Instead of `time.After`, use `time.NewTimer` so the timer can actually
be stopped.
In some of these cases it's not a big deal since the duraiton is really
short, but in others it is much worse.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2019-01-16 14:32:53 -08:00
Yong Tang
7315a2bb11 Fix go vet issue in daemon/daemon.go
This fix fixes go vet issue:
```
daemon/daemon.go:273: loop variable id captured by func literal
daemon/daemon.go:280: loop variable id captured by func literal
```

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
2019-01-06 00:18:29 +00:00
Akihiro Suda
2cb26cfe9c
Merge pull request #38301 from cyphar/waitgroup-limits
daemon: switch to semaphore-gated WaitGroup for startup tasks
2018-12-22 00:07:55 +09:00
Aleksa Sarai
5a52917e4d
daemon: switch to semaphore-gated WaitGroup for startup tasks
Many startup tasks have to run for each container, and thus using a
WaitGroup (which doesn't have a limit to the number of parallel tasks)
can result in Docker exceeding the NOFILE limit quite trivially. A more
optimal solution is to have a parallelism limit by using a semaphore.

In addition, several startup tasks were not parallelised previously
which resulted in very long startup times. According to my testing, 20K
dead containers resulted in ~6 minute startup times (during which time
Docker is completely unusable).

This patch fixes both issues, and the parallelStartupTimes factor chosen
(128 * NumCPU) is based on my own significant testing of the 20K
container case. This patch (on my machines) reduces the startup time
from 6 minutes to less than a minute (ideally this could be further
reduced by removing the need to scan all dead containers on startup --
but that's beyond the scope of this patchset).

In order to avoid the NOFILE limit problem, we also detect this
on-startup and if NOFILE < 2*128*NumCPU we will reduce the parallelism
factor to avoid hitting NOFILE limits (but also emit a warning since
this is almost certainly a mis-configuration).

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2018-12-21 21:51:02 +11:00
Akihiro Suda
1fea38856a Remove v1.10 migrator
The v1.10 layout and the migrator was added in 2015 via #17924.

Although the migrator is not marked as "deprecated" explicitly in
cli/docs/deprecated.md, I suppose people should have already migrated
from pre-v1.10 and they no longer need the migrator, because pre-v1.10
version do not support schema2 images (and these versions no longer
receives security updates).

Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
2018-11-24 17:45:13 +09:00
Anda Xu
171d51c861 add support of registry-mirrors and insecure-registries to buildkit
Signed-off-by: Anda Xu <anda.xu@docker.com>
2018-09-20 11:53:02 -07:00
Kir Kolyshkin
9b0097a699 Format code with gofmt -s from go-1.11beta1
This should eliminate a bunch of new (go-1.11 related) validation
errors telling that the code is not formatted with `gofmt -s`.

No functional change, just whitespace (i.e.
`git show --ignore-space-change` shows nothing).

Patch generated with:

> git ls-files | grep -v ^vendor/ | grep .go$ | xargs gofmt -s -w

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2018-09-06 15:24:16 -07:00
Anda Xu
58a75cebdd allow features option live reloadable
Signed-off-by: Anda Xu <anda.xu@docker.com>
2018-08-31 12:43:04 -07:00
John Howard
5accd82634 Add containerd.WithTimeout(60*time.Second) to match old calls
Signed-off-by: John Howard <jhoward@microsoft.com>
2018-08-23 12:03:43 -07:00
John Stephens
b3e9f7b13b
Merge pull request #35521 from salah-khan/35507
Add --chown flag support for ADD/COPY commands for Windows
2018-08-17 11:31:16 -07:00
Salahuddin Khan
763d839261 Add ADD/COPY --chown flag support to Windows
This implements chown support on Windows. Built-in accounts as well
as accounts included in the SAM database of the container are supported.

NOTE: IDPair is now named Identity and IDMappings is now named
IdentityMapping.

The following are valid examples:
ADD --chown=Guest . <some directory>
COPY --chown=Administrator . <some directory>
COPY --chown=Guests . <some directory>
COPY --chown=ContainerUser . <some directory>

On Windows an owner is only granted the permission to read the security
descriptor and read/write the discretionary access control list. This
fix also grants read/write and execute permissions to the owner.

Signed-off-by: Salahuddin Khan <salah@docker.com>
2018-08-13 21:59:11 -07:00
Derek McGowan
dd2e19ebd5
libcontainerd: split client and supervisor
Adds a supervisor package for starting and monitoring containerd.
Separates grpc connection allowing access from daemon.

Signed-off-by: Derek McGowan <derek@mcgstyle.net>
2018-08-06 10:23:04 -07:00
Flavio Crisciani
e353e7e3f0
Fixes for resolv.conf
Handle the case of systemd-resolved, and if in place
use a different resolv.conf source.
Set appropriately the option on libnetwork.
Move unix specific code to container_operation_unix

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2018-07-26 11:17:56 -07:00
Sebastiaan van Stijn
3737194b9f
daemon/*.go: fix some Wrap[f]/Warn[f] errors
In particular, these two:
> daemon/daemon_unix.go:1129: Wrapf format %v reads arg #1, but call has 0 args
> daemon/kill.go:111: Warn call has possible formatting directive %s

and a few more.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2018-07-11 15:51:51 +02:00
Tonis Tiigi
157b0b30db builder: lint fixes
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
2018-06-10 10:05:29 -07:00
Tonis Tiigi
ea36c3cbaf daemon: access to distribution internals
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
2018-06-10 10:05:26 -07:00
Brian Goff
e4b6adc88e Extract volume interaction to a volumes service
This cleans up some of the package API's used for interacting with
volumes, and simplifies management.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-05-25 14:21:07 -04:00
Brian Goff
82d9185470
Merge pull request #36396 from selansen/master
Allow user to specify default address pools for docker networks
2018-05-03 06:34:14 -04:00
Alessandro Boch
173b3c364e Allow user to control the default address pools
- Via daemon flag --default-address-pools base=<CIDR>,size=<int>

Signed-off-by: Elango Siva  <elango@docker.com>
2018-04-30 11:14:08 -04:00
Antonio Murdaca
75d3214934
restartmanager: do not apply restart policy on created containers
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2018-04-24 11:41:09 +02:00
Brian Goff
977109d808 Remove use of global volume driver store
Instead of using a global store for volume drivers, scope the driver
store to the caller (e.g. the volume store). This makes testing much
simpler.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-04-17 14:07:08 -04:00
Brian Goff
0023abbad3 Remove old/uneeded volume migration from vers 1.7
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-04-17 14:06:53 -04:00
Daniel Nephin
2b1a2b10af Move ImageService to new package
Signed-off-by: Daniel Nephin <dnephin@docker.com>
2018-02-26 16:49:37 -05:00
Daniel Nephin
0dab53ff3c Move all daemon image methods into imageService
imageService provides the backend for the image API and handles the
imageStore, and referenceStore.

Signed-off-by: Daniel Nephin <dnephin@docker.com>
2018-02-26 16:48:29 -05:00
Tibor Vass
747c163a65
Merge pull request #36303 from dnephin/cleanup-in-daemon-unix
Cleanup unnecessary and duplicate functions in `daemon_unix.go`
2018-02-16 14:55:18 -08:00
Brian Goff
b0b9a25e7e Move log validator logic after plugins are loaded
This ensures that all log plugins are registered when the log validator
is run.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-02-15 11:53:11 -05:00
Daniel Nephin
c502bcff33 Remove unnecessary getLayerInit
Signed-off-by: Daniel Nephin <dnephin@docker.com>
2018-02-14 11:59:10 -05:00
Kir Kolyshkin
195893d381 c.RWLayer: check for nil before use
Since commit e9b9e4ace2 has landed, there is a chance that
container.RWLayer is nil (due to some half-removed container). Let's
check the pointer before use to avoid any potential nil pointer
dereferences, resulting in a daemon crash.

Note that even without the abovementioned commit, it's better to perform
an extra check (even it's totally redundant) rather than to have a
possibility of a daemon crash. In other words, better be safe than
sorry.

[v2: add a test case for daemon.getInspectData]
[v3: add a check for container.Dead and a special error for the case]

Fixes: e9b9e4ace2
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2018-02-09 11:24:09 -08:00
Daniel Nephin
4f0d95fa6e Add canonical import comment
Signed-off-by: Daniel Nephin <dnephin@docker.com>
2018-02-05 16:51:57 -05:00
Brian Goff
c379d2681f Fix race in attachable network attachment
Attachable networks are networks created on the cluster which can then
be attached to by non-swarm containers. These networks are lazily
created on the node that wants to attach to that network.

When no container is currently attached to one of these networks on a
node, and then multiple containers which want that network are started
concurrently, this can cause a race condition in the network attachment
where essentially we try to attach the same network to the node twice.

To easily reproduce this issue you must use a multi-node cluster with a
worker node that has lots of CPUs (I used a 36 CPU node).

Repro steps:

1. On manager, `docker network create -d overlay --attachable test`
2. On worker, `docker create --restart=always --network test busybox
top`, many times... 200 is a good number (but not much more due to
subnet size restrictions)
3. Restart the daemon

When the daemon restarts, it will attempt to start all those containers
simultaneously. Note that you could try to do this yourself over the API,
but it's harder to trigger due to the added latency from going over
the API.

The error produced happens when the daemon tries to start the container
upon allocating the network resources:

```
attaching to network failed, make sure your network options are correct and check manager logs: context deadline exceeded
```

What happens here is the worker makes a network attachment request to
the manager. This is an async call which in the happy case would cause a
task to be placed on the node, which the worker is waiting for to get
the network configuration.
In the case of this race, the error ocurrs on the manager like this:

```
task allocation failure" error="failed during network allocation for task n7bwwwbymj2o2h9asqkza8gom: failed to allocate network IP for task n7bwwwbymj2o2h9asqkza8gom network rj4szie2zfauqnpgh4eri1yue: could not find an available IP" module=node node.id=u3489c490fx1df8onlyfo1v6e
```

The task is not created and the worker times out waiting for the task.

---

The mitigation for this is to make sure that only one attachment reuest
is in flight for a given network at a time *when the network doesn't
already exist on the node*. If the network already exists on the node
there is no need for synchronization because the network is already
allocated and on the node so there is no need to request it from the
manager.

This basically comes down to a race with `Find(network) ||
Create(network)` without any sort of syncronization.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-02-02 13:46:23 -05:00
Allen Sun
de68ac8393
Simplify codes on calculating shutdown timeout
Signed-off-by: Allen Sun <shlallen1990@gmail.com>
Signed-off-by: Vincent Demeester <vincent@sbr.pm>
2018-01-26 09:18:07 -08:00
John Howard
0cba7740d4 Address feedback from Tonis
Signed-off-by: John Howard <jhoward@microsoft.com>
2018-01-18 12:30:39 -08:00
John Howard
afd305c4b5 LCOW: Refactor to multiple layer-stores based on feedback
Signed-off-by: John Howard <jhoward@microsoft.com>
2018-01-18 08:31:05 -08:00
John Howard
ce8e529e18 LCOW: Re-coalesce stores
Signed-off-by: John Howard <jhoward@microsoft.com>

The re-coalesces the daemon stores which were split as part of the
original LCOW implementation.

This is part of the work discussed in https://github.com/moby/moby/issues/34617,
in particular see the document linked to in that issue.
2018-01-18 08:29:19 -08:00
Yong Tang
c36274da83
Merge pull request #35638 from cpuguy83/error_helpers2
Add helpers to create errdef errors
2018-01-15 10:56:46 -08:00
Sebastiaan van Stijn
b4a6313969
Golint: remove redundant ifs
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2018-01-15 00:42:25 +01:00
Brian Goff
d453fe35b9 Move api/errdefs to errdefs
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-01-11 21:21:43 -05:00
Victor Vieux
745278d242
Merge pull request #35812 from stevvooe/follow-conventions
daemon, plugin: follow containerd namespace conventions
2017-12-19 15:55:39 -08:00
Brian Goff
e69127bd5b Ensure containers are stopped on daemon startup
When the containerd 1.0 runtime changes were made, we inadvertantly
removed the functionality where any running containers are killed on
startup when not using live-restore.
This change restores that behavior.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2017-12-18 14:33:45 -05:00
Stephen J Day
521e7eba86
daemon, plugin: follow containerd namespace conventions
Follow the conventions for namespace naming set out by other projects,
such as linuxkit and cri-containerd. Typically, they are some sort of
host name, with a subdomain describing functionality of the namespace.
In the case of linuxkit, services are launched in `services.linuxkit`.
In cri-containerd, pods are launched in `k8s.io`, making it clear that
these are from kubernetes.

Signed-off-by: Stephen J Day <stephen.day@docker.com>
2017-12-15 17:20:42 -08:00
Kir Kolyshkin
516010e92d Simplify/fix MkdirAll usage
This subtle bug keeps lurking in because error checking for `Mkdir()`
and `MkdirAll()` is slightly different wrt to `EEXIST`/`IsExist`:

 - for `Mkdir()`, `IsExist` error should (usually) be ignored
   (unless you want to make sure directory was not there before)
   as it means "the destination directory was already there"

 - for `MkdirAll()`, `IsExist` error should NEVER be ignored.

Mostly, this commit just removes ignoring the IsExist error, as it
should not be ignored.

Also, there are a couple of cases then IsExist is handled as
"directory already exist" which is wrong. As a result, some code
that never worked as intended is now removed.

NOTE that `idtools.MkdirAndChown()` behaves like `os.MkdirAll()`
rather than `os.Mkdir()` -- so its description is amended accordingly,
and its usage is handled as such (i.e. IsExist error is not ignored).

For more details, a quote from my runc commit 6f82d4b (July 2015):

    TL;DR: check for IsExist(err) after a failed MkdirAll() is both
    redundant and wrong -- so two reasons to remove it.

    Quoting MkdirAll documentation:

    > MkdirAll creates a directory named path, along with any necessary
    > parents, and returns nil, or else returns an error. If path
    > is already a directory, MkdirAll does nothing and returns nil.

    This means two things:

    1. If a directory to be created already exists, no error is
    returned.

    2. If the error returned is IsExist (EEXIST), it means there exists
    a non-directory with the same name as MkdirAll need to use for
    directory. Example: we want to MkdirAll("a/b"), but file "a"
    (or "a/b") already exists, so MkdirAll fails.

    The above is a theory, based on quoted documentation and my UNIX
    knowledge.

    3. In practice, though, current MkdirAll implementation [1] returns
    ENOTDIR in most of cases described in #2, with the exception when
    there is a race between MkdirAll and someone else creating the
    last component of MkdirAll argument as a file. In this very case
    MkdirAll() will indeed return EEXIST.

    Because of #1, IsExist check after MkdirAll is not needed.

    Because of #2 and #3, ignoring IsExist error is just plain wrong,
    as directory we require is not created. It's cleaner to report
    the error now.

    Note this error is all over the tree, I guess due to copy-paste,
    or trying to follow the same usage pattern as for Mkdir(),
    or some not quite correct examples on the Internet.

    [1] https://github.com/golang/go/blob/f9ed2f75/src/os/path.go

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2017-11-27 17:32:12 -08:00
Darren Stahl
ed74ee127f Increase container default shutdown timeout on Windows
The shutdown timeout for containers in insufficient on Windows. If the daemon is shutting down, and a container takes longer than expected to shut down, this can cause the container to remain in a bad state after restart, and never be able to start again. Increasing the timeout makes this less likely to occur.

Signed-off-by: Darren Stahl <darst@microsoft.com>
2017-10-23 10:31:31 -07:00
Brian Goff
402540708c Merge pull request #34895 from mlaventure/containerd-1.0-client
Containerd 1.0 client
2017-10-23 10:38:03 -04:00
Yong Tang
ab0eb8fcf6 Merge pull request #35077 from ryansimmen/35076-WindowsDaemonTmpDir
Windows Daemon should respect DOCKER_TMPDIR
2017-10-20 08:40:43 -07:00
Kenfe-Mickael Laventure
ddae20c032
Update libcontainerd to use containerd 1.0
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
2017-10-20 07:11:37 -07:00
Ryan Simmen
5611f127a7 Windows Daemon should respect DOCKER_TMPDIR
Signed-off-by: Ryan Simmen <ryan.simmen@gmail.com>
2017-10-19 10:47:46 -04:00
John Howard
0380fbff37 LCOW: API: Add platform to /images/create and /build
Signed-off-by: John Howard <jhoward@microsoft.com>

This PR has the API changes described in https://github.com/moby/moby/issues/34617.
Specifically, it adds an HTTP header "X-Requested-Platform" which is a JSON-encoded
OCI Image-spec `Platform` structure.

In addition, it renames (almost all) uses of a string variable platform (and associated)
methods/functions to os. This makes it much clearer to disambiguate with the swarm
"platform" which is really os/arch. This is a stepping stone to getting the daemon towards
fully multi-platform/arch-aware, and makes it clear when "operating system" is being
referred to rather than "platform" which is misleadingly used - sometimes in the swarm
meaning, but more often as just the operating system.
2017-10-06 11:44:18 -07:00
Pradip Dhara
d00a07b1e6 Updating moby to correspond to naming convention used in https://github.com/docker/swarmkit/pull/2385
Signed-off-by: Pradip Dhara <pradipd@microsoft.com>
2017-09-26 22:08:10 +00:00
Victor Vieux
a971f9c9d7 Merge pull request #34911 from dnephin/new-ci-entrypoint
Add a new entrypoint for CI
2017-09-26 11:50:44 -07:00
Sebastiaan van Stijn
2b50b14aeb
Suppress warning for renaming missing tmp directory
When starting `dockerd` on a host that has no `/var/lib/docker/tmp` directory,
a warning was printed in the logs:

    $ dockerd --data-root=/no-such-directory
    ...
    WARN[2017-09-26T09:37:00.045153377Z] failed to rename /no-such-directory/tmp for background deletion: rename /no-such-directory/tmp /no-such-directory/tmp-old: no such file or directory. Deleting synchronously

Although harmless, the warning does not show any useful information, so can be
skipped.

This patch checks thetype of error, so that warning is not printed.
Other errors will still show up:

    $ touch /i-am-a-file
    $ dockerd --data-root=/i-am-a-file
    Unable to get the full path to root (/i-am-a-file): canonical path points to a file '/i-am-a-file'

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2017-09-26 12:04:30 +02:00