Commit graph

137 commits

Author SHA1 Message Date
Cory Snider
6149c333ff daemon: don't checkpoint container until registered
(*Container).CheckpointTo() upserts a snapshot of the container to the
daemon's in-memory ViewDB and also persists the snapshot to disk. It
does not register the live container object with the daemon's container
store, however. The ViewDB and container store are used as the source of
truth for different operations, so having a container registered in one
but not the other can result in inconsistencies. In particular, the List
Containers API uses the ViewDB as its source of truth and the Container
Inspect API uses the container store.

The (*Daemon).setHostConfig() method is called fairly early in the
process of creating a container, long before the container is registered
in the daemon's container store. Due to a rogue CheckpointTo() call
inside setHostConfig(), there is a window of time where a container can
be included in a List Containers API response but "not exist" according
to the Container Inspect API and similar endpoints which operate on a
particular container. Remove the rogue call so that the caller has full
control over when the container is checkpointed and update callers to
checkpoint explicitly. No changes to (*Daemon).create() are needed as it
checkpoints the fully-created container via (*Daemon).Register().

Fixes #44512.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 0141c6db81)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-12-13 18:08:21 -05:00
Paweł Gronowski
e2bd8edb0d daemon/restart: Don't mutate AutoRemove when restarting
This caused a race condition where AutoRemove could be restored before
container was considered for restart and made autoremove containers
impossible to restart.

```
$ make DOCKER_GRAPHDRIVER=vfs BIND_DIR=. TEST_FILTER='TestContainerWithAutoRemoveCanBeRestarted' TESTFLAGS='-test.count 1' test-integration
...
=== RUN   TestContainerWithAutoRemoveCanBeRestarted
=== RUN   TestContainerWithAutoRemoveCanBeRestarted/kill
=== RUN   TestContainerWithAutoRemoveCanBeRestarted/stop
--- PASS: TestContainerWithAutoRemoveCanBeRestarted (1.61s)
    --- PASS: TestContainerWithAutoRemoveCanBeRestarted/kill (0.70s)
    --- PASS: TestContainerWithAutoRemoveCanBeRestarted/stop (0.86s)
PASS

DONE 3 tests in 3.062s
```

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2022-07-29 16:49:56 +02:00
Sebastiaan van Stijn
300c11c7c9
volume/mounts: remove "containerOS" argument from NewParser (LCOW code)
This changes mounts.NewParser() to create a parser for the current operatingsystem,
instead of one specific to a (possibly non-matching, in case of LCOW) OS.

With the OS-specific handling being removed, the "OS" parameter is also removed
from `daemon.verifyContainerSettings()`, and various other container-related
functions.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-07-02 13:51:55 +02:00
Sebastiaan van Stijn
dc7cbb9b33
remove layerstore indexing by OS (used for LCOW)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-06-10 17:49:11 +02:00
Brian Goff
51f5b1279d Don't set image on containerd container.
We aren't using containerd's image store, so we shouldn't be setting
this value.

This fixes container checkpoints, where containerd attempts to
checkpoint the image since one is set, but the image does not exist in
containerd.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-11-06 04:55:03 +00:00
Sebastiaan van Stijn
182795cff6
Do not call mount.RecursiveUnmount() on Windows
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-10-29 23:00:16 +01:00
Brian Goff
f63f73a4a8 Configure shims from runtime config
In dockerd we already have a concept of a "runtime", which specifies the
OCI runtime to use (e.g. runc).
This PR extends that config to add containerd shim configuration.
This option is only exposed within the daemon itself (cannot be
configured in daemon.json).
This is due to issues in supporting unknown shims which will require
more design work.

What this change allows us to do is keep all the runtime config in one
place.

So the default "runc" runtime will just have it's already existing shim
config codified within the runtime config alone.
I've also added 2 more "stock" runtimes which are basically runc+shimv1
and runc+shimv2.
These new runtime configurations are:

- io.containerd.runtime.v1.linux - runc + v1 shim using the V1 shim API
- io.containerd.runc.v2 - runc + shim v2

These names coincide with the actual names of the containerd shims.

This allows the user to essentially control what shim is going to be
used by either specifying these as a `--runtime` on container create or
by setting `--default-runtime` on the daemon.

For custom/user-specified runtimes, the default shim config (currently
shim v1) is used.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-07-13 14:18:02 -07:00
Sebastiaan van Stijn
eb14d936bf
daemon: rename variables that collide with imported package names
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-04-14 17:22:23 +02:00
Sebastiaan van Stijn
5d040cbd16
daemon: fix capitalization of some functions
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-04-14 17:22:19 +02:00
Kir Kolyshkin
39048cf656 Really switch to moby/sys/mount*
Switch to moby/sys/mount and mountinfo. Keep the pkg/mount for potential
outside users.

This commit was generated by the following bash script:

```
set -e -u -o pipefail

for file in $(git grep -l 'docker/docker/pkg/mount"' | grep -v ^pkg/mount); do
	sed -i -e 's#/docker/docker/pkg/mount"#/moby/sys/mount"#' \
		-e 's#mount\.\(GetMounts\|Mounted\|Info\|[A-Za-z]*Filter\)#mountinfo.\1#g' \
		$file
	goimports -w $file
done
```

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-20 09:46:25 -07:00
Evan Hazlett
35ac4be5d5 add NewContainerOpts to libcontainerd.Create
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
2019-10-03 11:45:41 -04:00
Sebastiaan van Stijn
1250e42a43
daemon:containerStart() fix unhandled error for saveApparmorConfig
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-08-29 20:28:58 +02:00
Brian Goff
5ba30cd1dc Delete stale containerd object on start failure
containerd has two objects with regard to containers.
There is a "container" object which is metadata and a "task" which is
manging the actual runtime state.

When docker starts a container, it creartes both the container metadata
and the task at the same time. So when a container exits, docker deletes
both of these objects as well.

This ensures that if, on start, when we go to create the container metadata object
in containerd, if there is an error due to a name conflict that we go
ahead and clean that up and try again.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2019-02-14 11:46:44 -08:00
Deng Guangxing
8e293be4ba fix unless-stopped unexpected behavior
fix https://github.com/moby/moby/issues/35304.

Signed-off-by: dengguangxing <dengguangxing@huawei.com>
2019-02-01 15:03:17 -08:00
Kir Kolyshkin
77bc327e24 UnmountIpcMount: simplify
As standard mount.Unmount does what we need, let's use it.

In addition, this adds ignoring "not mounted" condition, which
was previously implemented (see PR#33329, commit cfa2591d3f)
via a very expensive call to mount.Mounted().

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2018-12-10 20:06:10 -08:00
Daniel Nephin
2b1a2b10af Move ImageService to new package
Signed-off-by: Daniel Nephin <dnephin@docker.com>
2018-02-26 16:49:37 -05:00
Daniel Nephin
0dab53ff3c Move all daemon image methods into imageService
imageService provides the backend for the image API and handles the
imageStore, and referenceStore.

Signed-off-by: Daniel Nephin <dnephin@docker.com>
2018-02-26 16:48:29 -05:00
Daniel Nephin
4f0d95fa6e Add canonical import comment
Signed-off-by: Daniel Nephin <dnephin@docker.com>
2018-02-05 16:51:57 -05:00
Anusha Ragunathan
c162e8eb41
Merge pull request #35830 from cpuguy83/unbindable_shm
Make container shm parent unbindable
2018-01-19 17:43:30 -08:00
John Howard
afd305c4b5 LCOW: Refactor to multiple layer-stores based on feedback
Signed-off-by: John Howard <jhoward@microsoft.com>
2018-01-18 08:31:05 -08:00
John Howard
ce8e529e18 LCOW: Re-coalesce stores
Signed-off-by: John Howard <jhoward@microsoft.com>

The re-coalesces the daemon stores which were split as part of the
original LCOW implementation.

This is part of the work discussed in https://github.com/moby/moby/issues/34617,
in particular see the document linked to in that issue.
2018-01-18 08:29:19 -08:00
Brian Goff
eaa5192856 Make container resource mounts unbindable
It's a common scenario for admins and/or monitoring applications to
mount in the daemon root dir into a container. When doing so all mounts
get coppied into the container, often with private references.
This can prevent removal of a container due to the various mounts that
must be configured before a container is started (for example, for
shared /dev/shm, or secrets) being leaked into another namespace,
usually with private references.

This is particularly problematic on older kernels (e.g. RHEL < 7.4)
where a mount may be active in another namespace and attempting to
remove a mountpoint which is active in another namespace fails.

This change moves all container resource mounts into a common directory
so that the directory can be made unbindable.
What this does is prevents sub-mounts of this new directory from leaking
into other namespaces when mounted with `rbind`... which is how all
binds are handled for containers.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-01-16 15:09:05 -05:00
Yong Tang
c36274da83
Merge pull request #35638 from cpuguy83/error_helpers2
Add helpers to create errdef errors
2018-01-15 10:56:46 -08:00
Sebastiaan van Stijn
b4a6313969
Golint: remove redundant ifs
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2018-01-15 00:42:25 +01:00
Brian Goff
d453fe35b9 Move api/errdefs to errdefs
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-01-11 21:21:43 -05:00
Brian Goff
87a12421a9 Add helpers to create errdef errors
Instead of having to create a bunch of custom error types that are doing
nothing but wrapping another error in sub-packages, use a common helper
to create errors of the requested type.

e.g. instead of re-implementing this over and over:

```go
type notFoundError struct {
  cause error
}

func(e notFoundError) Error() string {
  return e.cause.Error()
}

func(e notFoundError) NotFound() {}

func(e notFoundError) Cause() error {
  return e.cause
}
```

Packages can instead just do:

```
  errdefs.NotFound(err)
```

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-01-11 21:21:43 -05:00
Kenfe-Mickael Laventure
ddae20c032
Update libcontainerd to use containerd 1.0
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
2017-10-20 07:11:37 -07:00
John Howard
0380fbff37 LCOW: API: Add platform to /images/create and /build
Signed-off-by: John Howard <jhoward@microsoft.com>

This PR has the API changes described in https://github.com/moby/moby/issues/34617.
Specifically, it adds an HTTP header "X-Requested-Platform" which is a JSON-encoded
OCI Image-spec `Platform` structure.

In addition, it renames (almost all) uses of a string variable platform (and associated)
methods/functions to os. This makes it much clearer to disambiguate with the swarm
"platform" which is really os/arch. This is a stepping stone to getting the daemon towards
fully multi-platform/arch-aware, and makes it clear when "operating system" is being
referred to rather than "platform" which is misleadingly used - sometimes in the swarm
meaning, but more often as just the operating system.
2017-10-06 11:44:18 -07:00
Akash Gupta
7a7357dae1 LCOW: Implemented support for docker cp + build
This enables docker cp and ADD/COPY docker build support for LCOW.
Originally, the graphdriver.Get() interface returned a local path
to the container root filesystem. This does not work for LCOW, so
the Get() method now returns an interface that LCOW implements to
support copying to and from the container.

Signed-off-by: Akash Gupta <akagup@microsoft.com>
2017-09-14 12:07:52 -07:00
Daniel Nephin
9b47b7b151 Fix golint errors.
Signed-off-by: Daniel Nephin <dnephin@docker.com>
2017-08-18 14:23:44 -04:00
John Howard
9fa449064c LCOW: WORKDIR correct handling
Signed-off-by: John Howard <jhoward@microsoft.com>
2017-08-17 15:29:17 -07:00
Brian Goff
ebcb7d6b40 Remove string checking in API error handling
Use strongly typed errors to set HTTP status codes.
Error interfaces are defined in the api/errors package and errors
returned from controllers are checked against these interfaces.

Errors can be wraeped in a pkg/errors.Causer, as long as somewhere in the
line of causes one of the interfaces is implemented. The special error
interfaces take precedence over Causer, meaning if both Causer and one
of the new error interfaces are implemented, the Causer is not
traversed.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2017-08-15 16:01:11 -04:00
Kir Kolyshkin
7120976d74 Implement none, private, and shareable ipc modes
Since the commit d88fe447df ("Add support for sharing /dev/shm/ and
/dev/mqueue between containers") container's /dev/shm is mounted on the
host first, then bind-mounted inside the container. This is done that
way in order to be able to share this container's IPC namespace
(and the /dev/shm mount point) with another container.

Unfortunately, this functionality breaks container checkpoint/restore
(even if IPC is not shared). Since /dev/shm is an external mount, its
contents is not saved by `criu checkpoint`, and so upon restore any
application that tries to access data under /dev/shm is severily
disappointed (which usually results in a fatal crash).

This commit solves the issue by introducing new IPC modes for containers
(in addition to 'host' and 'container:ID'). The new modes are:

 - 'shareable':	enables sharing this container's IPC with others
		(this used to be the implicit default);

 - 'private':	disables sharing this container's IPC.

In 'private' mode, container's /dev/shm is truly mounted inside the
container, without any bind-mounting from the host, which solves the
issue.

While at it, let's also implement 'none' mode. The motivation, as
eloquently put by Justin Cormack, is:

> I wondered a while back about having a none shm mode, as currently it is
> not possible to have a totally unwriteable container as there is always
> a /dev/shm writeable mount. It is a bit of a niche case (and clearly
> should never be allowed to be daemon default) but it would be trivial to
> add now so maybe we should...

...so here's yet yet another mode:

 - 'none':	no /dev/shm mount inside the container (though it still
		has its own private IPC namespace).

Now, to ultimately solve the abovementioned checkpoint/restore issue, we'd
need to make 'private' the default mode, but unfortunately it breaks the
backward compatibility. So, let's make the default container IPC mode
per-daemon configurable (with the built-in default set to 'shareable'
for now). The default can be changed either via a daemon CLI option
(--default-shm-mode) or a daemon.json configuration file parameter
of the same name.

Note one can only set either 'shareable' or 'private' IPC modes as a
daemon default (i.e. in this context 'host', 'container', or 'none'
do not make much sense).

Some other changes this patch introduces are:

1. A mount for /dev/shm is added to default OCI Linux spec.

2. IpcMode.Valid() is simplified to remove duplicated code that parsed
   'container:ID' form. Note the old version used to check that ID does
   not contain a semicolon -- this is no longer the case (tests are
   modified accordingly). The motivation is we should either do a
   proper check for container ID validity, or don't check it at all
   (since it is checked in other places anyway). I chose the latter.

3. IpcMode.Container() is modified to not return container ID if the
   mode value does not start with "container:", unifying the check to
   be the same as in IpcMode.IsContainer().

3. IPC mode unit tests (runconfig/hostconfig_test.go) are modified
   to add checks for newly added values.

[v2: addressed review at https://github.com/moby/moby/pull/34087#pullrequestreview-51345997]
[v3: addressed review at https://github.com/moby/moby/pull/34087#pullrequestreview-53902833]
[v4: addressed the case of upgrading from older daemon, in this case
     container.HostConfig.IpcMode is unset and this is valid]
[v5: document old and new IpcMode values in api/swagger.yaml]
[v6: add the 'none' mode, changelog entry to docs/api/version-history.md]

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2017-08-14 10:50:39 +03:00
Derek McGowan
1009e6a40b
Update logrus to v1.0.1
Fixes case sensitivity issue

Signed-off-by: Derek McGowan <derek@mcgstyle.net>
2017-07-31 13:16:46 -07:00
Fabio Kung
66b231d598 delete unused code (daemon.Start)
Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
2017-06-23 07:52:34 -07:00
Fabio Kung
edad52707c save deep copies of Container in the replica store
Reuse existing structures and rely on json serialization to deep copy
Container objects.

Also consolidate all "save" operations on container.CheckpointTo, which
now both saves a serialized json to disk, and replicates state to the
ACID in-memory store.

Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
2017-06-23 07:52:33 -07:00
Fabio Kung
aacddda89d Move checkpointing to the Container object
Also hide ViewDB behind an inteface.

Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
2017-06-23 07:52:32 -07:00
Fabio Kung
eed4c7b73f keep a consistent view of containers rendered
Replicate relevant mutations to the in-memory ACID store. Readers will
then be able to query container state without locking.

Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
2017-06-23 07:52:31 -07:00
John Howard
3aa4a00715 LCOW: Move daemon stores to per platform
Signed-off-by: John Howard <jhoward@microsoft.com>
2017-06-20 19:49:52 -07:00
Brian Goff
7917a36cc7 Fix some data races
After running the test suite with the race detector enabled I found
these gems that need to be fixed.
This is just round one, sadly lost my test results after I built the
binary to test this... (whoops)

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2017-02-01 14:43:58 -05:00
ROBERTO MUÑOZ
d97a00dfd5 Added an apparmorEnabled boolean in the Daemon struct to indicate if AppArmor is enabled or not. It is set in NewDaemon using sysInfo information.
Signed-off-by: Roberto Muñoz Fernández <robertomf@gmail.com>

Added an apparmorEnabled boolean in the Daemon struct to indicate if AppArmor is enabled or not. It is set in NewDaemon using sysInfo information.

Signed-off-by: Roberto Muñoz Fernández <robertomf@gmail.com>

gofmt'd

Signed-off-by: Roberto Muñoz Fernández <robertomf@gmail.com>

change the function name to something more adequate and changed the behaviour to show empty value on an apparmor disabled system.

Signed-off-by: Roberto Muñoz Fernández <robertomf@gmail.com>

go fmt

Signed-off-by: Roberto Muñoz Fernández <robertomf@gmail.com>
2017-01-30 16:23:23 +01:00
Lei Jitang
e806821b53 fix #29199, reset container if container start failed
Signed-off-by: Lei Jitang <leijitang@huawei.com>
2016-12-07 01:37:08 -05:00
Vincent Demeester
ef39256dfb
Remove hostname validation as it seems to break users
Validation is still done by swarmkit on the service side.

Signed-off-by: Vincent Demeester <vincent@sbr.pm>
2016-11-30 19:22:07 +01:00
Brian Goff
9a2d0bc3ad Fix uneccessary calls to volume.Unmount()
Fixes #22564

When an error occurs on mount, there should not be any call later to
unmount. This can throw off refcounting in the underlying driver
unexpectedly.

Consider these two cases:

```
$ docker run -v foo:/bar busybox true
```

```
$ docker run -v foo:/bar -w /foo busybox true
```

In the first case, if mounting `foo` fails, the volume driver will not
get a call to unmount (this is the incorrect behavior).

In the second case, the volume driver will not get a call to unmount
(correct behavior).

This occurs because in the first case, `/bar` does not exist in the
container, and as such there is no call to `volume.Mount()` during the
`create` phase. It will error out during the `start` phase.

In the second case `/bar` is created before dealing with the volume
because of the `-w`. Because of this, when the volume is being setup
docker will try to copy the image path contents in the volume, in which
case it will attempt to mount the volume and fail. This happens during
the `create` phase. This makes it so the container will not be created
(or at least fully created) and the user gets the error on `create`
instead of `start`. The error handling is different in these two phases.

Changed to only send `unmount` if the volume is mounted.

While investigating the cause of the reported issue I found some odd
behavior in unmount calls so I've cleaned those up a bit here as well.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2016-11-10 14:04:08 -05:00
Evan Hazlett
3716ec25b4 secrets: secret management for swarm
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>

wip: use tmpfs for swarm secrets

Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>

wip: inject secrets from swarm secret store

Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>

secrets: use secret names in cli for service create

Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>

switch to use mounts instead of volumes

Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>

vendor: use ehazlett swarmkit

Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>

secrets: finish secret update

Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
2016-11-09 14:27:43 -05:00
Michael Crosby
47637b49a0 Convert err description to lower
Convert this to lower before checking the message of the error.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-11-08 14:42:54 -08:00
Amit Krishnan
934328d8ea Add functional support for Docker sub commands on Solaris
Signed-off-by: Amit Krishnan <krish.amit@gmail.com>

Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2016-11-07 09:06:34 -08:00
Mrunal Patel
4c10c2ded3 Ensure that SELinux Options are set when seccomp is already set
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2016-11-03 13:23:53 -07:00
boucher
bd7d51292c Allow providing a custom storage directory for docker checkpoints
Signed-off-by: boucher <rboucher@gmail.com>
2016-10-28 07:56:05 -04:00
Brian Goff
3c0a2bb289 error out if checkpoint specified on non-experimental
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2016-10-27 17:43:57 -07:00