This changes mounts.NewParser() to create a parser for the current operatingsystem,
instead of one specific to a (possibly non-matching, in case of LCOW) OS.
With the OS-specific handling being removed, the "OS" parameter is also removed
from `daemon.verifyContainerSettings()`, and various other container-related
functions.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Needed for runc >= 1.0.0-rc94.
See runc issue 2928.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Commit 12c7541f1f updated the
opencontainers/selinux dependency to v1.3.1, which had a breaking
change in the errors that were returned.
Before v1.3.1, the "raw" `syscall.ENOTSUP` was returned if the
underlying filesystem did not support xattrs, but later versions
wrapped the error, which caused our detection to fail.
This patch uses `errors.Is()` to check for the underlying error.
This requires github.com/pkg/errors v0.9.1 or above (older versions
could use `errors.Cause()`, but are not compatible with "native"
wrapping of errors in Go 1.13 and up, and could potentially cause
these errors to not being detected again.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Switch to moby/sys/mount and mountinfo. Keep the pkg/mount for potential
outside users.
This commit was generated by the following bash script:
```
set -e -u -o pipefail
for file in $(git grep -l 'docker/docker/pkg/mount"' | grep -v ^pkg/mount); do
sed -i -e 's#/docker/docker/pkg/mount"#/moby/sys/mount"#' \
-e 's#mount\.\(GetMounts\|Mounted\|Info\|[A-Za-z]*Filter\)#mountinfo.\1#g' \
$file
goimports -w $file
done
```
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
When cleaning up IPC mounts, the daemon could log a warning if the IPC mount was not found;
```
cleanup: failed to unmount IPC: umount /var/lib/docker/containers/90f408e26e205d30676655a08504dddc0d17f5713c1dd4654cf67ded7d3bbb63/mounts/shm, flags: 0x2: no such file or directory"
```
These warnings are safe to ignore, but can cause some confusion; `container.UnmountIpcMount()`
already attempted to suppress these warnings, however, `mount.Unmount()` returns a `mountError`,
which nests the original error, therefore detecting failed.
This parch uses `errors.Cause()` to get the _underlying_ error to detect if it's a "is not exist".
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This import got lost after commit 56cc56b0fa
was merged, likely because the PR was built against an outdated
master.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The errors returned from Mount and Unmount functions are raw
syscall.Errno errors (like EPERM or EINVAL), which provides
no context about what has happened and why.
Similar to os.PathError type, introduce mount.Error type
with some context. The error messages will now look like this:
> mount /tmp/mount-tests/source:/tmp/mount-tests/target, flags: 0x1001: operation not permitted
or
> mount tmpfs:/tmp/mount-test-source-516297835: operation not permitted
Before this patch, it was just
> operation not permitted
[v2: add Cause()]
[v3: rename MountError to Error, document Cause()]
[v4: fixes; audited all users]
[v5: make Error type private; changes after @cpuguy83 reviews]
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
As standard mount.Unmount does what we need, let's use it.
In addition, this adds ignoring "not mounted" condition, which
was previously implemented (see PR#33329, commit cfa2591d3f)
via a very expensive call to mount.Mounted().
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
1. As daemon.ContainerStop() documentation says,
> If a negative number of seconds is given, ContainerStop
> will wait for a graceful termination.
but since commit cfdf84d5d0 (PR #32237) this is no longer the case.
This happens because `context.WithTimeout(ctx, timeout)` is implemented
as `WithDeadline(ctx, time.Now().Add(timeout))`, resulting in a deadline
which is in the past.
To fix, don't use WithDeadline() if the timeout is negative.
2. Add a test case to validate the correct behavior and
as a means to prevent a similar regression in the future.
3. Fix/improve daemon.ContainerStop() and client.ContainerStop()
description for clarity and completeness.
4. Fix/improve DefaultStopTimeout description.
Fixes: cfdf84d5d0 ("Update Container Wait")
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This moves the platform specific stuff in a separate package and keeps
the `volume` package and the defined interfaces light to import.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
On unix, merge secrets/configs handling. This is important because
configs can contain secrets (via templating) and potentially a config
could just simply have secret information "by accident" from the user.
This just make sure that configs are as secure as secrets and de-dups a
lot of code.
Generally this makes everything simpler and configs more secure.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
It's a common scenario for admins and/or monitoring applications to
mount in the daemon root dir into a container. When doing so all mounts
get coppied into the container, often with private references.
This can prevent removal of a container due to the various mounts that
must be configured before a container is started (for example, for
shared /dev/shm, or secrets) being leaked into another namespace,
usually with private references.
This is particularly problematic on older kernels (e.g. RHEL < 7.4)
where a mount may be active in another namespace and attempting to
remove a mountpoint which is active in another namespace fails.
This change moves all container resource mounts into a common directory
so that the directory can be made unbindable.
What this does is prevents sub-mounts of this new directory from leaking
into other namespaces when mounted with `rbind`... which is how all
binds are handled for containers.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This is a fix to the following issue:
$ docker run --tmpfs /dev/shm busybox sh
docker: Error response from daemon: linux mounts: Duplicate mount point '/dev/shm'.
In current code (daemon.createSpec()), tmpfs mount from --tmpfs is added
to list of mounts (`ms`), when the mount from IpcMounts() is added.
While IpcMounts() is checking for existing mounts first, it does that
by using container.HasMountFor() function which only checks container.Mounts
but not container.Tmpfs.
Ultimately, the solution is to get rid of container.Tmpfs (moving its
data to container.Mounts). Current workaround is to add checking
of container.Tmpfs into container.HasMountFor().
A unit test case is included.
Unfortunately we can't call daemon.createSpec() from a unit test,
as the code relies a lot on various daemon structures to be initialized
properly, and it is hard to achieve. Therefore, we minimally mimick
the code flow of daemon.createSpec() -- barely enough to reproduce
the issue.
https://github.com/moby/moby/issues/35455
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The code in question looks up mounts two times: first by using
HasMountFor(), and then directly by looking in container.MountPoints.
There is no need to do it twice.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The shutdown timeout for containers in insufficient on Windows. If the daemon is shutting down, and a container takes longer than expected to shut down, this can cause the container to remain in a bad state after restart, and never be able to start again. Increasing the timeout makes this less likely to occur.
Signed-off-by: Darren Stahl <darst@microsoft.com>
This enables docker cp and ADD/COPY docker build support for LCOW.
Originally, the graphdriver.Get() interface returned a local path
to the container root filesystem. This does not work for LCOW, so
the Get() method now returns an interface that LCOW implements to
support copying to and from the container.
Signed-off-by: Akash Gupta <akagup@microsoft.com>
Use strongly typed errors to set HTTP status codes.
Error interfaces are defined in the api/errors package and errors
returned from controllers are checked against these interfaces.
Errors can be wraeped in a pkg/errors.Causer, as long as somewhere in the
line of causes one of the interfaces is implemented. The special error
interfaces take precedence over Causer, meaning if both Causer and one
of the new error interfaces are implemented, the Causer is not
traversed.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Since the commit d88fe447df ("Add support for sharing /dev/shm/ and
/dev/mqueue between containers") container's /dev/shm is mounted on the
host first, then bind-mounted inside the container. This is done that
way in order to be able to share this container's IPC namespace
(and the /dev/shm mount point) with another container.
Unfortunately, this functionality breaks container checkpoint/restore
(even if IPC is not shared). Since /dev/shm is an external mount, its
contents is not saved by `criu checkpoint`, and so upon restore any
application that tries to access data under /dev/shm is severily
disappointed (which usually results in a fatal crash).
This commit solves the issue by introducing new IPC modes for containers
(in addition to 'host' and 'container:ID'). The new modes are:
- 'shareable': enables sharing this container's IPC with others
(this used to be the implicit default);
- 'private': disables sharing this container's IPC.
In 'private' mode, container's /dev/shm is truly mounted inside the
container, without any bind-mounting from the host, which solves the
issue.
While at it, let's also implement 'none' mode. The motivation, as
eloquently put by Justin Cormack, is:
> I wondered a while back about having a none shm mode, as currently it is
> not possible to have a totally unwriteable container as there is always
> a /dev/shm writeable mount. It is a bit of a niche case (and clearly
> should never be allowed to be daemon default) but it would be trivial to
> add now so maybe we should...
...so here's yet yet another mode:
- 'none': no /dev/shm mount inside the container (though it still
has its own private IPC namespace).
Now, to ultimately solve the abovementioned checkpoint/restore issue, we'd
need to make 'private' the default mode, but unfortunately it breaks the
backward compatibility. So, let's make the default container IPC mode
per-daemon configurable (with the built-in default set to 'shareable'
for now). The default can be changed either via a daemon CLI option
(--default-shm-mode) or a daemon.json configuration file parameter
of the same name.
Note one can only set either 'shareable' or 'private' IPC modes as a
daemon default (i.e. in this context 'host', 'container', or 'none'
do not make much sense).
Some other changes this patch introduces are:
1. A mount for /dev/shm is added to default OCI Linux spec.
2. IpcMode.Valid() is simplified to remove duplicated code that parsed
'container:ID' form. Note the old version used to check that ID does
not contain a semicolon -- this is no longer the case (tests are
modified accordingly). The motivation is we should either do a
proper check for container ID validity, or don't check it at all
(since it is checked in other places anyway). I chose the latter.
3. IpcMode.Container() is modified to not return container ID if the
mode value does not start with "container:", unifying the check to
be the same as in IpcMode.IsContainer().
3. IPC mode unit tests (runconfig/hostconfig_test.go) are modified
to add checks for newly added values.
[v2: addressed review at https://github.com/moby/moby/pull/34087#pullrequestreview-51345997]
[v3: addressed review at https://github.com/moby/moby/pull/34087#pullrequestreview-53902833]
[v4: addressed the case of upgrading from older daemon, in this case
container.HostConfig.IpcMode is unset and this is valid]
[v5: document old and new IpcMode values in api/swagger.yaml]
[v6: add the 'none' mode, changelog entry to docs/api/version-history.md]
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Replicate relevant mutations to the in-memory ACID store. Readers will
then be able to query container state without locking.
Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
The Solaris version (previously daemon/inspect_solaris.go) was
apparently missing some fields that should be available on that
platform.
Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
Doing a chown/chmod automatically can cause `EPERM` in some cases (e.g.
with an NFS mount). Currently Docker will always call chown+chmod on a
volume path unless `:nocopy` is passed in, but we don't need to make
these calls if the perms and ownership already match and potentially
avoid an uneccessary `EPERM`.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>