The daemon has made a habit of mutating the DefaultRuntime and Runtimes
values in the Config struct to merge defaults. This would be fine if it
were part of the regular configuration loading and merging process, as
it is with other config options. The trouble is that it does so in
surprising places, such as in functions with 'verify' or 'validate' in
their name. This has been necessary in order to validate that the user has
not defined a custom runtime named "runc" which would shadow the
built-in runtime of the same name. Other daemon code depends on the
runtime named "runc" always being defined in the config, but merging it
with the user config at the same time as the other defaults are merged
would trip the validation. The root of the issue is that the daemon has
used the same config values for both validating the daemon runtime
configuration as supplied by the user and for keeping track of which
runtimes have been set up by the daemon. Now that a completely separate
value is used for the latter purpose, surprising contortions are no
longer required to make the validation work as intended.
Consolidate the validation of the runtimes config and merging of the
built-in runtimes into the daemon.setupRuntimes() function. Set the
result of merging the built-in runtimes config and the default value of
the default runtime on the returned runtimes struct, without back-propagating it
onto the config.Config argument.
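The consolidation can be sketched in Go roughly as follows. This is a minimal sketch, not the daemon's code: the `Config`, `Runtime` and `runtimes` types, their fields, and the constants are simplified, hypothetical stand-ins. It shows the ordering that makes validation straightforward: check the user config for a runtime named "runc" first, then merge the built-in runtime and the default value of the default runtime into a separate value, leaving the `Config` argument untouched.

```go
package main

import "fmt"

// Runtime and Config are simplified, hypothetical stand-ins for the
// daemon's configuration types.
type Runtime struct {
	Path string
	Args []string
}

type Config struct {
	DefaultRuntime string
	Runtimes       map[string]Runtime
}

// runtimes holds the derived runtime state. It is computed from a Config
// but never written back into it.
type runtimes struct {
	Default    string
	configured map[string]Runtime
}

// Hypothetical values for the built-in runtime.
const (
	builtinRuntimeName = "runc"
	builtinRuntimePath = "runc"
)

// setupRuntimes validates the user-supplied runtimes, then merges in the
// built-in runtime and the default-runtime value, returning the result
// without mutating cfg.
func setupRuntimes(cfg *Config) (*runtimes, error) {
	// Validate the user config before anything is merged in, so the merge
	// itself cannot trip the check.
	if _, ok := cfg.Runtimes[builtinRuntimeName]; ok {
		return nil, fmt.Errorf("runtime name %q is reserved", builtinRuntimeName)
	}
	rt := &runtimes{
		Default:    cfg.DefaultRuntime,
		configured: make(map[string]Runtime, len(cfg.Runtimes)+1),
	}
	if rt.Default == "" {
		rt.Default = builtinRuntimeName
	}
	for name, r := range cfg.Runtimes {
		rt.configured[name] = r
	}
	rt.configured[builtinRuntimeName] = Runtime{Path: builtinRuntimePath}
	if _, ok := rt.configured[rt.Default]; !ok {
		return nil, fmt.Errorf("default runtime %q is not configured", rt.Default)
	}
	return rt, nil
}

func main() {
	cfg := &Config{Runtimes: map[string]Runtime{"custom": {Path: "/usr/local/bin/custom"}}}
	rt, err := setupRuntimes(cfg)
	if err != nil {
		panic(err)
	}
	// The derived state includes runc; the user config still has one entry.
	fmt.Println(rt.Default, len(rt.configured), len(cfg.Runtimes))
}
```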
Signed-off-by: Cory Snider <csnider@mirantis.com>
The existing runtimes reload logic went to great lengths to replace the
directory containing runtime wrapper scripts as atomically as possible
within the limitations of the Linux filesystem ABI. Trouble is,
atomically swapping the wrapper scripts directory solves the wrong
problem! The runtime configuration is "locked in" when a container is
started, including the path to the runC binary. If a container is
started with a runtime which requires a daemon-managed wrapper script
and then the daemon is reloaded with a config which no longer requires
the wrapper script (e.g. some args -> no args, or the runtime is dropped
from the config), that container would become unmanageable. Any attempts
to stop, exec or otherwise perform lifecycle management operations on
the container are likely to fail due to the wrapper script no longer
existing at its original path.
Atomically swapping the wrapper scripts is also incompatible with the
read-copy-update paradigm for reloading configuration. A handler in the
daemon could retain a reference to the pre-reload configuration for an
indeterminate amount of time after the daemon configuration has been
reloaded and updated. It is possible for the daemon to attempt to start
a container using a deleted wrapper script if a request to run a
container races a reload.
Solve the problem of deleting referenced wrapper scripts by ensuring
that all wrapper scripts are *immutable* for the lifetime of the daemon
process. Any given runtime wrapper script must always exist with the
same contents, no matter how many times the daemon config is reloaded,
or what changes are made to the config. This is accomplished by using
everyone's favourite design pattern: content-addressable storage. Each
wrapper script file name is suffixed with the SHA-256 digest of its
contents to (probabilistically) guarantee immutability without needing
any concurrency control. Stale runtime wrapper scripts are only cleaned
up on the next daemon restart.
Split the derived runtimes configuration from the user-supplied
configuration to have a place to store derived state without mutating
the user-supplied configuration or exposing daemon internals in API
struct types. Hold the derived state and the user-supplied configuration
in a single struct value so that they can be updated as an atomic unit.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Ensure data-race-free access to the daemon configuration without
locking by mutating a deep copy of the config and atomically storing
a pointer to the copy into the daemon-wide configStore value. Any
operations which need to read from the daemon config must capture the
configStore value only once and pass it around to guarantee a consistent
view of the config.
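The copy-then-publish pattern can be sketched with Go's `atomic.Pointer[T]` (Go 1.19+). The `Daemon`, `Config` and `configStore` types here are hypothetical simplifications; `configStore` also bundles some derived state so that the user config and what is derived from it are swapped as one unit. In this sketch, writers are serialized with a mutex (an assumption), while readers only perform an atomic load.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Config is a hypothetical, simplified daemon configuration.
type Config struct {
	Debug  bool
	Labels []string
}

// configStore bundles the user-supplied Config with state derived from it
// (here a hypothetical list of runtime names) so both are swapped together.
type configStore struct {
	Config
	runtimeNames []string
}

type Daemon struct {
	configStore atomic.Pointer[configStore]
	reloadMu    sync.Mutex // serializes writers; readers never lock
}

// config returns the current snapshot. Callers load it once per operation
// and pass it along so they see one consistent view.
func (d *Daemon) config() *configStore { return d.configStore.Load() }

// reload applies update to a deep copy of the current config and then
// publishes the copy. Readers still holding the old pointer are unaffected.
func (d *Daemon) reload(update func(*configStore)) {
	d.reloadMu.Lock()
	defer d.reloadMu.Unlock()
	old := d.configStore.Load()
	next := &configStore{Config: old.Config}
	// Copy slices so writes to next cannot alias old's backing arrays.
	next.Labels = append([]string(nil), old.Labels...)
	next.runtimeNames = append([]string(nil), old.runtimeNames...)
	update(next)
	d.configStore.Store(next)
}

func main() {
	d := &Daemon{}
	d.configStore.Store(&configStore{Config: Config{Labels: []string{"a"}}})
	snap := d.config() // e.g. captured by a request handler
	d.reload(func(c *configStore) {
		c.Debug = true
		c.Labels = append(c.Labels, "b")
	})
	fmt.Println(snap.Debug, snap.Labels, d.config().Debug, d.config().Labels)
}
```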
Signed-off-by: Cory Snider <csnider@mirantis.com>
`docker run -v /foo:/foo:ro` is now recursively read-only on kernel >= 5.12.
Automatically falls back to the legacy non-recursively read-only mount mode on kernel < 5.12.
Use `ro-non-recursive` to disable RRO.
Use `ro-force-recursive` or `rro` to explicitly enable RRO. (Fails on kernel < 5.12)
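The option handling above amounts to a small decision table. The Go sketch below encodes it with hypothetical names (`roMode`, `kernelVersion`, `resolveROMode`); it does not show how the kernel version is detected or how the mount itself is performed.

```go
package main

import (
	"errors"
	"fmt"
)

// roMode is a hypothetical enum for the effective read-only behaviour.
type roMode int

const (
	roNonRecursive roMode = iota // legacy: only the top-level mount is read-only
	roRecursive                  // RRO: submounts are read-only too
)

// kernelVersion is a hypothetical major/minor pair.
type kernelVersion struct{ Major, Minor int }

func (k kernelVersion) atLeast(major, minor int) bool {
	return k.Major > major || (k.Major == major && k.Minor >= minor)
}

var errRRONotSupported = errors.New("recursive read-only mounts require kernel >= 5.12")

// resolveROMode maps a read-only mount option to the mode to apply.
func resolveROMode(opt string, kv kernelVersion) (roMode, error) {
	rroSupported := kv.atLeast(5, 12)
	switch opt {
	case "ro":
		if rroSupported {
			return roRecursive, nil
		}
		return roNonRecursive, nil // silent fallback on older kernels
	case "ro-non-recursive":
		return roNonRecursive, nil
	case "ro-force-recursive", "rro":
		if !rroSupported {
			return 0, errRRONotSupported
		}
		return roRecursive, nil
	default:
		return 0, fmt.Errorf("not a read-only mount option: %q", opt)
	}
}

func main() {
	for _, kv := range []kernelVersion{{5, 10}, {6, 1}} {
		for _, opt := range []string{"ro", "ro-non-recursive", "rro"} {
			mode, err := resolveROMode(opt, kv)
			fmt.Printf("%d.%d %-18s mode=%d err=%v\n", kv.Major, kv.Minor, opt, mode, err)
		}
	}
}
```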
Fix issue 44978
Fix docker/for-linux issue 788
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
There are still some locations referring to AuFS:
- pkg/archive: I suspect most of that code is because the whiteout-files
are modelled after aufs (but possibly some code is only relevant to
images created with AuFS as storage driver; to be looked into).
- contrib/apparmor/template: likely some rules can be removed
- contrib/dockerize-disk.sh: very old contribution, and unlikely used
by anyone, but perhaps could be updated if we want to (or just removed).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The netutils.ElectInterfaceAddresses function is only used in one place
outside of tests: in the daemon, to configure the default bridge
network. The function is also messy to reason about as it references the
shared mutable state of ipamutils.PredefinedLocalScopeDefaultNetworks.
It uses the list of predefined default networks to always return an IPv4
address even if the named interface does not exist or does not have any
IPv4 addresses. This list happens to be the same as the one used to
initialize the address pool of the 'builtin' IPAM driver, though that is
far from obvious. (Start with "./libnetwork".initIPAMDrivers and trace
the dataflow of the addressPool value. Surprise! Global state is being
mutated using the value of other global mutable state.)
The daemon does not need the fallback behaviour of
ElectInterfaceAddresses. In fact, the daemon does not have to configure
an address pool for the network at all! libnetwork will acquire one of
the available address ranges from the network's IPAM driver when the
preferred-pool configuration is unset. It will do so using the same list
of address ranges and the exact same logic
(netutils.FindAvailableNetworks) as ElectInterfaceAddresses. So unless
the daemon needs to force the network to use a specific address range
because the bridge interface already exists, it can leave the details
up to libnetwork.
Signed-off-by: Cory Snider <csnider@mirantis.com>
This makes it more transparent that it's unused for Linux,
and we don't pass "root", which has no relation with the
path on Linux.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
While working on deprecation of the `aufs` and `overlay` storage drivers,
`TestCleanupMounts` had to be updated, as it was using `aufs` for
testing. When rewriting the test to use `overlay2` instead (using an updated
`mountsFixture`), I found that the test was failing, and it appears that
only `overlay`, not `overlay2`, was taken into account.
These cleanup functions were added in 05cc737f54,
but at the time the `overlay2` storage driver was not yet implemented;
05cc737f54/daemon/graphdriver
The omission likely went unnoticed in 23e5c94cfb,
because the original implementation re-used the `overlay` storage driver, but
later on it was decided to make `overlay2` a separate storage driver.
As a result of the above, `daemon.cleanupMountsByID()` would ignore any `overlay2`
mounts during `daemon.Shutdown()` and `daemon.Cleanup()`.
This patch:
- Adds a new `mountsFixtureOverlay2` with example mounts for `overlay2`
- Rewrites the tests to use `gotest.tools` for more informative output on failures.
- Adds the missing regex patterns to `daemon/getCleanPatterns()`. The patterns
are added at the start of the list to allow for the fastest match (`overlay2`
is the default for most setups, and the code is iterating over possible
options).
As a follow-up, we could consider adding additional fixtures for different
storage drivers.
Before the fix is applied:
```
go test -v -run TestCleanupMounts ./daemon/
=== RUN TestCleanupMounts
=== RUN TestCleanupMounts/aufs
=== RUN TestCleanupMounts/overlay2
daemon_linux_test.go:135: assertion failed: 0 (unmounted int) != 1 (int): Expected to unmount the shm (and the shm only)
--- FAIL: TestCleanupMounts (0.01s)
--- PASS: TestCleanupMounts/aufs (0.00s)
--- FAIL: TestCleanupMounts/overlay2 (0.01s)
=== RUN TestCleanupMountsByID
=== RUN TestCleanupMountsByID/aufs
=== RUN TestCleanupMountsByID/overlay2
daemon_linux_test.go:171: assertion failed: 0 (unmounted int) != 1 (int): Expected to unmount the root (and that only)
--- FAIL: TestCleanupMountsByID (0.00s)
--- PASS: TestCleanupMountsByID/aufs (0.00s)
--- FAIL: TestCleanupMountsByID/overlay2 (0.00s)
FAIL
FAIL github.com/docker/docker/daemon 0.054s
FAIL
```
With the fix applied:
```
go test -v -run TestCleanupMounts ./daemon/
=== RUN TestCleanupMounts
=== RUN TestCleanupMounts/aufs
=== RUN TestCleanupMounts/overlay2
--- PASS: TestCleanupMounts (0.00s)
--- PASS: TestCleanupMounts/aufs (0.00s)
--- PASS: TestCleanupMounts/overlay2 (0.00s)
=== RUN TestCleanupMountsByID
=== RUN TestCleanupMountsByID/aufs
=== RUN TestCleanupMountsByID/overlay2
--- PASS: TestCleanupMountsByID (0.00s)
--- PASS: TestCleanupMountsByID/aufs (0.00s)
--- PASS: TestCleanupMountsByID/overlay2 (0.00s)
PASS
ok github.com/docker/docker/daemon 0.042s
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
After moving libnetwork to this repo, we need to update all the import
paths for libnetwork to point to docker/docker/libnetwork instead of
docker/libnetwork.
This change implements that.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Switch to moby/sys/mount and mountinfo. Keep the pkg/mount for potential
outside users.
This commit was generated by the following bash script:
```
set -e -u -o pipefail
for file in $(git grep -l 'docker/docker/pkg/mount"' | grep -v ^pkg/mount); do
sed -i -e 's#/docker/docker/pkg/mount"#/moby/sys/mount"#' \
-e 's#mount\.\(GetMounts\|Mounted\|Info\|[A-Za-z]*Filter\)#mountinfo.\1#g' \
$file
goimports -w $file
done
```
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This fix was added in 8e71b1e210 to work around
a Go issue (https://github.com/golang/go/issues/20506).
That issue was fixed in
66c03d39f3,
which is part of Go 1.10 and up. This reverts the changes that were made in
8e71b1e210, which are no longer needed.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Handle the case of systemd-resolved, and if it is in place,
use a different resolv.conf source.
Set the corresponding option on libnetwork appropriately.
Move Unix-specific code to container_operation_unix.
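The resolv.conf source selection can be sketched as below. This sketch assumes that systemd-resolved is detected by `/etc/resolv.conf` listing only the stub resolver `127.0.0.53`, and that the alternate source is `/run/systemd/resolve/resolv.conf`; the helper names are hypothetical and the real detection logic may differ.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Assumed paths and stub address (see lead-in).
const (
	defaultResolvConf  = "/etc/resolv.conf"
	resolvedResolvConf = "/run/systemd/resolve/resolv.conf"
	resolvedStubIP     = "127.0.0.53"
)

// nameservers extracts the "nameserver" entries from resolv.conf content.
func nameservers(content string) []string {
	var ns []string
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "nameserver" {
			ns = append(ns, fields[1])
		}
	}
	return ns
}

// resolvConfSource (hypothetical) picks the file to use as the source for
// containers' resolv.conf. The loopback stub is unreachable from a
// container's network namespace, so the upstream list is used instead.
func resolvConfSource(etcResolvConf string) string {
	ns := nameservers(etcResolvConf)
	if len(ns) == 1 && ns[0] == resolvedStubIP {
		return resolvedResolvConf
	}
	return defaultResolvConf
}

func main() {
	stub := "# managed by systemd-resolved\nnameserver 127.0.0.53\noptions edns0\n"
	fmt.Println(resolvConfSource(stub))
	fmt.Println(resolvConfSource("nameserver 192.0.2.1\n"))
}
```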
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Use mount.SingleEntryFilter as we're only interested in a single entry.
Test case data of TestShouldUnmountRoot is modified accordingly, as
from now on:
1. `info` can't be nil;
2. the mountpoint check is not performed (as SingleEntryFilter
guarantees it to be equal to daemon.root).
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Functions `GetMounts()` and `parseMountTable()` return all the entries
as read and parsed from /proc/self/mountinfo. In many cases the caller
is interested in only one or a few entries, not all of them.
One good example is `Mounted()` function, which looks for a specific
entry only. Another example is `RecursiveUnmount()` which is only
interested in mounts under a specific path.
This commit adds `filter` argument to `GetMounts()` to implement
two things:
1. filter out entries a caller is not interested in
2. stop processing once the caller has found what it wants
`nil` can be passed to get backward-compatible behavior, i.e. return
all the entries.
A few filters are implemented:
- `PrefixFilter`: filters out all entries not under `prefix`
- `SingleEntryFilter`: looks for a specific entry
Finally, `Mounted()` is modified to use `SingleEntryFilter()`, and
`RecursiveUnmount()` is using `PrefixFilter()`.
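A self-contained Go sketch of the filter mechanism, operating on already-parsed entries instead of reading /proc/self/mountinfo. The `(skip, stop)` return shape follows the description above; the `Info` type, `getMounts` and `mounted` are simplified, hypothetical versions. This `PrefixFilter` matches whole path components (so `/var/lib/docker2` is not under `/var/lib/docker`), which is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// Info is a minimal stand-in for a parsed mountinfo entry.
type Info struct {
	Mountpoint string
}

// FilterFunc decides per entry: skip drops it from the result,
// stop ends the scan after this entry.
type FilterFunc func(*Info) (skip, stop bool)

// PrefixFilter keeps entries at or under prefix.
func PrefixFilter(prefix string) FilterFunc {
	p := strings.TrimSuffix(prefix, "/")
	return func(m *Info) (bool, bool) {
		under := m.Mountpoint == p || strings.HasPrefix(m.Mountpoint, p+"/")
		return !under, false
	}
}

// SingleEntryFilter keeps only mp and stops as soon as it is found.
func SingleEntryFilter(mp string) FilterFunc {
	return func(m *Info) (bool, bool) {
		if m.Mountpoint == mp {
			return false, true
		}
		return true, false
	}
}

// getMounts applies filter to entries; a nil filter returns all of them.
func getMounts(entries []*Info, filter FilterFunc) []*Info {
	var out []*Info
	for _, e := range entries {
		skip, stop := false, false
		if filter != nil {
			skip, stop = filter(e)
		}
		if !skip {
			out = append(out, e)
		}
		if stop {
			break
		}
	}
	return out
}

// mounted reports whether mp appears in entries.
func mounted(entries []*Info, mp string) bool {
	return len(getMounts(entries, SingleEntryFilter(mp))) > 0
}

func main() {
	entries := []*Info{
		{Mountpoint: "/"},
		{Mountpoint: "/var/lib/docker"},
		{Mountpoint: "/var/lib/docker/overlay2/abc/merged"},
		{Mountpoint: "/var/lib/docker2"},
	}
	for _, m := range getMounts(entries, PrefixFilter("/var/lib/docker")) {
		fmt.Println(m.Mountpoint)
	}
	fmt.Println(mounted(entries, "/var/lib/docker"))
}
```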
Unit tests are added to check filters are working.
[v2: ditch NoFilter, use nil]
[v3: ditch GetMountsFiltered()]
[v4: add unit test for filters]
[v5: switch to gotestyourself]
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This makes sure that if the daemon root was already a self-bind-mounted
directory (meaning the daemon only performed a remount), the daemon does
not try to unmount it.
Example:
```
$ sudo mount --bind /var/lib/docker /var/lib/docker
$ sudo dockerd &
```
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
- Refactor generic and path based cleanup functions into a single function.
- Include aufs and zfs mounts in the mounts cleanup.
- Containers that receive exit event on restore don't require manual cleanup.
- Make missing sandbox id message a warning because currently sandboxes are always cleared on startup. libnetwork#975
- Don't unmount volumes for containers that don't have base path. Shouldn't be needed after #21372
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
When the daemon shuts down ungracefully, it leaves the running
containers' rootfs mounted. This causes errors
when trying to remove the containers.
Signed-off-by: Lei Jitang <leijitang@huawei.com>
Instead of using `MNT_DETACH` to unmount the container's mqueue/shm
mounts, force it... but only on daemon init and shutdown.
This makes sure that these IPC mounts are cleaned up even when the
daemon is killed.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
- Print the mount table as in /proc/self/mountinfo
- Do not exit prematurely when one of the ipc mounts doesn't exist.
- Do not exit prematurely when one of the ipc mounts cannot be unmounted.
- Add a unit test to see if the cleanup really works.
- Use syscall.MNT_DETACH to cleanup mounts after a crash.
- Unmount IPC mounts when the daemon unregisters an old running container.
Signed-off-by: David Calavera <david.calavera@gmail.com>
This changeset creates /dev/shm and /dev/mqueue mounts for each container under
/var/lib/containers/<id>/ and bind mounts them into the container. When --ipc=container:<id/name>
is used, then the /dev/shm and /dev/mqueue of the ipc container are used instead of creating
new ones for the container.
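A Go sketch of the path selection described above. The `shm` and `mqueue` subdirectory names and the `ipcPaths` helper are hypothetical; the sketch assumes the `--ipc=container:` argument has already been resolved to a container ID and does not perform the mounts themselves.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// ipcPaths returns the host directories to bind-mount over /dev/shm and
// /dev/mqueue for container id. If ipcContainer is non-empty, that
// container's directories are reused instead of creating new ones.
func ipcPaths(containersRoot, id, ipcContainer string) (shm, mqueue string) {
	owner := id
	if ipcContainer != "" {
		owner = ipcContainer
	}
	dir := filepath.Join(containersRoot, owner)
	return filepath.Join(dir, "shm"), filepath.Join(dir, "mqueue")
}

func main() {
	shm, mq := ipcPaths("/var/lib/containers", "web", "")
	fmt.Println(shm, mq)
	// A container started with the IPC namespace of "web" reuses its mounts.
	shm, mq = ipcPaths("/var/lib/containers", "sidecar", "web")
	fmt.Println(shm, mq)
}
```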
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
Docker-DCO-1.1-Signed-off-by: Dan Walsh <dwalsh@redhat.com> (github: rhatdan)
(cherry picked from commit d88fe447df)