The existing runtimes reload logic went to great lengths to replace the
directory containing runtime wrapper scripts as atomically as possible
within the limitations of the Linux filesystem ABI. Trouble is,
atomically swapping the wrapper scripts directory solves the wrong
problem! The runtime configuration is "locked in" when a container is
started, including the path to the runC binary. If a container is
started with a runtime which requires a daemon-managed wrapper script
and then the daemon is reloaded with a config which no longer requires
the wrapper script (i.e. some args -> no args, or the runtime is dropped
from the config), that container would become unmanageable. Any attempts
to stop, exec or otherwise perform lifecycle management operations on
the container are likely to fail due to the wrapper script no longer
existing at its original path.
Atomically swapping the wrapper scripts is also incompatible with the
read-copy-update paradigm for reloading configuration. A handler in the
daemon could retain a reference to the pre-reload configuration for an
indeterminate amount of time after the daemon configuration has been
reloaded and updated. It is possible for the daemon to attempt to start
a container using a deleted wrapper script if a request to run a
container races a reload.
Solve the problem of deleting referenced wrapper scripts by ensuring
that all wrapper scripts are *immutable* for the lifetime of the daemon
process. Any given runtime wrapper script must always exist with the
same contents, no matter how many times the daemon config is reloaded,
or what changes are made to the config. This is accomplished by using
everyone's favourite design pattern: content-addressable storage. Each
wrapper script file name is suffixed with the SHA-256 digest of its
contents to (probabilistically) guarantee immutability without needing
any concurrency control. Stale runtime wrapper scripts are only cleaned
up on the next daemon restart.
Split the derived runtimes configuration from the user-supplied
configuration to have a place to store derived state without mutating
the user-supplied configuration or exposing daemon internals in API
struct types. Hold the derived state and the user-supplied configuration
in a single struct value so that they can be updated as an atomic unit.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Ensure data-race-free access to the daemon configuration without
locking by mutating a deep copy of the config and atomically storing
a pointer to the copy into the daemon-wide configStore value. Any
operations which need to read from the daemon config must capture the
configStore value only once and pass it around to guarantee a consistent
view of the config.
Signed-off-by: Cory Snider <csnider@mirantis.com>
There's still some locations refering to AuFS;
- pkg/archive: I suspect most of that code is because the whiteout-files
are modelled after aufs (but possibly some code is only relevant to
images created with AuFS as storage driver; to be looked into).
- contrib/apparmor/template: likely some rules can be removed
- contrib/dockerize-disk.sh: very old contribution, and unlikely used
by anyone, but perhaps could be updated if we want to (or just removed).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The netutils.ElectInterfaceAddresses function is only used in one place
outside of tests: in the daemon, to configure the default bridge
network. The function is also messy to reason about as it references the
shared mutable state of ipamutils.PredefinedLocalScopeDefaultNetworks.
It uses the list of predefined default networks to always return an IPv4
address even if the named interface does not exist or does not have any
IPv4 addresses. This list happens to be the same as the one used to
initialize the address pool of the 'builtin' IPAM driver, though that is
far from obvious. (Start with "./libnetwork".initIPAMDrivers and trace
the dataflow of the addressPool value. Surprise! Global state is being
mutated using the value of other global mutable state.)
The daemon does not need the fallback behaviour of
ElectInterfaceAddresses. In fact, the daemon does not have to configure
an address pool for the network at all! libnetwork will acquire one of
the available address ranges from the network's IPAM driver when the
preferred-pool configuration is unset. It will do so using the same list
of address ranges and the exact same logic
(netutils.FindAvailableNetworks) as ElectInterfaceAddresses. So unless
the daemon needs to force the network to use a specific address range
because the bridge interface already exists, it can leave the details
up to libnetwork.
Signed-off-by: Cory Snider <csnider@mirantis.com>
daemon/network/filter_test.go:174:19: empty-lines: extra empty line at the end of a block (revive)
daemon/restart.go:17:116: empty-lines: extra empty line at the end of a block (revive)
daemon/daemon_linux_test.go:255:41: empty-lines: extra empty line at the end of a block (revive)
daemon/reload_test.go:340:58: empty-lines: extra empty line at the end of a block (revive)
daemon/oci_linux.go:495:101: empty-lines: extra empty line at the end of a block (revive)
daemon/seccomp_linux_test.go:17:36: empty-lines: extra empty line at the start of a block (revive)
daemon/container_operations.go:560:73: empty-lines: extra empty line at the end of a block (revive)
daemon/daemon_unix.go:558:76: empty-lines: extra empty line at the end of a block (revive)
daemon/daemon_unix.go:1092:64: empty-lines: extra empty line at the start of a block (revive)
daemon/container_operations.go:587:24: empty-lines: extra empty line at the end of a block (revive)
daemon/network.go:807:18: empty-lines: extra empty line at the end of a block (revive)
daemon/network.go:813:42: empty-lines: extra empty line at the end of a block (revive)
daemon/network.go:872:72: empty-lines: extra empty line at the end of a block (revive)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
While working on deprecation of the `aufs` and `overlay` storage-drivers, the
`TestCleanupMounts` had to be updated, as it was currently using `aufs` for
testing. When rewriting the test to use `overlay2` instead (using an updated
`mountsFixture`), I found out that the test was failing, and it appears that
only `overlay`, but not `overlay2` was taken into account.
These cleanup functions were added in 05cc737f54,
but at the time the `overlay2` storage driver was not yet implemented;
05cc737f54/daemon/graphdriver
This omission was likely missed in 23e5c94cfb,
because the original implementation re-used the `overlay` storage driver, but
later on it was decided to make `overlay2` a separate storage driver.
As a result of the above, `daemon.cleanupMountsByID()` would ignore any `overlay2`
mounts during `daemon.Shutdown()` and `daemon.Cleanup()`.
This patch:
- Adds a new `mountsFixtureOverlay2` with example mounts for `overlay2`
- Rewrites the tests to use `gotest.tools` for more informative output on failures.
- Adds the missing regex patterns to `daemon/getCleanPatterns()`. The patterns
are added at the start of the list to allow for the fasted match (`overlay2`
is the default for most setups, and the code is iterating over possible
options).
As a follow-up, we could consider adding additional fixtures for different
storage drivers.
Before the fix is applied:
go test -v -run TestCleanupMounts ./daemon/
=== RUN TestCleanupMounts
=== RUN TestCleanupMounts/aufs
=== RUN TestCleanupMounts/overlay2
daemon_linux_test.go:135: assertion failed: 0 (unmounted int) != 1 (int): Expected to unmount the shm (and the shm only)
--- FAIL: TestCleanupMounts (0.01s)
--- PASS: TestCleanupMounts/aufs (0.00s)
--- FAIL: TestCleanupMounts/overlay2 (0.01s)
=== RUN TestCleanupMountsByID
=== RUN TestCleanupMountsByID/aufs
=== RUN TestCleanupMountsByID/overlay2
daemon_linux_test.go:171: assertion failed: 0 (unmounted int) != 1 (int): Expected to unmount the root (and that only)
--- FAIL: TestCleanupMountsByID (0.00s)
--- PASS: TestCleanupMountsByID/aufs (0.00s)
--- FAIL: TestCleanupMountsByID/overlay2 (0.00s)
FAIL
FAIL github.com/docker/docker/daemon 0.054s
FAIL
With the fix applied:
go test -v -run TestCleanupMounts ./daemon/
=== RUN TestCleanupMounts
=== RUN TestCleanupMounts/aufs
=== RUN TestCleanupMounts/overlay2
--- PASS: TestCleanupMounts (0.00s)
--- PASS: TestCleanupMounts/aufs (0.00s)
--- PASS: TestCleanupMounts/overlay2 (0.00s)
=== RUN TestCleanupMountsByID
=== RUN TestCleanupMountsByID/aufs
=== RUN TestCleanupMountsByID/overlay2
--- PASS: TestCleanupMountsByID (0.00s)
--- PASS: TestCleanupMountsByID/aufs (0.00s)
--- PASS: TestCleanupMountsByID/overlay2 (0.00s)
PASS
ok github.com/docker/docker/daemon 0.042s
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The io/ioutil package has been deprecated in Go 1.16. This commit
replaces the existing io/ioutil functions with their new definitions in
io and os packages.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
This changes mounts.NewParser() to create a parser for the current operatingsystem,
instead of one specific to a (possibly non-matching, in case of LCOW) OS.
With the OS-specific handling being removed, the "OS" parameter is also removed
from `daemon.verifyContainerSettings()`, and various other container-related
functions.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Switch to moby/sys/mount and mountinfo. Keep the pkg/mount for potential
outside users.
This commit was generated by the following bash script:
```
set -e -u -o pipefail
for file in $(git grep -l 'docker/docker/pkg/mount"' | grep -v ^pkg/mount); do
sed -i -e 's#/docker/docker/pkg/mount"#/moby/sys/mount"#' \
-e 's#mount\.\(GetMounts\|Mounted\|Info\|[A-Za-z]*Filter\)#mountinfo.\1#g' \
$file
goimports -w $file
done
```
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This implements chown support on Windows. Built-in accounts as well
as accounts included in the SAM database of the container are supported.
NOTE: IDPair is now named Identity and IDMappings is now named
IdentityMapping.
The following are valid examples:
ADD --chown=Guest . <some directory>
COPY --chown=Administrator . <some directory>
COPY --chown=Guests . <some directory>
COPY --chown=ContainerUser . <some directory>
On Windows an owner is only granted the permission to read the security
descriptor and read/write the discretionary access control list. This
fix also grants read/write and execute permissions to the owner.
Signed-off-by: Salahuddin Khan <salah@docker.com>
This prevents the following test case failures "go test" is run
as non-root in the daemon/ directory:
> --- FAIL: TestContainerInitDNS (0.02s)
> daemon_test.go:209: chown /tmp/docker-container-test-054812199/volumes: operation not permitted
>
> --- FAIL: TestDaemonReloadNetworkDiagnosticPort (0.00s)
> reload_test.go:525: mkdir /var/lib/docker/network/files/: permission denied
> --- FAIL: TestRootMountCleanup (0.00s)
> daemon_linux_test.go:240: assertion failed: error is not nil: operation not permitted
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Use mount.SingleEntryFilter as we're only interested in a single entry.
Test case data of TestShouldUnmountRoot is modified accordingly, as
from now on:
1. `info` can't be nil;
2. the mountpoint check is not performed (as SingleEntryFilter
guarantees it to be equal to daemon.root).
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This makes sure that if the daemon root was already a self-binded mount
(thus meaning the daemonc only performed a remount) that the daemon does
not try to unmount.
Example:
```
$ sudo mount --bind /var/lib/docker /var/lib/docker
$ sudo dockerd &
```
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This test case is checking that the built-in default size for /dev/shm
(which is used for `--ipcmode` being `private` or `shareable`)
is not overriding the size of user-defined tmpfs mount for /dev/shm.
In other words, this is a regression test case for issue #35271,
https://github.com/moby/moby/issues/35271
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
- Refactor generic and path based cleanup functions into a single function.
- Include aufs and zfs mounts in the mounts cleanup.
- Containers that receive exit event on restore don't require manual cleanup.
- Make missing sandbox id message a warning because currently sandboxes are always cleared on startup. libnetwork#975
- Don't unmount volumes for containers that don't have base path. Shouldn't be needed after #21372
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
- Print the mount table as in /proc/self/mountinfo
- Do not exit prematurely when one of the ipc mounts doesn't exist.
- Do not exit prematurely when one of the ipc mounts cannot be unmounted.
- Add a unit test to see if the cleanup really works.
- Use syscall.MNT_DETACH to cleanup mounts after a crash.
- Unmount IPC mounts when the daemon unregisters an old running container.
Signed-off-by: David Calavera <david.calavera@gmail.com>