The new daemon.containerFSView type covers all the use-cases on Linux
with a much more intuitive API, but is not portable to Windows.
Discourage people from using the old and busted functions in new Linux
code by excluding them entirely from Linux builds.
Signed-off-by: Cory Snider <csnider@mirantis.com>
The Driver abstraction was needed for Linux Containers on Windows,
support for which has since been removed.
There is no direct equivalent to Lchmod() in the standard library so
continue to use the containerd/continuity version.
Signed-off-by: Cory Snider <csnider@mirantis.com>
These interfaces were added in aacddda89d, with
no clear motivation, other than "Also hide ViewDB behind an interface".
This patch removes the interface in favor of using a concrete implementation;
There's currently only one implementation of this interface, and if we would
decide to change to an alternative implementation, we could define relevant
interfaces on the receiver side.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The containerd client is very chatty at the best of times. Because the
libcontained API is stateless and references containers and processes by
string ID for every method call, the implementation is essentially
forced to use the containerd client in a way which amplifies the number
of redundant RPCs invoked to perform any operation. The libcontainerd
remote implementation has to reload the containerd container, task
and/or process metadata for nearly every operation. This in turn
amplifies the number of context switches between dockerd and containerd
to perform any container operation or handle a containerd event,
increasing the load on the system which could otherwise be allocated to
workloads.
Overhaul the libcontainerd interface to reduce the impedance mismatch
with the containerd client so that the containerd client can be used
more efficiently. Split the API out into container, task and process
interfaces which the consumer is expected to retain so that
libcontainerd can retain state---especially the analogous containerd
client objects---without having to manage any state-store inside the
libcontainerd client.
Signed-off-by: Cory Snider <csnider@mirantis.com>
The OOMKilled flag on a container's state has historically behaved
rather unintuitively: it is updated on container exit to reflect whether
or not any process within the container has been OOM-killed during the
preceding run of the container. The OOMKilled flag would be set to true
when the container exits if any process within the container---including
execs---was OOM-killed at any time while the container was running,
whether or not the OOM-kill was the cause of the container exiting. The
flag is "sticky," persisting through the next start of the container;
only being cleared once the container exits without any processes having
been OOM-killed that run.
Alter the behavior of the OOMKilled flag such that it signals whether
any process in the container had been OOM-killed since the most recent
start of the container. Set the flag immediately upon any process being
OOM-killed, and clear it when the container transitions to the "running"
state.
There is an ulterior motive for this change. It reduces the amount of
state the libcontainerd client needs to keep track of and clean up on
container exit. It's one less place the client could leak memory if a
container was to be deleted without going through libcontainerd.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Before this change there was a race condition between State.Wait reading
the exit code from State and the State being changed instantly after the
change which ended the State.Wait.
Now, each State.Wait has its own channel which is used to transmit the
desired StateStatus at the time the state transitions to the awaited
one. Wait no longer reads the status by itself so there is no race.
The issue caused the `docker run --restart=always ...' to sometimes exit
with 0 exit code, because the process was already restarted by the time
State.Wait got the chance to read the exit code.
Test run
--------
Before:
```
$ go test -count 1 -run TestCorrectStateWaitResultAfterRestart .
--- FAIL: TestCorrectStateWaitResultAfterRestart (0.00s)
state_test.go:198: expected exit code 10, got 0
FAIL
FAIL github.com/docker/docker/container 0.011s
FAIL
```
After:
```
$ go test -count 1 -run TestCorrectStateWaitResultAfterRestart .
ok github.com/docker/docker/container 0.011s
```
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
Older versions of Go don't format comments, so committing this as
a separate commit, so that we can already make these changes before
we upgrade to Go 1.19.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The io/ioutil package has been deprecated in Go 1.16. This commit
replaces the existing io/ioutil functions with their new definitions in
io and os packages.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
This const was previously living in pkg/signal, but with that package
being moved to its own module, it didn't make much sense to put docker's
defaults in a generic module.
The const from the "signal" package is currenlty used *both* by the CLI
and the daemon as a default value when creating containers. This put up
some questions:
a. should the default be non-exported, and private to the container
package? After all, it's a _default_ (so should be used if _NOT_ set).
b. should the client actually setting a default, or instead just omit
the value, unless specified by the user? having the client set a
default also means that the daemon cannot change the default value
because the client (or older clients) will override it.
c. consider defaults from the client and defaults of the daemon to be
separate things, and create a default const in the CLI.
This patch implements option "a" (option "b" will be done separately,
as it involves the CLI code). This still leaves "c" open as an option,
if the CLI wants to set its own default.
Unfortunately, this change means we'll have to drop the alias for the
deprecated pkg/signal.DefaultStopSignal const, but a comment was left
instead, which can assist consumers of the const to find why it's no
longer there (a search showed the Docker CLI as the only consumer though).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This changes mounts.NewParser() to create a parser for the current operatingsystem,
instead of one specific to a (possibly non-matching, in case of LCOW) OS.
With the OS-specific handling being removed, the "OS" parameter is also removed
from `daemon.verifyContainerSettings()`, and various other container-related
functions.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The LCOW implementation in dockerd has been deprecated in favor of re-implementation
in containerd (in progress). Microsoft started removing the LCOW V1 code from the
build dependencies we use in Microsoft/opengcs (soon to be part of Microsoft/hcshhim),
which means that we need to start removing this code.
This first step removes the lcow graphdriver, the LCOW initialization code, and
some LCOW-related utilities.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Needed for runc >= 1.0.0-rc94.
See runc issue 2928.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
When writing container's `hostconfig.json`, permissions were set to 0644 (world-
readable). While this is not a security concern (as the `/var/lib/docker/containers`
directory has `0700` or `0701` permissions), there is no real need to have these
permissions, as this file is only accessed by the daemon.
Looking at history for file permissions;
- 06b53e3fc7 (first implementation) used `0666` (world-writable)
- cf1a6c08fa refactored the code, and removed explicit permissions
- ea3cbd3274 introduced atomic writes, and brought back the `0666` permissions
- 3ec8fed747 removed world-writable bits, but kept world-readable
This patch updates the permissions to `0600`, matching what's used for `config.v2.json`,
which was updated in ae52cea3ab, but forgot to update
`hostconfig.json`.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Commit 12c7541f1f updated the
opencontainers/selinux dependency to v1.3.1, which had a breaking
change in the errors that were returned.
Before v1.3.1, the "raw" `syscall.ENOTSUP` was returned if the
underlying filesystem did not support xattrs, but later versions
wrapped the error, which caused our detection to fail.
This patch uses `errors.Is()` to check for the underlying error.
This requires github.com/pkg/errors v0.9.1 or above (older versions
could use `errors.Cause()`, but are not compatible with "native"
wrapping of errors in Go 1.13 and up, and could potentially cause
these errors to not being detected again.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Since we don't need the actual split values, instead of calling
`strings.Split`, which allocates new slices on each call, use
`strings.Index`.
This significantly reduces the allocations required when doing env value
replacements.
Additionally, pre-allocate the env var slice, even if we allocate a
little more than we need, it keeps us from having to do multiple
allocations while appending.
```
benchmark old ns/op new ns/op delta
BenchmarkReplaceOrAppendEnvValues/0-8 486 313 -35.60%
BenchmarkReplaceOrAppendEnvValues/100-8 10553 1535 -85.45%
BenchmarkReplaceOrAppendEnvValues/1000-8 94275 12758 -86.47%
BenchmarkReplaceOrAppendEnvValues/10000-8 1161268 129269 -88.87%
benchmark old allocs new allocs delta
BenchmarkReplaceOrAppendEnvValues/0-8 5 2 -60.00%
BenchmarkReplaceOrAppendEnvValues/100-8 110 0 -100.00%
BenchmarkReplaceOrAppendEnvValues/1000-8 1013 0 -100.00%
BenchmarkReplaceOrAppendEnvValues/10000-8 10022 0 -100.00%
benchmark old bytes new bytes delta
BenchmarkReplaceOrAppendEnvValues/0-8 192 24 -87.50%
BenchmarkReplaceOrAppendEnvValues/100-8 7360 0 -100.00%
BenchmarkReplaceOrAppendEnvValues/1000-8 64832 0 -100.00%
BenchmarkReplaceOrAppendEnvValues/10000-8 1146049 0 -100.00%
```
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Switch to moby/sys/mount and mountinfo. Keep the pkg/mount for potential
outside users.
This commit was generated by the following bash script:
```
set -e -u -o pipefail
for file in $(git grep -l 'docker/docker/pkg/mount"' | grep -v ^pkg/mount); do
sed -i -e 's#/docker/docker/pkg/mount"#/moby/sys/mount"#' \
-e 's#mount\.\(GetMounts\|Mounted\|Info\|[A-Za-z]*Filter\)#mountinfo.\1#g' \
$file
goimports -w $file
done
```
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Configuration over the API per container is intentionally left out for
the time being, but is supported to configure the default from the
daemon config.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit cbecf48bc352e680a5390a7ca9cff53098cd16d7)
Signed-off-by: Madhu Venugopal <madhu@docker.com>
This supplements any log driver which does not support reads with a
custom read implementation that uses a local file cache.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit d675e2bf2b75865915c7a4552e00802feeb0847f)
Signed-off-by: Madhu Venugopal <madhu@docker.com>
Format the source according to latest goimports.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This made my IDE unhappy; `ConfigFilePath` is an exported function, so
it makes sense to use the same signature for both Linux and Windows.
This patch also adds error handling (same as on Linux), even though the
current implementation will never return an error (it's good practice
to handle errors, so I assumed this would be the right approach)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
also renamed the non-windows variant of this file to be
consistent with other files in this package
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
When cleaning up IPC mounts, the daemon could log a warning if the IPC mount was not found;
```
cleanup: failed to unmount IPC: umount /var/lib/docker/containers/90f408e26e205d30676655a08504dddc0d17f5713c1dd4654cf67ded7d3bbb63/mounts/shm, flags: 0x2: no such file or directory"
```
These warnings are safe to ignore, but can cause some confusion; `container.UnmountIpcMount()`
already attempted to suppress these warnings, however, `mount.Unmount()` returns a `mountError`,
which nests the original error, therefore detecting failed.
This parch uses `errors.Cause()` to get the _underlying_ error to detect if it's a "is not exist".
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>