Commit graph

465 commits

Author SHA1 Message Date
Sebastiaan van Stijn
f5209d23a8
daemon: add nolint-comments for deprecated kernel-memory options, hooks
This adds some nolint-comments for the deprecated kernel-memory options; we
deprecated these, but they could technically still be accepted by alternative
runtimes.

    daemon/daemon_unix.go:108:3: SA1019: memory.Kernel is deprecated: kernel-memory limits are not supported in cgroups v2, and were obsoleted in [kernel v5.4]. This field should no longer be used, as it may be ignored by runtimes. (staticcheck)
            memory.Kernel = &config.KernelMemory
            ^
    daemon/update_linux.go:63:3: SA1019: memory.Kernel is deprecated: kernel-memory limits are not supported in cgroups v2, and were obsoleted in [kernel v5.4]. This field should no longer be used, as it may be ignored by runtimes. (staticcheck)
            memory.Kernel = &resources.KernelMemory
            ^

Prestart hooks are deprecated, and more granular hooks should be used instead.
CreateRuntime are the closest equivalent, and executed in the same locations
as Prestart-hooks, but depending on what these hooks do, possibly one of the
other hooks could be used instead (such as CreateContainer or StartContainer).
As these hooks are still supported, this patch adds nolint comments, but adds
some TODOs to consider migrating to something else;

    daemon/nvidia_linux.go:86:2: SA1019: s.Hooks.Prestart is deprecated: use [Hooks.CreateRuntime], [Hooks.CreateContainer], and [Hooks.StartContainer] instead, which allow more granular hook control during the create and start phase. (staticcheck)
        s.Hooks.Prestart = append(s.Hooks.Prestart, specs.Hook{
        ^

    daemon/oci_linux.go:76:5: SA1019: s.Hooks.Prestart is deprecated: use [Hooks.CreateRuntime], [Hooks.CreateContainer], and [Hooks.StartContainer] instead, which allow more granular hook control during the create and start phase. (staticcheck)
                    s.Hooks.Prestart = append(s.Hooks.Prestart, specs.Hook{
                    ^

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2024-04-15 17:55:47 +02:00
Sebastiaan van Stijn
2970b320aa
api: remove code for adjusting CPU shares (api < v1.19)
API versions before 1.19 allowed CpuShares that were greater than the maximum
or less than the minimum supported by the kernel, and relied on the kernel to
do the right thing.

Commit ed39fbeb2a introduced code to adjust the
CPU shares to be within the accepted range when using API version 1.18 or
lower.

API v1.23 and older are deprecated, so we can remove support for this
functionality.

Currently, there's no validation for CPU shares to be within an acceptable
range; a TODO was added to add validation for this option, and to use the
`linuxMinCPUShares` and `linuxMaxCPUShares` consts for this.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2024-02-06 18:44:33 +01:00
Sebastiaan van Stijn
ce1ee98aba
Merge pull request #46447 from akerouanton/api-predefined-networks
api: Add consts for predefined networks
2023-11-24 12:26:48 +01:00
Sebastiaan van Stijn
cff4f20c44
migrate to github.com/containerd/log v0.1.0
The github.com/containerd/containerd/log package was moved to a separate
module, which will also be used by upcoming (patch) releases of containerd.

This patch moves our own uses of the package to use the new module.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-10-11 17:52:23 +02:00
Albin Kerouanton
a8975c9042
api: Add consts for predefined networks
Constants for both platform-specific and platform-independent networks
are added to the api/network package.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-09-10 15:39:54 +02:00
Sebastiaan van Stijn
74354043ff
remove uses of libnetwork/Network.Info()
Now that we removed the interface, there's no need to cast the Network
to a NetworkInfo interface, so we can remove uses of the `Info()` method.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-08-08 22:05:30 +02:00
Sebastiaan van Stijn
034e7e153f
daemon: rename containerdCli to containerdClient
The containerdCli was somewhat confusing (is it the CLI?); let's rename
to make it match what it is :)

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-07-18 13:57:27 +02:00
Bjorn Neergaard
b60c02b065
Merge pull request #45887 from thaJeztah/move_mtu
daemon/config: move MTU to BridgeConfig, and warn when using on Windows
2023-07-06 09:41:06 -06:00
Sebastiaan van Stijn
3721a525ce
daemon: initBridgeDriver(): pass BridgeConfig, instead of daemon config
Now that the MTU field was moved, this function only needs the BridgeConfig,
which contains all options for the default "bridge" network.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-07-05 14:43:36 +02:00
Sebastiaan van Stijn
b8220f5d0d
daemon/config: move MTU to BridgeConfig
This option is only used for the default bridge network; let's move the
field to that struct to make it clearer what it's used for.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-07-05 14:43:35 +02:00
Sebastiaan van Stijn
02815416bb
daemon: use string-literals for easier grep'ing
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-07-05 12:27:00 +02:00
Sebastiaan van Stijn
210932b3bf
daemon: format code with gofumpt
Formatting the code with https://github.com/mvdan/gofumpt

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-29 00:33:03 +02:00
Brian Goff
74da6a6363 Switch all logging to use containerd log pkg
This unifies our logging and allows us to propagate logging and trace
contexts together.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2023-06-24 00:23:44 +00:00
Sebastiaan van Stijn
ed798d651a
Merge pull request #45704 from corhere/fix-zeroes-in-linux-resources
daemon: stop setting container resources to zero
2023-06-12 09:44:07 +02:00
Djordje Lukic
32d58144fd c8d: Use reference counting while mounting a snapshot
Some snapshotters (like overlayfs or zfs) can't mount the same
directories twice. For example if the same directroy is used as an upper
directory in two mounts the kernel will output this warning:

    overlayfs: upperdir is in-use as upperdir/workdir of another mount, accessing files from both mounts will result in undefined behavior.

And indeed accessing the files from both mounts will result in an "No
such file or directory" error.

This change introduces reference counts for the mounts, if a directory
is already mounted the mount interface will only increment the mount
counter and return the mount target effectively making sure that the
filesystem doesn't end up in an undefined behavior.

Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
2023-06-07 15:50:01 +02:00
Cory Snider
dea870f4ea daemon: stop setting container resources to zero
Many of the fields in LinuxResources struct are pointers to scalars for
some reason, presumably to differentiate between set-to-zero and unset
when unmarshaling from JSON, despite zero being outside the acceptable
range for the corresponding kernel tunables. When creating the OCI spec
for a container, the daemon sets the container's OCI spec CPUShares and
BlkioWeight parameters to zero when the corresponding Docker container
configuration values are zero, signifying unset, despite the minimum
acceptable value for CPUShares being two, and BlkioWeight ten. This has
gone unnoticed as runC does not distingiush set-to-zero from unset as it
also uses zero internally to represent unset for those fields. However,
kata-containers v3.2.0-alpha.3 tries to apply the explicit-zero resource
parameters to the container, exactly as instructed, and fails loudly.
The OCI runtime-spec is silent on how the runtime should handle the case
when those parameters are explicitly set to out-of-range values and
kata's behaviour is not unreasonable, so the daemon must therefore be in
the wrong.

Translate unset values in the Docker container's resources HostConfig to
omit the corresponding fields in the container's OCI spec when starting
and updating a container in order to maximize compatibility with
runtimes.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-06-06 12:13:05 -04:00
Cory Snider
0f6eeecac0 daemon: consolidate runtimes config validation
The daemon has made a habit of mutating the DefaultRuntime and Runtimes
values in the Config struct to merge defaults. This would be fine if it
was a part of the regular configuration loading and merging process,
as is done with other config options. The trouble is it does so in
surprising places, such as in functions with 'verify' or 'validate' in
their name. It has been necessary in order to validate that the user has
not defined a custom runtime named "runc" which would shadow the
built-in runtime of the same name. Other daemon code depends on the
runtime named "runc" always being defined in the config, but merging it
with the user config at the same time as the other defaults are merged
would trip the validation. The root of the issue is that the daemon has
used the same config values for both validating the daemon runtime
configuration as supplied by the user and for keeping track of which
runtimes have been set up by the daemon. Now that a completely separate
value is used for the latter purpose, surprising contortions are no
longer required to make the validation work as intended.

Consolidate the validation of the runtimes config and merging of the
built-in runtimes into the daemon.setupRuntimes() function. Set the
result of merging the built-in runtimes config and default default
runtime on the returned runtimes struct, without back-propagating it
onto the config.Config argument.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-06-01 14:45:25 -04:00
Cory Snider
d222bf097c daemon: reload runtimes w/o breaking containers
The existing runtimes reload logic went to great lengths to replace the
directory containing runtime wrapper scripts as atomically as possible
within the limitations of the Linux filesystem ABI. Trouble is,
atomically swapping the wrapper scripts directory solves the wrong
problem! The runtime configuration is "locked in" when a container is
started, including the path to the runC binary. If a container is
started with a runtime which requires a daemon-managed wrapper script
and then the daemon is reloaded with a config which no longer requires
the wrapper script (i.e. some args -> no args, or the runtime is dropped
from the config), that container would become unmanageable. Any attempts
to stop, exec or otherwise perform lifecycle management operations on
the container are likely to fail due to the wrapper script no longer
existing at its original path.

Atomically swapping the wrapper scripts is also incompatible with the
read-copy-update paradigm for reloading configuration. A handler in the
daemon could retain a reference to the pre-reload configuration for an
indeterminate amount of time after the daemon configuration has been
reloaded and updated. It is possible for the daemon to attempt to start
a container using a deleted wrapper script if a request to run a
container races a reload.

Solve the problem of deleting referenced wrapper scripts by ensuring
that all wrapper scripts are *immutable* for the lifetime of the daemon
process. Any given runtime wrapper script must always exist with the
same contents, no matter how many times the daemon config is reloaded,
or what changes are made to the config. This is accomplished by using
everyone's favourite design pattern: content-addressable storage. Each
wrapper script file name is suffixed with the SHA-256 digest of its
contents to (probabilistically) guarantee immutability without needing
any concurrency control. Stale runtime wrapper scripts are only cleaned
up on the next daemon restart.

Split the derived runtimes configuration from the user-supplied
configuration to have a place to store derived state without mutating
the user-supplied configuration or exposing daemon internals in API
struct types. Hold the derived state and the user-supplied configuration
in a single struct value so that they can be updated as an atomic unit.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-06-01 14:45:25 -04:00
Cory Snider
0b592467d9 daemon: read-copy-update the daemon config
Ensure data-race-free access to the daemon configuration without
locking by mutating a deep copy of the config and atomically storing
a pointer to the copy into the daemon-wide configStore value. Any
operations which need to read from the daemon config must capture the
configStore value only once and pass it around to guarantee a consistent
view of the config.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-06-01 14:45:24 -04:00
Sebastiaan van Stijn
ab35df454d
remove pre-go1.17 build-tags
Removed pre-go1.17 build-tags with go fix;

    go mod init
    go fix -mod=readonly ./...
    rm go.mod

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-05-19 20:38:51 +02:00
Sebastiaan van Stijn
fb96b94ed0
daemon: remove handling for deprecated "oom-score-adjust", and produce error
This option was deprecated in 5a922dc162, which
is part of the v24.0.0 release, so we can remove it from master.

This patch;

- adds a check to ValidatePlatformConfig, and produces a fatal error
  if oom-score-adjust is set
- removes the deprecated libcontainerd/supervisor.WithOOMScore
- removes the warning from docker info

With this patch:

    dockerd --oom-score-adjust=-500 --validate
    Flag --oom-score-adjust has been deprecated, and will be removed in the next release.
    unable to configure the Docker daemon with file /etc/docker/daemon.json: merged configuration validation from file and command line flags failed: DEPRECATED: The "oom-score-adjust" config parameter and the dockerd "--oom-score-adjust" options have been removed.

And when using `daemon.json`:

    dockerd --validate
    unable to configure the Docker daemon with file /etc/docker/daemon.json: merged configuration validation from file and command line flags failed: DEPRECATED: The "oom-score-adjust" config parameter and the dockerd "--oom-score-adjust" options have been removed.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-05-06 16:36:17 +02:00
Sebastiaan van Stijn
3eebf4d162
container: split security options to a SecurityOptions struct
- Split these options to a separate struct, so that we can handle them in isolation.
- Change some tests to use subtests, and improve coverage

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-04-29 00:03:37 +02:00
Brian Goff
0970cb054c
Merge pull request #45366 from akerouanton/fix-docker0-PreferredPool
daemon: set docker0 subpool as the IPAM pool
2023-04-25 11:07:57 -07:00
Albin Kerouanton
2d31697d82
daemon: set docker0 subpool as the IPAM pool
Since cc19eba (backported to v23.0.4), the PreferredPool for docker0 is
set only when the user provides the bip config parameter or when the
default bridge already exist. That means, if a user provides the
fixed-cidr parameter on a fresh install or reboot their computer/server
without bip set, dockerd throw the following error when it starts:

> failed to start daemon: Error initializing network controller: Error
> creating default "bridge" network: failed to parse pool request for
> address space "LocalDefault" pool "" subpool "100.64.0.0/26": Invalid
> Address SubPool

See #45356.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-04-25 15:32:46 +02:00
Sebastiaan van Stijn
f72548956f
remove deprecated legacy "overlay" storage-driver
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-04-19 17:06:45 +02:00
Sebastiaan van Stijn
f691b13450
daemon: move code related to stats together
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-04-08 19:00:01 +02:00
Akihiro Suda
e807ae4f2e
vendor: github.com/containerd/cgroups/v3 v3.0.1
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2023-03-08 20:15:17 +09:00
Cory Snider
b0eed5ade6 daemon: allow shimv2 runtimes to be configured
Kubernetes only permits RuntimeClass values which are valid lowercase
RFC 1123 labels, which disallows the period character. This prevents
cri-dockerd from being able to support configuring alternative shimv2
runtimes for a pod as shimv2 runtime names must contain at least one
period character. Add support for configuring named shimv2 runtimes in
daemon.json so that runtime names can be aliased to
Kubernetes-compatible names.

Allow options to be set on shimv2 runtimes in daemon.json.

The names of the new daemon runtime config fields have been selected to
correspond with the equivalent field names in cri-containerd's
configuration so that users can more easily follow documentation from
the runtime vendor written for cri-containerd and apply it to
daemon.json.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-02-17 18:08:06 -05:00
Djordje Lukic
0137446248 Implement run using the containerd snapshotter
Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>

c8d/daemon: Mount root and fill BaseFS

This fixes things that were broken due to nil BaseFS like `docker cp`
and running a container with workdir override.

This is more of a temporary hack than a real solution.
The correct fix would be to refactor the code to make BaseFS and LayerRW
an implementation detail of the old image store implementation and use
the temporary mounts for the c8d implementation instead.
That requires more work though.

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>

daemon/images: Don't unset BaseFS

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2023-02-06 18:21:50 +01:00
Cory Snider
cc19eba579 daemon: let libnetwork assign default bridge IPAM
The netutils.ElectInterfaceAddresses function is only used in one place
outside of tests: in the daemon, to configure the default bridge
network. The function is also messy to reason about as it references the
shared mutable state of ipamutils.PredefinedLocalScopeDefaultNetworks.
It uses the list of predefined default networks to always return an IPv4
address even if the named interface does not exist or does not have any
IPv4 addresses. This list happens to be the same as the one used to
initialize the address pool of the 'builtin' IPAM driver, though that is
far from obvious. (Start with "./libnetwork".initIPAMDrivers and trace
the dataflow of the addressPool value. Surprise! Global state is being
mutated using the value of other global mutable state.)

The daemon does not need the fallback behaviour of
ElectInterfaceAddresses. In fact, the daemon does not have to configure
an address pool for the network at all! libnetwork will acquire one of
the available address ranges from the network's IPAM driver when the
preferred-pool configuration is unset. It will do so using the same list
of address ranges and the exact same logic
(netutils.FindAvailableNetworks) as ElectInterfaceAddresses. So unless
the daemon needs to force the network to use a specific address range
because the bridge interface already exists, it can leave the details
up to libnetwork.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-26 14:54:57 -05:00
Cory Snider
f96b9bf761 libnetwork: return concrete-typed *Controller
libnetwork.NetworkController is an interface with a single
implementation.

https://github.com/golang/go/wiki/CodeReviewComments#interfaces

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-13 14:09:37 -05:00
Sebastiaan van Stijn
f95e9b68d6
daemon: use strings.Cut() and cleanup error messages
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-12-21 11:09:03 +01:00
Cory Snider
388fe4aea8 daemon: drop side effect from registerLinks()
(*Daemon).registerLinks() calling the WriteHostConfig() method of its
container argument is a vestigial behaviour. In the distant past,
registerLinks() would persist the container links in an SQLite database
and drop the link config from the container's persisted HostConfig. This
changed in Docker v1.10 (#16032) which migrated away from SQLite and
began using the link config in the container's HostConfig as the
persistent source of truth. registerLinks() no longer mutates the
HostConfig at all so persisting the HostConfig to disk falls outside of
its scope of responsibilities.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-12-12 16:04:09 -05:00
Sebastiaan van Stijn
e7904c5faa
Merge pull request #44309 from thaJeztah/daemon_check_requirements
daemon: NewDaemon(): check system requirements early
2022-11-01 13:42:44 +01:00
Sebastiaan van Stijn
17fb29c9e8
daemon: NewDaemon(): check system requirements early
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-10-17 15:15:55 +02:00
Sebastiaan van Stijn
69f72417f4
pkg/idtools: remove CanAccess(), and move to daemon
The implementation of CanAccess() is very rudimentary, and should
not be used for anything other than a basic check (and maybe not
even for that). It's only used in a single location in the daemon,
so move it there, and un-export it to not encourage others to use
it out of context.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-10-15 22:42:39 +02:00
Sebastiaan van Stijn
ddb42f3ad2
daemon: fix empty-lines (revive)
daemon/network/filter_test.go:174:19: empty-lines: extra empty line at the end of a block (revive)
    daemon/restart.go:17:116: empty-lines: extra empty line at the end of a block (revive)
    daemon/daemon_linux_test.go:255:41: empty-lines: extra empty line at the end of a block (revive)
    daemon/reload_test.go:340:58: empty-lines: extra empty line at the end of a block (revive)
    daemon/oci_linux.go:495:101: empty-lines: extra empty line at the end of a block (revive)
    daemon/seccomp_linux_test.go:17:36: empty-lines: extra empty line at the start of a block (revive)
    daemon/container_operations.go:560:73: empty-lines: extra empty line at the end of a block (revive)
    daemon/daemon_unix.go:558:76: empty-lines: extra empty line at the end of a block (revive)
    daemon/daemon_unix.go:1092:64: empty-lines: extra empty line at the start of a block (revive)
    daemon/container_operations.go:587:24: empty-lines: extra empty line at the end of a block (revive)
    daemon/network.go:807:18: empty-lines: extra empty line at the end of a block (revive)
    daemon/network.go:813:42: empty-lines: extra empty line at the end of a block (revive)
    daemon/network.go:872:72: empty-lines: extra empty line at the end of a block (revive)

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-09-28 01:58:51 +02:00
Cory Snider
9ce2b30b81 pkg/containerfs: drop ContainerFS type alias
Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-09-23 16:56:53 -04:00
Cory Snider
4bafaa00aa Refactor libcontainerd to minimize c8d RPCs
The containerd client is very chatty at the best of times. Because the
libcontained API is stateless and references containers and processes by
string ID for every method call, the implementation is essentially
forced to use the containerd client in a way which amplifies the number
of redundant RPCs invoked to perform any operation. The libcontainerd
remote implementation has to reload the containerd container, task
and/or process metadata for nearly every operation. This in turn
amplifies the number of context switches between dockerd and containerd
to perform any container operation or handle a containerd event,
increasing the load on the system which could otherwise be allocated to
workloads.

Overhaul the libcontainerd interface to reduce the impedance mismatch
with the containerd client so that the containerd client can be used
more efficiently. Split the API out into container, task and process
interfaces which the consumer is expected to retain so that
libcontainerd can retain state---especially the analogous containerd
client objects---without having to manage any state-store inside the
libcontainerd client.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-08-24 14:59:08 -04:00
Djordje Lukic
d8d990f2e3
daemon: make the snapshotter configurable
Treat (storage/graph)Driver as snapshotter

Also moved some layerStore related initialization to the non-c8d case
because otherwise they get treated as a graphdriver plugins.

Co-authored-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-08-22 18:57:42 +02:00
Brian Goff
e6ee27a541 Allow containerd shim refs in default-runtime
Since runtimes can now just be containerd shims, we need to check if the
reference is possibly a containerd shim.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2022-08-18 18:41:03 +00:00
Cory Snider
547da0d575 daemon: support other containerd runtimes (MVP)
Contrary to popular belief, the OCI Runtime specification does not
specify the command-line API for runtimes. Looking at containerd's
architecture from the lens of the OCI Runtime spec, the _shim_ is the
OCI Runtime and runC is "just" an implementation detail of the
io.containerd.runc.v2 runtime. When one configures a non-default runtime
in Docker, what they're really doing is instructing Docker to create
containers using the io.containerd.runc.v2 runtime with a configuration
option telling the runtime that the runC binary is at some non-default
path. Consequently, only OCI runtimes which are compatible with the
io.containerd.runc.v2 shim, such as crun, can be used in this manner.
Other OCI runtimes, including kata-containers v2, come with their own
containerd shim and are not compatible with io.containerd.runc.v2.
As Docker has not historically provided a way to select a non-default
runtime which requires its own shim, runtimes such as kata-containers v2
could not be used with Docker.

Allow other containerd shims to be used with Docker; no daemon
configuration required. If the daemon is instructed to create a
container with a runtime name which does not match any of the configured
or stock runtimes, it passes the name along to containerd verbatim. A
user can start a container with the kata-containers runtime, for
example, simply by calling

    docker run --runtime io.containerd.kata.v2

Runtime names which containerd would interpret as a path to an arbitrary
binary are disallowed. While handy for development and testing it is not
strictly necessary and would allow anyone with Engine API access to
trivially execute any binary on the host as root, so we have decided it
would be safest for our users if it was not allowed.

It is not yet possible to set an alternative containerd shim as the
default runtime; it can only be configured per-container.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-07-27 14:22:49 -04:00
Sebastiaan van Stijn
52c1a2fae8
gofmt GoDoc comments with go1.19
Older versions of Go don't format comments, so committing this as
a separate commit, so that we can already make these changes before
we upgrade to Go 1.19.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-07-08 19:56:23 +02:00
Akihiro Suda
2c7a6d7bb1
daemon: remove support for deprecated io.containerd.runtime.v1.linux
This has been deprecated in Docker 20.10.0 (f63f73a4a8)

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2022-06-05 18:41:30 +09:00
Sebastiaan van Stijn
b241e2008e
daemon.NewDaemon(): fix network feature detection on first start
Commit 483aa6294b introduced a regression, causing
spurious warnings to be shown when starting a daemon for the first time after
a fresh install:

    docker info
    ...
    WARNING: IPv4 forwarding is disabled
    WARNING: bridge-nf-call-iptables is disabled
    WARNING: bridge-nf-call-ip6tables is disabled

The information shown is incorrect, as checking the corresponding options on
the system, shows that these options are available:

    cat /proc/sys/net/ipv4/ip_forward
    1
    cat /proc/sys/net/bridge/bridge-nf-call-iptables
    1
    cat /proc/sys/net/bridge/bridge-nf-call-ip6tables
    1

The reason this is failing is because the daemon itself reconfigures those
options during networking initialization in `configureIPForwarding()`;
cf4595265e/libnetwork/drivers/bridge/setup_ip_forwarding.go (L14-L25)

Network initialization happens in the `daemon.restore()` function within `daemon.NewDaemon()`:
cf4595265e/daemon/daemon.go (L475-L478)

However, 483aa6294b moved detection of features
earlier in the `daemon.NewDaemon()` function, and collects the system information
(`d.RawSysInfo()`) before we enter `daemon.restore()`;
cf4595265e/daemon/daemon.go (L1008-L1011)

For optimization (collecting the system information comes at a cost), those
results are cached on the daemon, and will only be performed once (using a
`sync.Once`).

This patch:

- introduces a `getSysInfo()` utility, which collects system information without
  caching the results
- uses `getSysInfo()` to collect the preliminary information needed at that
  point in the daemon's lifecycle.
- moves printing warnings to the end of `daemon.NewDaemon()`, after all information
  can be read correctly.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-06-03 17:54:43 +02:00
Sebastiaan van Stijn
c3d7a0c603
Fix validation of IpcMode, PidMode, UTSMode, CgroupnsMode
These HostConfig properties were not validated until the OCI spec for the container
was created, which meant that `container run` and `docker create` would accept
invalid values, and the invalid value would not be detected until `start` was
called, returning a 500 "internal server error", as well as errors from containerd
("cleanup: failed to delete container from containerd: no such container") in the
daemon logs.

As a result, a faulty container was created, and the container state remained
in the `created` state.

This patch:

- Updates `oci.WithNamespaces()` to return the correct `errdefs.InvalidParameter`
- Updates `verifyPlatformContainerSettings()` to validate these settings, so that
  an error is returned when _creating_ the container.

Before this patch:

    docker run -dit --ipc=shared --name foo busybox
    2a00d74e9fbb7960c4718def8f6c74fa8ee754030eeb93ee26a516e27d4d029f
    docker: Error response from daemon: Invalid IPC mode: shared.

    docker ps -a --filter name=foo
    CONTAINER ID   IMAGE     COMMAND   CREATED              STATUS    PORTS     NAMES
    2a00d74e9fbb   busybox   "sh"      About a minute ago   Created             foo

After this patch:

    docker run -dit --ipc=shared --name foo busybox
    docker: Error response from daemon: invalid IPC mode: shared.

     docker ps -a --filter name=foo
    CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES

An integration test was added to verify the new validation, which can be run with:

    make BIND_DIR=. TEST_FILTER=TestCreateInvalidHostConfig DOCKER_GRAPHDRIVER=vfs test-integration

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-05-25 17:41:51 +02:00
Sebastiaan van Stijn
dbd575ef91
daemon: daemon.initNetworkController(): dont return the controller
This method returned the network controller, only to set it on the daemon.

While making this change, also;

- update some error messages to be in the correct format
- use errors.Wrap() where possible
- extract configuring networks into a separate function to make the flow
  slightly easier to follow.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-04-29 09:08:49 +02:00
Sebastiaan van Stijn
3b56c0663d
daemon: daemon.networkOptions(): don't pass Config as argument
This is a method on the daemon, which itself holds the Config, so
there's no need to pass the same configuration as an argument.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-04-23 23:34:13 +02:00
Sebastiaan van Stijn
0a3336fd7d
Merge pull request #43366 from corhere/finish-identitymapping-refactor
Finish refactor of UID/GID usage to a new struct
2022-03-25 14:51:05 +01:00
Sebastiaan van Stijn
5d10c6ec67
Update handling of deprecated kernel (tcp) memory options
- Omit `KernelMemory` and `KernelMemoryTCP` fields in `/info` response if they're
  not supported, or when using API v1.42 or up.
- Re-enable detection of `KernelMemory` (as it's still needed for older API versions)
- Remove warning about kernel memory TCP in daemon logs (a warning is still returned
  by the `/info` endpoint, but we can consider removing that).
- Prevent incorrect "Minimum kernel memory limit allowed" error if the value was
  reset because it's not supported by the host.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-03-17 09:56:39 +01:00