Commit graph

7618 commits

Author SHA1 Message Date
Brian Goff
7dd547c5ff
Merge pull request #45802 from dperny/fix-missing-csi-topology
Fix missing Topology in NodeCSIInfo
2023-06-24 08:38:40 -07:00
Drew Erny
cdb1293eea Fix missing Topology in NodeCSIInfo
Added code to correctly retrieve and convert the Topology from the gRPC
Swarm Node.

Signed-off-by: Drew Erny <derny@mirantis.com>
2023-06-23 11:45:50 -05:00
Cory Snider
165dfd6c3e daemon: fix restoring container with missing task
Before 4bafaa00aa, if the daemon was
killed while a container was running and the container shim is killed
before the daemon is restarted, such as if the host system is
hard-rebooted, the daemon would restore the container to the stopped
state and set the exit code to 255. The aforementioned commit introduced
a regression where the container's exit code would instead be set to 0.
Fix the regression so that the exit code is once against set to 255 on
restore.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-06-23 11:28:45 -04:00
Bjorn Neergaard
8d070e30f5
Merge pull request #45797 from corhere/fix-health-probe-double-unlock
daemon: fix double-unlock in health check probe
2023-06-22 18:17:09 -06:00
Cory Snider
786c9adaa2 daemon: fix double-unlock in health check probe
Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-06-22 17:48:21 -04:00
Cory Snider
3b28a24e97 daemon: fix panic on failed exec start
If an exec fails to start in such a way that containerd publishes an
exit event for it, daemon.ProcessEvent will race
daemon.ContainerExecStart in handling the failure. This race has been a
long-standing bug, which was mostly harmless until
4bafaa00aa. After that change, the daemon
would dereference a nil pointer and crash if ProcessEvent won the race.
Restore the status quo buggy behaviour by adding a check to skip the
dereference if execConfig.Process is nil.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-06-22 17:04:51 -04:00
Sebastiaan van Stijn
b3843992fc
Merge pull request #45781 from neersighted/c8d_stargz_refcount 2023-06-21 16:46:51 +02:00
Sebastiaan van Stijn
ab60412cb4
Merge pull request #45736 from thaJeztah/reserve_once
daemon: registerName(): don't reserve name twice
2023-06-20 23:47:40 +02:00
Bjorn Neergaard
21c0a54a6b
c8d: mark stargz as requiring reference-counted mounts
The stargz snapshotter cannot be re-mounted, so the reference-counted
path must be used.

Co-authored-by: Djordje Lukic <djordje.lukic@docker.com>
Signed-off-by: Bjorn Neergaard <bjorn.neergaard@docker.com>
2023-06-20 12:59:13 -06:00
Sebastiaan van Stijn
fc94ed0a86
don't cancel container stop when cancelling context
Commit 90de570cfa passed through the request
context to daemon.ContainerStop(). As a result, cancelling the context would
cancel the "graceful" stop of the container, and would proceed with forcefully
killing the container.

This patch partially reverts the changes from 90de570cfa
and breaks the context to prevent cancelling the context from cancelling the stop.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-20 11:53:23 +02:00
Sebastiaan van Stijn
3ba67ee214
daemon: registerName(): don't reserve name twice
daemon.generateNewName() already reserves the generated name, but its name
did not indicate it did. The daemon.registerName() assumed that the generated
name still had to be reserved, which could mean it would try to reserve the
same name again.

This patch renames daemon.generateNewName to daemon.generateAndReserveName
to make it clearer what it does, and updates registerName() to return early
if it successfully generated (and registered) the container name.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-13 13:33:33 +02:00
Tianon Gravi
2a6ff3c24f Use OCI "History" type instead of inventing our own copy
The most notable change here is that the OCI's type uses a pointer for `Created`, which we probably should've been too, so most of these changes are accounting for that (and embedding our `Equal` implementation in the one single place it was used).

Signed-off-by: Tianon Gravi <admwiggin@gmail.com>
2023-06-12 13:47:17 -07:00
Sebastiaan van Stijn
59b5c6075f
pkg/rootless: remove GetRootlessKitClient, and move to daemon
This utility was only used in a single location (as part of `docker info`),
but the `pkg/rootless` package is imported in various locations, causing
rootlesskit to be a dependency for consumers of that package.

Move GetRootlessKitClient to the daemon code, which is the only location
it was used.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-12 13:44:30 +02:00
Sebastiaan van Stijn
ed798d651a
Merge pull request #45704 from corhere/fix-zeroes-in-linux-resources
daemon: stop setting container resources to zero
2023-06-12 09:44:07 +02:00
Cory Snider
71589848a0 daemon: test runtimeoptions runtime options
For configured runtimes with a runtimeType other than
io.containerd.runc.v1, io.containerd.runc.v2 and
io.containerd.runhcs.v1, the only supported way to pass configuration is
through the generic containerd "runtimeoptions/v1".Options type. Add a
unit test case which verifies that the options set in the daemon config
are correctly unmarshaled into the daemon's in-memory runtime config,
and that the map keys for the daemon config align with the ones used
when configuring cri-containerd (PascalCase, not camelCase or
snake_case).

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-06-09 11:16:31 -04:00
Brian Goff
1f9eb9ab07
Merge pull request #45713 from AkihiroSuda/rro-services
daemon/cluster: convert new BindOptions
2023-06-08 11:30:34 -07:00
Akihiro Suda
038a361a91
daemon/cluster: convert new BindOptions
Convert CreateMountpoint, ReadOnlyNonRecursive, and ReadOnlyForceRecursive.

See moby/swarmkit PR 3134

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2023-06-08 10:17:04 +09:00
Djordje Lukic
32d58144fd c8d: Use reference counting while mounting a snapshot
Some snapshotters (like overlayfs or zfs) can't mount the same
directories twice. For example if the same directroy is used as an upper
directory in two mounts the kernel will output this warning:

    overlayfs: upperdir is in-use as upperdir/workdir of another mount, accessing files from both mounts will result in undefined behavior.

And indeed accessing the files from both mounts will result in an "No
such file or directory" error.

This change introduces reference counts for the mounts, if a directory
is already mounted the mount interface will only increment the mount
counter and return the mount target effectively making sure that the
filesystem doesn't end up in an undefined behavior.

Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
2023-06-07 15:50:01 +02:00
Cory Snider
8a094fe609 daemon: ensure OCI options play nicely together
Audit the OCI spec options used for Linux containers to ensure they are
less order-dependent. Ensure they don't assume that any pointer fields
are non-nil and that they don't unintentionally clobber mutations to the
spec applied by other options.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-06-06 13:10:05 -04:00
Cory Snider
dea870f4ea daemon: stop setting container resources to zero
Many of the fields in LinuxResources struct are pointers to scalars for
some reason, presumably to differentiate between set-to-zero and unset
when unmarshaling from JSON, despite zero being outside the acceptable
range for the corresponding kernel tunables. When creating the OCI spec
for a container, the daemon sets the container's OCI spec CPUShares and
BlkioWeight parameters to zero when the corresponding Docker container
configuration values are zero, signifying unset, despite the minimum
acceptable value for CPUShares being two, and BlkioWeight ten. This has
gone unnoticed as runC does not distingiush set-to-zero from unset as it
also uses zero internally to represent unset for those fields. However,
kata-containers v3.2.0-alpha.3 tries to apply the explicit-zero resource
parameters to the container, exactly as instructed, and fails loudly.
The OCI runtime-spec is silent on how the runtime should handle the case
when those parameters are explicitly set to out-of-range values and
kata's behaviour is not unreasonable, so the daemon must therefore be in
the wrong.

Translate unset values in the Docker container's resources HostConfig to
omit the corresponding fields in the container's OCI spec when starting
and updating a container in order to maximize compatibility with
runtimes.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-06-06 12:13:05 -04:00
Cory Snider
9ff169ccf4 daemon: modernize oci_linux_test.go
Switch to using t.TempDir() instead of rolling our own.

Clean up mounts leaked by the tests as otherwise the tests fail due to
the leaked mounts because unlike the old cleanup code, t.TempDir()
cleanup does not ignore errors from os.RemoveAll.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-06-05 18:30:30 -04:00
Bjorn Neergaard
7bdeb1dfc1
Merge pull request #45644 from vvoland/c8d-load-unpack-attestation
c8d/load: Don't unpack pseudo images
2023-06-02 07:20:33 -06:00
Sebastiaan van Stijn
79c7d26495
Merge pull request #45670 from thaJeztah/c8d_useragent_more_details
containerd: add c8d version and storage-driver to User-Agent
2023-06-02 11:24:02 +02:00
Paweł Gronowski
4295806736
c8d/handlers: Handle error in walkPresentChildren
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2023-06-02 10:23:03 +02:00
Cory Snider
0f6eeecac0 daemon: consolidate runtimes config validation
The daemon has made a habit of mutating the DefaultRuntime and Runtimes
values in the Config struct to merge defaults. This would be fine if it
was a part of the regular configuration loading and merging process,
as is done with other config options. The trouble is it does so in
surprising places, such as in functions with 'verify' or 'validate' in
their name. It has been necessary in order to validate that the user has
not defined a custom runtime named "runc" which would shadow the
built-in runtime of the same name. Other daemon code depends on the
runtime named "runc" always being defined in the config, but merging it
with the user config at the same time as the other defaults are merged
would trip the validation. The root of the issue is that the daemon has
used the same config values for both validating the daemon runtime
configuration as supplied by the user and for keeping track of which
runtimes have been set up by the daemon. Now that a completely separate
value is used for the latter purpose, surprising contortions are no
longer required to make the validation work as intended.

Consolidate the validation of the runtimes config and merging of the
built-in runtimes into the daemon.setupRuntimes() function. Set the
result of merging the built-in runtimes config and default default
runtime on the returned runtimes struct, without back-propagating it
onto the config.Config argument.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-06-01 14:45:25 -04:00
Cory Snider
d222bf097c daemon: reload runtimes w/o breaking containers
The existing runtimes reload logic went to great lengths to replace the
directory containing runtime wrapper scripts as atomically as possible
within the limitations of the Linux filesystem ABI. Trouble is,
atomically swapping the wrapper scripts directory solves the wrong
problem! The runtime configuration is "locked in" when a container is
started, including the path to the runC binary. If a container is
started with a runtime which requires a daemon-managed wrapper script
and then the daemon is reloaded with a config which no longer requires
the wrapper script (i.e. some args -> no args, or the runtime is dropped
from the config), that container would become unmanageable. Any attempts
to stop, exec or otherwise perform lifecycle management operations on
the container are likely to fail due to the wrapper script no longer
existing at its original path.

Atomically swapping the wrapper scripts is also incompatible with the
read-copy-update paradigm for reloading configuration. A handler in the
daemon could retain a reference to the pre-reload configuration for an
indeterminate amount of time after the daemon configuration has been
reloaded and updated. It is possible for the daemon to attempt to start
a container using a deleted wrapper script if a request to run a
container races a reload.

Solve the problem of deleting referenced wrapper scripts by ensuring
that all wrapper scripts are *immutable* for the lifetime of the daemon
process. Any given runtime wrapper script must always exist with the
same contents, no matter how many times the daemon config is reloaded,
or what changes are made to the config. This is accomplished by using
everyone's favourite design pattern: content-addressable storage. Each
wrapper script file name is suffixed with the SHA-256 digest of its
contents to (probabilistically) guarantee immutability without needing
any concurrency control. Stale runtime wrapper scripts are only cleaned
up on the next daemon restart.

Split the derived runtimes configuration from the user-supplied
configuration to have a place to store derived state without mutating
the user-supplied configuration or exposing daemon internals in API
struct types. Hold the derived state and the user-supplied configuration
in a single struct value so that they can be updated as an atomic unit.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-06-01 14:45:25 -04:00
Cory Snider
0b592467d9 daemon: read-copy-update the daemon config
Ensure data-race-free access to the daemon configuration without
locking by mutating a deep copy of the config and atomically storing
a pointer to the copy into the daemon-wide configStore value. Any
operations which need to read from the daemon config must capture the
configStore value only once and pass it around to guarantee a consistent
view of the config.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-06-01 14:45:24 -04:00
Cory Snider
742ac6e275 daemon: make config reloading more transactional
Config reloading has interleaved validations and other fallible
operations with mutating the live daemon configuration. The daemon
configuration could be left in a partially-reloaded state if any of the
operations returns an error. Mutating a copy of the configuration and
atomically swapping the config struct on success is not currently an
option as config values are not copyable due to the presence of
sync.Mutex fields. Introduce a two-phase commit protocol to defer any
mutations of the daemon state until after all fallible operations have
succeeded.

Reload transactions are not yet entirely hermetic. The platform
reloading logic for custom runtimes on *nix could still leave the
directory of generated runtime wrapper scripts in an indeterminate state
if an error is encountered.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-06-01 14:45:24 -04:00
Cory Snider
038449467e Update BuildKit registry config on daemon reload
Historically, daemon.RegistryHosts() has returned a docker.RegistryHosts
callback function which closes over a point-in-time snapshot of the
daemon configuration. When constructing the BuildKit builder at daemon
startup, the return value of daemon.RegistryHosts() has been used.
Therefore the BuildKit builder would use the registry configuration as
it was at daemon startup for the life of the process, even if the
registry configuration is changed and the configuration reloaded.
Provide BuildKit with a RegistryHosts callback which reflects the
live daemon configuration after reloads so that registry operations
performed by BuildKit always use the same configuration as the rest of
the daemon.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-06-01 14:45:21 -04:00
Cory Snider
982e4fb448 api/server: get features from a callback fn
Passing around a bare pointer to the map of configured features in order
to propagate to consumers changes to the configuration across reloads is
dangerous. Map operations are not atomic, so concurrently reading from
the map while it is being updated is a data race as there is no
synchronization. Use a getter function to retrieve the current features
map so the features can be retrieved race-free.

Remove the unused features argument from the build router.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-06-01 14:43:27 -04:00
Sebastiaan van Stijn
d099e47e00
containerd: add c8d version and storage-driver to User-Agent
With this patch, the user-agent has information about the containerd-client
version and the storage-driver that's used when using the containerd-integration;

    time="2023-06-01T11:27:07.959822887Z" level=info msg="listening on [::]:5000" go.version=go1.19.9 instance.id=53590f34-096a-4fd1-9c58-d3b8eb7e5092 service=registry version=2.8.2
    ...
    172.18.0.1 - - [01/Jun/2023:11:30:12 +0000] "HEAD /v2/multifoo/blobs/sha256:c7ec7661263e5e597156f2281d97b160b91af56fa1fd2cc045061c7adac4babd HTTP/1.1" 404 157 "" "docker/dev go/go1.20.4 git-commit/8d67d0c1a8 kernel/5.15.49-linuxkit-pr os/linux arch/arm64 containerd-client/1.6.21+unknown storage-driver/overlayfs UpstreamClient(Docker-Client/24.0.2 \\(linux\\))"

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-01 18:21:58 +02:00
Sebastiaan van Stijn
66137ae429
containerd: set user-agent when pushing/pulling images
Before this, the client would report itself as containerd, and the containerd
version from the containerd go module:

    time="2023-06-01T09:43:21.907359755Z" level=info msg="listening on [::]:5000" go.version=go1.19.9 instance.id=67b89d83-eac0-4f85-b36b-b1b18e80bde1 service=registry version=2.8.2
    ...
    172.18.0.1 - - [01/Jun/2023:09:43:33 +0000] "HEAD /v2/multifoo/blobs/sha256:cb269d7c0c1ca22fb5a70342c3ed2196c57a825f94b3f0e5ce3aa8c55baee829 HTTP/1.1" 404 157 "" "containerd/1.6.21+unknown"

With this patch, the user-agent has the docker daemon information;

    time="2023-06-01T11:27:07.959822887Z" level=info msg="listening on [::]:5000" go.version=go1.19.9 instance.id=53590f34-096a-4fd1-9c58-d3b8eb7e5092 service=registry version=2.8.2
    ...
    172.18.0.1 - - [01/Jun/2023:11:27:20 +0000] "HEAD /v2/multifoo/blobs/sha256:c7ec7661263e5e597156f2281d97b160b91af56fa1fd2cc045061c7adac4babd HTTP/1.1" 404 157 "" "docker/dev go/go1.20.4 git-commit/8d67d0c1a8 kernel/5.15.49-linuxkit-pr os/linux arch/arm64 UpstreamClient(Docker-Client/24.0.2 \\(linux\\))"

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-01 14:20:45 +02:00
Sebastiaan van Stijn
8d67d0c1a8
Merge pull request #45437 from thaJeztah/vendor_image_spec
vendor: github.com/opencontainers/image-spec v1.1.0-rc3
2023-05-31 11:12:51 +02:00
Paweł Gronowski
4d3238dc0b
c8d/load: Don't unpack pseudo images
Don't unpack image manifests which are not a real images that can't be
unpacked.

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2023-05-31 10:47:26 +02:00
Paweł Gronowski
b08bff8ba3
c8d/load: Use walkImageManifests
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2023-05-31 10:47:25 +02:00
Paweł Gronowski
5210f48bfc
c8d/list: Use walkImageManifests
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2023-05-31 10:47:23 +02:00
Paweł Gronowski
fabc1d5bef
c8d: Add walkImageManifests and ImageManifest wrapper
The default implementation of the containerd.Image interface provided by
the containerd operates on the parent index/manifest list of the image
and the platform matcher.

This isn't convenient when a specific manifest is already known and it's
redundant to search the whole index for a manifest that matches the
given platform matcher. It can also result in a different manifest
picked up than expected when multiple manifests with the same platform
are present.

This introduces a walkImageManifests which walks the provided image and
calls a handler with a ImageManifest, which is a simple wrapper that
implements containerd.Image interfaces and performs all containerd.Image
operations against a platform specific manifest instead of the root
manifest list/index.

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2023-05-31 10:47:22 +02:00
Bjorn Neergaard
988f5ac342
Merge pull request #45647 from rumpl/fix-snapshotter-change
c8d: Fix re-pull of an image when the snapshotter is changed
2023-05-30 15:32:55 -06:00
Djordje Lukic
ed32f5e241 Make sure the image is unpacked for the current snapshotter
Switching snapshotter implementations would result in an error when
preparing a snapshot, check that the image is indeed unpacked for the
current snapshot before trying to prepare a snapshot.

Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
2023-05-30 14:45:30 +02:00
Brian Goff
487ea81316 Fix npe in exec resize when exec errored
In cases where an exec start failed the exec process will be nil even
though the channel to signal that the exec started was closed.

Ideally ExecConfig would get a nice refactor to handle this case better
(ie. it's not started so don't close that channel).
This is a minimal fix to prevent NPE. Luckilly this would only get
called by a client and only the http request goroutine gets the panic
(http lib recovers the panic).

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2023-05-28 00:14:47 +00:00
Sebastiaan van Stijn
b42e367045
vendor: github.com/opencontainers/image-spec v1.1.0-rc3
full diff: https://github.com/opencontainers/image-spec/compare/3a7f492d3f1b...v1.1.0-rc3

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-05-26 02:34:50 +02:00
Akihiro Suda
5045a2de24
Support recursively read-only (RRO) mounts
`docker run -v /foo:/foo:ro` is now recursively read-only on kernel >= 5.12.

Automatically falls back to the legacy non-recursively read-only mount mode on kernel < 5.12.

Use `ro-non-recursive` to disable RRO.
Use `ro-force-recursive` or `rro` to explicitly enable RRO. (Fails on kernel < 5.12)

Fix issue 44978
Fix docker/for-linux issue 788

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2023-05-26 01:58:24 +09:00
Cory Snider
1b28b0ed5a
Merge pull request #45134 from elezar/add-cdi-support
Add support for CDI devices under Linux
2023-05-25 18:06:31 +02:00
Paweł Gronowski
b9b8b6597a
c8d/inspect: Fill Created time if available
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2023-05-25 14:59:49 +02:00
Sebastiaan van Stijn
f1d5385515
Merge pull request #45609 from thaJeztah/constantly_numb
c8d: ImageService.softImageDelete: use OCI and containerd constants
2023-05-25 09:52:31 +02:00
Sebastiaan van Stijn
f17c9e4aeb
c8d: ImageService.softImageDelete: rename var that collided with import
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-05-25 01:44:36 +02:00
Sebastiaan van Stijn
df5deab20b
c8d: ImageService.softImageDelete: use OCI and containerd constants
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-05-25 01:44:31 +02:00
Cory Snider
9b9c5242eb daemon: lock in snapshotter setting at daemon init
Feature flags are one of the configuration items which can be reloaded
without restarting the daemon. Whether the daemon uses the containerd
snapshotter service or the legacy graph drivers is controlled by a
feature flag. However, much of the code which checks the snapshotter
feature flag assumes that the flag cannot change at runtime. Make it so
that the snapshotter setting can only be changed by restarting the
daemon, even if the flag state changes after a live configuration
reload.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-05-24 16:56:17 -04:00
Sebastiaan van Stijn
c5126d1435
Merge pull request #45601 from vvoland/c8d-exists
c8d/pull: Use same progress action as distribution
2023-05-24 12:48:12 +02:00
Paweł Gronowski
a7bc65fbd8
c8d/pull: Use same progress action as distribution
Docker with containerd integration emits "Exists" progress action when a
layer of the currently pulled image already exists. This is different
from the non-c8d Docker which emits "Already exists".

This makes both implementations consistent by emitting backwards
compatible "Already exists" action.

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2023-05-24 11:16:57 +02:00