Commit graph

46223 commits

Author SHA1 Message Date
Sebastiaan van Stijn
ff2342154b
Dockerfile: use COPY --link for source code as well
I missed the most important COPY in 637ca59375

Copying the source code into the dev-container does not depend on the parent
layers, so can use the --link option as well.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-01 22:17:09 +02:00
Cory Snider
cc51f0b3d3
Merge pull request #43980 from corhere/rcu-daemon-config
daemon: read-copy-update the daemon config
2023-06-01 21:34:14 +02:00
Cory Snider
0f6eeecac0 daemon: consolidate runtimes config validation
The daemon has made a habit of mutating the DefaultRuntime and Runtimes
values in the Config struct to merge defaults. This would be fine if it
was a part of the regular configuration loading and merging process,
as is done with other config options. The trouble is it does so in
surprising places, such as in functions with 'verify' or 'validate' in
their name. It has been necessary in order to validate that the user has
not defined a custom runtime named "runc" which would shadow the
built-in runtime of the same name. Other daemon code depends on the
runtime named "runc" always being defined in the config, but merging it
with the user config at the same time as the other defaults are merged
would trip the validation. The root of the issue is that the daemon has
used the same config values for both validating the daemon runtime
configuration as supplied by the user and for keeping track of which
runtimes have been set up by the daemon. Now that a completely separate
value is used for the latter purpose, surprising contortions are no
longer required to make the validation work as intended.

Consolidate the validation of the runtimes config and merging of the
built-in runtimes into the daemon.setupRuntimes() function. Set the
result of merging the built-in runtimes config and default default
runtime on the returned runtimes struct, without back-propagating it
onto the config.Config argument.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-06-01 14:45:25 -04:00
Cory Snider
d222bf097c daemon: reload runtimes w/o breaking containers
The existing runtimes reload logic went to great lengths to replace the
directory containing runtime wrapper scripts as atomically as possible
within the limitations of the Linux filesystem ABI. Trouble is,
atomically swapping the wrapper scripts directory solves the wrong
problem! The runtime configuration is "locked in" when a container is
started, including the path to the runC binary. If a container is
started with a runtime which requires a daemon-managed wrapper script
and then the daemon is reloaded with a config which no longer requires
the wrapper script (i.e. some args -> no args, or the runtime is dropped
from the config), that container would become unmanageable. Any attempts
to stop, exec or otherwise perform lifecycle management operations on
the container are likely to fail due to the wrapper script no longer
existing at its original path.

Atomically swapping the wrapper scripts is also incompatible with the
read-copy-update paradigm for reloading configuration. A handler in the
daemon could retain a reference to the pre-reload configuration for an
indeterminate amount of time after the daemon configuration has been
reloaded and updated. It is possible for the daemon to attempt to start
a container using a deleted wrapper script if a request to run a
container races a reload.

Solve the problem of deleting referenced wrapper scripts by ensuring
that all wrapper scripts are *immutable* for the lifetime of the daemon
process. Any given runtime wrapper script must always exist with the
same contents, no matter how many times the daemon config is reloaded,
or what changes are made to the config. This is accomplished by using
everyone's favourite design pattern: content-addressable storage. Each
wrapper script file name is suffixed with the SHA-256 digest of its
contents to (probabilistically) guarantee immutability without needing
any concurrency control. Stale runtime wrapper scripts are only cleaned
up on the next daemon restart.

Split the derived runtimes configuration from the user-supplied
configuration to have a place to store derived state without mutating
the user-supplied configuration or exposing daemon internals in API
struct types. Hold the derived state and the user-supplied configuration
in a single struct value so that they can be updated as an atomic unit.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-06-01 14:45:25 -04:00
Cory Snider
0b592467d9 daemon: read-copy-update the daemon config
Ensure data-race-free access to the daemon configuration without
locking by mutating a deep copy of the config and atomically storing
a pointer to the copy into the daemon-wide configStore value. Any
operations which need to read from the daemon config must capture the
configStore value only once and pass it around to guarantee a consistent
view of the config.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-06-01 14:45:24 -04:00
Cory Snider
742ac6e275 daemon: make config reloading more transactional
Config reloading has interleaved validations and other fallible
operations with mutating the live daemon configuration. The daemon
configuration could be left in a partially-reloaded state if any of the
operations returns an error. Mutating a copy of the configuration and
atomically swapping the config struct on success is not currently an
option as config values are not copyable due to the presence of
sync.Mutex fields. Introduce a two-phase commit protocol to defer any
mutations of the daemon state until after all fallible operations have
succeeded.

Reload transactions are not yet entirely hermetic. The platform
reloading logic for custom runtimes on *nix could still leave the
directory of generated runtime wrapper scripts in an indeterminate state
if an error is encountered.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-06-01 14:45:24 -04:00
Cory Snider
038449467e Update BuildKit registry config on daemon reload
Historically, daemon.RegistryHosts() has returned a docker.RegistryHosts
callback function which closes over a point-in-time snapshot of the
daemon configuration. When constructing the BuildKit builder at daemon
startup, the return value of daemon.RegistryHosts() has been used.
Therefore the BuildKit builder would use the registry configuration as
it was at daemon startup for the life of the process, even if the
registry configuration is changed and the configuration reloaded.
Provide BuildKit with a RegistryHosts callback which reflects the
live daemon configuration after reloads so that registry operations
performed by BuildKit always use the same configuration as the rest of
the daemon.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-06-01 14:45:21 -04:00
Cory Snider
982e4fb448 api/server: get features from a callback fn
Passing around a bare pointer to the map of configured features in order
to propagate to consumers changes to the configuration across reloads is
dangerous. Map operations are not atomic, so concurrently reading from
the map while it is being updated is a data race as there is no
synchronization. Use a getter function to retrieve the current features
map so the features can be retrieved race-free.

Remove the unused features argument from the build router.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-06-01 14:43:27 -04:00
Sebastiaan van Stijn
c679da9ae1
Merge pull request #45669 from thaJeztah/c8d_useragent
containerd: set user-agent when pushing/pulling images
2023-06-01 18:10:24 +02:00
Sebastiaan van Stijn
90e87c4753
Merge pull request #45660 from thaJeztah/dockerfile_copy_link
Dockerfile: use COPY --link to copy artifacts from build-stages
2023-06-01 16:26:01 +02:00
Sebastiaan van Stijn
66137ae429
containerd: set user-agent when pushing/pulling images
Before this, the client would report itself as containerd, and the containerd
version from the containerd go module:

    time="2023-06-01T09:43:21.907359755Z" level=info msg="listening on [::]:5000" go.version=go1.19.9 instance.id=67b89d83-eac0-4f85-b36b-b1b18e80bde1 service=registry version=2.8.2
    ...
    172.18.0.1 - - [01/Jun/2023:09:43:33 +0000] "HEAD /v2/multifoo/blobs/sha256:cb269d7c0c1ca22fb5a70342c3ed2196c57a825f94b3f0e5ce3aa8c55baee829 HTTP/1.1" 404 157 "" "containerd/1.6.21+unknown"

With this patch, the user-agent has the docker daemon information;

    time="2023-06-01T11:27:07.959822887Z" level=info msg="listening on [::]:5000" go.version=go1.19.9 instance.id=53590f34-096a-4fd1-9c58-d3b8eb7e5092 service=registry version=2.8.2
    ...
    172.18.0.1 - - [01/Jun/2023:11:27:20 +0000] "HEAD /v2/multifoo/blobs/sha256:c7ec7661263e5e597156f2281d97b160b91af56fa1fd2cc045061c7adac4babd HTTP/1.1" 404 157 "" "docker/dev go/go1.20.4 git-commit/8d67d0c1a8 kernel/5.15.49-linuxkit-pr os/linux arch/arm64 UpstreamClient(Docker-Client/24.0.2 \\(linux\\))"

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-01 14:20:45 +02:00
Sebastiaan van Stijn
637ca59375
Dockerfile: use COPY --link to copy artifacts from build-stages
Build-cache for the build-stages themselves are already invalidated if the
base-images they're using is updated, and the COPY operations don't depend
on previous steps (as there's no overlap between artifacts copied).

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-05-31 11:52:18 +02:00
Sebastiaan van Stijn
8d67d0c1a8
Merge pull request #45437 from thaJeztah/vendor_image_spec
vendor: github.com/opencontainers/image-spec v1.1.0-rc3
2023-05-31 11:12:51 +02:00
Bjorn Neergaard
abc05cf335
Merge pull request #45645 from sebthom/patch-1
Update blogpost URL
2023-05-30 15:36:37 -06:00
Bjorn Neergaard
988f5ac342
Merge pull request #45647 from rumpl/fix-snapshotter-change
c8d: Fix re-pull of an image when the snapshotter is changed
2023-05-30 15:32:55 -06:00
Cory Snider
d43b398746
Merge pull request #45657 from corhere/libn/setup-resolver-with-verbose-iptables
libnetwork: fix resolver restore w/ chatty 'iptables -C'
2023-05-30 21:44:14 +02:00
Cory Snider
a25434654e
Merge pull request #45654 from corhere/libn/fix-embedded-resolver-live-reload
libnetwork: fix sandbox restore
2023-05-30 21:43:46 +02:00
Cory Snider
1178319313 libn: fix resolver restore w/ chatty 'iptables -C'
Resolver.setupIPTable() checks whether it needs to flush or create the
user chains used for NATing container DNS requests by testing for the
existence of the rules which jump to said user chains. Unfortunately it
does so using the IPTable.RawCombinedOutputNative() method, which
returns a non-nil error if the iptables command returns any output even
if the command exits with a zero status code. While that is fine with
iptables-legacy as it prints no output if the rule exists, iptables-nft
v1.8.7 prints some information about the rule. Consequently,
Resolver.setupIPTable() would incorrectly think that the rule does not
exist during container restore and attempt to create it. This happened
work work by coincidence before 8f5a9a741b
because the failure to create the already-existing table would be
ignored and the new NAT rules would be inserted before the stale rules
left in the table from when the container was last started/restored. Now
that failing to create the table is treated as a fatal error, the
incompatibility with iptables-nft is no longer hidden.

Switch to using IPTable.ExistsNative() to test for the existence of the
jump rules as it correctly only checks the iptables command's exit
status without regard for whether it outputs anything.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-05-30 14:32:27 -04:00
Cory Snider
50eb2d2782 libnetwork: fix sandbox restore
The method to restore a network namespace takes a collection of
interfaces to restore with the options to apply. The interface names are
structured data, tuples of (SrcName, DstPrefix) but for whatever reason
are being passed into Restore() serialized to strings. A refactor,
f0be4d126d, accidentally broke the
serialization by dropping the delimiter. Rather than fix the
serialization and leave the time-bomb for someone else to trip over,
pass the interface names as structured data.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-05-30 12:27:59 -04:00
Cory Snider
18bf3aa442 libnetwork: log why osl sandbox restore failed
Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-05-30 12:17:44 -04:00
Djordje Lukic
ed32f5e241 Make sure the image is unpacked for the current snapshotter
Switching snapshotter implementations would result in an error when
preparing a snapshot, check that the image is indeed unpacked for the
current snapshot before trying to prepare a snapshot.

Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
2023-05-30 14:45:30 +02:00
Sebastiaan van Stijn
2cd23ffeec
Merge pull request #45628 from thaJeztah/context_simplify
builder/remotecontext: remove mimeTypes struct, use consts
2023-05-30 13:40:42 +02:00
sebthom
d58df1fc6c Update blogpost URL
Signed-off-by: sebthom <sebthom@users.noreply.github.com>
2023-05-29 22:37:09 +02:00
Sebastiaan van Stijn
098b0fd1a0
Merge pull request #45627 from thaJeztah/remove_builder_streaming
builder/remotecontext: remove CachableSource, NewCachableSource
2023-05-29 19:04:32 +02:00
Sebastiaan van Stijn
44124ab6b0
builder/remotecontext: remove CachableSource, NewCachableSource
This type (as well as TarsumBackup), was used for the experimental --stream
support for the classic builder. This feature was removed in commit
6ca3ec88ae, which also removed uses of
the CachableSource type.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-05-29 16:35:42 +02:00
Sebastiaan van Stijn
3a643154be
Merge pull request #44697 from crazy-max/generate-files
Update and validation of generated files
2023-05-29 16:34:54 +02:00
CrazyMax
fd72b134d5
update generated files
Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
2023-05-29 03:28:35 +02:00
CrazyMax
735537d6b1
replace gogofast with gogofaster extension
gogofaster is identical as gogofast but removes XXX_unrecognized

Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
2023-05-29 03:28:35 +02:00
CrazyMax
1eaea43581
fix protos and "go generate" commands
Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
2023-05-29 03:28:35 +02:00
Kevin Alvarez
7daaa00120
hack: generated files update and validation
Adds a Dockerfile and make targets to update and validate
generated files (proto, seccomp default profile)

Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
2023-05-29 03:28:35 +02:00
CrazyMax
f1ca793980
use tools build constraint for proto dependencies
Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
2023-05-29 03:13:15 +02:00
Akihiro Suda
2ebd97dec1
Merge pull request #45641 from cpuguy83/exec_npe
Fix npe in exec resize when exec errored
2023-05-28 19:44:23 +09:00
Brian Goff
487ea81316 Fix npe in exec resize when exec errored
In cases where an exec start failed the exec process will be nil even
though the channel to signal that the exec started was closed.

Ideally ExecConfig would get a nice refactor to handle this case better
(ie. it's not started so don't close that channel).
This is a minimal fix to prevent NPE. Luckilly this would only get
called by a client and only the http request goroutine gets the panic
(http lib recovers the panic).

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2023-05-28 00:14:47 +00:00
Cory Snider
8f7bbc39a4
Merge pull request #45636 from corhere/libn/fix-encrypted-overlay-nonstandard-port
libnetwork/d/overlay: support encryption on any port
2023-05-26 22:40:56 +02:00
Cory Snider
9a692a3802 libn/d/overlay: support encryption on any port
While the VXLAN interface and the iptables rules to mark outgoing VXLAN
packets for encryption are configured to use the Swarm data path port,
the XFRM policies for actually applying the encryption are hardcoded to
match packets with destination port 4789/udp. Consequently, encrypted
overlay networks do not pass traffic when the Swarm is configured with
any other data path port: encryption is not applied to the outgoing
VXLAN packets and the destination host drops the received cleartext
packets. Use the configured data path port instead of hardcoding port
4789 in the XFRM policies.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-05-26 14:36:34 -04:00
Sebastiaan van Stijn
e410e27547
builder/remotecontext: remove mimeTypes struct, use consts
This struct was never modified; let's just use consts for these.

Also remove the args return from detectContentType(), as it was
not used anywhere.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-05-26 15:21:15 +02:00
Sebastiaan van Stijn
13fb24458c
Merge pull request #45626 from thaJeztah/deprecate_builder_streaming
builder/remotecontext: deprecate CachableSource, NewCachableSource
2023-05-26 15:12:49 +02:00
Sebastiaan van Stijn
b42e367045
vendor: github.com/opencontainers/image-spec v1.1.0-rc3
full diff: https://github.com/opencontainers/image-spec/compare/3a7f492d3f1b...v1.1.0-rc3

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-05-26 02:34:50 +02:00
Sebastiaan van Stijn
0db4174513
Merge pull request #45278 from AkihiroSuda/rro
Support recursively read-only (RRO) mounts
2023-05-26 02:24:43 +02:00
Sebastiaan van Stijn
37d4b0bee9
builder/remotecontext: deprecate CachableSource, NewCachableSource
This type (as well as TarsumBackup), was used for the experimental --stream
support for the classic builder. This feature was removed in commit
6ca3ec88ae, which also removed uses of
the CachableSource type.

As far as I could find, there's no external consumers of these types,
but let's deprecated it, to give potential users a heads-up that it
will be removed.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-05-26 00:05:08 +02:00
Sebastiaan van Stijn
88f6a92d22
Merge pull request #45624 from corhere/libc8d/serialize-exec-starts-workaround
libcontainerd: work around exec start bug in c8d
2023-05-25 23:02:34 +02:00
Sebastiaan van Stijn
a4c54362c3
Merge pull request #45581 from thaJeztah/vendor_buildkit_0.11.7_dev
vendor: github.com/moby/buildkit v0.11.7-0.20230525183624-798ad6b0ce9f
2023-05-25 22:27:06 +02:00
Cory Snider
fb7ec1555c libcontainerd: work around exec start bug in c8d
It turns out that the unnecessary serialization removed in
b75246202a happened to work around a bug
in containerd. When many exec processes are started concurrently in the
same containerd task, it takes seconds to minutes for them all to start.
Add the workaround back in, only deliberately this time.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-05-25 16:00:29 -04:00
Sebastiaan van Stijn
79ca6630d4
vendor: github.com/moby/buildkit v0.11.7-0.20230525183624-798ad6b0ce9f
full diff: https://github.com/moby/buildkit/compare/v0.11.6...798ad6b0ce9f2fe86dfb2b0277e6770d0b545871

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-05-25 21:35:53 +02:00
Sebastiaan van Stijn
d5dc675d37
Merge pull request #45280 from corhere/libnet/no-overlay-accept-rule
libnetwork/drivers/overlay: stop programming INPUT ACCEPT rule
2023-05-25 21:03:32 +02:00
Sebastiaan van Stijn
10505cac52
Merge pull request #45619 from thaJeztah/update_go_runc_v1.1.0
vendor: github.com/containerd/go-runc v1.1.0
2023-05-25 20:19:05 +02:00
Akihiro Suda
5045a2de24
Support recursively read-only (RRO) mounts
`docker run -v /foo:/foo:ro` is now recursively read-only on kernel >= 5.12.

Automatically falls back to the legacy non-recursively read-only mount mode on kernel < 5.12.

Use `ro-non-recursive` to disable RRO.
Use `ro-force-recursive` or `rro` to explicitly enable RRO. (Fails on kernel < 5.12)

Fix issue 44978
Fix docker/for-linux issue 788

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2023-05-26 01:58:24 +09:00
Sebastiaan van Stijn
3512b04093
vendor: github.com/containerd/go-runc v1.1.0
full diff: https://github.com/containerd/go-runc/compare/v1.0.0...v1.1.0

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-05-25 18:56:52 +02:00
Cory Snider
1b28b0ed5a
Merge pull request #45134 from elezar/add-cdi-support
Add support for CDI devices under Linux
2023-05-25 18:06:31 +02:00
Sebastiaan van Stijn
02c9f038b3
Merge pull request #45618 from vvoland/c8d-inspect-created-time
c8d/inspect: Fill `Created` time if available
2023-05-25 17:16:56 +02:00