Now that we removed the interface, there's no need to cast the Network
to a NetworkInfo interface, so we can remove uses of the `Info()` method.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
PR moby/moby#45759 is going to use the new `errors.Join` function to
return a list of validation errors.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Remove some intermediate vars, move vars closer to where they're used,
and introduce local var for `nw.Name()` to reduce some locking/unlocking in:
- `Daemon.allocateNetwork()`
- `Daemon.releaseNetwork()`
- `Daemon.connectToNetwork()`
- `Daemon.disconnectFromNetwork()`
- `Daemon.findAndAttachNetwork()`
Also un-wrapping some lines to make it slightly easier to read the conditions.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- Remove intermediate variable
- Optimize the order of checks in the condition; check for unmanaged containers
first, before getting information about cluster state and network information.
- Simplify the log messages, as the error would already contain the same
information about the network (name or ID) and container (ID), so would
print the network ID twice:
error detaching from network <ID>: could not find network attachment for container <ID> to network <name or ID>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The function was declaring an err variable which was shadowed. It was
intended for directly assigning to a struct field, but as this function
is directly mutating an existing object, and the err variable was declared
far away from its use, let's use an intermediate var for that to make it
slightly more atomic.
While at it, also combined two "if" branches.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
store network.Name() in a variable to reduce repeatedly locking/unlocking
of the network (although this is very, very minimal in the grand scheme
of things).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This function is called by `daemon.containerCreate()` which is already
wrapping errors coming from `verifyNetworkingConfig()` with
`errdefs.InvalidParameter()`. So `verifyNetworkingConfig()` should only
return standard errors.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
This code was initializing a new PortBinding, and creating a deep copy
for each binding. It's unclear what the intent was here, but at least
PortBinding.GetCopy() wasn't adding much value, as it created a new
PortBinding, [copying all values from the original][1], which includes
a [copy of IPAddresses in it][2]. Our original "template" did not have any
of that, so let's forego that, and just create new PortBindings as we go.
[1]: 454b6a7cf5/libnetwork/types/types.go (L110-L120)
[2]: 454b6a7cf5/libnetwork/types/types.go (L236-L244)
Benchmarking before/after;
BenchmarkPortBindingCopy-10 166752 6230 ns/op 1600 B/op 100 allocs/op
BenchmarkPortBindingNoCopy-10 226989 5056 ns/op 1600 B/op 100 allocs/op
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
These were not adding much, so just getting rid of them. Also added a
TODO to move this code to the type.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Move variables closer to where they're used instead of defining them all
at the start of the function.
Also removing some intermediate variables, unwrapped some lines, and combined
some checks to a single check.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Outside of some tests, these options are the only code setting these fields,
so we can update them to set the value, instead of appending.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This function was created as a "method", but didn't use the Daemon in any
way, and all other options were checked inline, so let's not pretend this
function is more "special" than the other checks, and inline the code.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- store network.Name() in a variable to reduce repeatedly locking/unlocking
of the network (although this is very, very minimal in the grand scheme
of things).
- un-wrap long conditions
- ever so slightly optimise some conditions by changeing the order of checks.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This code was initializing a new PortBinding, and creating a deep copy
for each binding. It's unclear what the intent was here, but at least
PortBinding.GetCopy() wasn't adding much value, as it created a new
PortBinding, [copying all values from the original][1], which includes
a [copy of IPAddresses in it][2]. Our original "template" did not have any
of that, so let's forego that, and just create new PortBindings as we go.
[1]: 454b6a7cf5/libnetwork/types/types.go (L110-L120)
[2]: 454b6a7cf5/libnetwork/types/types.go (L236-L244)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
These were not adding much, so just getting rid of them. Also added a
TODO to move this code to the type.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Move variables closer to where they're used instead of defining them all
at the start of the function.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
`getPortMapInfo` does many things; it creates a copy of all the sandbox
endpoints, gets the driver, endpoints, and network from store, and creates
port-bindings for all exposed and mapped ports.
We should look if we can create a more minimal implementation for this
purpose, but in the meantime, let's prevent it being called if we don't
need it by making it the second condition in the check.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- don't initialize slices; it's not needed to append to them
- store network-ID in a var to prevent repeated lock/unlocking in nw.ID()
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The "Capability" type defines DataScope and ConnectivityScope fields,
but their value was set from consts in the datastore package, which
required importing that package and its dependencies for the consts
only.
This patch:
- Moves the consts to a separate "scope" package
- Adds aliases for the consts in the datastore package.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
refreshImage is the only function used as a reducer and it doesn't use
the `filter *listContext`.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
Aggregate same images into one object and add the list of tags pointing
to it to the RepoTags array
Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
Refactor GetContainerLayerSize to calculate unpacked image size only by
following the snapshot parent tree directly instead of following it by
using diff ids from image config.
This works even if the original manifest/config used to create that
container is no longer present in the content store.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
Check for generic `errdefs.NotFound` rather than specific error helper
struct when checking if the error is caused by the image not being
present.
It still works for `ErrImageDoesNotExist` because it
implements the NotFound errdefs interface too.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
Now that all helper functions are updated, we can use a struct-literal
for this function, which makes it slightly easier to read.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Split the buildDetailedNetworkResources function into separate functions for
collecting container attachments (`buildContainerAttachments`) and service
attachments (`buildServiceAttachments`). This allows us to get rid of the
"verbose" bool, and makes the logic slightly more transparent.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- Pass the endpoint and endpoint-info, instead of individual fields from the
endpoint.
- Remove redundant nil-check, as it's already checked on the call-side
in `buildDetailedNetworkResources`, which skips endpoints without info.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Make the function return the constructed network.IPAM instead of applying
it to a network struct, and rename it to "buildIPAMResources".
Rewrite the function itself:
- Use struct-literals where possible to make it slightly more readable.
- Use a boolean (hasIPv4Config, hasIPv6Config) for both IPv4 and IPv6 to
check whether the IPAM-info needs to be added. This makes the logic the
same for both, and makes the processing order-independent. This also
allows for the `network.IpamInfo()` call to be skipped if it's not needed.
- Change order of "ipv4 config / ipv4 info" and "ipv6 config / ipv4 info"
blocks to make it slightly clearer (and to allow skipping the forementioned
call to `network.IpamInfo()`).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Move the length-check into the function, and change the code to
be a basic type-case, as networkdb.PeerInfo and network.PeerInfo
are identical types.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This change add leases for all the content that will be exported, once
the image(s) are exported the lease is removed, thus letting
containerd's GC to do its job if needed. This fixes the case where
someone would remove an image that is still being exported.
This fixes the TestAPIImagesSaveAndLoad cli integration test.
Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
When resolving a reference that is both a Named and Digested, it could
be resolved to an image that has the same digest, but completely
different repository name.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
If image name is already an untagged digested reference, don't produce
additional digested ref.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
The imgSvcConfig is defined locally, and discarded if an error occurs,
so no need to use the intermediate vars here.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The interface is defined on the receiver-side, and returning concrete
types makes it more transparent what we're creating.
As these namespaced wrappers were not exported, let's inline them, so
that it's clear at a glance what it's doing.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The containerdCli was somewhat confusing (is it the CLI?); let's rename
to make it match what it is :)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
insecure-registries supports using CIDR notation, however, buildkit in
moby was not respecting these. We can update the RegistryHosts function
to support this by inserting the correct host into the lookup map if
it's explicitly marked as insecure.
Signed-off-by: Justin Chadwell <me@jedevc.com>
The RegistryHosts lookup function is used by both BuildKit and by the
containerd snapshotter. However, this function differs in behaviour from
the config parser for the RegistryConfig:
- The protocol for insecure registries is treated as significant by
RegistryHosts, while the RegistryConfig strips this information.
- RegistryConfig validates and deduplicates mirrors.
- RegistryConfig does not parse the insecure-registries as URLs, which
can lead to parsing opaque URLs as was possible by the RegistryHosts
function.
This patch updates the lookup function to ensure consistency.
Signed-off-by: Justin Chadwell <me@jedevc.com>
With this change, the API will now return a 403 instead of a 500 when
trying to create an overlay network on a non-manager node.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
The commit befff0e13f inadvertendly
disabled the error returned when trying to create an overlay network on
a node which is not part of a Swarm cluster.
Since commit e3708a89cc the overlay
netdriver returns the error: `no VNI provided`.
This commit reinstate the original error message by checking if the node
is a manager before calling libnetwork's `controller.NewNetwork()`.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
The daemon.lazyInitializeVolume() function only handles restoring Volumes
if a Driver is specified. The Container's MountPoints field may also
contain other kind of mounts (e.g., bind-mounts). Those were ignored, and
don't return an error; 1d9c8619cd/daemon/volumes.go (L243-L252C2)
However, the prepareMountPoints() assumed each MountPoint was a volume,
and logged an informational message about the volume being restored;
1d9c8619cd/daemon/mounts.go (L18-L25)
This would panic if the MountPoint was not a volume;
github.com/docker/docker/daemon.(*Daemon).prepareMountPoints(0xc00054b7b8?, 0xc0007c2500)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/mounts.go:24 +0x1c0
github.com/docker/docker/daemon.(*Daemon).restore.func5(0xc0007c2500, 0x0?)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/daemon.go:552 +0x271
created by github.com/docker/docker/daemon.(*Daemon).restore
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/daemon.go:530 +0x8d8
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x564e9be4c7c0]
This issue was introduced in 647c2a6cdd
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This adds an additional interval to be used by healthchecks during the
start period.
Typically when a container is just starting you want to check if it is
ready more quickly than a typical healthcheck might run. Without this
users have to balance between running healthchecks to frequently vs
taking a very long time to mark a container as healthy for the first
time.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Now that the MTU field was moved, this function only needs the BridgeConfig,
which contains all options for the default "bridge" network.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This option is only used for the default bridge network; let's move the
field to that struct to make it clearer what it's used for.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The --mtu option is only used for the default "bridge" network on Linux.
On Windows, the flag is available, but ignored. As this option has been
available for a long time, and was always silently ignored, deprecating
or removing it would be a breaking change (and perhaps it's possible to
support it in future).
This patch:
- hides the option on Windows binaries
- logs a warning if the option is set to any non-zero value other than
the default on a Windows binary
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Got a linter warning on this one, and I don't think eventFilter() was
intentionally using a value (not pointer).
> Struct containerConfig has methods on both value and pointer receivers.
> Such usage is not recommended by the Go Documentation
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
These aliases were not needed, and only used in a couple of places,
which made it inconsistent, so let's use the import without aliasing.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
If an image is only by id instead of its name, don't prune it
completely. but only untag it and create a dangling image for it.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
This field was added in f0e6e135a8, and
from that change I suspect it was intended to store the default SELinux
mount-labels to be set on containers.
However, it was never used, so let's remove it.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Update docker to support a '--log-format' option, which accepts either
'text' (default) or 'json'. Propagate the log format to containerd as
well, to ensure that everything will be logged consistently.
Signed-off-by: Philip K. Warren <pkwarren@gmail.com>
When live-restoring a container the volume driver needs be notified that
there is an active mount for the volume.
Before this change the count is zero until the container stops and the
uint64 overflows pretty much making it so the volume can never be
removed until another daemon restart.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Ideally, this should actually do a lookup across images that have no parent, but I wasn't 100% sure how to accomplish that so I opted for the smaller change of having `FROM scratch` builds not be cached for now.
Signed-off-by: Tianon Gravi <admwiggin@gmail.com>
Before 4bafaa00aa, if the daemon was
killed while a container was running and the container shim is killed
before the daemon is restarted, such as if the host system is
hard-rebooted, the daemon would restore the container to the stopped
state and set the exit code to 255. The aforementioned commit introduced
a regression where the container's exit code would instead be set to 0.
Fix the regression so that the exit code is once against set to 255 on
restore.
Signed-off-by: Cory Snider <csnider@mirantis.com>
If an exec fails to start in such a way that containerd publishes an
exit event for it, daemon.ProcessEvent will race
daemon.ContainerExecStart in handling the failure. This race has been a
long-standing bug, which was mostly harmless until
4bafaa00aa. After that change, the daemon
would dereference a nil pointer and crash if ProcessEvent won the race.
Restore the status quo buggy behaviour by adding a check to skip the
dereference if execConfig.Process is nil.
Signed-off-by: Cory Snider <csnider@mirantis.com>
The stargz snapshotter cannot be re-mounted, so the reference-counted
path must be used.
Co-authored-by: Djordje Lukic <djordje.lukic@docker.com>
Signed-off-by: Bjorn Neergaard <bjorn.neergaard@docker.com>
Commit 90de570cfa passed through the request
context to daemon.ContainerStop(). As a result, cancelling the context would
cancel the "graceful" stop of the container, and would proceed with forcefully
killing the container.
This patch partially reverts the changes from 90de570cfa
and breaks the context to prevent cancelling the context from cancelling the stop.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
daemon.generateNewName() already reserves the generated name, but its name
did not indicate it did. The daemon.registerName() assumed that the generated
name still had to be reserved, which could mean it would try to reserve the
same name again.
This patch renames daemon.generateNewName to daemon.generateAndReserveName
to make it clearer what it does, and updates registerName() to return early
if it successfully generated (and registered) the container name.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The most notable change here is that the OCI's type uses a pointer for `Created`, which we probably should've been too, so most of these changes are accounting for that (and embedding our `Equal` implementation in the one single place it was used).
Signed-off-by: Tianon Gravi <admwiggin@gmail.com>
This utility was only used in a single location (as part of `docker info`),
but the `pkg/rootless` package is imported in various locations, causing
rootlesskit to be a dependency for consumers of that package.
Move GetRootlessKitClient to the daemon code, which is the only location
it was used.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
For configured runtimes with a runtimeType other than
io.containerd.runc.v1, io.containerd.runc.v2 and
io.containerd.runhcs.v1, the only supported way to pass configuration is
through the generic containerd "runtimeoptions/v1".Options type. Add a
unit test case which verifies that the options set in the daemon config
are correctly unmarshaled into the daemon's in-memory runtime config,
and that the map keys for the daemon config align with the ones used
when configuring cri-containerd (PascalCase, not camelCase or
snake_case).
Signed-off-by: Cory Snider <csnider@mirantis.com>
Convert CreateMountpoint, ReadOnlyNonRecursive, and ReadOnlyForceRecursive.
See moby/swarmkit PR 3134
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Some snapshotters (like overlayfs or zfs) can't mount the same
directories twice. For example if the same directroy is used as an upper
directory in two mounts the kernel will output this warning:
overlayfs: upperdir is in-use as upperdir/workdir of another mount, accessing files from both mounts will result in undefined behavior.
And indeed accessing the files from both mounts will result in an "No
such file or directory" error.
This change introduces reference counts for the mounts, if a directory
is already mounted the mount interface will only increment the mount
counter and return the mount target effectively making sure that the
filesystem doesn't end up in an undefined behavior.
Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
Audit the OCI spec options used for Linux containers to ensure they are
less order-dependent. Ensure they don't assume that any pointer fields
are non-nil and that they don't unintentionally clobber mutations to the
spec applied by other options.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Many of the fields in LinuxResources struct are pointers to scalars for
some reason, presumably to differentiate between set-to-zero and unset
when unmarshaling from JSON, despite zero being outside the acceptable
range for the corresponding kernel tunables. When creating the OCI spec
for a container, the daemon sets the container's OCI spec CPUShares and
BlkioWeight parameters to zero when the corresponding Docker container
configuration values are zero, signifying unset, despite the minimum
acceptable value for CPUShares being two, and BlkioWeight ten. This has
gone unnoticed as runC does not distingiush set-to-zero from unset as it
also uses zero internally to represent unset for those fields. However,
kata-containers v3.2.0-alpha.3 tries to apply the explicit-zero resource
parameters to the container, exactly as instructed, and fails loudly.
The OCI runtime-spec is silent on how the runtime should handle the case
when those parameters are explicitly set to out-of-range values and
kata's behaviour is not unreasonable, so the daemon must therefore be in
the wrong.
Translate unset values in the Docker container's resources HostConfig to
omit the corresponding fields in the container's OCI spec when starting
and updating a container in order to maximize compatibility with
runtimes.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Switch to using t.TempDir() instead of rolling our own.
Clean up mounts leaked by the tests as otherwise the tests fail due to
the leaked mounts because unlike the old cleanup code, t.TempDir()
cleanup does not ignore errors from os.RemoveAll.
Signed-off-by: Cory Snider <csnider@mirantis.com>
The daemon has made a habit of mutating the DefaultRuntime and Runtimes
values in the Config struct to merge defaults. This would be fine if it
was a part of the regular configuration loading and merging process,
as is done with other config options. The trouble is it does so in
surprising places, such as in functions with 'verify' or 'validate' in
their name. It has been necessary in order to validate that the user has
not defined a custom runtime named "runc" which would shadow the
built-in runtime of the same name. Other daemon code depends on the
runtime named "runc" always being defined in the config, but merging it
with the user config at the same time as the other defaults are merged
would trip the validation. The root of the issue is that the daemon has
used the same config values for both validating the daemon runtime
configuration as supplied by the user and for keeping track of which
runtimes have been set up by the daemon. Now that a completely separate
value is used for the latter purpose, surprising contortions are no
longer required to make the validation work as intended.
Consolidate the validation of the runtimes config and merging of the
built-in runtimes into the daemon.setupRuntimes() function. Set the
result of merging the built-in runtimes config and default default
runtime on the returned runtimes struct, without back-propagating it
onto the config.Config argument.
Signed-off-by: Cory Snider <csnider@mirantis.com>
The existing runtimes reload logic went to great lengths to replace the
directory containing runtime wrapper scripts as atomically as possible
within the limitations of the Linux filesystem ABI. Trouble is,
atomically swapping the wrapper scripts directory solves the wrong
problem! The runtime configuration is "locked in" when a container is
started, including the path to the runC binary. If a container is
started with a runtime which requires a daemon-managed wrapper script
and then the daemon is reloaded with a config which no longer requires
the wrapper script (i.e. some args -> no args, or the runtime is dropped
from the config), that container would become unmanageable. Any attempts
to stop, exec or otherwise perform lifecycle management operations on
the container are likely to fail due to the wrapper script no longer
existing at its original path.
Atomically swapping the wrapper scripts is also incompatible with the
read-copy-update paradigm for reloading configuration. A handler in the
daemon could retain a reference to the pre-reload configuration for an
indeterminate amount of time after the daemon configuration has been
reloaded and updated. It is possible for the daemon to attempt to start
a container using a deleted wrapper script if a request to run a
container races a reload.
Solve the problem of deleting referenced wrapper scripts by ensuring
that all wrapper scripts are *immutable* for the lifetime of the daemon
process. Any given runtime wrapper script must always exist with the
same contents, no matter how many times the daemon config is reloaded,
or what changes are made to the config. This is accomplished by using
everyone's favourite design pattern: content-addressable storage. Each
wrapper script file name is suffixed with the SHA-256 digest of its
contents to (probabilistically) guarantee immutability without needing
any concurrency control. Stale runtime wrapper scripts are only cleaned
up on the next daemon restart.
Split the derived runtimes configuration from the user-supplied
configuration to have a place to store derived state without mutating
the user-supplied configuration or exposing daemon internals in API
struct types. Hold the derived state and the user-supplied configuration
in a single struct value so that they can be updated as an atomic unit.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Ensure data-race-free access to the daemon configuration without
locking by mutating a deep copy of the config and atomically storing
a pointer to the copy into the daemon-wide configStore value. Any
operations which need to read from the daemon config must capture the
configStore value only once and pass it around to guarantee a consistent
view of the config.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Config reloading has interleaved validations and other fallible
operations with mutating the live daemon configuration. The daemon
configuration could be left in a partially-reloaded state if any of the
operations returns an error. Mutating a copy of the configuration and
atomically swapping the config struct on success is not currently an
option as config values are not copyable due to the presence of
sync.Mutex fields. Introduce a two-phase commit protocol to defer any
mutations of the daemon state until after all fallible operations have
succeeded.
Reload transactions are not yet entirely hermetic. The platform
reloading logic for custom runtimes on *nix could still leave the
directory of generated runtime wrapper scripts in an indeterminate state
if an error is encountered.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Historically, daemon.RegistryHosts() has returned a docker.RegistryHosts
callback function which closes over a point-in-time snapshot of the
daemon configuration. When constructing the BuildKit builder at daemon
startup, the return value of daemon.RegistryHosts() has been used.
Therefore the BuildKit builder would use the registry configuration as
it was at daemon startup for the life of the process, even if the
registry configuration is changed and the configuration reloaded.
Provide BuildKit with a RegistryHosts callback which reflects the
live daemon configuration after reloads so that registry operations
performed by BuildKit always use the same configuration as the rest of
the daemon.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Passing around a bare pointer to the map of configured features in order
to propagate to consumers changes to the configuration across reloads is
dangerous. Map operations are not atomic, so concurrently reading from
the map while it is being updated is a data race as there is no
synchronization. Use a getter function to retrieve the current features
map so the features can be retrieved race-free.
Remove the unused features argument from the build router.
Signed-off-by: Cory Snider <csnider@mirantis.com>
With this patch, the user-agent has information about the containerd-client
version and the storage-driver that's used when using the containerd-integration;
time="2023-06-01T11:27:07.959822887Z" level=info msg="listening on [::]:5000" go.version=go1.19.9 instance.id=53590f34-096a-4fd1-9c58-d3b8eb7e5092 service=registry version=2.8.2
...
172.18.0.1 - - [01/Jun/2023:11:30:12 +0000] "HEAD /v2/multifoo/blobs/sha256:c7ec7661263e5e597156f2281d97b160b91af56fa1fd2cc045061c7adac4babd HTTP/1.1" 404 157 "" "docker/dev go/go1.20.4 git-commit/8d67d0c1a8 kernel/5.15.49-linuxkit-pr os/linux arch/arm64 containerd-client/1.6.21+unknown storage-driver/overlayfs UpstreamClient(Docker-Client/24.0.2 \\(linux\\))"
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Before this, the client would report itself as containerd, and the containerd
version from the containerd go module:
time="2023-06-01T09:43:21.907359755Z" level=info msg="listening on [::]:5000" go.version=go1.19.9 instance.id=67b89d83-eac0-4f85-b36b-b1b18e80bde1 service=registry version=2.8.2
...
172.18.0.1 - - [01/Jun/2023:09:43:33 +0000] "HEAD /v2/multifoo/blobs/sha256:cb269d7c0c1ca22fb5a70342c3ed2196c57a825f94b3f0e5ce3aa8c55baee829 HTTP/1.1" 404 157 "" "containerd/1.6.21+unknown"
With this patch, the user-agent has the docker daemon information;
time="2023-06-01T11:27:07.959822887Z" level=info msg="listening on [::]:5000" go.version=go1.19.9 instance.id=53590f34-096a-4fd1-9c58-d3b8eb7e5092 service=registry version=2.8.2
...
172.18.0.1 - - [01/Jun/2023:11:27:20 +0000] "HEAD /v2/multifoo/blobs/sha256:c7ec7661263e5e597156f2281d97b160b91af56fa1fd2cc045061c7adac4babd HTTP/1.1" 404 157 "" "docker/dev go/go1.20.4 git-commit/8d67d0c1a8 kernel/5.15.49-linuxkit-pr os/linux arch/arm64 UpstreamClient(Docker-Client/24.0.2 \\(linux\\))"
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The default implementation of the containerd.Image interface provided by
the containerd operates on the parent index/manifest list of the image
and the platform matcher.
This isn't convenient when a specific manifest is already known and it's
redundant to search the whole index for a manifest that matches the
given platform matcher. It can also result in a different manifest
picked up than expected when multiple manifests with the same platform
are present.
This introduces a walkImageManifests which walks the provided image and
calls a handler with a ImageManifest, which is a simple wrapper that
implements containerd.Image interfaces and performs all containerd.Image
operations against a platform specific manifest instead of the root
manifest list/index.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
Switching snapshotter implementations would result in an error when
preparing a snapshot, check that the image is indeed unpacked for the
current snapshot before trying to prepare a snapshot.
Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
In cases where an exec start failed the exec process will be nil even
though the channel to signal that the exec started was closed.
Ideally ExecConfig would get a nice refactor to handle this case better
(ie. it's not started so don't close that channel).
This is a minimal fix to prevent NPE. Luckilly this would only get
called by a client and only the http request goroutine gets the panic
(http lib recovers the panic).
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
`docker run -v /foo:/foo:ro` is now recursively read-only on kernel >= 5.12.
Automatically falls back to the legacy non-recursively read-only mount mode on kernel < 5.12.
Use `ro-non-recursive` to disable RRO.
Use `ro-force-recursive` or `rro` to explicitly enable RRO. (Fails on kernel < 5.12)
Fix issue 44978
Fix docker/for-linux issue 788
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Feature flags are one of the configuration items which can be reloaded
without restarting the daemon. Whether the daemon uses the containerd
snapshotter service or the legacy graph drivers is controlled by a
feature flag. However, much of the code which checks the snapshotter
feature flag assumes that the flag cannot change at runtime. Make it so
that the snapshotter setting can only be changed by restarting the
daemon, even if the flag state changes after a live configuration
reload.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Docker with containerd integration emits "Exists" progress action when a
layer of the currently pulled image already exists. This is different
from the non-c8d Docker which emits "Already exists".
This makes both implementations consistent by emitting backwards
compatible "Already exists" action.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
commit dc11d2a2d8 removed the devicemapper
storage-driver, so these warnings are no longer relevant.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
commit 0abb8dec3f removed support for
running overlay/overlay2 on top of a backing filesystem without d_type
support, and turned it into a fatal error when starting the daemon,
so there's no need to generate warnings for this situation.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Extended attributes are set on files in container images for a reason.
Fail to unpack if extended attributes are present in a layer and setting
the attributes on the unpacked files fails for any reason.
Add an option to the vfs graph driver to opt into the old behaviour
where ENOTSUPP and EPERM errors encountered when setting extended
attributes are ignored. Make it abundantly clear to users and anyone
triaging their bug reports that they are shooting themselves in the
foot by enabling this option.
Signed-off-by: Cory Snider <csnider@mirantis.com>
This fixes a bug where, if a user pulls an image with a tag != `latest` and
a specific platform, we return an NotFound error for the wrong (`latest`) tag.
see: https://github.com/moby/moby/issues/45558
This bug was introduced in 779a5b3029
in the changes to `daemon/images/image_pull.go`, when we started returning the error from the call to
`GetImage` after the pull. We do this call, if pulling with a specified platform, to check if the platform
of the pulled image matches the requested platform (for cases with single-arch images).
However, when we call `GetImage` we're not passing the image tag, only name, so `GetImage` assumes `latest`
which breaks when the user has requested a different tag, since there might not be such an image in the store.
Signed-off-by: Laura Brehm <laurabrehm@hey.com>
These changes add basic CDI integration to the docker daemon.
A cdi driver is added to handle cdi device requests. This
is gated by an experimental feature flag and is only supported on linux
This change also adds a CDISpecDirs (cdi-spec-dirs) option to the config.
This allows the default values of `/etc/cdi`, /var/run/cdi` to be overridden
which is useful for testing.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
When the `ServerAddress` in the `AuthConfig` provided by the client is
empty, default to the default registry (registry-1.docker.io).
This makes the behaviour the same as with the containerd image store
integration disabled.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
Now that most uses of reexec have been replaced with non-reexec
solutions, most of the reexec.Init() calls peppered throughout the test
suites are unnecessary. Furthermore, most of the reexec.Init() calls in
test code neglects to check the return value to determine whether to
exit, which would result in the reexec'ed subprocesses proceeding to run
the tests, which would reexec another subprocess which would proceed to
run the tests, recursively. (That would explain why every reexec
callback used to unconditionally call os.Exit() instead of returning...)
Remove unneeded reexec.Init() calls from test and example code which no
longer needs it, and fix the reexec.Init() calls which are not inert to
exit after a reexec callback is invoked.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Add `execDuration` field to the event attributes map. This is useful for tracking how long the container ran.
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
Ports over all the previous image delete logic, such as:
- Introduce `prune` and `force` flags
- Introduce the concept of hard and soft image delete conflics, which represent:
- image referenced in multiple tags (soft conflict)
- image being used by a stopped container (soft conflict)
- image being used by a running container (hard conflict)
- Implement delete logic such as:
- if deleting by reference, and there are other references to the same image, just
delete the passed reference
- if deleting by reference, and there is only 1 reference and the image is being used
by a running container, throw an error if !force, or delete the reference and create
a dangling reference otherwise
- if deleting by imageID, and force is true, remove all tags (otherwise soft conflict)
- if imageID, check if stopped container is using the image (soft conflict), and
delete anyway if force
- if imageID was passed in, check if running container is using the image (hard conflict)
- if `prune` is true, and the image being deleted has dangling parents, remove them
This commit also implements logic to get image parents in c8d by comparing shared layers.
Signed-off-by: Laura Brehm <laurabrehm@hey.com>
This option was deprecated in 5a922dc162, which
is part of the v24.0.0 release, so we can remove it from master.
This patch;
- adds a check to ValidatePlatformConfig, and produces a fatal error
if oom-score-adjust is set
- removes the deprecated libcontainerd/supervisor.WithOOMScore
- removes the warning from docker info
With this patch:
dockerd --oom-score-adjust=-500 --validate
Flag --oom-score-adjust has been deprecated, and will be removed in the next release.
unable to configure the Docker daemon with file /etc/docker/daemon.json: merged configuration validation from file and command line flags failed: DEPRECATED: The "oom-score-adjust" config parameter and the dockerd "--oom-score-adjust" options have been removed.
And when using `daemon.json`:
dockerd --validate
unable to configure the Docker daemon with file /etc/docker/daemon.json: merged configuration validation from file and command line flags failed: DEPRECATED: The "oom-score-adjust" config parameter and the dockerd "--oom-score-adjust" options have been removed.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This field is deprecated since 1261fe69a3,
and will now be omitted on API v1.44 and up for the `GET /images/json`,
`GET /images/{id}/json`, and `GET /system/df` endpoints.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Treat copying extended attributes from a source filesystem which does
not support extended attributes as a no-op, same as if the file did not
possess the extended attribute. Only fail copying extended attributes if
the source file has the attribute and the destination filesystem does
not support xattrs.
Signed-off-by: Cory Snider <csnider@mirantis.com>
daemon.CreateImageFromContainer() already constructs a new config by taking
the image config, applying custom options (`docker commit --change ..`) (if
any), and merging those with the containers' configuration, so there is
no need to merge options again.
e22758bfb2/daemon/commit.go (L152-L158)
This patch removes the merge logic from generateCommitImageConfig, and
removes the unused arguments and error-return.
Co-authored-by: Djordje Lukic <djordje.lukic@docker.com>
Co-authored-by: Laura Brehm <laurabrehm@hey.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The OCI image-spec now also provides ArgsEscaped for backward compatibility
with the option used by Docker.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- Split these options to a separate struct, so that we can handle them in isolation.
- Change some tests to use subtests, and improve coverage
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Create dangling images for imported images which don't have a name
annotation attached. Previously the content got loaded, but no image
referencing it was created which caused it to be garbage collected
immediately.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
- Use logrus.Fields instead of multiple WithField
- Split one giant debug log into one log per image
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
Don't panic when processing containers created under fork containerd
integration (this field was added in the upstream and didn't exist in
fork).
Co-authored-by: Djordje Lukic <djordje.lukic@docker.com>
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
Since cc19eba (backported to v23.0.4), the PreferredPool for docker0 is
set only when the user provides the bip config parameter or when the
default bridge already exist. That means, if a user provides the
fixed-cidr parameter on a fresh install or reboot their computer/server
without bip set, dockerd throw the following error when it starts:
> failed to start daemon: Error initializing network controller: Error
> creating default "bridge" network: failed to parse pool request for
> address space "LocalDefault" pool "" subpool "100.64.0.0/26": Invalid
> Address SubPool
See #45356.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
The slice which stores chain ids used for computing shared size was
mistakenly initialized with the length set instead of the capacity.
This caused a panic when iterating over it later and dereferncing nil
pointer from empty items.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
This test does not apply when running with snapshotters enabled;
go test -v -run TestGetInspectData .
=== RUN TestGetInspectData
inspect_test.go:27: RWLayer of container inspect-me is unexpectedly nil
--- FAIL: TestGetInspectData (0.00s)
FAIL
FAIL github.com/docker/docker/daemon 0.049s
FAIL
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
In versions of Docker before v1.10, this field was calculated from
the image itself and all of its parent images. Images are now stored
self-contained, and no longer use a parent-chain, making this field
an equivalent of the Size field.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
In versions of Docker before v1.10, this field was calculated from
the image itself and all of its parent images. Images are now stored
self-contained, and no longer use a parent-chain, making this field
an equivalent of the Size field.
For the containerd integration, the Size should be the sum of the
image's compressed / packaged and unpacked (snapshots) layers.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
There's still some locations refering to AuFS;
- pkg/archive: I suspect most of that code is because the whiteout-files
are modelled after aufs (but possibly some code is only relevant to
images created with AuFS as storage driver; to be looked into).
- contrib/apparmor/template: likely some rules can be removed
- contrib/dockerize-disk.sh: very old contribution, and unlikely used
by anyone, but perhaps could be updated if we want to (or just removed).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
In the case of an error when calling snapshotter.Prepare we would return
nil. This change fixes that and returns the error from Prepare all the
time.
Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
It's not originally supported by image list, but we need it for `prune`
needs it, so `list` gets it for free.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
Change return value in function signature and return fatal errors so
they can actually be reported to the caller instead of just being logged
to daemon log.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
The `oom-score-adjust` option was added in a894aec8d8,
to prevent the daemon from being OOM-killed before other processes. This
option was mostly added as a "convenience", as running the daemon as a
systemd unit was not yet common.
Having the daemon set its own limits is not best-practice, and something
better handled by the process-manager starting the daemon.
Commit cf7a5be0f2 fixed this option to allow
disabling it, and 2b8e68ef06 removed the default
score adjust.
This patch deprecates the option altogether, recommending users to set these
limits through the process manager used, such as the "OOMScoreAdjust" option
in systemd units.
With this patch:
dockerd --oom-score-adjust=-500 --validate
Flag --oom-score-adjust has been deprecated, and will be removed in the next release.
configuration OK
echo '{"oom-score-adjust":-500}' > /etc/docker/daemon.json
dockerd
INFO[2023-04-12T21:34:51.133389627Z] Starting up
INFO[2023-04-12T21:34:51.135607544Z] containerd not running, starting managed containerd
WARN[2023-04-12T21:34:51.135629086Z] DEPRECATED: The "oom-score-adjust" config parameter and the dockerd "--oom-score-adjust" option will be removed in the next release.
docker info
Client:
Context: default
Debug Mode: false
...
DEPRECATED: The "oom-score-adjust" config parameter and the dockerd "--oom-score-adjust" option will be removed in the next release
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The call to an unsecure registry doesn't return an error saying that the
"server gave an HTTP response to an HTTPS client" but a
tls.RecordHeaderError saying that the "first record does not look like a
TLS handshake", this changeset looks for the right error for that case.
This fixes the http fallback when using an insecure registry
Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
The GetRepository method interacts directly with the registry, and does
not depend on the snapshotter, but is used for two purposes;
For the GET /distribution/{name:.*}/json route;
dd3b71d17c/api/server/router/distribution/backend.go (L11-L15)
And to satisfy the "executor.ImageBackend" interface as used by Swarm;
58c027ac8b/daemon/cluster/executor/backend.go (L77)
This patch removes the method from the ImageService interface, and instead
implements it through an composite struct that satisfies both interfaces,
and an ImageBackend() method is added to the daemon.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
remove GetRepository from ImageService
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The signatures of functions in containerd's errdefs packages are very
similar to those in our own, and it's easy to accidentally use the wrong
package.
This patch uses a consistent alias for all occurrences of this import.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This const looks to only be there for "convenience", or _possibly_ was created
with future normalization or special handling in mind.
In either case, currently it is just a direct copy (alias) for runtime.GOOS,
and defining our own type for this gives the impression that it's more than
that. It's only used in a single place, and there's no external consumers, so
let's deprecate this const, and use runtime.GOOS instead.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This change makes is possible to run `docker exec -u <UID> ...` when the
containerd integration is activated.
Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
This only makes the containerd ImageService implementation respect
context cancellation though.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
Implement Children method for containerd image store which makes the
`ancestor` filter work for `docker ps`. Checking if image is a children
of other image is implemented by comparing their rootfs diffids because
containerd image store doesn't have a concept of image parentship like
the graphdriver store. The child is expected to have more layers than
the parent and should start with all parent layers.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
While we currently do not provide an option to specify the snapshotter to use
for individual containers (we may want to add this option in future), currently
it already is possible to configure the snapshotter in the daemon configuration,
which could (likely) cause issues when changing and restarting the daemon.
This patch updates some code-paths that have the container available to use
the snapshotter that's configured for the container (instead of the default
snapshotter configured).
There are still code-paths to be looked into, and a tracking ticket as well as
some TODO's were added for those.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Previously, the AWSLogs driver attempted to implement
non-blocking itself. Non-blocking is supposed to
implemented solely by the Docker RingBuffer that
wraps the log driver.
Please see issue and explanation here:
https://github.com/moby/moby/issues/45217
Signed-off-by: Wesley Pettit <wppttt@amazon.com>
Prevent the daemon from erroring out if daemon.json contains default
network options for network drivers aside from bridge. Configuring
defaults for the bridge driver previously worked by coincidence because
the unrelated CLI flag '--bridge' exists.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Attestation manifests have an OCI image media type, which makes them
being listed like they were a separate platform supported by the image.
Don't use `images.Platforms` and walk the manifest list ourselves
looking for all manifests that are an actual image manifest.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
Distribution source labels don't store port of the repository. If the
content was obtained from repository 172.17.0.2:5000 then its
corresponding label will have a key "containerd.io/distribution.source.172.17.0.2".
Fix the check in canBeMounted to ignore the :port part of the domain.
This also removes the check which prevented insecure repositories to use
cross-repo mount - the real cause was the mismatch in domain comparison.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
Distribution source label can specify multiple repositories - in this
case value is a comma separated list of source repositories.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
Previously the labels would be appended for content that was pushed
even if subsequent pushes of other content failed.
Change the behavior to only append the labels if the whole push
operation succeeded.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>