Previously the labels would be appended for content that was pushed
even if subsequent pushes of other content failed.
Change the behavior to only append the labels if the whole push
operation succeeded.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
Handler is called in parallel and modifying a map without
synchronization is a race condition.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
- make jobs.Add accept a list of jobs, so that we don't have to
repeatedly lock/unlock the mutex
- rename some variables that collided with imports or types
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This implements `docker push` under containerd image store. When
pushing manifest lists that reference a content which is not present in
the local content store, it will attempt to perform the cross-repo mount
the content if possible.
Considering this scenario:
```bash
$ docker pull docker.io/library/busybox
```
This will download manifest list and only host platform-specific
manifest and blobs.
Note, tagging to a different repository (but still the same registry) and pushing:
```bash
$ docker tag docker.io/library/busybox docker.io/private-repo/mybusybox
$ docker push docker.io/private-repo/mybusybox
```
will result in error, because the neither we nor the target repository
doesn't have the manifests that the busybox manifest list references
(because manifests can't be cross-repo mounted).
If for some reason the manifests and configs for all other platforms
would be present in the content store, but only layer blobs were
missing, then the push would work, because the blobs can be cross-repo
mounted (only if we push to the same registry).
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
Push the reference parsing from repo and tag names into the api and pass
a reference object to the ImageService.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
Previously commit incorrectly used image config digest as an image id
for the new image which isn't consistent with the image target.
This changes it to use manifest digest.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
Don't try to parse dangling images name (they have a non-canonical
format - `moby-dangling@sha256:...`) as a reference.
Log a warning if the image is not dangling and its name is not a valid
named reference.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
The `docker-init` binary is not intended to be a user-facing command, and as such it is more appropriate for it to be found in `/usr/libexec` (or similar) than in `PATH` (see the FHS, especially https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch04s07.html and https://refspecs.linuxfoundation.org/FHS_2.3/fhs-2.3.html#USRLIBLIBRARIESFORPROGRAMMINGANDPA).
This adjusts the logic for using that configuration option to take this into account and appropriately search for `docker-init` (or the user's configured alternative) in these directories before falling back to the existing `PATH` lookup behavior.
This behavior _used_ to exist for the old `dockerinit` binary (of a similar name and used in a similar way but for an alternative purpose), but that behavior was removed in 4357ed4a73 when that older `dockerinit` was also removed.
Most of this reasoning _also_ applies to `docker-proxy` (and various `containerd-xxx` binaries such as the shims), but this change does not affect those. It would be relatively straightforward to adapt `LookupInitPath` to be a more generic function such as `libexecLookupPath` or similar if we wanted to explore that.
See 14482589df/cli-plugins/manager/manager_unix.go for the related path list in the CLI which loads CLI plugins from a similar set of paths (with a similar rationale - plugin binaries are not typically intended to be run directly by users but rather invoked _via_ the CLI binary).
Signed-off-by: Tianon Gravi <admwiggin@gmail.com>
Commit 6a516acb2e moved the MemInfo type and
ReadMemInfo() function into the pkg/sysinfo package. In an attempt to assist
consumers of these to migrate to the new location, an alias was added.
Unfortunately, the side effect of this alias is that pkg/system now depends
on pkg/sysinfo, which means that consumers of this (such as docker/cli) now
get all (indirect) dependencies of that package as dependency, which includes
many dependencies that should only be needed for the daemon / runtime;
- github.com/cilium/ebpf
- github.com/containerd/cgroups
- github.com/coreos/go-systemd/v22
- github.com/godbus/dbus/v5
- github.com/moby/sys/mountinfo
- github.com/opencontainers/runtime-spec
This patch moves the MemInfo related code to its own package. As the previous move
was not yet part of a release, we're not adding new aliases in pkg/sysinfo.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
idtools.MkdirAllAndChown on Windows does not chown directories, which makes
idtools.MkdirAllAndChown() just an alias for system.MkDirAll().
Also setting the filemode to `0`, as changing filemode is a no-op on Windows as
well; both of these changes should make it more transparent that no chown'ing,
nor changing filemode takes place.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Volumes created from the image config were not being pruned because the
volume service did not think they were anonymous since the code to
create passes along a generated name instead of letting the volume
service generate it.
This changes the code path to have the volume service generate the name
instead of doing it ahead of time.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Commit 3246db3755 added handling for removing
cluster volumes, but in some conditions, this resulted in errors not being
returned if the volume was in use;
docker swarm init
docker volume create foo
docker create -v foo:/foo busybox top
docker volume rm foo
This patch changes the logic for ignoring "local" volume errors if swarm
is enabled (and cluster volumes supported).
While working on this fix, I also discovered that Cluster.RemoveVolume()
did not handle the "force" option correctly; while swarm correctly handled
these, the cluster backend performs a lookup of the volume first (to obtain
its ID), which would fail if the volume didn't exist.
Before this patch:
make TEST_FILTER=TestVolumesRemoveSwarmEnabled DOCKER_GRAPHDRIVER=vfs test-integration
...
Running /go/src/github.com/docker/docker/integration/volume (arm64.integration.volume) flags=-test.v -test.timeout=10m -test.run TestVolumesRemoveSwarmEnabled
...
=== RUN TestVolumesRemoveSwarmEnabled
=== PAUSE TestVolumesRemoveSwarmEnabled
=== CONT TestVolumesRemoveSwarmEnabled
=== RUN TestVolumesRemoveSwarmEnabled/volume_in_use
volume_test.go:122: assertion failed: error is nil, not errdefs.IsConflict
volume_test.go:123: assertion failed: expected an error, got nil
=== RUN TestVolumesRemoveSwarmEnabled/volume_not_in_use
=== RUN TestVolumesRemoveSwarmEnabled/non-existing_volume
=== RUN TestVolumesRemoveSwarmEnabled/non-existing_volume_force
volume_test.go:143: assertion failed: error is not nil: Error response from daemon: volume no_such_volume not found
--- FAIL: TestVolumesRemoveSwarmEnabled (1.57s)
--- FAIL: TestVolumesRemoveSwarmEnabled/volume_in_use (0.00s)
--- PASS: TestVolumesRemoveSwarmEnabled/volume_not_in_use (0.01s)
--- PASS: TestVolumesRemoveSwarmEnabled/non-existing_volume (0.00s)
--- FAIL: TestVolumesRemoveSwarmEnabled/non-existing_volume_force (0.00s)
FAIL
With this patch:
make TEST_FILTER=TestVolumesRemoveSwarmEnabled DOCKER_GRAPHDRIVER=vfs test-integration
...
Running /go/src/github.com/docker/docker/integration/volume (arm64.integration.volume) flags=-test.v -test.timeout=10m -test.run TestVolumesRemoveSwarmEnabled
...
make TEST_FILTER=TestVolumesRemoveSwarmEnabled DOCKER_GRAPHDRIVER=vfs test-integration
...
Running /go/src/github.com/docker/docker/integration/volume (arm64.integration.volume) flags=-test.v -test.timeout=10m -test.run TestVolumesRemoveSwarmEnabled
...
=== RUN TestVolumesRemoveSwarmEnabled
=== PAUSE TestVolumesRemoveSwarmEnabled
=== CONT TestVolumesRemoveSwarmEnabled
=== RUN TestVolumesRemoveSwarmEnabled/volume_in_use
=== RUN TestVolumesRemoveSwarmEnabled/volume_not_in_use
=== RUN TestVolumesRemoveSwarmEnabled/non-existing_volume
=== RUN TestVolumesRemoveSwarmEnabled/non-existing_volume_force
--- PASS: TestVolumesRemoveSwarmEnabled (1.53s)
--- PASS: TestVolumesRemoveSwarmEnabled/volume_in_use (0.00s)
--- PASS: TestVolumesRemoveSwarmEnabled/volume_not_in_use (0.01s)
--- PASS: TestVolumesRemoveSwarmEnabled/non-existing_volume (0.00s)
--- PASS: TestVolumesRemoveSwarmEnabled/non-existing_volume_force (0.00s)
PASS
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
SearchRegistryForImages does not make sense as part of the image
service interface. The implementation just wraps the search API of the
registry service to filter the results client-side. It has nothing to do
with local image storage, and the implementation of search does not need
to change when changing which backend (graph driver vs. containerd
snapshotter) is used for local image storage.
Filtering of the search results is an implementation detail: the
consumer of the results does not care which actor does the filtering so
long as the results are filtered as requested. Move filtering into the
exported API of the registry service to hide the implementation details.
Only one thing---the registry service implementation---would need to
change in order to support server-side filtering of search results if
Docker Hub or other registry servers were to add support for it to their
APIs.
Use a fake registry server in the search unit tests to avoid having to
mock out the registry API client.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Co-authored-by: Nicolas De Loof <nicolas.deloof@gmail.com>
Co-authored-by: Paweł Gronowski <pawel.gronowski@docker.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This addresses the previous issue with the containerd store where, after a container is created, we can't deterministically resolve which image variant was used to run it (since we also don't store what platform the image was fetched for).
This is required for things like `docker commit`, and computing the containers layer size later, since we need to resolve the specific image variant.
Signed-off-by: Laura Brehm <laurabrehm@hey.com>
Signed-off-by: Laura Brehm <laurabrehm@hey.com>
Co-authored-by: Laura Brehm <laurabrehm@hey.com>
Co-authored-by: Sebastiaan van Stijn <github@gone.nl>
Co-authored-by: Paweł Gronowski <pawel.gronowski@docker.com>
Co-authored-by: Nicolas De Loof <nicolas.deloof@gmail.com>
The authorization.Middleware contains a sync.Mutex field, making it
non-copyable. Remove one of the barriers to allowing deep copies of
config.Config values.
Inject the middleware into Daemon as a constructor argument instead.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Deprecate `<none>:<none>` and `<none>@<none>` magic strings included in
`RepoTags` and `RepoDigests`.
Produce an empty arrays instead and leave the presentation of
untagged/dangling images up to the client.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
Checking if the image creation failed due to IsAlreadyExists didn't use
the error from ImageService.Create.
Error from ImageService.Create was stored in a separate variable and
later IsAlreadyExists checked the standard `err` variable instead of the
`createErr`.
As there's no need to store the error in a separate variable - just
assign it to err variable and fix the check.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
Kubernetes only permits RuntimeClass values which are valid lowercase
RFC 1123 labels, which disallows the period character. This prevents
cri-dockerd from being able to support configuring alternative shimv2
runtimes for a pod as shimv2 runtime names must contain at least one
period character. Add support for configuring named shimv2 runtimes in
daemon.json so that runtime names can be aliased to
Kubernetes-compatible names.
Allow options to be set on shimv2 runtimes in daemon.json.
The names of the new daemon runtime config fields have been selected to
correspond with the equivalent field names in cri-containerd's
configuration so that users can more easily follow documentation from
the runtime vendor written for cri-containerd and apply it to
daemon.json.
Signed-off-by: Cory Snider <csnider@mirantis.com>
This reverts commit ab3fa46502.
This fix was partial, and is not needed with the proper fix in
containerd.
Signed-off-by: Bjorn Neergaard <bneergaard@mirantis.com>
Go 1.20 made a change to the behaviour of package "os/exec" which was
not mentioned in the release notes:
2b8f214094
Attempts to execute a directory now return syscall.EISDIR instead of
syscall.EACCESS. Check for EISDIR errors from the runtime and fudge the
returned error message to maintain compatibility with existing versions
of docker/cli when using a version of runc compiled with Go 1.20+.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Implements image tagging under containerd image store.
If an image with this tag already exists, and there's no other image
with the same target, change its name. The name will have a special
format `moby-dangling@<digest>` which isn't a valid canonical reference
and doesn't resolve to any repository.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
TagImage is just a wrapper for TagImageWithReference which parses the
repo and tag into a reference. Change TagImageWithReference into
TagImage and move the responsibility of reference parsing to caller.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
`hostSupports` doesn't check if the apparmor_parser is available.
It's possible in some environments that the apparmor will be enabled but
the tool to load the profile is not available which will cause the
ensureDefaultAppArmorProfile to fail completely.
This patch checks if the apparmor_parser is available. Otherwise the
function returns early, but still logs a warning to the daemon log.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
c8d/daemon: Mount root and fill BaseFS
This fixes things that were broken due to nil BaseFS like `docker cp`
and running a container with workdir override.
This is more of a temporary hack than a real solution.
The correct fix would be to refactor the code to make BaseFS and LayerRW
an implementation detail of the old image store implementation and use
the temporary mounts for the c8d implementation instead.
That requires more work though.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
daemon/images: Don't unset BaseFS
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
This makes the `docker save` and `docker load` work with the containerd
image store. The archive is both OCI and Docker compatible.
Saved archive will only contain content which is available locally. In
case the saved image is a multi-platform manifest list, the behavior
depends on the local availability of the content. This is to be
reconsidered when we have the `--platform` option in the CLI.
- If all manifests and their contents, referenced by the manifest list
are present, then the manifest-list is exported directly and the ID
will be the same.
- If only one platform manifest is present, only that manifest is
exported (the image id will change and will be the id of
platform-specific manifest, instead of the full manifest list).
- If multiple, but not all, platform manifests are available, a new
manifest list will be created which will be a subset of the original
manifest list.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
The Pid field of an exit event cannot be relied upon to differentiate
exits of the container's task from exits of other container processes,
i.e. execs. The Pid is reported by the runtime and is implementation-
defined so there is no guarantee that a task's pid is distinct from the
pids of any other process in the same container. In particular,
kata-containers reports the pid of the hypervisor for all exit events.
ContainerD guarantees that the process ID of a task is set to the
corresponding container ID, so use that invariant to distinguish task
exits from other process exits.
Signed-off-by: Cory Snider <csnider@mirantis.com>
The latest version of containerd-shim-runhcs-v1 (v0.10.0-rc.4) pulled in
with the bump to ContainerD v1.7.0-rc.3 had several changes to make it
more robust, which had the side effect of increasing the worst-case
amount of time it takes for a container to exit in the worst case.
Notably, the total timeout for shutting down a task increased from 30
seconds to 60! Increase the timeouts hardcoded in the daemon and
integration tests so that they don't give up too soon.
Signed-off-by: Cory Snider <csnider@mirantis.com>
The netutils.ElectInterfaceAddresses function is only used in one place
outside of tests: in the daemon, to configure the default bridge
network. The function is also messy to reason about as it references the
shared mutable state of ipamutils.PredefinedLocalScopeDefaultNetworks.
It uses the list of predefined default networks to always return an IPv4
address even if the named interface does not exist or does not have any
IPv4 addresses. This list happens to be the same as the one used to
initialize the address pool of the 'builtin' IPAM driver, though that is
far from obvious. (Start with "./libnetwork".initIPAMDrivers and trace
the dataflow of the addressPool value. Surprise! Global state is being
mutated using the value of other global mutable state.)
The daemon does not need the fallback behaviour of
ElectInterfaceAddresses. In fact, the daemon does not have to configure
an address pool for the network at all! libnetwork will acquire one of
the available address ranges from the network's IPAM driver when the
preferred-pool configuration is unset. It will do so using the same list
of address ranges and the exact same logic
(netutils.FindAvailableNetworks) as ElectInterfaceAddresses. So unless
the daemon needs to force the network to use a specific address range
because the bridge interface already exists, it can leave the details
up to libnetwork.
Signed-off-by: Cory Snider <csnider@mirantis.com>
The pattern of parsing bool was repeated across multiple files and
caused the duplication of the invalidFilter error helper.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
After the context is cancelled, update the progress for the last time.
This makes sure that the progress also includes finishing updates.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
Basically every exported method which takes a libnetwork.Sandbox
argument asserts that the value's concrete type is *sandbox. Passing any
other implementation of the interface is a runtime error! This interface
is a footgun, and clearly not necessary. Export and use the concrete
type instead.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Signed-off-by: Nicolas De Loof <nicolas.deloof@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
containerd: Push progress
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
If the imported layer archive is uncompressed, it gets compressed with
gzip before writing to the content store.
Archives compressed with gzip and zstd are imported as-is.
Xz and bzip2 are recompressed into gzip.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
This helps ensure that users are not surprised by unexpected tokens in
the JSON parser, or fallout later in the daemon.
Signed-off-by: Bjorn Neergaard <bneergaard@mirantis.com>
This is a pragmatic but impure choice, in order to better support the
default tools available on Windows Server, and reduce user confusion due
to otherwise inscrutable-to-the-uninitiated errors like the following:
> invalid character 'þ' looking for beginning of value
> invalid character 'ÿ' looking for beginning of value
While meaningful to those who are familiar with and are equipped to
diagnose encoding issues, these characters will be hidden when the file
is edited with a BOM-aware text editor, and further confuse the user.
Signed-off-by: Bjorn Neergaard <bneergaard@mirantis.com>
[RFC 8259] allows for JSON implementations to optionally ignore a BOM
when it helps with interoperability; do so in Moby as Notepad (the only
text editor available out of the box in many versions of Windows Server)
insists on writing UTF-8 with a BOM.
[RFC 8259]: https://tools.ietf.org/html/rfc8259#section-8.1
Signed-off-by: Bjorn Neergaard <bneergaard@mirantis.com>
By relying on the kernel UAPI (userspace API), we can drop a dependency
and simplify building Moby, while also ensuring that we are using a
stable/supported source of the C types and defines we need.
btrfs-progs mirrors the kernel headers, but the headers it ships with
are not the canonical source and as [we have seen before][44698], could
be subject to changes.
Depending on the canonical headers from the kernel both is more
idiomatic, and ensures we are protected by the kernel's promise to not
break userspace.
[44698]: https://github.com/moby/moby/issues/44698
Signed-off-by: Bjorn Neergaard <bneergaard@mirantis.com>
This is actually quite meaningless as we are reporting the libbtrfs
version, but we do not use libbtrfs. We only use the kernel interface to
btrfs instead.
While we could report the version of the kernel headers in play, they're
rather all-or-nothing: they provide the structures and defines we need,
or they don't. As such, drop all version information as the host kernel
version is the only thing that matters.
Signed-off-by: Bjorn Neergaard <bneergaard@mirantis.com>
This is a squashed version of various PRs (or related code-changes)
to implement image inspect with the containerd-integration;
- add support for image inspect
- introduce GetImageOpts to manage image inspect data in backend
- GetImage to return image tags with details
- list images matching digest to discover all tags
- Add ExposedPorts and Volumes to the image returned
- Refactor resolving/getting images
- Return the image ID on inspect
- consider digest and ignore tag when both are set
- docker run --platform
Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
Signed-off-by: Nicolas De Loof <nicolas.deloof@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The image store's used are an interface, so there's no guarantee
that implementations don't wrap the errors. Make sure to catch
such cases by using errors.Is.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Having this function hides what it's doing, which is just to type-cast
to an image.ID (which is a digest). Using a cast is more transparent,
so deprecating this function in favor of a regular typecast.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Simplify the error message so that we don't have to distinguish between static-
and non-static builds. Also update the link to the storage-driver section to
use a "/go/" redirect in the docs, as the anchor link was no longer correct.
Using a "/go/" redirect makes sure the link remains functional if docs is moving
around.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This function was added in b86e3bee5a to
work around an issue in os/user.Current(), which SEGFAULTS when compiling
statically with cgo enabled (see golang/go#13470).
We hit similar issues in other parts, and contributed a "osusergo" build-
tag in https://go-review.googlesource.com/c/go/+/330753. The "osusergo"
build tag must be set when compiling static binaries with cgo enabled.
If that build-tag is set, the cgo implementation for user.Current() won't
be used, and a pure-go implementation is used instead;
https://github.com/golang/go/blob/go1.19.4/src/os/user/cgo_lookup_unix.go#L5
With the above in place, we no longer need this workaround, and can remove
the ensureHomeIfIAmStatic() function.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
After further looking at the code, it appears that the default exit-code
for unknown (other) errors is 128 (set in `defer`).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
These matches were overwriting the previous "match", so reversing the
order in which they're tried so that we can return early.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Rename the variable make it more visible where it's used, as there's were
other "err" variables masking it.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Fixes a (theoretical?) panic if ID would be shorter than 12
characters. Also trim the ID _after_ cutting off the suffix.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
These types and functions are more closely related to the functionality
provided by pkg/systeminfo, and used in conjunction with the other functions
in that package, so moving them there.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Since b58de39ca7, this option was now only used
to produce a fatal error when starting the daemon. That change is in the 23.0
release, so we can remove it from the master branch.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This type was added in 677a6b3506, and named
"common", because at the time, the "docker" and "dockerd" (daemon) code
were still in the same repository, and shared this type. Renaming it, now
that's no longer the case.
As there are no external consumers of this type, I'm not adding an alias.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(*Daemon).registerLinks() calling the WriteHostConfig() method of its
container argument is a vestigial behaviour. In the distant past,
registerLinks() would persist the container links in an SQLite database
and drop the link config from the container's persisted HostConfig. This
changed in Docker v1.10 (#16032) which migrated away from SQLite and
began using the link config in the container's HostConfig as the
persistent source of truth. registerLinks() no longer mutates the
HostConfig at all so persisting the HostConfig to disk falls outside of
its scope of responsibilities.
Signed-off-by: Cory Snider <csnider@mirantis.com>
(*Container).CheckpointTo() upserts a snapshot of the container to the
daemon's in-memory ViewDB and also persists the snapshot to disk. It
does not register the live container object with the daemon's container
store, however. The ViewDB and container store are used as the source of
truth for different operations, so having a container registered in one
but not the other can result in inconsistencies. In particular, the List
Containers API uses the ViewDB as its source of truth and the Container
Inspect API uses the container store.
The (*Daemon).setHostConfig() method is called fairly early in the
process of creating a container, long before the container is registered
in the daemon's container store. Due to a rogue CheckpointTo() call
inside setHostConfig(), there is a window of time where a container can
be included in a List Containers API response but "not exist" according
to the Container Inspect API and similar endpoints which operate on a
particular container. Remove the rogue call so that the caller has full
control over when the container is checkpointed and update callers to
checkpoint explicitly. No changes to (*Daemon).create() are needed as it
checkpoints the fully-created container via (*Daemon).Register().
Fixes#44512.
Signed-off-by: Cory Snider <csnider@mirantis.com>
GetContainer() would return (nil, nil) when looking up a container
if the container was inserted into the containersReplica ViewDB but not
the containers Store at the time of the lookup. Callers which reasonably
assume that the returned err == nil implies returned container != nil
would dereference a nil pointer and panic. Change GetContainer() so that
it always returns a container or an error.
Signed-off-by: Cory Snider <csnider@mirantis.com>
the non-exported "daemon.createNetwork" already returns nil if there's
an error, so no need to check the error.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This allows differentiating how the detailed data is collected between
the containerd-integration code and the existing implementation.
Signed-off-by: Nicolas De Loof <nicolas.deloof@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The List Images API endpoint has accepted multiple values for the
`since` and `before` filter predicates, but thanks to Go's randomizing
of map iteration order, it would pick an arbitrary image to compare
created timestamps against. In other words, the behaviour was undefined.
Change these filter predicates to have well-defined semantics: the
logical AND of all values for each of the respective predicates. As
timestamps are a totally-ordered relation, this is exactly equivalent to
applying the newest and oldest creation timestamps for the `since` and
`before` predicates, respectively.
Signed-off-by: Cory Snider <csnider@mirantis.com>
This one is a "bit" fuzzy, as it may not be _directly_ related to `archive`,
but it's always used _in combination_ with the archive package, so moving it
there.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The singleflight function was capturing the context.Context of the first
caller that invoked the `singleflight.Do`. This could cause all
concurrent calls to be cancelled when the first request is cancelled.
singleflight calls were also moved from the ImageService to Daemon, to
avoid having to implement this logic in both graphdriver and containerd
based image services.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
This is only used for tests, and the key is not verified anymore, so
instead of creating a key and storing it, we can just use an ad-hoc
one.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Turned out that the loadOrCreateTrustKey() utility was doing exactly the
same as libtrust.LoadOrCreateTrustKey(), so making it a thin wrapped. I kept
the tests to verify the behavior, but we could remove them as we only need this
for our integration tests.
The storage location for the generated key was changed (again as we only need
this for some integration tests), so we can remove the TrustKeyPath from the
config.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The migration code is in the 22.06 branch, and if we don't migrate
the only side-effect is the daemon's ID being regenerated (as a
UUID).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Raspberry Pi allows to start system under overlayfs.
Docker is successfully fallbacks to fuse-overlay but not starting
because of the `Error starting daemon: rename /var/lib/docker/runtimes /var/lib/docker/runtimes-old: invalid cross-device link` error
It's happening because `rename` is not supported by overlayfs.
After manually removing directory `runtimes` docker starts and works successfully
Signed-off-by: Illia Antypenko <ilya@antipenko.pp.ua>
Both of these were deprecated in 55f675811a,
but the format of the GoDoc comments didn't follow the correct format, which
caused them not being picked up by tools as "deprecated".
This patch updates uses in the codebase to use the alternatives.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
If the file doesn't exist, the process isn't running, so we should be able
to ignore that.
Also remove an intermediate variable.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Mounting a container's volumes under its rootfs directory inside the
host mount namespace causes problems with cross-namespace mount
propagation when /var/lib/docker is bind-mounted into the container as a
volume. The mount event propagates into the container's mount namespace,
overmounting the volume, but the propagated unmount events do not fully
reverse the effect. Each archive operation causes the mount table in the
container's mount namespace to grow larger and larger, until the kernel
limiton the number of mounts in a namespace is hit. The only solution to
this issue which is not subject to race conditions or other blocker
caveats is to avoid mounting volumes into the container's rootfs
directory in the host mount namespace in the first place.
Mount the container volumes inside an unshared mount namespace to
prevent any mount events from propagating into any other mount
namespace. Greatly simplify the archiving implementations by also
chrooting into the container rootfs to sidestep the need to resolve
paths in the host.
Signed-off-by: Cory Snider <csnider@mirantis.com>
It is only applicable to Windows so it does not need to be called from
platform-generic code. Fix locking in the Windows implementation.
Signed-off-by: Cory Snider <csnider@mirantis.com>
The Linux implementation needs to diverge significantly from the Windows
one in order to fix platform-specific bugs. Cut the generic
implementation out of daemon/archive.go and paste identical, verbatim
copies of that implementation into daemon/archive_{windows,linux}.go to
make it easier to compare the progression of changes to the respective
implementations through Git history.
Signed-off-by: Cory Snider <csnider@mirantis.com>
`system.MkdirAll()` is a special version of os.Mkdir to handle creating directories
using Windows volume paths (`"\\?\Volume{4c1b02c1-d990-11dc-99ae-806e6f6e6963}"`).
This may be important when `MkdirAll` is used, which traverses all parent paths to
create them if missing (ultimately landing on the "volume" path).
The daemon.NewDaemon() function used `system.MkdirAll()` in various places where
a subdirectory within `daemon.Root` was created. This appeared to be mostly out
of convenience (to not have to handle `os.ErrExist` errors). The `daemon.Root`
directory should already be set up in these locations, and should be set up with
correct permissions. Using `system.MkdirAll()` would potentially mask errors if
the root directory is missing, and instead set up parent directories (possibly
with incorrect permissions).
Because of the above, this patch changes `system.MkdirAll` to `os.Mkdir`. As we
are changing these lines, this patch also changes the legacy octal notation
(`0700`) to the now preferred `0o700`.
One location continues to use `system.MkdirAll`, as the temp-directory may be
configured to be outside of `daemon.Root`, but a redundant `os.Stat(realTmp)`
was removed, as `system.MkdirAll` is expected to handle this.
As we are changing these lines, this patch also changes the legacy octal notation
(`0700`) to the now preferred `0o700`.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This makes it more transparent that it's unused for Linux,
and we don't pass "root", which has no relation with the
path on Linux.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This allows us to run CI with the containerd snapshotter enabled, without
patching the daemon.json, or changing how tests set up daemon flags.
A warning log is added during startup, to inform if this variable is set,
as it should only be used for our integration tests.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The implementation of CanAccess() is very rudimentary, and should
not be used for anything other than a basic check (and maybe not
even for that). It's only used in a single location in the daemon,
so move it there, and un-export it to not encourage others to use
it out of context.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Building off insights from the great work Cory Snider has been doing,
this replaces a reexec with a much lower overhead implementation which
performs the `Chddir` in a new goroutine that is locked to a specific
thread with CLONE_FS unshared.
The thread is thrown away afterwards and the Chdir does effectively the
same thing as what the reexec was being used for.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
go-winio now defines this function, so we can consume that.
Note that there's a difference between the old implementation and the original
one (added in 1cb9e9b44e). The old implementation
had special handling for win32 error codes, which was removed in the go-winio
implementation in 0966e1ad56
As `go-winio.GetFileSystemType()` calls `filepath.VolumeName(path)` internally,
this patch also removes the `string(home[0])`, which is redundant, and could
potentially panic if an empty string would be passed.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Commit 955c1f881a (Docker v17.12.0) replaced
detection of support for multiple lowerdirs (as required by overlay2) to not
depend on the kernel version. The `overlay2.override_kernel_check` was still
used to print a warning that older kernel versions may not have full support.
After this, commit e226aea280 (Docker v20.10.0,
backported to v19.03.7) removed uses of the `overlay2.override_kernel_check`
option altogether, but we were still parsing it.
This patch changes the `parseOptions()` function to not parse the option,
printing a deprecation warning instead. We should change this to be an error,
but the `overlay2.override_kernel_check` option was not deprecated in the
documentation, so keeping it around for one more release.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
On Linux, when (os/exec.Cmd).SysProcAttr.Pdeathsig is set, the signal
will be sent to the process when the OS thread on which cmd.Start() was
executed dies. The runtime terminates an OS thread when a goroutine
exits after being wired to the thread with runtime.LockOSThread(). If
other goroutines are allowed to be scheduled onto a thread which called
cmd.Start(), an unrelated goroutine could cause the thread to be
terminated and prematurely signal the command. See
https://github.com/golang/go/issues/27505 for more information.
Prevent started subprocesses with Pdeathsig from getting signaled
prematurely by wiring the starting goroutine to the OS thread until the
subprocess has exited. No other goroutines can be scheduled onto a
locked thread so it will remain alive until unlocked or the daemon
process exits.
Signed-off-by: Cory Snider <csnider@mirantis.com>
The pkg/fsutils package was forked in containerd, and later moved to
containerd/continuity/fs. As we're moving more bits to containerd, let's also
use the same implementation to reduce code-duplication and to prevent them from
diverging.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Before this change, the awslogs collectBatch and processEvent
function documentation still referenced the batchPublishFrequency
constant which was removed in favor of the configurable log stream
forceFlushInterval member.
Signed-off-by: Austin Vazquez <macedonv@amazon.com>
Before this change restarting the daemon in live-restore with running
containers + a restart policy meant that volume refs were not restored.
This specifically happens when the container is still running *and*
there is a restart policy that would make sure the container was running
again on restart.
The bug allows volumes to be removed even though containers are
referencing them. 😱
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This package was moved to a separate repository, using the steps below:
# install filter-repo (https://github.com/newren/git-filter-repo/blob/main/INSTALL.md)
brew install git-filter-repo
cd ~/projects
# create a temporary clone of docker
git clone https://github.com/docker/docker.git moby_pubsub_temp
cd moby_pubsub_temp
# for reference
git rev-parse HEAD
# --> 572ca799db
# remove all code, except for pkg/pubsub, license, and notice, and rename pkg/pubsub to /
git filter-repo --path pkg/pubsub/ --path LICENSE --path NOTICE --path-rename pkg/pubsub/:
# remove canonical imports
git revert -s -S 585ff0ebbe6bc25b801a0e0087dd5353099cb72e
# initialize module
go mod init github.com/moby/pubsub
go mod tidy
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Managed containerd processes are executed with SysProcAttr.Pdeathsig set
to syscall.SIGKILL so that the managed containerd is automatically
killed along with the daemon. At least, that is the intention. In
practice, the signal is sent to the process when the creating _OS
thread_ dies! If a goroutine exits while locked to an OS thread, the Go
runtime will terminate the thread. If that thread happens to be the
same thread which the subprocess was started from, the subprocess will
be signaled. Prevent the journald driver from sometimes unintentionally
killing child processes by ensuring that all runtime.LockOSThread()
calls are paired with runtime.UnlockOSThread().
Signed-off-by: Cory Snider <csnider@mirantis.com>
daemon/network/filter_test.go:174:19: empty-lines: extra empty line at the end of a block (revive)
daemon/restart.go:17:116: empty-lines: extra empty line at the end of a block (revive)
daemon/daemon_linux_test.go:255:41: empty-lines: extra empty line at the end of a block (revive)
daemon/reload_test.go:340:58: empty-lines: extra empty line at the end of a block (revive)
daemon/oci_linux.go:495:101: empty-lines: extra empty line at the end of a block (revive)
daemon/seccomp_linux_test.go:17:36: empty-lines: extra empty line at the start of a block (revive)
daemon/container_operations.go:560:73: empty-lines: extra empty line at the end of a block (revive)
daemon/daemon_unix.go:558:76: empty-lines: extra empty line at the end of a block (revive)
daemon/daemon_unix.go:1092:64: empty-lines: extra empty line at the start of a block (revive)
daemon/container_operations.go:587:24: empty-lines: extra empty line at the end of a block (revive)
daemon/network.go:807:18: empty-lines: extra empty line at the end of a block (revive)
daemon/network.go:813:42: empty-lines: extra empty line at the end of a block (revive)
daemon/network.go:872:72: empty-lines: extra empty line at the end of a block (revive)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
daemon/images/image_squash.go:17:71: empty-lines: extra empty line at the start of a block (revive)
daemon/images/store.go:128:27: empty-lines: extra empty line at the end of a block (revive)
daemon/images/image_list.go:154:55: empty-lines: extra empty line at the start of a block (revive)
daemon/images/image_delete.go:135:13: empty-lines: extra empty line at the end of a block (revive)
daemon/images/image_search.go:25:64: empty-lines: extra empty line at the start of a block (revive)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
daemon/logger/loggertest/logreader.go:58:43: empty-lines: extra empty line at the end of a block (revive)
daemon/logger/ring_test.go:119:34: empty-lines: extra empty line at the end of a block (revive)
daemon/logger/adapter_test.go:37:12: empty-lines: extra empty line at the end of a block (revive)
daemon/logger/adapter_test.go:41:44: empty-lines: extra empty line at the end of a block (revive)
daemon/logger/adapter_test.go:170:9: empty-lines: extra empty line at the end of a block (revive)
daemon/logger/loggerutils/sharedtemp_test.go:152:43: empty-lines: extra empty line at the end of a block (revive)
daemon/logger/loggerutils/sharedtemp.go:124:117: empty-lines: extra empty line at the end of a block (revive)
daemon/logger/syslog/syslog.go:249:87: empty-lines: extra empty line at the end of a block (revive)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
daemon/graphdriver/aufs/aufs.go:239:80: empty-lines: extra empty line at the start of a block (revive)
daemon/graphdriver/graphtest/graphbench_unix.go:249:27: empty-lines: extra empty line at the start of a block (revive)
daemon/graphdriver/graphtest/testutil.go:271:30: empty-lines: extra empty line at the end of a block (revive)
daemon/graphdriver/graphtest/graphbench_unix.go:179:32: empty-block: this block is empty, you can remove it (revive)
daemon/graphdriver/zfs/zfs.go:375:48: empty-lines: extra empty line at the end of a block (revive)
daemon/graphdriver/overlay/overlay.go:248:89: empty-lines: extra empty line at the start of a block (revive)
daemon/graphdriver/devmapper/deviceset.go:636:21: empty-lines: extra empty line at the end of a block (revive)
daemon/graphdriver/devmapper/deviceset.go:1150:70: empty-lines: extra empty line at the start of a block (revive)
daemon/graphdriver/devmapper/deviceset.go:1613:30: empty-lines: extra empty line at the end of a block (revive)
daemon/graphdriver/devmapper/deviceset.go:1645:65: empty-lines: extra empty line at the start of a block (revive)
daemon/graphdriver/btrfs/btrfs.go:53:101: empty-lines: extra empty line at the start of a block (revive)
daemon/graphdriver/devmapper/deviceset.go:1944:89: empty-lines: extra empty line at the start of a block (revive)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
daemon/cluster/convert/service.go:96:34: empty-lines: extra empty line at the end of a block (revive)
daemon/cluster/convert/service.go:169:44: empty-lines: extra empty line at the end of a block (revive)
daemon/cluster/convert/service.go:470:30: empty-lines: extra empty line at the end of a block (revive)
daemon/cluster/convert/container.go:224:23: empty-lines: extra empty line at the start of a block (revive)
daemon/cluster/convert/network.go:109:14: empty-lines: extra empty line at the end of a block (revive)
daemon/cluster/convert/service.go:537:27: empty-lines: extra empty line at the end of a block (revive)
daemon/cluster/services.go:247:19: empty-lines: extra empty line at the end of a block (revive)
daemon/cluster/services.go:252:41: empty-lines: extra empty line at the end of a block (revive)
daemon/cluster/services.go:256:12: empty-lines: extra empty line at the end of a block (revive)
daemon/cluster/services.go:289:80: empty-lines: extra empty line at the start of a block (revive)
daemon/cluster/executor/container/health_test.go:18:37: empty-lines: extra empty line at the start of a block (revive)
daemon/cluster/executor/container/adapter.go:437:68: empty-lines: extra empty line at the end of a block (revive)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
It was only used in a single location, and the ErrExtractPointNotDirectory was
not checked for, or used as a sentinel error.
This error was introduced in c32dde5baa. It was
never used as a sentinel error, but from that commit, it looks like it was added
as a package variable to mirror already existing errors defined at the package
level.
This patch removes the exported variable, and replaces the error with an
errdefs.InvalidParameter(), so that the API also returns the correct (400)
status code.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
It was only used in a single location, and the ErrVolumeReadonly was not checked
for, or used as a sentinel error.
This error was introduced in c32dde5baa. It was
never used as a sentinel error, but from that commit, it looks like it was added
as a package variable to mirror already existing errors defined at the package
level.
This patch removes the exported variable, and replaces the error with an
errdefs.InvalidParameter(), so that the API also returns the correct (400)
status code.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
It was only used in a single location, and the ErrRootFSReadOnly was not checked
for, or used as a sentinel error.
This error was introduced in c32dde5baa, originally
named `ErrContainerRootfsReadonly`. It was never used as a sentinel error, but
from that commit, it looks like it was added as a package variable to mirror
the coding style of already existing errors defined at the package level.
This patch removes the exported variable, and replaces the error with an
errdefs.InvalidParameter(), so that the API also returns the correct (400)
status code.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The getPortMapInfo var was introduced in f198dfd856,
and (from looking at that patch) looks to have been as a quick and dirty workaround
for the `container` argument colliding with the `container` import.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
It was unclear what the distinction was between these configuration
structs, so merging them to simplify.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Remove the "deadcode", "structcheck", and "varcheck" linters, as they are
deprecated:
WARN [runner] The linter 'deadcode' is deprecated (since v1.49.0) due to: The owner seems to have abandoned the linter. Replaced by unused.
WARN [runner] The linter 'structcheck' is deprecated (since v1.49.0) due to: The owner seems to have abandoned the linter. Replaced by unused.
WARN [runner] The linter 'varcheck' is deprecated (since v1.49.0) due to: The owner seems to have abandoned the linter. Replaced by unused.
WARN [linters context] structcheck is disabled because of generics. You can track the evolution of the generics support by following the https://github.com/golangci/golangci-lint/issues/2649.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Now that the type of Container.BaseFS has been reverted to a string,
values can never implement the extractor or archiver interfaces. Rip out
the dead code to support archiving and unarchiving through those
interfcaes.
Signed-off-by: Cory Snider <csnider@mirantis.com>
The Driver abstraction was needed for Linux Containers on Windows,
support for which has since been removed.
There is no direct equivalent to Lchmod() in the standard library so
continue to use the containerd/continuity version.
Signed-off-by: Cory Snider <csnider@mirantis.com>