Mounting a container's volumes under its rootfs directory inside the
host mount namespace causes problems with cross-namespace mount
propagation when /var/lib/docker is bind-mounted into the container as a
volume. The mount event propagates into the container's mount namespace,
overmounting the volume, but the propagated unmount events do not fully
reverse the effect. Each archive operation causes the mount table in the
container's mount namespace to grow larger and larger, until the kernel
limiton the number of mounts in a namespace is hit. The only solution to
this issue which is not subject to race conditions or other blocker
caveats is to avoid mounting volumes into the container's rootfs
directory in the host mount namespace in the first place.
Mount the container volumes inside an unshared mount namespace to
prevent any mount events from propagating into any other mount
namespace. Greatly simplify the archiving implementations by also
chrooting into the container rootfs to sidestep the need to resolve
paths in the host.
Signed-off-by: Cory Snider <csnider@mirantis.com>
The existing archive implementation is not easy to reason about by
reading the source. Prepare to rewrite it by covering more edge cases in
tests. The new test cases were determined by black-box characterizing
the existing behaviour.
Signed-off-by: Cory Snider <csnider@mirantis.com>
It is only applicable to Windows so it does not need to be called from
platform-generic code. Fix locking in the Windows implementation.
Signed-off-by: Cory Snider <csnider@mirantis.com>
The Linux implementation needs to diverge significantly from the Windows
one in order to fix platform-specific bugs. Cut the generic
implementation out of daemon/archive.go and paste identical, verbatim
copies of that implementation into daemon/archive_{windows,linux}.go to
make it easier to compare the progression of changes to the respective
implementations through Git history.
Signed-off-by: Cory Snider <csnider@mirantis.com>
This change was introduced early in the development of rootless support,
before all the kinks were worked out and rootlesskit was built. The
author was testing the daemon by inside a user namespace set up by runc,
observed that the unshare(2) syscall was returning EPERM, and assumed
that it was a fundamental limitation of user namespaces. Seeing as the
kernel documentation (of today) disagrees with that assessment and that
unshare demonstrably works inside user namespaces, I can only assume
that the EPERM was due to a quirk of their test environment, such as a
seccomp filter set up by runc blocking the unshare syscall.
https://github.com/moby/moby/pull/20902#issuecomment-236409406
Mount namespaces are necessary to address #38995 and #43390. Revert the
special-casing so those issues can also be fixed for rootless daemons.
This reverts commit dc950567c1.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Unshare the thread's file system attributes and, if applicable, mount
namespace so that the chroot operation does not affect the rest of the
process.
Signed-off-by: Cory Snider <csnider@mirantis.com>
The applyLayer implementation in pkg/chrootarchive has to set the TMPDIR
environment variable so that archive.UnpackLayer() can successfully
create the whiteout-file temp directory. Change UnpackLayer to create
the temporary directory under the destination path so that environment
variables do not need to be touched.
Signed-off-by: Cory Snider <csnider@mirantis.com>
cmd.Environ() is new in go1.19, and not needed for this specific case.
Without this, trying to use this package in code that uses go1.18 will fail;
builder/remotecontext/git/gitutils.go:216:23: cmd.Environ undefined (type *exec.Cmd has no field or method Environ)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- On Windows, we don't build and run a local test registry (we're not running
docker-in-docker), so we need to skip this test.
- On rootless, networking doesn't support this (currently)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This is accomplished by storing the distribution source in the content
labels. If the distribution source is not found then we check to the
registry to see if the digest exists in the repo, if it does exist then
the puller will use it.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
GitHub uses these parameters to construct a name; removing the ./ prefix
to make them more readable (and add them back where it's used)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Setting cmd.Env overrides the default of passing through the parent
process' environment, which works out fine most of the time, except when
it doesn't. For whatever reason, leaving out all the environment causes
git-for-windows sh.exe subprocesses to enter an infinite loop of
access violations during Cygwin initialization in certain environments
(specifically, our very own dev container image).
Signed-off-by: Cory Snider <csnider@mirantis.com>
While it is undesirable for the system or user git config to be used
when the daemon clones a Git repo, it could break workflows if it was
unconditionally applied to docker/cli as well.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Prevent git commands we run from reading the user or system
configuration, or cloning submodules from the local filesystem.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Keep It Simple! Set the working directory for git commands by...setting
the git process's working directory. Git commands can be run in the
parent process's working directory by passing the empty string.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Make the test more debuggable by logging all git command output and
running each table-driven test case as a subtest.
Signed-off-by: Cory Snider <csnider@mirantis.com>
This makes it more transparent that it's unused for Linux,
and we don't pass "root", which has no relation with the
path on Linux.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This allows us to run CI with the containerd snapshotter enabled, without
patching the daemon.json, or changing how tests set up daemon flags.
A warning log is added during startup, to inform if this variable is set,
as it should only be used for our integration tests.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>