The new daemon.containerFSView type covers all the use-cases on Linux
with a much more intuitive API, but is not portable to Windows.
Discourage people from using the old and busted functions in new Linux
code by excluding them entirely from Linux builds.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Mounting a container's volumes under its rootfs directory inside the
host mount namespace causes problems with cross-namespace mount
propagation when /var/lib/docker is bind-mounted into the container as a
volume. The mount event propagates into the container's mount namespace,
overmounting the volume, but the propagated unmount events do not fully
reverse the effect. Each archive operation causes the mount table in the
container's mount namespace to grow larger and larger, until the kernel
limiton the number of mounts in a namespace is hit. The only solution to
this issue which is not subject to race conditions or other blocker
caveats is to avoid mounting volumes into the container's rootfs
directory in the host mount namespace in the first place.
Mount the container volumes inside an unshared mount namespace to
prevent any mount events from propagating into any other mount
namespace. Greatly simplify the archiving implementations by also
chrooting into the container rootfs to sidestep the need to resolve
paths in the host.
Signed-off-by: Cory Snider <csnider@mirantis.com>
The existing archive implementation is not easy to reason about by
reading the source. Prepare to rewrite it by covering more edge cases in
tests. The new test cases were determined by black-box characterizing
the existing behaviour.
Signed-off-by: Cory Snider <csnider@mirantis.com>
It is only applicable to Windows so it does not need to be called from
platform-generic code. Fix locking in the Windows implementation.
Signed-off-by: Cory Snider <csnider@mirantis.com>
The Linux implementation needs to diverge significantly from the Windows
one in order to fix platform-specific bugs. Cut the generic
implementation out of daemon/archive.go and paste identical, verbatim
copies of that implementation into daemon/archive_{windows,linux}.go to
make it easier to compare the progression of changes to the respective
implementations through Git history.
Signed-off-by: Cory Snider <csnider@mirantis.com>
This change was introduced early in the development of rootless support,
before all the kinks were worked out and rootlesskit was built. The
author was testing the daemon by inside a user namespace set up by runc,
observed that the unshare(2) syscall was returning EPERM, and assumed
that it was a fundamental limitation of user namespaces. Seeing as the
kernel documentation (of today) disagrees with that assessment and that
unshare demonstrably works inside user namespaces, I can only assume
that the EPERM was due to a quirk of their test environment, such as a
seccomp filter set up by runc blocking the unshare syscall.
https://github.com/moby/moby/pull/20902#issuecomment-236409406
Mount namespaces are necessary to address #38995 and #43390. Revert the
special-casing so those issues can also be fixed for rootless daemons.
This reverts commit dc950567c1.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Unshare the thread's file system attributes and, if applicable, mount
namespace so that the chroot operation does not affect the rest of the
process.
Signed-off-by: Cory Snider <csnider@mirantis.com>
The applyLayer implementation in pkg/chrootarchive has to set the TMPDIR
environment variable so that archive.UnpackLayer() can successfully
create the whiteout-file temp directory. Change UnpackLayer to create
the temporary directory under the destination path so that environment
variables do not need to be touched.
Signed-off-by: Cory Snider <csnider@mirantis.com>
This fix tries to address issues raised in #44346.
The max-concurrent-downloads and max-concurrent-uploads limits are applied for the whole engine and not for each pull/push command.
Signed-off-by: Luis Henrique Mulinari <luis.mulinari@gmail.com>
Aside from unconditionally unlocking the OS thread even if restoring the
thread's network namespace fails, func (*networkNamespace).InvokeFunc()
correctly implements invoking a function inside a network namespace.
This is far from obvious, however. func InitOSContext() does much of the
heavy lifting but in a bizarre fashion: it restores the initial network
namespace before it is changed in the first place, and the cleanup
function it returns does not restore the network namespace at all! The
InvokeFunc() implementation has to restore the network namespace
explicitly by deferring a call to ns.SetNamespace().
func InitOSContext() is a leaky abstraction taped to a footgun. On the
one hand, it defensively resets the current thread's network namespace,
which has the potential to fix up the thread state if other buggy code
had failed to maintain the invariant that an OS thread must be locked to
a goroutine unless it is interchangeable with a "clean" thread as
spawned by the Go runtime. On the other hand, it _facilitates_ writing
buggy code which fails to maintain the aforementioned invariant because
the cleanup function it returns unlocks the thread from the goroutine
unconditionally while neglecting to restore the thread's network
namespace! It is quite scary to need a function which fixes up threads'
network namespaces after the fact as an arbitrary number of goroutines
could have been scheduled onto a "dirty" thread and run non-libnetwork
code before the thread's namespace is fixed up. Any number of
(not-so-)subtle misbehaviours could result if an unfortunate goroutine
is scheduled onto a "dirty" thread. The whole repository has been
audited to ensure that the aforementioned invariant is never violated,
making after-the-fact fixing up of thread network namespaces redundant.
Make InitOSContext() a no-op on Linux and inline the thread-locking into
the function (singular) which previously relied on it to do so.
func ns.SetNamespace() is of similarly dubious utility. It intermixes
capturing the initial network namespace and restoring the thread's
network namespace, which could result in threads getting put into the
wrong network namespace if the wrong thread is the first to call it.
Delete it entirely; functions which need to manipulate a thread's
network namespace are better served by being explicit about capturing
and restoring the thread's namespace.
Rewrite InvokeFunc() to invoke the closure inside a goroutine to enable
a graceful and safe recovery if the thread's network namespace could not
be restored. Avoid any potential race conditions due to changing the
main thread's network namespace by preventing the aforementioned
goroutines from being eligible to be scheduled onto the main thread.
Signed-off-by: Cory Snider <csnider@mirantis.com>
func (*network) watchMiss() correctly locks its goroutine to an OS
thread before changing the thread's network namespace, but neglects to
restore the thread's network namespace before unlocking. Fix this
oversight by unlocking iff the thread's network namespace is
successfully restored.
Prevent the watchMiss goroutine from being locked to the main thread to
avoid the issues which would arise if such a situation was to occur.
Signed-off-by: Cory Snider <csnider@mirantis.com>
The parallel tests were unconditionally unlocking the test case
goroutine from the OS thread, irrespective of whether the thread's
network namespace was successfully restored. This was not a problem in
practice as the unpaired calls to runtime.LockOSThread() peppered
through the test case would have prevented the goroutine from being
unlocked. Unlock the goroutine from the thread iff the thread's network
namespace is successfully restored.
Signed-off-by: Cory Snider <csnider@mirantis.com>
testutils.SetupTestOSContext() sets the calling thread's network
namespace but neglected to restore it on teardown. This was not a
problem in practice as it called runtime.LockOSThread() twice but
runtime.UnlockOSThread() only once, so the tampered threads would be
terminated by the runtime when the test case returned and replaced with
a clean thread. Correct the utility so it restores the thread's network
namespace during teardown and unlocks the goroutine from the thread on
success.
Remove unnecessary runtime.LockOSThread() calls peppering test cases
which leverage testutils.SetupTestOSContext().
Signed-off-by: Cory Snider <csnider@mirantis.com>
cmd.Environ() is new in go1.19, and not needed for this specific case.
Without this, trying to use this package in code that uses go1.18 will fail;
builder/remotecontext/git/gitutils.go:216:23: cmd.Environ undefined (type *exec.Cmd has no field or method Environ)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- On Windows, we don't build and run a local test registry (we're not running
docker-in-docker), so we need to skip this test.
- On rootless, networking doesn't support this (currently)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This is accomplished by storing the distribution source in the content
labels. If the distribution source is not found then we check to the
registry to see if the digest exists in the repo, if it does exist then
the puller will use it.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
GitHub uses these parameters to construct a name; removing the ./ prefix
to make them more readable (and add them back where it's used)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Setting cmd.Env overrides the default of passing through the parent
process' environment, which works out fine most of the time, except when
it doesn't. For whatever reason, leaving out all the environment causes
git-for-windows sh.exe subprocesses to enter an infinite loop of
access violations during Cygwin initialization in certain environments
(specifically, our very own dev container image).
Signed-off-by: Cory Snider <csnider@mirantis.com>
While it is undesirable for the system or user git config to be used
when the daemon clones a Git repo, it could break workflows if it was
unconditionally applied to docker/cli as well.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Prevent git commands we run from reading the user or system
configuration, or cloning submodules from the local filesystem.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Keep It Simple! Set the working directory for git commands by...setting
the git process's working directory. Git commands can be run in the
parent process's working directory by passing the empty string.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Make the test more debuggable by logging all git command output and
running each table-driven test case as a subtest.
Signed-off-by: Cory Snider <csnider@mirantis.com>