This was added in #36047 just as a way to make sure the tree is fully
unmounted on shutdown.
For ZFS this could be a breaking change since there was no unmount before.
Someone could have setup the zfs tree themselves. It would be better, if
we really do want the cleanup to actually the unpacked layers checking
for mounts rather than a blind recursive unmount of the root.
BTRFS does not use mounts and does not need to unmount anyway.
These was only an unmount to begin with because for some reason the
btrfs tree was being moutned with `private` propagation.
For the other graphdrivers that still have a recursive unmount here...
these were already being unmounted and performing the recursive unmount
shouldn't break anything. If anyone had anything mounted at the
graphdriver location it would have been unmounted on shutdown anyway.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
The idea behind making the graphdrivers private is to prevent leaking
mounts into other namespaces.
Unfortunately this is not really what happens.
There is one case where this does work, and that is when the namespace
was created before the daemon's namespace.
However with systemd each system servie winds up with it's own mount
namespace. This causes a race betwen daemon startup and other system
services as to if the mount is actually private.
This also means there is a negative impact when other system services
are started while the daemon is running.
Basically there are too many things that the daemon does not have
control over (nor should it) to be able to protect against these kinds
of leakages. One thing is certain, setting the graphdriver roots to
private disconnects the mount ns heirarchy preventing propagation of
unmounts... new mounts are of course not propagated either, but the
behavior is racey (or just bad in the case of restarting services)... so
it's better to just be able to keep mount propagation in tact.
It also does not protect situations like `-v
/var/lib/docker:/var/lib/docker` where all mounts are recursively bound
into the container anyway.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
The fsmagic check was always performed on "data-root" (`/var/lib/docker`),
not on the storage-driver's home directory (e.g. `/var/lib/docker/<somedriver>`).
This caused detection to be done on the wrong filesystem in situations
where `/var/lib/docker/<somedriver>` was a mount, and a different
filesystem than `/var/lib/docker` itself.
This patch checks if the storage-driver's home directory exists, and only
falls back to `/var/lib/docker` if it doesn't exist.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This enables docker cp and ADD/COPY docker build support for LCOW.
Originally, the graphdriver.Get() interface returned a local path
to the container root filesystem. This does not work for LCOW, so
the Get() method now returns an interface that LCOW implements to
support copying to and from the container.
Signed-off-by: Akash Gupta <akagup@microsoft.com>
Changes most references of syscall to golang.org/x/sys/
Ones aren't changes include, Errno, Signal and SysProcAttr
as they haven't been implemented in /x/sys/.
Signed-off-by: Christopher Jones <tophj@linux.vnet.ibm.com>
[s390x] switch utsname from unsigned to signed
per 33267e036f
char in s390x in the /x/sys/unix package is now signed, so
change the buildtags
Signed-off-by: Christopher Jones <tophj@linux.vnet.ibm.com>
This commit is an extension of fix for 29325 based on the review comment.
In this commit, the quota size for btrfs is kept in `/var/lib/docker/btrfs/quotas`
so that a daemon restart keeps quota.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
This fix tries to address the issue raised in 29325 where
btrfs quota groups are not clean up even after containers
have been destroyed.
The reason for the issue is that btrfs quota groups have
to be explicitly destroyed. This fix fixes this issue.
This fix is tested manually in Ubuntu 16.04,
with steps specified in 29325.
This fix fixes 29325.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
Before this, if `forceRemove` is set the container data will be removed
no matter what, including if there are issues with removing container
on-disk state (rw layer, container root).
In practice this causes a lot of issues with leaked data sitting on
disk that users are not able to clean up themselves.
This is particularly a problem while the `EBUSY` errors on remove are so
prevalent. So for now let's not keep this behavior.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This fix tries to address the issue raised in 29810
where btrfs subvolume removal failed when docker
is in an unprivileged lxc container. The failure
was caused by `Failed to rescan btrfs quota` with
`operation not permitted`.
However, if disk quota is not enabled, there is no
need to run a btrfs rescan at the first place.
This fix checks for `quotaEnabled` and only run btrfs
rescan if `quotaEnabled` is true.
This fix fixes 29810.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
This allows for easy extension of adding more parameters to existing
parameters list. Otherwise adding a single parameter changes code
at so many places.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Since the layer store was introduced, the level above the graphdriver
now differentiates between read/write and read-only layers. This
distinction is useful for graphdrivers that need to take special steps
when creating a layer based on whether it is read-only or not.
Adding this parameter allows the graphdrivers to differentiate, which
in the case of the Windows graphdriver, removes our dependence on parsing
the id of the parent for "-init" in order to infer this information.
This will also set the stage for unblocking some of the layer store
unit tests in the next preview build of Windows.
Signed-off-by: Stefan J. Wernli <swernli@microsoft.com>
btrfs-progs-4.5 introduces device delete by devid
for this reason btrfs_ioctl_vol_args_v2's name was encapsulated
in a union
this patch is for setting btrfs_ioctl_vol_args_v2's name
using a C function in order to preserve compatibility
with all btrfs-progs versions
Signed-off-by: Julio Montes <imc.coder@gmail.com>
For btrfs driver, in d.Create(), Get() of parentDir is called but not followed
by Put().
If we apply SElinux mount label, we need to mount btrfs subvolumes in d.Get(),
without a Put() would end up with a later Remove() failure on
"Device resourse is busy".
This calls the subvolume helper function directly in d.Create().
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Most storage drivers call graphdriver.GetFSMagic(home),
it is more clean to easy to maintain. So btrfs need to
adopt such change.
Signed-off-by: Kai Qiang Wu(Kennan) <wkqwu@cn.ibm.com>
Make sure btrfs mounted subvolumes are owned properly when a remapped
root exists (user namespaces are enabled, for example)
Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)
Really fixing 2 things:
1. Panic when any error is detected while walking the btrfs graph dir on
removal due to no error check.
2. Nested subvolumes weren't actually being removed due to passing in
the wrong path
On point 2, for a path detected as a nested subvolume, we were calling
`subvolDelete("/path/to/subvol", "subvol")`, where the last part of the
path was duplicated due to a logic error, and as such actually causing
point #1 since `subvolDelete` joins the two arguemtns, and
`/path/to/subvol/subvol` (the joined version) doesn't exist.
Also adds a test for nested subvol delete.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This change will allow us to run SELinux in a container with
BTRFS back end. We continue to work on fixing the kernel/BTRFS
but this change will allow SELinux Security separation on BTRFS.
It basically relabels the content on container creation.
Just relabling -init directory in BTRFS use case. Everything looks like it
works. I don't believe tar/achive stores the SELinux labels, so we are good
as far as docker commit.
Tested Speed on startup with BTRFS on top of loopback directory. BTRFS
not on loopback should get even better perfomance on startup time. The
more inodes inside of the container image will increase the relabel time.
This patch will give people who care more about security the option of
runnin BTRFS with SELinux. Those who don't want to take the slow down
can disable SELinux either in individual containers or for all containers
by continuing to disable SELinux in the daemon.
Without relabel:
> time docker run --security-opt label:disable fedora echo test
test
real 0m0.918s
user 0m0.009s
sys 0m0.026s
With Relabel
test
real 0m1.942s
user 0m0.007s
sys 0m0.030s
Signed-off-by: Dan Walsh <dwalsh@redhat.com>
Signed-off-by: Dan Walsh <dwalsh@redhat.com>
Adds support for the daemon to handle user namespace maps as a
per-daemon setting.
Support for handling uid/gid mapping is added to the builder,
archive/unarchive packages and functions, all graphdrivers (except
Windows), and the test suite is updated to handle user namespace daemon
rootgraph changes.
Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)
Export image/container metadata stored in graph driver. Right now 3 fields
DeviceId, DeviceSize and DeviceName are being exported from devicemapper.
Other graph drivers can export fields as they see fit.
This data can be used to mount the thin device outside of docker and tools
can look into image/container and do some kind of inspection.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
We removed it, because upstream removed it. But now it will be coming
back, so work with it either way.
Signed-off-by: Vincent Batts <vbatts@redhat.com>
They say we should only use the BTRFS_LIB_VERSION
They will no longer support this since it had to be managed manually
Docker-DCO-1.1-Signed-off-by: Dan Walsh <dwalsh@redhat.com> (github: rhatdan)
There are a couple of drivers that swallow errors that may occur in
their Put() implementation.
This changes the signature of (*Driver).Put for all the drivers implemented.
Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
Some graphdrivers are Differs and type assertions are made
in various places throughout the project. Differ offers some
convenience in generating/applying diffs of filesystem layers
but for most graphdrivers another code path is taken.
This patch brings all of the logic related to filesystem
diffs in one place, and simplifies the implementation of some
common types like Image, Daemon, and Container.
Signed-off-by: Josh Hawn <josh.hawn@docker.com>
If this is at the root directory for the daemon you could unmount
somones filesystem when you stop docker and this is actually only needed
for the palces that the graph drivers mount the container's root
filesystems.
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
There are two cases where we can't use a graphdriver:
1) the graphdriver itself isn't supported by the system
2) the graphdriver is supported by some configuration/prerequisites are
missing
This introduces a new error for the 2) case and uses it when trying to
run docker with btrfs backend on a non-btrfs filesystem.
Docker-DCO-1.1-Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org> (github: discordianfish)
If a graphdriver fails initialization due to ErrNotSupported we ignore
that and keep trying the next. But if some driver has a different
error (for instance if you specified an unknown option for it) we fail
the daemon startup, printing the error, rather than falling back to an
unexected driver (typically vfs) which may not match what you have run
earlier.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)