The github.com/containerd/containerd/log package was moved to a separate
module, which will also be used by upcoming (patch) releases of containerd.
This patch moves our own uses of the package to use the new module.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
By relying on the kernel UAPI (userspace API), we can drop a dependency
and simplify building Moby, while also ensuring that we are using a
stable/supported source of the C types and defines we need.
btrfs-progs mirrors the kernel headers, but the headers it ships with
are not the canonical source and as [we have seen before][44698], could
be subject to changes.
Depending on the canonical headers from the kernel both is more
idiomatic, and ensures we are protected by the kernel's promise to not
break userspace.
[44698]: https://github.com/moby/moby/issues/44698
Signed-off-by: Bjorn Neergaard <bneergaard@mirantis.com>
This is actually quite meaningless as we are reporting the libbtrfs
version, but we do not use libbtrfs. We only use the kernel interface to
btrfs instead.
While we could report the version of the kernel headers in play, they're
rather all-or-nothing: they provide the structures and defines we need,
or they don't. As such, drop all version information as the host kernel
version is the only thing that matters.
Signed-off-by: Bjorn Neergaard <bneergaard@mirantis.com>
daemon/graphdriver/aufs/aufs.go:239:80: empty-lines: extra empty line at the start of a block (revive)
daemon/graphdriver/graphtest/graphbench_unix.go:249:27: empty-lines: extra empty line at the start of a block (revive)
daemon/graphdriver/graphtest/testutil.go:271:30: empty-lines: extra empty line at the end of a block (revive)
daemon/graphdriver/graphtest/graphbench_unix.go:179:32: empty-block: this block is empty, you can remove it (revive)
daemon/graphdriver/zfs/zfs.go:375:48: empty-lines: extra empty line at the end of a block (revive)
daemon/graphdriver/overlay/overlay.go:248:89: empty-lines: extra empty line at the start of a block (revive)
daemon/graphdriver/devmapper/deviceset.go:636:21: empty-lines: extra empty line at the end of a block (revive)
daemon/graphdriver/devmapper/deviceset.go:1150:70: empty-lines: extra empty line at the start of a block (revive)
daemon/graphdriver/devmapper/deviceset.go:1613:30: empty-lines: extra empty line at the end of a block (revive)
daemon/graphdriver/devmapper/deviceset.go:1645:65: empty-lines: extra empty line at the start of a block (revive)
daemon/graphdriver/btrfs/btrfs.go:53:101: empty-lines: extra empty line at the start of a block (revive)
daemon/graphdriver/devmapper/deviceset.go:1944:89: empty-lines: extra empty line at the start of a block (revive)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Finish the refactor which was partially completed with commit
34536c498d, passing around IdentityMapping structs instead of pairs of
[]IDMap slices.
Existing code which uses []IDMap relies on zero-valued fields to be
valid, empty mappings. So in order to successfully finish the
refactoring without introducing bugs, their replacement therefore also
needs to have a useful zero value which represents an empty mapping.
Change IdentityMapping to be a pass-by-value type so that there are no
nil pointers to worry about.
The functionality provided by the deprecated NewIDMappingsFromMaps
function is required by unit tests to to construct arbitrary
IdentityMapping values. And the daemon will always need to access the
mappings to pass them to the Linux kernel. Accommodate these use cases
by exporting the struct fields instead. BuildKit currently depends on
the UIDs and GIDs methods so we cannot get rid of them yet.
Signed-off-by: Cory Snider <csnider@mirantis.com>
pkg/system historically has been a bit of a kitchen-sink of things that were
somewhat "system" related, but didn't have a good place for. EnsureRemoveAll()
is one of those utilities. EnsureRemoveAll() is used to both unmount and remove
a path, for which it depends on both github.com/moby/sys/mount, which in turn
depends on github.com/moby/sys/mountinfo.
pkg/system is imported in the CLI, but neither EnsureRemoveAll(), nor any of its
moby/sys dependencies are used on the client side, so let's move this function
somewhere else, to remove those dependencies from the CLI.
I looked for plausible locations that were related; it's used in:
- daemon
- daemon/graphdriver/XXX/
- plugin
I considered moving it into a (e.g.) "utils" package within graphdriver (but not
a huge fan of "utils" packages), and given that it felt (mostly) related to
cleaning up container filesystems, I decided to move it there.
Some things to follow-up on after this:
- Verify if this function is still needed (it feels a bit like a big hammer in
a "YOLO, let's try some things just in case it fails")
- Perhaps it should be integrated in `containerfs.Remove()` (so that it's used
automatically)
- Look if there's other implementations (and if they should be consolidated),
although (e.g.) the one in containerd is a copy of ours:
https://github.com/containerd/containerd/blob/v1.5.9/pkg/cri/server/helpers_linux.go#L200
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Do not use 0701 perms.
0701 dir perms allows anyone to traverse the docker dir.
It happens to allow any user to execute, as an example, suid binaries
from image rootfs dirs because it allows traversal AND critically
container users need to be able to do execute things.
0701 on lower directories also happens to allow any user to modify
things in, for instance, the overlay upper dir which neccessarily
has 0755 permissions.
This changes to use 0710 which allows users in the group to traverse.
In userns mode the UID owner is (real) root and the GID is the remapped
root's GID.
This prevents anyone but the remapped root to traverse our directories
(which is required for userns with runc).
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit ef7237442147441a7cadcda0600be1186d81ac73)
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit 93ac040bf0)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The io/ioutil package has been deprecated in Go 1.16. This commit
replaces the existing io/ioutil functions with their new definitions in
io and os packages.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
Add hints for "Failed to destroy btrfs snapshot <DIR> for <ID>: operation not permitted" on rootless
Related to issue 41762
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Before this change, cleanup of the btrfs driver (occuring on each daemon
shutdown) resulted in disabling quotas. It was done with an assumption
that quotas can be enabled or disabled on a subvolume level, which is
not true - enabling or disabling quota is always done on a filesystem
level.
That was leading to disabling quota on btrfs filesystems on each daemon
shutdown.
This change fixes that behavior and removes misleading `subvol` prefix
from functions and methods which set up quota (on a filesystem level).
Fixes: #34593
Fixes: 401c8d1767 ("Add disk quota support for btrfs")
Signed-off-by: Michal Rostecki <mrostecki@opensuse.org>
Fix issue 41762
Cherry-pick "drivers: btrfs: Allow unprivileged user to delete subvolumes" from containers/storage
831e32b6bd
> In btrfs, subvolume can be deleted by IOC_SNAP_DESTROY ioctl but there
> is one catch: unprivileged IOC_SNAP_DESTROY call is restricted by default.
>
> This is because IOC_SNAP_DESTROY only performs permission checks on
> the top directory(subvolume) and unprivileged user might delete dirs/files
> which cannot be deleted otherwise. This restriction can be relaxed if
> user_subvol_rm_allowed mount option is used.
>
> Although the above ioctl had been the only way to delete a subvolume,
> btrfs now allows deletion of subvolume just like regular directory
> (i.e. rmdir sycall) since kernel 4.18.
>
> So if we fail to cleanup subvolume in subvolDelete(), just fallback to
> system.EnsureRmoveall() to try to cleanup subvolumes again.
> (Note: quota needs privilege, so if quota is enabled we do not fallback)
>
> This fix will allow non-privileged container works with btrfs backend.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Various dirs in /var/lib/docker contain data that needs to be mounted
into a container. For this reason, these dirs are set to be owned by the
remapped root user, otherwise there can be permissions issues.
However, this uneccessarily exposes these dirs to an unprivileged user
on the host.
Instead, set the ownership of these dirs to the real root (or rather the
UID/GID of dockerd) with 0701 permissions, which allows the remapped
root to enter the directories but not read/write to them.
The remapped root needs to enter these dirs so the container's rootfs
can be configured... e.g. to mount /etc/resolve.conf.
This prevents an unprivileged user from having read/write access to
these dirs on the host.
The flip side of this is now any user can enter these directories.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit e908cc3901)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Switch to moby/sys/mount and mountinfo. Keep the pkg/mount for potential
outside users.
This commit was generated by the following bash script:
```
set -e -u -o pipefail
for file in $(git grep -l 'docker/docker/pkg/mount"' | grep -v ^pkg/mount); do
sed -i -e 's#/docker/docker/pkg/mount"#/moby/sys/mount"#' \
-e 's#mount\.\(GetMounts\|Mounted\|Info\|[A-Za-z]*Filter\)#mountinfo.\1#g' \
$file
goimports -w $file
done
```
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
```
daemon/graphdriver/btrfs/btrfs.go:609:5: SA4003: no value of type uint64 is less than 0 (staticcheck)
if driver.options.size <= 0 {
^
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Format the source according to latest goimports.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
btrfs_noversion was added in d7c37b5a28
for distributions that did not have the `btrfs/version.h` header file.
Seeing how all of the distributions we currently support do have the
`btrfs/version.h` file we should probably just remove this build flag
altogether.
Signed-off-by: Eli Uriegas <eli.uriegas@docker.com>
The errors returned from Mount and Unmount functions are raw
syscall.Errno errors (like EPERM or EINVAL), which provides
no context about what has happened and why.
Similar to os.PathError type, introduce mount.Error type
with some context. The error messages will now look like this:
> mount /tmp/mount-tests/source:/tmp/mount-tests/target, flags: 0x1001: operation not permitted
or
> mount tmpfs:/tmp/mount-test-source-516297835: operation not permitted
Before this patch, it was just
> operation not permitted
[v2: add Cause()]
[v3: rename MountError to Error, document Cause()]
[v4: fixes; audited all users]
[v5: make Error type private; changes after @cpuguy83 reviews]
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
For some reason, shared mount propagation between the host
and a container does not work for btrfs, unless container
root directory (i.e. graphdriver home) is a bind mount.
The above issue was reproduced on SLES 12sp3 + btrfs using
the following script:
#!/bin/bash
set -eux -o pipefail
# DIR should not be under a subvolume
DIR=${DIR:-/lib}
MNT=$DIR/my-mnt
FILE=$MNT/file
ID=$(docker run -d --privileged -v $DIR:$DIR:rshared ubuntu sleep 24h)
docker exec $ID mkdir -p $MNT
docker exec $ID mount -t tmpfs tmpfs $MNT
docker exec $ID touch $FILE
ls -l $FILE
umount $MNT
docker rm -f $ID
which fails this way:
+ ls -l /lib/my-mnt/file
ls: cannot access '/lib/my-mnt/file': No such file or directory
meaning the mount performed inside a priviledged container is not
propagated back to the host (even if all the mounts have "shared"
propagation mode).
The remedy to the above is to make graphdriver home a bind mount.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This implements chown support on Windows. Built-in accounts as well
as accounts included in the SAM database of the container are supported.
NOTE: IDPair is now named Identity and IDMappings is now named
IdentityMapping.
The following are valid examples:
ADD --chown=Guest . <some directory>
COPY --chown=Administrator . <some directory>
COPY --chown=Guests . <some directory>
COPY --chown=ContainerUser . <some directory>
On Windows an owner is only granted the permission to read the security
descriptor and read/write the discretionary access control list. This
fix also grants read/write and execute permissions to the owner.
Signed-off-by: Salahuddin Khan <salah@docker.com>
Now all of the storage drivers use the field "storage-driver" in their log
messages, which is set to name of the respective driver.
Storage drivers changed:
- Aufs
- Btrfs
- Devicemapper
- Overlay
- Overlay 2
- Zfs
Signed-off-by: Alejandro GonzÃlez Hevia <alejandrgh11@gmail.com>
This was added in #36047 just as a way to make sure the tree is fully
unmounted on shutdown.
For ZFS this could be a breaking change since there was no unmount before.
Someone could have setup the zfs tree themselves. It would be better, if
we really do want the cleanup to actually the unpacked layers checking
for mounts rather than a blind recursive unmount of the root.
BTRFS does not use mounts and does not need to unmount anyway.
These was only an unmount to begin with because for some reason the
btrfs tree was being moutned with `private` propagation.
For the other graphdrivers that still have a recursive unmount here...
these were already being unmounted and performing the recursive unmount
shouldn't break anything. If anyone had anything mounted at the
graphdriver location it would have been unmounted on shutdown anyway.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
The idea behind making the graphdrivers private is to prevent leaking
mounts into other namespaces.
Unfortunately this is not really what happens.
There is one case where this does work, and that is when the namespace
was created before the daemon's namespace.
However with systemd each system servie winds up with it's own mount
namespace. This causes a race betwen daemon startup and other system
services as to if the mount is actually private.
This also means there is a negative impact when other system services
are started while the daemon is running.
Basically there are too many things that the daemon does not have
control over (nor should it) to be able to protect against these kinds
of leakages. One thing is certain, setting the graphdriver roots to
private disconnects the mount ns heirarchy preventing propagation of
unmounts... new mounts are of course not propagated either, but the
behavior is racey (or just bad in the case of restarting services)... so
it's better to just be able to keep mount propagation in tact.
It also does not protect situations like `-v
/var/lib/docker:/var/lib/docker` where all mounts are recursively bound
into the container anyway.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
The fsmagic check was always performed on "data-root" (`/var/lib/docker`),
not on the storage-driver's home directory (e.g. `/var/lib/docker/<somedriver>`).
This caused detection to be done on the wrong filesystem in situations
where `/var/lib/docker/<somedriver>` was a mount, and a different
filesystem than `/var/lib/docker` itself.
This patch checks if the storage-driver's home directory exists, and only
falls back to `/var/lib/docker` if it doesn't exist.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This enables docker cp and ADD/COPY docker build support for LCOW.
Originally, the graphdriver.Get() interface returned a local path
to the container root filesystem. This does not work for LCOW, so
the Get() method now returns an interface that LCOW implements to
support copying to and from the container.
Signed-off-by: Akash Gupta <akagup@microsoft.com>
Changes most references of syscall to golang.org/x/sys/
Ones aren't changes include, Errno, Signal and SysProcAttr
as they haven't been implemented in /x/sys/.
Signed-off-by: Christopher Jones <tophj@linux.vnet.ibm.com>
[s390x] switch utsname from unsigned to signed
per 33267e036f
char in s390x in the /x/sys/unix package is now signed, so
change the buildtags
Signed-off-by: Christopher Jones <tophj@linux.vnet.ibm.com>
This commit is an extension of fix for 29325 based on the review comment.
In this commit, the quota size for btrfs is kept in `/var/lib/docker/btrfs/quotas`
so that a daemon restart keeps quota.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
This fix tries to address the issue raised in 29325 where
btrfs quota groups are not clean up even after containers
have been destroyed.
The reason for the issue is that btrfs quota groups have
to be explicitly destroyed. This fix fixes this issue.
This fix is tested manually in Ubuntu 16.04,
with steps specified in 29325.
This fix fixes 29325.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
Before this, if `forceRemove` is set the container data will be removed
no matter what, including if there are issues with removing container
on-disk state (rw layer, container root).
In practice this causes a lot of issues with leaked data sitting on
disk that users are not able to clean up themselves.
This is particularly a problem while the `EBUSY` errors on remove are so
prevalent. So for now let's not keep this behavior.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This fix tries to address the issue raised in 29810
where btrfs subvolume removal failed when docker
is in an unprivileged lxc container. The failure
was caused by `Failed to rescan btrfs quota` with
`operation not permitted`.
However, if disk quota is not enabled, there is no
need to run a btrfs rescan at the first place.
This fix checks for `quotaEnabled` and only run btrfs
rescan if `quotaEnabled` is true.
This fix fixes 29810.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>