When the daemon process or the host running it is abruptly terminated,
the layer metadata file can become inconsistent on the file system.
Specifically, `link` and `lower` files may exist but be empty, leading
to overlay mounting errors during layer extraction, such as:
"failed to register layer: error creating overlay mount to <path>:
too many levels of symbolic links."
This commit introduces the use of `AtomicWriteFile` to ensure that the
layer metadata files contain correct data when they exist on the file system.
Signed-off-by: Mike <mike.sul@foundries.io>
(cherry picked from commit de2447c2ab)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Treat copying extended attributes from a source filesystem which does
not support extended attributes as a no-op, same as if the file did not
possess the extended attribute. Only fail copying extended attributes if
the source file has the attribute and the destination filesystem does
not support xattrs.
Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 2b6761fd3e)
Signed-off-by: Cory Snider <csnider@mirantis.com>
Remove the "deadcode", "structcheck", and "varcheck" linters, as they are
deprecated:
WARN [runner] The linter 'deadcode' is deprecated (since v1.49.0) due to: The owner seems to have abandoned the linter. Replaced by unused.
WARN [runner] The linter 'structcheck' is deprecated (since v1.49.0) due to: The owner seems to have abandoned the linter. Replaced by unused.
WARN [runner] The linter 'varcheck' is deprecated (since v1.49.0) due to: The owner seems to have abandoned the linter. Replaced by unused.
WARN [linters context] structcheck is disabled because of generics. You can track the evolution of the generics support by following the https://github.com/golangci/golangci-lint/issues/2649.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 2f1c382a6d)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 3ce520ec80)
Signed-off-by: Cory Snider <csnider@mirantis.com>
Older versions of Go don't format comments, so committing this as
a separate commit, so that we can already make these changes before
we upgrade to Go 1.19.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 52c1a2fae8)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit cdbca4061b)
Signed-off-by: Cory Snider <csnider@mirantis.com>
The io/ioutil package has been deprecated in Go 1.16. This commit
replaces the existing io/ioutil functions with their new definitions in
io and os packages.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
(cherry picked from commit c55a4ac779)
Signed-off-by: Cory Snider <csnider@mirantis.com>
Probably needs a similar change as c208f03fbd,
but this code makes my head spin, so for now suppressing, and created a
tracking issue:
daemon/graphdriver/graphtest/graphtest_unix.go:305:12: unsafeptr: possible misuse of reflect.SliceHeader (govet)
header := *(*reflect.SliceHeader)(unsafe.Pointer(&buf))
^
daemon/graphdriver/graphtest/graphtest_unix.go:308:36: unsafeptr: possible misuse of reflect.SliceHeader (govet)
data := *(*[]byte)(unsafe.Pointer(&header))
^
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit e6dabfa977)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Unlike regular comments, nolint comments should not have a leading space.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit bb17074119)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
doCopyXattrs() never reached due to copyXattrs boolean being false, as
a result file capabilities not being copied.
moved copyXattr() out of doCopyXattrs()
Signed-off-by: Illo Abdulrahim <abdulrahim.illo@nokia.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 31f654a704)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Movified from 686be57d0a, and re-ran
gofmt again to address for files not present in 20.10 and vice-versa.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 686be57d0a)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
daemon/graphdriver/fuse-overlayfs/fuseoverlayfs.go:101:63: SA9002: file mode '700' evaluates to 01274; did you mean '0700'? (staticcheck)
if err := idtools.MkdirAllAndChown(path.Join(home, linkDir), 700, currentID); err != nil {
^
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit f9fb5d4f25)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Do not use 0701 perms.
0701 dir perms allows anyone to traverse the docker dir.
It happens to allow any user to execute, as an example, suid binaries
from image rootfs dirs because it allows traversal AND critically
container users need to be able to do execute things.
0701 on lower directories also happens to allow any user to modify
things in, for instance, the overlay upper dir which neccessarily
has 0755 permissions.
This changes to use 0710 which allows users in the group to traverse.
In userns mode the UID owner is (real) root and the GID is the remapped
root's GID.
This prevents anyone but the remapped root to traverse our directories
(which is required for userns with runc).
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit ef7237442147441a7cadcda0600be1186d81ac73)
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Kernel 5.11 introduced support for rootless overlayfs, but incompatible with SELinux.
On the other hand, fuse-overlayfs is compatible.
Close issue 42333
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
(cherry picked from commit 4300a52606)
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Fix issue 41762
Cherry-pick "drivers: btrfs: Allow unprivileged user to delete subvolumes" from containers/storage
831e32b6bd
> In btrfs, subvolume can be deleted by IOC_SNAP_DESTROY ioctl but there
> is one catch: unprivileged IOC_SNAP_DESTROY call is restricted by default.
>
> This is because IOC_SNAP_DESTROY only performs permission checks on
> the top directory(subvolume) and unprivileged user might delete dirs/files
> which cannot be deleted otherwise. This restriction can be relaxed if
> user_subvol_rm_allowed mount option is used.
>
> Although the above ioctl had been the only way to delete a subvolume,
> btrfs now allows deletion of subvolume just like regular directory
> (i.e. rmdir sycall) since kernel 4.18.
>
> So if we fail to cleanup subvolume in subvolDelete(), just fallback to
> system.EnsureRmoveall() to try to cleanup subvolumes again.
> (Note: quota needs privilege, so if quota is enabled we do not fallback)
>
> This fix will allow non-privileged container works with btrfs backend.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
(cherry picked from commit 62b5194f62)
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
overlay2 no longer sets `archive.OverlayWhiteoutFormat` when
running in UserNS, so we can remove the complicated logic in the
archive package.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
(cherry picked from commit 6322dfc217)
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
When running in userns, returns error (i.e. "use naive, not native")
immediately.
No substantial change to the logic.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
(cherry picked from commit 67aa418df2)
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Previously, `d.naiveDiff.ApplyDiff` was not used even when
`useNaiveDiff()==true`
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
(cherry picked from commit dd97134232)
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
(cherry picked from commit a8008f7313)
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
The "userxattr" option is needed for mounting overlayfs inside a user namespace with kernel >= 5.11.
The "userxattr" option is NOT needed for the initial user namespace (aka "the host").
Also, Ubuntu (since circa 2015) and Debian (since 10) with kernel < 5.11 can mount the overlayfs in a user namespace without the "userxattr" option.
The corresponding kernel commit: 2d2f2d7322ff43e0fe92bf8cccdc0b09449bf2e1
> **ovl: user xattr**
>
> Optionally allow using "user.overlay." namespace instead of "trusted.overlay."
> ...
> Disable redirect_dir and metacopy options, because these would allow privilege escalation through direct manipulation of the
> "user.overlay.redirect" or "user.overlay.metacopy" xattrs.
Fix issue 42055
Related to containerd/containerd PR 5076
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
(cherry picked from commit 11ef8d3ba9)
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Various dirs in /var/lib/docker contain data that needs to be mounted
into a container. For this reason, these dirs are set to be owned by the
remapped root user, otherwise there can be permissions issues.
However, this uneccessarily exposes these dirs to an unprivileged user
on the host.
Instead, set the ownership of these dirs to the real root (or rather the
UID/GID of dockerd) with 0701 permissions, which allows the remapped
root to enter the directories but not read/write to them.
The remapped root needs to enter these dirs so the container's rootfs
can be configured... e.g. to mount /etc/resolve.conf.
This prevents an unprivileged user from having read/write access to
these dirs on the host.
The flip side of this is now any user can enter these directories.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This is a fix for https://github.com/docker/for-linux/issues/1012.
The code was not considering that C strings are NULL-terminated so
we need to leave one extra byte.
Without this fix, the testcase in https://github.com/docker/for-linux/issues/1012
fails with
```
Step 61/1001 : RUN echo 60 > 60
---> Running in dde85ac3b1e3
Removing intermediate container dde85ac3b1e3
---> 80a12a18a241
Step 62/1001 : RUN echo 61 > 61
error creating overlay mount to /23456789112345678921234/overlay2/d368abcc97d6c6ebcf23fa71225e2011d095295d5d8c9b31d6810bea748bdf07-init/merged: no such file or directory
```
with the output of `dmesg -T` as:
```
[Sat Dec 19 02:35:40 2020] overlayfs: failed to resolve '/23456789112345678921234/overlay2/89e435a1b24583c463abb73e8abfad8bf8a88312ef8253455390c5fa0a765517-init/wor': -2
```
with this fix, you get the expected:
```
Step 126/1001 : RUN echo 125 > 125
---> Running in 2f2e56da89e0
max depth exceeded
```
Signed-off-by: Oscar Bonilla <6f6231@gmail.com>
full diff: https://github.com/moby/sys/compare/mountinfo/v0.1.3...mountinfo/v0.4.0
> Note that this dependency uses submodules, providing "github.com/moby/sys/mount"
> and "github.com/moby/sys/mountinfo". Our vendoring tool (vndr) currently doesn't
> support submodules, so we vendor the top-level moby/sys repository (which contains
> both) and pick the most recent tag, which could be either `mountinfo/vXXX` or
> `mount/vXXX`.
github.com/moby/sys/mountinfo v0.4.0
--------------------------------------------------------------------------------
Breaking changes:
- `PidMountInfo` is now deprecated and will be removed before v1.0; users should switch to `GetMountsFromReader`
Fixes and improvements:
- run filter after all fields are parsed
- correct handling errors from bufio.Scan
- documentation formatting fixes
github.com/moby/sys/mountinfo v0.3.1
--------------------------------------------------------------------------------
- mount: use MNT_* flags from golang.org/x/sys/unix on freebsd
- various godoc and CI fixes
- mountinfo: make GetMountinfoFromReader Linux-specific
- Add support for OpenBSD in addition to FreeBSD
- mountinfo: use idiomatic naming for fields
github.com/moby/sys/mountinfo v0.2.0
--------------------------------------------------------------------------------
Bug fixes:
- Fix path unescaping for paths with double quotes
Improvements:
- Mounted: speed up by adding fast paths using openat2 (Linux-only) and stat
- Mounted: relax path requirements (allow relative, non-cleaned paths, symlinks)
- Unescape fstype and source fields
- Documentation improvements
Testing/CI:
- Unit tests: exclude darwin
- CI: run tests under Fedora 32 to test openat2
- TestGetMounts: fix for Ubuntu build system
- Makefile: fix ignoring test failures
- CI: add cross build
github.com/moby/sys/mount v0.1.1
--------------------------------------------------------------------------------
https://github.com/moby/sys/releases/tag/mount%2Fv0.1.1
Improvements:
- RecursiveUnmount: add a fast path (#26)
- Unmount: improve doc
- fix CI linter warning on Windows
Testing/CI:
- Unit tests: exclude darwin
- Makefile: fix ignoring test failures
- CI: add cross build
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This pulls in the migration of go-winio/backuptar from the bundled fork
of archive/tar from Go 1.6 to using Go's current archive/tar unmodified.
This fixes the failure to import an OCI layer (tar stream) containing a
file larger than 8gB.
Fixes: #40444
Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
The implementation in libcontainer/system is quite complicated,
and we only use it to detect if user-namespaces are enabled.
In addition, the implementation in containerd uses a sync.Once,
so that detection (and reading/parsing `/proc/self/uid_map`) is
only performed once.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Switch to moby/sys/mount and mountinfo. Keep the pkg/mount for potential
outside users.
This commit was generated by the following bash script:
```
set -e -u -o pipefail
for file in $(git grep -l 'docker/docker/pkg/mount"' | grep -v ^pkg/mount); do
sed -i -e 's#/docker/docker/pkg/mount"#/moby/sys/mount"#' \
-e 's#mount\.\(GetMounts\|Mounted\|Info\|[A-Za-z]*Filter\)#mountinfo.\1#g' \
$file
goimports -w $file
done
```
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
`fuse-overlayfs` provides rootless overlayfs functionality without depending
on any kernel patch.
Aside from rootless, `fuse-overlayfs` could be potentially used for eliminating
`chown()` calls that happen in userns-remap mode, because `fuse-overlayfs` also
provides shiftfs functionality.
System requirements:
* fuse-overlayfs needs to be installed. Tested with 0.7.6.
* kernel >= 4.18
Unit test: `go test -exec sudo -v ./daemon/graphdriver/fuse-overlayfs`
The implementation is based on Podman's `overlay` driver which supports
both kernel-mode overlayfs and fuse-overlayfs in the single driver instance:
https://github.com/containers/storage/blob/39a8d5ed/drivers/overlay/overlay.go
However, Moby's implementation aims to decouple `fuse-overlayfs` driver from the
kernel-mode driver (`overlay2`) for simplicity.
Fix#40218
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Now that we do check if overlay is working by performing an actual
overlayfs mount, there's no need in extra checks for the kernel version
or the filesystem type. Actual mount check is sufficient.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Before this commit, overlay check was performed by looking for
`overlay` in /proc/filesystem. This obviously might not work
for rootless Docker (fs is there, but one can't use it as non-root).
This commit changes the check to perform the actual mount, by reusing
the code previously written to check for multiple lower dirs support.
The old check is removed from both drivers, as well as the additional
check for the multiple lower dirs support in overlay2 since it's now
a part of the main check.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This moves supportsMultipleLowerDir() to overlayutils
so it can be used from both overlay and overlay2.
The only changes made were:
* replace logger with logrus
* don't use workDirName mergedDirName constants
* add mnt var to improve readability a bit
This is a preparation for the next commit.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
As caught by staticcheck (after disabling the default exclusion rules);
Based on the comment, this break was indeed meant to break the
loop and return the error.
```
daemon/graphdriver/aufs/mount.go:54:4: SA4011: ineffective break statement. Did you mean to break out of the outer loop? (staticcheck)
break
^
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
```
daemon/graphdriver/btrfs/btrfs.go:609:5: SA4003: no value of type uint64 is less than 0 (staticcheck)
if driver.options.size <= 0 {
^
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
```
daemon/graphdriver/aufs/aufs_test.go:746:8: SA4021: x = append(y) is equivalent to x = y (staticcheck)
ids = append(ids[2:])
^
```
Also pre-allocating the ids slice while we're at it.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
It has been pointed out that sometimes device mapper unit tests
fail with the following diagnostics:
> --- FAIL: TestDevmapperSetup (0.02s)
> graphtest_unix.go:44: graphdriver: loopback attach failed
> graphtest_unix.go:48: loopback attach failed
The root cause is the absence of udev inside the container used
for testing, which causes device nodes (/dev/loop*) to not be
created.
The test suite itself already has a workaround, but it only
creates 8 devices (loop0 till loop7). It might very well be
the case that the first few devices are already used by the
system (on my laptop 15 devices are busy).
The fix is to raise the number of devices being manually created.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>