Commit graph

117 commits

Author SHA1 Message Date
Jaroslav Jindrak
cadb124ab6
daemon: overlay2: remove world writable permission from the lower file
In de2447c, the creation of the 'lower' file was changed from using
os.Create to using ioutils.AtomicWriteFile, which ignores the system's
umask. This means that even though the requested permission in the
source code was always 0666, it was 0644 on systems with default
umask of 0022 prior to de2447c, so the move to AtomicFile potentially
increased the file's permissions.

This is not a security issue because the parent directory does not
allow writes into the file, but it can confuse security scanners on
Linux-based systems into giving false positives.

Signed-off-by: Jaroslav Jindrak <dzejrou@gmail.com>
2024-03-05 14:25:50 +01:00
Sebastiaan van Stijn
cff4f20c44
migrate to github.com/containerd/log v0.1.0
The github.com/containerd/containerd/log package was moved to a separate
module, which will also be used by upcoming (patch) releases of containerd.

This patch moves our own uses of the package to use the new module.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-10-11 17:52:23 +02:00
Mike Sul
de2447c2ab
daemon: overlay2: Write layer metadata atomically
When the daemon process or the host running it is abruptly terminated,
the layer metadata file can become inconsistent on the file system.
Specifically, `link` and `lower` files may exist but be empty, leading
to overlay mounting errors during layer extraction, such as:
"failed to register layer: error creating overlay mount to <path>:
too many levels of symbolic links."

This commit introduces the use of `AtomicWriteFile` to ensure that the
layer metadata files contain correct data when they exist on the file system.

Signed-off-by: Mike <mike.sul@foundries.io>
2023-09-13 15:07:32 +02:00
Sebastiaan van Stijn
d2a6956afb
daemon/graphdriver: format code with gofumpt
Formatting the code with https://github.com/mvdan/gofumpt

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-29 00:31:34 +02:00
Brian Goff
74da6a6363 Switch all logging to use containerd log pkg
This unifies our logging and allows us to propagate logging and trace
contexts together.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2023-06-24 00:23:44 +00:00
Sebastiaan van Stijn
ab35df454d
remove pre-go1.17 build-tags
Removed pre-go1.17 build-tags with go fix;

    go mod init
    go fix -mod=readonly ./...
    rm go.mod

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-05-19 20:38:51 +02:00
Cory Snider
4e0319c878 [chore] clean up reexec.Init() calls
Now that most uses of reexec have been replaced with non-reexec
solutions, most of the reexec.Init() calls peppered throughout the test
suites are unnecessary. Furthermore, most of the reexec.Init() calls in
test code neglects to check the return value to determine whether to
exit, which would result in the reexec'ed subprocesses proceeding to run
the tests, which would reexec another subprocess which would proceed to
run the tests, recursively. (That would explain why every reexec
callback used to unconditionally call os.Exit() instead of returning...)

Remove unneeded reexec.Init() calls from test and example code which no
longer needs it, and fix the reexec.Init() calls which are not inert to
exit after a reexec callback is invoked.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-05-09 19:13:17 -04:00
Sebastiaan van Stijn
9791756284
overlay2: remove deprecated overlay2.override_kernel_check option
This option was deprecated in e35700eb50
(and backported to v23.0 through 43ce8f7d24).

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-04-20 23:57:45 +02:00
Bjorn Neergaard
3bcb350711
graphdriver/overlay2: usingMetacopy ENOTSUP is non-fatal
Signed-off-by: Bjorn Neergaard <bneergaard@mirantis.com>
2023-02-03 05:10:53 -07:00
Sebastiaan van Stijn
2400bc66ef
Merge pull request #44285 from cpuguy83/nix_ov2_reexec
Replace overlay2 mount reexec with in-proc impl
2022-10-18 14:39:05 +02:00
Brian Goff
34f459423a Replace overlay2 mount reexec with in-proc impl
Building off insights from the great work Cory Snider has been doing,
this replaces a reexec with a much lower overhead implementation which
performs the `Chddir` in a new goroutine that is locked to a specific
thread with CLONE_FS unshared.
The thread is thrown away afterwards and the Chdir does effectively the
same thing as what the reexec was being used for.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2022-10-12 00:38:50 +00:00
Sebastiaan van Stijn
e35700eb50
daemon/graphdriver/overlay2: remove deprecated overrideKernelCheck
Commit 955c1f881a (Docker v17.12.0) replaced
detection of support for multiple lowerdirs (as required by overlay2) to not
depend on the kernel version. The `overlay2.override_kernel_check` was still
used to print a warning that older kernel versions may not have full support.

After this, commit e226aea280 (Docker v20.10.0,
backported to v19.03.7) removed uses of the `overlay2.override_kernel_check`
option altogether, but we were still parsing it.

This patch changes the `parseOptions()` function to not parse the option,
printing a deprecation warning instead. We should change this to be an error,
but the  `overlay2.override_kernel_check` option was not deprecated in the
documentation, so keeping it around for one more release.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-10-10 14:58:40 +02:00
Sebastiaan van Stijn
1515e02c8a
Merge pull request #44215 from corhere/fix-unlockosthread-pdeathsig
Stop subprocesses from getting unexpectedly killed
2022-10-06 20:08:53 +02:00
Cory Snider
1f22b15030 Lock OS threads when exec'ing with Pdeathsig
On Linux, when (os/exec.Cmd).SysProcAttr.Pdeathsig is set, the signal
will be sent to the process when the OS thread on which cmd.Start() was
executed dies. The runtime terminates an OS thread when a goroutine
exits after being wired to the thread with runtime.LockOSThread(). If
other goroutines are allowed to be scheduled onto a thread which called
cmd.Start(), an unrelated goroutine could cause the thread to be
terminated and prematurely signal the command. See
https://github.com/golang/go/issues/27505 for more information.

Prevent started subprocesses with Pdeathsig from getting signaled
prematurely by wiring the starting goroutine to the OS thread until the
subprocess has exited. No other goroutines can be scheduled onto a
locked thread so it will remain alive until unlocked or the daemon
process exits.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-10-05 12:18:03 -04:00
Sebastiaan van Stijn
5b6b42162b
pkg/fsutils: deprecate in favor of containerd/continuity/fs
The pkg/fsutils package was forked in containerd, and later moved to
containerd/continuity/fs. As we're moving more bits to containerd, let's also
use the same implementation to reduce code-duplication and to prevent them from
diverging.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-10-05 11:36:04 +02:00
Cory Snider
9ce2b30b81 pkg/containerfs: drop ContainerFS type alias
Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-09-23 16:56:53 -04:00
Cory Snider
e332c41e9d pkg/containerfs: alias ContainerFS to string
Drop the constructor and redundant string() type-casts.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-09-23 16:56:52 -04:00
Cory Snider
95824f2b5f pkg/containerfs: simplify ContainerFS type
Iterate towards dropping the type entirely.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-09-23 16:56:49 -04:00
Bjorn Neergaard
ce3e2d1955 overlay2: account for UserNS/userxattr in metacopy test
Signed-off-by: Bjorn Neergaard <bneergaard@mirantis.com>
2022-05-17 06:58:50 -06:00
Bjorn Neergaard
2c3d1f7b4b overlay2: test for and report metacopy status
This is a first, naive implementation, that does not account for
userxattr/UserNS.

Signed-off-by: Bjorn Neergaard <bneergaard@mirantis.com>
2022-05-13 07:37:20 -06:00
Sebastiaan van Stijn
d570bc4922
remove deprecated support for overlay(2) on backing FS without d_type (fstype=1)
Support for overlay on a backing filesystem without d_type was deprecated in
0abb8dec3f (Docker 17.12), with an exception
for existing installations (0a4e793a3d).

That deprecation was nearly 5 years ago, and running without d_type is known to
cause serious issues (so users will likely already have run into other problems).

This patch removes support for running overlay and overlay2 on these filesystems,
returning the error instead of logging it.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-04-07 16:15:26 +02:00
Cory Snider
098a44c07f Finish refactor of UID/GID usage to a new struct
Finish the refactor which was partially completed with commit
34536c498d, passing around IdentityMapping structs instead of pairs of
[]IDMap slices.

Existing code which uses []IDMap relies on zero-valued fields to be
valid, empty mappings. So in order to successfully finish the
refactoring without introducing bugs, their replacement therefore also
needs to have a useful zero value which represents an empty mapping.
Change IdentityMapping to be a pass-by-value type so that there are no
nil pointers to worry about.

The functionality provided by the deprecated NewIDMappingsFromMaps
function is required by unit tests to to construct arbitrary
IdentityMapping values. And the daemon will always need to access the
mappings to pass them to the Linux kernel. Accommodate these use cases
by exporting the struct fields instead. BuildKit currently depends on
the UIDs and GIDs methods so we cannot get rid of them yet.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-03-14 16:28:57 -04:00
Sebastiaan van Stijn
25ee00c494
pkg/system: move EnsureRemoveAll() to pkg/containerfs
pkg/system historically has been a bit of a kitchen-sink of things that were
somewhat "system" related, but didn't have a good place for. EnsureRemoveAll()
is one of those utilities. EnsureRemoveAll() is used to both unmount and remove
a path, for which it depends on both github.com/moby/sys/mount, which in turn
depends on github.com/moby/sys/mountinfo.

pkg/system is imported in the CLI, but neither EnsureRemoveAll(), nor any of its
moby/sys dependencies are used on the client side, so let's move this function
somewhere else, to remove those dependencies from the CLI.

I looked for plausible locations that were related; it's used in:

- daemon
- daemon/graphdriver/XXX/
- plugin

I considered moving it into a (e.g.) "utils" package within graphdriver (but not
a huge fan of "utils" packages), and given that it felt (mostly) related to
cleaning up container filesystems, I decided to move it there.

Some things to follow-up on after this:

- Verify if this function is still needed (it feels a bit like a big hammer in
  a "YOLO, let's try some things just in case it fails")
- Perhaps it should be integrated in `containerfs.Remove()` (so that it's used
  automatically)
- Look if there's other implementations (and if they should be consolidated),
  although (e.g.) the one in containerd is a copy of ours:
  https://github.com/containerd/containerd/blob/v1.5.9/pkg/cri/server/helpers_linux.go#L200

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-03-03 00:22:26 +01:00
Brian Goff
03f1c3d78f
Lock down docker root dir perms.
Do not use 0701 perms.
0701 dir perms allows anyone to traverse the docker dir.
It happens to allow any user to execute, as an example, suid binaries
from image rootfs dirs because it allows traversal AND critically
container users need to be able to do execute things.

0701 on lower directories also happens to allow any user to modify
     things in, for instance, the overlay upper dir which neccessarily
     has 0755 permissions.

This changes to use 0710 which allows users in the group to traverse.
In userns mode the UID owner is (real) root and the GID is the remapped
root's GID.

This prevents anyone but the remapped root to traverse our directories
(which is required for userns with runc).

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit ef7237442147441a7cadcda0600be1186d81ac73)
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit 93ac040bf0)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-10-05 09:57:00 +02:00
Eng Zer Jun
c55a4ac779
refactor: move from io/ioutil to io and os package
The io/ioutil package has been deprecated in Go 1.16. This commit
replaces the existing io/ioutil functions with their new definitions in
io and os packages.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2021-08-27 14:56:57 +08:00
Sebastiaan van Stijn
686be57d0a
Update to Go 1.17.0, and gofmt with Go 1.17
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-08-24 23:33:27 +02:00
Sebastiaan van Stijn
472f21b923
replace uses of deprecated containerd/sys.RunningInUserNS()
This utility was moved to a separate package, which has no
dependencies.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-06-18 11:01:24 +02:00
Akihiro Suda
6322dfc217
archive: do not use overlayWhiteoutConverter for UserNS
overlay2 no longer sets `archive.OverlayWhiteoutFormat` when
running in UserNS, so we can remove the complicated logic in the
archive package.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-03-29 14:47:12 +09:00
Akihiro Suda
67aa418df2
overlay2: doesSupportNativeDiff: add fast path for userns
When running in userns, returns error (i.e. "use naive, not native")
immediately.

No substantial change to the logic.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-03-29 14:47:09 +09:00
Akihiro Suda
dd97134232
overlay2: call d.naiveDiff.ApplyDiff when useNaiveDiff==true
Previously, `d.naiveDiff.ApplyDiff` was not used even when
`useNaiveDiff()==true`

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-03-26 14:34:56 +09:00
Akihiro Suda
11ef8d3ba9
overlay2: support "userxattr" option (kernel 5.11)
The "userxattr" option is needed for mounting overlayfs inside a user namespace with kernel >= 5.11.

The "userxattr" option is NOT needed for the initial user namespace (aka "the host").

Also, Ubuntu (since circa 2015) and Debian (since 10) with kernel < 5.11 can mount the overlayfs in a user namespace without the "userxattr" option.

The corresponding kernel commit: 2d2f2d7322ff43e0fe92bf8cccdc0b09449bf2e1
> **ovl: user xattr**
>
> Optionally allow using "user.overlay." namespace instead of "trusted.overlay."
> ...
> Disable redirect_dir and metacopy options, because these would allow privilege escalation through direct manipulation of the
> "user.overlay.redirect" or "user.overlay.metacopy" xattrs.

Fix issue 42055

Related to containerd/containerd PR 5076

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-03-11 15:12:41 +09:00
Brian Goff
7f5e39bd4f
Use real root with 0701 perms
Various dirs in /var/lib/docker contain data that needs to be mounted
into a container. For this reason, these dirs are set to be owned by the
remapped root user, otherwise there can be permissions issues.
However, this uneccessarily exposes these dirs to an unprivileged user
on the host.

Instead, set the ownership of these dirs to the real root (or rather the
UID/GID of dockerd) with 0701 permissions, which allows the remapped
root to enter the directories but not read/write to them.
The remapped root needs to enter these dirs so the container's rootfs
can be configured... e.g. to mount /etc/resolve.conf.

This prevents an unprivileged user from having read/write access to
these dirs on the host.
The flip side of this is now any user can enter these directories.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit e908cc3901)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-02-02 13:01:25 +01:00
Oscar Bonilla
c923f6ac3b Fix off-by-one bug
This is a fix for https://github.com/docker/for-linux/issues/1012.

The code was not considering that C strings are NULL-terminated so
we need to leave one extra byte.

Without this fix, the testcase in https://github.com/docker/for-linux/issues/1012
fails with

```
Step 61/1001 : RUN echo 60 > 60
 ---> Running in dde85ac3b1e3
Removing intermediate container dde85ac3b1e3
 ---> 80a12a18a241
Step 62/1001 : RUN echo 61 > 61
error creating overlay mount to /23456789112345678921234/overlay2/d368abcc97d6c6ebcf23fa71225e2011d095295d5d8c9b31d6810bea748bdf07-init/merged: no such file or directory
```

with the output of `dmesg -T` as:

```
[Sat Dec 19 02:35:40 2020] overlayfs: failed to resolve '/23456789112345678921234/overlay2/89e435a1b24583c463abb73e8abfad8bf8a88312ef8253455390c5fa0a765517-init/wor': -2
```

with this fix, you get the expected:

```
Step 126/1001 : RUN echo 125 > 125
 ---> Running in 2f2e56da89e0
max depth exceeded
```

Signed-off-by: Oscar Bonilla <6f6231@gmail.com>
2020-12-20 16:23:25 -08:00
Timo Rothenpieler
c677e4cc87 quota: move quota package out of graphdriver
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
2020-10-05 13:28:25 +00:00
Sebastiaan van Stijn
5ca758199d
replace pkg/locker with github.com/moby/locker
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-09-10 22:15:40 +02:00
Sebastiaan van Stijn
4534a7afc3
daemon: use containerd/sys to detect UserNamespaces
The implementation in libcontainer/system is quite complicated,
and we only use it to detect if user-namespaces are enabled.

In addition, the implementation in containerd uses a sync.Once,
so that detection (and reading/parsing `/proc/self/uid_map`) is
only performed once.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-06-15 13:06:08 +02:00
Shijiang Wei
e9d785ce3f enhance storage-opt validation logic in overlay2 driver
Signed-off-by: Shijiang Wei <mountkin@gmail.com>
2020-04-12 23:22:30 +08:00
Kir Kolyshkin
39048cf656 Really switch to moby/sys/mount*
Switch to moby/sys/mount and mountinfo. Keep the pkg/mount for potential
outside users.

This commit was generated by the following bash script:

```
set -e -u -o pipefail

for file in $(git grep -l 'docker/docker/pkg/mount"' | grep -v ^pkg/mount); do
	sed -i -e 's#/docker/docker/pkg/mount"#/moby/sys/mount"#' \
		-e 's#mount\.\(GetMounts\|Mounted\|Info\|[A-Za-z]*Filter\)#mountinfo.\1#g' \
		$file
	goimports -w $file
done
```

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-20 09:46:25 -07:00
Brian Goff
470d32d32a
Merge pull request #40483 from AkihiroSuda/fuse-overlayfs
new storage driver: fuse-overlayfs
2020-03-11 11:32:37 -07:00
Jintao Zhang
18c22f5bc1 fix backingFs assignment
Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com>
2020-03-10 00:03:59 +08:00
Akihiro Suda
7418745001 new storage driver: fuse-overlayfs
`fuse-overlayfs` provides rootless overlayfs functionality without depending
on any kernel patch.

Aside from rootless, `fuse-overlayfs` could be potentially used for eliminating
`chown()` calls that happen in userns-remap mode, because `fuse-overlayfs` also
provides shiftfs functionality.

System requirements:
* fuse-overlayfs needs to be installed. Tested with 0.7.6.
* kernel >= 4.18

Unit test: `go test -exec sudo -v ./daemon/graphdriver/fuse-overlayfs`

The implementation is based on Podman's `overlay` driver which supports
both kernel-mode overlayfs and fuse-overlayfs in the single driver instance:
https://github.com/containers/storage/blob/39a8d5ed/drivers/overlay/overlay.go

However, Moby's implementation aims to decouple `fuse-overlayfs` driver from the
kernel-mode driver (`overlay2`) for simplicity.

Fix #40218

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-02-10 23:48:52 +09:00
Sebastiaan van Stijn
ec4bc83258
daemon/graphdriver: normalize comment formatting
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-11-27 15:43:23 +01:00
Kir Kolyshkin
e226aea280 overlay[2]: rm fs checks
Now that we do check if overlay is working by performing an actual
overlayfs mount, there's no need in extra checks for the kernel version
or the filesystem type. Actual mount check is sufficient.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2019-11-14 08:13:12 -08:00
Kir Kolyshkin
649e4c8889 Fix/improve overlay support check
Before this commit, overlay check was performed by looking for
`overlay` in /proc/filesystem. This obviously might not work
for rootless Docker (fs is there, but one can't use it as non-root).

This commit changes the check to perform the actual mount, by reusing
the code previously written to check for multiple lower dirs support.

The old check is removed from both drivers, as well as the additional
check for the multiple lower dirs support in overlay2 since it's now
a part of the main check.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2019-11-08 11:49:39 -08:00
Kir Kolyshkin
d5687079ad overlay: move supportsMultipleLowerDir to utils
This moves supportsMultipleLowerDir() to overlayutils
so it can be used from both overlay and overlay2.

The only changes made were:
 * replace logger with logrus
 * don't use workDirName mergedDirName constants
 * add mnt var to improve readability a bit

This is a preparation for the next commit.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2019-11-08 11:48:47 -08:00
Akihiro Suda
610551e039
Merge pull request #38930 from daym/fewer-modprobes
Use fewer modprobes
2019-09-24 02:37:58 +09:00
Danny Milosavljevic
074eca1d79
Use fewer modprobes
Signed-off-by: Danny Milosavljevic <dannym@scratchpost.org>
2019-09-21 11:21:18 +02:00
Sebastiaan van Stijn
07ff4f1de8
goimports: fix imports
Format the source according to latest goimports.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-09-18 12:56:54 +02:00
Derek McGowan
477bf1e413
Fix overlay2 busy error on mount
When mounting overlays which have children, enforce that
the mount is always performed as read only. Newer versions
of the kernel return a device busy error when a lower directory
is in use as an upper directory in another overlay mount.

Adds committed file to indicate when an overlay is being used
as a parent, ensuring it will no longer be mounted with an
upper directory.

Signed-off-by: Derek McGowan <derek@mcgstyle.net>
2019-08-21 15:03:52 -07:00
imxyb
7ab69cd7e2 change hard code: add some overlay2 constant to replace the hard code.
Signed-off-by: Xiao YongBiao <xyb4638@gmail.com>
2019-04-02 10:57:13 +08:00