Movified from 686be57d0a, and re-ran
gofmt again to address for files not present in 20.10 and vice-versa.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 686be57d0a)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Do not use 0701 perms.
0701 dir perms allows anyone to traverse the docker dir.
It happens to allow any user to execute, as an example, suid binaries
from image rootfs dirs because it allows traversal AND critically
container users need to be able to do execute things.
0701 on lower directories also happens to allow any user to modify
things in, for instance, the overlay upper dir which neccessarily
has 0755 permissions.
This changes to use 0710 which allows users in the group to traverse.
In userns mode the UID owner is (real) root and the GID is the remapped
root's GID.
This prevents anyone but the remapped root to traverse our directories
(which is required for userns with runc).
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit ef7237442147441a7cadcda0600be1186d81ac73)
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Various dirs in /var/lib/docker contain data that needs to be mounted
into a container. For this reason, these dirs are set to be owned by the
remapped root user, otherwise there can be permissions issues.
However, this uneccessarily exposes these dirs to an unprivileged user
on the host.
Instead, set the ownership of these dirs to the real root (or rather the
UID/GID of dockerd) with 0701 permissions, which allows the remapped
root to enter the directories but not read/write to them.
The remapped root needs to enter these dirs so the container's rootfs
can be configured... e.g. to mount /etc/resolve.conf.
This prevents an unprivileged user from having read/write access to
these dirs on the host.
The flip side of this is now any user can enter these directories.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
full diff: https://github.com/moby/sys/compare/mountinfo/v0.1.3...mountinfo/v0.4.0
> Note that this dependency uses submodules, providing "github.com/moby/sys/mount"
> and "github.com/moby/sys/mountinfo". Our vendoring tool (vndr) currently doesn't
> support submodules, so we vendor the top-level moby/sys repository (which contains
> both) and pick the most recent tag, which could be either `mountinfo/vXXX` or
> `mount/vXXX`.
github.com/moby/sys/mountinfo v0.4.0
--------------------------------------------------------------------------------
Breaking changes:
- `PidMountInfo` is now deprecated and will be removed before v1.0; users should switch to `GetMountsFromReader`
Fixes and improvements:
- run filter after all fields are parsed
- correct handling errors from bufio.Scan
- documentation formatting fixes
github.com/moby/sys/mountinfo v0.3.1
--------------------------------------------------------------------------------
- mount: use MNT_* flags from golang.org/x/sys/unix on freebsd
- various godoc and CI fixes
- mountinfo: make GetMountinfoFromReader Linux-specific
- Add support for OpenBSD in addition to FreeBSD
- mountinfo: use idiomatic naming for fields
github.com/moby/sys/mountinfo v0.2.0
--------------------------------------------------------------------------------
Bug fixes:
- Fix path unescaping for paths with double quotes
Improvements:
- Mounted: speed up by adding fast paths using openat2 (Linux-only) and stat
- Mounted: relax path requirements (allow relative, non-cleaned paths, symlinks)
- Unescape fstype and source fields
- Documentation improvements
Testing/CI:
- Unit tests: exclude darwin
- CI: run tests under Fedora 32 to test openat2
- TestGetMounts: fix for Ubuntu build system
- Makefile: fix ignoring test failures
- CI: add cross build
github.com/moby/sys/mount v0.1.1
--------------------------------------------------------------------------------
https://github.com/moby/sys/releases/tag/mount%2Fv0.1.1
Improvements:
- RecursiveUnmount: add a fast path (#26)
- Unmount: improve doc
- fix CI linter warning on Windows
Testing/CI:
- Unit tests: exclude darwin
- Makefile: fix ignoring test failures
- CI: add cross build
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Switch to moby/sys/mount and mountinfo. Keep the pkg/mount for potential
outside users.
This commit was generated by the following bash script:
```
set -e -u -o pipefail
for file in $(git grep -l 'docker/docker/pkg/mount"' | grep -v ^pkg/mount); do
sed -i -e 's#/docker/docker/pkg/mount"#/moby/sys/mount"#' \
-e 's#mount\.\(GetMounts\|Mounted\|Info\|[A-Za-z]*Filter\)#mountinfo.\1#g' \
$file
goimports -w $file
done
```
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The errors returned from Mount and Unmount functions are raw
syscall.Errno errors (like EPERM or EINVAL), which provides
no context about what has happened and why.
Similar to os.PathError type, introduce mount.Error type
with some context. The error messages will now look like this:
> mount /tmp/mount-tests/source:/tmp/mount-tests/target, flags: 0x1001: operation not permitted
or
> mount tmpfs:/tmp/mount-test-source-516297835: operation not permitted
Before this patch, it was just
> operation not permitted
[v2: add Cause()]
[v3: rename MountError to Error, document Cause()]
[v4: fixes; audited all users]
[v5: make Error type private; changes after @cpuguy83 reviews]
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This implements chown support on Windows. Built-in accounts as well
as accounts included in the SAM database of the container are supported.
NOTE: IDPair is now named Identity and IDMappings is now named
IdentityMapping.
The following are valid examples:
ADD --chown=Guest . <some directory>
COPY --chown=Administrator . <some directory>
COPY --chown=Guests . <some directory>
COPY --chown=ContainerUser . <some directory>
On Windows an owner is only granted the permission to read the security
descriptor and read/write the discretionary access control list. This
fix also grants read/write and execute permissions to the owner.
Signed-off-by: Salahuddin Khan <salah@docker.com>
Functions `GetMounts()` and `parseMountTable()` return all the entries
as read and parsed from /proc/self/mountinfo. In many cases the caller
is only interested only one or a few entries, not all of them.
One good example is `Mounted()` function, which looks for a specific
entry only. Another example is `RecursiveUnmount()` which is only
interested in mount under a specific path.
This commit adds `filter` argument to `GetMounts()` to implement
two things:
1. filter out entries a caller is not interested in
2. stop processing if a caller is found what it wanted
`nil` can be passed to get a backward-compatible behavior, i.e. return
all the entries.
A few filters are implemented:
- `PrefixFilter`: filters out all entries not under `prefix`
- `SingleEntryFilter`: looks for a specific entry
Finally, `Mounted()` is modified to use `SingleEntryFilter()`, and
`RecursiveUnmount()` is using `PrefixFilter()`.
Unit tests are added to check filters are working.
[v2: ditch NoFilter, use nil]
[v3: ditch GetMountsFiltered()]
[v4: add unit test for filters]
[v5: switch to gotestyourself]
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Now all of the storage drivers use the field "storage-driver" in their log
messages, which is set to name of the respective driver.
Storage drivers changed:
- Aufs
- Btrfs
- Devicemapper
- Overlay
- Overlay 2
- Zfs
Signed-off-by: Alejandro GonzÃlez Hevia <alejandrgh11@gmail.com>
This was added in #36047 just as a way to make sure the tree is fully
unmounted on shutdown.
For ZFS this could be a breaking change since there was no unmount before.
Someone could have setup the zfs tree themselves. It would be better, if
we really do want the cleanup to actually the unpacked layers checking
for mounts rather than a blind recursive unmount of the root.
BTRFS does not use mounts and does not need to unmount anyway.
These was only an unmount to begin with because for some reason the
btrfs tree was being moutned with `private` propagation.
For the other graphdrivers that still have a recursive unmount here...
these were already being unmounted and performing the recursive unmount
shouldn't break anything. If anyone had anything mounted at the
graphdriver location it would have been unmounted on shutdown anyway.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
The idea behind making the graphdrivers private is to prevent leaking
mounts into other namespaces.
Unfortunately this is not really what happens.
There is one case where this does work, and that is when the namespace
was created before the daemon's namespace.
However with systemd each system servie winds up with it's own mount
namespace. This causes a race betwen daemon startup and other system
services as to if the mount is actually private.
This also means there is a negative impact when other system services
are started while the daemon is running.
Basically there are too many things that the daemon does not have
control over (nor should it) to be able to protect against these kinds
of leakages. One thing is certain, setting the graphdriver roots to
private disconnects the mount ns heirarchy preventing propagation of
unmounts... new mounts are of course not propagated either, but the
behavior is racey (or just bad in the case of restarting services)... so
it's better to just be able to keep mount propagation in tact.
It also does not protect situations like `-v
/var/lib/docker:/var/lib/docker` where all mounts are recursively bound
into the container anyway.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This commit is a set of fixes and improvement for zfs graph driver,
in particular:
1. Remove mount point after umount in `Get()` error path, as well
as in `Put()` (with `MNT_DETACH` flag). This should solve "failed
to remove root filesystem for <ID> .... dataset is busy" error
reported in Moby issue 35642.
To reproduce the issue:
- start dockerd with zfs
- docker run -d --name c1 --rm busybox top
- docker run -d --name c2 --rm busybox top
- docker stop c1
- docker rm c1
Output when the bug is present:
```
Error response from daemon: driver "zfs" failed to remove root
filesystem for XXX : exit status 1: "/sbin/zfs zfs destroy -r
scratch/docker/YYY" => cannot destroy 'scratch/docker/YYY':
dataset is busy
```
Output when the bug is fixed:
```
Error: No such container: c1
```
(as the container has been successfully autoremoved on stop)
2. Fix/improve error handling in `Get()` -- do not try to umount
if `refcount` > 0
3. Simplifies unmount in `Get()`. Specifically, remove call to
`graphdriver.Mounted()` (which checks if fs is mounted using
`statfs()` and check for fs type) and `mount.Unmount()` (which
parses `/proc/self/mountinfo`). Calling `unix.Unmount()` is
simple and sufficient.
4. Add unmounting of driver's home to `Cleanup()`.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This patch adds the capability for the VFS graphdriver to use
XFS project quotas. It reuses the existing quota management
code that was created by overlay2 on XFS.
It doesn't rely on a filesystem whitelist, but instead
the quota-capability detection code.
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Make sure to call C.free on C string allocated using C.CString in every
exit path.
C.CString allocates memory in the C heap using malloc. It is the callers
responsibility to free them. See
https://golang.org/cmd/cgo/#hdr-Go_references_to_C for details.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
This enables docker cp and ADD/COPY docker build support for LCOW.
Originally, the graphdriver.Get() interface returned a local path
to the container root filesystem. This does not work for LCOW, so
the Get() method now returns an interface that LCOW implements to
support copying to and from the container.
Signed-off-by: Akash Gupta <akagup@microsoft.com>
Changes most references of syscall to golang.org/x/sys/
Ones aren't changes include, Errno, Signal and SysProcAttr
as they haven't been implemented in /x/sys/.
Signed-off-by: Christopher Jones <tophj@linux.vnet.ibm.com>
[s390x] switch utsname from unsigned to signed
per 33267e036f
char in s390x in the /x/sys/unix package is now signed, so
change the buildtags
Signed-off-by: Christopher Jones <tophj@linux.vnet.ibm.com>
This allows for easy extension of adding more parameters to existing
parameters list. Otherwise adding a single parameter changes code
at so many places.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Use module name logrus instead of log.
Use logrus.[Error|Warn|Debug|Fatal|Panic|Info]f instead of w/o f
Signed-off-by: Daehyeok Mun <daehyeok@gmail.com>
For things that we can check if they are mounted by using their fsmagic
we should use that and for others do it the slow way.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Now supports setting a containers storage size when using zfs as the
storage engine. By passing in `--storage-opt size=<size>`, the created
container's storage size will be limited to the given size. Note that
the way zfs works, the given specified storage size will be given in
addition to the base container size.
Example:
The node image reports a size of `671M` from `df -h` when started.
Setting `--storage-opt size=2G` will result in a drive the size of
`671M` + `2G`, `2.7G` in total. Available space will be `2.0G`.
The storage size is achieved by setting the zfs option `quota` to the
given size on the zfs volume.
Signed-off-by: Ken Herner <kherner@progress.com>
This makes sure fsdiff doesn't try to unmount things that shouldn't be.
**Note**: This is intended as a temporary solution to have as minor a
change as possible for 1.11.1. A bigger change will be required in order
to support container re-attach.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Since the layer store was introduced, the level above the graphdriver
now differentiates between read/write and read-only layers. This
distinction is useful for graphdrivers that need to take special steps
when creating a layer based on whether it is read-only or not.
Adding this parameter allows the graphdrivers to differentiate, which
in the case of the Windows graphdriver, removes our dependence on parsing
the id of the parent for "-init" in order to infer this information.
This will also set the stage for unblocking some of the layer store
unit tests in the next preview build of Windows.
Signed-off-by: Stefan J. Wernli <swernli@microsoft.com>