Commit graph

977 commits

Author SHA1 Message Date
Kir Kolyshkin
a450a575a6 zfs: fix ebusy on umount
This commit is a set of fixes and improvement for zfs graph driver,
in particular:

1. Remove mount point after umount in `Get()` error path, as well
   as in `Put()` (with `MNT_DETACH` flag). This should solve "failed
   to remove root filesystem for <ID> .... dataset is busy" error
   reported in Moby issue 35642.

To reproduce the issue:

   - start dockerd with zfs
   - docker run -d --name c1 --rm busybox top
   - docker run -d --name c2 --rm busybox top
   - docker stop c1
   - docker rm c1

Output when the bug is present:
```
Error response from daemon: driver "zfs" failed to remove root
filesystem for XXX : exit status 1: "/sbin/zfs zfs destroy -r
scratch/docker/YYY" => cannot destroy 'scratch/docker/YYY':
dataset is busy
```

Output when the bug is fixed:
```
Error: No such container: c1
```
(as the container has been successfully autoremoved on stop)

2. Fix/improve error handling in `Get()` -- do not try to umount
   if `refcount` > 0

3. Simplifies unmount in `Get()`. Specifically, remove call to
   `graphdriver.Mounted()` (which checks if fs is mounted using
   `statfs()` and check for fs type) and `mount.Unmount()` (which
   parses `/proc/self/mountinfo`). Calling `unix.Unmount()` is
   simple and sufficient.

4. Add unmounting of driver's home to `Cleanup()`.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2017-12-01 13:46:47 -08:00
Sargun Dhillon
77a2bc3e5b Fix setting mtimes on directories
Previously, the code would set the mtime on the directories before
creating files in the directory itself. This was problematic
because it resulted in the mtimes on the directories being
incorrectly set. This change makes it so that the mtime is
set only _after_ all of the files have been created.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
2017-12-01 09:12:43 -08:00
Michael Crosby
72e45fd54e
Merge pull request #35618 from kolyshkin/mkdir-all
Fix MkdirAll* and its usage
2017-11-30 11:19:29 -05:00
Tõnis Tiigi
bdd9668b48
Merge pull request #35483 from thaJeztah/disallow-nfs-backing-for-overlay
Disallow overlay/overlay2 on top of NFS
2017-11-29 19:24:58 -08:00
Victor Vieux
11e07e7da6
Merge pull request #35527 from thaJeztah/feature-detect-overlay2
Detect overlay2 support on pre-4.0 kernels
2017-11-29 13:51:26 -08:00
Akihiro Suda
09eb7bcc36
Merge pull request #34948 from euank/public-mounts
Fix EBUSY errors under overlayfs and v4.13+ kernels
2017-11-29 12:48:39 +09:00
Sargun Dhillon
0eac562281 Fix bug, where copy_file_range was still calling legacy copy
There was a small issue here, where it copied the data using
traditional mechanisms, even when copy_file_range was successful.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
2017-11-28 14:59:56 -08:00
Sargun Dhillon
d2b71b2660 Have VFS graphdriver use accelerated in-kernel copy
This change makes the VFS graphdriver use the kernel-accelerated
(copy_file_range) mechanism of copying files, which is able to
leverage reflinks.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
2017-11-28 14:59:56 -08:00
Sargun Dhillon
b467f8b2ef Fix copying hardlinks in graphdriver/copy
Previously, graphdriver/copy would improperly copy hardlinks as just regular
files. This patch changes that behaviour, and instead the code now keeps
track of inode numbers, and if it sees the same inode number again
during the copy loop, it hardlinks it, instead of copying it.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
2017-11-28 14:59:56 -08:00
Sebastiaan van Stijn
955c1f881a
Detect overlay2 support on pre-4.0 kernels
The overlay2 storage-driver requires multiple lower dir
support for overlayFs. Support for this feature was added
in kernel 4.x, but some distros (RHEL 7.4, CentOS 7.4) ship with
an older kernel with this feature backported.

This patch adds feature-detection for multiple lower dirs,
and will perform this feature-detection on pre-4.x kernels
with overlayFS support.

With this patch applied, daemons running on a kernel
with multiple lower dir support will now select "overlay2"
as storage-driver, instead of falling back to "overlay".

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2017-11-28 13:55:33 -08:00
Victor Vieux
9ae697167c
Merge pull request #35528 from thaJeztah/ignore-empty-graphdirs
Skip empty directories on prior graphdriver detection
2017-11-27 17:37:39 -08:00
Kir Kolyshkin
516010e92d Simplify/fix MkdirAll usage
This subtle bug keeps lurking in because error checking for `Mkdir()`
and `MkdirAll()` is slightly different wrt to `EEXIST`/`IsExist`:

 - for `Mkdir()`, `IsExist` error should (usually) be ignored
   (unless you want to make sure directory was not there before)
   as it means "the destination directory was already there"

 - for `MkdirAll()`, `IsExist` error should NEVER be ignored.

Mostly, this commit just removes ignoring the IsExist error, as it
should not be ignored.

Also, there are a couple of cases then IsExist is handled as
"directory already exist" which is wrong. As a result, some code
that never worked as intended is now removed.

NOTE that `idtools.MkdirAndChown()` behaves like `os.MkdirAll()`
rather than `os.Mkdir()` -- so its description is amended accordingly,
and its usage is handled as such (i.e. IsExist error is not ignored).

For more details, a quote from my runc commit 6f82d4b (July 2015):

    TL;DR: check for IsExist(err) after a failed MkdirAll() is both
    redundant and wrong -- so two reasons to remove it.

    Quoting MkdirAll documentation:

    > MkdirAll creates a directory named path, along with any necessary
    > parents, and returns nil, or else returns an error. If path
    > is already a directory, MkdirAll does nothing and returns nil.

    This means two things:

    1. If a directory to be created already exists, no error is
    returned.

    2. If the error returned is IsExist (EEXIST), it means there exists
    a non-directory with the same name as MkdirAll need to use for
    directory. Example: we want to MkdirAll("a/b"), but file "a"
    (or "a/b") already exists, so MkdirAll fails.

    The above is a theory, based on quoted documentation and my UNIX
    knowledge.

    3. In practice, though, current MkdirAll implementation [1] returns
    ENOTDIR in most of cases described in #2, with the exception when
    there is a race between MkdirAll and someone else creating the
    last component of MkdirAll argument as a file. In this very case
    MkdirAll() will indeed return EEXIST.

    Because of #1, IsExist check after MkdirAll is not needed.

    Because of #2 and #3, ignoring IsExist error is just plain wrong,
    as directory we require is not created. It's cleaner to report
    the error now.

    Note this error is all over the tree, I guess due to copy-paste,
    or trying to follow the same usage pattern as for Mkdir(),
    or some not quite correct examples on the Internet.

    [1] https://github.com/golang/go/blob/f9ed2f75/src/os/path.go

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2017-11-27 17:32:12 -08:00
Euan Kemp
af0d589623 graphdriver/overlay{,2}: remove 'merged' on umount
This removes and recreates the merged dir with each umount/mount
respectively.
This is done to make the impact of leaking mountpoints have less
user-visible impact.

It's fairly easy to accidentally leak mountpoints (even if moby doesn't,
other tools on linux like 'unshare' are quite able to incidentally do
so).

As of recently, overlayfs reacts to these mounts being leaked (see

One trick to force an unmount is to remove the mounted directory and
recreate it. Devicemapper now does this, overlay can follow suit.

Signed-off-by: Euan Kemp <euan.kemp@coreos.com>
2017-11-22 14:32:30 -08:00
Euan Kemp
1e214c0952 graphdriver/overlay: minor doc comment cleanup
Signed-off-by: Euan Kemp <euan.kemp@coreos.com>
2017-11-22 14:17:08 -08:00
Sebastiaan van Stijn
1262c57714
Skip empty directories on prior graphdriver detection
When starting the daemon, the `/var/lib/docker` directory
is scanned for existing directories, so that the previously
selected graphdriver will automatically be used.

In some situations, empty directories are present (those
directories can be created during feature detection of
graph-drivers), in which case the daemon refuses to start.

This patch improves detection, and skips empty directories,
so that leftover directories don't cause the daemon to
fail.

Before this change:

    $ mkdir /var/lib/docker /var/lib/docker/aufs /var/lib/docker/overlay2
    $ dockerd
    ...
    Error starting daemon: error initializing graphdriver: /var/lib/docker contains several valid graphdrivers: overlay2, aufs; Please cleanup or explicitly choose storage driver (-s <DRIVER>)

With this patch applied:

    $ mkdir /var/lib/docker /var/lib/docker/aufs /var/lib/docker/overlay2
    $ dockerd
    ...
    INFO[2017-11-16T17:26:43.207739140Z] Docker daemon                                 commit=ab90bc296 graphdriver(s)=overlay2 version=dev
    INFO[2017-11-16T17:26:43.208033095Z] Daemon has completed initialization

And on restart (prior graphdriver is still picked up):

    $ dockerd
    ...
    INFO[2017-11-16T17:27:52.260361465Z] [graphdriver] using prior storage driver: overlay2

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2017-11-21 15:42:04 +01:00
Sebastiaan van Stijn
38b3af567f
Remove deprecated MkdirAllAs(), MkdirAs()
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2017-11-21 13:53:54 +01:00
Victor Vieux
edc204b1ff
Merge pull request #35522 from kolyshkin/gd-custom
graphdriver: custom build-time priority list
2017-11-17 15:48:26 -08:00
Christian Brauner
7e35df0e04
Skip further checks for quota in user namespaces
Commit 7a1618ced3 regresses running Docker
in user namespaces. The new check for whether quota are supported calls
NewControl() which in turn calls makeBackingFsDev() which tries to
mknod(). Skip quota tests when we detect that we are running in a user
namespace and return ErrQuotaNotSupported to the caller. This just
restores the status quo.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2017-11-17 12:57:27 +01:00
Kir Kolyshkin
17708e72a7 graphdriver: custom build-time priority list
Add a way to specify a custom graphdriver priority list
during build. This can be done with something like

  go build -ldflags "-X github.com/docker/docker/daemon/graphdriver.priority=overlay2,devicemapper"

As ldflags are already used by the engine build process, and it seems
that only one (last) `-ldflags` argument is taken into account by go,
an envoronment variable `DOCKER_LDFLAGS` is introduced in order to
be able to append some text to `-ldflags`. With this in place,
using the feature becomes

  make DOCKER_LDFLAGS="-X github.com/docker/docker/daemon/graphdriver.priority=overlay2,devicemapper" dynbinary

The idea behind this is, the priority list might be different
for different distros, so vendors are now able to change it
without patching the source code.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2017-11-16 19:43:34 -08:00
Kir Kolyshkin
d014be5426 daemon/graphdriver/register: separate overlay2
Make it possible to disable overlay and overlay2 separately.

With this commit, we now have `exclude_graphdriver_overlay` and
`exclude_graphdriver_overlay2` build tags for the engine, which
is in line with any other graph driver.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2017-11-15 00:06:00 -08:00
Justin Cormack
0defc69813
Merge pull request #35231 from sargun/add-vfs-quota-support
Add vfs quota support
2017-11-14 15:05:02 +00:00
Sebastiaan van Stijn
90dfb1d0cc
Disallow overlay/overlay2 on top of NFS
From https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt:

> The lower filesystem can be any filesystem supported by Linux and does
> not need to be writable. The lower filesystem can even be another
> overlayfs. The upper filesystem will normally be writable and if it
> is it must support the creation of trusted.* extended attributes, and
> must provide valid d_type in readdir responses, so NFS is not suitable.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2017-11-13 23:24:23 +01:00
Vincent Demeester
bbc4f78ba9
Merge pull request #34573 from cyphar/dm-dos-prevention-remove-mountpoint
devicemapper: remove container rootfs mountPath after umount
2017-11-08 17:08:07 +01:00
Aleksa Sarai
1af8ea681f
devmapper: add a test for mount leak workaround
In order to avoid reverting our fix for mount leakage in devicemapper,
add a test which checks that devicemapper's Get() and Put() cycle can
survive having a command running in an rprivate mount propagation setup
in-between. While this is quite rudimentary, it should be sufficient.

We have to skip this test for pre-3.18 kernels.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-11-08 11:02:11 +11:00
Sargun Dhillon
7a1618ced3 Add quota support to VFS graphdriver
This patch adds the capability for the VFS graphdriver to use
XFS project quotas. It reuses the existing quota management
code that was created by overlay2 on XFS.

It doesn't rely on a filesystem whitelist, but instead
the quota-capability detection code.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
2017-11-06 15:53:51 -08:00
Yong Tang
4785f1a7ab Remove solaris build tag and `contrib/mkimage/solaris
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
2017-11-02 00:01:46 +00:00
Sebastiaan van Stijn
226eb8004e
Merge pull request #35177 from sargun/add-quota-tests
Add tests to project quotas and detection mechanism
2017-10-30 21:08:38 +01:00
Sargun Dhillon
6966dc0aa9 Add tests to project quotas and detection mechanism
This adds a mechanism (read-only) to check for project quota support
in a standard way. This mechanism is leveraged by the tests, which
test for the following:
 1. Can we get a quota controller?
 2. Can we set the quota for a particular directory?
 3. Is the quota being over-enforced?
 4. Is the quota being under-enforced?
 5. Can we retrieve the quota?

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
2017-10-27 11:07:37 -07:00
Sebastiaan van Stijn
8f702de9b7
Improve devicemapper driver-status output
Do not print "Data file" and "Metadata file" if they're
not used, and sort/group output.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2017-10-27 10:12:39 +02:00
Sebastiaan van Stijn
ce5800c329 Merge pull request #34670 from sargun/use_copy_file_range
Use In-kernel File Copy for Overlayfs and VFS on Linux
2017-10-25 17:10:44 +02:00
Sargun Dhillon
3ec4ec2857 Add zero-copy support to copy module
This changeset allows Docker's VFS, and Overlay to take advantage of
Linux's zerocopy APIs.

The copy function first tries to use the ficlone ioctl. Reason being:
 - they do not allow partial success (aka short writes)
 - clones are expected to be a fast metadata operation
See: http://oss.sgi.com/archives/xfs/2015-12/msg00356.html

If the clone fails, we fall back to copy_file_range, which internally
may fall back to splice, which has an upper limit on the size
of copy it can perform. Given that, we have to loop until the copy
is done.

For a given dirCopy operation, if the clone fails, we will not try
it again during any other file copy. Same is true with copy_file_range.

If all else fails, we fall back to traditional copy.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
2017-10-24 13:14:40 -07:00
Sargun Dhillon
5298785b8e Separate daemon/graphdriver/overlay/copy into its own package
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
2017-10-24 13:14:40 -07:00
Michael Crosby
5a9b5f10cf Remove solaris files
For obvious reasons that it is not really supported now.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-10-24 15:39:34 -04:00
John Howard
0380fbff37 LCOW: API: Add platform to /images/create and /build
Signed-off-by: John Howard <jhoward@microsoft.com>

This PR has the API changes described in https://github.com/moby/moby/issues/34617.
Specifically, it adds an HTTP header "X-Requested-Platform" which is a JSON-encoded
OCI Image-spec `Platform` structure.

In addition, it renames (almost all) uses of a string variable platform (and associated)
methods/functions to os. This makes it much clearer to disambiguate with the swarm
"platform" which is really os/arch. This is a stepping stone to getting the daemon towards
fully multi-platform/arch-aware, and makes it clear when "operating system" is being
referred to rather than "platform" which is misleadingly used - sometimes in the swarm
meaning, but more often as just the operating system.
2017-10-06 11:44:18 -07:00
Yong Tang
595b929c57 Merge pull request #34342 from coolljt0725/fallback_to_naive_diff
Fallback to use naive diff driver if enable CONFIG_OVERLAY_FS_REDIRECT_DIR
2017-10-03 06:45:17 -07:00
Yuhao Fang
c673319dea fix typo
Signed-off-by: Yuhao Fang <fangyuhao@gmail.com>
2017-10-01 23:11:58 +08:00
Lei Jitang
49c3a7c4ba Fallback to use naive diff driver if enable CONFIG_OVERLAY_FS_REDIRECT_DIR
When use overlay2 as the graphdriver and the kernel enable
`CONFIG_OVERLAY_FS_REDIRECT_DIR=y`, rename a dir in lower layer
will has a xattr to redirct its dir to source dir. This make the
image layer unportable. This patch fallback to use naive diff driver
when kernel enable CONFIG_OVERLAY_FS_REDIRECT_DIR

Signed-off-by: Lei Jitang <leijitang@huawei.com>
2017-09-22 09:40:18 +08:00
Yong Tang
777d4a1bf4 Merge pull request #34861 from tklauser/fix-cstring-leaks
Fix CString memory leaks
2017-09-21 09:14:07 -07:00
Yong Tang
48cce22933 Merge pull request #34914 from euank/000003-percent
overlay2: fix faulty errcheck
2017-09-20 19:52:10 -07:00
Euan Kemp
639ab92f01 overlay2: fix faulty errcheck
The change in 7a7357dae1 inadvertently
changed the `defer` error code into a no-op. This restores its behavior
prior to that code change, and also introduces a little more error
logging.

Signed-off-by: Euan Kemp <euan.kemp@coreos.com>
2017-09-20 15:25:57 -07:00
Yong Tang
e40d5e665c Merge pull request #34863 from keloyang/close-pipe
Close pipe in overlay2 graphdriver
2017-09-20 09:37:15 -07:00
Yong Tang
3fa72d38ec Merge pull request #34721 from kinvolk/iaguis/add-missing-ecryptfs-string
Add missing eCryptfs translation to FsNames
2017-09-19 05:45:24 -07:00
Sebastiaan van Stijn
13e8a7a006 Merge pull request #34891 from Microsoft/jjh/fixcomment
LCOW: Fix comment in graphdriver code
2017-09-19 14:43:35 +02:00
John Howard
f9fc269c20 LCOW: Fix comment in graphdriver code
Signed-off-by: John Howard <jhoward@microsoft.com>
2017-09-18 19:52:55 -07:00
Shukui Yang
9f38923901 Close pipe if mountFrom failed.
Signed-off-by: Shukui Yang <yangshukui@huawei.com>
2017-09-19 01:25:39 +00:00
Yong Tang
65e88d996a Merge pull request #34759 from kolyshkin/gometalinter
Gometalinter fixups for non-x86
2017-09-18 13:44:15 -07:00
Iago López Galeiras
ddb31b4fdf Add missing eCryptfs translation to FsNames
It was causing the error message to be

    'overlay' is not supported over <unknown>

instead of

    'overlay' is not supported over ecryptfs

Signed-off-by: Iago López Galeiras <iago@kinvolk.io>
2017-09-18 19:06:13 +02:00
Kir Kolyshkin
b569f57890 overlay gd: fix build for 32-bit ARM
This commit reverts a hunk of commit 2f5f0af3f ("Add unconvert linter")
and adds a hint for unconvert linter to ignore excessive conversion as
it is required on 32-bit platforms (e.g. armhf).

The exact error on armhf is this:

	19:06:45 ---> Making bundle: dynbinary (in bundles/17.06.0-dev/dynbinary)
	19:06:48 Building: bundles/17.06.0-dev/dynbinary-daemon/dockerd-17.06.0-dev
	19:10:58 # github.com/docker/docker/daemon/graphdriver/overlay
	19:10:58 daemon/graphdriver/overlay/copy.go:161: cannot use stat.Atim.Sec (type int32) as type int64 in argument to time.Unix
	19:10:58 daemon/graphdriver/overlay/copy.go:161: cannot use stat.Atim.Nsec (type int32) as type int64 in argument to time.Unix
	19:10:58 daemon/graphdriver/overlay/copy.go:162: cannot use stat.Mtim.Sec (type int32) as type int64 in argument to time.Unix
	19:10:58 daemon/graphdriver/overlay/copy.go:162: cannot use stat.Mtim.Nsec (type int32) as type int64 in argument to time.Unix

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2017-09-17 22:04:31 -07:00
Kir Kolyshkin
c21245c920 devmapper: tell why xfs is not supported
Instead of providing a generic message listing all possible reasons
why xfs is not available on the system, let's be specific.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2017-09-17 22:04:31 -07:00
Kir Kolyshkin
46833ee1c3 devmapper: show dmesg if mount fails
If mount fails, the reason might be right there in the kernel log ring buffer.
Let's include it in the error message, it might be of great help.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2017-09-17 22:04:31 -07:00