When go-1.11beta1 is used for building, the following error is
reported:
> 14:56:20 daemon\graphdriver\lcow\lcow.go:236: Debugf format %s reads
> arg #2, but call has 1 arg
While fixing this, let's also fix a few other things in this
very function (startServiceVMIfNotRunning):
1. Do not use fmt.Printf when not required.
2. Use `title` whenever possible.
3. Don't add `id` to messages as `title` already has it.
4. Remove duplicated colons.
5. Try to unify style of messages.
6. s/startservicevmifnotrunning/startServiceVMIfNotRunning/
...
In general, logging/debugging here is a mess and requires much more
love than I can give it at the moment. Areas for improvement:
1. Add a global var logger = logrus.WithField("storage-driver", "lcow")
and use it everywhere else in the code.
2. Use logger.WithField("id", id) whenever possible (same for "context"
and other similar fields).
3. Revise all the errors returned to be uniform.
4. Make use of errors.Wrap[f] whenever possible.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Fix the following go-1.11beta1 build error:
> daemon/graphdriver/aufs/aufs.go:376: Wrapf format %s reads arg #1, but call has 0 args
While at it, change '%s' to %q.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This makes it a bit simpler to remove this interface for v2 plugins
and not break external projects (libnetwork and swarmkit).
Note that before we remove the `Client()` interface from `CompatPlugin`
libnetwork and swarmkit must be updated to explicitly check for the v1
client interface as is done int his PR.
This is just a minor tweak that I realized is needed after trying to
implement the needed changes on libnetwork.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
In case aufs driver is not supported because supportsAufs() said so,
it is not possible to get a real reason from the logs.
To fix, log the error returned.
Note we're not using WithError here as the error message itself is the
sole message we want to print (i.e. there's nothing to add to it).
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
ext4 support d_type by default, but filetype feature is a tunable so
there is still a chance to disable it for some reasons. In this case,
print additional message to explicitly tell how to support d_type.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
The overlay storage driver currently does not support any option, but was silently
ignoring any option that was passed.
This patch verifies that no options are passed, and if they are passed will produce
an error.
Before this change:
dockerd --storage-driver=overlay --storage-opt dm.thinp_percent=95
INFO[2018-05-11T11:40:40.996597152Z] libcontainerd: started new docker-containerd process pid=256
....
INFO[2018-05-11T11:40:41.135392535Z] Daemon has completed initialization
INFO[2018-05-11T11:40:41.141035093Z] API listen on /var/run/docker.sock
After this change:
dockerd --storage-driver=overlay --storage-opt dm.thinp_percent=95
INFO[2018-05-11T11:39:21.632610319Z] libcontainerd: started new docker-containerd process pid=233
....
Error starting daemon: error initializing graphdriver: overlay: unknown option dm.thinp_percent
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Functions `GetMounts()` and `parseMountTable()` return all the entries
as read and parsed from /proc/self/mountinfo. In many cases the caller
is only interested only one or a few entries, not all of them.
One good example is `Mounted()` function, which looks for a specific
entry only. Another example is `RecursiveUnmount()` which is only
interested in mount under a specific path.
This commit adds `filter` argument to `GetMounts()` to implement
two things:
1. filter out entries a caller is not interested in
2. stop processing if a caller is found what it wanted
`nil` can be passed to get a backward-compatible behavior, i.e. return
all the entries.
A few filters are implemented:
- `PrefixFilter`: filters out all entries not under `prefix`
- `SingleEntryFilter`: looks for a specific entry
Finally, `Mounted()` is modified to use `SingleEntryFilter()`, and
`RecursiveUnmount()` is using `PrefixFilter()`.
Unit tests are added to check filters are working.
[v2: ditch NoFilter, use nil]
[v3: ditch GetMountsFiltered()]
[v4: add unit test for filters]
[v5: switch to gotestyourself]
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
commit 617c352e92 "Don't create devices if in a user namespace"
introduced check, which meant to skip mknod operation when run
in user namespace, but instread skipped FIFO and socket files
copy.
Signed-off-by: Maxim Ivanov <ivanov.maxim@gmail.com>
Now all of the storage drivers use the field "storage-driver" in their log
messages, which is set to name of the respective driver.
Storage drivers changed:
- Aufs
- Btrfs
- Devicemapper
- Overlay
- Overlay 2
- Zfs
Signed-off-by: Alejandro GonzÃlez Hevia <alejandrgh11@gmail.com>
Move the "unmount and deactivate" code into a separate method, and
optimize it a bit:
1. Do not use filepath.Walk() as there's no requirement to recursively
go into every directory under home/mnt; a list of directories in mnt
is sufficient. With filepath.Walk(), in case some container will fail
to unmount, it'll go through the whole container filesystem which is
excessive and useless.
2. Do not use GetMounts() and check if a directory is mounted; just
unmount it and ignore "not mounted" error. Note the same error
is returned in case of wrong flags set, but as flags are hardcoded
we can safely ignore such case.
While at it, promote "can't unmount" log level from debug to warning.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
1. Make sure it's clear the error is from unmount.
2. Simplify the code a bit to make it more readable.
[v2: use errors.Wrap]
[v3: use errors.Wrapf]
[v4: lowercase the error message]
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
1. Replace EnsureRemoveAll() with Rmdir(), as here we are removing
the container's mount point, which is already properly unmounted
and is therefore an empty directory.
2. Ignore the Rmdir() error (but log it unless it's ENOENT). This
is a mount point, currently unmounted (i.e. an empty directory),
and an older kernel can return EBUSY if e.g. the mount was
leaked to other mount namespaces.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Before this change, volume management was relying on the fact that
everything the plugin mounts is visible on the host within the plugin's
rootfs. In practice this caused some issues with mount leaks, so we
changed the behavior such that mounts are not visible on the plugin's
rootfs, but available outside of it, which breaks volume management.
To fix the issue, allow the plugin to scope the path correctly rather
than assuming that everything is visible in `p.Rootfs`.
In practice this is just scoping the `PropagatedMount` paths to the
correct host path.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Setting up the mounts on the host increases chances of mount leakage and
makes for more cleanup after the plugin has stopped.
With this change all mounts for the plugin are performed by the
container runtime and automatically cleaned up when the container exits.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This was added in #36047 just as a way to make sure the tree is fully
unmounted on shutdown.
For ZFS this could be a breaking change since there was no unmount before.
Someone could have setup the zfs tree themselves. It would be better, if
we really do want the cleanup to actually the unpacked layers checking
for mounts rather than a blind recursive unmount of the root.
BTRFS does not use mounts and does not need to unmount anyway.
These was only an unmount to begin with because for some reason the
btrfs tree was being moutned with `private` propagation.
For the other graphdrivers that still have a recursive unmount here...
these were already being unmounted and performing the recursive unmount
shouldn't break anything. If anyone had anything mounted at the
graphdriver location it would have been unmounted on shutdown anyway.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
The idea behind making the graphdrivers private is to prevent leaking
mounts into other namespaces.
Unfortunately this is not really what happens.
There is one case where this does work, and that is when the namespace
was created before the daemon's namespace.
However with systemd each system servie winds up with it's own mount
namespace. This causes a race betwen daemon startup and other system
services as to if the mount is actually private.
This also means there is a negative impact when other system services
are started while the daemon is running.
Basically there are too many things that the daemon does not have
control over (nor should it) to be able to protect against these kinds
of leakages. One thing is certain, setting the graphdriver roots to
private disconnects the mount ns heirarchy preventing propagation of
unmounts... new mounts are of course not propagated either, but the
behavior is racey (or just bad in the case of restarting services)... so
it's better to just be able to keep mount propagation in tact.
It also does not protect situations like `-v
/var/lib/docker:/var/lib/docker` where all mounts are recursively bound
into the container anyway.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This is a fix to regression in vfs graph driver introduced by
commit 7a1618ced3 ("add quota support to VFS graphdriver").
On some filesystems, vfs fails to init with the following error:
> Error starting daemon: error initializing graphdriver: Failed to mknod
> /go/src/github.com/docker/docker/bundles/test-integration/d6bcf6de610e9/root/vfs/backingFsBlockDev:
> function not implemented
As quota is not essential for vfs, let's ignore (but log as a warning) any error
from quota init.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
If mknod() returns ENOSYS, it most probably means quota is not supported
here, so return the appropriate error.
This is a conservative* fix to regression in vfs graph driver introduced
by commit 7a1618ced3 ("add quota support to VFS graphdriver").
On some filesystems, vfs fails to init with the following error:
> Error starting daemon: error initializing graphdriver: Failed to mknod
> /go/src/github.com/docker/docker/bundles/test-integration/d6bcf6de610e9/root/vfs/backingFsBlockDev:
> function not implemented
Reported-by: Brian Goff <cpuguy83@gmail.com>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Files that are suffixed with `_linux.go` or `_windows.go` are
already only built on Linux / Windows, so these build-tags
were redundant.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The overlay2 driver was not setting up the archive.TarOptions field
properly like other storage backend routes to "applyTarLayer"
functionality. The InUserNS field is populated now for overlay2 using
the same query function used by the other storage drivers.
Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com>
Right now we only log source and destination (and demsg) if mount operation
fails. fstype and mount options are available easily. It probably is a good
idea to log these as well. Especially sometimes failures can happen due to
mount options.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Even though it's highly discouraged, there are existing
installs that are running overlay/overlay2 on filesystems
without d_type support.
This patch allows the daemon to start in such cases, instead of
refusing to start without an option to override.
For fresh installs, backing filesystems without d_type support
will still cause the overlay/overlay2 drivers to be marked as
"unsupported", and skipped during the automatic selection.
This feature is only to keep backward compatibility, but
will be removed at some point.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Support for running overlay/overlay2 on a backing filesystem
without d_type support (most likely: xfs, as ext4 supports
this by default), was deprecated for some time.
Running without d_type support is problematic, and can
lead to difficult to debug issues ("invalid argument" errors,
or unable to remove files from the container's filesystem).
This patch turns the warning that was previously printed
into an "unsupported" error, so that the overlay/overlay2
drivers are not automatically selected when detecting supported
storage drivers.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The fsmagic check was always performed on "data-root" (`/var/lib/docker`),
not on the storage-driver's home directory (e.g. `/var/lib/docker/<somedriver>`).
This caused detection to be done on the wrong filesystem in situations
where `/var/lib/docker/<somedriver>` was a mount, and a different
filesystem than `/var/lib/docker` itself.
This patch checks if the storage-driver's home directory exists, and only
falls back to `/var/lib/docker` if it doesn't exist.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This commit is a set of fixes and improvement for zfs graph driver,
in particular:
1. Remove mount point after umount in `Get()` error path, as well
as in `Put()` (with `MNT_DETACH` flag). This should solve "failed
to remove root filesystem for <ID> .... dataset is busy" error
reported in Moby issue 35642.
To reproduce the issue:
- start dockerd with zfs
- docker run -d --name c1 --rm busybox top
- docker run -d --name c2 --rm busybox top
- docker stop c1
- docker rm c1
Output when the bug is present:
```
Error response from daemon: driver "zfs" failed to remove root
filesystem for XXX : exit status 1: "/sbin/zfs zfs destroy -r
scratch/docker/YYY" => cannot destroy 'scratch/docker/YYY':
dataset is busy
```
Output when the bug is fixed:
```
Error: No such container: c1
```
(as the container has been successfully autoremoved on stop)
2. Fix/improve error handling in `Get()` -- do not try to umount
if `refcount` > 0
3. Simplifies unmount in `Get()`. Specifically, remove call to
`graphdriver.Mounted()` (which checks if fs is mounted using
`statfs()` and check for fs type) and `mount.Unmount()` (which
parses `/proc/self/mountinfo`). Calling `unix.Unmount()` is
simple and sufficient.
4. Add unmounting of driver's home to `Cleanup()`.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Previously, the code would set the mtime on the directories before
creating files in the directory itself. This was problematic
because it resulted in the mtimes on the directories being
incorrectly set. This change makes it so that the mtime is
set only _after_ all of the files have been created.
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
There was a small issue here, where it copied the data using
traditional mechanisms, even when copy_file_range was successful.
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
This change makes the VFS graphdriver use the kernel-accelerated
(copy_file_range) mechanism of copying files, which is able to
leverage reflinks.
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Previously, graphdriver/copy would improperly copy hardlinks as just regular
files. This patch changes that behaviour, and instead the code now keeps
track of inode numbers, and if it sees the same inode number again
during the copy loop, it hardlinks it, instead of copying it.
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
The overlay2 storage-driver requires multiple lower dir
support for overlayFs. Support for this feature was added
in kernel 4.x, but some distros (RHEL 7.4, CentOS 7.4) ship with
an older kernel with this feature backported.
This patch adds feature-detection for multiple lower dirs,
and will perform this feature-detection on pre-4.x kernels
with overlayFS support.
With this patch applied, daemons running on a kernel
with multiple lower dir support will now select "overlay2"
as storage-driver, instead of falling back to "overlay".
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This subtle bug keeps lurking in because error checking for `Mkdir()`
and `MkdirAll()` is slightly different wrt to `EEXIST`/`IsExist`:
- for `Mkdir()`, `IsExist` error should (usually) be ignored
(unless you want to make sure directory was not there before)
as it means "the destination directory was already there"
- for `MkdirAll()`, `IsExist` error should NEVER be ignored.
Mostly, this commit just removes ignoring the IsExist error, as it
should not be ignored.
Also, there are a couple of cases then IsExist is handled as
"directory already exist" which is wrong. As a result, some code
that never worked as intended is now removed.
NOTE that `idtools.MkdirAndChown()` behaves like `os.MkdirAll()`
rather than `os.Mkdir()` -- so its description is amended accordingly,
and its usage is handled as such (i.e. IsExist error is not ignored).
For more details, a quote from my runc commit 6f82d4b (July 2015):
TL;DR: check for IsExist(err) after a failed MkdirAll() is both
redundant and wrong -- so two reasons to remove it.
Quoting MkdirAll documentation:
> MkdirAll creates a directory named path, along with any necessary
> parents, and returns nil, or else returns an error. If path
> is already a directory, MkdirAll does nothing and returns nil.
This means two things:
1. If a directory to be created already exists, no error is
returned.
2. If the error returned is IsExist (EEXIST), it means there exists
a non-directory with the same name as MkdirAll need to use for
directory. Example: we want to MkdirAll("a/b"), but file "a"
(or "a/b") already exists, so MkdirAll fails.
The above is a theory, based on quoted documentation and my UNIX
knowledge.
3. In practice, though, current MkdirAll implementation [1] returns
ENOTDIR in most of cases described in #2, with the exception when
there is a race between MkdirAll and someone else creating the
last component of MkdirAll argument as a file. In this very case
MkdirAll() will indeed return EEXIST.
Because of #1, IsExist check after MkdirAll is not needed.
Because of #2 and #3, ignoring IsExist error is just plain wrong,
as directory we require is not created. It's cleaner to report
the error now.
Note this error is all over the tree, I guess due to copy-paste,
or trying to follow the same usage pattern as for Mkdir(),
or some not quite correct examples on the Internet.
[1] https://github.com/golang/go/blob/f9ed2f75/src/os/path.go
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This removes and recreates the merged dir with each umount/mount
respectively.
This is done to make the impact of leaking mountpoints have less
user-visible impact.
It's fairly easy to accidentally leak mountpoints (even if moby doesn't,
other tools on linux like 'unshare' are quite able to incidentally do
so).
As of recently, overlayfs reacts to these mounts being leaked (see
One trick to force an unmount is to remove the mounted directory and
recreate it. Devicemapper now does this, overlay can follow suit.
Signed-off-by: Euan Kemp <euan.kemp@coreos.com>
When starting the daemon, the `/var/lib/docker` directory
is scanned for existing directories, so that the previously
selected graphdriver will automatically be used.
In some situations, empty directories are present (those
directories can be created during feature detection of
graph-drivers), in which case the daemon refuses to start.
This patch improves detection, and skips empty directories,
so that leftover directories don't cause the daemon to
fail.
Before this change:
$ mkdir /var/lib/docker /var/lib/docker/aufs /var/lib/docker/overlay2
$ dockerd
...
Error starting daemon: error initializing graphdriver: /var/lib/docker contains several valid graphdrivers: overlay2, aufs; Please cleanup or explicitly choose storage driver (-s <DRIVER>)
With this patch applied:
$ mkdir /var/lib/docker /var/lib/docker/aufs /var/lib/docker/overlay2
$ dockerd
...
INFO[2017-11-16T17:26:43.207739140Z] Docker daemon commit=ab90bc296 graphdriver(s)=overlay2 version=dev
INFO[2017-11-16T17:26:43.208033095Z] Daemon has completed initialization
And on restart (prior graphdriver is still picked up):
$ dockerd
...
INFO[2017-11-16T17:27:52.260361465Z] [graphdriver] using prior storage driver: overlay2
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Commit 7a1618ced3 regresses running Docker
in user namespaces. The new check for whether quota are supported calls
NewControl() which in turn calls makeBackingFsDev() which tries to
mknod(). Skip quota tests when we detect that we are running in a user
namespace and return ErrQuotaNotSupported to the caller. This just
restores the status quo.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Add a way to specify a custom graphdriver priority list
during build. This can be done with something like
go build -ldflags "-X github.com/docker/docker/daemon/graphdriver.priority=overlay2,devicemapper"
As ldflags are already used by the engine build process, and it seems
that only one (last) `-ldflags` argument is taken into account by go,
an envoronment variable `DOCKER_LDFLAGS` is introduced in order to
be able to append some text to `-ldflags`. With this in place,
using the feature becomes
make DOCKER_LDFLAGS="-X github.com/docker/docker/daemon/graphdriver.priority=overlay2,devicemapper" dynbinary
The idea behind this is, the priority list might be different
for different distros, so vendors are now able to change it
without patching the source code.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Make it possible to disable overlay and overlay2 separately.
With this commit, we now have `exclude_graphdriver_overlay` and
`exclude_graphdriver_overlay2` build tags for the engine, which
is in line with any other graph driver.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
From https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt:
> The lower filesystem can be any filesystem supported by Linux and does
> not need to be writable. The lower filesystem can even be another
> overlayfs. The upper filesystem will normally be writable and if it
> is it must support the creation of trusted.* extended attributes, and
> must provide valid d_type in readdir responses, so NFS is not suitable.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
In order to avoid reverting our fix for mount leakage in devicemapper,
add a test which checks that devicemapper's Get() and Put() cycle can
survive having a command running in an rprivate mount propagation setup
in-between. While this is quite rudimentary, it should be sufficient.
We have to skip this test for pre-3.18 kernels.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
This patch adds the capability for the VFS graphdriver to use
XFS project quotas. It reuses the existing quota management
code that was created by overlay2 on XFS.
It doesn't rely on a filesystem whitelist, but instead
the quota-capability detection code.
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
This adds a mechanism (read-only) to check for project quota support
in a standard way. This mechanism is leveraged by the tests, which
test for the following:
1. Can we get a quota controller?
2. Can we set the quota for a particular directory?
3. Is the quota being over-enforced?
4. Is the quota being under-enforced?
5. Can we retrieve the quota?
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
This changeset allows Docker's VFS, and Overlay to take advantage of
Linux's zerocopy APIs.
The copy function first tries to use the ficlone ioctl. Reason being:
- they do not allow partial success (aka short writes)
- clones are expected to be a fast metadata operation
See: http://oss.sgi.com/archives/xfs/2015-12/msg00356.html
If the clone fails, we fall back to copy_file_range, which internally
may fall back to splice, which has an upper limit on the size
of copy it can perform. Given that, we have to loop until the copy
is done.
For a given dirCopy operation, if the clone fails, we will not try
it again during any other file copy. Same is true with copy_file_range.
If all else fails, we fall back to traditional copy.
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Signed-off-by: John Howard <jhoward@microsoft.com>
This PR has the API changes described in https://github.com/moby/moby/issues/34617.
Specifically, it adds an HTTP header "X-Requested-Platform" which is a JSON-encoded
OCI Image-spec `Platform` structure.
In addition, it renames (almost all) uses of a string variable platform (and associated)
methods/functions to os. This makes it much clearer to disambiguate with the swarm
"platform" which is really os/arch. This is a stepping stone to getting the daemon towards
fully multi-platform/arch-aware, and makes it clear when "operating system" is being
referred to rather than "platform" which is misleadingly used - sometimes in the swarm
meaning, but more often as just the operating system.
When use overlay2 as the graphdriver and the kernel enable
`CONFIG_OVERLAY_FS_REDIRECT_DIR=y`, rename a dir in lower layer
will has a xattr to redirct its dir to source dir. This make the
image layer unportable. This patch fallback to use naive diff driver
when kernel enable CONFIG_OVERLAY_FS_REDIRECT_DIR
Signed-off-by: Lei Jitang <leijitang@huawei.com>
The change in 7a7357dae1 inadvertently
changed the `defer` error code into a no-op. This restores its behavior
prior to that code change, and also introduces a little more error
logging.
Signed-off-by: Euan Kemp <euan.kemp@coreos.com>
It was causing the error message to be
'overlay' is not supported over <unknown>
instead of
'overlay' is not supported over ecryptfs
Signed-off-by: Iago López Galeiras <iago@kinvolk.io>
This commit reverts a hunk of commit 2f5f0af3f ("Add unconvert linter")
and adds a hint for unconvert linter to ignore excessive conversion as
it is required on 32-bit platforms (e.g. armhf).
The exact error on armhf is this:
19:06:45 ---> Making bundle: dynbinary (in bundles/17.06.0-dev/dynbinary)
19:06:48 Building: bundles/17.06.0-dev/dynbinary-daemon/dockerd-17.06.0-dev
19:10:58 # github.com/docker/docker/daemon/graphdriver/overlay
19:10:58 daemon/graphdriver/overlay/copy.go:161: cannot use stat.Atim.Sec (type int32) as type int64 in argument to time.Unix
19:10:58 daemon/graphdriver/overlay/copy.go:161: cannot use stat.Atim.Nsec (type int32) as type int64 in argument to time.Unix
19:10:58 daemon/graphdriver/overlay/copy.go:162: cannot use stat.Mtim.Sec (type int32) as type int64 in argument to time.Unix
19:10:58 daemon/graphdriver/overlay/copy.go:162: cannot use stat.Mtim.Nsec (type int32) as type int64 in argument to time.Unix
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Instead of providing a generic message listing all possible reasons
why xfs is not available on the system, let's be specific.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
If mount fails, the reason might be right there in the kernel log ring buffer.
Let's include it in the error message, it might be of great help.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Since the update to Debian Stretch, devmapper unit test fails. One
reason is, the combination of somewhat old (less than 3.16) kernel and
relatively new xfsprogs leads to creating a filesystem which is not supported
by the kernel:
> [12206.467518] XFS (dm-1): Superblock has unknown read-only compatible features (0x1) enabled.
> [12206.472046] XFS (dm-1): Attempted to mount read-only compatible filesystem read-write.
> Filesystem can only be safely mounted read only.
> [12206.472079] XFS (dm-1): SB validate failed with error 22.
Ideally, that would be automatically and implicitly handled by xfsprogs.
In real life, we have to take care about it here. Sigh.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Static build with devmapper is impossible now since libudev is required
and no static version of libudev is available (as static libraries are
not supported by systemd which udev is part of).
This should not hurt anyone as "[t]he primary user of static builds
is the Editions, and docker in docker via the containers, and none
of those use device mapper".
Also, since the need for static libdevmapper is gone, there is no need
to self-compile libdevmapper -- let's use the one from Debian Stretch.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Make sure to call C.free on C string allocated using C.CString in every
exit path.
C.CString allocates memory in the C heap using malloc. It is the callers
responsibility to free them. See
https://golang.org/cmd/cgo/#hdr-Go_references_to_C for details.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
This enables docker cp and ADD/COPY docker build support for LCOW.
Originally, the graphdriver.Get() interface returned a local path
to the container root filesystem. This does not work for LCOW, so
the Get() method now returns an interface that LCOW implements to
support copying to and from the container.
Signed-off-by: Akash Gupta <akagup@microsoft.com>
This commit reverts a hunk of commit 2f5f0af3f ("Add unconvert linter")
and adds a hint for unconvert linter to ignore excessive conversion as
it is required on 32-bit platforms (e.g. armhf).
The exact error on armhf is this:
19:06:45 ---> Making bundle: dynbinary (in bundles/17.06.0-dev/dynbinary)
19:06:48 Building: bundles/17.06.0-dev/dynbinary-daemon/dockerd-17.06.0-dev
19:10:58 # github.com/docker/docker/daemon/graphdriver/overlay
19:10:58 daemon/graphdriver/overlay/copy.go:161: cannot use stat.Atim.Sec (type int32) as type int64 in argument to time.Unix
19:10:58 daemon/graphdriver/overlay/copy.go:161: cannot use stat.Atim.Nsec (type int32) as type int64 in argument to time.Unix
19:10:58 daemon/graphdriver/overlay/copy.go:162: cannot use stat.Mtim.Sec (type int32) as type int64 in argument to time.Unix
19:10:58 daemon/graphdriver/overlay/copy.go:162: cannot use stat.Mtim.Nsec (type int32) as type int64 in argument to time.Unix
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
libdm currently has a fairly substantial DoS bug that makes certain
operations fail on a libdm device if the device has active references
through mountpoints. This is a significant problem with the advent of
mount namespaces and MS_PRIVATE, and can cause certain --volume mounts
to cause libdm to no longer be able to remove containers:
% docker run -d --name testA busybox top
% docker run -d --name testB -v /var/lib/docker:/docker busybox top
% docker rm -f testA
[fails on libdm with dm_task_run errors.]
This also solves the problem of unprivileged users being able to DoS
docker by using unprivileged mount namespaces to preseve mounts that
Docker has dropped.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
In d42dbdd3d4 the code was re-arranged to
better report errors, and ignore non-errors.
In doing so we removed a deferred remove of the AUFS diff path, but did
not replace it with a non-deferred one.
This fixes the issue and makes the code a bit more readable.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
I was able to successfully use device mapper autoconfig feature
(commit 5ef07d79c) but it stopped working after a reboot.
Investigation shown that the dm device was not activated because of
a missing binary, that is not used during initial setup, but every
following time. Here's an error shown when trying to manually activate
the device:
> kir@kd:~/go/src/github.com/docker/docker$ sudo lvchange -a y /dev/docker/thinpool
> /usr/sbin/thin_check: execvp failed: No such file or directory
> Check of pool docker/thinpool failed (status:2). Manual repair required!
Surely, there is no solution to this other than to have a package that
provides the thin_check binary installed beforehand. Due to the fact
the issue revealed itself way later than DM setup was performed, it was
somewhat harder to investigate.
With this in mind, let's check for binary presense before setting up DM,
refusing to proceed if the binary is not there, saving a user from later
frustration.
While at it, eliminate repeated binary checking code. The downside is
that the binary lookup is happening more than once now -- I think the
clarity of code overweights this minor de-optimization.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
I tried using dm.directlvm_device but it ended up with the following
error:
> Error starting daemon: error initializing graphdriver: error
> writing docker thinp autoextend profile: open
> /etc/lvm/profile/docker-thinpool.profile: no such file or directory
The reason is /etc/lvm/profile directory does not exist. I think it is
better to try creating it beforehand.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: John Howard <jhoward@microsoft.com>
This changes the graphdriver to perform dynamic sandbox management.
Previously, as a temporary 'hack', the service VM had a prebuilt
sandbox in it. With this change, management is under the control
of the client (docker) and executes a mkfs.ext4 on it. This enables
sandboxes of non-default sizes too (a TODO previously in the code).
It also addresses https://github.com/moby/moby/pull/33969#discussion_r127287887
Requires:
- go-winio: v0.4.3
- opengcs: v0.0.12
- hcsshim: v0.6.x
Make sure user understands this is about the in-kernel driver
(not the dockerd driver or smth).
While at it, amend the comment as well.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Switch some more usage of the Stat function and the Stat_t type from the
syscall package to golang.org/x/sys. Those were missing in PR #33399.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Specifically, none of the graphdrivers are supposed to return a
not-exist type of error on remove (or at least that's how they are
currently handled).
Found that AUFS still had one case where a not-exist error could escape,
when checking if the directory is mounted we call a `Statfs` on the
path.
This fixes AUFS to not return an error in this case, but also
double-checks at the daemon level on layer remove that the error is not
a `not-exist` type of error.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Signed-off-by: John Howard <jhoward@microsoft.com>
This changes the LCOW driver to support both global SVM lifetime and
per-instance lifetime. It also corrects the scratch implementation.
Changes most references of syscall to golang.org/x/sys/
Ones aren't changes include, Errno, Signal and SysProcAttr
as they haven't been implemented in /x/sys/.
Signed-off-by: Christopher Jones <tophj@linux.vnet.ibm.com>
[s390x] switch utsname from unsigned to signed
per 33267e036f
char in s390x in the /x/sys/unix package is now signed, so
change the buildtags
Signed-off-by: Christopher Jones <tophj@linux.vnet.ibm.com>
Because we use our own logging callbacks in order to use libdm
effectively, it is quite difficult to debug complicated devicemapper
issues (because any warnings or notices from libdm are muted by our own
callback function). e07d3cd9a ("devmapper: Fix libdm logging") further
reduced the ability of this debugging by only allowing _LOG_FATAL errors
to be passed to the output.
Unfortunately libdm is very chatty, so in order to avoid making the logs
even more crowded, add a dm.libdm_log_level storage option that allows
people who are debugging the lovely world of libdm to be able to dive in
without recompiling binaries.
The valid values of dm.libdm_log_level map directly to the libdm logging
levels, and are in the range [2,7] as of the time of writing with 7
being _LOG_DEBUG and 2 being _LOG_FATAL. The default is _LOG_FATAL.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
LogInit used to act as a manual way of registering the *necessary*
pkg/devicemapper logging callbacks. In addition, it was used to split up
the logic of pkg/devicemapper into daemon/graphdriver/devmapper (such
that some things were logged from libdm).
The manual aspect of this API was completely non-sensical and was just
begging for incorrect usage of pkg/devicemapper, so remove that semantic
and always register our own libdm callbacks.
In addition, recombine the split out logging callbacks into
pkg/devicemapper so that the default logger is local to the library and
also shown to be the recommended logger. This makes the code
substantially easier to read. Also the new DefaultLogger now has
configurable upper-bound for the log level, which allows for dynamically
changing the logging level.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
There have been some cases where umount, a device can be busy for a very
short duration. Maybe its udev rules, or maybe it is runc related races
or probably it is something else. We don't know yet.
If deferred removal is enabled but deferred deletion is not, then for the
case of "docker run -ti --rm fedora bash", a container will exit, device
will be deferred removed and then immediately a call will come to delete
the device. It is possible that deletion will fail if device was busy
at that time.
A device can't be deleted if it can't be removed/deactivated first. There
is only one exception and that is when deferred deletion is on. In that
case graph driver will keep track of deleted device and try to delete it
later and return success to caller.
Always make sure that device deactivation is synchronous when device is
being deleted (except the case when deferred deletion is enabled).
This should also take care of small races when device is busy for a short
duration and it is being deleted.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
This commit adds the overlay2.size option to the daemon daemon
storage opts.
The user can override this option by the "docker run --storage-opt"
options.
Signed-off-by: Dhawal Yogesh Bhanushali <dbhanushali@vmware.com>
This enables deferred device deletion/removal by default if the driver
version in the kernel is new enough to support the feature.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
we see a lot of
```
level=debug msg="Failed to unmount a03b1bb6f569421857e5407d73d89451f92724674caa56bfc2170de7e585a00b-init overlay: device or resource busy"
```
in daemon logs and there is a lot of mountpoint leftover.
This cause failed to remove container.
Signed-off-by: Lei Jitang <leijitang@huawei.com>
There is no case which would resolve in this error. The root user always exists, and if the id maps are empty, the default value of 0 is correct.
Signed-off-by: Daniel Nephin <dnephin@docker.com>
The test was failing because TarOptions was using a non-pointer for
ChownOpts, which meant the check for nil was never true, and
createTarFile was never using the hdr.UID/GID
Signed-off-by: Daniel Nephin <dnephin@docker.com>
This commit is an extension of fix for 29325 based on the review comment.
In this commit, the quota size for btrfs is kept in `/var/lib/docker/btrfs/quotas`
so that a daemon restart keeps quota.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
This fix tries to address the issue raised in 29325 where
btrfs quota groups are not clean up even after containers
have been destroyed.
The reason for the issue is that btrfs quota groups have
to be explicitly destroyed. This fix fixes this issue.
This fix is tested manually in Ubuntu 16.04,
with steps specified in 29325.
This fix fixes 29325.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
OverlayFS is supported on top of btrfs as of Linux Kernel 4.7.
Skip the hard enforcement when on kernel 4.7 or newer and
respect the kernel check override flag on older kernels.
https://btrfs.wiki.kernel.org/index.php/Changelog#By_feature
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
Before this, if `forceRemove` is set the container data will be removed
no matter what, including if there are issues with removing container
on-disk state (rw layer, container root).
In practice this causes a lot of issues with leaked data sitting on
disk that users are not able to clean up themselves.
This is particularly a problem while the `EBUSY` errors on remove are so
prevalent. So for now let's not keep this behavior.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Instead of forcing users to manually configure a block device to use
with devmapper, this gives the user the option to let the devmapper
driver configure a device for them.
Adds several new options to the devmapper storage-opts:
- dm.directlvm_device="" - path to the block device to configure for
direct-lvm
- dm.thinp_percent=95 - sets the percentage of space to use for
storage from the passed in block device
- dm.thinp_metapercent=1 - sets the percentage of space to for metadata
storage from the passed in block device
- dm.thinp_autoextend_threshold=80 - sets the threshold for when `lvm`
should automatically extend the thin pool as a percentage of the total
storage space
- dm.thinp_autoextend_percent=20 - sets the percentage to increase the
thin pool by when an autoextend is triggered.
Defaults are taken from
[here](https://docs.docker.com/engine/userguide/storagedriver/device-mapper-driver/#/configure-direct-lvm-mode-for-production)
The only option that is required is `dm.directlvm_device` for docker to
set everything up.
Changes to these settings are not currently supported and will error
out.
Future work could support allowing changes to these values.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
if initDevmapper failed after creating thin-pool, the thin-pool will not be removed,
this would cause we can't use the same lvm to create another thin-pool.
Signed-off-by: Lei Jitang <leijitang@huawei.com>
This allows graphdrivers to declare that they can reproduce the original
diff stream for a layer. If they do so, the layer store will not use
tar-split processing, but will still verify the digest on layer export.
This makes it easier to experiment with non-default diff formats.
Signed-off-by: Alfred Landrum <alfred.landrum@docker.com>
Since it was introduced no reports were made and lsof seems to cause
issues on some systems.
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
when doing devices.cancelDeferredRemoval, the device could have been removed
and return ErrEnxio, but it continue to check if it is need to do suspend.
doSuspend := devinfo != nil && devinfo.Exists != 0 uses a devinfo which is
get before devices.cancelDeferredRemoval(baseInfo), it is outdate, the device
has been removed and there is no need to do suspend. If do suspend it will return
devicemapper: Error running deviceSuspend dm_task_run failed.
Signed-off-by: Lei Jitang <leijitang@huawei.com>
Signed-off-by: John Howard <jhoward@microsoft.com>
This fixes https://github.com/docker/docker/issues/30278 where
there is a race condition in HCS for RS1 and RS2 builds, and enumeration
of compute systems can return access is denied if a silo is being
torn down in the kernel while HCS is attempting to enumerate them.
I often get complains that container removal failed and users got following
error message.
"Driver devicemapper failed to remove root filesystem 18a69ba82aaf7a039ce7d44156215012d703001643079775190ac7dd6c6acf56:Device is Busy"
This error message talks about container id but does not give any info
about which particular device id is busy. Most likely device is mounted
in some other mount namespace and if one knows the device id, they
can try to do some debugging figuring which process and which mount
namespace is keeping the device busy and how did we reach that stage.
Without that information, it becomes almost impossible to debug the
problem.
So to improve the debuggability, when device removal fails, also return
device id in error message. Now new message looks as follows.
"Driver devicemapper failed to remove root filesystem 18a69ba82aaf7a039ce7d44156215012d703001643079775190ac7dd6c6acf56: Failed to remove device dbc15bdf9994a17c613d8ef9e924f3cffbf67f91e4f709295c901ad628377991:Device is Busy"
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
This fix tries to address the issue raised in 29810
where btrfs subvolume removal failed when docker
is in an unprivileged lxc container. The failure
was caused by `Failed to rescan btrfs quota` with
`operation not permitted`.
However, if disk quota is not enabled, there is no
need to run a btrfs rescan at the first place.
This fix checks for `quotaEnabled` and only run btrfs
rescan if `quotaEnabled` is true.
This fix fixes 29810.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
Go style calls for mixed caps instead of all caps:
https://golang.org/doc/effective_go.html#mixed-caps
Change LOOKUP, ACQUIRE, and RELEASE to Lookup, Acquire, and Release.
This vendors a fork of libnetwork for now, to deal with a cyclic
dependency issue. The change will be upstream to libnetwork once this is
merged.
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
Adds 2 new methods to v2 plugin `Acquire` and `Release` which allow
refcounting directly at the plugin level instead of just the store.
Since a graphdriver is initialized exactly once, and is really managed
by a separate object, it didn't really seem right to call
`getter.Get()` to refcount graphdriver plugins.
On shutdown it was particularly weird where we'd either need to keep a
driver reference in daemon, or keep a reference to the pluggin getter in
the layer store, and even then still store extra details on if the
graphdriver is a plugin or not.
Instead the plugin proxy itself will handle calling the neccessary
refcounting methods directly on the plugin object.
Also adds a new interface in `plugingetter` to account for these new
functions which are not going to be implemented by v1 plugins.
Changes terms `plugingetter.CREATE` and `plugingetter.REMOVE` to
`ACQUIRE` and `RELEASE` respectively, which seems to be better
adjectives for what we're doing.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Currently the plugin initialization is too late for a loaded v2 plugin
to be usable as a graph driver.
This moves the initialization up before we create the graph driver.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
There is no need to populate device id during unregisterDevice(). Nobody
makes use of this information. We just need to remove file associated
with device and that file is looked up using the hash and not the
device id which is used for thin pool operations.
So get rid of device id argument to unregisterDevice().
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Naivediff fails when layers are created directly on top of
each other. Other graphdrivers which use naivediff already
skip these tests. Until naivediff is fixed, skip with overlay2
when running tests on a kernel which causes naivediff fallback.
Fix applydiff to never use the naivediff size when not applying
changes with naivediff.
Signed-off-by: Derek McGowan <derek@mcgstyle.net> (github: dmcgowan)