The overlay2 storage-driver requires multiple lower dir
support for overlayFs. Support for this feature was added
in kernel 4.x, but some distros (RHEL 7.4, CentOS 7.4) ship with
an older kernel with this feature backported.
This patch adds feature-detection for multiple lower dirs,
and will perform this feature-detection on pre-4.x kernels
with overlayFS support.
With this patch applied, daemons running on a kernel
with multiple lower dir support will now select "overlay2"
as storage-driver, instead of falling back to "overlay".
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This subtle bug keeps lurking in because error checking for `Mkdir()`
and `MkdirAll()` is slightly different wrt to `EEXIST`/`IsExist`:
- for `Mkdir()`, `IsExist` error should (usually) be ignored
(unless you want to make sure directory was not there before)
as it means "the destination directory was already there"
- for `MkdirAll()`, `IsExist` error should NEVER be ignored.
Mostly, this commit just removes ignoring the IsExist error, as it
should not be ignored.
Also, there are a couple of cases then IsExist is handled as
"directory already exist" which is wrong. As a result, some code
that never worked as intended is now removed.
NOTE that `idtools.MkdirAndChown()` behaves like `os.MkdirAll()`
rather than `os.Mkdir()` -- so its description is amended accordingly,
and its usage is handled as such (i.e. IsExist error is not ignored).
For more details, a quote from my runc commit 6f82d4b (July 2015):
TL;DR: check for IsExist(err) after a failed MkdirAll() is both
redundant and wrong -- so two reasons to remove it.
Quoting MkdirAll documentation:
> MkdirAll creates a directory named path, along with any necessary
> parents, and returns nil, or else returns an error. If path
> is already a directory, MkdirAll does nothing and returns nil.
This means two things:
1. If a directory to be created already exists, no error is
returned.
2. If the error returned is IsExist (EEXIST), it means there exists
a non-directory with the same name as MkdirAll need to use for
directory. Example: we want to MkdirAll("a/b"), but file "a"
(or "a/b") already exists, so MkdirAll fails.
The above is a theory, based on quoted documentation and my UNIX
knowledge.
3. In practice, though, current MkdirAll implementation [1] returns
ENOTDIR in most of cases described in #2, with the exception when
there is a race between MkdirAll and someone else creating the
last component of MkdirAll argument as a file. In this very case
MkdirAll() will indeed return EEXIST.
Because of #1, IsExist check after MkdirAll is not needed.
Because of #2 and #3, ignoring IsExist error is just plain wrong,
as directory we require is not created. It's cleaner to report
the error now.
Note this error is all over the tree, I guess due to copy-paste,
or trying to follow the same usage pattern as for Mkdir(),
or some not quite correct examples on the Internet.
[1] https://github.com/golang/go/blob/f9ed2f75/src/os/path.go
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This removes and recreates the merged dir with each umount/mount
respectively.
This is done to make the impact of leaking mountpoints have less
user-visible impact.
It's fairly easy to accidentally leak mountpoints (even if moby doesn't,
other tools on linux like 'unshare' are quite able to incidentally do
so).
As of recently, overlayfs reacts to these mounts being leaked (see
One trick to force an unmount is to remove the mounted directory and
recreate it. Devicemapper now does this, overlay can follow suit.
Signed-off-by: Euan Kemp <euan.kemp@coreos.com>
From https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt:
> The lower filesystem can be any filesystem supported by Linux and does
> not need to be writable. The lower filesystem can even be another
> overlayfs. The upper filesystem will normally be writable and if it
> is it must support the creation of trusted.* extended attributes, and
> must provide valid d_type in readdir responses, so NFS is not suitable.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
When use overlay2 as the graphdriver and the kernel enable
`CONFIG_OVERLAY_FS_REDIRECT_DIR=y`, rename a dir in lower layer
will has a xattr to redirct its dir to source dir. This make the
image layer unportable. This patch fallback to use naive diff driver
when kernel enable CONFIG_OVERLAY_FS_REDIRECT_DIR
Signed-off-by: Lei Jitang <leijitang@huawei.com>
The change in 7a7357dae1 inadvertently
changed the `defer` error code into a no-op. This restores its behavior
prior to that code change, and also introduces a little more error
logging.
Signed-off-by: Euan Kemp <euan.kemp@coreos.com>
This enables docker cp and ADD/COPY docker build support for LCOW.
Originally, the graphdriver.Get() interface returned a local path
to the container root filesystem. This does not work for LCOW, so
the Get() method now returns an interface that LCOW implements to
support copying to and from the container.
Signed-off-by: Akash Gupta <akagup@microsoft.com>
Changes most references of syscall to golang.org/x/sys/
Ones aren't changes include, Errno, Signal and SysProcAttr
as they haven't been implemented in /x/sys/.
Signed-off-by: Christopher Jones <tophj@linux.vnet.ibm.com>
[s390x] switch utsname from unsigned to signed
per 33267e036f
char in s390x in the /x/sys/unix package is now signed, so
change the buildtags
Signed-off-by: Christopher Jones <tophj@linux.vnet.ibm.com>
This commit adds the overlay2.size option to the daemon daemon
storage opts.
The user can override this option by the "docker run --storage-opt"
options.
Signed-off-by: Dhawal Yogesh Bhanushali <dbhanushali@vmware.com>
we see a lot of
```
level=debug msg="Failed to unmount a03b1bb6f569421857e5407d73d89451f92724674caa56bfc2170de7e585a00b-init overlay: device or resource busy"
```
in daemon logs and there is a lot of mountpoint leftover.
This cause failed to remove container.
Signed-off-by: Lei Jitang <leijitang@huawei.com>
OverlayFS is supported on top of btrfs as of Linux Kernel 4.7.
Skip the hard enforcement when on kernel 4.7 or newer and
respect the kernel check override flag on older kernels.
https://btrfs.wiki.kernel.org/index.php/Changelog#By_feature
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
Before this, if `forceRemove` is set the container data will be removed
no matter what, including if there are issues with removing container
on-disk state (rw layer, container root).
In practice this causes a lot of issues with leaked data sitting on
disk that users are not able to clean up themselves.
This is particularly a problem while the `EBUSY` errors on remove are so
prevalent. So for now let's not keep this behavior.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Naivediff fails when layers are created directly on top of
each other. Other graphdrivers which use naivediff already
skip these tests. Until naivediff is fixed, skip with overlay2
when running tests on a kernel which causes naivediff fallback.
Fix applydiff to never use the naivediff size when not applying
changes with naivediff.
Signed-off-by: Derek McGowan <derek@mcgstyle.net> (github: dmcgowan)
When running on a kernel which is not patched for the copy up bug
overlay2 will use the naive diff driver.
Signed-off-by: Derek McGowan <derek@mcgstyle.net> (github: dmcgowan)
This allows for easy extension of adding more parameters to existing
parameters list. Otherwise adding a single parameter changes code
at so many places.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
The overlay2 change ensures that the correct path is used to resolve the
symlink. The current code will not fail since the symlinks are always given
a value of "../id/diff" which ends up ignoring the incorrect "link" value.
Fix this code so it doesn't cause unexpected errors in the future if the
symlink changes.
The layerstore cleanup ensures that the empty layer returns a tar stream if
the provided parent is empty. Any value other than empty still returns an
error since the empty layer has no parent. Currently empty layer is not
used anywhere that TarStreamFrom is used but could break in the future if
this function is called.
Signed-off-by: Derek McGowan <derek@mcgstyle.net> (github: dmcgowan)
Allow built images to be squash to scratch.
Squashing does not destroy any images or layers, and preserves the
build cache.
Introduce a new CLI argument --squash to docker build
Introduce a new param to the build API endpoint `squash`
Once the build is complete, docker creates a new image loading the diffs
from each layer into a single new layer and references all the parent's
layers.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This fix tries to fix logrus formatting by removing `f` from
`logrus.[Error|Warn|Debug|Fatal|Panic|Info]f` when formatting string
is not present.
Fixed issue #23459
Signed-off-by: Daehyeok Mun <daehyeok@gmail.com>
The `archive` package defines aliases for `io.ReadCloser` and
`io.Reader`. These don't seem to provide an benefit other than type
decoration. Per this change, several unnecessary type cases were
removed.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Go can falsely report a larger page size than supported,
causing overlay2 mount arguments to be truncated. When overlay2
detects the mount arguments have hit the page limit, it will
switch to using relative paths. If this limit is smaller than
the actual page size there is no behavioral problems, but if it
is larger mounts can fail for images with many layers.
Closes#27384
Signed-off-by: Derek McGowan <derek@mcgstyle.net> (github: dmcgowan)
Allow passing --storage-opt size=X to docker create/run commands
for the `overlay2` graphriver.
The size option is only available if the backing fs is xfs that is
mounted with the `pquota` mount option.
The user can pass any size less then the backing fs size.
Signed-off-by: Amir Goldstein <amir73il@aquasec.com>
In the common case where the user is using /var/lib/docker and
an image with less than 60 layers, forking is not needed. Calculate
whether absolute paths can be used and avoid forking to mount in
those cases.
Signed-off-by: Derek McGowan <derek@mcgstyle.net> (github: dmcgowan)
Add option to skip kernel check for older kernels which have been patched to support multiple lower directories in overlayfs.
Fixes#24023
Signed-off-by: Derek McGowan <derek@mcgstyle.net> (github: dmcgowan)
Symlinks are currently not getting cleaned up when removing layers since only the root directory is removed.
On remove, read the link file and remove the associated link from the link directory.
Signed-off-by: Derek McGowan <derek@mcgstyle.net> (github: dmcgowan)
Adds a new overlay driver which uses multiple lower directories to create the union fs.
Additionally it uses symlinks and relative mount paths to allow a depth of 128 and stay within the mount page size limit.
Diffs and done directly over a single directory allowing diffs to be done efficiently and without the need fo the naive diff driver.
Signed-off-by: Derek McGowan <derek@mcgstyle.net> (github: dmcgowan)