Now that we do check if overlay is working by performing an actual
overlayfs mount, there's no need in extra checks for the kernel version
or the filesystem type. Actual mount check is sufficient.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Before this commit, overlay check was performed by looking for
`overlay` in /proc/filesystem. This obviously might not work
for rootless Docker (fs is there, but one can't use it as non-root).
This commit changes the check to perform the actual mount, by reusing
the code previously written to check for multiple lower dirs support.
The old check is removed from both drivers, as well as the additional
check for the multiple lower dirs support in overlay2 since it's now
a part of the main check.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This moves supportsMultipleLowerDir() to overlayutils
so it can be used from both overlay and overlay2.
The only changes made were:
* replace logger with logrus
* don't use workDirName mergedDirName constants
* add mnt var to improve readability a bit
This is a preparation for the next commit.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
As caught by staticcheck (after disabling the default exclusion rules);
Based on the comment, this break was indeed meant to break the
loop and return the error.
```
daemon/graphdriver/aufs/mount.go:54:4: SA4011: ineffective break statement. Did you mean to break out of the outer loop? (staticcheck)
break
^
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
```
daemon/graphdriver/btrfs/btrfs.go:609:5: SA4003: no value of type uint64 is less than 0 (staticcheck)
if driver.options.size <= 0 {
^
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
```
daemon/graphdriver/aufs/aufs_test.go:746:8: SA4021: x = append(y) is equivalent to x = y (staticcheck)
ids = append(ids[2:])
^
```
Also pre-allocating the ids slice while we're at it.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
It has been pointed out that sometimes device mapper unit tests
fail with the following diagnostics:
> --- FAIL: TestDevmapperSetup (0.02s)
> graphtest_unix.go:44: graphdriver: loopback attach failed
> graphtest_unix.go:48: loopback attach failed
The root cause is the absence of udev inside the container used
for testing, which causes device nodes (/dev/loop*) to not be
created.
The test suite itself already has a workaround, but it only
creates 8 devices (loop0 till loop7). It might very well be
the case that the first few devices are already used by the
system (on my laptop 15 devices are busy).
The fix is to raise the number of devices being manually created.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Here, err is never non-nil as it was checked earlier.
Fixes the following linter warning:
> daemon/graphdriver/copy/copy.go:136:10: nilness: impossible condition: nil != nil (govet)
> if err != nil {
> ^
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Format the source according to latest goimports.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
suppressing the "SA9003: empty branch (staticcheck)" instead of commenting-out
or removing these lines because removing/commenting these lines causes a ripple
effect of changes, and there's still a to-do below.
```
13:06:14 daemon/graphdriver/graphtest/graphbench_unix.go:175:3: SA9003: empty branch (staticcheck)
13:06:14 if applyDiffSize != diffSize {
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
When mounting overlays which have children, enforce that
the mount is always performed as read only. Newer versions
of the kernel return a device busy error when a lower directory
is in use as an upper directory in another overlay mount.
Adds committed file to indicate when an overlay is being used
as a parent, ensuring it will no longer be mounted with an
upper directory.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
btrfs_noversion was added in d7c37b5a28
for distributions that did not have the `btrfs/version.h` header file.
Seeing how all of the distributions we currently support do have the
`btrfs/version.h` file we should probably just remove this build flag
altogether.
Signed-off-by: Eli Uriegas <eli.uriegas@docker.com>
Protect access to q.quotas map, and lock around changing nextProjectID.
Techinically, the lock in findNextProjectID() is not needed as it is
only called during initialization, but one can never be too careful.
Fixes: 52897d1c09 ("projectquota: utility class for project quota controls")
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Commit e2989c4d48 says:
> With the suffix added, the possibility to hit the race is extremely
> low, and we don't have to do any locking.
Probability theory just laughed in my face this weekend, as this has
actually happened once in 6050000 containers created, on a high-end
hardware with 1000 parallel "docker create" running (took a few days).
One way to work around this is increase the randomness by adding more
characters, which will further decrease the probability, but won't
eliminate it entirely. Another is to fix it upstream (done, see the
link below, but the fix might not be packported to Ubuntu).
Overall, as much as I like this solution, I think we need to
revert it :-\
See-also: https://github.com/sfjro/aufs5-standalone/commit/abf61326f49535
This reverts commit e2989c4d48.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
For some reason, retrying to unmount in case of getting EBUSY error
was only performed in Remove(), but not Put().
I have done some testing on Ubuntu 16.04 and 18.04 with aufs,
performing massively parallel container creation using this script:
```
NUMCTS=5000
PARALLEL=100
IMAGE=busybox
docker pull $IMAGE >/dev/null
seq $NUMCTS | parallel -j$PARALLEL docker create $IMAGE true > /dev/null
docker ps -qa | shuf | tail -n $NUMCTS | parallel -j$PARALLEL docker rm -f '{}' > /dev/null
```
Sometimes (1 to 5 times per 10000 `docker create`), aufs.Put() fails on Unmount syscall
with EBUSY during container creation:
> Error response from daemon: device or resource busy
and in docker log, with debug turned on:
> level=debug msg="Failed to unmount ID-init aufs: device or resource busy"
> level=error msg="Handler for POST /v1.30/containers/create returned error: device or resource busy"
I did some debugging by running fuser -v -M -m $MOUNT_POINT but
that reveals nothing.
This commit:
* implements retry on EBUSY in Unmount()
* calls Unmount() from Remove()
* increases the number of retries from 3 to 5
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In case statfs() returns ENOENT, do not return an error, but rather
treat this as "not mounted".
Related to commit d42dbdd3d4.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Commit 5cd62852fa added a lock around call to unix.Mount() to
avoid the race in aufs kernel code related to xino file creation
and removal. While this is going to be fixed in the kernel, we still
need to support the current aufs, so some kind of fix is required.
A think a better fix (rather than a lock) is to add a random suffix
to the file name (note it is and was a separate file per mount,
never mind the same file name -- the file is created/opened and
removed instantly, so each mount deals with its own file).
With the suffix added, the possibility to hit the race is extremely
low, and we don't have to do any locking.
Note we don't add any more characters, instead we're replacing
`xino` with four random characters in the 0-9a-z range.
See also: https://sourceforge.net/p/aufs/mailman/message/36674769/
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Running a bundled aufs benchmark sometimes results in this warning:
> WARN[0001] Couldn't run auplink before unmount /tmp/aufs-tests/aufs/mnt/XXXXX error="exit status 22" storage-driver=aufs
If we take a look at what aulink utility produces on stderr, we'll see:
> auplink:proc_mnt.c:96: /tmp/aufs-tests/aufs/mnt/XXXXX: Invalid argument
and auplink exits with exit code of 22 (EINVAL).
Looking into auplink source code, what happens is it tries to find a
record in /proc/self/mounts corresponding to the mount point (by using
setmntent()/getmntent_r() glibc functions), and it fails.
Some manual testing, as well as runtime testing with lots of printf
added on mount/unmount, as well as calls to check the superblock fs
magic on mount point (as in graphdriver.Mounted(graphdriver.FsMagicAufs, target)
confirmed that this record is in fact there, but sometimes auplink
can't find it. I was also able to reproduce the same error (inability
to find a mount in /proc/self/mounts that should definitely be there)
using a small C program, mocking what `auplink` does:
```c
#include <stdio.h>
#include <err.h>
#include <mntent.h>
#include <string.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
FILE *fp;
struct mntent m, *p;
char a[4096];
char buf[4096 + 1024];
int found =0, lines = 0;
if (argc != 2) {
fprintf(stderr, "Usage: %s <mountpoint>\n", argv[0]);
exit(1);
}
fp = setmntent("/proc/self/mounts", "r");
if (!fp) {
err(1, "setmntent");
}
setvbuf(fp, a, _IOLBF, sizeof(a));
while ((p = getmntent_r(fp, &m, buf, sizeof(buf)))) {
lines++;
if (!strcmp(p->mnt_dir, argv[1])) {
found++;
}
}
printf("found %d entries for %s (%d lines seen)\n", found, argv[1], lines);
return !found;
}
```
I have also wrote a few other C proggies -- one that reads
/proc/self/mounts directly, one that reads /proc/self/mountinfo instead.
They are also prone to the same occasional error.
It is not perfectly clear why this happens, but so far my best theory
is when a lot of mounts/unmounts happen in parallel with reading
contents of /proc/self/mounts, sometimes the kernel fails to provide
continuity (i.e. it skips some part of file or mixes it up in some
other way). In other words, this is a kernel bug (which is probably
hard to fix unless some other interface to get a mount entry is added).
Now, there is no real fix, and a workaround I was able to come up
with is to retry when we got EINVAL. It usually works on the second
attempt, although I've once seen it took two attempts to go through.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Do not use filepath.Walk() as there's no requirement to recursively
go into every directory under mnt -- a (non-recursive) list of
directories in mnt is sufficient.
With filepath.Walk(), in case some container will fail to unmount,
it'll go through the whole container filesystem which is both
excessive and useless.
This is similar to commit f1a4592297 ("devmapper.shutdown:
optimize")
While at it, raise the priority of "unmount error" message from debug
to a warning. Note we don't have to explicitly add `m` as unmount error (from
pkg/mount) will have it.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In case there are a big number of layers, so that mount data won't fit
into a single memory page (4096 bytes on most platforms, which is good
enough for about 40 layers, depending on how long graphdriver root path
is), we supply additional layers with O_REMOUNT, as described in aufs
documentation.
Problem is, the current implementation does that one layer at a time
(i.e. there is one mount syscall per each additional layer).
Optimize the code to supply as many layers as we can fit in one page
(basically reusing the same code as for the original mount).
Note, per aufs docs, "[a]t remount-time, the options are interpreted
in the given order, e.g. left to right" so we should be good.
Tested on an image with ~100 layers.
Before (35 syscalls):
> [pid 22756] 1556919088.686955 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/a86f8c9dd0ec2486293119c20b0ec026e19bbc4d51332c554f7cf05d777c9866", "aufs", 0, "br:/mnt/volume_sfo2_09/docker-au"...) = 0 <0.000504>
> [pid 22756] 1556919088.687643 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/a86f8c9dd0ec2486293119c20b0ec026e19bbc4d51332c554f7cf05d777c9866", 0xc000c451b0, MS_REMOUNT, "append:/mnt/volume_sfo2_09/docke"...) = 0 <0.000105>
> [pid 22756] 1556919088.687851 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/a86f8c9dd0ec2486293119c20b0ec026e19bbc4d51332c554f7cf05d777c9866", 0xc000c451ba, MS_REMOUNT, "append:/mnt/volume_sfo2_09/docke"...) = 0 <0.000098>
> ..... (~30 lines skipped for clarity)
> [pid 22756] 1556919088.696182 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/a86f8c9dd0ec2486293119c20b0ec026e19bbc4d51332c554f7cf05d777c9866", 0xc000c45310, MS_REMOUNT, "append:/mnt/volume_sfo2_09/docke"...) = 0 <0.000266>
After (2 syscalls):
> [pid 24352] 1556919361.799889 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/8e7ba189e347a834e99eea4ed568f95b86cec809c227516afdc7c70286ff9a20", "aufs", 0, "br:/mnt/volume_sfo2_09/docker-au"...) = 0 <0.001717>
> [pid 24352] 1556919361.801761 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/8e7ba189e347a834e99eea4ed568f95b86cec809c227516afdc7c70286ff9a20", 0xc000dbecb0, MS_REMOUNT, "append:/mnt/volume_sfo2_09/docke"...) = 0 <0.001358>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Apparently there is some kind of race in aufs kernel module code,
which leads to the errors like:
[98221.158606] aufs au_xino_create2:186:dockerd[25801]: aufs.xino create err -17
[98221.162128] aufs au_xino_set:1229:dockerd[25801]: I/O Error, failed creating xino(-17).
[98362.239085] aufs au_xino_create2:186:dockerd[6348]: aufs.xino create err -17
[98362.243860] aufs au_xino_set:1229:dockerd[6348]: I/O Error, failed creating xino(-17).
[98373.775380] aufs au_xino_create:767:dockerd[27435]: open /dev/shm/aufs.xino(-17)
[98389.015640] aufs au_xino_create2:186:dockerd[26753]: aufs.xino create err -17
[98389.018776] aufs au_xino_set:1229:dockerd[26753]: I/O Error, failed creating xino(-17).
[98424.117584] aufs au_xino_create:767:dockerd[27105]: open /dev/shm/aufs.xino(-17)
So, we have to have a lock around mount syscall.
While at it, don't call the whole Unmount() on an error path, as
it leads to bogus error from auplink flush.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
1. Use mount.Unmount() which ignores EINVAL ("not mounted") error,
and provides better error diagnostics (so we don't have to explicitly
add target to error messages).
2. Since we're ignoring "not mounted" error, we can call
multiple unmounts without any locking -- but since "auplink flush"
is still involved and can produce an error in logs, let's keep
the check for fs being mounted (it's just a statfs so should be fast).
2. While at it, improve the "can't unmount" error message in Put().
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Both mount and unmount calls are already protected by fine-grained
(per id) locks in Get()/Put() introduced in commit fc1cf1911b
("Add more locking to storage drivers"), so there's no point in
having a global lock in mount/unmount.
The only place from which unmount is called without any locking
is Cleanup() -- this is to be addressed in the next patch.
This reverts commit 824c24e680.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: John Howard <jhoward@microsoft.com>
Some permissions corrections here. Also needs re-vendor of go-winio.
- Create the layer folder directory as standard, not with SDDL. It will inherit permissions from the data-root correctly.
- Apply the VM Group SID access to layer.vhd
Permissions after this changes
Data root:
```
PS C:\> icacls test
test BUILTIN\Administrators:(OI)(CI)(F)
NT AUTHORITY\SYSTEM:(OI)(CI)(F)
```
lcow subdirectory under dataroot
```
PS C:\> icacls test\lcow
test\lcow BUILTIN\Administrators:(I)(OI)(CI)(F)
NT AUTHORITY\SYSTEM:(I)(OI)(CI)(F)
```
layer.vhd in a layer folder for LCOW
```
.\test\lcow\c33923d21c9621fea2f990a8778f469ecdbdc57fd9ca682565d1fa86fadd5d95\layer.vhd NT VIRTUAL MACHINE\Virtual Machines:(R)
BUILTIN\Administrators:(I)(F)
NT AUTHORITY\SYSTEM:(I)(F)
```
And showing working
```
PS C:\> docker-ci-zap -folder=c:\test
INFO: Zapped successfully
PS C:\> docker run --rm alpine echo hello
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
8e402f1a9c57: Pull complete
Digest: sha256:644fcb1a676b5165371437feaa922943aaf7afcfa8bfee4472f6860aad1ef2a0
Status: Downloaded newer image for alpine:latest
hello
```