Here, err is never non-nil as it was checked earlier.
Fixes the following linter warning:
> daemon/graphdriver/copy/copy.go:136:10: nilness: impossible condition: nil != nil (govet)
> if err != nil {
> ^
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Format the source according to latest goimports.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
suppressing the "SA9003: empty branch (staticcheck)" instead of commenting-out
or removing these lines because removing/commenting these lines causes a ripple
effect of changes, and there's still a to-do below.
```
13:06:14 daemon/graphdriver/graphtest/graphbench_unix.go:175:3: SA9003: empty branch (staticcheck)
13:06:14 if applyDiffSize != diffSize {
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
When mounting overlays which have children, enforce that
the mount is always performed as read only. Newer versions
of the kernel return a device busy error when a lower directory
is in use as an upper directory in another overlay mount.
Adds committed file to indicate when an overlay is being used
as a parent, ensuring it will no longer be mounted with an
upper directory.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
btrfs_noversion was added in d7c37b5a28
for distributions that did not have the `btrfs/version.h` header file.
Seeing how all of the distributions we currently support do have the
`btrfs/version.h` file we should probably just remove this build flag
altogether.
Signed-off-by: Eli Uriegas <eli.uriegas@docker.com>
Protect access to q.quotas map, and lock around changing nextProjectID.
Techinically, the lock in findNextProjectID() is not needed as it is
only called during initialization, but one can never be too careful.
Fixes: 52897d1c09 ("projectquota: utility class for project quota controls")
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Commit e2989c4d48 says:
> With the suffix added, the possibility to hit the race is extremely
> low, and we don't have to do any locking.
Probability theory just laughed in my face this weekend, as this has
actually happened once in 6050000 containers created, on a high-end
hardware with 1000 parallel "docker create" running (took a few days).
One way to work around this is increase the randomness by adding more
characters, which will further decrease the probability, but won't
eliminate it entirely. Another is to fix it upstream (done, see the
link below, but the fix might not be packported to Ubuntu).
Overall, as much as I like this solution, I think we need to
revert it :-\
See-also: https://github.com/sfjro/aufs5-standalone/commit/abf61326f49535
This reverts commit e2989c4d48.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
For some reason, retrying to unmount in case of getting EBUSY error
was only performed in Remove(), but not Put().
I have done some testing on Ubuntu 16.04 and 18.04 with aufs,
performing massively parallel container creation using this script:
```
NUMCTS=5000
PARALLEL=100
IMAGE=busybox
docker pull $IMAGE >/dev/null
seq $NUMCTS | parallel -j$PARALLEL docker create $IMAGE true > /dev/null
docker ps -qa | shuf | tail -n $NUMCTS | parallel -j$PARALLEL docker rm -f '{}' > /dev/null
```
Sometimes (1 to 5 times per 10000 `docker create`), aufs.Put() fails on Unmount syscall
with EBUSY during container creation:
> Error response from daemon: device or resource busy
and in docker log, with debug turned on:
> level=debug msg="Failed to unmount ID-init aufs: device or resource busy"
> level=error msg="Handler for POST /v1.30/containers/create returned error: device or resource busy"
I did some debugging by running fuser -v -M -m $MOUNT_POINT but
that reveals nothing.
This commit:
* implements retry on EBUSY in Unmount()
* calls Unmount() from Remove()
* increases the number of retries from 3 to 5
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In case statfs() returns ENOENT, do not return an error, but rather
treat this as "not mounted".
Related to commit d42dbdd3d4.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Commit 5cd62852fa added a lock around call to unix.Mount() to
avoid the race in aufs kernel code related to xino file creation
and removal. While this is going to be fixed in the kernel, we still
need to support the current aufs, so some kind of fix is required.
A think a better fix (rather than a lock) is to add a random suffix
to the file name (note it is and was a separate file per mount,
never mind the same file name -- the file is created/opened and
removed instantly, so each mount deals with its own file).
With the suffix added, the possibility to hit the race is extremely
low, and we don't have to do any locking.
Note we don't add any more characters, instead we're replacing
`xino` with four random characters in the 0-9a-z range.
See also: https://sourceforge.net/p/aufs/mailman/message/36674769/
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Running a bundled aufs benchmark sometimes results in this warning:
> WARN[0001] Couldn't run auplink before unmount /tmp/aufs-tests/aufs/mnt/XXXXX error="exit status 22" storage-driver=aufs
If we take a look at what aulink utility produces on stderr, we'll see:
> auplink:proc_mnt.c:96: /tmp/aufs-tests/aufs/mnt/XXXXX: Invalid argument
and auplink exits with exit code of 22 (EINVAL).
Looking into auplink source code, what happens is it tries to find a
record in /proc/self/mounts corresponding to the mount point (by using
setmntent()/getmntent_r() glibc functions), and it fails.
Some manual testing, as well as runtime testing with lots of printf
added on mount/unmount, as well as calls to check the superblock fs
magic on mount point (as in graphdriver.Mounted(graphdriver.FsMagicAufs, target)
confirmed that this record is in fact there, but sometimes auplink
can't find it. I was also able to reproduce the same error (inability
to find a mount in /proc/self/mounts that should definitely be there)
using a small C program, mocking what `auplink` does:
```c
#include <stdio.h>
#include <err.h>
#include <mntent.h>
#include <string.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
FILE *fp;
struct mntent m, *p;
char a[4096];
char buf[4096 + 1024];
int found =0, lines = 0;
if (argc != 2) {
fprintf(stderr, "Usage: %s <mountpoint>\n", argv[0]);
exit(1);
}
fp = setmntent("/proc/self/mounts", "r");
if (!fp) {
err(1, "setmntent");
}
setvbuf(fp, a, _IOLBF, sizeof(a));
while ((p = getmntent_r(fp, &m, buf, sizeof(buf)))) {
lines++;
if (!strcmp(p->mnt_dir, argv[1])) {
found++;
}
}
printf("found %d entries for %s (%d lines seen)\n", found, argv[1], lines);
return !found;
}
```
I have also wrote a few other C proggies -- one that reads
/proc/self/mounts directly, one that reads /proc/self/mountinfo instead.
They are also prone to the same occasional error.
It is not perfectly clear why this happens, but so far my best theory
is when a lot of mounts/unmounts happen in parallel with reading
contents of /proc/self/mounts, sometimes the kernel fails to provide
continuity (i.e. it skips some part of file or mixes it up in some
other way). In other words, this is a kernel bug (which is probably
hard to fix unless some other interface to get a mount entry is added).
Now, there is no real fix, and a workaround I was able to come up
with is to retry when we got EINVAL. It usually works on the second
attempt, although I've once seen it took two attempts to go through.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Do not use filepath.Walk() as there's no requirement to recursively
go into every directory under mnt -- a (non-recursive) list of
directories in mnt is sufficient.
With filepath.Walk(), in case some container will fail to unmount,
it'll go through the whole container filesystem which is both
excessive and useless.
This is similar to commit f1a4592297 ("devmapper.shutdown:
optimize")
While at it, raise the priority of "unmount error" message from debug
to a warning. Note we don't have to explicitly add `m` as unmount error (from
pkg/mount) will have it.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In case there are a big number of layers, so that mount data won't fit
into a single memory page (4096 bytes on most platforms, which is good
enough for about 40 layers, depending on how long graphdriver root path
is), we supply additional layers with O_REMOUNT, as described in aufs
documentation.
Problem is, the current implementation does that one layer at a time
(i.e. there is one mount syscall per each additional layer).
Optimize the code to supply as many layers as we can fit in one page
(basically reusing the same code as for the original mount).
Note, per aufs docs, "[a]t remount-time, the options are interpreted
in the given order, e.g. left to right" so we should be good.
Tested on an image with ~100 layers.
Before (35 syscalls):
> [pid 22756] 1556919088.686955 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/a86f8c9dd0ec2486293119c20b0ec026e19bbc4d51332c554f7cf05d777c9866", "aufs", 0, "br:/mnt/volume_sfo2_09/docker-au"...) = 0 <0.000504>
> [pid 22756] 1556919088.687643 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/a86f8c9dd0ec2486293119c20b0ec026e19bbc4d51332c554f7cf05d777c9866", 0xc000c451b0, MS_REMOUNT, "append:/mnt/volume_sfo2_09/docke"...) = 0 <0.000105>
> [pid 22756] 1556919088.687851 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/a86f8c9dd0ec2486293119c20b0ec026e19bbc4d51332c554f7cf05d777c9866", 0xc000c451ba, MS_REMOUNT, "append:/mnt/volume_sfo2_09/docke"...) = 0 <0.000098>
> ..... (~30 lines skipped for clarity)
> [pid 22756] 1556919088.696182 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/a86f8c9dd0ec2486293119c20b0ec026e19bbc4d51332c554f7cf05d777c9866", 0xc000c45310, MS_REMOUNT, "append:/mnt/volume_sfo2_09/docke"...) = 0 <0.000266>
After (2 syscalls):
> [pid 24352] 1556919361.799889 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/8e7ba189e347a834e99eea4ed568f95b86cec809c227516afdc7c70286ff9a20", "aufs", 0, "br:/mnt/volume_sfo2_09/docker-au"...) = 0 <0.001717>
> [pid 24352] 1556919361.801761 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/8e7ba189e347a834e99eea4ed568f95b86cec809c227516afdc7c70286ff9a20", 0xc000dbecb0, MS_REMOUNT, "append:/mnt/volume_sfo2_09/docke"...) = 0 <0.001358>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Apparently there is some kind of race in aufs kernel module code,
which leads to the errors like:
[98221.158606] aufs au_xino_create2:186:dockerd[25801]: aufs.xino create err -17
[98221.162128] aufs au_xino_set:1229:dockerd[25801]: I/O Error, failed creating xino(-17).
[98362.239085] aufs au_xino_create2:186:dockerd[6348]: aufs.xino create err -17
[98362.243860] aufs au_xino_set:1229:dockerd[6348]: I/O Error, failed creating xino(-17).
[98373.775380] aufs au_xino_create:767:dockerd[27435]: open /dev/shm/aufs.xino(-17)
[98389.015640] aufs au_xino_create2:186:dockerd[26753]: aufs.xino create err -17
[98389.018776] aufs au_xino_set:1229:dockerd[26753]: I/O Error, failed creating xino(-17).
[98424.117584] aufs au_xino_create:767:dockerd[27105]: open /dev/shm/aufs.xino(-17)
So, we have to have a lock around mount syscall.
While at it, don't call the whole Unmount() on an error path, as
it leads to bogus error from auplink flush.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
1. Use mount.Unmount() which ignores EINVAL ("not mounted") error,
and provides better error diagnostics (so we don't have to explicitly
add target to error messages).
2. Since we're ignoring "not mounted" error, we can call
multiple unmounts without any locking -- but since "auplink flush"
is still involved and can produce an error in logs, let's keep
the check for fs being mounted (it's just a statfs so should be fast).
2. While at it, improve the "can't unmount" error message in Put().
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Both mount and unmount calls are already protected by fine-grained
(per id) locks in Get()/Put() introduced in commit fc1cf1911b
("Add more locking to storage drivers"), so there's no point in
having a global lock in mount/unmount.
The only place from which unmount is called without any locking
is Cleanup() -- this is to be addressed in the next patch.
This reverts commit 824c24e680.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: John Howard <jhoward@microsoft.com>
Some permissions corrections here. Also needs re-vendor of go-winio.
- Create the layer folder directory as standard, not with SDDL. It will inherit permissions from the data-root correctly.
- Apply the VM Group SID access to layer.vhd
Permissions after this changes
Data root:
```
PS C:\> icacls test
test BUILTIN\Administrators:(OI)(CI)(F)
NT AUTHORITY\SYSTEM:(OI)(CI)(F)
```
lcow subdirectory under dataroot
```
PS C:\> icacls test\lcow
test\lcow BUILTIN\Administrators:(I)(OI)(CI)(F)
NT AUTHORITY\SYSTEM:(I)(OI)(CI)(F)
```
layer.vhd in a layer folder for LCOW
```
.\test\lcow\c33923d21c9621fea2f990a8778f469ecdbdc57fd9ca682565d1fa86fadd5d95\layer.vhd NT VIRTUAL MACHINE\Virtual Machines:(R)
BUILTIN\Administrators:(I)(F)
NT AUTHORITY\SYSTEM:(I)(F)
```
And showing working
```
PS C:\> docker-ci-zap -folder=c:\test
INFO: Zapped successfully
PS C:\> docker run --rm alpine echo hello
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
8e402f1a9c57: Pull complete
Digest: sha256:644fcb1a676b5165371437feaa922943aaf7afcfa8bfee4472f6860aad1ef2a0
Status: Downloaded newer image for alpine:latest
hello
```
When initializing graphdrivers without root a permission warning
log is given due to lack of permission to create a device. This
error should be treated the same as quota not supported.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
Signed-off-by: John Howard <jhoward@microsoft.com>
LCOWv1 will be deprecated soon anyway (and LCOW is experimental regardless).
Removing lcow.initrd and lcow.kernel options which will not be supported
in LCOWv2 (via containerd).
Due to a bug in Golang (github.com/golang#27640), the "character device"
bit was omitted when checking file-modes with `os.ModeType`.
This bug was resolved in Go 1.12, but as a result, graphdrivers
would no longer recognize "device" files, causing pulling of
images that have a file with this filemode to fail;
failed to register layer:
unknown file type for /var/lib/docker/vfs/dir/.../dev/console
The current code checked for an exact match of Modes to be set. The
`os.ModeCharDevice` and `os.ModeDevice` bits will always be set in
tandem, however, because the code was only looking for an exact
match, this detection broke now that `os.ModeCharDevice` was added.
This patch changes the code to be more defensive, and instead
check if the `os.ModeDevice` bit is set (either with, or without
the `os.ModeCharDevice` bit).
In addition, some information was added to the error-message if
no type was matched, to assist debugging in case additional types
are added in future.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: John Howard <jhoward@microsoft.com>
Reported internally at Microsoft through VSO#19696554.
Using the solution from https://groups.google.com/forum/#!topic/Golang-Nuts/DpldsmrhPio
to quote file name and escape single quotes (https://play.golang.org/p/ntk8EEGjfk)
Simple repro steps are something like:
On an ubuntu box run something like
```
docker run -d --rm -p 5000:5000 registry:latest
hostname-I to get the ip address
```
On Windows start the daemon adding `--insecure-registry 10.124.186.18:5000`
(or whatever the IP address from above was)
```
docker run -it alpine sh
/ # echo bar > "with space"
/ # echo foo > 'single quote space'
/ # exit
docker ps -a
docker commit <containerid>
(note the first few of the image id)
docker tag <first few> 10.124.186.18:5000/test
docker push 10.124.186.18:5000/test
```
Resulting error when pushing the image:
```
PS E:\docker\build\19696554> docker push 10.124.186.18:5000/simpletest2
The push refers to repository [10.124.186.18:5000/simpletest2]
d328d7f5f277: Pushing [==================================================>] 74.24kB/74.24kB
503e53e365f3: Layer already exists
svm.runProcess: command cat /tmp/d59/single quote space failed with exit code 1
PS E:\docker\build\19696554>
```
After this change pushing the image:
```
PS E:\docker\build\19696554> docker push 10.124.186.18:5000/simpletest2
The push refers to repository [10.124.186.18:5000/simpletest2]
d328d7f5f277: Pushing [==================================================>] 74.24kB/74.24kB
503e53e365f3: Layer already exists
latest: digest: sha256:b9828a2d2a3d2421a4c342f48b7936714b3d8409dc32c103da5f3fb13b54bdbf size: 735
PS E:\docker\build\19696554>
```
The errors returned from Mount and Unmount functions are raw
syscall.Errno errors (like EPERM or EINVAL), which provides
no context about what has happened and why.
Similar to os.PathError type, introduce mount.Error type
with some context. The error messages will now look like this:
> mount /tmp/mount-tests/source:/tmp/mount-tests/target, flags: 0x1001: operation not permitted
or
> mount tmpfs:/tmp/mount-test-source-516297835: operation not permitted
Before this patch, it was just
> operation not permitted
[v2: add Cause()]
[v3: rename MountError to Error, document Cause()]
[v4: fixes; audited all users]
[v5: make Error type private; changes after @cpuguy83 reviews]
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The `aufs` storage driver is deprecated in favor of `overlay2`, and will
be removed in a future release. Users of the `aufs` storage driver are
recommended to migrate to a different storage driver, such as `overlay2`, which
is now the default storage driver.
The `aufs` storage driver facilitates running Docker on distros that have no
support for OverlayFS, such as Ubuntu 14.04 LTS, which originally shipped with
a 3.14 kernel.
Now that Ubuntu 14.04 is no longer a supported distro for Docker, and `overlay2`
is available to all supported distros (as they are either on kernel 4.x, or have
support for multiple lowerdirs backported), there is no reason to continue
maintenance of the `aufs` storage driver.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>