Staticcheck reported:
SA9002: file mode '600' evaluates to 01130; did you mean '0600'? (staticcheck)
But fixing that caused the test to fail:
=== Failed
=== FAIL: pkg/filenotify TestPollerEvent (0.80s)
poller_test.go:75: timeout waiting for event CHMOD
The problem turned out to be that the file was created with `0644`. However,
after umask, the file created actually had `0600` filemode. Running the `os.Chmod`
with `0600` therefore was a no-op, causing the test to fail (because no
CHMOD event would fire).
This patch changes the test to;
- create the file with mode `0600`
- assert that the file has the expected mode
- change the chmod to `0644`
- assert that it has the correct mode, before testing the event.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
```
pkg/devicemapper/devmapper_wrapper.go:209:206: SA4000: identical expressions on the left and right side of the '==' operator (staticcheck)
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
As pointed out by govet,
> pkg/jsonmessage/jsonmessage_test.go:231:94: nilness: nil dereference in dynamic method call (govet)
> if err := DisplayJSONMessagesStream(reader, data, inFd, false, nil); err == nil && err.Error()[:17] != "invalid character" {
> ^
The nil deref never happened as err was always non-nil, and so the check
for error message text was not performed.
Fix this, and while at it, refactor the code a bit.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Fix the following warnings from staticcheck linter:
```
pkg/term/term_linux_test.go:34:2: SA5001: should check returned error before deferring tty.Close() (staticcheck)
defer tty.Close()
^
pkg/term/term_linux_test.go:52:2: SA5001: should check returned error before deferring tty.Close() (staticcheck)
defer tty.Close()
^
pkg/term/term_linux_test.go:67:2: SA5001: should check returned error before deferring tty.Close() (staticcheck)
defer tty.Close()
^
....
```
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
opts/env_test: suppress a linter warning
this one:
> opts/env_test.go:95:4: U1000: field `err` is unused (unused)
> err error
> ^
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Format the source according to latest goimports.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
```
13:06:14 pkg/directory/directory_test.go:182:2: S1032: should use sort.Strings(...) instead of sort.Sort(sort.StringSlice(...)) (gosimple)
13:06:14 sort.Sort(sort.StringSlice(results))
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
```
13:06:14 pkg/pools/pools.go:75:13: SA6002: argument should be pointer-like to avoid allocations (staticcheck)
13:06:14 bp.pool.Put(b)
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
There is a race condition in pkg/archive when using `cmd.Start` for pigz
and xz where the `*bufio.Reader` could be returned to the pool while the
command is still writing to it, and then picked up and used by a new
command.
The command is wrapped in a `CommandContext` where the process will be
killed when the context is cancelled, however this is not instantaneous,
so there's a brief window while the command is still running but the
`*bufio.Reader` was already returned to the pool.
wrapReadCloser calls `cancel()`, and then `readBuf.Close()` which
eventually returns the buffer to the pool. However, because cmdStream
runs `cmd.Wait` in a go routine that we never wait for to finish, it is
not safe to return the reader to the pool yet. We need to ensure we
wait for `cmd.Wait` to finish!
Signed-off-by: Stephen Benjamin <stephen@redhat.com>
CheckSystemDriveAndRemoveDriveLetter depends on pathdriver.PathDriver
unnecessarily. This depends on the minimal interface that it actually
needs, to avoid callers from unnecessarily bringing in a
containerd/continuity dependency.
Signed-off-by: Jon Johnson <jonjohnson@google.com>
Support for GOOS=solaris was removed in PR #35373. Remove two leftover
*_solaris.go files missed in this PR.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
The `ioctl` interface for the `LOOP_CTL_GET_FREE` request on
`/dev/loop-control` is a little different from what `unix.IoctlGetInt`
expects: the first index is the returned status in `r1`, not an `int`
pointer as the first parameter.
Unfortunately we have to go a little lower level to get the appropriate
loop device index out, using `unix.Syscall` directly to read from
`r1`. Internally, the index is returned as a signed integer to match the
internal `ioctl` expectations of interpreting a negative signed integer
as an error at the userspace ABI boundary, so the direct interface of
`ioctlLoopCtlGetFree` can remain as-is.
[@kolyshkin: it still worked before this fix because of
/dev scan fallback in ioctlLoopCtlGetFree()]
Signed-off-by: Daniel Sweet <danieljsweet@icloud.com>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
- use subtests to make it clearer what the individual test-cases
are, and to prevent tests from depending on values set by the
previous test(s).
- remove redundant messages in assert (gotest.tools already prints
a useful message if assertions fail).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
also renamed the non-windows variant of this file to be
consistent with other files in this package
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
environment not in the chroot from untrusted files.
See also OpenVZ a3f732ef75/src/enter.c (L227-L234)
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
(cherry picked from commit cea6dca993c2b4cfa99b1e7a19ca134c8ebc236b)
Signed-off-by: Tibor Vass <tibor@docker.com>
This fix was added in 8e71b1e210 to work around
a go issue (https://github.com/golang/go/issues/20506).
That issue was fixed in
66c03d39f3,
which is part of Go 1.10 and up. This reverts the changes that were made in
8e71b1e210, and are no longer needed.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Not really bullet-proof, users can still create cgroups with name like
"foo:/init.scope" or "\nfoo" to bypass the detection. However, solving
these cases will require kernel to provide a better interface.
Signed-off-by: Robert Wang <robert@arctic.tw>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This can happen when you have --config-only network
Such attempt will fail anyway and it will create 15s delay in container
startup
Signed-off-by: Pavel Matěja <pavel@verotel.cz>
Use Getpid and SchedGetaffinity from golang.org/x/sys/unix to get the
number of CPUs in numCPU on Linux.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
This is needed so that we can add OS version constraints in Swarmkit, which
does require the engine to report its host's OS version (see
https://github.com/docker/swarmkit/issues/2770).
The OS version is parsed from the `os-release` file on Linux, and from the
`ReleaseId` string value of the `SOFTWARE\Microsoft\Windows NT\CurrentVersion`
registry key on Windows.
Added unit tests when possible, as well as Prometheus metrics.
Signed-off-by: Jean Rouge <rougej+github@gmail.com>
Previously only unpack operations were supported with chroot.
This adds chroot support for packing operations.
This prevents potential breakouts when copying data from a container.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This is useful for preventing CVE-2018-15664 where a malicious container
process can take advantage of a race on symlink resolution/sanitization.
Before this change chrootarchive would chroot to the destination
directory which is attacker controlled. With this patch we always chroot
to the container's root which is not attacker controlled.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Moby currently sorts uid and gid ranges in id maps. This causes subuid
and subgid files to be interpreted wrongly.
The subuid file
```
> cat /etc/subuid
jonas:100000:1000
jonas:1000:1
```
configures that the container uids 0-999 are mapped to the host uids
100000-100999 and uid 1000 in the container is mapped to uid 1000 on the
host. The expected uid_map is:
```
> docker run ubuntu cat /proc/self/uid_map
0 100000 1000
1000 1000 1
```
Moby currently sorts the ranges by the first id in the range. Therefore
with the subuid file above the uid 0 in the container is mapped to uid
100000 on host and the uids 1-1000 in container are mapped to the uids
1-1000 on the host. The resulting uid_map is:
```
> docker run ubuntu cat /proc/self/uid_map
0 1000 1
1 100000 1000
```
The ordering was implemented to work around a limitation in Linux 3.8.
This is fixed since Linux 3.9 as stated on the user namespaces manpage
[1]:
> In the initial implementation (Linux 3.8), this requirement was
> satisfied by a simplistic implementation that imposed the further
> requirement that the values in both field 1 and field 2 of successive
> lines must be in ascending numerical order, which prevented some
> otherwise valid maps from being created. Linux 3.9 and later fix this
> limitation, allowing any valid set of nonoverlapping maps.
This fix changes the interpretation of subuid and subgid files which do
not have the ids of in the numerical order for each individual user.
This breaks users that rely on the current behaviour.
The desired mapping above - map low user ids in the container to high
user ids on the host and some higher user ids in the container to lower
user on host - can unfortunately not archived with the current
behaviour.
[1] http://man7.org/linux/man-pages/man7/user_namespaces.7.html
Signed-off-by: Jonas Dohse <jonas@dohse.ch>
This is enabled for all containers that are not run with --privileged,
if the kernel supports it.
Fixes#38332
Signed-off-by: Rob Gulewich <rgulewich@netflix.com>
Katherine Louise Bouman is an imaging scientist and Assistant Professor
of Computer Science at the California Institute of Technology. She
researches computational methods for imaging, and developed an algorithm
that made possible the picture first visualization of a black hole
using the Event Horizon Telescope. - https://en.wikipedia.org/wiki/Katie_Bouman
Thank you for being amazing!
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The only option we supply is either BIND or a mount propagation flag,
so it makes sense to specify the flag value directly, rather than using
parseOptions() every time.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Current code in MakeMount parses /proc/self/mountinfo twice:
first in call to Mounted(), then in call to Mount(). Use
ForceMount() to eliminate such double parsing.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
`/proc/self/mountinfo` uses `\040` for spaces, however, `parseInfoFile()`
did not decode those spaces in paths, therefore attempting to use `\040`
as a literal part of the path.
This patch un-quotes the `root` and `mount point` fields to fix
situations where paths contain spaces.
Note that the `mount source` field is not modified, given that
this field is documented (man `PROC(5)`) as:
filesystem-specific information or "none"
Which I interpreted as "the format in this field is undefined".
Reported-by: Daniil Yaroslavtsev <daniilyar@users.noreply.github.com>
Reported-by: Nathan Ringo <remexre@gmail.com>
Based-on-patch-by: Diego Becciolini <itizir@users.noreply.github.com>
Based-on-patch-by: Sergei Utinski <sergei-utinski@users.noreply.github.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
* Add new adjectives to the names generator
Signed-off-by: sh7dm <d3dx12.xx@gmail.com>
* Add some more adjectives to the names generator
Signed-off-by: sh7dm <d3dx12.xx@gmail.com>
Signed-off-by: John Howard <jhoward@microsoft.com>
Some permissions corrections here. Also needs re-vendor of go-winio.
- Create the layer folder directory as standard, not with SDDL. It will inherit permissions from the data-root correctly.
- Apply the VM Group SID access to layer.vhd
Permissions after this changes
Data root:
```
PS C:\> icacls test
test BUILTIN\Administrators:(OI)(CI)(F)
NT AUTHORITY\SYSTEM:(OI)(CI)(F)
```
lcow subdirectory under dataroot
```
PS C:\> icacls test\lcow
test\lcow BUILTIN\Administrators:(I)(OI)(CI)(F)
NT AUTHORITY\SYSTEM:(I)(OI)(CI)(F)
```
layer.vhd in a layer folder for LCOW
```
.\test\lcow\c33923d21c9621fea2f990a8778f469ecdbdc57fd9ca682565d1fa86fadd5d95\layer.vhd NT VIRTUAL MACHINE\Virtual Machines:(R)
BUILTIN\Administrators:(I)(F)
NT AUTHORITY\SYSTEM:(I)(F)
```
And showing working
```
PS C:\> docker-ci-zap -folder=c:\test
INFO: Zapped successfully
PS C:\> docker run --rm alpine echo hello
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
8e402f1a9c57: Pull complete
Digest: sha256:644fcb1a676b5165371437feaa922943aaf7afcfa8bfee4472f6860aad1ef2a0
Status: Downloaded newer image for alpine:latest
hello
```
This patch hard-codes support for NVIDIA GPUs.
In a future patch it should move out into its own Device Plugin.
Signed-off-by: Tibor Vass <tibor@docker.com>
Signed-off-by: John Howard <jhoward@microsoft.com>
Also fixes https://github.com/moby/moby/issues/22874
This commit is a pre-requisite to moving moby/moby on Windows to using
Containerd for its runtime.
The reason for this is that the interface between moby and containerd
for the runtime is an OCI spec which must be unambigious.
It is the responsibility of the runtime (runhcs in the case of
containerd on Windows) to ensure that arguments are escaped prior
to calling into HCS and onwards to the Win32 CreateProcess call.
Previously, the builder was always escaping arguments which has
led to several bugs in moby. Because the local runtime in
libcontainerd had context of whether or not arguments were escaped,
it was possible to hack around in daemon/oci_windows.go with
knowledge of the context of the call (from builder or not).
With a remote runtime, this is not possible as there's rightly
no context of the caller passed across in the OCI spec. Put another
way, as I put above, the OCI spec must be unambigious.
The other previous limitation (which leads to various subtle bugs)
is that moby is coded entirely from a Linux-centric point of view.
Unfortunately, Windows != Linux. Windows CreateProcess uses a
command line, not an array of arguments. And it has very specific
rules about how to escape a command line. Some interesting reading
links about this are:
https://blogs.msdn.microsoft.com/twistylittlepassagesallalike/2011/04/23/everyone-quotes-command-line-arguments-the-wrong-way/https://stackoverflow.com/questions/31838469/how-do-i-convert-argv-to-lpcommandline-parameter-of-createprocesshttps://docs.microsoft.com/en-us/cpp/cpp/parsing-cpp-command-line-arguments?view=vs-2017
For this reason, the OCI spec has recently been updated to cater
for more natural syntax by including a CommandLine option in
Process.
What does this commit do?
Primary objective is to ensure that the built OCI spec is unambigious.
It changes the builder so that `ArgsEscaped` as commited in a
layer is only controlled by the use of CMD or ENTRYPOINT.
Subsequently, when calling in to create a container from the builder,
if follows a different path to both `docker run` and `docker create`
using the added `ContainerCreateIgnoreImagesArgsEscaped`. This allows
a RUN from the builder to control how to escape in the OCI spec.
It changes the builder so that when shell form is used for RUN,
CMD or ENTRYPOINT, it builds (for WCOW) a more natural command line
using the original as put by the user in the dockerfile, not
the parsed version as a set of args which loses fidelity.
This command line is put into args[0] and `ArgsEscaped` is set
to true for CMD or ENTRYPOINT. A RUN statement does not commit
`ArgsEscaped` to the commited layer regardless or whether shell
or exec form were used.
Signed-off-by: John Howard <jhoward@microsoft.com>
This is the first step in refactoring moby (dockerd) to use containerd on Windows.
Similar to the current model in Linux, this adds the option to enable it for runtime.
It does not switch the graphdriver to containerd snapshotters.
- Refactors libcontainerd to a series of subpackages so that either a
"local" containerd (1) or a "remote" (2) containerd can be loaded as opposed
to conditional compile as "local" for Windows and "remote" for Linux.
- Updates libcontainerd such that Windows has an option to allow the use of a
"remote" containerd. Here, it communicates over a named pipe using GRPC.
This is currently guarded behind the experimental flag, an environment variable,
and the providing of a pipename to connect to containerd.
- Infrastructure pieces such as under pkg/system to have helper functions for
determining whether containerd is being used.
(1) "local" containerd is what the daemon on Windows has used since inception.
It's not really containerd at all - it's simply local invocation of HCS APIs
directly in-process from the daemon through the Microsoft/hcsshim library.
(2) "remote" containerd is what docker on Linux uses for it's runtime. It means
that there is a separate containerd service running, and docker communicates over
GRPC to it.
To try this out, you will need to start with something like the following:
Window 1:
containerd --log-level debug
Window 2:
$env:DOCKER_WINDOWS_CONTAINERD=1
dockerd --experimental -D --containerd \\.\pipe\containerd-containerd
You will need the following binary from github.com/containerd/containerd in your path:
- containerd.exe
You will need the following binaries from github.com/Microsoft/hcsshim in your path:
- runhcs.exe
- containerd-shim-runhcs-v1.exe
For LCOW, it will require and initrd.img and kernel in `C:\Program Files\Linux Containers`.
This is no different to the current requirements. However, you may need updated binaries,
particularly initrd.img built from Microsoft/opengcs as (at the time of writing), Linuxkit
binaries are somewhat out of date.
Note that containerd and hcsshim for HCS v2 APIs do not yet support all the required
functionality needed for docker. This will come in time - this is a baby (although large)
step to migrating Docker on Windows to containerd.
Note that the HCS v2 APIs are only called on RS5+ builds. RS1..RS4 will still use
HCS v1 APIs as the v2 APIs were not fully developed enough on these builds to be usable.
This abstraction is done in HCSShim. (Referring specifically to runtime)
Note the LCOW graphdriver still uses HCS v1 APIs regardless.
Note also that this does not migrate docker to use containerd snapshotters
rather than graphdrivers. This needs to be done in conjunction with Linux also
doing the same switch.
This function was previously used on the client to validate
tmpfs options, but is no longer used since
b9b8d8b364, as this validation
is platform-specific, so should be handled by the daemon.
Removing this function as it's no longer used anywhere.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Please refer to `docs/rootless.md`.
TLDR:
* Make sure `/etc/subuid` and `/etc/subgid` contain the entry for you
* `dockerd-rootless.sh --experimental`
* `docker -H unix://$XDG_RUNTIME_DIR/docker.sock run ...`
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
As reported in docker/for-linux/issues/484, since Docker 18.06
docker cp with a destination file name fails with the following error:
> archive/tar: cannot encode header: Format specifies USTAR; and USTAR cannot encode Name="a_very_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx_long_filename_that_is_101_characters"
The problem is caused by changes in Go 1.10 archive/tar, which
mis-guesses the tar stream format as USTAR (rather than PAX),
which, in turn, leads to inability to specify file names
longer than 100 characters.
This tar stream is sent by TarWithOptions() (which, since we switched to
Go 1.10, explicitly sets format=PAX for every file, see FileInfoHeader(),
and before Go 1.10 it was PAX by default). Unfortunately, the receiving
side, RebaseArchiveEntries(), which calls tar.Next(), mistakenly guesses
header format as USTAR, which leads to the above error.
The fix is easy: set the format to PAX in RebaseArchiveEntries()
where we read the tar stream and change the file name.
A unit test is added to prevent future regressions.
NOTE this code is not used by dockerd, but rather but docker cli
(also possibly other clients), so this needs to be re-vendored
to cli in order to take effect.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
`time.After` keeps a timer running until the specified duration is
completed. It also allocates a new timer on each call. This can wind up
leaving lots of uneccessary timers running in the background that are
not needed and consume resources.
Instead of `time.After`, use `time.NewTimer` so the timer can actually
be stopped.
In some of these cases it's not a big deal since the duraiton is really
short, but in others it is much worse.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
RHEL/CentOS 3.10 kernels report that kernel-memory accounting is supported,
but it actually does not work.
Runc (when compiled for those kernels) will be compiled without kernel-memory
support, so even though the daemon may be reporting that it's supported,
it actually is not.
This cause tests to fail when testing against a daemon that's using a runc
version without kmem support.
For now, skip these tests based on the kernel version reported by the daemon.
This should fix failures such as:
```
FAIL: /go/src/github.com/docker/docker/integration-cli/docker_cli_run_unix_test.go:499: DockerSuite.TestRunWithKernelMemory
assertion failed:
Command: /usr/bin/docker run --kernel-memory 50M --name test1 busybox cat /sys/fs/cgroup/memory/memory.kmem.limit_in_bytes
ExitCode: 0
Error: <nil>
Stdout: 9223372036854771712
Stderr: WARNING: You specified a kernel memory limit on a kernel older than 4.0. Kernel memory limits are experimental on older kernels, it won't work as expected and can cause your system to be unstable.
Failures:
Expected stdout to contain "52428800"
FAIL: /go/src/github.com/docker/docker/integration-cli/docker_cli_update_unix_test.go:125: DockerSuite.TestUpdateKernelMemory
/go/src/github.com/docker/docker/integration-cli/docker_cli_update_unix_test.go:136:
...open /go/src/github.com/docker/docker/integration-cli/docker_cli_update_unix_test.go: no such file or directory
... obtained string = "9223372036854771712"
... expected string = "104857600"
----------------------------------------------------------------------
FAIL: /go/src/github.com/docker/docker/integration-cli/docker_cli_update_unix_test.go:139: DockerSuite.TestUpdateKernelMemoryUninitialized
/go/src/github.com/docker/docker/integration-cli/docker_cli_update_unix_test.go:149:
...open /go/src/github.com/docker/docker/integration-cli/docker_cli_update_unix_test.go: no such file or directory
... value = nil
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
* Replaces `cocks` with `cerf` as the former might be perceived as
offensive by some people (as pointed out by @jeking3
[here](https://github.com/moby/moby/pull/37157#commitcomment-31758059))
* Removes a duplicate entry for `burnell`
* Re-arranges the entry for `sutherland` to ensure that the names are in
sorted order
* Adds entries for `shamir` and `wilbur`
Signed-off-by: Debayan De <debayande@users.noreply.github.com>
The errors returned from Mount and Unmount functions are raw
syscall.Errno errors (like EPERM or EINVAL), which provides
no context about what has happened and why.
Similar to os.PathError type, introduce mount.Error type
with some context. The error messages will now look like this:
> mount /tmp/mount-tests/source:/tmp/mount-tests/target, flags: 0x1001: operation not permitted
or
> mount tmpfs:/tmp/mount-test-source-516297835: operation not permitted
Before this patch, it was just
> operation not permitted
[v2: add Cause()]
[v3: rename MountError to Error, document Cause()]
[v4: fixes; audited all users]
[v5: make Error type private; changes after @cpuguy83 reviews]
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
It has been pointed out that we're ignoring EINVAL from umount(2)
everywhere, so let's move it to a lower-level function. Also, its
implementation should be the same for any UNIX incarnation, so
let's consolidate it.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
syscall.Stat (and Lstat), unlike functions from os pkg,
return "raw" errors (like EPERM or EINVAL), and those are
propagated up the function call stack unchanged, and gets
logged and/or returned to the user as is.
Wrap those into os.PathError{} so the error message will
at least have function name and file name.
Note we use Capitalized function names to distinguish
between functions in os and ours.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Ubuntu kernel supports overlayfs in user namespaces.
However, Docker had previously crafting overlay opaques directly
using mknod(2) and setxattr(2), which are not supported in userns.
Tested with LXD, Ubuntu 18.04, kernel 4.15.0-36-generic #39-Ubuntu.
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
This fix tries to address the issue raised in 37038 where
there were no memory.kernelTCP support for linux.
This fix add MemoryKernelTCP to HostConfig, and pass
the config to runtime-spec.
Additional test case has been added.
This fix fixes 37038.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
Signed-off-by: John Howard <jhoward@microsoft.com>
If fixes an error in sameFsTime which was using `==` to compare two times. The correct way is to use go's built-in timea.Equals(timeb).
In changes_windows, it uses sameFsTime to compare mTim of a `system.StatT` to allow TestChangesDirsMutated to operate correctly now.
Note there is slight different between the Linux and Windows implementations of detecting changes. Due to https://github.com/moby/moby/issues/9874,
and the fix at https://github.com/moby/moby/pull/11422, Linux does not consider a change to the directory time as a change. Windows on NTFS
does. See https://github.com/moby/moby/pull/37982 for more information. The result in `TestChangesDirsMutated`, `dir3` is NOT considered a change
in Linux, but IS considered a change on Windows. The test mutates dir3 to have a mtime of +1 second.
With a handful of tests still outstanding, this change ports most of the unit tests under pkg/archive to Windows.
It provides an implementation of `copyDir` in tests for Windows. To make a copy similar to Linux's `cp -a` while preserving timestamps
and links to both valid and invalid targets, xcopy isn't sufficient. So I used robocopy, but had to circumvent certain exit codes that
robocopy exits with which are warnings. Link to article describing this is in the code.
This function ensures the argument is the mount point
(i.e. if it's not, it bind mounts it to itself).
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
1. There is no need to specify rw argument -- bind mounts are
read-write by default.
2. There is no point in parsing /proc/self/mountinfo after performing
a mount, especially if we don't check whether the fs is mounted or
not -- the only outcome from it could be an error from our mountinfo
parser, which makes no sense in this context.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Using a value such as `--cpuset-mems=1-9223372036854775807` would cause
`dockerd` to run out of memory allocating a map of the values in the
validation code. Set limits to the normal limit of the number of CPUs,
and improve the error handling.
Reported by Huawei PSIRT.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This should eliminate a bunch of new (go-1.11 related) validation
errors telling that the code is not formatted with `gofmt -s`.
No functional change, just whitespace (i.e.
`git show --ignore-space-change` shows nothing).
Patch generated with:
> git ls-files | grep -v ^vendor/ | grep .go$ | xargs gofmt -s -w
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The code in Close() that removes the watches was not working,
because it first sets `w.closed = true` and then calls w.close(),
which starts with
```
if w.closed {
return errPollerClosed
}
```
Fix by setting w.closed only after calling w.remove() for all the
files being watched.
While at it, remove the duplicated `delete(w.watches, name)` code.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
There is no need to wait for up to 200ms in order to close
the file descriptor once the chClose is received.
This commit might reduce the chances for occasional "The process
cannot access the file because it is being used by another process"
error on Windows, where an opened file can't be removed.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
* Add cool, crazy, charming, magical and sweet as a adjectives (Aug 18)
* Add four male scientists to the list - faraday, maxwell, sutherland, and moore (Aug 21)
* Add four female scientists to the list - cannon, moser and rhodes (Aug 28)
Signed-off-by: Yadnyawalkya Tale <yadnyawalkyatale@gmail.com>
This implements chown support on Windows. Built-in accounts as well
as accounts included in the SAM database of the container are supported.
NOTE: IDPair is now named Identity and IDMappings is now named
IdentityMapping.
The following are valid examples:
ADD --chown=Guest . <some directory>
COPY --chown=Administrator . <some directory>
COPY --chown=Guests . <some directory>
COPY --chown=ContainerUser . <some directory>
On Windows an owner is only granted the permission to read the security
descriptor and read/write the discretionary access control list. This
fix also grants read/write and execute permissions to the owner.
Signed-off-by: Salahuddin Khan <salah@docker.com>
Since go-1.11beta1 archive/tar, tar headers with Typeflag == TypeRegA
(numeric 0) (which is the default unless explicitly initialized) are
modified to have Typeflag set to either tar.TypeReg (character value
'0', not numeric 0) or tar.TypeDir (character value '5') [1].
This results in different Typeflag value in the resulting header,
leading to a different Checksum, and causing the following test
case errors:
> 12:09:14 --- FAIL: TestTarSums (0.05s)
> 12:09:14 tarsum_test.go:393: expecting
> [tarsum+sha256:8bf12d7e67c51ee2e8306cba569398b1b9f419969521a12ffb9d8875e8836738],
> but got
> [tarsum+sha256:75258b2c5dcd9adfe24ce71eeca5fc5019c7e669912f15703ede92b1a60cb11f]
> ... (etc.)
All the other code explicitly sets the Typeflag field, but this test
case is not, causing the incompatibility with Go 1.11. Therefore,
the fix is to set TypeReg explicitly, and change the expected checksums
in test cases).
Alternatively, we can vendor archive/tar again (for the 100th time),
but given that the issue is limited to the particular test case it
does not make sense.
This fixes the test for all Go versions.
[1] https://go-review.googlesource.com/c/go/+/85656
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: John Howard <jhoward@microsoft.com>
Addresses https://github.com/moby/moby/pull/35089#issuecomment-367802698.
This change enables the daemon to automatically select an image under LCOW
that can be used if the API doesn't specify an explicit platform.
For example:
FROM supertest2014/nyan
ADD Dockerfile /
And docker build . will download the linux image (not a multi-manifest image)
And similarly docker pull ubuntu will match linux/amd64
This makes it a bit simpler to remove this interface for v2 plugins
and not break external projects (libnetwork and swarmkit).
Note that before we remove the `Client()` interface from `CompatPlugin`
libnetwork and swarmkit must be updated to explicitly check for the v1
client interface as is done int his PR.
This is just a minor tweak that I realized is needed after trying to
implement the needed changes on libnetwork.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Sometimes docker-master CI fails on rhel4+selinux configuration,
like this:
--- FAIL: TestMount (0.12s)
--- FAIL: TestMount/none-remount,size=128k (0.01s)
mounter_linux_test.go:209: unexpected mount option "seclabel" expected "rw,size=128k"
--- FAIL: TestMount/none-remount,ro,size=128k (0.01s)
mounter_linux_test.go:209: unexpected mount option "seclabel" expected "ro,size=128k"
Earlier, commit 8bebd42df2 (PR #34965) fixed this failure,
but not entirely (i.e. the test is now flaky). It looks like
either selinux detection code is not always working (it won't
work in d-in-d), or the kernel might or might not add 'seclabel'
option).
As the subject of this test case is definitely not selinux,
it can just ignore the option added by it.
While at it, fix error messages:
- add missing commas;
- fix a typo;
- allow for clear distinction between mount
and vfs (per-superblock) options.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In pkg/term/proxy.go and pkg/term/proxy_test.go, check if escapeKeys is empty and if it is, return the one key read
Signed-off-by: Patrik Cyvoct <patrik@ptrk.io>
Since Go 1.7, context is a standard package. Since Go 1.9, everything
that is provided by "x/net/context" is a couple of type aliases to
types in "context".
Many vendored packages still use x/net/context, so vendor entry remains
for now.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
It should check `os.Geteuid` with `uid` instead of `os.Getegid`.
On the container (where the tests run), the uid and gid seems to be
the same, thus this doesn't fail.
Signed-off-by: Vincent Demeester <vincent@sbr.pm>
govet complains (when using standard "context" package):
> the cancel function returned by context.WithTimeout should be called,
> not discarded, to avoid a context leak (vet)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Now, every Unmount() call takes a burden to parse the whole nine yards
of /proc/self/mountinfo to figure out whether the given mount point is
mounted or not (and returns an error in case parsing fails somehow).
Instead, let's just call umount() and ignore EINVAL, which results
in the same behavior, but much better performance.
Note that EINVAL is returned from umount(2) not only in the case when
`target` is not mounted, but also for invalid flags. As the flags are
hardcoded here, it can't be the case.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The mountinfo parser implemented via `fmt.Sscanf()` is slower than the one
using `strings.Split()` and `strconv.Atoi()`. This rewrite helps to speed it
up to a factor of 8x, here is a result from go bench:
> BenchmarkParsingScanf-4 300 22294112 ns/op
> BenchmarkParsingSplit-4 3000 2780703 ns/op
I tried other approaches, such as using `fmt.Sscanf()` for the first
three (integer) fields and `strings.Split()` for the rest, but it slows
things down considerably:
> BenchmarkParsingMixed-4 1000 8827058 ns/op
Note the old code uses `fmt.Sscanf`, when a linear search for '-' field,
when a split for the last 3 fields. The new code relies on a single
split.
I have also added more comments to aid in future development.
Finally, the test data is fixed to now have white space before the first field.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The flow of getSourceMount was:
1 get all entries from /proc/self/mountinfo
2 do a linear search for the `source` directory
3 if found, return its data
4 get the parent directory of `source`, goto 2
The repeated linear search through the whole mountinfo (which can have
thousands of records) is inefficient. Instead, let's just
1 collect all the relevant records (only those mount points
that can be a parent of `source`)
2 find the record with the longest mountpath, return its data
This was tested manually with something like
```go
func TestGetSourceMount(t *testing.T) {
mnt, flags, err := getSourceMount("/sys/devices/msr/")
assert.NoError(t, err)
t.Logf("mnt: %v, flags: %v", mnt, flags)
}
```
...but it relies on having a specific mount points on the system
being used for testing.
[v2: add unit tests for ParentsFilter]
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Functions `GetMounts()` and `parseMountTable()` return all the entries
as read and parsed from /proc/self/mountinfo. In many cases the caller
is only interested only one or a few entries, not all of them.
One good example is `Mounted()` function, which looks for a specific
entry only. Another example is `RecursiveUnmount()` which is only
interested in mount under a specific path.
This commit adds `filter` argument to `GetMounts()` to implement
two things:
1. filter out entries a caller is not interested in
2. stop processing if a caller is found what it wanted
`nil` can be passed to get a backward-compatible behavior, i.e. return
all the entries.
A few filters are implemented:
- `PrefixFilter`: filters out all entries not under `prefix`
- `SingleEntryFilter`: looks for a specific entry
Finally, `Mounted()` is modified to use `SingleEntryFilter()`, and
`RecursiveUnmount()` is using `PrefixFilter()`.
Unit tests are added to check filters are working.
[v2: ditch NoFilter, use nil]
[v3: ditch GetMountsFiltered()]
[v4: add unit test for filters]
[v5: switch to gotestyourself]
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Prevent changing the tar output by setting the format to
PAX and keeping the times truncated.
Without this change the archiver will produce different tar
archives with different hashes with go 1.10.
The addition of the access and changetime timestamps would
also cause diff comparisons to fail.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Remove invalid flush commands, flush should only occur when file
has been completely written. This is already handle, remove these calls.
Ensure data gets written after EOF in correct order and before close.
Remove gname and uname from sum for hash compatibility.
Update tarsum tests for gname/uname removal.
Return valid length after eof.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
When the authz response buffer limit is hit, perform a flush.
This prevents excessive buffer sizes, especially on large responses
(e.g. `/containers/<id>/archive` or `/containers/<id>/export`).
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Sorting by mount point length can be implemented in a more
straightforward fashion since Go 1.8 introduced sort.Slice()
with an ability to provide a less() function in place.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This makes `go test .` to pass if run as non-root user, skipping
those tests that require superuser privileges (for `mount`).
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
dm_task_deferred_remove is not supported by all distributions, due to
out-dated versions of devicemapper. However, in the case where the
devicemapper library was updated without rebuilding Docker (which can
happen in some distributions) then we should attempt to dynamically load
the relevant object rather than try to link to it.
This can only be done if Docker was built dynamically, for obvious
reasons.
In order to avoid having issues arise when dlsym(3) was unnecessary,
gate the whole dlsym(3) logic behind a buildflag that we disable by
default (libdm_dlsym_deferred_remove).
Signed-off-by: Aleksa Sarai <asarai@suse.de>
Before this change, volume management was relying on the fact that
everything the plugin mounts is visible on the host within the plugin's
rootfs. In practice this caused some issues with mount leaks, so we
changed the behavior such that mounts are not visible on the plugin's
rootfs, but available outside of it, which breaks volume management.
To fix the issue, allow the plugin to scope the path correctly rather
than assuming that everything is visible in `p.Rootfs`.
In practice this is just scoping the `PropagatedMount` paths to the
correct host path.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>