enable resource limitation by disabling cgroup v1 warnings
resource limitation still doesn't work with rootless mode (even with systemd mode)
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
If `unix.Lgetxattr` returns an error, then `sz == -1` which will cause a
runtime panic if `errno == unix.ERANGE`.
Signed-off-by: Sascha Grunert <sgrunert@suse.com>
This clarifies comments about static linking made in commit a8608b5b67.
1. There are two ways to create a static binary, one is to disable
cgo, the other is to set linker flags. When cgo is disabled,
there is no need to use osusergo build tag.
2. osusergo only needs to be set when linking against glibc.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
TL;DR: there is no way to do this right.
We do know that in some combination of build tags set (or unset),
linker flags, environment variables, and libc implementation,
this package won't work right. In fact, there is one specific
combination:
1. `CGO_ENABLED=1` (or unset)
2. static binary is being built (e.g. `go build` is run with `-extldflags -static`)
3. `go build` links the binary against glibc
4. `osusergo` is not set
This particular combination results in the following legitimate linker warning:
> cgo_lookup_unix.go: warning: Using 'getpwuid_r' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
If this warning is ignored and the resulting binary is used on a system
with files from a different glibc version (or without those files), it
could result in a segfault.
The commit being reverted tried to guard against such possibility,
but the problem is, we can only use build tags to account for items
1 and 4 from the above list, while items 2 and 3 do not result in
any build tags being set or unset, making this guard excessive.
Remove it.
This reverts commit 023b072288.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
```
pkg/mount/mountinfo_linux.go:93:5: SA4011: ineffective break statement. Did you mean to break out of the outer loop? (staticcheck)
break
^
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Also fixed some incorrectly formatted comments
```
pkg/jsonmessage/jsonmessage.go:180:20: SA1006: printf-style function with dynamic format string and no further arguments should use print-style function instead (staticcheck)
fmt.Fprintf(out, endl)
^
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This struct now has a properly typed member, so use the properly typed
functions with it.
Also update the vendor directory and hope nothing explodes.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This is to ensure that users of the homedir package cannot
compile statically (CGO_ENABLED=0) without also setting the osusergo
build tag.
Signed-off-by: Tibor Vass <tibor@docker.com>
About github.com/opencontainers/runc/libcontainer/user:
According to 195d8d544a
this package has two functions:
- Have a static implementation of user lookup, which is now supported in the
os/user stdlib package with the osusergo build tag, but wasn't at the time.
- Have extra functions that os/user doesn't have, but none of those are used
in homedir.
Since https://github.com/moby/moby/pull/11287, homedir depended directly on
libcontainer's user package for CurrentUser().
This is being replaced with os/user.Current(), because all of our static
binaries are compiled with the osusergo tag, and for dynamic libraries it
is more correct to use libc's implementation than parsing /etc/passwd.
About github.com/docker/docker/pkg/idtools:
Only dependency was from GetStatic() which uses idtools.LookupUID(uid).
The implementation of idtools.LookupUID just calls to
github.com/opencontainers/runc/libcontainer/user.LookupUid or fallbacks
to exec-ing to getent (since https://github.com/moby/moby/pull/27599).
This patch replaces calls to homedir.GetStatic by homedir.Get(), opting out
of supporting nss lookups in static binaries via exec-ing to getent for
the homedir package.
If homedir package users need to support nss lookups, they are advised
to compile dynamically instead.
Signed-off-by: Tibor Vass <tibor@docker.com>
Staticcheck reported:
SA9002: file mode '600' evaluates to 01130; did you mean '0600'? (staticcheck)
But fixing that caused the test to fail:
=== Failed
=== FAIL: pkg/filenotify TestPollerEvent (0.80s)
poller_test.go:75: timeout waiting for event CHMOD
The problem turned out to be that the file was created with `0644`. However,
after umask, the file created actually had `0600` filemode. Running the `os.Chmod`
with `0600` therefore was a no-op, causing the test to fail (because no
CHMOD event would fire).
This patch changes the test to;
- create the file with mode `0600`
- assert that the file has the expected mode
- change the chmod to `0644`
- assert that it has the correct mode, before testing the event.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
```
pkg/devicemapper/devmapper_wrapper.go:209:206: SA4000: identical expressions on the left and right side of the '==' operator (staticcheck)
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
As pointed out by govet,
> pkg/jsonmessage/jsonmessage_test.go:231:94: nilness: nil dereference in dynamic method call (govet)
> if err := DisplayJSONMessagesStream(reader, data, inFd, false, nil); err == nil && err.Error()[:17] != "invalid character" {
> ^
The nil deref never happened as err was always non-nil, and so the check
for error message text was not performed.
Fix this, and while at it, refactor the code a bit.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Fix the following warnings from staticcheck linter:
```
pkg/term/term_linux_test.go:34:2: SA5001: should check returned error before deferring tty.Close() (staticcheck)
defer tty.Close()
^
pkg/term/term_linux_test.go:52:2: SA5001: should check returned error before deferring tty.Close() (staticcheck)
defer tty.Close()
^
pkg/term/term_linux_test.go:67:2: SA5001: should check returned error before deferring tty.Close() (staticcheck)
defer tty.Close()
^
....
```
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
opts/env_test: suppress a linter warning
this one:
> opts/env_test.go:95:4: U1000: field `err` is unused (unused)
> err error
> ^
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Format the source according to latest goimports.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
```
13:06:14 pkg/directory/directory_test.go:182:2: S1032: should use sort.Strings(...) instead of sort.Sort(sort.StringSlice(...)) (gosimple)
13:06:14 sort.Sort(sort.StringSlice(results))
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
```
13:06:14 pkg/pools/pools.go:75:13: SA6002: argument should be pointer-like to avoid allocations (staticcheck)
13:06:14 bp.pool.Put(b)
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
There is a race condition in pkg/archive when using `cmd.Start` for pigz
and xz where the `*bufio.Reader` could be returned to the pool while the
command is still writing to it, and then picked up and used by a new
command.
The command is wrapped in a `CommandContext` where the process will be
killed when the context is cancelled, however this is not instantaneous,
so there's a brief window while the command is still running but the
`*bufio.Reader` was already returned to the pool.
wrapReadCloser calls `cancel()`, and then `readBuf.Close()` which
eventually returns the buffer to the pool. However, because cmdStream
runs `cmd.Wait` in a go routine that we never wait for to finish, it is
not safe to return the reader to the pool yet. We need to ensure we
wait for `cmd.Wait` to finish!
Signed-off-by: Stephen Benjamin <stephen@redhat.com>
CheckSystemDriveAndRemoveDriveLetter depends on pathdriver.PathDriver
unnecessarily. This depends on the minimal interface that it actually
needs, to avoid callers from unnecessarily bringing in a
containerd/continuity dependency.
Signed-off-by: Jon Johnson <jonjohnson@google.com>
Support for GOOS=solaris was removed in PR #35373. Remove two leftover
*_solaris.go files missed in this PR.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
The `ioctl` interface for the `LOOP_CTL_GET_FREE` request on
`/dev/loop-control` is a little different from what `unix.IoctlGetInt`
expects: the first index is the returned status in `r1`, not an `int`
pointer as the first parameter.
Unfortunately we have to go a little lower level to get the appropriate
loop device index out, using `unix.Syscall` directly to read from
`r1`. Internally, the index is returned as a signed integer to match the
internal `ioctl` expectations of interpreting a negative signed integer
as an error at the userspace ABI boundary, so the direct interface of
`ioctlLoopCtlGetFree` can remain as-is.
[@kolyshkin: it still worked before this fix because of
/dev scan fallback in ioctlLoopCtlGetFree()]
Signed-off-by: Daniel Sweet <danieljsweet@icloud.com>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
- use subtests to make it clearer what the individual test-cases
are, and to prevent tests from depending on values set by the
previous test(s).
- remove redundant messages in assert (gotest.tools already prints
a useful message if assertions fail).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
also renamed the non-windows variant of this file to be
consistent with other files in this package
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
environment not in the chroot from untrusted files.
See also OpenVZ a3f732ef75/src/enter.c (L227-L234)
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
(cherry picked from commit cea6dca993c2b4cfa99b1e7a19ca134c8ebc236b)
Signed-off-by: Tibor Vass <tibor@docker.com>
This fix was added in 8e71b1e210 to work around
a go issue (https://github.com/golang/go/issues/20506).
That issue was fixed in
66c03d39f3,
which is part of Go 1.10 and up. This reverts the changes that were made in
8e71b1e210, and are no longer needed.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Not really bullet-proof, users can still create cgroups with name like
"foo:/init.scope" or "\nfoo" to bypass the detection. However, solving
these cases will require kernel to provide a better interface.
Signed-off-by: Robert Wang <robert@arctic.tw>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This can happen when you have --config-only network
Such attempt will fail anyway and it will create 15s delay in container
startup
Signed-off-by: Pavel Matěja <pavel@verotel.cz>
Use Getpid and SchedGetaffinity from golang.org/x/sys/unix to get the
number of CPUs in numCPU on Linux.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
This is needed so that we can add OS version constraints in Swarmkit, which
does require the engine to report its host's OS version (see
https://github.com/docker/swarmkit/issues/2770).
The OS version is parsed from the `os-release` file on Linux, and from the
`ReleaseId` string value of the `SOFTWARE\Microsoft\Windows NT\CurrentVersion`
registry key on Windows.
Added unit tests when possible, as well as Prometheus metrics.
Signed-off-by: Jean Rouge <rougej+github@gmail.com>
Previously only unpack operations were supported with chroot.
This adds chroot support for packing operations.
This prevents potential breakouts when copying data from a container.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This is useful for preventing CVE-2018-15664 where a malicious container
process can take advantage of a race on symlink resolution/sanitization.
Before this change chrootarchive would chroot to the destination
directory which is attacker controlled. With this patch we always chroot
to the container's root which is not attacker controlled.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Moby currently sorts uid and gid ranges in id maps. This causes subuid
and subgid files to be interpreted wrongly.
The subuid file
```
> cat /etc/subuid
jonas:100000:1000
jonas:1000:1
```
configures that the container uids 0-999 are mapped to the host uids
100000-100999 and uid 1000 in the container is mapped to uid 1000 on the
host. The expected uid_map is:
```
> docker run ubuntu cat /proc/self/uid_map
0 100000 1000
1000 1000 1
```
Moby currently sorts the ranges by the first id in the range. Therefore
with the subuid file above the uid 0 in the container is mapped to uid
100000 on host and the uids 1-1000 in container are mapped to the uids
1-1000 on the host. The resulting uid_map is:
```
> docker run ubuntu cat /proc/self/uid_map
0 1000 1
1 100000 1000
```
The ordering was implemented to work around a limitation in Linux 3.8.
This is fixed since Linux 3.9 as stated on the user namespaces manpage
[1]:
> In the initial implementation (Linux 3.8), this requirement was
> satisfied by a simplistic implementation that imposed the further
> requirement that the values in both field 1 and field 2 of successive
> lines must be in ascending numerical order, which prevented some
> otherwise valid maps from being created. Linux 3.9 and later fix this
> limitation, allowing any valid set of nonoverlapping maps.
This fix changes the interpretation of subuid and subgid files which do
not have the ids of in the numerical order for each individual user.
This breaks users that rely on the current behaviour.
The desired mapping above - map low user ids in the container to high
user ids on the host and some higher user ids in the container to lower
user on host - can unfortunately not archived with the current
behaviour.
[1] http://man7.org/linux/man-pages/man7/user_namespaces.7.html
Signed-off-by: Jonas Dohse <jonas@dohse.ch>
This is enabled for all containers that are not run with --privileged,
if the kernel supports it.
Fixes#38332
Signed-off-by: Rob Gulewich <rgulewich@netflix.com>
Katherine Louise Bouman is an imaging scientist and Assistant Professor
of Computer Science at the California Institute of Technology. She
researches computational methods for imaging, and developed an algorithm
that made possible the picture first visualization of a black hole
using the Event Horizon Telescope. - https://en.wikipedia.org/wiki/Katie_Bouman
Thank you for being amazing!
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The only option we supply is either BIND or a mount propagation flag,
so it makes sense to specify the flag value directly, rather than using
parseOptions() every time.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Current code in MakeMount parses /proc/self/mountinfo twice:
first in call to Mounted(), then in call to Mount(). Use
ForceMount() to eliminate such double parsing.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
`/proc/self/mountinfo` uses `\040` for spaces, however, `parseInfoFile()`
did not decode those spaces in paths, therefore attempting to use `\040`
as a literal part of the path.
This patch un-quotes the `root` and `mount point` fields to fix
situations where paths contain spaces.
Note that the `mount source` field is not modified, given that
this field is documented (man `PROC(5)`) as:
filesystem-specific information or "none"
Which I interpreted as "the format in this field is undefined".
Reported-by: Daniil Yaroslavtsev <daniilyar@users.noreply.github.com>
Reported-by: Nathan Ringo <remexre@gmail.com>
Based-on-patch-by: Diego Becciolini <itizir@users.noreply.github.com>
Based-on-patch-by: Sergei Utinski <sergei-utinski@users.noreply.github.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
* Add new adjectives to the names generator
Signed-off-by: sh7dm <d3dx12.xx@gmail.com>
* Add some more adjectives to the names generator
Signed-off-by: sh7dm <d3dx12.xx@gmail.com>
Signed-off-by: John Howard <jhoward@microsoft.com>
Some permissions corrections here. Also needs re-vendor of go-winio.
- Create the layer folder directory as standard, not with SDDL. It will inherit permissions from the data-root correctly.
- Apply the VM Group SID access to layer.vhd
Permissions after this changes
Data root:
```
PS C:\> icacls test
test BUILTIN\Administrators:(OI)(CI)(F)
NT AUTHORITY\SYSTEM:(OI)(CI)(F)
```
lcow subdirectory under dataroot
```
PS C:\> icacls test\lcow
test\lcow BUILTIN\Administrators:(I)(OI)(CI)(F)
NT AUTHORITY\SYSTEM:(I)(OI)(CI)(F)
```
layer.vhd in a layer folder for LCOW
```
.\test\lcow\c33923d21c9621fea2f990a8778f469ecdbdc57fd9ca682565d1fa86fadd5d95\layer.vhd NT VIRTUAL MACHINE\Virtual Machines:(R)
BUILTIN\Administrators:(I)(F)
NT AUTHORITY\SYSTEM:(I)(F)
```
And showing working
```
PS C:\> docker-ci-zap -folder=c:\test
INFO: Zapped successfully
PS C:\> docker run --rm alpine echo hello
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
8e402f1a9c57: Pull complete
Digest: sha256:644fcb1a676b5165371437feaa922943aaf7afcfa8bfee4472f6860aad1ef2a0
Status: Downloaded newer image for alpine:latest
hello
```
This patch hard-codes support for NVIDIA GPUs.
In a future patch it should move out into its own Device Plugin.
Signed-off-by: Tibor Vass <tibor@docker.com>
Signed-off-by: John Howard <jhoward@microsoft.com>
Also fixes https://github.com/moby/moby/issues/22874
This commit is a pre-requisite to moving moby/moby on Windows to using
Containerd for its runtime.
The reason for this is that the interface between moby and containerd
for the runtime is an OCI spec which must be unambigious.
It is the responsibility of the runtime (runhcs in the case of
containerd on Windows) to ensure that arguments are escaped prior
to calling into HCS and onwards to the Win32 CreateProcess call.
Previously, the builder was always escaping arguments which has
led to several bugs in moby. Because the local runtime in
libcontainerd had context of whether or not arguments were escaped,
it was possible to hack around in daemon/oci_windows.go with
knowledge of the context of the call (from builder or not).
With a remote runtime, this is not possible as there's rightly
no context of the caller passed across in the OCI spec. Put another
way, as I put above, the OCI spec must be unambigious.
The other previous limitation (which leads to various subtle bugs)
is that moby is coded entirely from a Linux-centric point of view.
Unfortunately, Windows != Linux. Windows CreateProcess uses a
command line, not an array of arguments. And it has very specific
rules about how to escape a command line. Some interesting reading
links about this are:
https://blogs.msdn.microsoft.com/twistylittlepassagesallalike/2011/04/23/everyone-quotes-command-line-arguments-the-wrong-way/https://stackoverflow.com/questions/31838469/how-do-i-convert-argv-to-lpcommandline-parameter-of-createprocesshttps://docs.microsoft.com/en-us/cpp/cpp/parsing-cpp-command-line-arguments?view=vs-2017
For this reason, the OCI spec has recently been updated to cater
for more natural syntax by including a CommandLine option in
Process.
What does this commit do?
Primary objective is to ensure that the built OCI spec is unambigious.
It changes the builder so that `ArgsEscaped` as commited in a
layer is only controlled by the use of CMD or ENTRYPOINT.
Subsequently, when calling in to create a container from the builder,
if follows a different path to both `docker run` and `docker create`
using the added `ContainerCreateIgnoreImagesArgsEscaped`. This allows
a RUN from the builder to control how to escape in the OCI spec.
It changes the builder so that when shell form is used for RUN,
CMD or ENTRYPOINT, it builds (for WCOW) a more natural command line
using the original as put by the user in the dockerfile, not
the parsed version as a set of args which loses fidelity.
This command line is put into args[0] and `ArgsEscaped` is set
to true for CMD or ENTRYPOINT. A RUN statement does not commit
`ArgsEscaped` to the commited layer regardless or whether shell
or exec form were used.