Fix#41803
Also attempt to mknod devices.
Mknodding devices are likely to fail, but still worth trying when
running with a seccomp user notification.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
This fixes a panic when an invalid "device cgroup rule" is passed, resulting
in an "index out of range".
This bug was introduced in the original implementation in 1756af6faf,
but was not reproducible when using the CLI, because the same commit also added
client-side validation on the flag before making an API request. The following
example, uses an invalid rule (`c *:* rwm` - two spaces before the permissions);
```console
$ docker run --rm --network=host --device-cgroup-rule='c *:* rwm' busybox
invalid argument "c *:* rwm" for "--device-cgroup-rule" flag: invalid device cgroup format 'c *:* rwm'
```
Doing the same, but using the API results in a daemon panic when starting the container;
Create a container with an invalid device cgroup rule:
```console
curl -v \
--unix-socket /var/run/docker.sock \
"http://localhost/v1.41/containers/create?name=foobar" \
-H "Content-Type: application/json" \
-d '{"Image":"busybox:latest", "HostConfig":{"DeviceCgroupRules": ["c *:* rwm"]}}'
```
Start the container:
```console
curl -v \
--unix-socket /var/run/docker.sock \
-X POST \
"http://localhost/v1.41/containers/foobar/start"
```
Observe the daemon logs:
```
2021-01-22 12:53:03.313806 I | http: panic serving @: runtime error: index out of range [0] with length 0
goroutine 571 [running]:
net/http.(*conn).serve.func1(0xc000cb2d20)
/usr/local/go/src/net/http/server.go:1795 +0x13b
panic(0x2f32380, 0xc000aebfc0)
/usr/local/go/src/runtime/panic.go:679 +0x1b6
github.com/docker/docker/oci.AppendDevicePermissionsFromCgroupRules(0xc000175c00, 0x8, 0x8, 0xc0000bd380, 0x1, 0x4, 0x0, 0x0, 0xc0000e69c0, 0x0, ...)
/go/src/github.com/docker/docker/oci/oci.go:34 +0x64f
```
This patch:
- fixes the panic, allowing the daemon to return an error on container start
- adds a unit-test to validate various permutations
- adds a "todo" to verify the regular expression (and handling) of the "a" (all) value
We should also consider performing this validation when _creating_ the container,
so that an error is produced early.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
"docker load" validates parent links by comparing image histories, and the
History struct has a time.Time member "Created". Time.UnmarshalJSON can read
RFC3339 timestamps with offset "+00:00", but t.MarshalJSON writes them with
offset "Z". Equivalent times in these two formats are not equal when compared
with the == operator.
This causes checkValidParent to incorrectly return false when the parent image
history contains times using offset "+00:00". In that case the history copied
to the child image will have been converted into "Z" form when marshaled out.
This patch adds an "Equal" method to History, which compares "Created" times
with t.Equal. This is used instead of reflect.DeepEqual in checkValidParent.
Signed-off-by: Rob Cowsill <42620235+rcowsill@users.noreply.github.com>
Loggers that implement BufSize() (e.g. awslogs) uses the method to
tell Copier about the maximum log line length. However loggerWithCache
and RingBuffer hide the method by wrapping loggers.
As a result, Copier uses its default 16KB limit which breaks log
lines > 16kB even the destinations can handle that.
This change implements BufSize() on loggerWithCache and RingBuffer to
make sure these logger wrappes don't hide the method on the underlying
loggers.
Fixes#41794.
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
The following warnings in `docker info` are now discarded,
because there is no action user can actually take.
On cgroup v1:
- "WARNING: No blkio weight support"
- "WARNING: No blkio weight_device support"
On cgroup v2:
- "WARNING: No kernel memory TCP limit support"
- "WARNING: No oom kill disable support"
`docker run` still prints warnings when the missing feature is being attempted to use.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
When pulling an image by platform, it is possible for the image's
configured platform to not match what was in the manifest list.
The image itself is buggy because either the manifest list is incorrect
or the image config is incorrect. In any case, this is preventing people
from upgrading because many times users do not have control over these
buggy images.
This was not a problem in 19.03 because we did not compare on platform
before. It just assumed if we had the image it was the one we wanted
regardless of platform, which has its own problems.
Example Dockerfile that has this problem:
```Dockerfile
FROM --platform=linux/arm64 k8s.gcr.io/build-image/debian-iptables:buster-v1.3.0
RUN echo hello
```
This fails the first time you try to build after it finishes pulling but
before performing the `RUN` command.
On the second attempt it works because the image is already there and
does not hit the code that errors out on platform mismatch (Actually it
ignores errors if an image is returned at all).
Must be run with the classic builder (DOCKER_BUILDKIT=0).
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This fixes a panic when an admin specifies a custom default runtime,
when a plugin is started the shim config is nil.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This parameter was removed by kernel commit 4c145dce260137,
which made its way to kernel v5.3-rc1. Since that commit,
the functionality is built-in (i.e. it is available as long
as CONFIG_XFRM is on).
Make the check conditional.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
These config options are removed by kernel commit f382fb0bcef4,
which made its way into kernel v5.0-rc1.
Make the check conditional.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Kernel commit 2d1c498072de69e (which made its way into kernel v5.8-rc1)
removed CONFIG_MEMCG_SWAP_ENABLED Kconfig option, making swap accounting
always enabled (unless swapaccount=0 boot option is provided).
Make the check conditional.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
CONFIG_NF_NAT_NEEDED was removed in kernel commit 4806e975729f99c7,
which made its way into v5.2-rc1. The functionality is now under
NF_NAT which we already check for.
Make the check for NF_NAT_NEEDED conditional.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
CONFIG_NF_NAT_IPV4 was removed in kernel commit 3bf195ae6037e310,
which made its way into v5.1-rc1. The functionality is now under
NF_NAT which we already check for.
Make the check for NF_NAT_IPV4 conditional.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Commit f2f5106c92 added this test to verify loading
of images that were built with user-namespaces enabled.
However, because this test spins up a new daemon, not the daemon that's set up by
the test-suite's `TestMain()` (which loads the frozen images).
As a result, the `debian:bullseye` image was pulled from Docker Hub when running
the test;
Calling POST /v1.41/images/load?quiet=1
Applying tar in /go/src/github.com/docker/docker/bundles/test-integration/TestBuildUserNamespaceValidateCapabilitiesAreV2/d4d366b15997b/root/165536.165536/overlay2/3f7f9375197667acaf7bc810b34689c21f8fed9c52c6765c032497092ca023d6/diff" storage-driver=overlay
Applied tar sha256:845f0e5159140e9dbcad00c0326c2a506fbe375aa1c229c43f082867d283149c to 3f7f9375197667acaf7bc810b34689c21f8fed9c52c6765c032497092ca023d6, size: 5922359
Calling POST /v1.41/build?buildargs=null&cachefrom=null&cgroupparent=&cpuperiod=0&cpuquota=0&cpusetcpus=&cpusetmems=&cpushares=0&dockerfile=&labels=null&memory=0&memswap=0&networkmode=&rm=0&shmsize=0&t=capabilities%3A1.0&target=&ulimits=null&version=
Trying to pull debian from https://registry-1.docker.io v2
Fetching manifest from remote" digest="sha256:f169dbadc9021fc0b08e371d50a772809286a167f62a8b6ae86e4745878d283d" error="<nil>" remote="docker.io/library/debian:bullseye
Pulling ref from V2 registry: debian:bullseye
...
This patch updates `TestBuildUserNamespaceValidateCapabilitiesAreV2` to load the
frozen image. `StartWithBusybox` is also changed to `Start`, because the test
is not using the busybox image, so there's no need to load it.
In a followup, we should probably add some utilities to make this easier to set up
(and to allow passing the list frozen images that we want to load, without having
to "hard-code" the image name to load).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Using `d.Kill()` with rootless mode causes the restarted daemon to not
be able to start containerd (it times out).
Originally this was SIGKILLing the daemon because we were hoping to not
have to manipulate on disk state, but since we need to anyway we can
shut it down normally.
I also tested this to ensure the test fails correctly without the fix
that the test was added to check for.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Adds a test case for the case where dockerd gets stuck on startup due to
hanging `daemon.shutdownContainer`
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
When building images in a user-namespaced container, v3 capabilities are
stored including the root UID of the creator of the user-namespace.
This UID does not make sense outside the build environment however. If
the image is run in a non-user-namespaced runtime, or if a user-namespaced
runtime uses a different UID, the capabilities requested by the effective
bit will not be honoured by `execve(2)` due to this mismatch.
Instead, we convert v3 capabilities to v2, dropping the root UID on the
fly.
Signed-off-by: Eric Mountain <eric.mountain@datadoghq.com>