We have upgraded runc to rc93 and added CI for cgroup 2.
So we can move cgroup v2 out of experimental.
Fix issue 41916
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
(cherry picked from commit 1d2a660093)
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
This test has been flaky for a long time, failing with:
--- FAIL: TestInspect (12.04s)
inspect_test.go:39: timeout hit after 10s: waiting for tasks to enter run state. task failed with error: task: non-zero exit (1)
While looking through logs, noticed tasks were started, entering RUNNING stage,
and then exited, to be started again.
state.transition="STARTING->RUNNING"
...
msg="fatal task error" error="task: non-zero exit (1)"
...
state.transition="RUNNING->FAILED"
Looking for possible reasons, first considering network issues (possibly we ran
out of IP addresses or networking not cleaned up), then I spotted the issue.
The service is started with;
Command: []string{"/bin/top"},
Args: []string{"-u", "root"},
The `-u root` is not an argument for the service, but for `/bin/top`. While the
Ubuntu/Debian/GNU version `top` has a -u/-U option;
docker run --rm ubuntu:20.04 top -h 2>&1 | grep '\-u'
top -hv | -bcEHiOSs1 -d secs -n max -u|U user -p pid(s) -o field -w [cols]
The *busybox* version of top does not:
docker run --rm busybox top --help 2>&1 | grep '\-u'
So running `top -u root` would cause the task to fail;
docker run --rm busybox top -u root
top: invalid option -- u
...
echo $?
1
As a result, the service went into a crash-loop, and because the `poll.WaitOn()`
was running with a short interval, in many cases would _just_ find the RUNNING
state, perform the `service inspect`, and pass, but in other cases, it would not
be that lucky, and continue polling untill we reached the 10 seconds timeout,
and mark the test as failed.
Looking for history of this option (was it previously using a different image?) I
found this was added in 6cd6d8646a, but probably
just missed during review.
Given that the option is only set to have "something" to inspect, I replaced
the `-u root` with `-d 5`, which makes top refresh with a 5 second interval.
Note that there is another test (`TestServiceListWithStatuses) that uses the same
spec, however, that test is skipped based on API version of the test-daemon, and
(to be looked into), when performing that check, no API version is known, causing
the test to (always?) be skipped:
=== RUN TestServiceListWithStatuses
--- SKIP: TestServiceListWithStatuses (0.00s)
list_test.go:34: versions.LessThan(testEnv.DaemonInfo.ServerVersion, "1.41")
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 00cb3073f4)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This was changed as part of a refactor to use containerd dist code. The
problem is the OCI media types are not compatible with older versions of
Docker.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit a876ede24f)
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Fix issue 41762
Cherry-pick "drivers: btrfs: Allow unprivileged user to delete subvolumes" from containers/storage
831e32b6bd
> In btrfs, subvolume can be deleted by IOC_SNAP_DESTROY ioctl but there
> is one catch: unprivileged IOC_SNAP_DESTROY call is restricted by default.
>
> This is because IOC_SNAP_DESTROY only performs permission checks on
> the top directory(subvolume) and unprivileged user might delete dirs/files
> which cannot be deleted otherwise. This restriction can be relaxed if
> user_subvol_rm_allowed mount option is used.
>
> Although the above ioctl had been the only way to delete a subvolume,
> btrfs now allows deletion of subvolume just like regular directory
> (i.e. rmdir sycall) since kernel 4.18.
>
> So if we fail to cleanup subvolume in subvolDelete(), just fallback to
> system.EnsureRmoveall() to try to cleanup subvolumes again.
> (Note: quota needs privilege, so if quota is enabled we do not fallback)
>
> This fix will allow non-privileged container works with btrfs backend.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
(cherry picked from commit 62b5194f62)
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
`getCurrentOOMScoreAdj()` was broken because `strconv.Atoi()` was called
without trimming "\n".
Fix issue 40068: `rootless docker in kubernetes: "getting the final child's pid from
pipe caused \"EOF\": unknown"
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
(cherry picked from commit d6ddfb6118)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
overlay2 no longer sets `archive.OverlayWhiteoutFormat` when
running in UserNS, so we can remove the complicated logic in the
archive package.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
(cherry picked from commit 6322dfc217)
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
When running in userns, returns error (i.e. "use naive, not native")
immediately.
No substantial change to the logic.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
(cherry picked from commit 67aa418df2)
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Previously, `d.naiveDiff.ApplyDiff` was not used even when
`useNaiveDiff()==true`
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
(cherry picked from commit dd97134232)
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
The following was failing previously, because `getUnprivilegedMountFlags()` was not called:
```console
$ sudo mount -t tmpfs -o noexec none /tmp/foo
$ $ docker --context=rootless run -it --rm -v /tmp/foo:/mnt:ro alpine
docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:520: container init caused: rootfs_linux.go:60: mounting "/tmp/foo" to rootfs at "/home/suda/.local/share/docker/overlay2/b8e7ea02f6ef51247f7f10c7fb26edbfb308d2af8a2c77915260408ed3b0a8ec/merged/mnt" caused: operation not permitted: unknown.
```
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
(cherry picked from commit 248f98ef5e)
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
full diff: fa125a3512...b3507428be
- fixed IPv6 iptables rules for enabled firewalld (libnetwork#2609)
- fixes "Docker uses 'iptables' instead of 'ip6tables' for IPv6 NAT rule, crashes"
- Fix regression in docker-proxy
- introduced in "Fix IPv6 Port Forwarding for the Bridge Driver" (libnetwork#2604)
- fixes/addresses: "IPv4 and IPv6 addresses are not bound by default anymore" (libnetwork#2607)
- fixes/addresses "IPv6 is no longer proxied by default anymore" (moby#41858)
- Use hostIP to decide on Portmapper version
- fixes docker-proxy not being stopped correctly
Port mapping of containers now contain separatet mappings for IPv4 and IPv6 addresses, when
listening on "any" IP address. Various tests had to be updated to take multiple mappings into
account.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 0450728267)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The normalizing was updated with the output of the "docker port" command
in mind, but we're normalizing the "expected" output, which is passed
without the "->" in front of the mapping, causing some tests to fail;
=== RUN TestDockerSuite/TestPortHostBinding
--- FAIL: TestDockerSuite/TestPortHostBinding (1.21s)
docker_cli_port_test.go:324: assertion failed: error is not nil: |:::9876!=[::]:9876|
=== RUN TestDockerSuite/TestPortList
--- FAIL: TestDockerSuite/TestPortList (0.96s)
docker_cli_port_test.go:25: assertion failed: error is not nil: |:::9876!=[::]:9876|
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit c8599a6537)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
With the promotion of the experimental Dockerfile syntax to "stable", the Dockerfile
syntax now includes some options that are supported by BuildKit, but not (yet)
supported in the classic builder.
As a result, parsing a Dockerfile may succeed, but any flag that's known to BuildKit,
but not supported by the classic builder is silently ignored;
$ mkdir buildkit_flags && cd buildkit_flags
$ touch foo.txt
For example, `RUN --mount`:
DOCKER_BUILDKIT=0 docker build --no-cache -f- . <<EOF
FROM busybox
RUN --mount=type=cache,target=/foo echo hello
EOF
Sending build context to Docker daemon 2.095kB
Step 1/2 : FROM busybox
---> 219ee5171f80
Step 2/2 : RUN --mount=type=cache,target=/foo echo hello
---> Running in 022fdb856bc8
hello
Removing intermediate container 022fdb856bc8
---> e9f0988844d1
Successfully built e9f0988844d1
Or `COPY --chmod` (same for `ADD --chmod`):
DOCKER_BUILDKIT=0 docker build --no-cache -f- . <<EOF
FROM busybox
COPY --chmod=0777 /foo.txt /foo.txt
EOF
Sending build context to Docker daemon 2.095kB
Step 1/2 : FROM busybox
---> 219ee5171f80
Step 2/2 : COPY --chmod=0777 /foo.txt /foo.txt
---> 8b7117932a2a
Successfully built 8b7117932a2a
Note that unknown flags still produce and error, for example, the below fails because `--hello` is an unknown flag;
DOCKER_BUILDKIT=0 docker build -<<EOF
FROM busybox
RUN --hello echo hello
EOF
Sending build context to Docker daemon 2.048kB
Error response from daemon: dockerfile parse error line 2: Unknown flag: hello
With this patch applied
----------------------------
With this patch applied, flags that are known in the Dockerfile spec, but are not
supported by the classic builder, produce an error, which includes a link to the
documentation how to enable BuildKit:
DOCKER_BUILDKIT=0 docker build --no-cache -f- . <<EOF
FROM busybox
RUN --mount=type=cache,target=/foo echo hello
EOF
Sending build context to Docker daemon 2.048kB
Step 1/2 : FROM busybox
---> b97242f89c8a
Step 2/2 : RUN --mount=type=cache,target=/foo echo hello
the --mount option requires BuildKit. Refer to https://docs.docker.com/go/buildkit/ to learn how to build images with BuildKit enabled
DOCKER_BUILDKIT=0 docker build --no-cache -f- . <<EOF
FROM busybox
COPY --chmod=0777 /foo.txt /foo.txt
EOF
Sending build context to Docker daemon 2.095kB
Step 1/2 : FROM busybox
---> b97242f89c8a
Step 2/2 : COPY --chmod=0777 /foo.txt /foo.txt
the --chmod option requires BuildKit. Refer to https://docs.docker.com/go/buildkit/ to learn how to build images with BuildKit enabled
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit a09c0276a2)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Also re-formatting some lines for readability.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit c6038b4884)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Rootlesskit doesn't currently handle IPv6 addresses, causing TestNetworkLoopbackNat
and TestNetworkNat to fail;
Error starting userland proxy:
error while calling PortManager.AddPort(): listen tcp: address :::8080: too many colons in address
This patch:
- Updates `getExternalAddress()` to pick IPv4 address if both IPv6 and IPv4 are found
- Update TestNetworkNat to net.JoinHostPort(), so that square brackets are used for
IPv6 addresses (e.g. `[::]:8080`)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit f845b98ca6)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Since rootlesskit removed vendor folder, building it has to rely on go mod.
Dockerfile in docker-ce-packaging uses GOPROXY=direct, which makes "go mod"
commands use git to fetch modules. "go mod" in Go versions before 1.14.1 are
incompatible with older git versions, including the version of git that ships
with CentOS/RHEL 7 (which have git 1.8), see golang/go#38373
This patch switches rootlesskit install script to set GOPROXY to
https://proxy.golang.org so that git is not required for downloading modules.
Once all our code has upgraded to Go 1.14+, this workaround should be
removed.
Signed-off-by: Tibor Vass <tibor@docker.com>
(cherry picked from commit cbc6cefdcb)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
full diff: https://github.com/rootless-containers/rootlesskit/compare/v0.13.1...v0.14.0
v0.14.0 Changes (since v0.13.2)
--------------------------------------
- CLI: improve --help output
- API: support GET /info
- Port API: support specifying IP version explicitly ("tcp4", "tcp6")
- rootlesskit-docker-proxy: support libnetwork >= 20201216 convention
- Allow vendoring with moby/sys/mountinfo@v0.1.3 as well as @v0.4.0
- Remove socat port driver
- socat driver has been deprecated since v0.7.1 (Dec 2019)
- New experimental flag: --ipv6
- Enables IPv6 routing (slirp4netns --enable-ipv6). Unrelated to port driver.
v0.13.2
--------------------------------------
- Fix cleaning up crashed state dir
- Update Go to 1.16
- Misc fixes
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit e166af959d)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This seems to be testing a strange case, specifically that one can set
the `--net` and `--network` in the same command with the same network.
Indeed this used to work with older CLIs but newer ones error out when
validating the request before sending it to the daemon.
Opening this for discussion because:
1. This doesn't seem to be testing anything at all related to the rest
of the test
2. Not really providing any value here.
3. Is testing that a technically invalid option is successful (whether
the option should be valid as it relates to the CLI accepting it is
debatable).
4. Such a case seems fringe and even a bug in whatever is calling the
CLI with such options.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit e31086320e)
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
In 20.10 we no longer implicitly push all tags and require a
"--all-tags" flag, so add this to the test when the CLI is >= 20.10
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit 601707a655)
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Tonis mentioned that we can run into issues if there is more error
handling added here. This adds a custom reader implementation which is
like io.MultiReader except it does not cache EOF's.
What got us into trouble in the first place is `io.MultiReader` will
always return EOF once it has received an EOF, however the error
handling that we are going for is to recover from an EOF because the
underlying file is a file which can have more data added to it after
EOF.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit 5a664dc87d)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
When the multireader hits EOF, we will always get EOF from it, so we
cannot store the multrireader fro later error handling, only for the
decoder.
Thanks @tobiasstadler for pointing this error out.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit 4be98a38e7)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
(cherry picked from commit a8008f7313)
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
The "userxattr" option is needed for mounting overlayfs inside a user namespace with kernel >= 5.11.
The "userxattr" option is NOT needed for the initial user namespace (aka "the host").
Also, Ubuntu (since circa 2015) and Debian (since 10) with kernel < 5.11 can mount the overlayfs in a user namespace without the "userxattr" option.
The corresponding kernel commit: 2d2f2d7322ff43e0fe92bf8cccdc0b09449bf2e1
> **ovl: user xattr**
>
> Optionally allow using "user.overlay." namespace instead of "trusted.overlay."
> ...
> Disable redirect_dir and metacopy options, because these would allow privilege escalation through direct manipulation of the
> "user.overlay.redirect" or "user.overlay.metacopy" xattrs.
Fix issue 42055
Related to containerd/containerd PR 5076
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
(cherry picked from commit 11ef8d3ba9)
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Fixes#41704
The latest released versions of the static binaries (20.10.3) are still unable
to use faccessat2 with musl-1.2.2 even though this was addressed in #41353 and
related issues. The underlying cause seems to be that the build system
here still uses the default version of libseccomp shipped with buster.
An updated version is available in buster backports:
https://packages.debian.org/buster-backports/libseccomp-dev
Signed-off-by: Jeremy Huntwork <jhuntwork@lightcubesolutions.com>
(cherry picked from commit 1600e851b5)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>