Commit graph

6716 commits

Author SHA1 Message Date
Tianon Gravi
f76958f612
Merge pull request #42245 from thaJeztah/use_proper_domains
Use designated test domains (RFC2606) in tests
2021-04-05 09:44:18 -07:00
Sebastiaan van Stijn
1df3d5c1de
Merge pull request #42203 from AkihiroSuda/btrfs-allow-unprivileged
btrfs: Allow unprivileged user to delete subvolumes (kernel >= 4.18)
2021-04-05 16:35:12 +02:00
Sebastiaan van Stijn
97a5b797b6
Use designated test domains (RFC2606) in tests
Some tests were using domain names that were intended to be "fake", but are
actually registered domain names (such as domain.com, registry.com, mytest.com).

Even though we were not actually making connections to these domains, it's
better to use domains that are designated for testing/examples in RFC2606:
https://tools.ietf.org/html/rfc2606

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-04-02 14:06:27 +02:00
Tibor Vass
5b11047c25
Merge pull request #42188 from AkihiroSuda/fix-overlay2-naivediff
rootless: overlay2: fix "createDirWithOverlayOpaque(...) ... input/output error"
2021-04-01 05:03:24 -07:00
Akihiro Suda
248f98ef5e
rootless: bind mount: fix "operation not permitted"
The following was failing previously, because `getUnprivilegedMountFlags()` was not called:
```console
$ sudo mount -t tmpfs -o noexec none /tmp/foo
$ $ docker --context=rootless run -it --rm -v /tmp/foo:/mnt:ro alpine
docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:520: container init caused: rootfs_linux.go:60: mounting "/tmp/foo" to rootfs at "/home/suda/.local/share/docker/overlay2/b8e7ea02f6ef51247f7f10c7fb26edbfb308d2af8a2c77915260408ed3b0a8ec/merged/mnt" caused: operation not permitted: unknown.
```

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-04-01 14:58:11 +09:00
Akihiro Suda
6322dfc217
archive: do not use overlayWhiteoutConverter for UserNS
overlay2 no longer sets `archive.OverlayWhiteoutFormat` when
running in UserNS, so we can remove the complicated logic in the
archive package.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-03-29 14:47:12 +09:00
Akihiro Suda
67aa418df2
overlay2: doesSupportNativeDiff: add fast path for userns
When running in userns, returns error (i.e. "use naive, not native")
immediately.

No substantial change to the logic.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-03-29 14:47:09 +09:00
Akihiro Suda
dd97134232
overlay2: call d.naiveDiff.ApplyDiff when useNaiveDiff==true
Previously, `d.naiveDiff.ApplyDiff` was not used even when
`useNaiveDiff()==true`

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-03-26 14:34:56 +09:00
Akihiro Suda
62b5194f62
btrfs: Allow unprivileged user to delete subvolumes (kernel >= 4.18)
Fix issue 41762

Cherry-pick "drivers: btrfs: Allow unprivileged user to delete subvolumes" from containers/storage
831e32b6bd

> In btrfs, subvolume can be deleted by IOC_SNAP_DESTROY ioctl but there
> is one catch: unprivileged IOC_SNAP_DESTROY call is restricted by default.
>
> This is because IOC_SNAP_DESTROY only performs permission checks on
> the top directory(subvolume) and unprivileged user might delete dirs/files
> which cannot be deleted otherwise. This restriction can be relaxed if
> user_subvol_rm_allowed mount option is used.
>
> Although the above ioctl had been the only way to delete a subvolume,
> btrfs now allows deletion of subvolume just like regular directory
> (i.e. rmdir sycall) since kernel 4.18.
>
> So if we fail to cleanup subvolume in subvolDelete(), just fallback to
> system.EnsureRmoveall() to try to cleanup subvolumes again.
> (Note: quota needs privilege, so if quota is enabled we do not fallback)
>
> This fix will allow non-privileged container works with btrfs backend.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-03-26 14:30:40 +09:00
Brian Goff
788f2883d2
Merge pull request #42104 from cpuguy83/41820_fix_json_unexpected_eof 2021-03-18 14:18:11 -07:00
Brian Goff
a84d824c5f
Merge pull request #42068 from AkihiroSuda/ovl-k511 2021-03-18 11:54:01 -07:00
Brian Goff
5a664dc87d jsonfile: more defensive reader implementation
Tonis mentioned that we can run into issues if there is more error
handling added here. This adds a custom reader implementation which is
like io.MultiReader except it does not cache EOF's.
What got us into trouble in the first place is `io.MultiReader` will
always return EOF once it has received an EOF, however the error
handling that we are going for is to recover from an EOF because the
underlying file is a file which can have more data added to it after
EOF.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2021-03-18 18:44:46 +00:00
Brian Goff
ece4cd4c4d
Merge pull request #41757 from thaJeztah/carry_39371_remove_more_v1_code 2021-03-18 11:38:07 -07:00
Brian Goff
4be98a38e7 Fix handling for json-file io.UnexpectedEOF
When the multireader hits EOF, we will always get EOF from it, so we
cannot store the multrireader fro later error handling, only for the
decoder.

Thanks @tobiasstadler for pointing this error out.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2021-03-11 20:01:03 +00:00
Akihiro Suda
a8008f7313
overlayutils/userxattr.go: add "fast path" for kernel >= 5.11.0
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-03-11 15:18:59 +09:00
Akihiro Suda
11ef8d3ba9
overlay2: support "userxattr" option (kernel 5.11)
The "userxattr" option is needed for mounting overlayfs inside a user namespace with kernel >= 5.11.

The "userxattr" option is NOT needed for the initial user namespace (aka "the host").

Also, Ubuntu (since circa 2015) and Debian (since 10) with kernel < 5.11 can mount the overlayfs in a user namespace without the "userxattr" option.

The corresponding kernel commit: 2d2f2d7322ff43e0fe92bf8cccdc0b09449bf2e1
> **ovl: user xattr**
>
> Optionally allow using "user.overlay." namespace instead of "trusted.overlay."
> ...
> Disable redirect_dir and metacopy options, because these would allow privilege escalation through direct manipulation of the
> "user.overlay.redirect" or "user.overlay.metacopy" xattrs.

Fix issue 42055

Related to containerd/containerd PR 5076

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-03-11 15:12:41 +09:00
Sebastiaan van Stijn
328de0b8d9
Update documentation links
- Using "/go/" redirects for some topics, which allows us to
  redirect to new locations if topics are moved around in the
  documentation.
- Updated some old URLs to their new location.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-02-25 12:11:50 +01:00
Tibor Vass
7a50fe8a52
Remove more of registry v1 code.
Signed-off-by: Tibor Vass <tibor@docker.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-02-23 09:49:46 +01:00
Sebastiaan van Stijn
0f3b94a5c7
daemon: remove migration code from docker 1.11 to 1.12
This code was added in 391441c28b, to fix
upgrades from docker 1.11 to 1.12 with existing containers.

Given that any container after 1.12 should have the correct configuration
already, it should be safe to assume this upgrade logic is no longer needed.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-02-22 11:36:43 +01:00
Sebastiaan van Stijn
8b6d9eaa55
Merge pull request #42044 from nathanlcarlson/labels_regex_length_check
Check the length of the correct variable #42039
2021-02-18 22:22:44 +01:00
Sebastiaan van Stijn
56ffa614d6
Merge pull request #41955 from cpuguy83/fallback_manifest_on_bad_plat
Fallback to  manifest list when no platform match
2021-02-18 20:59:51 +01:00
Brian Goff
50f39e7247 Move cpu variant checks into platform matcher
Wrap platforms.Only and fallback to our ignore mismatches due to  empty
CPU variants. This just cleans things up and makes the logic re-usable
in other places.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2021-02-18 16:58:48 +00:00
Nathan Carlson
8d73c1ad68 Check the length of the correct variable #42039
Signed-off-by: Nathan Carlson <carl4403@umn.edu>
2021-02-18 10:27:35 -06:00
Brian Goff
4be5453215 Fallback to manifest list when no platform match
In some cases, in fact many in the wild, an image may have the incorrect
platform on the image config.
This can lead to failures to run an image, particularly when a user
specifies a `--platform`.
Typically what we see in the wild is a manifest list with an an entry
for, as an example, linux/arm64 pointing to an image config that has
linux/amd64 on it.

This change falls back to looking up the manifest list for an image to
see if the manifest list shows the image as the correct one for that
platform.

In order to accomplish this we need to traverse the leases associated
with an image. Each image, if pulled with Docker 20.10, will have the
manifest list stored in the containerd content store with the resource
assigned to a lease keyed on the image ID.
So we look up the lease for the image, then look up the assocated
resources to find the manifest list, then check the manifest list for a
platform match, then ensure that manifest referes to our image config.

This is only used as a fallback when a user specified they want a
particular platform and the image config that we have does not match
that platform.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2021-02-17 19:10:48 +00:00
Akihiro Suda
1d2a660093
Move cgroup v2 out of experimental
We have upgraded runc to rc93 and added CI for cgroup 2.
So we can move cgroup v2 out of experimental.

Fix issue 41916

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-02-16 17:54:28 +09:00
Tibor Vass
7359a3b1e9
Merge pull request #41567 from J-jaeyoung/fix_off_by_one
Update array length check logic for preventing off-by-one error
2021-02-11 11:18:23 -08:00
Sebastiaan van Stijn
264353425a
Merge pull request #41698 from cpuguy83/fix_shutdown_handling
Move container exit state to after cleanup.
2021-02-11 20:18:00 +01:00
Sebastiaan van Stijn
45bb0860b6
Merge pull request #41320 from pjbgf/add-seccomp-tests
Add test coverage to seccomp.
2021-02-10 17:14:15 +01:00
Sebastiaan van Stijn
1c39b1c44c
Merge pull request #41842 from jchorl/master
Reject null manifests during tar import
2021-02-09 12:06:27 +01:00
Paulo Gomes
137f86067c
Add test coverage for seccomp implementation
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
2021-02-04 19:47:07 +00:00
Josh Chorlton
654f854fae reject null manifests
Signed-off-by: Josh Chorlton <jchorlton@gmail.com>
2021-02-02 09:24:53 -08:00
Tibor Vass
2bd6213363
Merge pull request #41965 from thaJeztah/buildkit_apparmor_master
[master] Ensure AppArmor and SELinux profiles are applied when building with BuildKit
2021-02-02 08:52:11 -08:00
Brian Goff
94c07441c2
buildkit: Apply apparmor profile
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit 611eb6ffb3)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-02-02 13:32:24 +01:00
Brian Goff
7f5e39bd4f
Use real root with 0701 perms
Various dirs in /var/lib/docker contain data that needs to be mounted
into a container. For this reason, these dirs are set to be owned by the
remapped root user, otherwise there can be permissions issues.
However, this uneccessarily exposes these dirs to an unprivileged user
on the host.

Instead, set the ownership of these dirs to the real root (or rather the
UID/GID of dockerd) with 0701 permissions, which allows the remapped
root to enter the directories but not read/write to them.
The remapped root needs to enter these dirs so the container's rootfs
can be configured... e.g. to mount /etc/resolve.conf.

This prevents an unprivileged user from having read/write access to
these dirs on the host.
The flip side of this is now any user can enter these directories.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit e908cc3901)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-02-02 13:01:25 +01:00
Brian Goff
4b5aa28f24
Do not set DOCKER_TMP to be owned by remapped root
The remapped root does not need access to this dir.
Having this owned by the remapped root opens the host up to an
uprivileged user on the host being able to escalate privileges.

While it would not be normal for the remapped UID to be used outside of
the container context, it could happen.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit bfedd27259)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-02-02 13:01:22 +01:00
Brian Goff
e192ce4009 Move container exit state to after cleanup.
Before this change, there is no way to know if container (runtime)
resources have been cleaned up unless you actually remove the container.

This change allows callers of the wait API or the events API to know
that all runtime resources for the container are released (e.g. IP
addresses).

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2021-01-28 11:28:41 -08:00
Sebastiaan van Stijn
f266f13965
Merge pull request #41636 from TBBle/37352-test-and-fix
Set 127GB default sandbox size for WCOW, and ensure storage-opts is honoured on all paths under WCOW and LCOW
2021-01-25 14:34:34 +01:00
Brian Goff
b865beba22
Merge pull request #41894 from AkihiroSuda/silence-dockerinfo 2021-01-21 09:32:37 -08:00
Sebastiaan van Stijn
d5612a0ef8
Merge pull request #41854 from cpuguy83/for-linux-1169-plugins-custom-runtime-panic
Add shim config for custom runtimes for plugins
2021-01-21 16:26:36 +01:00
Sebastiaan van Stijn
44aacff3fc
Merge pull request #41873 from cpuguy83/fix_builder_inconsisent_platform
Fix builder inconsistent error on buggy platform
2021-01-21 16:23:28 +01:00
Kazuyoshi Kato
bb11365e96 Handle long log messages correctly on SizedLogger
Loggers that implement BufSize() (e.g. awslogs) uses the method to
tell Copier about the maximum log line length. However loggerWithCache
and RingBuffer hide the method by wrapping loggers.

As a result, Copier uses its default 16KB limit which breaks log
lines > 16kB even the destinations can handle that.

This change implements BufSize() on loggerWithCache and RingBuffer to
make sure these logger wrappes don't hide the method on the underlying
loggers.

Fixes #41794.

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2021-01-20 16:44:06 -08:00
Akihiro Suda
00225e220f
docker info: adjust warning strings for cgroup v2
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-01-20 13:42:32 +09:00
Akihiro Suda
8086443a44
docker info: silence unhandleable warnings
The following warnings in `docker info` are now discarded,
because there is no action user can actually take.

On cgroup v1:
- "WARNING: No blkio weight support"
- "WARNING: No blkio weight_device support"

On cgroup v2:
- "WARNING: No kernel memory TCP limit support"
- "WARNING: No oom kill disable support"

`docker run` still prints warnings when the missing feature is being attempted to use.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-01-19 15:10:21 +09:00
Brian Goff
399695305c Fix builder inconsistent error on buggy platform
When pulling an image by platform, it is possible for the image's
configured platform to not match what was in the manifest list.
The image itself is buggy because either the manifest list is incorrect
or the image config is incorrect. In any case, this is preventing people
from upgrading because many times users do not have control over these
buggy images.

This was not a problem in 19.03 because we did not compare on platform
before. It just assumed if we had the image it was the one we wanted
regardless of platform, which has its own problems.

Example Dockerfile that has this problem:

```Dockerfile
FROM --platform=linux/arm64 k8s.gcr.io/build-image/debian-iptables:buster-v1.3.0
RUN echo hello
```

This fails the first time you try to build after it finishes pulling but
before performing the `RUN` command.
On the second attempt it works because the image is already there and
does not hit the code that errors out on platform mismatch (Actually it
ignores errors if an image is returned at all).

Must be run with the classic builder (DOCKER_BUILDKIT=0).

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2021-01-14 21:45:45 +00:00
Brian Goff
2903863a1d Add shim config for custom runtimes for plugins
This fixes a panic when an admin specifies a custom default runtime,
when a plugin is started the shim config is nil.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2021-01-14 19:28:28 +00:00
gunadhya
64465f3b5f Fix Error in daemon_unix.go and docker_cli_run_unit_test.go
Signed-off-by: gunadhya <6939749+gunadhya@users.noreply.github.com>
2021-01-05 16:56:29 +05:30
Akihiro Suda
8891c58a43
Merge pull request #41786 from thaJeztah/test_selinux_tip
vendor: opencontainers/selinux v1.8.0, and remove selinux build-tag and stubs
2020-12-26 00:07:49 +09:00
Tibor Vass
ffc4dc9aec
Merge pull request #41817 from simonferquel/desktop-startup-hang
Fix a potential hang when starting after a non-clean shutdown
2020-12-23 23:22:00 -08:00
Sebastiaan van Stijn
1c0af18c6c
vendor: opencontainers/selinux v1.8.0, and remove selinux build-tag and stubs
full diff: https://github.com/opencontainers/selinux/compare/v1.7.0...v1.8.0

Remove "selinux" build tag

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-12-24 00:47:16 +01:00
Brian Goff
4a175fd050 Cleanup container shutdown check and add test
Adds a test case for the case where dockerd gets stuck on startup due to
hanging `daemon.shutdownContainer`

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-12-23 16:59:03 +00:00