Commit graph

7184 commits

Author SHA1 Message Date
Adam Williams
489f57b877 Add security privilege needed to write layers when windows VHDX used as docker data root
Signed-off-by: Adam Williams <awilliams@mirantis.com>
2021-04-29 10:41:19 -07:00
Akihiro Suda
4300a52606
rootless: disable overlay2 if running with SELinux
Kernel 5.11 introduced support for rootless overlayfs, but incompatible with SELinux.

On the other hand, fuse-overlayfs is compatible.

Close issue 42333

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-04-28 18:22:06 +09:00
Sebastiaan van Stijn
6d1eceb509
Fix panic in TestExecSetPlatformOpt, TestExecSetPlatformOptPrivileged
These tests would panic;

- in WithRLimits(), because HostConfig was not set;
  470ae8422f/daemon/oci_linux.go (L46-L47)
- in daemon.mergeUlimits(), because daemon.configStore was not set;
  470ae8422f/daemon/oci_linux.go (L1069)

This panic was not discovered because the current version of runc/libcontainer that we vendor
would not always return false for `apparmor.IsEnabled()` when running docker-in-docker or if
`apparmor_parser` is not found. Starting with v1.0.0-rc93 of libcontainer, this is no longer
the case (changed in bfb4ea1b1b)

This patch;

- changes the tests to initialize Daemon.configStore and Container.HostConfig
- Combines TestExecSetPlatformOpt and TestExecSetPlatformOptPrivileged into a new test
  (TestExecSetPlatformOptAppArmor)
- Runs the test both if AppArmor is enabled and if not (in which case it tests
  that the container's AppArmor profile is left empty).
- Adds a FIXME comment for a possible bug in execSetPlatformOpts, which currently
  prefers custom profiles over "privileged".

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-04-23 00:39:39 +02:00
Tianon Gravi
72fef53cec
Merge pull request #42270 from cpuguy83/bump_hcsshim
Bump hcsshim to get some fixes.
2021-04-20 14:42:29 -07:00
Brian Goff
225e046d9d Error string match: do not match command path
Whether or not the command path is in the error message is a an
implementation detail.
For example, on Windows the only reason this ever matched was because it
dumped the entire container config into the error message, but this had
nothing to do with the actual error.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2021-04-14 23:03:18 +00:00
Cam
e57a365ab1
docker kill: fix bug where failed kills didnt fallback to unix kill
1. fixes #41587
2. removes potential infinite Wait and goroutine leak at end of kill
function

fixes #41587

Signed-off-by: Cam <gh@sparr.email>
2021-04-14 15:43:44 -07:00
Brian Goff
6110ba3d7c
Merge pull request #41586 from sparrc/stop-refactor
Fix hung docker stop if stop signal and daemon.Kill both fail
2021-04-14 13:33:14 -07:00
Cam
8e362b75cb
docker daemon container stop refactor
this refactors the Stop command to fix a few issues and behaviors that
dont seem completely correct:

1. first it fixes a situation where stop could hang forever (#41579)
2. fixes a behavior where if sending the
stop signal failed, then the code directly sends a -9 signal. If that
fails, it returns without waiting for the process to exit or going
through the full docker kill codepath.
3. fixes a behavior where if sending the stop signal failed, then the
code sends a -9 signal. If that succeeds, then we still go through the
same stop waiting process, and may even go through the docker kill path
again, even though we've already sent a -9.
4. fixes a behavior where the code would wait the full 30 seconds after
sending a stop signal, even if we already know the stop signal failed.

fixes #41579

Signed-off-by: Cam <gh@sparr.email>
2021-04-13 09:53:00 -07:00
Michal Rostecki
1ec689c4c2 btrfs: Do not disable quota on cleanup
Before this change, cleanup of the btrfs driver (occuring on each daemon
shutdown) resulted in disabling quotas. It was done with an assumption
that quotas can be enabled or disabled on a subvolume level, which is
not true - enabling or disabling quota is always done on a filesystem
level.

That was leading to disabling quota on btrfs filesystems on each daemon
shutdown.

This change fixes that behavior and removes misleading `subvol` prefix
from functions and methods which set up quota (on a filesystem level).

Fixes: #34593
Fixes: 401c8d1767 ("Add disk quota support for btrfs")
Signed-off-by: Michal Rostecki <mrostecki@opensuse.org>
2021-04-13 16:23:39 +01:00
Tibor Vass
68bec0fcf7
Merge pull request #42276 from thaJeztah/apparmor_detect_fix
Use containerd's apparmor package to detect if apparmor can be used
2021-04-09 16:09:54 -07:00
Sebastiaan van Stijn
be95eae6d2
Merge pull request #41999 from diakovliev/fix_update_sync
Fix for lack of synchronization in daemon/update.go
2021-04-09 00:27:43 +02:00
Sebastiaan van Stijn
2834f842ee
Use containerd's apparmor package to detect if apparmor can be used
The runc/libcontainer apparmor package on master no longer checks if apparmor_parser
is enabled, or if we are running docker-in-docker.

While those checks are not relevant to runc (as it doesn't load the profile), these
checks _are_ relevant to us (and containerd). So switching to use the containerd
apparmor package, which does include the needed checks.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-04-08 20:22:08 +02:00
Tianon Gravi
f76958f612
Merge pull request #42245 from thaJeztah/use_proper_domains
Use designated test domains (RFC2606) in tests
2021-04-05 09:44:18 -07:00
Sebastiaan van Stijn
1df3d5c1de
Merge pull request #42203 from AkihiroSuda/btrfs-allow-unprivileged
btrfs: Allow unprivileged user to delete subvolumes (kernel >= 4.18)
2021-04-05 16:35:12 +02:00
Sebastiaan van Stijn
97a5b797b6
Use designated test domains (RFC2606) in tests
Some tests were using domain names that were intended to be "fake", but are
actually registered domain names (such as domain.com, registry.com, mytest.com).

Even though we were not actually making connections to these domains, it's
better to use domains that are designated for testing/examples in RFC2606:
https://tools.ietf.org/html/rfc2606

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-04-02 14:06:27 +02:00
Tibor Vass
5b11047c25
Merge pull request #42188 from AkihiroSuda/fix-overlay2-naivediff
rootless: overlay2: fix "createDirWithOverlayOpaque(...) ... input/output error"
2021-04-01 05:03:24 -07:00
Akihiro Suda
248f98ef5e
rootless: bind mount: fix "operation not permitted"
The following was failing previously, because `getUnprivilegedMountFlags()` was not called:
```console
$ sudo mount -t tmpfs -o noexec none /tmp/foo
$ $ docker --context=rootless run -it --rm -v /tmp/foo:/mnt:ro alpine
docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:520: container init caused: rootfs_linux.go:60: mounting "/tmp/foo" to rootfs at "/home/suda/.local/share/docker/overlay2/b8e7ea02f6ef51247f7f10c7fb26edbfb308d2af8a2c77915260408ed3b0a8ec/merged/mnt" caused: operation not permitted: unknown.
```

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-04-01 14:58:11 +09:00
Akihiro Suda
6322dfc217
archive: do not use overlayWhiteoutConverter for UserNS
overlay2 no longer sets `archive.OverlayWhiteoutFormat` when
running in UserNS, so we can remove the complicated logic in the
archive package.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-03-29 14:47:12 +09:00
Akihiro Suda
67aa418df2
overlay2: doesSupportNativeDiff: add fast path for userns
When running in userns, returns error (i.e. "use naive, not native")
immediately.

No substantial change to the logic.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-03-29 14:47:09 +09:00
Akihiro Suda
dd97134232
overlay2: call d.naiveDiff.ApplyDiff when useNaiveDiff==true
Previously, `d.naiveDiff.ApplyDiff` was not used even when
`useNaiveDiff()==true`

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-03-26 14:34:56 +09:00
Akihiro Suda
62b5194f62
btrfs: Allow unprivileged user to delete subvolumes (kernel >= 4.18)
Fix issue 41762

Cherry-pick "drivers: btrfs: Allow unprivileged user to delete subvolumes" from containers/storage
831e32b6bd

> In btrfs, subvolume can be deleted by IOC_SNAP_DESTROY ioctl but there
> is one catch: unprivileged IOC_SNAP_DESTROY call is restricted by default.
>
> This is because IOC_SNAP_DESTROY only performs permission checks on
> the top directory(subvolume) and unprivileged user might delete dirs/files
> which cannot be deleted otherwise. This restriction can be relaxed if
> user_subvol_rm_allowed mount option is used.
>
> Although the above ioctl had been the only way to delete a subvolume,
> btrfs now allows deletion of subvolume just like regular directory
> (i.e. rmdir sycall) since kernel 4.18.
>
> So if we fail to cleanup subvolume in subvolDelete(), just fallback to
> system.EnsureRmoveall() to try to cleanup subvolumes again.
> (Note: quota needs privilege, so if quota is enabled we do not fallback)
>
> This fix will allow non-privileged container works with btrfs backend.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-03-26 14:30:40 +09:00
lzhfromustc
5ffcd162b5 discovery & test: Fix goroutine leaks by adding 1 buffer to channel
Signed-off-by: Ziheng Liu <lzhfromustc@gmail.com>
2021-03-24 10:32:39 -04:00
Sebastiaan van Stijn
a432eb4b3a
ContainerExecStart(): don't wrap getExecConfig() errors, and prevent panic
daemon.getExecConfig() already returns typed errors; by wrapping those errors
we may loose the actual reason for failures. Changing the error-type was
originally added in 2d43d93410, but I think
it was not intentional to ignore already-typed errors. It was later refactored
in a793564b25, which added helper functions
to create these errors, but kept the same behavior.

Also adds error-handling to prevent a panic in situations where (although
unlikely) `daemon.containers.Get()` would not return a container.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-03-22 13:37:05 +01:00
Sebastiaan van Stijn
6eb5720233
Fix daemon.getExecConfig(): not using typed errNotRunning() error
This makes daemon.getExecConfig return a errdefs.Conflict() error if the
container is not running.

This was originally the case, but a refactor of this code changed the typed
error (`derr.ErrorCodeContainerNotRunning`) to a non-typed error;
a793564b25

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-03-22 13:37:03 +01:00
Brian Goff
788f2883d2
Merge pull request #42104 from cpuguy83/41820_fix_json_unexpected_eof 2021-03-18 14:18:11 -07:00
Brian Goff
a84d824c5f
Merge pull request #42068 from AkihiroSuda/ovl-k511 2021-03-18 11:54:01 -07:00
Brian Goff
5a664dc87d jsonfile: more defensive reader implementation
Tonis mentioned that we can run into issues if there is more error
handling added here. This adds a custom reader implementation which is
like io.MultiReader except it does not cache EOF's.
What got us into trouble in the first place is `io.MultiReader` will
always return EOF once it has received an EOF, however the error
handling that we are going for is to recover from an EOF because the
underlying file is a file which can have more data added to it after
EOF.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2021-03-18 18:44:46 +00:00
Brian Goff
ece4cd4c4d
Merge pull request #41757 from thaJeztah/carry_39371_remove_more_v1_code 2021-03-18 11:38:07 -07:00
Akihiro Suda
039e9670cb
info: unset cgroup-related fields when CgroupDriver == none
Fix issue 42151

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-03-16 16:17:22 +09:00
Brian Goff
4be98a38e7 Fix handling for json-file io.UnexpectedEOF
When the multireader hits EOF, we will always get EOF from it, so we
cannot store the multrireader fro later error handling, only for the
decoder.

Thanks @tobiasstadler for pointing this error out.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2021-03-11 20:01:03 +00:00
Akihiro Suda
a8008f7313
overlayutils/userxattr.go: add "fast path" for kernel >= 5.11.0
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-03-11 15:18:59 +09:00
Akihiro Suda
11ef8d3ba9
overlay2: support "userxattr" option (kernel 5.11)
The "userxattr" option is needed for mounting overlayfs inside a user namespace with kernel >= 5.11.

The "userxattr" option is NOT needed for the initial user namespace (aka "the host").

Also, Ubuntu (since circa 2015) and Debian (since 10) with kernel < 5.11 can mount the overlayfs in a user namespace without the "userxattr" option.

The corresponding kernel commit: 2d2f2d7322ff43e0fe92bf8cccdc0b09449bf2e1
> **ovl: user xattr**
>
> Optionally allow using "user.overlay." namespace instead of "trusted.overlay."
> ...
> Disable redirect_dir and metacopy options, because these would allow privilege escalation through direct manipulation of the
> "user.overlay.redirect" or "user.overlay.metacopy" xattrs.

Fix issue 42055

Related to containerd/containerd PR 5076

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-03-11 15:12:41 +09:00
Xia Wu
d10046f228 Add an option to skip create log stream for awslogs driver
Added an option `awslogs-create-stream` to allow skipping log stream
creation for awslogs log driver. The default value is still true to
keep the behavior be consistent with before.

Signed-off-by: Xia Wu <xwumzn@amazon.com>
2021-03-09 15:49:43 -08:00
Sebastiaan van Stijn
328de0b8d9
Update documentation links
- Using "/go/" redirects for some topics, which allows us to
  redirect to new locations if topics are moved around in the
  documentation.
- Updated some old URLs to their new location.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-02-25 12:11:50 +01:00
Tibor Vass
7a50fe8a52
Remove more of registry v1 code.
Signed-off-by: Tibor Vass <tibor@docker.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-02-23 09:49:46 +01:00
Sebastiaan van Stijn
0f3b94a5c7
daemon: remove migration code from docker 1.11 to 1.12
This code was added in 391441c28b, to fix
upgrades from docker 1.11 to 1.12 with existing containers.

Given that any container after 1.12 should have the correct configuration
already, it should be safe to assume this upgrade logic is no longer needed.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-02-22 11:36:43 +01:00
Sebastiaan van Stijn
8b6d9eaa55
Merge pull request #42044 from nathanlcarlson/labels_regex_length_check
Check the length of the correct variable #42039
2021-02-18 22:22:44 +01:00
Sebastiaan van Stijn
56ffa614d6
Merge pull request #41955 from cpuguy83/fallback_manifest_on_bad_plat
Fallback to  manifest list when no platform match
2021-02-18 20:59:51 +01:00
Brian Goff
50f39e7247 Move cpu variant checks into platform matcher
Wrap platforms.Only and fallback to our ignore mismatches due to  empty
CPU variants. This just cleans things up and makes the logic re-usable
in other places.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2021-02-18 16:58:48 +00:00
Nathan Carlson
8d73c1ad68 Check the length of the correct variable #42039
Signed-off-by: Nathan Carlson <carl4403@umn.edu>
2021-02-18 10:27:35 -06:00
Brian Goff
4be5453215 Fallback to manifest list when no platform match
In some cases, in fact many in the wild, an image may have the incorrect
platform on the image config.
This can lead to failures to run an image, particularly when a user
specifies a `--platform`.
Typically what we see in the wild is a manifest list with an an entry
for, as an example, linux/arm64 pointing to an image config that has
linux/amd64 on it.

This change falls back to looking up the manifest list for an image to
see if the manifest list shows the image as the correct one for that
platform.

In order to accomplish this we need to traverse the leases associated
with an image. Each image, if pulled with Docker 20.10, will have the
manifest list stored in the containerd content store with the resource
assigned to a lease keyed on the image ID.
So we look up the lease for the image, then look up the assocated
resources to find the manifest list, then check the manifest list for a
platform match, then ensure that manifest referes to our image config.

This is only used as a fallback when a user specified they want a
particular platform and the image config that we have does not match
that platform.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2021-02-17 19:10:48 +00:00
Akihiro Suda
1d2a660093
Move cgroup v2 out of experimental
We have upgraded runc to rc93 and added CI for cgroup 2.
So we can move cgroup v2 out of experimental.

Fix issue 41916

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-02-16 17:54:28 +09:00
Tibor Vass
7359a3b1e9
Merge pull request #41567 from J-jaeyoung/fix_off_by_one
Update array length check logic for preventing off-by-one error
2021-02-11 11:18:23 -08:00
Sebastiaan van Stijn
264353425a
Merge pull request #41698 from cpuguy83/fix_shutdown_handling
Move container exit state to after cleanup.
2021-02-11 20:18:00 +01:00
Sebastiaan van Stijn
45bb0860b6
Merge pull request #41320 from pjbgf/add-seccomp-tests
Add test coverage to seccomp.
2021-02-10 17:14:15 +01:00
Sebastiaan van Stijn
1c39b1c44c
Merge pull request #41842 from jchorl/master
Reject null manifests during tar import
2021-02-09 12:06:27 +01:00
dmytro.iakovliev
58825ffc32 Fix for lack of syncromization in daemon/update.go
Signed-off-by: dmytro.iakovliev <dmytro.iakovliev@zodiacsystems.com>
2021-02-09 09:34:20 +02:00
Paulo Gomes
137f86067c
Add test coverage for seccomp implementation
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
2021-02-04 19:47:07 +00:00
Josh Chorlton
654f854fae reject null manifests
Signed-off-by: Josh Chorlton <jchorlton@gmail.com>
2021-02-02 09:24:53 -08:00
Tibor Vass
2bd6213363
Merge pull request #41965 from thaJeztah/buildkit_apparmor_master
[master] Ensure AppArmor and SELinux profiles are applied when building with BuildKit
2021-02-02 08:52:11 -08:00