Commit graph

39295 commits

Author SHA1 Message Date
Tonis Tiigi
b53ea19c49 builder: fix pull synchronization regression
Config resolution was synchronized based on a wrong key as ref
variable is initialized only later in the same function. Using
the right key isn't fully correct either as the synchronized method
changes properties of the puller instance and can't be just skipped.
Added better error handling for the same case as well.

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
2021-02-16 22:48:37 -08:00
Tianon Gravi
646072ed65
Merge pull request #42024 from LeviHarrison/fix-grammar
Fix grammar in client function comments
2021-02-16 09:57:12 -08:00
Brian Goff
3d96682687
Merge pull request #41936 from thaJeztah/fix_image_reference
2021-02-16 09:39:19 -08:00
Levi Harrison
8128a9a478 Fix grammar in client function comments
Changes certain words and adds punctuation to the comments of functions in the client package, which end up in the GoDoc documentation. Areas where only periods were needed were ignored to prevent excessive code churn.

Signed-off-by: Levi Harrison <levisamuelharrison@gmail.com>
2021-02-16 10:07:44 -05:00
Sebastiaan van Stijn
2834afe426
Merge pull request #41925 from AkihiroSuda/cgroup2ci-jenkins
Jenkinsfile: add cgroup2
2021-02-16 09:21:00 +01:00
Sebastiaan van Stijn
fa480403c7
TestBuildUserNamespaceValidateCapabilitiesAreV2: verify build completed
Check if the `docker build` completed successfully before continuing.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-02-15 16:08:40 +01:00
Sebastiaan van Stijn
26965fbfa0
TestBuildUserNamespaceValidateCapabilitiesAreV2: use correct image name
This currently doesn't make a difference, because load.FrozenImagesLinux()
loads all frozen images, not just the specified one, but it will matter if
that is fixed/implemented at some point.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-02-15 14:02:41 +01:00
Sebastiaan van Stijn
01ae718aef
Merge pull request #41984 from tonistiigi/pax-parent
archive: avoid creating parent dirs for XGlobalHeader
2021-02-12 17:58:08 +01:00
Sebastiaan van Stijn
806a090133
Merge pull request #41994 from thaJeztah/bump_runc_binary
Bump runc binary v1.0.0-rc93
2021-02-12 11:59:22 +01:00
Sebastiaan van Stijn
28e5a3c5a4
update runc binary to v1.0.0-rc93
full diff: https://github.com/opencontainers/runc/compare/v1.0.0-rc92...v1.0.0-rc93
release notes: https://github.com/opencontainers/runc/releases/tag/v1.0.0-rc93

Release notes for runc v1.0.0-rc93
-------------------------------------------------

This is the last feature-rich RC release and we are in a feature-freeze until
1.0. 1.0.0~rc94 will be released in a few weeks with minimal bug fixes only,
and 1.0.0 will be released soon afterwards.

- runc's cgroupv2 support is no longer considered experimental. It is now
  believed to be fully ready for production deployments. In addition, runc's
  cgroup code has been improved:
    - The systemd cgroup driver has been improved to be more resilient and
      handle more systemd properties correctly.
    - We now make use of openat2(2) when possible to improve the security of
      cgroup operations (in future runc will be wholesale ported to libpathrs to
      get this protection in all codepaths).
- runc's mountinfo parsing code has been reworked significantly, making
  container startup times significantly faster and less wasteful in general.
- runc now has special handling for seccomp profiles to avoid making new
  syscalls unusable for glibc. This is done by installing a custom prefix to
  all seccomp filters which returns -ENOSYS for syscalls that are newer than
  any syscall in the profile (meaning they have a larger syscall number).

  This should not cause any regressions (because previously users would simply
  get -EPERM rather than -ENOSYS, and the rule applied above is the most
  conservative rule possible) but please report any regressions you find as a
  result of this change -- in particular, programs which have special fallback
  code that is only run in the case of -EPERM.
- runc now supports the following new runtime-spec features:
    - The umask of a container can now be specified.
    - The new Linux 5.9 capabilities (CAP_PERFMON, CAP_BPF, and
      CAP_CHECKPOINT_RESTORE) are now supported.
    - The "unified" cgroup configuration option, which allows users to explicitly
      specify the limits based on the cgroup file names rather than abstracting
      them through OCI configuration. This is currently limited in scope to
      cgroupv2.
- Various rootless containers improvements:
    - runc will no longer cause conflicts if a user specifies a custom device
      which conflicts with a user-configured device -- the user device takes
      precedence.
    - runc no longer panics if /sys/fs/cgroup is missing in rootless mode.
- runc --root is now always treated as local to the current working directory.
- The --no-pivot-root hardening was improved to handle nested mounts properly
  (please note that we still strongly recommend that users do not use
  --no-pivot-root -- it is still an insecure option).
- A large number of code cleanliness and other various cleanups, including
  fairly large changes to our tests and CI to make them all run more
  efficiently.

For packagers the following changes have been made which will have impact on
your packaging of runc:

- The "selinux" and "apparmor" buildtags have been removed, and now all runc
  builds will have SELinux and AppArmor support enabled. Note that "seccomp"
  is still optional (though we very highly recommend you enable it).
- make install DESTDIR= now functions correctly.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-02-11 21:46:33 +01:00
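The seccomp rule quoted in the release notes above (return -ENOSYS for syscalls with a larger number than any syscall in the profile) can be sketched as follows. This is a simplified model, not runc's implementation: the `decide` function is hypothetical, the profile is assumed to contain only allow rules with a default action of -EPERM, and the syscall numbers are purely illustrative.

```go
package main

import "fmt"

// Possible outcomes for a syscall in this simplified model.
const (
	allow  = "allow"
	eperm  = "EPERM"
	enosys = "ENOSYS"
)

// decide is a hypothetical helper. allowed is the set of syscall numbers
// the profile mentions; nr is the syscall being attempted.
func decide(allowed []int, nr int) string {
	largest := -1
	known := false
	for _, n := range allowed {
		if n > largest {
			largest = n
		}
		if n == nr {
			known = true
		}
	}
	switch {
	case known:
		return allow
	case nr > largest:
		// Newer than anything the profile knows about: report "not
		// implemented" so callers can fall back to older syscalls.
		return enosys
	default:
		// Assumed default action of the profile.
		return eperm
	}
}

func main() {
	profile := []int{0, 1, 257} // illustrative syscall numbers
	for _, nr := range []int{1, 100, 437} {
		fmt.Printf("syscall %d: %s\n", nr, decide(profile, nr))
	}
}
```

A program that only falls back on -EPERM would take a different path for syscall 437 in this model, which is the kind of regression the release notes ask users to report.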
Sebastiaan van Stijn
c9bbc68e75
Merge pull request #42004 from Rid/42003-fix-userns-uid-username-match
Fix userns-remap option when username & UID match
2021-02-11 20:55:47 +01:00
Brian Goff
93ab21a193
Merge pull request #42009 from thaJeztah/fix_nanocpus_casing
2021-02-11 11:23:38 -08:00
Tibor Vass
7359a3b1e9
Merge pull request #41567 from J-jaeyoung/fix_off_by_one
Update array length check logic for preventing off-by-one error
2021-02-11 11:18:23 -08:00
Sebastiaan van Stijn
264353425a
Merge pull request #41698 from cpuguy83/fix_shutdown_handling
Move container exit state to after cleanup.
2021-02-11 20:18:00 +01:00
Sebastiaan van Stijn
45bb0860b6
Merge pull request #41320 from pjbgf/add-seccomp-tests
Add test coverage to seccomp.
2021-02-10 17:14:15 +01:00
Grant Millar
2ad187fd4a Fix userns-remap option when username & UID match
Signed-off-by: Grant Millar <rid@cylo.io>
2021-02-10 15:58:34 +00:00
Sebastiaan van Stijn
8e2343ffd4
docs: fix NanoCPUs casing
While the field in the Go struct is named `NanoCPUs`, it has a JSON label to
use `NanoCpus`, which was added in the original pull request (not clear what
the reason was); 846baf1fd3

Some notes:

- Go's JSON decoder matches field names case-insensitively, so when *using*
  the API, both cases should work, but when inspecting a container, the
  field is returned as `NanoCpus`.
- This only affects Containers.Resources. The `Limits` and `Reservation`
  for SwarmKit services and SwarmKit "nodes" do not override the name
  for JSON, so have the canonical (`NanoCPUs`) casing.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-02-10 13:02:27 +01:00
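The casing behaviour described in the commit message above can be reproduced with a minimal sketch. The `resources` type here is a hypothetical stand-in for the real API struct; only the field name and JSON tag are taken from the message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// resources is a hypothetical stand-in: the Go field is NanoCPUs, but the
// JSON tag spells it NanoCpus.
type resources struct {
	NanoCPUs int64 `json:"NanoCpus"`
}

// decodeNanoCPUs parses a JSON document and returns the decoded value.
func decodeNanoCPUs(doc string) (int64, error) {
	var r resources
	if err := json.Unmarshal([]byte(doc), &r); err != nil {
		return 0, err
	}
	return r.NanoCPUs, nil
}

// encodeNanoCPUs serializes the value the way an inspect response would.
func encodeNanoCPUs(v int64) (string, error) {
	b, err := json.Marshal(resources{NanoCPUs: v})
	return string(b), err
}

func main() {
	// Both spellings are accepted on input.
	for _, doc := range []string{`{"NanoCpus": 500000000}`, `{"NanoCPUs": 500000000}`} {
		v, err := decodeNanoCPUs(doc)
		fmt.Println(v, err)
	}
	// Output always uses the tag's spelling.
	out, _ := encodeNanoCPUs(500000000)
	fmt.Println(out)
}
```

Decoding accepts either spelling because `encoding/json` falls back to a case-insensitive match on field names, while encoding always emits the name given in the struct tag.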
Sebastiaan van Stijn
2bd46ed7e5
api: fix NanoCPUs casing in swagger
While the field in the Go struct is named `NanoCPUs`, it has a JSON label to
use `NanoCpus`, which was added in the original pull request (not clear what
the reason was); 846baf1fd3

Some notes:

- Go's JSON decoder matches field names case-insensitively, so when *using*
  the API, both cases should work, but when inspecting a container, the
  field is returned as `NanoCpus`.
- This only affects Containers.Resources. The `Limits` and `Reservation`
  for SwarmKit services and SwarmKit "nodes" do not override the name
  for JSON, so have the canonical (`NanoCPUs`) casing.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-02-10 12:52:09 +01:00
Sebastiaan van Stijn
1c39b1c44c
Merge pull request #41842 from jchorl/master
Reject null manifests during tar import
2021-02-09 12:06:27 +01:00
Tianon Gravi
791640417b
Merge pull request #41995 from coolljt0725/coolljt0725/fix_dockerfile_simple
Dockerfile.simple: Fix compile docker binary error with btrfs
2021-02-06 09:33:47 -08:00
Tonis Tiigi
ba7906aef3 archive: avoid creating parent dirs for XGlobalHeader
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
2021-02-04 18:38:51 -08:00
Sebastiaan van Stijn
0af8ed47bb
Merge pull request #41919 from thaJeztah/fix_cgroup_rule_panic
Fix panic when starting container with invalid device cgroup rule
2021-02-04 21:29:31 +01:00
Paulo Gomes
137f86067c
Add test coverage for seccomp implementation
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
2021-02-04 19:47:07 +00:00
Lei Jiang
dd7ee8ea3e Dockerfile.simple: Fix compile docker binary error with btrfs
Building the docker binary with the image built from Dockerfile.simple failed
because <btrfs/ioctl.h> could not be found; installing libbtrfs-dev fixes this.
```
Building: bundles/dynbinary-daemon/dockerd-dev
GOOS="" GOARCH="" GOARM=""
.gopath/src/github.com/docker/docker/daemon/graphdriver/btrfs/btrfs.go:8:10: fatal error: btrfs/ioctl.h: No such file or directory
 #include <btrfs/ioctl.h>

```

Signed-off-by: Lei Jitang <leijitang@outlook.com>
2021-02-03 23:16:15 +00:00
Josh Chorlton
654f854fae reject null manifests
Signed-off-by: Josh Chorlton <jchorlton@gmail.com>
2021-02-02 09:24:53 -08:00
Tibor Vass
8d3179546e
Merge pull request #41966 from thaJeztah/CVE-2021-21285_master
[master] prevent an invalid image from crashing docker daemon (CVE-2021-21285)
2021-02-02 09:16:18 -08:00
Tibor Vass
2bd6213363
Merge pull request #41965 from thaJeztah/buildkit_apparmor_master
[master] Ensure AppArmor and SELinux profiles are applied when building with BuildKit
2021-02-02 08:52:11 -08:00
Tibor Vass
64bd4485b3
Merge pull request #41964 from thaJeztah/CVE-2021-21284_master
[master] Fix Access to remapped root allows privilege escalation to real root (CVE-2021-21284)
2021-02-02 08:49:34 -08:00
Brian Goff
c747d9f8ee
pull: Validate layer digest format
Otherwise a malformed or empty digest may cause a panic.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit a7d4af84bd)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-02-02 13:37:24 +01:00
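A format check of the kind the commit message above describes could look like the sketch below. This is not the code from the commit: `validateSHA256Digest` is a hypothetical helper, and it assumes only `sha256:<64 lowercase hex characters>` digests need to be accepted.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// sha256Digest matches "sha256:" followed by exactly 64 lowercase hex digits.
var sha256Digest = regexp.MustCompile(`^sha256:[a-f0-9]{64}$`)

// validateSHA256Digest is a hypothetical helper that rejects empty or
// malformed digests before they reach code that would otherwise panic.
func validateSHA256Digest(s string) error {
	if s == "" {
		return errors.New("empty digest")
	}
	if !sha256Digest.MatchString(s) {
		return fmt.Errorf("invalid digest format: %q", s)
	}
	return nil
}

func main() {
	for _, d := range []string{
		"",
		"sha256:abc",
		"sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
	} {
		fmt.Printf("%q -> %v\n", d, validateSHA256Digest(d))
	}
}
```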
Brian Goff
94c07441c2
buildkit: Apply apparmor profile
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit 611eb6ffb3)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-02-02 13:32:24 +01:00
Tibor Vass
28a623aa3a
vendor buildkit 68bb095353c65bc3993fd534c26cf77fe05e61b1
Signed-off-by: Tibor Vass <tibor@docker.com>
(cherry picked from commit 4afe620fac)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-02-02 13:27:03 +01:00
Brian Goff
7f5e39bd4f
Use real root with 0701 perms
Various dirs in /var/lib/docker contain data that needs to be mounted
into a container. For this reason, these dirs are set to be owned by the
remapped root user, otherwise there can be permissions issues.
However, this unnecessarily exposes these dirs to an unprivileged user
on the host.

Instead, set the ownership of these dirs to the real root (or rather the
UID/GID of dockerd) with 0701 permissions, which allows the remapped
root to enter the directories but not read/write to them.
The remapped root needs to enter these dirs so the container's rootfs
can be configured... e.g. to mount /etc/resolv.conf.

This prevents an unprivileged user from having read/write access to
these dirs on the host.
The flip side of this is now any user can enter these directories.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit e908cc3901)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-02-02 13:01:25 +01:00
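The effect of the 0701 mode described above can be read directly from the permission bits. The sketch below is illustrative only; `othersAccess` is a hypothetical helper, and it assumes the remapped root falls into the "other" class because the directory is owned by the real root.

```go
package main

import (
	"fmt"
	"os"
)

// othersAccess reports what users in the "other" class may do with a
// directory of the given mode: enter it (execute bit), list it (read bit),
// or create and delete entries in it (write bit).
func othersAccess(mode os.FileMode) (enter, list, write bool) {
	perm := mode.Perm()
	return perm&0o001 != 0, perm&0o004 != 0, perm&0o002 != 0
}

func main() {
	for _, m := range []os.FileMode{0o701, 0o700, 0o755} {
		enter, list, write := othersAccess(m)
		fmt.Printf("%04o: enter=%v list=%v write=%v\n", m, enter, list, write)
	}
}
```

With 0701 only the execute bit is set for "other", so such a user can traverse the directory to reach a known path inside it but cannot list or modify its contents.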
Brian Goff
4b5aa28f24
Do not set DOCKER_TMP to be owned by remapped root
The remapped root does not need access to this dir.
Having this owned by the remapped root opens the host up to an
unprivileged user on the host being able to escalate privileges.

While it would not be normal for the remapped UID to be used outside of
the container context, it could happen.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit bfedd27259)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-02-02 13:01:22 +01:00
Brian Goff
66dffbec86
Ensure MkdirAllAndChown also sets perms
Generally, if we ever need to change the perms of a dir between versions,
this ensures the permissions actually change when we think they should
change, without having to handle special cases if it already existed.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit edb62a3ace)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-02-02 13:01:20 +01:00
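The problem described above comes from `os.MkdirAll` leaving the mode of an already existing directory untouched. A minimal sketch of the idea, assuming a Unix-like system: `mkdirAllAndChmod` and `demo` are hypothetical names, and ownership changes (the "Chown" part) are left out.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// mkdirAllAndChmod is a hypothetical helper: it creates the directory if
// needed and then sets the mode explicitly, so a pre-existing directory
// also ends up with the requested permissions.
func mkdirAllAndChmod(path string, mode os.FileMode) error {
	if err := os.MkdirAll(path, mode); err != nil {
		return err
	}
	return os.Chmod(path, mode)
}

// demo creates a directory with 0755, requests it again with 0701, and
// returns the resulting permission bits.
func demo() (os.FileMode, error) {
	base, err := os.MkdirTemp("", "perms-demo")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(base)

	dir := filepath.Join(base, "data")
	if err := os.Mkdir(dir, 0o755); err != nil {
		return 0, err
	}
	if err := mkdirAllAndChmod(dir, 0o701); err != nil {
		return 0, err
	}
	info, err := os.Stat(dir)
	if err != nil {
		return 0, err
	}
	return info.Mode().Perm(), nil
}

func main() {
	mode, err := demo()
	fmt.Printf("%04o %v\n", mode, err)
}
```

The explicit `os.Chmod` also sidesteps the process umask, which would otherwise be applied to the mode passed to `os.MkdirAll` for newly created directories.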
Akihiro Suda
c23b99f4db
Jenkinsfile: add cgroup2
Thanks to Stefan Scherer for setting up the Jenkins nodes.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-02-01 14:48:34 +09:00
Akihiro Suda
c316dd7cc5
TestInspectOomKilledTrue: skip on cgroup v2
The test fails intermittently on cgroup v2.

```
=== FAIL: amd64.integration.container TestInspectOomKilledTrue (0.53s)
    kill_test.go:171: assertion failed: true (true bool) != false (inspect.State.OOMKilled bool)
```

Tracked in issue 41929

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-01-29 16:05:15 +09:00
Brian Goff
3e0025e2fc
Merge pull request #41689 from thaJeztah/switch_hcsshim
vendor: update github.com/Microsoft/hcsshim v0.8.10 (back to tagged release)
2021-01-28 13:34:58 -08:00
Sebastiaan van Stijn
3c3a2ff2d4
Merge pull request #41947 from AkihiroSuda/rootless-kill-mode-mixed
rootless: prevent the service hanging when stopping (set systemd KillMode to mixed)
2021-01-28 22:00:33 +01:00
Brian Goff
35c2d1cd3c
Merge pull request #41917 from AkihiroSuda/fix-cgroup2-tests
TestCgroupNamespacesRunOlderClient: support cgroup v2
2021-01-28 11:54:28 -08:00
Brian Goff
452baa2059
Merge pull request #41939 from thaJeztah/swagger_docs_fixes
docs: fix double "the" in existing API versions
2021-01-28 11:53:09 -08:00
Brian Goff
e192ce4009 Move container exit state to after cleanup.
Before this change, there is no way to know if container (runtime)
resources have been cleaned up unless you actually remove the container.

This change allows callers of the wait API or the events API to know
that all runtime resources for the container are released (e.g. IP
addresses).

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2021-01-28 11:28:41 -08:00
Sebastiaan van Stijn
e422445418
Merge pull request #41892 from AkihiroSuda/fix-41803
pkg/archive: allow mknodding FIFO inside userns
2021-01-28 08:26:05 +01:00
Akihiro Suda
05566adf71
rootless: set systemd KillMode to mixed
Now `systemctl --user stop docker` completes within just 1 or 2 seconds.

Fix issue 41944 ("Docker rootless does not exit properly if containers are running")

See systemd.kill(5) https://www.freedesktop.org/software/systemd/man/systemd.kill.html

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-01-28 15:19:43 +09:00
Sebastiaan van Stijn
e64651075d
Merge pull request #41932 from thaJeztah/bump_buildx
Dockerfile.buildx: update buildx to v0.5.1
2021-01-27 23:04:54 +01:00
Sebastiaan van Stijn
240d0b37bb
docs: fix double "the" in existing API versions
Backport of 2db5676c6e to the swagger files
used in the documentation

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-01-27 12:24:47 +01:00
Sebastiaan van Stijn
c189e5be88
Merge pull request #41924 from FreddieOliveira/patch-1
swagger.yaml: Remove extra 'the' wrapped by newline
2021-01-27 12:18:07 +01:00
Akihiro Suda
76f4bbd0a8
Merge pull request #41709 from thaJeztah/bump_docker_py
testing: update docker-py 4.4.1
2021-01-27 18:14:23 +09:00
Akihiro Suda
dc7a89990d
Merge pull request #41889 from cyphar/seccomp-update
profiles: seccomp: update to Linux 5.11 syscall list
2021-01-27 18:13:51 +09:00
Aleksa Sarai
54eff4354b
profiles: seccomp: update to Linux 5.11 syscall list
These syscalls (some of which have been in Linux for a while but were
missing from the profile) fall into a few buckets:

 * close_range(2), epoll_pwait2(2) are just extensions of existing "safe
   for everyone" syscalls.

 * The mountv2 API syscalls (fs*(2), move_mount(2), open_tree(2)) are
   all equivalent to aspects of mount(2) and thus go into the
   CAP_SYS_ADMIN category.

 * process_madvise(2) is similar to the other process_*(2) syscalls and
   thus goes in the CAP_SYS_PTRACE category.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2021-01-27 13:25:49 +11:00
Tibor Vass
d5209b29b9
Merge pull request #41927 from tiborvass/execabs
Use golang.org/x/sys/execabs
2021-01-26 09:15:54 -08:00