Config resolution was synchronized on the wrong key, because the ref
variable is only initialized later in the same function. Using
the right key isn't fully correct either, as the synchronized method
changes properties of the puller instance and can't simply be skipped.
Also added better error handling for the same case.
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Changes certain words and adds punctuation to the comments of functions in the client package, which end up in the GoDoc documentation. Areas where only periods were needed were ignored to prevent excessive code churn.
Signed-off-by: Levi Harrison <levisamuelharrison@gmail.com>
This currently makes no difference, because load.FrozenImagesLinux()
loads all frozen images, not just the specified one, but it will matter
if that is fixed/implemented at some point.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
full diff: https://github.com/opencontainers/runc/compare/v1.0.0-rc92...v1.0.0-rc93
release notes: https://github.com/opencontainers/runc/releases/tag/v1.0.0-rc93
Release notes for runc v1.0.0-rc93
-------------------------------------------------
This is the last feature-rich RC release and we are in a feature-freeze until
1.0. 1.0.0~rc94 will be released in a few weeks with minimal bug fixes only,
and 1.0.0 will be released soon afterwards.
- runc's cgroupv2 support is no longer considered experimental. It is now
  believed to be fully ready for production deployments. In addition, runc's
  cgroup code has been improved:
  - The systemd cgroup driver has been improved to be more resilient and
    handle more systemd properties correctly.
  - We now make use of openat2(2) when possible to improve the security of
    cgroup operations (in future runc will be wholesale ported to libpathrs to
    get this protection in all codepaths).
- runc's mountinfo parsing code has been reworked significantly, making
  container startup times significantly faster and less wasteful in general.
- runc now has special handling for seccomp profiles to avoid making new
  syscalls unusable for glibc. This is done by installing a custom prefix to
  all seccomp filters which returns -ENOSYS for syscalls that are newer than
  any syscall in the profile (meaning they have a larger syscall number).
  This should not cause any regressions (because previously users would simply
  get -EPERM rather than -ENOSYS, and the rule applied above is the most
  conservative rule possible) but please report any regressions you find as a
  result of this change -- in particular, programs which have special fallback
  code that is only run in the case of -EPERM.
- runc now supports the following new runtime-spec features:
  - The umask of a container can now be specified.
  - The new Linux 5.9 capabilities (CAP_PERFMON, CAP_BPF, and
    CAP_CHECKPOINT_RESTORE) are now supported.
  - The "unified" cgroup configuration option, which allows users to explicitly
    specify the limits based on the cgroup file names rather than abstracting
    them through OCI configuration. This is currently limited in scope to
    cgroupv2.
- Various rootless containers improvements:
  - runc will no longer cause conflicts if a user specifies a custom device
    which conflicts with a user-configured device -- the user device takes
    precedence.
  - runc no longer panics if /sys/fs/cgroup is missing in rootless mode.
- runc --root is now always treated as local to the current working directory.
- The --no-pivot-root hardening was improved to handle nested mounts properly
  (please note that we still strongly recommend that users do not use
  --no-pivot-root -- it is still an insecure option).
- A large number of code cleanliness and other various cleanups, including
  fairly large changes to our tests and CI to make them all run more
  efficiently.
For packagers the following changes have been made which will have impact on
your packaging of runc:
- The "selinux" and "apparmor" buildtags have been removed, and now all runc
  builds will have SELinux and AppArmor support enabled. Note that "seccomp"
  is still optional (though we very highly recommend you enable it).
- make install DESTDIR= now functions correctly.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
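The -ENOSYS rule from the runc release notes above can be sketched roughly as
follows. This is a simplified model, not runc's implementation: the helper
name `fallbackErrno` is hypothetical, the syscall numbers are illustrative,
per-architecture handling is ignored, and the profile's default action is
assumed to be "return EPERM".

```go
package main

import "fmt"

// fallbackErrno is a hypothetical, simplified model of the rule in the
// release notes: a syscall that is not listed in the profile and has a
// larger number than every listed syscall gets ENOSYS, so libc can fall
// back to an older syscall. Any other unlisted syscall gets the profile's
// default errno, assumed here to be EPERM. An empty string means the
// syscall is listed and the profile's own rule applies.
func fallbackErrno(nr int, profile []int) string {
	largest := -1
	for _, known := range profile {
		if known == nr {
			return ""
		}
		if known > largest {
			largest = known
		}
	}
	if nr > largest {
		return "ENOSYS"
	}
	return "EPERM"
}

func main() {
	profile := []int{0, 1, 257} // illustrative syscall numbers only
	for _, nr := range []int{1, 2, 437} {
		fmt.Printf("syscall %d -> %q\n", nr, fallbackErrno(nr, profile))
	}
}
```

A syscall newer than anything the profile author could have known about is
reported as "not implemented" instead of "not permitted", which is the case
the release notes ask users to watch for regressions in.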
While the field in the Go struct is named `NanoCPUs`, it has a JSON label to
use `NanoCpus`, which was added in the original pull request (not clear what
the reason was); 846baf1fd3
Some notes:
- Go's JSON decoding matches field names case-insensitively, so when *using*
  the API, both cases should work, but when inspecting a container, the field
  is returned as `NanoCpus`.
- This only affects Containers.Resources. The `Limits` and `Reservation`
  for SwarmKit services and SwarmKit "nodes" do not override the name
  for JSON, so they have the canonical (`NanoCPUs`) casing.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
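A minimal sketch of the casing behaviour described above, using only
encoding/json. The `resources` type and the `encode`/`decode` helpers are
hypothetical stand-ins for illustration, not the actual API types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// resources is a hypothetical stand-in for the container resources struct:
// the Go field is NanoCPUs, but the JSON label overrides it to NanoCpus.
type resources struct {
	NanoCPUs int64 `json:"NanoCpus"`
}

// encode marshals r; the JSON label decides the key that is emitted.
func encode(r resources) string {
	b, err := json.Marshal(r)
	if err != nil {
		panic(err)
	}
	return string(b)
}

// decode unmarshals s; encoding/json accepts a case-insensitive key match.
func decode(s string) resources {
	var r resources
	if err := json.Unmarshal([]byte(s), &r); err != nil {
		panic(err)
	}
	return r
}

func main() {
	// Output side (e.g. inspect): always the labelled casing.
	fmt.Println(encode(resources{NanoCPUs: 500000000}))

	// Input side: both casings populate the same field.
	fmt.Println(decode(`{"NanoCPUs": 500000000}`).NanoCPUs)
	fmt.Println(decode(`{"NanoCpus": 500000000}`).NanoCPUs)
}
```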
Building the docker binary with the image built from Dockerfile.simple fails
because <btrfs/ioctl.h> cannot be found; we need to install libbtrfs-dev to fix this.
```
Building: bundles/dynbinary-daemon/dockerd-dev
GOOS="" GOARCH="" GOARM=""
.gopath/src/github.com/docker/docker/daemon/graphdriver/btrfs/btrfs.go:8:10: fatal error: btrfs/ioctl.h: No such file or directory
#include <btrfs/ioctl.h>
```
Signed-off-by: Lei Jitang <leijitang@outlook.com>
Otherwise a malformed or empty digest may cause a panic.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit a7d4af84bd)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
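As an illustration of why validating a digest up front avoids a panic, here
is a minimal sketch using only the standard library. The `splitDigest` helper
and `errInvalidDigest` are hypothetical and are not the code touched by the
commit above; they only show the general pattern of returning an error
instead of slicing an unchecked string.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errInvalidDigest is a hypothetical sentinel error for this sketch.
var errInvalidDigest = errors.New("invalid digest")

// splitDigest is a hypothetical helper that splits "algorithm:encoded".
// Slicing with an unchecked index, e.g. d[:strings.Index(d, ":")], panics
// when the separator is missing because the index is -1; checking first
// turns that panic into an ordinary error.
func splitDigest(d string) (string, string, error) {
	algo, encoded, ok := strings.Cut(d, ":")
	if !ok || algo == "" || encoded == "" {
		return "", "", fmt.Errorf("%w: %q", errInvalidDigest, d)
	}
	return algo, encoded, nil
}

func main() {
	for _, d := range []string{"sha256:abc123", "", "sha256", ":abc123"} {
		algo, encoded, err := splitDigest(d)
		fmt.Printf("%q -> algo=%q encoded=%q err=%v\n", d, algo, encoded, err)
	}
}
```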
Various dirs in /var/lib/docker contain data that needs to be mounted
into a container. For this reason, these dirs are set to be owned by the
remapped root user, otherwise there can be permissions issues.
However, this unnecessarily exposes these dirs to an unprivileged user
on the host.
Instead, set the ownership of these dirs to the real root (or rather the
UID/GID of dockerd) with 0701 permissions, which allows the remapped
root to enter the directories but not read/write to them.
The remapped root needs to enter these dirs so the container's rootfs
can be configured, e.g. to mount /etc/resolv.conf.
This prevents an unprivileged user from having read/write access to
these dirs on the host.
The flip side is that any user can now enter these directories.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit e908cc3901)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
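To make the 0701 mode above concrete, here is a small sketch of what the
permission bits mean for a user who is neither the owner nor in the group
(which is where the remapped root ends up once the dirs are owned by the real
root). The `canTraverseOnly` helper is hypothetical and exists only for
illustration.

```go
package main

import (
	"fmt"
	"os"
)

// canTraverseOnly is a hypothetical helper. It reports whether "other"
// users may enter a directory with this mode (execute bit set) without
// being able to list or modify it (read and write bits clear).
func canTraverseOnly(mode os.FileMode) bool {
	perm := mode.Perm()
	return perm&0o001 != 0 && perm&0o006 == 0
}

func main() {
	for _, m := range []os.FileMode{0o701, 0o755, 0o700} {
		fmt.Printf("%v traverse-only for others: %v\n", os.ModeDir|m, canTraverseOnly(m))
	}
}
```

With 0701, other users can pass through the directory to reach a path they
already know, but cannot enumerate or change its contents.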
The remapped root does not need access to this dir.
Having this owned by the remapped root opens the host up to an
unprivileged user on the host being able to escalate privileges.
While it would not be normal for the remapped UID to be used outside of
the container context, it could happen.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit bfedd27259)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
In general, if we ever need to change the permissions of a dir between
versions, this ensures the permissions actually change when we expect them
to, without having to special-case a dir that already exists.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit edb62a3ace)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Before this change, there was no way to know if container (runtime)
resources had been cleaned up unless you actually removed the container.
This change allows callers of the wait API or the events API to know
that all runtime resources for the container are released (e.g. IP
addresses).
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Now `systemctl --user stop docker` completes within 1 or 2 seconds.
Fixes issue 41944 ("Docker rootless does not exit properly if containers are running").
See systemd.kill(5): https://www.freedesktop.org/software/systemd/man/systemd.kill.html
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
These syscalls (some of which have been in Linux for a while but were
missing from the profile) fall into a few buckets:
* close_range(2), epoll_pwait2(2) are just extensions of existing "safe
for everyone" syscalls.
* The mountv2 API syscalls (fs*(2), move_mount(2), open_tree(2)) are
all equivalent to aspects of mount(2) and thus go into the
CAP_SYS_ADMIN category.
* process_madvise(2) is similar to the other process_*(2) syscalls and
thus goes in the CAP_SYS_PTRACE category.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
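The buckets above can be pictured as a small table mapping syscalls to the
capability that gates them. This is only a sketch: the `rule` type and
`requiredCap` helper are hypothetical simplifications of a seccomp profile
entry, and the expansion of fs*(2) into fsconfig, fsmount, fsopen and fspick
is an assumption that the commit message does not spell out.

```go
package main

import "fmt"

// rule is a hypothetical, simplified seccomp profile entry: a set of
// syscall names plus the capability required for them to be allowed.
// An empty capability means "allowed for everyone".
type rule struct {
	names       []string
	requiredCap string
}

// newRules mirrors the three buckets from the commit message. The fs*
// names are an assumed expansion of "fs*(2)".
var newRules = []rule{
	{names: []string{"close_range", "epoll_pwait2"}},
	{
		names:       []string{"fsconfig", "fsmount", "fsopen", "fspick", "move_mount", "open_tree"},
		requiredCap: "CAP_SYS_ADMIN",
	},
	{names: []string{"process_madvise"}, requiredCap: "CAP_SYS_PTRACE"},
}

// requiredCap returns the gating capability for a syscall and whether the
// syscall appears in newRules at all.
func requiredCap(name string) (string, bool) {
	for _, r := range newRules {
		for _, n := range r.names {
			if n == name {
				return r.requiredCap, true
			}
		}
	}
	return "", false
}

func main() {
	for _, name := range []string{"close_range", "move_mount", "process_madvise"} {
		c, ok := requiredCap(name)
		fmt.Printf("%s: listed=%v cap=%q\n", name, ok, c)
	}
}
```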