Various dirs in /var/lib/docker contain data that needs to be mounted
into a container. For this reason, these dirs are set to be owned by the
remapped root user, otherwise there can be permissions issues.
However, this uneccessarily exposes these dirs to an unprivileged user
on the host.
Instead, set the ownership of these dirs to the real root (or rather the
UID/GID of dockerd) with 0701 permissions, which allows the remapped
root to enter the directories but not read/write to them.
The remapped root needs to enter these dirs so the container's rootfs
can be configured... e.g. to mount /etc/resolve.conf.
This prevents an unprivileged user from having read/write access to
these dirs on the host.
The flip side of this is now any user can enter these directories.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit e908cc3901)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The remapped root does not need access to this dir.
Having this owned by the remapped root opens the host up to an
uprivileged user on the host being able to escalate privileges.
While it would not be normal for the remapped UID to be used outside of
the container context, it could happen.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit bfedd27259)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Before this change, there is no way to know if container (runtime)
resources have been cleaned up unless you actually remove the container.
This change allows callers of the wait API or the events API to know
that all runtime resources for the container are released (e.g. IP
addresses).
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Loggers that implement BufSize() (e.g. awslogs) uses the method to
tell Copier about the maximum log line length. However loggerWithCache
and RingBuffer hide the method by wrapping loggers.
As a result, Copier uses its default 16KB limit which breaks log
lines > 16kB even the destinations can handle that.
This change implements BufSize() on loggerWithCache and RingBuffer to
make sure these logger wrappes don't hide the method on the underlying
loggers.
Fixes#41794.
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
The following warnings in `docker info` are now discarded,
because there is no action user can actually take.
On cgroup v1:
- "WARNING: No blkio weight support"
- "WARNING: No blkio weight_device support"
On cgroup v2:
- "WARNING: No kernel memory TCP limit support"
- "WARNING: No oom kill disable support"
`docker run` still prints warnings when the missing feature is being attempted to use.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
When pulling an image by platform, it is possible for the image's
configured platform to not match what was in the manifest list.
The image itself is buggy because either the manifest list is incorrect
or the image config is incorrect. In any case, this is preventing people
from upgrading because many times users do not have control over these
buggy images.
This was not a problem in 19.03 because we did not compare on platform
before. It just assumed if we had the image it was the one we wanted
regardless of platform, which has its own problems.
Example Dockerfile that has this problem:
```Dockerfile
FROM --platform=linux/arm64 k8s.gcr.io/build-image/debian-iptables:buster-v1.3.0
RUN echo hello
```
This fails the first time you try to build after it finishes pulling but
before performing the `RUN` command.
On the second attempt it works because the image is already there and
does not hit the code that errors out on platform mismatch (Actually it
ignores errors if an image is returned at all).
Must be run with the classic builder (DOCKER_BUILDKIT=0).
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This fixes a panic when an admin specifies a custom default runtime,
when a plugin is started the shim config is nil.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Adds a test case for the case where dockerd gets stuck on startup due to
hanging `daemon.shutdownContainer`
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This is a fix for https://github.com/docker/for-linux/issues/1012.
The code was not considering that C strings are NULL-terminated so
we need to leave one extra byte.
Without this fix, the testcase in https://github.com/docker/for-linux/issues/1012
fails with
```
Step 61/1001 : RUN echo 60 > 60
---> Running in dde85ac3b1e3
Removing intermediate container dde85ac3b1e3
---> 80a12a18a241
Step 62/1001 : RUN echo 61 > 61
error creating overlay mount to /23456789112345678921234/overlay2/d368abcc97d6c6ebcf23fa71225e2011d095295d5d8c9b31d6810bea748bdf07-init/merged: no such file or directory
```
with the output of `dmesg -T` as:
```
[Sat Dec 19 02:35:40 2020] overlayfs: failed to resolve '/23456789112345678921234/overlay2/89e435a1b24583c463abb73e8abfad8bf8a88312ef8253455390c5fa0a765517-init/wor': -2
```
with this fix, you get the expected:
```
Step 126/1001 : RUN echo 125 > 125
---> Running in 2f2e56da89e0
max depth exceeded
```
Signed-off-by: Oscar Bonilla <6f6231@gmail.com>
Previous startup sequence used to call "containerStop" on containers that were persisted with a running state but are not alive when restarting (can happen on non-clean shutdown).
This call was made before fixing-up the RunningState of the container, and tricked the daemon to trying to kill a non-existing process and ultimately hang.
The fix is very simple - just add a condition on calling containerStop.
Signed-off-by: Simon Ferquel <simon.ferquel@docker.com>
These tests fail when run by a non-root user
=== RUN TestTmpfsDevShmNoDupMount
oci_linux_test.go:29: assertion failed: error is not nil: mkdir /var/lib/docker: permission denied
--- FAIL: TestTmpfsDevShmNoDupMount (0.00s)
=== RUN TestIpcPrivateVsReadonly
oci_linux_test.go:29: assertion failed: error is not nil: mkdir /var/lib/docker: permission denied
--- FAIL: TestIpcPrivateVsReadonly (0.00s)
=== RUN TestSysctlOverride
oci_linux_test.go:29: assertion failed: error is not nil: mkdir /var/lib/docker: permission denied
--- FAIL: TestSysctlOverride (0.00s)
=== RUN TestSysctlOverrideHost
oci_linux_test.go:29: assertion failed: error is not nil: mkdir /var/lib/docker: permission denied
--- FAIL: TestSysctlOverrideHost (0.00s)
Signed-off-by: Arnaud Rebillout <elboulangero@gmail.com>
While working on some other code, noticed a bug in the jobs code. We're
adding job version after we're checking if there are port configs.
Before, if there were no port configs, the job version would be missing,
because we would return before trying to convert.
This moves the jobs version conversion above that code, so we don't
accidentally return before it.
Signed-off-by: Drew Erny <derny@mirantis.com>
Also move c.Lock() below containerd delete task, as it doesn't seem that
there is any necessity to hold the container lock while containerd is
killing the task.
This fixes a potential edge-case where containerd delete task hangs, and
thereafter all operations on the container would hang forever, as this
function is holding onto the container lock.
Signed-off-by: Cam <gh@sparr.email>
This ensures the storage-opts applies to all operations by the graph
drivers, replacing the merging of storage-opts into container storage
config at container-creation time, and hence applying storage-opts to
non-container operations like `COPY` and `ADD` in the builder.
Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
This applies the 127GB default WCOW Sandbox size to not just `RUN` under
`docker build` (as was previously the case) but to `COPY` and `ADD`
under `docker build` and also to `docker run`.
It also removes an inconsistency that the 127GB size was not applied
when `--platform windows` was not passed to `docker build`, but WCOW was
still used as a platform default, e.g. Docker Desktop for Windows in
Windows Containers mode.
Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
libcontainer does not guarantee a stable API, and is not intended
for external consumers.
this patch replaces some uses of libcontainer/cgroups with
containerd/cgroups.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
We aren't using containerd's image store, so we shouldn't be setting
this value.
This fixes container checkpoints, where containerd attempts to
checkpoint the image since one is set, but the image does not exist in
containerd.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This allows us to cache manifests and avoid extra round trips to the
registry for content we already know about.
dockerd currently does not support containerd on Windows, so this does
not store manifests on Windows, yet.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This prevents consumers of the opts package to also having to
depend on daemon/network, and everything related.
We can probably change some of the other constants to strings,
for easier concatenating, and need to review the windows-specific
"127.0.0.1" (instead of "localhost"), which may no longer be
needed.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
full diff: https://github.com/moby/sys/compare/mountinfo/v0.1.3...mountinfo/v0.4.0
> Note that this dependency uses submodules, providing "github.com/moby/sys/mount"
> and "github.com/moby/sys/mountinfo". Our vendoring tool (vndr) currently doesn't
> support submodules, so we vendor the top-level moby/sys repository (which contains
> both) and pick the most recent tag, which could be either `mountinfo/vXXX` or
> `mount/vXXX`.
github.com/moby/sys/mountinfo v0.4.0
--------------------------------------------------------------------------------
Breaking changes:
- `PidMountInfo` is now deprecated and will be removed before v1.0; users should switch to `GetMountsFromReader`
Fixes and improvements:
- run filter after all fields are parsed
- correct handling errors from bufio.Scan
- documentation formatting fixes
github.com/moby/sys/mountinfo v0.3.1
--------------------------------------------------------------------------------
- mount: use MNT_* flags from golang.org/x/sys/unix on freebsd
- various godoc and CI fixes
- mountinfo: make GetMountinfoFromReader Linux-specific
- Add support for OpenBSD in addition to FreeBSD
- mountinfo: use idiomatic naming for fields
github.com/moby/sys/mountinfo v0.2.0
--------------------------------------------------------------------------------
Bug fixes:
- Fix path unescaping for paths with double quotes
Improvements:
- Mounted: speed up by adding fast paths using openat2 (Linux-only) and stat
- Mounted: relax path requirements (allow relative, non-cleaned paths, symlinks)
- Unescape fstype and source fields
- Documentation improvements
Testing/CI:
- Unit tests: exclude darwin
- CI: run tests under Fedora 32 to test openat2
- TestGetMounts: fix for Ubuntu build system
- Makefile: fix ignoring test failures
- CI: add cross build
github.com/moby/sys/mount v0.1.1
--------------------------------------------------------------------------------
https://github.com/moby/sys/releases/tag/mount%2Fv0.1.1
Improvements:
- RecursiveUnmount: add a fast path (#26)
- Unmount: improve doc
- fix CI linter warning on Windows
Testing/CI:
- Unit tests: exclude darwin
- Makefile: fix ignoring test failures
- CI: add cross build
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This fixes the case where log rotation fails on Windows while there are
clients reading container logs.
Evicts readers if there is an error during rotation and try rotation again.
This is needed for Windows with this scenario:
1. `docker logs -f` is called
2. Log rotation occurs (log.txt -> log.txt.1, truncate and re-open
log.txt)
3. Log rotation occurs again (rm log.txt.1, log.txt -> log.txt.1)
On step 3, before this change, the log rotation will fail with `Access
is denied`.
In this case, what we have is a reader holding a file handle to the
primary log file. The log file is then rotated, but the reader still has
a the handle open. `FILE_SHARE_DELETE` allows this to happen... but then
we try to do it again for the next rotation and it blows up.
So when it blows up we force all the readers to disconnect, close the
log file, and try rotation again, which will succeed based on the added
tests.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This makes sure, on Windows, that all files are opened with
FILE_SHARE_DELETE.
On non-Windows this just calls the same `os.Open()`.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
- Ignore some pointless errors (like not exist on remove)
- Consolidate error handling/logging
- Fix race condition reading last log timestamp in the compression
goroutine. This needs to be done while holding the write lock, which
is not (or may not be) locked while compressing a rotated log file.
- Remove some indentation and consolidate mutex unlocking in
`compressFile`
Signed-off-by: Brian Goff <cpuguy83@gmail.com>