This syncs the seccomp profile with changes made to containerd's default
profile in [1].
The original containerd issue and PR mention:
> Security experts generally believe io_uring to be unsafe. In fact
> Google ChromeOS and Android have turned it off, plus all Google
> production servers turn it off. Based on the blog published by Google
> below it seems like a bunch of vulnerabilities related to io_uring can
> be exploited to breakout of the container.
>
> [2]
>
> Other security researchers also hold this opinion: see [3] for a
> blackhat presentation on io_uring exploits.
For the record, these syscalls were added to the allowlist in [4].
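For illustration only, the effect of the change on an allowlist can be
sketched in Go as below. The helper name `withoutIOUring` and the sample
allowlist are hypothetical and are not taken from the profile code; the three
syscall names are the kernel's io_uring entry points.

```go
package main

import "fmt"

// ioUringSyscalls lists the three io_uring entry points of the Linux kernel.
var ioUringSyscalls = map[string]bool{
	"io_uring_enter":    true,
	"io_uring_register": true,
	"io_uring_setup":    true,
}

// withoutIOUring returns a copy of allowlist with the io_uring syscalls
// removed. It is a hypothetical helper, not part of the seccomp profile code.
func withoutIOUring(allowlist []string) []string {
	filtered := make([]string, 0, len(allowlist))
	for _, name := range allowlist {
		if !ioUringSyscalls[name] {
			filtered = append(filtered, name)
		}
	}
	return filtered
}

func main() {
	// Sample allowlist, made up for this sketch.
	allowlist := []string{"read", "io_uring_setup", "write", "io_uring_enter", "io_uring_register"}
	fmt.Println(withoutIOUring(allowlist)) // prints "[read write]"
}
```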
[1]: a48ddf4a20
[2]: https://security.googleblog.com/2023/06/learnings-from-kctf-vrps-42-linux.html
[3]: https://i.blackhat.com/BH-US-23/Presentations/US-23-Lin-bad_io_uring.pdf
[4]: https://github.com/moby/moby/pull/39415
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Adds a test ensuring that additional groups set with `--group-add`
are kept on exec when the container was run with `--user`.
Regression test for https://github.com/moby/moby/issues/46712
Signed-off-by: Laura Brehm <laurabrehm@hey.com>
Kept `coci` import alias since we use it elsewhere,
maybe to prevent confusion with our own `oci` package.
Signed-off-by: Laura Brehm <laurabrehm@hey.com>
- Merge BC conds for API < v1.42 together
- Merge BC conds for API < v1.44 together
- Re-order BC conds by API version
- Move pids-limit normalization after BC conds
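The resulting shape can be sketched roughly as follows. This is not the
handler code: `hostConfig`, `FeatureA`, `FeatureB`, `lessThan` and
`adjustForAPIVersion` are hypothetical names, and treating a non-positive
pids limit as "unset" is an assumption made for the example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// hostConfig is a hypothetical, trimmed-down stand-in for a host config.
type hostConfig struct {
	PidsLimit *int64
	FeatureA  bool // hypothetical option introduced in API v1.42
	FeatureB  bool // hypothetical option introduced in API v1.44
}

// lessThan reports whether API version v ("major.minor") is older than other.
func lessThan(v, other string) bool {
	parse := func(s string) (major, minor int) {
		ma, mi, _ := strings.Cut(s, ".")
		major, _ = strconv.Atoi(ma)
		minor, _ = strconv.Atoi(mi)
		return major, minor
	}
	vMajor, vMinor := parse(v)
	oMajor, oMinor := parse(other)
	if vMajor != oMajor {
		return vMajor < oMajor
	}
	return vMinor < oMinor
}

// adjustForAPIVersion keeps one block per version threshold, ordered by
// version, and runs normalization only after all of them.
func adjustForAPIVersion(version string, hc *hostConfig) {
	if lessThan(version, "1.42") {
		hc.FeatureA = false // older clients cannot have meant to set this
	}
	if lessThan(version, "1.44") {
		hc.FeatureB = false
	}
	// Normalization comes last so it sees the final, adjusted config.
	if hc.PidsLimit != nil && *hc.PidsLimit <= 0 {
		hc.PidsLimit = nil // assumption: non-positive means "no limit"
	}
}

func main() {
	limit := int64(-1)
	hc := hostConfig{PidsLimit: &limit, FeatureA: true, FeatureB: true}
	adjustForAPIVersion("1.43", &hc)
	fmt.Println(hc.FeatureA, hc.FeatureB, hc.PidsLimit == nil) // prints "true false true"
}
```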
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
The same error is already returned by `(*Daemon).containerCreate()` but
since this function is also called by the cluster executor, the error
has to be duplicated.
Doing so allows removing a nil check on the container config in
`postContainersCreate`.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
containerd's `WithUser` function now resets this property, starting with
[3eda46af12b1deedab3d0802adb2e81cb3521950][1] (v1.7.0-beta.4), so we no
longer need this function.
[1]: 3eda46af12
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The github.com/opencontainers/runc/libcontainer/user package was moved
to a separate module. While there are still uses of the old module in
our code-base, runc itself is migrating to the new module, and deprecated
the old package (for runc 1.2).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Commit def549c8f6 passed the context through
to the daemon.ContainerStart function. As a result, restarting containers
is no longer an atomic operation, because a context cancellation could
interrupt the restart (between "stopping" and "(re)starting"), resulting
in the container being stopped, but not restarted.
Restarting a container, or more precisely, making a successful request on
the `/containers/{id}/restart` endpoint, should be an atomic operation.
This patch uses a context.WithoutCancel for restart requests.
It's worth noting that daemon.containerStop already uses context.WithoutCancel,
so in that function, we'll be wrapping the context twice, but this should
likely not cause issues (just redundant for this code-path).
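A minimal sketch of the idea, assuming Go 1.21 or later (where
context.WithoutCancel was added). The `step`, `restart` and `simulate`
functions are hypothetical stand-ins and not the daemon's code; `simulate`
cancels the request context between the "stop" and "start" steps:

```go
package main

import (
	"context"
	"fmt"
)

// step is a hypothetical stand-in for a stop or start operation; it fails
// if its context has already been cancelled.
func step(ctx context.Context) error { return ctx.Err() }

// restart runs "stop" and then "start". When detach is true, it first
// derives a context that keeps the parent's values but ignores its
// cancellation, so the two steps cannot be split apart by the caller.
func restart(ctx context.Context, detach bool, afterStop func()) (stopped, started bool) {
	if detach {
		ctx = context.WithoutCancel(ctx)
	}
	if err := step(ctx); err != nil {
		return false, false
	}
	afterStop() // simulates the client going away after the stop
	if err := step(ctx); err != nil {
		return true, false
	}
	return true, true
}

// simulate cancels the request context between the stop and the start.
func simulate(detach bool) (stopped, started bool) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	return restart(ctx, detach, cancel)
}

func main() {
	stopped, started := simulate(false)
	fmt.Println(stopped, started) // prints "true false": stopped, never restarted
	stopped, started = simulate(true)
	fmt.Println(stopped, started) // prints "true true": restart completes
}
```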
Before this patch, if a container bind-mounted the docker socket and then
restarted itself from within, the restart operation would be cancelled.
The container would be stopped, but not started after that:

    docker run -dit --name myself -v /var/run/docker.sock:/var/run/docker.sock docker:cli sh
    docker exec myself sh -c 'docker restart myself'
    docker ps -a
    CONTAINER ID   IMAGE        COMMAND                  CREATED          STATUS                       PORTS     NAMES
    3a2a741c65ff   docker:cli   "docker-entrypoint.s…"   26 seconds ago   Exited (128) 7 seconds ago             myself

With this patch, the stop still cancels the exec but does not cancel the
restart operation, and the container is started again:

    docker run -dit --name myself -v /var/run/docker.sock:/var/run/docker.sock docker:cli sh
    docker exec myself sh -c 'docker restart myself'
    docker ps
    CONTAINER ID   IMAGE        COMMAND                  CREATED              STATUS         PORTS     NAMES
    4393a01f7c75   docker:cli   "docker-entrypoint.s…"   About a minute ago   Up 4 seconds             myself

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Fix a silly bug in the implementation which had the effect of
len(h.Xattrs) blank entries being inserted in the middle of
orderedHeaders. Luckily this is not a load-bearing bug: empty headers
are ignored as the tarsum digest is computed by concatenating header
keys and values without any intervening delimiter.
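Why the blank entries are harmless can be shown with a small sketch. The
`digest` function below is a simplified stand-in, not the tarsum
implementation; the hash choice and the header names and values are
illustrative assumptions. Because keys and values are written back to back,
an empty key with an empty value contributes zero bytes to the hash input:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// digest hashes ordered header key/value pairs by writing each key and
// value back to back, with no delimiter between entries.
func digest(headers [][2]string) string {
	h := sha256.New()
	for _, kv := range headers {
		h.Write([]byte(kv[0] + kv[1]))
	}
	return hex.EncodeToString(h.Sum(nil))
}

// blanksChangeDigest reports whether inserting blank entries in the middle
// of the ordered headers alters the digest.
func blanksChangeDigest() bool {
	clean := [][2]string{{"name", "file.txt"}, {"mode", "420"}}
	withBlanks := [][2]string{{"name", "file.txt"}, {"", ""}, {"", ""}, {"mode", "420"}}
	return digest(clean) != digest(withBlanks)
}

func main() {
	fmt.Println(blanksChangeDigest()) // prints "false"
}
```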
Signed-off-by: Cory Snider <csnider@mirantis.com>
The existing pkg/archive unit tests are primarily round-trip tests which
assert that pkg/archive produces tarballs which pkg/archive can unpack.
While these tests are effective at catching regressions in archiving or
unarchiving, they have a blind spot for regressions in compatibility
with the rest of the ecosystem. For example, a typo in the capabilities
extended attribute constant would result in subtly broken image layer
tarballs, but the existing tests would not catch the bug if both the
archiving and unarchiving implementations have the same typo.
Extend the test for archiving an overlay filesystem layer to assert that
the overlayfs-style whiteouts (extended attributes and device files) are
transformed into AUFS-style whiteouts (magic file names).
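As a rough sketch of the mapping being asserted (not the pkg/archive code):
overlayfs marks a deleted file with a 0/0 character device and an opaque
directory with the `trusted.overlay.opaque` extended attribute set to `y`,
while AUFS-style layers use a `.wh.` name prefix and a `.wh..wh..opq` marker
file. The `entry` type and `aufsWhiteout` function are hypothetical names
used only for this example.

```go
package main

import (
	"fmt"
	"path"
)

const (
	whiteoutPrefix     = ".wh."
	opaqueMarker       = ".wh..wh..opq"
	overlayOpaqueXattr = "trusted.overlay.opaque"
)

// entry is a hypothetical, simplified view of a file in an overlayfs
// upper directory.
type entry struct {
	Path         string
	IsDir        bool
	IsCharDev    bool
	Major, Minor uint32
	Xattrs       map[string]string
}

// aufsWhiteout returns the path of the AUFS-style marker that represents
// the overlayfs whiteout e, or "" if e is not a whiteout.
func aufsWhiteout(e entry) string {
	// A deleted file is a character device with device number 0/0.
	if e.IsCharDev && e.Major == 0 && e.Minor == 0 {
		return path.Join(path.Dir(e.Path), whiteoutPrefix+path.Base(e.Path))
	}
	// An opaque directory carries trusted.overlay.opaque="y"; the marker
	// file lives inside the directory itself.
	if e.IsDir && e.Xattrs[overlayOpaqueXattr] == "y" {
		return path.Join(e.Path, opaqueMarker)
	}
	return ""
}

func main() {
	deleted := entry{Path: "etc/passwd", IsCharDev: true}
	opaque := entry{Path: "var/lib", IsDir: true, Xattrs: map[string]string{overlayOpaqueXattr: "y"}}
	fmt.Println(aufsWhiteout(deleted)) // prints "etc/.wh.passwd"
	fmt.Println(aufsWhiteout(opaque))  // prints "var/lib/.wh..wh..opq"
}
```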
Extend the test for archiving files with extended attributes to assert
that the extended attribute is encoded into the file's tar header in the
standard, interoperable format compatible with the rest of the
ecosystem.
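A sketch of what such an assertion can look like, assuming the interoperable
format meant here is a PAX record with the `SCHILY.xattr.` key prefix (the
convention Go's archive/tar uses for extended attributes). The function name
`roundTripXattr` and the attribute used in `main` are made up for the
example; this is not the test added by the commit.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
)

// paxSchilyXattr is the PAX record key prefix for extended attributes.
const paxSchilyXattr = "SCHILY.xattr."

// roundTripXattr writes a one-file tarball carrying the xattr as a PAX
// record, then returns the value found under that raw record key when the
// tarball is read back.
func roundTripXattr(name, value string) (string, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	hdr := &tar.Header{
		Name:       "file",
		Typeflag:   tar.TypeReg,
		Mode:       0o644,
		Format:     tar.FormatPAX,
		PAXRecords: map[string]string{paxSchilyXattr + name: value},
	}
	if err := tw.WriteHeader(hdr); err != nil {
		return "", err
	}
	if err := tw.Close(); err != nil {
		return "", err
	}

	// Inspect the raw PAX records rather than a higher-level accessor, so
	// a wrong key prefix on the writing side would be noticed.
	got, err := tar.NewReader(&buf).Next()
	if err != nil {
		return "", err
	}
	return got.PAXRecords[paxSchilyXattr+name], nil
}

func main() {
	value, err := roundTripXattr("user.mime_type", "text/plain")
	if err != nil {
		panic(err)
	}
	fmt.Println(value) // prints "text/plain"
}
```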
Signed-off-by: Cory Snider <csnider@mirantis.com>