Commit graph

150 commits

Author SHA1 Message Date
Sebastiaan van Stijn
c3d7a0c603
Fix validation of IpcMode, PidMode, UTSMode, CgroupnsMode
These HostConfig properties were not validated until the OCI spec for the container
was created, which meant that `container run` and `docker create` would accept
invalid values, and the invalid value would not be detected until `start` was
called, returning a 500 "internal server error", as well as errors from containerd
("cleanup: failed to delete container from containerd: no such container") in the
daemon logs.

As a result, a faulty container was created, and the container state remained
in the `created` state.

This patch:

- Updates `oci.WithNamespaces()` to return the correct `errdefs.InvalidParameter`
- Updates `verifyPlatformContainerSettings()` to validate these settings, so that
  an error is returned when _creating_ the container.

Before this patch:

    docker run -dit --ipc=shared --name foo busybox
    2a00d74e9fbb7960c4718def8f6c74fa8ee754030eeb93ee26a516e27d4d029f
    docker: Error response from daemon: Invalid IPC mode: shared.

    docker ps -a --filter name=foo
    CONTAINER ID   IMAGE     COMMAND   CREATED              STATUS    PORTS     NAMES
    2a00d74e9fbb   busybox   "sh"      About a minute ago   Created             foo

After this patch:

    docker run -dit --ipc=shared --name foo busybox
    docker: Error response from daemon: invalid IPC mode: shared.

     docker ps -a --filter name=foo
    CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES

An integration test was added to verify the new validation, which can be run with:

    make BIND_DIR=. TEST_FILTER=TestCreateInvalidHostConfig DOCKER_GRAPHDRIVER=vfs test-integration

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-05-25 17:41:51 +02:00
Paweł Gronowski
85a7f5a09a daemon/linux: Set console size on creation
On Linux the daemon was not respecting the HostConfig.ConsoleSize
property and relied on cli initializing the tty size after the container
was created. This caused a delay between container creation and
the tty actually being resized.

This is also a small change to the api description, because
HostConfig.ConsoleSize is no longer Windows-only.

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2022-05-19 07:57:27 +02:00
Eng Zer Jun
7873c27cfb
all: replace strings.Replace with strings.ReplaceAll
strings.ReplaceAll(s, old, new) is a wrapper function for
strings.Replace(s, old, new, -1). But strings.ReplaceAll is more
readable and removes the hardcoded -1.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2022-05-09 19:45:40 +08:00
Sebastiaan van Stijn
0a3336fd7d
Merge pull request #43366 from corhere/finish-identitymapping-refactor
Finish refactor of UID/GID usage to a new struct
2022-03-25 14:51:05 +01:00
Elias Koromilas
8c7ea316d1 Mount (accessible) host devices in --privileged rootless containers
Signed-off-by: Elias Koromilas <elias.koromilas@gmail.com>
2022-03-23 22:30:22 +02:00
Cory Snider
098a44c07f Finish refactor of UID/GID usage to a new struct
Finish the refactor which was partially completed with commit
34536c498d, passing around IdentityMapping structs instead of pairs of
[]IDMap slices.

Existing code which uses []IDMap relies on zero-valued fields to be
valid, empty mappings. So in order to successfully finish the
refactoring without introducing bugs, their replacement therefore also
needs to have a useful zero value which represents an empty mapping.
Change IdentityMapping to be a pass-by-value type so that there are no
nil pointers to worry about.

The functionality provided by the deprecated NewIDMappingsFromMaps
function is required by unit tests to to construct arbitrary
IdentityMapping values. And the daemon will always need to access the
mappings to pass them to the Linux kernel. Accommodate these use cases
by exporting the struct fields instead. BuildKit currently depends on
the UIDs and GIDs methods so we cannot get rid of them yet.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-03-14 16:28:57 -04:00
Sebastiaan van Stijn
011e1c71ff
Merge pull request #43131 from thaJeztah/move_cpu_realtime_checks
daemon: move check for CPU-realtime daemon options
2022-03-07 19:27:12 +01:00
Sebastiaan van Stijn
5263bea70f
daemon: move check for CPU-realtime daemon options
Perform the validation when the daemon starts instead of performing these
validations for each individual container, so that we can fail early.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-03-03 19:50:27 +01:00
Sebastiaan van Stijn
821b4d4108
daemon/config: DefaultShmSize: minor tweak and improve docs
I had to check what the actual size was, so added it to the const's documentation.

While at it, also made use of it in a test, so that we're testing against the expected
value, and changed one alias to be consistent with other places where we alias this
import.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-02-18 23:29:36 +01:00
Sebastiaan van Stijn
9d9b8e0cf3
daemon.WithDevices(): use containerd's HostDevices()
Trying to reduce the use of libcontainer/devices, as it's considered
to be an "internal" package by runc.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-12-01 15:42:18 +01:00
Sebastiaan van Stijn
a826ca3aef
daemon.WithCommonOptions() fix detection of user-namespaces
Commit dae652e2e5 added support for non-privileged
containers to use ICMP_PROTO (used for `ping`). This option cannot be set for
containers that have user-namespaces enabled.

However, the detection looks to be incorrect; HostConfig.UsernsMode was added
in 6993e891d1 / ee2183881b,
and the property only has meaning if the daemon is running with user namespaces
enabled. In other situations, the property has no meaning.
As a result of the above, the sysctl would only be set for containers running
with UsernsMode=host on a daemon running with user-namespaces enabled.

This patch adds a check if the daemon has user-namespaces enabled (RemappedRoot
having a non-empty value), or if the daemon is running inside a user namespace
(e.g. rootless mode) to fix the detection.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-08-30 19:48:29 +02:00
Eng Zer Jun
c55a4ac779
refactor: move from io/ioutil to io and os package
The io/ioutil package has been deprecated in Go 1.16. This commit
replaces the existing io/ioutil functions with their new definitions in
io and os packages.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2021-08-27 14:56:57 +08:00
Brian Goff
9674540ccf
Merge pull request #42520 from thaJeztah/remove_lcow_step5_alternative
Remove LCOW (step 5): volumes/mounts: remove LCOW code (alternative)
2021-07-26 10:24:52 -07:00
Sebastiaan van Stijn
9b795c3e50
pkg/sysinfo.New(), daemon.RawSysInfo(): remove "quiet" argument
The "quiet" argument was only used in a single place (at daemon startup), and
every other use had to pass "false" to prevent this function from logging
warnings.

Now that SysInfo contains the warnings that occurred when collecting the
system information, we can make leave it up to the caller to use those
warnings (and log them if wanted).

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-07-14 23:10:07 +02:00
Sebastiaan van Stijn
300c11c7c9
volume/mounts: remove "containerOS" argument from NewParser (LCOW code)
This changes mounts.NewParser() to create a parser for the current operatingsystem,
instead of one specific to a (possibly non-matching, in case of LCOW) OS.

With the OS-specific handling being removed, the "OS" parameter is also removed
from `daemon.verifyContainerSettings()`, and various other container-related
functions.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-07-02 13:51:55 +02:00
Sebastiaan van Stijn
472f21b923
replace uses of deprecated containerd/sys.RunningInUserNS()
This utility was moved to a separate package, which has no
dependencies.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-06-18 11:01:24 +02:00
Sebastiaan van Stijn
aa4dce742f
daemon: improve handling of ROOTLESSKIT_PARENT_EUID
- daemon.WithRootless():  make sure ROOTLESSKIT_PARENT_EUID is valid int
- daemon.RawSysInfo(): minor simplification, and rename variable that
  clashed with imported package.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-06-05 21:12:32 +02:00
Sebastiaan van Stijn
2834f842ee
Use containerd's apparmor package to detect if apparmor can be used
The runc/libcontainer apparmor package on master no longer checks if apparmor_parser
is enabled, or if we are running docker-in-docker.

While those checks are not relevant to runc (as it doesn't load the profile), these
checks _are_ relevant to us (and containerd). So switching to use the containerd
apparmor package, which does include the needed checks.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-04-08 20:22:08 +02:00
Akihiro Suda
248f98ef5e
rootless: bind mount: fix "operation not permitted"
The following was failing previously, because `getUnprivilegedMountFlags()` was not called:
```console
$ sudo mount -t tmpfs -o noexec none /tmp/foo
$ $ docker --context=rootless run -it --rm -v /tmp/foo:/mnt:ro alpine
docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:520: container init caused: rootfs_linux.go:60: mounting "/tmp/foo" to rootfs at "/home/suda/.local/share/docker/overlay2/b8e7ea02f6ef51247f7f10c7fb26edbfb308d2af8a2c77915260408ed3b0a8ec/merged/mnt" caused: operation not permitted: unknown.
```

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-04-01 14:58:11 +09:00
Sebastiaan van Stijn
6458f750e1
use containerd/cgroups to detect cgroups v2
libcontainer does not guarantee a stable API, and is not intended
for external consumers.

this patch replaces some uses of libcontainer/cgroups with
containerd/cgroups.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-11-09 15:00:32 +01:00
Sebastiaan van Stijn
65a33d02f6
Simplify getUser() to use libcontainer built-in functionality
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-09-09 13:25:59 +02:00
Aleksa Sarai
3108ae6226
oci: correctly use user.GetExecUser interface
A nil interface in Go is not the same as a nil pointer that satisfies
the interface. libcontainer/user has special handling for missing
/etc/{passwd,group} files but this is all based on nil interface checks,
which were broken by Docker's usage of the API.

When combined with some recent changes in runc that made read errors
actually be returned to the caller, this results in spurrious -EINVAL
errors when we should detect the situation as "there is no passwd file".

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2020-07-29 14:04:47 +02:00
Brian Goff
24f173a003 Replace service "Capabilities" w/ add/drop API
After dicussing with maintainers, it was decided putting the burden of
providing the full cap list on the client is not a good design.
Instead we decided to follow along with the container API and use cap
add/drop.

This brings in the changes already merged into swarmkit.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-07-27 10:09:42 -07:00
Kir Kolyshkin
e3cff19dd1 Untangle CPU RT controller init
Commit 56f77d5ade added code that is doing some very ugly things.
In partucular, calling cgroups.FindCgroupMountpointAndRoot() and
daemon.SysInfoRaw() inside a recursively-called initCgroupsPath()
not not a good thing to do.

This commit tries to partially untangle this by moving some expensive
checks and calls earlier, in a minimally invasive way (meaning I
tried hard to not break any logic, however weird it is).

This also removes double call to MkdirAll (not important, but it sticks
out) and renames the function to better reflect what it's doing.

Finally, this wraps some of the errors returned, and fixes the init
function to not ignore the error from itself.

This could be reworked more radically, but at least this this commit
we are calling expensive functions once, and only if necessary.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-26 16:19:52 -07:00
Sebastiaan van Stijn
4534a7afc3
daemon: use containerd/sys to detect UserNamespaces
The implementation in libcontainer/system is quite complicated,
and we only use it to detect if user-namespaces are enabled.

In addition, the implementation in containerd uses a sync.Once,
so that detection (and reading/parsing `/proc/self/uid_map`) is
only performed once.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-06-15 13:06:08 +02:00
Justin Cormack
dae652e2e5
Add default sysctls to allow ping sockets and privileged ports with no capabilities
Currently default capability CAP_NET_RAW allows users to open ICMP echo
sockets, and CAP_NET_BIND_SERVICE allows binding to ports under 1024.
Both of these are safe operations, and Linux now provides ways that
these can be set, per container, to be allowed without any capabilties
for non root users. Enable these by default. Users can revert to the
previous behaviour by overriding the sysctl values explicitly.

Signed-off-by: Justin Cormack <justin.cormack@docker.com>
2020-06-04 18:11:08 +01:00
Akihiro Suda
33ee7941d4 support --privileged --cgroupns=private on cgroup v1
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-04-21 23:11:32 +09:00
Sebastiaan van Stijn
5d040cbd16
daemon: fix capitalization of some functions
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-04-14 17:22:19 +02:00
Sebastiaan van Stijn
eeef12f469
daemon: address some minor linting issues and nits
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-04-14 17:22:17 +02:00
Kir Kolyshkin
39048cf656 Really switch to moby/sys/mount*
Switch to moby/sys/mount and mountinfo. Keep the pkg/mount for potential
outside users.

This commit was generated by the following bash script:

```
set -e -u -o pipefail

for file in $(git grep -l 'docker/docker/pkg/mount"' | grep -v ^pkg/mount); do
	sed -i -e 's#/docker/docker/pkg/mount"#/moby/sys/mount"#' \
		-e 's#mount\.\(GetMounts\|Mounted\|Info\|[A-Za-z]*Filter\)#mountinfo.\1#g' \
		$file
	goimports -w $file
done
```

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-20 09:46:25 -07:00
Akihiro Suda
ca4b51868a rootless: support --exec-opt native.cgroupdriver=systemd
Support cgroup as in Rootless Podman.

Requires cgroup v2 host with crun.
Tested with Ubuntu 19.10 (kernel 5.3, systemd 242), crun v0.12.1.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-02-14 15:32:31 +09:00
Sebastiaan van Stijn
e6c1820ef5
Merge pull request #40174 from AkihiroSuda/cgroup2
support cgroup2
2020-01-09 20:09:11 +01:00
Akhil Mohan
86ebbe16de
remove host directory check
Signed-off-by: Akhil Mohan <akhil.mohan@mayadata.io>
2020-01-02 14:28:51 +05:30
Akihiro Suda
19baeaca26 cgroup2: enable cgroup namespace by default
For cgroup v1, we were unable to change the default because of
compatibility issue.

For cgroup v2, we should change the default right now because switching
to cgroup v2 is already breaking change.

See also containers/libpod#4363 containers/libpod#4374

Privileged containers also use cgroupns=private by default.
https://github.com/containers/libpod/pull/4374#issuecomment-549776387

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-01-01 02:58:40 +09:00
Akhil Mohan
35b9e6989f
Make --device flag work in privileged mode
When a container is started in privileged mode, the device mappings
provided by `--device` flag was ignored. Now the device mappings will
be considered even in privileged mode.

Signed-off-by: Akhil Mohan <akhil.mohan@mayadata.io>
2019-12-06 18:43:56 +05:30
wenlxie
03b3ec1dd5
make --device works at privileged mode
Signed-off-by: wenlxie <wenlxie@ebay.com>
2019-12-06 18:17:03 +05:30
Olli Janatuinen
1308a3a99f Move DefaultCapabilities() to caps package
Signed-off-by: Olli Janatuinen <olli.janatuinen@gmail.com>
2019-11-14 21:13:16 +02:00
Justin Cormack
dde030a6b1
Merge pull request #40083 from thaJeztah/daemon_consts
daemon: use constants for AppArmor and Seccomp
2019-10-17 11:12:37 -07:00
Grant Millar
df7b8f458a daemon: Use short libnetwork ID in exec-root & update libnetwork
Signed-off-by: Grant Millar <rid@cylo.io>
2019-10-15 11:40:24 +01:00
Sebastiaan van Stijn
a33cf495f2
daemon: use constants for AppArmor profiles
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-10-13 19:16:12 +02:00
Sebastiaan van Stijn
07ff4f1de8
goimports: fix imports
Format the source according to latest goimports.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-09-18 12:56:54 +02:00
Rob Gulewich
072400fc4b Make cgroup namespaces configurable
This adds both a daemon-wide flag and a container creation property:
- Set the `CgroupnsMode: "host|private"` HostConfig property at
  container creation time to control what cgroup namespace the container
  is created in
- Set the `--default-cgroupns-mode=host|private` daemon flag to control
  what cgroup namespace containers are created in by default
- Set the default if the daemon flag is unset to "host", for backward
  compatibility
- Default to CgroupnsMode: "host" for client versions < 1.40

Signed-off-by: Rob Gulewich <rgulewich@netflix.com>
2019-05-07 10:22:16 -07:00
Rob Gulewich
256eb04d69 Start containers in their own cgroup namespaces
This is enabled for all containers that are not run with --privileged,
if the kernel supports it.

Fixes #38332

Signed-off-by: Rob Gulewich <rgulewich@netflix.com>
2019-05-07 10:22:16 -07:00
Michael Crosby
c478553640 Export all spec generation opts
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-04-10 15:38:36 -04:00
Michael Crosby
cb902f4430 Refactor few spec generation ops
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-04-09 16:51:40 -04:00
John Howard
a3eda72f71
Merge pull request #38541 from Microsoft/jjh/containerd
Windows: Experimental: ContainerD runtime
2019-03-19 21:09:19 -07:00
Tibor Vass
8f936ae8cf Add DeviceRequests to HostConfig to support NVIDIA GPUs
This patch hard-codes support for NVIDIA GPUs.
In a future patch it should move out into its own Device Plugin.

Signed-off-by: Tibor Vass <tibor@docker.com>
2019-03-18 17:19:45 +00:00
John Howard
d4ceb61f2b LCOW:Reworking spec builder
Signed-off-by: John Howard <jhoward@microsoft.com>
2019-03-12 18:41:55 -07:00
Sebastiaan van Stijn
dd94555787
Merge pull request #32519 from darkowlzz/32443-docker-update-pids-limit
Add pids-limit support in docker update
2019-02-23 15:20:59 +01:00
Sunny Gogoi
74eb258ffb Add pids-limit support in docker update
- Adds updating PidsLimit in UpdateContainer().
- Adds setting PidsLimit in toContainerResources().

Signed-off-by: Sunny Gogoi <indiasuny000@gmail.com>
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2019-02-21 14:17:38 -08:00