Commit graph

75 commits

Author SHA1 Message Date
Bjorn Neergaard
164a1a0f14
oci/defaults: deny /sys/devices/virtual/powercap
The ability to read these files may offer a power-based sidechannel
attack against any workloads running on the same kernel.

This was originally [CVE-2020-8694][1], which was fixed in
[949dd0104c496fa7c14991a23c03c62e44637e71][2] by restricting read access
to root. However, since many containers run as root, this is not
sufficient for our use case.

While untrusted code should ideally never be run, we can add some
defense in depth here by masking out the device class by default.

[Other mechanisms][3] to access this hardware exist, but they should not
be accessible to a container due to other safeguards in the
kernel/container stack (e.g. capabilities, perf paranoia).

[1]: https://nvd.nist.gov/vuln/detail/CVE-2020-8694
[2]: 949dd0104c
[3]: https://web.eece.maine.edu/~vweaver/projects/rapl/

Signed-off-by: Bjorn Neergaard <bjorn.neergaard@docker.com>
(cherry picked from commit 83cac3c3e3)
Signed-off-by: Bjorn Neergaard <bjorn.neergaard@docker.com>
2023-09-18 16:43:34 -06:00
Luboslav Pivarc
bf2b8a05a0
Do not drop effective&permitted set
Currently moby drops ep sets before the entrypoint is executed.
This does mean that with combination of no-new-privileges the
file capabilities stops working with non-root containers.
This is undesired as the usability of such containers is harmed
comparing to running root containers.

This commit therefore sets the effective/permitted set in order
to allow use of file capabilities or libcap(3)/prctl(2) respectively
with combination of no-new-privileges and without respectively.

For no-new-privileges the container will be able to obtain capabilities
that are requested.

Signed-off-by: Luboslav Pivarc <lpivarc@redhat.com>
Signed-off-by: Bjorn Neergaard <bjorn.neergaard@docker.com>
(cherry picked from commit 3aef732e61)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-08-13 22:26:45 +02:00
Cory Snider
210c4d6f4b
daemon: ensure OCI options play nicely together
Audit the OCI spec options used for Linux containers to ensure they are
less order-dependent. Ensure they don't assume that any pointer fields
are non-nil and that they don't unintentionally clobber mutations to the
spec applied by other options.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 8a094fe609)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-21 22:16:28 +02:00
Sebastiaan van Stijn
6371675bf9
Merge pull request #44275 from thaJeztah/move_pkg_system_funcs
pkg/system: move some functions to a new home
2022-12-16 15:25:41 +01:00
AdamKorcz
93fa093122
testing: move fuzzers over from OSS-Fuzz
Signed-off-by: AdamKorcz <adam@adalogics.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-11-30 17:31:03 +01:00
Sebastiaan van Stijn
9f3e5eead5
pkg/system: deprecate DefaultPathEnv, move to oci
This patch:

- Deprecates pkg/system.DefaultPathEnv
- Moves the implementation inside oci
- Adds TODOs to align the default in the Builder with the one used elsewhere

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-11-29 17:02:50 +01:00
Sebastiaan van Stijn
0ee5518e76
oci: use filepath.WalkDir instead of filepath.Walk
WalkDir is more performant as it doesn't perform an os.Lstat on every visited
file or directory.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-10-09 17:21:04 +02:00
Sebastiaan van Stijn
b44b3193d0
oci.DevicesFromPath() switch to use containerd implementation
Reducing the amount of code used from runc/libcontainer

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-09-29 10:40:30 +02:00
Sebastiaan van Stijn
8a2e1245d4
runconfig, oci, image, layer, distribution: fix empty-lines (revive)
runconfig/config_test.go:23:46: empty-lines: extra empty line at the start of a block (revive)
    runconfig/config_test.go:75:55: empty-lines: extra empty line at the start of a block (revive)

    oci/devices_linux.go:57:34: empty-lines: extra empty line at the start of a block (revive)
    oci/devices_linux.go:60:69: empty-lines: extra empty line at the start of a block (revive)

    image/fs_test.go:53:38: empty-lines: extra empty line at the end of a block (revive)
    image/tarexport/save.go:88:29: empty-lines: extra empty line at the end of a block (revive)

    layer/layer_unix_test.go:21:34: empty-lines: extra empty line at the end of a block (revive)

    distribution/xfer/download.go:302:9: empty-lines: extra empty line at the end of a block (revive)
    distribution/manifest_test.go:154:99: empty-lines: extra empty line at the end of a block (revive)
    distribution/manifest_test.go:329:52: empty-lines: extra empty line at the end of a block (revive)
    distribution/manifest_test.go:354:59: empty-lines: extra empty line at the end of a block (revive)

    registry/config_test.go:323:42: empty-lines: extra empty line at the end of a block (revive)
    registry/config_test.go:350:33: empty-lines: extra empty line at the end of a block (revive)

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-09-28 01:58:52 +02:00
Sebastiaan van Stijn
52c1a2fae8
gofmt GoDoc comments with go1.19
Older versions of Go don't format comments, so committing this as
a separate commit, so that we can already make these changes before
we upgrade to Go 1.19.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-07-08 19:56:23 +02:00
Samuel Karp
0d9a37d0c2
oci: inheritable capability set should be empty
The Linux kernel never sets the Inheritable capability flag to anything
other than empty.  Moby should have the same behavior, and leave it to
userspace code within the container to set a non-empty value if desired.

Reported-by: Andrew G. Morgan <morgan@kernel.org>
Signed-off-by: Samuel Karp <skarp@amazon.com>
2022-02-08 14:33:44 -08:00
Sebastiaan van Stijn
485cf38d48
oci/caps: limit available capabilities to current environment
In situations where docker runs in an environment where capabilities are limited,
sucn as docker-in-docker in a container created by older versions of docker, or
in a container where some capabilities have been disabled, starting a privileged
container may fail, because even though the _kernel_ supports a capability, the
capability is not available.

This patch attempts to address this problem by limiting the list of "known" capa-
bilities on the set of effective capabilties for the current process. This code
is based on the code in containerd's "caps" package.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-10-15 16:12:26 +02:00
Eng Zer Jun
c55a4ac779
refactor: move from io/ioutil to io and os package
The io/ioutil package has been deprecated in Go 1.16. This commit
replaces the existing io/ioutil functions with their new definitions in
io and os packages.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2021-08-27 14:56:57 +08:00
Sebastiaan van Stijn
686be57d0a
Update to Go 1.17.0, and gofmt with Go 1.17
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-08-24 23:33:27 +02:00
Sebastiaan van Stijn
58c4c120a8
oci/caps: simplify, and remove types that were not needed
The `CapabilityMapping` and `Capabilities` types appeared to be only
used locally, and added unneeded complexity.

This patch removes those types, and simplifies the logic to use a
map that maps names to `capability.Cap`s

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-08-04 11:25:55 +02:00
Sebastiaan van Stijn
fc3f98848a
oci/caps: improve error message for unsupported capabilities
A capability can either be invalid, or not supported by the kernel
on which we're running. This patch changes the error message produced
to reflect if the capability is invalid/unknown, or a known capability,
but not supported by the kernel version.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-08-04 11:25:53 +02:00
Sebastiaan van Stijn
72b1fb59fe
oci/caps: use map for capabilities to simplify lookup
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-08-04 11:25:51 +02:00
Sebastiaan van Stijn
d786a52364
oci/caps: generate list of all capabilities on "init"
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-08-04 11:25:48 +02:00
Sebastiaan van Stijn
0ec6f7ea23
oci/caps: minor optimization in init
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-08-04 11:25:44 +02:00
Sebastiaan van Stijn
b00b21b93c
oci/caps: rename some vars that conflicted with imports / built-ins
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-08-04 11:24:40 +02:00
Sebastiaan van Stijn
94334153b5
oci/caps: remove hack for RHEL6 kernels
We no longer support these kernels, so we can remove the workaround

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-08-04 11:23:56 +02:00
Sebastiaan van Stijn
c1c973e81b
Revert "Temporarily disable CAP_PERFMON, CAP_BPF, and CAP_CHECKPOINT_RESTORE"
Now that runc v1.0.0-rc93 is used, we can revert this temporary workaround

This reverts commit a38b96b8cd.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-08-03 16:12:31 +02:00
Sebastiaan van Stijn
0c84c322ae
daemon, oci: remove LCOW bits
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-07-27 13:35:59 +02:00
Sebastiaan van Stijn
bb17074119
reformat "nolint" comments
Unlike regular comments, nolint comments should not have a leading space.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-06-10 13:03:42 +02:00
Sebastiaan van Stijn
d414c0c1e8
replace uses of deprecated libcontainer/configs.Device
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-06-02 17:55:51 +02:00
Sebastiaan van Stijn
a40197328e
oci/caps: remove unused GetCapability() and ValidateCapabilities()
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-05-06 15:59:26 +02:00
Sebastiaan van Stijn
1cd1925acd
oci.Device() fix FileMode to match runtime spec
The runtime spec expects the FileMode field to only hold file permissions,
however `unix.Stat_t.Mode` contains both file type and mode.

This patch strips file type so that only file mode is included in the Device.

Thanks to Iceber Gu, who noticed the same issue in containerd and runc.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-02-18 10:48:24 +01:00
Sebastiaan van Stijn
5cc1753f2c
Fix daemon panic when starting container with invalid device cgroup rule
This fixes a panic when an invalid "device cgroup rule" is passed, resulting
in an "index out of range".

This bug was introduced in the original implementation in 1756af6faf,
but was not reproducible when using the CLI, because the same commit also added
client-side validation on the flag before making an API request. The following
example, uses an invalid rule (`c *:*  rwm` - two spaces before the permissions);

```console
$ docker run --rm --network=host --device-cgroup-rule='c *:*  rwm' busybox
invalid argument "c *:*  rwm" for "--device-cgroup-rule" flag: invalid device cgroup format 'c *:*  rwm'
```

Doing the same, but using the API results in a daemon panic when starting the container;

Create a container with an invalid device cgroup rule:

```console
curl -v \
  --unix-socket /var/run/docker.sock \
  "http://localhost/v1.41/containers/create?name=foobar" \
  -H "Content-Type: application/json" \
  -d '{"Image":"busybox:latest", "HostConfig":{"DeviceCgroupRules": ["c *:*  rwm"]}}'
```

Start the container:

```console
curl -v \
  --unix-socket /var/run/docker.sock \
  -X POST \
  "http://localhost/v1.41/containers/foobar/start"
```

Observe the daemon logs:

```
2021-01-22 12:53:03.313806 I | http: panic serving @: runtime error: index out of range [0] with length 0
goroutine 571 [running]:
net/http.(*conn).serve.func1(0xc000cb2d20)
	/usr/local/go/src/net/http/server.go:1795 +0x13b
panic(0x2f32380, 0xc000aebfc0)
	/usr/local/go/src/runtime/panic.go:679 +0x1b6
github.com/docker/docker/oci.AppendDevicePermissionsFromCgroupRules(0xc000175c00, 0x8, 0x8, 0xc0000bd380, 0x1, 0x4, 0x0, 0x0, 0xc0000e69c0, 0x0, ...)
	/go/src/github.com/docker/docker/oci/oci.go:34 +0x64f
```

This patch:

- fixes the panic, allowing the daemon to return an error on container start
- adds a unit-test to validate various permutations
- adds a "todo" to verify the regular expression (and handling) of the "a" (all) value

We should also consider performing this validation when _creating_ the container,
so that an error is produced early.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-01-22 16:02:19 +01:00
Arnaud Rebillout
f9b2989e97 Fix permissions on oci fixtures files
These two json files were executable, they are now 0644.

Signed-off-by: Arnaud Rebillout <elboulangero@gmail.com>
2020-11-27 10:29:47 +07:00
Sebastiaan van Stijn
a38b96b8cd
Temporarily disable CAP_PERFMON, CAP_BPF, and CAP_CHECKPOINT_RESTORE
This prevents docker from setting CAP_PERFMON, CAP_BPF, and CAP_CHECKPOINT_RESTORE
capabilities on privileged (or CAP_ALL) containers on Kernel 5.8 and up.

While these kernels support these capabilities, the current release of
runc ships with an older version of /gocapability/capability, and does
not know about them, causing an error to be produced.

We can remove this restriction once 6dfbe9b807
is included in a runc release and once we stop supporting containerd 1.3.x
(which ships with runc v1.0.0-rc92).

Thanks to Anca Iordache for reporting.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-10-16 17:52:27 +02:00
Sebastiaan van Stijn
c9c7756301
oci: add tests for loading seccomp profiles
Verify that we're able to test seccomp profiles with our
default Spec.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-09-29 20:15:43 +02:00
Jintao Zhang
9ad35b7e69 vendor runc 67169a9d43456ff0d5ae12b967acb8e366e2f181
v1.0.0-rc91-48-g67169a9d

Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com>
2020-07-30 16:16:11 +00:00
Sebastiaan van Stijn
bd0c2b3581
oci/deviceCgroup(): remove redundant variable
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-07-29 14:50:56 +02:00
Brian Goff
24f173a003 Replace service "Capabilities" w/ add/drop API
After dicussing with maintainers, it was decided putting the burden of
providing the full cap list on the client is not a good design.
Instead we decided to follow along with the container API and use cap
add/drop.

This brings in the changes already merged into swarmkit.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-07-27 10:09:42 -07:00
Olli Janatuinen
1308a3a99f Move DefaultCapabilities() to caps package
Signed-off-by: Olli Janatuinen <olli.janatuinen@gmail.com>
2019-11-14 21:13:16 +02:00
Sebastiaan van Stijn
4a3ee04351
oci: fix SA4009: argument e is overwritten before first use (staticcheck)
```
oci/devices_linux.go:64:72: SA4009: argument e is overwritten before first use (staticcheck)
```

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-09-18 12:57:50 +02:00
Sebastiaan van Stijn
07ff4f1de8
goimports: fix imports
Format the source according to latest goimports.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-09-18 12:56:54 +02:00
Olli Janatuinen
80d7bfd54d Capabilities refactor
- Add support for exact list of capabilities, support only OCI model
- Support OCI model on CapAdd and CapDrop but remain backward compatibility
- Create variable locally instead of declaring it at the top
- Use const for magic "ALL" value
- Rename `cap` variable as it overlaps with `cap()` built-in
- Normalize and validate capabilities before use
- Move validation for conflicting options to validateHostConfig()
- TweakCapabilities: simplify logic to calculate capabilities

Signed-off-by: Olli Janatuinen <olli.janatuinen@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-01-22 21:50:41 +02:00
Michael Crosby
b940cc5cff Move caps and device spec utils to oci pkg
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2018-12-11 10:20:25 -05:00
Jonathan A. Schweder
64e52ff3db Masked /proc/asound
@sw-pschmied originally post this in #38285

While looking through the Moby source code was found /proc/asound to be
shared with containers as read-only (as defined in
https://github.com/moby/moby/blob/master/oci/defaults.go#L128).

This can lead to two information leaks.

---

**Leak of media playback status of the host**

Steps to reproduce the issue:

 - Listen to music/Play a YouTube video/Do anything else that involves
sound output
 - Execute docker run --rm ubuntu:latest bash -c "sleep 7; cat
/proc/asound/card*/pcm*p/sub*/status | grep state | cut -d ' ' -f2 |
grep RUNNING || echo 'not running'"
 - See that the containerized process is able to check whether someone
on the host is playing music as it prints RUNNING
 - Stop the music output
 - Execute the command again (The sleep is delaying the output because
information regarding playback status isn't propagated instantly)
 - See that it outputs not running

**Describe the results you received:**

A containerized process is able to gather information on the playback
status of an audio device governed by the host. Therefore a process of a
container is able to check whether and what kind of user activity is
present on the host system. Also, this may indicate whether a container
runs on a desktop system or a server as media playback rarely happens on
server systems.

The description above is in regard to media playback - when examining
`/proc/asound/card*/pcm*c/sub*/status` (`pcm*c` instead of `pcm*p`) this
can also leak information regarding capturing sound, as in recording
audio or making calls on the host system.

Signed-off-by: Jonathan A. Schweder <jonathanschweder@gmail.com>
2018-11-30 10:03:10 -02:00
Antonio Murdaca
569b9702a5
Add /proc/acpi to masked paths
The deafult OCI linux spec in oci/defaults{_linux}.go in Docker/Moby
from 1.11 to current upstream master does not block /proc/acpi pathnames
allowing attackers to modify host's hardware like enabling/disabling
bluetooth or turning up/down keyboard brightness. SELinux prevents all
of this if enabled.

Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2018-07-05 17:39:52 +02:00
Sebastiaan van Stijn
f23c00d870
Various code-cleanup
remove unnescessary import aliases, brackets, and so on.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2018-05-23 17:50:54 +02:00
Justin Cormack
de23cb9398 Add /proc/keys to masked paths
This leaks information about keyrings on the host. Keyrings are
not namespaced.

Signed-off-by: Justin Cormack <justin.cormack@docker.com>
2018-02-21 16:23:34 +00:00
Daniel Nephin
4f0d95fa6e Add canonical import comment
Signed-off-by: Daniel Nephin <dnephin@docker.com>
2018-02-05 16:51:57 -05:00
John Howard
b023a46a07 Don't special case /sys/firmware in masked paths
Signed-off-by: John Howard <jhoward@microsoft.com>
2017-11-08 12:10:42 -08:00
Justin Cormack
a21ecdf3c8 Add /proc/scsi to masked paths
This is writeable, and can be used to remove devices. Containers do
not need to know about scsi devices.

Signed-off-by: Justin Cormack <justin.cormack@docker.com>
2017-11-03 15:12:22 +00:00
John Howard
71651e0b80 Fixes LCOW after containerd 1.0 introduced regressions
Signed-off-by: John Howard <jhoward@microsoft.com>
2017-10-27 09:55:43 -07:00
Michael Crosby
5a9b5f10cf Remove solaris files
For obvious reasons that it is not really supported now.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-10-24 15:39:34 -04:00
Kenfe-Mickael Laventure
ddae20c032
Update libcontainerd to use containerd 1.0
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
2017-10-20 07:11:37 -07:00
John Howard
285bc99731 Merge pull request #34356 from mlaventure/update-containerd
Update containerd to 06b9cb35161009dcb7123345749fef02f7cea8e0
2017-08-24 14:25:44 -07:00