Commit graph

447 commits

Author SHA1 Message Date
Sebastiaan van Stijn
147b87a03e
daemon: use string-literals for easier grep'ing
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 02815416bb)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-07-15 00:49:46 +02:00
Cory Snider
f50cb0c7bd
daemon: stop setting container resources to zero
Many of the fields in LinuxResources struct are pointers to scalars for
some reason, presumably to differentiate between set-to-zero and unset
when unmarshaling from JSON, despite zero being outside the acceptable
range for the corresponding kernel tunables. When creating the OCI spec
for a container, the daemon sets the container's OCI spec CPUShares and
BlkioWeight parameters to zero when the corresponding Docker container
configuration values are zero, signifying unset, despite the minimum
acceptable value for CPUShares being two, and BlkioWeight ten. This has
gone unnoticed as runC does not distingiush set-to-zero from unset as it
also uses zero internally to represent unset for those fields. However,
kata-containers v3.2.0-alpha.3 tries to apply the explicit-zero resource
parameters to the container, exactly as instructed, and fails loudly.
The OCI runtime-spec is silent on how the runtime should handle the case
when those parameters are explicitly set to out-of-range values and
kata's behaviour is not unreasonable, so the daemon must therefore be in
the wrong.

Translate unset values in the Docker container's resources HostConfig to
omit the corresponding fields in the container's OCI spec when starting
and updating a container in order to maximize compatibility with
runtimes.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit dea870f4ea)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-21 22:16:22 +02:00
Djordje Lukic
aec7a80c6f
c8d: Use reference counting while mounting a snapshot
Some snapshotters (like overlayfs or zfs) can't mount the same
directories twice. For example if the same directroy is used as an upper
directory in two mounts the kernel will output this warning:

    overlayfs: upperdir is in-use as upperdir/workdir of another mount, accessing files from both mounts will result in undefined behavior.

And indeed accessing the files from both mounts will result in an "No
such file or directory" error.

This change introduces reference counts for the mounts, if a directory
is already mounted the mount interface will only increment the mount
counter and return the mount target effectively making sure that the
filesystem doesn't end up in an undefined behavior.

Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
(cherry picked from commit 32d58144fd)
Signed-off-by: Bjorn Neergaard <bjorn.neergaard@docker.com>
2023-06-20 12:54:03 -06:00
Sebastiaan van Stijn
3eebf4d162
container: split security options to a SecurityOptions struct
- Split these options to a separate struct, so that we can handle them in isolation.
- Change some tests to use subtests, and improve coverage

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-04-29 00:03:37 +02:00
Brian Goff
0970cb054c
Merge pull request #45366 from akerouanton/fix-docker0-PreferredPool
daemon: set docker0 subpool as the IPAM pool
2023-04-25 11:07:57 -07:00
Albin Kerouanton
2d31697d82
daemon: set docker0 subpool as the IPAM pool
Since cc19eba (backported to v23.0.4), the PreferredPool for docker0 is
set only when the user provides the bip config parameter or when the
default bridge already exist. That means, if a user provides the
fixed-cidr parameter on a fresh install or reboot their computer/server
without bip set, dockerd throw the following error when it starts:

> failed to start daemon: Error initializing network controller: Error
> creating default "bridge" network: failed to parse pool request for
> address space "LocalDefault" pool "" subpool "100.64.0.0/26": Invalid
> Address SubPool

See #45356.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-04-25 15:32:46 +02:00
Sebastiaan van Stijn
f72548956f
remove deprecated legacy "overlay" storage-driver
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-04-19 17:06:45 +02:00
Sebastiaan van Stijn
f691b13450
daemon: move code related to stats together
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-04-08 19:00:01 +02:00
Akihiro Suda
e807ae4f2e
vendor: github.com/containerd/cgroups/v3 v3.0.1
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2023-03-08 20:15:17 +09:00
Cory Snider
b0eed5ade6 daemon: allow shimv2 runtimes to be configured
Kubernetes only permits RuntimeClass values which are valid lowercase
RFC 1123 labels, which disallows the period character. This prevents
cri-dockerd from being able to support configuring alternative shimv2
runtimes for a pod as shimv2 runtime names must contain at least one
period character. Add support for configuring named shimv2 runtimes in
daemon.json so that runtime names can be aliased to
Kubernetes-compatible names.

Allow options to be set on shimv2 runtimes in daemon.json.

The names of the new daemon runtime config fields have been selected to
correspond with the equivalent field names in cri-containerd's
configuration so that users can more easily follow documentation from
the runtime vendor written for cri-containerd and apply it to
daemon.json.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-02-17 18:08:06 -05:00
Djordje Lukic
0137446248 Implement run using the containerd snapshotter
Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>

c8d/daemon: Mount root and fill BaseFS

This fixes things that were broken due to nil BaseFS like `docker cp`
and running a container with workdir override.

This is more of a temporary hack than a real solution.
The correct fix would be to refactor the code to make BaseFS and LayerRW
an implementation detail of the old image store implementation and use
the temporary mounts for the c8d implementation instead.
That requires more work though.

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>

daemon/images: Don't unset BaseFS

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2023-02-06 18:21:50 +01:00
Cory Snider
cc19eba579 daemon: let libnetwork assign default bridge IPAM
The netutils.ElectInterfaceAddresses function is only used in one place
outside of tests: in the daemon, to configure the default bridge
network. The function is also messy to reason about as it references the
shared mutable state of ipamutils.PredefinedLocalScopeDefaultNetworks.
It uses the list of predefined default networks to always return an IPv4
address even if the named interface does not exist or does not have any
IPv4 addresses. This list happens to be the same as the one used to
initialize the address pool of the 'builtin' IPAM driver, though that is
far from obvious. (Start with "./libnetwork".initIPAMDrivers and trace
the dataflow of the addressPool value. Surprise! Global state is being
mutated using the value of other global mutable state.)

The daemon does not need the fallback behaviour of
ElectInterfaceAddresses. In fact, the daemon does not have to configure
an address pool for the network at all! libnetwork will acquire one of
the available address ranges from the network's IPAM driver when the
preferred-pool configuration is unset. It will do so using the same list
of address ranges and the exact same logic
(netutils.FindAvailableNetworks) as ElectInterfaceAddresses. So unless
the daemon needs to force the network to use a specific address range
because the bridge interface already exists, it can leave the details
up to libnetwork.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-26 14:54:57 -05:00
Cory Snider
f96b9bf761 libnetwork: return concrete-typed *Controller
libnetwork.NetworkController is an interface with a single
implementation.

https://github.com/golang/go/wiki/CodeReviewComments#interfaces

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-13 14:09:37 -05:00
Sebastiaan van Stijn
f95e9b68d6
daemon: use strings.Cut() and cleanup error messages
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-12-21 11:09:03 +01:00
Cory Snider
388fe4aea8 daemon: drop side effect from registerLinks()
(*Daemon).registerLinks() calling the WriteHostConfig() method of its
container argument is a vestigial behaviour. In the distant past,
registerLinks() would persist the container links in an SQLite database
and drop the link config from the container's persisted HostConfig. This
changed in Docker v1.10 (#16032) which migrated away from SQLite and
began using the link config in the container's HostConfig as the
persistent source of truth. registerLinks() no longer mutates the
HostConfig at all so persisting the HostConfig to disk falls outside of
its scope of responsibilities.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-12-12 16:04:09 -05:00
Sebastiaan van Stijn
e7904c5faa
Merge pull request #44309 from thaJeztah/daemon_check_requirements
daemon: NewDaemon(): check system requirements early
2022-11-01 13:42:44 +01:00
Sebastiaan van Stijn
17fb29c9e8
daemon: NewDaemon(): check system requirements early
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-10-17 15:15:55 +02:00
Sebastiaan van Stijn
69f72417f4
pkg/idtools: remove CanAccess(), and move to daemon
The implementation of CanAccess() is very rudimentary, and should
not be used for anything other than a basic check (and maybe not
even for that). It's only used in a single location in the daemon,
so move it there, and un-export it to not encourage others to use
it out of context.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-10-15 22:42:39 +02:00
Sebastiaan van Stijn
ddb42f3ad2
daemon: fix empty-lines (revive)
daemon/network/filter_test.go:174:19: empty-lines: extra empty line at the end of a block (revive)
    daemon/restart.go:17:116: empty-lines: extra empty line at the end of a block (revive)
    daemon/daemon_linux_test.go:255:41: empty-lines: extra empty line at the end of a block (revive)
    daemon/reload_test.go:340:58: empty-lines: extra empty line at the end of a block (revive)
    daemon/oci_linux.go:495:101: empty-lines: extra empty line at the end of a block (revive)
    daemon/seccomp_linux_test.go:17:36: empty-lines: extra empty line at the start of a block (revive)
    daemon/container_operations.go:560:73: empty-lines: extra empty line at the end of a block (revive)
    daemon/daemon_unix.go:558:76: empty-lines: extra empty line at the end of a block (revive)
    daemon/daemon_unix.go:1092:64: empty-lines: extra empty line at the start of a block (revive)
    daemon/container_operations.go:587:24: empty-lines: extra empty line at the end of a block (revive)
    daemon/network.go:807:18: empty-lines: extra empty line at the end of a block (revive)
    daemon/network.go:813:42: empty-lines: extra empty line at the end of a block (revive)
    daemon/network.go:872:72: empty-lines: extra empty line at the end of a block (revive)

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-09-28 01:58:51 +02:00
Cory Snider
9ce2b30b81 pkg/containerfs: drop ContainerFS type alias
Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-09-23 16:56:53 -04:00
Cory Snider
4bafaa00aa Refactor libcontainerd to minimize c8d RPCs
The containerd client is very chatty at the best of times. Because the
libcontained API is stateless and references containers and processes by
string ID for every method call, the implementation is essentially
forced to use the containerd client in a way which amplifies the number
of redundant RPCs invoked to perform any operation. The libcontainerd
remote implementation has to reload the containerd container, task
and/or process metadata for nearly every operation. This in turn
amplifies the number of context switches between dockerd and containerd
to perform any container operation or handle a containerd event,
increasing the load on the system which could otherwise be allocated to
workloads.

Overhaul the libcontainerd interface to reduce the impedance mismatch
with the containerd client so that the containerd client can be used
more efficiently. Split the API out into container, task and process
interfaces which the consumer is expected to retain so that
libcontainerd can retain state---especially the analogous containerd
client objects---without having to manage any state-store inside the
libcontainerd client.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-08-24 14:59:08 -04:00
Djordje Lukic
d8d990f2e3
daemon: make the snapshotter configurable
Treat (storage/graph)Driver as snapshotter

Also moved some layerStore related initialization to the non-c8d case
because otherwise they get treated as a graphdriver plugins.

Co-authored-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-08-22 18:57:42 +02:00
Brian Goff
e6ee27a541 Allow containerd shim refs in default-runtime
Since runtimes can now just be containerd shims, we need to check if the
reference is possibly a containerd shim.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2022-08-18 18:41:03 +00:00
Cory Snider
547da0d575 daemon: support other containerd runtimes (MVP)
Contrary to popular belief, the OCI Runtime specification does not
specify the command-line API for runtimes. Looking at containerd's
architecture from the lens of the OCI Runtime spec, the _shim_ is the
OCI Runtime and runC is "just" an implementation detail of the
io.containerd.runc.v2 runtime. When one configures a non-default runtime
in Docker, what they're really doing is instructing Docker to create
containers using the io.containerd.runc.v2 runtime with a configuration
option telling the runtime that the runC binary is at some non-default
path. Consequently, only OCI runtimes which are compatible with the
io.containerd.runc.v2 shim, such as crun, can be used in this manner.
Other OCI runtimes, including kata-containers v2, come with their own
containerd shim and are not compatible with io.containerd.runc.v2.
As Docker has not historically provided a way to select a non-default
runtime which requires its own shim, runtimes such as kata-containers v2
could not be used with Docker.

Allow other containerd shims to be used with Docker; no daemon
configuration required. If the daemon is instructed to create a
container with a runtime name which does not match any of the configured
or stock runtimes, it passes the name along to containerd verbatim. A
user can start a container with the kata-containers runtime, for
example, simply by calling

    docker run --runtime io.containerd.kata.v2

Runtime names which containerd would interpret as a path to an arbitrary
binary are disallowed. While handy for development and testing it is not
strictly necessary and would allow anyone with Engine API access to
trivially execute any binary on the host as root, so we have decided it
would be safest for our users if it was not allowed.

It is not yet possible to set an alternative containerd shim as the
default runtime; it can only be configured per-container.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-07-27 14:22:49 -04:00
Sebastiaan van Stijn
52c1a2fae8
gofmt GoDoc comments with go1.19
Older versions of Go don't format comments, so committing this as
a separate commit, so that we can already make these changes before
we upgrade to Go 1.19.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-07-08 19:56:23 +02:00
Akihiro Suda
2c7a6d7bb1
daemon: remove support for deprecated io.containerd.runtime.v1.linux
This has been deprecated in Docker 20.10.0 (f63f73a4a8)

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2022-06-05 18:41:30 +09:00
Sebastiaan van Stijn
b241e2008e
daemon.NewDaemon(): fix network feature detection on first start
Commit 483aa6294b introduced a regression, causing
spurious warnings to be shown when starting a daemon for the first time after
a fresh install:

    docker info
    ...
    WARNING: IPv4 forwarding is disabled
    WARNING: bridge-nf-call-iptables is disabled
    WARNING: bridge-nf-call-ip6tables is disabled

The information shown is incorrect, as checking the corresponding options on
the system, shows that these options are available:

    cat /proc/sys/net/ipv4/ip_forward
    1
    cat /proc/sys/net/bridge/bridge-nf-call-iptables
    1
    cat /proc/sys/net/bridge/bridge-nf-call-ip6tables
    1

The reason this is failing is because the daemon itself reconfigures those
options during networking initialization in `configureIPForwarding()`;
cf4595265e/libnetwork/drivers/bridge/setup_ip_forwarding.go (L14-L25)

Network initialization happens in the `daemon.restore()` function within `daemon.NewDaemon()`:
cf4595265e/daemon/daemon.go (L475-L478)

However, 483aa6294b moved detection of features
earlier in the `daemon.NewDaemon()` function, and collects the system information
(`d.RawSysInfo()`) before we enter `daemon.restore()`;
cf4595265e/daemon/daemon.go (L1008-L1011)

For optimization (collecting the system information comes at a cost), those
results are cached on the daemon, and will only be performed once (using a
`sync.Once`).

This patch:

- introduces a `getSysInfo()` utility, which collects system information without
  caching the results
- uses `getSysInfo()` to collect the preliminary information needed at that
  point in the daemon's lifecycle.
- moves printing warnings to the end of `daemon.NewDaemon()`, after all information
  can be read correctly.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-06-03 17:54:43 +02:00
Sebastiaan van Stijn
c3d7a0c603
Fix validation of IpcMode, PidMode, UTSMode, CgroupnsMode
These HostConfig properties were not validated until the OCI spec for the container
was created, which meant that `container run` and `docker create` would accept
invalid values, and the invalid value would not be detected until `start` was
called, returning a 500 "internal server error", as well as errors from containerd
("cleanup: failed to delete container from containerd: no such container") in the
daemon logs.

As a result, a faulty container was created, and the container state remained
in the `created` state.

This patch:

- Updates `oci.WithNamespaces()` to return the correct `errdefs.InvalidParameter`
- Updates `verifyPlatformContainerSettings()` to validate these settings, so that
  an error is returned when _creating_ the container.

Before this patch:

    docker run -dit --ipc=shared --name foo busybox
    2a00d74e9fbb7960c4718def8f6c74fa8ee754030eeb93ee26a516e27d4d029f
    docker: Error response from daemon: Invalid IPC mode: shared.

    docker ps -a --filter name=foo
    CONTAINER ID   IMAGE     COMMAND   CREATED              STATUS    PORTS     NAMES
    2a00d74e9fbb   busybox   "sh"      About a minute ago   Created             foo

After this patch:

    docker run -dit --ipc=shared --name foo busybox
    docker: Error response from daemon: invalid IPC mode: shared.

     docker ps -a --filter name=foo
    CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES

An integration test was added to verify the new validation, which can be run with:

    make BIND_DIR=. TEST_FILTER=TestCreateInvalidHostConfig DOCKER_GRAPHDRIVER=vfs test-integration

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-05-25 17:41:51 +02:00
Sebastiaan van Stijn
dbd575ef91
daemon: daemon.initNetworkController(): dont return the controller
This method returned the network controller, only to set it on the daemon.

While making this change, also;

- update some error messages to be in the correct format
- use errors.Wrap() where possible
- extract configuring networks into a separate function to make the flow
  slightly easier to follow.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-04-29 09:08:49 +02:00
Sebastiaan van Stijn
3b56c0663d
daemon: daemon.networkOptions(): don't pass Config as argument
This is a method on the daemon, which itself holds the Config, so
there's no need to pass the same configuration as an argument.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-04-23 23:34:13 +02:00
Sebastiaan van Stijn
0a3336fd7d
Merge pull request #43366 from corhere/finish-identitymapping-refactor
Finish refactor of UID/GID usage to a new struct
2022-03-25 14:51:05 +01:00
Sebastiaan van Stijn
5d10c6ec67
Update handling of deprecated kernel (tcp) memory options
- Omit `KernelMemory` and `KernelMemoryTCP` fields in `/info` response if they're
  not supported, or when using API v1.42 or up.
- Re-enable detection of `KernelMemory` (as it's still needed for older API versions)
- Remove warning about kernel memory TCP in daemon logs (a warning is still returned
  by the `/info` endpoint, but we can consider removing that).
- Prevent incorrect "Minimum kernel memory limit allowed" error if the value was
  reset because it's not supported by the host.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-03-17 09:56:39 +01:00
aiordache
af6307fbda
Remove KernelMemory option from /containers/create and /update endpoints
- remove KernelMemory option from `v1.42` api docs
 - remove KernelMemory warning on `/info`
 - update changes for `v1.42`
 - remove `KernelMemory` field from endpoints docs

Signed-off-by: aiordache <anca.iordache@docker.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-03-17 09:55:36 +01:00
Cory Snider
098a44c07f Finish refactor of UID/GID usage to a new struct
Finish the refactor which was partially completed with commit
34536c498d, passing around IdentityMapping structs instead of pairs of
[]IDMap slices.

Existing code which uses []IDMap relies on zero-valued fields to be
valid, empty mappings. So in order to successfully finish the
refactoring without introducing bugs, their replacement therefore also
needs to have a useful zero value which represents an empty mapping.
Change IdentityMapping to be a pass-by-value type so that there are no
nil pointers to worry about.

The functionality provided by the deprecated NewIDMappingsFromMaps
function is required by unit tests to to construct arbitrary
IdentityMapping values. And the daemon will always need to access the
mappings to pass them to the Linux kernel. Accommodate these use cases
by exporting the struct fields instead. BuildKit currently depends on
the UIDs and GIDs methods so we cannot get rid of them yet.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-03-14 16:28:57 -04:00
Sebastiaan van Stijn
3c44ade6d0
daemon: fix error-message for minimum allowed kernel-memory limit
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-02-22 10:25:48 +01:00
Akihiro Suda
54d35c071d
Merge pull request #43130 from thaJeztah/daemon_cache_sysinfo
daemon: load and cache sysInfo on initialization
2022-02-18 13:46:15 +09:00
Sebastiaan van Stijn
1240f8b41d
daemon: remove kernel version check and DOCKER_NOWARN_KERNEL_VERSION
All regular, non-EOL Linux distros now come with more recent kernels
out of the box. There may still be users trying to run on kernel 3.10
or older (some embedded systems, e.g.), but those should be a rare
exception, which we don't have to take into account.

This patch removes the kernel version check on Linux, and the corresponding
DOCKER_NOWARN_KERNEL_VERSION environment that was there to skip this
check.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-02-17 17:47:22 +01:00
Sebastiaan van Stijn
483aa6294b
daemon: load and cache sysInfo on initialization
The `daemon.RawSysInfo()` function can be a heavy operation, as it collects
information about all cgroups on the host, networking, AppArmor, Seccomp, etc.

While looking at our code, I noticed that various parts in the code call this
function, potentially even _multiple times_ per container, for example, it is
called from:

- `verifyPlatformContainerSettings()`
- `oci.WithCgroups()` if the daemon has `cpu-rt-period` or `cpu-rt-runtime` configured
- in `ContainerDecoder.DecodeConfig()`, which is called on boith `container create` and `container commit`

Given that this information is not expected to change during the daemon's
lifecycle, and various information coming from this (such as seccomp and
apparmor status) was already cached, we may as well load it once, and cache
the results in the daemon instance.

This patch updates `daemon.RawSysInfo()` to use a `sync.Once()` so that
it's only executed once for the daemon's lifecycle.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-01-12 18:28:15 +01:00
Akihiro Suda
40ccedd61b
Merge pull request #42785 from sanchayanghosh/42753-fix-host.internal
Fixed docker.internal.gateway not displaying properly on live restore
2021-11-16 13:26:20 +09:00
Akihiro Suda
d116e12c6d
Merge pull request #42726 from thaJeztah/daemon_simplify_nwconfig
daemon: simplify networking config
2021-11-12 01:19:07 +09:00
sanchayanghosh
894230b82d
Fixed docker.internal.gateway not displaying properly on live restore
Also includes review suggestions in daemon.initNetworkController():

- update godoc for setHostGatewayIP()
- change setHostGatewayIP() to get config, instead of daemon
- remove redundant nil check for controller

Signed-off-by: sanchayanghosh <sanchayanghosh@outlook.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-10-27 12:44:56 +02:00
Sebastiaan van Stijn
ba16293330
Merge pull request #42907 from thaJeztah/master_forward_port_security_fixes
[master] forward-port security fixes from 20.10.9
2021-10-14 20:43:01 +02:00
Brian Goff
03f1c3d78f
Lock down docker root dir perms.
Do not use 0701 perms.
0701 dir perms allows anyone to traverse the docker dir.
It happens to allow any user to execute, as an example, suid binaries
from image rootfs dirs because it allows traversal AND critically
container users need to be able to do execute things.

0701 on lower directories also happens to allow any user to modify
     things in, for instance, the overlay upper dir which neccessarily
     has 0755 permissions.

This changes to use 0710 which allows users in the group to traverse.
In userns mode the UID owner is (real) root and the GID is the remapped
root's GID.

This prevents anyone but the remapped root to traverse our directories
(which is required for userns with runc).

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit ef7237442147441a7cadcda0600be1186d81ac73)
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit 93ac040bf0)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-10-05 09:57:00 +02:00
Sebastiaan van Stijn
3ce1dcc25d
daemon.UsingSystemd(): don't call getCD() multiple times
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-09-24 13:51:39 +02:00
Brian Goff
7ccf750daa Allow switching Windows runtimes.
This adds support for 2 runtimes on Windows, one that uses the built-in
HCSv1 integration and another which uses containerd with the runhcs
shim.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2021-09-23 17:44:04 +00:00
Akihiro Suda
9e7bbdb9ba
Merge pull request #40084 from thaJeztah/hostconfig_const_cleanup
api/types: hostconfig: add some constants/enums and minor code cleanup
2021-08-28 00:21:31 +09:00
Eng Zer Jun
c55a4ac779
refactor: move from io/ioutil to io and os package
The io/ioutil package has been deprecated in Go 1.16. This commit
replaces the existing io/ioutil functions with their new definitions in
io and os packages.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2021-08-27 14:56:57 +08:00
Sebastiaan van Stijn
686be57d0a
Update to Go 1.17.0, and gofmt with Go 1.17
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-08-24 23:33:27 +02:00
Sebastiaan van Stijn
e8e278c44f
daemon: simplify networking config
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-08-09 11:15:49 +02:00
Sebastiaan van Stijn
27aaadb710
daemon: normalize seccomp profile as part of setupSeccompProfile()
This makes sure that the value set in the daemon can be used as-is,
without having to replicate the normalization logic elsewhere.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-08-07 15:41:46 +02:00