Commit graph

213 commits

Author SHA1 Message Date
Sebastiaan van Stijn
3eebf4d162
container: split security options to a SecurityOptions struct
- Split these options to a separate struct, so that we can handle them in isolation.
- Change some tests to use subtests, and improve coverage

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-04-29 00:03:37 +02:00
Sebastiaan van Stijn
f691b13450
daemon: move code related to stats together
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-04-08 19:00:01 +02:00
Cory Snider
f96b9bf761 libnetwork: return concrete-typed *Controller
libnetwork.NetworkController is an interface with a single
implementation.

https://github.com/golang/go/wiki/CodeReviewComments#interfaces

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-13 14:09:37 -05:00
Sebastiaan van Stijn
e7904c5faa
Merge pull request #44309 from thaJeztah/daemon_check_requirements
daemon: NewDaemon(): check system requirements early
2022-11-01 13:42:44 +01:00
Sebastiaan van Stijn
19c5d21e6f
daemon: getPluginExecRoot(): pass config
This makes it more transparent that it's unused for Linux,
and we don't pass "root", which has no relation with the
path on Linux.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-10-17 15:22:10 +02:00
Sebastiaan van Stijn
17fb29c9e8
daemon: NewDaemon(): check system requirements early
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-10-17 15:15:55 +02:00
Cory Snider
9ce2b30b81 pkg/containerfs: drop ContainerFS type alias
Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-09-23 16:56:53 -04:00
Cory Snider
4bafaa00aa Refactor libcontainerd to minimize c8d RPCs
The containerd client is very chatty at the best of times. Because the
libcontained API is stateless and references containers and processes by
string ID for every method call, the implementation is essentially
forced to use the containerd client in a way which amplifies the number
of redundant RPCs invoked to perform any operation. The libcontainerd
remote implementation has to reload the containerd container, task
and/or process metadata for nearly every operation. This in turn
amplifies the number of context switches between dockerd and containerd
to perform any container operation or handle a containerd event,
increasing the load on the system which could otherwise be allocated to
workloads.

Overhaul the libcontainerd interface to reduce the impedance mismatch
with the containerd client so that the containerd client can be used
more efficiently. Split the API out into container, task and process
interfaces which the consumer is expected to retain so that
libcontainerd can retain state---especially the analogous containerd
client objects---without having to manage any state-store inside the
libcontainerd client.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-08-24 14:59:08 -04:00
Olli Janatuinen
67c36d5d6e Windows: Re-create custom NAT networks after restart if missing from HNS
Signed-off-by: Olli Janatuinen <olli.janatuinen@gmail.com>
2022-07-19 14:16:31 -07:00
Sebastiaan van Stijn
b241e2008e
daemon.NewDaemon(): fix network feature detection on first start
Commit 483aa6294b introduced a regression, causing
spurious warnings to be shown when starting a daemon for the first time after
a fresh install:

    docker info
    ...
    WARNING: IPv4 forwarding is disabled
    WARNING: bridge-nf-call-iptables is disabled
    WARNING: bridge-nf-call-ip6tables is disabled

The information shown is incorrect, as checking the corresponding options on
the system, shows that these options are available:

    cat /proc/sys/net/ipv4/ip_forward
    1
    cat /proc/sys/net/bridge/bridge-nf-call-iptables
    1
    cat /proc/sys/net/bridge/bridge-nf-call-ip6tables
    1

The reason this is failing is because the daemon itself reconfigures those
options during networking initialization in `configureIPForwarding()`;
cf4595265e/libnetwork/drivers/bridge/setup_ip_forwarding.go (L14-L25)

Network initialization happens in the `daemon.restore()` function within `daemon.NewDaemon()`:
cf4595265e/daemon/daemon.go (L475-L478)

However, 483aa6294b moved detection of features
earlier in the `daemon.NewDaemon()` function, and collects the system information
(`d.RawSysInfo()`) before we enter `daemon.restore()`;
cf4595265e/daemon/daemon.go (L1008-L1011)

For optimization (collecting the system information comes at a cost), those
results are cached on the daemon, and will only be performed once (using a
`sync.Once`).

This patch:

- introduces a `getSysInfo()` utility, which collects system information without
  caching the results
- uses `getSysInfo()` to collect the preliminary information needed at that
  point in the daemon's lifecycle.
- moves printing warnings to the end of `daemon.NewDaemon()`, after all information
  can be read correctly.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-06-03 17:54:43 +02:00
Sebastiaan van Stijn
dbd575ef91
daemon: daemon.initNetworkController(): dont return the controller
This method returned the network controller, only to set it on the daemon.

While making this change, also;

- update some error messages to be in the correct format
- use errors.Wrap() where possible
- extract configuring networks into a separate function to make the flow
  slightly easier to follow.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-04-29 09:08:49 +02:00
Sebastiaan van Stijn
3b56c0663d
daemon: daemon.networkOptions(): don't pass Config as argument
This is a method on the daemon, which itself holds the Config, so
there's no need to pass the same configuration as an argument.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-04-23 23:34:13 +02:00
Sebastiaan van Stijn
0a3336fd7d
Merge pull request #43366 from corhere/finish-identitymapping-refactor
Finish refactor of UID/GID usage to a new struct
2022-03-25 14:51:05 +01:00
Sebastiaan van Stijn
9bf40d7edd
pkg/system: move IsWindowsClient to pkg/parsers/operatingsystem
This function was only used in a single place, and pkg/parsers/operatingsystem
already copied the `verNTWorkstation` const, so we might as well move this function
there as well to "unclutter" pkg/system.

The function had no external users, so not adding an alias / stub.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-03-17 10:26:50 +01:00
Cory Snider
098a44c07f Finish refactor of UID/GID usage to a new struct
Finish the refactor which was partially completed with commit
34536c498d, passing around IdentityMapping structs instead of pairs of
[]IDMap slices.

Existing code which uses []IDMap relies on zero-valued fields to be
valid, empty mappings. So in order to successfully finish the
refactoring without introducing bugs, their replacement therefore also
needs to have a useful zero value which represents an empty mapping.
Change IdentityMapping to be a pass-by-value type so that there are no
nil pointers to worry about.

The functionality provided by the deprecated NewIDMappingsFromMaps
function is required by unit tests to to construct arbitrary
IdentityMapping values. And the daemon will always need to access the
mappings to pass them to the Linux kernel. Accommodate these use cases
by exporting the struct fields instead. BuildKit currently depends on
the UIDs and GIDs methods so we cannot get rid of them yet.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-03-14 16:28:57 -04:00
Sebastiaan van Stijn
705f9b68cc
some cleaning up of isolation checks, and platform information
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-02-18 22:58:37 +01:00
Sebastiaan van Stijn
1b3fef5333
Windows: require Windows Server RS5 / ltsc2019 (build 17763) as minimum
Windows Server 2016 (RS1) reached end of support, and Docker Desktop requires
Windows 10 V19H2 (version 1909, build 18363) as a minimum.

This patch makes Windows Server RS5 /  ltsc2019 (build 17763) the minimum version
to run the daemon, and removes some hacks for older versions of Windows.

There is one check remaining that checks for Windows RS3 for a workaround
on older versions, but recent changes in Windows seemed to have regressed
on the same issue, so I kept that code for now to check if we may need that
workaround (again);

085c6a98d5/daemon/graphdriver/windows/windows.go (L319-L341)

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-02-18 22:58:28 +01:00
Akihiro Suda
54d35c071d
Merge pull request #43130 from thaJeztah/daemon_cache_sysinfo
daemon: load and cache sysInfo on initialization
2022-02-18 13:46:15 +09:00
Sebastiaan van Stijn
dd4cf4b641
daemon: remove some unused stubs on Windows
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-02-17 17:57:51 +01:00
Sebastiaan van Stijn
1240f8b41d
daemon: remove kernel version check and DOCKER_NOWARN_KERNEL_VERSION
All regular, non-EOL Linux distros now come with more recent kernels
out of the box. There may still be users trying to run on kernel 3.10
or older (some embedded systems, e.g.), but those should be a rare
exception, which we don't have to take into account.

This patch removes the kernel version check on Linux, and the corresponding
DOCKER_NOWARN_KERNEL_VERSION environment that was there to skip this
check.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-02-17 17:47:22 +01:00
Sebastiaan van Stijn
483aa6294b
daemon: load and cache sysInfo on initialization
The `daemon.RawSysInfo()` function can be a heavy operation, as it collects
information about all cgroups on the host, networking, AppArmor, Seccomp, etc.

While looking at our code, I noticed that various parts in the code call this
function, potentially even _multiple times_ per container, for example, it is
called from:

- `verifyPlatformContainerSettings()`
- `oci.WithCgroups()` if the daemon has `cpu-rt-period` or `cpu-rt-runtime` configured
- in `ContainerDecoder.DecodeConfig()`, which is called on boith `container create` and `container commit`

Given that this information is not expected to change during the daemon's
lifecycle, and various information coming from this (such as seccomp and
apparmor status) was already cached, we may as well load it once, and cache
the results in the daemon instance.

This patch updates `daemon.RawSysInfo()` to use a `sync.Once()` so that
it's only executed once for the daemon's lifecycle.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-01-12 18:28:15 +01:00
Akihiro Suda
d116e12c6d
Merge pull request #42726 from thaJeztah/daemon_simplify_nwconfig
daemon: simplify networking config
2021-11-12 01:19:07 +09:00
Brian Goff
7ccf750daa Allow switching Windows runtimes.
This adds support for 2 runtimes on Windows, one that uses the built-in
HCSv1 integration and another which uses containerd with the runhcs
shim.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2021-09-23 17:44:04 +00:00
Sebastiaan van Stijn
e8e278c44f
daemon: simplify networking config
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-08-09 11:15:49 +02:00
Sebastiaan van Stijn
0c84c322ae
daemon, oci: remove LCOW bits
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-07-27 13:35:59 +02:00
Sebastiaan van Stijn
9b795c3e50
pkg/sysinfo.New(), daemon.RawSysInfo(): remove "quiet" argument
The "quiet" argument was only used in a single place (at daemon startup), and
every other use had to pass "false" to prevent this function from logging
warnings.

Now that SysInfo contains the warnings that occurred when collecting the
system information, we can make leave it up to the caller to use those
warnings (and log them if wanted).

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-07-14 23:10:07 +02:00
Sebastiaan van Stijn
e047d984dc
Remove LCOW code (step 1)
The LCOW implementation in dockerd has been deprecated in favor of re-implementation
in containerd (in progress). Microsoft started removing the LCOW V1 code from the
build dependencies we use in Microsoft/opengcs (soon to be part of Microsoft/hcshhim),
which means that we need to start removing this code.

This first step removes the lcow graphdriver, the LCOW initialization code, and
some LCOW-related utilities.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-06-03 21:16:21 +02:00
Brian Goff
4b981436fe Fixup libnetwork lint errors
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2021-06-01 23:48:32 +00:00
Brian Goff
a0a473125b Fix libnetwork imports
After moving libnetwork to this repo, we need to update all the import
paths for libnetwork to point to docker/docker/libnetwork instead of
docker/libnetwork.
This change implements that.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2021-06-01 21:51:23 +00:00
Sebastiaan van Stijn
182795cff6
Do not call mount.RecursiveUnmount() on Windows
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-10-29 23:00:16 +01:00
Sebastiaan van Stijn
bf7fd015f7
Remove unused useShimV2()
This function was removed in the Linux code as part of
f63f73a4a8, but was not removed in
the Windows code.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-07-15 14:28:48 +02:00
Akihiro Suda
f350b53241 cgroup2: implement docker info
ref: https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-04-17 07:20:01 +09:00
vboulineau
ec16053ccf
Fix UsageInUsermode value on Windows
Looks like a wrong copy-paste using `RuntimeKernel100ns` twice instead of `RuntimeUser100ns`

Signed-off-by: Vincent Boulineau <vincent.boulineau@datadoghq.com>
2020-03-27 16:43:22 +01:00
Akihiro Suda
612343618d cgroup2: use shim V2
* Requires containerd binaries from containerd/containerd#3799 . Metrics are unimplemented yet.
* Works with crun v0.10.4, but `--security-opt seccomp=unconfined` is needed unless using master version of libseccomp
  ( containers/crun#156, seccomp/libseccomp#177 )
* Doesn't work with master runc yet
* Resource limitations are unimplemented

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-01-01 02:58:40 +09:00
Brian Goff
8d2456d1e6
Merge pull request #40246 from thaJeztah/system_windows_cleanup
pkg/system: minor cleanups and remove use of deprecated system.GetOSVersion()
2019-12-19 11:36:29 -08:00
Brian Goff
e25754b80c
Merge pull request #40186 from pradipd/default-nat-subnet
Dockerd won't start if a network with the default subnet prefix already exists in HNS.
2019-12-03 09:31:29 -08:00
Sebastiaan van Stijn
044b74e33b
daemon: remove use of deprecated system.GetOSVersion()
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-11-25 13:39:50 +01:00
Olli Janatuinen
447a840254 Windows: Use system specific parallelism value on containers restart
Signed-off-by: Olli Janatuinen <olli.janatuinen@gmail.com>
2019-11-11 15:44:47 +02:00
Pradip Dhara
89c6febfc2 Dockerd won't start if a network with the default subnet prefix already exists in HNS.
Signed-off-by: Pradip Dhara <pradipd@microsoft.com>
2019-11-06 10:54:28 -08:00
Sebastiaan van Stijn
6b91ceff74
Use hcsshim osversion package for Windows versions
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-10-22 02:53:00 +02:00
Sebastiaan van Stijn
05469b5fa2
daemon: add "isWindows" const
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-10-17 23:49:43 +02:00
John Howard
8988448729 Remove refs to jhowardmsft from .go code
Signed-off-by: John Howard <jhoward@microsoft.com>
2019-09-25 10:51:18 -07:00
Sebastiaan van Stijn
a9aeda8343
Rename some references to docker.exe to dockerd.exe
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-09-11 15:16:27 +02:00
Sebastiaan van Stijn
bad0b4e604
Remove skip evaluation of symlinks to data root on IoT Core
This fix was added in 8e71b1e210 to work around
a go issue (https://github.com/golang/go/issues/20506).

That issue was fixed in
66c03d39f3,
which is part of Go 1.10 and up. This reverts the changes that were made in
8e71b1e210, and are no longer needed.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-07-13 23:44:51 +02:00
John Howard
85ad4b16c1 Windows: Experimental: Allow containerd for runtime
Signed-off-by: John Howard <jhoward@microsoft.com>

This is the first step in refactoring moby (dockerd) to use containerd on Windows.
Similar to the current model in Linux, this adds the option to enable it for runtime.
It does not switch the graphdriver to containerd snapshotters.

 - Refactors libcontainerd to a series of subpackages so that either a
  "local" containerd (1) or a "remote" (2) containerd can be loaded as opposed
  to conditional compile as "local" for Windows and "remote" for Linux.

 - Updates libcontainerd such that Windows has an option to allow the use of a
   "remote" containerd. Here, it communicates over a named pipe using GRPC.
   This is currently guarded behind the experimental flag, an environment variable,
   and the providing of a pipename to connect to containerd.

 - Infrastructure pieces such as under pkg/system to have helper functions for
   determining whether containerd is being used.

(1) "local" containerd is what the daemon on Windows has used since inception.
It's not really containerd at all - it's simply local invocation of HCS APIs
directly in-process from the daemon through the Microsoft/hcsshim library.

(2) "remote" containerd is what docker on Linux uses for it's runtime. It means
that there is a separate containerd service running, and docker communicates over
GRPC to it.

To try this out, you will need to start with something like the following:

Window 1:
	containerd --log-level debug

Window 2:
	$env:DOCKER_WINDOWS_CONTAINERD=1
	dockerd --experimental -D --containerd \\.\pipe\containerd-containerd

You will need the following binary from github.com/containerd/containerd in your path:
 - containerd.exe

You will need the following binaries from github.com/Microsoft/hcsshim in your path:
 - runhcs.exe
 - containerd-shim-runhcs-v1.exe

For LCOW, it will require and initrd.img and kernel in `C:\Program Files\Linux Containers`.
This is no different to the current requirements. However, you may need updated binaries,
particularly initrd.img built from Microsoft/opengcs as (at the time of writing), Linuxkit
binaries are somewhat out of date.

Note that containerd and hcsshim for HCS v2 APIs do not yet support all the required
functionality needed for docker. This will come in time - this is a baby (although large)
step to migrating Docker on Windows to containerd.

Note that the HCS v2 APIs are only called on RS5+ builds. RS1..RS4 will still use
HCS v1 APIs as the v2 APIs were not fully developed enough on these builds to be usable.
This abstraction is done in HCSShim. (Referring specifically to runtime)

Note the LCOW graphdriver still uses HCS v1 APIs regardless.

Note also that this does not migrate docker to use containerd snapshotters
rather than graphdrivers. This needs to be done in conjunction with Linux also
doing the same switch.
2019-03-12 18:41:55 -07:00
Sebastiaan van Stijn
dd94555787
Merge pull request #32519 from darkowlzz/32443-docker-update-pids-limit
Add pids-limit support in docker update
2019-02-23 15:20:59 +01:00
Sunny Gogoi
74eb258ffb Add pids-limit support in docker update
- Adds updating PidsLimit in UpdateContainer().
- Adds setting PidsLimit in toContainerResources().

Signed-off-by: Sunny Gogoi <indiasuny000@gmail.com>
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2019-02-21 14:17:38 -08:00
akolomentsev
e017717d96 keep old network ids
for windows all networks are re-populated in the store during network controller initialization. In current version it also regenerate network Ids which may be referenced by other components and it may cause broken references to a networks. This commit avoids regeneration of network ids.

Signed-off-by: Andrey Kolomentsev <andrey.kolomentsev@docker.com>
2019-01-23 14:53:27 -08:00
Akihiro Suda
2cb26cfe9c
Merge pull request #38301 from cyphar/waitgroup-limits
daemon: switch to semaphore-gated WaitGroup for startup tasks
2018-12-22 00:07:55 +09:00
Aleksa Sarai
5a52917e4d
daemon: switch to semaphore-gated WaitGroup for startup tasks
Many startup tasks have to run for each container, and thus using a
WaitGroup (which doesn't have a limit to the number of parallel tasks)
can result in Docker exceeding the NOFILE limit quite trivially. A more
optimal solution is to have a parallelism limit by using a semaphore.

In addition, several startup tasks were not parallelised previously
which resulted in very long startup times. According to my testing, 20K
dead containers resulted in ~6 minute startup times (during which time
Docker is completely unusable).

This patch fixes both issues, and the parallelStartupTimes factor chosen
(128 * NumCPU) is based on my own significant testing of the 20K
container case. This patch (on my machines) reduces the startup time
from 6 minutes to less than a minute (ideally this could be further
reduced by removing the need to scan all dead containers on startup --
but that's beyond the scope of this patchset).

In order to avoid the NOFILE limit problem, we also detect this
on-startup and if NOFILE < 2*128*NumCPU we will reduce the parallelism
factor to avoid hitting NOFILE limits (but also emit a warning since
this is almost certainly a mis-configuration).

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2018-12-21 21:51:02 +11:00