Commit graph

158 commits

Author SHA1 Message Date
Brian Goff
74da6a6363 Switch all logging to use containerd log pkg
This unifies our logging and allows us to propagate logging and trace
contexts together.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2023-06-24 00:23:44 +00:00
Sebastiaan van Stijn
3eebf4d162
container: split security options to a SecurityOptions struct
- Split these options to a separate struct, so that we can handle them in isolation.
- Change some tests to use subtests, and improve coverage

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-04-29 00:03:37 +02:00
Laura Brehm
a34060cdb4
Resolve and store manifest when creating container
This addresses the previous issue with the containerd store where, after a container is created, we can't deterministically resolve which image variant was used to run it (since we also don't store what platform the image was fetched for).

This is required for things like `docker commit`, and computing the containers layer size later, since we need to resolve the specific image variant.

Signed-off-by: Laura Brehm <laurabrehm@hey.com>
2023-03-06 15:13:36 +01:00
Sebastiaan van Stijn
c5d4b6b311
restartmanager: remove RestartManager interface
It only had a single implementation, so we may as well remove the added
complexity of defining it as an interface.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-12-28 09:36:58 +01:00
Sebastiaan van Stijn
efb97da0da
restartmanager: add SetPolicy() to the RestartManager interface
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-12-28 09:36:58 +01:00
Sebastiaan van Stijn
9f3e5eead5
pkg/system: deprecate DefaultPathEnv, move to oci
This patch:

- Deprecates pkg/system.DefaultPathEnv
- Moves the implementation inside oci
- Adds TODOs to align the default in the Builder with the one used elsewhere

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-11-29 17:02:50 +01:00
Cory Snider
9ce2b30b81 pkg/containerfs: drop ContainerFS type alias
Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-09-23 16:56:53 -04:00
Cory Snider
e332c41e9d pkg/containerfs: alias ContainerFS to string
Drop the constructor and redundant string() type-casts.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-09-23 16:56:52 -04:00
Cory Snider
95824f2b5f pkg/containerfs: simplify ContainerFS type
Iterate towards dropping the type entirely.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-09-23 16:56:49 -04:00
Cory Snider
a7c8fdc55b pkg/containerfs: make ResolveScopedPath a free fn
Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-09-23 16:25:22 -04:00
Sebastiaan van Stijn
511a909ae6
container: remove ViewDB and View interfaces, use concrete types
These interfaces were added in aacddda89d, with
no clear motivation, other than "Also hide ViewDB behind an interface".

This patch removes the interface in favor of using a concrete implementation;
There's currently only one implementation of this interface, and if we would
decide to change to an alternative implementation, we could define relevant
interfaces on the receiver side.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-09-21 17:38:45 +02:00
Cory Snider
4bafaa00aa Refactor libcontainerd to minimize c8d RPCs
The containerd client is very chatty at the best of times. Because the
libcontained API is stateless and references containers and processes by
string ID for every method call, the implementation is essentially
forced to use the containerd client in a way which amplifies the number
of redundant RPCs invoked to perform any operation. The libcontainerd
remote implementation has to reload the containerd container, task
and/or process metadata for nearly every operation. This in turn
amplifies the number of context switches between dockerd and containerd
to perform any container operation or handle a containerd event,
increasing the load on the system which could otherwise be allocated to
workloads.

Overhaul the libcontainerd interface to reduce the impedance mismatch
with the containerd client so that the containerd client can be used
more efficiently. Split the API out into container, task and process
interfaces which the consumer is expected to retain so that
libcontainerd can retain state---especially the analogous containerd
client objects---without having to manage any state-store inside the
libcontainerd client.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-08-24 14:59:08 -04:00
Cory Snider
57d2d6ef62 Update container OOMKilled flag immediately
The OOMKilled flag on a container's state has historically behaved
rather unintuitively: it is updated on container exit to reflect whether
or not any process within the container has been OOM-killed during the
preceding run of the container. The OOMKilled flag would be set to true
when the container exits if any process within the container---including
execs---was OOM-killed at any time while the container was running,
whether or not the OOM-kill was the cause of the container exiting. The
flag is "sticky," persisting through the next start of the container;
only being cleared once the container exits without any processes having
been OOM-killed that run.

Alter the behavior of the OOMKilled flag such that it signals whether
any process in the container had been OOM-killed since the most recent
start of the container. Set the flag immediately upon any process being
OOM-killed, and clear it when the container transitions to the "running"
state.

There is an ulterior motive for this change. It reduces the amount of
state the libcontainerd client needs to keep track of and clean up on
container exit. It's one less place the client could leak memory if a
container was to be deleted without going through libcontainerd.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-08-24 14:59:07 -04:00
Paweł Gronowski
a290f5d04c state/Wait: Fix race when reading exit status
Before this change there was a race condition between State.Wait reading
the exit code from State and the State being changed instantly after the
change which ended the State.Wait.

Now, each State.Wait has its own channel which is used to transmit the
desired StateStatus at the time the state transitions to the awaited
one. Wait no longer reads the status by itself so there is no race.

The issue caused the `docker run --restart=always ...' to sometimes exit
with 0 exit code, because the process was already restarted by the time
State.Wait got the chance to read the exit code.

Test run
--------
Before:
```
$ go test -count 1 -run TestCorrectStateWaitResultAfterRestart .
--- FAIL: TestCorrectStateWaitResultAfterRestart (0.00s)
    state_test.go:198: expected exit code 10, got 0
FAIL
FAIL    github.com/docker/docker/container      0.011s
FAIL

```

After:
```
$ go test -count 1 -run TestCorrectStateWaitResultAfterRestart .
ok      github.com/docker/docker/container      0.011s
```

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2022-07-20 09:23:31 +02:00
Sebastiaan van Stijn
52c1a2fae8
gofmt GoDoc comments with go1.19
Older versions of Go don't format comments, so committing this as
a separate commit, so that we can already make these changes before
we upgrade to Go 1.19.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-07-08 19:56:23 +02:00
Sebastiaan van Stijn
21df9a04e0
container: StopSignal(): return syscall.Signal
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-05-05 00:53:53 +02:00
Cory Snider
1c129103b4 Bump swarmkit to v2
Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-04-21 17:33:07 -04:00
Sebastiaan van Stijn
e53f65a916
pkg/signal: remove DefaultStopSignal const
This const was previously living in pkg/signal, but with that package
being moved to its own module, it didn't make much sense to put docker's
defaults in a generic module.

The const from the "signal" package is currenlty used *both* by the CLI
and the daemon as a default value when creating containers. This put up
some questions:

a. should the default be non-exported, and private to the container
   package? After all, it's a _default_ (so should be used if _NOT_ set).
b. should the client actually setting a default, or instead just omit
   the value, unless specified by the user? having the client set a
   default also means that the daemon cannot change the default value
   because the client (or older clients) will override it.
c. consider defaults from the client and defaults of the daemon to be
   separate things, and create a default const in the CLI.

This patch implements option "a" (option "b" will be done separately,
as it involves the CLI code). This still leaves "c" open as an option,
if the CLI wants to set its own default.

Unfortunately, this change means we'll have to drop the alias for the
deprecated pkg/signal.DefaultStopSignal const, but a comment was left
instead, which can assist consumers of the const to find why it's no
longer there (a search showed the Docker CLI as the only consumer though).

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-08-11 10:31:29 +02:00
Sebastiaan van Stijn
3b316814f9
container: un-export DefaultStopTimeout
It's not used outside of the package itself

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-08-11 10:05:40 +02:00
Sebastiaan van Stijn
13cb04e57c
remove various LCOW bits (container, image, pkg/containerfs)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-07-27 13:36:21 +02:00
Brian Goff
12f1b3ce43
Merge pull request #42616 from thaJeztah/migrate_pkg_signal
replace pkg/signal with moby/sys/signal v0.5.0
2021-07-26 10:47:28 -07:00
Sebastiaan van Stijn
28409ca6c7
replace pkg/signal with moby/sys/signal v0.5.0
This code was moved to the moby/sys repository

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-07-23 09:32:54 +02:00
Sebastiaan van Stijn
300c11c7c9
volume/mounts: remove "containerOS" argument from NewParser (LCOW code)
This changes mounts.NewParser() to create a parser for the current operatingsystem,
instead of one specific to a (possibly non-matching, in case of LCOW) OS.

With the OS-specific handling being removed, the "OS" parameter is also removed
from `daemon.verifyContainerSettings()`, and various other container-related
functions.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-07-02 13:51:55 +02:00
Sebastiaan van Stijn
e047d984dc
Remove LCOW code (step 1)
The LCOW implementation in dockerd has been deprecated in favor of re-implementation
in containerd (in progress). Microsoft started removing the LCOW V1 code from the
build dependencies we use in Microsoft/opengcs (soon to be part of Microsoft/hcshhim),
which means that we need to start removing this code.

This first step removes the lcow graphdriver, the LCOW initialization code, and
some LCOW-related utilities.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-06-03 21:16:21 +02:00
Akihiro Suda
9303376242
Swarm config: use absolute paths for mount destination strings
Needed for runc >= 1.0.0-rc94.

See runc issue 2928.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-05-11 12:46:43 +02:00
Sebastiaan van Stijn
f4aafedc48
container: minor cleanup/refactor
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-11-10 18:43:02 +01:00
Sebastiaan van Stijn
5c0b694ef3
container: make hostconfig.json non-world-readable (0600)
When writing container's `hostconfig.json`, permissions were set to 0644 (world-
readable). While this is not a security concern (as the `/var/lib/docker/containers`
directory has `0700` or `0701` permissions), there is no real need to have these
permissions, as this file is only accessed by the daemon.

Looking at history for file permissions;

- 06b53e3fc7 (first implementation) used `0666` (world-writable)
- cf1a6c08fa refactored the code, and removed explicit permissions
- ea3cbd3274 introduced atomic writes, and brought back the `0666` permissions
- 3ec8fed747 removed world-writable bits, but kept world-readable

This patch updates the permissions to `0600`, matching what's used for `config.v2.json`,
which was updated in ae52cea3ab, but forgot to update
`hostconfig.json`.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-11-10 18:42:59 +01:00
Sebastiaan van Stijn
dc3c382b34
replace pkg/symlink with github.com/moby/sys/symlink
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-11-03 11:17:12 +01:00
Brian Goff
5702a89db6 Use strings.Index instead of strings.Split
Since we don't need the actual split values, instead of calling
`strings.Split`, which allocates new slices on each call, use
`strings.Index`.

This significantly reduces the allocations required when doing env value
replacements.
Additionally, pre-allocate the env var slice, even if we allocate a
little more than we need, it keeps us from having to do multiple
allocations while appending.

```
benchmark                                     old ns/op     new ns/op     delta
BenchmarkReplaceOrAppendEnvValues/0-8         486           313           -35.60%
BenchmarkReplaceOrAppendEnvValues/100-8       10553         1535          -85.45%
BenchmarkReplaceOrAppendEnvValues/1000-8      94275         12758         -86.47%
BenchmarkReplaceOrAppendEnvValues/10000-8     1161268       129269        -88.87%

benchmark                                     old allocs     new allocs     delta
BenchmarkReplaceOrAppendEnvValues/0-8         5              2              -60.00%
BenchmarkReplaceOrAppendEnvValues/100-8       110            0              -100.00%
BenchmarkReplaceOrAppendEnvValues/1000-8      1013           0              -100.00%
BenchmarkReplaceOrAppendEnvValues/10000-8     10022          0              -100.00%

benchmark                                     old bytes     new bytes     delta
BenchmarkReplaceOrAppendEnvValues/0-8         192           24            -87.50%
BenchmarkReplaceOrAppendEnvValues/100-8       7360          0             -100.00%
BenchmarkReplaceOrAppendEnvValues/1000-8      64832         0             -100.00%
BenchmarkReplaceOrAppendEnvValues/10000-8     1146049       0             -100.00%
```

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-04-24 11:10:13 -07:00
Brian Goff
750f0d1648 Support configuration of log cacher.
Configuration over the API per container is intentionally left out for
the time being, but is supported to configure the default from the
daemon config.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit cbecf48bc352e680a5390a7ca9cff53098cd16d7)
Signed-off-by: Madhu Venugopal <madhu@docker.com>
2020-02-19 17:02:34 -05:00
Brian Goff
e2ceb83a53 Support reads for all log drivers.
This supplements any log driver which does not support reads with a
custom read implementation that uses a local file cache.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit d675e2bf2b75865915c7a4552e00802feeb0847f)
Signed-off-by: Madhu Venugopal <madhu@docker.com>
2020-02-19 17:01:44 -05:00
John Howard
8988448729 Remove refs to jhowardmsft from .go code
Signed-off-by: John Howard <jhoward@microsoft.com>
2019-09-25 10:51:18 -07:00
Sebastiaan van Stijn
07ff4f1de8
goimports: fix imports
Format the source according to latest goimports.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-09-18 12:56:54 +02:00
Michael Crosby
b5f28865ef Handle blocked I/O of exec'd processes
This is the second part to
https://github.com/containerd/containerd/pull/3361 and will help process
delete not block forever when the process exists but the I/O was
inherited by a subprocess that lives on.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-06-21 12:02:15 -04:00
Sebastiaan van Stijn
e0ad6d045c
Merge pull request #37092 from cpuguy83/local_logger
Add "local" log driver
2018-08-20 07:01:41 +01:00
Brian Goff
a351b38e72 Add new local log driver
This driver uses protobuf to store log messages and has better defaults
for log file handling (e.g. compression and file rotation enabled by
default).

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-08-17 09:36:56 -07:00
Salahuddin Khan
763d839261 Add ADD/COPY --chown flag support to Windows
This implements chown support on Windows. Built-in accounts as well
as accounts included in the SAM database of the container are supported.

NOTE: IDPair is now named Identity and IDMappings is now named
IdentityMapping.

The following are valid examples:
ADD --chown=Guest . <some directory>
COPY --chown=Administrator . <some directory>
COPY --chown=Guests . <some directory>
COPY --chown=ContainerUser . <some directory>

On Windows an owner is only granted the permission to read the security
descriptor and read/write the discretionary access control list. This
fix also grants read/write and execute permissions to the owner.

Signed-off-by: Salahuddin Khan <salah@docker.com>
2018-08-13 21:59:11 -07:00
Brian Goff
cc8f358c23 Move network operations out of container package
These network operations really don't have anything to do with the
container but rather are setting up the networking.

Ideally these wouldn't get shoved into the daemon package, but doing
something else (e.g. extract a network service into a new package) but
there's a lot more work to do in that regard.
In reality, this probably simplifies some of that work as it moves all
the network operations to the same place.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-05-10 17:16:00 -04:00
Kir Kolyshkin
7d62e40f7e Switch from x/net/context -> context
Since Go 1.7, context is a standard package. Since Go 1.9, everything
that is provided by "x/net/context" is a couple of type aliases to
types in "context".

Many vendored packages still use x/net/context, so vendor entry remains
for now.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2018-04-23 13:52:44 -07:00
Brian Goff
6a70fd222b Move mount parsing to separate package.
This moves the platform specific stuff in a separate package and keeps
the `volume` package and the defined interfaces light to import.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-04-19 06:35:54 -04:00
Kir Kolyshkin
d6ea46ceda container.BaseFS: check for nil before deref
Commit 7a7357dae1 ("LCOW: Implemented support for docker cp + build")
changed `container.BaseFS` from being a string (that could be empty but
can't lead to nil pointer dereference) to containerfs.ContainerFS,
which could be be `nil` and so nil dereference is at least theoretically
possible, which leads to panic (i.e. engine crashes).

Such a panic can be avoided by carefully analysing the source code in all
the places that dereference a variable, to make the variable can't be nil.
Practically, this analisys are impossible as code is constantly
evolving.

Still, we need to avoid panics and crashes. A good way to do so is to
explicitly check that a variable is non-nil, returning an error
otherwise. Even in case such a check looks absolutely redundant,
further changes to the code might make it useful, and having an
extra check is not a big price to pay to avoid a panic.

This commit adds such checks for all the places where it is not obvious
that container.BaseFS is not nil (which in this case means we do not
call daemon.Mount() a few lines earlier).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2018-03-13 21:24:48 -07:00
Brian Goff
a1afe38e52
Merge pull request #36272 from mnussbaum/36255-fix_log_path
Fix empty LogPath with non-blocking logging mode
2018-02-27 11:25:39 -05:00
junzhe and mnussbaum
20ca612a59 Fix empty LogPath with non-blocking logging mode
This fixes an issue where the container LogPath was empty when the
non-blocking logging mode was enabled. This change sets the LogPath on
the container as soon as the path is generated, instead of setting the
LogPath on a logger struct and then attempting to pull it off that
logger at a later point. That attempt to pull the LogPath off the logger
was error prone since it assumed that the logger would only ever be a
single type.

Prior to this change docker inspect returned an empty string for
LogPath. This caused issues with tools that rely on docker inspect
output to discover container logs, e.g. Kubernetes.

This commit also removes some LogPath methods that are now unnecessary
and are never invoked.

Signed-off-by: junzhe and mnussbaum <code@getbraintree.com>
2018-02-20 23:12:34 -08:00
Brian Goff
c02171802b Merge configs/secrets in unix implementation
On unix, merge secrets/configs handling. This is important because
configs can contain secrets (via templating) and potentially a config
could just simply have secret information "by accident" from the user.
This just make sure that configs are as secure as secrets and de-dups a
lot of code.
Generally this makes everything simpler and configs more secure.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-02-16 11:25:14 -05:00
Aaron Lehmann
cd3d0486a6 Store configs that contain secrets on tmpfs
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
2018-02-16 11:25:14 -05:00
Daniel Nephin
4f0d95fa6e Add canonical import comment
Signed-off-by: Daniel Nephin <dnephin@docker.com>
2018-02-05 16:51:57 -05:00
Brian Goff
eaa5192856 Make container resource mounts unbindable
It's a common scenario for admins and/or monitoring applications to
mount in the daemon root dir into a container. When doing so all mounts
get coppied into the container, often with private references.
This can prevent removal of a container due to the various mounts that
must be configured before a container is started (for example, for
shared /dev/shm, or secrets) being leaked into another namespace,
usually with private references.

This is particularly problematic on older kernels (e.g. RHEL < 7.4)
where a mount may be active in another namespace and attempting to
remove a mountpoint which is active in another namespace fails.

This change moves all container resource mounts into a common directory
so that the directory can be made unbindable.
What this does is prevents sub-mounts of this new directory from leaking
into other namespaces when mounted with `rbind`... which is how all
binds are handled for containers.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-01-16 15:09:05 -05:00
Daniel Nephin
3fec7c0858 Remove libcontainerd.IOPipe
replaced with cio.DirectIO

Signed-off-by: Daniel Nephin <dnephin@docker.com>
2018-01-09 12:00:28 -05:00
Michael Crosby
aa3ce07c41 Update daemon code for containerd API changes
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-11-30 09:55:03 -05:00
Darren Stahl
ed74ee127f Increase container default shutdown timeout on Windows
The shutdown timeout for containers in insufficient on Windows. If the daemon is shutting down, and a container takes longer than expected to shut down, this can cause the container to remain in a bad state after restart, and never be able to start again. Increasing the timeout makes this less likely to occur.

Signed-off-by: Darren Stahl <darst@microsoft.com>
2017-10-23 10:31:31 -07:00