Commit graph

448 commits

Author SHA1 Message Date
Cory Snider
d222bf097c daemon: reload runtimes w/o breaking containers
The existing runtimes reload logic went to great lengths to replace the
directory containing runtime wrapper scripts as atomically as possible
within the limitations of the Linux filesystem ABI. Trouble is,
atomically swapping the wrapper scripts directory solves the wrong
problem! The runtime configuration is "locked in" when a container is
started, including the path to the runC binary. If a container is
started with a runtime which requires a daemon-managed wrapper script
and then the daemon is reloaded with a config which no longer requires
the wrapper script (i.e. some args -> no args, or the runtime is dropped
from the config), that container would become unmanageable. Any attempts
to stop, exec or otherwise perform lifecycle management operations on
the container are likely to fail due to the wrapper script no longer
existing at its original path.

Atomically swapping the wrapper scripts is also incompatible with the
read-copy-update paradigm for reloading configuration. A handler in the
daemon could retain a reference to the pre-reload configuration for an
indeterminate amount of time after the daemon configuration has been
reloaded and updated. It is possible for the daemon to attempt to start
a container using a deleted wrapper script if a request to run a
container races a reload.

Solve the problem of deleting referenced wrapper scripts by ensuring
that all wrapper scripts are *immutable* for the lifetime of the daemon
process. Any given runtime wrapper script must always exist with the
same contents, no matter how many times the daemon config is reloaded,
or what changes are made to the config. This is accomplished by using
everyone's favourite design pattern: content-addressable storage. Each
wrapper script file name is suffixed with the SHA-256 digest of its
contents to (probabilistically) guarantee immutability without needing
any concurrency control. Stale runtime wrapper scripts are only cleaned
up on the next daemon restart.

Split the derived runtimes configuration from the user-supplied
configuration to have a place to store derived state without mutating
the user-supplied configuration or exposing daemon internals in API
struct types. Hold the derived state and the user-supplied configuration
in a single struct value so that they can be updated as an atomic unit.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-06-01 14:45:25 -04:00
Cory Snider
0b592467d9 daemon: read-copy-update the daemon config
Ensure data-race-free access to the daemon configuration without
locking by mutating a deep copy of the config and atomically storing
a pointer to the copy into the daemon-wide configStore value. Any
operations which need to read from the daemon config must capture the
configStore value only once and pass it around to guarantee a consistent
view of the config.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-06-01 14:45:24 -04:00
Sebastiaan van Stijn
3eebf4d162
container: split security options to a SecurityOptions struct
- Split these options to a separate struct, so that we can handle them in isolation.
- Change some tests to use subtests, and improve coverage

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-04-29 00:03:37 +02:00
Cory Snider
0ffaa6c785 daemon: add annotations to container HostConfig
Allow clients to set annotations on a container which will applied to
the container's OCI spec.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-02-23 18:59:00 -05:00
Brian Goff
73e09ddecf
Merge pull request #43787 from thaJeztah/memdb_nits
container: ViewDB: cleanup error-types
2023-01-06 08:09:06 -08:00
Cory Snider
0141c6db81 daemon: don't checkpoint container until registered
(*Container).CheckpointTo() upserts a snapshot of the container to the
daemon's in-memory ViewDB and also persists the snapshot to disk. It
does not register the live container object with the daemon's container
store, however. The ViewDB and container store are used as the source of
truth for different operations, so having a container registered in one
but not the other can result in inconsistencies. In particular, the List
Containers API uses the ViewDB as its source of truth and the Container
Inspect API uses the container store.

The (*Daemon).setHostConfig() method is called fairly early in the
process of creating a container, long before the container is registered
in the daemon's container store. Due to a rogue CheckpointTo() call
inside setHostConfig(), there is a window of time where a container can
be included in a List Containers API response but "not exist" according
to the Container Inspect API and similar endpoints which operate on a
particular container. Remove the rogue call so that the caller has full
control over when the container is checkpointed and update callers to
checkpoint explicitly. No changes to (*Daemon).create() are needed as it
checkpoints the fully-created container via (*Daemon).Register().

Fixes #44512.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-12-12 15:53:49 -05:00
Cory Snider
00157a42d3 daemon: fix GetContainer() returning (nil, nil)
GetContainer() would return (nil, nil) when looking up a container
if the container was inserted into the containersReplica ViewDB but not
the containers Store at the time of the lookup. Callers which reasonably
assume that the returned err == nil implies returned container != nil
would dereference a nil pointer and panic. Change GetContainer() so that
it always returns a container or an error.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-12-12 14:39:10 -05:00
Sebastiaan van Stijn
94dea2018e
container: ViewDB: GetByPrefix() return typed errors
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-12-08 14:33:57 +01:00
Sebastiaan van Stijn
da4d627e79
container: ViewDB: use errdefs for non-existing containers
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-12-08 14:33:57 +01:00
Sebastiaan van Stijn
8dd14509d7
ImageService: rename GraphDriverName to StorageDriver
Make the function name more generic, as it's no longer used only
for graphdrivers but also for snapshotters.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-08-18 09:44:51 +02:00
Sebastiaan van Stijn
52c1a2fae8
gofmt GoDoc comments with go1.19
Older versions of Go don't format comments, so committing this as
a separate commit, so that we can already make these changes before
we upgrade to Go 1.19.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-07-08 19:56:23 +02:00
Djordje Lukic
70dc392bfa
Use hashicorp/go-memdb instead of truncindex
memdb already knows how to search by prefix so there is no need to keep
a separate list of container ids in the truncindex

Benchmarks:

$ go test -benchmem -run=^$ -count 5 -tags linux -bench ^BenchmarkDBGetByPrefix100$ github.com/docker/docker/container
goos: linux
goarch: amd64
pkg: github.com/docker/docker/container
cpu: Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
BenchmarkDBGetByPrefix100-6        16018             73935 ns/op           33888 B/op       1100 allocs/op
BenchmarkDBGetByPrefix100-6        16502             73150 ns/op           33888 B/op       1100 allocs/op
BenchmarkDBGetByPrefix100-6        16218             74014 ns/op           33856 B/op       1100 allocs/op
BenchmarkDBGetByPrefix100-6        15733             73370 ns/op           33792 B/op       1100 allocs/op
BenchmarkDBGetByPrefix100-6        16432             72546 ns/op           33744 B/op       1100 allocs/op
PASS
ok      github.com/docker/docker/container      9.752s

$ go test -benchmem -run=^$ -count 5 -tags linux -bench ^BenchmarkTruncIndexGet100$ github.com/docker/docker/pkg/truncindex
goos: linux
goarch: amd64
pkg: github.com/docker/docker/pkg/truncindex
cpu: Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
BenchmarkTruncIndexGet100-6        16862             73732 ns/op           44776 B/op       1173 allocs/op
BenchmarkTruncIndexGet100-6        16832             73629 ns/op           45184 B/op       1179 allocs/op
BenchmarkTruncIndexGet100-6        17214             73571 ns/op           45160 B/op       1178 allocs/op
BenchmarkTruncIndexGet100-6        16113             71680 ns/op           45360 B/op       1182 allocs/op
BenchmarkTruncIndexGet100-6        16676             71246 ns/op           45056 B/op       1184 allocs/op
PASS
ok      github.com/docker/docker/pkg/truncindex 9.759s

$ go test -benchmem -run=^$ -count 5 -tags linux -bench ^BenchmarkDBGetByPrefix500$ github.com/docker/docker/container
goos: linux
goarch: amd64
pkg: github.com/docker/docker/container
cpu: Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
BenchmarkDBGetByPrefix500-6         1539            753541 ns/op          169381 B/op       5500 allocs/op
BenchmarkDBGetByPrefix500-6         1624            749975 ns/op          169458 B/op       5500 allocs/op
BenchmarkDBGetByPrefix500-6         1635            761222 ns/op          169298 B/op       5500 allocs/op
BenchmarkDBGetByPrefix500-6         1693            727856 ns/op          169297 B/op       5500 allocs/op
BenchmarkDBGetByPrefix500-6         1874            710813 ns/op          169570 B/op       5500 allocs/op
PASS
ok      github.com/docker/docker/container      6.711s

$ go test -benchmem -run=^$ -count 5 -tags linux -bench ^BenchmarkTruncIndexGet500$ github.com/docker/docker/pkg/truncindex
goos: linux
goarch: amd64
pkg: github.com/docker/docker/pkg/truncindex
cpu: Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
BenchmarkTruncIndexGet500-6         1934            780328 ns/op          224073 B/op       5929 allocs/op
BenchmarkTruncIndexGet500-6         1713            713935 ns/op          225011 B/op       5937 allocs/op
BenchmarkTruncIndexGet500-6         1780            702847 ns/op          224090 B/op       5943 allocs/op
BenchmarkTruncIndexGet500-6         1736            711086 ns/op          224027 B/op       5929 allocs/op
BenchmarkTruncIndexGet500-6         2448            508694 ns/op          222322 B/op       5914 allocs/op
PASS
ok      github.com/docker/docker/pkg/truncindex 6.877s

Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
2022-05-20 18:22:21 +02:00
Brian Goff
12f1b3ce43
Merge pull request #42616 from thaJeztah/migrate_pkg_signal
replace pkg/signal with moby/sys/signal v0.5.0
2021-07-26 10:47:28 -07:00
Sebastiaan van Stijn
28409ca6c7
replace pkg/signal with moby/sys/signal v0.5.0
This code was moved to the moby/sys repository

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-07-23 09:32:54 +02:00
Sebastiaan van Stijn
300c11c7c9
volume/mounts: remove "containerOS" argument from NewParser (LCOW code)
This changes mounts.NewParser() to create a parser for the current operatingsystem,
instead of one specific to a (possibly non-matching, in case of LCOW) OS.

With the OS-specific handling being removed, the "OS" parameter is also removed
from `daemon.verifyContainerSettings()`, and various other container-related
functions.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-07-02 13:51:55 +02:00
Sebastiaan van Stijn
2773f81aa5
Merge pull request #42445 from thaJeztah/bump_golang_ci
[testing] ~update~ fix linting issues found by golangci-lint v1.40.1
2021-06-16 22:15:01 +02:00
Sebastiaan van Stijn
dc7cbb9b33
remove layerstore indexing by OS (used for LCOW)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-06-10 17:49:11 +02:00
Sebastiaan van Stijn
d13997b4ba
gosec: G601: Implicit memory aliasing in for loop
plugin/v2/plugin.go:141:50: G601: Implicit memory aliasing in for loop. (gosec)
                    updateSettingsEnv(&p.PluginObj.Settings.Env, &s)
                                                                 ^
    libcontainerd/remote/client.go:572:13: G601: Implicit memory aliasing in for loop. (gosec)
                cpDesc = &m
                         ^
    distribution/push_v2.go:400:34: G601: Implicit memory aliasing in for loop. (gosec)
                (metadata.CheckV2MetadataHMAC(&mountCandidate, pd.hmacKey) ||
                                              ^
    builder/dockerfile/builder.go:261:84: G601: Implicit memory aliasing in for loop. (gosec)
            currentCommandIndex = printCommand(b.Stdout, currentCommandIndex, totalCommands, &meta)
                                                                                             ^
    builder/dockerfile/builder.go:278:46: G601: Implicit memory aliasing in for loop. (gosec)
            if err := initializeStage(dispatchRequest, &stage); err != nil {
                                                       ^
    daemon/container.go:283:40: G601: Implicit memory aliasing in for loop. (gosec)
            if err := parser.ValidateMountConfig(&cfg); err != nil {
                                                 ^

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-06-10 13:03:29 +02:00
Brian Goff
24f173a003 Replace service "Capabilities" w/ add/drop API
After dicussing with maintainers, it was decided putting the burden of
providing the full cap list on the client is not a good design.
Instead we decided to follow along with the container API and use cap
add/drop.

This brings in the changes already merged into swarmkit.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-07-27 10:09:42 -07:00
Sebastiaan van Stijn
a8216806ce
vendor: opencontainers/selinux v1.5.1
full diff: https://github.com/opencontainers/selinux/compare/v1.3.3...v1.5.1

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-05-05 20:33:06 +02:00
Sebastiaan van Stijn
eb14d936bf
daemon: rename variables that collide with imported package names
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-04-14 17:22:23 +02:00
Brian Goff
03163f6825
Merge pull request #40291 from akhilerm/privileged-device
35991- make `--device` works at privileged mode
2020-01-02 10:09:31 -08:00
Akhil Mohan
86ebbe16de
remove host directory check
Signed-off-by: Akhil Mohan <akhil.mohan@mayadata.io>
2020-01-02 14:28:51 +05:30
wenlxie
03b3ec1dd5
make --device works at privileged mode
Signed-off-by: wenlxie <wenlxie@ebay.com>
2019-12-06 18:17:03 +05:30
Sebastiaan van Stijn
f4f56b1197
daemon: normalize comment formatting
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-11-27 15:43:53 +01:00
Olli Janatuinen
80d7bfd54d Capabilities refactor
- Add support for exact list of capabilities, support only OCI model
- Support OCI model on CapAdd and CapDrop but remain backward compatibility
- Create variable locally instead of declaring it at the top
- Use const for magic "ALL" value
- Rename `cap` variable as it overlaps with `cap()` built-in
- Normalize and validate capabilities before use
- Move validation for conflicting options to validateHostConfig()
- TweakCapabilities: simplify logic to calculate capabilities

Signed-off-by: Olli Janatuinen <olli.janatuinen@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-01-22 21:50:41 +02:00
Sebastiaan van Stijn
f6002117a4
Extract container-config and container-hostconfig validation
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2018-12-19 13:09:12 +01:00
Sebastiaan van Stijn
5fc0f03426
Extract workingdir validation/conversion to a function
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2018-12-19 10:24:39 +01:00
Sebastiaan van Stijn
c0697c27aa
Extract port-mapping validation to a function
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2018-12-19 10:24:33 +01:00
Sebastiaan van Stijn
e1809510ca
Extract restart-policy-validation to a function
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2018-12-19 10:24:28 +01:00
Sebastiaan van Stijn
6a7da0b31b
Extract healthcheck-validation to a function
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2018-12-19 10:24:23 +01:00
Sebastiaan van Stijn
e278678705
Remove unused argument from verifyPlatformContainerSettings
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2018-12-19 09:23:09 +01:00
Sebastiaan van Stijn
10c97b9357
Unify logging container validation warnings
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2018-12-19 09:15:21 +01:00
Sebastiaan van Stijn
2e23ef5350
Move port-publishing check to linux platform-check
Windows does not have host-mode networking, so on Windows, this
check was a no-op

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2018-12-18 22:46:05 +01:00
Brian Goff
6a70fd222b Move mount parsing to separate package.
This moves the platform specific stuff in a separate package and keeps
the `volume` package and the defined interfaces light to import.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-04-19 06:35:54 -04:00
Daniel Nephin
0dab53ff3c Move all daemon image methods into imageService
imageService provides the backend for the image API and handles the
imageStore, and referenceStore.

Signed-off-by: Daniel Nephin <dnephin@docker.com>
2018-02-26 16:48:29 -05:00
Vincent Demeester
35d69f10a9
Merge pull request #35510 from ripcurld0/fix_35500
Display a warn message when there is binding ports and net mode is host
2018-02-19 08:57:36 +01:00
Boaz Shuster
6e78fdb790 Display a warn message when there is binding ports and net mode is host
When a container is created if "--network" is set to "host" all the
ports in the container are bound to the host.
Thus, adding "-p" or "--publish" to the command-line is meaningless.

Unlike "docker run" and "docker create", "docker service create" sends
an error message when network mode is host and port bindings are given

This patch however suggests to send a warning message to the client when
such a case occurs.

The warning message is added to "warnings" which are returned from
"verifyPlatformContainerSettings".

Signed-off-by: Boaz Shuster <ripcurld.github@gmail.com>
2018-02-18 13:28:44 +00:00
Daniel Nephin
4f0d95fa6e Add canonical import comment
Signed-off-by: Daniel Nephin <dnephin@docker.com>
2018-02-05 16:51:57 -05:00
Brian Goff
d453fe35b9 Move api/errdefs to errdefs
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-01-11 21:21:43 -05:00
Brian Goff
87a12421a9 Add helpers to create errdef errors
Instead of having to create a bunch of custom error types that are doing
nothing but wrapping another error in sub-packages, use a common helper
to create errors of the requested type.

e.g. instead of re-implementing this over and over:

```go
type notFoundError struct {
  cause error
}

func(e notFoundError) Error() string {
  return e.cause.Error()
}

func(e notFoundError) NotFound() {}

func(e notFoundError) Cause() error {
  return e.cause
}
```

Packages can instead just do:

```
  errdefs.NotFound(err)
```

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-01-11 21:21:43 -05:00
Sebastiaan van Stijn
7cb96ba308
Re-validate Mounts on container start
Validation of Mounts was only performed on container _creation_, not on
container _start_. As a result, if the host-path no longer existed
when the container was started, a directory was created in the given
location.

This is the wrong behavior, because when using the `Mounts` API, host paths
should never be created, and an error should be produced instead.

This patch adds a validation step on container start, and produces an
error if the host path is not found.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2017-12-19 11:44:29 +01:00
Simon Ferquel
e6bfe9cdcb Added validation of isolation settings on daemon.verifyContainerSettings
Signed-off-by: Simon Ferquel <simon.ferquel@docker.com>
2017-11-20 10:34:20 +01:00
John Howard
0380fbff37 LCOW: API: Add platform to /images/create and /build
Signed-off-by: John Howard <jhoward@microsoft.com>

This PR has the API changes described in https://github.com/moby/moby/issues/34617.
Specifically, it adds an HTTP header "X-Requested-Platform" which is a JSON-encoded
OCI Image-spec `Platform` structure.

In addition, it renames (almost all) uses of a string variable platform (and associated)
methods/functions to os. This makes it much clearer to disambiguate with the swarm
"platform" which is really os/arch. This is a stepping stone to getting the daemon towards
fully multi-platform/arch-aware, and makes it clear when "operating system" is being
referred to rather than "platform" which is misleadingly used - sometimes in the swarm
meaning, but more often as just the operating system.
2017-10-06 11:44:18 -07:00
John Howard
9fa449064c LCOW: WORKDIR correct handling
Signed-off-by: John Howard <jhoward@microsoft.com>
2017-08-17 15:29:17 -07:00
Brian Goff
ebcb7d6b40 Remove string checking in API error handling
Use strongly typed errors to set HTTP status codes.
Error interfaces are defined in the api/errors package and errors
returned from controllers are checked against these interfaces.

Errors can be wraeped in a pkg/errors.Causer, as long as somewhere in the
line of causes one of the interfaces is implemented. The special error
interfaces take precedence over Causer, meaning if both Causer and one
of the new error interfaces are implemented, the Causer is not
traversed.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2017-08-15 16:01:11 -04:00
Aaron Lehmann
1128fc1add Store container names in memdb
Currently, names are maintained by a separate system called "registrar".
This means there is no way to atomically snapshot the state of
containers and the names associated with them.

We can add this atomicity and simplify the code by storing name
associations in the memdb. This removes the need for pkg/registrar, and
makes snapshots a lot less expensive because they no longer need to copy
all the names. This change also avoids some problematic behavior from
pkg/registrar where it returns slices which may be modified later on.

Note that while this change makes the *snapshotting* atomic, it doesn't
yet do anything to make sure containers are named at the same time that
they are added to the database. We can do that by adding a transactional
interface, either as a followup, or as part of this PR.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
2017-07-13 12:35:00 -07:00
Fabio Kung
9134e87afc only Daemon.load needs to call label.ReserveLabel
Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
2017-06-23 07:52:33 -07:00
Fabio Kung
edad52707c save deep copies of Container in the replica store
Reuse existing structures and rely on json serialization to deep copy
Container objects.

Also consolidate all "save" operations on container.CheckpointTo, which
now both saves a serialized json to disk, and replicates state to the
ACID in-memory store.

Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
2017-06-23 07:52:33 -07:00
Fabio Kung
aacddda89d Move checkpointing to the Container object
Also hide ViewDB behind an inteface.

Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
2017-06-23 07:52:32 -07:00