Commit graph

401 commits

Author SHA1 Message Date
Cory Snider
97d32bb7d7 daemon: stop checkpointing health probes to disk
The health status and probe log of containers are not mission-criticial
data which must survive a crash. It is not worth prematrely wearing out
consumer-grade flash storage by overwriting and fsync()ing the container
config on after every probe. Update only the live Container object and
the ViewDB replica on every container health probe instead. It will
eventually get checkpointed along with some other state (or config)
change. Running containers will not be checkpointed on daemon shutdown
when live-restore is enabled, but it does not matter: the health status
and probe log will be zeroed out when the daemon starts back up.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2024-01-16 14:09:40 -05:00
Sebastiaan van Stijn
b8f2caa80a
pkg/containerfs: deprecate ResolveScopedPath
If was a very shallow wrapper around symlink.FollowSymlinkInScope, so inline
that code instead.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2024-01-02 15:32:31 +01:00
Sebastiaan van Stijn
e18f5a5304
container: internalize InitAttachContext
Move the initialization logic to the attachContext itself, so that
the container doesn't have to be aware about mutexes and other logic.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-11-30 15:26:53 +01:00
Sebastiaan van Stijn
cff4f20c44
migrate to github.com/containerd/log v0.1.0
The github.com/containerd/containerd/log package was moved to a separate
module, which will also be used by upcoming (patch) releases of containerd.

This patch moves our own uses of the package to use the new module.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-10-11 17:52:23 +02:00
Sebastiaan van Stijn
0f871f8cb7
api/types/events: define "Action" type and consts
Define consts for the Actions we use for events, instead of "ad-hoc" strings.
Having these consts makes it easier to find where specific events are triggered,
makes the events less error-prone, and allows documenting each Action (if needed).

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-08-29 00:38:08 +02:00
Sebastiaan van Stijn
b71951c70c
container: format code with gofumpt
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-08-24 17:57:34 +02:00
Brian Goff
74da6a6363 Switch all logging to use containerd log pkg
This unifies our logging and allows us to propagate logging and trace
contexts together.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2023-06-24 00:23:44 +00:00
Akihiro Suda
5045a2de24
Support recursively read-only (RRO) mounts
`docker run -v /foo:/foo:ro` is now recursively read-only on kernel >= 5.12.

Automatically falls back to the legacy non-recursively read-only mount mode on kernel < 5.12.

Use `ro-non-recursive` to disable RRO.
Use `ro-force-recursive` or `rro` to explicitly enable RRO. (Fails on kernel < 5.12)

Fix issue 44978
Fix docker/for-linux issue 788

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2023-05-26 01:58:24 +09:00
Sebastiaan van Stijn
ab35df454d
remove pre-go1.17 build-tags
Removed pre-go1.17 build-tags with go fix;

    go mod init
    go fix -mod=readonly ./...
    rm go.mod

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-05-19 20:38:51 +02:00
Sebastiaan van Stijn
3eebf4d162
container: split security options to a SecurityOptions struct
- Split these options to a separate struct, so that we can handle them in isolation.
- Change some tests to use subtests, and improve coverage

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-04-29 00:03:37 +02:00
Laura Brehm
a34060cdb4
Resolve and store manifest when creating container
This addresses the previous issue with the containerd store where, after a container is created, we can't deterministically resolve which image variant was used to run it (since we also don't store what platform the image was fetched for).

This is required for things like `docker commit`, and computing the containers layer size later, since we need to resolve the specific image variant.

Signed-off-by: Laura Brehm <laurabrehm@hey.com>
2023-03-06 15:13:36 +01:00
Sebastiaan van Stijn
d73aba2500
Merge pull request #42264 from thaJeztah/update_the_update
restartmanager:  Remove RestartManager interface, and unused error return
2023-01-10 11:40:36 +01:00
Brian Goff
73e09ddecf
Merge pull request #43787 from thaJeztah/memdb_nits
container: ViewDB: cleanup error-types
2023-01-06 08:09:06 -08:00
Sebastiaan van Stijn
c5d4b6b311
restartmanager: remove RestartManager interface
It only had a single implementation, so we may as well remove the added
complexity of defining it as an interface.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-12-28 09:36:58 +01:00
Sebastiaan van Stijn
efb97da0da
restartmanager: add SetPolicy() to the RestartManager interface
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-12-28 09:36:58 +01:00
Sebastiaan van Stijn
c63ea32a17
pkg/ioutils: TempDir: move to pkg/longpath
This utility wasn't very related to all other utilities in pkg/ioutils.
Moving it to longpath to also make it more clear what it does.

It looks like there's only a single (public) external consumer of this
utility, and only used in a test, and it's not 100% clear if it was
intentional to use our package, of if it was a case of "I actually meant
`io/ioutil.MkdirTemp`" so we could consider skipping the alias.

While moving the package, I also renamed `TempDir` to `MkdirTemp`, which
is the signature it matches in "os" from stdlib.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-12-20 23:24:12 +01:00
Sebastiaan van Stijn
28382c58ec
container: ViewDB: use logrus.WithError()
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-12-08 14:36:22 +01:00
Sebastiaan van Stijn
6549a270e9
container: ViewDB: return typed system errors
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-12-08 14:33:57 +01:00
Sebastiaan van Stijn
94dea2018e
container: ViewDB: GetByPrefix() return typed errors
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-12-08 14:33:57 +01:00
Sebastiaan van Stijn
da4d627e79
container: ViewDB: use errdefs for non-existing containers
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-12-08 14:33:57 +01:00
Sebastiaan van Stijn
8c9a240597
container: use const for null-terminator
Also using `bytes.TrimSuffix()`, which is slightly more readable, and
makes sure we're only stripping the null terminator.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-12-08 14:33:56 +01:00
Sebastiaan van Stijn
fb77973201
pkg/system: move CheckSystemDriveAndRemoveDriveLetter to pkg/archive
This one is a "bit" fuzzy, as it may not be _directly_ related to `archive`,
but it's always used _in combination_ with the archive package, so moving it
there.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-11-29 17:07:48 +01:00
Sebastiaan van Stijn
9f3e5eead5
pkg/system: deprecate DefaultPathEnv, move to oci
This patch:

- Deprecates pkg/system.DefaultPathEnv
- Moves the implementation inside oci
- Adds TODOs to align the default in the Builder with the one used elsewhere

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-11-29 17:02:50 +01:00
Cory Snider
dcd6c1d2e2 container: make path resolution fns Windows-only
The new daemon.containerFSView type covers all the use-cases on Linux
with a much more intuitive API, but is not portable to Windows.
Discourage people from using the old and busted functions in new Linux
code by excluding them entirely from Linux builds.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-10-27 12:52:14 -04:00
Cory Snider
9ce2b30b81 pkg/containerfs: drop ContainerFS type alias
Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-09-23 16:56:53 -04:00
Cory Snider
e332c41e9d pkg/containerfs: alias ContainerFS to string
Drop the constructor and redundant string() type-casts.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-09-23 16:56:52 -04:00
Cory Snider
95824f2b5f pkg/containerfs: simplify ContainerFS type
Iterate towards dropping the type entirely.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-09-23 16:56:49 -04:00
Cory Snider
be4f4644a8 pkg/containerfs: drop Driver abstraction
The Driver abstraction was needed for Linux Containers on Windows,
support for which has since been removed.

There is no direct equivalent to Lchmod() in the standard library so
continue to use the containerd/continuity version.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-09-23 16:25:22 -04:00
Cory Snider
7014c0d65d pkg/containerfs: drop PathDriver abstraction
With LCOW support removed, there is no need to support non-native file
paths any longer.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-09-23 16:25:22 -04:00
Cory Snider
a7c8fdc55b pkg/containerfs: make ResolveScopedPath a free fn
Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-09-23 16:25:22 -04:00
Sebastiaan van Stijn
511a909ae6
container: remove ViewDB and View interfaces, use concrete types
These interfaces were added in aacddda89d, with
no clear motivation, other than "Also hide ViewDB behind an interface".

This patch removes the interface in favor of using a concrete implementation;
There's currently only one implementation of this interface, and if we would
decide to change to an alternative implementation, we could define relevant
interfaces on the receiver side.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-09-21 17:38:45 +02:00
Cory Snider
4bafaa00aa Refactor libcontainerd to minimize c8d RPCs
The containerd client is very chatty at the best of times. Because the
libcontained API is stateless and references containers and processes by
string ID for every method call, the implementation is essentially
forced to use the containerd client in a way which amplifies the number
of redundant RPCs invoked to perform any operation. The libcontainerd
remote implementation has to reload the containerd container, task
and/or process metadata for nearly every operation. This in turn
amplifies the number of context switches between dockerd and containerd
to perform any container operation or handle a containerd event,
increasing the load on the system which could otherwise be allocated to
workloads.

Overhaul the libcontainerd interface to reduce the impedance mismatch
with the containerd client so that the containerd client can be used
more efficiently. Split the API out into container, task and process
interfaces which the consumer is expected to retain so that
libcontainerd can retain state---especially the analogous containerd
client objects---without having to manage any state-store inside the
libcontainerd client.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-08-24 14:59:08 -04:00
Cory Snider
57d2d6ef62 Update container OOMKilled flag immediately
The OOMKilled flag on a container's state has historically behaved
rather unintuitively: it is updated on container exit to reflect whether
or not any process within the container has been OOM-killed during the
preceding run of the container. The OOMKilled flag would be set to true
when the container exits if any process within the container---including
execs---was OOM-killed at any time while the container was running,
whether or not the OOM-kill was the cause of the container exiting. The
flag is "sticky," persisting through the next start of the container;
only being cleared once the container exits without any processes having
been OOM-killed that run.

Alter the behavior of the OOMKilled flag such that it signals whether
any process in the container had been OOM-killed since the most recent
start of the container. Set the flag immediately upon any process being
OOM-killed, and clear it when the container transitions to the "running"
state.

There is an ulterior motive for this change. It reduces the amount of
state the libcontainerd client needs to keep track of and clean up on
container exit. It's one less place the client could leak memory if a
container was to be deleted without going through libcontainerd.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-08-24 14:59:07 -04:00
Paweł Gronowski
a290f5d04c state/Wait: Fix race when reading exit status
Before this change there was a race condition between State.Wait reading
the exit code from State and the State being changed instantly after the
change which ended the State.Wait.

Now, each State.Wait has its own channel which is used to transmit the
desired StateStatus at the time the state transitions to the awaited
one. Wait no longer reads the status by itself so there is no race.

The issue caused the `docker run --restart=always ...' to sometimes exit
with 0 exit code, because the process was already restarted by the time
State.Wait got the chance to read the exit code.

Test run
--------
Before:
```
$ go test -count 1 -run TestCorrectStateWaitResultAfterRestart .
--- FAIL: TestCorrectStateWaitResultAfterRestart (0.00s)
    state_test.go:198: expected exit code 10, got 0
FAIL
FAIL    github.com/docker/docker/container      0.011s
FAIL

```

After:
```
$ go test -count 1 -run TestCorrectStateWaitResultAfterRestart .
ok      github.com/docker/docker/container      0.011s
```

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2022-07-20 09:23:31 +02:00
Sebastiaan van Stijn
52c1a2fae8
gofmt GoDoc comments with go1.19
Older versions of Go don't format comments, so committing this as
a separate commit, so that we can already make these changes before
we upgrade to Go 1.19.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-07-08 19:56:23 +02:00
Djordje Lukic
70dc392bfa
Use hashicorp/go-memdb instead of truncindex
memdb already knows how to search by prefix so there is no need to keep
a separate list of container ids in the truncindex

Benchmarks:

$ go test -benchmem -run=^$ -count 5 -tags linux -bench ^BenchmarkDBGetByPrefix100$ github.com/docker/docker/container
goos: linux
goarch: amd64
pkg: github.com/docker/docker/container
cpu: Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
BenchmarkDBGetByPrefix100-6        16018             73935 ns/op           33888 B/op       1100 allocs/op
BenchmarkDBGetByPrefix100-6        16502             73150 ns/op           33888 B/op       1100 allocs/op
BenchmarkDBGetByPrefix100-6        16218             74014 ns/op           33856 B/op       1100 allocs/op
BenchmarkDBGetByPrefix100-6        15733             73370 ns/op           33792 B/op       1100 allocs/op
BenchmarkDBGetByPrefix100-6        16432             72546 ns/op           33744 B/op       1100 allocs/op
PASS
ok      github.com/docker/docker/container      9.752s

$ go test -benchmem -run=^$ -count 5 -tags linux -bench ^BenchmarkTruncIndexGet100$ github.com/docker/docker/pkg/truncindex
goos: linux
goarch: amd64
pkg: github.com/docker/docker/pkg/truncindex
cpu: Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
BenchmarkTruncIndexGet100-6        16862             73732 ns/op           44776 B/op       1173 allocs/op
BenchmarkTruncIndexGet100-6        16832             73629 ns/op           45184 B/op       1179 allocs/op
BenchmarkTruncIndexGet100-6        17214             73571 ns/op           45160 B/op       1178 allocs/op
BenchmarkTruncIndexGet100-6        16113             71680 ns/op           45360 B/op       1182 allocs/op
BenchmarkTruncIndexGet100-6        16676             71246 ns/op           45056 B/op       1184 allocs/op
PASS
ok      github.com/docker/docker/pkg/truncindex 9.759s

$ go test -benchmem -run=^$ -count 5 -tags linux -bench ^BenchmarkDBGetByPrefix500$ github.com/docker/docker/container
goos: linux
goarch: amd64
pkg: github.com/docker/docker/container
cpu: Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
BenchmarkDBGetByPrefix500-6         1539            753541 ns/op          169381 B/op       5500 allocs/op
BenchmarkDBGetByPrefix500-6         1624            749975 ns/op          169458 B/op       5500 allocs/op
BenchmarkDBGetByPrefix500-6         1635            761222 ns/op          169298 B/op       5500 allocs/op
BenchmarkDBGetByPrefix500-6         1693            727856 ns/op          169297 B/op       5500 allocs/op
BenchmarkDBGetByPrefix500-6         1874            710813 ns/op          169570 B/op       5500 allocs/op
PASS
ok      github.com/docker/docker/container      6.711s

$ go test -benchmem -run=^$ -count 5 -tags linux -bench ^BenchmarkTruncIndexGet500$ github.com/docker/docker/pkg/truncindex
goos: linux
goarch: amd64
pkg: github.com/docker/docker/pkg/truncindex
cpu: Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
BenchmarkTruncIndexGet500-6         1934            780328 ns/op          224073 B/op       5929 allocs/op
BenchmarkTruncIndexGet500-6         1713            713935 ns/op          225011 B/op       5937 allocs/op
BenchmarkTruncIndexGet500-6         1780            702847 ns/op          224090 B/op       5943 allocs/op
BenchmarkTruncIndexGet500-6         1736            711086 ns/op          224027 B/op       5929 allocs/op
BenchmarkTruncIndexGet500-6         2448            508694 ns/op          222322 B/op       5914 allocs/op
PASS
ok      github.com/docker/docker/pkg/truncindex 6.877s

Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
2022-05-20 18:22:21 +02:00
Sebastiaan van Stijn
21df9a04e0
container: StopSignal(): return syscall.Signal
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-05-05 00:53:53 +02:00
Cory Snider
1c129103b4 Bump swarmkit to v2
Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-04-21 17:33:07 -04:00
Tianon Gravi
cf811b1122
Merge pull request #42574 from charlesxsh/fix-deadlock-1
fix potential goroutine leak by making channel non-blocking
2021-12-01 17:35:30 -08:00
Eng Zer Jun
c55a4ac779
refactor: move from io/ioutil to io and os package
The io/ioutil package has been deprecated in Go 1.16. This commit
replaces the existing io/ioutil functions with their new definitions in
io and os packages.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2021-08-27 14:56:57 +08:00
Shihao Xia
6a72e73c1d fix potential goroutine leak by making channel non-blocking
Signed-off-by: Shihao Xia <charlesxsh@hotmail.com>
2021-08-26 12:57:03 -04:00
Sebastiaan van Stijn
686be57d0a
Update to Go 1.17.0, and gofmt with Go 1.17
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-08-24 23:33:27 +02:00
Sebastiaan van Stijn
e53f65a916
pkg/signal: remove DefaultStopSignal const
This const was previously living in pkg/signal, but with that package
being moved to its own module, it didn't make much sense to put docker's
defaults in a generic module.

The const from the "signal" package is currenlty used *both* by the CLI
and the daemon as a default value when creating containers. This put up
some questions:

a. should the default be non-exported, and private to the container
   package? After all, it's a _default_ (so should be used if _NOT_ set).
b. should the client actually setting a default, or instead just omit
   the value, unless specified by the user? having the client set a
   default also means that the daemon cannot change the default value
   because the client (or older clients) will override it.
c. consider defaults from the client and defaults of the daemon to be
   separate things, and create a default const in the CLI.

This patch implements option "a" (option "b" will be done separately,
as it involves the CLI code). This still leaves "c" open as an option,
if the CLI wants to set its own default.

Unfortunately, this change means we'll have to drop the alias for the
deprecated pkg/signal.DefaultStopSignal const, but a comment was left
instead, which can assist consumers of the const to find why it's no
longer there (a search showed the Docker CLI as the only consumer though).

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-08-11 10:31:29 +02:00
Sebastiaan van Stijn
3b316814f9
container: un-export DefaultStopTimeout
It's not used outside of the package itself

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-08-11 10:05:40 +02:00
Sebastiaan van Stijn
13cb04e57c
remove various LCOW bits (container, image, pkg/containerfs)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-07-27 13:36:21 +02:00
Brian Goff
12f1b3ce43
Merge pull request #42616 from thaJeztah/migrate_pkg_signal
replace pkg/signal with moby/sys/signal v0.5.0
2021-07-26 10:47:28 -07:00
Sebastiaan van Stijn
28409ca6c7
replace pkg/signal with moby/sys/signal v0.5.0
This code was moved to the moby/sys repository

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-07-23 09:32:54 +02:00
Sebastiaan van Stijn
300c11c7c9
volume/mounts: remove "containerOS" argument from NewParser (LCOW code)
This changes mounts.NewParser() to create a parser for the current operatingsystem,
instead of one specific to a (possibly non-matching, in case of LCOW) OS.

With the OS-specific handling being removed, the "OS" parameter is also removed
from `daemon.verifyContainerSettings()`, and various other container-related
functions.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-07-02 13:51:55 +02:00
Sebastiaan van Stijn
e047d984dc
Remove LCOW code (step 1)
The LCOW implementation in dockerd has been deprecated in favor of re-implementation
in containerd (in progress). Microsoft started removing the LCOW V1 code from the
build dependencies we use in Microsoft/opengcs (soon to be part of Microsoft/hcshhim),
which means that we need to start removing this code.

This first step removes the lcow graphdriver, the LCOW initialization code, and
some LCOW-related utilities.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-06-03 21:16:21 +02:00
Akihiro Suda
9303376242
Swarm config: use absolute paths for mount destination strings
Needed for runc >= 1.0.0-rc94.

See runc issue 2928.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-05-11 12:46:43 +02:00