beenull/moby

Author	SHA1	Message	Date
huang-jl	da643c0b8a	libcontainerd: change the digest used when restoring In the current implementation of Checkpoint/Restore (C/R) in Docker, the checkpoint is written to the content store. However, when restoring, libcontainerd uses .Digest().Encoded(), which strips the algorithm prefix, leading to an error. Signed-off-by: huang-jl <1046678590@qq.com>	2024-02-27 20:17:31 +08:00
Cory Snider	659d7b190f	libcontainerd: create unstarted tasks Split task creation and start into two separate method calls in the libcontainerd API. Clients now have the opportunity to inspect the freshly-created task and customize its runtime environment before starting execution of the user-specified binary. Signed-off-by: Cory Snider <csnider@mirantis.com>	2024-01-10 13:50:26 -05:00
Cory Snider	29ac09ee9d	Revert "libcontainerd: work around exec start bug in c8d" The workaround is no longer required. The bug has been fixed in stable versions of all supported containerd branches. This reverts commit `fb7ec1555c`. Signed-off-by: Cory Snider <csnider@mirantis.com>	2023-11-06 13:26:44 -05:00
Sebastiaan van Stijn	cff4f20c44	migrate to github.com/containerd/log v0.1.0 The github.com/containerd/containerd/log package was moved to a separate module, which will also be used by upcoming (patch) releases of containerd. This patch moves our own uses of the package to use the new module. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2023-10-11 17:52:23 +02:00
Bjorn Neergaard	fd6dd6935b	vendor: github.com/containerd/containerd v1.7.6 The DeepEqual ignore required in the daemon tests is a bit ugly, but it works given the new protoc output. We also have to ignore lints related to schema1 deprecations; these do not apply as we must continue to support this schema version. Signed-off-by: Bjorn Neergaard <bjorn.neergaard@docker.com>	2023-09-21 14:18:40 -06:00
Sebastiaan van Stijn	96faee9762	libcontainer: client.processEventStream: use locally scoped variables - use local variables and remove some intermediate variables - handle the events inside the switch itself; this makes all the switch branches use the same logic, instead of "some" using a `continue`, and others falling through to have the event handled outside of the switch. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2023-09-17 14:29:54 +02:00
Sebastiaan van Stijn	bd523abd44	remove more direct uses of logrus Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2023-09-15 20:12:27 +02:00
Sebastiaan van Stijn	5e2a1195d7	swap logrus types for their containerd/logs aliases Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2023-08-01 13:02:55 +02:00
Sebastiaan van Stijn	4175a550fd	libcontainerd: format code with gofumpt Formatting the code with https://github.com/mvdan/gofumpt Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2023-06-29 00:31:50 +02:00
Brian Goff	74da6a6363	Switch all logging to use containerd log pkg This unifies our logging and allows us to propagate logging and trace contexts together. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2023-06-24 00:23:44 +00:00
Cory Snider	dea870f4ea	daemon: stop setting container resources to zero Many of the fields in the LinuxResources struct are pointers to scalars for some reason, presumably to differentiate between set-to-zero and unset when unmarshaling from JSON, despite zero being outside the acceptable range for the corresponding kernel tunables. When creating the OCI spec for a container, the daemon sets the container's OCI spec CPUShares and BlkioWeight parameters to zero when the corresponding Docker container configuration values are zero, signifying unset, despite the minimum acceptable value for CPUShares being two, and BlkioWeight ten. This has gone unnoticed because runC does not distinguish set-to-zero from unset, as it also uses zero internally to represent unset for those fields. However, kata-containers v3.2.0-alpha.3 tries to apply the explicit-zero resource parameters to the container, exactly as instructed, and fails loudly. The OCI runtime-spec is silent on how the runtime should handle the case when those parameters are explicitly set to out-of-range values and kata's behaviour is not unreasonable, so the daemon must be in the wrong. Translate unset values in the Docker container's resources HostConfig to omit the corresponding fields in the container's OCI spec when starting and updating a container in order to maximize compatibility with runtimes. Signed-off-by: Cory Snider <csnider@mirantis.com>	2023-06-06 12:13:05 -04:00
Cory Snider	fb7ec1555c	libcontainerd: work around exec start bug in c8d It turns out that the unnecessary serialization removed in `b75246202a` happened to work around a bug in containerd. When many exec processes are started concurrently in the same containerd task, it takes seconds to minutes for them all to start. Add the workaround back in, only deliberately this time. Signed-off-by: Cory Snider <csnider@mirantis.com>	2023-05-25 16:00:29 -04:00
Jeyanthinath Muthuram	307b09e7eb	fixing consistent aliases for OCI spec imports Signed-off-by: Jeyanthinath Muthuram <jeyanthinath10@gmail.com>	2023-05-08 15:27:52 +05:30
Sebastiaan van Stijn	81e62af94a	use consistent alias for containerd's errdefs package The signatures of functions in containerd's errdefs packages are very similar to those in our own, and it's easy to accidentally use the wrong package. This patch uses a consistent alias for all occurrences of this import. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2023-04-08 19:30:33 +02:00
Cory Snider	36935bd869	libcontainerd: close stdin sync if possible Closing stdin of a container or exec (a.k.a.: task or process) has been somewhat broken ever since support for ContainerD 1.0 was introduced back in Docker v17.11: the error returned from the CloseIO() call was effectively ignored due to it being assigned to a local variable which shadowed the intended variable. Serendipitously, that oversight prevented a data race. In my recent refactor of libcontainerd, I corrected the variable shadowing issue and introduced the aforementioned data race in the process. Avoid deadlocking when closing stdin without swallowing errors or introducing data races by calling CloseIO() synchronously if the process handle is available, falling back to an asynchronous close-and-log strategy otherwise. This solution is inelegant and complex, but looks to be the best that could be done without changing the libcontainerd API. Signed-off-by: Cory Snider <csnider@mirantis.com>	2023-04-03 15:25:16 -04:00
Paweł Gronowski	a8f5c524a0	libcontainerd: Upgrade to typeurl/v2 In preparation for containerd v1.7 which migrates off gogo/protobuf and changes the protobuf Any type to one that's not supported by our vendored version of typeurl. This fixes a compile error on usages of `typeurl.UnmarshalAny` when upgrading to containerd v1.7. Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>	2023-03-08 11:26:32 +01:00
Paweł Gronowski	47e9caede7	libcontainerd/client: Rename `cp` to `checkpoint` Make the variable name longer to give a hint about its broader scope. Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>	2023-03-01 15:07:58 +01:00
Paweł Gronowski	0c751f904f	libcontainerd/client: Fix checkpoint not being set `cp` variable is used later to populate the `info.Checkpoint` field option used by Task creation. Previous changes mistakenly changed assignment of the `cp` variable to declaration of a new variable that's scoped only to the if block. Restore the old assignment behavior. Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>	2023-03-01 15:07:42 +01:00
Cory Snider	843fcc96f7	libc8d/remote: name task fifos after task ID The ID of the task is known at the time that the FIFOs need to be created (it's passed into the IO-creator callback, and is also the same as the container ID) so there is no need to hardcode it to "init". Name the FIFOs after the task ID to be consistent with the FIFO names of exec'ed processes. Delete the now-unused InitProcessName constant so it can never again be used in place of a task/process ID. Signed-off-by: Cory Snider <csnider@mirantis.com>	2023-01-31 17:02:43 -05:00
Sebastiaan van Stijn	200edf8030	libcontainerd/remote: remove stray import comment Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2022-12-08 13:27:50 +01:00
Cory Snider	1bef9e3fbf	Fix containerd task deletion after failed start Deleting a containerd task whose status is Created fails with a "precondition failed" error. This is because (aside from Windows) a process is spawned when the task is created, and deleting the task while the process is running would leak the process if it was allowed. libcontainerd and the containerd plugin executor mistakenly try to clean up from a failed start by deleting the created task, which will always fail with the aforementioned error. Change them to pass the `WithProcessKill` delete option so the cleanup has a chance to succeed. Signed-off-by: Cory Snider <csnider@mirantis.com>	2022-11-02 13:48:13 -04:00
Cory Snider	4bafaa00aa	Refactor libcontainerd to minimize c8d RPCs The containerd client is very chatty at the best of times. Because the libcontainerd API is stateless and references containers and processes by string ID for every method call, the implementation is essentially forced to use the containerd client in a way which amplifies the number of redundant RPCs invoked to perform any operation. The libcontainerd remote implementation has to reload the containerd container, task and/or process metadata for nearly every operation. This in turn amplifies the number of context switches between dockerd and containerd to perform any container operation or handle a containerd event, increasing the load on the system which could otherwise be allocated to workloads. Overhaul the libcontainerd interface to reduce the impedance mismatch with the containerd client so that the containerd client can be used more efficiently. Split the API out into container, task and process interfaces which the consumer is expected to retain so that libcontainerd can retain state---especially the analogous containerd client objects---without having to manage any state-store inside the libcontainerd client. Signed-off-by: Cory Snider <csnider@mirantis.com>	2022-08-24 14:59:08 -04:00
Cory Snider	57d2d6ef62	Update container OOMKilled flag immediately The OOMKilled flag on a container's state has historically behaved rather unintuitively: it is updated on container exit to reflect whether or not any process within the container has been OOM-killed during the preceding run of the container. The OOMKilled flag would be set to true when the container exits if any process within the container---including execs---was OOM-killed at any time while the container was running, whether or not the OOM-kill was the cause of the container exiting. The flag is "sticky," persisting through the next start of the container; it is only cleared once the container exits without any process having been OOM-killed during that run. Alter the behavior of the OOMKilled flag such that it signals whether any process in the container had been OOM-killed since the most recent start of the container. Set the flag immediately upon any process being OOM-killed, and clear it when the container transitions to the "running" state. There is an ulterior motive for this change. It reduces the amount of state the libcontainerd client needs to keep track of and clean up on container exit. It's one less place the client could leak memory if a container was to be deleted without going through libcontainerd. Signed-off-by: Cory Snider <csnider@mirantis.com>	2022-08-24 14:59:07 -04:00
Sebastiaan van Stijn	4f08346686	fix formatting of "nolint" tags for go1.19 The correct formatting for machine-readable comments is: //<some alphanumeric identifier>:<options>[,<option>...][ // comment] Which basically means: - MUST NOT have a space before `<identifier>` (e.g. `nolint`) - Identifier MUST be alphanumeric - MUST be followed by a colon - MUST be followed by at least one `<option>` - Optionally additional `<options>` (comma-separated) - Optionally followed by a comment Any other format will not be considered a machine-readable comment by `gofmt`, and thus formatted as a regular comment. Note that this also means that a `//nolint` (without anything after it) is considered invalid, same for `//#nosec` (starts with a `#`). Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2022-07-13 22:31:53 +02:00
Akihiro Suda	658a4b0fec	libcontainerd: remove support for runtime v1 API Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2022-06-05 18:41:44 +09:00
Sebastiaan van Stijn	2ec2b65e45	libcontainerd: SignalProcess(): accept syscall.Signal This helps reduce some type juggling and conversions further up the stack. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2022-05-05 00:53:49 +02:00
Sebastiaan van Stijn	d13997b4ba	gosec: G601: Implicit memory aliasing in for loop plugin/v2/plugin.go:141:50: G601: Implicit memory aliasing in for loop. (gosec) updateSettingsEnv(&p.PluginObj.Settings.Env, &s) ^ libcontainerd/remote/client.go:572:13: G601: Implicit memory aliasing in for loop. (gosec) cpDesc = &m ^ distribution/push_v2.go:400:34: G601: Implicit memory aliasing in for loop. (gosec) (metadata.CheckV2MetadataHMAC(&mountCandidate, pd.hmacKey) || ^ builder/dockerfile/builder.go:261:84: G601: Implicit memory aliasing in for loop. (gosec) currentCommandIndex = printCommand(b.Stdout, currentCommandIndex, totalCommands, &meta) ^ builder/dockerfile/builder.go:278:46: G601: Implicit memory aliasing in for loop. (gosec) if err := initializeStage(dispatchRequest, &stage); err != nil { ^ daemon/container.go:283:40: G601: Implicit memory aliasing in for loop. (gosec) if err := parser.ValidateMountConfig(&cfg); err != nil { ^ Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2021-06-10 13:03:29 +02:00
Sebastiaan van Stijn	08ddbfbdac	libcontainerd: remove LCOW bits Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2021-06-09 22:05:10 +02:00
Brian Goff	4b981436fe	Fixup libnetwork lint errors Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2021-06-01 23:48:32 +00:00
Sebastiaan van Stijn	0f32beb4f8	libcontainerd: remove unused consts Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2021-03-19 21:52:23 +01:00
Cam	80a5df9c49	Added container ID to containerd task delete event messages Signed-off-by: Cam <gh@sparr.email>	2020-10-30 20:58:57 -07:00
Brian Goff	f14aea63c9	"Fix" checkpoint on v2 runtime Checkpoint/Restore is horribly broken all around. But on the, now default, v2 runtime it's even more broken. This at least makes checkpoint equally broken on both runtimes. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2020-10-12 22:35:37 +00:00
Brian Goff	906007f6c1	libcontainerd: use cancellable context for events The event subscriber can only be cancelled by cancelling the context. In the case where we have to restart event processing we are never cancelling the old subscription. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2020-08-12 17:09:21 +00:00
Brian Goff	60d7265803	Use IsServing to determine if c8d client is ready Instead of sleeping an arbitrary amount of time, use the client to tell us when it's ready so we can start processing events sooner. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2020-08-12 17:09:21 +00:00
Brian Goff	f63f73a4a8	Configure shims from runtime config In dockerd we already have a concept of a "runtime", which specifies the OCI runtime to use (e.g. runc). This PR extends that config to add containerd shim configuration. This option is only exposed within the daemon itself (cannot be configured in daemon.json). This is due to issues in supporting unknown shims which will require more design work. What this change allows us to do is keep all the runtime config in one place. So the default "runc" runtime will just have its already-existing shim config codified within the runtime config alone. I've also added 2 more "stock" runtimes which are basically runc+shimv1 and runc+shimv2. These new runtime configurations are: - io.containerd.runtime.v1.linux - runc + v1 shim using the V1 shim API - io.containerd.runc.v2 - runc + shim v2 These names coincide with the actual names of the containerd shims. This allows the user to essentially control what shim is going to be used by either specifying these as a `--runtime` on container create or by setting `--default-runtime` on the daemon. For custom/user-specified runtimes, the default shim config (currently shim v1) is used. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2020-07-13 14:18:02 -07:00
Akihiro Suda	612343618d	cgroup2: use shim V2 * Requires containerd binaries from containerd/containerd#3799 . Metrics are not implemented yet. * Works with crun v0.10.4, but `--security-opt seccomp=unconfined` is needed unless using master version of libseccomp ( containers/crun#156, seccomp/libseccomp#177 ) * Doesn't work with master runc yet * Resource limitations are not implemented yet Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2020-01-01 02:58:40 +09:00
Sebastiaan van Stijn	5bb4f4818b	libcontainerd: move hcsshim import to windows-only file This reduces the dependency-graph when building packages for Linux only. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2019-12-10 10:58:14 +01:00
Akihiro Suda	de5a67156b	Merge pull request #39082 from ehazlett/opts-for-create Add NewContainerOpts to libcontainerd.Create	2019-10-04 08:20:47 +09:00
Evan Hazlett	35ac4be5d5	add NewContainerOpts to libcontainerd.Create Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>	2019-10-03 11:45:41 -04:00
Sebastiaan van Stijn	07ff4f1de8	goimports: fix imports Format the source according to latest goimports. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com> Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2019-09-18 12:56:54 +02:00
Brian Goff	1acaf2aabe	Sleep before restarting event processing This prevents restarting event processing in a tight loop. You can see this with the following steps: ```terminal $ containerd & $ dockerd --containerd=/run/containerd/containerd.sock & $ pkill -9 containerd ``` At this point you will be spammed with logs such as: ``` ERRO[2019-07-12T22:29:37.318761400Z] failed to get event error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=plugins.moby ``` Without this change you can quickly end up with gigabytes of log data. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2019-07-12 15:42:19 -07:00
Michael Crosby	b5f28865ef	Handle blocked I/O of exec'd processes This is the second part to https://github.com/containerd/containerd/pull/3361 and will help prevent process delete from blocking forever when the process exits but the I/O was inherited by a subprocess that lives on. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2019-06-21 12:02:15 -04:00
Sebastiaan van Stijn	539e72f75b	Fix typo retreive -> retrieve Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2019-06-04 17:33:04 +02:00
Michael Crosby	b9b5dc37e3	Remove inmemory container map Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2019-04-05 15:48:07 -04:00
Michael Crosby	adb15c2899	Export WithBundle code Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2019-04-05 08:41:48 -04:00
Michael Crosby	45e328b0ac	Remove libcontainerd status type Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2019-04-04 15:17:13 -04:00
John Howard	2f27332836	Windows: Implement docker top for containerd Signed-off-by: John Howard <jhoward@microsoft.com>	2019-03-12 18:41:55 -07:00
John Howard	85ad4b16c1	Windows: Experimental: Allow containerd for runtime Signed-off-by: John Howard <jhoward@microsoft.com> This is the first step in refactoring moby (dockerd) to use containerd on Windows. Similar to the current model in Linux, this adds the option to enable it for runtime. It does not switch the graphdriver to containerd snapshotters. - Refactors libcontainerd to a series of subpackages so that either a "local" containerd (1) or a "remote" (2) containerd can be loaded as opposed to conditional compile as "local" for Windows and "remote" for Linux. - Updates libcontainerd such that Windows has an option to allow the use of a "remote" containerd. Here, it communicates over a named pipe using GRPC. This is currently guarded behind the experimental flag, an environment variable, and the providing of a pipename to connect to containerd. - Infrastructure pieces such as under pkg/system to have helper functions for determining whether containerd is being used. (1) "local" containerd is what the daemon on Windows has used since inception. It's not really containerd at all - it's simply local invocation of HCS APIs directly in-process from the daemon through the Microsoft/hcsshim library. (2) "remote" containerd is what docker on Linux uses for its runtime. It means that there is a separate containerd service running, and docker communicates over GRPC to it. To try this out, you will need to start with something like the following: Window 1: containerd --log-level debug Window 2: $env:DOCKER_WINDOWS_CONTAINERD=1 dockerd --experimental -D --containerd \\.\pipe\containerd-containerd You will need the following binary from github.com/containerd/containerd in your path: - containerd.exe You will need the following binaries from github.com/Microsoft/hcsshim in your path: - runhcs.exe - containerd-shim-runhcs-v1.exe For LCOW, it will require an initrd.img and kernel in `C:\Program Files\Linux Containers`. This is no different to the current requirements. However, you may need updated binaries, particularly initrd.img built from Microsoft/opengcs as, at the time of writing, Linuxkit binaries are somewhat out of date. Note that containerd and hcsshim for HCS v2 APIs do not yet support all the functionality needed for docker. This will come in time - this is a baby (although large) step to migrating Docker on Windows to containerd. Note that the HCS v2 APIs are only called on RS5+ builds. RS1..RS4 will still use HCS v1 APIs as the v2 APIs were not fully developed enough on these builds to be usable. This abstraction is done in HCSShim. (Referring specifically to runtime) Note the LCOW graphdriver still uses HCS v1 APIs regardless. Note also that this does not migrate docker to use containerd snapshotters rather than graphdrivers. This needs to be done in conjunction with Linux also doing the same switch.	2019-03-12 18:41:55 -07:00

48 commits
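Commit da643c0b8a explains that the restore path passed `.Digest().Encoded()`, which drops the algorithm prefix. The sketch below is a stdlib-only illustration of why that breaks a lookup keyed by the full digest. Its assumptions are stated up front. `fullDigest` and `encodedOnly` are hypothetical helpers that only mimic the `"<algorithm>:<hex>"` form and the `Encoded()` behaviour. `lookup` is a hypothetical stand-in for a content-store read and is not the containerd API.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// fullDigest returns a digest in the "<algorithm>:<hex>" form (hypothetical helper).
func fullDigest(data []byte) string {
	sum := sha256.Sum256(data)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// encodedOnly mimics Digest().Encoded(): it drops the "sha256:" prefix and
// keeps only the hex portion (hypothetical helper).
func encodedOnly(d string) string {
	_, hexPart, _ := strings.Cut(d, ":")
	return hexPart
}

// lookup is a hypothetical content-store read that, like digest parsing,
// rejects references without an algorithm prefix.
func lookup(store map[string][]byte, ref string) ([]byte, error) {
	if alg, _, ok := strings.Cut(ref, ":"); !ok || alg == "" {
		return nil, errors.New("invalid digest: missing algorithm prefix")
	}
	blob, found := store[ref]
	if !found {
		return nil, fmt.Errorf("%s: not found", ref)
	}
	return blob, nil
}

func main() {
	d := fullDigest([]byte("checkpoint data"))
	store := map[string][]byte{d: []byte("checkpoint data")}

	_, err := lookup(store, encodedOnly(d)) // the old restore behaviour
	fmt.Println("encoded only:", err)

	blob, err := lookup(store, d) // passing the full digest succeeds
	fmt.Println("full digest:", string(blob), err)
}
```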
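Commit dea870f4ea translates "unset" (zero) Docker resource values into omitted OCI-spec fields, rather than explicit zeros. A minimal sketch of that translation follows. It assumes a generic helper, `omitZero`, and a trimmed-down struct, `linuxCPU`. Both names are hypothetical and only mirror the pointer-typed fields the commit describes; they are not the daemon's actual code.

```go
package main

import "fmt"

// linuxCPU is a hypothetical, trimmed-down mirror of an OCI-spec struct whose
// tunables are pointers, so "unset" (nil) differs from an explicit zero.
type linuxCPU struct {
	Shares *uint64
}

// omitZero treats the zero value as "unset" and returns nil for it, so the
// field is omitted from the spec instead of being sent as an explicit 0.
func omitZero[T comparable](v T) *T {
	var zero T
	if v == zero {
		return nil
	}
	return &v
}

func main() {
	unset := linuxCPU{Shares: omitZero(uint64(0))}
	set := linuxCPU{Shares: omitZero(uint64(1024))}
	fmt.Println(unset.Shares == nil, *set.Shares) // true 1024
}
```

A runtime such as runC ignores a nil field in the same way it ignored an explicit zero. A stricter runtime never receives the out-of-range value at all.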
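Commit 0c751f904f fixes a bug where an assignment to `cp` was changed into a declaration, so the value never left the `if` block. The sketch below reproduces that pattern in isolation. The type `checkpointInfo` and both functions are hypothetical and are not libcontainerd code.

```go
package main

import "fmt"

type checkpointInfo struct{ id string }

// buggyLoad uses := inside the if block, which declares a new checkpoint
// variable scoped to that block. The outer variable is never assigned.
func buggyLoad(want bool) *checkpointInfo {
	var checkpoint *checkpointInfo
	if want {
		checkpoint := &checkpointInfo{id: "cp1"}
		_ = checkpoint.id // the inner variable is used, so this compiles
	}
	return checkpoint
}

// fixedLoad uses plain assignment (=), so the outer variable is populated.
func fixedLoad(want bool) *checkpointInfo {
	var checkpoint *checkpointInfo
	if want {
		checkpoint = &checkpointInfo{id: "cp1"}
	}
	return checkpoint
}

func main() {
	fmt.Println(buggyLoad(true) == nil) // true: the value was lost
	fmt.Println(fixedLoad(true).id)     // cp1
}
```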
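The formatting rules listed in commit 4f08346686 can be encoded as a single regular expression. The checker below is a hypothetical sketch of those rules as written in the commit message. It is not how `gofmt` itself recognizes directives, and the option character set `[^,\s]` is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

// directiveRE encodes the listed rules: no space after "//", an alphanumeric
// identifier, a colon, one or more comma-separated options, and an optional
// trailing " // comment".
var directiveRE = regexp.MustCompile(`^//[A-Za-z0-9]+:[^,\s]+(,[^,\s]+)*( //.*)?$`)

// isMachineReadable reports whether comment follows the directive format.
func isMachineReadable(comment string) bool {
	return directiveRE.MatchString(comment)
}

func main() {
	for _, c := range []string{
		"//nolint:gosec",
		"//nolint:gosec,errcheck // G601 false positive",
		"// nolint:gosec", // space before identifier
		"//nolint",        // no options
		"//#nosec",        // identifier starts with '#'
	} {
		fmt.Printf("%-50q %v\n", c, isMachineReadable(c))
	}
}
```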
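Commit d13997b4ba addresses gosec's G601 warning, which fires on code like `cpDesc = &m` where `m` is a range variable. Before Go 1.22, that variable was reused on every iteration, so every stored pointer aliased the same memory. The sketch below shows one common fix, indexing into the slice. The `setting` type and `addrsOf` function are hypothetical. Copying first (`s := s`) is another accepted fix.

```go
package main

import "fmt"

type setting struct{ name string }

// addrsOf returns a pointer to each element by indexing into the slice, so
// each pointer refers to a distinct element rather than to a loop variable.
func addrsOf(settings []setting) []*setting {
	ptrs := make([]*setting, 0, len(settings))
	for i := range settings {
		ptrs = append(ptrs, &settings[i])
	}
	return ptrs
}

func main() {
	settings := []setting{{"a"}, {"b"}, {"c"}}
	for _, p := range addrsOf(settings) {
		fmt.Print(p.name, " ")
	}
	fmt.Println() // prints: a b c
}
```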