beenull/moby

Author	SHA1	Message	Date
Sebastiaan van Stijn	bf1fb97575	daemon: Daemon.containerStart(): add comment to clarify error-type Any error that occurs while creating the spec, even if it's the result of an invalid container config, must be considered a System error (internal server error), as it's not an error with the request to start the container. Invalid configuration in the config itself must be validated when creating the container (creating its config), but some errors are dependent on the current state, for example when starting a container that shares a namespace with another container, and that container is not running (or missing). Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2023-08-11 14:47:22 +02:00
Brian Goff	74da6a6363	Switch all logging to use containerd log pkg This unifies our logging and allows us to propagate logging and trace contexts together. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2023-06-24 00:23:44 +00:00
Djordje Lukic	32d58144fd	c8d: Use reference counting while mounting a snapshot Some snapshotters (like overlayfs or zfs) can't mount the same directories twice. For example, if the same directory is used as an upper directory in two mounts, the kernel will output this warning: "overlayfs: upperdir is in-use as upperdir/workdir of another mount, accessing files from both mounts will result in undefined behavior." And indeed, accessing the files from both mounts results in a "No such file or directory" error. This change introduces reference counts for the mounts: if a directory is already mounted, the mount interface only increments the mount counter and returns the mount target, ensuring that the filesystem doesn't end up in an undefined state. Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>	2023-06-07 15:50:01 +02:00
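The reference-counting idea in the commit above can be sketched as follows. This is a minimal, hypothetical illustration and not moby's implementation. The `refCountedMounter` type and its injected `mount`/`unmount` callbacks are assumptions, chosen so the sketch runs without touching a real filesystem. The first `Mount` of a key performs the mount, later calls only bump a counter, and only the last `Unmount` actually unmounts.

```go
package main

import (
	"fmt"
	"sync"
)

// refCountedMounter (hypothetical) mounts a snapshot on first use and
// unmounts it only when the last user releases it.
type refCountedMounter struct {
	mu      sync.Mutex
	counts  map[string]int    // snapshot key -> number of active users
	targets map[string]string // snapshot key -> mount target
	mount   func(key string) (string, error)
	unmount func(target string) error
}

func newRefCountedMounter(mount func(string) (string, error), unmount func(string) error) *refCountedMounter {
	return &refCountedMounter{
		counts:  map[string]int{},
		targets: map[string]string{},
		mount:   mount,
		unmount: unmount,
	}
}

// Mount returns the existing target if key is already mounted; otherwise it mounts it.
func (m *refCountedMounter) Mount(key string) (string, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if n := m.counts[key]; n > 0 {
		m.counts[key] = n + 1
		return m.targets[key], nil
	}
	target, err := m.mount(key)
	if err != nil {
		return "", err
	}
	m.counts[key] = 1
	m.targets[key] = target
	return target, nil
}

// Unmount decrements the counter and only unmounts when it drops to zero.
func (m *refCountedMounter) Unmount(key string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	n, ok := m.counts[key]
	if !ok {
		return fmt.Errorf("%s is not mounted", key)
	}
	if n > 1 {
		m.counts[key] = n - 1
		return nil
	}
	if err := m.unmount(m.targets[key]); err != nil {
		return err
	}
	delete(m.counts, key)
	delete(m.targets, key)
	return nil
}

func main() {
	mounts, unmounts := 0, 0
	m := newRefCountedMounter(
		func(key string) (string, error) { mounts++; return "/mnt/" + key, nil },
		func(string) error { unmounts++; return nil },
	)
	a, _ := m.Mount("snap1")
	b, _ := m.Mount("snap1")
	fmt.Println(a == b, mounts) // true 1
	_ = m.Unmount("snap1")
	fmt.Println(unmounts) // 0: one user still holds the mount
	_ = m.Unmount("snap1")
	fmt.Println(unmounts) // 1: the last user released it
}
```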
Cory Snider	d222bf097c	daemon: reload runtimes w/o breaking containers The existing runtimes reload logic went to great lengths to replace the directory containing runtime wrapper scripts as atomically as possible within the limitations of the Linux filesystem ABI. Trouble is, atomically swapping the wrapper scripts directory solves the wrong problem! The runtime configuration is "locked in" when a container is started, including the path to the runC binary. If a container is started with a runtime which requires a daemon-managed wrapper script and then the daemon is reloaded with a config which no longer requires the wrapper script (i.e. some args -> no args, or the runtime is dropped from the config), that container would become unmanageable. Any attempts to stop, exec or otherwise perform lifecycle management operations on the container are likely to fail due to the wrapper script no longer existing at its original path. Atomically swapping the wrapper scripts is also incompatible with the read-copy-update paradigm for reloading configuration. A handler in the daemon could retain a reference to the pre-reload configuration for an indeterminate amount of time after the daemon configuration has been reloaded and updated. It is possible for the daemon to attempt to start a container using a deleted wrapper script if a request to run a container races a reload. Solve the problem of deleting referenced wrapper scripts by ensuring that all wrapper scripts are immutable for the lifetime of the daemon process. Any given runtime wrapper script must always exist with the same contents, no matter how many times the daemon config is reloaded, or what changes are made to the config. This is accomplished by using everyone's favourite design pattern: content-addressable storage. Each wrapper script file name is suffixed with the SHA-256 digest of its contents to (probabilistically) guarantee immutability without needing any concurrency control. 
Stale runtime wrapper scripts are only cleaned up on the next daemon restart. Split the derived runtimes configuration from the user-supplied configuration to have a place to store derived state without mutating the user-supplied configuration or exposing daemon internals in API struct types. Hold the derived state and the user-supplied configuration in a single struct value so that they can be updated as an atomic unit. Signed-off-by: Cory Snider <csnider@mirantis.com>	2023-06-01 14:45:25 -04:00
Cory Snider	0b592467d9	daemon: read-copy-update the daemon config Ensure data-race-free access to the daemon configuration without locking by mutating a deep copy of the config and atomically storing a pointer to the copy into the daemon-wide configStore value. Any operations which need to read from the daemon config must capture the configStore value only once and pass it around to guarantee a consistent view of the config. Signed-off-by: Cory Snider <csnider@mirantis.com>	2023-06-01 14:45:24 -04:00
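A minimal read-copy-update sketch of the pattern described above follows. It assumes Go 1.19+ for `atomic.Pointer`. The `Config` and `Daemon` types here are hypothetical stand-ins, not moby's structs. Readers never lock. Writers deep-copy the current value, mutate the copy and atomically publish it. Serializing reloads with a mutex is an assumption of this sketch; it prevents two concurrent reloads from losing each other's updates.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Config is a hypothetical stand-in for the daemon configuration.
type Config struct {
	Debug    bool
	Runtimes map[string]string
}

// clone returns a deep copy, so a writer never mutates a value a reader may hold.
func (c *Config) clone() *Config {
	cp := *c
	cp.Runtimes = make(map[string]string, len(c.Runtimes))
	for k, v := range c.Runtimes {
		cp.Runtimes[k] = v
	}
	return &cp
}

// Daemon publishes the current config through an atomic pointer.
type Daemon struct {
	reloadMu    sync.Mutex // serializes writers only (assumption of this sketch)
	configStore atomic.Pointer[Config]
}

// config captures the current snapshot. An operation should call it once and
// pass the result around to get a consistent view.
func (d *Daemon) config() *Config { return d.configStore.Load() }

// reload performs read-copy-update: copy, mutate the copy, publish atomically.
func (d *Daemon) reload(update func(*Config)) {
	d.reloadMu.Lock()
	defer d.reloadMu.Unlock()
	next := d.config().clone()
	update(next)
	d.configStore.Store(next)
}

func main() {
	d := &Daemon{}
	d.configStore.Store(&Config{Runtimes: map[string]string{"runc": "runc"}})

	cfg := d.config() // snapshot held by an in-flight operation
	d.reload(func(c *Config) {
		c.Debug = true
		c.Runtimes["crun"] = "/usr/bin/crun"
	})

	fmt.Println(cfg.Debug, len(cfg.Runtimes))               // false 1
	fmt.Println(d.config().Debug, len(d.config().Runtimes)) // true 2
}
```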
Djordje Lukic	0137446248	Implement run using the containerd snapshotter Signed-off-by: Djordje Lukic <djordje.lukic@docker.com> c8d/daemon: Mount root and fill BaseFS This fixes things that were broken due to nil BaseFS like `docker cp` and running a container with workdir override. This is more of a temporary hack than a real solution. The correct fix would be to refactor the code to make BaseFS and LayerRW an implementation detail of the old image store implementation and use the temporary mounts for the c8d implementation instead. That requires more work though. Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com> daemon/images: Don't unset BaseFS Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>	2023-02-06 18:21:50 +01:00
Sebastiaan van Stijn	42f1be8030	daemon: translateContainerdStartErr(): rename to setExitCodeFromError() This should hopefully make it slightly clearer what it does. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2022-12-28 09:27:42 +01:00
Sebastiaan van Stijn	a756fa60ef	daemon: translateContainerdStartErr(): use const/enum for exit-statuses Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2022-12-28 09:27:41 +01:00
Sebastiaan van Stijn	2cf09c5446	daemon: translateContainerdStartErr(): remove unused cmd argument This argument has not been used since commit `225e046d9d` Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2022-12-28 09:27:41 +01:00
Sebastiaan van Stijn	087369aeeb	daemon: containerStart(): rename return variable Rename the variable to make it more visible where it's used, as there were other "err" variables masking it. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2022-12-28 09:27:37 +01:00
Cory Snider	0141c6db81	daemon: don't checkpoint container until registered (Container).CheckpointTo() upserts a snapshot of the container to the daemon's in-memory ViewDB and also persists the snapshot to disk. It does not register the live container object with the daemon's container store, however. The ViewDB and container store are used as the source of truth for different operations, so having a container registered in one but not the other can result in inconsistencies. In particular, the List Containers API uses the ViewDB as its source of truth and the Container Inspect API uses the container store. The (Daemon).setHostConfig() method is called fairly early in the process of creating a container, long before the container is registered in the daemon's container store. Due to a rogue CheckpointTo() call inside setHostConfig(), there is a window of time where a container can be included in a List Containers API response but "not exist" according to the Container Inspect API and similar endpoints which operate on a particular container. Remove the rogue call so that the caller has full control over when the container is checkpointed and update callers to checkpoint explicitly. No changes to (Daemon).create() are needed as it checkpoints the fully-created container via (Daemon).Register(). Fixes #44512. Signed-off-by: Cory Snider <csnider@mirantis.com>	2022-12-12 15:53:49 -05:00
Paweł Gronowski	a181a825c8	daemon/start: Revert passing ctx to ctr.Start This caused integration tests to time out in CI. Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>	2022-11-03 12:22:44 +01:00
Nicolas De Loof	def549c8f6	imageservice: Add context to various methods Co-authored-by: Paweł Gronowski <pawel.gronowski@docker.com> Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>	2022-11-03 12:22:40 +01:00
Cory Snider	95824f2b5f	pkg/containerfs: simplify ContainerFS type Iterate towards dropping the type entirely. Signed-off-by: Cory Snider <csnider@mirantis.com>	2022-09-23 16:56:49 -04:00
Cory Snider	6a2f385aea	Share logic to create-or-replace a container The existing logic to handle container ID conflicts when attempting to create a plugin container is not nearly as robust as the implementation in daemon for user containers. Extract and refine the logic from daemon and use it in the plugin executor. Signed-off-by: Cory Snider <csnider@mirantis.com>	2022-08-24 14:59:08 -04:00
Cory Snider	4bafaa00aa	Refactor libcontainerd to minimize c8d RPCs The containerd client is very chatty at the best of times. Because the libcontainerd API is stateless and references containers and processes by string ID for every method call, the implementation is essentially forced to use the containerd client in a way which amplifies the number of redundant RPCs invoked to perform any operation. The libcontainerd remote implementation has to reload the containerd container, task and/or process metadata for nearly every operation. This in turn amplifies the number of context switches between dockerd and containerd to perform any container operation or handle a containerd event, increasing the load on the system which could otherwise be allocated to workloads. Overhaul the libcontainerd interface to reduce the impedance mismatch with the containerd client so that the containerd client can be used more efficiently. Split the API out into container, task and process interfaces which the consumer is expected to retain, so that libcontainerd can retain state (especially the analogous containerd client objects) without having to manage any state-store inside the libcontainerd client. Signed-off-by: Cory Snider <csnider@mirantis.com>	2022-08-24 14:59:08 -04:00
Paweł Gronowski	498803bec9	daemon/restart: Don't mutate AutoRemove when restarting This caused a race condition where AutoRemove could be restored before the container was considered for restart, making autoremove containers impossible to restart.
```
$ make DOCKER_GRAPHDRIVER=vfs BIND_DIR=. TEST_FILTER='TestContainerWithAutoRemoveCanBeRestarted' TESTFLAGS='-test.count 1' test-integration
...
=== RUN TestContainerWithAutoRemoveCanBeRestarted
=== RUN TestContainerWithAutoRemoveCanBeRestarted/kill
=== RUN TestContainerWithAutoRemoveCanBeRestarted/stop
--- PASS: TestContainerWithAutoRemoveCanBeRestarted (1.61s)
--- PASS: TestContainerWithAutoRemoveCanBeRestarted/kill (0.70s)
--- PASS: TestContainerWithAutoRemoveCanBeRestarted/stop (0.86s)
PASS
DONE 3 tests in 3.062s
```
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>	2022-07-20 09:23:31 +02:00
Sebastiaan van Stijn	300c11c7c9	volume/mounts: remove "containerOS" argument from NewParser (LCOW code) This changes mounts.NewParser() to create a parser for the current operating system, instead of one specific to a (possibly non-matching, in case of LCOW) OS. With the OS-specific handling being removed, the "OS" parameter is also removed from `daemon.verifyContainerSettings()`, and various other container-related functions. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2021-07-02 13:51:55 +02:00
Sebastiaan van Stijn	dc7cbb9b33	remove layerstore indexing by OS (used for LCOW) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2021-06-10 17:49:11 +02:00
Brian Goff	51f5b1279d	Don't set image on containerd container. We aren't using containerd's image store, so we shouldn't be setting this value. This fixes container checkpoints, where containerd attempts to checkpoint the image since one is set, but the image does not exist in containerd. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2020-11-06 04:55:03 +00:00
Sebastiaan van Stijn	182795cff6	Do not call mount.RecursiveUnmount() on Windows Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2020-10-29 23:00:16 +01:00
Brian Goff	f63f73a4a8	Configure shims from runtime config In dockerd we already have a concept of a "runtime", which specifies the OCI runtime to use (e.g. runc). This PR extends that config to add containerd shim configuration. This option is only exposed within the daemon itself (cannot be configured in daemon.json). This is due to issues in supporting unknown shims which will require more design work. What this change allows us to do is keep all the runtime config in one place. So the default "runc" runtime will just have its already existing shim config codified within the runtime config alone. I've also added 2 more "stock" runtimes which are basically runc+shimv1 and runc+shimv2. These new runtime configurations are: - io.containerd.runtime.v1.linux - runc + v1 shim using the V1 shim API - io.containerd.runc.v2 - runc + shim v2 These names coincide with the actual names of the containerd shims. This allows the user to essentially control what shim is going to be used by either specifying these as a `--runtime` on container create or by setting `--default-runtime` on the daemon. For custom/user-specified runtimes, the default shim config (currently shim v1) is used. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2020-07-13 14:18:02 -07:00
Sebastiaan van Stijn	eb14d936bf	daemon: rename variables that collide with imported package names Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2020-04-14 17:22:23 +02:00
Sebastiaan van Stijn	5d040cbd16	daemon: fix capitalization of some functions Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2020-04-14 17:22:19 +02:00
Kir Kolyshkin	39048cf656	Really switch to moby/sys/mount* Switch to moby/sys/mount and mountinfo. Keep the pkg/mount for potential outside users. This commit was generated by the following bash script:
```
set -e -u -o pipefail
for file in $(git grep -l 'docker/docker/pkg/mount"' | grep -v ^pkg/mount); do
	sed -i -e 's#/docker/docker/pkg/mount"#/moby/sys/mount"#' \
		-e 's#mount\.\(GetMounts\|Mounted\|Info\|[A-Za-z]*Filter\)#mountinfo.\1#g' \
		$file
	goimports -w $file
done
```
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-03-20 09:46:25 -07:00
Evan Hazlett	35ac4be5d5	add NewContainerOpts to libcontainerd.Create Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>	2019-10-03 11:45:41 -04:00
Sebastiaan van Stijn	1250e42a43	daemon:containerStart() fix unhandled error for saveApparmorConfig Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2019-08-29 20:28:58 +02:00
Brian Goff	5ba30cd1dc	Delete stale containerd object on start failure containerd has two objects with regard to containers. There is a "container" object which is metadata, and a "task" which is managing the actual runtime state. When docker starts a container, it creates both the container metadata and the task at the same time. So when a container exits, docker deletes both of these objects as well. This ensures that if, on start, creating the container metadata object in containerd fails due to a name conflict, we go ahead and clean up the stale object and try again. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2019-02-14 11:46:44 -08:00
Deng Guangxing	8e293be4ba	fix unless-stopped unexpected behavior fix https://github.com/moby/moby/issues/35304. Signed-off-by: dengguangxing <dengguangxing@huawei.com>	2019-02-01 15:03:17 -08:00
Kir Kolyshkin	77bc327e24	UnmountIpcMount: simplify As standard mount.Unmount does what we need, let's use it. In addition, this adds ignoring "not mounted" condition, which was previously implemented (see PR#33329, commit `cfa2591d3f`) via a very expensive call to mount.Mounted(). Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2018-12-10 20:06:10 -08:00
Daniel Nephin	2b1a2b10af	Move ImageService to new package Signed-off-by: Daniel Nephin <dnephin@docker.com>	2018-02-26 16:49:37 -05:00
Daniel Nephin	0dab53ff3c	Move all daemon image methods into imageService imageService provides the backend for the image API and handles the imageStore, and referenceStore. Signed-off-by: Daniel Nephin <dnephin@docker.com>	2018-02-26 16:48:29 -05:00
Daniel Nephin	4f0d95fa6e	Add canonical import comment Signed-off-by: Daniel Nephin <dnephin@docker.com>	2018-02-05 16:51:57 -05:00
Anusha Ragunathan	c162e8eb41	Merge pull request #35830 from cpuguy83/unbindable_shm Make container shm parent unbindable	2018-01-19 17:43:30 -08:00
John Howard	afd305c4b5	LCOW: Refactor to multiple layer-stores based on feedback Signed-off-by: John Howard <jhoward@microsoft.com>	2018-01-18 08:31:05 -08:00
John Howard	ce8e529e18	LCOW: Re-coalesce stores Signed-off-by: John Howard <jhoward@microsoft.com> This re-coalesces the daemon stores which were split as part of the original LCOW implementation. This is part of the work discussed in https://github.com/moby/moby/issues/34617, in particular see the document linked to in that issue.	2018-01-18 08:29:19 -08:00
Brian Goff	eaa5192856	Make container resource mounts unbindable It's a common scenario for admins and/or monitoring applications to mount in the daemon root dir into a container. When doing so, all mounts get copied into the container, often with private references. This can prevent removal of a container due to the various mounts that must be configured before a container is started (for example, for shared /dev/shm, or secrets) being leaked into another namespace, usually with private references. This is particularly problematic on older kernels (e.g. RHEL < 7.4) where a mount may be active in another namespace and attempting to remove a mountpoint which is active in another namespace fails. This change moves all container resource mounts into a common directory so that the directory can be made unbindable. What this does is prevent sub-mounts of this new directory from leaking into other namespaces when mounted with `rbind`... which is how all binds are handled for containers. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2018-01-16 15:09:05 -05:00
Yong Tang	c36274da83	Merge pull request #35638 from cpuguy83/error_helpers2 Add helpers to create errdef errors	2018-01-15 10:56:46 -08:00
Sebastiaan van Stijn	b4a6313969	Golint: remove redundant ifs Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2018-01-15 00:42:25 +01:00
Brian Goff	d453fe35b9	Move api/errdefs to errdefs Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2018-01-11 21:21:43 -05:00
Brian Goff	87a12421a9	Add helpers to create errdef errors Instead of having to create a bunch of custom error types that are doing nothing but wrapping another error in sub-packages, use a common helper to create errors of the requested type. e.g. instead of re-implementing this over and over:
```go
type notFoundError struct {
	cause error
}

func (e notFoundError) Error() string {
	return e.cause.Error()
}

func (e notFoundError) NotFound() {}

func (e notFoundError) Cause() error {
	return e.cause
}
```
Packages can instead just do:
```go
errdefs.NotFound(err)
```
Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2018-01-11 21:21:43 -05:00
Kenfe-Mickael Laventure	ddae20c032	Update libcontainerd to use containerd 1.0 Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>	2017-10-20 07:11:37 -07:00
John Howard	0380fbff37	LCOW: API: Add platform to /images/create and /build Signed-off-by: John Howard <jhoward@microsoft.com> This PR has the API changes described in https://github.com/moby/moby/issues/34617. Specifically, it adds an HTTP header "X-Requested-Platform" which is a JSON-encoded OCI Image-spec `Platform` structure. In addition, it renames (almost all) uses of a string variable platform (and associated) methods/functions to os. This makes it much clearer to disambiguate with the swarm "platform" which is really os/arch. This is a stepping stone to getting the daemon towards fully multi-platform/arch-aware, and makes it clear when "operating system" is being referred to rather than "platform" which is misleadingly used - sometimes in the swarm meaning, but more often as just the operating system.	2017-10-06 11:44:18 -07:00
Akash Gupta	7a7357dae1	LCOW: Implemented support for docker cp + build This enables docker cp and ADD/COPY docker build support for LCOW. Originally, the graphdriver.Get() interface returned a local path to the container root filesystem. This does not work for LCOW, so the Get() method now returns an interface that LCOW implements to support copying to and from the container. Signed-off-by: Akash Gupta <akagup@microsoft.com>	2017-09-14 12:07:52 -07:00
Daniel Nephin	9b47b7b151	Fix golint errors. Signed-off-by: Daniel Nephin <dnephin@docker.com>	2017-08-18 14:23:44 -04:00
John Howard	9fa449064c	LCOW: WORKDIR correct handling Signed-off-by: John Howard <jhoward@microsoft.com>	2017-08-17 15:29:17 -07:00
Brian Goff	ebcb7d6b40	Remove string checking in API error handling Use strongly typed errors to set HTTP status codes. Error interfaces are defined in the api/errors package and errors returned from controllers are checked against these interfaces. Errors can be wrapped in a pkg/errors.Causer, as long as somewhere in the line of causes one of the interfaces is implemented. The special error interfaces take precedence over Causer, meaning if both Causer and one of the new error interfaces are implemented, the Causer is not traversed. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2017-08-15 16:01:11 -04:00
Kir Kolyshkin	7120976d74	Implement none, private, and shareable ipc modes Since the commit `d88fe447df` ("Add support for sharing /dev/shm/ and /dev/mqueue between containers") container's /dev/shm is mounted on the host first, then bind-mounted inside the container. This is done that way in order to be able to share this container's IPC namespace (and the /dev/shm mount point) with another container. Unfortunately, this functionality breaks container checkpoint/restore (even if IPC is not shared). Since /dev/shm is an external mount, its contents are not saved by `criu checkpoint`, and so upon restore any application that tries to access data under /dev/shm is severely disappointed (which usually results in a fatal crash). This commit solves the issue by introducing new IPC modes for containers (in addition to 'host' and 'container:ID'). The new modes are: - 'shareable': enables sharing this container's IPC with others (this used to be the implicit default); - 'private': disables sharing this container's IPC. In 'private' mode, container's /dev/shm is truly mounted inside the container, without any bind-mounting from the host, which solves the issue. While at it, let's also implement 'none' mode. The motivation, as eloquently put by Justin Cormack, is: > I wondered a while back about having a none shm mode, as currently it is > not possible to have a totally unwriteable container as there is always > a /dev/shm writeable mount. It is a bit of a niche case (and clearly > should never be allowed to be daemon default) but it would be trivial to > add now so maybe we should... ...so here's yet another mode: - 'none': no /dev/shm mount inside the container (though it still has its own private IPC namespace). Now, to ultimately solve the above-mentioned checkpoint/restore issue, we'd need to make 'private' the default mode, but unfortunately it breaks backward compatibility.
So, let's make the default container IPC mode per-daemon configurable (with the built-in default set to 'shareable' for now). The default can be changed either via a daemon CLI option (--default-shm-mode) or a daemon.json configuration file parameter of the same name. Note one can only set either 'shareable' or 'private' IPC modes as a daemon default (i.e. in this context 'host', 'container', or 'none' do not make much sense). Some other changes this patch introduces are: 1. A mount for /dev/shm is added to default OCI Linux spec. 2. IpcMode.Valid() is simplified to remove duplicated code that parsed 'container:ID' form. Note the old version used to check that ID does not contain a semicolon; this is no longer the case (tests are modified accordingly). The motivation is we should either do a proper check for container ID validity, or not check it at all (since it is checked in other places anyway). I chose the latter. 3. IpcMode.Container() is modified to not return container ID if the mode value does not start with "container:", unifying the check to be the same as in IpcMode.IsContainer(). 4. IPC mode unit tests (runconfig/hostconfig_test.go) are modified to add checks for newly added values. [v2: addressed review at https://github.com/moby/moby/pull/34087#pullrequestreview-51345997] [v3: addressed review at https://github.com/moby/moby/pull/34087#pullrequestreview-53902833] [v4: addressed the case of upgrading from older daemon, in this case container.HostConfig.IpcMode is unset and this is valid] [v5: document old and new IpcMode values in api/swagger.yaml] [v6: add the 'none' mode, changelog entry to docs/api/version-history.md] Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2017-08-14 10:50:39 +03:00
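The mode checks described in the commit above can be sketched as follows. The method names mirror those in the commit message, but the bodies are assumptions and not moby's code. `validDaemonDefault` is a hypothetical helper for the rule that only 'shareable' and 'private' are accepted as a daemon default. Note that `Container()` uses the same prefix test as `IsContainer()`, and that the empty mode (left unset by an older daemon) counts as valid.

```go
package main

import (
	"fmt"
	"strings"
)

// IpcMode is a sketch of the IPC mode values described in the commit.
type IpcMode string

const containerPrefix = "container:"

func (n IpcMode) IsEmpty() bool     { return n == "" }
func (n IpcMode) IsNone() bool      { return n == "none" }
func (n IpcMode) IsPrivate() bool   { return n == "private" }
func (n IpcMode) IsShareable() bool { return n == "shareable" }
func (n IpcMode) IsHost() bool      { return n == "host" }

// IsContainer and Container share the same prefix check.
func (n IpcMode) IsContainer() bool { return strings.HasPrefix(string(n), containerPrefix) }

// Container returns the ID from "container:ID", or "" for any other mode.
// The ID itself is not validated here.
func (n IpcMode) Container() string {
	if !n.IsContainer() {
		return ""
	}
	return strings.TrimPrefix(string(n), containerPrefix)
}

// Valid accepts all known modes, including "" (unset by an older daemon).
func (n IpcMode) Valid() bool {
	return n.IsEmpty() || n.IsNone() || n.IsPrivate() || n.IsShareable() || n.IsHost() || n.IsContainer()
}

// validDaemonDefault (hypothetical): only shareable and private may be daemon defaults.
func validDaemonDefault(n IpcMode) bool { return n.IsShareable() || n.IsPrivate() }

func main() {
	for _, m := range []IpcMode{"", "none", "private", "shareable", "host", "container:abc", "bogus"} {
		fmt.Printf("%q valid=%v container=%q default-ok=%v\n", m, m.Valid(), m.Container(), validDaemonDefault(m))
	}
}
```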
Derek McGowan	1009e6a40b	Update logrus to v1.0.1 Fixes case sensitivity issue Signed-off-by: Derek McGowan <derek@mcgstyle.net>	2017-07-31 13:16:46 -07:00
Fabio Kung	66b231d598	delete unused code (daemon.Start) Signed-off-by: Fabio Kung <fabio.kung@gmail.com>	2017-06-23 07:52:34 -07:00


152 commits