The same error is already returned by `(*Daemon).containerCreate()` but
since this function is also called by the cluster executor, the error
has to be duplicated.
Doing that allows to remove a nil check on container config in
`postContainersCreate`.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
The github.com/containerd/containerd/log package was moved to a separate
module, which will also be used by upcoming (patch) releases of containerd.
This patch moves our own uses of the package to use the new module.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Diffing a container yielded some extra changes that come from the
files/directories that we mount inside the container (/etc/resolv.conf
for example). To avoid that we create an intermediate snapshot that has
these files, with this we can now diff the container fs with its parent
and only get the differences that were made inside the container.
Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
Thus far, validation code would stop as soon as a bad value was found.
Now, we try to validate as much as we can, to return all errors to the
API client.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
So far, only a subset of NetworkingConfig was validated when calling
ContainerCreate. Other parameters would be validated when the container
was started. And the same goes for EndpointSettings on NetworkConnect.
This commit adds two validation steps:
1. Check if the IP addresses set in endpoint's IPAMConfig are valid,
when ContainerCreate and ConnectToNetwork is called ;
2. Check if the network allows static IP addresses, only on
ConnectToNetwork as we need the libnetwork's Network for that and it
might not exist until NetworkAttachment requests are sent to the
Swarm leader (which happens only when starting the container) ;
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
The API endpoint `/containers/create` accepts several EndpointsConfig
since v1.22 but the daemon would error out in such case. This check is
moved from the daemon to the api and is now applied only for API < 1.44,
effectively allowing the daemon to create containers connected to
several networks.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Define consts for the Actions we use for events, instead of "ad-hoc" strings.
Having these consts makes it easier to find where specific events are triggered,
makes the events less error-prone, and allows documenting each Action (if needed).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Also move the validation function to live with the type definition,
which allows it to be used outside of the daemon as well.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This function is called by `daemon.containerCreate()` which is already
wrapping errors coming from `verifyNetworkingConfig()` with
`errdefs.InvalidParameter()`. So `verifyNetworkingConfig()` should only
return standard errors.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
The existing runtimes reload logic went to great lengths to replace the
directory containing runtime wrapper scripts as atomically as possible
within the limitations of the Linux filesystem ABI. Trouble is,
atomically swapping the wrapper scripts directory solves the wrong
problem! The runtime configuration is "locked in" when a container is
started, including the path to the runC binary. If a container is
started with a runtime which requires a daemon-managed wrapper script
and then the daemon is reloaded with a config which no longer requires
the wrapper script (i.e. some args -> no args, or the runtime is dropped
from the config), that container would become unmanageable. Any attempts
to stop, exec or otherwise perform lifecycle management operations on
the container are likely to fail due to the wrapper script no longer
existing at its original path.
Atomically swapping the wrapper scripts is also incompatible with the
read-copy-update paradigm for reloading configuration. A handler in the
daemon could retain a reference to the pre-reload configuration for an
indeterminate amount of time after the daemon configuration has been
reloaded and updated. It is possible for the daemon to attempt to start
a container using a deleted wrapper script if a request to run a
container races a reload.
Solve the problem of deleting referenced wrapper scripts by ensuring
that all wrapper scripts are *immutable* for the lifetime of the daemon
process. Any given runtime wrapper script must always exist with the
same contents, no matter how many times the daemon config is reloaded,
or what changes are made to the config. This is accomplished by using
everyone's favourite design pattern: content-addressable storage. Each
wrapper script file name is suffixed with the SHA-256 digest of its
contents to (probabilistically) guarantee immutability without needing
any concurrency control. Stale runtime wrapper scripts are only cleaned
up on the next daemon restart.
Split the derived runtimes configuration from the user-supplied
configuration to have a place to store derived state without mutating
the user-supplied configuration or exposing daemon internals in API
struct types. Hold the derived state and the user-supplied configuration
in a single struct value so that they can be updated as an atomic unit.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Ensure data-race-free access to the daemon configuration without
locking by mutating a deep copy of the config and atomically storing
a pointer to the copy into the daemon-wide configStore value. Any
operations which need to read from the daemon config must capture the
configStore value only once and pass it around to guarantee a consistent
view of the config.
Signed-off-by: Cory Snider <csnider@mirantis.com>
This addresses the previous issue with the containerd store where, after a container is created, we can't deterministically resolve which image variant was used to run it (since we also don't store what platform the image was fetched for).
This is required for things like `docker commit`, and computing the containers layer size later, since we need to resolve the specific image variant.
Signed-off-by: Laura Brehm <laurabrehm@hey.com>
Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
c8d/daemon: Mount root and fill BaseFS
This fixes things that were broken due to nil BaseFS like `docker cp`
and running a container with workdir override.
This is more of a temporary hack than a real solution.
The correct fix would be to refactor the code to make BaseFS and LayerRW
an implementation detail of the old image store implementation and use
the temporary mounts for the c8d implementation instead.
That requires more work though.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
daemon/images: Don't unset BaseFS
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
Now that we can pass any custom containerd shim to dockerd there is need
for this check. Without this it becomes possible to use wasm shims for
example with images that have "wasi" as the OS.
Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
Just a minor cleanup; I went looking where these options were used, and
these occurrences set then to their default value. Removing these
assignments makes it easier to find where they're actually used.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Currently only provides the existing "platform" option, but more
options will be added in follow-ups.
Signed-off-by: Nicolas De Loof <nicolas.deloof@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
We already have this config, so might as well pass it, instead of passing
each option as a separate argument.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This removes some of the checks that were added in 0cba7740d4,
but should no longer be needed.
- `Daemon.create()`: fix the error message, which assumed it could only occur on Windows.
- `Daemon.cleanupContainer()`: no need to validate container platform to delete it.
- `Daemon.containerExport`: if a container was created, we should be able to
export it; no need to validate.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Do not use 0701 perms.
0701 dir perms allows anyone to traverse the docker dir.
It happens to allow any user to execute, as an example, suid binaries
from image rootfs dirs because it allows traversal AND critically
container users need to be able to do execute things.
0701 on lower directories also happens to allow any user to modify
things in, for instance, the overlay upper dir which neccessarily
has 0755 permissions.
This changes to use 0710 which allows users in the group to traverse.
In userns mode the UID owner is (real) root and the GID is the remapped
root's GID.
This prevents anyone but the remapped root to traverse our directories
(which is required for userns with runc).
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit ef7237442147441a7cadcda0600be1186d81ac73)
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit 93ac040bf0)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This changes mounts.NewParser() to create a parser for the current operatingsystem,
instead of one specific to a (possibly non-matching, in case of LCOW) OS.
With the OS-specific handling being removed, the "OS" parameter is also removed
from `daemon.verifyContainerSettings()`, and various other container-related
functions.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The LCOW implementation in dockerd has been deprecated in favor of re-implementation
in containerd (in progress). Microsoft started removing the LCOW V1 code from the
build dependencies we use in Microsoft/opengcs (soon to be part of Microsoft/hcshhim),
which means that we need to start removing this code.
This first step removes the lcow graphdriver, the LCOW initialization code, and
some LCOW-related utilities.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Wrap platforms.Only and fallback to our ignore mismatches due to empty
CPU variants. This just cleans things up and makes the logic re-usable
in other places.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Various dirs in /var/lib/docker contain data that needs to be mounted
into a container. For this reason, these dirs are set to be owned by the
remapped root user, otherwise there can be permissions issues.
However, this uneccessarily exposes these dirs to an unprivileged user
on the host.
Instead, set the ownership of these dirs to the real root (or rather the
UID/GID of dockerd) with 0701 permissions, which allows the remapped
root to enter the directories but not read/write to them.
The remapped root needs to enter these dirs so the container's rootfs
can be configured... e.g. to mount /etc/resolve.conf.
This prevents an unprivileged user from having read/write access to
these dirs on the host.
The flip side of this is now any user can enter these directories.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit e908cc3901)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This ensures the storage-opts applies to all operations by the graph
drivers, replacing the merging of storage-opts into container storage
config at container-creation time, and hence applying storage-opts to
non-container operations like `COPY` and `ADD` in the builder.
Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
This fixes a regression based on expectations of the runtime:
```
docker pull arm32v7/alpine
docker run arm32v7/alpine
```
Without this change, the `docker run` will fail due to platform
matching on non-arm32v7 systems, even though the image could run
(assuming the system is setup correctly).
This also emits a warning to make sure that the user is aware that a
platform that does not match the default platform of the system is being
run, for the cases like:
```
docker pull --platform armhf busybox
docker run busybox
```
Not typically an issue if the requests are done together like that, but
if the image was already there and someone did `docker run` without an
explicit `--platform`, they may very well be expecting to run a native
version of the image instead of the armhf one.
This warning does add some extra noise in the case of platform specific
images being run, such as `arm32v7/alpine`, but this can be supressed by
explicitly setting the platform.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This enables image lookup when creating a container to fail when the
reference exists but it is for the wrong platform. This prevents trying
to run an image for the wrong platform, as can be the case with, for
example binfmt_misc+qemu.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Signed-off-by: John Howard <jhoward@microsoft.com>
Also fixes https://github.com/moby/moby/issues/22874
This commit is a pre-requisite to moving moby/moby on Windows to using
Containerd for its runtime.
The reason for this is that the interface between moby and containerd
for the runtime is an OCI spec which must be unambigious.
It is the responsibility of the runtime (runhcs in the case of
containerd on Windows) to ensure that arguments are escaped prior
to calling into HCS and onwards to the Win32 CreateProcess call.
Previously, the builder was always escaping arguments which has
led to several bugs in moby. Because the local runtime in
libcontainerd had context of whether or not arguments were escaped,
it was possible to hack around in daemon/oci_windows.go with
knowledge of the context of the call (from builder or not).
With a remote runtime, this is not possible as there's rightly
no context of the caller passed across in the OCI spec. Put another
way, as I put above, the OCI spec must be unambigious.
The other previous limitation (which leads to various subtle bugs)
is that moby is coded entirely from a Linux-centric point of view.
Unfortunately, Windows != Linux. Windows CreateProcess uses a
command line, not an array of arguments. And it has very specific
rules about how to escape a command line. Some interesting reading
links about this are:
https://blogs.msdn.microsoft.com/twistylittlepassagesallalike/2011/04/23/everyone-quotes-command-line-arguments-the-wrong-way/https://stackoverflow.com/questions/31838469/how-do-i-convert-argv-to-lpcommandline-parameter-of-createprocesshttps://docs.microsoft.com/en-us/cpp/cpp/parsing-cpp-command-line-arguments?view=vs-2017
For this reason, the OCI spec has recently been updated to cater
for more natural syntax by including a CommandLine option in
Process.
What does this commit do?
Primary objective is to ensure that the built OCI spec is unambigious.
It changes the builder so that `ArgsEscaped` as commited in a
layer is only controlled by the use of CMD or ENTRYPOINT.
Subsequently, when calling in to create a container from the builder,
if follows a different path to both `docker run` and `docker create`
using the added `ContainerCreateIgnoreImagesArgsEscaped`. This allows
a RUN from the builder to control how to escape in the OCI spec.
It changes the builder so that when shell form is used for RUN,
CMD or ENTRYPOINT, it builds (for WCOW) a more natural command line
using the original as put by the user in the dockerfile, not
the parsed version as a set of args which loses fidelity.
This command line is put into args[0] and `ArgsEscaped` is set
to true for CMD or ENTRYPOINT. A RUN statement does not commit
`ArgsEscaped` to the commited layer regardless or whether shell
or exec form were used.
This implements chown support on Windows. Built-in accounts as well
as accounts included in the SAM database of the container are supported.
NOTE: IDPair is now named Identity and IDMappings is now named
IdentityMapping.
The following are valid examples:
ADD --chown=Guest . <some directory>
COPY --chown=Administrator . <some directory>
COPY --chown=Guests . <some directory>
COPY --chown=ContainerUser . <some directory>
On Windows an owner is only granted the permission to read the security
descriptor and read/write the discretionary access control list. This
fix also grants read/write and execute permissions to the owner.
Signed-off-by: Salahuddin Khan <salah@docker.com>