In dockerd we already have a concept of a "runtime", which specifies the
OCI runtime to use (e.g. runc).
This PR extends that config to add containerd shim configuration.
This option is only exposed within the daemon itself (cannot be
configured in daemon.json).
This is due to issues in supporting unknown shims which will require
more design work.
What this change allows us to do is keep all the runtime config in one
place.
So the default "runc" runtime will just have it's already existing shim
config codified within the runtime config alone.
I've also added 2 more "stock" runtimes which are basically runc+shimv1
and runc+shimv2.
These new runtime configurations are:
- io.containerd.runtime.v1.linux - runc + v1 shim using the V1 shim API
- io.containerd.runc.v2 - runc + shim v2
These names coincide with the actual names of the containerd shims.
This allows the user to essentially control what shim is going to be
used by either specifying these as a `--runtime` on container create or
by setting `--default-runtime` on the daemon.
For custom/user-specified runtimes, the default shim config (currently
shim v1) is used.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This removes the use of the old distribution code in the plugin packages
and replaces it with containerd libraries for plugin pushes and pulls.
Additionally it uses a content store from containerd which seems like
it's compatible with the old "basicBlobStore" in the plugin package.
This is being used locally isntead of through the containerd client for
now.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Switch to moby/sys/mount and mountinfo. Keep the pkg/mount for potential
outside users.
This commit was generated by the following bash script:
```
set -e -u -o pipefail
for file in $(git grep -l 'docker/docker/pkg/mount"' | grep -v ^pkg/mount); do
sed -i -e 's#/docker/docker/pkg/mount"#/moby/sys/mount"#' \
-e 's#mount\.\(GetMounts\|Mounted\|Info\|[A-Za-z]*Filter\)#mountinfo.\1#g' \
$file
goimports -w $file
done
```
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
* Requires containerd binaries from containerd/containerd#3799 . Metrics are unimplemented yet.
* Works with crun v0.10.4, but `--security-opt seccomp=unconfined` is needed unless using master version of libseccomp
( containers/crun#156, seccomp/libseccomp#177 )
* Doesn't work with master runc yet
* Resource limitations are unimplemented
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Format the source according to latest goimports.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This allows our tests, which all share a containerd instance, to be a
bit more isolated by setting the containerd namespaces to the generated
daemon ID's rather than the default namespaces.
This came about because I found in some cases we had test daemons
failing to start (really very slow to start) because it was (seemingly)
processing events from other tests.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Allow specifying environment variables when installing an engine plugin
as a Swarm service. Invalid environment variable entries (without an
equals (`=`) char) will be ignored.
Signed-off-by: Sune Keller <absukl@almbrand.dk>
Signed-off-by: John Howard <jhoward@microsoft.com>
This is the first step in refactoring moby (dockerd) to use containerd on Windows.
Similar to the current model in Linux, this adds the option to enable it for runtime.
It does not switch the graphdriver to containerd snapshotters.
- Refactors libcontainerd to a series of subpackages so that either a
"local" containerd (1) or a "remote" (2) containerd can be loaded as opposed
to conditional compile as "local" for Windows and "remote" for Linux.
- Updates libcontainerd such that Windows has an option to allow the use of a
"remote" containerd. Here, it communicates over a named pipe using GRPC.
This is currently guarded behind the experimental flag, an environment variable,
and the providing of a pipename to connect to containerd.
- Infrastructure pieces such as under pkg/system to have helper functions for
determining whether containerd is being used.
(1) "local" containerd is what the daemon on Windows has used since inception.
It's not really containerd at all - it's simply local invocation of HCS APIs
directly in-process from the daemon through the Microsoft/hcsshim library.
(2) "remote" containerd is what docker on Linux uses for it's runtime. It means
that there is a separate containerd service running, and docker communicates over
GRPC to it.
To try this out, you will need to start with something like the following:
Window 1:
containerd --log-level debug
Window 2:
$env:DOCKER_WINDOWS_CONTAINERD=1
dockerd --experimental -D --containerd \\.\pipe\containerd-containerd
You will need the following binary from github.com/containerd/containerd in your path:
- containerd.exe
You will need the following binaries from github.com/Microsoft/hcsshim in your path:
- runhcs.exe
- containerd-shim-runhcs-v1.exe
For LCOW, it will require and initrd.img and kernel in `C:\Program Files\Linux Containers`.
This is no different to the current requirements. However, you may need updated binaries,
particularly initrd.img built from Microsoft/opengcs as (at the time of writing), Linuxkit
binaries are somewhat out of date.
Note that containerd and hcsshim for HCS v2 APIs do not yet support all the required
functionality needed for docker. This will come in time - this is a baby (although large)
step to migrating Docker on Windows to containerd.
Note that the HCS v2 APIs are only called on RS5+ builds. RS1..RS4 will still use
HCS v1 APIs as the v2 APIs were not fully developed enough on these builds to be usable.
This abstraction is done in HCSShim. (Referring specifically to runtime)
Note the LCOW graphdriver still uses HCS v1 APIs regardless.
Note also that this does not migrate docker to use containerd snapshotters
rather than graphdrivers. This needs to be done in conjunction with Linux also
doing the same switch.
`time.After` keeps a timer running until the specified duration is
completed. It also allocates a new timer on each call. This can wind up
leaving lots of uneccessary timers running in the background that are
not needed and consume resources.
Instead of `time.After`, use `time.NewTimer` so the timer can actually
be stopped.
In some of these cases it's not a big deal since the duraiton is really
short, but in others it is much worse.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
The errors returned from Mount and Unmount functions are raw
syscall.Errno errors (like EPERM or EINVAL), which provides
no context about what has happened and why.
Similar to os.PathError type, introduce mount.Error type
with some context. The error messages will now look like this:
> mount /tmp/mount-tests/source:/tmp/mount-tests/target, flags: 0x1001: operation not permitted
or
> mount tmpfs:/tmp/mount-test-source-516297835: operation not permitted
Before this patch, it was just
> operation not permitted
[v2: add Cause()]
[v3: rename MountError to Error, document Cause()]
[v4: fixes; audited all users]
[v5: make Error type private; changes after @cpuguy83 reviews]
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This implements chown support on Windows. Built-in accounts as well
as accounts included in the SAM database of the container are supported.
NOTE: IDPair is now named Identity and IDMappings is now named
IdentityMapping.
The following are valid examples:
ADD --chown=Guest . <some directory>
COPY --chown=Administrator . <some directory>
COPY --chown=Guests . <some directory>
COPY --chown=ContainerUser . <some directory>
On Windows an owner is only granted the permission to read the security
descriptor and read/write the discretionary access control list. This
fix also grants read/write and execute permissions to the owner.
Signed-off-by: Salahuddin Khan <salah@docker.com>
Adds a supervisor package for starting and monitoring containerd.
Separates grpc connection allowing access from daemon.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
Scenario:
Daemon is ungracefully shutdown and leaves plugins running (no
live-restore).
Daemon comes back up.
The next time a container tries to use that plugin it will cause a
daemon panic because the plugin client is not set.
This fixes that by ensuring that the plugin does get shutdown.
Note, I do not think there would be any harm in just re-attaching to the
running plugin instead of shutting it down, however historically we shut
down plugins and containers when live-restore is not enabled.
[kir@: consolidate code to deleteTaskAndContainer, a few minor nits]
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Since Go 1.7, context is a standard package. Since Go 1.9, everything
that is provided by "x/net/context" is a couple of type aliases to
types in "context".
Many vendored packages still use x/net/context, so vendor entry remains
for now.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This was added as part of a53930a04f with
the intent to sort the mounts in the plugin config, but this was sorting
*all* the mounts from the default OCI spec which is problematic.
In reality we don't need to sort this because we are only adding a
self-binded mount to flag it as rshared.
We may want to look at sorting the plugin mounts before they are added
to the OCI spec in the future, but for now I think the existing behavior
is fine since the plugin author has control of the order (except for the
propagated mount).
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This reverts commit 0c2821d6f2.
Due to other changes this is no longer needed and resolves some other
issues with plugins.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Before this change, volume management was relying on the fact that
everything the plugin mounts is visible on the host within the plugin's
rootfs. In practice this caused some issues with mount leaks, so we
changed the behavior such that mounts are not visible on the plugin's
rootfs, but available outside of it, which breaks volume management.
To fix the issue, allow the plugin to scope the path correctly rather
than assuming that everything is visible in `p.Rootfs`.
In practice this is just scoping the `PropagatedMount` paths to the
correct host path.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Setting up the mounts on the host increases chances of mount leakage and
makes for more cleanup after the plugin has stopped.
With this change all mounts for the plugin are performed by the
container runtime and automatically cleaned up when the container exits.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Currently the metrics plugin uses a really hackish host mount with
propagated mounts to get the metrics socket into a plugin after the
plugin is alreay running.
This approach ends up leaking mounts which requires setting the plugin
manager root to private, which causes some other issues.
With this change, plugin subsystems can register a set of modifiers to
apply to the plugin's runtime spec before the plugin is ever started.
This will help to generalize some of the customization work that needs
to happen for various plugin subsystems (and future ones).
Specifically it lets the metrics plugin subsystem append a mount to the
runtime spec to mount the metrics socket in the plugin's mount namespace
rather than the host's and prevetns any leaking due to this mount.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
During a plugin remove, docker performs an `os.Rename` to move the
plugin data dir to a new location before removing to acheive an atomic
removal.
`os.Rename` can return either a `NotExist` error if the source path
doesn't exist, or an `Exist` error if the target path already exists.
Both these cases can happen when there is an error on the final
`os.Remove` call, which is common on older kernels (`device or resource
busy`).
When calling rename, we can safely ignore these error types and proceed
to try and remove the plugin.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Signed-off-by: John Howard <jhoward@microsoft.com>
The re-coalesces the daemon stores which were split as part of the
original LCOW implementation.
This is part of the work discussed in https://github.com/moby/moby/issues/34617,
in particular see the document linked to in that issue.
Instead of having to create a bunch of custom error types that are doing
nothing but wrapping another error in sub-packages, use a common helper
to create errors of the requested type.
e.g. instead of re-implementing this over and over:
```go
type notFoundError struct {
cause error
}
func(e notFoundError) Error() string {
return e.cause.Error()
}
func(e notFoundError) NotFound() {}
func(e notFoundError) Cause() error {
return e.cause
}
```
Packages can instead just do:
```
errdefs.NotFound(err)
```
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Files that are suffixed with `_linux.go` or `_windows.go` are
already only built on Linux / Windows, so these build-tags
were redundant.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Follow the conventions for namespace naming set out by other projects,
such as linuxkit and cri-containerd. Typically, they are some sort of
host name, with a subdomain describing functionality of the namespace.
In the case of linuxkit, services are launched in `services.linuxkit`.
In cri-containerd, pods are launched in `k8s.io`, making it clear that
these are from kubernetes.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Ensures that when a plugin is removed that it doesn't interfere with
other plugins mounts and also ensures its own mounts are cleaned up.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Plugin config can have Mounts without a 'Source' field. In such cases,
performing a 'plugin set' on the mount source will panic the daemon. Its
the same case for device paths as well. This detects the case and
returns error.
Signed-off-by: Anusha Ragunathan <anusha.ragunathan@docker.com>
Instead of duplicating the same if condition per plugin manager directory,
use one if condition and a for-loop.
Signed-off-by: Boaz Shuster <ripcurld.github@gmail.com>
In some circumstances we were not properly releasing plugin references,
leading to failures in removing a plugin with no way to recover other
than restarting the daemon.
1. If volume create fails (in the driver)
2. If a driver validation fails (should be rare)
3. If trying to get a plugin that does not match the passed in capability
Ideally the test for 1 and 2 would just be a unit test, however the
plugin interfaces are too complicated as `plugingetter` relies on
github.com/pkg/plugin/Client (a concrete type), which will require
spinning up services from within the unit test... it just wouldn't be a
unit test at this point.
I attempted to refactor this a bit, but since both libnetwork and
swarmkit are reliant on `plugingetter` as well, this would not work.
This really requires a re-write of the lower-level plugin management to
decouple these pieces.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Signed-off-by: John Howard <jhoward@microsoft.com>
This PR has the API changes described in https://github.com/moby/moby/issues/34617.
Specifically, it adds an HTTP header "X-Requested-Platform" which is a JSON-encoded
OCI Image-spec `Platform` structure.
In addition, it renames (almost all) uses of a string variable platform (and associated)
methods/functions to os. This makes it much clearer to disambiguate with the swarm
"platform" which is really os/arch. This is a stepping stone to getting the daemon towards
fully multi-platform/arch-aware, and makes it clear when "operating system" is being
referred to rather than "platform" which is misleadingly used - sometimes in the swarm
meaning, but more often as just the operating system.
The `filters.Include()` method was deprecated in favor of `filters.Contains()`
in 065118390a, but still used in various
locations.
This patch replaces uses of `filters.Include()` with `filters.Contains()`.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
libcontainerd has a bunch of platform dependent code and huge interfaces
that are a pain implement.
To make the plugin manager a bit easier to work with, extract the plugin
executor into an interface and move the containerd implementation to a
separate package.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This enables docker cp and ADD/COPY docker build support for LCOW.
Originally, the graphdriver.Get() interface returned a local path
to the container root filesystem. This does not work for LCOW, so
the Get() method now returns an interface that LCOW implements to
support copying to and from the container.
Signed-off-by: Akash Gupta <akagup@microsoft.com>
This also update:
- runc to 3f2f8b84a77f73d38244dd690525642a72156c64
- runtime-specs to v1.0.0
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
Use strongly typed errors to set HTTP status codes.
Error interfaces are defined in the api/errors package and errors
returned from controllers are checked against these interfaces.
Errors can be wraeped in a pkg/errors.Causer, as long as somewhere in the
line of causes one of the interfaces is implemented. The special error
interfaces take precedence over Causer, meaning if both Causer and one
of the new error interfaces are implemented, the Causer is not
traversed.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This prevents mounts in the plugins dir from leaking into other
namespaces which can prevent removal (`device or resource busy`),
particularly on older kernels.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Changes most references of syscall to golang.org/x/sys/
Ones aren't changes include, Errno, Signal and SysProcAttr
as they haven't been implemented in /x/sys/.
Signed-off-by: Christopher Jones <tophj@linux.vnet.ibm.com>
[s390x] switch utsname from unsigned to signed
per 33267e036f
char in s390x in the /x/sys/unix package is now signed, so
change the buildtags
Signed-off-by: Christopher Jones <tophj@linux.vnet.ibm.com>
Enables other subsystems to watch actions for a plugin(s).
This will be used specifically for implementing plugins on swarm where a
swarm controller needs to watch the state of a plugin.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Before this patch, if the plugin's `config.json` is successfully removed
but the main plugin state dir could not be removed for some reason (e.g.
leaked mount), it will prevent the daemon from being able to be
restarted.
This patches changes this to atomically remove the plugin such that on
daemon restart we can detect that there was an error and re-try. It also
changes the logic so that it only logs errors on restore rather than
erroring out the daemon.
This also removes some code which is now duplicated elsewhere.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Also, this removes the use of a questionable golang range feature which
corrects for mutation of a slice during iteration over that slice. This
makes the filter operation easier to read and reason about.
Signed-off-by: David Sheets <dsheets@docker.com>
Previously, a 'plugin not found' error would be returned if a plugin to be
retrieved was found but disabled. This was misleading and incorrect. Now,
a new error plugin.ErrDisabled is returned in this case. This makes the
error message when trying to statically start plugins (from daemon.json or
dockerd command line) accurate.
Signed-off-by: David Sheets <dsheets@docker.com>
Increases the test coverage of pkg/plugins.
Changed signature of function NewClientWithTimeout in pkg/plugin/client, to
take time.Duration instead of integers.
Signed-off-by: Raja Sami <raja.sami@tenpearl.com>