This changes mounts.NewParser() to create a parser for the current operatingsystem,
instead of one specific to a (possibly non-matching, in case of LCOW) OS.
With the OS-specific handling being removed, the "OS" parameter is also removed
from `daemon.verifyContainerSettings()`, and various other container-related
functions.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- Remove the windowsparser.HasResource() override, as it was the same on both
Windows and Linux
- Move the rxLCOWDestination to the lcowParser code
- Move the rwModes variable to a generic (non-platform-specific) file, as it's
used both for the windowsParser and the linuxParser
- Some minor formatting and linting changes
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The VolumesService did not have information wether or not a volume
was _created_ or if a volume already existed in the driver, and
the existing volume was used.
As a result, multiple "create" events could be generated for the
same volume. For example:
1. Run `docker events` in a shell to start listening for events
2. Create a volume:
docker volume create myvolume
3. Start a container that uses that volume:
docker run -dit -v myvolume:/foo busybox
4. Check the events that were generated:
2021-02-15T18:49:55.874621004+01:00 volume create myvolume (driver=local)
2021-02-15T18:50:11.442759052+01:00 volume create myvolume (driver=local)
2021-02-15T18:50:11.487104176+01:00 container create 45112157c8b1382626bf5e01ef18445a4c680f3846c5e32d01775dddee8ca6d1 (image=busybox, name=gracious_hypatia)
2021-02-15T18:50:11.519288102+01:00 network connect a19f6bb8d44ff84d478670fa4e34c5bf5305f42786294d3d90e790ac74b6d3e0 (container=45112157c8b1382626bf5e01ef18445a4c680f3846c5e32d01775dddee8ca6d1, name=bridge, type=bridge)
2021-02-15T18:50:11.526407799+01:00 volume mount myvolume (container=45112157c8b1382626bf5e01ef18445a4c680f3846c5e32d01775dddee8ca6d1, destination=/foo, driver=local, propagation=, read/write=true)
2021-02-15T18:50:11.864134043+01:00 container start 45112157c8b1382626bf5e01ef18445a4c680f3846c5e32d01775dddee8ca6d1 (image=busybox, name=gracious_hypatia)
5. Notice that a "volume create" event is created twice;
- once when `docker volume create` was ran
- once when `docker run ...` was ran
This patch moves the generation of (most) events to the volume _store_, and only
generates an event if the volume did not yet exist.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Various dirs in /var/lib/docker contain data that needs to be mounted
into a container. For this reason, these dirs are set to be owned by the
remapped root user, otherwise there can be permissions issues.
However, this uneccessarily exposes these dirs to an unprivileged user
on the host.
Instead, set the ownership of these dirs to the real root (or rather the
UID/GID of dockerd) with 0701 permissions, which allows the remapped
root to enter the directories but not read/write to them.
The remapped root needs to enter these dirs so the container's rootfs
can be configured... e.g. to mount /etc/resolve.conf.
This prevents an unprivileged user from having read/write access to
these dirs on the host.
The flip side of this is now any user can enter these directories.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit e908cc3901)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
full diff: https://github.com/moby/sys/compare/mountinfo/v0.1.3...mountinfo/v0.4.0
> Note that this dependency uses submodules, providing "github.com/moby/sys/mount"
> and "github.com/moby/sys/mountinfo". Our vendoring tool (vndr) currently doesn't
> support submodules, so we vendor the top-level moby/sys repository (which contains
> both) and pick the most recent tag, which could be either `mountinfo/vXXX` or
> `mount/vXXX`.
github.com/moby/sys/mountinfo v0.4.0
--------------------------------------------------------------------------------
Breaking changes:
- `PidMountInfo` is now deprecated and will be removed before v1.0; users should switch to `GetMountsFromReader`
Fixes and improvements:
- run filter after all fields are parsed
- correct handling errors from bufio.Scan
- documentation formatting fixes
github.com/moby/sys/mountinfo v0.3.1
--------------------------------------------------------------------------------
- mount: use MNT_* flags from golang.org/x/sys/unix on freebsd
- various godoc and CI fixes
- mountinfo: make GetMountinfoFromReader Linux-specific
- Add support for OpenBSD in addition to FreeBSD
- mountinfo: use idiomatic naming for fields
github.com/moby/sys/mountinfo v0.2.0
--------------------------------------------------------------------------------
Bug fixes:
- Fix path unescaping for paths with double quotes
Improvements:
- Mounted: speed up by adding fast paths using openat2 (Linux-only) and stat
- Mounted: relax path requirements (allow relative, non-cleaned paths, symlinks)
- Unescape fstype and source fields
- Documentation improvements
Testing/CI:
- Unit tests: exclude darwin
- CI: run tests under Fedora 32 to test openat2
- TestGetMounts: fix for Ubuntu build system
- Makefile: fix ignoring test failures
- CI: add cross build
github.com/moby/sys/mount v0.1.1
--------------------------------------------------------------------------------
https://github.com/moby/sys/releases/tag/mount%2Fv0.1.1
Improvements:
- RecursiveUnmount: add a fast path (#26)
- Unmount: improve doc
- fix CI linter warning on Windows
Testing/CI:
- Unit tests: exclude darwin
- Makefile: fix ignoring test failures
- CI: add cross build
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This patch adds a new "prune" event type to indicate that pruning of a resource
type completed.
This event-type can be used on systems that want to perform actions after
resources have been cleaned up. For example, Docker Desktop performs an fstrim
after resources are deleted (https://github.com/linuxkit/linuxkit/tree/v0.7/pkg/trim-after-delete).
While the current (remove, destroy) events can provide information on _most_
resources, there is currently no event triggered after the BuildKit build-cache
is cleaned.
Prune events have a `reclaimed` attribute, indicating the amount of space that
was reclaimed (in bytes). The attribute can be used, for example, to use as a
threshold for performing fstrim actions. Reclaimed space for `network` events
will always be 0, but the field is added to be consistent with prune events for
other resources.
To test this patch:
Create some resources:
for i in foo bar baz; do \
docker network create network_$i \
&& docker volume create volume_$i \
&& docker run -d --name container_$i -v volume_$i:/volume busybox sh -c 'truncate -s 5M somefile; truncate -s 5M /volume/file' \
&& docker tag busybox:latest image_$i; \
done;
docker pull alpine
docker pull nginx:alpine
echo -e "FROM busybox\nRUN truncate -s 50M bigfile" | DOCKER_BUILDKIT=1 docker build -
Start listening for "prune" events in another shell:
docker events --filter event=prune
Prune containers, networks, volumes, and build-cache:
docker system prune -af --volumes
See the events that are returned:
docker events --filter event=prune
2020-07-25T12:12:09.268491000Z container prune (reclaimed=15728640)
2020-07-25T12:12:09.447890400Z network prune (reclaimed=0)
2020-07-25T12:12:09.452323000Z volume prune (reclaimed=15728640)
2020-07-25T12:12:09.517236200Z image prune (reclaimed=21568540)
2020-07-25T12:12:09.566662600Z builder prune (reclaimed=52428841)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Commit 12c7541f1f updated the
opencontainers/selinux dependency to v1.3.1, which had a breaking
change in the errors that were returned.
Before v1.3.1, the "raw" `syscall.ENOTSUP` was returned if the
underlying filesystem did not support xattrs, but later versions
wrapped the error, which caused our detection to fail.
This patch uses `errors.Is()` to check for the underlying error.
This requires github.com/pkg/errors v0.9.1 or above (older versions
could use `errors.Cause()`, but are not compatible with "native"
wrapping of errors in Go 1.13 and up, and could potentially cause
these errors to not being detected again.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Switch to moby/sys/mount and mountinfo. Keep the pkg/mount for potential
outside users.
This commit was generated by the following bash script:
```
set -e -u -o pipefail
for file in $(git grep -l 'docker/docker/pkg/mount"' | grep -v ^pkg/mount); do
sed -i -e 's#/docker/docker/pkg/mount"#/moby/sys/mount"#' \
-e 's#mount\.\(GetMounts\|Mounted\|Info\|[A-Za-z]*Filter\)#mountinfo.\1#g' \
$file
goimports -w $file
done
```
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
opts/env_test: suppress a linter warning
this one:
> opts/env_test.go:95:4: U1000: field `err` is unused (unused)
> err error
> ^
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Errors were being ignored and always telling the user that the path
doesn't exist even if it was some other problem, such as a permission
error.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This comes from an old suggestion (https://github.com/docker/cli/issues/706#issuecomment-371157691) on an issue we were having and has since popped up again. For NFS volumes, Docker will do an IP lookup on the volume name. This is not done for CIFS volumes, which forces you to add the volume via IP address instead. This change will enable the IP lookup also for CIFS volumes.
Signed-off-by: Shu-Wai Chow <shu-wai.chow@seattlechildrens.org>
Using `errors.Errorf()` passes the error with the stack trace for
debugging purposes.
Also using `errdefs.InvalidParameter` for Windows, so that the API
will return a 4xx status, instead of a 5xx, and added tests for
both validations.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Description:
When using local volume option such as size=10G, type=tmpfs, if we provide wrong options, we could create volume successfully.
But when we are ready to use it, it will fail to start container by failing to mount the local volume(invalid option).
We should check the options at when we create it.
Signed-off-by: Wentao Zhang <zhangwentao234@huawei.com>
Signed-off-by: Vincent Demeester <vincent@sbr.pm>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The errors returned from Mount and Unmount functions are raw
syscall.Errno errors (like EPERM or EINVAL), which provides
no context about what has happened and why.
Similar to os.PathError type, introduce mount.Error type
with some context. The error messages will now look like this:
> mount /tmp/mount-tests/source:/tmp/mount-tests/target, flags: 0x1001: operation not permitted
or
> mount tmpfs:/tmp/mount-test-source-516297835: operation not permitted
Before this patch, it was just
> operation not permitted
[v2: add Cause()]
[v3: rename MountError to Error, document Cause()]
[v4: fixes; audited all users]
[v5: make Error type private; changes after @cpuguy83 reviews]
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This allows non-recursive bind-mount, i.e. mount(2) with "bind" rather than "rbind".
Swarm-mode will be supported in a separate PR because of mutual vendoring.
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
These messages were enhanced to include the path that was
missing (in df6af282b9), but
also changed the first part of the message.
This change complicates running e2e tests with mixed versions
of the engine.
Looking at the full error message, "mount" is a bit redundant
as well, because the error message already indicates this is
about a "mount";
docker run --rm --mount type=bind,source=/no-such-thing,target=/foo busybox
docker: Error response from daemon: invalid mount config for type "bind": bind mount source path does not exist: /no-such-thing.
Removing the "mount" part from the error message, because
it was redundant, and makes cross-version testing easier :)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This implements chown support on Windows. Built-in accounts as well
as accounts included in the SAM database of the container are supported.
NOTE: IDPair is now named Identity and IDMappings is now named
IdentityMapping.
The following are valid examples:
ADD --chown=Guest . <some directory>
COPY --chown=Administrator . <some directory>
COPY --chown=Guests . <some directory>
COPY --chown=ContainerUser . <some directory>
On Windows an owner is only granted the permission to read the security
descriptor and read/write the discretionary access control list. This
fix also grants read/write and execute permissions to the owner.
Signed-off-by: Salahuddin Khan <salah@docker.com>
When using the mounts API, bind mounts are not supposed to be
automatically created.
Before this patch there is a race condition between valiating that a
bind path exists and then actually setting up the bind mount where the
bind path may exist during validation but was removed during mountpooint
setup.
This adds a field to the mountpoint struct to ensure that binds created
over the mounts API are not accidentally created.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This makes it a bit simpler to remove this interface for v2 plugins
and not break external projects (libnetwork and swarmkit).
Note that before we remove the `Client()` interface from `CompatPlugin`
libnetwork and swarmkit must be updated to explicitly check for the v1
client interface as is done int his PR.
This is just a minor tweak that I realized is needed after trying to
implement the needed changes on libnetwork.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This is not for the sake of test to run faster of course;
this is to simplify the code as well as have some more
testing for mount.SingleEntryFilter().
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
There is no need to parse mount table and iterate through the list of
mounts, and then call Unmount() which again parses the mount table and
iterates through the list of mounts.
It is totally OK to call Unmount() unconditionally.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Functions `GetMounts()` and `parseMountTable()` return all the entries
as read and parsed from /proc/self/mountinfo. In many cases the caller
is only interested only one or a few entries, not all of them.
One good example is `Mounted()` function, which looks for a specific
entry only. Another example is `RecursiveUnmount()` which is only
interested in mount under a specific path.
This commit adds `filter` argument to `GetMounts()` to implement
two things:
1. filter out entries a caller is not interested in
2. stop processing if a caller is found what it wanted
`nil` can be passed to get a backward-compatible behavior, i.e. return
all the entries.
A few filters are implemented:
- `PrefixFilter`: filters out all entries not under `prefix`
- `SingleEntryFilter`: looks for a specific entry
Finally, `Mounted()` is modified to use `SingleEntryFilter()`, and
`RecursiveUnmount()` is using `PrefixFilter()`.
Unit tests are added to check filters are working.
[v2: ditch NoFilter, use nil]
[v3: ditch GetMountsFiltered()]
[v4: add unit test for filters]
[v5: switch to gotestyourself]
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This moves the platform specific stuff in a separate package and keeps
the `volume` package and the defined interfaces light to import.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Instead of using a global store for volume drivers, scope the driver
store to the caller (e.g. the volume store). This makes testing much
simpler.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Changes Details:
--------------
Fixes: #36395
Refactoring the code to do the following:
1. Add the method `errBindSourceDoesNotExist` inside `validate.go` to be in-line with the rest of error message
2. Utilised the new method inside `linux_parser.go`, `windows_parser.go` and `validate_test.go`
3. Change the format from `bind mount source path: '%s' does not exist` to `bind mount source path does not exist: %s`
4. Reflected the format change into the 2 unit tests, namely: `volume_test.go` and `validate_test.go`
5. Reflected the format change into `docker_api_containers_test.go` integration test
Signed-off-by: Amr Gawish <amr.gawish@gmail.com>
Before this change, volume management was relying on the fact that
everything the plugin mounts is visible on the host within the plugin's
rootfs. In practice this caused some issues with mount leaks, so we
changed the behavior such that mounts are not visible on the plugin's
rootfs, but available outside of it, which breaks volume management.
To fix the issue, allow the plugin to scope the path correctly rather
than assuming that everything is visible in `p.Rootfs`.
In practice this is just scoping the `PropagatedMount` paths to the
correct host path.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Both lcow_parser.go and linux_parser.go are duplicating the error:
"invalid specification: destination can't be '/'"
This commit creates a new error called "ErrVolumeTargetIsRoot"
that is used by both linux_parser and lcow_parser and remove
the duplication in the code.
Signed-off-by: Boaz Shuster <ripcurld.github@gmail.com>
Instead of having to create a bunch of custom error types that are doing
nothing but wrapping another error in sub-packages, use a common helper
to create errors of the requested type.
e.g. instead of re-implementing this over and over:
```go
type notFoundError struct {
cause error
}
func(e notFoundError) Error() string {
return e.cause.Error()
}
func(e notFoundError) NotFound() {}
func(e notFoundError) Cause() error {
return e.cause
}
```
Packages can instead just do:
```
errdefs.NotFound(err)
```
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Validation of Mounts was only performed on container _creation_, not on
container _start_. As a result, if the host-path no longer existed
when the container was started, a directory was created in the given
location.
This is the wrong behavior, because when using the `Mounts` API, host paths
should never be created, and an error should be produced instead.
This patch adds a validation step on container start, and produces an
error if the host path is not found.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This subtle bug keeps lurking in because error checking for `Mkdir()`
and `MkdirAll()` is slightly different wrt to `EEXIST`/`IsExist`:
- for `Mkdir()`, `IsExist` error should (usually) be ignored
(unless you want to make sure directory was not there before)
as it means "the destination directory was already there"
- for `MkdirAll()`, `IsExist` error should NEVER be ignored.
Mostly, this commit just removes ignoring the IsExist error, as it
should not be ignored.
Also, there are a couple of cases then IsExist is handled as
"directory already exist" which is wrong. As a result, some code
that never worked as intended is now removed.
NOTE that `idtools.MkdirAndChown()` behaves like `os.MkdirAll()`
rather than `os.Mkdir()` -- so its description is amended accordingly,
and its usage is handled as such (i.e. IsExist error is not ignored).
For more details, a quote from my runc commit 6f82d4b (July 2015):
TL;DR: check for IsExist(err) after a failed MkdirAll() is both
redundant and wrong -- so two reasons to remove it.
Quoting MkdirAll documentation:
> MkdirAll creates a directory named path, along with any necessary
> parents, and returns nil, or else returns an error. If path
> is already a directory, MkdirAll does nothing and returns nil.
This means two things:
1. If a directory to be created already exists, no error is
returned.
2. If the error returned is IsExist (EEXIST), it means there exists
a non-directory with the same name as MkdirAll need to use for
directory. Example: we want to MkdirAll("a/b"), but file "a"
(or "a/b") already exists, so MkdirAll fails.
The above is a theory, based on quoted documentation and my UNIX
knowledge.
3. In practice, though, current MkdirAll implementation [1] returns
ENOTDIR in most of cases described in #2, with the exception when
there is a race between MkdirAll and someone else creating the
last component of MkdirAll argument as a file. In this very case
MkdirAll() will indeed return EEXIST.
Because of #1, IsExist check after MkdirAll is not needed.
Because of #2 and #3, ignoring IsExist error is just plain wrong,
as directory we require is not created. It's cleaner to report
the error now.
Note this error is all over the tree, I guess due to copy-paste,
or trying to follow the same usage pattern as for Mkdir(),
or some not quite correct examples on the Internet.
[1] https://github.com/golang/go/blob/f9ed2f75/src/os/path.go
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Before this, if a volume exists in a driver but not in the local cache,
the store would just return a bare volume. This means that if a user
supplied options or labels, they will not get stored.
Instead only return early if we have the volume stored locally. Note
this could still have an issue with labels/opts passed in by the user
differing from what is stored, however this isn't really a new problem.
This fixes a problem where if there is a shared storage backend between
two docker nodes, a create on one node will have labels stored and a
create on the other node will not.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
In some circumstances we were not properly releasing plugin references,
leading to failures in removing a plugin with no way to recover other
than restarting the daemon.
1. If volume create fails (in the driver)
2. If a driver validation fails (should be rare)
3. If trying to get a plugin that does not match the passed in capability
Ideally the test for 1 and 2 would just be a unit test, however the
plugin interfaces are too complicated as `plugingetter` relies on
github.com/pkg/plugin/Client (a concrete type), which will require
spinning up services from within the unit test... it just wouldn't be a
unit test at this point.
I attempted to refactor this a bit, but since both libnetwork and
swarmkit are reliant on `plugingetter` as well, this would not work.
This really requires a re-write of the lower-level plugin management to
decouple these pieces.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
When using a volume via the `Binds` API, a shared selinux label is
automatically set.
The `Mounts` API is not setting this, which makes volumes specified via
the mounts API useless when selinux is enabled.
This fix adopts the same selinux label for volumes on the mounts API as on
binds.
Note in the case of both the `Binds` API and the `Mounts` API, the
selinux label is only applied when the volume driver is the `local`
driver.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Use strongly typed errors to set HTTP status codes.
Error interfaces are defined in the api/errors package and errors
returned from controllers are checked against these interfaces.
Errors can be wraeped in a pkg/errors.Causer, as long as somewhere in the
line of causes one of the interfaces is implemented. The special error
interfaces take precedence over Causer, meaning if both Causer and one
of the new error interfaces are implemented, the Causer is not
traversed.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Current insider builds of Windows have support for mounting individual
named pipe servers from the host to the guest. This allows, for example,
exposing the docker engine's named pipe to a container.
This change allows the user to request such a mount via the normal bind
mount syntax in the CLI:
docker run -v \\.\pipe\docker_engine:\\.\pipe\docker_engine <args>
Signed-off-by: John Starks <jostarks@microsoft.com>
[1.12.x] Fix issue where volume metadata was not removed
(cherry picked from commit 7613b23a58)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Conflicts:
volume/store/store.go
volume/store/store_test.go
If a container mount the socket the daemon is listening on into
container while the daemon is being shutdown, the socket will
not exist on the host, then daemon will assume it's a directory
and create it on the host, this will cause the daemon can't start
next time.
fix issue https://github.com/moby/moby/issues/30348
To reproduce this issue, you can add following code
```
--- a/daemon/oci_linux.go
+++ b/daemon/oci_linux.go
@@ -8,6 +8,7 @@ import (
"sort"
"strconv"
"strings"
+ "time"
"github.com/Sirupsen/logrus"
"github.com/docker/docker/container"
@@ -666,7 +667,8 @@ func (daemon *Daemon) createSpec(c *container.Container) (*libcontainerd.Spec, e
if err := daemon.setupIpcDirs(c); err != nil {
return nil, err
}
-
+ fmt.Printf("===please stop the daemon===\n")
+ time.Sleep(time.Second * 2)
ms, err := daemon.setupMounts(c)
if err != nil {
return nil, err
```
step1 run a container which has `--restart always` and `-v /var/run/docker.sock:/sock`
```
$ docker run -ti --restart always -v /var/run/docker.sock:/sock busybox
/ #
```
step2 exit the the container
```
/ # exit
```
and kill the daemon when you see
```
===please stop the daemon===
```
in the daemon log
The daemon can't restart again and fail with `can't create unix socket /var/run/docker.sock: is a directory`.
Signed-off-by: Lei Jitang <leijitang@huawei.com>