Mounting a container's volumes under its rootfs directory inside the
host mount namespace causes problems with cross-namespace mount
propagation when /var/lib/docker is bind-mounted into the container as a
volume. The mount event propagates into the container's mount namespace,
overmounting the volume, but the propagated unmount events do not fully
reverse the effect. Each archive operation causes the mount table in the
container's mount namespace to grow larger and larger, until the kernel
limiton the number of mounts in a namespace is hit. The only solution to
this issue which is not subject to race conditions or other blocker
caveats is to avoid mounting volumes into the container's rootfs
directory in the host mount namespace in the first place.
Mount the container volumes inside an unshared mount namespace to
prevent any mount events from propagating into any other mount
namespace. Greatly simplify the archiving implementations by also
chrooting into the container rootfs to sidestep the need to resolve
paths in the host.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Switch to moby/sys/mount and mountinfo. Keep the pkg/mount for potential
outside users.
This commit was generated by the following bash script:
```
set -e -u -o pipefail
for file in $(git grep -l 'docker/docker/pkg/mount"' | grep -v ^pkg/mount); do
sed -i -e 's#/docker/docker/pkg/mount"#/moby/sys/mount"#' \
-e 's#mount\.\(GetMounts\|Mounted\|Info\|[A-Za-z]*Filter\)#mountinfo.\1#g' \
$file
goimports -w $file
done
```
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
For bind mounts, fstype argument to mount(2) is ignored.
Usual convention is either empty string or "none".
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This allows non-recursive bind-mount, i.e. mount(2) with "bind" rather than "rbind".
Swarm-mode will be supported in a separate PR because of mutual vendoring.
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
This implements chown support on Windows. Built-in accounts as well
as accounts included in the SAM database of the container are supported.
NOTE: IDPair is now named Identity and IDMappings is now named
IdentityMapping.
The following are valid examples:
ADD --chown=Guest . <some directory>
COPY --chown=Administrator . <some directory>
COPY --chown=Guests . <some directory>
COPY --chown=ContainerUser . <some directory>
On Windows an owner is only granted the permission to read the security
descriptor and read/write the discretionary access control list. This
fix also grants read/write and execute permissions to the owner.
Signed-off-by: Salahuddin Khan <salah@docker.com>
This moves the platform specific stuff in a separate package and keeps
the `volume` package and the defined interfaces light to import.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
If the user specifies a mountpath from the host, we should not be
attempting to chown files outside the daemon's metadata directory
(represented by `daemon.repository` at init time).
This forces users who want to use user namespaces to handle the
ownership needs of any external files mounted as network files
(/etc/resolv.conf, /etc/hosts, /etc/hostname) separately from the
daemon. In all other volume/bind mount situations we have taken this
same line--we don't chown host file content.
Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com>
Migrate legacy volumes (Daemon.verifyVolumesInfo) before containers are
registered on the Daemon, so state on disk is not overwritten and legacy
fields lost during registration.
Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
Reuse existing structures and rely on json serialization to deep copy
Container objects.
Also consolidate all "save" operations on container.CheckpointTo, which
now both saves a serialized json to disk, and replicates state to the
ACID in-memory store.
Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
It operates on containers that have already been registered on the
daemon, and are visible to other goroutines.
Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
There is no case which would resolve in this error. The root user always exists, and if the id maps are empty, the default value of 0 is correct.
Signed-off-by: Daniel Nephin <dnephin@docker.com>
If a container mount the socket the daemon is listening on into
container while the daemon is being shutdown, the socket will
not exist on the host, then daemon will assume it's a directory
and create it on the host, this will cause the daemon can't start
next time.
fix issue https://github.com/moby/moby/issues/30348
To reproduce this issue, you can add following code
```
--- a/daemon/oci_linux.go
+++ b/daemon/oci_linux.go
@@ -8,6 +8,7 @@ import (
"sort"
"strconv"
"strings"
+ "time"
"github.com/Sirupsen/logrus"
"github.com/docker/docker/container"
@@ -666,7 +667,8 @@ func (daemon *Daemon) createSpec(c *container.Container) (*libcontainerd.Spec, e
if err := daemon.setupIpcDirs(c); err != nil {
return nil, err
}
-
+ fmt.Printf("===please stop the daemon===\n")
+ time.Sleep(time.Second * 2)
ms, err := daemon.setupMounts(c)
if err != nil {
return nil, err
```
step1 run a container which has `--restart always` and `-v /var/run/docker.sock:/sock`
```
$ docker run -ti --restart always -v /var/run/docker.sock:/sock busybox
/ #
```
step2 exit the the container
```
/ # exit
```
and kill the daemon when you see
```
===please stop the daemon===
```
in the daemon log
The daemon can't restart again and fail with `can't create unix socket /var/run/docker.sock: is a directory`.
Signed-off-by: Lei Jitang <leijitang@huawei.com>
`Mounts` allows users to specify in a much safer way the volumes they
want to use in the container.
This replaces `Binds` and `Volumes`, which both still exist, but
`Mounts` and `Binds`/`Volumes` are exclussive.
The CLI will continue to use `Binds` and `Volumes` due to concerns with
parsing the volume specs on the client side and cross-platform support
(for now).
The new API follows exactly the services mount API.
Example usage of `Mounts`:
```
$ curl -XPOST localhost:2375/containers/create -d '{
"Image": "alpine:latest",
"HostConfig": {
"Mounts": [{
"Type": "Volume",
"Target": "/foo"
},{
"Type": "bind",
"Source": "/var/run/docker.sock",
"Target": "/var/run/docker.sock",
},{
"Type": "volume",
"Name": "important_data",
"Target": "/var/data",
"ReadOnly": true,
"VolumeOptions": {
"DriverConfig": {
Name: "awesomeStorage",
Options: {"size": "10m"},
Labels: {"some":"label"}
}
}]
}
}'
```
There are currently 2 types of mounts:
- **bind**: Paths on the host that get mounted into the
container. Paths must exist prior to creating the container.
- **volume**: Volumes that persist after the
container is removed.
Not all fields are available in each type, and validation is done to
ensure these fields aren't mixed up between types.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This was removed in a clean-up
(060f4ae617) but should not have been.
Fixes issues with volumes when upgrading from pre-1.7.0 daemons.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This makes sure that:
1. Already existing directories are left untouched
2. Newly created directories are chowned to the correct root UID/GID in case of user namespaces
3. All parent directories still get created with host root UID/GID
Fix#21738
Signed-off-by: Antonis Kalipetis <akalipetis@gmail.com>
In order to be consistent on creation of volumes for bind mounts
we need to create the source directory if it does not exist and the
user specified he wants it relabeled.
Can not do this lower down the stack, since we are not passing in the
mode fields.
Signed-off-by: Dan Walsh <dwalsh@redhat.com>
Currently on daemon start volumes are "created" which involves invoking
a volume driver if needed. If this process fails the mount is left in a
bad state in which there is no source or Volume set. This now becomes
an unrecoverable state in which that container can not be started. The
only way to fix is to restart the daemon and hopefully you don't get
another error on startup.
This change moves "createVolume" to be done at container start. If the
start fails it leaves it in the state in which you can try another
start. If the second start can contact the volume driver everything
will recover fine.
Signed-off-by: Darren Shepherd <darren@rancher.com>
Allow passing mount propagation option shared, slave, or private as volume
property.
For example.
docker run -ti -v /root/mnt-source:/root/mnt-dest:slave fedora bash
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
So other packages don't need to import the daemon package when they
want to use this struct.
Signed-off-by: David Calavera <david.calavera@gmail.com>
Signed-off-by: Tibor Vass <tibor@docker.com>
By removing deprecated volume structures, now that windows mount volumes we don't need a initializer per platform.
Signed-off-by: David Calavera <david.calavera@gmail.com>
All the go-lint work forced any existing "Uid" -> "UID", but seems to
not have the same rules for Gid, so stat package has calls UID() and
Gid().
Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)
Adds support for the daemon to handle user namespace maps as a
per-daemon setting.
Support for handling uid/gid mapping is added to the builder,
archive/unarchive packages and functions, all graphdrivers (except
Windows), and the test suite is updated to handle user namespace daemon
rootgraph changes.
Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)
Although having a request ID available throughout the codebase is very
valuable, the impact of requiring a Context as an argument to every
function in the codepath of an API request, is too significant and was
not properly understood at the time of the review.
Furthermore, mixing API-layer code with non-API-layer code makes the
latter usable only by API-layer code (one that has a notion of Context).
This reverts commit de41640435, reversing
changes made to 7daeecd42d.
Signed-off-by: Tibor Vass <tibor@docker.com>
Conflicts:
api/server/container.go
builder/internals.go
daemon/container_unix.go
daemon/create.go
This PR adds a "request ID" to each event generated, the 'docker events'
stream now looks like this:
```
2015-09-10T15:02:50.000000000-07:00 [reqid: c01e3534ddca] de7c5d4ca927253cf4e978ee9c4545161e406e9b5a14617efb52c658b249174a: (from ubuntu) create
```
Note the `[reqID: c01e3534ddca]` part, that's new.
Each HTTP request will generate its own unique ID. So, if you do a
`docker build` you'll see a series of events all with the same reqID.
This allow for log processing tools to determine which events are all related
to the same http request.
I didn't propigate the context to all possible funcs in the daemon,
I decided to just do the ones that needed it in order to get the reqID
into the events. I'd like to have people review this direction first, and
if we're ok with it then I'll make sure we're consistent about when
we pass around the context - IOW, make sure that all funcs at the same level
have a context passed in even if they don't call the log funcs - this will
ensure we're consistent w/o passing it around for all calls unnecessarily.
ping @icecrime @calavera @crosbymichael
Signed-off-by: Doug Davis <dug@us.ibm.com>
Noteworthy changes:
- Add Prestart/Poststop hook support
- Fix bug finding cgroup mount directory
- Add OomScoreAdj as a container configuration option
- Ensure the cleanup jobs in the deferrer are executed on error
- Don't make modifications to /dev when it is bind mounted
Other changes in runc:
https://github.com/opencontainers/runc/compare/v0.0.3...v0.0.4
Signed-off-by: David Calavera <david.calavera@gmail.com>