This patch updates container.getResourcePath and container.getRootResourcePath
to return the error from symlink.FollowSymlinkInScope (rather than using utils).
Docker-DCO-1.1-Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> (github: cyphar)
Remove Inject to help rebase
Docker-DCO-1.1-Signed-off-by: Tibor Vass <teabee89@gmail.com> (github: tiborvass)
Docker-DCO-1.1-Signed-off-by: cyphar <cyphar@cyphar.com> (github: tiborvass)
as a maintainer.
Best of luck on your e-commerce business Guillaume, and thanks for all
the great contributions!
Docker-DCO-1.1-Signed-off-by: Solomon Hykes <solomon@docker.com> (github: shykes)
If this is at the root directory for the daemon you could unmount
somones filesystem when you stop docker and this is actually only needed
for the palces that the graph drivers mount the container's root
filesystems.
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
The blkdiscard hack we do on container/image delete is pretty slow, but
required to restore space to the "host" root filesystem. However, it
is pretty useless on raw devices, and you may not need it in development
either.
In a simple test of the devicemapper backend on loopback the time to
delete 20 container went from 11 seconds to 0.4 seconds with
--storage-opt blkdiscard=false.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
This adds dm.datadev and dm.metadatadev options that you can use with
--storage-opt to set to specific devices to use for the thin
provisioning pool.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
This adds the following --storage-opts for the daemon:
dm.fs: The filesystem to use for the base image
dm.mkfsarg: Add an argument to the mkfs command for the base image
dm.mountopt: Add a mount option for devicemapper mount
Currently supported filesystems are xfs and ext4.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
This allows setting these settings to be passed:
dm.basesize
dm.loopdatasize
dm.loopmetadatasize
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
If we can't even get the current device mapper driver version, then
we cleanly fail the devmapper driver as not supported and fall back
on the next one.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
This patch adds pause/unpause to the command line, api, and drivers
for use on containers. This is implemented using the cgroups/freeze
utility in libcontainer and lxc freeze/unfreeze.
Co-Authored-By: Eric Windisch <ewindisch@docker.com>
Co-Authored-By: Chris Alfonso <calfonso@redhat.com>
Docker-DCO-1.1-Signed-off-by: Ian Main <imain@redhat.com> (github: imain)
This only works if the file or dir is already created in
the image before setting it to be a volume. There is no way around this
because we don't have the data avaliable to set the volume at the
beginning of the dockerfile
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
This is a fix for a race condition in the LXC driver. This is described
more in issue #6092.
Closes#6092
Docker-DCO-1.1-Signed-off-by: Shane Canon <scanon@lbl.gov> (github: scanon)
This also makes sure that devices are pointers to avoid copies
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
We now have one place that keeps track of (most) devices that are allowed and created within the container. That place is pkg/libcontainer/devices/devices.go
This fixes several inconsistencies between which devices were created in the lxc backend and the native backend. It also fixes inconsistencies between wich devices were created and which were allowed. For example, /dev/full was being created but it was not allowed within the cgroup. It also declares the file modes and permissions of the default devices, rather than copying them from the host. This is in line with docker's philosphy of not being host dependent.
Docker-DCO-1.1-Signed-off-by: Timothy Hobbs <timothyhobbs@seznam.cz> (github: https://github.com/timthelion)
There are two cases where we can't use a graphdriver:
1) the graphdriver itself isn't supported by the system
2) the graphdriver is supported by some configuration/prerequisites are
missing
This introduces a new error for the 2) case and uses it when trying to
run docker with btrfs backend on a non-btrfs filesystem.
Docker-DCO-1.1-Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org> (github: discordianfish)
We don't need ordered set anymore, also some cleanings and simple
benchmark.
Docker-DCO-1.1-Signed-off-by: Alexandr Morozov <lk4d4math@gmail.com> (github: LK4D4)
This patch fixes the incorrect handling of paths which contain a
symlink as a path component when copying data from a container.
Essentially, this patch changes the container.Copy() method to
first "resolve" the resource by resolving all of symlinks encountered
in the path relative to the container's rootfs (using pkg/symlink).
Docker-DCO-1.1-Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> (github: cyphar)
Fixes#2586
This fixes a few races where the name generator asks if a name is free
but another container takes the name before it can be reserved. This
solves this by generating the name and setting it. If the set fails
with a non unique error then we try again.
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
There is no reason to do discard durink mkfs, as the filesystem
is on a newly allocated device anyway. Discard is a slow operation,
so this may help initial startup a bit, especially if you use a larger
thin pool.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
Add specific types for Required and Optional DeviceNodes
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
Fixes#5692
This change requires lxc 1.0+ to work and breaks lxc versions less than
1.0 for host networking. We think that this is a find tradeoff by
bumping docker to only support lxc 1.0
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
We need SETFCAP to be able to mark files as having caps, which is
heavily used by fedora.
See https://github.com/dotcloud/docker/issues/5928
We also need SETPCAP, for instance systemd needs this to set caps
on its childen.
Both of these are safe in the sense that they can never ever
result in a process with a capability not in the bounding set of the
container.
We also add NET_BIND_SERVICE caps, to be able to bind to ports lower
than 1024.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
systemd systems do not require a /etc/hosts file exists since an nss
module is shipped that creates localhost implicitly. So, mounting
/etc/hosts can fail on these sorts of systems, as was reported on CoreOS
in issue #5812.
Instead of trying to bind mount just copy the hosts entries onto the
containers private /etc/hosts.
Docker-DCO-1.1-Signed-off-by: Brandon Philips <brandon.philips@coreos.com> (github: philips)
Now that we have the generic graphtest tests that actually tests
the driver we can remove the old mock-using tests. Almost all of
these tests were disabled anyway, and the four remaining ones
didn't really test much while at the same time being really
fragile and making the rest of the code more complex due to
the mocking setup.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
those that were specified in the config. This commit also explicitly
adds a set of capabilities that we were silently not dropping and were
assumed by the tests.
Docker-DCO-1.1-Signed-off-by: Victor Marmol <vmarmol@google.com> (github: vmarmol)
This commit makes the Docker daemon call UpdateSuffixarray only after
it finishes registering all containers.
This lowers the amount of time required for the Docker daemon to start
up.
Docker-DCO-1.1-Signed-off-by: Cristian Staretu <cristian.staretu@gmail.com> (github: unclejack)
This moves the call to sort in daemon/history to a function to be
called explicitly when we're done adding elements to the list.
This speeds up `docker ps`.
Docker-DCO-1.1-Signed-off-by: Cristian Staretu <cristian.staretu@gmail.com> (github: unclejack)
Now IP reuses only after all IPs from network was allocated
Fixes#5729
Docker-DCO-1.1-Signed-off-by: Alexandr Morozov <lk4d4math@gmail.com> (github: LK4D4)
This patch is a preventative patch, it fixes possible future
vulnerabilities regarding unsantised paths. Due to several recent
vulnerabilities, wherein the docker daemon could be fooled into
accessing data from the host (rather than a container), this patch
was created to try and mitigate future possible vulnerabilities in
the same vein.
Docker-DCO-1.1-Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> (github: cyphar)
This patch fixes the bug that allowed cp to copy files outside of
the containers rootfs, by passing a relative path (such as
../../../../../../../../etc/shadow). This is fixed by first converting
the path to an absolute path (relative to /) and then appending it
to the container's rootfs before continuing.
Docker-DCO-1.1-Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> (github: cyphar)
All modern distros set up /run to be a tmpfs, see for instance:
https://wiki.debian.org/ReleaseGoals/RunDirectory
Its a very useful place to store pid-files, sockets and other things
that only live at runtime and that should not be stored in the image.
This is also useful when running systemd inside a container, as it
will try to mount /run if not already mounted, which will fail for
non-privileged container.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
The btrfs driver attempts to stat the /var/lib/docker directory to
ensure it exists. If it doesn't exist then it bails with an unhelpful
log line:
```
2014/05/10 00:51:30 no such file or directory
```
In 0.10 the directory was created but quickly digging through the logs I
can't tell what sort of re-ordering of code caused this regression.
Docker-DCO-1.1-Signed-off-by: Brandon Philips <brandon.philips@coreos.com> (github: philips)
For now this means the btrfs backend is skipped when run
inside make test. You can however run it manually if you want.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
If a graphdriver fails initialization due to ErrNotSupported we ignore
that and keep trying the next. But if some driver has a different
error (for instance if you specified an unknown option for it) we fail
the daemon startup, printing the error, rather than falling back to an
unexected driver (typically vfs) which may not match what you have run
earlier.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
This adds daemon/graphdriver/graphtest/graphtest which has a few
generic tests for all graph drivers, and then uses these
from the btrs, devicemapper and vfs backends.
I've not yet added the aufs backend, because i can't test that here
atm. It should work though.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
Currently the tests that mocks or denies functions leave this state
around for the next test. This is no good if we want to actually
test the devicemapper code in later tests.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
This moves the Attach method from the container to the daemon. This
method mostly supports the http attach logic and does not have anything
to do with the running of a container.
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
This updates systemd.Apply to match the fs backend by:
* Always join blockio controller (for stats)
* Support CpusetCpus
* Support MemorySwap
Also, it removes the generic UnitProperties in favour of a single
option to set the slice.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
Also make sure we copy the joining containers hosts and resolv.conf with
the hostname if we are joining it's network stack.
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
This is required to address a race condition described in #5553,
where a container can be partially deleted -- for example, the
root filesystem but not the init filesystem -- which makes
it impossible to delete the container without re-adding the
missing filesystems manually.
This behavior has been witnessed when rebooting boxes that
are configured to remove containers on shutdown in parallel
with stopping the Docker daemon.
Docker-DCO-1.1-Signed-off-by: Gabriel Monroy <gabriel@opdemand.com> (github: gabrtv)
We don't have the flexibility to do extra things with lxc because it is
a black box and most fo the magic happens before we get a chance to
interact with it in dockerinit.
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
This also cleans up some of the left over restriction paths code from
before.
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
It has been pointed out that some files in /proc and /sys can be used
to break out of containers. However, if those filesystems are mounted
read-only, most of the known exploits are mitigated, since they rely
on writing some file in those filesystems.
This does not replace security modules (like SELinux or AppArmor), it
is just another layer of security. Likewise, it doesn't mean that the
other mitigations (shadowing parts of /proc or /sys with bind mounts)
are useless. Those measures are still useful. As such, the shadowing
of /proc/kcore is still enabled with both LXC and native drivers.
Special care has to be taken with /proc/1/attr, which still needs to
be mounted read-write in order to enable the AppArmor profile. It is
bind-mounted from a private read-write mount of procfs.
All that enforcement is done in dockerinit. The code doing the real
work is in libcontainer. The init function for the LXC driver calls
the function from libcontainer to avoid code duplication.
Docker-DCO-1.1-Signed-off-by: Jérôme Petazzoni <jerome@docker.com> (github: jpetazzo)
Kernel capabilities for privileged syslog operations are currently splitted into
CAP_SYS_ADMIN and CAP_SYSLOG since the following commit:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ce6ada35bdf710d16582cc4869c26722547e6f11
This patch drops CAP_SYSLOG to prevent containers from messing with
host's syslog (e.g. `dmesg -c` clears up host's printk ring buffer).
Closes#5491
Docker-DCO-1.1-Signed-off-by: Eiichi Tsukata <devel@etsukata.com> (github: Etsukata)
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
This duplicates some of the Exec code but I think it it worth it because
the native driver is more straight forward and does not have the
complexity have handling the type issues for now.
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
This temp. expands the Exec method's signature but adds a more robust
way to know when the container's process is actually released and begins
to run. The network interfaces are not guaranteed to be up yet but this
provides a more accurate view with a single callback at this time.
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
Without this patch, containers inherit the open file descriptors of the daemon, so my "exec 42>&2" allows us to "echo >&42 some nasty error with some bad advice" directly into the daemon log. :)
Also, "hack/dind" was already doing this due to issues caused by the inheritance, so I'm removing that hack too since this patch obsoletes it by generalizing it for all containers.
Docker-DCO-1.1-Signed-off-by: Andrew Page <admwiggin@gmail.com> (github: tianon)
Added --selinux-enable switch to daemon to enable SELinux labeling.
The daemon will now generate a new unique random SELinux label when a
container starts, and remove it when the container is removed. The MCS
labels will be stored in the daemon memory. The labels of containers will
be stored in the container.json file.
When the daemon restarts on boot or if done by an admin, it will read all containers json files and reserve the MCS labels.
A potential problem would be conflicts if you setup thousands of containers,
current scheme would handle ~500,000 containers.
Docker-DCO-1.1-Signed-off-by: Dan Walsh <dwalsh@redhat.com> (github: rhatdan)
Docker-DCO-1.1-Signed-off-by: Dan Walsh <dwalsh@redhat.com> (github: crosbymichael)
This has every container using the docker daemon's pid for the processes
label so it does not work correctly.
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
This allows multiple instances of the backend in different containers
to access devices (although generally only one can modify/create them).
Any old metadata is converted on the first run.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
Instead of globally keeping track of the free device ids we just
start from 0 each run and handle EEXIST error and try the next one.
This way we don't need any global state for the device ids, which
means we can read device metadata lazily. This is important for
multi-process use of the backend.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
This moves the EBUSY detection to devmapper.go, and then returns
a real ErrBusy that deviceset uses.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
This will allow for these to be set independently. Keep the current Docker behavior where Memory and MemoryReservation are set to the value of Memory.
Docker-DCO-1.1-Signed-off-by: Victor Marmol <vmarmol@google.com> (github: vmarmol)
container.Kill() might read a pid of 0 from
container.State.Pid due to losing a race with
container.monitor() calling
container.State.SetStopped(). Sending a SIGKILL to
pid 0 is undesirable as "If pid equals 0, then sig
is sent to every process in the process group of
the calling process."
Docker-DCO-1.1-Signed-off-by: Daniel Norberg <daniel.norberg@gmail.com> (github: danielnorberg)
We used to mount in Create() to be able to create a few files that
needs to be in each device. However, this mount is problematic for
selinux, as we need to set the mount label at mount-time, and it
is not known at the time of Create().
This change just moves the file creation to first Get() call and
drops the mount from Create(). Additionally, this lets us remove
some complexities we had to avoid an extra unmount+mount cycle.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
container.Register() checks both IsRunning() and IsGhost(), but at
this point IsGhost() is always true if IsRunning() is true. For a
newly created container both are false, and for a restored-from-disk
container Daemon.load() sets Ghost to true if IsRunning is true. So we
just drop the IsGhost check.
This was the last call to IsGhost, so we remove It and all other
traces of the ghost state.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)