This patch adds some tests to ensure that quoted flags are properly
handled by the mflag package.
Docker-DCO-1.1-Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> (github: cyphar)
This patch improves the mflag package to ensure that things arguments
to mflag such as `-flag="var"` or `-flag='var'` have the quotes
stripped from the value (to mirror the getopt functionality for similar
flags).
Docker-DCO-1.1-Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> (github: cyphar)
This also makes sure that devices are pointers to avoid copies
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
We now have one place that keeps track of (most) devices that are allowed and created within the container. That place is pkg/libcontainer/devices/devices.go
This fixes several inconsistencies between which devices were created in the lxc backend and the native backend. It also fixes inconsistencies between wich devices were created and which were allowed. For example, /dev/full was being created but it was not allowed within the cgroup. It also declares the file modes and permissions of the default devices, rather than copying them from the host. This is in line with docker's philosphy of not being host dependent.
Docker-DCO-1.1-Signed-off-by: Timothy Hobbs <timothyhobbs@seznam.cz> (github: https://github.com/timthelion)
Remove old Stats interface in libcontainers cgroups package.
Changed Stats to use unit64 instead of int64 to prevent integer overflow issues.
Updated unit tests.
Docker-DCO-1.1-Signed-off-by: Vishnu Kannan <vishnuk@google.com> (github: vishh)
There is no need for this, the device node by itself doesn't work, since
its not on a devpts fs, and we can just a regular file to bind mount over.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
It is no longer necessary to pass "SETUID" or "SETGID" capabilities to
the container when a "user" is specified in the config.
Docker-DCO-1.1-Signed-off-by: Bernerd Schaefer <bj.schaefer@gmail.com> (github: bernerdschaefer)
Fixes#2586
This fixes a few races where the name generator asks if a name is free
but another container takes the name before it can be reserved. This
solves this by generating the name and setting it. If the set fails
with a non unique error then we try again.
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
Add specific types for Required and Optional DeviceNodes
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
Without this any container startup fails:
2014/05/20 09:20:36 setup mount namespace copy additional dev nodes mknod fuse operation not permitted
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
Fixes#5849
If the host system does not have fuse enabled in the kernel config we
will ignore the is not exist errors when trying to copy the device node
from the host system into the container.
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
Some applications want to write to /proc. For instance:
docker run -it centos groupadd foo
Gives: groupadd: failure while writing changes to /etc/group
And strace reveals why:
open("/proc/self/task/13/attr/fscreate", O_RDWR) = -1 EROFS (Read-only file system)
I've looked at what other systems do, and systemd-nspawn makes /proc read-write
and /proc/sys readonly, while lxc allows "proc:mixed" which does the same,
plus it makes /proc/sysrq-trigger also readonly.
The later seems like a prudent idea, so we follows lxc proc:mixed.
Additionally we make /proc/irq and /proc/bus, as these seem to let
you control various hardware things.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
those that were specified in the config. This commit also explicitly
adds a set of capabilities that we were silently not dropping and were
assumed by the tests.
Docker-DCO-1.1-Signed-off-by: Victor Marmol <vmarmol@google.com> (github: vmarmol)
ideally it should never reach it, but there was already multiple issues with infinite loop
at following symlinks. this fixes hanging unit tests
Docker-DCO-1.1-Signed-off-by: Lajos Papp <lajos.papp@sequenceiq.com> (github: lalyos)
normally symlinks are created as either
ln -s /path/existing /path/new-name
or
cd /path && ln -s ./existing new-name
but one can create it this way
cd /path && ln -s existing new-name
this drives FollowSymlinkInScope into infinite loop
Docker-DCO-1.1-Signed-off-by: Lajos Papp <lajos.papp@sequenceiq.com> (github: lalyos)
We don't need this because it is covered by the libcontainer MAINTAINERS
file
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
After copying allowed device nodes, set up "/dev/fd", "/dev/stdin",
"/dev/stdout", and "/dev/stderr" symlinks.
Docker-DCO-1.1-Signed-off-by: Bernerd Schaefer <bj.schaefer@gmail.com> (github: bernerdschaefer)
[rebased by @crosbymichael]
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
Before we create any files to bind-mount on, make sure they are
inside the container rootfs, handling for instance absolute symbolic
links inside the container.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
All modern distros set up /run to be a tmpfs, see for instance:
https://wiki.debian.org/ReleaseGoals/RunDirectory
Its a very useful place to store pid-files, sockets and other things
that only live at runtime and that should not be stored in the image.
This is also useful when running systemd inside a container, as it
will try to mount /run if not already mounted, which will fail for
non-privileged container.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
If you specify a bind mount in a place that doesn't have a file yet we
create that (and parent directories). This is needed because otherwise
you can't use volumes like e.g. /dev/log, as that gets covered by the
/dev tmpfs mounts.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
This is a convenience for callers which are only interested in one value
per key. Similar to how HTTP headers allow multiple keys per value, but
are often used to store and retrieve only one value.
Docker-DCO-1.1-Signed-off-by: Solomon Hykes <solomon@docker.com> (github: shykes)
This introduces a superficial change to the Beam API:
* `beam.SendPipe` is renamed to the more accurate `beam.SendRPipe`
* `beam.SendWPipe` is introduced as a mirror to `SendRPipe`
There is no other change in the beam API.
Docker-DCO-1.1-Signed-off-by: Solomon Hykes <solomon@docker.com> (github: shykes)
Update restrict.Restrict to both show the error message when failing to mount /dev/null over /proc/kcore, and to ignore "not exists" errors while doing so (for when CONFIG_PROC_KCORE=n in the kernel)
This patch fixes the fact that the tests for pkg/networkfs/etchosts
couldn't build due to syntax errors.
Docker-DCO-1.1-Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> (github: cyphar)
This updates systemd.Apply to match the fs backend by:
* Always join blockio controller (for stats)
* Support CpusetCpus
* Support MemorySwap
Also, it removes the generic UnitProperties in favour of a single
option to set the slice.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
We don't have the flexibility to do extra things with lxc because it is
a black box and most fo the magic happens before we get a chance to
interact with it in dockerinit.
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
There is not need for the remount hack, we use aa_change_onexec so the
apparmor profile is not applied until we exec the users app.
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
This also cleans up some of the left over restriction paths code from
before.
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
It has been pointed out that some files in /proc and /sys can be used
to break out of containers. However, if those filesystems are mounted
read-only, most of the known exploits are mitigated, since they rely
on writing some file in those filesystems.
This does not replace security modules (like SELinux or AppArmor), it
is just another layer of security. Likewise, it doesn't mean that the
other mitigations (shadowing parts of /proc or /sys with bind mounts)
are useless. Those measures are still useful. As such, the shadowing
of /proc/kcore is still enabled with both LXC and native drivers.
Special care has to be taken with /proc/1/attr, which still needs to
be mounted read-write in order to enable the AppArmor profile. It is
bind-mounted from a private read-write mount of procfs.
All that enforcement is done in dockerinit. The code doing the real
work is in libcontainer. The init function for the LXC driver calls
the function from libcontainer to avoid code duplication.
Docker-DCO-1.1-Signed-off-by: Jérôme Petazzoni <jerome@docker.com> (github: jpetazzo)
Kernel capabilities for privileged syslog operations are currently splitted into
CAP_SYS_ADMIN and CAP_SYSLOG since the following commit:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ce6ada35bdf710d16582cc4869c26722547e6f11
This patch drops CAP_SYSLOG to prevent containers from messing with
host's syslog (e.g. `dmesg -c` clears up host's printk ring buffer).
Closes#5491
Docker-DCO-1.1-Signed-off-by: Eiichi Tsukata <devel@etsukata.com> (github: Etsukata)
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
This is needed for Send/Recieve to correctly handle borders between
the messages.
The framing uses a single 32bit uint32 length for each frame, of which
the high bit is used to indicate whether the message contains a file
descriptor or not. This is enough to separate out each message sent
and to decide to which message each file descriptors belongs, even
though multiple Sends may be coalesced into a single read, and/or one
Send can be split into multiple writes.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
Docker-DCO-1.1-Signed-off-by: Solomon Hykes <solomon@docker.com> (github: shykes)
No need to duplicate this information when we already have a
container.json file in the root of libcontainer
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
This duplicates some of the Exec code but I think it it worth it because
the native driver is more straight forward and does not have the
complexity have handling the type issues for now.
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
This temp. expands the Exec method's signature but adds a more robust
way to know when the container's process is actually released and begins
to run. The network interfaces are not guaranteed to be up yet but this
provides a more accurate view with a single callback at this time.
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
These are failing, and indicate things that need to be fixed. The
primarily problem is the lack of framing between beam messages.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
[solomon@docker.com: rebased on master]
Signed-off-by: Solomon Hykes <solomon@docker.com>
Without this patch, containers inherit the open file descriptors of the daemon, so my "exec 42>&2" allows us to "echo >&42 some nasty error with some bad advice" directly into the daemon log. :)
Also, "hack/dind" was already doing this due to issues caused by the inheritance, so I'm removing that hack too since this patch obsoletes it by generalizing it for all containers.
Docker-DCO-1.1-Signed-off-by: Andrew Page <admwiggin@gmail.com> (github: tianon)