container.
docker run -v /dev:/dev should stop mounting other default mounts in i
libcontainer otherwise directories and devices like /dev/ptx get mishandled.
We want to be able to run libvirtd for launching vms and it needs
access to the hosts /dev. This is a key componant of OpenStack.
Docker-DCO-1.1-Signed-off-by: Dan Walsh <dwalsh@redhat.com> (github: rhatdan)
Add a --readonly flag to allow the container's root filesystem to be
mounted as readonly. This can be used in combination with volumes to
force a container's process to only write to locations that will be
persisted. This is useful in many cases where the admin controls where
they would like developers to write files and error on any other
locations.
Closes#7923Closes#8752
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
We want to be able to use container without the PID namespace. We basically
want containers that can manage the host os, which I call Super Privileged
Containers. We eventually would like to get to the point where the only
namespace we use is the MNT namespace to bring the Apps userspace with it.
By eliminating the PID namespace we can get better communication between the
host and the clients and potentially tools like strace and gdb become easier
to use. We also see tools like libvirtd running within a container telling
systemd to place a VM in a particular cgroup, we need to have communications of the PID.
I don't see us needing to share PID namespaces between containers, since this
is really what docker exec does.
So currently I see us just needing docker run --pid=host
Docker-DCO-1.1-Signed-off-by: Dan Walsh <dwalsh@redhat.com> (github: rhatdan)
This commit contains changes for docker:
* user.GetGroupFile to user.GetGroupPath docker/libcontainer#301
* Add systemd support for OOM docker/libcontainer#307
* Support for custom namespaces docker/libcontainer#279, docker/libcontainer#312
* Fixes#9699docker/libcontainer#308
Signed-off-by: Alexander Morozov <lk4d4@docker.com>
Some workloads rely on IPC for communications with other processes. We
would like to split workloads between two container but still allow them
to communicate though shared IPC.
This patch mimics the --net code to allow --ipc=host to not split off
the IPC Namespace. ipc=container:CONTAINERID to share ipc between containers
If you share IPC between containers, then you need to make sure SELinux labels
match.
Docker-DCO-1.1-Signed-off-by: Dan Walsh <dwalsh@redhat.com> (github: rhatdan)
Right now, MAC addresses are randomly generated by the kernel when
creating the veth interfaces.
This causes different issues related to ARP, such as #4581, #5737 and #8269.
This change adds support for consistent MAC addresses, guaranteeing that
an IP address will always end up with the same MAC address, no matter
what.
Since IP addresses are already guaranteed to be unique by the
IPAllocator, MAC addresses will inherit this property as well for free.
Consistent mac addresses is also a requirement for stable networking (#8297)
since re-using the same IP address on a different MAC address triggers the ARP
issue.
Finally, this change makes the MAC address accessible through docker
inspect, which fixes#4033.
Signed-off-by: Andrea Luzzardi <aluzzardi@gmail.com>
This also removes dead code in the native driver for a past feature that
was never fully implemented.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This also makes sure that devices are pointers to avoid copies
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
We now have one place that keeps track of (most) devices that are allowed and created within the container. That place is pkg/libcontainer/devices/devices.go
This fixes several inconsistencies between which devices were created in the lxc backend and the native backend. It also fixes inconsistencies between wich devices were created and which were allowed. For example, /dev/full was being created but it was not allowed within the cgroup. It also declares the file modes and permissions of the default devices, rather than copying them from the host. This is in line with docker's philosphy of not being host dependent.
Docker-DCO-1.1-Signed-off-by: Timothy Hobbs <timothyhobbs@seznam.cz> (github: https://github.com/timthelion)
Add specific types for Required and Optional DeviceNodes
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
This also cleans up some of the left over restriction paths code from
before.
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
It has been pointed out that some files in /proc and /sys can be used
to break out of containers. However, if those filesystems are mounted
read-only, most of the known exploits are mitigated, since they rely
on writing some file in those filesystems.
This does not replace security modules (like SELinux or AppArmor), it
is just another layer of security. Likewise, it doesn't mean that the
other mitigations (shadowing parts of /proc or /sys with bind mounts)
are useless. Those measures are still useful. As such, the shadowing
of /proc/kcore is still enabled with both LXC and native drivers.
Special care has to be taken with /proc/1/attr, which still needs to
be mounted read-write in order to enable the AppArmor profile. It is
bind-mounted from a private read-write mount of procfs.
All that enforcement is done in dockerinit. The code doing the real
work is in libcontainer. The init function for the LXC driver calls
the function from libcontainer to avoid code duplication.
Docker-DCO-1.1-Signed-off-by: Jérôme Petazzoni <jerome@docker.com> (github: jpetazzo)