0ct0pu5/moby

Author	SHA1	Message	Date
Akihiro Suda	33ee7941d4	support `--privileged --cgroupns=private` on cgroup v1 Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2020-04-21 23:11:32 +09:00
Sebastiaan van Stijn	5d040cbd16	daemon: fix capitalization of some functions Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2020-04-14 17:22:19 +02:00
Sebastiaan van Stijn	eeef12f469	daemon: address some minor linting issues and nits Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2020-04-14 17:22:17 +02:00
Kir Kolyshkin	39048cf656	Really switch to moby/sys/mount* Switch to moby/sys/mount and mountinfo. Keep the pkg/mount for potential outside users. This commit was generated by the following bash script: ``` set -e -u -o pipefail for file in $(git grep -l 'docker/docker/pkg/mount"' | grep -v ^pkg/mount); do sed -i -e 's#/docker/docker/pkg/mount"#/moby/sys/mount"#' \ -e 's#mount\.\(GetMounts\|Mounted\|Info\|[A-Za-z]*Filter\)#mountinfo.\1#g' \ $file goimports -w $file done ``` Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-03-20 09:46:25 -07:00
Akihiro Suda	ca4b51868a	rootless: support `--exec-opt native.cgroupdriver=systemd` Support cgroup as in Rootless Podman. Requires cgroup v2 host with crun. Tested with Ubuntu 19.10 (kernel 5.3, systemd 242), crun v0.12.1. Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2020-02-14 15:32:31 +09:00
Sebastiaan van Stijn	e6c1820ef5	Merge pull request #40174 from AkihiroSuda/cgroup2 support cgroup2	2020-01-09 20:09:11 +01:00
Akhil Mohan	86ebbe16de	remove host directory check Signed-off-by: Akhil Mohan <akhil.mohan@mayadata.io>	2020-01-02 14:28:51 +05:30
Akihiro Suda	19baeaca26	cgroup2: enable cgroup namespace by default For cgroup v1, we were unable to change the default because of compatibility issue. For cgroup v2, we should change the default right now because switching to cgroup v2 is already breaking change. See also containers/libpod#4363 containers/libpod#4374 Privileged containers also use cgroupns=private by default. https://github.com/containers/libpod/pull/4374#issuecomment-549776387 Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2020-01-01 02:58:40 +09:00
Akhil Mohan	35b9e6989f	Make `--device` flag work in privileged mode When a container is started in privileged mode, the device mappings provided by the `--device` flag were ignored. Now the device mappings will be considered even in privileged mode. Signed-off-by: Akhil Mohan <akhil.mohan@mayadata.io>	2019-12-06 18:43:56 +05:30
wenlxie	03b3ec1dd5	make --device work in privileged mode Signed-off-by: wenlxie <wenlxie@ebay.com>	2019-12-06 18:17:03 +05:30
Olli Janatuinen	1308a3a99f	Move DefaultCapabilities() to caps package Signed-off-by: Olli Janatuinen <olli.janatuinen@gmail.com>	2019-11-14 21:13:16 +02:00
Justin Cormack	dde030a6b1	Merge pull request #40083 from thaJeztah/daemon_consts daemon: use constants for AppArmor and Seccomp	2019-10-17 11:12:37 -07:00
Grant Millar	df7b8f458a	daemon: Use short libnetwork ID in exec-root & update libnetwork Signed-off-by: Grant Millar <rid@cylo.io>	2019-10-15 11:40:24 +01:00
Sebastiaan van Stijn	a33cf495f2	daemon: use constants for AppArmor profiles Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2019-10-13 19:16:12 +02:00
Sebastiaan van Stijn	07ff4f1de8	goimports: fix imports Format the source according to latest goimports. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com> Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2019-09-18 12:56:54 +02:00
Rob Gulewich	072400fc4b	Make cgroup namespaces configurable This adds both a daemon-wide flag and a container creation property: - Set the `CgroupnsMode: "host|private"` HostConfig property at container creation time to control what cgroup namespace the container is created in - Set the `--default-cgroupns-mode=host|private` daemon flag to control what cgroup namespace containers are created in by default - Set the default if the daemon flag is unset to "host", for backward compatibility - Default to CgroupnsMode: "host" for client versions < 1.40 Signed-off-by: Rob Gulewich <rgulewich@netflix.com>	2019-05-07 10:22:16 -07:00
Rob Gulewich	256eb04d69	Start containers in their own cgroup namespaces This is enabled for all containers that are not run with --privileged, if the kernel supports it. Fixes #38332 Signed-off-by: Rob Gulewich <rgulewich@netflix.com>	2019-05-07 10:22:16 -07:00
Michael Crosby	c478553640	Export all spec generation opts Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2019-04-10 15:38:36 -04:00
Michael Crosby	cb902f4430	Refactor few spec generation ops Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2019-04-09 16:51:40 -04:00
John Howard	a3eda72f71	Merge pull request #38541 from Microsoft/jjh/containerd Windows: Experimental: ContainerD runtime	2019-03-19 21:09:19 -07:00
Tibor Vass	8f936ae8cf	Add DeviceRequests to HostConfig to support NVIDIA GPUs This patch hard-codes support for NVIDIA GPUs. In a future patch it should move out into its own Device Plugin. Signed-off-by: Tibor Vass <tibor@docker.com>	2019-03-18 17:19:45 +00:00
John Howard	d4ceb61f2b	LCOW:Reworking spec builder Signed-off-by: John Howard <jhoward@microsoft.com>	2019-03-12 18:41:55 -07:00
Sebastiaan van Stijn	dd94555787	Merge pull request #32519 from darkowlzz/32443-docker-update-pids-limit Add pids-limit support in docker update	2019-02-23 15:20:59 +01:00
Sunny Gogoi	74eb258ffb	Add pids-limit support in docker update - Adds updating PidsLimit in UpdateContainer(). - Adds setting PidsLimit in toContainerResources(). Signed-off-by: Sunny Gogoi <indiasuny000@gmail.com> Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2019-02-21 14:17:38 -08:00
Akihiro Suda	ec87479b7e	allow running `dockerd` in an unprivileged user namespace (rootless mode) Please refer to `docs/rootless.md`. TLDR: * Make sure `/etc/subuid` and `/etc/subgid` contain the entry for you * `dockerd-rootless.sh --experimental` * `docker -H unix://$XDG_RUNTIME_DIR/docker.sock run ...` Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>	2019-02-04 00:24:27 +09:00
Olli Janatuinen	80d7bfd54d	Capabilities refactor - Add support for exact list of capabilities, support only OCI model - Support OCI model on CapAdd and CapDrop but retain backward compatibility - Create variable locally instead of declaring it at the top - Use const for magic "ALL" value - Rename `cap` variable as it overlaps with `cap()` built-in - Normalize and validate capabilities before use - Move validation for conflicting options to validateHostConfig() - TweakCapabilities: simplify logic to calculate capabilities Signed-off-by: Olli Janatuinen <olli.janatuinen@gmail.com> Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2019-01-22 21:50:41 +02:00
Michael Crosby	b940cc5cff	Move caps and device spec utils to `oci` pkg Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2018-12-11 10:20:25 -05:00
Aleksa Sarai	7417f50575	oci: include the domainname in "kernel.domainname" The OCI doesn't have a specific field for an NIS domainname[1] (mainly because FreeBSD and Solaris appear to have a similar concept but it is configured entirely differently). However, on Linux, the NIS domainname can be configured through both the setdomainname(2) syscall but also through the "kernel.domainname" sysctl. Since the OCI has a way of injecting sysctls this means we don't need to have any OCI changes to support NIS domainnames (and we can always switch if the OCI picks up such support in the future). It should be noted that because we have to generate this each spec creation we also have to make sure that it's not clobbered by the HostConfig. I'm pretty sure making this change generic (so that HostConfig will not clobber any pre-set sysctls) will not cause other issues to crop up. [1]: https://github.com/opencontainers/runtime-spec/issues/592 Signed-off-by: Aleksa Sarai <asarai@suse.de>	2018-11-30 17:31:38 +11:00
Akihiro Suda	596cdffb9f	mount: add BindOptions.NonRecursive (API v1.40) This allows non-recursive bind-mount, i.e. mount(2) with "bind" rather than "rbind". Swarm-mode will be supported in a separate PR because of mutual vendoring. Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>	2018-11-06 17:51:58 +09:00
Sebastiaan van Stijn	deac65c929	Merge pull request #37850 from AkihiroSuda/propagate-exec-root-to-libnetwork daemon: propagate exec-root to libnetwork-setkey	2018-09-28 15:20:37 +02:00
Brian Goff	12d5eb8e22	Merge pull request #37703 from kolyshkin/rm-dead-code daemon/setMounts(): remove dead code	2018-09-25 16:07:15 -07:00
Akihiro Suda	40385208cb	daemon: propagate exec-root to libnetwork-setkey Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>	2018-09-15 13:49:30 +09:00
Kir Kolyshkin	ac8c3debdb	daemon/setMounts(): remove dead code Since PR 11353 (commit `7804cd36ee` "Filter out default mounts that are override by user") there can be no duplicated mounts in the list, so the check is redundant. This should speed up container start by a nanosecond or two. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2018-08-27 15:40:10 -07:00
Kir Kolyshkin	bcacbf523b	Fix docker --init with /dev bind mount In case a user wants to have a child reaper inside a container (i.e. run "docker --init") AND a bind-mounted /dev, the following error occurs: > docker run -d -v /dev:/dev --init busybox top > 088c96808c683077f04c4cc2711fddefe1f5970afc085d59e0baae779745a7cf > docker: Error response from daemon: OCI runtime create failed: container_linux.go:296: starting container process caused "exec: "/dev/init": stat /dev/init: no such file or directory": unknown. This happens because if a user-supplied /dev is provided, all the built-in /dev/xxx mounts are filtered out. To solve, let's move in-container init to /sbin, as the chance that /sbin will be bind-mounted to a container is smaller than that for /dev. While at it, let's give it a more unique name (docker-init). NOTE it still won't work for the case of bind-mounted /sbin. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2018-08-27 15:38:46 -07:00
Salahuddin Khan	763d839261	Add ADD/COPY --chown flag support to Windows This implements chown support on Windows. Built-in accounts as well as accounts included in the SAM database of the container are supported. NOTE: IDPair is now named Identity and IDMappings is now named IdentityMapping. The following are valid examples: ADD --chown=Guest . <some directory> COPY --chown=Administrator . <some directory> COPY --chown=Guests . <some directory> COPY --chown=ContainerUser . <some directory> On Windows an owner is only granted the permission to read the security descriptor and read/write the discretionary access control list. This fix also grants read/write and execute permissions to the owner. Signed-off-by: Salahuddin Khan <salah@docker.com>	2018-08-13 21:59:11 -07:00
Kazuhiro Sera	1e49fdcafc	Fix the several typos detected by github.com/client9/misspell Signed-off-by: Kazuhiro Sera <seratch@gmail.com>	2018-08-09 00:45:00 +09:00
John Starks	e9268d9642	lcow: Allow the client to add device cgroup rules Signed-off-by: John Starks <jostarks@microsoft.com>	2018-06-15 16:14:17 -07:00
John Starks	349aeeab7c	lcow: Allow the client to add or remove capabilities Signed-off-by: John Starks <jostarks@microsoft.com>	2018-06-15 16:03:33 -07:00
Jess Frazelle	3694c1e34e	api: add configurable MaskedPaths and ReadOnlyPaths to the API This adds MaskedPaths and ReadOnlyPaths options to HostConfig for containers so that a user can override the default values. When the value sent through the API is nil the default is used. Otherwise the default is overridden. Adds integration tests for MaskedPaths and ReadonlyPaths. Signed-off-by: Jess Frazelle <acidburn@microsoft.com>	2018-06-05 12:33:14 -04:00
Sebastiaan van Stijn	f23c00d870	Various code-cleanup remove unnecessary import aliases, brackets, and so on. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2018-05-23 17:50:54 +02:00
Sebastiaan van Stijn	31aca4bef4	Merge pull request #36991 from kolyshkin/slice-in-place daemon.setMounts(): copy slice in place	2018-05-14 13:49:47 +02:00
Kir Kolyshkin	d8fd6137a1	daemon.getSourceMount(): fix for / mount point A recent optimization in getSourceMount() made it return an error in case when the found mount point is "/". This prevented bind-mounted volumes from working in such cases. A (rather trivial but adequate) unit test case is added. Fixes: `871c957242` ("getSourceMount(): simplify") Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2018-05-10 12:53:37 -07:00
Kir Kolyshkin	d4c94e83ca	daemon.setMounts(): copy slice in place It does not make sense to copy a slice element by element, then discard the source one. Let's do copy in place instead which is way more efficient. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2018-05-03 10:26:06 -07:00
Vincent Demeester	53982e3fc1	Merge pull request #36091 from kolyshkin/mount pkg/mount improvements	2018-04-21 11:03:54 +02:00
Kir Kolyshkin	871c957242	getSourceMount(): simplify The flow of getSourceMount was: 1 get all entries from /proc/self/mountinfo 2 do a linear search for the `source` directory 3 if found, return its data 4 get the parent directory of `source`, goto 2 The repeated linear search through the whole mountinfo (which can have thousands of records) is inefficient. Instead, let's just 1 collect all the relevant records (only those mount points that can be a parent of `source`) 2 find the record with the longest mountpath, return its data This was tested manually with something like ```go func TestGetSourceMount(t *testing.T) { mnt, flags, err := getSourceMount("/sys/devices/msr/") assert.NoError(t, err) t.Logf("mnt: %v, flags: %v", mnt, flags) } ``` ...but it relies on having specific mount points on the system being used for testing. [v2: add unit tests for ParentsFilter] Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2018-04-19 14:49:17 -07:00
Kir Kolyshkin	bb934c6aca	pkg/mount: implement/use filter for mountinfo parsing Functions `GetMounts()` and `parseMountTable()` return all the entries as read and parsed from /proc/self/mountinfo. In many cases the caller is interested in only one or a few entries, not all of them. One good example is `Mounted()` function, which looks for a specific entry only. Another example is `RecursiveUnmount()` which is only interested in mounts under a specific path. This commit adds `filter` argument to `GetMounts()` to implement two things: 1. filter out entries a caller is not interested in 2. stop processing if a caller has found what it wanted `nil` can be passed to get a backward-compatible behavior, i.e. return all the entries. A few filters are implemented: - `PrefixFilter`: filters out all entries not under `prefix` - `SingleEntryFilter`: looks for a specific entry Finally, `Mounted()` is modified to use `SingleEntryFilter()`, and `RecursiveUnmount()` is using `PrefixFilter()`. Unit tests are added to check filters are working. [v2: ditch NoFilter, use nil] [v3: ditch GetMountsFiltered()] [v4: add unit test for filters] [v5: switch to gotestyourself] Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2018-04-19 14:48:09 -07:00
Brian Goff	6a70fd222b	Move mount parsing to separate package. This moves the platform specific stuff in a separate package and keeps the `volume` package and the defined interfaces light to import. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2018-04-19 06:35:54 -04:00
Justin Cormack	a729853bc7	Always make sysfs read-write with privileged It does not make any sense to vary this based on whether the rootfs is read only. We removed all the other mount dependencies on read-only eg see #35344. Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2018-04-06 16:17:18 +01:00
Justin Cormack	15ff09395c	If container will run as non root user, drop permitted, effective caps early As soon as the initial executable in the container is executed as a non root user, permitted and effective capabilities are dropped. Drop them earlier than this, so that they are dropped before executing the file. The main effect of this is that if `CAP_DAC_OVERRIDE` is set (the default) the user will not be able to execute files they do not have permission to execute, which previously they could. The old behaviour was somewhat surprising and the new one is definitely correct, but it is not in any meaningful way exploitable, and I do not think it is necessary to backport this fix. It is unlikely to have any negative effects as almost all executables have world execute permission anyway. Use the bounding set not the effective set as the canonical set of capabilities, as effective will now vary. Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2018-03-19 14:45:27 -07:00
Kir Kolyshkin	d6ea46ceda	container.BaseFS: check for nil before deref Commit `7a7357dae1` ("LCOW: Implemented support for docker cp + build") changed `container.BaseFS` from being a string (that could be empty but can't lead to nil pointer dereference) to containerfs.ContainerFS, which could be `nil` and so nil dereference is at least theoretically possible, which leads to panic (i.e. engine crashes). Such a panic can be avoided by carefully analysing the source code in all the places that dereference a variable, to make sure the variable can't be nil. Practically, this analysis is impossible as code is constantly evolving. Still, we need to avoid panics and crashes. A good way to do so is to explicitly check that a variable is non-nil, returning an error otherwise. Even in case such a check looks absolutely redundant, further changes to the code might make it useful, and having an extra check is not a big price to pay to avoid a panic. This commit adds such checks for all the places where it is not obvious that container.BaseFS is not nil (which in this case means we do not call daemon.Mount() a few lines earlier). Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2018-03-13 21:24:48 -07:00

124 commits