beenull/moby

Author	SHA1	Message	Date
Nicolas V Castet	47dfff68e4	Whitelist syscalls linked to CAP_SYS_NICE in default seccomp profile * Update profile to match docker documentation at https://docs.docker.com/engine/security/seccomp/ Signed-off-by: Nicolas V Castet <nvcastet@us.ibm.com>	2018-06-20 07:32:08 -05:00
Justin Cormack	15ff09395c	If container will run as non root user, drop permitted, effective caps early As soon as the initial executable in the container is executed as a non root user, permitted and effective capabilities are dropped. Drop them earlier than this, so that they are dropped before executing the file. The main effect of this is that if `CAP_DAC_OVERRIDE` is set (the default) the user will not be able to execute files they do not have permission to execute, which previously they could. The old behaviour was somewhat surprising and the new one is definitely correct, but it is not in any meaningful way exploitable, and I do not think it is necessary to backport this fix. It is unlikely to have any negative effects as almost all executables have world execute permission anyway. Use the bounding set not the effective set as the canonical set of capabilities, as effective will now vary. Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2018-03-19 14:45:27 -07:00
NobodyOnSE	b2a907c8ca	Whitelist statx syscall for libseccomp-2.3.3 onward Older seccomp versions will ignore this. Signed-off-by: NobodyOnSE <ich@sektor.selfip.com>	2018-03-06 08:42:12 +01:00
Daniel Nephin	4f0d95fa6e	Add canonical import comment Signed-off-by: Daniel Nephin <dnephin@docker.com>	2018-02-05 16:51:57 -05:00
Chao Wang	5c154cfac8	Copy Inslice() to those parts that use it Signed-off-by: Chao Wang <wangchao.fnst@cn.fujitsu.com>	2017-11-10 13:42:38 +08:00
Simon Vikstrom	d7bf5e3b4d	Remove double defined alarm Signed-off-by: Simon Vikstrom <pullreq@devsn.se>	2017-08-19 09:55:03 +02:00
Panagiotis Moustafellos	cf6e1c5dfd	seccomp: whitelist quotactl with CAP_SYS_ADMIN The quotactl syscall is being whitelisted in default seccomp profile, gated by CAP_SYS_ADMIN. Signed-off-by: Panagiotis Moustafellos <pmoust@elastic.co>	2017-08-09 18:52:15 +03:00
Florin Patan	52d4716843	Remove unused import This commit removes an unused import. Signed-off-by: Florin Patan <florinpatan@gmail.com>	2017-07-29 22:21:53 +01:00
Christopher Jones	069fdc8a08	[project] change syscall to /x/sys/unix|windows Changes most references of syscall to golang.org/x/sys/ Ones that aren't changed include Errno, Signal and SysProcAttr as they haven't been implemented in /x/sys/. Signed-off-by: Christopher Jones <tophj@linux.vnet.ibm.com> [s390x] switch utsname from unsigned to signed per `33267e036f` char in s390x in the /x/sys/unix package is now signed, so change the buildtags Signed-off-by: Christopher Jones <tophj@linux.vnet.ibm.com>	2017-07-11 08:00:32 -04:00
Miklos Szegedi	2db05316d0	Whitelist adjtimex get operation. Adjustment operations are gated by CAP_SYS_TIME Signed-off-by: Miklos Szegedi <miklos.szegedi@cloudera.com>	2017-06-02 18:48:16 +00:00
Justin Cormack	dcf2632945	Revert "Block obsolete socket families in the default seccomp profile" This reverts commit `7e3a596a63`. Unfortunately, it was pointed out in https://github.com/moby/moby/pull/29076#commitcomment-21831387 that the `socketcall` syscall takes a pointer to a struct so it is not possible to use seccomp profiles to filter it. This means these cannot be blocked as you can use `socketcall` to call them regardless, as we currently allow 32 bit syscalls. Users who wish to block these should use a seccomp profile that blocks all 32 bit syscalls and then just block the non socketcall versions. Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2017-05-09 14:26:00 +01:00
Michael Crosby	005506d36c	Update moby to runc and oci 1.0 runtime final rc Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2017-05-05 13:45:45 -07:00
Ian Campbell	cd456433ea	seccomp: Allow personality with UNAME26 bit set. From personality(2): Have uname(2) report a 2.6.40+ version number rather than a 3.x version number. Added as a stopgap measure to support broken applications that could not handle the kernel version-numbering switch from 2.6.x to 3.x. This allows both "UNAME26|PER_LINUX" and "UNAME26|PER_LINUX32". Fixes: #32839 Signed-off-by: Ian Campbell <ian.campbell@docker.com>	2017-05-02 15:05:01 +01:00
Antonio Murdaca	3ab4961032	profiles: seccomp: allow clock_settime when CAP_SYS_TIME is added Signed-off-by: Antonio Murdaca <runcom@redhat.com>	2017-03-20 11:05:23 +01:00
Justin Cormack	9067ef0e32	Seccomp Update - Update libseccomp-golang to 0.9.0 release - Update libseccomp to 2.3.2 release - add preadv2 and pwritev2 syscalls to whitelist Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2017-03-07 22:19:46 +00:00
Aleksa Sarai	a3155743ad	profiles: seccomp: fix !seccomp build Previously building with seccomp disabled would cause build failures because of a mismatch in the type signatures of DefaultProfile(). Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-03-02 21:13:17 +11:00
Gabriel Linder	52d8f582c3	Allow sync_file_range2 on supported architectures. Signed-off-by: Gabriel Linder <linder.gabriel@gmail.com>	2017-02-14 21:29:33 +01:00
Justin Cormack	d6adcd6a82	Add two arm specific syscalls to seccomp profile These are arm variants with different argument ordering because of register alignment requirements. fix #30516 Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2017-01-29 14:59:45 +00:00
Justin Cormack	7e3a596a63	Block obsolete socket families in the default seccomp profile Linux supports many obsolete address families, which are usually available in common distro kernels, but they are less likely to be properly audited and may have security issues. This blocks all socket families in the socket (and socketcall where applicable) syscall except - AF_UNIX - Unix domain sockets - AF_INET - IPv4 - AF_INET6 - IPv6 - AF_NETLINK - Netlink sockets for communicating with the kernel - AF_PACKET - raw sockets, which are only allowed with CAP_NET_RAW All other socket families are blocked, including Appletalk (native, not over IP), IPX (remember that!), VSOCK and HVSOCK, which should not generally be used in containers, etc. Note that users can of course provide a profile per container or in the daemon config if they have unusual use cases that require these. Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2017-01-17 17:50:44 +00:00
Antonio Murdaca	197f3ee687	profiles/seccomp: fix comment Signed-off-by: Antonio Murdaca <runcom@redhat.com>	2016-11-25 11:40:54 +01:00
Michael Crosby	91e197d614	Add engine-api types to docker This moves the types for the `engine-api` repo to the existing types package. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-09-07 11:05:58 -07:00
Antonio Murdaca	5ff21add06	New seccomp format Signed-off-by: Antonio Murdaca <runcom@redhat.com>	2016-09-01 11:53:07 +02:00
Michael Crosby	041e5a21dc	Replace old oci specs import with runtime-specs Fixes #25804 The upstream repo changed the import paths. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-08-17 09:38:34 -07:00
Justin Cormack	c1ca124682	Gate name_to_handle_at by CAP_SYS_ADMIN not CAP_DAC_READ_SEARCH Only open_by_handle_at requires CAP_DAC_READ_SEARCH. This allows systemd to run with only `--cap-add SYS_ADMIN` rather than having to also add `--cap-add DAC_READ_SEARCH` as well which it does not really need. Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2016-08-10 12:22:36 +01:00
Justin Cormack	bdf01cf5de	Move mlock back into the default ungated seccomp profile Do not gate with CAP_IPC_LOCK as unprivileged use is now allowed in Linux. This returns it to how it was in 1.11. Fixes #23587 Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2016-06-15 16:25:27 -04:00
Michael Holzheu	bf2a577c13	Enable seccomp for s390x To implement seccomp for s390x the following changes are required: 1) seccomp_default: Add s390 compat mode On s390x (64 bit) we can run s390 (32 bit) programs in 32 bit compat mode. Therefore add this information to arches(). 2) seccomp_default: Use correct flags parameter for sys_clone on s390x On s390x the second parameter for the clone system call is the flags parameter. On all other architectures it is the first one. See kernel code kernel/fork.c: #elif defined(CONFIG_CLONE_BACKWARDS2) SYSCALL_DEFINE5(clone, unsigned long, newsp, unsigned long, clone_flags, int __user *, parent_tidptr, So fix the docker default seccomp rule and check for the second parameter on s390/s390x. 3) seccomp_default: Add s390 specific syscalls For s390 we currently have three additional system calls that should be added to the seccomp whitelist: - Other architectures can read/write unprivileged from/to PCI MMIO memory. On s390 the instructions are privileged and therefore we need system calls for that purpose: * s390_pci_mmio_write() * s390_pci_mmio_read() - Runtime instrumentation: * s390_runtime_instr() 4) test_integration: Do not run seccomp default profile test on s390x The generated profile that we check in is for amd64 and i386 architectures and does not work correctly on s390x. See also: `75385dc216` ("Do not run the seccomp tests that use default.json on non x86 architectures") 5) Dockerfile.s390x: Add "seccomp" to DOCKER_BUILDTAGS Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>	2016-06-06 08:13:22 -04:00
Justin Cormack	9ed6e39cdd	Do not restrict chown via seccomp, just let capabilities control access In #22554 I aligned seccomp and capabilities, however the case of the chown calls and CAP_CHOWN was less clearcut, as these are simple calls that the capabilities will block if they are not allowed. They are needed when no new privileges is not set in order to allow docker to call chown before the container is started, so there was a workaround but this did not include all the chown syscalls, and Arm was failing on some seccomp tests because it was using a different syscall from just the fchown that was allowed in this case. It is simpler to just allow all the chown calls in the default seccomp profile and let the capabilities subsystem block them. Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2016-05-25 12:49:30 -07:00
Justin Cormack	a83cedddc6	Enable seccomp on ppc64le In order to do this, allow the socketcall syscall in the default seccomp profile. This is a multiplexing syscall for the socket operations, which is becoming obsolete gradually, but it is used in some architectures. libseccomp has special handling for it for x86 where it is common, so we did not need it in the profile, but does not have any handling for ppc64le. It turns out that the Debian images we use for tests do use the socketcall, while the newer images such as Ubuntu 16.04 do not. Enabling this does no harm as we allow all the socket operations anyway, and we allow the similar ipc call for similar reasons already. Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2016-05-23 22:35:55 -07:00
Justin Cormack	a01c4dc8f8	Align default seccomp profile with selected capabilities Currently the default seccomp profile is fixed. This changes it so that it varies depending on the Linux capabilities selected with the --cap-add and --cap-drop options. Without this, if a user adds privileges, eg to allow ptrace with --cap-add sys_ptrace then still cannot actually use ptrace as it is still blocked by seccomp, so they will probably disable seccomp or use --privileged. With this change the syscalls that are needed for the capability are also allowed by the seccomp profile based on the selected capabilities. While this patch makes it easier to do things with for example cap_sys_admin enabled, as it will now allow creating new namespaces and use of mount, it still allows less than --cap-add cap_sys_admin --security-opt seccomp:unconfined would have previously. It is not recommended that users run containers with cap_sys_admin as this does give full access to the host machine. It also cleans up some architecture specific system calls to be only selected when needed. Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2016-05-11 09:30:23 +01:00
Justin Cormack	e7a99ae5e1	Remove mlock and vhangup from the default seccomp profile These syscalls are already blocked by the default capabilities: mlock mlock2 mlockall require CAP_IPC_LOCK vhangup requires CAP_SYS_TTY_CONFIG There is therefore no reason to allow them in the default profile as they cannot be used anyway. Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2016-04-21 18:23:59 +01:00
Tonis Tiigi	99b16b3523	Reuse profiles/seccomp package Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>	2016-03-19 14:15:39 -07:00
Justin Cormack	96896f2d0b	Add new syscalls in libseccomp 2.3.0 to seccomp default profile This adds the following new syscalls that are supported in libseccomp 2.3.0, including calls added up to kernel 4.5-rc4: mlock2 - same as mlock but with a flag copy_file_range - copy file contents, like splice but with reflink support. The following are not added, and mentioned in docs: userfaultfd - userspace page fault handling, mainly designed for process migration The following are not added, only apply to less common architectures: switch_endian membarrier breakpoint set_tls I plan to review the other architectures, some of which can now have seccomp enabled in the build as they are now supported. Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2016-03-16 21:17:32 +00:00
Justin Cormack	5abd881883	Allow restart_syscall in default seccomp profile Fixes #20818 This syscall was blocked as there was some concern that it could be used to bypass filtering of other syscall arguments. However none of the potential syscalls where this could be an issue (poll, nanosleep, clock_nanosleep, futex) are blocked in the default profile anyway. Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2016-03-11 16:44:11 +00:00
allencloud	34b82a69b9	fix some typos. Signed-off-by: allencloud <allen.sun@daocloud.io>	2016-03-10 10:09:27 +08:00
Justin Cormack	31410a6d79	Add ipc syscall to default seccomp profile On 32 bit x86 this is a multiplexing syscall for the system V ipc syscalls such as shmget, and so needs to be allowed for shared memory access for 32 bit binaries. Fixes #20733 Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2016-03-05 22:12:23 +00:00
Justin Cormack	39b799ac53	Add some uses of personality syscall to default seccomp filter We generally want to filter the personality(2) syscall, as it allows disabling ASLR, and turning on some poorly supported emulations that have been the target of CVEs. However the use cases for reading the current value, setting the default PER_LINUX personality, and setting PER_LINUX32 for 32 bit emulation are fine. See issue #20634 Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2016-02-26 18:43:08 +01:00
Antonio Murdaca	11435b674b	add seccomp default profile fix tests Signed-off-by: Antonio Murdaca <runcom@redhat.com> Signed-off-by: Jessica Frazelle <acidburn@docker.com>	2016-02-19 13:32:54 -08:00
Jessica Frazelle	ad600239bc	generate seccomp profile convert type Signed-off-by: Jessica Frazelle <acidburn@docker.com>	2016-02-19 13:32:54 -08:00
Jessica Frazelle	9bc771af9d	add validation for generating default seccomp profile Signed-off-by: Jessica Frazelle <acidburn@docker.com>	2016-02-08 13:04:52 -08:00
Jessica Frazelle	d57816de02	add default seccomp profile as json profile is created by go generate Signed-off-by: Jessica Frazelle <acidburn@docker.com>	2016-02-08 08:19:21 -08:00
Jessica Frazelle	bed0bb7d01	move default seccomp profile into package Signed-off-by: Jessica Frazelle <acidburn@docker.com>	2016-01-21 16:55:29 -08:00

41 commits