beenull/moby

Author	SHA1	Message	Date
Sebastiaan van Stijn	1155b6bc7a	Merge pull request #41395 from cpuguy83/no_libseccomp Remove dependency in dockerd on libseccomp	2020-09-15 17:37:04 +02:00
Brian Goff	ccbb00c815	Remove dependency in dockerd on libseccomp This was just using libseccomp to get the right arch, but we can use GOARCH to get this. The nativeToSeccomp map needed to be adjusted a bit for mipsle vs mipsel since that's how Go refers to it. Also added some other arches to it. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2020-09-11 22:48:42 +00:00
Justin Cormack	7ca355652f	Merge pull request #41337 from cyphar/apparmor-update-profile apparmor: permit signals from unconfined programs	2020-09-11 12:05:40 +01:00
Jintao Zhang	a18139111d	Add faccessat2 to default seccomp profile. Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com>	2020-08-17 21:13:03 +08:00
Jintao Zhang	b8988c8475	Add openat2 to default seccomp profile. follow up to https://github.com/moby/moby/pull/41344#discussion_r469919978 Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com>	2020-08-16 15:58:57 +08:00
Aleksa Sarai	725eced4e0	apparmor: permit signals from unconfined programs Otherwise if you try to kill a container process from the host directly, you get EACCES. Also add a comment to make sure that the profile code (which has been replicated by several projects) doesn't get out of sync. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2020-08-11 18:18:58 +10:00
Sebastiaan van Stijn	3895dd585f	Replace uses of blacklist/whitelist Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2020-07-14 10:41:34 +02:00
Florian Schmaus	d0d99b04cf	seccomp: allow 'rseq' syscall in default seccomp profile Restartable Sequences (rseq) are a kernel-based mechanism for fast update operations on per-core data in user-space. Some libraries, like the newest version of Google's TCMalloc, depend on it [1]. This also brings Docker's default seccomp profile on par with systemd's, which enabled 'rseq' in early 2019 [2]. 1: https://google.github.io/tcmalloc/design.html 2: `6fee3be0b4` Signed-off-by: Florian Schmaus <flo@geekplace.eu>	2020-06-26 16:06:26 +02:00
Justin Cormack	1aafcbb47a	Merge pull request #40995 from KentaTada/remove-unused-syscall seccomp: remove the unused query_module(2)	2020-05-28 11:25:59 +01:00
Akihiro Suda	b2917efb1a	Merge pull request #40731 from sqreen/fix/seccomp-profile seccomp: allow syscall membarrier	2020-05-20 00:31:32 +09:00
Kenta Tada	1192c7aee4	seccomp: remove the unused query_module(2) query_module(2) is only in kernels before Linux 2.6. Signed-off-by: Kenta Tada <Kenta.Tada@sony.com>	2020-05-19 10:30:54 +09:00
Stanislav Levin	5d3a9e4319	seccomp: Whitelist `clock_adjtime` This only allows making the syscall. CAP_SYS_TIME is still required for time adjustment (enforced by the kernel): ``` kernel/time/posix-timers.c: 1112 SYSCALL_DEFINE2(clock_adjtime, const clockid_t, which_clock, 1113 struct __kernel_timex __user *, utx) ... 1121 err = do_clock_adjtime(which_clock, &ktx); 1100 int do_clock_adjtime(const clockid_t which_clock, struct __kernel_timex *ktx) 1101 { ... 1109 return kc->clock_adj(which_clock, ktx); 1299 static const struct k_clock clock_realtime = { ... 1304 .clock_adj = posix_clock_realtime_adj, 188 static int posix_clock_realtime_adj(const clockid_t which_clock, 189 struct __kernel_timex *t) 190 { 191 return do_adjtimex(t); kernel/time/timekeeping.c: 2312 int do_adjtimex(struct __kernel_timex *txc) 2313 { ... 2321 /* Validate the data before disabling interrupts */ 2322 ret = timekeeping_validate_timex(txc); 2246 static int timekeeping_validate_timex(const struct __kernel_timex *txc) 2247 { 2248 if (txc->modes & ADJ_ADJTIME) { ... 2252 if (!(txc->modes & ADJ_OFFSET_READONLY) && 2253 !capable(CAP_SYS_TIME)) 2254 return -EPERM; 2255 } else { 2256 /* In order to modify anything, you gotta be super-user! */ 2257 if (txc->modes && !capable(CAP_SYS_TIME)) 2258 return -EPERM; ``` Fixes: https://github.com/moby/moby/issues/40919 Signed-off-by: Stanislav Levin <slev@altlinux.org>	2020-05-08 12:33:25 +03:00
Julio Guerra	1026f873a4	seccomp: allow syscall membarrier Add the membarrier syscall to the default seccomp profile. It is for example used in the implementation of dlopen() in the musl libc of Alpine images. Signed-off-by: Julio Guerra <julio@sqreen.com>	2020-04-07 16:24:17 +02:00
Sebastiaan van Stijn	89fabf0f24	seccomp: add 64-bit time_t syscalls Relates to https://patchwork.kernel.org/patch/10756415/ Added to whitelist: - `clock_getres_time64` (equivalent of `clock_getres`, which was whitelisted) - `clock_gettime64` (equivalent of `clock_gettime`, which was whitelisted) - `clock_nanosleep_time64` (equivalent of `clock_nanosleep`, which was whitelisted) - `futex_time64` (equivalent of `futex`, which was whitelisted) - `io_pgetevents_time64` (equivalent of `io_pgetevents`, which was whitelisted) - `mq_timedreceive_time64` (equivalent of `mq_timedreceive`, which was whitelisted) - `mq_timedsend_time64` (equivalent of `mq_timedsend`, which was whitelisted) - `ppoll_time64` (equivalent of `ppoll`, which was whitelisted) - `pselect6_time64` (equivalent of `pselect6`, which was whitelisted) - `recvmmsg_time64` (equivalent of `recvmmsg`, which was whitelisted) - `rt_sigtimedwait_time64` (equivalent of `rt_sigtimedwait`, which was whitelisted) - `sched_rr_get_interval_time64` (equivalent of `sched_rr_get_interval`, which was whitelisted) - `semtimedop_time64` (equivalent of `semtimedop`, which was whitelisted) - `timer_gettime64` (equivalent of `timer_gettime`, which was whitelisted) - `timer_settime64` (equivalent of `timer_settime`, which was whitelisted) - `timerfd_gettime64` (equivalent of `timerfd_gettime`, which was whitelisted) - `timerfd_settime64` (equivalent of `timerfd_settime`, which was whitelisted) - `utimensat_time64` (equivalent of `utimensat`, which was whitelisted) Not added to whitelist: - `clock_adjtime64` (equivalent of `clock_adjtime`, which was not whitelisted) - `clock_settime64` (equivalent of `clock_settime`, which was not whitelisted) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2020-03-25 13:49:49 +01:00
Arnaud Rebillout	667c87ef4f	profiles: Fix file permissions on json files json files should not be executable I think. Signed-off-by: Arnaud Rebillout <arnaud.rebillout@collabora.com>	2019-09-16 11:15:37 +07:00
youcai	f4d41f1dfa	seccomp: whitelist io-uring related system calls Signed-off-by: youcai <omegacoleman@gmail.com>	2019-09-07 07:35:23 +00:00
Michael Crosby	e4605cc2a5	Add sigprocmask to default seccomp profile Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2019-08-29 13:52:45 -04:00
Kir Kolyshkin	0d496e3d71	profiles/seccomp: improve profile conversion When translating the seccomp profile to the opencontainers format, a single group with multiple syscalls is converted to individual syscall rules. I am not sure why it is done that way, but suspect it might have performance implications as the number of rules grows. Change this to pass a group of syscalls as a group. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2019-06-18 17:58:51 -07:00
Sebastiaan van Stijn	9e763de6ad	Merge pull request #39121 from goldwynr/master apparmor: allow readby and tracedby	2019-06-11 18:25:47 +02:00
Sebastiaan van Stijn	a1ec8551ab	Fix seccomp profile for clone syscall All clone flags for namespace should be denied. Based-on-patch-by: Kenta Tada <Kenta.Tada@sony.com> Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2019-06-04 15:28:12 +02:00
Goldwyn Rodrigues	b36455258f	apparmor: allow readby and tracedby Fixes audit errors such as: type=AVC msg=audit(1550236803.810:143): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=3181 comm="ps" requested_mask="readby" denied_mask="readby" peer="docker-default" audit(1550236375.918:3): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=2267 comm="ps" requested_mask="tracedby" denied_mask="tracedby" peer="docker-default" Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>	2019-04-22 09:11:50 -05:00
Avi Kivity	665741510a	seccomp: whitelist io_pgetevents() io_pgetevents() is a new Linux system call. It is similar to io_getevents() that is already whitelisted, and adds no special abilities over that system call. Allow that system call to enable applications that use it. Fixes #38894. Signed-off-by: Avi Kivity <avi@scylladb.com>	2019-03-18 20:46:16 +02:00
Tonis Tiigi	e76380b67b	seccomp: review update Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>	2019-02-05 12:02:41 -08:00
Justin Cormack	1603af9689	Merge pull request #38137 from tonistiigi/seccomp-ptrace seccomp: allow ptrace(2) for 4.8+ kernels	2019-02-05 13:47:43 +00:00
Vincent Demeester	f11b87bfca	Merge pull request #37831 from cyphar/apparmor-external-templates apparmor: allow receiving of signals from 'docker kill'	2018-11-19 09:12:15 +01:00
Tonis Tiigi	1124543ca8	seccomp: allow ptrace for 4.8+ kernels 4.8+ kernels have fixed the ptrace security issues so we can allow ptrace(2) on the default seccomp profile if we do the kernel version check. `93e35efb8d` Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>	2018-11-04 13:06:43 -08:00
Justin Cormack	ccd22ffcc8	Move the syslog syscall to be gated by CAP_SYS_ADMIN or CAP_SYSLOG This call is what is used to implement `dmesg` to get kernel messages about the host. This can leak substantial information about the host. It is normally available to unprivileged users on the host, unless the sysctl `kernel.dmesg_restrict = 1` is set, but this is not set by default on the majority of distributions. Blocking this to restrict leaks about the configuration seems correct. Fix #37897 See also https://googleprojectzero.blogspot.com/2018/09/a-cache-invalidation-bug-in-linux.html Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2018-09-27 14:27:05 -07:00
Aleksa Sarai	4822fb1e24	apparmor: allow receiving of signals from 'docker kill' In newer kernels, AppArmor will reject attempts to send signals to a container because the signal originated from outside of that AppArmor profile. Correct this by allowing all unconfined signals to be received. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Aleksa Sarai <asarai@suse.de>	2018-09-13 02:06:56 +10:00
Nicolas V Castet	47dfff68e4	Whitelist syscalls linked to CAP_SYS_NICE in default seccomp profile * Update profile to match docker documentation at https://docs.docker.com/engine/security/seccomp/ Signed-off-by: Nicolas V Castet <nvcastet@us.ibm.com>	2018-06-20 07:32:08 -05:00
Justin Cormack	15ff09395c	If container will run as non root user, drop permitted, effective caps early As soon as the initial executable in the container is executed as a non root user, permitted and effective capabilities are dropped. Drop them earlier than this, so that they are dropped before executing the file. The main effect of this is that if `CAP_DAC_OVERRIDE` is set (the default) the user will not be able to execute files they do not have permission to execute, which previously they could. The old behaviour was somewhat surprising and the new one is definitely correct, but it is not in any meaningful way exploitable, and I do not think it is necessary to backport this fix. It is unlikely to have any negative effects as almost all executables have world execute permission anyway. Use the bounding set not the effective set as the canonical set of capabilities, as effective will now vary. Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2018-03-19 14:45:27 -07:00
NobodyOnSE	b2a907c8ca	Whitelist statx syscall for libseccomp-2.3.3 onward Older seccomp versions will ignore this. Signed-off-by: NobodyOnSE <ich@sektor.selfip.com>	2018-03-06 08:42:12 +01:00
Daniel Nephin	4f0d95fa6e	Add canonical import comment Signed-off-by: Daniel Nephin <dnephin@docker.com>	2018-02-05 16:51:57 -05:00
Chao Wang	5c154cfac8	Copy Inslice() to those parts that use it Signed-off-by: Chao Wang <wangchao.fnst@cn.fujitsu.com>	2017-11-10 13:42:38 +08:00
Tycho Andersen	b4a6ccbc5f	drop useless apparmor denies These files don't exist under proc so this rule does nothing. They are protected against by docker's default cgroup devices since they're both character devices and not explicitly allowed. Signed-off-by: Tycho Andersen <tycho@docker.com>	2017-10-06 09:11:59 -06:00
Simon Vikstrom	d7bf5e3b4d	Remove double defined alarm Signed-off-by: Simon Vikstrom <pullreq@devsn.se>	2017-08-19 09:55:03 +02:00
Yong Tang	bbb401de87	Merge pull request #34445 from pmoust/f-seccomp-quotacl seccomp: whitelist quotactl with CAP_SYS_ADMIN	2017-08-09 11:53:13 -07:00
Panagiotis Moustafellos	cf6e1c5dfd	seccomp: whitelist quotactl with CAP_SYS_ADMIN The quotactl syscall is being whitelisted in default seccomp profile, gated by CAP_SYS_ADMIN. Signed-off-by: Panagiotis Moustafellos <pmoust@elastic.co>	2017-08-09 18:52:15 +03:00
Vincent Demeester	9ef3b53597	Move pkg/templates away - Remove unused function and variables from the package - Remove usage of it from `profiles/apparmor` where it wasn't required - Move the package to `daemon/logger/templates` where it's only used Signed-off-by: Vincent Demeester <vincent@sbr.pm>	2017-08-08 18:16:41 +02:00
Florin Patan	52d4716843	Remove unused import This commit removes an unused import. Signed-off-by: Florin Patan <florinpatan@gmail.com>	2017-07-29 22:21:53 +01:00
Christopher Jones	069fdc8a08	[project] change syscall to /x/sys/unix|windows Changes most references of syscall to golang.org/x/sys/. The ones that aren't changed include Errno, Signal and SysProcAttr, as they haven't been implemented in /x/sys/. Signed-off-by: Christopher Jones <tophj@linux.vnet.ibm.com> [s390x] switch utsname from unsigned to signed per `33267e036f` char in s390x in the /x/sys/unix package is now signed, so change the buildtags Signed-off-by: Christopher Jones <tophj@linux.vnet.ibm.com>	2017-07-11 08:00:32 -04:00
Miklos Szegedi	2db05316d0	Whitelist adjtimex get operation. Adjustment operations are gated by CAP_SYS_TIME Signed-off-by: Miklos Szegedi <miklos.szegedi@cloudera.com>	2017-06-02 18:48:16 +00:00
Justin Cormack	dcf2632945	Revert "Block obsolete socket families in the default seccomp profile" This reverts commit `7e3a596a63`. Unfortunately, it was pointed out in https://github.com/moby/moby/pull/29076#commitcomment-21831387 that the `socketcall` syscall takes a pointer to a struct so it is not possible to use seccomp profiles to filter it. This means these cannot be blocked as you can use `socketcall` to call them regardless, as we currently allow 32 bit syscalls. Users who wish to block these should use a seccomp profile that blocks all 32 bit syscalls and then just block the non socketcall versions. Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2017-05-09 14:26:00 +01:00
Michael Crosby	005506d36c	Update moby to runc and oci 1.0 runtime final rc Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2017-05-05 13:45:45 -07:00
Ian Campbell	cd456433ea	seccomp: Allow personality with UNAME26 bit set. From personality(2): Have uname(2) report a 2.6.40+ version number rather than a 3.x version number. Added as a stopgap measure to support broken applications that could not handle the kernel version-numbering switch from 2.6.x to 3.x. This allows both "UNAME26|PER_LINUX" and "UNAME26|PER_LINUX32". Fixes: #32839 Signed-off-by: Ian Campbell <ian.campbell@docker.com>	2017-05-02 15:05:01 +01:00
Antonio Murdaca	3ab4961032	profiles: seccomp: allow clock_settime when CAP_SYS_TIME is added Signed-off-by: Antonio Murdaca <runcom@redhat.com>	2017-03-20 11:05:23 +01:00
Justin Cormack	9067ef0e32	Seccomp Update - Update libseccomp-golang to 0.9.0 release - Update libseccomp to 2.3.2 release - add preadv2 and pwritev2 syscalls to whitelist Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2017-03-07 22:19:46 +00:00
Aleksa Sarai	a3155743ad	profiles: seccomp: fix !seccomp build Previously building with seccomp disabled would cause build failures because of a mismatch in the type signatures of DefaultProfile(). Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-03-02 21:13:17 +11:00
Gabriel Linder	52d8f582c3	Allow sync_file_range2 on supported architectures. Signed-off-by: Gabriel Linder <linder.gabriel@gmail.com>	2017-02-14 21:29:33 +01:00
Justin Cormack	d6adcd6a82	Add two arm specific syscalls to seccomp profile These are arm variants with different argument ordering because of register alignment requirements. fix #30516 Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2017-01-29 14:59:45 +00:00
Justin Cormack	7e3a596a63	Block obsolete socket families in the default seccomp profile Linux supports many obsolete address families, which are usually available in common distro kernels, but they are less likely to be properly audited and may have security issues This blocks all socket families in the socket (and socketcall where applicable) syscall except - AF_UNIX - Unix domain sockets - AF_INET - IPv4 - AF_INET6 - IPv6 - AF_NETLINK - Netlink sockets for communicating with the kernel - AF_PACKET - raw sockets, which are only allowed with CAP_NET_RAW All other socket families are blocked, including Appletalk (native, not over IP), IPX (remember that!), VSOCK and HVSOCK, which should not generally be used in containers, etc. Note that users can of course provide a profile per container or in the daemon config if they have unusual use cases that require these. Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2017-01-17 17:50:44 +00:00
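
Commit ccbb00c815 replaces libseccomp's architecture detection with a lookup keyed on Go's GOARCH, which is why the map key must be Go's `mipsle` rather than libseccomp's `mipsel`. The sketch below illustrates that idea only. The map contents, the use of OCI-style `SCMP_ARCH_*` strings as values, and the `nativeSeccompArch` helper are assumptions made for illustration, not a copy of moby's `nativeToSeccomp` map.

```go
package main

import (
	"fmt"
	"runtime"
)

// nativeToSeccomp is a hypothetical mapping from Go's GOARCH names to
// OCI seccomp architecture identifiers. It illustrates the technique and is
// not the actual map from moby's profiles/seccomp package.
var nativeToSeccomp = map[string]string{
	"386":      "SCMP_ARCH_X86",
	"amd64":    "SCMP_ARCH_X86_64",
	"arm":      "SCMP_ARCH_ARM",
	"arm64":    "SCMP_ARCH_AARCH64",
	"mips":     "SCMP_ARCH_MIPS",
	"mipsle":   "SCMP_ARCH_MIPSEL", // Go spells it "mipsle"; libseccomp uses "mipsel"
	"mips64":   "SCMP_ARCH_MIPS64",
	"mips64le": "SCMP_ARCH_MIPSEL64",
	"ppc64":    "SCMP_ARCH_PPC64",
	"ppc64le":  "SCMP_ARCH_PPC64LE",
	"s390x":    "SCMP_ARCH_S390X",
}

// nativeSeccompArch (hypothetical helper) resolves a GOARCH value to a
// seccomp architecture, returning an error for architectures not in the map.
func nativeSeccompArch(goarch string) (string, error) {
	arch, ok := nativeToSeccomp[goarch]
	if !ok {
		return "", fmt.Errorf("no seccomp architecture known for GOARCH %q", goarch)
	}
	return arch, nil
}

func main() {
	// runtime.GOARCH is fixed at compile time, so no C library is needed.
	arch, err := nativeSeccompArch(runtime.GOARCH)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(arch)
}
```

Because `runtime.GOARCH` is a compile-time constant, this kind of lookup needs no cgo, which is the point of dropping the libseccomp dependency from dockerd.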

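Commit a1ec8551ab states that all clone flags for namespaces should be denied. One way to express that as a seccomp argument condition is a masked comparison that permits `clone` only when `(flags & mask) == 0`, where `mask` is the OR of every `CLONE_NEW*` flag. The sketch below reproduces only that arithmetic. The flag values are hardcoded from Linux's `include/uapi/linux/sched.h` so the example builds on any OS, and the `maskedEqual` and `cloneAllowed` helpers are hypothetical names, not moby code.

```go
package main

import "fmt"

// Namespace-creating clone(2) flags, with values as defined in Linux's
// include/uapi/linux/sched.h (hardcoded so the sketch is portable).
const (
	cloneNewNS     = 0x00020000
	cloneNewCgroup = 0x02000000
	cloneNewUTS    = 0x04000000
	cloneNewIPC    = 0x08000000
	cloneNewUser   = 0x10000000
	cloneNewPID    = 0x20000000
	cloneNewNet    = 0x40000000
)

// namespaceMask is the OR of every namespace-creating clone flag.
const namespaceMask uint64 = cloneNewNS | cloneNewCgroup | cloneNewUTS |
	cloneNewIPC | cloneNewUser | cloneNewPID | cloneNewNet

// maskedEqual mirrors the semantics of a seccomp SCMP_CMP_MASKED_EQ
// comparison: the argument matches when (arg & mask) == value.
func maskedEqual(arg, mask, value uint64) bool {
	return arg&mask == value
}

// cloneAllowed reports whether a clone flags argument passes a rule that
// permits clone only when no namespace flag is set.
func cloneAllowed(flags uint64) bool {
	return maskedEqual(flags, namespaceMask, 0)
}

func main() {
	const cloneVM, cloneThread = 0x00000100, 0x00010000
	fmt.Printf("mask = %#x (%d)\n", namespaceMask, namespaceMask)
	fmt.Println(cloneAllowed(cloneVM | cloneThread)) // ordinary thread creation
	fmt.Println(cloneAllowed(cloneNewNet))           // request for a new network namespace
}
```

Running it prints `mask = 0x7e020000 (2114060288)`, then `true` for ordinary thread creation and `false` for a request that sets `CLONE_NEWNET`. A single masked comparison rejects any combination of namespace flags without enumerating them.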

93 commits