0ct0pu5/moby

Author	SHA1	Message	Date
Bjorn Neergaard	a480b37621	seccomp: add name_to_handle_at to allowlist Based on the analysis on [the previous PR][1]. [1]: https://github.com/moby/moby/pull/45766#pullrequestreview-1493908145 Signed-off-by: Bjorn Neergaard <bjorn.neergaard@docker.com> (cherry picked from commit `b335e3d305`) Resolved conflicts: profiles/seccomp/default_linux.go Signed-off-by: Bjorn Neergaard <bjorn.neergaard@docker.com>	2023-06-28 05:48:28 -06:00
Vitor Anjos	45a8248070	remove name_to_handle_at(2) from filtered syscalls Signed-off-by: Vitor Anjos <bartier@users.noreply.github.com> (cherry picked from commit `fdc9b7cceb`) Resolved conflicts: profiles/seccomp/default_linux.go Co-Authored-by: Bjorn Neergaard <bjorn.neergaard@docker.com> Signed-off-by: Bjorn Neergaard <bjorn.neergaard@docker.com>	2023-06-27 13:22:05 -06:00
Bjorn Neergaard	dcf27af59b	Revert "seccomp: block socket calls to AF_VSOCK in default profile" This reverts commit `57b229012a`. This change, while favorable from a security standpoint, caused a regression for users of the 20.10 branch of Moby. As such, we are reverting it to ensure stability and compatibility for the affected users. However, users of AF_VSOCK in containers should recognize that this (special) address family is not currently namespaced in any version of the Linux kernel, and may result in unexpected behavior, like VMs communicating directly with host hypervisors. Future branches, including the 23.0 branch, will continue to filter AF_VSOCK. Users who need to allow containers to communicate over the unnamespaced AF_VSOCK will need to turn off seccomp confinement or set a custom seccomp profile. It is our hope that future mechanisms will make this more ergonomic/maintainable for end users, and that future kernels will support namespacing of AF_VSOCK. Signed-off-by: Bjorn Neergaard <bneergaard@mirantis.com>	2022-12-29 13:16:57 -07:00
Sebastiaan van Stijn	a01576ec4a	seccomp: block socket calls to AF_VSOCK in default profile This syncs the seccomp-profile with the latest changes in containerd's profile, applying the same changes as `17a9324035` Some background from the associated ticket: > We want to use vsock for guest-host communication on KubeVirt > (https://github.com/kubevirt/kubevirt). In KubeVirt we run VMs in pods. > > However since anyone can just connect from any pod to any VM with the > default seccomp settings, we cannot limit connection attempts to our > privileged node-agent. > > ### Describe the solution you'd like > We want to deny the `socket` syscall for the `AF_VSOCK` family by default. > > I see in [1] and [2] that AF_VSOCK was actually already blocked for some > time, but that got reverted since some architectures support the `socketcall` > syscall which can't be restricted properly. However we are mostly interested > in `arm64` and `amd64` where limiting `socket` would probably be enough. > > ### Additional context > I know that in theory we could use our own seccomp profiles, but we would want > to provide security for as many users as possible which use KubeVirt, and there > it would be very helpful if this protection could be added by being part of the > DefaultRuntime profile to easily ensure that it is active for all pods [3]. > > Impact on existing workloads: It is unlikely that this will disturb any existing > workload, because VSOCK is almost exclusively used for host-guest communication. > However if someone would still use it: Privileged pods would still be able to > use `socket` for `AF_VSOCK`, custom seccomp policies could be applied too. > Further it was already blocked for quite some time and the blockade got lifted > due to reasons not related to AF_VSOCK. 
> > The PR in KubeVirt which adds VSOCK support for additional context: [4] > > [1]: https://github.com/moby/moby/pull/29076#commitcomment-21831387 > [2]: `dcf2632945` > [3]: https://kubernetes.io/docs/tutorials/security/seccomp/#enable-the-use-of-runtimedefault-as-the-default-seccomp-profile-for-all-workloads > [4]: https://github.com/kubevirt/kubevirt/pull/8546 Signed-off-by: Sebastiaan van Stijn <github@gone.nl> (cherry picked from commit `57b229012a`) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2022-12-01 14:32:05 +01:00
Djordje Lukic	d127287d92	Allow different syscalls from kernels 5.12 -> 5.16 Kernel 5.12: mount_setattr: needs CAP_SYS_ADMIN Kernel 5.14: quotactl_fd: needs CAP_SYS_ADMIN memfd_secret: always allowed Kernel 5.15: process_mrelease: always allowed Kernel 5.16: futex_waitv: always allowed Signed-off-by: Djordje Lukic <djordje.lukic@docker.com> (cherry picked from commit `7de9f4f82d`) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2022-08-18 18:58:09 +02:00
Tudor Brindus	57db169641	seccomp: add support for Landlock syscalls in default policy This commit allows the Landlock[0] system calls in the default seccomp policy. Landlock was introduced in kernel 5.13, to fill the gap that inspecting filepaths passed as arguments to filesystem system calls is not really possible with pure `seccomp` (unless involving `ptrace`). Allowing Landlock by default fits in with allowing `seccomp` for containerized applications to voluntarily restrict their access rights to files within the container. [0]: https://www.kernel.org/doc/html/latest/userspace-api/landlock.html Signed-off-by: Tudor Brindus <me@tbrindus.ca> (cherry picked from commit `af819bf623`) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2022-08-18 18:55:16 +02:00
Tianon Gravi	567c01f6d1	seccomp: add support for "clone3" syscall in default policy This is a backport of `9f6b562dd1`, adapted to avoid the refactoring that happened in `d92739713c`. Original commit message is as follows: > If no seccomp policy is requested, then the built-in default policy in > dockerd applies. This has no rule for "clone3" defined, nor any default > errno defined. So when runc receives the config it attempts to determine > a default errno, using logic defined in its commit: > > opencontainers/runc@7a8d716 > > As explained in the above commit message, runc uses a heuristic to > decide which errno to return by default: > > [quote] > The solution applied here is to prepend a "stub" filter which returns > -ENOSYS if the requested syscall has a larger syscall number than any > syscall mentioned in the filter. The reason for this specific rule is > that syscall numbers are (roughly) allocated sequentially and thus newer > syscalls will (usually) have a larger syscall number -- thus causing our > filters to produce -ENOSYS if the filter was written before the syscall > existed. > [/quote] > > Unfortunately clone3 appears to be one of the edge cases that does not > result in use of ENOSYS, instead ending up with the historical EPERM > errno. > > Latest glibc (2.33.9000, in Fedora 35 rawhide) will attempt to use > clone3 by default. If it sees ENOSYS then it will automatically > fallback to using clone. Any other errno is treated as a fatal > error. Thus when docker seccomp policy triggers EPERM from clone3, > no fallback occurs and programs are thus unable to spawn threads. > > The clone3 syscall is much more complicated than clone, most notably its > flags are not exposed as a direct argument any more. Instead they are > hidden inside a struct. This means that seccomp filters are unable to > apply policy based on values seen in flags. Thus we can't directly > replicate the current "clone" filtering for "clone3". 
We can at least > ensure "clone3" returns ENOSYS errno, to trigger fallback to "clone" > at which point we can filter on flags. Signed-off-by: Tianon Gravi <admwiggin@gmail.com> Co-authored-by: Daniel P. Berrangé <berrange@redhat.com>	2021-09-13 08:56:21 -07:00
Aleksa Sarai	a6a88b3145	profiles: seccomp: update to Linux 5.11 syscall list These syscalls (some of which have been in Linux for a while but were missing from the profile) fall into a few buckets: * close_range(2), epoll_pwait2(2) are just extensions of existing "safe for everyone" syscalls. * The mountv2 API syscalls (fs*(2), move_mount(2), open_tree(2)) are all equivalent to aspects of mount(2) and thus go into the CAP_SYS_ADMIN category. * process_madvise(2) is similar to the other process_*(2) syscalls and thus goes in the CAP_SYS_PTRACE category. Signed-off-by: Aleksa Sarai <asarai@suse.de> (cherry picked from commit `54eff4354b`) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2021-02-17 21:22:12 +01:00
Mark Vainomaa	f7bcb02f67	seccomp: Add pidfd_getfd syscall Signed-off-by: Mark Vainomaa <mikroskeem@mikroskeem.eu>	2020-11-12 15:31:07 +02:00
Mark Vainomaa	5e3ffe6464	seccomp: Add pidfd_open and pidfd_send_signal Signed-off-by: Mark Vainomaa <mikroskeem@mikroskeem.eu>	2020-11-11 15:20:34 +02:00
Sebastiaan van Stijn	0d75b63987	seccomp: replace types with runtime-spec types Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2020-09-18 19:33:58 +02:00
Jintao Zhang	a18139111d	Add faccessat2 to default seccomp profile. Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com>	2020-08-17 21:13:03 +08:00
Jintao Zhang	b8988c8475	Add openat2 to default seccomp profile. follow up to https://github.com/moby/moby/pull/41344#discussion_r469919978 Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com>	2020-08-16 15:58:57 +08:00
Florian Schmaus	d0d99b04cf	seccomp: allow 'rseq' syscall in default seccomp profile Restartable Sequences (rseq) are a kernel-based mechanism for fast update operations on per-core data in user-space. Some libraries, like the newest version of Google's TCMalloc, depend on it [1]. This also makes Docker's default seccomp profile on par with systemd's, which enabled 'rseq' in early 2019 [2]. 1: https://google.github.io/tcmalloc/design.html 2: `6fee3be0b4` Signed-off-by: Florian Schmaus <flo@geekplace.eu>	2020-06-26 16:06:26 +02:00
Justin Cormack	1aafcbb47a	Merge pull request #40995 from KentaTada/remove-unused-syscall seccomp: remove the unused query_module(2)	2020-05-28 11:25:59 +01:00
Akihiro Suda	b2917efb1a	Merge pull request #40731 from sqreen/fix/seccomp-profile seccomp: allow syscall membarrier	2020-05-20 00:31:32 +09:00
Kenta Tada	1192c7aee4	seccomp: remove the unused query_module(2) query_module(2) is only in kernels before Linux 2.6. Signed-off-by: Kenta Tada <Kenta.Tada@sony.com>	2020-05-19 10:30:54 +09:00
Stanislav Levin	5d3a9e4319	seccomp: Whitelist `clock_adjtime` This only allows making the syscall. CAP_SYS_TIME is still required for time adjustment (enforced by the kernel): ``` kernel/time/posix-timers.c: 1112 SYSCALL_DEFINE2(clock_adjtime, const clockid_t, which_clock, 1113 struct __kernel_timex __user *, utx) ... 1121 err = do_clock_adjtime(which_clock, &ktx); 1100 int do_clock_adjtime(const clockid_t which_clock, struct __kernel_timex *ktx) 1101 { ... 1109 return kc->clock_adj(which_clock, ktx); 1299 static const struct k_clock clock_realtime = { ... 1304 .clock_adj = posix_clock_realtime_adj, 188 static int posix_clock_realtime_adj(const clockid_t which_clock, 189 struct __kernel_timex *t) 190 { 191 return do_adjtimex(t); kernel/time/timekeeping.c: 2312 int do_adjtimex(struct __kernel_timex *txc) 2313 { ... 2321 /* Validate the data before disabling interrupts */ 2322 ret = timekeeping_validate_timex(txc); 2246 static int timekeeping_validate_timex(const struct __kernel_timex *txc) 2247 { 2248 if (txc->modes & ADJ_ADJTIME) { ... 2252 if (!(txc->modes & ADJ_OFFSET_READONLY) && 2253 !capable(CAP_SYS_TIME)) 2254 return -EPERM; 2255 } else { 2256 /* In order to modify anything, you gotta be super-user! */ 2257 if (txc->modes && !capable(CAP_SYS_TIME)) 2258 return -EPERM; ``` Fixes: https://github.com/moby/moby/issues/40919 Signed-off-by: Stanislav Levin <slev@altlinux.org>	2020-05-08 12:33:25 +03:00
Julio Guerra	1026f873a4	seccomp: allow syscall membarrier Add the membarrier syscall to the default seccomp profile. It is for example used in the implementation of dlopen() in the musl libc of Alpine images. Signed-off-by: Julio Guerra <julio@sqreen.com>	2020-04-07 16:24:17 +02:00
Sebastiaan van Stijn	89fabf0f24	seccomp: add 64-bit time_t syscalls Relates to https://patchwork.kernel.org/patch/10756415/ Added to whitelist: - `clock_getres_time64` (equivalent of `clock_getres`, which was whitelisted) - `clock_gettime64` (equivalent of `clock_gettime`, which was whitelisted) - `clock_nanosleep_time64` (equivalent of `clock_nanosleep`, which was whitelisted) - `futex_time64` (equivalent of `futex`, which was whitelisted) - `io_pgetevents_time64` (equivalent of `io_pgetevents`, which was whitelisted) - `mq_timedreceive_time64` (equivalent of `mq_timedreceive`, which was whitelisted) - `mq_timedsend_time64` (equivalent of `mq_timedsend`, which was whitelisted) - `ppoll_time64` (equivalent of `ppoll`, which was whitelisted) - `pselect6_time64` (equivalent of `pselect6`, which was whitelisted) - `recvmmsg_time64` (equivalent of `recvmmsg`, which was whitelisted) - `rt_sigtimedwait_time64` (equivalent of `rt_sigtimedwait`, which was whitelisted) - `sched_rr_get_interval_time64` (equivalent of `sched_rr_get_interval`, which was whitelisted) - `semtimedop_time64` (equivalent of `semtimedop`, which was whitelisted) - `timer_gettime64` (equivalent of `timer_gettime`, which was whitelisted) - `timer_settime64` (equivalent of `timer_settime`, which was whitelisted) - `timerfd_gettime64` (equivalent of `timerfd_gettime`, which was whitelisted) - `timerfd_settime64` (equivalent of `timerfd_settime`, which was whitelisted) - `utimensat_time64` (equivalent of `utimensat`, which was whitelisted) Not added to whitelist: - `clock_adjtime64` (equivalent of `clock_adjtime`, which was not whitelisted) - `clock_settime64` (equivalent of `clock_settime`, which was not whitelisted) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2020-03-25 13:49:49 +01:00
Arnaud Rebillout	667c87ef4f	profiles: Fix file permissions on json files json files should not be executable I think. Signed-off-by: Arnaud Rebillout <arnaud.rebillout@collabora.com>	2019-09-16 11:15:37 +07:00
youcai	f4d41f1dfa	seccomp: whitelist io-uring related system calls Signed-off-by: youcai <omegacoleman@gmail.com>	2019-09-07 07:35:23 +00:00
Michael Crosby	e4605cc2a5	Add sigprocmask to default seccomp profile Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2019-08-29 13:52:45 -04:00
Sebastiaan van Stijn	a1ec8551ab	Fix seccomp profile for clone syscall All clone flags for namespace should be denied. Based-on-patch-by: Kenta Tada <Kenta.Tada@sony.com> Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2019-06-04 15:28:12 +02:00
Avi Kivity	665741510a	seccomp: whitelist io_pgetevents() io_pgetevents() is a new Linux system call. It is similar to io_getevents() that is already whitelisted, and adds no special abilities over that system call. Allow that system call to enable applications that use it. Fixes #38894. Signed-off-by: Avi Kivity <avi@scylladb.com>	2019-03-18 20:46:16 +02:00
Tonis Tiigi	e76380b67b	seccomp: review update Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>	2019-02-05 12:02:41 -08:00
Tonis Tiigi	1124543ca8	seccomp: allow ptrace for 4.8+ kernels 4.8+ kernels have fixed the ptrace security issues so we can allow ptrace(2) on the default seccomp profile if we do the kernel version check. `93e35efb8d` Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>	2018-11-04 13:06:43 -08:00
Justin Cormack	ccd22ffcc8	Move the syslog syscall to be gated by CAP_SYS_ADMIN or CAP_SYSLOG This call is what is used to implement `dmesg` to get kernel messages about the host. This can leak substantial information about the host. It is normally available to unprivileged users on the host, unless the sysctl `kernel.dmesg_restrict = 1` is set, but this is not set by standard on the majority of distributions. Blocking this to restrict leaks about the configuration seems correct. Fix #37897 See also https://googleprojectzero.blogspot.com/2018/09/a-cache-invalidation-bug-in-linux.html Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2018-09-27 14:27:05 -07:00
Nicolas V Castet	47dfff68e4	Whitelist syscalls linked to CAP_SYS_NICE in default seccomp profile * Update profile to match docker documentation at https://docs.docker.com/engine/security/seccomp/ Signed-off-by: Nicolas V Castet <nvcastet@us.ibm.com>	2018-06-20 07:32:08 -05:00
NobodyOnSE	b2a907c8ca	Whitelist statx syscall for libseccomp-2.3.3 onward Older seccomp versions will ignore this. Signed-off-by: NobodyOnSE <ich@sektor.selfip.com>	2018-03-06 08:42:12 +01:00
Simon Vikstrom	d7bf5e3b4d	Remove double defined alarm Signed-off-by: Simon Vikstrom <pullreq@devsn.se>	2017-08-19 09:55:03 +02:00
Panagiotis Moustafellos	cf6e1c5dfd	seccomp: whitelist quotactl with CAP_SYS_ADMIN The quotactl syscall is being whitelisted in default seccomp profile, gated by CAP_SYS_ADMIN. Signed-off-by: Panagiotis Moustafellos <pmoust@elastic.co>	2017-08-09 18:52:15 +03:00
Miklos Szegedi	2db05316d0	Whitelist adjtimex get operation. Adjustment operations are gated by CAP_SYS_TIME Signed-off-by: Miklos Szegedi <miklos.szegedi@cloudera.com>	2017-06-02 18:48:16 +00:00
Justin Cormack	dcf2632945	Revert "Block obsolete socket families in the default seccomp profile" This reverts commit `7e3a596a63`. Unfortunately, it was pointed out in https://github.com/moby/moby/pull/29076#commitcomment-21831387 that the `socketcall` syscall takes a pointer to a struct so it is not possible to use seccomp profiles to filter it. This means these cannot be blocked as you can use `socketcall` to call them regardless, as we currently allow 32 bit syscalls. Users who wish to block these should use a seccomp profile that blocks all 32 bit syscalls and then just block the non socketcall versions. Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2017-05-09 14:26:00 +01:00
Ian Campbell	cd456433ea	seccomp: Allow personality with UNAME26 bit set. From personality(2): Have uname(2) report a 2.6.40+ version number rather than a 3.x version number. Added as a stopgap measure to support broken applications that could not handle the kernel version-numbering switch from 2.6.x to 3.x. This allows both "UNAME26|PER_LINUX" and "UNAME26|PER_LINUX32". Fixes: #32839 Signed-off-by: Ian Campbell <ian.campbell@docker.com>	2017-05-02 15:05:01 +01:00
Antonio Murdaca	3ab4961032	profiles: seccomp: allow clock_settime when CAP_SYS_TIME is added Signed-off-by: Antonio Murdaca <runcom@redhat.com>	2017-03-20 11:05:23 +01:00
Justin Cormack	9067ef0e32	Seccomp Update - Update libseccomp-golang to 0.9.0 release - Update libseccomp to 2.3.2 release - add preadv2 and pwritev2 syscalls to whitelist Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2017-03-07 22:19:46 +00:00
Gabriel Linder	52d8f582c3	Allow sync_file_range2 on supported architectures. Signed-off-by: Gabriel Linder <linder.gabriel@gmail.com>	2017-02-14 21:29:33 +01:00
Justin Cormack	d6adcd6a82	Add two arm specific syscalls to seccomp profile These are arm variants with different argument ordering because of register alignment requirements. fix #30516 Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2017-01-29 14:59:45 +00:00
Justin Cormack	7e3a596a63	Block obsolete socket families in the default seccomp profile Linux supports many obsolete address families, which are usually available in common distro kernels, but they are less likely to be properly audited and may have security issues. This blocks all socket families in the socket (and socketcall where applicable) syscall except - AF_UNIX - Unix domain sockets - AF_INET - IPv4 - AF_INET6 - IPv6 - AF_NETLINK - Netlink sockets for communicating with the kernel - AF_PACKET - raw sockets, which are only allowed with CAP_NET_RAW All other socket families are blocked, including Appletalk (native, not over IP), IPX (remember that!), VSOCK and HVSOCK, which should not generally be used in containers, etc. Note that users can of course provide a profile per container or in the daemon config if they have unusual use cases that require these. Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2017-01-17 17:50:44 +00:00
Antonio Murdaca	5ff21add06	New seccomp format Signed-off-by: Antonio Murdaca <runcom@redhat.com>	2016-09-01 11:53:07 +02:00
Justin Cormack	bdf01cf5de	Move mlock back into the default ungated seccomp profile Do not gate with CAP_IPC_LOCK as unprivileged use is now allowed in Linux. This returns it to how it was in 1.11. Fixes #23587 Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2016-06-15 16:25:27 -04:00
Justin Cormack	9ed6e39cdd	Do not restrict chown via seccomp, just let capabilities control access In #22554 I aligned seccomp and capabilities, however the case of the chown calls and CAP_CHOWN was less clearcut, as these are simple calls that the capabilities will block if they are not allowed. They are needed when no new privileges is not set in order to allow docker to call chown before the container is started, so there was a workaround but this did not include all the chown syscalls, and Arm was failing on some seccomp tests because it was using a different syscall from just the fchown that was allowed in this case. It is simpler to just allow all the chown calls in the default seccomp profile and let the capabilities subsystem block them. Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2016-05-25 12:49:30 -07:00
Justin Cormack	a83cedddc6	Enable seccomp on ppc64le In order to do this, allow the socketcall syscall in the default seccomp profile. This is a multiplexing syscall for the socket operations, which is becoming obsolete gradually, but it is used in some architectures. libseccomp has special handling for it for x86 where it is common, so we did not need it in the profile, but does not have any handling for ppc64le. It turns out that the Debian images we use for tests do use the socketcall, while the newer images such as Ubuntu 16.04 do not. Enabling this does no harm as we allow all the socket operations anyway, and we allow the similar ipc call for similar reasons already. Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2016-05-23 22:35:55 -07:00
Justin Cormack	a01c4dc8f8	Align default seccomp profile with selected capabilities Currently the default seccomp profile is fixed. This changes it so that it varies depending on the Linux capabilities selected with the --cap-add and --cap-drop options. Without this, if a user adds privileges, eg to allow ptrace with --cap-add sys_ptrace then still cannot actually use ptrace as it is still blocked by seccomp, so they will probably disable seccomp or use --privileged. With this change the syscalls that are needed for the capability are also allowed by the seccomp profile based on the selected capabilities. While this patch makes it easier to do things with for example cap_sys_admin enabled, as it will now allow creating new namespaces and use of mount, it still allows less than --cap-add cap_sys_admin --security-opt seccomp:unconfined would have previously. It is not recommended that users run containers with cap_sys_admin as this does give full access to the host machine. It also cleans up some architecture specific system calls to be only selected when needed. Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2016-05-11 09:30:23 +01:00
Justin Cormack	e7a99ae5e1	Remove mlock and vhangup from the default seccomp profile These syscalls are already blocked by the default capabilities: mlock mlock2 mlockall require CAP_IPC_LOCK vhangup requires CAP_SYS_TTY_CONFIG There is therefore no reason to allow them in the default profile as they cannot be used anyway. Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2016-04-21 18:23:59 +01:00
Justin Cormack	96896f2d0b	Add new syscalls in libseccomp 2.3.0 to seccomp default profile This adds the following new syscalls that are supported in libseccomp 2.3.0, including calls added up to kernel 4.5-rc4: mlock2 - same as mlock but with a flag copy_file_range - copy file contents, like splice but with reflink support. The following are not added, and mentioned in docs: userfaultfd - userspace page fault handling, mainly designed for process migration The following are not added, only apply to less common architectures: switch_endian membarrier breakpoint set_tls I plan to review the other architectures, some of which can now have seccomp enabled in the build as they are now supported. Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2016-03-16 21:17:32 +00:00
Justin Cormack	5abd881883	Allow restart_syscall in default seccomp profile Fixes #20818 This syscall was blocked as there was some concern that it could be used to bypass filtering of other syscall arguments. However none of the potential syscalls where this could be an issue (poll, nanosleep, clock_nanosleep, futex) are blocked in the default profile anyway. Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2016-03-11 16:44:11 +00:00
Justin Cormack	31410a6d79	Add ipc syscall to default seccomp profile On 32 bit x86 this is a multiplexing syscall for the system V ipc syscalls such as shmget, and so needs to be allowed for shared memory access for 32 bit binaries. Fixes #20733 Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2016-03-05 22:12:23 +00:00
Justin Cormack	39b799ac53	Add some uses of personality syscall to default seccomp filter We generally want to filter the personality(2) syscall, as it allows disabling ASLR, and turning on some poorly supported emulations that have been the target of CVEs. However the use cases for reading the current value, setting the default PER_LINUX personality, and setting PER_LINUX32 for 32 bit emulation are fine. See issue #20634 Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2016-02-26 18:43:08 +01:00
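
Several of the commits above describe the same mechanism: a syscall is either always allowed, allowed only when a capability has been added (a01c4dc8f8, 3ab4961032, a6a88b3145), or answered with a specific errno (567c01f6d1, where clone3 returns ENOSYS so that glibc falls back to clone). The following is a minimal sketch of that idea in Go, not the code in profiles/seccomp/default_linux.go. The `Rule` type, `defaultRules`, and `decide` are hypothetical names introduced for illustration, the rule list is a tiny hand-picked subset, and the EPERM default is an assumption taken from the "historical EPERM" wording in 567c01f6d1. The real profile also treats clone3 differently when CAP_SYS_ADMIN is present, which this sketch ignores.

```go
package main

import "fmt"

// Action is a simplified stand-in for a seccomp action (hypothetical type).
type Action string

const (
	ActAllow Action = "SCMP_ACT_ALLOW"
	ActErrno Action = "SCMP_ACT_ERRNO"
)

// Linux errno values, hardcoded so the sketch builds on any platform.
const (
	eperm  = 1  // EPERM: assumed default for syscalls no rule matches
	enosys = 38 // ENOSYS: makes glibc fall back from clone3 to clone
)

// Rule is a hypothetical, simplified profile entry. NeedsCap is empty
// when the rule applies regardless of the container's capabilities.
type Rule struct {
	Names    []string
	Action   Action
	ErrnoRet int
	NeedsCap string
}

// defaultRules returns a small illustrative subset, not the real profile.
func defaultRules() []Rule {
	return []Rule{
		{Names: []string{"read", "write", "close_range"}, Action: ActAllow},
		{Names: []string{"clone3"}, Action: ActErrno, ErrnoRet: enosys},
		{Names: []string{"mount", "move_mount", "open_tree"}, Action: ActAllow, NeedsCap: "CAP_SYS_ADMIN"},
		{Names: []string{"clock_settime"}, Action: ActAllow, NeedsCap: "CAP_SYS_TIME"},
	}
}

// decide returns the action and errno for one syscall name, skipping
// rules whose required capability is not in caps.
func decide(rules []Rule, caps map[string]bool, name string) (Action, int) {
	for _, r := range rules {
		if r.NeedsCap != "" && !caps[r.NeedsCap] {
			continue
		}
		for _, n := range r.Names {
			if n == name {
				return r.Action, r.ErrnoRet
			}
		}
	}
	return ActErrno, eperm
}

func main() {
	rules := defaultRules()
	withAdmin := map[string]bool{"CAP_SYS_ADMIN": true}
	for _, name := range []string{"read", "clone3", "mount", "clock_settime"} {
		a1, e1 := decide(rules, nil, name)
		a2, e2 := decide(rules, withAdmin, name)
		fmt.Printf("%-14s default: %s(%d)  +CAP_SYS_ADMIN: %s(%d)\n", name, a1, e1, a2, e2)
	}
}
```

In this sketch, adding CAP_SYS_ADMIN changes the result for `mount` only; `clock_settime` stays blocked until CAP_SYS_TIME is added, mirroring the per-capability gating the commits describe.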


52 commits