0ct0pu5/moby

Author	SHA1	Message	Date
Sebastiaan van Stijn	ed7c26339e	seccomp: add futex_wake syscall (kernel v6.7, libseccomp v2.5.5) Add this syscall to match the profile in containerd containerd: `a6e52c74fa` libseccomp: `53267af3fb` kernel: `9f6c532f59` futex: Add sys_futex_wake() To complement sys_futex_waitv() add sys_futex_wake(). This syscall implements what was previously known as FUTEX_WAKE_BITSET except it uses 'unsigned long' for the bitmask and takes FUTEX2 flags. The 'unsigned long' allows FUTEX2_SIZE_U64 on 64bit platforms. Signed-off-by: Sebastiaan van Stijn <github@gone.nl> (cherry picked from commit `d69729e053`) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2024-02-06 15:25:59 +01:00
Sebastiaan van Stijn	74e3b4fb2e	seccomp: add futex_wait syscall (kernel v6.7, libseccomp v2.5.5) Add this syscall to match the profile in containerd containerd: `a6e52c74fa` libseccomp: `53267af3fb` kernel: `cb8c4312af` futex: Add sys_futex_wait() To complement sys_futex_waitv()/wake(), add sys_futex_wait(). This syscall implements what was previously known as FUTEX_WAIT_BITSET except it uses 'unsigned long' for the value and bitmask arguments, takes timespec and clockid_t arguments for the absolute timeout and uses FUTEX2 flags. The 'unsigned long' allows FUTEX2_SIZE_U64 on 64bit platforms. Signed-off-by: Sebastiaan van Stijn <github@gone.nl> (cherry picked from commit `10d344d176`) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2024-02-06 15:25:59 +01:00
Sebastiaan van Stijn	4cc0416534	seccomp: add futex_requeue syscall (kernel v6.7, libseccomp v2.5.5) Add this syscall to match the profile in containerd containerd: `a6e52c74fa` libseccomp: `53267af3fb` kernel: `0f4b5f9722` futex: Add sys_futex_requeue() Finish off the 'simple' futex2 syscall group by adding sys_futex_requeue(). Unlike sys_futex_{wait,wake}() its arguments are too numerous to fit into a regular syscall. As such, use struct futex_waitv to pass the 'source' and 'destination' futexes to the syscall. This syscall implements what was previously known as FUTEX_CMP_REQUEUE and uses {val, uaddr, flags} for source and {uaddr, flags} for destination. This design explicitly allows requeueing between different types of futex by having a different flags word per uaddr. Signed-off-by: Sebastiaan van Stijn <github@gone.nl> (cherry picked from commit `df57a080b6`) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2024-02-06 15:25:59 +01:00
Sebastiaan van Stijn	f9f9e7ff9a	seccomp: add map_shadow_stack syscall (kernel v6.6, libseccomp v2.5.5) Add this syscall to match the profile in containerd containerd: `a6e52c74fa` libseccomp: `53267af3fb` kernel: `c35559f94e` x86/shstk: Introduce map_shadow_stack syscall When operating with shadow stacks enabled, the kernel will automatically allocate shadow stacks for new threads, however in some cases userspace will need additional shadow stacks. The main example of this is the ucontext family of functions, which require userspace allocating and pivoting to userspace managed stacks. Unlike most other user memory permissions, shadow stacks need to be provisioned with special data in order to be useful. They need to be setup with a restore token so that userspace can pivot to them via the RSTORSSP instruction. But, the security design of shadow stacks is that they should not be written to except in limited circumstances. This presents a problem for userspace, as to how userspace can provision this special data, without allowing for the shadow stack to be generally writable. Signed-off-by: Sebastiaan van Stijn <github@gone.nl> (cherry picked from commit `8826f402f9`) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2024-02-06 15:25:59 +01:00
Sebastiaan van Stijn	5fb4eb941d	seccomp: add fchmodat2 syscall (kernel v6.6, libseccomp v2.5.5) Add this syscall to match the profile in containerd containerd: `a6e52c74fa` libseccomp: `53267af3fb` kernel: `09da082b07` fs: Add fchmodat2() On the userspace side fchmodat(3) is implemented as a wrapper function which implements the POSIX-specified interface. This interface differs from the underlying kernel system call, which does not have a flags argument. Most implementations require procfs [1][2]. There doesn't appear to be a good userspace workaround for this issue but the implementation in the kernel is pretty straight-forward. The new fchmodat2() syscall allows to pass the AT_SYMLINK_NOFOLLOW flag, unlike existing fchmodat. Signed-off-by: Sebastiaan van Stijn <github@gone.nl> (cherry picked from commit `6f242f1a28`) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2024-02-06 15:25:59 +01:00
Sebastiaan van Stijn	67e9aa6d4d	seccomp: add cachestat syscall (kernel v6.5, libseccomp v2.5.5) Add this syscall to match the profile in containerd containerd: `a6e52c74fa` libseccomp: `53267af3fb` kernel: `cf264e1329` NAME cachestat - query the page cache statistics of a file. SYNOPSIS #include <sys/mman.h> struct cachestat_range { __u64 off; __u64 len; }; struct cachestat { __u64 nr_cache; __u64 nr_dirty; __u64 nr_writeback; __u64 nr_evicted; __u64 nr_recently_evicted; }; int cachestat(unsigned int fd, struct cachestat_range *cstat_range, struct cachestat *cstat, unsigned int flags); DESCRIPTION cachestat() queries the number of cached pages, number of dirty pages, number of pages marked for writeback, number of evicted pages, number of recently evicted pages, in the bytes range given by `off` and `len`. Signed-off-by: Sebastiaan van Stijn <github@gone.nl> (cherry picked from commit `4d0d5ee10d`) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2024-02-06 15:25:58 +01:00
Sebastiaan van Stijn	61b82be580	seccomp: add set_mempolicy_home_node syscall (kernel v5.17, libseccomp v2.5.4) This syscall is gated by CAP_SYS_NICE, matching the profile in containerd. containerd: `a6e52c74fa` libseccomp: `d83cb7ac25` kernel: `c6018b4b25` mm/mempolicy: add set_mempolicy_home_node syscall This syscall can be used to set a home node for the MPOL_BIND and MPOL_PREFERRED_MANY memory policy. Users should use this syscall after setting up a memory policy for the specified range as shown below. mbind(p, nr_pages * page_size, MPOL_BIND, new_nodes->maskp, new_nodes->size + 1, 0); sys_set_mempolicy_home_node((unsigned long)p, nr_pages * page_size, home_node, 0); The syscall allows specifying a home node/preferred node from which kernel will fulfill memory allocation requests first. ... Signed-off-by: Sebastiaan van Stijn <github@gone.nl> (cherry picked from commit `1251982cf7`) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2024-02-06 15:25:56 +01:00
Albin Kerouanton	891241e7e7	seccomp: block io_uring_* syscalls in default profile This syncs the seccomp profile with changes made to containerd's default profile in [1]. The original containerd issue and PR mention: > Security experts generally believe io_uring to be unsafe. In fact > Google ChromeOS and Android have turned it off, plus all Google > production servers turn it off. Based on the blog published by Google > below it seems like a bunch of vulnerabilities related to io_uring can > be exploited to break out of the container. > > [2] > > Other security researchers also hold this opinion: see [3] for a > blackhat presentation on io_uring exploits. For the record, these syscalls were added to the allowlist in [4]. [1]: `a48ddf4a20` [2]: https://security.googleblog.com/2023/06/learnings-from-kctf-vrps-42-linux.html [3]: https://i.blackhat.com/BH-US-23/Presentations/US-23-Lin-bad_io_uring.pdf [4]: https://github.com/moby/moby/pull/39415 Signed-off-by: Albin Kerouanton <albinker@gmail.com>	2023-11-02 19:05:47 +01:00
Bjorn Neergaard	b335e3d305	seccomp: add name_to_handle_at to allowlist Based on the analysis on [the previous PR][1]. [1]: https://github.com/moby/moby/pull/45766#pullrequestreview-1493908145 Signed-off-by: Bjorn Neergaard <bjorn.neergaard@docker.com>	2023-06-28 05:44:48 -06:00
Vitor Anjos	fdc9b7cceb	remove name_to_handle_at(2) from filtered syscalls Signed-off-by: Vitor Anjos <bartier@users.noreply.github.com>	2023-06-27 09:49:38 -03:00
Sebastiaan van Stijn	57b229012a	seccomp: block socket calls to AF_VSOCK in default profile This syncs the seccomp-profile with the latest changes in containerd's profile, applying the same changes as `17a9324035` Some background from the associated ticket: > We want to use vsock for guest-host communication on KubeVirt > (https://github.com/kubevirt/kubevirt). In KubeVirt we run VMs in pods. > > However since anyone can just connect from any pod to any VM with the > default seccomp settings, we cannot limit connection attempts to our > privileged node-agent. > > ### Describe the solution you'd like > We want to deny the `socket` syscall for the `AF_VSOCK` family by default. > > I see in [1] and [2] that AF_VSOCK was actually already blocked for some > time, but that got reverted since some architectures support the `socketcall` > syscall which can't be restricted properly. However we are mostly interested > in `arm64` and `amd64` where limiting `socket` would probably be enough. > > ### Additional context > I know that in theory we could use our own seccomp profiles, but we would want > to provide security for as many users as possible which use KubeVirt, and there > it would be very helpful if this protection could be added by being part of the > DefaultRuntime profile to easily ensure that it is active for all pods [3]. > > Impact on existing workloads: It is unlikely that this will disturb any existing > workload, because VSOCK is almost exclusively used for host-guest communication. > However if someone would still use it: Privileged pods would still be able to > use `socket` for `AF_VSOCK`, custom seccomp policies could be applied too. > Further it was already blocked for quite some time and the blockade got lifted > due to reasons not related to AF_VSOCK. > > The PR in KubeVirt which adds VSOCK support for additional context: [4] > > [1]: https://github.com/moby/moby/pull/29076#commitcomment-21831387 > [2]: `dcf2632945` > [3]: https://kubernetes.io/docs/tutorials/security/seccomp/#enable-the-use-of-runtimedefault-as-the-default-seccomp-profile-for-all-workloads > [4]: https://github.com/kubevirt/kubevirt/pull/8546 Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2022-12-01 14:06:37 +01:00
Sebastiaan van Stijn	7b7d1132e8	seccomp: allow "bpf", "perf_event_open", gated by CAP_BPF, CAP_PERFMON Update the profile to make use of CAP_BPF and CAP_PERFMON capabilities. Prior to kernel 5.8, bpf and perf_event_open required CAP_SYS_ADMIN. This change enables finer control of the privilege setting, thus allowing us to run certain system tracing tools with minimal privileges. Based on the original patch from Henry Wang in the containerd repository. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2022-08-18 18:34:09 +02:00
zhubojun	e258d66f17	profiles: seccomp: add syscalls related to PKU in default policy Add pkey_alloc(2), pkey_free(2) and pkey_mprotect(2) in seccomp default profile. pkey_alloc(2), pkey_free(2) and pkey_mprotect(2) can only configure the calling process's own memory, so they are existing "safe for everyone" syscalls. close issue: #43481 Signed-off-by: zhubojun <bojun.zhu@foxmail.com>	2022-07-11 09:50:53 +08:00
Bastien Pascard	420142a886	profiles: seccomp: allow clock_settime64 when CAP_SYS_TIME is added Signed-off-by: Bastien Pascard <bpascard@hotmail.com>	2022-07-06 23:45:13 +02:00
Djordje Lukic	7de9f4f82d	Allow different syscalls from kernels 5.12 -> 5.16 Kernel 5.12: mount_setattr: needs CAP_SYS_ADMIN Kernel 5.14: quotactl_fd: needs CAP_SYS_ADMIN memfd_secret: always allowed Kernel 5.15: process_mrelease: always allowed Kernel 5.16: futex_waitv: always allowed Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>	2022-05-13 12:35:08 +02:00
Justin Cormack	f1dd6bf84e	Merge pull request #43553 from AkihiroSuda/riscv64 seccomp: support riscv64	2022-05-13 10:41:53 +01:00
Akihiro Suda	4c2f18f6cc	seccomp: support riscv64 Corresponds to containerd PR 6882 Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2022-05-02 17:41:43 +09:00
Tudor Brindus	af819bf623	seccomp: add support for Landlock syscalls in default policy This commit allows the Landlock[0] system calls in the default seccomp policy. Landlock was introduced in kernel 5.13, to fill the gap that inspecting filepaths passed as arguments to filesystem system calls is not really possible with pure `seccomp` (unless involving `ptrace`). Allowing Landlock by default fits in with allowing `seccomp` for containerized applications to voluntarily restrict their access rights to files within the container. [0]: https://www.kernel.org/doc/html/latest/userspace-api/landlock.html Signed-off-by: Tudor Brindus <me@tbrindus.ca>	2022-01-31 08:44:04 -05:00
Sören Tempel	85eaf23bf4	seccomp: add support for "swapcontext" syscall in default policy This system call is only available on 32- and 64-bit PowerPC. It is used by modern programming language implementations (such as gcc-go) to implement coroutine features through userspace context switches. Other container environments, such as systemd-nspawn, already whitelist this system call in their seccomp profile [1] [2]. As such, it would be nice to also whitelist it in moby. This issue was encountered on the Alpine Linux GitLab CI system, which uses moby, when attempting to execute gcc-go compiled software on ppc64le. [1]: https://github.com/systemd/systemd/pull/9487 [2]: https://github.com/systemd/systemd/issues/9485 Signed-off-by: Sören Tempel <soeren+git@soeren-tempel.net>	2021-12-18 14:06:07 +01:00
Sebastiaan van Stijn	2480bebf59	Merge pull request #42649 from kinvolk/rata/seccomp-default-errno seccomp: Use explicit DefaultErrnoRet	2021-08-03 15:13:42 +02:00
Rodrigo Campos	fb794166d9	seccomp: Use explicit DefaultErrnoRet Since commit "seccomp: Sync fields with runtime-spec fields" (`5d244675bd`) we support specifying the DefaultErrnoRet to be used. Before that commit it was not specified and EPERM was used by default. This commit keeps the same behaviour but just makes it explicit that the default is EPERM. Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>	2021-07-30 19:13:21 +02:00
Daniel P. Berrangé	9f6b562dd1	seccomp: add support for "clone3" syscall in default policy If no seccomp policy is requested, then the built-in default policy in dockerd applies. This has no rule for "clone3" defined, nor any default errno defined. So when runc receives the config it attempts to determine a default errno, using logic defined in its commit: `7a8d7162f9` As explained in the above commit message, runc uses a heuristic to decide which errno to return by default: [quote] The solution applied here is to prepend a "stub" filter which returns -ENOSYS if the requested syscall has a larger syscall number than any syscall mentioned in the filter. The reason for this specific rule is that syscall numbers are (roughly) allocated sequentially and thus newer syscalls will (usually) have a larger syscall number -- thus causing our filters to produce -ENOSYS if the filter was written before the syscall existed. [/quote] Unfortunately clone3 appears to be one of the edge cases that does not result in use of ENOSYS, instead ending up with the historical EPERM errno. Latest glibc (2.33.9000, in Fedora 35 rawhide) will attempt to use clone3 by default. If it sees ENOSYS then it will automatically fallback to using clone. Any other errno is treated as a fatal error. Thus when docker seccomp policy triggers EPERM from clone3, no fallback occurs and programs are thus unable to spawn threads. The clone3 syscall is much more complicated than clone, most notably its flags are not exposed as a direct argument any more. Instead they are hidden inside a struct. This means that seccomp filters are unable to apply policy based on values seen in flags. Thus we can't directly replicate the current "clone" filtering for "clone3". We can at least ensure "clone3" returns ENOSYS errno, to trigger fallback to "clone" at which point we can filter on flags. Fixes: https://github.com/moby/moby/issues/42680 Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2021-07-27 10:56:07 +01:00
Sebastiaan van Stijn	c7cd1b9436	profiles/seccomp.Syscall: use pointers and omitempty These fields are optional, and this makes the JSON representation slightly less verbose. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2021-06-17 21:25:09 +02:00
Sebastiaan van Stijn	d92739713c	seccomp.Syscall: embed runtime-spec Syscall type This makes the type better reflect the difference with the "runtime" profile; our local type is used to generate a runtime-spec seccomp profile and extends the runtime-spec type with additional fields; adding a "Name" field for backward compatibility with older JSON representations, additional "Comment" metadata, and conditional rules ("Includes", "Excludes") used during generation to adjust the profile based on the container (capabilities) and host's (architecture, kernel) configuration. This change introduces one change in the type; the "runtime-spec" type uses a `[]LinuxSeccompArg` for the `Args` field, whereas the local type used pointers; `[]*LinuxSeccompArg`. In addition, the runtime-spec Syscall type brings a new `ErrnoRet` field, allowing the profile to specify the errno code returned for the syscall, which allows changing the default EPERM for specific syscalls. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2021-06-17 21:25:06 +02:00
clubby789	d39b075302	Enable `process_vm_readv` and `process_vm_writev` for kernel > 4.8 These syscalls were disabled in #18971 due to them requiring CAP_PTRACE. CAP_PTRACE was blocked by default due to a ptrace related exploit. This has been patched in the Linux kernel (version 4.8) and thus `ptrace` has been re-enabled. However, these associated syscalls seem to have been left behind. This commit brings them in line with `ptrace`, and re-enables it for kernel > 4.8. Signed-off-by: clubby789 <jamie@hill-daniel.co.uk>	2021-03-04 17:12:01 +00:00
Aleksa Sarai	54eff4354b	profiles: seccomp: update to Linux 5.11 syscall list These syscalls (some of which have been in Linux for a while but were missing from the profile) fall into a few buckets: * close_range(2), epoll_pwait2(2) are just extensions of existing "safe for everyone" syscalls. * The mountv2 API syscalls (fs*(2), move_mount(2), open_tree(2)) are all equivalent to aspects of mount(2) and thus go into the CAP_SYS_ADMIN category. * process_madvise(2) is similar to the other process_*(2) syscalls and thus goes in the CAP_SYS_PTRACE category. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2021-01-27 13:25:49 +11:00
Mark Vainomaa	f7bcb02f67	seccomp: Add pidfd_getfd syscall Signed-off-by: Mark Vainomaa <mikroskeem@mikroskeem.eu>	2020-11-12 15:31:07 +02:00
Mark Vainomaa	5e3ffe6464	seccomp: Add pidfd_open and pidfd_send_signal Signed-off-by: Mark Vainomaa <mikroskeem@mikroskeem.eu>	2020-11-11 15:20:34 +02:00
Sebastiaan van Stijn	0d75b63987	seccomp: replace types with runtime-spec types Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2020-09-18 19:33:58 +02:00
Jintao Zhang	a18139111d	Add faccessat2 to default seccomp profile. Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com>	2020-08-17 21:13:03 +08:00
Jintao Zhang	b8988c8475	Add openat2 to default seccomp profile. follow up to https://github.com/moby/moby/pull/41344#discussion_r469919978 Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com>	2020-08-16 15:58:57 +08:00
Florian Schmaus	d0d99b04cf	seccomp: allow 'rseq' syscall in default seccomp profile Restartable Sequences (rseq) are a kernel-based mechanism for fast update operations on per-core data in user-space. Some libraries, like the newest version of Google's TCMalloc, depend on it [1]. This also makes Docker's default seccomp profile on par with systemd's, which enabled 'rseq' in early 2019 [2]. 1: https://google.github.io/tcmalloc/design.html 2: `6fee3be0b4` Signed-off-by: Florian Schmaus <flo@geekplace.eu>	2020-06-26 16:06:26 +02:00
Justin Cormack	1aafcbb47a	Merge pull request #40995 from KentaTada/remove-unused-syscall seccomp: remove the unused query_module(2)	2020-05-28 11:25:59 +01:00
Akihiro Suda	b2917efb1a	Merge pull request #40731 from sqreen/fix/seccomp-profile seccomp: allow syscall membarrier	2020-05-20 00:31:32 +09:00
Kenta Tada	1192c7aee4	seccomp: remove the unused query_module(2) query_module(2) is only in kernels before Linux 2.6. Signed-off-by: Kenta Tada <Kenta.Tada@sony.com>	2020-05-19 10:30:54 +09:00
Stanislav Levin	5d3a9e4319	seccomp: Whitelist `clock_adjtime` This only allows making the syscall. CAP_SYS_TIME is still required for time adjustment (enforced by the kernel): ``` kernel/time/posix-timers.c: 1112 SYSCALL_DEFINE2(clock_adjtime, const clockid_t, which_clock, 1113 struct __kernel_timex __user *, utx) ... 1121 err = do_clock_adjtime(which_clock, &ktx); 1100 int do_clock_adjtime(const clockid_t which_clock, struct __kernel_timex *ktx) 1101 { ... 1109 return kc->clock_adj(which_clock, ktx); 1299 static const struct k_clock clock_realtime = { ... 1304 .clock_adj = posix_clock_realtime_adj, 188 static int posix_clock_realtime_adj(const clockid_t which_clock, 189 struct __kernel_timex *t) 190 { 191 return do_adjtimex(t); kernel/time/timekeeping.c: 2312 int do_adjtimex(struct __kernel_timex *txc) 2313 { ... 2321 /* Validate the data before disabling interrupts */ 2322 ret = timekeeping_validate_timex(txc); 2246 static int timekeeping_validate_timex(const struct __kernel_timex *txc) 2247 { 2248 if (txc->modes & ADJ_ADJTIME) { ... 2252 if (!(txc->modes & ADJ_OFFSET_READONLY) && 2253 !capable(CAP_SYS_TIME)) 2254 return -EPERM; 2255 } else { 2256 /* In order to modify anything, you gotta be super-user! */ 2257 if (txc->modes && !capable(CAP_SYS_TIME)) 2258 return -EPERM; ``` Fixes: https://github.com/moby/moby/issues/40919 Signed-off-by: Stanislav Levin <slev@altlinux.org>	2020-05-08 12:33:25 +03:00
Julio Guerra	1026f873a4	seccomp: allow syscall membarrier Add the membarrier syscall to the default seccomp profile. It is for example used in the implementation of dlopen() in the musl libc of Alpine images. Signed-off-by: Julio Guerra <julio@sqreen.com>	2020-04-07 16:24:17 +02:00
Sebastiaan van Stijn	89fabf0f24	seccomp: add 64-bit time_t syscalls Relates to https://patchwork.kernel.org/patch/10756415/ Added to whitelist: - `clock_getres_time64` (equivalent of `clock_getres`, which was whitelisted) - `clock_gettime64` (equivalent of `clock_gettime`, which was whitelisted) - `clock_nanosleep_time64` (equivalent of `clock_nanosleep`, which was whitelisted) - `futex_time64` (equivalent of `futex`, which was whitelisted) - `io_pgetevents_time64` (equivalent of `io_pgetevents`, which was whitelisted) - `mq_timedreceive_time64` (equivalent of `mq_timedreceive`, which was whitelisted) - `mq_timedsend_time64` (equivalent of `mq_timedsend`, which was whitelisted) - `ppoll_time64` (equivalent of `ppoll`, which was whitelisted) - `pselect6_time64` (equivalent of `pselect6`, which was whitelisted) - `recvmmsg_time64` (equivalent of `recvmmsg`, which was whitelisted) - `rt_sigtimedwait_time64` (equivalent of `rt_sigtimedwait`, which was whitelisted) - `sched_rr_get_interval_time64` (equivalent of `sched_rr_get_interval`, which was whitelisted) - `semtimedop_time64` (equivalent of `semtimedop`, which was whitelisted) - `timer_gettime64` (equivalent of `timer_gettime`, which was whitelisted) - `timer_settime64` (equivalent of `timer_settime`, which was whitelisted) - `timerfd_gettime64` (equivalent of `timerfd_gettime`, which was whitelisted) - `timerfd_settime64` (equivalent of `timerfd_settime`, which was whitelisted) - `utimensat_time64` (equivalent of `utimensat`, which was whitelisted) Not added to whitelist: - `clock_adjtime64` (equivalent of `clock_adjtime`, which was not whitelisted) - `clock_settime64` (equivalent of `clock_settime`, which was not whitelisted) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2020-03-25 13:49:49 +01:00
Arnaud Rebillout	667c87ef4f	profiles: Fix file permissions on json files json files should not be executable I think. Signed-off-by: Arnaud Rebillout <arnaud.rebillout@collabora.com>	2019-09-16 11:15:37 +07:00
youcai	f4d41f1dfa	seccomp: whitelist io-uring related system calls Signed-off-by: youcai <omegacoleman@gmail.com>	2019-09-07 07:35:23 +00:00
Michael Crosby	e4605cc2a5	Add sigprocmask to default seccomp profile Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2019-08-29 13:52:45 -04:00
Sebastiaan van Stijn	a1ec8551ab	Fix seccomp profile for clone syscall All clone flags for namespace should be denied. Based-on-patch-by: Kenta Tada <Kenta.Tada@sony.com> Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2019-06-04 15:28:12 +02:00
Avi Kivity	665741510a	seccomp: whitelist io_pgetevents() io_pgetevents() is a new Linux system call. It is similar to io_getevents() that is already whitelisted, and adds no special abilities over that system call. Allow that system call to enable applications that use it. Fixes #38894. Signed-off-by: Avi Kivity <avi@scylladb.com>	2019-03-18 20:46:16 +02:00
Tonis Tiigi	e76380b67b	seccomp: review update Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>	2019-02-05 12:02:41 -08:00
Tonis Tiigi	1124543ca8	seccomp: allow ptrace for 4.8+ kernels 4.8+ kernels have fixed the ptrace security issues so we can allow ptrace(2) on the default seccomp profile if we do the kernel version check. `93e35efb8d` Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>	2018-11-04 13:06:43 -08:00
Justin Cormack	ccd22ffcc8	Move the syslog syscall to be gated by CAP_SYS_ADMIN or CAP_SYSLOG This call is what is used to implement `dmesg` to get kernel messages about the host. This can leak substantial information about the host. It is normally available to unprivileged users on the host, unless the sysctl `kernel.dmesg_restrict = 1` is set, but this is not set by standard on the majority of distributions. Blocking this to restrict leaks about the configuration seems correct. Fix #37897 See also https://googleprojectzero.blogspot.com/2018/09/a-cache-invalidation-bug-in-linux.html Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2018-09-27 14:27:05 -07:00
Nicolas V Castet	47dfff68e4	Whitelist syscalls linked to CAP_SYS_NICE in default seccomp profile * Update profile to match docker documentation at https://docs.docker.com/engine/security/seccomp/ Signed-off-by: Nicolas V Castet <nvcastet@us.ibm.com>	2018-06-20 07:32:08 -05:00
NobodyOnSE	b2a907c8ca	Whitelist statx syscall for libseccomp-2.3.3 onward Older seccomp versions will ignore this. Signed-off-by: NobodyOnSE <ich@sektor.selfip.com>	2018-03-06 08:42:12 +01:00
Simon Vikstrom	d7bf5e3b4d	Remove double defined alarm Signed-off-by: Simon Vikstrom <pullreq@devsn.se>	2017-08-19 09:55:03 +02:00
Panagiotis Moustafellos	cf6e1c5dfd	seccomp: whitelist quotactl with CAP_SYS_ADMIN The quotactl syscall is being whitelisted in default seccomp profile, gated by CAP_SYS_ADMIN. Signed-off-by: Panagiotis Moustafellos <pmoust@elastic.co>	2017-08-09 18:52:15 +03:00
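
The -ENOSYS heuristic quoted in the clone3 commit (`9f6b562dd1`) can be modelled in a few lines. This is a minimal Python sketch, not runc's or dockerd's code: the function name `filter_result`, the `NR` table, and the sample profile are hypothetical, and the syscall numbers are x86_64 values included for illustration only. It shows how a syscall that is numerically below the newest one mentioned in the profile falls through to the default EPERM, and how an explicit per-syscall errno restores the ENOSYS fallback.

```python
import errno

# Illustrative syscall numbers (assumed x86_64 values, not read from a kernel).
NR = {"read": 0, "clone": 56, "clone3": 435, "openat2": 437, "futex_waitv": 449}


def filter_result(name, allowed, errno_overrides=None, default_errno=errno.EPERM):
    """Return 0 if the call is allowed, else the errno this sketch would report."""
    errno_overrides = errno_overrides or {}
    nr = NR[name]

    # Stub rule from the quoted heuristic: a syscall numbered above everything
    # the profile mentions is assumed to be newer than the profile itself.
    mentioned = set(allowed) | set(errno_overrides)
    if nr > max(NR[s] for s in mentioned):
        return errno.ENOSYS

    if name in allowed:
        return 0
    # Per-syscall errno, in the spirit of the runtime-spec ErrnoRet field.
    if name in errno_overrides:
        return errno_overrides[name]
    return default_errno


# Hypothetical profile that predates clone3 but already lists openat2.
profile = {"read", "clone", "openat2"}
print(filter_result("clone3", profile) == errno.EPERM)
print(filter_result("clone3", profile, {"clone3": errno.ENOSYS}) == errno.ENOSYS)
```

In this model the first call yields EPERM because clone3 is numbered below openat2, so the stub rule never fires; the second call yields ENOSYS, the value the commit message says glibc needs to see before it falls back to clone.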

70 commits