0ct0pu5/ladybird

Author	SHA1	Message	Date
Liav A	633006926f	Kernel: Make the Jails' internal design a lot more sane This is done with 2 major steps: 1. Remove JailManagement singleton and use a structure that resembles what we have with the Process object. This is required later for the second step in this commit, but on its own, is a major change that removes this clunky singleton that had no real usage by itself. 2. Use IntrusiveLists to keep references to Process objects in the same Jail so it will be much more straightforward to iterate on this kind of objects when needed. Previously we locked the entire Process list and we did a simple pointer comparison to check if the checked Process we iterate on is in the same Jail or not, which required taking multiple Spinlocks in a very clumsy and heavyweight way.	2023-03-12 10:21:59 -06:00
Andreas Kling	d1371d66f7	Kernel: Use non-locking {Nonnull,}RefPtr for OpenFileDescription This patch switches away from {Nonnull,}LockRefPtr to the non-locking smart pointers throughout the kernel. I've looked at the handful of places where these were being persisted and I don't see any race situations. Note that the process file descriptor table (Process::m_fds) was already guarded via MutexProtected.	2023-03-07 00:30:12 +01:00
Andreas Kling	359d6e7b0b	Everywhere: Stop using NonnullOwnPtrVector Same as NonnullRefPtrVector: weird semantics, questionable benefits.	2023-03-06 23:46:35 +01:00
Sam Atkins	fe7b08dad7	Kernel: Protect Process::m_name with a spinlock This also lets us remove the `get_process_name` and `set_process_name` syscalls from the big lock. :^)	2023-02-06 20:36:53 +01:00
Timon Kruiper	b941bd55d9	Kernel: Add Syscalls/execve.cpp to aarch64 build	2023-01-27 20:47:08 +00:00
Timon Kruiper	1fbf562e7e	Kernel: Add ThreadRegisters::set_exec_state and use it in execve.cpp Using this abstraction it is possible to compile this file for aarch64.	2023-01-27 20:47:08 +00:00
Timon Kruiper	12322670cb	Kernel: Use InterruptsState abstraction in execve.cpp This was using the x86_64 specific cpu_flags abstraction, which is not compatible with aarch64.	2023-01-27 20:47:08 +00:00
Andrew Kaster	ddea37b521	Kernel+LibC: Move name length constants to Kernel/API from limits.h Reduce inclusion of limits.h as much as possible at the same time. This does mean that kmalloc.h is now including Kernel/API/POSIX/limits.h instead of LibC/limits.h, but the scope could be limited a lot more. Basically every file in the kernel includes kmalloc.h, and needs the limits.h include for PAGE_SIZE.	2023-01-21 10:43:59 -07:00
Liav A	04221a7533	Kernel: Mark Process::jail() method as const We really don't want callers of this function to accidentally change the jail, or even worse - remove the Process from an attached jail. To ensure this never happens, we can just declare this method as const so nobody can mutate it this way.	2023-01-07 03:44:59 +03:30
yyny	9ca979846c	Kernel: Add `sid` and `pgid` to `Credentials` There are places in the kernel that would like to have access to `pgid` credentials in certain circumstances. I haven't found any use cases for `sid` yet, but `sid` and `pgid` are both changed with `sys$setpgid`, so it seemed sensical to add it. In Linux, `man 7 credentials` also mentions both the session id and process group id, so this isn't unprecedented.	2023-01-03 18:13:11 +01:00
Liav A	e598f22768	Kernel: Disallow executing SUID binaries if process is jailed Check if the process we are currently running is in a jail, and if that is the case, fail early with the EPERM error code. Also, as Brian noted, we should also disallow attaching to a jail in case of already running within a setid executable, as this leaves the user with false thinking of being secure (because you can't exec new setid binaries), but the current program is still marked setid, which means that at the very least we gained permissions while we didn't expect it, so let's block it.	2022-12-30 15:49:37 -05:00
Liav A	5ff318cf3a	Kernel: Remove i686 support	2022-12-28 11:53:41 +01:00
Agustin Gianni	ac40090583	Kernel: Add the auxiliary vector to the stack size validation This patch validates that the size of the auxiliary vector does not exceed `Process::max_auxiliary_size`. The auxiliary vector is a range of memory in userspace stack where the kernel can pass information to the process that will be created via `Process::do_exec`. The reason the kernel needs to validate its size is that the about to be created process needs to have remaining space on the stack. Previously only `argv` and `envp` were taken into account for the size validation, with this patch, the size of `auxv` is also checked. All three elements contain values that a user (or an attacker) can specify. This patch adds the constant `Process::max_auxiliary_size` which is defined to be one eighth of the user-space stack size. This is the approach taken by `Process::max_arguments_size` and `Process::max_environment_size` which are used to check the sizes of `argv` and `envp`.	2022-12-14 15:09:28 +00:00
sin-ack	ef6921d7c7	Kernel+LibC+LibELF: Set stack size based on PT_GNU_STACK during execve Some programs explicitly ask for a different initial stack size than what the OS provides. This is implemented in ELF by having a PT_GNU_STACK header which has its p_memsz set to the amount that the program requires. This commit implements this policy by reading the p_memsz of the header and setting the main thread stack size to that. ELF::Image::validate_program_headers ensures that the size attribute is a reasonable value.	2022-12-11 19:55:37 -07:00
Liav A	718ae68621	Kernel+LibCore+LibC: Implement support for forcing unveil on exec To accomplish this, we add another VeilState which is called LockedInherited. The idea is to apply exec unveil data, similar to execpromises of the pledge syscall, on the current exec'ed program during the execve sequence. When applying the forced unveil data, the veil state is set to be locked but the special state of LockedInherited ensures that if the new program tries to unveil paths, the request will silently be ignored, so the program will continue running without receiving an error, but it can still only use the paths that were unveiled before the exec syscall. This in turn, allows us to use the unveil syscall with a special utility to sandbox other userland programs in terms of what is visible to them on the filesystem, and is usable on both programs that use or don't use the unveil syscall in their code.	2022-11-26 12:42:15 -07:00
Liav A	5e062414c1	Kernel: Add support for jails Our implementation for Jails resembles much of how FreeBSD jails are working - it's essentially only a matter of using a RefPtr in the Process class to a Jail object. Then, when we iterate over all processes in various cases, we could ensure if either the current process is in jail and therefore should be restricted what is visible in terms of PID isolation, and also to be able to expose metadata about Jails in /sys/kernel/jails node (which does not reveal anything to a process which is in jail). A lifetime model for the Jail object is currently plain simple - there's simply no way to manually delete a Jail object once it was created. Such feature should be carefully designed to allow safe destruction of a Jail without the possibility of releasing a process which is in Jail from the actual jail. Each process which is attached into a Jail cannot leave it until the end of a Process (i.e. when finalizing a Process). All jails are kept being referenced in the JailManagement. When a last attached process is finalized, the Jail is automatically destroyed.	2022-11-05 18:00:58 -06:00
Liav A	965afba320	Kernel/FileSystem: Add a few missing includes In preparation to future commits, we need to ensure that OpenFileDescription.h doesn't include the VirtualFileSystem.h file to avoid include loops.	2022-10-22 16:57:52 -04:00
Andreas Kling	cf16b2c8e6	Kernel: Wrap process address spaces in SpinlockProtected This forces anyone who wants to look into and/or manipulate an address space to lock it. And this replaces the previous, more flimsy, manual spinlock use. Note that pointers into the address space are not safe to use after you unlock the space. We've got many issues like this, and we'll have to track those down as well.	2022-08-24 14:57:51 +02:00
Anthony Iacono	f86b671de2	Kernel: Use Process::credentials() and remove user ID/group ID helpers Move away from using the group ID/user ID helpers in the process to allow for us to take advantage of the immutable credentials instead.	2022-08-22 12:46:32 +02:00
Andreas Kling	c3351d4b9f	Kernel: Make VirtualFileSystem functions take credentials as input Instead of getting credentials from Process::current(), we now require that they be provided as input to the various VFS functions. This ensures that an atomic set of credentials is used throughout an entire VFS operation.	2022-08-21 16:02:24 +02:00
Andreas Kling	8ed06ad814	Kernel: Guard Process "protected data" with a spinlock This ensures that both mutable and immutable access to the protected data of a process is serialized. Note that there may still be multiple TOCTOU issues around this, as we have a bunch of convenience accessors that make it easy to introduce them. We'll need to audit those as well.	2022-08-21 12:25:14 +02:00
Andreas Kling	728c3fbd14	Kernel: Use RefPtr instead of LockRefPtr for Custody By protecting all the RefPtr<Custody> objects that may be accessed from multiple threads at the same time (with spinlocks), we remove the need for using LockRefPtr<Custody> (which is basically a RefPtr with a built-in spinlock.)	2022-08-21 12:25:14 +02:00
Andreas Kling	122d7d9533	Kernel: Add Credentials to hold a set of user and group IDs This patch adds a new object to hold a Process's user credentials: - UID, EUID, SUID - GID, EGID, SGID, extra GIDs Credentials are immutable and child processes initially inherit the Credentials object from their parent. Whenever a process changes one or more of its user/group IDs, a new Credentials object is constructed. Any code that wants to inspect and act on a set of credentials can now do so without worrying about data races.	2022-08-20 18:32:50 +02:00
Andreas Kling	11eee67b85	Kernel: Make self-contained locking smart pointers their own classes Until now, our kernel has reimplemented a number of AK classes to provide automatic internal locking: - RefPtr - NonnullRefPtr - WeakPtr - Weakable This patch renames the Kernel classes so that they can coexist with the original AK classes: - RefPtr => LockRefPtr - NonnullRefPtr => NonnullLockRefPtr - WeakPtr => LockWeakPtr - Weakable => LockWeakable The goal here is to eventually get rid of the Lock* classes in favor of using external locking.	2022-08-20 17:20:43 +02:00
Undefine	97cc33ca47	Everywhere: Make the codebase more architecture aware	2022-07-27 21:46:42 +00:00
Hendiadyoin1	ad904cdcab	Kernel: Use find_last_split_view to get the executable name in do_exec	2022-07-15 12:42:43 +02:00
sin-ack	3f3f45580a	Everywhere: Add sv suffix to strings relying on StringView(char const*) Each of these strings would previously rely on StringView's char const* constructor overload, which would call __builtin_strlen on the string. Since we now have operator ""sv, we can replace these with much simpler versions. This opens the door to being able to remove StringView(char const*). No functional changes.	2022-07-12 23:11:35 +02:00
Tim Schumacher	add4dd3589	Kernel: Do a POSIX-correct signal handler reset on exec	2022-07-05 20:58:38 +03:00
Luke Wilde	1682b0b6d8	Kernel: Remove big lock from `sys$set_coredump_metadata` The only requirement for this syscall is to make Process::m_coredump_properties SpinlockProtected.	2022-04-09 21:51:16 +02:00
Andreas Kling	9250ac0c24	Kernel: Randomize non-specific VM allocations done by sys$execve() Stuff like TLS regions, main thread stacks, etc. All deserve to be randomized unless the ELF requires specific placement. :^)	2022-04-04 00:42:18 +02:00
Andreas Kling	858b196c59	Kernel: Unbreak ASLR in the new RegionTree world Functions that allocate and/or place a Region now take a parameter that tells it whether to randomize unspecified addresses.	2022-04-03 21:51:58 +02:00
Andreas Kling	07f3d09c55	Kernel: Make VM allocation atomic for userspace regions This patch moves AddressSpace (the per-process memory manager) to using the new atomic "place" APIs in RegionTree as well, just like we did for MemoryManager in the previous commit. This required updating quite a few places where VM allocation and actually committing a Region object to the AddressSpace were separated by other code. All you have to do now is call into AddressSpace once and it'll take care of everything for you.	2022-04-03 21:51:58 +02:00
Idan Horowitz	086969277e	Everywhere: Run clang-format	2022-04-01 21:24:45 +01:00
Andreas Kling	580d89f093	Kernel: Put Process unveil state in a SpinlockProtected container This makes path resolution safe to perform without holding the big lock.	2022-03-08 00:19:49 +01:00
Idan Horowitz	011bd06053	Kernel: Set CS selector when initializing thread context on x86_64 These are not technically required, since the Thread constructor already sets these, but they are set on i686, so let's try and keep consistent behaviour between the different archs.	2022-02-27 00:38:00 +02:00
Brian Gianforcaro	70f3fa2dd2	Kernel: Set new process name in `do_exec` before waiting for the tracer While investigating why gdb is failing when it calls `PT_CONTINUE` against Serenity I noticed that the names of the programs in the System Monitor didn't make sense. They were seemingly stale. After inspecting the kernel code, it became apparent that the sequence occurs as follows: 1. Debugger calls `fork()` 2. The forked child calls `PT_TRACE_ME` 3. The `PT_TRACE_ME` instructs the forked process to block in the kernel waiting for a signal from the tracer on the next call to `execve(..)`. 4. Debugger waits for forked child to spawn and stop, and then it calls `PT_ATTACH` followed by `PT_CONTINUE` on the child. 5. Currently the `PT_CONTINUE` fails because of some other yet to be found bug. 6. The process name is set immediately AFTER we are woken up by the `PT_CONTINUE` which never happens in the case I'm debugging. This chain of events leaves the process suspended, with the name of the original (forked) process instead of the name we inherit from the `execve(..)` call. To avoid such confusion in the future, we set the new name before we block waiting for the tracer.	2022-02-19 18:04:32 -08:00
Ali Mohammad Pur	a1cb2c371a	AK+Kernel: OOM-harden most parts of Trie The only part of Unveil that can't handle OOM gracefully is the String::formatted() use in the node metadata.	2022-02-15 18:03:02 +02:00
Idan Horowitz	c8ab7bde3b	Kernel: Use try_make_weak_ptr() instead of make_weak_ptr()	2022-02-13 23:02:57 +01:00
Idan Horowitz	d6ea6c39a7	AK+Kernel: Rename try_make_weak_ptr to make_weak_ptr_if_nonnull This matches the likes of the adopt_{own, ref}_if_nonnull family and also frees up the name to allow us to eventually add OOM-fallible versions of these functions.	2022-02-13 23:02:57 +01:00
Andrew Kaster	b4a7d148b1	Kernel: Expose maximum argument limit in sysconf Move the definitions for maximum argument and environment size to Process.h from execve.cpp. This allows sysconf(_SC_ARG_MAX) to return the actual argument maximum of 128 KiB to userspace.	2022-02-13 22:06:54 +02:00
Lenny Maiorani	c6acf64558	Kernel: Change static constexpr variables to constexpr where possible Function-local `static constexpr` variables can be `constexpr`. This can reduce memory consumption, binary size, and offer additional compiler optimizations. These changes result in a stripped x86_64 kernel binary size reduction of 592 bytes.	2022-02-09 21:04:51 +00:00
Andreas Kling	3845c90e08	Kernel: Remove unnecessary includes from Thread.h ...and deal with the fallout by adding missing includes everywhere.	2022-01-30 16:21:59 +01:00
Idan Horowitz	e28af4a2fc	Kernel: Stop using HashMap in Mutex This commit removes the usage of HashMap in Mutex, thereby making Mutex be allocation-free. In order to achieve this several simplifications were made to Mutex, removing unused code-paths and extra VERIFYs: * We no longer support 'upgrading' a shared lock holder to an exclusive holder when it is the only shared holder and it did not unlock the lock before relocking it as exclusive. NOTE: Unlike the rest of these changes, this scenario is not VERIFY-able in an allocation-free way, as a result the new LOCK_SHARED_UPGRADE_DEBUG debug flag was added, this flag lets Mutex allocate in order to detect such cases when debugging a deadlock. * We no longer support checking if a Mutex is locked by the current thread when the Mutex was not locked exclusively, the shared version of this check was not used anywhere. * We no longer support force unlocking/relocking a Mutex if the Mutex was not locked exclusively, the shared version of these functions was not used anywhere.	2022-01-29 16:45:39 +01:00
Andreas Kling	b56646e293	Kernel: Switch process file descriptor table from spinlock to mutex There's no reason for this to use a spinlock. Instead, let's allow threads to block if someone else is using the descriptor table.	2022-01-29 02:17:09 +01:00
Andreas Kling	8ebec2938c	Kernel: Convert process file descriptor table to a SpinlockProtected Instead of manually locking in the various member functions of Process::OpenFileDescriptions, simply wrap it in a SpinlockProtected.	2022-01-29 02:17:06 +01:00
Andreas Kling	31c1094577	Kernel: Don't mess with thread state in Process::do_exec() We were marking the execing thread as Runnable near the end of Process::do_exec(). This was necessary for exec in processes that had never been scheduled yet, which is a specific edge case that only applies to the very first userspace process (normally SystemServer). At this point, such threads are in the Invalid state. In the common case (normal userspace-initiated exec), making the current thread Runnable meant that we switched away from its current state: Running. As the thread is indeed running, that's a bogus change! This created a short time window in which the thread state was bogus, and any attempt to block the thread would panic the kernel (due to a bogus thread state in Thread::block() leading to VERIFY_NOT_REACHED().) Fix this by not touching the thread state in Process::do_exec() and instead make the first userspace thread Runnable directly after calling Process::exec() on it in try_create_userspace_process(). It's unfortunate that exec() can be called both on the current thread, and on a new thread that has never been scheduled. It would be good to not have the latter edge case, but fixing that will require larger architectural changes outside the scope of this fix.	2022-01-27 11:18:25 +01:00
Brian Gianforcaro	e954b4bdd4	Kernel: Return error from sys$execve() when called with zero arguments There are many assumptions in the stack that argc is not zero, and argv[0] points to a valid string. The recent pwnkit exploit on Linux was able to exploit this assumption in the `pkexec` utility (a SUID-root binary) to escalate from any user to root. By convention `execve(..)` should always be called with at least one valid argument, so let's enforce that semantic to harden the system against vulnerabilities like pwnkit. Reference: https://www.qualys.com/2022/01/25/cve-2021-4034/pwnkit.txt	2022-01-26 13:05:59 +01:00
Idan Horowitz	d1433c35b0	Kernel: Handle OOM failures in find_shebang_interpreter_for_executable	2022-01-26 02:37:03 +02:00
Idan Horowitz	8cf0e4a5e4	Kernel: Eliminate allocations from generate_auxiliary_vector	2022-01-26 02:37:03 +02:00
Jelle Raaijmakers	df73e8b46b	Kernel: Allow program headers to align on multiples of `PAGE_SIZE` These checks in `sys$execve` could trip up the system whenever you try to execute an `.so` file. For example, double-clicking `libwasm.so` in Terminal crashes the kernel. This changes the program header alignment checks to reflect the same checks in LibELF, and passes the requested alignment on to `::try_allocate_range()`.	2022-01-23 00:11:56 +02:00
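
Several of the commits above (ac40090583, b4a7d148b1, e954b4bdd4) describe input checks that execve performs before building the new stack: reject an empty argv, and bound argv, envp and auxv each to one eighth of the user stack. The following is a minimal standalone sketch of that idea, not the kernel's code. The 1 MiB stack size is an assumption chosen so that one eighth matches the 128 KiB argument limit mentioned in the log, and `ExecCheck`, `total_size` and `validate_exec_inputs` are hypothetical names; the real kernel returns error codes rather than an enum, and its exact size accounting may differ.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Assumption: 1 MiB user stack, so that one eighth is the 128 KiB limit
// mentioned in the log. The constants mirror the Process::max_*_size names
// for readability only.
constexpr std::size_t user_stack_size = 1024 * 1024;
constexpr std::size_t max_arguments_size = user_stack_size / 8;
constexpr std::size_t max_environment_size = user_stack_size / 8;
constexpr std::size_t max_auxiliary_size = user_stack_size / 8;

// Hypothetical result type for this sketch.
enum class ExecCheck { Ok, NoArguments, TooBig };

// Approximate bytes a string list needs on the new stack: each string,
// its NUL terminator, and one pointer slot in the argv/envp array.
static std::size_t total_size(std::vector<std::string> const& strings)
{
    std::size_t total = 0;
    for (auto const& s : strings)
        total += s.size() + 1 + sizeof(char*);
    return total;
}

// Validate all three user-controlled inputs before any stack is built.
ExecCheck validate_exec_inputs(std::vector<std::string> const& arguments,
                               std::vector<std::string> const& environment,
                               std::size_t auxiliary_bytes)
{
    // argv must contain at least argv[0] (the pwnkit hardening).
    if (arguments.empty())
        return ExecCheck::NoArguments;
    if (total_size(arguments) > max_arguments_size)
        return ExecCheck::TooBig;
    if (total_size(environment) > max_environment_size)
        return ExecCheck::TooBig;
    // auxv is checked as well, so the new process keeps usable stack space.
    if (auxiliary_bytes > max_auxiliary_size)
        return ExecCheck::TooBig;
    return ExecCheck::Ok;
}
```

Each input gets its own budget, so the three together can use at most three eighths of the stack, leaving the rest for the new program.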


248 commits