Mere mortals like myself cannot understand more than two lines of
assembly without a million comments explaining what's happening, so do
that and make sure no one has to go on a wild stack state chase when
hacking on these.
POSIX requires that sigaction() and friends set a _process-wide_ signal
handler, so move signal handlers and flags inside Process.
This also fixes a "pid/tid confusion" FIXME, as we can now send the
signal to the process and let that decide which thread should get the
signal (which is the thread with tid==pid, but that's now the Process's
problem).
Note that each thread still retains its signal mask, as that is local to
each thread.
This commit removes the usage of HashMap in Mutex, thereby making Mutex
be allocation-free.
In order to achieve this several simplifications were made to Mutex,
removing unused code-paths and extra VERIFYs:
* We no longer support 'upgrading' a shared lock holder to an
exclusive holder when it is the only shared holder and it did not
unlock the lock before relocking it as exclusive. NOTE: Unlike the
rest of these changes, this scenario is not VERIFY-able in an
allocation-free way, as a result the new LOCK_SHARED_UPGRADE_DEBUG
debug flag was added, this flag lets Mutex allocate in order to
detect such cases when debugging a deadlock.
* We no longer support checking if a Mutex is locked by the current
thread when the Mutex was not locked exclusively, the shared version
of this check was not used anywhere.
* We no longer support force unlocking/relocking a Mutex if the Mutex
was not locked exclusively, the shared version of these functions
was not used anywhere.
We were marking the execing thread as Runnable near the end of
Process::do_exec().
This was necessary for exec in processes that had never been scheduled
yet, which is a specific edge case that only applies to the very first
userspace process (normally SystemServer). At this point, such threads
are in the Invalid state.
In the common case (normal userspace-initiated exec), making the current
thread Runnable meant that we switched away from its current state:
Running. As the thread is indeed running, that's a bogus change!
This created a short time window in which the thread state was bogus,
and any attempt to block the thread would panic the kernel (due to a
bogus thread state in Thread::block() leading to VERIFY_NOT_REACHED().)
Fix this by not touching the thread state in Process::do_exec()
and instead make the first userspace thread Runnable directly after
calling Process::exec() on it in try_create_userspace_process().
It's unfortunate that exec() can be called both on the current thread,
and on a new thread that has never been scheduled. It would be good to
not have the latter edge case, but fixing that will require larger
architectural changes outside the scope of this fix.
This ensures that everything allocated on the stack in Process::exec()
gets cleaned up. We had a few leaks related to the parsing of shebang
(#!) executables that get fixed by this.
This change adds a thread member variable to track if we have a pending
promise violation on a kernel thread. This ensures that all code
properly propagates promise violations up to the syscall handler.
Suggested-by: Andreas Kling <kling@serenityos.org>
Previously we would crash the process immediately when a promise
violation was found during a syscall. This is error prone, as we
don't unwind the stack. This means that in certain cases we can
leak resources, like an OwnPtr / RefPtr tracked on the stack. Or
even leak a lock acquired in a ScopeLockLocker.
To remedy this situation we move the promise violation handling to
the syscall handler, right before we return to user space. This
allows the code to follow the normal unwind path, and grantees
there is no longer any cleanup that needs to occur.
The Process::require_promise() and Process::require_no_promises()
functions were modified to return ErrorOr<void> so we enforce that
the errors are always propagated by the caller.
Since perfcore files can be generated during process finalization,
we can't just allow them to contain sensitive kernel information
if they're gonna be owned by the process's own UID+GID.
So instead, perfcores are now owned by 0:0. This is not the most
ergonomic solution, but I'm not sure what we could do to make it nicer.
We'll have to think more about that. In the meantime, this patches up
a kernel info leak. :^)
This fixes at least half of our LibC includes in the kernel. The source
of truth for errno codes and their description strings now lives in
Kernel/API/POSIX/errno.h as an enumeration, which LibC includes.
... In files included from Kernel/Thread.cpp or Kernel/Process.cpp
Some places the warning is suppressed, because we do not want a const
object do have non-const access to the returned sub-object.
Instead of signalling allocation failure with a bool return value
(false), we now use ErrorOr<void> and return ENOMEM as appropriate.
This allows us to use TRY() and MUST() with Vector. :^)
We now use AK::Error and AK::ErrorOr<T> in both kernel and userspace!
This was a slightly tedious refactoring that took a long time, so it's
not unlikely that some bugs crept in.
Nevertheless, it does pass basic functionality testing, and it's just
real nice to finally see the same pattern in all contexts. :^)
This singleton simplifies many aspects that we struggled with before:
1. There's no need to make derived classes of Device expose the
constructor as public anymore. The singleton is a friend of them, so he
can call the constructor. This solves the issue with try_create_device
helper neatly, hopefully for good.
2. Getting a reference of the NullDevice is now being done from this
singleton, which means that NullDevice no longer needs to use its own
singleton, and we can apply the try_create_device helper on it too :)
3. We can now defer registration completely after the Device constructor
which means the Device constructor is merely assigning the major and
minor numbers of the Device, and the try_create_device helper ensures it
calls the after_inserting method immediately after construction. This
creates a great opportunity to make registration more OOM-safe.
These interfaces are broken for about 9 months, maybe longer than that.
At this point, this is just a dead code nobody tests or tries to use, so
let's remove it instead of keeping a stale code just for the sake of
keeping it and hoping someone will fix it.
To better justify this, I read that OpenBSD removed loadable kernel
modules in 5.7 release (2014), mainly for the same reason we do -
nobody used it so they had no good reason to maintain it.
Still, OpenBSD had LKMs being effectively working, which is not the
current state in our project for a long time.
An arguably better approach to minimize the Kernel image size is to
allow dropping drivers and features while compiling a new image.
This patch converts all the usage of AK::String around sys$execve() to
using KString instead, allowing us to catch and propagate OOM errors.
It also required changing the kernel CommandLine helper class to return
a vector of KString for the userspace init program arguments.
This patch adds KBufferBuilder::try_create() and treats it like anything
else that can fail. And so, failure to allocate the initial internal
buffer of the builder will now propagate an ENOMEM to the caller. :^)