This patch adds Space, a class representing a process's address space.
- Each Process has a Space.
- The Space owns the PageDirectory and all Regions in the Process.
This allows us to reorganize sys$execve() so that it constructs and
populates a new Space fully before committing to it.
Previously, we would construct the new address space while still
running in the old one, and encountering an error meant we had to do
tedious and error-prone rollback.
Those problems are now gone, replaced by what's hopefully a set of much
smaller problems and missing cleanups. :^)
This patch adds sys$msyscall() which is loosely based on an OpenBSD
mechanism for preventing syscalls from non-blessed memory regions.
It works similarly to pledge and unveil, you can call it as many
times as you like, and when you're finished, you call it with a null
pointer and it will stop accepting new regions from then on.
If a syscall later happens and doesn't originate from one of the
previously blessed regions, the kernel will simply crash the process.
We now move the execpromises state into the regular promises, and clear
the execpromises state.
Also make sure to duplicate the promise state on fork.
This fixes an issue where "su" would launch a shell which immediately
crashed due to not having pledged "stdio".
This was done with the help of several scripts, I dump them here to
easily find them later:
awk '/#ifdef/ { print "#cmakedefine01 "$2 }' AK/Debug.h.in
for debug_macro in $(awk '/#ifdef/ { print $2 }' AK/Debug.h.in)
do
find . \( -name '*.cpp' -o -name '*.h' -o -name '*.in' \) -not -path './Toolchain/*' -not -path './Build/*' -exec sed -i -E 's/#ifdef '$debug_macro'/#if '$debug_macro'/' {} \;
done
# Remember to remove WRAPPER_GERNERATOR_DEBUG from the list.
awk '/#cmake/ { print "set("$2" ON)" }' AK/Debug.h.in
All users of this mechanism have been switched to anonymous files and
passing file descriptors with sendfd()/recvfd().
Shbufs got us where we are today, but it's time we say good-bye to them
and welcome a much more idiomatic replacement. :^)
These changes are arbitrarily divided into multiple commits to make it
easier to find potentially introduced bugs with git bisect.Everything:
The modifications in this commit were automatically made using the
following command:
find . -name '*.cpp' -exec sed -i -E 's/dbg\(\) << ("[^"{]*");/dbgln\(\1\);/' {} \;
By designating a committed page pool we can guarantee to have physical
pages available for lazy allocation in mappings. However, when forking
we will overcommit. The assumption is that worst-case it's better for
the fork to die due to insufficient physical memory on COW access than
the parent that created the region. If a fork wants to ensure that all
memory is available (trigger a commit) then it can use madvise.
This also means that fork now can gracefully fail if we don't have
enough physical pages available.
We need to create a reference for the new PID for each shared buffer
that the process had a reference to. If the process subsequently
get replaced through exec, those references will be dropped again.
But if exec for some reason fails then other code, such as global
destructors could still expect having access to them.
Fixes#4076
This makes most operations thread safe, especially so that they
can safely be used in the Kernel. This includes obtaining a strong
reference from a weak reference, which now requires an explicit
call to WeakPtr::strong_ref(). Another major change is that
Weakable::make_weak_ref() may require the explicit target type.
Previously we used reinterpret_cast in WeakPtr, assuming that it
can be properly converted. But WeakPtr does not necessarily have
the knowledge to be able to do this. Instead, we now ask the class
itself to deliver a WeakPtr to the type that we want.
Also, WeakLink is no longer specific to a target type. The reason
for this is that we want to be able to safely convert e.g. WeakPtr<T>
to WeakPtr<U>, and before this we just reinterpret_cast the internal
WeakLink<T> to WeakLink<U>, which is a bold assumption that it would
actually produce the correct code. Instead, WeakLink now operates
on just a raw pointer and we only make those constructors/operators
available if we can verify that it can be safely cast.
In order to guarantee thread safety, we now use the least significant
bit in the pointer for locking purposes. This also means that only
properly aligned pointers can be used.
g_scheduler_lock cannot safely be acquired after Thread::m_lock
because another processor may already hold g_scheduler_lock and wait
for the same Thread::m_lock.
Similar to Process, we need to make Thread refcounted. This will solve
problems that will appear once we schedule threads on more than one
processor. This allows us to hold onto threads without necessarily
holding the scheduler lock for the entire duration.
Since "rings" typically refer to code execution and user processes
can also execute in ring 0, rename these functions to more accurately
describe what they mean: kernel processes and user processes.
We need to wait until a thread is fully set up and ready for running
before attempting to deliver a signal. Otherwise we may not have a
user stack yet.
Also, remove the Skip0SchedulerPasses and Skip1SchedulerPass thread
states that we don't really need anymore with software context switching.
Fixes the kernel crash reported in #3419
This does not add any behaviour change to the processes, but it ties a
TTY to an active process group via TIOCSPGRP, and returns the TTY to the
kernel when all processes in the process group die.
Also makes the TTY keep a link to the original controlling process' parent (for
SIGCHLD) instead of the process itself.
This compiles, and contains exactly the same bugs as before.
The regex 'FIXME: PID/' should reveal all markers that I left behind, including:
- Incomplete conversion
- Issues or things that look fishy
- Actual bugs that will go wrong during runtime
This is something I've been meaning to do for a long time, and here we
finally go. This patch moves all sys$foo functions out of Process.cpp
and into files in Kernel/Syscalls/.
It's not exactly one syscall per file (although it could be, but I got
a bit tired of the repetitive work here..)
This makes hacking on individual syscalls a lot less painful since you
don't have to rebuild nearly as much code every time. I'm also hopeful
that this makes it easier to understand individual syscalls. :^)