The ProtectedDataMutationScope cannot blindly assume that there is only
exactly one thread at a time that may want to unprotect the Process.
Most of the time the big lock guaranteed this, but there are some cases
such as finalization (among others) where this is not necessarily
guaranteed.
This fixes random panics due to access violations when the
ProtectedDataMutationScope protects the Process instance while another
is still modifying it.
Fixes#8512
This was an old SerenityOS-specific syscall for donating the remainder
of the calling thread's time-slice to another thread within the same
process.
Now that Threading::Lock uses a pthread_mutex_t internally, we no
longer need this syscall, which allows us to get rid of a surprising
amount of unnecessary scheduler logic. :^)
The new ProcFS design consists of two main parts:
1. The representative ProcFS class, which is derived from the FS class.
The ProcFS and its inodes are much more lean - merely 3 classes to
represent the common type of inodes - regular files, symbolic links and
directories. They're backed by a ProcFSExposedComponent object, which
is responsible for the functional operation behind the scenes.
2. The backend of the ProcFS - the ProcFSComponentsRegistrar class
and all derived classes from the ProcFSExposedComponent class. These
together form the entire backend and handle all the functions you can
expect from the ProcFS.
The ProcFSExposedComponent derived classes split to 3 types in the
manner of lifetime in the kernel:
1. Persistent objects - this category includes all basic objects, like
the root folder, /proc/bus folder, main blob files in the root folders,
etc. These objects are persistent and cannot die ever.
2. Semi-persistent objects - this category includes all PID folders,
and subdirectories to the PID folders. It also includes exposed objects
like the unveil JSON'ed blob. These object are persistent as long as the
the responsible process they represent is still alive.
3. Dynamic objects - this category includes files in the subdirectories
of a PID folder, like /proc/PID/fd/* or /proc/PID/stacks/*. Essentially,
these objects are always created dynamically and when no longer in need
after being used, they're deallocated.
Nevertheless, the new allocated backend objects and inodes try to use
the same InodeIndex if possible - this might change only when a thread
dies and a new thread is born with a new thread stack, or when a file
descriptor is closed and a new one within the same file descriptor
number is opened. This is needed to actually be able to do something
useful with these objects.
The new design assures that many ProcFS instances can be used at once,
with one backend for usage for all instances.
The Process::Handler type has KResultOr<FlatPtr> as its return type.
Using a different return type with an equally-sized template parameter
sort of works but breaks once that condition is no longer true, e.g.
for KResultOr<int> on x86_64.
Ideally the syscall handlers would also take FlatPtrs as their args
so we can get rid of the reinterpret_cast for the function pointer
but I didn't quite feel like cleaning that up as well.
Without this the ProcessBase class is placed into the padding for the
ProtectedProcessBase class which then causes the members of the
RefCounted class to end up without the first 4096 bytes of the Process
class:
BP 1, Kernel::Process::protect_data (this=this@entry=0xc063b000)
205 {
(gdb) p &m_ref_count
$1 = (AK::Atomic<unsigned int, (AK::MemoryOrder)5> *) 0xc063bffc
Note how the difference between 'this' and &m_ref_count is less than
4096.
This fixes a bug where unveiling a subdirectory of an already unveiled
path would sometimes be allowed and sometimes not (depending on what
other unveil calls have been made).
Now, it is always allowed to unveil a subdirectory of an already
unveiled directory, even if it has higher permissions.
This removes the need for the permissions_inherited_from_root flag in
UnveilMetadata, so it has been removed.
This adds two new arguments to the thread_exit system call which let
a thread unmap an arbitrary VM range on thread exit. LibPthread
uses this functionality to unmap the thread stack.
Fixes#7267.
With -Og, all calls to create_kernel_process were triggering -Wnonnull
when creating these lambdas that get implicitly converted to function
pointers. A different design of create_kernel_process to use
AK::Function instead might avoid this awkward behavior.
Previously the process' m_profiling flag was ignored for all event
types other than CPU samples.
The kfree tracing code relies on temporarily disabling tracing during
exec. This didn't work for per-process profiles and would instead
panic.
This updates the profiling code so that the m_profiling flag isn't
ignored.
Problem:
- `static` variables consume memory and sometimes are less
optimizable.
- `static const` variables can be `constexpr`, usually.
- `static` function-local variables require an initialization check
every time the function is run.
Solution:
- If a global `static` variable is only used in a single function then
move it into the function and make it non-`static` and `constexpr`.
- Make all global `static` variables `constexpr` instead of `const`.
- Change function-local `static const[expr]` variables to be just
`constexpr`.
Unlike accept() the new accept4() system call lets the caller specify
flags for the newly accepted socket file descriptor, such as
SOCK_CLOEXEC and SOCK_NONBLOCK.
By constraining two implementations, the compiler will select the best
fitting one. All this will require is duplicating the implementation and
simplifying for the `void` case.
This constraining also informs both the caller and compiler by passing
the callback parameter types as part of the constraint
(e.g.: `IterationFunction<int>`).
Some `for_each` functions in LibELF only take functions which return
`void`. This is a minimal correctness check, as it removes one way for a
function to incompletely do something.
There seems to be a possible idiom where inside a lambda, a `return;` is
the same as `continue;` in a for-loop.
This change looks more involved than it actually is. This simply
reshuffles the previous Process constructor and splits out the
parts which can fail (resource allocation) into separate methods
which can be called from a factory method. The factory is then
used everywhere instead of the constructor.
Modify the API so it's possible to propagate error on OOM failure.
NonnullOwnPtr<T> is not appropriate for the ThreadTracer::create() API,
so switch to OwnPtr<T>, use adopt_own_if_nonnull() to handle creation.
This patch modifies InodeWatcher to switch to a one watcher, multiple
watches architecture. The following changes have been made:
- The watch_file syscall is removed, and in its place the
create_iwatcher, iwatcher_add_watch and iwatcher_remove_watch calls
have been added.
- InodeWatcher now holds multiple WatchDescriptions for each file that
is being watched.
- The InodeWatcher file descriptor can be read from to receive events on
all watched files.
Co-authored-by: Gunnar Beutner <gunnar@beutner.name>
The current method of emitting performance events requires a bit of
boiler plate at every invocation, as well as having to ignore the
return code which isn't used outside of the perf event syscall. This
change attempts to clean that up by exposing high level API's that
can be used around the code base.
Previously, TLS data was always zero-initialized.
To support initializing the values of TLS data, sys$allocate_tls now
receives a buffer with the desired initial data, and copies it to the
master TLS region of the process.
The DynamicLinker gathers the initial TLS image and passes it to
sys$allocate_tls.
We also now require the size passed to sys$allocate_tls to be
page-aligned, to make things easier. Note that this doesn't waste memory
as the TLS data has to be allocated in separate pages anyway.
This turns the perfcore format into more a log than it was before,
which lets us properly log process, thread and region
creation/destruction. This also makes it unnecessary to dump the
process' regions every time it is scheduled like we did before.
Incidentally this also fixes 'profile -c' because we previously ended
up incorrectly dumping the parent's region map into the profile data.
Log-based mmap support enables profiling shared libraries which
are loaded at runtime, e.g. via dlopen().
This enables profiling both the parent and child process for
programs which use execve(). Previously we'd discard the profiling
data for the old process.
The Profiler tool has been updated to not treat thread IDs as
process IDs anymore. This enables support for processes with more
than one thread. Also, there's a new widget to filter which
process should be displayed.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.
See: https://spdx.dev/resources/use/#identifiers
This was done with the `ambr` search and replace tool.
ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
While profiling all processes the profile buffer lives forever.
Once you have copied the profile to disk, there's no need to keep it
in memory. This syscall surfaces the ability to clear that buffer.
This should allow creating intrusive lists that have smart pointers,
while remaining free (compared to the impl before this commit) when
holding raw pointers :^)
As a sidenote, this also adds a `RawPtr<T>` type, which is just
equivalent to `T*`.
Note that this does not actually use such functionality, but is only
expected to pave the way for #6369, to replace NonnullRefPtrVector<T>
with intrusive lists.
As it is with zero-cost things, this makes the interface a bit less nice
by requiring the type name of what an `IntrusiveListNode` holds (and
optionally its container, if not RawPtr), and also requiring the type of
the container (normally `RawPtr`) on the `IntrusiveList` instance.
The previous architecture had a huge flaw: the pointer to the protected
data was itself unprotected, allowing you to overwrite it at any time.
This patch reorganizes the protected data so it's part of the Process
class itself. (Actually, it's a new ProcessBase helper class.)
We use the first 4 KB of Process objects themselves as the new storage
location for protected data. Then we make Process objects page-aligned
using MAKE_ALIGNED_ALLOCATED.
This allows us to easily turn on/off write-protection for everything in
the ProcessBase portion of Process. :^)
Thanks to @bugaevc for pointing out the flaw! This is still not perfect
but it's an improvement.