If we're flushing user space pointers and the process only has one
thread, we do not need to broadcast this to other processors as
they will all discard that request anyway.
We were failing to round down the base of partial VM ranges. This led
to split regions being constructed that could have a non-page-aligned
base address. This would then trip assertions in the VM code.
Found by fuzz-syscalls. :^)
If a program attempts to write from more than a million different locations,
there is likely shenaniganery afoot! Refuse to write to prevent kmem exhaustion.
Found by fuzz-syscalls. Can be reproduced by running this in the Shell:
$ syscall writev 1 [ 0 ] 0x08000000
Found by fuzz-syscalls. Can be reproduced by running this in the Shell:
$ syscall exit_thread
This leaves the process in the 'Dying' state but never actually removes it.
Therefore, avoid this scenario by pretending to exit the entire process.
Since the payload size is user-controlled, this could be used to
overflow the kernel stack.
We should probably also be breaking things into smaller packets at a
higher level, e.g TCPSocket::protocol_send(), but let's do that as
a separate exercise.
Fixes#5310.
Not sure why this was 4 MiB in the first place, but that's a lot of
memory to reserve for each thread when we're running with 512 MiB
total in the default testing setup. :^)
* We don't have to lock the "all IPv4 sockets" in exclusive mode, shared mode is
enough for just reading the list (as opposed to modifying it).
* We don't have to lock socket's own lock at all, the IPv4Socket::did_receive()
implementation takes care of this.
* Most importantly, we don't have to hold the "all IPv4 sockets" across the
IPv4Socket::did_receive() call(s). We can copy the current ICMP socket list
while holding the lock, then release the lock, and then call
IPv4Socket::did_receive() on all the ICMP sockets in our list.
These changes fix a deadlock triggered by receiving ICMP messages when using tap
networking setup (as opposed to QEMU's default user/SLIRP networking) on the host.
The way we read/write directories is very inefficient, and this doesn't
solve any of that. It does however reduce memory usage of directory
entry vectors by 25% which has nice immediate benefits.
CLion doesn't understand that we switch compilers mid-build (which I
can understand since it's a bit unusual.) Defining __serenity__ makes
the majority of IDE features work correctly in the kernel context.
sys$fork() already takes care of children inheriting the parent's root
directory, so there was no need to do the same thing when creating a
new user process.
Add a per-process ptrace lock and use it to prevent ptrace access to a
process after it decides to commit to a new executable in sys$execve().
Fixes#5230.
This patch adds Space, a class representing a process's address space.
- Each Process has a Space.
- The Space owns the PageDirectory and all Regions in the Process.
This allows us to reorganize sys$execve() so that it constructs and
populates a new Space fully before committing to it.
Previously, we would construct the new address space while still
running in the old one, and encountering an error meant we had to do
tedious and error-prone rollback.
Those problems are now gone, replaced by what's hopefully a set of much
smaller problems and missing cleanups. :^)
Wrap thread creation in a Thread::try_create() helper that first
allocates a kernel stack region. If that allocation fails, we propagate
an ENOMEM error to the caller.
This avoids the situation where a thread is half-constructed, without a
valid kernel stack, and avoids having to do messy cleanup in that case.
We now build the kernel with partial UBSAN support.
The following -fsanitize sub-options are enabled:
* nonnull-attribute
* bool
If the kernel detects UB at runtime, it will now print a debug message
with a stack trace. This is very cool! I'm leaving it on by default for
now, but we'll probably have to re-evaluate this as more options are
enabled and slowdown increases.
This achieves two things:
- Programs can now intentionally perform arbitrary syscalls by calling
syscall(). This allows us to work on things like syscall fuzzing.
- It restricts the ability of userspace to make syscalls to a single
4KB page of code. In order to call the kernel directly, an attacker
must now locate this page and call through it.
Leaking macros across headers is a terrible thing, but I can't think of
a better way of achieving this.
- We need some way of modifying debug macros from CMake to implement
ENABLE_ALL_THE_DEBUG_MACROS.
- We need some way of modifying debug macros in specific source files
because otherwise we need to rebuild too many files.
This was done using the following script:
sed -i -E 's/#cmakedefine01 ([A-Z0-9_]+)/#ifndef \1\n\0\n#endif\n/' AK/Debug.h.in
sed -i -E 's/#cmakedefine01 ([A-Z0-9_]+)/#ifndef \1\n\0\n#endif\n/' Kernel/Debug.h.in
There's no need for this to be generic and support running from an
arbitrary thread context. Perf events are always generated from within
the thread being profiled, so take advantage of that to simplify the
code. Also use Vector capacity to avoid heap allocations.
We were checking for size_t (unsigned) overflow but the current offset
is actually stored as off_t (signed). Fix this, and also fail with
EOVERFLOW correctly.
This change can be actually seen as two logical changes, the first
change is about to ensure we only read the ATA Status register only
once, because if we read it multiple times, we acknowledge interrupts
unintentionally. To solve this issue, we always use the alternate Status
register and only read the original status register in the IRQ handler.
The second change is how we handle interrupts - if we use DMA, we can
just complete the request and return from the IRQ handler. For PIO mode,
it's more complicated. For PIO write operation, after setting the ATA
registers, we send out the data to IO port, and wait for an interrupt.
For PIO read operation, we set the ATA registers, and wait for an
interrupt to fire, then we just read from the data IO port.