This bug is a good example why copy-paste code should eventually be eliminated
from the code base: Apparently the code was copied from read.cpp before
c6027ed7cc, so the same bug got introduced here.
To recap: A malicious program can ask the Kernel to prepare sys-ing to
a huge amount of iovecs. The Kernel must first copy all the vector locations
into 'vecs', and before that allocates an arbitrary amount of memory:
vecs.resize(iov_count);
This can cause Kernel memory exhaustion, triggered by any malicious userland
program.
This removes some hard references to the toolchain, some unnecessary
uses of an external install command, and disables a -Werror flag (for
the time being) - only if run inside serenity.
With this, we can build and link the kernel :^)
KASAN is a dynamic analysis tool that finds memory errors. It focuses
mostly on finding use-after-free and out-of-bound read/writes bugs.
KASAN works by allocating a "shadow memory" region which is used to store
whether each byte of memory is safe to access. The compiler then instruments
the kernel code and a check is inserted which validates the state of the
shadow memory region on every memory access (load or store).
To fully integrate KASAN into the SerenityOS kernel we need to:
a) Implement the KASAN interface to intercept the injected loads/stores.
void __asan_load*(address);
void __asan_store(address);
b) Setup KASAN region and determine the shadow memory offset + translation.
This might be challenging since Serenity is only 32bit at this time.
Ex: Linux implements kernel address -> shadow address translation like:
static inline void *kasan_mem_to_shadow(const void *addr)
{
return ((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT)
+ KASAN_SHADOW_OFFSET;
}
c) Integrating KASAN with Kernel allocators.
The kernel allocators need to be taught how to record allocation state
in the shadow memory region.
This commit only implements the initial steps of this long process:
- A new (default OFF) CMake build flag `ENABLE_KERNEL_ADDRESS_SANITIZER`
- Stubs out enough of the KASAN interface to allow the Kernel to link clean.
Currently the KASAN kernel crashes on boot (triple fault because of the crash
in strlen other sanitizer are seeing) but the goal here is to just get started,
and this should help others jump in and continue making progress on KASAN.
References:
* ASAN Paper: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37752.pdf
* KASAN Docs: https://github.com/google/kasan
* NetBSD KASAN Blog: https://blog.netbsd.org/tnf/entry/kernel_address_sanitizer_part_3
* LWN KASAN Article: https://lwn.net/Articles/612153/
* Tracking Issue #5351
There is no reason to call a getter without observing the result, doing
so indicates an error in the code. Mark these methods as [[nodiscard]]
to find these cases.
There is no reason to call a getter without observing the result, doing
so indicates an error in the code. Mark these methods as [[nodiscard]]
to find these cases.
`UserOrKernelBuffer` objects should always be observed when created, in
turn there is no reason to call a getter without observing the result.
Doing either of these indicates an error in the code. Mark these methods
as [[nodiscard]] to find these cases.
There is no reason to call a getter without observing the result, doing
so indicates an error in the code. Mark these methods as [[nodiscard]]
to find these cases.
There is no reason to call a getter without observing the result, doing
so indicates an error in the code. Mark these methods as [[nodiscard]]
to find these cases.
In the majority of cases we want to force callers to observe the
result of a blocking operation as it's not grantee to succeed as
they expect. Mark BlockResult as [[nodiscard]] to force any callers
to observe the result of the blocking operation.
In preparation for marking BlockingResult [[nodiscard]], there are a few
places that perform infinite waits, which we never observe the result of
the wait. Instead of suppressing them, add an alternate function which
returns void when performing and infinite wait.
You can now use the READONLY_AFTER_INIT macro when declaring a variable
and we will put it in a special ".ro_after_init" section in the kernel.
Data in that section remains writable during the boot and init process,
and is then marked read-only just before launching the SystemServer.
This is based on an idea from the Linux kernel. :^)
Since kernel stacks are much smaller (64 KiB) than userspace stacks,
we only add a small bit of randomness here (0-256 bytes, 16b aligned.)
This makes the location of the task context switch buffer not be
100% predictable. Note that we still also add extra randomness upon
syscall entry, so this patch primarily affects context switching.
This patch adds a random offset between 0 and 4096 to the initial
stack pointer in new processes. Since the stack has to be 16-byte
aligned, the bottom bits can't be randomized.
Yet another thing to make things less predictable. :^)
We were doing stack and syscall-origin region validations before
taking the big process lock. There was a window of time where those
regions could then be unmapped/remapped by another thread before we
proceed with our syscall.
This patch closes that window, and makes sys$get_stack_bounds() rely
on the fact that we now know the userspace stack pointer to be valid.
Thanks to @BenWiederhake for spotting this! :^)
If we try to align a number above 0xfffff000 to the next multiple of
the page size (4 KiB), it would wrap around to 0. This is most likely
never what we want, so let's assert if that happens.
Let's be a little more expressive when inducing a kernel panic. :^)
PANIC(...) passes any arguments you give it to dmesgln(), then prints
a backtrace and hangs the machine.
Now that we no longer need to support the signal trampolines being
user-accessible inside the kernel memory range, we can get rid of the
"kernel" and "user-accessible" flags on Region and simply use the
address of the region to determine whether it's kernel or user.
This also tightens the page table mapping code, since it can now set
user-accessibility based solely on the virtual address of a page.
The signal trampoline was previously in kernelspace memory, but with
a special exception to make it user-accessible.
This patch moves it into each process's regular address space so we
can stop supporting user-allowed memory above 0xc0000000.