For some reason I don't yet understand, building the kernel with -O2
produces a way-too-large kernel on some people's systems.
Since there are some really nice performance benefits from -O2 in
userspace, let's do a compromise and build Userland with -O2 but
put Kernel back into the -Os box for now.
Due to the non-standard way the boot assembler code is linked into
the kernel (not and actual dependency, but linked via linker.ld script)
both make and ninja weren't re-linking the kernel when boot.S was
changed. This should theoretically work since we use the cmake
`add_dependencies(..)` directive to express a manual dependency
on boot from Kernel, but something is obviously broken in cmake.
We can work around that with a hack, which forces a dependency on
a file we know will always exist in the kernel (init.cpp). So if
boot.S is rebuilt, then init.cpp is forced to be rebuilt, and then
we re-link the kernel. init.cpp is also relatively small, so it
compiles fast.
With the kernel command line issue fixed, we can now enable these
KUBSAN options without getting triple faults on startup:
* alignment
* null
* pointer-overflow
Clangd (CLion) was choking on some of the -fsanitize options, and since
we're not building the kernel with Clang anyway, let's just disable
the options for non-GCC compilers for now.
This removes some hard references to the toolchain, some unnecessary
uses of an external install command, and disables a -Werror flag (for
the time being) - only if run inside serenity.
With this, we can build and link the kernel :^)
KASAN is a dynamic analysis tool that finds memory errors. It focuses
mostly on finding use-after-free and out-of-bound read/writes bugs.
KASAN works by allocating a "shadow memory" region which is used to store
whether each byte of memory is safe to access. The compiler then instruments
the kernel code and a check is inserted which validates the state of the
shadow memory region on every memory access (load or store).
To fully integrate KASAN into the SerenityOS kernel we need to:
a) Implement the KASAN interface to intercept the injected loads/stores.
void __asan_load*(address);
void __asan_store(address);
b) Setup KASAN region and determine the shadow memory offset + translation.
This might be challenging since Serenity is only 32bit at this time.
Ex: Linux implements kernel address -> shadow address translation like:
static inline void *kasan_mem_to_shadow(const void *addr)
{
return ((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT)
+ KASAN_SHADOW_OFFSET;
}
c) Integrating KASAN with Kernel allocators.
The kernel allocators need to be taught how to record allocation state
in the shadow memory region.
This commit only implements the initial steps of this long process:
- A new (default OFF) CMake build flag `ENABLE_KERNEL_ADDRESS_SANITIZER`
- Stubs out enough of the KASAN interface to allow the Kernel to link clean.
Currently the KASAN kernel crashes on boot (triple fault because of the crash
in strlen other sanitizer are seeing) but the goal here is to just get started,
and this should help others jump in and continue making progress on KASAN.
References:
* ASAN Paper: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37752.pdf
* KASAN Docs: https://github.com/google/kasan
* NetBSD KASAN Blog: https://blog.netbsd.org/tnf/entry/kernel_address_sanitizer_part_3
* LWN KASAN Article: https://lwn.net/Articles/612153/
* Tracking Issue #5351
Let's be a little more expressive when inducing a kernel panic. :^)
PANIC(...) passes any arguments you give it to dmesgln(), then prints
a backtrace and hangs the machine.
CLion doesn't understand that we switch compilers mid-build (which I
can understand since it's a bit unusual.) Defining __serenity__ makes
the majority of IDE features work correctly in the kernel context.
This patch adds Space, a class representing a process's address space.
- Each Process has a Space.
- The Space owns the PageDirectory and all Regions in the Process.
This allows us to reorganize sys$execve() so that it constructs and
populates a new Space fully before committing to it.
Previously, we would construct the new address space while still
running in the old one, and encountering an error meant we had to do
tedious and error-prone rollback.
Those problems are now gone, replaced by what's hopefully a set of much
smaller problems and missing cleanups. :^)
We now build the kernel with partial UBSAN support.
The following -fsanitize sub-options are enabled:
* nonnull-attribute
* bool
If the kernel detects UB at runtime, it will now print a debug message
with a stack trace. This is very cool! I'm leaving it on by default for
now, but we'll probably have to re-evaluate this as more options are
enabled and slowdown increases.
This adds support for FUTEX_WAKE_OP, FUTEX_WAIT_BITSET, FUTEX_WAKE_BITSET,
FUTEX_REQUEUE, and FUTEX_CMP_REQUEUE, as well well as global and private
futex and absolute/relative timeouts against the appropriate clock. This
also changes the implementation so that kernel resources are only used when
a thread is blocked on a futex.
Global futexes are implemented as offsets in VMObjects, so that different
processes can share a futex against the same VMObject despite potentially
being mapped at different virtual addresses.
All users of this mechanism have been switched to anonymous files and
passing file descriptors with sendfd()/recvfd().
Shbufs got us where we are today, but it's time we say good-bye to them
and welcome a much more idiomatic replacement. :^)
This patch adds a new AnonymousFile class which is a File backed by
an AnonymousVMObject that can only be mmap'ed and nothing else, really.
I'm hoping that this can become a replacement for shbufs. :^)
This patch merges the profiling functionality in the kernel with the
performance events mechanism. A profiler sample is now just another
perf event, rather than a dedicated thing.
Since perf events were already per-process, this now makes profiling
per-process as well.
Processes with perf events would already write out a perfcore.PID file
to the current directory on death, but since we may want to profile
a process and then let it continue running, recorded perf events can
now be accessed at any time via /proc/PID/perf_events.
This patch also adds information about process memory regions to the
perfcore JSON format. This removes the need to supply a core dump to
the Profiler app for symbolication, and so the "profiler coredump"
mechanism is removed entirely.
There's still a hard limit of 4MB worth of perf events per process,
so this is by no means a perfect final design, but it's a nice step
forward for both simplicity and stability.
Fixes#4848Fixes#4849
This patch adds sys$abort() which immediately crashes the process with
SIGABRT. This makes assertion backtraces a lot nicer by removing all
the gunk that otherwise happens between __assertion_failed() and
actually crashing from the SIGABRT.
Insert stack canaries to find stack corruptions in the kernel.
It looks like this was enabled in the past (842716a) but appears to have been
lost during the CMake conversion.
The `-fstack-protector-strong` variant was chosen because it catches more issues
than `-fstack-protector`, but doesn't have substantial performance impact like
`-fstack-protector-all`.
Instead of specifying the boot argument to be root=/dev/hdXY, now
one can write root=PARTUUID= with the right UUID, and if the partition
is found, the kernel will boot from it.
This feature is mainly used with GUID partitions, and is considered to
be the most reliable way for the kernel to identify partitions.
RTTI is still disabled for the Kernel, and for the Dynamic Loader. This
allows for much less awkward navigation of class heirarchies in LibCore,
LibGUI, LibWeb, and LibJS (eventually). Measured RootFS size increase
was < 1%, and libgui.so binary size was ~3.3%. The small binary size
increase here seems worth it :^)
* Add SERENITY_ARCH option to CMake for selecting the target toolchain
* Port all build scripts but continue to use i686
* Update GitHub Actions cache to include BuildIt.sh
The partitioning code was very outdated, and required a full refactor.
The new subsystem removes duplicated code and uses more AK containers.
The most important change is that all implementations of the
PartitionTable class conform to one interface, which made it possible
to remove unnecessary code in the EBRPartitionTable class.
Finding partitions is now done in the StorageManagement singleton,
instead of doing so in init.cpp.
Also, now we don't try to find partitions on demand - the kernel will
try to detect if a StorageDevice is partitioned, and if so, will check
what is the partition table, which could be MBR, GUID or EBR.
Then, it will create DiskPartitionMetadata object for each partition
that is available in the partition table. This object will be used
by the partition enumeration code to create a DiskPartition with the
correct minor number.
The DevFS along with DevPtsFS give a complete solution for populating
device nodes in /dev. The main purpose of DevFS is to eliminate the
need of device nodes generation when building the system.
Later on, DevFS will assist with exposing disk partition nodes.
This new flag controls two things:
- Whether the kernel will generate core dumps for the process
- Whether the EUID:EGID should own the process's files in /proc
Processes are automatically made non-dumpable when their EUID or EGID is
changed, either via syscalls that specifically modify those ID's, or via
sys$execve(), when a set-uid or set-gid program is executed.
A process can change its own dumpable flag at any time by calling the
new sys$prctl(PR_SET_DUMPABLE) syscall.
Fixes#4504.
This commit gets rid of ELF::Loader entirely since its very ambiguous
purpose was actually to load executables for the kernel, and that is
now handled by the kernel itself.
This patch includes some drive-by cleanup in LibDebug and CrashDaemon
enabled by the fact that we no longer need to keep the ref-counted
ELF::Loader around.
The StorageManagement class has 2 roles:
1. During boot, it should find all storage controllers in the machine,
and then determine what is the boot device.
2. Later on boot, it is a registrar of all storage controllers and
storage devices. Thus, it could be used to show information about these
devices when implemented.
This change allows the user to specify a boot driver other than /dev/hda
and if it's connected in the machine - it will boot.
This new subsystem is somewhat replacing the IDE disk code we had with a
new flexible design.
StorageDevice is a generic class that represent a generic storage
device. It is meant that specific storage hardware will override the
interface. StorageController is a generic class that represent
a storage controller that can be found in a machine.
The IDEController class governs two IDEChannels. An IDEChannel is
responsible to manage the master & slave devices of the channel,
therefore an IDEChannel is an IRQHandler.