We already asked the region about what its library name is, and we also
use that value when maybe constructing a path, so let's make the check
use that as well.
This issue was also present in the kernel, the description of which is
provided in an identically titled commit.
Note that this couldn't have affected any programs running in
UserspaceEmulator as we don't support SSE instructions, and don't seem
to raise faults under any conditions.
The fact that profiles are json on one giant line makes them very
difficult to debug when things go wrong. Instead make sure to wrap
each event or sample on a newline so you can easily grep/heap/tail
the profile files.
It was fragile to use the address of the body of the memory management
functions to disable memory auditing within them. Functions called from
these did not get exempted from the audits, so in some cases
UserspaceEmulator reported bogus heap buffer overflows.
Memory auditing did not work at all on Clang because when querying the
addresses, their offset was taken relative to the base of `.text` which
is not the first segment in the `R/RX/RW(RELRO)/RW(non-RELRO)` layout
produced by LLD.
Similarly to when setting metadata about the allocations, we now use the
`emuctl` system call to selectively suppress auditing when we reach
these functions. This ensures that functions called from `malloc` are
affected too, and no issues occur because of the inconsistency between
Clang and GCC memory layouts.
... segment
This happens with binaries build with Clang or with a custom linker
script. If this is the case, offsets should be calculated not from the
base address of `.text`, but from the first section loaded for the
library.
This commit moves all UserspaceEmulator symbolication into a common
helper function and fixes a FIXME.
`ue --profile --profile-file ~/some-file.profile id` can now generate a
full profile (instruction-by-instruction, if needed), at the cost of not
being able to see past the syscall boundary (a.la. callgrind).
This makes it significantly easier to profile seemingly fast userspace
things, like Loader.so :^)
The LexicalPath instance methods dirname(), basename(), title() and
extension() will be changed to return StringView const& in a further
commit. Due to this, users creating temporary LexicalPath objects just
to call one of those getters will recieve a StringView const& pointing
to a possible freed buffer.
To avoid this, static methods for those APIs have been added, which will
return a String by value to avoid those problems. All cases where
temporary LexicalPath objects have been used as described above haven
been changed to use the static APIs.
For now this only allows us to single-step through execution and inspect
part of the execution environment for debugging
This also allows to run to function return and sending signals to the VM
This changes the behavior of SIGINT for UE to pause execution and then
terminate if already paused
A way of setting a watchpoint for a function would be a good addition in
the future, the scaffold for this is already present, we only need to
figure out a way to find the address of a function
On a side note I have changed all occurences of west-const to east const
This adds __attribute__((used)) to the function declaration so the
compiler doesn't discard it. It also makes the function NEVER_INLINE
so that we don't end up with multiple copies of the function. This
is necessary because the function uses inline assembly to define some
unique labels.
We had two functions for doing mostly the same thing. Combine both
of them into String::find() and use that everywhere.
Also add some tests to cover basic behavior.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.
See: https://spdx.dev/resources/use/#identifiers
This was done with the `ambr` search and replace tool.
ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
The auditing code always starts by checking if we're in one of the
ignored code ranges (malloc, free, realloc, syscall, etc.)
To reduce the number of checks needed, we can cache the bounds of
the LibC text segment. This allows us to fast-reject addresses that
cannot possibly be a LibC function.
This returns ENOSYS if you are running in the real kernel, and some
other result if you are running in UserspaceEmulator.
There are other ways we could check if we're inside an emulator, but
it seemed easier to just ask. :^)
This is basically just for consistency, it's quite strange to see
multiple AK container types next to each other, some with and some
without the namespace prefix - we're 'using AK::Foo;' a lot and should
leverage that. :^)
(...and ASSERT_NOT_REACHED => VERIFY_NOT_REACHED)
Since all of these checks are done in release builds as well,
let's rename them to VERIFY to prevent confusion, as everyone is
used to assertions being compiled out in release.
We can introduce a new ASSERT macro that is specifically for debug
checks, but I'm doing this wholesale conversion first since we've
accumulated thousands of these already, and it's not immediately
obvious which ones are suitable for ASSERT.
This patch brings Kernel::RangeAllocator to UserspaceEmulator in a
slightly simplified form.
It supports the basic three allocation types needed by virt$mmap():
allocate_anywhere, allocate_specific, and allocate_randomized.
Porting virt$mmap() and virt$munmap() to use the allocator makes
UE work correctly once again. :^)
This achieves two things:
- Programs can now intentionally perform arbitrary syscalls by calling
syscall(). This allows us to work on things like syscall fuzzing.
- It restricts the ability of userspace to make syscalls to a single
4KB page of code. In order to call the kernel directly, an attacker
must now locate this page and call through it.
Personally, I prefer the naming convention DEBUG_FOO over FOO_DEBUG, but
the majority of the debug macros are already named in the latter naming
convention, so I just enforce consistency here.
This was done with the following script:
find . \( -name '*.cpp' -o -name '*.h' -o -name '*.in' \) -not -path './Toolchain/*' -not -path './Build/*' -exec sed -i -E 's/DEBUG_PATH/PATH_DEBUG/' {} \;
This was done with the help of several scripts, I dump them here to
easily find them later:
awk '/#ifdef/ { print "#cmakedefine01 "$2 }' AK/Debug.h.in
for debug_macro in $(awk '/#ifdef/ { print $2 }' AK/Debug.h.in)
do
find . \( -name '*.cpp' -o -name '*.h' -o -name '*.in' \) -not -path './Toolchain/*' -not -path './Build/*' -exec sed -i -E 's/#ifdef '$debug_macro'/#if '$debug_macro'/' {} \;
done
# Remember to remove WRAPPER_GERNERATOR_DEBUG from the list.
awk '/#cmake/ { print "set("$2" ON)" }' AK/Debug.h.in
All users of this mechanism have been switched to anonymous files and
passing file descriptors with sendfd()/recvfd().
Shbufs got us where we are today, but it's time we say good-bye to them
and welcome a much more idiomatic replacement. :^)
This 'data loss' was introduced in 809a8ee693, because
I hoped we could eventually outlaw overlong paths entirely. This sparked some discussion:
https://github.com/SerenityOS/serenity/discussions/4357
Among other things, we agree that yeah, the Kernel can and should be able to return
paths of arbitrary length. This means that the 'arbitrary' maximum of PATH_MAX in
UserspaceEmulator should be considered to be unnecessary data loss, and as such, needs to
be fixed.
This API was a mostly gratuitous deviation from POSIX that gave up some
portability in exchange for avoiding the occasional strlen().
I don't think that was actually achieving anything valuable, so let's
just chill out and have the same open() API as everyone else. :^)