We now have a proper aligned allocation implementation, and the
toolchain patch to make Clang use the intermediary implementation
has already been removed in an earlier iteration.
Some ports linked against posix_memalign but didn't use it, and others
used it only if it was available. So I decided to implement
posix_memalign.
My implementation adds almost no overhead to regular malloc() calls.
However, if an alignment is specified, it will use the smallest
ChunkedBlock for which aligned chunks exist, and simply hand out one of
the chunks that is aligned. If it cannot use a ChunkedBlock, for size
or alignment reasons, it will use a BigAllocationBlock and return a
pointer to the first aligned address past the start of the block. This
implementation supports alignments up to 32768, due to the limitations
of the BigAllocationBlock technique.
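As a rough illustration of the fallback path, here is a self-contained
sketch. This is not the actual LibC code: the real allocator records
metadata so free() can find the owning block, which the plain malloc()
stand-in here cannot, so freeing is omitted, and the helper name is
made up for this example.

    #include <cerrno>
    #include <cstdint>
    #include <cstdlib>

    // Round up to the next address with the requested alignment.
    static void* first_aligned_address_at_or_after(void* base, size_t alignment)
    {
        auto address = reinterpret_cast<uintptr_t>(base);
        return reinterpret_cast<void*>((address + alignment - 1) & ~(alignment - 1));
    }

    int posix_memalign_sketch(void** memptr, size_t alignment, size_t size)
    {
        // POSIX: alignment must be a power of two and a multiple of
        // sizeof(void*). A power of two >= sizeof(void*) satisfies both.
        if (alignment < sizeof(void*) || (alignment & (alignment - 1)) != 0)
            return EINVAL;
        // Stand-in for the BigAllocationBlock fallback: over-allocate so
        // an aligned address is guaranteed to exist inside the block.
        void* block = malloc(size + alignment);
        if (!block)
            return ENOMEM;
        *memptr = first_aligned_address_at_or_after(block, alignment);
        return 0;
    }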
We don't mutate the pointed-to memory, so let's be const-correct.
Fixes building the `mimalloc` library that's optionally used by the mold
linker (note that it isn't enabled yet as I haven't tested it).
NoAllocationGuard is an RAII stack guard that prevents allocations
while it exists. This is done through a thread-local global flag: while
the flag is false, malloc crashes on a VERIFY. The guard allows for
recursion (nested guards work correctly).
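A minimal sketch of the guard's shape (the actual class lives in AK;
the flag name here is illustrative):

    // Allocations are allowed by default.
    static thread_local bool s_allocation_enabled = true;

    class NoAllocationGuard {
    public:
        NoAllocationGuard()
            : m_allocation_enabled_previously(s_allocation_enabled)
        {
            s_allocation_enabled = false;
        }
        ~NoAllocationGuard()
        {
            // Restoring the saved value (instead of unconditionally
            // re-enabling) is what makes nested guards safe.
            s_allocation_enabled = m_allocation_enabled_previously;
        }

    private:
        bool m_allocation_enabled_previously;
    };

    // malloc() then simply does:
    //     VERIFY(s_allocation_enabled);
    // before doing anything else.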
The intended use case for this class is in real-time audio code. In such
code, allocations are really bad, and this is an easy way of dynamically
enforcing the no-allocations rule while giving the user good feedback if
it is violated. Before real-time audio code is executed, e.g. in LibDSP,
a NoAllocationGuard is instantiated. This is not done with this commit,
as currently some code in LibDSP may still incorrectly allocate in real-
time situations.
Use cases for the Kernel have also come up, so this commit builds on
the previous one to add the support both in Userland and in the
Kernel.
In order to reduce our reliance on __builtin_{ffs, clz, ctz, popcount},
this commit removes all calls to these functions and replaces them with
the equivalent functions in AK/BuiltinWrappers.h.
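The replacement pattern looks roughly like this (treat the exact
wrapper spellings as assumptions based on AK/BuiltinWrappers.h at the
time):

    #include <AK/BuiltinWrappers.h>
    #include <AK/Types.h>

    void example()
    {
        u32 mask = 0b1010'0000;

        // Before:
        // int first_set = __builtin_ffs(mask);
        // int trailing = __builtin_ctz(mask);
        // int ones = __builtin_popcount(mask);

        // After:
        auto first_set = bit_scan_forward(mask);
        auto trailing = count_trailing_zeroes(mask);
        auto ones = popcount(mask);
        (void)first_set;
        (void)trailing;
        (void)ones;
    }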
C++17 introduced aligned versions of `new` and `delete`, which are
automatically called by the compiler when allocating over-aligned
objects. As with the regular allocator functions, these are generally
thin wrappers around LibC.
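For example, an over-aligned type makes the compiler pick the aligned
overloads automatically:

    #include <new>

    struct alignas(64) CacheAligned {
        int values[16];
    };

    int main()
    {
        // Calls operator new(sizeof(CacheAligned), std::align_val_t { 64 }).
        auto* object = new CacheAligned;
        // Calls the matching aligned operator delete.
        delete object;
    }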
We did not have support for aligned allocations in LibC, so
implementing these operators as such wrappers was not possible. While
libstdc++ has a fallback implementation, libc++ does not, so its
aligned allocation support was disabled internally. This made building
the LLVM port with Clang impossible.
Note that while the Microsoft docs say that _aligned_malloc and
_aligned_free are declared in `malloc.h`, libc++ doesn't #include that
file, but instead relies on the declarations coming from `stdlib.h`.
Therefore, I chose to declare them in that file instead of creating a
new LibC header.
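So the declarations added to `stdlib.h` end up with the
Microsoft-style shape (a sketch; note the size-first argument order):

    // Microsoft argument order: size first, then alignment.
    void* _aligned_malloc(size_t size, size_t alignment);
    void _aligned_free(void* ptr);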
I chose not to implement the more Unix-y `memalign`, `posix_memalign`,
or the C11 `aligned_alloc`, because that would require us to
significantly alter the memory allocator's internals. See the comment in
malloc.cpp.
If we hit an assertion while the heap isn't in a stable state, we can't
rely on dynamic memory allocation because the malloc mutex is already
held and the heap is most likely corrupted. Instead, we need to bail
out fast before we make the situation even worse.
As per the previous commit, this is no longer needed: UserspaceEmulator's
malloc tracer now correctly handles functions called from within
`malloc` and `free`. This might also benefit performance, because
forcibly inlining all function calls pessimizes cache locality.
It was fragile to use the address of the body of the memory management
functions to disable memory auditing within them. Functions called from
these did not get exempted from the audits, so in some cases
UserspaceEmulator reported bogus heap buffer overflows.
Memory auditing did not work at all on Clang because when querying the
addresses, their offset was taken relative to the base of `.text` which
is not the first segment in the `R/RX/RW(RELRO)/RW(non-RELRO)` layout
produced by LLD.
Similarly to when setting metadata about the allocations, we now use the
`emuctl` system call to selectively suppress auditing when we reach
these functions. This ensures that functions called from `malloc` are
affected too, and no issues occur because of the inconsistency between
Clang and GCC memory layouts.
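The pattern looks roughly like this. The command value below is a
placeholder, not the real number from the emuctl protocol, and
`malloc_impl` stands in for the internal allocator entry point:

    #include <stddef.h>
    #include <stdint.h>
    #include <syscall.h>

    void* malloc_impl(size_t); // stand-in for the internal entry point

    // Placeholder command value; the real one is defined by the
    // emuctl protocol shared with UserspaceEmulator.
    static constexpr uintptr_t EMUCTL_SET_AUDITING_SUPPRESSED = 1234;

    static void set_auditing_suppressed(bool suppressed)
    {
        // Under UserspaceEmulator this toggles auditing for everything
        // on the call stack; outside the emulator the kernel simply
        // rejects the call, which is harmless here.
        syscall(SC_emuctl, EMUCTL_SET_AUDITING_SUPPRESSED, suppressed ? 1 : 0, 0);
    }

    void* malloc(size_t size)
    {
        set_auditing_suppressed(true);
        void* ptr = malloc_impl(size);
        set_auditing_suppressed(false);
        return ptr;
    }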
This patch changes the semantics of purgeable memory.
- AnonymousVMObject now has a "purgeable" flag. It can only be set when
constructing the object. (Previously, all anonymous memory was
effectively purgeable.)
- AnonymousVMObject now has a "volatile" flag. It covers the entire
range of physical pages. (Previously, we tracked ranges of volatile
pages, effectively making it a page-level concept.)
- Non-volatile objects maintain a physical page reservation via the
committed pages mechanism, to ensure full coverage for page faults.
- When an object is made volatile, it relinquishes any unused committed
pages immediately. If later made non-volatile again, we then attempt
to make a new committed pages reservation. If this fails, we return
ENOMEM to userspace.
mmap() now creates purgeable objects if passed the MAP_PURGEABLE option
together with MAP_ANONYMOUS. anon_create() memory is always purgeable.
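A usage sketch under the new semantics (flag and advice names as in
SerenityOS):

    #include <stddef.h>
    #include <sys/mman.h>

    int main()
    {
        size_t size = 1024 * 1024;

        // A purgeable, initially non-volatile anonymous mapping.
        void* data = mmap(nullptr, size, PROT_READ | PROT_WRITE,
            MAP_ANONYMOUS | MAP_PRIVATE | MAP_PURGEABLE, 0, 0);
        if (data == MAP_FAILED)
            return 1;

        // While the data is idle, let the kernel purge it under
        // memory pressure.
        madvise(data, size, MADV_SET_VOLATILE);

        // Before reuse, make it non-volatile again; per the semantics
        // above, this can fail with ENOMEM if the committed-pages
        // reservation cannot be re-established.
        if (madvise(data, size, MADV_SET_NONVOLATILE) < 0)
            return 1;
        return 0;
    }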
- Use a simple pthread_mutex_t instead of bringing in headers from
LibThreading just to get a mutex.
- Use a normal mutex instead of a recursive one.
- Remove redundant locking in realloc().
When creating uninitialized storage for variables, we need to make sure
that the alignment is correct. Fixes a KUBSAN failure when running
kernels compiled with Clang.
In `Syscalls/socket.cpp`, we can simply use local variables, as
`sockaddr_un` is a POD type.
Along with moving the `alignas` specifier to the correct member,
`AK::Optional`'s internal buffer is no longer zero-initialized by
default.
GCC emitted bogus uninitialized memory access warnings, so we now use
`__builtin_launder` to tell the compiler that we know what we are doing.
This might disable some optimizations, but judging by how GCC failed to
notice that the memory's initialization is dependent on `m_has_value`,
I'm not sure that's a bad thing.
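A simplified sketch of the resulting pattern (not AK::Optional's
actual code; the class name here is made up):

    #include <new>

    template<typename T>
    class OptionalSketch {
    public:
        OptionalSketch() = default;

        explicit OptionalSketch(T const& value)
            : m_has_value(true)
        {
            new (m_storage) T(value);
        }

        ~OptionalSketch()
        {
            if (m_has_value)
                value().~T();
        }

        T& value()
        {
            // Tell the compiler a live T really does exist in the
            // buffer; this silences the bogus warnings.
            return *__builtin_launder(reinterpret_cast<T*>(m_storage));
        }

    private:
        // alignas goes on the buffer itself, and the buffer is not
        // zero-initialized by default.
        alignas(T) unsigned char m_storage[sizeof(T)];
        bool m_has_value { false };
    };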
Previously, each malloc size class kept around a limited number of
unused blocks, marked with MADV_SET_VOLATILE, which could then be
reinitialized when additional blocks were needed.
This changes malloc() so that it also keeps around a number of blocks
without marking them with MADV_SET_VOLATILE. I termed these "hot"
blocks, whereas blocks marked with MADV_SET_VOLATILE are called "cold"
blocks because they're more expensive to reinitialize.
In the worst case this could increase memory usage per process by
1MB when a program requests a bunch of memory and frees all of it.
Also, in order to make more efficient use of these unused blocks,
they're now shared between size classes.
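A toy model of the recycling order described above ("Block" stands in
for malloc's internal block type; all names are illustrative):

    #include <stddef.h>
    #include <sys/mman.h>

    struct Block {
        Block* next { nullptr };
    };
    static constexpr size_t block_size = 64 * 1024;

    static Block* s_hot_blocks;  // unused, pages still committed
    static Block* s_cold_blocks; // unused, marked MADV_SET_VOLATILE

    static Block* take_first(Block*& list)
    {
        Block* block = list;
        if (block)
            list = block->next;
        return block;
    }

    static Block* acquire_block()
    {
        // Prefer a hot block: reuse is cheap because its pages are intact.
        if (auto* block = take_first(s_hot_blocks))
            return block;
        // Fall back to a cold block: it must be made non-volatile again
        // and may have been purged, so it costs more to reinitialize.
        if (auto* block = take_first(s_cold_blocks)) {
            madvise(block, block_size, MADV_SET_NONVOLATILE);
            return block;
        }
        // Nothing cached: map a fresh block.
        void* fresh = mmap(nullptr, block_size, PROT_READ | PROT_WRITE,
            MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
        return fresh == MAP_FAILED ? nullptr : static_cast<Block*>(fresh);
    }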
Problem:
- `size_classes` is a C-style array which makes it difficult to use in
algorithms.
- The generic `all_of` algorithm is reimplemented by hand for this
  specific case.
Solution:
- Change `size_classes` to be an `Array`.
- Directly use the generic `all_of` algorithm instead of
reimplementing.
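The shape of the change, assuming AK's iterator-pair `all_of` (the
values below are illustrative, not the real size classes):

    #include <AK/AllOf.h>
    #include <AK/Array.h>

    static constexpr Array<size_t, 6> size_classes { 16, 32, 64, 128, 256, 0 };

    static bool all_classes_are_powers_of_two()
    {
        return all_of(size_classes.begin(), size_classes.end(),
            [](size_t size) { return size == 0 || (size & (size - 1)) == 0; });
    }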
This implements the macOS API malloc_good_size() which returns the
true allocation size for a given requested allocation size. This
allows us to make use of all the available memory in a malloc chunk.
For example, for a malloc request of 35 bytes our malloc would
internally use a chunk of size 64, however the remaining 29 bytes
would be unused.
Knowing the true allocation size allows us to claim usable memory that
would otherwise be wasted, and to make it available to Vector,
HashTable, and potentially other callers in the future.
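For instance, a caller can size its buffer to the whole chunk (the
header declaring malloc_good_size() is assumed here):

    #include <stdlib.h>

    void* allocate_buffer(size_t requested, size_t& usable)
    {
        usable = malloc_good_size(requested); // e.g. 64 for a request of 35
        return malloc(usable);                // all `usable` bytes are ours
    }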
The LOCKER() macro appears to have been added to LibThread as a
userspace analog of the LOCKER() macro that previously existed in the
kernel. The kernel version used the macro to inject __FILE__ and
__LINE__ into the lock acquisition for debugging; AK::SourceLocation
later removed the need for the macro, so the kernel version no longer
exists. The LOCKER() in LibThread doesn't appear to actually need to
be a macro: using the type directly works fine, and is arguably more
readable, as it removes an unnecessary level of indirection.
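The change is mechanical; assuming LibThread's `Lock`/`Locker` types
of the time, it reads like this:

    #include <LibThread/Lock.h>

    class Counter {
    public:
        void increment()
        {
            // Previously: LOCKER(m_lock);
            LibThread::Locker locker(m_lock);
            ++m_value;
        }

    private:
        LibThread::Lock m_lock;
        int m_value { 0 };
    };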
Legally we could just return a null pointer; however, returning a
pointer other than the null pointer is more compatible with improperly
written software that assumes that a null pointer means allocation
failure.
By default malloc manages memory internally in larger blocks. When
one of those blocks is added we initialize a free list by touching
each of the new block's pages, thereby committing all that memory
upfront.
This changes malloc to build the free list on demand, which as a bonus
also distributes the latency hit for new blocks more evenly, because
the page faults for the zero pages no longer happen all at once.
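A toy model of the on-demand free list (names are illustrative, not
the real data structures): instead of linking every chunk, and thereby
touching every page, when a block is added, we track how far the block
has been carved up and extend lazily.

    #include <stddef.h>

    struct FreeChunk {
        FreeChunk* next;
    };

    struct Block {
        unsigned char* payload;       // fresh, untouched zero pages
        size_t payload_size;
        size_t chunk_size;
        size_t uncarved_offset { 0 }; // first byte not yet handed out
        FreeChunk* free_list { nullptr };
    };

    static void* allocate_chunk(Block& block)
    {
        // Prefer a chunk that was freed earlier.
        if (auto* chunk = block.free_list) {
            block.free_list = chunk->next;
            return chunk;
        }
        // Otherwise carve the next chunk off the untouched tail; only
        // the page it lands on gets faulted in, not the whole block.
        if (block.uncarved_offset + block.chunk_size <= block.payload_size) {
            void* chunk = block.payload + block.uncarved_offset;
            block.uncarved_offset += block.chunk_size;
            return chunk;
        }
        return nullptr; // block is full
    }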
SPDX License Identifiers are a more compact / standardized
way of representing file license information.
See: https://spdx.dev/resources/use/#identifiers
This was done with the `ambr` search and replace tool.
ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
calloc() was internally calling malloc_impl() which would scrub out
all the allocated memory with the scrub byte (0xdc). We would then
immediately zero-fill the memory.
This was obviously a waste of time, and our hash tables were doing
it all the time. :^)
(...and ASSERT_NOT_REACHED => VERIFY_NOT_REACHED)
Since all of these checks are done in release builds as well,
let's rename them to VERIFY to prevent confusion, as everyone is
used to assertions being compiled out in release.
We can introduce a new ASSERT macro that is specifically for debug
checks, but I'm doing this wholesale conversion first since we've
accumulated thousands of these already, and it's not immediately
obvious which ones are suitable for ASSERT.
Just ignore all these environment flags if the AT_SECURE flag is set in
the program's auxiliary vector.
This prevents a user from tricking set-uid programs into dumping debug
information via environment flags.
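A sketch of the gate, assuming LibC's getauxval() (AT_SECURE is
nonzero for set-uid/set-gid processes):

    #include <sys/auxv.h>

    static bool debug_env_flags_allowed()
    {
        return getauxval(AT_SECURE) == 0;
    }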