This commit adds the used relocation types to elf.h, and handles the
types in DynamicLoader and DynamicObject. No new functionalitty has to
be added, as the same code can be reused between aarch64 and x86_64.
This will make it easier to support both string types at the same time
while we convert code, and tracking down remaining uses.
One big exception is Value::to_string() in LibJS, where the name is
dictated by the ToString AO.
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
This automatically fixes an issue where we were accidentally copying
garbage data from beyond the TLS segment as uninitialized data isn't
actually stored inside the image.
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).
No functional changes.
Previously we would just tightly pack the different libraries' TLS
segments together, but that is incorrect, as they might require some
kind of minimum alignment for their TLS base address.
We now plumb the required TLS segment alignment down to the TLS block
linear allocator and align the base address down to the appropriate
alignment.
The original heuristic of "a library being in `s_global_objects` means
that it was fully initialized already" doesn't hold up anymore since we
changed the loading order. This was causing us to skip parts of the
initialization of dependency libraries when running dlopen (since it was
the only user of that setting).
Instead, set a flag after we run stage 4 (which is the "run the global
initializers" stage) and check that flag when determining unfinished
dependencies. This entirely replaces the `skip_global_objects` logic.
This keeps us from needlessly allocating storage via `malloc` as part
of the `Vector`s that early, which we might conflict on while reserving
memory for the main executable.
.text sections of objects that contain textrels have to be writable
during the relocation procedure. Because of this, we would segfault if
we tried to execute IFUNC resolvers defined in them. Let's print a
meaningful error message instead.
Additionally, a warning is now printed when we load objects with
textrels, as in the future, additional security mitigations might
interfere with them being loaded.
IFUNC is a GNU extension to the ELF standard that allows a function to
have multiple implementations. A resolver function has to be called at
load time to choose the right one to use. The PLT will contain the entry
to the resolved function, so branching and more indirect jumps can be
avoided at run-time.
This mechanism is usually used when a routine can be made faster using
CPU features that are available in only some models, and a fallback
implementation has to exist for others.
We will use this feature to have two separate memset implementations for
CPUs with and without ERMS (Enhanced REP MOVSB/STOSB) support.
IFUNC resolvers depend on the resolved function's address having been
relocated by the time they are called. This means that relative
relocations have to be done first.
The linker is kind enough to put R_*_RELATIVE before R_*_IRELATIVE in
.rel.dyn, but .relr.dyn contains relative relocations too.
The DT_RELR relocation is a relatively new relocation encoding designed
to achieve space-efficient relative relocations in PIE programs.
The description of the format is available here:
https://groups.google.com/g/generic-abi/c/bX460iggiKg/m/Pi9aSwwABgAJ
It works by using a bitmap to store the offsets which need to be
relocated. Even entries are *address* entries: they contain an address
(relative to the base of the executable) which needs to be relocated.
Subsequent even entries are *bitmap* entries: "1" bits encode offsets
(in word size increments) relative to the last address entry which need
to be relocated.
This is in contrast to the REL/RELA format, where each entry takes up
2/3 machine words. Certain kinds of relocations store useful data in
that space (like the name of the referenced symbol), so not everything
can be encoded in this format. But as position-independent executables
and shared libraries tend to have a lot of relative relocations, a
specialized encoding for them absolutely makes sense.
The authors of the format suggest an overall 5-20% reduction in the file
size of various programs. Due to our extensive use of dynamic linking
and us not stripping debug info, relative relocations don't make up such
a large portion of the binary's size, so the measurements will tend to
skew to the lower side of the spectrum.
The following measurements were made with the x86-64 Clang toolchain:
- The kernel contains 290989 relocations. Enabling RELR decreased its
size from 30 MiB to 23 MiB.
- LibUnicodeData contains 190262 relocations, almost all of them
relative. Its file size changed from 17 MiB to 13 MiB.
- /bin/WebContent contains 1300 relocations, 66% of which are relative
relocations. With RELR, its size changed from 832 KiB to 812 KiB.
This change was inspired by the following blog post:
https://maskray.me/blog/2021-10-31-relative-relocations-and-relr
The dynamic loader was mistakenly assuming that there are only two types
of program load headers: text (RX) and data (RW).
Now that we're linking with `-z separate-code`, we will also get some
read-onlydata (R) segments. These can be memory-mapped directly without
making a private per-process copy.
To solve this, the code now instead separates the headers into map/copy
instead of text/data. Writable segments get copied, while non-writable
segments get memory-mapped. :^)
GNU ld sometimes generates zero-sized PT_LOAD headers when running with
the "-z separate-code" option. Let's not choke on such headers, we can
just ignore them and move along.
These integer => pointer => integer conversions were technically prone
to UB, since they were used as offsets (which are perfectly fine to be
zero), but we calculated them with pointer arithmetic. This made Clang
insert pointer overflow UBSAN checks, which trigger in case of a zero
result.
It's perfectly acceptable for the segment's vaddr to not be page aligned
as long as the segment itself is page-aligned. We'll just map a few more
bytes at the start of the segment that will be unused by the library.
We didn't notice this problem because because GCC either always uses
0 for the .text segment's vaddr or at least aligns the vaddr to the
page size.
LibELF would also fail to load really small libraries (i.e. smaller than
4096 bytes).
My previous patch (1f93ffcd) broke loading objects whose first PT_LOAD
entry had a non-zero vaddr.
On top of that the calculations for the relro and dynamic section were
also incorrect.