The DT_RELR relocation is a relatively new relocation encoding designed
to achieve space-efficient relative relocations in PIE programs.
The description of the format is available here:
https://groups.google.com/g/generic-abi/c/bX460iggiKg/m/Pi9aSwwABgAJ
It works by using a bitmap to store the offsets which need to be
relocated. Even entries are *address* entries: they contain an address
(relative to the base of the executable) which needs to be relocated.
Subsequent odd entries are *bitmap* entries: "1" bits encode offsets
(in word size increments) relative to the last address entry which need
to be relocated.
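A minimal decoder sketch may make the encoding clearer. This assumes
64-bit entries and a raw `base` pointer; it is not the exact LibELF
code:

#include <cstddef>
#include <cstdint>

// Decode a DT_RELR table: `relr` holds `relr_count` entries, and `base`
// is the object's load base. Relocating a word means adding `base` to it.
void apply_relr(uint8_t* base, uint64_t const* relr, size_t relr_count)
{
    uint64_t* next_word = nullptr;
    for (size_t i = 0; i < relr_count; ++i) {
        uint64_t entry = relr[i];
        if ((entry & 1) == 0) {
            // Address entry (low bit clear): relocate the word at this
            // offset and remember where the following bitmaps start.
            next_word = reinterpret_cast<uint64_t*>(base + entry);
            *next_word++ += reinterpret_cast<uint64_t>(base);
        } else {
            // Bitmap entry (low bit set): bit N covers the N-th word
            // after the last address entry; each entry spans 63 words.
            uint64_t* word = next_word;
            for (uint64_t bits = entry >> 1; bits != 0; bits >>= 1, ++word) {
                if (bits & 1)
                    *word += reinterpret_cast<uint64_t>(base);
            }
            next_word += 63;
        }
    }
}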
This is in contrast to the REL/RELA format, where each entry takes up
2/3 machine words. Certain kinds of relocations store useful data in
that space (like the name of the referenced symbol), so not everything
can be encoded in this format. But as position-independent executables
and shared libraries tend to have a lot of relative relocations, a
specialized encoding for them absolutely makes sense.
The authors of the format suggest an overall 5-20% reduction in the
file size of various programs. Because we make extensive use of dynamic
linking and don't strip debug info, relative relocations make up a
smaller portion of our binaries' size, so our measurements tend to skew
toward the lower end of that range.
The following measurements were made with the x86-64 Clang toolchain:
- The kernel contains 290989 relocations. Enabling RELR decreased its
size from 30 MiB to 23 MiB.
- LibUnicodeData contains 190262 relocations, almost all of them
relative. Its file size changed from 17 MiB to 13 MiB.
- /bin/WebContent contains 1300 relocations, 66% of which are relative
relocations. With RELR, its size changed from 832 KiB to 812 KiB.
This change was inspired by the following blog post:
https://maskray.me/blog/2021-10-31-relative-relocations-and-relr
The dynamic loader was mistakenly assuming that there are only two types
of program load headers: text (RX) and data (RW).
Now that we're linking with `-z separate-code`, we will also get some
read-only data (R) segments. These can be memory-mapped directly without
making a private per-process copy.
To solve this, the code now separates the headers into map/copy
categories instead of text/data: writable segments get copied, while
non-writable segments get memory-mapped. :^)
GNU ld sometimes generates zero-sized PT_LOAD headers when running with
the "-z separate-code" option. Let's not choke on such headers, we can
just ignore them and move along.
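A rough sketch of the resulting loop over the program headers, with
hypothetical helper names, combining both changes above:

#include <elf.h>
#include <vector>

// Hypothetical helpers standing in for the loader's real mapping logic.
static void map_segment(Elf64_Phdr const&);  // mmap() the file range directly
static void copy_segment(Elf64_Phdr const&); // anonymous mmap() + copy

static void load_segments(std::vector<Elf64_Phdr> const& program_headers)
{
    for (auto const& phdr : program_headers) {
        if (phdr.p_type != PT_LOAD)
            continue;
        // GNU ld with "-z separate-code" may emit zero-sized PT_LOAD
        // headers; skip them instead of choking.
        if (phdr.p_memsz == 0)
            continue;
        if (phdr.p_flags & PF_W)
            copy_segment(phdr); // writable: private per-process copy
        else
            map_segment(phdr);  // RX text / R rodata: map the file directly
    }
}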
These integer => pointer => integer conversions were technically prone
to UB: the values were used as offsets (for which zero is perfectly
valid), but we computed them with pointer arithmetic. This made Clang
insert pointer-overflow UBSAN checks, which trigger when the result
is zero.
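An illustrative fix, with made-up names; the point is to keep the
subtraction in the integer domain, where a zero result is well-defined:

#include <cstdint>

// Before: `address - base` as pointer arithmetic, which UBSAN flags
// when the result wraps to zero. After: convert first, then subtract.
uintptr_t offset_between(void const* base, void const* address)
{
    return reinterpret_cast<uintptr_t>(address) - reinterpret_cast<uintptr_t>(base);
}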
It's perfectly acceptable for the segment's vaddr to not be page-aligned
as long as the segment itself is page-aligned. We'll just map a few more
bytes at the start of the segment that will be unused by the library.
We didn't notice this problem because GCC either always uses 0 for the
.text segment's vaddr or at least aligns the vaddr to the page size.
LibELF would also fail to load really small libraries (i.e. smaller than
4096 bytes).
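A sketch of the alignment math described above (assuming 4 KiB pages;
the names are illustrative):

#include <cstddef>
#include <cstdint>

constexpr size_t page_size = 4096;

struct Mapping {
    uintptr_t start; // page-aligned mapping start
    size_t size;     // page-aligned mapping size
};

// For a segment whose vaddr isn't page-aligned, round the mapping start
// down to a page boundary; the first (vaddr - start) bytes of the
// mapping are simply unused by the library.
Mapping mapping_for_segment(uintptr_t vaddr, size_t size_in_memory)
{
    uintptr_t start = vaddr & ~(page_size - 1);
    size_t padding = vaddr - start;
    size_t size = (padding + size_in_memory + page_size - 1) & ~(page_size - 1);
    return { start, size };
}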
My previous patch (1f93ffcd) broke loading objects whose first PT_LOAD
entry had a non-zero vaddr.
On top of that, the calculations for the relro and dynamic sections were
also incorrect.
By constraining the two implementations, we let the compiler select the
best-fitting one. All this requires is duplicating the implementation
and simplifying it for the `void` case.
This constraining also informs both the caller and the compiler by
passing the callback parameter types as part of the constraint
(e.g. `IterationFunction<int>`).
Some `for_each` functions in LibELF only take functions which return
`void`. This is a minimal correctness check, as it removes one way for a
callback to do its job incompletely.
This also enables a possible idiom: inside such a lambda, a bare
`return;` behaves like `continue;` in a for-loop.
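A self-contained sketch of the idea using standard C++20 concepts
(AK's actual definitions may differ in detail):

#include <concepts>

enum class IterationDecision { Continue, Break };

// The callback either returns an IterationDecision...
template<typename Func, typename... Args>
concept IterationFunction = requires(Func f, Args... args) {
    { f(args...) } -> std::same_as<IterationDecision>;
};

// ...or returns void, and thus cannot stop the iteration early.
template<typename Func, typename... Args>
concept VoidFunction = requires(Func f, Args... args) {
    { f(args...) } -> std::same_as<void>;
};

template<IterationFunction<int> Callback>
void for_each_value(Callback callback)
{
    for (int i = 0; i < 3; ++i) {
        if (callback(i) == IterationDecision::Break)
            break;
    }
}

template<VoidFunction<int> Callback>
void for_each_value(Callback callback)
{
    // In the lambda passed here, a bare `return;` acts like `continue;`.
    for (int i = 0; i < 3; ++i)
        callback(i);
}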
With this fixed, dlopen() no longer crashes when given an invalid
ELF image and instead returns an error code that can be retrieved
with dlerror().
Fixes #6995.
This changes the TLS offset calculation logic to be based on the
symbol's size instead of the total size of the TLS.
Because of this change, we no longer need to pipe "m_tls_size" to so
many functions.
Also, after this patch, the TLS data of the main program lives at the
"end" of the TLS block (highest addresses).
This fixes a part of #6609.
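A loose sketch of the scheme (x86 TLS variant 2, where offsets are
negative relative to the thread pointer; the names are made up):

#include <cstddef>

// Offsets are handed out symbol by symbol, growing downward from the
// end of the TLS block, so only the symbol's own size is needed.
static size_t s_tls_bytes_allocated = 0;

ptrdiff_t allocate_tls_offset(size_t symbol_size)
{
    s_tls_bytes_allocated += symbol_size;
    return -static_cast<ptrdiff_t>(s_tls_bytes_allocated);
}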
Previously, TLS data was always zero-initialized.
To support initializing the values of TLS data, sys$allocate_tls now
receives a buffer with the desired initial data, and copies it to the
master TLS region of the process.
The DynamicLinker gathers the initial TLS image and passes it to
sys$allocate_tls.
We also now require the size passed to sys$allocate_tls to be
page-aligned, to make things easier. Note that this doesn't waste memory
as the TLS data has to be allocated in separate pages anyway.
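On the userspace side, the flow looks roughly like this (assumed
shapes, not the exact LibC/Kernel prototypes):

#include <cstddef>
#include <cstdint>
#include <vector>

constexpr size_t page_size = 4096;

// Stand-in for the real sys$allocate_tls wrapper.
extern void* allocate_tls(void const* initial_data, size_t size);

void* set_up_master_tls(std::vector<uint8_t> tls_image)
{
    // The size must be page-aligned; the TLS data lives in its own
    // pages anyway, so rounding up wastes nothing.
    size_t aligned_size = (tls_image.size() + page_size - 1) & ~(page_size - 1);
    tls_image.resize(aligned_size, 0); // tail stays zero-filled
    // The kernel copies this buffer into the process's master TLS region.
    return allocate_tls(tls_image.data(), aligned_size);
}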
This implements more of the dlfcn functionality. Most notably:
* It's now possible to dlopen() libraries which were already
loaded at program startup time. This does not cause those
libraries to be loaded twice.
* Errors are reported via dlerror() rather than by crashing
the program.
* Calls to the dl*() functions are thread-safe.
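For example (the library name is illustrative):

#include <dlfcn.h>
#include <cstdio>

int main()
{
    // Opening a library that was already loaded at startup returns a
    // handle to the existing copy instead of loading it a second time.
    if (void* handle = dlopen("libc.so", RTLD_NOW))
        dlclose(handle);
    // A failed dlopen() reports the error via dlerror() instead of
    // crashing the program.
    if (!dlopen("no-such-library.so", RTLD_NOW))
        fprintf(stderr, "dlopen: %s\n", dlerror());
    return 0;
}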
SPDX License Identifiers are a more compact / standardized
way of representing file license information.
See: https://spdx.dev/resources/use/#identifiers
This was done with the `ambr` search and replace tool.
ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
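After the conversion, a typical file header looks something like this
(illustrative; SerenityOS code is BSD-2-Clause licensed):

/*
 * Copyright (c) 2021, the SerenityOS developers.
 *
 * SPDX-License-Identifier: BSD-2-Clause
 */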
The calculation for TLS relocations was incorrect, which would result
in overlapping TLS variables when more than one shared object used
them.
This bug can be reproduced with a shared library and a program
like this:
$ cat tlstest.c
#include <string.h>
__thread char tls_val[1024];
void set_val() { memset(tls_val, 0, sizeof(tls_val)); }
$ gcc -g -shared -o usr/lib/libtlstest.so tlstest.c
$ cat test.c
void set_val();
int main() { set_val(); }
$ gcc -g -o tls test.c -ltlstest
Due to the way the TLS relocations were done, this program would
clobber libc's TLS variables (e.g. errno).
Merge the load_elf() and commit_elf() functions into a single
load_main_executable() function that takes care of both things.
Also split "stage 3" into two separate stages, keeping the lazy
relocations in stage 3, and adding a stage 4 for calling library
initialization functions.
We also make sure to map the main executable before dealing with
any of its dependencies, to ensure that non-PIE executables get
loaded at their desired address.
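In outline, with hypothetical helper names (the real DynamicLinker
code differs in detail):

#include <string>

// Stand-ins for the loader's real per-object steps.
void map_object(std::string const& path);     // stage 1: mapping
void map_dependencies(std::string const& path);
void perform_relocations();                   // stage 2: non-lazy relocations
void setup_plt_trampolines();                 // stage 3: lazy (PLT) relocations
void call_initializers();                     // stage 4: library init functions

void load_main_executable(std::string const& path)
{
    // Map the main executable first, so a non-PIE binary claims its
    // fixed load address before any dependency can occupy it.
    map_object(path);
    map_dependencies(path);
    perform_relocations();
    setup_plt_trampolines();
    call_initializers();
}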
Instead of having a special case in the dynamic loader where we ignore
TM-related GCC symbols, just stub them out in LibC like we already do
for various other things we don't support.
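The stubs amount to no-op definitions of the transactional-memory
entry points that GCC emits references to (a sketch; the exact set of
symbols in LibC may differ):

#include <cstddef>

extern "C" {
// Referenced by GCC's startup code for registering "TM clone" tables.
void _ITM_registerTMCloneTable(void*, size_t) { }
void _ITM_deregisterTMCloneTable(void*) { }
}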
(...and ASSERT_NOT_REACHED => VERIFY_NOT_REACHED)
Since all of these checks are done in release builds as well,
let's rename them to VERIFY to prevent confusion, as everyone is
used to assertions being compiled out in release.
We can introduce a new ASSERT macro that is specifically for debug
checks, but I'm doing this wholesale conversion first since we've
accumulated thousands of these already, and it's not immediately
obvious which ones are suitable for ASSERT.
When performing a global symbol lookup, we were recomputing the symbol
hashes once for every dynamic object searched. The hash function was
at the very top of a profile (15%) of program startup.
With this change, the hash function is no longer visible among the top
stacks in the profile. :^)
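The fix amounts to hashing once per lookup and reusing the result for
every searched object. A sketch with the two standard hash functions
(the surrounding lookup structure is illustrative):

#include <cstdint>
#include <string_view>

// Both hashes, computed once per global symbol lookup.
struct SymbolHashes {
    uint32_t sysv { 0 };
    uint32_t gnu { 5381 };
};

SymbolHashes compute_hashes(std::string_view name)
{
    SymbolHashes hashes;
    for (unsigned char ch : name) {
        // Classic SysV ELF hash.
        hashes.sysv = (hashes.sysv << 4) + ch;
        uint32_t top = hashes.sysv & 0xf0000000u;
        hashes.sysv = (hashes.sysv ^ (top >> 24)) & ~top;
        // GNU hash (djb2-style).
        hashes.gnu = hashes.gnu * 33 + ch;
    }
    return hashes;
}

// Before: each object recomputed the hashes. After: hash once, then let
// each object pick whichever hash its hash table format needs:
//
//   auto hashes = compute_hashes(name);
//   for (auto& object : global_objects)
//       if (auto symbol = object.lookup(name, hashes))
//           return symbol;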