Commit graph

87 commits

Author SHA1 Message Date
Idan Horowitz
086969277e Everywhere: Run clang-format 2022-04-01 21:24:45 +01:00
Brian Gianforcaro
7d667b9f69 LibELF: Remove unused m_program_interpreter member from DynamicLoader
While profiling I realized that this member is unused, so the
StringBuilder and String allocation are completely un-necessary.
2022-03-31 10:18:07 +02:00
Daniel Bertalan
3974cac148 LibELF: Implement support for DT_RELR relative relocations
The DT_RELR relocation is a relatively new relocation encoding designed
to achieve space-efficient relative relocations in PIE programs.

The description of the format is available here:
https://groups.google.com/g/generic-abi/c/bX460iggiKg/m/Pi9aSwwABgAJ

It works by using a bitmap to store the offsets which need to be
relocated. Even entries are *address* entries: they contain an address
(relative to the base of the executable) which needs to be relocated.
Subsequent even entries are *bitmap* entries: "1" bits encode offsets
(in word size increments) relative to the last address entry which need
to be relocated.

This is in contrast to the REL/RELA format, where each entry takes up
2/3 machine words. Certain kinds of relocations store useful data in
that space (like the name of the referenced symbol), so not everything
can be encoded in this format. But as position-independent executables
and shared libraries tend to have a lot of relative relocations, a
specialized encoding for them absolutely makes sense.

The authors of the format suggest an overall 5-20% reduction in the file
size of various programs. Due to our extensive use of dynamic linking
and us not stripping debug info, relative relocations don't make up such
a large portion of the binary's size, so the measurements will tend to
skew to the lower side of the spectrum.

The following measurements were made with the x86-64 Clang toolchain:

- The kernel contains 290989 relocations. Enabling RELR decreased its
  size from 30 MiB to 23 MiB.
- LibUnicodeData contains 190262 relocations, almost all of them
  relative. Its file size changed from 17 MiB to 13 MiB.
- /bin/WebContent contains 1300 relocations, 66% of which are relative
  relocations. With RELR, its size changed from 832 KiB to 812 KiB.

This change was inspired by the following blog post:
https://maskray.me/blog/2021-10-31-relative-relocations-and-relr
2022-02-11 18:07:53 +01:00
Andreas Kling
c482508aa1 LibELF: Use shared memory mapping when loading ELF objects
There's no reason to make a private read-only mapping just for reading
(and validating) the ELF headers, and copying out the data segments.
2022-01-15 19:51:15 +01:00
Idan Horowitz
cfb9f889ac LibELF: Accept Span instead of Pointer+Size in validate_program_headers 2022-01-13 22:40:25 +01:00
Idan Horowitz
3e959618c3 LibELF: Use StringBuilders instead of Strings for the interpreter path
This is required for the Kernel's usage of LibELF, since Strings do not
expose allocation failure.
2022-01-13 22:40:25 +01:00
Daniel Bertalan
d1ef8e63f7 LibELF: Use MAP_FIXED_NOREPLACE for address space reservation
This ensures that we don't corrupt our address space if a non-PIE
program's requested address space happens to coincide with memory we
already use.
2021-12-23 23:08:10 +01:00
Andreas Kling
b7ee0191ea LibELF: Name non-executable map regions ".rodata" instead of ".text" 2021-09-04 20:30:56 +02:00
Andreas Kling
9206efaabe LibELF: Don't copy read-only data sections
The dynamic loader was mistakenly assuming that there are only two types
of program load headers: text (RX) and data (RW).

Now that we're linking with `-z separate-code`, we will also get some
read-onlydata (R) segments. These can be memory-mapped directly without
making a private per-process copy.

To solve this, the code now instead separates the headers into map/copy
instead of text/data. Writable segments get copied, while non-writable
segments get memory-mapped. :^)
2021-09-01 01:36:18 +02:00
Andreas Kling
0819f0a3fd LibELF: Allow (but ignore) PT_LOAD headers with zero size
GNU ld sometimes generates zero-sized PT_LOAD headers when running with
the "-z separate-code" option. Let's not choke on such headers, we can
just ignore them and move along.
2021-08-31 16:46:16 +02:00
Gunnar Beutner
e4f0795ae4 LibELF+LibTest: Fix incorrect #ifdef 2021-08-12 08:16:07 +02:00
Daniel Bertalan
18b2484985 LibELF: Remove (FlatPtr)something.as_ptr() idiom
This is equivalent to `something.get()`, but more verbose.
2021-08-09 23:15:48 +02:00
Daniel Bertalan
e0e3198d51 LibELF: Fix 'applying offset produced null pointer' UBSAN failure
These integer => pointer => integer conversions were technically prone
to UB, since they were used as offsets (which are perfectly fine to be
zero), but we calculated them with pointer arithmetic. This made Clang
insert pointer overflow UBSAN checks, which trigger in case of a zero
result.
2021-08-09 23:15:48 +02:00
Gunnar Beutner
4cf24c6ba2 Userland: Prefer using ARCH() over __LP64__ 2021-07-13 23:19:33 +02:00
Gunnar Beutner
13a14b3112 LibELF: Fix loading libs with a .text segment that's not page-aligned
It's perfectly acceptable for the segment's vaddr to not be page aligned
as long as the segment itself is page-aligned. We'll just map a few more
bytes at the start of the segment that will be unused by the library.

We didn't notice this problem because because GCC either always uses
0 for the .text segment's vaddr or at least aligns the vaddr to the
page size.

LibELF would also fail to load really small libraries (i.e. smaller than
4096 bytes).
2021-07-07 11:53:17 +02:00
Gunnar Beutner
ea8ff03475 LibELF: Fix loading objects with a non-zero load base
My previous patch (1f93ffcd) broke loading objects whose first PT_LOAD
entry had a non-zero vaddr.

On top of that the calculations for the relro and dynamic section were
also incorrect.
2021-07-04 14:23:52 +02:00
Gunnar Beutner
371c852fc0 LibELF: Swap the arguments for negative_offset_from_tls_block_end
Now that m_tls_offset points to the start of the TLS block the argument
order makes more sense this way.
2021-07-04 01:07:28 +02:00
Gunnar Beutner
251eaad8f0 LibELF: Fix relocation support for 'static __thread' variables 2021-07-04 01:07:28 +02:00
Gunnar Beutner
5f6ee4c539 LibELF: Save the negative TLS offset in m_tls_offset
This makes it unnecessary to track the symbol size which just isn't
available for unexported symbols (e.g. for 'static __thread').
2021-07-04 01:07:28 +02:00
Gunnar Beutner
a0a38e1e84 LibELF: Implement TLS relocation support for x86_64 2021-07-04 01:07:28 +02:00
Gunnar Beutner
f9a8c6f053 LibELF: Implement support for RELA relocations 2021-07-01 10:50:00 +02:00
Gunnar Beutner
1f93ffcd72 LibELF: Simplify ELF load address calculations
These were unnecessarily complicated.
2021-07-01 10:50:00 +02:00
Gunnar Beutner
2dbd3f83c1 LibELF: Fix incorrect error message 2021-07-01 10:50:00 +02:00
Gunnar Beutner
d3127efc01 LibELF: Implement PLT relocations for x86_64 2021-06-29 20:03:36 +02:00
Gunnar Beutner
5afec84cc2 LibELF: Add stub for R_X86_64_TPOFF64 2021-06-29 20:03:36 +02:00
Gunnar Beutner
811f9d562d LibELF: Make sure the mmap() regions are large enough
Sometimes we'd end up requesting a smaller range for .text and .data
than was actually necessary.
2021-06-29 20:03:36 +02:00
Gunnar Beutner
0cb937416b Meta: Install 64-bit libgcc_s.so for x86_64 targets 2021-06-28 22:29:28 +02:00
Gunnar Beutner
158355e0d7 Kernel+LibELF: Add support for validating and loading ELF64 executables 2021-06-28 22:29:28 +02:00
Andrew Kaster
7b4dc590e7 AK+Userland: Use akaster@serenityos.org for my copyright headers 2021-05-30 14:35:34 +01:00
Nicholas Baron
aa4d41fe2c
AK+Kernel+LibELF: Remove the need for IteratorDecision::Continue
By constraining two implementations, the compiler will select the best
fitting one. All this will require is duplicating the implementation and
simplifying for the `void` case.

This constraining also informs both the caller and compiler by passing
the callback parameter types as part of the constraint
(e.g.: `IterationFunction<int>`).

Some `for_each` functions in LibELF only take functions which return
`void`. This is a minimal correctness check, as it removes one way for a
function to incompletely do something.

There seems to be a possible idiom where inside a lambda, a `return;` is
the same as `continue;` in a for-loop.
2021-05-16 10:36:52 +01:00
Gunnar Beutner
0ab37dbd03 LibELF: Propagate ELF image validation errors to the caller
With this fixed dlopen() no longer crashes when given an invalid
ELF image and instead returns an error code that can be retrieved
with dlerror().

Fixes #6995.
2021-05-10 21:27:11 +02:00
Gunnar Beutner
a050b43290 LibELF: Implement x86_64 relocation support
There are definitely some relocations missing and this is untested
for now.
2021-05-03 08:42:39 +02:00
Itamar
101ac45c1a LibELF: Change TLS offset calculation
This changes the TLS offset calculation logic to be based on the
symbol's size instead of the total size of the TLS.

Because of this change, we no longer need to pipe "m_tls_size" to so
many functions.

Also, After this patch, the TLS data of the main program exists at the
"end" of the TLS block (Highest addresses).

This fixes a part of #6609.
2021-04-30 18:47:39 +02:00
Itamar
6bbd2ebf83 Kernel+LibELF: Support initializing values of TLS data
Previously, TLS data was always zero-initialized.

To support initializing the values of TLS data, sys$allocate_tls now
receives a buffer with the desired initial data, and copies it to the
master TLS region of the process.

The DynamicLinker gathers the initial TLS image and passes it to
sys$allocate_tls.

We also now require the size passed to sys$allocate_tls to be
page-aligned, to make things easier. Note that this doesn't waste memory
as the TLS data has to be allocated in separate pages anyway.
2021-04-30 18:47:39 +02:00
Itamar
db76702d71 LibELF: Rename tls_size to tls_size_of_current_object 2021-04-30 18:47:39 +02:00
Itamar
1c24388d74 LibELF: Extract TLS offset calculation logic to separate function 2021-04-30 18:47:39 +02:00
Gunnar Beutner
f40ee1b03f LibC+LibELF: Implement more fully-features dlfcn functionality
This implements more of the dlfcn functionality. Most notably:

* It's now possible to dlopen() libraries which were already
  loaded at program startup time. This does not cause those
  libraries to be loaded twice.
* Errors are reported via dlerror() rather than by crashing
  the program.
* Calls to the dl*() functions are thread-safe.
2021-04-25 10:14:50 +02:00
Gunnar Beutner
f74b8a2d1f LibELF: Avoid calculating symbol hashes when we don't need them 2021-04-23 23:35:36 +02:00
Andreas Kling
b91c49364d AK: Rename adopt() to adopt_ref()
This makes it more symmetrical with adopt_own() (which is used to
create a NonnullOwnPtr from the result of a naked new.)
2021-04-23 16:46:57 +02:00
Brian Gianforcaro
1682f0b760 Everything: Move to SPDX license identifiers in all files.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.

See: https://spdx.dev/resources/use/#identifiers

This was done with the `ambr` search and replace tool.

 ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
2021-04-22 11:22:27 +02:00
Gunnar Beutner
6c729993a8 LibELF: Allow shared objects which don't have a text segment
Shared objects without a text segment are perfectly OK. For
example libicudata.so has only data segments:

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .hash         00000014  00000094  00000094  00000094  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .dynsym       00000020  000000a8  000000a8  000000a8  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .dynstr       0000002a  000000c8  000000c8  000000c8  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .rodata       01b562d0  00000100  00000100  00000100  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .eh_frame     00000000  01b563d0  01b563d0  01b563d0  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .dynamic      00000070  01b573d0  01b573d0  01b563d0  2**2
2021-04-19 20:39:22 +02:00
Gunnar Beutner
c32b58873a LibELF: Fix calculation for TLS relocations
The calculation for TLS relocations was incorrect which would
result in overlapping TLS variables when more than one shared
object used TLS variables.

This bug can be reproduced with a shared library and a program
like this:

    $ cat tlstest.c
    #include <string.h>
    __thread char tls_val[1024];
    void set_val() { memset(tls_val, 0, sizeof(tls_val)); }

    $ gcc -g -shared -o usr/lib/libtlstest.so tlstest.c

    $ cat test.c
    void set_val();
    int main() { set_val(); }
    $ gcc -g -o tls test.c -ltlstest

Due to the way the TLS relocations are done this program would
clobber libc's TLS variables (e.g. errno).
2021-04-19 12:14:43 +02:00
Gunnar Beutner
1dab5ca5fd LibELF: Fix support for relocating weak symbols
Having unresolved weak symbols is allowed and we should initialize
them to zero.
2021-04-19 12:00:40 +02:00
Gunnar Beutner
97d7450571 LibELF: Remove VERIFY() calls and let control flow return to the caller
This way we get better error messages for unresolved symbols because
the caller logs the file and symbol names.
2021-04-19 12:00:40 +02:00
Gunnar Beutner
6cb28ecee8 LibC+LibELF: Implement support for the dl_iterate_phdr helper
This helper is used by libgcc_s to figure out where the .eh_frame sections
are located for all loaded shared objects.
2021-04-18 10:55:25 +02:00
Gunnar Beutner
cd7512a2ad LibELF: Add support for loading objects with multiple data and text segments
This enables loading executables with multiple data and text segments. Also
it fixes loading executables where the text segment has a non-zero offset.

Example:

  $ echo "main () {}" > test.c
  $ gcc -Wl,-z,separate-code -o test test.c
  $ objdump -p test
  test:     file format elf32-i386

  Program Header:
      PHDR off    0x00000034 vaddr 0x00000034 paddr 0x00000034 align 2**2
           filesz 0x000000e0 memsz 0x000000e0 flags r--
    INTERP off    0x00000114 vaddr 0x00000114 paddr 0x00000114 align 2**0
           filesz 0x00000013 memsz 0x00000013 flags r--
      LOAD off    0x00000000 vaddr 0x00000000 paddr 0x00000000 align 2**12
           filesz 0x000003c4 memsz 0x000003c4 flags r--
      LOAD off    0x00001000 vaddr 0x00001000 paddr 0x00001000 align 2**12
           filesz 0x00000279 memsz 0x00000279 flags r-x
      LOAD off    0x00002000 vaddr 0x00002000 paddr 0x00002000 align 2**12
           filesz 0x00000004 memsz 0x00000004 flags r--
      LOAD off    0x00002004 vaddr 0x00003004 paddr 0x00003004 align 2**12
           filesz 0x00000100 memsz 0x00000124 flags rw-
   DYNAMIC off    0x00002014 vaddr 0x00003014 paddr 0x00003014 align 2**2
           filesz 0x000000c8 memsz 0x000000c8 flags rw-
2021-04-14 13:12:52 +02:00
Andreas Kling
ef1e5db1d0 Everywhere: Remove klog(), dbg() and purge all LogStream usage :^)
Good-bye LogStream. Long live AK::Format!
2021-03-12 17:29:37 +01:00
Andreas Kling
79889ef052 LibELF: Consolidate main executable loading a bit
Merge the load_elf() and commit_elf() functions into a single
load_main_executable() function that takes care of both things.

Also split "stage 3" into two separate stages, keeping the lazy
relocations in stage 3, and adding a stage 4 for calling library
initialization functions.

We also make sure to map the main executable before dealing with
any of its dependencies, to ensure that non-PIE executables get
loaded at their desired address.
2021-02-26 14:49:55 +01:00
Andreas Kling
7db8ccc0e4 LibC+DynamicLoader: Move "transactional memory" GCC stubs to LibC
Instead of having a special case in the dynamic loader where we ignore
TM-related GCC symbols, just stub them out in LibC like we already do
for various other things we don't support.
2021-02-24 14:54:26 +01:00
Brian Gianforcaro
069fd58381 LibELF: Convert more string literals to StringView literals.
Most of these won't have perf impact, but the optimization is
practically free, so no harm in fixing these up.
2021-02-24 14:45:34 +01:00