ladybird/Userland/DynamicLoader
Daniel Bertalan bcf124c07d LibC: Implement a faster memset routine for x86-64 in assembly
This commit addresses the following shortcomings of our current, simple
and elegant memset function:
- REP STOSB/STOSQ has considerable startup overhead, which makes it
  impractical for smaller sizes.
- Until very recently, AMD CPUs did not support "Enhanced REP
  MOVSB/STOSB", so REP STOSB performed poorly on them.

With this commit applied, I measured a ~5% decrease in `test-js`'s
runtime under QEMU's TCG backend. The implementation is based on the
following article from Microsoft:

https://msrc-blog.microsoft.com/2021/01/11/building-faster-amd64-memset-routines

Two versions of the routine are implemented: one that uses the ERMS
extension mentioned above, and one that performs plain SSE stores. The
version appropriate for the CPU is selected at load time using an IFUNC.
2022-05-01 12:42:01 +02:00
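
The load-time selection described in the commit message can be
illustrated with a rough sketch. This is not the code from the commit:
it imitates what an IFUNC resolver does with a plain function pointer
that is resolved on first use, so that it builds without loader support
for IFUNC. All names (`memset_erms_sketch`, `memset_sse_sketch`,
`cpu_has_erms`, `resolve_memset`, `memset_dispatch`) are hypothetical,
and both variants are simple C loops standing in for the assembly
routines. The only hardware fact relied on is that ERMS is reported in
CPUID leaf 7, subleaf 0, EBX bit 9.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h> /* GCC/Clang helper header for the CPUID instruction */
#endif

typedef void* (*memset_fn)(void*, int, size_t);

/* Hypothetical stand-in for the ERMS variant (a real one would use REP STOSB). */
static void* memset_erms_sketch(void* dest, int value, size_t size)
{
    unsigned char* p = dest;
    for (size_t i = 0; i < size; ++i)
        p[i] = (unsigned char)value;
    return dest;
}

/* Hypothetical stand-in for the SSE variant (a real one would use 16-byte stores). */
static void* memset_sse_sketch(void* dest, int value, size_t size)
{
    unsigned char* p = dest;
    unsigned char* end = p + size;
    while (p != end)
        *p++ = (unsigned char)value;
    return dest;
}

/* ERMS is reported in CPUID leaf 7, subleaf 0, EBX bit 9. */
static bool cpu_has_erms(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false; /* leaf 7 is not available on this CPU */
    return (ebx & (1u << 9)) != 0;
#else
    return false; /* not an x86 CPU: fall back to the plain variant */
#endif
}

/* Plays the role of the IFUNC resolver: pick one variant for this CPU. */
static memset_fn resolve_memset(void)
{
    return cpu_has_erms() ? memset_erms_sketch : memset_sse_sketch;
}

/* Resolves once on first call; a real IFUNC is resolved by the dynamic loader. */
void* memset_dispatch(void* dest, int value, size_t size)
{
    static memset_fn resolved = NULL;
    if (!resolved)
        resolved = resolve_memset();
    assert(resolved != NULL);
    return resolved(dest, value, size);
}
```

With a real IFUNC, the indirection through `memset_dispatch` disappears:
the loader calls the resolver while processing relocations and binds the
symbol directly to the chosen variant.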
CMakeLists.txt LibC: Implement a faster memset routine for x86-64 in assembly 2022-05-01 12:42:01 +02:00
main.cpp Everywhere: Run clang-format 2022-04-01 21:24:45 +01:00
misc.cpp Everywhere: Run clang-format 2022-04-01 21:24:45 +01:00
misc.h Everywhere: Run clang-format 2022-04-01 21:24:45 +01:00