This reverts commit d48c68cf3f.
Unfortunately, this currently copies some warn() invocations that we do
*not* want in the debug console, such as test-js's use of OSC command 9
to report progress.
This is just a straight (and fairly inefficient) implementation of IPv6
parsing and serialization from the URL spec.
Note that we don't use AK::IPv6Address here because the URL spec
requires a specific serialization behavior.
The array which contains the actual parameters is always located
immediately after the base `TypeErasedFormatParams` object of
`VariadicFormatParams`. Hence, storing a pointer to it inside a `Span`
is redundant. Changing it to a zero-length array saves 8 bytes.
Secondly, we limit the number of parameters to 256, so `m_size` and
`m_next_index` can be stored in a smaller data type than `size_t`,
saving us another 8 bytes.
This decreases the size of a single-element `VariadicFormatParams` from
48 to 32 bytes, thus reducing the code size overhead of setting up
parameters for `dbgln()`.
Note that [arrays of length zero][1] are a GNU extension, but it's used
elsewhere in the codebase already and is explicitly supported by Clang
and GCC.
[1]: https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
Xcode 15 betas 1-3 lack https://reviews.llvm.org/D135772, which fixes a
bug that causes trailing `requires` clauses to be evaluated twice,
failing the second time. Reported as FB12284201.
This caused compile errors when instantiating types derived from RefPtr:
> error: invalid reference to function 'NonnullRefPtr': constraints not
> satisfied
> note: because substituted constraint expression is ill-formed: value
> of type '<dependent type>' is not contextually convertible to 'bool'.
This commit works around the issue by moving the `requires` clauses
after the template parameter list.
In most cases, trailing `requires` clauses and those specified after the
template parameter list work identically, so this change should not
impact the code's behavior. The only difference is that trailing
requires clauses are evaluated *after* constrained placeholder types
(i.e. `Integral auto i` function parameter).
All elements of the vector were moved to the left, for each element to
remove. This patch makes the function move each element exactly once.
On the same test case as the previous commit, it makes the function
disappear from the profile. These two commits combined reduce the
decompression time by 12%.
As confusing as it may sound, reusing them is terrible performance wise.
When profiling the PNG decoder, the result (which is dominated by the
Zlib decompression) shows that the `cleanup_unused_chunks()` function
represented 14.26% of the profile before this patch and only 7.7%
afterward.
On a 6.5 MB PNG image, it reduces the decompression time by more than
5%.
This uses one of Sun OS's algorithms, for a comparison to other
algorithms please refer to
https://gist.github.com/Hendiadyoin1/f58346d66637deb9156ef360aa158bf9
This is used on aarch64 builds and for x86 floats and doubles
for performance gains check
https://quick-bench.com/q/_2jTykshP6cUqtgdepFaoQ53YC8
which shows approximately 2x gains
Co-Authored-By: Ben Wiederhake <BenWiederhake.GitHub@gmx.de>
Co-Authored-By: kleines Filmröllchen <filmroellchen@serenityos.org>
Co-Authored-By: Dan Klishch <danilklishch@gmail.com>
This now searches the memory in blocks, which should be slightly more
efficient. However, it doesn't make much difference (e.g. ~1% in LZMA
compression) in most real-world applications, as the non-hint function
is more expensive by orders of magnitude.
This factors out a lot of complicated math into somewhat understandable
functions.
While at it, rename `next_read_span_with_seekback` to
`next_seekback_span` to keep the naming consistent and to avoid making
function names any longer.
The "operation modes" of this function have very different focuses, and
trying to combine both in a way where we share the most amount of code
probably results in the worst performance.
Instead, split up the function into "existing distances" and "no
existing distances" so that we can optimize either case separately.
We will be adding extra logic to the CircularBuffer to optimize
searching, but this would negatively impact the performance of
CircularBuffer users that don't need that functionality.
By golly, this is a lot more spec comments than I originally thought
I would need to do! This has exposed some bugs in the implementation,
as well as a whole lot of things which we are yet to implement.
No functional changes intended in this commit (already pretty large
as is!).
ECMA-262 implies that `MIN_VALUE` should be a denormalized value if
denormal arithmetic is supported. This is the case on x86-64 and AArch64
using standard GCC/Clang compilation settings.
test262 checks whether `Number.MIN_VALUE / 2.0` is equal to 0, which
only holds if `MIN_VALUE` is the smallest denormalized value.
This commit renames the existing `NumericLimits<FloatingPoint>::min()`
to `min_normal()` and adds a `min_denormal()` method to force users to
explicitly think about which one is appropriate for their use case. We
shouldn't follow the STL's confusingly designed interface in this
regard.
Instead of checking the address of a temporary, grab the address of the
current frame pointer to determine how much memory is left on the stack.
This better communicates to the compiler what we're trying to do, and
resolves some crashes with ASAN in test-js while the option
detect_stack_use_after_return is turned on.
This was missed in 02b74e5a70
We need to disable consteval in AK::String as well as AK::StringView,
and we need to disable it when building both the tools build and the
fuzzer build.
These recursive templates have a measurable impact on the compile speed
of Variant-heavy code like LibWeb. Using these builtins leads to a 2.5%
speedup for the measured compilation units.
oss-fuzz ships a pre-release commit of clang-15 for all of their build
bots. Until they update to a release of clang-15 that includes the fix
for this bug, or a later release, we need to keep the workaround in
place.
The Windows CRT definition of assert() is not noreturn, and causes
compile errors when using it as the backing for VERIFY() in debug
configurations of applications like the Jakt compiler.
Apple Clang 14.0.3 (Xcode 14.3) miscompiles this builtin on AArch64,
causing the borrow flag to be set incorrectly. I have added a detailed
writeup on Qemu's issue tracker, where the same issue led to a hang when
emulating x86:
https://gitlab.com/qemu-project/qemu/-/issues/1659#note_1408275831
I don't know of any specific issue caused by this on Lagom, but better
safe than sorry.
GCC 14 (https://gcc.gnu.org/g:2b4e0415ad664cdb3ce87d1f7eee5ca26911a05b)
has added support for the previously Clang-specific add/subtract with
borrow builtins. Let's use `__has_builtin` to detect them instead of
assuming that only Clang has them. We should prefer these to the
x86-specific ones as architecture-independent middle-end optimizations
might deal with them better.
As an added bonus, this significantly improves codegen on AArch64
compared to the fallback implementation that uses
`__builtin_{add,sub}_overflow`.
For now, the code path with the x86-specific intrinsics stays so as to
avoid regressing the performance of builds with GCC 12 and 13.
The previous version had a sequence of calls that are likely not
optimized out, while this version is strictly a sequence of static type
conversion which are always fully optimized out.