ByteBuffer::get_bytes_for_writing() was only ensuring capacity before
this patch. The method needs to call resize to register the appended
data, otherwise it will be overwritten with next data addition.
Usually the values of the previous and next pointers of deleted buckets
are never used, as they're not part of the main ordered bucket chain,
but if an in-place rehashing is done, which results in the bucket being
turned into a free bucket, the stale pointers will remain, at which
point any item that is inserted into said free-bucket will have either
a stale previous pointer if the HashTable was empty on insertion, or a
stale next pointer, resulting in undefined behaviour.
This commit also includes a new HashMap test that reproduces this issue
This reverts commit 50c88e5e3a.
The intention was to add them to NonnullRefPtr, not NonnullOwnPtr. That
is also what was advertised in the PR, but not actually done in the
reverted commit.
It was mostly implemented based on a spec note, that described only
allowed characters, but instead of allowing some special characters not
to be escaped, we escaped every special character except those 'new in
this encode set' disallowed characters from the spec definition.
This is an issue on systems that don't have the empty base class
optimisation (such as windows), and we normally don't need to care -
however static_cast is technically the right thing to use, so let's use
that instead.
Co-Authored-By: Daniel Bertalan <dani@danielbertalan.dev>
The compiler would complain about `__builtin_memcpy` in ByteBuffer::copy
writing out of bounds, as it isn't able to deduce the invariant that the
inline buffer is only used when the requested size is smaller than the
inline capacity.
The other change is more bizarre. If the destructor's declaration
exists, gcc complains about a `delete` operation causing an
out-of-bounds array access.
error: array subscript 'DHCPv4Client::__as_base [0]' is partly outside
array bounds of 'unsigned char [8]' [-Werror=array-bounds]
14 | ~DHCPv4Client() = default;
| ^
This looks like a compiler bug, and I'll report it if I find a suitable
reduced reproducer.
This allows direct inlining and hides away some assembly and
bit-fiddling when manipulating the floating point environment.
This only implements the x87/SSE versions, as of now.
This uses the `fistp` and `cvts[sd]2si` respectively, to potentially
round floating point values with just one instruction.
This falls back to `llrint[fl]?` on aarch64 for now.
This new class with an admittedly long OOP-y name provides a circular
queue in shared memory. The queue is a lock-free synchronous queue
implemented with atomics, and its implementation is significantly
simplified by only accounting for one producer (and multiple consumers).
It is intended to be used as a producer-consumer communication
datastructure across processes. The original motivation behind this
class is efficient short-period transfer of audio data in userspace.
This class includes formal proofs of several correctness properties of
the main queue operations `enqueue` and `dequeue`. These proofs are not
100% complete in their existing form as the invariants they depend on
are "handwaved". This seems fine to me right now, as any proof is better
than no proof :^). Anyways, the proofs should build confidence that the
implemented algorithms, which are only roughly based on existing work,
operate correctly in even the worst-case concurrency scenarios.
This allows for calling this function with any argument type for which
the appropriate traits and operators have been implemented so it can be
compared to the Vector's item type
This patch adds a header containing the fuzzy match algorithm
previously used in Assistant. The algorithm was moved to AK
since there are many places where a search may benefit from fuzzyness.
Some functions want to ignore cv-qualifiers, and it's much easier to
constrain the type through a concept than a separate requires clause on
the function.
Both calls essentially only differ in one boolean, which dictates
whether to print the value in uppercase or lowercase.
Move the long function call into a new function and pass in the
"uppercase" boolean seperately to avoid having to write everything
twice.
Those functions only differ by the input type of `number`. No other
wrapper does this, as they rely on adjusting the type of the argument on
the caller side instead.
Avoid specializing too much by just doing the same for signed numbers.
We were decoding and then re-encoding the query string in URLs.
This round-trip caused us to lose information about plus ('+')
ASCII characters encoded as "%2B".
A change was made prior to percent encode plus signs in order to fix an
issue with the Google cookie consent page.
Unforunately, this was treating a symptom of a problem and not the root
cause and is incorrect behavior.
When we want to use the find_first_index that base Vector provides, we
need to provide an element of the real contained type. That's impossible
for OwnPtr, however, and even with RefPtr there might be instances where
we have a raw reference to the object we want to find, but no smart
pointer. Therefore, overloading this function (with an identical body,
the magic is done by the find_index templatization) with `T const&` as a
parameter allows there use cases.
On oss-fuzz, the LibJS REPL is provided a file encoded with Windows-1252
with the following contents:
/ô¡°½/
The REPL assumes the input file is UTF-8. So in Windows-1252, the above
is represented as [0x2f 0xf4 0xa1 0xb0 0xbd 0x2f]. The inner 4 bytes are
actually a valid UTF-8 encoding if we only look at the most significant
bits to parse leading/continuation bytes. However, it decodes to the
code point U+121c3d, which is not a valid code point.
This commit adds additional validation to ensure the decoded code point
itself is also valid.
These functions are _very_ misleading, as `first()` and `last()` return
references, but `{first,last}_matching()` return copies of the values.
This commit makes it so that they now return Optional<T&>, eliminating
the copy and the confusion.
This implements Optional<T&> as a T*, whose presence has been missing
since the early days of Optional.
As a lot of find_foo() APIs return an Optional<T> which imposes a
pointless copy on the underlying value, and can sometimes be very
misleading, with this change, those APIs can return Optional<T&>.
This method exploits the fact that the values themselves hold the tree
pointers, and as a result this let's us skip the O(logn) traversal down
to the matching Node for a Key-Value pair.
Adds a new optional parameter 'reserved_chars' to
AK::URL::percent_encode. This new optional parameter allows the caller
to specify custom characters to be percent encoded. This is then used
to percent encode plus signs by HttpRequest::to_raw_request.
As seen on TV, HashTable can get "thrashed", i.e. it has a bunch of
deleted buckets that count towards the load factor. This means that hash
tables which are large enough for their contents need to be resized.
This was fixed in 9d8da16 with a workaround that shrinks the HashTable
back down in these cases, as after the resize and re-hash the load
factor is very low again. However, that's not a good solution. If you
insert and remove repeatedly around a size boundary, you might get
frequent resizes, which involve frequent re-allocations.
The new solution is an in-place rehashing algorithm that I came up with.
(Do complain to me, I'm at fault.) Basically, it iterates the buckets
and re-hashes the used buckets while marking the deleted slots empty.
The issue arises with collisions in the re-hash. For this reason, there
are two kinds of used buckets during the re-hashing: the normal "used"
buckets, which are old and are treated as free space, and the
"re-hashed" buckets, which are new and treated as used space, i.e. they
trigger probing. Therefore, the procedure for relocating a bucket's
contents is as follows:
- Locate the "real" bucket of the contents with the hash. That bucket is
the starting point for the target bucket, and the current (old) bucket
is the bucket we want to move.
- While we still need to move the bucket:
- If we're the target, something strange happened last iteration or we
just re-hashed to the same location. We're done.
- If the target is empty or deleted, just move the bucket. We're done.
- If the target is a re-hashed full bucket, we probe by double-hashing
our hash as usual. Henceforth, we move our target for the next
iteration.
- If the target is an old full bucket, we swap the target and to-move
buckets. Therefore, the bucket to move is a the correct location and the
former target, which still needs to find a new place, is now in the
bucket to move. So we can just continue with the loop; the target is
re-obtained from the bucket to move. This happens for each and every
bucket, though some buckets are "coincidentally" moved before their
point of iteration is reached. Either way, this guarantees full in-place
movement (even without stack storage) and therefore space complexity of
O(1). Time complexity is amortized O(2n) asssuming a good hashing
function.
This leads to a performance improvement of ~30% on the benchmark
introduced with the last commit.
Co-authored-by: Hendiadyoin1 <leon.a@serenityos.org>
The hash table buckets had three different state booleans that are in
fact exclusive. In preparation for further states, this commit
consolidates them into one enum. This has the added benefit on not
relying on the compiler's boolean packing anymore; we definitely now
only need one byte for the bucket state.
Currently this can parse XML and resolve external resources/references,
and read a DTD (but not apply or verify its rules).
That's good enough for _most_ XHTML documents as the HTML 5 spec
enforces its own rules about document well-formedness, and does not make
use of XML DTDs (aside from a list of predefined entities).
An accompanying `xml` utility is provided that can read and dump XML
documents, and can also run the XML conformance test suite.
This is an enum-like type that works with arbitrary sized storage > u64,
which is the limit for a regular enum class - which limits it to 64
members when needing bit field behavior.
Co-authored-by: Ali Mohammad Pur <mpfard@serenityos.org>
Previously, case-insensitively searching the haystack "Go Go Back" for
the needle "Go Back" would return false:
1. Match the first three characters. "Go ".
2. Notice that 'G' and 'B' don't match.
3. Skip ahead 3 characters, plus 1 for the outer for-loop.
4. Now, the haystack is effectively "o Back", so the match fails.
Reducing the skip by 1 fixes this issue. I'm not 100% convinced this
fixes all cases, but I haven't been able to find any cases where it
doesn't work now. :^)
Day and month name constants are defined in numerous places. This
pulls them together into a single place and eliminates the
duplication. It also ensures they are `constexpr`.
Even though the StringView(char*, size_t) constructor only runs its
overflow check when evaluated in a runtime context, the code generated
here could prevent the compiler from optimizing invocations from the
StringView user-defined literal (verified on Compiler Explorer).
This changes the user-defined literal declaration to be consteval to
ensure it is evaluated at compile time.
C++20 provides the `requires` clause which simplifies the ability to
limit overload resolution. Prefer it over `EnableIf`
With all uses of `EnableIf` being removed, also remove the
implementation so future devs are not tempted.
Since the allocated memory is going to be zeroed immediately anyway,
let's avoid redundantly scrubbing it with MALLOC_SCRUB_BYTE just before
that.
The latest versions of gcc and Clang can automatically do this malloc +
memset -> calloc optimization, but I've seen a couple of places where it
failed to be done.
This commit also adds a naive kcalloc function to the kernel that
doesn't (yet) eliminate the redundancy like the userland does.
Previously, if you forgot to set a key on a SourceGenerator, you would
get this less-than-helpful error message:
> Generate_CSS_MediaFeatureID_cpp:
/home/sam/serenity/Meta/Lagom/../../AK/Optional.h:174: T
AK::Optional<T>::release_value() [with T = AK::String]: Assertion
`m_has_value' failed.
Now, it instead looks like this:
> No key named `name:titlecase` set on SourceGenerator
Generate_CSS_MediaFeatureID_cpp:
/home/sam/serenity/Meta/Lagom/../../AK/SourceGenerator.h:44:
AK::String AK::SourceGenerator::get(AK::StringView) const: Assertion
`false' failed.
This is the IPv6 counter part to the IPv4Address class and implements
parsing strings into a in6_addr and formatting one as a string. It
supports the address compression scheme as well as IPv4 mapped
addresses.
If the utilization of a HashTable (size vs capacity) goes below 20%,
we'll now shrink the table down to capacity = (size * 2).
This fixes an issue where tables would grow infinitely when inserting
and removing keys repeatedly. Basically, we would accumulate deleted
buckets with nothing reclaiming them, and eventually deciding that we
needed to grow the table (because we grow if used+deleted > limit!)
I found this because HashTable iteration was taking a suspicious amount
of time in Core::EventLoop::get_next_timer_expiration(). Turns out the
timer table kept growing in capacity over time. That made iteration
slower and slower since HashTable iterators visit every bucket.
Just walk the table from start to finish, deleting buckets as we go.
This removes the need for remove() to return an iterator, which is
preventing me from implementing hash table auto-shrinking.
This will be caught by new test cases: when the initial chunk is empty,
a dereference before calling operator++ on the iterator will crash as
the initial chunk's size is never checked.
Previously, although we were taking a moved chunk, we still copied it
into our chunk list. This makes DisjointChunk compatible with containers
that don't have a copy constructor but a move constructor.
This adds a NOLINT directive to the definition of the TODO() macro.
clang-tidy wants the assert replaced with a static_assert, since the
macro simply resolves to assert(false). This is obviously nonsensical,
since we want the code to still compile even with TODO().
The same fix has already been implemented for VERIFY_NOT_REACHED().
While the code did already VERIFY that the ErrorOr holds a value, this
was done by Variant, so the error message was just that `has<T>()` is
false. This is less helpful than I would like, especially if backtraces
are not working and this is all you have to go on. Adding this extra
VERIFY means the assertion message (`!is_error()`) is easier to
understand.