Commit graph

81 commits

Author SHA1 Message Date
Hendiadyoin1
5bf84a5b0e AK: Zero previous pointer *after* fixing the insertion list in HashTable 2022-06-23 20:25:12 +03:00
Idan Horowitz
eb02425ef9 AK: Clear the previous and next pointers of deleted HashTable buckets
Usually the values of the previous and next pointers of deleted buckets
are never used, as they're not part of the main ordered bucket chain,
but if an in-place rehashing is done, which results in the bucket being
turned into a free bucket, the stale pointers will remain, at which
point any item that is inserted into said free-bucket will have either
a stale previous pointer if the HashTable was empty on insertion, or a
stale next pointer, resulting in undefined behaviour.

This commit also includes a new HashMap test that reproduces this issue
2022-06-22 21:53:13 +02:00
Vitaly Dyachkov
a0a4d169f4 AK+LibGUI: Pass predicate to *_matching() methods by const reference 2022-05-08 17:02:00 +02:00
Idan Horowitz
086969277e Everywhere: Run clang-format 2022-04-01 21:24:45 +01:00
kleines Filmröllchen
09a12247fb AK: Use bucket states with special bit patterns in HashTable
This simplifies some of the bucket state handling code, as there's now
an easy way of checking the basic category of bucket state.
2022-03-31 12:06:13 +02:00
kleines Filmröllchen
49d29c8298 AK: Rehash HashTable in-place instead of shrinking
As seen on TV, HashTable can get "thrashed", i.e. it has a bunch of
deleted buckets that count towards the load factor. This means that hash
tables which are large enough for their contents need to be resized.
This was fixed in 9d8da16 with a workaround that shrinks the HashTable
back down in these cases, as after the resize and re-hash the load
factor is very low again. However, that's not a good solution. If you
insert and remove repeatedly around a size boundary, you might get
frequent resizes, which involve frequent re-allocations.

The new solution is an in-place rehashing algorithm that I came up with.
(Do complain to me, I'm at fault.) Basically, it iterates the buckets
and re-hashes the used buckets while marking the deleted slots empty.
The issue arises with collisions in the re-hash. For this reason, there
are two kinds of used buckets during the re-hashing: the normal "used"
buckets, which are old and are treated as free space, and the
"re-hashed" buckets, which are new and treated as used space, i.e. they
trigger probing. Therefore, the procedure for relocating a bucket's
contents is as follows:
- Locate the "real" bucket of the contents with the hash. That bucket is
  the starting point for the target bucket, and the current (old) bucket
  is the bucket we want to move.
- While we still need to move the bucket:
  - If we're the target, something strange happened last iteration or we
    just re-hashed to the same location. We're done.
  - If the target is empty or deleted, just move the bucket. We're done.
  - If the target is a re-hashed full bucket, we probe by double-hashing
    our hash as usual. Henceforth, we move our target for the next
    iteration.
  - If the target is an old full bucket, we swap the target and to-move
buckets. Therefore, the bucket to move is a the correct location and the
former target, which still needs to find a new place, is now in the
bucket to move. So we can just continue with the loop; the target is
re-obtained from the bucket to move. This happens for each and every
bucket, though some buckets are "coincidentally" moved before their
point of iteration is reached. Either way, this guarantees full in-place
movement (even without stack storage) and therefore space complexity of
O(1). Time complexity is amortized O(2n) asssuming a good hashing
function.

This leads to a performance improvement of ~30% on the benchmark
introduced with the last commit.

Co-authored-by: Hendiadyoin1 <leon.a@serenityos.org>
2022-03-31 12:06:13 +02:00
kleines Filmröllchen
bcb8937898 AK: Merge HashTable bucket state into one enum
The hash table buckets had three different state booleans that are in
fact exclusive. In preparation for further states, this commit
consolidates them into one enum. This has the added benefit on not
relying on the compiler's boolean packing anymore; we definitely now
only need one byte for the bucket state.
2022-03-31 12:06:13 +02:00
Daniel Bertalan
e3eb68dd58 AK+Kernel: Avoid double memory clearing of HashTable buckets
Since the allocated memory is going to be zeroed immediately anyway,
let's avoid redundantly scrubbing it with MALLOC_SCRUB_BYTE just before
that.

The latest versions of gcc and Clang can automatically do this malloc +
memset -> calloc optimization, but I've seen a couple of places where it
failed to be done.

This commit also adds a naive kcalloc function to the kernel that
doesn't (yet) eliminate the redundancy like the userland does.
2022-03-15 11:56:46 +01:00
Andreas Kling
9d8da1697e AK: Automatically shrink HashTable when removing entries
If the utilization of a HashTable (size vs capacity) goes below 20%,
we'll now shrink the table down to capacity = (size * 2).

This fixes an issue where tables would grow infinitely when inserting
and removing keys repeatedly. Basically, we would accumulate deleted
buckets with nothing reclaiming them, and eventually deciding that we
needed to grow the table (because we grow if used+deleted > limit!)

I found this because HashTable iteration was taking a suspicious amount
of time in Core::EventLoop::get_next_timer_expiration(). Turns out the
timer table kept growing in capacity over time. That made iteration
slower and slower since HashTable iterators visit every bucket.
2022-03-07 00:08:22 +01:00
Andreas Kling
eb829924da AK: Remove return value from HashTable::remove() and HashMap::remove()
This was only used by remove_all_matching(), where it's no longer used.
2022-03-07 00:08:22 +01:00
Andreas Kling
623bdd8b6a AK: Simplify HashTable::remove_all_matching()
Just walk the table from start to finish, deleting buckets as we go.
This removes the need for remove() to return an iterator, which is
preventing me from implementing hash table auto-shrinking.
2022-03-07 00:08:22 +01:00
Idan Horowitz
9b0d90a71d AK: Support using custom comparison operations for hash compatible keys 2022-01-29 23:01:23 +02:00
James Puleo
10b25d2a57 AK: Implement HashTable::try_ensure_capacity, as used in HashMap
This was used in `HashMap::try_ensure_capacity`, but was missing from
`HashTable`s implementation. No one had used
`HashMap::try_ensure_capacity` before so it went unnoticed!
2022-01-25 09:17:22 +01:00
Andreas Kling
5279a04c78 AK: Make Hash{Map,Table}::remove_all_matching() return removal success
These functions now return whether one or more entries were removed.
2022-01-05 18:57:14 +01:00
Andreas Kling
54cf42fac1 AK: Add HashTable::remove_all_matching(predicate)
This removes all matching entries from a table in a single pass.
2022-01-05 18:57:14 +01:00
Hendiadyoin1
c673b7220a AK: Enable fast path for removal by hash-compatible key in HashMap/Table 2021-12-15 23:35:14 -08:00
Hendiadyoin1
d50360f5dd AK: Allow hash-compatible key types in Hash[Table|Map] lookup
This will allow us to avoid some potentially expensive type conversion
during lookup, like form String to StringView, which would allocate
memory otherwise.
2021-12-15 13:09:49 +03:30
Andrew Kaster
762b92c650 AK: Resolve clang-tidy readability-qualified-auto warnings
... In files included by Kernel/Process.cpp and Kernel/Thread.cpp
2021-11-14 22:52:35 +01:00
Andrew Kaster
22feb9d47b AK: Resolve clang-tidy readability-bool-conversion warnings
... In files included by Kernel/Process.cpp and Kernel/Thread.cpp
2021-11-14 22:52:35 +01:00
Hendiadyoin1
f76241914c AK: Allow to clear HashTables/Maps with capacity 2021-11-11 09:19:17 +01:00
Andreas Kling
9d1f238450 AK: Make HashTable and HashMap try_* functions return ErrorOr<T>
This allows us to use TRY() and MUST() with them.
2021-11-11 01:27:46 +01:00
Ben Wiederhake
f8d7b4daea AK: Add missing headers
Example failure:

IDAllocator.h only pulls in AK/Hashtable.h, so any compilation unit that
includes AK/IDAllocator.h without including AK/Traits.h before it used
to be doomed to fail with the cryptic error message "In instantiation of
'AK::HashTable<T, TraitsForT, IsOrdered>::Iterator AK::HashTable<T,
TraitsForT, IsOrdered>::find(const T&) [with T = int; TraitsForT =
AK::Traits: incomplete type 'AK::Traits<int>' used in nested name
specifier".
2021-10-06 23:52:40 +01:00
Hendiadyoin1
93cf01ad7d AK: Mark HashTable::size_in_bytes() as constexpr 2021-09-10 14:33:53 +00:00
Hediadyoin1
1aa527f5b6 AK: Add OOM safe interface to HashTable/Map
This adds a new HashSetResult only returned by try_set, to signal
allocation failure during setting.
2021-09-10 14:33:53 +00:00
Andreas Kling
6ad427993a Everywhere: Behaviour => Behavior 2021-09-07 13:53:14 +02:00
Andreas Kling
a940a8bf37 AK: Remove unused private HashTable::lookup_for_reading() 2021-07-21 18:18:51 +02:00
Andreas Kling
f65b039c44 AK: Sprinkle [[nodiscard]] on HashMap and HashTable 2021-07-21 18:18:29 +02:00
ngc6302h
213e2af281 HashTable: Rename finders with a more accurate and self-descripting name 2021-07-13 17:31:00 +02:00
Andreas Kling
3aabace9f5 AK: Use kfree_sized() in AK::HashTable 2021-07-11 14:14:51 +02:00
Hediadyoin1
4a81c79909 AK: Add Ordering support to HashTable and HashMap
Adds a IsOrdered flag to Hashtable and HashMap, which allows iteration
in insertion order
2021-06-15 22:16:55 +02:00
Idan Horowitz
71c54198fa AK: Allow changing the HashTable behaviour for sets on existing entries
Specifically, replacing the existing entry or just keeping it and
canceling the set.
2021-06-09 11:48:04 +01:00
Andreas Kling
c584421592 AK: Make HashTable::operator=(HashTable&&) clear the moved-from table
This is consistent with how other AK containers behave when moved from.
2021-05-30 14:34:32 +02:00
Gunnar Beutner
f89e8fb71a AK+LibC: Implement malloc_good_size() and use it for Vector/HashTable
This implements the macOS API malloc_good_size() which returns the
true allocation size for a given requested allocation size. This
allows us to make use of all the available memory in a malloc chunk.

For example, for a malloc request of 35 bytes our malloc would
internally use a chunk of size 64, however the remaining 29 bytes
would be unused.

Knowing the true allocation size allows us to request more usable
memory that would otherwise be wasted and make that available for
Vector, HashTable and potentially other callers in the future.
2021-05-15 16:30:14 +02:00
Brian Gianforcaro
1682f0b760 Everything: Move to SPDX license identifiers in all files.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.

See: https://spdx.dev/resources/use/#identifiers

This was done with the `ambr` search and replace tool.

 ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
2021-04-22 11:22:27 +02:00
Brian Gianforcaro
0593fa4dcb AK: Annotate HashTable functions as [[nodiscard]] 2021-04-11 12:50:33 +02:00
Brian Gianforcaro
15b94ec1fd AK: Make HashTable with capacity constructor explicit 2021-04-11 12:50:33 +02:00
thislooksfun
e55b8712d4 AK: Inline HashTable writing bucket lookup
The old approach was more complex and also had a very bad edge case
with lots of collisions. This approach eliminates that possiblility.
It also makes both reading and writing lookups a little bit faster.
2021-04-02 12:54:54 +02:00
thislooksfun
509eb10df4 AK: Inline the bucket index calculation
The result of the modulo is only used in the array index, so why
make the code more complex by calculating it in two different places?
2021-04-02 12:54:54 +02:00
Andreas Kling
ef1e5db1d0 Everywhere: Remove klog(), dbg() and purge all LogStream usage :^)
Good-bye LogStream. Long live AK::Format!
2021-03-12 17:29:37 +01:00
Andreas Kling
5d180d1f99 Everywhere: Rename ASSERT => VERIFY
(...and ASSERT_NOT_REACHED => VERIFY_NOT_REACHED)

Since all of these checks are done in release builds as well,
let's rename them to VERIFY to prevent confusion, as everyone is
used to assertions being compiled out in release.

We can introduce a new ASSERT macro that is specifically for debug
checks, but I'm doing this wholesale conversion first since we've
accumulated thousands of these already, and it's not immediately
obvious which ones are suitable for ASSERT.
2021-02-23 20:56:54 +01:00
Lenny Maiorani
537bedbf38 HashTable: Correctly pass args to set
Problem:
- Using regular functions rather than function templates results in
  the arguments not being deduced. This then requires the same
  function to be written multiple times and for `move` to be used
  rather than `forward`.

Solution:
- Collapse multiple function overloads to a single function template
  with a deduced argument. This allows the argument to be a forwarding
  reference and bind to either an l-value or r-value and forward the
  value.
2021-01-31 10:48:12 +01:00
Lenny Maiorani
e6f907a155 AK: Simplify constructors and conversions from nullptr_t
Problem:
- Many constructors are defined as `{}` rather than using the ` =
  default` compiler-provided constructor.
- Some types provide an implicit conversion operator from `nullptr_t`
  instead of requiring the caller to default construct. This violates
  the C++ Core Guidelines suggestion to declare single-argument
  constructors explicit
  (https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#c46-by-default-declare-single-argument-constructors-explicit).

Solution:
- Change default constructors to use the compiler-provided default
  constructor.
- Remove implicit conversion operators from `nullptr_t` and change
  usage to enforce type consistency without conversion.
2021-01-12 09:11:45 +01:00
Dano Perniš
3efd4c105f AK: Reduce memory writes in HashTable destructor 2020-10-18 14:44:23 +02:00
Dano Perniš
d30c559774 AK: Implement HashTable assignment in terms of swap 2020-10-18 14:44:23 +02:00
Dano Perniš
7f3f63dd92 AK: Provide swap() for HashTable 2020-10-18 14:44:23 +02:00
Andreas Kling
7ad8bb5be6 AK: Tune HashTable load factor
Double the capacity when used+deleted buckets crosses 60% of capacity.
This appears to be a sweet spot for performance based on some ad-hoc
testing with test-js. :^)
2020-10-16 08:47:10 +02:00
Andreas Kling
4e50079f36 AK: Redesign HashTable to use closed hashing
Instead of each hash bucket being a SinglyLinkedList, switch to using
closed hashing (open addressing). Buckets are chained together via
double hashing (hashing the hash until we find an unused bucket.)

This greatly reduces malloc traffic, since each added element no longer
allocates a new linked list node.

Appears performance neutral on test-js. Can definitely be tuned and
could use proper management of load factor, etc.
2020-10-15 23:49:53 +02:00
Muhammad Zahalqa
a68650a7b4
AK: HashTable add a constructor that allows preallocation of capacity + Use in CppLexer. (#3147)
1. Add general utility to get array number of elements.
2. Add Needed API to AK::HashTable
3. Refactor CppLexer initialization
2020-08-16 11:04:00 +02:00
Tom
dadd53e4f2 AK: HashTable/HashMap return whether action was performed for set/remove
This allows performing an action based on whether something
was actually added or removed without having to look it up
prior to calling set() or remove().
2020-07-09 21:58:07 +02:00
William McPherson
121e7306c3 AK: Expose SinglyLinkedListIterator constructor
This commit replaces SinglyLinkedListIterator::universal_end() with an
empty SinglyLinkedListIterator(). Piano needs this in order to
initialize a member array of iterators without 84 lines of
universal_end().
2020-02-27 10:21:13 +01:00