Previously, only the first block in a chain of blocks would be
overwritten while all subsequent blocks would be appended to the heap.
Now we make sure to reuse all existing blocks in the chain.
Previously, `Heap` would store serialized data in blocks of 1024 bytes
regardless of the actual length. Data longer than 1024 bytes was
silently truncated causing database corruption.
This changes the heap storage to prefix every block with two new fields:
the total data size in bytes, and the next block to retrieve if the data
is longer than what can be stored inside a single block. By chaining
blocks together, we can store arbitrary amounts of data without needing
to change anything of the logic in the rest of LibSQL.
As part of these changes, the "free list" is also removed from the heap
awaiting an actual implementation: it was never used.
Note that this bumps the database version from 3 to 4, and as such
invalidates (deletes) any database opened with LibSQL that is not
version 4.
We are performing a lot of checks on pointers that are performed again
immediately afterwards because of a dereference. This removes the
redundant `VERIFY`s and simplifies a couple others.
Currently, integers are stored in LibSQL as 32-bit signed integers, even
if the provided type is unsigned. This resulted in a series of unchecked
unsigned-to-signed conversions, and prevented storing 64-bit values.
Further, mathematical operations were performed without similar checks,
and without checking for overflow.
This changes SQL::Value to behave like SQLite for INTEGER types. In
SQLite, the INTEGER type does not imply a size or signedness of the
underlying type. Instead, SQLite determines on-the-fly what type is
needed as values are created and updated.
To do so, the SQL::Value variant can now hold an i64 or u64 integer. If
a specific type is requested, invalid conversions are now explictly an
error (e.g. converting a stored -1 to a u64 will fail). When binary
mathematical operations are performed, we now try to coerce the RHS
value to a type that works with the LHS value, failing the operation if
that isn't possible. Any overflow or invalid operation (e.g. bitshifting
a 64-bit value by more than 64 bytes) is an error.
In the long run, this is obviously a bad way to handle version changes
to the SQL database files. We will want to migrate old databases to new
formats. Until we figure out a good way to do that, wipe old databases
so that we don't crash trying to read incompatible data.
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
This allows surrounding IO operations with TRY, making the code much
easier to reason about. This also replaces surrounding dbgln_if
statements to use "{:hex-dump}" instead of individually writing out
bytes.
The handling of filesystem level errors was basically non-existing or
consisting of `VERIFY_NOT_REACHED` assertions. Addressed this by
* Adding `open` methods to `Heap` and `Database` which return errors.
* Changing the interface of methods of these classes and clients
downstream to propagate these errors.
The constructors of `Heap` and `Database` don't open the underlying
filesystem file anymore.
The SQL statement handlers return an `SQLErrorCode::InternalError`
error code if an error comes back from the lower levels. Note that some
of these errors are things like duplicate index entry errors that should
be caught before the SQL layer attempts to actually update the database.
Added tests to catch attempts to open weird or non-existent files as
databases.
Finally, in between me writing this patch and submitting the PR the
AK::Result<Foo, Bar> template got deprecated in favour of ErrorOr<Foo>.
This resulted in more busywork.
Derivatives of Core::Object should be constructed through
ClassName::construct(), to avoid handling ref-counted objects with
refcount zero. Fixing the visibility means that misuses like this are
more difficult.
Classes reading and writing to the data heap would communicate directly
with the Heap object, and transfer ByteBuffers back and forth with it.
This makes things like caching and locking hard. Therefore all data
persistence activity will be funneled through a Serializer object which
in turn submits it to the Heap.
Introducing this unfortunately resulted in a huge amount of churn, in
which a number of smaller refactorings got caught up as well.
Unfortunately this patch is quite large.
The main functionality included are a BTree index implementation and
the Heap class which manages persistent storage.
Also included are a Key subclass of the Tuple class, which is a
specialization for index key tuples. This "dragged in" the Meta layer,
which has classes defining SQL objects like tables and indexes.