Commit graph

12 commits

Author SHA1 Message Date
Andreas Kling
a3e82eaad3 AK: Introduce the new String, replacement for DeprecatedString
DeprecatedString (formerly String) has been with us since the start,
and it has served us well. However, it has a number of shortcomings
that I'd like to address.

Some of these issues are hard if not impossible to solve incrementally
inside of DeprecatedString, so instead of doing that, let's build a new
String class and then incrementally move over to it instead.

Problems in DeprecatedString:

- It assumes string allocation never fails. This makes it impossible
  to use in allocation-sensitive contexts, and is the reason we had to
  ban DeprecatedString from the kernel entirely.

- The awkward null state. DeprecatedString can be null. It's different
  from the empty state, although null strings are considered empty.
  All code is immediately nicer when using Optional<DeprecatedString>
  but DeprecatedString came before Optional, which is how we ended up
  like this.

- The encoding of the underlying data is ambiguous. For the most part,
  we use it as if it's always UTF-8, but there have been cases where
  we pass around strings in other encodings (e.g ISO8859-1)

- operator[] and length() are used to iterate over DeprecatedString one
  byte at a time. This is done all over the codebase, and will *not*
  give the right results unless the string is all ASCII.

How we solve these issues in the new String:

- Functions that may allocate now return ErrorOr<String> so that ENOMEM
  errors can be passed to the caller.

- String has no null state. Use Optional<String> when needed.

- String is always UTF-8. This is validated when constructing a String.
  We may need to add a bypass for this in the future, for cases where
  you have a known-good string, but for now: validate all the things!

- There is no operator[] or length(). You can get the underlying data
  with bytes(), but for iterating over code points, you should be using
  an UTF-8 iterator.

Furthermore, it has two nifty new features:

- String implements a small string optimization (SSO) for strings that
  can fit entirely within a pointer. This means up to 3 bytes on 32-bit
  platforms, and 7 bytes on 64-bit platforms. Such small strings will
  not be heap-allocated.

- String can create substrings without making a deep copy of the
  substring. Instead, the superstring gets +1 refcount from the
  substring, and it acts like a view into the superstring. To make
  substrings like this, use the substring_with_shared_superstring() API.

One caveat:

- String does not guarantee that the underlying data is null-terminated
  like DeprecatedString does today. While this was nifty in a handful of
  places where we were calling C functions, it did stand in the way of
  shared-superstring substrings.
2022-12-06 15:21:26 +01:00
Linus Groh
6e19ab2bbc AK+Everywhere: Rename String to DeprecatedString
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
2022-12-06 08:54:33 +01:00
Dan Klishch
fdc53a5995 AK: Add framework for a unified floating point to string conversion
Currently, the floating point to string conversion is implemented
several times across the codebase. This commit provides a pretty
low-level function to unify all of such conversions. It converts the
given double to a fixed point decimal satisfying a few correctness
criteria.
2022-11-03 20:17:09 -06:00
davidot
53b7f5e6a1 AK: Add an exact and fast floating point parsing algorithm
This is based on the paper by Daniel Lemire called
"Number parsing at a Gigabyte per second", currently available at
https://arxiv.org/abs/2101.11408
An implementation can be found at
https://github.com/fastfloat/fast_float

To support both strtod like methods and String::to_double we have two
different APIs. The parse_first_floating_point gives back both the
result, next character to read and the error/out of range status.
Out of range here means we rounded to infinity 0.

The other API, parse_floating_point_completely, will return a floating
point only if the given character range contains just the floating point
and nothing else. This can be much faster as we can skip actually
computing the value if we notice we did not parse the whole range.

Both of these APIs support a very lenient format to be usable in as many
places as possible. Also it does not check for "named" values like
"nan", "inf", "NAN" etc. Because this can be different for every usage.

For integers and small values this new method is not faster and often
even a tiny bit slower than the current strtod implementation. However
the strtod implementation is wrong for a lot of values and has a much
less predictable running time.

For correctness this method was tested against known string -> double
datasets from https://github.com/nigeltao/parse-number-fxx-test-data
This method gives 100% accuracy.
The old strtod gave an incorrect value in over 50% of the numbers
tested.
2022-10-23 15:48:45 +02:00
Andrew Kaster
1ca48a2aec AK+Userland: Use a CMake variable for AK_SOURCES instead of GLOB
This lets us remove a glob pattern from LibC, the DynamicLoader, and,
later, Lagom. The Kernel already has its own separate list of AK files
that it wants, which is only a subset of all AK files.
2022-10-16 16:36:39 +02:00
Gunnar Beutner
0dd03413d6 Meta: Add support for declaring components
Components are a group of build targets that can be built and installed
separately. Whether a component should be built can be configured with
CMake arguments: -DBUILD_<NAME>=ON|OFF, where <NAME> is the name of the
component (in all caps).

Components can be marked as REQUIRED if they're necessary for a
minimally functional base system or they can be marked as RECOMMENDED
if they're not strictly necessary but are useful for most users.

A component can have an optional description which isn't used by the
build system but may be useful for a configuration UI.

Components specify the TARGETS which should be built when the component
is enabled. They can also specify other components which they depend on
(with DEPENDS).

This also adds the BUILD_EVERYTHING CMake variable which lets the user
build all optional components. For now this defaults to ON to make the
transition to the components-based build system easier.

The list of components is exported as an INI file in the build directory
(e.g. Build/i686/components.ini).

Fixes #8048.
2021-06-17 11:03:51 +02:00
Brian Gianforcaro
31081e8ebf AK: Remove now unused InlineLinkedList class
All usages of AK::InlineLinkedList have been converted to
AK::IntrusiveList. So it's time to retire our old friend.

Note: The empty white space change in AK/CMakeLists.txt is to
force CMake to re-glob the header files in the AK directory so
incremental build will work when folks git pull this change locally.

Otherwise they'll get errors, because CMake will attempt to install
a file which no longer exists.
2021-06-16 10:40:01 +02:00
Brian Gianforcaro
67322b0702 Tests: Move AK tests to Tests/AK 2021-05-06 17:54:28 +02:00
Andrew Kaster
e787738c24 Meta: Build AK and LibRegex tests in Lagom and for Serenity
These tests were never built for the serenity target. Move their Lagom
build steps to the Lagom CMakeLists.txt, and add serenity build steps
for them. Also, fix the build errors when building them with the
serenity cross-compiler :^)
2021-02-28 18:19:37 +01:00
Lenny Maiorani
e4ce485309 CMake: Decouple cmake utility functions from top-level CMakeLists.txt
Problem:
- These utility functions are only used in `AK`, but are being defined
  in the top-level. This clutters the top-level.

Solution:
- Move the utility functions to `Meta/CMake/utils.cmake` and include
  where needed.
- Also, move `all_the_debug_macros.cmake` into `Meta/CMake` directory
  to consolidate the location of `*.cmake` script files.
2020-12-24 11:02:04 +01:00
Itamar
310063fed8 Meta: Install source files at /usr/src/serenity 2020-08-15 15:06:35 +02:00
Paul Redmond
4d4e578edf Ports: Fix CMake-based ports
The SDL port failed to build because the CMake toolchain filed pointed
to the old root. Now the toolchain file assumes that the Root is in
Build/Root.

Additionally, the AK/ and Kernel/ headers need to be installed in the
root too.
2020-05-29 20:21:10 +02:00