Problem:
- The hash functions have no associated tests, so there is nothing
binding their behavior.
Solution:
- Bind the hash function behavior by adding tests.
- Use the existing behavior as "correct".
Problem:
- `constexpr` functions are decorated with the `inline` specifier
keyword. This is redundant because `constexpr` functions are
implicitly `inline`.
- [dcl.constexpr], §7.1.5/2 in the C++11 standard): "constexpr
functions and constexpr constructors are implicitly inline (7.1.2)".
Solution:
- Remove the redundant `inline` keyword.
Problem:
- `Checked` is not `constexpr`-aware.
Solution:
- Decorate member functions with `constexpr` keyword.
- Add tests to ensure the functionality where possible.
Problem:
- Compiler-generated functions are being defined which results in
extra code to maintain.
Solution:
- Switch to compiler-generated default functions for default
construction, copy assignment, move assignment, copy construction
and move construction.
Problem:
- There is no test which guarantees the CircularQueue does not
construct any objects of the value type. The goal is to have
uninitialized memory which can be used.
Solution:
- Add a test requiring that the constructor of the value type is never
called.
This commit also removes a few functions like raw_out and vwarn. If we
want to write raw output, we can do this as follows:
out("{}", "Hello, World!");
The vout stuff isn't really public API anyways, so no need for another
vwarn.
Problem:
- It is not possible to perform a binary search at compile-time
because `binary_search` is not `constexpr`-aware.
Solution:
- Add `constexpr` support.
This implements the transmit time suggestion in (abandoned?)
draft-ietf-ntp-data-minimization. (The other suggestions were already
implemented as far as I can tell.)
Problem:
- `Span` is not `constexpr` aware.
Solution:
- Add `constexpr` support for all parts that do not require
`reinterpret_cast`.
- Modify tests which use the `constexpr` functions.
Double the capacity when used+deleted buckets crosses 60% of capacity.
This appears to be a sweet spot for performance based on some ad-hoc
testing with test-js. :^)
Instead of each hash bucket being a SinglyLinkedList, switch to using
closed hashing (open addressing). Buckets are chained together via
double hashing (hashing the hash until we find an unused bucket.)
This greatly reduces malloc traffic, since each added element no longer
allocates a new linked list node.
Appears performance neutral on test-js. Can definitely be tuned and
could use proper management of load factor, etc.
There is no portable way to forward declare abort because the libc
implementations disagree on the signature.
Originally, I added a __portable_abort function with a "portable"
signature which just called abort. But I really don't like it and just
including <stdlib.h> is simpler.
Note that the headers we include in <AK/TestSuite.h> are no longer
commutative now, we have to include <stdlib.h> before anything else.
Problem:
- Output of decode and encode grow as the decode and encode
happen. This is inefficient because a large size will require many
reallocations.
- `const` qualifiers are missing on variables which are not intended
to change.
Solution:
- Since the size of the decoded or encoded message is known prior to
starting, calculate the size and set the output to that size
immediately. All appends will not incur the reallocation overhead.
- Add `const` qualifiers to show intent.
* AK: Add formatter for JsonValue.
* Inspector: Use new format functions.
* Profiler: Use new format functions.
* UserspaceEmulator: Use new format functions.
Problem:
- The Base64 alphabet and lookup table are initialized at
run-time. This results in an initial start-up cost as well as a
boolean evaluation and branch every time the function is called.
Solution:
- Provide `constexpr` functions which initialize the alphabet and
lookup table at compile-time. These can be called and assigned to a
`constexpr` variable so that there is no run-time cost associated
with the initialization or lookup.
We are adding the process name as prefix and a newline as suffix to any
message written to debug. Thus, the following doesn't make any sense:
for (u8 byte : bytes)
dbg("{:02x} ", byte);
dbgln();
Which function call would put the prefix? This doesn't make any sense,
thus these functions must go.
The example above could be converted to:
StringBuilder builder;
for (u8 byte : bytes)
builder.appendff("{:02x} ", byte);
dbgln("{}", builder.build());
Problem:
- Constructors and conversion operators are not `constexpr`,
but they can be.
- `constexpr` is needed here so that other classes can add `constexpr`
evaluation.
Solution:
- Add the `constexpr` keyword to the constructors and
conversion operators.
- Add `static_assert` tests which ensure the capability works.
Problem:
- m_data is being passed to the constructor of the parent class before
it is initialized. This is not really a problem because the compiler
knows the location and it is only a span being constructed, but it
triggers a warning in clang for use-before-init.
Solution:
- Initialize using a default constructed array and then overwrite it
inside the constructor after the member is initialized.
Formatter is specialized in the header file. The definition in the
implementation file is extraneous and has no effect. Simply removing
it so that there is no confusion.
String literals are just pointers to a constant character. It should be
possible to format them as such. (The default is to print them as
strings still.)
When we write the format specifier '{:#08x}' we are asking for eight
significant digits, zero padding and the prefix '0x'.
However, previously we got only six significant digits because the
prefix counted towards the width. (The number '8' here is the total
width and not the number of significant digits.)
Both fmtlib and printf shared this behaviour. However, I am introducing
a special case here because when we do zero padding we really only care
about the digits and not the width.
Notice that zero padding is a special case anyways, because zero padding
goes after the prefix as opposed to any other padding which goes before
it.
This would previously fail at runtime, and it would have zero indication
of what exactly went wrong.
Also adds `AK::DependentFalse<Ts...>', which is a...dependent false.
In the future all (normal) output should be written by any of the
following functions:
out (currently called new_out)
outln
dbg (currently called new_dbg)
dbgln
warn (currently called new_warn)
warnln
However, there are still a ton of uses of the old out/warn/dbg in the
code base so the new functions are called new_out/new_warn/new_dbg. I am
going to rename them as soon as all the other usages are gone (this
might take a while.)
I also added raw_out/raw_dbg/raw_warn which don't do any escaping,
this should be useful if no formatting is required and if the input
contains tons of curly braces. (I am not entirely sure if this function
will stay, but I am adding it for now.)
If we're sharing buffers, we only want to share trivial structures
as anything else could potentially share internal pointers, which
most likely is going to cause problems due to different address
spaces.
Fix the GUI::SystemTheme structure, which was not trivial, which
is now caught at compile time.
Fixes#3650
The problem with our test suite is that it can't detect if a test
failed. When a test fails we simply write 'FAIL ...' to stderr and move
on.
Previously, the test suite would list all tests as passing regardless
how many assertions failed. In the future it might be smart to implement
this properly but test suites for C++ are always hard to do nicely.
(Because C++ execution isn't meant to be embedded.)
It's now save to pass a signed integer as parameter and then use it as
replacement field (previously, this would just cast it to size_t which
would be bad.)
This finally takes care of the kind-of excessive boilerplate code that were the
ctype adapters. On the other hand, I had to link `LibC/ctype.cpp` to the Kernel
(for `AK/JsonParser.cpp` and `AK/Format.cpp`). The previous commit actually makes
sense now: the `string.h` includes in `ctype.{h,cpp}` would require to link more LibC
stuff to the Kernel when it only needs the `_ctype_` array of `ctype.cpp`, and there
wasn't any string stuff used in ctype.
Instead of all this I could have put static derivatives of `is_any_of()` in the
concerned AK files, however that would have meant more boilerplate and workarounds;
so I went for the Kernel approach.
Since commit 1ec59f28ce turns the ctype macros
into functions we can now feed them directly to a GenericLexer! This will lead to
removing the ctype adapters that were kind-of excessive boilerplate, but needed as
the Kernel doesn't compile with the LibC.
I put this into the <AK/PrintfImplementation.h> header in the hope that
it could be re-used by the printf implementation. That would not be
super trivial though, so I am not doing that now.
The `consume_quoted_string()` can now take an escape character. This allows it
(for example) to capture a string's enclosing quotes. The escape character is
optional by default.
You can also consume and unescape a quoted string with the eponymous method
`consume_and_unescape_string()`. It takes an escape character as parameter
(backslash by default). It builds a String in which common escape sequences
get... unescaped :^) (e.g. \n, \r, \t...).
Instead of just implementing format specifiers ad-hog this commit
implements the exact syntax std::format uses.
There are still a ton of features that are not supported by this
implementation, however, the format specifiers should be parsed
correctly.
In some cases however, the format specifiers aren't quite parsed
correctly, for example:
String::formatted("{:{}}", 42, 4)
should produce the string " 42" however an (unrelated) assertion fails.
This is because vformat doesn't consider nested parentheses. I have to
spend some time coming up with a simple way of doing this, I don't feel
like doing that right now.
The fundamental code for this already exists, by limiting the number of
format arguments (arbitrarily) to 256 large widths are used to encode
that these should be taken from other format parameters.
With this commit, <AK/Format.h> has a more supportive role and isn't
used directly.
Essentially, there now is a public 'vformat' function ('v' for vector)
which takes already type erased parameters. The name is choosen to
indicate that this function behaves similar to C-style functions taking
a va_list equivalent.
The interface for frontend users are now 'String::formatted' and
'StringBuilder::appendff'.
Two things I hate about C++:
1. 'int', 'signed int' and 'unsigned int' are two distinct types while
'char, 'signed char' and 'unsigned char' are *three* distinct types.
This is because 'signed int' is an alias for 'int' but 'signed char'
can't be an alias for 'char' because on some weird systems 'char' is
unsigned.
One might think why not do it the other way around, make 'int' an
alias for 'signed int' and 'char' an alias for whatever that is on
the platform, or make 'char' signed on all platforms. But who am I
to ask?
2. 'unsigned long' and 'unsigned long long' are always different types,
even if both are 64 bit numbers.
This commit fixes a few bugs that coming from this.
See Also: 1b3169f405.
This function is not avaliable in the kernel.
In the future it would be nice to have some sort of <charconv> header
that does this for all integer types and then call it in strtoull and et
cetera.
The difference would be that this function say 'from_chars' would return
an Optional and not just interpret anything invalid as zero.
There are three classes avaliable that share the functionality of
BufferStream:
1. InputMemoryStream is for reading from static buffers. Example:
Bytes input = /* ... */;
InputMemoryStream stream { input };
LittleEndian<u32> little_endian_value;
input >> little_endian_value;
u32 host_endian_value;
input >> host_endian_value;
SomeComplexStruct complex_struct;
input >> Bytes { &complex_struct, sizeof(complex_struct) };
2. OutputMemoryStream is for writing to static buffers. Example:
Array<u8, 4096> buffer;
OutputMemoryStream stream;
stream << LittleEndian<u32> { 42 };
stream << ReadonlyBytes { &complex_struct, sizeof(complex_struct) };
foo(stream.bytes());
3. DuplexMemoryStream for writing to dynamic buffers, can also be used
as an intermediate buffer by reading from it directly. Example:
DuplexMemoryStream stream;
stream << NetworkOrdered<u32> { 13 };
stream << NetowkrOrdered<u64> { 22 };
NetworkOrdered<u32> value;
stream >> value;
ASSERT(value == 13);
foo(stream.copy_into_contiguous_buffer());
Unlike BufferStream these streams do not use a fixed endianness
(BufferStream used little endian) these have to be explicitly specified.
There are helper types in <AK/Endian.h>.
OutputMemoryStream was originally a proxy for DuplexMemoryStream that
did not expose any reading API.
Now I need to add another class that is like OutputMemoryStream but only
for static buffers. My first idea was to make OutputMemoryStream do that
too, but I think it's much better to have a distinct class for that.
I originally wanted to call that class FixedOutputMemoryStream but that
name is really cumbersome and it's a bit unintuitive because
InputMemoryStream is already reading from a fixed buffer.
So let's just use DuplexMemoryStream instead of OutputMemoryStream for
any dynamic stuff and create a new OutputMemoryStream for static
buffers.
Consider the following snippet:
void foo(InputStream& stream) {
if(!stream.eof()) {
u8 byte;
stream >> byte;
}
}
There is a very subtle bug in this snippet, for some input streams eof()
might return false even if no more data can be read. In this case an
error flag would be set on the stream.
Until now I've always ensured that this is not the case, but this made
the implementation of eof() unnecessarily complicated.
InputFileStream::eof had to keep a ByteBuffer around just to make this
possible. That meant a ton of unnecessary copies just to get a reliable
eof().
In most cases it isn't actually necessary to have a reliable eof()
implementation.
In most other cases a reliable eof() is avaliable anyways because in
some cases like InputMemoryStream it is very easy to implement.
This makes PrintfImplementation usable with any sequence, provided that
a 'next element' function can be written for it.
Does not affect the behaviour of printf() and co.
It wasn't actually possible to call
const LogStream& operator<<(const LogStream&, ReadonlyBytes);
because it was shadowed by
template<typename T>
const LogStream& operator<<(const LogStream& stream, Span<T> span);
not sure how I didn't find this when I added the overload.
It would be possible to use SFINAE to disable the other overload,
however, I think it is better to use a different method entirely because
the output can be very verbose:
void dump_bytes(ReadonlyBytes);
Leverage constexpr and __builtin_ffs for Bitmap::find_first. Also add
a variant Bitmap::find_one_anywhere that can start scanning at a
provided hint.
Also, merge Bitmap::fill_range into the already existing Bitmap::set_range
The streaming operator doesn't short-circuit, consider the following
snippet:
void foo(InputStream& stream) {
int a, b;
stream >> a >> b;
}
If the first read fails, the second is called regardless. It should be
well defined what happens in this case: nothing.
This is a strcpy()-like method with actually sane semantics:
* It accepts a non-empty buffer along with its size in bytes.
* It copies as much of the string as fits into the buffer.
* It always null-terminates the result.
* It returns, as a non-discardable boolean, whether the whole string has been
copied.
Intended usage looks like this:
bool fits = string.copy_characters_to_buffer(buffer, sizeof(buffer));
and then either
if (!fits) {
fprintf(stderr, "The name does not fit!!11");
return nullptr;
}
or, if you're sure the buffer is large enough,
// I'm totally sure it fits because [reasons go here].
ASSERT(fits);
or if you're feeling extremely adventurous,
(void)fits;
but don't do that, please.
For some weird reason the C++ standard considers char, signed char and
unsigned char *three* different types. On the other hand int is just an
alias for signed int, meaning that int, signed int and unsigned int are
just *two* different types.
https://stackoverflow.com/a/32856568/8746648
Before, we had about these occurrence counts:
COPY: 13 without, 33 with
MOVE: 12 without, 28 with
Clearly, 'with' was the preferred way. However, this introduced double-semicolons
all over the place, and caused some warnings to trigger.
This patch *forces* the usage of a semi-colon when calling the macro,
by removing the semi-colon within the macro. (And thus also gets rid
of the double-semicolon.)
The implementation in LibC did a timestamp->day-of-week conversion
which looks like a valuable thing to have. But we only need it in
time_to_tm, where we already computed year/month/day -- so let's
consolidate on the day_of_week function in DateTime (which is
getting extracted to AK).
The JS tests pointed out that the implementation in DateTime
had an off-by-one in the month when doing the leap year check,
so this change fixes that bug.
Specifically:
- post-increment actually implemented pre-increment
- helper-templates that provided operator{+,-,*,/}() couldn't possibly work,
because the interface of add (etc) were incompatible (not taking a Checked<>,
and returning void)
Consider the following scenario:
if(condition)
FOO();
else
bar();
Suppose FOO is defined as follows:
#define FOO() { bar(); baz(); }
Then it expands to the following:
if(condition)
// Syntax error, we are not allowed to put a semicolon at the end.
{ bar(); baz(); };
else
bar();
If we define FOO as follows:
#define FOO() do { bar(); baz(); } while(false)
Then it expands to the following:
if(condition)
do { bar(); baz(); } while(false);
else
bar();
Which is correct.
MemoryManager cannot use the Singleton class because
MemoryManager::initialize is called before the global constructors
are run. That caused the Singleton to be re-initialized, causing
it to create another MemoryManager instance.
Fixes#3226
In particular: consistent rounding and extreme values.
Before, rounding was something like 'away from 0.999...', which led to
surprising corner cases in which the value was rounded up.
Now, rounding is always 'down'.
This even works for 0xffffffff, and also for 0xffffffffffffffffULL on 64-bit.
This makes error messages more useful during debugging.
Old:
START Running test compare_views
FAIL: ../AK/Tests/TestStringView.cpp:59: EXPECT_EQ(view1, "foobar") failed
New:
START Running test compare_views
FAIL: ../AK/Tests/TestStringView.cpp:59: EXPECT_EQ(view1, "foobar") failed: LHS="foo", RHS="foobar"
Previously, it would just print something with 'FAIL' to stderr which
would be picked up by CTest. However, some code assumes that
ASSERT_NOT_REACHED() doesn't return, for example:
bool foo(int value) {
switch(value) {
case 0:
return true;
case 1:
return false;
default:
ASSERT_NOT_REACHED();
}
// warning: control reaches end of non-void function
}
Thankfully, this hasn't happened in any other code yet, but it happened
while I was trying something out. Using '==' on two ByteBuffers to check
whether they're equal seemed straight-forward, so I ran into the trap.
This seems to be because ByteBuffer implements 'operator bool', and C++
considers bool to be an integer type. Thus, when trying to find a way to
evaluate '==', it attempts integer promotion, which in turn finds 'operator bool'.
This explains why all non-empty buffers seem to be equal, but different from the
empty one. Also, why comparison seems to be implemented.
clang-format automatically sorts include statements that are in a
'block'. Adding a whitespace prevents this. It is crutial that
<AK/TestSuite.h> is included first because it redefines some macros.
Previously, the implementation would produce one Vector<u8> which
would contain the whole decompressed data. That can be a lot and
even exhaust memory.
With these changes it is still necessary to store the whole input data
in one piece (I am working on this next,) but the output can be read
block by block. (That's not optimal either because blocks can be
arbitrarily large, but it's good for now.)
This class is similar to BufferStream because it is possible to both
read and write to it. However, it differs in the following ways:
- DuplexMemoryStream keeps a history of 64KiB and discards the rest,
BufferStream always keeps everything around.
- DuplexMemoryStream tracks reading and writing seperately, the
following is valid:
DuplexMemoryStream stream;
stream << 42;
int value;
stream >> value;
For BufferStream it would read:
BufferStream stream;
stream << 42;
int value;
stream.seek(0);
stream >> value;
In the future I would like to replace all usages of BufferStream with
InputMemoryStream, OutputMemoryStream (doesn't exist yet) and
DuplexMemoryStream. For now I just add DuplexMemoryStream though.
Fatal errors can not be handeled and lead to an assertion error when the
stream is destroyed. It makes no sense to delay the assertion failure,
instead of setting m_fatal, an assertion should be done directly.
Two changes were made
1. copy_to() and copy_trimmed_to() now return how many bytes were
copied.
2. The argument was changed to Span<typename RemoveConst<T>::Type>
because the following would not work:
ReadonlyBytes bytes0;
Bytes bytes1;
// Won't work because this calls Span<const u8>::copy_to(Span<u8>)
// but the method was defined as Span<const u8>::copy_to(Span<const u8>)
bytes0.copy_to(bytes1);
The Coverity compiler doesn't support C++2a yet, and thus doesn't
even recognize concept keywords. To allow serenity to be built and
analyzed on such compilers, add a fallback underdef to perform
the same template restriction based on AK::EnableIf<..> meta
programming.
Note: Coverity does seem to (annoyingly) define __cpp_concepts, even
though it doesn't support them, so we need to further check for
__COVERITY__ explicitly.
Windows uses "KB", "MB", "GB" as powers of two.
macOS uses "kB", "MB", "GB" as powers of ten.
"k", "M", "G" are standard SI prefixes that normally refer to powers of
ten.
The IEC introduced "KiB", "MiB", "GiB" to unambiguously refer to
powers of two. It admittedly hasn't caught on that much, but it
does have the advantage that it's unabigious what it means.
So let's use it for user-visible sizes in SerenityOS.
(Linux does all of the above in different places, depending on app and
toolkit.)
Let's use the one in AK/NumberFormat.h everywhere.
It has slightly different behavior than some of the copies this
removes, but it's probably nice to have uniform human readable
size outputs across the system.
The SI prefixes "k", "M", "G" mean "10^3", "10^6", "10^9".
The IEC prefixes "Ki", "Mi", "Gi" mean "2^10", "2^20", "2^30".
Let's use the correct name, at least in code.
Only changes the name of the constants, no other behavior change.
I originally defined the bytes() method for the String class, because it
made it obvious that it's a span of bytes instead of span of characters.
This commit makes this more consistent by defining a bytes() method when
the type of the span is known to be u8.
Additionaly, the cast operator to Bytes is overloaded for ByteBuffer and
such.
This change aims to add support for obscure IPv4 address notations, such as 1.1 (which should be equal to 1.0.0.1), or the hypothetical address 1 (which is equal to 0.0.0.1). This is supported on other platforms as well, such as Linux, Windows, *BSD, and even Haiku.
This enables a nice warning in case a function becomes dead code. Also, add forgotten
header to Base64.cpp, which would cause an issue later when we enable -Wmissing-declarations.
This template class allows for easy generation of incompatible numeric types.
This is useful whenever code has to handle heterogenous data (like meters and
seconds) but the underlying data types are compatible (like int and int).
The motivation comes from the Kernel's inconsistent use of pid_t for process and
thread IDs even though the ID spaces are incompatible, and translating forth/back
is nontrivial.
Other uses could be units (as described above), or incompatible index systems.
A popular use in real life is image manipulation, when there are multiple
coordinate systems.
The symbol name insertion scheme is different from objdump -d's.
Compare the output on Build/Userland/id:
* disasm:
...
_start (08048305-0804836b):
08048305 push ebp
...
08048366 call 0x0000df56
0804836b o16 nop
0804836d o16 nop
0804836f nop
(deregister_tm_clones (08048370-08048370))
08048370 mov eax, 0x080643e0
...
_ZN2AK8Utf8ViewC1ERKNS_6StringE (0805d9b2-0805d9b7):
_ZN2AK8Utf8ViewC2ERKNS_6StringE (0805d9b2-0805d9b7):
0805d9b2 jmp 0x00014ff2
0805d9b7 nop
* objdump -d:
08048305 <_start>:
8048305: 55 push %ebp
...
8048366: e8 9b dc 00 00 call 8056006 <exit>
804836b: 66 90 xchg %ax,%ax
804836d: 66 90 xchg %ax,%ax
804836f: 90 nop
08048370 <deregister_tm_clones>:
8048370: b8 e0 43 06 08 mov $0x80643e0,%eax
...
0805d9b2 <_ZN2AK8Utf8ViewC1ERKNS_6StringE>:
805d9b2: e9 eb f6 ff ff jmp 805d0a2 <_ZN2AK10StringViewC1ERKNS_6StringE>
805d9b7: 90 nop
Differences:
1. disasm can show multiple symbols that cover the same instructions.
I've only seen this happen for C1/C2 (and D1/D2) ctor/dtor pairs,
but it could conceivably happen with ICF as well.
2. disasm separates instructions that do not belong to a symbol with
a newline, so that nop padding isn't shown as part of a function
when it technically isn't.
3. disasm shows symbols that are skipped (due to having size 0)
in parenthesis, separated from preceding and following instructions.
When using Userspace<T> there are certain syscalls where being able
to cast between types is needed. You should be able to easily cast
away the Userspace<T> wrapper, but it's perfectly safe to be able to
cast the internal type that is being wrapped.
This function did a const_cast internally which made the call side look
"safe". This method is removed completely and call sites are replaced
with ByteBuffer::wrap(const_cast<void*>(data), size) which makes the
behaviour obvious.
We should always leak to an observed variable, otherwise
it's an actual leak. This is similar to AK::RefPtr::leak_ref()
which is also marked as [[nodiscard]].
There are use cases where a linked list is useful but it's also worth
the overhead to maintain a count so you can quickly answer queries of
the size of the list.
This is a very cheesy patch and I don't like it, but as Qt Creator does
not grok C++20 concepts yet, this makes it possible to still use syntax
highlighting.
We'll remove this hack the moment it stops being a problem. Note that
it doesn't actually affect the build since we use GCC, not Clang.