This removes the awkward String::replace API, which was the only String
API that mutated the String, and replaces it with a new immutable
version that returns a new String with the replacements applied. This
also fixes a couple of UAFs that were caused by the use of this API.
As an optimization, an equivalent StringView::replace API was also added
to remove unnecessary String allocations of the form:
`String { view }.replace(...);`
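For illustration, a rough sketch of the new shape (argument list
approximate, not the exact AK prototypes):
```
#include <AK/String.h>

void example()
{
    String source = "hello hello";
    // replace() no longer mutates; it returns a new String.
    String replaced = source.replace("hello", "friends", true);
    // The StringView overload skips the String { view } temporary:
    String from_view = source.view().replace("hello", "friends", true);
}
```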
Command used:
```
grep -Pirn '(out|warn)ln\((?!["\)]|format,|stderr,|stdout,|output, ")' \
    AK Kernel/ Tests/ Userland/
```
(Plus some manual reviewing.)
Let's pick ArgsParser as an example:
`outln(file, m_general_help);`
This will fail at runtime if the general help happens to contain braces.
Even if this transformation turns out to be unnecessary in a place or
two, this way the code is "more obviously" correct.
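A minimal sketch of the transformation (print_help is a hypothetical
stand-in for the ArgsParser code):
```
#include <AK/Format.h>

void print_help(StringView general_help)
{
    // outln(general_help);    // before: asserts at runtime on '{' or '}'
    outln("{}", general_help); // after: braces in the text are plain data
}
```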
Commit 890c647e0f fixed an off-by-one bug, so the mapping of the page
at the very end of the user address space now works correctly.
This change adjusts the test to cover the corner cases the original
version was designed to validate.
Previously, LibUnicode would store the values of a keyword as a Vector.
For example, the locale "en-u-ca-abc-def" would have its keyword "ca"
stored as {"abc, "def"}. Then, canonicalization would occur on each of
the elements in that Vector.
This is incorrect because, for example, the keyword value "true" should
only be dropped if that is the entire value. That is, the canonical form
of "en-u-kb-true" is "en-u-kb", but "en-u-kb-abc-true" does not change
for canonicalization. However, we would canonicalize that locale as
"en-u-kb-abc".
This should fix the flaky tests of test-js.
It also fixes the tests when running with the -g flag since the values
will not be garbage collected too soon.
Note that the algorithm in the Unicode spec is for checking that a code
point precedes U+0307, but the special casing condition NotBeforeDot is
interested in the inverse of this rule.
Executing 100 times vs 10 times doesn't increase test coverage
substantially; this API is stable, and more iterations are just a
waste of time.
Without KVM this test is a clear outlier in runtime during CI:
```
START LibC/TestQsort (106/172)
PASS LibC/TestQsort (41.233692s)
```
This avoids a value copy when calling value() or value_or() on a
temporary Optional. This is very common when using the HashMap::get()
API like this:
auto value = hash_map.get(key).value_or(fallback_value);
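Not AK's actual code, but a minimal sketch of how ref-qualified
overloads avoid the copy; the && overload is selected for temporaries
and may move the stored value out:
```
#include <utility>

template<typename T>
struct SketchOptional {
    T m_value {};
    bool m_has_value { false };

    // Lvalues: the stored value must be copied out.
    T value_or(T fallback) const&
    {
        return m_has_value ? m_value : std::move(fallback);
    }
    // Rvalues, e.g. hash_map.get(key).value_or(...): it may be moved out.
    T value_or(T fallback) &&
    {
        return m_has_value ? std::move(m_value) : std::move(fallback);
    }
};
```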
Our existing implementation did not check the element type of the other
pointer in the constructors and move assignment operators. This meant
that some operations that would require explicit casting on raw pointers
were done implicitly, such as:
- downcasting a base class to a derived class (e.g. `Kernel::Inode` =>
`Kernel::ProcFSDirectoryInode` in Kernel/ProcFS.cpp),
- casting to an unrelated type (e.g. `Promise<bool>` => `Promise<Empty>`
in LibIMAP/Client.cpp)
This, of course, allows gross violations of the type system, and makes
the need to type-check less obvious before downcasting. Luckily, while
adding the `static_ptr_cast`s, only two truly incorrect usages were
found; in the other instances, our casts just needed to be made
explicit.
Using a file(GLOB) to find all the test files in a directory is an easy
hack to get things started, but has some drawbacks. Namely, if you add
a test, it won't be found again without re-running CMake. `ninja` seems
to do this automatically, but it would be nice to one day stop seeing it
rechecking our globbed directories.
Calendar subtags are a bit of an odd man out: we must match the
variants "ethiopic-amete-alem" in that order, without any other variant
in the locale. So a separate method is needed for this, and we now defer
sorting the variant list until after other canonicalization is done.
Unicode TR35 defines how locale subtag aliases should be emplaced when
converting a locale to canonical form. For most subtags, it is a simple
substitution. Language subtags depend on context; for example, the
language "sh" should become "sr-Latn", but if the original locale has a
script subtag already ("sh-Cyrl"), then only the language subtag of the
alias should be taken ("sr-Latn").
To facilitate this, we now make two passes when canonicalizing a locale.
In the first pass, we convert the LocaleID structure to canonical syntax
(where the conversions all happen in-place). In the second pass, we form
the canonical string based on the canonical syntax.
Originally, it was convenient to store the parsed Unicode locale data as
views into the original string being parsed. But to implement locale
aliases will require mutating the data that was parsed. To prepare for
that, store the parsed data as proper strings.
Some i64 values will not fit in normal doubles, and these values _are_
tested by the test suite. This makes the test runtime capable of
handling them correctly.
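The underlying precision issue, for reference:
```
#include <cassert>
#include <cstdint>

int main()
{
    // Doubles carry a 53-bit mantissa, so odd i64 values above 2^53
    // can't be represented exactly and silently round.
    int64_t big = (int64_t(1) << 53) + 1;
    double as_double = static_cast<double>(big);
    assert(static_cast<int64_t>(as_double) != big);
}
```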
When swapping the same object, we could end up with a double-free error.
This was found while quick-sorting a Vector of Variants holding complex
types, reproduced by the new swap_same_complex_object test case.
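A minimal sketch of the general fix (not the actual Variant code): a
swap that tears down and re-creates its operands must no-op when both
sides are the same object:
```
#include <utility>

template<typename T>
void checked_swap(T& a, T& b)
{
    if (&a == &b)
        return; // self-swap: tearing down shared state twice double-frees
    std::swap(a, b);
}
```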
ECMA-402 requires validating user input against the EBNF grammar for
Unicode locales described in TR-35: https://www.unicode.org/reports/tr35
This commit adds validators for that grammar, as well as other helpers
to e.g. canonicalize a locale string.
Since there are no real users of these functions in Serenity's
userland and this is my third attempt at this... This time, the great
LibTest test suite will make sure that I do it right!
Classes reading and writing to the data heap would communicate directly
with the Heap object, and transfer ByteBuffers back and forth with it.
This makes things like caching and locking hard. Therefore all data
persistence activity will be funneled through a Serializer object which
in turn submits it to the Heap.
Introducing this unfortunately resulted in a huge amount of churn, in
which a number of smaller refactorings got caught up as well.
This patch provides very basic, bare bones implementations of the
INSERT and SELECT statements. They are *very* limited:
- The only variant of the INSERT statement that currently works is
INSERT INTO schema.table (column1, column2, ....) VALUES
(value11, value21, ...), (value12, value22, ...), ...
where the values are literals.
- The SELECT statement is even more limited, and is only provided to
allow verification of the INSERT statement. The only form implemented
is: SELECT * FROM schema.table
These statements required a bit of change in the Statement::execute
API. Originally execute only received a Database object as parameter.
This is not enough; we now pass an ExecutionContext object which
contains the Database, the current result set, and the last Tuple read
from the database. This object will undoubtedly evolve over time.
This API change dragged SQLServer::SQLStatement into the patch.
Another API addition is Expression::evaluate. This method is,
unsurprisingly, used to evaluate expressions, like the values in the
INSERT statement.
Finally, a new test file is added: TestSqlStatementExecution, which
tests the currently implemented statements. As the number and flavour of
implemented statements grows, this test file will probably have to be
restructured.
The implementation of the Value class was based on lambda member variables
implementing type-dependent behaviour. This was done to ensure that
Values can be used as stack-only objects; the simplest alternative,
virtual methods, forces them onto the heap. The problem with the
lambda approach is that it bloats the Values (which are supposed to be
lightweight objects) quite considerably, because every object contains
more than a dozen function pointers.
The solution to address both problems (we want Values to be able to live
on the stack and be as lightweight as possible) chosen here is to
encapsulate type-dependent behaviour and state in an implementation
class, and let the Value be an AK::Variant of those implementation
classes. All methods of Value are now basically straight delegates to
the implementation object using the Variant::visit method.
One issue complicating matters is the addition of two aggregate types,
Tuple and Array, which each contain a Vector of Values. At this point
Tuples and Arrays (and potential future aggregate types) can't contain
these aggregate types. This is limiting and needs to be addressed.
Another area that needs attention is the nomenclature of things; it's
a bit of a tangle of 'ValueBlahBlah' and 'ImplBlahBlah'. I think it
makes sense right now, but I admit we can probably do better.
Other things included here:
- Added the Boolean and Null types (and Tuple and Array, see above).
- to_string now always succeeds and returns a String instead of an
Optional. This had some impact on other sources.
- Added a lot of tests.
- Started moving the serialization mechanism more towards where I want
it to be, i.e. a 'DataSerializer' object which just takes
serialization and deserialization requests and knows for example how
to store long strings out-of-line.
One last remark: There is obviously a naming clash between the Tuple
class and the Tuple Value type. This is intentional; I plan to make the
Tuple class a subclass of Value (and hence Key and Row as well).
For example, consider the following pattern:
new RegExp('\ud834\udf06', 'u')
With this pattern, the regex parser should insert the UTF-8 encoded
bytes 0xf0, 0x9d, 0x8c, and 0x86. However, because these characters are
currently treated as normal char types, they have a negative value since
they are all > 0x7f. Then, due to sign extension, when these characters
are cast to u64, the sign bit is preserved. The result is that these
bytes are inserted as 0xfffffffffffffff0, 0xffffffffffffff9d, etc.
Fortunately, there are only a few places where we insert bytecode with
the raw characters. In these places, be sure to treat the bytes as u8
before they are cast to u64.
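For reference, the sign-extension pitfall in isolation (assuming a
platform where char is signed):
```
#include <cstdint>
#include <cstdio>

int main()
{
    char raw = static_cast<char>(0xf0);
    // Direct widening sign-extends the negative char:
    uint64_t wrong = static_cast<uint64_t>(raw); // 0xfffffffffffffff0
    // Casting through u8 first preserves the raw byte value:
    uint64_t right = static_cast<uint64_t>(static_cast<uint8_t>(raw)); // 0xf0
    std::printf("%llx vs %llx\n", (unsigned long long)wrong, (unsigned long long)right);
}
```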
Unfortunately, this requires a slight divergence in the way the capture
group names are stored. Previously, the generated byte code would simply
store a view into the regex pattern string, so no string copying was
required.
Now, the escape sequences are decoded into a new string, and all parsed
capture group names are stored in a vector in the parser result
structure. The byte code then stores a view into the corresponding
string in that vector.
This parsing is already duplicated between LibJS and LibRegex, and will
shortly be needed in more places in those libraries. Move it to AK to
prevent further duplication.
This API will consume escaped Unicode code points of the form:
\\u{code point}
\\unnnn (where each n is a hexadecimal digit)
\\unnnn\\unnnn (where the two escaped values are a surrogate pair)
Currently, when we need to repeat an instruction N times, we simply add
that instruction N times in a for-loop. This doesn't scale well with
extremely large values of N, and ECMA-262 allows up to N = 2^53 - 1.
Instead, add a new REPEAT bytecode operation to defer this loop from the
parser to the runtime executor. This allows the parser to complete sans
any loops (for this instruction), and allows the executor to bail early
if the repeated bytecode fails.
Note: The templated ByteCode methods are to allow the Posix parsers to
continue using u32 because they are limited to N = 2^20.
Combining these into one list helps reduce the size of MatchState, and
as a result, reduces the amount of memory consumed during execution of
very large regex matches.
Doing this also allows us to remove a few regex byte code instructions:
ClearNamedCaptureGroup, SaveLeftNamedCaptureGroup, and NamedReference.
Named groups now behave the same as unnamed groups for these operations.
Note that SaveRightNamedCaptureGroup still exists to cache the matched
group name.
This also removes the recursion level from the MatchState, as it can
exist as a local variable in Matcher::execute instead.
The grammar for the ECMA-262 CharacterEscape is:
CharacterEscape[U, N] ::
    ControlEscape
    c ControlLetter
    0 [lookahead ∉ DecimalDigit]
    HexEscapeSequence
    RegExpUnicodeEscapeSequence[?U]
    [~U] LegacyOctalEscapeSequence
    IdentityEscape[?U, ?N]
It's important to parse the standalone "\0 [lookahead ∉ DecimalDigit]"
before parsing LegacyOctalEscapeSequence. Otherwise, all standalone "\0"
patterns are parsed as octal, which are disallowed in Unicode mode.
Further, LegacyOctalEscapeSequence should also be parsed while parsing
character classes.
A subsequent commit will add tests that require a string containing only
"\0". As a C-string, this will be interpreted as the null terminator. To
make the diff for that commit easier to grok, this commit converts all
tests to use StringView without any other functional changes.
* Only alphabetic (A-Z, a-z) characters may be escaped with \c. The loop
currently parsing \c includes code points between the upper/lower case
groups.
* In Unicode mode, all invalid identity escapes should cause a parser
error, even in browser-extended mode.
* Avoid an infinite loop when parsing the pattern "\c" on its own.
Similarly to the LibCpp parser regression tests, these tests run the
preprocessor on the .cpp test files under
Userland/LibCpp/Tests/preprocessor, and compare the output with existing
.txt ground truth files.
These script extensions have some peculiar behavior in the Unicode spec.
The UCD ScriptExtension file does not contain these scripts. Rather, it
is implied the code points which have these scripts as an extension are
the code points that both:
1. Have Common or Inherited as their primary script value
2. Do not have any other script value in their script extension lists
Because these are not explicitly listed in the UCD, we must manually form
these script extensions.
Notice that unlike the note in populate_general_category_unions(),
script extensions do indeed have code point ranges which overlap. Thus,
this commit adds code to handle that, and hooks it into the GC unions.
Previously, each code point's General Category was part of the generated
UnicodeData structure. This ultimately presented two problems, one
functional and one performance related:
* Some General Categories are applied to unassigned code points, for
example the Unassigned (Cn) category. Unassigned code points are
strictly excluded from UnicodeData.txt, so by relying on that file,
the generator is unable to handle these categories.
* Lookups for General Categories are slower when searching through the
large UnicodeData hash map. Even though lookups are O(1), the hash
function turned out to be slower than binary searching through a
category-specific table.
So, now a table is generated for each General Category. When querying a
code point for a category, a binary search is done on each code point
range in that category's table to check if the code point has that
category.
Further, General Categories are now parsed from the UCD file
DerivedGeneralCategory.txt. This file is a normal "prop list" file and
contains the categories for unassigned code points.
There seems to be more incorrect assumptions about Clang-built
executables' memory layout than expected. These make the CI fail even
though the system is functional in all other aspects. While this is
being fixed, let's just disable tests for UserspaceEmulator.
This helps us avoid weird truncation issues and fixes a bug on Clang
builds where truncation while reading caused the DIE offsets following
large LEB128 numbers to be incorrect. This removes the need for the
separate `LongUnsignedNumber` type.
We now call Preprocessor::process_and_lex() and pass the result to the
parser.
Doing the lexing in the preprocessor will allow us to maintain the
original position information of tokens after substituting definitions.
Improve the parsing of data URLs in URLParser to bring it closer to
spec. At the moment, we cannot parse the components of the MIME type
since it is represented as a string, but the spec requires it to be
parsed as a "MIME type record".
This is a regression test to validate the functionality that was
reported broken in #9071, where the kernel would spin attempting
to cancel a stale timer.
Before now, only binary properties could be parsed. Non-binary props are
of the form "Type=Value", where "Type" may be General_Category, Script,
or Script_Extension (or their aliases). Of these, LibUnicode currently
supports General_Category, so LibRegex can parse only that type.
This changes LibRegex to parse the property escape as a Variant of
Unicode Property & General Category values. A byte code instruction is
added to perform matching based on General Category values.
The following commit broke Tests/AK/TestJSON.cpp as it removed the
file that the test loaded from disk to validate JSON parsing.
commit ad141a2286
Author: Andreas Kling <kling@serenityos.org>
Date: Sat Jul 31 15:26:14 2021 +0200
Base: Remove "test.frm" from HackStudio test project
Instead of restoring the file, let's just embed a bit of JSON in the
test case to avoid using external resources, as they obviously are
surprising and make the test less portable across environments.
This supports some binary property matching. It does not support any
properties not yet parsed by LibUnicode, nor does it support value
matching (such as Script_Extensions=Latin).
Previously, unmapping any offset starting at 0x0 would assert in the
kernel. Add a regression test to validate the fix.
Co-authored-by: Federico Guerinoni <guerinoni.federico@gmail.com>
Apparently, some code points fit both categories, for example U+0345
(COMBINING GREEK YPOGEGRAMMENI). Handle this fact when determining if
a code point is a final code point in a string.
This implements unconditional special case folding, and conditional
folding for non-locale cases. Worth noting that the only conditional,
non-locale special case is for converting an uppercase sigma to
lowercase.
The Unicode standard publishes the Unicode Character Database (UCD) with
information about every code point, such as each code point's upper case
mapping. LibUnicode exists to download and parse UCD files at build time
and to provide accessors to that data.
As a start, LibUnicode includes upper- and lower-case code point
converters.
Previously there was no way to create a MACAddress by passing an
address directly as a string. This will allow programs like the arp
utility to create a MACAddress instance from user-passed addresses.
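Hypothetical usage; the exact factory name and return type here are
assumptions, not the committed API:
```
#include <AK/Format.h>
#include <AK/MACAddress.h>

void example()
{
    // Assumed Optional-returning parse helper:
    auto mac = MACAddress::from_string("52:54:00:12:34:56");
    if (mac.has_value())
        dbgln("parsed: {}", mac.value().to_string());
}
```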
When the Unicode flag is set, regular expressions may escape code points
by surrounding the hexadecimal code point with curly braces, e.g. \u{41}
is the character "A".
When the Unicode flag is not set, this should be considered a repetition
symbol - \u{41} is the character "u" repeated 41 times. This is left as
a TODO for now.
When the Unicode option is not set, regular expressions should match
based on code units; when it is set, they should match based on code
points. To do so, the regex parser must combine surrogate pairs when
the Unicode option is set. Further, RegexStringView needs to know if
the flag is set in order to return code point vs. code unit based
string lengths and substrings.
This is a generally nicer-to-use version of the existing {any,all}_of()
that doesn't require the user to explicitly provide two iterators.
As a bonus, it also allows arbitrary iterators (as opposed to the hard
requirement of providing SimpleIterators in the iterator version).
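Usage sketch (header names of this era are assumed):
```
#include <AK/AllOf.h>
#include <AK/Vector.h>

bool all_positive(Vector<int> const& numbers)
{
    // Container overload: no begin()/end() pair, and any iterable works.
    return all_of(numbers, [](int n) { return n > 0; });
}
```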
The state of the formatter for the previous element should be thrown
away for each iteration. This showed up when trying to format a
Vector<String>, since Formatter<StringView> was unhappy about some state
that gets set when it's called. Add a test for Formatter<Vector>.
During a recent commit, the 64-bit kernel was moved to a different
address, breaking this test (unnoticed). This fixes it, so we can turn
on breaking x86_64 tests in CI again.
This commit makes LibRegex (mostly) capable of operating on any of
the three main string views:
- StringView for raw strings
- Utf8View for utf-8 encoded strings
- Utf32View for raw unicode strings
As a result, regexps with unicode strings should be able to properly
handle utf-8 and not stop in the middle of a code point.
A future commit will update LibJS to use the correct type of string
depending on the flags.
While trying to port to Clang we found that the functions as
implemented didn't actually work, and replacing them with a blatantly
broken function also did not break the tests on the GCC build. It
turns out many of our tests had been testing GCC's builtins. This
removes the use of builtins for LibM's tests (so we test the whole
function). It turns off the denormal test for scalbn (which was not
implemented) and comments out the tgamma(0.5) test which is too
inaccurate to be usable (and too complicated for me to fix). The gamma
function was made accurate for all other test cases, and asin received
two more layers of Taylor expansion to bring it within error margin
for the tests.
Previously, HTMLToken would expose the Vector<Attribute> directly to
its users. In preparation for a future change, all users now use
implementation-agnostic APIs which do not expose the Vector directly.
* wasm: Don't try to print the function results if it traps
* LibWasm: Inline some very hot functions
These are mostly pretty small functions too, and they were about ~10%
of runtime.
* LibWasm+Everywhere: Make the instruction count limit configurable
...and enable it for LibWeb and test-wasm.
Note that `wasm` will not be limited by this.
* LibWasm: Remove a useless use of ScopeGuard
There are no multiple exit paths in that function, so we can just put
the ending logic right at the end of the function instead.
Prior to this, it'd try to stuff them into an i64, which could fail and
give us nothing.
Even though this is an extension we've made to JSON, the parser should
be able to correctly round-trip from whatever our serialiser has
generated.
This fixes parsing the following regular expression: /</g;
It also adds a simple script element to the HTMLTokenizer regression
test, which also contains that specific regex.
The test suite includes a few basic tests and a very crude regression
test, which just concatenates the to_string() of all tokens and checks
that the String's hash is equal. This relies on the format of
HTMLToken::to_string() to stay the same, which is not ideal.
Since Clang enables a couple of warnings that we don't have in GCC,
these were not caught before. Included fixes:
- Use correct printf format string for `size_t`
- Don't compare `Nonnull(Ref|Own)Ptr` to nullptr
- Fix unsigned int& => unsigned long& conversion
This test exposed a kernel panic in is_user_range calculations, so let's
convert it to be a LibTest test so we can prevent regressions in mmap,
the page allocator, and the memory manager.
Let's bring this class back, but without the confusing resize() API.
A FixedArray<T> is simply a fixed-size array of T.
The size is provided at run-time, unlike Array<T> where the size is
provided at compile-time.
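A usage sketch of the distinction (constructor shape approximate):
```
#include <AK/Array.h>
#include <AK/FixedArray.h>

void example(size_t n)
{
    Array<int, 4> compile_time_size; // size is part of the type
    FixedArray<int> run_time_size(n); // size chosen at run-time...
    // ...but fixed afterwards: there is deliberately no resize().
    (void)compile_time_size;
}
```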
This patch introduces the SQLServer system server. This service is
supposed to be the only process/application talking to database storage.
This makes things like locking and caching more reliable, easier to
implement, and more efficient.
In LibSQL we added a client component that does the ugly IPC
nitty-gritty for you. All that's needed is setting a number of event handler
lambdas and you can connect to databases and execute statements on them.
Applications that wish to use this SQLClient class obviously need to
link LibSQL and LibIPC.
The Order enum is used in the Meta component of LibSQL. Using this enum
meant having to include the monster AST/AST.h include file. Furthermore,
they are sort of basic and therefore can live in the general SQL
namespace. Moved to LibSQL/Type.h.
Also introduced a new class, SQLResult, which is needed in future
patches.
Now that the test is converted to be LibTest based, we can remove it
from the exclude list in /home/anon/.config/Tests.ini.
Prior to this it would crash and fail because it was signaled instead of
returning normally with exit code 0.
After the changes to LibGfx to make default font management handled in
WindowServer instead of each GUI application to allow for global font
broadcasts, the two LibGfx tests broke. The non-benchmark was fixed in
8f96d2, but the benchmark was left in the dust because nobody really
runs it manually :^(
These are usually incorrect, and people sometimes forget to add the
correct values as a result of them being optional, so they should just
be specified explicitly.
This adds a test case for String::find and String::find_all with empty
needles. The expected behavior is in line with that of the C++ standard
library (and other languages' standard libraries).
This implements StringUtils::find_any_of() and uses it in
String::find_any_of() and StringView::find_any_of(). All uses of
find_{first,last}_of have been replaced with find_any_of(), find() or
find_last(). find_{first,last}_of have subsequently been removed.
This removes StringView::find_first_of(char) and find_last_of(char) and
replaces all its usages with find and find_last respectively. This is
because those two methods are functionally equivalent.
find_{first,last}_of should only be used if searching for multiple
different characters, which is never the case with the char argument.
This also adds the [[nodiscard]] to the remaining find_{first,last}_of
methods.
This replaces the current LexicalPath::append() API with a new method
that returns a new LexicalPath object and doesn't mutate the original.
With this, LexicalPath is now immutable. It also adds a
LexicalPath::parent() method and the relevant test cases.
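A sketch of the immutable API described above:
```
#include <AK/LexicalPath.h>

void example()
{
    LexicalPath path("/home/anon");
    auto child = path.append("Documents"); // new LexicalPath; 'path' unchanged
    auto parent = path.parent();           // "/home"
    (void)child;
    (void)parent;
}
```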
Since this is always set to true on the non-default constructor and
subsequently never modified, it is somewhat pointless. Furthermore,
there are arguably no invalid relative paths.
Split out the functionality to gather multiple tests from the filesystem
and run them in turn into Test::TestRunner, and leave the JavaScript
specific test harness logic in Test::JS::TestRunner and friends.
If someone runs the test with shell redirection going on, or in a way
that changes any of the standard file descriptors this assumption will
not hold. When running from a terminal normally, it is true however.
Instead, check that /proc/self/fd/[0,1,2] are symlinks, and can be
stat-d by verifying that both stat and lstat succeed, and give different
struct stat contents.
Also add some tests to ensure that they _remain_ constexpr.
In general, any runtime assertions, weirdo C casts, pointer aliasing,
and such shenanigans should be gated behind the (helpfully newly added)
AK::is_constant_evaluated() function when the intention is to write
constexpr-capable code.
a.k.a. deliver promises of constexpr-ness :P
This commit converts naked `new`s to `AK::try_make` and `AK::try_create`
wherever possible. If the called constructor is private, this can not be
done, so we instead now use the standard-defined and compiler-agnostic
`new (nothrow)`.
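For illustration (Thing and its members are hypothetical):
```
#include <AK/OwnPtr.h>
#include <new>

struct Thing {
    explicit Thing(int value) : x(value) {}
    int x;
};

void example()
{
    // Public constructor: allocation failure yields a null OwnPtr.
    auto thing = try_make<Thing>(42);
    // Private constructor (try_make can't reach it): nothrow new
    // returns nullptr on failure instead of throwing.
    auto* raw = new (std::nothrow) Thing(7);
    delete raw;
}
```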
Scanning tables is a linear process using pointers in the table's
tuples, and does not involve more 'stochastic' code paths like index
traversals. Therefore the 1000 and 10000 row tests were basically
overkill and added nothing we can't find out with fewer rows.
This declares all test cases which compare function outputs over the
entire Unicode range as `BENCHMARK_CASE`, to avoid them being run by CI.
This reduces runtime of TestCharacterTypes (without benchmarks) by about
one third.
SQL was standardized before consensus on sane language syntax
constructs had evolved. The language is mostly case-insensitive, with
unquoted text converted to upper case. Identifiers can include lower
case characters and other 'special' characters by enclosing the
identifier with double quotes. A double quote is escaped by doubling it.
Likewise, a single quote in a literal string is escaped by doubling it.
All this means that the strategy used in the lexer, where a token's
value is a StringView 'window' on the source string, does not work,
because the value needs to be massaged before being handed to the
parser. Therefore a token now has a String containing its value. Given
the limited lifetime of a token, this is acceptable overhead.
Not doing this means that for example quote removal and double quote
escaping would need to be done in the parser or in AST node
construction, which would spread lexing basically all over the place.
Which would be suboptimal.
There was some impact on the sql utility and SyntaxHighlighter component
which was addressed by storing the token's end position together with
the start position in order to properly highlight it.
Finally, reviewing the tests for parsing numeric literals revealed an
inconsistency in which tokens we accept or reject: `1a` is accepted but
`1e` is rejected. Related to this is the fate of `0x`. Added a FIXME
reminding us to address this.
Unfortunately this patch is quite large.
The main functionality included are a BTree index implementation and
the Heap class which manages persistent storage.
Also included are a Key subclass of the Tuple class, which is a
specialization for index key tuples. This "dragged in" the Meta layer,
which has classes defining SQL objects like tables and indexes.
This patch adds the basic dynamic value classes used by the SQL Storage
layer. The most elementary class is Value, which holds a typed Value
which can be converted to standard C++ types. A Tuple is a collection
of Values described by a TupleDescriptor, which specifies the names,
types, and ordering of the elements in the Tuple.
Tuples and Values can be serialized and deserialized to and from
ByteBuffers. This is the mechanism used to save them to disk.
Tuples are used as keys in SQL indexes and rows in SQL tables.
Also included is a test file.
The insert_before method on AK::InlineLinkedList is used, so in order to
achieve feature parity, we need to implement it for AK::IntrusiveList as
well.
A POSIX-compatibility fix was introduced in 64740a0214 to make the
compilation of the `diffutils` port work, which expected a
`char* const* argv` signature.
And indeed, the POSIX spec does not mention permutation of `argv`:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/getopt.html
However, most implementations do modify `argv` as evidenced by
documentation such as:
https://refspecs.linuxbase.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/libutil-getopt-3.html
"The function prototype was aligned with POSIX 1003.1-2008 (ISO/IEC
9945-2009) despite the fact that it modifies argv, and the library
maintainers are unwilling to change this."
Change the behavior back to permuting `argv` to allow for the following
command line argument order to work again:
unzip ./file.zip -o target-dir
Without this change, `./file.zip` in the example above would have been
ignored completely.
Doing these as custom classes might be faster, especially when written
in SSE, but this would cause a lot of code duplication, and due to the
nature of constexpr and the intelligence of the compiler, they might be
using SSE/MMX either way.
This commit makes it possible to instantiate `Vector<T&>` and use it
to store references to `T` in a vector.
All non-pointer observers are made to return the reference, and the
pointer observers simply yield the underlying pointer.
Note that the 'find_*' methods act on the values and not the pointers
that are stored in the vector.
This commit also makes errors in various vector methods much more
readable by directly using requires-clauses on them.
And finally, it should be noted that Vector cannot hold temporaries :^)
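Usage sketch based on the description above:
```
#include <AK/Vector.h>

void example()
{
    int a = 1;
    int b = 2;
    Vector<int&> refs;
    refs.append(a);
    refs.append(b);
    refs[0] = 42;   // observers return the reference, so this writes to 'a'
    // refs.append(3); // won't compile: temporaries can't be stored
}
```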
Previously, AK::Function would accept _any_ callable type, and try to
call it when called, first with the given set of arguments, then with
zero arguments, and if all of those failed, it would simply not call the
function and **return a value-constructed Out type**.
This lead to many, many, many hard to debug situations when someone
forgot a `const` in their lambda argument types, and many cases of
people taking zero arguments in their lambdas to ignore them.
This commit reworks the Function interface to not include any such
surprising behaviour, if your function instance is not callable with
the declared argument set of the Function, it can simply not be
assigned to that Function instance, end of story.
According to the definition at https://sqlite.org/lang_expr.html, SQL
expressions could be infinitely deep. For practicality, SQLite enforces
a maximum expression tree depth of 1000. Apply the same limit in
LibSQL to avoid stack overflow in the expression parser.
Fixes https://crbug.com/oss-fuzz/34859.
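A sketch of the guard, with hypothetical names (the real parser's
member and method names may differ):
```
// Inside a recursive-descent expression parser:
static constexpr size_t maximum_expression_tree_depth = 1000;

RefPtr<Expression> Parser::parse_expression()
{
    if (m_current_expression_depth >= maximum_expression_tree_depth) {
        syntax_error("Exceeded maximum expression tree depth");
        return {};
    }
    ++m_current_expression_depth;
    auto expression = parse_primary_expression(); // may recurse back here
    --m_current_expression_depth;
    return expression;
}
```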
Because non-ASCII code points have negative byte values, trimming away
control characters requires checking for negative byte values.
This also adds a test case with a URL containing non-ASCII code points.
This commit is a fairly large refactor, mainly because it unified the
two different ways that existed to represent references.
Now Reference values are also a kind of value.
It also implements a printer for values/references instead of copying
the implementation everywhere.
StringView::lines() supports line-separators “\n”, “\r”, and “\r\n”.
The method will drop an entire line if it is surrounded by “\r”
and “\n” separators on the left and right sides respectively.
The previous behavior was to always VERIFY that the UTF-8 bytes were
valid when iterating over the code points of a Utf8View. This change
makes it so we instead output the 0xFFFD 'REPLACEMENT CHARACTER'
code point when encountering invalid bytes, and keep iterating the
view after skipping one byte.
Leaving the decision to the consumer would break symmetry with the
UTF32View API, which would in turn require heavy refactoring and/or
code duplication in generic code such as the one found in
Gfx::Painter and the Shell.
To make it easier for the consumers to detect the original bytes, we
provide a new method on the iterator that returns a Span over the
data that has been decoded. This method is immediately used in the
TextNode::compute_text_for_rendering method, which previously did
this in an ad-hoc way.
This also adds tests for the new behavior in TestUtf8.cpp, as well
as reinforcements to the existing tests to check if the underlying
bytes match up with their expected values.
Rather than aborting when a LIMIT clause of the form 'LIMIT expr, expr'
is encountered, fail the parser with a syntax error. This will be nicer
for the user and fixes the following fuzzer bug:
https://crbug.com/oss-fuzz/34837
This patch introduces a new operator== to compare an Optional to its
contained type directly. If the Optional does not contain a value, the
comparison will always return false.
This also adds a test case for the new behavior as well as comparison
between Optional objects themselves.
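Behavior sketch:
```
#include <AK/Optional.h>

void example()
{
    Optional<int> maybe;
    bool a = (maybe == 5); // false: empty Optional never equals a value
    maybe = 5;
    bool b = (maybe == 5); // true: compares against the contained value
    (void)a;
    (void)b;
}
```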
This adds more tests for AK::URL. Furthermore, this also changes some
tests to conform to what the reworked URL class does (and the URL
specification mostly expects).
This adds a peek method for Utf8CodepointIterator, which enables it to
be used in some parsing cases where peeking is necessary.
peek(0) is equivalent to operator*, except that peek() does not contain
any assertions and will just return an empty Optional<u32>.
This also implements a test case for iterating UTF-8.
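Usage sketch (assuming peek() takes an offset and returns
Optional<u32>, per the description above):
```
#include <AK/Utf8View.h>

void example()
{
    Utf8View view { "serenity" };
    auto it = view.begin();
    auto current = it.peek(0); // same code point as *it, but never asserts
    auto ahead = it.peek(1);   // look-ahead; empty Optional past the end
    (void)current;
    (void)ahead;
}
```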
Previously, we would go crazy and shift things way out of bounds.
Add tests to verify that the decoding algorithm is safe around the
limits of the result type.
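A minimal sketch of the guarded decode (not the AK implementation):
refuse to shift past the width of the result type instead of invoking
undefined behavior:
```
#include <cstddef>
#include <cstdint>

bool decode_uleb128(uint8_t const* bytes, size_t size, uint64_t& result)
{
    result = 0;
    for (size_t i = 0; i < size; ++i) {
        size_t shift = i * 7;
        if (shift >= 64)
            return false; // value would not fit the result type
        result |= static_cast<uint64_t>(bytes[i] & 0x7f) << shift;
        if (!(bytes[i] & 0x80))
            return true; // high bit clear: last byte of the value
    }
    return false; // input ended mid-value
}
```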
The round trip compress test wants the first half of the byte buffer to
be filled with random data, and the second half to be all zeroes. The
strategy of using memset on ByteBuffer::offset_pointer confuses
__builtin_memset_chk when building with -fsanitize=undefined. It thinks
that the buffer is using inline capacity when we can prove to ourselves
pretty easily that it's not. To avoid this, just create the buffer
zeroed to start, and then fill the first half with the random data.
This implements the XSI-compliant version of strerror_r() - as opposed
to the GNU-specific variant.
The function explicitly saves errno so as to not accidentally change it
with one of the calls to other functions.
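For reference, a usage sketch assuming the XSI prototype is in effect
(on glibc, _GNU_SOURCE selects the char*-returning GNU variant instead):
```
#include <cerrno>
#include <cstdio>
#include <cstring>

int main()
{
    // XSI signature: int strerror_r(int errnum, char* buf, size_t buflen);
    char buffer[64];
    if (strerror_r(ENOENT, buffer, sizeof(buffer)) == 0)
        std::puts(buffer);
}
```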
We had two functions for doing mostly the same thing. Combine both
of them into String::find() and use that everywhere.
Also add some tests to cover basic behavior.
Managing the instantiated modules becomes a pain if they're on the
stack, since an instantiated module will eventually reference itself.
To make using this simpler, just avoid copying the instance.
This fixes a FIXME and will allow linking only select modules together,
instead of linking every instantiated module into a big mess of exported
entities :P
This also optionally generates a test suite from the WebAssembly
testsuite, which can be enabled via passing `INCLUDE_WASM_SPEC_TESTS`
to cmake, which will generate test-wasm-compatible tests and the
required fixtures.
The generated directories are excluded from git since there's no point
in committing them.
This only tests "can it be parsed", but the goal of this commit is to
provide a test framework that can be built upon :)
The conformance tests are downloaded, compiled* and installed only if
the INCLUDE_WASM_SPEC_TESTS cmake option is enabled.
(*) Since we do not yet have a wast parser, the compilation is delegated
to an external tool from binaryen, `wasm-as`, which is required for the
test suite download/install to succeed.
This *does* run the tests in CI, but it currently does not include the
spec conformance tests.