For the setreuid and setresuid syscalls, -1 means to keep the current
uid/euid/gid/egid value, which is convenient for programming.
However, for other syscalls where we pass only one argument, there's no
justification to specify -1.
This behavior is identical to how Linux handles the value -1, and is
influenced by the fact that the manual pages for the one-argument
syscalls that handle ID operations are ambiguous on this topic.
The goal of this file is to enable C++ overloaded functions for
standard builtin functions that we use. It contains fallback
implementations for systems that do not have the builtins available.
This unbreaks the /var/run/utmp system which starts out as an empty
string, and is then turned into an object by the first update.
This isn't necessarily the best way for this to work, but it's how
it used to work, so this just fixes the regression for now.
The instructions can have dependencies (e.g. Repeat), so only unify
equal blocks instead of consecutive instructions.
Fixes #11247.
Also adds the minimal test case(s) from that issue.
Fixes a crash caused by a syntax error that is difficult for the parser
to catch: usually identifiers are accepted in column lists, but they are
not in a list of column values to be inserted in an INSERT.
Fixed this by putting in a heuristic check; we probably need a better
way to do this.
Included tests for this case.
Also introduced a new SQL Error code, `NotYetImplemented`, and return
that instead of crashing when encountering unimplemented SQL.
The handling of filesystem-level errors was basically non-existent, or
consisted of `VERIFY_NOT_REACHED` assertions. Addressed this by
* Adding `open` methods to `Heap` and `Database` which return errors.
* Changing the interface of methods of these classes and clients
downstream to propagate these errors.
The constructors of `Heap` and `Database` don't open the underlying
filesystem file anymore.
The SQL statement handlers return an `SQLErrorCode::InternalError`
error code if an error comes back from the lower levels. Note that some
of these errors are things like duplicate index entry errors that should
be caught before the SQL layer attempts to actually update the database.
Added tests to catch attempts to open weird or non-existent files as
databases.
Finally, in between me writing this patch and submitting the PR the
AK::Result<Foo, Bar> template got deprecated in favour of ErrorOr<Foo>.
This resulted in more busywork.
For example, consider the following adjacent entries in UnicodeData.txt:
3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;;
4DBF;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;;
Our current implementation would assign the display name "CJK Ideograph
Extension A" to code points U+3400 & U+4DBF, but not to the code points
in between. Not only should those code points be assigned a name, but
the Unicode spec also has formatting rules on what the names should be
(the names for these ranged code points are not as they appear in
UnicodeData.txt).
The spec also defines names for code point ranges that actually are
listed individually in UnicodeData.txt. For example:
2F800;CJK COMPATIBILITY IDEOGRAPH-2F800;Lo;0;L;4E3D;;;;N;;;;;
2F801;CJK COMPATIBILITY IDEOGRAPH-2F801;Lo;0;L;4E38;;;;N;;;;;
2F802;CJK COMPATIBILITY IDEOGRAPH-2F802;Lo;0;L;4E41;;;;N;;;;;
Code points are only coalesced into a range if all fields after the name
are equivalent. Our parser will insert the range and its name formatting
pattern when it comes across the first code point in that range, then
ignore other code points in that range. This reduces the number of names
we generated by nearly 2,000.
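For illustration, a minimal sketch of how such ranged entries can be
recognized while scanning UnicodeData.txt line by line; the helper names
are hypothetical, not the actual generator code:

```
#include <AK/StringView.h>

// The name field of a range's opening entry looks like
// "<CJK Ideograph Extension A, First>"; the closing entry uses ", Last>".
static bool is_range_first(StringView name) { return name.ends_with(", First>"sv); }
static bool is_range_last(StringView name) { return name.ends_with(", Last>"sv); }
```

On the "First" entry, the parser remembers the starting code point and
the base name; on the "Last" entry, it emits one range (with its name
formatting pattern) covering every code point in between.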
At the moment we just check if we *can* render a simple triangle; we do
not yet actually test if the image is indeed the triangle we wanted.
When GL_DEBUG is enabled, this test also outputs the rendered image to a
file called "picture.bmp" for manual verification.
Co-authored-by: sunverwerth <s.unverwerth@serenityos.org>
Since we no longer populate a Vector<String>, the lifetime of the
strings in all of these tests is now messed up, as the
Vector<StringView> now points to free'd memory.
We attempt to fix this for the unit tests, by saving the results in a
RAII type that should live as long as the test wants to validate some
output of the ArgParser.
As noted by ECMA-402, if a supported locale contains all of a language,
script, and region subtag, then the implementation must also support the
locale without the script subtag. The most complicated example of this
is the zh-TW locale.
The list of locales in the CLDR database does not include zh-TW or its
maximized zh-Hant-TW variant. Instead, it includes the zh-Hant locale.
However, zh-Hant-TW is listed in the default-content locale list in the
cldr-core package. This defines an alias from zh-Hant-TW to zh-Hant. We
must then also support the zh-Hant-TW alias without the script subtag:
zh-TW. This transitively maps zh-TW to zh-Hant, which is a case quite
heavily tested by test262.
This is a naive implementation based on the symmetry with `asin`.
Before this, I'm not really sure what we were doing, but it was
returning wildly incorrect results.
The initial `ForkStay` is only needed if the looping block has a
following block. If there's no following block, or the following block
does not attempt to match anything, we should not insert the ForkStay;
otherwise we would be rewriting `a+` as `a*` by allowing the 'end' to be
executed.
Fixes #10952.
This isn't a complete conversion to ErrorOr<void>, but a good chunk.
The end goal here is to propagate buffer allocation failures to the
caller, and allow the use of TRY() with formatting functions.
Also add slightly richer parse errors now that we can include a string
literal with returned errors.
This will allow us to use TRY() when working with JSON data.
Currently, we get the following results:
-1 - -2 = -1
-2 - -1 = 1
Correct would be:
-1 - -2 = 1
-2 - -1 = -1
A fix was already attempted in 7ed8970, but that change was incorrect.
This directly translates to LibJS BigInts having the same incorrect
behavior - it was even tested.
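To make the sign rule concrete, here is a hypothetical sketch using
plain machine-word magnitudes (the real code operates on
arbitrary-precision magnitudes): for a - b with both operands negative,
a - b == |b| - |a|, so the result is non-negative exactly when
|b| >= |a|.

```
struct Signed {
    unsigned magnitude;
    bool negative;
};

// Sketch of the both-negative case only; mixed signs reduce to
// addition or to the symmetric case.
Signed subtract_both_negative(Signed a, Signed b)
{
    if (b.magnitude >= a.magnitude)
        return { b.magnitude - a.magnitude, false }; // -1 - -2 -> +1
    return { a.magnitude - b.magnitude, true };      // -2 - -1 -> -1
}
```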
We were passing raw Gfx::Bitmap objects into the various image decoders
instead of encoded image data. This made all of them fail, but the test
expectations were set up in a way that aligned with this outcome.
With this patch, we now test the codecs for real. Except ICO, since we
don't have an ICO file handy. That's a FIXME.
Same as Vector, ByteBuffer now also signals allocation failure by
returning an ENOMEM Error instead of a bool, allowing us to use the
TRY() and MUST() patterns.
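For illustration, the resulting calling pattern looks like this (the
surrounding function is hypothetical; create_zeroed is one of the
ByteBuffer factory methods):

```
#include <AK/ByteBuffer.h>
#include <AK/Error.h>

ErrorOr<ByteBuffer> make_zeroed_buffer(size_t size)
{
    // Allocation failure now surfaces as an ENOMEM Error instead of a
    // bool, so TRY() propagates it straight to our caller.
    auto buffer = TRY(ByteBuffer::create_zeroed(size));
    return buffer;
}
```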
This patch introduces table joins. It uses a pretty dumb algorithm:
starting with a singleton '__unity__' row consisting of a single boolean
value, a cartesian product of all tables in the 'FROM' clause is built.
This cartesian product is then filtered through the 'WHERE' clause,
again without any smarts, just using brute force.
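A simplified sketch of that strategy, with stand-in types (this is not
the actual LibSQL code):

```
#include <AK/Vector.h>

struct Row {
    Vector<int> values; // stand-in for real SQL values

    Row joined_with(Row const& other) const
    {
        Row joined = *this;
        joined.values.extend(other.values);
        return joined;
    }
};

// Seed with the singleton '__unity__' row, fold each 'FROM' table in
// via a cartesian product, then brute-force filter through 'WHERE'.
template<typename Matches>
Vector<Row> join_and_filter(Vector<Vector<Row>> const& from_tables, Matches where_matches)
{
    Vector<Row> product;
    product.append(Row {});

    for (auto& table : from_tables) {
        Vector<Row> next;
        for (auto& partial : product)
            for (auto& row : table)
                next.append(partial.joined_with(row));
        product = move(next);
    }

    Vector<Row> result;
    for (auto& row : product)
        if (where_matches(row))
            result.append(row);
    return result;
}
```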
This patch required a bunch of busy work, for example to allow the
ColumnNameExpression to deal with multiple tables potentially having
columns with the same name.
Because SQL is the craptastic language that it is, sometimes expressions
need to know details about the calling statement. For example the tables
in the 'FROM' clause may be needed to determine which columns are
referenced in 'WHERE' expressions. So the current statement is added
to the ExecutionContext and a new 'execute' overload on Statement is
created which takes the Database and the Statement and builds an
ExecutionContext from those.
These are needed to distinguish columns from different tables with the
same column name in one and the same (joined) Tuple. Not quite happy
yet with this API; I think some sort of hierarchical structure would be
better but we'll burn that bridge when we get there :^)
This file contains the list of locales which default to their parent
locale's values. In the core CLDR dataset, these locales have their own
files, but they are empty (except for identity data). For example:
https://github.com/unicode-org/cldr/blob/main/common/main/en_US.xml
In the JSON export, these files are excluded, so we currently are not
recognizing these locales just by iterating the locale files.
This is a prerequisite for upgrading to CLDR version 40. One of these
default-content locales is the popular "en-US" locale, which defaults to
"en" values. We were previously inferring the existence of this locale
from the "en-US-POSIX" locale (many implementations, including ours,
strip variants such as POSIX). However, v40 removes the "en-US-POSIX"
locale entirely, meaning that without this change, we wouldn't know that
"en-US" exists (we would default to "en").
For more detail on this and other v40 changes, see:
https://cldr.unicode.org/index/downloads/cldr-40#h.nssoo2lq3cba
When I added this code in 1472f6d, I forgot to add tests for it. That's
why I didn't realize that the values were appended to the wrong
FormatBuilder object, so an empty string was returned instead of the
expected "nan"/"inf". This made debugging some FPU issues with the
ScummVM port significantly more difficult.
We create a base class called GenericFramebufferDevice, which defines
all the virtual functions that must be implemented by a
FramebufferDevice. Then, we make the VirtIO FramebufferDevice and other
FramebufferDevice implementations inherit from it.
The most important consequence of rearranging the classes is that we now
have one IOCTL method, so all drivers are committed to not overriding
the IOCTL method or adding their own FramebufferDevice IOCTLs.
All graphical IOCTLs are known to all FramebufferDevices, and it's up to
the specific implementation whether to support them or discard them (so
we require extensive usage of KResult and KResultOr, together with
virtual characteristic functions).
As a result, the interface is much cleaner and easier to read.
Also add a test to prevent this from happening again. There were two
bugs:
* The number of bytes just after processing the last value was written,
instead of the number of bytes after skipping remaining whitespace.
Confirmed by testing against GNU's `scanf()` since the man page
leaves something to be desired.
* The number of bytes was written to the wrong variable argument; i.e.
the first argument was overwritten.
In the long-term, we should probably have a way to signal decoding
failure. For now, it should suffice to at least not crash. This is
particularly relevant because apparently this can be triggered while
parsing a PEM certificate, which happens during every TLS connection.
Found by OSS Fuzz
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=38979
To ensure everything works as expected, a unit test was added with
multiple scenarios.
This binary has to have the SetUID flag, and we also bind-mount the
/usr/Tests directory to allow running of SetUID binaries.
The old versions were renamed to JS_DECLARE_OLD_NATIVE_FUNCTION and
JS_DEFINE_OLD_NATIVE_FUNCTION, and will eventually be removed once all
native functions have been converted to the new format.
This function converts a single wide character into its multibyte
representation (UTF-8 in our case). It is called from libc++'s
`std::basic_ostream<wchar_t>::flush`, which gets called at program exit
from a global destructor in order to flush `std::wcout`.
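This description matches the standard wcrtomb interface; assuming that
is indeed the function in question, the usual calling pattern for
reference:

```
#include <limits.h> // MB_LEN_MAX
#include <wchar.h>

size_t encode_one(wchar_t wc, char buf[MB_LEN_MAX])
{
    mbstate_t state = {};
    // Returns the number of bytes written to buf, or (size_t)-1 for an
    // unrepresentable wide character; UTF-8 needs at most 4 bytes.
    // E.g. L'\x20AC' (EURO SIGN) encodes to 3 bytes.
    return wcrtomb(buf, wc, &state);
}
```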
Consider the situation where two shared libraries, libA and libB, both
depend (as in having a NEEDED dtag) on libC. libA is first dlopen()-ed,
which causes libC to be mapped and linked. When libB is dlopen()-ed, the
DynamicLinker would re-map and re-link libC, causing any previous
references to its old location to become invalid. And if libA's PLT has
been patched to point to libC's symbols, then any further invocations of
libA will cause the code to jump to a virtual address that isn't mapped
anymore, therefore causing a crash. This situation was reported in
#10014, although the setup was more convoluted in the ticket.
This commit fixes the issue by distinguishing between a main program
loading being performed by Loader.so, and a dlopen() call. The main
difference between these two cases is that in the former the
s_globals_objects map is always empty, while in the latter it might
already contain dependencies for the library being dlopen()-ed. Hence,
when collecting dependencies to map and link, dlopen() should skip those
that are present in the global map to avoid the issue described above.
With this patch the original issue seen in #10014 is gone, with all
python3 modules (so far) loading correctly.
A unit test reproducing a simplified issue is also included in this
commit. The unit test includes the building of two dynamic libraries A
and B with both depending on libline.so (and B also depending on A); the
test then dlopen()s libA, invokes one of its functions, then does the same
with libB.
Generate a sorted, compressed series of ranges in a match table for
character classes, and use a binary search to find the matches.
This is about a 3-4x speedup for character class match performance. :^)
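A minimal sketch of the lookup half (the types here are illustrative,
not the actual LibRegex table format): with the ranges sorted by start
and non-overlapping, membership becomes an O(log n) binary search.

```
#include <AK/Types.h>
#include <AK/Vector.h>

struct CharRange {
    u32 from;
    u32 to;
};

static bool char_class_matches(Vector<CharRange> const& ranges, u32 ch)
{
    size_t low = 0;
    size_t high = ranges.size();
    while (low < high) {
        size_t mid = low + (high - low) / 2;
        if (ch < ranges[mid].from)
            high = mid;
        else if (ch > ranges[mid].to)
            low = mid + 1;
        else
            return true; // from <= ch <= to
    }
    return false;
}
```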
The callback should be called as soon as the connection is established,
and if we only set the callback once the connection has already been
established, we expect it to be called immediately.
See #10042 for details. In short: qemu doesn't seem to implement that
feature, therefore the test correctly fails. However, that does not help
us, so we skip that test.
This makes it so we don't need to specify the full path to all the
helper scripts we include() from different places in the codebase and
feels a lot cleaner.
DisjointChunks<T> provides a nice interface over multiple sequential
Vector<T>'s, allowing the user to iterate over/index into/slice from
said buffers as if they were a single contiguous buffer.
To work with views on such objects, DisjointSpans<T> is provided, which
has the same behaviour but does not own the underlying objects.
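An illustrative usage example; the method names are assumed to mirror
Vector's and may differ in detail:

```
#include <AK/DisjointChunks.h>

void example()
{
    DisjointChunks<int> chunks;
    chunks.append({ 1, 2, 3 });
    chunks.append({ 4, 5 });

    // Indexing behaves as if the two Vectors were one contiguous buffer:
    int fourth = chunks.at(3); // 4, served from the second Vector

    // Slicing across chunk boundaries works the same way:
    auto slice = chunks.slice(1, 3); // elements { 2, 3, 4 }
}
```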
This removes the awkward String::replace API which was the only String
API which mutated the String and replaces it with a new immutable
version that returns a new String with the replacements applied. This
also fixes a couple of UAFs that were caused by the use of this API.
As an optimization, an equivalent StringView::replace API was also added
to remove unnecessary String allocations of the form:
`String { view }.replace(...);`
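A before/after sketch of the change (the exact argument shapes are
assumed here):

```
#include <AK/String.h>

String canonicalize(String const& path)
{
    // Before: path.replace("//"sv, "/"sv, true) mutated 'path' in place.
    // Now replace() leaves 'path' alone and returns the new String:
    return path.replace("//"sv, "/"sv, true);
}
```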
Command used:
grep -Pirn '(out|warn)ln\((?!["\)]|format,|stderr,|stdout,|output, ")' \
AK Kernel/ Tests/ Userland/
(Plus some manual reviewing.)
Let's pick ArgsParser as an example:
outln(file, m_general_help);
This will fail at runtime if the general help happens to contain braces.
Even if this transformation turns out to be unnecessary in a place or
two, this way the code is "more obviously" correct.
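The transformation in question, shown on the example above:

```
#include <AK/Format.h>

// The problematic form: a runtime value used as the format string will
// crash if it happens to contain '{' or '}'.
//     outln(file, m_general_help);
//
// The "more obviously correct" form passes it as a format argument:
//     outln(file, "{}", m_general_help);
```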
Commit 890c647e0f fixed an off-by-one bug, so the mapping of the page
at the very end of the user address space now works correctly.
This change adjusts the test to cover the corner cases the original
version was designed to validate.
Previously, LibUnicode would store the values of a keyword as a Vector.
For example, the locale "en-u-ca-abc-def" would have its keyword "ca"
stored as {"abc, "def"}. Then, canonicalization would occur on each of
the elements in that Vector.
This is incorrect because, for example, the keyword value "true" should
only be dropped if that is the entire value. That is, the canonical form
of "en-u-kb-true" is "en-u-kb", but "en-u-kb-abc-true" does not change
for canonicalization. However, we would canonicalize that locale as
"en-u-kb-abc".
This should fix the flaky tests of test-js.
It also fixes the tests when running with the -g flag since the values
will not be garbage collected too soon.
Note that the algorithm in the Unicode spec is for checking that a code
point precedes U+0307, but the special casing condition NotBeforeDot is
interested in the inverse of this rule.
Executing 100 times vs 10 times doesn't increase test coverage
substantially; this API is stable, and more iterations are just a
waste of time.
Without KVM this test is a clear outlier in runtime during CI:
```
START LibC/TestQsort (106/172)
PASS LibC/TestQsort (41.233692s)
```
This avoids a value copy when calling value() or value_or() on a
temporary Optional. This is very common when using the HashMap::get()
API like this:
auto value = hash_map.get(key).value_or(fallback_value);
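A compilable mini-sketch (not AK's actual Optional) of why
ref-qualified overloads avoid the copy: the &&-qualified overload is
selected when the Optional is a temporary, so it may move its value out.

```
#include <utility>

template<typename T>
struct SimpleOptional {
    T stored {};
    bool present { false };

    T value_or(T const& fallback) const&
    {
        return present ? stored : fallback; // lvalue Optional: must copy
    }

    T value_or(T&& fallback) &&
    {
        // Temporary Optional: free to move out of 'stored'.
        return present ? std::move(stored) : std::move(fallback);
    }
};
```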
Our existing implementation did not check the element type of the other
pointer in the constructors and move assignment operators. This meant
that some operations that would require explicit casting on raw pointers
were done implicitly, such as:
- downcasting a base class to a derived class (e.g. `Kernel::Inode` =>
`Kernel::ProcFSDirectoryInode` in Kernel/ProcFS.cpp),
- casting to an unrelated type (e.g. `Promise<bool>` => `Promise<Empty>`
in LibIMAP/Client.cpp)
This, of course, allows gross violations of the type system, and makes
the need to type-check less obvious before downcasting. Luckily, while
adding the `static_ptr_cast`s, only two truly incorrect usages were
found; in the other instances, our casts just needed to be made
explicit.
Using a file(GLOB) to find all the test files in a directory is an easy
hack to get things started, but has some drawbacks. Namely, if you add
a test, it won't be found again without re-running CMake. `ninja` seems
to do this automatically, but it would be nice to one day stop seeing it
rechecking our globbed directories.
Calendar subtags are a bit of an odd man out in that we must match the
variants "ethiopic-amete-alem" in that order, without any other variant
in the locale. So a separate method is needed for this, and we now defer
sorting the variant list until after other canonicalization is done.
Unicode TR35 defines how locale subtag aliases should be emplaced when
converting a locale to canonical form. For most subtags, it is a simple
substitution. Language subtags depend on context; for example, the
language "sh" should become "sr-Latn", but if the original locale has a
script subtag already ("sh-Cyrl"), then only the language subtag of the
alias should be taken ("sr-Latn").
To facilitate this, we now make two passes when canonicalizing a locale.
In the first pass, we convert the LocaleID structure to canonical syntax
(where the conversions all happen in-place). In the second pass, we form
the canonical string based on the canonical syntax.
Originally, it was convenient to store the parsed Unicode locale data as
views into the original string being parsed. But to implement locale
aliases will require mutating the data that was parsed. To prepare for
that, store the parsed data as proper strings.
Some i64 values will not fit in normal doubles, and these values _are_
tested by the test suite; this makes the test runtime capable of
handling them correctly.
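For instance (a property of IEEE 754 doubles, not anything
project-specific), the 53-bit mantissa means 2^53 + 1 is the first
positive integer a double cannot represent:

```
#include <AK/Types.h>

// 2^53 + 1 rounds back down to 2^53 on conversion, so a naive
// i64 -> double -> i64 round trip is lossy for such values.
static_assert(static_cast<double>(i64(9007199254740993)) == 9007199254740992.0);
```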
When swapping the same object, we could end up with a double-free error.
This was found while quick-sorting a Vector of Variants holding complex
types, reproduced by the new swap_same_complex_object test case.
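A minimal sketch of the usual guard, assuming a swap built on move
operations; without it, self-swap reads from a moved-from object and
can double-free owned resources:

```
#include <utility>

template<typename T>
void checked_swap(T& a, T& b)
{
    if (&a == &b)
        return; // swapping an object with itself must be a no-op
    T tmp(std::move(a));
    a = std::move(b);
    b = std::move(tmp);
}
```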
ECMA-402 requires validating user input against the EBNF grammar for
Unicode locales described in TR-35: https://www.unicode.org/reports/tr35
This commit adds validators for that grammar, as well as other helpers
to e.g. canonicalize a locale string.
Since there are no real users of these functions in Serenity's
userland and this is my third attempt at this... This time, the great
LibTest test suite will make sure that I do it right!
Classes reading and writing to the data heap would communicate directly
with the Heap object, and transfer ByteBuffers back and forth with it.
This makes things like caching and locking hard. Therefore all data
persistence activity will be funneled through a Serializer object which
in turn submits it to the Heap.
Introducing this unfortunately resulted in a huge amount of churn, in
which a number of smaller refactorings got caught up as well.
This patch provides very basic, bare bones implementations of the
INSERT and SELECT statements. They are *very* limited:
- The only variant of the INSERT statement that currently works is
INSERT INTO schema.table (column1, column2, ....) VALUES
(value11, value21, ...), (value12, value22, ...), ...
where the values are literals.
- The SELECT statement is even more limited, and is only provided to
allow verification of the INSERT statement. The only form implemented
is: SELECT * FROM schema.table
These statements required a bit of change in the Statement::execute
API. Originally execute only received a Database object as parameter.
This is not enough; we now pass an ExecutionContext object which
contains the Database, the current result set, and the last Tuple read
from the database. This object will undoubtedly evolve over time.
This API change dragged SQLServer::SQLStatement into the patch.
Another API addition is Expression::evaluate. This method is,
unsurprisingly, used to evaluate expressions, like the values in the
INSERT statement.
Finally, a new test file is added: TestSqlStatementExecution, which
tests the currently implemented statements. As the number and flavour of
implemented statements grows, this test file will probably have to be
restructured.
The implementation of the Value class was based on lambda member
variables implementing type-dependent behaviour. This was done to ensure
that Values can be used as stack-only objects; the simplest alternative,
virtual methods, forces them onto the heap. The problem with the lambda
approach is that it bloats the Values (which are supposed to be
lightweight objects) quite considerably, because every object contains
more than a dozen function pointers.
The solution to address both problems (we want Values to be able to live
on the stack and be as lightweight as possible) chosen here is to
encapsulate type-dependent behaviour and state in an implementation
class, and let the Value be an AK::Variant of those implementation
classes. All methods of Value are now basically straight delegates to
the implementation object using the Variant::visit method.
One issue complicating matters is the addition of two aggregate types,
Tuple and Array, which each contain a Vector of Values. At this point
Tuples and Arrays (and potential future aggregate types) can't contain
these aggregate types. This is limiting and needs to be addressed.
Another area that needs attention is the nomenclature of things; it's
a bit of a tangle of 'ValueBlahBlah' and 'ImplBlahBlah'. It makes sense
right now, I think, but I admit we can probably do better.
Other things included here:
- Added the Boolean and Null types (and Tuple and Array, see above).
- to_string now always succeeds and returns a String instead of an
Optional. This had some impact on other sources.
- Added a lot of tests.
- Started moving the serialization mechanism more towards where I want
it to be, i.e. a 'DataSerializer' object which just takes
serialization and deserialization requests and knows for example how
to store long strings out-of-line.
One last remark: There is obviously a naming clash between the Tuple
class and the Tuple Value type. This is intentional; I plan to make the
Tuple class a subclass of Value (and hence Key and Row as well).
For example, consider the following pattern:
new RegExp('\ud834\udf06', 'u')
With this pattern, the regex parser should insert the UTF-8 encoded
bytes 0xf0, 0x9d, 0x8c, and 0x86. However, because these characters are
currently treated as normal char types, they have a negative value since
they are all > 0x7f. Then, due to sign extension, when these characters
are cast to u64, the sign bit is preserved. The result is that these
bytes are inserted as 0xfffffffffffffff0, 0xffffffffffffff9d, etc.
Fortunately, there are only a few places where we insert bytecode with
the raw characters. In these places, be sure to treat the bytes as u8
before they are cast to u64.
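An illustration of the sign extension at fault: 'char' is signed on
most targets, so a raw UTF-8 byte like 0xf0 is negative and widens with
its sign bit.

```
#include <AK/Types.h>

char c = char(0xf0);
u64 wrong = static_cast<u64>(c);                  // 0xfffffffffffffff0
u64 right = static_cast<u64>(static_cast<u8>(c)); // 0x00000000000000f0
```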
Unfortunately, this requires a slight divergence in the way the capture
group names are stored. Previously, the generated byte code would simply
store a view into the regex pattern string, so no string copying was
required.
Now, the escape sequences are decoded into a new string, and a vector
of all parsed capture group names are stored in a vector in the parser
result structure. The byte code then stores a view into the
corresponding string in that vector.
This parsing is already duplicated between LibJS and LibRegex, and will
shortly be needed in more places in those libraries. Move it to AK to
prevent further duplication.
This API will consume escaped Unicode code points of the form:
\\u{code point}
\\unnnn (where each n is a hexadecimal digit)
\\unnnn\\unnnn (where the two escaped values are a surrogate pair)
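For the surrogate-pair form, the combination step is fixed by UTF-16
itself (only the helper name here is illustrative):

```
#include <AK/Types.h>

// high must be in [0xD800, 0xDBFF] and low in [0xDC00, 0xDFFF].
constexpr u32 combine_surrogate_pair(u16 high, u16 low)
{
    return 0x10000u + ((u32(high) - 0xD800u) << 10) + (u32(low) - 0xDC00u);
}
static_assert(combine_surrogate_pair(0xD834, 0xDF06) == 0x1D306);
```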
Currently, when we need to repeat an instruction N times, we simply add
that instruction N times in a for-loop. This doesn't scale well with
extremely large values of N, and ECMA-262 allows up to N = 2^53 - 1.
Instead, add a new REPEAT bytecode operation to defer this loop from the
parser to the runtime executor. This allows the parser to complete sans
any loops (for this instruction), and allows the executor to bail early
if the repeated bytecode fails.
Note: The templated ByteCode methods are to allow the Posix parsers to
continue using u32 because they are limited to N = 2^20.
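A hypothetical sketch of the executor side (these are not LibRegex's
actual bytecode types): the loop moves from parse time to run time, and
fails fast.

```
#include <AK/Types.h>

enum class ExecutionResult {
    Succeeded,
    Failed,
};

struct OpRepeat {
    u64 count; // ECMA-262 allows up to 2^53 - 1
};

template<typename ExecuteBody>
ExecutionResult execute_repeat(OpRepeat const& op, ExecuteBody execute_body)
{
    for (u64 i = 0; i < op.count; ++i) {
        // One pass over the same bytecode fragment per iteration; bail
        // on the first failure instead of having emitted N copies.
        if (execute_body() != ExecutionResult::Succeeded)
            return ExecutionResult::Failed;
    }
    return ExecutionResult::Succeeded;
}
```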
Combining these into one list helps reduce the size of MatchState, and
as a result, reduces the amount of memory consumed during execution of
very large regex matches.
Doing this also allows us to remove a few regex byte code instructions:
ClearNamedCaptureGroup, SaveLeftNamedCaptureGroup, and NamedReference.
Named groups now behave the same as unnamed groups for these operations.
Note that SaveRightNamedCaptureGroup still exists to cache the matched
group name.
This also removes the recursion level from the MatchState, as it can
exist as a local variable in Matcher::execute instead.
The grammar for the ECMA-262 CharacterEscape is:
CharacterEscape[U, N] ::
ControlEscape
c ControlLetter
0 [lookahead ∉ DecimalDigit]
HexEscapeSequence
RegExpUnicodeEscapeSequence[?U]
[~U]LegacyOctalEscapeSequence
IdentityEscape[?U, ?N]
It's important to parse the standalone "\0 [lookahead ∉ DecimalDigit]"
before parsing LegacyOctalEscapeSequence. Otherwise, all standalone "\0"
patterns are parsed as octal, which are disallowed in Unicode mode.
Further, LegacyOctalEscapeSequence should also be parsed while parsing
character classes.
A subsequent commit will add tests that require a string containing only
"\0". As a C-string, this will be interpreted as the null terminator. To
make the diff for that commit easier to grok, this commit converts all
tests to use StringView without any other functional changes.
* Only alphabetic (A-Z, a-z) characters may be escaped with \c. The loop
currently parsing \c includes code points between the upper/lower case
groups; a sketch of the corrected check follows this list.
* In Unicode mode, all invalid identity escapes should cause a parser
error, even in browser-extended mode.
* Avoid an infinite loop when parsing the pattern "\c" on its own.
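The corrected check from the first bullet, as a sketch (the helper name
is illustrative):

```
// ASCII puts '[', '\\', ']', '^' and '_' between 'Z' (0x5A) and
// 'a' (0x61); the old loop accepted those too.
static bool is_valid_control_escape(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}
```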
Similarly to the LibCpp parser regression tests, these tests run the
preprocessor on the .cpp test files under
Userland/LibCpp/Tests/preprocessor, and compare the output with existing
.txt ground truth files.
These script extensions have some peculiar behavior in the Unicode spec.
The UCD ScriptExtension file does not contain these scripts. Rather, it
is implied the code points which have these scripts as an extension are
the code points that both:
1. Have Common or Inherited as their primary script value
2. Do not have any other script value in their script extension lists
Because these are not explicitly listed in the UCD, we must manually form
these script extensions.
Notice that unlike the note in populate_general_category_unions(),
script extensions do indeed have code point ranges which overlap. Thus,
this commit adds code to handle that, and hooks it into the GC unions.
Previously, each code point's General Category was part of the generated
UnicodeData structure. This ultimately presented two problems, one
functional and one performance related:
* Some General Categories are applied to unassigned code points, for
example the Unassigned (Cn) category. Unassigned code points are
strictly excluded from UnicodeData.txt, so by relying on that file,
the generator is unable to handle these categories.
* Lookups for General Categories are slower when searching through the
large UnicodeData hash map. Even though lookups are O(1), the hash
function turned out to be slower than binary searching through a
category-specific table.
So, now a table is generated for each General Category. When querying a
code point for a category, a binary search is done on each code point
range in that category's table to check if the code point has that
category.
Further, General Categories are now parsed from the UCD file
DerivedGeneralCategory.txt. This file is a normal "prop list" file and
contains the categories for unassigned code points.
There seems to be more incorrect assumptions about Clang-built
executables' memory layout than expected. These make the CI fail even
though the system is functional in all other aspects. While this is
being fixed, let's just disable tests for UserspaceEmulator.
This helps us avoid weird truncation issues and fixes a bug on Clang
builds where truncation while reading caused the DIE offsets following
large LEB128 numbers to be incorrect. This removes the need for the
separate `LongUnsignedNumber` type.
We now call Preprocessor::process_and_lex() and pass the result to the
parser.
Doing the lexing in the preprocessor will allow us to maintain the
original position information of tokens after substituting definitions.
Improve the parsing of data URLs in URLParser to bring it more up-to-
spec. At the moment, we cannot parse the components of the MIME type
since it is represented as a string, but the spec requires it to be
parsed as a "MIME type record".
This is a regression test to validate the functionality that was
reported broken in #9071, where the kernel would spin attempting
to cancel a stale timer.
Before now, only binary properties could be parsed. Non-binary props are
of the form "Type=Value", where "Type" may be General_Category, Script,
or Script_Extension (or their aliases). Of these, LibUnicode currently
supports General_Category, so LibRegex can parse only that type.
This changes LibRegex to parse the property escape as a Variant of
Unicode Property & General Category values. A byte code instruction is
added to perform matching based on General Category values.
The following commit broke Tests/AK/TestJSON.cpp as it removed the
file that the test loaded from disk to validate JSON parsing.
commit ad141a2286
Author: Andreas Kling <kling@serenityos.org>
Date: Sat Jul 31 15:26:14 2021 +0200
Base: Remove "test.frm" from HackStudio test project
Instead of restoring the file, let's just embed a bit of JSON in the
test case to avoid using external resources, as they are obviously
surprising and make the test less portable across environments.
This supports some binary property matching. It does not support any
properties not yet parsed by LibUnicode, nor does it support value
matching (such as Script_Extensions=Latin).
Previously, unmapping any offset starting at 0x0 would assert in the
kernel. Add a regression test to validate the fix.
Co-authored-by: Federico Guerinoni <guerinoni.federico@gmail.com>
Apparently, some code points fit both categories, for example U+0345
(COMBINING GREEK YPOGEGRAMMENI). Handle this fact when determining if
a code point is a final code point in a string.
This implements unconditional special case folding, and conditional
folding for non-locale cases. Worth noting that the only conditional,
non-locale special case is for converting an uppercase sigma to
lowercase.
The Unicode standard publishes the Unicode Character Database (UCD) with
information about every code point, such as each code point's upper case
mapping. LibUnicode exists to download and parse UCD files at build time
and to provide accessors to that data.
As a start, LibUnicode includes upper- and lower-case code point
converters.
Previously there was no way to create a MACAddress by passing a direct
address as a string. This will allow programs like the arp utility to
create a MACAddress instance from user-passed addresses.
When the Unicode flag is set, regular expressions may escape code points
by surrounding the hexadecimal code point with curly braces, e.g. \u{41}
is the character "A".
When the Unicode flag is not set, this should be considered a repetition
symbol - \u{41} is the character "u" repeated 41 times. This is left as
a TODO for now.
When the Unicode option is not set, regular expressions should match
based on code units; when it is set, they should match based on code
points. To do so, the regex parser must combine surrogate pairs when
the Unicode option is set. Further, RegexStringView needs to know if
the flag is set in order to return code point vs. code unit based
string lengths and substrings.
This is a generally nicer-to-use version of the existing {any,all}_of()
that doesn't require the user to explicitly provide two iterators.
As a bonus, it also allows arbitrary iterators (as opposed to the hard
requirement of providing SimpleIterators in the iterator version).
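For illustration, both forms side by side (the container overload is
the one added here):

```
#include <AK/AllOf.h>
#include <AK/Vector.h>

bool all_even(Vector<int> const& numbers)
{
    // The iterator form still works:
    //     all_of(numbers.begin(), numbers.end(), [](int n) { return n % 2 == 0; });
    // The new container overload is equivalent and nicer to use:
    return all_of(numbers, [](int n) { return n % 2 == 0; });
}
```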
The state of the formatter for the previous element should be thrown
away for each iteration. This showed up when trying to format a
Vector<String>, since Formatter<StringView> was unhappy about some state
that gets set when it's called. Add a test for Formatter<Vector>.
In a recent commit, the 64-bit kernel was moved to a different address,
which broke this test unnoticed. This fixes it, so we can turn breaking
x86_64 tests back on in the CI.
This commit makes LibRegex (mostly) capable of operating on any of
the three main string views:
- StringView for raw strings
- Utf8View for utf-8 encoded strings
- Utf32View for raw unicode strings
As a result, regexps with unicode strings should be able to properly
handle utf-8 and not stop in the middle of a code point.
A future commit will update LibJS to use the correct type of string
depending on the flags.
While trying to port to Clang we found that the functions as
implemented didn't actually work, and replacing them with a blatantly
broken function also did not break the tests on the GCC build. It
turns out many of our tests were exercising GCC's builtins. This
removes the use of builtins for LibM's tests (so we test the whole
function). It turns off the denormal test for scalbn (which was not
implemented) and comments out the tgamma(0.5) test which is too
inaccurate to be usable (and too complicated for me to fix). The gamma
function was made accurate for all other test cases, and asin received
two more layers of Taylor expansion to bring it within error margin
for the tests.
Previously, HTMLToken would expose the Vector<Attribute> directly to
its users. In preparation for a future change, all users now use
implementation-agnostic APIs which do not expose the Vector directly.
* wasm: Don't try to print the function results if it traps
* LibWasm: Inline some very hot functions
These are mostly pretty small functions too, and they were about ~10%
of runtime.
* LibWasm+Everywhere: Make the instruction count limit configurable
...and enable it for LibWeb and test-wasm.
Note that `wasm` will not be limited by this.
* LibWasm: Remove a useless use of ScopeGuard
There are no multiple exit paths in that function, so we can just put
the ending logic right at the end of the function instead.
Prior to this, it'd try to stuff them into an i64, which could fail and
give us nothing.
Even though this is an extension we've made to JSON, the parser should
be able to correctly round-trip from whatever our serialiser has
generated.
This fixes parsing the following regular expression: /</g;
It also adds a simple script element to the HTMLTokenizer regression
test, which also contains that specific regex.
The test suite includes a few basic tests and a very crude regression
test, which just concatenates the to_string() of all tokens and checks
the String's hash to be equal. This relies on the format of
HTMLToken::to_string() to stay the same, which is not ideal.
Since Clang enables a couple of warnings that we don't have in GCC,
these were not caught before. Included fixes:
- Use correct printf format string for `size_t`
- Don't compare `Nonnull(Ref|Own)Ptr` to nullptr
- Fix unsigned int& => unsigned long& conversion
This test exposed a kernel panic in is_user_range calculations, so let's
convert it to be a LibTest test so we can prevent regressions in mmap,
the page allocator, and the memory manager.
Let's bring this class back, but without the confusing resize() API.
A FixedArray<T> is simply a fixed-size array of T.
The size is provided at run-time, unlike Array<T> where the size is
provided at compile-time.
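A usage sketch (the constructor shape is assumed):

```
#include <AK/FixedArray.h>

void example(size_t sample_count)
{
    // The size is chosen at run time and fixed from then on: no
    // resize() like Vector, and no compile-time size like Array.
    FixedArray<int> samples(sample_count);
    for (size_t i = 0; i < samples.size(); ++i)
        samples[i] = 0;
}
```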
This patch introduces the SQLServer system server. This service is
supposed to be the only process/application talking to database storage.
This makes things like locking and caching more reliable, easier to
implement, and more efficient.
In LibSQL we added a client component that does the ugly IPC nitty-
gritty for you. All that's needed is setting a number of event handler
lambdas and you can connect to databases and execute statements on them.
Applications that wish to use this SQLClient class obviously need to
link LibSQL and LibIPC.
The Order enum is used in the Meta component of LibSQL. Using this enum
meant having to include the monster AST/AST.h include file. Furthermore,
they are sort of basic and therefore can live in the general SQL
namespace. Moved to LibSQL/Type.h.
Also introduced a new class, SQLResult, which is needed in future
patches.
Now that the test is converted to be LibTest based, we can remove it
from the exclude list in /home/anon/.config/Tests.ini.
Prior to this it would crash and fail because it was signaled instead of
returning normally with exit code 0.
After the changes to LibGfx to make default font management handled in
WindowServer instead of each GUI application to allow for global font
broadcasts, the two LibGfx tests broke. The non-benchmark was fixed in
8f96d2, but the benchmark was left in the dust because nobody really
runs it manually :^(
These are usually incorrect, and people sometimes forget to add the
correct values as a result of them being optional, so they should just
be specified explicitly.
This adds a test case for String::find and String::find_all with empty
needles. The expected behavior is in line with what the C++ standard
library (and other languages' standard libraries) do.
This implements StringUtils::find_any_of() and uses it in
String::find_any_of() and StringView::find_any_of(). All uses of
find_{first,last}_of have been replaced with find_any_of(), find() or
find_last(). find_{first,last}_of have subsequently been removed.
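An illustrative call (the search-direction enum name and parameter
shape are assumed here):

```
#include <AK/String.h>

void example()
{
    String path = "/home/anon/file.txt";

    // First '.' or '/' from the left:
    auto first = path.find_any_of("./"sv, SearchDirection::Forward);

    // Last '.' or '/' from the right:
    auto last = path.find_any_of("./"sv, SearchDirection::Backward);
}
```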
This removes StringView::find_first_of(char) and find_last_of(char) and
replaces all its usages with find and find_last respectively. This is
because those two methods are functionally equivalent.
find_{first,last}_of should only be used if searching for multiple
different characters, which is never the case with the char argument.
This also adds the [[nodiscard]] to the remaining find_{first,last}_of
methods.
This replaces the current LexicalPath::append() API with a new method
that returns a new LexicalPath object and doesn't touch the this-object.
With this, LexicalPath is now immutable. It also adds a
LexicalPath::parent() method and the relevant test cases.
Since this is always set to true in the non-default constructor and
subsequently never modified, it is somewhat pointless. Furthermore,
there are arguably no invalid relative paths.
Split out the functionality to gather multiple tests from the filesystem
and run them in turn into Test::TestRunner, and leave the JavaScript
specific test harness logic in Test::JS::TestRunner and friends.
If someone runs the test with shell redirection going on, or in a way
that changes any of the standard file descriptors, this assumption will
not hold. When running normally from a terminal, however, it is true.
Instead, check that /proc/self/fd/[0,1,2] are symlinks, and can be
stat-d by verifying that both stat and lstat succeed, and give different
struct stat contents.