Currently, the unique string lists are stored in the initialized data
sections of their shared libraries. In order to move the data to the
read-only section, generate the strings using RLE arrays.
We generate two arrays: the first is the RLE data itself, the second is
a list of indices into the RLE array for each string. We then generate a
decoding method to convert an RLE string to a StringView.
This isn't called out in TR-35, but before ICU even looks at CLDR data,
it adds a hard-coded set of default patterns to each locale's calendar.
It has done this since 2006 when its DateTimeFormat feature was first
created. Several test262 tests depend on this, which under ECMA-402,
falls into "implementation defined" behavior. For compatibility, we
can do the same in LibUnicode.
In the generated unique string list, index 0 is the empty string, and is
used to indicate a value doesn't exist in the CLDR. Check for this
before returning an empty calendar symbol.
For example, an upcoming commit will add the fixed day period "noon",
which not all locales support.
Commit ec7d535 only partially handled the case of flexible day periods
rolling over midnight, in that it only worked for hours after midnight.
For example, the en locale defines a day period range of [21:00, 06:00).
The previous method of adding 24 hours to the given hour would change
e.g. 23:00 to 47:00, which isn't valid.
This is a start to properly letting us cross-compile Lagom where both
the Tools and the BUILD_LAGOM=ON build are using Lagom CMakeLists.
The initial cut allows an Android build to succeed, more or less.
But there are issues with namespace clashes when using FetchContent with
this approach.
When patterns, grouping digits, symbols, etc. for a requested numbering
system are not found, use the locale's default numbering system. This
will allow using the correct digits e.g. for the locale "en-u-nu-arab"
even though the "en" locale only contains patterns for the "latn"
numbering system.
Replacement conditions for `requires_argument` have been chosen based
on what would be most convenient for implementing an eventual optional
argument mode.
While null StringViews are just as bad, these prevent the removal of
StringView(char const*) as that constructor accepts a nullptr.
No functional changes.
This prevents us from needing a sv suffix, and potentially reduces the
need to run generic code for a single character (as contains,
starts_with, ends_with etc. for a char will be just a length and
equality check).
No functional changes.
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).
No functional changes.
Error::from_string_literal now takes direct char const*s, while
Error::from_string_view does what Error::from_string_literal used to do:
taking StringViews. This change will remove the need to insert `sv`
after error strings when returning string literal errors once
StringView(char const*) is removed.
No functional changes.
This commit moves the length calculations out to be directly on the
StringView users. This is an important step towards the goal of removing
StringView(char const*), as it moves the responsibility of calculating
the size of the string to the user of the StringView (which will prevent
naive uses causing OOB access).
These are mostly minor mistakes I've encountered while working on the
removal of StringView(char const*). The usage of builder.put_string over
Format<FormatString>::format is preferrable as it will avoid the
indirection altogether when there's no formatting to be done. Similarly,
there is no need to do format(builder, "{}", number) when
builder.put_u64(number) works equally well.
Additionally a few Strings where only constant strings were used are
replaced with StringViews.
To prepare for using plural rules within number & duration format, this
removes the NumberFormat::Plurality enumeration.
This also adds PluralCategory::ExactlyZero & PluralCategory::ExactlyOne.
These are used in locales like French, where PluralCategory::One really
means any value from 0.00 to 1.99. PluralCategory::ExactlyOne means only
the value 1, as the name implies. These exact rules are not known by the
general plural rules, they are explicitly for number / currency format.
The PluralCategory enum is currently generated for plural rules. Instead
of generating it, this moves the enum to the public LibUnicode header.
While it was nice to auto-discover these values, they are well defined
by TR-35, and we will need their values from within the number format
code generator (which can't rely on the plural rules generator having
run yet). Further, number format will require additional values in the
enum that plural rules doesn't know about.
This patch adds support for URLSearchParams to XHR::send() and
introduces the union type XMLHttpRequestBodyInit.
XHR::send() now has support for String and URLSearchParams.
Plural rules in the CLDR are of the form:
"cs": {
"pluralRule-count-one": "i = 1 and v = 0 @integer 1",
"pluralRule-count-few": "i = 2..4 and v = 0 @integer 2~4",
"pluralRule-count-many": "v != 0 @decimal 0.0~1.5, 10.0, 100.0 ...",
"pluralRule-count-other": "@integer 0, 5~19, 100, 1000, 10000 ..."
}
The syntax is described here:
https://unicode.org/reports/tr35/tr35-numbers.html#Plural_rules_syntax
There are up to 2 sets of rules for each locale, a cardinal set and an
ordinal set. The approach here is to generate a C++ function for each
set of rules. Each condition in the rules (e.g. "i = 1 and v = 0") is
transpiled to a C++ if-statement within its function. Then lookup tables
are generated to match locales to their generated functions.
NOTE: -Wno-parentheses-equality is added to the LibUnicodeData compile
flags because the generated plural rules have lots of extra parentheses
(because e.g. we need to selectively negate and combine rules). The code
to generate only exactly the right number of parentheses is quite hairy,
so this just tells the compiler to ignore the extras.
This includes:
* The minimum number of days in a week for that week to count as the
first week of a new year.
* The day to be shown as the first day of the week in a calendar.
* The start/end days of the weekend.
Like the existing hour cycle data, week data is presented per-region in
the CLDR, rather than per-locale. The method to add likely subtags to a
locale to perform region lookups is the same.
The list of regions in the CLDR for hour cycle, minimum days, first day,
and weekend days are quite different. So rather than changing the
existing HourCycleRegion enum to a generic Region enum, we generate
separate enums for each of the week data fields. This allows each lookup
into these fields to remain simple array-based index access, without any
"jumps" for regions that don't have CLDR data for a field.
Currently contains just each locale's character order, but is set up to
easily add other text layout fields from the CLDR if ECMA-402 eventually
requires them.
The zone1970.tab file in the TZDB contains regional time zone data, some
of which we already parse for the system time zone settings map.
This parses the region names from that file and generates a list of time
zones which are used in each of those regions.
Add overrides for serenity_bin and serenity_lib to allow the actual
CMakeLists.txt from Userland to be used to build as many services as
possible without adding more clutter to Meta/Lagom/CMakeLists.txt
This matches the target names for the main serenity build, and will make
simplifying the Lagom build much easier going forward.
The LagomFoo name came from a time when we had both library builds in
the same CMake generated project and needed to deconflict the names.
This commit has no behavior changes.
In particular, this does not fix any of the wrong uses of the previous
default parameter (which used to be 'false', meaning "only replace the
first occurence in the string"). It simply replaces the default uses by
String::replace(..., ReplaceMode::FirstOnly), leaving them incorrect.
The main thing that is missing is validating certain pieces of data
against XML productions in well-formed mode, but nothing uses
well-formed mode right now.
Required by Closure Library for sanitising HTML.
e687b3d8ab/closure/goog/html/sanitizer/safedomtreeprocessor.js (L117)
I saw one site relying on this, where they are trying to set
XHR.responseType to "text/plain", which is not a valid responseType.
However, they also don't expect it to throw. The IDL spec special cases
enumerations to make it return instead of throwing in this case.
Previously we only threw an error if the enum was used as a function
argument. However, we are supposed to throw an error no matter the
context it is used in.
With the compilation of LibWeb, there's now quite a few cases where this
warning gets triggered. Rather than trying to fix them all right away,
we simply disable the warning for now.
This workaround was proposed by Andrew Kaster and BertalanD who promised
to open an issue about it!
By using RefPtrs to handle interfaces, the IDL parser could store cyclic
references to interfaces that import each other. One main example is the
"EventTarget.idl" and the "AbortSignal.idl" files, which both reference
each other. This caused huge amounts of memory not to be freed on exit.
To fix this, the parsed IDL interfaces are now stored in a HashTable of
NonnullOwnPtr<Interface>, which serves as the sole reference for every
parsed interface. All other usages of the Interface are changed to use
references instead of RefPtrs, or occasionally as raw pointers where
references don't fit inside the data structures.
This new HashTable is static, and as such will automatically be freed
prior to exiting the generator. This ensures that the code generator
properly cleans up after itself.
With this change, The IDL code generators can properly run on Lagom when
compiled with the -DENABLE_ADDRESS_SANITIZER=ON flag, and gets compiled
properly on the CI :^)
Previously, we were sending Buffers to the server whenever we had new
audio data for it. This meant that for every audio enqueue action, we
needed to create a new shared memory anonymous buffer, send that
buffer's file descriptor over IPC (+recfd on the other side) and then
map the buffer into the audio server's memory to be able to play it.
This was fine for sending large chunks of audio data, like when playing
existing audio files. However, in the future we want to move to
real-time audio in some applications like Piano. This means that the
size of buffers that are sent need to be very small, as just the size of
a buffer itself is part of the audio latency. If we were to try
real-time audio with the existing system, we would run into problems
really quickly. Dealing with a continuous stream of new anonymous files
like the current audio system is rather expensive, as we need Kernel
help in multiple places. Additionally, every enqueue incurs an IPC call,
which are not optimized for >1000 calls/second (which would be needed
for real-time audio with buffer sizes of ~40 samples). So a fundamental
change in how we handle audio sending in userspace is necessary.
This commit moves the audio sending system onto a shared single producer
circular queue (SSPCQ) (introduced with one of the previous commits).
This queue is intended to live in shared memory and be accessed by
multiple processes at the same time. It was specifically written to
support the audio sending case, so e.g. it only supports a single
producer (the audio client). Now, audio sending follows these general
steps:
- The audio client connects to the audio server.
- The audio client creates a SSPCQ in shared memory.
- The audio client sends the SSPCQ's file descriptor to the audio server
with the set_buffer() IPC call.
- The audio server receives the SSPCQ and maps it.
- The audio client signals start of playback with start_playback().
- At the same time:
- The audio client writes its audio data into the shared-memory queue.
- The audio server reads audio data from the shared-memory queue(s).
Both sides have additional before-queue/after-queue buffers, depending
on the exact application.
- Pausing playback is just an IPC call, nothing happens to the buffer
except that the server stops reading from it until playback is
resumed.
- Muting has nothing to do with whether audio data is read or not.
- When the connection closes, the queues are unmapped on both sides.
This should already improve audio playback performance in a bunch of
places.
Implementation & commit notes:
- Audio loaders don't create LegacyBuffers anymore. LegacyBuffer is kept
for WavLoader, see previous commit message.
- Most intra-process audio data passing is done with FixedArray<Sample>
or Vector<Sample>.
- Improvements to most audio-enqueuing applications. (If necessary I can
try to extract some of the aplay improvements.)
- New APIs on LibAudio/ClientConnection which allows non-realtime
applications to enqueue audio in big chunks like before.
- Removal of status APIs from the audio server connection for
information that can be directly obtained from the shared queue.
- Split the pause playback API into two APIs with more intuitive names.
I know this is a large commit, and you can kinda tell from the commit
message. It's basically impossible to break this up without hacks, so
please forgive me. These are some of the best changes to the audio
subsystem and I hope that that makes up for this :yaktangle: commit.
:yakring:
Each LibGL test can now be tested against a reference QOI image.
Initially, these images can be generated by setting `SAVE_OUTPUT` to
`true`, which will save a bunch of QOI images to `/home/anon`.
Similar reasoning to making Core::Stream::read() return Bytes, except
that every user of read_line() creates a StringView from the result, so
let's just return one right away.
A mistake I've repeatedly made is along these lines:
```c++
auto nread = TRY(source_file->read(buffer));
TRY(destination_file->write(buffer));
```
It's a little clunky to have to create a Bytes or StringView from the
buffer's data pointer and the nread, and easy to forget and just use
the buffer. So, this patch changes the read() function to return a
Bytes of the data that were just read.
The other read_foo() methods will be modified in the same way in
subsequent commits.
Fixes#13687
Alias values are represented by "alias-name=real-name".
We have a lot of repetitive code for converting between ValueID and
property-specific enums. Let's see if we can generate it. :^)
This first step just produces the enums, from a JSON file. The values in
there are a duplication of what's in Properties.json, but eventually
those will go away.
The goal here is to move the parser-internal classes into this namespace
so they can have more convenient names without causing collisions. The
Parser itself won't collide, and would be more convenient to just
remain `CSS::Parser`, but having a namespace and a class with the same
name makes C++ unhappy.
This adds the is_primitive() method as described in the Web IDL
specification. is_primitive() returns true if the type is a bigint,
boolean or numeric type.
This is in Tests/LibTTF instead of Tests/LibGfx because Tests/LibGfx
depends on serenity's file system layout and can't run in lagom,
but this new test runs just fine in lagom.
This is an editorial change in the Intl spec. See:
https://github.com/tc39/ecma402/commit/087995chttps://github.com/tc39/ecma402/commit/233d29c
This also adds a missing spec link for the sanctioned units and fixes a
broken spec link for IsSanctionedSingleUnitIdentifier. In LibUnicode,
the NumberFormat generator is updated to use the constexpr helper to
retrieve sanctioned units.
Currently this can parse XML and resolve external resources/references,
and read a DTD (but not apply or verify its rules).
That's good enough for _most_ XHTML documents as the HTML 5 spec
enforces its own rules about document well-formedness, and does not make
use of XML DTDs (aside from a list of predefined entities).
An accompanying `xml` utility is provided that can read and dump XML
documents, and can also run the XML conformance test suite.
This does a few things in total:
* Ports the IPC-compiler to LibMain
* Extract some compiler steps into separate functions
* Minify some appends to use appendln (or appendff in the case of
StringBuilder)
This reduces the clang-tidies maximum cognitive-complexity score for
this file from 325 to under 100.
We did already have range checking for the `<integer>` and `<number>`
types, but this patch adds this functionality to all numeric types
(dimensions and percentages).
The syntax in Properties.json is taken from the spec:
https://www.w3.org/TR/css-values-3/#numeric-ranges
eg, `length [0,∞]` defines that a Length is allowed as long as it has a
positive value.
The implementation here allows for any number to be the positive or
negative limit, even though only 0 and positive/negative infinity are
meaningful values without a unit.
This is a bit strange in the IDL syntax, but e.g., in HTMLSelectElement,
we have (simplified):
undefined add(optional (HTMLElement or long)? before = null)
This could instead become:
undefined add(optional (HTMLElement or long) before)
This change generates code for the former as if it were the latter.
I came across some websites that change an elements CSS "opacity" in
their :hover selectors. That caused us to relayout on hover, which we'd
like to avoid.
With this patch, we now check if a property only affects the stacking
context tree, and if nothing layout-affecting has changed, we only
invalidate the stacking context tree, causing it to be rebuilt on next
paint or hit test.
This makes :hover { opacity: ... } rules much faster. :^)
When building on an arm host system, char defaults to unsigned,
leading to errors such as:
serenity/AK/StringBuilder.cpp:198:20:
error: comparison is always true due to limited range of data type
[-Werror=type-limits]
198 | if (ch >= 0 && ch <= 0x1f)
|
Building with -fsigned-char makes things work like on Intel, and
it's what we already do in Kernel/CMakeLists.txt for the same reasons.
We have seen some cases where the build fails for folks, and they are
missing unzip/tar/gzip etc. We can catch some of these in CMake itself,
so lets make sure to handle that uniformly across the build system.
The REQUIRED flag to `find_program` was only added on in CMake 3.18 and
above, so we can't rely on that to actually halt the program execution.
Day and month name constants are defined in numerous places. This
pulls them together into a single place and eliminates the
duplication. It also ensures they are `constexpr`.
This patch adds CSS::property_affects_layout(PropertyID) which tells us
whether a CSS property would affect layout if it were changed.
This will be used to avoid unnecessary relayout work when something
changes that really only requires us to repaint the page.
To mark a property as not affecting layout, set "affects-layout" to
false in the corresponding Properties.json entry. Note that all
properties affect layout by default.
Since we want to store an initial value for every CSS::PropertyID,
it's pretty silly to use a HashMap when we can use an Array.
This takes the function from ~2.8% when mousing around on GitHub all the
way down to ~0.6%. :^)
These work differently from how we validate StyleValues. There, we parse
a StyleValue from the CSS, and then see if it is allowed in the
property. That causes problems when the syntax is ambiguous - for
example, `0` can be a number or a Length.
Here instead, we ask what kinds of value are allowed for a
media-feature, and then only attempt to parse those kinds of value.
This makes the ambiguity problem go away. :^)
Each media-feature in the spec only accepts one type of value, and/or
some identifiers. This makes the switch statements for the type a bit
excessive, but the spec does not *require* that only one type is
allowed, so this is more future-proof.
This works largely the same as the PropertyID and ValueID generators,
but using LibMain, Core::Stream, and TRY().
Rather than have a MediaFeatureID::Invalid, I decided to return an
Optional. We'll see if that turns out better or not. :^)
This patch adds NodeIterator (created via Document.createNodeIterator())
which allows you to iterate through all the nodes in a subtree while
filtering with a provided NodeFilter callback along the way.
This first cut implements the full API, but does not yet handle nodes
being removed from the document while referenced by the iterator. That
will be done in a subsequent patch.
This initial version lays down the basic foundation of IDL overload
resolution, but much of it will have to be replaced with the actual IDL
overload resolution algorithms once we start implementing more complex
IDL overloading scenarios.
This allows us to fuzz the generated unicode and timezone database
helpers, and to fuzz things like LibJS using Fuzzilli to get proper
coverage of our unicode handling code.
Update the Azure CI to use the new two-stage build as well, and cleanup
some unused CMake options there.
WebSockets got moved from the HTML standard to their own, the new
WebSockets Standard (https://websockets.spec.whatwg.org).
Move the IDL file and implementation into a new WebSockets directory and
C++ namespace accordingly.
The single 4000-line WrapperGenerator.cpp file was proving to be a pain
to hack, and was filled with spaghetti, split it into a bunch of files
to lessen the impact of the spaghetti.
Also refactor the whole parser to use a class instead of a giant
function with a million lambdas.
I've attempted to handle the errors gracefully where it was clear how to
do so, and simple, but a lot of this was just adding
`release_value_but_fixme_should_propagate_errors()` in places.
I can't imagine how this happened, but it seems we've managed to
conflate the "event listener" and "EventListener" concepts from the DOM
specification in some parts of the code.
We previously had two things:
- DOM::EventListener
- DOM::EventTarget::EventListenerRegistration
DOM::EventListener was roughly the "EventListener" IDL type,
and DOM::EventTarget::EventListenerRegistration was roughly the "event
listener" concept. However, they were used interchangeably (and
incorrectly!) in many places.
After this patch, we now have:
- DOM::IDLEventListener
- DOM::DOMEventListener
DOM::IDLEventListener is the "EventListener" IDL type,
and DOM::DOMEventListener is the "event listener" concept.
This patch also updates the addEventListener() and removeEventListener()
functions to follow the spec more closely, along with the "inner invoke"
function in our EventDispatcher.
We no longer include all the things, so each generated IDL file only
depends on the things it actually needs now.
A possible downside is that all IDL files have to explicitly import
their dependencies.
Note that non-IDL dependencies still remain and are injected into all
generated files, this can be resolved later if desired by allowing IDL
files to import headers.
BCP 47 will be the single source of truth for known calendar and number
system keywords, and their aliases (e.g. "gregory" is an alias for
"gregorian"). Move the generation of available keywords to where we
parse the BCP 47 data, so that hard-coded aliases may be removed from
other generators.
We have a fair amount of hard-coded keywords / aliases that can now be
replaced with real data from BCP 47. As a result, the also changes the
awkward way we were previously generating keys. Before, we were more or
less generating keywords as a CSV list of keys, e.g. for the "nu" key,
we'd generate "latn,arab,grek" (ordered by locale preference). Then at
runtime, we'd split on the comma. We now just generate spans of keywords
directly.
This package was originally meant to be included in CLDR version 40, but
was missed in their release scripts. This has been resolved:
https://unicode-org.atlassian.net/browse/CLDR-15158
Unfortunately, the CLDR was re-released with the same version number. So
to bust the build's CLDR cache, change the "version" used to detect that
we need to redownload the CLDR.
This is no longer needed as BrowsingContextContainer::content_document()
now does the right thing, and HTMLIFrameElement.contentDocument is the
only user of this attribute. Let's not invent our own mechanisms for
things that are important to get right, like same origin comparisons.
The spec version of canonical_numeric_index_string is absurdly complex,
and ends up converting from a string to a number, and then back again
which is both slow and also requires a few allocations and a string
compare.
Instead this patch moves away from using Values to represent canonical
a canonical index. In most cases all we need to know is whether a
PropertyKey is an integer between 0 and 2^^32-2, which we already
compute when we construct a PropertyKey so the existing is_number()
check is sufficient.
The more expensive case is handling strings containing numbers that
don't roundtrip through string conversion. In most cases these turn
into regular string properties, but for TypedArray access these
property names are not treated as normal named properties.
TypedArrays treat these numeric properties as magic indexes that are
ignored on read and are not stored (but are evaluated) on assignment.
For that reason there's now a mode flag on canonical_numeric_index_string
so that only TypedArrays take the cost of the ToString round trip test.
In order to improve the performance of this path this patch includes
some early returns to avoid conversion in cases where we can quickly
know whether a property can round trip.
This adds a generator utility to read an entire file and parse it as a
JSON value. This is heavily used by the CLDR generators. The idea here
is to put the file reading details in the utility so that when we have a
good story for generically reading an entire stream in LibCore, we can
update the generators to use that by only touching this helper.
This also moves the open_file helper to the utility file. It's currently
a lambda redefined in each TZDB/Unicode generator. It used to display
the missing command line flag and other info local to each generator.
After switching to LibMain, it just returns a generic error message, and
is duplicated several times.
This reverts commit 3a184f7841.
This broke a number of test262 tests under "TypedArrayConstructors".
The issue is that the CanonicalNumericIndexString AO should not fail
for inputs like "1.1", despite them not being integral indices.
The spec version of canonical_numeric_index_string is absurdly complex,
and ends up converting from a string to a number, and then back again
which is both slow and also requires a few allocations and a string
compare.
Instead lets use the logic we already have as that is much more
efficient.
This improves performance of all non-numeric property names.
This initial implementation stubs out the WorkerGlobalScope,
WorkerLocation and WorkerNavigator classes. It doesn't take into account
all the things that actually need passed into the constructors for these
objects, nor the extra abstract operations that need to be performed on
them by the rest of the Browser infrastructure. However, it does create
bindings that compile and link :^)