0ct0pu5/ladybird

mirror of https://github.com/LadybirdBrowser/ladybird.git synced 2024-12-12 17:30:38 +00:00

Author	SHA1	Message	Date
Timothy Flynn	fd0011989a	LibUnicode: Resolve the most likely territory alias when there are many	2021-09-01 14:14:47 +01:00
Timothy Flynn	72f49e42b4	LibUnicode: Perform complex Unicode locale alias substitution	2021-09-01 14:14:47 +01:00
Timothy Flynn	da89cf9afb	LibUnicode: Canonicalize calendar subtags Calendar subtags are a bit of an odd-man-out in that we must match the variants "ethiopic-amete-alem" in that order, without any other variant in the locale. So a separate method is needed for this, and we now defer sorting the variant list until after other canonicalization is done.	2021-09-01 14:14:47 +01:00
Timothy Flynn	8458f477a4	LibUnicode: Canonicalize timezone subtags	2021-09-01 14:14:47 +01:00
Timothy Flynn	335f985b31	LibUnicode: Canonicalize the subtag "imperial" to "uksystem"	2021-09-01 14:14:47 +01:00
Timothy Flynn	2d90144888	LibUnicode: Canonicalize the subtag "primary" and "tertiary" to "levelN"	2021-09-01 14:14:47 +01:00
Timothy Flynn	409f39b336	LibUnicode: Canonicalize the subtag "names" to "prprname"	2021-09-01 14:14:47 +01:00
Timothy Flynn	f907a7dc38	LibUnicode: Canonicalize the subtag "yes" to "true"	2021-09-01 14:14:47 +01:00
Timothy Flynn	556374a904	LibUnicode: Substitute Unicode locale aliases during canonicalization Unicode TR35 defines how locale subtag aliases should be emplaced when converting a locale to canonical form. For most subtags, it is a simple substitution. Language subtags depend on context; for example, the language "sh" should become "sr-Latn", but if the original locale has a script subtag already ("sh-Cyrl"), then only the language subtag of the alias should be taken ("sr-Cyrl"). To facilitate this, we now make two passes when canonicalizing a locale. In the first pass, we convert the LocaleID structure to canonical syntax (where the conversions all happen in-place). In the second pass, we form the canonical string based on the canonical syntax.	2021-09-01 14:14:47 +01:00
Timothy Flynn	d13142f015	LibJS+LibUnicode: Store parsed Unicode locale data as full strings Originally, it was convenient to store the parsed Unicode locale data as views into the original string being parsed. But to implement locale aliases will require mutating the data that was parsed. To prepare for that, store the parsed data as proper strings.	2021-09-01 14:14:47 +01:00
Andrew Kaster	b3e3e4d45d	Tests: Convert remaining LibC tests to LibTest Convert them to using outln instead of printf at the same time.	2021-09-01 13:44:24 +02:00
Peter Elliott	33d7fdca28	Everywhere: Use my cool new @serenityos.org email address	2021-09-01 11:37:25 +04:30
Peter Elliott	8d2c04821f	Tests: Test LibMarkdown against commonmark test suite TestCommonmark runs the CommonMark test suite (https://spec.commonmark.org/0.30/spec.json) against LibMarkdown. Currently 44/652 tests pass.	2021-08-31 16:53:51 +02:00
Hediadyoin1	fdef6e5f76	AK: Add FixedPoint arithmetic helper Co-authored-by: Hendiadyoin1 <leon2002.la@gmail.com> Co-authored-by: kleines Filmröllchen <malu.bertsch@gmail.com>	2021-08-31 17:03:55 +04:30
Ali Mohammad Pur	30a1a25daa	Tests/LibWasm: Handle all stream errors in parse_webassembly_module	2021-08-30 22:47:02 +02:00
Ali Mohammad Pur	99199b9bfd	Tests/LibWasm: Add support for javascript bigint values Some i64 values will not fit in normal doubles, and these values _are_ tested by the test suite; this makes the test runtime capable of handling them correctly.	2021-08-30 22:47:02 +02:00
Timothy Flynn	f897c2edb3	LibUnicode: Canonicalize locale private use extensions	2021-08-30 19:42:40 +01:00
Timothy Flynn	6f0cb52dc4	LibUnicode: Canonicalize locale extensions	2021-08-30 19:42:40 +01:00
Timothy Flynn	30855e6663	LibUnicode: Parse locale private use extensions	2021-08-30 19:42:40 +01:00
Timothy Flynn	29f76ef7c8	LibUnicode: Parse locale extensions of the other extension form	2021-08-30 19:42:40 +01:00
Timothy Flynn	d2d304fcf8	LibUnicode: Parse locale extensions of the transformed extension form	2021-08-30 19:42:40 +01:00
Timothy Flynn	eda92d15e4	LibUnicode: Parse locale extensions of the Unicode locale extension form	2021-08-30 19:42:40 +01:00
Timothy Flynn	587d4663a3	AK: Return early from swap() when swapping the same object When swapping the same object, we could end up with a double-free error. This was found while quick-sorting a Vector of Variants holding complex types, reproduced by the new swap_same_complex_object test case.	2021-08-30 19:42:40 +01:00
Ali Mohammad Pur	206bc01f81	LibRegex: Allow null bytes in pattern That check was rather pointless as the input is a StringView which knows its own bounds. Fixes #9686.	2021-08-30 18:43:09 +02:00
Timothy Flynn	b7a95cba65	LibUnicode: Implement grammar validators for Unicode TR-35 ECMA-402 requires validating user input against the EBNF grammar for Unicode locales described in TR-35: https://www.unicode.org/reports/tr35 This commit adds validators for that grammar, as well as other helpers, e.g. to canonicalize a locale string.	2021-08-26 22:04:09 +01:00
Timothy Flynn	262e412634	AK: Implement method to convert a String/StringView to title case This implementation preserves consecutive spaces in the original string.	2021-08-26 22:04:09 +01:00
Jean-Baptiste Boric	c97f7ea23b	Tests: Test setjmp/sigsetjmp LibC functions Since there are no real users of these functions in Serenity's userland and this is my third attempt at this... This time, the great LibTest test suite will make sure that I do it right!	2021-08-26 00:54:23 +02:00
Jan de Visser	85a84b0794	LibSQL: Introduce Serializer as a mediator between Heap and client code Classes reading and writing to the data heap would communicate directly with the Heap object, and transfer ByteBuffers back and forth with it. This makes things like caching and locking hard. Therefore all data persistence activity will be funneled through a Serializer object which in turn submits it to the Heap. Introducing this unfortunately resulted in a huge amount of churn, in which a number of smaller refactorings got caught up as well.	2021-08-21 22:03:30 +02:00
Jan de Visser	d074a601df	LibSQL+SQLServer: Bare bones INSERT and SELECT statements This patch provides very basic, bare bones implementations of the INSERT and SELECT statements. They are very limited: - The only variant of the INSERT statement that currently works is INSERT INTO schema.table (column1, column2, ....) VALUES (value11, value21, ...), (value12, value22, ...), ... where the values are literals. - The SELECT statement is even more limited, and is only provided to allow verification of the INSERT statement. The only form implemented is: SELECT * FROM schema.table These statements required a bit of change in the Statement::execute API. Originally execute only received a Database object as parameter. This is not enough; we now pass an ExecutionContext object which contains the Database, the current result set, and the last Tuple read from the database. This object will undoubtedly evolve over time. This API change dragged SQLServer::SQLStatement into the patch. Another API addition is Expression::evaluate. This method is, unsurprisingly, used to evaluate expressions, like the values in the INSERT statement. Finally, a new test file is added: TestSqlStatementExecution, which tests the currently implemented statements. As the number and flavour of implemented statements grows, this test file will probably have to be restructured.	2021-08-21 22:03:30 +02:00
Jan de Visser	b74721e604	LibSQL: Redesign Value implementation and add new types The implementation of the Value class was based on lambda member variables implementing type-dependent behaviour. This was done to ensure that Values can be used as stack-only objects; the simplest alternative, virtual methods, forces them onto the heap. The problem with the lambda approach is that it bloats the Values (which are supposed to be lightweight objects) quite considerably, because every object contains more than a dozen function pointers. The solution to address both problems (we want Values to be able to live on the stack and be as lightweight as possible) chosen here is to encapsulate type-dependent behaviour and state in an implementation class, and let the Value be an AK::Variant of those implementation classes. All methods of Value are now basically straight delegates to the implementation object using the Variant::visit method. One issue complicating matters is the addition of two aggregate types, Tuple and Array, which each contain a Vector of Values. At this point Tuples and Arrays (and potential future aggregate types) can't contain these aggregate types. This is limiting and needs to be addressed. Another area that needs attention is the nomenclature of things; it's a bit of a tangle of 'ValueBlahBlah' and 'ImplBlahBlah'. It makes sense right now, I think, but I admit we can probably do better. Other things included here: - Added the Boolean and Null types (and Tuple and Array, see above). - to_string now always succeeds and returns a String instead of an Optional. This had some impact on other sources. - Added a lot of tests. - Started moving the serialization mechanism more towards where I want it to be, i.e. a 'DataSerializer' object which just takes serialization and deserialization requests and knows for example how to store long strings out-of-line. One last remark: There is obviously a naming clash between the Tuple class and the Tuple Value type. This is intentional; I plan to make the Tuple class a subclass of Value (and hence Key and Row as well).	2021-08-21 22:03:30 +02:00
Jan de Visser	a5e28f2897	LibSQL: Make TupleDescriptor a shared pointer instead of a stack object Tuple descriptors are basically the same for, e.g., all rows in a table. It makes sense to share them instead of copying them for every single row.	2021-08-21 22:03:30 +02:00
Timothy Flynn	562d4e497b	LibRegex: Treat pattern string characters as unsigned For example, consider the following pattern: new RegExp('\ud834\udf06', 'u') With this pattern, the regex parser should insert the UTF-8 encoded bytes 0xf0, 0x9d, 0x8c, and 0x86. However, because these characters are currently treated as normal char types, they have a negative value since they are all > 0x7f. Then, due to sign extension, when these characters are cast to u64, the sign bit is preserved. The result is that these bytes are inserted as 0xfffffffffffffff0, 0xffffffffffffff9d, etc. Fortunately, there are only a few places where we insert bytecode with the raw characters. In these places, be sure to treat the bytes as u8 before they are cast to u64.	2021-08-20 19:16:33 +02:00
Andreas Kling	13f4890c38	LibCore: Make Core::File::open() return OSError in case of failure	2021-08-20 15:31:46 +02:00
Timothy Flynn	4f2cbe119b	LibRegex: Allow Unicode escape sequences in capture group names Unfortunately, this requires a slight divergence in the way the capture group names are stored. Previously, the generated byte code would simply store a view into the regex pattern string, so no string copying was required. Now, the escape sequences are decoded into a new string, and all parsed capture group names are stored in a vector in the parser result structure. The byte code then stores a view into the corresponding string in that vector.	2021-08-19 23:49:25 +02:00
Timothy Flynn	fd8ccedf2b	AK: Add GenericLexer API to consume an escaped Unicode code point This parsing is already duplicated between LibJS and LibRegex, and will shortly be needed in more places in those libraries. Move it to AK to prevent further duplication. This API will consume escaped Unicode code points of the form: \\u{code point} \\unnnn (where each n is a hexadecimal digit) \\unnnn\\unnnn (where the two escaped values are a surrogate pair)	2021-08-19 23:49:25 +02:00
Timothy Flynn	325eabc770	LibRegex: Ensure the GoBack operation decrements the code unit index This was missed in commit `27d555bab0`.	2021-08-18 09:47:09 +04:30
Timothy Flynn	a9716ad44e	LibRegex: In non-Unicode mode, parse \u{4} as a repetition pattern	2021-08-18 09:47:09 +04:30
davidot	7613c22b06	LibJS: Add a mode to parse JS as a module In a module strict mode should be enabled at the start of parsing and we allow import and export statements.	2021-08-15 23:51:47 +01:00
Timothy Flynn	9509433e25	LibRegex: Implement and use a REPEAT operation for bytecode repetition Currently, when we need to repeat an instruction N times, we simply add that instruction N times in a for-loop. This doesn't scale well with extremely large values of N, and ECMA-262 allows up to N = 2^53 - 1. Instead, add a new REPEAT bytecode operation to defer this loop from the parser to the runtime executor. This allows the parser to complete sans any loops (for this instruction), and allows the executor to bail early if the repeated bytecode fails. Note: The templated ByteCode methods are to allow the Posix parsers to continue using u32 because they are limited to N = 2^20.	2021-08-15 11:43:45 +01:00
Timothy Flynn	f1ce998d73	LibRegex+LibJS: Combine named and unnamed capture groups in MatchState Combining these into one list helps reduce the size of MatchState, and as a result, reduces the amount of memory consumed during execution of very large regex matches. Doing this also allows us to remove a few regex byte code instructions: ClearNamedCaptureGroup, SaveLeftNamedCaptureGroup, and NamedReference. Named groups now behave the same as unnamed groups for these operations. Note that SaveRightNamedCaptureGroup still exists to cache the matched group name. This also removes the recursion level from the MatchState, as it can exist as a local variable in Matcher::execute instead.	2021-08-15 11:43:45 +01:00
Timothy Flynn	1a173be29d	LibRegex: Disallow unescaped quantifiers in Unicode mode	2021-08-15 11:43:45 +01:00
Timothy Flynn	c3e1f1f687	LibRegex: Use correct source characters for Unicode identity escapes	2021-08-15 11:43:45 +01:00
Timothy Flynn	6a485f612f	LibRegex: Implement legacy octal escape parsing closer to the spec The grammar for the ECMA-262 CharacterEscape is: CharacterEscape[U, N] :: ControlEscape c ControlLetter 0 [lookahead ∉ DecimalDigit] HexEscapeSequence RegExpUnicodeEscapeSequence[?U] [~U]LegacyOctalEscapeSequence IdentityEscape[?U, ?N] It's important to parse the standalone "\0 [lookahead ∉ DecimalDigit]" before parsing LegacyOctalEscapeSequence. Otherwise, all standalone "\0" patterns are parsed as octal, which are disallowed in Unicode mode. Further, LegacyOctalEscapeSequence should also be parsed while parsing character classes.	2021-08-15 11:43:45 +01:00
Timothy Flynn	83ca8c7e38	LibRegex: Convert LibRegex tests to use StringView in place of C-strings A subsequent commit will add tests that require a string containing only "\0". As a C-string, this will be interpreted as the null terminator. To make the diff for that commit easier to grok, this commit converts all tests to use StringView without any other functional changes.	2021-08-15 11:43:45 +01:00
Timothy Flynn	0c8f2f5aca	LibRegex: Ensure escaped hexadecimals are exactly 2 digits in length	2021-08-15 11:43:45 +01:00
Timothy Flynn	2e4b6fd1ac	LibRegex: Ensure escaped code points are exactly 4 digits in length	2021-08-15 11:43:45 +01:00
Timothy Flynn	e887314472	LibRegex: Fix ECMA-262 parsing of invalid identity escapes * Only alphabetic (A-Z, a-z) characters may be escaped with \c. The loop currently parsing \c includes code points between the upper/lower case groups. * In Unicode mode, all invalid identity escapes should cause a parser error, even in browser-extended mode. * Avoid an infinite loop when parsing the pattern "\c" on its own.	2021-08-15 11:43:45 +01:00
Brian Gianforcaro	a2a5cb0f24	AK: Add Time::is_negative() to detect negative time values	2021-08-15 12:20:38 +02:00
Daniel Bertalan	0a36cea9dc	Tests: Re-enable UserspaceEmulator tests on the Clang build Now that problems that made UE crash have been fixed, this test should now pass.	2021-08-14 18:42:14 +02:00
Itamar	e57fdb63f8	Tests: Add regression tests for the LibCpp preprocessor Similarly to the LibCpp parser regression tests, these tests run the preprocessor on the .cpp test files under Userland/LibCpp/Tests/preprocessor, and compare the output with existing .txt ground truth files.	2021-08-14 12:40:55 +02:00
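
Commit 562d4e497b above describes bytes above 0x7f being sign-extended when widened to u64. The following is a minimal standalone sketch of that effect in standard C++, not the LibRegex code itself; the function names are hypothetical, and `signed char` is used explicitly because the signedness of plain `char` is implementation-defined.

```cpp
#include <cstdint>

// Sketch only: hypothetical helpers, not LibRegex's actual code.

// Widening directly from a signed 8-bit value sign-extends it, so a byte
// such as 0xf0 (stored as -16) becomes 0xfffffffffffffff0.
std::uint64_t widen_sign_extended(signed char c)
{
    return static_cast<std::uint64_t>(c);
}

// Converting to an unsigned 8-bit type first keeps only the byte value,
// so the same input widens to 0x00000000000000f0.
std::uint64_t widen_as_byte(signed char c)
{
    return static_cast<std::uint64_t>(static_cast<std::uint8_t>(c));
}
```

Calling both helpers with the first UTF-8 byte from the commit message, `static_cast<signed char>(0xf0)`, shows the difference between the two conversions.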

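Commit e887314472 above notes that only A-Z and a-z may follow `\c`, and that the earlier loop also accepted the code points between the two groups. The sketch below shows one way such a bug can arise, assuming a single-range check; the helper names are hypothetical and this is not the LibRegex parser code. The last helper reflects the ECMA-262 rule that the value of `\c` plus a letter is the letter's code point modulo 32.

```cpp
// Sketch only: hypothetical helpers, not LibRegex's actual parser code.

// Too permissive: the single range 'A'..'z' also covers the six ASCII
// characters that sit between 'Z' and 'a', such as '[' and '_'.
constexpr bool is_control_letter_single_range(char c)
{
    return c >= 'A' && c <= 'z';
}

// Stricter check using two separate ranges, A-Z and a-z.
constexpr bool is_control_letter(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

// ECMA-262 defines the value of \c<letter> as the letter's code point mod 32.
constexpr unsigned control_escape_value(char letter)
{
    return static_cast<unsigned>(letter) % 32;
}
```

With these definitions, `_` passes the single-range check but fails the stricter one, while `\cJ` evaluates to 10 (line feed).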

277 commits