Mirror of https://github.com/LadybirdBrowser/ladybird.git, synced 2024-11-21 23:20:20 +00:00
LibTextCodec: Use String::from_utf8() when decoding UTF-8 to UTF-8
This way, we still perform UTF-8 validation, but don't go through the slow generic code path that rebuilds the decoded string one code point at a time.

This was a bottleneck when loading a canned copy of reddit.com, which ended up being ~120 MiB large.

- Time spent decoding UTF-8 before this change: 1192 ms
- Time spent decoding UTF-8 after this change: 154 ms

That's still a long time, but 7.7x faster is nothing to sneeze at! :^)

Note that if the input fails UTF-8 validation, we still fall back to the slow path and insert replacement characters per the WHATWG Encoding spec: https://encoding.spec.whatwg.org/#utf-8-decode
parent 1a9dabe5ff
commit 1a46d8df5f
Notes:
github-actions[bot] 2024-07-20 12:30:34 +00:00
Author: https://github.com/awesomekling
Commit: https://github.com/LadybirdBrowser/ladybird/commit/1a46d8df5fc
Pull-request: https://github.com/LadybirdBrowser/ladybird/pull/733
Reviewed-by: https://github.com/trflynn89
1 changed file with 3 additions and 0 deletions
@@ -372,6 +372,9 @@ ErrorOr<String> UTF8Decoder::to_utf8(StringView input)
         bomless_input = input.substring_view(3);
     }
 
+    if (Utf8View(bomless_input).validate())
+        return String::from_utf8_without_validation(bomless_input.bytes());
+
     return Decoder::to_utf8(bomless_input);
 }
 
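For readers without the Ladybird tree at hand, the validate-then-copy idea can be sketched in standalone C++. This is only an illustration, not the LibTextCodec code: it uses std::string and std::string_view instead of the AK types, and the names decode_utf8, decode_slow and append_code_point are hypothetical. The slow path follows the general shape of the WHATWG UTF-8 decoder (one U+FFFD per malformed sequence); passing a null output pointer turns the same loop into a validator that builds no string.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Appends one code point to `out`, re-encoded as UTF-8.
static void append_code_point(std::string& out, char32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Generic path: walks the input one byte at a time and rebuilds the string
// one code point at a time, emitting U+FFFD for each malformed sequence.
// With out == nullptr it only validates. Returns true if the input was valid.
static bool decode_slow(std::string_view input, std::string* out)
{
    bool valid = true;
    char32_t cp = 0;
    int needed = 0, seen = 0;
    std::uint8_t lower = 0x80, upper = 0xBF;
    auto emit = [&](char32_t c) { if (out) append_code_point(*out, c); };
    auto fail = [&] { valid = false; emit(0xFFFD); };

    for (std::size_t i = 0; i < input.size(); ++i) {
        auto byte = static_cast<std::uint8_t>(input[i]);
        if (needed == 0) {
            if (byte <= 0x7F) {
                emit(byte);
            } else if (byte >= 0xC2 && byte <= 0xDF) {
                needed = 1; cp = byte & 0x1F;
            } else if (byte >= 0xE0 && byte <= 0xEF) {
                if (byte == 0xE0) lower = 0xA0; // reject overlong forms
                if (byte == 0xED) upper = 0x9F; // reject surrogates
                needed = 2; cp = byte & 0x0F;
            } else if (byte >= 0xF0 && byte <= 0xF4) {
                if (byte == 0xF0) lower = 0x90; // reject overlong forms
                if (byte == 0xF4) upper = 0x8F; // reject > U+10FFFF
                needed = 3; cp = byte & 0x07;
            } else {
                fail();
            }
            continue;
        }
        if (byte < lower || byte > upper) {
            // Broken sequence: report it, then reprocess this byte as a lead byte.
            cp = 0; needed = seen = 0; lower = 0x80; upper = 0xBF;
            --i;
            fail();
            continue;
        }
        lower = 0x80; upper = 0xBF;
        cp = (cp << 6) | (byte & 0x3F);
        if (++seen == needed) {
            emit(cp);
            cp = 0; needed = seen = 0;
        }
    }
    if (needed != 0)
        fail(); // input ended in the middle of a sequence
    return valid;
}

// Hypothetical entry point mirroring the shape of the patched function.
std::string decode_utf8(std::string_view input)
{
    if (input.substr(0, 3) == "\xEF\xBB\xBF")
        input.remove_prefix(3); // strip the UTF-8 BOM
    if (decode_slow(input, nullptr))
        return std::string(input); // fast path: valid input is copied in bulk
    std::string out;
    decode_slow(input, &out); // fallback: rebuild with replacement characters
    return out;
}
```

Calling decode_utf8 on well-formed input returns the bytes unchanged (minus any BOM), while an input such as "a\xFFz" takes the fallback and comes back with the 0xFF byte replaced by U+FFFD. The gain in the real change comes from the same split: validation is a read-only scan, and only invalid input pays for the per-code-point rebuild.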