LibTextCodec: Use String::from_utf8() when decoding UTF-8 to UTF-8

This way, we still perform UTF-8 validation, but don't go through the
slow generic code path that rebuilds the decoded string one code point
at a time.

This was a bottleneck when loading a canned copy of reddit.com, which
ended up being ~120 MiB large.

- Time spent decoding UTF-8 before this change: 1192 ms
- Time spent decoding UTF-8 after this change:  154 ms

That's still a long time, but 7.7x faster is nothing to sneeze at! :^)

Note that if the input fails UTF-8 validation, we still fall back to
the slow path and insert replacement characters per the WHATWG Encoding
spec: https://encoding.spec.whatwg.org/#utf-8-decode
Andreas Kling, 1 year ago
Commit: 1a46d8df5f
1 file changed, with 3 insertions and 0 deletions
Userland/Libraries/LibTextCodec/Decoder.cpp (+3 -0)

@@ -372,6 +372,9 @@ ErrorOr<String> UTF8Decoder::to_utf8(StringView input)
         bomless_input = input.substring_view(3);
     }
 
+    if (Utf8View(bomless_input).validate())
+        return String::from_utf8_without_validation(bomless_input.bytes());
+
     return Decoder::to_utf8(bomless_input);
 }
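
The fast-path idea from the commit message (validate once, then copy the bytes wholesale, and only fall back to per-sequence decoding when validation fails) can be sketched in standalone C++. This is not the LibTextCodec implementation: `sequence_length`, `is_valid_utf8`, `decode_lossy`, and `utf8_to_utf8` are hypothetical names, `std::string` stands in for AK's `String`, the BOM handling is assumed from the hunk context, and the slow path emits one U+FFFD per offending byte, which is simpler than the exact WHATWG algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Length (1-4) of the well-formed UTF-8 sequence starting at in[i],
// or 0 if the bytes there are not well-formed (overlong, surrogate,
// out of range, truncated, or a stray continuation byte).
static std::size_t sequence_length(std::string_view in, std::size_t i)
{
    assert(i < in.size());
    auto byte = [&](std::size_t k) { return static_cast<unsigned char>(in[k]); };
    unsigned char b0 = byte(i);
    if (b0 < 0x80)
        return 1;

    std::size_t len;
    unsigned char lo = 0x80, hi = 0xBF; // allowed range for the second byte
    if (b0 >= 0xC2 && b0 <= 0xDF) {
        len = 2;
    } else if (b0 >= 0xE0 && b0 <= 0xEF) {
        len = 3;
        if (b0 == 0xE0) lo = 0xA0; // reject overlong encodings
        if (b0 == 0xED) hi = 0x9F; // reject UTF-16 surrogates
    } else if (b0 >= 0xF0 && b0 <= 0xF4) {
        len = 4;
        if (b0 == 0xF0) lo = 0x90; // reject overlong encodings
        if (b0 == 0xF4) hi = 0x8F; // reject code points above U+10FFFF
    } else {
        return 0;
    }

    if (i + len > in.size())
        return 0;
    if (byte(i + 1) < lo || byte(i + 1) > hi)
        return 0;
    for (std::size_t k = 2; k < len; ++k)
        if (byte(i + k) < 0x80 || byte(i + k) > 0xBF)
            return 0;
    return len;
}

// Validation pass: reads the input once and allocates nothing.
static bool is_valid_utf8(std::string_view in)
{
    for (std::size_t i = 0; i < in.size();) {
        std::size_t len = sequence_length(in, i);
        if (len == 0)
            return false;
        i += len;
    }
    return true;
}

// Slow path (simplified): rebuild the output one sequence at a time,
// replacing each offending byte with U+FFFD (encoded as EF BF BD).
static std::string decode_lossy(std::string_view in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size();) {
        std::size_t len = sequence_length(in, i);
        if (len == 0) {
            out += "\xEF\xBF\xBD";
            i += 1;
        } else {
            out.append(in.substr(i, len));
            i += len;
        }
    }
    return out;
}

// Fast path first: valid input is copied in a single bulk operation.
static std::string utf8_to_utf8(std::string_view input)
{
    if (input.substr(0, 3) == "\xEF\xBB\xBF")
        input.remove_prefix(3); // strip the UTF-8 byte order mark
    if (is_valid_utf8(input))
        return std::string(input);
    return decode_lossy(input);
}
```

With this shape, well-formed input costs one validation scan plus one bulk copy, while malformed input still pays for the slower per-sequence rebuild.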