This commit adds a new process method to all Decoder subclasses which
do what to_utf8 used to do, and allows callers to customize the handling
of individiual UTF-8 code points through a callback. Decoder::to_utf8
now uses this API to generate a string via StringBuilder, preserving the
original behavior.
This patch changes get_standardized_encoding to use an Optional<String>
return type instead of just returning the null string when unable to
match the provided encoding to one of the canonical encoding names.
This is part of an effort to move away from using null strings towards
explicitly using Optional<String> to indicate that the String may not
have a value.
This encoding (a superset of ascii that adds in the cyrillic alphabet)
is currently the third most used encoding on the web, and because
cyrillic glyphs were added by Dmitrii Trifonov recently, we can now
support it as well :^)
SPDX License Identifiers are a more compact / standardized
way of representing file license information.
See: https://spdx.dev/resources/use/#identifiers
This was done with the `ambr` search and replace tool.
ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
This is a superset of ascii that adds in the hebrew alphabet.
(Google currently assumes we are running windows due to not
recognizing Serenity as the OS in the user agent, resulting
in this encoding instead of UTF8 in google search results)