This implementation is basically a copy-paste of the SECP256r1
implementation with all "256" replaced with "384".
In the future it might be nice to make this generic, instead of having
two almost identical copies of code.
Instead of building the REDUCE_PRIME constant on the fly from the carry
flag, we now simply use the constant in combination with select. This
improves the readablility of the functions significantly.
Instead of having a large list of magical constants, we now only have
the curve prime, a, b, and order, which are all taken from the
specification. All the other helper constants are now calculated from
the curve paramters.
A reference to the current stack frame becomes invalid after returning,
so returning Bytes is pointless.
I don't understand why this wasn't discovered earlier, but it caused
some CI problems for me, so I fixed it.
Don't take this as encouragement to break master! :^)
When building for AArch64 with UBSan enabled, GCC 13.1 reports a false
"array out of bounds" error on access to offset `1 * sizeof(u64)`.
Changing the order of the stores seems to silence it.
This generic stream wrapper performs checksum calculations on all data
passed through it for reading or writing, and is therefore convenient
for calculating checksums while performing normal data input/output, as
well as computing streaming checksums on non-seekable streams.
The implementation of this is naive enough so it can handle all 8-bit
CRC polynomials, of which there are quite a few. The table generation
and update procedure is MSB first, which is backwards from the LSB first
method of CRC32.
`vformat()` can now accept format specifiers of the form
{:'[numeric-type]}. This will output a number with a comma separator
every 3 digits.
For example:
`dbgln("{:'d}", 9999999);` will output 9,999,999.
Binary, octal and hexadecimal numbers can also use this feature, for
example:
`dbgln("{:'x}", 0xffffffff);` will output ff,fff,fff.
Rather than the very C-like API we currently have, accepting a void* and
a length, let's take a Bytes object instead. In almost all existing
cases, the compiler figures out the length.
This implements Intel's slicing-by-8 algorithm for CRC checksums (only
little endian CPUs for now, as I don't have a way to test big endian).
The original paper for this algorithm seems to have disappeared, but
Intel's source code is still available as a reference:
https://sourceforge.net/projects/slicing-by-8/
As well as other implementations for reference:
https://docs.rs/slice-by-8/latest/src/slice_by_8/algorithm.rs.html
Using the "enwik8" file as a test (100MB uncompressed, commonly used in
benchmarks: https://www.mattmahoney.net/dc/enwik8.zip), decompression
time decreases from:
4.89s to 3.52s on Serenity (cold)
1.72s to 1.32s on Serenity (warm)
1.06s to 0.92s on Linux
Instead of going byte by byte, copy entire blocks at once and only check
if we need to update the state once per block. This pretty much
eliminates `::update()` from profiles and measurably improves
performance for utilities like `sha256sum`.
This adds a function to parse multiple PEMs out of a single input.
This allows us to load certificates from a cacert.pem file without
need for preprocessing.