Explicit template arguments must be wrapped in parens,
else they confuse the preprocessor.
Add the parens instead of avoiding the use of explicit template
arguments.
No behavior change.
That way, we can write 0 instead of 8 bits for every alpha byte.
Reduces the size of sunset-retro.png when saved as a webp file
from 3 MiB to 2.25 MiB, without affecting encode speed.
Once we use CanonicalCodes we'll get this for free for all channels,
but opaque images are common enough that it feels worth it to do this
before then.
Bilevel images are not required to have a BitsPerSample or a
SamplesPerPixel tag, while this is unusual these images are still valid.
The test case has been generated by first making a copy of
ccitt3_1d_fill.tiff and then, using `tiffset` to remove both tags:
tiffset -u 258 ccitt3_no_tags.tiff
tiffset -u 277 ccitt3_no_tags.tiff
If we don't have enough stack space, throw an exception while we still
can, and give the caller a chance to recover.
This particular problem will go away once we make calls non-recursive.
This means inlining all the things. This yields a 40% speedup on the for
loop microbenchmark, and everything else gets faster as well. :^)
This makes compilation take foreeeever with GCC, so I'm only enabling it
for Clang in this commit. We should figure out how to make GCC compile
this without timing out CI, since the speedup is amazing.
This commit converts the main loop in Bytecode::Interpreter to use a
label table and computed goto for fast instruction dispatch.
This yields roughly 35% speedup on the for loop microbenchmark,
and makes everything else faster as well. :^)
Now that the interpreter is unrolled, we can advance the program counter
manually based on the current instruction type.
This makes most instructions a bit smaller. :^)
This commit adds a HANDLE_INSTRUCTION macro that expands to everything
needed to handle a single instruction (invoking the handler function,
checking for exceptions, and advancing the program counter).
This gives a ~15% speed-up on a for loop microbenchmark, and makes
basically everything faster.
Instead of storing a BasicBlock* and forcing the size of Label to be
sizeof(BasicBlock*), we now store the basic block index as a u32.
This means the final version of the bytecode is able to keep labels
at sizeof(u32), shrinking the size of many instructions. :^)
Instead of storing source offsets with each instruction, we now keep
them in a side table in Executable.
This shrinks each instruction by 8 bytes, further improving locality.
Instead of keeping bytecode as a set of disjoint basic blocks on the
malloc heap, bytecode is now a contiguous sequence of bytes(!)
The transformation happens at the end of Bytecode::Generator::generate()
and the only really hairy part is rerouting jump labels.
This required solving a few problems:
- The interpreter execution loop had to change quite a bit, since we
were storing BasicBlock pointers all over the place, and control
transfer was done by redirecting the interpreter's current block.
- Exception handlers & finalizers are now stored per-bytecode-range
in a side table in Executable.
- The interpreter now has a plain program counter instead of a stream
iterator. This actually makes error stack generation a bit nicer
since we just have to deal with a number instead of reaching into
the iterator.
This yields a 25% performance improvement on this microbenchmark:
for (let i = 0; i < 1_000_000; ++i) { }
But basically everything gets faster. :^)
Before this change, all JumpFoo instructions inherited from Jump, which
forced the unconditional Jump to have an unusued "false target" member.
Also, labels were unnecessarily wrapped in Optional<>.
By defining each jump instruction separately, they all shrink in size,
and all ambiguity is removed.
For bitmap fonts, we will often not have an exact match for requested
sizes. Return the closest match instead of a nullptr.
LibWeb is currently the only user of this API. If it needs to be
configurable in the future to only allow exact matches, we can add a
parameter or another method at that time.
This commit replaces the `import_pgn` implementation
with a more resiliant parser to handle various edge
cases and give helpful error messages instead of crashing.