`swizzle` had the wrong operands, and the vector masking boolean logic
was incorrect in the internal `shuffle_or_0` implementation. `shuffle`
was previously implemented as a dynamic swizzle, when it uses an
immediate operand for lane indices in the spec.
This necessitates marking bit_cast as ALWAYS_INLINE since emitting it as
a function call there will create an unnecessary potential SSE
registers -> plain registers/memory round-trip.
Is it another great upgrade to our PNG encoder like in 9aafaec259?
Well, not really - it's not a 2x or 55x improvement like you saw there,
but still it saves something:
- a screenshot of a blank Serenity desktop dropped from about 45 KiB
to 40 KiB.
- re-encoding NASA photo of the Earth to PNG again saves about 25%
(16.5 MiB -> 12.3 MiB), compared to not using filters.
[1]: https://commons.wikimedia.org/wiki/File:The_Blue_Marble_(remastered).jpg
This implements an 8-bit front stencil buffer. Stencil operations are
SIMD optimized. LibGL changes include:
* New `glStencilMask` and `glStencilMaskSeparate` functions
* New context parameter `GL_STENCIL_CLEAR_VALUE`