We don't need to run through the whole floating-point color converter
for videos that use sRGB transfer characteristics and BT.709 color
primaries. This commit adds a new templated inlining function to
ColorConverter to do a very fast fixed-point YCbCr to RGB conversion.
With the fast path, frame conversion times go from ~7.8ms down to
~3.7ms. The fast path can benefit a lot more from extra SIMD vector
width, as well.
Clang was reluctant to inline these for some reason. However, inlining
them seems to be quite beneficial, reducing decoding time in an intra-
heavy video by about 21% (~12.7s -> ~10.0s).
Quantizers are a constant for the whole frame, except when segment
features override them, in which case they are a constant per segment
ID. We take advantage of this by pre-calculating those after reading
the quantization parameters and segmentation features for a frame.
This results in a small 1.5% improvement (~12.9s -> ~12.7s).
This throws out some ugly `#define`s we had that were taking the role
of an enum anyway. We now have some nice getters in the contexts that
take the place of the combo of `seg_feature_active()` and then doing a
lookup in `FrameContext::m_segmentation_features` directly.
Bit reversals are used very often in intra-predicted frames. Turning
these into a constexpr lookup table reduces the branching needed for
block transforms significantly. This reduces the times spent decoding
an intra-heavy 1080p video by about 9% (~14.3s -> ~12.9s).
Previously, the block sizes would be checked at runtime to
determine the transform size to apply for residuals. Making the block
sizes into constant expressions allows all the loops to be unrolled
and reduces branching significantly.
This results in about a 26% improvement (~18s -> ~13.2s) in speed in an
intra-heavy test video.
Inter-prediction convolution filters are selected based on the
subpixel position determined for the motion vector relative to the
block being predicted. The subpixel position 0 only uses one single
sample in the center of the convolution, not averaging any other
samples. Let's call this a copy.
Reference frames can also be a different size relative to the frame
being predicted, but in almost every case, that scale will be 1:1
for every single frame in a video.
Taking into account these facts, we can create multiple fast paths for
inter prediction. These fast paths are only active when scaling is 1:1.
If we are doing a copy in both dimensions, then we can do a straight
memcpy from the reference frame to the output block buffer. In videos
where there is no motion, this is a dramatic speedup.
If we are doing a copy in one dimension, we can just do one convolution
and average directly into the output block buffer.
If we aren't doing a copy in either dimension, we can still cut out a
few operations from the convolution loops, since we only need to
advance our samples by whole pixels instead of subpixels.
These fast paths result in about a 34% improvement (~31.2s -> ~20.6s)
in a video which relies heavily on intra-predicted blocks due to high
motion. In videos with less motion, the improvement will be even
greater.
Also, note that the accumulators in these faster loops are only 16-bit.
High bit-depth videos will overflow those, so for now the fast path is
only used for 8-bit videos.
A typo caused the Y scale value to never be used, so if a reference
frame's aspect ratio didn't match up with the current frame's, it would
decode incorrectly.
Some comments have been added to clarify the frame-constants used in
the function as well.
This moves all the frame size calculation to `FrameContext`, where the
subsampling is easily accessible to determine the size for each plane.
The internal framebuffer size has also been reduced to the exact frame
size that is output.
The division in the `round_mv_...()` functions contained in the motion
vector selection process was done by bit shifting right. However, since
bit shifting negative values will truncate towards the negative end, it
was flooring instead of rounding.
This changes it to match the spec and rely on the compiler to simplify
down to a bit shift.
Previosly we had a very messed up PS1 as the Shell PROMPT is not
unset correctly.
We now provide a default `zshrc` file for the system that uses
sane values for basic categories like aliases, autocompletion and
history management to make the port more usable. It also forces
the prompt to be the default zsh one.
After the EventLoop changes, we do not need to override LibVideo's timer
with a Qt timer for Ladybird. The timer callback provided here also does
not need the JS::SafeFunction wrapper that Platform::Timer provides.
Things such as timers and notifiers aren't specific to one instance of
Core::EventLoop, so let's not tie them down to EventLoopImplementation.
Instead, move those APIs + signals & a few other things to a new
EventLoopManager interface. EventLoopManager also knows how to create a
new EventLoopImplementation object.
Using QEventLoop works for everything but it breaks *one* little feature
that we care about: automatically quitting the app when all windows have
been closed.
That only works if you drive the outermost main event loop with a
QCoreApplication instead of a QEventLoop. This is unfortunate, as it
complicates our API a little bit, but I'm sure we can think of a way to
make this nicer someday.
In order for QCoreApplication::exec() to process our own
ThreadEventQueue, we now have a zero-timer that we kick whenever new
events are posted to the thread queue.
`WavWriter` can be constructed without a file, which should probably be
made impossible at some point. For now, let's not crash `Piano` when you
close the application.
Now that the Core::EventLoop is driven by a QEventLoop in Ladybird,
we don't need to patch LibWeb with Web::Platform plugins.
This patch removes EventLoopPluginQt and TimerQt.
Note that we can't just replace the Web::Platform abstractions with
LibCore stuff immediately, since the Web::Platform APIs use
JS::SafeFunction for callbacks.
This patch adds EventLoopImplementationQt which is a full replacement
for the Core::EventLoopImplementationUnix that uses Qt's event loop
as a backend instead.
This means that Core::Timer, Core::Notifier, and Core::Event delivery
are all driven by Qt primitives in the Ladybird UI and WC processes.
The EventLoop is now a wrapper around an EventLoopImplementation.
Our old EventLoop code has moved into EventLoopImplementationUnix and
continues to work as before.
The main difference is that all the separate thread_local variables have
been collected into a file-local ThreadData data structure.
The goal here is to allow running Core::EventLoop with a totally
different backend, such as Qt for Ladybird.
This program has never lived up to its original idea, and has been
broken for years (property editing, etc). It's also unmaintained and
off-by-default since forever.
At this point, Inspector is more of a maintenance burden than a feature,
so this commit removes it from the system, along with the mechanism in
Core::EventLoop that enables it.
If we decide we want the feature again in the future, it can be
reimplemented better. :^)
Not a single client of this API actually used the event mask feature to
listen for readability AND writability.
Let's simplify the API and have only one hook: on_activation.
Instead of juggling events between individual instances of
Core::EventLoop, move queueing and processing to a separate per-thread
queue (ThreadEventQueue).
This was used in exactly one place, to avoid sending multiple
CustomEvents to the enqueuer thread in Audio::ConnectionToServer.
Instead of this, we now just send a CustomEvent and wake the enqueuer
thread. If it wakes up and has multiple CustomEvents, they get delivered
and ignored in no time anyway. Since they only get ignored if there's
no work to be done, this seems harmless.
Previously, statements containing malformed exists expressions such as:
`INSERT INTO t(a) VALUES (SELECT 1)`;
could cause the parser to crash. The parser will now return an error
message instead.
The check on the currently active document for an existing favicon has
been removed. It caused an issue during navigation because the active
document will only be updated after the load and the favicon check has
been executed against the old document.