The two different mode sets were stored in single fields, and their
underlying values didn't overlap, so there was no reason to keep them
separate.
The enum is now an enum class as well, to enforce that almost all uses
of the enum refer to its named values. The only case where underlying
values are used is in lookup tables, but it may be worth abstracting
that as well to make array bounds clearer.
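For illustration, with hypothetical names, the shape of the change:

    // enum class forces call sites to name the values...
    enum class PredictionMode : u8 {
        DcPred,
        VerticalPred,
        HorizontalPred,
    };

    // ...while lookup tables remain the one place the underlying value
    // is used:
    static constexpr u8 mode_lookup[3] = { 10, 20, 30 };
    u8 entry = mode_lookup[static_cast<size_t>(PredictionMode::VerticalPred)];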
Frames will now be queued for retrieval by the user of the decoder.
When the end of the current queue is reached, a DecoderError of
category NeedsMoreInput will be emitted, allowing the caller to react
by displaying what was previously retrieved or by sending more samples.
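A rough sketch of how a caller might drain the queue; the names here
(get_next_frame, receive_sample, next_sample) are assumptions rather
than the exact LibVideo API:

    for (;;) {
        auto frame_result = decoder.get_next_frame();
        if (frame_result.is_error()) {
            if (frame_result.error().category() == DecoderErrorCategory::NeedsMoreInput) {
                // Show what we have so far, then feed the decoder
                // another sample from the demuxer.
                decoder.receive_sample(demuxer.next_sample());
                continue;
            }
            return frame_result.release_error();
        }
        display(frame_result.release_value());
    }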
I've realized that it probably makes more sense to change the input
transfer characteristics to treat these as sRGB, since converting to
linear from BT.709 doesn't really make sense here. If content creation
applications expect media players to display BT.709 without
conversions, that means they expect applications to treat it as sRGB,
since that's what most displays use. It most likely also means they
process it as sRGB internally, so we should do the same for our
color primaries conversion.
No longer will the video player explode with error dialogs that then
lock the user out of closing them.
To avoid issues where the playback state becomes invalid when an error
occurs, I've made all decoder errors pass through the frame queue.
This way, when a video is corrupted, there should be no chance that the
playback state becomes invalid due to setting the state to Corrupted
in the event handler while a presentation event is still pending.
Or at least I think that was what caused some issues I was seeing :^)
This system should be a lot more robust if any future errors need to be
handled.
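Conceptually, the queue now carries either a frame or an error (a
sketch; the exact types may differ):

    // Each entry is either a decoded frame or the error that ended
    // decoding, so an error can never overtake an earlier frame.
    Queue<DecoderErrorOr<NonnullOwnPtr<VideoFrame>>> m_frame_queue;

    auto entry = m_frame_queue.dequeue();
    if (entry.is_error())
        set_state_to_corrupted(entry.release_error()); // hypothetical handler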
Otherwise, we end up propagating those dependencies into targets that
link against that library, which creates unnecessary link-time
dependencies.
Also included are changes to re-add now-missing dependencies to tools
that actually need them.
This file will be the basis for abstracting away the out-of-thread or
later out-of-process decoding from applications displaying videos. For
now, the demuxer is hardcoded to be MatroskaParser, since that is all
we support so far. The demuxer should later be selected based on the
file header.
The playback and decoding are currently all done on one thread using
timers. The design of the code is such that adding threading should
be trivial, at least based on an earlier version of the code. For now,
though, it's better that this runs in one thread, as the multithreaded
approach causes the Video Player to lock up permanently after a few
frames are decoded.
This moves the setting of code points in CICP structs entirely into
member functions, so that code which has to set these code points can
be much cleaner.
The class is abstract and has one subclass, SubsampledYUVFrame, which
is used by the VP9 decoder to return a single frame. The
output_to_bitmap(Bitmap&) function can be used to set the pixels of an
existing bitmap of the correct size to the RGB values that
should be displayed. The to_bitmap() function will allocate a new bitmap
and fill it using output_to_bitmap.
This new class also implements bilinear scaling of the subsampled U and
V planes so that subsampled videos' colors will appear smoother.
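A sketch of the class shape described above (signatures beyond the two
named functions are assumptions):

    class VideoFrame {
    public:
        virtual ~VideoFrame() = default;

        // Writes the frame's RGB pixels into an existing bitmap of the
        // correct size.
        virtual DecoderErrorOr<void> output_to_bitmap(Gfx::Bitmap& bitmap) = 0;

        // Allocates a new bitmap and fills it via output_to_bitmap().
        DecoderErrorOr<NonnullRefPtr<Gfx::Bitmap>> to_bitmap();
    };

    // Returned by the VP9 decoder; upsamples the subsampled U and V
    // planes bilinearly while converting to RGB.
    class SubsampledYUVFrame final : public VideoFrame {
    public:
        virtual DecoderErrorOr<void> output_to_bitmap(Gfx::Bitmap& bitmap) override;
    };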
This adds a struct called CodingIndependentCodePoints and related enums
that video codecs use to define the color space of a video, which
frames must be converted from when they are displayed.
Pre-multiplied matrices and lookup tables are stored to avoid most of
the floating-point division and exponentiation in the conversion.
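As an illustration of the lookup-table idea (with a placeholder curve,
not the real transfer function):

    // Precompute the transfer function once so the per-pixel
    // conversion avoids repeated powf() calls.
    Array<float, 256> to_linear_lut;
    for (size_t i = 0; i < to_linear_lut.size(); i++)
        to_linear_lut[i] = powf(i / 255.0f, 2.2f); // placeholder gamma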
Previously, some integer overflows and truncations were causing parsing
errors for 4K videos; with those fixed, the decoder can fully decode
8K video.
This adds a test to ensure that 4K video will continue to be decoded.
Note: There seems to be unexpectedly high memory usage while decoding
them, causing 8K video to require more than a gigabyte of RAM. (!!!)
Previously, saved probability tables were being inserted, causing the
Vector to increase in size when it should stay fixed at a size of 4. This
changes the Vector to an Array<T, 4>, which will default-initialize and
allow assigning to any index without previously setting the size.
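Roughly the following change, with ProbabilityTable as a stand-in
element type:

    // Before: appending grew the vector beyond the intended 4 entries.
    Vector<ProbabilityTable> tables_before;
    tables_before.append(saved_table); // size keeps growing past 4

    // After: fixed size of 4, default-initialized, assignable at any index.
    Array<ProbabilityTable, 4> tables_after;
    tables_after[index] = saved_table; // stays fixed at 4, no append needed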
Integer overflow could sometimes occur due to counts going above 255;
the values should instead be clamped at their maximum to avoid
wrapping to 0.
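The fix amounts to a saturating increment (illustrative):

    // Saturate the u8 count at its maximum instead of wrapping to 0.
    counter = min(counter + 1, 255);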
This fixes an issue causing frame 3 of the test video to fail to parse
because a reference vector was incorrectly within the range for a
high-precision delta vector read.
This allows the second shown frame of the VP9 test video to be decoded,
as the second chunk uses a superframe to encode a reference frame and
a second frame to inter-predict between the keyframe and the reference
frame.
This enables the second frame of the test video to be decoded.
It appears that the test video uses a superframe (group of multiple
frames) for the first chunk of the file, but we haven't implemented
superframe parsing.
We also ignore the show_frame flag, so for now, this
means that the second frame read out is shown when it should not be. To
fix this, another error type needs to be implemented that is "thrown" to
the decoder's client so they know to send another sample buffer.
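For reference, a superframe index sits at the end of a chunk and can be
parsed roughly like this (a sketch; the function name and surrounding
types are illustrative):

    void parse_superframe_index(ReadonlyBytes chunk)
    {
        // The last byte is a marker: 0b110 in the top three bits, then
        // two bits for (bytes per frame size - 1) and three bits for
        // (frame count - 1).
        u8 marker = chunk[chunk.size() - 1];
        if ((marker & 0xE0) != 0xC0)
            return; // not a superframe
        u8 bytes_per_size = ((marker >> 3) & 0b11) + 1;
        u8 frame_count = (marker & 0b111) + 1;
        size_t index_size = 2 + frame_count * bytes_per_size;
        if (index_size > chunk.size())
            return;
        size_t index_start = chunk.size() - index_size;
        // The marker byte is repeated at the start of the index.
        if (chunk[index_start] != marker)
            return;
        size_t offset = index_start + 1;
        for (u8 i = 0; i < frame_count; i++) {
            // Frame sizes are stored little-endian.
            u32 frame_size = 0;
            for (u8 j = 0; j < bytes_per_size; j++)
                frame_size |= static_cast<u32>(chunk[offset++]) << (j * 8);
            // The chunk contains a frame of frame_size bytes at the
            // running offset.
        }
    }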
The interpolation filter mode of the above block was being taken from
the left block instead, causing some parsing errors.
This also changes the magic number 3 to SWITCHABLE_FILTERS.
Unfortunately, the spec uses the magic number, so this value was taken
instead from the reference codec, libvpx.
This gets the decoder closer to fully parsing the second frame without
any errors. It will still be unable to output an inter-predicted frame.
The lack of output causes VideoPlayer to crash if it attempts to read
the buffers for frame 1, so it is still limited to the first frame.
This changes MotionVector by removing the cpp file and moving all
functions to the header, where they are now declared as constexpr
so that they can be compile-time evaluated in LookupTables.h.
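A simplified illustration of why constexpr matters here:

    // With the definitions in the header and marked constexpr,
    // MotionVector values can be built into compile-time tables.
    struct MotionVector {
        i32 row { 0 };
        i32 column { 0 };

        constexpr MotionVector operator+(MotionVector other) const
        {
            return { row + other.row, column + other.column };
        }
    };

    // Usable in LookupTables.h because everything above is constexpr
    // (table name here is illustrative):
    static constexpr MotionVector mv_ref_blocks_example[] = { { -1, 0 }, { 0, -1 } };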
For testing purposes, the output buffer is taken directly from the
decoder and displayed in an image widget.
The first keyframe can be displayed, but the second will not decode
so VideoPlayer will stop at frame 0 for now.
This implements a BT.709 YCbCr to RGB conversion in VideoPlayer, but
that should be moved to a library for handling color space conversion.
The first keyframe of the test video can be decoded with these changes.
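For reference, the per-pixel math is roughly the standard BT.709
limited-range conversion for 8-bit samples (a sketch of the math, not
the exact VideoPlayer code):

    // BT.709, limited range, 8-bit YCbCr to RGB; clamp each channel to
    // [0, 255] before writing it to the bitmap.
    float luma = 1.164f * (y - 16);
    float r = luma + 1.793f * (cr - 128);
    float g = luma - 0.213f * (cb - 128) - 0.533f * (cr - 128);
    float b = luma + 2.112f * (cb - 128);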
Raw memory allocations in the Parser have been replaced with Vector or
Array to avoid memory leaks and out-of-bounds reads or writes.
This allows runtime strings, so we can format the errors to make them
more helpful. Errors in the VP9 decoder will now print out the
function, filename, and line number where a read or a bitstream
requirement has failed.
The DecoderErrorCategory enum will classify the errors so library users
can show general user-friendly error messages, while providing the
debug information separately.
Any non-DecoderErrorOr<> results can be wrapped by DECODER_TRY to
return from decoder functions. This will also add the extra information
mentioned above to the error message.
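A sketch of what such a macro can look like (illustrative only;
DecoderError::format is an assumed helper, and the real macro's details
may differ):

    #define DECODER_TRY(category, expression)                              \
        ({                                                                 \
            auto&& _result = (expression);                                 \
            if (_result.is_error()) [[unlikely]]                           \
                return DecoderError::format((category), "{} in {}:{}",     \
                    __PRETTY_FUNCTION__, __FILE__, __LINE__);              \
            _result.release_value();                                       \
        })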
init_bool will now check whether there is enough data in the bitstream
for the range coding size to be fully read.
exit_bool will now read the entire padding element regardless of size,
which the spec does not specify a limit on.
Reads will now be done in larger chunks.
The public read_byte() function was removed in favor of a private
fill_reservoir() function which will be used to fill the 64-bit
reservoir field which will then be bit-shifted and masked as necessary
for subsequent arbitrary bit-sized reads.
read_f(n) was renamed to read_bits to be clearer about its use.
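A minimal sketch of the reservoir approach (field and function names
assumed): bits are kept left-aligned in a 64-bit field and served out
with shifts and masks.

    u64 BitStream::read_bits(u8 count) // assumes 1 <= count <= 57
    {
        if (m_bits_remaining < count)
            fill_reservoir(); // pull whole bytes from the stream in
                              // below the bits already buffered
        u64 result = m_reservoir >> (64 - count); // take the top `count` bits
        m_reservoir <<= count;
        m_bits_remaining -= count;
        return result;
    }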
The interpolation filter value is not set when reading an intra-only
frame, so printing it for the first keyframe of the file would print
"220", which is invalid.
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).
No functional changes.
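The change is essentially (illustrative):

    StringView old_way { "example" }; // length found via __builtin_strlen
    auto new_way = "example"sv;       // operator""sv bakes the length in
                                      // at compile time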
Apologies for the enormous commit, but I don't see a way to split this
up nicely. In the vast majority of cases it's a simple change. A few
extra places can use TRY instead of manual error checking though. :^)
AK's version should see better inlining behavior than the LibM one.
We avoid mixed usage for now, though.
Also clean up some stale math includes and improper floating-point
usage.
With this patch we are finally done with section 6.4.X of the spec :^)
The only parsing left to be done is 6.5.X, motion vector prediction.
Additionally, this patch fixes how MVs were being stored in the parser.
Originally, due to the spec naming two very different values very
similarly, these properties had totally wrong data types, but this has
now been rectified.
Technically, block decoding calls into some other incomplete methods,
so it isn't functionally complete yet. However, we are very close to
being done with the 6.4.X sections :)
These elements were being used in the new tokens implementation, so
support for them in the TreeParser has been added.
Additionally, this uncovered a bug where the nonzero contexts were
being cleared with the wrong size.
This was required for correctly parsing more than one frame's
height/width data. Additionally, failures are now handled a little
more gracefully. Since we don't fully parse a tile before starting to
parse the next one, we currently can't make it past the first tile
mark, so we need to handle that failure gracefully.
The class that was previously named Decoder handled section 6.X.X of
the spec, which actually deals with parsing out the syntax of the data,
not the actual decoding logic which is specified in section 8.X.X.
The new Decoder class will be in charge of owning and running the
Parser, as well as implementing all of the decoding behavior.
Additionally, this uncovered a couple of bugs in existing code,
so those have been fixed. Currently, parsing a whole video fails
because we are now using a new calculation for frame width that hasn't
been fully implemented yet.
Now TreeParser has mostly complete probability calculation
implementations for all currently used syntax elements. Some of these
calculation methods aren't actually finished because they use data
we have yet to parse in the Decoder, but they're close to finished.
With the progress made in the Decoder thus far, we have the ability
to support most of the syntax element counters in the tree parser.
Additionally, it will now crash when trying to count unsupported
elements.
This patch brings all of LibVideo up to the east-const style in the
project. Additionally, it applies a few fixes from the reviews in #8170
that referred to older LibVideo code.
The TreeParser requires information about a lot of the decoder's
current state in order to parse syntax tree elements correctly, so
there has to be some communication between the Decoder and the
TreeParser. Previously, the Decoder would copy its state to the
TreeParser whenever it changed; however, this was a poor choice. Now,
the TreeParser simply holds a reference to its owning Decoder and
accesses its state directly.
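Sketch of the new relationship (member name assumed):

    class TreeParser {
    public:
        explicit TreeParser(Decoder& decoder)
            : m_decoder(decoder)
        {
        }

    private:
        // Read the decoder's state directly, never a stale copy.
        Decoder& m_decoder;
    };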
This patch adds compressed header parsing to the VP9 decoder (section
6.4 of the spec). This is the final decoder step before we can start to
decode tiles.
This patch brings all of the previous work together and starts to
actually parse and decode frame information. Currently it only parses
the uncompressed header data (section 6.2 of the spec).
The VP9 specification requires a special decoding process to parse a
lot of the data read from a frame, so this BitStream wrapper implements
that behavior. These processes are defined in section 9 of the
VP9 spec.
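At the core of those processes is an arithmetic "bool decoder"; the
classic VP8/VP9 form looks roughly like this (a sketch with assumed
member names, not the exact BitStream code):

    // m_value holds a 16-bit window of unread stream bits, m_range the
    // current arithmetic-coding range, kept in [128, 255].
    u8 BitStream::read_bool(u8 probability)
    {
        // Split the range proportionally to the probability of a 0 bit.
        u32 split = 1 + (((m_range - 1) * probability) >> 8);
        u32 big_split = split << 8;
        u8 bit;
        if (m_value >= big_split) {
            bit = 1;
            m_range -= split;
            m_value -= big_split;
        } else {
            bit = 0;
            m_range = split;
        }
        // Renormalize: keep the range at least 128, shifting in one
        // new bitstream bit per iteration.
        while (m_range < 128) {
            m_value = (m_value << 1) | read_next_bit(); // read_next_bit() is illustrative
            m_range <<= 1;
        }
        return bit;
    }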
This commit initializes the LibVideo library and implements parsing of
basic Matroska container files. Currently, it will only parse audio
and video tracks.