This also renames (most?) of the related quantizer functions and
variables to make more sense. I haven't determined definitively what
AC/DC stands for here, but it may just be the usual transform-coding
naming for the first (DC) and subsequent (AC) coefficients used to
quantize the residuals for a block.
The color config is reused for most inter predicted frames, so we use a
struct ColorConfig to store the config from intra frames, and put it in
a field in Parser to copy from when an inter frame without color config
is encountered.
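A minimal sketch of the idea, with the caveat that the exact enums and
fields shown here are assumptions rather than the real definitions:

    #include <AK/Types.h>

    // Sketch only; exact enum values and struct fields are assumptions.
    enum class ColorSpace { Bt601, Bt709, Bt2020 };
    enum class ColorRange { Studio, Full };

    struct ColorConfig {
        u8 bit_depth { 8 };
        ColorSpace color_space { ColorSpace::Bt601 };
        ColorRange color_range { ColorRange::Studio };
        bool subsampling_x { true };
        bool subsampling_y { true };
    };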
There are three mutually exclusive frame-showing states:
- Show no new frame, only store the frame as a reference.
- Show a newly decoded frame.
- Show frame from the reference frame store.
Since they are mutually exclusive, using an enum rather than two bools
makes more sense.
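For illustration, the enum could look something like this (the names
here are invented):

    // The three mutually exclusive states as one enum instead of two bools.
    enum class FrameShowMode {
        CreateAndShowNewFrame, // decode a new frame and present it
        CreateAndHideNewFrame, // decode a new frame only to store as a reference
        ShowExistingFrame,     // present a frame from the reference frame store
    };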
These are used to pass context needed for decoding, with mutability
scoped only to the sections that the function receiving the contexts
needs to modify. This allows lifetimes of data to be more explicit
rather than being stored in fields, as well as preventing tile threads
from modifying outside their allowed bounds.
The field was only used once to track whether residual tokens were
present in the block. Parser::tokens() now returns a bool indicating
whether they were present.
These are now passed as parameters to each function that uses them.
These will later be moved to a struct to further reduce the number of
parameters that get passed around.
Above and left per-frame block contexts are now also parameters passed
to the functions that use them instead of being retrieved when needed
from a field. This will allow them to be more easily moved to a tile-
specific context later.
There are three fields of FrameBlockContext that we need to keep
between frames, since they are used when parsing those same fields for
the next frame.
The function serves no purpose now; any debug information we want to
pull from the decoder should instead be accessed through some other,
yet-to-be-created interface.
All state that needed to persist between calls to decode_block was
previously stored in plain Vector fields. This moves them into a struct
which sets a more explicit lifetime on that data. It may be possible to
store this data on the stack of a function with the appropriate
lifetime now that it is split into its own struct.
The default intra prediction mode was only used to set the sub-block
modes and the y prediction mode. Since the sub modes are stored in an
Array, we can just pull the last element to set the y mode instead of
storing the default in a field.
When errors are encountered by PlaybackManager, it attempts to switch
states to either Stopped or Corrupted. However, if a seek was still in
progress, that caused it to set the last presentation media time to the
current playback time while the last presentation time was unexpectedly
negative, because the seek never ended.
Ending the seek before the state changes to Stopped or Corrupted
prevents this situation from happening.
This implements the fastest seeking mode available for tracks with cues
using an array of cue points for each track. It approximates the index
based on the seeking timestamp and then finds the earliest cue point
before the timestamp. The approximation assumes that cues will be on
a regular interval, which I don't believe is always the case, but it
should at least be faster than iterating the whole set of cue points
each time.
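Roughly, the approximation works like this (a sketch with invented
names, assuming a non-empty list of cue points sorted by timestamp):

    #include <AK/StdLibExtras.h>
    #include <AK/Types.h>
    #include <AK/Vector.h>

    struct CuePoint {
        u64 timestamp { 0 };
        u64 position { 0 };
    };

    // Returns the index of the latest cue point at or before the target.
    static size_t find_cue_index(Vector<CuePoint> const& cue_points, u64 target_timestamp)
    {
        // Guess an index as if the cues were spaced at a regular interval
        // (ignoring multiplication overflow for the sake of the sketch).
        u64 last_timestamp = max<u64>(cue_points.last().timestamp, 1);
        size_t guess = min(cue_points.size() - 1,
            static_cast<size_t>((cue_points.size() * target_timestamp) / last_timestamp));
        // Then correct the guess by scanning toward the right position.
        while (guess > 0 && cue_points[guess].timestamp > target_timestamp)
            guess--;
        while (guess + 1 < cue_points.size() && cue_points[guess + 1].timestamp <= target_timestamp)
            guess++;
        return guess;
    }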
Cues are stored per track, but most videos will only have cue points
for the video track(s) that are present. For now, this assumes that it
should only seek based on the cue points for the selected track. To
seek audio in a video file, we should copy the seeked iterator over to
the audio track's iterator after seeking is complete. The iterator will
then skip to the next audio block.
Tracks have a timestamp scale value, which should always be present,
that scales each block's timestamp offset to allow video to be synced
with audio. They should also contain a CodecDelay element, and may
contain a TrackOffset that offsets the block timestamps.
Now that we're able to find the nearest keyframe, we can have a fast
seeking mode that only seeks to keyframes, so that it doesn't have to
also decode inter frames until it reaches the timestamp.
The default is still accurate seeking, so that the entire seeking
implementation can be tested.
This just searches sequentially through each block in a SampleIterator
until it finds a block after the specified seek timestamp. Once it
finds one, it will try to set the input/output iterator to the most
recent keyframe. If the iterator's original position is closer to the
target, however, it leaves it at that original position, allowing
callers to continue decoding from that position until they reach the
target timestamp.
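In rough terms, the search looks like this (the SampleIterator methods
here are assumptions, not the real interface):

    #include <AK/Optional.h>

    DecoderErrorOr<void> seek_to_keyframe(SampleIterator& iterator, u64 target_timestamp)
    {
        auto candidate = iterator;
        Optional<SampleIterator> last_keyframe;
        // Scan forward until we pass the seek target, remembering keyframes.
        while (candidate.block_timestamp() < target_timestamp) {
            if (candidate.is_keyframe())
                last_keyframe = candidate;
            TRY(candidate.next_block());
        }
        // Only jump if the keyframe is ahead of the caller's position;
        // otherwise, decoding forward from the original spot is closer.
        if (last_keyframe.has_value() && last_keyframe->block_timestamp() > iterator.block_timestamp())
            iterator = last_keyframe.release_value();
        return {};
    }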
This implements the PlaybackManager portion of seeking, so that it can
seek to any frame in the video by dropping all preceding frames until
it reaches the seek point.
MatroskaDemuxer currently will only seek to the start of the video.
That means that every seek has to drop all the frames until it comes
across the target timestamp.
The PlaybackManager::update_presented_frame function was getting out of
hand and adding seeking was making it illegible. This rewrites it to be
(hopefully) quite a bit more readable, and adds a few comments to help
future readers of the code.
In addition, some helpful debugging prints were added that should help
debug any future issues with the player.
With these changes, the seek bar can be used, but only to seek to the
start of the file. Seeking to anywhere else in the file will cause an
error in the demuxer.
The timestamp label that was previously invisible now has its text set
according to either the playback or seek slider's position.
The Demuxer class was changed to return errors for more functions so
that all of the underlying reading can be done lazily. Other than that,
the demuxer interface is unchanged, and only the underlying reader was
modified.
The MatroskaDocument class is no more, and MatroskaReader's getter
functions replace it. Every MatroskaReader getter beyond the Segment
element's position is parsed lazily from the file as needed. This means
that all getter functions can return DecoderErrors which must be
handled by callers.
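The lazy getters follow a cache-on-first-use pattern, roughly like this
(the field and helper names are illustrative):

    DecoderErrorOr<SegmentInformation const&> MatroskaReader::segment_information()
    {
        // Hit the file only on the first call, then reuse the parsed value.
        if (!m_segment_information.has_value())
            m_segment_information = TRY(parse_segment_information());
        return m_segment_information.value();
    }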
Making these functions static makes it easier to implement lazy-loading
since the parsing functions can now be called at any time.
The functions were reorganized because they were not defined in the
order they are called. However, instead of moving every function into
that order, I've declared some near the top of the file but defined
them further down, which allows the next commit's diff to be more
readable.
Matroska::Reader functions now return DecoderErrorOr instead of values
being declared Optional. Useful errors can be handled by the users of
the parser, similarly to the VP9 decoder. Much of the error checking
in the reader is cleaner now thanks to this change, since all reads can
be range-checked in Streamer::read_octet().
Most functions of the Streamer class are now out-of-line in Reader.cpp
instead of residing in the header.
As new demuxers are added, this will get quite full of files, so it'll
be good to have a separate folder for these.
To avoid too many chained namespaces, the Containers subdirectory is
not also a namespace, but the Matroska folder is, for the sake of
keeping the multiple classes of parsed information from cluttering the
Video namespace.
Timers keep their previously set interval even for single-shot mode.
We want all timers to fire immediately if they don't have a delay set
in the start() call.
This has two benefits:
- I observed a ~34% decrease in decoding time running TestVP9Decode.
- Removing all of these silly Vector fields helps simplify the code
relationships between all the functions in Decoder.cpp. It'll also be
much easier to make these static with template specializations, if
that turns out to be a worthwhile performance improvement.
VideoPlayerWidget was keeping a reference to PlaybackManager when
changing files, so the old and new managers would both send frames to
be presented at the same time, causing it to flicker back and forth
between the two videos. However, PlaybackManager no longer relies on
event bubbling to pass events to its parent. By changing it to send
events directly to an Object, it can avoid being ref counted, so that
it will get destroyed with its containing object and stop sending
events.
With the addition of this struct, both the check for whether
coefficients should be parsed and the token parsing itself can take
specific parameters.
This is the last step in parameterizing all the tree parsing, so the
old functions in TreeParser are now unused. This patch is very
satisfying :^)
There's still more work to be done to clean up how the parameters are
passed from Parser, but that's work for another day.
Since these two types are often passed around as a pair, it's easier to
handle them with a simple pair struct, at least for now. Once things
are fully being passed around as parameters wherever possible, it may
be good to change this type for something more generalized.
This adds a tree-parsing function that can be called statically from
specific trees' implementations in TreeParser, of which Partition is
the first. This way, all tree-parsing calls will take the context
they need to be able to select a tree and probabilities, which will
allow removal of the state dependence in TreeParser on fields from
itself and Parser.
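As a sketch, the shape of such a static tree parse could be the
following, in the style of the reference decoder's flat tree arrays
(positive entries index child nodes, negated entries are leaf values);
the names and signature here are illustrative, not the actual
TreeParser interface:

    #include <AK/Types.h>

    template<typename T, typename BoolDecoder>
    static T parse_tree(BoolDecoder& decoder, i8 const* tree, u8 const* probabilities)
    {
        int node = 0;
        do {
            // The caller-provided context selects which tree and which
            // probabilities are used at each internal node.
            node = tree[node + decoder.read_bool(probabilities[node >> 1])];
        } while (node > 0);
        return static_cast<T>(-node);
    }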
The two different mode sets were stored in the same fields, and their
underlying values didn't overlap, so there was no reason to keep them
separate.
The enum is now an enum class as well, to enforce that almost all uses
of the enum are named. The only case where underlying values are used
is in lookup tables, but it may be worth abstracting that as well to
make array bounds more clear.
Frames will now be queued for retrieval by the user of the decoder.
When the end of the current queue is reached, a DecoderError of
category NeedsMoreInput will be emitted, allowing the caller to react
by displaying what was previously retrieved and sending more samples.
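From the caller's side, usage would look roughly like this (the
decoder and demuxer method names are assumptions):

    for (;;) {
        auto frame_result = decoder.get_decoded_frame();
        if (!frame_result.is_error()) {
            display_frame(frame_result.release_value());
            continue;
        }
        if (frame_result.error().category() != DecoderErrorCategory::NeedsMoreInput)
            return frame_result.release_error();
        // The queue is drained; present what we have and feed in the next sample.
        TRY(decoder.receive_sample(TRY(demuxer.get_next_video_sample())));
    }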
I've realized that it probably makes more sense to change the input
transfer characteristics to treat these as sRGB, since converting to
linear space through the BT.709 transfer function doesn't really make
sense here. If content creation applications expect media players to
display BT.709 without
conversions, this means they expect applications to treat it as sRGB,
since that's what most displays use. That most likely also means they
process it as sRGB internally, meaning we should do the same for our
color primaries conversion.
No longer will the video player explode with error dialogs that then
lock the user out of closing them.
To avoid issues where the playback state becomes invalid when an error
occurs, I've made all decoder errors pass through the frame queue.
This way, when a video is corrupted, there should be no chance that the
playback state becomes invalid due to setting the state to Corrupted
in the event handler while a presentation event is still pending.
Or at least I think that was what caused some issues I was seeing :^)
This system should be a lot more robust if any future errors need to be
handled.
Otherwise, we end up propagating those dependencies into targets that
link against that library, which creates unnecessary link-time
dependencies.
Also included are changes to re-add now-missing dependencies to tools
that actually need them.
This file will be the basis for abstracting away the out-of-thread or
later out-of-process decoding from applications displaying videos. For
now, the demuxer is hardcoded to be MatroskaParser, since that is all
we support so far. The demuxer should later be selected based on the
file header.
The playback and decoding are currently all done on one thread using
timers. The design of the code is such that adding threading should
be trivial, at least based on an earlier version of the code. For now,
though, it's better that this runs in one thread, as the multithreaded
approach causes the Video Player to lock up permanently after a few
frames are decoded.
This moves the setting of code points in CICP structs entirely into
member functions, so that code which sets these code points can be
much cleaner.
The class is virtual and has one subclass, SubsampledYUVFrame, which
is used by the VP9 decoder to return a single frame. The
output_to_bitmap(Bitmap&) function can be used to set pixels on an
existing bitmap of the correct size to the RGB values that
should be displayed. The to_bitmap() function will allocate a new bitmap
and fill it using output_to_bitmap.
This new class also implements bilinear scaling of the subsampled U and
V planes so that subsampled videos' colors will appear smoother.
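The bilinear part boils down to blending the four nearest chroma
samples by distance; a rough sketch for a 4:2:0 plane (the helper name
and layout are made up, and the real SubsampledYUVFrame code differs):

    #include <AK/StdLibExtras.h>
    #include <AK/Types.h>

    static u8 chroma_at(u8 const* plane, size_t stride, size_t width, size_t height,
        size_t luma_x, size_t luma_y)
    {
        // For 4:2:0, each chroma sample covers a 2x2 block of luma pixels.
        float x = luma_x / 2.0f;
        float y = luma_y / 2.0f;
        size_t x0 = min(static_cast<size_t>(x), width - 1);
        size_t y0 = min(static_cast<size_t>(y), height - 1);
        size_t x1 = min(x0 + 1, width - 1);
        size_t y1 = min(y0 + 1, height - 1);
        float x_weight = x - x0;
        float y_weight = y - y0;

        // Blend horizontally on both rows, then vertically between them.
        float top = plane[y0 * stride + x0] * (1 - x_weight) + plane[y0 * stride + x1] * x_weight;
        float bottom = plane[y1 * stride + x0] * (1 - x_weight) + plane[y1 * stride + x1] * x_weight;
        return static_cast<u8>(top * (1 - y_weight) + bottom * y_weight);
    }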
This adds a struct called CodingIndependentCodePoints, and related
enums, that are used by video codecs to define the color space that
frames must be converted from when displaying a video.
Pre-multiplied matrices and lookup tables are stored to avoid most of
the floating point division and exponentiation in the conversion.
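For example, the transfer function part of the conversion can be
precomputed once into a table so the per-pixel work is a lookup instead
of a powf() call (a sketch assuming a BT.709-style transfer function;
the real table layout is different):

    #include <math.h>

    static float s_to_linear_lut[256];

    static void build_bt709_to_linear_lut()
    {
        for (int i = 0; i < 256; i++) {
            float v = i / 255.0f;
            // Inverse of the BT.709 opto-electronic transfer function.
            s_to_linear_lut[i] = v < 0.081f
                ? v / 4.5f
                : powf((v + 0.099f) / 1.099f, 1.0f / 0.45f);
        }
    }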
Previously, some integer overflows and truncations were causing
parsing errors for 4K videos; with those fixed, it can fully decode 8K
video.
This adds a test to ensure that 4K video will continue to be decoded.
Note: There seems to be unexpectedly high memory usage while decoding
them, causing 8K video to require more than a gigabyte of RAM. (!!!)
Previously, saved probability tables were being inserted, causing the
Vector to increase in size when it should stay fixed at a size of 4.
This changes the Vector to an Array<T, 4>, which will
default-initialize and allow assigning to any index without previously
setting a size.
Integer overflow could sometimes occur due to counts going above 255,
where the values should instead be clamped at their maximum to avoid
wrapping to 0.
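The fix boils down to a saturating increment on the 8-bit counters,
along these lines:

    #include <AK/Types.h>

    // Increment without wrapping past the 8-bit maximum.
    static u8 saturating_increment(u8 count)
    {
        return count < 255 ? count + 1 : 255;
    }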
This fixes an issue causing frame 3 of the test video to fail to
parse, because a reference vector was incorrectly treated as being
within the range that allows a high-precision delta vector read.
This allows the second shown frame of the VP9 test video to be decoded,
as the second chunk uses a superframe to encode a reference frame and
a second frame to inter-predict between the keyframe and the reference
frame.
This enables the second frame of the test video to be decoded.
It appears that the test video uses a superframe (group of multiple
frames) for the first chunk of the file, but we haven't implemented
superframe parsing.
We also ignore the show_frame flag, so for now, this means that the
second frame read out is shown when it should not be. To fix this,
another error type needs to be implemented that is "thrown" to the
decoder's client so they know to send another sample buffer.
The interpolation filter mode from the above block was being read from
the left block instead, causing some parsing errors.
This also changes the magic number 3 to SWITCHABLE_FILTERS.
Unfortunately, the spec uses the magic number, so this value was taken
instead from the reference codec, libvpx.
This gets the decoder closer to fully parsing the second frame without
any errors. It will still be unable to output an inter-predicted frame.
The lack of output causes VideoPlayer to crash if it attempts to read
the buffers for frame 1, so it is still limited to the first frame.
This changes MotionVector by removing the cpp file and moving all
functions to the header, where they are now declared as constexpr
so that they can be compile-time evaluated in LookupTables.h.
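A stripped-down sketch of what that enables (not the full class):

    #include <AK/Types.h>

    class MotionVector {
    public:
        constexpr MotionVector() = default;
        constexpr MotionVector(i32 row, i32 column)
            : m_row(row)
            , m_column(column)
        {
        }

        constexpr MotionVector operator+(MotionVector const& other) const
        {
            return { m_row + other.m_row, m_column + other.m_column };
        }

    private:
        i32 m_row { 0 };
        i32 m_column { 0 };
    };

    // constexpr members let tables of vectors be built at compile time.
    static constexpr MotionVector example_table[] = { { 0, -1 }, { -1, 0 } };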
For testing purposes, the output buffer is taken directly from the
decoder and displayed in an image widget.
The first keyframe can be displayed, but the second will not decode
so VideoPlayer will stop at frame 0 for now.
This implements a BT.709 YCbCr to RGB conversion in VideoPlayer, but
that should be moved to a library for handling color space conversion.
The first keyframe of the test video can be decoded with these changes.
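For reference, limited-range BT.709 YCbCr to RGB works out to
coefficients like the following (a per-pixel sketch; the in-tree code
is structured differently):

    #include <AK/StdLibExtras.h>
    #include <AK/Types.h>

    static void ycbcr_to_rgb(u8 y, u8 cb, u8 cr, u8& r, u8& g, u8& b)
    {
        // Limited range maps Y to [16, 235] and Cb/Cr to [16, 240].
        float luma = 1.164f * (y - 16);
        r = static_cast<u8>(clamp(static_cast<int>(luma + 1.793f * (cr - 128)), 0, 255));
        g = static_cast<u8>(clamp(static_cast<int>(luma - 0.213f * (cb - 128) - 0.533f * (cr - 128)), 0, 255));
        b = static_cast<u8>(clamp(static_cast<int>(luma + 2.112f * (cb - 128)), 0, 255));
    }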
Raw memory allocations in the Parser have been replaced with Vector or
Array to avoid memory leaks and OOBs.
This allows runtime strings, so we can format the errors to make them
more helpful. Errors in the VP9 decoder will now print out a function,
filename and line number for where a read or bitstream requirement
has failed.
The DecoderErrorCategory enum will classify the errors so library users
can show general user-friendly error messages, while providing the
debug information separately.
Any non-DecoderErrorOr<> results can be wrapped by DECODER_TRY to
return from decoder functions. This will also add the extra information
mentioned above to the error message.
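A hypothetical shape for such a macro, using the same
statement-expression style as AK's TRY (the real DECODER_TRY and the
DecoderError::format() helper shown here may differ in detail):

    #define DECODER_TRY(category, expression)                              \
        ({                                                                 \
            auto _result = (expression);                                   \
            if (_result.is_error()) [[unlikely]] {                         \
                return DecoderError::format((category),                    \
                    "{}() at {}:{}: {}", __func__, __FILE__, __LINE__,     \
                    _result.release_error());                              \
            }                                                              \
            _result.release_value();                                       \
        })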
init_bool will now check whether there is enough data in the bitstream
for the range coding size to be fully read.
exit_bool will now read the entire padding element regardless of size,
which the spec does not specify a limit on.
Reads will now be done in larger chunks at a time.
The public read_byte() function was removed in favor of a private
fill_reservoir() function which will be used to fill the 64-bit
reservoir field which will then be bit-shifted and masked as necessary
for subsequent arbitrary bit-sized reads.
read_f(n) was renamed to read_bits to be clearer about its use.
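The reservoir pattern looks roughly like this (an illustrative sketch,
not the real BitStream class):

    #include <AK/Assertions.h>
    #include <AK/Types.h>

    class BitStream {
    public:
        u64 read_bits(u8 count)
        {
            VERIFY(count >= 1 && count < 64);
            if (m_bits_remaining < count)
                refill_reservoir();
            // Serve the requested bits from the top of the reservoir.
            u64 result = m_reservoir >> (64 - count);
            m_reservoir <<= count;
            m_bits_remaining -= count;
            return result;
        }

    private:
        // Reads up to 8 bytes from the underlying data and shifts them in
        // below the bits that are still unconsumed.
        void refill_reservoir();

        u64 m_reservoir { 0 };
        u8 m_bits_remaining { 0 };
    };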
The interpolation filter value is not set when reading an intra-only
frame, so printing this for the first keyframe of the file was printing
"220", which is invalid.
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).
No functional changes.
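For example:

    #include <AK/StringView.h>

    StringView a = "hello";   // StringView(char const*): runtime strlen
    StringView b = "hello"sv; // operator ""sv: length known at compile time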
Apologies for the enormous commit, but I don't see a way to split this
up nicely. In the vast majority of cases it's a simple change. A few
extra places can use TRY instead of manual error checking though. :^)
AK's version should see better inlining behavior than the LibM one.
We avoid mixed usage for now, though.
Also clean up some stale math includes and improper floating-point
usage.
With this patch we are finally done with section 6.4.X of the spec :^)
The only parsing left to be done is 6.5.X, motion vector prediction.
Additionally, this patch fixes how MVs were being stored in the parser.
Originally, due to the spec naming two very different values very
similarly, these properties had totally wrong data types, but this has
now been rectified.
Though technically block decoding calls into some other incomplete
methods, it isn't functionally complete yet. However, we are very
close to being done with the 6.4.X sections :)
These elements were being used in the new tokens implementation, so
support for them in the TreeParser has been added.
Additionally, this uncovered a bug where the nonzero contexts were
being cleared with the wrong size.
This was required for correctly parsing more than one frame's
height/width data. Additionally, this starts handling failure a little
more gracefully. Since we don't fully parse a tile before starting to
parse the next tile, we will now no longer make it past the first tile
mark, and that scenario should now be handled well.
The class that was previously named Decoder handled section 6.X.X of
the spec, which actually deals with parsing out the syntax of the data,
not the actual decoding logic which is specified in section 8.X.X.
The new Decoder class will be in charge of owning and running the
Parser, as well as implementing all of the decoding behavior.
Additionally, this uncovered a couple bugs with existing code,
so those have been fixed. Currently, parsing a whole video does
fail because we are now using a new calculation for frame width,
but it hasn't been fully implemented yet.
Now TreeParser has mostly complete probability calculation
implementations for all currently used syntax elements. Some of these
calculation methods aren't actually finished because they use data
we have yet to parse in the Decoder, but they're close to finished.
With the progress made in the Decoder thus far, we have the ability
to support most of the syntax element counters in the tree parser.
Additionally, it will now crash when trying to count unsupported
elements.
This patch brings all of LibVideo up to the east-const style in the
project. Additionally, it applies a few fixes from the reviews in #8170
that referred to older LibVideo code.
The TreeParser requires information about a lot of the decoder's
current state in order to parse syntax tree elements correctly, so
there has to be some communication between the Decoder and the
TreeParser. Previously, the Decoder would copy its state to the
TreeParser when it changed, however, this was a poor choice. Now,
the TreeParser simply has a reference to its owning Decoder, and
accesses its state directly.
This patch adds compressed header parsing to the VP9 decoder (section
6.4 of the spec). This is the final decoder step before we can start to
decode tiles.
This patch brings all of the previous work together and starts to
actually parse and decode frame information. Currently it only parses
the uncompressed header data (section 6.2 of the spec).
The VP9 specification requires a special decoding process to parse a
lot of the data read from a frame, so this BitStream wrapper implements
that behavior. These processes are defined in section 9 of the
VP9 spec.
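The core of those processes is the boolean decoder from section 9.2.
This sketch follows the well-known VP8/VP9 formulation; the member
names and the raw-bit helper are illustrative, not the exact contents
of this commit's class:

    #include <AK/Types.h>

    struct BooleanDecoder {
        u32 m_range { 255 };
        u32 m_value { 0 }; // seeded with the first two bytes of the stream
        u8 read_raw_bit(); // pulls one plain bit from the underlying data

        // 'probability' is the chance (out of 256) that the bit is a zero.
        bool read_bool(u8 probability)
        {
            u32 split = 1 + (((m_range - 1) * probability) >> 8);
            bool result;
            if (m_value < (split << 8)) {
                m_range = split;
                result = false;
            } else {
                m_range -= split;
                m_value -= split << 8;
                result = true;
            }
            // Renormalize so the range stays in [128, 255], shifting fresh
            // bits into the low end of the value window.
            while (m_range < 128) {
                m_range <<= 1;
                m_value = (m_value << 1) | read_raw_bit();
            }
            return result;
        }
    };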
This commit initializes the LibVideo library and implements parsing
basic Matroska container files. Currently, it will only parse audio
and video tracks.