Before, some loader plugins implemented their own buffering (FLAC&MP3),
some didn't require any (WAV), and some didn't buffer at all (QOA). This
meant that in practice, while you could load arbitrary amounts of
samples from some loader plugins, you couldn't do that with some others.
Also, it was ill-defined how many samples you would actually get back
from a get_more_samples call.
This commit fixes that by introducing a layer of abstraction between the
loader and its plugins (because that's the whole point of having the
extra class!). The plugins now only implement a load_chunks() function,
which is much simpler to implement and allows plugins to play fast and
loose with what they actually return. Basically, they can return many
chunks of samples, where one chunk is simply a convenient block of
samples to load. In fact, some loaders such as FLAC and QOA have
separate internal functions for loading exactly one chunk. The loaders
*should* load as many chunks as necessary for the sample count to be
reached or surpassed (the latter simplifies loading loops in the
implementations, since you don't need to know how large your next chunk
is going to be; a problem for e.g. FLAC). If a plugin has no problems
returning data of arbitrary size (currently WAV), it can return a single
chunk that exactly (or roughly) matches the requested sample count. If a
plugin is at the stream end, it can also return less samples than was
requested! The loader can handle all of these cases and may call into
load_chunk multiple times. If the plugin returns an empty chunk list (or
only empty chunks; again, they can play fast and loose), the loader
takes that as a stream end signal. Otherwise, the loader will always
return exactly as many samples as the user requested. Buffering is
handled by the loader, allowing any underlying plugin to deal with any
weird sample count requirement the user throws at it (looking at you,
SoundPlayer!).
This (not accidentally!) makes QOA work in SoundPlayer.
`Stream` will be qualified as `AK::Stream` until we remove the
`Core::Stream` namespace. `IODevice` now reuses the `SeekMode` that is
defined by `SeekableStream`, since defining its own would require us to
qualify it with `AK::SeekMode` everywhere.
DeprecatedFlyString relies heavily on DeprecatedString's StringImpl, so
let's rename it to A) match the name of DeprecatedString, B) write a new
FlyString class that is tied to String.
This is to differentiate between the upcoming `AllocatingMemoryStream`,
which automatically allocates memory as needed instead of operating on a
static memory area.
This allows us to either pass a reference, which keeps compatibility
with old code, or to pass a NonnullOwnPtr, which allows us to
comfortably chain streams as usual.
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
This doesn't have any immediate uses, but this adapts the code a bit
more to `Core::Stream` conventions (as most functions there use
NonnullOwnPtr to handle streams) and it makes it a bit clearer that this
pointer isn't actually supposed to be null. In fact, MP3LoaderPlugin
and FlacLoaderPlugin apparently forgot to check for that completely
before starting to decode data.
This now prepares all the needed (fallible) components before actually
constructing a LoaderPlugin object, so we are no longer filling them in
at an arbitrary later point in time.
Error::from_string_literal now takes direct char const*s, while
Error::from_string_view does what Error::from_string_literal used to do:
taking StringViews. This change will remove the need to insert `sv`
after error strings when returning string literal errors once
StringView(char const*) is removed.
No functional changes.
This has been overkill from the start, and it has been bugging me for a
long time. With this change, we're probably a bit slower on most
platforms but save huge amounts of space with all in-memory sample
datastructures.
The file is now renamed to Queue.h, and the Resampler APIs with
LegacyBuffer are also removed. These changes look large because nobody
actually needs Buffer.h (or Queue.h). It was mostly transitive
dependencies on the massive list of includes in that header, which are
now almost all gone. Instead, we include common things like Sample.h
directly, which should give faster compile times as very few files
actually need Queue.h.
Previously, we were sending Buffers to the server whenever we had new
audio data for it. This meant that for every audio enqueue action, we
needed to create a new shared memory anonymous buffer, send that
buffer's file descriptor over IPC (+recfd on the other side) and then
map the buffer into the audio server's memory to be able to play it.
This was fine for sending large chunks of audio data, like when playing
existing audio files. However, in the future we want to move to
real-time audio in some applications like Piano. This means that the
size of buffers that are sent need to be very small, as just the size of
a buffer itself is part of the audio latency. If we were to try
real-time audio with the existing system, we would run into problems
really quickly. Dealing with a continuous stream of new anonymous files
like the current audio system is rather expensive, as we need Kernel
help in multiple places. Additionally, every enqueue incurs an IPC call,
which are not optimized for >1000 calls/second (which would be needed
for real-time audio with buffer sizes of ~40 samples). So a fundamental
change in how we handle audio sending in userspace is necessary.
This commit moves the audio sending system onto a shared single producer
circular queue (SSPCQ) (introduced with one of the previous commits).
This queue is intended to live in shared memory and be accessed by
multiple processes at the same time. It was specifically written to
support the audio sending case, so e.g. it only supports a single
producer (the audio client). Now, audio sending follows these general
steps:
- The audio client connects to the audio server.
- The audio client creates a SSPCQ in shared memory.
- The audio client sends the SSPCQ's file descriptor to the audio server
with the set_buffer() IPC call.
- The audio server receives the SSPCQ and maps it.
- The audio client signals start of playback with start_playback().
- At the same time:
- The audio client writes its audio data into the shared-memory queue.
- The audio server reads audio data from the shared-memory queue(s).
Both sides have additional before-queue/after-queue buffers, depending
on the exact application.
- Pausing playback is just an IPC call, nothing happens to the buffer
except that the server stops reading from it until playback is
resumed.
- Muting has nothing to do with whether audio data is read or not.
- When the connection closes, the queues are unmapped on both sides.
This should already improve audio playback performance in a bunch of
places.
Implementation & commit notes:
- Audio loaders don't create LegacyBuffers anymore. LegacyBuffer is kept
for WavLoader, see previous commit message.
- Most intra-process audio data passing is done with FixedArray<Sample>
or Vector<Sample>.
- Improvements to most audio-enqueuing applications. (If necessary I can
try to extract some of the aplay improvements.)
- New APIs on LibAudio/ClientConnection which allows non-realtime
applications to enqueue audio in big chunks like before.
- Removal of status APIs from the audio server connection for
information that can be directly obtained from the shared queue.
- Split the pause playback API into two APIs with more intuitive names.
I know this is a large commit, and you can kinda tell from the commit
message. It's basically impossible to break this up without hacks, so
please forgive me. These are some of the best changes to the audio
subsystem and I hope that that makes up for this :yaktangle: commit.
:yakring:
With the following change in how we send audio, the old Buffer type is
not really needed anymore. However, moving WavLoader to the new system
is a bit more involved and out of the scope of this PR. Therefore, we
need to keep Buffer around, but to make it clear that it's the old
buffer type which will be removed soon, we rename it to LegacyBuffer.
Most of the users will be gone after the next commit anyways.
A mistake I've repeatedly made is along these lines:
```c++
auto nread = TRY(source_file->read(buffer));
TRY(destination_file->write(buffer));
```
It's a little clunky to have to create a Bytes or StringView from the
buffer's data pointer and the nread, and easy to forget and just use
the buffer. So, this patch changes the read() function to return a
Bytes of the data that were just read.
The other read_foo() methods will be modified in the same way in
subsequent commits.
Fixes#13687
Apologies for the enormous commit, but I don't see a way to split this
up nicely. In the vast majority of cases it's a simple change. A few
extra places can use TRY instead of manual error checking though. :^)
FixedArray now doesn't expose any infallible constructors anymore.
Rather, it exposes fallible methods. Therefore, it can be used for
OOM-safe code.
This commit also converts the rest of the system to use the new API.
However, as an example, VMObject can't take advantage of this yet,
as we would have to endow VMObject with a fallible static
construction method, which would require a very fundamental change
to VMObject's whole inheritance hierarchy.
Previously, FlacLoader would read the data for each frame into a
separate vector, which are then combined via extend() in the end. This
incurs an avoidable copy per frame. By having the next_frame() function
write into a given Span, there's only one vector allocated per call to
get_more_samples().
This increases performance by at least 100% realtime, as measured by
abench, from about 1200%-1300% to (usually) 1400% on complex test files.
As long as possible, entire decoded frame sample vectors are moved into
the output vector, leading to up to 20% speedups by avoiding memmoves on
take_first.
Previously, a libc-like out-of-line error information was used in the
loader and its plugins. Now, all functions that may fail to do their job
return some sort of Result. The universally-used error type ist the new
LoaderError, which can contain information about the general error
category (such as file format, I/O, unimplemented features), an error
description, and location information, such as file index or sample
index.
Additionally, the loader plugins try to do as little work as possible in
their constructors. Right after being constructed, a user should call
initialize() and check the errors returned from there. (This is done
transparently by Loader itself.) If a constructor caused an error, the
call to initialize should check and return it immediately.
This opportunity was used to rework a lot of the internal error
propagation in both loader classes, especially FlacLoader. Therefore, a
couple of other refactorings may have sneaked in as well.
The adoption of LibAudio users is minimal. Piano's adoption is not
important, as the code will receive major refactoring in the near future
anyways. SoundPlayer's adoption is also less important, as changes to
refactor it are in the works as well. aplay's adoption is the best and
may serve as an example for other users. It also includes new buffering
behavior.
Buffer also gets some attention, making it OOM-safe and thereby also
propagating its errors to the user.
Decoding the residual in FLAC subframes is by far the most I/O-heavy
operation in FLAC decoding, as the residual data makes up the majority
of subframe data in LPC subframes. As the residual consists of many
Rice-encoded numbers with different bit sizes for differently large
numbers, the residual decoder frequently reads only one or two bytes at
a time. As we use a normal FileInputStream, that directly translates to
many calls to the read() syscall. We can see that the I/O overhead while
FLAC decoding is quite large, and much time is spent in the read()
syscall's kernel code.
This is optimized by using a Buffered<FileInputStream> instead, leading
to 4K blocks being read at once and a large reduction in I/O overhead.
Benchmarking with the new abench utility gives a 15-20% speedup on
identical files, usually pushing FLAC decoding to 10-15x realtime speed
on common sample rates.
Some nuances in the FLAC loading code can do well with an explanation,
as these non-obvious insights are often the result of long and painful
debugging and nobody should touch the affected code without careful
deliberation.
(Of course, secretly I just want people to maintain my loader code.)
:^)
This fixes all current code smells, bugs and issues reported by
SonarCloud static analysis. Other issues are almost exclusively false
positives. This makes much code clearer, and some minor benefits in
performance or bug evasion may be gained.