Currently, ephemeral port allocation is handled by the
allocate_local_port_if_needed() and protocol_allocate_local_port()
methods. Actually binding the socket to an address (which means
inserting the socket/address pair into a global map) is performed either
in protocol_allocate_local_port() (for ephemeral ports) or in
protocol_listen() (for non-ephemeral ports); the latter will fail with
EADDRINUSE if the address is already used by an existing pair present in
the map.
There used to be a bug where for listen() without an explicit bind(),
the port allocation would conflict with itself: first an ephemeral port
would get allocated and inserted into the map, and then
protocol_listen() would check again for the port being free, find the
just-created map entry, and error out. This was fixed in commit
01e5af487f by passing an additional flag
did_allocate_port into protocol_listen() which specifies whether the
port was just allocated, and skipping the check in protocol_listen() if
the flag is set.
However, this only helps if the socket is bound to an ephemeral port
inside of this very listen() call. But calling bind(sin_port = 0) from
userspace should succeed and bind to an allocated ephemeral port, in the
same way as using an unbound socket for connect() does. The port number
can then be retrieved from userspace by calling getsockname(), and it
should be possible to either connect() or listen() on this socket,
keeping the allocated port number. Also, calling bind() when already
bound (either explicitly or implicitly) should always result in EINVAL.
To untangle this, introduce an explicit m_bound state in IPv4Socket,
just like LocalSocket has already. Once a socket is bound, further
attempts to bind it fail. Some operations cause the socket to implicitly
get bound to an (ephemeral) address; this is implemented by the new
ensure_bound() method. The protocol_allocate_local_port() method is
gone; it is now up to a protocol to assign a port to the socket inside
protocol_bind() if it finds that the socket has local_port() == 0.
protocol_bind() is now called in more cases, such as inside listen() if
the socket wasn't bound before that.
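For illustration, here is a minimal standalone sketch of the bound-state
logic described above (the names mirror the commit text, but the types
and signatures are simplified stand-ins, not the actual kernel API):

```
#include <cerrno>

struct SketchSocket {
    bool m_bound { false };
    int m_local_port { 0 };

    // protocol_bind() is now responsible for allocating an ephemeral
    // port when local_port() == 0 and for inserting the socket/address
    // pair into the global map (failing with EADDRINUSE on a conflict).
    int protocol_bind()
    {
        if (m_local_port == 0)
            m_local_port = 49152; // pretend an ephemeral port was allocated
        return 0;
    }

    int ensure_bound()
    {
        if (m_bound)
            return 0;
        if (int rc = protocol_bind(); rc != 0)
            return rc;
        m_bound = true;
        return 0;
    }

    int bind(int requested_port)
    {
        if (m_bound)
            return -EINVAL; // binding twice, explicitly or implicitly, fails
        m_local_port = requested_port; // 0 means "pick an ephemeral port"
        return ensure_bound();
    }

    int listen()
    {
        // listen() on an unbound socket implicitly binds it first, so
        // protocol_listen() no longer needs a did_allocate_port flag.
        return ensure_bound();
    }
};
```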
Since this is the block size that file system drivers *should* set,
let's name it the logical block size, just like most file systems such
as ext2 already do anyway.
This never was a logical block size; it always was a device-specific
block size. Ideally the block size would change in accordance with
whatever the driver wants to use, but that is a change for the future.
For now, let's get rid of this confusing naming.
This also makes it easier to understand and reference where these
(sometimes rather arbitrary) calculations come from.
This also fixes a bug where group_index_from_block_index assumed 1KiB
blocks.
For a long time, our shutdown procedure has basically been:
- Acquire big process lock.
- Switch framebuffer to Kernel debug console.
- Sync and lock all file systems so that disk caches are flushed and
files are in a good state.
- Use firmware and architecture-specific functionality to perform
hardware shutdown.
This naive and simple shutdown procedure has multiple issues:
- No processes are terminated properly, meaning they cannot perform more
complex cleanup work. If they were in the middle of I/O, for instance,
only the data that already reached the Kernel is written to disk, and
data corruption due to unfinished writes can therefore still occur.
- No file systems are unmounted, meaning that any important unmount work
will never happen. This is important for e.g. Ext2, which has
facilities for detecting improper unmounts (see superblock's s_state
variable) and therefore requires a proper unmount to be performed.
This was also the starting point for this PR, since I wanted to
introduce basic Ext2 file system checking and unmounting.
- No hardware is properly shut down beyond what the system firmware does
on its own.
- Shutdown is performed within the write() call that asked the Kernel to
change its power state. If the shutdown procedure takes longer (i.e.
when it's done properly), this blocks the process causing the shutdown
and prevents any potentially-useful interactions between Kernel and
userland during shutdown.
In essence, current shutdown is a glorified system crash with minimal
file system cleanliness guarantees.
Therefore, this commit is the first step in improving our shutdown
procedure. The new shutdown flow is now as follows:
- From the write() call to the power state SysFS node, a new task is
started, the Power State Switch Task. Its only purpose is to change
the operating system's power state. This task takes over shutdown and
reboot duties, although reboot is not modified in this commit.
- The Power State Switch Task assumes that userland has performed all
shutdown duties it can perform on its own. In particular, it assumes
that all kinds of clean process shutdown have been done, and remaining
processes can be hard-killed without consequence. This is an important
separation of concerns: While this commit does not modify userland, in
the future SystemServer will be responsible for performing proper
shutdown of user processes, including timeouts for stubborn processes
etc.
- As mentioned above, the task hard-kills remaining user processes.
- The task hard-kills all Kernel processes except itself and the
Finalizer Task. Since Kernel processes can delay their own shutdown
indefinitely if they want to, they have plenty of opportunity to perform
proper shutdown if necessary. This may become a problem with
non-cooperative Kernel tasks, but as seen two commits earlier, for now
all tasks will cooperate within a few seconds.
- The task waits for the Finalizer Task to clean up all processes.
- The task hard-kills and finalizes the Finalizer Task itself, meaning
that it now is the only remaining process in the system.
- The task syncs and locks all file systems, and then unmounts them. Due
to an unknown refcount bug we currently cannot unmount the root file
system; therefore the task is able to abort the clean unmount if
necessary.
- The task performs platform-dependent hardware shutdown as before.
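Condensed into code, the task's order of operations looks roughly like
this (the function names are illustrative stand-ins, not the actual
kernel symbols):

```
#include <cstdio>

// Stub subsystems; the real kernel symbols differ.
static void kill_remaining_user_processes() { }
static void kill_kernel_processes_except_self_and_finalizer() { }
static void wait_for_finalizer_to_drain() { }
static void finalize_finalizer_task() { }
static bool sync_lock_and_unmount_all_file_systems() { return false; }
static void platform_specific_poweroff() { }

static void power_state_switch_task_shutdown()
{
    // Userland is assumed to have already performed its own clean shutdown.
    kill_remaining_user_processes();
    // Kernel tasks may delay this to finish their own cleanup.
    kill_kernel_processes_except_self_and_finalizer();
    wait_for_finalizer_to_drain();
    // After this, we are the only process left in the system.
    finalize_finalizer_task();
    if (!sync_lock_and_unmount_all_file_systems())
        std::puts("Clean unmount failed; continuing with hardware shutdown");
    platform_specific_poweroff();
}
```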
This commit has multiple remaining issues (or exposed existing ones)
which will need to be addressed in the future but are out of scope for
now:
- Unmounting the root filesystem is impossible due to remaining
references to the inodes /home and /home/anon. I investigated this
very heavily and could not find whoever is holding the last two
references.
- Userland cannot perform proper cleanup, since the Kernel's power state
variable is accessed directly by tools instead of a proper userland
shutdown procedure directed by SystemServer.
The recently introduced Firmware/PowerState procedures are removed
again, since all of the architecture-independent code can live in the
power state switch task. The architecture-specific code is kept,
however.
Once we move to a more proper shutdown procedure, processes other than
the finalizer task must be able to perform cleanup and finalization
duties, not only because the finalizer task itself needs to be cleaned
up by someone. This global variable, mirroring the early boot flags,
allows a future shutdown process to perform cleanup on its own.
Note that while this *could* be considered a weakening in security, the
attack surface is minimal and the results are not dramatic. To exploit
this, an attacker would have to gain a Kernel write primitive to this
global variable (bypassing KASLR among other things) and then gain some
way of calling the relevant functions, all of this only to destroy some
other running process. The same effect can be achieved with LPE which
can often be gained with significantly simpler userspace exploits (e.g.
of setuid binaries).
Since we never check a kernel process's state the way we do for a
userland process, it's possible for a kernel process to ignore the fact
that someone is trying to kill it and continue running. This is not
desirable if we want to properly shut down all processes, including
Kernel ones.
This is correct since unmount doesn't treat bind mounts specially. If we
don't do this, unmounting bind mounts will call
prepare_for_last_unmount() on the guest FS much too early, which will
most likely fail due to a busy file system.
Previously, we started parsing the ELF file again in a completely
different place, and without the partial mapping that we do while
validating.
Instead of doing manual parsing in two places, just capture the
requested stack size right after we validated it.
This resolves the various "implicit truncation from int to a one-bit
wide bit-field changes value from 1 to -1" warnings produced by Clang
16+ when assigning to single-bit bitfields.
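For illustration, a signed one-bit bit-field can only represent 0 and
-1, which is exactly what the warning is about (this is a generic
example, not one of the affected kernel structs):

```
struct BadFlags {
    int enabled : 1; // Clang 16+ warns: assigning 1 changes the value to -1
};

struct GoodFlags {
    unsigned enabled : 1; // an unsigned (or bool) single-bit field holds 0 and 1
};

int main()
{
    GoodFlags flags {};
    flags.enabled = 1; // no truncation warning
    return flags.enabled;
}
```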
The driver would crash if it was unable to find an output route, and
subsequently the destruction of the controller did not invoke
`GenericInterruptHandler::will_be_destroyed()` because on the level of
`AudioController`, that method is unavailable.
By decoupling the interrupt handling from the controller, we get a new
refcounted class that correctly cleans up after itself :^)
We used to not care about stopping an audio output stream for Intel HDA
since AudioServer would continuously send new buffers to play. Since
707f5ac150ef858760eb9faa52b9ba80c50c4262 however, that has changed.
Intel HDA now uses interrupts to detect when each buffer was completed
by the device, and uses a simple heuristic to detect whether a buffer
underrun has occurred so it can stop the output stream.
This was tested on Qemu's Intel HDA (Linux x86_64) and a bare metal MSI
Starship/Matisse HD Audio Controller.
This is a preparation before we can create a usable mechanism to use
filesystem-specific mount flags.
To keep some compatibility with userland code, LibC and LibCore mount
functions remain usable, but now instead of doing an "atomic"
syscall, they do multiple syscalls to perform the complete procedure of
mounting a filesystem.
The FileBackedFileSystem IntrusiveList in the VFS code is now changed to
be protected by a Mutex, because when we mount a new filesystem, we need
to check if a filesystem is already created for a given source_fd so we
do a scan for that OpenFileDescription in that list. If we fail to find
an already-created filesystem, we create a new one and register it in
the list once it has been successfully mounted. We use a Mutex because
we might need
to initiate disk access during the filesystem creation, which will take
other mutexes in other parts of the kernel, therefore making it not
possible to take a spinlock while doing this.
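As a rough standalone model of that lookup-or-create flow (using
std::mutex and std::vector as stand-ins for the kernel's Mutex and
intrusive list):

```
#include <memory>
#include <mutex>
#include <vector>

struct OpenFileDescription { };

struct FileBackedFileSystem {
    OpenFileDescription* source { nullptr };
};

static std::mutex s_list_mutex;
static std::vector<std::shared_ptr<FileBackedFileSystem>> s_file_backed_file_systems;

std::shared_ptr<FileBackedFileSystem> find_or_create_filesystem(OpenFileDescription& source_fd)
{
    // A Mutex (not a spinlock) is required: creating the filesystem may
    // have to touch the disk, which takes other mutexes elsewhere in the
    // kernel.
    std::lock_guard locker(s_list_mutex);
    for (auto& fs : s_file_backed_file_systems) {
        if (fs->source == &source_fd)
            return fs; // a filesystem was already created for this source_fd
    }
    auto fs = std::make_shared<FileBackedFileSystem>();
    fs->source = &source_fd;
    // In the kernel, the new filesystem is only registered once it has
    // been mounted successfully.
    s_file_backed_file_systems.push_back(fs);
    return fs;
}
```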
Otherwise, reading will sometimes fail on the Raspberry Pi.
This is mostly a hack; the spec has some info about how the correct
divisor should be calculated and how we can recover from timeouts.
Namely, we previously forgot to configure the SD Host Controller for
4-bit mode after issuing ACMD6, which caused data transfers to fail on
bare metal.
Instead of using ifdefs to use the correct platform-specific methods, we
can just use the same pattern we use for the microseconds_delay function
which has specific implementations for each Arch CPU subdirectory.
When linking a kernel image, the actual correct and platform-specific
power-state changing methods will be called from the
Firmware/PowerState.cpp file.
Since https://reviews.llvm.org/D131441, libc++ must be included before
LibC. As clang includes libc++ as one of the system includes, LibC
must be included after those, and the only correct way to do that is
to install LibC's headers into the sysroot.
Targets that don't link with LibC, yet require its headers for one
reason or another, must add install_libc_headers as a dependency to
ensure that the correct headers have been (re)installed into the
sysroot.
LibC/stddef.h has been dropped since the built-in stddef.h receives
a higher include priority.
In addition, string.h and wchar.h must
define __CORRECT_ISO_CPP_STRING_H_PROTO and
_LIBCPP_WCHAR_H_HAS_CONST_OVERLOADS respectively in order to tell
libc++ to not try to define methods implemented by LibC.
Once LibC is installed to the sysroot and its conflicts with libc++
are resolved, including LibC headers in such a way will cause errors
with a modern LLVM-based toolchain.
This is needed to avoid including LibC headers in Lagom builds.
Unfortunately, we cannot rely on the build machine to provide a
fully POSIX-compatible ELF header for Lagom builds, so we have to
use our own.
To ensure the actual PS2 code is not tied to the i8042 code, we
separate them in the following ways:
- PS2KeyboardDevice and PS2MouseDevice classes no longer inherit from
the IRQHandler class. Instead, we have a specific IRQHandler-derived
class for the i8042 controller implementation, which ensures that we
don't end up mixing PS2 code with low-level interrupt handling
functionality. In the future this means that we could add a driver for
other PS2 controllers that might have only one interrupt handler but
multiple attached PS2 devices, making it easier to establish the right
propagation flow from the controller driver all the way to the HID core
code.
- A simple abstraction layer is added between the PS2 command set that
devices can use and the actual low-level implementation commands.
This means that the code in PS2MouseDevice and PS2KeyboardDevice
classes is no longer tied to i8042 implementation-specific commands,
so now these objects could send PS2 commands to their PS2 controller
and get a PS2Response which abstracts the given response too.
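A rough model of that second point (all names here are hypothetical and
simplified; the real HID/SerialIO classes and signatures differ):

```
#include <cstdint>

enum class PS2Command : uint8_t {
    EnableStreaming,
    DisableStreaming,
    Reset,
};

enum class PS2Response : uint8_t {
    Acknowledge,
    Resend,
    Failure,
};

class SerialIOController {
public:
    virtual ~SerialIOController() = default;
    // A concrete controller (e.g. the i8042 driver) translates the
    // abstract command into its own low-level byte sequence and returns
    // an abstract response.
    virtual PS2Response send_command(int port_index, PS2Command) = 0;
};

class PS2MouseDevice {
public:
    PS2MouseDevice(SerialIOController& controller, int port_index)
        : m_controller(controller)
        , m_port_index(port_index)
    {
    }

    bool enable_streaming()
    {
        // No i8042 knowledge here; any PS2-capable controller will do.
        auto response = m_controller.send_command(m_port_index, PS2Command::EnableStreaming);
        return response == PS2Response::Acknowledge;
    }

private:
    SerialIOController& m_controller;
    int m_port_index { 0 };
};
```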
The HIDController class is removed and the SerialIOController class is
added instead. The HIDController class was a mistake - there's no such
thing in real hardware as a host controller only for human interface
devices (the VirtIO PCI input controller being the exception here, but
it could technically be treated as a serial IO controller too).
Instead, we simply add a new abstraction layer - the SerialIO "bus",
which will hold all the code that is related to serial communications
with other devices. A PS2 controller is simply a serial IO controller,
and the Intel 8042 Controller is simply a specific implementation of a
PS2 controller.
Ideally, we would want the audio controller to run a channel at a
device's initial sample rate instead of hardcoding 44.1 kHz. However,
most audio is provided at 44.1 kHz and as long as `Audio::Resampler`
introduces significant audio artifacts, let's set a sensible sample
rate that offers a better experience for most users.
This can be removed after someone implements a higher quality
`Audio::Resampler`.
All code that is related to PC BIOS should not be in the Kernel/Firmware
directory as this directory is for abstracted and platform-agnostic code
like ACPI (and device tree parsing in the future).
This fixes a problem with the aarch64 architecture, as these machines
don't have any PC-BIOS in them so actually trying to access these memory
locations (EBDA, BIOS ROM) does not make any sense, as they're specific
to x86 machines only.
This code is very x86-specific, because Intel introduced the actual
MultiProcessor specification back in 1993, quoted here as proof:
"The MP specification covers PC/AT-compatible MP platform designs based
on Intel processor architectures and Advanced Programmable Interrupt
Controller (APIC) architectures"
Most of the ACPI static parsing methods (methods that can be called
without initializing a full AML parser) are not tied to any specific
platform or CPU architecture.
The only method that is platform-specific is the one that finds the RSDP
structure. Thus, each CPU architecture/platform needs to implement it.
This means that now aarch64 can implement its own method to find the
ACPI RSDP structure, which would be hooked into the rest of the ACPI
code elegantly, but for now I just added a FIXME and that method
returns an empty Optional<PhysicalAddress>.
Previously, reads would only be successful for offset 0. For this
reason, the maximum size that could be correctly read from the PCI
expansion ROM SysFS node was limited to the block size, and
subsequent blocks would fail. This commit fixes the computation of
the number of bytes to read.
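A sketch of the corrected computation (variable names are illustrative;
the exact code in the SysFS node differs):

```
#include <algorithm>
#include <cstddef>

// The byte count must be clamped against what remains past `offset`,
// instead of effectively assuming that the read starts at offset 0.
size_t bytes_to_read_from_expansion_rom(size_t rom_size, size_t offset, size_t requested_size)
{
    if (offset >= rom_size)
        return 0; // nothing left to read
    return std::min(requested_size, rom_size - offset);
}
```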
During receive_tcp_packet(), we now set m_send_window_size for the
socket if it is different from the default.
This removes one FIXME from TCPSocket.h.
These 4 fields were made `Atomic` in
c3f668a758, at which time these were still
accessed unserialized and TOCTOU bugs could happen. Later, in
8ed06ad814, we serialized access to these
fields in a number of helper methods, removing the need for `Atomic`.
Instead of having a single available memory range that encompasses the
whole 0x00000000-0x3EFFFFFF range of physical memory, create a separate
reserved entry for the RAM range used by the VideoCore. This fixes a
crash that happens when we try to allocate physical pages in the GPU's
reserved range.
This will eventually be replaced with parsing the data from the device
tree, but for now, this should solve some of the recurring CI failures.
Like the HID, Audio and Storage subsystems, the Graphics subsystem
(which technically handles GPUs) exposes Unix device files (typically
in /dev).
To ensure consistency across the repository, move all related files to a
new directory under Kernel/Devices called "GPU".
Also remove the redundant "GPU" word from the VirtIO driver directory,
and the word "Graphics" from GraphicsManagement.{h,cpp} filenames.
The implemented cloning mechanism should be sound:
- If a PartitionTable is passed a File with
ShouldCloseFileDescriptor::Yes, then it will keep it alive until the
PartitionTable is destroyed.
- If a PartitionTable is passed a File with
ShouldCloseFileDescriptor::No, then the caller has to ensure that the
file descriptor remains alive.
If the caller is EBRPartitionTable, the same consideration holds.
If the caller is PartitionEditor::PartitionModel, this is satisfied by
keeping an OwnPtr<Core::File> around which is the originally opened
file.
Therefore, we never leak any fds, and never access a Core::File or fd
after destroying it.
This moves the KString, KBuffer, DoubleBuffer, KBufferBuilder,
IOWindow, UserOrKernelBuffer and ScopedCritical classes to the
Kernel/Library subdirectory.
Also, move the panic and assertions handling code to that directory.
When deleting a directory, the rmdir syscall should fail if the path was
unveiled without the 'c' permission. This matches the behavior that
OpenBSD enforces for this kind of operation.
When deleting a file, the unlink syscall should fail if the path was
unveiled without the 'w' permission, to ensure that userspace is aware
of the possibility of removing a file only when the path was unveiled as
writable.
When using the userdel utility, we now unveil that directory path with
the 'c' unveil permission so removal of an account's home directory is
done properly.
The Storage subsystem, like the Audio and HID subsystems, exposes Unix
device files (for example, in the /dev directory). To ensure consistency
across the repository, we should make the Storage subsystem reside in
the Kernel/Devices directory like the two other mentioned subsystems.
This is enforced by the hardware and an exception is generated when the
stack pointer is not properly aligned. This brings us closer to booting
the aarch64 Kernel on baremetal.
This is the only kernel issue blocking us from running the test suite.
Having userspace backtraces printed to the debug console during crashes
isn't vital to the system's function, so let's just return an empty
trace and print a FIXME instead of crashing.
After examining all overridden Inode::traverse_as_directory methods, it
seems like proper locking is already in place everywhere, so there's
no need to take the big process lock anymore, as there's no access to
shared process structures anyway.
The contents of the directory inode could change if we are not holding
a lock, so we must take the m_inode_lock to prevent corruption when
reading the
directory contents.
This is not needed, because when we do this traversal, the functions
that are called from this function use proper and more "atomic"
locking.
"Wherever applicable" = most places, actually :^), especially for
networking and filesystem timestamps.
This includes changes to unzip, which uses DOSPackedTime, since that is
changed for the FAT file systems.
That's what this class really is; in fact that's what the first line of
the comment says it is.
This commit does not rename the main files, since those will contain
other time-related classes in a little bit.
Add an initialize_interrupt_queue() helper that calls enable_irq()
instead of doing it as part of object construction, as it can fail.
This is similar to how AHCI initializes its interrupt as well.
NVMe{Poll|Interrupt}Queue don't have a try_create() method. Add one to
keep it consistent with how we create objects. Also, this commit is in
preparation for moving any initialization-related code out of the
constructor.
This commit lets us differentiate whether access faults are caused by
accessing junk memory addresses given to us by userspace or if we hit a
kernel bug.
The stub implementations of the `safe_*` functions currently don't let
us jump back into them and return a value indicating failure, so we
panic if such a fault happens. Practically, this means that we still
crash, but if the access violation was caused by something else, we take
the usual kernel crash code path and print a register and memory dump,
rather than hitting the `TODO_AARCH64` in `handle_safe_access_fault`.
These are used in futexes, which are needed if we want to get further in
`run-tests`.
For now, we have no way to return a non-fatal error if an access fault
is raised while executing these, so the kernel will panic. Some would
consider this a DoS vulnerability where a malicious userspace app can
crash the kernel by passing bogus pointers to it, but I prefer to call
it progress :^)
Enabling these will fix the Unsupported Exclusive or Atomic access data
fault we get on bare metal Raspberry Pi 3. On A53/A57 chips (and newer),
atomic compare-exchange operations require the data cache to be enabled.
Referencing ARM DDI 0487J.a, update the names of previously reserved
fields, and set the reset_value() of the SCTLR_EL1 struct to reflect
the defaults we want for this register on reboot.
... key-value decomposition
The RaspberryPi firmware will give us a value for the 'video' key that
contains multiple equal signs:
```
video=HDMI-A-1:1920x1080M@30D,margin_left=48,margin_right=48,[...]
```
Instead of asserting that this only has one equal sign, let's just split
it by the first one.
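A standalone sketch of the idea (not the actual parser code):

```
#include <string_view>
#include <utility>

// Split a key=value pair on the first '=' only, so values that
// themselves contain '=' characters survive intact.
std::pair<std::string_view, std::string_view> split_key_value(std::string_view part)
{
    auto equals_index = part.find('=');
    if (equals_index == std::string_view::npos)
        return { part, {} }; // a bare flag without a value
    return { part.substr(0, equals_index), part.substr(equals_index + 1) };
}
// split_key_value("video=HDMI-A-1:1920x1080M@30D,margin_left=48,...")
// keeps everything after the first '=' as the value.
```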
Remove the hardcoded "AHCI Scattered DMA" region name, as it is part of
a common API. Add a region_name parameter to the try_create API so that
this API can be used by other drivers with the correct memory region
name.
The constructor of ScatterGatherList had code that can return an
error. Move it to try_create for better error propagation.
This removes one TODO() and one
release_value_but_fixme_should_propagate_errors().
This removes the TODO from the try_create API to return ErrorOr. This
is also a preparation patch to move the init code that can fail from
the constructor to this try_create function.
These are actually two separate types of syscalls, so let's stop using
special flags for bind mounting or re-mounting and instead let
userspace call them directly for these kinds of actions.
The Multiboot header stores the framebuffer's pitch in bytes, so
multiplying it by the pixel's size is not necessary. We ended up
allocating 4 times as much memory as needed, which caused us to overlap
the MMIO reserved memory area on the Raspberry Pi.
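A sketch of the corrected calculation (names are illustrative, not the
actual boot code):

```
#include <cstddef>
#include <cstdint>

// The Multiboot pitch is already in bytes per row, so no bytes-per-pixel
// factor is needed; the old code multiplied by sizeof(u32) as well,
// over-allocating 4x and overlapping the MMIO reserved region.
size_t framebuffer_size_in_bytes(uint32_t pitch_in_bytes, uint32_t height)
{
    return static_cast<size_t>(pitch_in_bytes) * height;
}
```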
Otherwise, the message's contents might be in the cache only, so
VideoCore will read stale/garbage data from main memory.
This fixes framebuffer setup on bare metal with the data cache enabled.
While the PL011-based UART0 is currently reserved for the kernel
console, UART1 is free to be exposed to the userspace as `/dev/ttyS0`.
This will be used as the stdout of `run-tests-and-shutdown.sh` when
testing the AArch64 kernel.
The Raspberry Pi hardware doesn't support a proper software-initiated
shutdown, so this instead uses the watchdog to reboot to a special
partition which the firmware interprets as an immediate halt on
shutdown. When running under Qemu, this causes the emulator to exit.
We now have everything in the AArch64 kernel to be able to use the full
`__panic` implementation, so we can share the code with x86-64.
I have kept `__assertion_failed` separate for now, as the x86-64 version
directly executes inline assembly, thus `Kernel/Arch/aarch64/Panic.cpp`
could not be removed.
Extend the reserve_irqs, allocate_irq, enable_interrupt and
disable_interrupt APIs to add MSI support to PCI devices.
The current changes only implement single MSI message support.
TODOs have been added to implement Multiple MSI Message (MME) support
in the future.
Add a struct named MSIInfo that stores all the relevant MSI
information as part of the PCI DeviceIdentifier struct.
Populate the MSI struct during the PCI device init.
These functions would have caused a `-Woverloaded-virtual` warning with
GCC 13, as they shadow `File::{attach,detach}(OpenFileDescription&)`.
Both of these functions had a single call site. This commit inlines
`attach` into its only caller, `FIFO::open_direction`.
Instead of explicitly checking `is_fifo()` in `~OpenFileDescription`
before running the `detach(Direction)` overload, let's just override the
regular `detach(OpenFileDescription&)` for `FIFO` to perform this action
instead.
This logo was actually used as a first sign of life in the very early
days of the aarch64 port.
Now that we boot into the graphical mode of the system just fine,
there's no need to keep this.
This is in preparation for adding MSI(x) support to the NVMe device.
NVMeInterruptQueue needs access to the PCI device to deal with MSI(x)
interrupts. It is ok to pass the NVMeController as a reference to the
NVMeQueue as NVMeController is the one that owns the NVMeQueue.
This is very similar to how AHCIController passes its reference to its
interrupt handler.
Add an explicit QueueType enum which could be used to create a poll or
an interrupt queue. This is better than passing an Optional<irq>.
This refactoring is in preparation for adding MSIx support to NVMe.
PCIIRQHandler is a generic IRQ handler that a device driver can inherit
to use either the pin-based or MSI(x)-based interrupt mechanism.
The PCIIRQHandler can do what the existing IRQHandler can do for pin
based interrupts but also deal with MSI based interrupts. We can
hopefully convert all the PCI based devices to use this handler so that
MSI(x) can be used.
Add reserve_irqs, allocate_irq, enable_interrupt and disable_interrupt
API to a PCI device.
reserve_irqs() can be used by a device driver that would like to
reserve irqs for MSI(x) interrupts. The API returns the type of IRQ
that was reserved by the PCI device. If the PCI device does not support
MSI(x), then it is a noop.
The allocate_irq() API can be used to allocate an IRQ at an index. For
MSIx, the driver needs to map the vector table into memory and add the
corresponding IRQ at the given index. This API will return the actual
IRQ that was used so that the driver can use it to create an interrupt
handler for that IRQ.
{enable, disable}_interrupt API is used to enable or disable a
particular IRQ at the given index. It is a noop for pin-based
interrupts. This could be used by IRQHandler to enable or disable an
interrupt.
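A standalone model of how a driver is expected to use this API (all
types and signatures here are hypothetical; the real PCI::Device member
functions differ):

```
#include <cstdint>
#include <optional>

enum class InterruptType {
    PinBased,
    MSI,
    MSIx,
};

class ModelPCIDevice {
public:
    // Reserve `count` vectors; reports which mechanism was actually
    // reserved (PinBased if the device does not support MSI(x)).
    InterruptType reserve_irqs(uint8_t count)
    {
        m_reserved_count = count;
        return m_supports_msix ? InterruptType::MSIx : InterruptType::PinBased;
    }

    // Map the vector-table entry at `index` (for MSIx) and return the
    // actual IRQ number the driver should register a handler for.
    std::optional<uint8_t> allocate_irq(uint8_t index)
    {
        if (index >= m_reserved_count)
            return {};
        return static_cast<uint8_t>(m_base_irq + index);
    }

    // Toggle the vector at `index`; both are no-ops for pin-based interrupts.
    void enable_interrupt(uint8_t index) { (void)index; }
    void disable_interrupt(uint8_t index) { (void)index; }

private:
    bool m_supports_msix { true };
    uint8_t m_reserved_count { 0 };
    uint8_t m_base_irq { 32 };
};
```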
The MSIx table entry is used to program interrupt vectors, and it is
architecture-specific. Add helper function declarations in
Arch/PCIMSI.h. The definitions of these functions are placed in the
respective arch-specific code.
Add a struct named MSIxInfo that stores all the relevant MSIx
information as part of the PCI DeviceIdentifier struct.
Populate the MSIx struct during PCI device init. As the
DeviceIdentifier struct needs to populate MSIx info, don't mark
DeviceIdentifier as const in the PCI::Device class.
MSI(x) interrupts need to reserve IRQs so that they can be programmed
by the device. Add an API to reserve contiguous ranges of interrupt
handlers so that it can be used by PCI devices that use the MSI(x)
mechanism. This API needs to be implemented for the aarch64
architecture.
Set pin-based interrupt handler as reserved during PCI bus init.
This is required so that MSI(x) based interrupts can avoid sharing the
IRQ which has been marked as reserved.
Pin-based PCI devices are allocated an IRQ, and it could be shared with
multiple devices. An interrupt handler with an IRQ for a PCI device
will get registered only during driver initialization.
For MSI(x) interrupts, the driver has to allocate IRQs, and this field
can be used to skip IRQs that have already been reserved by pin-based
interrupts so that we don't have to share IRQs, since sharing generally
reduces performance.
"The official project language is American English […]."
5d2e915623/CONTRIBUTING.md (L30)
Here's a short statistic of the occurrences of the word "behavio(u)r":
$ git grep -IPioh 'behaviou?r' | sort | uniq -c | sort -n
2 BEHAVIOR
24 Behaviour
32 behaviour
407 Behavior
992 behavior
Therefore, it is clear that "behaviour" (56 occurrences) should be
regarded as a typo, and "behavior" (1401 occurrences) should be
preferred.
Note that the occurrences in LibJS are intentionally NOT changed,
because they are taken verbatim from the specification. Hence:
$ git grep -IPioh 'behaviou?r' | sort | uniq -c | sort -n
2 BEHAVIOR
10 behaviour
24 Behaviour
407 Behavior
1014 behavior
I discovered this while working on USB keyboard patches, which
triggered this bug.
The printing format for the VirtualAddress part is incorrect, leading
to another crash when handling a page fault after accessing the
UNMAP_AFTER_INIT code section.
Whenever an entry is added to the cache, the last element is removed to
make space for the new entry (if the cache is full). To make this an LRU
cache, the entry needs to be moved to the front of the list when there
is a cache hit so that the least recently used entry moves to the end
to be evicted first.
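A minimal standalone sketch of that move-to-front step (the kernel
cache uses its own containers; this just illustrates the technique):

```
#include <cstddef>
#include <list>
#include <optional>
#include <unordered_map>
#include <utility>

class LRUCache {
public:
    explicit LRUCache(size_t capacity)
        : m_capacity(capacity)
    {
    }

    void put(int key, int value)
    {
        if (auto it = m_entries.find(key); it != m_entries.end()) {
            it->second.first = value;
            m_order.splice(m_order.begin(), m_order, it->second.second);
            return;
        }
        if (m_entries.size() >= m_capacity && !m_order.empty()) {
            // Evict the least recently used entry from the back.
            m_entries.erase(m_order.back());
            m_order.pop_back();
        }
        m_order.push_front(key);
        m_entries[key] = { value, m_order.begin() };
    }

    std::optional<int> get(int key)
    {
        auto it = m_entries.find(key);
        if (it == m_entries.end())
            return {};
        // Cache hit: move the entry to the front so that the back always
        // holds the least recently used entry.
        m_order.splice(m_order.begin(), m_order, it->second.second);
        return it->second.first;
    }

private:
    size_t m_capacity { 0 };
    std::list<int> m_order; // most recently used at the front
    std::unordered_map<int, std::pair<int, std::list<int>::iterator>> m_entries;
};
```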
Rename the initialize method to initialize_virtio_resources so it's
clear what this method is intended for.
To ensure healthier device initialization, we could also return
ErrorOr<void> from this method, so in all overridden instances and in
the original method code we could leverage the TRY() pattern, which
also simplifies the code a bit.
Since multiboot_modules_count is set to 0, we can safely set the
multiboot_modules pointer to 0 (null pointer), as we don't use multiboot
on aarch64 anyway.
This reuses the existing `RPi::Mailbox` interface to read the command
line via a VideoCore-specific mailbox message. This will have to be
replaced if that interface starts being smarter, as this is needed very
early, and nothing guarantees that a smarter Mailbox interface wouldn't
need to allocate or log, which is a no-no during early boot.
As the response string can be arbitrarily long, it's the caller's job to
provide a long enough buffer for `Mailbox::query_kernel_command_line`.
This commit chose 512 bytes, as it provides a large enough headroom over
the 150-200 characters implicitly added by the VC firmware.
The portable way would be to parse the `/chosen/bootargs` property of
the device tree, but we currently lack the scaffolding for doing that.
Support for this in QEMU relies on a patch that has not yet been
accepted upstream, but is available via our `Toolchain/BuildQEMU.sh`
script. It should, however, work on bare metal.
Tested-By: Timon Kruiper <timonkruiper@gmail.com>
The Raspberry Pi's mailbox interface does not guarantee that the
returned command line is null-terminated. This commit removes that
assumption from the current code, allowing the next commit to add
support for reading it on the Pi.
This also lets us eliminate a few manual `strlen()` calls :^)