thread_context_first_enter reuses the context restoring code in the
trap handler, just like other arches already do.
The `ld x2, 1*8(sp)` is unnecessary in the trap handler, as the stack
pointer should be equal to the stack pointer slot in the RegisterState
if the trap is from supervisor mode (and we currently don't support
user traps).
This load would, however, prevent us from reusing that code for
thread_context_first_enter.
This commit adds two functions which save/restore the entire FPU state.
On RISC-V, you only need to save the floating-point registers
themselves and the fcsr CSR, which contains the entire state of the F/D
extensions.
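A minimal sketch of what the save side can look like (struct layout, names,
and the use of inline assembly are illustrative; only the first two registers
are shown, a real implementation stores all of f0..f31):
```
#include <AK/Types.h>

struct FPUState {
    u64 f[32]; // f0..f31 (64 bits wide with the D extension)
    u64 fcsr;  // rounding mode + accrued exception flags
};

inline void save_fpu_state(FPUState& state)
{
    // frcsr is a pseudo-instruction for csrr rd, fcsr.
    asm volatile("frcsr %0" : "=r"(state.fcsr));
    asm volatile(
        "fsd f0, 0(%0)\n"
        "fsd f1, 8(%0)\n"
        // ... f2..f31 are stored the same way ...
        :
        : "r"(&state.f[0])
        : "memory");
}
```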
Some real hardware apparently uses smaller BAR sizes than sizeof(HBA)
with a completely filled port_regs member.
Change the port_regs array to a flexible array member, so we don't panic
while verifying that the BAR size is large enough to map this struct.
Accesses to this array are already bounds checked against
AHCI::Limits::MaxPorts.
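Roughly, the struct change looks like this (field names and sizes follow the
existing driver; shown here only as a sketch):
```
struct HBA {
    ControlRegs control_regs;
    u8 reserved0[52];
    u8 nvmhci[64];
    u8 vendor_specific[96];
    // Previously `PortRegisters port_regs[32]`. As a flexible array member it
    // no longer contributes to sizeof(HBA), so a BAR that exposes fewer ports
    // still passes the mapping size check; port accesses stay bounds checked
    // against AHCI::Limits::MaxPorts.
    PortRegisters port_regs[];
};
```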
Allowing creation of StorageDevicePartition objects for any arbitrary
BlockDevice objects means that we could technically create a
StorageDevicePartition for another StorageDevicePartition which is
obviously not the intention for this code. Instead, require passing a
StorageDevice reference to ensure this cannot happen.
It is expected that these class members will be set when the object is
created (so they're set in the class constructor method) and never
change again, as it's the driver's responsibility to find these values
before creating a StorageDevice object.
This makes it easier to rely on these values later on as we don't expect
them to ever change for a StorageDevice object during its lifetime.
It calculated the disk size with the zero-based max addressable block
value.
For example, for a disk device with a block size of 512 bytes that has 2
LBAs so it can address LBA 0 and LBA 1 (so m_max_addressable_block is 1)
the calculated disk size will be 512 instead of 1024 bytes.
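In other words, the computation has to account for the zero-based value; a
sketch (block_size() here stands for the device's logical block size):
```
// m_max_addressable_block is zero-based, so the addressable block count is
// one higher than its value.
u64 disk_size_in_bytes = (m_max_addressable_block + 1) * block_size();
```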
We remove can_read() and can_write(), as both of these methods should be
implemented for proper blocking support.
In our case, the previous code would simply block the user if they tried
to read beyond the max addressable offset, which is not correct
behavior.
Instead, just do proper EOF guarding when calling read() and write() on
such objects.
Add a method for mathematical operations when verifying IO operation
boundaries.
Also, make the max_addressable_block method non-virtual, since no
derived class has ever overridden it.
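A sketch of what such a helper could look like (the name, signature, and use
of AK::Checked are illustrative, not necessarily the actual implementation):
```
#include <AK/Checked.h>

// Returns true if [start_block, start_block + block_count) lies entirely
// within the device's addressable range.
bool is_access_in_bounds(u64 start_block, u64 block_count, u64 max_addressable_block)
{
    Checked<u64> end_block = start_block;
    end_block += block_count;
    return !end_block.has_overflow() && end_block.value() <= max_addressable_block + 1;
}
```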
RFC9293 states that a closed socket should reply to all non-RST
packets with an RST. This change implements this behaviour as
specified in section 3.5.2 in bullet point 1.
The former automatically adapts the prefix to binary and octal
output, and is what we already use in the majority of cases.
Patch generated by:
rg -l '0x\{' | xargs sed -i '' -e 's/0x{:/{:#/'
I ran it 4 times (until it stopped changing things) since each
invocation only converted one instance per line.
No behavior change.
According to the Intel Software Developer's Manual Volume 3A section
6.12.1.3, the interrupt gate type means the IF flag is cleared to
prevent nested interruption. The trap gate type does not modify the
IF flag. Thus the gate type argument is important when constructing
an IDTEntry.
On x86_64 and x86, storage_segment (bit 12 counting from 0)
is always 0 according to the Intel Software Developer's Manual,
volume 3A, section 6.11 and section 6.14.1. It has therefore
been removed as a parameter from IDTEntry's constructor and
hardwired to 0.
In a TTY's non-canonical mode, data availability can be configured by
setting VMIN and VTIME to determine the minimum amount of bytes to read
and the timeout between bytes, respectively. Some ports (such as SRB2)
set VMIN to 0 which effectively makes reading a TTY such as stdin a
non-blocking read. We didn't support this, causing ports to hang as soon
as they try to read stdin without any data available.
Add a very duct-tapey implementation for the case where VMIN == 0 by
overwriting the TTY's description's blocking status; 3 FIXMEs are
included to make sure we clean this up some day.
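For reference, this is the kind of setup such ports do on stdin (plain POSIX
termios usage, not SerenityOS-specific code):
```
#include <termios.h>
#include <unistd.h>

int main()
{
    // Non-canonical mode with VMIN == 0 and VTIME == 0: a read() should
    // return immediately (with 0 bytes) if no data is available.
    struct termios t {};
    tcgetattr(STDIN_FILENO, &t);
    t.c_lflag &= ~(ICANON | ECHO);
    t.c_cc[VMIN] = 0;
    t.c_cc[VTIME] = 0;
    tcsetattr(STDIN_FILENO, TCSANOW, &t);

    char buffer[64];
    ssize_t nread = read(STDIN_FILENO, buffer, sizeof(buffer)); // may legally return 0
    return nread < 0 ? 1 : 0;
}
```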
This makes it possible to use MakeIndexSequence in functions like:
template<typename T, size_t N>
constexpr auto foo(T (&a)[N])
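For example, a constexpr sum over a C array becomes possible (a hypothetical
helper; header names assumed):
```
#include <AK/StdLibExtras.h>
#include <AK/Types.h>

template<typename T, size_t N, size_t... Is>
constexpr T sum_impl(T (&a)[N], IndexSequence<Is...>)
{
    return (T {} + ... + a[Is]);
}

template<typename T, size_t N>
constexpr T sum(T (&a)[N])
{
    // N is a size_t, which MakeIndexSequence can now be instantiated with.
    return sum_impl(a, MakeIndexSequence<N> {});
}

// constexpr int values[] = { 1, 2, 3 };
// static_assert(sum(values) == 6);
```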
This means AK/StdLibExtraDetails.h must now include AK/Types.h
for size_t, which means AK/Types.h can no longer include
AK/StdLibExtras.h (which arguably it shouldn't do anyways),
which requires rejiggering some things.
(IMHO Types.h shouldn't use AK::Details metaprogramming at all.
FlatPtr doesn't necessarily have to use Conditional<> and ssize_t could
maybe be in its own header or something. But since it's tangential to
this PR, going with the tried and true "lift things that cause the
cycle up to the top" approach.)
This easily led to kernel deadlocks if the stopped thread held an
important global mutex (like the disk cache lock) while blocking.
Resolve this by ensuring stopped threads have a chance to return to the
userland boundary before actually stopping.
Locking a mutex while holding a spinlock is always wrong, but in the
case of the scheduler lock, it also causes an assertion failure. (Which
would be triggered by 2 separate threads trying to ptrace at the same
time).
This helps ensure no one accidentally accesses m_requests without first
locking its spinlock. In fact, this change fixed such a case, since
process_cq() implicitly assumed the caller locked the lock, which was
not the case for NVMePollQueue::submit_sqe().
Due to an incorrect lambda scope capture declaration, we would copy the
result status at the start of the function, before it actually got
updated with the final status. Capture it by reference instead to
ensure we report the updated result.
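Schematically, the bug had this shape (names are made up for illustration):
```
#include <AK/Format.h>

static void example()
{
    bool succeeded = false;

    // Wrong: `succeeded` is copied when the lambda is created, so this
    // callback always reports the stale initial value.
    auto report_stale = [succeeded] { dbgln("request finished, success={}", succeeded); };

    // Right: capture by reference so the callback observes the final status.
    auto report_final = [&succeeded] { dbgln("request finished, success={}", succeeded); };

    succeeded = true; // the operation later completes successfully
    report_stale();   // prints success=false
    report_final();   // prints success=true
}
```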
Instead of assuming data races won't occur and trying to somehow verify
it with manual un-atomic tracking, we can just use a recursive spinlock
instead of a normal one, to resolve the original deadlock.
Most of the actual logic is identical, with the only real difference
being that one wraps it with an async work item.
Merge the implementations to reduce duplication (which also means the
fixes in the next commits only need to be done once).
Multiple fields in sstatus are defined as WPRI "Reserved Writes Preserve
Values, Reads Ignore Values", which means we have to preserve their
values when writing to other fields in the same CSR.
We don't really need to touch any fields except SIE.
Interrupts are probably already disabled, but just to be safe,
disable them explicitly.
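Since WPRI fields must not be clobbered, clearing SIE is best done with a CSR
clear instruction rather than a full write to sstatus; a minimal sketch:
```
inline void disable_supervisor_interrupts()
{
    // csrc only clears the bits given in the mask, so all other (including
    // WPRI) fields in sstatus keep their current values.
    constexpr unsigned long sie_bit = 1ul << 1; // sstatus.SIE
    asm volatile("csrc sstatus, %0" ::"r"(sie_bit));
}
```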
We need to handle the character map to set the code point before we can
reassign the correct key to queued_event.key. This fixes keyboard
shortcuts using the incorrect keys based on the keyboard layout.
Automarks are similar to bookmarks placed by the terminal, allowing the
user to selectively remove a single command and its output from the
terminal scrollback.
This commit implements a single way to add marks: automatically placing
them when the shell becomes interactive.
To make sure the shell behaves correctly after its expected prompt
position changes, the terminal layer forces a resize event to be passed
to the shell on such (possibly) partial clears; this also has the nice
side effect of fixing the disappearing prompt on the preexisting "clear
including history" action: Fixes#4192.
I just copy-pasted microseconds_since_boot and
set_interrupt_interval_usec from aarch64.
However, on RISC-V, they are not in microseconds.
The TimerRegisters struct is also unused.
current_time and set_compare can also be private and static.
IRQHandler is not the correct class to inherit from, as the timer
is not connected to an IRQController.
Each hart has one of these Timers directly connected to it.
I originally added this header because I misunderstood how
IRQControllers are supposed to be used.
I thought that I would need an IRQController class for the hart-local
interrupt controller, but apparently, this class is supposed to be used
for non-local interrupt controllers like the IOAPIC or RISC-V PLIC.
x86 LAPICs don't use this class either.
I am not sure why 096cecb95e disabled the stack protector and sanitizers
for all files, but this is not necessary.
Only the pre_init code needs to run without them, as that code runs
identity mapped.
SysFS, ProcFS and DevPtsFS were all sending filetype 0 when traversing
their directories, but it is actually very easy to send proper filetypes
in these filesystems.
This patch makes all RAM-backed filesystems use a single enum for
their internal filetype, to simplify the implementation and allow
sharing of code.
Please note that the Plan9FS case is currently not solved as I am not
familiar with this filesystem and its constructs.
The ProcFS mostly keeps track of the filetype, and a fix was needed for
the /proc root directory - each process is exposed as a directory inside
it, which makes it very easy to hardcode the directory filetype for
those entries.
There's also the `self` symlink inode which is now exposed as DT_LNK.
As for SysFS, we can leverage the fact that everything inherits from
the SysFSComponent class, so we can have a virtual const method to
return the proper filetype.
Most of the files in SysFS are "regular" files though, so the base class
has a non-pure virtual method.
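A sketch of that approach (the enum and member names here are illustrative):
```
#include <AK/Types.h>

enum class RAMBackedFileType : u8 {
    Directory,
    Character,
    Block,
    Regular,
    FIFO,
    Symlink,
    Socket,
    Unknown,
};

class SysFSComponent {
public:
    virtual ~SysFSComponent() = default;
    // Most SysFS nodes are regular files, so the base class provides a
    // non-pure default; directories (and symlinks) override it.
    virtual RAMBackedFileType type() const { return RAMBackedFileType::Regular; }
};

class SysFSDirectory : public SysFSComponent {
public:
    virtual RAMBackedFileType type() const override { return RAMBackedFileType::Directory; }
};
```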
Lastly, the DevPtsFS simply hardcodes '.' and '..' as the directory
file type, and everything else as the character device file type, as
this filesystem only exposes character pts device files.
This is for some reason needed for riscv64 clang, as otherwise the
kernel.map file would grow too big to fit in its section inside the
kernel image.
None of our other architectures have temporary locals in their
kernel.map.
We initialize the MMU by first setting up the page tables for the
kernel image and the initial kernel stack.
Then we jump to an identity-mapped page which makes the newly created
kernel root page table active by setting `satp` and then jumps to
`init`.
This trap handler uses the SBI to print an error message via a newly
introduced panic function, which is necessary as `pre_init` is running
identity mapped.
Also add a header file for `pre_init.cpp`, as we want to use the panic
and `dbgln` functions in the MMU init code as well.
We first try to use the newer "SRST" extension for rebooting and
shutting down and if that fails, we try to shutdown using the legacy
"System Shutdown" extension (which can't reboot, so we always shutdown).
The kernel will halt if we return from here due to all attempts at
rebooting / shutting down failing.
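The call sequence is roughly as follows, shown here with raw SBI ecalls per
the SBI specification rather than the kernel's actual wrapper API:
```
// An SBI call places the extension id in a7, the function id in a6, and the
// arguments in a0/a1; the error code comes back in a0.
static long sbi_call(unsigned long eid, unsigned long fid, unsigned long arg0, unsigned long arg1)
{
    register unsigned long a0 asm("a0") = arg0;
    register unsigned long a1 asm("a1") = arg1;
    register unsigned long a6 asm("a6") = fid;
    register unsigned long a7 asm("a7") = eid;
    asm volatile("ecall" : "+r"(a0), "+r"(a1) : "r"(a6), "r"(a7) : "memory");
    return static_cast<long>(a0);
}

static void poweroff()
{
    // Try SRST first: extension "SRST" (0x53525354), FID 0,
    // reset_type = shutdown (0), reset_reason = no reason (0).
    if (sbi_call(0x53525354, 0, 0, 0) != 0) {
        // Fall back to the legacy System Shutdown extension (EID 0x08).
        sbi_call(0x08, 0, 0, 0);
    }
    // If both calls returned, every attempt failed and the caller halts.
}
```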
This device will be used by userspace to read mouse packets from all
mouse devices that are attached to the machine.
This change is a preparation before we can enable seamless hotplug
capabilities in WindowServer for mouse devices, without any major change
on the userspace side.
We do this by implementing the following fixes:
- Key_Plus is now assigned to the proper map entry index, which is 0x4e
for both the keypad and non-keypad keys.
- Shift+Q now prints out "Q" properly on scan code set 2.
- Key BackSlash (or Pipe when the shift key is pressed down) now works
properly as well.
- Key_Pipe (which is "|" for en-US layout) is now working in scan code
set 2.
- Numpad keys as well as the decimal separator key are working again.
According to RFC 9293 Section 3.6.1. Half-Closed Connections, we should
still accept incoming packets in the FinWait2 state. Additionally, we
didn't handle the FIN+ACK case. We should handle this the same way we
handle the FIN flag. The ACK is only added to signify successful
reception of the last packet.
The specification uses awkward numbering, marking the first byte as 7,
and the last one as 0, which caused me to misunderstand their ordering,
and use the last byte's address as the first one, and so on.
This scan code set is more advanced than the basic scan code set 1, and
is required to be supported for some bare metal hardware that might not
properly enable the PS2 first port translation in the i8042 controller.
LibWeb can now also generate bindings for keyboard events like the Pause
key, as well as other function keys (such as Right Alt, etc).
The logic for handling scan code sets is implemented by the PS2 keyboard
driver and is abstracted from the main HID KeyboardDevice code which
only handles "standard" KeyEvent(s).
RISC-V uses a different convention for storing stack frame information
described here: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc#frame-pointer-convention
This part of the psABI is not yet in a ratified version, but both GCC
and Clang seem to use this convention.
Note that the backtrace dumping code still won't work for the initial
stack, as it is located before `kernel_mapping_base`.
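Concretely, under that convention the saved return address sits at fp - 8 and
the caller's frame pointer at fp - 16, so a frame walker looks roughly like
this (validation of the addresses is elided):
```
#include <AK/Format.h>
#include <AK/Types.h>

struct FrameRecord {
    FlatPtr previous_fp;    // stored at fp - 16
    FlatPtr return_address; // stored at fp - 8
};

inline void dump_backtrace_from(FlatPtr fp)
{
    while (fp != 0) {
        auto const* record = reinterpret_cast<FrameRecord const*>(fp) - 1;
        dbgln("{:#x}", record->return_address);
        fp = record->previous_fp;
    }
}
```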
To allow for easy mapping between the kernel virtual addresses and
KASAN shadow memory, we map shadow memory at the very end of the
virtual range, so that we can index into it using just an offset.
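With the shadow region anchored at the end of the address space, the
address-to-shadow translation is a plain offset computation; schematically
(the constants are placeholders, and the usual one-shadow-byte-per-8-bytes
KASAN scaling is assumed):
```
#include <AK/Types.h>

extern FlatPtr kasan_shadow_base;   // start of the shadow region at the end of the VA range
extern FlatPtr kernel_mapping_base; // start of the kernel virtual range

inline FlatPtr address_to_shadow(FlatPtr address)
{
    // One shadow byte tracks the state of 8 bytes of kernel address space.
    return kasan_shadow_base + ((address - kernel_mapping_base) >> 3);
}
```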
To ensure this range is free when needed, we restrict the possible
KASLR range when KASAN is enabled to make sure we don't use the end of
the virtual range.
This fixes the random kernel panics that could occur when KASAN is
enabled, if the kernel was randomly placed at the very end of the
virtual range.
This commit adds minimal support for compiler-instrumentation based
memory access sanitization.
Currently we only support detection of kmalloc redzone accesses, and
kmalloc use-after-free accesses.
Support for inline checks (for improved performance), and for stack
use-after-return and use-after-scope detection, is left for future PRs.
This scan code set is more advanced than the basic scan code set 1, and
is required to be supported for some bare metal hardware that might not
properly enable the PS2 first port translation in the i8042 controller.
LibWeb can now also generate bindings for keyboard events like the Pause
key, as well as other function keys (such as Right Alt, etc).
The logic for handling scan code sets is implemented by the PS2 keyboard
driver and is abstracted from the main HID KeyboardDevice code which
only handles "standard" KeyEvent(s).
This will be used later on by WindowServer so it will not use the
scancode, which will represent the actual character index in the
keyboard mapping when using scan code set 2.
This adds a simple EHCI driver that currently only interrogates the
device and checks if all ports are addressable via associated legacy
controllers (companion controllers), and warns if this is not the case.
This also adds a lot of the other data structures needed for actually
driving the controller, but these are currently not hooked up to
anything.
To test this, run with `SERENITY_EXTRA_QEMU_ARGS="--device usb-ehci"`
or the q35 machine type.
This should allow us to eventually properly saturate high-bandwidth
network links when using TCP, once other nonoptimal parts of our
network stack are improved.
Instead of lying and claiming we always have space left in our receive
buffer, actually report the available space.
While this doesn't really affect network-bound workloads, it makes a
world of difference in cpu/disk-bound ones, like git clones, resulting
in a considerable speed-up and in some cases making them work at all
(instead of the sender side hanging up the connection due to timeouts).
Previously we would incorrectly handle the (somewhat uncommon) case of
binding and then separately connecting a tcp socket to a server, as we
would register the socket during the manual bind(2) in the sockets by
tuple table, but our effective tuple would then change as the result of
the connect updating our target peer address. This would result in the
entry not being removed from the table on destruction, which could lead
to a UAF.
We now make sure to update the table entry if needed during connects.
POSIX (rightfully so) specifies that the sendto address argument is
ignored in connection-oriented protocols.
The TCPSocket also assumed the peer address may not change post-connect
and would trigger a UAF in sockets_by_tuple() when it did.
POSIX requires that broadcast sends will only be allowed if the
SO_BROADCAST socket option was set on the socket.
Also, broadcast sends to protocols that do not support broadcast (like
TCP), should always fail.
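From userspace, this means broadcast sends now have to be opted into
explicitly (standard POSIX usage):
```
#include <netinet/in.h>
#include <sys/socket.h>

int main()
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    // Without this option, a sendto() to a broadcast address is now rejected.
    int enable = 1;
    setsockopt(fd, SOL_SOCKET, SO_BROADCAST, &enable, sizeof(enable));
    return 0;
}
```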
Since the POSIX sigaltstack manpage suggests allocating the stack
region using malloc(), and many heap implementations (including ours)
store heap chunk metadata in memory just before the vended pointer,
we would end up zeroing the metadata, leading to various crashes.
In a bunch of cases, this actually ends up simplifying the code as
to_number will handle something such as:
```
Optional<I> opt;
if constexpr (IsSigned<I>)
opt = view.to_int<I>();
else
opt = view.to_uint<I>();
```
For us.
The main goal here however is to have a single generic number conversion
API between all of the String classes.
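A rough sketch of the resulting call sites (the exact spelling of the helper
may differ slightly):
```
#include <AK/StringView.h>

static void example()
{
    // One entry point regardless of signedness, instead of manually picking
    // between to_int<T>() and to_uint<T>():
    auto maybe_port = "8080"sv.to_number<u16>();  // Optional<u16>
    auto maybe_offset = "-42"sv.to_number<i64>(); // Optional<i64>
    (void)maybe_port;
    (void)maybe_offset;
}
```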
Our existing AnonymousVMObject cloning flow contains an optimization
wherein purgeable VMObjects which are marked volatile during the clone
are created as a new zero-filled VMObject (as if it was purged), which
lets us skip the expensive COW process.
Unfortunately, one crucial part was missing: marking the cloned region
as purged (which is the value returned from madvise when unmarking the
region as volatile), so the userland logic was left unaware of the
effective zeroing of their memory region, resulting in odd behaviour
and crashes in places like our malloc's large allocation support.
The signal handling code (and possibly other code as well) expects this
struct to have an alignment of 16 bytes, as it pushes this struct on the
stack.
MasterPTY::read called DoubleBuffer::read which takes a mutex (which
may block) while holding m_slave's spinlock. If it did block, and was
later rescheduled on a different physical CPU, we would deadlock on
re-locking m_slave inside the unblock callback. (Since our recursive
spinlock implementation is processor based and not process based)
MasterPTY's double buffer unblock callback would take m_slave's
spinlock and then call evaluate_block_conditions() which would take
BlockerSet's spinlock, while on the other hand, BlockerSet's
add_blocker would take BlockerSet's spinlock, and then call
should_add_blocker, which would call unblock_if_conditions_are_met,
which would then call should_unblock, which will finally call
MasterPTY::can_read() which will take m_slave's spinlock.
Resolve this by moving the call to evaluate_block_conditions() out of
the scope of m_slave's spinlock, as there's no need to hold the lock
while calling it anyways.
If there are no loadable segments, then there can't be any code to
execute either. This resolves a crash that these kinds of ELF files
would cause in the directly following VERIFY statement.
Previously we would unintentionally leave them zero-initialized,
resulting in any threads created post fork (but without execve) having
invalid thread local storage pointers stored in their FS register.
This commit adds all necessary includes, so all functions are properly
declared.
PCI.cpp is moved to PCI/Initializer.cpp, as that matches the header
path.
The `[[gnu::packed]]` attribute apparently lowered the required
alignment of the structs, which caused the compiler to generate two
1 byte loads/stores on RISC-V. This caused the kernel to read/write
incorrect values, as the device only seems to accept 2 byte operations.
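As a small illustration of the effect (struct and field names are made up):
```
#include <cstdint>

struct [[gnu::packed]] PackedRegs {
    uint16_t queue_notify;
};
struct UnpackedRegs {
    uint16_t queue_notify;
};

// With the packed attribute the required alignment drops to 1, so a 16-bit
// store to queue_notify may be split into two 1-byte stores on riscv64.
// Without it the alignment is 2 and a single 2-byte store is emitted, which
// is what the device expects.
static_assert(alignof(PackedRegs) == 1);
static_assert(alignof(UnpackedRegs) == 2);
```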
`MM.protect_kernel_image` would otherwise make the contents of these
sections read-only, as they were for some reason placed before `.data`
and after the start of `.text`.
RAMFS was passing 0, which led to userspace seeing all entries as
DT_UNKNOWN when iterating over the directory contents.
To repro prior to this commit, simply check `echo /tmp/*/`.
Following 77441079dd, the code in Kernel/Devices/HID/MouseDevice.cpp
is used by both USB and PS2 rodents. Make sure not to emit misleading
debug messages that could suggest that a USB mouse is a PS/2 one.
Other arches don't use the prekernel, so don't try to unmap it on
non-x86 platforms.
For some reason, this didn't cause aarch64 to crash, but on riscv64 this
would cause a panic.
This doesn't affect system functionality, but somewhat reduces the
reliance on complicated hardcoded paths. It also allows the user to
simply link /init (which is normally a symbolic link) to another program
to run it instead of SystemServer as the default option.