This commit creates a new library LibPartition which will contain
partition related code sharable between Kernel and Userland and
includes DiskPartitionMetadata as the first shared class.
This argument is always set to description.is_blocking(), but
description is also given as a separate argument, so there's no point
to piping it through separately.
The interrupts enabled check in the Kernel mutex is there so that we
don't lock mutexes within a spinlock, because mutexes reenable
interrupts and that will mess up the spinlock in more ways than one if
the thread moves processors. This check is guarded behind a debug flag
because it's too hard to fix all the problems at once, but we regressed
and weren't even getting to init stage 2 with it enabled. With this
commit, we get to stage 2 again. In early boot, there are no interrupts
enabled and spinlocks used, so we can sort of kind of safely ignore the
interrupt state. There might be a better solution with another boot
state flag that checks whether APs are up (because they have interrupts
enabled from the start) but that seems overkill.
Right now the TD and QH descriptor pools look to be susceptible
to a race condition in the event they are accessed simultaneously
by separate threads making USB transfers. This fix does not seem to
add any noticeable overhead.
IDEChannel which is an ATAPort derived class holded a NonnullRefPtr to a
parent IDEController, although we can easily defer the usage of it to
not be in the IDEChannel code at all, so it allows to keep NonnullRefPtr
to the parent ATAController in the ATAPort base class and only there.
This abstraction layer is mainly for ATA ports (AHCI ports, IDE ports).
The goal is to create a convenient and flexible framework so it's
possible to expand to support other types of controller (e.g. Intel PIIX
and ICH IDE controllers) and to abstract operations that are possible on
each component.
Currently only the ATA IDE code is affected by this, making it much
cleaner and readable - the ATA bus mastering code is moved to the
ATAPort code so more implementations in the near future can take
advantage of such functionality easily.
In addition to that, the hierarchy of the ATA IDE code resembles more of
the SATA AHCI code now, which means the IDEChannel class is solely
responsible for getting interrupts, passing them for further processing
in the ATAPort code to take care of the rest of the handling logic.
We do that to increase clarity of the major and secondary components in
the subsystem. To ensure it's even more understandable, we rename the
files to better represent the class within them and to remove redundancy
in the name.
Also, some includes are removed from the general components of the ATA
components' classes.
We are able to read the EDID from SysFS, therefore there's no need to
provide this ioctl on a DisplayConnector anymore.
Also, now we can simply require the video pledge to be set before doing
any ioctl on a DisplayConnector.
It is starting to get a little messy with how each device can try to add
or remove itself to either /sys/dev/block or /sys/dev/char directories.
To better do this, we introduce 4 virtual methods to take care of that,
so until we ensure all nodes in /sys/dev/block and /sys/dev/char are
actual symlinks, we allow the Device base class to call virtual methods
upon insertion or before being destroying, so it add itself elegantly to
either of these directories or remove itself when needed.
For special cases where we need to create symlinks, we have two virtual
methods to be called otherwise to do almost the same thing mentioned
before, but to use symlinks instead.
Under normal conditions (when mounting SysFS in /sys), there will be a
new directory in the /sys/devices directory called "graphics".
For now, under that directory there will be only a sub-directory called
"connectors" which will contain all DisplayConnectors' details, each in
its own sub-directory too, distinguished in naming with its minor
number.
Therefore, /sys/devices/graphics/connectors/MINOR_NUMBER/ will contain:
- General device attributes such as mutable_mode_setting_capable,
double_buffering_capable, flush_support, partial_flush_support and
refresh_rate_support. These values are exposed in the ioctl interface
of the DisplayConnector class too, but these can be useful later on
for command line utilities that want/need to expose these basic
settings.
- The EDID blob, simply named "edid". This will help userspace to fetch
the edid without the need of using the ioctl interface later on.
This change in fact does the following:
1. Use support for symlinks between /sys/dev/block/ storage device
identifier nodes and devices in /sys/devices/storage/{LUN}.
2. Add basic nodes in a /sys/devices/storage/{LUN} directory, to let
userspace to know about the device and its details.
These methods are essentially splitted from the after_inserting method
and the will_be_destroyed method so later on we can allow Storage
devices to override the after_inserting method and the will_be_destroyed
method while still being able to use shared functionality as before,
such as adding the device to and removing it from the device list.
This enforces us to remove duplicated code across the SysFS code. This
results in great simplification of how the SysFS works now, because we
enforce one way to treat SysFSDirectory objects.
This will be used later on to help connecting a node at /sys/dev/block/
that represents a Storage device to a directory in /sys/devices/storage/
with details on that device in that directory.
These methods will be used later on to introduce symbolic links support
in the SysFS, so the kernel will be able to resolve relative paths of
components in filesystem based on using the m_parent_directory pointer
in each SysFSComponent object.
LUN address is essentially how people used to address SCSI devices back
in the day we had these devices more in use. However, SCSI was taken as
an abstraction layer for many Unix and Unix-like systems, so it still
common to see LUN addresses in use. In Serenity, we don't really provide
such abstraction layer, and therefore until now, we didn't use LUNs too.
However (again), this changes, as we want to let users to address their
devices under SysFS easily. LUNs make sense in that regard, because they
can be easily adapted to different interfaces besides SCSI.
For example, for legacy ATA hard drive being connected to the first IDE
controller which was enumerated on the PCI bus, and then to the primary
channel as slave device, the LUN address would be 0:0:1.
To make this happen, we add unique ID number to each StorageController,
which increments by 1 for each new instance of StorageController. Then,
we adapt the ATA and NVMe devices to use these numbers and generate LUN
in the construction time.
This folder in the SysFS code represents everything related to /sys/dev,
which is a directory meant to be a convenient interface to track all IDs
of all block and character devices (ID = major:minor numbers).
This bug was probably around for a very long time, but it is noticeable
only under VirtualBox as it generated an non fatal error which caused a
kernel panic because we VERIFYed the wrong lock to be locked.
There's no point in keeping this method as we don't really care if a
graphics adapter is VGA compatible or not because we don't use this
method anymore.
There's no real value in separating physical pages to supervisor and
user types, so let's remove the concept and just let everyone to use
"user" physical pages which can be allocated from any PhysicalRegion
we want to use. Later on, we will remove the "user" prefix as this
prefix is not needed anymore.
We are limited on the amount of supervisor pages we can allocate, so
don't allocate from that pool. Supervisor pages are always below 16 MiB
barrier so using those was crucial when we used devices like the ISA
SoundBlaster 16 card, because that device required very low physical
addresses to be used.
This used to be needed to protect accesses to Process::all_instances.
That list now has a more granular lock, so we don't need to take the
scheduler lock.
This fixes a crash when we try to access a locked Thread::m_fds in the
loop, which calls Thread::block, which then asserts that the scheduler
lock must not be locked by the current process.
Fixes#13617
We should not allocate a kernel region inside the constructor of the
VGATextModeConsole class. We do use MUST() because allocation cannot
fail at this point, but that happens in the static factory method
instead.
The original intention was to support other types of consoles based on
standard VGA modes, but it never came to an implementation, nor we need
such feature at all.
Therefore, this class is not needed and can be removed.
While null StringViews are just as bad, these prevent the removal of
StringView(char const*) as that constructor accepts a nullptr.
No functional changes.
This prevents us from needing a sv suffix, and potentially reduces the
need to run generic code for a single character (as contains,
starts_with, ends_with etc. for a char will be just a length and
equality check).
No functional changes.
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).
No functional changes.
This commit moves the length calculations out to be directly on the
StringView users. This is an important step towards the goal of removing
StringView(char const*), as it moves the responsibility of calculating
the size of the string to the user of the StringView (which will prevent
naive uses causing OOB access).
We never supported VGA framebuffers and that folder was a big misleading
part of the graphics subsystem.
We do support bare-bones VGA text console (80x25), but that only happens
to be supported because we can't be 100% sure we can always initialize
framebuffer so in the worst scenario we default to plain old VGA console
so the user can still use its own machine.
Therefore, the only remaining parts of VGA is in the GraphicsManagement
code to help driving the VGA text console if needed.
Uncommitted pages (shared zero pages) can not contain any existing data
and can not be modified, so there's no point to committing a bunch of
extra pages to cover for them in the forked child.
Since both the parent process and child process hold a reference to the
COW committed set, once the child process exits, the committed COW
pages are effectively leaked, only being slowly re-claimed each time
the parent process writes to one of them, realizing it's no longer
shared, and uncommitting it.
In order to mitigate this we now hold a weak reference the parent
VMObject from which the pages are cloned, and we use it on destruction
when available to drop the reference to the committed set from it as
well.
Until the thread is first set as Runnable at the end of sys$fork, its
state is Invalid, and as a result, the Finalizer which is searching for
Dying threads will never find it if the syscall short-circuits due to
an error condition like OOM. This also meant the parent Process of the
thread would be leaked as well.
The extra argument to fcntl is a pointer in the case of F_GETLK/F_SETLK
and we were pulling out a u32, leading to pointer truncation on x86_64.
Among other things, this fixes Assistant on x86_64 :^)
This is not explicitly specified by POSIX, but is supported by other
*nixes, already supported by our sys$bind, and expected by various
programs. While were here, also clean up the user memory copies a bit.
We were previously assuming that the how value was a bitfield, but that
is not the case, so we must explicitly check for SHUT_RDWR when
deciding on the read and write shutdowns.
The previous check for valid how values assumed this field was a bitmap
and that SHUT_RDWR was simply a bitwise or of SHUT_RD and SHUT_WR,
which is not the case.
`sigsuspend` was previously implemented using a poll on an empty set of
file descriptors. However, this broke quite a few assumptions in
`SelectBlocker`, as it verifies at least one file descriptor to be
ready after waking up and as it relies on being notified by the file
descriptor.
A bare-bones `sigsuspend` may also be implemented by relying on any of
the `sigwait` functions, but as `sigsuspend` features several (currently
unimplemented) restrictions on how returns work, it is a syscall on its
own.
When updating the signal mask, there is a small frame where we might set
up the receiving process for handing the signal and therefore remove
that signal from the list of pending signals before SignalBlocker has a
chance to block. In turn, this might cause SignalBlocker to never notice
that the signal arrives and it will never unblock once blocked.
Track the currently handled signal separately and include it when
determining if SignalBlocker should be unblocking.
I haven't found any POSIX specification on this, but the Linux kernel
appears to handle it like that.
This is required by QEMU, as it just bulk-unlocks all its file locking
bytes without checking first if they are held.
Access to RDTSC is occasionally restricted to give malware one less
option to accurately time attacks (side-channels, etc.).
However, QEMU requires access to the timestamp counter for the exact
same reason (which is accurately timing its CPU ticks), so lets just
enable it for now.
The initialize_hba method now calls the reset method to reset the HBA
and initialize each AHCIPort. Also, after full HBA reset we need to turn
on the AHCI functionality of the HBA and global interrupts since they
are cleared to 0 according to the specification in the GHC register.
Instead of doing this in a parent class like the AHCIController, let's
do that directly in the AHCIPort class as that class is the only user of
these sort of physical pages. While it seems like we waste an entire 4KB
of physical RAM for each allocation, this could serve us later on if we
want to fetch other types of logs from the ATA device.
The way AHCIPortHandler held AHCIPorts and even provided them with
physical pages for the ATA identify buffer just felt wrong.
To fix this, AHCIPortHandler is not a ref-counted object anymore. This
solves the big part of the problem, because AHCIPorts can't hold a
reference to this object anymore, only the AHCIController can do that.
Then, most of the responsibilities are shifted to the AHCIController,
making the AHCIPortHandler a handler of port interrupts only.
The AHCI code is not very good at OOM conditions, so this is a first
step towards OOM correctness. We should not allocate things inside C++
constructors because we can't catch OOM failures, so most allocation
code inside constructors is exported to a different function.
Also, don't use a HashMap for holding RefPtr of AHCIPort objects in
AHCIPortHandler because this structure is not very OOM-friendly. Instead
use a fixed Array of 32 RefPtrs, as at most we can have 32 AHCI ports
per AHCI controller.
To prevent a race condition in case we received the ARP response in the
window between creating and initializing the Thread Blocker and the
actual blocking, we were checking if the IP address was updated in the
ARP table just before starting to block.
Unfortunately, the condition was partially flipped, which meant that if
the table was updated with the IP address we would still end up
blocking, at which point we would never end unblocking again, which
would result in LookupServer locking up as well.
Currently when allocating buffers for USB transfers, it is done
once for every transfer rather than once upon creation of the
USB device. This commit changes that by moving allocation of buffers
to the USB Pipe class where they can be reused.
In the same fashion like in the Linux kernel, we support pre-initialized
framebuffers that were set up by either the BIOS or the bootloader.
These framebuffers can be backed by any kind of video hardware, and are
not tied to VGA hardware at all. Therefore, this code should be in a
separate sub-folder in the Graphics subsystem to indicate this.
The flag will automatically initialize all variables to a pattern based
on it's type. The goal being here is to eradicate an entire bug class
of issues that can originate from uninitialized stack memory.
Some examples include:
- Kernel information disclosure, where uninitialized struct members
or struct padding is copied back to usermode, leaking kernel
information such as stack or heap addresses, or secret data like
stack cookies.
- Control flow based on uninitialized memory can cause a variety of
issues at runtime, including stack corruptions like buffer
overflows, heap corruptions due to deleting stray pointers.
Even basic logic bugs can result from control flow operating on
uninitialized data.
As of GCC 12 this flag is now supported.
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=a25e0b5e6ac8a77a71c229e0a7b744603365b0e9
Clang has already supported it for a few releases.
https://reviews.llvm.org/D54604
When the size of the audio data was not a multiple of a page size,
subtracting the page size from this unsigned variable would underflow it
close to 2^32 and be clamped to the page size again. This would lead to
writes into garbage addresses because of an incorrect write size,
interestingly only causing the write() call to error out.
Using saturating math neatly fixes this problem and allows buffer
lengths that are not a multiple of a page size.
Currently CursorStyle enum handles both the styles and the steadiness or
blinking of the terminal caret, which doubles the amount of its entries.
This commit changes CursorStyle to CursorShape and moves the blinking
option to a seperate boolean value.
The RDGSBASE userspace instruction allows programs to read the contents
of the gs segment register which contains a kernel pointer to the base
of the current Processor struct.
Since we don't use this instruction in Serenity at the moment, we can
simply disable it for now to ensure we don't break KASLR. Support can
later be restored once proper swapping of the contents of gs is done on
userspace/kernel boundaries.
This is basically unchanged since the beginning of 2020, which is a year
before we had proper ASLR.
Now that we have a proper ASLR implementation, we can turn this down a
bit, as it is no longer our only protection against predictable dynamic
loader addresses, and it actually obstructs the default loading address
of x86_64 quite frequently.
There's nothing stopping a userspace program from keeping a bunch of
threads around with a custom signal stack in a suspended state with
their normal thread stack mprotected to PROT_NONE.
OpenJDK seems to do this, for example.
In a previous commit I moved everything into the new subdirectories in
FileSystem/SysFS directory without trying to actually make changes in
the code itself too much. Now it's time to split the code to make it
more readable and understandable, hence this change occurs now.
This is necessary for the next commit in the patch, otherwise this can't
be compiled. It seems like this was a hidden issue that is discovered
now only by changing includes in a mass-scale.
Move methods that are overriding the virtual methods in the File class,
to a private access scope in the DisplayConnector class because nobody
tries to access them in any derived class of this class.
- Remove some magic numbers
- Remove some duplicate branches
- Reduce the amount of casting between u8* and u32*
- Some renaming of confusing variables
The WindowServer doesn't use this interface anymore and therefore it's
not used by any userspace application, so let's remove this stale method
to ensure we don't have to bother with it anymore.
The mmap interface was removed when we introduced the DisplayConnector
class, as it was quite unsafe to use and didn't handle switching between
graphical and text modes safely. By using the SharedFramebufferVMObject,
we are able to elegantly coordinate the switch by remapping the attached
mmap'ed-Memory::Region(s) with different mappings, therefore, keeping
WindowServer to think that the mappings it has are still valid, while
they are going to a different physical range until we are back to the
graphical mode (after a switch from text mode).
Most drivers take advantage of the fact that we know where is the actual
framebuffer in physical memory space, the SharedFramebufferVMObject is
created with that information. However, the VirtIO driver is different
in that aspect, because it relies on DMA transactions to show graphics
on the framebuffer, so the SharedFramebufferVMObject is created with
that mindset to support the arbitrary framebuffer location in physical
memory space.
This new type of VMObject will be used to coordinate switching safely
from graphical mode to text mode and vice-versa, by supplying a way to
remap all Regions that were created with this object, so mappings can be
changed according to the given state of system mode. This makes it quite
easy to give applications like WindowServer the feeling of having full
access to the framebuffer device from a DisplayConnector, but still keep
the Kernel in control to be able to safely switch to text console.
We should first enable the VirtualConsole and then enable graphical
mode, to ensure proper display output on the switched-to virtual console
that has been chosen. When de-activating graphical mode, we do the
de-activating first then enable the VirtualConsole to ensure proper text
output on screen.
Keeping the exact details of a dirty rectangle doesn't make any sense
when we just flush the entire screen, so just keep a simple boolean
value to know if the screen needs to be flushed or not.
This change unifies the naming convention for kernel tasks.
The goal of this change is to:
- Make the task names more descriptive, so users can more
easily understand their purpose in System Monitor.
- Unify the naming convention so they are consistent.
For an upcoming change to support interrupts in this driver, this class
has to inherit from IRQHandler. That in turn will make this class
virtual, which will then actually call the destructor of the class. We
don't want this to happen, thus we have to wrap the class in a
AK::NeverDestroyed.
These 2 classes currently contain much code that is x86(_64) specific.
Move them to the architecture specific directory. This also allows for a
simpler implementation for aarch64.
This register can be used to check whether the 4 different types of
interrupts are masked. A different variant can be used to set/clear
specific interrupt bits.
This requires us to add an Interrupts.h file in the Kernel/Arch
directory, which includes the architecture specific files.
The commit also stubs out the functions to be able to compile the
aarch64 Kernel.
This name was misleading, as it wasn't really "getting" anything. It has
hence been renamed to `enumerate_interfaces` to reflect what it's
actually doing.
For the same reason we ignore interfaces without an IP address when
choosing where to send a route, we should also ignore interfaces without
IP addresses when updating the ARP table on incoming packets from
local addresses.
On an interface with a null address, the mask checking would always
result in zero, which resulted in the system updating the ARP table on
almost every incoming packet from any address (private or public).
This patch fixes this behavior by only applying this check to interfaces
with valid addresses and now the ARP table won't get constantly
hammered.
Closes#13713
Including signal.h would cause several ports to fail on build,
because it would end up including AK/Platform.h through these
mcontext headers. This is problematic because AK/Platform.h defines
several macros with very common names, such as `NAKED` (breaks radare2),
and `NO_SANITIZE_ADDRESS` and `ALWAYS_INLINE` (breaks ruby).
As with the previous commit, we put a distinction between filesystems
that require a file description and those which don't, but now in a much
more readable mechanism - all initialization properties as well as the
create static method are grouped to create the FileSystemInitializer
structure. Then when we need to initialize an instance, we iterate over
a table of these structures, checking for matching structure and then
validating the given arguments from userspace against the requirements
to ensure we can create a valid instance of the requested filesystem.
We do this by putting a distinction between two types of filesystems -
the first type is backed in RAM, and includes TmpFS, ProcFS, SysFS,
DevPtsFS and DevTmpFS. Because these filesystems are backed in RAM,
trying to mount them doesn't require source open file description.
The second type is filesystems that are backed by a file, therefore the
userspace program has to open them (hence it has a open file description
on them) and provide the appropriate source open file description.
By putting this distinction, we can early check if the user tried to
mount the second type of filesystems without a valid file description,
and fail with EBADF then.
Otherwise, we can proceed to either mount either type of filesystem,
provided that the fs_type is valid.
Previously the routing table did not store the route flags. This
adds basic support and exposes them in the /proc directory so that a
userspace caller can query the route and identify the type of each
route.
With the update to GCC 12.1.0, the compiler now vectorizes code with
-O2. This causes vector ops to be emitted, which are not supported in
the Kernel. Add the -mgeneral-regs-only flag to force the compiler to
not emit floating-point and SIMD ops.
By default we enable the Kernel Undefined Behavior Sanitizer, which
checks for undefined behavior at runtime. However, sometimes a developer
might want to turn that off, so now there is a easy way to do that.
When disabling UBSAN, the compiler would complain that the constraints
of the inline assembly could not be met. By adding the alignas specifier
the compiler can now determine that the struct can be passed into a
register, and thus the constraints are met.
Implement futimes() in terms of utimensat(). Now, utimensat() strays
from POSIX compliance because it also accepts a combination of a file
descriptor of a regular file and an empty path. utimensat() then uses
this file descriptor instead of the path to update the last access
and/or modification time of a file. That being said, its prior behavior
remains intact.
With the new behavior of utimensat(), `path` must point to a valid
string; given a null pointer instead of an empty string, utimensat()
sets `errno` to `EFAULT` and returns a failure.
This adds some new buffers to the `FPUState` struct, which contains
enough space for the `xsave` instruction to run. This instruction writes
the upper part of the x86 SIMD registers (YMM0-15) to a seperate
256-byte area, as well as an "xsave header" describing the region.
If the underlying processor supports AVX, the `fxsave` instruction is no
longer used, as `xsave` itself implictly saves all of the SSE and x87
registers.
Co-authored-by: Leon Albrecht <leon.a@serenityos.org>
Most of the string.h and wchar.h functions are implemented quite naively
at the moment, and GCC's pattern recognition pass might realize what we
are trying to do, and transform them into libcalls. This is usually a
useful optimization, but not when we're implementing the functions
themselves :^)
Relevant discussion from the GCC Bugzilla:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102725
This prevents the infamous recursive `strlen`.
A more proper fix would be writing these functions in assembly. That
would likely give a small performance boost as well ;)
Instead of storing the current Processor into a core local register, we
currently just store it into a global, since we don't support SMP for
aarch64 anyway. This simplifies the initial implementation.
By putting the NOLOAD sections (.bss and .super_pages) at the end of the
ELF file, objcopy does not have to insert a lot of zeros to make sure
that the .ksyms section is at the right place in memory. Now the .ksyms
section comes before the two NOLOAD sections. This shrinks the
kernel8.img with 6MB, from 8.3M to 2.3M. :^)
The sections did end up in the ELF file, however they weren't
explicitely mentioned in the linker.ld script. In the future, we can add
the --orphan-handling=error flag to the linker options, which will
enforce that the sections used in the sources files also are mentioned
in the linker script.
This fixes a weird bug that when sometimes a user tried to switch to
console mode, the screen was frozen on graphics mode. After a hour of
debugging this, it became apparent that the problem was that we left the
y offset of the bochs graphics device in an invalid state, so it was not
zero because the WindowServer changed it, and the framebuffer console
code is not aware of horizontal and vertical offsets of the framebuffer
screen, leading to the problem that the framebuffer console updates the
first framebuffer (y offset = 0), but hardware was indicated to show the
second framebuffer (y offset = first framebuffer height).
Therefore, when doing a switch between these modes, always set the y
offset to be zero.
This exposes the child processes for a process as a directory
of symlinks to the respective /proc entries for each child.
This makes for an easier and possibly more efficient way
to find and count a process's children. Previously the only
method was to parse the entire /proc/all JSON file.