In a previous commit I moved everything into the new subdirectories in
the FileSystem/SysFS directory without trying to change the code itself
too much. Now it's time to split the code up to make it more readable
and understandable, hence this change.
This is necessary for the next commit in the patch series, otherwise the
code can't be compiled. It seems this was a hidden issue that was only
discovered now, by changing includes on a mass scale.
Move the methods that override the virtual methods of the File class to
a private access scope in the DisplayConnector class, because no derived
class of DisplayConnector tries to access them.
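A minimal sketch of why this is safe in C++: an override can be private while the base-class declaration stays public, so calls through a File& still dispatch to it. The class names mirror the text, but the `read()` member and its signature are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the kernel's File base class.
class File {
public:
    virtual ~File() = default;
    // Public entry point: callers go through the base-class interface.
    virtual size_t read(char* buffer, size_t size) = 0;
};

class DisplayConnector : public File {
public:
    DisplayConnector() = default;

private:
    // A private override is legal: code holding a File& still reaches it
    // through virtual dispatch, but code naming DisplayConnector directly
    // (including derived classes) cannot call it.
    size_t read(char*, size_t) override { return 0; }
};
```

Access control is checked against the static type at the call site, which is why callers must go through File&.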
- Remove some magic numbers
- Remove some duplicate branches
- Reduce the amount of casting between u8* and u32*
- Some renaming of confusing variables
The WindowServer doesn't use this interface anymore and therefore it's
not used by any userspace application, so let's remove this stale method
to ensure we don't have to bother with it anymore.
The mmap interface was removed when we introduced the DisplayConnector
class, as it was quite unsafe to use and didn't handle switching between
graphical and text modes safely. By using the SharedFramebufferVMObject,
we are able to elegantly coordinate the switch by remapping the attached
mmap'ed Memory::Region(s) with different mappings, thereby letting
WindowServer keep thinking that its mappings are still valid, while they
actually point to a different physical range until we are back in
graphical mode (after a switch from text mode).
Most drivers take advantage of the fact that we know where the actual
framebuffer is in physical memory space, so the SharedFramebufferVMObject
is created with that information. However, the VirtIO driver is
different in that respect, because it relies on DMA transactions to show
graphics on the framebuffer, so for it the SharedFramebufferVMObject is
created with support for an arbitrary framebuffer location in physical
memory space.
This new type of VMObject will be used to coordinate switching safely
from graphical mode to text mode and vice-versa, by supplying a way to
remap all Regions that were created with this object, so mappings can be
changed according to the current system mode. This makes it quite
easy to give applications like WindowServer the feeling of having full
access to the framebuffer device from a DisplayConnector, but still keep
the Kernel in control to be able to safely switch to text console.
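A heavily simplified sketch of the coordination idea, assuming a "region" can be modeled as just the physical base address its pages point at. All names here are hypothetical, and real remapping involves page tables and TLB flushes that are omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model: a Region only records its current physical target.
struct Region {
    uintptr_t mapped_physical_base { 0 };
};

class SharedFramebufferVMObjectSketch {
public:
    SharedFramebufferVMObjectSketch(uintptr_t real_framebuffer, uintptr_t fake_buffer)
        : m_real_framebuffer(real_framebuffer)
        , m_fake_buffer(fake_buffer)
    {
    }

    // Every Region created from this object is tracked so it can be remapped.
    void attach(Region& region)
    {
        region.mapped_physical_base = current_target();
        m_regions.push_back(&region);
    }

    // A mode switch remaps all attached regions in one place; the owners of
    // those regions keep using the same virtual addresses.
    void switch_to_text_mode() { m_graphical = false; remap_all(); }
    void switch_to_graphical_mode() { m_graphical = true; remap_all(); }

private:
    uintptr_t current_target() const { return m_graphical ? m_real_framebuffer : m_fake_buffer; }

    void remap_all()
    {
        for (auto* region : m_regions)
            region->mapped_physical_base = current_target();
    }

    uintptr_t m_real_framebuffer;
    uintptr_t m_fake_buffer;
    bool m_graphical { true };
    std::vector<Region*> m_regions;
};
```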
We should first enable the VirtualConsole and then enable graphical
mode, to ensure proper display output on the switched-to virtual console
that has been chosen. When deactivating graphical mode, we do the
deactivation first and then enable the VirtualConsole to ensure proper
text output on screen.
Keeping the exact details of a dirty rectangle doesn't make any sense
when we just flush the entire screen, so just keep a simple boolean
value to know if the screen needs to be flushed or not.
This change unifies the naming convention for kernel tasks.
The goal of this change is to:
- Make the task names more descriptive, so users can more
easily understand their purpose in System Monitor.
- Unify the naming convention so they are consistent.
For an upcoming change to support interrupts in this driver, this class
has to inherit from IRQHandler. That in turn will make this class
virtual (with a virtual destructor), which means its destructor will
actually be called. We don't want this to happen, so we have to wrap the
class in an AK::NeverDestroyed.
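To illustrate the idea, here is a minimal stand-in for NeverDestroyed. It is not AK's actual implementation: the object is built in raw storage with placement new, and the wrapper's own destructor is trivial, so `~T()` never runs. The controller and handler types are hypothetical.

```cpp
#include <cassert>
#include <new>
#include <utility>

// Construct T in raw storage and deliberately never run ~T().
template<typename T>
class NeverDestroyedSketch {
public:
    template<typename... Args>
    explicit NeverDestroyedSketch(Args&&... args)
    {
        new (m_storage) T(std::forward<Args>(args)...);
    }

    // Trivial destructor: the wrapped T is intentionally leaked.
    ~NeverDestroyedSketch() = default;

    T& get() { return *std::launder(reinterpret_cast<T*>(m_storage)); }

private:
    alignas(T) unsigned char m_storage[sizeof(T)];
};

// Stand-ins for the IRQ handler base and the driver class.
struct IRQHandlerSketch {
    virtual ~IRQHandlerSketch() = default;
};

struct ControllerSketch : IRQHandlerSketch {
    int value { 42 };
};

// ControllerSketch has a virtual destructor, but it is never invoked here.
static NeverDestroyedSketch<ControllerSketch> s_controller;
```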
These 2 classes currently contain a lot of code that is x86(_64) specific.
Move them to the architecture specific directory. This also allows for a
simpler implementation for aarch64.
This register can be used to check whether the 4 different types of
interrupts are masked. A different variant can be used to set/clear
specific interrupt bits.
This requires us to add an Interrupts.h file in the Kernel/Arch
directory, which includes the architecture specific files.
The commit also stubs out the functions to be able to compile the
aarch64 Kernel.
This name was misleading, as it wasn't really "getting" anything. It has
hence been renamed to `enumerate_interfaces` to reflect what it's
actually doing.
For the same reason we ignore interfaces without an IP address when
choosing where to send a route, we should also ignore interfaces without
IP addresses when updating the ARP table on incoming packets from
local addresses.
On an interface with a null address, the mask checking would always
result in zero, which resulted in the system updating the ARP table on
almost every incoming packet from any address (private or public).
This patch fixes this behavior by only applying this check to interfaces
with valid addresses and now the ARP table won't get constantly
hammered.
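A sketch of the corrected check, assuming host-order IPv4 values and that an unconfigured interface also has a zero netmask. In that case `(x & 0) == (0 & 0)` is true for every source, which is the bug described above. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical interface description; host-order IPv4 values.
struct InterfaceSketch {
    uint32_t ipv4_address;
    uint32_t netmask;
};

bool should_update_arp_table(InterfaceSketch const& interface, uint32_t source_address)
{
    // Skip interfaces without an address; otherwise a zero mask makes the
    // comparison below succeed for every source address.
    if (interface.ipv4_address == 0)
        return false;
    return (source_address & interface.netmask) == (interface.ipv4_address & interface.netmask);
}
```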
Closes #13713
Including signal.h would cause several ports to fail to build,
because it would end up including AK/Platform.h through these
mcontext headers. This is problematic because AK/Platform.h defines
several macros with very common names, such as `NAKED` (breaks radare2),
and `NO_SANITIZE_ADDRESS` and `ALWAYS_INLINE` (breaks ruby).
As with the previous commit, we put a distinction between filesystems
that require a file description and those which don't, but now with a
much more readable mechanism: all initialization properties as well as
the static create method are grouped into the FileSystemInitializer
structure. Then, when we need to initialize an instance, we iterate over
a table of these structures, look for the matching entry, and validate
the arguments given from userspace against its requirements to ensure we
can create a valid instance of the requested filesystem.
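A sketch of the table-driven lookup. The fields, filesystem names, and the ENODEV returned for an unknown type are illustrative assumptions, not the kernel's actual definitions, and the create function pointer is omitted for brevity.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>

// Hypothetical shape of a table entry.
struct FileSystemInitializerSketch {
    char const* short_name;
    bool requires_open_file_description;
};

constexpr FileSystemInitializerSketch s_initializers[] = {
    { "proc", false },
    { "tmp", false },
    { "ext2", true },
    { "9p", true },
};

// Returns 0 on success or a positive errno value.
int validate_mount_request(char const* fs_type, int source_fd)
{
    for (auto const& initializer : s_initializers) {
        if (std::strcmp(initializer.short_name, fs_type) != 0)
            continue;
        // File-backed filesystems must be given a valid source description.
        if (initializer.requires_open_file_description && source_fd < 0)
            return EBADF;
        return 0;
    }
    // No entry matched the requested type (error code is a placeholder).
    return ENODEV;
}
```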
We do this by putting a distinction between two types of filesystems.
The first type is backed by RAM, and includes TmpFS, ProcFS, SysFS,
DevPtsFS and DevTmpFS. Because these filesystems are backed by RAM,
mounting them doesn't require a source open file description.
The second type is filesystems that are backed by a file, therefore the
userspace program has to open them (hence it has an open file
description on them) and provide the appropriate source open file
description.
With this distinction in place, we can check early whether the user
tried to mount the second type of filesystem without a valid file
description, and fail with EBADF in that case.
Otherwise, we can proceed to mount either type of filesystem, provided
that the fs_type is valid.
Previously the routing table did not store the route flags. This
adds basic support and exposes them in the /proc directory so that a
userspace caller can query the route and identify the type of each
route.
With the update to GCC 12.1.0, the compiler now vectorizes code with
-O2. This causes vector ops to be emitted, which are not supported in
the Kernel. Add the -mgeneral-regs-only flag to force the compiler to
not emit floating-point and SIMD ops.
By default we enable the Kernel Undefined Behavior Sanitizer, which
checks for undefined behavior at runtime. However, sometimes a developer
might want to turn that off, so now there is an easy way to do that.
When disabling UBSAN, the compiler would complain that the constraints
of the inline assembly could not be met. By adding the alignas specifier
the compiler can now determine that the struct can be passed into a
register, and thus the constraints are met.
Implement futimes() in terms of utimensat(). Now, utimensat() strays
from POSIX compliance because it also accepts a combination of a file
descriptor of a regular file and an empty path. utimensat() then uses
this file descriptor instead of the path to update the last access
and/or modification time of a file. That being said, its prior behavior
remains intact.
With the new behavior of utimensat(), `path` must point to a valid
string; given a null pointer instead of an empty string, utimensat()
sets `errno` to `EFAULT` and returns a failure.
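A sketch of the conversion such a futimes() needs, mapping a null `times` to UTIME_NOW as futimes() semantics require. The helper names are hypothetical, and `futimes_sketch` relies on the extended fd-plus-empty-path behavior described above, which is not portable POSIX.

```cpp
#include <cassert>
#include <sys/stat.h>
#include <sys/time.h>

// Convert futimes()-style timevals to utimensat()-style timespecs.
// A null input means "set both timestamps to now".
void timevals_to_timespecs(struct timeval const* times, struct timespec out[2])
{
    for (int i = 0; i < 2; ++i) {
        if (times == nullptr) {
            out[i].tv_sec = 0;
            out[i].tv_nsec = UTIME_NOW;
        } else {
            out[i].tv_sec = times[i].tv_sec;
            out[i].tv_nsec = times[i].tv_usec * 1000; // microseconds -> nanoseconds
        }
    }
}

// Relies on the non-POSIX extension above: an fd plus an empty (non-null) path.
int futimes_sketch(int fd, struct timeval const times[2])
{
    struct timespec converted[2];
    timevals_to_timespecs(times, converted);
    return utimensat(fd, "", converted, 0);
}
```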
This adds some new buffers to the `FPUState` struct, which contains
enough space for the `xsave` instruction to run. This instruction writes
the upper part of the x86 SIMD registers (YMM0-15) to a separate
256-byte area, as well as an "xsave header" describing the region.
If the underlying processor supports AVX, the `fxsave` instruction is no
longer used, as `xsave` itself implicitly saves all of the SSE and x87
registers.
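For orientation, this sketch follows the standard (non-compacted) XSAVE layout from the Intel SDM: a 512-byte legacy FXSAVE region, a 64-byte XSAVE header at offset 512, and the 256-byte AVX component (the upper halves of YMM0-15) at offset 576. The struct is hypothetical, not the kernel's `FPUState`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// xsave requires a 64-byte aligned destination.
struct alignas(64) FPUStateSketch {
    uint8_t legacy_region[512]; // FXSAVE area: x87 and SSE (XMM0-15) state
    uint8_t xsave_header[64];   // XSTATE_BV, XCOMP_BV and reserved bytes
    uint8_t avx_upper[256];     // upper 128 bits of YMM0-15, 16 x 16 bytes
};

static_assert(offsetof(FPUStateSketch, xsave_header) == 512);
static_assert(offsetof(FPUStateSketch, avx_upper) == 576);
static_assert(sizeof(FPUStateSketch) == 832);
```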
Co-authored-by: Leon Albrecht <leon.a@serenityos.org>
Most of the string.h and wchar.h functions are implemented quite naively
at the moment, and GCC's pattern recognition pass might realize what we
are trying to do, and transform them into libcalls. This is usually a
useful optimization, but not when we're implementing the functions
themselves :^)
Relevant discussion from the GCC Bugzilla:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102725
This prevents the infamous recursive `strlen`.
A more proper fix would be writing these functions in assembly. That
would likely give a small performance boost as well ;)
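For illustration, this is the kind of naive loop at risk. GCC's loop-distribution pass can recognize such patterns and replace them with a library call, and GCC's `-fno-tree-loop-distribute-patterns` option disables that replacement. Which exact mechanism this commit used is not stated here, and `naive_strlen` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>

// If this loop lived inside libc's strlen() and the compiler turned it into
// a call to strlen(), the function would call itself forever.
size_t naive_strlen(char const* string)
{
    size_t length = 0;
    while (string[length] != '\0')
        ++length;
    return length;
}
```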
Instead of storing the current Processor into a core local register, we
currently just store it into a global, since we don't support SMP for
aarch64 anyway. This simplifies the initial implementation.
By putting the NOLOAD sections (.bss and .super_pages) at the end of the
ELF file, objcopy does not have to insert a lot of zeros to make sure
that the .ksyms section is at the right place in memory. Now the .ksyms
section comes before the two NOLOAD sections. This shrinks the
kernel8.img by 6MB, from 8.3M to 2.3M. :^)
The sections did end up in the ELF file, however they weren't
explicitly mentioned in the linker.ld script. In the future, we can add
the --orphan-handling=error flag to the linker options, which will
enforce that the sections used in the source files are also mentioned
in the linker script.
This fixes a weird bug where sometimes, when a user tried to switch to
console mode, the screen stayed frozen in graphics mode. After an hour of
debugging this, it became apparent that we had left the y offset of the
bochs graphics device in an invalid state: it was non-zero because
WindowServer had changed it, and the framebuffer console code is not
aware of horizontal and vertical offsets of the framebuffer screen. As a
result, the framebuffer console updated the first framebuffer (y offset
= 0), but the hardware was set to show the second framebuffer (y offset
= first framebuffer height).
Therefore, when doing a switch between these modes, always set the y
offset to zero.
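A sketch of the fix against a fake register file. Index 0x9 is the Y offset register of the Bochs VBE DISPI interface. The in-memory register model and function names are hypothetical stand-ins for real port I/O.

```cpp
#include <cassert>
#include <cstdint>

// Bochs VBE DISPI register index used here.
enum BochsDispiIndexSketch : uint16_t {
    VBE_DISPI_INDEX_Y_OFFSET = 0x9,
};

// Hypothetical in-memory model of the device registers.
struct BochsRegistersSketch {
    uint16_t registers[16] {};
    void write(uint16_t index, uint16_t value) { registers[index] = value; }
    uint16_t read(uint16_t index) const { return registers[index]; }
};

// When entering console mode, reset panning so the hardware scans out the
// first buffer, which is the one the framebuffer console draws into.
void switch_to_console_mode(BochsRegistersSketch& device)
{
    device.write(VBE_DISPI_INDEX_Y_OFFSET, 0);
}
```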
This exposes the child processes for a process as a directory
of symlinks to the respective /proc entries for each child.
This makes for an easier and possibly more efficient way
to find and count a process's children. Previously the only
method was to parse the entire /proc/all JSON file.
This in turn makes the built-in kernel console much nicer to look at,
so let's remove the support for the 8x8 font bitmap and instead add an
8x16 font bitmap.
The old methods can already be considered deprecated, and now that we
have removed framebuffer devices entirely, we can safely remove these
methods too, which simplifies the GenericGraphicsAdapter class a lot.
Instead of letting the user determine whether framebuffer devices
will be created (which is useless because they are gone by now), let's
simplify the flow by allowing the user to choose between full, limited
or disabled functionality. The determination happens only once, so, if
the user decided to disable graphics support, the initialize method
exits immediately. If limited functionality is chosen, then a generic
DisplayConnector is initialized with the preset framebuffer resolution,
if present, and then the initialize method exits. As a default, the code
proceeds to initialize all drivers as usual.
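The decision flow above, sketched with hypothetical names. The enum values mirror the three choices in the text, and the result enum exists only so the flow can be checked.

```cpp
#include <cassert>
#include <optional>

enum class GraphicsSupport { Full, Limited, Disabled };

struct Resolution { unsigned width; unsigned height; };

enum class InitResult { NothingInitialized, GenericConnectorOnly, AllDrivers };

InitResult initialize_graphics(GraphicsSupport support, std::optional<Resolution> preset)
{
    // The decision is made once, up front.
    if (support == GraphicsSupport::Disabled)
        return InitResult::NothingInitialized;

    if (support == GraphicsSupport::Limited) {
        // Only a generic connector for the preset resolution, if one exists.
        return preset.has_value() ? InitResult::GenericConnectorOnly : InitResult::NothingInitialized;
    }

    // Default: initialize all drivers as usual.
    return InitResult::AllDrivers;
}
```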
This ioctl is more appropriate when the hardware supports flushing of
the entire framebuffer, so we use that instead of the previous default
FB_IOCTL_FLUSH_HEAD_BUFFERS ioctl.
We shouldn't expose the VirtIO GPU3DDevice constructor as a public method,
so instead, let's use the usual pattern of a static construction method
that uses the constructor within the method.
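A sketch of the pattern in standard C++. `std::unique_ptr` stands in for the kernel's own smart pointers and error types, and the class and member names are hypothetical.

```cpp
#include <cassert>
#include <memory>

class GPU3DDeviceSketch {
public:
    // The only way to obtain an instance. A kernel version could validate
    // arguments or report allocation failure here.
    static std::unique_ptr<GPU3DDeviceSketch> create(int adapter_id)
    {
        // std::make_unique cannot reach a private constructor, so use new.
        return std::unique_ptr<GPU3DDeviceSketch>(new GPU3DDeviceSketch(adapter_id));
    }

    int adapter_id() const { return m_adapter_id; }

private:
    explicit GPU3DDeviceSketch(int adapter_id)
        : m_adapter_id(adapter_id)
    {
    }

    int m_adapter_id;
};
```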
Such a mechanism will be used by the Intel Graphics driver, because we
currently lack support for changing the resolution in this driver. So,
when WindowServer tries to mode-set the display, it will fail, and will
use the safe mode-setting call instead to be able to show something on
screen.
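A sketch of the client-side fallback. It assumes, purely for illustration, that the "safe" call keeps the mode the firmware already configured. All types and functions are hypothetical.

```cpp
#include <cassert>

struct ModeSettingSketch {
    unsigned width;
    unsigned height;
};

// A connector that cannot change resolution: any mode other than the
// current one is refused.
class FixedModeConnectorSketch {
public:
    explicit FixedModeConnectorSketch(ModeSettingSketch current) : m_current(current) {}

    bool set_mode_setting(ModeSettingSketch requested)
    {
        return requested.width == m_current.width && requested.height == m_current.height;
    }

    // Assumed "safe" behavior: keep the already-configured mode.
    ModeSettingSketch set_safe_mode_setting() { return m_current; }

private:
    ModeSettingSketch m_current;
};

// Try the requested mode first, then fall back to the safe call.
ModeSettingSketch apply_mode(FixedModeConnectorSketch& connector, ModeSettingSketch requested)
{
    if (connector.set_mode_setting(requested))
        return requested;
    return connector.set_safe_mode_setting();
}
```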
The DisplayConnector class is meant to replace the FramebufferDevice
class. The advantage of this class over the FramebufferDevice class is:
1. It removes the mmap interface entirely. This interface is unsafe, as
multiple processes could try to use it, and when switching to and from
text console mode, there's no "good" way to revoke a memory mapping from
this interface, let alone when there are multiple processes that call
this interface. Therefore, in the DisplayConnector class there's no
implementation for this method at all.
2. The class uses a new real-world structure called ModeSetting, which
takes into account the fact that real hardware requires more than width,
height and pitch settings to mode-set the display resolution.
3. The class assumes all instances should supply some sort of EDID,
so it provides a mechanism to do so. Even if a given driver does not
know what the actual EDID is, it will ask to create a generic default
EDID blob.
4. This class shifts the responsibility of switching between console
mode and graphical mode from a GraphicsAdapter to the DisplayConnector
class, so when doing the switch, the GraphicsManagement code actually
asks each DisplayConnector object to do the switch and doesn't rely on
the GraphicsAdapter objects at all.
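To make point 3 concrete, here is a sketch of the two invariants any generated EDID base block must satisfy: the fixed 8-byte header `00 FF FF FF FF FF FF 00`, and a final byte chosen so all 128 bytes sum to 0 modulo 256. A real generic EDID fills in many more fields. These helpers are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using EDIDBlockSketch = std::array<uint8_t, 128>;

EDIDBlockSketch make_minimal_edid()
{
    EDIDBlockSketch block {};
    constexpr uint8_t header[8] = { 0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00 };
    for (size_t i = 0; i < 8; ++i)
        block[i] = header[i];
    // EDID structure version 1, revision 3 (bytes 18 and 19).
    block[18] = 1;
    block[19] = 3;

    // Choose the last byte so the whole block sums to 0 modulo 256.
    uint8_t sum = 0;
    for (size_t i = 0; i < 127; ++i)
        sum = static_cast<uint8_t>(sum + block[i]);
    block[127] = static_cast<uint8_t>(0x100 - sum);
    return block;
}

bool edid_checksum_is_valid(EDIDBlockSketch const& block)
{
    uint8_t sum = 0;
    for (uint8_t byte : block)
        sum = static_cast<uint8_t>(sum + byte);
    return sum == 0;
}
```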
Since kmalloc() now works, we can actually load the kernel symbol table!
This in turn allows us to call dump_backtrace(), and actually get a
useful backtrace in the aarch64 Kernel.
These functions are called by kmalloc, and since there is no support for
threading in the aarch64 build yet, we can simply remove the
VERIFY_NOT_REACHED().
The code in Spinlock.h has no architecture-specific logic, thus it can be
moved to the Arch directory. This contains no functional change.
Also add the Spinlock.cpp file for aarch64 which contains stubs for the
lock and unlock functions.
Previously the embedmap.sh script generated a warning, since there was
no section defined where the actual kernel.map could be stored. This is
necessary for generating kernel backtraces.
This compiler builtin abstracts away the specifics of fetching the frame
pointer. This will allow KSyms.cpp to be built for the aarch64
target. While we're here, let's also change the
PerformanceEventBuffer.cpp to not rely on x86_64 specifics.
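The builtin is not named in the text. Assuming it is GCC/Clang's `__builtin_frame_address`, this sketch shows how it replaces per-architecture inline assembly (for example, reading rbp on x86_64 or x29 on aarch64). The wrapper name is hypothetical.

```cpp
#include <cassert>

// noinline keeps this function's own frame distinct from its caller's.
__attribute__((noinline)) void* current_frame_pointer()
{
    // Level 0 means "the current function's frame".
    return __builtin_frame_address(0);
}
```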
Previously in the aarch64 Kernel, this would cause dbgln() to actually
print more characters of the next string in memory, because strings in
the Kernel are not zero terminated by default. Prevent this by using the
passed-in length of the string.
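The same principle in userland terms: with the `%.*s` precision, printf-family functions stop after `length` bytes and never read past a string that lacks a terminator. The helper name is hypothetical, and the kernel's own output path differs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Format exactly `length` characters, which need not be NUL-terminated.
int format_bounded(char* out, size_t out_size, char const* characters, size_t length)
{
    return std::snprintf(out, out_size, "%.*s", static_cast<int>(length), characters);
}
```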
When calling dbgln(), the formatting code in AK/Format.h calls
Processor::is_initialized() to determine whether to add some text about
the current processor to the debug output. Instead of crashing, we just
return false, such that we can use dbgln() etc. in the aarch64 Kernel.
This allows us to use the AK formatting functions in the aarch64 Kernel.
Also add FIXME to make sure that this file will be removed when the
proper abstractions are in place in the normal Kernel/kprintf.cpp.
The compiler figured out that the MemoryManager is not initialised, and
thus MemoryManager::the() cannot return a valid reference. Once the
necessary code is in place, this compiler flag can be removed.
Coverage tools like LLVM's source-based coverage or GNU's --coverage
need to be able to write out coverage files from any binary, regardless
of its security posture. Not ignoring these pledges and veils means we
can't get our coverage data out without playing some serious tricks.
However, this is pretty terrible for normal execution, so only skip these
checks when we explicitly configured userspace for coverage.
It doesn't make sense after the introduction of the routing table, which
allows having multiple gateways for every interface, and it isn't used
by any of the userspace programs now.
This will allow using the console tty and WindowServer regardless of
your kernel command line. Also this fixes a bug where, when booting in
text mode, the console was in graphical mode, and would not accept
input.
That code used the old AK::Result container, which led to an overly
complicated initialization flow when trying to figure out the correct
partition table type. Using the ErrorOr container instead makes the
code much simpler and more understandable.
Previously the system had no concept of assigning different routes for
different destination addresses as the default gateway IP address was
directly assigned to a network adapter. This default gateway was
statically assigned and any update would remove the previously existing
route.
This patch is a beginning step towards implementing #180. It implements
a simple global routing table that is referenced during the routing
process. With this implementation it is now possible for a user or
service (e.g. DHCP) to dynamically add routes to the table.
The routing table will select the most specific route when possible. It
will select any direct match between the destination and routing entry
addresses. If the destination address overlaps between multiple entries,
the Kernel will use the longest prefix match, i.e. the largest number of
matching bits between the destination address and the routing address.
In the event that no entries are found for a specific destination
address, this implementation supports entries for a default route to be
set for any specified interface.
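A sketch of longest-prefix-match selection, assuming host-order IPv4 values and contiguous netmasks. Here the default route is modeled as a 0.0.0.0/0 entry, which matches everything with prefix length 0. That is one possible representation, not necessarily the kernel's.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical routing entry.
struct RouteSketch {
    uint32_t destination;
    uint32_t netmask;
    std::string interface_name;
};

// Number of leading one bits in a contiguous netmask.
int prefix_length(uint32_t netmask)
{
    int length = 0;
    while (length < 32 && (netmask & (0x80000000u >> length)))
        ++length;
    return length;
}

// Among all matching routes, pick the one with the longest prefix.
std::optional<RouteSketch> find_route(std::vector<RouteSketch> const& table, uint32_t address)
{
    std::optional<RouteSketch> best;
    int best_length = -1;
    for (auto const& route : table) {
        if ((address & route.netmask) != (route.destination & route.netmask))
            continue;
        int length = prefix_length(route.netmask);
        if (length > best_length) {
            best = route;
            best_length = length;
        }
    }
    return best;
}
```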
This is a small first step towards enhancing the system's routing
capabilities. Future enhancements would include referencing a
configuration file at boot to load pre-defined static routes.
I've noticed that the KVM hypervisor vendor ID string contained null
terminators in the serialized JSON string in /proc/cpuinfo - let's avoid
that, and err on the side of caution and strip them from all strings
built from CPUID register values. They may not be fixed width after all.
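A sketch of building a string from CPUID register values while skipping NUL bytes. Each register holds 4 ASCII bytes in little-endian order. The function name is hypothetical, and the KVM signature in the test (`"KVMKVMKVM"` padded with NULs) illustrates the case from the text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

std::string string_from_cpuid_registers(uint32_t const* registers, size_t count)
{
    std::string result;
    for (size_t i = 0; i < count; ++i) {
        for (int byte = 0; byte < 4; ++byte) {
            char c = static_cast<char>((registers[i] >> (byte * 8)) & 0xFF);
            // Drop embedded NULs rather than assuming a fixed-width string.
            if (c != '\0')
                result.push_back(c);
        }
    }
    return result;
}
```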
This creates all interfaces when the device is enumerated, with a link
to the configuration that it is a part of. As such, a new class,
`USBInterface`, has been introduced to express this state.
Some other parts of the USB stack may require us to perform a control
transfer. Instead of abusing `friend` to expose the default pipe, let's
just expose it via a function.
This also introduces a new class, `USBConfiguration` that stores a
configuration. The device, when instructed, sets this configuration and
holds a pointer to it so we have a record of what configuration is
currently active.
AnonymousFile always allocates in multiples of a page size when created
with anon_create. This is especially an issue if we use AnonymousFile
shared memory to store a shared data structure that isn't exactly a
multiple of a page in size. Therefore, we can just allow mmaps of
AnonymousFile to map only an initial part of the shared memory.
This makes SharedSingleProducerCircularQueue work when it's introduced
later.
In most cases it's safe to abort the requested operation and go forward,
however, in some places it's not clear yet how to handle these failures,
therefore, we use the MUST() wrapper to force a kernel panic for now.
On the QEMU microvm machine type, it became apparent that the BIOS was
not setting the i8042 controller to function as expected. To ensure that
the controller is always outputting correct scan codes, set it to scan
code set 2 and enable first port translation to ensure all scan codes are
translated to scan code set 1. This is the expected behavior when using
SeaBIOS, but on qboot (the BIOS for the QEMU microvm machine type), the
firmware doesn't take care of this so we need to do this ourselves.
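A sketch of the configuration-byte change, using the commonly documented PS/2 controller interface. Bit 6 of the controller configuration byte enables first-port translation. The constants list the related commands for reference; the port I/O itself is omitted and the helper is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Usual sequence: send 0xF0 followed by 2 to the keyboard (select scan code
// set 2), then read the config byte (0x20), modify it, and write it back (0x60).
constexpr uint8_t I8042_READ_CONFIG = 0x20;
constexpr uint8_t I8042_WRITE_CONFIG = 0x60;
constexpr uint8_t KEYBOARD_SET_SCANCODE_SET = 0xF0;
constexpr uint8_t CONFIG_FIRST_PORT_TRANSLATION = 1 << 6;

// Enable set 2 -> set 1 translation on the first port; other bits untouched.
uint8_t config_with_translation(uint8_t current_config)
{
    return static_cast<uint8_t>(current_config | CONFIG_FIRST_PORT_TRANSLATION);
}
```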
This keeps us from accidentally overwriting an already set region name,
for example when we are mapping a file (as, in this case, the file name
is already stored in the region).
Since KASLR was added, kernel_load_base only signifies the address at
which the kernel image starts, not the start of kernel memory, meaning
that a valid kernel stack can be allocated before it in memory.
We use kernel_mapping_base, the lowest address covered by the kernel
page directory, as the minimal address when performing safety checks
during backtrace generation.
When we lock a mutex, eventually `Thread::block` is invoked which could
in turn invoke `Process::big_lock().restore_exclusive_lock()`. This
would then try to add the current thread to a different blocked thread
list than the one in use for the original mutex being locked, and
because it's an intrusive list, the thread is removed from its original
list during the `.append()`. When the original mutex eventually
unblocks, we no longer have the thread in the intrusive blocked threads
list and we panic.
Solve this by making the big lock mutex special and giving it its own
blocked thread list. Because the process big lock is temporary and is
being actively removed from e.g. syscalls, it's a matter of time before
we can also remove the fix introduced by this commit.
Fixes issue #9401.
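A minimal intrusive-list sketch of the failure mode. The links live inside the element, so appending a node to a second list first unlinks it from the first. This illustrates the behavior described above; it is not AK's IntrusiveList.

```cpp
#include <cassert>

struct Node;

struct ListSketch {
    Node* head { nullptr };
    void append(Node& node);
    void remove(Node& node);
    bool contains(Node const& node) const;
};

// The links live inside the element, so it can be on only one list.
struct Node {
    Node* prev { nullptr };
    Node* next { nullptr };
    ListSketch* owner { nullptr };
};

void ListSketch::append(Node& node)
{
    // Appending silently unlinks the node from its current list.
    if (node.owner)
        node.owner->remove(node);
    if (!head) {
        head = &node;
    } else {
        Node* tail = head;
        while (tail->next)
            tail = tail->next;
        tail->next = &node;
        node.prev = tail;
    }
    node.owner = this;
}

void ListSketch::remove(Node& node)
{
    if (node.prev)
        node.prev->next = node.next;
    else
        head = node.next;
    if (node.next)
        node.next->prev = node.prev;
    node.prev = node.next = nullptr;
    node.owner = nullptr;
}

bool ListSketch::contains(Node const& node) const
{
    for (Node* it = head; it; it = it->next)
        if (it == &node)
            return true;
    return false;
}
```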
If we unregister from the RegionTree before unmapping, there's a race
where a new region can get inserted at the same address that we're about
to unmap. If this happens, ~Region() will then unmap the newly inserted
region, which now finds itself with cleared-out page table entries.
This had no business being in RegionTree, since RegionTree doesn't track
identity-mapped regions anyway. (We allow *any* address to be identity
mapped, not just the ones that are part of the RegionTree's range.)
This patch adds RegionTree::get_lock() which exposes the internal lock
inside RegionTree. We can then lock it from the outside when doing
lookups or traversal.
This solution is not very beautiful, we should find a way to protect
this data with SpinlockProtected or something similar. This is a stopgap
patch to try and fix the currently flaky CI.
This syscall ends up disabling interrupts while changing the time,
and the clock is a global resource anyway, so preventing threads in the
same process from running wouldn't solve anything.
Let's use terminology from the Intel manual to avoid confusion.
Also add `_string` suffixes to better distinguish the numeric values
from the string values.
...and remove the last remaining client of the API. It's no longer
possible to ask the RegionTree for a VM range. You can only ask it to
place your Region somewhere in available space.
This patch moves AddressSpace (the per-process memory manager) to using
the new atomic "place" APIs in RegionTree as well, just like we did for
MemoryManager in the previous commit.
This required updating quite a few places where VM allocation and
actually committing a Region object to the AddressSpace were separated
by other code.
All you have to do now is call into AddressSpace once and it'll take
care of everything for you.
Instead of first allocating the VM range, and then inserting a region
with that range into the MM region tree, we now do both things in a
single atomic operation:
- RegionTree::place_anywhere(Region&, size, alignment)
- RegionTree::place_specifically(Region&, address, size)
To reduce the number of things we do while locking the region tree,
we also require callers to provide a constructed Region object.
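A simplified sketch of the two atomic placement operations over a `std::map` keyed by base address. The real RegionTree uses an intrusive red-black tree and kernel locks, and the types here are hypothetical. The point is that finding space and inserting happen under one lock.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <mutex>

struct RegionSketch {
    uintptr_t base { 0 };
    size_t size { 0 };
};

class RegionTreeSketch {
public:
    RegionTreeSketch(uintptr_t lower, uintptr_t upper) : m_lower(lower), m_upper(upper) {}

    // Find the first aligned hole that fits and insert, all under one lock.
    bool place_anywhere(RegionSketch& region, size_t size, size_t alignment)
    {
        std::lock_guard lock(m_mutex);
        uintptr_t candidate = align_up(m_lower, alignment);
        for (auto const& [base, existing] : m_regions) {
            if (candidate + size <= base)
                break; // the hole before this region is large enough
            candidate = align_up(base + existing->size, alignment);
        }
        if (candidate + size > m_upper)
            return false;
        insert(region, candidate, size);
        return true;
    }

    // Insert at a fixed address, failing if it overlaps a neighbor.
    bool place_specifically(RegionSketch& region, uintptr_t address, size_t size)
    {
        std::lock_guard lock(m_mutex);
        if (address < m_lower || address + size > m_upper)
            return false;
        auto next = m_regions.lower_bound(address);
        if (next != m_regions.end() && next->first < address + size)
            return false;
        if (next != m_regions.begin()) {
            auto previous = std::prev(next);
            if (previous->first + previous->second->size > address)
                return false;
        }
        insert(region, address, size);
        return true;
    }

private:
    static uintptr_t align_up(uintptr_t value, size_t alignment)
    {
        return (value + alignment - 1) / alignment * alignment;
    }

    void insert(RegionSketch& region, uintptr_t base, size_t size)
    {
        region.base = base;
        region.size = size;
        m_regions[base] = &region;
    }

    uintptr_t m_lower;
    uintptr_t m_upper;
    std::mutex m_mutex;
    std::map<uintptr_t, RegionSketch*> m_regions;
};
```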
This patch ports MemoryManager to RegionTree as well. The biggest
difference between this and the userspace code is that kernel regions
are owned by extant OwnPtr<Region> objects spread around the kernel,
while userspace regions are owned by the AddressSpace itself.
For kernelspace, there are a couple of situations where we need to make
large VM reservations that never get backed by regular VMObjects
(for example the kernel image reservation, or the big kmalloc range.)
Since we can't make a VM reservation without a Region object anymore,
this patch adds a way to create unbacked Region objects that can be
used for this exact purpose. They have no internal VMObject.
RegionTree holds an IntrusiveRedBlackTree of Region objects and vends a
set of APIs for allocating memory ranges.
It's used by AddressSpace at the moment, and will be used by MM soon.
This patch stops using VirtualRangeAllocator in AddressSpace and instead
looks for holes in the region tree when allocating VM space.
There are many benefits:
- VirtualRangeAllocator is non-intrusive and would call kmalloc/kfree
when used. This new solution is allocation-free. This was a source
of unpleasant MM/kmalloc deadlocks.
- We consolidate authority on what the address space looks like in a
single place. Previously, we had both the range allocator *and* the
region tree being used to determine if an address was valid.
Now there is only the region tree.
- Deallocation of VM when splitting regions is no longer complicated,
as we don't need to keep two separate trees in sync.
This is important for dmidecode because it does an fstat on the DMI
blobs, trying to figure out their size. Because we already know the size
of the blobs when creating the SysFS components, there's no performance
penalty whatsoever, and this allows dmidecode to not use the /dev/mem
device as a fallback.
The current implementation of read/write will fail in StorageDevice
when the request length is less than the block size of the underlying
device. Fix it by calculating the offset within a block for such cases
and using it for copying data from the bounce buffer.
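A sketch of the fix for reads: compute where the requested bytes start inside each block, and copy only that slice from the bounce buffer. The `read_block` callback is a hypothetical stand-in for the device transfer, and the fixed bounce-buffer size is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

template<typename ReadBlock>
void read_bytes(ReadBlock read_block, size_t block_size, uint64_t offset, uint8_t* out, size_t length)
{
    uint8_t bounce_buffer[4096];
    assert(block_size <= sizeof(bounce_buffer));
    while (length > 0) {
        uint64_t block_index = offset / block_size;
        // Where the requested data begins inside this block.
        size_t offset_in_block = offset % block_size;
        size_t chunk = std::min(length, block_size - offset_in_block);
        read_block(block_index, bounce_buffer);
        std::memcpy(out, bounce_buffer + offset_in_block, chunk);
        out += chunk;
        offset += chunk;
        length -= chunk;
    }
}
```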
8233da3398 introduced a not-so-subtle bug
where an application with an existing pledge set containing `no_error`
could elevate its pledge set by pledging _anything_. This commit makes
sure that no new promise is accepted.
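A sketch of the intended rule once a pledge set exists. Promises not already held are refused with EPERM, or, under `no_error`, silently dropped instead of granted. The promise bits and function are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>

enum Promise : uint32_t {
    PromiseStdio = 1u << 0,
    PromiseRpath = 1u << 1,
    PromiseInet = 1u << 2,
    PromiseNoError = 1u << 3,
};

// Returns 0 or an errno value, updating `current` in place.
int apply_pledge(uint32_t& current, uint32_t requested)
{
    uint32_t new_promises = requested & ~current;
    if (new_promises != 0) {
        // Without no_error, asking for new promises fails.
        if (!(current & PromiseNoError))
            return EPERM;
        // With no_error, new promises are dropped, never granted.
        requested &= current;
    }
    current = requested;
    return 0;
}
```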
Some error indication was done by returning bool. This was changed to
propagate the error via ErrorOr from the underlying functions. The
return type of the underlying functions was also changed to propagate
the error.
We're now able to detect all the regular CPUID feature flags from
ECX/EDX for EAX=1 :^)
None of the new ones are being used for anything yet, but they will show
up in /proc/cpuinfo and subsequently lscpu and SystemMonitor.
Note that I replaced the periods in the SSE 4.1 and 4.2 feature names
with underscores, which matches the internal enum names, Linux's
/proc/cpuinfo and the general pattern of replacing special characters
with underscores to limit feature names to [a-z0-9_].
The enum member stringification has been moved to a new function for
better re-usability and to avoid cluttering up Processor.cpp.
This will make it possible to add many, many more CPU features - more
than the current limit of 32, or the limit of 64 we would later hit if
we stuck with an enum class, to be specific :^)
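The text doesn't say what replaces the enum-as-bitmask. As one illustration, a `std::bitset` sized by the number of enumerators removes the 32/64 ceiling. The enum and class below are hypothetical.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

enum class CPUFeatureSketch : size_t {
    SSE3,
    SSE4_1,
    SSE4_2,
    AVX,
    // ...any number of further features...
    Count,
};

class CPUFeatureSetSketch {
public:
    void set(CPUFeatureSketch feature) { m_bits.set(static_cast<size_t>(feature)); }
    bool has(CPUFeatureSketch feature) const { return m_bits.test(static_cast<size_t>(feature)); }

private:
    // Sized by the enumerator count, not by a 32- or 64-bit integer.
    std::bitset<static_cast<size_t>(CPUFeatureSketch::Count)> m_bits;
};
```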
Checks of ECX go before EDX, and the bit indices are now ordered
properly. Additionally, handling of the EDX[11] bit has been moved into
a lambda function to keep the series of if statements neatly together.
All of this makes it *a lot* easier to follow along and compare the
implementation to the tables in the Intel manual, e.g. to find missing
checks.
This solves a problem where any non-trivial member in the global BSP
Processor instance would get re-initialized (improperly), losing data
that was already initialized earlier.
Expose the block size variable via a member function in the
AsyncBlockDeviceRequest so that the driver doesn't need to assume any
value such as 512 bytes.
The underlying driver does not need to recalculate the buffer size as
it is passed in the AsyncBlockDevice struct anyway. This also helps in
removing any assumptions of the underlying block size of the device.
This makes pledge() ignore promises that would otherwise cause it to
fail with EPERM, which is very useful for allowing programs to run under
a "jail" so to speak, without having them terminate early due to a
failing pledge() call.