Nobody uses this because the x86 prekernel environment is corrupting the
ramdisk image prior to running the actual kernel. In the future we can
ensure that the prekernel doesn't corrupt the ramdisk if we want to
bring support back. In addition to that, we could just use a RAM based
filesystem to load whatever is needed like in Linux, without the need of
additional filesystem driver.
For the mentioned corruption problem, look at issue #9893.
That code heavily relies on x86-specific instructions, and while other
CPU architectures and platforms can have PCI IDE controllers, currently
we don't support those, so this code is a special case which needs to be
in the Arch/x86 directory.
In the future it could be put back to the original place when we make it
more generic and suitable for other platforms.
The ISA IDE controller code makes sense to be compiled in a x86 build as
it relies on access to the x86 IO space. For other architectures, we can
just omit the code as there's no way we can use that code again.
To ensure we can omit the code easily, we move it to the Arch/x86
directory.
Before of this patch, we supported two methods to address a boot device:
1. Specifying root=/dev/hdXY, where X is a-z letter which corresponds to
a boot device, and Y as number from 1 to 16, to indicate the partition
number, which can be omitted to instruct the kernel to use a raw device
rather than a partition on a raw device.
2. Specifying root=PARTUUID: with a GUID string of a GUID partition. In
case of existing storage device with GPT partitions, this is most likely
the safest option to ensure booting from persistent storage.
While option 2 is more advanced and reliable, the first option has 2
caveats:
1. The string prefix "/dev/hd" doesn't mean anything beside a convention
on Linux installations, that was taken into use in Serenity. In Serenity
we don't mount DevTmpFS before we mount the boot device on /, so the
kernel doesn't really access /dev anyway, so this convention is only a
big misleading relic that can easily make the user to assume we access
/dev early on boot.
2. This convention although resemble the simple linux convention, is
quite limited in specifying a correct boot device across hardware setup
changes, so option 2 was recommended to ensure the system is always
bootable.
With these caveats in mind, this commit tries to fix the problem with
adding more addressing options as well as to remove the first option
being mentioned above of addressing.
To sum it up, there are 4 addressing options:
1. Hardware relative address - Each instance of StorageController is
assigned with a index number relative to the type of hardware it handles
which makes it possible to address storage devices with a prefix of the
commandset ("ata" for ATA, "nvme" for NVMe, "ramdisk" for Plain memory),
and then the number for the parent controller relative hardware index,
another number LUN target_id, and a third number for LUN disk_id.
2. LUN address - Similar to the previous option, but instead we rely on
the parent controller absolute index for the first number.
3. Block device major and minor numbers - by specifying the major and
minor numbers, the kernel can simply try to get the corresponding block
device and use it as the boot device.
4. GUID string, in the same fashion like before, so the user use the
"PARTUUID:" string prefix and add the GUID of the GPT partition.
For the new address modes 1 and 2, the user can choose to also specify a
partition out of the selected boot device. To do that, the user needs to
append the semicolon character and then add the string "partX" where X
is to be changed for the partition number. We start counting from 0, and
therefore the first partition number is 0 and not 1 in the kernel boot
argument.
Until now, our kernel has reimplemented a number of AK classes to
provide automatic internal locking:
- RefPtr
- NonnullRefPtr
- WeakPtr
- Weakable
This patch renames the Kernel classes so that they can coexist with
the original AK classes:
- RefPtr => LockRefPtr
- NonnullRefPtr => NonnullLockRefPtr
- WeakPtr => LockWeakPtr
- Weakable => LockWeakable
The goal here is to eventually get rid of the Lock* classes in favor of
using external locking.
In the near future, we will be able to figure out connections between
storage devices and their partitions, so there's no need to hardcode 16
partitions per storage device - each storage device should be able to
have "infinite" count of partitions in it, and we should be able to use
and figure out about them.
We do that to increase clarity of the major and secondary components in
the subsystem. To ensure it's even more understandable, we rename the
files to better represent the class within them and to remove redundancy
in the name.
Also, some includes are removed from the general components of the ATA
components' classes.
LUN address is essentially how people used to address SCSI devices back
in the day we had these devices more in use. However, SCSI was taken as
an abstraction layer for many Unix and Unix-like systems, so it still
common to see LUN addresses in use. In Serenity, we don't really provide
such abstraction layer, and therefore until now, we didn't use LUNs too.
However (again), this changes, as we want to let users to address their
devices under SysFS easily. LUNs make sense in that regard, because they
can be easily adapted to different interfaces besides SCSI.
For example, for legacy ATA hard drive being connected to the first IDE
controller which was enumerated on the PCI bus, and then to the primary
channel as slave device, the LUN address would be 0:0:1.
To make this happen, we add unique ID number to each StorageController,
which increments by 1 for each new instance of StorageController. Then,
we adapt the ATA and NVMe devices to use these numbers and generate LUN
in the construction time.
That code used the old AK::Result container, which leads to overly
complicated initialization flow when trying to figure out the correct
partition table type. Instead, when using the ErrorOr container the code
is much simpler and more understandable.
This is mainly useful when adding an HostController but due to OOM
condition, we abort temporary Vector insertion of a DeviceIdentifier
and then exit the iteration loop to report back the error if occured.
Instead, hold the lock while we copy the contents to a stack-based
Vector then iterate on it without any locking.
Because we rely on heap allocations, we need to propagate errors back
in case of OOM condition, therefore, both PCI::enumerate API function
and PCI::Access::add_host_controller_and_enumerate_attached_devices use
now a ErrorOr<void> return value to propagate errors. OOM Error can only
occur when enumerating the m_device_identifiers vector under a spinlock
and trying to expand the temporary Vector which will be used locklessly
to actually iterate over the PCI::DeviceIdentifiers objects.
If there's no PCI bus, then it's safe to assume that we run on a x86
machine that has an ISA IDE controller in the system. In such case, we
just instantiate a ISAIDEController object that assumes fixed locations
of IDE IO ports.
Function-local `static constexpr` variables can be `constexpr`. This
can reduce memory consumption, binary size, and offer additional
compiler optimizations.
These changes result in a stripped x86_64 kernel binary size reduction
of 592 bytes.
Add polling support to NVMe so that it does not use interrupt to
complete a IO but instead actively polls for completion. This probably
is not very efficient in terms of CPU usage but it does not use
interrupts to complete a IO which is beneficial at the moment as there
is no MSI(X) support and it can reduce the latency of an IO in a very
fast NVMe device.
The NVMeQueue class has been made the base class for NVMeInterruptQueue
and NVMePollQueue. The factory function `NVMeQueue::try_create` will
return the appropriate queue to the controller based on the polling
boot parameter.
The polling mode can be enabled by adding an extra boot parameter:
`nvme_poll`.
This is being used by GUID partitions so the first three dash-delimited
fields of the GUID are stored in little endian order but the last two
fields are stored in big endian order, hence it's a representation which
is mixed.
If we panic the kernel for a storage-related reason, we might as well be
helpful and print out a list of detected storage devices and their
partitions to help with debugging.
Reasons for such a panic include:
- No boot device with the given name found
- No boot device with the given UUID found
- Failing to open the root filesystem after determining a boot device
Since NVME devices end with a digit that indicates the node index we
cannot simply append a partition index. Instead, there will be a "p"
character as separator, e.g. /dev/nvme0n1p3 for the 3rd partition.
So, if the early device name ends in a digit we need to add this
separater before matching for the partition index.
If the partition index is omitted (as is the default) the root file
system is on a disk without any partition table (e.g. using QEMU).
This enables booting from the correct partition on an NVMe drive by
setting the command line variable root to e.g. root=/dev/nvme0n1p1
Add a basic NVMe driver support to serenity
based on NVMe spec 1.4.
The driver can support multiple NVMe drives (subsystems).
But in a NVMe drive, the driver can support one controller
with multiple namespaces.
Each core will get a separate NVMe Queue.
As the system lacks MSI support, PIN based interrupts are
used for IO.
Tested the NVMe support by replacing IDE driver
with the NVMe driver :^)
Like what happened with the PCI and USB code, this feels like the right
thing to do because we can improve on the ATA capabilities and keep it
distinguished from the rest of the subsystem.
We now use AK::Error and AK::ErrorOr<T> in both kernel and userspace!
This was a slightly tedious refactoring that took a long time, so it's
not unlikely that some bugs crept in.
Nevertheless, it does pass basic functionality testing, and it's just
real nice to finally see the same pattern in all contexts. :^)
There's basically no real difference in software between a SATA harddisk
and IDE harddisk. The difference in the implementation is for the host
bus adapter protocol and registers layout.
Therefore, there's no point in putting a distinction in software to
these devices.
This change also greatly simplifies and removes stale APIs and removes
unnecessary parameters in constructor calls, which tighten things
further everywhere.
This is really a basic support for AHCI hotplug events, so we know how
to add a node representing the device in /sys/dev/block and removing it
according to the event type (insertion/removal).
This change doesn't take into account what happens if the device was
mounted or a read/write operation is being handled.
For this to work correctly, StorageManagement now uses the Singleton
container, as it might be accessed simultaneously from many CPUs
for hotplug events. DiskPartition holds a WeakPtr instead of a RefPtr,
to allow removal of a StorageDevice object from the heap.
StorageDevices are now stored and being referenced to via an
IntrusiveList to make it easier to remove them on hotplug event.
In future changes, all of the stated above might change, but for now,
this commit represents the least amount of changes to make everything
to work correctly.
These methods are no longer needed because SystemServer is able to
populate the DevFS on its own.
Device absolute_path no longer assume a path to the /dev location,
because it really should not assume any path to a Device node.
Because StorageManagement still needs to know the storage name, we
declare a virtual method only for StorageDevices to override, but this
technique should really be removed later on.