This function is an extended version of `chmod(2)` that lets one control
whether to dereference symlinks, and specify a file descriptor to a
directory that will be used as the base for relative paths.
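A hedged sketch of the resulting interface, modeled on POSIX
fchmodat() (the exact Serenity syscall shape may differ):

    // Illustrative only: dirfd is the base for relative paths (or
    // AT_FDCWD for the current directory), and AT_SYMLINK_NOFOLLOW
    // makes the call operate on a symlink itself rather than its
    // target.
    int fchmodat(int dirfd, char const* path, mode_t mode, int flags);

    // e.g. change the mode of the link itself, relative to dirfd:
    fchmodat(dirfd, "some/link", 0644, AT_SYMLINK_NOFOLLOW);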
When deleting an entire AddressSpace, we don't need to do TLB flushes
at all (since the entire page directory is going away anyway).
We also don't need to deallocate VM ranges one by one, since the entire
VM range allocator will be deleted anyway.
We can use `StringView::for_each_split_view` here to avoid the potential
allocation of `Vector<StringView>` elements we would get from the normal
split_view functions.
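A minimal usage sketch, assuming AK's callback-based signature
(separator, SplitBehavior, callback):

    StringView path = "/usr/local/bin"sv;
    path.for_each_split_view('/', SplitBehavior::Nothing, [](StringView part) {
        // Visits "usr", "local" and "bin" as views into the original
        // string; no Vector<StringView> is ever allocated.
        dbgln("{}", part);
    });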
This mechanism was unsafe to use in any multithreaded context, since
the hook function was invoked on a raw pointer *after* decrementing
the local ref count.
Since we don't use it for anything anymore, let's just get rid of it.
Previously we were uncaching inodes from TmpFSInode::one_ref_left().
This was not safe, since one_ref_left() was effectively being called
on a raw pointer after decrementing the local ref count and observing
it become 1. There was a race here where someone else could trigger
the destructor by unreffing to 0 before one_ref_left() got called,
causing us to call one_ref_left() on a deleted inode.
We fix this by using the new remove_from_secondary_lists() mechanism
in ListedRefCounted and synchronizing all access to the TmpFS inode
map with the main Inode::all_instances() lock.
There's probably a nicer way to solve this.
Look for remove_from_secondary_lists() and call it on the ref-counting
target if present *while the lock is held*.
This allows listed-ref-counted objects to be present in multiple lists
and still have synchronized removal on final unref.
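A hedged sketch of the unref path with this mechanism (simplified;
the real AK::ListedRefCounted code differs in detail):

    bool unref() const
    {
        auto& self = const_cast<T&>(static_cast<T const&>(*this));
        auto new_ref_count = T::all_instances().with([&](auto& list) {
            auto count = deref_base();
            if (count == 0) {
                list.remove(self);
                // Still under the list lock: let the object detach
                // from any secondary lists (e.g. the TmpFS inode map)
                // before anyone can observe a stale entry.
                if constexpr (requires { self.remove_from_secondary_lists(); })
                    self.remove_from_secondary_lists();
            }
            return count;
        });
        if (new_ref_count == 0)
            delete &self;
        return new_ref_count == 0;
    }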
For "destructive" disallowance of allocations throughout the system,
Thread gains a member that controls whether allocations are currently
allowed or not. kmalloc checks this member on both allocations and
deallocations (with the exception of early boot) and panics the kernel
if allocations are disabled. This allows critical sections that must
not allocate to fail fast, making for easier debugging.
PS: My first proper Kernel commit :^)
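A hedged sketch of the check (the flag accessor and heap names are
illustrative):

    void* kmalloc(size_t size)
    {
        // Early boot allocates before any Thread exists, so the check
        // is skipped there.
        if (auto* thread = Thread::current(); thread && !thread->is_allocation_enabled())
            PANIC("Allocation while allocations are disabled!");
        return g_kmalloc_heap.allocate(size); // illustrative
    }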
The purpose of the PageDirectory::m_page_tables map was really just
to act as ref-counting storage for PhysicalPage objects that were
being used for the directory's page tables.
However, this was basically redundant, since we can find the physical
address of each page table from the page directory, and we can find the
PhysicalPage object from MemoryManager::get_physical_page_entry().
So if we just manually ref() and unref() the pages when they go in and
out of the directory, we no longer need PageDirectory::m_page_tables!
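A hedged sketch of the manual ref-counting (simplified; assumes
get_physical_page_entry() maps a physical address back to its
PhysicalPage):

    // Installing a page table into a directory slot:
    auto& page = MM.get_physical_page_entry(page_table_paddr).allocated.physical_page;
    page.ref(); // the directory now owns a reference

    // Removing an (empty) page table from the directory:
    page.unref(); // may free the PhysicalPage if this was the last ref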
Not only does this remove a bunch of kmalloc() traffic, it also solves
a race condition that would occur when lazily adding a new page table
to a directory:
Previously, when MemoryManager::ensure_pte() would call HashMap::set()
to insert the new page table into m_page_tables, if the HashMap had to
grow its internal storage, it would call kmalloc(). If that kmalloc()
would need to perform heap expansion, it would end up calling
ensure_pte() again, which would clobber the page directory mapping used
by the outer invocation of ensure_pte().
The net result of the above bug would be that any invocation of
MemoryManager::ensure_pte() could erroneously return a pointer into
a kernel page table instead of the correct one!
This whole problem goes away when we remove the HashMap, as ensure_pte()
no longer does anything that allocates from the heap.
This was broken in commit 0a1b34c753 / PR #11687 since the buffer
descriptor list size was not page-aligned, and the new
`MM.allocate_dma_buffer_pages` expects a page-aligned size.
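A hedged sketch of the fix (assuming the kernel's page_round_up()
helper; member names are illustrative):

    // Round the buffer descriptor list size up to a page boundary
    // before asking for whole DMA pages:
    auto aligned_size = Memory::page_round_up(buffer_descriptor_list_size);
    m_buffer_descriptor_list = TRY(MM.allocate_dma_buffer_pages(
        aligned_size, "Buffer Descriptor List"sv, Memory::Region::Access::ReadWrite));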
Add them in `<Kernel/API/Device.h>` and use these to provide
`{makedev,major,minor}` in `<sys/sysmacros.h>`. This aims to be more in
line with other Unix implementations and avoids code duplication in
userland.
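A hedged sketch of the userland side (the helper names illustrate the
pattern; the exact ones live in `<Kernel/API/Device.h>`):

    /* <sys/sysmacros.h> simply forwards to the kernel-provided helpers: */
    #include <Kernel/API/Device.h>

    #define makedev(major, minor) serenity_dev_makedev((major), (minor))
    #define major(dev) serenity_dev_major(dev)
    #define minor(dev) serenity_dev_minor(dev)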
Not all drivers need the PhysicalPage output parameter when creating
a DMA buffer. This overload avoids the caller having to create a
temporary variable for it.
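A hedged sketch of such an overload (simplified):

    ErrorOr<NonnullOwnPtr<Memory::Region>> MemoryManager::allocate_dma_buffer_page(
        StringView name, Memory::Region::Access access)
    {
        // Callers that don't care about the PhysicalPage no longer
        // need a throwaway variable:
        RefPtr<Memory::PhysicalPage> ignored_page;
        return allocate_dma_buffer_page(name, access, ignored_page);
    }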
The cacheable parameter to allocate_kernel_region should be explicitly
set to No, as this region is used to do physical memory transfers. Even
though most architectures ignore this flag even when it is set, it is
better to make this explicit.
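For example (hedged; the parameter order is from memory):

    auto region = TRY(MM.allocate_kernel_region(size, "DMA transfer buffer"sv,
        Memory::Region::Access::ReadWrite, AllocationStrategy::AllocateNow,
        Memory::Region::Cacheable::No));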
Two classes are added - HostBridge and MemoryBackedHostBridge, which
both derive from the HostController class. This allows the kernel to
map busses from different PCI domains at the same time. A
HostController implementation doesn't take an Address object to address
PCI devices; instead it takes the distinct numbers of the PCI bus,
device and function, which allows us to specify arbitrary PCI domains
in the Address structure and still get the correct PCI devices. This
also matches the hardware behavior of PCI domains - the host bridge
merely takes memory or IO operations and translates them into
addressing of three components - the PCI bus, device and function.
These changes also greatly simplify how enumeration of host bridges
works - scanning the hardware depends on what the host bridges can do
for us, so if we have multiple host bridges that expose a memory-mapped
region or IO ports for accessing the PCI configuration space, we simply
let the host bridge code figure out how to fetch the data for us.
Another semantic change is that a PCI domain structure is no longer
attached to a PhysicalAddress, so even if a machine doesn't implement
PCI domains, we still treat it as containing one PCI domain, and its
single host bridge is handled the same way as on a machine with one or
more PCI domains.
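A hedged sketch of the shape of the hierarchy (illustrative;
BusNumber, DeviceNumber and FunctionNumber stand for the distinct
number types):

    class HostController {
    public:
        virtual ~HostController() = default;
        // Addressed by distinct bus/device/function numbers; the PCI
        // domain is a property of the controller itself.
        virtual u8 read8_field(BusNumber, DeviceNumber, FunctionNumber, u32 field) = 0;
        virtual void write8_field(BusNumber, DeviceNumber, FunctionNumber, u32 field, u8 value) = 0;
    };

HostBridge would implement these accessors with port IO (0xCF8/0xCFC),
while MemoryBackedHostBridge reads a memory-mapped (ECAM) region;
enumeration then simply asks whichever controllers are present.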
FixedArray no longer exposes any infallible constructors. Rather, it
exposes fallible methods. Therefore, it can be used for OOM-safe code.
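Callers now go through a fallible factory, e.g. (hedged; try_create is
the usual AK pattern):

    // Propagates ENOMEM to the caller instead of aborting on failure:
    auto buffer = TRY(FixedArray<u8>::try_create(length));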
This commit also converts the rest of the system to use the new API.
However, as an example, VMObject can't take advantage of this yet,
as we would have to endow VMObject with a fallible static
construction method, which would require a very fundamental change
to VMObject's whole inheritance hierarchy.
When writing into a block via an O_DIRECT open file description, we
would first flush every dirty block *except* the one we're about to
write into.
The purpose of flushing is to ensure coherency when mixing direct and
indirect accesses to the same file. This patch fixes the issue by only
flushing the affected block.
Our behavior was totally backwards and could end up doing a lot of
unnecessary work while avoiding the work that actually mattered.
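A hedged sketch of the corrected write path (the helper name is
illustrative):

    // O_DIRECT write: make exactly the affected block coherent with
    // any cached (indirect) writers, instead of flushing everything
    // else.
    flush_specific_block_if_needed(block_index);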
Since this function always returns a CacheEntry& (after potentially
evicting someone else to make room), let's call it "ensure" instead of
"get" to match how we usually use these terms.
When doing the last unref() on a listed-ref-counted object, we keep
the list locked while mutating the ref count. The destructor itself
is invoked after unlocking the list.
This was racy with weakable classes, since their weak pointer factory
still pointed to the object after we'd decided to destroy it. That
opened a small time window where someone could try to strong-ref a weak
pointer to an object after it was removed from the list, but just before
the destructor got invoked.
This patch closes the race window by explicitly revoking all weak
pointers while the list is locked.
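A hedged sketch of the addition to the list-locked unref path
(simplified):

    if (new_ref_count == 0) {
        list.remove(self);
        // Still under the list lock: kill all weak pointers so nobody
        // can strong-ref us between list removal and destruction.
        if constexpr (IsBaseOf<Weakable<T>, T>)
            self.revoke_weak_ptrs();
    }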
Although we can consider this impossible now, because the mmap syscall
entry code verifies that the specified offset is page-aligned, it's
still good practice to VERIFY that the start address we take is
page-aligned when doing mmap on /dev/mem.
As for read(2) on /dev/mem, we don't map anything to userspace, so it's
safe to read from whatever offset userspace specifies, as long as it
does not break the original rules of reading physical memory from
/dev/mem.
So far we only had mmap(2) functionality on the /dev/mem device, but now
we can also do read(2) on it.
The test unit was updated to check that we are doing it safely.
As Idan Horowitz pointed out, the rest of the method doesn't assume we
have any reserved ranges in order to allow mmap(2) to work on them, so
the VERIFY is not needed at all.
We should only look at the framebuffer structure members if the
MULTIBOOT_INFO_FRAMEBUFFER_INFO bit is set in the flags field.
Also add some logging if we ignored the fbdev command line argument
due to either not having a framebuffer provided by the bootloader, or
because we don't support the framebuffer format.
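A hedged sketch of the guard (the setup function is illustrative):

    if (multiboot_info.flags & MULTIBOOT_INFO_FRAMEBUFFER_INFO) {
        // Only now is it valid to read framebuffer_addr,
        // framebuffer_pitch, framebuffer_width, framebuffer_height,
        // framebuffer_bpp and framebuffer_type.
        initialize_boot_framebuffer(multiboot_info);
    } else {
        dmesgln("Multiboot: no framebuffer info; ignoring fbdev argument");
    }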
This allows forcing the use of only the framebuffer set up by the
bootloader and skips instantiating devices for any other graphics
cards that may be present.
Use the same trick as SlavePTY and override unref() to provide safe
removal from the sockets_by_tuple table when destroying a TCPSocket.
This should fix the TCPSocket::from_tuple() flake seen on CI.
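A hedged sketch of the override (simplified):

    void TCPSocket::unref() const
    {
        bool did_hit_zero = sockets_by_tuple().with_exclusive([&](auto& table) {
            if (deref_base() > 0)
                return false;
            // Remove ourselves while the table lock is held, so that
            // from_tuple() can never hand out a dying socket.
            table.remove(tuple());
            return true;
        });
        if (did_hit_zero)
            delete this;
    }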
Previously we were using this vector to store the inodes as we iterated.
However, we don't need to store all of them, just the previous inode, as
we know it will be safe to remove it once we've iterated past that
element.
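A hedged sketch of the pattern (names are illustrative):

    RefPtr<Inode> previous_inode;
    for (auto& inode : inodes_to_remove) {
        if (previous_inode)
            remove_inode(*previous_inode); // safe: we've iterated past it
        previous_inode = inode;
    }
    if (previous_inode)
        remove_inode(*previous_inode);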