The kernel sampling profiler will walk thread stacks during the timer
tick handler. Since it's not safe to trigger page faults during IRQ's,
we now avoid this by checking the page tables manually before accessing
each stack location.
We're not equipped to deal with page faults during an IRQ handler,
so add an assertion so we can immediately tell what's wrong.
This is why profiling sometimes hangs the system -- walking the stack
of the profiled thread causes a page fault and things fall apart.
Anonymous VM objects should never have null entries in their physical
page list. Instead, "empty" or untouched pages should refer to the
shared zero page.
Fixes#1237.
A 16-bit refcount is just begging for trouble right nowl.
A 32-bit refcount will be begging for trouble later down the line,
so we'll have to revisit this eventually. :^)
This patch adds a globally shared zero-filled PhysicalPage that will
be mapped into every slot of every zero-filled AnonymousVMObject until
that page is written to, achieving CoW-like zero-filled pages.
Initial testing show that this doesn't actually achieve any sharing yet
but it seems like a good design regardless, since it may reduce the
number of page faults taken by programs.
If you look at the refcount of MM.shared_zero_page() it will have quite
a high refcount, but that's just because everything maps it everywhere.
If you want to see the "real" refcount, you can build with the
MAP_SHARED_ZERO_PAGE_LAZILY flag, and we'll defer mapping of the shared
zero page until the first NP read fault.
I've left this behavior behind a flag for future testing of this code.
This was only used by HashTable::dump() which I used when doing the
first HashTable implementation. Removing this allows us to also remove
most includes of <AK/kstdio.h>.
It doesn't look healthy to create raw references into an array before
a temporary unlock. In fact, that temporary unlock looks generally
unhealthy, but it's a different problem.
Previously we were only checking that each of the virtual pages in the
specified range were valid.
This made it possible to pass in negative buffer sizes to some syscalls
as long as (address) and (address+size) were on the same page.
Previously it was not possible for this function to fail. You could
exploit this by triggering the creation of a VMObject whose physical
memory range would wrap around the 32-bit limit.
It was quite easy to map kernel memory into userspace and read/write
whatever you wanted in it.
Test: Kernel/bxvga-mmap-kernel-into-userspace.cpp
When using dbg() in the kernel, the output is automatically prefixed
with [Process(PID:TID)]. This makes it a lot easier to understand which
thread is generating the output.
This patch also cleans up some common logging messages and removes the
now-unnecessary "dbg() << *current << ..." pattern.
We don't need to have this method anymore. It was a hack that was used
in many components in the system but currently we use better methods to
create virtual memory mappings. To prevent any further use of this
method it's best to just remove it completely.
Also, the APIC code is disabled for now since it doesn't help booting
the system, and is broken since it relies on identity mapping to exist
in the first 1MB. Any call to the APIC code will result in assertion
failed.
In addition to that, the name of the method which is responsible to
create an identity mapping between 1MB to 2MB was changed, to be more
precise about its purpose.
There is no real "read protection" on x86, so we have no choice but to
map write-only pages simply as "present & read/write".
If we get a read page fault in a non-readable region, that's still a
correctness issue, so we crash the process. It's by no means a complete
protection against invalid reads, since it's trivial to fool the kernel
by first causing a write fault in the same region.
uintptr_t is 32-bit or 64-bit depending on the target platform.
This will help us write pointer size agnostic code so that when the day
comes that we want to do a 64-bit port, we'll be in better shape.
Instead of restoring CR3 to the current process's paging scope when a
ProcessPagingScope goes out of scope, we now restore exactly whatever
the CR3 value was when we created the ProcessPagingScope.
This fixes breakage in situations where a process ends up with nested
ProcessPagingScopes. This was making profiling very fragile, and with
this change it's now possible to profile g++! :^)
Previously, when deallocating a range of VM, we would sort and merge
the range list. This was quite slow for large processes.
This patch optimizes VM deallocation in the following ways:
- Use binary search instead of linear scan to find the place to insert
the deallocated range.
- Insert at the right place immediately, removing the need to sort.
- Merge the inserted range with any adjacent range(s) in-line instead
of doing a separate merge pass into a list copy.
- Add Traits<Range> to inform Vector that Range objects are trivial
and can be moved using memmove().
I've also added an assertion that deallocated ranges are actually part
of the RangeAllocator's initial address range.
I've benchmarked this using g++ to compile Kernel/Process.cpp.
With these changes, compilation goes from ~41 sec to ~35 sec.