The old approach only worked because of an overpermissive accident.
There's now a concept of supervisor physical pages that can be allocated.
They all sit in the low 4 MB of physical memory and are identity mapped,
shared between all processes, and only ring 0 can access them.
This container is really just there to keep a reference to the individual
PhysicalPages for each page table. A HashMap does the job with far greater
space efficiency.
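A minimal sketch of the idea, not the kernel's actual code: std::unordered_map and std::shared_ptr stand in for the kernel's HashMap and its retain mechanism, and PhysicalPageSketch and PageTableOwnerSketch are hypothetical names. Only page tables that exist take up space, instead of one slot for each of the 1024 possible page directory indices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a physical page; only the address matters here.
struct PhysicalPageSketch {
    std::uint32_t paddr;
};

// Hypothetical owner that keeps each page table's physical page alive.
// Keyed by page directory index (0..1023).
class PageTableOwnerSketch {
public:
    void retain_page_table(unsigned pd_index, std::shared_ptr<PhysicalPageSketch> page)
    {
        assert(pd_index < 1024);
        m_page_tables[pd_index] = std::move(page);
    }

    bool has_page_table(unsigned pd_index) const
    {
        return m_page_tables.count(pd_index) != 0;
    }

    std::size_t retained_count() const { return m_page_tables.size(); }

private:
    std::unordered_map<unsigned, std::shared_ptr<PhysicalPageSketch>> m_page_tables;
};
```

When the owner is destroyed, the map drops its references and the pages can be freed.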
Process page directories can now actually be freed. This could definitely
be implemented in a nicer, less wasteful way, but this works for now.
The spawn stress test can now run for a lot longer but eventually dies
due to kmalloc running out of memory.
mmap() will now map uncommitted pages that get allocated and zeroed upon the
first access. I also made /proc/PID/vm show the number of "committed" bytes in
each region. This is so cool! :^)
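A userspace sketch of the commit-on-first-access behaviour, assuming 4 kB pages. LazyRegionSketch is a hypothetical name, and the at() accessor plays the role that the page fault handler plays in the real thing.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

constexpr std::size_t page_size = 4096;

// Hypothetical region whose pages start out uncommitted (null).
class LazyRegionSketch {
public:
    explicit LazyRegionSketch(std::size_t page_count)
        : m_pages(page_count)
    {
    }

    // Simulates the fault path: commit and zero the page on first touch.
    std::uint8_t& at(std::size_t offset)
    {
        auto& page = m_pages.at(offset / page_size);
        if (!page)
            page = std::make_unique<Page>(); // value-initialized, so all zero
        return (*page)[offset % page_size];
    }

    // What a "committed bytes" column could report for this region.
    std::size_t committed_bytes() const
    {
        std::size_t count = 0;
        for (auto& page : m_pages) {
            if (page)
                ++count;
        }
        return count * page_size;
    }

private:
    using Page = std::array<std::uint8_t, page_size>;
    std::vector<std::unique_ptr<Page>> m_pages;
};
```

A region that is mapped but never touched reports zero committed bytes.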
...by adding a new class called Ext2Inode that inherits CoreInode.
The idea is that a vnode will wrap a CoreInode rather than InodeIdentifier.
Each CoreInode subclass can keep whatever caches it likes.
Right now, Ext2Inode caches the list of block indices since it can be very
expensive to retrieve.
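A rough sketch of the caching idea. The class shapes, the block_list() method and the placeholder block numbers are all assumptions made for illustration, not the real interfaces.

```cpp
#include <cassert>
#include <vector>

// Hypothetical minimal base class; the real CoreInode interface is not shown in the text.
class CoreInodeSketch {
public:
    virtual ~CoreInodeSketch() = default;
    virtual const std::vector<unsigned>& block_list() = 0;
};

// Hypothetical subclass that caches its block list after the first lookup.
class Ext2InodeSketch final : public CoreInodeSketch {
public:
    const std::vector<unsigned>& block_list() override
    {
        if (!m_block_list_cached) {
            m_block_list = read_block_list_from_disk();
            m_block_list_cached = true;
        }
        return m_block_list;
    }

    int disk_reads() const { return m_disk_reads; }

private:
    // Stand-in for the expensive walk over direct and indirect blocks.
    std::vector<unsigned> read_block_list_from_disk()
    {
        ++m_disk_reads;
        return { 10, 11, 12 }; // placeholder block indices
    }

    std::vector<unsigned> m_block_list;
    bool m_block_list_cached { false };
    int m_disk_reads { 0 };
};
```

Repeated calls to block_list() hit the disk only once.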
- Process::exec() needs to restore the original paging scope when called
on a non-current process.
- Add missing InterruptDisabler guards around g_processes access.
- Only flush the TLB when modifying the active page tables.
This is really sweet! :^) The four instances of /bin/sh spawned at
startup now share their read-only text pages.
There are problems and limitations here, and plenty of room for
improvement. But it kinda works.
All right, we can now mmap() a file and it gets magically paged in from the
filesystem in response to an NP (not-present) page fault. This is really cool :^)
I need to refactor this to support sharing of read-only file-backed pages,
but it's cool to just have something working.
It only works for sending a signal to a process that's in userspace code.
We implement reception by synthesizing a PUSHA+PUSHF in the receiving process
(operating on values in the TSS.)
The TSS CS:EIP is then rerouted to the signal handler and a tiny return
trampoline is constructed in a dedicated region in the receiving process.
Also hacked up /bin/kill to be able to send arbitrary signals (kill -N PID).
sys$fork() now clones all writable regions with per-page COW bits.
The pages are then mapped read-only and we handle a PF by COWing the pages.
This is quite delightful. Obviously there's lots of work to do still,
and it needs better data structures, but the general concept works.
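A userspace sketch of the per-page COW concept, with std::shared_ptr standing in for shared physical pages. RegionSketch, clone() and write() are hypothetical names; write() plays the role of the page fault handler for a write to a read-only mapping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

using PageSketch = std::vector<std::uint8_t>;

struct RegionSketch {
    std::vector<std::shared_ptr<PageSketch>> pages;
    std::vector<bool> cow; // one COW bit per page

    // Stand-in for fork: share every page and mark it COW on both sides.
    RegionSketch clone()
    {
        cow.assign(pages.size(), true);
        RegionSketch child;
        child.pages = pages;
        child.cow = cow;
        return child;
    }

    // Stand-in for the write fault path.
    void write(std::size_t page_index, std::size_t offset, std::uint8_t value)
    {
        auto& page = pages.at(page_index);
        if (cow.at(page_index)) {
            // Copy only if someone else still shares the page.
            if (page.use_count() > 1)
                page = std::make_shared<PageSketch>(*page);
            cow[page_index] = false;
        }
        page->at(offset) = value;
    }
};
```

The last owner of a page skips the copy and just clears its COW bit.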
This turned out way better than the old code. ELF loading is now quite
straightforward, and we don't need the weird concept of subregions anymore.
Next step is to respect the is_writable flag.
This was the fix:
-process.m_page_directory[0] = m_kernel_page_directory[0];
-process.m_page_directory[1] = m_kernel_page_directory[1];
+process.m_page_directory->entries[0] = m_kernel_page_directory->entries[0];
+process.m_page_directory->entries[1] = m_kernel_page_directory->entries[1];
I spent a good two hours scratching my head, not being able to figure out why
user process page directories felt they had ownership of page tables in the
kernel page directory.
It was because I was copying the entire damn kernel page directory into
the process instead of only sharing the first two PDEs. Dang!
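A small sketch of why the old lines copied too much, assuming m_page_directory points at a struct holding a 1024-entry array, as the `->entries` in the fix suggests. The struct and function names here are made up for illustration. Indexing the pointer itself selects a whole directory object, so `[0]` assigns all 1024 entries at once.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: one page directory is 1024 32-bit entries.
struct PageDirectoryLayoutSketch {
    std::uint32_t entries[1024];
};

// Old form: pointer[0] is the entire struct, so this copies every entry.
void copy_whole_directory(PageDirectoryLayoutSketch* process, const PageDirectoryLayoutSketch* kernel)
{
    process[0] = kernel[0];
}

// Fixed form: only the first two entries are shared.
void share_first_two_entries(PageDirectoryLayoutSketch* process, const PageDirectoryLayoutSketch* kernel)
{
    process->entries[0] = kernel->entries[0];
    process->entries[1] = kernel->entries[1];
}

static_assert(sizeof(PageDirectoryLayoutSketch) == 4096, "one directory fills one page");
```

With the same pointer type, `[1]` would address a second directory object past the first one, which is not what the two-entry share intends either.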
This is quite cool! The syscall entry point plumbs the register dump
down to sys$fork(), which uses it to set up the child process's TSS
in order to resume execution right after the int 0x80 fork() call. :^)
This works pretty well, although there is some problem with the kernel
alias mappings used to clone the parent process's regions. If I disable
the MM::release_page_directory() code, there's no problem. Probably there's
a premature freeing of a physical page somehow.
This is way better than walking the region lists. I suppose we could
even let the hardware trigger a page fault and handle that. That'll
be the next step in the evolution here I guess.
I added an RAII helper called OtherTaskPagingScope. While present,
it switches the kernel over to using another task's page directory.
This is perfect for e.g. walking the stack in /proc/PID/stack.
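A sketch of the RAII pattern only, with a plain global standing in for CR3. PagingScopeSketch and g_current_cr3 are hypothetical names, and a real implementation would load the register and deal with interrupts.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the CR3 register (physical address of the active page directory).
static std::uint32_t g_current_cr3 = 0x1000;

// Hypothetical scope guard: switch to another task's page directory
// on construction, switch back on destruction.
class PagingScopeSketch {
public:
    explicit PagingScopeSketch(std::uint32_t other_cr3)
        : m_saved_cr3(g_current_cr3)
    {
        g_current_cr3 = other_cr3;
    }

    ~PagingScopeSketch() { g_current_cr3 = m_saved_cr3; }

    // Copying would restore the saved value twice, so forbid it.
    PagingScopeSketch(const PagingScopeSketch&) = delete;
    PagingScopeSketch& operator=(const PagingScopeSketch&) = delete;

private:
    std::uint32_t m_saved_cr3;
};
```

The original directory comes back on every exit path from the block, early returns included.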
I spent some time stuck on a problem where processes would clobber each
other's stacks. Took me a moment to figure out that their stacks
were allocated in the sub-4MB linear address range which is shared
between all processes. Oops!
This isn't finished but I'll commit as I go. We need to get to where context
switching only needs to change CR3 and everything's ready to go.
My basic idea is:
- The first 4 kB is off-limits. This catches null dereferences.
- Up to the 4 MB mark is identity-mapped and kernel-only.
- The rest is available to everyone!
While the first 4 MB is only available to the kernel, it's still mapped in
every process, for convenience when entering the kernel.
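The layout above can be written down as a small classifier. This is only a sketch: the enum and function names are made up, and the real checks happen through page table permission bits rather than a function call.

```cpp
#include <cassert>
#include <cstdint>

enum class AddressClassSketch {
    OffLimits,      // first 4 kB, catches null dereferences
    KernelOnly,     // up to the 4 MB mark, identity-mapped
    UserAccessible, // everything above 4 MB
};

// Hypothetical helper mapping a linear address to its class.
AddressClassSketch classify_address(std::uint32_t laddr)
{
    constexpr std::uint32_t four_kb = 4u * 1024u;
    constexpr std::uint32_t four_mb = 4u * 1024u * 1024u;
    if (laddr < four_kb)
        return AddressClassSketch::OffLimits;
    if (laddr < four_mb)
        return AddressClassSketch::KernelOnly;
    return AddressClassSketch::UserAccessible;
}
```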
I ran out of steam writing library routines and imported two
BSD-licensed libc routines: sscanf() and getopt().
I will most likely rewrite them sooner or later. For now
I just wanted to see figlet running.
This shows some info about the MM. Right now it's just the zone count
and the number of free physical pages. Lots more can be added.
Also added "exit" to sh so we can nest shells and exit from them.
I also noticed that we were leaking all the physical pages, so fixed that.
This took me a couple hours. :^)
The ELF loading code now allocates a single region for the entire
file and creates virtual memory mappings for the sections as needed.
Very nice!