Implemented some syscalls: dup(), dup2(), getdtablesize().
FileHandle is now a retainable, since that's needed for dup()'ed fd's.
I didn't really test any of this beyond a basic smoke check.
sys$fork() now clones all writable regions with per-page COW bits.
The pages are then mapped read-only and we handle a PF by COWing the pages.
This is quite delightful. Obviously there's lots of work to do still,
and it needs better data structures, but the general concept works.
This turned out way better than the old code. ELF loading is now quite
straightforward, and we don't need the weird concept of subregions anymore.
Next step is to respect the is_writable flag.
We don't do anything useful with the data yet, but I'm gonna rewrite
the layout code to run off of the program header table instead.
This will allow us to map .text and .rodata sections in read-only memory.
This was the fix:
-process.m_page_directory[0] = m_kernel_page_directory[0];
-process.m_page_directory[1] = m_kernel_page_directory[1];
+process.m_page_directory->entries[0] = m_kernel_page_directory->entries[0];
+process.m_page_directory->entries[1] = m_kernel_page_directory->entries[1];
I spent a good two hours scratching my head, not being able to figure out why
user process page directories felt they had ownership of page tables in the
kernel page directory.
It was because I was copying the entire damn kernel page directory into
the process instead of only sharing the two first PDE's. Dang!
This is quite cool! The syscall entry point plumbs the register dump
down to sys$fork(), which uses it to set up the child process's TSS
in order to resume execution right after the int 0x80 fork() call. :^)
This works pretty well, although there is some problem with the kernel
alias mappings used to clone the parent process's regions. If I disable
the MM::release_page_directory() code, there's no problem. Probably there's
a premature freeing of a physical page somehow.
We no longer disable interrupts around the whole affair.
Since MM manages per-process data structures, this works quite smoothly now.
Only procfs had to be tweaked with an InterruptDisabler.
I'm still playing around with finding a style that I like.
This is starting to feel pleasing to the eye. I guess this is how long
it took me to break free from the habit of my previous Qt/WK coding style.
This is way better than walking the region lists. I suppose we could
even let the hardware trigger a page fault and handle that. That'll
be the next step in the evolution here I guess.
I added an RAII helper called OtherTaskPagingScope. While present,
it switches the kernel over to using another task's page directory.
This is perfect for e.g walking the stack in /proc/PID/stack.
I spent some time stuck on a problem where processes would clobber each
other's stacks. Took me a moment to figure out that their stacks
were allocated in the sub-4MB linear address range which is shared
between all processes. Oops!
This isn't finished but I'll commit as I go. We need to get to where context
switching only needs to change CR3 and everything's ready to go.
My basic idea is:
- The first 4 kB is off-limits. This catches null dereferences.
- Up to the 4 MB mark is identity-mapped and kernel-only.
- The rest is available to everyone!
While the first 4 MB is only available to the kernel, it's still mapped in
every process, for convenience when entering the kernel.
Ran into a horrendous bug where VirtualConsole would overrun its buffer
and scribble right into some other object if we were interrupted while
processing a character. Slapped an InterruptDisabler onto onChar for now.
This provokes an interesting question though.. if a process is killed
while its in kernel space, how the heck do we release any locks it held?
I'm sure there are many different solutions to this problem, but I'll
have to think about it.