We now validate the full range of userspace memory passed into syscalls
instead of just checking that the first and last byte of the memory are
in process-owned regions.
This fixes an issue where it was possible to avoid rejection of invalid
addresses that sat between two valid ones, simply by passing a valid
address and a size large enough to put the end of the range at another
valid address.
I added a little test utility that tries to provoke EFAULT in various
ways to help verify this. I'm sure we can think of more ways to test
this but it's at least a start. :^)
Thanks to mozjag for pointing out that this code was still lacking!
Incidentally this also makes backtraces work again.
Fixes#989.
All threads were running with iomapbase=0 in their TSS, which the CPU
interprets as "there's an I/O permission bitmap starting at offset 0
into my TSS".
Because of that, any bits that were 1 inside the TSS would allow the
thread to execute I/O instructions on the port with that bit index.
Fix this by always setting the iomapbase to sizeof(TSS32), and also
setting the TSS descriptor's limit to sizeof(TSS32), effectively making
the I/O permissions bitmap zero-length.
This should make it no longer possible to do I/O from userspace. :^)
This patch introduces a syscall:
int set_thread_boost(int tid, int amount)
You can use this to add a permanent boost value to the effective thread
priority of any thread with your UID (or any thread in the system if
you are the superuser.)
This is quite crude, but opens up some interesting opportunities. :^)
Threads now have numeric priorities with a base priority in the 1-99
range.
Whenever a runnable thread is *not* scheduled, its effective priority
is incremented by 1. This is tracked in Thread::m_extra_priority.
The effective priority of a thread is m_priority + m_extra_priority.
When a runnable thread *is* scheduled, its m_extra_priority is reset to
zero and the effective priority returns to base.
This means that lower-priority threads will always eventually get
scheduled to run, once its effective priority becomes high enough to
exceed the base priority of threads "above" it.
The previous values for ThreadPriority (Low, Normal and High) are now
replaced as follows:
Low -> 10
Normal -> 30
High -> 50
In other words, it will take 20 ticks for a "Low" priority thread to
get to "Normal" effective priority, and another 20 to reach "High".
This is not perfect, and I've used some quite naive data structures,
but I think the mechanism will allow us to build various new and
interesting optimizations, and we can figure out better data structures
later on. :^)
If an mmap fails to allocate a region, but the addr passed in was
non-zero, non-fixed mmaps should attempt to allocate at any available
virtual address.
This is memory that's loaded from an inode (file) but not modified in
memory, so still identical to what's on disk. This kind of memory can
be freed and reloaded transparently from disk if needed.
Dirty private memory is all memory in non-inode-backed mappings that's
process-private, meaning it's not shared with any other process.
This patch exposes that number via SystemMonitor, giving us an idea of
how much memory each process is responsible for all on its own.
This patch implements a simple version of the futex (fast userspace
mutex) API in the kernel and uses it to make the pthread_cond_t API's
block instead of busily sched_yield().
An arbitrary userspace address is passed to the kernel as a "token"
that identifies the futex and you can then FUTEX_WAIT and FUTEX_WAKE
that specific userspace address.
FUTEX_WAIT corresponds to pthread_cond_wait() and FUTEX_WAKE is used
for pthread_cond_signal() and pthread_cond_broadcast().
I'm pretty sure I'm missing something in this implementation, but it's
hopefully okay for a start. :^)
This is a little strange, but it's how I understand things should work.
The first thread in a new process now has TID == PID.
Additional threads subsequently spawned in that process all have unique
TID's generated by the PID allocator. TIDs are now globally unique.
The idea of all processes reliably having a main thread was nice in
some ways, but cumbersome in others. More importantly, it didn't match
up with POSIX thread semantics, so let's move away from it.
This thread gets rid of Process::main_thread() and you now we just have
a bunch of Thread objects floating around each Process.
When the finalizer nukes the last Thread in a Process, it will also
tear down the Process.
There's a bunch of more things to fix around this, but this is where we
get started :^)
While setting up the main thread stack for a new process, we'd incur
some zero-fill page faults. This was to be expected, since we allocate
a huge stack but lazily populate it with physical pages.
The problem is that page fault handlers may enable interrupts in order
to grab a VMObject lock (or to page in from an inode.)
During exec(), a process is reorganizing itself and will be in a very
unrunnable state if the scheduler should interrupt it and then later
ask it to run again. Which is exactly what happens if the process gets
pre-empted while the new stack's zero-fill page fault grabs the lock.
This patch fixes the issue by creating new main thread stacks before
disabling interrupts and going into the critical part of exec().
To enforce this, we create two separate mappings of the same underlying
physical page. A writable mapping for the kernel, and a read-only one
for userspace (the one returned by sys$get_kernel_info_page.)
This patch adds a single "kernel info page" that is mappable read-only
by any process and contains the current time of day.
This is then used to implement a version of gettimeofday() that doesn't
have to make a syscall.
To protect against race condition issues, the info page also has a
serial number which is incremented whenever the kernel updates the
contents of the page. Make sure to verify that the serial number is the
same before and after reading the information you want from the page.
Every process keeps its own ELF executable mapped in memory in case we
need to do symbol lookup (for backtraces, etc.)
Until now, it was mapped in a way that made it accessible to the
program, despite the program not having mapped it itself.
I don't really see a need for userspace to have access to this right
now, so let's lock things down a little bit.
This patch makes it inaccessible to userspace and exposes that fact
through /proc/PID/vm (per-region "user_accessible" flag.)
Currently only Ext2FS and TmpFS supports InodeWatchers. We now fail
with ENOTSUPP if watch_file() is called on e.g ProcFS.
This fixes an issue with FileManager chewing up all the CPU when /proc
was opened. Watchers don't keep the watched Inode open, and when they
close, the watcher FD will EOF.
Since nothing else kept /proc open in FileManager, the watchers created
for it would EOF immediately, causing a refresh over and over.
Fixes#879.
The kernel now supports basic profiling of all the threads in a process
by calling profiling_enable(pid_t). You finish the profiling by calling
profiling_disable(pid_t).
This all works by recording thread stacks when the timer interrupt
fires and the current thread is in a process being profiled.
Note that symbolication is deferred until profiling_disable() to avoid
adding more noise than necessary to the profile.
A simple "/bin/profile" command is included here that can be used to
start/stop profiling like so:
$ profile 10 on
... wait ...
$ profile 10 off
After a profile has been recorded, it can be fetched in /proc/profile
There are various limits (or "bugs") on this mechanism at the moment:
- Only one process can be profiled at a time.
- We allocate 8MB for the samples, if you use more space, things will
not work, and probably break a bit.
- Things will probably fall apart if the profiled process dies during
profiling, or while extracing /proc/profile
This patch makes SharedBuffer use a PurgeableVMObject as its underlying
memory object.
A new syscall is added to control the volatile flag of a SharedBuffer.
It's now possible to get purgeable memory by using mmap(MAP_PURGEABLE).
Purgeable memory has a "volatile" flag that can be set using madvise():
- madvise(..., MADV_SET_VOLATILE)
- madvise(..., MADV_SET_NONVOLATILE)
When in the "volatile" state, the kernel may take away the underlying
physical memory pages at any time, without notifying the owner.
This gives you a guilt discount when caching very large things. :^)
Setting a purgeable region to non-volatile will return whether or not
the memory has been taken away by the kernel while being volatile.
Basically, if madvise(..., MADV_SET_NONVOLATILE) returns 1, that means
the memory was purged while volatile, and whatever was in that piece
of memory needs to be reconstructed before use.
Using int was a mistake. This patch changes String, StringImpl,
StringView and StringBuilder to use size_t instead of int for lengths.
Obviously a lot of code needs to change as a result of this.
The main thread of each kernel/user process will take the name of
the process. Extra threads will get a fancy new name
"ProcessName[<tid>]".
Thread backtraces now list the thread name in addtion to tid.
Add the thread name to /proc/all (should it get its own proc
file?).
Add two new syscalls, set_thread_name and get_thread_name.
This patch makes it possible to make memory regions non-readable.
This is enforced using the "present" bit in the page tables.
A process that hits an not-present page fault in a non-readable
region will be crashed.
This patch introduces code generation for the WindowServer IPC with
its clients. The client/server endpoints are defined by the two .ipc
files in Servers/WindowServer/: WindowServer.ipc and WindowClient.ipc
It now becomes significantly easier to add features and capabilities
to WindowServer since you don't have to know nearly as much about all
the intricate paths that IPC messages take between LibGUI and WSWindow.
The new system also uses significantly less IPC bandwidth since we're
now doing packed serialization instead of passing fixed-sized structs
of ~600 bytes for each message.
Some repaint coalescing optimizations are lost in this conversion and
we'll need to look at how to implement those in the new world.
The old CoreIPC::Client::Connection and CoreIPC::Server::Connection
classes are removed by this patch and replaced by use of ConnectionNG,
which will be renamed eventually.
Goodbye, old WindowServer IPC. You served us well :^)