In order to preserve the absolute path of the process root, we save the
custody used by chroot() before stripping it to become the new "/".
There's probably a better way to do this.
The chroot() syscall now allows the superuser to isolate a process into
a specific subtree of the filesystem. This is not strictly permanent,
as it is also possible for a superuser to break *out* of a chroot, but
it is a useful mechanism for isolating unprivileged processes.
The VFS now uses the current process's root_directory() as the root for
path resolution purposes. The root directory is stored as an uncached
Custody in the Process object.
Note that I'm developing some helper types in the Syscall namespace as
I go here. Once I settle on some nice types, I will convert all the
other syscalls to use them as well.
The join_thread() syscall is not supposed to be interruptible by
signals, but it was. And since the process death mechanism piggybacked
on signal interrupts, it was possible to interrupt a pthread_join() by
killing the process that was doing it, leading to confusing due to some
assumptions being made by Thread::finalize() for threads that have a
pending joiner.
This patch fixes the issue by making "interrupted by death" a distinct
block result separate from "interrupted by signal". Then we handle that
state in join_thread() and tidy things up so that thread finalization
doesn't get confused by the pending joiner being gone.
Test: Tests/Kernel/null-deref-crash-during-pthread_join.cpp
The userspace execve() wrapper now measures all the strings and puts
them in a neat and tidy structure on the stack.
This way we know exactly how much to copy in the kernel, and we don't
have to use the SMAP-violating validate_read_str(). :^)
When loading a new executable, we now map the ELF image in kernel-only
memory and parse it there. Then we use copy_to_user() when initializing
writable regions with data from the executable.
Note that the exec() syscall still disables SMAP protection and will
require additional work. This patch only affects kernel-originated
process spawns.
Make mmap return -ENOTSUP in this case to make sure users don't get
confused and think they're using a private mapping when it's actually
shared. It's currenlty not possible to open a file and mmap it
MAP_PRIVATE, and change the perms of the private mapping to ones that
don't match the permissions of the underlying file.
This patch fixes some issues with the mmap() and mprotect() syscalls,
neither of whom were checking the permission bits of the underlying
files when mapping an inode MAP_SHARED.
This made it possible to subvert execution of any running program
by simply memory-mapping its executable and replacing some of the code.
Test: Kernel/mmap-write-into-running-programs-executable-file.cpp
This encourages callers to strongly reference file descriptions while
working with them.
This fixes a use-after-free issue where one thread would close() an
open fd while another thread was blocked on it becoming readable.
Test: Kernel/uaf-close-while-blocked-in-read.cpp
Before this, you could make the kernel copy memory from anywhere by
setting up an ELF executable with a program header specifying file
offsets outside the file.
Since ELFImage didn't even know how large it was, we had no clue that
we were copying things from outside the ELF.
Fix this by adding a size field to ELFImage and validating program
header ranges before memcpy()'ing to them.
The ELF code is definitely going to need more validation and checking.
This code had been misinterpreting the Multiboot ELF section headers
since the beginning. Furthermore QEMU wasn't even passing us any
headers at all, so this wasn't checking anything.
This patch introduces a helpful copy_string_from_user() function
that takes a bounded null-terminated string from userspace memory
and copies it into a String object.
Supervisor Mode Access Prevention (SMAP) is an x86 CPU feature that
prevents the kernel from accessing userspace memory. With SMAP enabled,
trying to read/write a userspace memory address while in the kernel
will now generate a page fault.
Since it's sometimes necessary to read/write userspace memory, there
are two new instructions that quickly switch the protection on/off:
STAC (disables protection) and CLAC (enables protection.)
These are exposed in kernel code via the stac() and clac() helpers.
There's also a SmapDisabler RAII object that can be used to ensure
that you don't forget to re-enable protection before returning to
userspace code.
THis patch also adds copy_to_user(), copy_from_user() and memset_user()
which are the "correct" way of doing things. These functions allow us
to briefly disable protection for a specific purpose, and then turn it
back on immediately after it's done. Going forward all kernel code
should be moved to using these and all uses of SmapDisabler are to be
considered FIXME's.
Note that we're not realizing the full potential of this feature since
I've used SmapDisabler quite liberally in this initial bring-up patch.
Our syscall calling convention only allows passing up to 3 arguments in
registers. For syscalls that take more arguments, we bake them into a
struct and pass a pointer to that struct instead.
When doing pointer validation, this is what we would do:
1) Validate the "params" struct
2) Validate "params->some_pointer"
3) ... other stuff ...
4) Use "params->some_pointer"
Since the parameter struct is stored in userspace, it can be modified
by userspace after validation has completed.
This was a recurring pattern in many syscalls that was further hidden
by me using structured binding declarations to give convenient local
names to things in the parameter struct:
auto& [some_pointer, ...] = *params;
memcpy(some_pointer, ...);
This devilishly makes "some_pointer" look like a local variable but
it's actually more like an alias for "params->some_pointer" and will
expand to a dereference when accessed!
This patch fixes the issues by explicitly copying out each member from
the parameter structs before validating them, and then never using
the "param" pointers beyond that.
Thanks to braindead for finding this bug! :^)