Commit graph

683 commits

Author SHA1 Message Date
Andreas Kling
cf48c20170 Kernel: Forked children should inherit unveil()'ed paths 2020-01-21 09:44:32 +01:00
Andreas Kling
0569123ad7 Kernel: Add a basic implementation of unveil()
This syscall is a complement to pledge() and adds the same sort of
incremental relinquishing of capabilities for filesystem access.

The first call to unveil() will "drop a veil" on the process, and from
now on, only unveiled parts of the filesystem are visible to it.

Each call to unveil() specifies a path to either a directory or a file
along with permissions for that path. The permissions are a combination
of the following:

- r: Read access (like the "rpath" promise)
- w: Write access (like the "wpath" promise)
- x: Execute access
- c: Create/remove access (like the "cpath" promise)

Attempts to open a path that has not been unveiled with fail with
ENOENT. If the unveiled path lacks sufficient permissions, it will fail
with EACCES.

Like pledge(), subsequent calls to unveil() with the same path can only
remove permissions, not add them.

Once you call unveil(nullptr, nullptr), the veil is locked, and it's no
longer possible to unveil any more paths for the process, ever.

This concept comes from OpenBSD, and their implementation does various
things differently, I'm sure. This is just a first implementation for
SerenityOS, and we'll keep improving on it as we go. :^)
2020-01-20 22:12:04 +01:00
Andreas Kling
e901a3695a Kernel: Use the templated copy_to/from_user() in more places
These ensure that the "to" and "from" pointers have the same type,
and also that we copy the correct number of bytes.
2020-01-20 13:41:21 +01:00
Sergey Bugaev
d5426fcc88 Kernel: Misc tweaks 2020-01-20 13:26:06 +01:00
Sergey Bugaev
9bc6157998 Kernel: Return new fd from sys$fcntl(F_DUPFD)
This fixes GNU Bash getting confused after performing a redirection.
2020-01-20 13:26:06 +01:00
Andreas Kling
4b7a89911c Kernel: Remove some unnecessary casts to uintptr_t
VirtualAddress is constructible from uintptr_t and const void*.
PhysicalAddress is constructible from uintptr_t but not const void*.
2020-01-20 13:13:03 +01:00
Andreas Kling
a246e9cd7e Use uintptr_t instead of u32 when storing pointers as integers
uintptr_t is 32-bit or 64-bit depending on the target platform.
This will help us write pointer size agnostic code so that when the day
comes that we want to do a 64-bit port, we'll be in better shape.
2020-01-20 13:13:03 +01:00
Andreas Kling
8d9dd1b04b Kernel: Add a 1-deep cache to Process::region_from_range()
This simple cache gets hit over 70% of the time on "g++ Process.cpp"
and shaves ~3% off the runtime.
2020-01-19 16:44:37 +01:00
Andreas Kling
ae0c435e68 Kernel: Add a Process::add_region() helper
This is a private helper for adding a Region to Process::m_regions.
It's just for convenience since it's a bit cumbersome to do this.
2020-01-19 16:26:42 +01:00
Andreas Kling
1dc9fa9506 Kernel: Simplify PageDirectory swapping in sys$execve()
Swap out both the PageDirectory and the Region list at the same time,
instead of doing the Region list slightly later.
2020-01-19 16:05:42 +01:00
Andreas Kling
6eab7b398d Kernel: Make ProcessPagingScope restore CR3 properly
Instead of restoring CR3 to the current process's paging scope when a
ProcessPagingScope goes out of scope, we now restore exactly whatever
the CR3 value was when we created the ProcessPagingScope.

This fixes breakage in situations where a process ends up with nested
ProcessPagingScopes. This was making profiling very fragile, and with
this change it's now possible to profile g++! :^)
2020-01-19 13:44:53 +01:00
Andreas Kling
f7b394e9a1 Kernel: Assert that copy_to/from_user() are called with user addresses
This will panic the kernel immediately if these functions are misused
so we can catch it and fix the misuse.

This patch fixes a couple of misuses:

    - create_signal_trampolines() writes to a user-accessible page
      above the 3GB address mark. We should really get rid of this
      page but that's a whole other thing.

    - CoW faults need to use copy_from_user rather than copy_to_user
      since it's the *source* pointer that points to user memory.

    - Inode faults need to use memcpy rather than copy_to_user since
      we're copying a kernel stack buffer into a quickmapped page.

This should make the copy_to/from_user() functions slightly less useful
for exploitation. Before this, they were essentially just glorified
memcpy() with SMAP disabled. :^)
2020-01-19 09:18:55 +01:00
Andreas Kling
5ce9382e98 Kernel: Only require "stdio" pledge for sending signals to self
This should match what OpenBSD does. Sending a signal to yourself seems
basically harmless.
2020-01-19 08:50:55 +01:00
Sergey Bugaev
3e1ed38d4b Kernel: Do not return ENOENT for unresolved symbols
ENOENT means "no such file or directory", not "no such symbol". Return EINVAL
instead, as we already do in other cases.
2020-01-18 23:51:22 +01:00
Sergey Bugaev
d0d13e2bf5 Kernel: Move setting file flags and r/w mode to VFS::open()
Previously, VFS::open() would only use the passed flags for permission checking
purposes, and Process::sys$open() would set them on the created FileDescription
explicitly. Now, they should be set by VFS::open() on any files being opened,
including files that the kernel opens internally.

This also lets us get rid of the explicit check for whether or not the returned
FileDescription was a preopen fd, and in fact, fixes a bug where a read-only
preopen fd without any other flags would be considered freshly opened (due to
O_RDONLY being indistinguishable from 0) and granted a new set of flags.
2020-01-18 23:51:22 +01:00
Sergey Bugaev
544b8286da Kernel: Do not open stdio fds for kernel processes
Kernel processes just do not need them.

This also avoids touching the file (sub)system early in the boot process when
initializing the colonel process.
2020-01-18 23:51:22 +01:00
Sergey Bugaev
6466c3d750 Kernel: Pass correct permission flags when opening files
Right now, permission flags passed to VFS::open() are effectively ignored, but
that is going to change.

* O_RDONLY is 0, but it's still nicer to pass it explicitly
* POSIX says that binding a Unix socket to a symlink shall fail with EADDRINUSE
2020-01-18 23:51:22 +01:00
Andreas Kling
862b3ccb4e Kernel: Enforce W^X between sys$mmap() and sys$execve()
It's now an error to sys$mmap() a file as writable if it's currently
mapped executable by anyone else.

It's also an error to sys$execve() a file that's currently mapped
writable by anyone else.

This fixes a race condition vulnerability where one program could make
modifications to an executable while another process was in the kernel,
in the middle of exec'ing the same executable.

Test: Kernel/elf-execve-mmap-race.cpp
2020-01-18 23:40:12 +01:00
Andreas Kling
4e6fe3c14b Kernel: Symbolicate kernel EIP on process crash
Process::crash() was assuming that EIP was always inside the ELF binary
of the program, but it could also be in the kernel.
2020-01-18 14:38:39 +01:00
Andreas Kling
9c9fe62a4b Kernel: Validate the requested range in allocate_region_with_vmobject() 2020-01-18 14:37:22 +01:00
Andreas Kling
aa63de53bd Kernel: Use get_syscall_path_argument() in sys$execve()
Paths passed to sys$execve() should certainly be subject to all the
usual path validation checks.
2020-01-18 11:43:28 +01:00
Andreas Kling
b65572b3fe Kernel: Disallow mmap names longer than PATH_MAX 2020-01-18 11:34:53 +01:00
Andreas Kling
94ca55cefd Meta: Add license header to source files
As suggested by Joshua, this commit adds the 2-clause BSD license as a
comment block to the top of every source file.

For the first pass, I've just added myself for simplicity. I encourage
everyone to add themselves as copyright holders of any file they've
added or modified in some significant way. If I've added myself in
error somewhere, feel free to replace it with the appropriate copyright
holder instead.

Going forward, all new source files should include a license header.
2020-01-18 09:45:54 +01:00
Andreas Kling
19c31d1617 Kernel: Always dump kernel regions when dumping process regions 2020-01-18 08:57:18 +01:00
Sergey Bugaev
064cd2278c Kernel: Remove the use of FileSystemPath in sys$realpath()
Now that VFS::resolve_path() canonicalizes paths automatically, we don't need to
do that here anymore.
2020-01-17 21:49:58 +01:00
Sergey Bugaev
8642a7046c Kernel: Let inodes provide pre-open file descriptions
Some magical inodes, such as /proc/pid/fd/fileno, are going to want to open() to
a custom FileDescription, so add a hook for that.
2020-01-17 21:49:58 +01:00
Sergey Bugaev
e0013a6b4c Kernel+LibC: Unify sys$open() and sys$openat()
The syscall is now called sys$open(), but it behaves like the old sys$openat().
In userspace, open_with_path_length() is made a wrapper over openat_with_path_length().
2020-01-17 21:49:58 +01:00
Andreas Kling
4d4d5e1c07 Kernel: Drop futex queues/state on exec()
This state is not meaningful to the new process image so just drop it.
2020-01-17 16:08:00 +01:00
Andreas Kling
26a31c7efb Kernel: Add "accept" pledge promise for accepting incoming connections
This patch adds a new "accept" promise that allows you to call accept()
on an already listening socket. This lets programs set up a socket for
for listening and then dropping "inet" and/or "unix" so that only
incoming (and existing) connections are allowed from that point on.
No new outgoing connections or listening server sockets can be created.

In addition to accept() it also allows getsockopt() with SOL_SOCKET
and SO_PEERCRED, which is used to find the PID/UID/GID of the socket
peer. This is used by our IPC library when creating shared buffers that
should only be accessible to a specific peer process.

This allows us to drop "unix" in WindowServer and LookupServer. :^)

It also makes the debugging/introspection RPC sockets in CEventLoop
based programs work again.
2020-01-17 11:19:06 +01:00
Andreas Kling
c6e552ac8f Kernel+LibELF: Don't blindly trust ELF symbol offsets in symbolication
It was possible to craft a custom ELF executable that when symbolicated
would cause the kernel to read from user-controlled addresses anywhere
in memory. You could then fetch this memory via /proc/PID/stack

We fix this by making ELFImage hand out StringView rather than raw
const char* for symbol names. In case a symbol offset is outside the
ELF image, you get a null StringView. :^)

Test: Kernel/elf-symbolication-kernel-read-exploit.cpp
2020-01-16 22:11:31 +01:00
Andreas Kling
d79de38bd2 Kernel: Don't allow userspace to sys$open() literal symlinks
The O_NOFOLLOW_NOERROR is an internal kernel mechanism used for the
implementation of sys$readlink() and sys$lstat().

There is no reason to allow userspace to open symlinks directly.
2020-01-15 21:19:26 +01:00
Andreas Kling
e23536d682 Kernel: Use Vector::unstable_remove() in a couple of places 2020-01-15 19:26:41 +01:00
Liav A
d2b41010c5 Kernel: Change Region allocation helpers
We now can create a cacheable Region, so when map() is called, if a
Region is cacheable then all the virtual memory space being allocated
to it will be marked as not cache disabled.

In addition to that, OS components can create a Region that will be
mapped to a specific physical address by using the appropriate helper
method.
2020-01-14 15:38:58 +01:00
Andreas Kling
65cb406327 Kernel: Allow unlocking a held Lock with interrupts disabled
This is needed to eliminate a race in Thread::wait_on() where we'd
otherwise have to wait until after unlocking the process lock before
we can disable interrupts.
2020-01-13 18:56:46 +01:00
Andrew Kaster
7a7e7c82b5 Kernel: Tighten up exec/do_exec and allow for PT_INTERP iterpreters
This patch changes how exec() figures out which program image to
actually load. Previously, we opened the path to our main executable in
find_shebang_interpreter_for_executable, read the first page (or less,
if the file was smaller) and then decided whether to recurse with the
interpreter instead. We then then re-opened the main executable in
do_exec.

However, since we now want to parse the ELF header and Program Headers
of an elf image before even doing any memory region work, we can change
the way this whole process works. We open the file and read (up to) the
first page in exec() itself, then pass just the page and the amount read
to find_shebang_interpreter_for_executable. Since we now have that page
and the FileDescription for the main executable handy, we can do a few
things. First, validate the ELF header and ELF program headers for any
shenanigans. ELF32 Little Endian i386 only, please. Second, we can grab
the PT_INTERP interpreter from any ET_DYN files, and open that guy right
away if it exists. Finally, we can pass the main executable's and
optionally the PT_INTERP interpreter's file descriptions down to do_exec
and not have to feel guilty about opening the file twice.

In do_exec, we now have a choice. Are we going to load the main
executable, or the interpreter? We could load both, but it'll be way
easier for the inital pass on the RTLD if we only load the interpreter.
Then it can load the main executable itself like any old shared object,
just, the one with main in it :). Later on we can load both of them
into memory and the RTLD can relocate itself before trying to do
anything. The way it's written now the RTLD will get dibs on its
requested virtual addresses being the actual virtual addresses.
2020-01-13 13:03:30 +01:00
Brian Gianforcaro
4cee441279 Kernel: Combine validate and copy of user mode pointers (#1069)
Right now there is a significant amount of boiler plate code required
to validate user mode parameters in syscalls. In an attempt to reduce
this a bit, introduce validate_read_and_copy_typed which combines the
usermode address check and does the copy internally if the validation
passes. This cleans up a little bit of code from a significant amount
of syscalls.
2020-01-13 11:19:17 +01:00
Brian Gianforcaro
9cac205d67 Kernel: Fix SMAP in setkeymap syscall
It looks like setkeymap was missed when
the SMAP functionality was introduced.

Disable SMAP only in the scope where we
actually read the usermode addresses.
2020-01-13 11:17:10 +01:00
Brian Gianforcaro
02704a73e9 Kernel: Use the templated copy_from_user where possible
Now that the templated version of copy_from_user exists
their is normally no reason to use the version which
takes the number of bytes to copy. Move to the templated
version where possible.
2020-01-13 11:07:39 +01:00
Andreas Kling
20b2bfcafd Kernel: Fix SMAP violation in sys$getrandom() 2020-01-12 20:10:53 +01:00
Sergey Bugaev
33c0dc08a7 Kernel: Don't forget to copy & destroy root_directory_for_procfs
Also, rename it to root_directory_relative_to_global_root.
2020-01-12 20:02:11 +01:00
Sergey Bugaev
dd54d13d8d Kernel+LibC: Allow passing mount flags to chroot()
Since a chroot is in many ways similar to a separate root mount, we can also
apply mount flags to it as if it was an actual mount. These flags will apply
whenever the chrooted process accesses its root directory, but not when other
processes access this same directory for the outside. Since it's common to
chdir("/") immediately after chrooting (so that files accessed through the
current directory inherit the same mount flags), this effectively allows one to
apply additional limitations to a process confined inside a chroot.

To this effect, sys$chroot() gains a mount_flags argument (exposed as
chroot_with_mount_flags() in userspace) which can be set to all the same values
as the flags argument for sys$mount(), and additionally to -1 to keep the flags
set for that file system. Note that passing 0 as mount_flags will unset any
flags that may have been set for the file system, not keep them.
2020-01-12 20:02:11 +01:00
Sergey Bugaev
93ff911473 Kernel: Properly propagate bind mount flags
Previously, when performing a bind mount flags other than MS_BIND were ignored.
Now, they're properly propagated the same way a for any other mount.
2020-01-12 20:02:11 +01:00
Sergey Bugaev
b620ed25ab Kernel: Simplify Ext2FS mount code path
Instead of looking up device metadata and then looking up a device by that
metadata explicitly, just use VFS::open(). This also means that attempting to
mount a device residing on a MS_NODEV file system will properly fail.
2020-01-12 20:02:11 +01:00
Sergey Bugaev
35b0f10f20 Kernel: Don't dump backtrace on successful exits
This was getting really annoying.
2020-01-12 20:02:11 +01:00
Andreas Kling
d1839ae0c9 Kernel: Clearing promises with pledge("") should fail
Thanks Sergey for catching this brain-fart. :^)
2020-01-12 12:16:17 +01:00
Andreas Kling
114a770c6f Kernel: Reduce pledge requirement for recvfrom()+sendto() to "stdio"
Since these only operate on already-open sockets, we should treat them
the same as we do read() and write() by putting them into "stdio".
2020-01-12 11:52:37 +01:00
Andreas Kling
955034e86e Kernel: Remove manual STAC/CLAC in create_thread() 2020-01-12 11:51:31 +01:00
Andreas Kling
a6cef2408c Kernel: Add sigreturn() to "stdio" with all the other signal syscalls 2020-01-12 10:32:56 +01:00
Andreas Kling
7b53699e6f Kernel: Require the "thread" pledge promise for futex() 2020-01-12 10:31:21 +01:00
Andreas Kling
c32d65ae9f Kernel: Put some more syscalls in the "stdio" bucket
yield() and get_kernel_info_page() seem like decent fits for "stdio".
2020-01-12 10:31:21 +01:00