Commit graph

1027 commits

Author SHA1 Message Date
int16
256744ebdf Kernel: Make mmap validation functions return ErrorOr<void> 2022-03-22 12:20:19 +01:00
int16
4b96d9c813 Kernel: Move mmap validation functions to Process 2022-03-22 12:20:19 +01:00
int16
479929b06c Kernel: Check wxallowed mount flag when validating mmap call 2022-03-22 12:20:19 +01:00
Brian Gianforcaro
03342876b8 Revert "Kernel: Use an ArmedScopeGuard to revert changes after failed mmap"
This reverts commit 790d620b39.
2022-03-12 21:45:57 -08:00
Andreas Kling
7b3642d08c Kernel: Mark sys$lseek() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-09 16:43:00 +01:00
Andreas Kling
09e644f0ba Kernel: Mark sys$emuctl() as not needing the big lock
This syscall doesn't do anything at all, and definitely doesn't need the
big lock. :^)
2022-03-09 16:43:00 +01:00
Andreas Kling
b4fefedd1d Kernel: Mark sys$chmod() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-09 16:43:00 +01:00
Andreas Kling
aa381c4a67 Kernel: Mark sys$fchmod() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-09 16:43:00 +01:00
Andreas Kling
d074aae422 Kernel: Mark sys$dup2() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-09 16:43:00 +01:00
Andreas Kling
8aad9e7448 Kernel: Mark sys$ftruncate() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-09 16:43:00 +01:00
Andreas Kling
69a6a4d927 Kernel: Mark sys$fstatvfs() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-09 16:43:00 +01:00
Hendiadyoin1
790d620b39 Kernel: Use an ArmedScopeGuard to revert changes after failed mmap 2022-03-08 15:58:51 -08:00
Andreas Kling
6354a9a030 Kernel: Mark sys$fsync() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-08 00:19:49 +01:00
Andreas Kling
ef45ff4703 Kernel: Mark sys$readlink() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-08 00:19:49 +01:00
Andreas Kling
2688ee28ff Kernel: Mark sys$stat() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-08 00:19:49 +01:00
Andreas Kling
be7ec52ed0 Kernel: Mark sys$fstat() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-08 00:19:49 +01:00
Andreas Kling
23822febd2 Kernel: Mark sys$fchdir() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-08 00:19:49 +01:00
Andreas Kling
156ab0c47d Kernel: Mark sys$chdir() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-08 00:19:49 +01:00
Andreas Kling
7597bef771 Kernel: Mark sys$getcwd() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-08 00:19:49 +01:00
Andreas Kling
f630d0f095 Kernel: Mark sys$realpath() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-08 00:19:49 +01:00
Andreas Kling
580d89f093 Kernel: Put Process unveil state in a SpinlockProtected container
This makes path resolution safe to perform without holding the big lock.
2022-03-08 00:19:49 +01:00
Andreas Kling
24f02bd421 Kernel: Put Process's current directory in a SpinlockProtected
Also let's call it "current_directory" instead of "cwd" everywhere.
2022-03-08 00:19:49 +01:00
Andreas Kling
7543c34d07 Kernel: Mark sys$anon_create() as not needing the big lock
This syscall is already safe for no-big-lock since it doesn't access any
unprotected data.
2022-03-08 00:19:49 +01:00
Andreas Kling
baa6ff5649 Kernel: Wrap HIDManagement keymap data in SpinlockProtected
This serializes access to the current keymap data everywhere in the
kernel, allowing to mark sys$setkeymap() as not needing the big lock.
2022-03-07 16:35:23 +01:00
Ali Mohammad Pur
6608812e4b Kernel: Over-align the FPUState on the stack in sigreturn
The stack is misaligned at this point for some reason, this is a hack
that makes the resulting object "correctly" aligned, thus avoiding a
KUBSAN error.
2022-03-04 20:07:05 +01:00
Ali Mohammad Pur
88d7bf7362 Kernel: Save and restore FPU state on signal dispatch on i386/x86_64 2022-03-04 20:07:05 +01:00
Ali Mohammad Pur
4bd01b7fe9 Kernel: Add support for SA_SIGINFO
We currently don't really populate most of the fields, but that can
wait :^)
2022-03-04 20:07:05 +01:00
Ali Mohammad Pur
585054d68b Kernel: Comment the living daylights out of signal trampoline/sigreturn
Mere mortals like myself cannot understand more than two lines of
assembly without a million comments explaining what's happening, so do
that and make sure no one has to go on a wild stack state chase when
hacking on these.
2022-03-04 20:07:05 +01:00
Ali Mohammad Pur
848eaf2220 Kernel: Reject sigaction() with SA_SIGINFO
We can't handle this, so let sigaction() fail instead of PANIC()'ing
later when we try to dispatch a signal with SA_SIGINFO set.
2022-03-04 20:07:05 +01:00
Ali Mohammad Pur
cf63447044 Kernel: Move signal handlers from being thread state to process state
POSIX requires that sigaction() and friends set a _process-wide_ signal
handler, so move signal handlers and flags inside Process.
This also fixes a "pid/tid confusion" FIXME, as we can now send the
signal to the process and let that decide which thread should get the
signal (which is the thread with tid==pid, but that's now the Process's
problem).
Note that each thread still retains its signal mask, as that is local to
each thread.
2022-03-04 20:07:05 +01:00
Lucas CHOLLET
839d3d9f74 Kernel: Add getrusage() syscall
Only the two timeval fields are maintained, as required by the POSIX
standard.
2022-02-28 20:09:37 +01:00
Idan Horowitz
011bd06053 Kernel: Set CS selector when initializing thread context on x86_64
These are not technically required, since the Thread constructor
already sets these, but they are set on i686, so let's try and keep
consistent behaviour between the different archs.
2022-02-27 00:38:00 +02:00
Brian Gianforcaro
d05fa14e52 Kernel: Use TRY() when validating clock_id in TimeManagement
Gets rid of a bit of code duplication, and makes the API more consistent
with the style we are moving towards.
2022-02-21 15:47:51 -08:00
Brian Gianforcaro
70f3fa2dd2 Kernel: Set new process name in do_exec before waiting for the tracer
While investigating why gdb is failing when it calls `PT_CONTINUE`
against Serenity I noticed that the names of the programs in the
System Monitor didn't make sense. They were seemingly stale.

After inspecting the kernel code, it became apparent that the sequence
occurs as follows:

    1. Debugger calls `fork()`
    2. The forked child calls `PT_TRACE_ME`
    3. The `PT_TRACE_ME` instructs the forked process to block in the
       kernel waiting for a signal from the tracer on the next call
       to `execve(..)`.
    4. Debugger waits for forked child to spawn and stop, and then it
       calls `PT_ATTACH` followed by `PT_CONTINUE` on the child.
    5. Currently the `PT_CONTINUE` fails because of some other yet to
       be found bug.
    6. The process name is set immediately AFTER we are woken up by
       the `PT_CONTINUE` which never happens in the case I'm debugging.

This chain of events leaves the process suspended, with the name  of
the original (forked) process instead of the name we inherit from
the `execve(..)` call.

To avoid such confusion in the future, we set the new name before we
block waiting for the tracer.
2022-02-19 18:04:32 -08:00
Jakub Berkop
895a050e04 Kernel: Fixed argument passing for profiling_enable syscall
Arguments larger than 32bit need to be passed as a pointer on a 32bit
architectures. sys$profiling_enable has u64 event_mask argument,
which means that it needs to be passed as an pointer. Previously upper
32bits were filled by garbage.
2022-02-19 11:37:02 +01:00
Ali Mohammad Pur
a1cb2c371a AK+Kernel: OOM-harden most parts of Trie
The only part of Unveil that can't handle OOM gracefully is the
String::formatted() use in the node metadata.
2022-02-15 18:03:02 +02:00
Jakub Berkop
4916c892b2 Kernel/Profiling: Add profiling to read syscall
Syscalls to read can now be profiled, allowing us to monitor
filesystem usage by different applications.
2022-02-14 11:38:13 +01:00
Idan Horowitz
bd821982e0 Kernel: Use StringView::for_each_split_view() in sys$pledge
This let's us avoid the fallible Vector allocation that split_view()
entails.
2022-02-14 11:35:20 +01:00
Idan Horowitz
e384f62ee2 Kernel: Make master TLS region WeakPtr construction OOM-fallible 2022-02-14 11:35:20 +01:00
Idan Horowitz
c8ab7bde3b Kernel: Use try_make_weak_ptr() instead of make_weak_ptr() 2022-02-13 23:02:57 +01:00
Idan Horowitz
d6ea6c39a7 AK+Kernel: Rename try_make_weak_ptr to make_weak_ptr_if_nonnull
This matches the likes of the adopt_{own, ref}_if_nonnull family and
also frees up the name to allow us to eventually add OOM-fallible
versions of these functions.
2022-02-13 23:02:57 +01:00
Andrew Kaster
b4a7d148b1 Kernel: Expose maximum argument limit in sysconf
Move the definitions for maximum argument and environment size to
Process.h from execve.cpp. This allows sysconf(_SC_ARG_MAX) to return
the actual argument maximum of 128 KiB to userspace.
2022-02-13 22:06:54 +02:00
Idan Horowitz
57bce8ab97 Kernel: Set up Regions before adding them to a Process's AddressSpace
This reduces the amount of time in which not fully-initialized Regions
are present inside an AddressSpace's region tree.
2022-02-11 17:49:46 +02:00
Lenny Maiorani
c6acf64558 Kernel: Change static constexpr variables to constexpr where possible
Function-local `static constexpr` variables can be `constexpr`. This
can reduce memory consumption, binary size, and offer additional
compiler optimizations.

These changes result in a stripped x86_64 kernel binary size reduction
of 592 bytes.
2022-02-09 21:04:51 +00:00
Andreas Kling
cda56f8049 Kernel: Robustify and rename Inode bound socket API
Rename the bound socket accessor from socket() to bound_socket().
Also return RefPtr<LocalSocket> instead of a raw pointer, to make it
harder for callers to mess up.
2022-02-07 13:02:34 +01:00
sin-ack
24fd8fb16f Kernel: Ensure socket is suitable for writing in sys$sendmsg
Previously we would return a bytes written value of 0 if the writing end
of the socket was full. Now we either exit with EAGAIN if the socket
description is non-blocking, or block until the description can be
written to.

This is mostly a copy of the conditions in sys$write but with the "total
nwritten" parts removed as sys$sendmsg does not have that.
2022-02-07 12:21:45 +01:00
Andreas Kling
04539d4930 Kernel: Propagate sys$profiling_enable() buffer allocation failure
Caught a kernel panic when enabling profiling of all threads when there
was very little memory available.
2022-02-06 01:25:32 +01:00
Andreas Kling
3845c90e08 Kernel: Remove unnecessary includes from Thread.h
...and deal with the fallout by adding missing includes everywhere.
2022-01-30 16:21:59 +01:00
Idan Horowitz
e28af4a2fc Kernel: Stop using HashMap in Mutex
This commit removes the usage of HashMap in Mutex, thereby making Mutex
be allocation-free.

In order to achieve this several simplifications were made to Mutex,
removing unused code-paths and extra VERIFYs:
 * We no longer support 'upgrading' a shared lock holder to an
   exclusive holder when it is the only shared holder and it did not
   unlock the lock before relocking it as exclusive. NOTE: Unlike the
   rest of these changes, this scenario is not VERIFY-able in an
   allocation-free way, as a result the new LOCK_SHARED_UPGRADE_DEBUG
   debug flag was added, this flag lets Mutex allocate in order to
   detect such cases when debugging a deadlock.
 * We no longer support checking if a Mutex is locked by the current
   thread when the Mutex was not locked exclusively, the shared version
   of this check was not used anywhere.
 * We no longer support force unlocking/relocking a Mutex if the Mutex
   was not locked exclusively, the shared version of these functions
   was not used anywhere.
2022-01-29 16:45:39 +01:00
Andreas Kling
d748a3c173 Kernel: Only lock process file descriptor table once in sys$poll()
Grab the OpenFileDescriptions mutex once and hold on to it while
populating the SelectBlocker::FDVector.
2022-01-29 02:17:12 +01:00
Andreas Kling
b56646e293 Kernel: Switch process file descriptor table from spinlock to mutex
There's no reason for this to use a spinlock. Instead, let's allow
threads to block if someone else is using the descriptor table.
2022-01-29 02:17:09 +01:00
Andreas Kling
8ebec2938c Kernel: Convert process file descriptor table to a SpinlockProtected
Instead of manually locking in the various member functions of
Process::OpenFileDescriptions, simply wrap it in a SpinlockProtected.
2022-01-29 02:17:06 +01:00
Andreas Kling
b27b22a68c Kernel: Allocate entire SelectBlocker::FDVector at once
Use try_ensure_capacity() + unchecked_append() instead of repeatedly
doing try_append().
2022-01-28 23:41:18 +01:00
Andreas Kling
31c1094577 Kernel: Don't mess with thread state in Process::do_exec()
We were marking the execing thread as Runnable near the end of
Process::do_exec().

This was necessary for exec in processes that had never been scheduled
yet, which is a specific edge case that only applies to the very first
userspace process (normally SystemServer). At this point, such threads
are in the Invalid state.

In the common case (normal userspace-initiated exec), making the current
thread Runnable meant that we switched away from its current state:
Running. As the thread is indeed running, that's a bogus change!
This created a short time window in which the thread state was bogus,
and any attempt to block the thread would panic the kernel (due to a
bogus thread state in Thread::block() leading to VERIFY_NOT_REACHED().)

Fix this by not touching the thread state in Process::do_exec()
and instead make the first userspace thread Runnable directly after
calling Process::exec() on it in try_create_userspace_process().

It's unfortunate that exec() can be called both on the current thread,
and on a new thread that has never been scheduled. It would be good to
not have the latter edge case, but fixing that will require larger
architectural changes outside the scope of this fix.
2022-01-27 11:18:25 +01:00
Brian Gianforcaro
e954b4bdd4 Kernel: Return error from sys$execve() when called with zero arguments
There are many assumptions in the stack that argc is not zero, and
argv[0] points to a valid string. The recent pwnkit exploit on Linux
was able to exploit this assumption in the `pkexec` utility
(a SUID-root binary) to escalate from any user to root.

By convention `execve(..)` should always be called with at least one
valid argument, so lets enforce that semantic to harden the system
against vulnerabilities like pwnkit.

Reference: https://www.qualys.com/2022/01/25/cve-2021-4034/pwnkit.txt
2022-01-26 13:05:59 +01:00
Idan Horowitz
d1433c35b0 Kernel: Handle OOM failures in find_shebang_interpreter_for_executable 2022-01-26 02:37:03 +02:00
Idan Horowitz
8cf0e4a5e4 Kernel: Eliminate allocations from generate_auxiliary_vector 2022-01-26 02:37:03 +02:00
Idan Horowitz
a6f0ab358a Kernel: Make AddressSpace::find_regions_intersecting OOM-fallible 2022-01-26 02:37:03 +02:00
Idan Horowitz
e23d320bb9 Kernel: Fail gracefully due to OOM on HashTable set in sys$setgroups 2022-01-26 02:37:03 +02:00
Idan Horowitz
8dfd124718 Kernel: Replace String with NonnullOwnPtr<KString> in sys$getkeymap 2022-01-25 08:06:02 +01:00
Liav A
69f054616d Kernel: Add CommandLine option to disable or enable the PC speaker
By default, we disable the PC speaker as it's quite annoying when using
the text mode console.
2022-01-23 00:40:54 +00:00
Jelle Raaijmakers
df73e8b46b Kernel: Allow program headers to align on multiples of PAGE_SIZE
These checks in `sys$execve` could trip up the system whenever you try
to execute an `.so` file. For example, double-clicking `libwasm.so` in
Terminal crashes the kernel.

This changes the program header alignment checks to reflect the same
checks in LibELF, and passes the requested alignment on to
`::try_allocate_range()`.
2022-01-23 00:11:56 +02:00
Idan Horowitz
0adee378fd Kernel: Stop using LibKeyboard's CharacterMap in HIDManagement
This was easily done, as the Kernel and Userland don't actually share
any of the APIs exposed by it, so instead the Kernel APIs were moved to
the Kernel, and the Userland APIs stayed in LibKeyboard.

This has multiple advantages:
 * The non OOM-fallible String is not longer used for storing the
   character map name in the Kernel
 * The kernel no longer has to link to the userland LibKeyboard code
 * A lot of #ifdef KERNEL cruft can be removed from LibKeyboard
2022-01-21 18:25:44 +01:00
Andreas Kling
0e08763483 Kernel: Wrap much of sys$execve() in a block scope
Since we don't return normally from this function, let's make it a
little extra difficult to accidentally leak something by leaving it on
the stack in this function.
2022-01-13 23:57:33 +01:00
Andreas Kling
0e72b04e7d Kernel: Perform exec-into-new-image directly in sys$execve()
This ensures that everything allocated on the stack in Process::exec()
gets cleaned up. We had a few leaks related to the parsing of shebang
(#!) executables that get fixed by this.
2022-01-13 23:57:33 +01:00
Idan Horowitz
cfb9f889ac LibELF: Accept Span instead of Pointer+Size in validate_program_headers 2022-01-13 22:40:25 +01:00
Idan Horowitz
3e959618c3 LibELF: Use StringBuilders instead of Strings for the interpreter path
This is required for the Kernel's usage of LibELF, since Strings do not
expose allocation failure.
2022-01-13 22:40:25 +01:00
Andreas Kling
8ad46fd8f5 Kernel: Stop leaking executable path in successful sys$execve()
Since we don't return from sys$execve() when it's successful, we have to
take special care to tear down anything we've allocated.

Turns out we were not doing this for the full executable path itself.
2022-01-13 16:15:37 +01:00
Idan Horowitz
40159186c1 Kernel: Remove String use-after-free in generate_auxiliary_vector
Instead we generate the random bytes directly in
make_userspace_context_for_main_thread if requested.
2022-01-13 00:20:08 -08:00
Idan Horowitz
e72bbca9eb Kernel: Fix OOB write in sys$uname
Since this was only out of bounds of the specific field, not of the
whole struct, and because setting the hostname requires root privileges
this was not actually a security vulnerability.
2022-01-13 00:20:08 -08:00
Idan Horowitz
50d6a6a186 Kernel: Convert hostname to KString 2022-01-13 00:20:08 -08:00
Idan Horowitz
bc85b64a38 Kernel: Replace usages of String::formatted with KString in sys$exec 2022-01-12 16:09:09 +02:00
Idan Horowitz
dba0840942 Kernel: Remove outdated FIXME comment in sys$sethostname 2022-01-12 16:09:09 +02:00
Daniel Bertalan
182016d7c0 Kernel+LibC+LibCore+UE: Implement fchmodat(2)
This function is an extended version of `chmod(2)` that lets one control
whether to dereference symlinks, and specify a file descriptor to a
directory that will be used as the base for relative paths.
2022-01-12 14:54:12 +01:00
Andreas Kling
a62bdb0761 Kernel: Delay Process data unprotection in sys$pledge()
Don't unprotect the protected data area until we've validated the pledge
syscall inputs.
2022-01-02 18:08:02 +01:00
circl
63760603f3 Kernel+LibC+LibCore: Add lchown and fchownat functions
This modifies sys$chown to allow specifying whether or not to follow
symlinks and in which directory.

This was then used to implement lchown and fchownat in LibC and LibCore.
2022-01-01 15:08:49 +01:00
Daniel Bertalan
1d2f78682b Kernel+AK: Eliminate a couple of temporary String allocations 2021-12-30 14:16:03 +01:00
Brian Gianforcaro
54b9a4ec1e Kernel: Handle promise violations in the syscall handler
Previously we would crash the process immediately when a promise
violation was found during a syscall. This is error prone, as we
don't unwind the stack. This means that in certain cases we can
leak resources, like an OwnPtr / RefPtr tracked on the stack. Or
even leak a lock acquired in a ScopeLockLocker.

To remedy this situation we move the promise violation handling to
the syscall handler, right before we return to user space. This
allows the code to follow the normal unwind path, and grantees
there is no longer any cleanup that needs to occur.

The Process::require_promise() and Process::require_no_promises()
functions were modified to return ErrorOr<void> so we enforce that
the errors are always propagated by the caller.
2021-12-29 18:08:15 +01:00
Brian Gianforcaro
0f7fe1eb08 Kernel: Use Process::require_no_promises instead of REQUIRE_NO_PROMISES
This change lays the foundation for making the require_promise return
an error hand handling the process abort outside of the syscall
implementations, to avoid cases where we would leak resources.

It also has the advantage that it makes removes a gs pointer read
to look up the current thread, then process for every syscall. We
can instead go through the Process this pointer in most cases.
2021-12-29 18:08:15 +01:00
Brian Gianforcaro
bad6d50b86 Kernel: Use Process::require_promise() instead of REQUIRE_PROMISE()
This change lays the foundation for making the require_promise return
an error hand handling the process abort outside of the syscall
implementations, to avoid cases where we would leak resources.

It also has the advantage that it makes removes a gs pointer read
to look up the current thread, then process for every syscall. We
can instead go through the Process this pointer in most cases.
2021-12-29 18:08:15 +01:00
Brian Gianforcaro
b5367bbf31 Kernel: Clarify why ftruncate() & pread() are passed off_t const*
I fell into this trap and tried to switch the syscalls to pass by
the `off_t` by register. I think it makes sense to add a clarifying
comment for future readers of the code, so they don't fall into the
same trap. :^)
2021-12-29 05:54:04 -08:00
Brian Gianforcaro
737a11389c Kernel: Fix info leak from sockaddr_un in socket syscalls
In `sys$accept4()` and `get_sock_or_peer_name()` we were not
initializing the padding of the `sockaddr_un` struct, leading to
an kernel information leak if the
caller looked back at it's contents.

Before Fix:

    37.766 Clipboard(11:11): accept4 Bytes:
    2f746d702f706f7274616c2f636c6970626f61726440eac130e7fbc1e8abbfc
    19c10ffc18440eac15485bcc130e7fbc1549feaca6c9deaca549feaca1bb0bc
    03efdf62c0e056eac1b402d7acd010ffc14602000001b0bc030100000050bf0
    5c24602000001e7fbc1b402d7ac6bdc

After Fix:

    0.603 Clipboard(11:11): accept4 Bytes:
    2f746d702f706f7274616c2f636c6970626f617264000000000000000000000
    000000000000000000000000000000000000000000000000000000000000000
    000000000000000000000000000000000000000000000000000000000000000
    0000000000000000000000000000000
2021-12-29 03:41:32 -08:00
Daniel Bertalan
fcdd202741 Kernel: Return the actual number of CPU cores that we have
... instead of returning the maximum number of Processor objects that we
can allocate.

Some ports (e.g. gdb) rely on this information to determine the number
of worker threads to spawn. When gdb spawned 64 threads, the kernel
could not cope with generating backtraces for it, which prevented us
from debugging it properly.

This commit also removes the confusingly named
`Processor::processor_count` function so that this mistake can't happen
again.
2021-12-29 03:17:41 -08:00
Idan Horowitz
d7ec5d042f Kernel: Port Process to ListedRefCounted 2021-12-29 12:04:15 +01:00
Guilherme Goncalves
33b78915d3 Kernel: Propagate overflow errors from Memory::page_round_up
Fixes #11402.
2021-12-28 23:08:50 +01:00
Daniel Bertalan
52beeebe70 Kernel: Remove the KString::try_create(String::formatted(...)) pattern
We can now directly create formatted KStrings with KString::formatted.

:^)
2021-12-28 01:55:22 -08:00
Guilherme Gonçalves
da6aef9fff Kernel: Make msync return EINVAL when regions are too large
As a small cleanup, this also makes `page_round_up` verify its
precondition with `page_round_up_would_wrap` (which callers are expected
to call), rather than having its own logic.

Fixes #11297.
2021-12-23 17:43:12 -08:00
Daniel Bertalan
8e3d1a42e3 Kernel+UE+LibC: Store address as void* in SC_m{re,}map_params
Most other syscalls pass address arguments as `void*` instead of
`uintptr_t`, so let's do that here too. Besides improving consistency,
this commit makes `strace` correctly pretty-print these arguments in
hex.
2021-12-23 23:08:10 +01:00
Daniel Bertalan
77f9272aaf Kernel+UE: Add MAP_FIXED_NOREPLACE mmap() flag
This feature was introduced in version 4.17 of the Linux kernel, and
while it's not specified by POSIX, I think it will be a nice addition to
our system.

MAP_FIXED_NOREPLACE provides a less error-prone alternative to
MAP_FIXED: while regular fixed mappings would cause any intersecting
ranges to be unmapped, MAP_FIXED_NOREPLACE returns EEXIST instead. This
ensures that we don't corrupt our process's address space if something
is already at the requested address.

Note that the more portable way to do this is to use regular
MAP_ANONYMOUS, and check afterwards whether the returned address matches
what we wanted. This, however, has a large performance impact on
programs like Wine which try to reserve large portions of the address
space at once, as the non-matching addresses have to be unmapped
separately.
2021-12-23 23:08:10 +01:00
Andreas Kling
1d08b671ea Kernel: Enter new address space before destroying old in sys$execve()
Previously we were assigning to Process::m_space before actually
entering the new address space (assigning it to CR3.)

If a thread was preempted by the scheduler while destroying the old
address space, we'd then attempt to resume the thread with CR3 pointing
at a partially destroyed address space.

We could then crash immediately in write_cr3(), right after assigning
the new value to CR3. I am hopeful that this may have been the bug
haunting our CI for months. :^)
2021-12-23 01:18:26 +01:00
Daniel Bertalan
ce1bf3724e Kernel: Replace intersecting ranges in mmap when MAP_FIXED is specified
This behavior is mandated by POSIX and is used by software like Wine
after reserving large chunks of the address range.
2021-12-22 00:02:36 -08:00
Martin Bříza
86b249f02f Kernel: Implement sysconf(_SC_SYMLOOP_MAX)
Not much to say here, this is an implementation of this call that
accesses the actual limit constant that's used by the VirtualFileSystem
class.

As a side note, this is required for my eventual Qt port.
2021-12-21 12:54:11 -08:00
Liav A
5a649d0fd5 Kernel: Return EINVAL when specifying -1 for setuid and similar syscalls
For setreuid and setresuid syscalls, -1 means to set the current
uid/euid/gid/egid value, to be more convenient for programming.
However, for other syscalls where we pass only one argument, there's no
justification to specify -1.

This behavior is identical to how Linux handles the value -1, and is
influenced by the fact that the manual pages for the group of one
argument syscalls that handle ID operations is ambiguous about this
topic.
2021-12-20 11:32:16 +01:00
Hendiadyoin1
18013f3c06 Kernel: Remove a redundant check in Process::remap_range_as_stack
We already VERIFY that we have carved something out, so we don't need to
check that again.
2021-12-18 10:31:18 -08:00
Hendiadyoin1
2d28b441bf Kernel: Collapse a redundant boolean conditional return statement in …
validate_mmap_prot
2021-12-18 10:31:18 -08:00
Hendiadyoin1
f38d32535c Kernel: Access OpenFileDescriptions::max_open() statically in Syscalls 2021-12-18 10:31:18 -08:00
Hendiadyoin1
c860e0ab95 Kernel: Add implicit auto qualifiers in Syscalls 2021-12-18 10:31:18 -08:00
Hendiadyoin1
f5b495d92c Kernel: Remove else after return in Process::do_write 2021-12-18 10:31:18 -08:00
Andreas Kling
32aa623eff Kernel: Fix 4-byte uninitialized memory leak in sys$sigaltstack()
It was possible to extract 4 bytes of uninitialized kernel stack memory
on x86_64 by looking in the padding of stack_t.
2021-12-18 11:30:10 +01:00
Andreas Kling
0ae8702692 Kernel: Make File::stat() & friends return Error<struct stat>
Instead of making the caller provide a stat buffer, let's just return
one as a value.
2021-12-18 11:30:10 +01:00
Andreas Kling
abf2204402 Kernel: Use copy_typed_from_user() in more places :^) 2021-12-18 11:30:10 +01:00
Andreas Kling
39d9337db5 Kernel: Make sys${ftruncate,pread} take off_t as const pointer
These syscalls don't write back to the off_t value (unlike sys$lseek)
so let's take Userspace<off_t const*> instead of Userspace<off_t*>.
2021-12-18 11:30:10 +01:00
Jean-Baptiste Boric
23257cac52 Kernel: Remove sys$select() syscall
Now that the userland has a compatiblity wrapper for select(), the
kernel doesn't need to implement this syscall natively. The poll()
interface been around since 1987, any code still using select()
should be slapped silly.

Note: the SerenityOS source tree mostly uses select() and not poll()
despite SerenityOS having support for poll() since early 2019...
2021-12-12 21:48:50 +01:00
Jean-Baptiste Boric
2177c2a30b Kernel: Split off sys$poll() into Syscalls/poll.cpp 2021-12-12 21:48:50 +01:00
Idan Horowitz
762e047ec9 Kernel+LibC: Implement sigtimedwait()
This includes a new Thread::Blocker called SignalBlocker which blocks
until a signal of a matching type is pending. The current Blocker
implementation in the Kernel is very complicated, but cleaning it up is
a different yak for a different day.
2021-12-12 08:34:19 +02:00
Idan Horowitz
81a76a30a1 Kernel: Preserve pending signals across execve(2)s
As required by posix. Also rename Thread::clear_signals to
Thread::reset_signals_for_exec since it doesn't actually clear any
pending signals, but rather does execve related signal book-keeping.
2021-12-12 08:34:19 +02:00
Idan Horowitz
0ca1231d8f Kernel: Inherit alternative signal stack on fork(2)
A child process created via fork(2) inherits a copy of its parent's
alternate signal stack settings.
2021-12-12 08:34:19 +02:00
Idan Horowitz
92a6c91f4e Kernel: Preserve signal mask across fork(2) and execve(2)
A child created via fork(2) inherits a copy of its parent's signal
mask; the signal mask is preserved across execve(2).
2021-12-12 08:34:19 +02:00
Ben Wiederhake
0e6e1092f0 Kernel: Make ptrace return an error on error
Returning 'result.error().code()' erroneously creates an
ErrorOr<FlatPtr> of the positive errno code, which breaks our
error-returning convention.

This seems to be due to a forgotten minus-sign during the refactoring in
9e51e295cf. This latent bug was never
discovered, because currently the error-handling paths are rarely
exercised.
2021-12-05 22:59:09 +01:00
Ben Wiederhake
0f8483f09c Kernel: Implement new ptrace function PT_PEEKBUF
This enables the tracer to copy large amounts of data in a much saner
way.
2021-12-05 22:59:09 +01:00
Ben Wiederhake
3e223185b3 Kernel+strace: Remove unnecessary indirection for PEEK
Also, remove incomplete, superfluous check.
Incomplete, because only the byte at the provided address was checked;
this misses the last bytes of the "jerk page".
Superfluous, because it is already correctly checked by peek_user_data
(which calls copy_from_user).

The caller/tracer should not typically attempt to read non-userspace
addresses, we don't need to "hot-path" it either.
2021-12-05 22:59:09 +01:00
Idan Horowitz
265764ff2f Kernel: Add support for the POLLWRBAND poll event 2021-12-05 12:53:29 +01:00
Idan Horowitz
f415218afe Kernel+LibC: Implement sigaltstack()
This is required for compiling wine for serenity
2021-12-01 21:44:11 +02:00
Idan Horowitz
d5d0eb45bf Kernel: Clear up some comments in the sys$mprotect implementation 2021-12-01 21:44:11 +02:00
Idan Horowitz
f27bbec7b2 Kernel: Move incorrect early return in sys$mprotect
Since we're iterating over multiple regions that interesect with the
requested range, just one of them having the requested access flags
is not enough to finish the syscall early.
2021-12-01 21:44:11 +02:00
Idan Horowitz
4ca39c7110 Kernel: Move the expand_range_to_page_boundaries helper to MemoryManager
This helper can (and will) be used in more parts of the kernel besides
the mmap-family of syscalls.
2021-12-01 21:44:11 +02:00
Idan Horowitz
fc13d0782f LibC: Make the madvise advice field a value instead of a bitfield
The advices are almost always exclusive of one another, and while POSIX
does not define madvise, most other unix-like and *BSD systems also only
accept a singular value per call.
2021-12-01 21:44:11 +02:00
Hendiadyoin1
c7b90fa7d3 Kernel: Don't rewrite the whole file on sys$msync 2021-12-01 09:47:46 +01:00
Hendiadyoin1
259f78545a Kernel: Allow flushing of partial regions in sys$msync 2021-12-01 09:47:46 +01:00
Hendiadyoin1
49d6ad6633 Kernel: Handle more error cases in sys$msync 2021-12-01 09:47:46 +01:00
Ben Wiederhake
33079c8ab9 Kernel+UE+LibC: Remove unused dbgputch syscall
Everything uses the dbgputstr syscall anyway, so there is no need to
keep supporting it.
2021-11-24 22:56:39 +01:00
Jelle Raaijmakers
46ad5f2a17 Kernel: Fix futex syscall return values
We were returning `int`s from two functions that caused `ErrorOr` to
not recognize the error codes as a special case. For example,
`ETIMEDOUT` was returned as the positive number 66 resulting in all
kinds of defective behavior.

As a result, SDL2's timer subsystem was not working at all, since the
`SDL_MUTEX_TIMEDOUT` value was never returned.
2021-11-24 19:44:57 +01:00
Andreas Kling
dd6e73176d Kernel: Make sys$mmap() interpret 0-alignment as page-sized alignment
This allows userspace to get a sane default behavior without having to
specify the page size.
2021-11-23 11:44:42 +01:00
Andreas Kling
f99af1bef0 Kernel: Make sure OpenFileDescription is kept alive while read() blocks
It's not safe to store OpenFileDescription in a raw pointer when
blocking, since another thread may decide to close the corresponding
file descriptor.
2021-11-21 20:22:48 +01:00
Andreas Kling
f2c3a41a8f Kernel: Make UserOrKernelBuffer::for_user_buffer() return ErrorOr<T>
This simplifies EFAULT propagation with TRY(). :^)
2021-11-21 20:22:48 +01:00
Itamar
38ddf301f6 Kernel+LibC: Fix ptrace for 64-bit
This makes the types used in the PT_PEEK and PT_POKE actions
suitable for 64-bit platforms as well.
2021-11-20 21:22:24 +00:00
Andreas Kling
e08d213830 Kernel: Use DistinctNumeric for filesystem ID's
This patch adds the FileSystemID type, which is a distinct u32.
This prevents accidental conversion from arbitrary integers.
2021-11-18 21:11:30 +01:00
Andreas Kling
32aa37d5dc Kernel+LibC: Add msync() system call
This allows userspace to trigger a full (FIXME) flush of a shared file
mapping to disk. We iterate over all the mapped pages in the VMObject
and write them out to the underlying inode, one by one. This is rather
naive, and there's lots of room for improvement.

Note that shared file mappings are currently not possible since mmap()
returns ENOTSUP for PROT_WRITE+MAP_SHARED. That restriction will be
removed in a subsequent commit. :^)
2021-11-17 19:34:15 +01:00
Andrew Kaster
f1d8978804 AK+Kernel: Remove implicit conversion from Userspace<T*> to FlatPtr
This feels like it was a refactor transition kind of conversion. The
places that were relying on it can easily be changed to explicitly ask
for the ptr() or a new vaddr() method on Userspace<T*>.

FlatPtr can still implicitly convert to Userspace<T> because the
constructor is not explicit, but there's quite a few more places that
are relying on that conversion.
2021-11-16 00:13:22 +01:00
Andrew Kaster
194456efdc Kernel: Remove unnecessary StringBuilder from sys$create_thread()
A series of refactors changed Threads to always have a name, and to
store their name as a KString. Before the refactors a StringBuilder was
used to format the default thread name for a non-main thread, but it is
since unused. Remove it and the AK/String related header includes from
the thread syscall implementation file.
2021-11-16 00:13:22 +01:00
Daniel Bertalan
648a139af3 Kernel+LibC: Pass off_t to pread() via a pointer
`off_t` is a 64-bit signed integer, so passing it in a register on i686
is not the best idea.

This fix gets us one step closer to making the LLVM port work.
2021-11-13 10:04:46 +01:00
Andreas Kling
88b6428c25 AK: Make Vector::try_* functions return ErrorOr<void>
Instead of signalling allocation failure with a bool return value
(false), we now use ErrorOr<void> and return ENOMEM as appropriate.
This allows us to use TRY() and MUST() with Vector. :^)
2021-11-10 21:58:58 +01:00
Ben Wiederhake
ad5061bb7a Kernel: Make (f)statvfs report filesystem ID correctly 2021-11-10 16:13:10 +01:00
Ben Wiederhake
631447da57 Kernel: Fix TOCTOU in fstatvfs
In particular, fstatvfs used to assume that a file that was earlier
opened using some path will forever be at that path. This is wrong, and
in the meantime new mounts and new filesystems could take up the
filename or directories, leading to a completely inaccurate result.
This commit improves the situation:
- All filesystem information is now always accurate.
- The mount flags *might* be erroneously zero, if the custody for the
  open file is not available. I don't know when that might happen, but
  it is definitely not the typical case.
2021-11-10 16:13:10 +01:00
Andreas Kling
79fa9765ca Kernel: Replace KResult and KResultOr<T> with Error and ErrorOr<T>
We now use AK::Error and AK::ErrorOr<T> in both kernel and userspace!
This was a slightly tedious refactoring that took a long time, so it's
not unlikely that some bugs crept in.

Nevertheless, it does pass basic functionality testing, and it's just
real nice to finally see the same pattern in all contexts. :^)
2021-11-08 01:10:53 +01:00
Brian Gianforcaro
9f6eabd73a Kernel: Move TTY subsystem to use KString instead of AK::String
This is minor progress on removing the `AK::String` API from the Kernel
in the interest of improving OOM safety.
2021-11-02 11:34:31 +01:00
Ben Wiederhake
c05c5a7ff4 Kernel: Clarify ambiguous {File,Description}::absolute_path
Found due to smelly code in InodeFile::absolute_path.

In particular, this replaces the following misleading methods:

File::absolute_path
This method *never* returns an actual path, and if called on an
InodeFile (which is impossible), it would VERIFY_NOT_REACHED().

OpenFileDescription::try_serialize_absolute_path
OpenFileDescription::absolute_path
These methods do not guarantee to return an actual path (just like the
other method), and just like Custody::absolute_path they do not
guarantee accuracy. In particular, just renaming the method made a
TOCTOU bug obvious.

The new method signatures use KResultOr, just like
try_serialize_absolute_path() already did.
2021-10-31 12:06:28 +01:00
Ben Wiederhake
88ca12f037 Kernel: Enable early-returns from VFS::for_each_mount 2021-10-31 12:06:28 +01:00
Ben Wiederhake
735da58d44 Kernel: Avoid OpenFileDescription::absolute_path 2021-10-31 12:06:28 +01:00
James Mintram
0fbeac6011 Kernel: Split SmapDisabler so header is platform independent
A new header file has been created in the Arch/ folder while the
implementation has been moved into a CPP living in the X86 folder.
2021-10-15 21:48:45 +01:00
Rodrigo Tobar
e1093c3403 Kernel: Implement pread syscall
The OpenFileDescription class already offers the necessary functionlity,
so implementing this was only a matter of following the structure for
`read` while handling the additional `offset` argument.
2021-10-13 16:10:50 +02:00
Rodrigo Tobar
8936b111a7 Kernel: Factor out common code from read/readv syscalls
Having these bits of code factored out not only prevents duplication
now, but will also allow us to implement pread without repeating
ourselves (too much).
2021-10-13 16:10:50 +02:00
Rodrigo Tobar
bf4e536f00 Kernel: Correctly interpret ioctl's FIONBIO user value
Values in `ioctl` are given through a pointer, but ioctl's FIONBIO
implementation was interpreting this pointer as an integer directly.
This meant that programs using `ioctl` to set a file descriptor in
blocking mode met with incorrect behavior: they passed a non-null
pointer pointing to a value of 0, but the kernel interpreted the pointer
as a non-zero integer, thus making the file non-blocking.

This commit fixes this behavior by reading the value from the userspace
pointer and using that to set the non-blocking flag on the file
descriptor.

This bug was found while trying to run the openssl tool on serenity,
which used `ioctl` to ensure newly-created sockets are in blocking mode.
2021-10-11 10:46:01 -07:00
Nico Weber
1cdb12e920 Kernel: Fix -Wunreachable-code warnings from clang 2021-10-08 23:33:46 +02:00
Brian Gianforcaro
0223faf6f4 Kernel: Access MemoryManager static functions statically
SonarCloud flagged this "Code Smell", where we are accessing these
static methods as if they are instance methods. While it is technically
possible, it is very confusing to read when you realize they are static
functions.
2021-10-02 18:16:15 +02:00
Nico Weber
5a951d6258 Kernel: Fix a few typos 2021-10-01 00:51:49 +01:00
Eric Seifert
8924b1f532 Kernel: Allow PROT_NONE in mmap and mprotect for stack regions
To allow for userspace guard pages (ruby uses this).
Redundant since serenity creates them automatically,
but should be allowed anyway.
2021-09-23 04:14:41 +00:00
Ben Wiederhake
12247fe9b4 Kernel: Use AK::Variant default initialization where appropriate 2021-09-21 04:22:52 +04:30
Eric Seifert
edbc5489a8 Kernel: Add support for O_NONBLOCK in pipe syscall
While working on a port, I saw a pipe creation fail due to missing
nonblock support in pipe syscall.
2021-09-19 12:20:16 +02:00
Itamar
bb1ad759c5 Kernel: Allow calling sys$waitid on traced, non-child processes
Previously, attempting to call sys$waitid on non-child processes
returned ECHILD.

That prevented debugging non-child processes by attaching to them during
runtime (as opposed to forking and debugging the child, which is what
was previously supported).

We now allow calling sys$waitid on a any process that is being traced
by us, even if it's not our child.
2021-09-16 23:47:46 +02:00
Brian Gianforcaro
e8ec1e908d Kernel: Only instantiate main_program_metadata in the scope it's needed
pvs-studio flagged this as a potential perf optimization.
2021-09-16 17:17:13 +02:00
Andreas Kling
b6efd66d56 Kernel: Use move semantics in sys$sendfd()
Avoid an unnecessary NonnullRefPtr<OpenFileDescription> copy.
2021-09-15 21:09:47 +02:00
Liav A
8d0dbdeaac Kernel+Userland: Introduce a new way to reboot and poweroff the machine
This change removes the halt and reboot syscalls, and create a new
mechanism to change the power state of the machine.
Instead of how power state was changed until now, put a SysFS node as
writable only for the superuser, that with a defined value, can result
in either reboot or poweroff.
In the future, a power group can be assigned to this node (which will be
the GroupID responsible for power management).

This opens an opportunity to permit to shutdown/reboot without superuser
permissions, so in the future, a userspace daemon can take control of
this node to perform power management operations without superuser
permissions, if we enforce different UserID/GroupID on that node.
2021-09-12 11:52:16 +02:00
Liav A
9132596b8e Kernel: Move ACPI and BIOS code into the new Firmware directory
This will somwhat help unify them also under the same SysFS directory in
the commit.
Also, it feels much more like this change reflects the reality that both
ACPI and the BIOS are part of the firmware on x86 computers.
2021-09-12 11:52:16 +02:00
TheFightingCatfish
a81b21c1a7 Kernel+LibC: Implement fsync 2021-09-12 11:24:02 +02:00
Liav A
04ba31b8c5 Kernel+Userland: Remove loadable kernel moduless
These interfaces are broken for about 9 months, maybe longer than that.
At this point, this is just a dead code nobody tests or tries to use, so
let's remove it instead of keeping a stale code just for the sake of
keeping it and hoping someone will fix it.

To better justify this, I read that OpenBSD removed loadable kernel
modules in 5.7 release (2014), mainly for the same reason we do -
nobody used it so they had no good reason to maintain it.
Still, OpenBSD had LKMs being effectively working, which is not the
current state in our project for a long time.
An arguably better approach to minimize the Kernel image size is to
allow dropping drivers and features while compiling a new image.
2021-09-11 19:05:00 +02:00
Linus Groh
f646d49ac1 Kernel: Add _SC_HOST_NAME_MAX 2021-09-11 00:28:39 +02:00
Andreas Kling
dd82f68326 Kernel: Use KString all the way in sys$execve()
This patch converts all the usage of AK::String around sys$execve() to
using KString instead, allowing us to catch and propagate OOM errors.

It also required changing the kernel CommandLine helper class to return
a vector of KString for the userspace init program arguments.
2021-09-09 21:25:10 +02:00
Andreas Kling
524ef5e475 Kernel: Add KBuffer::bytes() and use it
(Instead of hand-wrapping { data(), size() } in a bunch of places.)
2021-09-08 20:16:00 +02:00
Liav A
3d5ddbab74 Kernel: Rename DevFS => DevTmpFS
The current implementation of DevFS resembles the linux devtmpfs, and
not the traditional DevFS, so let's rename it to better represent the
direction of the development in regard to this filesystem.

The abbreviation for DevTmpFS is still "dev", because it doesn't add
value as a commandline option to make it longer.

In quick summary - DevFS in unix OSes is simply a static filesystem, so
device nodes are generated and removed by the kernel code. DevTmpFS
is a "modern reinvention" of the DevFS, so it is much more like a TmpFS
in the sense that not only it's stored entirely in RAM, but the userland
is responsible to add and remove devices nodes as it sees fit, and no
kernel code is directly being involved to keep the filesystem in sync.
2021-09-08 00:42:20 +02:00
Andreas Kling
a01b19c878 Kernel: Remove KBuffer::try_copy() in favor of try_create_with_bytes()
These were already equivalent, so let's only have one of them.
2021-09-07 16:22:29 +02:00
Andreas Kling
b300f9aa2f Kernel: Convert KBuffer::copy() => KBuffer::try_copy()
This was a weird KBuffer API that assumed failure was impossible.
This patch converts it to a modern KResultOr<NonnullOwnPtr<KBuffer>> API
and updates the two clients to the new style.
2021-09-07 15:36:39 +02:00
Andreas Kling
250b52d6e5 Kernel: Make KBuffer::try_create_with_bytes() return KResultOr 2021-09-07 15:22:24 +02:00
Andreas Kling
ed5d04b0ea Kernel: Use KResultOr and TRY() for FIFO 2021-09-07 13:58:16 +02:00
Andreas Kling
213b8868af Kernel: Rename file_description(fd) => open_file_description(fd)
To go with the class rename.
2021-09-07 13:53:14 +02:00
Andreas Kling
4a9c18afb9 Kernel: Rename FileDescription => OpenFileDescription
Dr. POSIX really calls these "open file description", not just
"file description", so let's call them exactly that. :^)
2021-09-07 13:53:14 +02:00
Andreas Kling
dbd639a2d8 Kernel: Convert much of sys$execve() to using KString
Make use of the new FileDescription::try_serialize_absolute_path() to
avoid String in favor of KString throughout much of sys$execve() and
its helpers.
2021-09-07 13:53:14 +02:00
Andreas Kling
226383f45b LibELF: Use StringView to carry temporary strings in auxiliary vector
Let's not force clients to provide a String.
2021-09-07 13:53:14 +02:00
Andreas Kling
a27c6f5226 Kernel: Avoid unnecessary String allocation in sys$statvfs() 2021-09-07 13:53:14 +02:00
Andreas Kling
55b0b06897 Kernel: Store process names as KString 2021-09-07 13:53:14 +02:00
Andreas Kling
fe2e25edad Kernel: Add a comment explaining an alternate path in Process::exec()
I had to look at this for a moment before I realized that sys$execve()
and the spawning of /bin/SystemServer at boot are taking two different
paths out of exec().

Add a comment to help the next person looking at it. :^)
2021-09-07 01:34:26 +02:00
Andreas Kling
5d06ab6531 Kernel: Fix file description leak in sys$execve()
Before this patch, we were leaking a ref on the open file description
used for the interpreter (the dynamic loader) in sys$execve().

This surfaced when adapting the syscall to use TRY(), since we were now
correctly transferring ownership of the interpreter to Process::exec()
and no longer holding on to a local copy of it (in `elf_result`).

Fixing the leak uncovered another problem. The interpreter description
would now get destroyed when returning from do_exec(), which led to a
kernel panic when attempting to acquire a mutex.

This happens because we're in a particularly delicate state when
returning from do_exec(). Everything is primed for the upcoming context
switch into the new executable image, and trying to block the thread
at this point will panic the kernel.

We fix this by destroying the interpreter description earlier in
do_exec(), at the point where we no longer need it.
2021-09-07 01:18:02 +02:00
Andreas Kling
e226400dd8 Kernel: Don't seek the program executable description in sys$execve()
The dynamic loader doesn't care if the kernel has moved the file
cursor around before it gains control.
2021-09-07 01:18:02 +02:00
Andreas Kling
f4624e4ee1 Kernel: Hoist allocation of main program FD in sys$execve()
When executing a dynamically linked program, we need to pass the main
program executable via a file descriptor to the dynamic loader.

Before this patch, we were allocating an FD for this purpose long after
it was safe to do anything fallible. If we were unable to allocate an
FD we would simply panic the kernel(!)

We now hoist the allocation so it can fail before we've committed to
a new executable.
2021-09-07 01:18:02 +02:00
Andreas Kling
b141bfe53b Kernel: Reorganize ELF loading so it can use TRY()
Due to the use of ELF::Image::for_each_program_header(), we were
previously unable to use TRY() in the ELF loading code (since the return
statement inside TRY() would only return from the iteration callback.)
2021-09-07 01:18:02 +02:00
Andreas Kling
e6929835d2 Kernel: Make copy_time_from_user() helpers use KResultOr<Time>
...and use TRY() for smooth error propagation everywhere.
2021-09-07 01:18:02 +02:00
Andreas Kling
274d535d0e Kernel: Use TRY() in sys$module_load() and sys$module_unload() 2021-09-06 20:23:08 +02:00
Andreas Kling
56a2594de7 Kernel: Make KString factories return KResultOr + use TRY() everywhere
There are a number of places that don't have an error propagation path
right now, so I've added FIXME's about that.
2021-09-06 19:25:36 +02:00
Andreas Kling
f16b9a691f Kernel: Rename ProcessPagingScope => ScopedAddressSpaceSwitcher 2021-09-06 18:56:51 +02:00
Andreas Kling
cd8d52e6ae Kernel: Improve API names for switching address spaces
- enter_space => enter_address_space
- enter_process_paging_scope => enter_process_address_space
2021-09-06 18:56:51 +02:00
Andreas Kling
298cd57fe7 Kernel: Allocate signal trampoline before committing to a sys$execve()
Once we commit to a new executable image in sys$execve(), we can no
longer return with an error to whoever called us from userspace.
We must make sure to surface any potential errors before that point.

This patch moves signal trampoline allocation before the commit.
A number of other things remain to be moved.
2021-09-06 18:56:51 +02:00
Andreas Kling
6863d015ec Kernel: Use TRY() more in sys$execve()
I just keep finding more and more places to make use of this. :^)
2021-09-06 18:56:51 +02:00
Andreas Kling
009ea5013d Kernel: Use TRY() in find_elf_interpreter_for_executable() 2021-09-06 18:56:51 +02:00
Andreas Kling
511ebffd94 Kernel: Improve find_elf_interpreter_for_executable() parameter names 2021-09-06 18:56:51 +02:00
Andreas Kling
645e29a88b Kernel: Don't turn I/O errors during sys$execve() into ENOEXEC
Instead, just propagate whatever the real error was.
2021-09-06 13:06:05 +02:00
Andreas Kling
84addef10f Kernel: Improve arguments retrieval error propagation in sys$execve()
Instead of turning any arguments related error into an EFAULT, we now
propagate the innermost error during arguments retrieval.
2021-09-06 13:06:05 +02:00
Andreas Kling
6e3381ac32 Kernel: Use KResultOr and TRY() for {Shared,Private}InodeVMObject 2021-09-06 13:06:05 +02:00
Andreas Kling
e3a716ceff Kernel: Make Memory::Region::map() return KResult
..and use TRY() at the call sites to propagate errors. :^)
2021-09-06 13:06:05 +02:00
Andreas Kling
7981422500 Kernel: Make Threads always have a name
We previously allowed Thread to exist in a state where its m_name was
null, and had to work around that in various places.

This patch removes that possibility and forces those who would create a
thread (or change the name of one) to provide a NonnullOwnPtr<KString>
with the name.
2021-09-06 13:06:05 +02:00
Andreas Kling
cda2b9e71c Kernel: Improvements to Custody absolute path serialization
- Renamed try_create_absolute_path() => try_serialize_absolute_path()
- Use KResultOr and TRY() to propagate errors
- Don't call this when it's only for debug logging
2021-09-06 13:06:05 +02:00
Andreas Kling
e3b063581e Kernel: Use TRY() in sys$alarm() 2021-09-06 13:06:05 +02:00
Andreas Kling
d34f2b643e Kernel: Tidy up Plan9FS construction a bit 2021-09-06 13:06:05 +02:00
Andreas Kling
36725228fa Kernel: Tidy up Ext2FS construction a bit 2021-09-06 13:06:05 +02:00
Andreas Kling
47bfbe343b Kernel: Tidy up SysFS construction
- Use KResultOr and TRY() to propagate errors
- Check for OOM errors
- Move allocation out of constructors

There's still a lot more to do here, as SysFS is still quite brittle
in the face of memory pressure.
2021-09-06 13:06:05 +02:00
Andreas Kling
788b91a65c Kernel: Tidy up DevFS construction and handle OOM errorso
- Use KResultOr and TRY() to propagate errors
- Check for OOM
- Move allocations out of the DevFS constructor
2021-09-06 13:06:05 +02:00
Andreas Kling
efe4e230ee Kernel: Tidy up DevPtsFS construction and handle OOM errors
- Use KResultOr and TRY() to propagate errors
- Check for OOM when creating new inodes
2021-09-06 13:06:05 +02:00
Andreas Kling
f2512071f2 Kernel: Use TRY() in sys$getrandom() 2021-09-06 02:36:21 +02:00
Andreas Kling
a8516681b7 Kernel: Tidy up TmpFS and TmpFSInode construction
- Use KResultOr<NonnullRefPtr<T>>
- Propagate errors
- Use TRY() at call sites
2021-09-06 02:36:21 +02:00
Andreas Kling
a994f11f10 Kernel: Make AddressSpace::add_region() return KResultOr<Region*>
This allows us to use TRY() in a few places.
2021-09-06 02:02:06 +02:00
Andreas Kling
75564b4a5f Kernel: Make kernel region allocators return KResultOr<NOP<Region>>
This expands the reach of error propagation greatly throughout the
kernel. Sadly, it also exposes the fact that we're allocating (and
doing other fallible things) in constructors all over the place.

This patch doesn't attempt to address that of course. That's work for
our future selves.
2021-09-06 01:55:27 +02:00