The DevFS along with DevPtsFS give a complete solution for populating
device nodes in /dev. The main purpose of DevFS is to eliminate the
need of device nodes generation when building the system.
Later on, DevFS will assist with exposing disk partition nodes.
This was a goofy kernel API where you could assign an icon_id (int) to
a process which referred to a global shbuf with a 16x16 icon bitmap
inside it.
Instead of this, programs that want to display a process icon now
retrieve it from the process executable instead.
This can happen when an unveil follows another with a path that is a
sub-path of the other one:
```c++
unveil("/home/anon/.config/whoa.ini", "rw");
unveil("/home/anon", "r"); // this would fail, as "/home/anon" inherits
// the permissions of "/", which is None.
```
This new flag controls two things:
- Whether the kernel will generate core dumps for the process
- Whether the EUID:EGID should own the process's files in /proc
Processes are automatically made non-dumpable when their EUID or EGID is
changed, either via syscalls that specifically modify those ID's, or via
sys$execve(), when a set-uid or set-gid program is executed.
A process can change its own dumpable flag at any time by calling the
new sys$prctl(PR_SET_DUMPABLE) syscall.
Fixes#4504.
If the allocation fails (e.g ENOMEM) we want to simply return an error
from sys$execve() and continue executing the current executable.
This patch also moves make_userspace_stack_for_main_thread() out of the
Thread class since it had nothing in particular to do with Thread.
Process had a couple of members whose only purpose was holding on to
some temporary data while building the auxiliary vector. Remove those
members and move the vector building to a free function in execve.cpp
Make it possible to bail out of ELF::Image::for_each_program_header()
and then do exactly that if something goes wrong during executable
loading in the kernel.
Also make the errors we return slightly more nuanced than just ENOEXEC.
Get rid of the lambda functions and put the logic inline in the program
header traversal loop instead. This makes the code quite a bit shorter
and hopefully makes it easier to see what's going on.
This commit gets rid of ELF::Loader entirely since its very ambiguous
purpose was actually to load executables for the kernel, and that is
now handled by the kernel itself.
This patch includes some drive-by cleanup in LibDebug and CrashDaemon
enabled by the fact that we no longer need to keep the ref-counted
ELF::Loader around.
It was really weird that ELF loading was performed by the ELF::Loader
class instead of just being done by the kernel itself. This patch moves
all the layout logic from ELF::Loader over to sys$execve().
The kernel no longer cares about ELF::Loader and instead only uses an
ELF::Image as an interpreting wrapper around executables.
It was possible to overwrite the entire EFLAGS register since we didn't
do any masking in the ptrace and sigreturn syscalls.
This made it trivial to gain IO privileges by raising IOPL to 3 and
then you could talk to hardware to do all kinds of nasty things.
Thanks to @allesctf for finding these issues! :^)
Their exploit/write-up: https://github.com/allesctf/writeups/blob/master/2020/hxpctf/wisdom2/writeup.md
And make an effort to propagate errors out from the inner parts.
This fixes an issue where the kernel would infinitely loop in coredump
generation if the TmpFS filled up.
This implements a number of changes related to time:
* If a HPET is present, it is now used only as a system timer, unless
the Local APIC timer is used (in which case the HPET timer will not
trigger any interrupts at all).
* If a HPET is present, the current time can now be as accurate as the
chip can be, independently from the system timer. We now query the
HPET main counter for the current time in CPU #0's system timer
interrupt, and use that as a base line. If a high precision time is
queried, that base line is used in combination with quering the HPET
timer directly, which should give a much more accurate time stamp at
the expense of more overhead. For faster time stamps, the more coarse
value based on the last interrupt will be returned. This also means
that any missed interrupts should not cause the time to drift.
* The default system interrupt rate is reduced to about 250 per second.
* Fix calculation of Thread CPU usage by using the amount of ticks they
used rather than the number of times a context switch happened.
* Implement CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE and use it
for most cases where precise timestamps are not needed.
Problem:
- `(void)` simply casts the expression to void. This is understood to
indicate that it is ignored, but this is really a compiler trick to
get the compiler to not generate a warning.
Solution:
- Use the `[[maybe_unused]]` attribute to indicate the value is unused.
Note:
- Functions taking a `(void)` argument list have also been changed to
`()` because this is not needed and shows up in the same grep
command.
Switch on the new credentials before loading the new executable into
memory. This ensures that attempts to ptrace() the program from an
unprivileged process will fail.
This covers one bug that was exploited in the 2020 HXP CTF:
https://hxp.io/blog/79/hxp-CTF-2020-wisdom2/
Thanks to yyyyyyy for finding the bug! :^)
We need to account for how many shared lock instances the current
thread owns, so that we can properly release such references when
yielding execution.
We also need to release the process lock when donating.
LexicalPath is a big and heavy class that's really meant as a helper
for extracting parts of a path, not for storage or passing around.
Instead, pass paths around as strings and use LexicalPath locally
as needed.
When a process crashes, we generate a coredump file and write it in
/tmp/coredumps/.
The coredump file is an ELF file of type ET_CORE.
It contains a segment for every userspace memory region of the process,
and an additional PT_NOTE segment that contains the registers state for
each thread, and a additional data about memory regions
(e.g their name).
This adds an allocate_tls syscall through which a userspace process
can request the allocation of a TLS region with a given size.
This will be used by the dynamic loader to allocate TLS for the main
executable & its libraries.
When the main executable needs an interpreter, we load the requested
interpreter program, and pass to it an open file decsriptor to the main
executable via the auxiliary vector.
Note that we do not allocate a TLS region for the interpreter.
This prevents zombies created by multi-threaded applications and brings
our model back to closer to what other OSs do.
This also means that SIGSTOP needs to halt all threads, and SIGCONT needs
to resume those threads.
This is necessary because if a process changes the state to Stopped
or resumes from that state, a wait entry is created in the parent
process. So, if a child process does this before disown is called,
we need to clear those entries to avoid leaking references/zombies
that won't be cleaned up until the former parent exits.
This also should solve an even more unlikely corner case where another
thread is waiting on a pid that is being disowned by another thread.
Fix some problems with join blocks where the joining thread block
condition was added twice, which lead to a crash when trying to
unblock that condition a second time.
Deferred block condition evaluation by File objects were also not
properly keeping the File object alive, which lead to some random
crashes and corruption problems.
Other problems were caused by the fact that the Queued state didn't
handle signals/interruptions consistently. To solve these issues we
remove this state entirely, along with Thread::wait_on and change
the WaitQueue into a BlockCondition instead.
Also, deliver signals even if there isn't going to be a context switch
to another thread.
Fixes#4336 and #4330
This allows us to use blocking timeouts with either monotonic or
real time for all blockers. Which means that clock_nanosleep()
now also supports CLOCK_REALTIME.
Also, switch alarm() to use CLOCK_REALTIME as per specification.