Note: I switched from copying the single element out of the sched_param
struct to copying the struct itself, as it is identical in functionality.
This way the types match up more nicely with the Userspace<T> APIs and
it conforms to the conventions used in other syscalls.
Since we already have the type information in the Userspace template,
it was a bit silly to cast manually everywhere. Just add a sufficiently
scary-sounding getter for a typed pointer.
Thanks @alimpfard for pointing out that I was being silly with tossing
out the type.
In the future we may want to make this API non-public as well.
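A rough sketch of the idea, not the kernel's actual definition: the
wrapper below is a simplified stand-in for Userspace<T>, and the getter
name unsafe_userspace_ptr(), the SchedParam struct and read_priority()
are assumptions made up for this illustration.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for the kernel's Userspace<T> template.
// T is expected to be a pointer type, e.g. Userspace<const SchedParam*>.
template<typename T>
class Userspace {
public:
    explicit Userspace(std::uintptr_t address)
        : m_address(address)
    {
    }

    // Untyped view of the user address.
    std::uintptr_t ptr() const { return m_address; }

    // Typed view. The scary name (an assumption of this sketch) is meant to
    // remind callers that the pointer refers to untrusted user memory.
    T unsafe_userspace_ptr() const { return reinterpret_cast<T>(m_address); }

private:
    std::uintptr_t m_address { 0 };
};

// Hypothetical stand-in for sched_param.
struct SchedParam {
    int sched_priority;
};

// The call site no longer needs a manual cast: the type comes from the template.
// A real kernel would copy the struct out of user memory instead of
// dereferencing the pointer directly.
int read_priority(Userspace<const SchedParam*> user_param)
{
    return user_param.unsafe_userspace_ptr()->sched_priority;
}
```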
This is something I've been meaning to do for a long time, and here we
finally go. This patch moves all sys$foo functions out of Process.cpp
and into files in Kernel/Syscalls/.
It's not exactly one syscall per file (although it could be, but I got
a bit tired of the repetitive work here..)
This makes hacking on individual syscalls a lot less painful since you
don't have to rebuild nearly as much code every time. I'm also hopeful
that this makes it easier to understand individual syscalls. :^)
For now, only the non-standard _SC_NPROCESSORS_CONF and
_SC_NPROCESSORS_ONLN are implemented.
Use them to make ninja pick a better default -j value.
While here, make the ninja package script not fail if
no other port has been built yet.
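For illustration, a caller might query the online processor count
roughly like this. default_job_count() is a made-up helper, and how
ninja actually derives its -j default from the count is not shown here.

```cpp
#include <unistd.h>

// Hypothetical helper: returns the number of online processors, or 1 if
// the value cannot be determined.
long default_job_count()
{
    long online = sysconf(_SC_NPROCESSORS_ONLN);
    // sysconf() returns -1 when the name is not supported.
    if (online < 1)
        return 1;
    return online;
}
```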
The AT_* entries are placed after the environment variables, so that
they can be found by iterating until the end of the envp array, and then
going even further beyond :^)
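A small sketch of that lookup, assuming each entry is a pair of
pointer-sized words (type, value) and that the list ends with an entry
of type 0. AuxEntry and lookup_aux() are hypothetical names for this
example only.

```cpp
#include <cstdint>
#include <cstring>

// Assumed layout of one auxiliary vector entry.
struct AuxEntry {
    std::uintptr_t type;
    std::uintptr_t value;
};

// Walks past the null terminator of envp and scans the entries that follow.
// Returns true and stores the value in `out` if `wanted` is found.
bool lookup_aux(char* const* envp, std::uintptr_t wanted, std::uintptr_t& out)
{
    // Skip all environment strings.
    while (*envp)
        ++envp;

    // The auxiliary entries start right after the terminating null pointer.
    const char* cursor = reinterpret_cast<const char*>(envp + 1);
    for (;;) {
        AuxEntry entry;
        std::memcpy(&entry, cursor, sizeof(entry));
        if (entry.type == 0) // assumed end-of-list marker
            return false;
        if (entry.type == wanted) {
            out = entry.value;
            return true;
        }
        cursor += sizeof(entry);
    }
}
```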
When delivering urgent signals to the current thread
we need to check if we should be unblocked, and if not
we need to yield to another process.
We also need to make sure that we suppress context switches
during Process::exec() so that we don't clobber the registers
that it sets up (eip mainly) by a context switch. To be able
to do that we add the concept of a critical section, which is
similar to Process::m_in_irq but different in that it can be
requested at any time. Calls to Scheduler::yield and
Scheduler::donate_to will return instantly without triggering
a context switch, but the processor will then asynchronously
trigger a context switch once the critical section is left.
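The deferral can be modelled roughly as below. This is a toy model,
not the scheduler code: the Processor struct and its members are
hypothetical, and the deferred switch is performed directly when the
section is left rather than asynchronously.

```cpp
// Hypothetical model of a processor that can defer context switches.
struct Processor {
    int critical_depth = 0;      // nesting depth of critical sections
    bool switch_pending = false; // a yield was requested while critical
    int context_switches = 0;    // counts switches that actually happened

    void enter_critical() { ++critical_depth; }

    void leave_critical()
    {
        // Only the outermost leave may trigger the deferred switch.
        if (--critical_depth == 0 && switch_pending) {
            switch_pending = false;
            ++context_switches;
        }
    }

    // Returns true if a context switch happened immediately.
    bool yield()
    {
        if (critical_depth > 0) {
            // Return instantly and remember that a switch is owed.
            switch_pending = true;
            return false;
        }
        ++context_switches;
        return true;
    }
};
```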
These new syscalls allow you to send and receive file descriptors over
a local domain socket. This will enable various privilege separation
techniques and other good stuff. :^)
ppoll() is similar to poll(), but it takes its timeout
as timespec instead of as int, and it takes an additional
sigmask parameter.
Change the sys$poll parameters to match ppoll() and implement
poll() in terms of ppoll().
It looks like they're considered a bad idea, so let's not add
them before we need them. I figured it's good to have them in
git history if we ever do need them though, hence the add/remove
dance.
Add seteuid()/setegid() under _POSIX_SAVED_IDS semantics,
which also requires adding suid and sgid to Process, and
changing setuid()/setgid() to honor these semantics.
The exact semantics aren't specified by POSIX and differ
between different Unix implementations. This patch makes
serenity follow FreeBSD. The 2002 USENIX paper
"Setuid Demystified" explains the differences well.
In addition to seteuid() and setegid() this also adds
setreuid()/setregid() and setresuid()/setresgid(), and
the accessors getresuid()/getresgid().
Also reorder uid/euid functions so that they are the
same order everywhere (namely, the order that
geteuid()/getuid() already have).
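To show why the saved ID matters, here is a toy model of the POSIX
rule for seteuid(): an unprivileged process may only switch its
effective UID to its real or saved UID. Credentials and
model_seteuid() are hypothetical, and the sketch does not claim to
reproduce every detail of the kernel implementation or of FreeBSD.

```cpp
#include <sys/types.h>

// Hypothetical per-process credentials: real, effective and saved UID.
struct Credentials {
    uid_t ruid;
    uid_t euid;
    uid_t suid;
};

// Models seteuid(): returns false where the real call would fail with EPERM.
bool model_seteuid(Credentials& creds, uid_t new_euid)
{
    bool privileged = creds.euid == 0;
    if (!privileged && new_euid != creds.ruid && new_euid != creds.suid)
        return false;
    // Only the effective UID changes; real and saved UIDs are kept.
    creds.euid = new_euid;
    return true;
}
```

With a saved UID of 0, a process that has temporarily dropped to its
real UID can regain euid 0 later; without it, the same request is
refused.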
You now have to pledge "sigaction" to change signal handlers/dispositions. This
is to prevent malicious code from messing with assertions (and segmentation
faults), which are normally expected to instantly terminate the process but can
do other things if you change signal disposition for them.
This was a holdover from the old times when each Process had a special
main thread with TID 0. Using it was a total crapshoot since it would
just return whichever thread was first on the process's thread list.
Now that I've removed all uses of it, we don't need it anymore. :^)
Instead of falling back to the suspicious "any_thread()" mechanism,
just fail with ESRCH if you try to kill() a PID that doesn't have a
corresponding TID.
This was supposed to be the foundation for some kind of pre-kernel
environment, but nobody is working on it right now, so let's move
everything back into the kernel and remove all the confusion.
We stopped using gettimeofday() in Core::EventLoop a while back,
in favor of clock_gettime() for monotonic time.
Maintaining an optimization for a syscall we're not using doesn't make
a lot of sense, so let's go back to the old-style sys$gettimeofday().
Ultimately we should not panic just because we can't fully commit a VM
region (by populating it with physical pages.)
This patch handles some of the situations where commit() can fail.
This patch adds PageFaultResponse::OutOfMemory which informs the fault
handler that we were unable to allocate a necessary physical page and
cannot continue.
In response to this, the kernel will crash the current process. Because
we are OOM, we can't symbolicate the crash like we normally would
(since the ELF symbolication code needs to allocate), so we also
communicate to Process::crash() that we're out of memory.
Now we can survive "allocate 300 MB" (only the allocate process dies.)
This is definitely not perfect and can easily end up killing a random
innocent other process who happened to allocate one page at the wrong
time, but it's a *lot* better than panicking on OOM. :^)
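The control flow can be sketched like this. Only the name
PageFaultResponse::OutOfMemory comes from the patch; the other
enumerators, CrashReport, handle_fault() and on_page_fault() are
assumptions made up for the illustration.

```cpp
#include <cstddef>

// OutOfMemory is the value added by this patch; the others are assumed.
enum class PageFaultResponse {
    ShouldCrash,
    OutOfMemory,
    Continue,
};

// Hypothetical record of what happened to the faulting process.
struct CrashReport {
    bool crashed = false;
    bool symbolicated = false;
};

// Toy fault handler: fails only when no physical page is available.
PageFaultResponse handle_fault(std::size_t free_pages)
{
    if (free_pages == 0)
        return PageFaultResponse::OutOfMemory;
    return PageFaultResponse::Continue;
}

CrashReport on_page_fault(std::size_t free_pages)
{
    CrashReport report;
    PageFaultResponse response = handle_fault(free_pages);
    if (response == PageFaultResponse::Continue)
        return report;

    // Crash the process, but skip symbolication when we are out of memory,
    // since symbolicating would itself need to allocate.
    report.crashed = true;
    report.symbolicated = response != PageFaultResponse::OutOfMemory;
    return report;
}
```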
This is a special case that was previously not implemented.
The idea is that you can dispatch a signal to all other processes
the calling process has access to.
There was some minor refactoring to move the self signal logic
into a function so it could easily be re-used from do_killall.
Previously, when returning from a pthread's start_routine, we would
segfault. Now we instead implicitly call pthread_exit as specified in
the standard.
pthread_create now creates a thread running the new
pthread_create_helper, which properly manages the calling and exiting
of the start_routine supplied to pthread_create. To accomplish this,
the thread's stack initialization has been moved out of
sys$create_thread and into the userspace function create_thread.
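The shape of such a helper is roughly as follows. This sketch layers
on top of an existing pthread_create() purely to show the wrapping;
StartInfo, create_helper(), create_thread_with_helper() and add_one()
are hypothetical names, not the LibPthread code.

```cpp
#include <cstdint>
#include <pthread.h>

// Hypothetical bundle of what the helper needs to call the real routine.
struct StartInfo {
    void* (*routine)(void*);
    void* argument;
};

// Runs the user's start routine and then exits the thread explicitly,
// so that returning from the routine behaves like calling pthread_exit().
void* create_helper(void* raw)
{
    StartInfo info = *static_cast<StartInfo*>(raw);
    delete static_cast<StartInfo*>(raw);
    void* result = info.routine(info.argument);
    pthread_exit(result);
}

// Starts a thread whose entry point is the helper rather than the routine.
int create_thread_with_helper(pthread_t* thread, void* (*routine)(void*), void* argument)
{
    StartInfo* info = new StartInfo { routine, argument };
    int rc = pthread_create(thread, nullptr, create_helper, info);
    if (rc != 0)
        delete info; // the helper never ran, so clean up here
    return rc;
}

// Example start routine that simply returns a value.
void* add_one(void* argument)
{
    std::intptr_t value = reinterpret_cast<std::intptr_t>(argument);
    return reinterpret_cast<void*>(value + 1);
}
```

Joining a thread started with add_one yields the routine's return
value, since the helper forwards it to pthread_exit().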
PT_SETREGS sets the registers of the traced thread. It can only be
used when the tracee is stopped.
Also, refactor ptrace.
The implementation was getting long and cluttered the already large
Process.cpp file.
This commit moves the bulk of the implementation to Kernel/Ptrace.cpp,
and factors out peek & poke to separate methods of the Process class.
This patch adds the minherit() syscall originally invented by OpenBSD.
Only the MAP_INHERIT_ZERO mode is supported for now. If set on an mmap
region, that region will be zeroed out on fork().
This commit adds a basic implementation of
the ptrace syscall, which allows one process
(the tracer) to control another process (the tracee).
While a process is being traced, it is stopped whenever a signal is
received (other than SIGCONT).
The tracer can start tracing another thread with PT_ATTACH,
which causes the tracee to stop.
From there, the tracer can use PT_CONTINUE
to continue the execution of the tracee,
or use other request codes (which haven't been implemented yet)
to modify the state of the tracee.
Additional request codes are PT_SYSCALL, which causes the tracee to
continue execution but stop at the next entry or exit from a syscall,
and PT_GETREGS, which fetches the last saved register set of the tracee
(can be used to inspect syscall arguments and return value).
A special request code is PT_TRACE_ME, which is issued by the tracee
and causes it to stop when it calls execve and wait for the
tracer to attach.
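The stop/continue behaviour described above can be modelled as a small
state machine. This is a toy model and not the kernel code: Tracee,
TraceeState and the method names are hypothetical, and register access
and PT_TRACE_ME are left out.

```cpp
#include <csignal>

enum class TraceeState {
    Running,
    Stopped,
};

// Hypothetical model of a traced thread.
struct Tracee {
    bool traced = false;
    bool stop_at_syscall = false;
    TraceeState state = TraceeState::Running;

    // PT_ATTACH: start tracing, which stops the tracee.
    void attach()
    {
        traced = true;
        state = TraceeState::Stopped;
    }

    // PT_CONTINUE (until_syscall == false) or PT_SYSCALL (until_syscall == true).
    // Returns false if the thread is not being traced.
    bool resume(bool until_syscall)
    {
        if (!traced)
            return false;
        stop_at_syscall = until_syscall;
        state = TraceeState::Running;
        return true;
    }

    // While traced, any signal other than SIGCONT stops the tracee.
    void deliver_signal(int signal_number)
    {
        if (traced && signal_number != SIGCONT)
            state = TraceeState::Stopped;
    }

    // Called on entry to or exit from a syscall.
    void syscall_boundary()
    {
        if (traced && stop_at_syscall)
            state = TraceeState::Stopped;
    }
};
```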
This was only used by the mechanism for mapping executables into each
process's own address space. Now that we remap executables on demand
when needed for symbolication, this can go away.
Previously we would map the entire executable of a program in its own
address space (but make it unavailable to userspace code.)
This patch removes that and changes the symbolication code to remap
the executable on demand (and into the kernel's own address space
instead of the process address space.)
This opens up a couple of further simplifications that will follow.