It's possible that we broadcast an IPI message right at the same time
another processor requests a halt. Rather than spinning forever waiting
for that message to be handled, check if we should halt while waiting.
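A minimal sketch of the shape of the fix, with hypothetical names for the per-processor halt flag and the IPI bookkeeping:

```cpp
#include <atomic>

// Hypothetical stand-ins for the real per-processor state.
extern std::atomic<bool> g_halt_requested;
[[noreturn]] void halt_this_processor();

void wait_for_ipi_handled(std::atomic<int>& pending_acks)
{
    // Previously this spun unconditionally; if another processor
    // requested a halt at the same moment, we'd spin forever.
    while (pending_acks.load(std::memory_order_acquire) > 0) {
        if (g_halt_requested.load(std::memory_order_acquire))
            halt_this_processor(); // never returns
    }
}
```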
Problem:
- `constexpr` functions are decorated with the `inline` specifier
keyword. This is redundant because `constexpr` functions are
implicitly `inline`.
- [dcl.constexpr], §7.1.5/2 in the C++11 standard: "constexpr
functions and constexpr constructors are implicitly inline (7.1.2)".
Solution:
- Remove the redundant `inline` keyword.
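For example:

```cpp
// Redundant: constexpr functions are implicitly inline.
// inline constexpr int square(int x) { return x * x; }

// Equivalent without the extra specifier:
constexpr int square(int x) { return x * x; }

static_assert(square(3) == 9, "still usable in constant expressions");
```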
In case we want to rely more on the TSC for timekeeping in the future.
This adds:
- RDTSCP, for when the RDTSCP instruction is available
- CONSTANT_TSC, for when the TSC has a constant frequency, invariant
under things like the CPU boosting its frequency.
- NONSTOP_TSC, for when the TSC doesn't pause when the CPU enters
sleep states.
AMD CPUs and newer Intel CPUs set the INVTSC bit (bit 8 in EDX of
extended CPUID leaf 0x80000007), which implies both CONSTANT_TSC and
NONSTOP_TSC. Some older intel processors have CONSTANT_TSC but not
NONSTOP_TSC; this is set based on cpu model checks.
There isn't a ton of documentation on this, so this follows Linux
terminology and http://blog.tinola.com/?e=54
CONSTANT_TSC:
39b3a79105
NONSTOP_TSC:
40fb17152c
qemu disables INVTSC (bit 8 in EDX of extended CPUID leaf 0x80000007)
by default even if the host cpu supports it. It can be enabled by
running with `SERENITY_QEMU_CPU=host,migratable=off` set.
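A sketch of the detection using GCC/Clang's `__get_cpuid`; only the leaf and bit position come from the commit, the function around them is illustrative:

```cpp
#include <cpuid.h>

// CPUID.80000007H:EDX[8] is the invariant TSC (INVTSC) bit, implying
// both CONSTANT_TSC and NONSTOP_TSC.
bool detect_invariant_tsc()
{
    unsigned int eax, ebx, ecx, edx;
    // __get_cpuid returns 0 if the requested leaf isn't supported.
    if (!__get_cpuid(0x80000007, &eax, &ebx, &ecx, &edx))
        return false;
    return (edx & (1u << 8)) != 0;
}
```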
Make these calls fail gracefully if used within IRQ handlers. If we're
handling IRQs, we need to handle these failures first, because we can't
really resolve page faults in a meaningful way. But if we know that it
was one of these functions that failed, then we can gracefully handle
the situation.
This solves a crash where the Scheduler attempts to produce backtraces
in the timer irq, some of which cause faults.
Fixes #3492
Since the CPU already does almost all necessary validation steps
for us, we don't really need to attempt to do this. Doing it
ourselves doesn't really work very reliably, because we'd have to
account for other processors modifying virtual memory, and we'd
have to account for e.g. pages not being able to be allocated
due to insufficient resources.
So change the copy_to/from_user (and associated helper functions)
to use the new safe_memcpy, which will return whether it succeeded
or not. The only manual validation step needed (which the CPU
can't perform for us) is making sure the pointers provided by user
mode aren't pointing to kernel mappings.
To make it easier to read/write from/to either kernel or user mode
data, add the UserOrKernelBuffer helper class, which will internally
either use copy_from/to_user or plain memcpy, or pass the data
through using a temporary buffer on the stack.
Last but not least, we need to keep syscall params trivial, as we
need to copy them from/to user mode using copy_from/to_user.
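A simplified sketch of the resulting shape; is_user_range() stands in for whatever the real pointer check is called:

```cpp
#include <cstddef>

// Returns false instead of faulting if the copy hits an unmapped page.
bool safe_memcpy(void* dest, void const* src, size_t n);
// Hypothetical: true if [ptr, ptr + n) lies entirely in user space.
bool is_user_range(void const* ptr, size_t n);

[[nodiscard]] bool copy_from_user(void* dest, void const* src, size_t n)
{
    // The one check the CPU can't do for us: reject pointers into
    // kernel mappings that user mode handed us.
    if (!is_user_range(src, n))
        return false;
    // Everything else (not present, not writable, etc.) surfaces as a
    // page fault that safe_memcpy turns into a false return.
    return safe_memcpy(dest, src, n);
}
```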
These special functions can be used to safely copy/set memory or
determine the length of a string, e.g. provided by user mode.
In the event of a page fault, safe_memcpy/safe_memset will return
false and safe_strnlen will return -1.
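Sketched signatures matching that description, with an illustrative call site:

```cpp
#include <cstddef>
#include <sys/types.h>

// On a page fault these report failure instead of crashing:
bool safe_memcpy(void* dest, void const* src, size_t n); // false on fault
bool safe_memset(void* dest, int c, size_t n);           // false on fault
ssize_t safe_strnlen(char const* str, size_t max_n);     // -1 on fault

// Illustrative: measure a user-supplied string before copying it.
ssize_t length_of_user_string(char const* user_str, size_t max_n)
{
    ssize_t length = safe_strnlen(user_str, max_n);
    if (length < 0)
        return -1; // the string's memory faulted; fail gracefully
    return length;
}
```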
Since "rings" typically refer to code execution and user processes
can also execute in ring 0, rename these functions to more accurately
describe what they mean: kernel processes and user processes.
The ring is determined based on the CS register. This fixes crashes
being handled as ring 3 crashes even though EIP/CS clearly showed
that the crash happened in the kernel.
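The check boils down to the privilege level in the saved CS selector; a sketch with a minimal RegisterState:

```cpp
#include <cstdint>

struct RegisterState {
    uint32_t cs; // code segment selector saved at exception entry
    uint32_t eip;
    // ... remaining saved registers elided
};

// The low two bits of CS hold the privilege level: 0 = kernel, 3 = user.
// Deciding on CS rather than on the process type fixes kernel crashes
// being misclassified as ring 3 crashes.
bool crash_happened_in_kernel(RegisterState const& regs)
{
    return (regs.cs & 3) == 0;
}
```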
The exit condition for the loop was sizeof(m_features) * 8,
which was 32. Presumably this was supposed to mean 32 bits, but it
actually made it stop as soon as it reached the 6th bit.
Also add detection for more SIMD CPU features.
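Roughly what the bug and fix look like (m_features standing in for the 32-bit feature word):

```cpp
#include <cstdint>

uint32_t m_features = 0; // 32-bit feature flag word

void enumerate_features_buggy()
{
    // Bug: `flag` is a bitmask but is compared against a bit *count*
    // (sizeof(m_features) * 8 == 32), so the loop exits once flag
    // reaches 32, i.e. at the 6th bit.
    for (uint32_t flag = 1; flag < sizeof(m_features) * 8; flag <<= 1) {
        if (m_features & flag) { /* handle feature */ }
    }
}

void enumerate_features_fixed()
{
    // Fix: iterate over bit positions, covering all 32 bits.
    for (uint32_t i = 0; i < sizeof(m_features) * 8; ++i) {
        if (m_features & (1u << i)) { /* handle feature */ }
    }
}
```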
The behaviour of the PT_TRACEME feature has been broken for some time;
this change fixes it.
When this ptrace flag is used, the traced process should be paused
before exiting execve.
We previously sent the SIGSTOP signal at a stage where interrupts are
disabled, so the traced process continued executing normally instead
of pausing and waiting for the tracer.
Upon leaving a critical section (such as a SpinLock) we need to
check if we're already asynchronously invoking the Scheduler.
Otherwise we might end up triggering another context switch
as soon as we leave the scheduler lock.
Fixes #2883
For now, only the non-standard _SC_NPROCESSORS_CONF and
_SC_NPROCESSORS_ONLN are implemented.
Use them to make ninja pick a better default -j value.
While here, make the ninja package script not fail if
no other port has been built yet.
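Usage follows the standard POSIX pattern:

```cpp
#include <cstdio>
#include <unistd.h>

int main()
{
    long configured = sysconf(_SC_NPROCESSORS_CONF); // processors configured
    long online = sysconf(_SC_NPROCESSORS_ONLN);     // processors online
    // A build tool can derive a parallelism default from this,
    // falling back to 1 if the value is unavailable (-1).
    long jobs = online > 0 ? online : 1;
    printf("%ld configured, %ld online, using -j%ld\n",
        configured, online, jobs);
    return 0;
}
```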
We need to halt the BSP briefly until all APs are ready for the
first context switch, but we can't hold the same spinlock by all
of them while doing so. So, while the APs are waiting on each other
they need to release the scheduler lock, and then once signaled
re-acquire it. This should solve some timing-dependent hangs or
crashes, most easily observed using qemu with kvm disabled.
We can now properly initialize all processors without
crashing by sending SMP IPI messages to synchronize memory
between processors.
We now initialize the APs once we have the scheduler running.
This is so that we can process IPI messages from the other
cores.
Also rework interrupt handling a bit so that it's more of a
1:1 mapping. We need to allocate non-sharable interrupts for
IPIs.
This also fixes the occasional hang/crash because all
CPUs now synchronize memory with each other.
The short-circuit path added for waiting on a queue that already had a
pending wake was able to return with interrupts disabled, which breaks
the API contract of wait_on() always returning with IF=1.
Fix this by adding a way to override the restored IF in ScopedCritical.
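One way to express that override, sketched with hypothetical helpers for the per-processor bookkeeping:

```cpp
class ScopedCritical {
public:
    ScopedCritical() { enter_critical(); }
    ~ScopedCritical()
    {
        // Normally the IF value saved on entry is restored; wait_on()
        // can force IF=1 so even the short-circuit path (pending wake,
        // never slept) returns with interrupts enabled.
        leave_critical(m_force_interrupts_on_destruction);
    }

    void set_interrupt_flag_on_destruction(bool flag)
    {
        m_force_interrupts_on_destruction = flag;
    }

private:
    // Hypothetical stand-ins for the real per-CPU critical section count.
    static void enter_critical();
    static void leave_critical(bool force_enable_interrupts);

    bool m_force_interrupts_on_destruction { false };
};
```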
These changes solve a number of problems with the software
context switching:
* The scheduler lock really should be held throughout context switches
* Transitioning from the initial (idle) thread to another needs to
hold the scheduler lock
* Transitioning from a dying thread to another also needs to hold
the scheduler lock
* Dying threads cannot necessarily be finalized if they haven't
  been switched out of yet, so flag them as active while a processor
  is running them (the Running state may be changed to Dying while
  the thread is still actually running)
If we're trying to walk the stack for another thread, we can
no longer retrieve the EBP register from Thread::m_tss. Instead,
we need to look at the top of the kernel stack, because all threads
not currently running were last in kernel mode. Context switches
now always trigger a brief switch to kernel mode, and Thread::m_tss
is only used to save ESP and EIP.
Fixes #2678
When delivering urgent signals to the current thread
we need to check if we should be unblocked, and if not
we need to yield to another process.
We also need to make sure that we suppress context switches
during Process::exec() so that we don't clobber the registers
that it sets up (eip mainly) by a context switch. To be able
to do that we add the concept of critical sections, which are
similar to Process::m_in_irq but different in that they can be
requested at any time. Calls to Scheduler::yield and
Scheduler::donate_to will return instantly without triggering
a context switch, but the processor will then asynchronously
trigger a context switch once the critical section is left.
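A sketch of that mechanism, with per-processor state simplified to globals and names approximated:

```cpp
#include <atomic>

void do_context_switch(); // provided elsewhere

// In reality this is per-processor state, not global.
static std::atomic<unsigned> g_critical_depth { 0 };
static std::atomic<bool> g_yield_pending { false };

void enter_critical() { g_critical_depth++; }

void leave_critical()
{
    // Once the outermost critical section is left, perform the switch
    // that was deferred while it was held.
    if (--g_critical_depth == 0 && g_yield_pending.exchange(false))
        do_context_switch();
}

void scheduler_yield()
{
    if (g_critical_depth > 0) {
        // E.g. during Process::exec(): don't clobber the registers it
        // just set up; remember the request and return immediately.
        g_yield_pending = true;
        return;
    }
    do_context_switch();
}
```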
CPUs which support RDRAND do not necessarily support RDSEED. This
introduces a flag g_cpu_supports_rdseed which is set appropriately
by CPUID. This causes Haswell CPUs in particular (and probably a lot
of AMD chips) to now fail to boot with the error described in #2634,
rather than with an illegal instruction.
It seems like the KernelRng needs either an initial reseed call or
more random events added before the first call to get_good_random,
but I don't feel qualified to make that kind of change.
We were getting a little overly memey in some places, so let's scale
things back to business-casual.
Informal language is fine in comments, commits and debug logs,
but let's keep the runtime nice and presentable. :^)
Let's not be paying the function call overhead for these tiny ops.
Maybe there's an argument for having fewer gadgets in the kernel,
but for now we're actually seeing stac() in profiles, so let's put
that above theoretical security issues.
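For context, stac/clac are single SMAP-toggling instructions, so the inlined versions are tiny; a sketch:

```cpp
// stac/clac set/clear EFLAGS.AC, temporarily permitting/forbidding
// kernel access to user pages under SMAP. Out of line, the call
// overhead dwarfs the instruction itself.
[[gnu::always_inline]] inline void stac()
{
    asm volatile("stac" ::: "cc");
}

[[gnu::always_inline]] inline void clac()
{
    asm volatile("clac" ::: "cc");
}
```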
This was supposed to be the foundation for some kind of pre-kernel
environment, but nobody is working on it right now, so let's move
everything back into the kernel and remove all the confusion.
This patch adds PageFaultResponse::OutOfMemory which informs the fault
handler that we were unable to allocate a necessary physical page and
cannot continue.
In response to this, the kernel will crash the current process. Because
we are OOM, we can't symbolicate the crash like we normally would
(since the ELF symbolication code needs to allocate), so we also
communicate to Process::crash() that we're out of memory.
Now we can survive "allocate 300 MB" (only the allocate process dies.)
This is definitely not perfect and can easily end up killing a random
innocent other process who happened to allocate one page at the wrong
time, but it's a *lot* better than panicking on OOM. :^)
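A sketch of the flow described above; the enum values beyond OutOfMemory and the helper are illustrative:

```cpp
[[noreturn]] void crash_current_process(bool out_of_memory);

enum class PageFaultResponse {
    ShouldCrash,
    OutOfMemory, // a needed physical page couldn't be allocated
    Continue,
};

void respond_to_page_fault(PageFaultResponse response)
{
    switch (response) {
    case PageFaultResponse::Continue:
        return; // fault handled; resume the faulting instruction
    case PageFaultResponse::OutOfMemory:
        // ELF symbolication allocates, so tell Process::crash() to
        // skip the backtrace while we're out of memory.
        crash_current_process(/* out_of_memory= */ true);
    case PageFaultResponse::ShouldCrash:
        crash_current_process(/* out_of_memory= */ false);
    }
}
```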
We currently only care about debug exceptions that are triggered
by the single-step execution mode.
The debug exception is translated to a SIGTRAP, which can be caught
and handled by the tracing thread.
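A sketch of the handler; DR6 bit 14 (BS) is the single-step status bit, and the helpers are hypothetical:

```cpp
#include <csignal>
#include <cstdint>

// Hypothetical helpers for the sketch.
uint32_t read_debug_status_register(); // reads DR6
void send_signal_to_current_thread(int signal);

void handle_debug_exception()
{
    // DR6 bit 14 (BS) is set when the exception was raised by
    // single-step execution (EFLAGS.TF); ignore everything else.
    constexpr uint32_t dr6_single_step = 1u << 14;
    if (!(read_debug_status_register() & dr6_single_step))
        return;
    // Translate to SIGTRAP so the tracing thread can catch and
    // handle it.
    send_signal_to_current_thread(SIGTRAP);
}
```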