Notice that we ensured that the size is a multiple of the page size and
that there is at least one page there, otherwise, this change would be
invalid.
We create an empty region and then expand it:
// First iteration.
m_user_physical_regions.append(PhysicalRegion::create(addr, addr));
// Following iterations.
region->expand(region->lower(), addr);
So if the memory region only has one page, we would end up with an empty
region. Thus we need to do one more iteration.
If something goes wrong when trying to write out a perfcore file during
process finalization, there's nowhere to report an error to, other than
the debug log. So write it to the debug log.
Problem: Defining the destructor violates the "rule of 0" and prevents
the copy/move constructor/assignment operators from being provided by
the compiler.
Solution: Change the constructor and destructor to be the default
compiler-provided definition.
In case we want to rely more on TSC in time keeping in the future, idk
This adds:
- RDTSCP, for when the RDTSCP instruction is available
- CONSTANT_TSC, for when the TSC has a constant frequency, invariant
under things like the CPU boosting its frequency.
- NONSTOP_TSC, for when the TSC doesn't pause when the CPU enters
sleep states.
AMD cpus and newer intel cpus set the INVSTC bit (bit 8 in edx of
extended cpuid 0x8000000008), which implies both CONSTANT_TSC and
NONSTOP_TSC. Some older intel processors have CONSTANT_TSC but not
NONSTOP_TSC; this is set based on cpu model checks.
There isn't a ton of documentation on this, so this follows Linux
terminology and http://blog.tinola.com/?e=54
CONSTANT_TSC:
39b3a79105
NONSTOP_TSC:
40fb17152c
qemu disables invtsc (bit 8 in edx of extended cpuid 0x8000000008)
by default even if the host cpu supports it. It can be enabled by
running with `SERENITY_QEMU_CPU=host,migratable=off` set.
The block group indices are 1-based for some reason. Because of that,
we were forgetting to check in the very last block group when doing
block allocation. This caused block allocation to fail even when the
superblock indicated that we had free blocks.
Fixes#3674.
When obeying FD_CLOEXEC, we don't need to explicitly call close() on
all the FileDescriptions. We can just clear them out from the process
fd table. ~FileDescription() will call close() anyway.
This fixes an issue where TelnetServer would shut down accepted sockets
when exec'ing a shell for them. Since the parent process still has the
socket open, we should not force-close it. Just let go.
This finally takes care of the kind-of excessive boilerplate code that were the
ctype adapters. On the other hand, I had to link `LibC/ctype.cpp` to the Kernel
(for `AK/JsonParser.cpp` and `AK/Format.cpp`). The previous commit actually makes
sense now: the `string.h` includes in `ctype.{h,cpp}` would require to link more LibC
stuff to the Kernel when it only needs the `_ctype_` array of `ctype.cpp`, and there
wasn't any string stuff used in ctype.
Instead of all this I could have put static derivatives of `is_any_of()` in the
concerned AK files, however that would have meant more boilerplate and workarounds;
so I went for the Kernel approach.
Similar to Process, we need to make Thread refcounted. This will solve
problems that will appear once we schedule threads on more than one
processor. This allows us to hold onto threads without necessarily
holding the scheduler lock for the entire duration.
The thread joining logic hadn't been updated to account for the subtle
differences introduced by software context switching. This fixes several
race conditions related to thread destruction and joining, as well as
finalization which did not properly account for detached state and the
fact that threads can be joined after termination as long as they're not
detached.
Fixes#3596
This function is not avaliable in the kernel.
In the future it would be nice to have some sort of <charconv> header
that does this for all integer types and then call it in strtoull and et
cetera.
The difference would be that this function say 'from_chars' would return
an Optional and not just interpret anything invalid as zero.
We need to assert if interrupts are not disabled when changing the
interrupt number of an interrupt handler.
Before this fix, any change like this would lead to a crash,
because we are using InterruptDisabler in IRQHandler::change_irq_number.
First of all, this fixes a dumb info leak where we'd write kernel heap
addresses (StringImpl*) into userspace memory when reading a watcher.
Instead of trying to pass names to userspace, we now simply pass the
child inode index. Nothing in userspace makes use of this yet anyway,
so it's not like we're breaking anything. We'll see how this evolves.
When SO_TIMESTAMP is set as an option on a SOCK_DGRAM socket, then
recvmsg() will return a SCM_TIMESTAMP control message that
contains a struct timeval with the system time that was current
when the socket was received.
Since the receiving socket isn't yet known at packet receive time,
keep timestamps for all packets.
This is useful for keeping statistics about in-kernel queue latencies
in the future, and it can be used to implement SO_TIMESTAMP.
The implementation only supports a single iovec for now.
Some might say having more than one iovec is the main point of
recvmsg() and sendmsg(), but I'm interested in the control message
bits.
There are plenty of places in the kernel that aren't
checking if they actually got their allocation.
This fixes some of them, but definitely not all.
Fixes#3390Fixes#3391
Also, let's make find_one_free_page() return nullptr
if it doesn't get a free index. This stops the kernel
crashing when out of memory and allows memory purging
to take place again.
Fixes#3487
Before e06362de9487806df92cf2360a42d3eed905b6bf this was a sneaky buffer
overflow. BufferStream did not do range checking and continued to write
past the allocated buffer (the size of which was controlled by the
user.)
The issue surfaced after my changes because OutputMemoryStream does
range checking.
Not sure how exploitable that bug was, directory entries are somewhat
controllable by the user but the buffer was on the heap, so exploiting
that should be tough.
Fixes two flaws in the thread donation logic: Scheduler::donate_to
would never really donate, but just trigger a deferred yield. And
that deferred yield never actually donated to the beneficiary.
So, when we can't immediately donate, we need to save the beneficiary
and use this information as soon as we can perform the deferred
context switch.
Fixes#3495
Fix gracefully failing these calls if used within IRQ handlers. If we're
handling IRQs, we need to handle these failures first, because we can't
really resolve page faults in a meaningful way. But if we know that it
was one of these functions that failed, then we can gracefully handle
the situation.
This solves a crash where the Scheduler attempts to produce backtraces
in the timer irq, some of which cause faults.
Fixes#3492
Since the CPU already does almost all necessary validation steps
for us, we don't really need to attempt to do this. Doing it
ourselves doesn't really work very reliably, because we'd have to
account for other processors modifying virtual memory, and we'd
have to account for e.g. pages not being able to be allocated
due to insufficient resources.
So change the copy_to/from_user (and associated helper functions)
to use the new safe_memcpy, which will return whether it succeeded
or not. The only manual validation step needed (which the CPU
can't perform for us) is making sure the pointers provided by user
mode aren't pointing to kernel mappings.
To make it easier to read/write from/to either kernel or user mode
data add the UserOrKernelBuffer helper class, which will internally
either use copy_from/to_user or directly memcpy, or pass the data
through directly using a temporary buffer on the stack.
Last but not least we need to keep syscall params trivial as we
need to copy them from/to user mode using copy_from/to_user.
These special functions can be used to safely copy/set memory or
determine the length of a string, e.g. provided by user mode.
In the event of a page fault, safe_memcpy/safe_memset will return
false and safe_strnlen will return -1.
I decided to modify MappedROM.h because all other entried in Forward.h
are also classes, and this is visually more pleasing.
Other than that, it just doesn't make any difference which way we resolve
the conflicts.
Since "rings" typically refer to code execution and user processes
can also execute in ring 0, rename these functions to more accurately
describe what they mean: kernel processes and user processes.
The ring is determined based on the CS register. This fixes crashes
being handled as ring 3 crashes even though EIP/CS clearly showed
that the crash happened in the kernel.
In addition to being the proper POSIX etiquette, it seems like a bad idea
for issues like the one seen in #3428 to result in a kernel crash. This patch
replaces the current behavior of failing on insufficient buffer size to truncating
SOCK_RAW messages to the buffer size. This will have to change if/when MSG_PEEK
is implemented, but for now this behavior is more compliant and logical than
just bailing.