Each process has a 1-level lookup cache for fast repeated lookups of
the same VM region (which tends to be the majority of lookups.)
The cache is used by the following syscalls: munmap, madvise, mprotect
and set_mmap_name.
After a succesful exec(), there could be a stale Region* in the lookup
cache, and the new executable was able to manipulate it using a number
of use-after-free code paths.
Now the ACPI & PCI code is more safer, because we don't use raw pointers
or references to objects or data that are located in the physical
address space, so an accidental dereference cannot happen easily.
Instead, we use the PhysicalAddress class to represent those addresses.
Now we use the GenericInterruptHandler class instead of IRQHandler in
the CPU functions.
This commit adds an include to the ISR stub macros header file.
Also, this commit adds support for IRQ sharing, so when an IRQHandler
will try to register to already-assigned IRQ number, a SharedIRQHandler
will be created to register both IRQHandlers.
Also, the enable() function is now correct and will use the right
registers and values. In addition to that, write_register() and
read_registers() are not relying on identity mapping anymore.
Also, update the class implementation to use PCI::Device class
accordingly.
The create() helper will now search for an IDE controller in the
PCI bus, allowing to simplify the initialize() method.
This class represents a shared interrupt handler. This class will not be
created automatically but only if two IRQ Handlers are sharing the same
IRQ number.
The GenericInterruptHandler class will be used to represent
an abstract interrupt handler. The InterruptManagement class will
represent a centralized component to manage interrupts.
I probably would've done INI config removal in another commit, but it
fit well here because I didn't want to pledge wpath for SystemMenu if I
didn't need to.
Frankly, that's something that I think should be done: allow ConfigFile
to be used read-only.
Each allocation header was tracking its index into the chunk bitmap,
but that index can be computed from the allocation address anyway.
Removing this means that each allocation gets 4 more bytes of memory
and this avoids allocating an extra chunk in many cases. :^)
When committing to a new executable, disown any shared buffers that the
process was previously co-owning.
Otherwise accessing the same shared buffer ID from the new program
would cause the kernel to find a cached (and stale!) reference to the
previous program's VM region corresponding to that shared buffer,
leading to a Region* use-after-free.
Fixes#1270.
Since we're gonna throw away these stacks at the end of exec anyway,
we might as well disable profiling before starting to mess with the
process page tables. One less weird situation to worry about in the
sampling code.
Linux creates holes in block lists for all-zero content. This is very
reasonable and we can now handle that situation as well.
Note that we're not smart enough to generate these holes ourselves yet,
but now we can at least read from such files.
The kernel sampling profiler will walk thread stacks during the timer
tick handler. Since it's not safe to trigger page faults during IRQ's,
we now avoid this by checking the page tables manually before accessing
each stack location.
We're not equipped to deal with page faults during an IRQ handler,
so add an assertion so we can immediately tell what's wrong.
This is why profiling sometimes hangs the system -- walking the stack
of the profiled thread causes a page fault and things fall apart.
If we get an -ENOENT when resolving the target because of some part, that is not
the very last part, missing, we should just return the error instead of panicking
later :^)
To test:
$ mkdir /tmp/foo/
$ mv /tmp/foo/ /tmp/bar/
Related to https://github.com/SerenityOS/serenity/issues/1253
This is apparently a special case unlike any other, so let's handle it
directly in VFS::mkdir() instead of adding an alternative code path into
VFS::resolve_path().
Fixes https://github.com/SerenityOS/serenity/issues/1253
Anonymous VM objects should never have null entries in their physical
page list. Instead, "empty" or untouched pages should refer to the
shared zero page.
Fixes#1237.
This will allow us to run the system menu as any user. It will also
enable further lockdown of the WindowServer process since it should no
longer need to pledge proc and exec. :^)
Note that this program is not finished yet.
Work towards #1231.
Process teardown is divided into two main stages: finalize and reap.
Finalization happens in the "Finalizer" kernel and runs with interrupts
enabled, allowing destructors to take locks, etc.
Reaping happens either in sys$waitid() or in the scheduler for orphans.
The more work we can do in finalization, the better, since it's fully
pre-emptible and reduces the amount of time the system runs without
interrupts enabled.
Replace Process::m_being_inspected with an inspector reference count.
This prevents an assertion from firing when inspecting the same process
in /proc from multiple processes at the same time.
It was trivially reproducible by opening multiple FileManagers.
This patch adds NotificationServer, which runs as the "notify" user
and provides an IPC API for desktop notifications.
LibGUI gains the GUI::Notification class for showing notifications.
NotificationServer is spawned on demand and will unspawn after
dimissing all visible notifications. :^)
Finally, this also comes with a small /bin/notify utility.
This was actually rather painless and straightforward. WindowServer now
runs as the "window" user. Users in the "window" group can connect to
it via the socket in /tmp/portal/window as usual.
This mechanism wasn't actually used to create any WeakPtr<Process>.
Such pointers would be pretty hard to work with anyway, due to the
multi-step destruction ritual of Process.
A 16-bit refcount is just begging for trouble right nowl.
A 32-bit refcount will be begging for trouble later down the line,
so we'll have to revisit this eventually. :^)
This patch adds a globally shared zero-filled PhysicalPage that will
be mapped into every slot of every zero-filled AnonymousVMObject until
that page is written to, achieving CoW-like zero-filled pages.
Initial testing show that this doesn't actually achieve any sharing yet
but it seems like a good design regardless, since it may reduce the
number of page faults taken by programs.
If you look at the refcount of MM.shared_zero_page() it will have quite
a high refcount, but that's just because everything maps it everywhere.
If you want to see the "real" refcount, you can build with the
MAP_SHARED_ZERO_PAGE_LAZILY flag, and we'll defer mapping of the shared
zero page until the first NP read fault.
I've left this behavior behind a flag for future testing of this code.
This was only used by HashTable::dump() which I used when doing the
first HashTable implementation. Removing this allows us to also remove
most includes of <AK/kstdio.h>.
Warnings fixed:
* SC2086: Double quote to prevent globbing and word splitting.
* SC2006: Use $(...) notation instead of legacy backticked `...`
* SC2039: In POSIX sh, echo flags are undefined
* SC2209: Use var=$(command) to assign output (or quote to assign string)
* SC2164: Use 'cd ... || exit' or 'cd ... || return' in case cd fails
* SC2166: Prefer [ p ] && [ q ] as [ p -a q ] is not well defined.
* SC2034: i appears unused. Verify use (or export if used externally)
* SC2046: Quote this to prevent word splitting.
* SC2236: Use -z instead of ! -n.
There are still a lot of warnings in Kernel/run about:
- SC2086: Double quote to prevent globbing and word splitting.
However, splitting on space is intentional in this case, and not trivial to
change. Therefore ignore the warning for now - but we should fix this in
the future.
This server listens on port 8000 and serves HTML files from /www.
It's very simple and quite naive, but I think we can start here and
build our way to something pretty neat.
Work towards #792.
We can now participate in the TCP connection closing handshake. :^)
This implementation is definitely not complete and needs to handle a
bunch of other cases. But it's a huge improvement over not being able
to close connections at all.
Note that we hold on to pending-close sockets indefinitely, until they
are moved into the Closed state. This should also have a timeout but
that's still a FIXME. :^)
Fixes#428.
It doesn't look healthy to create raw references into an array before
a temporary unlock. In fact, that temporary unlock looks generally
unhealthy, but it's a different problem.
Calling shutdown prevents further reads and/or writes on a socket.
We should do a few more things based on the type of socket, but this
initial implementation just puts the basic mechanism in place.
Work towards #428.
The idea behind WeakPtr<NetworkAdapter> was to support hot-pluggable
network adapters, but on closer thought, that's super impractical so
let's not go down that road.
If there's not enough space in the output buffer for the whole sockaddr
we now simply truncate the address instead of returning EINVAL.
This patch also makes getpeername() actually return the peer address
rather than the local address.. :^)
According to POSIX, waitid() should fill si_signo and si_pid members
with zeroes if there are no children that have already changed their
state by the time of the call. Let's just fill the whole structure
with zeroes to avoid leaking kernel memory.
sys$waitid() takes an explicit description of whether it's waiting for a single
process with the given PID, all of the children, a group, etc., and returns its
info as a siginfo_t.
It also doesn't automatically imply WEXITED, which clears up the confusion in
the kernel.
We add this feature together with the VMWareBackdoor class.
VMWareBackdoor class is responsible for enabling the vmmouse, and then
controlling it from the PS2 mouse IRQ handler.
On my system (Void Linux) the root user has a default umask of 0077,
causing files and directories in the disk image to have zero group and
world permissions.
This patch introduces sys$perf_event() with two event types:
- PERF_EVENT_MALLOC
- PERF_EVENT_FREE
After the first call to sys$perf_event(), a process will begin keeping
these events in a buffer. When the process dies, that buffer will be
written out to "perfcore" in the current directory unless that filename
is already taken.
This is probably not the best way to do this, but it's a start and will
make it possible to start doing memory allocation profiling. :^)