Kernel: Don't finalize a thread while it still has code running

After marking a thread for death we might end up finalizing the thread
while it still has code to run, e.g. via:

Thread::block -> Thread::dispatch_one_pending_signal
-> Thread::dispatch_signal -> Process::terminate_due_to_signal
-> Process::die -> Process::kill_all_threads -> Thread::set_should_die

This marks the thread for death. It isn't destroyed at this point
though.
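
A simplified sketch (hypothetical and condensed, not the verbatim
kernel code) of what set_should_die amounts to:

    void Thread::set_should_die()
    {
        // Only records the intent to die. The Thread object, its
        // kernel stack, and any frames still live on that stack
        // (such as Thread::block) stay around until the finalizer
        // reaps the thread.
        m_should_die = true;
    }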

The scheduler then gets invoked via:

Thread::block -> Thread::relock_process

At that point we still have a registered blocker in the stack frame
belonging to Thread::block. Thread::relock_process drops the critical
section, which allows the scheduler to run.
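
Schematically (a hypothetical, simplified shape of Thread::block;
register_blocker/unregister_blocker are made-up names), the blocker is
still registered when the critical section is dropped:

    void Thread::block(/* ... */)
    {
        SomeBlocker blocker;         // lives in this stack frame, on
                                     // the dying thread's kernel stack
        register_blocker(blocker);   // thread now points at the blocker
        // ... a pending signal handled here may mark us for death ...
        relock_process(previous_locked, lock_count_to_restore);
        // ^ dropping the critical section lets the scheduler run while
        //   'blocker' is still registered; the cleanup below may never
        //   execute if the thread is finalized first
        unregister_blocker(blocker);
    }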

When the thread is then scheduled out, the scheduler sets the thread
state to Thread::Dying, which allows the finalizer to destroy the
Thread object and its associated resources, including the kernel stack.
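
The finalizer side, roughly (finalize_one is a hypothetical name for
illustration; the real work happens in FinalizerTask):

    void finalize_one(Thread& thread)
    {
        VERIFY(thread.state() == Thread::Dying);
        thread.finalize();  // tears down the Thread and its kernel
                            // stack, even though Thread::block's frame
                            // (and the blocker in it) may still be
                            // live on that stack
    }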

This probably also affects objects other than blockers that rely on
their destructor being run. However, the problem was most noticeable
with blockers because they are allocated on the stack of the dying
thread and cause an access violation when another thread touches a
blocker that belonged to the now-dead thread.
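
A minimal userspace analogue (standalone and hypothetical, not kernel
code) of why that pattern breaks:

    #include <vector>

    struct Blocker {
        bool unblocked { false };
        ~Blocker() { /* would deregister itself here */ }
    };

    static std::vector<Blocker*> g_blockers; // shared list of blockers

    void doomed_frame()
    {
        Blocker blocker;                // lives on this "thread's" stack
        g_blockers.push_back(&blocker);
        // If the stack disappears without running ~Blocker and without
        // removing the pointer, g_blockers now dangles.
    }

    void another_thread()
    {
        for (auto* b : g_blockers)
            b->unblocked = true;        // the reported access violation
    }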

Fixes #7823.
Author: Gunnar Beutner 2021-06-06 11:40:11 +02:00, committed by Andreas Kling
parent cab2ee5ea2
commit 3c2a6a25da
2 changed files with 12 additions and 18 deletions

--- a/Kernel/Scheduler.cpp
+++ b/Kernel/Scheduler.cpp
@@ -175,8 +175,6 @@ bool Scheduler::pick_next()
 {
     VERIFY_INTERRUPTS_DISABLED();
 
-    auto current_thread = Thread::current();
-
     // Set the m_in_scheduler flag before acquiring the spinlock. This
     // prevents a recursive call into Scheduler::invoke_async upon
     // leaving the scheduler lock.
@@ -194,22 +192,6 @@ bool Scheduler::pick_next()
 
     ScopedSpinLock lock(g_scheduler_lock);
 
-    if (current_thread->should_die() && current_thread->state() == Thread::Running) {
-        // Rather than immediately killing threads, yanking the kernel stack
-        // away from them (which can lead to e.g. reference leaks), we always
-        // allow Thread::wait_on to return. This allows the kernel stack to
-        // clean up and eventually we'll get here shortly before transitioning
-        // back to user mode (from Processor::exit_trap). At this point we
-        // no longer want to schedule this thread. We can't wait until
-        // Scheduler::enter_current because we don't want to allow it to
-        // transition back to user mode.
-        if constexpr (SCHEDULER_DEBUG)
-            dbgln("Scheduler[{}]: Thread {} is dying", Processor::id(), *current_thread);
-        current_thread->set_state(Thread::Dying);
-    }
-
     if constexpr (SCHEDULER_RUNNABLE_DEBUG) {
         dump_thread_list();
     }

--- a/Kernel/Thread.cpp
+++ b/Kernel/Thread.cpp
@@ -241,6 +241,16 @@ void Thread::die_if_needed()
     u32 unlock_count;
     [[maybe_unused]] auto rc = unlock_process_if_locked(unlock_count);
 
     dbgln_if(THREAD_DEBUG, "Thread {} is dying", *this);
 
+    {
+        ScopedSpinLock lock(g_scheduler_lock);
+        // It's possible that we don't reach the code after this block if the
+        // scheduler is invoked and FinalizerTask cleans up this thread, however
+        // that doesn't matter because we're trying to invoke the scheduler anyway
+        set_state(Thread::Dying);
+    }
+
     ScopedCritical critical;
     // Flag a context switch. Because we're in a critical section,
@@ -316,6 +326,8 @@ LockMode Thread::unlock_process_if_locked(u32& lock_count_to_restore)
 void Thread::relock_process(LockMode previous_locked, u32 lock_count_to_restore)
 {
+    VERIFY(state() == Thread::Running);
+
     // Clearing the critical section may trigger the context switch
     // flagged by calling Scheduler::donate_to or Scheduler::yield
     // above. We have to do it this way because we intentionally