Kernel: Overhaul system shutdown procedure

For a long time, our shutdown procedure has basically been:
- Acquire big process lock.
- Switch framebuffer to Kernel debug console.
- Sync and lock all file systems so that disk caches are flushed and
  files are in a good state.
- Use firmware and architecture-specific functionality to perform
  hardware shutdown.

This naive and simple shutdown procedure has multiple issues:
- No processes are terminated properly, meaning they cannot perform more
  complex cleanup work. If they were in the middle of I/O, for instance,
  only the data that already reached the Kernel is written to disk, and
  data corruption due to unfinished writes can therefore still occur.
- No file systems are unmounted, meaning that any important unmount work
  will never happen. This is important for e.g. Ext2, which has
  facilities for detecting improper unmounts (see superblock's s_state
  variable) and therefore requires a proper unmount to be performed.
  This was also the starting point for this PR, since I wanted to
  introduce basic Ext2 file system checking and unmounting.
- No hardware is properly shut down beyond what the system firmware does
  on its own.
- Shutdown is performed within the write() call that asked the Kernel to
  change its power state. If the shutdown procedure takes longer (i.e.
  when it's done properly), this blocks the process causing the shutdown
  and prevents any potentially-useful interactions between Kernel and
  userland during shutdown.

In essence, the current shutdown is a glorified system crash with minimal
file system cleanliness guarantees.

Therefore, this commit is the first step in improving our shutdown
procedure. The new shutdown flow is now as follows:
- From the write() call to the power state SysFS node, a new task is
  started, the Power State Switch Task. Its only purpose is to change
  the operating system's power state. This task takes over shutdown and
  reboot duties, although reboot is not modified in this commit.
- The Power State Switch Task assumes that userland has performed all
  shutdown duties it can perform on its own. In particular, it assumes
  that all kinds of clean process shutdown have been done, and remaining
  processes can be hard-killed without consequence. This is an important
  separation of concerns: While this commit does not modify userland, in
  the future SystemServer will be responsible for performing proper
  shutdown of user processes, including timeouts for stubborn processes
  etc.
- As mentioned above, the task hard-kills remaining user processes.
- The task hard-kills all Kernel processes except itself and the
  Finalizer Task. Since Kernel processes can delay their own shutdown
  indefinitely if they want to, they have plenty of opportunity to perform
  proper shutdown if necessary. This may become a problem with
  non-cooperative Kernel tasks, but as seen two commits earlier, for now
  all tasks will cooperate within a few seconds.
- The task waits for the Finalizer Task to clean up all processes.
- The task hard-kills and finalizes the Finalizer Task itself, meaning
  that it now is the only remaining process in the system.
- The task syncs and locks all file systems, and then unmounts them. Due
  to an unknown refcount bug we currently cannot unmount the root file
  system; therefore the task is able to abort the clean unmount if
  necessary.
- The task performs platform-dependent hardware shutdown as before.
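
The ordering of the steps above can be illustrated with a small, self-contained simulation. This is only a sketch: `FakeProcess`, `Kind` and `simulate_shutdown` are hypothetical names invented for this example and are not part of the Kernel. It models only the order in which things happen, not real process teardown.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the kernel's process list; not the real Kernel::Process API.
enum class Kind { User, Kernel, Finalizer, PowerStateSwitch };

struct FakeProcess {
    std::string name;
    Kind kind;
    bool alive;
};

// Runs the shutdown steps in the order the commit message lists them
// and returns a log of what happened.
std::vector<std::string> simulate_shutdown(std::vector<FakeProcess>& processes)
{
    std::vector<std::string> log;
    auto kill_all = [&](Kind kind) {
        for (auto& process : processes) {
            if (process.kind == kind && process.alive) {
                process.alive = false;
                log.push_back("kill " + process.name);
            }
        }
    };

    kill_all(Kind::User);                     // user processes go first
    kill_all(Kind::Kernel);                   // then Kernel processes, except finalizer and ourselves
    log.push_back("wait for Finalizer Task"); // the finalizer reaps everything killed so far
    kill_all(Kind::Finalizer);                // finally the finalizer itself
    log.push_back("sync and lock file systems");
    log.push_back("unmount file systems");
    log.push_back("platform shutdown");
    return log;
}
```

In this model the power state switch process is never touched by `kill_all`, so it is the only entry still marked alive when the file system steps begin.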

This commit has multiple remaining issues (or exposes existing ones)
which will need to be addressed in the future but are out of scope for
now:
- Unmounting the root filesystem is impossible due to remaining
  references to the inodes /home and /home/anon. I investigated this
  very heavily and could not find whoever is holding the last two
  references.
- Userland cannot perform proper cleanup, since the Kernel's power state
  variable is accessed directly by tools instead of a proper userland
  shutdown procedure directed by SystemServer.

The recently introduced Firmware/PowerState procedures are removed
again, since all of the architecture-independent code can live in the
power state switch task. The architecture-specific code is kept,
however.
kleines Filmröllchen 2023-07-10 00:17:11 +02:00 committed by Jelle Raaijmakers
parent 2fd23745a9
commit b645f87b7a
Notes: sideshowbarker 2024-07-17 01:11:48 +09:00
6 changed files with 229 additions and 81 deletions

@ -211,7 +211,6 @@ set(KERNEL_SOURCES
Firmware/ACPI/Initialize.cpp
Firmware/ACPI/Parser.cpp
Firmware/ACPI/StaticParsing.cpp
Firmware/PowerState.cpp
Interrupts/GenericInterruptHandler.cpp
Interrupts/IRQHandler.cpp
Interrupts/PCIIRQHandler.cpp

@ -1,17 +1,14 @@
/*
* Copyright (c) 2018-2020, Andreas Kling <kling@serenityos.org>
* Copyright (c) 2021, Liav A. <liavalb@hotmail.co.il>
* Copyright (c) 2023, kleines Filmröllchen <filmroellchen@serenityos.org>
*
* SPDX-License-Identifier: BSD-2-Clause
*/
#include <AK/Platform.h>
#include <Kernel/FileSystem/FileSystem.h>
#include <Kernel/FileSystem/SysFS/Subsystems/Kernel/PowerStateSwitch.h>
#include <Kernel/Firmware/ACPI/Parser.h>
#include <Kernel/Firmware/PowerState.h>
#include <Kernel/Sections.h>
#include <Kernel/TTY/ConsoleManagement.h>
#include <Kernel/Tasks/PowerStateSwitchTask.h>
#include <Kernel/Tasks/Process.h>
namespace Kernel {
@ -55,13 +52,14 @@ ErrorOr<size_t> SysFSPowerStateSwitchNode::write_bytes(off_t offset, size_t coun
 TRY(data.read(buf, 1));
 switch (buf[0]) {
 case '1':
-Firmware::reboot();
-VERIFY_NOT_REACHED();
+PowerStateSwitchTask::reboot();
+return 1;
 case '2':
-Firmware::poweroff();
-VERIFY_NOT_REACHED();
+PowerStateSwitchTask::shutdown();
+return 1;
 default:
 return Error::from_errno(EINVAL);
 }
 }
 }
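
For context, the hunk above changes what a caller of write() observes: the call now returns 1 once the task has been spawned instead of never returning. A userland tool could trigger it roughly as sketched below. `request_power_state` is a hypothetical helper written for this example, and the node path is deliberately a parameter because the diff does not show where the SysFS node is mounted.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Writes a single command byte to the power state node.
// '1' requests a reboot and '2' a shutdown, matching the switch in write_bytes().
// Returns false if the node cannot be opened or the write fails.
bool request_power_state(std::string const& node_path, char command)
{
    std::ofstream node(node_path, std::ios::binary);
    if (!node)
        return false;
    node.put(command);
    node.flush();
    return node.good();
}
```

Any other byte is rejected by the Kernel with EINVAL, so a real tool would want to check the result and report the error.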

@ -1,55 +0,0 @@
/*
* Copyright (c) 2023, Liav A. <liavalb@hotmail.co.il>
*
* SPDX-License-Identifier: BSD-2-Clause
*/
#include <AK/Format.h>
#include <Kernel/Arch/PowerState.h>
#include <Kernel/FileSystem/FileSystem.h>
#include <Kernel/Firmware/ACPI/Parser.h>
#include <Kernel/Firmware/PowerState.h>
#include <Kernel/TTY/ConsoleManagement.h>
#include <Kernel/Tasks/Process.h>
namespace Kernel::Firmware {
void reboot()
{
MutexLocker locker(Process::current().big_lock());
dbgln("acquiring FS locks...");
FileSystem::lock_all();
dbgln("syncing mounted filesystems...");
FileSystem::sync();
dbgln("attempting reboot via ACPI");
if (ACPI::is_enabled())
ACPI::Parser::the()->try_acpi_reboot();
arch_specific_reboot();
dbgln("reboot attempts failed, applications will stop responding.");
dmesgln("Reboot can't be completed. It's safe to turn off the computer!");
Processor::halt();
}
void poweroff()
{
MutexLocker locker(Process::current().big_lock());
ConsoleManagement::the().switch_to_debug();
dbgln("acquiring FS locks...");
FileSystem::lock_all();
dbgln("syncing mounted filesystems...");
FileSystem::sync();
dbgln("attempting system shutdown...");
arch_specific_poweroff();
dbgln("shutdown attempts failed, applications will stop responding.");
dmesgln("Shutdown can't be completed. It's safe to turn off the computer!");
Processor::halt();
}
}

@ -1,16 +0,0 @@
/*
* Copyright (c) 2023, Liav A. <liavalb@hotmail.co.il>
*
* SPDX-License-Identifier: BSD-2-Clause
*/
#pragma once
#include <AK/Types.h>
namespace Kernel::Firmware {
void reboot();
void poweroff();
}

@ -4,8 +4,199 @@
* SPDX-License-Identifier: BSD-2-Clause
*/
#include <AK/Platform.h>
#if ARCH(X86_64)
# include <Kernel/Arch/x86_64/I8042Reboot.h>
# include <Kernel/Arch/x86_64/Shutdown.h>
#elif ARCH(AARCH64)
# include <Kernel/Arch/aarch64/RPi/Watchdog.h>
#endif
#include <AK/StringView.h>
#include <Kernel/Arch/PowerState.h>
#include <Kernel/FileSystem/FileSystem.h>
#include <Kernel/FileSystem/VirtualFileSystem.h>
#include <Kernel/Firmware/ACPI/Parser.h>
#include <Kernel/Library/Panic.h>
#include <Kernel/Sections.h>
#include <Kernel/TTY/ConsoleManagement.h>
#include <Kernel/Tasks/FinalizerTask.h>
#include <Kernel/Tasks/PowerStateSwitchTask.h>
#include <Kernel/Tasks/Process.h>
#include <Kernel/Tasks/Scheduler.h>
namespace Kernel {
static constexpr StringView power_state_switch_task_name_view = "Power State Switch Task"sv;
Thread* g_power_state_switch_task;
bool g_in_system_shutdown { false };
void PowerStateSwitchTask::power_state_switch_task(void* raw_entry_data)
{
Thread::current()->set_priority(THREAD_PRIORITY_HIGH);
auto entry_data = bit_cast<PowerStateCommand>(raw_entry_data);
switch (entry_data) {
case PowerStateCommand::Shutdown:
MUST(PowerStateSwitchTask::perform_shutdown());
break;
case PowerStateCommand::Reboot:
MUST(PowerStateSwitchTask::perform_reboot());
break;
default:
PANIC("Unknown power state command: {}", to_underlying(entry_data));
}
// Although common, the system may not halt through this task.
// Clear the power state switch task so that it can be spawned again.
g_power_state_switch_task = nullptr;
}
void PowerStateSwitchTask::spawn(PowerStateCommand command)
{
// FIXME: If we switch power states during memory pressure, don't let the system crash just because of our task name.
NonnullOwnPtr<KString> power_state_switch_task_name = MUST(KString::try_create(power_state_switch_task_name_view));
VERIFY(g_power_state_switch_task == nullptr);
auto [_, power_state_switch_task_thread] = MUST(Process::create_kernel_process(
move(power_state_switch_task_name), power_state_switch_task, bit_cast<void*>(command)));
g_power_state_switch_task = move(power_state_switch_task_thread);
}
ErrorOr<void> PowerStateSwitchTask::perform_reboot()
{
dbgln("acquiring FS locks...");
FileSystem::lock_all();
dbgln("syncing mounted filesystems...");
FileSystem::sync();
dbgln("attempting reboot via ACPI");
if (ACPI::is_enabled())
ACPI::Parser::the()->try_acpi_reboot();
arch_specific_reboot();
dbgln("reboot attempts failed, applications will stop responding.");
dmesgln("Reboot can't be completed. It's safe to turn off the computer!");
Processor::halt();
}
ErrorOr<void> PowerStateSwitchTask::perform_shutdown()
{
// We assume that by this point userland has tried as much as possible to shut down everything in an orderly fashion.
// Therefore, we force kill remaining processes, including Kernel processes, except the finalizer and ourselves.
dbgln("Killing remaining processes...");
Optional<Process&> finalizer_process;
Process::all_instances().for_each([&](Process& process) {
if (process.pid() == g_finalizer->process().pid())
finalizer_process = process;
});
VERIFY(finalizer_process.has_value());
// Allow init process and finalizer task to be killed.
g_in_system_shutdown = true;
// Make sure to kill all user processes first, otherwise we might get weird hangups.
TRY(kill_processes(ProcessKind::User, finalizer_process->pid()));
TRY(kill_processes(ProcessKind::Kernel, finalizer_process->pid()));
finalizer_process->die();
finalizer_process->finalize();
size_t alive_process_count = 0;
Process::all_instances().for_each([&](Process& process) {
if (process.pid() != Process::current().pid() && !process.is_dead())
alive_process_count++;
});
// Don't panic here (since we may panic in a bit anyways) but report the probable cause of an unclean shutdown.
if (alive_process_count != 0)
dbgln("We're not the last process alive; proper shutdown may fail!");
ConsoleManagement::the().switch_to_debug();
dbgln("Locking all file systems...");
FileSystem::lock_all();
FileSystem::sync();
dbgln("Unmounting all file systems...");
auto unmount_was_successful = true;
while (unmount_was_successful) {
unmount_was_successful = false;
Vector<Mount&, 16> mounts;
TRY(VirtualFileSystem::the().for_each_mount([&](auto const& mount) -> ErrorOr<void> {
TRY(mounts.try_append(const_cast<Mount&>(mount)));
return {};
}));
if (mounts.is_empty())
break;
auto const remaining_mounts = mounts.size();
while (!mounts.is_empty()) {
auto& mount = mounts.take_last();
mount.guest_fs().flush_writes();
auto mount_path = TRY(mount.absolute_path());
auto& mount_inode = mount.guest();
auto const result = VirtualFileSystem::the().unmount(mount_inode, mount_path->view());
if (result.is_error()) {
dbgln("Error during unmount of {}: {}", mount_path, result.error());
// FIXME: For unknown reasons the root FS stays busy even after everything else has shut down and was unmounted.
// Until we find the underlying issue, allow an unclean shutdown here.
if (remaining_mounts <= 1)
dbgln("BUG! One mount remaining; the root file system may not be unmountable at all. Shutting down anyways.");
} else {
unmount_was_successful = true;
}
}
}
dbgln("Attempting system shutdown...");
arch_specific_poweroff();
dbgln("shutdown attempts failed, applications will stop responding.");
dmesgln("Shutdown can't be completed. It's safe to turn off the computer!");
Processor::halt();
}
ErrorOr<void> PowerStateSwitchTask::kill_processes(ProcessKind kind, ProcessID finalizer_pid)
{
bool kill_kernel_processes = kind == ProcessKind::Kernel;
Process::all_instances().for_each([&](Process& process) {
if (process.pid() != Process::current().pid() && process.pid() != finalizer_pid && process.is_kernel_process() == kill_kernel_processes) {
process.die();
}
});
// Although we *could* finalize processes ourselves (g_in_system_shutdown allows this),
// we're nice citizens and let the finalizer task perform final duties before we kill it.
Scheduler::notify_finalizer();
int alive_process_count = 1;
MonotonicTime last_status_time = TimeManagement::the().monotonic_time();
while (alive_process_count > 0) {
Scheduler::yield();
alive_process_count = 0;
Process::all_instances().for_each([&](Process& process) {
if (process.pid() != Process::current().pid() && !process.is_dead() && process.pid() != finalizer_pid && process.is_kernel_process() == kill_kernel_processes)
alive_process_count++;
});
if (TimeManagement::the().monotonic_time() - last_status_time > Duration::from_seconds(2)) {
last_status_time = TimeManagement::the().monotonic_time();
dmesgln("Waiting on {} processes to exit...", alive_process_count);
if constexpr (PROCESS_DEBUG) {
Process::all_instances().for_each_const([&](Process const& process) {
if (process.pid() != Process::current().pid() && !process.is_dead() && process.pid() != finalizer_pid && process.is_kernel_process() == kill_kernel_processes) {
dbgln("Process {:2} kernel={} dead={} dying={} ({})",
process.pid(), process.is_kernel_process(), process.is_dead(), process.is_dying(),
process.name().with([](auto& name) { return name->view(); }));
}
});
}
}
}
return {};
}
}
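
The unmount loop in perform_shutdown() keeps making passes over the mount list, newest mount first, and stops once a pass unmounts nothing. A minimal sketch of that retry-until-no-progress pattern, outside the Kernel, might look as follows. `unmount_all` and the `try_unmount` callback are hypothetical names; the callback stands in for VirtualFileSystem::unmount() and simply reports success or failure.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Repeatedly tries to unmount every mount, most recent first,
// until a full pass makes no progress. Returns the mounts that stayed busy.
std::vector<std::string> unmount_all(std::vector<std::string> mounts,
    std::function<bool(std::string const&)> const& try_unmount)
{
    bool made_progress = true;
    while (made_progress && !mounts.empty()) {
        made_progress = false;
        std::vector<std::string> still_mounted;
        // Walk from the most recently added mount to the oldest.
        for (auto it = mounts.rbegin(); it != mounts.rend(); ++it) {
            if (try_unmount(*it))
                made_progress = true;
            else
                still_mounted.push_back(*it);
        }
        // still_mounted is newest-first; restore oldest-first order for the next pass.
        mounts.assign(still_mounted.rbegin(), still_mounted.rend());
    }
    return mounts;
}
```

With a callback that always refuses the root mount, the function returns a list containing only the root, which mirrors the "one mount remaining" case the commit logs as a bug and then tolerates.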

@ -4,8 +4,39 @@
* SPDX-License-Identifier: BSD-2-Clause
*/
#pragma once
#include <AK/Forward.h>
#include <Kernel/Forward.h>
namespace Kernel {
enum class PowerStateCommand : uintptr_t {
Shutdown,
Reboot,
};
// We will pass the power state command to the task in place of a void* as to avoid the complications of raw allocations.
static_assert(sizeof(PowerStateCommand) == sizeof(void*));
extern bool g_in_system_shutdown;
class PowerStateSwitchTask {
public:
static void shutdown() { spawn(PowerStateCommand::Shutdown); }
static void reboot() { spawn(PowerStateCommand::Reboot); }
private:
static void spawn(PowerStateCommand);
static void power_state_switch_task(void* raw_entry_data);
static ErrorOr<void> perform_reboot();
static ErrorOr<void> perform_shutdown();
enum class ProcessKind {
User,
Kernel,
};
static ErrorOr<void> kill_processes(ProcessKind, ProcessID finalizer_pid);
};
}
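
The header relies on PowerStateCommand being pointer-sized so that the command can travel through the task's void* entry argument without a heap allocation. The same idea can be sketched in portable C++ as below. `Command`, `encode_command` and `decode_command` are hypothetical names for this example, and plain casts are used here as a stand-in for the bit_cast helper the Kernel code calls.

```cpp
#include <cassert>
#include <cstdint>

// An enum whose underlying type is exactly as wide as a pointer.
enum class Command : std::uintptr_t {
    Shutdown,
    Reboot,
};
static_assert(sizeof(Command) == sizeof(void*), "Command must fit exactly into a pointer");

// Packs the command into the opaque pointer a thread entry function receives.
void* encode_command(Command command)
{
    return reinterpret_cast<void*>(static_cast<std::uintptr_t>(command));
}

// Recovers the command on the receiving side; the pointer is never dereferenced.
Command decode_command(void* raw_entry_data)
{
    return static_cast<Command>(reinterpret_cast<std::uintptr_t>(raw_entry_data));
}
```

The trade-off is that the receiver must treat the pointer purely as a value. An unexpected value can still arrive, which is presumably why the task's switch statement keeps a default branch that panics.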