Previously this API would return an InodeIdentifier, which meant that
there was a race in path resolution where an inode could be unlinked
in between finding the InodeIdentifier for a path component, and
actually resolving that to an Inode object.
Attaching a test that would quickly trip an assertion before.
Test: Kernel/path-resolution-race.cpp
When using dbg() in the kernel, the output is automatically prefixed
with [Process(PID:TID)]. This makes it a lot easier to understand which
thread is generating the output.
This patch also cleans up some common logging messages and removes the
now-unnecessary "dbg() << *current << ..." pattern.
Sergey suggested that having a non-zero O_RDONLY would make some things
less confusing, and it seems like he's right about that.
We can now easily check read/write permissions separately instead of
dancing around with the bits.
This patch also fixes unveil() validation for O_RDWR which previously
forgot to check for "r" permission.
This syscall is a complement to pledge() and adds the same sort of
incremental relinquishing of capabilities for filesystem access.
The first call to unveil() will "drop a veil" on the process, and from
now on, only unveiled parts of the filesystem are visible to it.
Each call to unveil() specifies a path to either a directory or a file
along with permissions for that path. The permissions are a combination
of the following:
- r: Read access (like the "rpath" promise)
- w: Write access (like the "wpath" promise)
- x: Execute access
- c: Create/remove access (like the "cpath" promise)
Attempts to open a path that has not been unveiled with fail with
ENOENT. If the unveiled path lacks sufficient permissions, it will fail
with EACCES.
Like pledge(), subsequent calls to unveil() with the same path can only
remove permissions, not add them.
Once you call unveil(nullptr, nullptr), the veil is locked, and it's no
longer possible to unveil any more paths for the process, ever.
This concept comes from OpenBSD, and their implementation does various
things differently, I'm sure. This is just a first implementation for
SerenityOS, and we'll keep improving on it as we go. :^)
Previously, VFS::open() would only use the passed flags for permission checking
purposes, and Process::sys$open() would set them on the created FileDescription
explicitly. Now, they should be set by VFS::open() on any files being opened,
including files that the kernel opens internally.
This also lets us get rid of the explicit check for whether or not the returned
FileDescription was a preopen fd, and in fact, fixes a bug where a read-only
preopen fd without any other flags would be considered freshly opened (due to
O_RDONLY being indistinguishable from 0) and granted a new set of flags.
As suggested by Joshua, this commit adds the 2-clause BSD license as a
comment block to the top of every source file.
For the first pass, I've just added myself for simplicity. I encourage
everyone to add themselves as copyright holders of any file they've
added or modified in some significant way. If I've added myself in
error somewhere, feel free to replace it with the appropriate copyright
holder instead.
Going forward, all new source files should include a license header.
Symlink resolution is now a virtual method on an inode,
Inode::resolve_as_symlink(). The default implementation just reads the stored
inode contents, treats them as a path and calls through to VFS::resolve_path().
This will let us support other, magical files that appear to be plain old
symlinks but resolve to something else. This is particularly useful for ProcFS.
It turns out we don't even need to store the whole custody chain, as we only
ever access its last element. So we can just store one custody. This also fixes
a performance FIXME :^)
Also, rename parent_custody to out_parent.
This makes the implementation easier to follow, but also fixes multiple issues
with the old implementation. In particular, it now deals properly with . and ..
in paths, including around mount points.
Hopefully there aren't many new bugs this introduces :^)
You can now bind-mount files and directories. This essentially exposes an
existing part of the file system in another place, and can be used as an
alternative to symlinks or hardlinks.
Here's an example of doing this:
# mkdir /tmp/foo
# mount /home/anon/myfile.txt /tmp/foo -o bind
# cat /tmp/foo
This is anon's file.
We now support these mount flags:
* MS_NODEV: disallow opening any devices from this file system
* MS_NOEXEC: disallow executing any executables from this file system
* MS_NOSUID: ignore set-user-id bits on executables from this file system
The fourth flag, MS_BIND, is defined, but currently ignored.
O_EXEC is mentioned by POSIX, so let's have it. Currently, it is only used
inside the kernel to ensure the process has the right permissions when opening
an executable.
At the moment, the actual flags are ignored, but we correctly propagate them all
the way from the original mount() syscall to each custody that resides on the
mounted FS.
No need to pass around RefPtr<>s and NonnullRefPtr<>s and no need to
heap-allocate them.
Also remove VFS::mount(NonnullRefPtr<FS>&&, StringView path) - it has been
unused for a long time.
The chroot() syscall now allows the superuser to isolate a process into
a specific subtree of the filesystem. This is not strictly permanent,
as it is also possible for a superuser to break *out* of a chroot, but
it is a useful mechanism for isolating unprivileged processes.
The VFS now uses the current process's root_directory() as the root for
path resolution purposes. The root directory is stored as an uncached
Custody in the Process object.
If we're creating something that should have a different owner than the
current process's UID/GID, we need to plumb that all the way through
VFS down to the FS functions.
To accomodate file creation, path resolution optionally returns the
last valid parent directory seen while traversing the path.
Clients will then interpret "ENOENT, but I have a parent for you" as
meaning that the file doesn't exist, but its immediate parent directory
does. The client then goes ahead and creates a new file.
In the case of "/foo/bar/baz" where there is no "/foo", it would fail
with ENOENT and "/" as the last seen parent directory, causing e.g the
open() syscall to create "/baz".
Covered by test_io.
Cautiously use 5 as a limit for now so that we don't blow the stack.
This can be increased in the future if we are sure that we won't be
blowing the stack, or if the implementation is changed to not use
recursion :^)
It is now possible to unmount file systems from the VFS via `umount`.
It works via looking up the `fsid` of the filesystem from the `Inode`'s
metatdata so I'm not sure how fragile it is. It seems to work for now
though as something to get us going.
- You must now have superuser privileges to use mount().
- We now verify that the mount point is a valid path first, before
trying to find a filesystem on the specified device.
- Convert some dbgprintf() to dbg().
Previously the check for an empty part would happen before the
check for access and for the parent being a directory, and so an
error in those would not be detected.
If a symlink is not the last part of a path, the remaining part
of the path has to be further resolved against the symlink target.
With this, a path containing a symlink always resolves to the target
of the first (leftmost) symlink in it, for example any path of form
/proc/self/... resolves to the corresponding /proc/pid directory.
After reading a bunch of POSIX specs, I've learned that a file descriptor
is the number that refers to a file description, not the description itself.
So this patch renames FileDescriptor to FileDescription, and Process now has
FileDescription* file_description(int fd).
Walk the custody cache and try to reuse an existing one when possible.
The VFS is responsible for updating them when something happens that would
cause the described relationship to change.
This is definitely not perfect but it does work for the basic scenarios like
renaming and removing directory entries.
When encountering a symlink, we abandon the custody chain we've been working
on and start over with a new one (by recursing into a new resolution call.)
Caching symlinks in the custody model would be incredibly difficult to get
right with all the extra invalidation it would require, so let's just not.
The current working directory is now stored as a custody. Likewise for a
process executable file. This unbreaks /proc/PID/fd which has not been
working since we made the filesystem bigger.
This still needs a bunch of work, for instance when renaming or removing
a file somewhere, we have to update the relevant custody links.
A custody is kind of a directory entry abstraction that represents a single
entry in a parent directory that tells us the name of a child inode.
The idea here is for path resolution to produce a chain of custody objects.
It was wrong to do a reverse name lookup on the old inode after adding
a new name for it, since we might very well get the new inode instead of
the old one, depending on hash table layouts.