Since the CPU already does almost all necessary validation steps
for us, we don't really need to attempt to do this. Doing it
ourselves doesn't really work very reliably, because we'd have to
account for other processors modifying virtual memory, and we'd
have to account for e.g. pages not being able to be allocated
due to insufficient resources.
So change the copy_to/from_user (and associated helper functions)
to use the new safe_memcpy, which will return whether it succeeded
or not. The only manual validation step needed (which the CPU
can't perform for us) is making sure the pointers provided by user
mode aren't pointing to kernel mappings.
To make it easier to read/write from/to either kernel or user mode
data add the UserOrKernelBuffer helper class, which will internally
either use copy_from/to_user or directly memcpy, or pass the data
through directly using a temporary buffer on the stack.
Last but not least we need to keep syscall params trivial as we
need to copy them from/to user mode using copy_from/to_user.
Previously, we were including the whole of <getopt.h> in <unistd.h>.
However according to the posix <unistd.h> doesn't have to contain
everything we had in <getopt.h>.
This fixes some symbol collisions that broke the git port.
This is racy in userspace and non-racy in kernelspace so let's keep
it in kernelspace.
The behavior change where CLOEXEC is preserved when dup2() is called
with (old_fd == new_fd) was good though, let's keep that.
- Remove goofy _r suffix from syscall names.
- Don't take a signed buffer size.
- Use Userspace<T>.
- Make TTY::tty_name() return a String instead of a StringView.
This being inline somehow broke the binutils autoconf scripts. It used
to work, so I suspect that some other change to LibC has caused those
autoconf scripts to go down a new path.
Regardless, this seems perfectly sensible.
While profiling I noticed that gettid() was hitting gs register
twice, once for the initial fetch of s_cache_tid out of TLS for
the initialization check, and then again when we return the actual
value.
Optimize the implementation to cache the value so we avoid the
double fetch during the 99% case where it's already set. With
this change gettid() goes from being the 3rd most sampled function
in test-js, to pretty much disappearing into ~20th place.
Additionally add the same optimization to getpid().
This works the same as gettid(). No sense in making a syscall to the
kernel every time you ask for the PID since it won't change.
Just like gettid(), the cache is invalidated on fork().
For now, only the non-standard _SC_NPROCESSORS_CONF and
_SC_NPROCESSORS_ONLN are implemented.
Use them to make ninja pick a better default -j value.
While here, make the ninja package script not fail if
no other port has been built yet.
It looks like they're considered a bad idea, so let's not add
them before we need them. I figured it's good to have them in
git history if we ever do need them though, hence the add/remove
dance.
Add seteuid()/setegid() under _POSIX_SAVED_IDS semantics,
which also requires adding suid and sgid to Process, and
changing setuid()/setgid() to honor these semantics.
The exact semantics aren't specified by POSIX and differ
between different Unix implementations. This patch makes
serenity follow FreeBSD. The 2002 USENIX paper
"Setuid Demystified" explains the differences well.
In addition to seteuid() and setegid() this also adds
setreuid()/setregid() and setresuid()/setresgid(), and
the accessors getresuid()/getresgid().
Also reorder uid/euid functions so that they are the
same order everywhere (namely, the order that
geteuid()/getuid() already have).
That's not how readlink() is supposed to work: it should copy as many bytes
as fit into the buffer, and return the number of bytes copied. So do that,
but add a twist: make sys$readlink() actually return the whole size, not
the number of bytes copied. We fix up this return value in userspace, to make
LibC's readlink() behave as expected, but this will also allow other code
to allocate a buffer of just the right size.
Also, avoid an extra copy of the link target.
We now use minherit(MAP_INHERIT_ZERO) to create a gettid() cache that
is automatically invalidated on fork(). This is needed since the TID
will be different in a forked child, and so we can't have a stale
cached TID lying around.
This is a gigantic speedup for LibJS (and everyone else too) :^)
Add an extra out-parameter to shbuf_get() that receives the size of the
shared buffer. That way we don't need to make a separate syscall to
get the size, which we always did immediately after.
This feels a lot more consistent and Unixy:
create_shared_buffer() => shbuf_create()
share_buffer_with() => shbuf_allow_pid()
share_buffer_globally() => shbuf_allow_all()
get_shared_buffer() => shbuf_get()
release_shared_buffer() => shbuf_release()
seal_shared_buffer() => shbuf_seal()
get_shared_buffer_size() => shbuf_get_size()
Also, "shared_buffer_id" is shortened to "shbuf_id" all around.
This patch adds a crappy pread() just to get "git" working locally.
A proper version would be implemented in the kernel so that we don't
have to mess with the file descriptor's offset at all.
We should only execute the filename verbatim if it contains a slash (/)
character somewhere. Otherwise, we need to look through the entries in
the PATH environment variable.
This fixes an issue where you could easily "override" system programs
by placing them in a directory you control, and then waiting for
someone to come there and run e.g "ls" :^)
Test: LibC/exec-should-not-search-current-directory.cpp
This syscall is a complement to pledge() and adds the same sort of
incremental relinquishing of capabilities for filesystem access.
The first call to unveil() will "drop a veil" on the process, and from
now on, only unveiled parts of the filesystem are visible to it.
Each call to unveil() specifies a path to either a directory or a file
along with permissions for that path. The permissions are a combination
of the following:
- r: Read access (like the "rpath" promise)
- w: Write access (like the "wpath" promise)
- x: Execute access
- c: Create/remove access (like the "cpath" promise)
Attempts to open a path that has not been unveiled with fail with
ENOENT. If the unveiled path lacks sufficient permissions, it will fail
with EACCES.
Like pledge(), subsequent calls to unveil() with the same path can only
remove permissions, not add them.
Once you call unveil(nullptr, nullptr), the veil is locked, and it's no
longer possible to unveil any more paths for the process, ever.
This concept comes from OpenBSD, and their implementation does various
things differently, I'm sure. This is just a first implementation for
SerenityOS, and we'll keep improving on it as we go. :^)
As suggested by Joshua, this commit adds the 2-clause BSD license as a
comment block to the top of every source file.
For the first pass, I've just added myself for simplicity. I encourage
everyone to add themselves as copyright holders of any file they've
added or modified in some significant way. If I've added myself in
error somewhere, feel free to replace it with the appropriate copyright
holder instead.
Going forward, all new source files should include a license header.
Since a chroot is in many ways similar to a separate root mount, we can also
apply mount flags to it as if it was an actual mount. These flags will apply
whenever the chrooted process accesses its root directory, but not when other
processes access this same directory for the outside. Since it's common to
chdir("/") immediately after chrooting (so that files accessed through the
current directory inherit the same mount flags), this effectively allows one to
apply additional limitations to a process confined inside a chroot.
To this effect, sys$chroot() gains a mount_flags argument (exposed as
chroot_with_mount_flags() in userspace) which can be set to all the same values
as the flags argument for sys$mount(), and additionally to -1 to keep the flags
set for that file system. Note that passing 0 as mount_flags will unset any
flags that may have been set for the file system, not keep them.
This patch implements basic support for OpenBSD-style pledge().
pledge() allows programs to incrementally reduce their set of allowed
syscalls, which are divided into categories that each make up a subset
of POSIX functionality.
If a process violates one of its pledged promises by attempting to call
a syscall that it previously said it wouldn't call, the process is
immediately terminated with an uncatchable SIGABRT.
This is by no means complete, and we'll need to add more checks in
various places to ensure that promises are being kept.
But it is pretty cool! :^)