Browse Source

add syscalls we purposely block to docs

Signed-off-by: Jessica Frazelle <acidburn@docker.com>
Jessica Frazelle 9 năm trước cách đây
mục cha
commit
52f32818df
1 tập tin đã thay đổi với 64 bổ sung2 xóa
  1. 64 2
      docs/security/seccomp.md

+ 64 - 2
docs/security/seccomp.md

@@ -71,8 +71,7 @@ containers with seccomp. It is moderately protective while
 providing wide application compatibility.
 
 
-Overriding the default profile for a container
-----------------------------------------------
+### Overriding the default profile for a container
 
 You can pass `unconfined` to run a container without the default seccomp
 profile.
@@ -81,3 +80,66 @@ profile.
 $ docker run --rm -it --security-opt seccomp:unconfined debian:jessie \
     unshare --map-root-user --user sh -c whoami
 ```
+
+### Syscalls blocked by the default profile
+
+Docker's default seccomp profile is a whitelist which specifies the calls that
+are allowed. The table below lists the significant (but not all) syscalls that
+are effectively blocked because they are not on the whitelist. The table includes
+the reason each syscall is blocked rather than white-listed.
+
+| Syscall             | Description                                                                                                                           |
+|---------------------|---------------------------------------------------------------------------------------------------------------------------------------|
+| `acct`              | Accounting syscall which could let containers disable their own resource limits or process accounting. Also gated by `CAP_SYS_PACCT`. |
+| `add_key`           | Prevent containers from using the kernel keyring, which is not namespaced.                                                            |
+| `adjtimex`          | Similar to `clock_settime` and `settimeofday`, time/date is not namespaced.                                                           |
+| `bpf`               | Deny loading potentially persistent bpf programs into kernel, already gated by `CAP_SYS_ADMIN`.                                       |
+| `clock_adjtime`     | Time/date is not namespaced.                                                                                                          |
+| `clock_settime`     | Time/date is not namespaced.                                                                                                          |
+| `clone`             | Deny cloning new namespaces. Also gated by `CAP_SYS_ADMIN` for CLONE_* flags, except `CLONE_USERNS`.                                  |
+| `create_module`     | Deny manipulation and functions on kernel modules.                                                                                    |
+| `delete_module`     | Deny manipulation and functions on kernel modules. Also gated by `CAP_SYS_MODULE`.                                                    |
+| `finit_module`      | Deny manipulation and functions on kernel modules. Also gated by `CAP_SYS_MODULE`.                                                    |
+| `get_kernel_syms`   | Deny retrieval of exported kernel and module symbols.                                                                                 |
+| `get_mempolicy`     | Syscall that modifies kernel memory and NUMA settings. Already gated by `CAP_SYS_NICE`.                                               |
+| `init_module`       | Deny manipulation and functions on kernel modules. Also gated by `CAP_SYS_MODULE`.                                                    |
+| `ioperm`            | Prevent containers from modifying kernel I/O privilege levels. Already gated by `CAP_SYS_RAWIO`.                                      |
+| `iopl`              | Prevent containers from modifying kernel I/O privilege levels. Already gated by `CAP_SYS_RAWIO`.                                      |
+| `kcmp`              | Restrict process inspection capabilities, already blocked by dropping `CAP_PTRACE`.                                                   |
+| `kexec_file_load`   | Sister syscall of `kexec_load` that does the same thing, slightly different arguments.                                                |
+| `kexec_load`        | Deny loading a new kernel for later execution.                                                                                        |
+| `keyctl`            | Prevent containers from using the kernel keyring, which is not namespaced.                                                            |
+| `lookup_dcookie`    | Tracing/profiling syscall, which could leak a lot of information on the host.                                                         |
+| `mbind`             | Syscall that modifies kernel memory and NUMA settings. Already gated by `CAP_SYS_NICE`.                                               |
+| `modify_ldt`        | Old syscall only used in 16-bit code and a potential information leak.                                                                |
+| `mount`             | Deny mounting, already gated by `CAP_SYS_ADMIN`.                                                                                      |
+| `move_pages`        | Syscall that modifies kernel memory and NUMA settings.                                                                                |
+| `name_to_handle_at` | Sister syscall to `open_by_handle_at`. Already gated by `CAP_SYS_NICE`.                                                               |
+| `nfsservctl`        | Deny interaction with the kernel nfs daemon.                                                                                          |
+| `open_by_handle_at` | Cause of an old container breakout. Also gated by `CAP_DAC_READ_SEARCH`.                                                              |
+| `perf_event_open`   | Tracing/profiling syscall, which could leak a lot of information on the host.                                                         |
+| `personality`       | Prevent container from enabling BSD emulation. Not inherently dangerous, but poorly tested, potential for a lot of kernel vulns.      |
+| `pivot_root`        | Deny `pivot_root`, should be privileged operation.                                                                                    |
+| `process_vm_readv`  | Restrict process inspection capabilities, already blocked by dropping `CAP_PTRACE`.                                                   |
+| `process_vm_writev` | Restrict process inspection capabilities, already blocked by dropping `CAP_PTRACE`.                                                   |
+| `ptrace`            | Tracing/profiling syscall, which could leak a lot of information on the host. Already blocked by dropping `CAP_PTRACE`.               |
+| `query_module`      | Deny manipulation and functions on kernel modules.                                                                                    |
+| `quotactl`          | Quota syscall which could let containers disable their own resource limits or process accounting. Also gated by `CAP_SYS_ADMIN`.      |
+| `reboot`            | Don't let containers reboot the host. Also gated by `CAP_SYS_BOOT`.                                                                   |
+| `restart_syscall`   | Don't allow containers to restart a syscall. Possible seccomp bypass see: https://code.google.com/p/chromium/issues/detail?id=408827. |
+| `request_key`       | Prevent containers from using the kernel keyring, which is not namespaced.                                                            |
+| `set_mempolicy`     | Syscall that modifies kernel memory and NUMA settings. Already gated by `CAP_SYS_NICE`.                                               |
+| `setns`             | Deny associating a thread with a namespace. Also gated by `CAP_SYS_ADMIN`.                                                            |
+| `settimeofday`      | Time/date is not namespaced. Also gated by `CAP_SYS_TIME`.                                                                            |
+| `stime`             | Time/date is not namespaced. Also gated by `CAP_SYS_TIME`.                                                                            |
+| `swapon`            | Deny start/stop swapping to file/device. Also gated by `CAP_SYS_ADMIN`.                                                               |
+| `swapoff`           | Deny start/stop swapping to file/device. Also gated by `CAP_SYS_ADMIN`.                                                               |
+| `sysfs`             | Obsolete syscall.                                                                                                                     |
+| `_sysctl`           | Obsolete, replaced by /proc/sys.                                                                                                      |
+| `umount`            | Should be a privileged operation. Also gated by `CAP_SYS_ADMIN`.                                                                      |
+| `umount2`           | Should be a privileged operation.                                                                                                     |
+| `unshare`           | Deny cloning new namespaces for processes. Also gated by `CAP_SYS_ADMIN`, with the exception of `unshare --user`.                     |
+| `uselib`            | Older syscall related to shared libraries, unused for a long time.                                                                    |
+| `ustat`             | Obsolete syscall.                                                                                                                     |
+| `vm86`              | In kernel x86 real mode virtual machine. Also gated by `CAP_SYS_ADMIN`.                                                               |
+| `vm86old`           | In kernel x86 real mode virtual machine. Also gated by `CAP_SYS_ADMIN`.                                                               |