From cfb8ca520ae8a76f5bc93012cac58756177ca9a3 Mon Sep 17 00:00:00 2001
From: Sebastiaan van Stijn
Date: Mon, 27 Nov 2023 13:23:06 +0100
Subject: [PATCH] hack/dind-systemd: make AppArmor work with systemd enabled
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

On bookworm, AppArmor failed to start inside the container, which can be
seen at startup of the dev-container:

    Created symlink /etc/systemd/system/systemd-firstboot.service → /dev/null.
    Created symlink /etc/systemd/system/systemd-udevd.service → /dev/null.
    Created symlink /etc/systemd/system/multi-user.target.wants/docker-entrypoint.service → /etc/systemd/system/docker-entrypoint.service.
    hack/dind-systemd: starting /lib/systemd/systemd --show-status=false --unit=docker-entrypoint.target
    systemd 252.17-1~deb12u1 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT -GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY +P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
    Detected virtualization docker.
    Detected architecture x86-64.
    modprobe@configfs.service: Deactivated successfully.
    modprobe@dm_mod.service: Deactivated successfully.
    modprobe@drm.service: Deactivated successfully.
    modprobe@efi_pstore.service: Deactivated successfully.
    modprobe@fuse.service: Deactivated successfully.
    modprobe@loop.service: Deactivated successfully.
    apparmor.service: Starting requested but asserts failed.
    proc-sys-fs-binfmt_misc.automount: Got automount request for /proc/sys/fs/binfmt_misc, triggered by 49 (systemd-binfmt)
    + source /etc/docker-entrypoint-cmd
    ++ hack/make.sh dynbinary test-integration

When checking "aa-status", an error was printed that the filesystem was
not mounted:

    aa-status
    apparmor filesystem is not mounted.
    apparmor module is loaded.

Checking whether "local-fs.target" was loaded showed that it was:

    systemctl status local-fs.target
    ● local-fs.target - Local File Systems
         Loaded: loaded (/lib/systemd/system/local-fs.target; static)
         Active: active since Mon 2023-11-27 10:48:38 UTC; 18s ago
           Docs: man:systemd.special(7)

However, **on the host**, "/sys/kernel/security" has a mount, which was
not present inside the container:

    mount | grep securityfs
    securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)

Interestingly, on `debian:bullseye` the mount was missing as well: no
`securityfs` mount was present inside the container, and apparmor did not
actually start, although the service silently reported success:

    mount | grep securityfs
    systemctl start apparmor
    systemctl status apparmor
    ● apparmor.service - Load AppArmor profiles
         Loaded: loaded (/lib/systemd/system/apparmor.service; enabled; vendor preset: enabled)
         Active: active (exited) since Mon 2023-11-27 11:59:09 UTC; 44s ago
           Docs: man:apparmor(7)
                 https://gitlab.com/apparmor/apparmor/wikis/home/
        Process: 43 ExecStart=/lib/apparmor/apparmor.systemd reload (code=exited, status=0/SUCCESS)
       Main PID: 43 (code=exited, status=0/SUCCESS)
            CPU: 10ms

    Nov 27 11:59:09 9519f89cade1 apparmor.systemd[43]: Not starting AppArmor in container

The same happened when using the `/etc/init.d/apparmor` script:

    /etc/init.d/apparmor start
    Starting apparmor (via systemctl): apparmor.service.
    echo $?
    0

And apparmor was not actually active:

    aa-status
    apparmor module is loaded.
    apparmor filesystem is not mounted.

    aa-enabled
    Maybe - policy interface not available.

After investigating further, I found that the non-systemd dind script had
a mount for AppArmor, which was added in
31638ab2ad2a5380d447780f05f7aa078c9421f5.

The systemd variant was missing this mount, which may have gone unnoticed
because `debian:bullseye` silently ignored the missing mount when starting
the apparmor service.

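For reference, the `mount | grep securityfs` check above could also be
scripted. The following is only a minimal sketch, assuming a
`/proc/mounts`-style table; `has_securityfs` is a hypothetical helper and
is not part of `hack/dind-systemd`, which uses `mountpoint -q` instead:

```shell
#!/bin/sh
# Hypothetical helper (not part of hack/dind-systemd): succeeds if a
# securityfs mount on /sys/kernel/security is listed in the given
# mounts table. The table defaults to /proc/mounts; passing a file
# path is an assumption made here to keep the check testable.
has_securityfs() {
    awk '$2 == "/sys/kernel/security" && $3 == "securityfs" { found = 1 }
         END { exit !found }' "${1:-/proc/mounts}"
}
```

Inside the container it could be used as
`has_securityfs || echo >&2 'securityfs is not mounted'`.
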
Signed-off-by: Sebastiaan van Stijn
---
 hack/dind-systemd | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/hack/dind-systemd b/hack/dind-systemd
index 5ab0d25fc1..ff45b7560f 100755
--- a/hack/dind-systemd
+++ b/hack/dind-systemd
@@ -1,5 +1,11 @@
 #!/bin/bash
 set -e
+
+# Set the container env-var, so that AppArmor is enabled in the daemon and
+# containerd when running docker-in-docker.
+#
+# see: https://github.com/containerd/containerd/blob/787943dc1027a67f3b52631e084db0d4a6be2ccc/pkg/apparmor/apparmor_linux.go#L29-L45
+# see: https://github.com/moby/moby/commit/de191e86321f7d3136ff42ff75826b8107399497
 container=docker
 export container
 
@@ -18,6 +24,38 @@ fi
 # running in a container.
 mount --make-rshared /
 
+# Allow AppArmor to work inside the container;
+#
+#     aa-status
+#     apparmor filesystem is not mounted.
+#     apparmor module is loaded.
+#
+#     mount -t securityfs none /sys/kernel/security
+#
+#     aa-status
+#     apparmor module is loaded.
+#     30 profiles are loaded.
+#     30 profiles are in enforce mode.
+#       /snap/snapd/18357/usr/lib/snapd/snap-confine
+#       ...
+#
+# Note: https://0xn3va.gitbook.io/cheat-sheets/container/escaping/sensitive-mounts#sys-kernel-security
+#
+# ## /sys/kernel/security
+#
+# In /sys/kernel/security mounted the securityfs interface, which allows
+# configuration of Linux Security Modules. This allows configuration of
+# AppArmor policies, and so access to this may allow a container to disable
+# its MAC system.
+#
+# Given that we're running privileged already, this should not be an issue.
+if [ -d /sys/kernel/security ] && ! mountpoint -q /sys/kernel/security; then
+	mount -t securityfs none /sys/kernel/security || {
+		echo >&2 'Could not mount /sys/kernel/security.'
+		echo >&2 'AppArmor detection and --privileged mode might break.'
+	}
+fi
+
 env > /etc/docker-entrypoint-env
 
 cat > /etc/systemd/system/docker-entrypoint.target << EOF