moby/hack/dind-systemd
Sebastiaan van Stijn cfb8ca520a
hack/dind-systemd: make AppArmor work with systemd enabled
On bookworm, AppArmor failed to start inside the container, which can be
seen at startup of the dev-container:

    Created symlink /etc/systemd/system/systemd-firstboot.service → /dev/null.
    Created symlink /etc/systemd/system/systemd-udevd.service → /dev/null.
    Created symlink /etc/systemd/system/multi-user.target.wants/docker-entrypoint.service → /etc/systemd/system/docker-entrypoint.service.
    hack/dind-systemd: starting /lib/systemd/systemd --show-status=false --unit=docker-entrypoint.target
    systemd 252.17-1~deb12u1 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT -GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY +P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
    Detected virtualization docker.
    Detected architecture x86-64.
    modprobe@configfs.service: Deactivated successfully.
    modprobe@dm_mod.service: Deactivated successfully.
    modprobe@drm.service: Deactivated successfully.
    modprobe@efi_pstore.service: Deactivated successfully.
    modprobe@fuse.service: Deactivated successfully.
    modprobe@loop.service: Deactivated successfully.
    apparmor.service: Starting requested but asserts failed.
    proc-sys-fs-binfmt_misc.automount: Got automount request for /proc/sys/fs/binfmt_misc, triggered by 49 (systemd-binfmt)
    + source /etc/docker-entrypoint-cmd
    ++ hack/make.sh dynbinary test-integration

When checking "aa-status", an error was printed that the filesystem was
not mounted:

    aa-status
    apparmor filesystem is not mounted.
    apparmor module is loaded.

Checking if "local-fs.target" was loaded, that seemed to be the case;

    systemctl status local-fs.target
    ● local-fs.target - Local File Systems
         Loaded: loaded (/lib/systemd/system/local-fs.target; static)
         Active: active since Mon 2023-11-27 10:48:38 UTC; 18s ago
           Docs: man:systemd.special(7)

However, **on the host**, "/sys/kernel/security" has a mount, which was not
present inside the container:

    mount | grep securityfs
    securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)

Interestingly, on `debian:bullseye`, this was not the case either; no
`securityfs` mount was present inside the container, and apparmor actually
failed to start, but succeeded silently:

    mount | grep securityfs
    systemctl start apparmor
    systemctl status apparmor
    ● apparmor.service - Load AppArmor profiles
         Loaded: loaded (/lib/systemd/system/apparmor.service; enabled; vendor preset: enabled)
         Active: active (exited) since Mon 2023-11-27 11:59:09 UTC; 44s ago
           Docs: man:apparmor(7)
                 https://gitlab.com/apparmor/apparmor/wikis/home/
        Process: 43 ExecStart=/lib/apparmor/apparmor.systemd reload (code=exited, status=0/SUCCESS)
       Main PID: 43 (code=exited, status=0/SUCCESS)
            CPU: 10ms

    Nov 27 11:59:09 9519f89cade1 apparmor.systemd[43]: Not starting AppArmor in container

Same, using the `/etc/init.d/apparmor` script:

    /etc/init.d/apparmor start
    Starting apparmor (via systemctl): apparmor.service.
    echo $?
    0

And apparmor was not actually active:

    aa-status
    apparmor module is loaded.
    apparmor filesystem is not mounted.

    aa-enabled
    Maybe - policy interface not available.

After further investigating, I found that the non-systemd dind script
had a mount for AppArmor, which was added in 31638ab2ad

The systemd variant was missing this mount, which may have gone unnoticed
because `debian:bullseye` was silently ignoring this when starting the
apparmor service.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-11-27 14:47:59 +01:00

105 lines
3.4 KiB
Bash
Executable file

#!/bin/bash
set -e
# Set the container env-var, so that AppArmor is enabled in the daemon and
# containerd when running docker-in-docker.
#
# see: https://github.com/containerd/containerd/blob/787943dc1027a67f3b52631e084db0d4a6be2ccc/pkg/apparmor/apparmor_linux.go#L29-L45
# see: https://github.com/moby/moby/commit/de191e86321f7d3136ff42ff75826b8107399497
container=docker
export container
if [ $# -eq 0 ]; then
echo >&2 'ERROR: No command specified. You probably want to run `journalctl -f`, or maybe `bash`?'
exit 1
fi
if [ ! -t 0 ]; then
echo >&2 'ERROR: TTY needs to be enabled (`docker run -t ...`).'
exit 1
fi
# Change mount propagation to shared, which SystemD PID 1 would normally do
# itself when started by the kernel. SystemD skips that when it detects it is
# running in a container.
mount --make-rshared /
# Allow AppArmor to work inside the container;
#
# aa-status
# apparmor filesystem is not mounted.
# apparmor module is loaded.
#
# mount -t securityfs none /sys/kernel/security
#
# aa-status
# apparmor module is loaded.
# 30 profiles are loaded.
# 30 profiles are in enforce mode.
# /snap/snapd/18357/usr/lib/snapd/snap-confine
# ...
#
# Note: https://0xn3va.gitbook.io/cheat-sheets/container/escaping/sensitive-mounts#sys-kernel-security
#
# ## /sys/kernel/security
#
# In /sys/kernel/security mounted the securityfs interface, which allows
# configuration of Linux Security Modules. This allows configuration of
# AppArmor policies, and so access to this may allow a container to disable
# its MAC system.
#
# Given that we're running privileged already, this should not be an issue.
if [ -d /sys/kernel/security ] && ! mountpoint -q /sys/kernel/security; then
mount -t securityfs none /sys/kernel/security || {
echo >&2 'Could not mount /sys/kernel/security.'
echo >&2 'AppArmor detection and --privileged mode might break.'
}
fi
env > /etc/docker-entrypoint-env
cat > /etc/systemd/system/docker-entrypoint.target << EOF
[Unit]
Description=the target for docker-entrypoint.service
Requires=docker-entrypoint.service systemd-logind.service systemd-user-sessions.service
EOF
quoted_args="$(printf " %q" "${@}")"
echo "${quoted_args}" > /etc/docker-entrypoint-cmd
cat > /etc/systemd/system/docker-entrypoint.service << EOF
[Unit]
Description=docker-entrypoint.service
[Service]
ExecStart=/bin/bash -exc "source /etc/docker-entrypoint-cmd"
# EXIT_STATUS is either an exit code integer or a signal name string, see systemd.exec(5)
ExecStopPost=/bin/bash -ec "if echo \${EXIT_STATUS} | grep [A-Z] > /dev/null; then echo >&2 \"got signal \${EXIT_STATUS}\"; systemctl exit \$(( 128 + \$( kill -l \${EXIT_STATUS} ) )); else systemctl exit \${EXIT_STATUS}; fi"
StandardInput=tty-force
StandardOutput=inherit
StandardError=inherit
WorkingDirectory=$(pwd)
EnvironmentFile=/etc/docker-entrypoint-env
[Install]
WantedBy=multi-user.target
EOF
systemctl mask systemd-firstboot.service systemd-udevd.service
systemctl unmask systemd-logind
systemctl enable docker-entrypoint.service
systemd=
if [ -x /lib/systemd/systemd ]; then
systemd=/lib/systemd/systemd
elif [ -x /usr/lib/systemd/systemd ]; then
systemd=/usr/lib/systemd/systemd
elif [ -x /sbin/init ]; then
systemd=/sbin/init
else
echo >&2 'ERROR: systemd is not installed'
exit 1
fi
systemd_args="--show-status=false --unit=docker-entrypoint.target"
echo "$0: starting $systemd $systemd_args"
exec $systemd $systemd_args