libcontainerd: fix reaper goroutine position
It has been observed that defunct containerd processes accumulate over
time while dockerd is permanently failing to restart containerd. Due to
a bug in the runContainerdDaemon() function, dockerd does not clean up
its child process if containerd exits very soon after the (re)start.

The reproducer and analysis below come from docker 1.12.x, but the bug
still applies on latest master.

- from libcontainerd/remote_linux.go:

    func (r *remote) runContainerdDaemon() error {
        :
        :  // start the containerd child process
        :
        if err := cmd.Start(); err != nil {
            return err
        }
        :
        :  // If containerd exits very soon after (re)start, it is possible
        :  // that containerd is already in defunct state at the time when
        :  // dockerd gets here. The setOOMScore() function tries to write
        :  // to /proc/PID_OF_CONTAINERD/oom_score_adj. However, this fails
        :  // with errno EINVAL because containerd is defunct. Please see
        :  // snippets of kernel source code and further explanation below.
        :
        if err := setOOMScore(cmd.Process.Pid, r.oomScore); err != nil {
            utils.KillProcess(cmd.Process.Pid)
        :
        :  // Due to the error from write() we return here. As the
        :  // goroutine that would clean up the child has not been
        :  // started yet, containerd remains in the defunct state
        :  // and never gets reaped.
        :
            return err
        }
        :
        go func() {
            cmd.Wait()
            close(r.daemonWaitCh)
        }() // Reap our child when needed
        :
    }

This is the kernel function that gets invoked when dockerd tries to
write to /proc/PID_OF_CONTAINERD/oom_score_adj.

- from fs/proc/base.c:

    static ssize_t oom_score_adj_write(struct file *file, ...
                                       size_t count, loff_t *ppos)
    {
        :
        task = get_proc_task(file_inode(file));
        :
        :  // The defunct containerd process does not have a virtual
        :  // address space anymore, i.e. task->mm is NULL. Thus the
        :  // following code returns errno EINVAL to dockerd.
        :
        if (!task->mm) {
            err = -EINVAL;
            goto err_task_lock;
        }
        :
    err_task_lock:
        :
        return err < 0 ? err : count;
    }

The purpose of the following program is to demonstrate the behavior of
the oom_score_adj_write() function in connection with a defunct process.

    $ cat defunct_test.c

    #include <unistd.h>

    int main(void)
    {
        pid_t pid = fork();
        if (pid == 0)
            // child
            _exit(0);

        // parent
        pause();
    }

    $ make defunct_test
    cc defunct_test.c -o defunct_test

    $ ./defunct_test &
    [1] 3142

    $ ps -f | grep defunct_test | grep -v grep
    root      3142  2956  0 13:04 pts/0    00:00:00 ./defunct_test
    root      3143  3142  0 13:04 pts/0    00:00:00 [defunct_test] <defunct>

    $ echo "ps 3143" | crash -s
       PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
       3143   3142   2  ffff880035def300  ZO   0.0       0      0  defunct_test

    $ echo "px ((struct task_struct *)0xffff880035def300)->mm" | crash -s
    $1 = (struct mm_struct *) 0x0
                              ^^^ task->mm is NULL

    $ cat /proc/3143/oom_score_adj
    0

    $ echo 0 > /proc/3143/oom_score_adj
    -bash: echo: write error: Invalid argument

---

This patch fixes the above issue by making sure we start the reaper
goroutine as soon as possible.

Signed-off-by: Antonio Murdaca <runcom@redhat.com>
parent bde4c89351
commit 27087eacbf
1 changed file with 12 additions and 5 deletions
@@ -414,6 +414,18 @@ func (r *remote) runContainerdDaemon() error {
 	if err := cmd.Start(); err != nil {
 		return err
 	}
+
+	// unless strictly necessary, do not add anything in between here
+	// as the reaper goroutine below needs to kick in as soon as possible
+	// and any "return" from code paths added here will defeat the reaper
+	// process.
+
+	r.daemonWaitCh = make(chan struct{})
+	go func() {
+		cmd.Wait()
+		close(r.daemonWaitCh)
+	}() // Reap our child when needed
+
 	logrus.Infof("libcontainerd: new containerd process, pid: %d", cmd.Process.Pid)
 	if err := setOOMScore(cmd.Process.Pid, r.oomScore); err != nil {
 		system.KillProcess(cmd.Process.Pid)
@@ -424,11 +436,6 @@ func (r *remote) runContainerdDaemon() error {
 		return err
 	}
 
-	r.daemonWaitCh = make(chan struct{})
-	go func() {
-		cmd.Wait()
-		close(r.daemonWaitCh)
-	}() // Reap our child when needed
 	r.daemonPid = cmd.Process.Pid
 	return nil
 }