Fix race condition on container stop

I'm fairly consistently seeing an error in
DockerSuite.TestContainerApiRestartNotimeoutParam:

docker_api_containers_test.go:969:
    c.Assert(status, check.Equals, http.StatusNoContent)
    ... obtained int = 500
    ... expected int = 204

And in the daemon logs I see:
INFO[0003] Container 8cf77c20275586b36c5095613159cf73babf92ba42ed4a2954bd55dca6b08971 failed to exit within 0 seconds of SIGTERM - using the force
ERRO[0003] Handler for POST /containers/{name:.*}/restart returned error: Cannot restart container 8cf77c20275586b36c5095613159cf73babf92ba42ed4a2954bd55dca6b08971: [2] Container does not exist: container destroyed

ERRO[0003] HTTP Error                                    err=Cannot restart container 8cf77c20275586b36c5095613159cf73babf92ba42ed4a2954bd55dca6b08971: [2] Container does not exist: container destroyed
 statusCode=500

Note the "container destroyed" error message.  This is being generatd by
the libcontainer code and bubbled up in container.Kill() as a result of the
call to `container.killPossiblyDeadProcess(9)` on line 439.

See the comment in the code, but what I think is going on is that because we
don't have any timeout on the Stop() call we immediate try to force things to
stop. And by the time we get into libcontainer code the process just finished
stopping due to the initial signal, so this secondary sig-9 fails due to the
container no longer running (ie. its 'destroyed').

Since we can't look for "container destroyed" to just ignore the error, because
some other driver might have different text, I opted to just ignore the error
and keep going - with the assumption that if it couldnt send a sig-9 to the
process then it MUST be because its already dead and not something else.

To reproduce this I just run:
curl -v -X POST http://127.0.0.1:2375/v1.19/containers/8cf77c20275586b36c5095613159cf73babf92ba42ed4a2954bd55dca6b08971/restart

a few times and then it fails with the HTTP 500.

Would like to hear some other ideas on to handle this since I'm not
thrilled with the proposed solution.

Signed-off-by: Doug Davis <dug@us.ibm.com>
This commit is contained in:
Doug Davis 2015-05-24 20:05:10 -07:00
parent f83073d3eb
commit 29bdcaf3cf

View file

@ -437,7 +437,23 @@ func (container *Container) Kill() error {
// 1. Send SIGKILL
if err := container.killPossiblyDeadProcess(9); err != nil {
return err
// While normally we might "return err" here we're not going to
// because if we can't stop the container by this point then
// its probably because its already stopped. Meaning, between
// the time of the IsRunning() call above and now it stopped.
// Also, since the err return will be exec driver specific we can't
// look for any particular (common) error that would indicate
// that the process is already dead vs something else going wrong.
// So, instead we'll give it up to 2 more seconds to complete and if
// by that time the container is still running, then the error
// we got is probably valid and so we return it to the caller.
if container.IsRunning() {
container.WaitStop(2 * time.Second)
if container.IsRunning() {
return err
}
}
}
// 2. Wait for the process to die, in last resort, try to kill the process directly