Fix race condition on container stop
I'm fairly consistently seeing an error in DockerSuite.TestContainerApiRestartNotimeoutParam:

    docker_api_containers_test.go:969:
        c.Assert(status, check.Equals, http.StatusNoContent)
    ... obtained int = 500
    ... expected int = 204

And in the daemon logs I see:

    INFO[0003] Container 8cf77c20275586b36c5095613159cf73babf92ba42ed4a2954bd55dca6b08971 failed to exit within 0 seconds of SIGTERM - using the force
    ERRO[0003] Handler for POST /containers/{name:.*}/restart returned error: Cannot restart container 8cf77c20275586b36c5095613159cf73babf92ba42ed4a2954bd55dca6b08971: [2] Container does not exist: container destroyed
    ERRO[0003] HTTP Error err=Cannot restart container 8cf77c20275586b36c5095613159cf73babf92ba42ed4a2954bd55dca6b08971: [2] Container does not exist: container destroyed statusCode=500

Note the "container destroyed" error message. This is being generated by the libcontainer code and bubbled up in container.Kill() as a result of the call to `container.killPossiblyDeadProcess(9)` on line 439.

See the comment in the code, but what I think is going on is that because we don't have any timeout on the Stop() call we immediately try to force things to stop. By the time we get into the libcontainer code the process has just finished stopping due to the initial signal, so this secondary sig-9 fails because the container is no longer running (i.e. it's 'destroyed').

Since we can't look for "container destroyed" to just ignore the error, because some other driver might have different text, I opted to just ignore the error and keep going - with the assumption that if it couldn't send a sig-9 to the process then it MUST be because it's already dead and not something else.

To reproduce this I just run:

    curl -v -X POST http://127.0.0.1:2375/v1.19/containers/8cf77c20275586b36c5095613159cf73babf92ba42ed4a2954bd55dca6b08971/restart

a few times and then it fails with the HTTP 500.

Would like to hear some other ideas on how to handle this since I'm not thrilled with the proposed solution.
Signed-off-by: Doug Davis <dug@us.ibm.com>
parent f83073d3eb
commit 29bdcaf3cf
1 changed file with 17 additions and 1 deletion
@@ -437,7 +437,23 @@ func (container *Container) Kill() error {
 
 	// 1. Send SIGKILL
 	if err := container.killPossiblyDeadProcess(9); err != nil {
-		return err
+		// While normally we might "return err" here we're not going to
+		// because if we can't stop the container by this point then
+		// its probably because its already stopped. Meaning, between
+		// the time of the IsRunning() call above and now it stopped.
+		// Also, since the err return will be exec driver specific we can't
+		// look for any particular (common) error that would indicate
+		// that the process is already dead vs something else going wrong.
+		// So, instead we'll give it up to 2 more seconds to complete and if
+		// by that time the container is still running, then the error
+		// we got is probably valid and so we return it to the caller.
+
+		if container.IsRunning() {
+			container.WaitStop(2 * time.Second)
+			if container.IsRunning() {
+				return err
+			}
+		}
 	}
 
 	// 2. Wait for the process to die, in last resort, try to kill the process directly