moby/integration/service/create_test.go
Sebastiaan van Stijn 4b9a3dac46
Fix race in TestCreateServiceSecretFileMode, TestCreateServiceConfigFileMode
Looks like this test was broken from the start, and fully relied on a race
condition. (Test was added in 65ee7fff02)

The problem is in the service's command: `ls -l /etc/config || /bin/top`, which
will either:

- exit immediately if the secret is mounted correctly at `/etc/config` (which it should)
- keep running with `/bin/top` if the above failed

After the service is created, the test enters a race-condition, checking for 1
task to be running (which it ocassionally is), after which it proceeds, and looks
up the list of tasks of the service, to get the log output of `ls -l /etc/config`.

This is another race: first of all, the original filter for that task lookup did
not filter by `running`, so it would pick "any" task of the service (either failed,
running, or "completed" (successfully exited) tasks).

In the meantime though, SwarmKit kept reconciling the service, and creating new
tasks, so even if the test was able to get the ID of the correct task, that task
may already have been exited, and removed (task-limit is 5 by default), so only
if the test was "lucky", it would be able to get the logs, but of course, chances
were likely that it would be "too late", and the task already gone.

The problem can be easily reproduced when running the steps manually:

    echo 'CONFIG' | docker config create myconfig -

    docker service create --config source=myconfig,target=/etc/config,mode=0777 --name myservice busybox sh -c 'ls -l /etc/config || /bin/top'

The above creates the service, but it keeps retrying, because each task exits
immediately (followed by SwarmKit reconciling and starting a new task);

    mjntpfkkyuuc1dpay4h00c4oo
    overall progress: 0 out of 1 tasks
    1/1: ready     [======================================>            ]
    verify: Detected task failure
    ^COperation continuing in background.
    Use `docker service ps mjntpfkkyuuc1dpay4h00c4oo` to check progress.

And checking the tasks for the service reveals that tasks exit cleanly (no error),
but _do exit_, so swarm just keeps up reconciling, and spinning up new tasks;

    docker service ps myservice --no-trunc
    ID                          NAME              IMAGE                                                                                    NODE             DESIRED STATE   CURRENT STATE                     ERROR     PORTS
    2wmcuv4vffnet8nybg3he4v9n   myservice.1       busybox:latest@sha256:f7ca5a32c10d51aeda3b4d01c61c6061f497893d7f6628b92f822f7117182a57   docker-desktop   Ready           Ready less than a second ago
    5p8b006uec125iq2892lxay64    \_ myservice.1   busybox:latest@sha256:f7ca5a32c10d51aeda3b4d01c61c6061f497893d7f6628b92f822f7117182a57   docker-desktop   Shutdown        Complete less than a second ago
    k8lpsvlak4b3nil0zfkexw61p    \_ myservice.1   busybox:latest@sha256:f7ca5a32c10d51aeda3b4d01c61c6061f497893d7f6628b92f822f7117182a57   docker-desktop   Shutdown        Complete 6 seconds ago
    vsunl5pi7e2n9ol3p89kvj6pn    \_ myservice.1   busybox:latest@sha256:f7ca5a32c10d51aeda3b4d01c61c6061f497893d7f6628b92f822f7117182a57   docker-desktop   Shutdown        Complete 11 seconds ago
    orxl8b6kt2l6dfznzzd4lij4s    \_ myservice.1   busybox:latest@sha256:f7ca5a32c10d51aeda3b4d01c61c6061f497893d7f6628b92f822f7117182a57   docker-desktop   Shutdown        Complete 17 seconds ago

This patch changes the service's command to `sleep`, so that a successful task
(after successfully performing `ls -l /etc/config`) continues to be running until
the service is deleted. With that change, the service should (usually) reconcile
immediately, which removes the race condition, and should also make it faster :)

This patch changes the tests to use client.ServiceLogs() instead of using the
service's tasklist to directly access container logs. This should also fix some
failures that happened if some tasks failed to start before reconciling, in which
case client.TaskList() (with the current filters), could return more tasks than
anticipated (as it also contained the exited tasks);

    === RUN   TestCreateServiceSecretFileMode
        create_test.go:291: assertion failed: 2 (int) != 1 (int)
    --- FAIL: TestCreateServiceSecretFileMode (7.88s)
    === RUN   TestCreateServiceConfigFileMode
        create_test.go:355: assertion failed: 2 (int) != 1 (int)
    --- FAIL: TestCreateServiceConfigFileMode (7.87s)

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 13cff6d583)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-10-27 12:30:35 +02:00

532 lines
18 KiB
Go

package service // import "github.com/docker/docker/integration/service"
import (
"context"
"fmt"
"io/ioutil"
"strings"
"testing"
"time"
"github.com/docker/docker/api/types"
"github.com/docker/docker/api/types/filters"
"github.com/docker/docker/api/types/strslice"
swarmtypes "github.com/docker/docker/api/types/swarm"
"github.com/docker/docker/api/types/versions"
"github.com/docker/docker/client"
"github.com/docker/docker/errdefs"
"github.com/docker/docker/integration/internal/network"
"github.com/docker/docker/integration/internal/swarm"
"github.com/docker/docker/testutil/daemon"
"gotest.tools/v3/assert"
is "gotest.tools/v3/assert/cmp"
"gotest.tools/v3/poll"
"gotest.tools/v3/skip"
)
func TestServiceCreateInit(t *testing.T) {
defer setupTest(t)()
t.Run("daemonInitDisabled", testServiceCreateInit(false))
t.Run("daemonInitEnabled", testServiceCreateInit(true))
}
func testServiceCreateInit(daemonEnabled bool) func(t *testing.T) {
return func(t *testing.T) {
var ops = []daemon.Option{}
if daemonEnabled {
ops = append(ops, daemon.WithInit())
}
d := swarm.NewSwarm(t, testEnv, ops...)
defer d.Stop(t)
client := d.NewClientT(t)
defer client.Close()
booleanTrue := true
booleanFalse := false
serviceID := swarm.CreateService(t, d)
poll.WaitOn(t, swarm.RunningTasksCount(client, serviceID, 1), swarm.ServicePoll)
i := inspectServiceContainer(t, client, serviceID)
// HostConfig.Init == nil means that it delegates to daemon configuration
assert.Check(t, i.HostConfig.Init == nil)
serviceID = swarm.CreateService(t, d, swarm.ServiceWithInit(&booleanTrue))
poll.WaitOn(t, swarm.RunningTasksCount(client, serviceID, 1), swarm.ServicePoll)
i = inspectServiceContainer(t, client, serviceID)
assert.Check(t, is.Equal(true, *i.HostConfig.Init))
serviceID = swarm.CreateService(t, d, swarm.ServiceWithInit(&booleanFalse))
poll.WaitOn(t, swarm.RunningTasksCount(client, serviceID, 1), swarm.ServicePoll)
i = inspectServiceContainer(t, client, serviceID)
assert.Check(t, is.Equal(false, *i.HostConfig.Init))
}
}
func inspectServiceContainer(t *testing.T, client client.APIClient, serviceID string) types.ContainerJSON {
t.Helper()
filter := filters.NewArgs()
filter.Add("label", fmt.Sprintf("com.docker.swarm.service.id=%s", serviceID))
containers, err := client.ContainerList(context.Background(), types.ContainerListOptions{Filters: filter})
assert.NilError(t, err)
assert.Check(t, is.Len(containers, 1))
i, err := client.ContainerInspect(context.Background(), containers[0].ID)
assert.NilError(t, err)
return i
}
func TestCreateServiceMultipleTimes(t *testing.T) {
skip.If(t, testEnv.DaemonInfo.OSType == "windows")
defer setupTest(t)()
d := swarm.NewSwarm(t, testEnv)
defer d.Stop(t)
client := d.NewClientT(t)
defer client.Close()
ctx := context.Background()
overlayName := "overlay1_" + t.Name()
overlayID := network.CreateNoError(ctx, t, client, overlayName,
network.WithCheckDuplicate(),
network.WithDriver("overlay"),
)
var instances uint64 = 4
serviceName := "TestService_" + t.Name()
serviceSpec := []swarm.ServiceSpecOpt{
swarm.ServiceWithReplicas(instances),
swarm.ServiceWithName(serviceName),
swarm.ServiceWithNetwork(overlayName),
}
serviceID := swarm.CreateService(t, d, serviceSpec...)
poll.WaitOn(t, swarm.RunningTasksCount(client, serviceID, instances), swarm.ServicePoll)
_, _, err := client.ServiceInspectWithRaw(context.Background(), serviceID, types.ServiceInspectOptions{})
assert.NilError(t, err)
err = client.ServiceRemove(context.Background(), serviceID)
assert.NilError(t, err)
poll.WaitOn(t, swarm.NoTasksForService(ctx, client, serviceID), swarm.ServicePoll)
serviceID2 := swarm.CreateService(t, d, serviceSpec...)
poll.WaitOn(t, swarm.RunningTasksCount(client, serviceID2, instances), swarm.ServicePoll)
err = client.ServiceRemove(context.Background(), serviceID2)
assert.NilError(t, err)
// we can't just wait on no tasks for the service, counter-intuitively.
// Tasks may briefly exist but not show up, if they are are in the process
// of being deallocated. To avoid this case, we should retry network remove
// a few times, to give tasks time to be deallcoated
poll.WaitOn(t, swarm.NoTasksForService(ctx, client, serviceID2), swarm.ServicePoll)
for retry := 0; retry < 5; retry++ {
err = client.NetworkRemove(context.Background(), overlayID)
// TODO(dperny): using strings.Contains for error checking is awful,
// but so is the fact that swarm functions don't return errdefs errors.
// I don't have time at this moment to fix the latter, so I guess I'll
// go with the former.
//
// The full error we're looking for is something like this:
//
// Error response from daemon: rpc error: code = FailedPrecondition desc = network %v is in use by task %v
//
// The safest way to catch this, I think, will be to match on "is in
// use by", as this is an uninterrupted string that best identifies
// this error.
if err == nil || !strings.Contains(err.Error(), "is in use by") {
// if there is no error, or the error isn't this kind of error,
// then we'll break the loop body, and either fail the test or
// continue.
break
}
}
assert.NilError(t, err)
poll.WaitOn(t, network.IsRemoved(context.Background(), client, overlayID), poll.WithTimeout(1*time.Minute), poll.WithDelay(10*time.Second))
}
func TestCreateServiceConflict(t *testing.T) {
skip.If(t, testEnv.DaemonInfo.OSType == "windows")
defer setupTest(t)()
d := swarm.NewSwarm(t, testEnv)
defer d.Stop(t)
c := d.NewClientT(t)
defer c.Close()
ctx := context.Background()
serviceName := "TestService_" + t.Name()
serviceSpec := []swarm.ServiceSpecOpt{
swarm.ServiceWithName(serviceName),
}
swarm.CreateService(t, d, serviceSpec...)
spec := swarm.CreateServiceSpec(t, serviceSpec...)
_, err := c.ServiceCreate(ctx, spec, types.ServiceCreateOptions{})
assert.Check(t, errdefs.IsConflict(err))
assert.ErrorContains(t, err, "service "+serviceName+" already exists")
}
func TestCreateServiceMaxReplicas(t *testing.T) {
defer setupTest(t)()
d := swarm.NewSwarm(t, testEnv)
defer d.Stop(t)
client := d.NewClientT(t)
defer client.Close()
var maxReplicas uint64 = 2
serviceSpec := []swarm.ServiceSpecOpt{
swarm.ServiceWithReplicas(maxReplicas),
swarm.ServiceWithMaxReplicas(maxReplicas),
}
serviceID := swarm.CreateService(t, d, serviceSpec...)
poll.WaitOn(t, swarm.RunningTasksCount(client, serviceID, maxReplicas), swarm.ServicePoll)
_, _, err := client.ServiceInspectWithRaw(context.Background(), serviceID, types.ServiceInspectOptions{})
assert.NilError(t, err)
}
func TestCreateWithDuplicateNetworkNames(t *testing.T) {
skip.If(t, testEnv.DaemonInfo.OSType == "windows")
defer setupTest(t)()
d := swarm.NewSwarm(t, testEnv)
defer d.Stop(t)
client := d.NewClientT(t)
defer client.Close()
ctx := context.Background()
name := "foo_" + t.Name()
n1 := network.CreateNoError(ctx, t, client, name, network.WithDriver("bridge"))
n2 := network.CreateNoError(ctx, t, client, name, network.WithDriver("bridge"))
// Duplicates with name but with different driver
n3 := network.CreateNoError(ctx, t, client, name, network.WithDriver("overlay"))
// Create Service with the same name
var instances uint64 = 1
serviceName := "top_" + t.Name()
serviceID := swarm.CreateService(t, d,
swarm.ServiceWithReplicas(instances),
swarm.ServiceWithName(serviceName),
swarm.ServiceWithNetwork(name),
)
poll.WaitOn(t, swarm.RunningTasksCount(client, serviceID, instances), swarm.ServicePoll)
resp, _, err := client.ServiceInspectWithRaw(ctx, serviceID, types.ServiceInspectOptions{})
assert.NilError(t, err)
assert.Check(t, is.Equal(n3, resp.Spec.TaskTemplate.Networks[0].Target))
// Remove Service, and wait for its tasks to be removed
err = client.ServiceRemove(ctx, serviceID)
assert.NilError(t, err)
poll.WaitOn(t, swarm.NoTasksForService(ctx, client, serviceID), swarm.ServicePoll)
// Remove networks
err = client.NetworkRemove(context.Background(), n3)
assert.NilError(t, err)
err = client.NetworkRemove(context.Background(), n2)
assert.NilError(t, err)
err = client.NetworkRemove(context.Background(), n1)
assert.NilError(t, err)
// Make sure networks have been destroyed.
poll.WaitOn(t, network.IsRemoved(context.Background(), client, n3), poll.WithTimeout(1*time.Minute), poll.WithDelay(10*time.Second))
poll.WaitOn(t, network.IsRemoved(context.Background(), client, n2), poll.WithTimeout(1*time.Minute), poll.WithDelay(10*time.Second))
poll.WaitOn(t, network.IsRemoved(context.Background(), client, n1), poll.WithTimeout(1*time.Minute), poll.WithDelay(10*time.Second))
}
func TestCreateServiceSecretFileMode(t *testing.T) {
skip.If(t, testEnv.DaemonInfo.OSType == "windows")
defer setupTest(t)()
d := swarm.NewSwarm(t, testEnv)
defer d.Stop(t)
client := d.NewClientT(t)
defer client.Close()
ctx := context.Background()
secretName := "TestSecret_" + t.Name()
secretResp, err := client.SecretCreate(ctx, swarmtypes.SecretSpec{
Annotations: swarmtypes.Annotations{
Name: secretName,
},
Data: []byte("TESTSECRET"),
})
assert.NilError(t, err)
var instances uint64 = 1
serviceName := "TestService_" + t.Name()
serviceID := swarm.CreateService(t, d,
swarm.ServiceWithReplicas(instances),
swarm.ServiceWithName(serviceName),
swarm.ServiceWithCommand([]string{"/bin/sh", "-c", "ls -l /etc/secret && sleep inf"}),
swarm.ServiceWithSecret(&swarmtypes.SecretReference{
File: &swarmtypes.SecretReferenceFileTarget{
Name: "/etc/secret",
UID: "0",
GID: "0",
Mode: 0777,
},
SecretID: secretResp.ID,
SecretName: secretName,
}),
)
poll.WaitOn(t, swarm.RunningTasksCount(client, serviceID, instances), swarm.ServicePoll)
body, err := client.ServiceLogs(ctx, serviceID, types.ContainerLogsOptions{
Tail: "1",
ShowStdout: true,
})
assert.NilError(t, err)
defer body.Close()
content, err := ioutil.ReadAll(body)
assert.NilError(t, err)
assert.Check(t, is.Contains(string(content), "-rwxrwxrwx"))
err = client.ServiceRemove(ctx, serviceID)
assert.NilError(t, err)
poll.WaitOn(t, swarm.NoTasksForService(ctx, client, serviceID), swarm.ServicePoll)
err = client.SecretRemove(ctx, secretName)
assert.NilError(t, err)
}
func TestCreateServiceConfigFileMode(t *testing.T) {
skip.If(t, testEnv.DaemonInfo.OSType == "windows")
defer setupTest(t)()
d := swarm.NewSwarm(t, testEnv)
defer d.Stop(t)
client := d.NewClientT(t)
defer client.Close()
ctx := context.Background()
configName := "TestConfig_" + t.Name()
configResp, err := client.ConfigCreate(ctx, swarmtypes.ConfigSpec{
Annotations: swarmtypes.Annotations{
Name: configName,
},
Data: []byte("TESTCONFIG"),
})
assert.NilError(t, err)
var instances uint64 = 1
serviceName := "TestService_" + t.Name()
serviceID := swarm.CreateService(t, d,
swarm.ServiceWithName(serviceName),
swarm.ServiceWithCommand([]string{"/bin/sh", "-c", "ls -l /etc/config && sleep inf"}),
swarm.ServiceWithReplicas(instances),
swarm.ServiceWithConfig(&swarmtypes.ConfigReference{
File: &swarmtypes.ConfigReferenceFileTarget{
Name: "/etc/config",
UID: "0",
GID: "0",
Mode: 0777,
},
ConfigID: configResp.ID,
ConfigName: configName,
}),
)
poll.WaitOn(t, swarm.RunningTasksCount(client, serviceID, instances))
body, err := client.ServiceLogs(ctx, serviceID, types.ContainerLogsOptions{
Tail: "1",
ShowStdout: true,
})
assert.NilError(t, err)
defer body.Close()
content, err := ioutil.ReadAll(body)
assert.NilError(t, err)
assert.Check(t, is.Contains(string(content), "-rwxrwxrwx"))
err = client.ServiceRemove(ctx, serviceID)
assert.NilError(t, err)
poll.WaitOn(t, swarm.NoTasksForService(ctx, client, serviceID))
err = client.ConfigRemove(ctx, configName)
assert.NilError(t, err)
}
// TestServiceCreateSysctls tests that a service created with sysctl options in
// the ContainerSpec correctly applies those options.
//
// To test this, we're going to create a service with the sysctl option
//
// {"net.ipv4.ip_nonlocal_bind": "0"}
//
// We'll get the service's tasks to get the container ID, and then we'll
// inspect the container. If the output of the container inspect contains the
// sysctl option with the correct value, we can assume that the sysctl has been
// plumbed correctly.
//
// Next, we'll remove that service and create a new service with that option
// set to 1. This means that no matter what the default is, we can be confident
// that the sysctl option is applying as intended.
//
// Additionally, we'll do service and task inspects to verify that the inspect
// output includes the desired sysctl option.
//
// We're using net.ipv4.ip_nonlocal_bind because it's something that I'm fairly
// confident won't be modified by the container runtime, and won't blow
// anything up in the test environment
func TestCreateServiceSysctls(t *testing.T) {
skip.If(
t, versions.LessThan(testEnv.DaemonAPIVersion(), "1.40"),
"setting service sysctls is unsupported before api v1.40",
)
defer setupTest(t)()
d := swarm.NewSwarm(t, testEnv)
defer d.Stop(t)
client := d.NewClientT(t)
defer client.Close()
ctx := context.Background()
// run thie block twice, so that no matter what the default value of
// net.ipv4.ip_nonlocal_bind is, we can verify that setting the sysctl
// options works
for _, expected := range []string{"0", "1"} {
// store the map we're going to be using everywhere.
expectedSysctls := map[string]string{"net.ipv4.ip_nonlocal_bind": expected}
// Create the service with the sysctl options
var instances uint64 = 1
serviceID := swarm.CreateService(t, d,
swarm.ServiceWithSysctls(expectedSysctls),
)
// wait for the service to converge to 1 running task as expected
poll.WaitOn(t, swarm.RunningTasksCount(client, serviceID, instances))
// we're going to check 3 things:
//
// 1. Does the container, when inspected, have the sysctl option set?
// 2. Does the task have the sysctl in the spec?
// 3. Does the service have the sysctl in the spec?
//
// if all 3 of these things are true, we know that the sysctl has been
// plumbed correctly through the engine.
//
// We don't actually have to get inside the container and check its
// logs or anything. If we see the sysctl set on the container inspect,
// we know that the sysctl is plumbed correctly. everything below that
// level has been tested elsewhere. (thanks @thaJeztah, because an
// earlier version of this test had to get container logs and was much
// more complex)
// get all of the tasks of the service, so we can get the container
filter := filters.NewArgs()
filter.Add("service", serviceID)
tasks, err := client.TaskList(ctx, types.TaskListOptions{
Filters: filter,
})
assert.NilError(t, err)
assert.Check(t, is.Equal(len(tasks), 1))
// verify that the container has the sysctl option set
ctnr, err := client.ContainerInspect(ctx, tasks[0].Status.ContainerStatus.ContainerID)
assert.NilError(t, err)
assert.DeepEqual(t, ctnr.HostConfig.Sysctls, expectedSysctls)
// verify that the task has the sysctl option set in the task object
assert.DeepEqual(t, tasks[0].Spec.ContainerSpec.Sysctls, expectedSysctls)
// verify that the service also has the sysctl set in the spec.
service, _, err := client.ServiceInspectWithRaw(ctx, serviceID, types.ServiceInspectOptions{})
assert.NilError(t, err)
assert.DeepEqual(t,
service.Spec.TaskTemplate.ContainerSpec.Sysctls, expectedSysctls,
)
}
}
// TestServiceCreateCapabilities tests that a service created with capabilities options in
// the ContainerSpec correctly applies those options.
//
// To test this, we're going to create a service with the capabilities option
//
// []string{"CAP_NET_RAW", "CAP_SYS_CHROOT"}
//
// We'll get the service's tasks to get the container ID, and then we'll
// inspect the container. If the output of the container inspect contains the
// capabilities option with the correct value, we can assume that the capabilities has been
// plumbed correctly.
func TestCreateServiceCapabilities(t *testing.T) {
skip.If(
t, versions.LessThan(testEnv.DaemonAPIVersion(), "1.41"),
"setting service capabilities is unsupported before api v1.41",
)
defer setupTest(t)()
d := swarm.NewSwarm(t, testEnv)
defer d.Stop(t)
client := d.NewClientT(t)
defer client.Close()
ctx := context.Background()
// store the map we're going to be using everywhere.
capAdd := []string{"CAP_SYS_CHROOT"}
capDrop := []string{"CAP_NET_RAW"}
// Create the service with the capabilities options
var instances uint64 = 1
serviceID := swarm.CreateService(t, d,
swarm.ServiceWithCapabilities(capAdd, capDrop),
)
// wait for the service to converge to 1 running task as expected
poll.WaitOn(t, swarm.RunningTasksCount(client, serviceID, instances))
// we're going to check 3 things:
//
// 1. Does the container, when inspected, have the capabilities option set?
// 2. Does the task have the capabilities in the spec?
// 3. Does the service have the capabilities in the spec?
//
// if all 3 of these things are true, we know that the capabilities has been
// plumbed correctly through the engine.
//
// We don't actually have to get inside the container and check its
// logs or anything. If we see the capabilities set on the container inspect,
// we know that the capabilities is plumbed correctly. everything below that
// level has been tested elsewhere.
// get all of the tasks of the service, so we can get the container
filter := filters.NewArgs()
filter.Add("service", serviceID)
tasks, err := client.TaskList(ctx, types.TaskListOptions{
Filters: filter,
})
assert.NilError(t, err)
assert.Check(t, is.Equal(len(tasks), 1))
// verify that the container has the capabilities option set
ctnr, err := client.ContainerInspect(ctx, tasks[0].Status.ContainerStatus.ContainerID)
assert.NilError(t, err)
assert.DeepEqual(t, ctnr.HostConfig.CapAdd, strslice.StrSlice(capAdd))
assert.DeepEqual(t, ctnr.HostConfig.CapDrop, strslice.StrSlice(capDrop))
// verify that the task has the capabilities option set in the task object
assert.DeepEqual(t, tasks[0].Spec.ContainerSpec.CapabilityAdd, capAdd)
assert.DeepEqual(t, tasks[0].Spec.ContainerSpec.CapabilityDrop, capDrop)
// verify that the service also has the capabilities set in the spec.
service, _, err := client.ServiceInspectWithRaw(ctx, serviceID, types.ServiceInspectOptions{})
assert.NilError(t, err)
assert.DeepEqual(t, service.Spec.TaskTemplate.ContainerSpec.CapabilityAdd, capAdd)
assert.DeepEqual(t, service.Spec.TaskTemplate.ContainerSpec.CapabilityDrop, capDrop)
}