A node is no longer using its load balancer IP address when it no longer
has tasks that use the network that requires that load balancer. When
this occurs, the swarmkit manager will free that IP in IPAM, and may
reassign it.
When a task shuts down cleanly, it attempts removal of the networks it
uses, and if it is the last task using those networks, this removal
succeeds, and the load balancer IP is freed.
However, if the container fails, this cleanup never happens: removal of
the networks is not attempted at all.
To address this issue, I amend the executor. Whenever a node load
balancer IP is removed or changed, that information is passed to the
executor by way of the Configure method. By keeping track of the set of
node NetworkAttachments from the previous call to Configure, we can
determine which, if any, have been removed or changed.
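A rough sketch of that diffing step (the type and helper below are
illustrative stand-ins, not the actual executor code):

```go
import "reflect"

// NetworkAttachment is a trimmed, illustrative stand-in for the swarmkit
// attachment type; only the fields relevant to the diff are shown.
type NetworkAttachment struct {
	NetworkID string
	Addresses []string
}

// removedOrChanged compares the attachments from the previous Configure call
// with the current ones and reports networks whose attachment disappeared or
// whose addresses changed, so the executor can release them.
func removedOrChanged(prev, cur map[string]NetworkAttachment) []string {
	var ids []string
	for id, old := range prev {
		now, ok := cur[id]
		if !ok || !reflect.DeepEqual(old.Addresses, now.Addresses) {
			ids = append(ids, id)
		}
	}
	return ids
}
```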
At first, this seems to create a race in which a task may be attempting
to start while the network is removed out from under it. However, this
is already addressed in the controller, which attempts to recreate
missing networks before starting a task.
Signed-off-by: Drew Erny <derny@mirantis.com>
(cherry picked from commit 0d9b0ed678)
Signed-off-by: Ameya Gawde <agawde@mirantis.com>
While working on some other code, I noticed a bug in the jobs code: we
were adding the job version after checking whether there are port
configs. As a result, if there were no port configs, the job version
would be missing, because we returned before ever converting it.
This moves the job version conversion above that check, so we no longer
return before it runs.
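A hedged illustration of the ordering problem (the type and names below
are invented for the example, not the real conversion code):

```go
type jobSpec struct {
	JobVersion uint64
	Ports      []string
}

// convert sketches the fix: the job version is copied before the early
// return taken when there are no port configs, so it can no longer be
// dropped on that path.
func convert(in jobSpec) jobSpec {
	out := jobSpec{JobVersion: in.JobVersion} // moved above the ports check
	if len(in.Ports) == 0 {
		return out // previously this return came before the version copy
	}
	out.Ports = append(out.Ports, in.Ports...)
	return out
}
```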
Signed-off-by: Drew Erny <derny@mirantis.com>
Add Ulimits field to the ContainerSpec API type and wire it to Swarmkit.
This is related to #40639.
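As a hedged usage sketch (assuming the `go-units` `Ulimit` type backs the
new field, as elsewhere in the API):

```go
import (
	"github.com/docker/docker/api/types/swarm"
	units "github.com/docker/go-units"
)

// ulimitSpec sketches a service spec that asks for a nofile ulimit on its
// tasks via the new ContainerSpec.Ulimits field.
func ulimitSpec() swarm.ServiceSpec {
	return swarm.ServiceSpec{
		TaskTemplate: swarm.TaskSpec{
			ContainerSpec: &swarm.ContainerSpec{
				Image: "nginx:alpine",
				Ulimits: []*units.Ulimit{
					{Name: "nofile", Soft: 65535, Hard: 65535},
				},
			},
		},
	}
}
```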
Signed-off-by: Albin Kerouanton <albin@akerouanton.name>
After discussing with maintainers, it was decided that putting the burden
of providing the full cap list on the client is not a good design.
Instead, we decided to follow along with the container API and use cap
add/drop.
This brings in the changes already merged into swarmkit.
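Roughly, the resulting service-side fields mirror the container API's
add/drop lists (a sketch; values are illustrative):

```go
import "github.com/docker/docker/api/types/swarm"

// capSpec sketches a container spec that adds and drops capabilities per
// service, mirroring `docker run --cap-add/--cap-drop`.
func capSpec() *swarm.ContainerSpec {
	return &swarm.ContainerSpec{
		Image:          "nginx:alpine",
		CapabilityAdd:  []string{"CAP_NET_ADMIN"},
		CapabilityDrop: []string{"CAP_CHOWN"},
	}
}
```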
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
The initial implementation followed the Swarm API, where
PidsLimit is located in ContainerSpec. This is not the
desired place for this property, so moving the field to
TaskTemplate.Resources in our API.
A similar change should be made in the SwarmKit API (likely
keeping the old field for backward compatibility, because
it was merged some releases back).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This introduces a new type (`Limit`), which allows Limits
and "Reservations" to have different options, as it is not
possible to make "Reservations" for some kinds of limits.
The `GenericResources` have been removed from the new type;
the API did not handle specifying `GenericResources` as a
_Limit_ (only as _Reservations_), and this field would
therefore always be empty (omitted) in the `Limits` case.
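A sketch of how the split looks from the Go API types (values are
illustrative):

```go
import "github.com/docker/docker/api/types/swarm"

// resourcesSketch shows limits and reservations using their now-distinct
// types; GenericResources only exists on the reservations side.
func resourcesSketch() *swarm.ResourceRequirements {
	return &swarm.ResourceRequirements{
		Limits: &swarm.Limit{
			NanoCPUs:    2_000_000_000,     // 2 CPUs
			MemoryBytes: 512 * 1024 * 1024, // 512 MiB
		},
		Reservations: &swarm.Resources{
			NanoCPUs:    1_000_000_000,
			MemoryBytes: 256 * 1024 * 1024,
		},
	}
}
```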
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Support for PidsLimit was added to SwarmKit in docker/swarmkit/pull/2415,
but was never exposed through the Docker remote API.
This patch exposes the feature in the remote API.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Adds support for ReplicatedJob and GlobalJob service modes. These modes
allow running services that execute tasks which exit upon success,
instead of daemon-type tasks.
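A hedged sketch of what a replicated job looks like through the Go API
types:

```go
import "github.com/docker/docker/api/types/swarm"

// replicatedJobSpec sketches a job that runs 10 completions, at most 2 at
// a time; tasks are expected to exit successfully rather than run forever.
func replicatedJobSpec() swarm.ServiceSpec {
	maxConcurrent := uint64(2)
	totalCompletions := uint64(10)
	return swarm.ServiceSpec{
		TaskTemplate: swarm.TaskSpec{
			ContainerSpec: &swarm.ContainerSpec{
				Image:   "alpine:latest",
				Command: []string{"/bin/true"},
			},
		},
		Mode: swarm.ServiceMode{
			ReplicatedJob: &swarm.ReplicatedJob{
				MaxConcurrent:    &maxConcurrent,
				TotalCompletions: &totalCompletions,
			},
		},
	}
}
```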
Signed-off-by: Drew Erny <drew.erny@docker.com>
Adds a new ServiceStatus field to the Service object, which includes the
running and desired task counts. This new field is gated behind a
"status" query parameter.
Signed-off-by: Drew Erny <drew.erny@docker.com>
```
daemon/cluster/controllers/plugin/controller.go:37:2: U1000: field `taskID` is unused (unused)
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Format the source according to latest goimports.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
```
daemon/cluster/nodes.go:69:36: SA4009: argument ctx is overwritten before first use (staticcheck)
return c.lockedManagerAction(func(ctx context.Context, state nodeState) error {
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
There are, apparently, a few more places where List operations against
Swarm are performed, besides just the List methods. This increases the max
received message size in those places.
Signed-off-by: Drew Erny <drew.erny@docker.com>
Increases the max received gRPC message size for Node and Secret list
operations. This has already been done for the other swarm types, but
was not done for these.
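Roughly, the change amounts to passing a larger per-call receive limit on
those gRPC calls (a sketch; the exact constant used in the daemon may
differ):

```go
import (
	"context"

	swarmapi "github.com/docker/swarmkit/api"
	"google.golang.org/grpc"
)

// listNodes sketches a List call with a raised receive limit so large
// responses are not rejected by the default gRPC message size cap.
func listNodes(ctx context.Context, client swarmapi.ControlClient) (*swarmapi.ListNodesResponse, error) {
	const maxRecvSize = 32 * 1024 * 1024 // illustrative value only
	return client.ListNodes(ctx, &swarmapi.ListNodesRequest{},
		grpc.MaxCallRecvMsgSize(maxRecvSize))
}
```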
Signed-off-by: Drew Erny <drew.erny@docker.com>
Make sure adapter.removeNetworks executes during task Remove
adapter.removeNetworks was being skipped in cases where
isUnknownContainer(err) was true after adapter.remove was executed.
This fix eliminates the nil return in that case, forcing the function
to continue executing unless there is a true error.
Fixes https://github.com/moby/moby/issues/39225
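A simplified sketch of the corrected flow (the `adapter` interface below
is a stand-in for the real container adapter):

```go
import "context"

// adapter is a stand-in for the controller's container adapter; only the
// calls relevant to this fix are shown.
type adapter interface {
	remove(ctx context.Context) error
	removeNetworks(ctx context.Context) error
}

// removeTask sketches the fix: an "unknown container" error from remove no
// longer produces an early nil return, so removeNetworks always runs unless
// there is a genuine error.
func removeTask(ctx context.Context, a adapter, isUnknownContainer func(error) bool) error {
	if err := a.remove(ctx); err != nil && !isUnknownContainer(err) {
		return err
	}
	return a.removeNetworks(ctx)
}
```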
Signed-off-by: Arko Dasgupta <arko.dasgupta@docker.com>
Today, `$ docker service create --limit-cpu` configures a container's
`CpuPeriod` and `CpuQuota` variables; this commit switches it to
configure the container's `NanoCpu` variable instead.
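For comparison, a small sketch of the two translations (the CFS period
value is the usual 100ms default):

```go
// nanoCPUs is what --limit-cpu now maps to: a share of a CPU expressed in
// billionths, e.g. --limit-cpu 1.5 becomes 1_500_000_000.
func nanoCPUs(cpus float64) int64 {
	return int64(cpus * 1e9)
}

// quotaPeriod shows the previous mapping, kept here only for contrast.
func quotaPeriod(cpus float64) (quota, period int64) {
	period = 100000 // 100ms CFS period
	quota = int64(cpus * float64(period))
	return quota, period
}
```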
Signed-off-by: Olly Pomeroy <olly@docker.com>
Allow specifying environment variables when installing an engine plugin
as a Swarm service. Invalid environment variable entries (without an
equals (`=`) char) will be ignored.
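A sketch of the filtering described above (the helper name is invented
for the example):

```go
import "strings"

// validEnv keeps only entries of the form KEY=VALUE; anything without an
// equals sign is silently dropped.
func validEnv(env []string) []string {
	var out []string
	for _, kv := range env {
		if strings.Contains(kv, "=") {
			out = append(out, kv)
		}
	}
	return out
}
```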
Signed-off-by: Sune Keller <absukl@almbrand.dk>
In this case, the message to stop the agent is never actually sent
because the swarm node is nil.
Signed-off-by: Kyle Wuolle <kyle.wuolle@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Running a cluster in a two-manager configuration effectively *doubles*
the chance of losing control over the cluster (compared to running
in a single-manager setup). Users may have the assumption that having
two managers provides fault tolerance, so it's best to warn them if
they're using this configuration.
This patch adds a warning to the `info` response if Swarm is configured
with two managers:
WARNING: Running Swarm in a two-manager configuration. This configuration provides
no fault tolerance, and poses a high risk of losing control over the cluster.
Refer to https://docs.docker.com/engine/swarm/admin_guide/ to configure the
Swarm for fault-tolerance.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
It's already defined on the daemon. This allows us to avoid calling
`SystemInfo`, which is fairly heavy and can potentially even error.
Takes care of todo item from Derek's containerd integration PR.
51c412f26e/daemon/cluster/services.go (L148-L149)
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
The Swarmkit API specifies a target for configs called "Runtime"
which indicates that the config is not mounted into the container but
has some other use. This commit updates the Docker api to reflect this.
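A hedged sketch of what a runtime-targeted config reference looks like in
the Go API types (the ID and name are illustrative):

```go
import "github.com/docker/docker/api/types/swarm"

// runtimeConfigRef sketches a config attached with a Runtime target: the
// config is delivered to the task but never mounted into the container's
// filesystem.
func runtimeConfigRef() *swarm.ConfigReference {
	return &swarm.ConfigReference{
		ConfigID:   "abc123",
		ConfigName: "my-credential-spec",
		Runtime:    &swarm.ConfigReferenceRuntimeTarget{},
	}
}
```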
Signed-off-by: Drew Erny <drew.erny@docker.com>
`time.After` keeps a timer running until the specified duration is
completed. It also allocates a new timer on each call. This can wind up
leaving lots of unnecessary timers running in the background, consuming
resources even though they are no longer needed.
Instead of `time.After`, use `time.NewTimer` so the timer can actually
be stopped.
In some of these cases it's not a big deal since the duration is really
short, but in others it is much worse.
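A minimal before/after sketch of the pattern:

```go
import "time"

// waitBefore uses time.After: if done fires first, the timer allocated by
// time.After keeps running (and holding resources) until timeout elapses.
func waitBefore(done <-chan struct{}, timeout time.Duration) bool {
	select {
	case <-time.After(timeout):
		return false
	case <-done:
		return true
	}
}

// waitAfter uses an explicit timer that can be stopped as soon as we are
// done waiting.
func waitAfter(done <-chan struct{}, timeout time.Duration) bool {
	timer := time.NewTimer(timeout)
	defer timer.Stop()
	select {
	case <-timer.C:
		return false
	case <-done:
		return true
	}
}
```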
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This commit contains changes to configure the DataPathPort
option. By default, port 4789 is used, but this commit
allows the user to configure the port number during swarm init.
DataPathPort can't be modified after swarm init.
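A hedged client-side sketch of setting the port at init time:

```go
import (
	"context"

	"github.com/docker/docker/api/types/swarm"
	"github.com/docker/docker/client"
)

// initWithDataPathPort sketches a swarm init that picks a custom VXLAN
// data path port; 4789 remains the default, and the value cannot be
// changed once the swarm exists.
func initWithDataPathPort(ctx context.Context, cli *client.Client) (string, error) {
	return cli.SwarmInit(ctx, swarm.InitRequest{
		ListenAddr:   "0.0.0.0:2377",
		DataPathPort: 7777,
	})
}
```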
Signed-off-by: selansen <elango.siva@docker.com>
This allows non-recursive bind-mount, i.e. mount(2) with "bind" rather than "rbind".
Swarm-mode will be supported in a separate PR because of mutual vendoring.
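A hedged sketch of the mount spec as it appears for plain containers (the
Swarm-mode plumbing, as noted, lands separately):

```go
import "github.com/docker/docker/api/types/mount"

// nonRecursiveBind sketches a bind mount created with "bind" rather than
// "rbind", so submounts of the source are not propagated.
func nonRecursiveBind() mount.Mount {
	return mount.Mount{
		Type:   mount.TypeBind,
		Source: "/mnt/data",
		Target: "/data",
		BindOptions: &mount.BindOptions{
			NonRecursive: true,
		},
	}
}
```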
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>