This patch attempts to allow endpoints to complete servicing connections
while being removed from a service. The change adds a flag to the
endpoint.deleteServiceInfoFromCluster() method to indicate whether this
removal should fully remove connectivity through the load balancer
to the endpoint or should just disable directing further connections to
the endpoint. If the flag is 'false', then the load balancer assigns
a weight of 0 to the endpoint but does not remove it as a linux load
balancing destination. It does remove the endpoint as a docker load
balancing endpoint but tracks it in a special map of "disabled-but-not-
destroyed" load balancing endpoints. This allows traffic to continue
flowing, at least under Linux. If the flag is 'true', then the code
removes the endpoint entirely as a load balancing destination.
The sandbox.DisableService() method invokes deleteServiceInfoFromCluster()
with the flag sent to 'false', while the endpoint.sbLeave() method invokes
it with the flag set to 'true' to complete the removal on endpoint
finalization. Renaming the endpoint invokes deleteServiceInfoFromCluster()
with the flag set to 'true' because renaming attempts to completely
remove and then re-add each endpoint service entry.
The controller.rmServiceBinding() method, which carries out the operation,
similarly gets a new flag for whether to fully remove the endpoint. If
the flag is false, it does the job of moving the endpoint from the
load balancing set to the 'disabled' set. It then removes or
de-weights the entry in the OS load balancing table via
network.rmLBBackend(). It removes the service entirely via said method
ONLY IF there are no more live or disabled load balancing endpoints.
Similarly network.addLBBackend() requires slight tweaking to properly
manage the disabled set.
Finally, this change requires propagating the status of disabled
service endpoints via the networkDB. Accordingly, the patch includes
both code to generate and handle service update messages. It also
augments the service structure with a ServiceDisabled boolean to convey
whether an endpoint should ultimately be removed or just disabled.
This, naturally, required a rebuild of the protocol buffer code as well.
Signed-off-by: Chris Telfer <ctelfer@docker.com>
This PR contains a fix for moby/moby#30321. There was a moby/moby#31142
PR intending to fix the issue by adding a delay between disabling the
service in the cluster and the shutdown of the tasks. However
disabling the service was not deleting the service info in the cluster.
Added a fix to delete service info from cluster and verified using siege
to ensure there is zero downtime on rolling update of a service.
Signed-off-by: abhi <abhi@docker.com>
Refreshed the PR: https://github.com/docker/libnetwork/pull/1585
Addressed comments suggesting to remove the IPAlias logic not anymore used
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
changed the ipMap to SetMatrix to allow transient states
Compacted the addSvc and deleteSvc into a one single method
Updated the datastructure for backends to allow storing all the information needed
to cleanup properly during the cleanupServiceBindings
Removed the enable/disable Service logic that was racing with sbLeave/sbJoin logic
Add some debug logs to track further race conditions
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
When sandbox is deleting, another SetKey routine could be also in
progress as there's no lock to protect it, when this happens, there
could be a scene that one sandbox is removed, but it's osSbox file
"/var/run/docker/netns/xxxx" left on system and will never be cleaned.
So add a inDelete check for SetKey() to eliminate the race.
Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
Because the failure would not be on creating the osl sandbox
(which is done by somebody else). It would be on the programming
libnetwork does on the osl sandbox. In case of failure just report
the error. External caller will take care of removing the parent sandbox
via the cleanup on the error handling path. Otherwise the osl sandbox
will never be removed.
Signed-off-by: Alessandro Boch <aboch@docker.com>
This fix tries to fix logrus formatting by removing `f` from
`logrus.[Error|Warn|Debug|Fatal|Panic|Info]f` when formatting string
is not present.
Also fix import name to use original project name 'logrus' instead of
'log'
Signed-off-by: Daehyeok Mun <daehyeok@gmail.com>
When the libnetwork controller is not in distributed control mode avoid
retaining stale sandboxes when the network cannot be retrieved from
store. This ratining logic is only applicable for an independent k/v
store which manages libnetwork state. In such case the k/v store may be
temporarily unavailable so there is a need to retain the sandbox so that
the resource cleanup happens properly.
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
When adding a loadbalancer to a sandbox, the sandbox may have a valid
namespace but it might not have populated all the dependent network
resources yet. In that case do not populate that endpoint's loadbalancer
into that sandbox yet. The loadbalancer will be populated into the
sandbox when it is done populating all the dependent network resources.
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
Ingress load balancer is achieved via a service sandbox which acts as
the proxy to translate incoming node port requests and mapping that to a
service entry. Once the right service is identified, the same internal
loadbalancer implementation is used to load balance to the right backend
instance.
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>