Previously, when a node failed, all the nodes would bump the Lamport time of all their
entries. This meant that if a node flapped, there would be a storm of updates for all the entries.
This commit, building on the previous logic, guarantees that only the node that joins back
readvertises its own entries; the other nodes do not need to advertise again.
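
A minimal sketch of the idea, not the actual networkdb code: on rejoin, a node bumps the Lamport time only on the entries it owns. The `entry` type, the `readvertiseOwn` function and the node names are hypothetical.

```go
package main

import "fmt"

// entry is a hypothetical table entry: its owner and its Lamport time.
type entry struct {
	owner string
	ltime uint64
}

// readvertiseOwn bumps the Lamport time of the entries owned by self only
// and returns the keys that would need to be gossiped again.
func readvertiseOwn(table map[string]*entry, self string, clock *uint64) []string {
	var bumped []string
	for key, e := range table {
		if e.owner != self {
			continue // entries owned by other nodes are left untouched
		}
		*clock++
		e.ltime = *clock
		bumped = append(bumped, key)
	}
	return bumped
}

func main() {
	clock := uint64(10)
	table := map[string]*entry{
		"ep1": {owner: "node-a", ltime: 3},
		"ep2": {owner: "node-b", ltime: 4},
	}
	bumped := readvertiseOwn(table, "node-a", &clock)
	fmt.Println(bumped, table["ep1"].ltime, table["ep2"].ltime)
}
```

With this shape, a flap of node-a touches only node-a's entries, so the number of updates scales with one node's entries rather than with the whole table.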
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
join/leave fixes:
- when a node leaves a network, it deletes all the other nodes' entries but keeps track of its
own, to make sure that other nodes that are TCP syncing will be aware of them being deleted (a node that
has not yet received the network leave will potentially TCP sync)
add the network reapTime, which was not being set locally
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
- Diagnose framework that exposes REST API for db interaction
- Dockerfile to build the test image
- Periodic print of stats regarding queue size
- Client and server side for integration with testkit
- Added write-delete-leave-join
- Added test write-delete-wait-leave-join
- Added write-wait-leave-join
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
- Introduce the possibility to specify the max buffer length
in network DB. This allows using the whole MTU of
the interface
- Add queue stats per network; they can be handy to identify the
node's throughput per network and to spot an imbalance between
nodes that can point to an MTU misconfiguration
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
A rapid leave/join of a network (within the networkReapTime of 30min)
can corrupt the list of nodes per network with multiple copies
of the same node.
The fix makes sure that each node is present only once.
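
One way to keep each node present only once, shown as a hedged sketch rather than the code of this commit: check for the node before appending it. The `addNetworkNode` helper and the map layout are assumptions made for illustration.

```go
package main

import "fmt"

// addNetworkNode appends node to the list of network nid only if it is not
// already there, so a replayed join cannot create a duplicate.
func addNetworkNode(networkNodes map[string][]string, nid, node string) {
	for _, n := range networkNodes[nid] {
		if n == node {
			return // already listed, nothing to do
		}
	}
	networkNodes[nid] = append(networkNodes[nid], node)
}

func main() {
	nodes := map[string][]string{}
	addNetworkNode(nodes, "net1", "node-a")
	addNetworkNode(nodes, "net1", "node-a") // rapid leave/join replays the add
	addNetworkNode(nodes, "net1", "node-b")
	fmt.Println(nodes["net1"])
}
```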
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Commit ca9a768d80
added a number of debugging messages for node join/leave
events.
This patch checks if a node already was listed,
and otherwise skips the logging to make the logs a bit
less noisy.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Changed the ipMap to SetMatrix to allow transient states
Compacted addSvc and deleteSvc into a single method
Updated the data structure for backends to allow storing all the information needed
to clean up properly during cleanupServiceBindings
Removed the enable/disable Service logic that was racing with the sbLeave/sbJoin logic
Added some debug logs to track further race conditions
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
The channel ch.C is never closed.
Added a listener on ch.Done() to guarantee
that the goroutine exits once the event channel
is closed
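
The pattern can be sketched as a `select` over both channels. This is an illustration under assumptions: `events` stands in for ch.C, `done` stands in for the channel returned by ch.Done(), and `consume` is a hypothetical name.

```go
package main

import "fmt"

// consume counts events until done is closed. Without the done case the
// goroutine running it would block forever, because events is never closed.
func consume(events <-chan string, done <-chan struct{}) int {
	count := 0
	for {
		select {
		case <-events:
			count++
		case <-done:
			return count
		}
	}
}

func main() {
	events := make(chan string) // stands in for ch.C, never closed
	done := make(chan struct{}) // stands in for ch.Done()
	result := make(chan int)

	go func() { result <- consume(events, done) }()

	events <- "join"
	events <- "leave"
	close(done) // the only way the goroutine gets to exit
	fmt.Println("events handled before exit:", <-result)
}
```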
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
The time to keep a failed node in the failed node list
was originally supposed to be 24h.
If a node leaves explicitly it will be removed from the list of nodes
and put into the leftNodes list. This way the NotifyLeave event won't
insert it into the retry list.
NOTE: if the event is lost, the behavior is the same as for a failed node.
If a node fails, the NotifyLeave will insert it into the failedNodes
list with a reapTime of 24h. This means that the node will be checked
for 24h before being completely forgotten. The check currently runs every
1 second and is done by the reconnectNode function.
The failed node list, instead, is updated every 2h.
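
The bookkeeping described above can be modeled roughly as follows. This is a simplified sketch, not the networkdb implementation: `nodeState`, its methods and both constants are hypothetical names that only mirror the 24h and 2h values from the text.

```go
package main

import (
	"fmt"
	"time"
)

const (
	nodeReapInterval = 24 * time.Hour // how long a failed node is remembered
	updateInterval   = 2 * time.Hour  // how often the failed list is aged
)

type nodeState struct {
	failed map[string]time.Duration // node -> remaining reap time
	left   map[string]struct{}      // nodes that left explicitly
}

func newNodeState() *nodeState {
	return &nodeState{failed: map[string]time.Duration{}, left: map[string]struct{}{}}
}

// handleExplicitLeave records a node that announced its departure.
func (s *nodeState) handleExplicitLeave(name string) {
	s.left[name] = struct{}{}
}

// notifyLeave puts the node on the failed (retry) list unless it already
// left explicitly.
func (s *nodeState) notifyLeave(name string) {
	if _, ok := s.left[name]; ok {
		return
	}
	s.failed[name] = nodeReapInterval
}

// reap ages the failed nodes by elapsed and forgets the expired ones.
func (s *nodeState) reap(elapsed time.Duration) {
	for name, remaining := range s.failed {
		if remaining <= elapsed {
			delete(s.failed, name)
			continue
		}
		s.failed[name] = remaining - elapsed
	}
}

func main() {
	s := newNodeState()
	s.handleExplicitLeave("node-a")
	s.notifyLeave("node-a") // ignored: node-a left explicitly
	s.notifyLeave("node-b") // failure: retried until reaped
	for i := 0; i < 12; i++ {
		s.reap(updateInterval)
	}
	fmt.Println("failed nodes still tracked:", len(s.failed))
}
```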
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Memberlist does a full validation of the protocol version (min, current, max)
among all the nodes of the cluster.
The previous code was setting the protocol version to the max version.
That made the upgrade incompatible.
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
- Otherwise the operation will unnecessarily block
for five seconds.
- This is particularly noticeable on graceful
shutdown of the daemon in a one-node cluster.
Signed-off-by: Alessandro Boch <aboch@docker.com>
- Do not run the risk of suppressing meaningful messages
for the rest of the cluster, as many services depend
on it, like the service records and the distributed
load balancers.
Signed-off-by: Alessandro Boch <aboch@docker.com>
With the introduction of networkdb, the node discovery events were not
sent to the drivers. This commit generates the node discovery events and
sends them to the drivers interested in them.
Signed-off-by: Madhu Venugopal <madhu@docker.com>
Right now, items logged by memberlist end up as a complete log line
embedded inside another log line, like the following:
Nov 22 16:34:16 hostname dockerd: time="2016-11-22T16:34:16.802103258-08:00" level=info msg="2016/11/22 16:34:16 [INFO] memberlist: Marking xyz-1d1ec2dfa053 as failed, suspect timeout reached\n"
This has two time and date stamps, and an escaped newline inside the
"msg" field of the outer log message.
To fix this, define a custom logger that only prints the message itself.
Capture this message in logWriter, strip off the log level (added
directly by memberlist), and route to the appropriate logrus method.
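
A rough, self-contained sketch of that approach follows. It is not the code of this commit: the `emit` callback stands in for the logrus calls, and the set of level tags (`DEBUG`, `INFO`, `WARN`, `ERR`) is an assumption about what memberlist emits.

```go
package main

import (
	"fmt"
	"log"
	"strings"
)

// logWriter receives each log line, strips the level tag and hands the
// level and the bare message to emit (a stand-in for the logrus methods).
type logWriter struct {
	emit func(level, msg string)
}

func (w *logWriter) Write(p []byte) (int, error) {
	msg := strings.TrimSpace(string(p)) // drops the trailing newline
	level := "INFO"                     // fallback when no known tag is found
	for _, tag := range []string{"DEBUG", "INFO", "WARN", "ERR"} {
		prefix := "[" + tag + "] "
		if strings.HasPrefix(msg, prefix) {
			level, msg = tag, strings.TrimPrefix(msg, prefix)
			break
		}
	}
	w.emit(level, msg)
	return len(p), nil
}

func main() {
	w := &logWriter{emit: func(level, msg string) {
		fmt.Printf("level=%s msg=%q\n", level, msg)
	}}
	// Flags set to 0: no date/time prefix, only the message reaches Write.
	logger := log.New(w, "", 0)
	logger.Print("[INFO] memberlist: Marking xyz-1d1ec2dfa053 as failed, suspect timeout reached")
}
```

Because the inner logger prints no timestamp and the writer trims the newline, the outer log line carries a single timestamp and a clean message.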
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
- Disable IPv6 on all interfaces by default at sandbox creation.
Enable IPv6 on a per-interface basis if the interface has an IPv6
address. In case the sandbox has an IPv6 interface, also enable
IPv6 on the loopback interface.
Signed-off-by: Alessandro Boch <aboch@docker.com>
Once the bulksync ack channel is closed, remove it from the ack table
right away. There is no reason to keep it in the ack table and delete
it later in the ack waiter. The ack waiter has a reference to the
channel on which it is waiting anyway.
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
When a gossip join failure happens, do not return early in the call chain,
because a join failure is most likely transient and the retry logic
built into networkdb is going to retry and succeed. Returning early
prevents the initialization of the ingress network/sandbox, which
causes a problem even after the gossip join succeeds on retry.
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>