
105 commits

Author SHA1 Message Date
Euan Harris
96c7cba64c networkdb, drivers: Regenerate protocol buffers
agent.pb.go is unchanged, but the files in networkdb and drivers
are slightly different when regenerated using the current versions
of protoc and gogoproto. This is probably because agent.pb.go
was last regenerated quite recently, in February 2018, whereas
networkdb.pb.go and overlay/overlay.pb.go were last changed in 2017,
and windows/overlay/overlay.pb.go was last changed in 2016.

Signed-off-by: Euan Harris <euan.harris@docker.com>
2018-06-22 15:03:12 +01:00
Flavio Crisciani
48196df4a2 Further makefile cleanup
- cleaned up the make check
- local builds do not require a context

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2018-06-16 11:03:11 -07:00
Flavio Crisciani
65e8971ffd Merge pull request #2134 from dani-docker/esc-532
Adding a recovery mechanism for a split gossip cluster
2018-04-23 13:14:27 -07:00
Dani Louca
96472cdaea Adding a recovery mechanism for a split gossip cluster
Signed-off-by: Dani Louca <dani.louca@docker.com>
2018-04-23 14:18:46 -04:00
Brian Goff
bc465326fe networkdb: Use write lock in handleNodeEvent
`handleNodeEvent` is calling `changeNodeState` which writes to various
maps on the ndb object.
Using a write lock prevents a panic on concurrent read/write access on
these maps.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-04-11 21:28:29 -04:00
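Illustrative sketch (not the libnetwork code) of the locking rule behind this fix. It uses a hypothetical, simplified db type whose nodes and failedNodes maps stand in for the maps on the ndb object: any path that mutates the maps takes the write lock, because RLock admits several goroutines at once and Go maps are not safe for concurrent read/write.

```go
package main

import (
	"fmt"
	"sync"
)

// db is a hypothetical, simplified stand-in for the NetworkDB object:
// two node maps guarded by a single RWMutex.
type db struct {
	sync.RWMutex
	nodes       map[string]bool
	failedNodes map[string]bool
}

// changeNodeState moves a node between the maps.
// It writes to both maps, so the caller must hold the write lock.
func (d *db) changeNodeState(name string, alive bool) {
	if alive {
		delete(d.failedNodes, name)
		d.nodes[name] = true
		return
	}
	delete(d.nodes, name)
	d.failedNodes[name] = true
}

// handleNodeEvent takes Lock (not RLock): under RLock several goroutines
// could mutate the maps at once and the runtime would abort with
// "concurrent map writes" or "concurrent map read and map write".
func (d *db) handleNodeEvent(name string, alive bool) {
	d.Lock()
	defer d.Unlock()
	d.changeNodeState(name, alive)
}

// isAlive only reads, so a read lock is enough.
func (d *db) isAlive(name string) bool {
	d.RLock()
	defer d.RUnlock()
	return d.nodes[name]
}

func main() {
	d := &db{nodes: map[string]bool{}, failedNodes: map[string]bool{}}
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			d.handleNodeEvent(fmt.Sprintf("node%d", i%4), i%2 == 0)
			_ = d.isAlive("node0")
		}(i)
	}
	wg.Wait()
	fmt.Println(len(d.nodes), len(d.failedNodes)) // 2 2
}
```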
Flavio Crisciani
9b7922ff6e Fix README flag and expose orphan network peers
- The README example was using the wrong flag
- Network peers were not exposed properly

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2018-03-23 10:19:02 -07:00
Flavio Crisciani
a59ecd9537 Change diagnose module name to diagnostic
Align it with the moby/moby external API

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2018-01-25 16:09:29 -08:00
Flavio Crisciani
64da6b8889 Avoid delay on node rejoin, avoid useless witness
Avoid waiting for a double notification once a node rejoins; just
put it back into the active state. Waiting for a further message does not
add anything to the safety of the operation, because the source of truth
for the node status resides inside memberlist.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2018-01-23 16:21:18 -08:00
Flavio Crisciani
b190ee3ccf Cleanup node management logic
Created a method to handle node state changes together with the
associated cleanup operations.
Realigned the testing client with the new diagnostic interface

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-12-13 09:40:38 -08:00
Flavio Crisciani
3e544bc500 Avoid extra notification on node leave
If a node leaves, avoid notifying the upper layer
for entries that are already marked for deletion

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-12-01 16:19:38 -08:00
Flavio Crisciani
b578cdce86 Diagnose framework for networkDB
This commit introduces the possibility to enable a debug mode
for the networkDB. This opens a TCP port on localhost that
exposes the networkDB API for debugging purposes.

The API can be discovered using curl localhost:<port>/help
It supports JSON output if json is passed as a URL query
parameter, and pretty printing if json=pretty is passed.

All binary values are serialized in base64 encoding; this
can be skipped by passing the unsafe option as a URL query parameter.

A simple Go client will follow.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-12-01 16:19:35 -08:00
Flavio Crisciani
f0fcb0bbe6 Fixed race on quick node fail/join
The previous logic was not properly handling the case of a node
that fails and joins back within a short period of time.
The issue was in the handling of the network messages.
When a node joins, it syncs with other nodes, which pass it
the whole list of nodes that, to the best of their knowledge, are part
of a network. At this point, if the node learns that node A is part
of the network, it saves it before having received the notification
from memberlist that node A is actually alive.
If node A has failed, the source node will receive the failure notification
while the newly joined node won't, because memberlist never advertised
node A as available to it. In this case the new node will never purge
node A from its state and, worse, will accept any table notification
where node A is the owner, ending up out of sync
with the rest of the cluster.

This commit also contains some code cleanup around node
management

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-11-27 14:38:06 -08:00
Flavio Crisciani
4037132b33 Fix listen port for test infra
Update the Dockerfile; curl is used for the healthcheck
Add /dump for dumping the goroutine stack traces

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-11-16 16:23:44 -08:00
Flavio Crisciani
a41f623b10 Merge pull request #1957 from fcrisciani/netdb-gc-test
Add test to confirm garbage collection
2017-11-08 16:25:47 -08:00
Flavio Crisciani
7fbaf6de2c Add test to confirm garbage collection
- Create a test to verify that a node that joins
  asynchronously does not extend the life
  of an already deleted object

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-10-23 09:58:57 +02:00
Flavio Crisciani
1732ab426d Handle cleanup DNS for attachable container
Attachable containers are tasks with no associated service.
Their cleanup was not done properly, so their name resolution
could leak if that was the last container on the network.
Cleanupservicebindings was not able to do the cleanup because there
is no service, and the notification of the delete also arrives
after the network is already being cleaned up

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-10-12 21:41:29 -07:00
Flavio Crisciani
ad577a25fe Changed ipMask to string
Avoid error logs in the local peer case; there is no need for deleteNeighbor
Prevent the network leave from readvertising already deleted entries to the upper layer

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-10-02 17:29:18 -07:00
Flavio Crisciani
ef2e91707d Merge pull request #1958 from ityangchen/test-libnetwork
Repair (*Broadcaster).run goroutine leak
2017-09-30 10:44:12 -07:00
Flavio Crisciani
b92d91d6a1 Fix comparison against wrong constant
The comparison was against the wrong constant value.
As described in the comment, the check is there to guarantee
that events related to stale deleted elements are not propagated

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-09-29 21:05:24 -07:00
yangchenliang
955c532735 Repair (*Broadcaster).run goroutine leak
When 'docker swarm init' and 'docker swarm leave -f' are executed repeatedly
on a node, the (*Broadcaster).run goroutine leaks.

Signed-off-by: yangchenliang <yangchenliang@huawei.com>
2017-09-29 18:56:16 +08:00
Flavio Crisciani
8c31217a44 NetworkDB create NodeID for cluster nodes
Separate the hostname from the node identifier. All the messages
exchanged on the network contain a nodeName field that until now
was hostname-uniqueid. Because it is encoded as a string in the
protobuf without any length restriction, it affects the efficiency
of the protocol itself. If the hostname is very long, the overhead
increases and degrades the performance of the database, which by
default allows a 1400-byte payload per cycle

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-09-26 10:48:04 -07:00
Flavio Crisciani
a4e64d05c1 Avoid alignment of reapNetwork and tableEntries
Make sure that the network is garbage collected after
the entries. Entries to be deleted require that the network
is present.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-09-22 10:57:47 -07:00
Flavio Crisciani
053a534ab1 Changed ReapTable logic
- Changed the loop to work per network. The previous implementation
  took a ReadLock to update the reapTime, but now, with the
  residualReapTime, the bulkSync also uses the same ReadLock, creating
  possible issues with concurrent reads and updates of the value.
  The new logic fetches the list of networks and proceeds with the
  cleanup network by network, locking the database and releasing it
  after each network. This should ensure fair locking and avoid
  keeping the database blocked for too long.

  Note: The ticker does not guarantee that the reap logic runs
  precisely every reapTimePeriod; the documentation actually says that
  ticks are dropped if the receiver is too slow. If the process itself
  slows down, the lifetime of the deleted entries may increase. This
  should still not be a big problem: because the residual reap time is
  now propagated among all the nodes, a slower node may cause the
  deleted entry to be repropagated multiple times, but the state will
  still remain consistent.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-09-21 09:37:47 -07:00
Flavio Crisciani
2d2a2bc568 Fix reapTime logic in NetworkDB
- Added the remainingReapTime field to the table event.
  Without it, a node that had no state for the element
  marked the element for deletion with the maximum reapTime.
  This made it possible for the entry to be resynced
  between nodes forever, defeating the purpose of the reap time
  itself.

- On broadcast of the table event, the node owner was rewritten
  with the local node name. This was not correct because the owner
  should remain the original one from the message

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-09-21 09:37:37 -07:00
Derek McGowan
710e0664c4 Update logrus to v1.0.1
Fix case sensitivity issue
Update docker and runc vendors

Signed-off-by: Derek McGowan <derek@mcgstyle.net>
2017-08-07 11:20:47 -07:00
Flavio Crisciani
2e38c53def PeerInit for the sandbox init
Move the sandbox init logic into the goroutine that handles
peer operations.
This is to avoid deadlocks in the use of the pMap.Lock for the
network

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-08-05 12:07:31 -07:00
Flavio Crisciani
d6440c9139 optimize the rebroadcast for failure case
Previously, when a node failed, all the nodes would bump the Lamport time of all their
entries. This meant that if a node flapped, there would be a storm of updates for all the entries.
Building on the previous logic, this commit guarantees that only the node that rejoins
readvertises its own entries; the other nodes don't need to advertise again.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-08-01 14:08:54 -07:00
Flavio Crisciani
a3ecb8902a fix join/leave
join/leave fixes:
 - when a node leaves a network, it deletes all the other nodes' entries but keeps track of its own
   entries to make sure that other nodes that are TCP syncing become aware of their deletion (a node that
   has not yet received the network leave may still TCP sync)

add network reapTime, which was not being set locally

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-08-01 14:08:45 -07:00
Flavio Crisciani
e77c245e45 2x faster to converge
- Reintroduced the Invalidate
- optimized the rebroadcast logic

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-08-01 13:47:18 -07:00
Flavio Crisciani
585964bf32 NetworkDB testing infra
- Diagnose framework that exposes a REST API for DB interaction
- Dockerfile to build the test image
- Periodic print of stats regarding queue size
- Client and server side for integration with testkit
- Added write-delete-leave-join
- Added test write-delete-wait-leave-join
- Added write-wait-leave-join

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-07-27 08:50:43 -07:00
Flavio Crisciani
60b5add4af NetworkDB allow setting PacketSize
- Introduce the possibility to specify the max buffer length
  in NetworkDB. This allows using the whole MTU of
  the interface

- Add per-network queue stats; they can help identify each
  node's throughput per network and spot imbalances between
  nodes that may point to an MTU misconfiguration

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-07-26 13:44:33 -07:00
Flavio Crisciani
051a0d5ce9 NetworkDB incorrect number of entries in networkNodes
A rapid network leave/join (within networkReapTime, 30 min)
can corrupt the per-network node list with multiple copies
of the same node.
The fix ensures that each node is present only once

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-07-18 16:57:49 -07:00
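Minimal sketch of the deduplication, assuming the per-network node list is a slice of node names keyed by network ID; the helper name is hypothetical.

```go
package main

import "fmt"

// addNetworkNode records nodeName as a member of network nid only if it is
// not already listed, so a quick leave/join cannot create duplicates.
func addNetworkNode(networkNodes map[string][]string, nid, nodeName string) {
	for _, n := range networkNodes[nid] {
		if n == nodeName {
			return
		}
	}
	networkNodes[nid] = append(networkNodes[nid], nodeName)
}

func main() {
	networkNodes := map[string][]string{}
	// A join, then a second join arriving before the earlier leave was reaped.
	addNetworkNode(networkNodes, "net1", "node-a")
	addNetworkNode(networkNodes, "net1", "node-a")
	addNetworkNode(networkNodes, "net1", "node-b")
	fmt.Println(networkNodes["net1"]) // [node-a node-b]
}
```

A linear scan is adequate here because per-network node lists are small; a set keyed by node name would avoid the scan at the cost of losing insertion order.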
Sebastiaan van Stijn
3dd1fb1217 Make node join event logging less noisy
Commit ca9a768d80
added a number of debugging messages for node join/leave
events.

This patch checks if a node already was listed,
and otherwise skips the logging to make the logs a bit
less noisy.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2017-07-10 17:25:14 -07:00
Santhosh Manohar
6bd57f977d Fix go generate for protobuf
Signed-off-by: Santhosh Manohar <santhosh@docker.com>
2017-07-05 16:31:12 -07:00
Flavio Crisciani
39d2204896 Service discovery logic rework
changed the ipMap to SetMatrix to allow transient states
Compacted addSvc and deleteSvc into a single method
Updated the data structure for backends to allow storing all the information needed
to clean up properly during cleanupServiceBindings
Removed the enable/disable Service logic that was racing with sbLeave/sbJoin logic
Add some debug logs to track further race conditions

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-06-11 20:49:29 -07:00
Madhu Venugopal
78a910ee17 Merge pull request #1787 from fcrisciani/goroutine_leak
Fix leak of handleTableEvents
2017-06-06 13:17:17 -07:00
Madhu Venugopal
59994bbb15 Merge pull request #1775 from sanimej/gossip
Handle single manager reload by having workers reconnect
2017-05-31 14:57:34 -07:00
Santhosh Manohar
ca9a768d80 Handle single manager reload by having workers reconnect
Signed-off-by: Santhosh Manohar <santhosh@docker.com>
2017-05-31 14:36:23 -07:00
Flavio Crisciani
6d768ef73c Fix leak of handleTableEvents
The channel ch.C is never closed.
Added a listener on ch.Done() to guarantee
that the goroutine exits once the event channel
is closed

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-05-31 11:04:19 -07:00
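Illustrative sketch of the pattern, using a hypothetical eventChannel type modeled on the ch.C and ch.Done() pair named in the commit (not the real type). The handler selects on both channels, so cancelling the watcher ends the goroutine even though C itself is never closed.

```go
package main

import (
	"fmt"
	"sync"
)

// eventChannel is a hypothetical stand-in for the watch channel: C delivers
// events and is never closed, while the channel returned by Done is closed
// when the watcher is cancelled.
type eventChannel struct {
	C    chan string
	done chan struct{}
	once sync.Once
}

func newEventChannel() *eventChannel {
	return &eventChannel{C: make(chan string), done: make(chan struct{})}
}

func (ch *eventChannel) Done() <-chan struct{} { return ch.done }

// Close cancels the watcher; it is safe to call more than once.
func (ch *eventChannel) Close() { ch.once.Do(func() { close(ch.done) }) }

// handleTableEvents selects on Done as well as C. Ranging over ch.C alone
// would block forever, leaking the goroutine, because C is never closed.
func handleTableEvents(ch *eventChannel, handle func(string)) {
	for {
		select {
		case ev := <-ch.C:
			handle(ev)
		case <-ch.Done():
			return
		}
	}
}

func main() {
	ch := newEventChannel()
	exited := make(chan struct{})
	var got []string
	go func() {
		handleTableEvents(ch, func(ev string) { got = append(got, ev) })
		close(exited)
	}()
	ch.C <- "create" // unbuffered: returns once the handler goroutine has received it
	ch.Close()
	<-exited         // the goroutine returned instead of leaking
	fmt.Println(got) // [create]
}
```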
Flavio Crisciani
f585f33042 Node failure timeout fix
The time to keep a failed node in the failed node list
was originally supposed to be 24h.

If a node leaves explicitly, it will be removed from the list of nodes
and put into the leftNodes list. This way the NotifyLeave event won't
insert it into the retry list.
NOTE: if the leave event is lost, the behavior will instead be the same as for a failed node.

If a node fails, NotifyLeave will insert it into the failedNodes
list with a reapTime of 24h. This means that the node will be checked
for 24h before being completely forgotten. The check currently runs every
second and is done by the reconnectNode function.
The failed node list, instead, is updated every 2h.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-05-22 17:19:31 -07:00
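Illustrative sketch of the bookkeeping described above, using hypothetical simplified types. The 24h reap time and 2h update period come from the commit message; the per-second reconnect attempts made by reconnectNode are left out.

```go
package main

import (
	"fmt"
	"time"
)

const (
	nodeReapInterval = 24 * time.Hour // how long a failed node is retried
	reapPeriod       = 2 * time.Hour  // how often the failed node list is updated
)

type node struct{ reapTime time.Duration }

// db is a simplified stand-in holding the three node lists named in the commit.
type db struct {
	nodes, leftNodes, failedNodes map[string]*node
}

// handleNodeLeave models an explicit leave message from the node itself:
// the node moves to leftNodes, so a later NotifyLeave will not find it.
func (d *db) handleNodeLeave(name string) {
	if n, ok := d.nodes[name]; ok {
		delete(d.nodes, name)
		d.leftNodes[name] = n
	}
}

// notifyLeave models memberlist's NotifyLeave callback: a node still in
// nodes did not leave explicitly (or its leave was lost), so it is treated
// as failed and kept for retries for nodeReapInterval.
func (d *db) notifyLeave(name string) {
	n, ok := d.nodes[name]
	if !ok {
		return
	}
	delete(d.nodes, name)
	n.reapTime = nodeReapInterval
	d.failedNodes[name] = n
}

// reapFailedNodes runs every reapPeriod and forgets expired failed nodes.
func (d *db) reapFailedNodes() {
	for name, n := range d.failedNodes {
		n.reapTime -= reapPeriod
		if n.reapTime <= 0 {
			delete(d.failedNodes, name)
		}
	}
}

func newDB(names ...string) *db {
	d := &db{nodes: map[string]*node{}, leftNodes: map[string]*node{}, failedNodes: map[string]*node{}}
	for _, n := range names {
		d.nodes[n] = &node{}
	}
	return d
}

func main() {
	d := newDB("node-a", "node-b")
	d.handleNodeLeave("node-b") // graceful leave
	d.notifyLeave("node-a")     // crash: memberlist reports it gone
	d.notifyLeave("node-b")     // no effect: already in leftNodes
	passes := 0
	for len(d.failedNodes) > 0 {
		d.reapFailedNodes()
		passes++
	}
	fmt.Println(passes) // 12: twelve 2h passes cover the 24h reap time
}
```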
Santhosh Manohar
06c3489bb8 retry once on a bulk sync failure
Signed-off-by: Santhosh Manohar <santhosh@docker.com>
2017-05-11 21:13:18 -07:00
Flavio Crisciani
da9ac65ea6 Remove explicit set of memberlist protocol
Memberlist does a full validation of the protocol version (min, current, max)
among all the nodes of the cluster.
The previous code was setting the protocol version to the max version,
which made upgrades incompatible.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-05-08 16:58:53 -07:00
Madhu Venugopal
1624c61ef2 Merge pull request #1727 from sanimej/cphard
control-plane hardening: Avoid nDB stale entries
2017-04-25 11:04:13 -07:00
Santhosh Manohar
1693144ae2 Merge pull request #1713 from aboch/nse
On clusterLeave, notify only if there are peers
2017-04-23 16:31:46 -07:00
Alessandro Boch
1323730eca On send node events, notify only if there are peers
- Otherwise the operation will unnecessarily block
  for five seconds.
- This is particularly noticeable on graceful
  shutdown of daemon in one node cluster.

Signed-off-by: Alessandro Boch <aboch@docker.com>
2017-04-21 10:19:08 -07:00
Santhosh Manohar
102f9d230d Avoid nDB stale entries because of intermittent nw issues.
Signed-off-by: Santhosh Manohar <santhosh@docker.com>
2017-04-19 14:01:28 -07:00
Santhosh Manohar
69ad7ef244 control-plane hardening: cleanup local state on peer leaving a network
Signed-off-by: Santhosh Manohar <santhosh@docker.com>
2017-03-31 01:49:03 -07:00
Santhosh Manohar
539888412b Merge pull request #1689 from aboch/inv
Do not invalidate table event messages
2017-03-16 13:47:01 -07:00
Alessandro Boch
9c3c86a931 Do not invalidate table event messages
- Do not run the risk of suppressing meaningful messages
  for the rest of the cluster, as many services depend
  on them, like the service records and the distributed
  load balancers.

Signed-off-by: Alessandro Boch <aboch@docker.com>
2017-03-16 00:49:58 -07:00
Alessandro Boch
4b306ee83d Fix panic in networkdb test code
fatal error: concurrent map read and map write

goroutine 264 [running]:
runtime.throw(0x90043c, 0x21)
	/usr/local/go/src/runtime/panic.go:566 +0x95 fp=0xc4203d1d68 sp=0xc4203d1d48
runtime.mapaccess2_faststr(0x86df20, 0xc4203f5470, 0xc42044afc0, 0x5, 0xc4203d1e40, 0x4ed6b8)
	/usr/local/go/src/runtime/hashmap_fast.go:306 +0x52b fp=0xc4203d1dc8 sp=0xc4203d1d68
github.com/docker/libnetwork/networkdb.(*NetworkDB).verifyNodeExistence(0xc42007e160, 0xc42008a240, 0xc42044afc0, 0x5, 0x1)
	/go/src/github.com/docker/libnetwork/networkdb/networkdb_test.go:58 +0x6c fp=0xc4203d1e50 sp=0xc4203d1dc8

Signed-off-by: Alessandro Boch <aboch@docker.com>
2017-03-15 23:26:32 -07:00