In network node change test, the expected behavior is focused on how many nodes
left in networkDB, besides timing issues, things would also go tricky for a
leave-then-join sequence, if the check (counting the nodes) happened before the
first "leave" event, then the testcase actually miss its target and report PASS
without verifying its final result; if the check happened after the 'leave' event,
but before the 'join' event, the test would report FAIL unnecessary;
This code change would check both the db changes and the node count, it would
report PASS only when networkdb has indeed changed and the node count is expected.
Signed-off-by: David Wang <00107082@163.com>
(cherry picked from commit f499c6b9ec)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The io/ioutil package has been deprecated in Go 1.16. This commit
replaces the existing io/ioutil functions with their new definitions in
io and os packages.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
Perhaps the testutils package in the past had an `init()` function to set up
specific things, but it no longer has. so these imports were doing nothing.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
`github.com/hashicorp/memberlist` update caused `TestNetworkDBCRUDTableEntries`
to occasionally fail, because the test would try to check whether an entry
write is propagated to all nodes, but it would not wait for all nodes to
be available before performing the write.
It could be that the failure is caused simply by improved performance of
the dependency - it could also be that some connectivity guarantee the
test depended on is not provided by the dependency anymore.
The same fix is applied to `TestNetworkDBNodeJoinLeaveIteration` due to
same issue.
Signed-off-by: Roman Volosatovs <roman.volosatovs@docker.com>
This allows the rejoin intervals to be chosen according to the context
within which the component is used, and, in particular, this allows
lower intervals to be used within TestNetworkDBIslands test.
Signed-off-by: Roman Volosatovs <roman.volosatovs@docker.com>
After moving libnetwork to this repo, we need to update all the import
paths for libnetwork to point to docker/docker/libnetwork instead of
docker/libnetwork.
This change implements that.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Fixed these tests :
1.TestNetworkDBIslands
Addresses : https://github.com/docker/libnetwork/issues/2402
2.TestNetworkDBCRUDMediumCluster
Addresses : https://github.com/docker/libnetwork/issues/2401
By :
1. Importing gotest.tools/poll to use poll.WaitOn
Above function can be used to check a condition at regular intervals
until a timeout is reached
2. Replacing Sleep with poll.WaitOn
2. Adding closeNetworkDBInstances to close remaining DBs
Signed-off-by: Arko Dasgupta <arko.dasgupta@docker.com>
```
make lint
networkdb/networkdb_test.go:88:2: should replace t.Error(fmt.Sprintf(...)) with t.Errorf(...)
networkdb/networkdb_test.go:136:2: should replace t.Error(fmt.Sprintf(...)) with t.Errorf(...)
make: *** [lint] Error 1
```
Signed-off-by: Lei Gong <lgong@alauda.io>
Refactor the ostweaks file to allows a more easy reuse
Add a method on the osl.Sandbox interface to allow setting
knobs on the sandbox
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Previous logic was not accounting that each node is
in the node list so the bootstrap nodes won't retry
to reconnect because they will always find themselves
in the node map
Added test that validate the gossip island condition
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Avoid waiting for a double notification once a node rejoin, just
put it back to active state. Waiting for a further message does not
really add anything to the safety of the operation, the source of truth
for the node status resided inside memberlist.
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Created method to handle the node state change with cleanup operation
associated.
Realign testing client with the new diagnostic interface
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
- Create a test to verify that a node that joins
in an async way is not going to extend the life
of a already deleted object
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Separate the hostname from the node identifier. All the messages
that are exchanged on the network are containing a nodeName field
that today was hostname-uniqueid. Now being encoded as strings in
the protobuf without any length restriction they plays a role
on the effieciency of protocol itself. If the hostname is very long
the overhead will increase and will degradate the performance of
the database itself that each single cycle by default allows 1400
bytes payload
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
- Changed the loop per network. Previous implementation was taking a
ReadLock to update the reapTime but now with the residualReapTime
also the bulkSync is using the same ReadLock creating possible
issues in concurrent read and update of the value.
The new logic fetches the list of networks and proceed to the
cleanup network by network locking the database and releasing it
after each network. This should ensure a fair locking avoiding
to keep the database blocked for too much time.
Note: The ticker does not guarantee that the reap logic runs
precisely every reapTimePeriod, actually documentation says that
if the routine is too long will skip ticks. In case of slowdown
of the process itself it is possible that the lifetime of the
deleted entries increases, it still should not be a huge problem
because now the residual reaptime is propagated among all the nodes
a slower node will let the deleted entry being repropagate multiple
times but the state will still remain consistent.
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
- Introduce the possibility to specify the max buffer length
in network DB. This will allow to use the whole MTU limit of
the interface
- Add queue stats per network, it can be handy to identify the
node's throughput per network and identify unbalance between
nodes that can point to an MTU missconfiguration
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
A rapid (within networkReapTime 30min) leave/join network
can corrupt the list of nodes per network with multiple copies
of the same nodes.
The fix makes sure that each node is present only once
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
The channel ch.C is never closed.
Added the listen of the ch.Done() to guarantee
that the goroutine is exiting once the event channel
is closed
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
- Disable ipv6 on all interface by default at sandbox creation.
Enable IPv6 per interface basis if the interface has an IPv6
address. In case sandbox has an IPv6 interface, also enable
IPv6 on loopback interface.
Signed-off-by: Alessandro Boch <aboch@docker.com>
Network DB is a network scoped gossip database built
on top of hashicorp/memberlist providing an eventually
consistent state store.
It limits the scope of the gossip and periodic bulk syncing
for table entries to only the nodes which participate in the
network to which the gossip belongs. This designs make the
gossip layer scale better and only consumes resources for the
network state that the node participates in.
Since the complete state for a network is maintained by all nodes
participating in the network, all nodes will eventually converge
to the same state.
NetworkDB also provides facilities for the users of the package to
watch on any table (or all tables) and get notified if there are
state changes of interest that happened anywhere in the cluster when
that state change eventually finds it's way to the watcher's node.
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>