Commit graph

141 commits

Author SHA1 Message Date
Sebastiaan van Stijn
188c5d4a7c
linting: suppress false positive for G404 (gosec)
The linter falsely detects this as using "math/rand":

    libnetwork/networkdb/cluster.go:721:14: G404: Use of weak random number generator (math/rand instead of crypto/rand) (gosec)
       val, err := rand.Int(rand.Reader, big.NewInt(int64(n)))
                   ^
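
For context, a minimal sketch of the flagged pattern, assuming the value is used as a random index. The helper name `randomIndex` is hypothetical, and the suppression comment shows one common golangci-lint form; the exact directive used in the commit is not shown in this message.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// randomIndex (hypothetical helper) returns a uniformly distributed
// integer in [0, n) drawn from crypto/rand. n must be > 0, because
// rand.Int panics when the upper bound is not positive.
func randomIndex(n int) (int, error) {
	// The call below uses crypto/rand, not math/rand, so G404 is a false positive here.
	val, err := rand.Int(rand.Reader, big.NewInt(int64(n))) //nolint:gosec // G404 false positive
	if err != nil {
		return 0, err
	}
	return int(val.Int64()), nil
}

func main() {
	idx, err := randomIndex(10)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("index in range:", idx >= 0 && idx < 10)
}
```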

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 561a010161)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-09-06 15:11:42 +02:00
David Wang
2293a20972
Test: wait for network changes in TestNetworkDBNodeJoinLeaveIteration
In the network node change test, the expected behavior is defined by how many nodes
are left in networkDB. Besides timing issues, a leave-then-join sequence is tricky:
if the check (counting the nodes) happens before the first "leave" event, the test
case misses its target and reports PASS without verifying the final result; if the
check happens after the "leave" event but before the "join" event, the test reports
FAIL unnecessarily.

This change checks both the db changes and the node count, and reports PASS only
when networkdb has indeed changed and the node count is as expected.

Signed-off-by: David Wang <00107082@163.com>
(cherry picked from commit f499c6b9ec)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-08-24 01:45:06 +02:00
Sebastiaan van Stijn
cdbca4061b
gofmt GoDoc comments with go1.19
Older versions of Go don't format comments, so committing this as
a separate commit, so that we can already make these changes before
we upgrade to Go 1.19.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 52c1a2fae8)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-07-13 22:42:29 +02:00
Sebastiaan van Stijn
db977355b0
fix typo (cluser -> cluster)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-06-27 15:12:14 +02:00
Sebastiaan van Stijn
b9c8eca468
libnetwork/networkdb: remove some redundant fmt.Sprintf()'s
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-02-15 12:56:23 +01:00
frobnicaty
d78b883576 Fix grammar for "does not exist"
as opposed to "does not exists"

Signed-off-by: frobnicaty <92033765+frobnicaty@users.noreply.github.com>
2021-12-03 15:50:13 +00:00
Eng Zer Jun
c55a4ac779
refactor: move from io/ioutil to io and os package
The io/ioutil package has been deprecated in Go 1.16. This commit
replaces the existing io/ioutil functions with their new definitions in
io and os packages.
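
As an illustration of the kind of replacement described, here is a small sketch with the former `io/ioutil` call noted next to each stdlib equivalent (available since Go 1.16). The function `roundTrip` and the file name are made up for the example and are not taken from the commit.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// roundTrip (hypothetical) writes content to a temp file, reads it back,
// and drains it through io.ReadAll, using only the io and os packages.
func roundTrip(content string) (string, error) {
	dir, err := os.MkdirTemp("", "example") // was ioutil.TempDir
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "data.txt")
	if err := os.WriteFile(path, []byte(content), 0o600); err != nil { // was ioutil.WriteFile
		return "", err
	}

	data, err := os.ReadFile(path) // was ioutil.ReadFile
	if err != nil {
		return "", err
	}

	all, err := io.ReadAll(strings.NewReader(string(data))) // was ioutil.ReadAll
	if err != nil {
		return "", err
	}
	return string(all), nil
}

func main() {
	out, err := roundTrip("hello")
	fmt.Println(out, err)
}
```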

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2021-08-27 14:56:57 +08:00
Sebastiaan van Stijn
686be57d0a
Update to Go 1.17.0, and gofmt with Go 1.17
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-08-24 23:33:27 +02:00
Sebastiaan van Stijn
427ad30c05
libnetwork: remove unused "testutils" imports
Perhaps the testutils package once had an `init()` function to set up
specific things, but it no longer does, so these imports were doing nothing.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-08-18 14:20:37 +02:00
Roman Volosatovs
b821590461
libnetwork/networkdb: consistently wait for nodes in tests
Use `verifyNetworkExistence` like it was done in 2837fba75f

Signed-off-by: Roman Volosatovs <roman.volosatovs@docker.com>
2021-08-01 17:47:51 +02:00
Roman Volosatovs
8fbba73f42
libnetwork: wait until t.Deadline() instead of hardcoded value
Signed-off-by: Roman Volosatovs <roman.volosatovs@docker.com>
2021-08-01 17:47:50 +02:00
Roman Volosatovs
2837fba75f
libnetwork: ensure all nodes are available in tests
`github.com/hashicorp/memberlist` update caused `TestNetworkDBCRUDTableEntries`
to occasionally fail, because the test would try to check whether an entry
write is propagated to all nodes, but it would not wait for all nodes to
be available before performing the write.
It could be that the failure is caused simply by improved performance of
the dependency - it could also be that some connectivity guarantee the
test depended on is not provided by the dependency anymore.
The same fix is applied to `TestNetworkDBNodeJoinLeaveIteration` due to
the same issue.

Signed-off-by: Roman Volosatovs <roman.volosatovs@docker.com>
2021-07-12 19:25:50 +02:00
Roman Volosatovs
d7a2635537
libnetwork: make rejoin intervals configurable
This allows the rejoin intervals to be chosen according to the context
within which the component is used, and, in particular, this allows
lower intervals to be used within TestNetworkDBIslands test.

Signed-off-by: Roman Volosatovs <roman.volosatovs@docker.com>
2021-07-12 19:25:49 +02:00
Brian Goff
116f200737
Fix gosec complaints in libnetwork
These were purposefully ignored before but this goes ahead and "fixes"
most of them.
Note that none of the things gosec flagged are problematic, just
quieting the linter here.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2021-06-25 18:02:03 +02:00
Sebastiaan van Stijn
9f6add406e
networkdb: mark test-helpers as t.Helper()
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-06-09 01:44:46 +02:00
Brian Goff
0dd8bc6d31 Fix flakey test TestNetworkDBIslands
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2021-06-02 16:53:29 +00:00
Brian Goff
b3c883bb2f Skip libnetwork integration tests on Windows
Most of these tests are making use of the bridge network and do not work
on Windows.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2021-06-02 16:53:29 +00:00
Brian Goff
4b981436fe Fixup libnetwork lint errors
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2021-06-01 23:48:32 +00:00
Brian Goff
a0a473125b Fix libnetwork imports
After moving libnetwork to this repo, we need to update all the import
paths for libnetwork to point to docker/docker/libnetwork instead of
docker/libnetwork.
This change implements that.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2021-06-01 21:51:23 +00:00
Sebastiaan van Stijn
3e1e9e878c vendor: gotest.tools v3.0.2
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-09-12 03:22:18 +02:00
Sebastiaan van Stijn
847f469e76 regenerate protobufs with debian buster
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-02-26 16:03:42 +01:00
Arko Dasgupta
34a636bf51 Fix flaky NetworkDB tests
Fixed these tests:

1. TestNetworkDBIslands
Addresses: https://github.com/docker/libnetwork/issues/2402

2. TestNetworkDBCRUDMediumCluster
Addresses: https://github.com/docker/libnetwork/issues/2401

By:

1. Importing gotest.tools/poll to use poll.WaitOn
This function checks a condition at regular intervals
until a timeout is reached

2. Replacing Sleep with poll.WaitOn

3. Adding closeNetworkDBInstances to close remaining DBs
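
The idea of polling instead of sleeping can be sketched with the standard library alone. The `waitOn` helper below is a hypothetical stand-in written for this example; it is not the gotest.tools `poll.WaitOn` API, which takes a test handle and a check function of its own type.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
	"time"
)

// waitOn (hypothetical) calls check every interval until it returns true
// or until timeout has elapsed, whichever comes first.
func waitOn(check func() bool, interval, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		if check() {
			return nil
		}
		if time.Now().After(deadline) {
			return errors.New("timeout waiting for condition")
		}
		time.Sleep(interval)
	}
}

func main() {
	var ready int32

	// Simulate a cluster event that completes after an unknown delay.
	go func() {
		time.Sleep(20 * time.Millisecond)
		atomic.StoreInt32(&ready, 1)
	}()

	// Instead of sleeping for a fixed, hopefully long enough time,
	// return as soon as the condition holds.
	err := waitOn(func() bool { return atomic.LoadInt32(&ready) == 1 },
		5*time.Millisecond, time.Second)
	fmt.Println("wait error:", err)
}
```

Compared with a fixed Sleep, this returns as soon as the condition holds and fails explicitly when it never does.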

Signed-off-by: Arko Dasgupta <arko.dasgupta@docker.com>
2019-10-04 10:17:19 -07:00
Flavio Crisciani
2b1e45c682 Merge pull request #2238 from talex5/networkdb-docs
Add NetworkDB docs
2019-03-14 16:05:31 -07:00
Flavio Crisciani
151f42aeaa Fix possible nil pointer exception
It is possible that the node is not yet present in
the node list map. In this case just print a warning
and return. The next iteration would be fine
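
A minimal sketch of this kind of guard, assuming a map of node pointers keyed by ID. The `node` type, the `nodeAddr` function, and the log text are invented for illustration and do not come from the networkdb code.

```go
package main

import (
	"fmt"
	"log"
)

// node is a hypothetical stand-in for an entry in the node list map.
type node struct {
	Name string
	Addr string
}

// nodeAddr looks up a node by ID. When the node is not (yet) present it
// logs a warning and reports false instead of dereferencing a nil pointer.
func nodeAddr(nodes map[string]*node, id string) (string, bool) {
	n, ok := nodes[id]
	if !ok || n == nil {
		log.Printf("warning: node %s not yet in the node list, skipping", id)
		return "", false
	}
	return n.Addr, true
}

func main() {
	nodes := map[string]*node{
		"a": {Name: "node-a", Addr: "10.0.0.1"},
	}
	fmt.Println(nodeAddr(nodes, "a"))
	fmt.Println(nodeAddr(nodes, "b")) // missing: warning logged, no panic
}
```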

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2019-01-22 17:07:15 -08:00
Lei Gong
1adcfa9aa1 fix error when make lint
```
make lint
networkdb/networkdb_test.go:88:2: should replace t.Error(fmt.Sprintf(...)) with t.Errorf(...)
networkdb/networkdb_test.go:136:2: should replace t.Error(fmt.Sprintf(...)) with t.Errorf(...)
make: *** [lint] Error 1
```

Signed-off-by: Lei Gong <lgong@alauda.io>
2018-09-08 21:06:07 +08:00
Thomas Leonard
05c05ea5e9 Add NetworkDB docs
This is based on reading the code in the `networkdb` directory.

Signed-off-by: Thomas Leonard <thomas.leonard@docker.com>
2018-08-08 13:35:11 +01:00
Flavio Crisciani
204ce3e31d Create internal directory
Internal directory is designed to contain libraries
that are exclusively used by this project

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2018-07-16 17:34:20 -07:00
Josh Soref
a06f1b2c4e Spelling fixes
* addresses
* assigned
* at least
* attachments
* auxiliary
* available
* cleanup
* communicate
* communications
* configuration
* connection
* connectivity
* destination
* encountered
* endpoint
* example
* existing
* expansion
* expected
* external
* forwarded
* gateway
* implementations
* implemented
* initialize
* internally
* loses
* message
* network
* occurred
* operational
* origin
* overlapping
* reaper
* redirector
* release
* representation
* resolver
* retrieve
* returns
* sanbdox
* sequence
* succesful
* synchronizing
* update
* validates

Signed-off-by: Josh Soref <jsoref@gmail.com>
2018-07-12 12:54:44 -07:00
Vincent Demeester
06d471d186 Migrate to gotest.tools :)
Signed-off-by: Vincent Demeester <vincent@sbr.pm>
2018-07-06 11:01:37 -07:00
Flavio Crisciani
b0a0059237 Merge pull request #2216 from fcrisciani/netdb-qlen-issue
NetworkDB qlen optimization
2018-07-05 15:02:58 -07:00
Chris Telfer
06922d2d81 Use fmt precision to limit string length
The previous code used string slices to limit the length of certain
fields like endpoint or sandbox IDs. This assumes that these strings
are at least as long as the slice length. Unfortunately, some sandbox
IDs can be shorter than 7 characters. This fix addresses this issue
by systematically converting format string calls that were taking
fixed-slice arguments to use a precision specifier in the string format
itself.  From the golang fmt package documentation:

    For strings, byte slices and byte arrays, however, precision limits
    the length of the input to be formatted (not the size of the output),
    truncating if necessary. Normally it is measured in runes, but for
    these types when formatted with the %x or %X format it is measured
    in bytes.

This nicely fits the desired behavior: it will limit the number of
runes considered for string interpolation to the precision value.
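
A short sketch of the difference, assuming a 7-character limit as mentioned above. The helper name `shortID` is hypothetical.

```go
package main

import "fmt"

// shortID (hypothetical) returns at most the first 7 runes of id.
// Unlike id[:7], the %.7s precision does not panic on shorter strings.
func shortID(id string) string {
	return fmt.Sprintf("%.7s", id)
}

func main() {
	fmt.Println(shortID("0123456789abcdef")) // truncated to "0123456"
	fmt.Println(shortID("abc"))              // shorter input is kept as "abc"

	// By contrast, "abc"[:7] would panic with a slice bounds error.
}
```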

Signed-off-by: Chris Telfer <ctelfer@docker.com>
2018-07-05 17:44:04 -04:00
Flavio Crisciani
55e4cc7262 Optimize networkDB queue
Added some optimizations to reduce the messages in the queue:
1) On join network, the node executes a TCP sync with all the nodes that
it knows to be part of the specific network. Previously, the node
redistributed all the entries during this sync. This meant that if the network
had 10K entries, the queue of the joining node would jump to 10K. The fix
adds a flag on the network that avoids inserting any entry in the
queue until the sync happens. Note that right now the flag is set in
a best-effort way; there is no real check that at least one of the nodes
succeeded.
2) Limit the number of messages to redistribute coming from a TCP sync.
Introduced a threshold that limits the number of messages that are
propagated; this will disable this optimization in case of heavy load.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2018-07-02 16:59:45 -07:00
Flavio Crisciani
b09cb39fa5 Enhance testing infra
Allow to write and delete X number of entries
Allow to query the queue length

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2018-07-02 16:47:34 -07:00
Flavio Crisciani
7fc1795cdf Allows to set generic knobs on the Sandbox
Refactor the ostweaks file to allow easier reuse
Add a method on the osl.Sandbox interface to allow setting
knobs on the sandbox

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2018-06-28 16:14:08 -07:00
Flavio Crisciani
ef457321a9 Merge pull request #2200 from fcrisciani/networkdb-retry
Adjust corner case for reconnect logic
2018-06-28 16:00:00 -07:00
Euan Harris
96c7cba64c networkdb, drivers: Regenerate protocol buffers
agent.pb.go is unchanged, but the files in networkdb and drivers
are slightly different when regenerated using the current versions
of protoc and gogoproto. This is probably because agent.pb.go
was last regenerated quite recently, in February 2018, whereas
networkdb.pb.go and overlay/overlay.pb.go were last changed in 2017,
and windows/overlay/overlay.pb.go was last changed in 2016.

Signed-off-by: Euan Harris <euan.harris@docker.com>
2018-06-22 15:03:12 +01:00
Flavio Crisciani
500d9f4515 Adjust corner case for reconnect logic
The previous logic did not account for the fact that each node is
in its own node list, so the bootstrap nodes would not retry
to reconnect because they would always find themselves
in the node map.
Added a test that validates the gossip island condition.
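
The corner case can be sketched as follows. This is an illustration under assumed names and types (`needsRejoin`, a string-keyed node map); the actual networkdb reconnect logic is not shown in this message and may differ.

```go
package main

import "fmt"

// needsRejoin (hypothetical) reports whether no bootstrap node other than
// the local node is present in the node map. Skipping self matters: the
// local node is always in its own map, so counting it would make a
// bootstrap node believe it is still connected to the cluster.
func needsRejoin(self string, bootstrap []string, nodes map[string]struct{}) bool {
	for _, b := range bootstrap {
		if b == self {
			continue // the local node always finds itself in the map
		}
		if _, ok := nodes[b]; ok {
			return false // at least one other bootstrap node is reachable
		}
	}
	return true
}

func main() {
	bootstrap := []string{"n1", "n2", "n3"}

	// n1 is isolated: its map contains only itself.
	island := map[string]struct{}{"n1": {}}
	fmt.Println(needsRejoin("n1", bootstrap, island))

	// n1 still sees n2.
	healthy := map[string]struct{}{"n1": {}, "n2": {}}
	fmt.Println(needsRejoin("n1", bootstrap, healthy))
}
```

Note that in this sketch a bootstrap list containing only the local node always yields true; a caller would need to handle that case separately.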

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2018-06-21 18:04:55 -07:00
Flavio Crisciani
48196df4a2 Further makefile cleanup
- cleaned the make check
- local build do not require context

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2018-06-16 11:03:11 -07:00
Flavio Crisciani
65e8971ffd Merge pull request #2134 from dani-docker/esc-532
Adding a recovery mechanism for a split gossip cluster
2018-04-23 13:14:27 -07:00
Dani Louca
96472cdaea Adding a recovery mechanism for a split gossip cluster
Signed-off-by: Dani Louca <dani.louca@docker.com>
2018-04-23 14:18:46 -04:00
Brian Goff
bc465326fe networkdb: Use write lock in handleNodeEvent
`handleNodeEvent` is calling `changeNodeState` which writes to various
maps on the ndb object.
Using a write lock prevents a panic on concurrent read/write access on
these maps.
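
A minimal sketch of the locking pattern, assuming a struct that guards its maps with a `sync.RWMutex`. The type `nodeDB` and its methods are hypothetical and only mirror the idea that any path that writes to a map must hold the write lock, not the read lock.

```go
package main

import (
	"fmt"
	"sync"
)

// nodeDB is a hypothetical stand-in for an object whose maps are shared
// between goroutines.
type nodeDB struct {
	sync.RWMutex
	nodes map[string]string // node name -> state
}

// changeState writes to the map, so it takes the write lock. Holding only
// RLock here would allow concurrent map writes, which crash the program
// with an unrecoverable "concurrent map writes" fatal error.
func (d *nodeDB) changeState(name, state string) {
	d.Lock()
	defer d.Unlock()
	d.nodes[name] = state
}

// state only reads, so the shared read lock is enough.
func (d *nodeDB) state(name string) (string, bool) {
	d.RLock()
	defer d.RUnlock()
	s, ok := d.nodes[name]
	return s, ok
}

func main() {
	d := &nodeDB{nodes: make(map[string]string)}

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			name := fmt.Sprintf("node-%d", i)
			d.changeState(name, "active")
			d.state(name)
		}(i)
	}
	wg.Wait()

	s, ok := d.state("node-0")
	fmt.Println(s, ok)
}
```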

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-04-11 21:28:29 -04:00
Flavio Crisciani
9b7922ff6e Fix README flag and expose orphan network peers
- Readme example was using wrong flag
- Network peers were not exposed properly

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2018-03-23 10:19:02 -07:00
Flavio Crisciani
a59ecd9537 Change diagnose module name to diagnostic
Align it to the moby/moby external api

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2018-01-25 16:09:29 -08:00
Flavio Crisciani
64da6b8889 Avoid delay on node rejoin, avoid useless witness
Avoid waiting for a double notification once a node rejoins; just
put it back into the active state. Waiting for a further message does not
really add anything to the safety of the operation, since the source of truth
for the node status resides inside memberlist.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2018-01-23 16:21:18 -08:00
Flavio Crisciani
b190ee3ccf Cleanup node management logic
Created method to handle the node state change with cleanup operation
associated.
Realign testing client with the new diagnostic interface

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-12-13 09:40:38 -08:00
Flavio Crisciani
3e544bc500 Avoid extra notification on node leave
If a node leaves, avoid notifying the upper layer
for entries that are already marked for deletion

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-12-01 16:19:38 -08:00
Flavio Crisciani
b578cdce86 Diagnose framework for networkDB
This commit introduces the possibility of enabling a debug mode
for the networkDB. This opens a TCP port
on localhost that exposes the networkDB API for debugging
purposes.

The API can be discovered using curl localhost:<port>/help
It supports JSON output if json is passed as a URL query parameter,
and pretty printing if json=pretty is passed

All binary values are serialized in base64 encoding; this
can be skipped by passing the unsafe option as a URL query parameter

A simple Go client will follow

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-12-01 16:19:35 -08:00
Flavio Crisciani
f0fcb0bbe6 Fixed race on quick node fail/join
The previous logic did not properly handle the case of a node
that fails and joins back within a short period of time.
The issue was in the handling of the network messages.
When a node joins, it syncs with other nodes, which pass it
the whole list of nodes that, to the best of their knowledge, are part
of a network. At this point, if the node learns that node A is part
of the network, it saves it before having received the notification
that node A is actually alive (coming from memberlist).
If node A fails, the source node will receive the notification
while the newly joined node won't, because memberlist never advertised
node A as available. In this case the new node will never purge
node A from its state and, worse, will accept any table notification
where node A is the owner, and so will end up out of sync
with the rest of the cluster.

This commit also contains some code cleanup around node
management

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-11-27 14:38:06 -08:00
Flavio Crisciani
4037132b33 Fix listen port for test infra
Update Dockerfile, curl is used for the healthcheck
Add /dump for creating the routine stack trace

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-11-16 16:23:44 -08:00
Flavio Crisciani
a41f623b10 Merge pull request #1957 from fcrisciani/netdb-gc-test
Add test to confirm garbage collection
2017-11-08 16:25:47 -08:00