Avoid error logs in the local peer case, where there is no need for deleteNeighbor
Avoid having the network leave readvertise already deleted entries to the upper layer
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
In case of IP reuse locally there was a race condition
that was leaving the overlay namespace with wrong configuration
causing connectivity issues.
This commit introduces the use of setMatrix to handle the transient
state and make sure that the proper configuration is maintained
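The idea of a set matrix can be sketched as follows. This is a minimal illustration, not the libnetwork implementation: the `setMatrix` type and its `insert`, `remove` and `get` methods are hypothetical names. Each key (for example an IP) maps to a set of values, so when an IP is reused the old and the new owner can coexist for a while, and a late delete of the old one leaves the new one intact.

```go
package main

import (
	"fmt"
	"sort"
)

// setMatrix maps a key to a set of values. Hypothetical sketch only.
type setMatrix map[string]map[string]struct{}

// insert adds value to the set stored under key and returns the new set size.
func (m setMatrix) insert(key, value string) int {
	set, ok := m[key]
	if !ok {
		set = make(map[string]struct{})
		m[key] = set
	}
	set[value] = struct{}{}
	return len(set)
}

// remove deletes value from the set under key and returns the remaining size.
// The key itself is dropped once its set becomes empty.
func (m setMatrix) remove(key, value string) int {
	set, ok := m[key]
	if !ok {
		return 0
	}
	delete(set, value)
	if len(set) == 0 {
		delete(m, key)
		return 0
	}
	return len(set)
}

// get returns the values stored under key in sorted order.
func (m setMatrix) get(key string) []string {
	out := make([]string, 0, len(m[key]))
	for v := range m[key] {
		out = append(out, v)
	}
	sort.Strings(out)
	return out
}

func main() {
	peers := setMatrix{}
	peers.insert("10.0.0.2", "ep-old")
	peers.insert("10.0.0.2", "ep-new") // IP reused before the old entry is gone
	peers.remove("10.0.0.2", "ep-old") // late delete does not wipe the new owner
	fmt.Println(peers.get("10.0.0.2"))
}
```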
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
The package was updated and now reports new warnings that had to be fixed
to let the CI pass
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
The CreateNetwork in the bridge driver was not able to properly
handle concurrent operations causing 2 issues:
1) crash from a nil pointer dereference
2) improper handling of conflicting configurations
This commit addresses both issues
and adds a test for them.
With the original code the test fails only rarely;
to confirm the fix I had to add a time.Sleep in the body of
CreateNetwork to get a 100% failure rate
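A rough sketch of the kind of locking that avoids both problems is shown below. It is an assumption-laden illustration, not the bridge driver code: the `driver` type, `createNetwork` and `raceCreate` are hypothetical. The conflict check and the registration happen under the same lock, so two concurrent callers cannot both pass the check.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errConflict = errors.New("conflicting network configuration")

// driver is a hypothetical stand-in for a bridge driver.
type driver struct {
	mu       sync.Mutex
	networks map[string]string // network ID -> bridge name
}

// createNetwork checks for conflicts and registers the network while holding
// the lock, so the check and the insert are one atomic step.
func (d *driver) createNetwork(id, bridge string) error {
	d.mu.Lock()
	defer d.mu.Unlock()
	for _, b := range d.networks {
		if b == bridge {
			return errConflict
		}
	}
	d.networks[id] = bridge
	return nil
}

// raceCreate starts n concurrent creations that all want the same bridge and
// returns how many of them succeeded.
func raceCreate(n int) int {
	d := &driver{networks: make(map[string]string)}
	var wg sync.WaitGroup
	var mu sync.Mutex
	succeeded := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			if err := d.createNetwork(fmt.Sprintf("net%d", i), "br0"); err == nil {
				mu.Lock()
				succeeded++
				mu.Unlock()
			}
		}(i)
	}
	wg.Wait()
	return succeeded
}

func main() {
	fmt.Println("successful creations:", raceCreate(50))
}
```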
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Prevents an issue where the goroutine may jump to a new OS thread during
execution putting it into a mount/network NS that is unexpected.
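The usual way to keep a goroutine on one OS thread is `runtime.LockOSThread`. The helper below is a minimal sketch; `runLocked` is a hypothetical name and the callback stands in for whatever namespace-sensitive work is being done.

```go
package main

import (
	"fmt"
	"runtime"
)

// runLocked runs fn on a dedicated goroutine that stays pinned to a single
// OS thread for the duration of fn, then reports fn's error.
func runLocked(fn func() error) error {
	errCh := make(chan error, 1)
	go func() {
		// Pin this goroutine so the scheduler cannot move it to another
		// thread (and thus another mount/network namespace) mid-operation.
		runtime.LockOSThread()
		defer runtime.UnlockOSThread()
		errCh <- fn()
	}()
	return <-errCh
}

func main() {
	err := runLocked(func() error {
		// Namespace-sensitive work would go here.
		return nil
	})
	fmt.Println("err:", err)
}
```

If the callback changes thread state and cannot restore it, skipping the unlock makes the Go runtime terminate the thread when the goroutine exits instead of reusing it.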
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit 6d8617d8757a759d806a3307ca04d4d588c04aed)
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
In the peerDelete the updateDB flag was always true
In the peerAdd the updateDB flag was always true except for
the initSandbox case. But now the initSandbox is handled by the
goroutine of the peer operations, so we can move that flag
down and remove it from the top level functions
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
The peerDbDelete was passing the wrong field to the underlay
Delete operation, so the MAC entry was not deleted
from the bridge on the overlay. This caused connectivity issues
when a container that was previously remote was later scheduled
on the local node. The stale entry looked like this:
bridge fdb show | grep -i 02:42:0a:01:00:02
02:42:0a:01:00:02 dev vxlan0 master br0
02:42:0a:01:00:02 dev vxlan0 dst 172.31.14.63 link-netnsid 0 self permanent
That was still pointing to a remote node
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Move the sandbox init logic into the go routine that handles
peer operations.
This is to avoid deadlocks in the use of the pMap.Lock for the
network
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Remove the need for the wait group and avoid new
locks
Added utility to print the method name and the caller name
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
neighbor entries. On an l3 miss try to reprogram the neighbor entry
if the peer is valid. It's a best effort attempt because if the arp
table is still at the gc_thresh3 value, the addition will fail.
Signed-off-by: Santhosh Manohar <santhosh@docker.com>
The feature was not getting properly triggered; move it to be the
first operation in configure
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
The netlink socket that was used to monitor the L2
miss was never being closed. The watchMiss goroutine
spawned was never returning. This was causing a goroutine
leak on every createNetwork/destroyNetwork
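The general pattern is that closing the socket unblocks the reader so its goroutine can return. The sketch below uses an `io.Pipe` in place of a netlink socket; the `watcher` type and its methods are hypothetical and only illustrate the shutdown path.

```go
package main

import (
	"fmt"
	"io"
)

// watcher is a hypothetical stand-in for the miss watcher.
type watcher struct {
	sock io.ReadCloser
	done chan struct{}
}

// startWatcher spawns the goroutine that reads notifications from sock.
func startWatcher(sock io.ReadCloser) *watcher {
	w := &watcher{sock: sock, done: make(chan struct{})}
	go w.watch()
	return w
}

// watch reads until the socket is closed, then returns.
func (w *watcher) watch() {
	defer close(w.done)
	buf := make([]byte, 4096)
	for {
		if _, err := w.sock.Read(buf); err != nil {
			return // socket closed: stop instead of leaking
		}
		// A real watcher would decode and handle the notification here.
	}
}

// stop closes the socket, which unblocks Read, and waits for the goroutine.
func (w *watcher) stop() {
	w.sock.Close()
	<-w.done
}

// watcherStops reports whether the goroutine has exited after stop.
func watcherStops() bool {
	pr, pw := io.Pipe()
	defer pw.Close()
	w := startWatcher(pr)
	w.stop()
	select {
	case <-w.done:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println("watcher exited:", watcherStops())
}
```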
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
On Linux systems, bump up the gc_thresh values to lower the
probability of running into neighbor table overflow issues
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Allow users to configure firewall policies in a way that persists
across docker operations/restarts. Docker will not delete or modify any
pre-existing rules from the DOCKER-USER filter chain. This allows
the user to create in advance any rules required to further
restrict access from/to the containers.
Fixes docker/docker#29184
Fixes docker/docker#23987
Related to docker/docker#24848
Signed-off-by: Jacob Wen <jian.w.wen@oracle.com>
- Orchestrator interaction with the network driver is limited
to at most allocation/release of simple resources. For local scope
drivers all that is needed is the retrieval of the driver scope. The
full driver code base does not need to be pulled into the orchestrator.
This PR introduces a dedicated package in each builtin nw
driver for that purpose, as it was done for overlay driver.
Signed-off-by: Alessandro Boch <aboch@docker.com>
- It specifies whether the network driver can
provide containers connectivity across hosts.
- As of now, the data scope of the driver was
being overloaded with this notion.
- The driver scope information is still valid
and it defines whether the data allocation
of the network resources can be done globally
or only locally.
- With the scope network option, user can now
force a network as swarm scoped
regardless of the driver data scope.
- In case the network is configured as swarm scoped,
and the network driver is multihost capable,
a network DB instance will be launched for it.
Signed-off-by: Alessandro Boch <aboch@docker.com>
Use the string concatenation operator instead of using Sprintf for
simple string concatenation. This is usually easier to read, and allows
the compiler to detect problems with the type or number of operands,
which would be runtime errors with Sprintf.
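A small before/after illustration; the function names are hypothetical.

```go
package main

import "fmt"

// chainNameSprintf builds the name with Sprintf. A missing or mistyped
// operand here would only show up at runtime, in the produced string.
func chainNameSprintf(prefix, name string) string {
	return fmt.Sprintf("%s-%s", prefix, name)
}

// chainNameConcat builds the same name with the + operator. Operand type and
// count are checked by the compiler.
func chainNameConcat(prefix, name string) string {
	return prefix + "-" + name
}

func main() {
	fmt.Println(chainNameSprintf("DOCKER", "ISOLATION"))
	fmt.Println(chainNameConcat("DOCKER", "ISOLATION"))
}
```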
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
Flush all the endpoint flows when the external
connectivity is removed.
This prevents issues where a flow still present
in conntrack takes precedence and
lets the packet skip the POSTROUTING chain.
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
With Plugin-V2, plugins can get activated before the remote driver is
initialized. Those plugins fail to get registered with drvRegistry.
This fix handles that scenario.
Signed-off-by: Madhu Venugopal <madhu@docker.com>
With the introduction of GetIDInRange function in IDM and using it in
ovmanager, the idm.New was modified to start from 1. But that causes
issues when the network is removed which results in releasing the
vxlan-id from IDM. With the offset of 1, the Release call incorrectly
releases a bit which could be in use by another network and this results
in the infamous "error creating vxlan interface: file exists" errors
when another network is created with this freed bit.
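The class of bug can be illustrated with a toy ID manager. This is a sketch under assumptions, not the IDM code and not necessarily how the fix was made: `idm`, `newIDM`, `getID` and `release` are hypothetical. The point is that whatever offset is applied when an ID is handed out must be undone when it is released, otherwise the bit of a different ID is cleared.

```go
package main

import (
	"errors"
	"fmt"
)

// idm hands out IDs in [start, end], tracking usage per bit.
type idm struct {
	start, end uint64
	used       []bool
}

func newIDM(start, end uint64) *idm {
	return &idm{start: start, end: end, used: make([]bool, end-start+1)}
}

// getID returns the lowest free ID. Bit b corresponds to ID start+b.
func (i *idm) getID() (uint64, error) {
	for bit, inUse := range i.used {
		if !inUse {
			i.used[bit] = true
			return i.start + uint64(bit), nil
		}
	}
	return 0, errors.New("no available ID")
}

// release frees id. The offset added in getID is subtracted again here;
// clearing used[id] instead would free a bit owned by another ID.
func (i *idm) release(id uint64) {
	if id < i.start || id > i.end {
		return
	}
	i.used[id-i.start] = false
}

func main() {
	m := newIDM(1, 4)
	a, _ := m.getID()
	b, _ := m.getID()
	m.release(a)
	c, _ := m.getID()
	fmt.Println(a, b, c) // c reuses a's ID and never collides with b
}
```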
Signed-off-by: Madhu Venugopal <madhu@docker.com>
Fix import name to use original project name 'logrus' instead of 'log'
Removing `f` from `logrus.Debugf` when formatting string is not present.
Signed-off-by: Daehyeok Mun <daehyeok@gmail.com>
This fix tries to fix logrus formatting by removing `f` from
`logrus.[Error|Warn|Debug|Fatal|Panic|Info]f` when formatting string
is not present.
Also fix import name to use original project name 'logrus' instead of
'log'
Signed-off-by: Daehyeok Mun <daehyeok@gmail.com>
To make it consistent with windows and linux workers
Signed-off-by: Madhu Venugopal <madhu@docker.com>
Fixed build breaks
Signed-off-by: msabansal <sabansal@microsoft.com>
1. Base work was done by msabansal and nwoodmsft
from : https://github.com/msabansal/docker/tree/overlay
2. reorganized under drivers/windows/overlay and rebased to
libnetwork master
3. Porting overlay common fixes to windows driver
* 46f525c
* ba8714e
* 6368406
4. Windows Service Discovery changes for swarm-mode
5. renaming default windows ipam drivers as "windows"
Signed-off-by: Madhu Venugopal <madhu@docker.com>
Signed-off-by: msabansal <sabansal@microsoft.com>
Signed-off-by: nwoodmsft <Nicholas.Wood@microsoft.com>
This fix tries to address the issue raised in:
https://github.com/docker/docker/issues/26341
where multiple addresses in a bridge may cause `--fixed-cidr` to
not have the correct addresses.
The issue is that `netutils.ElectInterfaceAddresses(bridgeName)`
only returns the first IPv4 address.
This fix changes `ElectInterfaceAddresses()` and `addresses()`
so that all IPv4 addresses are returned. This allows the caller
to selectively choose the address needed.
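The behavior change can be sketched as a filter that keeps every IPv4 address instead of stopping at the first. The functions below are hypothetical and are not the `netutils` implementation; `ipv4FromCIDRs` exists only to make the sketch easy to exercise.

```go
package main

import (
	"fmt"
	"net"
)

// ipv4Nets returns every IPv4 network found in addrs, not just the first one.
func ipv4Nets(addrs []net.Addr) []*net.IPNet {
	var out []*net.IPNet
	for _, a := range addrs {
		if n, ok := a.(*net.IPNet); ok && n.IP.To4() != nil {
			out = append(out, n)
		}
	}
	return out
}

// ipv4FromCIDRs parses CIDR strings as interface addresses and returns the
// IPv4 host addresses among them, in input order.
func ipv4FromCIDRs(cidrs ...string) []string {
	var addrs []net.Addr
	for _, c := range cidrs {
		ip, n, err := net.ParseCIDR(c)
		if err != nil {
			continue
		}
		n.IP = ip // keep the host address rather than the network address
		addrs = append(addrs, n)
	}
	var out []string
	for _, n := range ipv4Nets(addrs) {
		out = append(out, n.IP.String())
	}
	return out
}

func main() {
	// A bridge carrying two IPv4 addresses and one IPv6 address.
	fmt.Println(ipv4FromCIDRs("172.17.0.1/16", "fe80::1/64", "10.10.0.1/24"))
}
```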
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
- and update it to store. Otherwise after an ungraceful shutdown,
at next boot there will be in store two bridge endpoints with
same port-mapping data. When bridge driver will try to restore
the endpoints, there will be conflicts and a container with
restart policy could fail to start.
Signed-off-by: Alessandro Boch <aboch@docker.com>
As part of daemon init, network and ipam drivers are passed a
pluginstore object that implements the plugin/getter interface. Use this
interface methods in libnetwork to interact with network plugins. This
interface provides the new and improved pluginv2 functionality and falls
back to pluginv1 (legacy) if necessary.
Signed-off-by: Anusha Ragunathan <anusha@docker.com>
When plumbing overlay filter rules serialize this to make sure that
multiple sandbox join or leave is not causing erroneous behavior while
moving the RETURN rule in the predefined chains.
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
This reverts commit b042dbe312.
The original commit breaks s390x, for example Docker build fails:
* https://github.com/docker/docker/issues/26440
As discussed in the above issue:
Even though char is unsigned by default on s390x, (gcc)go forces the type
of RawSockaddr.Data to be signed.
It makes no practical difference if these fields are signed or unsigned,
it's just an API issue.
The (assumed) reason for the original commit:
For a while RawSockaddr.Data was unsigned during development of the gcc
s390x port (not in an upstream release though). Probably the patch has
been developed in this time frame.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
When stale delete notifications are received, we still need to make sure
to purge sandbox neighbor cache because these stale deletes are most
typically out of order delete notifications and if an add for the
peermac was received before the delete of the old peermac,vtep pair then
we process that and replace the kernel state but the old neighbor state
in the sandbox cache remains. That needs to be purged when we finally
get the out of order delete notification.
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
Fixed certain spurious overlay errors which were not errors at all but
showed up every time service tasks were started in the engine.
Also added a check to make sure a delete is valid by comparing the
incoming endpoint id with the one in peerdb, to make sure the
delete from gossip is not stale.
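The validity check can be sketched like this. The `peerDB` and `peerEntry` types are hypothetical and much simpler than the driver's peer database; the sketch only shows the endpoint id comparison.

```go
package main

import "fmt"

// peerEntry records which endpoint currently owns a peer IP.
type peerEntry struct {
	eid  string // endpoint id
	vtep string // address of the node hosting the endpoint
}

// peerDB is keyed by peer IP.
type peerDB map[string]peerEntry

// deletePeer removes the entry for ip only when the incoming endpoint id
// matches the stored one. A mismatch means the delete is stale.
func (db peerDB) deletePeer(ip, eid string) bool {
	e, ok := db[ip]
	if !ok || e.eid != eid {
		return false
	}
	delete(db, ip)
	return true
}

func main() {
	db := peerDB{"10.0.0.2": {eid: "ep-new", vtep: "192.168.1.11"}}
	fmt.Println(db.deletePeer("10.0.0.2", "ep-old")) // stale delete is ignored
	fmt.Println(db.deletePeer("10.0.0.2", "ep-new")) // matching delete applies
}
```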
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
- We need to compare the node notification IP with
the advertise address otherwise when the advertise
address is different from the local address (this
is for the public address outside of the host
that maps 1-to-1 to the local private address)
the local IP will be accounted as an ipsec host
and extra states will be programmed for it.
Signed-off-by: Alessandro Boch <aboch@docker.com>
- When creating a non encrypted overlay network,
make sure no encryption related mangle rule from
a stale network is in the way.
Signed-off-by: Alessandro Boch <aboch@docker.com>
- Because of a bug in the netlink xfrm code, our code will
fail to find and remove the states. While we could wait
for the netlink library fix, there is no longer a need to
convert the parsed IP addresses to the canonical notation
given the previous SPI computation (which worked on that
4 byte address assumption) is now replaced by the fnv hash.
- Also modify driver option that enables ipsec to "encrypted"
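A hash based SPI does not depend on the byte length of the addresses. The sketch below is an assumption about the general approach, not the driver's code: `deriveSPI`, its `tag` parameter and the order in which the fields are hashed are all hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
	"net"
)

// deriveSPI hashes the source address, a tag and the destination address
// with 32-bit FNV-1a. It works on whatever byte form the addresses have,
// so no conversion to a 4 byte representation is needed.
func deriveSPI(src, dst net.IP, tag uint32) uint32 {
	var t [4]byte
	binary.BigEndian.PutUint32(t[:], tag)
	h := fnv.New32a()
	h.Write(src)
	h.Write(t[:])
	h.Write(dst)
	return h.Sum32()
}

// spiFromStrings parses the addresses and derives the SPI.
func spiFromStrings(src, dst string, tag uint32) uint32 {
	return deriveSPI(net.ParseIP(src), net.ParseIP(dst), tag)
}

func main() {
	fmt.Printf("%#x\n", spiFromStrings("192.168.1.10", "192.168.1.11", 1))
	fmt.Printf("%#x\n", spiFromStrings("fd00::10", "fd00::11", 1))
}
```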
Signed-off-by: Alessandro Boch <aboch@docker.com>
With this change, all the auto-detection of the addresses are removed
from libnetwork and the caller takes the responsibility to have a proper
advertise-addr in various scenarios (including externally facing public
advertise-addr with an internal facing private listen-addr)
Signed-off-by: Madhu Venugopal <madhu@docker.com>
A network is added to the `d.networks` map before it's fully initialized. That
is, it's possible for a network in `d.networks` to exist without having
`bridgeIPv4` populated yet. If multiple networks are spun up close to the same
time, a panic can occur.
Example:
```
panic(0x1a75d20, 0xc82000e090)
/usr/local/go/src/runtime/panic.go:443 +0x4e9
net.networkNumberAndMask(0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/usr/local/go/src/net/ip.go:433 +0x42
net.(*IPNet).Contains(0x0, 0xc82084dbd0, 0x4, 0x4, 0xc820010200)
/usr/local/go/src/net/ip.go:457 +0x25
github.com/docker/libnetwork/drivers/bridge.(*networkConfiguration).conflictsWithNetworks(0xc822249360, 0xc822761380, 0x40, 0xc820866a60, 0x4, 0x4, 0x0, 0x0)
/root/rpmbuild/BUILD/docker-engine/vendor/src/github.com/docker/libnetwork/drivers/bridge/bridge.go:334 +0x40b
```
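The trace shows `Contains` being called on a nil `*net.IPNet`. One possible guard, which is not necessarily what this commit does, is to skip networks whose bridge address is not populated yet. The `network` type and both functions below are hypothetical.

```go
package main

import (
	"fmt"
	"net"
)

// network is a hypothetical stand-in; bridgeIPv4 stays nil until the
// network is fully initialized.
type network struct {
	id         string
	bridgeIPv4 *net.IPNet
}

// conflictsWith reports whether candidate overlaps any initialized network.
func conflictsWith(candidate *net.IPNet, networks []*network) bool {
	for _, nw := range networks {
		if nw.bridgeIPv4 == nil {
			continue // not initialized yet: Contains on a nil *net.IPNet panics
		}
		if nw.bridgeIPv4.Contains(candidate.IP) || candidate.Contains(nw.bridgeIPv4.IP) {
			return true
		}
	}
	return false
}

// conflictsWithCIDRs builds networks from CIDR strings; an empty string
// stands for a network that is registered but not initialized.
func conflictsWithCIDRs(candidate string, existing ...string) bool {
	_, c, err := net.ParseCIDR(candidate)
	if err != nil {
		return false
	}
	var nws []*network
	for i, e := range existing {
		nw := &network{id: fmt.Sprintf("net%d", i)}
		if _, n, err := net.ParseCIDR(e); err == nil {
			nw.bridgeIPv4 = n
		}
		nws = append(nws, nw)
	}
	return conflictsWith(c, nws)
}

func main() {
	fmt.Println(conflictsWithCIDRs("172.18.0.0/16", "", "172.18.0.0/24"))
	fmt.Println(conflictsWithCIDRs("172.19.0.0/16", "", "172.18.0.0/16"))
}
```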
Signed-off-by: Andy Lindeman <alindeman@salesforce.com>
Currently ovmanager simply logs an error when there is a vni allocation
failure. Instead it should error out and free all the previously
allocated vnis
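The intended behavior is an all-or-nothing allocation. A minimal sketch, with a hypothetical `vniPool` allocator and `allocateVNIs` helper standing in for the ovmanager code:

```go
package main

import (
	"errors"
	"fmt"
)

// vniPool is a hypothetical allocator with a fixed remaining capacity.
type vniPool struct {
	next, remaining uint32
	released        []uint32
}

func (p *vniPool) alloc() (uint32, error) {
	if p.remaining == 0 {
		return 0, errors.New("vni pool exhausted")
	}
	p.remaining--
	v := p.next
	p.next++
	return v, nil
}

func (p *vniPool) release(v uint32) { p.released = append(p.released, v) }

// allocateVNIs allocates n vnis. On the first failure it releases everything
// allocated so far and returns the error instead of just logging it.
func allocateVNIs(p *vniPool, n int) ([]uint32, error) {
	vnis := make([]uint32, 0, n)
	for i := 0; i < n; i++ {
		v, err := p.alloc()
		if err != nil {
			for _, got := range vnis {
				p.release(got)
			}
			return nil, fmt.Errorf("allocating vni %d of %d: %w", i+1, n, err)
		}
		vnis = append(vnis, v)
	}
	return vnis, nil
}

func main() {
	p := &vniPool{next: 256, remaining: 2}
	_, err := allocateVNIs(p, 3)
	fmt.Println(err, "released:", p.released)
}
```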
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
If xfrm modules cannot be loaded:
- Create netlink.Handle only for ROUTE socket
- Reject local join on overlay secure network
Signed-off-by: Alessandro Boch <aboch@docker.com>
If we cleaned up a stale network sandbox and an entry for that exists in
vniTbl, then purge it from vniTbl. Otherwise when a new vxlan for that
vni is added to the network, we might destroy the network sandbox
created in the current life.
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
If a miss notification arrives on a network's miss go routine currently
it is unconditionally processed. This is unnecessary and can be bad if
there are too many misses. This is especially true for hostmode. Fix
this by filtering out misses that don't belong to any of the network's
subnets.
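The filter amounts to a subnet membership test before any further processing. A minimal sketch with hypothetical function names:

```go
package main

import (
	"fmt"
	"net"
)

// belongsToSubnets reports whether ip falls inside any of the subnets.
func belongsToSubnets(ip net.IP, subnets []*net.IPNet) bool {
	for _, s := range subnets {
		if s.Contains(ip) {
			return true
		}
	}
	return false
}

// missInSubnets is a string-based wrapper: it parses the miss address and
// the network's subnets, ignoring subnets that fail to parse.
func missInSubnets(ip string, cidrs ...string) bool {
	addr := net.ParseIP(ip)
	if addr == nil {
		return false
	}
	var subnets []*net.IPNet
	for _, c := range cidrs {
		if _, n, err := net.ParseCIDR(c); err == nil {
			subnets = append(subnets, n)
		}
	}
	return belongsToSubnets(addr, subnets)
}

func main() {
	// Only the first miss would be handed on for processing.
	fmt.Println(missInSubnets("10.0.1.7", "10.0.1.0/24", "10.0.2.0/24"))
	fmt.Println(missInSubnets("192.168.5.9", "10.0.1.0/24", "10.0.2.0/24"))
}
```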
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
In the current implementation, the local peers are being added as remote
peers, so they get added to the vxlan neighbor and fdb tables. This causes the
local forwarding to get stuck for a few seconds after the bridge mac
table entries for the local peers get aged out. This PR fixes the
problem.
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
If a new network request is received for a particular vni, cleanup the
interface with that vni even if it is inside a namespace. This is done
by collecting vni to namespace data during init and later using it to
delete the interface.
Also fixed a long pending issue of the vxlan interface not getting
destroyed even if the sandbox is destroyed. Fixed by deleting the
vxlan interface before destroying the sandbox.
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
libnetwork agent mode is a mode where libnetwork can act as a local
agent for network and discovery plumbing alone while the state
management is done elsewhere. This completes the support for making
libnetwork and its associated drivers completely independent of a
k/v store (if needed) and able to work purely based on the state information
passed along by some external controller or manager. This does not
mean that libnetwork support for decentralized state management via a
k/v store is removed.
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
Currently overlay driver requires a k/v store to allocate a vxlan id and
add an entry in k/v store for network->vxlanIDs binding. But the overlay
driver should be able to work without a k/v store provided libnetwork
can pass along the vxlanIDs needed for the network, rather than the
driver managing it themselves. Modified the driver to work with vxlanIDs
passed down by libnetwork.
Also made changes in the driver to make use of the gossip layer
available in libnetwork if available.
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
With the introduction of a driver generic gossip in libnetwork it is not
necessary for drivers to run their own gossip protocol (like what
overlay driver is doing currently) but instead rely on the gossip
instance run centrally in libnetwork. In order to achieve this, certain
enhancements to driver api are needed. This api aims to provide these
enhancements.
The new api provides a way for drivers to register interest on table
names of their choice by returning a list of table names of interest as
a response to CreateNetwork. By doing that they will get notified if a
CRUD operation happened on the tables of their interest, via the newly
added EventNotify call.
Drivers themselves can add entries to any table during a Join call by
invoking AddTableEntry method any number of times during the Join
call. These entries lifetime is the same as the endpoint itself. As soon
as the container leaves the endpoint, those entries added by driver
during that endpoint's Join call will be automatically removed by
libnetwork. This action may trigger notification of such deletion to all
driver instances in the cluster who have registered interest in that
table's notification.
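The registration and notification flow can be sketched as below. The signatures are simplified and hypothetical, not the actual driver api: the `tableDriver` interface, the `controller` type and the `peer_table` name are invented for illustration, and interest is tracked per table name only.

```go
package main

import "fmt"

// tableDriver is a simplified, hypothetical view of the driver side.
type tableDriver interface {
	// CreateNetwork returns the table names the driver is interested in.
	CreateNetwork(nid string) []string
	// EventNotify is called when an entry in one of those tables changes.
	EventNotify(event, nid, table, key string, value []byte)
}

// controller plays the role of libnetwork: it records interest and
// dispatches table events to the drivers that registered for them.
type controller struct {
	interest map[string][]tableDriver // table name -> interested drivers
}

func (c *controller) addNetwork(nid string, d tableDriver) {
	for _, t := range d.CreateNetwork(nid) {
		c.interest[t] = append(c.interest[t], d)
	}
}

func (c *controller) publish(event, nid, table, key string, value []byte) {
	for _, d := range c.interest[table] {
		d.EventNotify(event, nid, table, key, value)
	}
}

// recorder is a driver that just remembers the events it was notified of.
type recorder struct {
	tables []string
	seen   []string
}

func (r *recorder) CreateNetwork(string) []string { return r.tables }

func (r *recorder) EventNotify(event, nid, table, key string, value []byte) {
	r.seen = append(r.seen, event+":"+table+":"+key)
}

func main() {
	c := &controller{interest: map[string][]tableDriver{}}
	r := &recorder{tables: []string{"peer_table"}}
	c.addNetwork("n1", r)
	c.publish("create", "n1", "peer_table", "ep1", nil)
	c.publish("create", "n1", "other_table", "ep2", nil) // not registered
	fmt.Println(r.seen)
}
```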
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
Currently datastore has dependencies on various kv backends.
This is undesirable if datastore had to be used as a backend
agnostic store management package with its cache layer. This
PR aims to achieve that.
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
Because overlay is a builtin driver and global allocation of overlay
resources is probably going to happen in a different node (a single
node) and the actual plumbing of the network is probably going to happen
in all nodes, it makes sense to split the functionality of allocation
into two different packages. The central component(this package) only
implements the NetworkAllocate/Free apis while the distributed
component(the existing overlay driver) implements the rest of the driver
api. This way we can reduce the memory footprint overall.
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>