This commit carries forward the work done in
https://github.com/docker/libnetwork/pull/2295
and fixes two things
1. Allows macvlan and ipvlan to be restored properly
after dockerd or the system is restarted
2. Makes sure the refcount for the configOnly network
is not incremented for the above case so this network
can be deleted after all the associated ConfigFrom networks
are deleted
Addresses: https://github.com/docker/libnetwork/issues/1743
Signed-off-by: Arko Dasgupta <arko.dasgupta@docker.com>
Since docker container can be connected to combination of several
internal and external networks change of default gateway of the internal
ones breaks communication via the external ones.
This fixes only macvlan network type
Signed-off-by: Pavel Matěja <pavel@verotel.cz>
This commit updates the vendored ishidawataru/sctp and adapts its used
types.
Signed-off-by: Sascha Grunert <sgrunert@suse.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
moby/moby commit b27f70d45 wraps the ErrNotFound error returned when
a plugin cannot be found, to include a backtrace. This changes the
type of the error, so contoller.loadIPAMDriver no longer converts it
to a libnetwork plugin.NotFoundError.
This is a similar patch as was merged in 9b114971e5
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Changes included:
- Allow index specification at link creation time
- replace syscall with golang.org/x/sys/unix
- related: Use IFF_MULTI_QUEUE from x/sys/unix to define TUNTAP_MULTI_QUEUE
- related: Use IFLA_* constants from x/sys/unix
- Fix index out of range when no metadata for gretap
- added encapsulation attributes for Iptun and Sittun to support SIT tunnels
- Expose xfrm state's statistics
- Support invert in ip rules
- Support LWTUNNEL_ENCAP_SEG6
- Support setting and retrieving route MTU/AdvMSS
- Fix CalcRtable array parameter bug
- added support for Foo-over-UDP netlink calls
- Support num{tx,rx}queues and udp6zerocsum{tx,rx}
- tuntap: Add multiqueue support
- Retrieve VLAN ID when listing neighbour
- Fix LinkAdd for sit tunnel on 3.10 kernel
- Add support for managing source MACVLANs
- Two functions: one for adding bond slave, one for getting veth peer index
- Eliminate cgo from netlink
- Don't overwrite the XDP file descriptor with flags
- Fix reference to BPF instructions (on Kernel 4.13)
- Add Matchall filter
- Send IFA_CACHEINFO when setting up addresses
- Support IPv6 GRE Tun and Tap
- Add List option to RouteSubscribeWithOptions, AddrSubscribeWithOptions, and LinkSubscribeWithOptions
- Add Fq and Fq_Codel Qdisc support
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
RFC434 states that DNS Servers should be case insensitive
This commit makes sure that all DNS queries will be translated
to lower ASCII characters and all svcRecords will be saved in
lower case to abide by the RFC
Relates to https://github.com/moby/moby/issues/21169
Signed-off-by: Arko Dasgupta <arko.dasgupta@docker.com>
This test was producing error messages due to missing endpoints
in the plugin API;
```
=== RUN TestValidRemoteDriver
ERRO[0039] error getting capability for valid-network-driver due to NetworkDriver.GetCapabilities: 404 page not found
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
systemd and udev in their default configuration attempt to set a
persistent MAC address for network interfaces that don't have one
already [systemd-def-link]. We set the address only after creating the
interface, so there is a race between us and udev. There are several
outcomes (that actually occur, this race is very much not a theoretical
one):
* We set the address before udev gets to the networking rules, so udev
sees `/sys/devices/virtual/net/docker0/addr_assign_type = 3`
(NET_ADDR_SET). This means there's no need to assign a different
address and everything is fine.
* udev reads `/sys/devices/virtual/net/docker0/addr_assign_type` before
we set the address, gets `1` (NET_ADDR_RANDOM), and proceeds to
generate and set a persistent address.
Old versions of udev (pre-v242, i.e. without [udev-patch]) would then
fail to generate an address, spit out "Could not generate persistent
MAC address for docker0: No such file or directory" (see [udev-issue],
and everything would be probably fine as well.
Current version of udev (with [udev-patch]) will generate an address
just fine and then race us setting it. As udev does more work than we,
the most probable outcome is that udev will overwrite the address we
set and possibly cause some trouble later on.
On a clean Debian Buster (from Vagrant) VM with systemd/udev 242 from
Debian Experimental, `docker network create net1` up to `net7` resulted
in 3 bridges having a 02:42: address and 4 bridges having a seemingly
random (actually generated from interface name) address. With systemd
241, the result would be all bridges having a 02:42:, but some "Could
not generate persistent MAC address for" messages in the log.
The fix is to revert the MAC address setting fix from 6901ea51dc,
as it is no longer necessary with current netlink [netlink-addr-add],
and set the address atomically when creating the bridge interface, not
after that.
[systemd-def-link]: a166cd3aac/network/99-default.link
[udev-patch]: 6d36464065
[udev-issue]: https://github.com/systemd/systemd/issues/3374
[netlink-addr-add]: 7d9b424492
...
Do note that a similar race happens when creating veth devices as well.
I wasn't able to reproduce getting a wrong (non-02:42:) address,
possibly because the address is set by docker later, maybe only after
the interface is moved to another network namespace (but I'm just
guessing here). Still, different timings result in various error
messages being logged ("link_config: could not get ethtool features for
vethd9c938e" and the like) depending on when the interface disappears
from the primary network namespace. I'm not sure how to fix this and I
don't intend to dig deeper into this.
Signed-off-by: Tomas Janousek <tomi@nomi.cz>
The endpoint count for --config-only networks
was being incremented even when the respective --config-from
inherited network failed to create a network
This was due to a variable shadowing problem with err causing
the deferred function to not execute correctly.
Using the same err variable across the entire function fixes
the issue
Fixes: moby/moby#35101
Signed-off-by: Arko Dasgupta <arko.dasgupta@docker.com>
Starting from 18.09, there's a per node LB for each overlay
network, this change adds the check to node LB.
This change should not break on older docker versions.
Signed-off-by: Xinfeng Liu <xinfeng.liu@gmail.com>
For overlay, l2bridge, and l2tunnel, if the user does not specify a host port, windows driver will select a random port for them. This matches linux behavior.
For ics and nat networks the windows OS will choose the port.
Signed-off-by: Pradip Dhara <pradipd@microsoft.com>
This reverts commit 7adcd856fe.
Libnetwork should only use the iptables binary. Iptables v1.8 and above
uses the nftables backend. The translations for all the rules used by
libnetwork is supported by the new iptables binary.
Signed-off-by: Arko Dasgupta <arko.dasgupta@docker.com>
Minor changes following review of the engine pull request
for this feature;
- Remove the name of the function from the error message
as it's not a debug message.
- Add the valid range to the error message, so that a
user has sufficient information to address the problem.
- Update GoDoc for the function to describe the default
port, and valid port-ranges.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
It is possible that the node is not yet present in
the node list map. In this case just print a warning
and return. The next iteration would be fine
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Introduce "com.docker.network.bridge.inhibit_ipv4" option to the bridge
network driver. If set, this option will prevent docker from setting or
modifying Layer-3 (IP) configuration on the bridge interface in any way.
This option should allow connecting containers to pre-existing network
segments (with e.g., pre-existing default gateways) while simultaneously
preserving our ability to communicate with the host and/or configure the
properties of the host-side container virtual network interface (e.g.,
delay/loss/jitter via netem), which can not be done using macvlan.
Signed-off-by: Gabriel Somlo <gsomlo@gmail.com>
This PR chnages allow user to configure VxLAN UDP
port number. By default we use 4789 port number. But this commit
will allow user to configure port number during swarm init.
VxLAN port can't be modified after swarm init.
Signed-off-by: selansen <elango.siva@docker.com>
- NXDOMAIN is an authoritive answer, so when receiving an NXDOMAIN, we're done.
From RFC 1035: Name Error - Meaningful only for responses from an authoritative
name server, this code signifies that the domain name referenced in the query
does not exist.
FROM RFC 8020: When an iterative caching DNS resolver receives an NXDOMAIN
response, it SHOULD store it in its cache and then all names and resource
record sets (RRsets) at or below that node SHOULD be considered unreachable.
Subsequent queries for such names SHOULD elicit an NXDOMAIN response.
- REFUSED can be a transitional status: (https://www.ietf.org/rfc/rfc1035.txt)
The name server refuses to perform the specified operation for
policy reasons. For example, a name server may not wish to provide the
information to the particular requester, or a name server may not wish to
perform a particular operation (e.g., zone)
Other errors are now logged as debug-message, which can be useful for
troubleshooting.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Allow DSR to be a configurable option through a generic option to the
overlay driver. On the one hand this approach makes sense insofar as
only overlay networks can currently perform load balancing. On the
other hand, this approach has several issues. First, should we create
another type of swarm scope network, this will prevent it working.
Second, the service core code is separate from the driver code and the
driver code can't influence the core data structures. So the driver
code can't set this option itself. Therefore, implementing in this way
requires some hack code to test for this option in
controller.NewNetwork.
A more correct approach would be to make this a generic option for any
network. Then the driver could ignore, reject or be unaware of the option
depending on the chosen model. This would require changes to:
* libnetwork - naturally
* the docker API - to carry the option
* swarmkit - to propagate the option
* the docker CLI - to support the option
* moby - to translate the API option into a libnetwork option
Given the urgency of requests to address this issue, this approach will
be saved for a future iteration.
Signed-off-by: Chris Telfer <ctelfer@docker.com>
Modify the loadbalancing for east-west traffic to use direct routing
rather than NAT and update tasks to use direct service return under
linux. This avoids hiding the source address of the sender and improves
the performance in single-client/single-server tests.
Signed-off-by: Chris Telfer <ctelfer@docker.com>
Since SvcStats represents the stats for a `Service`, we don't want
to reuse that struct in the `Destination` (for no other reason than
incompatible nomenclature). So this patch adds a `DstStats` struct
to hold the Destination stats.
When dockerd.exe is not stopped cleanly (such as when Windows is
restarted), the endpoints are not cleaned up. When using a transparent
network, the endpoint IPv4 address is blank. When dockerd.exe starts up
again, libnetwork restores the endpoint, which would not have been
stored on a clean shutdown of dockerd.exe. That fails because the IPv4
address is blank. This change warns instead of failing.
Signed-off-by: John Stephens <johnstep@docker.com>
IPVLAN driver had been retained in experimental for multiple releases
with the requirement to have a proper L3 control-plane (such as BGP) to
go along with it which will make this driver much more useful. But
based on the community feedback,
https://github.com/moby/moby/issues/21735, am proposing to move this
driver out of experimental.
Signed-off-by: Madhu Venugopal <madhu@docker.com>
Signed-off-by: John Howard <jhoward@microsoft.com>
As well as bumping, libkv now requires go.etcd.io/bolt rather
than boltdb/bolt. Hence removed bolt from vendor.conf,
vendored go.etcd.io/bbot @ v1.3.1-etcd.8 and rerun vndr.
This addresses/alleviates https://github.com/docker/libnetwork/issues/2214
The new proposed limit should remediate the issue for most users.
Signed-off-by: Thiago Alves Silva <thiago.alves@aurea.com>
```
make lint
networkdb/networkdb_test.go:88:2: should replace t.Error(fmt.Sprintf(...)) with t.Errorf(...)
networkdb/networkdb_test.go:136:2: should replace t.Error(fmt.Sprintf(...)) with t.Errorf(...)
make: *** [lint] Error 1
```
Signed-off-by: Lei Gong <lgong@alauda.io>
ipamutils has two default address pool. Instead of allowing them to
be accessed directly, adding get functions so that other packages
can use get APIs.
Signed-off-by: selansen <elango.siva@docker.com>
This change brings global default address pool feature into
libnetwork. Idea is to reuse same code flow and functions that were
implemented for local scope default address pool.
Function InitNetworks carries most of the changes. local scope default
address pool init should always happen only once. But Global scope
default address pool can be initialized multiple times.
Signed-off-by: selansen <elango.siva@docker.com>
Go 1.10 fixed the problem related to thread and namespaces.
Details:
2595fe7fb6
In few words there is no more the possibility to have a go routine
running on a thread that is another namespace.
In this commit some cleanup is done and the method SetNamespace is
being removed. This will save tons of setns syscall, that were happening
way too frequently possibily to make sure that each operation was being
done in the host namespace.
I suspect that also all the drivers not running in a different
namespace would be able to drop also the lock of the OS Thread but
will address it in a different commit
Removed useless LockOSThreads around
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Change the sandbox IDs for the sandboxes of load-balancing endpoints to
be "lb_XXXXXXXXX" where XXXXXXXXX is the network ID that this sandbox
load balances for. This makes it easier to find these sandboxes in
/var/run/docker/netns and thus makes debugging easier.
Signed-off-by: Chris Telfer <ctelfer@docker.com>
Internal directory is designed to contain libraries
that are exclusively used by this project
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Instead of using "sync.Once" to determine whether to initialize a
network sandbox or subnet sandbox, we use a traditional mutex +
initialization boolean. This is because the initialization state isn't
truly a once-and-done condition. Rather, libnetwork destroys network
and subnet sandboxes when the last endpoint leaves them. The use of
sync.Once in this kind of scenario requires, therefore, re-initializing
the Once which is impoissible. So the approach that libnetwork
currently takes is to use a pointer to a Once and redirect that pointer
to a new Once on reset. This leads to nasty race conditions.
In addition to refactoring the locking, this patch merges the functions
joinSandbox(), and joinSubnetSandbox(). This makes the code both cleaner
and it also holds the network and subnet locks through the series of
read-modify-writes avoiding further potential races. This does reduce
the potential parallelism which could be applied should there be many
joins coming in on many different subnets in the same overlay network.
However, this should be an extremely minor performance hit for a very
obscure case.
One important pattern in this commit is that it is crucial to avoid
sending peerDB messages while holding a driver or network lock. The
changes herein defer such (asynchronous) notifications until after
release of such locks. This prevents deadlocks where the peerDB
blocks acquiring said locks while the network method blocks trying
to send to the peerDB's channel.
Signed-off-by: Chris Telfer <ctelfer@docker.com>
'make check' will now fail if the files produced by re-running protoc
differ from those which are checked into the repository.
Signed-off-by: Euan Harris <euan.harris@docker.com>
Outside the build container, run: make protobuf
Inside the build container, run: make protobuf-local
Signed-off-by: Euan Harris <euan.harris@docker.com>
This makes it possible to Ctrl-C tests and builds again. Zombie
processes will also be reaped correctly.
Signed-off-by: Euan Harris <euan.harris@docker.com>
The previous code used string slices to limit the length of certain
fields like endpoint or sandbox IDs. This assumes that these strings
are at least as long as the slice length. Unfortunately, some sandbox
IDs can be smaller than 7 characters. This fix addresses this issue
by systematically converting format string calls that were taking
fixed-slice arguments to use a precision specifier in the string format
itself. From the golang fmt package documentation:
For strings, byte slices and byte arrays, however, precision limits
the length of the input to be formatted (not the size of the output),
truncating if necessary. Normally it is measured in runes, but for
these types when formatted with the %x or %X format it is measured
in bytes.
This nicely fits the desired behavior: it will limit the number of
runes considered for string interpolation to the precision value.
Signed-off-by: Chris Telfer <ctelfer@docker.com>
TestOverlappingRequests checks that pool requests which are supersets or
subsets of existing allocations, and those which overlap with existing
allocations at the beginning or the end.
Multiple allocation is now tested by TestOverlappingRequests, so
TestDoublePoolRelease only needs to test double releasing.
Signed-off-by: Euan Harris <euan.harris@docker.com>
Added some optimizations to reduce the messages in the queue:
1) on join network the node execute a tcp sync with all the nodes that
it is aware part of the specific network. During this time before the
node was redistributing all the entries. This meant that if the network
had 10K entries the queue of the joining node will jump to 10K. The fix
adds a flag on the network that would avoid to insert any entry in the
queue till the sync happens. Note that right now the flag is set in
a best effort way, there is no real check if at least one of the nodes
succeed.
2) limit the number of messages to redistribute coming from a TCP sync.
Introduced a threshold that limit the number of messages that are
propagated, this will disable this optimization in case of heavy load.
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
instead of printing the whole option, print the _number_ only,
because that's what the error-message is pointing at;
Before this change:
invalid number for ndots option ndots:foobar
After this change:
invalid number for ndots option: foobar
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Refactor the ostweaks file to allows a more easy reuse
Add a method on the osl.Sandbox interface to allow setting
knobs on the sandbox
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
`ndots:0` is a valid DNS option; previously, `ndots:0` was
ignored, leading to the default (`ndots:0`) also being applied;
Before this change:
docker network create foo
docker run --rm --network foo --dns-opt ndots:0 alpine cat /etc/resolv.conf
nameserver 127.0.0.11
options ndots:0 ndots:0
After this change:
docker network create foo
docker run --rm --network foo --dns-opt ndots:0 alpine cat /etc/resolv.conf
nameserver 127.0.0.11
options ndots:0
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The "message" argument in assert.Equal expects a format
string; the current string was not that, resulting in an
incorrect message being printed;
--- FAIL: TestDNSOptions (1.28s)
Location: service_common_test.go:92
Error: Not equal: "ndots:5" (expected)
!= "ndots:0" (actual)
Messages: The option must be ndots:5 instead:%!(EXTRA string=ndots:0)
This patch removes the message altogether, because assert.Equal
already prints enough information to catch the error;
--- FAIL: TestDNSOptions (1.28s)
Location: service_common_test.go:92
Error: Not equal: "ndots:5" (expected)
!= "ndots:0" (actual)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Add debug and error logs to notify when a load balancing sandbox
is not found. This can occur in normal operation during removal.
Signed-off-by: Chris Telfer <ctelfer@docker.com>
Lock the network ID in the controller during an addServiceBinding to
prevent racing with network.delete(). This would cause the binding to
be silently ignored in the system.
Signed-off-by: Chris Telfer <ctelfer@docker.com>
This is the heart of the scalability change for services in libnetwork.
The present routing mesh adds load-balancing rules for a network to
every container connected to the network. This newer approach creates a
load-balancing endpoint per network per node. For every service on a
network, libnetwork assigns the VIP of the service to the endpoint's
interface as an alias. This endpoint must have a unique IP address in
order to route return traffic to it. Traffic destined for a service's
VIP arrives at the load-balancing endpoint on the VIP and from there,
Linux load balances it among backend destinations while SNATing said
traffic to the endpoint's unique IP address.
The net result of this scheme is that each node in a swarm need only
have one set of load balancing state per service instead of one per
container on the node. This scheme is very similar to how services
currently operate on Windows nodes in libnetwork. It (as with Windows
nodes) costs the use of extra IP addresses in a network (one per node)
and an extra network hop in the stack, although, always in the stack
local to the container.
In order to prevent existing deployments from suddenly failing if they
failed to allocate sufficient address space to include per-node
load-balancing endpoint IP addresses, this patch preserves the existing
functionality and activates the new functionality on a per-network
basis depending on whether the network has a load-balancing endpoint.
Eventually, moby should always set this option when creating new
networks and should only omit it for networks created as part of a swarm
that are not marked to use endpoint load balancing.
This patch also normalizes the code to treat "load" and "balancer"
as two separate words from the perspectives of variable/function naming.
This means that the 'b' in "balancer" must be capitalized.
Signed-off-by: Chris Telfer <ctelfer@docker.com>
This was passing extra information and adding confusion about the
purpose of the load balancing structure.
Signed-off-by: Chris Telfer <ctelfer@docker.com>
New load balancing code will require ability to add aliases to
load-balncer sandboxes. So this broadens the OSL interface to allow
adding aliases to any interface, along with the facility to get the
loopback interface's name based on the OS.
Signed-off-by: Chris Telfer <ctelfer@docker.com>
This method returns the name of the interface from the perspective
of the host OS pre-container. This will be required later for
finding matching a sandbox's interface name to an endpoint which
is, in turn, requied for adding an IP alias to a load balancer
endpoint.
Signed-off-by: Chris Telfer <ctelfer@docker.com>
Error unwinding only works if the error variable is used consistently
and isn't hidden in the scope of other if statements.
Signed-off-by: Chris Telfer <ctelfer@docker.com>
Default gateways truncate the endpoint name to 12 characters. This can
make network endpoints ambiguous especially for load-balancing sandboxes
for networks with lenghty names (such as with our prefixes). Address
this by detecting an overflow in the sanbox name length and instead
opting to name the gateway endpoint "gateway_<id>" which should never
collide.
Signed-off-by: Chris Telfer <ctelfer@docker.com>
Change the Delete() method to take optional options and add
NetworkDeleteOptionRemoveLB as one such option. This option allows
explicit removal of an ingress network along with its load-balancing
endpoint if there are no other endpoints in the network. Prior to this,
the libnetwork client would have to manually search for and remove the
ingress load balancing endpoint from an ingress network. This was, of
course, completely hacky.
This commit will require a slight modification in moby to make use of
the option when deleting the ingress network.
Signed-off-by: Chris Telfer <ctelfer@docker.com>
Fix related to bug: https://github.com/docker/for-linux/issues/348
We should perform updateToStore(ep) after n.addEndpoint or do update twice,
otherwise response from network plugin will not be written to KV storage.
This results in container creation with broken network config.
Signed-off-by: Siarhei Rasiukevich <raskintech@gmail.com>
Most of the libcontainer imports was just for a single test to marshal a
simple type, meanwhile this caused all kinds of transient imports that
are not really needed.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit a07a1ee9ccdf4c5a3a90eea9fd359f10b5156c84)
Signed-off-by: selansen <elango.siva@docker.com>
The use of `Client()` on v2 plugins is being deprecated so that we can
be more flexible on the protocol used for plugins.
This means checking specifically if the plugin implements the
`Client() *plugins.Client` interface for V1 plugins, and for v2 plugins
building a the client manually.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit 45824a226b8a220d6f189c2d25fe16f9efc83db9)
Signed-off-by: selansen <elango.siva@docker.com>
agent.pb.go is unchanged, but the files in networkdb and drivers
are slightly different when regenerated using the current versions
of protoc and gogoproto. This is probably because agent.pb.go
was last regenerated quite recently, in February 2018, whereas
networkdb.pb.go and overlay/overlay.pb.go were last changed in 2017,
and windows/overlay/overlay.pb.go was last changed in 2016.
Signed-off-by: Euan Harris <euan.harris@docker.com>
Previous logic was not accounting that each node is
in the node list so the bootstrap nodes won't retry
to reconnect because they will always find themselves
in the node map
Added test that validate the gossip island condition
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
* Update dependencies to match moby master; add new sub-dependencies
as necessary.
* Update moby to latest
* Update gocapability
This moves gocapability beyond the version vendored in moby;
presumably the code which requires this particular version
is not used in moby and is removed by vndr. Moby will need
to be updated as well.
Signed-off-by: Euan Harris <euan.harris@docker.com>
moby/moby commit b27f70d45 wraps the ErrNotFound error returned when
a plugin cannot be found, to include a backtrace. This changes the
type of the error, so contoller.loadDriver no longer converts it to a
libnetwork plugin.NotFoundError. This causes a couple of tests which
inspect the return type to fail; most code only checks whether the
error is non-nil and is not affected by the change in type.
Signed-off-by: Euan Harris <euan.harris@docker.com>
Add a test to confirm that the pool allocator will iterate through all
the pools even if some earlier ones were freed before coming back to
previously allocated pools.
Signed-off-by: Chris Telfer <ctelfer@docker.com>
This commit prevents subnets from being reused at least initially,
instead favoring to cycle through them as we do with addresses within a
subnet.
Signed-off-by: Chris Telfer <ctelfer@docker.com>
Commit b64997ea prevented data corruption due to simultaneous
driver.CreateNetwork()/driver.DeleteNetwork() by holding the network
lock through the read/modify part of the operation. However, part of
the DeleteNetwork operation entails sending a message to the peerDB to
tell that goroutine to flush entries on deletion. This can lead to a
deadlock where:
* driver.DeleteNetwork() starts and acquires driver.Lock()
* peerDB receives some other request (e.g. EventNotify) and blocks
on driver.Lock()
* driver.DeleteNetwork() attempts a peerDB flush and blocks waiting
on the synchronous peerDB operation channel
This patch fixes the issue by deferring the peerDB flush operation until
after DeleteNetwork() unlocks driver.Lock(). Commit b64997ea only
modified CreateNetwork() and DeleteNetwork() and the critical section
that driver.Lock() protects in CreateNetwork() does not perform any
peerDB notifications or other locks of driver data structures. So this
solution should be a complete fix for any regressions introduced in
b64997ea.
Signed-off-by: Chris Telfer <ctelfer@docker.com>
Make sure that iptables operations on ingress
are serialized.
Before 2 racing routines trying to create the ingress chain
were allowed and one was failing reporting the chain as
already existing.
The lock guarantees that this condition does not happen anymore
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
The system should remove cluster service info including networkDB
entries and DNS entries for container endpoints that are not part of a
service as well as those that are part of a service. This used to be
the normal sequence of operations but it moved to
sandbox.DisableService() in an effort to more gracefully handle endpoint
removal from a service (which proved insufficient). Unfortunately
subsequent changes also removed the newly-mandetory call to
sandbox.DisableService() preventing proper cleanup for non-service
container endpoints.
Signed-off-by: Chris Telfer <ctelfer@docker.com>
Go 1.7 added the subtest feature which can make table-driven tests much easier to run and debug. Some tests are not using this feature.
Signed-off-by: Yang Li <idealhack@gmail.com>
Use net.splitHostPort() instead of our own logic in func (p *PortBinding)
FromString(s string) error. This means that IPv6 literals, including
IPv4 in IPv6 literals, can now be parsed from the string form of
PortBindings. Zoned addresses do not work - net.splitHostPort() parses
them but net.ParseIP() cannot and returns an error. This is ok because
we do not have a slot to store the zone name in PortBinding anyway.
Signed-off-by: Euan Harris <euan.harris@docker.com>
The function tested by TestUtilGetHostPortionIP is called GetHostPartIP.
Rename the test to match the function being tested.
Signed-off-by: Euan Harris <euan.harris@docker.com>
Releasing a pool which has already been released should fail; this
change increases coverage by a fraction by exercising this path.
Signed-off-by: Euan Harris <euan.harris@docker.com>
This makes it easy to drop into the build container, for instance to
run tests or other Go tools over a subset of the code.
Signed-off-by: Euan Harris <euan.harris@docker.com>