Commit graph

757 commits

Author SHA1 Message Date
Cory Snider
042f0799db libn/d/overlay: support encryption on any port
While the VXLAN interface and the iptables rules to mark outgoing VXLAN
packets for encryption are configured to use the Swarm data path port,
the XFRM policies for actually applying the encryption are hardcoded to
match packets with destination port 4789/udp. Consequently, encrypted
overlay networks do not pass traffic when the Swarm is configured with
any other data path port: encryption is not applied to the outgoing
VXLAN packets and the destination host drops the received cleartext
packets. Use the configured data path port instead of hardcoding port
4789 in the XFRM policies.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 9a692a3802)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-05-26 16:41:42 -04:00
Cory Snider
f77a3274b4
[chore] clean up reexec.Init() calls
Now that most uses of reexec have been replaced with non-reexec
solutions, most of the reexec.Init() calls peppered throughout the test
suites are unnecessary. Furthermore, most of the reexec.Init() calls in
test code neglect to check the return value to determine whether to
exit, which would result in the reexec'ed subprocesses proceeding to run
the tests, which would reexec another subprocess which would proceed to
run the tests, recursively. (That would explain why every reexec
callback used to unconditionally call os.Exit() instead of returning...)

Remove unneeded reexec.Init() calls from test and example code which no
longer needs it, and fix the reexec.Init() calls which are not inert to
exit after a reexec callback is invoked.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 4e0319c878)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-05-11 16:31:41 +02:00
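
The recursion described in the commit above comes from ignoring the boolean result of the init call. The following is a minimal sketch of the pattern using only the standard library; `register` and `initReexec` are hypothetical stand-ins written for illustration, not the real `reexec` package API.

```go
package main

import (
	"fmt"
	"os"
)

// registry maps an argv[0] value to the callback to run in the child process.
var registry = map[string]func(){}

// register is a hypothetical stand-in for registering a reexec callback.
func register(name string, fn func()) { registry[name] = fn }

// initReexec is a hypothetical stand-in for the init call: it runs the
// callback registered under argv0, if any, and reports whether one was run.
func initReexec(argv0 string) bool {
	fn, ok := registry[argv0]
	if ok {
		fn()
	}
	return ok
}

func main() {
	register("example-child", func() { fmt.Println("running reexec callback") })

	// Checking the return value is the important part: a re-executed child
	// must stop here instead of falling through to the parent's work.
	if initReexec(os.Args[0]) {
		return
	}
	fmt.Println("running as the parent process")
}
```

If the `return` were omitted, a child started under a registered name would run its callback and then continue into the parent's code path, which is the failure mode the commit message describes for test binaries.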
Sebastiaan van Stijn
17feabcba0
libnetwork: overlayutils: remove redundant init()
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-04-28 20:18:29 +02:00
Sebastiaan van Stijn
214e200f95
Merge pull request #45308 from corhere/libnet/overlay-bpf-ipv6
libnetwork/drivers/overlay: make VNI matcher IPv6-compatible
2023-04-26 14:37:09 +02:00
Brian Goff
0970cb054c
Merge pull request #45366 from akerouanton/fix-docker0-PreferredPool
daemon: set docker0 subpool as the IPAM pool
2023-04-25 11:07:57 -07:00
Albin Kerouanton
2d31697d82
daemon: set docker0 subpool as the IPAM pool
Since cc19eba (backported to v23.0.4), the PreferredPool for docker0 is
set only when the user provides the bip config parameter or when the
default bridge already exists. That means, if a user provides the
fixed-cidr parameter on a fresh install or reboots their computer/server
without bip set, dockerd throws the following error when it starts:

> failed to start daemon: Error initializing network controller: Error
> creating default "bridge" network: failed to parse pool request for
> address space "LocalDefault" pool "" subpool "100.64.0.0/26": Invalid
> Address SubPool

See #45356.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-04-25 15:32:46 +02:00
Cory Snider
c399963243 libn/d/overlay: make VNI matcher IPv6-compatible
Use Linux BPF extensions to locate the offset of the VXLAN header within
the packet so that the same BPF program works with VXLAN packets
received over either IPv4 or IPv6.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-04-24 14:20:29 -04:00
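
To illustrate why the VXLAN header offset differs between IPv4 and IPv6, here is a rough userspace sketch in Go. It is not the BPF program from the commit above: the function names are made up for this example, the input is assumed to start at the IP header, and IPv6 extension headers are deliberately ignored.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	udpHeaderLen   = 8
	vxlanHeaderLen = 8
	ipv6HeaderLen  = 40
)

// vxlanOffset returns the offset of the VXLAN header in a packet that starts
// at the IP header. Assumption: IPv6 packets carry no extension headers.
func vxlanOffset(pkt []byte) (int, error) {
	if len(pkt) == 0 {
		return 0, errors.New("empty packet")
	}
	switch pkt[0] >> 4 {
	case 4:
		ihl := int(pkt[0]&0x0f) * 4 // IPv4 header length is variable
		if ihl < 20 {
			return 0, errors.New("invalid IPv4 header length")
		}
		return ihl + udpHeaderLen, nil
	case 6:
		return ipv6HeaderLen + udpHeaderLen, nil
	}
	return 0, errors.New("unknown IP version")
}

// vni extracts the 24-bit VXLAN Network Identifier from the packet.
func vni(pkt []byte) (uint32, error) {
	off, err := vxlanOffset(pkt)
	if err != nil {
		return 0, err
	}
	if len(pkt) < off+vxlanHeaderLen {
		return 0, errors.New("packet too short for a VXLAN header")
	}
	// The VNI occupies bytes 4-6 of the VXLAN header; byte 7 is reserved.
	return binary.BigEndian.Uint32(pkt[off+4:off+8]) >> 8, nil
}

// fakePacket builds a zeroed IP+UDP+VXLAN header stack for demonstration.
func fakePacket(version byte, ipHeaderLen int, id uint32) []byte {
	pkt := make([]byte, ipHeaderLen+udpHeaderLen+vxlanHeaderLen)
	if version == 4 {
		pkt[0] = 0x40 | byte(ipHeaderLen/4)
	} else {
		pkt[0] = 0x60
	}
	off := ipHeaderLen + udpHeaderLen
	pkt[off] = 0x08 // "VNI present" flag
	binary.BigEndian.PutUint32(pkt[off+4:], id<<8)
	return pkt
}

func main() {
	for _, pkt := range [][]byte{fakePacket(4, 20, 4097), fakePacket(6, 40, 4097)} {
		id, err := vni(pkt)
		fmt.Println(id, err)
	}
}
```

Both packets yield the same VNI even though the VXLAN header sits at a different byte offset in each, which is the property the matcher needs.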
Cory Snider
7d9bb170b7 libn/d/overlay: test the VNI BPF matcher on IPv4
Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-04-24 14:19:39 -04:00
Albin Kerouanton
1e1efe1f61
libnet/d/overlay: clean up iptables rules on network delete
This commit removes iptables rules configured for secure overlay
networks when a network is deleted. Prior to this commit, only
CreateNetwork() was taking care of removing stale iptables rules.

If one of the iptables rules can't be removed, the error is logged but
it doesn't prevent network deletion.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-04-17 17:21:21 +02:00
Sebastiaan van Stijn
0154746b9f
Merge pull request #44965 from akerouanton/libnetwork-dead-code
libnetwork/overlay: remove dead code
2023-04-11 17:09:45 +02:00
Albin Kerouanton
8ed900263e
libnetwork/overlay: remove host mode
Linux kernels prior to v3.16 did not support netns for vxlan
interfaces. As such, moby/libnetwork#821 introduced a "host mode" to the
overlay driver. The related kernel fix is available for rhel7 users
since v7.2.

This mode could be forced through the use of the env var
_OVERLAY_HOST_MODE. However this env var has never been documented and
is not referenced in any blog post, so there's little chance many people
rely on it. Moreover, this host mode is deemed an implementation
detail by maintainers. As such, we can consider it dead and we can
remove it without a prior deprecation warning.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-04-06 19:52:41 +02:00
Albin Kerouanton
1d46597c8b
libnetwork/overlay: remove KVObject implementation
Since 0fa873c, there's no function writing overlay networks to some
datastore. As such, overlay network struct doesn't need to implement
KVObject interface.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-04-06 19:52:29 +02:00
Albin Kerouanton
f32f09e78f
libnetwork/overlay: don't lock network when accessing subnet vni
As of a few commits ago, a subnet's vni doesn't change during the lifetime of
the subnet struct, so there's no need to lock the network before
accessing it.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-04-06 19:52:27 +02:00
Albin Kerouanton
b67446a8fa
libnetwork: remove local store from overlay driver
Since the previous commit, data from the local store are never read,
thus proving it was only used for Classic Swarm.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-04-06 19:52:27 +02:00
Albin Kerouanton
8aa1060c34
libnetwork/overlay: remove live-restore support
The overlay driver in Swarm v2 mode doesn't support live-restore, ie.
the daemon won't even start if the node is part of a Swarm cluster and
live-restore is enabled. This feature was only used by Swarm Classic.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-04-06 19:52:27 +02:00
Albin Kerouanton
e3708a89cc
libnetwork/overlay: remove vni allocation
VNI allocations made by the overlay driver were only used by Classic
Swarm. With Swarm v2 mode, the driver ovmanager is responsible for
allocating & releasing them.

Previously, vxlanIdm was initialized when a global store was available
but since 142b522, no global store can be instantiated. As such,
releaseVxlanID actually does nothing and iptables rules are
never removed.

The last line of dead code detected by golangci-lint is now gone.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-04-06 19:52:27 +02:00
Albin Kerouanton
e251837445
libnetwork/overlay: remove Serf-based clustering
Prior to 0fa873c, the serf-based event loop was started when a global
store was available. Since there's no more global store, this event loop
and all its associated code is dead.

Most dead code detected by golangci-lint in prior commits is now gone.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-04-06 19:52:17 +02:00
Albin Kerouanton
644e3d4cdb
libnetwork/netlabel: remove dead code
- LocalKVProvider, LocalKVProviderURL, LocalKVProviderConfig,
  GlobalKVProvider, GlobalKVProviderURL and GlobalKVProviderConfig
  are all unused since moby/libnetwork@be2b6962 (moby/libnetwork#908).
- GlobalKVClient is unused since 0fa873c and c8d2c6e.
- MakeKVProvider, MakeKVProviderURL and MakeKVProviderConfig are unused
  since 96cfb076 (moby/moby#44683).
- MakeKVClient is unused since 142b5229 (moby/moby#44875).

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-04-06 19:51:56 +02:00
Albin Kerouanton
c8d2c6ea77
libnetwork: remove unused props from windows overlay driver
The overlay driver was creating a global store whenever
netlabel.GlobalKVClient was specified in its config argument. This
specific label has been unused since 142b522 (moby/moby#44875).

It was also creating a local store whenever netlabel.LocalKVClient was
specified in its config argument. This store is unused since
moby/libnetwork@9e72136 (moby/libnetwork#1636).

Finally, the sync.Once properties are never used and thus can be
deleted.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-04-06 19:33:04 +02:00
Albin Kerouanton
0fa873c0fe
libnetwork: remove global store from overlay driver
The overlay driver was creating a global store whenever
netlabel.GlobalKVClient was specified in its config argument. This
specific label is not used anymore since 142b522 (moby/moby#44875).

golangci-lint now detects dead code. This will be fixed in subsequent
commits.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-04-06 19:33:04 +02:00
Cory Snider
4d04068184 libn/d/overlay: only program xt_bpf rules
Drop support for platforms which only have xt_u32 but not xt_bpf. No
attempt is made to clean up old xt_u32 iptables rules left over from a
previous daemon instance.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-04-05 11:50:03 -04:00
Sebastiaan van Stijn
878ee341d6
Merge pull request from GHSA-232p-vwff-86mp
libnetwork: ensure encryption is mandatory on encrypted overlay networks
2023-04-04 20:03:51 +02:00
Cory Snider
9e3a6ccf69 libn/i/setmatrix: make generic and constructorless
Allow SetMatrix to be used as a value type with a ready-to-use zero
value. SetMatrix values are already non-copyable by virtue of having a
mutex field so there is no harm in allowing non-pointer values to be
used as local variables or struct fields. Any attempts to pass around
by-value copies, e.g. as function arguments, will be flagged by go vet.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-03-29 13:31:12 -04:00
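
The "ready-to-use zero value" idea from the commit above can be sketched as follows. `SetMap` is a hypothetical, much-reduced stand-in written for this example and does not reproduce the real SetMatrix API; it only shows a generic type whose methods lazily initialize the underlying map so no constructor is needed.

```go
package main

import (
	"fmt"
	"sync"
)

// SetMap maps each key to a set of values. The zero value is ready to use.
// Because it embeds a mutex, it must not be copied after first use;
// `go vet` (copylocks) flags by-value copies.
type SetMap[K, V comparable] struct {
	mu sync.Mutex
	m  map[K]map[V]struct{}
}

// Insert adds v to the set stored under k and reports whether it was new.
func (s *SetMap[K, V]) Insert(k K, v V) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.m == nil {
		s.m = make(map[K]map[V]struct{}) // lazy init replaces a constructor
	}
	set, ok := s.m[k]
	if !ok {
		set = make(map[V]struct{})
		s.m[k] = set
	}
	if _, exists := set[v]; exists {
		return false
	}
	set[v] = struct{}{}
	return true
}

// Cardinality returns the number of values stored under k.
func (s *SetMap[K, V]) Cardinality(k K) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.m[k]) // indexing a nil map is safe and yields an empty set
}

func main() {
	var sm SetMap[string, int] // no constructor call needed
	sm.Insert("a", 1)
	sm.Insert("a", 1)
	sm.Insert("a", 2)
	fmt.Println(sm.Cardinality("a")) // prints 2
}
```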
Brian Goff
0a334ea081
Merge pull request #45164 from corhere/libnet/peer-op-function-call
libnetwork/d/overlay: handle peer ops directly
2023-03-27 09:46:49 -07:00
Albin Kerouanton
bae49ff278
libnet/d/windows: log EnableInternalDNS val after setting it
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-03-24 18:23:21 +01:00
Cory Snider
965eda3b9a libnet/d/overlay: insert the input-drop rule
FirewallD creates the root INPUT chain with a default-accept policy and
a terminal rule which rejects all packets not accepted by any prior
rule. Any subsequent rules appended to the chain are therefore inert.
The administrator would have to open the VXLAN UDP port to make overlay
networks work at all, which would result in all VXLAN traffic being
accepted and defeating our attempts to enforce encryption on encrypted
overlay networks.

Insert the rule to drop unencrypted VXLAN packets tagged for encrypted
overlay networks at the top of the INPUT chain so that enforcement of
mandatory encryption takes precedence over any accept rules configured
by the administrator. Continue to append the accept rule to the bottom
of the chain so as not to override any administrator-configured drop
rules.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-03-22 20:54:01 -04:00
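
The effect of inserting rather than appending can be shown with a toy model of first-match rule evaluation. This is a simplification written for illustration: the `packet` and `rule` types are invented here, and real netfilter processing of encrypted traffic is considerably more involved.

```go
package main

import "fmt"

type packet struct{ vxlan, encrypted bool }

type rule struct {
	match   func(packet) bool
	verdict string
}

// evaluate returns the verdict of the first matching rule, or the chain
// policy if no rule matches.
func evaluate(chain []rule, p packet, policy string) string {
	for _, r := range chain {
		if r.match(p) {
			return r.verdict
		}
	}
	return policy
}

// adminChain models a FirewallD-style INPUT chain in which the administrator
// has opened the VXLAN port and a terminal rule rejects everything else.
func adminChain() []rule {
	return []rule{
		{match: func(p packet) bool { return p.vxlan }, verdict: "ACCEPT"},
		{match: func(packet) bool { return true }, verdict: "REJECT"},
	}
}

// dropCleartext models the rule that enforces mandatory encryption.
var dropCleartext = rule{
	match:   func(p packet) bool { return p.vxlan && !p.encrypted },
	verdict: "DROP",
}

func main() {
	cleartext := packet{vxlan: true}

	appended := append(adminChain(), dropCleartext)
	inserted := append([]rule{dropCleartext}, adminChain()...)

	fmt.Println("appended:", evaluate(appended, cleartext, "ACCEPT")) // appended: ACCEPT
	fmt.Println("inserted:", evaluate(inserted, cleartext, "ACCEPT")) // inserted: DROP
}
```

In the appended case the drop rule sits behind the terminal reject and is never reached, so the administrator's accept rule lets cleartext VXLAN through.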
Cory Snider
105b9834fb libnet/d/overlay: add BPF-powered VNI matcher
Some newer distros such as RHEL 9 have stopped making the xt_u32 kernel
module available with the kernels they ship. They do ship the xt_bpf
kernel module, which can do everything xt_u32 can and more. Add an
alternative implementation of the iptables match rule which uses xt_bpf
to implement exactly the same logic as the u32 filter using a BPF
program. Try programming the BPF-powered rules as a fallback when
programming the u32-powered rules fails.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-03-15 19:33:51 -04:00
Cory Snider
44cf27b5fc libnet/d/overlay: extract VNI match rule builder
The iptables rule clause used to match on the VNI of VXLAN datagrams
looks like line noise to the uninitiated. It doesn't help that the
expression is repeated twice and neither copy has any commentary.
DRY out the rule builder to a common function, and document what the
rule does and how it works.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-03-15 19:30:28 -04:00
Cory Snider
142f46cac1 libn/d/overlay: enforce encryption on sandbox init
The iptables rules which make encryption mandatory on an encrypted
overlay network are only programmed once there is a second node
participating in the network. This leaves single-node encrypted overlay
networks vulnerable to packet injection. Furthermore, failure to program
the rules is not treated as a fatal error.

Program the iptables rules to make encryption mandatory before creating
the VXLAN link to guarantee that there is no window of time where
incoming cleartext VXLAN packets for the network would be accepted, or
outgoing cleartext packets be transmitted. Only create the VXLAN link if
programming the rules succeeds to ensure that it fails closed.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-03-15 19:28:11 -04:00
Cory Snider
d4fd582fb2 libnet/d/overlay: document some encryption code
The overlay-network encryption code is woefully under-documented, which
is especially problematic as it operates on under-documented kernel
interfaces. Document what I have puzzled out of the implementation for
the benefit of the next poor soul to touch this code.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-03-15 17:26:24 -04:00
Cory Snider
a050db4a6f libnetwork/d/overlay: handle peer ops directly
Funneling the peer operations into an unbuffered channel only serves to
achieve the same result as a mutex, using a lot more boilerplate and
indirection. Get rid of the boilerplate and unnecessary indirection by
using a mutex and calling the operations directly.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-03-14 18:33:32 -04:00
Cory Snider
09d39c023c libnetwork/i/setmatrix: devirtualize
There is only one implementation. Get rid of the interface.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-03-14 18:09:08 -04:00
Cory Snider
91725ddc92 libnet/d/ipvlan: gracefully migrate from older dbs
IPVLAN networks created on Moby v20.10 do not have the IpvlanFlag
configuration value persisted in the libnetwork database as that config
value did not exist before v23.0.0. Gracefully migrate configurations on
unmarshal to prevent type-assertion panics at daemon start after upgrade.

Fixes #44925

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-02-06 12:08:28 -05:00
Cory Snider
a08a254df3 libnetwork: drop DatastoreConfig discovery type
The DatastoreConfig discovery type is unused. Remove the constant and
any resulting dead code. Today's biggest loser is the IPAM Allocator:
DatastoreConfig was the only type of discovery event it was listening
for, and there was no other place where a non-nil datastore could be
passed into the allocator. Strip out all the dead persistence code from
Allocator, leaving it as purely an in-memory implementation.

There is no more need to check the consistency of the allocator's
bit-sequences as there is no persistent storage for inconsistent bit
sequences to be loaded from.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-27 11:47:43 -05:00
Cory Snider
28edc8e2d6 libnet: convert to new-style driver registration
Per the Interface Segregation Principle, network drivers should not have
to depend on GetPluginGetter methods they do not use. The remote network
driver is the only one which needs a PluginGetter, and it is already
special-cased in Controller so there is no sense warping the interfaces
to achieve a foolish consistency. Replace all other network drivers' Init
functions with Register functions which take a driverapi.Registerer
argument instead of a driverapi.DriverCallback. Add back in Init wrapper
functions for only the drivers which Swarmkit references so that
Swarmkit can continue to build.

Refactor the libnetwork Controller to use the new drvregistry.Networks
and drvregistry.IPAMs driver registries in place of the legacy ones.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-27 11:47:42 -05:00
Cory Snider
48ad9e19e4 libnetwork/netutils: drop ElectInterfaceAddresses
The function references global shared, mutable state and is no longer
needed. Deleting it brings us one step closer to getting rid of that
pesky shared state.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-26 14:56:11 -05:00
Bjorn Neergaard
390532cbc6
libnetwork/windows/overlay: drop unused variables
These package-level variables were copied over from the Linux
implementation; drop them for clarity's sake.

Signed-off-by: Bjorn Neergaard <bneergaard@mirantis.com>
2023-01-24 12:44:19 -07:00
Bjorn Neergaard
3775939303
libnetwork/netutils: refactor GenerateRandomName
GenerateRandomName now uses length to represent the overall length of
the string; this will help future users avoid creating interface names
that are too long for the kernel to accept by mistake. The test coverage
is increased and cleaned up using gotest.tools.

Signed-off-by: Bjorn Neergaard <bneergaard@mirantis.com>
2023-01-24 12:44:14 -07:00
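
One way to make `length` mean the overall length of the generated string is sketched below. `randomName` is a hypothetical function written for this example and may differ from the actual GenerateRandomName implementation.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
)

// randomName returns prefix followed by random hex digits, with a total
// length of exactly `length` characters.
func randomName(prefix string, length int) (string, error) {
	if length <= len(prefix) {
		return "", errors.New("length must be greater than the prefix length")
	}
	// Each random byte yields two hex digits; round up, then truncate.
	b := make([]byte, (length-len(prefix)+1)/2)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return (prefix + hex.EncodeToString(b))[:length], nil
}

func main() {
	// Linux interface names are limited to 15 characters plus a NUL byte.
	name, err := randomName("veth", 15)
	if err != nil {
		panic(err)
	}
	fmt.Println(name, len(name))
}
```

With this signature the caller states the kernel limit once, rather than having to subtract the prefix length by hand at every call site.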
Brian Goff
483b03562a
Merge pull request #44491 from corhere/libnetwork-minus-reexec
libnetwork: eliminate almost all reexecs
2023-01-13 10:44:25 -08:00
Bjorn Neergaard
dae48a8064
Merge pull request #44803 from akerouanton/fix-44721
libnetwork: Remove iptables nat rule when hairpin is disabled
2023-01-12 08:36:10 -07:00
Cory Snider
4733127a04 libnetwork: set default VLAN without reexec
Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-11 12:14:31 -05:00
Albin Kerouanton
ef161d4aeb
libnetwork: Clean up sysfs-based operations
- The oldest kernel version currently supported is v3.10. Bridge
parameters can be set through netlink since v3.8 (see
torvalds/linux@25c71c7). As such, we don't need to fallback to sysfs to
set hairpin mode.
- `scanInterfaceStats()` is never called, so no need to keep it alive.
- Document why `default_pvid` is set through sysfs

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-01-11 17:01:53 +01:00
Albin Kerouanton
566a2e4c79
libnetwork: Remove iptables nat rule when hairpin is disabled
When userland-proxy is turned off and on again, the iptables nat rule
doing hairpinning isn't properly removed. This fix makes sure this nat
rule is removed whenever the bridge is torn down or hairpinning is
disabled (through setting userland-proxy to true).

Unlike for ip masquerading and ICC, the `programChainRule()` call
setting up the "MASQ LOCAL HOST" rule has to be called unconditionally
because the hairpin parameter isn't restored from the driver store, but
always comes from the driver config.

For the "SKIP DNAT" rule, things are a bit different: this rule is
always deleted by `removeIPChains()` when the bridge driver is
initialized.

Fixes #44721.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-01-11 16:32:18 +01:00
Albin Kerouanton
b37d34307d
Clear conntrack entries for published UDP ports
Conntrack entries are created for UDP flows even if there's nowhere to
route these packets (ie. no listening socket and no NAT rules to
apply). Moreover, iptables NAT rules are evaluated by netfilter only
when creating a new conntrack entry.

When Docker adds NAT rules, netfilter will ignore them for any packet
matching a pre-existing conntrack entry. In such a case, when
dockerd runs with the userland proxy enabled, packets get routed to it and
the main symptom is a bad source IP address (as shown by #44688).

If the publishing container is run through Docker Swarm or in
"standalone" Docker but with no userland proxy, affected packets will
be dropped (eg. routed to nowhere).

As such, Docker needs to flush all conntrack entries for published UDP
ports to make sure NAT rules are correctly applied to all packets.

- Fixes #44688
- Fixes #8795
- Fixes #16720
- Fixes #7540
- Fixes moby/libnetwork#2423
- and probably more.

As a precautionary measure, those conntrack entries are also flushed
when revoking external connectivity to prevent those entries from being reused
when a new sandbox is created (although the kernel should already
prevent such a case).

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-01-05 12:53:22 +01:00
Tianon Gravi
bcb8f69cc5
Merge pull request #44239 from thaJeztah/resolvconf_refactor_step2
libnetwork: simplify handling of reading resolv.conf
2022-12-22 13:18:47 -08:00
Sebastiaan van Stijn
36151bd1d7
libnetwork/drivers/bridge: remove "ioctl" fallback code for legacy kernels
This code was forked from libcontainer (now runc) in
fb6dd9766e

From the description of this code:

> THIS CODE DOES NOT COMMUNICATE WITH KERNEL VIA RTNETLINK INTERFACE
> IT IS HERE FOR BACKWARDS COMPATIBILITY WITH OLDER LINUX KERNELS
> WHICH SHIP WITH OLDER NOT ENTIRELY FUNCTIONAL VERSION OF NETLINK

That comment was added as part of a refactor in;
4fe2c7a4db

Digging deeper into the code, it describes:

> This is more backward-compatible than netlink.NetworkSetMaster and
> works on RHEL 6.

That comment (and code) moved around a few times;

- moved into the libcontainer pkg: 6158ccad97
- moved within the networkdriver pkg: 4cdcea2047
- moved into the networkdriver pkg: 90494600d3

Ultimately leading to 7a94cdf8ed, which implemented
this:

> create the bridge device with ioctl
>
> On RHEL 6, creation of a bridge device with netlink fails.  Use the more
> backward-compatible ioctl instead.  This fixes networking on RHEL 6.

So from that information, it looks like this code indeed exists to support
RHEL 6 and Ubuntu 12.04, which are both EOL and which we haven't supported
for a long time, so it's probably time to remove this.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-12-21 17:32:04 +01:00
Sebastiaan van Stijn
0cbe6524db
libnetwork/drivers/overlay: getBridgeNamePrefix() simplify reading of resolv.conf
We only need the content here, not the checksum, so simplifying the code by
just using os.ReadFile().

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-11-29 20:10:42 +01:00
Cory Snider
010077ba0f libnet/d/bridge: fix race condition in test case
TestCreateParallel, which was ostensibly added as a regression test for
race conditions inside the bridge driver, contains a race condition. The
getIPv4Data() calls race the network configuration and so will sometimes
see the existing address assignments and return IP address ranges which do
not conflict with them. While normally a good thing, the test asserts
that exactly one of the 100 networks is successfully created. Pass the
same IPAM data when attempting to create every network to ensure that
the address ranges conflict.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-11-08 17:58:06 -05:00
Cory Snider
7b2308980c libnet/d/bridge: fix bridgeInterface.addresses()
addresses() would incorrectly return all IP addresses assigned to any
interface in the network namespace if exists() is false. This went
unnoticed as the unit test covering this case tested the method inside a
clean new network namespace, which had no interfaces brought up and
therefore no IP addresses assigned. Modifying
testutils.SetupTestOSContext() to bring up the loopback interface 'lo'
resulted in the loopback addresses 127.0.0.1 and [::1] being assigned to
the loopback interface, causing addresses() to return the loopback
addresses and TestAddressesEmptyInterface to start failing. Fix the
implementation of addresses() so that it only ever returns addresses for
the bridge interface.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-11-08 17:58:06 -05:00
Cory Snider
c2a087a9f7 libnet/d/bridge: use fresh PortAllocator in tests
portallocator.PortAllocator holds persistent state on port allocations
in the network namespace. In normal operation it is used as a singleton,
which is a problem in unit tests as every test runs in a clean network
namespace. The singleton "default" PortAllocator remembers all the port
allocations made in other tests---in other network namespaces---and
can result in spurious test failures. Refactor the bridge driver so that
tests can construct driver instances with unique PortAllocators.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-11-08 17:58:06 -05:00
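
The refactoring described in the last commit amounts to injecting the allocator instead of always reaching for a process-wide singleton. The sketch below shows the idea with invented names (`portAllocator`, `newDriver`); it is not the bridge driver's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// portAllocator is a hypothetical allocator that remembers reserved ports.
type portAllocator struct {
	mu   sync.Mutex
	used map[int]bool
}

func newPortAllocator() *portAllocator {
	return &portAllocator{used: make(map[int]bool)}
}

// request reserves a port or fails if it is already taken.
func (a *portAllocator) request(port int) error {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.used[port] {
		return errors.New("port already allocated")
	}
	a.used[port] = true
	return nil
}

// defaultAllocator is the shared singleton used in normal operation.
var defaultAllocator = newPortAllocator()

type driver struct{ ports *portAllocator }

// newDriver uses the given allocator, falling back to the singleton.
func newDriver(pa *portAllocator) *driver {
	if pa == nil {
		pa = defaultAllocator
	}
	return &driver{ports: pa}
}

func main() {
	shared1, shared2 := newDriver(nil), newDriver(nil)
	fmt.Println(shared1.ports.request(8080)) // <nil>
	fmt.Println(shared2.ports.request(8080)) // port already allocated

	// A test can pass a fresh allocator and is unaffected by earlier state.
	isolated := newDriver(newPortAllocator())
	fmt.Println(isolated.ports.request(8080)) // <nil>
}
```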