Commit graph

3048 commits

Author SHA1 Message Date
Sebastiaan van Stijn
e515bef423
libnetwork/internal/kvstore: remove unused WatchTree and NewLock methods
These were not used, and not implemented by the BoltDB store.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-07-05 12:30:18 +02:00
Sebastiaan van Stijn
a373983a86
libnetwork/internal/kvstore: fix some linting issues
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-07-05 12:30:17 +02:00
Sebastiaan van Stijn
05988f88b7
libnetwork/internal/kvstore: remove unused Config options
The only remaining kvstore is BoltDB, which doesn't use TLS connections
or authentication, so we can remove these options.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-07-05 12:30:17 +02:00
Sebastiaan van Stijn
96a1c444cc
libnetwork: use string-literals for easier grep'ing
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-07-05 12:27:01 +02:00
Sebastiaan van Stijn
9ffca1fedd
libnetwork/drivers/remote: rename vars that collided
rename variables that collided with types and pre-defined funcs

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-07-03 10:37:15 +02:00
Sebastiaan van Stijn
40908c5fcd
libnetwork/drivers: inline capabilities options
Remove the intermediate variable, and move the option closer
to where it's used, as in some cases we created the variable,
but could return with an error before it was used.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-30 14:36:01 +02:00
Sebastiaan van Stijn
97285711f3
libnetwork/drivers/overlay: Register does not require DriverCallback
This function was not using the DriverCallback interface, and only
required the Registerer interface.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-30 14:29:30 +02:00
Sebastiaan van Stijn
a718ccd0c5
libnetwork/drivers: remove unused "config" parameters and fields
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-30 14:26:32 +02:00
Sebastiaan van Stijn
dd5ea7e996
libnetwork: format code with gofumpt
Formatting the code with https://github.com/mvdan/gofumpt

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-29 00:31:49 +02:00
Sebastiaan van Stijn
bba21735bf
libnetwork/ipamutils: format code with gofumpt
Formatting the code with https://github.com/mvdan/gofumpt

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-29 00:31:49 +02:00
Sebastiaan van Stijn
0b75c02276
libnetwork/resolvconf: format code with gofumpt
Formatting the code with https://github.com/mvdan/gofumpt

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-29 00:31:48 +02:00
Sebastiaan van Stijn
801cd50744
libnetwork/portallocator: format code with gofumpt
Formatting the code with https://github.com/mvdan/gofumpt

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-29 00:31:48 +02:00
Sebastiaan van Stijn
6187ada21f
libnetwork/options: format code with gofumpt
Formatting the code with https://github.com/mvdan/gofumpt

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-29 00:31:48 +02:00
Sebastiaan van Stijn
882f7bbf1f
libnetwork/osl: format code with gofumpt
Formatting the code with https://github.com/mvdan/gofumpt

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-29 00:31:48 +02:00
Sebastiaan van Stijn
32e716e848
libnetwork/networkdb: format code with gofumpt
Formatting the code with https://github.com/mvdan/gofumpt

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-29 00:31:48 +02:00
Sebastiaan van Stijn
65e2149b3e
libnetwork/netutils: format code with gofumpt
Formatting the code with https://github.com/mvdan/gofumpt

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-29 00:31:48 +02:00
Sebastiaan van Stijn
1cd937a867
libnetwork/etchosts: format code with gofumpt
Formatting the code with https://github.com/mvdan/gofumpt

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-29 00:31:48 +02:00
Sebastiaan van Stijn
540e150e4e
libnetwork/cmd: format code with gofumpt
Formatting the code with https://github.com/mvdan/gofumpt

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-29 00:31:47 +02:00
Sebastiaan van Stijn
fffcbdae4c
libnetwork/iptables: format code with gofumpt
Formatting the code with https://github.com/mvdan/gofumpt

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-29 00:31:47 +02:00
Sebastiaan van Stijn
6f3fcbcfe1
libnetwork/ipam(s): format code with gofumpt
Formatting the code with https://github.com/mvdan/gofumpt

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-29 00:31:47 +02:00
Sebastiaan van Stijn
eb6437b4db
libnetwork/datastore: format code with gofumpt
Formatting the code with https://github.com/mvdan/gofumpt

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-29 00:31:47 +02:00
Sebastiaan van Stijn
defa8ba7b4
ibnetwork/bitmap: format code with gofumpt
Formatting the code with https://github.com/mvdan/gofumpt

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-29 00:31:47 +02:00
Sebastiaan van Stijn
f349754b55
libnetwork/bitseq: format code with gofumpt
Formatting the code with https://github.com/mvdan/gofumpt

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-29 00:31:47 +02:00
Sebastiaan van Stijn
3af2963c74
libnetwork/drvregistry: format code with gofumpt
Formatting the code with https://github.com/mvdan/gofumpt

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-29 00:31:46 +02:00
Sebastiaan van Stijn
dc17f5e613
libnetwork/drivers/remote: format code with gofumpt
Formatting the code with https://github.com/mvdan/gofumpt

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-29 00:31:46 +02:00
Sebastiaan van Stijn
485977de57
libnetwork/drivers/windows: format code with gofumpt
Formatting the code with https://github.com/mvdan/gofumpt

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-29 00:31:46 +02:00
Sebastiaan van Stijn
2cc5c2d2e6
libnetwork/drivers/overlay: format code with gofumpt
Formatting the code with https://github.com/mvdan/gofumpt

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-29 00:31:46 +02:00
Sebastiaan van Stijn
e74028554e
libnetwork/drivers/macvlan: format code with gofumpt
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-29 00:31:46 +02:00
Sebastiaan van Stijn
7b02ccda86
libnetwork/drivers/ipvlan: format code with gofumpt
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-29 00:31:46 +02:00
Sebastiaan van Stijn
17a35bc645
libnetwork/drivers/bridge: format code with gofumpt
Formatting the code with https://github.com/mvdan/gofumpt

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-29 00:31:45 +02:00
Sebastiaan van Stijn
e60cda7051
libnetwork/internal/kvstore/boltdb: fix linting issues
libnetwork/internal/kvstore/boltdb/boltdb.go:452:28: unnecessary conversion (unconvert)
                _ = bucket.Delete([]byte(key))
                                        ^
    libnetwork/internal/kvstore/boltdb/boltdb.go:425:2: S1023: redundant `return` statement (gosimple)
        return
        ^

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-26 20:52:04 +02:00
Sebastiaan van Stijn
d18b89ced6
libnetwork/internal/kvstore: remove some unused code
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-26 20:51:53 +02:00
Sebastiaan van Stijn
b873d70369
replace libkv with local fork
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-26 20:51:42 +02:00
Sebastiaan van Stijn
5d25143ef3
libnetwork/kvstore: rewrite code for new location
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-26 20:49:52 +02:00
Sebastiaan van Stijn
3887475971
Integrate github.com/docker/libkv
A reduced set of the dependency, only taking the parts that are used. Taken from
upstream commit: dfacc563de

    # install filter-repo (https://github.com/newren/git-filter-repo/blob/main/INSTALL.md)
    brew install git-filter-repo

    cd ~/projects

    # create a temporary clone of docker
    git clone https://github.com/docker/libkv.git temp_libkv
    cd temp_libkv

    # create branch to work with
    git checkout -b migrate_libkv

    # remove all code, except for the files we need; rename the remaining ones to their new target location
    git filter-repo --force \
        --path libkv.go \
        --path store/store.go \
        --path store/boltdb/boltdb.go \
        --path-rename libkv.go:libnetwork/internal/kvstore/kvstore_manage.go \
        --path-rename store/store.go:libnetwork/internal/kvstore/kvstore.go \
        --path-rename store/boltdb/:libnetwork/internal/kvstore/boltdb/

    # go to the target github.com/moby/moby repository
    cd ~/projects/docker

    # create a branch to work with
    git checkout -b integrate_libkv

    # add the temporary repository as an upstream and make sure it's up-to-date
    git remote add temp_libkv ~/projects/temp_libkv
    git fetch temp_libkv

    # merge the upstream code, rewriting "pkg/symlink" to "symlink"
    git merge --allow-unrelated-histories --signoff -S temp_libkv/migrate_libkv

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-26 20:47:08 +02:00
Brian Goff
74da6a6363 Switch all logging to use containerd log pkg
This unifies our logging and allows us to propagate logging and trace
contexts together.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2023-06-24 00:23:44 +00:00
Cory Snider
d43b398746
Merge pull request #45657 from corhere/libn/setup-resolver-with-verbose-iptables
libnetwork: fix resolver restore w/ chatty 'iptables -C'
2023-05-30 21:44:14 +02:00
Cory Snider
1178319313 libn: fix resolver restore w/ chatty 'iptables -C'
Resolver.setupIPTable() checks whether it needs to flush or create the
user chains used for NATing container DNS requests by testing for the
existence of the rules which jump to said user chains. Unfortunately it
does so using the IPTable.RawCombinedOutputNative() method, which
returns a non-nil error if the iptables command returns any output even
if the command exits with a zero status code. While that is fine with
iptables-legacy as it prints no output if the rule exists, iptables-nft
v1.8.7 prints some information about the rule. Consequently,
Resolver.setupIPTable() would incorrectly think that the rule does not
exist during container restore and attempt to create it. This happened
work work by coincidence before 8f5a9a741b
because the failure to create the already-existing table would be
ignored and the new NAT rules would be inserted before the stale rules
left in the table from when the container was last started/restored. Now
that failing to create the table is treated as a fatal error, the
incompatibility with iptables-nft is no longer hidden.

Switch to using IPTable.ExistsNative() to test for the existence of the
jump rules as it correctly only checks the iptables command's exit
status without regard for whether it outputs anything.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-05-30 14:32:27 -04:00
Cory Snider
50eb2d2782 libnetwork: fix sandbox restore
The method to restore a network namespace takes a collection of
interfaces to restore with the options to apply. The interface names are
structured data, tuples of (SrcName, DstPrefix) but for whatever reason
are being passed into Restore() serialized to strings. A refactor,
f0be4d126d, accidentally broke the
serialization by dropping the delimiter. Rather than fix the
serialization and leave the time-bomb for someone else to trip over,
pass the interface names as structured data.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-05-30 12:27:59 -04:00
Cory Snider
18bf3aa442 libnetwork: log why osl sandbox restore failed
Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-05-30 12:17:44 -04:00
CrazyMax
fd72b134d5
update generated files
Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
2023-05-29 03:28:35 +02:00
CrazyMax
735537d6b1
replace gogofast with gogofaster extension
gogofaster is identical as gogofast but removes XXX_unrecognized

Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
2023-05-29 03:28:35 +02:00
CrazyMax
1eaea43581
fix protos and "go generate" commands
Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
2023-05-29 03:28:35 +02:00
Cory Snider
9a692a3802 libn/d/overlay: support encryption on any port
While the VXLAN interface and the iptables rules to mark outgoing VXLAN
packets for encryption are configured to use the Swarm data path port,
the XFRM policies for actually applying the encryption are hardcoded to
match packets with destination port 4789/udp. Consequently, encrypted
overlay networks do not pass traffic when the Swarm is configured with
any other data path port: encryption is not applied to the outgoing
VXLAN packets and the destination host drops the received cleartext
packets. Use the configured data path port instead of hardcoding port
4789 in the XFRM policies.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-05-26 14:36:34 -04:00
Sebastiaan van Stijn
d5dc675d37
Merge pull request #45280 from corhere/libnet/no-overlay-accept-rule
libnetwork/drivers/overlay: stop programming INPUT ACCEPT rule
2023-05-25 21:03:32 +02:00
Bjorn Neergaard
ecbd126d6a
Merge pull request #45586 from corhere/fix-flaky-resolver-test
libnetwork/osl: restore the right thread's netns
2023-05-19 20:45:38 -06:00
Cory Snider
871cf72363 libnetwork: check for netns leaks from prior tests
TestProxyNXDOMAIN has proven to be susceptible to failing as a
consequence of unlocked threads being set to the wrong network
namespace. As the failure mode looks a lot like a bug in the test
itself, it seems prudent to add a check for mismatched namespaces to the
test so we will know for next time that the root cause lies elsewhere.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-05-19 19:36:18 -04:00
Cory Snider
6d79864135 libnetwork/osl: restore the right thread's netns
osl.setIPv6 mistakenly captured the calling goroutine's thread's network
namespace instead of the network namespace of the thread getting its
namespace temporarily changed. As this function appears to only be
called from contexts in the process's initial network namespace, this
mistake would be of little consequence at runtime. The libnetwork unit
tests, on the other hand, unshare network namespaces so as not to
interfere with each other or the host's network namespace. But due to
this bug, the isolation backfires and the network namespace of
goroutines used by a test which are expected to be in the initial
network namespace can randomly become the isolated network namespace of
some other test. Symptoms include a loopback network server running in
one goroutine being inexplicably and randomly being unreachable by a
client in another goroutine.

Capture the original network namespace of the thread from the thread to
be tampered with, after locking the goroutine to the thread.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-05-19 18:35:59 -04:00
Cory Snider
d4f3858a40 libnetwork: leave global logger alone in tests
Swapping out the global logger on the fly is causing tests to flake out
by logging to a test's log output after the test function has returned.
Refactor Resolver to use a dependency-injected logger and the resolver
unit tests to inject a private logger instance into the Resolver under
test.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-05-19 18:35:58 -04:00
Cory Snider
0cc6e445d7 libnetwork: make resolver tests less confusing
tstwriter mocks the server-side connection between the resolver and the
container, not the resolver and the external DNS server, so returning
the external DNS server's address as w.LocalAddr() is technically
incorrect and misleading. Only the protocols need to match as the
resolver uses the client's choice of protocol to determine which
protocol to use when forwarding the query to the external DNS server.
While this change has no material impact on the tests, it makes the
tests slightly more comprehensible for the next person.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-05-19 18:35:58 -04:00
Sebastiaan van Stijn
ab35df454d
remove pre-go1.17 build-tags
Removed pre-go1.17 build-tags with go fix;

    go mod init
    go fix -mod=readonly ./...
    rm go.mod

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-05-19 20:38:51 +02:00
Cory Snider
41356227f2 libnetwork: just forward the external DNS response
Our resolver is just a forwarder for external DNS so it should act like
it. Unless it's a server failure or refusal, take the response at face
value and forward it along to the client. RFC 8020 is only applicable to
caching recursive name servers and our resolver is neither caching nor
recursive.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-05-18 16:04:19 -04:00
Sebastiaan van Stijn
9e817251a8
libnetwork/docs: fix broken link
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-05-10 12:05:05 +02:00
Sebastiaan van Stijn
17882ed614
libnetwork: update example in README.md
Align the example with the code updated in 4e0319c878.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-05-10 12:01:06 +02:00
Cory Snider
4e0319c878 [chore] clean up reexec.Init() calls
Now that most uses of reexec have been replaced with non-reexec
solutions, most of the reexec.Init() calls peppered throughout the test
suites are unnecessary. Furthermore, most of the reexec.Init() calls in
test code neglects to check the return value to determine whether to
exit, which would result in the reexec'ed subprocesses proceeding to run
the tests, which would reexec another subprocess which would proceed to
run the tests, recursively. (That would explain why every reexec
callback used to unconditionally call os.Exit() instead of returning...)

Remove unneeded reexec.Init() calls from test and example code which no
longer needs it, and fix the reexec.Init() calls which are not inert to
exit after a reexec callback is invoked.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-05-09 19:13:17 -04:00
Sebastiaan van Stijn
8142051a3b
libnetwork/osl: unify stubs for NeighOption
Use the same signature for all platforms, but stub the neigh type.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-04-28 20:20:58 +02:00
Sebastiaan van Stijn
0ea41eaa51
libnetwork/osl: unify stubs for IfaceOption
Use the same signature for all platforms, but stub the nwIface type.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-04-28 20:20:58 +02:00
Sebastiaan van Stijn
021e89d702
libnetwork/osl: rename var that collided with import
Also renaming another var for consistency ':-)

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-04-28 20:20:58 +02:00
Sebastiaan van Stijn
3a4158e4fa
libnetwork: add missing stub for getInitializers()
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-04-28 20:18:33 +02:00
Sebastiaan van Stijn
939a4eb5c9
libnetwork: fix stubs
- sandbox, endpoint changed in c71555f030, but
  missed updating the stubs.
- add missing stub for Controller.cleanupServiceDiscovery()
- While at it also doing some minor (formatting) changes.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-04-28 20:18:33 +02:00
Sebastiaan van Stijn
17feabcba0
libnetwork: overlayutils: remove redundant init()
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-04-28 20:18:29 +02:00
Sebastiaan van Stijn
1e9ebfb00c
libnetwork: inline sendKey() into SetExternalKey()
This function included a defer to close the net.Conn if an error occurred,
but the calling function (SetExternalKey()) also had a defer to close it
unconditionally.

Rewrite it to use json.NewEncoder(), which accepts a writer, and inline
the code.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-04-28 16:44:54 +02:00
Sebastiaan van Stijn
9d8fcb3296
libnetwork: setKey(): remove intermediate buffer
Use json.NewDecoder() instead, which accepts a reader.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-04-28 16:44:54 +02:00
Sebastiaan van Stijn
a813d7e961
libnetwork: don't register "libnetwork-setkey" re-exec on non-unix
It's a no-op on Windows and other non-Linux, non-FreeBSD platforms,
so there's no need to register the re-exec.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-04-28 16:44:54 +02:00
Sebastiaan van Stijn
881fff1a2f
libnetwork: processSetKeyReexec: don't use logrus.Fatal()
Just print the error and os.Exit() instead, which makes it more
explicit that we're exiting, and there's no need to decorate the
error.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-04-28 16:44:40 +02:00
Sebastiaan van Stijn
e974599593
libnetwork: processSetKeyReexec() remove defer()
Split the function into a "backing" function that returns an error, and the
re-exec entrypoint, which handles the error to provide a more idiomatic approach.

This was part of a larger change accross multiple re-exec functions (now removed).

For history's sake; here's the description for that;

The `reexec.Register()` function accepts reexec entrypoints, which are a `func()`
without return (matching a binary's `main()` function). As these functions cannot
return an error, it's the entrypoint's responsibility to handle any error, and to
indicate failures through `os.Exit()`.

I noticed that some of these entrypoint functions had `defer()` statements, but
called `os.Exit()` either explicitly or implicitly (e.g. through `logrus.Fatal()`).
defer statements are not executed if `os.Exit()` is called, which rendered these
statements useless.

While I doubt these were problematic (I expect files to be closed when the process
exists, and `runtime.LockOSThread()` to not have side-effects after exit), it also
didn't seem to "hurt" to call these as was expected by the function.

This patch rewrites some of the entrypoints to split them into a "backing function"
that can return an error (being slightly more iodiomatic Go) and an wrapper function
to act as entrypoint (which can handle the error and exit the executable).

To some extend, I'm wondering if we should change the signatures of the entrypoints
to return an error so that `reexec.Init()` can handle (or return) the errors, so
that logging can be handled more consistently (currently, some some use logrus,
some just print); this would also keep logging out of some packages, as well as
allows us to provide more metadata about the error (which reexec produced the
error for example).

A quick search showed that there's some external consumers of pkg/reexec, so I
kept this for a future discussion / exercise.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-04-28 12:52:38 +02:00
Sebastiaan van Stijn
56fbbde2ed
libnetwork/resolvconf: fix some minor (linting) issues
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-04-26 22:49:50 +02:00
Sebastiaan van Stijn
820975595c
libnetwork/resolvconf: improve tests for Build
- Verify the content to be equal, not "contains"; this output should be
  predictable.
- Also verify the content returned by the function to match.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-04-26 22:49:50 +02:00
Sebastiaan van Stijn
93c7b25ccd
libnetwork/resolvconf: refactor tests for readability
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-04-26 22:49:50 +02:00
Sebastiaan van Stijn
43378636d0
libnetwork/resolvconf: allow tests to be run on unix
Looks like the intent is to exclude windows (which wouldn't have /etc/resolv.conf
nor systemd), but most tests would run fine elsewhere. This allows running the
tests on macOS for local testing.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-04-26 22:49:49 +02:00
Sebastiaan van Stijn
73c637ad60
libnetwork/resolvconf: use t.TempDir(), change t.Fatal to t.Error
Use t.TempDir() for convenience, and change some t.Fatal's to Errors,
so that all tests can run instead of failing early.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-04-26 22:49:49 +02:00
Sebastiaan van Stijn
fc1e698914
libnetwork/resolvconf: fix TestGet() testing wrong path
The test was assuming that the "source" file was always "/etc/resolv.conf",
but the `Get()` function uses `Path()` to find the location of resolv.conf,
which may be different.

While at it, also changed some `t.Fatalf()` to `t.Errorf()`, and renamed
some variables for clarity.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-04-26 22:49:49 +02:00
Sebastiaan van Stijn
55d18b7db9
libnetwork/resolvconf: use []byte for hash instead of string
After my last change, I noticed that the hash is used as a []byte in most
cases (other than tests). This patch updates the type to use a []byte, which
(although unlikely very important) also improves performance:

Compared to the previous version:

    benchstat new.txt new2.txt
    name         old time/op    new time/op    delta
    HashData-10     128ns ± 1%     116ns ± 1%   -9.77%  (p=0.000 n=20+20)

    name         old alloc/op   new alloc/op   delta
    HashData-10      208B ± 0%       88B ± 0%  -57.69%  (p=0.000 n=20+20)

    name         old allocs/op  new allocs/op  delta
    HashData-10      3.00 ± 0%      2.00 ± 0%  -33.33%  (p=0.000 n=20+20)

And compared to the original version:

    benchstat old.txt new2.txt
    name         old time/op    new time/op    delta
    HashData-10     201ns ± 1%     116ns ± 1%  -42.39%  (p=0.000 n=18+20)

    name         old alloc/op   new alloc/op   delta
    HashData-10      416B ± 0%       88B ± 0%  -78.85%  (p=0.000 n=20+20)

    name         old allocs/op  new allocs/op  delta
    HashData-10      6.00 ± 0%      2.00 ± 0%  -66.67%  (p=0.000 n=20+20)

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-04-26 22:49:47 +02:00
Sebastiaan van Stijn
630fc3839e
libnetwork/resolvconf: simplify hashData() and improve performance
The code seemed overly complicated, requiring a reader to be constructed,
where in all cases, the data was already available in a variable. This patch
simplifies the utility to not require a reader, which also makes it a bit
more performant:

    go install golang.org/x/perf/cmd/benchstat@latest
    GO111MODULE=off go test -run='^$' -bench=. -count=20 > old.txt
    GO111MODULE=off go test -run='^$' -bench=. -count=20 > new.txt

    benchstat old.txt new.txt
    name         old time/op    new time/op    delta
    HashData-10     201ns ± 1%     128ns ± 1%  -36.16%  (p=0.000 n=18+20)

    name         old alloc/op   new alloc/op   delta
    HashData-10      416B ± 0%      208B ± 0%  -50.00%  (p=0.000 n=20+20)

    name         old allocs/op  new allocs/op  delta
    HashData-10      6.00 ± 0%      3.00 ± 0%  -50.00%  (p=0.000 n=20+20)

A small change was made in `Build()`, which previously returned the resolv.conf
data, even if the function failed to write it. In the new variation, `nil` is
consistently returned on failures.

Note that in various places, the hash is not even used, so we may be able to
simplify things more after this.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-04-26 22:47:23 +02:00
Sebastiaan van Stijn
214e200f95
Merge pull request #45308 from corhere/libnet/overlay-bpf-ipv6
libnetwork/drivers/overlay: make VNI matcher IPv6-compatible
2023-04-26 14:37:09 +02:00
Brian Goff
0970cb054c
Merge pull request #45366 from akerouanton/fix-docker0-PreferredPool
daemon: set docker0 subpool as the IPAM pool
2023-04-25 11:07:57 -07:00
Albin Kerouanton
2d31697d82
daemon: set docker0 subpool as the IPAM pool
Since cc19eba (backported to v23.0.4), the PreferredPool for docker0 is
set only when the user provides the bip config parameter or when the
default bridge already exist. That means, if a user provides the
fixed-cidr parameter on a fresh install or reboot their computer/server
without bip set, dockerd throw the following error when it starts:

> failed to start daemon: Error initializing network controller: Error
> creating default "bridge" network: failed to parse pool request for
> address space "LocalDefault" pool "" subpool "100.64.0.0/26": Invalid
> Address SubPool

See #45356.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-04-25 15:32:46 +02:00
Cory Snider
c399963243 libn/d/overlay: make VNI matcher IPv6-compatible
Use Linux BPF extensions to locate the offset of the VXLAN header within
the packet so that the same BPF program works with VXLAN packets
received over either IPv4 or IPv6.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-04-24 14:20:29 -04:00
Cory Snider
7d9bb170b7 libn/d/overlay: test the VNI BPF matcher on IPv4
Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-04-24 14:19:39 -04:00
Bjorn Neergaard
660c04803e
Merge pull request #45310 from corhere/libn/delete-network-more-atomically
libnetwork: clean up inDelete network atomically
2023-04-20 20:26:24 +02:00
Albin Kerouanton
1e1efe1f61
libnet/d/overlay: clean up iptables rules on network delete
This commit removes iptables rules configured for secure overlay
networks when a network is deleted. Prior to this commit, only
CreateNetwork() was taking care of removing stale iptables rules.

If one of the iptables rule can't be removed, the erorr is logged but
it doesn't prevent network deletion.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-04-17 17:21:21 +02:00
Cory Snider
c957ad0067 libnetwork: clean up inDelete network atomically
The (*network).ipamRelease function nils out the network's IPAM info
fields, putting the network struct into an inconsistent state. The
network-restore startup code panics if it tries to restore a network
from a struct which has fewer IPAM config entries than IPAM info
entries. Therefore (*network).delete contains a critical section: by
persisting the network to the store after ipamRelease(), the datastore
will contain an inconsistent network until the deletion operation
completes and finishes deleting the network from the datastore. If for
any reason the deletion operation is interrupted between ipamRelease()
and deleteFromStore(), the daemon will crash on startup when it tries to
restore the network.

Updating the datastore after releasing the network's IPAM pools may have
served a purpose in the past, when a global datastore was used for
intra-cluster communication and the IPAM allocator had persistent global
state, but nowadays there is no global datastore and the IPAM allocator
has no persistent state whatsoever. Remove the vestigial datastore
update as it is no longer necessary and only serves to cause problems.
If the network deletion is interrupted before the network is deleted
from the datastore, the deletion will resume during the next daemon
startup, including releasing the IPAM pools.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-04-11 19:00:59 -04:00
Sebastiaan van Stijn
0154746b9f
Merge pull request #44965 from akerouanton/libnetwork-dead-code
libnetwork/overlay: remove dead code
2023-04-11 17:09:45 +02:00
Albin Kerouanton
8ed900263e
libnetwork/overlay: remove host mode
Linux kernel prior to v3.16 was not supporting netns for vxlan
interfaces. As such, moby/libnetwork#821 introduced a "host mode" to the
overlay driver. The related kernel fix is available for rhel7 users
since v7.2.

This mode could be forced through the use of the env var
_OVERLAY_HOST_MODE. However this env var has never been documented and
is not referenced in any blog post, so there's little chance many people
rely on it. Moreover, this host mode is deemed as an implementation
details by maintainers. As such, we can consider it dead and we can
remove it without a prior deprecation warning.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-04-06 19:52:41 +02:00
Albin Kerouanton
1d46597c8b
libnetwork/overlay: remove KVObject implementation
Since 0fa873c, there's no function writing overlay networks to some
datastore. As such, overlay network struct doesn't need to implement
KVObject interface.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-04-06 19:52:29 +02:00
Albin Kerouanton
f32f09e78f
libnetwork/overlay: don't lock network when accessing subnet vni
Since a few commits, subnet's vni don't change during the lifetime of
the subnet struct, so there's no need to lock the network before
accessing it.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-04-06 19:52:27 +02:00
Albin Kerouanton
b67446a8fa
libnetwork: remove local store from overlay driver
Since the previous commit, data from the local store are never read,
thus proving it was only used for Classic Swarm.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-04-06 19:52:27 +02:00
Albin Kerouanton
8aa1060c34
libnetwork/overlay: remove live-restore support
The overlay driver in Swarm v2 mode doesn't support live-restore, ie.
the daemon won't even start if the node is part of a Swarm cluster and
live-restore is enabled. This feature was only used by Swarm Classic.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-04-06 19:52:27 +02:00
Albin Kerouanton
e3708a89cc
libnetwork/overlay: remove vni allocation
VNI allocations made by the overlay driver were only used by Classic
Swarm. With Swarm v2 mode, the driver ovmanager is responsible of
allocating & releasing them.

Previously, vxlanIdm was initialized when a global store was available
but since 142b522, no global store can be instantiated. As such,
releaseVxlanID actually does actually nothing and iptables rules are
never removed.

The last line of dead code detected by golangci-lint is now gone.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-04-06 19:52:27 +02:00
Albin Kerouanton
e251837445
libnetwork/overlay: remove Serf-based clustering
Prior to 0fa873c, the serf-based event loop was started when a global
store was available. Since there's no more global store, this event loop
and all its associated code is dead.

Most dead code detected by golangci-lint in prior commits is now gone.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-04-06 19:52:17 +02:00
Albin Kerouanton
644e3d4cdb
libnetwork/netlabel: remove dead code
- LocalKVProvider, LocalKVProviderURL, LocalKVProviderConfig,
  GlobalKVProvider, GlobalKVProviderURL and GlobalKVProviderConfig
  are all unused since moby/libnetwork@be2b6962 (moby/libnetwork#908).
- GlobalKVClient is unused since 0fa873c and c8d2c6e.
- MakeKVProvider, MakeKVProviderURL and MakeKVProviderConfig are unused
  since 96cfb076 (moby/moby#44683).
- MakeKVClient is unused since 142b5229 (moby/moby#44875).

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-04-06 19:51:56 +02:00
Albin Kerouanton
f8b5fe5724
libnetwork/netutils: remove dead code
- GetIfaceAddr is unused since moby/libnetwork@e51ead59
  (moby/libnetwork#670).
- ValidateAlias and ParseAlias are unused since moby/moby@0645eb84
  (moby/moby#42539).

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-04-06 19:33:04 +02:00
Albin Kerouanton
c8d2c6ea77
libnetwork: remove unused props from windows overlay driver
The overlay driver was creating a global store whenever
netlabel.GlobalKVClient was specified in its config argument. This
specific label is unused anymore since 142b522 (moby/moby#44875).

It was also creating a local store whenever netlabel.LocalKVClient was
specificed in its config argument. This store is unused since
moby/libnetwork@9e72136 (moby/libnetwork#1636).

Finally, the sync.Once properties are never used and thus can be
deleted.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-04-06 19:33:04 +02:00
Albin Kerouanton
0fa873c0fe
libnetwork: remove global store from overlay driver
The overlay driver was creating a global store whenever
netlabel.GlobalKVClient was specified in its config argument. This
specific label is not used anymore since 142b522 (moby/moby#44875).

golangci-lint now detects dead code. This will be fixed in subsequent
commits.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-04-06 19:33:04 +02:00
Albin Kerouanton
00037cd44b
libnetwork: remove ovrouter cmd
This command was useful when overlay networks based on external KV store
was developed but is unused nowadays.

As the last reference to OverlayBindInterface and OverlayNeighborIP
netlabels are in the ovrouter cmd, they're removed too.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-04-06 19:33:04 +02:00
Cory Snider
4d04068184 libn/d/overlay: only program xt_bpf rules
Drop support for platforms which only have xt_u32 but not xt_bpf. No
attempt is made to clean up old xt_u32 iptables rules left over from a
previous daemon instance.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-04-05 11:50:03 -04:00
Cory Snider
1e195acee4 libn/d/overlay: stop programming INPUT ACCEPT rule
Encrypted overlay networks are unique in that they are the only kind of
network for which libnetwork programs an iptables rule to explicitly
accept incoming packets. No other network driver does this. The overlay
driver doesn't even do this for unencrypted networks!

Because the ACCEPT rule is appended to the end of INPUT table rather
than inserted at the front, the rule can be entirely inert on many
common configurations. For example, FirewallD programs an unconditional
REJECT rule at the end of the INPUT table, so any ACCEPT rules appended
after it have no effect. And on systems where the rule is effective, its
presence may subvert the administrator's intentions. In particular,
automatically appending the ACCEPT rule could allow incoming traffic
which the administrator was expecting to be dropped implicitly with a
default-DROP policy.

Let the administrator always have the final say in how incoming
encrypted overlay packets are filtered by no longer automatically
programming INPUT ACCEPT iptables rules for them.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-04-05 11:35:19 -04:00
Sebastiaan van Stijn
878ee341d6
Merge pull request from GHSA-232p-vwff-86mp
libnetwork: ensure encryption is mandatory on encrypted overlay networks
2023-04-04 20:03:51 +02:00
Sebastiaan van Stijn
ae64fd8d6f
Merge pull request #45247 from akerouanton/drop-ElectInterfaceAddress
libnetwork/netutils: drop ElectInterfaceAddresses
2023-03-31 19:27:40 +02:00
Albin Kerouanton
f6b50d52d4
libnetwork/netutils: drop ElectInterfaceAddresses
This is a follow-up of 48ad9e1. This commit removed the function
ElectInterfaceAddresses from utils_linux.go but not their FreeBSD &
Windows counterpart. As these functions are never called, they can be
safely removed.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-03-31 09:37:03 +02:00
Sebastiaan van Stijn
b8e963595e libnetwork: sbState: rename ExtDNS2 back to ExtDNS
The ExtDNS2 field was added in
aad1632c15
to migrate existing state from < 1.14 to a new type. As it's unlikely
that installations still have state from before 1.14, rename ExtDNS2
back to ExtDNS and drop the migration code.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Co-authored-by: Cory Snider <csnider@mirantis.com>
Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-03-30 18:44:24 -04:00
Cory Snider
9e3a6ccf69 libn/i/setmatrix: make generic and constructorless
Allow SetMatrix to be used as a value type with a ready-to-use zero
value. SetMatrix values are already non-copyable by virtue of having a
mutex field so there is no harm in allowing non-pointer values to be
used as local variables or struct fields. Any attempts to pass around
by-value copies, e.g. as function arguments, will be flagged by go vet.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-03-29 13:31:12 -04:00
Brian Goff
0a334ea081
Merge pull request #45164 from corhere/libnet/peer-op-function-call
libnetwork/d/overlay: handle peer ops directly
2023-03-27 09:46:49 -07:00
Albin Kerouanton
bae49ff278
libnet/d/windows: log EnableInternalDNS val after setting it
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-03-24 18:23:21 +01:00
Cory Snider
965eda3b9a libnet/d/overlay: insert the input-drop rule
FirewallD creates the root INPUT chain with a default-accept policy and
a terminal rule which rejects all packets not accepted by any prior
rule. Any subsequent rules appended to the chain are therefore inert.
The administrator would have to open the VXLAN UDP port to make overlay
networks work at all, which would result in all VXLAN traffic being
accepted and defeating our attempts to enforce encryption on encrypted
overlay networks.

Insert the rule to drop unencrypted VXLAN packets tagged for encrypted
overlay networks at the top of the INPUT chain so that enforcement of
mandatory encryption takes precedence over any accept rules configured
by the administrator. Continue to append the accept rule to the bottom
of the chain so as not to override any administrator-configured drop
rules.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-03-22 20:54:01 -04:00
Cory Snider
105b9834fb libnet/d/overlay: add BPF-powered VNI matcher
Some newer distros such as RHEL 9 have stopped making the xt_u32 kernel
module available with the kernels they ship. They do ship the xt_bpf
kernel module, which can do everything xt_u32 can and more. Add an
alternative implementation of the iptables match rule which uses xt_bpf
to implement exactly the same logic as the u32 filter using a BPF
program. Try programming the BPF-powered rules as a fallback when
programming the u32-powered rules fails.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-03-15 19:33:51 -04:00
Cory Snider
44cf27b5fc libnet/d/overlay: extract VNI match rule builder
The iptables rule clause used to match on the VNI of VXLAN datagrams
looks like line noise to the uninitiated. It doesn't help that the
expression is repeated twice and neither copy has any commentary.
DRY out the rule builder to a common function, and document what the
rule does and how it works.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-03-15 19:30:28 -04:00
Cory Snider
142f46cac1 libn/d/overlay: enforce encryption on sandbox init
The iptables rules which make encryption mandatory on an encrypted
overlay network are only programmed once there is a second node
participating in the network. This leaves single-node encrypted overlay
networks vulnerable to packet injection. Furthermore, failure to program
the rules is not treated as a fatal error.

Program the iptables rules to make encryption mandatory before creating
the VXLAN link to guarantee that there is no window of time where
incoming cleartext VXLAN packets for the network would be accepted, or
outgoing cleartext packets be transmitted. Only create the VXLAN link if
programming the rules succeeds to ensure that it fails closed.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-03-15 19:28:11 -04:00
Cory Snider
d4fd582fb2 libnet/d/overlay: document some encryption code
The overlay-network encryption code is woefully under-documented, which
is especially problematic as it operates on under-documented kernel
interfaces. Document what I have puzzled out of the implementation for
the benefit of the next poor soul to touch this code.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-03-15 17:26:24 -04:00
Cory Snider
a050db4a6f libnetwork/d/overlay: handle peer ops directly
Funneling the peer operations into an unbuffered channel only serves to
achieve the same result as a mutex, using a lot more boilerplate and
indirection. Get rid of the boilerplate and unnecessary indirection by
using a mutex and calling the operations directly.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-03-14 18:33:32 -04:00
Cory Snider
09d39c023c libnetwork/i/setmatrix: devirtualize
There is only one implementation. Get rid of the interface.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-03-14 18:09:08 -04:00
Cory Snider
34303ccd55 libnetwork/i/setmatrix: un-embed the mutex
so that it cannot be accessed outside of the package.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-03-14 17:49:59 -04:00
Cory Snider
3c59ef247f libnet/ipam: use netip types internally
The netip types can be used as map keys, unlike net.IP and friends,
which is a very useful property to have for this application.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-02-23 18:10:01 -05:00
Cory Snider
01dbe23b6f libnet/ipam: simplify the data model
The address spaces are orthogonal. There is no shared state between them
logically so there is no reason for them to share any in-memory data
structures. addrSpace is responsible for allocating subnets and
addresses, while Allocator is responsible for implementing the IPAM API.
Lower all the implementation details of allocation into addrSpace.

There is no longer a need to include the name of the address space in
the map keys for subnets now that each addrSpace holds its own state
independently from other addrSpaces. Remove the AddressSpace field from
the struct used for map keys within an addrSpace so that an addrSpace
does not need to know its own name.

Pool allocations were encoded in a tree structure, using parent
references and reference counters. This structure affords for pools
subdivided an arbitrary number of times to be modeled, in theory. In
practice, the max depth is only two: master pools and sub-pools. The
allocator data model has also been heavily influenced by the
requirements and limitations of Datastore persistence, which are no
longer applicable.

Address allocations are always associated with a master pool. Sub-pools
only serve to restrict the range of addresses within the master pool
which could be allocated from. Model pool allocations within an address
space as a two-level hierarchy of maps.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-02-23 18:09:22 -05:00
Cory Snider
8273db28f3 libnet/ipam: inline parsePoolRequest function
There is only one call site, in (*Allocator).RequestPool. The logic is
tightly coupled between the caller and callee, and having them separate
makes it harder to reason about.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-02-23 18:09:22 -05:00
Cory Snider
9a8b45c133 libnet/ipam: drop vestiges of custom addrSpaces
Only two address spaces are supported: LocalDefault and GlobalDefault.
Support for non-default address spaces in the IPAM Allocator is
vestigial, from a time when IPAM state was stored in a persistent shared
datastore. There is no way to create non-default address spaces through
the IPAM API so there is no need to retain code to support the use of
such address spaces. Drop all pretense that more address spaces can
exist, to the extent that the IPAM API allows.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-02-23 18:09:22 -05:00
Cory Snider
18ac200efe libnet/ipam: get rid of superfluous closure
The two-phase commit dance serves no purpose with the current IPAM
allocator implementation. There are no fallible operations between the
call to aSpace.updatePoolDBOnAdd() and invoking the returned closure.
Allocate the subnet in the address space immediately when called and get
rid of the closure return.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-02-23 18:09:22 -05:00
Bjorn Neergaard
2f0e308c7d
Merge pull request #45070 from corhere/libnet/fix-networkdb-test-panic
libnet/networkdb: fix nil-dereference panic in test
2023-02-23 15:22:31 -07:00
Cory Snider
88f6b637a0 libnet/networkdb: fix nil-dereference panic in test
Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-02-23 14:31:48 -05:00
Bjorn Neergaard
855c684708
Merge pull request #44664 from corhere/embedded-resolver-fixes
libnetwork: improve embedded DNS resolver
2023-02-23 12:25:58 -07:00
Sebastiaan van Stijn
ca2fe6859f
Merge pull request #45019 from corhere/libnet/fix-ipam-flaky-test
libnetwork/ipam: fix racy, flaky unit test
2023-02-23 20:24:59 +01:00
Cory Snider
3606d6a7cd Upgrade to golangci-lint v1.51.2
Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-02-22 14:17:30 -05:00
Cory Snider
f8cfd3a61f libnetwork: devirtualize Resolver type
https://github.com/golang/go/wiki/CodeReviewComments#interfaces

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-02-16 19:05:59 -05:00
Cory Snider
faaa4fdf18 libnetwork: forward unknown PTR queries externally
PTR queries with domain names unknown to us are not necessarily invalid.
Act like a well-behaved middlebox and fall back to forwarding
externally, same as we do with the other query types.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-02-16 19:05:59 -05:00
Cory Snider
8f5a9a741b libnetwork: fail loudly on resolver iptables setup
Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-02-16 19:05:59 -05:00
Cory Snider
25b51cad3d libnetwork: replace ad-hoc semaphore implementation
...for limiting concurrent external DNS requests with
"golang.org/x/sync/semaphore".Weighted. Replace the ad-hoc rate limiter
for when the concurrency limit is hit (which contains a data-race bug)
with "golang.org/x/time/rate".Sometimes.

Immediately retrying with the next server if the concurrency limit has
been hit just further compounds the problem. Wait on the semaphore and
refuse the query if it could not be acquired in a reasonable amount of
time.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-02-16 19:05:59 -05:00
Cory Snider
a1f7c644be libnetwork: use dns.Client for forwarded requests
It handles figuring out the UDP receive buffer size and setting IO
timeouts, which simplifies our code. It is also more robust to receiving
UDP replies to earlier queries which timed out.

Log failures to perform a client exchange at level error so they are
more visible to operators and administrators.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-02-16 18:50:23 -05:00
Cory Snider
e6258e6590 libnetwork: reply SERVFAIL if DNS forwarding fails
Fixes moby/moby issue 44575

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-02-16 18:44:56 -05:00
Cory Snider
9cf8c4f689 libnetwork: extract DNS client exchange to method
forwardExtDNS() will now continue with the next external DNS sever if
co.ReadMsg() returns (nil, nil). Previously it would abort resolving the
query and not reply to the container client. The implementation of
ReadMsg() in the currently- vendored version of miekg/dns cannot return
(nil, nil) so the difference is immaterial in practice.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-02-16 18:44:56 -05:00
Cory Snider
854ec3ffb3 libnetwork: extract dialExtDNS to method
Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-02-16 18:44:56 -05:00
Cory Snider
51cdd7ceac libnetwork: truncate DNS msgs using library method
(*dns.Msg).Truncate() is more intelligent and standards-compliant about
truncating DNS response messages than our hand-rolled version. Fix a
silly fencepost error the max TCP message size: the limit is
dns.MaxMsgSize (65535), full stop.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-02-16 18:44:56 -05:00
Cory Snider
860e83e52f libnetwork: get rid of truncation red herring
The TC flag in a DNS message indicates that the sender had to
truncate it to fit within the length limit of the transmission channel.
It does NOT indicate that part of the message was lost before reaching
the recipient. Older versions of github.com/miekg/dns conflated the two
cases by returning ErrTruncated from ReadMsg() if the message was parsed
without error but had the TC flag set. The version of miekg/dns
currently vendored no longer returns an error when a well-formed DNS
message is received which has its TC flag set, but there was some
confusion on how to update libnetwork to deal with this behaviour
change. Truncated DNS replies are no longer different from any other
reply message: they are normal replies which do not need any special-
case handling to proxy back to the client.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-02-16 18:44:56 -05:00
Cory Snider
8a35fb0d1c libnetwork: refactor ServeDNS for readability
Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-02-16 18:44:55 -05:00
Cory Snider
0bd30e90bb libnetwork: reply SERVFAIL on resolve error
...instead of silently dropping the DNS query.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-02-16 18:44:54 -05:00
Cory Snider
92aa6e6282 libnetwork: extract fn for external DNS forwarding
Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-02-16 18:43:15 -05:00
Cory Snider
78792eae68 libnetwork: add regression test for issue 44575
Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-02-16 18:41:13 -05:00
Cory Snider
b62445871e libnet/ipam: fix racy, flaky unit test
TestRequestReleaseAddressDuplicate gets flagged by go test -race because
the same err variable inside the test is assigned to from multiple
goroutines without synchronization, which obscures whether or not there
are any data races in the code under test.

Trouble is, the test _depends on_ the data race to exit the loop if an
error occurs inside a spawned goroutine. And the test contains a logical
concurrency bug (not flagged by the Go race detector) which can result
in false-positive test failures. Because a release operation is logged
after the IP is released, the other goroutine could reacquire the
address and log that it was reacquired before the release is logged.

Fix up the test so it is no longer subject to data races or
false-positive test failures, i.e. flakes.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-02-16 18:00:31 -05:00
Cory Snider
4e27640076
Merge pull request #45017 from neersighted/forwardport/44980
Fix nil pointer dereference when attempting to log DNS errors
2023-02-16 16:58:20 -05:00
er0k
81f9f90e47
Do not log connection info before the connection exists
If the resolver encounters an error before it attempts to forward the
request to external DNS, do not try to log information about the
external connection, because at this point `extConn` is `nil`. This
makes sure `dockerd` won't panic and crash from a nil pointer
dereference when it sees an invalid DNS query.

fixes #44979

Signed-off-by: er0k <er0k@er0k.net>
(cherry picked from commit 6c2637be11)
Signed-off-by: Bjorn Neergaard <bneergaard@mirantis.com>
2023-02-16 13:21:45 -07:00
Sebastiaan van Stijn
243a55d603
Merge pull request #44500 from tiborvass/libnetwork_test_race
libnetwork/networkdb: make go test -race ./libnetwork/networkdb pass
2023-02-16 21:12:34 +01:00
Cory Snider
dea3f2b417 Migrate away from things deprecated in Go 1.20
"math/rand".Seed
  - Migrate to using local RNG instances.

"archive/tar".TypeRegA
  - The deprecated constant tar.TypeRegA is the same value as
    tar.TypeReg and so is not needed at all.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-02-15 12:30:32 -05:00
Cory Snider
046cc9e776 libnetwork: check DNS loopback with user DNS opts
DNS servers in the loopback address range should always be resolved in
the host network namespace when the servers are configured by reading
from the host's /etc/resolv.conf. The daemon mistakenly conflated the
presence of DNS options (docker run --dns-opt) with user-supplied DNS
servers, treating the list of servers loaded from the host as a user-
supplied list and attempting to resolve in the container's network
namespace. Correct this oversight so that loopback DNS servers are only
resolved in the container's network namespace when the user provides the
DNS server list, irrespective of other DNS configuration.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-02-10 16:20:06 -05:00
Cory Snider
d31fa84c7c libnet/networkdb: use atomics for stats counters
The per-network statistics counters are loaded and incremented without
any concurrency control. Use atomic integers to prevent data races
without having to add any synchronization.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-02-10 15:21:58 -05:00
Tibor Vass
3539452ef0 libnetwork/networkdb: make go test -race ./libnetwork/networkdb pass
Signed-off-by: Tibor Vass <teabee89@gmail.com>
Co-authored-by: Cory Snider <csnider@mirantis.com>
Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-02-10 15:18:31 -05:00
Cory Snider
5287b2ddbf libnet/ipam: stop eagerly stringifying debug logs
The (*bitmap.Handle).String() method can be rather expensive to call. It
is all the more tragic when the expensively-constructed string is
immediately discarded because the log level is not high enough. Let
logrus stringify the arguments to debug logs so they are only
stringified when the log level is high enough.

    # Before
    ok  	github.com/docker/docker/libnetwork/ipam	10.159s
    # After
    ok  	github.com/docker/docker/libnetwork/ipam	2.484s

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-02-08 16:54:27 -05:00
Cory Snider
91725ddc92 libnet/d/ipvlan: gracefully migrate from older dbs
IPVLAN networks created on Moby v20.10 do not have the IpvlanFlag
configuration value persisted in the libnetwork database as that config
value did not exist before v23.0.0. Gracefully migrate configurations on
unmarshal to prevent type-assertion panics at daemon start after upgrade.

Fixes #44925

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-02-06 12:08:28 -05:00
Cory Snider
7950abcc46 libnetwork: delete CHANGELOG.md
It hasn't been updated since 2016.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-30 19:01:26 -05:00
Cory Snider
a264f2dc55 libnetwork/ipam: skip Destroy()ing bitseq.Handle values
The (*bitseq.Handle).Destroy() method deletes the persisted KVObject
from the datastore. This is a no-op on all the bitseq handles in package
ipam as they are not persisted in any datastore.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-27 11:47:43 -05:00
Cory Snider
6f08fe20e9 libnetwork/bit{seq,map}: delete CheckConsistency()
That method was only referenced by ipam.Allocator, but as it no longer
stores any state persistently there is no possibility for it to load an
inconsistent bit-sequence from Docker 1.9.x.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-27 11:47:43 -05:00
Cory Snider
a08a254df3 libnetwork: drop DatastoreConfig discovery type
The DatastoreConfig discovery type is unused. Remove the constant and
any resulting dead code. Today's biggest loser is the IPAM Allocator:
DatastoreConfig was the only type of discovery event it was listening
for, and there was no other place where a non-nil datastore could be
passed into the allocator. Strip out all the dead persistence code from
Allocator, leaving it as purely an in-memory implementation.

There is no more need to check the consistency of the allocator's
bit-sequences as there is no persistent storage for inconsistent bit
sequences to be loaded from.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-27 11:47:43 -05:00
Cory Snider
28edc8e2d6 libnet: convert to new-style driver registration
Per the Interface Segregation Principle, network drivers should not have
to depend on GetPluginGetter methods they do not use. The remote network
driver is the only one which needs a PluginGetter, and it is already
special-cased in Controller so there is no sense warping the interfaces
to achieve a foolish consistency. Replace all other network drivers' Init
functions with Register functions which take a driverapi.Registerer
argument instead of a driverapi.DriverCallback. Add back in Init wrapper
functions for only the drivers which Swarmkit references so that
Swarmkit can continue to build.

Refactor the libnetwork Controller to use the new drvregistry.Networks
and drvregistry.IPAMs driver registries in place of the legacy ones.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-27 11:47:42 -05:00
Cory Snider
5595311209 libnetwork/drvregistry: split up the registries
There is no benefit to having a single registry for both IPAM drivers
and network drivers. IPAM drivers are registered in a separate namespace
from network drivers, have separate registration methods, separate
accessor methods and do not interact with network drivers within a
DrvRegistry in any way. The only point of commonality is

    interface { GetPluginGetter() plugingetter.PluginGetter }

which is only used by the respective remote drivers and therefore should
be outside of the scope of a driver registry.

Create new, separate registry types for network drivers and IPAM
drivers, respectively. These types are "legacy-free". Neither type has
GetPluginGetter methods. The IPAMs registry does not have an
IPAMDefaultAddressSpaces method as that information can be queried
directly from the driver using its GetDefaultAddressSpaces method.
The Networks registry does not have an AddDriver method as that method
is a trivial wrapper around calling one of its arguments with its other
arguments.

Refactor DrvRegistry in terms of the new IPAMs and Networks registries
so that existing code in libnetwork and Swarmkit will continue to work.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-27 11:47:42 -05:00
Cory Snider
d478e13639 libnet: un-plumb datastores from IPAM inits
The datastore arguments to the IPAM driver Init() functions are always
nil, even in Swarmkit. The only IPAM driver which consumed the
datastores was builtin; all others (null, remote, windowsipam) simply
ignored it. As the signatures of the IPAM driver init functions cannot
be changed without breaking the Swarmkit build, they have to be left
with the same signatures for the time being. Assert that nil datastores
are always passed into the builtin IPAM driver's init function so that
there is no ambiguity the datastores are no longer respected.

Add new Register functions for the IPAM drivers which are free from the
legacy baggage of the Init functions. (The legacy Init functions can be
removed once Swarmkit is migrated to using the Register functions.) As
the remote IPAM driver is the only one which depends on a PluginGetter,
pass it in explicitly as an argument to Register. The other IPAM drivers
should not be forced to depend on a GetPluginGetter() method they do not
use (Interface Segregation Principle).

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-27 11:47:42 -05:00
Cory Snider
27cca19c9a libnetwork/drvregistry: drop unused args
Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-26 17:56:40 -05:00
Cory Snider
befff0e13f libnetwork: remove more datastore scope plumbing
Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-26 17:56:40 -05:00
Cory Snider
142b522946 libnetwork/config: remove vestiges of global scope
Without (*Controller).ReloadConfiguration, the only way to configure
datastore scopes would be by passing config.Options to libnetwork.New.
The only options defined which relate to datastore scopes are limited to
configuring the local-scope datastore. Furthermore, the default
datastore config only defines configuration for the local-scope
datastore. The local-scope datastore is therefore the only datastore
scope possible in libnetwork. Start removing code which is only
needed to support multiple datastore scopes.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-26 17:56:29 -05:00
Cory Snider
52d9883812 libnetwork: drop (*Controller).ReloadConfiguration
...as it is unused.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-26 16:48:35 -05:00
Cory Snider
e8011d7872 libnw/ipamutils: make local defaults immutable
ConfigLocalScopeDefaultNetworks is now dead code, thank goodness! Make
sure it stays dead by deleting the function. Refactor package ipamutils
to simplify things given its newly-reduced (ahem) scope.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-26 14:56:12 -05:00
Cory Snider
540d1e0561 libnw: untangle IPAM allocator from global state
ipam.Allocator is not a singleton, but it references mutable singleton
state. Address that deficiency by refactoring it to instead take the
predefined address spaces as constructor arguments. Unfortunately some
work is needed on the Swarmkit side before the mutable singleton state
can be completely eliminated from the IPAMs.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-26 14:56:12 -05:00
Cory Snider
48ad9e19e4 libnetwork/netutils: drop ElectInterfaceAddresses
The function references global shared, mutable state and is no longer
needed. Deleting it brings us one step closer to getting rid of that
pesky shared state.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-26 14:56:11 -05:00
Brian Goff
3119b1ef7f
Merge pull request #44864 from corhere/libnet/split-bitseq
libnetwork: split package bitseq
2023-01-26 11:44:45 -08:00
Bjorn Neergaard
390532cbc6
libnetwork/windows/overlay: drop unused variables
These package-level variables were copied over from the Linux
implementation; drop them for clarity's sake.

Signed-off-by: Bjorn Neergaard <bneergaard@mirantis.com>
2023-01-24 12:44:19 -07:00
Bjorn Neergaard
b3e6aa9316
libnetwork/netutils: clean up GenerateIfaceName
netlink offers the netlink.LinkNotFoundError type, which we can use with
errors.As() to detect a unused link name.

Additionally, early return if GenerateRandomName fails, as reading
random bytes should be a highly reliable operation, and otherwise the
error would be swallowed by the fall-through return.

Signed-off-by: Bjorn Neergaard <bneergaard@mirantis.com>
2023-01-24 12:44:17 -07:00
Bjorn Neergaard
3775939303
libnetwork/netutils: refactor GenerateRandomName
GenerateRandomName now uses length to represent the overall length of
the string; this will help future users avoid creating interface names
that are too long for the kernel to accept by mistake. The test coverage
is increased and cleaned up using gotest.tools.

Signed-off-by: Bjorn Neergaard <bneergaard@mirantis.com>
2023-01-24 12:44:14 -07:00
Cory Snider
c0eb207b76 libnetwork/bitseq: refactor JSON marshaling
Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-20 15:54:19 -05:00
Cory Snider
89ae725d23 libnetwork/bitseq: make mutex an unexported field
*bitseq.Handle should not implement sync.Locker. The mutex is a private
implementation detail which external consumers should not be able to
manipulate.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-20 15:54:14 -05:00
Cory Snider
94ef26428b libnetwork/bitseq: refactor in terms of bitmap
Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-20 15:54:01 -05:00
Cory Snider
143c092187 libnetwork/bitmap: optimize binary serialization
The byte-slice temporary is fully overwritten on each loop iteration so
it can be safely reused to reduce GC pressure.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-20 13:57:44 -05:00
Cory Snider
c4d7294b5c libnetwork/bitmap: remove datastore concerns
There is a solid bit-vector datatype hidden inside bitseq.Handle, but
it's obscured by all the intrusive datastore and KVObject nonsense.
It can be used without a datastore, but the datastore baggage limits its
utility inside and outside libnetwork. Extract the datatype goodness
into its own package which depends only on the standard library so it
can be used in more situations.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-20 13:42:40 -05:00
Cory Snider
ad03a09451 libnetwork/bitmap: dup from package bitseq
...in preparation for separating the bit-sequence datatype from the
datastore persistence (KVObject) concerns. The new package's contents
are identical to those of package libnetwork/bitseq to assist in
reviewing the changes made on each side of the split.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-20 13:33:22 -05:00
Cory Snider
cd2e7fafd4 libnetwork/bitseq: add marshal/unmarshal tests
Catch any backwards-compatibility hazards with serialization of
sequences early.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-20 12:23:21 -05:00
Albin Kerouanton
ffd75c2e0c
libnetwork: Support IPv6 in arrangeUserFilterRule() (redux)
This reapplies commit 2d397beb00.

Fixes #44451.

Co-authored-by: Bjorn Neergaard <bneergaard@mirantis.com>
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Signed-off-by: Bjorn Neergaard <bneergaard@mirantis.com>
2023-01-14 19:11:44 -07:00
Bjorn Neergaard
17723691e5
Revert "libnetwork: Support IPv6 in arrangeUserFilterRule()"
This reverts commit 2d397beb00.

moby#44706 and moby#44805 were both merged, and both refactored the same
file. The combination broke the build, and was not detected in CI as
only the combination of the two, applied to the same parent commit,
caused the failure.

moby#44706 should be carried forward, based on the current master, in
order to resolve this conflict.

Signed-off-by: Bjorn Neergaard <bneergaard@mirantis.com>
2023-01-14 15:31:56 -07:00
Bjorn Neergaard
803c21f4b2
Merge pull request #44706 from akerouanton/fix-44451
libnetwork: Support IPv6 in arrangeUserFilterRule()
2023-01-14 15:18:01 -07:00
Cory Snider
8be470eea8 libnetwork: don't embed mutex in network
Embedded structs are part of the exported surface of a struct type.
Boxing a struct value into an interface value does not erase that;
any code could gain access to the embedded struct value with a simple
type assertion. The mutex is supposed to be a private implementation
detail, but *network implements sync.Locker because the mutex is
embedded. Change the mutex to an unexported field so *network no
longer spuriously implements the sync.Locker interface.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-13 14:19:06 -05:00
Cory Snider
c71555f030 libnetwork: return concrete-typed *Endpoint
libnetwork.Endpoint is an interface with a single implementation.

https://github.com/golang/go/wiki/CodeReviewComments#interfaces

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-13 14:19:06 -05:00
Cory Snider
581f005aad libnetwork: don't embed mutex in endpoint
Embedded structs are part of the exported surface of a struct type.
Boxing a struct value into an interface value does not erase that;
any code could gain access to the embedded struct value with a simple
type assertion. The mutex is supposed to be a private implementation
detail, but *endpoint implements sync.Locker because the mutex is
embedded. Change the mutex to an unexported field so *endpoint no
longer spuriously implements the sync.Locker interface.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-13 14:19:06 -05:00
Cory Snider
0e91d2e0e9 libnetwork: return concrete-typed *Sandbox
Basically every exported method which takes a libnetwork.Sandbox
argument asserts that the value's concrete type is *sandbox. Passing any
other implementation of the interface is a runtime error! This interface
is a footgun, and clearly not necessary. Export and use the concrete
type instead.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-13 14:19:06 -05:00
Cory Snider
0425baf883 libnetwork: don't embed mutex in sandbox
Embedded structs are part of the exported surface of a struct type.
Boxing a struct value into an interface value does not erase that;
any code could gain access to the embedded struct value with a simple
type assertion. The mutex is supposed to be a private implementation
detail, but *sandbox implements sync.Locker because the mutex is
embedded. Change the mutex to an unexported field so *sandbox no
longer spuriously implements the sync.Locker interface.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-13 14:09:37 -05:00
Cory Snider
f96b9bf761 libnetwork: return concrete-typed *Controller
libnetwork.NetworkController is an interface with a single
implementation.

https://github.com/golang/go/wiki/CodeReviewComments#interfaces

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-13 14:09:37 -05:00
Cory Snider
ae09fe3da7 libnetwork: don't embed mutex in controller
Embedded structs are part of the exported surface of a struct type.
Boxing a struct value into an interface value does not erase that;
any code could gain access to the embedded struct value with a simple
type assertion. The mutex is supposed to be a private implementation
detail, but *controller implements sync.Locker because the mutex is
embedded.

    c, _ := libnetwork.New()
    c.(sync.Locker).Lock()

Change the mutex to an unexported field so *controller no longer
spuriously implements the sync.Locker interface.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-13 14:09:37 -05:00
Brian Goff
483b03562a
Merge pull request #44491 from corhere/libnetwork-minus-reexec
libnetwork: eliminate almost all reexecs
2023-01-13 10:44:25 -08:00
Bjorn Neergaard
dae48a8064
Merge pull request #44803 from akerouanton/fix-44721
libnetwork: Remove iptables nat rule when hairpin is disabled
2023-01-12 08:36:10 -07:00
Cory Snider
102090916e libnetwork: addRedirectRules without reexec
Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-11 12:14:32 -05:00
Cory Snider
582dd705c1 libnetwork: fwmarker without reexec
Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-11 12:14:32 -05:00
Cory Snider
d6cc02d301 libnetwork: drop (resolver).resolverKey field
...as it is now unused.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-11 12:14:32 -05:00
Cory Snider
50a4951ddc libnetwork: setup DNS resolver without reexec
Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-11 12:14:32 -05:00
Cory Snider
4733127a04 libnetwork: set default VLAN without reexec
Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-11 12:14:31 -05:00
Cory Snider
7037c48e58 libnetwork: set IPv6 without reexec
unshare.Go() is not used as an existing network namespace needs to be
entered, not a new one created. Explicitly lock main() to the initial
thread so as not to depend on the side effects of importing the
internal/unshare package to achieve the same.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-11 12:05:39 -05:00
Cory Snider
0246332954 libnetwork: create netns without reexec
Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-11 12:05:39 -05:00
Albin Kerouanton
ef161d4aeb
libnetwork: Clean up sysfs-based operations
- The oldest kernel version currently supported is v3.10. Bridge
parameters can be set through netlink since v3.8 (see
torvalds/linux@25c71c7). As such, we don't need to fallback to sysfs to
set hairpin mode.
- `scanInterfaceStats()` is never called, so no need to keep it alive.
- Document why `default_pvid` is set through sysfs

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-01-11 17:01:53 +01:00
Albin Kerouanton
566a2e4c79
libnetwork: Remove iptables nat rule when hairpin is disabled
When userland-proxy is turned off and on again, the iptables nat rule
doing hairpinning isn't properly removed. This fix makes sure this nat
rule is removed whenever the bridge is torn down or hairpinning is
disabled (through setting userland-proxy to true).

Unlike for ip masquerading and ICC, the `programChainRule()` call
setting up the "MASQ LOCAL HOST" rule has to be called unconditionally
because the hairpin parameter isn't restored from the driver store, but
always comes from the driver config.

For the "SKIP DNAT" rule, things are a bit different: this rule is
always deleted by `removeIPChains()` when the bridge driver is
initialized.

Fixes #44721.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-01-11 16:32:18 +01:00
Sebastiaan van Stijn
65aa43bf66
libnetwork: use example.com for tests and examples
Trying to remove the "docker.io" domain from locations where it's not relevant.
In these cases, this domain was used as a "random" domain for testing or example
purposes.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-01-10 15:27:58 +01:00
Albin Kerouanton
2d397beb00
libnetwork: Support IPv6 in arrangeUserFilterRule()
Fixes #44451.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-01-09 17:18:28 +01:00
Jan Garcia
6ab12ec8f4 rootless: move ./rootless to ./pkg/rootless
Signed-off-by: Jan Garcia <github-public@n-garcia.com>
2023-01-09 16:26:06 +01:00
Albin Kerouanton
b37d34307d
Clear conntrack entries for published UDP ports
Conntrack entries are created for UDP flows even if there's nowhere to
route these packets (ie. no listening socket and no NAT rules to
apply). Moreover, iptables NAT rules are evaluated by netfilter only
when creating a new conntrack entry.

When Docker adds NAT rules, netfilter will ignore them for any packet
matching a pre-existing conntrack entry. In such case, when
dockerd runs with userland proxy enabled, packets got routed to it and
the main symptom will be bad source IP address (as shown by #44688).

If the publishing container is run through Docker Swarm or in
"standalone" Docker but with no userland proxy, affected packets will
be dropped (eg. routed to nowhere).

As such, Docker needs to flush all conntrack entries for published UDP
ports to make sure NAT rules are correctly applied to all packets.

- Fixes #44688
- Fixes #8795
- Fixes #16720
- Fixes #7540
- Fixes moby/libnetwork#2423
- and probably more.

As a precautionary measure, those conntrack entries are also flushed
when revoking external connectivity to avoid those entries to be reused
when a new sandbox is created (although the kernel should already
prevent such case).

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-01-05 12:53:22 +01:00
Bjorn Neergaard
f5106148e3
Merge pull request #43060 from akerouanton/fix-42127
Check iptables options before looking for ip6tables binary
2022-12-29 17:13:36 -07:00
Albin Kerouanton
799cc143c9
Always use iptables -C to look for rules
iptables -C flag was introduced in v1.4.11, which was released ten
years ago. Thus, there're no more Linux distributions supported by
Docker using this version. As such, this commit removes the old way of
checking if an iptables rule exists (by using substring matching).

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2022-12-23 11:04:28 +01:00
Albin Kerouanton
205e5278c6
Merge iptables.probe() into iptables.detectIptables()
The former was doing some checks and logging warnings, whereas
the latter was doing the same checks but to set some internal variables.
As both are called only once and from the same place, there're now
merged together.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2022-12-23 11:04:28 +01:00
Cory Snider
3330fc2e93
Merge pull request #44667 from masibw/44610-logs-for-DNS-failures
libnetwork: improve logs for DNS failures
2022-12-22 16:24:47 -05:00