Commit graph

3320 commits

Author SHA1 Message Date
Rob Murray
52d9b0cb56 Remove unused error types.
Signed-off-by: Rob Murray <rob.murray@docker.com>
2023-12-21 12:47:59 +00:00
Albin Kerouanton
f9135cdeb5
libnet: Improve the debug log written when the extKeyListener is stopped
This log message was quite spreading FUD whereas it's absolutely benign.
Reword it.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-12-21 12:38:08 +01:00
Sebastiaan van Stijn
7bc56c5365
Merge pull request #46853 from akerouanton/libnet-ep-dns-names
libnet: Endpoint: remove isAnonymous & myAliases
2023-12-20 19:53:16 +01:00
Albin Kerouanton
13915f6521
libnet: document what Network.networkType represents
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-12-20 19:04:37 +01:00
Albin Kerouanton
6a2542dacf
libnet: remove Endpoint.anonymous
No more concept of "anonymous endpoints". The equivalent is now an
endpoint with no DNSNames set.

Some of the code removed by this commit was mutating user-supplied
endpoint's Aliases to add container's short ID to that list. In order to
preserve backward compatibility for the ContainerInspect endpoint, this
commit also takes care of adding that short ID (and the container
hostname) to `EndpointSettings.Aliases` before returning the response.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-12-20 19:04:37 +01:00
Sebastiaan van Stijn
388216fc45
Merge pull request #46850 from robmry/46829-allow_ipv6_subnet_change
Allow overlapping change in bridge's IPv6 network.
2023-12-19 18:35:13 +01:00
Cory Snider
5eaf898fcb libnetwork: write ServFail if DNS reply msg is bad
If the resolver's DNSBackend returns a name that cannot be marshaled
into a well-formed DNS message, the resolver will only discover this
when it attempts to write the reply message and it fails with an error.
No reply message is sent, leaving the client to wait out its timeout and
the user in the dark about what went wrong.

When writing the intended reply message fails, retry once with a
ServFail response to inform the client and user that the DNS query was
not resolved due to a problem with to the resolver, not the network.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-12-19 11:24:33 -05:00
Cory Snider
1da85f7bdc libnetwork: assert DNS replies are well-formed
The well-formedness of a DNS message is only checked when it is
serialized, through the (*dns.Msg).Pack() method. Add a call to Pack()
to our tstwriter mock to mirror the behaviour of the real
dns.ResponseWriter implementation. And fix tests which generated
ill-formed DNS query messages.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-12-19 11:13:35 -05:00
Albin Kerouanton
7a9b680a9c
libnet: remove Endpoint.myAliases
This property is now unused, let's get rid of it.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-12-19 10:20:38 +01:00
Albin Kerouanton
8b7af1d0fc
libnet: update dnsNames on ContainerRename
The `(*Endpoint).rename()` method is changed to only mutate `ep.name`
and let a new method `(*Endpoint).UpdateDNSNames()` handle DNS updates.

As a consequence, the rollback code that was part of
`(*Endpoint).rename()` is now removed, and DNS updates are now
rolled back by `ContainerRename`.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-12-19 10:20:38 +01:00
Albin Kerouanton
3bb13c7eb4
libnet: Use Endpoint.dnsNames to create DNS records
Instead of special-casing anonymous endpoints, use the list of DNS names
associated to the endpoint.

`(*Endpoint).isAnonymous()` has no more uses, so let's delete it.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-12-19 10:20:37 +01:00
Albin Kerouanton
f5cc497eac
libnet: populate Endpoint.dnsNames on UnmarshalJSON
This new property will be empty if the daemon was upgraded with
live-restore enabled. To not break DNS resolutions for restored
containers, we need to populate dnsNames based on endpoint's myAliases &
anonymous properties.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-12-19 10:16:05 +01:00
Albin Kerouanton
ab8968437b
daemon: build the list of endpoint's DNS names
Instead of special-casing anonymous endpoints in libnetwork, let the
daemon specify what (non fully qualified) DNS names should be associated
to container's endpoints.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-12-19 10:16:04 +01:00
Albin Kerouanton
dc1e73cbbf
libnet: add a new dnsNames property to Endpoint
This new property is meant to replace myAliases and anonymous
properties.

The end goal is to get rid of both properties by letting the daemon
determine what (non fully qualified) DNS names should be associated to
them.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-12-18 18:38:25 +01:00
Rob Murray
27f3abd893 Allow overlapping change in bridge's IPv6 network.
Calculate the IPv6 addreesses needed on a bridge, then reconcile them
with the addresses on an existing bridge by deleting then adding as
required.

(Previously, required addresses were added one-by-one, then unwanted
addresses were removed. This meant the daemon failed to start if, for
example, an existing bridge had address '2000:db8::/64' and the config
was changed to '2000:db8::/80'.)

IPv6 addresses are now calculated and applied in one go, so there's no
need for setupVerifyAndReconcile() to check the set of IPv6 addresses on
the bridge. And, it was guarded by !config.InhibitIPv4, which can't have
been right. So, removed its IPv6 parts, and added IPv4 to its name.

Link local addresses, the example given in the original ticket, are now
released when containers are stopped. Not releasing them meant that
when using an LL subnet on the default bridge, no container could be
started after a container was stopped (because the calculated address
could not be re-allocated). In non-default bridge networks using an
LL subnet, addresses leaked.

Linux always uses the standard 'fe80::/64' LL network. So, if a bridge
is configured with an LL subnet prefix that overlaps with it, a config
error is reported. Non-overlapping LL subnet prefixes are allowed.

Signed-off-by: Rob Murray <rob.murray@docker.com>
2023-12-18 16:10:41 +00:00
Albin Kerouanton
d6a656cf7f
libnet: Remove unused cmd/readme_test
This command was originally added by ea7f555446
to test the code snippet put into libnet's README.md. Nothing compiles
this file and it doesn't add any value to the project. So better remove
it than maintaining it.

This commit also removes the code snippet from libnet's README.md for
the same reasons.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-12-16 13:06:15 +01:00
Sebastiaan van Stijn
2cf230951f
add //go:build directives to prevent downgrading to go1.16 language
This repository is not yet a module (i.e., does not have a `go.mod`). This
is not problematic when building the code in GOPATH or "vendor" mode, but
when using the code as a module-dependency (in module-mode), different semantics
are applied since Go1.21, which switches Go _language versions_ on a per-module,
per-package, or even per-file base.

A condensed summary of that logic [is as follows][1]:

- For modules that have a go.mod containing a go version directive; that
  version is considered a minimum _required_ version (starting with the
  go1.19.13 and go1.20.8 patch releases: before those, it was only a
  recommendation).
- For dependencies that don't have a go.mod (not a module), go language
  version go1.16 is assumed.
- Likewise, for modules that have a go.mod, but the file does not have a
  go version directive, go language version go1.16 is assumed.
- If a go.work file is present, but does not have a go version directive,
  language version go1.17 is assumed.

When switching language versions, Go _downgrades_ the language version,
which means that language features (such as generics, and `any`) are not
available, and compilation fails. For example:

    # github.com/docker/cli/cli/context/store
    /go/pkg/mod/github.com/docker/cli@v25.0.0-beta.2+incompatible/cli/context/store/storeconfig.go:6:24: predeclared any requires go1.18 or later (-lang was set to go1.16; check go.mod)
    /go/pkg/mod/github.com/docker/cli@v25.0.0-beta.2+incompatible/cli/context/store/store.go:74:12: predeclared any requires go1.18 or later (-lang was set to go1.16; check go.mod)

Note that these fallbacks are per-module, per-package, and can even be
per-file, so _(indirect) dependencies_ can still use modern language
features, as long as their respective go.mod has a version specified.

Unfortunately, these failures do not occur when building locally (using
vendor / GOPATH mode), but will affect consumers of the module.

Obviously, this situation is not ideal, and the ultimate solution is to
move to go modules (add a go.mod), but this comes with a non-insignificant
risk in other areas (due to our complex dependency tree).

We can revert to using go1.16 language features only, but this may be
limiting, and may still be problematic when (e.g.) matching signatures
of dependencies.

There is an escape hatch: adding a `//go:build` directive to files that
make use of go language features. From the [go toolchain docs][2]:

> The go line for each module sets the language version the compiler enforces
> when compiling packages in that module. The language version can be changed
> on a per-file basis by using a build constraint.
>
> For example, a module containing code that uses the Go 1.21 language version
> should have a `go.mod` file with a go line such as `go 1.21` or `go 1.21.3`.
> If a specific source file should be compiled only when using a newer Go
> toolchain, adding `//go:build go1.22` to that source file both ensures that
> only Go 1.22 and newer toolchains will compile the file and also changes
> the language version in that file to Go 1.22.

This patch adds `//go:build` directives to those files using recent additions
to the language. It's currently using go1.19 as version to match the version
in our "vendor.mod", but we can consider being more permissive ("any" requires
go1.18 or up), or more "optimistic" (force go1.21, which is the version we
currently use to build).

For completeness sake, note that any file _without_ a `//go:build` directive
will continue to use go1.16 language version when used as a module.

[1]: 58c28ba286/src/cmd/go/internal/gover/version.go (L9-L56)
[2]: https://go.dev/doc/toolchain

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-12-15 15:24:15 +01:00
Rob Murray
0f9f9a132e Move 'netip' utils from 'ipam' to 'internal'.
Signed-off-by: Rob Murray <rob.murray@docker.com>
2023-12-06 17:13:40 +00:00
Cory Snider
1931a1bdc7 libnetwork/diagnostic: lock mutex in help handler
Acquire the mutex in the help handler to synchronize access to the
handlers map. While a trivial issue---a panic in the request handler if
the node joins a swarm at just the right time, which would only result
in an HTTP 500 response---it is also a trivial race condition to fix.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-12-06 11:20:47 -05:00
Cory Snider
424ae36046 libnetwork/diagnostic: use standard http.Handler
We don't need C-style callback functions which accept a void* context
parameter: Go has closures. Drop the unnecessary httpHandlerCustom type
and refactor the diagnostic server handler functions into closures which
capture whatever context they need implicitly.

If the node leaves and rejoins a swarm, the cluster agent and its
associated NetworkDB are discarded and replaced with new instances. Upon
rejoin, the agent registers its NetworkDB instance with the diagnostic
server. These handlers would all conflict with the handlers registered
by the previous NetworkDB instance. Attempting to register a second
handler on a http.ServeMux with the same pattern will panic, which the
diagnostic server would historically deal with by ignoring the duplicate
handler registration. Consequently, the first NetworkDB instance to be
registered would "stick" to the diagnostic server for the lifetime of
the process, even after it is replaced with another instance. Improve
duplicate-handler registration such that the most recently-registered
handler for a pattern is used for all subsequent requests.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-12-06 11:19:59 -05:00
Cory Snider
757a004a90 libnetwork/diagnostic: drop Init method
Fold it into the constructor, because that's what the constructor is
supposed to do.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-12-04 15:13:17 -05:00
Cory Snider
f270057e0c libnetwork/diagnostic: un-embed sync.Mutex field
Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-12-04 15:13:17 -05:00
Rob Murray
964ab7158c Explicitly set MTU on bridge devices.
This is purely cosmetic - if a non-default MTU is configured, the bridge
will have the default MTU=1500 until a container's 'veth' is connected
and an MTU is set on the veth. That's a disconcerting, it looks like the
config has been ignored - so, set the bridge's MTU explicitly.

Fixes #37937

Signed-off-by: Rob Murray <rob.murray@docker.com>
2023-11-27 11:18:54 +00:00
Sebastiaan van Stijn
2f65748927
Merge pull request #46790 from corhere/libn/overlay-ipv6-vtep
libnetwork/drivers/overlay: support IPv6 transport
2023-11-23 18:23:27 +01:00
Paweł Gronowski
d154421092
Merge pull request #46444 from cpuguy83/docker_info_slow
Plumb context through info endpoint
2023-11-20 12:10:30 +01:00
Sebastiaan van Stijn
f13d8c2026
Merge pull request #46724 from rhansen/host_ipv6
New `host_ipv6` bridge option to SNAT IPv6 connections
2023-11-13 21:50:17 +01:00
Brian Goff
677d41aa3b Plumb context through info endpoint
I was trying to find out why `docker info` was sometimes slow so
plumbing a context through to propagate trace data through.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2023-11-10 20:09:25 +00:00
Brian Goff
f0b89e63b9 Fix missing import for "scope" package
I believe this happened due to conflicting PR's that got merged without
CI re-running between them.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2023-11-09 22:48:01 +00:00
Brian Goff
524eef5d75
Merge pull request #46681 from corhere/libn/datastore-misc-cleanups 2023-11-09 11:31:30 -08:00
Cory Snider
33564a0c03 libnetwork/d/overlay: support IPv6 transport
The forwarding database (fdb) of Linux VXLAN links are restricted to
entries with destination VXLAN tunnel endpoint (VTEP) address of a
single address family. Which address family is permitted is set when the
link is created and cannot be modified. The overlay network driver
creates VXLAN links such that the kernel only allows fdb entries to be
created with IPv4 destination VTEP addresses. If the Swarm is configured
with IPv6 advertise addresses, creating fdb entries for remote peers
fails with EAFNOSUPPORT (address family not supported by protocol).

Make overlay networks functional over IPv6 transport by configuring the
VXLAN links for IPv6 VTEPs if the local node's advertise address is an
IPv6 address. Make encrypted overlay networks secure over IPv6 transport
by applying the iptables rules to the ip6tables when appropriate.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-11-09 12:04:47 -05:00
Cory Snider
e1d85da306 libnetwork/d/overlay: parse discovery data eagerly
Parse the address strings once and use the binary representation
internally.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-11-09 12:04:47 -05:00
Albin Kerouanton
d47b3ef4c9
libnet: early return from updateSvcRecord if no addr available
Early return if the iface or its address is nil to make the whole
function slightly easier to read.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-11-08 20:45:15 +01:00
Sebastiaan van Stijn
5b19725de2
Merge pull request #46668 from corhere/libn/svc-record-update-without-store
libnetwork: svc record update without store
2023-11-03 13:47:12 +01:00
Cory Snider
7257c77e19 libnetwork/ipam: refactor prefix-overlap checks
I am finally convinced that, given two netip.Prefix values a and b, the
expression

    a.Contains(b.Addr()) || b.Contains(a.Addr())

is functionally equivalent to

    a.Overlaps(b)

The (netip.Prefix).Contains method works by masking the address with the
prefix's mask and testing whether the remaining most-significant bits
are equal to the same bits in the prefix. The (netip.Prefix).Overlaps
method works by masking the longer prefix to the length of the shorter
prefix and testing whether the remaining most-significant bits are
equal. This is equivalent to
shorterPrefix.Contains(longerPrefix.Addr()), therefore applying Contains
symmetrically to two prefixes will always yield the same result as
applying Overlaps to the two prefixes in either order.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-11-01 11:44:24 -04:00
Richard Hansen
808120e5b8 New host_ipv6 bridge option to SNAT IPv6 connections
Add a new `com.docker.network.host_ipv6` bridge option to compliment
the existing `com.docker.network.host_ipv4` option. When set to an
IPv6 address, this causes the bridge to insert `SNAT` rules instead of
`MASQUERADE` rules (assuming `ip6tables` is enabled).  `SNAT` makes it
possible for users to control the source IP address used for outgoing
connections.

Signed-off-by: Richard Hansen <rhansen@rhansen.org>
2023-10-25 20:11:49 -04:00
Richard Hansen
0cf113e250 Add unit tests for outgoing NAT rules
Signed-off-by: Richard Hansen <rhansen@rhansen.org>
2023-10-21 13:53:58 -04:00
Cory Snider
4af420f978 libnetwork/internal/kvstore: prune unused method
The datastore never calls Get() due to how the cache is implemented.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-10-19 12:57:42 -04:00
Cory Snider
4039b9c9c4 libnetwork/datastore: drop (KVObject).DataScope()
It wasn't being used for anything meaningful.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-10-19 12:38:39 -04:00
Cory Snider
4f4a897dda libnetwork/datastore: drop (*Store).Scope() method
It unconditionally returned scope.Local.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-10-19 12:38:37 -04:00
Cory Snider
4b40d82233 libnetwork/datastore: un-embed mutex from cache
Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-10-19 12:37:12 -04:00
Cory Snider
9536fabaa8 libnetwork/datastore: minor code cleanup
While there is nothing inherently wrong with goto statements, their use
here is not helping with readability.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-10-19 12:37:12 -04:00
Cory Snider
43dccc6c1a libnetwork/datastore: unconditionally use ds.cache
ds.cache is never nil so the uncached code paths are unreachable in
practice. And given how many KVObject deep-copy implementations shallow
copy pointers and other reference-typed values, there is the distinct
possibility that disabling the datastore cache could break things.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-10-19 12:37:10 -04:00
Cory Snider
5b3086db1f libnetwork/datastore: prevent accidental recursion
The datastore cache only uses the reference to its datastore to get a
reference to the backing store. Modify the cache to take the backing
store reference directly so that methods on the datastore can't get
called, as that might result in infinite recursion between datastore and
cache methods.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-10-19 11:56:08 -04:00
Cory Snider
bcca214e36 libnetwork: open-code updating svc records
Inline the tortured logic for deciding when to skip updating the svc
records to give us a fighting chance at deciphering the logic behind the
logic and spotting logic bugs.

Update the service records synchronously. The only potential for issues
is if this change introduces deadlocks, which should be fixed by
restrucuting the mutexes rather than papering over the issue with
sketchy hacks like deferring the operation to a goroutine.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-10-17 19:51:21 -04:00
Cory Snider
33cf73f699 libnetwork: drop (*Controller).nmap
Its only remaining purpose is to elide removing the endpoint from the
service records if it was not previously added. Deleting the service
records is an idempotent operation so it is harmless to delete service
records which do not exist.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-10-17 19:46:18 -04:00
Cory Snider
804ef16822 libnetwork: only delete svc db entry on network rm
The service db entry for each network is deleted by
(*Controller).cleanupServiceDiscovery() when the network is deleted.
There is no need to also eagerly delete it whenever the network's
endpoint count drops to zero.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-10-17 19:46:18 -04:00
Cory Snider
c85398b020 libnetwork: drop vestigial endpoint-rename logic
The logic to rename an endpoint includes code which would synchronize
the renamed service records to peers through the distributed datastore.
It would trigger the remote peers to pick up the rename by touching a
datastore object which remote peers would have subscribed to events on.
The code also asserts that the local peer is subscribed to updates on
the network associated with the endpoint, presumably as a proxy for
asserting that the remote peers would also be subscribed.
https://github.com/moby/libnetwork/pull/712

Libnetwork no longer has support for distributed datastores or
subscribing to datastore object updates, so this logic can be deleted.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-10-17 19:46:18 -04:00
Cory Snider
29da565133 libnetwork: change netWatch map to a set
The map keys are only tested for presence. The value stored at the keys
is unused.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-10-17 18:26:34 -04:00
Cory Snider
0456c0db87 libnetwork: refactor isDistributedControl()
The meaning of the (*Controller).isDistributedControl() method is not
immediately clear from the name, and it does not have any doc comment.
It returns true if and only if the controller is neither a manager node
nor an agent node -- that is, if the daemon is _not_ participating in a
Swarm cluster. The method name likely comes from the old abandoned
datastore-as-IPC control plane architecture for libnetwork. Refactor

    c.isDistributedControl() -> !c.isSwarmNode()

to make it easier to understand code which consumes the method.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-10-17 17:59:19 -04:00
Cory Snider
749d4abd41 libnetwork: get rid of watchLoop goroutine
Replace with roughly equivalent code which relies upon the existing
mutexes for synchronization.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-10-17 17:06:52 -04:00
Richard Hansen
96f85def5b s/HostIP/HostIPv4/ for com.docker.network.host_ipv4 setting
Rename all variables/fields/map keys associated with the
`com.docker.network.host_ipv4` option from `HostIP` to `HostIPv4`.
Rationale:

  * This makes the variable/field name consistent with the option
    name.
  * This makes the code more readable because it is clear that the
    variable/field does not hold an IPv6 address.  This will hopefully
    avoid bugs like <https://github.com/moby/moby/issues/46445> in the
    future.
  * If IPv6 SNAT support is ever added, the names will be symmetric.

Signed-off-by: Richard Hansen <rhansen@rhansen.org>
2023-10-14 02:47:14 -04:00
Richard Hansen
2a14b6cf60 Use iptRule to simplify setIcc (code health)
Signed-off-by: Richard Hansen <rhansen@rhansen.org>
2023-10-14 02:47:14 -04:00
Richard Hansen
d7c6fd2f80 Move programChainRule logic to iptRule methods (code health)
Signed-off-by: Richard Hansen <rhansen@rhansen.org>
2023-10-14 02:47:13 -04:00
Richard Hansen
e260808a57 Move duplicate logic to iptRule.Exists method (code health)
Signed-off-by: Richard Hansen <rhansen@rhansen.org>
2023-10-14 01:41:09 -04:00
Richard Hansen
14d2535f13 Move iptables.IPVersion into iptRule struct (code health)
Rather than pass an `iptables.IPVersion` value alongside every
`iptRule` parameter, embed the IP version in the `iptRule` struct.

Signed-off-by: Richard Hansen <rhansen@rhansen.org>
2023-10-14 01:41:09 -04:00
Richard Hansen
4e219ebafb Eliminate unnecessary iptRule.preArgs field (code health)
That field was only used to pass `-t nat` for NAT rules.  Now `-t
<tableName>` (where `<tableName>` is one of the `iptables.Table`
values) is always passed, eliminating the need for `preArgs`.

Signed-off-by: Richard Hansen <rhansen@rhansen.org>
2023-10-14 01:41:09 -04:00
Richard Hansen
4662e9889c Simplify setupIPTablesInternal parameters (code health)
Pass the entire `*networkConfiguration` struct to
`setupIPTablesInternal` to simplify the function signature and improve
code readability.

Signed-off-by: Richard Hansen <rhansen@rhansen.org>
2023-10-14 01:41:09 -04:00
Bjorn Neergaard
f20abbc96c
libnetwork: use conntrack and --ctstate for all rules
On modern kernels this is an alias; however newer code has preferred
ctstate while older code has preferred the deprecated 'state' name.

Prefer the newer name for uniformity in the rules libnetwork creates,
and because some implementations/distributions of the xtables userland
tools may not support the legacy alias.

Signed-off-by: Bjorn Neergaard <bjorn.neergaard@docker.com>
2023-10-13 00:56:30 -06:00
Sebastiaan van Stijn
adea457841
Merge pull request #46553 from thaJeztah/no_panic
libnetwork: Controller: getKeys, getPrimaryKeyTag: prevent panic and small refactor
2023-10-12 14:19:06 +02:00
Sebastiaan van Stijn
cff4f20c44
migrate to github.com/containerd/log v0.1.0
The github.com/containerd/containerd/log package was moved to a separate
module, which will also be used by upcoming (patch) releases of containerd.

This patch moves our own uses of the package to use the new module.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-10-11 17:52:23 +02:00
Sebastiaan van Stijn
2835d1f7b2
Merge pull request #46603 from akerouanton/libnet-bridge-internal
libnet/d/bridge: Don't set container's gateway when network is internal
2023-10-11 17:07:02 +02:00
Sebastiaan van Stijn
26c5d1ea0d
Merge pull request #46551 from akerouanton/libnet-resolver-otel
libnet: add OTEL tracing to the embedded DNS
2023-10-11 17:03:30 +02:00
Albin Kerouanton
37ca57e9d5
libnet/d/bridge: inline error checks
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-10-10 10:46:44 +02:00
Albin Kerouanton
cbc2a71c27
libnet/d/bridge: Don't set container's gateway when network is internal
So far, internal networks were only isolated from the host by iptables
DROP rules. As a consequence, outbound connections from containers would
timeout instead of being "rejected" through an immediate ICMP dest/port
unreachable, a TCP RST or a failing `connect` syscall.

This was visible when internal containers were trying to resolve a
domain that don't match any container on the same network (be it a truly
"external" domain, or a container that don't exist/is dead). In that
case, the embedded resolver would try to forward DNS queries for the
different values of resolv.conf `search` option, making DNS resolution
slow to return an error, and the slowness being exacerbated by some libc
implementations.

This change makes `connect` syscall to return ENETUNREACH, and thus
solves the broader issue of failing fast when external connections are
attempted.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-10-09 13:57:54 +02:00
Albin Kerouanton
2c4551d86d
libnet: resolver: remove direct use of logrus
This causes logs written through `r.log(ctx)` to not end in OTEL traces.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-10-06 19:14:48 +02:00
Albin Kerouanton
4de8459265
libnet: add OTEL tracing to the embedded DNS
This change creates a few OTEL spans and plumb context through the DNS
resolver and DNS backends (ie. Sandbox and Network). This should help
better understand how much lock contention impacts performance, and
help debug issues related to DNS queries (we basically have no
visibility into what's happening here right now).

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-10-06 19:14:48 +02:00
Sebastiaan van Stijn
dcc75e1563
libnetwork: Controller: agentInit, agentDriverNotify rm intermediate vars
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-09-27 12:08:28 +02:00
Sebastiaan van Stijn
a384102fdf
libnetwork/datastore: Store.Map, Store.List: remove intermediate vars
Inline the closures, and rename a var to be more descriptive.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-09-27 12:07:31 +02:00
Sebastiaan van Stijn
bb5402e6fb
libnetwork: Controller: getKeys, getPrimaryKeyTag: slight refactor
- use named return variables to make the function more self-describing
- rename variable for readability
- slightly optimize slice initialization, and keep linters happy

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-09-27 12:01:54 +02:00
Sebastiaan van Stijn
603f49706e
libnetwork: Controller: getKeys, getPrimaryKeyTag: prevent panic
Prevent potential panics if we don't have the expected number of keys
for the subsystem.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-09-27 12:01:54 +02:00
Sebastiaan van Stijn
605c8fb75d
Merge pull request #46546 from thaJeztah/libnetwork_return_errs
libnetwork: Controller.cleanupLocalEndpoints, sandboxCleanup: return errors
2023-09-27 10:31:56 +02:00
Sebastiaan van Stijn
324cb3d08f
Merge pull request #46545 from thaJeztah/libnetwork_NetworkByID_simplify
libnetwork: Controller.NetworkByID: remove redundant error-handling
2023-09-27 10:30:47 +02:00
Sebastiaan van Stijn
f3143745b2
Merge pull request #46547 from thaJeztah/libnetwork_store_nolock
libnetwork: Controller: remove mutex for "store"
2023-09-27 10:23:32 +02:00
Sebastiaan van Stijn
b1855bb4af
Merge pull request #46548 from thaJeztah/libnetwork_inline_populateSpecial
libnetwork: inline populateSpecial NetworkWalker
2023-09-27 10:13:15 +02:00
Sebastiaan van Stijn
618d9b5d54
libnetwork: nwAgent: un-export mutex
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-09-26 19:46:27 +02:00
Sebastiaan van Stijn
7cda3fb7b5
libnetwork: inline populateSpecial NetworkWalker
It was only used in a single place, and it was defined far away from
where it was used.

Move the code inline, so that it's clear at a glance what it's doing.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-09-26 19:41:50 +02:00
Sebastiaan van Stijn
ca1307c56e
libnetwork: Controller: remove mutex for "store"
The store field is only mutated by Controller.initStores(), which is
only called inside the cosntructor (libnetwork.New), so there should be
no need to protect the field with a mutex in non-exported functions.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-09-26 19:34:12 +02:00
Sebastiaan van Stijn
a8ea752a93
libnetwork: Controller.cleanupLocalEndpoints: return errors
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-09-26 19:28:18 +02:00
Sebastiaan van Stijn
2e60051c92
libnetwork: Controller.sandboxCleanup: return errors
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-09-26 19:28:18 +02:00
Sebastiaan van Stijn
642cf261a8
libnetwork: Controller.NetworkByID: remove redundant error-handling
Controller.getNetworkFromStore() already returns a ErrNoSuchNetwork if
no network was found, so we don't need to convert the existing error.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-09-26 19:22:52 +02:00
Sebastiaan van Stijn
d7a31cfb2d
libnetwork: Sandbox.resolveName: slightly simplify locking
Simplify the lock/unlock cycle, and make the "lookupAlias" branch
more similar to the non-lookupAlias variant.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-09-21 16:23:36 +02:00
Sebastiaan van Stijn
f549aaa205
libnetwork: Sandbox.resolveName: add fast-path for alias lookups
Skip faster when we're looking for aliases. Also check for the list
of aliases to be empty, not just `nil` (although in practice it should
be equivalent).

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-09-21 16:23:35 +02:00
Sebastiaan van Stijn
9249b34be8
libnetwork: Sandbox.resolveName: rename vars for clarity
- use `nameOrAlias` for the name (or alias) to resolve
- use `lookupAlias` to indicate what the intent is; this function
  is either looking up aliases or "regular" names. Ideally we would
  split the function, but let's keep that for a future exercise.
- name the `ipv6Miss` output variable. The "ipv6 miss" logic is rather
  confusing, and should probably be revisited, but let's start with
  giving the variable a name to make it more apparent what it is.
- use `nw` for networks, which is the more common local name

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-09-21 16:23:35 +02:00
Sebastiaan van Stijn
4401ccac22
libnetwork: Sandbox: remove some intermediate vars
- remove some intermediate vars, or move them closer to where they're used.
- ResolveService: use strings.SplitN to limit number of elements. This
  code is only used to validate the input, results are not used.
- ResolveService: return early instead of breaking the loop. This makes
  it clearer from the code that were not returning anything (nil, nil).
- Controller.sandboxCleanup(): rename a var, and slight refactor of
  error-handling.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-09-21 16:23:35 +02:00
Sebastiaan van Stijn
4ff252456b
libnetwork: rewrite Network.isClusterEligible to return agent
This function was used to check if the network is a multi-host, swarm-scoped
network. Part of this check involved a check whether the cluster-agent was
present.

In all places where this function was used, the next step after checking if
the network was "cluster eligible", was to get the agent, and (again) check
if it was not nil.

This patch rewrites the isClusterEligible utility into a clusterAgent utility,
which both checks if the network is cluster-eligible, and returns the agent
(if set). For convenience, an "ok" bool is added, which callers can use to
return early (although just checking for nilness would likely have been
sufficient).

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-09-21 10:19:21 +02:00
Sebastiaan van Stijn
6203e3660d
libnetwork: Endpoint: return early if no agent was found
This removes redundant nil-checks in Endpoint.deleteServiceInfoFromCluster
and Endpoint.addServiceInfoToCluster.

These functions return early if the network is not ["cluster eligible"][1],
and the function used for that (`Network.isClusterEligible`) requires the
[agent to not be `nil`][2].

This check moved around a few times ([3][3], [4][4]), but was originally
added in [libnetwork 1570][5] which, among others, tried to avoid a nil-pointer
exception reported in [moby 28712][6], which accessed the `Controller.agent`
[without locking][7]. That issue was addressed by adding locks, adding a
`Controller.getAgent` accessor, and updating deleteServiceInfoFromCluster
to use a local var. It also sprinkled this `nil` check to be on the safe
side, but as `Network.isClusterEligible` already checks for the agent
to not be `nil`, this should not be redundant.

[1]: 5b53ddfcdd/libnetwork/agent.go (L529-L534)
[2]: 5b53ddfcdd/libnetwork/agent.go (L688-L696)
[3]: f2307265c7
[4]: 6426d1e66f
[5]: 8dcf9960aa
[6]: https://github.com/moby/moby/issues/28712
[7]: 75fd88ba89/vendor/github.com/docker/libnetwork/agent.go (L452)

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-09-21 10:19:21 +02:00
Sebastiaan van Stijn
6eeef51c6a
libnetwork: Controller.agentSetup: use structured logs
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-09-21 10:19:15 +02:00
Sebastiaan van Stijn
8b95ea4a35
libnetwork: Controller.agentSetup: remove redundant condition
The function returns at the start if there agent is non-nil.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-09-21 10:15:06 +02:00
Sebastiaan van Stijn
1ed5d91555
Merge pull request #46365 from thaJeztah/libnetwork_endpoint_nits
libnetwork: Endpoint: fixing some nits
2023-09-20 22:01:28 +02:00
Sebastiaan van Stijn
313a090c0e
libnetwork/osl: add some TODOs
These came up during review of a refactor, and need further investigating.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-09-20 12:45:45 +02:00
Sebastiaan van Stijn
9d3b1f9419
libnetwork/osl: make constructing Interfaces more atomic
It's still not "great", but implement a `newInterface()` constructor
to create a new Interface instance, instead of creating a partial
instance and applying "options" after the fact.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-09-20 12:45:40 +02:00
Sebastiaan van Stijn
47f9e70385
libnetwork/osl: Namespace.Restore: conditionally fetch IPs
We're only using the results if the interface doesn't have an address
yet, so skip this step if we don't use it.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-09-20 12:38:27 +02:00
Sebastiaan van Stijn
ee5a91e663
libnetwork/osl: Namespace.Restore: flatten nested conditions
Flatten some nested "if"-statements, and improve error.

Errors returned by this function are not handled, and only logged, so
make them more informative if debugging is needed.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-09-20 12:38:27 +02:00
Sebastiaan van Stijn
299bd58c5a
libnetwork/osl: Namespace.Restore: rename vars for readability
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-09-20 12:38:27 +02:00
Sebastiaan van Stijn
7b96663082
libnetwork/osl: Namespace: inline setGateway and setGatewayIPv6
They were not consistently used, and the locations where they were
used were already "setters", so we may as well inline the code.

Also updating Namespace.Restore to keep the lock slightly longer,
instead of locking/unlocking for each property individually, although
we should consider to keep the long for the duration of the whole
function to make it more atomic.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-09-20 12:38:26 +02:00
Sebastiaan van Stijn
bd17d27658
libnetwork/osl: Namespace: make error-handling more idiomatic
Check for non-nil errors (and return early) instead of the reverse.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-09-20 12:38:26 +02:00
Sebastiaan van Stijn
0b4a70ca2c
libnetwork/osl: Namespace: programRoute, removeRoute rm path arg
Remove the argument, because it was not used.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-09-20 12:38:26 +02:00
Sebastiaan van Stijn
542fe0da40
libnetwork/osl: Namespace: make mutex private
Make the mutex internal to the Namespace; locking/unlocking should not
be done externally, and this makes it easier to see where it's used.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-09-20 12:38:26 +02:00
Sebastiaan van Stijn
338fc49060
libnetwork/osl: implement Namespace.RemoveInterface
Interface.Remove() was directly accessing Namespace "internals", such
as locking/unlocking. Move the code from Interface.Remove() into the
Namespace instead.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-09-20 12:34:47 +02:00
Sebastiaan van Stijn
7cfb81ba04
Merge pull request #46342 from thaJeztah/libnetwork_nwAgent_ip
libnetwork: nwAgent.bindAddr: change to net.IP
2023-09-20 10:27:06 +02:00