Make the internal DNS resolver for Windows containers forward requests
to upstream DNS servers when it cannot respond itself, rather than
returning SERVFAIL.
Windows containers are normally configured with the internal resolver
first for service discovery (container name lookup), then external
resolvers from '--dns' or the host's networking configuration.
When a tool like ping gets a SERVFAIL from the internal resolver, it
tries the other nameservers. But nslookup does not, and with this
change it no longer needs to.
The internal resolver learns external server addresses from the
container's HNSEndpoint configuration, so it will use the same DNS
servers as processes in the container.
The internal resolver for Windows containers listens on the network's
gateway address, and each container may have a different set of external
DNS servers. So, the resolver uses the source address of the DNS request
to select external resolvers.
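As a rough illustration -- the type and field names below are
hypothetical, not the actual resolver code -- the per-source selection
amounts to a lookup keyed on the query's source address:

    package resolver // illustrative sketch, not the real package layout

    import "net"

    type extDNSEntry struct {
        IPStr string // address of an upstream DNS server
    }

    type sandboxDNS struct {
        extDNS []extDNSEntry // external resolvers from the HNSEndpoint config
    }

    type resolver struct {
        bySrcAddr map[string]*sandboxDNS // keyed by the container's source IP
    }

    // upstreamsFor picks the external resolvers for a query arriving from
    // srcIP; a nil result means the resolver answers SERVFAIL as before.
    func (r *resolver) upstreamsFor(srcIP net.IP) []extDNSEntry {
        if sb, ok := r.bySrcAddr[srcIP.String()]; ok {
            return sb.extDNS
        }
        return nil
    }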
On Windows, the daemon.json feature option 'windows-no-dns-proxy' can be used
to prevent the internal resolver from forwarding requests (restoring the
old behaviour).
Signed-off-by: Rob Murray <rob.murray@docker.com>
The internal DNS resolver should only forward requests to external
resolvers if the libnetwork.Sandbox served by the resolver has external
network access (so, no forwarding for '--internal' networks).
The test for external network access was whether the Sandbox had an
Endpoint with a gateway configured.
However, an ipvlan-l3 network with external network access does not
have a gateway; it has a default route bound to an interface.
Also, we document that an ipvlan network with no parent interface is
equivalent to a '--internal' network. But, in this case, an ipvlan-l2
network was configured with a gateway. So, DNS proxying would be enabled
in the internal resolver (and, if the host's resolver was on a localhost
address, requests to external resolvers from the host's network
namespace would succeed).
So, this change adjusts the test for enabling DNS proxying to include
a check for '--internal' (as a shortcut) and, for non-internal networks,
checks for a default route as well as a gateway. It also disables
configuration of a gateway or a default route for an ipvlan Endpoint if
no parent interface is specified.
(Note that if a parent interface with no external connectivity is supplied as
'-o parent=<dummy>', the gateway/default route will still be set up
and external DNS proxying will be enabled. The network must be
configured as '--internal' to prevent that from happening.)
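A minimal sketch of the adjusted check (helper and parameter names are
made up; the real code derives these conditions from the Sandbox's
endpoints):

    package libnetwork // sketch only

    // proxyDNS reports whether the internal resolver should forward
    // requests to external resolvers for a sandbox.
    func proxyDNS(internal, hasGateway, hasDefaultRouteViaIface bool) bool {
        if internal {
            return false // '--internal' networks never get external access
        }
        // ipvlan-l3 has no gateway, only a default route bound to an
        // interface, so either condition is enough to enable forwarding.
        return hasGateway || hasDefaultRouteViaIface
    }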
Signed-off-by: Rob Murray <rob.murray@docker.com>
Commit 8b7af1d0f added some code to update the DNSNames of all
endpoints attached to a sandbox by loading a new instance of each
affected endpoint from the datastore through a call to
`Network.EndpointByID()`.
This method then calls `Network.getEndpointFromStore()`, which in
turn calls `store.GetObject()`, which then calls `cache.get()`,
which calls `o.CopyTo(kvObject)`. This effectively creates a fresh
new instance of an Endpoint. However, endpoints are already kept in
memory by Sandbox, meaning we now have two in-memory instances of
the same Endpoint.
As it turns out, libnetwork is built around the idea that no two objects
representing the same thing should live in memory at once; otherwise
mutex locking and optimistic locking break (the two instances end up
with drifting version-tracking IDs -- dbIndex, in libnetwork parlance).
In this specific case, the bug materializes as container rename failing
when applied a second time to the same container. An integration test is
added to make sure this won't happen again.
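To illustrate the failure mode with made-up types (the real objects are
libnetwork's datastore-backed structs): two copies of the same record
each carry their own version counter, so an update made through one copy
leaves the other stale.

    package main

    import "fmt"

    // endpointRecord stands in for a datastore object; dbIndex is the
    // version counter used for optimistic locking.
    type endpointRecord struct {
        name    string
        dbIndex uint64
    }

    func main() {
        held := &endpointRecord{name: "ep1", dbIndex: 3}  // copy kept by the Sandbox
        fresh := &endpointRecord{name: "ep1", dbIndex: 3} // copy from EndpointByID()

        // First rename: the fresh copy is updated and written back.
        fresh.name, fresh.dbIndex = "ep1-renamed", fresh.dbIndex+1

        // The Sandbox's copy never saw that write, so a second rename built
        // on it carries a stale version and fails the store's version check.
        fmt.Println(held.dbIndex == fresh.dbIndex) // false: the copies drifted
    }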
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
When resolving names in swarm mode, services with exposed ports are
connected to the user's overlay network, the ingress network, and the
local (docker_gwbridge) network. Name resolution should prioritize
returning the VIP/IPs on the user's overlay network over those on the
ingress and local networks.
Sandbox.ResolveName implemented this by taking the list of endpoints,
splitting the list into 3 separate lists based on the type of network
that the endpoint was attached to (dynamic, ingress, local), and then
concatenating them back into a single list in that priority order.
This patch refactors that logic to use a custom sorter (sort.Interface),
which makes the code more transparent, and prevents iterating over the
list of endpoints multiple times.
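A condensed sketch of such a sorter (types and field names here are
illustrative, not the exact ones from the patch):

    package libnetwork // sketch

    import "sort"

    // networkKind orders endpoints: user ("dynamic") networks first, then
    // ingress, then local (docker_gwbridge) networks.
    type networkKind int

    const (
        kindDynamic networkKind = iota
        kindIngress
        kindLocal
    )

    type epInfo struct {
        name string
        kind networkKind
    }

    type byNetworkKind []epInfo

    func (e byNetworkKind) Len() int           { return len(e) }
    func (e byNetworkKind) Swap(i, j int)      { e[i], e[j] = e[j], e[i] }
    func (e byNetworkKind) Less(i, j int) bool { return e[i].kind < e[j].kind }

    // A single sort.Sort(byNetworkKind(endpoints)) call replaces splitting
    // the list into three and concatenating the pieces.
    var _ sort.Interface = byNetworkKind(nil)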
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The meaning of the (*Controller).isDistributedControl() method is not
immediately clear from the name, and it does not have any doc comment.
It returns true if and only if the controller is neither a manager node
nor an agent node -- that is, if the daemon is _not_ participating in a
Swarm cluster. The method name likely comes from the old abandoned
datastore-as-IPC control plane architecture for libnetwork. Refactor
    c.isDistributedControl() -> !c.isSwarmNode()
to make it easier to understand code which consumes the method.
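For illustration only -- the underlying checks are a guess at the shape,
not the exact implementation:

    package libnetwork // sketch

    type Controller struct {
        manager bool
        agent   bool
    }

    func (c *Controller) isManager() bool { return c.manager }
    func (c *Controller) isAgent() bool   { return c.agent }

    // isSwarmNode reports whether the daemon participates in a Swarm
    // cluster, as a manager or as an agent. Call sites that used to read
    // "if c.isDistributedControl()" now read "if !c.isSwarmNode()".
    func (c *Controller) isSwarmNode() bool {
        return c.isManager() || c.isAgent()
    }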
Signed-off-by: Cory Snider <csnider@mirantis.com>
The github.com/containerd/containerd/log package was moved to a separate
module, which will also be used by upcoming (patch) releases of containerd.
This patch moves our own uses of the package to use the new module.
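The import change is mechanical; a minimal sketch, assuming the
standalone module lives at github.com/containerd/log (the package
helpers keep their names):

    package example // sketch of the import change

    import (
        "context"

        // Previously "github.com/containerd/containerd/log"; the helpers
        // (log.G, log.L, ...) keep the same names and behaviour.
        "github.com/containerd/log"
    )

    func logSomething(ctx context.Context) {
        log.G(ctx).Info("using the relocated log package")
    }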
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This change creates a few OTEL spans and plumbs context through the DNS
resolver and DNS backends (i.e. Sandbox and Network). This should help
better understand how much lock contention impacts performance, and
help debug issues related to DNS queries (we basically have no
visibility into what's happening here right now).
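Roughly what the instrumentation looks like; span names and attributes
below are illustrative, not the exact ones added:

    package libnetwork // sketch

    import (
        "context"

        "go.opentelemetry.io/otel"
        "go.opentelemetry.io/otel/attribute"
        "go.opentelemetry.io/otel/trace"
    )

    // resolveName starts a span for the query and passes ctx down so the
    // Sandbox/Network backends can attach child spans of their own.
    func resolveName(ctx context.Context, name string) (string, error) {
        ctx, span := otel.Tracer("libnetwork").Start(ctx, "resolver.ResolveName",
            trace.WithAttributes(attribute.String("dns.name", name)))
        defer span.End()

        return backendLookup(ctx, name)
    }

    func backendLookup(ctx context.Context, name string) (string, error) {
        _, span := otel.Tracer("libnetwork").Start(ctx, "sandbox.ResolveName")
        defer span.End()
        return "", nil
    }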
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Simplify the lock/unlock cycle, and make the "lookupAlias" branch
more similar to the non-lookupAlias variant.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Skip faster when we're looking for aliases. Also check whether the list
of aliases is empty, not just `nil` (although in practice they should
be equivalent).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- use `nameOrAlias` for the name (or alias) to resolve
- use `lookupAlias` to indicate what the intent is; this function
is either looking up aliases or "regular" names. Ideally we would
split the function, but let's keep that for a future exercise.
- name the `ipv6Miss` output variable. The "ipv6 miss" logic is rather
confusing, and should probably be revisited, but let's start with
giving the variable a name to make it more apparent what it is.
- use `nw` for networks, which is the more common local name
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- remove some intermediate vars, or move them closer to where they're used.
- ResolveService: use strings.SplitN to limit the number of elements.
This code is only used to validate the input; the results are not used
(see the sketch after this list).
- ResolveService: return early instead of breaking the loop. This makes
it clearer from the code that we're not returning anything (nil, nil).
- Controller.sandboxCleanup(): rename a var, and slight refactor of
error-handling.
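Sketch of the SplitN-based validation; the exact expected shape of the
name ('_service._proto.name') is an assumption here, and the point is
only that three elements are enough:

    package libnetwork // sketch

    import "strings"

    // validSRVName checks that a query looks like "_service._proto.name".
    // SplitN with a limit of 3 is sufficient because only the first two
    // labels are inspected; the remainder is never used.
    func validSRVName(name string) bool {
        parts := strings.SplitN(name, ".", 3)
        return len(parts) == 3 &&
            strings.HasPrefix(parts[0], "_") &&
            strings.HasPrefix(parts[1], "_")
    }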
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
osl.NewSandbox() always returns a nil interface on Windows (and other non-Linux
platforms). This means the fields holding its result are always nil, and
any code using them must be considered Linux-only.
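Something along these lines on the non-Linux side (the signature here is
an assumption and may differ from the real stub):

    //go:build !linux

    package osl // sketch

    // Sandbox is the OS sandbox abstraction (methods elided here).
    type Sandbox interface{}

    // NewSandbox is a no-op stub on non-Linux platforms: it always returns
    // a nil interface, so fields holding its result stay nil there.
    func NewSandbox(key string, osCreate, isRestore bool) (Sandbox, error) {
        return nil, nil
    }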
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
"Pay no attention to the implementation behind the curtain!"
There's only one implementation of the Sandbox interface, and only one implementation
of the Info interface, and they both happen to be implemented by the same type:
networkNamespace. Let's merge these interfaces.
And now that we know that there's one, and only one Info, we can drop the charade,
and relieve the Sandbox from its dual personality.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
InvalidParameter is now compatible with errdefs.InvalidParameter. Thus,
these errors will now return a 400 status code instead of a 500.
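A minimal sketch of how that compatibility works (the type name below is
illustrative): errdefs treats any error implementing an InvalidParameter()
method as an invalid-parameter error, which the API layer maps to 400.

    package types // sketch

    // InvalidParameterError satisfies errdefs.ErrInvalidParameter
    // (github.com/docker/docker/errdefs), so the API layer maps it to
    // http.StatusBadRequest instead of an internal server error.
    type InvalidParameterError string

    func (e InvalidParameterError) Error() string { return string(e) }

    // InvalidParameter is the marker method errdefs looks for.
    func (e InvalidParameterError) InvalidParameter() {}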
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Outside of some tests, these options are the only code setting these fields,
so we can update them to set the value, instead of appending.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The method to restore a network namespace takes a collection of
interfaces to restore with the options to apply. The interface names are
structured data, tuples of (SrcName, DstPrefix) but for whatever reason
are being passed into Restore() serialized to strings. A refactor,
f0be4d126d, accidentally broke the
serialization by dropping the delimiter. Rather than fix the
serialization and leave the time-bomb for someone else to trip over,
pass the interface names as structured data.
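Schematically, the structured form is just a small struct; the field set
and the Restore signature below are approximations, not the exact ones
from the patch:

    package osl // sketch

    // Iface is the structured form of an interface name: the (SrcName,
    // DstPrefix) tuple that previously got flattened into one string.
    type Iface struct {
        SrcName   string // interface name in the host namespace
        DstPrefix string // prefix for the name generated inside the sandbox
    }

    // Restore receives the names as structured data, so there is no string
    // delimiter left to drop in a future refactor.
    func Restore(interfaces map[Iface][]string, routes []string) error {
        return nil
    }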
Signed-off-by: Cory Snider <csnider@mirantis.com>
Embedded structs are part of the exported surface of a struct type.
Boxing a struct value into an interface value does not erase that;
any code could gain access to the embedded struct value with a simple
type assertion. The mutex is supposed to be a private implementation
detail, but *endpoint implements sync.Locker because the mutex is
embedded. Change the mutex to an unexported field so *endpoint no
longer spuriously implements the sync.Locker interface.
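Schematically (the method shown is illustrative, not a real one):

    package libnetwork // sketch

    import "sync"

    // Before, the struct embedded sync.Mutex, so *endpoint satisfied
    // sync.Locker and any caller could lock it via a type assertion.
    // After, the mutex is an ordinary unexported field:
    type endpoint struct {
        mu sync.Mutex
        // ... other fields unchanged ...
    }

    // Methods lock the field explicitly (method name here is illustrative).
    func (ep *endpoint) updateName(name string) {
        ep.mu.Lock()
        defer ep.mu.Unlock()
        _ = name // stand-in for the real field update
    }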
Signed-off-by: Cory Snider <csnider@mirantis.com>
Basically every exported method which takes a libnetwork.Sandbox
argument asserts that the value's concrete type is *sandbox. Passing any
other implementation of the interface is a runtime error! This interface
is a footgun, and clearly not necessary. Export and use the concrete
type instead.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Embedded structs are part of the exported surface of a struct type.
Boxing a struct value into an interface value does not erase that;
any code could gain access to the embedded struct value with a simple
type assertion. The mutex is supposed to be a private implementation
detail, but *sandbox implements sync.Locker because the mutex is
embedded. Change the mutex to an unexported field so *sandbox no
longer spuriously implements the sync.Locker interface.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Embedded structs are part of the exported surface of a struct type.
Boxing a struct value into an interface value does not erase that;
any code could gain access to the embedded struct value with a simple
type assertion. The mutex is supposed to be a private implementation
detail, but *controller implements sync.Locker because the mutex is
embedded.
    c, _ := libnetwork.New()
    c.(sync.Locker).Lock()
Change the mutex to an unexported field so *controller no longer
spuriously implements the sync.Locker interface.
Signed-off-by: Cory Snider <csnider@mirantis.com>
libnetwork/etchosts/etchosts_test.go:167:54: empty-lines: extra empty line at the end of a block (revive)
libnetwork/osl/route_linux.go:185:74: empty-lines: extra empty line at the start of a block (revive)
libnetwork/osl/sandbox_linux_test.go:323:36: empty-lines: extra empty line at the start of a block (revive)
libnetwork/bitseq/sequence.go:412:48: empty-lines: extra empty line at the start of a block (revive)
libnetwork/datastore/datastore_test.go:67:46: empty-lines: extra empty line at the end of a block (revive)
libnetwork/datastore/mock_store.go:34:60: empty-lines: extra empty line at the end of a block (revive)
libnetwork/iptables/firewalld.go:202:44: empty-lines: extra empty line at the end of a block (revive)
libnetwork/iptables/firewalld_test.go:76:36: empty-lines: extra empty line at the end of a block (revive)
libnetwork/iptables/iptables.go:256:67: empty-lines: extra empty line at the end of a block (revive)
libnetwork/iptables/iptables.go:303:128: empty-lines: extra empty line at the start of a block (revive)
libnetwork/networkdb/cluster.go:183:72: empty-lines: extra empty line at the end of a block (revive)
libnetwork/ipams/null/null_test.go:44:38: empty-lines: extra empty line at the end of a block (revive)
libnetwork/drivers/macvlan/macvlan_store.go:45:52: empty-lines: extra empty line at the end of a block (revive)
libnetwork/ipam/allocator_test.go:1058:39: empty-lines: extra empty line at the start of a block (revive)
libnetwork/drivers/bridge/port_mapping.go:88:111: empty-lines: extra empty line at the end of a block (revive)
libnetwork/drivers/bridge/link.go:26:90: empty-lines: extra empty line at the end of a block (revive)
libnetwork/drivers/bridge/setup_ipv6_test.go:17:34: empty-lines: extra empty line at the end of a block (revive)
libnetwork/drivers/bridge/setup_ip_tables.go:392:4: empty-lines: extra empty line at the start of a block (revive)
libnetwork/drivers/bridge/bridge.go:804:50: empty-lines: extra empty line at the start of a block (revive)
libnetwork/drivers/overlay/ov_serf.go:183:29: empty-lines: extra empty line at the start of a block (revive)
libnetwork/drivers/overlay/ov_utils.go:81:64: empty-lines: extra empty line at the end of a block (revive)
libnetwork/drivers/overlay/peerdb.go:172:67: empty-lines: extra empty line at the start of a block (revive)
libnetwork/drivers/overlay/peerdb.go:209:67: empty-lines: extra empty line at the start of a block (revive)
libnetwork/drivers/overlay/peerdb.go:344:89: empty-lines: extra empty line at the start of a block (revive)
libnetwork/drivers/overlay/peerdb.go:436:63: empty-lines: extra empty line at the start of a block (revive)
libnetwork/drivers/overlay/overlay.go:183:36: empty-lines: extra empty line at the start of a block (revive)
libnetwork/drivers/overlay/encryption.go:69:28: empty-lines: extra empty line at the end of a block (revive)
libnetwork/drivers/overlay/ov_network.go:563:81: empty-lines: extra empty line at the start of a block (revive)
libnetwork/default_gateway.go:32:43: empty-lines: extra empty line at the start of a block (revive)
libnetwork/errors_test.go:9:40: empty-lines: extra empty line at the start of a block (revive)
libnetwork/service_common.go:184:64: empty-lines: extra empty line at the end of a block (revive)
libnetwork/endpoint.go:161:55: empty-lines: extra empty line at the end of a block (revive)
libnetwork/store.go:320:33: empty-lines: extra empty line at the end of a block (revive)
libnetwork/store_linux_test.go:11:38: empty-lines: extra empty line at the end of a block (revive)
libnetwork/sandbox.go:571:36: empty-lines: extra empty line at the start of a block (revive)
libnetwork/service_common.go:317:246: empty-lines: extra empty line at the start of a block (revive)
libnetwork/endpoint.go:550:17: empty-lines: extra empty line at the end of a block (revive)
libnetwork/sandbox_dns_unix.go:213:106: empty-lines: extra empty line at the start of a block (revive)
libnetwork/controller.go:676:85: empty-lines: extra empty line at the end of a block (revive)
libnetwork/agent.go:876:60: empty-lines: extra empty line at the end of a block (revive)
libnetwork/resolver.go:324:69: empty-lines: extra empty line at the end of a block (revive)
libnetwork/network.go:1153:92: empty-lines: extra empty line at the end of a block (revive)
libnetwork/network.go:1955:67: empty-lines: extra empty line at the start of a block (revive)
libnetwork/network.go:2235:9: empty-lines: extra empty line at the start of a block (revive)
libnetwork/libnetwork_internal_test.go:336:26: empty-lines: extra empty line at the start of a block (revive)
libnetwork/resolver_test.go:76:35: empty-lines: extra empty line at the end of a block (revive)
libnetwork/libnetwork_test.go:303:38: empty-lines: extra empty line at the end of a block (revive)
libnetwork/libnetwork_test.go:985:46: empty-lines: extra empty line at the end of a block (revive)
libnetwork/ipam/allocator_test.go:1263:37: empty-lines: extra empty line at the start of a block (revive)
libnetwork/errors_test.go:9:40: empty-lines: extra empty line at the end of a block (revive)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Remove the "deadcode", "structcheck", and "varcheck" linters, as they are
deprecated:
WARN [runner] The linter 'deadcode' is deprecated (since v1.49.0) due to: The owner seems to have abandoned the linter. Replaced by unused.
WARN [runner] The linter 'structcheck' is deprecated (since v1.49.0) due to: The owner seems to have abandoned the linter. Replaced by unused.
WARN [runner] The linter 'varcheck' is deprecated (since v1.49.0) due to: The owner seems to have abandoned the linter. Replaced by unused.
WARN [linters context] structcheck is disabled because of generics. You can track the evolution of the generics support by following the https://github.com/golangci/golangci-lint/issues/2649.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The correct formatting for machine-readable comments is:
//<some alphanumeric identifier>:<options>[,<option>...][ // comment]
Which basically means:
- MUST NOT have a space before `<identifier>` (e.g. `nolint`)
- Identifier MUST be alphanumeric
- MUST be followed by a colon
- MUST be followed by at least one `<option>`
- Optionally additional `<options>` (comma-separated)
- Optionally followed by a comment
Any other format will not be considered a machine-readable comment by
`gofmt`, and will thus be formatted as a regular comment. Note that this
also means that a `//nolint` (without anything after it) is considered
invalid, as is `//#nosec` (which starts with a `#`).
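For example, in Go source:

    package example // illustrates the comment format only

    import "os"

    func readConfig(path string) {
        f, err := os.Open(path)
        if err != nil {
            return
        }
        defer f.Close() //nolint:errcheck // close error is not actionable here

        // Not machine-readable, so treated as plain comments by gofmt:
        //   // nolint:errcheck   <- space before the identifier
        //   //nolint             <- no option after the identifier
        //   //#nosec             <- identifier starts with '#'
    }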
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
After moving libnetwork to this repo, we need to update all the import
paths for libnetwork to point to docker/docker/libnetwork instead of
docker/libnetwork.
This change implements that.
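For illustration, with blank imports just to show the path rewrite:

    package example // sketch of the rewritten import paths

    import (
        // Previously "github.com/docker/libnetwork" and its sub-packages,
        // e.g. "github.com/docker/libnetwork/types":
        _ "github.com/docker/docker/libnetwork"
        _ "github.com/docker/docker/libnetwork/types"
    )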
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Allow DSR to be a configurable option through a generic option to the
overlay driver. On the one hand this approach makes sense insofar as
only overlay networks can currently perform load balancing. On the
other hand, this approach has several issues. First, should we create
another type of swarm-scope network, this will prevent it from working.
Second, the service core code is separate from the driver code and the
driver code can't influence the core data structures. So the driver
code can't set this option itself. Therefore, implementing in this way
requires some hack code to test for this option in
controller.NewNetwork.
A more correct approach would be to make this a generic option for any
network. Then the driver could ignore, reject or be unaware of the option
depending on the chosen model. This would require changes to:
* libnetwork - naturally
* the docker API - to carry the option
* swarmkit - to propagate the option
* the docker CLI - to support the option
* moby - to translate the API option into a libnetwork option
Given the urgency of requests to address this issue, this approach will
be saved for a future iteration.
Signed-off-by: Chris Telfer <ctelfer@docker.com>
Modify the load balancing for east-west traffic to use direct routing
rather than NAT, and update tasks to use direct service return under
Linux. This avoids hiding the source address of the sender and improves
the performance in single-client/single-server tests.
Signed-off-by: Chris Telfer <ctelfer@docker.com>
Change the sandbox IDs for the sandboxes of load-balancing endpoints to
be "lb_XXXXXXXXX" where XXXXXXXXX is the network ID that this sandbox
load balances for. This makes it easier to find these sandboxes in
/var/run/docker/netns and thus makes debugging easier.
Signed-off-by: Chris Telfer <ctelfer@docker.com>
Refactor the ostweaks file to allow easier reuse.
Add a method on the osl.Sandbox interface to allow setting
knobs on the sandbox.
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
This is the heart of the scalability change for services in libnetwork.
The present routing mesh adds load-balancing rules for a network to
every container connected to the network. This newer approach creates a
load-balancing endpoint per network per node. For every service on a
network, libnetwork assigns the VIP of the service to the endpoint's
interface as an alias. This endpoint must have a unique IP address in
order to route return traffic to it. Traffic destined for a service's
VIP arrives at the load-balancing endpoint on the VIP and from there,
Linux load balances it among backend destinations while SNATing said
traffic to the endpoint's unique IP address.
The net result of this scheme is that each node in a swarm need only
have one set of load balancing state per service instead of one per
container on the node. This scheme is very similar to how services
currently operate on Windows nodes in libnetwork. It (as with Windows
nodes) costs the use of extra IP addresses in a network (one per node)
and an extra network hop, although one that always stays within the
stack local to the container.
In order to prevent existing deployments from suddenly failing if they
failed to allocate sufficient address space to include per-node
load-balancing endpoint IP addresses, this patch preserves the existing
functionality and activates the new functionality on a per-network
basis depending on whether the network has a load-balancing endpoint.
Eventually, moby should always set this option when creating new
networks and should only omit it for networks created as part of a swarm
that are not marked to use endpoint load balancing.
This patch also normalizes the code to treat "load" and "balancer"
as two separate words from the perspective of variable/function naming.
This means that the 'b' in "balancer" must be capitalized.
Signed-off-by: Chris Telfer <ctelfer@docker.com>
New load balancing code will require the ability to add aliases to
load-balancer sandboxes. So this broadens the OSL interface to allow
adding aliases to any interface, along with the facility to get the
loopback interface's name based on the OS.
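Schematically, and with approximate names (not the exact API), the
additions look like:

    package osl // sketch

    import "net"

    // Alias-related additions in the spirit of this change:
    type aliasAdder interface {
        // AddAliasIP adds an extra address (such as a service VIP) to the
        // named interface inside the sandbox.
        AddAliasIP(ifName string, ip *net.IPNet) error
        // RemoveAliasIP undoes AddAliasIP.
        RemoveAliasIP(ifName string, ip *net.IPNet) error
    }

    // loopbackIfaceName returns the OS-specific loopback interface name;
    // "lo" on Linux.
    func loopbackIfaceName() string { return "lo" }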
Signed-off-by: Chris Telfer <ctelfer@docker.com>