Make the internal DNS resolver for Windows containers forward requests
to upstream DNS servers when it cannot respond itself, rather than
returning SERVFAIL.
Windows containers are normally configured with the internal resolver
first for service discovery (container name lookup), then external
resolvers from '--dns' or the host's networking configuration.
When a tool like ping gets a SERVFAIL from the internal resolver, it
tries the other nameservers. But, nslookup does not, and with this
change it does not need to.
The internal resolver learns external server addresses from the
container's HNSEndpoint configuration, so it will use the same DNS
servers as processes in the container.
The internal resolver for Windows containers listens on the network's
gateway address, and each container may have a different set of external
DNS servers. So, the resolver uses the source address of the DNS request
to select external resolvers.
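A minimal sketch of that idea (names invented for illustration, this is
not the actual resolver code): keep a table of external resolvers keyed
by container address, and look it up with the query's source address.

    package dnsproxy

    import "net/netip"

    // externalServers maps a container's source address to the external
    // DNS servers configured for that container's endpoint.
    type externalServers map[netip.Addr][]netip.Addr

    // forQuery returns the upstream resolvers to forward an unanswered
    // query to, selected by the source address of the DNS request.
    func (s externalServers) forQuery(src netip.Addr) []netip.Addr {
        return s[src]
    }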
On Windows, the daemon.json feature option 'windows-no-dns-proxy' can be used
to prevent the internal resolver from forwarding requests (restoring the
old behaviour).
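Presumably that would be set through the 'features' map in daemon.json,
along these lines (illustrative; check the daemon documentation for the
exact syntax):

    {
      "features": {
        "windows-no-dns-proxy": true
      }
    }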
Signed-off-by: Rob Murray <rob.murray@docker.com>
The internal DNS resolver should only forward requests to external
resolvers if the libnetwork.Sandbox served by the resolver has external
network access (so, no forwarding for '--internal' networks).
The test for external network access was whether the Sandbox had an
Endpoint with a gateway configured.
However, an ipvlan-l3 network with external network access does not
have a gateway, it has a default route bound to an interface.
Also, we document that an ipvlan network with no parent interface is
equivalent to a '--internal' network. But, in that case, an ipvlan-l2
network was still configured with a gateway. So, DNS proxying would be enabled
in the internal resolver (and, if the host's resolver was on a localhost
address, requests to external resolvers from the host's network
namespace would succeed).
So, this change adjusts the test for enabling DNS proxying to include
a check for '--internal' (as a shortcut) and, for non-internal networks,
checks for a default route as well as a gateway. It also disables
configuration of a gateway or a default route for an ipvlan Endpoint if
no parent interface is specified.
(Note if a parent interface with no external network is supplied as
'-o parent=<dummy>', the gateway/default route will still be set up
and external DNS proxying will be enabled. The network must be
configured as '--internal' to prevent that from happening.)
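Roughly, the new test amounts to the following (a sketch with invented
names, not the real libnetwork code):

    // proxyDNS reports whether the internal resolver should forward
    // unanswered queries to external resolvers.
    func proxyDNS(internal, hasGateway, hasDefaultRoute bool) bool {
        if internal {
            // '--internal' networks never get external forwarding.
            return false
        }
        // Non-internal networks qualify with a gateway (bridge, ipvlan-l2)
        // or with a default route bound to an interface (ipvlan-l3).
        return hasGateway || hasDefaultRoute
    }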
Signed-off-by: Rob Murray <rob.murray@docker.com>
Commit cbc2a71c2 makes the `connect` syscall fail fast when a container is
only attached to an internal network. Thanks to that, if such a
container tries to resolve an "external" domain, the embedded resolver
returns an error immediately instead of waiting for a timeout.
This commit makes sure the embedded resolver doesn't even try to forward
to upstream servers.
Co-authored-by: Albin Kerouanton <albinker@gmail.com>
Signed-off-by: Rob Murray <rob.murray@docker.com>
This is a follow-up to 2cf230951f, adding
more directives to adjust for some new code added since:
Before this patch:
make -C ./internal/gocompat/
GO111MODULE=off go generate .
GO111MODULE=on go mod tidy
GO111MODULE=on go test -v
# github.com/docker/docker/internal/sliceutil
internal/sliceutil/sliceutil.go:3:12: type parameter requires go1.18 or later (-lang was set to go1.16; check go.mod)
internal/sliceutil/sliceutil.go:3:14: predeclared comparable requires go1.18 or later (-lang was set to go1.16; check go.mod)
internal/sliceutil/sliceutil.go:4:19: invalid map key type T (missing comparable constraint)
# github.com/docker/docker/libnetwork
libnetwork/endpoint.go:252:17: implicit function instantiation requires go1.18 or later (-lang was set to go1.16; check go.mod)
# github.com/docker/docker/daemon
daemon/container_operations.go:682:9: implicit function instantiation requires go1.18 or later (-lang was set to go1.16; check go.mod)
daemon/inspect.go:42:18: implicit function instantiation requires go1.18 or later (-lang was set to go1.16; check go.mod)
With this patch:
make -C ./internal/gocompat/
GO111MODULE=off go generate .
GO111MODULE=on go mod tidy
GO111MODULE=on go test -v
=== RUN TestModuleCompatibllity
main_test.go:321: all packages have the correct go version specified through //go:build
--- PASS: TestModuleCompatibllity (0.00s)
PASS
ok gocompat 0.031s
make: Leaving directory '/go/src/github.com/docker/docker/internal/gocompat'
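For illustration, the kind of directive the gocompat check looks for is
a '//go:build' constraint at the top of the affected file, for example
(a sketch; the exact version depends on the package):

    //go:build go1.18

    package sliceutil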
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
When resolving names in swarm mode, services with exposed ports are
connected to the user's overlay network, the ingress network, and the
local (docker_gwbridge) network. Name resolution should prioritize
returning the VIP/IPs on the user's overlay network over those on the
ingress and local networks.
Sandbox.ResolveName implemented this by taking the list of endpoints,
splitting it into 3 separate lists based on the type of network each
endpoint was attached to (dynamic, ingress, local), and then building a
new list by concatenating those lists in that order.
This patch refactors that logic to use a custom sorter (sort.Interface),
which makes the code more transparent, and prevents iterating over the
list of endpoints multiple times.
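A minimal sketch of such a sorter (type and field names invented for
illustration, not the actual moby code):

    package resolveorder

    import "sort"

    type endpointInfo struct {
        name string
        kind int // 0 = dynamic (user overlay), 1 = ingress, 2 = local
    }

    // byNetworkKind sorts endpoints so that user networks come first,
    // then ingress, then local (docker_gwbridge) networks.
    type byNetworkKind []endpointInfo

    func (s byNetworkKind) Len() int           { return len(s) }
    func (s byNetworkKind) Swap(i, j int)      { s[i], s[j] = s[j], s[i] }
    func (s byNetworkKind) Less(i, j int) bool { return s[i].kind < s[j].kind }

    // sort.Sort(byNetworkKind(endpoints)) then yields the endpoints in
    // resolution-priority order without building intermediate lists.
    var _ sort.Interface = byNetworkKind(nil)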
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Some configuration in a container depends on whether it has support for
IPv6 (including default entries for '::1' etc in '/etc/hosts').
Before this change, the container's support for IPv6 was determined by
whether it was connected to any IPv6-enabled networks. But, that can
change over time; it isn't a property of the container itself.
So, instead, detect IPv6 support by looking for '::1' on the container's
loopback interface. It will not be present if the kernel does not have
IPv6 support, or the user has disabled it in new namespaces by other
means.
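A sketch of that check, assuming it runs inside the container's network
namespace and uses the github.com/vishvananda/netlink package (not the
exact moby code):

    package ipv6check

    import (
        "net"

        "github.com/vishvananda/netlink"
    )

    // loopbackHasIPv6 reports whether '::1' is assigned to 'lo', used as
    // the signal that the container has working IPv6 support.
    func loopbackHasIPv6() (bool, error) {
        lo, err := netlink.LinkByName("lo")
        if err != nil {
            return false, err
        }
        addrs, err := netlink.AddrList(lo, netlink.FAMILY_V6)
        if err != nil {
            return false, err
        }
        for _, a := range addrs {
            if a.IP.Equal(net.IPv6loopback) {
                return true, nil
            }
        }
        return false, nil
    }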
Once IPv6 support has been determined for the container, its '/etc/hosts'
is re-generated accordingly.
The daemon no longer disables IPv6 on all interfaces during initialisation.
It now disables IPv6 only for interfaces that have not been assigned an
IPv6 address. (But, even if IPv6 is disabled for the container using the
sysctl 'net.ipv6.conf.all.disable_ipv6=1', interfaces connected to IPv6
networks still get IPv6 addresses that appear in the internal DNS. There's
more to do!)
Signed-off-by: Rob Murray <rob.murray@docker.com>
No more concept of "anonymous endpoints". The equivalent is now an
endpoint with no DNSNames set.
Some of the code removed by this commit was mutating the user-supplied
endpoint's Aliases to add the container's short ID to that list. In order to
preserve backward compatibility for the ContainerInspect endpoint, this
commit also takes care of adding that short ID (and the container
hostname) to `EndpointSettings.Aliases` before returning the response.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
The `(*Endpoint).rename()` method is changed to only mutate `ep.name`
and let a new method `(*Endpoint).UpdateDNSNames()` handle DNS updates.
As a consequence, the rollback code that was part of
`(*Endpoint).rename()` is now removed, and DNS updates are now
rolled back by `ContainerRename`.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Instead of special-casing anonymous endpoints, use the list of DNS names
associated to the endpoint.
`(*Endpoint).isAnonymous()` has no more uses, so let's delete it.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
This new property will be empty if the daemon was upgraded with
live-restore enabled. To avoid breaking DNS resolution for restored
containers, we need to populate dnsNames based on the endpoint's
myAliases and anonymous properties.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Instead of special-casing anonymous endpoints in libnetwork, let the
daemon specify what (non fully qualified) DNS names should be associated
to a container's endpoints.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
This new property is meant to replace myAliases and anonymous
properties.
The end goal is to get rid of both properties by letting the daemon
determine what (non fully qualified) DNS names should be associated to
each endpoint.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Calculate the IPv6 addresses needed on a bridge, then reconcile them
with the addresses on an existing bridge by deleting then adding as
required.
(Previously, required addresses were added one-by-one, then unwanted
addresses were removed. This meant the daemon failed to start if, for
example, an existing bridge had address '2000:db8::/64' and the config
was changed to '2000:db8::/80'.)
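The reconcile step boils down to a set difference, roughly like this
sketch (invented names, not the bridge driver's actual code):

    package bridgecfg

    import "net/netip"

    // reconcile deletes bridge addresses that are no longer wanted, then
    // adds the missing ones, so a prefix change (such as /64 -> /80 on
    // the same base address) is applied instead of failing.
    func reconcile(existing, desired []netip.Prefix, del, add func(netip.Prefix) error) error {
        want := make(map[netip.Prefix]bool, len(desired))
        for _, p := range desired {
            want[p] = true
        }
        have := make(map[netip.Prefix]bool, len(existing))
        for _, p := range existing {
            have[p] = true
            if !want[p] {
                if err := del(p); err != nil {
                    return err
                }
            }
        }
        for _, p := range desired {
            if !have[p] {
                if err := add(p); err != nil {
                    return err
                }
            }
        }
        return nil
    }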
IPv6 addresses are now calculated and applied in one go, so there's no
need for setupVerifyAndReconcile() to check the set of IPv6 addresses on
the bridge. And, it was guarded by !config.InhibitIPv4, which can't have
been right. So, its IPv6 parts have been removed, and IPv4 added to its name.
Link local addresses, the example given in the original ticket, are now
released when containers are stopped. Not releasing them meant that
when using an LL subnet on the default bridge, no container could be
started after a container was stopped (because the calculated address
could not be re-allocated). In non-default bridge networks using an
LL subnet, addresses leaked.
Linux always uses the standard 'fe80::/64' LL network. So, if a bridge
is configured with an LL subnet prefix that overlaps with it, a config
error is reported. Non-overlapping LL subnet prefixes are allowed.
Signed-off-by: Rob Murray <rob.murray@docker.com>
Inline the tortured logic for deciding when to skip updating the svc
records to give us a fighting chance at deciphering the logic behind the
logic and spotting logic bugs.
Update the service records synchronously. The only potential for issues
is if this change introduces deadlocks, which should be fixed by
restructuring the mutexes rather than papering over the issue with
sketchy hacks like deferring the operation to a goroutine.
Signed-off-by: Cory Snider <csnider@mirantis.com>
The logic to rename an endpoint includes code which would synchronize
the renamed service records to peers through the distributed datastore.
It would trigger the remote peers to pick up the rename by touching a
datastore object which remote peers would have subscribed to events on.
The code also asserts that the local peer is subscribed to updates on
the network associated with the endpoint, presumably as a proxy for
asserting that the remote peers would also be subscribed.
https://github.com/moby/libnetwork/pull/712
Libnetwork no longer has support for distributed datastores or
subscribing to datastore object updates, so this logic can be deleted.
Signed-off-by: Cory Snider <csnider@mirantis.com>
The github.com/containerd/containerd/log package was moved to a separate
module, which will also be used by upcoming (patch) releases of containerd.
This patch moves our own uses of the package to use the new module.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
InvalidParameter is now compatible with errdefs.InvalidParameter. Thus,
these errors will now return a 400 status code instead of a 500.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Embedded structs are part of the exported surface of a struct type.
Boxing a struct value into an interface value does not erase that;
any code could gain access to the embedded struct value with a simple
type assertion. The mutex is supposed to be a private implementation
detail, but *endpoint implements sync.Locker because the mutex is
embedded. Change the mutex to an unexported field so *endpoint no
longer spuriously implements the sync.Locker interface.
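For illustration, the change amounts to something like this (a sketch,
not the verbatim diff):

    // Before: the embedded mutex makes *endpoint satisfy sync.Locker.
    type endpoint struct {
        sync.Mutex
        // ...
    }

    // After: an unexported field keeps the lock an implementation detail.
    type endpoint struct {
        mu sync.Mutex
        // ...
    }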
Signed-off-by: Cory Snider <csnider@mirantis.com>
Basically every exported method which takes a libnetwork.Sandbox
argument asserts that the value's concrete type is *sandbox. Passing any
other implementation of the interface is a runtime error! This interface
is a footgun, and clearly not necessary. Export and use the concrete
type instead.
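Sketch of the pattern being removed (a simplified, self-contained
illustration rather than the real libnetwork code):

    package example

    import "fmt"

    type Sandbox interface{ ID() string }

    type sandbox struct{ id string }

    func (s *sandbox) ID() string { return s.id }

    // Every exported method effectively had to down-cast the interface,
    // turning any other implementation into a runtime error.
    func useSandbox(s Sandbox) error {
        sb, ok := s.(*sandbox)
        if !ok {
            return fmt.Errorf("unexpected Sandbox implementation %T", s)
        }
        _ = sb // proceed with the concrete *sandbox
        return nil
    }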
Signed-off-by: Cory Snider <csnider@mirantis.com>
Embedded structs are part of the exported surface of a struct type.
Boxing a struct value into an interface value does not erase that;
any code could gain access to the embedded struct value with a simple
type assertion. The mutex is supposed to be a private implementation
detail, but *controller implements sync.Locker because the mutex is
embedded.
c, _ := libnetwork.New()
c.(sync.Locker).Lock()
Change the mutex to an unexported field so *controller no longer
spuriously implements the sync.Locker interface.
Signed-off-by: Cory Snider <csnider@mirantis.com>
libnetwork/etchosts/etchosts_test.go:167:54: empty-lines: extra empty line at the end of a block (revive)
libnetwork/osl/route_linux.go:185:74: empty-lines: extra empty line at the start of a block (revive)
libnetwork/osl/sandbox_linux_test.go:323:36: empty-lines: extra empty line at the start of a block (revive)
libnetwork/bitseq/sequence.go:412:48: empty-lines: extra empty line at the start of a block (revive)
libnetwork/datastore/datastore_test.go:67:46: empty-lines: extra empty line at the end of a block (revive)
libnetwork/datastore/mock_store.go:34:60: empty-lines: extra empty line at the end of a block (revive)
libnetwork/iptables/firewalld.go:202:44: empty-lines: extra empty line at the end of a block (revive)
libnetwork/iptables/firewalld_test.go:76:36: empty-lines: extra empty line at the end of a block (revive)
libnetwork/iptables/iptables.go:256:67: empty-lines: extra empty line at the end of a block (revive)
libnetwork/iptables/iptables.go:303:128: empty-lines: extra empty line at the start of a block (revive)
libnetwork/networkdb/cluster.go:183:72: empty-lines: extra empty line at the end of a block (revive)
libnetwork/ipams/null/null_test.go:44:38: empty-lines: extra empty line at the end of a block (revive)
libnetwork/drivers/macvlan/macvlan_store.go:45:52: empty-lines: extra empty line at the end of a block (revive)
libnetwork/ipam/allocator_test.go:1058:39: empty-lines: extra empty line at the start of a block (revive)
libnetwork/drivers/bridge/port_mapping.go:88:111: empty-lines: extra empty line at the end of a block (revive)
libnetwork/drivers/bridge/link.go:26:90: empty-lines: extra empty line at the end of a block (revive)
libnetwork/drivers/bridge/setup_ipv6_test.go:17:34: empty-lines: extra empty line at the end of a block (revive)
libnetwork/drivers/bridge/setup_ip_tables.go:392:4: empty-lines: extra empty line at the start of a block (revive)
libnetwork/drivers/bridge/bridge.go:804:50: empty-lines: extra empty line at the start of a block (revive)
libnetwork/drivers/overlay/ov_serf.go:183:29: empty-lines: extra empty line at the start of a block (revive)
libnetwork/drivers/overlay/ov_utils.go:81:64: empty-lines: extra empty line at the end of a block (revive)
libnetwork/drivers/overlay/peerdb.go:172:67: empty-lines: extra empty line at the start of a block (revive)
libnetwork/drivers/overlay/peerdb.go:209:67: empty-lines: extra empty line at the start of a block (revive)
libnetwork/drivers/overlay/peerdb.go:344:89: empty-lines: extra empty line at the start of a block (revive)
libnetwork/drivers/overlay/peerdb.go:436:63: empty-lines: extra empty line at the start of a block (revive)
libnetwork/drivers/overlay/overlay.go:183:36: empty-lines: extra empty line at the start of a block (revive)
libnetwork/drivers/overlay/encryption.go:69:28: empty-lines: extra empty line at the end of a block (revive)
libnetwork/drivers/overlay/ov_network.go:563:81: empty-lines: extra empty line at the start of a block (revive)
libnetwork/default_gateway.go:32:43: empty-lines: extra empty line at the start of a block (revive)
libnetwork/errors_test.go:9:40: empty-lines: extra empty line at the start of a block (revive)
libnetwork/service_common.go:184:64: empty-lines: extra empty line at the end of a block (revive)
libnetwork/endpoint.go:161:55: empty-lines: extra empty line at the end of a block (revive)
libnetwork/store.go:320:33: empty-lines: extra empty line at the end of a block (revive)
libnetwork/store_linux_test.go:11:38: empty-lines: extra empty line at the end of a block (revive)
libnetwork/sandbox.go:571:36: empty-lines: extra empty line at the start of a block (revive)
libnetwork/service_common.go:317:246: empty-lines: extra empty line at the start of a block (revive)
libnetwork/endpoint.go:550:17: empty-lines: extra empty line at the end of a block (revive)
libnetwork/sandbox_dns_unix.go:213:106: empty-lines: extra empty line at the start of a block (revive)
libnetwork/controller.go:676:85: empty-lines: extra empty line at the end of a block (revive)
libnetwork/agent.go:876:60: empty-lines: extra empty line at the end of a block (revive)
libnetwork/resolver.go:324:69: empty-lines: extra empty line at the end of a block (revive)
libnetwork/network.go:1153:92: empty-lines: extra empty line at the end of a block (revive)
libnetwork/network.go:1955:67: empty-lines: extra empty line at the start of a block (revive)
libnetwork/network.go:2235:9: empty-lines: extra empty line at the start of a block (revive)
libnetwork/libnetwork_internal_test.go:336:26: empty-lines: extra empty line at the start of a block (revive)
libnetwork/resolver_test.go:76:35: empty-lines: extra empty line at the end of a block (revive)
libnetwork/libnetwork_test.go:303:38: empty-lines: extra empty line at the end of a block (revive)
libnetwork/libnetwork_test.go:985:46: empty-lines: extra empty line at the end of a block (revive)
libnetwork/ipam/allocator_test.go:1263:37: empty-lines: extra empty line at the start of a block (revive)
libnetwork/errors_test.go:9:40: empty-lines: extra empty line at the end of a block (revive)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The correct formatting for machine-readable comments is:
//<some alphanumeric identifier>:<options>[,<option>...][ // comment]
Which basically means:
- MUST NOT have a space before `<identifier>` (e.g. `nolint`)
- Identifier MUST be alphanumeric
- MUST be followed by a colon
- MUST be followed by at least one `<option>`
- Optionally additional `<options>` (comma-separated)
- Optionally followed by a comment
Any other format will not be considered a machine-readable comment by `gofmt`,
and will thus be formatted as a regular comment. Note that this also means that a
`//nolint` (without anything after it) is considered invalid, same for `//#nosec`
(starts with a `#`).
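For example:

    //nolint:errcheck // error deliberately ignored   <- machine-readable
    // nolint:errcheck                                 <- regular comment (leading space)
    //nolint                                           <- regular comment (no options)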
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The second (sandbox) argument was unused, and the function was only
called from a single location, so we may as well inline the
check.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
After moving libnetwork to this repo, we need to update all the import
paths for libnetwork to point to docker/docker/libnetwork instead of
docker/libnetwork.
This change implements that.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This is the heart of the scalability change for services in libnetwork.
The present routing mesh adds load-balancing rules for a network to
every container connected to the network. This newer approach creates a
load-balancing endpoint per network per node. For every service on a
network, libnetwork assigns the VIP of the service to the endpoint's
interface as an alias. This endpoint must have a unique IP address in
order to route return traffic to it. Traffic destined for a service's
VIP arrives at the load-balancing endpoint on the VIP and from there,
Linux load balances it among backend destinations while SNATing said
traffic to the endpoint's unique IP address.
The net result of this scheme is that each node in a swarm need only
have one set of load balancing state per service instead of one per
container on the node. This scheme is very similar to how services
currently operate on Windows nodes in libnetwork. It (as with Windows
nodes) costs the use of extra IP addresses in a network (one per node)
and an extra network hop in the stack, although that hop is always in
the network stack local to the container.
In order to prevent existing deployments from suddenly failing if they
failed to allocate sufficient address space to include per-node
load-balancing endpoint IP addresses, this patch preserves the existing
functionality and activates the new functionality on a per-network
basis depending on whether the network has a load-balancing endpoint.
Eventually, moby should always set this option when creating new
networks and should only omit it for networks created as part of a swarm
that are not marked to use endpoint load balancing.
This patch also normalizes the code to treat "load" and "balancer"
as two separate words for the purposes of variable/function naming.
This means that the 'b' in "balancer" must be capitalized.
Signed-off-by: Chris Telfer <ctelfer@docker.com>
The system should remove cluster service info, including networkDB
entries and DNS entries, for container endpoints whether or not they
are part of a service. This used to be
the normal sequence of operations but it moved to
sandbox.DisableService() in an effort to more gracefully handle endpoint
removal from a service (which proved insufficient). Unfortunately
subsequent changes also removed the newly-mandatory call to
sandbox.DisableService() preventing proper cleanup for non-service
container endpoints.
Signed-off-by: Chris Telfer <ctelfer@docker.com>
This patch attempts to allow endpoints to complete servicing connections
while being removed from a service. The change adds a flag to the
endpoint.deleteServiceInfoFromCluster() method to indicate whether this
removal should fully remove connectivity through the load balancer
to the endpoint or should just disable directing further connections to
the endpoint. If the flag is 'false', then the load balancer assigns
a weight of 0 to the endpoint but does not remove it as a linux load
balancing destination. It does remove the endpoint as a docker load
balancing endpoint but tracks it in a special map of "disabled-but-not-
destroyed" load balancing endpoints. This allows traffic to continue
flowing, at least under Linux. If the flag is 'true', then the code
removes the endpoint entirely as a load balancing destination.
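Schematically (invented names, just to summarize the flag's two paths):

    // removeBackend gates the two behaviours described above.
    func removeBackend(fullRemove bool, setWeightZero, deleteDestination func() error) error {
        if fullRemove {
            // Remove the endpoint entirely as a load-balancing destination.
            return deleteDestination()
        }
        // Keep the destination, but weight it to 0 so no new connections
        // are directed at it while existing ones drain.
        return setWeightZero()
    }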
The sandbox.DisableService() method invokes deleteServiceInfoFromCluster()
with the flag set to 'false', while the endpoint.sbLeave() method invokes
it with the flag set to 'true' to complete the removal on endpoint
finalization. Renaming the endpoint invokes deleteServiceInfoFromCluster()
with the flag set to 'true' because renaming attempts to completely
remove and then re-add each endpoint service entry.
The controller.rmServiceBinding() method, which carries out the operation,
similarly gets a new flag for whether to fully remove the endpoint. If
the flag is false, it does the job of moving the endpoint from the
load balancing set to the 'disabled' set. It then removes or
de-weights the entry in the OS load balancing table via
network.rmLBBackend(). It removes the service entirely via said method
ONLY IF there are no more live or disabled load balancing endpoints.
Similarly network.addLBBackend() requires slight tweaking to properly
manage the disabled set.
Finally, this change requires propagating the status of disabled
service endpoints via the networkDB. Accordingly, the patch includes
both code to generate and handle service update messages. It also
augments the service structure with a ServiceDisabled boolean to convey
whether an endpoint should ultimately be removed or just disabled.
This, naturally, required a rebuild of the protocol buffer code as well.
Signed-off-by: Chris Telfer <ctelfer@docker.com>