beenull/moby

Author	SHA1	Message	Date
Albin Kerouanton	80c44b4b2e	daemon: rename: don't reload endpoint from datastore Commit `8b7af1d0f` added some code to update the DNSNames of all endpoints attached to a sandbox by loading a new instance of each affected endpoint from the datastore through a call to `Network.EndpointByID()`. This method then calls `Network.getEndpointFromStore()`, that in turn calls `store.GetObject()`, which then calls `cache.get()`, which calls `o.CopyTo(kvObject)`. This effectively creates a fresh new instance of an Endpoint. However, endpoints are already kept in memory by Sandbox, meaning we now have two in-memory instances of the same Endpoint. As it turns out, libnetwork is built around the idea that no two objects representing the same thing should live in memory, otherwise breaking mutex locking and optimistic locking (as both instances will have a drifting version tracking ID -- dbIndex in libnetwork parlance). In this specific case, this bug materializes by container rename failing when applied a second time for a given container. An integration test is added to make sure this won't happen again. Signed-off-by: Albin Kerouanton <albinker@gmail.com>	2024-01-23 22:53:21 +01:00
Bjorn Neergaard	f20abbc96c	libnetwork: use conntrack and --ctstate for all rules On modern kernels this is an alias; however newer code has preferred ctstate while older code has preferred the deprecated 'state' name. Prefer the newer name for uniformity in the rules libnetwork creates, and because some implementations/distributions of the xtables userland tools may not support the legacy alias. Signed-off-by: Bjorn Neergaard <bjorn.neergaard@docker.com>	2023-10-13 00:56:30 -06:00
Sebastiaan van Stijn	cff4f20c44	migrate to github.com/containerd/log v0.1.0 The github.com/containerd/containerd/log package was moved to a separate module, which will also be used by upcoming (patch) releases of containerd. This patch moves our own uses of the package to use the new module. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2023-10-11 17:52:23 +02:00
Sebastiaan van Stijn	cc414a2012	libnetwork/osl: remove Sandbox.Info() "Pay no attention to the implementation behind the curtain!" There's only one implementation of the Sandbox interface, and only one implementation of the Info interface, and they both happen to be implemented by the same type: networkNamespace. Let's merge these interfaces. And now that we know that there's one, and only one Info, we can drop the charade, and relieve the Sandbox from its dual personality. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2023-08-20 19:26:39 +02:00
Sebastiaan van Stijn	64c6f72988	libnetwork: remove Network interface There's only one implementation; drop the interface and use the concrete type instead. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2023-07-22 11:56:41 +02:00
Sebastiaan van Stijn	dd5ea7e996	libnetwork: format code with gofumpt Formatting the code with https://github.com/mvdan/gofumpt Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2023-06-29 00:31:49 +02:00
Brian Goff	74da6a6363	Switch all logging to use containerd log pkg This unifies our logging and allows us to propagate logging and trace contexts together. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2023-06-24 00:23:44 +00:00
Cory Snider	c71555f030	libnetwork: return concrete-typed *Endpoint libnetwork.Endpoint is an interface with a single implementation. https://github.com/golang/go/wiki/CodeReviewComments#interfaces Signed-off-by: Cory Snider <csnider@mirantis.com>	2023-01-13 14:19:06 -05:00
Cory Snider	0e91d2e0e9	libnetwork: return concrete-typed Sandbox Basically every exported method which takes a libnetwork.Sandbox argument asserts that the value's concrete type is sandbox. Passing any other implementation of the interface is a runtime error! This interface is a footgun, and clearly not necessary. Export and use the concrete type instead. Signed-off-by: Cory Snider <csnider@mirantis.com>	2023-01-13 14:19:06 -05:00
Cory Snider	102090916e	libnetwork: addRedirectRules without reexec Signed-off-by: Cory Snider <csnider@mirantis.com>	2023-01-11 12:14:32 -05:00
Cory Snider	582dd705c1	libnetwork: fwmarker without reexec Signed-off-by: Cory Snider <csnider@mirantis.com>	2023-01-11 12:14:32 -05:00
Sebastiaan van Stijn	145817a9cf	libnetwork: use strconv instead of fmt.Sprintf() Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2022-10-08 17:41:39 +02:00
Ryan Barry	293cfd6c76	Ensure performance tuning is always applied Previously, with the patch from #43146, it was possible for a network configured with a single ingress or load balancer on a distribution which does not have the `ip_vs` kernel module loaded by default to try to apply sysctls which did not exist yet, and subsequently dynamically load the module as part of ipvs/netlink.go. This module is vendored, and not a great place to try to tie back into core libnetwork functionality, so also ensure that the sysctls (which are idempotent) are called after ingress/lb creation once `ipvs` has been initialized. Signed-off-by: Ryan Barry <rbarry@mirantis.com>	2022-05-31 11:47:30 -04:00
Sebastiaan van Stijn	301b252b58	libnetwork: don't use strings.Fields() to improve performance While looking at this code, I noticed that we were wasting quite some resources by first constructing a string, only to split it again (with `strings.Fields()`) into a string slice. Some conversions were also happening multiple times (int to string, IP-address to string, etc.) Setting up networking is known to be costing a considerable amount of time when starting containers, and while this may only be a small part of that, it doesn't hurt to save some resources (and readability of the code isn't significantly impacted). For example, benchmarking the `redirector()` code before/after: BenchmarkParseOld-4 137646 8398 ns/op 4192 B/op 75 allocs/op BenchmarkParseNew-4 629395 1762 ns/op 2362 B/op 24 allocs/op Average over 10 runs: benchstat old.txt new.txt name old time/op new time/op delta Parse-4 8.43µs ± 2% 1.79µs ± 3% -78.76% (p=0.000 n=9+8) name old alloc/op new alloc/op delta Parse-4 4.19kB ± 0% 2.36kB ± 0% -43.65% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Parse-4 75.0 ± 0% 24.0 ± 0% -68.00% (p=0.000 n=10+10) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2022-04-20 14:43:07 +02:00
Eng Zer Jun	c55a4ac779	refactor: move from io/ioutil to io and os package The io/ioutil package has been deprecated in Go 1.16. This commit replaces the existing io/ioutil functions with their new definitions in io and os packages. Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>	2021-08-27 14:56:57 +08:00
Brian Goff	116f200737	Fix gosec complaints in libnetwork These were purposefully ignored before but this goes ahead and "fixes" most of them. Note that none of the things gosec flagged are problematic, just quieting the linter here. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2021-06-25 18:02:03 +02:00
Brian Goff	4b981436fe	Fixup libnetwork lint errors Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2021-06-01 23:48:32 +00:00
Brian Goff	a0a473125b	Fix libnetwork imports After moving libnetwork to this repo, we need to update all the import paths for libnetwork to point to docker/docker/libnetwork instead of docker/libnetwork. This change implements that. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2021-06-01 21:51:23 +00:00
Sebastiaan van Stijn	fb9ecec127	Merge pull request #2585 from scottp-dpaw/lbendpoint_fix service_linux: Fix null dereference in findLBEndpointSandbox	2020-10-31 18:31:17 +01:00
Scott Percival	959dfca7e6	service_linux: Fix null dereference in findLBEndpointSandbox Signed-off-by: Scott Percival <scottp@lastyard.com>	2020-09-22 15:06:41 +08:00
Benjamin Böhmke	34f4706174	added TODOs for open IPv6 point Signed-off-by: Benjamin Böhmke <benjamin@boehmke.net>	2020-07-23 16:52:40 +02:00
Billy Ridgway	8dbb5b5a7d	Implement NAT IPv6 to fix the issue https://github.com/moby/moby/issues/25407 Signed-off-by: Billy Ridgway <wrridgwa@us.ibm.com> Signed-off-by: Benjamin Böhmke <benjamin@boehmke.net>	2020-07-19 16:16:51 +02:00
Brian Goff	a533fe7094	Use vendored ipvs package The ipvs package was moved to a separate repo. The ipvs package is a fairly generic set of helpers for managing IPVS. The ipvs package is used by docker swarm and kubernetes. Because we want to merge libnetwork back into the moby/moby codebase while also not creating more dependencies for other projects on moby/moby itself, it was decided that the best path for ipvs is to live on its own since there are no other ties to libnetwork. Ref: https://github.com/moby/libnetwork/issues/2522 Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2020-03-11 12:13:37 -07:00
Chris Telfer	013ca3bdf8	Make DSR an overlay-specific driver "option" Allow DSR to be a configurable option through a generic option to the overlay driver. On the one hand this approach makes sense insofar as only overlay networks can currently perform load balancing. On the other hand, this approach has several issues. First, should we create another type of swarm scope network, this will prevent it working. Second, the service core code is separate from the driver code and the driver code can't influence the core data structures. So the driver code can't set this option itself. Therefore, implementing in this way requires some hack code to test for this option in controller.NewNetwork. A more correct approach would be to make this a generic option for any network. Then the driver could ignore, reject or be unaware of the option depending on the chosen model. This would require changes to: * libnetwork - naturally * the docker API - to carry the option * swarmkit - to propagate the option * the docker CLI - to support the option * moby - to translate the API option into a libnetwork option Given the urgency of requests to address this issue, this approach will be saved for a future iteration. Signed-off-by: Chris Telfer <ctelfer@docker.com>	2018-10-11 14:13:19 -04:00
Chris Telfer	9a2464f436	Set east-west load balancing to use direct routing Modify the loadbalancing for east-west traffic to use direct routing rather than NAT and update tasks to use direct service return under linux. This avoids hiding the source address of the sender and improves the performance in single-client/single-server tests. Signed-off-by: Chris Telfer <ctelfer@docker.com>	2018-10-11 14:13:19 -04:00
fanjiyun	03ba96c5cf	Rolling back the port configs if failed to programIngress() Signed-off-by: fanjiyun <fan.jiyun@zte.com.cn>	2018-09-11 19:10:59 +08:00
Josh Soref	a06f1b2c4e	Spelling fixes * addresses * assigned * at least * attachments * auxiliary * available * cleanup * communicate * communications * configuration * connection * connectivity * destination * encountered * endpoint * example * existing * expansion * expected * external * forwarded * gateway * implementations * implemented * initialize * internally * loses * message * network * occurred * operational * origin * overlapping * reaper * redirector * release * representation * resolver * retrieve * returns * sanbdox * sequence * succesful * synchronizing * update * validates Signed-off-by: Josh Soref <jsoref@gmail.com>	2018-07-12 12:54:44 -07:00
Chris Telfer	06922d2d81	Use fmt precision to limit string length The previous code used string slices to limit the length of certain fields like endpoint or sandbox IDs. This assumes that these strings are at least as long as the slice length. Unfortunately, some sandbox IDs can be smaller than 7 characters. This fix addresses this issue by systematically converting format string calls that were taking fixed-slice arguments to use a precision specifier in the string format itself. From the golang fmt package documentation: For strings, byte slices and byte arrays, however, precision limits the length of the input to be formatted (not the size of the output), truncating if necessary. Normally it is measured in runes, but for these types when formatted with the %x or %X format it is measured in bytes. This nicely fits the desired behavior: it will limit the number of runes considered for string interpolation to the precision value. Signed-off-by: Chris Telfer <ctelfer@docker.com>	2018-07-05 17:44:04 -04:00
Chris Telfer	ac0aa6485b	Adjust warnings for transient LB endpoint conds Add debug and error logs to notify when a load balancing sandbox is not found. This can occur in normal operation during removal. Signed-off-by: Chris Telfer <ctelfer@docker.com>	2018-06-28 12:08:18 -04:00
Chris Telfer	ea2fa20859	Add endpoint load-balancing mode This is the heart of the scalability change for services in libnetwork. The present routing mesh adds load-balancing rules for a network to every container connected to the network. This newer approach creates a load-balancing endpoint per network per node. For every service on a network, libnetwork assigns the VIP of the service to the endpoint's interface as an alias. This endpoint must have a unique IP address in order to route return traffic to it. Traffic destined for a service's VIP arrives at the load-balancing endpoint on the VIP and from there, Linux load balances it among backend destinations while SNATing said traffic to the endpoint's unique IP address. The net result of this scheme is that each node in a swarm need only have one set of load balancing state per service instead of one per container on the node. This scheme is very similar to how services currently operate on Windows nodes in libnetwork. It (as with Windows nodes) costs the use of extra IP addresses in a network (one per node) and an extra network hop in the stack, although, always in the stack local to the container. In order to prevent existing deployments from suddenly failing if they failed to allocate sufficient address space to include per-node load-balancing endpoint IP addresses, this patch preserves the existing functionality and activates the new functionality on a per-network basis depending on whether the network has a load-balancing endpoint. Eventually, moby should always set this option when creating new networks and should only omit it for networks created as part of a swarm that are not marked to use endpoint load balancing. This patch also normalizes the code to treat "load" and "balancer" as two separate words from the perspectives of variable/function naming. This means that the 'b' in "balancer" must be capitalized. Signed-off-by: Chris Telfer <ctelfer@docker.com>	2018-06-28 12:08:18 -04:00
Chris Telfer	85a3483b4b	Refactor [add\|rm]LBBackend() to use lb struct This was passing extra information and adding confusion about the purpose of the load balancing structure. Signed-off-by: Chris Telfer <ctelfer@docker.com>	2018-06-28 12:08:18 -04:00
Flavio Crisciani	3d2b2f1c7e	Possible race on ingress programming Make sure that iptables operations on ingress are serialized. Before 2 racing routines trying to create the ingress chain were allowed and one was failing reporting the chain as already existing. The lock guarantees that this condition does not happen anymore Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>	2018-06-07 13:02:04 -07:00
Chris Telfer	7d7412f957	Gracefully remove LB endpoints from services This patch attempts to allow endpoints to complete servicing connections while being removed from a service. The change adds a flag to the endpoint.deleteServiceInfoFromCluster() method to indicate whether this removal should fully remove connectivity through the load balancer to the endpoint or should just disable directing further connections to the endpoint. If the flag is 'false', then the load balancer assigns a weight of 0 to the endpoint but does not remove it as a linux load balancing destination. It does remove the endpoint as a docker load balancing endpoint but tracks it in a special map of "disabled-but-not-destroyed" load balancing endpoints. This allows traffic to continue flowing, at least under Linux. If the flag is 'true', then the code removes the endpoint entirely as a load balancing destination. The sandbox.DisableService() method invokes deleteServiceInfoFromCluster() with the flag set to 'false', while the endpoint.sbLeave() method invokes it with the flag set to 'true' to complete the removal on endpoint finalization. Renaming the endpoint invokes deleteServiceInfoFromCluster() with the flag set to 'true' because renaming attempts to completely remove and then re-add each endpoint service entry. The controller.rmServiceBinding() method, which carries out the operation, similarly gets a new flag for whether to fully remove the endpoint. If the flag is false, it does the job of moving the endpoint from the load balancing set to the 'disabled' set. It then removes or de-weights the entry in the OS load balancing table via network.rmLBBackend(). It removes the service entirely via said method ONLY IF there are no more live or disabled load balancing endpoints. Similarly network.addLBBackend() requires slight tweaking to properly manage the disabled set. Finally, this change requires propagating the status of disabled service endpoints via the networkDB. Accordingly, the patch includes both code to generate and handle service update messages. It also augments the service structure with a ServiceDisabled boolean to convey whether an endpoint should ultimately be removed or just disabled. This, naturally, required a rebuild of the protocol buffer code as well. Signed-off-by: Chris Telfer <ctelfer@docker.com>	2018-03-16 15:19:49 -04:00
Wataru Ishida	2120ed2363	Support SCTP port mapping Signed-off-by: Wataru Ishida <ishida.wataru@lab.ntt.co.jp> Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>	2018-02-13 16:01:03 +09:00
Pradip Dhara	43360c627f	Enabling ILB/ELB on windows using per-node, per-network LB endpoint. Signed-off-by: Pradip Dhara <pradipd@microsoft.com>	2017-08-29 00:17:42 -07:00
Derek McGowan	710e0664c4	Update logrus to v1.0.1 Fix case sensitivity issue Update docker and runc vendors Signed-off-by: Derek McGowan <derek@mcgstyle.net>	2017-08-07 11:20:47 -07:00
Jacob Wen	5c01dcd401	iptables: jump to DOCKER-USER first Fixes #1827 Signed-off-by: Jacob Wen <jian.w.wen@oracle.com>	2017-07-20 16:38:14 +08:00
Flavio Crisciani	f969f26966	Service discovery race on serviceBindings delete. Bug on IP reuse (#1808 ) * Correct SetMatrix documentation The SetMatrix is a generic data structure, so the description should not be tied to any specific use Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com> * Service Discovery reuse name and serviceBindings deletion - Added logic to handle name reuse from different services - Moved the deletion from the serviceBindings map at the end of the rmServiceBindings body to avoid race with new services Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com> * Avoid race on network cleanup Use the locker to avoid the race between the network deletion and new endpoints being created Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com> * CleanupServiceBindings to clean the SD records Allow the cleanupServicebindings to take care of the service discovery cleanup. Also avoid to trigger the cleanup for each endpoint from an SD point of view LB and SD will be separated in the future Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com> * Addressed comments Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com> * NetworkDB deleteEntry has to happen If there is an error locally guarantee that the delete entry on network DB is still honored Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>	2017-06-18 05:25:58 -07:00
Flavio Crisciani	65860255c6	Fixed code issues Fixed issues highlighted by the new checks Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>	2017-06-12 11:31:35 -07:00
Flavio Crisciani	39d2204896	Service discovery logic rework changed the ipMap to SetMatrix to allow transient states Compacted the addSvc and deleteSvc into a one single method Updated the datastructure for backends to allow storing all the information needed to cleanup properly during the cleanupServiceBindings Removed the enable/disable Service logic that was racing with sbLeave/sbJoin logic Add some debug logs to track further race conditions Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>	2017-06-11 20:49:29 -07:00
Alessandro Boch	a02b4ef4a4	Fix service logs - do not error on duplicate service removal - give some context to service logs, this would help debugging related issues Signed-off-by: Alessandro Boch <aboch@docker.com>	2017-02-01 17:32:08 -08:00
Alessandro Boch	4e69afc4f3	Make virtual service programming more robust - Do not rely on software flags to decide when to create the virtual service. Instead query the kernel for presence. So that it cannot happen that a real server creation fails because the virtual server is missing. Signed-off-by: Alessandro Boch <aboch@docker.com>	2017-02-01 15:54:31 -08:00
Alessandro Boch	d565d5f2d2	Gracefully handle redundant ipvs service create failures Signed-off-by: Alessandro Boch <aboch@docker.com>	2017-01-31 16:34:53 -08:00
Alessandro Boch	66197b7787	Fix incorrect error log message - Failed to _add_ firewall mark... should be _delete_ Signed-off-by: Alessandro Boch <aboch@docker.com>	2017-01-23 16:29:03 -08:00
Alessandro Boch	fac86cf69a	Add missing locks in agent and service code Signed-off-by: Alessandro Boch <aboch@docker.com>	2016-11-29 13:58:06 -08:00
Madhu Venugopal	684ea92515	Add a ICMP reply rule for service VIP Ping on VIP has been behaving inconsistently depending on if a task for a service is local or remote. With this fix, the ICMP echo-request packets to service VIP are replied to by the NAT rule to self Signed-off-by: Madhu Venugopal <madhu@docker.com>	2016-11-21 08:57:40 -08:00
Madhu Venugopal	b6540296b0	Revert "Enable ping for service vip address" This reverts commit `ddc74ffced`. Signed-off-by: Madhu Venugopal <madhu@docker.com>	2016-11-21 03:30:27 -08:00
Madhu Venugopal	d1b012d97a	Windows overlay driver support 1. Base work was done by msabansal and nwoodmsft from : https://github.com/msabansal/docker/tree/overlay 2. reorganized under drivers/windows/overlay and rebased to libnetwork master 3. Porting overlay common fixes to windows driver * `46f525c` * `ba8714e` * `6368406` 4. Windows Service Discovery changes for swarm-mode 5. renaming default windows ipam drivers as "windows" Signed-off-by: Madhu Venugopal <madhu@docker.com> Signed-off-by: msabansal <sabansal@microsoft.com> Signed-off-by: nwoodmsft <Nicholas.Wood@microsoft.com>	2016-11-03 16:50:04 -07:00
Jana Radhakrishnan	b1e753137f	Merge pull request #1501 from sanimej/vip Enable ping for service vip address	2016-11-02 09:45:14 -07:00
Alessandro Boch	a21d577b8b	Block non exposed port traffic on ingress nw interfaces Signed-off-by: Alessandro Boch <aboch@docker.com>	2016-10-27 20:28:08 -07:00


78 commits
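Commit 06922d2d81 replaces fixed string slices of IDs with a precision specifier in the format string. A minimal Go sketch of the difference follows; `shortID` and `sliceID` are hypothetical helpers used only for illustration, and the 7-character length matches the commit's example. Slicing with `id[:7]` panics when the ID is shorter than 7 bytes, while `%.7s` formats at most 7 runes and leaves shorter strings unchanged.

```go
package main

import "fmt"

// shortID formats at most the first 7 runes of id using a precision
// specifier; shorter IDs are printed unchanged.
func shortID(id string) string {
	return fmt.Sprintf("%.7s", id)
}

// sliceID mimics the old fixed-slice approach. id[:7] panics at runtime
// when id is shorter than 7 bytes, so the panic is recovered and returned
// as an error for demonstration purposes.
func sliceID(id string) (s string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("slicing failed: %v", r)
		}
	}()
	return id[:7], nil
}

func main() {
	fmt.Println(shortID("3f6a1c2b9d8e7f00")) // 3f6a1c2
	fmt.Println(shortID("abc"))              // abc

	if _, err := sliceID("abc"); err != nil {
		fmt.Println(err) // reports the out-of-range slice
	}
}
```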