Commit graph

78 commits

Author SHA1 Message Date
Albin Kerouanton
80c44b4b2e daemon: rename: don't reload endpoint from datastore
Commit 8b7af1d0f added some code to update the DNSNames of all
endpoints attached to a sandbox by loading a new instance of each
affected endpoints from the datastore through a call to
`Network.EndpointByID()`.

This method then calls `Network.getEndpointFromStore()`, that in
turn calls `store.GetObject()`, which then calls `cache.get()`,
which calls `o.CopyTo(kvObject)`. This effectively creates a fresh
new instance of an Endpoint. However, endpoints are already kept in
memory by Sandbox, meaning we now have two in-memory instances of
the same Endpoint.

As it turns out, libnetwork is built around the idea that no two objects
representing the same thing should leave in-memory, otherwise breaking
mutex locking and optimistic locking (as both instances will have a drifting
version tracking ID -- dbIndex in libnetwork parliance).

In this specific case, this bug materializes by container rename failing
when applied a second time for a given container. An integration test is
added to make sure this won't happen again.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2024-01-23 22:53:21 +01:00
Bjorn Neergaard
f20abbc96c
libnetwork: use conntrack and --ctstate for all rules
On modern kernels this is an alias; however newer code has preferred
ctstate while older code has preferred the deprecated 'state' name.

Prefer the newer name for uniformity in the rules libnetwork creates,
and because some implementations/distributions of the xtables userland
tools may not support the legacy alias.

Signed-off-by: Bjorn Neergaard <bjorn.neergaard@docker.com>
2023-10-13 00:56:30 -06:00
Sebastiaan van Stijn
cff4f20c44
migrate to github.com/containerd/log v0.1.0
The github.com/containerd/containerd/log package was moved to a separate
module, which will also be used by upcoming (patch) releases of containerd.

This patch moves our own uses of the package to use the new module.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-10-11 17:52:23 +02:00
Sebastiaan van Stijn
cc414a2012
libnetwork/osl: remove Sandbox.Info()
"Pay no attention to the implementation behind the curtain!"

There's only one implementation of the Sandbox interface, and only one implementation
of the Info interface, and they both happens to be implemented by the same type:
networkNamespace. Let's merge these interfaces.

And now that we know that there's one, and only one Info, we can drop the charade,
and relieve the Sandbox from its dual personality.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-08-20 19:26:39 +02:00
Sebastiaan van Stijn
64c6f72988
libnetwork: remove Network interface
There's only one implementation; drop the interface and use the
concrete type instead.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-07-22 11:56:41 +02:00
Sebastiaan van Stijn
dd5ea7e996
libnetwork: format code with gofumpt
Formatting the code with https://github.com/mvdan/gofumpt

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-29 00:31:49 +02:00
Brian Goff
74da6a6363 Switch all logging to use containerd log pkg
This unifies our logging and allows us to propagate logging and trace
contexts together.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2023-06-24 00:23:44 +00:00
Cory Snider
c71555f030 libnetwork: return concrete-typed *Endpoint
libnetwork.Endpoint is an interface with a single implementation.

https://github.com/golang/go/wiki/CodeReviewComments#interfaces

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-13 14:19:06 -05:00
Cory Snider
0e91d2e0e9 libnetwork: return concrete-typed *Sandbox
Basically every exported method which takes a libnetwork.Sandbox
argument asserts that the value's concrete type is *sandbox. Passing any
other implementation of the interface is a runtime error! This interface
is a footgun, and clearly not necessary. Export and use the concrete
type instead.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-13 14:19:06 -05:00
Cory Snider
102090916e libnetwork: addRedirectRules without reexec
Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-11 12:14:32 -05:00
Cory Snider
582dd705c1 libnetwork: fwmarker without reexec
Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-11 12:14:32 -05:00
Sebastiaan van Stijn
145817a9cf
libnetwork: use strconv instead of fmt.Sprintf()
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-10-08 17:41:39 +02:00
Ryan Barry
293cfd6c76 Ensure performance tuning is always applied
Previously, with the patch from #43146, it was possible for a
network configured with a single ingress or load balancer on a
distribution which does not have the `ip_vs` kernel module loaded
by default to try to apply sysctls which did not exist yet, and
subsequently dynamically load the module as part of ipvs/netlink.go.

This module is vendored, and not a great place to try to tie back
into core libnetwork functionality, so also ensure that the sysctls
(which are idempotent) are called after ingress/lb creation once
`ipvs` has been initialized.

Signed-off-by: Ryan Barry <rbarry@mirantis.com>
2022-05-31 11:47:30 -04:00
Sebastiaan van Stijn
301b252b58
libnetwork: don't use strings.Fields() to improve performance
While looking at this code, I noticed that we were wasting quite some resources
by first constructing a string, only to split it again (with `strings.Fields()`)
into a string slice.

Some conversions were also happening multiple times (int to string, IP-address to
string, etc.)

Setting up networking is known to be costing a considerable amount of time when
starting containers, and while this may only be a small part of that, it doesn't
hurt to save some resources (and readability of the code isn't significantly
impacted).

For example, benchmarking the `redirector()` code before/after:

    BenchmarkParseOld-4   	  137646	      8398 ns/op	    4192 B/op	      75 allocs/op
    BenchmarkParseNew-4   	  629395	      1762 ns/op	    2362 B/op	      24 allocs/op

Average over 10 runs:

    benchstat old.txt new.txt

    name     old time/op    new time/op    delta
    Parse-4    8.43µs ± 2%    1.79µs ± 3%  -78.76%  (p=0.000 n=9+8)

    name     old alloc/op   new alloc/op   delta
    Parse-4    4.19kB ± 0%    2.36kB ± 0%  -43.65%  (p=0.000 n=10+10)

    name     old allocs/op  new allocs/op  delta
    Parse-4      75.0 ± 0%      24.0 ± 0%  -68.00%  (p=0.000 n=10+10)

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-04-20 14:43:07 +02:00
Eng Zer Jun
c55a4ac779
refactor: move from io/ioutil to io and os package
The io/ioutil package has been deprecated in Go 1.16. This commit
replaces the existing io/ioutil functions with their new definitions in
io and os packages.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2021-08-27 14:56:57 +08:00
Brian Goff
116f200737
Fix gosec complaints in libnetwork
These were purposefully ignored before but this goes ahead and "fixes"
most of them.
Note that none of the things gosec flagged are problematic, just
quieting the linter here.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2021-06-25 18:02:03 +02:00
Brian Goff
4b981436fe Fixup libnetwork lint errors
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2021-06-01 23:48:32 +00:00
Brian Goff
a0a473125b Fix libnetwork imports
After moving libnetwork to this repo, we need to update all the import
paths for libnetwork to point to docker/docker/libnetwork instead of
docker/libnetwork.
This change implements that.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2021-06-01 21:51:23 +00:00
Sebastiaan van Stijn
fb9ecec127 Merge pull request #2585 from scottp-dpaw/lbendpoint_fix
service_linux: Fix null dereference in findLBEndpointSandbox
2020-10-31 18:31:17 +01:00
Scott Percival
959dfca7e6 service_linux: Fix null dereference in findLBEndpointSandbox
Signed-off-by: Scott Percival <scottp@lastyard.com>
2020-09-22 15:06:41 +08:00
Benjamin Böhmke
34f4706174 added TODOs for open IPv6 point
Signed-off-by: Benjamin Böhmke <benjamin@boehmke.net>
2020-07-23 16:52:40 +02:00
Billy Ridgway
8dbb5b5a7d Implement NAT IPv6 to fix the issue https://github.com/moby/moby/issues/25407
Signed-off-by: Billy Ridgway <wrridgwa@us.ibm.com>
Signed-off-by: Benjamin Böhmke <benjamin@boehmke.net>
2020-07-19 16:16:51 +02:00
Brian Goff
a533fe7094 Use vendored ipvs package
The ipvs package was moved to a separate repo.

The ipvs package is a fairly generic set of helpers for managing IPVS.
The ipvs package is used by docker swarm and kubernetes.
Because we want to merge libnetwork back into the moby/moby codebase
while also not creating more dependencies for other projects on
moby/moby itself, it was decided that the best path for ipvs is to live
on it's own since there are no other ties to libnetwork.

Ref: https://github.com/moby/libnetwork/issues/2522

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-03-11 12:13:37 -07:00
Chris Telfer
013ca3bdf8 Make DSR an overlay-specific driver "option"
Allow DSR to be a configurable option through a generic option to the
overlay driver.  On the one hand this approach makes sense insofar as
only overlay networks can currently perform load balancing.  On the
other hand, this approach has several issues.  First, should we create
another type of swarm scope network, this will prevent it working.
Second, the service core code is separate from the driver code and the
driver code can't influence the core data structures.  So the driver
code can't set this option itself.  Therefore, implementing in this way
requires some hack code to test for this option in
controller.NewNetwork.

A more correct approach would be to make this a generic option for any
network.  Then the driver could ignore, reject or be unaware of the option
depending on the chosen model.  This would require changes to:
  * libnetwork - naturally
  * the docker API - to carry the option
  * swarmkit - to propagate the option
  * the docker CLI - to support the option
  * moby - to translate the API option into a libnetwork option
Given the urgency of requests to address this issue, this approach will
be saved for a future iteration.

Signed-off-by: Chris Telfer <ctelfer@docker.com>
2018-10-11 14:13:19 -04:00
Chris Telfer
9a2464f436 Set east-west load balancing to use direct routing
Modify the loadbalancing for east-west traffic to use direct routing
rather than NAT and update tasks to use direct service return under
linux.  This avoids hiding the source address of the sender and improves
the performance in single-client/single-server tests.

Signed-off-by: Chris Telfer <ctelfer@docker.com>
2018-10-11 14:13:19 -04:00
fanjiyun
03ba96c5cf Rolling back the port configs if failed to programIngress()
Signed-off-by: fanjiyun <fan.jiyun@zte.com.cn>
2018-09-11 19:10:59 +08:00
Josh Soref
a06f1b2c4e Spelling fixes
* addresses
* assigned
* at least
* attachments
* auxiliary
* available
* cleanup
* communicate
* communications
* configuration
* connection
* connectivity
* destination
* encountered
* endpoint
* example
* existing
* expansion
* expected
* external
* forwarded
* gateway
* implementations
* implemented
* initialize
* internally
* loses
* message
* network
* occurred
* operational
* origin
* overlapping
* reaper
* redirector
* release
* representation
* resolver
* retrieve
* returns
* sanbdox
* sequence
* succesful
* synchronizing
* update
* validates

Signed-off-by: Josh Soref <jsoref@gmail.com>
2018-07-12 12:54:44 -07:00
Chris Telfer
06922d2d81 Use fmt precision to limit string length
The previous code used string slices to limit the length of certain
fields like endpoint or sandbox IDs.  This assumes that these strings
are at least as long as the slice length.  Unfortunately, some sandbox
IDs can be smaller than 7 characters.   This fix addresses this issue
by systematically converting format string calls that were taking
fixed-slice arguments to use a precision specifier in the string format
itself.  From the golang fmt package documentation:

    For strings, byte slices and byte arrays, however, precision limits
    the length of the input to be formatted (not the size of the output),
    truncating if necessary. Normally it is measured in runes, but for
    these types when formatted with the %x or %X format it is measured
    in bytes.

This nicely fits the desired behavior: it will limit the number of
runes considered for string interpolation to the precision value.

Signed-off-by: Chris Telfer <ctelfer@docker.com>
2018-07-05 17:44:04 -04:00
Chris Telfer
ac0aa6485b Adjust warnings for transient LB endpoint conds
Add debug and error logs to notify when a load balancing sandbox
is not found.  This can occur in normal operation during removal.

Signed-off-by: Chris Telfer <ctelfer@docker.com>
2018-06-28 12:08:18 -04:00
Chris Telfer
ea2fa20859 Add endpoint load-balancing mode
This is the heart of the scalability change for services in libnetwork.
The present routing mesh adds load-balancing rules for a network to
every container connected to the network.  This newer approach creates a
load-balancing endpoint per network per node.  For every service on a
network, libnetwork assigns the VIP of the service to the endpoint's
interface as an alias.  This endpoint must have a unique IP address in
order to route return traffic to it.  Traffic destined for a service's
VIP arrives at the load-balancing endpoint on the VIP and from there,
Linux load balances it among backend destinations while SNATing said
traffic to the endpoint's unique IP address.

The net result of this scheme is that each node in a swarm need only
have one set of load balancing state per service instead of one per
container on the node.  This scheme is very similar to how services
currently operate on Windows nodes in libnetwork.  It (as with Windows
nodes) costs the use of extra IP addresses in a network (one per node)
and an extra network hop in the stack, although, always in the stack
local to the container.

In order to prevent existing deployments from suddenly failing if they
failed to allocate sufficient address space to include per-node
load-balancing endpoint IP addresses, this patch preserves the existing
functionality and activates the new functionality on a per-network
basis depending on whether the network has a load-balancing endpoint.
Eventually, moby should always set this option when creating new
networks and should only omit it for networks created as part of a swarm
that are not marked to use endpoint load balancing.

This patch also normalizes the code to treat "load" and "balancer"
as two separate words from the perspectives of variable/function naming.
This means that the 'b' in "balancer" must be capitalized.

Signed-off-by: Chris Telfer <ctelfer@docker.com>
2018-06-28 12:08:18 -04:00
Chris Telfer
85a3483b4b Refactor [add|rm]LBBackend() to use lb struct
This was passing extra information and adding confusion about the
purpose of the load balancing structure.

Signed-off-by: Chris Telfer <ctelfer@docker.com>
2018-06-28 12:08:18 -04:00
Flavio Crisciani
3d2b2f1c7e Possible race on ingress programming
Make sure that iptables operations on ingress
are serialized.
Before 2 racing routines trying to create the ingress chain
were allowed and one was failing reporting the chain as
already existing.
The lock guarantees that this condition does not happen anymore

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2018-06-07 13:02:04 -07:00
Chris Telfer
7d7412f957 Gracefully remove LB endpoints from services
This patch attempts to allow endpoints to complete servicing connections
while being removed from a service.  The change adds a flag to the
endpoint.deleteServiceInfoFromCluster() method to indicate whether this
removal should fully remove connectivity through the load balancer
to the endpoint or should just disable directing further connections to
the endpoint.  If the flag is 'false', then the load balancer assigns
a weight of 0 to the endpoint but does not remove it as a linux load
balancing destination.  It does remove the endpoint as a docker load
balancing endpoint but tracks it in a special map of "disabled-but-not-
destroyed" load balancing endpoints.  This allows traffic to continue
flowing, at least under Linux.  If the flag is 'true', then the code
removes the endpoint entirely as a load balancing destination.

The sandbox.DisableService() method invokes deleteServiceInfoFromCluster()
with the flag sent to 'false', while the endpoint.sbLeave() method invokes
it with the flag set to 'true' to complete the removal on endpoint
finalization.  Renaming the endpoint invokes deleteServiceInfoFromCluster()
with the flag set to 'true' because renaming attempts to completely
remove and then re-add each endpoint service entry.

The controller.rmServiceBinding() method, which carries out the operation,
similarly gets a new flag for whether to fully remove the endpoint.  If
the flag is false, it does the job of moving the endpoint from the
load balancing set to the 'disabled' set.  It then removes or
de-weights the entry in the OS load balancing table via
network.rmLBBackend().  It removes the service entirely via said method
ONLY IF there are no more live or disabled load balancing endpoints.
Similarly network.addLBBackend() requires slight tweaking to properly
manage the disabled set.

Finally, this change requires propagating the status of disabled
service endpoints via the networkDB.  Accordingly, the patch includes
both code to generate and handle service update messages.  It also
augments the service structure with a ServiceDisabled boolean to convey
whether an endpoint should ultimately be removed or just disabled.
This, naturally, required a rebuild of the protocol buffer code as well.

Signed-off-by: Chris Telfer <ctelfer@docker.com>
2018-03-16 15:19:49 -04:00
Wataru Ishida
2120ed2363 Support SCTP port mapping
Signed-off-by: Wataru Ishida <ishida.wataru@lab.ntt.co.jp>
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
2018-02-13 16:01:03 +09:00
Pradip Dhara
43360c627f Enabling ILB/ELB on windows using per-node, per-network LB endpoint.
Signed-off-by: Pradip Dhara <pradipd@microsoft.com>
2017-08-29 00:17:42 -07:00
Derek McGowan
710e0664c4 Update logrus to v1.0.1
Fix case sensitivity issue
Update docker and runc vendors

Signed-off-by: Derek McGowan <derek@mcgstyle.net>
2017-08-07 11:20:47 -07:00
Jacob Wen
5c01dcd401 iptables: jump to DOCKER-USER first
Fixes #1827

Signed-off-by: Jacob Wen <jian.w.wen@oracle.com>
2017-07-20 16:38:14 +08:00
Flavio Crisciani
f969f26966 Service discovery race on serviceBindings delete. Bug on IP reuse (#1808)
* Correct SetMatrix documentation

The SetMatrix is a generic data structure, so the description
should not be tight to any specific use

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>

* Service Discovery reuse name and serviceBindings deletion

- Added logic to handle name reuse from different services
- Moved the deletion from the serviceBindings map at the end
  of the rmServiceBindings body to avoid race with new services

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>

* Avoid race on network cleanup

Use the locker to avoid the race between the network
deletion and new endpoints being created

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>

* CleanupServiceBindings to clean the SD records

Allow the cleanupServicebindings to take care of the service discovery
cleanup. Also avoid to trigger the cleanup for each endpoint from an SD
point of view
LB and SD will be separated in the future

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>

* Addressed comments

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>

* NetworkDB deleteEntry has to happen

If there is an error locally guarantee that the delete entry
on network DB is still honored

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-06-18 05:25:58 -07:00
Flavio Crisciani
65860255c6 Fixed code issues
Fixed issues highlighted by the new checks

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-06-12 11:31:35 -07:00
Flavio Crisciani
39d2204896 Service discovery logic rework
changed the ipMap to SetMatrix to allow transient states
Compacted the addSvc and deleteSvc into a one single method
Updated the datastructure for backends to allow storing all the information needed
to cleanup properly during the cleanupServiceBindings
Removed the enable/disable Service logic that was racing with sbLeave/sbJoin logic
Add some debug logs to track further race conditions

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-06-11 20:49:29 -07:00
Alessandro Boch
a02b4ef4a4 Fix service logs
- do not error on duplicate service removal
- give some context to service logs,
  this would help debugging related issues

Signed-off-by: Alessandro Boch <aboch@docker.com>
2017-02-01 17:32:08 -08:00
Alessandro Boch
4e69afc4f3 Make virtual service programming more robust
- Do not relay on software flags to decide when to create the
   virtual service. Instead query the kernel for presence.
   So that it cannot happen that a real server creation
   fails because the virtual server is missing.

Signed-off-by: Alessandro Boch <aboch@docker.com>
2017-02-01 15:54:31 -08:00
Alessandro Boch
d565d5f2d2 Gracefully handle redundant ipvs service create failures
Signed-off-by: Alessandro Boch <aboch@docker.com>
2017-01-31 16:34:53 -08:00
Alessandro Boch
66197b7787 Fix incorrect error log message
- Failed to _add_ firewall mark... should be _delete_

Signed-off-by: Alessandro Boch <aboch@docker.com>
2017-01-23 16:29:03 -08:00
Alessandro Boch
fac86cf69a Add missing locks in agent and service code
Signed-off-by: Alessandro Boch <aboch@docker.com>
2016-11-29 13:58:06 -08:00
Madhu Venugopal
684ea92515 Add a ICMP reply rule for service VIP
Ping on VIP has been behaving inconsistently depending on if a task
for a service is local or remote.

With this fix, the ICMP echo-request packets to service VIP are replied
to by the NAT rule to self

Signed-off-by: Madhu Venugopal <madhu@docker.com>
2016-11-21 08:57:40 -08:00
Madhu Venugopal
b6540296b0 Revert "Enable ping for service vip address"
This reverts commit ddc74ffced.

Signed-off-by: Madhu Venugopal <madhu@docker.com>
2016-11-21 03:30:27 -08:00
Madhu Venugopal
d1b012d97a Windows overlay driver support
1. Base work was done by msabansal and nwoodmsft
   from : https://github.com/msabansal/docker/tree/overlay
2. reorganized under drivers/windows/overlay and rebased to
   libnetwork master
3. Porting overlay common fixes to windows driver
    * 46f525c
    * ba8714e
    * 6368406
4. Windows Service Discovery changes for swarm-mode
5. renaming default windows ipam drivers as "windows"

Signed-off-by: Madhu Venugopal <madhu@docker.com>
Signed-off-by: msabansal <sabansal@microsoft.com>
Signed-off-by: nwoodmsft <Nicholas.Wood@microsoft.com>
2016-11-03 16:50:04 -07:00
Jana Radhakrishnan
b1e753137f Merge pull request #1501 from sanimej/vip
Enable ping for service vip address
2016-11-02 09:45:14 -07:00
Alessandro Boch
a21d577b8b Block non exposed port traffic on ingress nw interfaces
Signed-off-by: Alessandro Boch <aboch@docker.com>
2016-10-27 20:28:08 -07:00