When userland-proxy is turned off and on again, the iptables nat rule
doing hairpinning isn't properly removed. This fix makes sure this nat
rule is removed whenever the bridge is torn down or hairpinning is
disabled (through setting userland-proxy to true).
Unlike for ip masquerading and ICC, the `programChainRule()` call
setting up the "MASQ LOCAL HOST" rule has to be called unconditionally
because the hairpin parameter isn't restored from the driver store, but
always comes from the driver config.
For the "SKIP DNAT" rule, things are a bit different: this rule is
always deleted by `removeIPChains()` when the bridge driver is
initialized.
Fixes#44721.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
(cherry picked from commit 566a2e4)
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Conntrack entries are created for UDP flows even if there's nowhere to
route these packets (ie. no listening socket and no NAT rules to
apply). Moreover, iptables NAT rules are evaluated by netfilter only
when creating a new conntrack entry.
When Docker adds NAT rules, netfilter will ignore them for any packet
matching a pre-existing conntrack entry. In such case, when
dockerd runs with userland proxy enabled, packets got routed to it and
the main symptom will be bad source IP address (as shown by #44688).
If the publishing container is run through Docker Swarm or in
"standalone" Docker but with no userland proxy, affected packets will
be dropped (eg. routed to nowhere).
As such, Docker needs to flush all conntrack entries for published UDP
ports to make sure NAT rules are correctly applied to all packets.
- Fixes#44688
- Fixes#8795
- Fixes#16720
- Fixes#7540
- Fixesmoby/libnetwork#2423
- and probably more.
As a precautionary measure, those conntrack entries are also flushed
when revoking external connectivity to avoid those entries to be reused
when a new sandbox is created (although the kernel should already
prevent such case).
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
(cherry picked from commit b37d34307d)
Signed-off-by: Cory Snider <csnider@mirantis.com>
iptables -C flag was introduced in v1.4.11, which was released ten
years ago. Thus, there're no more Linux distributions supported by
Docker using this version. As such, this commit removes the old way of
checking if an iptables rule exists (by using substring matching).
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
(cherry picked from commit 799cc143c9)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The former was doing some checks and logging warnings, whereas
the latter was doing the same checks but to set some internal variables.
As both are called only once and from the same place, there're now
merged together.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
(cherry picked from commit 205e5278c6)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
iptables package has a function `detectIptables()` called to initialize
some local variables. Since v20.10.0, it first looks for iptables bin,
then ip6tables and finally it checks what iptables flags are available
(including -C). It early exits when ip6tables isn't available, and
doesn't execute the last check.
To remove port mappings (eg. when a container stops/dies), Docker
first checks if those NAT rules exist and then deletes them. However, in
the particular case where there's no ip6tables bin available, iptables
`-C` flag is considered unavailable and thus it looks for NAT rules by
using some substring matching. This substring matching then fails
because `iptables -t nat -S POSTROUTING` dumps rules in a slighly format
than what's expected.
For instance, here's what `iptables -t nat -S POSTROUTING` dumps:
```
-A POSTROUTING -s 172.18.0.2/32 -d 172.18.0.2/32 -p tcp -m tcp --dport 9999 -j MASQUERADE
```
And here's what Docker looks for:
```
POSTROUTING -p tcp -s 172.18.0.2 -d 172.18.0.2 --dport 9999 -j MASQUERADE
```
Because of that, those rules are considered non-existant by Docker and
thus never deleted. To fix that, this change reorders the code in
`detectIptables()`.
Fixes#42127.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
(cherry picked from commit af7236f85a)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- use local variables for chains instead of sharing global variables
- make createNewChain a t.Helper
Signed-off-by: Chee Hau Lim <ch33hau@gmail.com>
(cherry picked from commit a2cea992c2)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
These functions were used in 63a7ccdd23, which was
part of Docker v1.5.0 and v1.6.0, but removed in Docker v1.7.0 when the network
stack was replaced with libnetwork in d18919e304.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 49de15cdcc)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Remove the "deadcode", "structcheck", and "varcheck" linters, as they are
deprecated:
WARN [runner] The linter 'deadcode' is deprecated (since v1.49.0) due to: The owner seems to have abandoned the linter. Replaced by unused.
WARN [runner] The linter 'structcheck' is deprecated (since v1.49.0) due to: The owner seems to have abandoned the linter. Replaced by unused.
WARN [runner] The linter 'varcheck' is deprecated (since v1.49.0) due to: The owner seems to have abandoned the linter. Replaced by unused.
WARN [linters context] structcheck is disabled because of generics. You can track the evolution of the generics support by following the https://github.com/golangci/golangci-lint/issues/2649.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 2f1c382a6d)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
After discussing in the maintainers meeting, we concluded that Slowloris attacks
are not a real risk other than potentially having some additional goroutines
lingering around, so setting a long timeout to satisfy the linter, and to at
least have "some" timeout.
libnetwork/diagnostic/server.go:96:10: G112: Potential Slowloris Attack because ReadHeaderTimeout is not configured in the http.Server (gosec)
srv := &http.Server{
Addr: net.JoinHostPort(ip, strconv.Itoa(port)),
Handler: s,
}
api/server/server.go:60:10: G112: Potential Slowloris Attack because ReadHeaderTimeout is not configured in the http.Server (gosec)
srv: &http.Server{
Addr: addr,
},
daemon/metrics_unix.go:34:13: G114: Use of net/http serve function that has no support for setting timeouts (gosec)
if err := http.Serve(l, mux); err != nil && !strings.Contains(err.Error(), "use of closed network connection") {
^
cmd/dockerd/metrics.go:27:13: G114: Use of net/http serve function that has no support for setting timeouts (gosec)
if err := http.Serve(l, mux); err != nil && !strings.Contains(err.Error(), "use of closed network connection") {
^
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 55fd77f724)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
also ran gofmt with go1.19
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 58413c15cb)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The linter falsely detects this as using "math/rand":
libnetwork/networkdb/cluster.go:721:14: G404: Use of weak random number generator (math/rand instead of crypto/rand) (gosec)
val, err := rand.Int(rand.Reader, big.NewInt(int64(n)))
^
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 561a010161)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Use net.JoinHostPort to account for IPv6 addresses.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit a33d1f9a7c)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
In network node change test, the expected behavior is focused on how many nodes
left in networkDB, besides timing issues, things would also go tricky for a
leave-then-join sequence, if the check (counting the nodes) happened before the
first "leave" event, then the testcase actually miss its target and report PASS
without verifying its final result; if the check happened after the 'leave' event,
but before the 'join' event, the test would report FAIL unnecessary;
This code change would check both the db changes and the node count, it would
report PASS only when networkdb has indeed changed and the node count is expected.
Signed-off-by: David Wang <00107082@163.com>
(cherry picked from commit f499c6b9ec)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The correct formatting for machine-readable comments is;
//<some alphanumeric identifier>:<options>[,<option>...][ // comment]
Which basically means:
- MUST NOT have a space before `<identifier>` (e.g. `nolint`)
- Identified MUST be alphanumeric
- MUST be followed by a colon
- MUST be followed by at least one `<option>`
- Optionally additional `<options>` (comma-separated)
- Optionally followed by a comment
Any other format will not be considered a machine-readable comment by `gofmt`,
and thus formatted as a regular comment. Note that this also means that a
`//nolint` (without anything after it) is considered invalid, same for `//#nosec`
(starts with a `#`).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 4f08346686)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Older versions of Go don't format comments, so committing this as
a separate commit, so that we can already make these changes before
we upgrade to Go 1.19.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 52c1a2fae8)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
libnetwork/firewall_linux.go:11:21: var-declaration: should drop = nil from declaration of var ctrl; it is the zero value (revive)
ctrl *controller = nil
^
distribution/pull_v2_test.go:213:4: S1038: should use t.Fatalf(...) instead of t.Fatal(fmt.Sprintf(...)) (gosimple)
t.Fatal(fmt.Sprintf("expected formatPlatform to show windows platform with a version, but got '%s'", result))
^
integration-cli/docker_cli_build_test.go:5951:3: S1038: should use c.Skipf(...) instead of c.Skip(fmt.Sprintf(...)) (gosimple)
c.Skip(fmt.Sprintf("Bug fixed in 18.06 or higher.Skipping it for %s", testEnv.DaemonInfo.ServerVersion))
^
integration-cli/docker_cli_daemon_test.go:240:3: S1038: should use c.Skipf(...) instead of c.Skip(fmt.Sprintf(...)) (gosimple)
c.Skip(fmt.Sprintf("New base device size (%v) must be greater than (%s)", units.HumanSize(float64(newBasesizeBytes)), units.HumanSize(float64(oldBasesizeBytes))))
^
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
bump netlink to 1.2.1
change usages of netlink handle .Delete() to Close()
remove superfluous replace in vendor.mod
make requires of github.com/Azure/go-ansiterm direct
Signed-off-by: Martin Braun <braun@neuroforge.de>
Since commit 2c4a868f64, Docker doesn't
use the value of net.ipv4.ip_local_port_range when choosing an ephemeral
port. This change reverts back to the previous behavior.
Fixes#43054.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Previously, with the patch from #43146, it was possible for a
network configured with a single ingress or load balancer on a
distribution which does not have the `ip_vs` kernel module loaded
by default to try to apply sysctls which did not exist yet, and
subsequently dynamically load the module as part of ipvs/netlink.go.
This module is vendored, and not a great place to try to tie back
into core libnetwork functionality, so also ensure that the sysctls
(which are idempotent) are called after ingress/lb creation once
`ipvs` has been initialized.
Signed-off-by: Ryan Barry <rbarry@mirantis.com>
relates to #35082, moby/libnetwork#2491
Previously, values for expire_quiescent_template, conn_reuse_mode,
and expire_nodest_conn were set only system-wide. Also apply them
for new lb_* and ingress_sbox sandboxes, so they are appropriately
propagated
Signed-off-by: Ryan Barry <rbarry@mirantis.com>
strings.ReplaceAll(s, old, new) is a wrapper function for
strings.Replace(s, old, new, -1). But strings.ReplaceAll is more
readable and removes the hardcoded -1.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
Some packages were using `logrus.Fatal()` in init functions (which logs the error,
and (by default) calls `os.Exit(1)` after logging).
Given that logrus formatting and outputs have not yet been configured during the
initialization stage, it does not provide much benefits over a plain `panic()`.
This patch replaces some instances of `logrus.Fatal()` with `panic()`, which has
the added benefits of not introducing logrus as a dependency in some of these
packages, and also produces a stacktrace, which could help locating the problem
in the unlikely event an `init()` fails.
Before this change, an error would look like:
$ dockerd
FATA[0000] something bad happened
After this change, the same error looks like:
$ dockerd
panic: something bad happened
goroutine 1 [running]:
github.com/docker/docker/daemon/logger/awslogs.init.0()
/go/src/github.com/docker/docker/daemon/logger/awslogs/cloudwatchlogs.go:128 +0x89
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
While looking at this code, I noticed that we were wasting quite some resources
by first constructing a string, only to split it again (with `strings.Fields()`)
into a string slice.
Some conversions were also happening multiple times (int to string, IP-address to
string, etc.)
Setting up networking is known to be costing a considerable amount of time when
starting containers, and while this may only be a small part of that, it doesn't
hurt to save some resources (and readability of the code isn't significantly
impacted).
For example, benchmarking the `redirector()` code before/after:
BenchmarkParseOld-4 137646 8398 ns/op 4192 B/op 75 allocs/op
BenchmarkParseNew-4 629395 1762 ns/op 2362 B/op 24 allocs/op
Average over 10 runs:
benchstat old.txt new.txt
name old time/op new time/op delta
Parse-4 8.43µs ± 2% 1.79µs ± 3% -78.76% (p=0.000 n=9+8)
name old alloc/op new alloc/op delta
Parse-4 4.19kB ± 0% 2.36kB ± 0% -43.65% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
Parse-4 75.0 ± 0% 24.0 ± 0% -68.00% (p=0.000 n=10+10)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Make the call to cleanupServiceBindings during network deletion
conditional on Windows (where it is required), thereby providing a
performance improvement to network cleanup on Linux.
Signed-off-by: Trapier Marshall <tmarshall@mirantis.com>
Removal of PolicyLists from Windows VFP must be performed prior to
removing the HNS network. Otherwise PolicyList removal fails with
HNS error "network not found".
Signed-off-by: Trapier Marshall <tmarshall@mirantis.com>
There is a race condition between the local proxy and iptables rule
setting. When we have a lot of UDP traffic, the kernel will create
conntrack entries to the local proxy and will ignore the iptables
rules set after that.
Related to PR #32505. Fix#8795.
Signed-off-by: Vincent Bernat <vincent@bernat.ch>
Operations performed on overlay network sandboxes are handled by
dispatching operations send through a channel. This allows for
asynchronous operations to be performed which, since they are
not called from within another function, are able to operate in
an idempotent manner with a known/measurable starting state from
which an identical series of iterative actions can be performed.
However, it was possible in some cases for an operation dispatched
from this channel to write a message back to the channel in the
case of joining a network when a sufficient volume of sandboxes
were operated on.
A goroutine which is simultaneously reading and writing to an
unbuffered channel can deadlock if it sends a message to a channel
then waits for it to be consumed and completed, since the only
available goroutine is more or less "talking to itself". In order
to break this deadlock, in the observed race, a goroutine is now
created to send the message to the channel.
Signed-off-by: Martin Dojcak <martin.dojcak@lablabs.io>
Signed-off-by: Ryan Barry <rbarry@mirantis.com>
Windows Server 2016 (RS1) reached end of support, and Docker Desktop requires
Windows 10 V19H2 (version 1909, build 18363) as a minimum.
This patch makes Windows Server RS5 / ltsc2019 (build 17763) the minimum version
to run the daemon, and removes some hacks for older versions of Windows.
There is one check remaining that checks for Windows RS3 for a workaround
on older versions, but recent changes in Windows seemed to have regressed
on the same issue, so I kept that code for now to check if we may need that
workaround (again);
085c6a98d5/daemon/graphdriver/windows/windows.go (L319-L341)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>