441 commits

Author SHA1 Message Date
Sebastiaan van Stijn
4adc40ac40
fix duplicate words (dupwords)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2024-03-07 10:57:03 +01:00
Albin Kerouanton
cbd45e83cf libnet: Replace DeleteAtomic in retry loops with DeleteIdempotent
A common pattern in libnetwork is to delete an object using
`DeleteAtomic`, i.e. to check the optimistic lock, but to put it in a retry
loop that refreshes the data and the version index used by the optimistic
lock.

This commit introduces a new `Delete` method to delete without
checking the optimistic lock. It focuses only on the few places where
it's obvious the calling code doesn't rely on the side-effects of the
retry loop (i.e. refreshing the object to be deleted).

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2024-02-22 08:22:09 +01:00
Rob Murray
419f5a6372 Make 'internal' bridge networks accessible from host
Prior to release 25.0.0, the bridge in an internal network was assigned
an IP address - making the internal network accessible from the host,
giving containers on the network access to anything listening on the
bridge's address (or INADDR_ANY on the host).

This change restores that behaviour. It does not restore the default
route that was configured in the container, because packets sent outside
the internal network's subnet have always been dropped. So, a 'connect()'
to an address outside the subnet will still fail fast.

Signed-off-by: Rob Murray <rob.murray@docker.com>
2024-02-07 19:12:10 +00:00
Albin Kerouanton
89470a7114 libnet: bridge: ignore EINVAL when configuring bridge MTU
Since 964ab7158c, we explicitly set the bridge MTU if it was specified.
Unfortunately, kernels older than v4.17 have a check preventing us from
manually setting the MTU to anything greater than 1500 if no links are
attached to the bridge, which is how we do things: create the bridge, set
its MTU and later on, attach veths to it.

Relevant kernel commit: 804b854d37

As we still have to support CentOS/RHEL 7 (and their old v3.10 kernels)
for a few more months, we need to ignore EINVAL if the MTU is > 1500
(but <= 65535).

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2024-02-02 19:32:45 +01:00
Albin Kerouanton
025967efd0
Merge pull request #47293 from robmry/47229-internal-bridge-firewalld
Add internal n/w bridge to firewalld docker zone
2024-02-02 08:36:27 +01:00
Rob Murray
2cc627932a Add internal n/w bridge to firewalld docker zone
Containers attached to an 'internal' bridge network are unable to
communicate when the host is running firewalld.

Non-internal bridges are added to a trusted 'docker' firewalld zone, but
internal bridges were not.

DOCKER-ISOLATION iptables rules are still configured for an internal
network; they block traffic to/from addresses outside the network's subnet.

Signed-off-by: Rob Murray <rob.murray@docker.com>
2024-02-01 11:49:53 +00:00
Cory Snider
d21d0884ae libnetwork: share a single datastore with drivers
The bbolt library wants exclusive access to the boltdb file and uses
file locking to assure that is the case. The controller and each network
driver that needs persistent storage instantiates its own unique
datastore instance, backed by the same boltdb file. The boltdb kvstore
implementation works around multiple access to the same boltdb file by
aggressively closing the boltdb file between each transaction. This is
very inefficient. Have the controller pass its datastore instance into
the drivers and enable the PersistConnection option to disable closing
the boltdb between transactions.

Set data-dir in unit tests which instantiate libnetwork controllers so
they don't hang trying to lock the default boltdb database file.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2024-01-31 21:08:34 -05:00
Albin Kerouanton
794f7127ef
Merge pull request #47062 from robmry/35954-default_ipv6_enabled
Detect IPv6 support in containers, generate '/etc/hosts' accordingly.
2024-01-29 16:31:35 +01:00
Albin Kerouanton
3147a013fb libnet/ds: remove unused param key from List
Since 43dccc6 the `key` param is never used and can be safely
removed.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2024-01-24 22:42:18 +01:00
Albin Kerouanton
f7ef0e9fc7 libnet/ds: remove unused param key from GetObject
Since 43dccc6 the `key` param is never used and can be safely
removed.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2024-01-24 22:42:18 +01:00
Rob Murray
a8f7c5ee48 Detect IPv6 support in containers.
Some configuration in a container depends on whether it has support for
IPv6 (including default entries for '::1', etc., in '/etc/hosts').

Before this change, the container's support for IPv6 was determined by
whether it was connected to any IPv6-enabled networks. But that can
change over time; it isn't a property of the container itself.

So, instead, detect IPv6 support by looking for '::1' on the container's
loopback interface. It will not be present if the kernel does not have
IPv6 support, or the user has disabled it in new namespaces by other
means.

Once IPv6 support has been determined for the container, its '/etc/hosts'
is re-generated accordingly.

The daemon no longer disables IPv6 on all interfaces during initialisation.
It now disables IPv6 only for interfaces that have not been assigned an
IPv6 address. (But, even if IPv6 is disabled for the container using the
sysctl 'net.ipv6.conf.all.disable_ipv6=1', interfaces connected to IPv6
networks still get IPv6 addresses that appear in the internal DNS. There's
more to do!)

Signed-off-by: Rob Murray <rob.murray@docker.com>
2024-01-19 20:24:07 +00:00
Albin Kerouanton
b9e27acabc
libnet/d/bridge: dead code: no conflict on stale default nw
A check was added to the bridge driver to detect when it was called to
create the default bridge network while a stale default bridge already
existed. In such a case, the bridge driver deleted the stale network
before re-creating it. This check was introduced in docker/libnetwork@6b158eac6a
to fix an issue related to the then newly introduced live-restore.

However, since commit docker/docker@ecffb6d58c,
the daemon doesn't even try to create default networks if there are
active sandboxes (i.e. due to live-restore).

Thus, it's now impossible for a stale default bridge network to
exist when the driver's CreateNetwork() method is called. As such,
the check introduced in the first commit mentioned above is dead code
and can be safely removed.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2024-01-04 11:50:04 +01:00
Albin Kerouanton
0a26cdf344
libnet/d/bridge: remove dead ActiveEndpointsError
This error is unused since docker/libnetwork@6b158eac6.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2024-01-04 11:12:53 +01:00
Sebastiaan van Stijn
84ba2558e2
Merge pull request #46976 from robmry/bridge_todos
Validate IPv6 address in libnetwork's bridge driver, remove unused error types.
2024-01-02 16:03:16 +01:00
Sebastiaan van Stijn
4f9db655ed
portmapper: move userland-proxy lookup to daemon config
When mapping a port with the userland-proxy enabled, the daemon would
perform an "exec.LookPath" for every mapped port (which, in case of
a range of ports, would be for every port in the range).

This was both inefficient (looking up the binary for each port), inconsistent
(when running in rootless-mode, the binary was looked-up once), as well as
inconvenient, because a missing binary or a misconfigured userland-proxy-path
would not be detected at daemon startup, and would not produce an error until
the container was started;

    docker run -d -P nginx:alpine
    4f7b6589a1680f883d98d03db12203973387f9061e7a963331776170e4414194
    docker: Error response from daemon: driver failed programming external connectivity on endpoint romantic_wiles (7cfdc361821f75cbc665564cf49856cf216a5b09046d3c22d5b9988836ee088d): fork/exec docker-proxy: no such file or directory.

However, the container would still be created (but invalid);

    docker ps -a
    CONTAINER ID   IMAGE          COMMAND                  CREATED          STATUS    PORTS     NAMES
    869f41d7e94f   nginx:alpine   "/docker-entrypoint.…"   10 seconds ago   Created             romantic_wiles

This patch changes how the userland-proxy is configured;

- The path of the userland-proxy is now looked up / configured at daemon
  startup; this is similar to how the proxy is configured in rootless-mode.
- A warning is logged when failing to look up the binary.
- If the lookup fails and the daemon is configured with "userland-proxy"
  enabled, an error is produced, and the daemon will refuse to start.
- The "proxyPath" argument for newProxyCommand() (in libnetwork/portmapper)
  is now required to be set. It no longer looks up the executable, and
  produces an error if no path was provided. While this change was not
  required, it makes the daemon config the canonical source of truth, instead
  of logic spread across multiple locations.

Some of this logic is a change of behavior, but these changes were made with
the assumption that we don't want to support;

- installing the userland proxy _after_ the daemon was started
- moving the userland proxy (or installing a proxy with a higher
  preference in PATH)

With this patch:

Validating the config produces an error if the binary is not found:

    dockerd --validate
    WARN[2023-12-29T11:36:39.748699591Z] failed to lookup default userland-proxy binary       error="exec: \"docker-proxy\": executable file not found in $PATH"
    userland-proxy is enabled, but userland-proxy-path is not set

Disabling userland-proxy prints a warning, but validates as "OK":

    dockerd --userland-proxy=false --validate
    WARN[2023-12-29T11:38:30.752523879Z] failed to lookup default userland-proxy binary       error="exec: \"docker-proxy\": executable file not found in $PATH"
    configuration OK

Specifying a non-absolute path produces an error:

    dockerd --userland-proxy-path=docker-proxy --validate
    invalid userland-proxy-path: must be an absolute path: docker-proxy

Before this patch, we would not validate this path, which would allow the daemon
to start, but fail to map a port;

    docker run -d -P nginx:alpine
    4f7b6589a1680f883d98d03db12203973387f9061e7a963331776170e4414194
    docker: Error response from daemon: driver failed programming external connectivity on endpoint romantic_wiles (7cfdc361821f75cbc665564cf49856cf216a5b09046d3c22d5b9988836ee088d): fork/exec docker-proxy: no such file or directory.

Specifying an invalid userland-proxy-path produces an error as well:

    dockerd --userland-proxy-path=/usr/local/bin/no-such-binary --validate
    userland-proxy-path is invalid: stat /usr/local/bin/no-such-binary: no such file or directory

    mkdir -p /usr/local/bin/not-a-file
    dockerd --userland-proxy-path=/usr/local/bin/not-a-file --validate
    userland-proxy-path is invalid: exec: "/usr/local/bin/not-a-file": is a directory

    touch /usr/local/bin/not-an-executable
    dockerd --userland-proxy-path=/usr/local/bin/not-an-executable --validate
    userland-proxy-path is invalid: exec: "/usr/local/bin/not-an-executable": permission denied

Same when using the daemon.json config-file;

    echo '{"userland-proxy-path":"no-such-binary"}' > /etc/docker/daemon.json
    dockerd --validate
    unable to configure the Docker daemon with file /etc/docker/daemon.json: merged configuration validation from file and command line flags failed: invalid userland-proxy-path: must be an absolute path: no-such-binary

    dockerd --userland-proxy-path=hello --validate
    unable to configure the Docker daemon with file /etc/docker/daemon.json: the following directives are specified both as a flag and in the configuration file: userland-proxy-path: (from flag: hello, from file: /usr/local/bin/docker-proxy)

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-12-29 16:23:18 +01:00
Rob Murray
141cb65e51 Check, then assume an IPv6 bridge has a subnet.
If IPv6 is enabled for a bridge network, by the time configuration
is applied, the bridge will always have an address. Assert that, by
raising an error when the configuration is validated.

Use that to simplify the logic used to calculate which addresses
should be assigned to a bridge. Also remove a redundant check in
setupGatewayIPv6() and the error associated with it.

Fix unit tests that enabled IPv6, but didn't supply an IPv6 IPAM
address/pool. Before this change, these tests passed but silently
left the bridge without an IPv6 address.

(The daemon already ensured there was an IPv6 address, this change
does not add a new restriction on config at that level.)

Signed-off-by: Rob Murray <rob.murray@docker.com>
2023-12-21 15:26:34 +00:00
Rob Murray
437bc829bf Don't try to validate incomplete network config.
Some checks in 'networkConfiguration.Validate()' were not running as
expected; they'd always pass, because 'parseNetworkOptions()' called
it before 'config.processIPAM()' had added IP addresses and gateways.

Signed-off-by: Rob Murray <rob.murray@docker.com>
2023-12-21 15:16:26 +00:00
Rob Murray
52d9b0cb56 Remove unused error types.
Signed-off-by: Rob Murray <rob.murray@docker.com>
2023-12-21 12:47:59 +00:00
Rob Murray
27f3abd893 Allow overlapping change in bridge's IPv6 network.
Calculate the IPv6 addresses needed on a bridge, then reconcile them
with the addresses on an existing bridge by deleting then adding as
required.

(Previously, required addresses were added one-by-one, then unwanted
addresses were removed. This meant the daemon failed to start if, for
example, an existing bridge had address '2000:db8::/64' and the config
was changed to '2000:db8::/80'.)

IPv6 addresses are now calculated and applied in one go, so there's no
need for setupVerifyAndReconcile() to check the set of IPv6 addresses on
the bridge. And it was guarded by !config.InhibitIPv4, which can't have
been right. So its IPv6 parts have been removed, and IPv4 added to its name.

Link local addresses, the example given in the original ticket, are now
released when containers are stopped. Not releasing them meant that
when using an LL subnet on the default bridge, no container could be
started after a container was stopped (because the calculated address
could not be re-allocated). In non-default bridge networks using an
LL subnet, addresses leaked.

Linux always uses the standard 'fe80::/64' LL network. So, if a bridge
is configured with an LL subnet prefix that overlaps with it, a config
error is reported. Non-overlapping LL subnet prefixes are allowed.

Signed-off-by: Rob Murray <rob.murray@docker.com>
2023-12-18 16:10:41 +00:00
Rob Murray
964ab7158c Explicitly set MTU on bridge devices.
This is purely cosmetic - if a non-default MTU is configured, the bridge
will have the default MTU=1500 until a container's 'veth' is connected
and an MTU is set on the veth. That's disconcerting; it looks like the
config has been ignored - so, set the bridge's MTU explicitly.

Fixes #37937

Signed-off-by: Rob Murray <rob.murray@docker.com>
2023-11-27 11:18:54 +00:00
Sebastiaan van Stijn
f13d8c2026
Merge pull request #46724 from rhansen/host_ipv6
New `host_ipv6` bridge option to SNAT IPv6 connections
2023-11-13 21:50:17 +01:00
Brian Goff
524eef5d75
Merge pull request #46681 from corhere/libn/datastore-misc-cleanups
2023-11-09 11:31:30 -08:00
Richard Hansen
808120e5b8 New host_ipv6 bridge option to SNAT IPv6 connections
Add a new `com.docker.network.host_ipv6` bridge option to complement
the existing `com.docker.network.host_ipv4` option. When set to an
IPv6 address, this causes the bridge to insert `SNAT` rules instead of
`MASQUERADE` rules (assuming `ip6tables` is enabled).  `SNAT` makes it
possible for users to control the source IP address used for outgoing
connections.

Signed-off-by: Richard Hansen <rhansen@rhansen.org>
2023-10-25 20:11:49 -04:00
Richard Hansen
0cf113e250 Add unit tests for outgoing NAT rules
Signed-off-by: Richard Hansen <rhansen@rhansen.org>
2023-10-21 13:53:58 -04:00
Cory Snider
4039b9c9c4 libnetwork/datastore: drop (KVObject).DataScope()
It wasn't being used for anything meaningful.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-10-19 12:38:39 -04:00
Richard Hansen
96f85def5b s/HostIP/HostIPv4/ for com.docker.network.host_ipv4 setting
Rename all variables/fields/map keys associated with the
`com.docker.network.host_ipv4` option from `HostIP` to `HostIPv4`.
Rationale:

  * This makes the variable/field name consistent with the option
    name.
  * This makes the code more readable because it is clear that the
    variable/field does not hold an IPv6 address.  This will hopefully
    avoid bugs like <https://github.com/moby/moby/issues/46445> in the
    future.
  * If IPv6 SNAT support is ever added, the names will be symmetric.

Signed-off-by: Richard Hansen <rhansen@rhansen.org>
2023-10-14 02:47:14 -04:00
Richard Hansen
2a14b6cf60 Use iptRule to simplify setIcc (code health)
Signed-off-by: Richard Hansen <rhansen@rhansen.org>
2023-10-14 02:47:14 -04:00
Richard Hansen
d7c6fd2f80 Move programChainRule logic to iptRule methods (code health)
Signed-off-by: Richard Hansen <rhansen@rhansen.org>
2023-10-14 02:47:13 -04:00
Richard Hansen
e260808a57 Move duplicate logic to iptRule.Exists method (code health)
Signed-off-by: Richard Hansen <rhansen@rhansen.org>
2023-10-14 01:41:09 -04:00
Richard Hansen
14d2535f13 Move iptables.IPVersion into iptRule struct (code health)
Rather than pass an `iptables.IPVersion` value alongside every
`iptRule` parameter, embed the IP version in the `iptRule` struct.

Signed-off-by: Richard Hansen <rhansen@rhansen.org>
2023-10-14 01:41:09 -04:00
Richard Hansen
4e219ebafb Eliminate unnecessary iptRule.preArgs field (code health)
That field was only used to pass `-t nat` for NAT rules.  Now `-t
<tableName>` (where `<tableName>` is one of the `iptables.Table`
values) is always passed, eliminating the need for `preArgs`.

Signed-off-by: Richard Hansen <rhansen@rhansen.org>
2023-10-14 01:41:09 -04:00
Richard Hansen
4662e9889c Simplify setupIPTablesInternal parameters (code health)
Pass the entire `*networkConfiguration` struct to
`setupIPTablesInternal` to simplify the function signature and improve
code readability.

Signed-off-by: Richard Hansen <rhansen@rhansen.org>
2023-10-14 01:41:09 -04:00
Sebastiaan van Stijn
cff4f20c44
migrate to github.com/containerd/log v0.1.0
The github.com/containerd/containerd/log package was moved to a separate
module, which will also be used by upcoming (patch) releases of containerd.

This patch moves our own uses of the package to use the new module.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-10-11 17:52:23 +02:00
Albin Kerouanton
37ca57e9d5
libnet/d/bridge: inline error checks
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-10-10 10:46:44 +02:00
Albin Kerouanton
cbc2a71c27
libnet/d/bridge: Don't set container's gateway when network is internal
So far, internal networks were only isolated from the host by iptables
DROP rules. As a consequence, outbound connections from containers would
time out instead of being "rejected" through an immediate ICMP dest/port
unreachable, a TCP RST or a failing `connect` syscall.

This was visible when internal containers were trying to resolve a
domain that doesn't match any container on the same network (be it a truly
"external" domain, or a container that doesn't exist or is dead). In that
case, the embedded resolver would try to forward DNS queries for the
different values of the resolv.conf `search` option, making DNS resolution
slow to return an error, with the slowness exacerbated by some libc
implementations.

This change makes the `connect` syscall return ENETUNREACH, and thus
solves the broader issue of failing fast when external connections are
attempted.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-10-09 13:57:54 +02:00
Sebastiaan van Stijn
863909a749
libnetwork/portmapper: New(): remove unused argument
None of the code using this function was setting the value, so let's
simplify and remove the argument.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-09-13 18:12:53 +02:00
Richard Hansen
12e27dfd8f Fix host_ipv4 bridge option when IPv6 and ip6tables are enabled
Before this commit, setting the `com.docker.network.host_ipv4` bridge
option when `enable_ipv6` is true and the experimental `ip6tables`
option is enabled would cause Docker to fail to create the network:

> failed to create network `test-network`: Error response from daemon:
> Failed to Setup IP tables: Unable to enable NAT rule: (iptables
> failed: `ip6tables --wait -t nat -I POSTROUTING -s fd01::/64 ! -o
> br-test -j SNAT --to-source 192.168.0.2`: ip6tables
> v1.8.7 (nf_tables): Bad IP address "192.168.0.2"
>
> Try `ip6tables -h` or `ip6tables --help` for more information.
>  (exit status 2))

Fix this error by passing nil -- not the `host_ipv4` address -- when
creating the IPv6 rules.

Signed-off-by: Richard Hansen <rhansen@rhansen.org>
2023-09-10 04:03:07 -04:00
Sebastiaan van Stijn
15435f7293
libnetwork/drivers/bridge: testEndpoint.Interface: return concrete type
Interface-matching should generally happen on the receiver side, and this
function was only used in a single location, and passed as argument to
Driver.CreateEndpoint, which already matches the interface by accepting
a driverapi.InterfaceInfo.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-08-27 20:13:27 +02:00
Sebastiaan van Stijn
9afb688f5f
libnetwork/drivers/bridge: getIPv4Data: remove unused argument
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-08-27 20:13:27 +02:00
Albin Kerouanton
c22ec82477
libnet: Fix error capitalization
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-08-17 16:48:09 +02:00
Albin Kerouanton
42d34e40f9
libnet: Replace BadRequest with InvalidParameter
InvalidParameter is now compatible with errdefs.InvalidParameter. Thus,
these errors will now return a 400 status code instead of a 500.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-08-17 16:45:04 +02:00
Sebastiaan van Stijn
0503cf2510
libnetwork/drivers/bridge: setupIPChains(): name output variables
This function has _four_ output variables of the same type, and several
defer statements that checked the error returned (but using the `err`
variable).

This patch names the return variables to make it clearer what's being
returned, and renames the error-return to `retErr` to make it clearer
where we're dealing with the returned error (and not any local err), to
prevent accidental shadowing.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-08-16 00:26:35 +02:00
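The named-return pattern can be sketched with a small function that has several outputs and a deferred cleanup. This is a hypothetical `setupChains`, not `setupIPChains()`; note that the deferred function uses the local `nat` rather than the named result, because `return "", "", err` overwrites the named results before deferred functions run.

```go
package main

import "fmt"

// setupChains is a hypothetical function with several outputs of the same
// type. Naming them documents what is returned, and naming the error retErr
// makes the deferred cleanup check the error actually being returned rather
// than some local err that an inner `:=` might have shadowed.
func setupChains(failAt int, cleaned *[]string) (natChain, filterChain string, retErr error) {
	step := func(n int, name string) (string, error) {
		if n == failAt {
			return "", fmt.Errorf("creating %s failed", name)
		}
		return name, nil
	}

	nat, err := step(1, "nat-chain")
	if err != nil {
		return "", "", err
	}
	defer func() {
		// Undo the first step if anything after it failed.
		if retErr != nil {
			*cleaned = append(*cleaned, nat)
		}
	}()

	filter, err := step(2, "filter-chain")
	if err != nil {
		return "", "", err
	}
	return nat, filter, nil
}

func main() {
	var cleaned []string
	_, _, err := setupChains(2, &cleaned)
	fmt.Println(err, cleaned) // creating filter-chain failed [nat-chain]
}
```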
Sebastiaan van Stijn
8070f15966
libnetwork/drivers/bridge: rename some linux-only files
This makes it easier to spot if code is only used on Linux. Note that "all of"
the bridge driver is Linux-only.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-08-12 00:37:43 +02:00
Sebastiaan van Stijn
014fefee1d
libnetwork/drivers/bridge: minor formatting fixes
My IDE kept on re-formatting, so let's do so.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-08-12 00:37:43 +02:00
Sebastiaan van Stijn
2aa24519da
libnetwork/drivers/bridge: newLink: validate before creating
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-08-08 11:50:40 +02:00
Sebastiaan van Stijn
5d722b35d9
libnetwork/drivers/bridge: bridgeNetwork.getEndpoint(): move lock
Don't lock if there's no need to.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-08-08 11:50:39 +02:00
Sebastiaan van Stijn
eba15fe905
libnetwork/drivers/bridge: driver.link: don't defer in a loop
Collect a list of all the links we successfully enabled (if any), and
use a single defer to disable them.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-08-08 11:50:39 +02:00
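The "collect, then defer once" approach from eba15fe905 can be sketched as follows. The `link` type and `enableAll` function are hypothetical stand-ins, not the bridge driver's `driver.link`; the point is a single deferred rollback that only touches links that were actually enabled.

```go
package main

import "fmt"

// link is a hypothetical stand-in for a container link.
type link struct {
	name    string
	failing bool
}

func (l *link) enable() error {
	if l.failing {
		return fmt.Errorf("enabling link %s failed", l.name)
	}
	return nil
}

func (l *link) disable(events *[]string) {
	*events = append(*events, "disable "+l.name)
}

// enableAll enables links in order. Instead of one defer per loop iteration,
// it records the links that were enabled and registers a single defer that
// disables them if the function returns an error.
func enableAll(links []*link, events *[]string) (retErr error) {
	var enabled []*link
	defer func() {
		if retErr != nil {
			for _, l := range enabled {
				l.disable(events)
			}
		}
	}()
	for _, l := range links {
		if err := l.enable(); err != nil {
			return err
		}
		enabled = append(enabled, l)
	}
	return nil
}

func main() {
	var events []string
	err := enableAll([]*link{{name: "a"}, {name: "b"}, {name: "c", failing: true}}, &events)
	fmt.Println(err, events) // enabling link c failed [disable a disable b]
}
```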
Sebastiaan van Stijn
76b736c242
libnetwork/drivers/bridge: driver.link: name return var for defer handling
Name the return variable to prevent accidental shadowing of the error,
which is used in defers.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-08-08 11:50:39 +02:00
Sebastiaan van Stijn
ea5f21ceac
libnetwork/drivers/bridge: don't convert IP to string and back again
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-08-08 11:50:39 +02:00
Sebastiaan van Stijn
8b6203b613
libnetwork/drivers/bridge: link.Enable: don't register reload on error
Only register a reload function if we actually managed to enable the link.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-08-08 11:50:34 +02:00