Commit graph

28 commits

Author SHA1 Message Date
Paweł Gronowski
e829cca0ee
Merge pull request #47584 from robmry/upstream_dns_windows
Windows DNS resolver forwarding
2024-04-19 11:34:50 +02:00
Rob Murray
6c68be24a2 Windows DNS resolver forwarding
Make the internal DNS resolver for Windows containers forward requests
to upstream DNS servers when it cannot respond itself, rather than
returning SERVFAIL.

Windows containers are normally configured with the internal resolver
first for service discovery (container name lookup), then external
resolvers from '--dns' or the host's networking configuration.

When a tool like ping gets a SERVFAIL from the internal resolver, it
tries the other nameservers. But nslookup does not and, with this
change, it does not need to.

The internal resolver learns external server addresses from the
container's HNSEndpoint configuration, so it will use the same DNS
servers as processes in the container.

The internal resolver for Windows containers listens on the network's
gateway address, and each container may have a different set of external
DNS servers. So, the resolver uses the source address of the DNS request
to select external resolvers.

On Windows, daemon.json feature option 'windows-no-dns-proxy' can be used
to prevent the internal resolver from forwarding requests (restoring the
old behaviour).
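
Feature options live in the "features" map of daemon.json, so turning forwarding off would look like `{"features": {"windows-no-dns-proxy": true}}`. The Go sketch below decodes only that map and derives the forwarding decision; `dnsProxyEnabled` and `daemonConfig` are hypothetical names for illustration, not the daemon's config code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// daemonConfig models only the part of daemon.json this sketch needs.
type daemonConfig struct {
	Features map[string]bool `json:"features"`
}

// dnsProxyEnabled reports whether the internal resolver should forward
// requests to external servers. Forwarding is the default; setting the
// "windows-no-dns-proxy" feature to true restores the old behaviour.
func dnsProxyEnabled(raw []byte) (bool, error) {
	var cfg daemonConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return false, err
	}
	// A missing map or missing key reads as false, i.e. forwarding stays on.
	return !cfg.Features["windows-no-dns-proxy"], nil
}

func main() {
	on, err := dnsProxyEnabled([]byte(`{"features": {"windows-no-dns-proxy": true}}`))
	fmt.Println(on, err) // false <nil>
}
```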

Signed-off-by: Rob Murray <rob.murray@docker.com>
2024-04-16 18:57:28 +01:00
Rob Murray
57dd56726a Disable IPv6 for endpoints in '--ipv6=false' networks.
No IPAM IPv6 address is given to an interface in a network with
'--ipv6=false', but the kernel would assign a link-local address and,
in a macvlan/ipvlan network, the interface may get a SLAAC-assigned
address.

So, disable IPv6 on the interface to avoid that.
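
On Linux, IPv6 can be turned off for a single interface with the sysctl `net.ipv6.conf.<ifname>.disable_ipv6`, which is backed by a file under /proc/sys. A minimal Go sketch of that mechanism follows; the helper names are hypothetical, and the daemon may apply the setting differently (for example through a netlink or sysctl library).

```go
package main

import (
	"fmt"
	"os"
)

// disableIPv6Path returns the procfs file behind the sysctl
// net.ipv6.conf.<ifname>.disable_ipv6.
func disableIPv6Path(ifname string) string {
	return "/proc/sys/net/ipv6/conf/" + ifname + "/disable_ipv6"
}

// disableIPv6 writes "1" to that file. The kernel then removes the
// interface's IPv6 addresses and stops assigning link-local or SLAAC ones.
// It must run in the network namespace that owns the interface, with
// permission to write to /proc/sys.
func disableIPv6(ifname string) error {
	return os.WriteFile(disableIPv6Path(ifname), []byte("1"), 0o644)
}

func main() {
	// Only print the path: calling disableIPv6 here would change the
	// host's own interface if run with privileges.
	fmt.Println(disableIPv6Path("eth0")) // /proc/sys/net/ipv6/conf/eth0/disable_ipv6
}
```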

Signed-off-by: Rob Murray <rob.murray@docker.com>
2024-04-10 17:11:20 +01:00
Rob Murray
d8b768149b Move dummy DNS server to integration/internal/network
Signed-off-by: Rob Murray <rob.murray@docker.com>
2024-04-04 12:02:22 +01:00
Rob Murray
fde80fe2e7 Restore the SetKey prestart hook.
Partially reverts 0046b16 "daemon: set libnetwork sandbox key w/o OCI hook"

Running SetKey to store the OCI Sandbox key after task creation, rather
than from the OCI prestart hook, meant it happened after sysctl settings
were applied by the runtime. That was the intention: we wanted to
complete Sandbox configuration after IPv6 had been disabled by a sysctl,
if that was going to happen.

But it meant '--sysctl' options for a specific network interface caused
container task creation to fail, because the interface is only moved into
the network namespace during SetKey.

This change restores the SetKey prestart hook, and regenerates config
files that depend on the container's support for IPv6 after the task has
been created. It also adds a regression test that makes sure it's possible
to set an interface-specific sysctl.
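
The failing case involves sysctls whose key names one interface, such as `net.ipv6.conf.eth0.disable_ipv6`; the runtime can only apply those once that interface exists in the container's namespace. A small Go sketch that recognises such keys, assuming dotted sysctl names; `interfaceOfSysctl` is a hypothetical helper, not part of the daemon.

```go
package main

import (
	"fmt"
	"strings"
)

// interfaceOfSysctl returns the interface named by a per-interface sysctl
// key. "all" and "default" are pseudo-interfaces, so keys using them are
// not interface-specific and can be applied before any interface exists.
func interfaceOfSysctl(key string) (string, bool) {
	for _, prefix := range []string{"net.ipv4.conf.", "net.ipv6.conf."} {
		if !strings.HasPrefix(key, prefix) {
			continue
		}
		ifname, _, ok := strings.Cut(strings.TrimPrefix(key, prefix), ".")
		if !ok || ifname == "" || ifname == "all" || ifname == "default" {
			return "", false
		}
		// In dotted sysctl names, a '.' inside an interface name is written as '/'.
		return strings.ReplaceAll(ifname, "/", "."), true
	}
	return "", false
}

func main() {
	for _, k := range []string{"net.ipv6.conf.eth0.disable_ipv6", "net.ipv4.conf.all.forwarding", "net.ipv4.ip_forward"} {
		ifname, ok := interfaceOfSysctl(k)
		fmt.Println(k, ifname, ok)
	}
}
```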

Signed-off-by: Rob Murray <rob.murray@docker.com>
2024-03-25 19:35:55 +00:00
Bjorn Neergaard
641e341eed
Merge pull request #47538 from robmry/libnet-resolver-nxdomain
libnet: Don't forward to upstream resolvers on internal nw
2024-03-18 11:22:59 -06:00
Albin Kerouanton
790c3039d0 libnet: Don't forward to upstream resolvers on internal nw
Commit cbc2a71c2 made the `connect` syscall fail fast when a container is
only attached to an internal network. Thanks to that, if such a
container tries to resolve an "external" domain, the embedded resolver
returns an error immediately instead of waiting for a timeout.

This commit makes sure the embedded resolver doesn't even try to forward
to upstream servers.

Co-authored-by: Albin Kerouanton <albinker@gmail.com>
Signed-off-by: Rob Murray <rob.murray@docker.com>
2024-03-14 17:46:48 +00:00
Sebastiaan van Stijn
0fb845858d
Merge pull request #47505 from akerouanton/fix-TestBridgeICC-ipv6
inte/networking: ping with -6 specified when needed
2024-03-08 18:33:46 +01:00
Albin Kerouanton
5a009cdd5b inte/networking: add isIPv6 flag
Make sure the `ping` command used by `TestBridgeICC` actually has
the `-6` flag when it runs IPv6 test cases. Without this flag,
IPv6 connectivity isn't tested properly.
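
A minimal Go sketch of how an `isIPv6` flag can drive the ping command line. When the target is a container name, ping may resolve it to an IPv4 address unless `-6` is given, so an IPv6 case could pass without using IPv6 at all. The `pingArgs` helper and the `-c1`/`-W3` values are assumptions for illustration, not the test's actual code.

```go
package main

import "fmt"

// pingArgs builds a ping command line for probing another container.
// -c1 sends a single echo request; -W3 caps the wait for a reply. With
// isIPv6 set, -6 forces ping to use IPv6.
func pingArgs(target string, isIPv6 bool) []string {
	args := []string{"ping", "-c1", "-W3"}
	if isIPv6 {
		args = append(args, "-6")
	}
	return append(args, target)
}

func main() {
	fmt.Println(pingArgs("ctr2", true)) // [ping -c1 -W3 -6 ctr2]
}
```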

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2024-03-07 17:55:53 +01:00
Rob Murray
ef5295cda4 Don't configure IPv6 addr/gw when IPv6 disabled.
When IPv6 is disabled in a container (for example, by using the --sysctl
option), an IPv6 address/gateway is still allocated. Don't attempt to
apply that config, because doing so enables IPv6 on the interface.

Signed-off-by: Rob Murray <rob.murray@docker.com>
2024-03-06 18:32:31 +00:00
Albin Kerouanton
7c7e453255
Merge pull request #47474 from robmry/47441_mac_addr_config_migration
Don't create endpoint config for MAC addr config migration
2024-03-06 11:04:17 +01:00
Albin Kerouanton
21835a5696 inte/networking: rename linkLocal flag into isLinkLocal
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2024-03-06 00:16:08 +01:00
Sebastiaan van Stijn
137a9d6a4c
Merge pull request #47395 from robmry/47370_windows_natnw_dns_test
Test DNS on Windows 'nat' networks
2024-03-01 13:02:52 +01:00
Rob Murray
a580544d82 Don't create endpoint config for MAC addr config migration
In a container-create API request, HostConfig.NetworkMode (the identity
of the "main" network) may be a name, id or short-id.

The configuration for that network, including preferred IP address etc,
may be keyed on network name or id - it need not match the NetworkMode.

So, when migrating the old container-wide MAC address to the new
per-endpoint field - it is not safe to create a new EndpointSettings
entry unless there is no possibility that it will duplicate settings
intended for the same network (because one of the duplicates will be
discarded later, dropping the settings it contains).

This change introduces a new API restriction: if the deprecated
container-wide field is used in the new API, and EndpointsConfig is
provided for any network, the NetworkMode and the key under which the
EndpointsConfig is stored must be the same. Ids and names cannot be mixed.

Signed-off-by: Rob Murray <rob.murray@docker.com>
2024-02-29 17:02:19 +00:00
Sebastiaan van Stijn
6c3b3523c9
Merge pull request #47041 from robmry/46968_refactor_resolvconf
Refactor 'resolv.conf' generation.
2024-02-29 09:33:55 +01:00
Rob Murray
9083c2f10d Test DNS on Windows 'nat' networks
Signed-off-by: Rob Murray <rob.murray@docker.com>
2024-02-27 11:40:11 +00:00
Rob Murray
419f5a6372 Make 'internal' bridge networks accessible from host
Prior to release 25.0.0, the bridge in an internal network was assigned
an IP address - making the internal network accessible from the host,
giving containers on the network access to anything listening on the
bridge's address (or INADDR_ANY on the host).

This change restores that behaviour. It does not restore the default
route that was configured in the container, because packets sent outside
the internal network's subnet have always been dropped. So, a 'connect()'
to an address outside the subnet will still fail fast.

Signed-off-by: Rob Murray <rob.murray@docker.com>
2024-02-07 19:12:10 +00:00
Rob Murray
beb97f7fdf Refactor 'resolv.conf' generation.
Replace regex matching/replacement and re-reading of generated files
with a simple parser, and struct to remember and manipulate the file
content.

Annotate the generated file with a header comment saying the file is
generated, but can be modified, and a trailing comment describing how
the file was generated and listing external nameservers.

Always start with the host's resolv.conf file, whether generating config
for host networking, or with/without an internal resolver - rather than
editing a file previously generated for a different use-case.

Resolves an issue where rewrites of the generated file resulted in
default IPv6 nameservers being unnecessarily added to the config.

Signed-off-by: Rob Murray <rob.murray@docker.com>
2024-02-06 22:26:12 +00:00
Albin Kerouanton
ca683c1c77
Merge pull request #47233 from robmry/47146-duplicate_mac_addrs2
Only restore a configured MAC addr on restart.
2024-02-02 09:08:17 +01:00
Rob Murray
8c64b85fb9 No inspect 'Config.MacAddress' unless configured.
Do not set 'Config.MacAddress' in inspect output unless the MAC address
is configured.

Also, make sure it is filled in for a configured address on the default
network before the container is started (by translating the network name
from 'default' to 'config' so that the address lookup works).

Signed-off-by: Rob Murray <rob.murray@docker.com>
2024-02-01 09:57:35 +00:00
Rob Murray
dae33031e0 Only restore a configured MAC addr on restart.
The API's EndpointConfig struct has a MacAddress field that's used for
both the configured address, and the current address (which may be generated).

A configured address must be restored when a container is restarted, but a
generated address must not.

The previous attempt to differentiate between the two, without adding a field
to the API's EndpointConfig that would show up in 'inspect' output, was a
field in the daemon's version of EndpointSettings, MACOperational. It did
not work: MACOperational was set to true when a configured address was
used. So, while it ensured addresses were regenerated, it failed to preserve
a configured address.

So, this change removes that code, and adds DesiredMacAddress to the wrapped
version of EndpointSettings, where it is persisted but does not appear in
'inspect' results. Its value is copied from MacAddress (the API field) when
a container is created.
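
A Go sketch of the idea: a persisted field outside the API struct remembers what was asked for, and a restart restores only that. The types are hypothetical stand-ins that reuse the field names from the message; they are not the daemon's definitions.

```go
package main

import "fmt"

// apiEndpointSettings stands in for the API struct, where MacAddress holds
// the configured address and, once running, the current one.
type apiEndpointSettings struct {
	MacAddress string
}

// endpointSettings stands in for the daemon's wrapped version. The extra
// field is persisted with the container but, not being part of the API
// struct, does not appear in 'inspect' output.
type endpointSettings struct {
	apiEndpointSettings
	DesiredMacAddress string
}

// onCreate records the address the user asked for ("" if none).
func onCreate(ep *endpointSettings) {
	ep.DesiredMacAddress = ep.MacAddress
}

// onRestart restores only the configured address. An empty value means a
// new address will be generated, so a stale generated one is never reused.
func onRestart(ep *endpointSettings) {
	ep.MacAddress = ep.DesiredMacAddress
}

func main() {
	gen := &endpointSettings{}
	onCreate(gen)
	gen.MacAddress = "02:42:ac:11:00:02" // generated when the container started
	onRestart(gen)
	fmt.Printf("%q\n", gen.MacAddress) // ""

	cfg := &endpointSettings{apiEndpointSettings: apiEndpointSettings{MacAddress: "02:42:00:00:00:01"}}
	onCreate(cfg)
	onRestart(cfg)
	fmt.Println(cfg.MacAddress) // 02:42:00:00:00:01
}
```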

Signed-off-by: Rob Murray <rob.murray@docker.com>
2024-02-01 09:55:54 +00:00
Albin Kerouanton
794f7127ef
Merge pull request #47062 from robmry/35954-default_ipv6_enabled
Detect IPv6 support in containers, generate '/etc/hosts' accordingly.
2024-01-29 16:31:35 +01:00
Rob Murray
cd53b7380c Remove generated MAC addresses on restart.
The MAC address of a running container was stored in the same place as
the configured address for a container.

When starting a stopped container, a generated address was treated as a
configured address. If that generated address (based on an IPAM-assigned
IP address) had been reused, the containers ended up with duplicate MAC
addresses.

So, remember whether the MAC address was explicitly configured, and
clear it if not.

Signed-off-by: Rob Murray <rob.murray@docker.com>
2024-01-22 17:52:20 +00:00
Rob Murray
a8f7c5ee48 Detect IPv6 support in containers.
Some configuration in a container depends on whether it has support for
IPv6 (including default entries for '::1' etc in '/etc/hosts').

Before this change, the container's support for IPv6 was determined by
whether it was connected to any IPv6-enabled networks. But that can
change over time; it isn't a property of the container itself.

So, instead, detect IPv6 support by looking for '::1' on the container's
loopback interface. It will not be present if the kernel does not have
IPv6 support, or the user has disabled it in new namespaces by other
means.

Once IPv6 support has been determined for the container, its '/etc/hosts'
is re-generated accordingly.

The daemon no longer disables IPv6 on all interfaces during initialisation.
It now disables IPv6 only for interfaces that have not been assigned an
IPv6 address. (But, even if IPv6 is disabled for the container using the
sysctl 'net.ipv6.conf.all.disable_ipv6=1', interfaces connected to IPv6
networks still get IPv6 addresses that appear in the internal DNS. There's
more to-do!)

Signed-off-by: Rob Murray <rob.murray@docker.com>
2024-01-19 20:24:07 +00:00
Rob Murray
27f3abd893 Allow overlapping change in bridge's IPv6 network.
Calculate the IPv6 addresses needed on a bridge, then reconcile them
with the addresses on an existing bridge by deleting then adding as
required.

(Previously, required addresses were added one-by-one, then unwanted
addresses were removed. This meant the daemon failed to start if, for
example, an existing bridge had address '2000:db8::/64' and the config
was changed to '2000:db8::/80'.)

IPv6 addresses are now calculated and applied in one go, so there's no
need for setupVerifyAndReconcile() to check the set of IPv6 addresses on
the bridge. It was also guarded by !config.InhibitIPv4, which can't have
been right. So, its IPv6 parts have been removed, and IPv4 has been added to its name.

Link local addresses, the example given in the original ticket, are now
released when containers are stopped. Not releasing them meant that
when using an LL subnet on the default bridge, no container could be
started after a container was stopped (because the calculated address
could not be re-allocated). In non-default bridge networks using an
LL subnet, addresses leaked.

Linux always uses the standard 'fe80::/64' LL network. So, if a bridge
is configured with an LL subnet prefix that overlaps with it, a config
error is reported. Non-overlapping LL subnet prefixes are allowed.

Signed-off-by: Rob Murray <rob.murray@docker.com>
2023-12-18 16:10:41 +00:00
Sebastiaan van Stijn
58785c2932
integration/networking: fix TestBridgeICC
This test broke in 98323ac114.

That commit renamed WithMacAddress to WithContainerWideMacAddress.
This helper sets the MacAddress field in container.Config. However, API
v1.44 now ignores this field if the NetworkMode has no matching entry in
EndpointsConfig.

This fix uses the helper WithMacAddress and specifies which
EndpointConfig the MacAddress applies to.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-11-08 10:23:24 +01:00
Albin Kerouanton
c1ab6eda4b
integration/networking: Test bridge ICC and INC
The following tests are implemented in this commit:

- Inter-container communications for internal and non-internal
  bridge networks, over IPv4 and IPv6.
- Inter-container communications using IPv6 link-local addresses for
  internal and non-internal bridge networks.
- Inter-network communications for internal and non-internal bridge
  networks, over IPv4 and IPv6, are disallowed.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-11-03 09:58:50 +01:00
Albin Kerouanton
409ea700c7
integration: Add a new networking integration test suite
This commit introduces a new integration test suite aimed at testing
networking features like inter-container communication, network
isolation, port mapping, etc., and how they interact with daemon-level
and network-level parameters.

So far, there are almost no tests making sure our networks are correctly
configured: 1. there are a few tests for port mapping, but they don't
cover all use cases; 2. there are a few tests that check whether a specific
iptables rule exists, but they can't catch that rule being wrong in the
first place.

As we're planning to refactor how iptables rules are written, and change
some of them to fix known security issues, we need a way to test all
combinations of parameters. So far, this was done by hand, which is
particularly painful and time-consuming. As such, this new test suite is
foundational to upcoming work.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-11-03 09:58:50 +01:00