Prior to release 25.0.0, the bridge in an internal network was assigned
an IP address - making the internal network accessible from the host,
giving containers on the network access to anything listening on the
bridge's address (or INADDR_ANY on the host).
This change restores that behaviour. It does not restore the default
route that was configured in the container, because packets sent outside
the internal network's subnet have always been dropped. So, a 'connect()'
to an address outside the subnet will still fail fast.
Signed-off-by: Rob Murray <rob.murray@docker.com>
Since 964ab7158c, we explicitly set the bridge MTU if it was specified.
Unfortunately, kernel <v4.17 have a check preventing us to manually set
the MTU to anything greater than 1500 if no links is attached to the
bridge, which is how we do things -- create the bridge, set its MTU and
later on, attach veths to it.
Relevant kernel commit: 804b854d37
As we still have to support CentOS/RHEL 7 (and their old v3.10 kernels)
for a few more months, we need to ignore EINVAL if the MTU is > 1500
(but <= 65535).
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Previous commit made getDBhandle a one-liner returning a struct
member -- making it useless. Inline it.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
This parameter was used to tell the boltdb kvstore not to open/close
the underlying boltdb db file before/after each get/put operation.
Since d21d0884ae, we've a single datastore instance shared by all
components that need it. That commit set `PersistConnection=true`.
We can now safely remove this param altogether, and remove all the
code that was opening and closing the db file before and after each
operation -- it's dead code!
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
This test is non-representative of what we now do in libnetwork.
Since the ability of opening the same boltdb database multiple
times in parallel will be dropped in the next commit, just remove
this test.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Containers attached to an 'internal' bridge network are unable to
communicate when the host is running firewalld.
Non-internal bridges are added to a trusted 'docker' firewalld zone, but
internal bridges were not.
DOCKER-ISOLATION iptables rules are still configured for an internal
network, they block traffic to/from addresses outside the network's subnet.
Signed-off-by: Rob Murray <rob.murray@docker.com>
File paths can contain commas, particularly paths returned from
t.TempDir() in subtests which include commas in their names. There is
only one datastore provider and it only supports a single address, so
the only use of parsing the address is to break tests in mysterious
ways.
Signed-off-by: Cory Snider <csnider@mirantis.com>
The bbolt library wants exclusive access to the boltdb file and uses
file locking to assure that is the case. The controller and each network
driver that needs persistent storage instantiates its own unique
datastore instance, backed by the same boltdb file. The boltdb kvstore
implementation works around multiple access to the same boltdb file by
aggressively closing the boltdb file between each transaction. This is
very inefficient. Have the controller pass its datastore instance into
the drivers and enable the PersistConnection option to disable closing
the boltdb between transactions.
Set data-dir in unit tests which instantiate libnetwork controllers so
they don't hang trying to lock the default boltdb database file.
Signed-off-by: Cory Snider <csnider@mirantis.com>
This is a follow-up to 2cf230951f, adding
more directives to adjust for some new code added since:
Before this patch:
make -C ./internal/gocompat/
GO111MODULE=off go generate .
GO111MODULE=on go mod tidy
GO111MODULE=on go test -v
# github.com/docker/docker/internal/sliceutil
internal/sliceutil/sliceutil.go:3:12: type parameter requires go1.18 or later (-lang was set to go1.16; check go.mod)
internal/sliceutil/sliceutil.go:3:14: predeclared comparable requires go1.18 or later (-lang was set to go1.16; check go.mod)
internal/sliceutil/sliceutil.go:4:19: invalid map key type T (missing comparable constraint)
# github.com/docker/docker/libnetwork
libnetwork/endpoint.go:252:17: implicit function instantiation requires go1.18 or later (-lang was set to go1.16; check go.mod)
# github.com/docker/docker/daemon
daemon/container_operations.go:682:9: implicit function instantiation requires go1.18 or later (-lang was set to go1.16; check go.mod)
daemon/inspect.go:42:18: implicit function instantiation requires go1.18 or later (-lang was set to go1.16; check go.mod)
With this patch:
make -C ./internal/gocompat/
GO111MODULE=off go generate .
GO111MODULE=on go mod tidy
GO111MODULE=on go test -v
=== RUN TestModuleCompatibllity
main_test.go:321: all packages have the correct go version specified through //go:build
--- PASS: TestModuleCompatibllity (0.00s)
PASS
ok gocompat 0.031s
make: Leaving directory '/go/src/github.com/docker/docker/internal/gocompat'
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Commit 8b7af1d0f added some code to update the DNSNames of all
endpoints attached to a sandbox by loading a new instance of each
affected endpoints from the datastore through a call to
`Network.EndpointByID()`.
This method then calls `Network.getEndpointFromStore()`, that in
turn calls `store.GetObject()`, which then calls `cache.get()`,
which calls `o.CopyTo(kvObject)`. This effectively creates a fresh
new instance of an Endpoint. However, endpoints are already kept in
memory by Sandbox, meaning we now have two in-memory instances of
the same Endpoint.
As it turns out, libnetwork is built around the idea that no two objects
representing the same thing should leave in-memory, otherwise breaking
mutex locking and optimistic locking (as both instances will have a drifting
version tracking ID -- dbIndex in libnetwork parliance).
In this specific case, this bug materializes by container rename failing
when applied a second time for a given container. An integration test is
added to make sure this won't happen again.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
When resolving names in swarm mode, services with exposed ports are
connected to user overlay network, ingress network, and local (docker_gwbridge)
networks. Name resolution should prioritize returning the VIP/IPs on user
overlay network over ingress and local networks.
Sandbox.ResolveName implemented this by taking the list of endpoints,
splitting the list into 3 separate lists based on the type of network
that the endpoint was attached to (dynamic, ingress, local), and then
creating a new list, applying the networks in that order.
This patch refactors that logic to use a custom sorter (sort.Interface),
which makes the code more transparent, and prevents iterating over the
list of endpoints multiple times.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Permit container network attachments to set any static IP address within
the network's IPAM master pool, including when a subpool is configured.
Users have come to depend on being able to statically assign container
IP addresses which are guaranteed not to collide with automatically-
assigned container addresses.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Some configuration in a container depends on whether it has support for
IPv6 (including default entries for '::1' etc in '/etc/hosts').
Before this change, the container's support for IPv6 was determined by
whether it was connected to any IPv6-enabled networks. But, that can
change over time, it isn't a property of the container itself.
So, instead, detect IPv6 support by looking for '::1' on the container's
loopback interface. It will not be present if the kernel does not have
IPv6 support, or the user has disabled it in new namespaces by other
means.
Once IPv6 support has been determined for the container, its '/etc/hosts'
is re-generated accordingly.
The daemon no longer disables IPv6 on all interfaces during initialisation.
It now disables IPv6 only for interfaces that have not been assigned an
IPv6 address. (But, even if IPv6 is disabled for the container using the
sysctl 'net.ipv6.conf.all.disable_ipv6=1', interfaces connected to IPv6
networks still get IPv6 addresses that appear in the internal DNS. There's
more to-do!)
Signed-off-by: Rob Murray <rob.murray@docker.com>
For some time, when adding an interface with no IPv6 address (an
interface to a network that does not have IPv6 enabled), we've been
disabling IPv6 on that interface.
As part of a separate change, I'm removing that logic - there's nothing
wrong with having IPv6 enabled on an interface with no routable address.
The difference is that the kernel will assign a link-local address.
TestAddRemoveInterface does this...
- Assign an IPv6 link-local address to one end of a veth interface, and
add it to a namespace.
- Add a bridge with no assigned IPv6 address to the namespace.
- Remove the veth interface from the namespace.
- Put the veth interface back into the namespace, still with an
explicitly assigned IPv6 link local address.
When IPv6 is disabled on the bridge interface, the test passes.
But, when IPv6 is enabled, the bridge gets a kernel assigned link-local
address.
Then, when re-adding the veth interface, the test generates an error in
'osl/interface_linux.go:checkRouteConflict()'. The conflict is between
the explicitly assigned fe80::2 on the veth, and a route for fe80::/64
belonging to the bridge.
So, in preparation for not-disabling IPv6 on these interfaces, use a
unique-local address in the test instead of link-local.
I don't think that changes the intent of the test.
With the change to not-always disable IPv6, it is possible to repro the
problem with a real container, disconnect and re-connect a user-defined
network with '--subnet fe80::/64' while the container's connected to an
IPv4 network. So, strictly speaking, that will be a regression.
But, it's also possible to repro the problem in master, by disconnecting
and re-connecting the fe80::/64 network while another IPv6 network is
connected. So, I don't think it's a problem we need to address, perhaps
other than by prohibiting '--subnet fe80::/64'.
Signed-off-by: Rob Murray <rob.murray@docker.com>
A check was added to the bridge driver to detect when it was called to
create the default bridge nw whereas a stale default bridge already
existed. In such case, the bridge driver was deleting the stale network
before re-creating it. This check was introduced in docker/libnetwork@6b158eac6a
to fix an issue related to newly introduced live-restore.
However, since commit docker/docker@ecffb6d58c,
the daemon doesn't even try to create default networks if there're
active sandboxes (ie. due to live-restore).
Thus, now it's impossible for the default bridge network to be stale and
to exists when the driver's CreateNetwork() method is called. As such,
the check introduced in the first commit mentioned above is dead code
and can be safely removed.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
This function never returned an error, and was not matching an interface, so
remove the error-return.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
When mapping a port with the userland-proxy enabled, the daemon would
perform an "exec.LookPath" for every mapped port (which, in case of
a range of ports, would be for every port in the range).
This was both inefficient (looking up the binary for each port), inconsistent
(when running in rootless-mode, the binary was looked-up once), as well as
inconvenient, because a missing binary, or a mis-configureed userland-proxy-path
would not be detected daeemon startup, and not produce an error until starting
the container;
docker run -d -P nginx:alpine
4f7b6589a1680f883d98d03db12203973387f9061e7a963331776170e4414194
docker: Error response from daemon: driver failed programming external connectivity on endpoint romantic_wiles (7cfdc361821f75cbc665564cf49856cf216a5b09046d3c22d5b9988836ee088d): fork/exec docker-proxy: no such file or directory.
However, the container would still be created (but invalid);
docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
869f41d7e94f nginx:alpine "/docker-entrypoint.…" 10 seconds ago Created romantic_wiles
This patch changes how the userland-proxy is configured;
- The path of the userland-proxy is now looked up / configured at daemon
startup; this is similar to how the proxy is configured in rootless-mode.
- A warning is logged when failing to lookup the binary.
- If the daemon is configured with "userland-proxy" enabled, an error is
produced, and the daemon will refuse to start.
- The "proxyPath" argument for newProxyCommand() (in libnetwork/portmapper)
is now required to be set. It no longer looks up the executable, and
produces an error if no path was provided. While this change was not
required, it makes the daemon config the canonical source of truth, instead
of logic spread accross multiplee locations.
Some of this logic is a change of behavior, but these changes were made with
the assumption that we don't want to support;
- installing the userland proxy _after_ the daemon was started
- moving the userland proxy (or installing a proxy with a higher
preference in PATH)
With this patch:
Validating the config produces an error if the binary is not found:
dockerd --validate
WARN[2023-12-29T11:36:39.748699591Z] failed to lookup default userland-proxy binary error="exec: \"docker-proxy\": executable file not found in $PATH"
userland-proxy is enabled, but userland-proxy-path is not set
Disabling userland-proxy prints a warning, but validates as "OK":
dockerd --userland-proxy=false --validate
WARN[2023-12-29T11:38:30.752523879Z] ffailed to lookup default userland-proxy binary error="exec: \"docker-proxy\": executable file not found in $PATH"
configuration OK
Speficying a non-absolute path produces an error:
dockerd --userland-proxy-path=docker-proxy --validate
invalid userland-proxy-path: must be an absolute path: docker-proxy
Befor this patch, we would not validate this path, which would allow the daemon
to start, but fail to map a port;
docker run -d -P nginx:alpine
4f7b6589a1680f883d98d03db12203973387f9061e7a963331776170e4414194
docker: Error response from daemon: driver failed programming external connectivity on endpoint romantic_wiles (7cfdc361821f75cbc665564cf49856cf216a5b09046d3c22d5b9988836ee088d): fork/exec docker-proxy: no such file or directory.
Specifying an invalid userland-proxy-path produces an error as well:
dockerd --userland-proxy-path=/usr/local/bin/no-such-binary --validate
userland-proxy-path is invalid: stat /usr/local/bin/no-such-binary: no such file or directory
mkdir -p /usr/local/bin/not-a-file
dockerd --userland-proxy-path=/usr/local/bin/not-a-file --validate
userland-proxy-path is invalid: exec: "/usr/local/bin/not-a-file": is a directory
touch /usr/local/bin/not-an-executable
dockerd --userland-proxy-path=/usr/local/bin/not-an-executable --validate
userland-proxy-path is invalid: exec: "/usr/local/bin/not-an-executable": permission denied
Same when using the daemon.json config-file;
echo '{"userland-proxy-path":"no-such-binary"}' > /etc/docker/daemon.json
dockerd --validate
unable to configure the Docker daemon with file /etc/docker/daemon.json: merged configuration validation from file and command line flags failed: invalid userland-proxy-path: must be an absolute path: no-such-binary
dockerd --userland-proxy-path=hello --validate
unable to configure the Docker daemon with file /etc/docker/daemon.json: the following directives are specified both as a flag and in the configuration file: userland-proxy-path: (from flag: hello, from file: /usr/local/bin/docker-proxy)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The cleanup function never returns an error, so didn't add much value. This
patch removes the closure, and calls it inline to remove the extra
indirection, and removes the error which would never be returned.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The defer was set after the switch, but various code-paths inside the switch
could return with an error after the port was allocated / reserved, which
could result in those ports not being released.
This patch moves the defer into each individual branch of the switch to set
it immediately after succesfully reserving the port.
We can also remove a redundant ReleasePort from the cleanup function, as
it's only called if an error occurs, and the defers already take care of
that.
Note that the cleanup function was handling errors returned by ReleasePort,
but this function never returns an error, so it was fully redundant.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Prevent accidentally shadowing the error, which is used in a defer.
Also re-format the code to make it more clear we're not acting on
a locally-scoped "allocatedHostPort" variable.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
If IPv6 is enabled for a bridge network, by the time configuration
is applied, the bridge will always have an address. Assert that, by
raising an error when the configuration is validated.
Use that to simplify the logic used to calculate which addresses
should be assigned to a bridge. Also remove a redundant check in
setupGatewayIPv6() and the error associated with it.
Fix unit tests that enabled IPv6, but didn't supply an IPv6 IPAM
address/pool. Before this change, these tests passed but silently
left the bridge without an IPv6 address.
(The daemon already ensured there was an IPv6 address, this change
does not add a new restriction on config at that level.)
Signed-off-by: Rob Murray <rob.murray@docker.com>
Some checks in 'networkConfiguration.Validate()' were not running as
expected, they'd always pass - because 'parseNetworkOptions()' called
it before 'config.processIPAM()' had added IP addresses and gateways.
Signed-off-by: Rob Murray <rob.murray@docker.com>
No more concept of "anonymous endpoints". The equivalent is now an
endpoint with no DNSNames set.
Some of the code removed by this commit was mutating user-supplied
endpoint's Aliases to add container's short ID to that list. In order to
preserve backward compatibility for the ContainerInspect endpoint, this
commit also takes care of adding that short ID (and the container
hostname) to `EndpointSettings.Aliases` before returning the response.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>