No IPAM IPv6 address is given to an interface in a network with
'--ipv6=false', but the kernel would assign a link-local address and,
in a macvlan/ipvlan network, the interface may get a SLAAC-assigned
address.
So, disable IPv6 on the interface to avoid that.
Signed-off-by: Rob Murray <rob.murray@docker.com>
Partially reverts 0046b16 "daemon: set libnetwork sandbox key w/o OCI hook"
Running SetKey to store the OCI Sandbox key after task creation, rather
than from the OCI prestart hook, meant it happened after sysctl settings
were applied by the runtime - which was the intention, we wanted to
complete Sandbox configuration after IPv6 had been disabled by a sysctl
if that was going to happen.
But, it meant '--sysctl' options for a specfic network interface caused
container task creation to fail, because the interface is only moved into
the network namespace during SetKey.
This change restores the SetKey prestart hook, and regenerates config
files that depend on the container's support for IPv6 after the task has
been created. It also adds a regression test that makes sure it's possible
to set an interface-specfic sysctl.
Signed-off-by: Rob Murray <rob.murray@docker.com>
Some configuration in a container depends on whether it has support for
IPv6 (including default entries for '::1' etc in '/etc/hosts').
Before this change, the container's support for IPv6 was determined by
whether it was connected to any IPv6-enabled networks. But, that can
change over time, it isn't a property of the container itself.
So, instead, detect IPv6 support by looking for '::1' on the container's
loopback interface. It will not be present if the kernel does not have
IPv6 support, or the user has disabled it in new namespaces by other
means.
Once IPv6 support has been determined for the container, its '/etc/hosts'
is re-generated accordingly.
The daemon no longer disables IPv6 on all interfaces during initialisation.
It now disables IPv6 only for interfaces that have not been assigned an
IPv6 address. (But, even if IPv6 is disabled for the container using the
sysctl 'net.ipv6.conf.all.disable_ipv6=1', interfaces connected to IPv6
networks still get IPv6 addresses that appear in the internal DNS. There's
more to-do!)
Signed-off-by: Rob Murray <rob.murray@docker.com>
For some time, when adding an interface with no IPv6 address (an
interface to a network that does not have IPv6 enabled), we've been
disabling IPv6 on that interface.
As part of a separate change, I'm removing that logic - there's nothing
wrong with having IPv6 enabled on an interface with no routable address.
The difference is that the kernel will assign a link-local address.
TestAddRemoveInterface does this...
- Assign an IPv6 link-local address to one end of a veth interface, and
add it to a namespace.
- Add a bridge with no assigned IPv6 address to the namespace.
- Remove the veth interface from the namespace.
- Put the veth interface back into the namespace, still with an
explicitly assigned IPv6 link local address.
When IPv6 is disabled on the bridge interface, the test passes.
But, when IPv6 is enabled, the bridge gets a kernel assigned link-local
address.
Then, when re-adding the veth interface, the test generates an error in
'osl/interface_linux.go:checkRouteConflict()'. The conflict is between
the explicitly assigned fe80::2 on the veth, and a route for fe80::/64
belonging to the bridge.
So, in preparation for not-disabling IPv6 on these interfaces, use a
unique-local address in the test instead of link-local.
I don't think that changes the intent of the test.
With the change to not-always disable IPv6, it is possible to repro the
problem with a real container, disconnect and re-connect a user-defined
network with '--subnet fe80::/64' while the container's connected to an
IPv4 network. So, strictly speaking, that will be a regression.
But, it's also possible to repro the problem in master, by disconnecting
and re-connecting the fe80::/64 network while another IPv6 network is
connected. So, I don't think it's a problem we need to address, perhaps
other than by prohibiting '--subnet fe80::/64'.
Signed-off-by: Rob Murray <rob.murray@docker.com>
The github.com/containerd/containerd/log package was moved to a separate
module, which will also be used by upcoming (patch) releases of containerd.
This patch moves our own uses of the package to use the new module.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
It's still not "great", but implement a `newInterface()` constructor
to create a new Interface instance, instead of creating a partial
instance and applying "options" after the fact.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
We're only using the results if the interface doesn't have an address
yet, so skip this step if we don't use it.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Flatten some nested "if"-statements, and improve error.
Errors returned by this function are not handled, and only logged, so
make them more informative if debugging is needed.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
They were not consistently used, and the locations where they were
used were already "setters", so we may as well inline the code.
Also updating Namespace.Restore to keep the lock slightly longer,
instead of locking/unlocking for each property individually, although
we should consider to keep the long for the duration of the whole
function to make it more atomic.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Make the mutex internal to the Namespace; locking/unlocking should not
be done externally, and this makes it easier to see where it's used.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Interface.Remove() was directly accessing Namespace "internals", such
as locking/unlocking. Move the code from Interface.Remove() into the
Namespace instead.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- store linkIndex in a local variable so that it can be reused
- remove / rename some intermediate vars that shadowed existing declaration
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This argument was originally added in libnetwork:
03f440667f
At the time, this argument was conditional, but currently it's always set
to "true", so let's remove it.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The code ignores these errors, but will unconditionally print a warning;
> If the kernel deletion fails for the neighbor entry still remote it
> from the namespace cache. Otherwise if the neighbor moves back to the
> same host again, kernel update can fail.
Let's reduce noise if the neighbor wasn't found, to prevent logs like:
Aug 16 13:26:35 master1.local dockerd[4019880]: time="2023-08-16T13:26:35.186662370+02:00" level=warning msg="error while deleting neighbor entry" error="no such file or directory"
Aug 16 13:26:35 master1.local dockerd[4019880]: time="2023-08-16T13:26:35.366585939+02:00" level=warning msg="error while deleting neighbor entry" error="no such file or directory"
Aug 16 13:26:42 master1.local dockerd[4019880]: time="2023-08-16T13:26:42.366658513+02:00" level=warning msg="error while deleting neighbor entry" error="no such file or directory"
While changing this code, also slightly rephrase the code-comment, and
fix a typo ("remote -> remove").
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
libnetwork/osl: Namespace.DeleteNeighbor: rephrase code-comment
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The test-file had a duplicate definition for ErrNotImplemented, which
caused an error in this package, and was not used otherwise, so we can
remove this file.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
There's only one implementation; let's use that.
Also fixing a linting issue;
libnetwork/osl/interface_linux.go:91:2: S1001: should use copy(to, from) instead of a loop (gosimple)
for i, iface := range n.iFaces {
^
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
InterfaceOptions() returned an IfaceOptionSetter interface, which contained
"methods" that returned functional options. Such a construct could have made
sense if the functional options returned would (e.g.) be pre-propagated with
information from the Sandbox (network namespace), but none of that was the case.
There was only one implementation of IfaceOptionSetter (networkNamespace),
which happened to be the same as the only implementation of Sandbox, so remove
the interface as well, to help networkNamespace with its multi-personality
disorder.
This patch:
- removes Sandbox.Bridge() and makes it a regular function (WithIsBridge)
- removes Sandbox.Master() and makes it a regular function (WithMaster)
- removes Sandbox.MacAddress() and makes it a regular function (WithMACAddress)
- removes Sandbox.Address() and makes it a regular function (WithIPv4Address)
- removes Sandbox.AddressIPv6() and makes it a regular function (WithIPv6Address)
- removes Sandbox.LinkLocalAddresses() and makes it a regular function (WithLinkLocalAddresses)
- removes Sandbox.Routes() and makes it a regular function (WithRoutes)
- removes Sandbox.InterfaceOptions().
- removes the IfaceOptionSetter interface.
Note that the IfaceOption signature was changes as well to allow returning
an error. This is not currently used, but will be used for some options
in the near future, so adding that in preparation.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
NeighborOptions() returned an NeighborOptionSetter interface, which
contained "methods" that returned functional options. Such a construct
could have made sense if the functional options returned would (e.g.)
be pre-propagated with information from the Sandbox (network namespace),
but none of that was the case.
There was only one implementation of NeighborOptionSetter (networkNamespace),
which happened to be the same as the only implementation of Sandbox, so
remove the interface as well, to help networkNamespace with its multi-personality
disorder.
This patch:
- removes Sandbox.LinkName() and makes it a regular function (WithLinkName)
- removes Sandbox.Family() and makes it a regular function (WithFamily)
- removes Sandbox.NeighborOptions().
- removes the NeighborOptionSetter interface
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Copying the descriptions from the Sandbox, Info, NeighborOptionSetter,
and IfaceOptionSetter interfaces that it implements.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
"Pay no attention to the implementation behind the curtain!"
There's only one implementation of the Sandbox interface, and only one implementation
of the Info interface, and they both happens to be implemented by the same type:
networkNamespace. Let's merge these interfaces.
And now that we know that there's one, and only one Info, we can drop the charade,
and relieve the Sandbox from its dual personality.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The basepath is only used on Linux, so no need to call it on other
platforms. SetBasePath was already stubbed out on other platforms,
but "osl" was still imported in various places where it was not actually
used, so trying to reduce imports to get a better picture of what parts
are used (and not used).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The mutex is only used on reads, but there's nothing protecting writes,
and it looks like nothing is mutating fields after creation, so let's
remove this altogether.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
No context in the commit that added it, but PR discussion shows that
the API was mostly exploratory, and it was 8 Years go, so let's not
head in that direction :) b646784859
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The method to restore a network namespace takes a collection of
interfaces to restore with the options to apply. The interface names are
structured data, tuples of (SrcName, DstPrefix) but for whatever reason
are being passed into Restore() serialized to strings. A refactor,
f0be4d126d, accidentally broke the
serialization by dropping the delimiter. Rather than fix the
serialization and leave the time-bomb for someone else to trip over,
pass the interface names as structured data.
Signed-off-by: Cory Snider <csnider@mirantis.com>
osl.setIPv6 mistakenly captured the calling goroutine's thread's network
namespace instead of the network namespace of the thread getting its
namespace temporarily changed. As this function appears to only be
called from contexts in the process's initial network namespace, this
mistake would be of little consequence at runtime. The libnetwork unit
tests, on the other hand, unshare network namespaces so as not to
interfere with each other or the host's network namespace. But due to
this bug, the isolation backfires and the network namespace of
goroutines used by a test which are expected to be in the initial
network namespace can randomly become the isolated network namespace of
some other test. Symptoms include a loopback network server running in
one goroutine being inexplicably and randomly being unreachable by a
client in another goroutine.
Capture the original network namespace of the thread from the thread to
be tampered with, after locking the goroutine to the thread.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Now that most uses of reexec have been replaced with non-reexec
solutions, most of the reexec.Init() calls peppered throughout the test
suites are unnecessary. Furthermore, most of the reexec.Init() calls in
test code neglects to check the return value to determine whether to
exit, which would result in the reexec'ed subprocesses proceeding to run
the tests, which would reexec another subprocess which would proceed to
run the tests, recursively. (That would explain why every reexec
callback used to unconditionally call os.Exit() instead of returning...)
Remove unneeded reexec.Init() calls from test and example code which no
longer needs it, and fix the reexec.Init() calls which are not inert to
exit after a reexec callback is invoked.
Signed-off-by: Cory Snider <csnider@mirantis.com>