Non-swarm networks created before network-creation-time validation
was added in 25.0.0 continued working, because the checks are not
re-run.
But, swarm creates networks when needed (with 'agent=true'), to
ensure they exist on each agent - ignoring the NetworkNameError
that says the network already existed.
By ignoring validation errors on creation of a network with
agent=true, pre-existing swarm networks with IPAM config that would
fail the new checks will continue to work too.
New swarm (overlay) networks are still validated, because they are
initially created with 'agent=false'.
Signed-off-by: Rob Murray <rob.murray@docker.com>
The API's EndpointConfig struct has a MacAddress field that's used for
both the configured address, and the current address (which may be generated).
A configured address must be restored when a container is restarted, but a
generated address must not.
The previous attempt to differentiate between the two, without adding a field
to the API's EndpointConfig that would show up in 'inspect' output, was a
field in the daemon's version of EndpointSettings, MACOperational. It did
not work, MACOperational was set to true when a configured address was
used. So, while it ensured addresses were regenerated, it failed to preserve
a configured address.
So, this change removes that code, and adds DesiredMacAddress to the wrapped
version of EndpointSettings, where it is persisted but does not appear in
'inspect' results. Its value is copied from MacAddress (the API field) when
a container is created.
Signed-off-by: Rob Murray <rob.murray@docker.com>
The semantics of an "anonymous" endpoint has always been weird: it was
set on endpoints which name shouldn't be taken into account when
inserting DNS records into libnetwork's `Controller.svcRecords` (and
into the NetworkDB). However, in that case the endpoint's aliases would
still be used to create DNS records; thus, making those "anonymous
endpoints" not so anonymous.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Instead of special-casing anonymous endpoints in libnetwork, let the
daemon specify what (non fully qualified) DNS names should be associated
to container's endpoints.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
They just happen to exist on a network that doesn't support DNS-based
service discovery (ie. no embedded DNS servers are started for them).
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
This struct is intended for internal use only for the backend, and is
not intended to be used externally.
This moves the plugin-related `NetworkListConfig` types to the backend
package to prevent it being imported in the client, and to make it more
clear that this is part of internal APIs, and not public-facing.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
DNS config is a property of each adapter on Windows, thus we've a
dedicated `EndpointOption` for that.
The list of `EndpointOption` that should be applied to a given endpoint
is built by `buildCreateEndpointOptions`. This function contains a
seemingly flawed condition that adds the DNS config _iff_:
1. the network isn't internal ;
2. no ports are published / exposed through another sandbox endpoint ;
While 1. does make sense, there's actually no justification for 2.,
hence this commit remove this part of the condition.
This logic flaw has been made obvious by 0fd0e82, but it was originally
introduced by d1e0a78. Commit and PR comments don't mention why this is
done like so. Most probably, this was overlooked both by the original
author and the PR reviewers.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
The `buildCreateEndpointOptions` does a lot of things to build the list
of `libnetwork.EndpointOption` from the `EndpointSettings` spec. To skip
ports-related options, an early return was put in the middle of that
function body.
Early returns are generally great, but put in the middle of a 150-loc
long function that does a lot, they're just a potential footgun. And I'm
the one who pulled the trigger in 052562f. Since this commit, generic
options won't be applied to endpoints if there's already one with
exposed/published ports. As a consequence, only the first endpoint can
have a user-defined MAC address right now.
Instead of moving up the code line that adds generic options, a better
change IMO is to move ports-related options, and the early-return gating
those options, to a dedicated func to make `buildCreateEndpointOptions`
slightly easier to read and reason about.
There was actually one oddity in the original
`buildCreateEndpointOptions`: the early-return also gates the addition
of `CreateOptionDNS`. These options are Windows-specific; a comment is
added to explain that. But the oddity is really: why are we checking if
an endpoint with exposed / published ports joined this sandbox to decide
whether we want to configure DNS server on the endpoint's adapter? Well,
this early-return was most probably overlooked by the original author
and by reviewers at the time these options were added (in commit d1e0a78)
Let's fix that in a follow-up commit.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
I was trying to find out why `docker info` was sometimes slow so
plumbing a context through to propagate trace data through.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Having a sandbox/container-wide MacAddress field makes little sense
since a container can be connected to multiple networks at the same
time. This field is an artefact of old times where a container could be
connected to a single network only.
As we now have a way to specify per-endpoint mac address, this field is
now deprecated.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Prior to this commit, only container.Config had a MacAddress field and
it's used only for the first network the container connects to. It's a
relic of old times where custom networks were not supported.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
The github.com/containerd/containerd/log package was moved to a separate
module, which will also be used by upcoming (patch) releases of containerd.
This patch moves our own uses of the package to use the new module.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Fixes#18864, #20648, #33561, #40901.
[This GH comment][1] makes clear network name uniqueness has never been
enforced due to the eventually consistent nature of Classic Swarm
datastores:
> there is no guaranteed way to check for duplicates across a cluster of
> docker hosts.
And this is further confirmed by other comments made by @mrjana in that
same issue, eg. [this one][2]:
> we want to adopt a schema which can pave the way in the future for a
> completely decentralized cluster of docker hosts (if scalability is
> needed).
This decentralized model is what Classic Swarm was trying to be. It's
been superseded since then by Docker Swarm, which has a centralized
control plane.
To circumvent this drawback, the `NetworkCreate` endpoint accepts a
`CheckDuplicate` flag. However it's not perfectly reliable as it won't
catch concurrent requests.
Due to this design decision, API clients like Compose have to implement
workarounds to make sure names are really unique (eg.
docker/compose#9585). And the daemon itself has seen a string of issues
due to that decision, including some that aren't fixed to this day (for
instance moby/moby#40901):
> The problem is, that if you specify a network for a container using
> the ID, it will add that network to the container but it will then
> change it to reference the network by using the name.
To summarize, this "feature" is broken, has no practical use and is a
source of pain for Docker users and API consumers. So let's just remove
it for _all_ API versions.
[1]: https://github.com/moby/moby/issues/18864#issuecomment-167201414
[2]: https://github.com/moby/moby/issues/18864#issuecomment-167202589
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
PR 4f47013feb added a validation step to `NetworkCreate` to ensure
no IPv6 subnet could be set on a network if its `EnableIPv6` parameter
is false.
Before that, the daemon was accepting such request but was doing nothing
with the IPv6 subnet.
This validation step is now deleted, and we automatically set
`EnableIPv6` if an IPv6 subnet was specified.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
The daemon would pass an EndpointCreateOption to set the interface MAC
address if the network name and the provided network mode were matching.
Obviously, if the network mode is a network ID, it won't work. To make
things worse, the network mode is never normalized if it's a partial ID.
To fix that: 1. the condition under what the container's mac-address is
applied is updated to also match the full ID; 2. the network mode is
normalized to a full ID when it's only a partial one.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Define consts for the Actions we use for events, instead of "ad-hoc" strings.
Having these consts makes it easier to find where specific events are triggered,
makes the events less error-prone, and allows documenting each Action (if needed).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Currently, IPAM config is never validated by the API. Some checks
are done by the CLI, but they're not exhaustive. And some of these
misconfigurations might be caught early by libnetwork (ie. when the
network is created), and others only surface when connecting a container
to a misconfigured network. In both cases, the API would return a 500.
Although the `NetworkCreate` endpoint might already return warnings,
these are never displayed by the CLI. As such, it was decided during a
maintainer's call to return validation errors _for all API versions_.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Now that we removed the interface, there's no need to cast the Network
to a NetworkInfo interface, so we can remove uses of the `Info()` method.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
PR moby/moby#45759 is going to use the new `errors.Join` function to
return a list of validation errors.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
store network.Name() in a variable to reduce repeatedly locking/unlocking
of the network (although this is very, very minimal in the grand scheme
of things).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- store network.Name() in a variable to reduce repeatedly locking/unlocking
of the network (although this is very, very minimal in the grand scheme
of things).
- un-wrap long conditions
- ever so slightly optimise some conditions by changeing the order of checks.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This code was initializing a new PortBinding, and creating a deep copy
for each binding. It's unclear what the intent was here, but at least
PortBinding.GetCopy() wasn't adding much value, as it created a new
PortBinding, [copying all values from the original][1], which includes
a [copy of IPAddresses in it][2]. Our original "template" did not have any
of that, so let's forego that, and just create new PortBindings as we go.
[1]: 454b6a7cf5/libnetwork/types/types.go (L110-L120)
[2]: 454b6a7cf5/libnetwork/types/types.go (L236-L244)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
These were not adding much, so just getting rid of them. Also added a
TODO to move this code to the type.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Move variables closer to where they're used instead of defining them all
at the start of the function.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
`getPortMapInfo` does many things; it creates a copy of all the sandbox
endpoints, gets the driver, endpoints, and network from store, and creates
port-bindings for all exposed and mapped ports.
We should look if we can create a more minimal implementation for this
purpose, but in the meantime, let's prevent it being called if we don't
need it by making it the second condition in the check.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- don't initialize slices; it's not needed to append to them
- store network-ID in a var to prevent repeated lock/unlocking in nw.ID()
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Now that all helper functions are updated, we can use a struct-literal
for this function, which makes it slightly easier to read.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Split the buildDetailedNetworkResources function into separate functions for
collecting container attachments (`buildContainerAttachments`) and service
attachments (`buildServiceAttachments`). This allows us to get rid of the
"verbose" bool, and makes the logic slightly more transparent.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- Pass the endpoint and endpoint-info, instead of individual fields from the
endpoint.
- Remove redundant nil-check, as it's already checked on the call-side
in `buildDetailedNetworkResources`, which skips endpoints without info.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Make the function return the constructed network.IPAM instead of applying
it to a network struct, and rename it to "buildIPAMResources".
Rewrite the function itself:
- Use struct-literals where possible to make it slightly more readable.
- Use a boolean (hasIPv4Config, hasIPv6Config) for both IPv4 and IPv6 to
check whether the IPAM-info needs to be added. This makes the logic the
same for both, and makes the processing order-independent. This also
allows for the `network.IpamInfo()` call to be skipped if it's not needed.
- Change order of "ipv4 config / ipv4 info" and "ipv6 config / ipv4 info"
blocks to make it slightly clearer (and to allow skipping the forementioned
call to `network.IpamInfo()`).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Move the length-check into the function, and change the code to
be a basic type-case, as networkdb.PeerInfo and network.PeerInfo
are identical types.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
With this change, the API will now return a 403 instead of a 500 when
trying to create an overlay network on a non-manager node.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
The commit befff0e13f inadvertendly
disabled the error returned when trying to create an overlay network on
a node which is not part of a Swarm cluster.
Since commit e3708a89cc the overlay
netdriver returns the error: `no VNI provided`.
This commit reinstate the original error message by checking if the node
is a manager before calling libnetwork's `controller.NewNetwork()`.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
The existing runtimes reload logic went to great lengths to replace the
directory containing runtime wrapper scripts as atomically as possible
within the limitations of the Linux filesystem ABI. Trouble is,
atomically swapping the wrapper scripts directory solves the wrong
problem! The runtime configuration is "locked in" when a container is
started, including the path to the runC binary. If a container is
started with a runtime which requires a daemon-managed wrapper script
and then the daemon is reloaded with a config which no longer requires
the wrapper script (i.e. some args -> no args, or the runtime is dropped
from the config), that container would become unmanageable. Any attempts
to stop, exec or otherwise perform lifecycle management operations on
the container are likely to fail due to the wrapper script no longer
existing at its original path.
Atomically swapping the wrapper scripts is also incompatible with the
read-copy-update paradigm for reloading configuration. A handler in the
daemon could retain a reference to the pre-reload configuration for an
indeterminate amount of time after the daemon configuration has been
reloaded and updated. It is possible for the daemon to attempt to start
a container using a deleted wrapper script if a request to run a
container races a reload.
Solve the problem of deleting referenced wrapper scripts by ensuring
that all wrapper scripts are *immutable* for the lifetime of the daemon
process. Any given runtime wrapper script must always exist with the
same contents, no matter how many times the daemon config is reloaded,
or what changes are made to the config. This is accomplished by using
everyone's favourite design pattern: content-addressable storage. Each
wrapper script file name is suffixed with the SHA-256 digest of its
contents to (probabilistically) guarantee immutability without needing
any concurrency control. Stale runtime wrapper scripts are only cleaned
up on the next daemon restart.
Split the derived runtimes configuration from the user-supplied
configuration to have a place to store derived state without mutating
the user-supplied configuration or exposing daemon internals in API
struct types. Hold the derived state and the user-supplied configuration
in a single struct value so that they can be updated as an atomic unit.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Ensure data-race-free access to the daemon configuration without
locking by mutating a deep copy of the config and atomically storing
a pointer to the copy into the daemon-wide configStore value. Any
operations which need to read from the daemon config must capture the
configStore value only once and pass it around to guarantee a consistent
view of the config.
Signed-off-by: Cory Snider <csnider@mirantis.com>