Currently there is an instance of the controller and service locks being
obtained in different orders, which causes an AB/BA deadlock. Never
wrap the controller lock around the service lock.
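As a hedged illustration (the mutex names below are hypothetical
stand-ins, not the actual libnetwork fields), this is the classic
inversion: two goroutines take the same pair of locks in opposite
orders and each blocks forever on the lock the other holds.

    package locking

    import "sync"

    var (
        controllerMu sync.Mutex // hypothetical stand-in for the controller lock
        serviceMu    sync.Mutex // hypothetical stand-in for the service lock
    )

    // Deadlock-prone: if one goroutine runs lockControllerFirst while
    // another runs lockServiceFirst, each acquires its first lock and
    // then waits forever for the lock the other already holds.
    func lockControllerFirst() {
        controllerMu.Lock()
        defer controllerMu.Unlock()
        serviceMu.Lock() // B after A
        defer serviceMu.Unlock()
    }

    func lockServiceFirst() {
        serviceMu.Lock()
        defer serviceMu.Unlock()
        controllerMu.Lock() // A after B: the AB/BA inversion
        defer controllerMu.Unlock()
    }

The fix is to pick one global order and acquire the locks in that order
everywhere; per the rule above, the controller lock must never be taken
while the service lock is wanted next.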
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
With port redirect in the ingress path happening before ipvs in the
ingress sandbox, there is a chance of a 5-tuple collision in the ipvs
connection table when two entirely different services have different
PublishedPorts but the same TargetPort. To disambiguate the ipvs
connection table, delay the port redirect from PublishedPort to
TargetPort until after the loadbalancing has happened in ipvs. To be
specific, perform the redirect after the packet enters the real backend
container namespace.
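A hedged sketch of the kind of redirect this implies, assuming an
iptables REDIRECT rule applied inside the backend container's network
namespace (the exact chain and matcher libnetwork uses may differ, and
entering the namespace is elided):

    package ingress

    import (
        "fmt"
        "os/exec"
    )

    // redirectToTargetPort rewrites PublishedPort to TargetPort on the
    // way into the backend container, after ipvs in the ingress sandbox
    // has already load balanced on the PublishedPort. Run inside the
    // backend container's netns; ports are illustrative.
    func redirectToTargetPort(publishedPort, targetPort uint16) error {
        return exec.Command("iptables", "-t", "nat", "-A", "PREROUTING",
            "-p", "tcp", "--dport", fmt.Sprint(publishedPort),
            "-j", "REDIRECT", "--to-ports", fmt.Sprint(targetPort)).Run()
    }

Because ipvs now sees the distinct PublishedPorts rather than a shared
TargetPort, the two services no longer produce colliding 5-tuples.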
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
Currently, a reference counting scheme is used to reference count all
individual port configs that need to be plumbed in the ingress, to make
sure that when a service with the same set of port configs is getting
added or removed, the port config plumbing is not accidentally removed
if the add/remove notifications come out of order. This same reference
counting scheme is also used for plumbing the port-based marking rules.
But marking rules should not be plumbed based on it, because marks are
always different for different instantiations of the same service. So
fix the code to plumb port-based mark rules based on the complete set
of port configs, while plumbing pure port rules and proxies based on a
set of port configs filtered by the reference count.
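A hedged sketch of the reference-count filter (types and names here are
illustrative, not the actual libnetwork code): port rules and proxies
take only the configs whose refcount actually transitions, while mark
rules bypass the filter and always use the complete set.

    package ingress

    // portConfig is a simplified stand-in for the real port config type.
    type portConfig struct {
        Proto         string
        TargetPort    uint32
        PublishedPort uint32
    }

    var portRefCount = map[portConfig]int{}

    // filterPortConfigs returns only the port configs whose plumbing
    // must actually change: on add, configs seen for the first time; on
    // remove, configs whose last reference just went away.
    func filterPortConfigs(pcs []portConfig, remove bool) []portConfig {
        var out []portConfig
        for _, pc := range pcs {
            if remove {
                portRefCount[pc]--
                if portRefCount[pc] > 0 {
                    continue // still referenced by another service instance
                }
                delete(portRefCount, pc)
            } else {
                portRefCount[pc]++
                if portRefCount[pc] > 1 {
                    continue // already plumbed for an earlier instance
                }
            }
            out = append(out, pc)
        }
        return out
    }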
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
This also allows published services to be accessible from containers on
bridge networks on the host.
Signed-off-by: Santhosh Manohar <santhosh@docker.com>
When leaving the entire gossip cluster or when leaving a network
specific gossip cluster, we may not have had a chance to clean up
service bindings by way of gossip updates due to premature closure of
the gossip channel. Make sure to clean up all service bindings since we
are not participating in the cluster any more.
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
The SNAT rules added for LB egress are too broad and break load
balancing if the service is connected to multiple networks. Make them
conditional on the subnet to which the network belongs so that the
right SNAT rule gets matched when egressing the corresponding network.
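As a hedged illustration (the rule form and addresses are assumptions,
not the actual libnetwork rules), scoping the egress SNAT to each
network's subnet looks roughly like:

    package ingress

    import "os/exec"

    // addEgressSNAT installs one SNAT rule per connected network,
    // matching only traffic destined for that network's subnet, so the
    // correct source IP is picked when egressing each network.
    func addEgressSNAT(subnet, sourceIP string) error {
        return exec.Command("iptables", "-t", "nat", "-A", "POSTROUTING",
            "-d", subnet, "-j", "SNAT", "--to-source", sourceIP).Run()
    }

For a service on two networks this would be called once per network,
e.g. addEgressSNAT("10.0.1.0/24", "10.0.1.5") and
addEgressSNAT("10.0.2.0/24", "10.0.2.5"), instead of one broad rule
that matches everything.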
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
Make service loadbalancing work from within one of the containers of
the service. Currently this only works when the loadbalancer selects
the current container. If another container of the same service is
chosen, the connection times out. This fix adds a SNAT rule to change
the source IP to the container's primary IP so that responses can be
routed back to this container.
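A minimal sketch of such a rule, assuming the xt_ipvs iptables match is
available to select only ipvs-load-balanced traffic (whether libnetwork
uses exactly this match and scope is an assumption):

    package ingress

    import "os/exec"

    // addHairpinSNAT rewrites the source of ipvs-load-balanced traffic
    // to the container's primary IP, so a different backend of the same
    // service can route its replies back to the originating container.
    func addHairpinSNAT(vipSubnet, primaryIP string) error {
        return exec.Command("iptables", "-t", "nat", "-A", "POSTROUTING",
            "-m", "ipvs", "--ipvs", "-d", vipSubnet,
            "-j", "SNAT", "--to-source", primaryIP).Run()
    }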
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
The ingress loadbalancer only needs to be plumbed in the ingress
sandboxes of nodes, since those sandboxes are the only mechanism to get
traffic from outside the cluster to tasks. Since the tasks are part of
the ingress network, these loadbalancers were getting added in all
tasks which expose ports, which is totally unnecessary resource usage.
This PR avoids that.
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
Sometimes you may get stale backend removal notices from gossip due to
some lingering state. If a stale backend notice is received and it has
already been processed on this node, ignore it rather than processing
it again.
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
While scaling down, currently we remove the service record even if the
LB entry for the vip is not fully removed. This causes resolution
issues during scale down. Fix it by removing the service record only if
the LB for the vip is going away.
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
While trying to update loadbalancer state, index the service both on id
and portconfig. From libnetwork's point of view a service is not just
defined by its id but also by the ports it exposes. When a service
updates its ports, its id remains the same but its portconfigs change,
and that should be treated as a new service in libnetwork in order to
ensure proper cleanup of old LB state and creation of new LB state.
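A hedged sketch of such a composite index (types and encoding are
illustrative, not the actual libnetwork representation): two
instantiations of the same service id with different port configs must
map to different keys, and therefore to separate LB state.

    package service

    import (
        "fmt"
        "sort"
        "strings"
    )

    // portConfig is a simplified stand-in for the real port config type.
    type portConfig struct {
        Proto         string
        TargetPort    uint32
        PublishedPort uint32
    }

    // serviceKey indexes LB state on both the service id and a
    // canonical encoding of its port configs.
    type serviceKey struct {
        id    string
        ports string
    }

    func makeServiceKey(id string, pcs []portConfig) serviceKey {
        enc := make([]string, 0, len(pcs))
        for _, pc := range pcs {
            enc = append(enc, fmt.Sprintf("%s/%d/%d",
                pc.Proto, pc.TargetPort, pc.PublishedPort))
        }
        sort.Strings(enc) // order-independent encoding
        return serviceKey{id: id, ports: strings.Join(enc, ",")}
    }

A port update then produces a fresh key: old LB state is cleaned up
under the old key and new state is created under the new one.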
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
Currently even outgoing connection requests are matched and injected
into the DOCKER-INGRESS chain. This is not correct because it disrupts
access to services outside the host on the same service port. Instead,
inject only locally destined packets into the DOCKER-INGRESS chain.
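A hedged sketch of the tightened jump rule, using the iptables addrtype
match to restrict it to locally destined packets (the exact chain
placement is an assumption):

    package ingress

    import "os/exec"

    // addIngressJump sends only packets addressed to a local address
    // into DOCKER-INGRESS; outbound connections to external hosts on
    // the same port no longer match.
    func addIngressJump() error {
        return exec.Command("iptables", "-t", "nat", "-A", "OUTPUT",
            "-m", "addrtype", "--dst-type", "LOCAL",
            "-j", "DOCKER-INGRESS").Run()
    }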
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
When adding a loadbalancer to a sandbox, the sandbox may have a valid
namespace but it might not have populated all the dependent network
resources yet. In that case do not populate that endpoint's loadbalancer
into that sandbox yet. The loadbalancer will be populated into the
sandbox when it is done populating all the dependent network resources.
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
Right now, if no vip is provided, the service records for the backend
ip are added only when a new loadbalancer is created. But this should
happen every time a backend is added, so that DNS RR on the service
name works.
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
When a goroutine which is adding the service and another which is
adding just a destination interleave, the destination, which depends on
the service, may not get added, leaving the service working at reduced
scale. The fix is to synchronize this with the service mutex.
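A hedged sketch of the synchronization (names are illustrative): the
service add and the destination add take the same mutex, so a
destination can never be programmed ahead of the service it depends on.

    package service

    import "sync"

    type loadBalancer struct {
        mu       sync.Mutex // the "service mutex"
        created  bool
        backends []string
    }

    // addBackend serializes service creation and destination addition
    // under one mutex, so two interleaving goroutines cannot leave a
    // destination without its service.
    func (lb *loadBalancer) addBackend(ip string) {
        lb.mu.Lock()
        defer lb.mu.Unlock()
        if !lb.created {
            // create the ipvs service first (elided)
            lb.created = true
        }
        // only now is it safe to add the destination (elided)
        lb.backends = append(lb.backends, ip)
    }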
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
- Moved ingress port forwarding rules to their own chain
- Flushed the chain during init
- Bound to the swarm ports so nothing else hijacks them (see the sketch
  below)
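A hedged sketch of the chain setup, with the chain name and rule
placement as assumptions; holding the swarm published ports with an
actual listener, so no other process can bind them, is elided:

    package ingress

    import "os/exec"

    // setupIngressChain creates a dedicated chain for the ingress port
    // forwarding rules, flushes it on init so stale rules from a
    // previous run disappear, and jumps to it for local traffic.
    func setupIngressChain() error {
        // -N fails when the chain already exists; that's fine, we flush next.
        exec.Command("iptables", "-t", "nat", "-N", "DOCKER-INGRESS").Run()
        if err := exec.Command("iptables", "-t", "nat",
            "-F", "DOCKER-INGRESS").Run(); err != nil {
            return err
        }
        return exec.Command("iptables", "-t", "nat", "-I", "PREROUTING",
            "-m", "addrtype", "--dst-type", "LOCAL",
            "-j", "DOCKER-INGRESS").Run()
    }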
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
Also do not log error messages when adding a destination that already
exists. This can happen because of duplicate gossip notifications.
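A minimal sketch of that tolerance, assuming the ipvs netlink layer
surfaces an EEXIST errno for an already-present destination (an
assumption about the error value):

    package service

    import "syscall"

    // ignoreExists swallows EEXIST from an ipvs "add destination" call:
    // duplicate gossip notifications can legitimately replay an add
    // that was already processed, and that is not worth an error log.
    func ignoreExists(err error) error {
        if err == syscall.EEXIST {
            return nil
        }
        return err
    }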
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
Ingress load balancing is achieved via a service sandbox which acts as
a proxy, translating incoming node port requests and mapping them to a
service entry. Once the right service is identified, the same internal
loadbalancer implementation is used to load balance to the right
backend instance.
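One common way to wire this node-port-to-service mapping (whether
libnetwork does exactly this is an assumption) is to mark packets
arriving on the published port and key the ipvs service on that
firewall mark:

    package ingress

    import (
        "fmt"
        "os/exec"
    )

    // markNodePort tags packets arriving on the published node port
    // with a per-service firewall mark in the ingress sandbox; an
    // fwmark-based ipvs service then load balances them to a backend.
    func markNodePort(publishedPort, fwMark uint32) error {
        return exec.Command("iptables", "-t", "mangle", "-A", "PREROUTING",
            "-p", "tcp", "--dport", fmt.Sprint(publishedPort),
            "-j", "MARK", "--set-mark", fmt.Sprint(fwMark)).Run()
    }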
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
This PR adds support for loadbalancing across a group of endpoints that
share the same service configuration as passed in by
`OptionService`. The loadbalancer is implemented using ipvs with just
round robin scheduling supported for now.
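A minimal sketch against the libnetwork ipvs netlink wrapper (field and
function names follow that package as I understand it; treat the exact
API and the addresses as assumptions):

    package main

    import (
        "net"
        "syscall"

        "github.com/docker/libnetwork/ipvs"
    )

    func main() {
        // Handle onto the kernel ipvs state of the current netns.
        h, err := ipvs.New("")
        if err != nil {
            panic(err)
        }
        // One ipvs service per VIP, round robin ("rr") scheduling.
        svc := &ipvs.Service{
            AddressFamily: syscall.AF_INET,
            Protocol:      syscall.IPPROTO_TCP,
            Address:       net.ParseIP("10.0.0.2"), // VIP, illustrative
            Port:          80,
            SchedName:     ipvs.RoundRobin,
        }
        if err := h.NewService(svc); err != nil {
            panic(err)
        }
        // One destination per endpoint sharing the service config.
        dst := &ipvs.Destination{
            AddressFamily: syscall.AF_INET,
            Address:       net.ParseIP("10.0.0.5"), // backend, illustrative
            Port:          80,
            Weight:        1,
        }
        if err := h.NewDestination(svc, dst); err != nil {
            panic(err)
        }
    }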
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>