Commit graph

78 commits

Author SHA1 Message Date
Flavio Crisciani
e353e7e3f0
Fixes for resolv.conf
Handle the case of systemd-resolved, and if in place
use a different resolv.conf source.
Set appropriately the option on libnetwork.
Move unix specific code to container_operation_unix

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2018-07-26 11:17:56 -07:00
Brian Goff
cc8f358c23 Move network operations out of container package
These network operations really don't have anything to do with the
container but rather are setting up the networking.

Ideally these wouldn't get shoved into the daemon package, but doing
something else (e.g. extract a network service into a new package) but
there's a lot more work to do in that regard.
In reality, this probably simplifies some of that work as it moves all
the network operations to the same place.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-05-10 17:16:00 -04:00
Chris Telfer
c27417aa7d Remove (now) extra call to sb.DisableService()
This call was added as part of commit a042e5a20 and at the time was
useful.  sandbox.DisableService() basically calls
endpoint.deleteServiceInfoFromCluster() for every endpoint in the
sandbox.  However, with the libnetwork change, endpoint.sbLeave()
invokes endpoint.deleteServiceInfoFromCluster(). The releaseNetwork()
call invokes sandbox.Delete() immediately after
sandbox.DisableService().  The sandbox.Delete() in turn ultimately
invokes endpoint.sbLeave() for every endpoint in the sandbox which thus
removes the endpoint's load balancing entry via
endpoint.deleteServiceInfoFromCluster().  So the call to
sandbox.DisableService() is now redundant.

It is noteworthy that, while redundant, the presence of the call would
not cause errors.  It would just be sub-optimal.  The DisableService()
call would cause libnetwork to down-weight the load balancing entries
while the call to sandbox.Delete() would cause it to remove the entries
immediately afterwards.  Aside from the wasted computation, the extra
call would also propagate an extra state change in the networkDB gossip
messages.  So, overall, it is much better to just avoid the extra
overhead.

Signed-off-by: Chris Telfer <ctelfer@docker.com>
2018-03-28 14:16:31 -04:00
Daniel Nephin
4f0d95fa6e Add canonical import comment
Signed-off-by: Daniel Nephin <dnephin@docker.com>
2018-02-05 16:51:57 -05:00
Brian Goff
c379d2681f Fix race in attachable network attachment
Attachable networks are networks created on the cluster which can then
be attached to by non-swarm containers. These networks are lazily
created on the node that wants to attach to that network.

When no container is currently attached to one of these networks on a
node, and then multiple containers which want that network are started
concurrently, this can cause a race condition in the network attachment
where essentially we try to attach the same network to the node twice.

To easily reproduce this issue you must use a multi-node cluster with a
worker node that has lots of CPUs (I used a 36 CPU node).

Repro steps:

1. On manager, `docker network create -d overlay --attachable test`
2. On worker, `docker create --restart=always --network test busybox
top`, many times... 200 is a good number (but not much more due to
subnet size restrictions)
3. Restart the daemon

When the daemon restarts, it will attempt to start all those containers
simultaneously. Note that you could try to do this yourself over the API,
but it's harder to trigger due to the added latency from going over
the API.

The error produced happens when the daemon tries to start the container
upon allocating the network resources:

```
attaching to network failed, make sure your network options are correct and check manager logs: context deadline exceeded
```

What happens here is the worker makes a network attachment request to
the manager. This is an async call which in the happy case would cause a
task to be placed on the node, which the worker is waiting for to get
the network configuration.
In the case of this race, the error ocurrs on the manager like this:

```
task allocation failure" error="failed during network allocation for task n7bwwwbymj2o2h9asqkza8gom: failed to allocate network IP for task n7bwwwbymj2o2h9asqkza8gom network rj4szie2zfauqnpgh4eri1yue: could not find an available IP" module=node node.id=u3489c490fx1df8onlyfo1v6e
```

The task is not created and the worker times out waiting for the task.

---

The mitigation for this is to make sure that only one attachment reuest
is in flight for a given network at a time *when the network doesn't
already exist on the node*. If the network already exists on the node
there is no need for synchronization because the network is already
allocated and on the node so there is no need to request it from the
manager.

This basically comes down to a race with `Find(network) ||
Create(network)` without any sort of syncronization.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-02-02 13:46:23 -05:00
Yong Tang
d63a5a1ff5 Fix network alias issue
This fix tries to address the issue raised in 33661 where
network alias does not work when connect to a network the second time.

This fix address the issue.

This fix fixes 33661.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
2018-01-23 01:04:33 +00:00
Abhinandan Prativadi
a042e5a20a Disable service on release network
This PR contains a fix for moby/moby#30321. There was a moby/moby#31142
PR intending to fix the issue by adding a delay between disabling the
service in the cluster and the shutdown of the tasks. However
disabling the service was not deleting the service info in the cluster.
Added a fix to delete service info from cluster and verified using siege
to ensure there is zero downtime on rolling update of a service.In order
to support it and ensure consitency of enabling and disable service knob
from the daemon, we need to ensure we disable service when we release
the network from the container. This helps in making the enable and
disable service less racy. The corresponding part of libnetwork fix is
part of docker/libnetwork#1824

Signed-off-by: abhi <abhi@docker.com>
2018-01-17 14:19:51 -08:00
Vincent Demeester
be14665210
Merge pull request #36021 from yongtang/30897-follow-up
Rename FindUniqueNetwork to FindNetwork
2018-01-16 09:38:16 +01:00
Yong Tang
ccc2ed0189 Rename FindUniqueNetwork to FindNetwork
This fix is a follow up to 30397, with `FindUniqueNetwork`
changed to `FindNetwork` based on the review feedback.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
2018-01-15 17:34:40 +00:00
Brian Goff
d453fe35b9 Move api/errdefs to errdefs
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-01-11 21:21:43 -05:00
Brian Goff
87a12421a9 Add helpers to create errdef errors
Instead of having to create a bunch of custom error types that are doing
nothing but wrapping another error in sub-packages, use a common helper
to create errors of the requested type.

e.g. instead of re-implementing this over and over:

```go
type notFoundError struct {
  cause error
}

func(e notFoundError) Error() string {
  return e.cause.Error()
}

func(e notFoundError) NotFound() {}

func(e notFoundError) Cause() error {
  return e.cause
}
```

Packages can instead just do:

```
  errdefs.NotFound(err)
```

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-01-11 21:21:43 -05:00
Yong Tang
cafed80cd0 Update FindUniqueNetwork to address network name duplications
This fix is part of the effort to address 30242 where
issue arise because of the fact that multiple networks
may share the same name (within or across local/swarm scopes).

The focus of this fix is to allow creation of service
when a network in local scope has the same name as the
service network.

An integration test has been added.

This fix fixes 30242.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
2018-01-06 01:55:28 +00:00
Brian Goff
ebcb7d6b40 Remove string checking in API error handling
Use strongly typed errors to set HTTP status codes.
Error interfaces are defined in the api/errors package and errors
returned from controllers are checked against these interfaces.

Errors can be wraeped in a pkg/errors.Causer, as long as somewhere in the
line of causes one of the interfaces is implemented. The special error
interfaces take precedence over Causer, meaning if both Causer and one
of the new error interfaces are implemented, the Causer is not
traversed.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2017-08-15 16:01:11 -04:00
Derek McGowan
1009e6a40b
Update logrus to v1.0.1
Fixes case sensitivity issue

Signed-off-by: Derek McGowan <derek@mcgstyle.net>
2017-07-31 13:16:46 -07:00
Madhan Raj Mookkandy
349913ce9f Include Endpoint List for Shared Endpoints
Do not allow sharing of container network with hyperv containers

Signed-off-by: Madhan Raj Mookkandy <madhanm@microsoft.com>
2017-07-06 12:19:17 -07:00
Fabio Kung
37addf0a50 Net operations already hold locks to containers
Fix a deadlock caused by re-entrant locks on container objects.

Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
2017-06-23 07:52:35 -07:00
Fabio Kung
a43be3431e avoid re-reading json files when copying containers
Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
2017-06-23 07:52:34 -07:00
Fabio Kung
edad52707c save deep copies of Container in the replica store
Reuse existing structures and rely on json serialization to deep copy
Container objects.

Also consolidate all "save" operations on container.CheckpointTo, which
now both saves a serialized json to disk, and replicates state to the
ACID in-memory store.

Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
2017-06-23 07:52:33 -07:00
Fabio Kung
aacddda89d Move checkpointing to the Container object
Also hide ViewDB behind an inteface.

Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
2017-06-23 07:52:32 -07:00
Fabio Kung
eed4c7b73f keep a consistent view of containers rendered
Replicate relevant mutations to the in-memory ACID store. Readers will
then be able to query container state without locking.

Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
2017-06-23 07:52:31 -07:00
Brian Goff
4d0888e32b Lock container while connecting to a new network.
`ConnectToNetwork` is modfying the container but is not locking the
object.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2017-05-31 15:13:04 -04:00
Alessandro Boch
2418f25767 Do not error out on serv bind deactivation if no sbox is found
- If the nw sbox is not there, then there is nothing to deactivate.

Signed-off-by: Alessandro Boch <aboch@docker.com>
2017-04-10 09:13:41 -07:00
Alessandro Boch
4ca7d4f0c1 Fix start/restart of detached container
Signed-off-by: Alessandro Boch <aboch@docker.com>
2017-03-22 02:38:26 -07:00
Ryan Liu
786f30107b Fix nw sandbox leak when stopping detached container
Signed-off-by: Ryan Liu <ryanlyy@me.com>
2017-03-21 23:51:52 -07:00
Madhan Raj Mookkandy
040afcce8f (*) Support --net:container:<containername/id> for windows
(*) (vdemeester) Removed duplicate code across Windows and Unix wrt Net:Containers
(*) Return unsupported error for network sharing for hyperv isolation containers

Signed-off-by: Madhan Raj Mookkandy <MadhanRaj.Mookkandy@microsoft.com>
2017-02-28 20:03:43 -08:00
Vincent Demeester
40f390e67e Merge pull request #31384 from allencloud/validate-extrahosts-in-deamon-side
validate extraHosts in daemon side
2017-02-28 18:28:10 +01:00
Vincent Demeester
cb6832c6d3
Extract common code from disconnectFromNetwork and releaseNetwork
Both method are trying to detach the container from a cluster
network. The code is exactly the same, this removes the duplication.

Signed-off-by: Vincent Demeester <vincent@sbr.pm>
2017-02-28 11:11:59 +01:00
allencloud
d524dd95cc validate extraHosts in daemon side
Signed-off-by: allencloud <allen.sun@daocloud.io>
2017-02-28 10:37:59 +08:00
Vincent Demeester
db63f9370e
Extract daemon configuration and discovery to their own package
This also moves some cli specific in `cmd/dockerd` as it does not
really belong to the `daemon/config` package.

Signed-off-by: Vincent Demeester <vincent@sbr.pm>
2017-02-08 09:53:38 +01:00
Vincent Demeester
c0a1d2e0d8 Merge pull request #30117 from msabansal/natfix
Added support for dns-search and fixes #30102
2017-01-31 11:05:29 +01:00
Zhang Wei
827bbe90a0 Fix some typos
Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
2017-01-19 15:29:28 +08:00
msabansal
e6962481a0 Added support for dns-search and fixes #30102
Signed-off-by: msabansal <sabansal@microsoft.com>
2017-01-13 12:01:10 -08:00
allencloud
847de59934 fix nit in comments
Signed-off-by: allencloud <allen.sun@daocloud.io>
2017-01-08 21:32:30 +08:00
allencloud
f0844de8f0 display network name when disconnecting network error
Signed-off-by: allencloud <allen.sun@daocloud.io>
2016-12-27 13:37:54 +08:00
Yong Tang
b0a7b0120f Fix issue for --hostname when running in "--net=host"
This fix tries to address the issue raised in 29129 where
"--hostname" not working when running in "--net=host" for
`docker run`.

The fix fixes the issue by not resetting the `container.Config.Hostname`
if the `Hostname` has already been assigned through `--hostname`.

An integration test has been added to cover the changes.

This fix fixes 29129.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
2016-12-06 07:29:45 -08:00
Justin Cormack
cd5c8e9c2d Fix grammar on error message
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
2016-11-25 14:58:20 +00:00
Sebastiaan van Stijn
9a61bd05f8 Merge pull request #27466 from mrjana/net
Retry AttachNetwork when it fails to find network
2016-11-08 18:25:45 +01:00
Dong Chen
ca81f6ee7c dynamic service binding.
Signed-off-by: Dong Chen <dongluo.chen@docker.com>
2016-11-04 21:50:56 -07:00
Madhu Venugopal
5f17e0f6c9 Handle NetworkDettach for the case of network-id
When a container is attached to an "--attachable" network, it strictly
forms the attacherKey using either the network-id or network-name
because at the time of attachment, the daemon may not have the network
downloaded locally from the manager. Hence, when the NetworkDettach is
called, it should use either network-name or network-id. This fix
addresses the missing network-id based dettachment case.

Signed-off-by: Madhu Venugopal <madhu@docker.com>
2016-11-03 15:56:35 -07:00
Jana Radhakrishnan
849e345e2c Retry AttachNetwork when it fails to find network
When trying to attach to swarm scope network for an unmanaged container
sometimes even if attaching to network succeeds, we may not find the
network because some other container which was using the network went
down and removed the network. So if it is not found, try to detach and
reattach to re-download the network from the manager.

Fixes #26588

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-11-01 21:46:24 -07:00
Madhu Venugopal
c5dd4d70c6 Copy only the relevant endpoint configs from Attachable config
When a container is run on a --attachable network, the endpoint
configs passed by the user were incorrectly overwritten.
Copy the relevant configs instead of overwriting the entire configs.

Signed-off-by: Madhu Venugopal <madhu@docker.com>
2016-10-29 17:11:30 -07:00
Michael Crosby
3343d234f3 Add basic prometheus support
This adds a metrics packages that creates additional metrics.  Add the
metrics endpoint to the docker api server under `/metrics`.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Add metrics to daemon package

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

api: use standard way for metrics route

Also add "type" query parameter

Signed-off-by: Alexander Morozov <lk4d4@docker.com>

Convert timers to ms

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-10-27 10:34:38 -07:00
Yanqiang Miao
2d126f190d Delete a redundant error return
Signed-off-by: Yanqiang Miao <miao.yanqiang@zte.com.cn>

update

Signed-off-by: Yanqiang Miao <miao.yanqiang@zte.com.cn>

update

Signed-off-by: Yanqiang Miao <miao.yanqiang@zte.com.cn>
2016-10-22 08:53:57 +08:00
John Howard
600f0ad211 Windows: Factor out unused fields in container
Signed-off-by: John Howard <jhoward@microsoft.com>
2016-10-13 14:51:10 -07:00
Yanqiang Miao
9c3d1236d2 Delete the redundant function 'errClusterNetworkOnRun'
Signed-off-by: Yanqiang Miao <miao.yanqiang@zte.com.cn>
2016-09-24 11:24:48 +08:00
Sebastiaan van Stijn
8643903e49 Merge pull request #26805 from miaoyq/refactor-allocateNetwork
Replace two array with a map type, make it easier to understand.
2016-09-23 10:15:37 +02:00
Madhu Venugopal
d3139fc84a Merge pull request #25987 from msabansal/dnssupport
Support for Windows service discovery
2016-09-22 20:56:03 -07:00
Yanqiang Miao
1989b1b58c Replace two array with a map type, make it easier to understand.
Signed-off-by: Yanqiang Miao <miao.yanqiang@zte.com.cn>

update

Signed-off-by: Yanqiang Miao <miao.yanqiang@zte.com.cn>

update

Signed-off-by: Yanqiang Miao <miao.yanqiang@zte.com.cn>
2016-09-23 09:52:05 +08:00
msabansal
d1e0a78614 Changes required to support windows service discovery
Signed-off-by: msabansal <sabansal@microsoft.com>
2016-09-22 12:21:21 -07:00
msabansal
50f02b585c Fixed support for docker compose by allowing connect/disconnect on stopped containers
Signed-off-by: msabansal <sabansal@microsoft.com>
2016-09-21 13:29:17 -07:00