Commit graph

5679 commits

Author SHA1 Message Date
Akihiro Suda
9769ef333f
Merge pull request #36224 from dnephin/refactor-commit
Refactor Daemon.Commit()
2018-02-08 21:02:30 +09:00
Tianon Gravi
3a633a712c
Merge pull request #36194 from dnephin/add-canonical-import
Add canonical import path
2018-02-07 13:06:45 -08:00
Daniel Nephin
daff039049 Refactor commit
The goal of this refactor is to make it easier to integrate buildkit
and containerd snapshotters.

Commit is used from two places (api and build), each calls it
with distinct arguments. Refactored to pull out the common commit
logic and provide different interfaces for each consumer.

Signed-off-by: Daniel Nephin <dnephin@docker.com>
2018-02-07 15:09:06 -05:00
Sebastiaan van Stijn
2e8ccbb49e
Merge pull request #36201 from arm64b/oom-kill-disable-fixing
Daemon: passdown the `--oom-kill-disable` option to containerd
2018-02-06 21:39:43 -08:00
John Howard
e62d36bcad
Merge pull request #35414 from madhanrm/hotadd1
Enable HotAdd for Windows
2018-02-06 10:40:39 -08:00
Daniel Nephin
4f0d95fa6e Add canonical import comment
Signed-off-by: Daniel Nephin <dnephin@docker.com>
2018-02-05 16:51:57 -05:00
Yong Tang
6987557e0c
Merge pull request #36191 from cpuguy83/fix_attachable_network_race
Fix race in attachable network attachment
2018-02-05 09:41:35 -08:00
Dennis Chen
44b074d199 Daemon: passdown the --oom-kill-disable option to containerd
Current implementaion of docke daemon doesn't pass down the
`--oom-kill-disable` option specified by the end user to the containerd
when spawning a new docker instance with help from `runc` component, which
results in the `--oom-kill-disable` doesn't work no matter the flag is `true`
or `false`.

This PR will fix this issue reported by #36090

Signed-off-by: Dennis Chen <dennis.chen@arm.com>
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2018-02-05 03:25:59 +00:00
Flavio Crisciani
ec86547244
Libnetwork revendoring
Diff:
5ab4ab8300...20dd462e0a

- Memberlist revendor (fix for deadlock on exit)
- Network diagnostic client
- Fix for ndots configuration

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2018-02-02 14:36:32 -08:00
Brian Goff
c379d2681f Fix race in attachable network attachment
Attachable networks are networks created on the cluster which can then
be attached to by non-swarm containers. These networks are lazily
created on the node that wants to attach to that network.

When no container is currently attached to one of these networks on a
node, and then multiple containers which want that network are started
concurrently, this can cause a race condition in the network attachment
where essentially we try to attach the same network to the node twice.

To easily reproduce this issue you must use a multi-node cluster with a
worker node that has lots of CPUs (I used a 36 CPU node).

Repro steps:

1. On manager, `docker network create -d overlay --attachable test`
2. On worker, `docker create --restart=always --network test busybox
top`, many times... 200 is a good number (but not much more due to
subnet size restrictions)
3. Restart the daemon

When the daemon restarts, it will attempt to start all those containers
simultaneously. Note that you could try to do this yourself over the API,
but it's harder to trigger due to the added latency from going over
the API.

The error produced happens when the daemon tries to start the container
upon allocating the network resources:

```
attaching to network failed, make sure your network options are correct and check manager logs: context deadline exceeded
```

What happens here is the worker makes a network attachment request to
the manager. This is an async call which in the happy case would cause a
task to be placed on the node, which the worker is waiting for to get
the network configuration.
In the case of this race, the error ocurrs on the manager like this:

```
task allocation failure" error="failed during network allocation for task n7bwwwbymj2o2h9asqkza8gom: failed to allocate network IP for task n7bwwwbymj2o2h9asqkza8gom network rj4szie2zfauqnpgh4eri1yue: could not find an available IP" module=node node.id=u3489c490fx1df8onlyfo1v6e
```

The task is not created and the worker times out waiting for the task.

---

The mitigation for this is to make sure that only one attachment reuest
is in flight for a given network at a time *when the network doesn't
already exist on the node*. If the network already exists on the node
there is no need for synchronization because the network is already
allocated and on the node so there is no need to request it from the
manager.

This basically comes down to a race with `Find(network) ||
Create(network)` without any sort of syncronization.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-02-02 13:46:23 -05:00
Tibor Vass
53a58da551
Merge pull request #36160 from kolyshkin/layer-not-retained
daemon.cleanupContainer: nullify container RWLayer upon release
2018-01-31 15:13:00 -08:00
Yong Tang
9247e09944 Fix issue of ExitCode and PID not show up in Task.Status.ContainerStatus
This fix tries to address the issue raised in 36139 where
ExitCode and PID does not show up in Task.Status.ContainerStatus

The issue was caused by `json:",omitempty"` in PID and ExitCode
which interprate 0 as null.

This is confusion as ExitCode 0 does have a meaning.

This fix removes  `json:",omitempty"` in ExitCode and PID,
but changes ContainerStatus to pointer so that ContainerStatus
does not show up at all if no content. If ContainerStatus
does have a content, then ExitCode and PID will show up (even if
they are 0).

This fix fixes 36139.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
2018-01-31 15:35:19 +00:00
Kir Kolyshkin
e9b9e4ace2 daemon.cleanupContainer: nullify container RWLayer upon release
ReleaseRWLayer can and should only be called once (unless it returns
an error), but might be called twice in case of a failure from
`system.EnsureRemoveAll(container.Root)`. This results in the
following error:

> Error response from daemon: driver "XXX" failed to remove root filesystem for YYY: layer not retained

The obvious fix is to set container.RWLayer to nil as soon as
ReleaseRWLayer() succeeds.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2018-01-30 18:50:59 -08:00
Yong Tang
9d61e5c8c1
Merge pull request #36124 from crosbymichael/exec
Use proc/exe for reexec
2018-01-29 11:41:08 -08:00
Vincent Demeester
d093aa0ec3
Merge pull request #36130 from yongtang/36042-secret-config-mode
Fix secret and config mode issue
2018-01-29 10:37:24 -08:00
Yong Tang
03a1df9536
Merge pull request #36114 from Microsoft/jjh/fixdeadlock-lcowdriver-hotremove
LCOW: Graphdriver fix deadlock in hotRemoveVHDs
2018-01-29 09:57:43 -08:00
Yong Tang
3305221eef Fix secret and config mode issue
This fix tries to address the issue raised in 36042
where secret and config are not configured with the
specified file mode.

This fix update the file mode so that it is not impacted
with umask.

Additional tests have been added.

This fix fixes 36042.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
2018-01-28 16:21:41 +00:00
Yong Tang
c9f1807abb
Merge pull request #34992 from allencloud/simplify-shutdowntimeout
simplify codes on calculating shutdown timeout
2018-01-27 18:26:54 -08:00
Sebastiaan van Stijn
924fb0e843
Merge pull request #36095 from yongtang/36083-network-inspect-created-time
Fix issue where network inspect does not show Created time for networks in swarm scope
2018-01-26 17:18:30 -08:00
Michael Crosby
59ec65cd8c Use proc/exe for reexec
You don't need to resolve the symlink for the exec as long as the
process is to keep running during execution.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2018-01-26 14:13:43 -05:00
Michael Crosby
2c05aefc99
Merge pull request #36047 from cpuguy83/graphdriver_improvements
Do not make graphdriver homes private mounts.
2018-01-26 13:54:30 -05:00
Allen Sun
de68ac8393
Simplify codes on calculating shutdown timeout
Signed-off-by: Allen Sun <shlallen1990@gmail.com>
Signed-off-by: Vincent Demeester <vincent@sbr.pm>
2018-01-26 09:18:07 -08:00
John Howard
a44fcd3d27 LCOW: Graphdriver fix deadlock
Signed-off-by: John Howard <jhoward@microsoft.com>
2018-01-26 08:57:52 -08:00
Sebastiaan van Stijn
a8d0e36d03
Merge pull request #36052 from Microsoft/jjh/no-overlay-off-only-one-disk
LCOW: Regular mount if only one layer
2018-01-25 15:46:16 -08:00
Yong Tang
3ca99ac2f4
Merge pull request #36096 from cpuguy83/use_rshared_prop_for_daemon_root
Set daemon root to use shared propagation
2018-01-24 12:24:33 -08:00
Yong Tang
a636ed5ff4
Merge pull request #36078 from mixja/multiline-max-event-processing
awslogs - don't add new lines to maximum sized events
2018-01-24 12:06:49 -08:00
Yong Tang
25e56670cf
Merge pull request #35938 from yongtang/35931-filter-before-since
Fix `before` and `since` filter for `docker ps`
2018-01-24 12:06:19 -08:00
Yong Tang
914ce4fde7
Merge pull request #36077 from yongtang/35752-verifyNetworking
Verify NetworkingConfig to make sure EndpointSettings is not nil
2018-01-24 12:05:58 -08:00
Yong Tang
70a0621f25
Merge pull request #35966 from yongtang/33661-network-alias
Fix network alias issue with `network connect`
2018-01-23 14:56:28 -08:00
Brian Goff
a510192b86 Set daemon root to use shared propagation
This change sets an explicit mount propagation for the daemon root.
This is useful for people who need to bind mount the docker daemon root
into a container.

Since bind mounting the daemon root should only ever happen with at
least `rlsave` propagation (to prevent the container from holding
references to mounts making it impossible for the daemon to clean up its
resources), we should make sure the user is actually able to this.

Most modern systems have shared root (`/`) propagation by default
already, however there are some cases where this may not be so
(e.g. potentially docker-in-docker scenarios, but also other cases).
So this just gives the daemon a little more control here and provides
a more uniform experience across different systems.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-01-23 14:17:08 -08:00
Yong Tang
090c439fb8 Fix issue where network inspect does not show Created time in swarm scope
This fix tries to address the issue raised in 36083 where
`network inspect` does not show Created time if the network is
created in swarm scope.

The issue was that Created was not converted from swarm api.
This fix addresses the issue.

An unit test has been added.

This fix fixes 36083.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
2018-01-23 18:26:51 +00:00
Yong Tang
99cfb5f31a
Merge pull request #36019 from thaJeztah/improve-config-reload
improve daemon config reload; log active configuration
2018-01-22 17:58:25 -08:00
Yong Tang
d63a5a1ff5 Fix network alias issue
This fix tries to address the issue raised in 33661 where
network alias does not work when connect to a network the second time.

This fix address the issue.

This fix fixes 33661.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
2018-01-23 01:04:33 +00:00
Vincent Demeester
ea74dbe907
Merge pull request #35949 from yongtang/34248-carry
Carry #34248 Added tag log option to json-logger and use RawAttrs
2018-01-22 15:02:54 -08:00
Yong Tang
8d2f4cb241 Verify NetworkingConfig to make sure EndpointSettings is not nil
This fix tries to address the issue raised in 35752
where container start will trigger a crash if EndpointSettings is nil.

This fix adds the validation to make sure EndpointSettings != nil

This fix fixes 35752.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
2018-01-22 16:31:10 +00:00
Justin Menga
d3e2d55a3d Don't append new line for maximum sized events
Signed-off-by: Justin Menga <justin.menga@gmail.com>
2018-01-21 14:29:55 +13:00
Sebastiaan van Stijn
8378dcf46d
Log active configuration when reloading
When succesfully reloading the daemon configuration, print a message
in the logs with the active configuration:

    INFO[2018-01-15T15:36:20.901688317Z] Got signal to reload configuration, reloading from: /etc/docker/daemon.json
    INFO[2018-01-14T02:23:48.782769942Z] Reloaded configuration: {"mtu":1500,"pidfile":"/var/run/docker.pid","data-root":"/var/lib/docker","exec-root":"/var/run/docker","group":"docker","deprecated-key-path":"/etc/docker/key.json","max-concurrent-downloads":3,"max-concurrent-uploads":5,"shutdown-timeout":15,"debug":true,"hosts":["unix:///var/run/docker.sock"],"log-level":"info","swarm-default-advertise-addr":"","metrics-addr":"","log-driver":"json-file","ip":"0.0.0.0","icc":true,"iptables":true,"ip-forward":true,"ip-masq":true,"userland-proxy":true,"disable-legacy-registry":true,"experimental":false,"network-control-plane-mtu":1500,"runtimes":{"runc":{"path":"docker-runc"}},"default-runtime":"runc","oom-score-adjust":-500,"default-shm-size":67108864,"default-ipc-mode":"shareable"}

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2018-01-21 00:56:02 +01:00
Sebastiaan van Stijn
6121a8429b
Move reload-related functions to reload.go
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2018-01-21 00:55:49 +01:00
Anusha Ragunathan
c162e8eb41
Merge pull request #35830 from cpuguy83/unbindable_shm
Make container shm parent unbindable
2018-01-19 17:43:30 -08:00
Vincent Demeester
f97256cbf1
Merge pull request #35744 from ndeloof/35702
closes #35702 introduce « exec_die » event
2018-01-19 15:03:50 -08:00
Brian Goff
949ee0e529
Merge pull request #36003 from pradipd/upgrade_fix
Fixing ingress network when upgrading from 17.09 to 17.12.
2018-01-19 15:46:50 -05:00
Vincent Demeester
3c9d023af3
Merge pull request #36051 from Microsoft/jjh/remotefs-read-return-error
LCOW remotefs - return error in Read() implementation
2018-01-19 11:27:13 -08:00
Yong Tang
e77267c5a6 Carry 34248 Added tag log option to json-logger and use RawAttrs
This fix carries PR 34248: Added tag log option to json-logger

This fix changes to use RawAttrs based on review feedback.

This fix fixes 19803, this fix closes 34248.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
2018-01-19 17:51:20 +00:00
bonczj
5f50f4f511 Added tag log option to json-logger
Fixes #19803
Updated the json-logger to utilize the common log option
'tag' that can define container/image information to include
as part of logging.

When the 'tag' log option is not included, there is no change
to the log content via the json-logger. When the 'tag' log option
is included, the tag will be parsed as a template and the result
will be stored within each log entry as the attribute 'tag'.

Update: Removing test added to integration_cli as those have been deprecated.
Update: Using proper test calls (require and assert) in jsonfilelog_test.go based on review.
Update: Added new unit test configs for logs with tag. Updated unit test error checking.
Update: Cleanup check in jsonlogbytes_test.go to match pending changes in PR #34946.
Update: Merging to correct conflicts from PR #34946.

Signed-off-by: bonczj <josh.bonczkowski@gmail.com>
2018-01-19 17:41:19 +00:00
Brian Goff
bb6ce89737
Merge pull request #34859 from Microsoft/jjh/singleimagestore
LCOW: Coalesce daemon stores, allow dual LCOW and WCOW mode
2018-01-19 11:38:30 -05:00
John Howard
6112ad6e7d LCOW remotefs - return error in Read() implementation
Signed-off-by: John Howard <jhoward@microsoft.com>
2018-01-18 17:46:58 -08:00
John Howard
0cba7740d4 Address feedback from Tonis
Signed-off-by: John Howard <jhoward@microsoft.com>
2018-01-18 12:30:39 -08:00
John Howard
420dc4eeb4 LCOW: Regular mount if only one layer
Signed-off-by: John Howard <jhoward@microsoft.com>
2018-01-18 12:01:58 -08:00
Sebastiaan van Stijn
0fa3962b8d
Merge pull request #36030 from cpuguy83/quota_update
Ensure CPU quota/period updates are sent to runc
2018-01-18 19:54:10 +01:00
John Howard
afd305c4b5 LCOW: Refactor to multiple layer-stores based on feedback
Signed-off-by: John Howard <jhoward@microsoft.com>
2018-01-18 08:31:05 -08:00