Commit graph

36683 commits

Author SHA1 Message Date
Sebastiaan van Stijn
42987cab19
Merge pull request #38874 from thaJeztah/small_error_improvements
Minor error cleanups in projectquota
2019-03-14 09:58:08 +01:00
Sebastiaan van Stijn
aa51dcec94
Merge pull request #38868 from justincormack/google-uuid
Switch to google/uuid
2019-03-14 02:19:01 +01:00
Sebastiaan van Stijn
f73dd5fdad
Revert "Adding builder version"
This reverts commit f821f002e5.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-03-14 00:18:46 +01:00
Sebastiaan van Stijn
154d6c5207
Minor error cleanups in projectquota
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-03-13 23:39:38 +01:00
Sebastiaan van Stijn
42ad354e7a
Merge pull request #38870 from dmcgowan/quota-not-permitted-log
Update quota support to treat permission error as not supported
2019-03-13 23:38:37 +01:00
Sebastiaan van Stijn
386b06eacd
vendor containerd/cgroups dbea6f2bd41658b84b00417ceefa416b979cbf10
Relevant changes:

- containerd/containerd#51 Fix empty device type
- containerd/containerd#52 Remove call to unitName
  - Calling unitName incorrectly appends -slice onto the end of the slice cgroup we are looking for
  - addresses containerd/containerd#47 cgroups: cgroup deleted
- containerd/containerd#53 systemd-239+ no longer allows delegate slice
- containerd/containerd#54 Bugfix: can't write to cpuset cgroup
- containerd/containerd#63 Makes Load function more lenient on subsystems' checking
  - addresses containerd/containerd#58 Very strict checking of subsystems' existence while loading cgroup
- containerd/containerd#67 Add functionality for retrieving all tasks of a cgroup
- containerd/containerd#68 Fix net_prio typo
- containerd/containerd#69 Blkio weight/leafWeight pointer value
- containerd/containerd#77 Check for non-active/supported cgroups
  - addresses containerd/containerd#76 unable to find * in controller set: unknown
  - addresses docker/for-linux#545 Raspbian: Error response from daemon: unable to find "net_prio" in controller set: unknown
  - addresses docker/for-linux#552 Error response from daemon: unable to find "cpuacct" in controller set: unknown
  - addresses docker/for-linux#545 Raspbian: Error response from daemon: unable to find "net_prio" in controller set: unknown

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-03-13 21:39:49 +01:00
Sebastiaan van Stijn
69f7263795
vendor containerd client v1.2.5
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-03-13 21:22:13 +01:00
Sebastiaan van Stijn
79f5fbee01
Vendor runc 2b18fe1d885ee5083ef9f0838fee39b62d653e30
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-03-13 21:15:32 +01:00
Sebastiaan van Stijn
25cdae293f
Update containerd v1.2.5, runc 2b18fe1d885ee5083ef9f0838fee39b62d653e30
Notable Updates

- Fix an issue that non-existent parent directory in image layers is created with permission 0700. containerd#3017
- Fix an issue that snapshots of the base image can be deleted by mistake, when images built on top of it are deleted. containerd#3087
- Support for GC references to content from snapshot and container objects. containerd#3080
- cgroups updated to dbea6f2bd41658b84b00417ceefa416b97 to fix issues for systemd 420 and non-existent cgroups. containerd#3079
- runc updated to 2b18fe1d885ee5083ef9f0838fee39b62d653e30 to include the improved fix for CVE-2019-5736. containerd#3082
- cri: Fix a bug that pod can't get started when the same volume is defined differently in the image and the pod spec. cri#1059
- cri: Fix a bug that causes container start failure after in-place upgrade containerd to 1.2.4+ or 1.1.6+. cri#1082
- cri updated to a92c40017473cbe0239ce180125f12669757e44f. containerd#3084

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-03-13 21:00:50 +01:00
Derek McGowan
1217819f07
Update quota support to treat permission error as not supported
When initializing graphdrivers without root a permission warning
log is given due to lack of permission to create a device. This
error should be treated the same as quota not supported.

Signed-off-by: Derek McGowan <derek@mcgstyle.net>
2019-03-13 11:22:13 -07:00
Justin Cormack
c435551ccc
Switch to google/uuid
pborman/uuid and google/uuid used to be different versions of
the same package, but now pborman/uuid is a compatibility wrapper
around google/uuid, maintained by the same person.

Clean up some of the usage as the functions differ slightly.

Not yet removed some uses of pborman/uuid in vendored code but
I have PRs in process for these.

Signed-off-by: Justin Cormack <justin.cormack@docker.com>
2019-03-13 14:13:58 +00:00
Vincent Demeester
46036c2308
Merge pull request #37534 from thaJeztah/fix-distribution-500
Fix error 500 on distribution endpoint
2019-03-13 08:29:16 +01:00
John Howard
19a938f6bc LCOWv1:Remote lcow.kernel and lcow.initrd
Signed-off-by: John Howard <jhoward@microsoft.com>

LCOWv1 will be deprecated soon anyway (and LCOW is experimental regardless).
Removing lcow.initrd and lcow.kernel options which will not be supported
in LCOWv2 (via containerd).
2019-03-12 19:31:12 -07:00
John Howard
2f27332836 Windows: Implement docker top for containerd
Signed-off-by: John Howard <jhoward@microsoft.com>
2019-03-12 18:41:55 -07:00
John Howard
8de5db1c00 Remove unsupported lcow.vhdx option
Signed-off-by: John Howard <jhoward@microsoft.com>

This was only experimental and removed from opengcs. Making same
change in docker.
2019-03-12 18:41:55 -07:00
John Howard
0a30ef4c59 Publish empty stats on error
Signed-off-by: John Howard <jhoward@microsoft.com>
2019-03-12 18:41:55 -07:00
John Howard
92bf0a5046 Windows:Add ETW logging hook
Signed-off-by: John Howard <jhoward@microsoft.com>
2019-03-12 18:41:55 -07:00
John Howard
afa3aec024 Windows: Don't shadow err variable
Signed-off-by: John Howard <jhoward@microsoft.com>
2019-03-12 18:41:55 -07:00
John Howard
32acc76b1a Windows: Fix handle leaks/logging if init proc start fails
Signed-off-by: John Howard <jhoward@microsoft.com>

Fixes #38719

Fixes some subtle bugs on Windows

 - Fixes https://github.com/moby/moby/issues/38719. This one is the most important
   as failure to start the init process in a Windows container will cause leaked
   handles. (ie where the `ctr.hcsContainer.CreateProcess(...)` call fails).
   The solution to the leak is to split out the `reapContainer` part of `reapProcess`
   into a separate function. This ensures HCS resources are cleaned up correctly and
   not leaked.

 - Ensuring the reapProcess goroutine is started immediately the process
   is actually started, so we don't leak in the case of failures such as
   from `newIOFromProcess` or `attachStdio`

 - libcontainerd on Windows (local, not containerd) was not sending the EventCreate
   back to the monitor on Windows. Just LCOW. This was just an oversight from
   refactoring a couple of years ago by Mikael as far as I can tell. Technically
   not needed for functionality except for the logging being missing, but is correct.
2019-03-12 18:41:55 -07:00
John Howard
d4ceb61f2b LCOW:Reworking spec builder
Signed-off-by: John Howard <jhoward@microsoft.com>
2019-03-12 18:41:55 -07:00
John Howard
20833b06a0 Windows: (WCOW) Generate OCI spec that remote runtime can escape
Signed-off-by: John Howard <jhoward@microsoft.com>

Also fixes https://github.com/moby/moby/issues/22874

This commit is a pre-requisite to moving moby/moby on Windows to using
Containerd for its runtime.

The reason for this is that the interface between moby and containerd
for the runtime is an OCI spec which must be unambigious.

It is the responsibility of the runtime (runhcs in the case of
containerd on Windows) to ensure that arguments are escaped prior
to calling into HCS and onwards to the Win32 CreateProcess call.

Previously, the builder was always escaping arguments which has
led to several bugs in moby. Because the local runtime in
libcontainerd had context of whether or not arguments were escaped,
it was possible to hack around in daemon/oci_windows.go with
knowledge of the context of the call (from builder or not).

With a remote runtime, this is not possible as there's rightly
no context of the caller passed across in the OCI spec. Put another
way, as I put above, the OCI spec must be unambigious.

The other previous limitation (which leads to various subtle bugs)
is that moby is coded entirely from a Linux-centric point of view.

Unfortunately, Windows != Linux. Windows CreateProcess uses a
command line, not an array of arguments. And it has very specific
rules about how to escape a command line. Some interesting reading
links about this are:

https://blogs.msdn.microsoft.com/twistylittlepassagesallalike/2011/04/23/everyone-quotes-command-line-arguments-the-wrong-way/
https://stackoverflow.com/questions/31838469/how-do-i-convert-argv-to-lpcommandline-parameter-of-createprocess
https://docs.microsoft.com/en-us/cpp/cpp/parsing-cpp-command-line-arguments?view=vs-2017

For this reason, the OCI spec has recently been updated to cater
for more natural syntax by including a CommandLine option in
Process.

What does this commit do?

Primary objective is to ensure that the built OCI spec is unambigious.

It changes the builder so that `ArgsEscaped` as commited in a
layer is only controlled by the use of CMD or ENTRYPOINT.

Subsequently, when calling in to create a container from the builder,
if follows a different path to both `docker run` and `docker create`
using the added `ContainerCreateIgnoreImagesArgsEscaped`. This allows
a RUN from the builder to control how to escape in the OCI spec.

It changes the builder so that when shell form is used for RUN,
CMD or ENTRYPOINT, it builds (for WCOW) a more natural command line
using the original as put by the user in the dockerfile, not
the parsed version as a set of args which loses fidelity.
This command line is put into args[0] and `ArgsEscaped` is set
to true for CMD or ENTRYPOINT. A RUN statement does not commit
`ArgsEscaped` to the commited layer regardless or whether shell
or exec form were used.
2019-03-12 18:41:55 -07:00
John Howard
85ad4b16c1 Windows: Experimental: Allow containerd for runtime
Signed-off-by: John Howard <jhoward@microsoft.com>

This is the first step in refactoring moby (dockerd) to use containerd on Windows.
Similar to the current model in Linux, this adds the option to enable it for runtime.
It does not switch the graphdriver to containerd snapshotters.

 - Refactors libcontainerd to a series of subpackages so that either a
  "local" containerd (1) or a "remote" (2) containerd can be loaded as opposed
  to conditional compile as "local" for Windows and "remote" for Linux.

 - Updates libcontainerd such that Windows has an option to allow the use of a
   "remote" containerd. Here, it communicates over a named pipe using GRPC.
   This is currently guarded behind the experimental flag, an environment variable,
   and the providing of a pipename to connect to containerd.

 - Infrastructure pieces such as under pkg/system to have helper functions for
   determining whether containerd is being used.

(1) "local" containerd is what the daemon on Windows has used since inception.
It's not really containerd at all - it's simply local invocation of HCS APIs
directly in-process from the daemon through the Microsoft/hcsshim library.

(2) "remote" containerd is what docker on Linux uses for it's runtime. It means
that there is a separate containerd service running, and docker communicates over
GRPC to it.

To try this out, you will need to start with something like the following:

Window 1:
	containerd --log-level debug

Window 2:
	$env:DOCKER_WINDOWS_CONTAINERD=1
	dockerd --experimental -D --containerd \\.\pipe\containerd-containerd

You will need the following binary from github.com/containerd/containerd in your path:
 - containerd.exe

You will need the following binaries from github.com/Microsoft/hcsshim in your path:
 - runhcs.exe
 - containerd-shim-runhcs-v1.exe

For LCOW, it will require and initrd.img and kernel in `C:\Program Files\Linux Containers`.
This is no different to the current requirements. However, you may need updated binaries,
particularly initrd.img built from Microsoft/opengcs as (at the time of writing), Linuxkit
binaries are somewhat out of date.

Note that containerd and hcsshim for HCS v2 APIs do not yet support all the required
functionality needed for docker. This will come in time - this is a baby (although large)
step to migrating Docker on Windows to containerd.

Note that the HCS v2 APIs are only called on RS5+ builds. RS1..RS4 will still use
HCS v1 APIs as the v2 APIs were not fully developed enough on these builds to be usable.
This abstraction is done in HCSShim. (Referring specifically to runtime)

Note the LCOW graphdriver still uses HCS v1 APIs regardless.

Note also that this does not migrate docker to use containerd snapshotters
rather than graphdrivers. This needs to be done in conjunction with Linux also
doing the same switch.
2019-03-12 18:41:55 -07:00
John Howard
1feaf88aa0 Vendor sirupsen/logrus@v1.3.0
Signed-off-by: John Howard <jhoward@microsoft.com>
2019-03-12 18:41:55 -07:00
John Howard
d1cb9a47ec Vendor Microsoft/opengcs@a1096715
Signed-off-by: John Howard <jhoward@microsoft.com>
2019-03-12 18:41:55 -07:00
John Howard
25dff4b4ab Vendor Microsoft/go-winio@4de24ed3
Signed-off-by: John Howard <jhoward@microsoft.com>
2019-03-12 18:41:46 -07:00
John Howard
cc46695320 Vendor Microsoft/hcsshim@ada9cb39
Signed-off-by: John Howard <jhoward@microsoft.com>
2019-03-12 18:21:41 -07:00
Aleksa Sarai
ba0afa6ba8
internal: test/env: switch to assert.TestingT
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2019-03-13 11:48:40 +11:00
Aleksa Sarai
175b1d7830
integration-cli: don't build -test images if they already exist
There's no need to try to re-build the test images if they already
exist. This change makes basically no difference to the upstream
integration test-suite running, but for users who want to run the
integration-cli suite on a host machine (such as distributions doing
tests) this change allows images to be pre-loaded such that compilers
aren't needed on the test machine.

However, this does remove the accidental re-compilation of nnp-test, as
well as handling errors far more cleanly (previously if an error
occurred during a test build, further tests won't attempt to rebuild
it).

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2019-03-13 11:48:40 +11:00
Aleksa Sarai
d283c7fa2b
*: remove interfacer linter from CI
It has been declared deprecated by the author, and has a knack for
false-positives (as well as giving bad advice when it comes to APIs --
which is quite clear when looking at "nolint: interfacer" comments).

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2019-03-13 11:48:39 +11:00
Sebastiaan van Stijn
6d87f19142
builder: fix COPY --from should preserve ownership
When copying between stages, or copying from an image,
ownership of the copied files should not be changed, unless
the `--chown` option is set (in which case ownership of copied
files should be updated to the specified user/group).

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-03-13 00:55:04 +01:00
Sebastiaan van Stijn
1101568fa1
Update TestUpdatePidsLimit to be more atomic
Create a new container for each subtest, so that individual
subtests are self-contained, and there's no need to execute
them in the exact order, or resetting the container in between.

This makes the test slower (6.54s vs  3.43s), but reduced the
difference by using `network=host`, which made a substantial
difference (without `network=host`, the test took more than
twice as long: 13.96s).

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-03-13 00:27:15 +01:00
Sebastiaan van Stijn
ffa1728d4b
Normalize values for pids-limit
- Don't set `PidsLimit` when creating a container and
  no limit was set (or the limit was set to "unlimited")
- Don't set `PidsLimit` if the host does not have pids-limit
  support (previously "unlimited" was set).
- Do not generate a warning if the host does not have pids-limit
  support, but pids-limit was set to unlimited (having no
  limit set, or the limit set to "unlimited" is equivalent,
  so no warning is nescessary in that case).
- When updating a container, convert `0`, and `-1` to
  "unlimited" (`0`).

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-03-13 00:27:05 +01:00
Brian Goff
258edd715d
Merge pull request #38831 from thaJeztah/bump_swarmkit
bump swarmkit to 415dc72789e2b733ea884f09188c286ca187d8ec
2019-03-12 09:51:51 -07:00
Sebastiaan van Stijn
f58fa6e5c0
Merge pull request #38855 from thaJeztah/добро_пожаловать_Кир_как_сопровождающий
Add Kir as maintainer
2019-03-12 16:35:37 +01:00
Sebastiaan van Stijn
f196671db1
Add Kir as maintainer
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-03-12 13:36:55 +01:00
fanjiyun
1397b8c63c add vfs quota for daemon storage-opts
Signed-off-by: fanjiyun <fan.jiyun@zte.com.cn>
2019-03-11 21:07:29 +08:00
Kir Kolyshkin
596ca142e0 daemon: use 'private' ipc mode by default
This changes the default ipc mode of daemon/engine to be private,
meaning the containers will not have their /dev/shm bind-mounted
from the host by default. The benefits of doing this are:

 1. No leaked mounts. Eliminate a possibility to leak mounts into
    other namespaces (and therefore unfortunate errors like "Unable to
    remove filesystem for <ID>: remove /var/lib/docker/containers/<ID>/shm:
    device or resource busy").

 2. Working checkpoint/restore. Make `docker checkpoint`
    not lose the contents of `/dev/shm`, but save it to
    the dump, and be restored back upon `docker start --checkpoint`
    (currently it is lost -- while CRIU handles tmpfs mounts,
    the "shareable" mount is seen as external to container,
    and thus rightfully ignored).

3. Better security. Currently any container is opened to share
   its /dev/shm with any other container.

Obviously, this change will break the following usage scenario:

 $ docker run -d --name donor busybox top
 $ docker run --rm -it --ipc container:donor busybox sh
 Error response from daemon: linux spec namespaces: can't join IPC
 of container <ID>: non-shareable IPC (hint: use IpcMode:shareable
 for the donor container)

The soution, as hinted by the (amended) error message, is to
explicitly enable donor sharing by using --ipc shareable:

 $ docker run -d --name donor --ipc shareable busybox top

Compatibility notes:

1. This only applies to containers created _after_ this change.
   Existing containers are not affected and will work fine
   as their ipc mode is stored in HostConfig.

2. Old backward compatible behavior ("shareable" containers
   by default) can be enabled by either using
   `--default-ipc-mode shareable` daemon command line option,
   or by adding a `"default-ipc-mode": "shareable"`
   line in `/etc/docker/daemon.json` configuration file.

3. If an older client (API < 1.40) is used, a "shareable" container
   is created. A test to check that is added.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2019-03-09 18:57:42 -08:00
Kir Kolyshkin
ce7528ebdf postContainersCreate: minor nitpick
There are two if statements checking for exactly same conditions:

> if hostConfig != nil && versions.LessThan(version, "1.40")

Merge these.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2019-03-09 18:57:42 -08:00
Brian Goff
1275a001a6 Enable buildkit for Makefile build target
This is set only if it is not already set.
This should give a little speedup to CI builds.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2019-03-09 18:28:45 -08:00
Yong Tang
33c3200e0d
Merge pull request #38843 from kolyshkin/ipc-test-move
TestDaemonRestartIpcMode: move to integration
2019-03-09 15:59:53 -08:00
Kir Kolyshkin
9fd765f07c TestDaemonRestartIpcMode: modernize
Move the test case from integration-cli to integration.

The test logic itself has not changed, except these
two things:

* the new test sets default-ipc-mode via command line
  rather than via daemon.json (less code);
* the new test uses current API version.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2019-03-08 10:04:43 -08:00
Kir Kolyshkin
f664df01d1 integration: add/use WithRestartPolicy
NOTE TestUpdateRestartPolicy is left as is as otherwise
it will decrease its readability.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2019-03-08 10:03:55 -08:00
Kir Kolyshkin
17022b3ad2 integration/internal/container/ops: rm unused code
Since container.Create() already initializes HostConfig
to be non-nil, there is no need for this code. Remove it.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2019-03-08 10:00:14 -08:00
Kir Kolyshkin
39eaf1ef97 TestUpdateRestartWithAutoRemove: use WithAutoRemove
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2019-03-08 09:59:22 -08:00
Sebastiaan van Stijn
54dddadc7d
Merge pull request #38452 from avagin/cr-test
integration/container: add a base test for C/R
2019-03-07 01:54:17 +01:00
Sebastiaan van Stijn
667e800b2c
bump swarmkit to 415dc72789e2b733ea884f09188c286ca187d8ec
relevant changes:

- swarmkit#2815 Extension and resource API declarations
- swarmkit#2816 Moving swap options into `ResourceRequirements` instead of `ContainerSpec`s
  - relates to moby#37872
- swarmkit#2821 allocator: use a map for network-IDs to prevent O(n2)
- swarmkit#2832 [api] Add created object to return types for extension and resource create apis
- swarmkit#2831 [controlapi] Extension api implementation
- swarmkit#2835 Resource controlapi Implemetation
- swarmkit#2802 Use custom gRPC dialer to override default proxy dialer
  - addresses moby#35395 Swarm worker cannot connect to master if proxy is configured
  - addresses moby#issues/36951 Swarm nodes cannot join as masters if http proxy is set
  - relates to swarmkit#2419 Provide custom gRPC dialer to override default proxy dialer

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-03-06 16:46:01 +01:00
Akihiro Suda
fc01c2b481
Merge pull request #37874 from justincormack/remove-libtrust
Remove the rest of v1 manifest support
2019-03-06 14:41:27 +09:00
Tianon Gravi
5a7d6dcf21
Merge pull request #38820 from bynnchapu/mkimage-yum_add-new-tag-option
Add new option to specify tag information to mkimage-yum.sh
2019-03-05 16:23:16 -08:00
Noriki Nakamura
57c2228cc1 Add new option to specify tag information
Previously, tag information automatically is added from
/etc/{redhat,system}-release in image (target directory).

But I want to specify any tag informtion when using mkimage-yum.sh.
Because a Linux distribution based RHEL (It's Asianux Server) uses
SPn notation (e.g. SP3) instead of period notaion (e.g. 7.6).

Signed-off-by: Noriki Nakamura <noriki.nakamura@miraclelinux.com>
2019-03-06 07:06:40 +09:00
Yong Tang
6e86b1198f
Merge pull request #38780 from thaJeztah/remove_parse_tmpfs_options
pkg/mount: remove unused ParseTmpfsOptions
2019-03-04 10:01:41 -08:00