Commit graph

6349 commits

Author SHA1 Message Date
Tibor Vass
119f892016
Merge pull request #38510 from ZYecho/tune-code
fix: simplify code logic
2019-03-21 13:56:02 -07:00
Tõnis Tiigi
25661a3a04
Merge pull request #38793 from thaJeztah/pids_limit_improvements
Some refactoring on PidsLimit
2019-03-21 13:44:05 -07:00
John Howard
b4db78be5a LCOW: Add SIDs to layer.vhd at creation
Signed-off-by: John Howard <jhoward@microsoft.com>

Some permissions corrections here. Also needs re-vendor of go-winio.

 - Create the layer folder directory as standard, not with SDDL. It will inherit permissions from the data-root correctly.
 - Apply the VM Group SID access to layer.vhd

Permissions after this changes

Data root:

```
PS C:\> icacls test
test BUILTIN\Administrators:(OI)(CI)(F)
     NT AUTHORITY\SYSTEM:(OI)(CI)(F)
```

lcow subdirectory under dataroot
```
PS C:\> icacls test\lcow
test\lcow BUILTIN\Administrators:(I)(OI)(CI)(F)
          NT AUTHORITY\SYSTEM:(I)(OI)(CI)(F)
```

layer.vhd in a layer folder for LCOW
```
.\test\lcow\c33923d21c9621fea2f990a8778f469ecdbdc57fd9ca682565d1fa86fadd5d95\layer.vhd NT VIRTUAL MACHINE\Virtual Machines:(R)
                                                                                       BUILTIN\Administrators:(I)(F)
                                                                                       NT AUTHORITY\SYSTEM:(I)(F)
```

And showing working

```
PS C:\> docker-ci-zap -folder=c:\test
INFO: Zapped successfully
PS C:\> docker run --rm alpine echo hello
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
8e402f1a9c57: Pull complete
Digest: sha256:644fcb1a676b5165371437feaa922943aaf7afcfa8bfee4472f6860aad1ef2a0
Status: Downloaded newer image for alpine:latest
hello
```
2019-03-21 13:12:17 -07:00
Michael Crosby
7603c22c73 Use original process spec for execs
Fixes #38865

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-03-21 15:41:53 -04:00
Sebastiaan van Stijn
f43826c433
bump opencontainers/selinux to v1.2
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-03-21 10:10:05 +01:00
Sebastiaan van Stijn
c7105e3c99
Simplify verifyNetworkingConfig()
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-03-20 18:46:56 +01:00
Sebastiaan van Stijn
bcb4a331f9
connectToNetwork: use locally scoped err
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-03-20 18:46:46 +01:00
Sebastiaan van Stijn
ebe0174f22
Simplify hasUserDefinedIPAddress, and centralize validation
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-03-20 18:46:35 +01:00
Sebastiaan van Stijn
20dde01848
Move EnableServiceDiscoveryOnDefaultNetwork to container-operations
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-03-20 18:45:20 +01:00
Sebastiaan van Stijn
0169ad3e2a
Remove redundant isNetworkHotPluggable() function
All platforms now have hot-pluggable networks, so this
check was no longer needed.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-03-20 18:45:07 +01:00
John Howard
a3eda72f71
Merge pull request #38541 from Microsoft/jjh/containerd
Windows: Experimental: ContainerD runtime
2019-03-19 21:09:19 -07:00
Sebastiaan van Stijn
e7b5f7dbe9
Merge pull request #38891 from thaJeztah/warn_manager_count
Return a warning when running in a two-manager setup
2019-03-19 22:54:53 +01:00
Kyle Wuolle
e65c680394
Fix for situation where swarm leave causes wait forever for agent to stop
In this case the message to stop the agent is never actually sent
because the swarm node is nil

Signed-off-by: Kyle Wuolle <kyle.wuolle@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-03-19 18:45:14 +01:00
Tibor Vass
8f936ae8cf Add DeviceRequests to HostConfig to support NVIDIA GPUs
This patch hard-codes support for NVIDIA GPUs.
In a future patch it should move out into its own Device Plugin.

Signed-off-by: Tibor Vass <tibor@docker.com>
2019-03-18 17:19:45 +00:00
Sebastiaan van Stijn
81eef17e38
Return a warning when running in a two-manager setup
Running a cluster in a two-manager configuration effectively *doubles*
the chance of loosing control over the cluster (compared to running
in a single-manager setup). Users may have the assumption that having
two managers provides fault tolerance, so it's best to warn them if
they're using this configuration.

This patch adds a warning to the `info` response if Swarm is configured
with two managers:

    WARNING: Running Swarm in a two-manager configuration. This configuration provides
             no fault tolerance, and poses a high risk to loose control over the cluster.
             Refer to https://docs.docker.com/engine/swarm/admin_guide/ to configure the
             Swarm for fault-tolerance.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-03-18 14:36:00 +01:00
Sebastiaan van Stijn
2925eb7a2a
Merge pull request #38777 from wk8/wk8/raw_cred_specs
Making it possible to pass Windows credential specs directly to the engine
2019-03-16 16:42:39 +01:00
Jean Rouge
7fdac7eb0f Making it possible to pass Windows credential specs directly to the engine
Instead of having to go through files or registry values as is currently the
case.

While adding GMSA support to Kubernetes (https://github.com/kubernetes/kubernetes/pull/73726)
I stumbled upon the fact that Docker currently only allows passing Windows
credential specs through files or registry values, forcing the Kubelet
to perform a rather awkward dance of writing-then-deleting to either the
disk or the registry to be able to create a Windows container with cred
specs.

This patch solves this problem by making it possible to directly pass
whole base64-encoded cred specs to the engine's API. I took the opportunity
to slightly refactor the method responsible for Windows cred spec as it
seemed hard to read to me.

Added some unit tests on Windows credential specs handling, as there were
previously none.

Added/amended the relevant integration tests.

I have also tested it manually: given a Windows container using a cred spec
that you would normally start with e.g.
```powershell
docker run --rm --security-opt "credentialspec=file://win.json" mcr.microsoft.com/windows/servercore:ltsc2019 nltest /parentdomain
# output:
# my.ad.domain.com. (1)
# The command completed successfully
```
can now equivalently be started with
```powershell
$rawCredSpec = & cat 'C:\ProgramData\docker\credentialspecs\win.json'
$escaped = $rawCredSpec.Replace('"', '\"')
docker run --rm --security-opt "credentialspec=raw://$escaped" mcr.microsoft.com/windows/servercore:ltsc2019 nltest /parentdomain
# same output!
```

I'll do another PR on Swarmkit after this is merged to allow services to use
the same option.

(It's worth noting that @dperny faced the same problem adding GMSA support
to Swarmkit, to which he came up with an interesting solution - see
https://github.com/moby/moby/pull/38632 - but alas these tricks are not
available to the Kubelet.)

Signed-off-by: Jean Rouge <rougej+github@gmail.com>
2019-03-15 19:20:19 -07:00
Sebastiaan van Stijn
ca0b64ee3b
Merge pull request #35621 from kolyshkin/ipc-private
daemon: use 'private' ipc mode by default
2019-03-14 19:27:30 +01:00
Sebastiaan van Stijn
42987cab19
Merge pull request #38874 from thaJeztah/small_error_improvements
Minor error cleanups in projectquota
2019-03-14 09:58:08 +01:00
Sebastiaan van Stijn
aa51dcec94
Merge pull request #38868 from justincormack/google-uuid
Switch to google/uuid
2019-03-14 02:19:01 +01:00
Sebastiaan van Stijn
154d6c5207
Minor error cleanups in projectquota
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-03-13 23:39:38 +01:00
Sebastiaan van Stijn
42ad354e7a
Merge pull request #38870 from dmcgowan/quota-not-permitted-log
Update quota support to treat permission error as not supported
2019-03-13 23:38:37 +01:00
Derek McGowan
1217819f07
Update quota support to treat permission error as not supported
When initializing graphdrivers without root a permission warning
log is given due to lack of permission to create a device. This
error should be treated the same as quota not supported.

Signed-off-by: Derek McGowan <derek@mcgstyle.net>
2019-03-13 11:22:13 -07:00
Justin Cormack
c435551ccc
Switch to google/uuid
pborman/uuid and google/uuid used to be different versions of
the same package, but now pborman/uuid is a compatibility wrapper
around google/uuid, maintained by the same person.

Clean up some of the usage as the functions differ slightly.

Not yet removed some uses of pborman/uuid in vendored code but
I have PRs in process for these.

Signed-off-by: Justin Cormack <justin.cormack@docker.com>
2019-03-13 14:13:58 +00:00
Vincent Demeester
46036c2308
Merge pull request #37534 from thaJeztah/fix-distribution-500
Fix error 500 on distribution endpoint
2019-03-13 08:29:16 +01:00
John Howard
19a938f6bc LCOWv1:Remote lcow.kernel and lcow.initrd
Signed-off-by: John Howard <jhoward@microsoft.com>

LCOWv1 will be deprecated soon anyway (and LCOW is experimental regardless).
Removing lcow.initrd and lcow.kernel options which will not be supported
in LCOWv2 (via containerd).
2019-03-12 19:31:12 -07:00
John Howard
2f27332836 Windows: Implement docker top for containerd
Signed-off-by: John Howard <jhoward@microsoft.com>
2019-03-12 18:41:55 -07:00
John Howard
8de5db1c00 Remove unsupported lcow.vhdx option
Signed-off-by: John Howard <jhoward@microsoft.com>

This was only experimental and removed from opengcs. Making same
change in docker.
2019-03-12 18:41:55 -07:00
John Howard
0a30ef4c59 Publish empty stats on error
Signed-off-by: John Howard <jhoward@microsoft.com>
2019-03-12 18:41:55 -07:00
John Howard
32acc76b1a Windows: Fix handle leaks/logging if init proc start fails
Signed-off-by: John Howard <jhoward@microsoft.com>

Fixes #38719

Fixes some subtle bugs on Windows

 - Fixes https://github.com/moby/moby/issues/38719. This one is the most important
   as failure to start the init process in a Windows container will cause leaked
   handles. (ie where the `ctr.hcsContainer.CreateProcess(...)` call fails).
   The solution to the leak is to split out the `reapContainer` part of `reapProcess`
   into a separate function. This ensures HCS resources are cleaned up correctly and
   not leaked.

 - Ensuring the reapProcess goroutine is started immediately the process
   is actually started, so we don't leak in the case of failures such as
   from `newIOFromProcess` or `attachStdio`

 - libcontainerd on Windows (local, not containerd) was not sending the EventCreate
   back to the monitor on Windows. Just LCOW. This was just an oversight from
   refactoring a couple of years ago by Mikael as far as I can tell. Technically
   not needed for functionality except for the logging being missing, but is correct.
2019-03-12 18:41:55 -07:00
John Howard
d4ceb61f2b LCOW:Reworking spec builder
Signed-off-by: John Howard <jhoward@microsoft.com>
2019-03-12 18:41:55 -07:00
John Howard
20833b06a0 Windows: (WCOW) Generate OCI spec that remote runtime can escape
Signed-off-by: John Howard <jhoward@microsoft.com>

Also fixes https://github.com/moby/moby/issues/22874

This commit is a pre-requisite to moving moby/moby on Windows to using
Containerd for its runtime.

The reason for this is that the interface between moby and containerd
for the runtime is an OCI spec which must be unambigious.

It is the responsibility of the runtime (runhcs in the case of
containerd on Windows) to ensure that arguments are escaped prior
to calling into HCS and onwards to the Win32 CreateProcess call.

Previously, the builder was always escaping arguments which has
led to several bugs in moby. Because the local runtime in
libcontainerd had context of whether or not arguments were escaped,
it was possible to hack around in daemon/oci_windows.go with
knowledge of the context of the call (from builder or not).

With a remote runtime, this is not possible as there's rightly
no context of the caller passed across in the OCI spec. Put another
way, as I put above, the OCI spec must be unambigious.

The other previous limitation (which leads to various subtle bugs)
is that moby is coded entirely from a Linux-centric point of view.

Unfortunately, Windows != Linux. Windows CreateProcess uses a
command line, not an array of arguments. And it has very specific
rules about how to escape a command line. Some interesting reading
links about this are:

https://blogs.msdn.microsoft.com/twistylittlepassagesallalike/2011/04/23/everyone-quotes-command-line-arguments-the-wrong-way/
https://stackoverflow.com/questions/31838469/how-do-i-convert-argv-to-lpcommandline-parameter-of-createprocess
https://docs.microsoft.com/en-us/cpp/cpp/parsing-cpp-command-line-arguments?view=vs-2017

For this reason, the OCI spec has recently been updated to cater
for more natural syntax by including a CommandLine option in
Process.

What does this commit do?

Primary objective is to ensure that the built OCI spec is unambigious.

It changes the builder so that `ArgsEscaped` as commited in a
layer is only controlled by the use of CMD or ENTRYPOINT.

Subsequently, when calling in to create a container from the builder,
if follows a different path to both `docker run` and `docker create`
using the added `ContainerCreateIgnoreImagesArgsEscaped`. This allows
a RUN from the builder to control how to escape in the OCI spec.

It changes the builder so that when shell form is used for RUN,
CMD or ENTRYPOINT, it builds (for WCOW) a more natural command line
using the original as put by the user in the dockerfile, not
the parsed version as a set of args which loses fidelity.
This command line is put into args[0] and `ArgsEscaped` is set
to true for CMD or ENTRYPOINT. A RUN statement does not commit
`ArgsEscaped` to the commited layer regardless or whether shell
or exec form were used.
2019-03-12 18:41:55 -07:00
John Howard
85ad4b16c1 Windows: Experimental: Allow containerd for runtime
Signed-off-by: John Howard <jhoward@microsoft.com>

This is the first step in refactoring moby (dockerd) to use containerd on Windows.
Similar to the current model in Linux, this adds the option to enable it for runtime.
It does not switch the graphdriver to containerd snapshotters.

 - Refactors libcontainerd to a series of subpackages so that either a
  "local" containerd (1) or a "remote" (2) containerd can be loaded as opposed
  to conditional compile as "local" for Windows and "remote" for Linux.

 - Updates libcontainerd such that Windows has an option to allow the use of a
   "remote" containerd. Here, it communicates over a named pipe using GRPC.
   This is currently guarded behind the experimental flag, an environment variable,
   and the providing of a pipename to connect to containerd.

 - Infrastructure pieces such as under pkg/system to have helper functions for
   determining whether containerd is being used.

(1) "local" containerd is what the daemon on Windows has used since inception.
It's not really containerd at all - it's simply local invocation of HCS APIs
directly in-process from the daemon through the Microsoft/hcsshim library.

(2) "remote" containerd is what docker on Linux uses for it's runtime. It means
that there is a separate containerd service running, and docker communicates over
GRPC to it.

To try this out, you will need to start with something like the following:

Window 1:
	containerd --log-level debug

Window 2:
	$env:DOCKER_WINDOWS_CONTAINERD=1
	dockerd --experimental -D --containerd \\.\pipe\containerd-containerd

You will need the following binary from github.com/containerd/containerd in your path:
 - containerd.exe

You will need the following binaries from github.com/Microsoft/hcsshim in your path:
 - runhcs.exe
 - containerd-shim-runhcs-v1.exe

For LCOW, it will require and initrd.img and kernel in `C:\Program Files\Linux Containers`.
This is no different to the current requirements. However, you may need updated binaries,
particularly initrd.img built from Microsoft/opengcs as (at the time of writing), Linuxkit
binaries are somewhat out of date.

Note that containerd and hcsshim for HCS v2 APIs do not yet support all the required
functionality needed for docker. This will come in time - this is a baby (although large)
step to migrating Docker on Windows to containerd.

Note that the HCS v2 APIs are only called on RS5+ builds. RS1..RS4 will still use
HCS v1 APIs as the v2 APIs were not fully developed enough on these builds to be usable.
This abstraction is done in HCSShim. (Referring specifically to runtime)

Note the LCOW graphdriver still uses HCS v1 APIs regardless.

Note also that this does not migrate docker to use containerd snapshotters
rather than graphdrivers. This needs to be done in conjunction with Linux also
doing the same switch.
2019-03-12 18:41:55 -07:00
Aleksa Sarai
d283c7fa2b
*: remove interfacer linter from CI
It has been declared deprecated by the author, and has a knack for
false-positives (as well as giving bad advice when it comes to APIs --
which is quite clear when looking at "nolint: interfacer" comments).

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2019-03-13 11:48:39 +11:00
Sebastiaan van Stijn
ffa1728d4b
Normalize values for pids-limit
- Don't set `PidsLimit` when creating a container and
  no limit was set (or the limit was set to "unlimited")
- Don't set `PidsLimit` if the host does not have pids-limit
  support (previously "unlimited" was set).
- Do not generate a warning if the host does not have pids-limit
  support, but pids-limit was set to unlimited (having no
  limit set, or the limit set to "unlimited" is equivalent,
  so no warning is nescessary in that case).
- When updating a container, convert `0`, and `-1` to
  "unlimited" (`0`).

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-03-13 00:27:05 +01:00
fanjiyun
1397b8c63c add vfs quota for daemon storage-opts
Signed-off-by: fanjiyun <fan.jiyun@zte.com.cn>
2019-03-11 21:07:29 +08:00
Kir Kolyshkin
596ca142e0 daemon: use 'private' ipc mode by default
This changes the default ipc mode of daemon/engine to be private,
meaning the containers will not have their /dev/shm bind-mounted
from the host by default. The benefits of doing this are:

 1. No leaked mounts. Eliminate a possibility to leak mounts into
    other namespaces (and therefore unfortunate errors like "Unable to
    remove filesystem for <ID>: remove /var/lib/docker/containers/<ID>/shm:
    device or resource busy").

 2. Working checkpoint/restore. Make `docker checkpoint`
    not lose the contents of `/dev/shm`, but save it to
    the dump, and be restored back upon `docker start --checkpoint`
    (currently it is lost -- while CRIU handles tmpfs mounts,
    the "shareable" mount is seen as external to container,
    and thus rightfully ignored).

3. Better security. Currently any container is opened to share
   its /dev/shm with any other container.

Obviously, this change will break the following usage scenario:

 $ docker run -d --name donor busybox top
 $ docker run --rm -it --ipc container:donor busybox sh
 Error response from daemon: linux spec namespaces: can't join IPC
 of container <ID>: non-shareable IPC (hint: use IpcMode:shareable
 for the donor container)

The soution, as hinted by the (amended) error message, is to
explicitly enable donor sharing by using --ipc shareable:

 $ docker run -d --name donor --ipc shareable busybox top

Compatibility notes:

1. This only applies to containers created _after_ this change.
   Existing containers are not affected and will work fine
   as their ipc mode is stored in HostConfig.

2. Old backward compatible behavior ("shareable" containers
   by default) can be enabled by either using
   `--default-ipc-mode shareable` daemon command line option,
   or by adding a `"default-ipc-mode": "shareable"`
   line in `/etc/docker/daemon.json` configuration file.

3. If an older client (API < 1.40) is used, a "shareable" container
   is created. A test to check that is added.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2019-03-09 18:57:42 -08:00
Sebastiaan van Stijn
54dddadc7d
Merge pull request #38452 from avagin/cr-test
integration/container: add a base test for C/R
2019-03-07 01:54:17 +01:00
Justin Cormack
98fc09128b Remove the rest of v1 manifest support
As people are using the UUID in `docker info` that was based on the v1 manifest signing key, replace
with a UUID instead.

Remove deprecated `--disable-legacy-registry` option that was scheduled to be removed in 18.03.

Signed-off-by: Justin Cormack <justin.cormack@docker.com>
2019-03-02 10:46:37 -08:00
Andrei Vagin
0b96bf891c Fix CheckpointList
A container checkpoint directory doesn't have config.json.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2019-02-28 23:04:16 -08:00
Brian Goff
fa9df85c6a Had HasExperimental() to cluster backend
It's already defined on the daemon. This allows us to not call
`SystemInfo` which is failry heavy and potentially can even error.

Takes care of todo item from Derek's containerd integration PR.
51c412f26e/daemon/cluster/services.go (L148-L149)

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2019-02-28 16:52:30 -08:00
Brian Goff
45eae4cb2b
Merge pull request #38806 from tonistiigi/rootless-build-fixes
builder-next: fixes for rootless mode
2019-02-28 15:44:40 -08:00
Tonis Tiigi
f9b9d5f584 builder-next: fixes for rootless mode
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
2019-02-28 10:44:21 -08:00
Sebastiaan van Stijn
8df160dde7
Merge pull request #38790 from nakabonne/refactor-setting-graph-driver
Refactor setting graph driver name
2019-02-28 10:42:09 +01:00
Vincent Demeester
ba641fef28
Merge pull request #31551 from KarthikNayak/dry_run
Network: add support for 'dangling' filter
2019-02-28 08:14:45 +01:00
Sebastiaan van Stijn
8c0ecb6387
Fix stopped containers with restart-policy showing as "restarting"
When manually stopping a container with a restart-policy, the container
would show as "restarting" in `docker ps` whereas its actual state
is "exited".

Stopping a container with a restart policy shows the container as "restarting"

    docker run -d --name test --restart unless-stopped busybox false

    docker stop test

    docker ps
    CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS                       PORTS               NAMES
    7e07409fa1d3        busybox             "false"             5 minutes ago       Restarting (1) 4 minutes ago                     test

However, inspecting the same container shows that it's exited:

    docker inspect test --format '{{ json .State }}'
    {
      "Status": "exited",
      "Running": false,
      "Paused": false,
      "Restarting": false,
      "OOMKilled": false,
      "Dead": false,
      "Pid": 0,
      "ExitCode": 1,
      "Error": "",
      "StartedAt": "2019-02-14T13:26:27.6091648Z",
      "FinishedAt": "2019-02-14T13:26:27.689427Z"
    }

And killing the container confirms this;

    docker kill test
    Error response from daemon: Cannot kill container: test: Container 7e07409fa1d36dc8d8cb8f25cf12ee1168ad9040183b85fafa73ee2c1fcf9361 is not running

    docker run -d --name test --restart unless-stopped busybox false

    docker stop test

    docker ps
    CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS                PORTS               NAMES
    d0595237054a        busybox             "false"             5 minutes ago       Restarting (1)       4 minutes ago                       exit

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-02-28 00:18:22 +01:00
karthik nayak
131cbaf5b7 Network: add support for 'dangling' filter
Like its counterpart in images and volumes, introduce the dangling
filter while listing networks. When the filter value is set to true,
only networks which aren't attached to containers and aren't builtin
networks are shown. When set to false, all builtin networks and
networks which are attached to containers are shown.

Signed-off-by: Karthik Nayak <Karthik.188@gmail.com>
2019-02-27 15:08:44 -05:00
John Howard
de7172b600
Merge pull request #38782 from Microsoft/fix-restart
Windows: Fix restart for Hyper-V containers
2019-02-26 22:44:36 -08:00
Dani Louca
3fbbeb703c set bigger grpc limit for GetConfigs api
Signed-off-by: Dani Louca <dani.louca@docker.com>
2019-02-26 11:09:25 -05:00
Ryo Nakao
894ecb24d1 Merge the divided loops
Signed-off-by: Ryo Nakao <nakabonne@gmail.com>
2019-02-24 16:16:19 +09:00