Commit graph

108 commits

Author SHA1 Message Date
Cory Snider
0ffaa6c785 daemon: add annotations to container HostConfig
Allow clients to set annotations on a container which will applied to
the container's OCI spec.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-02-23 18:59:00 -05:00
Sebastiaan van Stijn
f95e9b68d6
daemon: use strings.Cut() and cleanup error messages
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-12-21 11:09:03 +01:00
Nicolas De Loof
def549c8f6
imageservice: Add context to various methods
Co-authored-by: Paweł Gronowski <pawel.gronowski@docker.com>
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2022-11-03 12:22:40 +01:00
Cory Snider
e332c41e9d pkg/containerfs: alias ContainerFS to string
Drop the constructor and redundant string() type-casts.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-09-23 16:56:52 -04:00
Cory Snider
95824f2b5f pkg/containerfs: simplify ContainerFS type
Iterate towards dropping the type entirely.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-09-23 16:56:49 -04:00
Sebastiaan van Stijn
779a5b3029 ImageService.GetImage(): pass context
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
2022-09-07 16:53:45 +02:00
Nicolas De Loof
9a849cc83a
introduce GetImageOpts to manage image inspect data in backend
Currently only provides the existing "platform" option, but more
options will be added in follow-ups.

Signed-off-by: Nicolas De Loof <nicolas.deloof@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-08-16 16:49:46 +02:00
Paul "TBBle" Hampson
cb07afa3cc Implement :// separator for arbitrary Windows Device IDTypes
Arbitrary here does not include '', best to catch that one early as it's
almost certainly a mistake (possibly an attempt to pass a POSIX path
through this API)

Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
2022-03-27 13:26:47 +11:00
Paul "TBBle" Hampson
92f13bad88 Allow Windows Devices to be activated for HyperV Isolation
If not using the containerd backend, this will still fail, but later.

Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
2022-03-27 13:26:41 +11:00
Paul "TBBle" Hampson
c60f70f112 Break out setupWindowsDevices and add tests
Since this function is about to get more complicated, and change
behaviour, this establishes tests for the existing implementation.

Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
2022-03-27 13:23:48 +11:00
Sebastiaan van Stijn
1b3fef5333
Windows: require Windows Server RS5 / ltsc2019 (build 17763) as minimum
Windows Server 2016 (RS1) reached end of support, and Docker Desktop requires
Windows 10 V19H2 (version 1909, build 18363) as a minimum.

This patch makes Windows Server RS5 /  ltsc2019 (build 17763) the minimum version
to run the daemon, and removes some hacks for older versions of Windows.

There is one check remaining that checks for Windows RS3 for a workaround
on older versions, but recent changes in Windows seemed to have regressed
on the same issue, so I kept that code for now to check if we may need that
workaround (again);

085c6a98d5/daemon/graphdriver/windows/windows.go (L319-L341)

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-02-18 22:58:28 +01:00
Eng Zer Jun
c55a4ac779
refactor: move from io/ioutil to io and os package
The io/ioutil package has been deprecated in Go 1.16. This commit
replaces the existing io/ioutil functions with their new definitions in
io and os packages.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2021-08-27 14:56:57 +08:00
Sebastiaan van Stijn
0c84c322ae
daemon, oci: remove LCOW bits
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-07-27 13:35:59 +02:00
Brian Goff
24f173a003 Replace service "Capabilities" w/ add/drop API
After dicussing with maintainers, it was decided putting the burden of
providing the full cap list on the client is not a good design.
Instead we decided to follow along with the container API and use cap
add/drop.

This brings in the changes already merged into swarmkit.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-07-27 10:09:42 -07:00
Brian Goff
7a9cb29fb9 Accept platform spec on container create
This enables image lookup when creating a container to fail when the
reference exists but it is for the wrong platform. This prevents trying
to run an image for the wrong platform, as can be the case with, for
example binfmt_misc+qemu.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-03-20 16:10:36 -07:00
Olli Janatuinen
1308a3a99f Move DefaultCapabilities() to caps package
Signed-off-by: Olli Janatuinen <olli.janatuinen@gmail.com>
2019-11-14 21:13:16 +02:00
Sebastiaan van Stijn
6b91ceff74
Use hcsshim osversion package for Windows versions
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-10-22 02:53:00 +02:00
John Howard
8988448729 Remove refs to jhowardmsft from .go code
Signed-off-by: John Howard <jhoward@microsoft.com>
2019-09-25 10:51:18 -07:00
Sebastiaan van Stijn
07ff4f1de8
goimports: fix imports
Format the source according to latest goimports.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-09-18 12:56:54 +02:00
John Howard
a3eda72f71
Merge pull request #38541 from Microsoft/jjh/containerd
Windows: Experimental: ContainerD runtime
2019-03-19 21:09:19 -07:00
Jean Rouge
7fdac7eb0f Making it possible to pass Windows credential specs directly to the engine
Instead of having to go through files or registry values as is currently the
case.

While adding GMSA support to Kubernetes (https://github.com/kubernetes/kubernetes/pull/73726)
I stumbled upon the fact that Docker currently only allows passing Windows
credential specs through files or registry values, forcing the Kubelet
to perform a rather awkward dance of writing-then-deleting to either the
disk or the registry to be able to create a Windows container with cred
specs.

This patch solves this problem by making it possible to directly pass
whole base64-encoded cred specs to the engine's API. I took the opportunity
to slightly refactor the method responsible for Windows cred spec as it
seemed hard to read to me.

Added some unit tests on Windows credential specs handling, as there were
previously none.

Added/amended the relevant integration tests.

I have also tested it manually: given a Windows container using a cred spec
that you would normally start with e.g.
```powershell
docker run --rm --security-opt "credentialspec=file://win.json" mcr.microsoft.com/windows/servercore:ltsc2019 nltest /parentdomain
# output:
# my.ad.domain.com. (1)
# The command completed successfully
```
can now equivalently be started with
```powershell
$rawCredSpec = & cat 'C:\ProgramData\docker\credentialspecs\win.json'
$escaped = $rawCredSpec.Replace('"', '\"')
docker run --rm --security-opt "credentialspec=raw://$escaped" mcr.microsoft.com/windows/servercore:ltsc2019 nltest /parentdomain
# same output!
```

I'll do another PR on Swarmkit after this is merged to allow services to use
the same option.

(It's worth noting that @dperny faced the same problem adding GMSA support
to Swarmkit, to which he came up with an interesting solution - see
https://github.com/moby/moby/pull/38632 - but alas these tricks are not
available to the Kubelet.)

Signed-off-by: Jean Rouge <rougej+github@gmail.com>
2019-03-15 19:20:19 -07:00
John Howard
d4ceb61f2b LCOW:Reworking spec builder
Signed-off-by: John Howard <jhoward@microsoft.com>
2019-03-12 18:41:55 -07:00
John Howard
20833b06a0 Windows: (WCOW) Generate OCI spec that remote runtime can escape
Signed-off-by: John Howard <jhoward@microsoft.com>

Also fixes https://github.com/moby/moby/issues/22874

This commit is a pre-requisite to moving moby/moby on Windows to using
Containerd for its runtime.

The reason for this is that the interface between moby and containerd
for the runtime is an OCI spec which must be unambigious.

It is the responsibility of the runtime (runhcs in the case of
containerd on Windows) to ensure that arguments are escaped prior
to calling into HCS and onwards to the Win32 CreateProcess call.

Previously, the builder was always escaping arguments which has
led to several bugs in moby. Because the local runtime in
libcontainerd had context of whether or not arguments were escaped,
it was possible to hack around in daemon/oci_windows.go with
knowledge of the context of the call (from builder or not).

With a remote runtime, this is not possible as there's rightly
no context of the caller passed across in the OCI spec. Put another
way, as I put above, the OCI spec must be unambigious.

The other previous limitation (which leads to various subtle bugs)
is that moby is coded entirely from a Linux-centric point of view.

Unfortunately, Windows != Linux. Windows CreateProcess uses a
command line, not an array of arguments. And it has very specific
rules about how to escape a command line. Some interesting reading
links about this are:

https://blogs.msdn.microsoft.com/twistylittlepassagesallalike/2011/04/23/everyone-quotes-command-line-arguments-the-wrong-way/
https://stackoverflow.com/questions/31838469/how-do-i-convert-argv-to-lpcommandline-parameter-of-createprocess
https://docs.microsoft.com/en-us/cpp/cpp/parsing-cpp-command-line-arguments?view=vs-2017

For this reason, the OCI spec has recently been updated to cater
for more natural syntax by including a CommandLine option in
Process.

What does this commit do?

Primary objective is to ensure that the built OCI spec is unambigious.

It changes the builder so that `ArgsEscaped` as commited in a
layer is only controlled by the use of CMD or ENTRYPOINT.

Subsequently, when calling in to create a container from the builder,
if follows a different path to both `docker run` and `docker create`
using the added `ContainerCreateIgnoreImagesArgsEscaped`. This allows
a RUN from the builder to control how to escape in the OCI spec.

It changes the builder so that when shell form is used for RUN,
CMD or ENTRYPOINT, it builds (for WCOW) a more natural command line
using the original as put by the user in the dockerfile, not
the parsed version as a set of args which loses fidelity.
This command line is put into args[0] and `ArgsEscaped` is set
to true for CMD or ENTRYPOINT. A RUN statement does not commit
`ArgsEscaped` to the commited layer regardless or whether shell
or exec form were used.
2019-03-12 18:41:55 -07:00
John Howard
85ad4b16c1 Windows: Experimental: Allow containerd for runtime
Signed-off-by: John Howard <jhoward@microsoft.com>

This is the first step in refactoring moby (dockerd) to use containerd on Windows.
Similar to the current model in Linux, this adds the option to enable it for runtime.
It does not switch the graphdriver to containerd snapshotters.

 - Refactors libcontainerd to a series of subpackages so that either a
  "local" containerd (1) or a "remote" (2) containerd can be loaded as opposed
  to conditional compile as "local" for Windows and "remote" for Linux.

 - Updates libcontainerd such that Windows has an option to allow the use of a
   "remote" containerd. Here, it communicates over a named pipe using GRPC.
   This is currently guarded behind the experimental flag, an environment variable,
   and the providing of a pipename to connect to containerd.

 - Infrastructure pieces such as under pkg/system to have helper functions for
   determining whether containerd is being used.

(1) "local" containerd is what the daemon on Windows has used since inception.
It's not really containerd at all - it's simply local invocation of HCS APIs
directly in-process from the daemon through the Microsoft/hcsshim library.

(2) "remote" containerd is what docker on Linux uses for it's runtime. It means
that there is a separate containerd service running, and docker communicates over
GRPC to it.

To try this out, you will need to start with something like the following:

Window 1:
	containerd --log-level debug

Window 2:
	$env:DOCKER_WINDOWS_CONTAINERD=1
	dockerd --experimental -D --containerd \\.\pipe\containerd-containerd

You will need the following binary from github.com/containerd/containerd in your path:
 - containerd.exe

You will need the following binaries from github.com/Microsoft/hcsshim in your path:
 - runhcs.exe
 - containerd-shim-runhcs-v1.exe

For LCOW, it will require and initrd.img and kernel in `C:\Program Files\Linux Containers`.
This is no different to the current requirements. However, you may need updated binaries,
particularly initrd.img built from Microsoft/opengcs as (at the time of writing), Linuxkit
binaries are somewhat out of date.

Note that containerd and hcsshim for HCS v2 APIs do not yet support all the required
functionality needed for docker. This will come in time - this is a baby (although large)
step to migrating Docker on Windows to containerd.

Note that the HCS v2 APIs are only called on RS5+ builds. RS1..RS4 will still use
HCS v1 APIs as the v2 APIs were not fully developed enough on these builds to be usable.
This abstraction is done in HCSShim. (Referring specifically to runtime)

Note the LCOW graphdriver still uses HCS v1 APIs regardless.

Note also that this does not migrate docker to use containerd snapshotters
rather than graphdrivers. This needs to be done in conjunction with Linux also
doing the same switch.
2019-03-12 18:41:55 -07:00
Drew Erny
6f1d7ddfa4 Use Runtime target
The Swarmkit api specifies a target for configs called called "Runtime"
which indicates that the config is not mounted into the container but
has some other use. This commit updates the Docker api to reflect this.

Signed-off-by: Drew Erny <drew.erny@docker.com>
2019-02-19 13:14:17 -06:00
Drew Erny
04995fa7c7 Add CredentialSpec from configs support
Signed-off-by: Drew Erny <drew.erny@docker.com>
2019-02-04 14:52:01 -06:00
Yusuf Tarık Günaydın
86bd2e9864 Implemented memory and CPU limits for LCOW.
Signed-off-by: Yusuf Tarık Günaydın <yusuf_tarik@hotmail.com>
2019-02-02 13:02:23 +03:00
Olli Janatuinen
80d7bfd54d Capabilities refactor
- Add support for exact list of capabilities, support only OCI model
- Support OCI model on CapAdd and CapDrop but remain backward compatibility
- Create variable locally instead of declaring it at the top
- Use const for magic "ALL" value
- Rename `cap` variable as it overlaps with `cap()` built-in
- Normalize and validate capabilities before use
- Move validation for conflicting options to validateHostConfig()
- TweakCapabilities: simplify logic to calculate capabilities

Signed-off-by: Olli Janatuinen <olli.janatuinen@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-01-22 21:50:41 +02:00
Michael Crosby
b940cc5cff Move caps and device spec utils to oci pkg
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2018-12-11 10:20:25 -05:00
Justin Terry (VM)
b2d99865ea Add --device support for Windows
Implements the --device forwarding for Windows daemons. This maps the physical
device into the container at runtime.

Ex:

docker run --device="class/<clsid>" <image> <cmd>

Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
2018-11-21 15:31:17 -08:00
John Starks
e9268d9642 lcow: Allow the client to add device cgroup rules
Signed-off-by: John Starks <jostarks@microsoft.com>
2018-06-15 16:14:17 -07:00
John Starks
349aeeab7c lcow: Allow the client to add or remove capabilities
Signed-off-by: John Starks <jostarks@microsoft.com>
2018-06-15 16:03:33 -07:00
Brian Goff
cc8f358c23 Move network operations out of container package
These network operations really don't have anything to do with the
container but rather are setting up the networking.

Ideally these wouldn't get shoved into the daemon package, but doing
something else (e.g. extract a network service into a new package) but
there's a lot more work to do in that regard.
In reality, this probably simplifies some of that work as it moves all
the network operations to the same place.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-05-10 17:16:00 -04:00
John Howard
0f5fe3f9cf Windows: Fix Hyper-V containers regression from 36586
Signed-off-by: John Howard <jhoward@microsoft.com>
2018-03-15 15:36:36 -07:00
Kir Kolyshkin
d6ea46ceda container.BaseFS: check for nil before deref
Commit 7a7357dae1 ("LCOW: Implemented support for docker cp + build")
changed `container.BaseFS` from being a string (that could be empty but
can't lead to nil pointer dereference) to containerfs.ContainerFS,
which could be be `nil` and so nil dereference is at least theoretically
possible, which leads to panic (i.e. engine crashes).

Such a panic can be avoided by carefully analysing the source code in all
the places that dereference a variable, to make the variable can't be nil.
Practically, this analisys are impossible as code is constantly
evolving.

Still, we need to avoid panics and crashes. A good way to do so is to
explicitly check that a variable is non-nil, returning an error
otherwise. Even in case such a check looks absolutely redundant,
further changes to the code might make it useful, and having an
extra check is not a big price to pay to avoid a panic.

This commit adds such checks for all the places where it is not obvious
that container.BaseFS is not nil (which in this case means we do not
call daemon.Mount() a few lines earlier).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2018-03-13 21:24:48 -07:00
Sebastiaan van Stijn
1346a2c89a
Merge pull request #36267 from Microsoft/jjh/removeservicing
Windows: Remove servicing mode
2018-02-28 01:15:03 +01:00
John Howard
d4f37c0885 Windows: Remove servicing mode
Signed-off-by: John Howard <jhoward@microsoft.com>
2018-02-27 08:48:31 -08:00
Daniel Nephin
2b1a2b10af Move ImageService to new package
Signed-off-by: Daniel Nephin <dnephin@docker.com>
2018-02-26 16:49:37 -05:00
Daniel Nephin
0dab53ff3c Move all daemon image methods into imageService
imageService provides the backend for the image API and handles the
imageStore, and referenceStore.

Signed-off-by: Daniel Nephin <dnephin@docker.com>
2018-02-26 16:48:29 -05:00
Daniel Nephin
f6639cb46d GetLayerFolders
Signed-off-by: Daniel Nephin <dnephin@docker.com>
2018-02-21 18:26:16 -05:00
Brian Goff
c02171802b Merge configs/secrets in unix implementation
On unix, merge secrets/configs handling. This is important because
configs can contain secrets (via templating) and potentially a config
could just simply have secret information "by accident" from the user.
This just make sure that configs are as secure as secrets and de-dups a
lot of code.
Generally this makes everything simpler and configs more secure.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-02-16 11:25:14 -05:00
Kir Kolyshkin
195893d381 c.RWLayer: check for nil before use
Since commit e9b9e4ace2 has landed, there is a chance that
container.RWLayer is nil (due to some half-removed container). Let's
check the pointer before use to avoid any potential nil pointer
dereferences, resulting in a daemon crash.

Note that even without the abovementioned commit, it's better to perform
an extra check (even it's totally redundant) rather than to have a
possibility of a daemon crash. In other words, better be safe than
sorry.

[v2: add a test case for daemon.getInspectData]
[v3: add a check for container.Dead and a special error for the case]

Fixes: e9b9e4ace2
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2018-02-09 11:24:09 -08:00
Daniel Nephin
4f0d95fa6e Add canonical import comment
Signed-off-by: Daniel Nephin <dnephin@docker.com>
2018-02-05 16:51:57 -05:00
Anusha Ragunathan
c162e8eb41
Merge pull request #35830 from cpuguy83/unbindable_shm
Make container shm parent unbindable
2018-01-19 17:43:30 -08:00
John Howard
0cba7740d4 Address feedback from Tonis
Signed-off-by: John Howard <jhoward@microsoft.com>
2018-01-18 12:30:39 -08:00
John Howard
afd305c4b5 LCOW: Refactor to multiple layer-stores based on feedback
Signed-off-by: John Howard <jhoward@microsoft.com>
2018-01-18 08:31:05 -08:00
John Howard
ce8e529e18 LCOW: Re-coalesce stores
Signed-off-by: John Howard <jhoward@microsoft.com>

The re-coalesces the daemon stores which were split as part of the
original LCOW implementation.

This is part of the work discussed in https://github.com/moby/moby/issues/34617,
in particular see the document linked to in that issue.
2018-01-18 08:29:19 -08:00
Brian Goff
eaa5192856 Make container resource mounts unbindable
It's a common scenario for admins and/or monitoring applications to
mount in the daemon root dir into a container. When doing so all mounts
get coppied into the container, often with private references.
This can prevent removal of a container due to the various mounts that
must be configured before a container is started (for example, for
shared /dev/shm, or secrets) being leaked into another namespace,
usually with private references.

This is particularly problematic on older kernels (e.g. RHEL < 7.4)
where a mount may be active in another namespace and attempting to
remove a mountpoint which is active in another namespace fails.

This change moves all container resource mounts into a common directory
so that the directory can be made unbindable.
What this does is prevents sub-mounts of this new directory from leaking
into other namespaces when mounted with `rbind`... which is how all
binds are handled for containers.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-01-16 15:09:05 -05:00
Yong Tang
ccc2ed0189 Rename FindUniqueNetwork to FindNetwork
This fix is a follow up to 30397, with `FindUniqueNetwork`
changed to `FindNetwork` based on the review feedback.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
2018-01-15 17:34:40 +00:00
Yong Tang
b249ccb115 Update and use FindNetwork on Windows.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
2018-01-07 03:32:37 +00:00