Let clients choose object types to compute disk usage of.
Signed-off-by: Roman Volosatovs <roman.volosatovs@docker.com>
Co-authored-by: Sebastiaan van Stijn <github@gone.nl>
The reasoning for this change is to be able to query image shared size without having to rely on the more heavyweight `/system/df` endpoint.
Signed-off-by: Roman Volosatovs <roman.volosatovs@docker.com>
This makes it easier to add more options to the backend without having to change
the signature.
While we're changing the signature, also adding a context.Context, which is not
currently used, but probably should be at some point.
Signed-off-by: Roman Volosatovs <roman.volosatovs@docker.com>
Ensure empty `BuildCache` field is represented as empty JSON array(`[]`)
instead of `null` to be consistent with `Images`, `Containers` etc.
Signed-off-by: Roman Volosatovs <roman.volosatovs@docker.com>
The LCOW implementation in dockerd has been deprecated in favor of re-implementation
in containerd (in progress). Microsoft started removing the LCOW V1 code from the
build dependencies we use in Microsoft/opengcs (soon to be part of Microsoft/hcshhim),
which means that we need to start removing this code.
This first step removes the lcow graphdriver, the LCOW initialization code, and
some LCOW-related utilities.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
After moving libnetwork to this repo, we need to update all the import
paths for libnetwork to point to docker/docker/libnetwork instead of
docker/libnetwork.
This change implements that.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This option was originally added in d05aa418b0,
and moved in 8b15839ee8 (after which it temporarily
went to the docker/engine-api repository, and was brought back in this repository
in 91e197d614).
However, it looks like this field was never used; the API always returns the standard
information, and the "--quiet" option for `docker ps` is implemented on the CLI
side, which uses different formatting when setting this option;
2ec468e284/api/client/ps.go (L73-L79)
This patch removes the unused field,
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
While the field in the Go struct is named `NanoCPUs`, it has a JSON label to
use `NanoCpus`, which was added in the original pull request (not clear what
the reason was); 846baf1fd3
Some notes:
- Golang processes field names case-insensitive, so when *using* the API,
both cases should work, but when inspecting a container, the field is
returned as `NanoCpus`.
- This only affects Containers.Resources. The `Limits` and `Reservation`
for SwarmKit services and SwarmKit "nodes" do not override the name
for JSON, so have the canonical (`NanoCPUs`) casing.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This fixes a regression based on expectations of the runtime:
```
docker pull arm32v7/alpine
docker run arm32v7/alpine
```
Without this change, the `docker run` will fail due to platform
matching on non-arm32v7 systems, even though the image could run
(assuming the system is setup correctly).
This also emits a warning to make sure that the user is aware that a
platform that does not match the default platform of the system is being
run, for the cases like:
```
docker pull --platform armhf busybox
docker run busybox
```
Not typically an issue if the requests are done together like that, but
if the image was already there and someone did `docker run` without an
explicit `--platform`, they may very well be expecting to run a native
version of the image instead of the armhf one.
This warning does add some extra noise in the case of platform specific
images being run, such as `arm32v7/alpine`, but this can be supressed by
explicitly setting the platform.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Current description of the "v" option doesn't explain what happens to
the volumes that are still in use by other containers. Turns out that
the only volumes that are removed are unnamed ones[1].
Perhaps a good way of clarifying this behavior would be adapting the
description from "docker rm --help".
As for the docs/api/v1.*.yaml changes — they seem to be applicable,
since the origin of this behavior dates way back to the 2016 or v1.11[2].
[1]: a24a71c50f/daemon/mounts.go (L34-L38)
[2]: dd7d1c8a02
Signed-off-by: Nikolay Edigaryev <edigaryev@gmail.com>
These types were not used in the API, so could not come up with
a reason why they were in that package.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Add Ulimits field to the ContainerSpec API type and wire it to Swarmkit.
This is related to #40639.
Signed-off-by: Albin Kerouanton <albin@akerouanton.name>
This patch adds a new "prune" event type to indicate that pruning of a resource
type completed.
This event-type can be used on systems that want to perform actions after
resources have been cleaned up. For example, Docker Desktop performs an fstrim
after resources are deleted (https://github.com/linuxkit/linuxkit/tree/v0.7/pkg/trim-after-delete).
While the current (remove, destroy) events can provide information on _most_
resources, there is currently no event triggered after the BuildKit build-cache
is cleaned.
Prune events have a `reclaimed` attribute, indicating the amount of space that
was reclaimed (in bytes). The attribute can be used, for example, to use as a
threshold for performing fstrim actions. Reclaimed space for `network` events
will always be 0, but the field is added to be consistent with prune events for
other resources.
To test this patch:
Create some resources:
for i in foo bar baz; do \
docker network create network_$i \
&& docker volume create volume_$i \
&& docker run -d --name container_$i -v volume_$i:/volume busybox sh -c 'truncate -s 5M somefile; truncate -s 5M /volume/file' \
&& docker tag busybox:latest image_$i; \
done;
docker pull alpine
docker pull nginx:alpine
echo -e "FROM busybox\nRUN truncate -s 50M bigfile" | DOCKER_BUILDKIT=1 docker build -
Start listening for "prune" events in another shell:
docker events --filter event=prune
Prune containers, networks, volumes, and build-cache:
docker system prune -af --volumes
See the events that are returned:
docker events --filter event=prune
2020-07-25T12:12:09.268491000Z container prune (reclaimed=15728640)
2020-07-25T12:12:09.447890400Z network prune (reclaimed=0)
2020-07-25T12:12:09.452323000Z volume prune (reclaimed=15728640)
2020-07-25T12:12:09.517236200Z image prune (reclaimed=21568540)
2020-07-25T12:12:09.566662600Z builder prune (reclaimed=52428841)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
After dicussing with maintainers, it was decided putting the burden of
providing the full cap list on the client is not a good design.
Instead we decided to follow along with the container API and use cap
add/drop.
This brings in the changes already merged into swarmkit.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Kernel memory limit is not supported on cgroup v2.
Even on cgroup v1, kernel memory limit (`kmem.limit_in_bytes`) has been deprecated since kernel 5.4.
0158115f70
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
In dockerd we already have a concept of a "runtime", which specifies the
OCI runtime to use (e.g. runc).
This PR extends that config to add containerd shim configuration.
This option is only exposed within the daemon itself (cannot be
configured in daemon.json).
This is due to issues in supporting unknown shims which will require
more design work.
What this change allows us to do is keep all the runtime config in one
place.
So the default "runc" runtime will just have it's already existing shim
config codified within the runtime config alone.
I've also added 2 more "stock" runtimes which are basically runc+shimv1
and runc+shimv2.
These new runtime configurations are:
- io.containerd.runtime.v1.linux - runc + v1 shim using the V1 shim API
- io.containerd.runc.v2 - runc + shim v2
These names coincide with the actual names of the containerd shims.
This allows the user to essentially control what shim is going to be
used by either specifying these as a `--runtime` on container create or
by setting `--default-runtime` on the daemon.
For custom/user-specified runtimes, the default shim config (currently
shim v1) is used.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
The initial implementation followed the Swarm API, where
PidsLimit is located in ContainerSpec. This is not the
desired place for this property, so moving the field to
TaskTemplate.Resources in our API.
A similar change should be made in the SwarmKit API (likely
keeping the old field for backward compatibility, because
it was merged some releases back)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This prevents projects that import only the api/types package from
also having to use the errdefs package (and because of that, containerd)
as a dependency.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This introduces A new type (`Limit`), which allows Limits
and "Reservations" to have different options, as it's not
possible to make "Reservations" for some kind of limits.
The `GenericResources` have been removed from the new type;
the API did not handle specifying `GenericResources` as a
_Limit_ (only as _Reservations_), and this field would
therefore always be empty (omitted) in the `Limits` case.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Support for PidsLimit was added to SwarmKit in docker/swarmkit/pull/2415,
but never exposed through the Docker remove API.
This patch exposes the feature in the repote API.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
While the docker cli may be sending a "version" header, this header
is not part of the API, or at least should not determin what API
version is used.
This code was added in c0afd9c873, to
adjust the handling of requests when an older version of the API was
used, but because the code relied on the "version" header set by the
CLI, it didn't work with other clients (e.g. when using cURL to make
an API request).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
"Container's image" term is rather ambiguous: it can be both a name and an ID.
Looking at the sources[1], it's actually an image ID, so bring some clarity.
[1]: a6a47d1a49/daemon/inspect.go (L170)
Signed-off-by: Nikolay Edigaryev <edigaryev@gmail.com>
The following fields are unsupported:
* BlkioStats: all fields other than IoServiceBytesRecursive
* CPUStats: CPUUsage.PercpuUsage
* MemoryStats: MaxUsage and Failcnt
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
This enables image lookup when creating a container to fail when the
reference exists but it is for the wrong platform. This prevents trying
to run an image for the wrong platform, as can be the case with, for
example binfmt_misc+qemu.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Metrics collectors generally don't need the daemon to prime the stats
with something to compare since they already have something to compare
with.
Before this change, the API does 2 collection cycles (which takes
roughly 2s) in order to provide comparison for CPU usage over 1s. This
was primarily added so that `docker stats --no-stream` had something to
compare against.
Really the CLI should have just made a 2nd call and done the comparison
itself rather than forcing it on all API consumers.
That ship has long sailed, though.
With this change, clients can set an option to just pull a single stat,
which is *at least* a full second faster:
Old:
```
time curl --unix-socket
/go/src/github.com/docker/docker/bundles/test-integration-shell/docker.sock
http://./containers/test/stats?stream=false\&one-shot=false > /dev/null
2>&1
real0m1.864s
user0m0.005s
sys0m0.007s
time curl --unix-socket
/go/src/github.com/docker/docker/bundles/test-integration-shell/docker.sock
http://./containers/test/stats?stream=false\&one-shot=false > /dev/null
2>&1
real0m1.173s
user0m0.010s
sys0m0.006s
```
New:
```
time curl --unix-socket
/go/src/github.com/docker/docker/bundles/test-integration-shell/docker.sock
http://./containers/test/stats?stream=false\&one-shot=true > /dev/null
2>&1
real0m0.680s
user0m0.008s
sys0m0.004s
time curl --unix-socket
/go/src/github.com/docker/docker/bundles/test-integration-shell/docker.sock
http://./containers/test/stats?stream=false\&one-shot=true > /dev/null
2>&1
real0m0.156s
user0m0.007s
sys0m0.007s
```
This fixes issues with downstreams ability to use the stats API to
collect metrics.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This query-parameter was deprecated in docker 1.13 in commit
820b809e70, and scheduled for
removal in docker 17.12, so we should remove it for the next
API version.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Adds support for ReplicatedJob and GlobalJob service modes. These modes
allow running service which execute tasks that exit upon success,
instead of daemon-type tasks.
Signed-off-by: Drew Erny <drew.erny@docker.com>
This information was added to an older version of the API
documentation (through 164ab2cfc9 and
5213a0a67e), but only added in the
"docs" branch.
This patch copies the information to the swagger file.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This parameter was introduced 4 years ago in b857dadb33
as part of https://github.com/moby/moby/pull/15711, but has never made it to the API docs.
Signed-off-by: Jan Chren (rindeal) <dev.rindeal@gmail.com>
- construct the initial options as a literal
- move validation for windows up, and fail early
- move all API-version handling together
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
As described in https://golang.org/s/generatedcode, Go has
a formalized format that should be used to indicate that a
file is generated.
Matching that format helps linters to skip generated files;
From https://golang.org/s/generatedcode (https://github.com/golang/go/issues/13560#issuecomment-288457920);
> Generated files are marked by a line of text that matches the regular expression, in Go syntax:
>
> ^// Code generated .* DO NOT EDIT\.$
>
> The `.*` means the tool can put whatever folderol it wants in there, but the comment
> must be a single line and must start with `Code generated` and end with `DO NOT EDIT.`,
> with a period.
>
> The text may appear anywhere in the file.
This patch updates the template used for our generated types
to match that format.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The validate step in CI was broken, due to a combination of
086b4541cf, fbdd437d29,
and 85733620eb being merged to master.
```
api/types/filters/parse.go:39:1: exported method `Args.Keys` should have comment or be unexported (golint)
func (args Args) Keys() []string {
^
daemon/config/builder.go:19:6: exported type `BuilderGCFilter` should have comment or be unexported (golint)
type BuilderGCFilter filters.Args
^
daemon/config/builder.go:21:1: exported method `BuilderGCFilter.MarshalJSON` should have comment or be unexported (golint)
func (x *BuilderGCFilter) MarshalJSON() ([]byte, error) {
^
daemon/config/builder.go:35:1: exported method `BuilderGCFilter.UnmarshalJSON` should have comment or be unexported (golint)
func (x *BuilderGCFilter) UnmarshalJSON(data []byte) error {
^
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Adds a new ServiceStatus field to the Service object, which includes the
running and desired task counts. This new field is gated behind a
"status" query parameter.
Signed-off-by: Drew Erny <drew.erny@docker.com>
If anything marshals the daemon config now or in the future
this commit ensures the correct canonical form for the builder
GC policies' filters.
Signed-off-by: Tibor Vass <tibor@docker.com>
This feature was used by docker build --stream and it was kept experimental.
Users of this endpoint should enable BuildKit anyway by setting Version to BuilderBuildKit.
Signed-off-by: Tibor Vass <tibor@docker.com>
The `/session` endpoint left experimental in API V1.39 through
239047c2d3 and
01c9e7082e, but the API reference
was not updated accordingly.
This updates the API documentation to match the change.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This commit adds the image variant to the image.(Image) type and
updates related functionality. Images built from another will
inherit the OS, architecture and variant.
Note that if a base image does not specify an architecture, the
local machine's architecture is used for inherited images. On the
other hand, the variant is set equal to the parent image's variant,
even when the parent image's variant is unset.
The legacy builder is also updated to allow the user to specify
a '--platform' argument on the command line when creating an image
FROM scratch. A complete platform specification, including variant,
is supported. The built image will include the variant, as will any
derived images.
Signed-off-by: Chris Price <chris.price@docker.com>
```
api/server/router/build/build_routes.go:309:41: Error return value of `(*encoding/json.Decoder).Decode` is not checked (errcheck)
api/server/router/build/build_routes.go:431:11: Error return value of `io.Copy` is not checked (errcheck)
api/server/router/container/container_routes.go:582:13: Error return value of `conn.Write` is not checked (errcheck)
api/server/router/grpc/grpc_routes.go:38:12: Error return value of `conn.Write` is not checked (errcheck)
api/server/router/grpc/grpc_routes.go:39:12: Error return value of `resp.Write` is not checked (errcheck)
api/server/router/image/image_routes.go:94:15: Error return value of `output.Write` is not checked (errcheck)
api/server/router/image/image_routes.go:139:15: Error return value of `output.Write` is not checked (errcheck)
api/server/router/image/image_routes.go:164:15: Error return value of `output.Write` is not checked (errcheck)
api/server/router/image/image_routes.go:180:15: Error return value of `output.Write` is not checked (errcheck)
api/server/router/plugin/plugin_routes.go:126:15: Error return value of `output.Write` is not checked (errcheck)
api/server/router/plugin/plugin_routes.go:165:15: Error return value of `output.Write` is not checked (errcheck)
api/server/router/plugin/plugin_routes.go:273:15: Error return value of `output.Write` is not checked (errcheck)
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
```
api/types/filters/parse_test.go:340:14: Error return value of `f.WalkValues` is not checked (errcheck)
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Add annotations to suppress warnings like this one:
> client/container_list.go:38:22: SA1019: filters.ToParamWithVersion is deprecated: Use ToJSON (staticcheck)
> filterJSON, err := filters.ToParamWithVersion(cli.version, options.Filters)
> ^
Modify the deprecation notice to specify it is applicable to new code
only.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Format the source according to latest goimports.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The router swapper was previously used to toggle
a debug mode, that code has since been removed.
Now this router is unnecessary.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
The previous description stated that an array of names / ids could be passed when the API in reality expects objects in the form of NetworkAttachmentConfig. This is fixed by updating the description and adding a definition for NetworkAttachmentConfig.
Signed-off-by: Hannes Ljungberg <hannes@5monkeys.se>
Fix the indentation to allow jane-openapi generate to work
Signed-off-by: Jeremy Leherpeur <jeremy.leherpeur@yousign.fr>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Path-specific rules were removed, so this is no longer used.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 530e63c1a61b105a6f7fc143c5acb9b5cd87f958)
Signed-off-by: Tibor Vass <tibor@docker.com>
Commit 77b8465d7e added a secret update
endpoint to allow updating labels on existing secrets. However, when
implementing the endpoint, the DebugRequestMiddleware was not updated
to scrub the Data field (as is being done when creating a secret).
When updating a secret (to set labels), the Data field should be either
`nil` (not set), or contain the same value as the existing secret. In
situations where the Data field is set, and the `dockerd` daemon is
running with debugging enabled / log-level debug, the base64-encoded
value of the secret is printed to the daemon logs.
The docker cli does not have a `docker secret update` command, but
when using `docker stack deploy`, the docker cli sends the secret
data both when _creating_ a stack, and when _updating_ a stack, thus
leaking the secret data if the daemon runs with debug enabled:
1. Start the daemon in debug-mode
dockerd --debug
2. Initialize swarm
docker swarm init
3. Create a file containing a secret
echo secret > my_secret.txt
4. Create a docker-compose file using that secret
cat > docker-compose.yml <<'EOF'
version: "3.3"
services:
web:
image: nginx:alpine
secrets:
- my_secret
secrets:
my_secret:
file: ./my_secret.txt
EOF
5. Deploy the stack
docker stack deploy -c docker-compose.yml test
6. Verify that the secret is scrubbed in the daemon logs
DEBU[2019-07-01T22:36:08.170617400Z] Calling POST /v1.30/secrets/create
DEBU[2019-07-01T22:36:08.171364900Z] form data: {"Data":"*****","Labels":{"com.docker.stack.namespace":"test"},"Name":"test_my_secret"}
7. Re-deploy the stack to trigger an "update"
docker stack deploy -c docker-compose.yml test
8. Notice that this time, the Data field is not scrubbed, and the base64-encoded secret is logged
DEBU[2019-07-01T22:37:35.828819400Z] Calling POST /v1.30/secrets/w3hgvwpzl8yooq5ctnyp71v52/update?version=34
DEBU[2019-07-01T22:37:35.829993700Z] form data: {"Data":"c2VjcmV0Cg==","Labels":{"com.docker.stack.namespace":"test"},"Name":"test_my_secret"}
This patch modifies `maskSecretKeys` to unconditionally scrub `Data` fields.
Currently, only the `secrets` and `configs` endpoints use a field with this
name, and no other POST API endpoints use a data field, so scrubbing this
field unconditionally will only scrub requests for those endpoints.
If a new endpoint is added in future where this field should not be scrubbed,
we can re-introduce more fine-grained (path-specific) handling.
This patch introduces some change in behavior:
- In addition to secrets, requests to create or update _configs_ will
now have their `Data` field scrubbed. Generally, the actual data should
not be interesting for debugging, so likely will not be problematic.
In addition, scrubbing this data for configs may actually be desirable,
because (even though they are not explicitely designed for this purpose)
configs may contain sensitive data (credentials inside a configuration
file, e.g.).
- Requests that send key/value pairs as a "map" and that contain a
key named "data", will see the value of that field scrubbed. This
means that (e.g.) setting a `label` named `data` on a config, will
scrub/mask the value of that label.
- Note that this is already the case for any label named `jointoken`,
`password`, `secret`, `signingcakey`, or `unlockkey`.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit c7ce4be93ae8edd2da62a588e01c67313a4aba0c)
Signed-off-by: Tibor Vass <tibor@docker.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 32d70c7e21631224674cd60021d3ec908c2d888c)
Signed-off-by: Tibor Vass <tibor@docker.com>
Add tests for
- case-insensitive matching of fields
- recursive masking
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit db5f811216e70bcb4a10e477c1558d6c68f618c5)
Signed-off-by: Tibor Vass <tibor@docker.com>
This is needed so that we can add OS version constraints in Swarmkit, which
does require the engine to report its host's OS version (see
https://github.com/docker/swarmkit/issues/2770).
The OS version is parsed from the `os-release` file on Linux, and from the
`ReleaseId` string value of the `SOFTWARE\Microsoft\Windows NT\CurrentVersion`
registry key on Windows.
Added unit tests when possible, as well as Prometheus metrics.
Signed-off-by: Jean Rouge <rougej+github@gmail.com>
Currently the API spec would allow `"443/tcp": [null]`, but what should
be allowed is `"443/tcp": null`
Signed-off-by: Dominic Tubach <dominic.tubach@to.com>
This adds both a daemon-wide flag and a container creation property:
- Set the `CgroupnsMode: "host|private"` HostConfig property at
container creation time to control what cgroup namespace the container
is created in
- Set the `--default-cgroupns-mode=host|private` daemon flag to control
what cgroup namespace containers are created in by default
- Set the default if the daemon flag is unset to "host", for backward
compatibility
- Default to CgroupnsMode: "host" for client versions < 1.40
Signed-off-by: Rob Gulewich <rgulewich@netflix.com>
Allow specifying environment variables when installing an engine plugin
as a Swarm service. Invalid environment variable entries (without an
equals (`=`) char) will be ignored.
Signed-off-by: Sune Keller <absukl@almbrand.dk>
This patch hard-codes support for NVIDIA GPUs.
In a future patch it should move out into its own Device Plugin.
Signed-off-by: Tibor Vass <tibor@docker.com>
Running a cluster in a two-manager configuration effectively *doubles*
the chance of loosing control over the cluster (compared to running
in a single-manager setup). Users may have the assumption that having
two managers provides fault tolerance, so it's best to warn them if
they're using this configuration.
This patch adds a warning to the `info` response if Swarm is configured
with two managers:
WARNING: Running Swarm in a two-manager configuration. This configuration provides
no fault tolerance, and poses a high risk to loose control over the cluster.
Refer to https://docs.docker.com/engine/swarm/admin_guide/ to configure the
Swarm for fault-tolerance.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This utility allows a client to convert an API response
back to a typed error; allowing the client to perform
different actions based on the type of error, without
having to resort to string-matching the error.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: John Howard <jhoward@microsoft.com>
Also fixes https://github.com/moby/moby/issues/22874
This commit is a pre-requisite to moving moby/moby on Windows to using
Containerd for its runtime.
The reason for this is that the interface between moby and containerd
for the runtime is an OCI spec which must be unambigious.
It is the responsibility of the runtime (runhcs in the case of
containerd on Windows) to ensure that arguments are escaped prior
to calling into HCS and onwards to the Win32 CreateProcess call.
Previously, the builder was always escaping arguments which has
led to several bugs in moby. Because the local runtime in
libcontainerd had context of whether or not arguments were escaped,
it was possible to hack around in daemon/oci_windows.go with
knowledge of the context of the call (from builder or not).
With a remote runtime, this is not possible as there's rightly
no context of the caller passed across in the OCI spec. Put another
way, as I put above, the OCI spec must be unambigious.
The other previous limitation (which leads to various subtle bugs)
is that moby is coded entirely from a Linux-centric point of view.
Unfortunately, Windows != Linux. Windows CreateProcess uses a
command line, not an array of arguments. And it has very specific
rules about how to escape a command line. Some interesting reading
links about this are:
https://blogs.msdn.microsoft.com/twistylittlepassagesallalike/2011/04/23/everyone-quotes-command-line-arguments-the-wrong-way/https://stackoverflow.com/questions/31838469/how-do-i-convert-argv-to-lpcommandline-parameter-of-createprocesshttps://docs.microsoft.com/en-us/cpp/cpp/parsing-cpp-command-line-arguments?view=vs-2017
For this reason, the OCI spec has recently been updated to cater
for more natural syntax by including a CommandLine option in
Process.
What does this commit do?
Primary objective is to ensure that the built OCI spec is unambigious.
It changes the builder so that `ArgsEscaped` as commited in a
layer is only controlled by the use of CMD or ENTRYPOINT.
Subsequently, when calling in to create a container from the builder,
if follows a different path to both `docker run` and `docker create`
using the added `ContainerCreateIgnoreImagesArgsEscaped`. This allows
a RUN from the builder to control how to escape in the OCI spec.
It changes the builder so that when shell form is used for RUN,
CMD or ENTRYPOINT, it builds (for WCOW) a more natural command line
using the original as put by the user in the dockerfile, not
the parsed version as a set of args which loses fidelity.
This command line is put into args[0] and `ArgsEscaped` is set
to true for CMD or ENTRYPOINT. A RUN statement does not commit
`ArgsEscaped` to the commited layer regardless or whether shell
or exec form were used.
- Don't set `PidsLimit` when creating a container and
no limit was set (or the limit was set to "unlimited")
- Don't set `PidsLimit` if the host does not have pids-limit
support (previously "unlimited" was set).
- Do not generate a warning if the host does not have pids-limit
support, but pids-limit was set to unlimited (having no
limit set, or the limit set to "unlimited" is equivalent,
so no warning is nescessary in that case).
- When updating a container, convert `0`, and `-1` to
"unlimited" (`0`).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This changes the default ipc mode of daemon/engine to be private,
meaning the containers will not have their /dev/shm bind-mounted
from the host by default. The benefits of doing this are:
1. No leaked mounts. Eliminate a possibility to leak mounts into
other namespaces (and therefore unfortunate errors like "Unable to
remove filesystem for <ID>: remove /var/lib/docker/containers/<ID>/shm:
device or resource busy").
2. Working checkpoint/restore. Make `docker checkpoint`
not lose the contents of `/dev/shm`, but save it to
the dump, and be restored back upon `docker start --checkpoint`
(currently it is lost -- while CRIU handles tmpfs mounts,
the "shareable" mount is seen as external to container,
and thus rightfully ignored).
3. Better security. Currently any container is opened to share
its /dev/shm with any other container.
Obviously, this change will break the following usage scenario:
$ docker run -d --name donor busybox top
$ docker run --rm -it --ipc container:donor busybox sh
Error response from daemon: linux spec namespaces: can't join IPC
of container <ID>: non-shareable IPC (hint: use IpcMode:shareable
for the donor container)
The soution, as hinted by the (amended) error message, is to
explicitly enable donor sharing by using --ipc shareable:
$ docker run -d --name donor --ipc shareable busybox top
Compatibility notes:
1. This only applies to containers created _after_ this change.
Existing containers are not affected and will work fine
as their ipc mode is stored in HostConfig.
2. Old backward compatible behavior ("shareable" containers
by default) can be enabled by either using
`--default-ipc-mode shareable` daemon command line option,
or by adding a `"default-ipc-mode": "shareable"`
line in `/etc/docker/daemon.json` configuration file.
3. If an older client (API < 1.40) is used, a "shareable" container
is created. A test to check that is added.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
There are two if statements checking for exactly same conditions:
> if hostConfig != nil && versions.LessThan(version, "1.40")
Merge these.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Like its counterpart in images and volumes, introduce the dangling
filter while listing networks. When the filter value is set to true,
only networks which aren't attached to containers and aren't builtin
networks are shown. When set to false, all builtin networks and
networks which are attached to containers are shown.
Signed-off-by: Karthik Nayak <Karthik.188@gmail.com>
Older API clients did not use a pointer for `PidsLimit`, so
API requests would always send `0`, resulting in any previous
value to be reset after an update:
Before this patch:
(using a 17.06 Docker CLI):
```bash
docker run -dit --name test --pids-limit=16 busybox
docker container inspect --format '{{json .HostConfig.PidsLimit}}' test
16
docker container update --memory=100M --memory-swap=200M test
docker container inspect --format '{{json .HostConfig.PidsLimit}}' test
0
docker container exec test cat /sys/fs/cgroup/pids/pids.max
max
```
With this patch applied:
(using a 17.06 Docker CLI):
```bash
docker run -dit --name test --pids-limit=16 busybox
docker container inspect --format '{{json .HostConfig.PidsLimit}}' test
16
docker container update --memory=100M --memory-swap=200M test
docker container inspect --format '{{json .HostConfig.PidsLimit}}' test
16
docker container exec test cat /sys/fs/cgroup/pids/pids.max
16
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The Swarmkit api specifies a target for configs called called "Runtime"
which indicates that the config is not mounted into the container but
has some other use. This commit updates the Docker api to reflect this.
Signed-off-by: Drew Erny <drew.erny@docker.com>
This assists to address a regression where distribution errors were not properly
handled, resulting in a generic 500 (internal server error) to be returned for
`/distribution/name/json` if you weren't authenticated, whereas it should return
a 40x (401).
This patch attempts to extract the HTTP status-code that was returned by the
distribution code, and falls back to returning a 500 status if unable to match.
Before this change:
curl -v --unix-socket /var/run/docker.sock http://localhost/distribution/name/json
* Trying /var/run/docker.sock...
* Connected to localhost (/var/run/docker.sock) port 80 (#0)
> GET /distribution/name/json HTTP/1.1
> Host: localhost
> User-Agent: curl/7.52.1
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< Api-Version: 1.37
< Content-Type: application/json
< Docker-Experimental: false
< Ostype: linux
< Server: Docker/dev (linux)
< Date: Tue, 03 Jul 2018 15:52:53 GMT
< Content-Length: 115
<
{"message":"errors:\ndenied: requested access to the resource is denied\nunauthorized: authentication required\n"}
* Curl_http_done: called premature == 0
* Connection #0 to host localhost left intact
daemon logs:
DEBU[2018-07-03T15:52:51.424950601Z] Calling GET /distribution/name/json
DEBU[2018-07-03T15:52:53.179895572Z] FIXME: Got an API for which error does not match any expected type!!!: errors:
denied: requested access to the resource is denied
unauthorized: authentication required
error_type=errcode.Errors module=api
ERRO[2018-07-03T15:52:53.179942783Z] Handler for GET /distribution/name/json returned error: errors:
denied: requested access to the resource is denied
unauthorized: authentication required
With this patch applied:
curl -v --unix-socket /var/run/docker.sock http://localhost/distribution/name/json
* Trying /var/run/docker.sock...
* Connected to localhost (/var/run/docker.sock) port 80 (#0)
> GET /distribution/name/json HTTP/1.1
> Host: localhost
> User-Agent: curl/7.52.1
> Accept: */*
>
< HTTP/1.1 403 Forbidden
< Api-Version: 1.38
< Content-Type: application/json
< Docker-Experimental: false
< Ostype: linux
< Server: Docker/dev (linux)
< Date: Fri, 03 Aug 2018 14:58:09 GMT
< Content-Length: 115
<
{"message":"errors:\ndenied: requested access to the resource is denied\nunauthorized: authentication required\n"}
* Curl_http_done: called premature == 0
* Connection #0 to host localhost left intact
daemon logs:
DEBU[2018-08-03T14:58:08.018726228Z] Calling GET /distribution/name/json
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Monitoring systems and load balancers are usually configured to use HEAD
requests for health monitoring. The /_ping endpoint currently does not
support this type of request, which means that those systems have fallback
to GET requests.
This patch adds support for HEAD requests on the /_ping endpoint.
Although optional, this patch also returns `Content-Type` and `Content-Length`
headers in case of a HEAD request; Refering to RFC 7231, section 4.3.2:
The HEAD method is identical to GET except that the server MUST NOT
send a message body in the response (i.e., the response terminates at
the end of the header section). The server SHOULD send the same
header fields in response to a HEAD request as it would have sent if
the request had been a GET, except that the payload header fields
(Section 3.3) MAY be omitted. This method can be used for obtaining
metadata about the selected representation without transferring the
representation data and is often used for testing hypertext links for
validity, accessibility, and recent modification.
A payload within a HEAD request message has no defined semantics;
sending a payload body on a HEAD request might cause some existing
implementations to reject the request.
The response to a HEAD request is cacheable; a cache MAY use it to
satisfy subsequent HEAD requests unless otherwise indicated by the
Cache-Control header field (Section 5.2 of [RFC7234]). A HEAD
response might also have an effect on previously cached responses to
GET; see Section 4.3.5 of [RFC7234].
With this patch applied, either `GET` or `HEAD` requests work; the only
difference is that the body is empty in case of a `HEAD` request;
curl -i --unix-socket /var/run/docker.sock http://localhost/_ping
HTTP/1.1 200 OK
Api-Version: 1.40
Cache-Control: no-cache, no-store, must-revalidate
Docker-Experimental: false
Ostype: linux
Pragma: no-cache
Server: Docker/dev (linux)
Date: Mon, 14 Jan 2019 12:35:16 GMT
Content-Length: 2
Content-Type: text/plain; charset=utf-8
OK
curl --head -i --unix-socket /var/run/docker.sock http://localhost/_ping
HTTP/1.1 200 OK
Api-Version: 1.40
Cache-Control: no-cache, no-store, must-revalidate
Content-Length: 0
Content-Type: text/plain; charset=utf-8
Docker-Experimental: false
Ostype: linux
Pragma: no-cache
Server: Docker/dev (linux)
Date: Mon, 14 Jan 2019 12:34:15 GMT
The client is also updated to use `HEAD` by default, but fallback to `GET`
if the daemon does not support this method.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- Add support for exact list of capabilities, support only OCI model
- Support OCI model on CapAdd and CapDrop but remain backward compatibility
- Create variable locally instead of declaring it at the top
- Use const for magic "ALL" value
- Rename `cap` variable as it overlaps with `cap()` built-in
- Normalize and validate capabilities before use
- Move validation for conflicting options to validateHostConfig()
- TweakCapabilities: simplify logic to calculate capabilities
Signed-off-by: Olli Janatuinen <olli.janatuinen@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
`time.After` keeps a timer running until the specified duration is
completed. It also allocates a new timer on each call. This can wind up
leaving lots of uneccessary timers running in the background that are
not needed and consume resources.
Instead of `time.After`, use `time.NewTimer` so the timer can actually
be stopped.
In some of these cases it's not a big deal since the duraiton is really
short, but in others it is much worse.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
The cancellable handler is no longer needed as the context that is
passed with the http request will be cancelled just like the close
notifier was doing.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This fix tries to address the issue raised in 37038 where
there were no memory.kernelTCP support for linux.
This fix add MemoryKernelTCP to HostConfig, and pass
the config to runtime-spec.
Additional test case has been added.
This fix fixes 37038.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
This commit contains changes to configure DataPathPort
option. By default we use 4789 port number. But this commit
will allow user to configure port number during swarm init.
DataPathPort can't be modified after swarm init.
Signed-off-by: selansen <elango.siva@docker.com>
These options were added in API 1.39, so should be ignored
when using an older version of the API.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This allows non-recursive bind-mount, i.e. mount(2) with "bind" rather than "rbind".
Swarm-mode will be supported in a separate PR because of mutual vendoring.
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
4.8+ kernels have fixed the ptrace security issues
so we can allow ptrace(2) on the default seccomp
profile if we do the kernel version check.
93e35efb8d
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
This feature was added in 14da20f5e7,
and was merged after API v1.39 shipped as part of the Docker 18.09
release candidates.
This commit moves the feature to the correct API version.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Adds support for sysctl options in docker services.
* Adds API plumbing for creating services with sysctl options set.
* Adds swagger.yaml documentation for new API field.
* Updates the API version history document.
* Changes executor package to make use of the Sysctls field on objects
* Includes integration test to verify that new behavior works.
Essentially, everything needed to support the equivalent of docker run's
`--sysctl` option except the CLI.
Includes a vendoring of swarmkit for proto changes to support the new
behavior.
Signed-off-by: Drew Erny <drew.erny@docker.com>