Commit graph

7184 commits

Author SHA1 Message Date
Brian Goff
3148a46657 Fix various race conditions in loggerutils
Found by running with `go test -race`

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-10-28 20:36:32 +00:00
Brian Goff
88c0271605 Don't set default platform on container create
This fixes a regression based on expectations of the runtime:

```
docker pull arm32v7/alpine
docker run arm32v7/alpine
```

Without this change, the `docker run` will fail due to platform
matching on non-arm32v7 systems, even though the image could run
(assuming the system is setup correctly).

This also emits a warning to make sure that the user is aware that a
platform that does not match the default platform of the system is being
run, for the cases like:

```
docker pull --platform armhf busybox
docker run busybox
```

Not typically an issue if the requests are done together like that, but
if the image was already there and someone did `docker run` without an
explicit `--platform`, they may very well be expecting to run a native
version of the image instead of the armhf one.

This warning does add some extra noise in the case of platform specific
images being run, such as `arm32v7/alpine`, but this can be supressed by
explicitly setting the platform.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-10-20 20:17:23 +00:00
Tibor Vass
cf867587b9
Merge pull request #41527 from thaJeztah/no_oom_score_adj
daemon: don't adjust oom-score if score is 0
2020-10-15 15:00:18 -07:00
Sebastiaan van Stijn
cf7a5be0f2
daemon: don't adjust oom-score if score is 0
This patch makes two changes if --oom-score-adj is set to 0

- do not adjust the oom-score-adjust cgroup for dockerd
- do not set the hard-coded -999 score for containerd if
  containerd is running as child process

Before this change:

oom-score-adj | dockerd       | containerd as child-process
--------------|---------------|----------------------------
-             | -500          | -500 (same as dockerd)
-100          | -100          | -100 (same as dockerd)
 0            |  0            | -999 (hard-coded default)

With this change:

oom-score-adj | dockerd       | containerd as child-process
--------------|---------------|----------------------------
-             | -500          | -500 (same as dockerd)
-100          | -100          | -100 (same as dockerd)
0             | not adjusted  | not adjusted

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-10-05 19:52:02 +02:00
Timo Rothenpieler
c677e4cc87 quota: move quota package out of graphdriver
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
2020-10-05 13:28:25 +00:00
Timo Rothenpieler
6f1553625d projectquota: build types and unsupported stubs everywhere
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
2020-10-05 13:28:25 +00:00
Timo Rothenpieler
31ed121cb8 projectquota: sync next projectID across Control instances
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
2020-10-05 13:28:25 +00:00
Patrick Haas
ef553e14a4 Fix gcplogs memory/connection leak
The cloud logging client should be closed when the log driver is closed. Otherwise dockerd will keep a gRPC connection to the logging endpoint open indefinitely.

This results in a slow leak of tcp sockets (1) and memory (~200Kb) any time that a container using `--log-driver=gcplogs` is terminates.

Signed-off-by: Patrick Haas <patrickhaas@google.com>
2020-09-30 17:45:19 -07:00
Tibor Vass
0c9c828937
Merge pull request #41484 from thaJeztah/remove_redundant_check
Remove redundant "os.IsNotExist" checks on os.RemoveAll()
2020-09-29 09:07:23 -07:00
Tibor Vass
b4cb377d30
Merge pull request #41290 from thaJeztah/getuser_refactor
Simplify getUser() to use libcontainer built-in functionality
2020-09-28 17:00:24 -07:00
Brian Goff
2617742802
Merge pull request #41482 from tklauser/unix-fileclone 2020-09-25 17:02:17 -07:00
Sebastiaan van Stijn
ccc1233f37
Merge pull request #41285 from cpuguy83/no_more_pwns
Sterner warnings and deprecation notice for unauthenticated tcp access
2020-09-25 15:40:41 +02:00
Brian Goff
5f5285a6e2 Sterner warnings for unathenticated tcp
People keep doing this and getting pwned because they accidentally left
it exposed to the internet.

The warning about doing this has been there forever.
This introduces a sleep after warning.
To disable the extra sleep users must explicitly specify `--tls=false`
or `--tlsverify=false`

Warning also specifies this sleep will be removed in the next release
where the flag will be required if running unauthenticated.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-09-25 00:21:54 +00:00
Tibor Vass
48b5b51bdb
Merge pull request #41411 from pjbgf/simplify-seccomp
Simplify seccomp logic
2020-09-24 11:21:19 -07:00
Sebastiaan van Stijn
7335167340
Remove redundant "os.IsNotExist" checks on os.RemoveAll()
`os.RemoveAll()` should never return this error. From the docs:

> If the path does not exist, RemoveAll returns nil (no error).

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-09-23 10:30:53 +02:00
Tobias Klauser
5a7b75f889 daemon/graphdriver/copy: use IoctlFileClone from golang.org/x/sys/unix
This allows to drop the cgo implementation.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
2020-09-22 21:49:08 +02:00
Paul "TBBle" Hampson
35c531db1a Revendor Microsoft/go-winio for 8gB file fix
This pulls in the migration of go-winio/backuptar from the bundled fork
of archive/tar from Go 1.6 to using Go's current archive/tar unmodified.

This fixes the failure to import an OCI layer (tar stream) containing a
file larger than 8gB.

Fixes: #40444

Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
2020-09-19 23:13:44 +10:00
Kirill Kolyshkin
41be7293f5
daemon/listeners: use pkg/errors
Co-Authored-By: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-09-14 14:50:54 +02:00
Sebastiaan van Stijn
5ca758199d
replace pkg/locker with github.com/moby/locker
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-09-10 22:15:40 +02:00
Paulo Gomes
a8e7115fca Simplify seccomp logic
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
2020-09-09 18:23:27 +01:00
Sebastiaan van Stijn
65a33d02f6
Simplify getUser() to use libcontainer built-in functionality
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-09-09 13:25:59 +02:00
Christian Becker
322c9e6866 add partial metadata to journald logs
add all partial metadata available to journald logs to allow easier reassembly of partial messages in downstream logging systems

fixes #41403

Signed-off-by: Christian Becker <christian.becker@sixt.com>
2020-09-01 12:54:05 +02:00
Akihiro Suda
111f9c3fdf
Merge pull request #41335 from thaJeztah/remove_unneeded_eval_symlinks
daemon.setupPathsAndSandboxOptions() skip resolving symlinks
2020-08-18 19:55:51 +09:00
Sebastiaan van Stijn
b837751e40
Merge pull request #41329 from zvier/master
Add more error message for ops when container limit use an device whi…
2020-08-13 15:07:25 +02:00
Sebastiaan van Stijn
cf169b45bb
daemon.setupPathsAndSandboxOptions() skip resolving symlinks
This came up in a review of a5324d6950, but
for some reason that comment didn't find its way to GitHub, and/or I
forgot to push the change.

These files are "copied" by reading their content with ioutil.Readfile(),
resolving the symlinks should therefore not be needed, and paths can be
passed as-is;

```go
func copyFile(src, dst string) error {
	sBytes, err := ioutil.ReadFile(src)
	if err != nil {
		return err
	}
	return ioutil.WriteFile(dst, sBytes, filePerm)
}
```

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-08-13 13:54:09 +02:00
Brian Goff
7fd23345c9 Wait for container exit before forcing handler
This code assumes that we missed an exit event since the container is
still marked as running in Docker but attempts to signal the process in
containerd returns a "process not found" error.

There is a case where the event wasn't missed, just that it hasn't been
processed yet.

This change tries to work around that possibility by waiting to see if
the container is eventually marked as stopped. It uses the container's
configured stop timeout for this.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-08-11 21:33:59 +00:00
Jeff Zvier
a7c279f203 Add more error message for ops when container limit use an device which not exist
Signed-off-by: Jeff Zvier <zvier20@gmail.com>
2020-08-11 06:33:22 +08:00
Sebastiaan van Stijn
07746cc972
Merge pull request #41227 from cpuguy83/work_around_missing_shim_event
Work around missing shim event
2020-07-30 20:42:41 +02:00
Sebastiaan van Stijn
47b7c888ee
Merge pull request #41284 from akerouanton/service-ulimits
Support ulimits on Swarm services.
2020-07-30 20:08:41 +02:00
Akihiro Suda
51e3cd4761
statsV2: implement Failcnt
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-07-30 14:31:20 +09:00
Brian Goff
6d9c4d60c5
Merge pull request #41288 from thaJeztah/fix_getexecuser
oci: correctly use user.GetExecUser interface
2020-07-29 10:23:36 -07:00
Aleksa Sarai
3108ae6226
oci: correctly use user.GetExecUser interface
A nil interface in Go is not the same as a nil pointer that satisfies
the interface. libcontainer/user has special handling for missing
/etc/{passwd,group} files but this is all based on nil interface checks,
which were broken by Docker's usage of the API.

When combined with some recent changes in runc that made read errors
actually be returned to the caller, this results in spurrious -EINVAL
errors when we should detect the situation as "there is no passwd file".

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2020-07-29 14:04:47 +02:00
Albin Kerouanton
c76f380bea
Add ulimits support to services
Add Ulimits field to the ContainerSpec API type and wire it to Swarmkit.

This is related to #40639.

Signed-off-by: Albin Kerouanton <albin@akerouanton.name>
2020-07-29 02:09:06 +02:00
Brian Goff
c458bca6dc Handle missing c8d task on stop
In this case, we are sending a signal to the container (typically this
would be SIGKILL or SIGTERM, but could be any signal), but container
reports that the process does not exist.

At the point this code is happening, dockerd thinks that the container
is running, but containerd reports that it is not.

Since containerd reports that it is not running, try to collect the exit
status of the container from containerd, and mark the container as
stopped in dockerd.

Repro this problem like so:

```
id=$(docker run -d busybox top)
pkill containerd && pkill top
docker stop $id
```

Without this change, `docker stop $id` will first try to send SIGTERM,
wait for exit, then try SIGKILL.
Because the process doesn't exist to begin with, no signal is sent, and
so nothing happens.
Since we won't receive any event here to process, the container can
never be marked as stopped until the daemon is restarted.

With the change `docker stop` succeeds immediately (since the process is
already stopped) and we mark the container as stopped. We handle the
case as if we missed a exit event.

There are definitely some other places in the stack that could use some
improvement here, but this helps people get out of a sticky situation.

With io.containerd.runc.v2, no event is ever recieved by docker because
the shim quits trying to send the event.

With io.containerd.runtime.v1.linux the TastExit event is sent before
dockerd can reconnect to the event stream and we miss the event.

No matter what, we shouldn't be reliant on the shim doing the right
thing here, nor can we rely on a steady event stream.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-07-28 10:09:25 -07:00
Sebastiaan van Stijn
51c7992928
API: add "prune" events
This patch adds a new "prune" event type to indicate that pruning of a resource
type completed.

This event-type can be used on systems that want to perform actions after
resources have been cleaned up. For example, Docker Desktop performs an fstrim
after resources are deleted (https://github.com/linuxkit/linuxkit/tree/v0.7/pkg/trim-after-delete).

While the current (remove, destroy) events can provide information on _most_
resources, there is currently no event triggered after the BuildKit build-cache
is cleaned.

Prune events have a `reclaimed` attribute, indicating the amount of space that
was reclaimed (in bytes). The attribute can be used, for example, to use as a
threshold for performing fstrim actions. Reclaimed space for `network` events
will always be 0, but the field is added to be consistent with prune events for
other resources.

To test this patch:

Create some resources:

    for i in foo bar baz; do \
        docker network create network_$i \
        && docker volume create volume_$i \
        && docker run -d --name container_$i -v volume_$i:/volume busybox sh -c 'truncate -s 5M somefile; truncate -s 5M /volume/file' \
        && docker tag busybox:latest image_$i; \
    done;

    docker pull alpine
    docker pull nginx:alpine

    echo -e "FROM busybox\nRUN truncate -s 50M bigfile" | DOCKER_BUILDKIT=1 docker build -

Start listening for "prune" events in another shell:

    docker events --filter event=prune

Prune containers, networks, volumes, and build-cache:

    docker system prune -af --volumes

See the events that are returned:

    docker events --filter event=prune
    2020-07-25T12:12:09.268491000Z container prune  (reclaimed=15728640)
    2020-07-25T12:12:09.447890400Z network prune  (reclaimed=0)
    2020-07-25T12:12:09.452323000Z volume prune  (reclaimed=15728640)
    2020-07-25T12:12:09.517236200Z image prune  (reclaimed=21568540)
    2020-07-25T12:12:09.566662600Z builder prune  (reclaimed=52428841)

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-07-28 12:41:14 +02:00
Tibor Vass
846b7e24ba
Merge pull request #41254 from AkihiroSuda/deprecate-kernel-memory
Deprecate KernelMemory
2020-07-28 10:43:29 +02:00
Sebastiaan van Stijn
b36e87af03
Merge pull request #41249 from cpuguy83/swarm_caps
Replace swarm Capabilites API with cap add/drop API
2020-07-28 01:07:49 +02:00
Brian Goff
24f173a003 Replace service "Capabilities" w/ add/drop API
After dicussing with maintainers, it was decided putting the burden of
providing the full cap list on the client is not a good design.
Instead we decided to follow along with the container API and use cap
add/drop.

This brings in the changes already merged into swarmkit.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-07-27 10:09:42 -07:00
Akihiro Suda
b8ca7de823
Deprecate KernelMemory
Kernel memory limit is not supported on cgroup v2.
Even on cgroup v1, kernel memory limit (`kmem.limit_in_bytes`) has been deprecated since kernel 5.4.
0158115f70

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-07-24 20:44:29 +09:00
Wang Yumu
c8008bfbe9 fix address pool flags merge #40388
Signed-off-by: Wang Yumu <37442693@qq.com>
2020-07-21 22:12:28 +08:00
Sebastiaan van Stijn
22153d111e
Merge pull request #41239 from cpuguy83/fix_racey_logger_test
Fix log file rotation test.
2020-07-21 01:04:53 +02:00
Brian Goff
260c26b7be
Merge pull request #41016 from kolyshkin/cgroup-init 2020-07-16 11:26:52 -07:00
Sebastiaan van Stijn
de5812c2a1
Merge pull request #40807 from wpjunior/plugin-feedback
Improve error feedback when plugin does not implement desired interface
2020-07-16 09:55:17 +02:00
Akihiro Suda
95a8e9ff19
Merge pull request #41214 from thaJeztah/remove_unused_v2
Remove unused useShimV2()
2020-07-16 06:14:00 +09:00
Brian Goff
c6d860ace6 Fix log file rotation test.
The test was looking for the wrong file name.
Since compression happens asyncronously, sometimes the test would
succeed and sometimes fail.

This change makes sure to wait for the compressed version of the file
since we can't know when the compression is going to occur.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-07-15 11:06:07 -07:00
Sebastiaan van Stijn
bf7fd015f7
Remove unused useShimV2()
This function was removed in the Linux code as part of
f63f73a4a8, but was not removed in
the Windows code.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-07-15 14:28:48 +02:00
Akihiro Suda
0b14c2b67a
cgroup v1: change the default runtime to io.containerd.runc.v2
The previous default runtime `io.containerd.runtime.v1.linux` is being deprecated (https://github.com/containerd/containerd/issues/4365)

`io.containerd.runc.v2` is available since containerd v1.3.0.
 Using v1.3.5 or later is recommended.  v1.3.0-v1.3.4 doesn't pass `TestContainerStartOnDaemonRestart`.

Fix #41107
Replace #41115

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-07-15 14:06:21 +09:00
Brian Goff
61b73ee714
Merge pull request #41182 from cpuguy83/runtime_configure_shim 2020-07-14 14:16:04 -07:00
Brian Goff
f63f73a4a8 Configure shims from runtime config
In dockerd we already have a concept of a "runtime", which specifies the
OCI runtime to use (e.g. runc).
This PR extends that config to add containerd shim configuration.
This option is only exposed within the daemon itself (cannot be
configured in daemon.json).
This is due to issues in supporting unknown shims which will require
more design work.

What this change allows us to do is keep all the runtime config in one
place.

So the default "runc" runtime will just have it's already existing shim
config codified within the runtime config alone.
I've also added 2 more "stock" runtimes which are basically runc+shimv1
and runc+shimv2.
These new runtime configurations are:

- io.containerd.runtime.v1.linux - runc + v1 shim using the V1 shim API
- io.containerd.runc.v2 - runc + shim v2

These names coincide with the actual names of the containerd shims.

This allows the user to essentially control what shim is going to be
used by either specifying these as a `--runtime` on container create or
by setting `--default-runtime` on the daemon.

For custom/user-specified runtimes, the default shim config (currently
shim v1) is used.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-07-13 14:18:02 -07:00
Brian Goff
6fd94aa933 Fix lint error on sprintf call for runtime string
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-07-09 15:41:44 -07:00