This fixes a regression based on expectations of the runtime:
```
docker pull arm32v7/alpine
docker run arm32v7/alpine
```
Without this change, the `docker run` will fail due to platform
matching on non-arm32v7 systems, even though the image could run
(assuming the system is setup correctly).
This also emits a warning to make sure that the user is aware that a
platform that does not match the default platform of the system is being
run, for the cases like:
```
docker pull --platform armhf busybox
docker run busybox
```
Not typically an issue if the requests are done together like that, but
if the image was already there and someone did `docker run` without an
explicit `--platform`, they may very well be expecting to run a native
version of the image instead of the armhf one.
This warning does add some extra noise in the case of platform specific
images being run, such as `arm32v7/alpine`, but this can be supressed by
explicitly setting the platform.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This patch makes two changes if --oom-score-adj is set to 0
- do not adjust the oom-score-adjust cgroup for dockerd
- do not set the hard-coded -999 score for containerd if
containerd is running as child process
Before this change:
oom-score-adj | dockerd | containerd as child-process
--------------|---------------|----------------------------
- | -500 | -500 (same as dockerd)
-100 | -100 | -100 (same as dockerd)
0 | 0 | -999 (hard-coded default)
With this change:
oom-score-adj | dockerd | containerd as child-process
--------------|---------------|----------------------------
- | -500 | -500 (same as dockerd)
-100 | -100 | -100 (same as dockerd)
0 | not adjusted | not adjusted
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The cloud logging client should be closed when the log driver is closed. Otherwise dockerd will keep a gRPC connection to the logging endpoint open indefinitely.
This results in a slow leak of tcp sockets (1) and memory (~200Kb) any time that a container using `--log-driver=gcplogs` is terminates.
Signed-off-by: Patrick Haas <patrickhaas@google.com>
People keep doing this and getting pwned because they accidentally left
it exposed to the internet.
The warning about doing this has been there forever.
This introduces a sleep after warning.
To disable the extra sleep users must explicitly specify `--tls=false`
or `--tlsverify=false`
Warning also specifies this sleep will be removed in the next release
where the flag will be required if running unauthenticated.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
`os.RemoveAll()` should never return this error. From the docs:
> If the path does not exist, RemoveAll returns nil (no error).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This pulls in the migration of go-winio/backuptar from the bundled fork
of archive/tar from Go 1.6 to using Go's current archive/tar unmodified.
This fixes the failure to import an OCI layer (tar stream) containing a
file larger than 8gB.
Fixes: #40444
Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
Co-Authored-By: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
add all partial metadata available to journald logs to allow easier reassembly of partial messages in downstream logging systems
fixes#41403
Signed-off-by: Christian Becker <christian.becker@sixt.com>
This came up in a review of a5324d6950, but
for some reason that comment didn't find its way to GitHub, and/or I
forgot to push the change.
These files are "copied" by reading their content with ioutil.Readfile(),
resolving the symlinks should therefore not be needed, and paths can be
passed as-is;
```go
func copyFile(src, dst string) error {
sBytes, err := ioutil.ReadFile(src)
if err != nil {
return err
}
return ioutil.WriteFile(dst, sBytes, filePerm)
}
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This code assumes that we missed an exit event since the container is
still marked as running in Docker but attempts to signal the process in
containerd returns a "process not found" error.
There is a case where the event wasn't missed, just that it hasn't been
processed yet.
This change tries to work around that possibility by waiting to see if
the container is eventually marked as stopped. It uses the container's
configured stop timeout for this.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
A nil interface in Go is not the same as a nil pointer that satisfies
the interface. libcontainer/user has special handling for missing
/etc/{passwd,group} files but this is all based on nil interface checks,
which were broken by Docker's usage of the API.
When combined with some recent changes in runc that made read errors
actually be returned to the caller, this results in spurrious -EINVAL
errors when we should detect the situation as "there is no passwd file".
Signed-off-by: Aleksa Sarai <asarai@suse.de>
Add Ulimits field to the ContainerSpec API type and wire it to Swarmkit.
This is related to #40639.
Signed-off-by: Albin Kerouanton <albin@akerouanton.name>
In this case, we are sending a signal to the container (typically this
would be SIGKILL or SIGTERM, but could be any signal), but container
reports that the process does not exist.
At the point this code is happening, dockerd thinks that the container
is running, but containerd reports that it is not.
Since containerd reports that it is not running, try to collect the exit
status of the container from containerd, and mark the container as
stopped in dockerd.
Repro this problem like so:
```
id=$(docker run -d busybox top)
pkill containerd && pkill top
docker stop $id
```
Without this change, `docker stop $id` will first try to send SIGTERM,
wait for exit, then try SIGKILL.
Because the process doesn't exist to begin with, no signal is sent, and
so nothing happens.
Since we won't receive any event here to process, the container can
never be marked as stopped until the daemon is restarted.
With the change `docker stop` succeeds immediately (since the process is
already stopped) and we mark the container as stopped. We handle the
case as if we missed a exit event.
There are definitely some other places in the stack that could use some
improvement here, but this helps people get out of a sticky situation.
With io.containerd.runc.v2, no event is ever recieved by docker because
the shim quits trying to send the event.
With io.containerd.runtime.v1.linux the TastExit event is sent before
dockerd can reconnect to the event stream and we miss the event.
No matter what, we shouldn't be reliant on the shim doing the right
thing here, nor can we rely on a steady event stream.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This patch adds a new "prune" event type to indicate that pruning of a resource
type completed.
This event-type can be used on systems that want to perform actions after
resources have been cleaned up. For example, Docker Desktop performs an fstrim
after resources are deleted (https://github.com/linuxkit/linuxkit/tree/v0.7/pkg/trim-after-delete).
While the current (remove, destroy) events can provide information on _most_
resources, there is currently no event triggered after the BuildKit build-cache
is cleaned.
Prune events have a `reclaimed` attribute, indicating the amount of space that
was reclaimed (in bytes). The attribute can be used, for example, to use as a
threshold for performing fstrim actions. Reclaimed space for `network` events
will always be 0, but the field is added to be consistent with prune events for
other resources.
To test this patch:
Create some resources:
for i in foo bar baz; do \
docker network create network_$i \
&& docker volume create volume_$i \
&& docker run -d --name container_$i -v volume_$i:/volume busybox sh -c 'truncate -s 5M somefile; truncate -s 5M /volume/file' \
&& docker tag busybox:latest image_$i; \
done;
docker pull alpine
docker pull nginx:alpine
echo -e "FROM busybox\nRUN truncate -s 50M bigfile" | DOCKER_BUILDKIT=1 docker build -
Start listening for "prune" events in another shell:
docker events --filter event=prune
Prune containers, networks, volumes, and build-cache:
docker system prune -af --volumes
See the events that are returned:
docker events --filter event=prune
2020-07-25T12:12:09.268491000Z container prune (reclaimed=15728640)
2020-07-25T12:12:09.447890400Z network prune (reclaimed=0)
2020-07-25T12:12:09.452323000Z volume prune (reclaimed=15728640)
2020-07-25T12:12:09.517236200Z image prune (reclaimed=21568540)
2020-07-25T12:12:09.566662600Z builder prune (reclaimed=52428841)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
After dicussing with maintainers, it was decided putting the burden of
providing the full cap list on the client is not a good design.
Instead we decided to follow along with the container API and use cap
add/drop.
This brings in the changes already merged into swarmkit.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Kernel memory limit is not supported on cgroup v2.
Even on cgroup v1, kernel memory limit (`kmem.limit_in_bytes`) has been deprecated since kernel 5.4.
0158115f70
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
The test was looking for the wrong file name.
Since compression happens asyncronously, sometimes the test would
succeed and sometimes fail.
This change makes sure to wait for the compressed version of the file
since we can't know when the compression is going to occur.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This function was removed in the Linux code as part of
f63f73a4a8, but was not removed in
the Windows code.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The previous default runtime `io.containerd.runtime.v1.linux` is being deprecated (https://github.com/containerd/containerd/issues/4365)
`io.containerd.runc.v2` is available since containerd v1.3.0.
Using v1.3.5 or later is recommended. v1.3.0-v1.3.4 doesn't pass `TestContainerStartOnDaemonRestart`.
Fix#41107
Replace #41115
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
In dockerd we already have a concept of a "runtime", which specifies the
OCI runtime to use (e.g. runc).
This PR extends that config to add containerd shim configuration.
This option is only exposed within the daemon itself (cannot be
configured in daemon.json).
This is due to issues in supporting unknown shims which will require
more design work.
What this change allows us to do is keep all the runtime config in one
place.
So the default "runc" runtime will just have it's already existing shim
config codified within the runtime config alone.
I've also added 2 more "stock" runtimes which are basically runc+shimv1
and runc+shimv2.
These new runtime configurations are:
- io.containerd.runtime.v1.linux - runc + v1 shim using the V1 shim API
- io.containerd.runc.v2 - runc + shim v2
These names coincide with the actual names of the containerd shims.
This allows the user to essentially control what shim is going to be
used by either specifying these as a `--runtime` on container create or
by setting `--default-runtime` on the daemon.
For custom/user-specified runtimes, the default shim config (currently
shim v1) is used.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>