I had to check what the actual size was, so added it to the const's documentation.
While at it, also made use of it in a test, so that we're testing against the expected
value, and changed one alias to be consistent with other places where we alias this
import.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Windows Server 2016 (RS1) reached end of support, and Docker Desktop requires
Windows 10 V19H2 (version 1909, build 18363) as a minimum.
This patch makes Windows Server RS5 / ltsc2019 (build 17763) the minimum version
to run the daemon, and removes some hacks for older versions of Windows.
There is one check remaining that checks for Windows RS3 for a workaround
on older versions, but recent changes in Windows seemed to have regressed
on the same issue, so I kept that code for now to check if we may need that
workaround (again);
085c6a98d5/daemon/graphdriver/windows/windows.go (L319-L341)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This makes the function a bit more idiomatic, and leaves it to the caller to
decide whether or not the error can be ignored.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
All regular, non-EOL Linux distros now come with more recent kernels
out of the box. There may still be users trying to run on kernel 3.10
or older (some embedded systems, e.g.), but those should be a rare
exception, which we don't have to take into account.
This patch removes the kernel version check on Linux, and the corresponding
DOCKER_NOWARN_KERNEL_VERSION environment variable that was there to skip this
check.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The Linux kernel never sets the Inheritable capability flag to anything
other than empty. Moby should have the same behavior, and leave it to
userspace code within the container to set a non-empty value if desired.
Reported-by: Andrew G. Morgan <morgan@kernel.org>
Signed-off-by: Samuel Karp <skarp@amazon.com>
daemon/graphdriver/fuse-overlayfs/fuseoverlayfs.go:101:63: SA9002: file mode '700' evaluates to 01274; did you mean '0700'? (staticcheck)
if err := idtools.MkdirAllAndChown(path.Join(home, linkDir), 700, currentID); err != nil {
^
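To see why staticcheck flags this: Go reads `700` as a decimal literal, not an octal one. A minimal sketch (the `describe` helper is hypothetical and exists only for this illustration):

```go
package main

import (
	"fmt"
	"os"
)

// describe is a hypothetical helper: it renders the raw value in octal,
// followed by the symbolic form of its permission bits.
func describe(mode uint32) string {
	return fmt.Sprintf("%#o %s", mode, os.FileMode(mode).Perm())
}

func main() {
	// Decimal 700 is octal 01274, which is not "owner rwx only".
	fmt.Println(describe(700))
	// The intended literal, written with an explicit octal prefix.
	fmt.Println(describe(0o700))
}
```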
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This was added in commits fc21bf280b and
0380fbff37 in support of LCOW, but was
now always set to runtime.GOOS.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This removes some of the checks that were added in 0cba7740d4,
but should no longer be needed.
- `Daemon.create()`: fix the error message, which assumed it could only occur on Windows.
- `Daemon.cleanupContainer()`: no need to validate container platform to delete it.
- `Daemon.containerExport`: if a container was created, we should be able to
export it; no need to validate.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This removes some of the checks that were added in 0cba7740d4,
but should no longer be needed.
- `ImageService.ImageDelete()`: no need to validate image platform to delete it.
- `ImageService.ImageHistory()`: no need to validate image platform to show its
history; if it made it into the local image cache, it should be valid.
- `ImageService.ImportImage()`: `dockerfile.BuildFromConfig()` is used for
`docker (container) commit` and `docker (image) import`. For `docker import`,
it's more transparent to perform validation early.
- `ImageService.LookupImage()`: no need to validate image platform to inspect it;
if it made it into the local image cache, it should be valid.
- `ImageService.SquashImage()`: same. This code was actually broken, because it
wrapped an `err` that was always `nil`, so would never return an error.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
None of the implementations used return an error, so removing the error
return can simplify using these.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Commit 0380fbff37 added the ability to pass a
--platform flag on `docker import` when importing an archive. The intent
of that commit was to allow importing a Linux rootfs on a Windows daemon
(as part of the experimental LCOW feature).
A later commit (337ba71fc1) changed some
of this code to take both OS and Architecture into account (for `docker build`
and `docker pull`), but did not yet update the `docker image import`.
This patch updates the import endpoint to allow passing both OS and
Architecture. Note that currently only matching OSes are accepted,
and an error will be produced when (e.g.) specifying `linux` on Windows
and vice-versa.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
These logs were meant to be logged when starting the daemon. Moving the logs
to the daemon startup code (which also prints similar messages) instead of
having the images service log them.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This interface only had a single implementation (xfer.LayerDownloadManager),
and all places where it was used already imported the xfer package.
Removing the interface also makes it a closer match to the "upload" part,
as `xfer.LayerUploadManager()` did not use an interface.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The `daemon.RawSysInfo()` function can be a heavy operation, as it collects
information about all cgroups on the host, networking, AppArmor, Seccomp, etc.
While looking at our code, I noticed that various parts in the code call this
function, potentially even _multiple times_ per container, for example, it is
called from:
- `verifyPlatformContainerSettings()`
- `oci.WithCgroups()` if the daemon has `cpu-rt-period` or `cpu-rt-runtime` configured
- in `ContainerDecoder.DecodeConfig()`, which is called on both `container create` and `container commit`
Given that this information is not expected to change during the daemon's
lifecycle, and various information coming from this (such as seccomp and
apparmor status) was already cached, we may as well load it once, and cache
the results in the daemon instance.
This patch updates `daemon.RawSysInfo()` to use a `sync.Once` so that
it's only executed once for the daemon's lifecycle.
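The caching pattern can be sketched as follows. This is an illustration only: the `daemon` and `sysInfo` types, the `collect` method, and the `collections` counter are stand-ins invented for the example, not the daemon's real types.

```go
package main

import (
	"fmt"
	"sync"
)

// sysInfo stands in for the (much larger) system information structure.
type sysInfo struct {
	Seccomp bool
}

// daemon is a hypothetical stand-in for the daemon instance.
type daemon struct {
	sysInfoOnce sync.Once
	sysInfo     *sysInfo
	collections int // counts how often the expensive collection ran
}

// collect simulates the expensive inspection of the host.
func (d *daemon) collect() *sysInfo {
	d.collections++
	return &sysInfo{Seccomp: true}
}

// RawSysInfo collects the information on first use and returns the
// cached result on every later call.
func (d *daemon) RawSysInfo() *sysInfo {
	d.sysInfoOnce.Do(func() {
		d.sysInfo = d.collect()
	})
	return d.sysInfo
}

func main() {
	d := &daemon{}
	for i := 0; i < 3; i++ {
		_ = d.RawSysInfo()
	}
	fmt.Println("collections:", d.collections)
}
```

The trade-off is the one described above: anything that changes on the host after the first call is not picked up until the daemon restarts.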
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Use the syscall method instead of repeating the type conversions for
the syscall.Stat_t Atim/Mtim members. This also allows dropping the
//nolint: unconvert comments.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
followLogs() is getting really long (170+ lines) and complex.
The function has multiple inner functions that mutate its variables.
To refactor the function, this change introduces a follow{} struct.
The inner functions are now defined as ordinary methods, which are
accessible from tests.
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
* When async is enabled, this option defines the interval (ms) at which the connection
to the fluentd-address is re-established. This option is useful if the address
may resolve to one or more IP addresses, e.g. a Consul service address.
While the change in #42979 resolves the issue where a Docker container can be stuck
if the fluentd-address is unavailable, this functionality adds an additional benefit
in that a new and healthy fluentd-address can be resolved, allowing logs to flow once again.
This adds a `fluentd-async-reconnect-interval` log-opt for the fluentd logging driver.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Conor Evans <coevans@tcd.ie>
Co-authored-by: Sebastiaan van Stijn <github@gone.nl>
Co-authored-by: Conor Evans <coevans@tcd.ie>
Before this change, if Decode() couldn't read a log record fully,
the subsequent invocation of Decode() would read the record's non-header part
as a header and cause a huge heap allocation.
This change prevents such a case by having the intermediate buffer in
the decoder struct.
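A minimal sketch of the idea, assuming a simple framing of a 4-byte big-endian length followed by the body. The framing and all names here are assumptions for illustration and do not mirror the actual log file format or decoder. The sketch only covers a partially available body; a partially available header is not handled.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// decoder keeps the in-progress record in the struct, so a failed
// Decode does not lose track of where it was in the stream.
type decoder struct {
	r    io.Reader
	body []byte // body of the record whose header was consumed; nil between records
	read int    // number of body bytes received so far
}

// Decode returns the next record. If the body is only partly available,
// it returns an error but remembers its progress, so the next call
// resumes the same record instead of parsing body bytes as a header.
func (d *decoder) Decode() ([]byte, error) {
	if d.body == nil {
		var hdr [4]byte
		if _, err := io.ReadFull(d.r, hdr[:]); err != nil {
			return nil, err
		}
		d.body = make([]byte, binary.BigEndian.Uint32(hdr[:]))
		d.read = 0
	}
	n, err := io.ReadFull(d.r, d.body[d.read:])
	d.read += n
	if err != nil {
		return nil, err
	}
	rec := d.body
	d.body = nil
	return rec, nil
}

// demo writes one record in two pieces and decodes across the gap.
func demo() (firstErr error, record string) {
	var stream bytes.Buffer
	d := &decoder{r: &stream}

	var hdr [4]byte
	binary.BigEndian.PutUint32(hdr[:], 5)
	stream.Write(hdr[:])
	stream.WriteString("he") // only part of the 5-byte body has arrived

	_, firstErr = d.Decode()

	stream.WriteString("llo") // the rest arrives later
	rec, err := d.Decode()
	if err != nil {
		return firstErr, ""
	}
	return firstErr, string(rec)
}

func main() {
	firstErr, record := demo()
	fmt.Println(firstErr, record)
}
```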
Fixes #42125.
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
The flag ForceStopAsyncSend was added to fluent logger lib in v1.5.0 (at
this time named AsyncStop) to tell fluentd to abort sending logs
asynchronously as soon as possible, when its Close() method is called.
However this flag was broken because of the way the lib was handling it
(basically, the lib could be stuck in a retry-connect loop without
checking this flag).
Since fluent logger lib v1.7.0, calling Close() (when ForceStopAsyncSend
is true) will really stop all ongoing send/connect procedures,
wherever they are stuck.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Trying to reduce the use of libcontainer/devices, as it's considered
to be an "internal" package by runc.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Also includes review suggestions in daemon.initNetworkController():
- update godoc for setHostGatewayIP()
- change setHostGatewayIP() to get config, instead of daemon
- remove redundant nil check for controller
Signed-off-by: sanchayanghosh <sanchayanghosh@outlook.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The daemon can print the proxy configuration as part of error-messages,
and when reloading the daemon configuration (SIGHUP). Make sure that
the configuration is sanitized before printing.
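One way to sanitize such a value is the standard library's `URL.Redacted()`, shown here as a sketch. `maskProxyURL` is a hypothetical helper, not necessarily what this patch implements. Note that `Redacted()` hides only the password, so the user name stays visible.

```go
package main

import (
	"fmt"
	"net/url"
)

// maskProxyURL is a hypothetical helper that hides the password of a
// proxy URL before it is printed. A value that does not parse is
// replaced entirely, so that it cannot leak credentials.
func maskProxyURL(raw string) string {
	u, err := url.Parse(raw)
	if err != nil {
		return "(unparsable proxy URL)"
	}
	if u.User == nil {
		return raw
	}
	return u.Redacted()
}

func main() {
	fmt.Println(maskProxyURL("http://user:secret@proxy.example.com:3128"))
	fmt.Println(maskProxyURL("http://proxy.example.com:3128"))
}
```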
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This allows configuring the daemon's proxy server through the daemon.json
configuration file or command-line flags, in addition to the existing option
(through environment variables).
Configuring environment variables on Windows to configure a service is more
complicated than on Linux, and adding alternatives for this to the daemon
configuration makes the configuration more transparent and easier to use.
The configuration as set through command-line flags or through the daemon.json
configuration file takes precedence over env-vars in the daemon's environment,
which allows the daemon to use a different proxy. If both command-line flags
and a daemon.json configuration option are set, an error is produced when starting
the daemon.
Note that this configuration is not "live reloadable" due to Golang's use of
`sync.Once()` for proxy configuration, which means that changing the proxy
configuration requires a restart of the daemon (reload / SIGHUP will not update
the configuration).
With this patch:
cat /etc/docker/daemon.json
{
"http-proxy": "http://proxytest.example.com:80",
"https-proxy": "https://proxytest.example.com:443"
}
docker pull busybox
Using default tag: latest
Error response from daemon: Get "https://registry-1.docker.io/v2/": proxyconnect tcp: dial tcp: lookup proxytest.example.com on 127.0.0.11:53: no such host
docker build .
Sending build context to Docker daemon 89.28MB
Step 1/3 : FROM golang:1.16-alpine AS base
Get "https://registry-1.docker.io/v2/": proxyconnect tcp: dial tcp: lookup proxytest.example.com on 127.0.0.11:53: no such host
Integration tests were added to test the behavior:
- verify that the configuration through all means is used (env-var,
command-line flags, daemon.json), and used in the expected order of
preference.
- verify that conflicting options produce an error.
Signed-off-by: Anca Iordache <anca.iordache@docker.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This test runs with t.Parallel() _and_ uses subtests, but didn't capture
the `tc` variable, which potentially (likely) makes it test the same testcase
multiple times.
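The capture idiom looks like this. The sketch uses goroutines in place of parallel subtests so that it is runnable on its own; `testCase` and `runAll` are invented for the example. Since Go 1.22 each loop iteration gets its own variable, so the explicit copy is only needed on older toolchains.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

type testCase struct{ name string }

// runAll starts one goroutine per case, similar to parallel subtests,
// and returns the sorted names that the goroutines observed.
func runAll(cases []testCase) []string {
	var (
		mu   sync.Mutex
		seen []string
		wg   sync.WaitGroup
	)
	for _, tc := range cases {
		tc := tc // capture the current value; required before Go 1.22, harmless after
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			defer mu.Unlock()
			seen = append(seen, tc.name)
		}()
	}
	wg.Wait()
	sort.Strings(seen)
	return seen
}

func main() {
	fmt.Println(runAll([]testCase{{name: "a"}, {name: "b"}, {name: "c"}}))
}
```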
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Added an option 'awslogs-format' to allow specifying
a log format for the logs sent to CloudWatch from the aws log driver.
For now, only the 'json/emf' format is supported.
If no option is provided, the log format header in the
request to CloudWatch will be omitted as before.
Signed-off-by: James Sanders <james3sanders@gmail.com>
Do not use 0701 perms.
0701 dir perms allows anyone to traverse the docker dir.
It happens to allow any user to execute, as an example, suid binaries
from image rootfs dirs because it allows traversal AND critically
container users need to be able to execute things.
0701 on lower directories also happens to allow any user to modify
things in, for instance, the overlay upper dir which necessarily
has 0755 permissions.
This changes to use 0710 which allows users in the group to traverse.
In userns mode the UID owner is (real) root and the GID is the remapped
root's GID.
This prevents anyone but the remapped root from traversing our directories
(which is required for userns with runc).
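The difference between the two modes comes down to which class holds the execute (search) bit. A small sketch, with `canTraverse` as a hypothetical helper that looks only at the permission bits and ignores ownership, ACLs, and capabilities:

```go
package main

import (
	"fmt"
	"os"
)

// canTraverse reports whether the given class ("owner", "group", or
// anything else for "other") has the execute bit set in perm.
func canTraverse(perm uint32, class string) bool {
	switch class {
	case "owner":
		return perm&0o100 != 0
	case "group":
		return perm&0o010 != 0
	default:
		return perm&0o001 != 0
	}
}

func main() {
	for _, perm := range []uint32{0o701, 0o710} {
		fmt.Printf("%s group=%v other=%v\n",
			os.ModeDir|os.FileMode(perm),
			canTraverse(perm, "group"),
			canTraverse(perm, "other"))
	}
}
```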
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit ef7237442147441a7cadcda0600be1186d81ac73)
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit 93ac040bf0)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This adds support for 2 runtimes on Windows, one that uses the built-in
HCSv1 integration and another which uses containerd with the runhcs
shim.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Commit dae652e2e5 added support for non-privileged
containers to use ICMP_PROTO (used for `ping`). This option cannot be set for
containers that have user-namespaces enabled.
However, the detection looks to be incorrect; HostConfig.UsernsMode was added
in 6993e891d1 / ee2183881b,
and the property only has meaning if the daemon is running with user namespaces
enabled. In other situations, the property has no meaning.
As a result of the above, the sysctl would only be set for containers running
with UsernsMode=host on a daemon running with user-namespaces enabled.
This patch adds a check if the daemon has user-namespaces enabled (RemappedRoot
having a non-empty value), or if the daemon is running inside a user namespace
(e.g. rootless mode) to fix the detection.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The io/ioutil package has been deprecated in Go 1.16. This commit
replaces the existing io/ioutil functions with their new definitions in
io and os packages.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
These checks were added when we required a specific version of containerd
and runc (different versions were known to be incompatible). I don't think
we had a similar requirement for tini, so this check was redundant. Let's
remove the check altogether.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This makes sure that the value set in the daemon can be used as-is,
without having to replicate the normalization logic elsewhere.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This allows containers to use the embedded default profile if a different
default is set (e.g. "unconfined") in the daemon configuration. Without this
option, users would have to copy the default profile to a file in order to
use the default.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Commit b237189e6c implemented an option to
set the default seccomp profile in the daemon configuration. When that PR
was reviewed, it was discussed to have the option accept the path to a custom
profile JSON file; https://github.com/moby/moby/pull/26276#issuecomment-253546966
However, in the implementation, the special "unconfined" value was not taken into
account. The "unconfined" value is meant to disable seccomp (more factually:
run with an empty profile).
While it's likely possible to achieve this by creating a file with an empty
(`{}`) profile, and passing the path to that file, it's inconsistent with the
`--security-opt seccomp=unconfined` option on `docker run` and `docker create`,
which is both confusing, and makes it harder to use (especially on Docker Desktop,
where there's no direct access to the VM's filesystem).
This patch adds the missing check for the special "unconfined" value.
Co-authored-by: Tianon Gravi <admwiggin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Using "default" as a name is a bit ambiguous, because the _daemon_ default
can be changed using the '--seccomp-profile' daemon flag.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Let clients choose object types to compute disk usage of.
Signed-off-by: Roman Volosatovs <roman.volosatovs@docker.com>
Co-authored-by: Sebastiaan van Stijn <github@gone.nl>
Add hints for "Failed to destroy btrfs snapshot <DIR> for <ID>: operation not permitted" on rootless
Related to issue 41762
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
The daemon uses a priority list to automatically select the best-matching storage
driver for the backing filesystem that is used.
Historically, overlay2 was not supported on Btrfs and ZFS, and the daemon would
automatically pick the `btrfs` or `zfs` storage driver if that was the Backing
File System.
Commits 649e4c8889 and e226aea280
improved our detection to check if overlay2 was supported on the backing file-
system, allowing overlay2 to be used on top of Btrfs or ZFS, but did not change
the priority list.
While both Btrfs and ZFS have advantages for certain use-cases, and provide
advanced features that are not available to overlay2, they also are known
to require more "handholding", and are generally considered to be mostly
useful for "advanced" users.
This patch changes the storage-driver priority list, to prefer overlay2 (if
supported by the backing filesystem), and effectively makes btrfs and zfs
opt-in storage drivers.
This change does not affect existing installations; the daemon will detect
the storage driver that was previously in use (based on the presence of
storage directories in `/var/lib/docker`).
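The selection logic described above can be sketched roughly as follows. The `priority` slice is an abbreviated, illustrative ordering and `selectDriver` is a hypothetical function; the real list and detection code live in the graphdriver package and handle more cases.

```go
package main

import "fmt"

// priority is an illustrative ordering with overlay2 ahead of btrfs
// and zfs; it is not the daemon's complete list.
var priority = []string{"overlay2", "btrfs", "zfs", "vfs"}

// selectDriver returns the driver to use. A driver that already has
// data on disk wins, so existing installations keep their driver;
// otherwise the first supported entry of the priority list is used.
func selectDriver(existing string, supported map[string]bool) string {
	if existing != "" && supported[existing] {
		return existing
	}
	for _, name := range priority {
		if supported[name] {
			return name
		}
	}
	return ""
}

func main() {
	supported := map[string]bool{"overlay2": true, "btrfs": true}
	fmt.Println(selectDriver("", supported))      // fresh install on Btrfs
	fmt.Println(selectDriver("btrfs", supported)) // existing btrfs data
}
```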
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
It is not directly related to signal-handling, so can well live
in its own package.
Also added a variant that doesn't take a directory to write files
to, for easier consumption / better match to how it's used.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The "quiet" argument was only used in a single place (at daemon startup), and
every other use had to pass "false" to prevent this function from logging
warnings.
Now that SysInfo contains the warnings that occurred when collecting the
system information, we can leave it up to the caller to use those
warnings (and log them if wanted).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This makes it easier to add more options to the backend without having to change
the signature.
While we're changing the signature, also adding a context.Context, which is not
currently used, but probably should be at some point.
Signed-off-by: Roman Volosatovs <roman.volosatovs@docker.com>
This code is not generically useful on "unix", and contains linux-
specific code, so make it only compile on linux.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This type was added to support Solaris (which didn't support these
options). Solaris support was removed, so we can integrate this type
back into the "unix" type.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This type was added to support Solaris (which didn't support these
options). Solaris support was removed, so we can integrate this type
back into the "unix" type.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Put variables and functions in the same order between both,
to allow for easier comparing between platforms.
Also synchronised some comments/godoc between both.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This changes mounts.NewParser() to create a parser for the current operating system,
instead of one specific to a (possibly non-matching, in case of LCOW) OS.
With the OS-specific handling being removed, the "OS" parameter is also removed
from `daemon.verifyContainerSettings()`, and various other container-related
functions.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- Rename image summary constructor
- Rename `newImage` into `newImageSummary`, since the returned type is
`*types.ImageSummary`
- Rename variables for clarity
- Rename `newImage` into `summary`, since the variable type is
`*types.ImageSummary`
- Rename `imagesMap` into `summaryMap`, since the value type
contained is `*types.ImageSummary`
- Only compute `DiffSize` when more than 1 reference to the layer
exists, since it is not used otherwise
- Move variable declarations closer to where they are used
Signed-off-by: Roman Volosatovs <roman.volosatovs@docker.com>
Co-authored-by: Sebastiaan van Stijn <github@gone.nl>
Fixes #36911
If config file is invalid we'll exit anyhow, so this just prevents
the daemon from starting if the configuration is fine.
Mainly useful for making config changes and restarting the daemon
iff the config is valid.
Signed-off-by: Rich Horwood <rjhorwood@apple.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Anca Iordache <anca.iordache@docker.com>
Probably needs a similar change as c208f03fbd,
but this code makes my head spin, so for now suppressing, and created a
tracking issue:
daemon/graphdriver/graphtest/graphtest_unix.go:305:12: unsafeptr: possible misuse of reflect.SliceHeader (govet)
header := *(*reflect.SliceHeader)(unsafe.Pointer(&buf))
^
daemon/graphdriver/graphtest/graphtest_unix.go:308:36: unsafeptr: possible misuse of reflect.SliceHeader (govet)
data := *(*[]byte)(unsafe.Pointer(&header))
^
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
daemon/list.go:556:18: var-declaration: should omit type bool from declaration of var shouldSkip; it will be inferred from the right-hand side (revive)
shouldSkip bool = true
^
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
daemon/config/config_unix.go:92:21: error-strings: error strings should not be capitalized or end with punctuation or a newline (revive)
return fmt.Errorf("Default cgroup namespace mode (%v) is invalid. Use \"host\" or \"private\".", cm) // nolint: golint
^
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Also looks like a false positive, but given that these were basically
testing for the `errdefs.Conflict` and `errdefs.NotFound` interfaces, I
replaced these with those;
daemon/stats/collector.go:154:6: type `notRunningErr` is unused (unused)
type notRunningErr interface {
^
daemon/stats/collector.go:159:6: type `notFoundErr` is unused (unused)
type notFoundErr interface {
^
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
daemon/volumes_unix_test.go:228:13: SA4001: &*x will be simplified to x. It will not copy x. (staticcheck)
mp: &(*c.MountPoints["/jambolan"]), // copy the mountpoint, expect no changes
^
daemon/logger/local/local_test.go:214:22: SA4001: &*x will be simplified to x. It will not copy x. (staticcheck)
dst.PLogMetaData = &(*src.PLogMetaData)
^
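The point of SA4001 is that `&*x` yields the original pointer, so the "copy" in these tests never happened. A short sketch with a made-up `mountPoint` type shows the difference:

```go
package main

import "fmt"

type mountPoint struct{ Destination string }

// alias returns &*p, which is simply p again: no copy is made.
func alias(p *mountPoint) *mountPoint {
	return &(*p)
}

// clone makes a real copy by dereferencing into a new variable first.
func clone(p *mountPoint) *mountPoint {
	c := *p
	return &c
}

func main() {
	orig := &mountPoint{Destination: "/jambolan"}
	fmt.Println(alias(orig) == orig) // same pointer
	fmt.Println(clone(orig) == orig) // different pointer, equal contents
}
```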
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
daemon/logger/journald/read.go:128:3 comment on exported function `CErr` should be of the form `CErr ...`
daemon/logger/journald/read.go:131:36: unnecessary conversion (unconvert)
return C.GoString(C.strerror(C.int(-ret)))
^
daemon/logger/journald/read.go:380:2: S1023: redundant `return` statement (gosimple)
return
^
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
A node is no longer using its load balancer IP address when it no longer
has tasks that use the network that requires that load balancer. When
this occurs, the swarmkit manager will free that IP in IPAM, and may
reassign it.
When a task shuts down cleanly, it attempts removal of the networks it
uses, and if it is the last task using those networks, this removal
succeeds, and the load balancer IP is freed.
However, this behavior is absent if the container fails. Removal of the
networks is never attempted.
To address this issue, I amend the executor. Whenever a node load
balancer IP is removed or changed, that information is passed to the
executor by way of the Configure method. By keeping track of the set of
node NetworkAttachments from the previous call to Configure, we can
determine which, if any, have been removed or changed.
At first, this seems to create a race, by which a task can be attempting
to start and the network is removed right out from under it. However,
this is already addressed in the controller. The controller will attempt
to recreate missing networks before starting a task.
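The comparison between consecutive Configure calls can be sketched as a map diff. The `attachments` representation (network ID to load balancer IP) and the `staleNetworks` function are assumptions for illustration, not the executor's actual types.

```go
package main

import (
	"fmt"
	"sort"
)

// attachments maps a network ID to the node's load balancer IP on
// that network (an assumed representation for this sketch).
type attachments map[string]string

// staleNetworks returns the IDs of networks whose attachment was
// removed, or whose address changed, between two Configure calls.
func staleNetworks(previous, current attachments) []string {
	var stale []string
	for id, oldIP := range previous {
		if newIP, ok := current[id]; !ok || newIP != oldIP {
			stale = append(stale, id)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	previous := attachments{"net1": "10.0.0.2", "net2": "10.0.1.2", "net3": "10.0.2.2"}
	current := attachments{"net1": "10.0.0.2", "net2": "10.0.1.9"}
	fmt.Println(staleNetworks(previous, current))
}
```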
Signed-off-by: Drew Erny <derny@mirantis.com>
log statement should reflect how long it actually waited, not how long
it theoretically could wait based on the 'seconds' integer passed in.
Signed-off-by: Cam <gh@sparr.email>
This takes the same approach as was implemented on `docker build`, where a warning
is printed if `FROM --platform=...` is used (added in 399695305c)
Before:
docker rmi armhf/busybox
docker pull --platform=linux/s390x armhf/busybox
Using default tag: latest
latest: Pulling from armhf/busybox
d34a655120f5: Pull complete
Digest: sha256:8e51389cdda2158935f2b231cd158790c33ae13288c3106909324b061d24d6d1
Status: Downloaded newer image for armhf/busybox:latest
docker.io/armhf/busybox:latest
With this change:
docker rmi armhf/busybox
docker pull --platform=linux/s390x armhf/busybox
Using default tag: latest
latest: Pulling from armhf/busybox
d34a655120f5: Pull complete
Digest: sha256:8e51389cdda2158935f2b231cd158790c33ae13288c3106909324b061d24d6d1
Status: Downloaded newer image for armhf/busybox:latest
WARNING: image with reference armhf/busybox was found but does not match the specified platform: wanted linux/s390x, actual: linux/arm64
docker.io/armhf/busybox:latest
And daemon logs print:
WARN[2021-04-26T11:19:37.153572667Z] ignoring platform mismatch on single-arch image error="image with reference armhf/busybox was found but does not match the specified platform: wanted linux/s390x, actual: linux/arm64" image=armhf/busybox
When pulling without specifying `--platform`, no warning is currently printed (but we can add a warning in future);
docker rmi armhf/busybox
docker pull armhf/busybox
Using default tag: latest
latest: Pulling from armhf/busybox
d34a655120f5: Pull complete
Digest: sha256:8e51389cdda2158935f2b231cd158790c33ae13288c3106909324b061d24d6d1
Status: Downloaded newer image for armhf/busybox:latest
docker.io/armhf/busybox:latest
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- daemon.WithRootless(): make sure ROOTLESSKIT_PARENT_EUID is valid int
- daemon.RawSysInfo(): minor simplification, and rename variable that
clashed with imported package.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The LCOW implementation in dockerd has been deprecated in favor of re-implementation
in containerd (in progress). Microsoft started removing the LCOW V1 code from the
build dependencies we use in Microsoft/opengcs (soon to be part of Microsoft/hcsshim),
which means that we need to start removing this code.
This first step removes the lcow graphdriver, the LCOW initialization code, and
some LCOW-related utilities.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
After moving libnetwork to this repo, we need to update all the import
paths for libnetwork to point to docker/docker/libnetwork instead of
docker/libnetwork.
This change implements that.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This utility was added after 19.03, and is only used in the daemon code
itself, so we can un-export it, until there's an external use for it.
Also updated the description, because the runc code already copied it
from coreos/go-systemd, so better to describe the actual source.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Logging to daemon logs every time there's an error with a log driver can be
problematic since daemon logs can grow rapidly, potentially exhausting disk
space.
Instead, it's preferable to limit the rate at which log driver errors are allowed
to be written. By default, this limit is 333 entries per second max.
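To illustrate the kind of limit described, here is a minimal token-bucket sketch built on the standard library only. It is not the daemon's implementation; the `limiter` type and its methods are invented for the example, and the clock is passed in explicitly to keep the demo deterministic.

```go
package main

import (
	"fmt"
	"time"
)

// limiter is a minimal token bucket. It is not safe for concurrent use.
type limiter struct {
	perSecond float64
	tokens    float64
	last      time.Time
}

// newLimiter allows up to perSecond events per second, with a burst
// of the same size.
func newLimiter(perSecond int, now time.Time) *limiter {
	return &limiter{perSecond: float64(perSecond), tokens: float64(perSecond), last: now}
}

// Allow reports whether an event occurring at time now may be logged.
func (l *limiter) Allow(now time.Time) bool {
	l.tokens += now.Sub(l.last).Seconds() * l.perSecond
	if l.tokens > l.perSecond {
		l.tokens = l.perSecond
	}
	l.last = now
	if l.tokens < 1 {
		return false
	}
	l.tokens--
	return true
}

// demo offers 1000 events at one instant, then 1000 more a second
// later, and returns how many were allowed each time.
func demo() (first, second int) {
	start := time.Unix(0, 0)
	l := newLimiter(333, start)
	offer := func(now time.Time) int {
		allowed := 0
		for i := 0; i < 1000; i++ {
			if l.Allow(now) {
				allowed++
			}
		}
		return allowed
	}
	return offer(start), offer(start.Add(time.Second))
}

func main() {
	first, second := demo()
	fmt.Println(first, second)
}
```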
Signed-off-by: Angel Velazquez <angelcar@amazon.com>
A temporary directory was created but not removed at the end of the test.
The missing remove directory call is added now.
Signed-off-by: Muhammad Zohaib Aslam <zohaibse011@gmail.com>
The underlying Logger's Close() function can be called with the
run() goroutine still writing to the driver. This is causing the
fluentd-golang-logger to panic because it doesn't defensively check
for the closing of the channel before writing to it.
It relies on the docker daemon to keep the contract of not calling Log()
if Close() has already been called.
Contributions by: James Johnston <james.johnston@thumbtack.com>
Nathan Wong <nathanw@thumbtack.com>
Signed-off-by: Anuj Varma <anujvarma@thumbtack.com>
Kernel 5.11 introduced support for rootless overlayfs, but it is incompatible with SELinux.
On the other hand, fuse-overlayfs is compatible.
Close issue 42333
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
These tests would panic;
- in WithRLimits(), because HostConfig was not set;
470ae8422f/daemon/oci_linux.go (L46-L47)
- in daemon.mergeUlimits(), because daemon.configStore was not set;
470ae8422f/daemon/oci_linux.go (L1069)
This panic was not discovered because the current version of runc/libcontainer that we vendor
would always return false for `apparmor.IsEnabled()` when running docker-in-docker or if
`apparmor_parser` is not found. Starting with v1.0.0-rc93 of libcontainer, this is no longer
the case (changed in bfb4ea1b1b)
This patch;
- changes the tests to initialize Daemon.configStore and Container.HostConfig
- Combines TestExecSetPlatformOpt and TestExecSetPlatformOptPrivileged into a new test
(TestExecSetPlatformOptAppArmor)
- Runs the test both if AppArmor is enabled and if not (in which case it tests
that the container's AppArmor profile is left empty).
- Adds a FIXME comment for a possible bug in execSetPlatformOpts, which currently
prefers custom profiles over "privileged".
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Whether or not the command path is in the error message is an
implementation detail.
For example, on Windows the only reason this ever matched was because it
dumped the entire container config into the error message, but this had
nothing to do with the actual error.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This refactors the Stop command to fix a few issues and behaviors that
don't seem completely correct:
1. first it fixes a situation where stop could hang forever (#41579)
2. fixes a behavior where if sending the
stop signal failed, then the code directly sends a -9 signal. If that
fails, it returns without waiting for the process to exit or going
through the full docker kill codepath.
3. fixes a behavior where if sending the stop signal failed, then the
code sends a -9 signal. If that succeeds, then we still go through the
same stop waiting process, and may even go through the docker kill path
again, even though we've already sent a -9.
4. fixes a behavior where the code would wait the full 30 seconds after
sending a stop signal, even if we already know the stop signal failed.
fixes #41579
Signed-off-by: Cam <gh@sparr.email>
Before this change, cleanup of the btrfs driver (occurring on each daemon
shutdown) resulted in disabling quotas. It was done with an assumption
that quotas can be enabled or disabled on a subvolume level, which is
not true - enabling or disabling quota is always done on a filesystem
level.
That was leading to disabling quota on btrfs filesystems on each daemon
shutdown.
This change fixes that behavior and removes misleading `subvol` prefix
from functions and methods which set up quota (on a filesystem level).
Fixes: #34593
Fixes: 401c8d1767 ("Add disk quota support for btrfs")
Signed-off-by: Michal Rostecki <mrostecki@opensuse.org>
The runc/libcontainer apparmor package on master no longer checks if apparmor_parser
is enabled, or if we are running docker-in-docker.
While those checks are not relevant to runc (as it doesn't load the profile), these
checks _are_ relevant to us (and containerd). So switching to use the containerd
apparmor package, which does include the needed checks.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Some tests were using domain names that were intended to be "fake", but are
actually registered domain names (such as domain.com, registry.com, mytest.com).
Even though we were not actually making connections to these domains, it's
better to use domains that are designated for testing/examples in RFC2606:
https://tools.ietf.org/html/rfc2606
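A check along these lines could guard against such names creeping back into tests. `isReserved` is a hypothetical helper; the lists contain the top-level and second-level names reserved by RFC 2606.

```go
package main

import (
	"fmt"
	"strings"
)

// Names reserved by RFC 2606 for testing and documentation.
var (
	reservedTLDs    = []string{".test", ".example", ".invalid", ".localhost"}
	reservedDomains = []string{"example.com", "example.net", "example.org"}
)

// isReserved reports whether host falls under one of the reserved
// top-level domains, or is (a subdomain of) a reserved example domain.
func isReserved(host string) bool {
	host = strings.ToLower(strings.TrimSuffix(host, "."))
	for _, tld := range reservedTLDs {
		if strings.HasSuffix(host, tld) {
			return true
		}
	}
	for _, d := range reservedDomains {
		if host == d || strings.HasSuffix(host, "."+d) {
			return true
		}
	}
	return false
}

func main() {
	for _, h := range []string{"registry.example.com", "myregistry.test", "registry.com"} {
		fmt.Println(h, isReserved(h))
	}
}
```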
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The following was failing previously, because `getUnprivilegedMountFlags()` was not called:
```console
$ sudo mount -t tmpfs -o noexec none /tmp/foo
$ docker --context=rootless run -it --rm -v /tmp/foo:/mnt:ro alpine
docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:520: container init caused: rootfs_linux.go:60: mounting "/tmp/foo" to rootfs at "/home/suda/.local/share/docker/overlay2/b8e7ea02f6ef51247f7f10c7fb26edbfb308d2af8a2c77915260408ed3b0a8ec/merged/mnt" caused: operation not permitted: unknown.
```
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
overlay2 no longer sets `archive.OverlayWhiteoutFormat` when
running in UserNS, so we can remove the complicated logic in the
archive package.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
When running in userns, return an error (i.e. "use naive, not native")
immediately.
No substantial change to the logic.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Fix issue 41762
Cherry-pick "drivers: btrfs: Allow unprivileged user to delete subvolumes" from containers/storage
831e32b6bd
> In btrfs, subvolume can be deleted by IOC_SNAP_DESTROY ioctl but there
> is one catch: unprivileged IOC_SNAP_DESTROY call is restricted by default.
>
> This is because IOC_SNAP_DESTROY only performs permission checks on
> the top directory(subvolume) and unprivileged user might delete dirs/files
> which cannot be deleted otherwise. This restriction can be relaxed if
> user_subvol_rm_allowed mount option is used.
>
> Although the above ioctl had been the only way to delete a subvolume,
> btrfs now allows deletion of subvolume just like regular directory
> (i.e. rmdir syscall) since kernel 4.18.
>
> So if we fail to cleanup subvolume in subvolDelete(), just fallback to
> system.EnsureRemoveAll() to try to cleanup subvolumes again.
> (Note: quota needs privilege, so if quota is enabled we do not fallback)
>
> This fix will allow non-privileged container works with btrfs backend.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
daemon.getExecConfig() already returns typed errors; by wrapping those errors
we may lose the actual reason for failures. Changing the error-type was
originally added in 2d43d93410, but I think
it was not intentional to ignore already-typed errors. It was later refactored
in a793564b25, which added helper functions
to create these errors, but kept the same behavior.
Also adds error-handling to prevent a panic in situations where (although
unlikely) `daemon.containers.Get()` would not return a container.
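As a rough illustration of why blindly re-typing an error loses information, here is a minimal sketch. The marker interfaces `conflict` and `system`, the wrapper types, and the helper names are all hypothetical stand-ins and not the real errdefs definitions; the sketch only assumes that typed errors are detected through marker methods.

```go
package main

import (
	"errors"
	"fmt"
)

// conflict and system are hypothetical marker interfaces, loosely modeled on
// the idea of typed errors; they are not the real errdefs definitions.
type conflict interface{ Conflict() }
type system interface{ System() }

type conflictErr struct{ error }

func (conflictErr) Conflict() {}

type systemErr struct{ error }

func (systemErr) System() {}

// errNotRunning stands in for a typed error returned by a lookup helper.
var errNotRunning error = conflictErr{errors.New("container is not running")}

// classify reports which marker interface, if any, err satisfies.
func classify(err error) string {
	var c conflict
	var s system
	switch {
	case errors.As(err, &c):
		return "conflict"
	case errors.As(err, &s):
		return "system"
	default:
		return "untyped"
	}
}

// alwaysWrap re-types every error, hiding any type the error already had.
func alwaysWrap(err error) error { return systemErr{err} }

// wrapIfUntyped only assigns a type to errors that do not carry one yet.
func wrapIfUntyped(err error) error {
	if classify(err) != "untyped" {
		return err
	}
	return systemErr{err}
}

func main() {
	fmt.Println(classify(alwaysWrap(errNotRunning)))         // the conflict type is hidden
	fmt.Println(classify(wrapIfUntyped(errNotRunning)))      // the conflict type is kept
	fmt.Println(classify(wrapIfUntyped(errors.New("boom")))) // untyped errors still get a type
}
```

In this sketch the caller of `wrapIfUntyped` can still map the original error to the right API status, which is the behavior this change aims for.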
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This makes daemon.getExecConfig return an errdefs.Conflict() error if the
container is not running.
This was originally the case, but a refactor of this code changed the typed
error (`derr.ErrorCodeContainerNotRunning`) to a non-typed error;
a793564b25
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Tonis mentioned that we can run into issues if there is more error
handling added here. This adds a custom reader implementation which is
like io.MultiReader except it does not cache EOFs.
What got us into trouble in the first place is `io.MultiReader` will
always return EOF once it has received an EOF, however the error
handling that we are going for is to recover from an EOF because the
underlying file is a file which can have more data added to it after
EOF.
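To make the difference concrete, here is a minimal sketch of a reader that behaves like `io.MultiReader` but keeps its last reader so that a later `Read` can see new data. The type names (`growingReader`, `retryableMultiReader`) and the `demo` helper are hypothetical and only illustrate the idea; they are not the implementation added by this change.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// growingReader is a hypothetical stand-in for a log file that may receive
// more data after it has reported EOF. Each chunk becomes readable only
// after the previous chunk has ended with EOF.
type growingReader struct {
	chunks []string
	cur    io.Reader
}

func (g *growingReader) Read(p []byte) (int, error) {
	if g.cur == nil {
		if len(g.chunks) == 0 {
			return 0, io.EOF
		}
		g.cur = strings.NewReader(g.chunks[0])
		g.chunks = g.chunks[1:]
	}
	n, err := g.cur.Read(p)
	if err == io.EOF {
		g.cur = nil // the next Read moves on to the next chunk, if any
	}
	return n, err
}

// retryableMultiReader reads from each reader in turn, but never drops the
// last reader, so an EOF from it is reported without being remembered.
type retryableMultiReader struct {
	readers []io.Reader
}

func (m *retryableMultiReader) Read(p []byte) (int, error) {
	for len(m.readers) > 0 {
		n, err := m.readers[0].Read(p)
		if err == io.EOF && len(m.readers) > 1 {
			m.readers = m.readers[1:] // exhausted, move to the next reader
			if n > 0 {
				return n, nil
			}
			continue
		}
		return n, err
	}
	return 0, io.EOF
}

// drain reads until the first error (including EOF) and returns the data.
func drain(r io.Reader) string {
	var sb strings.Builder
	buf := make([]byte, 64)
	for {
		n, err := r.Read(buf)
		sb.Write(buf[:n])
		if err != nil {
			return sb.String()
		}
	}
}

// demo drains the reader twice; the second pass sees data "appended" after EOF.
func demo() (first, second string) {
	tail := &growingReader{chunks: []string{"tail-1", "tail-2"}}
	r := &retryableMultiReader{readers: []io.Reader{strings.NewReader("head "), tail}}
	return drain(r), drain(r)
}

func main() {
	first, second := demo()
	fmt.Printf("%q %q\n", first, second)
}
```

With the standard `io.MultiReader` in place of `retryableMultiReader`, the second `drain` would return no data, because the exhausted reader has already been dropped.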
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
When the multireader hits EOF, we will always get EOF from it, so we
cannot store the multireader for later error handling, only for the
decoder.
Thanks @tobiasstadler for pointing this error out.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
The "userxattr" option is needed for mounting overlayfs inside a user namespace with kernel >= 5.11.
The "userxattr" option is NOT needed for the initial user namespace (aka "the host").
Also, Ubuntu (since circa 2015) and Debian (since 10) with kernel < 5.11 can mount the overlayfs in a user namespace without the "userxattr" option.
The corresponding kernel commit: 2d2f2d7322ff43e0fe92bf8cccdc0b09449bf2e1
> **ovl: user xattr**
>
> Optionally allow using "user.overlay." namespace instead of "trusted.overlay."
> ...
> Disable redirect_dir and metacopy options, because these would allow privilege escalation through direct manipulation of the
> "user.overlay.redirect" or "user.overlay.metacopy" xattrs.
Fix issue 42055
Related to containerd/containerd PR 5076
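The rule described above can be summarized in a small sketch. This is a simplification that decides purely from the kernel version and the user namespace state; the function names and the option-string layout are assumptions for illustration, and a real driver may prefer to probe by attempting a mount instead.

```go
package main

import "fmt"

// needsUserXattr is a hypothetical helper: "userxattr" is only relevant
// inside a user namespace, and only on kernel 5.11 or newer.
func needsUserXattr(inUserNS bool, major, minor int) bool {
	if !inUserNS {
		return false // the initial user namespace never needs it
	}
	return major > 5 || (major == 5 && minor >= 11)
}

// overlayOptions builds an overlayfs option string, appending "userxattr"
// when requested. The directory arguments are placeholders.
func overlayOptions(lower, upper, work string, userxattr bool) string {
	opts := fmt.Sprintf("lowerdir=%s,upperdir=%s,workdir=%s", lower, upper, work)
	if userxattr {
		opts += ",userxattr"
	}
	return opts
}

func main() {
	use := needsUserXattr(true, 5, 11)
	fmt.Println(overlayOptions("/l", "/u", "/w", use))
}
```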
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Added an option `awslogs-create-stream` to allow skipping log stream
creation for the awslogs log driver. The default value is still true to
stay consistent with the previous behavior.
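A minimal sketch of how such an option could be read with a default of true follows. The helper name `shouldCreateStream` and the error wording are hypothetical; only the option name comes from this change.

```go
package main

import (
	"fmt"
	"strconv"
)

const createStreamKey = "awslogs-create-stream"

// shouldCreateStream returns true when the option is absent, so existing
// configurations keep creating the log stream as before.
func shouldCreateStream(opts map[string]string) (bool, error) {
	v, ok := opts[createStreamKey]
	if !ok {
		return true, nil
	}
	b, err := strconv.ParseBool(v)
	if err != nil {
		return false, fmt.Errorf("invalid value %q for %s: %w", v, createStreamKey, err)
	}
	return b, nil
}

func main() {
	create, err := shouldCreateStream(map[string]string{createStreamKey: "false"})
	fmt.Println(create, err)
}
```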
Signed-off-by: Xia Wu <xwumzn@amazon.com>
- Using "/go/" redirects for some topics, which allows us to
redirect to new locations if topics are moved around in the
documentation.
- Updated some old URLs to their new location.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This code was added in 391441c28b, to fix
upgrades from docker 1.11 to 1.12 with existing containers.
Given that any container after 1.12 should have the correct configuration
already, it should be safe to assume this upgrade logic is no longer needed.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Wrap platforms.Only and fall back to our own matching that ignores
mismatches due to empty CPU variants. This just cleans things up and makes
the logic re-usable in other places.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
In some cases, in fact many in the wild, an image may have the incorrect
platform on the image config.
This can lead to failures to run an image, particularly when a user
specifies a `--platform`.
Typically what we see in the wild is a manifest list with an entry
for, as an example, linux/arm64 pointing to an image config that has
linux/amd64 on it.
This change falls back to looking up the manifest list for an image to
see if the manifest list shows the image as the correct one for that
platform.
In order to accomplish this we need to traverse the leases associated
with an image. Each image, if pulled with Docker 20.10, will have the
manifest list stored in the containerd content store with the resource
assigned to a lease keyed on the image ID.
So we look up the lease for the image, then look up the associated
resources to find the manifest list, then check the manifest list for a
platform match, then ensure that manifest refers to our image config.
This is only used as a fallback when a user specified they want a
particular platform and the image config that we have does not match
that platform.
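The fallback order can be sketched roughly as follows. All types and names here (`platform`, `manifestEntry`, `image`, `matchesPlatform`) are hypothetical simplifications; the lease traversal and content store lookups are left out, and the manifest list entries are assumed to be already loaded.

```go
package main

import "fmt"

// platform, manifestEntry and image are simplified, hypothetical types.
type platform struct{ OS, Arch string }

type manifestEntry struct {
	Platform     platform
	ConfigDigest string // digest of the image config the manifest refers to
}

type image struct {
	ConfigDigest string
	Platform     platform // platform recorded in the image config (may be wrong)
}

// matchesPlatform first trusts the image config. Only if that does not match
// does it fall back to the manifest list: an entry must both advertise the
// wanted platform and point at this image's config.
func matchesPlatform(img image, want platform, list []manifestEntry) bool {
	if img.Platform == want {
		return true
	}
	for _, m := range list {
		if m.Platform == want && m.ConfigDigest == img.ConfigDigest {
			return true
		}
	}
	return false
}

func main() {
	arm64 := platform{OS: "linux", Arch: "arm64"}
	amd64 := platform{OS: "linux", Arch: "amd64"}
	img := image{ConfigDigest: "sha256:aaa", Platform: amd64} // mislabeled config
	list := []manifestEntry{{Platform: arm64, ConfigDigest: "sha256:aaa"}}
	fmt.Println(matchesPlatform(img, arm64, list))
}
```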
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
We have upgraded runc to rc93 and added CI for cgroup 2.
So we can move cgroup v2 out of experimental.
Fix issue 41916
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Various dirs in /var/lib/docker contain data that needs to be mounted
into a container. For this reason, these dirs are set to be owned by the
remapped root user, otherwise there can be permissions issues.
However, this unnecessarily exposes these dirs to an unprivileged user
on the host.
Instead, set the ownership of these dirs to the real root (or rather the
UID/GID of dockerd) with 0701 permissions, which allows the remapped
root to enter the directories but not read/write to them.
The remapped root needs to enter these dirs so the container's rootfs
can be configured... e.g. to mount /etc/resolv.conf.
This prevents an unprivileged user from having read/write access to
these dirs on the host.
The flip side of this is now any user can enter these directories.
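To spell out what mode 0701 grants, the sketch below decodes the permission bits for the "other" class, which is the class the remapped root falls into once the directory is owned by the real root. The helper name `otherAccess` is hypothetical.

```go
package main

import (
	"fmt"
	"io/fs"
)

// otherAccess reports what users outside the owner and group may do with a
// directory of the given mode: list it (read), create or remove entries
// (write), or traverse it (enter).
func otherAccess(mode fs.FileMode) (read, write, enter bool) {
	perm := mode.Perm()
	return perm&0o004 != 0, perm&0o002 != 0, perm&0o001 != 0
}

func main() {
	mode := fs.ModeDir | 0o701
	read, write, enter := otherAccess(mode)
	fmt.Println(mode, read, write, enter)
}
```

With 0701, other users can traverse the directory when they already know a path inside it, but they cannot list or modify its contents.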
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit e908cc3901)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The remapped root does not need access to this dir.
Having this owned by the remapped root opens the host up to an
unprivileged user on the host being able to escalate privileges.
While it would not be normal for the remapped UID to be used outside of
the container context, it could happen.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit bfedd27259)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Before this change, there is no way to know if container (runtime)
resources have been cleaned up unless you actually remove the container.
This change allows callers of the wait API or the events API to know
that all runtime resources for the container are released (e.g. IP
addresses).
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Loggers that implement BufSize() (e.g. awslogs) use the method to
tell Copier about the maximum log line length. However loggerWithCache
and RingBuffer hide the method by wrapping loggers.
As a result, Copier uses its default 16KB limit, which breaks log
lines > 16KB even if the destinations can handle them.
This change implements BufSize() on loggerWithCache and RingBuffer to
make sure these logger wrappers don't hide the method on the underlying
loggers.
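The effect of a wrapper hiding an optional method can be shown with a small sketch. The interfaces and types below (`Logger`, `SizedLogger`, `hidingWrapper`, `forwardingWrapper`, `bufSizeFor`) are simplified, hypothetical stand-ins rather than the daemon's actual types; the 256KB value is arbitrary.

```go
package main

import "fmt"

// Logger and SizedLogger are simplified stand-ins for the logger interfaces.
type Logger interface {
	Log(msg string) error
}

type SizedLogger interface {
	Logger
	BufSize() int
}

const defaultBufSize = 16 * 1024

// bigLineLogger is a hypothetical driver that accepts long lines.
type bigLineLogger struct{}

func (bigLineLogger) Log(string) error { return nil }
func (bigLineLogger) BufSize() int     { return 256 * 1024 }

// hidingWrapper wraps a Logger but does not forward BufSize.
type hidingWrapper struct{ l Logger }

func (w hidingWrapper) Log(m string) error { return w.l.Log(m) }

// forwardingWrapper forwards BufSize to the wrapped logger when available.
type forwardingWrapper struct{ l Logger }

func (w forwardingWrapper) Log(m string) error { return w.l.Log(m) }

func (w forwardingWrapper) BufSize() int {
	if sl, ok := w.l.(SizedLogger); ok {
		return sl.BufSize()
	}
	return defaultBufSize
}

// bufSizeFor mimics how a copier might pick its buffer size.
func bufSizeFor(l Logger) int {
	if sl, ok := l.(SizedLogger); ok {
		return sl.BufSize()
	}
	return defaultBufSize
}

func main() {
	fmt.Println(bufSizeFor(hidingWrapper{bigLineLogger{}}))     // falls back to the default
	fmt.Println(bufSizeFor(forwardingWrapper{bigLineLogger{}})) // uses the driver's size
}
```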
Fixes #41794.
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
The following warnings in `docker info` are now discarded,
because there is no action the user can actually take.
On cgroup v1:
- "WARNING: No blkio weight support"
- "WARNING: No blkio weight_device support"
On cgroup v2:
- "WARNING: No kernel memory TCP limit support"
- "WARNING: No oom kill disable support"
`docker run` still prints warnings when a user attempts to use the missing feature.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
When pulling an image by platform, it is possible for the image's
configured platform to not match what was in the manifest list.
The image itself is buggy because either the manifest list is incorrect
or the image config is incorrect. In any case, this is preventing people
from upgrading because many times users do not have control over these
buggy images.
This was not a problem in 19.03 because we did not compare on platform
before. It just assumed if we had the image it was the one we wanted
regardless of platform, which has its own problems.
Example Dockerfile that has this problem:
```Dockerfile
FROM --platform=linux/arm64 k8s.gcr.io/build-image/debian-iptables:buster-v1.3.0
RUN echo hello
```
This fails the first time you try to build after it finishes pulling but
before performing the `RUN` command.
On the second attempt it works because the image is already there and
does not hit the code that errors out on platform mismatch (Actually it
ignores errors if an image is returned at all).
Must be run with the classic builder (DOCKER_BUILDKIT=0).
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This fixes a panic that occurs when an admin specifies a custom default
runtime: when a plugin is started, the shim config is nil.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Adds a test case for the case where dockerd gets stuck on startup due to
hanging `daemon.shutdownContainer`
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This is a fix for https://github.com/docker/for-linux/issues/1012.
The code was not considering that C strings are NULL-terminated so
we need to leave one extra byte.
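The off-by-one can be sketched as a length check. This assumes the byte budget for the mount options is one page, which is not stated in this message; the helper name `fitsInMountData` is hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// fitsInMountData reports whether opts can be passed as mount data within a
// buffer of pageSize bytes. The +1 accounts for the terminating NUL byte of
// the C string. (Assumption: the limit is one page.)
func fitsInMountData(opts string, pageSize int) bool {
	return len(opts)+1 <= pageSize
}

func main() {
	opts := "lowerdir=/a:/b,upperdir=/c,workdir=/d"
	fmt.Println(fitsInMountData(opts, os.Getpagesize()))
}
```

A check of `len(opts) <= pageSize` would accept a string that exactly fills the buffer, and the last byte would then be lost to the terminator, which is consistent with the truncated `wor` path in the `dmesg` output below.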
Without this fix, the testcase in https://github.com/docker/for-linux/issues/1012
fails with
```
Step 61/1001 : RUN echo 60 > 60
---> Running in dde85ac3b1e3
Removing intermediate container dde85ac3b1e3
---> 80a12a18a241
Step 62/1001 : RUN echo 61 > 61
error creating overlay mount to /23456789112345678921234/overlay2/d368abcc97d6c6ebcf23fa71225e2011d095295d5d8c9b31d6810bea748bdf07-init/merged: no such file or directory
```
with the output of `dmesg -T` as:
```
[Sat Dec 19 02:35:40 2020] overlayfs: failed to resolve '/23456789112345678921234/overlay2/89e435a1b24583c463abb73e8abfad8bf8a88312ef8253455390c5fa0a765517-init/wor': -2
```
with this fix, you get the expected:
```
Step 126/1001 : RUN echo 125 > 125
---> Running in 2f2e56da89e0
max depth exceeded
```
Signed-off-by: Oscar Bonilla <6f6231@gmail.com>
The previous startup sequence called "containerStop" on containers that were persisted with a running state but are not alive when restarting (this can happen on non-clean shutdown).
This call was made before fixing up the RunningState of the container, which tricked the daemon into trying to kill a non-existent process and ultimately hang.
The fix is very simple - just add a condition on calling containerStop.
Signed-off-by: Simon Ferquel <simon.ferquel@docker.com>
These tests fail when run by a non-root user:
=== RUN TestTmpfsDevShmNoDupMount
oci_linux_test.go:29: assertion failed: error is not nil: mkdir /var/lib/docker: permission denied
--- FAIL: TestTmpfsDevShmNoDupMount (0.00s)
=== RUN TestIpcPrivateVsReadonly
oci_linux_test.go:29: assertion failed: error is not nil: mkdir /var/lib/docker: permission denied
--- FAIL: TestIpcPrivateVsReadonly (0.00s)
=== RUN TestSysctlOverride
oci_linux_test.go:29: assertion failed: error is not nil: mkdir /var/lib/docker: permission denied
--- FAIL: TestSysctlOverride (0.00s)
=== RUN TestSysctlOverrideHost
oci_linux_test.go:29: assertion failed: error is not nil: mkdir /var/lib/docker: permission denied
--- FAIL: TestSysctlOverrideHost (0.00s)
Signed-off-by: Arnaud Rebillout <elboulangero@gmail.com>