This makes sure that things like `--tmpfs` mounts over an anonymous
volume don't create volumes unnecessarily.
One method only checks mountpoints, the other checks both mountpoints
and tmpfs... the usage of these should likely be consolidated.
Ideally, processing for `--tmpfs` mounts would get merged in with the
rest of the mount parsing. I opted not to do that for this change so the
fix is minimal and can potentially be backported with less chance of
breaking things.
Merging the mount processing for tmpfs can be handled in a followup.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Docker Desktop (on macOS and Windows hosts) allows containers
running inside a Linux VM to connect to the host using
the host.docker.internal DNS name, which is implemented by
VPNkit (a DNS proxy on the host).
This PR allows containers to connect to Linux hosts
by appending a special string "host-gateway" to --add-host,
e.g. "--add-host=host.docker.internal:host-gateway", which adds
a host.docker.internal DNS entry to /etc/hosts and maps it to the host-gateway-ip.
This PR also adds a daemon flag called host-gateway-ip, which defaults to
the default bridge IP.
Docker Desktop will need to set this field to the Host Proxy IP
so DNS requests for host.docker.internal can be routed to VPNkit
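For illustration, a minimal sketch of the substitution with hypothetical names
(expandExtraHost, hostGatewayName); the real daemon code differs:
```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

const hostGatewayName = "host-gateway" // the magic --add-host value

// expandExtraHost replaces the "host-gateway" placeholder in a
// "name:ip" pair with the configured host-gateway-ip.
func expandExtraHost(extraHost, hostGatewayIP string) (string, error) {
	parts := strings.SplitN(extraHost, ":", 2)
	if len(parts) != 2 {
		return "", fmt.Errorf("invalid --add-host value %q", extraHost)
	}
	if parts[1] != hostGatewayName {
		return extraHost, nil // a plain name:IP mapping; pass through
	}
	if hostGatewayIP == "" {
		return "", errors.New("host-gateway-ip is not configured")
	}
	return parts[0] + ":" + hostGatewayIP, nil
}

func main() {
	h, _ := expandExtraHost("host.docker.internal:host-gateway", "172.17.0.1")
	fmt.Println(h) // host.docker.internal:172.17.0.1
}
```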
Addresses: https://github.com/docker/for-linux/issues/264
Signed-off-by: Arko Dasgupta <arko.dasgupta@docker.com>
Adds support for ReplicatedJob and GlobalJob service modes. These modes
allow running services that execute tasks which exit upon success,
instead of long-running daemon-type tasks.
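As a rough sketch of how the new modes surface in the service spec (the field
names mirror api/types/swarm, but treat them as illustrative):
```go
package main

import "fmt"

// Illustrative copies of the swarm API types; see api/types/swarm for
// the real definitions.
type ReplicatedJob struct {
	MaxConcurrent    *uint64 // tasks allowed to run simultaneously
	TotalCompletions *uint64 // successful task exits required overall
}

type GlobalJob struct{} // one task per node, each expected to exit 0

type ServiceMode struct {
	ReplicatedJob *ReplicatedJob
	GlobalJob     *GlobalJob
}

func main() {
	total := uint64(10)
	mode := ServiceMode{ReplicatedJob: &ReplicatedJob{TotalCompletions: &total}}
	fmt.Printf("replicated-job, completions=%d\n", *mode.ReplicatedJob.TotalCompletions)
}
```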
Signed-off-by: Drew Erny <drew.erny@docker.com>
When IPv6 is enabled, make sure fixed-cidr-ipv6 is set
by the user since there is no default IPv6 local subnet
in the IPAM
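A minimal sketch of the added validation, with assumed config field names:
```go
package config

import "errors"

// verifyIPv6Config rejects an IPv6-enabled configuration that has no
// subnet to allocate container addresses from.
func verifyIPv6Config(enableIPv6 bool, fixedCIDRv6 string) error {
	if enableIPv6 && fixedCIDRv6 == "" {
		return errors.New("IPv6 is enabled, but fixed-cidr-ipv6 is not set")
	}
	return nil
}
```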
Signed-off-by: Arko Dasgupta <arko.dasgupta@docker.com>
If TINI_COMMIT isn't set, .go-autogen sets an empty value
as the "expected" commit. Attempting to truncate the value
caused a panic in that situation.
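The guard is essentially a bounds check before slicing; a sketch with
hypothetical names:
```go
package main

import "fmt"

// truncate returns at most n characters of s: slicing past a string's
// length panics, so an empty TINI_COMMIT must be handled first.
func truncate(s string, n int) string {
	if len(s) < n {
		return s // e.g. TINI_COMMIT unset -> ""
	}
	return s[:n]
}

func main() {
	fmt.Printf("%q\n", truncate("", 7))           // "" instead of a panic
	fmt.Printf("%q\n", truncate("0123456789", 7)) // "0123456"
}
```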
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
* Requires containerd binaries from containerd/containerd#3799 . Metrics are not implemented yet.
* Works with crun v0.10.4, but `--security-opt seccomp=unconfined` is needed unless using master version of libseccomp
(containers/crun#156, seccomp/libseccomp#177)
* Doesn't work with master runc yet
* Resource limitations are unimplemented
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
When a container is started in privileged mode, the device mappings
provided by the `--device` flag were ignored. Now the device mappings
are honored even in privileged mode.
Signed-off-by: Akhil Mohan <akhil.mohan@mayadata.io>
The [gelf payload specification](http://docs.graylog.org/en/2.4/pages/gelf.html#gelf-payload-specification)
demands that the field `short_message` *MUST* be set by the client library.
Since docker logging via the gelf driver sends messages line by line, it can happen that messages with an empty
`short_message` are passed on. This causes strict downstream processors (like graylog) to raise an exception.
The logger now skips messages with an empty line.
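A sketch of the new check; the driver-internal types are omitted here:
```go
package main

import (
	"bytes"
	"fmt"
)

// isEmptyLine reports whether a line would produce an empty
// short_message, which strict GELF consumers reject.
func isEmptyLine(line []byte) bool {
	return len(bytes.TrimSpace(line)) == 0
}

func main() {
	fmt.Println(isEmptyLine([]byte("\n"))) // true: skip the message
	fmt.Println(isEmptyLine([]byte("ok"))) // false: send it
}
```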
Resolves: #40232
See also: #37572
Signed-off-by: Jonas Heinrich <Jonas@JonasHeinrich.com>
Now that we check whether overlay is working by performing an actual
overlayfs mount, there's no need for extra checks on the kernel version
or the filesystem type. The actual mount check is sufficient.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Before this commit, overlay check was performed by looking for
`overlay` in /proc/filesystems. This obviously might not work
for rootless Docker (fs is there, but one can't use it as non-root).
This commit changes the check to perform the actual mount, by reusing
the code previously written to check for multiple lower dirs support.
The old check is removed from both drivers, as well as the additional
check for the multiple lower dirs support in overlay2 since it's now
a part of the main check.
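A rough sketch of such a mount-based check (assumed helper name; the real
implementation lives in overlayutils and also verifies multiple lower dirs):
```go
package overlayutils

import (
	"fmt"
	"os"
	"path/filepath"

	"golang.org/x/sys/unix"
)

// supportsOverlay performs a trial overlayfs mount: the mount(2) call
// itself is the test, and it fails for non-root users even when
// "overlay" is listed in /proc/filesystems.
func supportsOverlay(testdir string) error {
	td, err := os.MkdirTemp(testdir, "overlay-check-")
	if err != nil {
		return err
	}
	defer os.RemoveAll(td)

	for _, dir := range []string{"lower1", "lower2", "upper", "work", "merged"} {
		if err := os.Mkdir(filepath.Join(td, dir), 0o755); err != nil {
			return err
		}
	}

	opts := fmt.Sprintf("lowerdir=%s:%s,upperdir=%s,workdir=%s",
		filepath.Join(td, "lower2"), filepath.Join(td, "lower1"),
		filepath.Join(td, "upper"), filepath.Join(td, "work"))
	merged := filepath.Join(td, "merged")
	if err := unix.Mount("overlay", merged, "overlay", 0, opts); err != nil {
		return fmt.Errorf("overlay is not supported: %w", err)
	}
	return unix.Unmount(merged, unix.MNT_DETACH)
}
```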
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This moves supportsMultipleLowerDir() to overlayutils
so it can be used from both overlay and overlay2.
The only changes made were:
* replace logger with logrus
* don't use the workDirName and mergedDirName constants
* add mnt var to improve readability a bit
This is a preparation for the next commit.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This fix tries to address the issue raised in 39353 where
docker crashes when creating namespaces with a UID in /etc/subuid and /etc/subgid.
The issue was that the mapping to `/etc/sub[u,g]id` in docker did not
allow numeric IDs.
This fix fixes the issue by probing other combinations (uid:groupname, username:gid, uid:gid)
when the normal username:groupname lookup fails.
This fix fixes 39353.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
WithBlock makes sure that the following containerd request is reliable.
In one edge case, under high load pressure, the kernel kills dockerd,
containerd, and the containerd-shims due to OOM. Both dockerd and
containerd then restart, but containerd takes some time to recover all
the existing containers. Until containerd is serving, dockerd requests
fail with a gRPC error. Worse, the restore action ignores any
non-NotFound errors and reports a running state for containers that
have already stopped. That is unexpected behavior, and we have to
restart dockerd to get back to a sane state, which is painful.
Adding WithBlock prevents this edge case, and in the common case
containerd will be serving shortly.
It does no harm to add WithBlock to the containerd connection.
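A sketch of the dial; the socket path and timeout here are assumptions:
```go
package main

import (
	"context"
	"time"

	"google.golang.org/grpc"
)

func dialContainerd(ctx context.Context) (*grpc.ClientConn, error) {
	ctx, cancel := context.WithTimeout(ctx, 60*time.Second)
	defer cancel()
	return grpc.DialContext(ctx, "unix:///run/containerd/containerd.sock",
		grpc.WithInsecure(),
		grpc.WithBlock(), // don't return until containerd is actually serving
	)
}

func main() {
	conn, err := dialContainerd(context.Background())
	if err != nil {
		panic(err)
	}
	defer conn.Close()
}
```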
Signed-off-by: Wei Fu <fuweid89@gmail.com>
The validate step in CI was broken, due to a combination of
086b4541cf, fbdd437d29,
and 85733620eb being merged to master.
```
api/types/filters/parse.go:39:1: exported method `Args.Keys` should have comment or be unexported (golint)
func (args Args) Keys() []string {
^
daemon/config/builder.go:19:6: exported type `BuilderGCFilter` should have comment or be unexported (golint)
type BuilderGCFilter filters.Args
^
daemon/config/builder.go:21:1: exported method `BuilderGCFilter.MarshalJSON` should have comment or be unexported (golint)
func (x *BuilderGCFilter) MarshalJSON() ([]byte, error) {
^
daemon/config/builder.go:35:1: exported method `BuilderGCFilter.UnmarshalJSON` should have comment or be unexported (golint)
func (x *BuilderGCFilter) UnmarshalJSON(data []byte) error {
^
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
As caught by staticcheck (after disabling the default exclusion rules).
Based on the comment, this break was indeed meant to break out of the
loop and return the error.
```
daemon/graphdriver/aufs/mount.go:54:4: SA4011: ineffective break statement. Did you mean to break out of the outer loop? (staticcheck)
break
^
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
```
daemon/container_operations.go:787:2: S1033: unnecessary guard around call to delete (gosimple)
if _, ok := container.NetworkSettings.Networks[n.ID()]; ok {
^
```
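The fix is a one-liner, since delete is already a no-op for a missing key:
```go
package main

func main() {
	m := map[string]int{"a": 1}
	// The guarded form
	//   if _, ok := m["b"]; ok { delete(m, "b") }
	// is equivalent to the bare call:
	delete(m, "b")
}
```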
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
```
daemon/graphdriver/btrfs/btrfs.go:609:5: SA4003: no value of type uint64 is less than 0 (staticcheck)
if driver.options.size <= 0 {
^
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
```
daemon/graphdriver/aufs/aufs_test.go:746:8: SA4021: x = append(y) is equivalent to x = y (staticcheck)
ids = append(ids[2:])
^
```
Also pre-allocating the ids slice while we're at it.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The t.Log() calls caused some unneeded noise; changing these
tests to use subtests instead, so that we can track them
more easily.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Adds a new ServiceStatus field to the Service object, which includes the
running and desired task counts. This new field is gated behind a
"status" query parameter.
Signed-off-by: Drew Erny <drew.erny@docker.com>
It has been pointed out that sometimes device mapper unit tests
fail with the following diagnostics:
> --- FAIL: TestDevmapperSetup (0.02s)
> graphtest_unix.go:44: graphdriver: loopback attach failed
> graphtest_unix.go:48: loopback attach failed
The root cause is the absence of udev inside the container used
for testing, which causes device nodes (/dev/loop*) to not be
created.
The test suite itself already has a workaround, but it only
creates 8 devices (loop0 till loop7). It might very well be
the case that the first few devices are already used by the
system (on my laptop 15 devices are busy).
The fix is to raise the number of devices being manually created.
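A sketch of what creating the nodes by hand amounts to (loop devices use
major number 7; the count of 128 here is illustrative):
```go
package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

// makeLoopDevices pre-creates /dev/loopN nodes for environments
// without udev, ignoring nodes that already exist.
func makeLoopDevices(n int) error {
	for i := 0; i < n; i++ {
		path := fmt.Sprintf("/dev/loop%d", i)
		dev := int(unix.Mkdev(7, uint32(i)))
		if err := unix.Mknod(path, unix.S_IFBLK|0o660, dev); err != nil && err != unix.EEXIST {
			return fmt.Errorf("mknod %s: %w", path, err)
		}
	}
	return nil
}

func main() {
	if err := makeLoopDevices(128); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```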
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
If anything marshals the daemon config now or in the future
this commit ensures the correct canonical form for the builder
GC policies' filters.
Signed-off-by: Tibor Vass <tibor@docker.com>
For backwards compatibility, the old incorrect object format for
builder.GC.Rule.Filter still works, but is deprecated in favor of an array of
strings akin to what needs to be passed on the CLI.
Signed-off-by: Tibor Vass <tibor@docker.com>
This struct now has a properly typed member, so use the properly typed
functions with it.
Also update the vendor directory and hope nothing explodes.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
BuildKit supports entitlements like network-host and security-insecure.
This patch makes them configurable through the daemon.json file.
By default, network-host is enabled and security-insecure is disabled.
Signed-off-by: Kunal Kushwaha <kunal.kushwaha@gmail.com>
Previously there was no way for the splunk log driver to work if index
acknowledgment was set on the HEC, and it would in fact fail silently.
This will now allow users to specify if index acknowledgment is set and
will work with that setting.
Signed-off-by: Devon Estes <devon.c.estes@gmail.com>
The `docker/go-connections` package was only used for a quite generic utility.
This patch removes the use of the package by replacing the `GetProxyEnv` utility with
a local function that's based on the one in golang.org/x/net/http/httpproxy:
c21de06aaf/http/httpproxy/proxy.go (L100-L107)
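The replacement is essentially this helper, modelled on the httpproxy code
linked above:
```go
package main

import (
	"fmt"
	"os"
)

// getEnvAny returns the first non-empty value among the named
// environment variables, e.g. "HTTP_PROXY" then "http_proxy".
func getEnvAny(names ...string) string {
	for _, n := range names {
		if val := os.Getenv(n); val != "" {
			return val
		}
	}
	return ""
}

func main() {
	fmt.Println(getEnvAny("HTTP_PROXY", "http_proxy"))
}
```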
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This commit adds the image variant to the image.(Image) type and
updates related functionality. Images built from another image will
inherit the OS, architecture, and variant.
Note that if a base image does not specify an architecture, the
local machine's architecture is used for inherited images. On the
other hand, the variant is set equal to the parent image's variant,
even when the parent image's variant is unset.
The legacy builder is also updated to allow the user to specify
a '--platform' argument on the command line when creating an image
FROM scratch. A complete platform specification, including variant,
is supported. The built image will include the variant, as will any
derived images.
Signed-off-by: Chris Price <chris.price@docker.com>
About github.com/opencontainers/runc/libcontainer/user:
According to 195d8d544a
this package has two functions:
- Have a static implementation of user lookup, which is now supported in the
os/user stdlib package with the osusergo build tag, but wasn't at the time.
- Have extra functions that os/user doesn't have, but none of those are used
in homedir.
Since https://github.com/moby/moby/pull/11287, homedir depended directly on
libcontainer's user package for CurrentUser().
This is being replaced with os/user.Current(), because all of our static
binaries are compiled with the osusergo tag, and for dynamic libraries it
is more correct to use libc's implementation than parsing /etc/passwd.
About github.com/docker/docker/pkg/idtools:
The only dependency was GetStatic(), which uses idtools.LookupUID(uid).
The implementation of idtools.LookupUID just calls
github.com/opencontainers/runc/libcontainer/user.LookupUid, or falls back
to exec-ing getent (since https://github.com/moby/moby/pull/27599).
This patch replaces calls to homedir.GetStatic by homedir.Get(), opting out
of supporting nss lookups in static binaries via exec-ing to getent for
the homedir package.
If homedir package users need to support nss lookups, they are advised
to compile dynamically instead.
Signed-off-by: Tibor Vass <tibor@docker.com>
This fixes issues where one goroutine tries to delete or rename a file
while another goroutine has the file open (e.g. a log reader).
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
In case jsonlogfile is used with max-file=1 and max-size set,
log rotation is not performed; instead, the log file is closed
and re-opened with O_TRUNC.
This situation is not handled by the log reader in follow mode,
leading to the log reader being stuck forever.
This situation (file close/reopen) could be handled in waitRead(),
but fsnotify library chose to not listen to or deliver this event
(IN_CLOSE_WRITE in inotify lingo).
So, we have to handle this by checking the file size upon receiving
io.EOF from the log reader, and comparing the size with the one received
earlier. In case the new size is less than the old one, the file was
truncated and we need to seek to its beginning.
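A sketch of the check, with assumed names:
```go
package tailfile

import (
	"io"
	"os"
)

// handleEOF is called when the reader hits io.EOF: if the file is now
// smaller than before, it was truncated (O_TRUNC), so restart from the top.
func handleEOF(f *os.File, lastSize int64) (int64, error) {
	fi, err := f.Stat()
	if err != nil {
		return lastSize, err
	}
	if fi.Size() < lastSize {
		if _, err := f.Seek(0, io.SeekStart); err != nil {
			return lastSize, err
		}
	}
	return fi.Size(), nil
}
```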
Fixes #39235.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
TestLogBlocking is intended to test that the Log method blocks by
default. It does this by mocking out the internals of the
awslogs.logStream and replacing one of its internal channels with one
that is controlled by the test. The call to Log occurs inside a
goroutine. Go may or may not schedule the goroutine immediately and the
blocking may or may not be observed outside the goroutine immediately
due to decisions made by the Go runtime. This change adds a small
timeout for test failure so that the Go runtime has the opportunity to
run the goroutine before the test fails.
Signed-off-by: Samuel Karp <skarp@amazon.com>
Moby works perfectly when one has a good and stable
internet connection. Operating in areas where internet connectivity is likely
to be lost at undetermined intervals, like over a satellite connection or 4G/LTE in
rural areas, can become a problem when pulling a new image. When the connection is
lost while image layers are being pulled, Moby will try to reconnect up to 5 times.
If this fails, the incompletely downloaded layers are lost and will need to be
downloaded again during the next pull. This means that we are using more
data than we might have to.
Pulling a layer multiple times from the start can become costly over a satellite
or 4G/LTE connection. As these techniques (especially 4G) are quite common in IoT, and
Moby is used to run Azure IoT Edge devices, I would like to add a settable maximum
number of download attempts. That maximum is currently hardcoded to 5
(distribution/xfer/download.go). I would like to change this constant to a variable
that the user can set. The default will still be 5, so nothing will change from
the current version unless specified when starting the daemon with the added flag
or in the config file.
I added a default value of 5 for DefaultMaxDownloadAttempts and a settable
max-download-attempts in the daemon config file. It is also added to the config
of dockerd so it can be set with a flag when starting the daemon. This value gets
stored in the daemon's imageService when it is initialized and is passed
to NewLayerDownloadManager as a parameter, which stores it in the
LayerDownloadManager. This enables us to set the number of retries in
makeDownloadFunc equal to the max download attempts.
I also added some tests that are based on maxConcurrentDownloads/maxConcurrentUploads.
You can pull this version and test it in a development container. Either create a
config file `/etc/docker/daemon.json` with `{"max-download-attempts": 3}`, or use
`dockerd --max-download-attempts=3 -D &` to start the daemon. Start downloading
an image and disconnect from the internet while downloading. The result is
that it stops pulling after three attempts.
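A sketch of the retry loop with assumed names; the real wiring passes
max-download-attempts through NewLayerDownloadManager into makeDownloadFunc:
```go
package xfer

import (
	"context"
	"time"
)

// downloadWithRetries runs do() up to maxAttempts times, with a simple
// linear backoff between attempts, and respects context cancellation.
func downloadWithRetries(ctx context.Context, do func() error, maxAttempts int) error {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = do(); err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(time.Duration(attempt) * 5 * time.Second):
		}
	}
	return err
}
```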
Signed-off-by: Lukas Heeren <lukas-heeren@hotmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
```
daemon/cluster/controllers/plugin/controller.go:37:2: U1000: field `taskID` is unused (unused)
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This looks to be a false positive: the field is not
used if journald is not supported, which may be the cause.
```
daemon/logger/journald/journald.go:21:2: U1000: field `mu` is unused (unused)
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
staticcheck go linter says:
> daemon/logger/copier_test.go:451:2: SA2002: the goroutine calls T.Fatal, which must be called in the same goroutine as the test (staticcheck)
What it doesn't say is why. The reason is, t.Fatal() calls t.FailNow(),
which is expected to stop test execution right now. It does so by
calling runtime.Goexit(), which, unless called from a main goroutine,
does not stop test execution.
Anyway, long story short, if we don't care much about stopping the test
case immediately, we can just replace t.Fatalf() with t.Errorf() which
still marks the test case as failed, but won't stop it immediately.
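A sketch of the resulting pattern:
```go
package copier

import "testing"

func doWork() error { return nil } // stand-in for the code under test

// t.Fatal in a spawned goroutine only exits that goroutine, so report
// failures with t.Errorf and return instead.
func TestInGoroutine(t *testing.T) {
	done := make(chan struct{})
	go func() {
		defer close(done)
		if err := doWork(); err != nil {
			t.Errorf("doWork: %v", err) // not t.Fatalf
			return
		}
	}()
	<-done
}
```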
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Fix warnings like this one:
> daemon/logger/jsonfilelog/jsonfilelog_test.go:191:3: SA5001: should check returned error before deferring file.Close() (staticcheck)
> defer file.Close()
> ^
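The fixed pattern, for reference:
```go
package jsonfilelog

import "os"

// Check the error from Open before deferring Close; otherwise Close
// may be deferred on a nil *os.File.
func readConfig(path string) error {
	file, err := os.Open(path)
	if err != nil {
		return err
	}
	defer file.Close()
	// ... use file ...
	return nil
}
```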
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Here, err is never non-nil as it was checked earlier.
Fixes the following linter warning:
> daemon/graphdriver/copy/copy.go:136:10: nilness: impossible condition: nil != nil (govet)
> if err != nil {
> ^
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In this code, err is already checked to be nil (or non-nil), so no need
to repeat extra checks.
Fixes the following govet warnings:
> daemon/checkpoint.go:38:12: nilness: tautological condition: nil == nil (govet)
> case err == nil:
> ^
> daemon/checkpoint.go:45:12: nilness: tautological condition: nil == nil (govet)
> case err == nil && stat.IsDir():
> ^
> daemon/checkpoint.go:47:12: nilness: tautological condition: nil == nil (govet)
> case err == nil:
> ^
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The last check for err != nil is not needed as err is always non-nil
there. Remove the check.
Also, no need to explicitly define `var err error` here.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Format the source according to latest goimports.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Suppressing the "SA9003: empty branch (staticcheck)" instead of commenting-out
or removing these lines because removing/commenting these lines causes a ripple
effect of changes, and there's still a to-do below.
```
13:06:14 daemon/graphdriver/graphtest/graphbench_unix.go:175:3: SA9003: empty branch (staticcheck)
13:06:14 if applyDiffSize != diffSize {
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
```
daemon/cluster/nodes.go:69:36: SA4009: argument ctx is overwritten before first use (staticcheck)
13:06:14 return c.lockedManagerAction(func(ctx context.Context, state nodeState) error {
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Trying to link to a non-existing container is not valid, and should return an
"invalid parameter" (400) error. Returning a "not found" error in this situation
would make the client report the container's image could not be found.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This test frequently fails on Windows RS1 (mainly), so skipping it
for now on Windows;
```
ok github.com/docker/docker/daemon/logger 0.525s coverage: 43.0% of statements
time="2019-09-09T20:37:35Z" level=info msg="Trying to get region from EC2 Metadata"
time="2019-09-09T20:37:36Z" level=info msg="Log stream already exists" errorCode=ResourceAlreadyExistsException logGroupName= logStreamName= message= origError="<nil>"
--- FAIL: TestLogBlocking (0.02s)
cloudwatchlogs_test.go:313: Expected to be able to read from stream.messages but was unable to
time="2019-09-09T20:37:36Z" level=error msg=Error
time="2019-09-09T20:37:36Z" level=error msg="Failed to put log events" errorCode=InvalidSequenceTokenException logGroupName=groupName logStreamName=streamName message="use token token" origError="<nil>"
time="2019-09-09T20:37:36Z" level=error msg="Failed to put log events" errorCode=DataAlreadyAcceptedException logGroupName=groupName logStreamName=streamName message="use token token" origError="<nil>"
time="2019-09-09T20:37:36Z" level=info msg="Data already accepted, ignoring error" errorCode=DataAlreadyAcceptedException logGroupName=groupName logStreamName=streamName message="use token token"
FAIL
coverage: 78.2% of statements
FAIL github.com/docker/docker/daemon/logger/awslogs 0.630s
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This made my IDE unhappy; `ConfigFilePath` is an exported function, so
it makes sense to use the same signature for both Linux and Windows.
This patch also adds error handling (same as on Linux), even though the
current implementation will never return an error (it's good practice
to handle errors, so I assumed this would be the right approach).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Get rid of too many nested if statements. Remove the redundant check for
err != nil, fixing the following lint issue:
> daemon/logger/awslogs/cloudwatchlogs.go:452:10: nilness: tautological condition: non-nil != nil (govet)
> if err != nil {
> ^
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
```
16:04:35 daemon/logger/awslogs/cloudwatchlogs.go:312:25: SA1019: session.New is deprecated: Use NewSession functions to create sessions instead. NewSession has the same functionality as New except an error can be returned when the func is called instead of waiting to receive an error until a request is made. (staticcheck)
16:04:35 return ec2metadata.New(session.New())
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
We return immediately after this, so no need to update eventBuffer:
```
16:04:35 daemon/logger/awslogs/cloudwatchlogs.go:554:5: SA4006: this value of `eventBuffer` is never used (staticcheck)
16:04:35 eventBuffer = eventBuffer[:0]
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
When mounting overlays which have children, enforce that
the mount is always performed as read only. Newer versions
of the kernel return a device busy error when a lower directory
is in use as an upper directory in another overlay mount.
Adds committed file to indicate when an overlay is being used
as a parent, ensuring it will no longer be mounted with an
upper directory.
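A sketch of the resulting mount-option logic, with assumed names:
```go
package overlay2

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// overlayOptions builds the mount data: once a layer has a "committed"
// file (i.e. it is a parent of another layer), mount it without an
// upperdir, making the mount read-only. Newer kernels return EBUSY if
// a dir used as a lowerdir here is an upperdir in another mount.
func overlayOptions(layerDir string, lowerDirs []string, upper, work string) string {
	lower := strings.Join(lowerDirs, ":")
	if _, err := os.Stat(filepath.Join(layerDir, "committed")); err == nil {
		return "lowerdir=" + lower // read-only: no upperdir/workdir
	}
	return fmt.Sprintf("lowerdir=%s,upperdir=%s,workdir=%s", lower, upper, work)
}
```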
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
Also renamed the non-Windows variant of this file to be
consistent with other files in this package
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
btrfs_noversion was added in d7c37b5a28
for distributions that did not have the `btrfs/version.h` header file.
Seeing how all of the distributions we currently support do have the
`btrfs/version.h` file, we should probably just remove this build flag
altogether.
Signed-off-by: Eli Uriegas <eli.uriegas@docker.com>
Reported by govet linter:
> daemon/monitor.go:57:9: lostcancel: the cancel function returned by context.WithTimeout should be called, not discarded, to avoid a context leak (govet)
> ctx, _ := context.WithTimeout(context.Background(), 2*time.Second)
> ^
> daemon/monitor.go:128:9: lostcancel: the cancel function returned by context.WithTimeout should be called, not discarded, to avoid a context leak (govet)
> ctx, _ := context.WithTimeout(context.Background(), 2*time.Second)
> ^
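The fixed pattern, for reference:
```go
package main

import (
	"context"
	"time"
)

func main() {
	// Keep the cancel func instead of discarding it with `_`, so the
	// context's timer and resources are released when we return.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	_ = ctx // ... use ctx ...
}
```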
Fixes: b5f288 ("Handle blocked I/O of exec'd processes")
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
1. Use "in-place" variables for if statements to limit their scope to
the respective `if` block.
2. Report the error returned from sd_journal_* by using CErr().
3. Use errors.New() instead of fmt.Errorf().
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
At first glance, `docker logs --tail 0` does not make sense,
as it is supposed to produce no output, but `tail -n 0` from GNU
coreutils works like that, plus there is even a test case
(`TestLogsTail` in integration-cli/docker_cli_logs_test.go).
Now, something like `docker logs --follow --tail 0` makes total
sense, so let's make it work.
(NOTE if --tail is not used, config.Tail is set to -1)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
If we take a long time to process log messages, and during that time
journal file rotation occurs, the journald client library will keep
those rotated files open until sd_journal_process() is called.
By periodically calling sd_journal_process() during the processing
loop we shrink the window of time a client instance has open file
descriptors for rotated (deleted) journal files.
This code is modelled after that of journalctl [1]; the above explanation
as well as the value of 1024 is taken from there.
[v2: fix CErr() argument]
[1] https://github.com/systemd/systemd/blob/dc16327c48d/src/journal/journalctl.c#L2676
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
TL;DR: simplify the code, fix --follow hanging indefinitely
Do the following to simplify the followJournal() code:
1. Use Go-native select instead of C-native polling.
2. Use Watch{Producer,Consumer}Gone(), eliminating the need
to have journald.closed variable, and an extra goroutine.
3. Use sd_journal_wait(). In the words of its own man page:
> A synchronous alternative for using sd_journal_get_fd(),
> sd_journal_get_events(), sd_journal_get_timeout() and
> sd_journal_process() is sd_journal_wait().
Unfortunately, the logic is still not as simple as it
could be; the reason being, once the container has exited,
journald might still be writing some logs from its internal
buffers onto journal file(s), and there is no way to
figure out whether it is done, so there is no guarantee we
will read all of it back. This bug can be reproduced with
something like
> $ ID=$(docker run -d busybox seq 1 150000); docker logs --follow $ID
> ...
> 128123
> $
(The last expected output line should be `150000`).
To avoid exiting from followJournal() early, add the
following logic: once the container is gone, keep trying
to drain the journal until there's no new data for at
least `waitTimeout` time period.
Should fix https://github.com/docker/for-linux/issues/575
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
1. The journald client library initializes inotify watch(es)
during the first call to sd_journal_get_fd(), so it makes sense
to open it earlier in order to not lose any journal file rotation
events.
2. It only makes sense to call this if we're going to use it
later on -- so add a check for config.Follow.
3. Remove the redundant call to sd_journal_get_fd().
NOTE that any subsequent calls to sd_journal_get_fd() return
the same file descriptor, so there's no real need to save it
for later use in wait_for_data_cancelable().
Based on earlier patch by Nalin Dahyabhai <nalin@redhat.com>.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In case the LogConsumer is gone, the code that sends the message can
get stuck forever. Wrap the code in a select statement, as all other loggers do.
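A sketch of the guarded send, assuming a message channel plus a
consumer-gone notification (as logger.LogWatcher provides):
```go
package journald

// send delivers a line unless the consumer has gone away, in which
// case it returns false instead of blocking forever.
func send(msgs chan<- []byte, consumerGone <-chan struct{}, line []byte) bool {
	select {
	case msgs <- line:
		return true
	case <-consumerGone:
		return false // the reader went away; don't block forever
	}
}
```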
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In case the Tail=N parameter is requested, we need to show N lines.
It does not make sense to walk backwards one by one if we can
do it at once. Now, if Since=T is also provided, make sure we
haven't jumped too far (before T), and if we did, move forward.
The primary motivation for this was to make the code simpler.
This also fixes a tiny bug in the "since" implementation.
Before this commit:
> $ docker logs -t --tail=6000 --since="2019-03-10T03:54:25.00" $ID | head
> 2019-03-10T03:54:24.999821000Z 95981
After:
> $ docker logs -t --tail=6000 --since="2019-03-10T03:54:25.00" $ID | head
> 2019-03-10T03:54:25.000013000Z 95982
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Protect access to q.quotas map, and lock around changing nextProjectID.
Technically, the lock in findNextProjectID() is not needed as it is
only called during initialization, but one can never be too careful.
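A sketch of the locking, with assumed field names:
```go
package projectquota

import "sync"

// Control: one mutex protects both the quotas map and the
// nextProjectID counter.
type Control struct {
	mu            sync.Mutex
	quotas        map[string]uint32 // target path -> project ID
	nextProjectID uint32
}

func (q *Control) setQuota(target string, id uint32) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.quotas[target] = id
	if id >= q.nextProjectID {
		q.nextProjectID = id + 1
	}
}
```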
Fixes: 52897d1c09 ("projectquota: utility class for project quota controls")
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The Docker daemon currently always stops the healthcheck before sending a
signal to a container. However, when we use "docker kill" to send signals
other than SIGTERM or SIGKILL to a container, such as SIGINT, the daemon
still stops the container health check even though the container process
handles the signal normally and continues to work.
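A sketch of the intended check, with assumed names:
```go
package daemon

import "syscall"

// shouldStopHealthcheck: only stop the health-check probe for signals
// that are actually meant to stop the container.
func shouldStopHealthcheck(sig syscall.Signal) bool {
	return sig == syscall.SIGTERM || sig == syscall.SIGKILL
}
```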
Signed-off-by: Ruilin Li <liruilin4@huawei.com>
This fix was added in 8e71b1e210 to work around
a go issue (https://github.com/golang/go/issues/20506).
That issue was fixed in
66c03d39f3,
which is part of Go 1.10 and up. This reverts the changes that were made in
8e71b1e210, and are no longer needed.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This allows our tests, which all share a containerd instance, to be a
bit more isolated by setting the containerd namespaces to the generated
daemon IDs rather than the default namespaces.
This came about because I found in some cases we had test daemons
failing to start (really very slow to start) because it was (seemingly)
processing events from other tests.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Before this change we just accept that any error is "not found", when it
could be something else; but even if it is just a "not found" kind of
error, this should be dealt with by the container store and not the
event processor.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
While investigating a test failure, I found this in the logs:
```
time="2019-07-04T15:06:32.622506760Z" level=warning msg="Error while setting daemon root propagation, this is not generally critical but may cause some functionality to not work or fallback to less desirable behavior" dir=/go/src/github.com/docker/docker/bundles/test-integration/d1285b8250308/root error="error writing file to signal mount cleanup on shutdown: open /tmp/dxr/d1285b8250308/unmount-on-shutdown: no such file or directory"
```
This path is generated from the daemon's exec-root, which appears to not
exist yet. This change just makes sure it exists before we try to write
a file.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Fixes #39427
This always sends the exec exit events even when the exec fails to find
the binary. A standard 127 exit status is sent in this situation.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This is the second part to
https://github.com/containerd/containerd/pull/3361 and will help process
delete not block forever when the process exits but the I/O was
inherited by a subprocess that lives on.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
As reported in docker/compose#6445, when deploying a Linux
container on Windows (LCOW), the daemon made the wrong assumption
when deciding which shell to use to execute the healthcheck, looking
at the host's platform instead of the container's platform.
This patch adds a check for the container's platform when deploying
on Windows, and sets the correct shell.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This reverts commit 98fc09128b in order to
keep registry v2 schema1 handling and libtrust-key-based engine ID.
Because registry v2 schema1 was not officially deprecated and
registries are still relying on it, this patch puts its logic back.
However, registry v1 relics are not added back since v1 logic has been
removed a while ago.
This also fixes an engine upgrade issue in a swarm cluster. It was relying
on the Engine ID to be the same upon upgrade, but the mentioned commit
modified the logic to use UUID and from a different file.
Since the libtrust key is always needed to support v2 schema1 pushes,
the old engine ID is based on the libtrust key, and the engine ID
needs to be preserved across upgrades, adding UUID-based engine ID logic
seems to add more complexity than it solves problems.
Hence reverting the engine ID changes as well.
Signed-off-by: Tibor Vass <tibor@docker.com>
There are apparently a few more places where List operations against
Swarm are performed, besides just the List methods. This increases the max
received message size in those places.
Signed-off-by: Drew Erny <drew.erny@docker.com>
Before 7a7357da, archive.TarResourceRebase was being used to copy files
and folders from the container. That function splits the source path
into a dirname + basename pair to support copying a file:
if you wanted to tar `dir/file` it would tar from `dir` the file `file`
(as part of the IncludedFiles option).
However, that path splitting logic was kept for folders as well, which
resulted in weird inputs to archive.TarWithOptions:
if you wanted to tar `dir1/dir2` it would tar from `dir1` the directory
`dir2` (as part of IncludedFiles option).
Although it was weird, it worked fine until we started chrooting into
the container rootfs when doing a `docker cp` with container source set
to `/` (cf 3029e765).
The fix is to only do the path splitting logic if the source is a file.
Unfortunately, 7a7357da added support for LCOW by duplicating some of
this subtle logic. Ideally we would need to do more refactoring of the
archive codebase to properly encapsulate these behaviors behind well-
documented APIs.
This fix does not do that. Instead, it fixes the issue inline.
Signed-off-by: Tibor Vass <tibor@docker.com>
This is needed so that we can add OS version constraints in Swarmkit, which
does require the engine to report its host's OS version (see
https://github.com/docker/swarmkit/issues/2770).
The OS version is parsed from the `os-release` file on Linux, and from the
`ReleaseId` string value of the `SOFTWARE\Microsoft\Windows NT\CurrentVersion`
registry key on Windows.
Added unit tests when possible, as well as Prometheus metrics.
Signed-off-by: Jean Rouge <rougej+github@gmail.com>
Commit e2989c4d48 says:
> With the suffix added, the possibility to hit the race is extremely
> low, and we don't have to do any locking.
Probability theory just laughed in my face this weekend, as this has
actually happened once in 6050000 containers created, on high-end
hardware with 1000 parallel "docker create" running (took a few days).
One way to work around this is increase the randomness by adding more
characters, which will further decrease the probability, but won't
eliminate it entirely. Another is to fix it upstream (done, see the
link below, but the fix might not be backported to Ubuntu).
Overall, as much as I like this solution, I think we need to
revert it :-\
See-also: https://github.com/sfjro/aufs5-standalone/commit/abf61326f49535
This reverts commit e2989c4d48.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Previously only unpack operations were supported with chroot.
This adds chroot support for packing operations.
This prevents potential breakouts when copying data from a container.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This is useful for preventing CVE-2018-15664 where a malicious container
process can take advantage of a race on symlink resolution/sanitization.
Before this change chrootarchive would chroot to the destination
directory which is attacker controlled. With this patch we always chroot
to the container's root which is not attacker controlled.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Increases the max received gRPC message size for Node and Secret list
operations. This has already been done for the other swarm types, but
was not done for these.
Signed-off-by: Drew Erny <drew.erny@docker.com>
Previously `docker info` had reported "cgroupfs" as the cgroup driver
but the driver wasn't actually used at all.
This PR reports "none" as the cgroup driver so as to avoid confusion.
e.g. kubeadm/kubelet will detect cgroupless-ness by checking this docker
info field. https://github.com/rootless-containers/usernetes/pull/97
Note that the user still cannot specify `native.cgroupdriver=none` manually.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
For some reason, retrying the unmount after an EBUSY error
was only performed in Remove(), but not in Put().
I have done some testing on Ubuntu 16.04 and 18.04 with aufs,
performing massively parallel container creation using this script:
```
NUMCTS=5000
PARALLEL=100
IMAGE=busybox
docker pull $IMAGE >/dev/null
seq $NUMCTS | parallel -j$PARALLEL docker create $IMAGE true > /dev/null
docker ps -qa | shuf | tail -n $NUMCTS | parallel -j$PARALLEL docker rm -f '{}' > /dev/null
```
Sometimes (1 to 5 times per 10000 `docker create` runs), aufs.Put() fails on
the Unmount syscall with EBUSY during container creation:
> Error response from daemon: device or resource busy
and in docker log, with debug turned on:
> level=debug msg="Failed to unmount ID-init aufs: device or resource busy"
> level=error msg="Handler for POST /v1.30/containers/create returned error: device or resource busy"
I did some debugging by running fuser -v -M -m $MOUNT_POINT, but
that revealed nothing.
This commit (see the sketch after this list):
* implements retry on EBUSY in Unmount()
* calls Unmount() from Remove()
* increases the number of retries from 3 to 5
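A sketch of the retry, with assumed constants:
```go
package aufs

import (
	"time"

	"golang.org/x/sys/unix"
)

// unmount retries on EBUSY, which is often transient, with a growing
// pause between attempts.
func unmount(target string) error {
	var err error
	for i := 0; i < 5; i++ {
		if err = unix.Unmount(target, 0); err != unix.EBUSY {
			return err // success, or an error that retrying won't help
		}
		time.Sleep(time.Duration(i+1) * 100 * time.Millisecond)
	}
	return err
}
```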
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In case statfs() returns ENOENT, do not return an error, but rather
treat this as "not mounted".
Related to commit d42dbdd3d4.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Make sure adapter.removeNetworks executes during task Remove.
adapter.removeNetworks was being skipped in cases where
isUnknownContainer(err) was true after adapter.remove was executed.
This fix eliminates the nil return case, forcing the function
to continue executing unless there is a true error.
Fixes https://github.com/moby/moby/issues/39225
Signed-off-by: Arko Dasgupta <arko.dasgupta@docker.com>
Commit 5cd62852fa added a lock around the call to unix.Mount() to
avoid the race in aufs kernel code related to xino file creation
and removal. While this is going to be fixed in the kernel, we still
need to support the current aufs, so some kind of fix is required.
I think a better fix (rather than a lock) is to add a random suffix
to the file name (note it is and was a separate file per mount,
never mind the same file name -- the file is created/opened and
removed instantly, so each mount deals with its own file).
With the suffix added, the possibility to hit the race is extremely
low, and we don't have to do any locking.
Note we don't add any more characters, instead we're replacing
`xino` with four random characters in the 0-9a-z range.
See also: https://sourceforge.net/p/aufs/mailman/message/36674769/
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Running a bundled aufs benchmark sometimes results in this warning:
> WARN[0001] Couldn't run auplink before unmount /tmp/aufs-tests/aufs/mnt/XXXXX error="exit status 22" storage-driver=aufs
If we take a look at what the auplink utility produces on stderr, we'll see:
> auplink:proc_mnt.c:96: /tmp/aufs-tests/aufs/mnt/XXXXX: Invalid argument
and auplink exits with exit code of 22 (EINVAL).
Looking into auplink source code, what happens is it tries to find a
record in /proc/self/mounts corresponding to the mount point (by using
setmntent()/getmntent_r() glibc functions), and it fails.
Some manual testing, as well as runtime testing with lots of printf
added on mount/unmount, as well as calls to check the superblock fs
magic on mount point (as in graphdriver.Mounted(graphdriver.FsMagicAufs, target)
confirmed that this record is in fact there, but sometimes auplink
can't find it. I was also able to reproduce the same error (inability
to find a mount in /proc/self/mounts that should definitely be there)
using a small C program, mocking what `auplink` does:
```c
#include <stdio.h>
#include <err.h>
#include <mntent.h>
#include <string.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
	FILE *fp;
	struct mntent m, *p;
	char a[4096];
	char buf[4096 + 1024];
	int found = 0, lines = 0;

	if (argc != 2) {
		fprintf(stderr, "Usage: %s <mountpoint>\n", argv[0]);
		exit(1);
	}
	fp = setmntent("/proc/self/mounts", "r");
	if (!fp) {
		err(1, "setmntent");
	}
	setvbuf(fp, a, _IOLBF, sizeof(a));
	while ((p = getmntent_r(fp, &m, buf, sizeof(buf)))) {
		lines++;
		if (!strcmp(p->mnt_dir, argv[1])) {
			found++;
		}
	}
	printf("found %d entries for %s (%d lines seen)\n", found, argv[1], lines);
	return !found;
}
```
I also wrote a few other C programs -- one that reads
/proc/self/mounts directly, and one that reads /proc/self/mountinfo instead.
They are also prone to the same occasional error.
It is not perfectly clear why this happens, but so far my best theory
is that when a lot of mounts/unmounts happen in parallel with reading
the contents of /proc/self/mounts, the kernel sometimes fails to provide
continuity (i.e. it skips some part of the file or mixes it up in some
other way). In other words, this is a kernel bug (which is probably
hard to fix unless some other interface to get a mount entry is added).
Now, there is no real fix, and the workaround I was able to come up
with is to retry when we get EINVAL. It usually works on the second
attempt, although I've once seen it take two attempts to go through.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Do not use filepath.Walk() as there's no requirement to recursively
go into every directory under mnt -- a (non-recursive) list of
directories in mnt is sufficient.
With filepath.Walk(), in case some container will fail to unmount,
it'll go through the whole container filesystem which is both
excessive and useless.
This is similar to commit f1a4592297 ("devmapper.shutdown:
optimize")
While at it, raise the priority of the "unmount error" message from debug
to a warning. Note we don't have to explicitly add `m`, as the unmount error
(from pkg/mount) will include it.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In case there are a big number of layers, so that the mount data won't fit
into a single memory page (4096 bytes on most platforms, which is good
enough for about 40 layers, depending on how long the graphdriver root path
is), we supply the additional layers with MS_REMOUNT, as described in the aufs
documentation.
The problem is, the current implementation does that one layer at a time
(i.e. there is one mount syscall for each additional layer).
Optimize the code to supply as many layers as we can fit in one page
(basically reusing the same code as for the original mount).
Note, per aufs docs, "[a]t remount-time, the options are interpreted
in the given order, e.g. left to right" so we should be good.
Tested on an image with ~100 layers.
Before (35 syscalls):
> [pid 22756] 1556919088.686955 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/a86f8c9dd0ec2486293119c20b0ec026e19bbc4d51332c554f7cf05d777c9866", "aufs", 0, "br:/mnt/volume_sfo2_09/docker-au"...) = 0 <0.000504>
> [pid 22756] 1556919088.687643 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/a86f8c9dd0ec2486293119c20b0ec026e19bbc4d51332c554f7cf05d777c9866", 0xc000c451b0, MS_REMOUNT, "append:/mnt/volume_sfo2_09/docke"...) = 0 <0.000105>
> [pid 22756] 1556919088.687851 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/a86f8c9dd0ec2486293119c20b0ec026e19bbc4d51332c554f7cf05d777c9866", 0xc000c451ba, MS_REMOUNT, "append:/mnt/volume_sfo2_09/docke"...) = 0 <0.000098>
> ..... (~30 lines skipped for clarity)
> [pid 22756] 1556919088.696182 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/a86f8c9dd0ec2486293119c20b0ec026e19bbc4d51332c554f7cf05d777c9866", 0xc000c45310, MS_REMOUNT, "append:/mnt/volume_sfo2_09/docke"...) = 0 <0.000266>
After (2 syscalls):
> [pid 24352] 1556919361.799889 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/8e7ba189e347a834e99eea4ed568f95b86cec809c227516afdc7c70286ff9a20", "aufs", 0, "br:/mnt/volume_sfo2_09/docker-au"...) = 0 <0.001717>
> [pid 24352] 1556919361.801761 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/8e7ba189e347a834e99eea4ed568f95b86cec809c227516afdc7c70286ff9a20", 0xc000dbecb0, MS_REMOUNT, "append:/mnt/volume_sfo2_09/docke"...) = 0 <0.001358>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Apparently there is some kind of race in the aufs kernel module code,
which leads to errors like:
[98221.158606] aufs au_xino_create2:186:dockerd[25801]: aufs.xino create err -17
[98221.162128] aufs au_xino_set:1229:dockerd[25801]: I/O Error, failed creating xino(-17).
[98362.239085] aufs au_xino_create2:186:dockerd[6348]: aufs.xino create err -17
[98362.243860] aufs au_xino_set:1229:dockerd[6348]: I/O Error, failed creating xino(-17).
[98373.775380] aufs au_xino_create:767:dockerd[27435]: open /dev/shm/aufs.xino(-17)
[98389.015640] aufs au_xino_create2:186:dockerd[26753]: aufs.xino create err -17
[98389.018776] aufs au_xino_set:1229:dockerd[26753]: I/O Error, failed creating xino(-17).
[98424.117584] aufs au_xino_create:767:dockerd[27105]: open /dev/shm/aufs.xino(-17)
So, we have to have a lock around the mount syscall.
While at it, don't call the whole Unmount() on an error path, as
it leads to a bogus error from auplink flush.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
1. Use mount.Unmount() which ignores EINVAL ("not mounted") error,
and provides better error diagnostics (so we don't have to explicitly
add target to error messages).
2. Since we're ignoring "not mounted" error, we can call
multiple unmounts without any locking -- but since "auplink flush"
is still involved and can produce an error in logs, let's keep
the check for fs being mounted (it's just a statfs so should be fast).
3. While at it, improve the "can't unmount" error message in Put().
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Both mount and unmount calls are already protected by fine-grained
(per id) locks in Get()/Put() introduced in commit fc1cf1911b
("Add more locking to storage drivers"), so there's no point in
having a global lock in mount/unmount.
The only place from which unmount is called without any locking
is Cleanup() -- this is to be addressed in the next patch.
This reverts commit 824c24e680.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Today `$ docker service create --limit-cpu` configures a container's
`CpuPeriod` and `CpuQuota` variables; this commit switches it to
configuring a container's `NanoCpu` variable instead.
Signed-off-by: Olly Pomeroy <olly@docker.com>
This adds both a daemon-wide flag and a container creation property
(see the sketch after this list):
- Set the `CgroupnsMode: "host|private"` HostConfig property at
container creation time to control what cgroup namespace the container
is created in
- Set the `--default-cgroupns-mode=host|private` daemon flag to control
what cgroup namespace containers are created in by default
- Set the default if the daemon flag is unset to "host", for backward
compatibility
- Default to CgroupnsMode: "host" for client versions < 1.40
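A sketch of the per-container override through the API client; the
CgroupnsMode field name comes from this commit, and the client calls assume
the signatures of that era:
```go
package main

import (
	"context"
	"fmt"

	"github.com/docker/docker/api/types/container"
	"github.com/docker/docker/client"
)

func main() {
	cli, err := client.NewClientWithOpts(client.FromEnv)
	if err != nil {
		panic(err)
	}
	// Create a container in a private cgroup namespace, overriding the
	// daemon-wide default.
	resp, err := cli.ContainerCreate(context.Background(),
		&container.Config{Image: "busybox", Cmd: []string{"true"}},
		&container.HostConfig{CgroupnsMode: "private"},
		nil, "")
	if err != nil {
		panic(err)
	}
	fmt.Println("created", resp.ID)
}
```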
Signed-off-by: Rob Gulewich <rgulewich@netflix.com>
This is enabled for all containers that are not run with --privileged,
if the kernel supports it.
Fixes #38332
Signed-off-by: Rob Gulewich <rgulewich@netflix.com>