Commit graph

76 commits

Author SHA1 Message Date
Paweł Gronowski
2463c40144 daemon/logger: Fix TestConcurrentLogging race test
The recent fix for log corruption changed the signature of the
NewLogFile and WriteLogEntry functions and the test wasn't adjusted to
this change.

Fix the test by adjusting to the new LogFile API.

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2022-05-31 14:02:59 +02:00
Paweł Gronowski
7493342926 daemon/logger: Share buffers by sync.Pool
Marshalling log messages by json-file and local drivers involved
serializing the message into a shared buffer. This caused a regression
resulting in log corruption with recent changes where Log may be called
from multiple goroutines at the same time.

Solution is to use a sync.Pool to manage the buffers used for the
serialization. Also removed the MarshalFunc, which the driver had to
expose to the LogFile so that it can marshal the message. This is now
moved entirely to the driver.

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2022-05-27 16:44:06 +02:00
Cory Snider
a67e159909 daemon/logger: hold LogFile lock less on ReadLogs
Reduce the amount of time ReadLogs holds the LogFile fsop lock by
releasing it as soon as all the files are opened, before parsing the
compressed file headers.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-05-19 15:23:18 -04:00
Cory Snider
01915a725e daemon/logger: follow LogFile without file watches
File watches have been a source of complexity and unreliability in the
LogFile follow implementation, especially when combined with file
rotation. File change events can be unreliably delivered, especially on
Windows, and the polling fallback adds latency. Following across
rotations has never worked reliably on Windows. Without synchronization
between the log writer and readers, race conditions abound: readers can
read from the file while a log entry is only partially written, leading
to decode errors and necessitating retries.

In addition to the complexities stemming from file watches, the LogFile
follow implementation had complexity from needing to handle file
truncations, and (due to a now-fixed bug in the polling file watcher
implementation) evictions to unlock the log file so it could be rotated.
Log files are now always rotated, never truncated, so these situations
no longer need to be handled by the follow code.

Rewrite the LogFile follow implementation in terms of waiting until
LogFile notifies it that a new message has been written to the log file.
The LogFile informs the follower of the file offset of the last complete
write so that the follower knows not to read past that, preventing it
from attempting to decode partial messages and making retries
unnecessary. Synchronization between LogFile and its followers is used
at critical points to prevent missed notifications of writes and races
between file rotations and the follower opening files for read.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-05-19 15:22:22 -04:00
Cory Snider
6d5bc07189 daemon/logger: fix refcounting decompressed files
The refCounter used for sharing temporary decompressed log files and
tracking when the files can be deleted is keyed off the source file's
path. But the path of a log file is not stable: it is renamed on each
rotation. Consequently, when logging is configured with both rotation
and compression, multiple concurrent readers of a container's logs could
read logs out of order, see duplicates or decompress a log file which
has already been decompressed.

Replace refCounter with a new implementation, sharedTempFileConverter,
which is agnostic to the file path, keying off the source file's
identity instead. Additionally, sharedTempFileConverter handles the full
lifecycle of the temporary file, from creation to deletion. This is all
abstracted from the consumer: all the bookkeeping and cleanup is handled
behind the scenes when Close() is called on the returned reader value.
Only one file descriptor is used per temporary file, which is shared by
all readers.

A channel is used for concurrency control so that the lock can be
acquired inside a select statement. While not currently utilized, this
makes it possible to add support for cancellation to
sharedTempFileConverter in the future.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-05-19 15:22:22 -04:00
Cory Snider
49aa66b597 daemon/logger: rotate log files, never truncate
Truncating the current log file while a reader is still reading through
it results in log lines getting missed. In contrast, rotating the file
allows readers who have the file open can continue to read from it
undisturbed. Rotating frees up the file name for the logger to create a
new file in its place. This remains true even when max-file=1; the
current log file is "rotated" from its name without giving it a new one.

On POSIXy filesystem APIs, rotating the last file is straightforward:
unlink()ing a file name immediately deletes the name from the filesystem
and makes it available for reuse, even if processes have the file open
at the time. Windows on the other hand only makes the name available
for reuse once the file itself is deleted, which only happens when no
processes have it open. To reuse the file name while the file is still
in use, the file needs to be renamed. So that's what we have to do:
rotate the file to a temporary name before marking it for deletion.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-05-19 15:22:22 -04:00
Cory Snider
3844d1a3d1 daemon/logger: drain readers when logger is closed
The LogFile follower would stop immediately upon the producer closing.
The close signal would race the file watcher; if a message were to be
logged and the logger immediately closed, the follower could miss that
last message if the close signal (formerly ProducerGone) was to win the
race. Add logic to perform one more round of reading when the producer
is closed to catch up on any final logs.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-05-19 15:22:22 -04:00
Cory Snider
906b979b88 daemon/logger: remove ProducerGone from LogWatcher
Whether or not the logger has been closed is a property of the logger,
and only of concern to its log reading implementation, not log watchers.
The loggers and their reader implementations can communicate as they see
fit. A single channel per logger which is closed when the logger is
closed is plenty sufficient to broadcast the state to log readers, with
no extra bookeeping or synchronization required.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-05-19 15:22:22 -04:00
Cory Snider
ae5f664f4e daemon/logger: open log reader synchronously
The asynchronous startup of the log-reading goroutine made the
follow-tail tests nondeterministic. The Log calls in the tests which
were supposed to happen after the reader started reading would sometimes
execute before the reader, throwing off the counts. Tweak the ReadLogs
implementation so that the order of operations is deterministic.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-05-19 15:22:22 -04:00
Cory Snider
90c54320c8 daemon/logger: fix data race in LogFile
The log message's timestamp was being read after it was returned to the
pool. By coincidence the timestamp field happened to not be zeroed on
reset so much of the time things would work as expected. But if the
message value was to be taken back out of the pool before WriteLogEntry
returned, the timestamp recorded in the gzip header of compressed
rotated log files would be incorrect.

Make future use-after-put bugs fail fast by zeroing all fields of the
Message value, including the timestamp, when it is put into the pool.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-03-03 14:56:25 -05:00
Cory Snider
9080e5a1f7 daemon/logger: add test to detect data races
Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-03-03 14:56:25 -05:00
Kazuyoshi Kato
c91e09bee2 daemon/logger: replace flaky TestFollowLogsHandleDecodeErr
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2021-12-27 09:11:46 -08:00
Kazuyoshi Kato
7a10f5a558 daemon/logger: refactor followLogs to write more unit tests
followLogs() is getting really long (170+ lines) and complex.
The function has multiple inner functions that mutate its variables.

To refactor the function, this change introduces follow{} struct.
The inner functions are now defined as ordinal methods, which are
accessible from tests.

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2021-12-27 09:11:12 -08:00
Kazuyoshi Kato
f2e458ebc5 daemon/logger: test followLogs' handleDecodeErr case
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2021-12-15 15:13:02 -08:00
Kazuyoshi Kato
48d387a757 daemon/logger: read the length header correctly
Before this change, if Decode() couldn't read a log record fully,
the subsequent invocation of Decode() would read the record's non-header part
as a header and cause a huge heap allocation.

This change prevents such a case by having the intermediate buffer in
the decoder struct.

Fixes #42125.

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2021-12-15 15:13:02 -08:00
Akihiro Suda
9e7bbdb9ba
Merge pull request #40084 from thaJeztah/hostconfig_const_cleanup
api/types: hostconfig: add some constants/enums and minor code cleanup
2021-08-28 00:21:31 +09:00
Eng Zer Jun
c55a4ac779
refactor: move from io/ioutil to io and os package
The io/ioutil package has been deprecated in Go 1.16. This commit
replaces the existing io/ioutil functions with their new definitions in
io and os packages.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2021-08-27 14:56:57 +08:00
Sebastiaan van Stijn
686be57d0a
Update to Go 1.17.0, and gofmt with Go 1.17
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-08-24 23:33:27 +02:00
Sebastiaan van Stijn
6948ab4fa1
api/types: hostconfig: fix LogMode enum
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-08-06 19:05:58 +02:00
Kazuyoshi Kato
bb11365e96 Handle long log messages correctly on SizedLogger
Loggers that implement BufSize() (e.g. awslogs) uses the method to
tell Copier about the maximum log line length. However loggerWithCache
and RingBuffer hide the method by wrapping loggers.

As a result, Copier uses its default 16KB limit which breaks log
lines > 16kB even the destinations can handle that.

This change implements BufSize() on loggerWithCache and RingBuffer to
make sure these logger wrappes don't hide the method on the underlying
loggers.

Fixes #41794.

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2021-01-20 16:44:06 -08:00
=
b102d4637c Fix windows log file rotation with readers
This fixes the case where log rotation fails on Windows while there are
clients reading container logs.

Evicts readers if there is an error during rotation and try rotation again.
This is needed for Windows with this scenario:

1. `docker logs -f` is called
2. Log rotation occurs (log.txt -> log.txt.1, truncate and re-open
   log.txt)
3. Log rotation occurs again (rm log.txt.1, log.txt -> log.txt.1)

On step 3, before this change, the log rotation will fail with `Access
is denied`.
In this case, what we have is a reader holding a file handle to the
primary log file. The log file is then rotated, but the reader still has
a the handle open. `FILE_SHARE_DELETE` allows this to happen... but then
we try to do it again for the next rotation and it blows up.
So when it blows up we force all the readers to disconnect, close the
log file, and try rotation again, which will succeed based on the added
tests.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-10-29 18:38:30 +00:00
Brian Goff
bcc993b494 Fix logfile to open all files with custom openFile
This makes sure, on Windows, that all files are opened with
FILE_SHARE_DELETE.

On non-Windows this just calls the same `os.Open()`.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-10-28 20:36:32 +00:00
Brian Goff
56ba96b6c1 Clean up some logfile implementation
- Ignore some pointless errors (like not exist on remove)
- Consolidate error handling/logging
- Fix race condition reading last log timestamp in the compression
  goroutine. This needs to be done while holding the write lock, which
  is not (or may not be) locked while compressing a rotated log file.
- Remove some indentation and consolidate mutex unlocking in
  `compressFile`

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-10-28 20:36:32 +00:00
Brian Goff
3148a46657 Fix various race conditions in loggerutils
Found by running with `go test -race`

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-10-28 20:36:32 +00:00
Brian Goff
c6d860ace6 Fix log file rotation test.
The test was looking for the wrong file name.
Since compression happens asyncronously, sometimes the test would
succeed and sometimes fail.

This change makes sure to wait for the compressed version of the file
since we can't know when the compression is going to occur.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-07-15 11:06:07 -07:00
Brian Goff
5ea5c02c88 Fix flakey test for log file rotate.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-05-18 10:27:53 -07:00
Brian Goff
75d655320e
Merge pull request #40920 from cpuguy83/log_rotate_error_handling
logfile: Check if log is closed on close error during rotate
2020-05-07 14:45:42 -07:00
Brian Goff
3989f91075 logfile: Check if log is closed on close error during rotate
This prevents getting into a situation where a container log cannot make
progress because we tried to rotate a file, got an error, and now the
file is closed. The next time we try to write a log entry it will try
and rotate again but error that the file is already closed.

I wonder if there is more we can do to beef up this rotation logic.
Found this issue while investigating missing logs with errors in the
docker daemon logs like:

```
Failed to log message for json-file: error closing file: close <file>:
file already closed
```

I'm not sure why the original rotation failed since the data was no
longer available.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-05-07 11:37:06 -07:00
Sebastiaan van Stijn
07d60bc257
Replace errors.Cause() with errors.Is() / errors.As()
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-04-29 00:28:41 +02:00
Brian Goff
933a87236f Reduce allocations for logfile reader
Before this change, the log decoder function provided by the log driver
to logfile would not be able to re-use buffers, causing undeeded
allocations and memory bloat for dockerd.

This change introduces an interface that allows the log driver to manage
it's memory usge more effectively.
This only affects json-file and local log drivers.

`json-file` still is not great just because of how the json decoder in the
stdlib works.
`local` is significantly improved.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-04-08 12:24:31 -07:00
Brian Goff
750f0d1648 Support configuration of log cacher.
Configuration over the API per container is intentionally left out for
the time being, but is supported to configure the default from the
daemon config.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit cbecf48bc352e680a5390a7ca9cff53098cd16d7)
Signed-off-by: Madhu Venugopal <madhu@docker.com>
2020-02-19 17:02:34 -05:00
Brian Goff
e2ceb83a53 Support reads for all log drivers.
This supplements any log driver which does not support reads with a
custom read implementation that uses a local file cache.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit d675e2bf2b75865915c7a4552e00802feeb0847f)
Signed-off-by: Madhu Venugopal <madhu@docker.com>
2020-02-19 17:01:44 -05:00
Sebastiaan van Stijn
9f0b3f5609
bump gotest.tools v3.0.1 for compatibility with Go 1.14
full diff: https://github.com/gotestyourself/gotest.tools/compare/v2.3.0...v3.0.1

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-02-11 00:06:42 +01:00
Sebastiaan van Stijn
6ee536b4a0
daemon: remove use of deprecated os.SEEK_END
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-11-25 18:53:03 +01:00
Brian Goff
a5f237c2b5 Use FILE_SHARE_DELETE for log files on Windows.
This fixes issues where one goroutine tries to delete or rename a file
while another goroutine has the file open (e.g. a log reader).

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2019-09-23 16:45:07 -07:00
Kir Kolyshkin
9cd24ba605 logger: fix follow logs for max-file=1
In case jsonlogfile is used with max-file=1 and max-size set,
the log rotation is not perfomed; instead, the log file is closed
and re-open with O_TRUNC.

This situation is not handled by the log reader in follow mode,
leading to an issue of log reader being stuck forever.

This situation (file close/reopen) could be handled in waitRead(),
but fsnotify library chose to not listen to or deliver this event
(IN_CLOSE_WRITE in inotify lingo).

So, we have to handle this by checking the file size upon receiving
io.EOF from the log reader, and comparing the size with the one received
earlier. In case the new size is less than the old one, the file was
truncated and we need to seek to its beginning.

Fixes #39235.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2019-09-20 16:09:39 -07:00
Sebastiaan van Stijn
3449b12cc7
Use assert.NilError() instead of assert.Assert()
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-01-21 13:16:02 +01:00
Kir Kolyshkin
f845d76d04 TestFollowLogsProducerGone: add
This should test that
 - all the messages produced are delivered (i.e. not lost)
 - followLogs() exits

Loosely based on the test having the same name by Brian Goff, see
https://gist.github.com/cpuguy83/e538793de18c762608358ee0eaddc197

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2018-09-06 11:48:37 -07:00
Kir Kolyshkin
916eabd459 daemon.ContainerLogs(): fix resource leak on follow
When daemon.ContainerLogs() is called with options.follow=true
(as in "docker logs --follow"), the "loggerutils.followLogs()"
function never returns (even then the logs consumer is gone).
As a result, all the resources associated with it (including
an opened file descriptor for the log file being read, two FDs
for a pipe, and two FDs for inotify watch) are never released.

If this is repeated (such as by running "docker logs --follow"
and pressing Ctrl-C a few times), this results in DoS caused by
either hitting the limit of inotify watches, or the limit of
opened files. The only cure is daemon restart.

Apparently, what happens is:

1. logs producer (a container) is gone, calling (*LogWatcher).Close()
for all its readers (daemon/logger/jsonfilelog/jsonfilelog.go:175).

2. WatchClose() is properly handled by a dedicated goroutine in
followLogs(), cancelling the context.

3. Upon receiving the ctx.Done(), the code in followLogs()
(daemon/logger/loggerutils/logfile.go#L626-L638) keeps to
send messages _synchronously_ (which is OK for now).

4. Logs consumer is gone (Ctrl-C is pressed on a terminal running
"docker logs --follow"). Method (*LogWatcher).Close() is properly
called (see daemon/logs.go:114). Since it was called before and
due to to once.Do(), nothing happens (which is kinda good, as
otherwise it will panic on closing a closed channel).

5. A goroutine (see item 3 above) keeps sending log messages
synchronously to the logWatcher.Msg channel. Since the
channel reader is gone, the channel send operation blocks forever,
and resource cleanup set up in defer statements at the beginning
of followLogs() never happens.

Alas, the fix is somewhat complicated:

1. Distinguish between close from logs producer and logs consumer.
To that effect,
 - yet another channel is added to LogWatcher();
 - {Watch,}Close() are renamed to {Watch,}ProducerGone();
 - {Watch,}ConsumerGone() are added;

*NOTE* that ProducerGone()/WatchProducerGone() pair is ONLY needed
in order to stop ConsumerLogs(follow=true) when a container is stopped;
otherwise we're not interested in it. In other words, we're only
using it in followLogs().

2. Code that was doing (logWatcher*).Close() is modified to either call
ProducerGone() or ConsumerGone(), depending on the context.

3. Code that was waiting for WatchClose() is modified to wait for
either ConsumerGone() or ProducerGone(), or both, depending on the
context.

4. followLogs() are modified accordingly:
 - context cancellation is happening on WatchProducerGone(),
and once it's received the FileWatcher is closed and waitRead()
returns errDone on EOF (i.e. log rotation handling logic is disabled);
 - due to this, code that was writing synchronously to logWatcher.Msg
can be and is removed as the code above it handles this case;
 - function returns once ConsumerGone is received, freeing all the
resources -- this is the bugfix itself.

While at it,

1. Let's also remove the ctx usage to simplify the code a bit.
It was introduced by commit a69a59ffc7 ("Decouple removing the
fileWatcher from reading") in order to fix a bug. The bug was actually
a deadlock in fsnotify, and the fix was just a workaround. Since then
the fsnofify bug has been fixed, and a new fsnotify was vendored in.
For more details, please see
https://github.com/moby/moby/pull/27782#issuecomment-416794490

2. Since `(*filePoller).Close()` is fixed to remove all the files
being watched, there is no need to explicitly call
fileWatcher.Remove(name) anymore, so get rid of the extra code.

Should fix https://github.com/moby/moby/issues/37391

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2018-09-06 11:47:42 -07:00
Brian Goff
d37a11bfba daemon/logger/loggerutils: add TestFollowLogsClose
This test case checks that followLogs() exits once the reader is gone.
Currently it does not (i.e. this test is supposed to fail) due to #37391.

[kolyshkin@: test case Brian Goff, changelog and all bugs are by me]
Source: https://gist.github.com/cpuguy83/e538793de18c762608358ee0eaddc197

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2018-09-06 11:46:34 -07:00
SeungUkLee
a79f8b48d4 fixed typo (becuase -> because)
Signed-off-by: SeungUkLee <lsy931106@gmail.com>
2018-08-26 17:30:40 +09:00
Brian Goff
5da8bc2e5b Remove now unused multireader.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-08-20 09:42:19 -07:00
Sebastiaan van Stijn
e0ad6d045c
Merge pull request #37092 from cpuguy83/local_logger
Add "local" log driver
2018-08-20 07:01:41 +01:00
Sebastiaan van Stijn
678d4b3a6d
Merge pull request #37412 from AzureCR/naduggar/logissue
Select polling based watcher for docker log file watcher on Windows
2018-08-14 14:40:44 +02:00
Brian Goff
94a10150f6 Decouple logfile from tailfile.
This makes it so consumers of `LogFile` should pass in how to get an
io.Reader to the requested number of lines to tail.

This is also much more efficient when tailing a large number of lines.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-08-10 21:02:19 -07:00
Kir Kolyshkin
2c6fbd864a loggerutils: fix a typo
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2018-07-31 15:44:46 +03:00
Kir Kolyshkin
09ad434f10
loggerutils: build fixes, improve errors
There are two build errors when using go-1.11beta1:

> daemon/logger/loggerutils/logfile.go:367: Warningf format %q arg f.Name is a func value, not called
> daemon/logger/loggerutils/logfile.go:564: Debug call has possible formatting directive %v

In the first place, the file name is actually not required as error
message already includes it.

While at it, fix a couple of other places for more correct messages, and
make sure to not add a file name if an error already has it.

Fixes: f69f09f44c
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2018-07-11 15:52:25 +02:00
Tejaswini Duggaraju
df84cdd091 Select polling based watcher for Windows log watcher
Signed-off-by: Tejaswini Duggaraju <naduggar@microsoft.com>
2018-07-10 10:20:10 -07:00
Sebastiaan van Stijn
f23c00d870
Various code-cleanup
remove unnescessary import aliases, brackets, and so on.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2018-05-23 17:50:54 +02:00
Brian Goff
e87e9e6ad6 Fix some issues in logfile reader and rotation
- Check errors.Cause(err) when comparing errors
- Fix bug where oldest log file is not actually removed. This in
particular causes issues when compression is enabled. On rotate it just
overwrites the data in the log file corrupting it.
- Use O_TRUNC to open new gzip files to ensure we don't corrupt log
files as happens without the above fix.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-05-14 15:52:18 -04:00