Commit graph

20 commits

Author SHA1 Message Date
Brian Goff
0901d4ab31 Use condition variable to wake stats collector.
Before the collection goroutine wakes up every 1 second (as configured).
This sleep interval is in case there are no stats to collect we don't
end up in a tight loop.

Instead use a condition variable to signal that a collection is needed.
This prevents us from waking the goroutine needlessly when there is no
one looking for stats.

For now I've kept the sleep just moved it to the end of the loop, which
gives some space between collections.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit e75e6b0e31)
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-02-20 11:38:16 -08:00
Tonis Tiigi
80e2871d21 stats: avoid cgo in collector
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
(cherry picked from commit cf104d85c3)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2019-08-01 14:38:02 -07:00
Tibor Vass
119f892016
Merge pull request #38510 from ZYecho/tune-code
fix: simplify code logic
2019-03-21 13:56:02 -07:00
John Howard
0a30ef4c59 Publish empty stats on error
Signed-off-by: John Howard <jhoward@microsoft.com>
2019-03-12 18:41:55 -07:00
zhangyue
c6894aa492 fix: simplify code logic
Signed-off-by: zhangyue <zy675793960@yeah.net>
2019-01-08 11:00:34 +08:00
Sebastiaan van Stijn
481b8e54b4
Fix stats collector spinning CPU if no stats are collected
Commit fd0e24b718 changed
the stats collection loop to use a `sleep()` instead
of `time.Tick()` in the for-loop.

This change caused a regression in situations where
no stats are being collected, or an error is hit
in the loop (in which case the loop would `continue`,
and the `sleep()` is not hit).

This patch puts the sleep at the start of the loop
to guarantee it's always hit.

This will delay the sampling, which is similar to the
behavior before fd0e24b718.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2018-03-15 17:56:15 +01:00
Yong Tang
623b1a5c3c
Merge pull request #36519 from stevvooe/resilient-cpu-sampling
daemon/stats: more resilient cpu sampling
2018-03-09 14:34:45 -08:00
Stephen J Day
fd0e24b718
daemon/stats: more resilient cpu sampling
To avoid noise in sampling CPU usage metrics, we now sample the system
usage closer to the actual response from the underlying runtime. Because
the response from the runtime may be delayed, this makes the sampling
more resilient in loaded conditions. In addition to this, we also
replace the tick with a sleep to avoid situations where ticks can backup
under loaded conditions.

The trade off here is slightly more load reading the system CPU usage
for each container. There may be an optimization required for large
amounts of containers but the cost is on the order of 15 ms per 1000
containers. If this becomes a problem, we can time slot the sampling,
but the complexity may not be worth it unless we can test further.

Unfortunately, there aren't really any good tests for this condition.
Triggering this behavior is highly system dependent. As a matter of
course, we should qualify the fix with the users that are affected.

Signed-off-by: Stephen J Day <stephen.day@docker.com>
2018-03-07 13:20:21 -08:00
Stephen J Day
244e59e94f
daemon/stats: remove obnoxious types file
While a `types.go` file is handly when there are a lot of record types,
it is completely obnoxious when used for concrete, utility types with a
struct, new function and method set in the same file. This change
removes the `types.go` file in favor of the simpler approach.

Signed-off-by: Stephen J Day <stephen.day@docker.com>
2018-03-05 15:59:04 -08:00
Daniel Nephin
4f0d95fa6e Add canonical import comment
Signed-off-by: Daniel Nephin <dnephin@docker.com>
2018-02-05 16:51:57 -05:00
Sebastiaan van Stijn
6ed1163c98
Remove redundant build-tags
Files that are suffixed with `_linux.go` or `_windows.go` are
already only built on Linux / Windows, so these build-tags
were redundant.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2017-12-18 17:41:53 +01:00
Yong Tang
4785f1a7ab Remove solaris build tag and `contrib/mkimage/solaris
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
2017-11-02 00:01:46 +00:00
Michael Crosby
5a9b5f10cf Remove solaris files
For obvious reasons that it is not really supported now.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-10-24 15:39:34 -04:00
Brian Goff
ebcb7d6b40 Remove string checking in API error handling
Use strongly typed errors to set HTTP status codes.
Error interfaces are defined in the api/errors package and errors
returned from controllers are checked against these interfaces.

Errors can be wraeped in a pkg/errors.Causer, as long as somewhere in the
line of causes one of the interfaces is implemented. The special error
interfaces take precedence over Causer, meaning if both Causer and one
of the new error interfaces are implemented, the Causer is not
traversed.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2017-08-15 16:01:11 -04:00
Derek McGowan
1009e6a40b
Update logrus to v1.0.1
Fixes case sensitivity issue

Signed-off-by: Derek McGowan <derek@mcgstyle.net>
2017-07-31 13:16:46 -07:00
Yuanhong Peng
4a6cbf9bcb Return an empty stats if "container not found"
If we get "container not found" error from containerd, it's possibly
because that this container has already been stopped. It will be ok to
ignore this error and just return an empty stats.

Signed-off-by: Yuanhong Peng <pengyuanhong@huawei.com>
2017-07-10 16:30:48 +08:00
Pavel Tikhomirov
dec084962e Do not treat C.sysconf(C._SC_NPROCESSORS_ONLN) non-zero errno as error
Treat return code -1 as error instead.

People from glibc say that errno is undefined in case of successful
sysconf call according to POSIX standard:
Glibc bug: https://sourceware.org/bugzilla/show_bug.cgi?id=21536

More over in sysconf man it is wrongly said that "errno is not changed"
on success. So I've created a bug to man-pages:
https://bugzilla.kernel.org/show_bug.cgi?id=195955

Background: Glibc's sysconf(_SC_NPROCESSORS_ONLN) changes errno to
ENOENT, if there is no /sys/devices/system/cpu/online file, while
the call itself is successful. In Virtuozzo containers we prohibit
most of sysfs files for security reasons. So we have Run():daemon
/stats/collector.go infinitely loop never actualy collecting stats
from publisher pairs.

v2: add comment

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2017-06-01 18:23:49 +03:00
Ian Campbell
115f91d757 Correct CPU usage calculation in presence of offline CPUs and newer Linux
In https://github.com/torvalds/linux/commit/5ca3726 (released in v4.7-rc1) the
content of the `cpuacct.usage_percpu` file in sysfs was changed to include both
online and offline cpus. This broke the arithmetic in the stats helpers used by
`docker stats`, since it was using the length of the PerCPUUsage array as a
proxy for the number of online CPUs.

Add current number of online CPUs to types.StatsJSON and use it in the
calculation.

Keep a fallback to `len(v.CPUStats.CPUUsage.PercpuUsage)` so this code
continues to work when talking to an older daemon. An old client talking to a
new daemon will ignore the new field and behave as before.

Fixes #28941.

Signed-off-by: Ian Campbell <ian.campbell@docker.com>
2017-03-10 10:24:33 +00:00
Zhang Wei
eb3a7c2329 Send "Name" and "ID" when stating stopped containers
When `docker stats` stopped containers, client will get empty stats data,
this commit will gurantee client always get "Name" and "ID" field, so
that it can format with `ID` and `Name` fields successfully.

Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
2017-02-09 09:46:59 +08:00
Vincent Demeester
835971c6fd
Extract daemon statsCollector to its own package
Signed-off-by: Vincent Demeester <vincent@sbr.pm>
2017-01-04 18:18:30 +01:00