Moby works well when you have a good and stable internet connection. Operating in
areas where connectivity is likely to be lost at unpredictable intervals, such as
over a satellite link or 4G/LTE in rural areas, can become a problem when pulling
a new image. When the connection is lost while image layers are being pulled, Moby
tries to reconnect up to 5 times. If this fails, the incompletely downloaded layers
are lost and need to be downloaded again in full during the next pull request. This
means we use more data than we might have to.
Pulling a layer multiple times from the start can become costly over a satellite
or 4G/LTE connection. As these technologies (especially 4G) are quite common in IoT,
and Moby is used to run Azure IoT Edge devices, I would like to add a settable
maximum number of download attempts. The maximum is currently hardcoded to 5
(distribution/xfer/download.go). I would like to change this constant into a variable
that the user can set. The default stays 5, so nothing changes from the current
behaviour unless it is specified when starting the daemon, either with the added
flag or in the config file.
I added a default value of 5 for DefaultMaxDownloadAttempts and a settable
max-download-attempts option in the daemon config file. It is also added to the
dockerd config so it can be set with a flag when starting the daemon. The value is
stored in the daemon's imageService when it is initialized and is passed to
NewLayerDownloadManager as a parameter, which stores it in the LayerDownloadManager.
This lets us set the maximum number of retries in makeDownloadFunc equal to the
configured maximum download attempts.
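The wiring is conceptually simple. Below is a minimal sketch (not the actual Moby code; the signatures are simplified and illustrative) of how a configurable attempt limit can be carried from construction of the download manager into its retry loop:

```go
package xfer

import (
	"fmt"
	"time"
)

// LayerDownloadManager sketch: the attempt limit is stored at construction
// time instead of being a hardcoded package constant.
type LayerDownloadManager struct {
	maxDownloadAttempts int
}

// NewLayerDownloadManager receives the limit read from the daemon config
// (default 5) and keeps it for the download function to use later.
func NewLayerDownloadManager(maxDownloadAttempts int) *LayerDownloadManager {
	return &LayerDownloadManager{maxDownloadAttempts: maxDownloadAttempts}
}

// download retries a single layer download up to the configured limit,
// with a simple growing delay between attempts.
func (ldm *LayerDownloadManager) download(do func() error) error {
	var err error
	for attempt := 1; attempt <= ldm.maxDownloadAttempts; attempt++ {
		if err = do(); err == nil {
			return nil
		}
		time.Sleep(time.Duration(attempt) * time.Second)
	}
	return fmt.Errorf("download failed after %d attempts: %v", ldm.maxDownloadAttempts, err)
}
```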
I also added some tests, based on the existing maxConcurrentDownloads/maxConcurrentUploads tests.
You can pull this version and test it in a development container. Either create a config
file `/etc/docker/daemon.json` containing `{"max-download-attempts": 3}`, or start the
daemon with `dockerd --max-download-attempts=3 -D &`. Start pulling an image and
disconnect from the internet while it is downloading. The result is that it stops
pulling after three attempts.
Signed-off-by: Lukas Heeren <lukas-heeren@hotmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
staticcheck go linter warns:
> distribution/xfer/transfer_test.go:37:2: SA2002: the goroutine calls T.Fatalf, which must be called in the same goroutine as the test (staticcheck)
What it doesn't say is why. The reason is that t.Fatalf() calls t.FailNow(),
which is expected to stop test execution immediately. It does so by
calling runtime.Goexit(), which, unless called from the test's main goroutine,
does not stop test execution.
Anyway, long story short, if we don't care much about stopping the test
case immediately, we can just replace t.Fatalf() with t.Errorf(), which
still marks the test case as failed but won't stop it immediately.
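For illustration, here is a minimal, self-contained example of the pattern (the test name and values are made up): the check runs in a helper goroutine, so only t.Errorf may be used there, while t.Fatalf remains fine on the main test goroutine.

```go
package xfer

import (
	"sync"
	"testing"
)

func TestWorkerResult(t *testing.T) {
	results := make(chan int, 1)
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		v := 42 // pretend this came from the code under test
		if v != 42 {
			// t.Fatalf here would call runtime.Goexit, which stops only
			// this goroutine; the test keeps running and may deadlock
			// waiting on channels this goroutine was supposed to feed.
			t.Errorf("unexpected value: %d", v)
			return
		}
		results <- v
	}()
	wg.Wait()
	if got := <-results; got != 42 {
		t.Fatalf("got %d, want 42", got) // fine: this is the test's main goroutine
	}
}
```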
This patch was tested to check that the test fails if any of the
goroutines call t.Errorf():
1. Failure in DoFunc ("transfer function not started ...") was tested by
decreasing the NewTransferManager() argument:
- tm := NewTransferManager(5)
+ tm := NewTransferManager(2)
2. Failure "got unexpected progress value" was tested by injecting a random:
- if present && p.Current <= val {
+ if present && p.Current <= val || rand.Intn(100) > 80 {
3. Failure in DoFunc ("too many jobs running") was tested by increasing
the NewTransferManager() argument:
- tm := NewTransferManager(concurrencyLimit)
+ tm := NewTransferManager(concurrencyLimit + 1)
While at it:
* fix/amend some error messages
* use _ for unused arguments of DoFunc
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Format the source according to latest goimports.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Since Go 1.7, context is a standard package. Since Go 1.9, everything
that is provided by "x/net/context" is a couple of type aliases to
types in "context".
Many vendored packages still use x/net/context, so vendor entry remains
for now.
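For illustration, the typical change in each converted file is only the import path; code using context.Context compiles unchanged because "x/net/context" now merely aliases the standard library types:
- import "golang.org/x/net/context"
+ import "context"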
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: John Howard <jhoward@microsoft.com>
This re-coalesces the daemon stores which were split as part of the
original LCOW implementation.
This is part of the work discussed in https://github.com/moby/moby/issues/34617,
in particular see the document linked to in that issue.
Signed-off-by: John Howard <jhoward@microsoft.com>
This PR has the API changes described in https://github.com/moby/moby/issues/34617.
Specifically, it adds an HTTP header "X-Requested-Platform" which is a JSON-encoded
OCI Image-spec `Platform` structure.
In addition, it renames (almost all) uses of a string variable platform (and associated
methods/functions) to os. This makes it much easier to disambiguate from the swarm
"platform", which is really os/arch. This is a stepping stone to getting the daemon
fully multi-platform/arch-aware, and makes it clear when "operating system" is being
referred to rather than "platform", which is misleadingly used - sometimes in the swarm
meaning, but more often as just the operating system.
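As a rough sketch (not code from this PR) of what a client setting that header could look like, using the Platform type from the OCI image-spec Go package; the endpoint URL and helper name are purely illustrative:

```go
package main

import (
	"encoding/json"
	"net/http"

	ocispec "github.com/opencontainers/image-spec/specs-go/v1"
)

// addRequestedPlatform JSON-encodes an OCI Platform and attaches it as the
// X-Requested-Platform header described above.
func addRequestedPlatform(req *http.Request, p ocispec.Platform) error {
	b, err := json.Marshal(p)
	if err != nil {
		return err
	}
	req.Header.Set("X-Requested-Platform", string(b))
	return nil
}

func main() {
	// The endpoint here is only illustrative; the point is the header shape.
	req, _ := http.NewRequest(http.MethodPost, "http://localhost/images/create", nil)
	_ = addRequestedPlatform(req, ocispec.Platform{OS: "linux", Architecture: "amd64"})
}
```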
The `digest` data type, used throughout docker for image verification
and identity, has been broken out into `opencontainers/go-digest`. This
PR updates the dependencies and moves uses over to the new type.
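A minimal usage example of the new type (illustrative only; the content and digest string below are made up):

```go
package main

import (
	"fmt"

	"github.com/opencontainers/go-digest"
)

func main() {
	// Compute a digest of some content; the result is a typed value
	// ("sha256:...") rather than a bare string.
	dgst := digest.FromBytes([]byte("layer contents"))
	fmt.Println(dgst.Algorithm(), dgst.Encoded())

	// Parse and validate a digest string received from elsewhere; malformed
	// or truncated digests are rejected instead of silently propagating.
	if _, err := digest.Parse("sha256:deadbeef"); err != nil {
		fmt.Println("invalid digest:", err)
	}
}
```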
Signed-off-by: Stephen J Day <stephen.day@docker.com>
- Make it possible to define a shorter waiting time for httputils
- Add a small hack to reduce the waiting time in distribution/xfer
Signed-off-by: Vincent Demeester <vincent@sbr.pm>
Signed-off-by: liwenqi <vikilwq@zju.edu.cn>
Update some files in the distribution/xfer folder
Signed-off-by: liwenqi <vikilwq@zju.edu.cn>
correct again
Signed-off-by: liwenqi <vikilwq@zju.edu.cn>
Move some of the optional parameters of CreateRWLayer() into a struct
called CreateRWLayerOpts. This makes it easy to add more option
arguments without having to change the signature of CreateRWLayer().
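A sketch of the pattern (placeholder types stand in for the real layer-store types, and the option fields shown are illustrative):

```go
package layer

// Placeholder types so the sketch is self-contained; in Moby these are the
// existing layer-store types.
type (
	ChainID   string
	MountInit func(root string) error
	RWLayer   interface{}
)

// CreateRWLayerOpts groups the optional arguments of CreateRWLayer so that
// new options can be added later without touching the method signature.
type CreateRWLayerOpts struct {
	MountLabel string
	InitFunc   MountInit
	StorageOpt map[string]string
}

// Store shows the resulting shape: required arguments stay positional and
// everything optional moves into opts (nil meaning "no options").
type Store interface {
	CreateRWLayer(id string, parent ChainID, opts *CreateRWLayerOpts) (RWLayer, error)
}
```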
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Allow built images to be squashed to scratch.
Squashing does not destroy any images or layers, and preserves the
build cache.
Introduce a new CLI argument --squash to docker build
Introduce a new parameter `squash` on the build API endpoint.
Once the build is complete, docker creates a new image, loading the diffs
from each layer into a single new layer and referencing all the parent's
layers.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This fix tries to address the issues raised in #20936 and #22443,
where `docker pull` or `docker push` fails because of concurrent
connection failures.
Currently, the maximum number of concurrent connections is
controlled by `maxDownloadConcurrency` and `maxUploadConcurrency`,
which are hardcoded to 3 and 5 respectively. Therefore, in
situations where network connections don't support multiple
downloads/uploads, `docker push` or `docker pull` may fail.
This fix changes `maxDownloadConcurrency` and
`maxUploadConcurrency` to be adjustable by passing
`--max-concurrent-uploads` and `--max-concurrent-downloads` to
the `docker daemon` command.
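Conceptually, each limit acts like a counting semaphore around the transfers. A minimal sketch of that idea (not the actual transfer-manager code), with the limit supplied by the new flags:

```go
package main

import (
	"fmt"
	"sync"
)

// limiter caps how many transfers run at once, the way the new
// --max-concurrent-downloads / --max-concurrent-uploads flags cap
// concurrent layer downloads and uploads.
type limiter struct {
	slots chan struct{}
}

func newLimiter(max int) *limiter {
	return &limiter{slots: make(chan struct{}, max)}
}

// do blocks until a slot is free, runs fn, then releases the slot.
func (l *limiter) do(fn func()) {
	l.slots <- struct{}{}
	defer func() { <-l.slots }()
	fn()
}

func main() {
	dl := newLimiter(3) // e.g. --max-concurrent-downloads=3
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			dl.do(func() { fmt.Println("downloading layer", n) })
		}(i)
	}
	wg.Wait()
}
```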
The documentation related to docker daemon has been updated.
Additional test cases have been added to cover the changes in this fix.
This fix fixes #20936. This fix fixes #22443.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
Fix unmount issues in the daemon crash and restart lifecycle, w.r.t
graph drivers. This change sets a live container RWLayer's activity
count to 1, so that the RWLayer is aware of the mount. Note that
containerd has experimental support for restoring live containers.
Added/updated corresponding tests.
Signed-off-by: Anusha Ragunathan <anusha@docker.com>
This test was checking that it received every progress update that was
produced. But delivery of these intermediate progress updates is not
guaranteed. A new update can overwrite the previous one if the previous
one hasn't been sent to the channel yet.
The call to t.Fatalf exited the current goroutine which was consuming
the channel, which caused a deadlock and eventual test timeout rather
than a proper failure message.
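The "overwrite" behaviour comes from keeping only the latest value per watcher when the receiver is slow. A minimal sketch of that pattern (not the exact xfer code):

```go
package main

import "fmt"

// sendLatest forwards a progress update to a watcher channel of capacity 1.
// If the watcher has not consumed the previous update yet, that update is
// dropped and replaced, so intermediate values are never guaranteed to arrive.
func sendLatest(ch chan int, p int) {
	for {
		select {
		case ch <- p:
			return
		default:
			// Channel full: discard the stale value and retry with the new one.
			select {
			case <-ch:
			default:
			}
		}
	}
}

func main() {
	watcher := make(chan int, 1)
	for p := 1; p <= 10; p++ {
		sendLatest(watcher, p) // a slow reader may only ever observe 10
	}
	fmt.Println("last value seen:", <-watcher)
}
```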
Failure seen here:
https://jenkins.dockerproject.org/job/Docker-PRs-experimental/16400/console
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
The download manager assumed there was at least one layer involved in
all images. This can be false if the image is essentially a copy of
`scratch`.
Fix a nil pointer dereference that happened in this case. Add
integration tests that involve schema1 and schema2 manifests.
Fixes #21213
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
Concurrent uploads which share layers worked correctly as of #18353,
but unfortunately #18785 caused a regression. That PR removed the logic
that shares digests between different push sessions. This overlooked the
case where one session was waiting for another session to upload a
layer.
This commit adds back the ability to propagate this digest information,
using the distribution.Descriptor type because this is what is received
from stats and uploads, and also what is ultimately needed for building
the manifest.
Surprisingly, there was no test covering this case. This commit adds
one. It fails without the fix.
See recent comments on #9132.
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
A watcher would output the current progress item when it was detached,
in case it missed that item earlier, which would leave the user seeing
some intermediate step of the operation. This commit changes it to only
output it on detach if it didn't already output the same item.
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
Things could go wrong if Watch was called after the last existing
watcher was released. The call to Watch would succeed even though it was
not really adding a watcher, and the corresponding call to Release would
close hasWatchers a second time.
The fix for this is twofold:
1. We allow transfers to gain new watchers after the watcher count has
touched zero. This means that the channel returned by Released should
not be closed until all watchers have been released AND the transfer is
no longer tracked by the transfer manager, meaning it won't be possible
for additional calls to Watch to race with closing the channel returned
by Released.
The Transfer interface has a new method called Close so the transfer can
know when the transfer manager no longer references it.
Remove the Cancel method. It's not used and should not be exported.
2. Even though (1) makes it possible to add watchers after all the
previous watchers have been released, we want to avoid doing this in
practice. A transfer that has had all its watchers released is in the
process of being cancelled, and attaching to one of these will never be
the correct behavior. Add a check if a watcher is attaching to a
cancelled transfer. In this case, wait for the transfer to be removed
from the map and try again. This will ensure correct behavior when a
watcher tries to attach during the race window.
Either (1) or (2) should be sufficient to fix the race involved here,
but the combination is the most correct approach. (1) fixes the
low-level plumbing to be resilient to the race condition, and (2) avoids
using it in a racy way.
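A minimal sketch of the life-cycle described in (1): the released channel is closed only once the watcher count is zero and the transfer manager has called Close. The names mirror the description above, but the code is illustrative rather than the real implementation.

```go
package xfer

import "sync"

// transfer sketches the bookkeeping: Released fires only when the last
// watcher is gone AND the transfer manager no longer references the
// transfer, so a late Watch can never race with the close.
type transfer struct {
	mu       sync.Mutex
	watchers int
	closed   bool // set when the transfer manager calls Close
	released chan struct{}
}

func newTransfer() *transfer {
	return &transfer{released: make(chan struct{})}
}

func (t *transfer) Watch() {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.watchers++
}

func (t *transfer) Release() {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.watchers--
	t.maybeRelease()
}

// Close is called by the transfer manager when it drops the transfer from
// its map; it is the second condition for closing the released channel.
func (t *transfer) Close() {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.closed = true
	t.maybeRelease()
}

// Released is closed only when both conditions above hold.
func (t *transfer) Released() <-chan struct{} {
	return t.released
}

func (t *transfer) maybeRelease() {
	if t.watchers == 0 && t.closed {
		select {
		case <-t.released: // already closed
		default:
			close(t.released)
		}
	}
}
```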
Fixes #19606
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
One of the things this test checks is that the progress indicator
completes for each download. Some progress messages may be lost because
only one message is buffered for each download, but the last progress
message is guaranteed not to be lost. The test therefore checks for a
10/10 progress indication.
However, the assumption that this is the last progress message to be
sent is incorrect. The last message is actually "Pull complete". So
check for this instead.
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
Currently this always uses the schema1 manifest builder. Later, it will
be changed to attempt schema2 first, and fall back when necessary.
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
Support restoreCustomImage for windows with a new interface to extract
the graph driver from the LayerStore.
Signed-off-by: Daniel Nephin <dnephin@docker.com>