The existing runtimes reload logic went to great lengths to replace the
directory containing runtime wrapper scripts as atomically as possible
within the limitations of the Linux filesystem ABI. Trouble is,
atomically swapping the wrapper scripts directory solves the wrong
problem! The runtime configuration is "locked in" when a container is
started, including the path to the runC binary. If a container is
started with a runtime which requires a daemon-managed wrapper script
and then the daemon is reloaded with a config which no longer requires
the wrapper script (i.e. some args -> no args, or the runtime is dropped
from the config), that container would become unmanageable. Any attempts
to stop, exec or otherwise perform lifecycle management operations on
the container are likely to fail due to the wrapper script no longer
existing at its original path.
Atomically swapping the wrapper scripts is also incompatible with the
read-copy-update paradigm for reloading configuration. A handler in the
daemon could retain a reference to the pre-reload configuration for an
indeterminate amount of time after the daemon configuration has been
reloaded and updated. It is possible for the daemon to attempt to start
a container using a deleted wrapper script if a request to run a
container races a reload.
Solve the problem of deleting referenced wrapper scripts by ensuring
that all wrapper scripts are *immutable* for the lifetime of the daemon
process. Any given runtime wrapper script must always exist with the
same contents, no matter how many times the daemon config is reloaded,
or what changes are made to the config. This is accomplished by using
everyone's favourite design pattern: content-addressable storage. Each
wrapper script file name is suffixed with the SHA-256 digest of its
contents to (probabilistically) guarantee immutability without needing
any concurrency control. Stale runtime wrapper scripts are only cleaned
up on the next daemon restart.
Split the derived runtimes configuration from the user-supplied
configuration to have a place to store derived state without mutating
the user-supplied configuration or exposing daemon internals in API
struct types. Hold the derived state and the user-supplied configuration
in a single struct value so that they can be updated as an atomic unit.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Ensure data-race-free access to the daemon configuration without
locking by mutating a deep copy of the config and atomically storing
a pointer to the copy into the daemon-wide configStore value. Any
operations which need to read from the daemon config must capture the
configStore value only once and pass it around to guarantee a consistent
view of the config.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Config reloading has interleaved validations and other fallible
operations with mutating the live daemon configuration. The daemon
configuration could be left in a partially-reloaded state if any of the
operations returns an error. Mutating a copy of the configuration and
atomically swapping the config struct on success is not currently an
option as config values are not copyable due to the presence of
sync.Mutex fields. Introduce a two-phase commit protocol to defer any
mutations of the daemon state until after all fallible operations have
succeeded.
Reload transactions are not yet entirely hermetic. The platform
reloading logic for custom runtimes on *nix could still leave the
directory of generated runtime wrapper scripts in an indeterminate state
if an error is encountered.
Signed-off-by: Cory Snider <csnider@mirantis.com>
This function was calling SystemInfo() only to get the daemon's name
to add to the event that's generated.
SystemInfo() is quite heavy, and no info other than the Name was used.
The name returned is just looking up the hostname, so instead, call
`hostName()` directly.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Use the default (0) value to indicate "not set", which simplifies
working with these configuration options, preventing the need to
use intermediate variables etc.
While changing this code, also making some small cleanups, such
as replacing "fmt.Sprintf()" for "strconv" variants.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
reloadMaxDownloadAttempts() is used to reload the configuration,
but validation happened before merging the config with the defaults.
This removes the validation from this function, instead centralizing
validation in config.Validate().
NOTE:
Currently this validation is "ok", as it checks for "nil" values;
I am working on changes to reduce the use of pointers in the config,
and instead provide a mechanism to fill in defaults. This change is
in preparation of that.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This is a follow-up to 427c7cc5f8, which added
proxy-configuration options ("http-proxy", "https-proxy", "no-proxy") to the
dockerd cli and in `daemon.json`.
While working on documentation changes for this feature, I realised that those
options won't be "next" to each-other when formatting the daemon.json JSON, for
example using `jq` (which sorts the fields alphabetically). As it's possible that
additional proxy configuration options are added in future, I considered that
grouping these options in a struct within the JSON may help setting these options,
as well as discovering related options.
This patch introduces a "proxies" field in the JSON, which includes the
"http-proxy", "https-proxy", "no-proxy" options.
Conflict detection continues to work as before; with this patch applied:
mkdir -p /etc/docker/
echo '{"proxies":{"http-proxy":"http-config", "https-proxy":"https-config", "no-proxy": "no-proxy-config"}}' > /etc/docker/daemon.json
dockerd --http-proxy=http-flag --https-proxy=https-flag --no-proxy=no-proxy-flag --validate
unable to configure the Docker daemon with file /etc/docker/daemon.json:
the following directives are specified both as a flag and in the configuration file:
http-proxy: (from flag: http-flag, from file: http-config),
https-proxy: (from flag: https-flag, from file: https-config),
no-proxy: (from flag: no-proxy-flag, from file: no-proxy-config)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
These are used internally only, and set by daemon.NewDaemon(). If they're
used externally, we should add an accessor added (which may be something
we want to do for daemon.registryService (which should be its own backend)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The daemon can print the proxy configuration as part of error-messages,
and when reloading the daemon configuration (SIGHUP). Make sure that
the configuration is sanitized before printing.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Moby works perfectly when you are in a situation when one has a good and stable
internet connection. Operating in area's where internet connectivity is likely
to be lost in undetermined intervals, like a satellite connection or 4G/LTE in
rural area's, can become a problem when pulling a new image. When connection is
lost while image layers are being pulled, Moby will try to reconnect up to 5 times.
If this fails, the incompletely downloaded layers are lost will need to be completely
downloaded again during the next pull request. This means that we are using more
data than we might have to.
Pulling a layer multiple times from the start can become costly over a satellite
or 4G/LTE connection. As these techniques (especially 4G) quite common in IoT and
Moby is used to run Azure IoT Edge devices, I would like to add a settable maximum
download attempts. The maximum download attempts is currently set at 5
(distribution/xfer/download.go). I would like to change this constant to a variable
that the user can set. The default will still be 5, so nothing will change from
the current version unless specified when starting the daemon with the added flag
or in the config file.
I added a default value of 5 for DefaultMaxDownloadAttempts and a settable
max-download-attempts in the daemon config file. It is also added to the config
of dockerd so it can be set with a flag when starting the daemon. This value gets
stored in the imageService of the daemon when it is initiated and can be passed
to the NewLayerDownloadManager as a parameter. It will be stored in the
LayerDownloadManager when initiated. This enables us to set the max amount of
retries in makeDownoadFunc equal to the max download attempts.
I also added some tests that are based on maxConcurrentDownloads/maxConcurrentUploads.
You can pull this version and test in a development container. Either create a config
`file /etc/docker/daemon.json` with `{"max-download-attempts"=3}``, or use
`dockerd --max-download-attempts=3 -D &` to start up the dockerd. Start downloading
a container and disconnect from the internet whilst downloading. The result would
be that it stops pulling after three attempts.
Signed-off-by: Lukas Heeren <lukas-heeren@hotmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This commit fixes two possible crashes in the `*Daemon` bound method
`reloadMaxConcurrentDownloadsAndUploads()`.
The first fixed issue is when `daemon.imageService` is `nil`. The second
panic can occur if the provided `*config.Config` is incomplete and the
fields `conf.MaxConcurrentDownloads` or `conf.MaxConcurrentUploads` are
`nil`.
Signed-off-by: Sascha Grunert <sgrunert@suse.com>
In particular, these two:
> daemon/daemon_unix.go:1129: Wrapf format %v reads arg #1, but call has 0 args
> daemon/kill.go:111: Warn call has possible formatting directive %s
and a few more.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
When succesfully reloading the daemon configuration, print a message
in the logs with the active configuration:
INFO[2018-01-15T15:36:20.901688317Z] Got signal to reload configuration, reloading from: /etc/docker/daemon.json
INFO[2018-01-14T02:23:48.782769942Z] Reloaded configuration: {"mtu":1500,"pidfile":"/var/run/docker.pid","data-root":"/var/lib/docker","exec-root":"/var/run/docker","group":"docker","deprecated-key-path":"/etc/docker/key.json","max-concurrent-downloads":3,"max-concurrent-uploads":5,"shutdown-timeout":15,"debug":true,"hosts":["unix:///var/run/docker.sock"],"log-level":"info","swarm-default-advertise-addr":"","metrics-addr":"","log-driver":"json-file","ip":"0.0.0.0","icc":true,"iptables":true,"ip-forward":true,"ip-masq":true,"userland-proxy":true,"disable-legacy-registry":true,"experimental":false,"network-control-plane-mtu":1500,"runtimes":{"runc":{"path":"docker-runc"}},"default-runtime":"runc","oom-score-adjust":-500,"default-shm-size":67108864,"default-ipc-mode":"shareable"}
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
PR #36011 fixed almost all of the golint issues though
there is still one golint error:
https://goreportcard.com/report/github.com/docker/docker#golint
```
Golint is a linter for Go source code.
docker/daemon/reload.go
Line 64: warning: redundant if ...; err != nil check, just return error instead. (golint)
```
This fix fixes the last one.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
Add a new configuration option to allow the enabling
of the networkDB debug. The option is only parsed using the
reload event. This will protect the daemon on start or restart
if the option is left behind in the config file
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Since the commit d88fe447df ("Add support for sharing /dev/shm/ and
/dev/mqueue between containers") container's /dev/shm is mounted on the
host first, then bind-mounted inside the container. This is done that
way in order to be able to share this container's IPC namespace
(and the /dev/shm mount point) with another container.
Unfortunately, this functionality breaks container checkpoint/restore
(even if IPC is not shared). Since /dev/shm is an external mount, its
contents is not saved by `criu checkpoint`, and so upon restore any
application that tries to access data under /dev/shm is severily
disappointed (which usually results in a fatal crash).
This commit solves the issue by introducing new IPC modes for containers
(in addition to 'host' and 'container:ID'). The new modes are:
- 'shareable': enables sharing this container's IPC with others
(this used to be the implicit default);
- 'private': disables sharing this container's IPC.
In 'private' mode, container's /dev/shm is truly mounted inside the
container, without any bind-mounting from the host, which solves the
issue.
While at it, let's also implement 'none' mode. The motivation, as
eloquently put by Justin Cormack, is:
> I wondered a while back about having a none shm mode, as currently it is
> not possible to have a totally unwriteable container as there is always
> a /dev/shm writeable mount. It is a bit of a niche case (and clearly
> should never be allowed to be daemon default) but it would be trivial to
> add now so maybe we should...
...so here's yet yet another mode:
- 'none': no /dev/shm mount inside the container (though it still
has its own private IPC namespace).
Now, to ultimately solve the abovementioned checkpoint/restore issue, we'd
need to make 'private' the default mode, but unfortunately it breaks the
backward compatibility. So, let's make the default container IPC mode
per-daemon configurable (with the built-in default set to 'shareable'
for now). The default can be changed either via a daemon CLI option
(--default-shm-mode) or a daemon.json configuration file parameter
of the same name.
Note one can only set either 'shareable' or 'private' IPC modes as a
daemon default (i.e. in this context 'host', 'container', or 'none'
do not make much sense).
Some other changes this patch introduces are:
1. A mount for /dev/shm is added to default OCI Linux spec.
2. IpcMode.Valid() is simplified to remove duplicated code that parsed
'container:ID' form. Note the old version used to check that ID does
not contain a semicolon -- this is no longer the case (tests are
modified accordingly). The motivation is we should either do a
proper check for container ID validity, or don't check it at all
(since it is checked in other places anyway). I chose the latter.
3. IpcMode.Container() is modified to not return container ID if the
mode value does not start with "container:", unifying the check to
be the same as in IpcMode.IsContainer().
3. IPC mode unit tests (runconfig/hostconfig_test.go) are modified
to add checks for newly added values.
[v2: addressed review at https://github.com/moby/moby/pull/34087#pullrequestreview-51345997]
[v3: addressed review at https://github.com/moby/moby/pull/34087#pullrequestreview-53902833]
[v4: addressed the case of upgrading from older daemon, in this case
container.HostConfig.IpcMode is unset and this is valid]
[v5: document old and new IpcMode values in api/swagger.yaml]
[v6: add the 'none' mode, changelog entry to docs/api/version-history.md]
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The --allow-nondistributable-artifacts daemon option specifies
registries to which foreign layers should be pushed. (By default,
foreign layers are not pushed to registries.)
Additionally, to make this option effective, foreign layers are now
pulled from the registry if possible, falling back to the URLs in the
image manifest otherwise.
This option is useful when pushing images containing foreign layers to a
registry on an air-gapped network so hosts on that network can pull the
images without connecting to another server.
Signed-off-by: Noah Treuhaft <noah.treuhaft@docker.com>