Use strongly typed errors to set HTTP status codes.
Error interfaces are defined in the api/errors package and errors
returned from controllers are checked against these interfaces.
Errors can be wraeped in a pkg/errors.Causer, as long as somewhere in the
line of causes one of the interfaces is implemented. The special error
interfaces take precedence over Causer, meaning if both Causer and one
of the new error interfaces are implemented, the Causer is not
traversed.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Since the commit d88fe447df ("Add support for sharing /dev/shm/ and
/dev/mqueue between containers") container's /dev/shm is mounted on the
host first, then bind-mounted inside the container. This is done that
way in order to be able to share this container's IPC namespace
(and the /dev/shm mount point) with another container.
Unfortunately, this functionality breaks container checkpoint/restore
(even if IPC is not shared). Since /dev/shm is an external mount, its
contents is not saved by `criu checkpoint`, and so upon restore any
application that tries to access data under /dev/shm is severily
disappointed (which usually results in a fatal crash).
This commit solves the issue by introducing new IPC modes for containers
(in addition to 'host' and 'container:ID'). The new modes are:
- 'shareable': enables sharing this container's IPC with others
(this used to be the implicit default);
- 'private': disables sharing this container's IPC.
In 'private' mode, container's /dev/shm is truly mounted inside the
container, without any bind-mounting from the host, which solves the
issue.
While at it, let's also implement 'none' mode. The motivation, as
eloquently put by Justin Cormack, is:
> I wondered a while back about having a none shm mode, as currently it is
> not possible to have a totally unwriteable container as there is always
> a /dev/shm writeable mount. It is a bit of a niche case (and clearly
> should never be allowed to be daemon default) but it would be trivial to
> add now so maybe we should...
...so here's yet yet another mode:
- 'none': no /dev/shm mount inside the container (though it still
has its own private IPC namespace).
Now, to ultimately solve the abovementioned checkpoint/restore issue, we'd
need to make 'private' the default mode, but unfortunately it breaks the
backward compatibility. So, let's make the default container IPC mode
per-daemon configurable (with the built-in default set to 'shareable'
for now). The default can be changed either via a daemon CLI option
(--default-shm-mode) or a daemon.json configuration file parameter
of the same name.
Note one can only set either 'shareable' or 'private' IPC modes as a
daemon default (i.e. in this context 'host', 'container', or 'none'
do not make much sense).
Some other changes this patch introduces are:
1. A mount for /dev/shm is added to default OCI Linux spec.
2. IpcMode.Valid() is simplified to remove duplicated code that parsed
'container:ID' form. Note the old version used to check that ID does
not contain a semicolon -- this is no longer the case (tests are
modified accordingly). The motivation is we should either do a
proper check for container ID validity, or don't check it at all
(since it is checked in other places anyway). I chose the latter.
3. IpcMode.Container() is modified to not return container ID if the
mode value does not start with "container:", unifying the check to
be the same as in IpcMode.IsContainer().
3. IPC mode unit tests (runconfig/hostconfig_test.go) are modified
to add checks for newly added values.
[v2: addressed review at https://github.com/moby/moby/pull/34087#pullrequestreview-51345997]
[v3: addressed review at https://github.com/moby/moby/pull/34087#pullrequestreview-53902833]
[v4: addressed the case of upgrading from older daemon, in this case
container.HostConfig.IpcMode is unset and this is valid]
[v5: document old and new IpcMode values in api/swagger.yaml]
[v6: add the 'none' mode, changelog entry to docs/api/version-history.md]
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Delete needs to release names related to a container even if that
container isn't present in the db. However, slightly overzealous error
checking causes the transaction to get rolled back. Ignore the error
from Delete on the container itself, since it may not be present.
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
Currently, names are maintained by a separate system called "registrar".
This means there is no way to atomically snapshot the state of
containers and the names associated with them.
We can add this atomicity and simplify the code by storing name
associations in the memdb. This removes the need for pkg/registrar, and
makes snapshots a lot less expensive because they no longer need to copy
all the names. This change also avoids some problematic behavior from
pkg/registrar where it returns slices which may be modified later on.
Note that while this change makes the *snapshotting* atomic, it doesn't
yet do anything to make sure containers are named at the same time that
they are added to the database. We can do that by adding a transactional
interface, either as a followup, or as part of this PR.
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
Do not change pause state when restoring container's
status, or status in docker will be different with
status in runc.
Signed-off-by: Fengtu Wang <wangfengtu@huawei.com>
Description:
1. start a container with restart=always.
`docker run -d --restart=always ubuntu sleep 3`
2. container init process exits.
3. use `docker pause <id>` to pause this container.
if the pause action is before cgroup data is removed and after the init process died.
`Pause` operation will success to write cgroup data, but actually do not freeze any process.
And then docker received pause event and stateExit event from
containerd, the docker state will be Running(paused), but the container
is free running.
Then we can not remove it, stop it , pause it and unpause it.
Signed-off-by: Wentao Zhang <zhangwentao234@huawei.com>
If a container doesn't exist in the memdb, First will return nil, not an
error. This should be checked for before using the result.
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
Reuse existing structures and rely on json serialization to deep copy
Container objects.
Also consolidate all "save" operations on container.CheckpointTo, which
now both saves a serialized json to disk, and replicates state to the
ACID in-memory store.
Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
Replicate relevant mutations to the in-memory ACID store. Readers will
then be able to query container state without locking.
Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
The Solaris version (previously daemon/inspect_solaris.go) was
apparently missing some fields that should be available on that
platform.
Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
Doing a chown/chmod automatically can cause `EPERM` in some cases (e.g.
with an NFS mount). Currently Docker will always call chown+chmod on a
volume path unless `:nocopy` is passed in, but we don't need to make
these calls if the perms and ownership already match and potentially
avoid an uneccessary `EPERM`.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
The use of pools.Copy avoids io.Copy's internal buffer allocation.
This commit replaces io.Copy with pools.Copy to avoid the allocation of
buffers in io.Copy.
Signed-off-by: Cristian Staretu <cristian.staretu@gmail.com>
Commit abd72d4008 added
a "FIXME" comment to the container "State", mentioning
that a container cannot be both "Running" and "Paused".
This comment was incorrect, because containers on
Linux actually _must_ be running in order to be
paused.
This patch adds additional information both in a
comment, and in the API documentation to clarify
that these booleans are not mutually exclusive.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The commit adds capability to accept csv parameters
for network option in service create/update commands.The change
includes name,alias driver options specific to the network.
With this the following will be supported
docker service create --name web --network name=docknet,alias=web1,driver-opt=field1=value1 nginx
docker service create --name web --network docknet nginx
docker service update web --network-add name=docknet,alias=web1,driver-opt=field1=value1
docker service update web --network-rm docknet
Signed-off-by: Abhinandan Prativadi <abhi@docker.com>
This patch adds the untilRemoved option to the ContainerWait API which
allows the client to wait until the container is not only exited but
also removed.
This patch also adds some more CLI integration tests for waiting for a
created container and waiting with the new --until-removed flag.
Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)
Handle detach sequence in CLI
Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)
Update Container Wait Conditions
Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)
Apply container wait changes to API 1.30
The set of changes to the containerWait API missed the cut for the
Docker 17.05 release (API version 1.29). This patch bumps the version
checks to use 1.30 instead.
This patch also makes a minor update to a testfile which was added to
the builder/dockerfile package.
Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)
Remove wait changes from CLI
Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)
Address minor nits on wait changes
- Changed the name of the tty Proxy wrapper to `escapeProxy`
- Removed the unnecessary Error() method on container.State
- Fixes a typo in comment (repeated word)
Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)
Use router.WithCancel in the containerWait handler
This handler previously added this functionality manually but now uses
the existing wrapper which does it for us.
Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)
Add WaitCondition constants to api/types/container
Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)
Address more ContainerWait review comments
- Update ContainerWait backend interface to not return pointer values
for container.StateStatus type.
- Updated container state's Wait() method comments to clarify that a
context MUST be used for cancelling the request, setting timeouts,
and to avoid goroutine leaks.
- Removed unnecessary buffering when making channels in the client's
ContainerWait methods.
- Renamed result and error channels in client's ContainerWait methods
to clarify that only a single result or error value would be sent
on the channel.
Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)
Move container.WaitCondition type to separate file
... to avoid conflict with swagger-generated code for API response
Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)
Address more ContainerWait review comments
Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)
This patch consolidates the two WaitStop and WaitWithContext methods
on the container.State type. Now there is a single method, Wait, which
takes a context and a bool specifying whether to wait for not just a
container exit but also removal.
The behavior has been changed slightly so that a wait call during a
Created state will not return immediately but instead wait for the
container to be started and then exited.
The interface has been changed to no longer block, but instead returns
a channel on which the caller can receive a *StateStatus value which
indicates the ExitCode or an error if there was one (like a context
timeout or state transition error).
These changes have been propagated through the rest of the deamon to
preserve all other existing behavior.
Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)