
Cluster Volumes
===============

Docker Cluster Volumes is a new feature which allows using CSI plugins to
create cluster-aware volumes.

The Container Storage Interface (CSI) is a platform-agnostic API that lets
storage providers write storage plugins compatible with many container
orchestrators. By leveraging the CSI, Docker Swarm can provide intelligent,
cluster-aware access to volumes across many supported storage providers.

## Installing a CSI plugin

Docker accesses CSI plugins through the Docker managed plugin system, using the
`docker plugin` command.

If a plugin is available for Docker, it can be installed through the `docker
plugin install` command. Plugins may require configuration specific to the
user's environment, but once enabled they are detected by Docker and work
automatically.

Currently, there is no way to automatically deploy a Docker Plugin across all
nodes in a cluster. Therefore, users must ensure the Docker Plugin is installed
on all nodes in the cluster on which it is desired.

The CSI plugin must be installed on all manager nodes. If a manager node does
not have the CSI plugin installed, a leadership change to that manager node
will make Swarm unable to use that driver.

Docker Swarm worker nodes report their active plugins to the Docker Swarm
managers, so it is not necessary to install a plugin on every worker node. The
plugin only needs to be installed on those nodes that need access to the
volumes provided by that plugin.
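
Because there is no built-in way to roll a plugin out across a cluster, one
option is to script the installation. The following is a minimal sketch, not a
supported tool. It assumes passwordless SSH access to each node, and the
function name, node names, and plugin reference are hypothetical.

```bash
# Install a plugin on each listed node over SSH.
# Usage: install_plugin_on_nodes PLUGIN NODE [NODE...]
# Assumes passwordless SSH to every node and that the remote user may run docker.
install_plugin_on_nodes() {
  local plugin="$1"; shift
  local node
  for node in "$@"; do
    echo "Installing ${plugin} on ${node}"
    # --grant-all-permissions skips the interactive permission prompt.
    # -n keeps ssh from consuming the calling script's stdin.
    ssh -n "$node" docker plugin install --grant-all-permissions "$plugin" \
      || echo "Install failed on ${node}" >&2
  done
}
```

For example, `install_plugin_on_nodes example/my-csi-plugin:latest manager1
manager2 worker3` would cover two managers and the one worker that needs the
volumes.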

### Multiple Instances of the Same Plugin

In some cases, it may be desirable to run multiple instances of the same
plugin. For example, there may be two different instances of some storage
provider which each need a differently configured plugin.

To run more than one instance of the same plugin, set the `--alias` option when
installing the plugin. This will cause the plugin to take a local name
different from its original name.

Ensure that when using plugin name aliases, the plugin name alias is the same
on every node.
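
As an illustration, the helper below wraps `docker plugin install` with the
`--alias` option. The function name, the aliases, and the plugin reference are
placeholders chosen for this sketch.

```bash
# Install one instance of a plugin under a local alias.
# Usage: install_plugin_instance ALIAS PLUGIN [KEY=VALUE...]
install_plugin_instance() {
  local alias_name="$1" plugin="$2"
  shift 2
  # Any remaining KEY=VALUE arguments set the plugin's settable options.
  docker plugin install --alias "$alias_name" --grant-all-permissions "$plugin" "$@"
}
```

Calling it twice with different aliases, for example `install_plugin_instance
storage-a example/my-csi-plugin:latest` and `install_plugin_instance storage-b
example/my-csi-plugin:latest`, would yield two separately configurable
instances. The same aliases should be used on every node.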

## Creating a Docker CSI Plugin

Most CSI plugins are shipped with configuration specific to Kubernetes. They
are often provided in the form of Helm charts, and installation information may
include Kubernetes-specific steps. Docker CSI Plugins use the same binaries as
those for Kubernetes, but in a different environment and sometimes with
different configuration.

Before following this section, readers should ensure they are acquainted with
the
[Docker Engine managed plugin system](https://docs.docker.com/engine/extend/).
Docker CSI plugins use this system to run.

Docker Plugins consist of a root filesystem and a `config.json`. The root
filesystem can generally be exported from whatever image is built for the
plugin. The `config.json` specifies how the plugin is used. Several
CSI-specific concerns, as well as some general but poorly-documented features,
are outlined here.
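
The general packaging flow can be sketched as follows, assuming a CSI driver
image already exists and a `config.json` has been written by hand. The
function name and its arguments are hypothetical. The `docker create`,
`docker export`, and `docker plugin create` commands are the standard ones
used to build managed plugins.

```bash
# Build a managed plugin from an existing CSI driver image.
# Usage: build_csi_plugin IMAGE PLUGIN_NAME BUILD_DIR
# BUILD_DIR must already contain the plugin's config.json.
build_csi_plugin() {
  local image="$1" name="$2" dir="$3"
  local id

  if [ ! -f "$dir/config.json" ]; then
    echo "missing $dir/config.json" >&2
    return 1
  fi

  # Create (but do not start) a container so its filesystem can be exported.
  id=$(docker create "$image" true) || return 1
  mkdir -p "$dir/rootfs"
  docker export "$id" | tar -x -f - -C "$dir/rootfs"
  docker rm -vf "$id" >/dev/null

  # Package rootfs/ and config.json into a local plugin.
  docker plugin create "$name" "$dir"
}
```

The resulting plugin exists only on the local node. It can then be enabled
with `docker plugin enable`, or pushed to a registry with `docker plugin push`
so that other nodes can install it.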

### Basic Requirements

Docker CSI plugins are identified with a special interface type. There are two
related interfaces that CSI plugins can expose.

* `docker.csicontroller/1.0` is used for CSI Controller plugins.
* `docker.csinode/1.0` is used for CSI Node plugins.
* Combined plugins should include both interfaces.

Additionally, the interface field of the config.json includes a `socket` field.
This can be set to any value, but the CSI plugin should have its `CSI_ENDPOINT`
environment variable set appropriately.

In the `config.json`, this should be set as such:

```json
  "interface": {
    "types": ["docker.csicontroller/1.0","docker.csinode/1.0"],
    "socket": "my-csi-plugin.sock"
  },
  "env": [
    {
      "name": "CSI_ENDPOINT",
      "value": "/run/docker/plugins/my-csi-plugin.sock"
    }
  ]
```

The CSI specification states that CSI plugins should have
`CAP_SYS_ADMIN` privileges, so this should be set in the `config.json` as
well:

```json
  "linux" : {
    "capabilities": ["CAP_SYS_ADMIN"]
  }
```

### Propagated Mount

In order for the plugin to expose volumes to Swarm, it must publish those
volumes to a Propagated Mount location. This allows a mount to be itself
mounted to a different location in the filesystem. The Docker Plugin system
only allows one Propagated Mount, which is configured as a string representing
the path in the plugin filesystem.

When calling the CSI plugin, Docker Swarm specifies the publish target path,
which is the path in the plugin filesystem that a volume should ultimately be
used from. This is also the path that needs to be specified as the Propagated
Mount for the plugin. This path is hard-coded to be `/data/published` in the
plugin filesystem, and as such, the plugin configuration should list this as
the Propagated Mount:

```json
  "propagatedMount": "/data/published"
```
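
Putting the interface, environment, capability, and Propagated Mount fragments
together, a complete `config.json` might look like the sketch below. The
directory name, description, entrypoint path, and network type are
placeholders assumed for illustration. A real plugin will need values that
match its own binary and environment.

```bash
# Write a minimal config.json for a combined controller+node CSI plugin.
# The description, entrypoint path, and network type are placeholders.
mkdir -p my-csi-plugin
cat > my-csi-plugin/config.json <<'EOF'
{
  "description": "Example CSI plugin for Docker Swarm",
  "entrypoint": ["/usr/local/bin/my-csi-plugin"],
  "network": { "type": "host" },
  "interface": {
    "types": ["docker.csicontroller/1.0", "docker.csinode/1.0"],
    "socket": "my-csi-plugin.sock"
  },
  "env": [
    { "name": "CSI_ENDPOINT", "value": "/run/docker/plugins/my-csi-plugin.sock" }
  ],
  "linux": { "capabilities": ["CAP_SYS_ADMIN"] },
  "propagatedMount": "/data/published"
}
EOF
```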

### Configurable Options

Plugin configurations can specify configurable options for many fields. To
expose a field as configurable, the object including that field should include
a `settable` field, which is an array of strings naming the fields that can be
set.

For example, consider a plugin that supports a config file.

```json
  "mounts": [
    {
      "name": "configfile",
      "description": "Config file mounted in from the host filesystem",
      "type": "bind",
      "destination": "/opt/my-csi-plugin/config.yaml",
      "source": "/etc/my-csi-plugin/config.yaml"
    }
  ]
```

This configuration would result in a file located on the host filesystem at
`/etc/my-csi-plugin/config.yaml` being mounted into the plugin filesystem at
`/opt/my-csi-plugin/config.yaml`. However, hard-coding the source path of the
configuration is undesirable. Instead, the plugin author can list the `source`
field in the `settable` array:

```json
  "mounts": [
    {
      "name": "configfile",
      "description": "Config file mounted in from the host filesystem",
      "type": "bind",
      "destination": "/opt/my-csi-plugin/config.yaml",
      "source": "",
      "settable": ["source"]
    }
  ]
```

When a field is exposed as settable, the user can configure that field when
installing the plugin.

```
$ docker plugin install my-csi-plugin configfile.source="/srv/my-csi-plugin/config.yaml"
```

Or, alternatively, it can be set while the plugin is disabled:

```
$ docker plugin disable my-csi-plugin
$ docker plugin set my-csi-plugin configfile.source="/var/lib/my-csi-plugin/config.yaml"
$ docker plugin enable my-csi-plugin
```

### Split-Component Plugins

For split-component plugins, users can specify either the
`docker.csicontroller/1.0` or `docker.csinode/1.0` plugin interfaces. Manager
nodes should run plugin instances with the `docker.csicontroller/1.0`
interface, and worker nodes the `docker.csinode/1.0` interface.

Docker does not support running two plugins with the same name, nor does it
support specifying different drivers for the node and controller plugins. This
means that in a fully split plugin, Swarm will be unable to schedule volumes
to manager nodes.

If it is desired to run a split-component plugin such that the Volumes managed
by that plugin are accessible to Tasks on the manager node, the user will need
to build the plugin such that some proxy or multiplexer provides the illusion
of combined components to the manager through one socket, and ensure the plugin
reports both interface types.

## Using Cluster Volumes

### Create a Cluster Volume

Creating a Cluster Volume is done with the same `docker volume` commands as any
other Volume. To create a Cluster Volume, one needs to do both of the
following:

* Specify a CSI-capable driver with the `--driver` or `-d` option.
* Use any one of the cluster-specific `docker volume create` flags.

For example, to create a Cluster Volume called `my-volume` with the
`democratic-csi` Volume Driver, one might use this command:

```bash
docker volume create \
  --driver democratic-csi \
  --type mount \
  --sharing all \
  --scope multi \
  --limit-bytes 10G \
  --required-bytes 1G \
  my-volume
```

### List Cluster Volumes

Cluster Volumes will be listed along with other volumes when doing
`docker volume ls`. However, if users want to see only Cluster Volumes, and
with cluster-specific information, the flag `--cluster` can be specified:

```
$ docker volume ls --cluster
VOLUME NAME   GROUP     DRIVER    AVAILABILITY   STATUS
volume1       group1    driver1   active         pending creation
volume2       group1    driver1   pause          created
volume3       group2    driver2   active         in use (1 node)
volume4       group2    driver2   active         in use (2 nodes)
```

### Deploying a Service

Cluster Volumes are only compatible with Docker Services, not plain Docker
Containers.

In Docker Services, a Cluster Volume is used the same way any other volume
would be used. The `type` should be set to `cluster`. For example, to create a
Service that uses `my-volume` created above, one would execute a command like:

```bash
docker service create \
  --name my-service \
  --mount type=cluster,src=my-volume,dst=/srv/www \
  nginx:alpine
```

When scheduling Services which use Cluster Volumes, Docker Swarm uses the
volume's information and state to make decisions about Task placement.

For example, the Service will be constrained to run only on nodes on which the
volume is available. If the volume is configured with `scope=single`, meaning
it can only be used on one node in the cluster at a time, then all Tasks for
that Service will be scheduled to that same node. If that node changes for some
reason, like a node failure, then the Tasks will be rescheduled to the new
node automatically, without user input.

If the Cluster Volume is accessible only on some set of nodes at the same time,
and not the whole cluster, then Docker Swarm will only schedule the Service to
those nodes as reported by the plugin.

### Using Volume Groups

It is frequently desirable for a Service to use any available volume out of an
interchangeable set. To support this in a simple way, Cluster Volumes use the
concept of a volume "Group".

The Volume Group is a field, somewhat like a special label, which is used to
instruct Swarm that a given volume is interchangeable with every other volume
of the same Group. When creating a Cluster Volume, the Group can be specified
by using the `--group` flag.

To use a Cluster Volume by Group instead of by Name, the mount `src` option is
prefixed with `group:`, followed by the group name. For example:

```
--mount type=cluster,src=group:my-group,dst=/srv/www
```

This instructs Docker Swarm that any Volume with the Group `my-group` can be
used to satisfy the mount.

Volumes in a Group do not need to be identical, but they must be
interchangeable. These caveats should be kept in mind when using Groups:

* No Service ever gets the monopoly on a Cluster Volume. If several Services
  use the same Group, then the Cluster Volumes in that Group can be used with
  any of those Services at any time. Just because a particular Volume was used
  by a particular Service at one point does not mean it won't be used by a
  different Service later.
* Volumes in a group can have different configurations, but all of those
  configurations must be compatible with the Service. For example, if some of
  the Volumes in a group have `sharing=readonly`, then the Service must be
  capable of using the volume in read-only mode.
* Volumes in a Group are created statically ahead of time, not dynamically
  as-needed. This means that the user must ensure a sufficient number of
  Volumes belong to the desired Group to support the needs of the Service.
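
Since the volumes in a Group have to exist before the Service needs them, it
can help to create them in a loop. The sketch below is one way to do that. The
function name, the `GROUP-N` naming scheme, and the choice of `--sharing none`
with `--scope single` are assumptions made for illustration.

```bash
# Create COUNT interchangeable volumes in one Group.
# Usage: create_volume_group DRIVER GROUP COUNT
create_volume_group() {
  local driver="$1" group="$2" count="$3"
  local i=1
  while [ "$i" -le "$count" ]; do
    # Volumes are named GROUP-1, GROUP-2, ... (a naming choice of this sketch).
    docker volume create \
      --driver "$driver" \
      --group "$group" \
      --type mount \
      --sharing none \
      --scope single \
      "${group}-${i}"
    i=$((i + 1))
  done
}
```

For example, `create_volume_group democratic-csi my-group 3` would create
`my-group-1` through `my-group-3`, which a Service could then mount with
`--mount type=cluster,src=group:my-group,dst=/srv/www`.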

### Taking Cluster Volumes Offline

For various reasons, users may wish to take a particular Cluster Volume
offline, such that it is not actively used by Services. To facilitate this,
Cluster Volumes have an `availability` option similar to Docker Swarm nodes.

Cluster Volume availability can be one of three states:

* `active` - Default. Volume can be used as normal.
* `pause` - The volume will not be used for new Services, but existing Tasks
  using the volume will not be stopped.
* `drain` - The volume will not be used for new Services, and any running Tasks
  using the volume will be stopped and rescheduled.

A Volume can only be removed from the cluster entirely if its availability is
set to `drain`, and it has been fully unpublished from all nodes.
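
The drain-then-remove sequence can be sketched as below, using `docker volume
update --availability` and `docker volume rm`. The function name, the retry
count, and the delay are assumptions. The retry loop is simply one way to wait
for the volume to be unpublished, on the assumption that removal fails until
then.

```bash
# Drain a Cluster Volume, then remove it once it is unpublished.
# Usage: retire_volume VOLUME [ATTEMPTS] [DELAY_SECONDS]
retire_volume() {
  local volume="$1" attempts="${2:-10}" delay="${3:-5}"
  local n=0

  # Stop and reschedule any Tasks that are using the volume.
  docker volume update --availability drain "$volume" || return 1

  # Assumes removal fails while the volume is still published, so retry.
  while [ "$n" -lt "$attempts" ]; do
    if docker volume rm "$volume"; then
      return 0
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  echo "gave up removing ${volume} after ${attempts} attempts" >&2
  return 1
}
```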

## Unsupported Features

The CSI Spec allows for a large number of features which Cluster Volumes in
this initial implementation do not support. Most notably, Cluster Volumes do
not support snapshots, cloning, or volume expansion.