Add docs about how to extend devicemapper thin pool
Signed-off-by: Chun Chen <ramichen@tencent.com>
Update to device mapper
Entering comments
Signed-off-by: Mary Anthony <mary@docker.com>
(cherry picked from commit a7b2f87b06)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
parent 15af7564cf
commit 3cf8c53515
1 changed files with 147 additions and 180 deletions
|
@ -16,12 +16,10 @@ leverages the thin provisioning and snapshotting capabilities of this framework
|
|||
for image and container management. This article refers to the Device Mapper
|
||||
storage driver as `devicemapper`, and the kernel framework as `Device Mapper`.
|
||||
|
||||
|
||||
>**Note**: The [Commercially Supported Docker Engine (CS-Engine) running on RHEL
|
||||
and CentOS Linux](https://www.docker.com/compatibility-maintenance) requires
|
||||
that you use the `devicemapper` storage driver.
|
||||
|
||||
|
||||
## An alternative to AUFS
|
||||
|
||||
Docker originally ran on Ubuntu and Debian Linux and used AUFS for its storage
|
||||
|
@ -61,20 +59,20 @@ With `devicemapper` the high level process for creating images is as follows:
|
|||
|
||||
1. The `devicemapper` storage driver creates a thin pool.
|
||||
|
||||
The pool is created from block devices or loop mounted sparse files (more
|
||||
on this later).
|
||||
The pool is created from block devices or loop mounted sparse files (more
|
||||
on this later).
|
||||
|
||||
2. Next it creates a *base device*.
|
||||
|
||||
A base device is a thin device with a filesystem. You can see which
|
||||
filesystem is in use by running the `docker info` command and checking the
|
||||
`Backing filesystem` value.
|
||||
A base device is a thin device with a filesystem. You can see which
|
||||
filesystem is in use by running the `docker info` command and checking the
|
||||
`Backing filesystem` value.
|
||||
|
||||
3. Each new image (and image layer) is a snapshot of this base device.
|
||||
|
||||
These are thin provisioned copy-on-write snapshots. This means that they
|
||||
are initially empty and only consume space from the pool when data is written
|
||||
to them.
|
||||
These are thin provisioned copy-on-write snapshots. This means that they
|
||||
are initially empty and only consume space from the pool when data is written
|
||||
to them.
|
||||
|
||||
With `devicemapper`, container layers are snapshots of the image they are
|
||||
created from. Just as with images, container snapshots are thin provisioned
|
||||
|
@ -109,9 +107,9 @@ block (`0x44f`) in an example container.
|
|||
|
||||
1. An application makes a read request for block `0x44f` in the container.
|
||||
|
||||
Because the container is a thin snapshot of an image it does not have the
|
||||
data. Instead, it has a pointer (PTR) to where the data is stored in the image
|
||||
snapshot lower down in the image stack.
|
||||
Because the container is a thin snapshot of an image it does not have the
|
||||
data. Instead, it has a pointer (PTR) to where the data is stored in the image
|
||||
snapshot lower down in the image stack.
|
||||
|
||||
2. The storage driver follows the pointer to block `0xf33` in the snapshot
|
||||
relating to image layer `a005...`.
|
||||
|
@ -121,7 +119,7 @@ snapshot to memory in the container.
|
|||
|
||||
4. The storage driver returns the data to the requesting application.
|
||||
|
||||
### Write examples
|
||||
## Write examples
|
||||
|
||||
With the `devicemapper` driver, writing new data to a container is accomplished
|
||||
by an *allocate-on-demand* operation. Updating existing data uses a
|
||||
|
@ -132,7 +130,7 @@ For example, when making a small change to a large file in a container, the
|
|||
`devicemapper` storage driver does not copy the entire file. It only copies the
|
||||
blocks to be modified. Each block is 64KB.
|
||||
|
||||
#### Writing new data
|
||||
### Writing new data
|
||||
|
||||
To write 56KB of new data to a container:
|
||||
|
||||
|
@ -141,12 +139,12 @@ To write 56KB of new data to a container:
|
|||
2. The allocate-on-demand operation allocates a single new 64KB block to the
|
||||
container's snapshot.
|
||||
|
||||
If the write operation is larger than 64KB, multiple new blocks are
|
||||
allocated to the container's snapshot.
|
||||
If the write operation is larger than 64KB, multiple new blocks are
|
||||
allocated to the container's snapshot.
|
||||
|
||||
3. The data is written to the newly allocated block.
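The allocation arithmetic in the steps above can be sketched as follows. This is only an illustration: `blocks_needed` is a hypothetical helper name, not part of Docker, and it assumes the write starts on a block boundary.

```bash
# Sketch: how many 64KB thin-pool blocks a write of a given size needs.
# BLOCK_SIZE mirrors the 64KB block size described above.
BLOCK_SIZE=65536

blocks_needed() {
    local bytes=$1
    # Round up: a partially filled block still consumes a whole block.
    echo $(( (bytes + BLOCK_SIZE - 1) / BLOCK_SIZE ))
}

blocks_needed $((56 * 1024))    # a 56KB write fits in 1 block
blocks_needed $((200 * 1024))   # a 200KB write needs 4 blocks
```

A write that is not aligned to a block boundary may touch one more block than this estimate.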
|
||||
|
||||
#### Overwriting existing data
|
||||
### Overwriting existing data
|
||||
|
||||
To modify existing data for the first time:
|
||||
|
||||
|
@ -163,7 +161,7 @@ The application in the container is unaware of any of these
|
|||
allocate-on-demand and copy-on-write operations. However, they may add latency
|
||||
to the application's read and write operations.
|
||||
|
||||
## Configuring Docker with Device Mapper
|
||||
## Configure Docker with devicemapper
|
||||
|
||||
The `devicemapper` is the default Docker storage driver on some Linux
|
||||
distributions. This includes RHEL and most of its forks. Currently, the
|
||||
|
@ -182,18 +180,20 @@ deployments should not run under `loop-lvm` mode.
|
|||
|
||||
You can detect the mode by viewing the `docker info` command:
|
||||
|
||||
$ sudo docker info
|
||||
Containers: 0
|
||||
Images: 0
|
||||
Storage Driver: devicemapper
|
||||
Pool Name: docker-202:2-25220302-pool
|
||||
Pool Blocksize: 65.54 kB
|
||||
Backing Filesystem: xfs
|
||||
...
|
||||
Data loop file: /var/lib/docker/devicemapper/devicemapper/data
|
||||
Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
|
||||
Library Version: 1.02.93-RHEL7 (2015-01-28)
|
||||
...
|
||||
```bash
|
||||
$ sudo docker info
|
||||
Containers: 0
|
||||
Images: 0
|
||||
Storage Driver: devicemapper
|
||||
Pool Name: docker-202:2-25220302-pool
|
||||
Pool Blocksize: 65.54 kB
|
||||
Backing Filesystem: xfs
|
||||
[...]
|
||||
Data loop file: /var/lib/docker/devicemapper/devicemapper/data
|
||||
Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
|
||||
Library Version: 1.02.93-RHEL7 (2015-01-28)
|
||||
[...]
|
||||
```
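The same check can be scripted. The sketch below is a rough heuristic, not an official interface: it assumes that a `Data loop file` line in the `docker info` text only appears in `loop-lvm` mode, and `dm_mode` is a hypothetical helper name.

```bash
# Hypothetical helper: reads `docker info` text on stdin and reports
# whether the devicemapper driver appears to be using loop files.
dm_mode() {
    local info
    info=$(cat)
    if printf '%s\n' "$info" | grep -q 'Data loop file:'; then
        echo "loop-lvm"
    elif printf '%s\n' "$info" | grep -q 'Storage Driver: devicemapper'; then
        echo "devicemapper without loop files"
    else
        echo "not devicemapper"
    fi
}
```

You would typically pipe the command into it, for example `sudo docker info | dm_mode`.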
|
||||
|
||||
The output above shows a Docker host running with the `devicemapper` storage
|
||||
driver operating in `loop-lvm` mode. This is indicated by the fact that the
|
||||
|
@ -203,175 +203,141 @@ files.
|
|||
|
||||
### Configure direct-lvm mode for production
|
||||
|
||||
The preferred configuration for production deployments is `direct lvm`. This
|
||||
The preferred configuration for production deployments is `direct-lvm`. This
|
||||
mode uses block devices to create the thin pool. The following procedure shows
|
||||
you how to configure a Docker host to use the `devicemapper` storage driver in
|
||||
a `direct-lvm` configuration.
|
||||
|
||||
> **Caution:** If you have already run the Engine daemon on your Docker host
|
||||
> **Caution:** If you have already run the Docker daemon on your Docker host
|
||||
> and have images you want to keep, `push` them to Docker Hub or your private
|
||||
> Docker Trusted Registry before attempting this procedure.
|
||||
|
||||
The procedure below will create a 90GB data volume and 4GB metadata volume to
|
||||
use as backing for the storage pool. It assumes that you have a spare block
|
||||
device at `/dev/sdd` with enough free space to complete the task. The device
|
||||
device at `/dev/xvdf` with enough free space to complete the task. The device
|
||||
identifier and volume sizes may be different in your environment and you
|
||||
should substitute your own values throughout the procedure.
|
||||
should substitute your own values throughout the procedure. The procedure also
|
||||
assumes that the Docker daemon is in the `stopped` state.
|
||||
|
||||
The procedure also assumes that the Engine daemon is in the `stopped` state.
|
||||
Any existing images or data are lost by this process.
|
||||
1. Log in to the Docker host you want to configure and stop the Docker daemon.
|
||||
|
||||
1. Log in to the Docker host you want to configure.
|
||||
2. If it is running, stop the Engine daemon.
|
||||
3. Install the logical volume management version 2.
|
||||
2. If it exists, delete your existing image store by removing the
|
||||
`/var/lib/docker` directory.
|
||||
|
||||
```bash
|
||||
$ yum install lvm2
|
||||
```
|
||||
4. Create a physical volume replacing `/dev/sdd` with your block device.
|
||||
```bash
|
||||
$ sudo rm -rf /var/lib/docker
|
||||
```
|
||||
|
||||
```bash
|
||||
$ pvcreate /dev/sdd
|
||||
```
|
||||
3. Create an LVM physical volume (PV) on your spare block device using the
|
||||
`pvcreate` command.
|
||||
|
||||
5. Create a 'docker' volume group.
|
||||
```bash
|
||||
$ sudo pvcreate /dev/xvdf
|
||||
Physical volume `/dev/xvdf` successfully created
|
||||
```
|
||||
|
||||
```bash
|
||||
$ vgcreate docker /dev/sdd
|
||||
```
|
||||
The device identifier may be different on your system. Remember to substitute
|
||||
your value in the command above. If your host is running on AWS EC2, you may
|
||||
need to install `lvm2` and <a href="http://goo.gl/Q5pUwG"
|
||||
target="_blank">attach an EBS device</a> to use this procedure.
|
||||
|
||||
6. Create a thin pool named `thinpool`.
|
||||
4. Create a new volume group (VG) called `vg-docker` using the PV created in
|
||||
the previous step.
|
||||
|
||||
In this example, the data logical volume is 95% of the 'docker' volume group size.
|
||||
Leaving this free space allows for auto expanding of either the data or
|
||||
metadata if space runs low as a temporary stopgap.
|
||||
```bash
|
||||
$ sudo vgcreate vg-docker /dev/xvdf
|
||||
Volume group `vg-docker` successfully created
|
||||
```
|
||||
|
||||
```bash
|
||||
$ lvcreate --wipesignatures y -n thinpool docker -l 95%VG
|
||||
$ lvcreate --wipesignatures y -n thinpoolmeta docker -l 1%VG
|
||||
```
|
||||
5. Create a new 90GB logical volume (LV) called `data` from space in the
|
||||
`vg-docker` volume group.
|
||||
|
||||
7. Convert the pool to a thin pool.
|
||||
```bash
|
||||
$ sudo lvcreate -L 90G -n data vg-docker
|
||||
Logical volume `data` created.
|
||||
```
|
||||
|
||||
```bash
|
||||
$ lvconvert -y --zero n -c 512K --thinpool docker/thinpool --poolmetadata docker/thinpoolmeta
|
||||
```
|
||||
The command creates an LVM logical volume called `data` and an associated
|
||||
block device file at `/dev/vg-docker/data`. In a later step, you instruct the
|
||||
`devicemapper` storage driver to use this block device to store image and
|
||||
container data.
|
||||
|
||||
8. Configure autoextension of thin pools via an `lvm` profile.
|
||||
If you receive a signature detection warning, make sure you are working on
|
||||
the correct devices before continuing. Signature warnings indicate that the
|
||||
device you're working on is currently in use by LVM or has been used by LVM in
|
||||
the past.
|
||||
|
||||
```bash
|
||||
$ vi /etc/lvm/profile/docker-thinpool.profile
|
||||
```
|
||||
6. Create a new logical volume (LV) called `metadata` from space in the
|
||||
`vg-docker` volume group.
|
||||
|
||||
9. Specify 'thin_pool_autoextend_threshold' value.
|
||||
```bash
|
||||
$ sudo lvcreate -L 4G -n metadata vg-docker
|
||||
Logical volume `metadata` created.
|
||||
```
|
||||
|
||||
The value should be the percentage of space used before `lvm` attempts
|
||||
to autoextend the available space (100 = disabled).
|
||||
This creates an LVM logical volume called `metadata` and an associated
|
||||
block device file at `/dev/vg-docker/metadata`. In the next step you instruct
|
||||
the `devicemapper` storage driver to use this block device to store image and
|
||||
container metadata.
|
||||
|
||||
```
|
||||
thin_pool_autoextend_threshold = 80
|
||||
```
|
||||
7. Start the Docker daemon with the `devicemapper` storage driver and the
|
||||
`--storage-opt` flags.
|
||||
|
||||
10. Modify the `thin_pool_autoextend_percent` for when thin pool autoextension occurs.
|
||||
The `data` and `metadata` devices that you pass to the `--storage-opt`
|
||||
options were created in the previous steps.
|
||||
|
||||
The value is the percentage by which to increase the size of the thin pool (100 =
|
||||
disabled)
|
||||
```bash
|
||||
$ sudo docker daemon --storage-driver=devicemapper --storage-opt dm.datadev=/dev/vg-docker/data --storage-opt dm.metadatadev=/dev/vg-docker/metadata &
|
||||
[1] 2163
|
||||
[root@ip-10-0-0-75 centos]# INFO[0000] Listening for HTTP on unix (/var/run/docker.sock)
|
||||
INFO[0027] Option DefaultDriver: bridge
|
||||
INFO[0027] Option DefaultNetwork: bridge
|
||||
<-- output truncated -->
|
||||
INFO[0027] Daemon has completed initialization
|
||||
INFO[0027] Docker daemon commit=1b09a95 graphdriver=aufs version=1.11.0-dev
|
||||
```
|
||||
|
||||
```
|
||||
thin_pool_autoextend_percent = 20
|
||||
```
|
||||
It is also possible to set the `--storage-driver` and `--storage-opt` flags
|
||||
in the Docker config file and start the daemon normally using the `service` or
|
||||
`systemd` commands.
|
||||
|
||||
11. Check your work. Your `docker-thinpool.profile` file should appear similar to the following:
|
||||
8. Use the `docker info` command to verify that the daemon is using `data` and
|
||||
`metadata` devices you created.
|
||||
|
||||
An example `/etc/lvm/profile/docker-thinpool.profile` file:
|
||||
|
||||
```
|
||||
activation {
|
||||
thin_pool_autoextend_threshold=80
|
||||
thin_pool_autoextend_percent=20
|
||||
}
|
||||
```
|
||||
|
||||
12. Apply your new lvm profile
|
||||
|
||||
```bash
|
||||
$ lvchange --metadataprofile docker-thinpool docker/thinpool
|
||||
```
|
||||
|
||||
13. Verify the `lv` is monitored.
|
||||
|
||||
```bash
|
||||
$ lvs -o+seg_monitor
|
||||
```
|
||||
|
||||
14. If Engine was previously started, clear your graph driver directory.
|
||||
|
||||
Clearing your graph driver removes any images and containers in your Docker
|
||||
installation.
|
||||
|
||||
```bash
|
||||
$ rm -rf /var/lib/docker/*
|
||||
```
|
||||
|
||||
14. Configure the Engine daemon with specific devicemapper options.
|
||||
|
||||
There are two ways to do this. You can set options on the command line if you start the daemon there:
|
||||
|
||||
```bash
|
||||
--storage-driver=devicemapper --storage-opt=dm.thinpooldev=/dev/mapper/docker-thinpool --storage-opt dm.use_deferred_removal=true
|
||||
```
|
||||
|
||||
You can also set them for startup in the `daemon.json` configuration, for example:
|
||||
|
||||
```json
|
||||
{
|
||||
"storage-driver": "devicemapper",
|
||||
"storage-opts": [
|
||||
"dm.thinpooldev=/dev/mapper/docker-thinpool",
|
||||
"dm.use_deferred_removal=true"
|
||||
]
|
||||
}
|
||||
```
|
||||
15. Start the Engine daemon.
|
||||
|
||||
```bash
|
||||
$ systemctl start docker
|
||||
```
|
||||
|
||||
After you start the Engine daemon, ensure you monitor your thin pool and volume
|
||||
group free space. While the volume group will auto-extend, it can still fill
|
||||
up. To monitor logical volumes, use `lvs` without options or `lvs -a` to see the
|
||||
data and metadata sizes. To monitor volume group free space, use the `vgs` command.
|
||||
|
||||
Logs can show the auto-extension of the thin pool when it hits the threshold. To
|
||||
view the logs use:
|
||||
|
||||
```bash
|
||||
journalctl -fu dm-event.service
|
||||
```
|
||||
|
||||
If you run into repeated problems with the thin pool, you can use the
|
||||
`dm.min_free_space` option to tune the Engine behavior. This value ensures that
|
||||
operations fail with a warning when the free space is at or near the minimum.
|
||||
For information, see <a
|
||||
href="https://docs.docker.com/engine/reference/commandline/daemon/#storage-driver-options"
|
||||
target="_blank">the storage driver options in the Engine daemon reference</a>.
|
||||
```bash
|
||||
$ sudo docker info
|
||||
INFO[0180] GET /v1.20/info
|
||||
Containers: 0
|
||||
Images: 0
|
||||
Storage Driver: devicemapper
|
||||
Pool Name: docker-202:1-1032-pool
|
||||
Pool Blocksize: 65.54 kB
|
||||
Backing Filesystem: xfs
|
||||
Data file: /dev/vg-docker/data
|
||||
Metadata file: /dev/vg-docker/metadata
|
||||
[...]
|
||||
```
|
||||
|
||||
The output of the command above shows the storage driver as `devicemapper`.
|
||||
The last two lines also confirm that the correct devices are being used for
|
||||
the `Data file` and the `Metadata file`.
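The LVM and daemon steps of this procedure can be collected into one script. The sketch below is a convenience wrapper under stated assumptions, not an official tool: `run` and `setup_direct_lvm` are hypothetical names, the device, volume group, and sizes are the example values used above, and commands are only printed unless you set `DRY_RUN=0`. The destructive `rm -rf /var/lib/docker` step is deliberately left out.

```bash
# Example values from the procedure; substitute your own.
DEVICE=${DEVICE:-/dev/xvdf}
VG=${VG:-vg-docker}

# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_direct_lvm() {
    # Physical volume, volume group, then the data and metadata volumes.
    run sudo pvcreate "$DEVICE"
    run sudo vgcreate "$VG" "$DEVICE"
    run sudo lvcreate -L 90G -n data "$VG"
    run sudo lvcreate -L 4G -n metadata "$VG"
    # Start the daemon against the two logical volumes (runs in the foreground).
    run sudo docker daemon --storage-driver=devicemapper \
        --storage-opt "dm.datadev=/dev/$VG/data" \
        --storage-opt "dm.metadatadev=/dev/$VG/metadata"
}
```

Calling `setup_direct_lvm` with the defaults prints the five commands so you can review them before running anything against a real block device.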
|
||||
|
||||
### Examine devicemapper structures on the host
|
||||
|
||||
You can use the `lsblk` command to see the device files created above and the
|
||||
`pool` that the `devicemapper` storage driver creates on top of them.
|
||||
|
||||
$ sudo lsblk
|
||||
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
|
||||
xvda 202:0 0 8G 0 disk
|
||||
└─xvda1 202:1 0 8G 0 part /
|
||||
xvdf 202:80 0 10G 0 disk
|
||||
├─vg--docker-data 253:0 0 90G 0 lvm
|
||||
│ └─docker-202:1-1032-pool 253:2 0 10G 0 dm
|
||||
└─vg--docker-metadata 253:1 0 4G 0 lvm
|
||||
└─docker-202:1-1032-pool 253:2 0 10G 0 dm
|
||||
```bash
|
||||
$ sudo lsblk
|
||||
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
|
||||
xvda 202:0 0 8G 0 disk
|
||||
└─xvda1 202:1 0 8G 0 part /
|
||||
xvdf 202:80 0 10G 0 disk
|
||||
├─vg--docker-data 253:0 0 90G 0 lvm
|
||||
│ └─docker-202:1-1032-pool 253:2 0 10G 0 dm
|
||||
└─vg--docker-metadata 253:1 0 4G 0 lvm
|
||||
└─docker-202:1-1032-pool 253:2 0 10G 0 dm
|
||||
```
|
||||
|
||||
The diagram below shows the image from prior examples updated with the detail
|
||||
from the `lsblk` command above.
|
||||
|
@ -379,8 +345,8 @@ from the `lsblk` command above.
|
|||
![](http://farm1.staticflickr.com/703/22116692899_0471e5e160_b.jpg)
|
||||
|
||||
In the diagram, the pool is named `Docker-202:1-1032-pool` and spans the `data`
|
||||
and `metadata` devices created earlier. The `devicemapper` constructs the pool
|
||||
name as follows:
|
||||
and `metadata` devices created earlier. The `devicemapper` constructs the pool
|
||||
name as follows:
|
||||
|
||||
```
|
||||
Docker-MAJ:MIN-INO-pool
|
||||
|
@ -440,18 +406,18 @@ Logging Driver: json-file
|
|||
[...]
|
||||
```
|
||||
|
||||
The `Data Space` values show that the pool is 100GiB total. This example extends the pool to 200GiB.
|
||||
The `Data Space` values show that the pool is 100GB total. This example extends the pool to 200GB.
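The byte count passed to `truncate` in the steps that follow is the target size expressed in binary units. As a small worked example (the helper name `gib_to_bytes` is hypothetical):

```bash
# Convert a size in GiB to bytes, suitable for `truncate -s`.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 200   # prints 214748364800
```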
|
||||
|
||||
1. List the sizes of the devices.
|
||||
|
||||
```bash
|
||||
$ sudo ls -lh /var/lib/docker/devicemapper/devicemapper/
|
||||
total 1.2G
|
||||
-rw------- 1 root root 100G Apr 14 08:47 data
|
||||
-rw------- 1 root root 2.0G Apr 19 13:27 metadata
|
||||
total 1175492
|
||||
-rw------- 1 root root 100G Mar 30 05:22 data
|
||||
-rw------- 1 root root 2.0G Mar 31 11:17 metadata
|
||||
```
|
||||
|
||||
2. Truncate `data` file to 200GiB.
|
||||
2. Truncate the `data` file to the new size (approximately 200GB).
|
||||
|
||||
```bash
|
||||
$ sudo truncate -s 214748364800 /var/lib/docker/devicemapper/devicemapper/data
|
||||
|
@ -460,10 +426,12 @@ The `Data Space` values show that the pool is 100GiB total. This example extends
|
|||
3. Verify the file size changed.
|
||||
|
||||
```bash
|
||||
$ sudo ls -lh /var/lib/docker/devicemapper/devicemapper/
|
||||
total 1.2G
|
||||
-rw------- 1 root root 200G Apr 14 08:47 data
|
||||
-rw------- 1 root root 2.0G Apr 19 13:27 metadata
|
||||
$ sudo ls -al /var/lib/docker/devicemapper/devicemapper/
|
||||
total 1175492
|
||||
drwx------ 2 root root 4096 Mar 29 02:45 .
|
||||
drwx------ 5 root root 4096 Mar 29 02:48 ..
|
||||
-rw------- 1 root root 214748364800 Mar 31 11:20 data
|
||||
-rw------- 1 root root 2147483648 Mar 31 11:17 metadata
|
||||
```
|
||||
|
||||
4. Reload data loop device
|
||||
|
@ -480,19 +448,19 @@ The `Data Space` values show that the pool is 100GiB total. This example extends
|
|||
|
||||
a. Get the pool name first.
|
||||
|
||||
$ sudo dmsetup status | grep pool
|
||||
docker-8:1-123141-pool: 0 209715200 thin-pool 91 422/524288 18338/1638400 - rw discard_passdown queue_if_no_space -
|
||||
$ sudo dmsetup status docker-8:1-123141-pool: 0 209715200 thin-pool 91
|
||||
422/524288 18338/1638400 - rw discard_passdown queue_if_no_space -
|
||||
|
||||
The name is the string before the colon.
|
||||
|
||||
b. Dump the device mapper table first.
|
||||
b. Dump the device mapper table first.
|
||||
|
||||
$ sudo dmsetup table docker-8:1-123141-pool
|
||||
0 209715200 thin-pool 7:1 7:0 128 32768 1 skip_block_zeroing
|
||||
|
||||
c. Calculate the real total sectors of the thin pool now.
|
||||
|
||||
Change the second number of the table info (i.e. the number of sectors) to reflect the new number of 512 byte sectors in the disk. For example, as the new loop size is 200GiB, change the second number to 419430400.
|
||||
Change the second number of the table info (i.e. the disk end sector) to reflect the new number of 512 byte sectors in the disk. For example, as the new loop size is 200GB, change the second number to 419430400.
|
||||
|
||||
d. Reload the thin pool with the new sector number
|
||||
|
||||
|
@ -514,7 +482,7 @@ $ ./device_tool resize 200GB
|
|||
### For a direct-lvm mode configuration
|
||||
|
||||
In this example, you extend the capacity of a running device that uses the
|
||||
`direct-lvm` configuration. This example assumes you are using the `/dev/sdh1`
|
||||
`direct-lvm` configuration. This example assumes you are using the `/dev/sdh1`
|
||||
disk partition.
|
||||
|
||||
1. Extend the volume group (VG) `vg-docker`.
|
||||
|
@ -550,7 +518,7 @@ disk partition.
|
|||
|
||||
c. Calculate the real total sectors of the thin pool now. You can use `blockdev` to get the real size of the data `lv`.
|
||||
|
||||
Change the second number of the table info (i.e. the number of sectors) to
|
||||
Change the second number of the table info (i.e. the disk end sector) to
|
||||
reflect the new number of 512 byte sectors in the disk. For example, as the
|
||||
new data `lv` size is `264132100096` bytes, change the second number to
|
||||
`515883008`.
|
||||
|
@ -562,7 +530,6 @@ disk partition.
|
|||
|
||||
$ sudo dmsetup suspend docker-253:17-1835016-pool && sudo dmsetup reload docker-253:17-1835016-pool --table '0 515883008 thin-pool 252:0 252:1 128 32768 1 skip_block_zeroing' && sudo dmsetup resume docker-253:17-1835016-pool
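The sector arithmetic and table edit from steps b through d can be sketched as shell helpers. This is an illustration under assumptions: `bytes_to_sectors` and `new_table` are hypothetical names, and the sketch only rewrites the text of the table line; it does not call `dmsetup` itself.

```bash
SECTOR_SIZE=512

# Convert a device size in bytes to a count of 512-byte sectors.
bytes_to_sectors() {
    echo $(( $1 / SECTOR_SIZE ))
}

# new_table OLD_TABLE NEW_SIZE_BYTES
# Prints OLD_TABLE with its second field (the sector count) replaced.
new_table() {
    local old_table=$1 new_bytes=$2
    local sectors
    sectors=$(bytes_to_sectors "$new_bytes")
    echo "$old_table" | awk -v n="$sectors" '{ $2 = n; print }'
}

bytes_to_sectors 264132100096   # prints 515883008
```

For the `direct-lvm` case, the size in bytes could come from something like `sudo blockdev --getsize64 /dev/vg-docker/data`, and the string printed by `new_table` is what you would pass to `dmsetup reload ... --table`.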
|
||||
|
||||
|
||||
## Device Mapper and Docker performance
|
||||
|
||||
It is important to understand the impact that allocate-on-demand and
|
||||
|
|