Currently updatePoolTransactionId() checks if NewTransactionId and
TransactionId are not same only then update the transaction Id in pool. This
check is redundant. Currently we call updatePoolTransactionId() only from
two places and both of these first allocate a new transaction Id.
Also updatePoolTransactionId() should only be called after allocating
new transaction Id otherwise it does not make any sense.
Remove the redundant check and reduce confusion.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Create two new helper functions for device and snap device creation. These
functions will not only create the device and also register the device.
Again, makes the code structure better and keeps all transaction logic
contained to functions instead of spilling over into functions like
setupBaseImage or AddDevice().
Just the code reorganization. No functionality change.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Currently registerDevice() adds a device to in-memory table, saves metadata
and also updates the pool transaction ID.
Now move transaciton Id update out of registerDevice() and provide a new
function unregisterDevice() which does the reverse of registerDevice().
This will simplify some code down the line and make it more structured.
This is just code reorganization and should not change functionality.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Currently devicemapper CreateDevice and CreateSnapDevice keep on retrying
device creation till a suitable device id is found.
With new transaction mechanism we need to store device id in transaction
before it has been created.
So change the logic in such a way that caller decides the devices Id to
use. If that device Id is not available, caller bumps up the device Id
and retries.
That way caller can update transaciton too when it tries a new Id. Transaction
related patches will come later in the series.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
When we are deleting a device, we also delete associated metadata file. If
that file removal fails, we are adding back the device in in-memory
table. I really can't see what's the point. When next lookup takes place
it will be automatically loaded if need be. Remove that code.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Right now initMetaData() first queries the pool for current transaciton Id
and then it migrates the old metafile.
Move pool transaction Id query and file migration in separate functions
for better code reuse and organization.
Given we have removed device transaction Id dependency from saveMetaData(),
we don't have to query pool transaction Id before migrating files.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Right now saveMetaData() is kind of little overloaded function. It is
supposed to save file metadata to disk. But in addition if user has
bumped up NewTransactionId before calling saveMetaData(), then it will
also update the transaction ID in pool.
Keep saveMetaData() simple and let it just save the file. Any update
of pool transaction ID is done inline in the code which needs it.
Also create an helper function updatePoolTransactionId() to update pool
transaction Id.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Remove call to allocateTransactionId() during device removal. This seems to
be unnecessary and it is not clear what this call is doing.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Again, just because device transaction id is greater than pool transaction
id, it does not guarantee that device is in the pool. So do not check
of this during loading of device metadata.
Docker needs to deal with it. And device activation will fail when we try
to activate a device for whom metafile is present but there is no device
in the pool.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Current code is associating a transaction id with each device and if pool
transaction id is greater that value, then current code assumes that device
is there in pool.
Transaction id of pool is a mechanism so that during device creation and
removal one can define a transaction and during startup figure out if
transaction was complete or not. I think we are using transaction id
throughout the code little inappropriately.
For example, if a device is being deleted, it is possible that we deleted
the device from pool but before we could delete metafile docker crashed.
When docker comes back it will think that device is in the pool (due to
device transaction id being less than pool transaction id) but device
is not in the pool.
Similary, it could happen that some data in the pool is corrupted and
during pool repair some devices are lost (without docker knowing about
it). In that case tool pool transaction id will be higher than device
transaction id and there are no guaratees that device is actually in
the pool.
So move away from this model where we think that a device is in pool if pool
transaction id is greater than device transaction Id. Per device
transaction Id just says that after device creation this should be pool's
transaction Id and nothing more.
Transaction id is per pool property (as opposed to per device property) and
will be used internally to figure out if last transaction was complete or
not and recover from failure during docker startup.
If for some reason metafile is present but device is not in pool, then
device activation will fail later.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Ideally lvm2 would be used to create/manage the thin-pool volume that is
then handed to docker to exclusively create/manage the thin and thin
snapshot volumes needed for it's containers. Managing the thin-pool
outside of docker makes for the most feature-rich method of having
docker utilize device mapper thin provisioning as the backing storage
for docker's containers. lvm2-based thin-pool management feature
highlights include: automatic or interactive thin-pool resize support,
dynamically change thin-pool features, automatic thinp metadata checking
when lvm2 activates the thin-pool, etc.
Docker will not activate/deactivate the specified thin-pool device but
it will exclusively manage/create thin and thin snapshot volumes in it.
Docker will not take ownership of the specified thin-pool device unless
it has 0 data blocks used and a transaction id of 0. This should help
guard against using a thin-pool that is already in use.
Also fix typos in setupBaseImage() relative to the thin volume type of
the base image.
Docker-DCO-1.1-Signed-off-by: Mike Snitzer <snitzer@redhat.com> (github: snitm)
Took care of some review comments from crosbymichael.
v2:
- Return "err = nil" if file deviceset-metadata file does not exist.
- Use json.Decoder() interface for loading deviceset metadata.
v3:
- Reverted back to json marshal interface in loadDeviceSetMetaData().
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
In previous patch I had introduce json:"-" tags to be on safer side to make
sure certain fields are not marshalled/unmarshalled. But struct fields
starting with small letter are not exported so they will not be marshalled
anyway. So remove json:"-" tags from there.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
My pull request failed the build due to gofmat issues. I have run gofmt
on specified files and this commit fixes it.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
The way thin-pool right now is designed, user space is supposed to keep
track of what device ids have already been used. If user space tries to
create a new thin/snap device and device id has already been used, thin
pool retuns -EEXIST.
Upon receiving -EEXIST, current docker implementation simply tries the
NextDeviceId++ and keeps on doing this till it finds a free device id.
This approach has two issues.
- It is little suboptimal.
- If device id already exists, current kenrel implementation spits out
a messsage on console.
[17991.140135] device-mapper: thin: Creation of new snapshot 33 of device 3 failed.
Here kenrel is trying to tell user that device id 33 has already been used.
And this shows up for every device id docker tries till it reaches a point
where device ids are not used. So if there are thousands of container and
one is trying to create a new container after fresh docker start, expect
thousands of such warnings to flood console.
This patch saves the NextDeviceId in a file in
/var/lib/docker/devmapper/metadata/deviceset-metadata and reads it back
when docker starts. This way we don't retry lots of device ids which
have already been used.
There might be some device ids which are free but we will get back to them
once device numbers wrap around (24bit limit on device ids).
This patch should cut down on number of kernel warnings.
Notice that I am creating a deviceset metadata file which is a global file
for this pool. So down the line if we need to save more data we should be
able to do that.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
I was trying to save nextDeviceId to a file but it would not work and
json.Marshal() will do nothing. Then some search showed that I need to
make first letter of struct field capital, exporting this field and
now json.Marshal() works.
This is a preparatory patch for the next one.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Currently we save device metadata and have a helper function saveMetadata()
which converts data in json format as well as saves it to file. For
converting data in json format, one needs to know what is being saved.
Break this function down in two functions. One function only has file
write capability and takes in argument about byte array of json data.
Now this function does not have to know what data is being saved. It
only knows about a stream of json data is being saved to a file.
This allows me to reuse this function to save a different type of
metadata. In this case I am planning to save NextDeviceId so that
docker can use this device Id upon next restart. Otherwise docker
starts from 0 which is suboptimal.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
If `--storage-opt dm.datadev=/dev/loop0 --storage-opt
dm.metadatadev=/dev/loop1 ` were provided, the information was not
reflected in the information output.
Closes: #7137
Signed-off-by: Vincent Batts <vbatts@redhat.com>
functions to pkg/parsers/kernel, and parsing filters to
pkg/parsers/filter. Adjust imports and package references.
Docker-DCO-1.1-Signed-off-by: Erik Hollensbe <github@hollensbe.org> (github: erikh)
Commit 09ee269d ("devmapper: Add option for specifying the thin pool
blocksize") also switched the default dm-thin-pool blocksize from 64K to
512K. That change unfortunately breaks the activation of dm-thin-pool
devices that were previously created using a 64K blocksize. Here is an
example of the dm-thin-pool activation failure users may experience:
device-mapper: thin: 253:4: pool target (204800 blocks) too small: expected 1638400
device-mapper: table: 253:4: thin-pool: preresume failed, error = -22
The reason for this is docker is passing 512K as the blocksize for a
dm-thin-pool that was previously created using a 64K blocksize. Docker
doesn't record the blocksize the is used when it creates a dm-thin-pool.
Until now it never had a need to do so because the blocksize was always
hardcoded. The dm-thin-pool blocksize must be the same every time a
dm-thin-pool is activated.
As a stop-gap fix, revert to using 64K for the default blocksize.
But we do need a proper fix for this now that 'dm.blocksize' is exposed
as a proper storage option. One possible fix would be to record the
blocksize for each dm-thin-pool that docker creates and to pass that
recorded blocksize down in the dmsetup table load each time the
dm-thin-pool is activated (this would be comparable to what lvm2 does).
Docker-DCO-1.1-Signed-off-by: Mike Snitzer <snitzer@redhat.com> (github: snitm)
Add dm.blocksize option that you can use with --storage-opt to set a
specific blocksize for the thin provisioning pool.
Also change the default dm-thin-pool blocksize from 64K to 512K. This
strikes a balance between the desire to have smaller blocksize given
docker's use of snapshots versus the desire to have more performance
that comes with using a larger blocksize. But if very small files will
be used on average the user is encouraged to override this default.
Docker-DCO-1.1-Signed-off-by: Mike Snitzer <snitzer@redhat.com> (github: snitm)
Device Mapper needs device sizes in binary (1024) multiples. Otherwise
kernel checks can find that the specified thin-pool device sizes aren't
a multiple of the specified thin-pool blocksize.
The name for "RAMInBytes" is likely too narrow given the new consumers
but... Also add "tebibyte" support to RAMInBytes.
Docker-DCO-1.1-Signed-off-by: Mike Snitzer <snitzer@redhat.com> (github: snitm)
The blkdiscard hack we do on container/image delete is pretty slow, but
required to restore space to the "host" root filesystem. However, it
is pretty useless on raw devices, and you may not need it in development
either.
In a simple test of the devicemapper backend on loopback the time to
delete 20 container went from 11 seconds to 0.4 seconds with
--storage-opt blkdiscard=false.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
This adds dm.datadev and dm.metadatadev options that you can use with
--storage-opt to set to specific devices to use for the thin
provisioning pool.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
This adds the following --storage-opts for the daemon:
dm.fs: The filesystem to use for the base image
dm.mkfsarg: Add an argument to the mkfs command for the base image
dm.mountopt: Add a mount option for devicemapper mount
Currently supported filesystems are xfs and ext4.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
This allows setting these settings to be passed:
dm.basesize
dm.loopdatasize
dm.loopmetadatasize
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
If we can't even get the current device mapper driver version, then
we cleanly fail the devmapper driver as not supported and fall back
on the next one.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)