In an effort to make layer content 'stable' between import
and export from two different graph drivers, we must resolve
an issue where AUFS produces metadata files in its layers
which other drivers explicitly ignore when importing.
The issue presents itself like this:
- Generate a layer using AUFS
- On commit of that container, the new stored layer contains
AUFS metadata files/dirs. The stored layer content has some
tarsum value: '1234567'
- `docker save` that image to a USB drive and `docker load`
into another docker engine instance which uses another
graph driver, say 'btrfs'
- On load, this graph driver explicitly ignores any AUFS metadata
that it encounters. The stored layer content now has some
different tarsum value: 'abcdefg'.
The only (apparent) useful aufs metadata to keep are the psuedo link
files located at `/.wh..wh.plink/`. Thes files hold information at the
RW layer about hard linked files between this layer and another layer.
The other graph drivers make sure to copy up these psuedo linked files
but I've tested out a few different situations and it seems that this
is unnecessary (In my test, AUFS already copies up the other hard linked
files to the RW layer).
This changeset adds explicit exclusion of the AUFS metadata files and
directories (NOTE: not the whiteout files!) on commit of a container
using the AUFS storage driver.
Also included is a change to the archive package. It now explicitly
ignores the root directory from being included in the resulting tar archive
for 2 reasons: 1) it's unnecessary. 2) It's another difference between
what other graph drivers produce when exporting a layer to a tar archive.
Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)
This backend uses the overlayfs union filesystem for containers
plus hard link file sharing for images.
Each container/image can have a "root" subdirectory which is a plain
filesystem hierarchy, or they can use overlayfs.
If they use overlayfs there is a "upper" directory and a "lower-id"
file, as well as "merged" and "work" directories. The "upper"
directory has the upper layer of the overlay, and "lower-id" contains
the id of the parent whose "root" directory shall be used as the lower
layer in the overlay. The overlay itself is mounted in the "merged"
directory, and the "work" dir is needed for overlayfs to work.
When a overlay layer is created there are two cases, either the
parent has a "root" dir, then we start out with a empty "upper"
directory overlaid on the parents root. This is typically the
case with the init layer of a container which is based on an image.
If there is no "root" in the parent, we inherit the lower-id from
the parent and start by making a copy if the parents "upper" dir.
This is typically the case for a container layer which copies
its parent -init upper layer.
Additionally we also have a custom implementation of ApplyLayer
which makes a recursive copy of the parent "root" layer using
hardlinks to share file data, and then applies the layer on top
of that. This means all chile images share file (but not directory)
data with the parent.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
The vfs storage driver currently shells out to the `cp` binary on the host
system to perform an 'archive' copy of the base image to a new directory.
The archive option preserves the modified time of the files which are created
but there was an issue where it was unable to preserve the modified time of
copied symbolic links on some host systems with an outdated version of `cp`.
This change no longer relies on the host system implementation and instead
utilizes the `CopyWithTar` function found in `pkg/archive` which is used
to copy from source to destination directory using a Tar archive, which
should correctly preserve file attributes.
Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)
Now that the archive package does not depend on any docker-specific
packages, only those in pkg and vendor, it can be safely moved into pkg.
Signed-off-by: Rafe Colton <rafael.colton@gmail.com>
If `--storage-opt dm.datadev=/dev/loop0 --storage-opt
dm.metadatadev=/dev/loop1 ` were provided, the information was not
reflected in the information output.
Closes: #7137
Signed-off-by: Vincent Batts <vbatts@redhat.com>
Some graphdrivers are Differs and type assertions are made
in various places throughout the project. Differ offers some
convenience in generating/applying diffs of filesystem layers
but for most graphdrivers another code path is taken.
This patch brings all of the logic related to filesystem
diffs in one place, and simplifies the implementation of some
common types like Image, Daemon, and Container.
Signed-off-by: Josh Hawn <josh.hawn@docker.com>
Since these will be shared between containers we want to label
them as svirt_sandbox_file_t:s0. That will allow multiple containers
to write to them.
Currently we are allowing container domains to read/write all content in
/var/lib/docker because of container volumes. This is a big security hole
in our SELinux story.
This patch will allow us to tighten up the security of docker containers.
Docker-DCO-1.1-Signed-off-by: Dan Walsh <dwalsh@redhat.com> (github: rhatdan)
functions to pkg/parsers/kernel, and parsing filters to
pkg/parsers/filter. Adjust imports and package references.
Docker-DCO-1.1-Signed-off-by: Erik Hollensbe <github@hollensbe.org> (github: erikh)
Commit 09ee269d ("devmapper: Add option for specifying the thin pool
blocksize") also switched the default dm-thin-pool blocksize from 64K to
512K. That change unfortunately breaks the activation of dm-thin-pool
devices that were previously created using a 64K blocksize. Here is an
example of the dm-thin-pool activation failure users may experience:
device-mapper: thin: 253:4: pool target (204800 blocks) too small: expected 1638400
device-mapper: table: 253:4: thin-pool: preresume failed, error = -22
The reason for this is docker is passing 512K as the blocksize for a
dm-thin-pool that was previously created using a 64K blocksize. Docker
doesn't record the blocksize the is used when it creates a dm-thin-pool.
Until now it never had a need to do so because the blocksize was always
hardcoded. The dm-thin-pool blocksize must be the same every time a
dm-thin-pool is activated.
As a stop-gap fix, revert to using 64K for the default blocksize.
But we do need a proper fix for this now that 'dm.blocksize' is exposed
as a proper storage option. One possible fix would be to record the
blocksize for each dm-thin-pool that docker creates and to pass that
recorded blocksize down in the dmsetup table load each time the
dm-thin-pool is activated (this would be comparable to what lvm2 does).
Docker-DCO-1.1-Signed-off-by: Mike Snitzer <snitzer@redhat.com> (github: snitm)
Add dm.blocksize option that you can use with --storage-opt to set a
specific blocksize for the thin provisioning pool.
Also change the default dm-thin-pool blocksize from 64K to 512K. This
strikes a balance between the desire to have smaller blocksize given
docker's use of snapshots versus the desire to have more performance
that comes with using a larger blocksize. But if very small files will
be used on average the user is encouraged to override this default.
Docker-DCO-1.1-Signed-off-by: Mike Snitzer <snitzer@redhat.com> (github: snitm)
Device Mapper needs device sizes in binary (1024) multiples. Otherwise
kernel checks can find that the specified thin-pool device sizes aren't
a multiple of the specified thin-pool blocksize.
The name for "RAMInBytes" is likely too narrow given the new consumers
but... Also add "tebibyte" support to RAMInBytes.
Docker-DCO-1.1-Signed-off-by: Mike Snitzer <snitzer@redhat.com> (github: snitm)
createPool() and reloadPool() should be consistent with the thin-pool
table params they use.
Since createPool() specifies '1 skip_block_zeroing' reloadPool() should
too. Otherwise, if the pool is reloaded (as is done when resizing
loopback devices) block zeroing will be enabled after the reload
completes.
Docker-DCO-1.1-Signed-off-by: Mike Snitzer <snitzer@redhat.com> (github: snitm)
If this is at the root directory for the daemon you could unmount
somones filesystem when you stop docker and this is actually only needed
for the palces that the graph drivers mount the container's root
filesystems.
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
The blkdiscard hack we do on container/image delete is pretty slow, but
required to restore space to the "host" root filesystem. However, it
is pretty useless on raw devices, and you may not need it in development
either.
In a simple test of the devicemapper backend on loopback the time to
delete 20 container went from 11 seconds to 0.4 seconds with
--storage-opt blkdiscard=false.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
This adds dm.datadev and dm.metadatadev options that you can use with
--storage-opt to set to specific devices to use for the thin
provisioning pool.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
This adds the following --storage-opts for the daemon:
dm.fs: The filesystem to use for the base image
dm.mkfsarg: Add an argument to the mkfs command for the base image
dm.mountopt: Add a mount option for devicemapper mount
Currently supported filesystems are xfs and ext4.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
This allows setting these settings to be passed:
dm.basesize
dm.loopdatasize
dm.loopmetadatasize
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
If we can't even get the current device mapper driver version, then
we cleanly fail the devmapper driver as not supported and fall back
on the next one.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
There are two cases where we can't use a graphdriver:
1) the graphdriver itself isn't supported by the system
2) the graphdriver is supported by some configuration/prerequisites are
missing
This introduces a new error for the 2) case and uses it when trying to
run docker with btrfs backend on a non-btrfs filesystem.
Docker-DCO-1.1-Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org> (github: discordianfish)
There is no reason to do discard durink mkfs, as the filesystem
is on a newly allocated device anyway. Discard is a slow operation,
so this may help initial startup a bit, especially if you use a larger
thin pool.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
Now that we have the generic graphtest tests that actually tests
the driver we can remove the old mock-using tests. Almost all of
these tests were disabled anyway, and the four remaining ones
didn't really test much while at the same time being really
fragile and making the rest of the code more complex due to
the mocking setup.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
For now this means the btrfs backend is skipped when run
inside make test. You can however run it manually if you want.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
If a graphdriver fails initialization due to ErrNotSupported we ignore
that and keep trying the next. But if some driver has a different
error (for instance if you specified an unknown option for it) we fail
the daemon startup, printing the error, rather than falling back to an
unexected driver (typically vfs) which may not match what you have run
earlier.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
This adds daemon/graphdriver/graphtest/graphtest which has a few
generic tests for all graph drivers, and then uses these
from the btrs, devicemapper and vfs backends.
I've not yet added the aufs backend, because i can't test that here
atm. It should work though.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
Currently the tests that mocks or denies functions leave this state
around for the next test. This is no good if we want to actually
test the devicemapper code in later tests.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
This has every container using the docker daemon's pid for the processes
label so it does not work correctly.
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
This allows multiple instances of the backend in different containers
to access devices (although generally only one can modify/create them).
Any old metadata is converted on the first run.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
Instead of globally keeping track of the free device ids we just
start from 0 each run and handle EEXIST error and try the next one.
This way we don't need any global state for the device ids, which
means we can read device metadata lazily. This is important for
multi-process use of the backend.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
This moves the EBUSY detection to devmapper.go, and then returns
a real ErrBusy that deviceset uses.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
We used to mount in Create() to be able to create a few files that
needs to be in each device. However, this mount is problematic for
selinux, as we need to set the mount label at mount-time, and it
is not known at the time of Create().
This change just moves the file creation to first Get() call and
drops the mount from Create(). Additionally, this lets us remove
some complexities we had to avoid an extra unmount+mount cycle.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)