Fixes#1171Fixes#6465
Data passed to mount(2) is clipped to PAGE_SIZE if its bigger. Previous
implementation checked if error was returned and then started to append layers
one by one. But if the PAGE_SIZE clipping appeared in between the paths, in the
permission sections or in xino definition the call would not error and
remaining layers would just be skipped(or some other unknown situation).
This also optimizes system calls as it tries to mount as much as possible with
the first mount.
Signed-off-by: Tõnis Tiigi <tonistiigi@gmail.com> (github: tonistiigi)
Took care of some review comments from crosbymichael.
v2:
- Return "err = nil" if file deviceset-metadata file does not exist.
- Use json.Decoder() interface for loading deviceset metadata.
v3:
- Reverted back to json marshal interface in loadDeviceSetMetaData().
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
In previous patch I had introduce json:"-" tags to be on safer side to make
sure certain fields are not marshalled/unmarshalled. But struct fields
starting with small letter are not exported so they will not be marshalled
anyway. So remove json:"-" tags from there.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
My pull request failed the build due to gofmat issues. I have run gofmt
on specified files and this commit fixes it.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
The way thin-pool right now is designed, user space is supposed to keep
track of what device ids have already been used. If user space tries to
create a new thin/snap device and device id has already been used, thin
pool retuns -EEXIST.
Upon receiving -EEXIST, current docker implementation simply tries the
NextDeviceId++ and keeps on doing this till it finds a free device id.
This approach has two issues.
- It is little suboptimal.
- If device id already exists, current kenrel implementation spits out
a messsage on console.
[17991.140135] device-mapper: thin: Creation of new snapshot 33 of device 3 failed.
Here kenrel is trying to tell user that device id 33 has already been used.
And this shows up for every device id docker tries till it reaches a point
where device ids are not used. So if there are thousands of container and
one is trying to create a new container after fresh docker start, expect
thousands of such warnings to flood console.
This patch saves the NextDeviceId in a file in
/var/lib/docker/devmapper/metadata/deviceset-metadata and reads it back
when docker starts. This way we don't retry lots of device ids which
have already been used.
There might be some device ids which are free but we will get back to them
once device numbers wrap around (24bit limit on device ids).
This patch should cut down on number of kernel warnings.
Notice that I am creating a deviceset metadata file which is a global file
for this pool. So down the line if we need to save more data we should be
able to do that.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
I was trying to save nextDeviceId to a file but it would not work and
json.Marshal() will do nothing. Then some search showed that I need to
make first letter of struct field capital, exporting this field and
now json.Marshal() works.
This is a preparatory patch for the next one.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Currently we save device metadata and have a helper function saveMetadata()
which converts data in json format as well as saves it to file. For
converting data in json format, one needs to know what is being saved.
Break this function down in two functions. One function only has file
write capability and takes in argument about byte array of json data.
Now this function does not have to know what data is being saved. It
only knows about a stream of json data is being saved to a file.
This allows me to reuse this function to save a different type of
metadata. In this case I am planning to save NextDeviceId so that
docker can use this device Id upon next restart. Otherwise docker
starts from 0 which is suboptimal.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
In an effort to make layer content 'stable' between import
and export from two different graph drivers, we must resolve
an issue where AUFS produces metadata files in its layers
which other drivers explicitly ignore when importing.
The issue presents itself like this:
- Generate a layer using AUFS
- On commit of that container, the new stored layer contains
AUFS metadata files/dirs. The stored layer content has some
tarsum value: '1234567'
- `docker save` that image to a USB drive and `docker load`
into another docker engine instance which uses another
graph driver, say 'btrfs'
- On load, this graph driver explicitly ignores any AUFS metadata
that it encounters. The stored layer content now has some
different tarsum value: 'abcdefg'.
The only (apparent) useful aufs metadata to keep are the psuedo link
files located at `/.wh..wh.plink/`. Thes files hold information at the
RW layer about hard linked files between this layer and another layer.
The other graph drivers make sure to copy up these psuedo linked files
but I've tested out a few different situations and it seems that this
is unnecessary (In my test, AUFS already copies up the other hard linked
files to the RW layer).
This changeset adds explicit exclusion of the AUFS metadata files and
directories (NOTE: not the whiteout files!) on commit of a container
using the AUFS storage driver.
Also included is a change to the archive package. It now explicitly
ignores the root directory from being included in the resulting tar archive
for 2 reasons: 1) it's unnecessary. 2) It's another difference between
what other graph drivers produce when exporting a layer to a tar archive.
Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)
The vfs storage driver currently shells out to the `cp` binary on the host
system to perform an 'archive' copy of the base image to a new directory.
The archive option preserves the modified time of the files which are created
but there was an issue where it was unable to preserve the modified time of
copied symbolic links on some host systems with an outdated version of `cp`.
This change no longer relies on the host system implementation and instead
utilizes the `CopyWithTar` function found in `pkg/archive` which is used
to copy from source to destination directory using a Tar archive, which
should correctly preserve file attributes.
Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)
Now that the archive package does not depend on any docker-specific
packages, only those in pkg and vendor, it can be safely moved into pkg.
Signed-off-by: Rafe Colton <rafael.colton@gmail.com>
If `--storage-opt dm.datadev=/dev/loop0 --storage-opt
dm.metadatadev=/dev/loop1 ` were provided, the information was not
reflected in the information output.
Closes: #7137
Signed-off-by: Vincent Batts <vbatts@redhat.com>
Some graphdrivers are Differs and type assertions are made
in various places throughout the project. Differ offers some
convenience in generating/applying diffs of filesystem layers
but for most graphdrivers another code path is taken.
This patch brings all of the logic related to filesystem
diffs in one place, and simplifies the implementation of some
common types like Image, Daemon, and Container.
Signed-off-by: Josh Hawn <josh.hawn@docker.com>
Since these will be shared between containers we want to label
them as svirt_sandbox_file_t:s0. That will allow multiple containers
to write to them.
Currently we are allowing container domains to read/write all content in
/var/lib/docker because of container volumes. This is a big security hole
in our SELinux story.
This patch will allow us to tighten up the security of docker containers.
Docker-DCO-1.1-Signed-off-by: Dan Walsh <dwalsh@redhat.com> (github: rhatdan)
functions to pkg/parsers/kernel, and parsing filters to
pkg/parsers/filter. Adjust imports and package references.
Docker-DCO-1.1-Signed-off-by: Erik Hollensbe <github@hollensbe.org> (github: erikh)
Commit 09ee269d ("devmapper: Add option for specifying the thin pool
blocksize") also switched the default dm-thin-pool blocksize from 64K to
512K. That change unfortunately breaks the activation of dm-thin-pool
devices that were previously created using a 64K blocksize. Here is an
example of the dm-thin-pool activation failure users may experience:
device-mapper: thin: 253:4: pool target (204800 blocks) too small: expected 1638400
device-mapper: table: 253:4: thin-pool: preresume failed, error = -22
The reason for this is docker is passing 512K as the blocksize for a
dm-thin-pool that was previously created using a 64K blocksize. Docker
doesn't record the blocksize the is used when it creates a dm-thin-pool.
Until now it never had a need to do so because the blocksize was always
hardcoded. The dm-thin-pool blocksize must be the same every time a
dm-thin-pool is activated.
As a stop-gap fix, revert to using 64K for the default blocksize.
But we do need a proper fix for this now that 'dm.blocksize' is exposed
as a proper storage option. One possible fix would be to record the
blocksize for each dm-thin-pool that docker creates and to pass that
recorded blocksize down in the dmsetup table load each time the
dm-thin-pool is activated (this would be comparable to what lvm2 does).
Docker-DCO-1.1-Signed-off-by: Mike Snitzer <snitzer@redhat.com> (github: snitm)
Add dm.blocksize option that you can use with --storage-opt to set a
specific blocksize for the thin provisioning pool.
Also change the default dm-thin-pool blocksize from 64K to 512K. This
strikes a balance between the desire to have smaller blocksize given
docker's use of snapshots versus the desire to have more performance
that comes with using a larger blocksize. But if very small files will
be used on average the user is encouraged to override this default.
Docker-DCO-1.1-Signed-off-by: Mike Snitzer <snitzer@redhat.com> (github: snitm)
Device Mapper needs device sizes in binary (1024) multiples. Otherwise
kernel checks can find that the specified thin-pool device sizes aren't
a multiple of the specified thin-pool blocksize.
The name for "RAMInBytes" is likely too narrow given the new consumers
but... Also add "tebibyte" support to RAMInBytes.
Docker-DCO-1.1-Signed-off-by: Mike Snitzer <snitzer@redhat.com> (github: snitm)
createPool() and reloadPool() should be consistent with the thin-pool
table params they use.
Since createPool() specifies '1 skip_block_zeroing' reloadPool() should
too. Otherwise, if the pool is reloaded (as is done when resizing
loopback devices) block zeroing will be enabled after the reload
completes.
Docker-DCO-1.1-Signed-off-by: Mike Snitzer <snitzer@redhat.com> (github: snitm)