environment not in the chroot from untrusted files.
See also OpenVZ a3f732ef75/src/enter.c (L227-L234)
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
(cherry picked from commit cea6dca993c2b4cfa99b1e7a19ca134c8ebc236b)
Signed-off-by: Tibor Vass <tibor@docker.com>
Path-specific rules were removed, so this is no longer used.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 530e63c1a61b105a6f7fc143c5acb9b5cd87f958)
Signed-off-by: Tibor Vass <tibor@docker.com>
(cherry picked from commit f8a0f26843)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Commit 77b8465d7e added a secret update
endpoint to allow updating labels on existing secrets. However, when
implementing the endpoint, the DebugRequestMiddleware was not updated
to scrub the Data field (as is being done when creating a secret).
When updating a secret (to set labels), the Data field should be either
`nil` (not set), or contain the same value as the existing secret. In
situations where the Data field is set, and the `dockerd` daemon is
running with debugging enabled / log-level debug, the base64-encoded
value of the secret is printed to the daemon logs.
The docker cli does not have a `docker secret update` command, but
when using `docker stack deploy`, the docker cli sends the secret
data both when _creating_ a stack, and when _updating_ a stack, thus
leaking the secret data if the daemon runs with debug enabled:
1. Start the daemon in debug-mode
dockerd --debug
2. Initialize swarm
docker swarm init
3. Create a file containing a secret
echo secret > my_secret.txt
4. Create a docker-compose file using that secret
cat > docker-compose.yml <<'EOF'
version: "3.3"
services:
web:
image: nginx:alpine
secrets:
- my_secret
secrets:
my_secret:
file: ./my_secret.txt
EOF
5. Deploy the stack
docker stack deploy -c docker-compose.yml test
6. Verify that the secret is scrubbed in the daemon logs
DEBU[2019-07-01T22:36:08.170617400Z] Calling POST /v1.30/secrets/create
DEBU[2019-07-01T22:36:08.171364900Z] form data: {"Data":"*****","Labels":{"com.docker.stack.namespace":"test"},"Name":"test_my_secret"}
7. Re-deploy the stack to trigger an "update"
docker stack deploy -c docker-compose.yml test
8. Notice that this time, the Data field is not scrubbed, and the base64-encoded secret is logged
DEBU[2019-07-01T22:37:35.828819400Z] Calling POST /v1.30/secrets/w3hgvwpzl8yooq5ctnyp71v52/update?version=34
DEBU[2019-07-01T22:37:35.829993700Z] form data: {"Data":"c2VjcmV0Cg==","Labels":{"com.docker.stack.namespace":"test"},"Name":"test_my_secret"}
This patch modifies `maskSecretKeys` to unconditionally scrub `Data` fields.
Currently, only the `secrets` and `configs` endpoints use a field with this
name, and no other POST API endpoints use a data field, so scrubbing this
field unconditionally will only scrub requests for those endpoints.
If a new endpoint is added in future where this field should not be scrubbed,
we can re-introduce more fine-grained (path-specific) handling.
This patch introduces some change in behavior:
- In addition to secrets, requests to create or update _configs_ will
now have their `Data` field scrubbed. Generally, the actual data should
not be interesting for debugging, so likely will not be problematic.
In addition, scrubbing this data for configs may actually be desirable,
because (even though they are not explicitely designed for this purpose)
configs may contain sensitive data (credentials inside a configuration
file, e.g.).
- Requests that send key/value pairs as a "map" and that contain a
key named "data", will see the value of that field scrubbed. This
means that (e.g.) setting a `label` named `data` on a config, will
scrub/mask the value of that label.
- Note that this is already the case for any label named `jointoken`,
`password`, `secret`, `signingcakey`, or `unlockkey`.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit c7ce4be93ae8edd2da62a588e01c67313a4aba0c)
Signed-off-by: Tibor Vass <tibor@docker.com>
(cherry picked from commit 73db8c77bf)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 32d70c7e21631224674cd60021d3ec908c2d888c)
Signed-off-by: Tibor Vass <tibor@docker.com>
(cherry picked from commit ebb542b3f8)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Add tests for
- case-insensitive matching of fields
- recursive masking
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit db5f811216e70bcb4a10e477c1558d6c68f618c5)
Signed-off-by: Tibor Vass <tibor@docker.com>
(cherry picked from commit 18dac2cf32)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Co-Authored-By: Jim Ehrismann <jim-docker@users.noreply.github.com>
Co-Authored-By: Sebastiaan van Stijn <thaJeztah@users.noreply.github.com>
Signed-off-by: Jim Ehrismann <jim.ehrismann@docker.com>
(cherry picked from commit a77e147d32)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This will add a warning log in the daemon, and will send the message
to be displayed by the CLI.
Signed-off-by: Tibor Vass <tibor@docker.com>
(cherry picked from commit d35f8f4329)
Signed-off-by: Tibor Vass <tibor@docker.com>
This reverts commit 98fc09128b in order to
keep registry v2 schema1 handling and libtrust-key-based engine ID.
Because registry v2 schema1 was not officially deprecated and
registries are still relying on it, this patch puts its logic back.
However, registry v1 relics are not added back since v1 logic has been
removed a while ago.
This also fixes an engine upgrade issue in a swarm cluster. It was relying
on the Engine ID to be the same upon upgrade, but the mentioned commit
modified the logic to use UUID and from a different file.
Since the libtrust key is always needed to support v2 schema1 pushes,
that the old engine ID is based on the libtrust key, and that the engine ID
needs to be conserved across upgrades, adding a UUID-based engine ID logic
seems to add more complexity than it solves the problems.
Hence reverting the engine ID changes as well.
Signed-off-by: Tibor Vass <tibor@docker.com>
(cherry picked from commit f695e98cb7)
Signed-off-by: Tibor Vass <tibor@docker.com>
This reverts commit 13b7d11be1.
Signed-off-by: Tibor Vass <tibor@docker.com>
(cherry picked from commit f23a51a860)
Signed-off-by: Tibor Vass <tibor@docker.com>
Previously, getWalkRoot("/", "foo") would return "//foo"
Now it returns "/foo"
Signed-off-by: Tibor Vass <tibor@docker.com>
(cherry picked from commit 7410f1a859)
Signed-off-by: Tibor Vass <tibor@docker.com>
Before 7a7357da, archive.TarResourceRebase was being used to copy files
and folders from the container. That function splits the source path
into a dirname + basename pair to support copying a file:
if you wanted to tar `dir/file` it would tar from `dir` the file `file`
(as part of the IncludedFiles option).
However, that path splitting logic was kept for folders as well, which
resulted in weird inputs to archive.TarWithOptions:
if you wanted to tar `dir1/dir2` it would tar from `dir1` the directory
`dir2` (as part of IncludedFiles option).
Although it was weird, it worked fine until we started chrooting into
the container rootfs when doing a `docker cp` with container source set
to `/` (cf 3029e765).
The fix is to only do the path splitting logic if the source is a file.
Unfortunately, 7a7357da added support for LCOW by duplicating some of
this subtle logic. Ideally we would need to do more refactoring of the
archive codebase to properly encapsulate these behaviors behind well-
documented APIs.
This fix does not do that. Instead, it fixes the issue inline.
Signed-off-by: Tibor Vass <tibor@docker.com>
(cherry picked from commit 171538c190)
Signed-off-by: Tibor Vass <tibor@docker.com>
CID=$(docker create alpine)
docker cp $CID:/ out
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Signed-off-by: Tibor Vass <tibor@docker.com>
(cherry picked from commit 6db9f1c3d6)
Signed-off-by: Tibor Vass <tibor@docker.com>
retries to attach to a network, it is already connected to
Fixes - https://github.com/docker/for-linux/issues/632
Signed-off-by: Arko Dasgupta <arko.dasgupta@docker.com>
(cherry picked from commit 871acb1c86)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This test runs on a daemon also used by other tests
so make sure we don't get failures if another test
doesn't cleanup or is running in parallel.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 915acffdb4)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This adds some logs, handles timers better, and sets a request timeout
for the ping request.
I'm not sure the ticker in that loop is what we really want since the
ticker keeps ticking while we are (attempting) to make a request... but
I opted to not change that for now.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit 20ea8942b8)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Moby currently sorts uid and gid ranges in id maps. This causes subuid
and subgid files to be interpreted wrongly.
The subuid file
```
> cat /etc/subuid
jonas:100000:1000
jonas:1000:1
```
configures that the container uids 0-999 are mapped to the host uids
100000-100999 and uid 1000 in the container is mapped to uid 1000 on the
host. The expected uid_map is:
```
> docker run ubuntu cat /proc/self/uid_map
0 100000 1000
1000 1000 1
```
Moby currently sorts the ranges by the first id in the range. Therefore
with the subuid file above the uid 0 in the container is mapped to uid
100000 on host and the uids 1-1000 in container are mapped to the uids
1-1000 on the host. The resulting uid_map is:
```
> docker run ubuntu cat /proc/self/uid_map
0 1000 1
1 100000 1000
```
The ordering was implemented to work around a limitation in Linux 3.8.
This is fixed since Linux 3.9 as stated on the user namespaces manpage
[1]:
> In the initial implementation (Linux 3.8), this requirement was
> satisfied by a simplistic implementation that imposed the further
> requirement that the values in both field 1 and field 2 of successive
> lines must be in ascending numerical order, which prevented some
> otherwise valid maps from being created. Linux 3.9 and later fix this
> limitation, allowing any valid set of nonoverlapping maps.
This fix changes the interpretation of subuid and subgid files which do
not have the ids of in the numerical order for each individual user.
This breaks users that rely on the current behaviour.
The desired mapping above - map low user ids in the container to high
user ids on the host and some higher user ids in the container to lower
user on host - can unfortunately not archived with the current
behaviour.
[1] http://man7.org/linux/man-pages/man7/user_namespaces.7.html
Signed-off-by: Jonas Dohse <jonas@dohse.ch>
(cherry picked from commit c4628d79d2)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Running a bundled aufs benchmark sometimes results in this warning:
> WARN[0001] Couldn't run auplink before unmount /tmp/aufs-tests/aufs/mnt/XXXXX error="exit status 22" storage-driver=aufs
If we take a look at what aulink utility produces on stderr, we'll see:
> auplink:proc_mnt.c:96: /tmp/aufs-tests/aufs/mnt/XXXXX: Invalid argument
and auplink exits with exit code of 22 (EINVAL).
Looking into auplink source code, what happens is it tries to find a
record in /proc/self/mounts corresponding to the mount point (by using
setmntent()/getmntent_r() glibc functions), and it fails.
Some manual testing, as well as runtime testing with lots of printf
added on mount/unmount, as well as calls to check the superblock fs
magic on mount point (as in graphdriver.Mounted(graphdriver.FsMagicAufs, target)
confirmed that this record is in fact there, but sometimes auplink
can't find it. I was also able to reproduce the same error (inability
to find a mount in /proc/self/mounts that should definitely be there)
using a small C program, mocking what `auplink` does:
```c
#include <stdio.h>
#include <err.h>
#include <mntent.h>
#include <string.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
FILE *fp;
struct mntent m, *p;
char a[4096];
char buf[4096 + 1024];
int found =0, lines = 0;
if (argc != 2) {
fprintf(stderr, "Usage: %s <mountpoint>\n", argv[0]);
exit(1);
}
fp = setmntent("/proc/self/mounts", "r");
if (!fp) {
err(1, "setmntent");
}
setvbuf(fp, a, _IOLBF, sizeof(a));
while ((p = getmntent_r(fp, &m, buf, sizeof(buf)))) {
lines++;
if (!strcmp(p->mnt_dir, argv[1])) {
found++;
}
}
printf("found %d entries for %s (%d lines seen)\n", found, argv[1], lines);
return !found;
}
```
I have also wrote a few other C proggies -- one that reads
/proc/self/mounts directly, one that reads /proc/self/mountinfo instead.
They are also prone to the same occasional error.
It is not perfectly clear why this happens, but so far my best theory
is when a lot of mounts/unmounts happen in parallel with reading
contents of /proc/self/mounts, sometimes the kernel fails to provide
continuity (i.e. it skips some part of file or mixes it up in some
other way). In other words, this is a kernel bug (which is probably
hard to fix unless some other interface to get a mount entry is added).
Now, there is no real fix, and a workaround I was able to come up
with is to retry when we got EINVAL. It usually works on the second
attempt, although I've once seen it took two attempts to go through.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit ae431b10a9)
Do not use filepath.Walk() as there's no requirement to recursively
go into every directory under mnt -- a (non-recursive) list of
directories in mnt is sufficient.
With filepath.Walk(), in case some container will fail to unmount,
it'll go through the whole container filesystem which is both
excessive and useless.
This is similar to commit f1a4592297 ("devmapper.shutdown:
optimize")
While at it, raise the priority of "unmount error" message from debug
to a warning. Note we don't have to explicitly add `m` as unmount error (from
pkg/mount) will have it.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit 8fda12c607)
In case there are a big number of layers, so that mount data won't fit
into a single memory page (4096 bytes on most platforms, which is good
enough for about 40 layers, depending on how long graphdriver root path
is), we supply additional layers with O_REMOUNT, as described in aufs
documentation.
Problem is, the current implementation does that one layer at a time
(i.e. there is one mount syscall per each additional layer).
Optimize the code to supply as many layers as we can fit in one page
(basically reusing the same code as for the original mount).
Note, per aufs docs, "[a]t remount-time, the options are interpreted
in the given order, e.g. left to right" so we should be good.
Tested on an image with ~100 layers.
Before (35 syscalls):
> [pid 22756] 1556919088.686955 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/a86f8c9dd0ec2486293119c20b0ec026e19bbc4d51332c554f7cf05d777c9866", "aufs", 0, "br:/mnt/volume_sfo2_09/docker-au"...) = 0 <0.000504>
> [pid 22756] 1556919088.687643 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/a86f8c9dd0ec2486293119c20b0ec026e19bbc4d51332c554f7cf05d777c9866", 0xc000c451b0, MS_REMOUNT, "append:/mnt/volume_sfo2_09/docke"...) = 0 <0.000105>
> [pid 22756] 1556919088.687851 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/a86f8c9dd0ec2486293119c20b0ec026e19bbc4d51332c554f7cf05d777c9866", 0xc000c451ba, MS_REMOUNT, "append:/mnt/volume_sfo2_09/docke"...) = 0 <0.000098>
> ..... (~30 lines skipped for clarity)
> [pid 22756] 1556919088.696182 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/a86f8c9dd0ec2486293119c20b0ec026e19bbc4d51332c554f7cf05d777c9866", 0xc000c45310, MS_REMOUNT, "append:/mnt/volume_sfo2_09/docke"...) = 0 <0.000266>
After (2 syscalls):
> [pid 24352] 1556919361.799889 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/8e7ba189e347a834e99eea4ed568f95b86cec809c227516afdc7c70286ff9a20", "aufs", 0, "br:/mnt/volume_sfo2_09/docker-au"...) = 0 <0.001717>
> [pid 24352] 1556919361.801761 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/8e7ba189e347a834e99eea4ed568f95b86cec809c227516afdc7c70286ff9a20", 0xc000dbecb0, MS_REMOUNT, "append:/mnt/volume_sfo2_09/docke"...) = 0 <0.001358>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit d58c434bff)