Make the internal DNS resolver for Windows containers forward requests
to upsteam DNS servers when it cannot respond itself, rather than
returning SERVFAIL.
Windows containers are normally configured with the internal resolver
first for service discovery (container name lookup), then external
resolvers from '--dns' or the host's networking configuration.
When a tool like ping gets a SERVFAIL from the internal resolver, it
tries the other nameservers. But, nslookup does not, and with this
change it does not need to.
The internal resolver learns external server addresses from the
container's HNSEndpoint configuration, so it will use the same DNS
servers as processes in the container.
The internal resolver for Windows containers listens on the network's
gateway address, and each container may have a different set of external
DNS servers. So, the resolver uses the source address of the DNS request
to select external resolvers.
On Windows, daemon.json feature option 'windows-no-dns-proxy' can be used
to prevent the internal resolver from forwarding requests (restoring the
old behaviour).
Signed-off-by: Rob Murray <rob.murray@docker.com>
Replace regex matching/replacement and re-reading of generated files
with a simple parser, and struct to remember and manipulate the file
content.
Annotate the generated file with a header comment saying the file is
generated, but can be modified, and a trailing comment describing how
the file was generated and listing external nameservers.
Always start with the host's resolv.conf file, whether generating config
for host networking, or with/without an internal resolver - rather than
editing a file previously generated for a different use-case.
Resolves an issue where rewrites of the generated file resulted in
default IPv6 nameservers being unnecessarily added to the config.
Signed-off-by: Rob Murray <rob.murray@docker.com>
The github.com/containerd/containerd/log package was moved to a separate
module, which will also be used by upcoming (patch) releases of containerd.
This patch moves our own uses of the package to use the new module.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
So far, only a subset of NetworkingConfig was validated when calling
ContainerCreate. Other parameters would be validated when the container
was started. And the same goes for EndpointSettings on NetworkConnect.
This commit adds two validation steps:
1. Check if the IP addresses set in endpoint's IPAMConfig are valid,
when ContainerCreate and ConnectToNetwork is called ;
2. Check if the network allows static IP addresses, only on
ConnectToNetwork as we need the libnetwork's Network for that and it
might not exist until NetworkAttachment requests are sent to the
Swarm leader (which happens only when starting the container) ;
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
- Most error-message returned would already include "container" and the
container ID in the error-message (e.g. "container %s is not running"),
so there's no need to add a custom prefix for that.
- os.Stat returns a PathError, which already includes the operation ("stat"),
the path, and the underlying error that occurred.
And while updating, let's also fix the name to be proper camelCase :)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This function didn't need the whole container, only its ID, so let's
use that as argument. This also makes it consistent with getIpcContainer.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
`Daemon.getPidContainer()` was wrapping the error-message with a message
("cannot join PID of a non running container") that did not reflect the
actual reason for the error; `Daemon.GetContainer()` could either return
an invalid parameter (invalid / empty identifier), or a "not found" error
if the specified container-ID could not be found.
In the latter case, we don't want to return a "not found" error through
the API, as this would indicate that the container we're _starting_ was
not found (which is not the case), so we need to convert the error into
an `errdefs.ErrInvalidParameter` (the container-ID specified for the PID
namespace is invalid if the container doesn't exist).
This logic is similar to what we do for IPC namespaces. which received
a similar fix in c3d7a0c603.
This patch updates the error-types, and moves them into the getIpcContainer
and getPidContainer container functions, both of which should return
an "invalid parameter" if the container was not found.
It's worth noting that, while `WithNamespaces()` may return an "invalid
parameter" error, the `start` endpoint itself may _not_ be. as outlined
in commit bf1fb97575, starting a container
that has an invalid configuration should be considered an internal server
error, and is not an invalid _request_. However, for uses other than
container "start", `WithNamespaces()` should return the correct error
to allow code to handle it accordingly.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This was added in 12485d62ee to save some
duplication, but was really over-engineered to save a few lines of code,
at the cost of hiding away what it does and also potentially returning
inconsistent errors (not addressed in this patch). Let's start with
inlining these.
This removes;
- Daemon.checkContainer
- daemon.containerIsRunning
- daemon.containerIsNotRestarting
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Ensure data-race-free access to the daemon configuration without
locking by mutating a deep copy of the config and atomically storing
a pointer to the copy into the daemon-wide configStore value. Any
operations which need to read from the daemon config must capture the
configStore value only once and pass it around to guarantee a consistent
view of the config.
Signed-off-by: Cory Snider <csnider@mirantis.com>
If the file doesn't exist, the process isn't running, so we should be able
to ignore that.
Also remove an intermediate variable.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Do not use 0701 perms.
0701 dir perms allows anyone to traverse the docker dir.
It happens to allow any user to execute, as an example, suid binaries
from image rootfs dirs because it allows traversal AND critically
container users need to be able to do execute things.
0701 on lower directories also happens to allow any user to modify
things in, for instance, the overlay upper dir which neccessarily
has 0755 permissions.
This changes to use 0710 which allows users in the group to traverse.
In userns mode the UID owner is (real) root and the GID is the remapped
root's GID.
This prevents anyone but the remapped root to traverse our directories
(which is required for userns with runc).
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit ef7237442147441a7cadcda0600be1186d81ac73)
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit 93ac040bf0)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The io/ioutil package has been deprecated in Go 1.16. This commit
replaces the existing io/ioutil functions with their new definitions in
io and os packages.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
After moving libnetwork to this repo, we need to update all the import
paths for libnetwork to point to docker/docker/libnetwork instead of
docker/libnetwork.
This change implements that.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Various dirs in /var/lib/docker contain data that needs to be mounted
into a container. For this reason, these dirs are set to be owned by the
remapped root user, otherwise there can be permissions issues.
However, this uneccessarily exposes these dirs to an unprivileged user
on the host.
Instead, set the ownership of these dirs to the real root (or rather the
UID/GID of dockerd) with 0701 permissions, which allows the remapped
root to enter the directories but not read/write to them.
The remapped root needs to enter these dirs so the container's rootfs
can be configured... e.g. to mount /etc/resolve.conf.
This prevents an unprivileged user from having read/write access to
these dirs on the host.
The flip side of this is now any user can enter these directories.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit e908cc3901)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
`os.RemoveAll()` should never return this error. From the docs:
> If the path does not exist, RemoveAll returns nil (no error).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This came up in a review of a5324d6950, but
for some reason that comment didn't find its way to GitHub, and/or I
forgot to push the change.
These files are "copied" by reading their content with ioutil.Readfile(),
resolving the symlinks should therefore not be needed, and paths can be
passed as-is;
```go
func copyFile(src, dst string) error {
sBytes, err := ioutil.ReadFile(src)
if err != nil {
return err
}
return ioutil.WriteFile(dst, sBytes, filePerm)
}
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Commit e353e7e3f0 updated selection of the
`resolv.conf` file to use in situations where systemd-resolvd is used as
a resolver.
If a host uses `systemd-resolvd`, the system's `/etc/resolv.conf` file is
updated to set `127.0.0.53` as DNS, which is the local IP address for
systemd-resolvd. The DNS servers that are configured by the user will now
be stored in `/run/systemd/resolve/resolv.conf`, and systemd-resolvd acts
as a forwarding DNS for those.
Originally, Docker copied the DNS servers as configured in `/etc/resolv.conf`
as default DNS servers in containers, which failed to work if systemd-resolvd
is used (as `127.0.0.53` is not available inside the container's networking
namespace). To resolve this, e353e7e3f0 instead
detected if systemd-resolvd is in use, and in that case copied the "upstream"
DNS servers from the `/run/systemd/resolve/resolv.conf` configuration.
While this worked for most situations, it had some downsides, among which:
- we're skipping systemd-resolvd altogether, which means that we cannot take
advantage of addition functionality provided by it (such as per-interface
DNS servers)
- when updating DNS servers in the system's configuration, those changes were
not reflected in the container configuration, which could be problematic in
"developer" scenarios, when switching between networks.
This patch changes the way we select which resolv.conf to use as template
for the container's resolv.conf;
- in situations where a custom network is attached to the container, and the
embedded DNS is available, we use `/etc/resolv.conf` unconditionally. If
systemd-resolvd is used, the embedded DNS forwards external DNS lookups to
systemd-resolvd, which in turn is responsible for forwarding requests to
the external DNS servers configured by the user.
- if the container is running in "host mode" networking, we also use the
DNS server that's configured in `/etc/resolv.conf`. In this situation, no
embedded DNS server is available, but the container runs in the host's
networking namespace, and can use the same DNS servers as the host (which
could be systemd-resolvd or DNSMasq
- if the container uses the default (bridge) network, no embedded DNS is
available, and the container has its own networking namespace. In this
situation we check if systemd-resolvd is used, in which case we skip
systemd-resolvd, and configure the upstream DNS servers as DNS for the
container. This situation is the same as is used currently, which means
that dynamically switching DNS servers won't be supported for these
containers.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Switch to moby/sys/mount and mountinfo. Keep the pkg/mount for potential
outside users.
This commit was generated by the following bash script:
```
set -e -u -o pipefail
for file in $(git grep -l 'docker/docker/pkg/mount"' | grep -v ^pkg/mount); do
sed -i -e 's#/docker/docker/pkg/mount"#/moby/sys/mount"#' \
-e 's#mount\.\(GetMounts\|Mounted\|Info\|[A-Za-z]*Filter\)#mountinfo.\1#g' \
$file
goimports -w $file
done
```
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This changes the default ipc mode of daemon/engine to be private,
meaning the containers will not have their /dev/shm bind-mounted
from the host by default. The benefits of doing this are:
1. No leaked mounts. Eliminate a possibility to leak mounts into
other namespaces (and therefore unfortunate errors like "Unable to
remove filesystem for <ID>: remove /var/lib/docker/containers/<ID>/shm:
device or resource busy").
2. Working checkpoint/restore. Make `docker checkpoint`
not lose the contents of `/dev/shm`, but save it to
the dump, and be restored back upon `docker start --checkpoint`
(currently it is lost -- while CRIU handles tmpfs mounts,
the "shareable" mount is seen as external to container,
and thus rightfully ignored).
3. Better security. Currently any container is opened to share
its /dev/shm with any other container.
Obviously, this change will break the following usage scenario:
$ docker run -d --name donor busybox top
$ docker run --rm -it --ipc container:donor busybox sh
Error response from daemon: linux spec namespaces: can't join IPC
of container <ID>: non-shareable IPC (hint: use IpcMode:shareable
for the donor container)
The soution, as hinted by the (amended) error message, is to
explicitly enable donor sharing by using --ipc shareable:
$ docker run -d --name donor --ipc shareable busybox top
Compatibility notes:
1. This only applies to containers created _after_ this change.
Existing containers are not affected and will work fine
as their ipc mode is stored in HostConfig.
2. Old backward compatible behavior ("shareable" containers
by default) can be enabled by either using
`--default-ipc-mode shareable` daemon command line option,
or by adding a `"default-ipc-mode": "shareable"`
line in `/etc/docker/daemon.json` configuration file.
3. If an older client (API < 1.40) is used, a "shareable" container
is created. A test to check that is added.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The Swarmkit api specifies a target for configs called called "Runtime"
which indicates that the config is not mounted into the container but
has some other use. This commit updates the Docker api to reflect this.
Signed-off-by: Drew Erny <drew.erny@docker.com>
When running a container in the host's network namespace, the container
gets a copy of the host's resolv.conf (copied to `/etc/resolv.conf` inside
the container).
The current code always used the default (`/etc/resolv.conf`) path on the
host, irregardless if `systemd-resolved` was used or not.
This patch uses the correct file if `systemd-resolved` was detected
to be running.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
As standard mount.Unmount does what we need, let's use it.
In addition, this adds ignoring "not mounted" condition, which
was previously implemented (see PR#33329, commit cfa2591d3f)
via a very expensive call to mount.Mounted().
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This implements chown support on Windows. Built-in accounts as well
as accounts included in the SAM database of the container are supported.
NOTE: IDPair is now named Identity and IDMappings is now named
IdentityMapping.
The following are valid examples:
ADD --chown=Guest . <some directory>
COPY --chown=Administrator . <some directory>
COPY --chown=Guests . <some directory>
COPY --chown=ContainerUser . <some directory>
On Windows an owner is only granted the permission to read the security
descriptor and read/write the discretionary access control list. This
fix also grants read/write and execute permissions to the owner.
Signed-off-by: Salahuddin Khan <salah@docker.com>
Handle the case of systemd-resolved, and if in place
use a different resolv.conf source.
Set appropriately the option on libnetwork.
Move unix specific code to container_operation_unix
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
On unix, merge secrets/configs handling. This is important because
configs can contain secrets (via templating) and potentially a config
could just simply have secret information "by accident" from the user.
This just make sure that configs are as secure as secrets and de-dups a
lot of code.
Generally this makes everything simpler and configs more secure.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This fix tries to address the issue raised in 36042
where secret and config are not configured with the
specified file mode.
This fix update the file mode so that it is not impacted
with umask.
Additional tests have been added.
This fix fixes 36042.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>