Commit graph

178 commits

Author SHA1 Message Date
Rob Murray
6c68be24a2 Windows DNS resolver forwarding
Make the internal DNS resolver for Windows containers forward requests
to upsteam DNS servers when it cannot respond itself, rather than
returning SERVFAIL.

Windows containers are normally configured with the internal resolver
first for service discovery (container name lookup), then external
resolvers from '--dns' or the host's networking configuration.

When a tool like ping gets a SERVFAIL from the internal resolver, it
tries the other nameservers. But, nslookup does not, and with this
change it does not need to.

The internal resolver learns external server addresses from the
container's HNSEndpoint configuration, so it will use the same DNS
servers as processes in the container.

The internal resolver for Windows containers listens on the network's
gateway address, and each container may have a different set of external
DNS servers. So, the resolver uses the source address of the DNS request
to select external resolvers.

On Windows, daemon.json feature option 'windows-no-dns-proxy' can be used
to prevent the internal resolver from forwarding requests (restoring the
old behaviour).

Signed-off-by: Rob Murray <rob.murray@docker.com>
2024-04-16 18:57:28 +01:00
Rob Murray
beb97f7fdf Refactor 'resolv.conf' generation.
Replace regex matching/replacement and re-reading of generated files
with a simple parser, and struct to remember and manipulate the file
content.

Annotate the generated file with a header comment saying the file is
generated, but can be modified, and a trailing comment describing how
the file was generated and listing external nameservers.

Always start with the host's resolv.conf file, whether generating config
for host networking, or with/without an internal resolver - rather than
editing a file previously generated for a different use-case.

Resolves an issue where rewrites of the generated file resulted in
default IPv6 nameservers being unnecessarily added to the config.

Signed-off-by: Rob Murray <rob.murray@docker.com>
2024-02-06 22:26:12 +00:00
Paweł Gronowski
f07387466a
daemon/oci: Extract side effects from withMounts
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2024-01-19 17:27:16 +01:00
Sebastiaan van Stijn
cff4f20c44
migrate to github.com/containerd/log v0.1.0
The github.com/containerd/containerd/log package was moved to a separate
module, which will also be used by upcoming (patch) releases of containerd.

This patch moves our own uses of the package to use the new module.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-10-11 17:52:23 +02:00
Albin Kerouanton
ff503882f7
daemon: Improve NetworkingConfig & EndpointSettings validation
So far, only a subset of NetworkingConfig was validated when calling
ContainerCreate. Other parameters would be validated when the container
was started. And the same goes for EndpointSettings on NetworkConnect.

This commit adds two validation steps:

1. Check if the IP addresses set in endpoint's IPAMConfig are valid,
   when ContainerCreate and ConnectToNetwork is called ;
2. Check if the network allows static IP addresses, only on
   ConnectToNetwork as we need the libnetwork's Network for that and it
   might not exist until NetworkAttachment requests are sent to the
   Swarm leader (which happens only when starting the container) ;

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2023-09-18 17:21:06 +02:00
Sebastiaan van Stijn
dd26e6b15e
daemon: Daemon.getIpcContainer: make errors less repetitive
- Most error-message returned would already include "container" and the
  container ID in the error-message (e.g. "container %s is not running"),
  so there's no need to add a custom prefix for that.
- os.Stat returns a PathError, which already includes the operation ("stat"),
  the path, and the underlying error that occurred.

And while updating, let's also fix the name to be proper camelCase :)

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-08-24 16:20:42 +02:00
Sebastiaan van Stijn
3d94eb9bcd
daemon: Daemon.getPidContainer: change to accept "id" argument
This function didn't need the whole container, only its ID, so let's
use that as argument. This also makes it consistent with getIpcContainer.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-08-24 16:20:42 +02:00
Sebastiaan van Stijn
bc7f341f29
daemon: WithNamespaces(): fix incorrect error for PID, IPC namespace
`Daemon.getPidContainer()` was wrapping the error-message with a message
("cannot join PID of a non running container") that did not reflect the
actual reason for the error; `Daemon.GetContainer()` could either return
an invalid parameter (invalid / empty identifier), or a "not found" error
if the specified container-ID could not be found.

In the latter case, we don't want to return a "not found" error through
the API, as this would indicate that the container we're _starting_ was
not found (which is not the case), so we need to convert the error into
an `errdefs.ErrInvalidParameter` (the container-ID specified for the PID
namespace is invalid if the container doesn't exist).

This logic is similar to what we do for IPC namespaces. which received
a similar fix in c3d7a0c603.

This patch updates the error-types, and moves them into the getIpcContainer
and getPidContainer container functions, both of which should return
an "invalid parameter" if the container was not found.

It's worth noting that, while `WithNamespaces()` may return an "invalid
parameter" error, the `start` endpoint itself may _not_ be. as outlined
in commit bf1fb97575, starting a container
that has an invalid configuration should be considered an internal server
error, and is not an invalid _request_. However, for uses other than
container "start", `WithNamespaces()` should return the correct error
to allow code to handle it accordingly.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-08-24 16:19:07 +02:00
Sebastiaan van Stijn
13648a0e21
daemon: remove Daemon.checkContainer and related utils
This was added in 12485d62ee to save some
duplication, but was really over-engineered to save a few lines of code,
at the cost of hiding away what it does and also potentially returning
inconsistent errors (not addressed in this patch). Let's start with
inlining these.

This removes;

- Daemon.checkContainer
- daemon.containerIsRunning
- daemon.containerIsNotRestarting

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-08-24 16:12:18 +02:00
Sebastiaan van Stijn
603547fa19
daemon: change Daemon.setupPathsAndSandboxOptions to a regular func
It's not using the daemon in any way, so let's change it to a regular
function.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-08-02 16:14:15 +02:00
Sebastiaan van Stijn
5e2a1195d7
swap logrus types for their containerd/logs aliases
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-08-01 13:02:55 +02:00
Sebastiaan van Stijn
210932b3bf
daemon: format code with gofumpt
Formatting the code with https://github.com/mvdan/gofumpt

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-06-29 00:33:03 +02:00
Brian Goff
74da6a6363 Switch all logging to use containerd log pkg
This unifies our logging and allows us to propagate logging and trace
contexts together.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2023-06-24 00:23:44 +00:00
Cory Snider
0b592467d9 daemon: read-copy-update the daemon config
Ensure data-race-free access to the daemon configuration without
locking by mutating a deep copy of the config and atomically storing
a pointer to the copy into the daemon-wide configStore value. Any
operations which need to read from the daemon config must capture the
configStore value only once and pass it around to guarantee a consistent
view of the config.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-06-01 14:45:24 -04:00
Sebastiaan van Stijn
ab35df454d
remove pre-go1.17 build-tags
Removed pre-go1.17 build-tags with go fix;

    go mod init
    go fix -mod=readonly ./...
    rm go.mod

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-05-19 20:38:51 +02:00
Sebastiaan van Stijn
9d5e754caa
move pkg/system: process to a separate package
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-11-04 01:50:23 +01:00
Sebastiaan van Stijn
970ad4e3c7
pkg/system: IsProcessZombie() ignore "os.ErrNotExist" errors
If the file doesn't exist, the process isn't running, so we should be able
to ignore that.

Also remove an intermediate variable.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-11-04 01:49:49 +01:00
Sebastiaan van Stijn
ea1eb449b7
daemon: killWithSignal, killPossiblyDeadProcess: accept syscall.Signal
This helps reducing some type-juggling / conversions further up
the stack.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-05-05 00:53:52 +02:00
Brian Goff
03f1c3d78f
Lock down docker root dir perms.
Do not use 0701 perms.
0701 dir perms allows anyone to traverse the docker dir.
It happens to allow any user to execute, as an example, suid binaries
from image rootfs dirs because it allows traversal AND critically
container users need to be able to do execute things.

0701 on lower directories also happens to allow any user to modify
     things in, for instance, the overlay upper dir which neccessarily
     has 0755 permissions.

This changes to use 0710 which allows users in the group to traverse.
In userns mode the UID owner is (real) root and the GID is the remapped
root's GID.

This prevents anyone but the remapped root to traverse our directories
(which is required for userns with runc).

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit ef7237442147441a7cadcda0600be1186d81ac73)
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit 93ac040bf0)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-10-05 09:57:00 +02:00
Eng Zer Jun
c55a4ac779
refactor: move from io/ioutil to io and os package
The io/ioutil package has been deprecated in Go 1.16. This commit
replaces the existing io/ioutil functions with their new definitions in
io and os packages.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2021-08-27 14:56:57 +08:00
Sebastiaan van Stijn
686be57d0a
Update to Go 1.17.0, and gofmt with Go 1.17
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-08-24 23:33:27 +02:00
Brian Goff
4b981436fe Fixup libnetwork lint errors
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2021-06-01 23:48:32 +00:00
Brian Goff
a0a473125b Fix libnetwork imports
After moving libnetwork to this repo, we need to update all the import
paths for libnetwork to point to docker/docker/libnetwork instead of
docker/libnetwork.
This change implements that.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2021-06-01 21:51:23 +00:00
Cam
e57a365ab1
docker kill: fix bug where failed kills didnt fallback to unix kill
1. fixes #41587
2. removes potential infinite Wait and goroutine leak at end of kill
function

fixes #41587

Signed-off-by: Cam <gh@sparr.email>
2021-04-14 15:43:44 -07:00
Brian Goff
7f5e39bd4f
Use real root with 0701 perms
Various dirs in /var/lib/docker contain data that needs to be mounted
into a container. For this reason, these dirs are set to be owned by the
remapped root user, otherwise there can be permissions issues.
However, this uneccessarily exposes these dirs to an unprivileged user
on the host.

Instead, set the ownership of these dirs to the real root (or rather the
UID/GID of dockerd) with 0701 permissions, which allows the remapped
root to enter the directories but not read/write to them.
The remapped root needs to enter these dirs so the container's rootfs
can be configured... e.g. to mount /etc/resolve.conf.

This prevents an unprivileged user from having read/write access to
these dirs on the host.
The flip side of this is now any user can enter these directories.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit e908cc3901)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-02-02 13:01:25 +01:00
Sebastiaan van Stijn
7335167340
Remove redundant "os.IsNotExist" checks on os.RemoveAll()
`os.RemoveAll()` should never return this error. From the docs:

> If the path does not exist, RemoveAll returns nil (no error).

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-09-23 10:30:53 +02:00
Sebastiaan van Stijn
cf169b45bb
daemon.setupPathsAndSandboxOptions() skip resolving symlinks
This came up in a review of a5324d6950, but
for some reason that comment didn't find its way to GitHub, and/or I
forgot to push the change.

These files are "copied" by reading their content with ioutil.Readfile(),
resolving the symlinks should therefore not be needed, and paths can be
passed as-is;

```go
func copyFile(src, dst string) error {
	sBytes, err := ioutil.ReadFile(src)
	if err != nil {
		return err
	}
	return ioutil.WriteFile(dst, sBytes, filePerm)
}
```

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-08-13 13:54:09 +02:00
Sebastiaan van Stijn
a5324d6950
Better selection of DNS server
Commit e353e7e3f0 updated selection of the
`resolv.conf` file to use in situations where systemd-resolvd is used as
a resolver.

If a host uses `systemd-resolvd`, the system's `/etc/resolv.conf` file is
updated to set `127.0.0.53` as DNS, which is the local IP address for
systemd-resolvd. The DNS servers that are configured by the user will now
be stored in `/run/systemd/resolve/resolv.conf`, and systemd-resolvd acts
as a forwarding DNS for those.

Originally, Docker copied the DNS servers as configured in `/etc/resolv.conf`
as default DNS servers in containers, which failed to work if systemd-resolvd
is used (as `127.0.0.53` is not available inside the container's networking
namespace). To resolve this, e353e7e3f0 instead
detected if systemd-resolvd is in use, and in that case copied the "upstream"
DNS servers from the `/run/systemd/resolve/resolv.conf` configuration.

While this worked for most situations, it had some downsides, among which:

- we're skipping systemd-resolvd altogether, which means that we cannot take
  advantage of addition functionality provided by it (such as per-interface
  DNS servers)
- when updating DNS servers in the system's configuration, those changes were
  not reflected in the container configuration, which could be problematic in
  "developer" scenarios, when switching between networks.

This patch changes the way we select which resolv.conf to use as template
for the container's resolv.conf;

- in situations where a custom network is attached to the container, and the
  embedded DNS is available, we use `/etc/resolv.conf` unconditionally. If
  systemd-resolvd is used, the embedded DNS forwards external DNS lookups to
  systemd-resolvd, which in turn is responsible for forwarding requests to
  the external DNS servers configured by the user.
- if the container is running in "host mode" networking, we also use the
  DNS server that's configured in `/etc/resolv.conf`. In this situation, no
  embedded DNS server is available, but the container runs in the host's
  networking namespace, and can use the same DNS servers as the host (which
  could be systemd-resolvd or DNSMasq
- if the container uses the default (bridge) network, no embedded DNS is
  available, and the container has its own networking namespace. In this
  situation we check if systemd-resolvd is used, in which case we skip
  systemd-resolvd, and configure the upstream DNS servers as DNS for the
  container. This situation is the same as is used currently, which means
  that dynamically switching DNS servers won't be supported for these
  containers.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-05-25 18:20:56 +02:00
Brian Goff
620bce847d
Merge pull request #40749 from DanielQujun/zombie_check_for_container
add zombie check for container when killing it
2020-04-16 12:01:58 -07:00
Sebastiaan van Stijn
eb14d936bf
daemon: rename variables that collide with imported package names
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-04-14 17:22:23 +02:00
屈骏
f3c1eec99e add zombie check for container when killing it, alernative fix for #40735.
Signed-off-by: 屈骏 <qujun@tiduyun.com>
2020-04-10 16:46:31 +08:00
Kir Kolyshkin
39048cf656 Really switch to moby/sys/mount*
Switch to moby/sys/mount and mountinfo. Keep the pkg/mount for potential
outside users.

This commit was generated by the following bash script:

```
set -e -u -o pipefail

for file in $(git grep -l 'docker/docker/pkg/mount"' | grep -v ^pkg/mount); do
	sed -i -e 's#/docker/docker/pkg/mount"#/moby/sys/mount"#' \
		-e 's#mount\.\(GetMounts\|Mounted\|Info\|[A-Za-z]*Filter\)#mountinfo.\1#g' \
		$file
	goimports -w $file
done
```

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-20 09:46:25 -07:00
Sebastiaan van Stijn
e128f17508
Rename variable for consistency
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-09-03 10:51:45 +02:00
Tibor Vass
56bb8fb324
Merge pull request #38579 from thaJeztah/fix_net_host_systemd_resolved
Fix: network=host using wrong resolv.conf with systemd-resolved
2019-03-21 15:34:48 -07:00
Sebastiaan van Stijn
20dde01848
Move EnableServiceDiscoveryOnDefaultNetwork to container-operations
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-03-20 18:45:20 +01:00
Sebastiaan van Stijn
0169ad3e2a
Remove redundant isNetworkHotPluggable() function
All platforms now have hot-pluggable networks, so this
check was no longer needed.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-03-20 18:45:07 +01:00
Kir Kolyshkin
596ca142e0 daemon: use 'private' ipc mode by default
This changes the default ipc mode of daemon/engine to be private,
meaning the containers will not have their /dev/shm bind-mounted
from the host by default. The benefits of doing this are:

 1. No leaked mounts. Eliminate a possibility to leak mounts into
    other namespaces (and therefore unfortunate errors like "Unable to
    remove filesystem for <ID>: remove /var/lib/docker/containers/<ID>/shm:
    device or resource busy").

 2. Working checkpoint/restore. Make `docker checkpoint`
    not lose the contents of `/dev/shm`, but save it to
    the dump, and be restored back upon `docker start --checkpoint`
    (currently it is lost -- while CRIU handles tmpfs mounts,
    the "shareable" mount is seen as external to container,
    and thus rightfully ignored).

3. Better security. Currently any container is opened to share
   its /dev/shm with any other container.

Obviously, this change will break the following usage scenario:

 $ docker run -d --name donor busybox top
 $ docker run --rm -it --ipc container:donor busybox sh
 Error response from daemon: linux spec namespaces: can't join IPC
 of container <ID>: non-shareable IPC (hint: use IpcMode:shareable
 for the donor container)

The soution, as hinted by the (amended) error message, is to
explicitly enable donor sharing by using --ipc shareable:

 $ docker run -d --name donor --ipc shareable busybox top

Compatibility notes:

1. This only applies to containers created _after_ this change.
   Existing containers are not affected and will work fine
   as their ipc mode is stored in HostConfig.

2. Old backward compatible behavior ("shareable" containers
   by default) can be enabled by either using
   `--default-ipc-mode shareable` daemon command line option,
   or by adding a `"default-ipc-mode": "shareable"`
   line in `/etc/docker/daemon.json` configuration file.

3. If an older client (API < 1.40) is used, a "shareable" container
   is created. A test to check that is added.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2019-03-09 18:57:42 -08:00
Drew Erny
6f1d7ddfa4 Use Runtime target
The Swarmkit api specifies a target for configs called called "Runtime"
which indicates that the config is not mounted into the container but
has some other use. This commit updates the Docker api to reflect this.

Signed-off-by: Drew Erny <drew.erny@docker.com>
2019-02-19 13:14:17 -06:00
Sebastiaan van Stijn
8364d1c9d5
Fix: network=host using wrong resolv.conf with systemd-resolved
When running a container in the host's network namespace, the container
gets a copy of the host's resolv.conf (copied to `/etc/resolv.conf` inside
the container).

The current code always used the default (`/etc/resolv.conf`) path on the
host, irregardless if `systemd-resolved` was used or not.

This patch uses the correct file if `systemd-resolved` was detected
to be running.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-01-10 22:58:55 +01:00
Kir Kolyshkin
77bc327e24 UnmountIpcMount: simplify
As standard mount.Unmount does what we need, let's use it.

In addition, this adds ignoring "not mounted" condition, which
was previously implemented (see PR#33329, commit cfa2591d3f)
via a very expensive call to mount.Mounted().

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2018-12-10 20:06:10 -08:00
Salahuddin Khan
763d839261 Add ADD/COPY --chown flag support to Windows
This implements chown support on Windows. Built-in accounts as well
as accounts included in the SAM database of the container are supported.

NOTE: IDPair is now named Identity and IDMappings is now named
IdentityMapping.

The following are valid examples:
ADD --chown=Guest . <some directory>
COPY --chown=Administrator . <some directory>
COPY --chown=Guests . <some directory>
COPY --chown=ContainerUser . <some directory>

On Windows an owner is only granted the permission to read the security
descriptor and read/write the discretionary access control list. This
fix also grants read/write and execute permissions to the owner.

Signed-off-by: Salahuddin Khan <salah@docker.com>
2018-08-13 21:59:11 -07:00
Kazuhiro Sera
1e49fdcafc Fix the several typos detected by github.com/client9/misspell
Signed-off-by: Kazuhiro Sera <seratch@gmail.com>
2018-08-09 00:45:00 +09:00
Flavio Crisciani
e353e7e3f0
Fixes for resolv.conf
Handle the case of systemd-resolved, and if in place
use a different resolv.conf source.
Set appropriately the option on libnetwork.
Move unix specific code to container_operation_unix

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2018-07-26 11:17:56 -07:00
Michael Crosby
4c000662fe Don't make container mount unbindable
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2018-04-10 15:14:34 -04:00
Brian Goff
c02171802b Merge configs/secrets in unix implementation
On unix, merge secrets/configs handling. This is important because
configs can contain secrets (via templating) and potentially a config
could just simply have secret information "by accident" from the user.
This just make sure that configs are as secure as secrets and de-dups a
lot of code.
Generally this makes everything simpler and configs more secure.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-02-16 11:25:14 -05:00
Brian Goff
8e8f5f4457 Always mount configs with tmpfs
This makes configs and secrets behavior identical.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-02-16 11:25:14 -05:00
Aaron Lehmann
426f4e48e3 daemon: Check return value of createSecretDir
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
2018-02-16 11:25:14 -05:00
Aaron Lehmann
cd3d0486a6 Store configs that contain secrets on tmpfs
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
2018-02-16 11:25:14 -05:00
Daniel Nephin
4f0d95fa6e Add canonical import comment
Signed-off-by: Daniel Nephin <dnephin@docker.com>
2018-02-05 16:51:57 -05:00
Yong Tang
3305221eef Fix secret and config mode issue
This fix tries to address the issue raised in 36042
where secret and config are not configured with the
specified file mode.

This fix update the file mode so that it is not impacted
with umask.

Additional tests have been added.

This fix fixes 36042.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
2018-01-28 16:21:41 +00:00