Kubernetes only permits RuntimeClass values which are valid lowercase
RFC 1123 labels, which disallows the period character. This prevents
cri-dockerd from being able to support configuring alternative shimv2
runtimes for a pod as shimv2 runtime names must contain at least one
period character. Add support for configuring named shimv2 runtimes in
daemon.json so that runtime names can be aliased to
Kubernetes-compatible names.
Allow options to be set on shimv2 runtimes in daemon.json.
The names of the new daemon runtime config fields have been selected to
correspond with the equivalent field names in cri-containerd's
configuration so that users can more easily follow documentation from
the runtime vendor written for cri-containerd and apply it to
daemon.json.
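
To illustrate the naming constraint described above, here is a minimal
sketch of a lowercase RFC 1123 label check. The helper name isValidLabel
and the example runtime names are assumptions made for illustration;
this is not the validation code used by Kubernetes or the daemon.

```go
package main

import (
	"fmt"
	"regexp"
)

// rfc1123Label matches a lowercase RFC 1123 label: alphanumerics and '-',
// starting and ending with an alphanumeric. Periods are not permitted.
var rfc1123Label = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// isValidLabel is a hypothetical helper for illustration only.
// RFC 1123 labels are limited to 63 characters.
func isValidLabel(name string) bool {
	return len(name) <= 63 && rfc1123Label.MatchString(name)
}

func main() {
	// A dotted shimv2-style name fails the check; a short alias passes.
	for _, name := range []string{"io.containerd.kata.v2", "kata"} {
		fmt.Printf("%-24s valid=%v\n", name, isValidLabel(name))
	}
}
```

A dotted shimv2 runtime name cannot pass such a check, which is why an
alias is needed.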
Signed-off-by: Cory Snider <csnider@mirantis.com>
PTR queries with domain names unknown to us are not necessarily invalid.
Act like a well-behaved middlebox and fall back to forwarding
externally, same as we do with the other query types.
Signed-off-by: Cory Snider <csnider@mirantis.com>
...for limiting concurrent external DNS requests with
"golang.org/x/sync/semaphore".Weighted. Replace the ad-hoc rate limiter
for when the concurrency limit is hit (which contains a data-race bug)
with "golang.org/x/time/rate".Sometimes.
Immediately retrying with the next server if the concurrency limit has
been hit just further compounds the problem. Wait on the semaphore and
refuse the query if it could not be acquired in a reasonable amount of
time.
Signed-off-by: Cory Snider <csnider@mirantis.com>
It handles figuring out the UDP receive buffer size and setting IO
timeouts, which simplifies our code. It is also more robust to receiving
UDP replies to earlier queries which timed out.
Log failures to perform a client exchange at level error so they are
more visible to operators and administrators.
Signed-off-by: Cory Snider <csnider@mirantis.com>
forwardExtDNS() will now continue with the next external DNS server if
co.ReadMsg() returns (nil, nil). Previously it would abort resolving the
query and not reply to the container client. The implementation of
ReadMsg() in the currently-vendored version of miekg/dns cannot return
(nil, nil) so the difference is immaterial in practice.
Signed-off-by: Cory Snider <csnider@mirantis.com>
(*dns.Msg).Truncate() is more intelligent and standards-compliant about
truncating DNS response messages than our hand-rolled version. Fix a
silly fencepost error in the max TCP message size: the limit is
dns.MaxMsgSize (65535), full stop.
Signed-off-by: Cory Snider <csnider@mirantis.com>
The TC flag in a DNS message indicates that the sender had to
truncate it to fit within the length limit of the transmission channel.
It does NOT indicate that part of the message was lost before reaching
the recipient. Older versions of github.com/miekg/dns conflated the two
cases by returning ErrTruncated from ReadMsg() if the message was parsed
without error but had the TC flag set. The version of miekg/dns
currently vendored no longer returns an error when a well-formed DNS
message is received which has its TC flag set, but there was some
confusion on how to update libnetwork to deal with this behaviour
change. Truncated DNS replies are no longer different from any other
reply message: they are normal replies which do not need any special-
case handling to proxy back to the client.
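
As a rough illustration of the point that TC is an ordinary header flag
and not a parse failure, the sketch below reads the bit from a raw
message using only the standard library. The function name isTruncated
is hypothetical; it is not part of miekg/dns or libnetwork.

```go
package main

import "fmt"

// isTruncated reports whether the TC bit is set in a raw DNS message.
// The DNS header is 12 bytes; TC is bit 0x02 of the third byte.
// An error is returned only when the header itself is incomplete.
func isTruncated(msg []byte) (bool, error) {
	if len(msg) < 12 {
		return false, fmt.Errorf("short DNS header: %d bytes", len(msg))
	}
	return msg[2]&0x02 != 0, nil
}

func main() {
	// A well-formed response header with QR (0x80) and TC (0x02) set.
	reply := make([]byte, 12)
	reply[2] = 0x80 | 0x02

	tc, err := isTruncated(reply)
	// The flag is set and there is no error: the reply can be proxied as-is.
	fmt.Println(tc, err)
}
```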
Signed-off-by: Cory Snider <csnider@mirantis.com>
go1.19.6 (released 2023-02-14) includes security fixes to the crypto/tls,
mime/multipart, net/http, and path/filepath packages, as well as bug fixes to
the go command, the linker, the runtime, and the crypto/x509, net/http, and
time packages. See the Go 1.19.6 milestone on our issue tracker for details:
https://github.com/golang/go/issues?q=milestone%3AGo1.19.6+label%3ACherryPickApproved
From the announcement on the security mailing list:
We have just released Go versions 1.20.1 and 1.19.6, minor point releases.
These minor releases include 4 security fixes following the security policy:
- path/filepath: path traversal in filepath.Clean on Windows
On Windows, the filepath.Clean function could transform an invalid path such
as a/../c:/b into the valid path c:\b. This transformation of a relative (if
invalid) path into an absolute path could enable a directory traversal attack.
The filepath.Clean function will now transform this path into the relative
(but still invalid) path .\c:\b.
This is CVE-2022-41722 and Go issue https://go.dev/issue/57274.
- net/http, mime/multipart: denial of service from excessive resource
consumption
Multipart form parsing with mime/multipart.Reader.ReadForm can consume largely
unlimited amounts of memory and disk files. This also affects form parsing in
the net/http package with the Request methods FormFile, FormValue,
ParseMultipartForm, and PostFormValue.
ReadForm takes a maxMemory parameter, and is documented as storing "up to
maxMemory bytes +10MB (reserved for non-file parts) in memory". File parts
which cannot be stored in memory are stored on disk in temporary files. The
unconfigurable 10MB reserved for non-file parts is excessively large and can
potentially open a denial of service vector on its own. However, ReadForm did
not properly account for all memory consumed by a parsed form, such as map
entry overhead, part names, and MIME headers, permitting a maliciously crafted
form to consume well over 10MB. In addition, ReadForm contained no limit on
the number of disk files created, permitting a relatively small request body
to create a large number of disk temporary files.
ReadForm now properly accounts for various forms of memory overhead, and
should now stay within its documented limit of 10MB + maxMemory bytes of
memory consumption. Users should still be aware that this limit is high and
may still be hazardous.
ReadForm now creates at most one on-disk temporary file, combining multiple
form parts into a single temporary file. The mime/multipart.File interface
type's documentation states, "If stored on disk, the File's underlying
concrete type will be an *os.File.". This is no longer the case when a form
contains more than one file part, due to this coalescing of parts into a
single file. The previous behavior of using distinct files for each form part
may be reenabled with the environment variable
GODEBUG=multipartfiles=distinct.
Users should be aware that multipart.ReadForm and the http.Request methods
that call it do not limit the amount of disk consumed by temporary files.
Callers can limit the size of form data with http.MaxBytesReader.
This is CVE-2022-41725 and Go issue https://go.dev/issue/58006.
- crypto/tls: large handshake records may cause panics
Both clients and servers may send large TLS handshake records which cause
servers and clients, respectively, to panic when attempting to construct
responses.
This affects all TLS 1.3 clients, TLS 1.2 clients which explicitly enable
session resumption (by setting Config.ClientSessionCache to a non-nil value),
and TLS 1.3 servers which request client certificates (by setting
Config.ClientAuth >= RequestClientCert).
This is CVE-2022-41724 and Go issue https://go.dev/issue/58001.
- net/http: avoid quadratic complexity in HPACK decoding
A maliciously crafted HTTP/2 stream could cause excessive CPU consumption
in the HPACK decoder, sufficient to cause a denial of service from a small
number of small requests.
This issue is also fixed in golang.org/x/net/http2 v0.7.0, for users manually
configuring HTTP/2.
This is CVE-2022-41723 and Go issue https://go.dev/issue/57855.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
TestRequestReleaseAddressDuplicate gets flagged by go test -race because
the same err variable inside the test is assigned to from multiple
goroutines without synchronization, which obscures whether or not there
are any data races in the code under test.
Trouble is, the test _depends on_ the data race to exit the loop if an
error occurs inside a spawned goroutine. And the test contains a logical
concurrency bug (not flagged by the Go race detector) which can result
in false-positive test failures. Because a release operation is logged
after the IP is released, the other goroutine could reacquire the
address and log that it was reacquired before the release is logged.
Fix up the test so it is no longer subject to data races or
false-positive test failures, i.e. flakes.
Signed-off-by: Cory Snider <csnider@mirantis.com>
If the resolver encounters an error before it attempts to forward the
request to external DNS, do not try to log information about the
external connection, because at this point `extConn` is `nil`. This
makes sure `dockerd` won't panic and crash from a nil pointer
dereference when it sees an invalid DNS query.
Fixes #44979
Signed-off-by: er0k <er0k@er0k.net>
(cherry picked from commit 6c2637be11)
Signed-off-by: Bjorn Neergaard <bneergaard@mirantis.com>
This reverts commit ab3fa46502.
This fix was partial, and is not needed with the proper fix in
containerd.
Signed-off-by: Bjorn Neergaard <bneergaard@mirantis.com>
The errors are already returned to the client in the API response, so
logging them to the daemon log is redundant. Log the errors at level
Debug so as not to pollute the end-users' daemon logs with noise.
Refactor the logs to use structured fields. Add the request context to
the log entry so that logrus hooks could annotate the log entries with
contextual information about the API request in the hypothetical future.
Fixes #44997
Signed-off-by: Cory Snider <csnider@mirantis.com>
Go 1.20 made a change to the behaviour of package "os/exec" which was
not mentioned in the release notes:
2b8f214094
Attempts to execute a directory now return syscall.EISDIR instead of
syscall.EACCES. Check for EISDIR errors from the runtime and fudge the
returned error message to maintain compatibility with existing versions
of docker/cli when using a version of runc compiled with Go 1.20+.
Signed-off-by: Cory Snider <csnider@mirantis.com>
maxDownloadAttempts maps to the daemon configuration flag
--max-download-attempts int
Set the max download attempts for each pull (default 5)
and the daemon configuration machinery interprets a value of 0 as "apply
the default value" and not a valid user value (config validation/
normalization bugs notwithstanding). The intention is clearly that this
configuration value should be an upper limit on the number of times the
daemon should try to download a particular layer before giving up. So it
is surprising to have the configuration value interpreted as a _retry_
limit. The daemon will make up to N+1 attempts to download a layer! This
also means users cannot disable retries even if they wanted to.
Fix the fencepost bug so that max attempts really means max attempts,
not max retries. And fix the fencepost bug with the retry-backoff delay
so that the first backoff is 5s, not 10s.
Signed-off-by: Cory Snider <csnider@mirantis.com>
"math/rand".Seed
- Migrate to using local RNG instances.
"archive/tar".TypeRegA
- The deprecated constant tar.TypeRegA is the same value as
tar.TypeReg and so is not needed at all.
Signed-off-by: Cory Snider <csnider@mirantis.com>
This addresses the same CVE as is patched in go1.19.6. From that announcement:
> net/http: avoid quadratic complexity in HPACK decoding
>
> A maliciously crafted HTTP/2 stream could cause excessive CPU consumption
> in the HPACK decoder, sufficient to cause a denial of service from a small
> number of small requests.
>
> This issue is also fixed in golang.org/x/net/http2 v0.7.0, for users manually
> configuring HTTP/2.
>
> This is CVE-2022-41723 and Go issue https://go.dev/issue/57855.
full diff: https://github.com/golang/net/compare/v0.5.0...v0.7.0
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
DNS servers in the loopback address range should always be resolved in
the host network namespace when the servers are configured by reading
from the host's /etc/resolv.conf. The daemon mistakenly conflated the
presence of DNS options (docker run --dns-opt) with user-supplied DNS
servers, treating the list of servers loaded from the host as a user-
supplied list and attempting to resolve in the container's network
namespace. Correct this oversight so that loopback DNS servers are only
resolved in the container's network namespace when the user provides the
DNS server list, irrespective of other DNS configuration.
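
The corrected decision can be summarised in a small sketch. The function
name resolveInHostNS and its parameters are hypothetical, and treating
non-loopback servers as never requiring the host namespace is an
assumption of this example, not something stated above.

```go
package main

import (
	"fmt"
	"net"
)

// resolveInHostNS is a hypothetical helper. It reports whether a DNS
// server should be queried from the host network namespace: only when
// the address is loopback and the server list came from the host's
// /etc/resolv.conf and not from the user. DNS options play no part.
func resolveInHostNS(server string, userSuppliedServers bool) bool {
	ip := net.ParseIP(server)
	return ip != nil && ip.IsLoopback() && !userSuppliedServers
}

func main() {
	fmt.Println(resolveInHostNS("127.0.0.53", false)) // loopback from host resolv.conf
	fmt.Println(resolveInHostNS("127.0.0.53", true))  // loopback supplied by the user
	fmt.Println(resolveInHostNS("192.0.2.1", false))  // non-loopback server
}
```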
Signed-off-by: Cory Snider <csnider@mirantis.com>
The per-network statistics counters are loaded and incremented without
any concurrency control. Use atomic integers to prevent data races
without having to add any synchronization.
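
A minimal sketch of the approach, assuming Go 1.19+ for the
atomic.Int64 type. The struct and field names are hypothetical and do
not match the actual libnetwork statistics types.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// netStats is a hypothetical per-network counter set.
type netStats struct {
	queries atomic.Int64
}

// hammer increments the counter from several goroutines at once and
// returns the final value. No mutex is needed around the counter.
func hammer(s *netStats, workers, perWorker int) int64 {
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				s.queries.Add(1)
			}
		}()
	}
	wg.Wait()
	return s.queries.Load()
}

func main() {
	fmt.Println(hammer(&netStats{}, 8, 1000)) // prints 8000
}
```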
Signed-off-by: Cory Snider <csnider@mirantis.com>
The (*bitmap.Handle).String() method can be rather expensive to call. It
is all the more tragic when the expensively-constructed string is
immediately discarded because the log level is not high enough. Let
logrus stringify the arguments to debug logs so they are only
stringified when the log level is high enough.
# Before
ok github.com/docker/docker/libnetwork/ipam 10.159s
# After
ok github.com/docker/docker/libnetwork/ipam 2.484s
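
The effect can be reproduced in miniature without logrus. In this
sketch, debugf is a hypothetical leveled logger and expensive is a
stand-in for a type with a costly String() method; neither is taken
from the real code.

```go
package main

import "fmt"

// expensive stands in for a value whose String() is costly.
// calls counts how often String() actually runs.
type expensive struct{ calls *int }

func (e expensive) String() string {
	*e.calls++
	return "bitmap{...}"
}

// debugf is a hypothetical leveled logger: it formats its arguments
// only when the debug level is enabled.
func debugf(enabled bool, format string, args ...any) string {
	if !enabled {
		return ""
	}
	return fmt.Sprintf(format, args...)
}

func main() {
	calls := 0
	e := expensive{calls: &calls}

	debugf(false, "handle: %s", e.String()) // eager: String() runs, result discarded
	fmt.Println("after eager call:", calls)

	debugf(false, "handle: %s", e) // lazy: String() is never invoked
	fmt.Println("after lazy call:", calls)
}
```

Passing the value itself defers the String() call to the formatter, so
the cost is paid only when the message is actually emitted.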
Signed-off-by: Cory Snider <csnider@mirantis.com>
These conditions were added in 8cf89245f5
to account for old versions of debian/ubuntu (apparmor_parser < 2.9)
that lacked some options;
> This allows us to use the apparmor profile we have in contrib/apparmor/
> and solves the problems where certain functions are not apparent on older
> versions of apparmor_parser on debian/ubuntu.
Those patches were from 2015/2016, and all currently supported distro
versions should now have more current versions than that. Looking at the
oldest supported versions;
Ubuntu 18.04 "Bionic":
apparmor_parser --version
AppArmor parser version 2.12
Copyright (C) 1999-2008 Novell Inc.
Copyright 2009-2012 Canonical Ltd.
Debian 10 "Buster"
apparmor_parser --version
AppArmor parser version 2.13.2
Copyright (C) 1999-2008 Novell Inc.
Copyright 2009-2018 Canonical Ltd.
This patch removes the conditionals.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
These conditions were added in 8cf89245f5
to account for old versions of debian/ubuntu (apparmor_parser < 2.8.96)
that lacked some options;
> This allows us to use the apparmor profile we have in contrib/apparmor/
> and solves the problems where certain functions are not apparent on older
> versions of apparmor_parser on debian/ubuntu.
Those patches were from 2015/2016, and all currently supported distro
versions should now have more current versions than that. Looking at the
oldest supported versions;
Ubuntu 18.04 "Bionic":
apparmor_parser --version
AppArmor parser version 2.12
Copyright (C) 1999-2008 Novell Inc.
Copyright 2009-2012 Canonical Ltd.
Debian 10 "Buster"
apparmor_parser --version
AppArmor parser version 2.13.2
Copyright (C) 1999-2008 Novell Inc.
Copyright 2009-2018 Canonical Ltd.
This patch removes the conditionals.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>