Inspect and history used two different ways to find the present images.
This made history fail in some cases where image inspect would work (if
a configuration of a manifest wasn't found in the content store).
With this change we now use the same logic for both inspect and history.
Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
Add this syscall to match the profile in containerd
containerd: a6e52c74fa
libseccomp: 53267af3fb
kernel: 9f6c532f59
futex: Add sys_futex_wake()
To complement sys_futex_waitv() add sys_futex_wake(). This syscall
implements what was previously known as FUTEX_WAKE_BITSET except it
uses 'unsigned long' for the bitmask and takes FUTEX2 flags.
The 'unsigned long' allows FUTEX2_SIZE_U64 on 64bit platforms.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit d69729e053)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Add this syscall to match the profile in containerd
containerd: a6e52c74fa
libseccomp: 53267af3fb
kernel: cb8c4312af
futex: Add sys_futex_wait()
To complement sys_futex_waitv()/wake(), add sys_futex_wait(). This
syscall implements what was previously known as FUTEX_WAIT_BITSET
except it uses 'unsigned long' for the value and bitmask arguments,
takes timespec and clockid_t arguments for the absolute timeout and
uses FUTEX2 flags.
The 'unsigned long' allows FUTEX2_SIZE_U64 on 64bit platforms.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 10d344d176)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Add this syscall to match the profile in containerd
containerd: a6e52c74fa
libseccomp: 53267af3fb
kernel: 0f4b5f9722
futex: Add sys_futex_requeue()
Finish off the 'simple' futex2 syscall group by adding
sys_futex_requeue(). Unlike sys_futex_{wait,wake}() its arguments are
too numerous to fit into a regular syscall. As such, use struct
futex_waitv to pass the 'source' and 'destination' futexes to the
syscall.
This syscall implements what was previously known as FUTEX_CMP_REQUEUE
and uses {val, uaddr, flags} for source and {uaddr, flags} for
destination.
This design explicitly allows requeueing between different types of
futex by having a different flags word per uaddr.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit df57a080b6)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Add this syscall to match the profile in containerd
containerd: a6e52c74fa
libseccomp: 53267af3fb
kernel: c35559f94e
x86/shstk: Introduce map_shadow_stack syscall
When operating with shadow stacks enabled, the kernel will automatically
allocate shadow stacks for new threads, however in some cases userspace
will need additional shadow stacks. The main example of this is the
ucontext family of functions, which require userspace allocating and
pivoting to userspace managed stacks.
Unlike most other user memory permissions, shadow stacks need to be
provisioned with special data in order to be useful. They need to be setup
with a restore token so that userspace can pivot to them via the RSTORSSP
instruction. But, the security design of shadow stacks is that they
should not be written to except in limited circumstances. This presents a
problem for userspace, as to how userspace can provision this special
data, without allowing for the shadow stack to be generally writable.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 8826f402f9)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Add this syscall to match the profile in containerd
containerd: a6e52c74fa
libseccomp: 53267af3fb
kernel: 09da082b07
fs: Add fchmodat2()
On the userspace side fchmodat(3) is implemented as a wrapper
function which implements the POSIX-specified interface. This
interface differs from the underlying kernel system call, which does not
have a flags argument. Most implementations require procfs [1][2].
There doesn't appear to be a good userspace workaround for this issue
but the implementation in the kernel is pretty straight-forward.
The new fchmodat2() syscall allows to pass the AT_SYMLINK_NOFOLLOW flag,
unlike existing fchmodat.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 6f242f1a28)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Add this syscall to match the profile in containerd
containerd: a6e52c74fa
libseccomp: 53267af3fb
kernel: cf264e1329
NAME
cachestat - query the page cache statistics of a file.
SYNOPSIS
#include <sys/mman.h>
struct cachestat_range {
__u64 off;
__u64 len;
};
struct cachestat {
__u64 nr_cache;
__u64 nr_dirty;
__u64 nr_writeback;
__u64 nr_evicted;
__u64 nr_recently_evicted;
};
int cachestat(unsigned int fd, struct cachestat_range *cstat_range,
struct cachestat *cstat, unsigned int flags);
DESCRIPTION
cachestat() queries the number of cached pages, number of dirty
pages, number of pages marked for writeback, number of evicted
pages, number of recently evicted pages, in the bytes range given by
`off` and `len`.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 4d0d5ee10d)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This syscall is gated by CAP_SYS_NICE, matching the profile in containerd.
containerd: a6e52c74fa
libseccomp: d83cb7ac25
kernel: c6018b4b25
mm/mempolicy: add set_mempolicy_home_node syscall
This syscall can be used to set a home node for the MPOL_BIND and
MPOL_PREFERRED_MANY memory policy. Users should use this syscall after
setting up a memory policy for the specified range as shown below.
mbind(p, nr_pages * page_size, MPOL_BIND, new_nodes->maskp,
new_nodes->size + 1, 0);
sys_set_mempolicy_home_node((unsigned long)p, nr_pages * page_size,
home_node, 0);
The syscall allows specifying a home node/preferred node from which
kernel will fulfill memory allocation requests first.
...
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 1251982cf7)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
(cherry picked from commit 2c01d53d96)
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
The compatibility depends on whether `hyperv` or `process` container
isolation is used.
This fixes cache not being used when building images based on older
Windows versions on a newer Windows host.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
(cherry picked from commit 91ea04089b)
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
(cherry picked from commit 2ef0b53e51)
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
Only print the tag when the received reference has a tag, if
we can't cast the received tag to a `reference.Tagged` then
skip printing the tag as it's likely a digest.
Fixes panic when trying to install a plugin from a reference
with a digest such as
`vieux/sshfs@sha256:1d3c3e42c12138da5ef7873b97f7f32cf99fb6edde75fa4f0bcf9ed277855811`
Signed-off-by: Laura Brehm <laurabrehm@hey.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Adds a test case for installing a plugin from a remote in the form
of `plugin-content-trust@sha256:d98f2f8061...`, which is currently
causing the daemon to panic, as we found while running the CLI e2e
tests:
```
docker plugin install registry:5000/plugin-content-trust@sha256:d98f2f806144bf4ba62d4ecaf78fec2f2fe350df5a001f6e3b491c393326aedb
```
Signed-off-by: Laura Brehm <laurabrehm@hey.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Update the docker CLI that's available for debugging in the dev-shell
to the v25 release.
full diff: https://github.com/docker/cli/compare/v25.0.1...v25.0.2
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 9c92c07acf)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Update the docker CLI that's available for debugging in the dev-shell
to the v25 release.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 3eb1527fdb)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The monitorDaemon() goroutine calls startContainerd() then blocks on
<-daemonWaitCh to wait for it to exit. The startContainerd() function
would (re)initialize the daemonWaitCh so a restarted containerd could be
waited on. This implementation was race-free because startContainerd()
would synchronously initialize the daemonWaitCh before returning. When
the call to start the managed containerd process was moved into the
waiter goroutine, the code to initialize the daemonWaitCh struct field
was also moved into the goroutine. This introduced a race condition.
Move the daemonWaitCh initialization to guarantee that it happens before
the startContainerd() call returns.
Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit dd20bf4862)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
If a reader has caught up to the logger and is waiting for the next
message, it should stop waiting when the logger is closed. Otherwise
the reader will unnecessarily wait the full closedDrainTimeout for no
log messages to arrive.
This case was overlooked when the journald reader was recently
overhauled to be compatible with systemd 255, and the reader tests only
failed when a logical race happened to settle in such a way to exercise
the bugged code path. It was only after implicit flushing on close was
added to the journald test harness that the Follow tests would
repeatably fail due to this bug. (No new regression tests are needed.)
Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 987fe37ed1)
Signed-off-by: Cory Snider <csnider@mirantis.com>
The journald reader test harness injects an artificial asynchronous
delay into the logging pipeline: a logged message won't be written to
the journal until at least 150ms after the Log() call returns. If a test
returns while log messages are still in flight to be written, the logs
may attempt to be written after the TempDir has been cleaned up, leading
to spurious errors.
The logger read tests which interleave writing and reading have to
include explicit synchronization points to work reliably with this delay
in place. On the other hand, tests should not be required to sync the
logger explicitly before returning. Override the Close() method in the
test harness wrapper to wait for in-flight logs to be flushed to disk.
Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit d53b7d7e46)
Signed-off-by: Cory Snider <csnider@mirantis.com>
- Check the return value when logging messages
- Log the stream (stdout/stderr) and list of messages that were not read
- Wait until the logger is closed before returning early (panic/fatal)
Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 39c5c16521)
Signed-off-by: Cory Snider <csnider@mirantis.com>
Writing the systemd-journal-remote command output directly to os.Stdout
and os.Stderr makes it nearly impossible to tell which test case the
output is related to when the tests are not run in verbose mode. Extend
the journald sender fake to redirect output to the test log so they
interleave with the rest of the test output.
Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 5792bf7ab3)
Signed-off-by: Cory Snider <csnider@mirantis.com>
The Go race detector was detecting a data race when running the
TestLogRead/Follow/Concurrent test against the journald logging driver.
The race was in the test harness, specifically syncLogger. The waitOn
field would be reassigned each time a log entry is sent to the journal,
which is not concurrency-safe. Make it concurrency-safe using the same
patterns that are used in the log follower implementation to synchronize
with the logger.
Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 982e777d49)
Signed-off-by: Cory Snider <csnider@mirantis.com>
Since 964ab7158c, we explicitly set the bridge MTU if it was specified.
Unfortunately, kernel <v4.17 have a check preventing us to manually set
the MTU to anything greater than 1500 if no links is attached to the
bridge, which is how we do things -- create the bridge, set its MTU and
later on, attach veths to it.
Relevant kernel commit: 804b854d37
As we still have to support CentOS/RHEL 7 (and their old v3.10 kernels)
for a few more months, we need to ignore EINVAL if the MTU is > 1500
(but <= 65535).
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
(cherry picked from commit 89470a7114)
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Commit 4f47013feb introduced a new validation step to make sure no
IPv6 subnet is configured on a network which has EnableIPv6=false.
Commit 5d5eeac310 then removed that validation step and automatically
enabled IPv6 for networks with a v6 subnet. But this specific commit
was reverted in c59e93a67b and now the error introduced by 4f47013feb
is re-introduced.
But it turns out some users expect a network created with an IPv6
subnet and EnableIPv6=false to actually have no IPv6 connectivity.
This restores that behavior.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
(cherry picked from commit e37172c613)
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Do not set 'Config.MacAddress' in inspect output unless the MAC address
is configured.
Also, make sure it is filled in for a configured address on the default
network before the container is started (by translating the network name
from 'default' to 'config' so that the address lookup works).
Signed-off-by: Rob Murray <rob.murray@docker.com>
(cherry picked from commit 8c64b85fb9)
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
The API's EndpointConfig struct has a MacAddress field that's used for
both the configured address, and the current address (which may be generated).
A configured address must be restored when a container is restarted, but a
generated address must not.
The previous attempt to differentiate between the two, without adding a field
to the API's EndpointConfig that would show up in 'inspect' output, was a
field in the daemon's version of EndpointSettings, MACOperational. It did
not work, MACOperational was set to true when a configured address was
used. So, while it ensured addresses were regenerated, it failed to preserve
a configured address.
So, this change removes that code, and adds DesiredMacAddress to the wrapped
version of EndpointSettings, where it is persisted but does not appear in
'inspect' results. Its value is copied from MacAddress (the API field) when
a container is created.
Signed-off-by: Rob Murray <rob.murray@docker.com>
(cherry picked from commit dae33031e0)
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Containers attached to an 'internal' bridge network are unable to
communicate when the host is running firewalld.
Non-internal bridges are added to a trusted 'docker' firewalld zone, but
internal bridges were not.
DOCKER-ISOLATION iptables rules are still configured for an internal
network, they block traffic to/from addresses outside the network's subnet.
Signed-off-by: Rob Murray <rob.murray@docker.com>
(cherry picked from commit 2cc627932a)
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
It was introduced in API v1.38 but wasn't documented.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
(cherry picked from commit 0c3b8ccda7)
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
When saving an image treat `image@sha256:abcdef...` the same as
`abcdef...`, this makes it:
- Not export the digested tag as the image name
- Not try to export all tags from the image repository
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
(cherry picked from commit 5e13f54f57)
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
Saving an image via digested reference, ID or truncated ID doesn't store
the image reference in the archive. This also causes the save code to
not add the image's manifest to the index.json.
This commit explicitly adds the untagged manifests to the index.json if
no tagged manifests were added.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
(cherry picked from commit d131f00fff)
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>