This avoids two allocations when receiving network packets. One for
inserting a PacketWithTimestamp into m_packet_queue and another one
when inserting buffers into the list of unused packet buffers.
With this fixed the only allocations in NetworkTask happen when
initially allocating the PacketWithTimestamp structs and when switching
contexts.
Occasionally we'll see messages in the serial console like:
handle_tcp: unexpected flags in FinWait1 state
In these cases it would be nice to know what flags we are receiving that
we aren't expecting.
Previously we didn't retransmit lost TCP packets which would cause
connections to hang if packets were lost. Also we now time out
TCP connections after a number of retransmission attempts.
This wakes up NetworkTask every 500 milliseconds so that it can send
pending delayed TCP ACKs and isn't forced to send all of them early
when it goes to sleep like it did before.
When establishing the connection we should send ACKs right away so
as to not delay the connection process. This didn't previously
matter because we'd flush all delayed ACKs when NetworkTask waits
for incoming packets.
Avoid holding the sockets_by_tuple lock while allocating the TCPSocket.
While checking if the list contains the item we can also hold the lock
in shared mode, as we are only reading the hash table.
In addition the call to from_tuple appears to be superfluous, as we
created the socket, so we should be able to just return it directly.
This avoids the recursive lock acquisition, as well as the unnecessary
hash table lookups.
Note that the changes to IPv4Socket::create are unfortunately needed as
the return type of TCPSocket::create and IPv4Socket::create don't match.
- KResultOr<NonnullRefPtr<TcpSocket>>>
vs
- KResultOr<NonnullRefPtr<Socket>>>
To handle this we are forced to manually decompose the KResultOr<T> and
return the value() and error() separately.
This avoids allocations for initializing the Function<T>
for the NetworkAdapter::for_each callback argument.
Applying this patch decreases CPU utilization for NetworkTask
from 40% to 28% when receiving TCP packets at a rate of 100Mbit/s.
We already have another limit for the total number of packet buffers
allowed (max_packet_buffers). This second limit caused us to
repeatedly allocate and then free buffers.
This matches what other operating systems like Linux do:
$ ip route get 0.0.0.0
local 0.0.0.0 dev lo src 127.0.0.1 uid 1000
cache <local>
$ ssh 0.0.0.0
gunnar@0.0.0.0's password:
$ ss -na | grep :22 | grep ESTAB
tcp ESTAB 0 0 127.0.0.1:43118 127.0.0.1:22
tcp ESTAB 0 0 127.0.0.1:22 127.0.0.1:43118
When we receive a TCP packet with a sequence number that is not what
we expected we have lost one or more packets. We can signal this to
the sender by sending a TCP ACK with the previous ack number so that
they can resend the missing TCP fragments.
Previously we'd process TCP packets in whatever order we received
them in. In the case where packets arrived out of order we'd end
up passing garbage to the userspace process.
This was most evident for TLS connections:
courage:~ $ git clone https://github.com/SerenityOS/serenity
Cloning into 'serenity'...
remote: Enumerating objects: 178826, done.
remote: Counting objects: 100% (1880/1880), done.
remote: Compressing objects: 100% (907/907), done.
error: RPC failed; curl 56 OpenSSL SSL_read: error:1408F119:SSL
routines:SSL3_GET_RECORD:decryption failed or bad record mac, errno 0
error: 1918 bytes of body are still expected
fetch-pack: unexpected disconnect while reading sideband packet
fatal: early EOF
fatal: fetch-pack: invalid index-pack output
When the MSS option header is missing the default maximum segment
size is 536 which results in lots of very small TCP packets that
NetworkTask has to handle.
This adds the MSS option header to outbound TCP SYN packets and
sets it to an appropriate value depending on the interface's MTU.
Note that we do not currently do path MTU discovery so this could
cause problems when hops don't fragment packets properly.
This increases the default TCP window size to a more reasonable
value of 64k. This allows TCP peers to send us more packets before
waiting for corresponding ACKs.
This increases the buffer size for connection-oriented sockets
to 256kB. In combination with the other patches in this series
I was able to receive TCP packets at a rate of about 120Mbps.
Previously we'd incorrectly use the default gateway's MAC address.
Instead we must use destination MAC addresses that are derived from
the multicast IPv4 address.
With this patch applied I can query mDNS on a real network.
When reading UDP packets from userspace with recvmsg()/recv() we
would hit a VERIFY() if the supplied buffer is smaller than the
received UDP packet. Instead we should just return truncated data
to the caller.
This can be reproduced with:
$ dd if=/dev/zero bs=1k count=1 | nc -u 192.168.3.190 68
An IP socket can now join a multicast group by using the
IP_ADD_MEMBERSHIP sockopt, which will cause it to start receiving
packets sent to the multicast address, even though this address does
not belong to this host.
The previous patch already helped with this, however my idea of only
reading a few packets didn't work and we'd still sometimes end up not
receiving any more packets from the E1000 interface.
With this patch applied my NIC seems to receive packets just fine, at
least for now.
Raw sockets don't need a local port, so we shouldn't fail operations
if allocation yields an ENOPROTOOPT.
I'm not in love with the factoring here, just patching up the bug.
Without this patch we end up with sockets in the listener's accept
queue with state 'closed' when doing stealth SYN scans:
Client -> Server: SYN for port 22
Server -> Client: SYN/ACK
Client -> Server: RST (i.e. don't complete the TCP handshake)
This commit will add MSG_PEEK support, which allows a package to be
seen without taking it from the buffer, so that a subsequent recv()
without the MSG_PEEK flag can pick it up.
We had some inconsistencies before:
- Sometimes "The", sometimes "the"
- Sometimes trailing ".", sometimes no trailing "."
I picked the most common one (lowecase "the", trailing ".") and applied
it to all copyright headers.
By using the exact same string everywhere we can ensure nothing gets
missed during a global search (and replace), and that these
inconsistencies are not spread any further (as copyright headers are
commonly copied to new files).
Previously the E1000 network adapter would stop receiving further
packets when an RX buffer overrun occurred. This was the case
when connecting the adapter to a real network where enough broadcast
traffic caused the buffer to be full before the kernel had a chance
to clear the RX buffer.