<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/net/ipv4/tcp_ipv4.c, branch linux-6.18.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-6.18.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-6.18.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2026-03-19T15:08:50Z</updated>
<entry>
<title>net/tcp-md5: Fix MAC comparison to be constant-time</title>
<updated>2026-03-19T15:08:50Z</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2026-03-10T20:16:36Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=ae3831b44f477de048287493e184fc3ff913b624'/>
<id>urn:sha1:ae3831b44f477de048287493e184fc3ff913b624</id>
<content type='text'>
commit 46d0d6f50dab706637f4c18a470aac20a21900d3 upstream.

To prevent timing attacks, MACs need to be compared in constant
time.  Use the appropriate helper function for this.

Fixes: cfb6eeb4c860 ("[TCP]: MD5 Signature Option (RFC2385) support.")
Fixes: 658ddaaf6694 ("tcp: md5: RST: getting md5 key from listener")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
Link: https://patch.msgid.link/20260302203409.13388-1-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>tcp: secure_seq: add back ports to TS offset</title>
<updated>2026-03-12T11:09:58Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2026-03-02T20:55:27Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=eae2f14ab2efccdb7480fae7d42c4b0116ef8805'/>
<id>urn:sha1:eae2f14ab2efccdb7480fae7d42c4b0116ef8805</id>
<content type='text'>
[ Upstream commit 165573e41f2f66ef98940cf65f838b2cb575d9d1 ]

This reverts 28ee1b746f49 ("secure_seq: downgrade to per-host timestamp offsets")

tcp_tw_recycle went away in 2017.

Zhouyan Deng reported off-path TCP source port leakage via
SYN cookie side-channel that can be fixed in multiple ways.

One of them is to bring back TCP ports in TS offset randomization.

As a bonus, we perform a single siphash() computation
to provide both an ISN and a TS offset.

Fixes: 28ee1b746f49 ("secure_seq: downgrade to per-host timestamp offsets")
Reported-by: Zhouyan Deng &lt;dengzhouyan_nwpu@163.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Acked-by: Florian Westphal &lt;fw@strlen.de&gt;
Link: https://patch.msgid.link/20260302205527.1982836-1-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>tcp: fix potential race in tcp_v6_syn_recv_sock()</title>
<updated>2026-03-04T12:20:44Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2026-02-17T16:12:05Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=fe89b2f05b854847784f91127319172945c1fadd'/>
<id>urn:sha1:fe89b2f05b854847784f91127319172945c1fadd</id>
<content type='text'>
[ Upstream commit 858d2a4f67ff69e645a43487ef7ea7f28f06deae ]

Code in tcp_v6_syn_recv_sock() after the call to tcp_v4_syn_recv_sock()
is done too late.

After tcp_v4_syn_recv_sock(), the child socket is already visible
from TCP ehash table and other cpus might use it.

Since newinet-&gt;pinet6 is still pointing to the listener ipv6_pinfo
bad things can happen as syzbot found.

Move the problematic code in tcp_v6_mapped_child_init()
and call this new helper from tcp_v4_syn_recv_sock() before
the ehash insertion.

This allows the removal of one tcp_sync_mss(), since
tcp_v4_syn_recv_sock() will call it with the correct
context.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+937b5bbb6a815b3e5d0b@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/69949275.050a0220.2eeac1.0145.GAE@google.com/
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Link: https://patch.msgid.link/20260217161205.2079883-1-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: psp: update the TCP MSS to reflect PSP packet overhead</title>
<updated>2025-09-18T10:32:06Z</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2025-09-17T00:09:36Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=e97269257fe437910cddc7c642a636ca3cf9fb1d'/>
<id>urn:sha1:e97269257fe437910cddc7c642a636ca3cf9fb1d</id>
<content type='text'>
PSP eats 40B of header space. Adjust MSS appropriately.

We can either modify tcp_mtu_to_mss() / tcp_mss_to_mtu()
or reuse icsk_ext_hdr_len. The former option is more TCP
specific and has runtime overhead. The latter is a bit
of a hack as PSP is not an ext_hdr. If one squints hard
enough, UDP encap is just a more practical version of
IPv6 exthdr, so go with the latter. Happy to change.

Reviewed-by: Willem de Bruijn &lt;willemb@google.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Daniel Zahka &lt;daniel.zahka@gmail.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Link: https://patch.msgid.link/20250917000954.859376-10-daniel.zahka@gmail.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
</entry>
<entry>
<title>tcp: add datapath logic for PSP with inline key exchange</title>
<updated>2025-09-18T10:32:06Z</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2025-09-17T00:09:31Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=659a2899a57da59f433182eba571881884d6323e'/>
<id>urn:sha1:659a2899a57da59f433182eba571881884d6323e</id>
<content type='text'>
Add validation points and state propagation to support PSP key
exchange inline, on TCP connections. The expectation is that
application will use some well established mechanism like TLS
handshake to establish a secure channel over the connection and
if both endpoints are PSP-capable - exchange and install PSP keys.
Because the connection can existing in PSP-unsecured and PSP-secured
state we need to make sure that there are no race conditions or
retransmission leaks.

On Tx - mark packets with the skb-&gt;decrypted bit when PSP key
is at the enqueue time. Drivers should only encrypt packets with
this bit set. This prevents retransmissions getting encrypted when
original transmission was not. Similarly to TLS, we'll use
sk-&gt;sk_validate_xmit_skb to make sure PSP skbs can't "escape"
via a PSP-unaware device without being encrypted.

On Rx - validation is done under socket lock. This moves the validation
point later than xfrm, for example. Please see the documentation patch
for more details on the flow of securing a connection, but for
the purpose of this patch what's important is that we want to
enforce the invariant that once connection is secured any skb
in the receive queue has been encrypted with PSP.

Add GRO and coalescing checks to prevent PSP authenticated data from
being combined with cleartext data, or data with non-matching PSP
state. On Rx, check skb's with psp_skb_coalesce_diff() at points
before psp_sk_rx_policy_check(). After skb's are policy checked and on
the socket receive queue, skb_cmp_decrypted() is sufficient for
checking for coalescable PSP state. On Tx, tcp_write_collapse_fence()
should be called when transitioning a socket into PSP Tx state to
prevent data sent as cleartext from being coalesced with PSP
encapsulated data.

This change only adds the validation points, for ease of review.
Subsequent change will add the ability to install keys, and flesh
the enforcement logic out

Reviewed-by: Willem de Bruijn &lt;willemb@google.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Co-developed-by: Daniel Zahka &lt;daniel.zahka@gmail.com&gt;
Signed-off-by: Daniel Zahka &lt;daniel.zahka@gmail.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Link: https://patch.msgid.link/20250917000954.859376-5-daniel.zahka@gmail.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
</entry>
<entry>
<title>tcp: accecn: AccECN option send control</title>
<updated>2025-09-18T06:47:52Z</updated>
<author>
<name>Chia-Yu Chang</name>
<email>chia-yu.chang@nokia-bell-labs.com</email>
</author>
<published>2025-09-16T08:24:31Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=aa55a7dde7ec506bb23448a5005ae3f4f809d022'/>
<id>urn:sha1:aa55a7dde7ec506bb23448a5005ae3f4f809d022</id>
<content type='text'>
Instead of sending the option in every ACK, limit sending to
those ACKs where the option is necessary:
- Handshake
- "Change-triggered ACK" + the ACK following it. The
  2nd ACK is necessary to unambiguously indicate which
  of the ECN byte counters in increasing. The first
  ACK has two counters increasing due to the ecnfield
  edge.
- ACKs with CE to allow CEP delta validations to take
  advantage of the option.
- Force option to be sent every at least once per 2^22
  bytes. The check is done using the bit edges of the
  byte counters (avoids need for extra variables).
- AccECN option beacon to send a few times per RTT even if
  nothing in the ECN state requires that. The default is 3
  times per RTT, and its period can be set via
  sysctl_tcp_ecn_option_beacon.

Below are the pahole outcomes before and after this patch,
in which the group size of tcp_sock_write_tx is increased
from 89 to 97 due to the new u64 accecn_opt_tstamp member:

[BEFORE THIS PATCH]
struct tcp_sock {
    [...]
    u64                        tcp_wstamp_ns;        /*  2488     8 */
    struct list_head           tsorted_sent_queue;   /*  2496    16 */

    [...]
    __cacheline_group_end__tcp_sock_write_tx[0];     /*  2521     0 */
    __cacheline_group_begin__tcp_sock_write_txrx[0]; /*  2521     0 */
    u8                         nonagle:4;            /*  2521: 0  1 */
    u8                         rate_app_limited:1;   /*  2521: 4  1 */
    /* XXX 3 bits hole, try to pack */

    /* Force alignment to the next boundary: */
    u8                         :0;
    u8                         received_ce_pending:4;/*  2522: 0  1 */
    u8                         unused2:4;            /*  2522: 4  1 */
    u8                         accecn_minlen:2;      /*  2523: 0  1 */
    u8                         est_ecnfield:2;       /*  2523: 2  1 */
    u8                         unused3:4;            /*  2523: 4  1 */

    [...]
    __cacheline_group_end__tcp_sock_write_txrx[0];   /*  2628     0 */

    [...]
    /* size: 3200, cachelines: 50, members: 171 */
}

[AFTER THIS PATCH]
struct tcp_sock {
    [...]
    u64                        tcp_wstamp_ns;        /*  2488     8 */
    u64                        accecn_opt_tstamp;    /*  2596     8 */
    struct list_head           tsorted_sent_queue;   /*  2504    16 */

    [...]
    __cacheline_group_end__tcp_sock_write_tx[0];     /*  2529     0 */
    __cacheline_group_begin__tcp_sock_write_txrx[0]; /*  2529     0 */
    u8                         nonagle:4;            /*  2529: 0  1 */
    u8                         rate_app_limited:1;   /*  2529: 4  1 */
    /* XXX 3 bits hole, try to pack */

    /* Force alignment to the next boundary: */
    u8                         :0;
    u8                         received_ce_pending:4;/*  2530: 0  1 */
    u8                         unused2:4;            /*  2530: 4  1 */
    u8                         accecn_minlen:2;      /*  2531: 0  1 */
    u8                         est_ecnfield:2;       /*  2531: 2  1 */
    u8                         accecn_opt_demand:2;  /*  2531: 4  1 */
    u8                         prev_ecnfield:2;      /*  2531: 6  1 */

    [...]
    __cacheline_group_end__tcp_sock_write_txrx[0];   /*  2636     0 */

    [...]
    /* size: 3200, cachelines: 50, members: 173 */
}

Signed-off-by: Chia-Yu Chang &lt;chia-yu.chang@nokia-bell-labs.com&gt;
Co-developed-by: Ilpo Järvinen &lt;ij@kernel.org&gt;
Signed-off-by: Ilpo Järvinen &lt;ij@kernel.org&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Link: https://patch.msgid.link/20250916082434.100722-8-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
</entry>
<entry>
<title>tcp: accecn: AccECN option</title>
<updated>2025-09-18T06:47:52Z</updated>
<author>
<name>Ilpo Järvinen</name>
<email>ij@kernel.org</email>
</author>
<published>2025-09-16T08:24:30Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=b5e74132dfbe60329b3ff0e5c485039f2e31605c'/>
<id>urn:sha1:b5e74132dfbe60329b3ff0e5c485039f2e31605c</id>
<content type='text'>
The Accurate ECN allows echoing back the sum of bytes for
each IP ECN field value in the received packets using
AccECN option. This change implements AccECN option tx &amp; rx
side processing without option send control related features
that are added by a later change.

Based on specification:
  https://tools.ietf.org/id/draft-ietf-tcpm-accurate-ecn-28.txt
(Some features of the spec will be added in the later changes
rather than in this one).

A full-length AccECN option is always attempted but if it does
not fit, the minimum length is selected based on the counters
that have changed since the last update. The AccECN option
(with 24-bit fields) often ends in odd sizes so the option
write code tries to take advantage of some nop used to pad
the other TCP options.

The delivered_ecn_bytes pairs with received_ecn_bytes similar
to how delivered_ce pairs with received_ce. In contrast to
ACE field, however, the option is not always available to update
delivered_ecn_bytes. For ACK w/o AccECN option, the delivered
bytes calculated based on the cumulative ACK+SACK information
are assigned to one of the counters using an estimation
heuristic to select the most likely ECN byte counter. Any
estimation error is corrected when the next AccECN option
arrives. It may occur that the heuristic gets too confused
when there are enough different byte counter deltas between
ACKs with the AccECN option in which case the heuristic just
gives up on updating the counters for a while.

tcp_ecn_option sysctl can be used to select option sending
mode for AccECN: TCP_ECN_OPTION_DISABLED, TCP_ECN_OPTION_MINIMUM,
and TCP_ECN_OPTION_FULL.

This patch increases the size of tcp_info struct, as there is
no existing holes for new u32 variables. Below are the pahole
outcomes before and after this patch:

[BEFORE THIS PATCH]
struct tcp_info {
    [...]
     __u32                     tcpi_total_rto_time;  /*   244     4 */

    /* size: 248, cachelines: 4, members: 61 */
}

[AFTER THIS PATCH]
struct tcp_info {
    [...]
    __u32                      tcpi_total_rto_time;  /*   244     4 */
    __u32                      tcpi_received_ce;     /*   248     4 */
    __u32                      tcpi_delivered_e1_bytes; /*   252     4 */
    __u32                      tcpi_delivered_e0_bytes; /*   256     4 */
    __u32                      tcpi_delivered_ce_bytes; /*   260     4 */
    __u32                      tcpi_received_e1_bytes; /*   264     4 */
    __u32                      tcpi_received_e0_bytes; /*   268     4 */
    __u32                      tcpi_received_ce_bytes; /*   272     4 */

    /* size: 280, cachelines: 5, members: 68 */
}

This patch uses the existing 1-byte holes in the tcp_sock_write_txrx
group for new u8 members, but adds a 4-byte hole in tcp_sock_write_rx
group after the new u32 delivered_ecn_bytes[3] member. Therefore, the
group size of tcp_sock_write_rx is increased from 96 to 112. Below
are the pahole outcomes before and after this patch:

[BEFORE THIS PATCH]
struct tcp_sock {
    [...]
    u8                         received_ce_pending:4; /*  2522: 0  1 */
    u8                         unused2:4;             /*  2522: 4  1 */
    /* XXX 1 byte hole, try to pack */

    [...]
    u32                        rcv_rtt_last_tsecr;    /*  2668     4 */

    [...]
    __cacheline_group_end__tcp_sock_write_rx[0];      /*  2728     0 */

    [...]
    /* size: 3200, cachelines: 50, members: 167 */
}

[AFTER THIS PATCH]
struct tcp_sock {
    [...]
    u8                         received_ce_pending:4;/*  2522: 0  1 */
    u8                         unused2:4;            /*  2522: 4  1 */
    u8                         accecn_minlen:2;      /*  2523: 0  1 */
    u8                         est_ecnfield:2;       /*  2523: 2  1 */
    u8                         unused3:4;            /*  2523: 4  1 */

    [...]
    u32                        rcv_rtt_last_tsecr;   /*  2668     4 */
    u32                        delivered_ecn_bytes[3];/*  2672    12 */
    /* XXX 4 bytes hole, try to pack */

    [...]
    __cacheline_group_end__tcp_sock_write_rx[0];     /*  2744     0 */

    [...]
    /* size: 3200, cachelines: 50, members: 171 */
}

Signed-off-by: Ilpo Järvinen &lt;ij@kernel.org&gt;
Signed-off-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Co-developed-by: Chia-Yu Chang &lt;chia-yu.chang@nokia-bell-labs.com&gt;
Signed-off-by: Chia-Yu Chang &lt;chia-yu.chang@nokia-bell-labs.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Link: https://patch.msgid.link/20250916082434.100722-7-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
</entry>
<entry>
<title>tcp: accecn: AccECN negotiation</title>
<updated>2025-09-18T06:47:51Z</updated>
<author>
<name>Ilpo Järvinen</name>
<email>ij@kernel.org</email>
</author>
<published>2025-09-16T08:24:26Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=3cae34274c79e0c60ccd1c10516973af1aed2a7c'/>
<id>urn:sha1:3cae34274c79e0c60ccd1c10516973af1aed2a7c</id>
<content type='text'>
Accurate ECN negotiation parts based on the specification:
  https://tools.ietf.org/id/draft-ietf-tcpm-accurate-ecn-28.txt

Accurate ECN is negotiated using ECE, CWR and AE flags in the
TCP header. TCP falls back into using RFC3168 ECN if one of the
ends supports only RFC3168-style ECN.

The AccECN negotiation includes reflecting IP ECN field value
seen in SYN and SYNACK back using the same bits as negotiation
to allow responding to SYN CE marks and to detect ECN field
mangling. CE marks should not occur currently because SYN=1
segments are sent with Non-ECT in IP ECN field (but proposal
exists to remove this restriction).

Reflecting SYN IP ECN field in SYNACK is relatively simple.
Reflecting SYNACK IP ECN field in the final/third ACK of
the handshake is more challenging. Linux TCP code is not well
prepared for using the final/third ACK a signalling channel
which makes things somewhat complicated here.

tcp_ecn sysctl can be used to select the highest ECN variant
(Accurate ECN, ECN, No ECN) that is attemped to be negotiated and
requested for incoming connection and outgoing connection:
TCP_ECN_IN_NOECN_OUT_NOECN, TCP_ECN_IN_ECN_OUT_ECN,
TCP_ECN_IN_ECN_OUT_NOECN, TCP_ECN_IN_ACCECN_OUT_ACCECN,
TCP_ECN_IN_ACCECN_OUT_ECN, and TCP_ECN_IN_ACCECN_OUT_NOECN.

After this patch, the size of tcp_request_sock remains unchanged
and no new holes are added. Below are the pahole outcomes before
and after this patch:

[BEFORE THIS PATCH]
struct tcp_request_sock {
    [...]
    u32                        rcv_nxt;              /*   352     4 */
    u8                         syn_tos;              /*   356     1 */

    /* size: 360, cachelines: 6, members: 16 */
}

[AFTER THIS PATCH]
struct tcp_request_sock {
    [...]
    u32                        rcv_nxt;              /*   352     4 */
    u8                         syn_tos;              /*   356     1 */
    bool                       accecn_ok;            /*   357     1 */
    u8                         syn_ect_snt:2;        /*   358: 0  1 */
    u8                         syn_ect_rcv:2;        /*   358: 2  1 */
    u8                         accecn_fail_mode:4;   /*   358: 4  1 */

    /* size: 360, cachelines: 6, members: 20 */
}

After this patch, the size of tcp_sock remains unchanged and no new
holes are added. Also, 4 bits of the existing 2-byte hole are exploited.
Below are the pahole outcomes before and after this patch:

[BEFORE THIS PATCH]
struct tcp_sock {
    [...]
    u8                         dup_ack_counter:2;    /*  2761: 0  1 */
    u8                         tlp_retrans:1;        /*  2761: 2  1 */
    u8                         unused:5;             /*  2761: 3  1 */
    u8                         thin_lto:1;           /*  2762: 0  1 */
    u8                         fastopen_connect:1;   /*  2762: 1  1 */
    u8                         fastopen_no_cookie:1; /*  2762: 2  1 */
    u8                         fastopen_client_fail:2; /*  2762: 3  1 */
    u8                         frto:1;               /*  2762: 5  1 */
    /* XXX 2 bits hole, try to pack */

    [...]
    u8                         keepalive_probes;     /*  2765     1 */
    /* XXX 2 bytes hole, try to pack */

    [...]
    /* size: 3200, cachelines: 50, members: 164 */
}

[AFTER THIS PATCH]
struct tcp_sock {
    [...]
    u8                         dup_ack_counter:2;    /*  2761: 0  1 */
    u8                         tlp_retrans:1;        /*  2761: 2  1 */
    u8                         syn_ect_snt:2;        /*  2761: 3  1 */
    u8                         syn_ect_rcv:2;        /*  2761: 5  1 */
    u8                         thin_lto:1;           /*  2761: 7  1 */
    u8                         fastopen_connect:1;   /*  2762: 0  1 */
    u8                         fastopen_no_cookie:1; /*  2762: 1  1 */
    u8                         fastopen_client_fail:2; /*  2762: 2  1 */
    u8                         frto:1;               /*  2762: 4  1 */
    /* XXX 3 bits hole, try to pack */

    [...]
    u8                         keepalive_probes;     /*  2765     1 */
    u8                         accecn_fail_mode:4;   /*  2766: 0  1 */
    /* XXX 4 bits hole, try to pack */
    /* XXX 1 byte hole, try to pack */

    [...]
    /* size: 3200, cachelines: 50, members: 166 */
}

Signed-off-by: Ilpo Järvinen &lt;ij@kernel.org&gt;
Co-developed-by: Olivier Tilmans &lt;olivier.tilmans@nokia.com&gt;
Signed-off-by: Olivier Tilmans &lt;olivier.tilmans@nokia.com&gt;
Co-developed-by: Chia-Yu Chang &lt;chia-yu.chang@nokia-bell-labs.com&gt;
Signed-off-by: Chia-Yu Chang &lt;chia-yu.chang@nokia-bell-labs.com&gt;
Acked-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Link: https://patch.msgid.link/20250916082434.100722-3-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
</entry>
<entry>
<title>tcp: Free TCP-AO/TCP-MD5 info/keys without RCU</title>
<updated>2025-09-12T02:05:56Z</updated>
<author>
<name>Dmitry Safonov</name>
<email>dima@arista.com</email>
</author>
<published>2025-09-09T01:18:51Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=51e547e8c89c661f6fbede4a28b1d33b13625683'/>
<id>urn:sha1:51e547e8c89c661f6fbede4a28b1d33b13625683</id>
<content type='text'>
Now that the destruction of info/keys is delayed until the socket
destructor, it's safe to use kfree() without an RCU callback.
The socket is in TCP_CLOSE state either because it never left it,
or it's already closed and the refcounter is zero. In any way,
no one can discover it anymore, it's safe to release memory
straight away.

Similar thing was possible for twsk already.

Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Signed-off-by: Dmitry Safonov &lt;dima@arista.com&gt;
Link: https://patch.msgid.link/20250909-b4-tcp-ao-md5-rst-finwait2-v5-2-9ffaaaf8b236@arista.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>tcp: Destroy TCP-AO, TCP-MD5 keys in .sk_destruct()</title>
<updated>2025-09-12T02:05:56Z</updated>
<author>
<name>Dmitry Safonov</name>
<email>dima@arista.com</email>
</author>
<published>2025-09-09T01:18:50Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=9e472d9e84b11e9f3c429eba97c2a9e74461a884'/>
<id>urn:sha1:9e472d9e84b11e9f3c429eba97c2a9e74461a884</id>
<content type='text'>
Currently there are a couple of minor issues with destroying the keys
tcp_v4_destroy_sock():

1. The socket is yet in TCP bind buckets, making it reachable for
   incoming segments [on another CPU core], potentially available to send
   late FIN/ACK/RST replies.

2. There is at least one code path, where tcp_done() is called before
   sending RST [kudos to Bob for investigation]. This is a case of
   a server, that finished sending its data and just called close().

   The socket is in TCP_FIN_WAIT2 and has RCV_SHUTDOWN (set by
   __tcp_close())

   tcp_v4_do_rcv()/tcp_v6_do_rcv()
     tcp_rcv_state_process()            /* LINUX_MIB_TCPABORTONDATA */
       tcp_reset()
         tcp_done_with_error()
           tcp_done()
             inet_csk_destroy_sock()    /* Destroys AO/MD5 keys */
     /* tcp_rcv_state_process() returns SKB_DROP_REASON_TCP_ABORT_ON_DATA */
   tcp_v4_send_reset()                  /* Sends an unsigned RST segment */

   tcpdump:
&gt; 22:53:15.399377 00:00:b2:1f:00:00 &gt; 00:00:01:01:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 33929, offset 0, flags [DF], proto TCP (6), length 60)
&gt;     1.0.0.1.34567 &gt; 1.0.0.2.49848: Flags [F.], seq 2185658590, ack 3969644355, win 502, options [nop,nop,md5 valid], length 0
&gt; 22:53:15.399396 00:00:01:01:00:00 &gt; 00:00:b2:1f:00:00, ethertype IPv4 (0x0800), length 86: (tos 0x0, ttl 64, id 51951, offset 0, flags [DF], proto TCP (6), length 72)
&gt;     1.0.0.2.49848 &gt; 1.0.0.1.34567: Flags [.], seq 3969644375, ack 2185658591, win 128, options [nop,nop,md5 valid,nop,nop,sack 1 {2185658590:2185658591}], length 0
&gt; 22:53:16.429588 00:00:b2:1f:00:00 &gt; 00:00:01:01:00:00, ethertype IPv4 (0x0800), length 60: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
&gt;     1.0.0.1.34567 &gt; 1.0.0.2.49848: Flags [R], seq 2185658590, win 0, length 0
&gt; 22:53:16.664725 00:00:b2:1f:00:00 &gt; 00:00:01:01:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
&gt;     1.0.0.1.34567 &gt; 1.0.0.2.49848: Flags [R], seq 2185658591, win 0, options [nop,nop,md5 valid], length 0
&gt; 22:53:17.289832 00:00:b2:1f:00:00 &gt; 00:00:01:01:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
&gt;     1.0.0.1.34567 &gt; 1.0.0.2.49848: Flags [R], seq 2185658591, win 0, options [nop,nop,md5 valid], length 0

  Note the signed RSTs later in the dump - those are sent by the server
  when the fin-wait socket gets removed from hash buckets, by
  the listener socket.

Instead of destroying AO/MD5 info and their keys in inet_csk_destroy_sock(),
slightly delay it until the actual socket .sk_destruct(). As shutdown'ed
socket can yet send non-data replies, they should be signed in order for
the peer to process them. Now it also matches how AO/MD5 gets destructed
for TIME-WAIT sockets (in tcp_twsk_destructor()).

This seems optimal for TCP-MD5, while for TCP-AO it seems to have an
open problem: once RST get sent and socket gets actually destructed,
there is no information on the initial sequence numbers. So, in case
this last RST gets lost in the network, the server's listener socket
won't be able to properly sign another RST. Nothing in RFC 1122
prescribes keeping any local state after non-graceful reset.
Luckily, BGP are known to use keep alive(s).

While the issue is quite minor/cosmetic, these days monitoring network
counters is a common practice and getting invalid signed segments from
a trusted BGP peer can get customers worried.

Investigated-by: Bob Gilligan &lt;gilligan@arista.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Signed-off-by: Dmitry Safonov &lt;dima@arista.com&gt;
Link: https://patch.msgid.link/20250909-b4-tcp-ao-md5-rst-finwait2-v5-1-9ffaaaf8b236@arista.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
</feed>
