summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2024-08-01Merge tag 'net-6.11-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from wireless, bleutooth, BPF and netfilter. Current release - regressions: - core: drop bad gso csum_start and offset in virtio_net_hdr - wifi: mt76: fix null pointer access in mt792x_mac_link_bss_remove - eth: tun: add missing bpf_net_ctx_clear() in do_xdp_generic() - phy: aquantia: only poll GLOBAL_CFG regs on aqr113, aqr113c and aqr115c Current release - new code bugs: - smc: prevent UAF in inet_create() - bluetooth: btmtk: fix kernel crash when entering btmtk_usb_suspend - eth: bnxt: reject unsupported hash functions Previous releases - regressions: - sched: act_ct: take care of padding in struct zones_ht_key - netfilter: fix null-ptr-deref in iptable_nat_table_init(). - tcp: adjust clamping window for applications specifying SO_RCVBUF Previous releases - always broken: - ethtool: rss: small fixes to spec and GET - mptcp: - fix signal endpoint re-add - pm: fix backup support in signal endpoints - wifi: ath12k: fix soft lockup on suspend - eth: bnxt_en: fix RSS logic in __bnxt_reserve_rings() - eth: ice: fix AF_XDP ZC timeout and concurrency issues - eth: mlx5: - fix missing lock on sync reset reload - fix error handling in irq_pool_request_irq" * tag 'net-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (76 commits) mptcp: fix duplicate data handling mptcp: fix bad RCVPRUNED mib accounting ipv6: fix ndisc_is_useropt() handling for PIO igc: Fix double reset adapter triggered from a single taprio cmd net: MAINTAINERS: Demote Qualcomm IPA to "maintained" net: wan: fsl_qmc_hdlc: Discard received CRC net: wan: fsl_qmc_hdlc: Convert carrier_lock spinlock to a mutex net/mlx5e: Add a check for the return value from mlx5_port_set_eth_ptys net/mlx5e: Fix CT entry update leaks of modify header context net/mlx5e: Require mlx5 tc classifier action support for IPsec prio capability net/mlx5: Fix missing lock on sync reset reload net/mlx5: Lag, don't use the hardcoded value of the first port net/mlx5: DR, Fix 'stack guard page was hit' error in dr_rule net/mlx5: Fix error handling in irq_pool_request_irq net/mlx5: Always drain health in shutdown callback net: Add skbuff.h to MAINTAINERS r8169: don't increment tx_dropped in case of NETDEV_TX_BUSY netfilter: iptables: Fix potential null-ptr-deref in ip6table_nat_table_init(). netfilter: iptables: Fix null-ptr-deref in iptable_nat_table_init(). net: drop bad gso csum_start and offset in virtio_net_hdr ...
2024-08-01mptcp: fix duplicate data handlingPaolo Abeni
When a subflow receives and discards duplicate data, the mptcp stack assumes that the consumed offset inside the current skb is zero. With multiple subflows receiving data simultaneously such assertion does not held true. As a result the subflow-level copied_seq will be incorrectly increased and later on the same subflow will observe a bad mapping, leading to subflow reset. Address the issue taking into account the skb consumed offset in mptcp_subflow_discard_data(). Fixes: 04e4cd4f7ca4 ("mptcp: cleanup mptcp_subflow_discard_data()") Cc: stable@vger.kernel.org Link: https://github.com/multipath-tcp/mptcp_net-next/issues/501 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-01mptcp: fix bad RCVPRUNED mib accountingPaolo Abeni
Since its introduction, the mentioned MIB accounted for the wrong event: wake-up being skipped as not-needed on some edge condition instead of incoming skb being dropped after landing in the (subflow) receive queue. Move the increment in the correct location. Fixes: ce599c516386 ("mptcp: properly account bulk freed memory") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-01Merge tag 'nf-24-07-31' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: Fix a possible null-ptr-deref sometimes triggered by iptables-restore at boot time. Register iptables {ipv4,ipv6} nat table pernet in first place to fix this issue. Patch #1 and #2 from Kuniyuki Iwashima. netfilter pull request 24-07-31 * tag 'nf-24-07-31' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: iptables: Fix potential null-ptr-deref in ip6table_nat_table_init(). netfilter: iptables: Fix null-ptr-deref in iptable_nat_table_init(). ==================== Link: https://patch.msgid.link/20240731213046.6194-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-01ipv6: fix ndisc_is_useropt() handling for PIOMaciej Żenczykowski
The current logic only works if the PIO is between two other ND user options. This fixes it so that the PIO can also be either before or after other ND user options (for example the first or last option in the RA). side note: there's actually Android tests verifying a portion of the old broken behaviour, so: https://android-review.googlesource.com/c/kernel/tests/+/3196704 fixes those up. Cc: Jen Linkova <furry@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: Patrick Rohr <prohr@google.com> Cc: David Ahern <dsahern@kernel.org> Cc: YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Maciej Żenczykowski <maze@google.com> Fixes: 048c796beb6e ("ipv6: adjust ndisc_is_useropt() to also return true for PIO") Link: https://patch.msgid.link/20240730001748.147636-1-maze@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-31netfilter: iptables: Fix potential null-ptr-deref in ip6table_nat_table_init().Kuniyuki Iwashima
ip6table_nat_table_init() accesses net->gen->ptr[ip6table_nat_net_ops.id], but the function is exposed to user space before the entry is allocated via register_pernet_subsys(). Let's call register_pernet_subsys() before xt_register_template(). Fixes: fdacd57c79b7 ("netfilter: x_tables: never register tables by default") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-07-31netfilter: iptables: Fix null-ptr-deref in iptable_nat_table_init().Kuniyuki Iwashima
We had a report that iptables-restore sometimes triggered null-ptr-deref at boot time. [0] The problem is that iptable_nat_table_init() is exposed to user space before the kernel fully initialises netns. In the small race window, a user could call iptable_nat_table_init() that accesses net_generic(net, iptable_nat_net_id), which is available only after registering iptable_nat_net_ops. Let's call register_pernet_subsys() before xt_register_template(). [0]: bpfilter: Loaded bpfilter_umh pid 11702 Started bpfilter BUG: kernel NULL pointer dereference, address: 0000000000000013 PF: supervisor write access in kernel mode PF: error_code(0x0002) - not-present page PGD 0 P4D 0 PREEMPT SMP NOPTI CPU: 2 PID: 11879 Comm: iptables-restor Not tainted 6.1.92-99.174.amzn2023.x86_64 #1 Hardware name: Amazon EC2 c6i.4xlarge/, BIOS 1.0 10/16/2017 RIP: 0010:iptable_nat_table_init (net/ipv4/netfilter/iptable_nat.c:87 net/ipv4/netfilter/iptable_nat.c:121) iptable_nat Code: 10 4c 89 f6 48 89 ef e8 0b 19 bb ff 41 89 c4 85 c0 75 38 41 83 c7 01 49 83 c6 28 41 83 ff 04 75 dc 48 8b 44 24 08 48 8b 0c 24 <48> 89 08 4c 89 ef e8 a2 3b a2 cf 48 83 c4 10 44 89 e0 5b 5d 41 5c RSP: 0018:ffffbef902843cd0 EFLAGS: 00010246 RAX: 0000000000000013 RBX: ffff9f4b052caa20 RCX: ffff9f4b20988d80 RDX: 0000000000000000 RSI: 0000000000000064 RDI: ffffffffc04201c0 RBP: ffff9f4b29394000 R08: ffff9f4b07f77258 R09: ffff9f4b07f77240 R10: 0000000000000000 R11: ffff9f4b09635388 R12: 0000000000000000 R13: ffff9f4b1a3c6c00 R14: ffff9f4b20988e20 R15: 0000000000000004 FS: 00007f6284340000(0000) GS:ffff9f51fe280000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000013 CR3: 00000001d10a6005 CR4: 00000000007706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ? show_trace_log_lvl (arch/x86/kernel/dumpstack.c:259) ? show_trace_log_lvl (arch/x86/kernel/dumpstack.c:259) ? xt_find_table_lock (net/netfilter/x_tables.c:1259) ? __die_body.cold (arch/x86/kernel/dumpstack.c:478 arch/x86/kernel/dumpstack.c:420) ? page_fault_oops (arch/x86/mm/fault.c:727) ? exc_page_fault (./arch/x86/include/asm/irqflags.h:40 ./arch/x86/include/asm/irqflags.h:75 arch/x86/mm/fault.c:1470 arch/x86/mm/fault.c:1518) ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:570) ? iptable_nat_table_init (net/ipv4/netfilter/iptable_nat.c:87 net/ipv4/netfilter/iptable_nat.c:121) iptable_nat xt_find_table_lock (net/netfilter/x_tables.c:1259) xt_request_find_table_lock (net/netfilter/x_tables.c:1287) get_info (net/ipv4/netfilter/ip_tables.c:965) ? security_capable (security/security.c:809 (discriminator 13)) ? ns_capable (kernel/capability.c:376 kernel/capability.c:397) ? do_ipt_get_ctl (net/ipv4/netfilter/ip_tables.c:1656) ? bpfilter_send_req (net/bpfilter/bpfilter_kern.c:52) bpfilter nf_getsockopt (net/netfilter/nf_sockopt.c:116) ip_getsockopt (net/ipv4/ip_sockglue.c:1827) __sys_getsockopt (net/socket.c:2327) __x64_sys_getsockopt (net/socket.c:2342 net/socket.c:2339 net/socket.c:2339) do_syscall_64 (arch/x86/entry/common.c:51 arch/x86/entry/common.c:81) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121) RIP: 0033:0x7f62844685ee Code: 48 8b 0d 45 28 0f 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 37 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 0a c3 66 0f 1f 84 00 00 00 00 00 48 8b 15 09 RSP: 002b:00007ffd1f83d638 EFLAGS: 00000246 ORIG_RAX: 0000000000000037 RAX: ffffffffffffffda RBX: 00007ffd1f83d680 RCX: 00007f62844685ee RDX: 0000000000000040 RSI: 0000000000000000 RDI: 0000000000000004 RBP: 0000000000000004 R08: 00007ffd1f83d670 R09: 0000558798ffa2a0 R10: 00007ffd1f83d680 R11: 0000000000000246 R12: 00007ffd1f83e3b2 R13: 00007f628455baa0 R14: 00007ffd1f83d7b0 R15: 00007f628457a008 </TASK> Modules linked in: iptable_nat(+) bpfilter rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache veth xt_state xt_connmark xt_nat xt_statistic xt_MASQUERADE xt_mark xt_addrtype ipt_REJECT nf_reject_ipv4 nft_chain_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_comment nft_compat nf_tables nfnetlink overlay nls_ascii nls_cp437 vfat fat ghash_clmulni_intel aesni_intel ena crypto_simd ptp cryptd i8042 pps_core serio button sunrpc sch_fq_codel configfs loop dm_mod fuse dax dmi_sysfs crc32_pclmul crc32c_intel efivarfs CR2: 0000000000000013 Fixes: fdacd57c79b7 ("netfilter: x_tables: never register tables by default") Reported-by: Takahiro Kawahara <takawaha@amazon.co.jp> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-07-30net: drop bad gso csum_start and offset in virtio_net_hdrWillem de Bruijn
Tighten csum_start and csum_offset checks in virtio_net_hdr_to_skb for GSO packets. The function already checks that a checksum requested with VIRTIO_NET_HDR_F_NEEDS_CSUM is in skb linear. But for GSO packets this might not hold for segs after segmentation. Syzkaller demonstrated to reach this warning in skb_checksum_help offset = skb_checksum_start_offset(skb); ret = -EINVAL; if (WARN_ON_ONCE(offset >= skb_headlen(skb))) By injecting a TSO packet: WARNING: CPU: 1 PID: 3539 at net/core/dev.c:3284 skb_checksum_help+0x3d0/0x5b0 ip_do_fragment+0x209/0x1b20 net/ipv4/ip_output.c:774 ip_finish_output_gso net/ipv4/ip_output.c:279 [inline] __ip_finish_output+0x2bd/0x4b0 net/ipv4/ip_output.c:301 iptunnel_xmit+0x50c/0x930 net/ipv4/ip_tunnel_core.c:82 ip_tunnel_xmit+0x2296/0x2c70 net/ipv4/ip_tunnel.c:813 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x759/0xa60 net/ipv4/ip_gre.c:661 __netdev_start_xmit include/linux/netdevice.h:4850 [inline] netdev_start_xmit include/linux/netdevice.h:4864 [inline] xmit_one net/core/dev.c:3595 [inline] dev_hard_start_xmit+0x261/0x8c0 net/core/dev.c:3611 __dev_queue_xmit+0x1b97/0x3c90 net/core/dev.c:4261 packet_snd net/packet/af_packet.c:3073 [inline] The geometry of the bad input packet at tcp_gso_segment: [ 52.003050][ T8403] skb len=12202 headroom=244 headlen=12093 tailroom=0 [ 52.003050][ T8403] mac=(168,24) mac_len=24 net=(192,52) trans=244 [ 52.003050][ T8403] shinfo(txflags=0 nr_frags=1 gso(size=1552 type=3 segs=0)) [ 52.003050][ T8403] csum(0x60000c7 start=199 offset=1536 ip_summed=3 complete_sw=0 valid=0 level=0) Mitigate with stricter input validation. csum_offset: for GSO packets, deduce the correct value from gso_type. This is already done for USO. Extend it to TSO. Let UFO be: udp[46]_ufo_fragment ignores these fields and always computes the checksum in software. csum_start: finding the real offset requires parsing to the transport header. Do not add a parser, use existing segmentation parsing. Thanks to SKB_GSO_DODGY, that also catches bad packets that are hw offloaded. Again test both TSO and USO. Do not test UFO for the above reason, and do not test UDP tunnel offload. GSO packet are almost always CHECKSUM_PARTIAL. USO packets may be CHECKSUM_NONE since commit 10154dbded6d6 ("udp: Allow GSO transmit from devices with no checksum offload"), but then still these fields are initialized correctly in udp4_hwcsum/udp6_hwcsum_outgoing. So no need to test for ip_summed == CHECKSUM_PARTIAL first. This revises an existing fix mentioned in the Fixes tag, which broke small packets with GSO offload, as detected by kselftests. Link: https://syzkaller.appspot.com/bug?extid=e1db31216c789f552871 Link: https://lore.kernel.org/netdev/20240723223109.2196886-1-kuba@kernel.org Fixes: e269d79c7d35 ("net: missing check virtio") Cc: stable@vger.kernel.org Signed-off-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240729201108.1615114-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-30net/iucv: fix use after free in iucv_sock_close()Alexandra Winter
iucv_sever_path() is called from process context and from bh context. iucv->path is used as indicator whether somebody else is taking care of severing the path (or it is already removed / never existed). This needs to be done with atomic compare and swap, otherwise there is a small window where iucv_sock_close() will try to work with a path that has already been severed and freed by iucv_callback_connrej() called by iucv_tasklet_fn(). Example: [452744.123844] Call Trace: [452744.123845] ([<0000001e87f03880>] 0x1e87f03880) [452744.123966] [<00000000d593001e>] iucv_path_sever+0x96/0x138 [452744.124330] [<000003ff801ddbca>] iucv_sever_path+0xc2/0xd0 [af_iucv] [452744.124336] [<000003ff801e01b6>] iucv_sock_close+0xa6/0x310 [af_iucv] [452744.124341] [<000003ff801e08cc>] iucv_sock_release+0x3c/0xd0 [af_iucv] [452744.124345] [<00000000d574794e>] __sock_release+0x5e/0xe8 [452744.124815] [<00000000d5747a0c>] sock_close+0x34/0x48 [452744.124820] [<00000000d5421642>] __fput+0xba/0x268 [452744.124826] [<00000000d51b382c>] task_work_run+0xbc/0xf0 [452744.124832] [<00000000d5145710>] do_notify_resume+0x88/0x90 [452744.124841] [<00000000d5978096>] system_call+0xe2/0x2c8 [452744.125319] Last Breaking-Event-Address: [452744.125321] [<00000000d5930018>] iucv_path_sever+0x90/0x138 [452744.125324] [452744.125325] Kernel panic - not syncing: Fatal exception in interrupt Note that bh_lock_sock() is not serializing the tasklet context against process context, because the check for sock_owned_by_user() and corresponding handling is missing. Ideas for a future clean-up patch: A) Correct usage of bh_lock_sock() in tasklet context, as described in Link: https://lore.kernel.org/netdev/1280155406.2899.407.camel@edumazet-laptop/ Re-enqueue, if needed. This may require adding return values to the tasklet functions and thus changes to all users of iucv. B) Change iucv tasklet into worker and use only lock_sock() in af_iucv. Fixes: 7d316b945352 ("af_iucv: remove IUCV-pathes completely") Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Link: https://patch.msgid.link/20240729122818.947756-1-wintera@linux.ibm.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-30net/smc: prevent UAF in inet_create()D. Wythe
Following syzbot repro crashes the kernel: socketpair(0x2, 0x1, 0x100, &(0x7f0000000140)) (fail_nth: 13) Fix this by not calling sk_common_release() from smc_create_clcsk(). Stack trace: socket: no more sockets ------------[ cut here ]------------ refcount_t: underflow; use-after-free. WARNING: CPU: 1 PID: 5092 at lib/refcount.c:28 refcount_warn_saturate+0x15a/0x1d0 lib/refcount.c:28 Modules linked in: CPU: 1 PID: 5092 Comm: syz-executor424 Not tainted 6.10.0-syzkaller-04483-g0be9ae5486cd #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/27/2024 RIP: 0010:refcount_warn_saturate+0x15a/0x1d0 lib/refcount.c:28 Code: 80 f3 1f 8c e8 e7 69 a8 fc 90 0f 0b 90 90 eb 99 e8 cb 4f e6 fc c6 05 8a 8d e8 0a 01 90 48 c7 c7 e0 f3 1f 8c e8 c7 69 a8 fc 90 <0f> 0b 90 90 e9 76 ff ff ff e8 a8 4f e6 fc c6 05 64 8d e8 0a 01 90 RSP: 0018:ffffc900034cfcf0 EFLAGS: 00010246 RAX: 3b9fcde1c862f700 RBX: ffff888022918b80 RCX: ffff88807b39bc00 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000003 R08: ffffffff815878a2 R09: fffffbfff1c39d94 R10: dffffc0000000000 R11: fffffbfff1c39d94 R12: 00000000ffffffe9 R13: 1ffff11004523165 R14: ffff888022918b28 R15: ffff888022918b00 FS: 00005555870e7380(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000140 CR3: 000000007582e000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> inet_create+0xbaf/0xe70 __sock_create+0x490/0x920 net/socket.c:1571 sock_create net/socket.c:1622 [inline] __sys_socketpair+0x2ca/0x720 net/socket.c:1769 __do_sys_socketpair net/socket.c:1822 [inline] __se_sys_socketpair net/socket.c:1819 [inline] __x64_sys_socketpair+0x9b/0xb0 net/socket.c:1819 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fbcb9259669 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 a1 1a 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fffe931c6d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000035 RAX: ffffffffffffffda RBX: 00007fffe931c6f0 RCX: 00007fbcb9259669 RDX: 0000000000000100 RSI: 0000000000000001 RDI: 0000000000000002 RBP: 0000000000000002 R08: 00007fffe931c476 R09: 00000000000000a0 R10: 0000000020000140 R11: 0000000000000246 R12: 00007fffe931c6ec R13: 431bde82d7b634db R14: 0000000000000001 R15: 0000000000000001 </TASK> Link: https://lore.kernel.org/r/20240723175809.537291-1-edumazet@google.com/ Fixes: d25a92ccae6b ("net/smc: Introduce IPPROTO_SMC") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Link: https://patch.msgid.link/1722224415-30999-1-git-send-email-alibuda@linux.alibaba.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-30mptcp: pm: fix backup support in signal endpointsMatthieu Baerts (NGI0)
There was a support for signal endpoints, but only when the endpoint's flag was changed during a connection. If an endpoint with the signal and backup was already present, the MP_JOIN reply was not containing the backup flag as expected. That's confusing to have this inconsistent behaviour. On the other hand, the infrastructure to set the backup flag in the SYN + ACK + MP_JOIN was already there, it was just never set before. Now when requesting the local ID from the path-manager, the backup status is also requested. Note that when the userspace PM is used, the backup flag can be set if the local address was already used before with a backup flag, e.g. if the address was announced with the 'backup' flag, or a subflow was created with the 'backup' flag. Fixes: 4596a2c1b7f5 ("mptcp: allow creating non-backup subflows") Cc: stable@vger.kernel.org Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/507 Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-30mptcp: mib: count MPJ with backup flagMatthieu Baerts (NGI0)
Without such counters, it is difficult to easily debug issues with MPJ not having the backup flags on production servers. This is not strictly a fix, but it eases to validate the following patches without requiring to take packet traces, to query ongoing connections with Netlink with admin permissions, or to guess by looking at the behaviour of the packet scheduler. Also, the modification is self contained, isolated, well controlled, and the increments are done just after others, there from the beginning. It looks then safe, and helpful to backport this. Fixes: 4596a2c1b7f5 ("mptcp: allow creating non-backup subflows") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-30mptcp: pm: only set request_bkup flag when sending MP_PRIOMatthieu Baerts (NGI0)
The 'backup' flag from mptcp_subflow_context structure is supposed to be set only when the other peer flagged a subflow as backup, not the opposite. Fixes: 067065422fcd ("mptcp: add the outgoing MP_PRIO support") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-30mptcp: distinguish rcv vs sent backup flag in requestsMatthieu Baerts (NGI0)
When sending an MP_JOIN + SYN + ACK, it is possible to mark the subflow as 'backup' by setting the flag with the same name. Before this patch, the backup was set if the other peer set it in its MP_JOIN + SYN request. It is not correct: the backup flag should be set in the MPJ+SYN+ACK only if the host asks for it, and not mirroring what was done by the other peer. It is then required to have a dedicated bit for each direction, similar to what is done in the subflow context. Fixes: f296234c98a8 ("mptcp: Add handling of incoming MP_JOIN requests") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-30mptcp: sched: check both directions for backupMatthieu Baerts (NGI0)
The 'mptcp_subflow_context' structure has two items related to the backup flags: - 'backup': the subflow has been marked as backup by the other peer - 'request_bkup': the backup flag has been set by the host Before this patch, the scheduler was only looking at the 'backup' flag. That can make sense in some cases, but it looks like that's not what we wanted for the general use, because either the path-manager was setting both of them when sending an MP_PRIO, or the receiver was duplicating the 'backup' flag in the subflow request. Note that the use of these two flags in the path-manager are going to be fixed in the next commits, but this change here is needed not to modify the behaviour. Fixes: f296234c98a8 ("mptcp: Add handling of incoming MP_JOIN requests") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-29mptcp: fix NL PM announced address accountingPaolo Abeni
Currently the per connection announced address counter is never decreased. As a consequence, after connection establishment, if the NL PM deletes an endpoint and adds a new/different one, no additional subflow is created for the new endpoint even if the current limits allow that. Address the issue properly updating the signaled address counter every time the NL PM removes such addresses. Fixes: 01cacb00b35c ("mptcp: add netlink-based PM") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-29mptcp: fix user-space PM announced address accountingPaolo Abeni
Currently the per-connection announced address counter is never decreased. When the user-space PM is in use, this just affect the information exposed via diag/sockopt, but it could still foul the PM to wrong decision. Add the missing accounting for the user-space PM's sake. Fixes: 8b1c94da1e48 ("mptcp: only send RM_ADDR in nl_cmd_remove") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-29rtnetlink: Don't ignore IFLA_TARGET_NETNSID when ifname is specified in ↵Kuniyuki Iwashima
rtnl_dellink(). The cited commit accidentally replaced tgt_net with net in rtnl_dellink(). As a result, IFLA_TARGET_NETNSID is ignored if the interface is specified with IFLA_IFNAME or IFLA_ALT_IFNAME. Let's pass tgt_net to rtnl_dev_get(). Fixes: cc6090e985d7 ("net: rtnetlink: introduce helper to get net_device instance by ifname") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-29tcp: Adjust clamping window for applications specifying SO_RCVBUFSubash Abhinov Kasiviswanathan
tp->scaling_ratio is not updated based on skb->len/skb->truesize once SO_RCVBUF is set leading to the maximum window scaling to be 25% of rcvbuf after commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale") and 50% of rcvbuf after commit 697a6c8cec03 ("tcp: increase the default TCP scaling ratio"). 50% tries to emulate the behavior of older kernels using sysctl_tcp_adv_win_scale with default value. Systems which were using a different values of sysctl_tcp_adv_win_scale in older kernels ended up seeing reduced download speeds in certain cases as covered in https://lists.openwall.net/netdev/2024/05/15/13 While the sysctl scheme is no longer acceptable, the value of 50% is a bit conservative when the skb->len/skb->truesize ratio is later determined to be ~0.66. Applications not specifying SO_RCVBUF update the window scaling and the receiver buffer every time data is copied to userspace. This computation is now used for applications setting SO_RCVBUF to update the maximum window scaling while ensuring that the receive buffer is within the application specified limit. Fixes: dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale") Signed-off-by: Sean Tranchetti <quic_stranche@quicinc.com> Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-29ethtool: fix the state of additional contexts with old APIJakub Kicinski
We expect drivers implementing the new create/modify/destroy API to populate the defaults in struct ethtool_rxfh_context. In legacy API ctx isn't even passed, and rxfh.indir / rxfh.key are NULL so drivers can't give us defaults even if they want to. Call get_rxfh() to fetch the values. We can reuse rxfh_dev for the get_rxfh(), rxfh stores the input from the user. This fixes IOCTL reporting 0s instead of the default key / indir table for drivers using legacy API. Add a check to try to catch drivers using the new API but not populating the key. Fixes: 7964e7884643 ("net: ethtool: use the tracking array for get_rxfh on custom RSS contexts") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-29ethtool: fix setting key and resetting indir at onceJakub Kicinski
The indirection table and the key follow struct ethtool_rxfh in user memory. To reset the indirection table user space calls SET_RXFH with table of size 0 (OTOH to say "no change" it should use -1 / ~0). The logic for calculating the offset where they key sits is incorrect in this case, as kernel would still offset by the full table length, while for the reset there is no indir table and key is immediately after the struct. $ ethtool -X eth0 default hkey 01:02:03... $ ethtool -x eth0 [...] RSS hash key: 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00 [...] Fixes: 3de0b592394d ("ethtool: Support for configurable RSS hash key") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-29tun: Add missing bpf_net_ctx_clear() in do_xdp_generic()Jeongjun Park
There are cases where do_xdp_generic returns bpf_net_context without clearing it. This causes various memory corruptions, so the missing bpf_net_ctx_clear must be added. Reported-by: syzbot+44623300f057a28baf1e@syzkaller.appspotmail.com Fixes: fecef4cd42c6 ("tun: Assign missing bpf_net_context.") Signed-off-by: Jeongjun Park <aha310510@gmail.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reported-by: syzbot+3c2b6d5d4bec3b904933@syzkaller.appspotmail.com Reported-by: syzbot+707d98c8649695eaf329@syzkaller.appspotmail.com Reported-by: syzbot+c226757eb784a9da3e8b@syzkaller.appspotmail.com Reported-by: syzbot+61a1cfc2b6632363d319@syzkaller.appspotmail.com Reported-by: syzbot+709e4c85c904bcd62735@syzkaller.appspotmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-28minmax: add a few more MIN_T/MAX_T usersLinus Torvalds
Commit 3a7e02c040b1 ("minmax: avoid overly complicated constant expressions in VM code") added the simpler MIN_T/MAX_T macros in order to avoid some excessive expansion from the rather complicated regular min/max macros. The complexity of those macros stems from two issues: (a) trying to use them in situations that require a C constant expression (in static initializers and for array sizes) (b) the type sanity checking and MIN_T/MAX_T avoids both of these issues. Now, in the whole (long) discussion about all this, it was pointed out that the whole type sanity checking is entirely unnecessary for min_t/max_t which get a fixed type that the comparison is done in. But that still leaves min_t/max_t unnecessarily complicated due to worries about the C constant expression case. However, it turns out that there really aren't very many cases that use min_t/max_t for this, and we can just force-convert those. This does exactly that. Which in turn will then allow for much simpler implementations of min_t()/max_t(). All the usual "macros in all upper case will evaluate the arguments multiple times" rules apply. We should do all the same things for the regular min/max() vs MIN/MAX() cases, but that has the added complexity of various drivers defining their own local versions of MIN/MAX, so that needs another level of fixes first. Link: https://lore.kernel.org/all/b47fad1d0cf8449886ad148f8c013dae@AcuMS.aculab.com/ Cc: David Laight <David.Laight@aculab.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-07-26Merge tag 'for-net-2024-07-26' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - btmtk: Fix kernel crash when entering btmtk_usb_suspend - btmtk: Fix btmtk.c undefined reference build error - btintel: Fail setup on error - hci_sync: Fix suspending with wrong filter policy - hci_event: Fix setting DISCOVERY_FINDING for passive scanning * tag 'for-net-2024-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: hci_event: Fix setting DISCOVERY_FINDING for passive scanning Bluetooth: btmtk: remove #ifdef around declarations Bluetooth: btmtk: Fix btmtk.c undefined reference build error harder Bluetooth: btmtk: Fix btmtk.c undefined reference build error Bluetooth: hci_sync: Fix suspending with wrong filter policy Bluetooth: btmtk: Fix kernel crash when entering btmtk_usb_suspend Bluetooth: btintel: Fail setup on error ==================== Link: https://patch.msgid.link/20240726150502.3300832-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-26Merge tag 'wireless-2024-07-26' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Couple of more urgent fixes: * ath12k: wowlan loop iteration issue * ath12k: fix soft lockup on suspend in certain scenarios * mt76: fix crash when removing an interface * mac80211: fix injection crash with some drivers that don't want monitor vif * cfg80211: fix S1G beacon parsing in scan * cfg80211: fix MLO link status reporting on connect * tag 'wireless-2024-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: ath12k: fix soft lockup on suspend wifi: mt76: mt7921: fix null pointer access in mt792x_mac_link_bss_remove wifi: ath12k: fix reusing outside iterator in ath12k_wow_vif_set_wakeups() wifi: cfg80211: correct S1G beacon length calculation wifi: cfg80211: fix reporting failed MLO links status with cfg80211_connect_done wifi: mac80211: use monitor sdata with driver only if desired ==================== Link: https://patch.msgid.link/20240726122638.942420-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-26Bluetooth: hci_event: Fix setting DISCOVERY_FINDING for passive scanningLuiz Augusto von Dentz
DISCOVERY_FINDING shall only be set for active scanning as passive scanning is not meant to generate MGMT Device Found events causing discovering state to go out of sync since userspace would believe it is discovering when in fact it is just passive scanning. Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=219088 Fixes: 2e2515c1ba38 ("Bluetooth: hci_event: Set DISCOVERY_FINDING on SCAN_ENABLED") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-07-26Bluetooth: hci_sync: Fix suspending with wrong filter policyLuiz Augusto von Dentz
When suspending the scan filter policy cannot be 0x00 (no acceptlist) since that means the host has to process every advertisement report waking up the system, so this attempts to check if hdev is marked as suspended and if the resulting filter policy would be 0x00 (no acceptlist) then skip passive scanning if thre no devices in the acceptlist otherwise reset the filter policy to 0x01 so the acceptlist is used since the devices programmed there can still wakeup be system. Fixes: 182ee45da083 ("Bluetooth: hci_sync: Rework hci_suspend_notifier") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-07-26wifi: cfg80211: correct S1G beacon length calculationJohannes Berg
The minimum header length calculation (equivalent to the start of the elements) for the S1G long beacon erroneously required only up to the start of u.s1g_beacon rather than the start of u.s1g_beacon.variable. Fix that, and also shuffle the branches around a bit to not assign useless values that are overwritten later. Reported-by: syzbot+0f3afa93b91202f21939@syzkaller.appspotmail.com Fixes: 9eaffe5078ca ("cfg80211: convert S1G beacon to scan results") Link: https://patch.msgid.link/20240724132912.9662972db7c1.I8779675b5bbda4994cc66f876b6b87a2361c3c0b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-26wifi: cfg80211: fix reporting failed MLO links status with cfg80211_connect_doneVeerendranath Jakkam
Individual MLO links connection status is not copied to EVENT_CONNECT_RESULT data while processing the connect response information in cfg80211_connect_done(). Due to this failed links are wrongly indicated with success status in EVENT_CONNECT_RESULT. To fix this, copy the individual MLO links status to the EVENT_CONNECT_RESULT data. Fixes: 53ad07e9823b ("wifi: cfg80211: support reporting failed links") Signed-off-by: Veerendranath Jakkam <quic_vjakkam@quicinc.com> Reviewed-by: Carlos Llamas <cmllamas@google.com> Link: https://patch.msgid.link/20240724125327.3495874-1-quic_vjakkam@quicinc.com [commit message editorial changes] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-26wifi: mac80211: use monitor sdata with driver only if desiredJohannes Berg
In commit 0d9c2beed116 ("wifi: mac80211: fix monitor channel with chanctx emulation") I changed mac80211 to always have an internal monitor_sdata to have something to have the chanctx bound to. However, if the driver didn't also have the WANT_MONITOR flag this would cause mac80211 to allocate it without telling the driver (which was intentional) but also use it for later APIs to the driver without it ever having known about it which was _not_ intentional. Check through the code and only use the monitor_sdata in the relevant places (TX, MU-MIMO follow settings, TX power, and interface iteration) when the WANT_MONITOR flag is set. Cc: stable@vger.kernel.org Fixes: 0d9c2beed116 ("wifi: mac80211: fix monitor channel with chanctx emulation") Reported-by: ZeroBeat <ZeroBeat@gmx.de> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219086 Tested-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20240725184836.25d334157a8e.I02574086da2c5cf0e18264ce5807db6f14ffd9c0@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-26sched: act_ct: take care of padding in struct zones_ht_keyEric Dumazet
Blamed commit increased lookup key size from 2 bytes to 16 bytes, because zones_ht_key got a struct net pointer. Make sure rhashtable_lookup() is not using the padding bytes which are not initialized. BUG: KMSAN: uninit-value in rht_ptr_rcu include/linux/rhashtable.h:376 [inline] BUG: KMSAN: uninit-value in __rhashtable_lookup include/linux/rhashtable.h:607 [inline] BUG: KMSAN: uninit-value in rhashtable_lookup include/linux/rhashtable.h:646 [inline] BUG: KMSAN: uninit-value in rhashtable_lookup_fast include/linux/rhashtable.h:672 [inline] BUG: KMSAN: uninit-value in tcf_ct_flow_table_get+0x611/0x2260 net/sched/act_ct.c:329 rht_ptr_rcu include/linux/rhashtable.h:376 [inline] __rhashtable_lookup include/linux/rhashtable.h:607 [inline] rhashtable_lookup include/linux/rhashtable.h:646 [inline] rhashtable_lookup_fast include/linux/rhashtable.h:672 [inline] tcf_ct_flow_table_get+0x611/0x2260 net/sched/act_ct.c:329 tcf_ct_init+0xa67/0x2890 net/sched/act_ct.c:1408 tcf_action_init_1+0x6cc/0xb30 net/sched/act_api.c:1425 tcf_action_init+0x458/0xf00 net/sched/act_api.c:1488 tcf_action_add net/sched/act_api.c:2061 [inline] tc_ctl_action+0x4be/0x19d0 net/sched/act_api.c:2118 rtnetlink_rcv_msg+0x12fc/0x1410 net/core/rtnetlink.c:6647 netlink_rcv_skb+0x375/0x650 net/netlink/af_netlink.c:2550 rtnetlink_rcv+0x34/0x40 net/core/rtnetlink.c:6665 netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline] netlink_unicast+0xf52/0x1260 net/netlink/af_netlink.c:1357 netlink_sendmsg+0x10da/0x11e0 net/netlink/af_netlink.c:1901 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:745 ____sys_sendmsg+0x877/0xb60 net/socket.c:2597 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2651 __sys_sendmsg net/socket.c:2680 [inline] __do_sys_sendmsg net/socket.c:2689 [inline] __se_sys_sendmsg net/socket.c:2687 [inline] __x64_sys_sendmsg+0x307/0x4a0 net/socket.c:2687 x64_sys_call+0x2dd6/0x3c10 arch/x86/include/generated/asm/syscalls_64.h:47 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Local variable key created at: tcf_ct_flow_table_get+0x4a/0x2260 net/sched/act_ct.c:324 tcf_ct_init+0xa67/0x2890 net/sched/act_ct.c:1408 Fixes: 88c67aeb1407 ("sched: act_ct: add netns into the key of tcf_ct_flow_table") Reported-by: syzbot+1b5e4e187cc586d05ea0@syzkaller.appspotmail.com Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-25ethtool: rss: echo the context number backJakub Kicinski
The response to a GET request in Netlink should fully identify the queried object. RSS_GET accepts context id as an input, so it must echo that attribute back to the response. After (assuming context 1 has been created): $ ./cli.py --spec netlink/specs/ethtool.yaml \ --do rss-get \ --json '{"header": {"dev-index": 2}, "context": 1}' {'context': 1, 'header': {'dev-index': 2, 'dev-name': 'eth0'}, [...] Fixes: 7112a04664bf ("ethtool: add netlink based get rss support") Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20240724234249.2621109-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-25Merge tag 'net-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bpf and netfilter. A lot of networking people were at a conference last week, busy catching COVID, so relatively short PR. Current release - regressions: - tcp: process the 3rd ACK with sk_socket for TFO and MPTCP Current release - new code bugs: - l2tp: protect session IDR and tunnel session list with one lock, make sure the state is coherent to avoid a warning - eth: bnxt_en: update xdp_rxq_info in queue restart logic - eth: airoha: fix location of the MBI_RX_AGE_SEL_MASK field Previous releases - regressions: - xsk: require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len, the field reuses previously un-validated pad Previous releases - always broken: - tap/tun: drop short frames to prevent crashes later in the stack - eth: ice: add a per-VF limit on number of FDIR filters - af_unix: disable MSG_OOB handling for sockets in sockmap/sockhash" * tag 'net-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (34 commits) tun: add missing verification for short frame tap: add missing verification for short frame mISDN: Fix a use after free in hfcmulti_tx() gve: Fix an edge case for TSO skb validity check bnxt_en: update xdp_rxq_info in queue restart logic tcp: process the 3rd ACK with sk_socket for TFO/MPTCP selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len bpf: Fix a segment issue when downgrading gso_size net: mediatek: Fix potential NULL pointer dereference in dummy net_device handling MAINTAINERS: make Breno the netconsole maintainer MAINTAINERS: Update bonding entry net: nexthop: Initialize all fields in dumped nexthops net: stmmac: Correct byte order of perfect_match selftests: forwarding: skip if kernel not support setting bridge fdb learning limit tipc: Return non-zero value from tipc_udp_addr2str() on error netfilter: nft_set_pipapo_avx2: disable softinterrupts ice: Fix recipe read procedure ice: Add a per-VF limit on number of FDIR filters net: bonding: correctly annotate RCU in bond_should_notify_peers() ...
2024-07-25Merge tag 'constfy-sysctl-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl Pull sysctl constification from Joel Granados: "Treewide constification of the ctl_table argument of proc_handlers using a coccinelle script and some manual code formatting fixups. This is a prerequisite to moving the static ctl_table structs into read-only data section which will ensure that proc_handler function pointers cannot be modified" * tag 'constfy-sysctl-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: sysctl: treewide: constify the ctl_table argument of proc_handlers
2024-07-25Merge tag 'driver-core-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the big set of driver core changes for 6.11-rc1. Lots of stuff in here, with not a huge diffstat, but apis are evolving which required lots of files to be touched. Highlights of the changes in here are: - platform remove callback api final fixups (Uwe took many releases to get here, finally!) - Rust bindings for basic firmware apis and initial driver-core interactions. It's not all that useful for a "write a whole driver in rust" type of thing, but the firmware bindings do help out the phy rust drivers, and the driver core bindings give a solid base on which others can start their work. There is still a long way to go here before we have a multitude of rust drivers being added, but it's a great first step. - driver core const api changes. This reached across all bus types, and there are some fix-ups for some not-common bus types that linux-next and 0-day testing shook out. This work is being done to help make the rust bindings more safe, as well as the C code, moving toward the end-goal of allowing us to put driver structures into read-only memory. We aren't there yet, but are getting closer. - minor devres cleanups and fixes found by code inspection - arch_topology minor changes - other minor driver core cleanups All of these have been in linux-next for a very long time with no reported problems" * tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (55 commits) ARM: sa1100: make match function take a const pointer sysfs/cpu: Make crash_hotplug attribute world-readable dio: Have dio_bus_match() callback take a const * zorro: make match function take a const pointer driver core: module: make module_[add|remove]_driver take a const * driver core: make driver_find_device() take a const * driver core: make driver_[create|remove]_file take a const * firmware_loader: fix soundness issue in `request_internal` firmware_loader: annotate doctests as `no_run` devres: Correct code style for functions that return a pointer type devres: Initialize an uninitialized struct member devres: Fix memory leakage caused by driver API devm_free_percpu() devres: Fix devm_krealloc() wasting memory driver core: platform: Switch to use kmemdup_array() driver core: have match() callback in struct bus_type take a const * MAINTAINERS: add Rust device abstractions to DRIVER CORE device: rust: improve safety comments MAINTAINERS: add Danilo as FIRMWARE LOADER maintainer MAINTAINERS: add Rust FW abstractions to FIRMWARE LOADER firmware: rust: improve safety comments ...
2024-07-25Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2024-07-25 We've added 14 non-merge commits during the last 8 day(s) which contain a total of 19 files changed, 177 insertions(+), 70 deletions(-). The main changes are: 1) Fix af_unix to disable MSG_OOB handling for sockets in BPF sockmap and BPF sockhash. Also add test coverage for this case, from Michal Luczaj. 2) Fix a segmentation issue when downgrading gso_size in the BPF helper bpf_skb_adjust_room(), from Fred Li. 3) Fix a compiler warning in resolve_btfids due to a missing type cast, from Liwei Song. 4) Fix stack allocation for arm64 to align the stack pointer at a 16 byte boundary in the fexit_sleep BPF selftest, from Puranjay Mohan. 5) Fix a xsk regression to require a flag when actuating tx_metadata_len, from Stanislav Fomichev. 6) Fix function prototype BTF dumping in libbpf for prototypes that have no input arguments, from Andrii Nakryiko. 7) Fix stacktrace symbol resolution in perf script for BPF programs containing subprograms, from Hou Tao. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len bpf: Fix a segment issue when downgrading gso_size tools/resolve_btfids: Fix comparison of distinct pointer types warning in resolve_btfids bpf, events: Use prog to emit ksymbol event for main program selftests/bpf: Test sockmap redirect for AF_UNIX MSG_OOB selftests/bpf: Parametrize AF_UNIX redir functions to accept send() flags selftests/bpf: Support SOCK_STREAM in unix_inet_redir_to_connected() af_unix: Disable MSG_OOB handling for sockets in sockmap/sockhash bpftool: Fix typo in usage help libbpf: Fix no-args func prototype BTF dumping syntax MAINTAINERS: Update powerpc BPF JIT maintainers MAINTAINERS: Update email address of Naveen selftests/bpf: fexit_sleep: Fix stack allocation for arm64 ==================== Link: https://patch.msgid.link/20240725114312.32197-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-25tcp: process the 3rd ACK with sk_socket for TFO/MPTCPMatthieu Baerts (NGI0)
The 'Fixes' commit recently changed the behaviour of TCP by skipping the processing of the 3rd ACK when a sk->sk_socket is set. The goal was to skip tcp_ack_snd_check() in tcp_rcv_state_process() not to send an unnecessary ACK in case of simultaneous connect(). Unfortunately, that had an impact on TFO and MPTCP. I started to look at the impact on MPTCP, because the MPTCP CI found some issues with the MPTCP Packetdrill tests [1]. Then Paolo Abeni suggested me to look at the impact on TFO with "plain" TCP. For MPTCP, when receiving the 3rd ACK of a request adding a new path (MP_JOIN), sk->sk_socket will be set, and point to the MPTCP sock that has been created when the MPTCP connection got established before with the first path. The newly added 'goto' will then skip the processing of the segment text (step 7) and not go through tcp_data_queue() where the MPTCP options are validated, and some actions are triggered, e.g. sending the MPJ 4th ACK [2] as demonstrated by the new errors when running a packetdrill test [3] establishing a second subflow. This doesn't fully break MPTCP, mainly the 4th MPJ ACK that will be delayed. Still, we don't want to have this behaviour as it delays the switch to the fully established mode, and invalid MPTCP options in this 3rd ACK will not be caught any more. This modification also affects the MPTCP + TFO feature as well, and being the reason why the selftests started to be unstable the last few days [4]. For TFO, the existing 'basic-cookie-not-reqd' test [5] was no longer passing: if the 3rd ACK contains data, and the connection is accept()ed before receiving them, these data would no longer be processed, and thus not ACKed. One last thing about MPTCP, in case of simultaneous connect(), a fallback to TCP will be done, which seems fine: `../common/defaults.sh` 0 socket(..., SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_MPTCP) = 3 +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress) +0 > S 0:0(0) <mss 1460, sackOK, TS val 100 ecr 0, nop, wscale 8, mpcapable v1 flags[flag_h] nokey> +0 < S 0:0(0) win 1000 <mss 1460, sackOK, TS val 407 ecr 0, nop, wscale 8, mpcapable v1 flags[flag_h] nokey> +0 > S. 0:0(0) ack 1 <mss 1460, sackOK, TS val 330 ecr 0, nop, wscale 8, mpcapable v1 flags[flag_h] nokey> +0 < S. 0:0(0) ack 1 win 65535 <mss 1460, sackOK, TS val 700 ecr 100, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey=2]> +0 > . 1:1(0) ack 1 <nop, nop, TS val 845707014 ecr 700, nop, nop, sack 0:1> Simultaneous SYN-data crossing is also not supported by TFO, see [6]. Kuniyuki Iwashima suggested to restrict the processing to SYN+ACK only: that's a more generic solution than the one initially proposed, and also enough to fix the issues described above. Later on, Eric Dumazet mentioned that an ACK should still be sent in reaction to the second SYN+ACK that is received: not sending a DUPACK here seems wrong and could hurt: 0 socket(..., SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_TCP) = 3 +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress) +0 > S 0:0(0) <mss 1460, sackOK, TS val 1000 ecr 0,nop,wscale 8> +0 < S 0:0(0) win 1000 <mss 1000, sackOK, nop, nop> +0 > S. 0:0(0) ack 1 <mss 1460, sackOK, TS val 3308134035 ecr 0,nop,wscale 8> +0 < S. 0:0(0) ack 1 win 1000 <mss 1000, sackOK, nop, nop> +0 > . 1:1(0) ack 1 <nop, nop, sack 0:1> // <== Here So in this version, the 'goto consume' is dropped, to always send an ACK when switching from TCP_SYN_RECV to TCP_ESTABLISHED. This ACK will be seen as a DUPACK -- with DSACK if SACK has been negotiated -- in case of simultaneous SYN crossing: that's what is expected here. Link: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/9936227696 [1] Link: https://datatracker.ietf.org/doc/html/rfc8684#fig_tokens [2] Link: https://github.com/multipath-tcp/packetdrill/blob/mptcp-net-next/gtests/net/mptcp/syscalls/accept.pkt#L28 [3] Link: https://netdev.bots.linux.dev/contest.html?executor=vmksft-mptcp-dbg&test=mptcp-connect-sh [4] Link: https://github.com/google/packetdrill/blob/master/gtests/net/tcp/fastopen/server/basic-cookie-not-reqd.pkt#L21 [5] Link: https://github.com/google/packetdrill/blob/master/gtests/net/tcp/fastopen/client/simultaneous-fast-open.pkt [6] Fixes: 23e89e8ee7be ("tcp: Don't drop SYN+ACK for simultaneous connect().") Suggested-by: Paolo Abeni <pabeni@redhat.com> Suggested-by: Kuniyuki Iwashima <kuniyu@amazon.com> Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240724-upstream-net-next-20240716-tcp-3rd-ack-consume-sk_socket-v3-1-d48339764ce9@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-25xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_lenStanislav Fomichev
Julian reports that commit 341ac980eab9 ("xsk: Support tx_metadata_len") can break existing use cases which don't zero-initialize xdp_umem_reg padding. Introduce new XDP_UMEM_TX_METADATA_LEN to make sure we interpret the padding as tx_metadata_len only when being explicitly asked. Fixes: 341ac980eab9 ("xsk: Support tx_metadata_len") Reported-by: Julian Schindel <mail@arctic-alpaca.de> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/bpf/20240713015253.121248-2-sdf@fomichev.me
2024-07-25bpf: Fix a segment issue when downgrading gso_sizeFred Li
Linearize the skb when downgrading gso_size because it may trigger a BUG_ON() later when the skb is segmented as described in [1,2]. Fixes: 2be7e212d5419 ("bpf: add bpf_skb_adjust_room helper") Signed-off-by: Fred Li <dracodingfly@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/all/20240626065555.35460-2-dracodingfly@gmail.com [1] Link: https://lore.kernel.org/all/668d5cf1ec330_1c18c32947@willemb.c.googlers.com.notmuch [2] Link: https://lore.kernel.org/bpf/20240719024653.77006-1-dracodingfly@gmail.com
2024-07-25Merge tag 'nf-24-07-24' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains a Netfilter fix for net: Patch #1 if FPU is busy, then pipapo set backend falls back to standard set element lookup. Moreover, disable bh while at this. From Florian Westphal. netfilter pull request 24-07-24 * tag 'nf-24-07-24' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nft_set_pipapo_avx2: disable softinterrupts ==================== Link: https://patch.msgid.link/20240724081305.3152-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-24sysctl: treewide: constify the ctl_table argument of proc_handlersJoel Granados
const qualify the struct ctl_table argument in the proc_handler function signatures. This is a prerequisite to moving the static ctl_table structs into .rodata data which will ensure that proc_handler function pointers cannot be modified. This patch has been generated by the following coccinelle script: ``` virtual patch @r1@ identifier ctl, write, buffer, lenp, ppos; identifier func !~ "appldata_(timer|interval)_handler|sched_(rt|rr)_handler|rds_tcp_skbuf_handler|proc_sctp_do_(hmac_alg|rto_min|rto_max|udp_port|alpha_beta|auth|probe_interval)"; @@ int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int write, void *buffer, size_t *lenp, loff_t *ppos); @r2@ identifier func, ctl, write, buffer, lenp, ppos; @@ int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int write, void *buffer, size_t *lenp, loff_t *ppos) { ... } @r3@ identifier func; @@ int func( - struct ctl_table * + const struct ctl_table * ,int , void *, size_t *, loff_t *); @r4@ identifier func, ctl; @@ int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int , void *, size_t *, loff_t *); @r5@ identifier func, write, buffer, lenp, ppos; @@ int func( - struct ctl_table * + const struct ctl_table * ,int write, void *buffer, size_t *lenp, loff_t *ppos); ``` * Code formatting was adjusted in xfs_sysctl.c to comply with code conventions. The xfs_stats_clear_proc_handler, xfs_panic_mask_proc_handler and xfs_deprecated_dointvec_minmax where adjusted. * The ctl_table argument in proc_watchdog_common was const qualified. This is called from a proc_handler itself and is calling back into another proc_handler, making it necessary to change it as part of the proc_handler migration. Co-developed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Co-developed-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Joel Granados <j.granados@samsung.com>
2024-07-24net: nexthop: Initialize all fields in dumped nexthopsPetr Machata
struct nexthop_grp contains two reserved fields that are not initialized by nla_put_nh_group(), and carry garbage. This can be observed e.g. with strace (edited for clarity): # ip nexthop add id 1 dev lo # ip nexthop add id 101 group 1 # strace -e recvmsg ip nexthop get id 101 ... recvmsg(... [{nla_len=12, nla_type=NHA_GROUP}, [{id=1, weight=0, resvd1=0x69, resvd2=0x67}]] ...) = 52 The fields are reserved and therefore not currently used. But as they are, they leak kernel memory, and the fact they are not just zero complicates repurposing of the fields for new ends. Initialize the full structure. Fixes: 430a049190de ("nexthop: Add support for nexthop groups") Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-24tipc: Return non-zero value from tipc_udp_addr2str() on errorShigeru Yoshida
tipc_udp_addr2str() should return non-zero value if the UDP media address is invalid. Otherwise, a buffer overflow access can occur in tipc_media_addr_printf(). Fix this by returning 1 on an invalid UDP media address. Fixes: d0f91938bede ("tipc: add ip/udp media type") Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Reviewed-by: Tung Nguyen <tung.q.nguyen@endava.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-24netfilter: nft_set_pipapo_avx2: disable softinterruptsFlorian Westphal
We need to disable softinterrupts, else we get following problem: 1. pipapo_avx2 called from process context; fpu usable 2. preempt_disable() called, pcpu scratchmap in use 3. softirq handles rx or tx, we re-enter pipapo_avx2 4. fpu busy, fallback to generic non-avx version 5. fallback reuses scratch map and index, which are in use by the preempted process Handle this same way as generic version by first disabling softinterrupts while the scratchmap is in use. Fixes: f0b3d338064e ("netfilter: nft_set_pipapo_avx2: Add irq_fpu_usable() check, fallback to non-AVX2 version") Cc: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-07-23l2tp: make session IDR and tunnel session list coherentJames Chapman
Modify l2tp_session_register and l2tp_session_unhash so that the session IDR and tunnel session lists remain coherent. To do so, hold the session IDR lock and the tunnel's session list lock when making any changes to either list. Without this change, a rare race condition could hit the WARN_ON_ONCE in l2tp_session_unhash if a thread replaced the IDR entry while another thread was registering the same ID. [ 7126.151795][T17511] WARNING: CPU: 3 PID: 17511 at net/l2tp/l2tp_core.c:1282 l2tp_session_delete.part.0+0x87e/0xbc0 [ 7126.163754][T17511] ? show_regs+0x93/0xa0 [ 7126.164157][T17511] ? __warn+0xe5/0x3c0 [ 7126.164536][T17511] ? l2tp_session_delete.part.0+0x87e/0xbc0 [ 7126.165070][T17511] ? report_bug+0x2e1/0x500 [ 7126.165486][T17511] ? l2tp_session_delete.part.0+0x87e/0xbc0 [ 7126.166013][T17511] ? handle_bug+0x99/0x130 [ 7126.166428][T17511] ? exc_invalid_op+0x35/0x80 [ 7126.166890][T17511] ? asm_exc_invalid_op+0x1a/0x20 [ 7126.167372][T17511] ? l2tp_session_delete.part.0+0x87d/0xbc0 [ 7126.167900][T17511] ? l2tp_session_delete.part.0+0x87e/0xbc0 [ 7126.168429][T17511] ? __local_bh_enable_ip+0xa4/0x120 [ 7126.168917][T17511] l2tp_session_delete+0x40/0x50 [ 7126.169369][T17511] pppol2tp_release+0x1a1/0x3f0 [ 7126.169817][T17511] __sock_release+0xb3/0x270 [ 7126.170247][T17511] ? __pfx_sock_close+0x10/0x10 [ 7126.170697][T17511] sock_close+0x1c/0x30 [ 7126.171087][T17511] __fput+0x40b/0xb90 [ 7126.171470][T17511] task_work_run+0x16c/0x260 [ 7126.171897][T17511] ? __pfx_task_work_run+0x10/0x10 [ 7126.172362][T17511] ? srso_alias_return_thunk+0x5/0xfbef5 [ 7126.172863][T17511] ? do_raw_spin_unlock+0x174/0x230 [ 7126.173348][T17511] do_exit+0xaae/0x2b40 [ 7126.173730][T17511] ? srso_alias_return_thunk+0x5/0xfbef5 [ 7126.174235][T17511] ? __pfx_lock_release+0x10/0x10 [ 7126.174690][T17511] ? srso_alias_return_thunk+0x5/0xfbef5 [ 7126.175190][T17511] ? do_raw_spin_lock+0x12c/0x2b0 [ 7126.175650][T17511] ? __pfx_do_exit+0x10/0x10 [ 7126.176072][T17511] ? _raw_spin_unlock_irq+0x23/0x50 [ 7126.176543][T17511] do_group_exit+0xd3/0x2a0 [ 7126.176990][T17511] __x64_sys_exit_group+0x3e/0x50 [ 7126.177456][T17511] x64_sys_call+0x1821/0x1830 [ 7126.177895][T17511] do_syscall_64+0xcb/0x250 [ 7126.178317][T17511] entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: aa5e17e1f5ec ("l2tp: store l2tpv3 sessions in per-net IDR") Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240718134348.289865-1-jchapman@katalix.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-23ipv4: Fix incorrect source address in Record Route optionIdo Schimmel
The Record Route IP option records the addresses of the routers that routed the packet. In the case of forwarded packets, the kernel performs a route lookup via fib_lookup() and fills in the preferred source address of the matched route. The lookup is performed with the DS field of the forwarded packet, but using the RT_TOS() macro which only masks one of the two ECN bits. If the packet is ECT(0) or CE, the matched route might be different than the route via which the packet was forwarded as the input path masks both of the ECN bits, resulting in the wrong address being filled in the Record Route option. Fix by masking both of the ECN bits. Fixes: 8e36360ae876 ("ipv4: Remove route key identity dependencies in ip_rt_get_source().") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Link: https://patch.msgid.link/20240718123407.434778-1-idosch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-21Merge tag 'mm-nonmm-stable-2024-07-21-15-07' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - In the series "treewide: Refactor heap related implementation", Kuan-Wei Chiu has significantly reworked the min_heap library code and has taught bcachefs to use the new more generic implementation. - Yury Norov's series "Cleanup cpumask.h inclusion in core headers" reworks the cpumask and nodemask headers to make things generally more rational. - Kuan-Wei Chiu has sent along some maintenance work against our sorting library code in the series "lib/sort: Optimizations and cleanups". - More library maintainance work from Christophe Jaillet in the series "Remove usage of the deprecated ida_simple_xx() API". - Ryusuke Konishi continues with the nilfs2 fixes and clanups in the series "nilfs2: eliminate the call to inode_attach_wb()". - Kuan-Ying Lee has some fixes to the gdb scripts in the series "Fix GDB command error". - Plus the usual shower of singleton patches all over the place. Please see the relevant changelogs for details. * tag 'mm-nonmm-stable-2024-07-21-15-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (98 commits) ia64: scrub ia64 from poison.h watchdog/perf: properly initialize the turbo mode timestamp and rearm counter tsacct: replace strncpy() with strscpy() lib/bch.c: use swap() to improve code test_bpf: convert comma to semicolon init/modpost: conditionally check section mismatch to __meminit* init: remove unused __MEMINIT* macros nilfs2: Constify struct kobj_type nilfs2: avoid undefined behavior in nilfs_cnt32_ge macro math: rational: add missing MODULE_DESCRIPTION() macro lib/zlib: add missing MODULE_DESCRIPTION() macro fs: ufs: add MODULE_DESCRIPTION() lib/rbtree.c: fix the example typo ocfs2: add bounds checking to ocfs2_check_dir_entry() fs: add kernel-doc comments to ocfs2_prepare_orphan_dir() coredump: simplify zap_process() selftests/fpu: add missing MODULE_DESCRIPTION() macro compiler.h: simplify data_race() macro build-id: require program headers to be right after ELF header resource: add missing MODULE_DESCRIPTION() ...
2024-07-19Merge tag 'net-6.11-rc0' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from netfilter. Notably this includes fixes for a s390 build breakage. Current release - new code bugs: - eth: fbnic: fix s390 build - eth: airoha: fix NULL pointer dereference in airoha_qdma_cleanup_rx_queue() Previous releases - regressions: - flow_dissector: use DEBUG_NET_WARN_ON_ONCE - ipv4: fix incorrect TOS in route get reply - dsa: fix chip-wide frame size config in some drivers Previous releases - always broken: - netfilter: nf_set_pipapo: fix initial map fill - eth: gve: fix XDP TX completion handling when counters overflow" * tag 'net-6.11-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: eth: fbnic: don't build the driver when skb has more than 21 frags net: dsa: b53: Limit chip-wide jumbo frame config to CPU ports net: dsa: mv88e6xxx: Limit chip-wide frame size config to CPU ports net: airoha: Fix NULL pointer dereference in airoha_qdma_cleanup_rx_queue() net: wwan: t7xx: add support for Dell DW5933e ipv4: Fix incorrect TOS in fibmatch route get reply ipv4: Fix incorrect TOS in route get reply net: flow_dissector: use DEBUG_NET_WARN_ON_ONCE driver core: auxiliary bus: Fix documentation of auxiliary_device net: airoha: fix error branch in airoha_dev_xmit and airoha_set_gdm_ports gve: Fix XDP TX completion handling when counters overflow ipvs: properly dereference pe in ip_vs_add_service selftests: netfilter: add test case for recent mismatch bug netfilter: nf_set_pipapo: fix initial map fill netfilter: ctnetlink: use helper function to calculate expect ID eth: fbnic: fix s390 build.
2024-07-19Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: "Several new features here: - Virtio find vqs API has been reworked (required to fix the scalability issue we have with adminq, which I hope to merge later in the cycle) - vDPA driver for Marvell OCTEON - virtio fs performance improvement - mlx5 migration speedups Fixes, cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (56 commits) virtio: rename virtio_find_vqs_info() to virtio_find_vqs() virtio: remove unused virtio_find_vqs() and virtio_find_vqs_ctx() helpers virtio: convert the rest virtio_find_vqs() users to virtio_find_vqs_info() virtio_balloon: convert to use virtio_find_vqs_info() virtiofs: convert to use virtio_find_vqs_info() scsi: virtio_scsi: convert to use virtio_find_vqs_info() virtio_net: convert to use virtio_find_vqs_info() virtio_crypto: convert to use virtio_find_vqs_info() virtio_console: convert to use virtio_find_vqs_info() virtio_blk: convert to use virtio_find_vqs_info() virtio: rename find_vqs_info() op to find_vqs() virtio: remove the original find_vqs() op virtio: call virtio_find_vqs_info() from virtio_find_single_vq() directly virtio: convert find_vqs() op implementations to find_vqs_info() virtio_pci: convert vp_*find_vqs() ops to find_vqs_info() virtio: introduce virtio_queue_info struct and find_vqs_info() config op virtio: make virtio_find_single_vq() call virtio_find_vqs() virtio: make virtio_find_vqs() call virtio_find_vqs_ctx() caif_virtio: use virtio_find_single_vq() for single virtqueue finding vdpa/mlx5: Don't enable non-active VQs in .set_vq_ready() ...
2024-07-18Merge tag 'nfs-for-6.11-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client updates from Anna Schumaker: "New Features: - Add support for large folios - Implement rpcrdma generic device removal notification - Add client support for attribute delegations - Use a LAYOUTRETURN during reboot recovery to report layoutstats and errors - Improve throughput for random buffered writes - Add NVMe support to pnfs/blocklayout Bugfixes: - Fix rpcrdma_reqs_reset() - Avoid soft lockups when using UDP - Fix an nfs/blocklayout premature PR key unregestration - Another fix for EXCHGID4_FLAG_USE_PNFS_DS for DS server - Do not extend writes to the entire folio - Pass explicit offset and count values to tracepoints - Fix a race to wake up sleeping SUNRPC sync tasks - Fix gss_status tracepoint output Cleanups: - Add missing MODULE_DESCRIPTION() macros - Add blocklayout / SCSI layout tracepoints - Remove asm-generic headers from xprtrdma verbs.c - Remove unused 'struct mnt_fhstatus' - Other delegation related cleanups - Other folio related cleanups - Other pNFS related cleanups - Other xprtrdma cleanups" * tag 'nfs-for-6.11-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (63 commits) SUNRPC: Fixup gss_status tracepoint error output SUNRPC: Fix a race to wake a sync task nfs: split nfs_read_folio nfs: pass explicit offset/count to trace events nfs: do not extend writes to the entire folio nfs/blocklayout: add support for NVMe nfs: remove nfs_page_length nfs: remove the unused max_deviceinfo_size field from struct pnfs_layoutdriver_type nfs: don't reuse partially completed requests in nfs_lock_and_join_requests nfs: move nfs_wait_on_request to write.c nfs: fold nfs_page_group_lock_subrequests into nfs_lock_and_join_requests nfs: fold nfs_folio_find_and_lock_request into nfs_lock_and_join_requests nfs: simplify nfs_folio_find_and_lock_request nfs: remove nfs_folio_private_request nfs: remove dead code for the old swap over NFS implementation NFSv4.1 another fix for EXCHGID4_FLAG_USE_PNFS_DS for DS server nfs: Block on write congestion nfs: Properly initialize server->writeback nfs: Drop pointless check from nfs_commit_release_pages() nfs/blocklayout: SCSI layout trace points for reservation key reg/unreg ...