Age | Commit message (Collapse) | Author |
|
Normally, x->replay_esn and x->preplay_esn should be allocated at
xfrm_alloc_replay_state_esn(...) in xfrm_state_construct(...), hence the
xfrm_update_ae_params(...) is okay to update them. However, the current
implementation of xfrm_new_ae(...) allows a malicious user to directly
dereference a NULL pointer and crash the kernel like below.
BUG: kernel NULL pointer dereference, address: 0000000000000000
PGD 8253067 P4D 8253067 PUD 8e0e067 PMD 0
Oops: 0002 [#1] PREEMPT SMP KASAN NOPTI
CPU: 0 PID: 98 Comm: poc.npd Not tainted 6.4.0-rc7-00072-gdad9774deaf1 #8
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.o4
RIP: 0010:memcpy_orig+0xad/0x140
Code: e8 4c 89 5f e0 48 8d 7f e0 73 d2 83 c2 20 48 29 d6 48 29 d7 83 fa 10 72 34 4c 8b 06 4c 8b 4e 08 c
RSP: 0018:ffff888008f57658 EFLAGS: 00000202
RAX: 0000000000000000 RBX: ffff888008bd0000 RCX: ffffffff8238e571
RDX: 0000000000000018 RSI: ffff888007f64844 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff888008f57818
R13: ffff888007f64aa4 R14: 0000000000000000 R15: 0000000000000000
FS: 00000000014013c0(0000) GS:ffff88806d600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000000054d8000 CR4: 00000000000006f0
Call Trace:
<TASK>
? __die+0x1f/0x70
? page_fault_oops+0x1e8/0x500
? __pfx_is_prefetch.constprop.0+0x10/0x10
? __pfx_page_fault_oops+0x10/0x10
? _raw_spin_unlock_irqrestore+0x11/0x40
? fixup_exception+0x36/0x460
? _raw_spin_unlock_irqrestore+0x11/0x40
? exc_page_fault+0x5e/0xc0
? asm_exc_page_fault+0x26/0x30
? xfrm_update_ae_params+0xd1/0x260
? memcpy_orig+0xad/0x140
? __pfx__raw_spin_lock_bh+0x10/0x10
xfrm_update_ae_params+0xe7/0x260
xfrm_new_ae+0x298/0x4e0
? __pfx_xfrm_new_ae+0x10/0x10
? __pfx_xfrm_new_ae+0x10/0x10
xfrm_user_rcv_msg+0x25a/0x410
? __pfx_xfrm_user_rcv_msg+0x10/0x10
? __alloc_skb+0xcf/0x210
? stack_trace_save+0x90/0xd0
? filter_irq_stacks+0x1c/0x70
? __stack_depot_save+0x39/0x4e0
? __kasan_slab_free+0x10a/0x190
? kmem_cache_free+0x9c/0x340
? netlink_recvmsg+0x23c/0x660
? sock_recvmsg+0xeb/0xf0
? __sys_recvfrom+0x13c/0x1f0
? __x64_sys_recvfrom+0x71/0x90
? do_syscall_64+0x3f/0x90
? entry_SYSCALL_64_after_hwframe+0x72/0xdc
? copyout+0x3e/0x50
netlink_rcv_skb+0xd6/0x210
? __pfx_xfrm_user_rcv_msg+0x10/0x10
? __pfx_netlink_rcv_skb+0x10/0x10
? __pfx_sock_has_perm+0x10/0x10
? mutex_lock+0x8d/0xe0
? __pfx_mutex_lock+0x10/0x10
xfrm_netlink_rcv+0x44/0x50
netlink_unicast+0x36f/0x4c0
? __pfx_netlink_unicast+0x10/0x10
? netlink_recvmsg+0x500/0x660
netlink_sendmsg+0x3b7/0x700
This Null-ptr-deref bug is assigned CVE-2023-3772. And this commit
adds additional NULL check in xfrm_update_ae_params to fix the NPD.
Fixes: d8647b79c3b7 ("xfrm: Add user interface for esn and big anti-replay windows")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
sk->sk_mark is often read while another thread could change the value.
Fixes: 4a19ec5800fc ("[NET]: Introducing socket mark socket option.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since mlx5 supports UDP encapsulation in packet offload, change the XFRM
core to allow users to configure it.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When the xfrm device is set to the qdisc of the sfb type, the cb field
of the sent skb may be modified during enqueuing. Then,
slab-use-after-free may occur when the xfrm device sends IPv6 packets.
The stack information is as follows:
BUG: KASAN: slab-use-after-free in decode_session6+0x103f/0x1890
Read of size 1 at addr ffff8881111458ef by task swapper/3/0
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 6.4.0-next-20230707 #409
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl+0xd9/0x150
print_address_description.constprop.0+0x2c/0x3c0
kasan_report+0x11d/0x130
decode_session6+0x103f/0x1890
__xfrm_decode_session+0x54/0xb0
xfrmi_xmit+0x173/0x1ca0
dev_hard_start_xmit+0x187/0x700
sch_direct_xmit+0x1a3/0xc30
__qdisc_run+0x510/0x17a0
__dev_queue_xmit+0x2215/0x3b10
neigh_connected_output+0x3c2/0x550
ip6_finish_output2+0x55a/0x1550
ip6_finish_output+0x6b9/0x1270
ip6_output+0x1f1/0x540
ndisc_send_skb+0xa63/0x1890
ndisc_send_rs+0x132/0x6f0
addrconf_rs_timer+0x3f1/0x870
call_timer_fn+0x1a0/0x580
expire_timers+0x29b/0x4b0
run_timer_softirq+0x326/0x910
__do_softirq+0x1d4/0x905
irq_exit_rcu+0xb7/0x120
sysvec_apic_timer_interrupt+0x97/0xc0
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x1a/0x20
RIP: 0010:intel_idle_hlt+0x23/0x30
Code: 1f 84 00 00 00 00 00 f3 0f 1e fa 41 54 41 89 d4 0f 1f 44 00 00 66 90 0f 1f 44 00 00 0f 00 2d c4 9f ab 00 0f 1f 44 00 00 fb f4 <fa> 44 89 e0 41 5c c3 66 0f 1f 44 00 00 f3 0f 1e fa 41 54 41 89 d4
RSP: 0018:ffffc90000197d78 EFLAGS: 00000246
RAX: 00000000000a83c3 RBX: ffffe8ffffd09c50 RCX: ffffffff8a22d8e5
RDX: 0000000000000001 RSI: ffffffff8d3f8080 RDI: ffffe8ffffd09c50
RBP: ffffffff8d3f8080 R08: 0000000000000001 R09: ffffed1026ba6d9d
R10: ffff888135d36ceb R11: 0000000000000001 R12: 0000000000000001
R13: ffffffff8d3f8100 R14: 0000000000000001 R15: 0000000000000000
cpuidle_enter_state+0xd3/0x6f0
cpuidle_enter+0x4e/0xa0
do_idle+0x2fe/0x3c0
cpu_startup_entry+0x18/0x20
start_secondary+0x200/0x290
secondary_startup_64_no_verify+0x167/0x16b
</TASK>
Allocated by task 939:
kasan_save_stack+0x22/0x40
kasan_set_track+0x25/0x30
__kasan_slab_alloc+0x7f/0x90
kmem_cache_alloc_node+0x1cd/0x410
kmalloc_reserve+0x165/0x270
__alloc_skb+0x129/0x330
inet6_ifa_notify+0x118/0x230
__ipv6_ifa_notify+0x177/0xbe0
addrconf_dad_completed+0x133/0xe00
addrconf_dad_work+0x764/0x1390
process_one_work+0xa32/0x16f0
worker_thread+0x67d/0x10c0
kthread+0x344/0x440
ret_from_fork+0x1f/0x30
The buggy address belongs to the object at ffff888111145800
which belongs to the cache skbuff_small_head of size 640
The buggy address is located 239 bytes inside of
freed 640-byte region [ffff888111145800, ffff888111145a80)
As commit f855691975bb ("xfrm6: Fix the nexthdr offset in
_decode_session6.") showed, xfrm_decode_session was originally intended
only for the receive path. IP6CB(skb)->nhoff is not set during
transmission. Therefore, set the cb field in the skb to 0 before
sending packets.
Fixes: f855691975bb ("xfrm6: Fix the nexthdr offset in _decode_session6.")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
After the elimination of inner modes, a couple of warnings that
were previously unreachable can now be triggered by malformed
inbound packets.
Fix this by:
1. Moving the setting of skb->protocol into the decap functions.
2. Returning -EINVAL when unexpected protocol is seen.
Reported-by: Maciej Żenczykowski<maze@google.com>
Fixes: 5f24f41e8ea6 ("xfrm: Remove inner/outer modes from input path")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
According to all consumers code of attrs[XFRMA_SEC_CTX], like
* verify_sec_ctx_len(), convert to xfrm_user_sec_ctx*
* xfrm_state_construct(), call security_xfrm_state_alloc whose prototype
is int security_xfrm_state_alloc(.., struct xfrm_user_sec_ctx *sec_ctx);
* copy_from_user_sec_ctx(), convert to xfrm_user_sec_ctx *
...
It seems that the expected parsing result for XFRMA_SEC_CTX should be
structure xfrm_user_sec_ctx, and the current xfrm_sec_ctx is confusing
and misleading (Luckily, they happen to have same size 8 bytes).
This commit amend the policy structure to xfrm_user_sec_ctx to avoid
ambiguity.
Fixes: cf5cb79f6946 ("[XFRM] netlink: Establish an attribute policy")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
We found below OOB crash:
[ 44.211730] ==================================================================
[ 44.212045] BUG: KASAN: slab-out-of-bounds in memcmp+0x8b/0xb0
[ 44.212045] Read of size 8 at addr ffff88800870f320 by task poc.xfrm/97
[ 44.212045]
[ 44.212045] CPU: 0 PID: 97 Comm: poc.xfrm Not tainted 6.4.0-rc7-00072-gdad9774deaf1-dirty #4
[ 44.212045] Call Trace:
[ 44.212045] <TASK>
[ 44.212045] dump_stack_lvl+0x37/0x50
[ 44.212045] print_report+0xcc/0x620
[ 44.212045] ? __virt_addr_valid+0xf3/0x170
[ 44.212045] ? memcmp+0x8b/0xb0
[ 44.212045] kasan_report+0xb2/0xe0
[ 44.212045] ? memcmp+0x8b/0xb0
[ 44.212045] kasan_check_range+0x39/0x1c0
[ 44.212045] memcmp+0x8b/0xb0
[ 44.212045] xfrm_state_walk+0x21c/0x420
[ 44.212045] ? __pfx_dump_one_state+0x10/0x10
[ 44.212045] xfrm_dump_sa+0x1e2/0x290
[ 44.212045] ? __pfx_xfrm_dump_sa+0x10/0x10
[ 44.212045] ? __kernel_text_address+0xd/0x40
[ 44.212045] ? kasan_unpoison+0x27/0x60
[ 44.212045] ? mutex_lock+0x60/0xe0
[ 44.212045] ? __pfx_mutex_lock+0x10/0x10
[ 44.212045] ? kasan_save_stack+0x22/0x50
[ 44.212045] netlink_dump+0x322/0x6c0
[ 44.212045] ? __pfx_netlink_dump+0x10/0x10
[ 44.212045] ? mutex_unlock+0x7f/0xd0
[ 44.212045] ? __pfx_mutex_unlock+0x10/0x10
[ 44.212045] __netlink_dump_start+0x353/0x430
[ 44.212045] xfrm_user_rcv_msg+0x3a4/0x410
[ 44.212045] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[ 44.212045] ? __pfx_xfrm_user_rcv_msg+0x10/0x10
[ 44.212045] ? __pfx_xfrm_dump_sa+0x10/0x10
[ 44.212045] ? __pfx_xfrm_dump_sa_done+0x10/0x10
[ 44.212045] ? __stack_depot_save+0x382/0x4e0
[ 44.212045] ? filter_irq_stacks+0x1c/0x70
[ 44.212045] ? kasan_save_stack+0x32/0x50
[ 44.212045] ? kasan_save_stack+0x22/0x50
[ 44.212045] ? kasan_set_track+0x25/0x30
[ 44.212045] ? __kasan_slab_alloc+0x59/0x70
[ 44.212045] ? kmem_cache_alloc_node+0xf7/0x260
[ 44.212045] ? kmalloc_reserve+0xab/0x120
[ 44.212045] ? __alloc_skb+0xcf/0x210
[ 44.212045] ? netlink_sendmsg+0x509/0x700
[ 44.212045] ? sock_sendmsg+0xde/0xe0
[ 44.212045] ? __sys_sendto+0x18d/0x230
[ 44.212045] ? __x64_sys_sendto+0x71/0x90
[ 44.212045] ? do_syscall_64+0x3f/0x90
[ 44.212045] ? entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 44.212045] ? netlink_sendmsg+0x509/0x700
[ 44.212045] ? sock_sendmsg+0xde/0xe0
[ 44.212045] ? __sys_sendto+0x18d/0x230
[ 44.212045] ? __x64_sys_sendto+0x71/0x90
[ 44.212045] ? do_syscall_64+0x3f/0x90
[ 44.212045] ? entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 44.212045] ? kasan_save_stack+0x22/0x50
[ 44.212045] ? kasan_set_track+0x25/0x30
[ 44.212045] ? kasan_save_free_info+0x2e/0x50
[ 44.212045] ? __kasan_slab_free+0x10a/0x190
[ 44.212045] ? kmem_cache_free+0x9c/0x340
[ 44.212045] ? netlink_recvmsg+0x23c/0x660
[ 44.212045] ? sock_recvmsg+0xeb/0xf0
[ 44.212045] ? __sys_recvfrom+0x13c/0x1f0
[ 44.212045] ? __x64_sys_recvfrom+0x71/0x90
[ 44.212045] ? do_syscall_64+0x3f/0x90
[ 44.212045] ? entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 44.212045] ? copyout+0x3e/0x50
[ 44.212045] netlink_rcv_skb+0xd6/0x210
[ 44.212045] ? __pfx_xfrm_user_rcv_msg+0x10/0x10
[ 44.212045] ? __pfx_netlink_rcv_skb+0x10/0x10
[ 44.212045] ? __pfx_sock_has_perm+0x10/0x10
[ 44.212045] ? mutex_lock+0x8d/0xe0
[ 44.212045] ? __pfx_mutex_lock+0x10/0x10
[ 44.212045] xfrm_netlink_rcv+0x44/0x50
[ 44.212045] netlink_unicast+0x36f/0x4c0
[ 44.212045] ? __pfx_netlink_unicast+0x10/0x10
[ 44.212045] ? netlink_recvmsg+0x500/0x660
[ 44.212045] netlink_sendmsg+0x3b7/0x700
[ 44.212045] ? __pfx_netlink_sendmsg+0x10/0x10
[ 44.212045] ? __pfx_netlink_sendmsg+0x10/0x10
[ 44.212045] sock_sendmsg+0xde/0xe0
[ 44.212045] __sys_sendto+0x18d/0x230
[ 44.212045] ? __pfx___sys_sendto+0x10/0x10
[ 44.212045] ? rcu_core+0x44a/0xe10
[ 44.212045] ? __rseq_handle_notify_resume+0x45b/0x740
[ 44.212045] ? _raw_spin_lock_irq+0x81/0xe0
[ 44.212045] ? __pfx___rseq_handle_notify_resume+0x10/0x10
[ 44.212045] ? __pfx_restore_fpregs_from_fpstate+0x10/0x10
[ 44.212045] ? __pfx_blkcg_maybe_throttle_current+0x10/0x10
[ 44.212045] ? __pfx_task_work_run+0x10/0x10
[ 44.212045] __x64_sys_sendto+0x71/0x90
[ 44.212045] do_syscall_64+0x3f/0x90
[ 44.212045] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 44.212045] RIP: 0033:0x44b7da
[ 44.212045] RSP: 002b:00007ffdc8838548 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 44.212045] RAX: ffffffffffffffda RBX: 00007ffdc8839978 RCX: 000000000044b7da
[ 44.212045] RDX: 0000000000000038 RSI: 00007ffdc8838770 RDI: 0000000000000003
[ 44.212045] RBP: 00007ffdc88385b0 R08: 00007ffdc883858c R09: 000000000000000c
[ 44.212045] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
[ 44.212045] R13: 00007ffdc8839968 R14: 00000000004c37d0 R15: 0000000000000001
[ 44.212045] </TASK>
[ 44.212045]
[ 44.212045] Allocated by task 97:
[ 44.212045] kasan_save_stack+0x22/0x50
[ 44.212045] kasan_set_track+0x25/0x30
[ 44.212045] __kasan_kmalloc+0x7f/0x90
[ 44.212045] __kmalloc_node_track_caller+0x5b/0x140
[ 44.212045] kmemdup+0x21/0x50
[ 44.212045] xfrm_dump_sa+0x17d/0x290
[ 44.212045] netlink_dump+0x322/0x6c0
[ 44.212045] __netlink_dump_start+0x353/0x430
[ 44.212045] xfrm_user_rcv_msg+0x3a4/0x410
[ 44.212045] netlink_rcv_skb+0xd6/0x210
[ 44.212045] xfrm_netlink_rcv+0x44/0x50
[ 44.212045] netlink_unicast+0x36f/0x4c0
[ 44.212045] netlink_sendmsg+0x3b7/0x700
[ 44.212045] sock_sendmsg+0xde/0xe0
[ 44.212045] __sys_sendto+0x18d/0x230
[ 44.212045] __x64_sys_sendto+0x71/0x90
[ 44.212045] do_syscall_64+0x3f/0x90
[ 44.212045] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 44.212045]
[ 44.212045] The buggy address belongs to the object at ffff88800870f300
[ 44.212045] which belongs to the cache kmalloc-64 of size 64
[ 44.212045] The buggy address is located 32 bytes inside of
[ 44.212045] allocated 36-byte region [ffff88800870f300, ffff88800870f324)
[ 44.212045]
[ 44.212045] The buggy address belongs to the physical page:
[ 44.212045] page:00000000e4de16ee refcount:1 mapcount:0 mapping:000000000 ...
[ 44.212045] flags: 0x100000000000200(slab|node=0|zone=1)
[ 44.212045] page_type: 0xffffffff()
[ 44.212045] raw: 0100000000000200 ffff888004c41640 dead000000000122 0000000000000000
[ 44.212045] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 44.212045] page dumped because: kasan: bad access detected
[ 44.212045]
[ 44.212045] Memory state around the buggy address:
[ 44.212045] ffff88800870f200: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 44.212045] ffff88800870f280: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
[ 44.212045] >ffff88800870f300: 00 00 00 00 04 fc fc fc fc fc fc fc fc fc fc fc
[ 44.212045] ^
[ 44.212045] ffff88800870f380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 44.212045] ffff88800870f400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 44.212045] ==================================================================
By investigating the code, we find the root cause of this OOB is the lack
of checks in xfrm_dump_sa(). The buggy code allows a malicious user to pass
arbitrary value of filter->splen/dplen. Hence, with crafted xfrm states,
the attacker can achieve 8 bytes heap OOB read, which causes info leak.
if (attrs[XFRMA_ADDRESS_FILTER]) {
filter = kmemdup(nla_data(attrs[XFRMA_ADDRESS_FILTER]),
sizeof(*filter), GFP_KERNEL);
if (filter == NULL)
return -ENOMEM;
// NO MORE CHECKS HERE !!!
}
This patch fixes the OOB by adding necessary boundary checks, just like
the code in pfkey_dump() function.
Fixes: d3623099d350 ("ipsec: add support of limited SA dump")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
As MSG_SENDPAGE_NOTLAST is being phased out along with sendpage(), don't
use it further in than the sendpage methods, but rather translate it to
MSG_MORE and use that instead.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
cc: Bernard Metzler <bmt@zurich.ibm.com>
cc: Jason Gunthorpe <jgg@ziepe.ca>
cc: Leon Romanovsky <leon@kernel.org>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jakub Sitnicki <jakub@cloudflare.com>
cc: David Ahern <dsahern@kernel.org>
cc: Karsten Graul <kgraul@linux.ibm.com>
cc: Wenjia Zhang <wenjia@linux.ibm.com>
cc: Jan Karcher <jaka@linux.ibm.com>
cc: "D. Wythe" <alibuda@linux.alibaba.com>
cc: Tony Lu <tonylu@linux.alibaba.com>
cc: Wen Gu <guwen@linux.alibaba.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: Steffen Klassert <steffen.klassert@secunet.com>
cc: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/20230623225513.2732256-2-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR.
Conflicts:
tools/testing/selftests/net/fcnal-test.sh
d7a2fc1437f7 ("selftests: net: fcnal-test: check if FIPS mode is enabled")
dd017c72dde6 ("selftests: fcnal: Test SO_DONTROUTE on TCP sockets.")
https://lore.kernel.org/all/5007b52c-dd16-dbf6-8d64-b9701bfa498b@tessares.net/
https://lore.kernel.org/all/20230619105427.4a0df9b3@canb.auug.org.au/
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
For BEET the inner address and therefore family is stored in the
xfrm_state selector. Use that when decapsulating an input packet
instead of incorrectly relying on a non-existent tunnel protocol.
Fixes: 5f24f41e8ea6 ("xfrm: Remove inner/outer modes from input path")
Reported-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Move declarations into include/net/gso.h and code into net/core/gso.c
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230608191738.3947077-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Offloaded policies are deleted through two flows: netdev is going
down and policy flush.
In both cases, the code lacks relevant call to delete offloaded policy.
Fixes: 919e43fad516 ("xfrm: add an interface to offload policy")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
do_tcp_sendpages() is now just a small wrapper around tcp_sendmsg_locked(),
so inline it, allowing do_tcp_sendpages() to be removed. This is part of
replacing ->sendpage() with a call to sendmsg() with MSG_SPLICE_PAGES set.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steffen Klassert <steffen.klassert@secunet.com>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: David Ahern <dsahern@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This change adds methods in the XFRM-I input path that ensures that
policies are checked prior to processing of the subsequent decapsulated
packet, after which the relevant policies may no longer be resolvable
(due to changing src/dst/proto/etc).
Notably, raw ESP/AH packets did not perform policy checks inherently,
whereas all other encapsulated packets (UDP, TCP encapsulated) do policy
checks after calling xfrm_input handling in the respective encapsulation
layer.
Fixes: b0355dbbf13c ("Fix XFRM-I support for nested ESP tunnels")
Test: Verified with additional Android Kernel Unit tests
Test: Verified against Android CTS
Signed-off-by: Benedict Wong <benedictwong@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
This change allows inbound traffic through nested IPsec tunnels to
successfully match policies and templates, while retaining the secpath
stack trace as necessary for netfilter policies.
Specifically, this patch marks secpath entries that have already matched
against a relevant policy as having been verified, allowing it to be
treated as optional and skipped after a tunnel decapsulation (during
which the src/dst/proto/etc may have changed, and the correct policy
chain no long be resolvable).
This approach is taken as opposed to the iteration in b0355dbbf13c,
where the secpath was cleared, since that breaks subsequent validations
that rely on the existence of the secpath entries (netfilter policies, or
transport-in-tunnel mode, where policies remain resolvable).
Fixes: b0355dbbf13c ("Fix XFRM-I support for nested ESP tunnels")
Test: Tested against Android Kernel Unit Tests
Test: Tested against Android CTS
Signed-off-by: Benedict Wong <benedictwong@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Conflicts:
drivers/net/ethernet/freescale/fec_main.c
6ead9c98cafc ("net: fec: remove the xdp_return_frame when lack of tx BDs")
144470c88c5d ("net: fec: using the standard return codes when xdp xmit errors")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2023-05-16
1) Don't check the policy default if we have an allow
policy. Fix from Sabrina Dubroca.
2) Fix netdevice refount usage on offload.
From Leon Romanovsky.
3) Use netdev_put instead of dev_puti to correctly release
the netdev on failure in xfrm_dev_policy_add.
From Leon Romanovsky.
4) Revert "Fix XFRM-I support for nested ESP tunnels"
This broke Netfilter policy matching.
From Martin Willi.
5) Reject optional tunnel/BEET mode templates in outbound policies
on netlink and pfkey sockets. From Tobias Brunner.
6) Check if_id in inbound policy/secpath match to make
it symetric to the outbound codepath.
From Benedict Wong.
* tag 'ipsec-2023-05-16' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
xfrm: Check if_id in inbound policy/secpath match
af_key: Reject optional tunnel/BEET mode templates in outbound policies
xfrm: Reject optional tunnel/BEET mode templates in outbound policies
Revert "Fix XFRM-I support for nested ESP tunnels"
xfrm: Fix leak of dev tracker
xfrm: release all offloaded policy memory
xfrm: don't check the default policy if the policy allows the packet
====================
Link: https://lore.kernel.org/r/20230516052405.2677554-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Most users use __skb_frag_set_page()/skb_frag_off_set()/
skb_frag_size_set() to fill the page desc for a skb frag.
Introduce skb_frag_fill_page_desc() to do that.
net/bpf/test_run.c does not call skb_frag_off_set() to
set the offset, "copy_from_user(page_address(page), ...)"
and 'shinfo' being part of the 'data' kzalloced in
bpf_test_init() suggest that it is assuming offset to be
initialized as zero, so call skb_frag_fill_page_desc()
with offset being zero for this case.
Also, skb_frag_set_page() is not used anymore, so remove
it.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This change ensures that if configured in the policy, the if_id set in
the policy and secpath states match during the inbound policy check.
Without this, there is potential for ambiguity where entries in the
secpath differing by only the if_id could be mismatched.
Notably, this is checked in the outbound direction when resolving
templates to SAs, but not on the inbound path when matching SAs and
policies.
Test: Tested against Android kernel unit tests & CTS
Signed-off-by: Benedict Wong <benedictwong@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
xfrm_state_find() uses `encap_family` of the current template with
the passed local and remote addresses to find a matching state.
If an optional tunnel or BEET mode template is skipped in a mixed-family
scenario, there could be a mismatch causing an out-of-bounds read as
the addresses were not replaced to match the family of the next template.
While there are theoretical use cases for optional templates in outbound
policies, the only practical one is to skip IPComp states in inbound
policies if uncompressed packets are received that are handled by an
implicitly created IPIP state instead.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Tobias Brunner <tobias@strongswan.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
This reverts commit b0355dbbf13c0052931dd14c38c789efed64d3de.
The reverted commit clears the secpath on packets received via xfrm interfaces
to support nested IPsec tunnels. This breaks Netfilter policy matching using
xt_policy in the FORWARD chain, as the secpath is missing during forwarding.
Additionally, Benedict Wong reports that it breaks Transport-in-Tunnel mode.
Fix this regression by reverting the commit until we have a better approach
for nested IPsec tunnels.
Fixes: b0355dbbf13c ("Fix XFRM-I support for nested ESP tunnels")
Link: https://lore.kernel.org/netdev/20230412085615.124791-1-martin@strongswan.org/
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
At the stage of direction checks, the netdev reference tracker is
already initialized, but released with wrong *_put() call.
Fixes: 919e43fad516 ("xfrm: add an interface to offload policy")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Failure to add offloaded policy will cause to the following
error once user will try to reload driver.
Unregister_netdevice: waiting for eth3 to become free. Usage count = 2
This was caused by xfrm_dev_policy_add() which increments reference
to net_device. That reference was supposed to be decremented
in xfrm_dev_policy_free(). However the latter wasn't called.
unregister_netdevice: waiting for eth3 to become free. Usage count = 2
leaked reference.
xfrm_dev_policy_add+0xff/0x3d0
xfrm_policy_construct+0x352/0x420
xfrm_add_policy+0x179/0x320
xfrm_user_rcv_msg+0x1d2/0x3d0
netlink_rcv_skb+0xe0/0x210
xfrm_netlink_rcv+0x45/0x50
netlink_unicast+0x346/0x490
netlink_sendmsg+0x3b0/0x6c0
sock_sendmsg+0x73/0xc0
sock_write_iter+0x13b/0x1f0
vfs_write+0x528/0x5d0
ksys_write+0x120/0x150
do_syscall_64+0x3d/0x90
entry_SYSCALL_64_after_hwframe+0x46/0xb0
Fixes: 919e43fad516 ("xfrm: add an interface to offload policy")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
ipsec-next 2023-04-19
1) Remove inner/outer modes from input/output path. These are
not needed anymore. From Herbert Xu.
* tag 'ipsec-next-2023-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
xfrm: Remove inner/outer modes from output path
xfrm: Remove inner/outer modes from input path
====================
Link: https://lore.kernel.org/r/20230419075300.452227-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Leon Romanovsky says:
====================
Improve IPsec limits, ESN and replay window
This series overcomes existing hardware limitations in Mellanox ConnectX
devices around handling IPsec soft and hard limits.
In addition, the ESN logic is tied and added an interface to configure
replay window sequence numbers through existing iproute2 interface.
ip xfrm state ... [ replay-seq SEQ ] [ replay-oseq SEQ ]
[ replay-seq-hi SEQ ] [ replay-oseq-hi SEQ ]
Link: https://lore.kernel.org/all/cover.1680162300.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
* tag 'ipsec-esn-replay' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
net/mlx5e: Simulate missing IPsec TX limits hardware functionality
net/mlx5e: Generalize IPsec work structs
net/mlx5e: Reduce contention in IPsec workqueue
net/mlx5e: Set IPsec replay sequence numbers
net/mlx5e: Remove ESN callbacks if it is not supported
xfrm: don't require advance ESN callback for packet offload
net/mlx5e: Overcome slow response for first IPsec ASO WQE
net/mlx5e: Add SW implementation to support IPsec 64 bit soft and hard limits
net/mlx5e: Prevent zero IPsec soft/hard limits
net/mlx5e: Factor out IPsec ASO update function
====================
Link: https://lore.kernel.org/r/20230406071902.712388-1-leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The current code doesn't let a simple "allow" policy counteract a
default policy blocking all incoming packets:
ip x p setdefault in block
ip x p a src 192.168.2.1/32 dst 192.168.2.2/32 dir in action allow
At this stage, we have an allow policy (with or without transforms)
for this packet. It doesn't matter what the default policy says, since
the policy we looked up lets the packet through. The case of a
blocking policy is already handled separately, so we can remove this
check.
Fixes: 2d151d39073a ("xfrm: Add possibility to set the default to block if we have no policy")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
In packet offload mode, the hardware is responsible to manage
replay window and advance ESN. In that mode, there won't any
call to .xdo_dev_state_advance_esn callback.
So relax current check for existence of that callback.
Link: https://lore.kernel.org/r/9f3dfc3fef2cfcd191f0c5eee7cf0aa74e7f7786.1680162300.git.leonro@nvidia.com
Reviewed-by: Raed Salem <raeds@nvidia.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Leon Romanovsky says:
====================
Extend packet offload to fully support libreswan
The following patches are an outcome of Raed's work to add packet
offload support to libreswan [1].
The series includes:
* Priority support to IPsec policies
* Statistics per-SA (visible through "ip -s xfrm state ..." command)
* Support to IKE policy holes
* Fine tuning to acquire logic.
[1] https://github.com/libreswan/libreswan/pull/986
Link: https://lore.kernel.org/all/cover.1678714336.git.leon@kernel.org
* tag 'ipsec-libreswan-mlx5' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
net/mlx5e: Update IPsec per SA packets/bytes count
net/mlx5e: Use one rule to count all IPsec Tx offloaded traffic
net/mlx5e: Support IPsec acquire default SA
net/mlx5e: Allow policies with reqid 0, to support IKE policy holes
xfrm: copy_to_user_state fetch offloaded SA packets/bytes statistics
xfrm: add new device offload acquire flag
net/mlx5e: Use chains for IPsec policy priority offload
net/mlx5: fs_core: Allow ignore_flow_level on TX dest
net/mlx5: fs_chains: Refactor to detach chains from tc usage
====================
Link: https://lore.kernel.org/r/20230320094722.1009304-1-leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Both in RX and TX, the traffic that performs IPsec packet offload
transformation is accounted by HW only. Consequently, the HW should
be queried for packets/bytes statistics when user asks for such
transformation data.
Signed-off-by: Raed Salem <raeds@nvidia.com>
Link: https://lore.kernel.org/r/d90ec74186452b1509ee94875d942cb777b7181e.1678714336.git.leon@kernel.org
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
During XFRM acquire flow, a default SA is created to be updated later,
once acquire netlink message is handled in user space. When the relevant
policy is offloaded this default SA is also offloaded to IPsec offload
supporting driver, however this SA does not have context suitable for
offloading in HW, nor is interesting to offload to HW, consequently needs
a special driver handling apart from other offloaded SA(s).
Add a special flag that marks such SA so driver can handle it correctly.
Signed-off-by: Raed Salem <raeds@nvidia.com>
Link: https://lore.kernel.org/r/f5da0834d8c6b82ab9ba38bd4a0c55e71f0e3dab.1678714336.git.leon@kernel.org
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2023-03-15
1) Fix an information leak when dumping algos and encap.
From Herbert Xu
2) Allow transport-mode states with AF_UNSPEC selector
to allow for nested transport-mode states.
From Herbert Xu.
* tag 'ipsec-2023-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
xfrm: Allow transport-mode states with AF_UNSPEC selector
xfrm: Zero padding when dumping algos and encap
====================
Link: https://lore.kernel.org/r/20230315105623.1396491-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The inner/outer modes were added to abstract out common code that
were once duplicated between IPv4 and IPv6. As time went on the
abstractions have been removed and we are now left with empty
shells that only contain duplicate information. These can be
removed one-by-one as the same information is already present
elsewhere in the xfrm_state object.
Just like the input-side, removing this from the output code
makes it possible to use transport-mode SAs underneath an
inter-family tunnel mode SA.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
The inner/outer modes were added to abstract out common code that
were once duplicated between IPv4 and IPv6. As time went on the
abstractions have been removed and we are now left with empty
shells that only contain duplicate information. These can be
removed one-by-one as the same information is already present
elsewhere in the xfrm_state object.
Removing them from the input path actually allows certain valid
combinations that are currently disallowed. In particular, when
a transport mode SA sits beneath a tunnel mode SA that changes
address families, at present the transport mode SA cannot have
AF_UNSPEC as its selector because it will be erroneously be treated
as inter-family itself even though it simply sits beneath one.
This is a serious problem because you can't set the selector to
non-AF_UNSPEC either as that will cause the selector match to
fail as we always match selectors to the inner-most traffic.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
xfrm state selectors are matched against the inner-most flow
which can be of any address family. Therefore middle states
in nested configurations need to carry a wildcard selector in
order to work at all.
However, this is currently forbidden for transport-mode states.
Fix this by removing the unnecessary check.
Fixes: 13996378e658 ("[IPSEC]: Rename mode to outer_mode and add inner_mode")
Reported-by: David George <David.George@sophos.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
When copying data to user-space we should ensure that only valid
data is copied over. Padding in structures may be filled with
random (possibly sensitve) data and should never be given directly
to user-space.
This patch fixes the copying of xfrm algorithms and the encap
template in xfrm_user so that padding is zeroed.
Reported-by: syzbot+fa5414772d5c445dac3c@syzkaller.appspotmail.com
Reported-by: Hyunwoo Kim <v4bel@theori.io>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
====================
pull-request: bpf-next 2023-02-11
We've added 96 non-merge commits during the last 14 day(s) which contain
a total of 152 files changed, 4884 insertions(+), 962 deletions(-).
There is a minor conflict in drivers/net/ethernet/intel/ice/ice_main.c
between commit 5b246e533d01 ("ice: split probe into smaller functions")
from the net-next tree and commit 66c0e13ad236 ("drivers: net: turn on
XDP features") from the bpf-next tree. Remove the hunk given ice_cfg_netdev()
is otherwise there a 2nd time, and add XDP features to the existing
ice_cfg_netdev() one:
[...]
ice_set_netdev_features(netdev);
netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT |
NETDEV_XDP_ACT_XSK_ZEROCOPY;
ice_set_ops(netdev);
[...]
Stephen's merge conflict mail:
https://lore.kernel.org/bpf/20230207101951.21a114fa@canb.auug.org.au/
The main changes are:
1) Add support for BPF trampoline on s390x which finally allows to remove many
test cases from the BPF CI's DENYLIST.s390x, from Ilya Leoshkevich.
2) Add multi-buffer XDP support to ice driver, from Maciej Fijalkowski.
3) Add capability to export the XDP features supported by the NIC.
Along with that, add a XDP compliance test tool,
from Lorenzo Bianconi & Marek Majtyka.
4) Add __bpf_kfunc tag for marking kernel functions as kfuncs,
from David Vernet.
5) Add a deep dive documentation about the verifier's register
liveness tracking algorithm, from Eduard Zingerman.
6) Fix and follow-up cleanups for resolve_btfids to be compiled
as a host program to avoid cross compile issues,
from Jiri Olsa & Ian Rogers.
7) Batch of fixes to the BPF selftest for xdp_hw_metadata which resulted
when testing on different NICs, from Jesper Dangaard Brouer.
8) Fix libbpf to better detect kernel version code on Debian, from Hao Xiang.
9) Extend libbpf to add an option for when the perf buffer should
wake up, from Jon Doron.
10) Follow-up fix on xdp_metadata selftest to just consume on TX
completion, from Stanislav Fomichev.
11) Extend the kfuncs.rst document with description on kfunc
lifecycle & stability expectations, from David Vernet.
12) Fix bpftool prog profile to skip attaching to offline CPUs,
from Tonghao Zhang.
====================
Link: https://lore.kernel.org/r/20230211002037.8489-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
net/devlink/leftover.c / net/core/devlink.c:
565b4824c39f ("devlink: change port event netdev notifier from per-net to global")
f05bd8ebeb69 ("devlink: move code to a dedicated directory")
687125b5799c ("devlink: split out core code")
https://lore.kernel.org/all/20230208094657.379f2b1a@canb.auug.org.au/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In netdev common pattern, extack pointer is forwarded to the drivers
to be filled with error message. However, the caller can easily
overwrite the filled message.
Instead of adding multiple "if (!extack->_msg)" checks before any
NL_SET_ERR_MSG() call, which appears after call to the driver, let's
add new macro to common code.
[1] https://lore.kernel.org/all/Y9Irgrgf3uxOjwUm@unreal
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/6993fac557a40a1973dfa0095107c3d03d40bec1.1675171790.git.leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Now that we have the __bpf_kfunc tag, we should use add it to all
existing kfuncs to ensure that they'll never be elided in LTO builds.
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20230201173016.342758-4-void@manifault.com
|
|
When copying the DSCP bits for decap-dscp into IPv6 don't assume the
outer encap is always IPv6. Instead, as with the inner IPv4 case, copy
the DSCP bits from the correctly saved "tos" value in the control block.
Fixes: 227620e29509 ("[IPSEC]: Separate inner/outer mode processing on input")
Signed-off-by: Christian Hopps <chopps@chopps.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
KCSAN reported multiple cpus can update use_time
at the same time.
Adds READ_ONCE()/WRITE_ONCE() annotations.
Note that 32bit arches are not fully protected,
but they will probably no longer be supported/used in 2106.
BUG: KCSAN: data-race in __xfrm_policy_check / __xfrm_policy_check
write to 0xffff88813e7ec108 of 8 bytes by interrupt on cpu 0:
__xfrm_policy_check+0x6ae/0x17f0 net/xfrm/xfrm_policy.c:3664
__xfrm_policy_check2 include/net/xfrm.h:1174 [inline]
xfrm_policy_check include/net/xfrm.h:1179 [inline]
xfrm6_policy_check+0x2e9/0x320 include/net/xfrm.h:1189
udpv6_queue_rcv_one_skb+0x48/0xa30 net/ipv6/udp.c:703
udpv6_queue_rcv_skb+0x2d6/0x310 net/ipv6/udp.c:792
udp6_unicast_rcv_skb+0x16b/0x190 net/ipv6/udp.c:935
__udp6_lib_rcv+0x84b/0x9b0 net/ipv6/udp.c:1020
udpv6_rcv+0x4b/0x50 net/ipv6/udp.c:1133
ip6_protocol_deliver_rcu+0x99e/0x1020 net/ipv6/ip6_input.c:439
ip6_input_finish net/ipv6/ip6_input.c:484 [inline]
NF_HOOK include/linux/netfilter.h:302 [inline]
ip6_input+0xca/0x180 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:454 [inline]
ip6_rcv_finish+0x1e9/0x2d0 net/ipv6/ip6_input.c:79
NF_HOOK include/linux/netfilter.h:302 [inline]
ipv6_rcv+0x85/0x140 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core net/core/dev.c:5482 [inline]
__netif_receive_skb+0x8b/0x1b0 net/core/dev.c:5596
process_backlog+0x23f/0x3b0 net/core/dev.c:5924
__napi_poll+0x65/0x390 net/core/dev.c:6485
napi_poll net/core/dev.c:6552 [inline]
net_rx_action+0x37e/0x730 net/core/dev.c:6663
__do_softirq+0xf2/0x2c7 kernel/softirq.c:571
do_softirq+0xb1/0xf0 kernel/softirq.c:472
__local_bh_enable_ip+0x6f/0x80 kernel/softirq.c:396
__raw_read_unlock_bh include/linux/rwlock_api_smp.h:257 [inline]
_raw_read_unlock_bh+0x17/0x20 kernel/locking/spinlock.c:284
wg_socket_send_skb_to_peer+0x107/0x120 drivers/net/wireguard/socket.c:184
wg_packet_create_data_done drivers/net/wireguard/send.c:251 [inline]
wg_packet_tx_worker+0x142/0x360 drivers/net/wireguard/send.c:276
process_one_work+0x3d3/0x720 kernel/workqueue.c:2289
worker_thread+0x618/0xa70 kernel/workqueue.c:2436
kthread+0x1a9/0x1e0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
write to 0xffff88813e7ec108 of 8 bytes by interrupt on cpu 1:
__xfrm_policy_check+0x6ae/0x17f0 net/xfrm/xfrm_policy.c:3664
__xfrm_policy_check2 include/net/xfrm.h:1174 [inline]
xfrm_policy_check include/net/xfrm.h:1179 [inline]
xfrm6_policy_check+0x2e9/0x320 include/net/xfrm.h:1189
udpv6_queue_rcv_one_skb+0x48/0xa30 net/ipv6/udp.c:703
udpv6_queue_rcv_skb+0x2d6/0x310 net/ipv6/udp.c:792
udp6_unicast_rcv_skb+0x16b/0x190 net/ipv6/udp.c:935
__udp6_lib_rcv+0x84b/0x9b0 net/ipv6/udp.c:1020
udpv6_rcv+0x4b/0x50 net/ipv6/udp.c:1133
ip6_protocol_deliver_rcu+0x99e/0x1020 net/ipv6/ip6_input.c:439
ip6_input_finish net/ipv6/ip6_input.c:484 [inline]
NF_HOOK include/linux/netfilter.h:302 [inline]
ip6_input+0xca/0x180 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:454 [inline]
ip6_rcv_finish+0x1e9/0x2d0 net/ipv6/ip6_input.c:79
NF_HOOK include/linux/netfilter.h:302 [inline]
ipv6_rcv+0x85/0x140 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core net/core/dev.c:5482 [inline]
__netif_receive_skb+0x8b/0x1b0 net/core/dev.c:5596
process_backlog+0x23f/0x3b0 net/core/dev.c:5924
__napi_poll+0x65/0x390 net/core/dev.c:6485
napi_poll net/core/dev.c:6552 [inline]
net_rx_action+0x37e/0x730 net/core/dev.c:6663
__do_softirq+0xf2/0x2c7 kernel/softirq.c:571
do_softirq+0xb1/0xf0 kernel/softirq.c:472
__local_bh_enable_ip+0x6f/0x80 kernel/softirq.c:396
__raw_read_unlock_bh include/linux/rwlock_api_smp.h:257 [inline]
_raw_read_unlock_bh+0x17/0x20 kernel/locking/spinlock.c:284
wg_socket_send_skb_to_peer+0x107/0x120 drivers/net/wireguard/socket.c:184
wg_packet_create_data_done drivers/net/wireguard/send.c:251 [inline]
wg_packet_tx_worker+0x142/0x360 drivers/net/wireguard/send.c:276
process_one_work+0x3d3/0x720 kernel/workqueue.c:2289
worker_thread+0x618/0xa70 kernel/workqueue.c:2436
kthread+0x1a9/0x1e0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
value changed: 0x0000000063c62d6f -> 0x0000000063c62d70
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 4185 Comm: kworker/1:2 Tainted: G W 6.2.0-rc4-syzkaller-00009-gd532dd102151-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
Workqueue: wg-crypt-wg0 wg_packet_tx_worker
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
For some reason, blamed commit did the right thing in xfrm_policy_timer()
but did not in xfrm_timer_handler()
Fixes: 386c5680e2e8 ("xfrm: use time64_t for in-kernel timestamps")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Almost all validation logic is in the drivers, but they are
missing reliable way to convey failure reason to userspace
applications.
Let's use extack to return this information to users.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Almost all validation logic is in the drivers, but they are
missing reliable way to convey failure reason to userspace
applications.
Let's use extack to return this information to users.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As suggested by Cong, introduce a tracepoint for all ->sk_data_ready()
callback implementations. For example:
<...>
iperf-609 [002] ..... 70.660425: sk_data_ready: family=2 protocol=6 func=sock_def_readable
iperf-609 [002] ..... 70.660436: sk_data_ready: family=2 protocol=6 func=sock_def_readable
<...>
Suggested-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
int type = nla_type(nla);
if (type > XFRMA_MAX) {
return -EOPNOTSUPP;
}
@type is then used as an array index and can be used
as a Spectre v1 gadget.
if (nla_len(nla) < compat_policy[type].len) {
array_index_nospec() can be used to prevent leaking
content of kernel memory to malicious users.
Fixes: 5106f4a8acff ("xfrm/compat: Add 32=>64-bit messages translator")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dmitry Safonov <dima@arista.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Compare XFRM_MSG_NEWSPDINFO (value from netlink
configuration messages enum) with nlh_src->nlmsg_type
instead of nlh_src->nlmsg_type - XFRM_MSG_BASE.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 4e9505064f58 ("net/xfrm/compat: Copy xfrm_spdattr_type_t atributes")
Signed-off-by: Anastasia Belova <abelova@astralinux.ru>
Acked-by: Dmitry Safonov <0x7f454c46@gmail.com>
Tested-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
This change adds support for nested IPsec tunnels by ensuring that
XFRM-I verifies existing policies before decapsulating a subsequent
policies. Addtionally, this clears the secpath entries after policies
are verified, ensuring that previous tunnels with no-longer-valid
do not pollute subsequent policy checks.
This is necessary especially for nested tunnels, as the IP addresses,
protocol and ports may all change, thus not matching the previous
policies. In order to ensure that packets match the relevant inbound
templates, the xfrm_policy_check should be done before handing off to
the inner XFRM protocol to decrypt and decapsulate.
Notably, raw ESP/AH packets did not perform policy checks inherently,
whereas all other encapsulated packets (UDP, TCP encapsulated) do policy
checks after calling xfrm_input handling in the respective encapsulation
layer.
Test: Verified with additional Android Kernel Unit tests
Signed-off-by: Benedict Wong <benedictwong@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Since moving to memalloc_nofs_save/restore, SUNRPC has stopped setting the
GFP_NOIO flag on sk_allocation which the networking system uses to decide
when it is safe to use current->task_frag. The results of this are
unexpected corruption in task_frag when SUNRPC is involved in memory
reclaim.
The corruption can be seen in crashes, but the root cause is often
difficult to ascertain as a crashing machine's stack trace will have no
evidence of being near NFS or SUNRPC code. I believe this problem to
be much more pervasive than reports to the community may indicate.
Fix this by having kernel users of sockets that may corrupt task_frag due
to reclaim set sk_use_task_frag = false. Preemptively correcting this
situation for users that still set sk_allocation allows them to convert to
memalloc_nofs_save/restore without the same unexpected corruptions that are
sure to follow, unlikely to show up in testing, and difficult to bisect.
CC: Philipp Reisner <philipp.reisner@linbit.com>
CC: Lars Ellenberg <lars.ellenberg@linbit.com>
CC: "Christoph Böhmwalder" <christoph.boehmwalder@linbit.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Josef Bacik <josef@toxicpanda.com>
CC: Keith Busch <kbusch@kernel.org>
CC: Christoph Hellwig <hch@lst.de>
CC: Sagi Grimberg <sagi@grimberg.me>
CC: Lee Duncan <lduncan@suse.com>
CC: Chris Leech <cleech@redhat.com>
CC: Mike Christie <michael.christie@oracle.com>
CC: "James E.J. Bottomley" <jejb@linux.ibm.com>
CC: "Martin K. Petersen" <martin.petersen@oracle.com>
CC: Valentina Manea <valentina.manea.m@gmail.com>
CC: Shuah Khan <shuah@kernel.org>
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CC: David Howells <dhowells@redhat.com>
CC: Marc Dionne <marc.dionne@auristor.com>
CC: Steve French <sfrench@samba.org>
CC: Christine Caulfield <ccaulfie@redhat.com>
CC: David Teigland <teigland@redhat.com>
CC: Mark Fasheh <mark@fasheh.com>
CC: Joel Becker <jlbec@evilplan.org>
CC: Joseph Qi <joseph.qi@linux.alibaba.com>
CC: Eric Van Hensbergen <ericvh@gmail.com>
CC: Latchesar Ionkov <lucho@ionkov.net>
CC: Dominique Martinet <asmadeus@codewreck.org>
CC: Ilya Dryomov <idryomov@gmail.com>
CC: Xiubo Li <xiubli@redhat.com>
CC: Chuck Lever <chuck.lever@oracle.com>
CC: Jeff Layton <jlayton@kernel.org>
CC: Trond Myklebust <trond.myklebust@hammerspace.com>
CC: Anna Schumaker <anna@kernel.org>
CC: Steffen Klassert <steffen.klassert@secunet.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
Suggested-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"Core:
- Allow live renaming when an interface is up
- Add retpoline wrappers for tc, improving considerably the
performances of complex queue discipline configurations
- Add inet drop monitor support
- A few GRO performance improvements
- Add infrastructure for atomic dev stats, addressing long standing
data races
- De-duplicate common code between OVS and conntrack offloading
infrastructure
- A bunch of UBSAN_BOUNDS/FORTIFY_SOURCE improvements
- Netfilter: introduce packet parser for tunneled packets
- Replace IPVS timer-based estimators with kthreads to scale up the
workload with the number of available CPUs
- Add the helper support for connection-tracking OVS offload
BPF:
- Support for user defined BPF objects: the use case is to allocate
own objects, build own object hierarchies and use the building
blocks to build own data structures flexibly, for example, linked
lists in BPF
- Make cgroup local storage available to non-cgroup attached BPF
programs
- Avoid unnecessary deadlock detection and failures wrt BPF task
storage helpers
- A relevant bunch of BPF verifier fixes and improvements
- Veristat tool improvements to support custom filtering, sorting,
and replay of results
- Add LLVM disassembler as default library for dumping JITed code
- Lots of new BPF documentation for various BPF maps
- Add bpf_rcu_read_{,un}lock() support for sleepable programs
- Add RCU grace period chaining to BPF to wait for the completion of
access from both sleepable and non-sleepable BPF programs
- Add support storing struct task_struct objects as kptrs in maps
- Improve helper UAPI by explicitly defining BPF_FUNC_xxx integer
values
- Add libbpf *_opts API-variants for bpf_*_get_fd_by_id() functions
Protocols:
- TCP: implement Protective Load Balancing across switch links
- TCP: allow dynamically disabling TCP-MD5 static key, reverting back
to fast[er]-path
- UDP: Introduce optional per-netns hash lookup table
- IPv6: simplify and cleanup sockets disposal
- Netlink: support different type policies for each generic netlink
operation
- MPTCP: add MSG_FASTOPEN and FastOpen listener side support
- MPTCP: add netlink notification support for listener sockets events
- SCTP: add VRF support, allowing sctp sockets binding to VRF devices
- Add bridging MAC Authentication Bypass (MAB) support
- Extensions for Ethernet VPN bridging implementation to better
support multicast scenarios
- More work for Wi-Fi 7 support, comprising conversion of all the
existing drivers to internal TX queue usage
- IPSec: introduce a new offload type (packet offload) allowing
complete header processing and crypto offloading
- IPSec: extended ack support for more descriptive XFRM error
reporting
- RXRPC: increase SACK table size and move processing into a
per-local endpoint kernel thread, reducing considerably the
required locking
- IEEE 802154: synchronous send frame and extended filtering support,
initial support for scanning available 15.4 networks
- Tun: bump the link speed from 10Mbps to 10Gbps
- Tun/VirtioNet: implement UDP segmentation offload support
Driver API:
- PHY/SFP: improve power level switching between standard level 1 and
the higher power levels
- New API for netdev <-> devlink_port linkage
- PTP: convert existing drivers to new frequency adjustment
implementation
- DSA: add support for rx offloading
- Autoload DSA tagging driver when dynamically changing protocol
- Add new PCP and APPTRUST attributes to Data Center Bridging
- Add configuration support for 800Gbps link speed
- Add devlink port function attribute to enable/disable RoCE and
migratable
- Extend devlink-rate to support strict prioriry and weighted fair
queuing
- Add devlink support to directly reading from region memory
- New device tree helper to fetch MAC address from nvmem
- New big TCP helper to simplify temporary header stripping
New hardware / drivers:
- Ethernet:
- Marvel Octeon CNF95N and CN10KB Ethernet Switches
- Marvel Prestera AC5X Ethernet Switch
- WangXun 10 Gigabit NIC
- Motorcomm yt8521 Gigabit Ethernet
- Microchip ksz9563 Gigabit Ethernet Switch
- Microsoft Azure Network Adapter
- Linux Automation 10Base-T1L adapter
- PHY:
- Aquantia AQR112 and AQR412
- Motorcomm YT8531S
- PTP:
- Orolia ART-CARD
- WiFi:
- MediaTek Wi-Fi 7 (802.11be) devices
- RealTek rtw8821cu, rtw8822bu, rtw8822cu and rtw8723du USB
devices
- Bluetooth:
- Broadcom BCM4377/4378/4387 Bluetooth chipsets
- Realtek RTL8852BE and RTL8723DS
- Cypress.CYW4373A0 WiFi + Bluetooth combo device
Drivers:
- CAN:
- gs_usb: bus error reporting support
- kvaser_usb: listen only and bus error reporting support
- Ethernet NICs:
- Intel (100G):
- extend action skbedit to RX queue mapping
- implement devlink-rate support
- support direct read from memory
- nVidia/Mellanox (mlx5):
- SW steering improvements, increasing rules update rate
- Support for enhanced events compression
- extend H/W offload packet manipulation capabilities
- implement IPSec packet offload mode
- nVidia/Mellanox (mlx4):
- better big TCP support
- Netronome Ethernet NICs (nfp):
- IPsec offload support
- add support for multicast filter
- Broadcom:
- RSS and PTP support improvements
- AMD/SolarFlare:
- netlink extened ack improvements
- add basic flower matches to offload, and related stats
- Virtual NICs:
- ibmvnic: introduce affinity hint support
- small / embedded:
- FreeScale fec: add initial XDP support
- Marvel mv643xx_eth: support MII/GMII/RGMII modes for Kirkwood
- TI am65-cpsw: add suspend/resume support
- Mediatek MT7986: add RX wireless wthernet dispatch support
- Realtek 8169: enable GRO software interrupt coalescing per
default
- Ethernet high-speed switches:
- Microchip (sparx5):
- add support for Sparx5 TC/flower H/W offload via VCAP
- Mellanox mlxsw:
- add 802.1X and MAC Authentication Bypass offload support
- add ip6gre support
- Embedded Ethernet switches:
- Mediatek (mtk_eth_soc):
- improve PCS implementation, add DSA untag support
- enable flow offload support
- Renesas:
- add rswitch R-Car Gen4 gPTP support
- Microchip (lan966x):
- add full XDP support
- add TC H/W offload via VCAP
- enable PTP on bridge interfaces
- Microchip (ksz8):
- add MTU support for KSZ8 series
- Qualcomm 802.11ax WiFi (ath11k):
- support configuring channel dwell time during scan
- MediaTek WiFi (mt76):
- enable Wireless Ethernet Dispatch (WED) offload support
- add ack signal support
- enable coredump support
- remain_on_channel support
- Intel WiFi (iwlwifi):
- enable Wi-Fi 7 Extremely High Throughput (EHT) PHY capabilities
- 320 MHz channels support
- RealTek WiFi (rtw89):
- new dynamic header firmware format support
- wake-over-WLAN support"
* tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2002 commits)
ipvs: fix type warning in do_div() on 32 bit
net: lan966x: Remove a useless test in lan966x_ptp_add_trap()
net: ipa: add IPA v4.7 support
dt-bindings: net: qcom,ipa: Add SM6350 compatible
bnxt: Use generic HBH removal helper in tx path
IPv6/GRO: generic helper to remove temporary HBH/jumbo header in driver
selftests: forwarding: Add bridge MDB test
selftests: forwarding: Rename bridge_mdb test
bridge: mcast: Support replacement of MDB port group entries
bridge: mcast: Allow user space to specify MDB entry routing protocol
bridge: mcast: Allow user space to add (*, G) with a source list and filter mode
bridge: mcast: Add support for (*, G) with a source list and filter mode
bridge: mcast: Avoid arming group timer when (S, G) corresponds to a source
bridge: mcast: Add a flag for user installed source entries
bridge: mcast: Expose __br_multicast_del_group_src()
bridge: mcast: Expose br_multicast_new_group_src()
bridge: mcast: Add a centralized error path
bridge: mcast: Place netlink policy before validation functions
bridge: mcast: Split (*, G) and (S, G) addition into different functions
bridge: mcast: Do not derive entry type from its filter mode
...
|