diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2022-12-13 15:47:48 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2022-12-13 15:47:48 -0800 |
commit | 7e68dd7d07a28faa2e6574dd6b9dbd90cdeaae91 (patch) | |
tree | ae0427c5a3b905f24b3a44b510a9bcf35d9b67a3 /samples | |
parent | 1ca06f1c1acecbe02124f14a37cce347b8c1a90c (diff) | |
parent | 7c4a6309e27f411743817fe74a832ec2d2798a4b (diff) |
Merge tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"Core:
- Allow live renaming when an interface is up
- Add retpoline wrappers for tc, improving considerably the
performances of complex queue discipline configurations
- Add inet drop monitor support
- A few GRO performance improvements
- Add infrastructure for atomic dev stats, addressing long standing
data races
- De-duplicate common code between OVS and conntrack offloading
infrastructure
- A bunch of UBSAN_BOUNDS/FORTIFY_SOURCE improvements
- Netfilter: introduce packet parser for tunneled packets
- Replace IPVS timer-based estimators with kthreads to scale up the
workload with the number of available CPUs
- Add the helper support for connection-tracking OVS offload
BPF:
- Support for user defined BPF objects: the use case is to allocate
own objects, build own object hierarchies and use the building
blocks to build own data structures flexibly, for example, linked
lists in BPF
- Make cgroup local storage available to non-cgroup attached BPF
programs
- Avoid unnecessary deadlock detection and failures wrt BPF task
storage helpers
- A relevant bunch of BPF verifier fixes and improvements
- Veristat tool improvements to support custom filtering, sorting,
and replay of results
- Add LLVM disassembler as default library for dumping JITed code
- Lots of new BPF documentation for various BPF maps
- Add bpf_rcu_read_{,un}lock() support for sleepable programs
- Add RCU grace period chaining to BPF to wait for the completion of
access from both sleepable and non-sleepable BPF programs
- Add support storing struct task_struct objects as kptrs in maps
- Improve helper UAPI by explicitly defining BPF_FUNC_xxx integer
values
- Add libbpf *_opts API-variants for bpf_*_get_fd_by_id() functions
Protocols:
- TCP: implement Protective Load Balancing across switch links
- TCP: allow dynamically disabling TCP-MD5 static key, reverting back
to fast[er]-path
- UDP: Introduce optional per-netns hash lookup table
- IPv6: simplify and cleanup sockets disposal
- Netlink: support different type policies for each generic netlink
operation
- MPTCP: add MSG_FASTOPEN and FastOpen listener side support
- MPTCP: add netlink notification support for listener sockets events
- SCTP: add VRF support, allowing sctp sockets binding to VRF devices
- Add bridging MAC Authentication Bypass (MAB) support
- Extensions for Ethernet VPN bridging implementation to better
support multicast scenarios
- More work for Wi-Fi 7 support, comprising conversion of all the
existing drivers to internal TX queue usage
- IPSec: introduce a new offload type (packet offload) allowing
complete header processing and crypto offloading
- IPSec: extended ack support for more descriptive XFRM error
reporting
- RXRPC: increase SACK table size and move processing into a
per-local endpoint kernel thread, reducing considerably the
required locking
- IEEE 802154: synchronous send frame and extended filtering support,
initial support for scanning available 15.4 networks
- Tun: bump the link speed from 10Mbps to 10Gbps
- Tun/VirtioNet: implement UDP segmentation offload support
Driver API:
- PHY/SFP: improve power level switching between standard level 1 and
the higher power levels
- New API for netdev <-> devlink_port linkage
- PTP: convert existing drivers to new frequency adjustment
implementation
- DSA: add support for rx offloading
- Autoload DSA tagging driver when dynamically changing protocol
- Add new PCP and APPTRUST attributes to Data Center Bridging
- Add configuration support for 800Gbps link speed
- Add devlink port function attribute to enable/disable RoCE and
migratable
- Extend devlink-rate to support strict prioriry and weighted fair
queuing
- Add devlink support to directly reading from region memory
- New device tree helper to fetch MAC address from nvmem
- New big TCP helper to simplify temporary header stripping
New hardware / drivers:
- Ethernet:
- Marvel Octeon CNF95N and CN10KB Ethernet Switches
- Marvel Prestera AC5X Ethernet Switch
- WangXun 10 Gigabit NIC
- Motorcomm yt8521 Gigabit Ethernet
- Microchip ksz9563 Gigabit Ethernet Switch
- Microsoft Azure Network Adapter
- Linux Automation 10Base-T1L adapter
- PHY:
- Aquantia AQR112 and AQR412
- Motorcomm YT8531S
- PTP:
- Orolia ART-CARD
- WiFi:
- MediaTek Wi-Fi 7 (802.11be) devices
- RealTek rtw8821cu, rtw8822bu, rtw8822cu and rtw8723du USB
devices
- Bluetooth:
- Broadcom BCM4377/4378/4387 Bluetooth chipsets
- Realtek RTL8852BE and RTL8723DS
- Cypress.CYW4373A0 WiFi + Bluetooth combo device
Drivers:
- CAN:
- gs_usb: bus error reporting support
- kvaser_usb: listen only and bus error reporting support
- Ethernet NICs:
- Intel (100G):
- extend action skbedit to RX queue mapping
- implement devlink-rate support
- support direct read from memory
- nVidia/Mellanox (mlx5):
- SW steering improvements, increasing rules update rate
- Support for enhanced events compression
- extend H/W offload packet manipulation capabilities
- implement IPSec packet offload mode
- nVidia/Mellanox (mlx4):
- better big TCP support
- Netronome Ethernet NICs (nfp):
- IPsec offload support
- add support for multicast filter
- Broadcom:
- RSS and PTP support improvements
- AMD/SolarFlare:
- netlink extened ack improvements
- add basic flower matches to offload, and related stats
- Virtual NICs:
- ibmvnic: introduce affinity hint support
- small / embedded:
- FreeScale fec: add initial XDP support
- Marvel mv643xx_eth: support MII/GMII/RGMII modes for Kirkwood
- TI am65-cpsw: add suspend/resume support
- Mediatek MT7986: add RX wireless wthernet dispatch support
- Realtek 8169: enable GRO software interrupt coalescing per
default
- Ethernet high-speed switches:
- Microchip (sparx5):
- add support for Sparx5 TC/flower H/W offload via VCAP
- Mellanox mlxsw:
- add 802.1X and MAC Authentication Bypass offload support
- add ip6gre support
- Embedded Ethernet switches:
- Mediatek (mtk_eth_soc):
- improve PCS implementation, add DSA untag support
- enable flow offload support
- Renesas:
- add rswitch R-Car Gen4 gPTP support
- Microchip (lan966x):
- add full XDP support
- add TC H/W offload via VCAP
- enable PTP on bridge interfaces
- Microchip (ksz8):
- add MTU support for KSZ8 series
- Qualcomm 802.11ax WiFi (ath11k):
- support configuring channel dwell time during scan
- MediaTek WiFi (mt76):
- enable Wireless Ethernet Dispatch (WED) offload support
- add ack signal support
- enable coredump support
- remain_on_channel support
- Intel WiFi (iwlwifi):
- enable Wi-Fi 7 Extremely High Throughput (EHT) PHY capabilities
- 320 MHz channels support
- RealTek WiFi (rtw89):
- new dynamic header firmware format support
- wake-over-WLAN support"
* tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2002 commits)
ipvs: fix type warning in do_div() on 32 bit
net: lan966x: Remove a useless test in lan966x_ptp_add_trap()
net: ipa: add IPA v4.7 support
dt-bindings: net: qcom,ipa: Add SM6350 compatible
bnxt: Use generic HBH removal helper in tx path
IPv6/GRO: generic helper to remove temporary HBH/jumbo header in driver
selftests: forwarding: Add bridge MDB test
selftests: forwarding: Rename bridge_mdb test
bridge: mcast: Support replacement of MDB port group entries
bridge: mcast: Allow user space to specify MDB entry routing protocol
bridge: mcast: Allow user space to add (*, G) with a source list and filter mode
bridge: mcast: Add support for (*, G) with a source list and filter mode
bridge: mcast: Avoid arming group timer when (S, G) corresponds to a source
bridge: mcast: Add a flag for user installed source entries
bridge: mcast: Expose __br_multicast_del_group_src()
bridge: mcast: Expose br_multicast_new_group_src()
bridge: mcast: Add a centralized error path
bridge: mcast: Place netlink policy before validation functions
bridge: mcast: Split (*, G) and (S, G) addition into different functions
bridge: mcast: Do not derive entry type from its filter mode
...
Diffstat (limited to 'samples')
-rw-r--r-- | samples/bpf/README.rst | 6 | ||||
-rw-r--r-- | samples/bpf/hbm_edt_kern.c | 2 | ||||
-rw-r--r-- | samples/bpf/sockex3_kern.c | 95 | ||||
-rw-r--r-- | samples/bpf/sockex3_user.c | 23 | ||||
-rwxr-xr-x | samples/bpf/test_cgrp2_tc.sh | 2 | ||||
-rw-r--r-- | samples/bpf/tracex2_kern.c | 4 | ||||
-rw-r--r-- | samples/bpf/tracex2_user.c | 3 | ||||
-rw-r--r-- | samples/bpf/xdp1_user.c | 2 | ||||
-rw-r--r-- | samples/bpf/xdp2_kern.c | 4 | ||||
-rw-r--r-- | samples/bpf/xdp_router_ipv4_user.c | 2 | ||||
-rw-r--r-- | samples/pktgen/functions.sh | 2 |
11 files changed, 79 insertions, 66 deletions
diff --git a/samples/bpf/README.rst b/samples/bpf/README.rst index 60c6494adb1b..57f93edd1957 100644 --- a/samples/bpf/README.rst +++ b/samples/bpf/README.rst @@ -37,8 +37,8 @@ user, simply call:: make headers_install -This will creates a local "usr/include" directory in the git/build top -level directory, that the make system automatically pickup first. +This will create a local "usr/include" directory in the git/build top +level directory, that the make system will automatically pick up first. Compiling ========= @@ -87,7 +87,7 @@ Cross compiling samples ----------------------- In order to cross-compile, say for arm64 targets, export CROSS_COMPILE and ARCH environment variables before calling make. But do this before clean, -cofiguration and header install steps described above. This will direct make to +configuration and header install steps described above. This will direct make to build samples for the cross target:: export ARCH=arm64 diff --git a/samples/bpf/hbm_edt_kern.c b/samples/bpf/hbm_edt_kern.c index a65b677acdb0..6294f1d716c0 100644 --- a/samples/bpf/hbm_edt_kern.c +++ b/samples/bpf/hbm_edt_kern.c @@ -35,7 +35,7 @@ * * If the credit is below the drop threshold, the packet is dropped. If it * is a TCP packet, then it also calls tcp_cwr since packets dropped by - * by a cgroup skb BPF program do not automatically trigger a call to + * a cgroup skb BPF program do not automatically trigger a call to * tcp_cwr in the current kernel code. * * This BPF program actually uses 2 drop thresholds, one threshold diff --git a/samples/bpf/sockex3_kern.c b/samples/bpf/sockex3_kern.c index b363503357e5..822c13242251 100644 --- a/samples/bpf/sockex3_kern.c +++ b/samples/bpf/sockex3_kern.c @@ -17,48 +17,11 @@ #define IP_MF 0x2000 #define IP_OFFSET 0x1FFF -#define PROG(F) SEC("socket/"__stringify(F)) int bpf_func_##F - -struct { - __uint(type, BPF_MAP_TYPE_PROG_ARRAY); - __uint(key_size, sizeof(u32)); - __uint(value_size, sizeof(u32)); - __uint(max_entries, 8); -} jmp_table SEC(".maps"); - #define PARSE_VLAN 1 #define PARSE_MPLS 2 #define PARSE_IP 3 #define PARSE_IPV6 4 -/* Protocol dispatch routine. It tail-calls next BPF program depending - * on eth proto. Note, we could have used ... - * - * bpf_tail_call(skb, &jmp_table, proto); - * - * ... but it would need large prog_array and cannot be optimised given - * the map key is not static. - */ -static inline void parse_eth_proto(struct __sk_buff *skb, u32 proto) -{ - switch (proto) { - case ETH_P_8021Q: - case ETH_P_8021AD: - bpf_tail_call(skb, &jmp_table, PARSE_VLAN); - break; - case ETH_P_MPLS_UC: - case ETH_P_MPLS_MC: - bpf_tail_call(skb, &jmp_table, PARSE_MPLS); - break; - case ETH_P_IP: - bpf_tail_call(skb, &jmp_table, PARSE_IP); - break; - case ETH_P_IPV6: - bpf_tail_call(skb, &jmp_table, PARSE_IPV6); - break; - } -} - struct vlan_hdr { __be16 h_vlan_TCI; __be16 h_vlan_encapsulated_proto; @@ -74,6 +37,8 @@ struct flow_key_record { __u32 ip_proto; }; +static inline void parse_eth_proto(struct __sk_buff *skb, u32 proto); + static inline int ip_is_fragment(struct __sk_buff *ctx, __u64 nhoff) { return load_half(ctx, nhoff + offsetof(struct iphdr, frag_off)) @@ -189,7 +154,8 @@ static __always_inline void parse_ip_proto(struct __sk_buff *skb, } } -PROG(PARSE_IP)(struct __sk_buff *skb) +SEC("socket") +int bpf_func_ip(struct __sk_buff *skb) { struct globals *g = this_cpu_globals(); __u32 nhoff, verlen, ip_proto; @@ -217,7 +183,8 @@ PROG(PARSE_IP)(struct __sk_buff *skb) return 0; } -PROG(PARSE_IPV6)(struct __sk_buff *skb) +SEC("socket") +int bpf_func_ipv6(struct __sk_buff *skb) { struct globals *g = this_cpu_globals(); __u32 nhoff, ip_proto; @@ -240,7 +207,8 @@ PROG(PARSE_IPV6)(struct __sk_buff *skb) return 0; } -PROG(PARSE_VLAN)(struct __sk_buff *skb) +SEC("socket") +int bpf_func_vlan(struct __sk_buff *skb) { __u32 nhoff, proto; @@ -256,7 +224,8 @@ PROG(PARSE_VLAN)(struct __sk_buff *skb) return 0; } -PROG(PARSE_MPLS)(struct __sk_buff *skb) +SEC("socket") +int bpf_func_mpls(struct __sk_buff *skb) { __u32 nhoff, label; @@ -279,7 +248,49 @@ PROG(PARSE_MPLS)(struct __sk_buff *skb) return 0; } -SEC("socket/0") +struct { + __uint(type, BPF_MAP_TYPE_PROG_ARRAY); + __uint(key_size, sizeof(u32)); + __uint(max_entries, 8); + __array(values, u32 (void *)); +} prog_array_init SEC(".maps") = { + .values = { + [PARSE_VLAN] = (void *)&bpf_func_vlan, + [PARSE_IP] = (void *)&bpf_func_ip, + [PARSE_IPV6] = (void *)&bpf_func_ipv6, + [PARSE_MPLS] = (void *)&bpf_func_mpls, + }, +}; + +/* Protocol dispatch routine. It tail-calls next BPF program depending + * on eth proto. Note, we could have used ... + * + * bpf_tail_call(skb, &prog_array_init, proto); + * + * ... but it would need large prog_array and cannot be optimised given + * the map key is not static. + */ +static inline void parse_eth_proto(struct __sk_buff *skb, u32 proto) +{ + switch (proto) { + case ETH_P_8021Q: + case ETH_P_8021AD: + bpf_tail_call(skb, &prog_array_init, PARSE_VLAN); + break; + case ETH_P_MPLS_UC: + case ETH_P_MPLS_MC: + bpf_tail_call(skb, &prog_array_init, PARSE_MPLS); + break; + case ETH_P_IP: + bpf_tail_call(skb, &prog_array_init, PARSE_IP); + break; + case ETH_P_IPV6: + bpf_tail_call(skb, &prog_array_init, PARSE_IPV6); + break; + } +} + +SEC("socket") int main_prog(struct __sk_buff *skb) { __u32 nhoff = ETH_HLEN; diff --git a/samples/bpf/sockex3_user.c b/samples/bpf/sockex3_user.c index cd6fa79df900..56044acbd25d 100644 --- a/samples/bpf/sockex3_user.c +++ b/samples/bpf/sockex3_user.c @@ -24,10 +24,9 @@ struct pair { int main(int argc, char **argv) { - int i, sock, key, fd, main_prog_fd, jmp_table_fd, hash_map_fd; + int i, sock, fd, main_prog_fd, hash_map_fd; struct bpf_program *prog; struct bpf_object *obj; - const char *section; char filename[256]; FILE *f; @@ -45,26 +44,24 @@ int main(int argc, char **argv) goto cleanup; } - jmp_table_fd = bpf_object__find_map_fd_by_name(obj, "jmp_table"); hash_map_fd = bpf_object__find_map_fd_by_name(obj, "hash_map"); - if (jmp_table_fd < 0 || hash_map_fd < 0) { + if (hash_map_fd < 0) { fprintf(stderr, "ERROR: finding a map in obj file failed\n"); goto cleanup; } + /* find BPF main program */ + main_prog_fd = 0; bpf_object__for_each_program(prog, obj) { fd = bpf_program__fd(prog); - section = bpf_program__section_name(prog); - if (sscanf(section, "socket/%d", &key) != 1) { - fprintf(stderr, "ERROR: finding prog failed\n"); - goto cleanup; - } - - if (key == 0) + if (!strcmp(bpf_program__name(prog), "main_prog")) main_prog_fd = fd; - else - bpf_map_update_elem(jmp_table_fd, &key, &fd, BPF_ANY); + } + + if (main_prog_fd == 0) { + fprintf(stderr, "ERROR: can't find main_prog\n"); + goto cleanup; } sock = open_raw_sock("lo"); diff --git a/samples/bpf/test_cgrp2_tc.sh b/samples/bpf/test_cgrp2_tc.sh index 12faf5847e22..395573be6ae8 100755 --- a/samples/bpf/test_cgrp2_tc.sh +++ b/samples/bpf/test_cgrp2_tc.sh @@ -115,7 +115,7 @@ do_exit() { if [ "$DEBUG" == "yes" ] && [ "$MODE" != 'cleanuponly' ] then echo "------ DEBUG ------" - echo "mount: "; mount | egrep '(cgroup2|bpf)'; echo + echo "mount: "; mount | grep -E '(cgroup2|bpf)'; echo echo "$CGRP2_TC_LEAF: "; ls -l $CGRP2_TC_LEAF; echo if [ -d "$BPF_FS_TC_SHARE" ] then diff --git a/samples/bpf/tracex2_kern.c b/samples/bpf/tracex2_kern.c index 5bc696bac27d..93e0b7680b4f 100644 --- a/samples/bpf/tracex2_kern.c +++ b/samples/bpf/tracex2_kern.c @@ -22,14 +22,14 @@ struct { /* kprobe is NOT a stable ABI. If kernel internals change this bpf+kprobe * example will no longer be meaningful */ -SEC("kprobe/kfree_skb") +SEC("kprobe/kfree_skb_reason") int bpf_prog2(struct pt_regs *ctx) { long loc = 0; long init_val = 1; long *value; - /* read ip of kfree_skb caller. + /* read ip of kfree_skb_reason caller. * non-portable version of __builtin_return_address(0) */ BPF_KPROBE_READ_RET_IP(loc, ctx); diff --git a/samples/bpf/tracex2_user.c b/samples/bpf/tracex2_user.c index dd6205c6b6a7..089e408abd7a 100644 --- a/samples/bpf/tracex2_user.c +++ b/samples/bpf/tracex2_user.c @@ -146,7 +146,8 @@ int main(int ac, char **argv) signal(SIGINT, int_exit); signal(SIGTERM, int_exit); - /* start 'ping' in the background to have some kfree_skb events */ + /* start 'ping' in the background to have some kfree_skb_reason + * events */ f = popen("ping -4 -c5 localhost", "r"); (void) f; diff --git a/samples/bpf/xdp1_user.c b/samples/bpf/xdp1_user.c index ac370e638fa3..281dc964de8d 100644 --- a/samples/bpf/xdp1_user.c +++ b/samples/bpf/xdp1_user.c @@ -51,7 +51,7 @@ static void poll_stats(int map_fd, int interval) sleep(interval); - while (bpf_map_get_next_key(map_fd, &key, &key) != -1) { + while (bpf_map_get_next_key(map_fd, &key, &key) == 0) { __u64 sum = 0; assert(bpf_map_lookup_elem(map_fd, &key, values) == 0); diff --git a/samples/bpf/xdp2_kern.c b/samples/bpf/xdp2_kern.c index 3332ba6bb95f..67804ecf7ce3 100644 --- a/samples/bpf/xdp2_kern.c +++ b/samples/bpf/xdp2_kern.c @@ -112,6 +112,10 @@ int xdp_prog1(struct xdp_md *ctx) if (ipproto == IPPROTO_UDP) { swap_src_dst_mac(data); + + if (bpf_xdp_store_bytes(ctx, 0, pkt, sizeof(pkt))) + return rc; + rc = XDP_TX; } diff --git a/samples/bpf/xdp_router_ipv4_user.c b/samples/bpf/xdp_router_ipv4_user.c index 683913bbf279..9d41db09c480 100644 --- a/samples/bpf/xdp_router_ipv4_user.c +++ b/samples/bpf/xdp_router_ipv4_user.c @@ -162,7 +162,7 @@ static void read_route(struct nlmsghdr *nh, int nll) __be32 gw; } *prefix_value; - prefix_key = alloca(sizeof(*prefix_key) + 3); + prefix_key = alloca(sizeof(*prefix_key) + 4); prefix_value = alloca(sizeof(*prefix_value)); prefix_key->prefixlen = 32; diff --git a/samples/pktgen/functions.sh b/samples/pktgen/functions.sh index 933194257a24..dd4e53ae9b73 100644 --- a/samples/pktgen/functions.sh +++ b/samples/pktgen/functions.sh @@ -191,7 +191,7 @@ function extend_addr6() fi # if shrink '::' occurs multiple, it's malformed. - shrink=( $(egrep -o "$sep{2,}" <<< $addr) ) + shrink=( $(grep -E -o "$sep{2,}" <<< $addr) ) if [[ ${#shrink[@]} -ne 0 ]]; then if [[ ${#shrink[@]} -gt 1 || ( ${shrink[0]} != $sep2 ) ]]; then err 5 "Invalid IP6 address: $1" |