author     Linus Torvalds <torvalds@linux-foundation.org>  2024-07-16 19:28:34 -0700
committer  Linus Torvalds <torvalds@linux-foundation.org>  2024-07-16 19:28:34 -0700
commit     51835949dda3783d4639cfa74ce13a3c9829de00 (patch)
tree       2b593de5eba6ecc73f7c58fc65fdaffae45c7323 /arch
parent     0434dbe32053d07d658165be681505120c6b1abc (diff)
parent     77ae5e5b00720372af2860efdc4bc652ac682696 (diff)
Merge tag 'net-next-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Not much excitement - a handful of large patchsets (devmem among them)
did not make it in time.
Core & protocols:
- Use local_lock in addition to local_bh_disable() to protect per-CPU
  resources in networking, a step toward local_bh_disable() no longer
  acting as a big lock on PREEMPT_RT
- Use flex array for netdevice priv area, ensure its cache alignment
- Add a sysctl knob to let users specify a default rto_min at socket
  init time. A bit of a big hammer, but multiple companies were
  independently carrying such a patch downstream, so it's clearly
  useful (a sketch follows this list)
- Support scheduling transmission of packets based on CLOCK_TAI
- Un-pin TCP TIMEWAIT timer to avoid it firing on CPUs later cordoned
off using cpusets
- Support multiple L2TPv3 UDP tunnels using the same 5-tuple address
- Allow configuration of the multipath hash seed, both to allow
  synchronizing the hashing of two routers and to prevent accidental
  partial sync
- Improve TCP compliance with RFC 9293 for simultaneous connect()
- Support sending NAT keepalives in IPsec ESP-in-UDP states. The
  userspace IKE daemon had to do this before, but the kernel can keep
  track of it better
- Support sending supervision HSR frames with MAC addresses stored in
ProxyNodeTable when RedBox (i.e. HSR-SAN) is enabled
- Introduce IPPROTO_SMC for selecting SMC when a socket is created
  (example after this list)
- Allow UDP GSO transmit from devices with no checksum offload
- openvswitch: add packet sampling via psample, separating the
sampled traffic from "upcall" packets sent to user space for
forwarding
- nf_tables: shrink memory consumption for transaction objects
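
A minimal user-space sketch of the rto_min knob mentioned above. The
sysctl name is an assumption based on this cycle's patches
(net.ipv4.tcp_rto_min_us, values in microseconds); check
Documentation/networking/ip-sysctl.rst in this release for the exact
spelling:

  #include <stdio.h>

  int main(void)
  {
  	/* 50 ms instead of the hard-coded 200 ms TCP_RTO_MIN default */
  	FILE *f = fopen("/proc/sys/net/ipv4/tcp_rto_min_us", "w");

  	if (!f) {
  		perror("tcp_rto_min_us");	/* knob name is an assumption */
  		return 1;
  	}
  	fprintf(f, "%d\n", 50000);
  	return fclose(f) != 0;
  }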
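And for the IPPROTO_SMC item, a minimal sketch of socket creation; the
local fallback define of IPPROTO_SMC is an assumption taken from this
cycle's uapi headers:

  #include <netinet/in.h>
  #include <stdio.h>
  #include <sys/socket.h>

  #ifndef IPPROTO_SMC
  #define IPPROTO_SMC 256		/* assumed value from the 6.11 uapi */
  #endif

  int main(void)
  {
  	/* same AF_INET/AF_INET6 addressing as TCP, but the SMC protocol;
  	 * previously this required the separate AF_SMC address family */
  	int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_SMC);

  	if (fd < 0)
  		perror("socket(IPPROTO_SMC)");	/* older kernels reject it */
  	return fd < 0;
  }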
Things we sprinkled into general kernel code:
- Power Sequencing subsystem (used by Qualcomm Bluetooth driver for
QCA6390) [ Already merged separately - Linus ]
- Add IRQ information in sysfs for auxiliary bus
- Introduce a guard definition for local_lock (sketch after this list)
- Add aligned flavor of __cacheline_group_{begin, end}() markings for
grouping fields in structures
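
A minimal kernel-side sketch of the new local_lock guard. It assumes
the guard name follows include/linux/local_lock.h; pcpu_stats and
count_event() are made-up names for illustration:

  #include <linux/cleanup.h>
  #include <linux/local_lock.h>
  #include <linux/percpu.h>

  struct pcpu_stats {
  	local_lock_t	lock;
  	u64		events;
  };
  static DEFINE_PER_CPU(struct pcpu_stats, pcpu_stats) = {
  	.lock = INIT_LOCAL_LOCK(lock),
  };

  static void count_event(void)
  {
  	/* the local_lock is dropped automatically at scope exit, so no
  	 * explicit unlock (or error-path bookkeeping) is needed */
  	guard(local_lock)(&pcpu_stats.lock);
  	this_cpu_inc(pcpu_stats.events);
  }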
BPF:
- Notify user space (via epoll) when a struct_ops object is getting
detached/unregistered
- Add new kfuncs for a generic, open-coded bits iterator (example
  after this list)
- Enable BPF programs to declare arrays of kptr, bpf_rb_root, and
bpf_list_head
- Support resilient split BTF which cuts down on duplication and
makes BTF as compact as possible WRT BTF from modules
- Add support for dumping kfunc prototypes from BTF which enables
both detecting as well as dumping compilable prototypes for kfuncs
- riscv64 BPF JIT improvements in particular to add 12-argument
support for BPF trampolines and to utilize bpf_prog_pack for the
latter
- Add the capability to offload the netfilter flowtable in XDP layer
through kfuncs
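
A minimal BPF-side sketch of the open-coded bits iterator. The kfunc
names and the bpf_for_each() convenience macro follow this cycle's
selftests; the section name and the nr_bits_set counter are
illustrative:

  /* SPDX-License-Identifier: GPL-2.0 */
  #include <vmlinux.h>
  #include <bpf/bpf_helpers.h>

  /* kfunc declarations; a current vmlinux.h may already carry these */
  extern int bpf_iter_bits_new(struct bpf_iter_bits *it,
  			     const u64 *unsafe_ptr__ign,
  			     u32 nr_words) __weak __ksym;
  extern int *bpf_iter_bits_next(struct bpf_iter_bits *it) __weak __ksym;
  extern void bpf_iter_bits_destroy(struct bpf_iter_bits *it) __weak __ksym;

  char _license[] SEC("license") = "GPL";

  u32 nr_bits_set;

  SEC("tracepoint/syscalls/sys_enter_getpid")
  int count_bits(void *ctx)
  {
  	u64 mask = 0xdeadbeef;
  	u32 cnt = 0;
  	int *bit;

  	/* expands to bpf_iter_bits_new/_next/_destroy; the trailing 1
  	 * is the number of u64 words backing the bitmap */
  	bpf_for_each(bits, bit, &mask, 1)
  		cnt++;

  	nr_bits_set = cnt;	/* 0xdeadbeef has 24 bits set */
  	return 0;
  }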
Driver API:
- Allow users to configure IRQ thresholds between which automatic
moderation can choose
- Expand Power Sourcing (PoE) status with power, class and failure
reason. Support setting power limits
- Track additional RSS contexts in the core, make sure configuration
changes don't break them
- Support IPsec crypto offload for IPv6 ESP and IPv4 UDP-encapsulated
ESP data paths
- Support updating firmware on SFP modules
Tests and tooling:
- mptcp: use net/lib.sh to manage netns
- TCP-AO and TCP-MD5: replace debug prints used by tests with
tracepoints
- openvswitch: make test self-contained (don't depend on OvS CLI
tools)
Drivers:
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- increase the max total outstanding PTP TX packets to 4
- add timestamping statistics support
- implement netdev_queue_mgmt_ops
- support new RSS context API
- Intel (100G, ice, idpf):
- implement FEC statistics and dumping signal quality indicators
- support E825C products (with 56Gbps PHYs)
- nVidia/Mellanox:
- support HW-GRO
- mlx4/mlx5: support per-queue statistics via netlink
- obey the max number of EQs setting in sub-functions
- AMD/Solarflare:
- support new RSS context API
- AMD/Pensando:
- ionic: rework fix for doorbell miss to lower overhead and
skip it on new HW
- Wangxun:
- txgbe: support Flow Director perfect filters
- Ethernet NICs consumer, embedded and virtual:
- Add driver for Tehuti Networks TN40xx chips
- Add driver for Meta's internal NIC chips
- Add driver for Ethernet MAC on Airoha EN7581 SoCs
- Add driver for Renesas Ethernet-TSN devices
- Google cloud vNIC:
- flow steering support
- Microsoft vNIC:
- support page sizes other than 4KB on ARM64
- vmware vNIC:
- support latency measurement (update to version 9)
- VirtIO net:
- support for Byte Queue Limits
- support configuring thresholds for automatic IRQ moderation
- support for AF_XDP Rx zero-copy
- Synopsys (stmmac):
- support for STM32MP13 SoC
- let platforms select the right PCS implementation
- TI:
- icssg-prueth: add multicast filtering support
- icssg-prueth: enable PTP timestamping and PPS
- Renesas:
- ravb: improve Rx performance 30-400% by using page pool,
  threaded NAPI and timer-based IRQ coalescing
- ravb: add MII support for R-Car V4M
- Cadence (macb):
- macb: add ARP support to Wake-On-LAN
- Cortina:
- use phylib for RX and TX pause configuration
- Ethernet switches:
- nVidia/Mellanox:
- support configuration of multipath hash seed
- report more accurate max MTU
- use page_pool to improve Rx performance
- MediaTek:
- mt7530: add support for bridge port isolation
- Qualcomm:
- qca8k: add support for bridge port isolation
- Microchip:
- lan9371/2: add 100BaseTX PHY support
- NXP:
- vsc73xx: implement VLAN operations
- Ethernet PHYs:
- aquantia: enable support for aqr115c
- aquantia: add support for PHY LEDs
- realtek: add support for rtl8224 2.5Gbps PHY
- xpcs: add memory-mapped device support
- add BroadR-Reach link mode and support in Broadcom's PHY driver
- CAN:
- add document for ISO 15765-2 protocol support
- mcp251xfd: workaround for erratum DS80000789E, use timestamps to
catch when device returns incorrect FIFO status
- WiFi:
- mac80211/cfg80211:
- parse Transmit Power Envelope (TPE) data in mac80211 instead
of in drivers
- improvements for 6 GHz regulatory flexibility
- multi-link improvements
- support multiple radios per wiphy
- remove DEAUTH_NEED_MGD_TX_PREP flag
- Intel (iwlwifi):
- bump FW API to 91 for BZ/SC devices
- report 64-bit radiotap timestamp
- enable P2P low latency by default
- handle Transmit Power Envelope (TPE) advertised by AP
- remove support for older FW for new devices
- fast resume (keeping the device configured)
- mvm: re-enable Multi-Link Operation (MLO)
- aggregation (A-MSDU) optimizations
- MediaTek (mt76):
- mt7925 Multi-Link Operation (MLO) support
- Qualcomm (ath10k):
- LED support for various chipsets
- Qualcomm (ath12k):
- remove unsupported Tx monitor handling
- support channel 2 in 6 GHz band
- support Spatial Multiplexing Power Save (SMPS) in 6 GHz band
- support multiple BSSID (MBSSID) and Enhanced Multi-BSSID
Advertisements (EMA)
- support dynamic VLAN
- add panic handler for resetting the firmware state
- DebugFS support for datapath statistics
- WCN7850: support for Wake on WLAN
- Microchip (wilc1000):
- read MAC address during probe to make it visible to user space
- suspend/resume improvements
- TI (wl18xx):
- support newer firmware versions
- RealTek (rtw89):
- preparation for RTL8852BE-VT support
- Wake on WLAN support for WiFi 6 chips
- 36-bit PCI DMA support
- RealTek (rtlwifi):
- RTL8192DU support
- Broadcom (brcmfmac):
- Management Frame Protection support (to enable WPA3)
- Bluetooth:
- qualcomm: use the power sequencer for QCA6390
- btusb: mediatek: add ISO data transmission functions
- hci_bcm4377: add BCM4388 support
- btintel: add support for BlazarU core
- btintel: add support for Whale Peak2
- btnxpuart: add support for AW693 A1 chipset
- btnxpuart: add support for IW615 chipset
- btusb: add Realtek RTL8852BE support ID 0x13d3:0x3591"
* tag 'net-next-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1589 commits)
eth: fbnic: Fix spelling mistake "tiggerring" -> "triggering"
tcp: Replace strncpy() with strscpy()
wifi: ath12k: fix build vs old compiler
tcp: Don't access uninit tcp_rsk(req)->ao_keyid in tcp_create_openreq_child().
eth: fbnic: Write the TCAM tables used for RSS control and Rx to host
eth: fbnic: Add L2 address programming
eth: fbnic: Add basic Rx handling
eth: fbnic: Add basic Tx handling
eth: fbnic: Add link detection
eth: fbnic: Add initial messaging to notify FW of our presence
eth: fbnic: Implement Rx queue alloc/start/stop/free
eth: fbnic: Implement Tx queue alloc/start/stop/free
eth: fbnic: Allocate a netdevice and napi vectors with queues
eth: fbnic: Add FW communication mechanism
eth: fbnic: Add message parsing for FW messages
eth: fbnic: Add register init to set PCIe/Ethernet device config
eth: fbnic: Allocate core device specific structures and devlink interface
eth: fbnic: Add scaffolding for Meta's NIC driver
PCI: Add Meta Platforms vendor ID
net/sched: cls_flower: propagate tca[TCA_OPTIONS] to NL_REQ_ATTR_CHECK
...
Diffstat (limited to 'arch')
-rw-r--r--  arch/arm/boot/dts/rockchip/rk3066a.dtsi                 |   4
-rw-r--r--  arch/arm/boot/dts/rockchip/rk3xxx.dtsi                  |   7
-rw-r--r--  arch/arm64/boot/dts/ti/k3-am65-iot2050-common-pg1.dtsi  |  12
-rw-r--r--  arch/arm64/net/bpf_jit_comp.c                           |  16
-rw-r--r--  arch/powerpc/net/bpf_jit_comp.c                         |   4
-rw-r--r--  arch/riscv/Kconfig                                      |  12
-rw-r--r--  arch/riscv/net/bpf_jit.h                                |  51
-rw-r--r--  arch/riscv/net/bpf_jit_comp32.c                         |   3
-rw-r--r--  arch/riscv/net/bpf_jit_comp64.c                         | 144
-rw-r--r--  arch/riscv/net/bpf_jit_core.c                           |   5
-rw-r--r--  arch/s390/net/bpf_jit_comp.c                            | 489
-rw-r--r--  arch/x86/net/bpf_jit_comp.c                             |  15
12 files changed, 575 insertions, 187 deletions
diff --git a/arch/arm/boot/dts/rockchip/rk3066a.dtsi b/arch/arm/boot/dts/rockchip/rk3066a.dtsi index 5e0750547ab5..3f6d49459734 100644 --- a/arch/arm/boot/dts/rockchip/rk3066a.dtsi +++ b/arch/arm/boot/dts/rockchip/rk3066a.dtsi @@ -896,7 +896,3 @@ &wdt { compatible = "rockchip,rk3066-wdt", "snps,dw-wdt"; }; - -&emac { - compatible = "rockchip,rk3066-emac"; -}; diff --git a/arch/arm/boot/dts/rockchip/rk3xxx.dtsi b/arch/arm/boot/dts/rockchip/rk3xxx.dtsi index f37137f298d5..e6a78bcf9163 100644 --- a/arch/arm/boot/dts/rockchip/rk3xxx.dtsi +++ b/arch/arm/boot/dts/rockchip/rk3xxx.dtsi @@ -194,17 +194,14 @@ }; emac: ethernet@10204000 { - compatible = "snps,arc-emac"; + compatible = "rockchip,rk3066-emac"; reg = <0x10204000 0x3c>; interrupts = <GIC_SPI 19 IRQ_TYPE_LEVEL_HIGH>; - - rockchip,grf = <&grf>; - clocks = <&cru HCLK_EMAC>, <&cru SCLK_MAC>; clock-names = "hclk", "macref"; max-speed = <100>; phy-mode = "rmii"; - + rockchip,grf = <&grf>; status = "disabled"; }; diff --git a/arch/arm64/boot/dts/ti/k3-am65-iot2050-common-pg1.dtsi b/arch/arm64/boot/dts/ti/k3-am65-iot2050-common-pg1.dtsi index ef7897763ef8..0a29ed172215 100644 --- a/arch/arm64/boot/dts/ti/k3-am65-iot2050-common-pg1.dtsi +++ b/arch/arm64/boot/dts/ti/k3-am65-iot2050-common-pg1.dtsi @@ -73,3 +73,15 @@ "rx0", "rx1", "rxmgm0", "rxmgm1"; }; + +&icssg0_iep0 { + interrupt-parent = <&icssg0_intc>; + interrupts = <7 7 7>; + interrupt-names = "iep_cap_cmp"; +}; + +&icssg0_iep1 { + interrupt-parent = <&icssg0_intc>; + interrupts = <56 8 8>; + interrupt-names = "iep_cap_cmp"; +}; diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c index 720336d28856..dd0bb069df4b 100644 --- a/arch/arm64/net/bpf_jit_comp.c +++ b/arch/arm64/net/bpf_jit_comp.c @@ -1244,6 +1244,13 @@ emit_cond_jmp: break; } + /* Implement helper call to bpf_get_current_task/_btf() inline */ + if (insn->src_reg == 0 && (insn->imm == BPF_FUNC_get_current_task || + insn->imm == BPF_FUNC_get_current_task_btf)) { + emit(A64_MRS_SP_EL0(r0), ctx); + break; + } + ret = bpf_jit_get_func_addr(ctx->prog, insn, extra_pass, &func_addr, &func_addr_fixed); if (ret < 0) @@ -1829,8 +1836,7 @@ skip_init_ctx: prog->jited_len = 0; goto out_free_hdr; } - if (WARN_ON(bpf_jit_binary_pack_finalize(prog, ro_header, - header))) { + if (WARN_ON(bpf_jit_binary_pack_finalize(ro_header, header))) { /* ro_header has been freed */ ro_header = NULL; prog = orig_prog; @@ -2141,7 +2147,7 @@ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im, emit(A64_STR64I(A64_R(20), A64_SP, regs_off + 8), ctx); if (flags & BPF_TRAMP_F_CALL_ORIG) { - emit_addr_mov_i64(A64_R(0), (const u64)im, ctx); + emit_a64_mov_i64(A64_R(0), (const u64)im, ctx); emit_call((const u64)__bpf_tramp_enter, ctx); } @@ -2185,7 +2191,7 @@ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im, if (flags & BPF_TRAMP_F_CALL_ORIG) { im->ip_epilogue = ctx->ro_image + ctx->idx; - emit_addr_mov_i64(A64_R(0), (const u64)im, ctx); + emit_a64_mov_i64(A64_R(0), (const u64)im, ctx); emit_call((const u64)__bpf_tramp_exit, ctx); } @@ -2581,6 +2587,8 @@ bool bpf_jit_inlines_helper_call(s32 imm) { switch (imm) { case BPF_FUNC_get_smp_processor_id: + case BPF_FUNC_get_current_task: + case BPF_FUNC_get_current_task_btf: return true; default: return false; diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c index 984655419da5..2a36cc2e7e9e 100644 --- a/arch/powerpc/net/bpf_jit_comp.c +++ b/arch/powerpc/net/bpf_jit_comp.c @@ -225,7 +225,7 @@ skip_init_ctx: fp->jited_len = 
proglen + FUNCTION_DESCR_SIZE; if (!fp->is_func || extra_pass) { - if (bpf_jit_binary_pack_finalize(fp, fhdr, hdr)) { + if (bpf_jit_binary_pack_finalize(fhdr, hdr)) { fp = org_fp; goto out_addrs; } @@ -348,7 +348,7 @@ void bpf_jit_free(struct bpf_prog *fp) * before freeing it. */ if (jit_data) { - bpf_jit_binary_pack_finalize(fp, jit_data->fhdr, jit_data->hdr); + bpf_jit_binary_pack_finalize(jit_data->fhdr, jit_data->hdr); kvfree(jit_data->addrs); kfree(jit_data); } diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index 0525ee2d63c7..9f38a5ecbee3 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -610,6 +610,18 @@ config TOOLCHAIN_HAS_VECTOR_CRYPTO def_bool $(as-instr, .option arch$(comma) +v$(comma) +zvkb) depends on AS_HAS_OPTION_ARCH +config RISCV_ISA_ZBA + bool "Zba extension support for bit manipulation instructions" + default y + help + Add support for enabling optimisations in the kernel when the Zba + extension is detected at boot. + + The Zba extension provides instructions to accelerate the generation + of addresses that index into arrays of basic data types. + + If you don't know what to do here, say Y. + config RISCV_ISA_ZBB bool "Zbb extension support for bit manipulation instructions" depends on TOOLCHAIN_HAS_ZBB diff --git a/arch/riscv/net/bpf_jit.h b/arch/riscv/net/bpf_jit.h index fdbf88ca8b70..1d1c78d4cff1 100644 --- a/arch/riscv/net/bpf_jit.h +++ b/arch/riscv/net/bpf_jit.h @@ -18,6 +18,11 @@ static inline bool rvc_enabled(void) return IS_ENABLED(CONFIG_RISCV_ISA_C); } +static inline bool rvzba_enabled(void) +{ + return IS_ENABLED(CONFIG_RISCV_ISA_ZBA) && riscv_has_extension_likely(RISCV_ISA_EXT_ZBA); +} + static inline bool rvzbb_enabled(void) { return IS_ENABLED(CONFIG_RISCV_ISA_ZBB) && riscv_has_extension_likely(RISCV_ISA_EXT_ZBB); @@ -737,6 +742,17 @@ static inline u16 rvc_swsp(u32 imm8, u8 rs2) return rv_css_insn(0x6, imm, rs2, 0x2); } +/* RVZBA instructions. */ +static inline u32 rvzba_sh2add(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(0x10, rs2, rs1, 0x4, rd, 0x33); +} + +static inline u32 rvzba_sh3add(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(0x10, rs2, rs1, 0x6, rd, 0x33); +} + /* RVZBB instructions. */ static inline u32 rvzbb_sextb(u8 rd, u8 rs1) { @@ -939,6 +955,14 @@ static inline u16 rvc_sdsp(u32 imm9, u8 rs2) return rv_css_insn(0x7, imm, rs2, 0x2); } +/* RV64-only ZBA instructions. */ + +static inline u32 rvzba_zextw(u8 rd, u8 rs1) +{ + /* add.uw rd, rs1, ZERO */ + return rv_r_insn(0x04, RV_REG_ZERO, rs1, 0, rd, 0x3b); +} + #endif /* __riscv_xlen == 64 */ /* Helper functions that emit RVC instructions when possible. */ @@ -1082,6 +1106,28 @@ static inline void emit_sw(u8 rs1, s32 off, u8 rs2, struct rv_jit_context *ctx) emit(rv_sw(rs1, off, rs2), ctx); } +static inline void emit_sh2add(u8 rd, u8 rs1, u8 rs2, struct rv_jit_context *ctx) +{ + if (rvzba_enabled()) { + emit(rvzba_sh2add(rd, rs1, rs2), ctx); + return; + } + + emit_slli(rd, rs1, 2, ctx); + emit_add(rd, rd, rs2, ctx); +} + +static inline void emit_sh3add(u8 rd, u8 rs1, u8 rs2, struct rv_jit_context *ctx) +{ + if (rvzba_enabled()) { + emit(rvzba_sh3add(rd, rs1, rs2), ctx); + return; + } + + emit_slli(rd, rs1, 3, ctx); + emit_add(rd, rd, rs2, ctx); +} + /* RV64-only helper functions. 
*/ #if __riscv_xlen == 64 @@ -1161,6 +1207,11 @@ static inline void emit_zexth(u8 rd, u8 rs, struct rv_jit_context *ctx) static inline void emit_zextw(u8 rd, u8 rs, struct rv_jit_context *ctx) { + if (rvzba_enabled()) { + emit(rvzba_zextw(rd, rs), ctx); + return; + } + emit_slli(rd, rs, 32, ctx); emit_srli(rd, rd, 32, ctx); } diff --git a/arch/riscv/net/bpf_jit_comp32.c b/arch/riscv/net/bpf_jit_comp32.c index f5ba73bb153d..592dd86fbf81 100644 --- a/arch/riscv/net/bpf_jit_comp32.c +++ b/arch/riscv/net/bpf_jit_comp32.c @@ -811,8 +811,7 @@ static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx) * if (!prog) * goto out; */ - emit(rv_slli(RV_REG_T0, lo(idx_reg), 2), ctx); - emit(rv_add(RV_REG_T0, RV_REG_T0, lo(arr_reg)), ctx); + emit_sh2add(RV_REG_T0, lo(idx_reg), lo(arr_reg), ctx); off = offsetof(struct bpf_array, ptrs); if (is_12b_check(off, insn)) return -1; diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c index 79a001d5533e..0795efdd3519 100644 --- a/arch/riscv/net/bpf_jit_comp64.c +++ b/arch/riscv/net/bpf_jit_comp64.c @@ -15,7 +15,10 @@ #include <asm/percpu.h> #include "bpf_jit.h" +#define RV_MAX_REG_ARGS 8 #define RV_FENTRY_NINSNS 2 +/* imm that allows emit_imm to emit max count insns */ +#define RV_MAX_COUNT_IMM 0x7FFF7FF7FF7FF7FF #define RV_REG_TCC RV_REG_A6 #define RV_REG_TCC_SAVED RV_REG_S6 /* Store A6 in S6 if program do calls */ @@ -380,8 +383,7 @@ static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx) * if (!prog) * goto out; */ - emit_slli(RV_REG_T2, RV_REG_A2, 3, ctx); - emit_add(RV_REG_T2, RV_REG_T2, RV_REG_A1, ctx); + emit_sh3add(RV_REG_T2, RV_REG_A2, RV_REG_A1, ctx); off = offsetof(struct bpf_array, ptrs); if (is_12b_check(off, insn)) return -1; @@ -537,8 +539,10 @@ static void emit_atomic(u8 rd, u8 rs, s16 off, s32 imm, bool is64, /* r0 = atomic_cmpxchg(dst_reg + off16, r0, src_reg); */ case BPF_CMPXCHG: r0 = bpf_to_rv_reg(BPF_REG_0, ctx); - emit(is64 ? rv_addi(RV_REG_T2, r0, 0) : - rv_addiw(RV_REG_T2, r0, 0), ctx); + if (is64) + emit_mv(RV_REG_T2, r0, ctx); + else + emit_addiw(RV_REG_T2, r0, 0, ctx); emit(is64 ? 
rv_lr_d(r0, 0, rd, 0, 0) : rv_lr_w(r0, 0, rd, 0, 0), ctx); jmp_offset = ninsns_rvoff(8); @@ -689,26 +693,45 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type poke_type, return ret; } -static void store_args(int nregs, int args_off, struct rv_jit_context *ctx) +static void store_args(int nr_arg_slots, int args_off, struct rv_jit_context *ctx) { int i; - for (i = 0; i < nregs; i++) { - emit_sd(RV_REG_FP, -args_off, RV_REG_A0 + i, ctx); + for (i = 0; i < nr_arg_slots; i++) { + if (i < RV_MAX_REG_ARGS) { + emit_sd(RV_REG_FP, -args_off, RV_REG_A0 + i, ctx); + } else { + /* skip slots for T0 and FP of traced function */ + emit_ld(RV_REG_T1, 16 + (i - RV_MAX_REG_ARGS) * 8, RV_REG_FP, ctx); + emit_sd(RV_REG_FP, -args_off, RV_REG_T1, ctx); + } args_off -= 8; } } -static void restore_args(int nregs, int args_off, struct rv_jit_context *ctx) +static void restore_args(int nr_reg_args, int args_off, struct rv_jit_context *ctx) { int i; - for (i = 0; i < nregs; i++) { + for (i = 0; i < nr_reg_args; i++) { emit_ld(RV_REG_A0 + i, -args_off, RV_REG_FP, ctx); args_off -= 8; } } +static void restore_stack_args(int nr_stack_args, int args_off, int stk_arg_off, + struct rv_jit_context *ctx) +{ + int i; + + for (i = 0; i < nr_stack_args; i++) { + emit_ld(RV_REG_T1, -(args_off - RV_MAX_REG_ARGS * 8), RV_REG_FP, ctx); + emit_sd(RV_REG_FP, -stk_arg_off, RV_REG_T1, ctx); + args_off -= 8; + stk_arg_off -= 8; + } +} + static int invoke_bpf_prog(struct bpf_tramp_link *l, int args_off, int retval_off, int run_ctx_off, bool save_ret, struct rv_jit_context *ctx) { @@ -781,8 +804,8 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, { int i, ret, offset; int *branches_off = NULL; - int stack_size = 0, nregs = m->nr_args; - int retval_off, args_off, nregs_off, ip_off, run_ctx_off, sreg_off; + int stack_size = 0, nr_arg_slots = 0; + int retval_off, args_off, nregs_off, ip_off, run_ctx_off, sreg_off, stk_arg_off; struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY]; struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT]; struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN]; @@ -828,20 +851,21 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, * FP - sreg_off [ callee saved reg ] * * [ pads ] pads for 16 bytes alignment + * + * [ stack_argN ] + * [ ... 
] + * FP - stk_arg_off [ stack_arg1 ] BPF_TRAMP_F_CALL_ORIG */ if (flags & (BPF_TRAMP_F_ORIG_STACK | BPF_TRAMP_F_SHARE_IPMODIFY)) return -ENOTSUPP; - /* extra regiters for struct arguments */ - for (i = 0; i < m->nr_args; i++) - if (m->arg_flags[i] & BTF_FMODEL_STRUCT_ARG) - nregs += round_up(m->arg_size[i], 8) / 8 - 1; - - /* 8 arguments passed by registers */ - if (nregs > 8) + if (m->nr_args > MAX_BPF_FUNC_ARGS) return -ENOTSUPP; + for (i = 0; i < m->nr_args; i++) + nr_arg_slots += round_up(m->arg_size[i], 8) / 8; + /* room of trampoline frame to store return address and frame pointer */ stack_size += 16; @@ -851,7 +875,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, retval_off = stack_size; } - stack_size += nregs * 8; + stack_size += nr_arg_slots * 8; args_off = stack_size; stack_size += 8; @@ -868,7 +892,13 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, stack_size += 8; sreg_off = stack_size; - stack_size = round_up(stack_size, 16); + if ((flags & BPF_TRAMP_F_CALL_ORIG) && (nr_arg_slots - RV_MAX_REG_ARGS > 0)) + stack_size += (nr_arg_slots - RV_MAX_REG_ARGS) * 8; + + stack_size = round_up(stack_size, STACK_ALIGN); + + /* room for args on stack must be at the top of stack */ + stk_arg_off = stack_size; if (!is_struct_ops) { /* For the trampoline called from function entry, @@ -905,17 +935,17 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, emit_sd(RV_REG_FP, -ip_off, RV_REG_T1, ctx); } - emit_li(RV_REG_T1, nregs, ctx); + emit_li(RV_REG_T1, nr_arg_slots, ctx); emit_sd(RV_REG_FP, -nregs_off, RV_REG_T1, ctx); - store_args(nregs, args_off, ctx); + store_args(nr_arg_slots, args_off, ctx); /* skip to actual body of traced function */ if (flags & BPF_TRAMP_F_SKIP_FRAME) orig_call += RV_FENTRY_NINSNS * 4; if (flags & BPF_TRAMP_F_CALL_ORIG) { - emit_imm(RV_REG_A0, (const s64)im, ctx); + emit_imm(RV_REG_A0, ctx->insns ? (const s64)im : RV_MAX_COUNT_IMM, ctx); ret = emit_call((const u64)__bpf_tramp_enter, true, ctx); if (ret) return ret; @@ -948,13 +978,14 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, } if (flags & BPF_TRAMP_F_CALL_ORIG) { - restore_args(nregs, args_off, ctx); + restore_args(min_t(int, nr_arg_slots, RV_MAX_REG_ARGS), args_off, ctx); + restore_stack_args(nr_arg_slots - RV_MAX_REG_ARGS, args_off, stk_arg_off, ctx); ret = emit_call((const u64)orig_call, true, ctx); if (ret) goto out; emit_sd(RV_REG_FP, -retval_off, RV_REG_A0, ctx); emit_sd(RV_REG_FP, -(retval_off - 8), regmap[BPF_REG_0], ctx); - im->ip_after_call = ctx->insns + ctx->ninsns; + im->ip_after_call = ctx->ro_insns + ctx->ninsns; /* 2 nops reserved for auipc+jalr pair */ emit(rv_nop(), ctx); emit(rv_nop(), ctx); @@ -975,15 +1006,15 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, } if (flags & BPF_TRAMP_F_CALL_ORIG) { - im->ip_epilogue = ctx->insns + ctx->ninsns; - emit_imm(RV_REG_A0, (const s64)im, ctx); + im->ip_epilogue = ctx->ro_insns + ctx->ninsns; + emit_imm(RV_REG_A0, ctx->insns ? (const s64)im : RV_MAX_COUNT_IMM, ctx); ret = emit_call((const u64)__bpf_tramp_exit, true, ctx); if (ret) goto out; } if (flags & BPF_TRAMP_F_RESTORE_REGS) - restore_args(nregs, args_off, ctx); + restore_args(min_t(int, nr_arg_slots, RV_MAX_REG_ARGS), args_off, ctx); if (save_ret) { emit_ld(RV_REG_A0, -retval_off, RV_REG_FP, ctx); @@ -1038,31 +1069,52 @@ int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags, return ret < 0 ? 
ret : ninsns_rvoff(ctx.ninsns); } -int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, - void *image_end, const struct btf_func_model *m, +void *arch_alloc_bpf_trampoline(unsigned int size) +{ + return bpf_prog_pack_alloc(size, bpf_fill_ill_insns); +} + +void arch_free_bpf_trampoline(void *image, unsigned int size) +{ + bpf_prog_pack_free(image, size); +} + +int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *ro_image, + void *ro_image_end, const struct btf_func_model *m, u32 flags, struct bpf_tramp_links *tlinks, void *func_addr) { int ret; + void *image, *res; struct rv_jit_context ctx; + u32 size = ro_image_end - ro_image; + + image = kvmalloc(size, GFP_KERNEL); + if (!image) + return -ENOMEM; ctx.ninsns = 0; - /* - * The bpf_int_jit_compile() uses a RW buffer (ctx.insns) to write the - * JITed instructions and later copies it to a RX region (ctx.ro_insns). - * It also uses ctx.ro_insns to calculate offsets for jumps etc. As the - * trampoline image uses the same memory area for writing and execution, - * both ctx.insns and ctx.ro_insns can be set to image. - */ ctx.insns = image; - ctx.ro_insns = image; + ctx.ro_insns = ro_image; ret = __arch_prepare_bpf_trampoline(im, m, tlinks, func_addr, flags, &ctx); if (ret < 0) - return ret; + goto out; - bpf_flush_icache(ctx.insns, ctx.insns + ctx.ninsns); + if (WARN_ON(size < ninsns_rvoff(ctx.ninsns))) { + ret = -E2BIG; + goto out; + } + + res = bpf_arch_text_copy(ro_image, image, size); + if (IS_ERR(res)) { + ret = PTR_ERR(res); + goto out; + } - return ninsns_rvoff(ret); + bpf_flush_icache(ro_image, ro_image_end); +out: + kvfree(image); + return ret < 0 ? ret : size; } int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx, @@ -1097,12 +1149,10 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx, /* Load current CPU number in T1 */ emit_ld(RV_REG_T1, offsetof(struct thread_info, cpu), RV_REG_TP, ctx); - /* << 3 because offsets are 8 bytes */ - emit_slli(RV_REG_T1, RV_REG_T1, 3, ctx); /* Load address of __per_cpu_offset array in T2 */ emit_addr(RV_REG_T2, (u64)&__per_cpu_offset, extra_pass, ctx); - /* Add offset of current CPU to __per_cpu_offset */ - emit_add(RV_REG_T1, RV_REG_T2, RV_REG_T1, ctx); + /* Get address of __per_cpu_offset[cpu] in T1 */ + emit_sh3add(RV_REG_T1, RV_REG_T1, RV_REG_T2, ctx); /* Load __per_cpu_offset[cpu] in T1 */ emit_ld(RV_REG_T1, 0, RV_REG_T1, ctx); /* Add the offset to Rd */ @@ -1960,7 +2010,7 @@ void bpf_jit_build_prologue(struct rv_jit_context *ctx, bool is_subprog) { int i, stack_adjust = 0, store_offset, bpf_stack_adjust; - bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16); + bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, STACK_ALIGN); if (bpf_stack_adjust) mark_fp(ctx); @@ -1982,7 +2032,7 @@ void bpf_jit_build_prologue(struct rv_jit_context *ctx, bool is_subprog) if (ctx->arena_vm_start) stack_adjust += 8; - stack_adjust = round_up(stack_adjust, 16); + stack_adjust = round_up(stack_adjust, STACK_ALIGN); stack_adjust += bpf_stack_adjust; store_offset = stack_adjust - 8; diff --git a/arch/riscv/net/bpf_jit_core.c b/arch/riscv/net/bpf_jit_core.c index 0a96abdaca65..6de753c667f4 100644 --- a/arch/riscv/net/bpf_jit_core.c +++ b/arch/riscv/net/bpf_jit_core.c @@ -178,8 +178,7 @@ skip_init_ctx: prog->jited_len = prog_size - cfi_get_offset(); if (!prog->is_func || extra_pass) { - if (WARN_ON(bpf_jit_binary_pack_finalize(prog, jit_data->ro_header, - jit_data->header))) { + if 
(WARN_ON(bpf_jit_binary_pack_finalize(jit_data->ro_header, jit_data->header))) { /* ro_header has been freed */ jit_data->ro_header = NULL; prog = orig_prog; @@ -258,7 +257,7 @@ void bpf_jit_free(struct bpf_prog *prog) * before freeing it. */ if (jit_data) { - bpf_jit_binary_pack_finalize(prog, jit_data->ro_header, jit_data->header); + bpf_jit_binary_pack_finalize(jit_data->ro_header, jit_data->header); kfree(jit_data); } hdr = bpf_jit_binary_pack_hdr(prog); diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c index 4be8f5cadd02..9d440a0b729e 100644 --- a/arch/s390/net/bpf_jit_comp.c +++ b/arch/s390/net/bpf_jit_comp.c @@ -31,11 +31,12 @@ #include <asm/nospec-branch.h> #include <asm/set_memory.h> #include <asm/text-patching.h> +#include <asm/unwind.h> #include "bpf_jit.h" struct bpf_jit { u32 seen; /* Flags to remember seen eBPF instructions */ - u32 seen_reg[16]; /* Array to remember which registers are used */ + u16 seen_regs; /* Mask to remember which registers are used */ u32 *addrs; /* Array with relative instruction addresses */ u8 *prg_buf; /* Start of program */ int size; /* Size of program and literal pool */ @@ -53,6 +54,8 @@ struct bpf_jit { int excnt; /* Number of exception table entries */ int prologue_plt_ret; /* Return address for prologue hotpatch PLT */ int prologue_plt; /* Start of prologue hotpatch PLT */ + int kern_arena; /* Pool offset of kernel arena address */ + u64 user_arena; /* User arena address */ }; #define SEEN_MEM BIT(0) /* use mem[] for temporary storage */ @@ -60,6 +63,8 @@ struct bpf_jit { #define SEEN_FUNC BIT(2) /* calls C functions */ #define SEEN_STACK (SEEN_FUNC | SEEN_MEM) +#define NVREGS 0xffc0 /* %r6-%r15 */ + /* * s390 registers */ @@ -118,8 +123,8 @@ static inline void reg_set_seen(struct bpf_jit *jit, u32 b1) { u32 r1 = reg2hex[b1]; - if (r1 >= 6 && r1 <= 15 && !jit->seen_reg[r1]) - jit->seen_reg[r1] = 1; + if (r1 >= 6 && r1 <= 15) + jit->seen_regs |= (1 << r1); } #define REG_SET_SEEN(b1) \ @@ -127,8 +132,6 @@ static inline void reg_set_seen(struct bpf_jit *jit, u32 b1) reg_set_seen(jit, b1); \ }) -#define REG_SEEN(b1) jit->seen_reg[reg2hex[(b1)]] - /* * EMIT macros for code generation */ @@ -436,12 +439,12 @@ static void restore_regs(struct bpf_jit *jit, u32 rs, u32 re, u32 stack_depth) /* * Return first seen register (from start) */ -static int get_start(struct bpf_jit *jit, int start) +static int get_start(u16 seen_regs, int start) { int i; for (i = start; i <= 15; i++) { - if (jit->seen_reg[i]) + if (seen_regs & (1 << i)) return i; } return 0; @@ -450,15 +453,15 @@ static int get_start(struct bpf_jit *jit, int start) /* * Return last seen register (from start) (gap >= 2) */ -static int get_end(struct bpf_jit *jit, int start) +static int get_end(u16 seen_regs, int start) { int i; for (i = start; i < 15; i++) { - if (!jit->seen_reg[i] && !jit->seen_reg[i + 1]) + if (!(seen_regs & (3 << i))) return i - 1; } - return jit->seen_reg[15] ? 15 : 14; + return (seen_regs & (1 << 15)) ? 15 : 14; } #define REGS_SAVE 1 @@ -467,8 +470,10 @@ static int get_end(struct bpf_jit *jit, int start) * Save and restore clobbered registers (6-15) on stack. * We save/restore registers in chunks with gap >= 2 registers. 
*/ -static void save_restore_regs(struct bpf_jit *jit, int op, u32 stack_depth) +static void save_restore_regs(struct bpf_jit *jit, int op, u32 stack_depth, + u16 extra_regs) { + u16 seen_regs = jit->seen_regs | extra_regs; const int last = 15, save_restore_size = 6; int re = 6, rs; @@ -482,10 +487,10 @@ static void save_restore_regs(struct bpf_jit *jit, int op, u32 stack_depth) } do { - rs = get_start(jit, re); + rs = get_start(seen_regs, re); if (!rs) break; - re = get_end(jit, rs + 1); + re = get_end(seen_regs, rs + 1); if (op == REGS_SAVE) save_regs(jit, rs, re); else @@ -570,8 +575,21 @@ static void bpf_jit_prologue(struct bpf_jit *jit, struct bpf_prog *fp, } /* Tail calls have to skip above initialization */ jit->tail_call_start = jit->prg; - /* Save registers */ - save_restore_regs(jit, REGS_SAVE, stack_depth); + if (fp->aux->exception_cb) { + /* + * Switch stack, the new address is in the 2nd parameter. + * + * Arrange the restoration of %r6-%r15 in the epilogue. + * Do not restore them now, the prog does not need them. + */ + /* lgr %r15,%r3 */ + EMIT4(0xb9040000, REG_15, REG_3); + jit->seen_regs |= NVREGS; + } else { + /* Save registers */ + save_restore_regs(jit, REGS_SAVE, stack_depth, + fp->aux->exception_boundary ? NVREGS : 0); + } /* Setup literal pool */ if (is_first_pass(jit) || (jit->seen & SEEN_LITERAL)) { if (!is_first_pass(jit) && @@ -647,7 +665,7 @@ static void bpf_jit_epilogue(struct bpf_jit *jit, u32 stack_depth) /* Load exit code: lgr %r2,%b0 */ EMIT4(0xb9040000, REG_2, BPF_REG_0); /* Restore registers */ - save_restore_regs(jit, REGS_RESTORE, stack_depth); + save_restore_regs(jit, REGS_RESTORE, stack_depth, 0); if (nospec_uses_trampoline()) { jit->r14_thunk_ip = jit->prg; /* Generate __s390_indirect_jump_r14 thunk */ @@ -667,50 +685,111 @@ static void bpf_jit_epilogue(struct bpf_jit *jit, u32 stack_depth) jit->prg += sizeof(struct bpf_plt); } -static int get_probe_mem_regno(const u8 *insn) -{ - /* - * insn must point to llgc, llgh, llgf, lg, lgb, lgh or lgf, which have - * destination register at the same position. - */ - if (insn[0] != 0xe3) /* common prefix */ - return -1; - if (insn[5] != 0x90 && /* llgc */ - insn[5] != 0x91 && /* llgh */ - insn[5] != 0x16 && /* llgf */ - insn[5] != 0x04 && /* lg */ - insn[5] != 0x77 && /* lgb */ - insn[5] != 0x15 && /* lgh */ - insn[5] != 0x14) /* lgf */ - return -1; - return insn[1] >> 4; -} - bool ex_handler_bpf(const struct exception_table_entry *x, struct pt_regs *regs) { regs->psw.addr = extable_fixup(x); - regs->gprs[x->data] = 0; + if (x->data != -1) + regs->gprs[x->data] = 0; return true; } -static int bpf_jit_probe_mem(struct bpf_jit *jit, struct bpf_prog *fp, - int probe_prg, int nop_prg) +/* + * A single BPF probe instruction + */ +struct bpf_jit_probe { + int prg; /* JITed instruction offset */ + int nop_prg; /* JITed nop offset */ + int reg; /* Register to clear on exception */ + int arena_reg; /* Register to use for arena addressing */ +}; + +static void bpf_jit_probe_init(struct bpf_jit_probe *probe) +{ + probe->prg = -1; + probe->nop_prg = -1; + probe->reg = -1; + probe->arena_reg = REG_0; +} + +/* + * Handlers of certain exceptions leave psw.addr pointing to the instruction + * directly after the failing one. Therefore, create two exception table + * entries and also add a nop in case two probing instructions come directly + * after each other. 
+ */ +static void bpf_jit_probe_emit_nop(struct bpf_jit *jit, + struct bpf_jit_probe *probe) +{ + if (probe->prg == -1 || probe->nop_prg != -1) + /* The probe is not armed or nop is already emitted. */ + return; + + probe->nop_prg = jit->prg; + /* bcr 0,%0 */ + _EMIT2(0x0700); +} + +static void bpf_jit_probe_load_pre(struct bpf_jit *jit, struct bpf_insn *insn, + struct bpf_jit_probe *probe) +{ + if (BPF_MODE(insn->code) != BPF_PROBE_MEM && + BPF_MODE(insn->code) != BPF_PROBE_MEMSX && + BPF_MODE(insn->code) != BPF_PROBE_MEM32) + return; + + if (BPF_MODE(insn->code) == BPF_PROBE_MEM32) { + /* lgrl %r1,kern_arena */ + EMIT6_PCREL_RILB(0xc4080000, REG_W1, jit->kern_arena); + probe->arena_reg = REG_W1; + } + probe->prg = jit->prg; + probe->reg = reg2hex[insn->dst_reg]; +} + +static void bpf_jit_probe_store_pre(struct bpf_jit *jit, struct bpf_insn *insn, + struct bpf_jit_probe *probe) +{ + if (BPF_MODE(insn->code) != BPF_PROBE_MEM32) + return; + + /* lgrl %r1,kern_arena */ + EMIT6_PCREL_RILB(0xc4080000, REG_W1, jit->kern_arena); + probe->arena_reg = REG_W1; + probe->prg = jit->prg; +} + +static void bpf_jit_probe_atomic_pre(struct bpf_jit *jit, + struct bpf_insn *insn, + struct bpf_jit_probe *probe) +{ + if (BPF_MODE(insn->code) != BPF_PROBE_ATOMIC) + return; + + /* lgrl %r1,kern_arena */ + EMIT6_PCREL_RILB(0xc4080000, REG_W1, jit->kern_arena); + /* agr %r1,%dst */ + EMIT4(0xb9080000, REG_W1, insn->dst_reg); + probe->arena_reg = REG_W1; + probe->prg = jit->prg; +} + +static int bpf_jit_probe_post(struct bpf_jit *jit, struct bpf_prog *fp, + struct bpf_jit_probe *probe) { struct exception_table_entry *ex; - int reg, prg; + int i, prg; s64 delta; u8 *insn; - int i; + if (probe->prg == -1) + /* The probe is not armed. */ + return 0; + bpf_jit_probe_emit_nop(jit, probe); if (!fp->aux->extable) /* Do nothing during early JIT passes. */ return 0; - insn = jit->prg_buf + probe_prg; - reg = get_probe_mem_regno(insn); - if (WARN_ON_ONCE(reg < 0)) - /* JIT bug - unexpected probe instruction. */ - return -1; - if (WARN_ON_ONCE(probe_prg + insn_length(*insn) != nop_prg)) + insn = jit->prg_buf + probe->prg; + if (WARN_ON_ONCE(probe->prg + insn_length(*insn) != probe->nop_prg)) /* JIT bug - gap between probe and nop instructions. */ return -1; for (i = 0; i < 2; i++) { @@ -719,23 +798,24 @@ static int bpf_jit_probe_mem(struct bpf_jit *jit, struct bpf_prog *fp, return -1; ex = &fp->aux->extable[jit->excnt]; /* Add extable entries for probe and nop instructions. */ - prg = i == 0 ? probe_prg : nop_prg; + prg = i == 0 ? probe->prg : probe->nop_prg; delta = jit->prg_buf + prg - (u8 *)&ex->insn; if (WARN_ON_ONCE(delta < INT_MIN || delta > INT_MAX)) /* JIT bug - code and extable must be close. */ return -1; ex->insn = delta; /* - * Always land on the nop. Note that extable infrastructure - * ignores fixup field, it is handled by ex_handler_bpf(). + * Land on the current instruction. Note that the extable + * infrastructure ignores the fixup field; it is handled by + * ex_handler_bpf(). */ - delta = jit->prg_buf + nop_prg - (u8 *)&ex->fixup; + delta = jit->prg_buf + jit->prg - (u8 *)&ex->fixup; if (WARN_ON_ONCE(delta < INT_MIN || delta > INT_MAX)) /* JIT bug - landing pad and extable must be close. 
*/ return -1; ex->fixup = delta; ex->type = EX_TYPE_BPF; - ex->data = reg; + ex->data = probe->reg; jit->excnt++; } return 0; @@ -782,19 +862,15 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp, s32 branch_oc_off = insn->off; u32 dst_reg = insn->dst_reg; u32 src_reg = insn->src_reg; + struct bpf_jit_probe probe; int last, insn_count = 1; u32 *addrs = jit->addrs; s32 imm = insn->imm; s16 off = insn->off; - int probe_prg = -1; unsigned int mask; - int nop_prg; int err; - if (BPF_CLASS(insn->code) == BPF_LDX && - (BPF_MODE(insn->code) == BPF_PROBE_MEM || - BPF_MODE(insn->code) == BPF_PROBE_MEMSX)) - probe_prg = jit->prg; + bpf_jit_probe_init(&probe); switch (insn->code) { /* @@ -823,6 +899,22 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp, } break; case BPF_ALU64 | BPF_MOV | BPF_X: + if (insn_is_cast_user(insn)) { + int patch_brc; + + /* ltgr %dst,%src */ + EMIT4(0xb9020000, dst_reg, src_reg); + /* brc 8,0f */ + patch_brc = jit->prg; + EMIT4_PCREL_RIC(0xa7040000, 8, 0); + /* iihf %dst,user_arena>>32 */ + EMIT6_IMM(0xc0080000, dst_reg, jit->user_arena >> 32); + /* 0: */ + if (jit->prg_buf) + *(u16 *)(jit->prg_buf + patch_brc + 2) = + (jit->prg - patch_brc) >> 1; + break; + } switch (insn->off) { case 0: /* DST = SRC */ /* lgr %dst,%src */ @@ -1366,51 +1458,99 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp, * BPF_ST(X) */ case BPF_STX | BPF_MEM | BPF_B: /* *(u8 *)(dst + off) = src_reg */ - /* stcy %src,off(%dst) */ - EMIT6_DISP_LH(0xe3000000, 0x0072, src_reg, dst_reg, REG_0, off); + case BPF_STX | BPF_PROBE_MEM32 | BPF_B: + bpf_jit_probe_store_pre(jit, insn, &probe); + /* stcy %src,off(%dst,%arena) */ + EMIT6_DISP_LH(0xe3000000, 0x0072, src_reg, dst_reg, + probe.arena_reg, off); + err = bpf_jit_probe_post(jit, fp, &probe); + if (err < 0) + return err; jit->seen |= SEEN_MEM; break; case BPF_STX | BPF_MEM | BPF_H: /* (u16 *)(dst + off) = src */ - /* sthy %src,off(%dst) */ - EMIT6_DISP_LH(0xe3000000, 0x0070, src_reg, dst_reg, REG_0, off); + case BPF_STX | BPF_PROBE_MEM32 | BPF_H: + bpf_jit_probe_store_pre(jit, insn, &probe); + /* sthy %src,off(%dst,%arena) */ + EMIT6_DISP_LH(0xe3000000, 0x0070, src_reg, dst_reg, + probe.arena_reg, off); + err = bpf_jit_probe_post(jit, fp, &probe); + if (err < 0) + return err; jit->seen |= SEEN_MEM; break; case BPF_STX | BPF_MEM | BPF_W: /* *(u32 *)(dst + off) = src */ - /* sty %src,off(%dst) */ - EMIT6_DISP_LH(0xe3000000, 0x0050, src_reg, dst_reg, REG_0, off); + case BPF_STX | BPF_PROBE_MEM32 | BPF_W: + bpf_jit_probe_store_pre(jit, insn, &probe); + /* sty %src,off(%dst,%arena) */ + EMIT6_DISP_LH(0xe3000000, 0x0050, src_reg, dst_reg, + probe.arena_reg, off); + err = bpf_jit_probe_post(jit, fp, &probe); + if (err < 0) + return err; jit->seen |= SEEN_MEM; break; case BPF_STX | BPF_MEM | BPF_DW: /* (u64 *)(dst + off) = src */ - /* stg %src,off(%dst) */ - EMIT6_DISP_LH(0xe3000000, 0x0024, src_reg, dst_reg, REG_0, off); + case BPF_STX | BPF_PROBE_MEM32 | BPF_DW: + bpf_jit_probe_store_pre(jit, insn, &probe); + /* stg %src,off(%dst,%arena) */ + EMIT6_DISP_LH(0xe3000000, 0x0024, src_reg, dst_reg, + probe.arena_reg, off); + err = bpf_jit_probe_post(jit, fp, &probe); + if (err < 0) + return err; jit->seen |= SEEN_MEM; break; case BPF_ST | BPF_MEM | BPF_B: /* *(u8 *)(dst + off) = imm */ + case BPF_ST | BPF_PROBE_MEM32 | BPF_B: /* lhi %w0,imm */ EMIT4_IMM(0xa7080000, REG_W0, (u8) imm); - /* stcy %w0,off(dst) */ - EMIT6_DISP_LH(0xe3000000, 0x0072, REG_W0, dst_reg, REG_0, off); + 
bpf_jit_probe_store_pre(jit, insn, &probe); + /* stcy %w0,off(%dst,%arena) */ + EMIT6_DISP_LH(0xe3000000, 0x0072, REG_W0, dst_reg, + probe.arena_reg, off); + err = bpf_jit_probe_post(jit, fp, &probe); + if (err < 0) + return err; jit->seen |= SEEN_MEM; break; case BPF_ST | BPF_MEM | BPF_H: /* (u16 *)(dst + off) = imm */ + case BPF_ST | BPF_PROBE_MEM32 | BPF_H: /* lhi %w0,imm */ EMIT4_IMM(0xa7080000, REG_W0, (u16) imm); - /* sthy %w0,off(dst) */ - EMIT6_DISP_LH(0xe3000000, 0x0070, REG_W0, dst_reg, REG_0, off); + bpf_jit_probe_store_pre(jit, insn, &probe); + /* sthy %w0,off(%dst,%arena) */ + EMIT6_DISP_LH(0xe3000000, 0x0070, REG_W0, dst_reg, + probe.arena_reg, off); + err = bpf_jit_probe_post(jit, fp, &probe); + if (err < 0) + return err; jit->seen |= SEEN_MEM; break; case BPF_ST | BPF_MEM | BPF_W: /* *(u32 *)(dst + off) = imm */ + case BPF_ST | BPF_PROBE_MEM32 | BPF_W: /* llilf %w0,imm */ EMIT6_IMM(0xc00f0000, REG_W0, (u32) imm); - /* sty %w0,off(%dst) */ - EMIT6_DISP_LH(0xe3000000, 0x0050, REG_W0, dst_reg, REG_0, off); + bpf_jit_probe_store_pre(jit, insn, &probe); + /* sty %w0,off(%dst,%arena) */ + EMIT6_DISP_LH(0xe3000000, 0x0050, REG_W0, dst_reg, + probe.arena_reg, off); + err = bpf_jit_probe_post(jit, fp, &probe); + if (err < 0) + return err; jit->seen |= SEEN_MEM; break; case BPF_ST | BPF_MEM | BPF_DW: /* *(u64 *)(dst + off) = imm */ + case BPF_ST | BPF_PROBE_MEM32 | BPF_DW: /* lgfi %w0,imm */ EMIT6_IMM(0xc0010000, REG_W0, imm); - /* stg %w0,off(%dst) */ - EMIT6_DISP_LH(0xe3000000, 0x0024, REG_W0, dst_reg, REG_0, off); + bpf_jit_probe_store_pre(jit, insn, &probe); + /* stg %w0,off(%dst,%arena) */ + EMIT6_DISP_LH(0xe3000000, 0x0024, REG_W0, dst_reg, + probe.arena_reg, off); + err = bpf_jit_probe_post(jit, fp, &probe); + if (err < 0) + return err; jit->seen |= SEEN_MEM; break; /* @@ -1418,15 +1558,30 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp, */ case BPF_STX | BPF_ATOMIC | BPF_DW: case BPF_STX | BPF_ATOMIC | BPF_W: + case BPF_STX | BPF_PROBE_ATOMIC | BPF_DW: + case BPF_STX | BPF_PROBE_ATOMIC | BPF_W: { bool is32 = BPF_SIZE(insn->code) == BPF_W; + /* + * Unlike loads and stores, atomics have only a base register, + * but no index register. For the non-arena case, simply use + * %dst as a base. For the arena case, use the work register + * %r1: first, load the arena base into it, and then add %dst + * to it. + */ + probe.arena_reg = dst_reg; + switch (insn->imm) { -/* {op32|op64} {%w0|%src},%src,off(%dst) */ #define EMIT_ATOMIC(op32, op64) do { \ + bpf_jit_probe_atomic_pre(jit, insn, &probe); \ + /* {op32|op64} {%w0|%src},%src,off(%arena) */ \ EMIT6_DISP_LH(0xeb000000, is32 ? (op32) : (op64), \ (insn->imm & BPF_FETCH) ? src_reg : REG_W0, \ - src_reg, dst_reg, off); \ + src_reg, probe.arena_reg, off); \ + err = bpf_jit_probe_post(jit, fp, &probe); \ + if (err < 0) \ + return err; \ if (insn->imm & BPF_FETCH) { \ /* bcr 14,0 - see atomic_fetch_{add,and,or,xor}() */ \ _EMIT2(0x07e0); \ @@ -1455,25 +1610,50 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp, EMIT_ATOMIC(0x00f7, 0x00e7); break; #undef EMIT_ATOMIC - case BPF_XCHG: - /* {ly|lg} %w0,off(%dst) */ + case BPF_XCHG: { + struct bpf_jit_probe load_probe = probe; + int loop_start; + + bpf_jit_probe_atomic_pre(jit, insn, &load_probe); + /* {ly|lg} %w0,off(%arena) */ EMIT6_DISP_LH(0xe3000000, is32 ? 
0x0058 : 0x0004, REG_W0, REG_0, - dst_reg, off); - /* 0: {csy|csg} %w0,%src,off(%dst) */ + load_probe.arena_reg, off); + bpf_jit_probe_emit_nop(jit, &load_probe); + /* Reuse {ly|lg}'s arena_reg for {csy|csg}. */ + if (load_probe.prg != -1) { + probe.prg = jit->prg; + probe.arena_reg = load_probe.arena_reg; + } + loop_start = jit->prg; + /* 0: {csy|csg} %w0,%src,off(%arena) */ EMIT6_DISP_LH(0xeb000000, is32 ? 0x0014 : 0x0030, - REG_W0, src_reg, dst_reg, off); + REG_W0, src_reg, probe.arena_reg, off); + bpf_jit_probe_emit_nop(jit, &probe); /* brc 4,0b */ - EMIT4_PCREL_RIC(0xa7040000, 4, jit->prg - 6); + EMIT4_PCREL_RIC(0xa7040000, 4, loop_start); /* {llgfr|lgr} %src,%w0 */ EMIT4(is32 ? 0xb9160000 : 0xb9040000, src_reg, REG_W0); + /* Both probes should land here on exception. */ + err = bpf_jit_probe_post(jit, fp, &load_probe); + if (err < 0) + return err; + err = bpf_jit_probe_post(jit, fp, &probe); + if (err < 0) + return err; if (is32 && insn_is_zext(&insn[1])) insn_count = 2; break; + } case BPF_CMPXCHG: - /* 0: {csy|csg} %b0,%src,off(%dst) */ + bpf_jit_probe_atomic_pre(jit, insn, &probe); + /* 0: {csy|csg} %b0,%src,off(%arena) */ EMIT6_DISP_LH(0xeb000000, is32 ? 0x0014 : 0x0030, - BPF_REG_0, src_reg, dst_reg, off); + BPF_REG_0, src_reg, + probe.arena_reg, off); + err = bpf_jit_probe_post(jit, fp, &probe); + if (err < 0) + return err; break; default: pr_err("Unknown atomic operation %02x\n", insn->imm); @@ -1488,51 +1668,87 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp, */ case BPF_LDX | BPF_MEM | BPF_B: /* dst = *(u8 *)(ul) (src + off) */ case BPF_LDX | BPF_PROBE_MEM | BPF_B: - /* llgc %dst,0(off,%src) */ - EMIT6_DISP_LH(0xe3000000, 0x0090, dst_reg, src_reg, REG_0, off); + case BPF_LDX | BPF_PROBE_MEM32 | BPF_B: + bpf_jit_probe_load_pre(jit, insn, &probe); + /* llgc %dst,off(%src,%arena) */ + EMIT6_DISP_LH(0xe3000000, 0x0090, dst_reg, src_reg, + probe.arena_reg, off); + err = bpf_jit_probe_post(jit, fp, &probe); + if (err < 0) + return err; jit->seen |= SEEN_MEM; if (insn_is_zext(&insn[1])) insn_count = 2; break; case BPF_LDX | BPF_MEMSX | BPF_B: /* dst = *(s8 *)(ul) (src + off) */ case BPF_LDX | BPF_PROBE_MEMSX | BPF_B: - /* lgb %dst,0(off,%src) */ + bpf_jit_probe_load_pre(jit, insn, &probe); + /* lgb %dst,off(%src) */ EMIT6_DISP_LH(0xe3000000, 0x0077, dst_reg, src_reg, REG_0, off); + err = bpf_jit_probe_post(jit, fp, &probe); + if (err < 0) + return err; jit->seen |= SEEN_MEM; break; case BPF_LDX | BPF_MEM | BPF_H: /* dst = *(u16 *)(ul) (src + off) */ case BPF_LDX | BPF_PROBE_MEM | BPF_H: - /* llgh %dst,0(off,%src) */ - EMIT6_DISP_LH(0xe3000000, 0x0091, dst_reg, src_reg, REG_0, off); + case BPF_LDX | BPF_PROBE_MEM32 | BPF_H: + bpf_jit_probe_load_pre(jit, insn, &probe); + /* llgh %dst,off(%src,%arena) */ + EMIT6_DISP_LH(0xe3000000, 0x0091, dst_reg, src_reg, + probe.arena_reg, off); + err = bpf_jit_probe_post(jit, fp, &probe); + if (err < 0) + return err; jit->seen |= SEEN_MEM; if (insn_is_zext(&insn[1])) insn_count = 2; break; case BPF_LDX | BPF_MEMSX | BPF_H: /* dst = *(s16 *)(ul) (src + off) */ case BPF_LDX | BPF_PROBE_MEMSX | BPF_H: - /* lgh %dst,0(off,%src) */ + bpf_jit_probe_load_pre(jit, insn, &probe); + /* lgh %dst,off(%src) */ EMIT6_DISP_LH(0xe3000000, 0x0015, dst_reg, src_reg, REG_0, off); + err = bpf_jit_probe_post(jit, fp, &probe); + if (err < 0) + return err; jit->seen |= SEEN_MEM; break; case BPF_LDX | BPF_MEM | BPF_W: /* dst = *(u32 *)(ul) (src + off) */ case BPF_LDX | BPF_PROBE_MEM | BPF_W: + case BPF_LDX | BPF_PROBE_MEM32 | BPF_W: + 
bpf_jit_probe_load_pre(jit, insn, &probe); /* llgf %dst,off(%src) */ jit->seen |= SEEN_MEM; - EMIT6_DISP_LH(0xe3000000, 0x0016, dst_reg, src_reg, REG_0, off); + EMIT6_DISP_LH(0xe3000000, 0x0016, dst_reg, src_reg, + probe.arena_reg, off); + err = bpf_jit_probe_post(jit, fp, &probe); + if (err < 0) + return err; if (insn_is_zext(&insn[1])) insn_count = 2; break; case BPF_LDX | BPF_MEMSX | BPF_W: /* dst = *(s32 *)(ul) (src + off) */ case BPF_LDX | BPF_PROBE_MEMSX | BPF_W: + bpf_jit_probe_load_pre(jit, insn, &probe); /* lgf %dst,off(%src) */ jit->seen |= SEEN_MEM; EMIT6_DISP_LH(0xe3000000, 0x0014, dst_reg, src_reg, REG_0, off); + err = bpf_jit_probe_post(jit, fp, &probe); + if (err < 0) + return err; break; case BPF_LDX | BPF_MEM | BPF_DW: /* dst = *(u64 *)(ul) (src + off) */ case BPF_LDX | BPF_PROBE_MEM | BPF_DW: - /* lg %dst,0(off,%src) */ + case BPF_LDX | BPF_PROBE_MEM32 | BPF_DW: + bpf_jit_probe_load_pre(jit, insn, &probe); + /* lg %dst,off(%src,%arena) */ jit->seen |= SEEN_MEM; - EMIT6_DISP_LH(0xe3000000, 0x0004, dst_reg, src_reg, REG_0, off); + EMIT6_DISP_LH(0xe3000000, 0x0004, dst_reg, src_reg, + probe.arena_reg, off); + err = bpf_jit_probe_post(jit, fp, &probe); + if (err < 0) + return err; break; /* * BPF_JMP / CALL @@ -1647,7 +1863,7 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp, /* * Restore registers before calling function */ - save_restore_regs(jit, REGS_RESTORE, stack_depth); + save_restore_regs(jit, REGS_RESTORE, stack_depth, 0); /* * goto *(prog->bpf_func + tail_call_start); @@ -1897,22 +2113,6 @@ branch_oc: return -1; } - if (probe_prg != -1) { - /* - * Handlers of certain exceptions leave psw.addr pointing to - * the instruction directly after the failing one. Therefore, - * create two exception table entries and also add a nop in - * case two probing instructions come directly after each - * other. - */ - nop_prg = jit->prg; - /* bcr 0,%0 */ - _EMIT2(0x0700); - err = bpf_jit_probe_mem(jit, fp, probe_prg, nop_prg); - if (err < 0) - return err; - } - return insn_count; } @@ -1958,12 +2158,18 @@ static int bpf_jit_prog(struct bpf_jit *jit, struct bpf_prog *fp, bool extra_pass, u32 stack_depth) { int i, insn_count, lit32_size, lit64_size; + u64 kern_arena; jit->lit32 = jit->lit32_start; jit->lit64 = jit->lit64_start; jit->prg = 0; jit->excnt = 0; + kern_arena = bpf_arena_get_kern_vm_start(fp->aux->arena); + if (kern_arena) + jit->kern_arena = _EMIT_CONST_U64(kern_arena); + jit->user_arena = bpf_arena_get_user_vm_start(fp->aux->arena); + bpf_jit_prologue(jit, fp, stack_depth); if (bpf_set_addr(jit, 0) < 0) return -1; @@ -2011,9 +2217,25 @@ static struct bpf_binary_header *bpf_jit_alloc(struct bpf_jit *jit, struct bpf_prog *fp) { struct bpf_binary_header *header; + struct bpf_insn *insn; u32 extable_size; u32 code_size; + int i; + + for (i = 0; i < fp->len; i++) { + insn = &fp->insnsi[i]; + if (BPF_CLASS(insn->code) == BPF_STX && + BPF_MODE(insn->code) == BPF_PROBE_ATOMIC && + (BPF_SIZE(insn->code) == BPF_DW || + BPF_SIZE(insn->code) == BPF_W) && + insn->imm == BPF_XCHG) + /* + * bpf_jit_insn() emits a load and a compare-and-swap, + * both of which need to be probed. + */ + fp->aux->num_exentries += 1; + } /* We need two entries per insn. 
*/ fp->aux->num_exentries *= 2; @@ -2689,3 +2911,52 @@ bool bpf_jit_supports_subprog_tailcalls(void) { return true; } + +bool bpf_jit_supports_arena(void) +{ + return true; +} + +bool bpf_jit_supports_insn(struct bpf_insn *insn, bool in_arena) +{ + /* + * Currently the verifier uses this function only to check which + * atomic stores to arena are supported, and they all are. + */ + return true; +} + +bool bpf_jit_supports_exceptions(void) +{ + /* + * Exceptions require unwinding support, which is always available, + * because the kernel is always built with backchain. + */ + return true; +} + +void arch_bpf_stack_walk(bool (*consume_fn)(void *, u64, u64, u64), + void *cookie) +{ + unsigned long addr, prev_addr = 0; + struct unwind_state state; + + unwind_for_each_frame(&state, NULL, NULL, 0) { + addr = unwind_get_return_address(&state); + if (!addr) + break; + /* + * addr is a return address and state.sp is the value of %r15 + * at this address. exception_cb needs %r15 at entry to the + * function containing addr, so take the next state.sp. + * + * There is no bp, and the exception_cb prog does not need one + * to perform a quasi-longjmp. The common code requires a + * non-zero bp, so pass sp there as well. + */ + if (prev_addr && !consume_fn(cookie, prev_addr, state.sp, + state.sp)) + break; + prev_addr = addr; + } +} diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index 5159c7a22922..d25d81c8ecc0 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -1234,13 +1234,11 @@ bool ex_handler_bpf(const struct exception_table_entry *x, struct pt_regs *regs) } static void detect_reg_usage(struct bpf_insn *insn, int insn_cnt, - bool *regs_used, bool *tail_call_seen) + bool *regs_used) { int i; for (i = 1; i <= insn_cnt; i++, insn++) { - if (insn->code == (BPF_JMP | BPF_TAIL_CALL)) - *tail_call_seen = true; if (insn->dst_reg == BPF_REG_6 || insn->src_reg == BPF_REG_6) regs_used[0] = true; if (insn->dst_reg == BPF_REG_7 || insn->src_reg == BPF_REG_7) @@ -1324,7 +1322,6 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image struct bpf_insn *insn = bpf_prog->insnsi; bool callee_regs_used[4] = {}; int insn_cnt = bpf_prog->len; - bool tail_call_seen = false; bool seen_exit = false; u8 temp[BPF_MAX_INSN_SIZE + BPF_INSN_SAFETY]; u64 arena_vm_start, user_vm_start; @@ -1336,11 +1333,7 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image arena_vm_start = bpf_arena_get_kern_vm_start(bpf_prog->aux->arena); user_vm_start = bpf_arena_get_user_vm_start(bpf_prog->aux->arena); - detect_reg_usage(insn, insn_cnt, callee_regs_used, - &tail_call_seen); - - /* tail call's presence in current prog implies it is reachable */ - tail_call_reachable |= tail_call_seen; + detect_reg_usage(insn, insn_cnt, callee_regs_used); emit_prologue(&prog, bpf_prog->aux->stack_depth, bpf_prog_was_classic(bpf_prog), tail_call_reachable, @@ -3363,7 +3356,7 @@ out_image: * * Both cases are serious bugs and justify WARN_ON. */ - if (WARN_ON(bpf_jit_binary_pack_finalize(prog, header, rw_header))) { + if (WARN_ON(bpf_jit_binary_pack_finalize(header, rw_header))) { /* header has been freed */ header = NULL; goto out_image; @@ -3442,7 +3435,7 @@ void bpf_jit_free(struct bpf_prog *prog) * before freeing it. */ if (jit_data) { - bpf_jit_binary_pack_finalize(prog, jit_data->header, + bpf_jit_binary_pack_finalize(jit_data->header, jit_data->rw_header); kvfree(jit_data->addrs); kfree(jit_data); |