diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2024-07-16 19:28:34 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2024-07-16 19:28:34 -0700 |
commit | 51835949dda3783d4639cfa74ce13a3c9829de00 (patch) | |
tree | 2b593de5eba6ecc73f7c58fc65fdaffae45c7323 /kernel/bpf/btf.c | |
parent | 0434dbe32053d07d658165be681505120c6b1abc (diff) | |
parent | 77ae5e5b00720372af2860efdc4bc652ac682696 (diff) |
Merge tag 'net-next-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextHEADmaster
Pull networking updates from Jakub Kicinski:
"Not much excitement - a handful of large patchsets (devmem among them)
did not make it in time.
Core & protocols:
- Use local_lock in addition to local_bh_disable() to protect per-CPU
resources in networking, a step closer for local_bh_disable() not
to act as a big lock on PREEMPT_RT
- Use flex array for netdevice priv area, ensure its cache alignment
- Add a sysctl knob to allow user to specify a default rto_min at
socket init time. Bit of a big hammer but multiple companies were
independently carrying such patch downstream so clearly it's useful
- Support scheduling transmission of packets based on CLOCK_TAI
- Un-pin TCP TIMEWAIT timer to avoid it firing on CPUs later cordoned
off using cpusets
- Support multiple L2TPv3 UDP tunnels using the same 5-tuple address
- Allow configuration of multipath hash seed, to both allow
synchronizing hashing of two routers, and preventing partial
accidental sync
- Improve TCP compliance with RFC 9293 for simultaneous connect()
- Support sending NAT keepalives in IPsec ESP in UDP states.
Userspace IKE daemon had to do this before, but the kernel can
better keep track of it
- Support sending supervision HSR frames with MAC addresses stored in
ProxyNodeTable when RedBox (i.e. HSR-SAN) is enabled
- Introduce IPPROTO_SMC for selecting SMC when socket is created
- Allow UDP GSO transmit from devices with no checksum offload
- openvswitch: add packet sampling via psample, separating the
sampled traffic from "upcall" packets sent to user space for
forwarding
- nf_tables: shrink memory consumption for transaction objects
Things we sprinkled into general kernel code:
- Power Sequencing subsystem (used by Qualcomm Bluetooth driver for
QCA6390) [ Already merged separately - Linus ]
- Add IRQ information in sysfs for auxiliary bus
- Introduce guard definition for local_lock
- Add aligned flavor of __cacheline_group_{begin, end}() markings for
grouping fields in structures
BPF:
- Notify user space (via epoll) when a struct_ops object is getting
detached/unregistered
- Add new kfuncs for a generic, open-coded bits iterator
- Enable BPF programs to declare arrays of kptr, bpf_rb_root, and
bpf_list_head
- Support resilient split BTF which cuts down on duplication and
makes BTF as compact as possible WRT BTF from modules
- Add support for dumping kfunc prototypes from BTF which enables
both detecting as well as dumping compilable prototypes for kfuncs
- riscv64 BPF JIT improvements in particular to add 12-argument
support for BPF trampolines and to utilize bpf_prog_pack for the
latter
- Add the capability to offload the netfilter flowtable in XDP layer
through kfuncs
Driver API:
- Allow users to configure IRQ tresholds between which automatic IRQ
moderation can choose
- Expand Power Sourcing (PoE) status with power, class and failure
reason. Support setting power limits
- Track additional RSS contexts in the core, make sure configuration
changes don't break them
- Support IPsec crypto offload for IPv6 ESP and IPv4 UDP-encapsulated
ESP data paths
- Support updating firmware on SFP modules
Tests and tooling:
- mptcp: use net/lib.sh to manage netns
- TCP-AO and TCP-MD5: replace debug prints used by tests with
tracepoints
- openvswitch: make test self-contained (don't depend on OvS CLI
tools)
Drivers:
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- increase the max total outstanding PTP TX packets to 4
- add timestamping statistics support
- implement netdev_queue_mgmt_ops
- support new RSS context API
- Intel (100G, ice, idpf):
- implement FEC statistics and dumping signal quality indicators
- support E825C products (with 56Gbps PHYs)
- nVidia/Mellanox:
- support HW-GRO
- mlx4/mlx5: support per-queue statistics via netlink
- obey the max number of EQs setting in sub-functions
- AMD/Solarflare:
- support new RSS context API
- AMD/Pensando:
- ionic: rework fix for doorbell miss to lower overhead and
skip it on new HW
- Wangxun:
- txgbe: support Flow Director perfect filters
- Ethernet NICs consumer, embedded and virtual:
- Add driver for Tehuti Networks TN40xx chips
- Add driver for Meta's internal NIC chips
- Add driver for Ethernet MAC on Airoha EN7581 SoCs
- Add driver for Renesas Ethernet-TSN devices
- Google cloud vNIC:
- flow steering support
- Microsoft vNIC:
- support page sizes other than 4KB on ARM64
- vmware vNIC:
- support latency measurement (update to version 9)
- VirtIO net:
- support for Byte Queue Limits
- support configuring thresholds for automatic IRQ moderation
- support for AF_XDP Rx zero-copy
- Synopsys (stmmac):
- support for STM32MP13 SoC
- let platforms select the right PCS implementation
- TI:
- icssg-prueth: add multicast filtering support
- icssg-prueth: enable PTP timestamping and PPS
- Renesas:
- ravb: improve Rx performance 30-400% by using page pool,
theaded NAPI and timer-based IRQ coalescing
- ravb: add MII support for R-Car V4M
- Cadence (macb):
- macb: add ARP support to Wake-On-LAN
- Cortina:
- use phylib for RX and TX pause configuration
- Ethernet switches:
- nVidia/Mellanox:
- support configuration of multipath hash seed
- report more accurate max MTU
- use page_pool to improve Rx performance
- MediaTek:
- mt7530: add support for bridge port isolation
- Qualcomm:
- qca8k: add support for bridge port isolation
- Microchip:
- lan9371/2: add 100BaseTX PHY support
- NXP:
- vsc73xx: implement VLAN operations
- Ethernet PHYs:
- aquantia: enable support for aqr115c
- aquantia: add support for PHY LEDs
- realtek: add support for rtl8224 2.5Gbps PHY
- xpcs: add memory-mapped device support
- add BroadR-Reach link mode and support in Broadcom's PHY driver
- CAN:
- add document for ISO 15765-2 protocol support
- mcp251xfd: workaround for erratum DS80000789E, use timestamps to
catch when device returns incorrect FIFO status
- WiFi:
- mac80211/cfg80211:
- parse Transmit Power Envelope (TPE) data in mac80211 instead
of in drivers
- improvements for 6 GHz regulatory flexibility
- multi-link improvements
- support multiple radios per wiphy
- remove DEAUTH_NEED_MGD_TX_PREP flag
- Intel (iwlwifi):
- bump FW API to 91 for BZ/SC devices
- report 64-bit radiotap timestamp
- enable P2P low latency by default
- handle Transmit Power Envelope (TPE) advertised by AP
- remove support for older FW for new devices
- fast resume (keeping the device configured)
- mvm: re-enable Multi-Link Operation (MLO)
- aggregation (A-MSDU) optimizations
- MediaTek (mt76):
- mt7925 Multi-Link Operation (MLO) support
- Qualcomm (ath10k):
- LED support for various chipsets
- Qualcomm (ath12k):
- remove unsupported Tx monitor handling
- support channel 2 in 6 GHz band
- support Spatial Multiplexing Power Save (SMPS) in 6 GHz band
- supprt multiple BSSID (MBSSID) and Enhanced Multi-BSSID
Advertisements (EMA)
- support dynamic VLAN
- add panic handler for resetting the firmware state
- DebugFS support for datapath statistics
- WCN7850: support for Wake on WLAN
- Microchip (wilc1000):
- read MAC address during probe to make it visible to user space
- suspend/resume improvements
- TI (wl18xx):
- support newer firmware versions
- RealTek (rtw89):
- preparation for RTL8852BE-VT support
- Wake on WLAN support for WiFi 6 chips
- 36-bit PCI DMA support
- RealTek (rtlwifi):
- RTL8192DU support
- Broadcom (brcmfmac):
- Management Frame Protection support (to enable WPA3)
- Bluetooth:
- qualcomm: use the power sequencer for QCA6390
- btusb: mediatek: add ISO data transmission functions
- hci_bcm4377: add BCM4388 support
- btintel: add support for BlazarU core
- btintel: add support for Whale Peak2
- btnxpuart: add support for AW693 A1 chipset
- btnxpuart: add support for IW615 chipset
- btusb: add Realtek RTL8852BE support ID 0x13d3:0x3591"
* tag 'net-next-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1589 commits)
eth: fbnic: Fix spelling mistake "tiggerring" -> "triggering"
tcp: Replace strncpy() with strscpy()
wifi: ath12k: fix build vs old compiler
tcp: Don't access uninit tcp_rsk(req)->ao_keyid in tcp_create_openreq_child().
eth: fbnic: Write the TCAM tables used for RSS control and Rx to host
eth: fbnic: Add L2 address programming
eth: fbnic: Add basic Rx handling
eth: fbnic: Add basic Tx handling
eth: fbnic: Add link detection
eth: fbnic: Add initial messaging to notify FW of our presence
eth: fbnic: Implement Rx queue alloc/start/stop/free
eth: fbnic: Implement Tx queue alloc/start/stop/free
eth: fbnic: Allocate a netdevice and napi vectors with queues
eth: fbnic: Add FW communication mechanism
eth: fbnic: Add message parsing for FW messages
eth: fbnic: Add register init to set PCIe/Ethernet device config
eth: fbnic: Allocate core device specific structures and devlink interface
eth: fbnic: Add scaffolding for Meta's NIC driver
PCI: Add Meta Platforms vendor ID
net/sched: cls_flower: propagate tca[TCA_OPTIONS] to NL_REQ_ATTR_CHECK
...
Diffstat (limited to 'kernel/bpf/btf.c')
-rw-r--r-- | kernel/bpf/btf.c | 509 |
1 files changed, 341 insertions, 168 deletions
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 821063660d9f..520f49f422fe 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -274,6 +274,7 @@ struct btf { u32 start_str_off; /* first string offset (0 for base BTF) */ char name[MODULE_NAME_LEN]; bool kernel_btf; + __u32 *base_id_map; /* map from distilled base BTF -> vmlinux BTF ids */ }; enum verifier_phase { @@ -414,7 +415,7 @@ const char *btf_type_str(const struct btf_type *t) struct btf_show { u64 flags; void *target; /* target of show operation (seq file, buffer) */ - void (*showfn)(struct btf_show *show, const char *fmt, va_list args); + __printf(2, 0) void (*showfn)(struct btf_show *show, const char *fmt, va_list args); const struct btf *btf; /* below are used during iteration */ struct { @@ -530,6 +531,11 @@ static bool btf_type_is_decl_tag_target(const struct btf_type *t) btf_type_is_var(t) || btf_type_is_typedef(t); } +bool btf_is_vmlinux(const struct btf *btf) +{ + return btf->kernel_btf && !btf->base_btf; +} + u32 btf_nr_types(const struct btf *btf) { u32 total = 0; @@ -772,7 +778,7 @@ static bool __btf_name_char_ok(char c, bool first) return true; } -static const char *btf_str_by_offset(const struct btf *btf, u32 offset) +const char *btf_str_by_offset(const struct btf *btf, u32 offset) { while (offset < btf->start_str_off) btf = btf->base_btf; @@ -1670,14 +1676,8 @@ static void btf_free_kfunc_set_tab(struct btf *btf) if (!tab) return; - /* For module BTF, we directly assign the sets being registered, so - * there is nothing to free except kfunc_set_tab. - */ - if (btf_is_module(btf)) - goto free_tab; for (hook = 0; hook < ARRAY_SIZE(tab->sets); hook++) kfree(tab->sets[hook]); -free_tab: kfree(tab); btf->kfunc_set_tab = NULL; } @@ -1735,7 +1735,12 @@ static void btf_free(struct btf *btf) kvfree(btf->types); kvfree(btf->resolved_sizes); kvfree(btf->resolved_ids); - kvfree(btf->data); + /* vmlinux does not allocate btf->data, it simply points it at + * __start_BTF. + */ + if (!btf_is_vmlinux(btf)) + kvfree(btf->data); + kvfree(btf->base_id_map); kfree(btf); } @@ -1764,6 +1769,23 @@ void btf_put(struct btf *btf) } } +struct btf *btf_base_btf(const struct btf *btf) +{ + return btf->base_btf; +} + +const struct btf_header *btf_header(const struct btf *btf) +{ + return &btf->hdr; +} + +void btf_set_base_btf(struct btf *btf, const struct btf *base_btf) +{ + btf->base_btf = (struct btf *)base_btf; + btf->start_id = btf_nr_types(base_btf); + btf->start_str_off = base_btf->hdr.str_len; +} + static int env_resolve_init(struct btf_verifier_env *env) { struct btf *btf = env->btf; @@ -3442,10 +3464,12 @@ btf_find_graph_root(const struct btf *btf, const struct btf_type *pt, goto end; \ } -static int btf_get_field_type(const char *name, u32 field_mask, u32 *seen_mask, +static int btf_get_field_type(const struct btf *btf, const struct btf_type *var_type, + u32 field_mask, u32 *seen_mask, int *align, int *sz) { int type = 0; + const char *name = __btf_name_by_offset(btf, var_type->name_off); if (field_mask & BPF_SPIN_LOCK) { if (!strcmp(name, "bpf_spin_lock")) { @@ -3481,7 +3505,7 @@ static int btf_get_field_type(const char *name, u32 field_mask, u32 *seen_mask, field_mask_test_name(BPF_REFCOUNT, "bpf_refcount"); /* Only return BPF_KPTR when all other types with matchable names fail */ - if (field_mask & BPF_KPTR) { + if (field_mask & BPF_KPTR && !__btf_type_is_struct(var_type)) { type = BPF_KPTR_REF; goto end; } @@ -3494,140 +3518,232 @@ end: #undef field_mask_test_name +/* Repeat a number of fields for a specified number of times. + * + * Copy the fields starting from the first field and repeat them for + * repeat_cnt times. The fields are repeated by adding the offset of each + * field with + * (i + 1) * elem_size + * where i is the repeat index and elem_size is the size of an element. + */ +static int btf_repeat_fields(struct btf_field_info *info, + u32 field_cnt, u32 repeat_cnt, u32 elem_size) +{ + u32 i, j; + u32 cur; + + /* Ensure not repeating fields that should not be repeated. */ + for (i = 0; i < field_cnt; i++) { + switch (info[i].type) { + case BPF_KPTR_UNREF: + case BPF_KPTR_REF: + case BPF_KPTR_PERCPU: + case BPF_LIST_HEAD: + case BPF_RB_ROOT: + break; + default: + return -EINVAL; + } + } + + cur = field_cnt; + for (i = 0; i < repeat_cnt; i++) { + memcpy(&info[cur], &info[0], field_cnt * sizeof(info[0])); + for (j = 0; j < field_cnt; j++) + info[cur++].off += (i + 1) * elem_size; + } + + return 0; +} + static int btf_find_struct_field(const struct btf *btf, const struct btf_type *t, u32 field_mask, - struct btf_field_info *info, int info_cnt) + struct btf_field_info *info, int info_cnt, + u32 level); + +/* Find special fields in the struct type of a field. + * + * This function is used to find fields of special types that is not a + * global variable or a direct field of a struct type. It also handles the + * repetition if it is the element type of an array. + */ +static int btf_find_nested_struct(const struct btf *btf, const struct btf_type *t, + u32 off, u32 nelems, + u32 field_mask, struct btf_field_info *info, + int info_cnt, u32 level) { - int ret, idx = 0, align, sz, field_type; - const struct btf_member *member; + int ret, err, i; + + level++; + if (level >= MAX_RESOLVE_DEPTH) + return -E2BIG; + + ret = btf_find_struct_field(btf, t, field_mask, info, info_cnt, level); + + if (ret <= 0) + return ret; + + /* Shift the offsets of the nested struct fields to the offsets + * related to the container. + */ + for (i = 0; i < ret; i++) + info[i].off += off; + + if (nelems > 1) { + err = btf_repeat_fields(info, ret, nelems - 1, t->size); + if (err == 0) + ret *= nelems; + else + ret = err; + } + + return ret; +} + +static int btf_find_field_one(const struct btf *btf, + const struct btf_type *var, + const struct btf_type *var_type, + int var_idx, + u32 off, u32 expected_size, + u32 field_mask, u32 *seen_mask, + struct btf_field_info *info, int info_cnt, + u32 level) +{ + int ret, align, sz, field_type; struct btf_field_info tmp; + const struct btf_array *array; + u32 i, nelems = 1; + + /* Walk into array types to find the element type and the number of + * elements in the (flattened) array. + */ + for (i = 0; i < MAX_RESOLVE_DEPTH && btf_type_is_array(var_type); i++) { + array = btf_array(var_type); + nelems *= array->nelems; + var_type = btf_type_by_id(btf, array->type); + } + if (i == MAX_RESOLVE_DEPTH) + return -E2BIG; + if (nelems == 0) + return 0; + + field_type = btf_get_field_type(btf, var_type, + field_mask, seen_mask, &align, &sz); + /* Look into variables of struct types */ + if (!field_type && __btf_type_is_struct(var_type)) { + sz = var_type->size; + if (expected_size && expected_size != sz * nelems) + return 0; + ret = btf_find_nested_struct(btf, var_type, off, nelems, field_mask, + &info[0], info_cnt, level); + return ret; + } + + if (field_type == 0) + return 0; + if (field_type < 0) + return field_type; + + if (expected_size && expected_size != sz * nelems) + return 0; + if (off % align) + return 0; + + switch (field_type) { + case BPF_SPIN_LOCK: + case BPF_TIMER: + case BPF_WORKQUEUE: + case BPF_LIST_NODE: + case BPF_RB_NODE: + case BPF_REFCOUNT: + ret = btf_find_struct(btf, var_type, off, sz, field_type, + info_cnt ? &info[0] : &tmp); + if (ret < 0) + return ret; + break; + case BPF_KPTR_UNREF: + case BPF_KPTR_REF: + case BPF_KPTR_PERCPU: + ret = btf_find_kptr(btf, var_type, off, sz, + info_cnt ? &info[0] : &tmp); + if (ret < 0) + return ret; + break; + case BPF_LIST_HEAD: + case BPF_RB_ROOT: + ret = btf_find_graph_root(btf, var, var_type, + var_idx, off, sz, + info_cnt ? &info[0] : &tmp, + field_type); + if (ret < 0) + return ret; + break; + default: + return -EFAULT; + } + + if (ret == BTF_FIELD_IGNORE) + return 0; + if (nelems > info_cnt) + return -E2BIG; + if (nelems > 1) { + ret = btf_repeat_fields(info, 1, nelems - 1, sz); + if (ret < 0) + return ret; + } + return nelems; +} + +static int btf_find_struct_field(const struct btf *btf, + const struct btf_type *t, u32 field_mask, + struct btf_field_info *info, int info_cnt, + u32 level) +{ + int ret, idx = 0; + const struct btf_member *member; u32 i, off, seen_mask = 0; for_each_member(i, t, member) { const struct btf_type *member_type = btf_type_by_id(btf, member->type); - field_type = btf_get_field_type(__btf_name_by_offset(btf, member_type->name_off), - field_mask, &seen_mask, &align, &sz); - if (field_type == 0) - continue; - if (field_type < 0) - return field_type; - off = __btf_member_bit_offset(t, member); if (off % 8) /* valid C code cannot generate such BTF */ return -EINVAL; off /= 8; - if (off % align) - continue; - switch (field_type) { - case BPF_SPIN_LOCK: - case BPF_TIMER: - case BPF_WORKQUEUE: - case BPF_LIST_NODE: - case BPF_RB_NODE: - case BPF_REFCOUNT: - ret = btf_find_struct(btf, member_type, off, sz, field_type, - idx < info_cnt ? &info[idx] : &tmp); - if (ret < 0) - return ret; - break; - case BPF_KPTR_UNREF: - case BPF_KPTR_REF: - case BPF_KPTR_PERCPU: - ret = btf_find_kptr(btf, member_type, off, sz, - idx < info_cnt ? &info[idx] : &tmp); - if (ret < 0) - return ret; - break; - case BPF_LIST_HEAD: - case BPF_RB_ROOT: - ret = btf_find_graph_root(btf, t, member_type, - i, off, sz, - idx < info_cnt ? &info[idx] : &tmp, - field_type); - if (ret < 0) - return ret; - break; - default: - return -EFAULT; - } - - if (ret == BTF_FIELD_IGNORE) - continue; - if (idx >= info_cnt) - return -E2BIG; - ++idx; + ret = btf_find_field_one(btf, t, member_type, i, + off, 0, + field_mask, &seen_mask, + &info[idx], info_cnt - idx, level); + if (ret < 0) + return ret; + idx += ret; } return idx; } static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t, u32 field_mask, struct btf_field_info *info, - int info_cnt) + int info_cnt, u32 level) { - int ret, idx = 0, align, sz, field_type; + int ret, idx = 0; const struct btf_var_secinfo *vsi; - struct btf_field_info tmp; u32 i, off, seen_mask = 0; for_each_vsi(i, t, vsi) { const struct btf_type *var = btf_type_by_id(btf, vsi->type); const struct btf_type *var_type = btf_type_by_id(btf, var->type); - field_type = btf_get_field_type(__btf_name_by_offset(btf, var_type->name_off), - field_mask, &seen_mask, &align, &sz); - if (field_type == 0) - continue; - if (field_type < 0) - return field_type; - off = vsi->offset; - if (vsi->size != sz) - continue; - if (off % align) - continue; - - switch (field_type) { - case BPF_SPIN_LOCK: - case BPF_TIMER: - case BPF_WORKQUEUE: - case BPF_LIST_NODE: - case BPF_RB_NODE: - case BPF_REFCOUNT: - ret = btf_find_struct(btf, var_type, off, sz, field_type, - idx < info_cnt ? &info[idx] : &tmp); - if (ret < 0) - return ret; - break; - case BPF_KPTR_UNREF: - case BPF_KPTR_REF: - case BPF_KPTR_PERCPU: - ret = btf_find_kptr(btf, var_type, off, sz, - idx < info_cnt ? &info[idx] : &tmp); - if (ret < 0) - return ret; - break; - case BPF_LIST_HEAD: - case BPF_RB_ROOT: - ret = btf_find_graph_root(btf, var, var_type, - -1, off, sz, - idx < info_cnt ? &info[idx] : &tmp, - field_type); - if (ret < 0) - return ret; - break; - default: - return -EFAULT; - } - - if (ret == BTF_FIELD_IGNORE) - continue; - if (idx >= info_cnt) - return -E2BIG; - ++idx; + ret = btf_find_field_one(btf, var, var_type, -1, off, vsi->size, + field_mask, &seen_mask, + &info[idx], info_cnt - idx, + level); + if (ret < 0) + return ret; + idx += ret; } return idx; } @@ -3637,9 +3753,9 @@ static int btf_find_field(const struct btf *btf, const struct btf_type *t, int info_cnt) { if (__btf_type_is_struct(t)) - return btf_find_struct_field(btf, t, field_mask, info, info_cnt); + return btf_find_struct_field(btf, t, field_mask, info, info_cnt, 0); else if (btf_type_is_datasec(t)) - return btf_find_datasec_var(btf, t, field_mask, info, info_cnt); + return btf_find_datasec_var(btf, t, field_mask, info, info_cnt, 0); return -EINVAL; } @@ -5726,6 +5842,15 @@ static int find_kern_ctx_type_id(enum bpf_prog_type prog_type) return ctx_type->type; } +bool btf_is_projection_of(const char *pname, const char *tname) +{ + if (strcmp(pname, "__sk_buff") == 0 && strcmp(tname, "sk_buff") == 0) + return true; + if (strcmp(pname, "xdp_md") == 0 && strcmp(tname, "xdp_buff") == 0) + return true; + return false; +} + bool btf_is_prog_ctx_type(struct bpf_verifier_log *log, const struct btf *btf, const struct btf_type *t, enum bpf_prog_type prog_type, int arg) @@ -5788,9 +5913,7 @@ again: * int socket_filter_bpf_prog(struct __sk_buff *skb) * { // no fields of skb are ever used } */ - if (strcmp(ctx_tname, "__sk_buff") == 0 && strcmp(tname, "sk_buff") == 0) - return true; - if (strcmp(ctx_tname, "xdp_md") == 0 && strcmp(tname, "xdp_buff") == 0) + if (btf_is_projection_of(ctx_tname, tname)) return true; if (strcmp(ctx_tname, tname)) { /* bpf_user_pt_regs_t is a typedef, so resolve it to @@ -5982,23 +6105,15 @@ int get_kern_ctx_btf_id(struct bpf_verifier_log *log, enum bpf_prog_type prog_ty BTF_ID_LIST(bpf_ctx_convert_btf_id) BTF_ID(struct, bpf_ctx_convert) -struct btf *btf_parse_vmlinux(void) +static struct btf *btf_parse_base(struct btf_verifier_env *env, const char *name, + void *data, unsigned int data_size) { - struct btf_verifier_env *env = NULL; - struct bpf_verifier_log *log; struct btf *btf = NULL; int err; if (!IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) return ERR_PTR(-ENOENT); - env = kzalloc(sizeof(*env), GFP_KERNEL | __GFP_NOWARN); - if (!env) - return ERR_PTR(-ENOMEM); - - log = &env->log; - log->level = BPF_LOG_KERNEL; - btf = kzalloc(sizeof(*btf), GFP_KERNEL | __GFP_NOWARN); if (!btf) { err = -ENOMEM; @@ -6006,10 +6121,10 @@ struct btf *btf_parse_vmlinux(void) } env->btf = btf; - btf->data = __start_BTF; - btf->data_size = __stop_BTF - __start_BTF; + btf->data = data; + btf->data_size = data_size; btf->kernel_btf = true; - snprintf(btf->name, sizeof(btf->name), "vmlinux"); + snprintf(btf->name, sizeof(btf->name), "%s", name); err = btf_parse_hdr(env); if (err) @@ -6029,20 +6144,11 @@ struct btf *btf_parse_vmlinux(void) if (err) goto errout; - /* btf_parse_vmlinux() runs under bpf_verifier_lock */ - bpf_ctx_convert.t = btf_type_by_id(btf, bpf_ctx_convert_btf_id[0]); - refcount_set(&btf->refcnt, 1); - err = btf_alloc_id(btf); - if (err) - goto errout; - - btf_verifier_env_free(env); return btf; errout: - btf_verifier_env_free(env); if (btf) { kvfree(btf->types); kfree(btf); @@ -6050,19 +6156,61 @@ errout: return ERR_PTR(err); } +struct btf *btf_parse_vmlinux(void) +{ + struct btf_verifier_env *env = NULL; + struct bpf_verifier_log *log; + struct btf *btf; + int err; + + env = kzalloc(sizeof(*env), GFP_KERNEL | __GFP_NOWARN); + if (!env) + return ERR_PTR(-ENOMEM); + + log = &env->log; + log->level = BPF_LOG_KERNEL; + btf = btf_parse_base(env, "vmlinux", __start_BTF, __stop_BTF - __start_BTF); + if (IS_ERR(btf)) + goto err_out; + + /* btf_parse_vmlinux() runs under bpf_verifier_lock */ + bpf_ctx_convert.t = btf_type_by_id(btf, bpf_ctx_convert_btf_id[0]); + err = btf_alloc_id(btf); + if (err) { + btf_free(btf); + btf = ERR_PTR(err); + } +err_out: + btf_verifier_env_free(env); + return btf; +} + +/* If .BTF_ids section was created with distilled base BTF, both base and + * split BTF ids will need to be mapped to actual base/split ids for + * BTF now that it has been relocated. + */ +static __u32 btf_relocate_id(const struct btf *btf, __u32 id) +{ + if (!btf->base_btf || !btf->base_id_map) + return id; + return btf->base_id_map[id]; +} + #ifdef CONFIG_DEBUG_INFO_BTF_MODULES -static struct btf *btf_parse_module(const char *module_name, const void *data, unsigned int data_size) +static struct btf *btf_parse_module(const char *module_name, const void *data, + unsigned int data_size, void *base_data, + unsigned int base_data_size) { + struct btf *btf = NULL, *vmlinux_btf, *base_btf = NULL; struct btf_verifier_env *env = NULL; struct bpf_verifier_log *log; - struct btf *btf = NULL, *base_btf; - int err; + int err = 0; - base_btf = bpf_get_btf_vmlinux(); - if (IS_ERR(base_btf)) - return base_btf; - if (!base_btf) + vmlinux_btf = bpf_get_btf_vmlinux(); + if (IS_ERR(vmlinux_btf)) + return vmlinux_btf; + if (!vmlinux_btf) return ERR_PTR(-EINVAL); env = kzalloc(sizeof(*env), GFP_KERNEL | __GFP_NOWARN); @@ -6072,6 +6220,16 @@ static struct btf *btf_parse_module(const char *module_name, const void *data, u log = &env->log; log->level = BPF_LOG_KERNEL; + if (base_data) { + base_btf = btf_parse_base(env, ".BTF.base", base_data, base_data_size); + if (IS_ERR(base_btf)) { + err = PTR_ERR(base_btf); + goto errout; + } + } else { + base_btf = vmlinux_btf; + } + btf = kzalloc(sizeof(*btf), GFP_KERNEL | __GFP_NOWARN); if (!btf) { err = -ENOMEM; @@ -6111,12 +6269,22 @@ static struct btf *btf_parse_module(const char *module_name, const void *data, u if (err) goto errout; + if (base_btf != vmlinux_btf) { + err = btf_relocate(btf, vmlinux_btf, &btf->base_id_map); + if (err) + goto errout; + btf_free(base_btf); + base_btf = vmlinux_btf; + } + btf_verifier_env_free(env); refcount_set(&btf->refcnt, 1); return btf; errout: btf_verifier_env_free(env); + if (base_btf != vmlinux_btf) + btf_free(base_btf); if (btf) { kvfree(btf->data); kvfree(btf->types); @@ -6693,7 +6861,7 @@ int btf_struct_access(struct bpf_verifier_log *log, for (i = 0; i < rec->cnt; i++) { struct btf_field *field = &rec->fields[i]; u32 offset = field->offset; - if (off < offset + btf_field_type_size(field->type) && offset < off + size) { + if (off < offset + field->size && offset < off + size) { bpf_log(log, "direct access to %s is disallowed\n", btf_field_type_name(field->type)); @@ -7370,8 +7538,8 @@ static void btf_type_show(const struct btf *btf, u32 type_id, void *obj, btf_type_ops(t)->show(btf, t, type_id, obj, 0, show); } -static void btf_seq_show(struct btf_show *show, const char *fmt, - va_list args) +__printf(2, 0) static void btf_seq_show(struct btf_show *show, const char *fmt, + va_list args) { seq_vprintf((struct seq_file *)show->target, fmt, args); } @@ -7404,8 +7572,8 @@ struct btf_show_snprintf { int len; /* length we would have written */ }; -static void btf_snprintf_show(struct btf_show *show, const char *fmt, - va_list args) +__printf(2, 0) static void btf_snprintf_show(struct btf_show *show, const char *fmt, + va_list args) { struct btf_show_snprintf *ssnprintf = (struct btf_show_snprintf *)show; int len; @@ -7669,7 +7837,8 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op, err = -ENOMEM; goto out; } - btf = btf_parse_module(mod->name, mod->btf_data, mod->btf_data_size); + btf = btf_parse_module(mod->name, mod->btf_data, mod->btf_data_size, + mod->btf_base_data, mod->btf_base_data_size); if (IS_ERR(btf)) { kfree(btf_mod); if (!IS_ENABLED(CONFIG_MODULE_ALLOW_BTF_MISMATCH)) { @@ -7993,7 +8162,7 @@ static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook, bool add_filter = !!kset->filter; struct btf_kfunc_set_tab *tab; struct btf_id_set8 *set; - u32 set_cnt; + u32 set_cnt, i; int ret; if (hook >= BTF_KFUNC_HOOK_MAX) { @@ -8039,21 +8208,15 @@ static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook, goto end; } - /* We don't need to allocate, concatenate, and sort module sets, because - * only one is allowed per hook. Hence, we can directly assign the - * pointer and return. - */ - if (!vmlinux_set) { - tab->sets[hook] = add_set; - goto do_add_filter; - } - /* In case of vmlinux sets, there may be more than one set being * registered per hook. To create a unified set, we allocate a new set * and concatenate all individual sets being registered. While each set * is individually sorted, they may become unsorted when concatenated, * hence re-sorting the final set again is required to make binary * searching the set using btf_id_set8_contains function work. + * + * For module sets, we need to allocate as we may need to relocate + * BTF ids. */ set_cnt = set ? set->cnt : 0; @@ -8083,11 +8246,14 @@ static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook, /* Concatenate the two sets */ memcpy(set->pairs + set->cnt, add_set->pairs, add_set->cnt * sizeof(set->pairs[0])); + /* Now that the set is copied, update with relocated BTF ids */ + for (i = set->cnt; i < set->cnt + add_set->cnt; i++) + set->pairs[i].id = btf_relocate_id(btf, set->pairs[i].id); + set->cnt += add_set->cnt; sort(set->pairs, set->cnt, sizeof(set->pairs[0]), btf_id_cmp_func, NULL); -do_add_filter: if (add_filter) { hook_filter = &tab->hook_filters[hook]; hook_filter->filters[hook_filter->nr_filters++] = kset->filter; @@ -8207,7 +8373,7 @@ static int __register_btf_kfunc_id_set(enum btf_kfunc_hook hook, return PTR_ERR(btf); for (i = 0; i < kset->set->cnt; i++) { - ret = btf_check_kfunc_protos(btf, kset->set->pairs[i].id, + ret = btf_check_kfunc_protos(btf, btf_relocate_id(btf, kset->set->pairs[i].id), kset->set->pairs[i].flags); if (ret) goto err_out; @@ -8271,7 +8437,7 @@ static int btf_check_dtor_kfuncs(struct btf *btf, const struct btf_id_dtor_kfunc u32 nr_args, i; for (i = 0; i < cnt; i++) { - dtor_btf_id = dtors[i].kfunc_btf_id; + dtor_btf_id = btf_relocate_id(btf, dtors[i].kfunc_btf_id); dtor_func = btf_type_by_id(btf, dtor_btf_id); if (!dtor_func || !btf_type_is_func(dtor_func)) @@ -8306,7 +8472,7 @@ int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors, u32 add_c { struct btf_id_dtor_kfunc_tab *tab; struct btf *btf; - u32 tab_cnt; + u32 tab_cnt, i; int ret; btf = btf_get_module_btf(owner); @@ -8357,6 +8523,13 @@ int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors, u32 add_c btf->dtor_kfunc_tab = tab; memcpy(tab->dtors + tab->cnt, dtors, add_cnt * sizeof(tab->dtors[0])); + + /* remap BTF ids based on BTF relocation (if any) */ + for (i = tab_cnt; i < tab_cnt + add_cnt; i++) { + tab->dtors[i].btf_id = btf_relocate_id(btf, tab->dtors[i].btf_id); + tab->dtors[i].kfunc_btf_id = btf_relocate_id(btf, tab->dtors[i].kfunc_btf_id); + } + tab->cnt += add_cnt; sort(tab->dtors, tab->cnt, sizeof(tab->dtors[0]), btf_id_cmp_func, NULL); |