diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2023-04-26 16:07:23 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2023-04-26 16:07:23 -0700 |
commit | 6e98b09da931a00bf4e0477d0fa52748bf28fcce (patch) | |
tree | 9c658ed95add5693f42f29f63df80a2ede3f6ec2 /net | |
parent | b68ee1c6131c540a62ecd443be89c406401df091 (diff) | |
parent | 9b78d919632b7149d311aaad5a977e4b48b10321 (diff) |
Merge tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"Core:
- Introduce a config option to tweak MAX_SKB_FRAGS. Increasing the
default value allows for better BIG TCP performances
- Reduce compound page head access for zero-copy data transfers
- RPS/RFS improvements, avoiding unneeded NET_RX_SOFTIRQ when
possible
- Threaded NAPI improvements, adding defer skb free support and
unneeded softirq avoidance
- Address dst_entry reference count scalability issues, via false
sharing avoidance and optimize refcount tracking
- Add lockless accesses annotation to sk_err[_soft]
- Optimize again the skb struct layout
- Extends the skb drop reasons to make it usable by multiple
subsystems
- Better const qualifier awareness for socket casts
BPF:
- Add skb and XDP typed dynptrs which allow BPF programs for more
ergonomic and less brittle iteration through data and
variable-sized accesses
- Add a new BPF netfilter program type and minimal support to hook
BPF programs to netfilter hooks such as prerouting or forward
- Add more precise memory usage reporting for all BPF map types
- Adds support for using {FOU,GUE} encap with an ipip device
operating in collect_md mode and add a set of BPF kfuncs for
controlling encap params
- Allow BPF programs to detect at load time whether a particular
kfunc exists or not, and also add support for this in light
skeleton
- Bigger batch of BPF verifier improvements to prepare for upcoming
BPF open-coded iterators allowing for less restrictive looping
capabilities
- Rework RCU enforcement in the verifier, add kptr_rcu and enforce
BPF programs to NULL-check before passing such pointers into kfunc
- Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and
in local storage maps
- Enable RCU semantics for task BPF kptrs and allow referenced kptr
tasks to be stored in BPF maps
- Add support for refcounted local kptrs to the verifier for allowing
shared ownership, useful for adding a node to both the BPF list and
rbtree
- Add BPF verifier support for ST instructions in
convert_ctx_access() which will help new -mcpu=v4 clang flag to
start emitting them
- Add ARM32 USDT support to libbpf
- Improve bpftool's visual program dump which produces the control
flow graph in a DOT format by adding C source inline annotations
Protocols:
- IPv4: Allow adding to IPv4 address a 'protocol' tag. Such value
indicates the provenance of the IP address
- IPv6: optimize route lookup, dropping unneeded R/W lock acquisition
- Add the handshake upcall mechanism, allowing the user-space to
implement generic TLS handshake on kernel's behalf
- Bridge: support per-{Port, VLAN} neighbor suppression, increasing
resilience to nodes failures
- SCTP: add support for Fair Capacity and Weighted Fair Queueing
schedulers
- MPTCP: delay first subflow allocation up to its first usage. This
will allow for later better LSM interaction
- xfrm: Remove inner/outer modes from input/output path. These are
not needed anymore
- WiFi:
- reduced neighbor report (RNR) handling for AP mode
- HW timestamping support
- support for randomized auth/deauth TA for PASN privacy
- per-link debugfs for multi-link
- TC offload support for mac80211 drivers
- mac80211 mesh fast-xmit and fast-rx support
- enable Wi-Fi 7 (EHT) mesh support
Netfilter:
- Add nf_tables 'brouting' support, to force a packet to be routed
instead of being bridged
- Update bridge netfilter and ovs conntrack helpers to handle IPv6
Jumbo packets properly, i.e. fetch the packet length from
hop-by-hop extension header. This is needed for BIT TCP support
- The iptables 32bit compat interface isn't compiled in by default
anymore
- Move ip(6)tables builtin icmp matches to the udptcp one. This has
the advantage that icmp/icmpv6 match doesn't load the
iptables/ip6tables modules anymore when iptables-nft is used
- Extended netlink error report for netdevice in flowtables and
netdev/chains. Allow for incrementally add/delete devices to netdev
basechain. Allow to create netdev chain without device
Driver API:
- Remove redundant Device Control Error Reporting Enable, as PCI core
has already error reporting enabled at enumeration time
- Move Multicast DB netlink handlers to core, allowing devices other
then bridge to use them
- Allow the page_pool to directly recycle the pages from safely
localized NAPI
- Implement lockless TX queue stop/wake combo macros, allowing for
further code de-duplication and sanitization
- Add YNL support for user headers and struct attrs
- Add partial YNL specification for devlink
- Add partial YNL specification for ethtool
- Add tc-mqprio and tc-taprio support for preemptible traffic classes
- Add tx push buf len param to ethtool, specifies the maximum number
of bytes of a transmitted packet a driver can push directly to the
underlying device
- Add basic LED support for switch/phy
- Add NAPI documentation, stop relaying on external links
- Convert dsa_master_ioctl() to netdev notifier. This is a
preparatory work to make the hardware timestamping layer selectable
by user space
- Add transceiver support and improve the error messages for CAN-FD
controllers
New hardware / drivers:
- Ethernet:
- AMD/Pensando core device support
- MediaTek MT7981 SoC
- MediaTek MT7988 SoC
- Broadcom BCM53134 embedded switch
- Texas Instruments CPSW9G ethernet switch
- Qualcomm EMAC3 DWMAC ethernet
- StarFive JH7110 SoC
- NXP CBTX ethernet PHY
- WiFi:
- Apple M1 Pro/Max devices
- RealTek rtl8710bu/rtl8188gu
- RealTek rtl8822bs, rtl8822cs and rtl8821cs SDIO chipset
- Bluetooth:
- Realtek RTL8821CS, RTL8851B, RTL8852BS
- Mediatek MT7663, MT7922
- NXP w8997
- Actions Semi ATS2851
- QTI WCN6855
- Marvell 88W8997
- Can:
- STMicroelectronics bxcan stm32f429
Drivers:
- Ethernet NICs:
- Intel (1G, icg):
- add tracking and reporting of QBV config errors
- add support for configuring max SDU for each Tx queue
- Intel (100G, ice):
- refactor mailbox overflow detection to support Scalable IOV
- GNSS interface optimization
- Intel (i40e):
- support XDP multi-buffer
- nVidia/Mellanox:
- add the support for linux bridge multicast offload
- enable TC offload for egress and engress MACVLAN over bond
- add support for VxLAN GBP encap/decap flows offload
- extend packet offload to fully support libreswan
- support tunnel mode in mlx5 IPsec packet offload
- extend XDP multi-buffer support
- support MACsec VLAN offload
- add support for dynamic msix vectors allocation
- drop RX page_cache and fully use page_pool
- implement thermal zone to report NIC temperature
- Netronome/Corigine:
- add support for multi-zone conntrack offload
- Solarflare/Xilinx:
- support offloading TC VLAN push/pop actions to the MAE
- support TC decap rules
- support unicast PTP
- Other NICs:
- Broadcom (bnxt): enforce software based freq adjustments only on
shared PHC NIC
- RealTek (r8169): refactor to addess ASPM issues during NAPI poll
- Micrel (lan8841): add support for PTP_PF_PEROUT
- Cadence (macb): enable PTP unicast
- Engleder (tsnep): add XDP socket zero-copy support
- virtio-net: implement exact header length guest feature
- veth: add page_pool support for page recycling
- vxlan: add MDB data path support
- gve: add XDP support for GQI-QPL format
- geneve: accept every ethertype
- macvlan: allow some packets to bypass broadcast queue
- mana: add support for jumbo frame
- Ethernet high-speed switches:
- Microchip (sparx5): Add support for TC flower templates
- Ethernet embedded switches:
- Broadcom (b54):
- configure 6318 and 63268 RGMII ports
- Marvell (mv88e6xxx):
- faster C45 bus scan
- Microchip:
- lan966x:
- add support for IS1 VCAP
- better TX/RX from/to CPU performances
- ksz9477: add ETS Qdisc support
- ksz8: enhance static MAC table operations and error handling
- sama7g5: add PTP capability
- NXP (ocelot):
- add support for external ports
- add support for preemptible traffic classes
- Texas Instruments:
- add CPSWxG SGMII support for J7200 and J721E
- Intel WiFi (iwlwifi):
- preparation for Wi-Fi 7 EHT and multi-link support
- EHT (Wi-Fi 7) sniffer support
- hardware timestamping support for some devices/firwmares
- TX beacon protection on newer hardware
- Qualcomm 802.11ax WiFi (ath11k):
- MU-MIMO parameters support
- ack signal support for management packets
- RealTek WiFi (rtw88):
- SDIO bus support
- better support for some SDIO devices (e.g. MAC address from
efuse)
- RealTek WiFi (rtw89):
- HW scan support for 8852b
- better support for 6 GHz scanning
- support for various newer firmware APIs
- framework firmware backwards compatibility
- MediaTek WiFi (mt76):
- P2P support
- mesh A-MSDU support
- EHT (Wi-Fi 7) support
- coredump support"
* tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2078 commits)
net: phy: hide the PHYLIB_LEDS knob
net: phy: marvell-88x2222: remove unnecessary (void*) conversions
tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp.
net: amd: Fix link leak when verifying config failed
net: phy: marvell: Fix inconsistent indenting in led_blink_set
lan966x: Don't use xdp_frame when action is XDP_TX
tsnep: Add XDP socket zero-copy TX support
tsnep: Add XDP socket zero-copy RX support
tsnep: Move skb receive action to separate function
tsnep: Add functions for queue enable/disable
tsnep: Rework TX/RX queue initialization
tsnep: Replace modulo operation with mask
net: phy: dp83867: Add led_brightness_set support
net: phy: Fix reading LED reg property
drivers: nfc: nfcsim: remove return value check of `dev_dir`
net: phy: dp83867: Remove unnecessary (void*) conversions
net: ethtool: coalesce: try to make user settings stick twice
net: mana: Check if netdev/napi_alloc_frag returns single page
net: mana: Rename mana_refill_rxoob and remove some empty lines
net: veth: add page_pool stats
...
Diffstat (limited to 'net')
271 files changed, 10038 insertions, 2997 deletions
diff --git a/net/6lowpan/iphc.c b/net/6lowpan/iphc.c index 52fad5dad9f7..e116d308a8df 100644 --- a/net/6lowpan/iphc.c +++ b/net/6lowpan/iphc.c @@ -848,7 +848,7 @@ static u8 lowpan_compress_ctx_addr(u8 **hc_ptr, const struct net_device *dev, const struct lowpan_iphc_ctx *ctx, const unsigned char *lladdr, bool sam) { - struct in6_addr tmp = {}; + struct in6_addr tmp; u8 dam; switch (lowpan_dev(dev)->lltype) { diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c index 296d0145932f..870e4935d6e6 100644 --- a/net/8021q/vlan_dev.c +++ b/net/8021q/vlan_dev.c @@ -26,6 +26,7 @@ #include <linux/ethtool.h> #include <linux/phy.h> #include <net/arp.h> +#include <net/macsec.h> #include "vlan.h" #include "vlanproc.h" @@ -365,7 +366,7 @@ static int vlan_dev_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd) switch (cmd) { case SIOCSHWTSTAMP: - if (!net_eq(dev_net(dev), &init_net)) + if (!net_eq(dev_net(dev), dev_net(real_dev))) break; fallthrough; case SIOCGMIIPHY: @@ -572,6 +573,9 @@ static int vlan_dev_init(struct net_device *dev) NETIF_F_HIGHDMA | NETIF_F_SCTP_CRC | NETIF_F_ALL_FCOE; + if (real_dev->vlan_features & NETIF_F_HW_MACSEC) + dev->hw_features |= NETIF_F_HW_MACSEC; + dev->features |= dev->hw_features | NETIF_F_LLTX; netif_inherit_tso_max(dev, real_dev); if (dev->features & NETIF_F_VLAN_FEATURES) @@ -803,6 +807,241 @@ static int vlan_dev_fill_forward_path(struct net_device_path_ctx *ctx, return 0; } +#if IS_ENABLED(CONFIG_MACSEC) + +static const struct macsec_ops *vlan_get_macsec_ops(const struct macsec_context *ctx) +{ + return vlan_dev_priv(ctx->netdev)->real_dev->macsec_ops; +} + +static int vlan_macsec_offload(int (* const func)(struct macsec_context *), + struct macsec_context *ctx) +{ + if (unlikely(!func)) + return 0; + + return (*func)(ctx); +} + +static int vlan_macsec_dev_open(struct macsec_context *ctx) +{ + const struct macsec_ops *ops = vlan_get_macsec_ops(ctx); + + if (!ops) + return -EOPNOTSUPP; + + return vlan_macsec_offload(ops->mdo_dev_open, ctx); +} + +static int vlan_macsec_dev_stop(struct macsec_context *ctx) +{ + const struct macsec_ops *ops = vlan_get_macsec_ops(ctx); + + if (!ops) + return -EOPNOTSUPP; + + return vlan_macsec_offload(ops->mdo_dev_stop, ctx); +} + +static int vlan_macsec_add_secy(struct macsec_context *ctx) +{ + const struct macsec_ops *ops = vlan_get_macsec_ops(ctx); + + if (!ops) + return -EOPNOTSUPP; + + return vlan_macsec_offload(ops->mdo_add_secy, ctx); +} + +static int vlan_macsec_upd_secy(struct macsec_context *ctx) +{ + const struct macsec_ops *ops = vlan_get_macsec_ops(ctx); + + if (!ops) + return -EOPNOTSUPP; + + return vlan_macsec_offload(ops->mdo_upd_secy, ctx); +} + +static int vlan_macsec_del_secy(struct macsec_context *ctx) +{ + const struct macsec_ops *ops = vlan_get_macsec_ops(ctx); + + if (!ops) + return -EOPNOTSUPP; + + return vlan_macsec_offload(ops->mdo_del_secy, ctx); +} + +static int vlan_macsec_add_rxsc(struct macsec_context *ctx) +{ + const struct macsec_ops *ops = vlan_get_macsec_ops(ctx); + + if (!ops) + return -EOPNOTSUPP; + + return vlan_macsec_offload(ops->mdo_add_rxsc, ctx); +} + +static int vlan_macsec_upd_rxsc(struct macsec_context *ctx) +{ + const struct macsec_ops *ops = vlan_get_macsec_ops(ctx); + + if (!ops) + return -EOPNOTSUPP; + + return vlan_macsec_offload(ops->mdo_upd_rxsc, ctx); +} + +static int vlan_macsec_del_rxsc(struct macsec_context *ctx) +{ + const struct macsec_ops *ops = vlan_get_macsec_ops(ctx); + + if (!ops) + return -EOPNOTSUPP; + + return vlan_macsec_offload(ops->mdo_del_rxsc, ctx); +} + +static int vlan_macsec_add_rxsa(struct macsec_context *ctx) +{ + const struct macsec_ops *ops = vlan_get_macsec_ops(ctx); + + if (!ops) + return -EOPNOTSUPP; + + return vlan_macsec_offload(ops->mdo_add_rxsa, ctx); +} + +static int vlan_macsec_upd_rxsa(struct macsec_context *ctx) +{ + const struct macsec_ops *ops = vlan_get_macsec_ops(ctx); + + if (!ops) + return -EOPNOTSUPP; + + return vlan_macsec_offload(ops->mdo_upd_rxsa, ctx); +} + +static int vlan_macsec_del_rxsa(struct macsec_context *ctx) +{ + const struct macsec_ops *ops = vlan_get_macsec_ops(ctx); + + if (!ops) + return -EOPNOTSUPP; + + return vlan_macsec_offload(ops->mdo_del_rxsa, ctx); +} + +static int vlan_macsec_add_txsa(struct macsec_context *ctx) +{ + const struct macsec_ops *ops = vlan_get_macsec_ops(ctx); + + if (!ops) + return -EOPNOTSUPP; + + return vlan_macsec_offload(ops->mdo_add_txsa, ctx); +} + +static int vlan_macsec_upd_txsa(struct macsec_context *ctx) +{ + const struct macsec_ops *ops = vlan_get_macsec_ops(ctx); + + if (!ops) + return -EOPNOTSUPP; + + return vlan_macsec_offload(ops->mdo_upd_txsa, ctx); +} + +static int vlan_macsec_del_txsa(struct macsec_context *ctx) +{ + const struct macsec_ops *ops = vlan_get_macsec_ops(ctx); + + if (!ops) + return -EOPNOTSUPP; + + return vlan_macsec_offload(ops->mdo_del_txsa, ctx); +} + +static int vlan_macsec_get_dev_stats(struct macsec_context *ctx) +{ + const struct macsec_ops *ops = vlan_get_macsec_ops(ctx); + + if (!ops) + return -EOPNOTSUPP; + + return vlan_macsec_offload(ops->mdo_get_dev_stats, ctx); +} + +static int vlan_macsec_get_tx_sc_stats(struct macsec_context *ctx) +{ + const struct macsec_ops *ops = vlan_get_macsec_ops(ctx); + + if (!ops) + return -EOPNOTSUPP; + + return vlan_macsec_offload(ops->mdo_get_tx_sc_stats, ctx); +} + +static int vlan_macsec_get_tx_sa_stats(struct macsec_context *ctx) +{ + const struct macsec_ops *ops = vlan_get_macsec_ops(ctx); + + if (!ops) + return -EOPNOTSUPP; + + return vlan_macsec_offload(ops->mdo_get_tx_sa_stats, ctx); +} + +static int vlan_macsec_get_rx_sc_stats(struct macsec_context *ctx) +{ + const struct macsec_ops *ops = vlan_get_macsec_ops(ctx); + + if (!ops) + return -EOPNOTSUPP; + + return vlan_macsec_offload(ops->mdo_get_rx_sc_stats, ctx); +} + +static int vlan_macsec_get_rx_sa_stats(struct macsec_context *ctx) +{ + const struct macsec_ops *ops = vlan_get_macsec_ops(ctx); + + if (!ops) + return -EOPNOTSUPP; + + return vlan_macsec_offload(ops->mdo_get_rx_sa_stats, ctx); +} + +static const struct macsec_ops macsec_offload_ops = { + /* Device wide */ + .mdo_dev_open = vlan_macsec_dev_open, + .mdo_dev_stop = vlan_macsec_dev_stop, + /* SecY */ + .mdo_add_secy = vlan_macsec_add_secy, + .mdo_upd_secy = vlan_macsec_upd_secy, + .mdo_del_secy = vlan_macsec_del_secy, + /* Security channels */ + .mdo_add_rxsc = vlan_macsec_add_rxsc, + .mdo_upd_rxsc = vlan_macsec_upd_rxsc, + .mdo_del_rxsc = vlan_macsec_del_rxsc, + /* Security associations */ + .mdo_add_rxsa = vlan_macsec_add_rxsa, + .mdo_upd_rxsa = vlan_macsec_upd_rxsa, + .mdo_del_rxsa = vlan_macsec_del_rxsa, + .mdo_add_txsa = vlan_macsec_add_txsa, + .mdo_upd_txsa = vlan_macsec_upd_txsa, + .mdo_del_txsa = vlan_macsec_del_txsa, + /* Statistics */ + .mdo_get_dev_stats = vlan_macsec_get_dev_stats, + .mdo_get_tx_sc_stats = vlan_macsec_get_tx_sc_stats, + .mdo_get_tx_sa_stats = vlan_macsec_get_tx_sa_stats, + .mdo_get_rx_sc_stats = vlan_macsec_get_rx_sc_stats, + .mdo_get_rx_sa_stats = vlan_macsec_get_rx_sa_stats, +}; + +#endif + static const struct ethtool_ops vlan_ethtool_ops = { .get_link_ksettings = vlan_ethtool_get_link_ksettings, .get_drvinfo = vlan_ethtool_get_drvinfo, @@ -869,6 +1108,9 @@ void vlan_setup(struct net_device *dev) dev->priv_destructor = vlan_dev_free; dev->ethtool_ops = &vlan_ethtool_ops; +#if IS_ENABLED(CONFIG_MACSEC) + dev->macsec_ops = &macsec_offload_ops; +#endif dev->min_mtu = 0; dev->max_mtu = ETH_MAX_MTU; diff --git a/net/Kconfig b/net/Kconfig index 48c33c222199..7d39c1773eb4 100644 --- a/net/Kconfig +++ b/net/Kconfig @@ -68,6 +68,26 @@ source "net/iucv/Kconfig" source "net/smc/Kconfig" source "net/xdp/Kconfig" +config NET_HANDSHAKE + bool + depends on SUNRPC || NVME_TARGET_TCP || NVME_TCP + default y + +config NET_HANDSHAKE_KUNIT_TEST + tristate "KUnit tests for the handshake upcall mechanism" if !KUNIT_ALL_TESTS + default KUNIT_ALL_TESTS + depends on KUNIT + help + This builds the KUnit tests for the handshake upcall mechanism. + + KUnit tests run during boot and output the results to the debug + log in TAP format (https://testanything.org/). Only useful for + kernel devs running KUnit test harness and are not for inclusion + into a production build. + + For more information on KUnit and unit tests in general, refer + to the KUnit documentation in Documentation/dev-tools/kunit/. + config INET bool "TCP/IP networking" help @@ -251,6 +271,18 @@ config PCPU_DEV_REFCNT network device refcount are using per cpu variables if this option is set. This can be forced to N to detect underflows (with a performance drop). +config MAX_SKB_FRAGS + int "Maximum number of fragments per skb_shared_info" + range 17 45 + default 17 + help + Having more fragments per skb_shared_info can help GRO efficiency. + This helps BIG TCP workloads, but might expose bugs in some + legacy drivers. + This also increases memory overhead of small packets, + and in drivers using build_skb(). + If unsure, say 17. + config RPS bool depends on SMP && SYSFS diff --git a/net/Makefile b/net/Makefile index 0914bea9c335..4c4dc535453d 100644 --- a/net/Makefile +++ b/net/Makefile @@ -24,7 +24,7 @@ obj-$(CONFIG_PACKET) += packet/ obj-$(CONFIG_NET_KEY) += key/ obj-$(CONFIG_BRIDGE) += bridge/ obj-$(CONFIG_NET_DEVLINK) += devlink/ -obj-$(CONFIG_NET_DSA) += dsa/ +obj-y += dsa/ obj-$(CONFIG_ATALK) += appletalk/ obj-$(CONFIG_X25) += x25/ obj-$(CONFIG_LAPB) += lapb/ @@ -79,3 +79,4 @@ obj-$(CONFIG_NET_NCSI) += ncsi/ obj-$(CONFIG_XDP_SOCKETS) += xdp/ obj-$(CONFIG_MPTCP) += mptcp/ obj-$(CONFIG_MCTP) += mctp/ +obj-$(CONFIG_NET_HANDSHAKE) += handshake/ diff --git a/net/atm/signaling.c b/net/atm/signaling.c index 5de06ab8ed75..e70ae2c113f9 100644 --- a/net/atm/signaling.c +++ b/net/atm/signaling.c @@ -125,7 +125,7 @@ as_indicate_complete: break; case as_addparty: case as_dropparty: - sk->sk_err_soft = -msg->reply; + WRITE_ONCE(sk->sk_err_soft, -msg->reply); /* < 0 failure, otherwise ep_ref */ clear_bit(ATM_VF_WAITING, &vcc->flags); break; diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c index 125f4628687c..d3fdf82282af 100644 --- a/net/batman-adv/soft-interface.c +++ b/net/batman-adv/soft-interface.c @@ -439,7 +439,7 @@ void batadv_interface_rx(struct net_device *soft_iface, if (!pskb_may_pull(skb, VLAN_ETH_HLEN)) goto dropped; - vhdr = (struct vlan_ethhdr *)skb->data; + vhdr = skb_vlan_eth_hdr(skb); /* drop batman-in-batman packets to prevent loops */ if (vhdr->h_vlan_encapsulated_proto != htons(ETH_P_BATMAN)) diff --git a/net/bluetooth/Makefile b/net/bluetooth/Makefile index 0e7b7db42750..141ac1fda0bf 100644 --- a/net/bluetooth/Makefile +++ b/net/bluetooth/Makefile @@ -17,6 +17,8 @@ bluetooth-y := af_bluetooth.o hci_core.o hci_conn.o hci_event.o mgmt.o \ ecdh_helper.o hci_request.o mgmt_util.o mgmt_config.o hci_codec.o \ eir.o hci_sync.o +bluetooth-$(CONFIG_DEV_COREDUMP) += coredump.o + bluetooth-$(CONFIG_BT_BREDR) += sco.o bluetooth-$(CONFIG_BT_LE) += iso.o bluetooth-$(CONFIG_BT_HS) += a2mp.o amp.o diff --git a/net/bluetooth/coredump.c b/net/bluetooth/coredump.c new file mode 100644 index 000000000000..d2d2624ec708 --- /dev/null +++ b/net/bluetooth/coredump.c @@ -0,0 +1,536 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2023 Google Corporation + */ + +#include <linux/devcoredump.h> + +#include <asm/unaligned.h> +#include <net/bluetooth/bluetooth.h> +#include <net/bluetooth/hci_core.h> + +enum hci_devcoredump_pkt_type { + HCI_DEVCOREDUMP_PKT_INIT, + HCI_DEVCOREDUMP_PKT_SKB, + HCI_DEVCOREDUMP_PKT_PATTERN, + HCI_DEVCOREDUMP_PKT_COMPLETE, + HCI_DEVCOREDUMP_PKT_ABORT, +}; + +struct hci_devcoredump_skb_cb { + u16 pkt_type; +}; + +struct hci_devcoredump_skb_pattern { + u8 pattern; + u32 len; +} __packed; + +#define hci_dmp_cb(skb) ((struct hci_devcoredump_skb_cb *)((skb)->cb)) + +#define DBG_UNEXPECTED_STATE() \ + bt_dev_dbg(hdev, \ + "Unexpected packet (%d) for state (%d). ", \ + hci_dmp_cb(skb)->pkt_type, hdev->dump.state) + +#define MAX_DEVCOREDUMP_HDR_SIZE 512 /* bytes */ + +static int hci_devcd_update_hdr_state(char *buf, size_t size, int state) +{ + int len = 0; + + if (!buf) + return 0; + + len = scnprintf(buf, size, "Bluetooth devcoredump\nState: %d\n", state); + + return len + 1; /* scnprintf adds \0 at the end upon state rewrite */ +} + +/* Call with hci_dev_lock only. */ +static int hci_devcd_update_state(struct hci_dev *hdev, int state) +{ + bt_dev_dbg(hdev, "Updating devcoredump state from %d to %d.", + hdev->dump.state, state); + + hdev->dump.state = state; + + return hci_devcd_update_hdr_state(hdev->dump.head, + hdev->dump.alloc_size, state); +} + +static int hci_devcd_mkheader(struct hci_dev *hdev, struct sk_buff *skb) +{ + char dump_start[] = "--- Start dump ---\n"; + char hdr[80]; + int hdr_len; + + hdr_len = hci_devcd_update_hdr_state(hdr, sizeof(hdr), + HCI_DEVCOREDUMP_IDLE); + skb_put_data(skb, hdr, hdr_len); + + if (hdev->dump.dmp_hdr) + hdev->dump.dmp_hdr(hdev, skb); + + skb_put_data(skb, dump_start, strlen(dump_start)); + + return skb->len; +} + +/* Do not call with hci_dev_lock since this calls driver code. */ +static void hci_devcd_notify(struct hci_dev *hdev, int state) +{ + if (hdev->dump.notify_change) + hdev->dump.notify_change(hdev, state); +} + +/* Call with hci_dev_lock only. */ +void hci_devcd_reset(struct hci_dev *hdev) +{ + hdev->dump.head = NULL; + hdev->dump.tail = NULL; + hdev->dump.alloc_size = 0; + + hci_devcd_update_state(hdev, HCI_DEVCOREDUMP_IDLE); + + cancel_delayed_work(&hdev->dump.dump_timeout); + skb_queue_purge(&hdev->dump.dump_q); +} + +/* Call with hci_dev_lock only. */ +static void hci_devcd_free(struct hci_dev *hdev) +{ + if (hdev->dump.head) + vfree(hdev->dump.head); + + hci_devcd_reset(hdev); +} + +/* Call with hci_dev_lock only. */ +static int hci_devcd_alloc(struct hci_dev *hdev, u32 size) +{ + hdev->dump.head = vmalloc(size); + if (!hdev->dump.head) + return -ENOMEM; + + hdev->dump.alloc_size = size; + hdev->dump.tail = hdev->dump.head; + hdev->dump.end = hdev->dump.head + size; + + hci_devcd_update_state(hdev, HCI_DEVCOREDUMP_IDLE); + + return 0; +} + +/* Call with hci_dev_lock only. */ +static bool hci_devcd_copy(struct hci_dev *hdev, char *buf, u32 size) +{ + if (hdev->dump.tail + size > hdev->dump.end) + return false; + + memcpy(hdev->dump.tail, buf, size); + hdev->dump.tail += size; + + return true; +} + +/* Call with hci_dev_lock only. */ +static bool hci_devcd_memset(struct hci_dev *hdev, u8 pattern, u32 len) +{ + if (hdev->dump.tail + len > hdev->dump.end) + return false; + + memset(hdev->dump.tail, pattern, len); + hdev->dump.tail += len; + + return true; +} + +/* Call with hci_dev_lock only. */ +static int hci_devcd_prepare(struct hci_dev *hdev, u32 dump_size) +{ + struct sk_buff *skb; + int dump_hdr_size; + int err = 0; + + skb = alloc_skb(MAX_DEVCOREDUMP_HDR_SIZE, GFP_ATOMIC); + if (!skb) + return -ENOMEM; + + dump_hdr_size = hci_devcd_mkheader(hdev, skb); + + if (hci_devcd_alloc(hdev, dump_hdr_size + dump_size)) { + err = -ENOMEM; + goto hdr_free; + } + + /* Insert the device header */ + if (!hci_devcd_copy(hdev, skb->data, skb->len)) { + bt_dev_err(hdev, "Failed to insert header"); + hci_devcd_free(hdev); + + err = -ENOMEM; + goto hdr_free; + } + +hdr_free: + kfree_skb(skb); + + return err; +} + +static void hci_devcd_handle_pkt_init(struct hci_dev *hdev, struct sk_buff *skb) +{ + u32 dump_size; + + if (hdev->dump.state != HCI_DEVCOREDUMP_IDLE) { + DBG_UNEXPECTED_STATE(); + return; + } + + if (skb->len != sizeof(dump_size)) { + bt_dev_dbg(hdev, "Invalid dump init pkt"); + return; + } + + dump_size = get_unaligned_le32(skb_pull_data(skb, 4)); + if (!dump_size) { + bt_dev_err(hdev, "Zero size dump init pkt"); + return; + } + + if (hci_devcd_prepare(hdev, dump_size)) { + bt_dev_err(hdev, "Failed to prepare for dump"); + return; + } + + hci_devcd_update_state(hdev, HCI_DEVCOREDUMP_ACTIVE); + queue_delayed_work(hdev->workqueue, &hdev->dump.dump_timeout, + hdev->dump.timeout); +} + +static void hci_devcd_handle_pkt_skb(struct hci_dev *hdev, struct sk_buff *skb) +{ + if (hdev->dump.state != HCI_DEVCOREDUMP_ACTIVE) { + DBG_UNEXPECTED_STATE(); + return; + } + + if (!hci_devcd_copy(hdev, skb->data, skb->len)) + bt_dev_dbg(hdev, "Failed to insert skb"); +} + +static void hci_devcd_handle_pkt_pattern(struct hci_dev *hdev, + struct sk_buff *skb) +{ + struct hci_devcoredump_skb_pattern *pattern; + + if (hdev->dump.state != HCI_DEVCOREDUMP_ACTIVE) { + DBG_UNEXPECTED_STATE(); + return; + } + + if (skb->len != sizeof(*pattern)) { + bt_dev_dbg(hdev, "Invalid pattern skb"); + return; + } + + pattern = skb_pull_data(skb, sizeof(*pattern)); + + if (!hci_devcd_memset(hdev, pattern->pattern, pattern->len)) + bt_dev_dbg(hdev, "Failed to set pattern"); +} + +static void hci_devcd_handle_pkt_complete(struct hci_dev *hdev, + struct sk_buff *skb) +{ + u32 dump_size; + + if (hdev->dump.state != HCI_DEVCOREDUMP_ACTIVE) { + DBG_UNEXPECTED_STATE(); + return; + } + + hci_devcd_update_state(hdev, HCI_DEVCOREDUMP_DONE); + dump_size = hdev->dump.tail - hdev->dump.head; + + bt_dev_dbg(hdev, "complete with size %u (expect %zu)", dump_size, + hdev->dump.alloc_size); + + dev_coredumpv(&hdev->dev, hdev->dump.head, dump_size, GFP_KERNEL); +} + +static void hci_devcd_handle_pkt_abort(struct hci_dev *hdev, + struct sk_buff *skb) +{ + u32 dump_size; + + if (hdev->dump.state != HCI_DEVCOREDUMP_ACTIVE) { + DBG_UNEXPECTED_STATE(); + return; + } + + hci_devcd_update_state(hdev, HCI_DEVCOREDUMP_ABORT); + dump_size = hdev->dump.tail - hdev->dump.head; + + bt_dev_dbg(hdev, "aborted with size %u (expect %zu)", dump_size, + hdev->dump.alloc_size); + + /* Emit a devcoredump with the available data */ + dev_coredumpv(&hdev->dev, hdev->dump.head, dump_size, GFP_KERNEL); +} + +/* Bluetooth devcoredump state machine. + * + * Devcoredump states: + * + * HCI_DEVCOREDUMP_IDLE: The default state. + * + * HCI_DEVCOREDUMP_ACTIVE: A devcoredump will be in this state once it has + * been initialized using hci_devcd_init(). Once active, the driver + * can append data using hci_devcd_append() or insert a pattern + * using hci_devcd_append_pattern(). + * + * HCI_DEVCOREDUMP_DONE: Once the dump collection is complete, the drive + * can signal the completion using hci_devcd_complete(). A + * devcoredump is generated indicating the completion event and + * then the state machine is reset to the default state. + * + * HCI_DEVCOREDUMP_ABORT: The driver can cancel ongoing dump collection in + * case of any error using hci_devcd_abort(). A devcoredump is + * still generated with the available data indicating the abort + * event and then the state machine is reset to the default state. + * + * HCI_DEVCOREDUMP_TIMEOUT: A timeout timer for HCI_DEVCOREDUMP_TIMEOUT sec + * is started during devcoredump initialization. Once the timeout + * occurs, the driver is notified, a devcoredump is generated with + * the available data indicating the timeout event and then the + * state machine is reset to the default state. + * + * The driver must register using hci_devcd_register() before using the hci + * devcoredump APIs. + */ +void hci_devcd_rx(struct work_struct *work) +{ + struct hci_dev *hdev = container_of(work, struct hci_dev, dump.dump_rx); + struct sk_buff *skb; + int start_state; + + while ((skb = skb_dequeue(&hdev->dump.dump_q))) { + /* Return if timeout occurs. The timeout handler function + * hci_devcd_timeout() will report the available dump data. + */ + if (hdev->dump.state == HCI_DEVCOREDUMP_TIMEOUT) { + kfree_skb(skb); + return; + } + + hci_dev_lock(hdev); + start_state = hdev->dump.state; + + switch (hci_dmp_cb(skb)->pkt_type) { + case HCI_DEVCOREDUMP_PKT_INIT: + hci_devcd_handle_pkt_init(hdev, skb); + break; + + case HCI_DEVCOREDUMP_PKT_SKB: + hci_devcd_handle_pkt_skb(hdev, skb); + break; + + case HCI_DEVCOREDUMP_PKT_PATTERN: + hci_devcd_handle_pkt_pattern(hdev, skb); + break; + + case HCI_DEVCOREDUMP_PKT_COMPLETE: + hci_devcd_handle_pkt_complete(hdev, skb); + break; + + case HCI_DEVCOREDUMP_PKT_ABORT: + hci_devcd_handle_pkt_abort(hdev, skb); + break; + + default: + bt_dev_dbg(hdev, "Unknown packet (%d) for state (%d). ", + hci_dmp_cb(skb)->pkt_type, hdev->dump.state); + break; + } + + hci_dev_unlock(hdev); + kfree_skb(skb); + + /* Notify the driver about any state changes before resetting + * the state machine + */ + if (start_state != hdev->dump.state) + hci_devcd_notify(hdev, hdev->dump.state); + + /* Reset the state machine if the devcoredump is complete */ + hci_dev_lock(hdev); + if (hdev->dump.state == HCI_DEVCOREDUMP_DONE || + hdev->dump.state == HCI_DEVCOREDUMP_ABORT) + hci_devcd_reset(hdev); + hci_dev_unlock(hdev); + } +} +EXPORT_SYMBOL(hci_devcd_rx); + +void hci_devcd_timeout(struct work_struct *work) +{ + struct hci_dev *hdev = container_of(work, struct hci_dev, + dump.dump_timeout.work); + u32 dump_size; + + hci_devcd_notify(hdev, HCI_DEVCOREDUMP_TIMEOUT); + + hci_dev_lock(hdev); + + cancel_work(&hdev->dump.dump_rx); + + hci_devcd_update_state(hdev, HCI_DEVCOREDUMP_TIMEOUT); + + dump_size = hdev->dump.tail - hdev->dump.head; + bt_dev_dbg(hdev, "timeout with size %u (expect %zu)", dump_size, + hdev->dump.alloc_size); + + /* Emit a devcoredump with the available data */ + dev_coredumpv(&hdev->dev, hdev->dump.head, dump_size, GFP_KERNEL); + + hci_devcd_reset(hdev); + + hci_dev_unlock(hdev); +} +EXPORT_SYMBOL(hci_devcd_timeout); + +int hci_devcd_register(struct hci_dev *hdev, coredump_t coredump, + dmp_hdr_t dmp_hdr, notify_change_t notify_change) +{ + /* Driver must implement coredump() and dmp_hdr() functions for + * bluetooth devcoredump. The coredump() should trigger a coredump + * event on the controller when the device's coredump sysfs entry is + * written to. The dmp_hdr() should create a dump header to identify + * the controller/fw/driver info. + */ + if (!coredump || !dmp_hdr) + return -EINVAL; + + hci_dev_lock(hdev); + hdev->dump.coredump = coredump; + hdev->dump.dmp_hdr = dmp_hdr; + hdev->dump.notify_change = notify_change; + hdev->dump.supported = true; + hdev->dump.timeout = DEVCOREDUMP_TIMEOUT; + hci_dev_unlock(hdev); + + return 0; +} +EXPORT_SYMBOL(hci_devcd_register); + +static inline bool hci_devcd_enabled(struct hci_dev *hdev) +{ + return hdev->dump.supported; +} + +int hci_devcd_init(struct hci_dev *hdev, u32 dump_size) +{ + struct sk_buff *skb; + + if (!hci_devcd_enabled(hdev)) + return -EOPNOTSUPP; + + skb = alloc_skb(sizeof(dump_size), GFP_ATOMIC); + if (!skb) + return -ENOMEM; + + hci_dmp_cb(skb)->pkt_type = HCI_DEVCOREDUMP_PKT_INIT; + put_unaligned_le32(dump_size, skb_put(skb, 4)); + + skb_queue_tail(&hdev->dump.dump_q, skb); + queue_work(hdev->workqueue, &hdev->dump.dump_rx); + + return 0; +} +EXPORT_SYMBOL(hci_devcd_init); + +int hci_devcd_append(struct hci_dev *hdev, struct sk_buff *skb) +{ + if (!skb) + return -ENOMEM; + + if (!hci_devcd_enabled(hdev)) { + kfree_skb(skb); + return -EOPNOTSUPP; + } + + hci_dmp_cb(skb)->pkt_type = HCI_DEVCOREDUMP_PKT_SKB; + + skb_queue_tail(&hdev->dump.dump_q, skb); + queue_work(hdev->workqueue, &hdev->dump.dump_rx); + + return 0; +} +EXPORT_SYMBOL(hci_devcd_append); + +int hci_devcd_append_pattern(struct hci_dev *hdev, u8 pattern, u32 len) +{ + struct hci_devcoredump_skb_pattern p; + struct sk_buff *skb; + + if (!hci_devcd_enabled(hdev)) + return -EOPNOTSUPP; + + skb = alloc_skb(sizeof(p), GFP_ATOMIC); + if (!skb) + return -ENOMEM; + + p.pattern = pattern; + p.len = len; + + hci_dmp_cb(skb)->pkt_type = HCI_DEVCOREDUMP_PKT_PATTERN; + skb_put_data(skb, &p, sizeof(p)); + + skb_queue_tail(&hdev->dump.dump_q, skb); + queue_work(hdev->workqueue, &hdev->dump.dump_rx); + + return 0; +} +EXPORT_SYMBOL(hci_devcd_append_pattern); + +int hci_devcd_complete(struct hci_dev *hdev) +{ + struct sk_buff *skb; + + if (!hci_devcd_enabled(hdev)) + return -EOPNOTSUPP; + + skb = alloc_skb(0, GFP_ATOMIC); + if (!skb) + return -ENOMEM; + + hci_dmp_cb(skb)->pkt_type = HCI_DEVCOREDUMP_PKT_COMPLETE; + + skb_queue_tail(&hdev->dump.dump_q, skb); + queue_work(hdev->workqueue, &hdev->dump.dump_rx); + + return 0; +} +EXPORT_SYMBOL(hci_devcd_complete); + +int hci_devcd_abort(struct hci_dev *hdev) +{ + struct sk_buff *skb; + + if (!hci_devcd_enabled(hdev)) + return -EOPNOTSUPP; + + skb = alloc_skb(0, GFP_ATOMIC); + if (!skb) + return -ENOMEM; + + hci_dmp_cb(skb)->pkt_type = HCI_DEVCOREDUMP_PKT_ABORT; + + skb_queue_tail(&hdev->dump.dump_q, skb); + queue_work(hdev->workqueue, &hdev->dump.dump_rx); + + return 0; +} +EXPORT_SYMBOL(hci_devcd_abort); diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c index 8455ba141ee6..640b951bf40a 100644 --- a/net/bluetooth/hci_conn.c +++ b/net/bluetooth/hci_conn.c @@ -1,6 +1,7 @@ /* BlueZ - Bluetooth protocol stack for Linux Copyright (c) 2000-2001, 2010, Code Aurora Forum. All rights reserved. + Copyright 2023 NXP Written 2000,2001 by Maxim Krasnyansky <maxk@qualcomm.com> @@ -329,8 +330,11 @@ static void hci_add_sco(struct hci_conn *conn, __u16 handle) static bool find_next_esco_param(struct hci_conn *conn, const struct sco_param *esco_param, int size) { + if (!conn->parent) + return false; + for (; conn->attempt <= size; conn->attempt++) { - if (lmp_esco_2m_capable(conn->link) || + if (lmp_esco_2m_capable(conn->parent) || (esco_param[conn->attempt - 1].pkt_type & ESCO_2EV3)) break; BT_DBG("hcon %p skipped attempt %d, eSCO 2M not supported", @@ -460,7 +464,7 @@ static int hci_enhanced_setup_sync(struct hci_dev *hdev, void *data) break; case BT_CODEC_CVSD: - if (lmp_esco_capable(conn->link)) { + if (conn->parent && lmp_esco_capable(conn->parent)) { if (!find_next_esco_param(conn, esco_param_cvsd, ARRAY_SIZE(esco_param_cvsd))) return -EINVAL; @@ -530,7 +534,7 @@ static bool hci_setup_sync_conn(struct hci_conn *conn, __u16 handle) param = &esco_param_msbc[conn->attempt - 1]; break; case SCO_AIRMODE_CVSD: - if (lmp_esco_capable(conn->link)) { + if (conn->parent && lmp_esco_capable(conn->parent)) { if (!find_next_esco_param(conn, esco_param_cvsd, ARRAY_SIZE(esco_param_cvsd))) return false; @@ -636,21 +640,22 @@ void hci_le_start_enc(struct hci_conn *conn, __le16 ediv, __le64 rand, /* Device _must_ be locked */ void hci_sco_setup(struct hci_conn *conn, __u8 status) { - struct hci_conn *sco = conn->link; + struct hci_link *link; - if (!sco) + link = list_first_entry_or_null(&conn->link_list, struct hci_link, list); + if (!link || !link->conn) return; BT_DBG("hcon %p", conn); if (!status) { if (lmp_esco_capable(conn->hdev)) - hci_setup_sync(sco, conn->handle); + hci_setup_sync(link->conn, conn->handle); else - hci_add_sco(sco, conn->handle); + hci_add_sco(link->conn, conn->handle); } else { - hci_connect_cfm(sco, status); - hci_conn_del(sco); + hci_connect_cfm(link->conn, status); + hci_conn_del(link->conn); } } @@ -795,8 +800,8 @@ static void bis_list(struct hci_conn *conn, void *data) if (bacmp(&conn->dst, BDADDR_ANY)) return; - if (d->big != conn->iso_qos.big || d->bis == BT_ISO_QOS_BIS_UNSET || - d->bis != conn->iso_qos.bis) + if (d->big != conn->iso_qos.bcast.big || d->bis == BT_ISO_QOS_BIS_UNSET || + d->bis != conn->iso_qos.bcast.bis) return; d->count++; @@ -916,10 +921,10 @@ static void bis_cleanup(struct hci_conn *conn) if (!test_and_clear_bit(HCI_CONN_PER_ADV, &conn->flags)) return; - hci_le_terminate_big(hdev, conn->iso_qos.big, - conn->iso_qos.bis); + hci_le_terminate_big(hdev, conn->iso_qos.bcast.big, + conn->iso_qos.bcast.bis); } else { - hci_le_big_terminate(hdev, conn->iso_qos.big, + hci_le_big_terminate(hdev, conn->iso_qos.bcast.big, conn->sync_handle); } } @@ -959,7 +964,7 @@ static void cis_cleanup(struct hci_conn *conn) struct iso_list_data d; memset(&d, 0, sizeof(d)); - d.cig = conn->iso_qos.cig; + d.cig = conn->iso_qos.ucast.cig; /* Check if ISO connection is a CIS and remove CIG if there are * no other connections using it. @@ -968,7 +973,7 @@ static void cis_cleanup(struct hci_conn *conn) if (d.count) return; - hci_le_remove_cig(hdev, conn->iso_qos.cig); + hci_le_remove_cig(hdev, conn->iso_qos.ucast.cig); } struct hci_conn *hci_conn_add(struct hci_dev *hdev, int type, bdaddr_t *dst, @@ -1041,6 +1046,7 @@ struct hci_conn *hci_conn_add(struct hci_dev *hdev, int type, bdaddr_t *dst, skb_queue_head_init(&conn->data_q); INIT_LIST_HEAD(&conn->chan_list); + INIT_LIST_HEAD(&conn->link_list); INIT_DELAYED_WORK(&conn->disc_work, hci_conn_timeout); INIT_DELAYED_WORK(&conn->auto_accept_work, hci_conn_auto_accept); @@ -1068,15 +1074,39 @@ struct hci_conn *hci_conn_add(struct hci_dev *hdev, int type, bdaddr_t *dst, return conn; } -static bool hci_conn_unlink(struct hci_conn *conn) +static void hci_conn_unlink(struct hci_conn *conn) { + struct hci_dev *hdev = conn->hdev; + + bt_dev_dbg(hdev, "hcon %p", conn); + + if (!conn->parent) { + struct hci_link *link, *t; + + list_for_each_entry_safe(link, t, &conn->link_list, list) + hci_conn_unlink(link->conn); + + return; + } + if (!conn->link) - return false; + return; + + hci_conn_put(conn->parent); + conn->parent = NULL; + + list_del_rcu(&conn->link->list); + synchronize_rcu(); - conn->link->link = NULL; + kfree(conn->link); conn->link = NULL; - return true; + /* Due to race, SCO connection might be not established + * yet at this point. Delete it now, otherwise it is + * possible for it to be stuck and can't be deleted. + */ + if (conn->handle == HCI_CONN_HANDLE_UNSET) + hci_conn_del(conn); } int hci_conn_del(struct hci_conn *conn) @@ -1090,18 +1120,7 @@ int hci_conn_del(struct hci_conn *conn) cancel_delayed_work_sync(&conn->idle_work); if (conn->type == ACL_LINK) { - struct hci_conn *link = conn->link; - - if (link) { - hci_conn_unlink(conn); - /* Due to race, SCO connection might be not established - * yet at this point. Delete it now, otherwise it is - * possible for it to be stuck and can't be deleted. - */ - if (link->handle == HCI_CONN_HANDLE_UNSET) - hci_conn_del(link); - } - + hci_conn_unlink(conn); /* Unacked frames */ hdev->acl_cnt += conn->sent; } else if (conn->type == LE_LINK) { @@ -1112,7 +1131,7 @@ int hci_conn_del(struct hci_conn *conn) else hdev->acl_cnt += conn->sent; } else { - struct hci_conn *acl = conn->link; + struct hci_conn *acl = conn->parent; if (acl) { hci_conn_unlink(conn); @@ -1411,7 +1430,7 @@ static int qos_set_big(struct hci_dev *hdev, struct bt_iso_qos *qos) struct iso_list_data data; /* Allocate a BIG if not set */ - if (qos->big == BT_ISO_QOS_BIG_UNSET) { + if (qos->bcast.big == BT_ISO_QOS_BIG_UNSET) { for (data.big = 0x00; data.big < 0xef; data.big++) { data.count = 0; data.bis = 0xff; @@ -1426,7 +1445,7 @@ static int qos_set_big(struct hci_dev *hdev, struct bt_iso_qos *qos) return -EADDRNOTAVAIL; /* Update BIG */ - qos->big = data.big; + qos->bcast.big = data.big; } return 0; @@ -1437,7 +1456,7 @@ static int qos_set_bis(struct hci_dev *hdev, struct bt_iso_qos *qos) struct iso_list_data data; /* Allocate BIS if not set */ - if (qos->bis == BT_ISO_QOS_BIS_UNSET) { + if (qos->bcast.bis == BT_ISO_QOS_BIS_UNSET) { /* Find an unused adv set to advertise BIS, skip instance 0x00 * since it is reserved as general purpose set. */ @@ -1455,7 +1474,7 @@ static int qos_set_bis(struct hci_dev *hdev, struct bt_iso_qos *qos) return -EADDRNOTAVAIL; /* Update BIS */ - qos->bis = data.bis; + qos->bcast.bis = data.bis; } return 0; @@ -1484,8 +1503,8 @@ static struct hci_conn *hci_add_bis(struct hci_dev *hdev, bdaddr_t *dst, if (err) return ERR_PTR(err); - data.big = qos->big; - data.bis = qos->bis; + data.big = qos->bcast.big; + data.bis = qos->bcast.bis; data.count = 0; /* Check if there is already a matching BIG/BIS */ @@ -1493,7 +1512,7 @@ static struct hci_conn *hci_add_bis(struct hci_dev *hdev, bdaddr_t *dst, if (data.count) return ERR_PTR(-EADDRINUSE); - conn = hci_conn_hash_lookup_bis(hdev, dst, qos->big, qos->bis); + conn = hci_conn_hash_lookup_bis(hdev, dst, qos->bcast.big, qos->bcast.bis); if (conn) return ERR_PTR(-EADDRINUSE); @@ -1599,11 +1618,40 @@ struct hci_conn *hci_connect_acl(struct hci_dev *hdev, bdaddr_t *dst, return acl; } +static struct hci_link *hci_conn_link(struct hci_conn *parent, + struct hci_conn *conn) +{ + struct hci_dev *hdev = parent->hdev; + struct hci_link *link; + + bt_dev_dbg(hdev, "parent %p hcon %p", parent, conn); + + if (conn->link) + return conn->link; + + if (conn->parent) + return NULL; + + link = kzalloc(sizeof(*link), GFP_KERNEL); + if (!link) + return NULL; + + link->conn = hci_conn_hold(conn); + conn->link = link; + conn->parent = hci_conn_get(parent); + + /* Use list_add_tail_rcu append to the list */ + list_add_tail_rcu(&link->list, &parent->link_list); + + return link; +} + struct hci_conn *hci_connect_sco(struct hci_dev *hdev, int type, bdaddr_t *dst, __u16 setting, struct bt_codec *codec) { struct hci_conn *acl; struct hci_conn *sco; + struct hci_link *link; acl = hci_connect_acl(hdev, dst, BT_SECURITY_LOW, HCI_AT_NO_BONDING, CONN_REASON_SCO_CONNECT); @@ -1619,10 +1667,12 @@ struct hci_conn *hci_connect_sco(struct hci_dev *hdev, int type, bdaddr_t *dst, } } - acl->link = sco; - sco->link = acl; - - hci_conn_hold(sco); + link = hci_conn_link(acl, sco); + if (!link) { + hci_conn_drop(acl); + hci_conn_drop(sco); + return NULL; + } sco->setting = setting; sco->codec = *codec; @@ -1648,13 +1698,13 @@ static void cis_add(struct iso_list_data *d, struct bt_iso_qos *qos) { struct hci_cis_params *cis = &d->pdu.cis[d->pdu.cp.num_cis]; - cis->cis_id = qos->cis; - cis->c_sdu = cpu_to_le16(qos->out.sdu); - cis->p_sdu = cpu_to_le16(qos->in.sdu); - cis->c_phy = qos->out.phy ? qos->out.phy : qos->in.phy; - cis->p_phy = qos->in.phy ? qos->in.phy : qos->out.phy; - cis->c_rtn = qos->out.rtn; - cis->p_rtn = qos->in.rtn; + cis->cis_id = qos->ucast.cis; + cis->c_sdu = cpu_to_le16(qos->ucast.out.sdu); + cis->p_sdu = cpu_to_le16(qos->ucast.in.sdu); + cis->c_phy = qos->ucast.out.phy ? qos->ucast.out.phy : qos->ucast.in.phy; + cis->p_phy = qos->ucast.in.phy ? qos->ucast.in.phy : qos->ucast.out.phy; + cis->c_rtn = qos->ucast.out.rtn; + cis->p_rtn = qos->ucast.in.rtn; d->pdu.cp.num_cis++; } @@ -1667,8 +1717,8 @@ static void cis_list(struct hci_conn *conn, void *data) if (!bacmp(&conn->dst, BDADDR_ANY)) return; - if (d->cig != conn->iso_qos.cig || d->cis == BT_ISO_QOS_CIS_UNSET || - d->cis != conn->iso_qos.cis) + if (d->cig != conn->iso_qos.ucast.cig || d->cis == BT_ISO_QOS_CIS_UNSET || + d->cis != conn->iso_qos.ucast.cis) return; d->count++; @@ -1687,18 +1737,18 @@ static int hci_le_create_big(struct hci_conn *conn, struct bt_iso_qos *qos) memset(&cp, 0, sizeof(cp)); - cp.handle = qos->big; - cp.adv_handle = qos->bis; + cp.handle = qos->bcast.big; + cp.adv_handle = qos->bcast.bis; cp.num_bis = 0x01; - hci_cpu_to_le24(qos->out.interval, cp.bis.sdu_interval); - cp.bis.sdu = cpu_to_le16(qos->out.sdu); - cp.bis.latency = cpu_to_le16(qos->out.latency); - cp.bis.rtn = qos->out.rtn; - cp.bis.phy = qos->out.phy; - cp.bis.packing = qos->packing; - cp.bis.framing = qos->framing; - cp.bis.encryption = 0x00; - memset(&cp.bis.bcode, 0, sizeof(cp.bis.bcode)); + hci_cpu_to_le24(qos->bcast.out.interval, cp.bis.sdu_interval); + cp.bis.sdu = cpu_to_le16(qos->bcast.out.sdu); + cp.bis.latency = cpu_to_le16(qos->bcast.out.latency); + cp.bis.rtn = qos->bcast.out.rtn; + cp.bis.phy = qos->bcast.out.phy; + cp.bis.packing = qos->bcast.packing; + cp.bis.framing = qos->bcast.framing; + cp.bis.encryption = qos->bcast.encryption; + memcpy(cp.bis.bcode, qos->bcast.bcode, sizeof(cp.bis.bcode)); return hci_send_cmd(hdev, HCI_OP_LE_CREATE_BIG, sizeof(cp), &cp); } @@ -1711,7 +1761,7 @@ static bool hci_le_set_cig_params(struct hci_conn *conn, struct bt_iso_qos *qos) memset(&data, 0, sizeof(data)); /* Allocate a CIG if not set */ - if (qos->cig == BT_ISO_QOS_CIG_UNSET) { + if (qos->ucast.cig == BT_ISO_QOS_CIG_UNSET) { for (data.cig = 0x00; data.cig < 0xff; data.cig++) { data.count = 0; data.cis = 0xff; @@ -1731,22 +1781,22 @@ static bool hci_le_set_cig_params(struct hci_conn *conn, struct bt_iso_qos *qos) return false; /* Update CIG */ - qos->cig = data.cig; + qos->ucast.cig = data.cig; } - data.pdu.cp.cig_id = qos->cig; - hci_cpu_to_le24(qos->out.interval, data.pdu.cp.c_interval); - hci_cpu_to_le24(qos->in.interval, data.pdu.cp.p_interval); - data.pdu.cp.sca = qos->sca; - data.pdu.cp.packing = qos->packing; - data.pdu.cp.framing = qos->framing; - data.pdu.cp.c_latency = cpu_to_le16(qos->out.latency); - data.pdu.cp.p_latency = cpu_to_le16(qos->in.latency); + data.pdu.cp.cig_id = qos->ucast.cig; + hci_cpu_to_le24(qos->ucast.out.interval, data.pdu.cp.c_interval); + hci_cpu_to_le24(qos->ucast.in.interval, data.pdu.cp.p_interval); + data.pdu.cp.sca = qos->ucast.sca; + data.pdu.cp.packing = qos->ucast.packing; + data.pdu.cp.framing = qos->ucast.framing; + data.pdu.cp.c_latency = cpu_to_le16(qos->ucast.out.latency); + data.pdu.cp.p_latency = cpu_to_le16(qos->ucast.in.latency); - if (qos->cis != BT_ISO_QOS_CIS_UNSET) { + if (qos->ucast.cis != BT_ISO_QOS_CIS_UNSET) { data.count = 0; - data.cig = qos->cig; - data.cis = qos->cis; + data.cig = qos->ucast.cig; + data.cis = qos->ucast.cis; hci_conn_hash_list_state(hdev, cis_list, ISO_LINK, BT_BOUND, &data); @@ -1757,7 +1807,7 @@ static bool hci_le_set_cig_params(struct hci_conn *conn, struct bt_iso_qos *qos) } /* Reprogram all CIS(s) with the same CIG */ - for (data.cig = qos->cig, data.cis = 0x00; data.cis < 0x11; + for (data.cig = qos->ucast.cig, data.cis = 0x00; data.cis < 0x11; data.cis++) { data.count = 0; @@ -1767,14 +1817,14 @@ static bool hci_le_set_cig_params(struct hci_conn *conn, struct bt_iso_qos *qos) continue; /* Allocate a CIS if not set */ - if (qos->cis == BT_ISO_QOS_CIS_UNSET) { + if (qos->ucast.cis == BT_ISO_QOS_CIS_UNSET) { /* Update CIS */ - qos->cis = data.cis; + qos->ucast.cis = data.cis; cis_add(&data, qos); } } - if (qos->cis == BT_ISO_QOS_CIS_UNSET || !data.pdu.cp.num_cis) + if (qos->ucast.cis == BT_ISO_QOS_CIS_UNSET || !data.pdu.cp.num_cis) return false; if (hci_send_cmd(hdev, HCI_OP_LE_SET_CIG_PARAMS, @@ -1791,7 +1841,8 @@ struct hci_conn *hci_bind_cis(struct hci_dev *hdev, bdaddr_t *dst, { struct hci_conn *cis; - cis = hci_conn_hash_lookup_cis(hdev, dst, dst_type); + cis = hci_conn_hash_lookup_cis(hdev, dst, dst_type, qos->ucast.cig, + qos->ucast.cis); if (!cis) { cis = hci_conn_add(hdev, ISO_LINK, dst, HCI_ROLE_MASTER); if (!cis) @@ -1809,32 +1860,32 @@ struct hci_conn *hci_bind_cis(struct hci_dev *hdev, bdaddr_t *dst, return cis; /* Update LINK PHYs according to QoS preference */ - cis->le_tx_phy = qos->out.phy; - cis->le_rx_phy = qos->in.phy; + cis->le_tx_phy = qos->ucast.out.phy; + cis->le_rx_phy = qos->ucast.in.phy; /* If output interval is not set use the input interval as it cannot be * 0x000000. */ - if (!qos->out.interval) - qos->out.interval = qos->in.interval; + if (!qos->ucast.out.interval) + qos->ucast.out.interval = qos->ucast.in.interval; /* If input interval is not set use the output interval as it cannot be * 0x000000. */ - if (!qos->in.interval) - qos->in.interval = qos->out.interval; + if (!qos->ucast.in.interval) + qos->ucast.in.interval = qos->ucast.out.interval; /* If output latency is not set use the input latency as it cannot be * 0x0000. */ - if (!qos->out.latency) - qos->out.latency = qos->in.latency; + if (!qos->ucast.out.latency) + qos->ucast.out.latency = qos->ucast.in.latency; /* If input latency is not set use the output latency as it cannot be * 0x0000. */ - if (!qos->in.latency) - qos->in.latency = qos->out.latency; + if (!qos->ucast.in.latency) + qos->ucast.in.latency = qos->ucast.out.latency; if (!hci_le_set_cig_params(cis, qos)) { hci_conn_drop(cis); @@ -1854,7 +1905,7 @@ bool hci_iso_setup_path(struct hci_conn *conn) memset(&cmd, 0, sizeof(cmd)); - if (conn->iso_qos.out.sdu) { + if (conn->iso_qos.ucast.out.sdu) { cmd.handle = cpu_to_le16(conn->handle); cmd.direction = 0x00; /* Input (Host to Controller) */ cmd.path = 0x00; /* HCI path if enabled */ @@ -1865,7 +1916,7 @@ bool hci_iso_setup_path(struct hci_conn *conn) return false; } - if (conn->iso_qos.in.sdu) { + if (conn->iso_qos.ucast.in.sdu) { cmd.handle = cpu_to_le16(conn->handle); cmd.direction = 0x01; /* Output (Controller to Host) */ cmd.path = 0x00; /* HCI path if enabled */ @@ -1881,76 +1932,39 @@ bool hci_iso_setup_path(struct hci_conn *conn) static int hci_create_cis_sync(struct hci_dev *hdev, void *data) { - struct { - struct hci_cp_le_create_cis cp; - struct hci_cis cis[0x1f]; - } cmd; - struct hci_conn *conn = data; - u8 cig; - - memset(&cmd, 0, sizeof(cmd)); - cmd.cis[0].acl_handle = cpu_to_le16(conn->link->handle); - cmd.cis[0].cis_handle = cpu_to_le16(conn->handle); - cmd.cp.num_cis++; - cig = conn->iso_qos.cig; - - hci_dev_lock(hdev); - - rcu_read_lock(); - - list_for_each_entry_rcu(conn, &hdev->conn_hash.list, list) { - struct hci_cis *cis = &cmd.cis[cmd.cp.num_cis]; - - if (conn == data || conn->type != ISO_LINK || - conn->state == BT_CONNECTED || conn->iso_qos.cig != cig) - continue; - - /* Check if all CIS(s) belonging to a CIG are ready */ - if (!conn->link || conn->link->state != BT_CONNECTED || - conn->state != BT_CONNECT) { - cmd.cp.num_cis = 0; - break; - } - - /* Group all CIS with state BT_CONNECT since the spec don't - * allow to send them individually: - * - * BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 4, Part E - * page 2566: - * - * If the Host issues this command before all the - * HCI_LE_CIS_Established events from the previous use of the - * command have been generated, the Controller shall return the - * error code Command Disallowed (0x0C). - */ - cis->acl_handle = cpu_to_le16(conn->link->handle); - cis->cis_handle = cpu_to_le16(conn->handle); - cmd.cp.num_cis++; - } - - rcu_read_unlock(); - - hci_dev_unlock(hdev); - - if (!cmd.cp.num_cis) - return 0; - - return hci_send_cmd(hdev, HCI_OP_LE_CREATE_CIS, sizeof(cmd.cp) + - sizeof(cmd.cis[0]) * cmd.cp.num_cis, &cmd); + return hci_le_create_cis_sync(hdev, data); } int hci_le_create_cis(struct hci_conn *conn) { struct hci_conn *cis; + struct hci_link *link, *t; struct hci_dev *hdev = conn->hdev; int err; + bt_dev_dbg(hdev, "hcon %p", conn); + switch (conn->type) { case LE_LINK: - if (!conn->link || conn->state != BT_CONNECTED) + if (conn->state != BT_CONNECTED || list_empty(&conn->link_list)) return -EINVAL; - cis = conn->link; - break; + + cis = NULL; + + /* hci_conn_link uses list_add_tail_rcu so the list is in + * the same order as the connections are requested. + */ + list_for_each_entry_safe(link, t, &conn->link_list, list) { + if (link->conn->state == BT_BOUND) { + err = hci_le_create_cis(link->conn); + if (err) + return err; + + cis = link->conn; + } + } + + return cis ? 0 : -EINVAL; case ISO_LINK: cis = conn; break; @@ -2002,8 +2016,8 @@ static void hci_bind_bis(struct hci_conn *conn, struct bt_iso_qos *qos) { /* Update LINK PHYs according to QoS preference */ - conn->le_tx_phy = qos->out.phy; - conn->le_tx_phy = qos->out.phy; + conn->le_tx_phy = qos->bcast.out.phy; + conn->le_tx_phy = qos->bcast.out.phy; conn->iso_qos = *qos; conn->state = BT_BOUND; } @@ -2016,16 +2030,16 @@ static int create_big_sync(struct hci_dev *hdev, void *data) u32 flags = 0; int err; - if (qos->out.phy == 0x02) + if (qos->bcast.out.phy == 0x02) flags |= MGMT_ADV_FLAG_SEC_2M; /* Align intervals */ - interval = qos->out.interval / 1250; + interval = qos->bcast.out.interval / 1250; - if (qos->bis) - sync_interval = qos->sync_interval * 1600; + if (qos->bcast.bis) + sync_interval = qos->bcast.sync_interval * 1600; - err = hci_start_per_adv_sync(hdev, qos->bis, conn->le_per_adv_data_len, + err = hci_start_per_adv_sync(hdev, qos->bcast.bis, conn->le_per_adv_data_len, conn->le_per_adv_data, flags, interval, interval, sync_interval); if (err) @@ -2062,7 +2076,7 @@ static int create_pa_sync(struct hci_dev *hdev, void *data) } int hci_pa_create_sync(struct hci_dev *hdev, bdaddr_t *dst, __u8 dst_type, - __u8 sid) + __u8 sid, struct bt_iso_qos *qos) { struct hci_cp_le_pa_create_sync *cp; @@ -2075,9 +2089,13 @@ int hci_pa_create_sync(struct hci_dev *hdev, bdaddr_t *dst, __u8 dst_type, return -ENOMEM; } + cp->options = qos->bcast.options; cp->sid = sid; cp->addr_type = dst_type; bacpy(&cp->addr, dst); + cp->skip = cpu_to_le16(qos->bcast.skip); + cp->sync_timeout = cpu_to_le16(qos->bcast.sync_timeout); + cp->sync_cte_type = qos->bcast.sync_cte_type; /* Queue start pa_create_sync and scan */ return hci_cmd_sync_queue(hdev, create_pa_sync, cp, create_pa_complete); @@ -2100,8 +2118,12 @@ int hci_le_big_create_sync(struct hci_dev *hdev, struct bt_iso_qos *qos, return err; memset(&pdu, 0, sizeof(pdu)); - pdu.cp.handle = qos->big; + pdu.cp.handle = qos->bcast.big; pdu.cp.sync_handle = cpu_to_le16(sync_handle); + pdu.cp.encryption = qos->bcast.encryption; + memcpy(pdu.cp.bcode, qos->bcast.bcode, sizeof(pdu.cp.bcode)); + pdu.cp.mse = qos->bcast.mse; + pdu.cp.timeout = cpu_to_le16(qos->bcast.timeout); pdu.cp.num_bis = num_bis; memcpy(pdu.bis, bis, num_bis); @@ -2151,7 +2173,7 @@ struct hci_conn *hci_connect_bis(struct hci_dev *hdev, bdaddr_t *dst, return ERR_PTR(err); } - hci_iso_qos_setup(hdev, conn, &qos->out, + hci_iso_qos_setup(hdev, conn, &qos->bcast.out, conn->le_tx_phy ? conn->le_tx_phy : hdev->le_tx_def_phys); @@ -2163,6 +2185,7 @@ struct hci_conn *hci_connect_cis(struct hci_dev *hdev, bdaddr_t *dst, { struct hci_conn *le; struct hci_conn *cis; + struct hci_link *link; if (hci_dev_test_flag(hdev, HCI_ADVERTISING)) le = hci_connect_le(hdev, dst, dst_type, false, @@ -2177,9 +2200,9 @@ struct hci_conn *hci_connect_cis(struct hci_dev *hdev, bdaddr_t *dst, if (IS_ERR(le)) return le; - hci_iso_qos_setup(hdev, le, &qos->out, + hci_iso_qos_setup(hdev, le, &qos->ucast.out, le->le_tx_phy ? le->le_tx_phy : hdev->le_tx_def_phys); - hci_iso_qos_setup(hdev, le, &qos->in, + hci_iso_qos_setup(hdev, le, &qos->ucast.in, le->le_rx_phy ? le->le_rx_phy : hdev->le_rx_def_phys); cis = hci_bind_cis(hdev, dst, dst_type, qos); @@ -2188,16 +2211,18 @@ struct hci_conn *hci_connect_cis(struct hci_dev *hdev, bdaddr_t *dst, return cis; } - le->link = cis; - cis->link = le; - - hci_conn_hold(cis); + link = hci_conn_link(le, cis); + if (!link) { + hci_conn_drop(le); + hci_conn_drop(cis); + return NULL; + } /* If LE is already connected and CIS handle is already set proceed to * Create CIS immediately. */ if (le->state == BT_CONNECTED && cis->handle != HCI_CONN_HANDLE_UNSET) - hci_le_create_cis(le); + hci_le_create_cis(cis); return cis; } diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c index 334e308451f5..a856b1051d35 100644 --- a/net/bluetooth/hci_core.c +++ b/net/bluetooth/hci_core.c @@ -2544,6 +2544,7 @@ struct hci_dev *hci_alloc_dev_priv(int sizeof_priv) INIT_DELAYED_WORK(&hdev->cmd_timer, hci_cmd_timeout); INIT_DELAYED_WORK(&hdev->ncmd_timer, hci_ncmd_timeout); + hci_devcd_setup(hdev); hci_request_setup(hdev); hci_init_sysfs(hdev); @@ -2802,6 +2803,9 @@ int hci_suspend_dev(struct hci_dev *hdev) if (mgmt_powering_down(hdev)) return 0; + /* Cancel potentially blocking sync operation before suspend */ + __hci_cmd_sync_cancel(hdev, -EHOSTDOWN); + hci_req_sync_lock(hdev); ret = hci_suspend_sync(hdev); hci_req_sync_unlock(hdev); diff --git a/net/bluetooth/hci_debugfs.c b/net/bluetooth/hci_debugfs.c index b7f682922a16..ec0df2f9188e 100644 --- a/net/bluetooth/hci_debugfs.c +++ b/net/bluetooth/hci_debugfs.c @@ -189,7 +189,7 @@ static int uuids_show(struct seq_file *f, void *p) } hci_dev_unlock(hdev); - return 0; + return 0; } DEFINE_SHOW_ATTRIBUTE(uuids); diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c index e87c928c9e17..d00ef6e3fc45 100644 --- a/net/bluetooth/hci_event.c +++ b/net/bluetooth/hci_event.c @@ -1,6 +1,7 @@ /* BlueZ - Bluetooth protocol stack for Linux Copyright (c) 2000-2001, 2010, Code Aurora Forum. All rights reserved. + Copyright 2023 NXP Written 2000,2001 by Maxim Krasnyansky <maxk@qualcomm.com> @@ -886,8 +887,13 @@ static u8 hci_cc_read_local_ext_features(struct hci_dev *hdev, void *data, if (rp->status) return rp->status; - if (hdev->max_page < rp->max_page) - hdev->max_page = rp->max_page; + if (hdev->max_page < rp->max_page) { + if (test_bit(HCI_QUIRK_BROKEN_LOCAL_EXT_FEATURES_PAGE_2, + &hdev->quirks)) + bt_dev_warn(hdev, "broken local ext features page 2"); + else + hdev->max_page = rp->max_page; + } if (rp->page < HCI_MAX_PAGES) memcpy(hdev->features[rp->page], rp->features, 8); @@ -2339,7 +2345,8 @@ static void hci_cs_create_conn(struct hci_dev *hdev, __u8 status) static void hci_cs_add_sco(struct hci_dev *hdev, __u8 status) { struct hci_cp_add_sco *cp; - struct hci_conn *acl, *sco; + struct hci_conn *acl; + struct hci_link *link; __u16 handle; bt_dev_dbg(hdev, "status 0x%2.2x", status); @@ -2359,12 +2366,13 @@ static void hci_cs_add_sco(struct hci_dev *hdev, __u8 status) acl = hci_conn_hash_lookup_handle(hdev, handle); if (acl) { - sco = acl->link; - if (sco) { - sco->state = BT_CLOSED; + link = list_first_entry_or_null(&acl->link_list, + struct hci_link, list); + if (link && link->conn) { + link->conn->state = BT_CLOSED; - hci_connect_cfm(sco, status); - hci_conn_del(sco); + hci_connect_cfm(link->conn, status); + hci_conn_del(link->conn); } } @@ -2631,74 +2639,61 @@ static void hci_cs_read_remote_ext_features(struct hci_dev *hdev, __u8 status) hci_dev_unlock(hdev); } -static void hci_cs_setup_sync_conn(struct hci_dev *hdev, __u8 status) +static void hci_setup_sync_conn_status(struct hci_dev *hdev, __u16 handle, + __u8 status) { - struct hci_cp_setup_sync_conn *cp; - struct hci_conn *acl, *sco; - __u16 handle; + struct hci_conn *acl; + struct hci_link *link; - bt_dev_dbg(hdev, "status 0x%2.2x", status); - - if (!status) - return; - - cp = hci_sent_cmd_data(hdev, HCI_OP_SETUP_SYNC_CONN); - if (!cp) - return; - - handle = __le16_to_cpu(cp->handle); - - bt_dev_dbg(hdev, "handle 0x%4.4x", handle); + bt_dev_dbg(hdev, "handle 0x%4.4x status 0x%2.2x", handle, status); hci_dev_lock(hdev); acl = hci_conn_hash_lookup_handle(hdev, handle); if (acl) { - sco = acl->link; - if (sco) { - sco->state = BT_CLOSED; + link = list_first_entry_or_null(&acl->link_list, + struct hci_link, list); + if (link && link->conn) { + link->conn->state = BT_CLOSED; - hci_connect_cfm(sco, status); - hci_conn_del(sco); + hci_connect_cfm(link->conn, status); + hci_conn_del(link->conn); } } hci_dev_unlock(hdev); } -static void hci_cs_enhanced_setup_sync_conn(struct hci_dev *hdev, __u8 status) +static void hci_cs_setup_sync_conn(struct hci_dev *hdev, __u8 status) { - struct hci_cp_enhanced_setup_sync_conn *cp; - struct hci_conn *acl, *sco; - __u16 handle; + struct hci_cp_setup_sync_conn *cp; bt_dev_dbg(hdev, "status 0x%2.2x", status); if (!status) return; - cp = hci_sent_cmd_data(hdev, HCI_OP_ENHANCED_SETUP_SYNC_CONN); + cp = hci_sent_cmd_data(hdev, HCI_OP_SETUP_SYNC_CONN); if (!cp) return; - handle = __le16_to_cpu(cp->handle); + hci_setup_sync_conn_status(hdev, __le16_to_cpu(cp->handle), status); +} - bt_dev_dbg(hdev, "handle 0x%4.4x", handle); +static void hci_cs_enhanced_setup_sync_conn(struct hci_dev *hdev, __u8 status) +{ + struct hci_cp_enhanced_setup_sync_conn *cp; - hci_dev_lock(hdev); + bt_dev_dbg(hdev, "status 0x%2.2x", status); - acl = hci_conn_hash_lookup_handle(hdev, handle); - if (acl) { - sco = acl->link; - if (sco) { - sco->state = BT_CLOSED; + if (!status) + return; - hci_connect_cfm(sco, status); - hci_conn_del(sco); - } - } + cp = hci_sent_cmd_data(hdev, HCI_OP_ENHANCED_SETUP_SYNC_CONN); + if (!cp) + return; - hci_dev_unlock(hdev); + hci_setup_sync_conn_status(hdev, __le16_to_cpu(cp->handle), status); } static void hci_cs_sniff_mode(struct hci_dev *hdev, __u8 status) @@ -3828,19 +3823,20 @@ static u8 hci_cc_le_set_cig_params(struct hci_dev *hdev, void *data, rcu_read_lock(); list_for_each_entry_rcu(conn, &hdev->conn_hash.list, list) { - if (conn->type != ISO_LINK || conn->iso_qos.cig != rp->cig_id || + if (conn->type != ISO_LINK || + conn->iso_qos.ucast.cig != rp->cig_id || conn->state == BT_CONNECTED) continue; conn->handle = __le16_to_cpu(rp->handle[i++]); - bt_dev_dbg(hdev, "%p handle 0x%4.4x link %p", conn, - conn->handle, conn->link); + bt_dev_dbg(hdev, "%p handle 0x%4.4x parent %p", conn, + conn->handle, conn->parent); /* Create CIS if LE is already connected */ - if (conn->link && conn->link->state == BT_CONNECTED) { + if (conn->parent && conn->parent->state == BT_CONNECTED) { rcu_read_unlock(); - hci_le_create_cis(conn->link); + hci_le_create_cis(conn); rcu_read_lock(); } @@ -3885,7 +3881,7 @@ static u8 hci_cc_le_setup_iso_path(struct hci_dev *hdev, void *data, /* Input (Host to Controller) */ case 0x00: /* Only confirm connection if output only */ - if (conn->iso_qos.out.sdu && !conn->iso_qos.in.sdu) + if (conn->iso_qos.ucast.out.sdu && !conn->iso_qos.ucast.in.sdu) hci_connect_cfm(conn, rp->status); break; /* Output (Controller to Host) */ @@ -5025,7 +5021,7 @@ static void hci_sync_conn_complete_evt(struct hci_dev *hdev, void *data, if (conn->out) { conn->pkt_type = (hdev->esco_type & SCO_ESCO_MASK) | (hdev->esco_type & EDR_ESCO_MASK); - if (hci_setup_sync(conn, conn->link->handle)) + if (hci_setup_sync(conn, conn->parent->handle)) goto unlock; } fallthrough; @@ -6813,15 +6809,15 @@ static void hci_le_cis_estabilished_evt(struct hci_dev *hdev, void *data, memset(&interval, 0, sizeof(interval)); memcpy(&interval, ev->c_latency, sizeof(ev->c_latency)); - conn->iso_qos.in.interval = le32_to_cpu(interval); + conn->iso_qos.ucast.in.interval = le32_to_cpu(interval); memcpy(&interval, ev->p_latency, sizeof(ev->p_latency)); - conn->iso_qos.out.interval = le32_to_cpu(interval); - conn->iso_qos.in.latency = le16_to_cpu(ev->interval); - conn->iso_qos.out.latency = le16_to_cpu(ev->interval); - conn->iso_qos.in.sdu = le16_to_cpu(ev->c_mtu); - conn->iso_qos.out.sdu = le16_to_cpu(ev->p_mtu); - conn->iso_qos.in.phy = ev->c_phy; - conn->iso_qos.out.phy = ev->p_phy; + conn->iso_qos.ucast.out.interval = le32_to_cpu(interval); + conn->iso_qos.ucast.in.latency = le16_to_cpu(ev->interval); + conn->iso_qos.ucast.out.latency = le16_to_cpu(ev->interval); + conn->iso_qos.ucast.in.sdu = le16_to_cpu(ev->c_mtu); + conn->iso_qos.ucast.out.sdu = le16_to_cpu(ev->p_mtu); + conn->iso_qos.ucast.in.phy = ev->c_phy; + conn->iso_qos.ucast.out.phy = ev->p_phy; } if (!ev->status) { @@ -6895,8 +6891,8 @@ static void hci_le_cis_req_evt(struct hci_dev *hdev, void *data, cis->handle = cis_handle; } - cis->iso_qos.cig = ev->cig_id; - cis->iso_qos.cis = ev->cis_id; + cis->iso_qos.ucast.cig = ev->cig_id; + cis->iso_qos.ucast.cis = ev->cis_id; if (!(flags & HCI_PROTO_DEFER)) { hci_le_accept_cis(hdev, ev->cis_handle); @@ -6983,13 +6979,13 @@ static void hci_le_big_sync_established_evt(struct hci_dev *hdev, void *data, bis->handle = handle; } - bis->iso_qos.big = ev->handle; + bis->iso_qos.bcast.big = ev->handle; memset(&interval, 0, sizeof(interval)); memcpy(&interval, ev->latency, sizeof(ev->latency)); - bis->iso_qos.in.interval = le32_to_cpu(interval); + bis->iso_qos.bcast.in.interval = le32_to_cpu(interval); /* Convert ISO Interval (1.25 ms slots) to latency (ms) */ - bis->iso_qos.in.latency = le16_to_cpu(ev->interval) * 125 / 100; - bis->iso_qos.in.sdu = le16_to_cpu(ev->max_pdu); + bis->iso_qos.bcast.in.latency = le16_to_cpu(ev->interval) * 125 / 100; + bis->iso_qos.bcast.in.sdu = le16_to_cpu(ev->max_pdu); hci_iso_setup_path(bis); } diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c index 06581223238c..1d249d839819 100644 --- a/net/bluetooth/hci_sock.c +++ b/net/bluetooth/hci_sock.c @@ -987,6 +987,34 @@ static int hci_sock_ioctl(struct socket *sock, unsigned int cmd, BT_DBG("cmd %x arg %lx", cmd, arg); + /* Make sure the cmd is valid before doing anything */ + switch (cmd) { + case HCIGETDEVLIST: + case HCIGETDEVINFO: + case HCIGETCONNLIST: + case HCIDEVUP: + case HCIDEVDOWN: + case HCIDEVRESET: + case HCIDEVRESTAT: + case HCISETSCAN: + case HCISETAUTH: + case HCISETENCRYPT: + case HCISETPTYPE: + case HCISETLINKPOL: + case HCISETLINKMODE: + case HCISETACLMTU: + case HCISETSCOMTU: + case HCIINQUIRY: + case HCISETRAW: + case HCIGETCONNINFO: + case HCIGETAUTHINFO: + case HCIBLOCKADDR: + case HCIUNBLOCKADDR: + break; + default: + return -ENOIOCTLCMD; + } + lock_sock(sk); if (hci_pi(sk)->channel != HCI_CHANNEL_RAW) { @@ -1003,7 +1031,14 @@ static int hci_sock_ioctl(struct socket *sock, unsigned int cmd, if (hci_sock_gen_cookie(sk)) { struct sk_buff *skb; - if (capable(CAP_NET_ADMIN)) + /* Perform careful checks before setting the HCI_SOCK_TRUSTED + * flag. Make sure that not only the current task but also + * the socket opener has the required capability, since + * privileged programs can be tricked into making ioctl calls + * on HCI sockets, and the socket should not be marked as + * trusted simply because the ioctl caller is privileged. + */ + if (sk_capable(sk, CAP_NET_ADMIN)) hci_sock_set_flag(sk, HCI_SOCK_TRUSTED); /* Send event to monitor */ diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c index 632be1267288..647a8ce54062 100644 --- a/net/bluetooth/hci_sync.c +++ b/net/bluetooth/hci_sync.c @@ -684,8 +684,12 @@ void hci_cmd_sync_cancel(struct hci_dev *hdev, int err) } EXPORT_SYMBOL(hci_cmd_sync_cancel); -int hci_cmd_sync_queue(struct hci_dev *hdev, hci_cmd_sync_work_func_t func, - void *data, hci_cmd_sync_work_destroy_t destroy) +/* Submit HCI command to be run in as cmd_sync_work: + * + * - hdev must _not_ be unregistered + */ +int hci_cmd_sync_submit(struct hci_dev *hdev, hci_cmd_sync_work_func_t func, + void *data, hci_cmd_sync_work_destroy_t destroy) { struct hci_cmd_sync_work_entry *entry; @@ -708,6 +712,23 @@ int hci_cmd_sync_queue(struct hci_dev *hdev, hci_cmd_sync_work_func_t func, return 0; } +EXPORT_SYMBOL(hci_cmd_sync_submit); + +/* Queue HCI command: + * + * - hdev must be running + */ +int hci_cmd_sync_queue(struct hci_dev *hdev, hci_cmd_sync_work_func_t func, + void *data, hci_cmd_sync_work_destroy_t destroy) +{ + /* Only queue command if hdev is running which means it had been opened + * and is either on init phase or is already up. + */ + if (!test_bit(HCI_RUNNING, &hdev->flags)) + return -ENETDOWN; + + return hci_cmd_sync_submit(hdev, func, data, destroy); +} EXPORT_SYMBOL(hci_cmd_sync_queue); int hci_update_eir_sync(struct hci_dev *hdev) @@ -2403,7 +2424,7 @@ static int hci_pause_addr_resolution(struct hci_dev *hdev) /* Return if address resolution is disabled and RPA is not used. */ if (!err && scan_use_rpa(hdev)) - return err; + return 0; hci_resume_advertising_sync(hdev); return err; @@ -4093,7 +4114,8 @@ static int hci_le_set_rpa_timeout_sync(struct hci_dev *hdev) { __le16 timeout = cpu_to_le16(hdev->rpa_timeout); - if (!(hdev->commands[35] & 0x04)) + if (!(hdev->commands[35] & 0x04) || + test_bit(HCI_QUIRK_BROKEN_SET_RPA_TIMEOUT, &hdev->quirks)) return 0; return __hci_cmd_sync_status(hdev, HCI_OP_LE_SET_RPA_TIMEOUT, @@ -4414,18 +4436,38 @@ static int hci_le_set_write_def_data_len_sync(struct hci_dev *hdev) sizeof(cp), &cp, HCI_CMD_TIMEOUT); } -/* Set Default PHY parameters if command is supported */ +/* Set Default PHY parameters if command is supported, enables all supported + * PHYs according to the LE Features bits. + */ static int hci_le_set_default_phy_sync(struct hci_dev *hdev) { struct hci_cp_le_set_default_phy cp; - if (!(hdev->commands[35] & 0x20)) + if (!(hdev->commands[35] & 0x20)) { + /* If the command is not supported it means only 1M PHY is + * supported. + */ + hdev->le_tx_def_phys = HCI_LE_SET_PHY_1M; + hdev->le_rx_def_phys = HCI_LE_SET_PHY_1M; return 0; + } memset(&cp, 0, sizeof(cp)); cp.all_phys = 0x00; - cp.tx_phys = hdev->le_tx_def_phys; - cp.rx_phys = hdev->le_rx_def_phys; + cp.tx_phys = HCI_LE_SET_PHY_1M; + cp.rx_phys = HCI_LE_SET_PHY_1M; + + /* Enables 2M PHY if supported */ + if (le_2m_capable(hdev)) { + cp.tx_phys |= HCI_LE_SET_PHY_2M; + cp.rx_phys |= HCI_LE_SET_PHY_2M; + } + + /* Enables Coded PHY if supported */ + if (le_coded_capable(hdev)) { + cp.tx_phys |= HCI_LE_SET_PHY_CODED; + cp.rx_phys |= HCI_LE_SET_PHY_CODED; + } return __hci_cmd_sync_status(hdev, HCI_OP_LE_SET_DEFAULT_PHY, sizeof(cp), &cp, HCI_CMD_TIMEOUT); @@ -4533,6 +4575,9 @@ static const struct { "HCI Set Event Filter command not supported."), HCI_QUIRK_BROKEN(ENHANCED_SETUP_SYNC_CONN, "HCI Enhanced Setup Synchronous Connection command is " + "advertised, but not supported."), + HCI_QUIRK_BROKEN(SET_RPA_TIMEOUT, + "HCI LE Set Random Private Address Timeout command is " "advertised, but not supported.") }; @@ -4727,6 +4772,8 @@ int hci_dev_open_sync(struct hci_dev *hdev) goto done; } + hci_devcd_reset(hdev); + set_bit(HCI_RUNNING, &hdev->flags); hci_sock_dev_event(hdev, HCI_DEV_OPEN); @@ -5108,10 +5155,12 @@ static int hci_disconnect_sync(struct hci_dev *hdev, struct hci_conn *conn, cp.handle = cpu_to_le16(conn->handle); cp.reason = reason; - /* Wait for HCI_EV_DISCONN_COMPLETE not HCI_EV_CMD_STATUS when not - * suspending. + /* Wait for HCI_EV_DISCONN_COMPLETE, not HCI_EV_CMD_STATUS, when the + * reason is anything but HCI_ERROR_REMOTE_POWER_OFF. This reason is + * used when suspending or powering off, where we don't want to wait + * for the peer's response. */ - if (!hdev->suspended) + if (reason != HCI_ERROR_REMOTE_POWER_OFF) return __hci_cmd_sync_status_sk(hdev, HCI_OP_DISCONNECT, sizeof(cp), &cp, HCI_EV_DISCONN_COMPLETE, @@ -5853,7 +5902,6 @@ static int hci_le_ext_directed_advertising_sync(struct hci_dev *hdev, memset(&cp, 0, sizeof(cp)); cp.evt_properties = cpu_to_le16(LE_LEGACY_ADV_DIRECT_IND); - cp.own_addr_type = own_addr_type; cp.channel_map = hdev->le_adv_channel_map; cp.tx_power = HCI_TX_POWER_INVALID; cp.primary_phy = HCI_ADV_PHY_1M; @@ -6114,6 +6162,71 @@ done: return err; } +int hci_le_create_cis_sync(struct hci_dev *hdev, struct hci_conn *conn) +{ + struct { + struct hci_cp_le_create_cis cp; + struct hci_cis cis[0x1f]; + } cmd; + u8 cig; + struct hci_conn *hcon = conn; + + memset(&cmd, 0, sizeof(cmd)); + cmd.cis[0].acl_handle = cpu_to_le16(conn->parent->handle); + cmd.cis[0].cis_handle = cpu_to_le16(conn->handle); + cmd.cp.num_cis++; + cig = conn->iso_qos.ucast.cig; + + hci_dev_lock(hdev); + + rcu_read_lock(); + + list_for_each_entry_rcu(conn, &hdev->conn_hash.list, list) { + struct hci_cis *cis = &cmd.cis[cmd.cp.num_cis]; + + if (conn == hcon || conn->type != ISO_LINK || + conn->state == BT_CONNECTED || + conn->iso_qos.ucast.cig != cig) + continue; + + /* Check if all CIS(s) belonging to a CIG are ready */ + if (!conn->parent || conn->parent->state != BT_CONNECTED || + conn->state != BT_CONNECT) { + cmd.cp.num_cis = 0; + break; + } + + /* Group all CIS with state BT_CONNECT since the spec don't + * allow to send them individually: + * + * BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 4, Part E + * page 2566: + * + * If the Host issues this command before all the + * HCI_LE_CIS_Established events from the previous use of the + * command have been generated, the Controller shall return the + * error code Command Disallowed (0x0C). + */ + cis->acl_handle = cpu_to_le16(conn->parent->handle); + cis->cis_handle = cpu_to_le16(conn->handle); + cmd.cp.num_cis++; + } + + rcu_read_unlock(); + + hci_dev_unlock(hdev); + + if (!cmd.cp.num_cis) + return 0; + + /* Wait for HCI_LE_CIS_Established */ + return __hci_cmd_sync_status_sk(hdev, HCI_OP_LE_CREATE_CIS, + sizeof(cmd.cp) + sizeof(cmd.cis[0]) * + cmd.cp.num_cis, &cmd, + HCI_EVT_LE_CIS_ESTABLISHED, + conn->conn_timeout, NULL); +} + int hci_le_remove_cig_sync(struct hci_dev *hdev, u8 handle) { struct hci_cp_le_remove_cig cp; diff --git a/net/bluetooth/iso.c b/net/bluetooth/iso.c index 8d136a730163..34d55a85d8f6 100644 --- a/net/bluetooth/iso.c +++ b/net/bluetooth/iso.c @@ -3,6 +3,7 @@ * BlueZ - Bluetooth protocol stack for Linux * * Copyright (C) 2022 Intel Corporation + * Copyright 2023 NXP */ #include <linux/module.h> @@ -59,11 +60,17 @@ struct iso_pinfo { __u16 sync_handle; __u32 flags; struct bt_iso_qos qos; + bool qos_user_set; __u8 base_len; __u8 base[BASE_MAX_LENGTH]; struct iso_conn *conn; }; +static struct bt_iso_qos default_qos; + +static bool check_ucast_qos(struct bt_iso_qos *qos); +static bool check_bcast_qos(struct bt_iso_qos *qos); + /* ---- ISO timers ---- */ #define ISO_CONN_TIMEOUT (HZ * 40) #define ISO_DISCONN_TIMEOUT (HZ * 2) @@ -264,8 +271,15 @@ static int iso_connect_bis(struct sock *sk) goto unlock; } + /* Fail if user set invalid QoS */ + if (iso_pi(sk)->qos_user_set && !check_bcast_qos(&iso_pi(sk)->qos)) { + iso_pi(sk)->qos = default_qos; + err = -EINVAL; + goto unlock; + } + /* Fail if out PHYs are marked as disabled */ - if (!iso_pi(sk)->qos.out.phy) { + if (!iso_pi(sk)->qos.bcast.out.phy) { err = -EINVAL; goto unlock; } @@ -336,8 +350,15 @@ static int iso_connect_cis(struct sock *sk) goto unlock; } + /* Fail if user set invalid QoS */ + if (iso_pi(sk)->qos_user_set && !check_ucast_qos(&iso_pi(sk)->qos)) { + iso_pi(sk)->qos = default_qos; + err = -EINVAL; + goto unlock; + } + /* Fail if either PHYs are marked as disabled */ - if (!iso_pi(sk)->qos.in.phy && !iso_pi(sk)->qos.out.phy) { + if (!iso_pi(sk)->qos.ucast.in.phy && !iso_pi(sk)->qos.ucast.out.phy) { err = -EINVAL; goto unlock; } @@ -417,7 +438,7 @@ static int iso_send_frame(struct sock *sk, struct sk_buff *skb) BT_DBG("sk %p len %d", sk, skb->len); - if (skb->len > qos->out.sdu) + if (skb->len > qos->ucast.out.sdu) return -EMSGSIZE; len = skb->len; @@ -680,13 +701,23 @@ static struct proto iso_proto = { } static struct bt_iso_qos default_qos = { - .cig = BT_ISO_QOS_CIG_UNSET, - .cis = BT_ISO_QOS_CIS_UNSET, - .sca = 0x00, - .packing = 0x00, - .framing = 0x00, - .in = DEFAULT_IO_QOS, - .out = DEFAULT_IO_QOS, + .bcast = { + .big = BT_ISO_QOS_BIG_UNSET, + .bis = BT_ISO_QOS_BIS_UNSET, + .sync_interval = 0x00, + .packing = 0x00, + .framing = 0x00, + .in = DEFAULT_IO_QOS, + .out = DEFAULT_IO_QOS, + .encryption = 0x00, + .bcode = {0x00}, + .options = 0x00, + .skip = 0x0000, + .sync_timeout = 0x4000, + .sync_cte_type = 0x00, + .mse = 0x00, + .timeout = 0x4000, + }, }; static struct sock *iso_sock_alloc(struct net *net, struct socket *sock, @@ -893,9 +924,15 @@ static int iso_listen_bis(struct sock *sk) if (!hdev) return -EHOSTUNREACH; + /* Fail if user set invalid QoS */ + if (iso_pi(sk)->qos_user_set && !check_bcast_qos(&iso_pi(sk)->qos)) { + iso_pi(sk)->qos = default_qos; + return -EINVAL; + } + err = hci_pa_create_sync(hdev, &iso_pi(sk)->dst, le_addr_type(iso_pi(sk)->dst_type), - iso_pi(sk)->bc_sid); + iso_pi(sk)->bc_sid, &iso_pi(sk)->qos); hci_dev_put(hdev); @@ -1154,21 +1191,62 @@ static bool check_io_qos(struct bt_iso_io_qos *qos) return true; } -static bool check_qos(struct bt_iso_qos *qos) +static bool check_ucast_qos(struct bt_iso_qos *qos) { - if (qos->sca > 0x07) + if (qos->ucast.sca > 0x07) return false; - if (qos->packing > 0x01) + if (qos->ucast.packing > 0x01) return false; - if (qos->framing > 0x01) + if (qos->ucast.framing > 0x01) return false; - if (!check_io_qos(&qos->in)) + if (!check_io_qos(&qos->ucast.in)) return false; - if (!check_io_qos(&qos->out)) + if (!check_io_qos(&qos->ucast.out)) + return false; + + return true; +} + +static bool check_bcast_qos(struct bt_iso_qos *qos) +{ + if (qos->bcast.sync_interval > 0x07) + return false; + + if (qos->bcast.packing > 0x01) + return false; + + if (qos->bcast.framing > 0x01) + return false; + + if (!check_io_qos(&qos->bcast.in)) + return false; + + if (!check_io_qos(&qos->bcast.out)) + return false; + + if (qos->bcast.encryption > 0x01) + return false; + + if (qos->bcast.options > 0x07) + return false; + + if (qos->bcast.skip > 0x01f3) + return false; + + if (qos->bcast.sync_timeout < 0x000a || qos->bcast.sync_timeout > 0x4000) + return false; + + if (qos->bcast.sync_cte_type > 0x1f) + return false; + + if (qos->bcast.mse > 0x1f) + return false; + + if (qos->bcast.timeout < 0x000a || qos->bcast.timeout > 0x4000) return false; return true; @@ -1179,7 +1257,7 @@ static int iso_sock_setsockopt(struct socket *sock, int level, int optname, { struct sock *sk = sock->sk; int len, err = 0; - struct bt_iso_qos qos; + struct bt_iso_qos qos = default_qos; u32 opt; BT_DBG("sk %p", sk); @@ -1212,24 +1290,19 @@ static int iso_sock_setsockopt(struct socket *sock, int level, int optname, } len = min_t(unsigned int, sizeof(qos), optlen); - if (len != sizeof(qos)) { - err = -EINVAL; - break; - } - - memset(&qos, 0, sizeof(qos)); if (copy_from_sockptr(&qos, optval, len)) { err = -EFAULT; break; } - if (!check_qos(&qos)) { + if (len == sizeof(qos.ucast) && !check_ucast_qos(&qos)) { err = -EINVAL; break; } iso_pi(sk)->qos = qos; + iso_pi(sk)->qos_user_set = true; break; @@ -1419,7 +1492,7 @@ static bool iso_match_big(struct sock *sk, void *data) { struct hci_evt_le_big_sync_estabilished *ev = data; - return ev->handle == iso_pi(sk)->qos.big; + return ev->handle == iso_pi(sk)->qos.bcast.big; } static void iso_conn_ready(struct iso_conn *conn) @@ -1584,8 +1657,12 @@ static void iso_connect_cfm(struct hci_conn *hcon, __u8 status) /* Check if LE link has failed */ if (status) { - if (hcon->link) - iso_conn_del(hcon->link, bt_to_errno(status)); + struct hci_link *link, *t; + + list_for_each_entry_safe(link, t, &hcon->link_list, + list) + iso_conn_del(link->conn, bt_to_errno(status)); + return; } diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c index 55a7226233f9..376b523c7b26 100644 --- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -745,7 +745,7 @@ EXPORT_SYMBOL_GPL(l2cap_chan_list); static void l2cap_conn_update_id_addr(struct work_struct *work) { struct l2cap_conn *conn = container_of(work, struct l2cap_conn, - id_addr_update_work); + id_addr_timer.work); struct hci_conn *hcon = conn->hcon; struct l2cap_chan *chan; @@ -1907,8 +1907,7 @@ static void l2cap_conn_del(struct hci_conn *hcon, int err) if (work_pending(&conn->pending_rx_work)) cancel_work_sync(&conn->pending_rx_work); - if (work_pending(&conn->id_addr_update_work)) - cancel_work_sync(&conn->id_addr_update_work); + cancel_delayed_work_sync(&conn->id_addr_timer); l2cap_unregister_all_users(conn); @@ -4694,7 +4693,6 @@ static inline int l2cap_disconnect_rsp(struct l2cap_conn *conn, chan = l2cap_get_chan_by_scid(conn, scid); if (!chan) { - mutex_unlock(&conn->chan_lock); return 0; } @@ -7874,7 +7872,7 @@ static struct l2cap_conn *l2cap_conn_add(struct hci_conn *hcon) skb_queue_head_init(&conn->pending_rx); INIT_WORK(&conn->pending_rx_work, process_pending_rx); - INIT_WORK(&conn->id_addr_update_work, l2cap_conn_update_id_addr); + INIT_DELAYED_WORK(&conn->id_addr_timer, l2cap_conn_update_id_addr); conn->disc_reason = HCI_ERROR_REMOTE_USER_TERM; diff --git a/net/bluetooth/mgmt.c b/net/bluetooth/mgmt.c index 249dc6777fb4..f7b2d0971f24 100644 --- a/net/bluetooth/mgmt.c +++ b/net/bluetooth/mgmt.c @@ -1399,8 +1399,16 @@ static int set_powered(struct sock *sk, struct hci_dev *hdev, void *data, goto failed; } - err = hci_cmd_sync_queue(hdev, set_powered_sync, cmd, - mgmt_set_powered_complete); + /* Cancel potentially blocking sync operation before power off */ + if (cp->val == 0x00) { + __hci_cmd_sync_cancel(hdev, -EHOSTDOWN); + err = hci_cmd_sync_queue(hdev, set_powered_sync, cmd, + mgmt_set_powered_complete); + } else { + /* Use hci_cmd_sync_submit since hdev might not be running */ + err = hci_cmd_sync_submit(hdev, set_powered_sync, cmd, + mgmt_set_powered_complete); + } if (err < 0) mgmt_pending_remove(cmd); @@ -8393,10 +8401,10 @@ static u32 get_supported_adv_flags(struct hci_dev *hdev) flags |= MGMT_ADV_FLAG_HW_OFFLOAD; flags |= MGMT_ADV_FLAG_CAN_SET_TX_POWER; - if (hdev->le_features[1] & HCI_LE_PHY_2M) + if (le_2m_capable(hdev)) flags |= MGMT_ADV_FLAG_SEC_2M; - if (hdev->le_features[1] & HCI_LE_PHY_CODED) + if (le_coded_capable(hdev)) flags |= MGMT_ADV_FLAG_SEC_CODED; } diff --git a/net/bluetooth/msft.c b/net/bluetooth/msft.c index bee6a4c656be..bf5cee48916c 100644 --- a/net/bluetooth/msft.c +++ b/net/bluetooth/msft.c @@ -743,17 +743,12 @@ __u64 msft_get_features(struct hci_dev *hdev) } static void msft_le_set_advertisement_filter_enable_cb(struct hci_dev *hdev, - u8 status, u16 opcode, - struct sk_buff *skb) + void *user_data, + u8 status) { - struct msft_cp_le_set_advertisement_filter_enable *cp; - struct msft_rp_le_set_advertisement_filter_enable *rp; + struct msft_cp_le_set_advertisement_filter_enable *cp = user_data; struct msft_data *msft = hdev->msft_data; - rp = (struct msft_rp_le_set_advertisement_filter_enable *)skb->data; - if (skb->len < sizeof(*rp)) - return; - /* Error 0x0C would be returned if the filter enabled status is * already set to whatever we were trying to set. * Although the default state should be disabled, some controller set @@ -766,7 +761,6 @@ static void msft_le_set_advertisement_filter_enable_cb(struct hci_dev *hdev, hci_dev_lock(hdev); - cp = hci_sent_cmd_data(hdev, hdev->msft_opcode); msft->filter_enabled = cp->enable; if (status == 0x0C) @@ -804,31 +798,23 @@ int msft_remove_monitor(struct hci_dev *hdev, struct adv_monitor *monitor) return msft_remove_monitor_sync(hdev, monitor); } -void msft_req_add_set_filter_enable(struct hci_request *req, bool enable) -{ - struct hci_dev *hdev = req->hdev; - struct msft_cp_le_set_advertisement_filter_enable cp; - - cp.sub_opcode = MSFT_OP_LE_SET_ADVERTISEMENT_FILTER_ENABLE; - cp.enable = enable; - - hci_req_add(req, hdev->msft_opcode, sizeof(cp), &cp); -} - int msft_set_filter_enable(struct hci_dev *hdev, bool enable) { - struct hci_request req; + struct msft_cp_le_set_advertisement_filter_enable cp; struct msft_data *msft = hdev->msft_data; int err; if (!msft) return -EOPNOTSUPP; - hci_req_init(&req, hdev); - msft_req_add_set_filter_enable(&req, enable); - err = hci_req_run_skb(&req, msft_le_set_advertisement_filter_enable_cb); + cp.sub_opcode = MSFT_OP_LE_SET_ADVERTISEMENT_FILTER_ENABLE; + cp.enable = enable; + err = __hci_cmd_sync_status(hdev, hdev->msft_opcode, sizeof(cp), &cp, + HCI_CMD_TIMEOUT); + + msft_le_set_advertisement_filter_enable_cb(hdev, &cp, err); - return err; + return 0; } bool msft_curve_validity(struct hci_dev *hdev) diff --git a/net/bluetooth/smp.c b/net/bluetooth/smp.c index 70663229b3cc..f1a9fc0012f0 100644 --- a/net/bluetooth/smp.c +++ b/net/bluetooth/smp.c @@ -58,6 +58,8 @@ #define SMP_TIMEOUT msecs_to_jiffies(30000) +#define ID_ADDR_TIMEOUT msecs_to_jiffies(200) + #define AUTH_REQ_MASK(dev) (hci_dev_test_flag(dev, HCI_SC_ENABLED) ? \ 0x3f : 0x07) #define KEY_DIST_MASK 0x07 @@ -1067,7 +1069,12 @@ static void smp_notify_keys(struct l2cap_conn *conn) if (hcon->type == LE_LINK) { bacpy(&hcon->dst, &smp->remote_irk->bdaddr); hcon->dst_type = smp->remote_irk->addr_type; - queue_work(hdev->workqueue, &conn->id_addr_update_work); + /* Use a short delay to make sure the new address is + * propagated _before_ the channels. + */ + queue_delayed_work(hdev->workqueue, + &conn->id_addr_timer, + ID_ADDR_TIMEOUT); } } diff --git a/net/bpf/bpf_dummy_struct_ops.c b/net/bpf/bpf_dummy_struct_ops.c index ff4f89a2b02a..5918d1b32e19 100644 --- a/net/bpf/bpf_dummy_struct_ops.c +++ b/net/bpf/bpf_dummy_struct_ops.c @@ -173,14 +173,11 @@ static int bpf_dummy_ops_check_member(const struct btf_type *t, static int bpf_dummy_ops_btf_struct_access(struct bpf_verifier_log *log, const struct bpf_reg_state *reg, - int off, int size, enum bpf_access_type atype, - u32 *next_btf_id, - enum bpf_type_flag *flag) + int off, int size) { const struct btf_type *state; const struct btf_type *t; s32 type_id; - int err; type_id = btf_find_by_name_kind(reg->btf, "bpf_dummy_ops_state", BTF_KIND_STRUCT); @@ -194,11 +191,12 @@ static int bpf_dummy_ops_btf_struct_access(struct bpf_verifier_log *log, return -EACCES; } - err = btf_struct_access(log, reg, off, size, atype, next_btf_id, flag); - if (err < 0) - return err; + if (off + size > sizeof(struct bpf_dummy_ops_state)) { + bpf_log(log, "write access at off %d with size %d\n", off, size); + return -EACCES; + } - return atype == BPF_READ ? err : NOT_INIT; + return NOT_INIT; } static const struct bpf_verifier_ops bpf_dummy_verifier_ops = { diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index f81b24320a36..e79e3a415ca9 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -19,7 +19,9 @@ #include <linux/error-injection.h> #include <linux/smp.h> #include <linux/sock_diag.h> +#include <linux/netfilter.h> #include <net/xdp.h> +#include <net/netfilter/nf_bpf_link.h> #define CREATE_TRACE_POINTS #include <trace/events/bpf_test_run.h> @@ -215,6 +217,16 @@ static void xdp_test_run_teardown(struct xdp_test_data *xdp) kfree(xdp->skbs); } +static bool frame_was_changed(const struct xdp_page_head *head) +{ + /* xdp_scrub_frame() zeroes the data pointer, flags is the last field, + * i.e. has the highest chances to be overwritten. If those two are + * untouched, it's most likely safe to skip the context reset. + */ + return head->frame->data != head->orig_ctx.data || + head->frame->flags != head->orig_ctx.flags; +} + static bool ctx_was_changed(struct xdp_page_head *head) { return head->orig_ctx.data != head->ctx.data || @@ -224,7 +236,7 @@ static bool ctx_was_changed(struct xdp_page_head *head) static void reset_ctx(struct xdp_page_head *head) { - if (likely(!ctx_was_changed(head))) + if (likely(!frame_was_changed(head) && !ctx_was_changed(head))) return; head->ctx.data = head->orig_ctx.data; @@ -538,6 +550,11 @@ int noinline bpf_fentry_test8(struct bpf_fentry_test_t *arg) return (long)arg->a; } +__bpf_kfunc u32 bpf_fentry_test9(u32 *a) +{ + return *a; +} + __bpf_kfunc int bpf_modify_return_test(int a, int *b) { *b += 1; @@ -567,6 +584,11 @@ long noinline bpf_kfunc_call_test4(signed char a, short b, int c, long d) return (long)a + (long)b + (long)c + d; } +int noinline bpf_fentry_shadow_test(int a) +{ + return a + 1; +} + struct prog_test_member1 { int a; }; @@ -598,6 +620,11 @@ bpf_kfunc_call_test_acquire(unsigned long *scalar_ptr) return &prog_test_struct; } +__bpf_kfunc void bpf_kfunc_call_test_offset(struct prog_test_ref_kfunc *p) +{ + WARN_ON_ONCE(1); +} + __bpf_kfunc struct prog_test_member * bpf_kfunc_call_memb_acquire(void) { @@ -607,9 +634,6 @@ bpf_kfunc_call_memb_acquire(void) __bpf_kfunc void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p) { - if (!p) - return; - refcount_dec(&p->cnt); } @@ -657,17 +681,6 @@ __bpf_kfunc void bpf_kfunc_call_int_mem_release(int *p) { } -__bpf_kfunc struct prog_test_ref_kfunc * -bpf_kfunc_call_test_kptr_get(struct prog_test_ref_kfunc **pp, int a, int b) -{ - struct prog_test_ref_kfunc *p = READ_ONCE(*pp); - - if (!p) - return NULL; - refcount_inc(&p->cnt); - return p; -} - struct prog_test_pass1 { int x0; struct { @@ -744,6 +757,7 @@ __bpf_kfunc void bpf_kfunc_call_test_mem_len_fail2(u64 *mem, int len) __bpf_kfunc void bpf_kfunc_call_test_ref(struct prog_test_ref_kfunc *p) { + /* p != NULL, but p->cnt could be 0 */ } __bpf_kfunc void bpf_kfunc_call_test_destructive(void) @@ -781,7 +795,6 @@ BTF_ID_FLAGS(func, bpf_kfunc_call_test_get_rdwr_mem, KF_RET_NULL) BTF_ID_FLAGS(func, bpf_kfunc_call_test_get_rdonly_mem, KF_RET_NULL) BTF_ID_FLAGS(func, bpf_kfunc_call_test_acq_rdonly_mem, KF_ACQUIRE | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_kfunc_call_int_mem_release, KF_RELEASE) -BTF_ID_FLAGS(func, bpf_kfunc_call_test_kptr_get, KF_ACQUIRE | KF_RET_NULL | KF_KPTR_GET) BTF_ID_FLAGS(func, bpf_kfunc_call_test_pass_ctx) BTF_ID_FLAGS(func, bpf_kfunc_call_test_pass1) BTF_ID_FLAGS(func, bpf_kfunc_call_test_pass2) @@ -791,9 +804,10 @@ BTF_ID_FLAGS(func, bpf_kfunc_call_test_fail3) BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_pass1) BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_fail1) BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_fail2) -BTF_ID_FLAGS(func, bpf_kfunc_call_test_ref, KF_TRUSTED_ARGS) +BTF_ID_FLAGS(func, bpf_kfunc_call_test_ref, KF_TRUSTED_ARGS | KF_RCU) BTF_ID_FLAGS(func, bpf_kfunc_call_test_destructive, KF_DESTRUCTIVE) BTF_ID_FLAGS(func, bpf_kfunc_call_test_static_unused_arg) +BTF_ID_FLAGS(func, bpf_kfunc_call_test_offset) BTF_SET8_END(test_sk_check_kfunc_ids) static void *bpf_test_init(const union bpf_attr *kattr, u32 user_size, @@ -843,7 +857,8 @@ int bpf_prog_test_run_tracing(struct bpf_prog *prog, bpf_fentry_test5(11, (void *)12, 13, 14, 15) != 65 || bpf_fentry_test6(16, (void *)17, 18, 19, (void *)20, 21) != 111 || bpf_fentry_test7((struct bpf_fentry_test_t *)0) != 0 || - bpf_fentry_test8(&arg) != 0) + bpf_fentry_test8(&arg) != 0 || + bpf_fentry_test9(&retval) != 0) goto out; break; case BPF_MODIFY_RETURN: @@ -1678,6 +1693,162 @@ out: return err; } +static int verify_and_copy_hook_state(struct nf_hook_state *state, + const struct nf_hook_state *user, + struct net_device *dev) +{ + if (user->in || user->out) + return -EINVAL; + + if (user->net || user->sk || user->okfn) + return -EINVAL; + + switch (user->pf) { + case NFPROTO_IPV4: + case NFPROTO_IPV6: + switch (state->hook) { + case NF_INET_PRE_ROUTING: + state->in = dev; + break; + case NF_INET_LOCAL_IN: + state->in = dev; + break; + case NF_INET_FORWARD: + state->in = dev; + state->out = dev; + break; + case NF_INET_LOCAL_OUT: + state->out = dev; + break; + case NF_INET_POST_ROUTING: + state->out = dev; + break; + } + + break; + default: + return -EINVAL; + } + + state->pf = user->pf; + state->hook = user->hook; + + return 0; +} + +static __be16 nfproto_eth(int nfproto) +{ + switch (nfproto) { + case NFPROTO_IPV4: + return htons(ETH_P_IP); + case NFPROTO_IPV6: + break; + } + + return htons(ETH_P_IPV6); +} + +int bpf_prog_test_run_nf(struct bpf_prog *prog, + const union bpf_attr *kattr, + union bpf_attr __user *uattr) +{ + struct net *net = current->nsproxy->net_ns; + struct net_device *dev = net->loopback_dev; + struct nf_hook_state *user_ctx, hook_state = { + .pf = NFPROTO_IPV4, + .hook = NF_INET_LOCAL_OUT, + }; + u32 size = kattr->test.data_size_in; + u32 repeat = kattr->test.repeat; + struct bpf_nf_ctx ctx = { + .state = &hook_state, + }; + struct sk_buff *skb = NULL; + u32 retval, duration; + void *data; + int ret; + + if (kattr->test.flags || kattr->test.cpu || kattr->test.batch_size) + return -EINVAL; + + if (size < sizeof(struct iphdr)) + return -EINVAL; + + data = bpf_test_init(kattr, kattr->test.data_size_in, size, + NET_SKB_PAD + NET_IP_ALIGN, + SKB_DATA_ALIGN(sizeof(struct skb_shared_info))); + if (IS_ERR(data)) + return PTR_ERR(data); + + if (!repeat) + repeat = 1; + + user_ctx = bpf_ctx_init(kattr, sizeof(struct nf_hook_state)); + if (IS_ERR(user_ctx)) { + kfree(data); + return PTR_ERR(user_ctx); + } + + if (user_ctx) { + ret = verify_and_copy_hook_state(&hook_state, user_ctx, dev); + if (ret) + goto out; + } + + skb = slab_build_skb(data); + if (!skb) { + ret = -ENOMEM; + goto out; + } + + data = NULL; /* data released via kfree_skb */ + + skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN); + __skb_put(skb, size); + + ret = -EINVAL; + + if (hook_state.hook != NF_INET_LOCAL_OUT) { + if (size < ETH_HLEN + sizeof(struct iphdr)) + goto out; + + skb->protocol = eth_type_trans(skb, dev); + switch (skb->protocol) { + case htons(ETH_P_IP): + if (hook_state.pf == NFPROTO_IPV4) + break; + goto out; + case htons(ETH_P_IPV6): + if (size < ETH_HLEN + sizeof(struct ipv6hdr)) + goto out; + if (hook_state.pf == NFPROTO_IPV6) + break; + goto out; + default: + ret = -EPROTO; + goto out; + } + + skb_reset_network_header(skb); + } else { + skb->protocol = nfproto_eth(hook_state.pf); + } + + ctx.skb = skb; + + ret = bpf_test_run(prog, &ctx, repeat, &retval, &duration, false); + if (ret) + goto out; + + ret = bpf_test_finish(kattr, uattr, NULL, NULL, 0, retval, duration); + +out: + kfree(user_ctx); + kfree_skb(skb); + kfree(data); + return ret; +} + static const struct btf_kfunc_id_set bpf_prog_test_kfunc_set = { .owner = THIS_MODULE, .set = &test_sk_check_kfunc_ids, diff --git a/net/bridge/br_arp_nd_proxy.c b/net/bridge/br_arp_nd_proxy.c index e5e48c6e35d7..c7869a286df4 100644 --- a/net/bridge/br_arp_nd_proxy.c +++ b/net/bridge/br_arp_nd_proxy.c @@ -30,7 +30,7 @@ void br_recalculate_neigh_suppress_enabled(struct net_bridge *br) bool neigh_suppress = false; list_for_each_entry(p, &br->port_list, list) { - if (p->flags & BR_NEIGH_SUPPRESS) { + if (p->flags & (BR_NEIGH_SUPPRESS | BR_NEIGH_VLAN_SUPPRESS)) { neigh_suppress = true; break; } @@ -158,7 +158,7 @@ void br_do_proxy_suppress_arp(struct sk_buff *skb, struct net_bridge *br, return; if (br_opt_get(br, BROPT_NEIGH_SUPPRESS_ENABLED)) { - if (p && (p->flags & BR_NEIGH_SUPPRESS)) + if (br_is_neigh_suppress_enabled(p, vid)) return; if (parp->ar_op != htons(ARPOP_RREQUEST) && parp->ar_op != htons(ARPOP_RREPLY) && @@ -192,7 +192,7 @@ void br_do_proxy_suppress_arp(struct sk_buff *skb, struct net_bridge *br, if (n) { struct net_bridge_fdb_entry *f; - if (!(n->nud_state & NUD_VALID)) { + if (!(READ_ONCE(n->nud_state) & NUD_VALID)) { neigh_release(n); return; } @@ -202,8 +202,8 @@ void br_do_proxy_suppress_arp(struct sk_buff *skb, struct net_bridge *br, bool replied = false; if ((p && (p->flags & BR_PROXYARP)) || - (f->dst && (f->dst->flags & (BR_PROXYARP_WIFI | - BR_NEIGH_SUPPRESS)))) { + (f->dst && (f->dst->flags & BR_PROXYARP_WIFI)) || + br_is_neigh_suppress_enabled(f->dst, vid)) { if (!vid) br_arp_send(br, p, skb->dev, sip, tip, sha, n->ha, sha, 0, 0); @@ -407,7 +407,7 @@ void br_do_suppress_nd(struct sk_buff *skb, struct net_bridge *br, BR_INPUT_SKB_CB(skb)->proxyarp_replied = 0; - if (p && (p->flags & BR_NEIGH_SUPPRESS)) + if (br_is_neigh_suppress_enabled(p, vid)) return; if (msg->icmph.icmp6_type == NDISC_NEIGHBOUR_ADVERTISEMENT && @@ -452,7 +452,7 @@ void br_do_suppress_nd(struct sk_buff *skb, struct net_bridge *br, if (n) { struct net_bridge_fdb_entry *f; - if (!(n->nud_state & NUD_VALID)) { + if (!(READ_ONCE(n->nud_state) & NUD_VALID)) { neigh_release(n); return; } @@ -461,7 +461,7 @@ void br_do_suppress_nd(struct sk_buff *skb, struct net_bridge *br, if (f) { bool replied = false; - if (f->dst && (f->dst->flags & BR_NEIGH_SUPPRESS)) { + if (br_is_neigh_suppress_enabled(f->dst, vid)) { if (vid != 0) br_nd_send(br, p, skb, n, skb->vlan_proto, @@ -483,3 +483,24 @@ void br_do_suppress_nd(struct sk_buff *skb, struct net_bridge *br, } } #endif + +bool br_is_neigh_suppress_enabled(const struct net_bridge_port *p, u16 vid) +{ + if (!p) + return false; + + if (!vid) + return !!(p->flags & BR_NEIGH_SUPPRESS); + + if (p->flags & BR_NEIGH_VLAN_SUPPRESS) { + struct net_bridge_vlan_group *vg = nbp_vlan_group_rcu(p); + struct net_bridge_vlan *v; + + v = br_vlan_find(vg, vid); + if (!v) + return false; + return !!(v->priv_flags & BR_VLFLAG_NEIGH_SUPPRESS_ENABLED); + } else { + return !!(p->flags & BR_NEIGH_SUPPRESS); + } +} diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c index b82906fc999a..8eca8a5c80c6 100644 --- a/net/bridge/br_device.c +++ b/net/bridge/br_device.c @@ -80,10 +80,10 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev) dest = eth_hdr(skb)->h_dest; if (is_broadcast_ether_addr(dest)) { - br_flood(br, skb, BR_PKT_BROADCAST, false, true); + br_flood(br, skb, BR_PKT_BROADCAST, false, true, vid); } else if (is_multicast_ether_addr(dest)) { if (unlikely(netpoll_tx_running(dev))) { - br_flood(br, skb, BR_PKT_MULTICAST, false, true); + br_flood(br, skb, BR_PKT_MULTICAST, false, true, vid); goto out; } if (br_multicast_rcv(&brmctx, &pmctx_null, vlan, skb, vid)) { @@ -96,11 +96,11 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev) br_multicast_querier_exists(brmctx, eth_hdr(skb), mdst)) br_multicast_flood(mdst, skb, brmctx, false, true); else - br_flood(br, skb, BR_PKT_MULTICAST, false, true); + br_flood(br, skb, BR_PKT_MULTICAST, false, true, vid); } else if ((dst = br_fdb_find_rcu(br, dest, vid)) != NULL) { br_forward(dst->dst, skb, false, true); } else { - br_flood(br, skb, BR_PKT_UNICAST, false, true); + br_flood(br, skb, BR_PKT_UNICAST, false, true, vid); } out: rcu_read_unlock(); @@ -468,6 +468,9 @@ static const struct net_device_ops br_netdev_ops = { .ndo_fdb_del_bulk = br_fdb_delete_bulk, .ndo_fdb_dump = br_fdb_dump, .ndo_fdb_get = br_fdb_get, + .ndo_mdb_add = br_mdb_add, + .ndo_mdb_del = br_mdb_del, + .ndo_mdb_dump = br_mdb_dump, .ndo_bridge_getlink = br_getlink, .ndo_bridge_setlink = br_setlink, .ndo_bridge_dellink = br_dellink, diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c index 02bb620d3b8d..57744704ff69 100644 --- a/net/bridge/br_forward.c +++ b/net/bridge/br_forward.c @@ -197,7 +197,8 @@ out: /* called under rcu_read_lock */ void br_flood(struct net_bridge *br, struct sk_buff *skb, - enum br_pkt_type pkt_type, bool local_rcv, bool local_orig) + enum br_pkt_type pkt_type, bool local_rcv, bool local_orig, + u16 vid) { struct net_bridge_port *prev = NULL; struct net_bridge_port *p; @@ -224,8 +225,9 @@ void br_flood(struct net_bridge *br, struct sk_buff *skb, /* Do not flood to ports that enable proxy ARP */ if (p->flags & BR_PROXYARP) continue; - if ((p->flags & (BR_PROXYARP_WIFI | BR_NEIGH_SUPPRESS)) && - BR_INPUT_SKB_CB(skb)->proxyarp_replied) + if (BR_INPUT_SKB_CB(skb)->proxyarp_replied && + ((p->flags & BR_PROXYARP_WIFI) || + br_is_neigh_suppress_enabled(p, vid))) continue; prev = maybe_deliver(prev, p, skb, local_orig); diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c index 24f01ff113f0..3f04b40f6056 100644 --- a/net/bridge/br_if.c +++ b/net/bridge/br_if.c @@ -759,7 +759,7 @@ void br_port_flags_change(struct net_bridge_port *p, unsigned long mask) if (mask & BR_AUTO_MASK) nbp_update_port_count(br); - if (mask & BR_NEIGH_SUPPRESS) + if (mask & (BR_NEIGH_SUPPRESS | BR_NEIGH_VLAN_SUPPRESS)) br_recalculate_neigh_suppress_enabled(br); } diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c index 3027e8f6be15..fc17b9fd93e6 100644 --- a/net/bridge/br_input.c +++ b/net/bridge/br_input.c @@ -207,7 +207,7 @@ int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb br_forward(dst->dst, skb, local_rcv, false); } else { if (!mcast_hit) - br_flood(br, skb, pkt_type, local_rcv, false); + br_flood(br, skb, pkt_type, local_rcv, false, vid); else br_multicast_flood(mdst, skb, brmctx, local_rcv, false); } diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c index 25c48d81a597..7305f5f8215c 100644 --- a/net/bridge/br_mdb.c +++ b/net/bridge/br_mdb.c @@ -380,82 +380,37 @@ out: return err; } -static int br_mdb_valid_dump_req(const struct nlmsghdr *nlh, - struct netlink_ext_ack *extack) +int br_mdb_dump(struct net_device *dev, struct sk_buff *skb, + struct netlink_callback *cb) { + struct net_bridge *br = netdev_priv(dev); struct br_port_msg *bpm; + struct nlmsghdr *nlh; + int err; - if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*bpm))) { - NL_SET_ERR_MSG_MOD(extack, "Invalid header for mdb dump request"); - return -EINVAL; - } + nlh = nlmsg_put(skb, NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, RTM_GETMDB, sizeof(*bpm), + NLM_F_MULTI); + if (!nlh) + return -EMSGSIZE; bpm = nlmsg_data(nlh); - if (bpm->ifindex) { - NL_SET_ERR_MSG_MOD(extack, "Filtering by device index is not supported for mdb dump request"); - return -EINVAL; - } - if (nlmsg_attrlen(nlh, sizeof(*bpm))) { - NL_SET_ERR_MSG(extack, "Invalid data after header in mdb dump request"); - return -EINVAL; - } - - return 0; -} - -static int br_mdb_dump(struct sk_buff *skb, struct netlink_callback *cb) -{ - struct net_device *dev; - struct net *net = sock_net(skb->sk); - struct nlmsghdr *nlh = NULL; - int idx = 0, s_idx; - - if (cb->strict_check) { - int err = br_mdb_valid_dump_req(cb->nlh, cb->extack); - - if (err < 0) - return err; - } - - s_idx = cb->args[0]; + memset(bpm, 0, sizeof(*bpm)); + bpm->ifindex = dev->ifindex; rcu_read_lock(); - for_each_netdev_rcu(net, dev) { - if (netif_is_bridge_master(dev)) { - struct net_bridge *br = netdev_priv(dev); - struct br_port_msg *bpm; - - if (idx < s_idx) - goto skip; - - nlh = nlmsg_put(skb, NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq, RTM_GETMDB, - sizeof(*bpm), NLM_F_MULTI); - if (nlh == NULL) - break; - - bpm = nlmsg_data(nlh); - memset(bpm, 0, sizeof(*bpm)); - bpm->ifindex = dev->ifindex; - if (br_mdb_fill_info(skb, cb, dev) < 0) - goto out; - if (br_rports_fill_info(skb, &br->multicast_ctx) < 0) - goto out; - - cb->args[1] = 0; - nlmsg_end(skb, nlh); - skip: - idx++; - } - } + err = br_mdb_fill_info(skb, cb, dev); + if (err) + goto out; + err = br_rports_fill_info(skb, &br->multicast_ctx); + if (err) + goto out; out: - if (nlh) - nlmsg_end(skb, nlh); rcu_read_unlock(); - cb->args[0] = idx; - return skb->len; + nlmsg_end(skb, nlh); + return err; } static int nlmsg_populate_mdb_fill(struct sk_buff *skb, @@ -683,60 +638,6 @@ static const struct nla_policy br_mdbe_attrs_pol[MDBE_ATTR_MAX + 1] = { [MDBE_ATTR_RTPROT] = NLA_POLICY_MIN(NLA_U8, RTPROT_STATIC), }; -static int validate_mdb_entry(const struct nlattr *attr, - struct netlink_ext_ack *extack) -{ - struct br_mdb_entry *entry = nla_data(attr); - - if (nla_len(attr) != sizeof(struct br_mdb_entry)) { - NL_SET_ERR_MSG_MOD(extack, "Invalid MDBA_SET_ENTRY attribute length"); - return -EINVAL; - } - - if (entry->ifindex == 0) { - NL_SET_ERR_MSG_MOD(extack, "Zero entry ifindex is not allowed"); - return -EINVAL; - } - - if (entry->addr.proto == htons(ETH_P_IP)) { - if (!ipv4_is_multicast(entry->addr.u.ip4)) { - NL_SET_ERR_MSG_MOD(extack, "IPv4 entry group address is not multicast"); - return -EINVAL; - } - if (ipv4_is_local_multicast(entry->addr.u.ip4)) { - NL_SET_ERR_MSG_MOD(extack, "IPv4 entry group address is local multicast"); - return -EINVAL; - } -#if IS_ENABLED(CONFIG_IPV6) - } else if (entry->addr.proto == htons(ETH_P_IPV6)) { - if (ipv6_addr_is_ll_all_nodes(&entry->addr.u.ip6)) { - NL_SET_ERR_MSG_MOD(extack, "IPv6 entry group address is link-local all nodes"); - return -EINVAL; - } -#endif - } else if (entry->addr.proto == 0) { - /* L2 mdb */ - if (!is_multicast_ether_addr(entry->addr.u.mac_addr)) { - NL_SET_ERR_MSG_MOD(extack, "L2 entry group is not multicast"); - return -EINVAL; - } - } else { - NL_SET_ERR_MSG_MOD(extack, "Unknown entry protocol"); - return -EINVAL; - } - - if (entry->state != MDB_PERMANENT && entry->state != MDB_TEMPORARY) { - NL_SET_ERR_MSG_MOD(extack, "Unknown entry state"); - return -EINVAL; - } - if (entry->vid >= VLAN_VID_MASK) { - NL_SET_ERR_MSG_MOD(extack, "Invalid entry VLAN id"); - return -EINVAL; - } - - return 0; -} - static bool is_valid_mdb_source(struct nlattr *attr, __be16 proto, struct netlink_ext_ack *extack) { @@ -1299,49 +1200,16 @@ static int br_mdb_config_attrs_init(struct nlattr *set_attrs, return 0; } -static const struct nla_policy mdba_policy[MDBA_SET_ENTRY_MAX + 1] = { - [MDBA_SET_ENTRY_UNSPEC] = { .strict_start_type = MDBA_SET_ENTRY_ATTRS + 1 }, - [MDBA_SET_ENTRY] = NLA_POLICY_VALIDATE_FN(NLA_BINARY, - validate_mdb_entry, - sizeof(struct br_mdb_entry)), - [MDBA_SET_ENTRY_ATTRS] = { .type = NLA_NESTED }, -}; - -static int br_mdb_config_init(struct net *net, const struct nlmsghdr *nlh, - struct br_mdb_config *cfg, +static int br_mdb_config_init(struct br_mdb_config *cfg, struct net_device *dev, + struct nlattr *tb[], u16 nlmsg_flags, struct netlink_ext_ack *extack) { - struct nlattr *tb[MDBA_SET_ENTRY_MAX + 1]; - struct br_port_msg *bpm; - struct net_device *dev; - int err; - - err = nlmsg_parse_deprecated(nlh, sizeof(*bpm), tb, - MDBA_SET_ENTRY_MAX, mdba_policy, extack); - if (err) - return err; + struct net *net = dev_net(dev); memset(cfg, 0, sizeof(*cfg)); cfg->filter_mode = MCAST_EXCLUDE; cfg->rt_protocol = RTPROT_STATIC; - cfg->nlflags = nlh->nlmsg_flags; - - bpm = nlmsg_data(nlh); - if (!bpm->ifindex) { - NL_SET_ERR_MSG_MOD(extack, "Invalid bridge ifindex"); - return -EINVAL; - } - - dev = __dev_get_by_index(net, bpm->ifindex); - if (!dev) { - NL_SET_ERR_MSG_MOD(extack, "Bridge device doesn't exist"); - return -ENODEV; - } - - if (!netif_is_bridge_master(dev)) { - NL_SET_ERR_MSG_MOD(extack, "Device is not a bridge"); - return -EOPNOTSUPP; - } + cfg->nlflags = nlmsg_flags; cfg->br = netdev_priv(dev); @@ -1355,11 +1223,6 @@ static int br_mdb_config_init(struct net *net, const struct nlmsghdr *nlh, return -EINVAL; } - if (NL_REQ_ATTR_CHECK(extack, NULL, tb, MDBA_SET_ENTRY)) { - NL_SET_ERR_MSG_MOD(extack, "Missing MDBA_SET_ENTRY attribute"); - return -EINVAL; - } - cfg->entry = nla_data(tb[MDBA_SET_ENTRY]); if (cfg->entry->ifindex != cfg->br->dev->ifindex) { @@ -1383,6 +1246,12 @@ static int br_mdb_config_init(struct net *net, const struct nlmsghdr *nlh, } } + if (cfg->entry->addr.proto == htons(ETH_P_IP) && + ipv4_is_zeronet(cfg->entry->addr.u.ip4)) { + NL_SET_ERR_MSG_MOD(extack, "IPv4 entry group address 0.0.0.0 is not allowed"); + return -EINVAL; + } + if (tb[MDBA_SET_ENTRY_ATTRS]) return br_mdb_config_attrs_init(tb[MDBA_SET_ENTRY_ATTRS], cfg, extack); @@ -1397,16 +1266,15 @@ static void br_mdb_config_fini(struct br_mdb_config *cfg) br_mdb_config_src_list_fini(cfg); } -static int br_mdb_add(struct sk_buff *skb, struct nlmsghdr *nlh, - struct netlink_ext_ack *extack) +int br_mdb_add(struct net_device *dev, struct nlattr *tb[], u16 nlmsg_flags, + struct netlink_ext_ack *extack) { - struct net *net = sock_net(skb->sk); struct net_bridge_vlan_group *vg; struct net_bridge_vlan *v; struct br_mdb_config cfg; int err; - err = br_mdb_config_init(net, nlh, &cfg, extack); + err = br_mdb_config_init(&cfg, dev, tb, nlmsg_flags, extack); if (err) return err; @@ -1500,16 +1368,15 @@ unlock: return err; } -static int br_mdb_del(struct sk_buff *skb, struct nlmsghdr *nlh, - struct netlink_ext_ack *extack) +int br_mdb_del(struct net_device *dev, struct nlattr *tb[], + struct netlink_ext_ack *extack) { - struct net *net = sock_net(skb->sk); struct net_bridge_vlan_group *vg; struct net_bridge_vlan *v; struct br_mdb_config cfg; int err; - err = br_mdb_config_init(net, nlh, &cfg, extack); + err = br_mdb_config_init(&cfg, dev, tb, 0, extack); if (err) return err; @@ -1534,17 +1401,3 @@ static int br_mdb_del(struct sk_buff *skb, struct nlmsghdr *nlh, br_mdb_config_fini(&cfg); return err; } - -void br_mdb_init(void) -{ - rtnl_register_module(THIS_MODULE, PF_BRIDGE, RTM_GETMDB, NULL, br_mdb_dump, 0); - rtnl_register_module(THIS_MODULE, PF_BRIDGE, RTM_NEWMDB, br_mdb_add, NULL, 0); - rtnl_register_module(THIS_MODULE, PF_BRIDGE, RTM_DELMDB, br_mdb_del, NULL, 0); -} - -void br_mdb_uninit(void) -{ - rtnl_unregister(PF_BRIDGE, RTM_GETMDB); - rtnl_unregister(PF_BRIDGE, RTM_NEWMDB); - rtnl_unregister(PF_BRIDGE, RTM_DELMDB); -} diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c index 4bc6761517bb..1a801fab9543 100644 --- a/net/bridge/br_netfilter_hooks.c +++ b/net/bridge/br_netfilter_hooks.c @@ -277,7 +277,8 @@ int br_nf_pre_routing_finish_bridge(struct net *net, struct sock *sk, struct sk_ struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb); int ret; - if ((neigh->nud_state & NUD_CONNECTED) && neigh->hh.hh_len) { + if ((READ_ONCE(neigh->nud_state) & NUD_CONNECTED) && + READ_ONCE(neigh->hh.hh_len)) { neigh_hh_bridge(&neigh->hh, skb); skb->dev = nf_bridge->physindev; ret = br_handle_frame_finish(net, sk, skb); diff --git a/net/bridge/br_netfilter_ipv6.c b/net/bridge/br_netfilter_ipv6.c index 6b07f30675bb..550039dfc31a 100644 --- a/net/bridge/br_netfilter_ipv6.c +++ b/net/bridge/br_netfilter_ipv6.c @@ -40,62 +40,6 @@ #include <linux/sysctl.h> #endif -/* We only check the length. A bridge shouldn't do any hop-by-hop stuff - * anyway - */ -static int br_nf_check_hbh_len(struct sk_buff *skb) -{ - unsigned char *raw = (u8 *)(ipv6_hdr(skb) + 1); - u32 pkt_len; - const unsigned char *nh = skb_network_header(skb); - int off = raw - nh; - int len = (raw[1] + 1) << 3; - - if ((raw + len) - skb->data > skb_headlen(skb)) - goto bad; - - off += 2; - len -= 2; - - while (len > 0) { - int optlen = nh[off + 1] + 2; - - switch (nh[off]) { - case IPV6_TLV_PAD1: - optlen = 1; - break; - - case IPV6_TLV_PADN: - break; - - case IPV6_TLV_JUMBO: - if (nh[off + 1] != 4 || (off & 3) != 2) - goto bad; - pkt_len = ntohl(*(__be32 *)(nh + off + 2)); - if (pkt_len <= IPV6_MAXPLEN || - ipv6_hdr(skb)->payload_len) - goto bad; - if (pkt_len > skb->len - sizeof(struct ipv6hdr)) - goto bad; - if (pskb_trim_rcsum(skb, - pkt_len + sizeof(struct ipv6hdr))) - goto bad; - nh = skb_network_header(skb); - break; - default: - if (optlen > len) - goto bad; - break; - } - off += optlen; - len -= optlen; - } - if (len == 0) - return 0; -bad: - return -1; -} - int br_validate_ipv6(struct net *net, struct sk_buff *skb) { const struct ipv6hdr *hdr; @@ -115,22 +59,19 @@ int br_validate_ipv6(struct net *net, struct sk_buff *skb) goto inhdr_error; pkt_len = ntohs(hdr->payload_len); + if (hdr->nexthdr == NEXTHDR_HOP && nf_ip6_check_hbh_len(skb, &pkt_len)) + goto drop; - if (pkt_len || hdr->nexthdr != NEXTHDR_HOP) { - if (pkt_len + ip6h_len > skb->len) { - __IP6_INC_STATS(net, idev, - IPSTATS_MIB_INTRUNCATEDPKTS); - goto drop; - } - if (pskb_trim_rcsum(skb, pkt_len + ip6h_len)) { - __IP6_INC_STATS(net, idev, - IPSTATS_MIB_INDISCARDS); - goto drop; - } - hdr = ipv6_hdr(skb); + if (pkt_len + ip6h_len > skb->len) { + __IP6_INC_STATS(net, idev, + IPSTATS_MIB_INTRUNCATEDPKTS); + goto drop; } - if (hdr->nexthdr == NEXTHDR_HOP && br_nf_check_hbh_len(skb)) + if (pskb_trim_rcsum(skb, pkt_len + ip6h_len)) { + __IP6_INC_STATS(net, idev, + IPSTATS_MIB_INDISCARDS); goto drop; + } memset(IP6CB(skb), 0, sizeof(struct inet6_skb_parm)); /* No IP options in IPv6 header; however it should be diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c index 9173e52b89e2..05c5863d2e20 100644 --- a/net/bridge/br_netlink.c +++ b/net/bridge/br_netlink.c @@ -189,6 +189,7 @@ static inline size_t br_port_info_size(void) + nla_total_size(1) /* IFLA_BRPORT_ISOLATED */ + nla_total_size(1) /* IFLA_BRPORT_LOCKED */ + nla_total_size(1) /* IFLA_BRPORT_MAB */ + + nla_total_size(1) /* IFLA_BRPORT_NEIGH_VLAN_SUPPRESS */ + nla_total_size(sizeof(struct ifla_bridge_id)) /* IFLA_BRPORT_ROOT_ID */ + nla_total_size(sizeof(struct ifla_bridge_id)) /* IFLA_BRPORT_BRIDGE_ID */ + nla_total_size(sizeof(u16)) /* IFLA_BRPORT_DESIGNATED_PORT */ @@ -278,7 +279,9 @@ static int br_port_fill_attrs(struct sk_buff *skb, !!(p->flags & BR_MRP_LOST_IN_CONT)) || nla_put_u8(skb, IFLA_BRPORT_ISOLATED, !!(p->flags & BR_ISOLATED)) || nla_put_u8(skb, IFLA_BRPORT_LOCKED, !!(p->flags & BR_PORT_LOCKED)) || - nla_put_u8(skb, IFLA_BRPORT_MAB, !!(p->flags & BR_PORT_MAB))) + nla_put_u8(skb, IFLA_BRPORT_MAB, !!(p->flags & BR_PORT_MAB)) || + nla_put_u8(skb, IFLA_BRPORT_NEIGH_VLAN_SUPPRESS, + !!(p->flags & BR_NEIGH_VLAN_SUPPRESS))) return -EMSGSIZE; timerval = br_timer_value(&p->message_age_timer); @@ -891,6 +894,7 @@ static const struct nla_policy br_port_policy[IFLA_BRPORT_MAX + 1] = { [IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT] = { .type = NLA_U32 }, [IFLA_BRPORT_MCAST_N_GROUPS] = { .type = NLA_REJECT }, [IFLA_BRPORT_MCAST_MAX_GROUPS] = { .type = NLA_U32 }, + [IFLA_BRPORT_NEIGH_VLAN_SUPPRESS] = NLA_POLICY_MAX(NLA_U8, 1), }; /* Change the state of the port and notify spanning tree */ @@ -957,6 +961,8 @@ static int br_setport(struct net_bridge_port *p, struct nlattr *tb[], br_set_port_flag(p, tb, IFLA_BRPORT_ISOLATED, BR_ISOLATED); br_set_port_flag(p, tb, IFLA_BRPORT_LOCKED, BR_PORT_LOCKED); br_set_port_flag(p, tb, IFLA_BRPORT_MAB, BR_PORT_MAB); + br_set_port_flag(p, tb, IFLA_BRPORT_NEIGH_VLAN_SUPPRESS, + BR_NEIGH_VLAN_SUPPRESS); if ((p->flags & BR_PORT_MAB) && (!(p->flags & BR_PORT_LOCKED) || !(p->flags & BR_LEARNING))) { @@ -1886,7 +1892,6 @@ int __init br_netlink_init(void) { int err; - br_mdb_init(); br_vlan_rtnl_init(); rtnl_af_register(&br_af_ops); @@ -1898,13 +1903,11 @@ int __init br_netlink_init(void) out_af: rtnl_af_unregister(&br_af_ops); - br_mdb_uninit(); return err; } void br_netlink_fini(void) { - br_mdb_uninit(); br_vlan_rtnl_uninit(); rtnl_af_unregister(&br_af_ops); rtnl_link_unregister(&br_link_ops); diff --git a/net/bridge/br_nf_core.c b/net/bridge/br_nf_core.c index 8c69f0c95a8e..98aea5485aae 100644 --- a/net/bridge/br_nf_core.c +++ b/net/bridge/br_nf_core.c @@ -73,7 +73,7 @@ void br_netfilter_rtable_init(struct net_bridge *br) { struct rtable *rt = &br->fake_rtable; - atomic_set(&rt->dst.__refcnt, 1); + rcuref_init(&rt->dst.__rcuref, 1); rt->dst.dev = br->dev; dst_init_metrics(&rt->dst, br_dst_default_metrics, true); rt->dst.flags = DST_NOXFRM | DST_FAKE_RTABLE; diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h index cef5f6ea850c..2119729ded2b 100644 --- a/net/bridge/br_private.h +++ b/net/bridge/br_private.h @@ -178,6 +178,7 @@ enum { BR_VLFLAG_ADDED_BY_SWITCHDEV = BIT(1), BR_VLFLAG_MCAST_ENABLED = BIT(2), BR_VLFLAG_GLOBAL_MCAST_ENABLED = BIT(3), + BR_VLFLAG_NEIGH_SUPPRESS_ENABLED = BIT(4), }; /** @@ -849,7 +850,8 @@ void br_forward(const struct net_bridge_port *to, struct sk_buff *skb, bool local_rcv, bool local_orig); int br_forward_finish(struct net *net, struct sock *sk, struct sk_buff *skb); void br_flood(struct net_bridge *br, struct sk_buff *skb, - enum br_pkt_type pkt_type, bool local_rcv, bool local_orig); + enum br_pkt_type pkt_type, bool local_rcv, bool local_orig, + u16 vid); /* return true if both source port and dest port are isolated */ static inline bool br_skb_isolated(const struct net_bridge_port *to, @@ -981,8 +983,12 @@ void br_multicast_get_stats(const struct net_bridge *br, u32 br_multicast_ngroups_get(const struct net_bridge_mcast_port *pmctx); void br_multicast_ngroups_set_max(struct net_bridge_mcast_port *pmctx, u32 max); u32 br_multicast_ngroups_get_max(const struct net_bridge_mcast_port *pmctx); -void br_mdb_init(void); -void br_mdb_uninit(void); +int br_mdb_add(struct net_device *dev, struct nlattr *tb[], u16 nlmsg_flags, + struct netlink_ext_ack *extack); +int br_mdb_del(struct net_device *dev, struct nlattr *tb[], + struct netlink_ext_ack *extack); +int br_mdb_dump(struct net_device *dev, struct sk_buff *skb, + struct netlink_callback *cb); void br_multicast_host_join(const struct net_bridge_mcast *brmctx, struct net_bridge_mdb_entry *mp, bool notify); void br_multicast_host_leave(struct net_bridge_mdb_entry *mp, bool notify); @@ -1374,12 +1380,22 @@ static inline bool br_multicast_querier_exists(struct net_bridge_mcast *brmctx, return false; } -static inline void br_mdb_init(void) +static inline int br_mdb_add(struct net_device *dev, struct nlattr *tb[], + u16 nlmsg_flags, struct netlink_ext_ack *extack) +{ + return -EOPNOTSUPP; +} + +static inline int br_mdb_del(struct net_device *dev, struct nlattr *tb[], + struct netlink_ext_ack *extack) { + return -EOPNOTSUPP; } -static inline void br_mdb_uninit(void) +static inline int br_mdb_dump(struct net_device *dev, struct sk_buff *skb, + struct netlink_callback *cb) { + return 0; } static inline int br_mdb_hash_init(struct net_bridge *br) @@ -2204,4 +2220,5 @@ void br_do_proxy_suppress_arp(struct sk_buff *skb, struct net_bridge *br, void br_do_suppress_nd(struct sk_buff *skb, struct net_bridge *br, u16 vid, struct net_bridge_port *p, struct nd_msg *msg); struct nd_msg *br_is_nd_neigh_msg(struct sk_buff *skb, struct nd_msg *m); +bool br_is_neigh_suppress_enabled(const struct net_bridge_port *p, u16 vid); #endif diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c index 8a3dbc09ba38..15f44d026e75 100644 --- a/net/bridge/br_vlan.c +++ b/net/bridge/br_vlan.c @@ -2134,6 +2134,7 @@ static const struct nla_policy br_vlan_db_policy[BRIDGE_VLANDB_ENTRY_MAX + 1] = [BRIDGE_VLANDB_ENTRY_MCAST_ROUTER] = { .type = NLA_U8 }, [BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS] = { .type = NLA_REJECT }, [BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS] = { .type = NLA_U32 }, + [BRIDGE_VLANDB_ENTRY_NEIGH_SUPPRESS] = NLA_POLICY_MAX(NLA_U8, 1), }; static int br_vlan_rtm_process_one(struct net_device *dev, diff --git a/net/bridge/br_vlan_options.c b/net/bridge/br_vlan_options.c index e378c2f3a9e2..8fa89b04ee94 100644 --- a/net/bridge/br_vlan_options.c +++ b/net/bridge/br_vlan_options.c @@ -52,7 +52,9 @@ bool br_vlan_opts_fill(struct sk_buff *skb, const struct net_bridge_vlan *v, const struct net_bridge_port *p) { if (nla_put_u8(skb, BRIDGE_VLANDB_ENTRY_STATE, br_vlan_get_state(v)) || - !__vlan_tun_put(skb, v)) + !__vlan_tun_put(skb, v) || + nla_put_u8(skb, BRIDGE_VLANDB_ENTRY_NEIGH_SUPPRESS, + !!(v->priv_flags & BR_VLFLAG_NEIGH_SUPPRESS_ENABLED))) return false; #ifdef CONFIG_BRIDGE_IGMP_SNOOPING @@ -80,6 +82,7 @@ size_t br_vlan_opts_nl_size(void) + nla_total_size(sizeof(u32)) /* BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS */ + nla_total_size(sizeof(u32)) /* BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS */ #endif + + nla_total_size(sizeof(u8)) /* BRIDGE_VLANDB_ENTRY_NEIGH_SUPPRESS */ + 0; } @@ -239,6 +242,21 @@ static int br_vlan_process_one_opts(const struct net_bridge *br, } #endif + if (tb[BRIDGE_VLANDB_ENTRY_NEIGH_SUPPRESS]) { + bool enabled = v->priv_flags & BR_VLFLAG_NEIGH_SUPPRESS_ENABLED; + bool val = nla_get_u8(tb[BRIDGE_VLANDB_ENTRY_NEIGH_SUPPRESS]); + + if (!p) { + NL_SET_ERR_MSG_MOD(extack, "Can't set neigh_suppress for non-port vlans"); + return -EINVAL; + } + + if (val != enabled) { + v->priv_flags ^= BR_VLFLAG_NEIGH_SUPPRESS_ENABLED; + *changed = true; + } + } + return 0; } diff --git a/net/bridge/netfilter/nft_meta_bridge.c b/net/bridge/netfilter/nft_meta_bridge.c index c3ecd77e25cb..bd4d1b4d745f 100644 --- a/net/bridge/netfilter/nft_meta_bridge.c +++ b/net/bridge/netfilter/nft_meta_bridge.c @@ -8,6 +8,9 @@ #include <net/netfilter/nf_tables.h> #include <net/netfilter/nft_meta.h> #include <linux/if_bridge.h> +#include <uapi/linux/netfilter_bridge.h> /* NF_BR_PRE_ROUTING */ + +#include "../br_private.h" static const struct net_device * nft_meta_get_bridge(const struct net_device *dev) @@ -102,6 +105,50 @@ static const struct nft_expr_ops nft_meta_bridge_get_ops = { .reduce = nft_meta_get_reduce, }; +static void nft_meta_bridge_set_eval(const struct nft_expr *expr, + struct nft_regs *regs, + const struct nft_pktinfo *pkt) +{ + const struct nft_meta *meta = nft_expr_priv(expr); + u32 *sreg = ®s->data[meta->sreg]; + struct sk_buff *skb = pkt->skb; + u8 value8; + + switch (meta->key) { + case NFT_META_BRI_BROUTE: + value8 = nft_reg_load8(sreg); + BR_INPUT_SKB_CB(skb)->br_netfilter_broute = !!value8; + break; + default: + nft_meta_set_eval(expr, regs, pkt); + } +} + +static int nft_meta_bridge_set_init(const struct nft_ctx *ctx, + const struct nft_expr *expr, + const struct nlattr * const tb[]) +{ + struct nft_meta *priv = nft_expr_priv(expr); + unsigned int len; + int err; + + priv->key = ntohl(nla_get_be32(tb[NFTA_META_KEY])); + switch (priv->key) { + case NFT_META_BRI_BROUTE: + len = sizeof(u8); + break; + default: + return nft_meta_set_init(ctx, expr, tb); + } + + priv->len = len; + err = nft_parse_register_load(tb[NFTA_META_SREG], &priv->sreg, len); + if (err < 0) + return err; + + return 0; +} + static bool nft_meta_bridge_set_reduce(struct nft_regs_track *track, const struct nft_expr *expr) { @@ -120,15 +167,33 @@ static bool nft_meta_bridge_set_reduce(struct nft_regs_track *track, return false; } +static int nft_meta_bridge_set_validate(const struct nft_ctx *ctx, + const struct nft_expr *expr, + const struct nft_data **data) +{ + struct nft_meta *priv = nft_expr_priv(expr); + unsigned int hooks; + + switch (priv->key) { + case NFT_META_BRI_BROUTE: + hooks = 1 << NF_BR_PRE_ROUTING; + break; + default: + return nft_meta_set_validate(ctx, expr, data); + } + + return nft_chain_validate_hooks(ctx->chain, hooks); +} + static const struct nft_expr_ops nft_meta_bridge_set_ops = { .type = &nft_meta_bridge_type, .size = NFT_EXPR_SIZE(sizeof(struct nft_meta)), - .eval = nft_meta_set_eval, - .init = nft_meta_set_init, + .eval = nft_meta_bridge_set_eval, + .init = nft_meta_bridge_set_init, .destroy = nft_meta_set_destroy, .dump = nft_meta_set_dump, .reduce = nft_meta_bridge_set_reduce, - .validate = nft_meta_set_validate, + .validate = nft_meta_bridge_set_validate, }; static const struct nft_expr_ops * diff --git a/net/can/isotp.c b/net/can/isotp.c index 5761d4ab839d..a750259cb79c 100644 --- a/net/can/isotp.c +++ b/net/can/isotp.c @@ -85,10 +85,21 @@ MODULE_ALIAS("can-proto-6"); /* ISO 15765-2:2016 supports more than 4095 byte per ISO PDU as the FF_DL can * take full 32 bit values (4 Gbyte). We would need some good concept to handle - * this between user space and kernel space. For now increase the static buffer - * to something about 64 kbyte to be able to test this new functionality. + * this between user space and kernel space. For now set the static buffer to + * something about 8 kbyte to be able to test this new functionality. */ -#define MAX_MSG_LENGTH 66000 +#define DEFAULT_MAX_PDU_SIZE 8300 + +/* maximum PDU size before ISO 15765-2:2016 extension was 4095 */ +#define MAX_12BIT_PDU_SIZE 4095 + +/* limit the isotp pdu size from the optional module parameter to 1MByte */ +#define MAX_PDU_SIZE (1025 * 1024U) + +static unsigned int max_pdu_size __read_mostly = DEFAULT_MAX_PDU_SIZE; +module_param(max_pdu_size, uint, 0444); +MODULE_PARM_DESC(max_pdu_size, "maximum isotp pdu size (default " + __stringify(DEFAULT_MAX_PDU_SIZE) ")"); /* N_PCI type values in bits 7-4 of N_PCI bytes */ #define N_PCI_SF 0x00 /* single frame */ @@ -124,13 +135,15 @@ enum { }; struct tpcon { - unsigned int idx; + u8 *buf; + unsigned int buflen; unsigned int len; + unsigned int idx; u32 state; u8 bs; u8 sn; u8 ll_dl; - u8 buf[MAX_MSG_LENGTH + 1]; + u8 sbuf[DEFAULT_MAX_PDU_SIZE]; }; struct isotp_sock { @@ -504,7 +517,17 @@ static int isotp_rcv_ff(struct sock *sk, struct canfd_frame *cf, int ae) if (so->rx.len + ae + off + ff_pci_sz < so->rx.ll_dl) return 1; - if (so->rx.len > MAX_MSG_LENGTH) { + /* PDU size > default => try max_pdu_size */ + if (so->rx.len > so->rx.buflen && so->rx.buflen < max_pdu_size) { + u8 *newbuf = kmalloc(max_pdu_size, GFP_ATOMIC); + + if (newbuf) { + so->rx.buf = newbuf; + so->rx.buflen = max_pdu_size; + } + } + + if (so->rx.len > so->rx.buflen) { /* send FC frame with overflow status */ isotp_send_fc(sk, ae, ISOTP_FC_OVFLW); return 1; @@ -808,7 +831,7 @@ static void isotp_create_fframe(struct canfd_frame *cf, struct isotp_sock *so, cf->data[0] = so->opt.ext_address; /* create N_PCI bytes with 12/32 bit FF_DL data length */ - if (so->tx.len > 4095) { + if (so->tx.len > MAX_12BIT_PDU_SIZE) { /* use 32 bit FF_DL notation */ cf->data[ae] = N_PCI_FF; cf->data[ae + 1] = 0; @@ -948,7 +971,17 @@ wait_free_buffer: goto wait_free_buffer; } - if (!size || size > MAX_MSG_LENGTH) { + /* PDU size > default => try max_pdu_size */ + if (size > so->tx.buflen && so->tx.buflen < max_pdu_size) { + u8 *newbuf = kmalloc(max_pdu_size, GFP_KERNEL); + + if (newbuf) { + so->tx.buf = newbuf; + so->tx.buflen = max_pdu_size; + } + } + + if (!size || size > so->tx.buflen) { err = -EINVAL; goto err_out_drop; } @@ -1202,6 +1235,12 @@ static int isotp_release(struct socket *sock) so->ifindex = 0; so->bound = 0; + if (so->rx.buf != so->rx.sbuf) + kfree(so->rx.buf); + + if (so->tx.buf != so->tx.sbuf) + kfree(so->tx.buf); + sock_orphan(sk); sock->sk = NULL; @@ -1598,6 +1637,11 @@ static int isotp_init(struct sock *sk) so->rx.state = ISOTP_IDLE; so->tx.state = ISOTP_IDLE; + so->rx.buf = so->rx.sbuf; + so->tx.buf = so->tx.sbuf; + so->rx.buflen = ARRAY_SIZE(so->rx.sbuf); + so->tx.buflen = ARRAY_SIZE(so->tx.sbuf); + hrtimer_init(&so->rxtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_SOFT); so->rxtimer.function = isotp_rx_timer_handler; hrtimer_init(&so->txtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_SOFT); @@ -1680,7 +1724,10 @@ static __init int isotp_module_init(void) { int err; - pr_info("can: isotp protocol\n"); + max_pdu_size = max_t(unsigned int, max_pdu_size, MAX_12BIT_PDU_SIZE); + max_pdu_size = min_t(unsigned int, max_pdu_size, MAX_PDU_SIZE); + + pr_info("can: isotp protocol (max_pdu_size %d)\n", max_pdu_size); err = can_proto_register(&isotp_can_proto); if (err < 0) diff --git a/net/compat.c b/net/compat.c index 161b7bea1f62..6564720f32b7 100644 --- a/net/compat.c +++ b/net/compat.c @@ -113,7 +113,7 @@ int get_compat_msghdr(struct msghdr *kmsg, #define CMSG_COMPAT_FIRSTHDR(msg) \ (((msg)->msg_controllen) >= sizeof(struct compat_cmsghdr) ? \ - (struct compat_cmsghdr __user *)((msg)->msg_control) : \ + (struct compat_cmsghdr __user *)((msg)->msg_control_user) : \ (struct compat_cmsghdr __user *)NULL) #define CMSG_COMPAT_OK(ucmlen, ucmsg, mhdr) \ @@ -126,7 +126,7 @@ static inline struct compat_cmsghdr __user *cmsg_compat_nxthdr(struct msghdr *ms struct compat_cmsghdr __user *cmsg, int cmsg_len) { char __user *ptr = (char __user *)cmsg + CMSG_COMPAT_ALIGN(cmsg_len); - if ((unsigned long)(ptr + 1 - (char __user *)msg->msg_control) > + if ((unsigned long)(ptr + 1 - (char __user *)msg->msg_control_user) > msg->msg_controllen) return NULL; return (struct compat_cmsghdr __user *)ptr; @@ -211,6 +211,7 @@ int cmsghdr_from_user_compat_to_kern(struct msghdr *kmsg, struct sock *sk, goto Einval; /* Ok, looks like we made it. Hook it up and return success. */ + kmsg->msg_control_is_user = false; kmsg->msg_control = kcmsg_base; kmsg->msg_controllen = kcmlen; return 0; @@ -225,7 +226,7 @@ Efault: int put_cmsg_compat(struct msghdr *kmsg, int level, int type, int len, void *data) { - struct compat_cmsghdr __user *cm = (struct compat_cmsghdr __user *) kmsg->msg_control; + struct compat_cmsghdr __user *cm = (struct compat_cmsghdr __user *) kmsg->msg_control_user; struct compat_cmsghdr cmhdr; struct old_timeval32 ctv; struct old_timespec32 cts[3]; @@ -274,7 +275,7 @@ int put_cmsg_compat(struct msghdr *kmsg, int level, int type, int len, void *dat cmlen = CMSG_COMPAT_SPACE(len); if (kmsg->msg_controllen < cmlen) cmlen = kmsg->msg_controllen; - kmsg->msg_control += cmlen; + kmsg->msg_control_user += cmlen; kmsg->msg_controllen -= cmlen; return 0; } @@ -289,7 +290,7 @@ static int scm_max_fds_compat(struct msghdr *msg) void scm_detach_fds_compat(struct msghdr *msg, struct scm_cookie *scm) { struct compat_cmsghdr __user *cm = - (struct compat_cmsghdr __user *)msg->msg_control; + (struct compat_cmsghdr __user *)msg->msg_control_user; unsigned int o_flags = (msg->msg_flags & MSG_CMSG_CLOEXEC) ? O_CLOEXEC : 0; int fdmax = min_t(int, scm_max_fds_compat(msg), scm->fp->count); int __user *cmsg_data = CMSG_COMPAT_DATA(cm); @@ -313,7 +314,7 @@ void scm_detach_fds_compat(struct msghdr *msg, struct scm_cookie *scm) cmlen = CMSG_COMPAT_SPACE(i * sizeof(int)); if (msg->msg_controllen < cmlen) cmlen = msg->msg_controllen; - msg->msg_control += cmlen; + msg->msg_control_user += cmlen; msg->msg_controllen -= cmlen; } } diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c index bb378c33f542..d4172534dfa8 100644 --- a/net/core/bpf_sk_storage.c +++ b/net/core/bpf_sk_storage.c @@ -40,7 +40,7 @@ static int bpf_sk_storage_del(struct sock *sk, struct bpf_map *map) if (!sdata) return -ENOENT; - bpf_selem_unlink(SELEM(sdata), true); + bpf_selem_unlink(SELEM(sdata), false); return 0; } @@ -49,7 +49,6 @@ static int bpf_sk_storage_del(struct sock *sk, struct bpf_map *map) void bpf_sk_storage_free(struct sock *sk) { struct bpf_local_storage *sk_storage; - bool free_sk_storage = false; rcu_read_lock(); sk_storage = rcu_dereference(sk->sk_bpf_storage); @@ -58,13 +57,8 @@ void bpf_sk_storage_free(struct sock *sk) return; } - raw_spin_lock_bh(&sk_storage->lock); - free_sk_storage = bpf_local_storage_unlink_nolock(sk_storage); - raw_spin_unlock_bh(&sk_storage->lock); + bpf_local_storage_destroy(sk_storage); rcu_read_unlock(); - - if (free_sk_storage) - kfree_rcu(sk_storage, rcu); } static void bpf_sk_storage_map_free(struct bpf_map *map) @@ -74,7 +68,7 @@ static void bpf_sk_storage_map_free(struct bpf_map *map) static struct bpf_map *bpf_sk_storage_map_alloc(union bpf_attr *attr) { - return bpf_local_storage_map_alloc(attr, &sk_cache); + return bpf_local_storage_map_alloc(attr, &sk_cache, false); } static int notsupp_get_next_key(struct bpf_map *map, void *key, @@ -100,8 +94,8 @@ static void *bpf_fd_sk_storage_lookup_elem(struct bpf_map *map, void *key) return ERR_PTR(err); } -static int bpf_fd_sk_storage_update_elem(struct bpf_map *map, void *key, - void *value, u64 map_flags) +static long bpf_fd_sk_storage_update_elem(struct bpf_map *map, void *key, + void *value, u64 map_flags) { struct bpf_local_storage_data *sdata; struct socket *sock; @@ -120,7 +114,7 @@ static int bpf_fd_sk_storage_update_elem(struct bpf_map *map, void *key, return err; } -static int bpf_fd_sk_storage_delete_elem(struct bpf_map *map, void *key) +static long bpf_fd_sk_storage_delete_elem(struct bpf_map *map, void *key) { struct socket *sock; int fd, err; @@ -203,7 +197,7 @@ int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk) } else { ret = bpf_local_storage_alloc(newsk, smap, copy_selem, GFP_ATOMIC); if (ret) { - kfree(copy_selem); + bpf_selem_free(copy_selem, smap, true); atomic_sub(smap->elem_size, &newsk->sk_omem_alloc); bpf_map_put(map); @@ -324,6 +318,7 @@ const struct bpf_map_ops sk_storage_map_ops = { .map_local_storage_charge = bpf_sk_storage_charge, .map_local_storage_uncharge = bpf_sk_storage_uncharge, .map_owner_storage_ptr = bpf_sk_storage_ptr, + .map_mem_usage = bpf_local_storage_map_mem_usage, }; const struct bpf_func_proto bpf_sk_storage_get_proto = { @@ -417,7 +412,7 @@ const struct bpf_func_proto bpf_sk_storage_get_tracing_proto = { .gpl_only = false, .ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL, .arg1_type = ARG_CONST_MAP_PTR, - .arg2_type = ARG_PTR_TO_BTF_ID, + .arg2_type = ARG_PTR_TO_BTF_ID_OR_NULL, .arg2_btf_id = &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON], .arg3_type = ARG_PTR_TO_MAP_VALUE_OR_NULL, .arg4_type = ARG_ANYTHING, @@ -429,7 +424,7 @@ const struct bpf_func_proto bpf_sk_storage_delete_tracing_proto = { .gpl_only = false, .ret_type = RET_INTEGER, .arg1_type = ARG_CONST_MAP_PTR, - .arg2_type = ARG_PTR_TO_BTF_ID, + .arg2_type = ARG_PTR_TO_BTF_ID_OR_NULL, .arg2_btf_id = &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON], .allowed = bpf_sk_storage_tracing_allowed, }; diff --git a/net/core/datagram.c b/net/core/datagram.c index e4ff2db40c98..5662dff3d381 100644 --- a/net/core/datagram.c +++ b/net/core/datagram.c @@ -622,12 +622,12 @@ int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk, frag = skb_shinfo(skb)->nr_frags; while (length && iov_iter_count(from)) { + struct page *head, *last_head = NULL; struct page *pages[MAX_SKB_FRAGS]; - struct page *last_head = NULL; + int refs, order, n = 0; size_t start; ssize_t copied; unsigned long truesize; - int refs, n = 0; if (frag == MAX_SKB_FRAGS) return -EMSGSIZE; @@ -650,9 +650,17 @@ int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk, } else { refcount_add(truesize, &skb->sk->sk_wmem_alloc); } + + head = compound_head(pages[n]); + order = compound_order(head); + for (refs = 0; copied != 0; start = 0) { int size = min_t(int, copied, PAGE_SIZE - start); - struct page *head = compound_head(pages[n]); + + if (pages[n] - head > (1UL << order) - 1) { + head = compound_head(pages[n]); + order = compound_order(head); + } start += (pages[n] - head) << PAGE_SHIFT; copied -= size; diff --git a/net/core/dev.c b/net/core/dev.c index 1488f700bf81..735096d42c1d 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -160,8 +160,6 @@ struct list_head ptype_base[PTYPE_HASH_SIZE] __read_mostly; struct list_head ptype_all __read_mostly; /* Taps */ static int netif_rx_internal(struct sk_buff *skb); -static int call_netdevice_notifiers_info(unsigned long val, - struct netdev_notifier_info *info); static int call_netdevice_notifiers_extack(unsigned long val, struct net_device *dev, struct netlink_ext_ack *extack); @@ -1919,8 +1917,8 @@ static void move_netdevice_notifiers_dev_net(struct net_device *dev, * are as for raw_notifier_call_chain(). */ -static int call_netdevice_notifiers_info(unsigned long val, - struct netdev_notifier_info *info) +int call_netdevice_notifiers_info(unsigned long val, + struct netdev_notifier_info *info) { struct net *net = dev_net(info->dev); int ret; @@ -2535,6 +2533,8 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask, struct xps_map *map, *new_map; unsigned int nr_ids; + WARN_ON_ONCE(index >= dev->num_tx_queues); + if (dev->num_tc) { /* Do not allow XPS on subordinate device directly */ num_tc = dev->num_tc; @@ -3075,7 +3075,7 @@ void __netif_schedule(struct Qdisc *q) EXPORT_SYMBOL(__netif_schedule); struct dev_kfree_skb_cb { - enum skb_free_reason reason; + enum skb_drop_reason reason; }; static struct dev_kfree_skb_cb *get_kfree_skb_cb(const struct sk_buff *skb) @@ -3108,7 +3108,7 @@ void netif_tx_wake_queue(struct netdev_queue *dev_queue) } EXPORT_SYMBOL(netif_tx_wake_queue); -void __dev_kfree_skb_irq(struct sk_buff *skb, enum skb_free_reason reason) +void dev_kfree_skb_irq_reason(struct sk_buff *skb, enum skb_drop_reason reason) { unsigned long flags; @@ -3128,18 +3128,16 @@ void __dev_kfree_skb_irq(struct sk_buff *skb, enum skb_free_reason reason) raise_softirq_irqoff(NET_TX_SOFTIRQ); local_irq_restore(flags); } -EXPORT_SYMBOL(__dev_kfree_skb_irq); +EXPORT_SYMBOL(dev_kfree_skb_irq_reason); -void __dev_kfree_skb_any(struct sk_buff *skb, enum skb_free_reason reason) +void dev_kfree_skb_any_reason(struct sk_buff *skb, enum skb_drop_reason reason) { if (in_hardirq() || irqs_disabled()) - __dev_kfree_skb_irq(skb, reason); - else if (unlikely(reason == SKB_REASON_DROPPED)) - kfree_skb(skb); + dev_kfree_skb_irq_reason(skb, reason); else - consume_skb(skb); + kfree_skb_reason(skb, reason); } -EXPORT_SYMBOL(__dev_kfree_skb_any); +EXPORT_SYMBOL(dev_kfree_skb_any_reason); /** @@ -3317,8 +3315,7 @@ int skb_crc32c_csum_help(struct sk_buff *skb) skb->len - start, ~(__u32)0, crc32c_csum_stub)); *(__le32 *)(skb->data + offset) = crc32c_csum; - skb->ip_summed = CHECKSUM_NONE; - skb->csum_not_inet = 0; + skb_reset_csum_not_inet(skb); out: return ret; } @@ -3736,25 +3733,25 @@ static void qdisc_pkt_len_init(struct sk_buff *skb) * we add to pkt_len the headers size of all segments */ if (shinfo->gso_size && skb_transport_header_was_set(skb)) { - unsigned int hdr_len; u16 gso_segs = shinfo->gso_segs; + unsigned int hdr_len; /* mac layer + network layer */ - hdr_len = skb_transport_header(skb) - skb_mac_header(skb); + hdr_len = skb_transport_offset(skb); /* + transport layer */ if (likely(shinfo->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6))) { const struct tcphdr *th; struct tcphdr _tcphdr; - th = skb_header_pointer(skb, skb_transport_offset(skb), + th = skb_header_pointer(skb, hdr_len, sizeof(_tcphdr), &_tcphdr); if (likely(th)) hdr_len += __tcp_hdrlen(th); } else { struct udphdr _udphdr; - if (skb_header_pointer(skb, skb_transport_offset(skb), + if (skb_header_pointer(skb, hdr_len, sizeof(_udphdr), &_udphdr)) hdr_len += sizeof(struct udphdr); } @@ -4361,7 +4358,12 @@ static inline void ____napi_schedule(struct softnet_data *sd, } list_add_tail(&napi->poll_list, &sd->poll_list); - __raise_softirq_irqoff(NET_RX_SOFTIRQ); + WRITE_ONCE(napi->list_owner, smp_processor_id()); + /* If not called from net_rx_action() + * we have to raise NET_RX_SOFTIRQ. + */ + if (!sd->in_net_rx_action) + __raise_softirq_irqoff(NET_RX_SOFTIRQ); } #ifdef CONFIG_RPS @@ -4583,11 +4585,16 @@ static void trigger_rx_softirq(void *data) } /* - * Check if this softnet_data structure is another cpu one - * If yes, queue it to our IPI list and return 1 - * If no, return 0 + * After we queued a packet into sd->input_pkt_queue, + * we need to make sure this queue is serviced soon. + * + * - If this is another cpu queue, link it to our rps_ipi_list, + * and make sure we will process rps_ipi_list from net_rx_action(). + * + * - If this is our own queue, NAPI schedule our backlog. + * Note that this also raises NET_RX_SOFTIRQ. */ -static int napi_schedule_rps(struct softnet_data *sd) +static void napi_schedule_rps(struct softnet_data *sd) { struct softnet_data *mysd = this_cpu_ptr(&softnet_data); @@ -4596,12 +4603,15 @@ static int napi_schedule_rps(struct softnet_data *sd) sd->rps_ipi_next = mysd->rps_ipi_list; mysd->rps_ipi_list = sd; - __raise_softirq_irqoff(NET_RX_SOFTIRQ); - return 1; + /* If not called from net_rx_action() or napi_threaded_poll() + * we have to raise NET_RX_SOFTIRQ. + */ + if (!mysd->in_net_rx_action && !mysd->in_napi_threaded_poll) + __raise_softirq_irqoff(NET_RX_SOFTIRQ); + return; } #endif /* CONFIG_RPS */ __napi_schedule_irqoff(&mysd->backlog); - return 0; } #ifdef CONFIG_NET_FLOW_LIMIT @@ -5021,16 +5031,17 @@ static __latent_entropy void net_tx_action(struct softirq_action *h) clist = clist->next; WARN_ON(refcount_read(&skb->users)); - if (likely(get_kfree_skb_cb(skb)->reason == SKB_REASON_CONSUMED)) + if (likely(get_kfree_skb_cb(skb)->reason == SKB_CONSUMED)) trace_consume_skb(skb, net_tx_action); else trace_kfree_skb(skb, net_tx_action, - SKB_DROP_REASON_NOT_SPECIFIED); + get_kfree_skb_cb(skb)->reason); if (skb->fclone != SKB_FCLONE_UNAVAILABLE) __kfree_skb(skb); else - __kfree_skb_defer(skb); + __napi_kfree_skb(skb, + get_kfree_skb_cb(skb)->reason); } } @@ -6059,6 +6070,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done) list_del_init(&n->poll_list); local_irq_restore(flags); } + WRITE_ONCE(n->list_owner, -1); val = READ_ONCE(n->state); do { @@ -6374,6 +6386,7 @@ void netif_napi_add_weight(struct net_device *dev, struct napi_struct *napi, #ifdef CONFIG_NETPOLL napi->poll_owner = -1; #endif + napi->list_owner = -1; set_bit(NAPI_STATE_SCHED, &napi->state); set_bit(NAPI_STATE_NPSVC, &napi->state); list_add_rcu(&napi->dev_list, &dev->napi_list); @@ -6585,9 +6598,31 @@ static int napi_thread_wait(struct napi_struct *napi) return -1; } +static void skb_defer_free_flush(struct softnet_data *sd) +{ + struct sk_buff *skb, *next; + + /* Paired with WRITE_ONCE() in skb_attempt_defer_free() */ + if (!READ_ONCE(sd->defer_list)) + return; + + spin_lock(&sd->defer_lock); + skb = sd->defer_list; + sd->defer_list = NULL; + sd->defer_count = 0; + spin_unlock(&sd->defer_lock); + + while (skb != NULL) { + next = skb->next; + napi_consume_skb(skb, 1); + skb = next; + } +} + static int napi_threaded_poll(void *data) { struct napi_struct *napi = data; + struct softnet_data *sd; void *have; while (!napi_thread_wait(napi)) { @@ -6595,11 +6630,21 @@ static int napi_threaded_poll(void *data) bool repoll = false; local_bh_disable(); + sd = this_cpu_ptr(&softnet_data); + sd->in_napi_threaded_poll = true; have = netpoll_poll_lock(napi); __napi_poll(napi, &repoll); netpoll_poll_unlock(have); + sd->in_napi_threaded_poll = false; + barrier(); + + if (sd_has_rps_ipi_waiting(sd)) { + local_irq_disable(); + net_rps_action_and_irq_enable(sd); + } + skb_defer_free_flush(sd); local_bh_enable(); if (!repoll) @@ -6611,27 +6656,6 @@ static int napi_threaded_poll(void *data) return 0; } -static void skb_defer_free_flush(struct softnet_data *sd) -{ - struct sk_buff *skb, *next; - - /* Paired with WRITE_ONCE() in skb_attempt_defer_free() */ - if (!READ_ONCE(sd->defer_list)) - return; - - spin_lock_irq(&sd->defer_lock); - skb = sd->defer_list; - sd->defer_list = NULL; - sd->defer_count = 0; - spin_unlock_irq(&sd->defer_lock); - - while (skb != NULL) { - next = skb->next; - napi_consume_skb(skb, 1); - skb = next; - } -} - static __latent_entropy void net_rx_action(struct softirq_action *h) { struct softnet_data *sd = this_cpu_ptr(&softnet_data); @@ -6641,6 +6665,8 @@ static __latent_entropy void net_rx_action(struct softirq_action *h) LIST_HEAD(list); LIST_HEAD(repoll); +start: + sd->in_net_rx_action = true; local_irq_disable(); list_splice_init(&sd->poll_list, &list); local_irq_enable(); @@ -6651,8 +6677,18 @@ static __latent_entropy void net_rx_action(struct softirq_action *h) skb_defer_free_flush(sd); if (list_empty(&list)) { - if (!sd_has_rps_ipi_waiting(sd) && list_empty(&repoll)) - goto end; + if (list_empty(&repoll)) { + sd->in_net_rx_action = false; + barrier(); + /* We need to check if ____napi_schedule() + * had refilled poll_list while + * sd->in_net_rx_action was true. + */ + if (!list_empty(&sd->poll_list)) + goto start; + if (!sd_has_rps_ipi_waiting(sd)) + goto end; + } break; } @@ -6677,6 +6713,8 @@ static __latent_entropy void net_rx_action(struct softirq_action *h) list_splice(&list, &sd->poll_list); if (!list_empty(&sd->poll_list)) __raise_softirq_irqoff(NET_RX_SOFTIRQ); + else + sd->in_net_rx_action = false; net_rps_action_and_irq_enable(sd); end:; diff --git a/net/core/dev_ioctl.c b/net/core/dev_ioctl.c index 5cdbfbf9a7dc..3730945ee294 100644 --- a/net/core/dev_ioctl.c +++ b/net/core/dev_ioctl.c @@ -7,7 +7,7 @@ #include <linux/net_tstamp.h> #include <linux/wireless.h> #include <linux/if_bridge.h> -#include <net/dsa.h> +#include <net/dsa_stubs.h> #include <net/wext.h> #include "dev.h" @@ -183,22 +183,18 @@ static int dev_ifsioc_locked(struct net *net, struct ifreq *ifr, unsigned int cm return err; } -static int net_hwtstamp_validate(struct ifreq *ifr) +static int net_hwtstamp_validate(const struct kernel_hwtstamp_config *cfg) { - struct hwtstamp_config cfg; enum hwtstamp_tx_types tx_type; enum hwtstamp_rx_filters rx_filter; int tx_type_valid = 0; int rx_filter_valid = 0; - if (copy_from_user(&cfg, ifr->ifr_data, sizeof(cfg))) - return -EFAULT; - - if (cfg.flags & ~HWTSTAMP_FLAG_MASK) + if (cfg->flags & ~HWTSTAMP_FLAG_MASK) return -EINVAL; - tx_type = cfg.tx_type; - rx_filter = cfg.rx_filter; + tx_type = cfg->tx_type; + rx_filter = cfg->rx_filter; switch (tx_type) { case HWTSTAMP_TX_OFF: @@ -246,20 +242,45 @@ static int dev_eth_ioctl(struct net_device *dev, struct ifreq *ifr, unsigned int cmd) { const struct net_device_ops *ops = dev->netdev_ops; + + if (!ops->ndo_eth_ioctl) + return -EOPNOTSUPP; + + if (!netif_device_present(dev)) + return -ENODEV; + + return ops->ndo_eth_ioctl(dev, ifr, cmd); +} + +static int dev_get_hwtstamp(struct net_device *dev, struct ifreq *ifr) +{ + return dev_eth_ioctl(dev, ifr, SIOCGHWTSTAMP); +} + +static int dev_set_hwtstamp(struct net_device *dev, struct ifreq *ifr) +{ + struct kernel_hwtstamp_config kernel_cfg; + struct netlink_ext_ack extack = {}; + struct hwtstamp_config cfg; int err; - err = dsa_ndo_eth_ioctl(dev, ifr, cmd); - if (err == 0 || err != -EOPNOTSUPP) + if (copy_from_user(&cfg, ifr->ifr_data, sizeof(cfg))) + return -EFAULT; + + hwtstamp_config_to_kernel(&kernel_cfg, &cfg); + + err = net_hwtstamp_validate(&kernel_cfg); + if (err) return err; - if (ops->ndo_eth_ioctl) { - if (netif_device_present(dev)) - err = ops->ndo_eth_ioctl(dev, ifr, cmd); - else - err = -ENODEV; + err = dsa_master_hwtstamp_validate(dev, &kernel_cfg, &extack); + if (err) { + if (extack._msg) + netdev_err(dev, "%s\n", extack._msg); + return err; } - return err; + return dev_eth_ioctl(dev, ifr, SIOCSHWTSTAMP); } static int dev_siocbond(struct net_device *dev, @@ -391,36 +412,31 @@ static int dev_ifsioc(struct net *net, struct ifreq *ifr, void __user *data, rtnl_lock(); return err; + case SIOCDEVPRIVATE ... SIOCDEVPRIVATE + 15: + return dev_siocdevprivate(dev, ifr, data, cmd); + case SIOCSHWTSTAMP: - err = net_hwtstamp_validate(ifr); - if (err) - return err; - fallthrough; + return dev_set_hwtstamp(dev, ifr); - /* - * Unknown or private ioctl - */ - default: - if (cmd >= SIOCDEVPRIVATE && - cmd <= SIOCDEVPRIVATE + 15) - return dev_siocdevprivate(dev, ifr, data, cmd); - - if (cmd == SIOCGMIIPHY || - cmd == SIOCGMIIREG || - cmd == SIOCSMIIREG || - cmd == SIOCSHWTSTAMP || - cmd == SIOCGHWTSTAMP) { - err = dev_eth_ioctl(dev, ifr, cmd); - } else if (cmd == SIOCBONDENSLAVE || - cmd == SIOCBONDRELEASE || - cmd == SIOCBONDSETHWADDR || - cmd == SIOCBONDSLAVEINFOQUERY || - cmd == SIOCBONDINFOQUERY || - cmd == SIOCBONDCHANGEACTIVE) { - err = dev_siocbond(dev, ifr, cmd); - } else - err = -EINVAL; + case SIOCGHWTSTAMP: + return dev_get_hwtstamp(dev, ifr); + case SIOCGMIIPHY: + case SIOCGMIIREG: + case SIOCSMIIREG: + return dev_eth_ioctl(dev, ifr, cmd); + + case SIOCBONDENSLAVE: + case SIOCBONDRELEASE: + case SIOCBONDSETHWADDR: + case SIOCBONDSLAVEINFOQUERY: + case SIOCBONDINFOQUERY: + case SIOCBONDCHANGEACTIVE: + return dev_siocbond(dev, ifr, cmd); + + /* Unknown ioctl */ + default: + err = -EINVAL; } return err; } @@ -462,6 +478,7 @@ EXPORT_SYMBOL(dev_load); * @net: the applicable net namespace * @cmd: command to issue * @ifr: pointer to a struct ifreq in user space + * @data: data exchanged with userspace * @need_copyout: whether or not copy_to_user() should be called * * Issue ioctl functions to devices. This is normally called by the diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c index 5a782d1d8fd3..aff31cd944c2 100644 --- a/net/core/drop_monitor.c +++ b/net/core/drop_monitor.c @@ -21,6 +21,7 @@ #include <linux/workqueue.h> #include <linux/netlink.h> #include <linux/net_dropmon.h> +#include <linux/bitfield.h> #include <linux/percpu.h> #include <linux/timer.h> #include <linux/bitops.h> @@ -29,6 +30,7 @@ #include <net/genetlink.h> #include <net/netevent.h> #include <net/flow_offload.h> +#include <net/dropreason.h> #include <net/devlink.h> #include <trace/events/skb.h> @@ -504,8 +506,6 @@ static void net_dm_packet_trace_kfree_skb_hit(void *ignore, if (!nskb) return; - if (unlikely(reason >= SKB_DROP_REASON_MAX || reason <= 0)) - reason = SKB_DROP_REASON_NOT_SPECIFIED; cb = NET_DM_SKB_CB(nskb); cb->reason = reason; cb->pc = location; @@ -552,9 +552,9 @@ static size_t net_dm_in_port_size(void) } #define NET_DM_MAX_SYMBOL_LEN 40 +#define NET_DM_MAX_REASON_LEN 50 -static size_t net_dm_packet_report_size(size_t payload_len, - enum skb_drop_reason reason) +static size_t net_dm_packet_report_size(size_t payload_len) { size_t size; @@ -576,7 +576,7 @@ static size_t net_dm_packet_report_size(size_t payload_len, /* NET_DM_ATTR_PROTO */ nla_total_size(sizeof(u16)) + /* NET_DM_ATTR_REASON */ - nla_total_size(strlen(drop_reasons[reason]) + 1) + + nla_total_size(NET_DM_MAX_REASON_LEN + 1) + /* NET_DM_ATTR_PAYLOAD */ nla_total_size(payload_len); } @@ -610,6 +610,8 @@ static int net_dm_packet_report_fill(struct sk_buff *msg, struct sk_buff *skb, size_t payload_len) { struct net_dm_skb_cb *cb = NET_DM_SKB_CB(skb); + const struct drop_reason_list *list = NULL; + unsigned int subsys, subsys_reason; char buf[NET_DM_MAX_SYMBOL_LEN]; struct nlattr *attr; void *hdr; @@ -627,9 +629,24 @@ static int net_dm_packet_report_fill(struct sk_buff *msg, struct sk_buff *skb, NET_DM_ATTR_PAD)) goto nla_put_failure; + rcu_read_lock(); + subsys = u32_get_bits(cb->reason, SKB_DROP_REASON_SUBSYS_MASK); + if (subsys < SKB_DROP_REASON_SUBSYS_NUM) + list = rcu_dereference(drop_reasons_by_subsys[subsys]); + subsys_reason = cb->reason & ~SKB_DROP_REASON_SUBSYS_MASK; + if (!list || + subsys_reason >= list->n_reasons || + !list->reasons[subsys_reason] || + strlen(list->reasons[subsys_reason]) > NET_DM_MAX_REASON_LEN) { + list = rcu_dereference(drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_CORE]); + subsys_reason = SKB_DROP_REASON_NOT_SPECIFIED; + } if (nla_put_string(msg, NET_DM_ATTR_REASON, - drop_reasons[cb->reason])) + list->reasons[subsys_reason])) { + rcu_read_unlock(); goto nla_put_failure; + } + rcu_read_unlock(); snprintf(buf, sizeof(buf), "%pS", cb->pc); if (nla_put_string(msg, NET_DM_ATTR_SYMBOL, buf)) @@ -687,9 +704,7 @@ static void net_dm_packet_report(struct sk_buff *skb) if (net_dm_trunc_len) payload_len = min_t(size_t, net_dm_trunc_len, payload_len); - msg = nlmsg_new(net_dm_packet_report_size(payload_len, - NET_DM_SKB_CB(skb)->reason), - GFP_KERNEL); + msg = nlmsg_new(net_dm_packet_report_size(payload_len), GFP_KERNEL); if (!msg) goto out; diff --git a/net/core/dst.c b/net/core/dst.c index 31c08a3386d3..79d9306ad1ee 100644 --- a/net/core/dst.c +++ b/net/core/dst.c @@ -66,7 +66,8 @@ void dst_init(struct dst_entry *dst, struct dst_ops *ops, dst->tclassid = 0; #endif dst->lwtstate = NULL; - atomic_set(&dst->__refcnt, initial_ref); + rcuref_init(&dst->__rcuref, initial_ref); + INIT_LIST_HEAD(&dst->rt_uncached); dst->__use = 0; dst->lastuse = jiffies; dst->flags = flags; @@ -162,31 +163,15 @@ EXPORT_SYMBOL(dst_dev_put); void dst_release(struct dst_entry *dst) { - if (dst) { - int newrefcnt; - - newrefcnt = atomic_dec_return(&dst->__refcnt); - if (WARN_ONCE(newrefcnt < 0, "dst_release underflow")) - net_warn_ratelimited("%s: dst:%p refcnt:%d\n", - __func__, dst, newrefcnt); - if (!newrefcnt) - call_rcu_hurry(&dst->rcu_head, dst_destroy_rcu); - } + if (dst && rcuref_put(&dst->__rcuref)) + call_rcu_hurry(&dst->rcu_head, dst_destroy_rcu); } EXPORT_SYMBOL(dst_release); void dst_release_immediate(struct dst_entry *dst) { - if (dst) { - int newrefcnt; - - newrefcnt = atomic_dec_return(&dst->__refcnt); - if (WARN_ONCE(newrefcnt < 0, "dst_release_immediate underflow")) - net_warn_ratelimited("%s: dst:%p refcnt:%d\n", - __func__, dst, newrefcnt); - if (!newrefcnt) - dst_destroy(dst); - } + if (dst && rcuref_put(&dst->__rcuref)) + dst_destroy(dst); } EXPORT_SYMBOL(dst_release_immediate); diff --git a/net/core/filter.c b/net/core/filter.c index 1d6f165923bf..d9ce04ca22ce 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -1721,6 +1721,12 @@ static const struct bpf_func_proto bpf_skb_store_bytes_proto = { .arg5_type = ARG_ANYTHING, }; +int __bpf_skb_store_bytes(struct sk_buff *skb, u32 offset, const void *from, + u32 len, u64 flags) +{ + return ____bpf_skb_store_bytes(skb, offset, from, len, flags); +} + BPF_CALL_4(bpf_skb_load_bytes, const struct sk_buff *, skb, u32, offset, void *, to, u32, len) { @@ -1751,6 +1757,11 @@ static const struct bpf_func_proto bpf_skb_load_bytes_proto = { .arg4_type = ARG_CONST_SIZE, }; +int __bpf_skb_load_bytes(const struct sk_buff *skb, u32 offset, void *to, u32 len) +{ + return ____bpf_skb_load_bytes(skb, offset, to, len); +} + BPF_CALL_4(bpf_flow_dissector_load_bytes, const struct bpf_flow_dissector *, ctx, u32, offset, void *, to, u32, len) @@ -2111,6 +2122,7 @@ static inline int __bpf_tx_skb(struct net_device *dev, struct sk_buff *skb) } skb->dev = dev; + skb_set_redirected_noclear(skb, skb_at_tc_ingress(skb)); skb_clear_tstamp(skb); dev_xmit_recursion_inc(); @@ -2193,7 +2205,7 @@ static int bpf_out_neigh_v6(struct net *net, struct sk_buff *skb, return -ENOMEM; } - rcu_read_lock_bh(); + rcu_read_lock(); if (!nh) { dst = skb_dst(skb); nexthop = rt6_nexthop(container_of(dst, struct rt6_info, dst), @@ -2206,10 +2218,12 @@ static int bpf_out_neigh_v6(struct net *net, struct sk_buff *skb, int ret; sock_confirm_neigh(skb, neigh); + local_bh_disable(); dev_xmit_recursion_inc(); ret = neigh_output(neigh, skb, false); dev_xmit_recursion_dec(); - rcu_read_unlock_bh(); + local_bh_enable(); + rcu_read_unlock(); return ret; } rcu_read_unlock_bh(); @@ -2291,7 +2305,7 @@ static int bpf_out_neigh_v4(struct net *net, struct sk_buff *skb, return -ENOMEM; } - rcu_read_lock_bh(); + rcu_read_lock(); if (!nh) { struct dst_entry *dst = skb_dst(skb); struct rtable *rt = container_of(dst, struct rtable, dst); @@ -2303,7 +2317,7 @@ static int bpf_out_neigh_v4(struct net *net, struct sk_buff *skb, } else if (nh->nh_family == AF_INET) { neigh = ip_neigh_gw4(dev, nh->ipv4_nh); } else { - rcu_read_unlock_bh(); + rcu_read_unlock(); goto out_drop; } @@ -2311,13 +2325,15 @@ static int bpf_out_neigh_v4(struct net *net, struct sk_buff *skb, int ret; sock_confirm_neigh(skb, neigh); + local_bh_disable(); dev_xmit_recursion_inc(); ret = neigh_output(neigh, skb, is_v6gw); dev_xmit_recursion_dec(); - rcu_read_unlock_bh(); + local_bh_enable(); + rcu_read_unlock(); return ret; } - rcu_read_unlock_bh(); + rcu_read_unlock(); out_drop: kfree_skb(skb); return -ENETDOWN; @@ -3828,7 +3844,7 @@ static const struct bpf_func_proto sk_skb_change_head_proto = { .arg3_type = ARG_ANYTHING, }; -BPF_CALL_1(bpf_xdp_get_buff_len, struct xdp_buff*, xdp) +BPF_CALL_1(bpf_xdp_get_buff_len, struct xdp_buff*, xdp) { return xdp_get_buff_len(xdp); } @@ -3883,8 +3899,8 @@ static const struct bpf_func_proto bpf_xdp_adjust_head_proto = { .arg2_type = ARG_ANYTHING, }; -static void bpf_xdp_copy_buf(struct xdp_buff *xdp, unsigned long off, - void *buf, unsigned long len, bool flush) +void bpf_xdp_copy_buf(struct xdp_buff *xdp, unsigned long off, + void *buf, unsigned long len, bool flush) { unsigned long ptr_len, ptr_off = 0; skb_frag_t *next_frag, *end_frag; @@ -3930,7 +3946,7 @@ static void bpf_xdp_copy_buf(struct xdp_buff *xdp, unsigned long off, } } -static void *bpf_xdp_pointer(struct xdp_buff *xdp, u32 offset, u32 len) +void *bpf_xdp_pointer(struct xdp_buff *xdp, u32 offset, u32 len) { struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp); u32 size = xdp->data_end - xdp->data; @@ -3988,6 +4004,11 @@ static const struct bpf_func_proto bpf_xdp_load_bytes_proto = { .arg4_type = ARG_CONST_SIZE, }; +int __bpf_xdp_load_bytes(struct xdp_buff *xdp, u32 offset, void *buf, u32 len) +{ + return ____bpf_xdp_load_bytes(xdp, offset, buf, len); +} + BPF_CALL_4(bpf_xdp_store_bytes, struct xdp_buff *, xdp, u32, offset, void *, buf, u32, len) { @@ -4015,6 +4036,11 @@ static const struct bpf_func_proto bpf_xdp_store_bytes_proto = { .arg4_type = ARG_CONST_SIZE, }; +int __bpf_xdp_store_bytes(struct xdp_buff *xdp, u32 offset, void *buf, u32 len) +{ + return ____bpf_xdp_store_bytes(xdp, offset, buf, len); +} + static int bpf_xdp_frags_increase_tail(struct xdp_buff *xdp, int offset) { struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp); @@ -4977,7 +5003,7 @@ const struct bpf_func_proto bpf_get_socket_ptr_cookie_proto = { .func = bpf_get_socket_ptr_cookie, .gpl_only = false, .ret_type = RET_INTEGER, - .arg1_type = ARG_PTR_TO_BTF_ID_SOCK_COMMON, + .arg1_type = ARG_PTR_TO_BTF_ID_SOCK_COMMON | PTR_MAYBE_NULL, }; BPF_CALL_1(bpf_get_socket_cookie_sock_ops, struct bpf_sock_ops_kern *, ctx) @@ -5850,7 +5876,7 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params, else neigh = __ipv6_neigh_lookup_noref_stub(dev, params->ipv6_dst); - if (!neigh || !(neigh->nud_state & NUD_VALID)) + if (!neigh || !(READ_ONCE(neigh->nud_state) & NUD_VALID)) return BPF_FIB_LKUP_RET_NO_NEIGH; memcpy(params->dmac, neigh->ha, ETH_ALEN); memcpy(params->smac, dev->dev_addr, ETH_ALEN); @@ -5971,7 +5997,7 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params, * not needed here. */ neigh = __ipv6_neigh_lookup_noref_stub(dev, dst); - if (!neigh || !(neigh->nud_state & NUD_VALID)) + if (!neigh || !(READ_ONCE(neigh->nud_state) & NUD_VALID)) return BPF_FIB_LKUP_RET_NO_NEIGH; memcpy(params->dmac, neigh->ha, ETH_ALEN); memcpy(params->smac, dev->dev_addr, ETH_ALEN); @@ -8144,12 +8170,6 @@ sk_msg_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) return &bpf_sk_storage_delete_proto; case BPF_FUNC_get_netns_cookie: return &bpf_get_netns_cookie_sk_msg_proto; -#ifdef CONFIG_CGROUPS - case BPF_FUNC_get_current_cgroup_id: - return &bpf_get_current_cgroup_id_proto; - case BPF_FUNC_get_current_ancestor_cgroup_id: - return &bpf_get_current_ancestor_cgroup_id_proto; -#endif #ifdef CONFIG_CGROUP_NET_CLASSID case BPF_FUNC_get_cgroup_classid: return &bpf_get_cgroup_classid_curr_proto; @@ -8727,23 +8747,18 @@ EXPORT_SYMBOL_GPL(nf_conn_btf_access_lock); int (*nfct_btf_struct_access)(struct bpf_verifier_log *log, const struct bpf_reg_state *reg, - int off, int size, enum bpf_access_type atype, - u32 *next_btf_id, enum bpf_type_flag *flag); + int off, int size); EXPORT_SYMBOL_GPL(nfct_btf_struct_access); static int tc_cls_act_btf_struct_access(struct bpf_verifier_log *log, const struct bpf_reg_state *reg, - int off, int size, enum bpf_access_type atype, - u32 *next_btf_id, enum bpf_type_flag *flag) + int off, int size) { int ret = -EACCES; - if (atype == BPF_READ) - return btf_struct_access(log, reg, off, size, atype, next_btf_id, flag); - mutex_lock(&nf_conn_btf_access_lock); if (nfct_btf_struct_access) - ret = nfct_btf_struct_access(log, reg, off, size, atype, next_btf_id, flag); + ret = nfct_btf_struct_access(log, reg, off, size); mutex_unlock(&nf_conn_btf_access_lock); return ret; @@ -8810,17 +8825,13 @@ EXPORT_SYMBOL_GPL(bpf_warn_invalid_xdp_action); static int xdp_btf_struct_access(struct bpf_verifier_log *log, const struct bpf_reg_state *reg, - int off, int size, enum bpf_access_type atype, - u32 *next_btf_id, enum bpf_type_flag *flag) + int off, int size) { int ret = -EACCES; - if (atype == BPF_READ) - return btf_struct_access(log, reg, off, size, atype, next_btf_id, flag); - mutex_lock(&nf_conn_btf_access_lock); if (nfct_btf_struct_access) - ret = nfct_btf_struct_access(log, reg, off, size, atype, next_btf_id, flag); + ret = nfct_btf_struct_access(log, reg, off, size); mutex_unlock(&nf_conn_btf_access_lock); return ret; @@ -9170,7 +9181,7 @@ static struct bpf_insn *bpf_convert_tstamp_type_read(const struct bpf_insn *si, __u8 tmp_reg = BPF_REG_AX; *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, - PKT_VLAN_PRESENT_OFFSET); + SKB_BF_MONO_TC_OFFSET); *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, SKB_MONO_DELIVERY_TIME_MASK, 2); *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_UNSPEC); @@ -9217,7 +9228,7 @@ static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog, /* AX is needed because src_reg and dst_reg could be the same */ __u8 tmp_reg = BPF_REG_AX; - *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, PKT_VLAN_PRESENT_OFFSET); + *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, SKB_BF_MONO_TC_OFFSET); *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK); *insn++ = BPF_JMP32_IMM(BPF_JNE, tmp_reg, @@ -9252,23 +9263,27 @@ static struct bpf_insn *bpf_convert_tstamp_write(const struct bpf_prog *prog, if (!prog->tstamp_type_access) { __u8 tmp_reg = BPF_REG_AX; - *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, PKT_VLAN_PRESENT_OFFSET); + *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, SKB_BF_MONO_TC_OFFSET); /* Writing __sk_buff->tstamp as ingress, goto <clear> */ *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, TC_AT_INGRESS_MASK, 1); /* goto <store> */ *insn++ = BPF_JMP_A(2); /* <clear>: mono_delivery_time */ *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, ~SKB_MONO_DELIVERY_TIME_MASK); - *insn++ = BPF_STX_MEM(BPF_B, skb_reg, tmp_reg, PKT_VLAN_PRESENT_OFFSET); + *insn++ = BPF_STX_MEM(BPF_B, skb_reg, tmp_reg, SKB_BF_MONO_TC_OFFSET); } #endif /* <store>: skb->tstamp = tstamp */ - *insn++ = BPF_STX_MEM(BPF_DW, skb_reg, value_reg, - offsetof(struct sk_buff, tstamp)); + *insn++ = BPF_RAW_INSN(BPF_CLASS(si->code) | BPF_DW | BPF_MEM, + skb_reg, value_reg, offsetof(struct sk_buff, tstamp), si->imm); return insn; } +#define BPF_EMIT_STORE(size, si, off) \ + BPF_RAW_INSN(BPF_CLASS((si)->code) | (size) | BPF_MEM, \ + (si)->dst_reg, (si)->src_reg, (off), (si)->imm) + static u32 bpf_convert_ctx_access(enum bpf_access_type type, const struct bpf_insn *si, struct bpf_insn *insn_buf, @@ -9298,9 +9313,9 @@ static u32 bpf_convert_ctx_access(enum bpf_access_type type, case offsetof(struct __sk_buff, priority): if (type == BPF_WRITE) - *insn++ = BPF_STX_MEM(BPF_W, si->dst_reg, si->src_reg, - bpf_target_off(struct sk_buff, priority, 4, - target_size)); + *insn++ = BPF_EMIT_STORE(BPF_W, si, + bpf_target_off(struct sk_buff, priority, 4, + target_size)); else *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg, bpf_target_off(struct sk_buff, priority, 4, @@ -9331,9 +9346,9 @@ static u32 bpf_convert_ctx_access(enum bpf_access_type type, case offsetof(struct __sk_buff, mark): if (type == BPF_WRITE) - *insn++ = BPF_STX_MEM(BPF_W, si->dst_reg, si->src_reg, - bpf_target_off(struct sk_buff, mark, 4, - target_size)); + *insn++ = BPF_EMIT_STORE(BPF_W, si, + bpf_target_off(struct sk_buff, mark, 4, + target_size)); else *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg, bpf_target_off(struct sk_buff, mark, 4, @@ -9352,11 +9367,16 @@ static u32 bpf_convert_ctx_access(enum bpf_access_type type, case offsetof(struct __sk_buff, queue_mapping): if (type == BPF_WRITE) { - *insn++ = BPF_JMP_IMM(BPF_JGE, si->src_reg, NO_QUEUE_MAPPING, 1); - *insn++ = BPF_STX_MEM(BPF_H, si->dst_reg, si->src_reg, - bpf_target_off(struct sk_buff, - queue_mapping, - 2, target_size)); + u32 off = bpf_target_off(struct sk_buff, queue_mapping, 2, target_size); + + if (BPF_CLASS(si->code) == BPF_ST && si->imm >= NO_QUEUE_MAPPING) { + *insn++ = BPF_JMP_A(0); /* noop */ + break; + } + + if (BPF_CLASS(si->code) == BPF_STX) + *insn++ = BPF_JMP_IMM(BPF_JGE, si->src_reg, NO_QUEUE_MAPPING, 1); + *insn++ = BPF_EMIT_STORE(BPF_H, si, off); } else { *insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->src_reg, bpf_target_off(struct sk_buff, @@ -9392,8 +9412,7 @@ static u32 bpf_convert_ctx_access(enum bpf_access_type type, off += offsetof(struct sk_buff, cb); off += offsetof(struct qdisc_skb_cb, data); if (type == BPF_WRITE) - *insn++ = BPF_STX_MEM(BPF_SIZE(si->code), si->dst_reg, - si->src_reg, off); + *insn++ = BPF_EMIT_STORE(BPF_SIZE(si->code), si, off); else *insn++ = BPF_LDX_MEM(BPF_SIZE(si->code), si->dst_reg, si->src_reg, off); @@ -9408,8 +9427,7 @@ static u32 bpf_convert_ctx_access(enum bpf_access_type type, off += offsetof(struct qdisc_skb_cb, tc_classid); *target_size = 2; if (type == BPF_WRITE) - *insn++ = BPF_STX_MEM(BPF_H, si->dst_reg, - si->src_reg, off); + *insn++ = BPF_EMIT_STORE(BPF_H, si, off); else *insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->src_reg, off); @@ -9442,9 +9460,9 @@ static u32 bpf_convert_ctx_access(enum bpf_access_type type, case offsetof(struct __sk_buff, tc_index): #ifdef CONFIG_NET_SCHED if (type == BPF_WRITE) - *insn++ = BPF_STX_MEM(BPF_H, si->dst_reg, si->src_reg, - bpf_target_off(struct sk_buff, tc_index, 2, - target_size)); + *insn++ = BPF_EMIT_STORE(BPF_H, si, + bpf_target_off(struct sk_buff, tc_index, 2, + target_size)); else *insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->src_reg, bpf_target_off(struct sk_buff, tc_index, 2, @@ -9645,8 +9663,8 @@ u32 bpf_sock_convert_ctx_access(enum bpf_access_type type, BUILD_BUG_ON(sizeof_field(struct sock, sk_bound_dev_if) != 4); if (type == BPF_WRITE) - *insn++ = BPF_STX_MEM(BPF_W, si->dst_reg, si->src_reg, - offsetof(struct sock, sk_bound_dev_if)); + *insn++ = BPF_EMIT_STORE(BPF_W, si, + offsetof(struct sock, sk_bound_dev_if)); else *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg, offsetof(struct sock, sk_bound_dev_if)); @@ -9656,8 +9674,8 @@ u32 bpf_sock_convert_ctx_access(enum bpf_access_type type, BUILD_BUG_ON(sizeof_field(struct sock, sk_mark) != 4); if (type == BPF_WRITE) - *insn++ = BPF_STX_MEM(BPF_W, si->dst_reg, si->src_reg, - offsetof(struct sock, sk_mark)); + *insn++ = BPF_EMIT_STORE(BPF_W, si, + offsetof(struct sock, sk_mark)); else *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg, offsetof(struct sock, sk_mark)); @@ -9667,8 +9685,8 @@ u32 bpf_sock_convert_ctx_access(enum bpf_access_type type, BUILD_BUG_ON(sizeof_field(struct sock, sk_priority) != 4); if (type == BPF_WRITE) - *insn++ = BPF_STX_MEM(BPF_W, si->dst_reg, si->src_reg, - offsetof(struct sock, sk_priority)); + *insn++ = BPF_EMIT_STORE(BPF_W, si, + offsetof(struct sock, sk_priority)); else *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg, offsetof(struct sock, sk_priority)); @@ -9933,10 +9951,12 @@ static u32 xdp_convert_ctx_access(enum bpf_access_type type, offsetof(S, TF)); \ *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(S, F), tmp_reg, \ si->dst_reg, offsetof(S, F)); \ - *insn++ = BPF_STX_MEM(SIZE, tmp_reg, si->src_reg, \ + *insn++ = BPF_RAW_INSN(SIZE | BPF_MEM | BPF_CLASS(si->code), \ + tmp_reg, si->src_reg, \ bpf_target_off(NS, NF, sizeof_field(NS, NF), \ target_size) \ - + OFF); \ + + OFF, \ + si->imm); \ *insn++ = BPF_LDX_MEM(BPF_DW, tmp_reg, si->dst_reg, \ offsetof(S, TF)); \ } while (0) @@ -10171,9 +10191,11 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type, struct bpf_sock_ops_kern, sk),\ reg, si->dst_reg, \ offsetof(struct bpf_sock_ops_kern, sk));\ - *insn++ = BPF_STX_MEM(BPF_FIELD_SIZEOF(OBJ, OBJ_FIELD), \ - reg, si->src_reg, \ - offsetof(OBJ, OBJ_FIELD)); \ + *insn++ = BPF_RAW_INSN(BPF_FIELD_SIZEOF(OBJ, OBJ_FIELD) | \ + BPF_MEM | BPF_CLASS(si->code), \ + reg, si->src_reg, \ + offsetof(OBJ, OBJ_FIELD), \ + si->imm); \ *insn++ = BPF_LDX_MEM(BPF_DW, reg, si->dst_reg, \ offsetof(struct bpf_sock_ops_kern, \ temp)); \ @@ -10205,8 +10227,7 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type, off -= offsetof(struct bpf_sock_ops, replylong[0]); off += offsetof(struct bpf_sock_ops_kern, replylong[0]); if (type == BPF_WRITE) - *insn++ = BPF_STX_MEM(BPF_W, si->dst_reg, si->src_reg, - off); + *insn++ = BPF_EMIT_STORE(BPF_W, si, off); else *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg, off); @@ -10563,8 +10584,7 @@ static u32 sk_skb_convert_ctx_access(enum bpf_access_type type, off += offsetof(struct sk_buff, cb); off += offsetof(struct sk_skb_cb, data); if (type == BPF_WRITE) - *insn++ = BPF_STX_MEM(BPF_SIZE(si->code), si->dst_reg, - si->src_reg, off); + *insn++ = BPF_EMIT_STORE(BPF_SIZE(si->code), si, off); else *insn++ = BPF_LDX_MEM(BPF_SIZE(si->code), si->dst_reg, si->src_reg, off); @@ -11621,3 +11641,83 @@ bpf_sk_base_func_proto(enum bpf_func_id func_id) return func; } + +__diag_push(); +__diag_ignore_all("-Wmissing-prototypes", + "Global functions as their definitions will be in vmlinux BTF"); +__bpf_kfunc int bpf_dynptr_from_skb(struct sk_buff *skb, u64 flags, + struct bpf_dynptr_kern *ptr__uninit) +{ + if (flags) { + bpf_dynptr_set_null(ptr__uninit); + return -EINVAL; + } + + bpf_dynptr_init(ptr__uninit, skb, BPF_DYNPTR_TYPE_SKB, 0, skb->len); + + return 0; +} + +__bpf_kfunc int bpf_dynptr_from_xdp(struct xdp_buff *xdp, u64 flags, + struct bpf_dynptr_kern *ptr__uninit) +{ + if (flags) { + bpf_dynptr_set_null(ptr__uninit); + return -EINVAL; + } + + bpf_dynptr_init(ptr__uninit, xdp, BPF_DYNPTR_TYPE_XDP, 0, xdp_get_buff_len(xdp)); + + return 0; +} +__diag_pop(); + +int bpf_dynptr_from_skb_rdonly(struct sk_buff *skb, u64 flags, + struct bpf_dynptr_kern *ptr__uninit) +{ + int err; + + err = bpf_dynptr_from_skb(skb, flags, ptr__uninit); + if (err) + return err; + + bpf_dynptr_set_rdonly(ptr__uninit); + + return 0; +} + +BTF_SET8_START(bpf_kfunc_check_set_skb) +BTF_ID_FLAGS(func, bpf_dynptr_from_skb) +BTF_SET8_END(bpf_kfunc_check_set_skb) + +BTF_SET8_START(bpf_kfunc_check_set_xdp) +BTF_ID_FLAGS(func, bpf_dynptr_from_xdp) +BTF_SET8_END(bpf_kfunc_check_set_xdp) + +static const struct btf_kfunc_id_set bpf_kfunc_set_skb = { + .owner = THIS_MODULE, + .set = &bpf_kfunc_check_set_skb, +}; + +static const struct btf_kfunc_id_set bpf_kfunc_set_xdp = { + .owner = THIS_MODULE, + .set = &bpf_kfunc_check_set_xdp, +}; + +static int __init bpf_kfunc_init(void) +{ + int ret; + + ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_skb); + ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_ACT, &bpf_kfunc_set_skb); + ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SK_SKB, &bpf_kfunc_set_skb); + ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SOCKET_FILTER, &bpf_kfunc_set_skb); + ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SKB, &bpf_kfunc_set_skb); + ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_LWT_OUT, &bpf_kfunc_set_skb); + ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_LWT_IN, &bpf_kfunc_set_skb); + ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_LWT_XMIT, &bpf_kfunc_set_skb); + ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_LWT_SEG6LOCAL, &bpf_kfunc_set_skb); + ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_NETFILTER, &bpf_kfunc_set_skb); + return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &bpf_kfunc_set_xdp); +} +late_initcall(bpf_kfunc_init); diff --git a/net/core/gro.c b/net/core/gro.c index a606705a0859..2d84165cb4f1 100644 --- a/net/core/gro.c +++ b/net/core/gro.c @@ -633,7 +633,7 @@ static gro_result_t napi_skb_finish(struct napi_struct *napi, else if (skb->fclone != SKB_FCLONE_UNAVAILABLE) __kfree_skb(skb); else - __kfree_skb_defer(skb); + __napi_kfree_skb(skb, SKB_CONSUMED); break; case GRO_HELD: diff --git a/net/core/neighbour.c b/net/core/neighbour.c index 6798f6d2423b..ddd0f32de20e 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -614,7 +614,7 @@ struct neighbour *neigh_lookup(struct neigh_table *tbl, const void *pkey, NEIGH_CACHE_STAT_INC(tbl, lookups); - rcu_read_lock_bh(); + rcu_read_lock(); n = __neigh_lookup_noref(tbl, pkey, dev); if (n) { if (!refcount_inc_not_zero(&n->refcnt)) @@ -622,42 +622,11 @@ struct neighbour *neigh_lookup(struct neigh_table *tbl, const void *pkey, NEIGH_CACHE_STAT_INC(tbl, hits); } - rcu_read_unlock_bh(); + rcu_read_unlock(); return n; } EXPORT_SYMBOL(neigh_lookup); -struct neighbour *neigh_lookup_nodev(struct neigh_table *tbl, struct net *net, - const void *pkey) -{ - struct neighbour *n; - unsigned int key_len = tbl->key_len; - u32 hash_val; - struct neigh_hash_table *nht; - - NEIGH_CACHE_STAT_INC(tbl, lookups); - - rcu_read_lock_bh(); - nht = rcu_dereference_bh(tbl->nht); - hash_val = tbl->hash(pkey, NULL, nht->hash_rnd) >> (32 - nht->hash_shift); - - for (n = rcu_dereference_bh(nht->hash_buckets[hash_val]); - n != NULL; - n = rcu_dereference_bh(n->next)) { - if (!memcmp(n->primary_key, pkey, key_len) && - net_eq(dev_net(n->dev), net)) { - if (!refcount_inc_not_zero(&n->refcnt)) - n = NULL; - NEIGH_CACHE_STAT_INC(tbl, hits); - break; - } - } - - rcu_read_unlock_bh(); - return n; -} -EXPORT_SYMBOL(neigh_lookup_nodev); - static struct neighbour * ___neigh_create(struct neigh_table *tbl, const void *pkey, struct net_device *dev, u32 flags, @@ -1124,13 +1093,13 @@ static void neigh_timer_handler(struct timer_list *t) neigh->used + NEIGH_VAR(neigh->parms, DELAY_PROBE_TIME))) { neigh_dbg(2, "neigh %p is delayed\n", neigh); - neigh->nud_state = NUD_DELAY; + WRITE_ONCE(neigh->nud_state, NUD_DELAY); neigh->updated = jiffies; neigh_suspect(neigh); next = now + NEIGH_VAR(neigh->parms, DELAY_PROBE_TIME); } else { neigh_dbg(2, "neigh %p is suspected\n", neigh); - neigh->nud_state = NUD_STALE; + WRITE_ONCE(neigh->nud_state, NUD_STALE); neigh->updated = jiffies; neigh_suspect(neigh); notify = 1; @@ -1140,14 +1109,14 @@ static void neigh_timer_handler(struct timer_list *t) neigh->confirmed + NEIGH_VAR(neigh->parms, DELAY_PROBE_TIME))) { neigh_dbg(2, "neigh %p is now reachable\n", neigh); - neigh->nud_state = NUD_REACHABLE; + WRITE_ONCE(neigh->nud_state, NUD_REACHABLE); neigh->updated = jiffies; neigh_connect(neigh); notify = 1; next = neigh->confirmed + neigh->parms->reachable_time; } else { neigh_dbg(2, "neigh %p is probed\n", neigh); - neigh->nud_state = NUD_PROBE; + WRITE_ONCE(neigh->nud_state, NUD_PROBE); neigh->updated = jiffies; atomic_set(&neigh->probes, 0); notify = 1; @@ -1161,7 +1130,7 @@ static void neigh_timer_handler(struct timer_list *t) if ((neigh->nud_state & (NUD_INCOMPLETE | NUD_PROBE)) && atomic_read(&neigh->probes) >= neigh_max_probes(neigh)) { - neigh->nud_state = NUD_FAILED; + WRITE_ONCE(neigh->nud_state, NUD_FAILED); notify = 1; neigh_invalidate(neigh); goto out; @@ -1210,7 +1179,7 @@ int __neigh_event_send(struct neighbour *neigh, struct sk_buff *skb, atomic_set(&neigh->probes, NEIGH_VAR(neigh->parms, UCAST_PROBES)); neigh_del_timer(neigh); - neigh->nud_state = NUD_INCOMPLETE; + WRITE_ONCE(neigh->nud_state, NUD_INCOMPLETE); neigh->updated = now; if (!immediate_ok) { next = now + 1; @@ -1222,7 +1191,7 @@ int __neigh_event_send(struct neighbour *neigh, struct sk_buff *skb, } neigh_add_timer(neigh, next); } else { - neigh->nud_state = NUD_FAILED; + WRITE_ONCE(neigh->nud_state, NUD_FAILED); neigh->updated = jiffies; write_unlock_bh(&neigh->lock); @@ -1232,7 +1201,7 @@ int __neigh_event_send(struct neighbour *neigh, struct sk_buff *skb, } else if (neigh->nud_state & NUD_STALE) { neigh_dbg(2, "neigh %p is delayed\n", neigh); neigh_del_timer(neigh); - neigh->nud_state = NUD_DELAY; + WRITE_ONCE(neigh->nud_state, NUD_DELAY); neigh->updated = jiffies; neigh_add_timer(neigh, jiffies + NEIGH_VAR(neigh->parms, DELAY_PROBE_TIME)); @@ -1344,7 +1313,7 @@ static int __neigh_update(struct neighbour *neigh, const u8 *lladdr, neigh_update_flags(neigh, flags, ¬ify, &gc_update, &managed_update); if (flags & (NEIGH_UPDATE_F_USE | NEIGH_UPDATE_F_MANAGED)) { new = old & ~NUD_PERMANENT; - neigh->nud_state = new; + WRITE_ONCE(neigh->nud_state, new); err = 0; goto out; } @@ -1353,7 +1322,7 @@ static int __neigh_update(struct neighbour *neigh, const u8 *lladdr, neigh_del_timer(neigh); if (old & NUD_CONNECTED) neigh_suspect(neigh); - neigh->nud_state = new; + WRITE_ONCE(neigh->nud_state, new); err = 0; notify = old & NUD_VALID; if ((old & (NUD_INCOMPLETE | NUD_PROBE)) && @@ -1432,7 +1401,7 @@ static int __neigh_update(struct neighbour *neigh, const u8 *lladdr, ((new & NUD_REACHABLE) ? neigh->parms->reachable_time : 0))); - neigh->nud_state = new; + WRITE_ONCE(neigh->nud_state, new); notify = 1; } @@ -1519,7 +1488,7 @@ void __neigh_set_probe_once(struct neighbour *neigh) neigh->updated = jiffies; if (!(neigh->nud_state & NUD_FAILED)) return; - neigh->nud_state = NUD_INCOMPLETE; + WRITE_ONCE(neigh->nud_state, NUD_INCOMPLETE); atomic_set(&neigh->probes, neigh_max_probes(neigh)); neigh_add_timer(neigh, jiffies + max(NEIGH_VAR(neigh->parms, RETRANS_TIME), @@ -2215,11 +2184,11 @@ static int neightbl_fill_info(struct sk_buff *skb, struct neigh_table *tbl, .ndtc_proxy_qlen = tbl->proxy_queue.qlen, }; - rcu_read_lock_bh(); - nht = rcu_dereference_bh(tbl->nht); + rcu_read_lock(); + nht = rcu_dereference(tbl->nht); ndc.ndtc_hash_rnd = nht->hash_rnd[0]; ndc.ndtc_hash_mask = ((1 << nht->hash_shift) - 1); - rcu_read_unlock_bh(); + rcu_read_unlock(); if (nla_put(skb, NDTA_CONFIG, sizeof(ndc), &ndc)) goto nla_put_failure; @@ -2734,15 +2703,15 @@ static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb, if (filter->dev_idx || filter->master_idx) flags |= NLM_F_DUMP_FILTERED; - rcu_read_lock_bh(); - nht = rcu_dereference_bh(tbl->nht); + rcu_read_lock(); + nht = rcu_dereference(tbl->nht); for (h = s_h; h < (1 << nht->hash_shift); h++) { if (h > s_h) s_idx = 0; - for (n = rcu_dereference_bh(nht->hash_buckets[h]), idx = 0; + for (n = rcu_dereference(nht->hash_buckets[h]), idx = 0; n != NULL; - n = rcu_dereference_bh(n->next)) { + n = rcu_dereference(n->next)) { if (idx < s_idx || !net_eq(dev_net(n->dev), net)) goto next; if (neigh_ifindex_filtered(n->dev, filter->dev_idx) || @@ -2761,7 +2730,7 @@ next: } rc = skb->len; out: - rcu_read_unlock_bh(); + rcu_read_unlock(); cb->args[1] = h; cb->args[2] = idx; return rc; @@ -3106,20 +3075,20 @@ void neigh_for_each(struct neigh_table *tbl, void (*cb)(struct neighbour *, void int chain; struct neigh_hash_table *nht; - rcu_read_lock_bh(); - nht = rcu_dereference_bh(tbl->nht); + rcu_read_lock(); + nht = rcu_dereference(tbl->nht); - read_lock(&tbl->lock); /* avoid resizes */ + read_lock_bh(&tbl->lock); /* avoid resizes */ for (chain = 0; chain < (1 << nht->hash_shift); chain++) { struct neighbour *n; - for (n = rcu_dereference_bh(nht->hash_buckets[chain]); + for (n = rcu_dereference(nht->hash_buckets[chain]); n != NULL; - n = rcu_dereference_bh(n->next)) + n = rcu_dereference(n->next)) cb(n, cookie); } - read_unlock(&tbl->lock); - rcu_read_unlock_bh(); + read_unlock_bh(&tbl->lock); + rcu_read_unlock(); } EXPORT_SYMBOL(neigh_for_each); @@ -3169,7 +3138,7 @@ int neigh_xmit(int index, struct net_device *dev, tbl = neigh_tables[index]; if (!tbl) goto out; - rcu_read_lock_bh(); + rcu_read_lock(); if (index == NEIGH_ARP_TABLE) { u32 key = *((u32 *)addr); @@ -3181,11 +3150,11 @@ int neigh_xmit(int index, struct net_device *dev, neigh = __neigh_create(tbl, addr, dev, false); err = PTR_ERR(neigh); if (IS_ERR(neigh)) { - rcu_read_unlock_bh(); + rcu_read_unlock(); goto out_kfree_skb; } err = neigh->output(neigh, skb); - rcu_read_unlock_bh(); + rcu_read_unlock(); } else if (index == NEIGH_LINK_TABLE) { err = dev_hard_header(skb, dev, ntohs(skb->protocol), @@ -3214,7 +3183,7 @@ static struct neighbour *neigh_get_first(struct seq_file *seq) state->flags &= ~NEIGH_SEQ_IS_PNEIGH; for (bucket = 0; bucket < (1 << nht->hash_shift); bucket++) { - n = rcu_dereference_bh(nht->hash_buckets[bucket]); + n = rcu_dereference(nht->hash_buckets[bucket]); while (n) { if (!net_eq(dev_net(n->dev), net)) @@ -3229,10 +3198,10 @@ static struct neighbour *neigh_get_first(struct seq_file *seq) } if (!(state->flags & NEIGH_SEQ_SKIP_NOARP)) break; - if (n->nud_state & ~NUD_NOARP) + if (READ_ONCE(n->nud_state) & ~NUD_NOARP) break; next: - n = rcu_dereference_bh(n->next); + n = rcu_dereference(n->next); } if (n) @@ -3256,7 +3225,7 @@ static struct neighbour *neigh_get_next(struct seq_file *seq, if (v) return n; } - n = rcu_dereference_bh(n->next); + n = rcu_dereference(n->next); while (1) { while (n) { @@ -3271,10 +3240,10 @@ static struct neighbour *neigh_get_next(struct seq_file *seq, if (!(state->flags & NEIGH_SEQ_SKIP_NOARP)) break; - if (n->nud_state & ~NUD_NOARP) + if (READ_ONCE(n->nud_state) & ~NUD_NOARP) break; next: - n = rcu_dereference_bh(n->next); + n = rcu_dereference(n->next); } if (n) @@ -3283,7 +3252,7 @@ next: if (++state->bucket >= (1 << nht->hash_shift)) break; - n = rcu_dereference_bh(nht->hash_buckets[state->bucket]); + n = rcu_dereference(nht->hash_buckets[state->bucket]); } if (n && pos) @@ -3385,7 +3354,7 @@ static void *neigh_get_idx_any(struct seq_file *seq, loff_t *pos) void *neigh_seq_start(struct seq_file *seq, loff_t *pos, struct neigh_table *tbl, unsigned int neigh_seq_flags) __acquires(tbl->lock) - __acquires(rcu_bh) + __acquires(rcu) { struct neigh_seq_state *state = seq->private; @@ -3393,9 +3362,9 @@ void *neigh_seq_start(struct seq_file *seq, loff_t *pos, struct neigh_table *tbl state->bucket = 0; state->flags = (neigh_seq_flags & ~NEIGH_SEQ_IS_PNEIGH); - rcu_read_lock_bh(); - state->nht = rcu_dereference_bh(tbl->nht); - read_lock(&tbl->lock); + rcu_read_lock(); + state->nht = rcu_dereference(tbl->nht); + read_lock_bh(&tbl->lock); return *pos ? neigh_get_idx_any(seq, pos) : SEQ_START_TOKEN; } @@ -3430,13 +3399,13 @@ EXPORT_SYMBOL(neigh_seq_next); void neigh_seq_stop(struct seq_file *seq, void *v) __releases(tbl->lock) - __releases(rcu_bh) + __releases(rcu) { struct neigh_seq_state *state = seq->private; struct neigh_table *tbl = state->tbl; - read_unlock(&tbl->lock); - rcu_read_unlock_bh(); + read_unlock_bh(&tbl->lock); + rcu_read_unlock(); } EXPORT_SYMBOL(neigh_seq_stop); diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c index 1ec23bf8b05c..09f7ed1a04e8 100644 --- a/net/core/net-procfs.c +++ b/net/core/net-procfs.c @@ -115,10 +115,14 @@ static int dev_seq_show(struct seq_file *seq, void *v) return 0; } -static u32 softnet_backlog_len(struct softnet_data *sd) +static u32 softnet_input_pkt_queue_len(struct softnet_data *sd) { - return skb_queue_len_lockless(&sd->input_pkt_queue) + - skb_queue_len_lockless(&sd->process_queue); + return skb_queue_len_lockless(&sd->input_pkt_queue); +} + +static u32 softnet_process_queue_len(struct softnet_data *sd) +{ + return skb_queue_len_lockless(&sd->process_queue); } static struct softnet_data *softnet_get_online(loff_t *pos) @@ -152,6 +156,8 @@ static void softnet_seq_stop(struct seq_file *seq, void *v) static int softnet_seq_show(struct seq_file *seq, void *v) { struct softnet_data *sd = v; + u32 input_qlen = softnet_input_pkt_queue_len(sd); + u32 process_qlen = softnet_process_queue_len(sd); unsigned int flow_limit_count = 0; #ifdef CONFIG_NET_FLOW_LIMIT @@ -169,12 +175,14 @@ static int softnet_seq_show(struct seq_file *seq, void *v) * mapping the data a specific CPU */ seq_printf(seq, - "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n", + "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x " + "%08x %08x\n", sd->processed, sd->dropped, sd->time_squeeze, 0, 0, 0, 0, 0, /* was fastroute */ 0, /* was cpu_collision */ sd->received_rps, flow_limit_count, - softnet_backlog_len(sd), (int)seq->index); + input_qlen + process_qlen, (int)seq->index, + input_qlen, process_qlen); return 0; } diff --git a/net/core/netdev-genl-gen.c b/net/core/netdev-genl-gen.c index 3abab70d66dd..de17ca2f7dbf 100644 --- a/net/core/netdev-genl-gen.c +++ b/net/core/netdev-genl-gen.c @@ -16,7 +16,7 @@ static const struct nla_policy netdev_dev_get_nl_policy[NETDEV_A_DEV_IFINDEX + 1 }; /* Ops table for netdev */ -static const struct genl_split_ops netdev_nl_ops[2] = { +static const struct genl_split_ops netdev_nl_ops[] = { { .cmd = NETDEV_CMD_DEV_GET, .doit = netdev_nl_dev_get_doit, diff --git a/net/core/page_pool.c b/net/core/page_pool.c index 193c18799865..e212e9d7edcb 100644 --- a/net/core/page_pool.c +++ b/net/core/page_pool.c @@ -19,6 +19,7 @@ #include <linux/mm.h> /* for put_page() */ #include <linux/poison.h> #include <linux/ethtool.h> +#include <linux/netdevice.h> #include <trace/events/page_pool.h> @@ -315,7 +316,8 @@ static bool page_pool_dma_map(struct page_pool *pool, struct page *page) */ dma = dma_map_page_attrs(pool->p.dev, page, 0, (PAGE_SIZE << pool->p.order), - pool->p.dma_dir, DMA_ATTR_SKIP_CPU_SYNC); + pool->p.dma_dir, DMA_ATTR_SKIP_CPU_SYNC | + DMA_ATTR_WEAK_ORDERING); if (dma_mapping_error(pool->p.dev, dma)) return false; @@ -483,7 +485,7 @@ void page_pool_release_page(struct page_pool *pool, struct page *page) /* When page is unmapped, it cannot be returned to our pool */ dma_unmap_page_attrs(pool->p.dev, dma, PAGE_SIZE << pool->p.order, pool->p.dma_dir, - DMA_ATTR_SKIP_CPU_SYNC); + DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING); page_pool_set_dma_addr(page, 0); skip_dma_unmap: page_pool_clear_pp_info(page); @@ -837,6 +839,21 @@ void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *), pool->xdp_mem_id = mem->id; } +void page_pool_unlink_napi(struct page_pool *pool) +{ + if (!pool->p.napi) + return; + + /* To avoid races with recycling and additional barriers make sure + * pool and NAPI are unlinked when NAPI is disabled. + */ + WARN_ON(!test_bit(NAPI_STATE_SCHED, &pool->p.napi->state) || + READ_ONCE(pool->p.napi->list_owner) != -1); + + WRITE_ONCE(pool->p.napi, NULL); +} +EXPORT_SYMBOL(page_pool_unlink_napi); + void page_pool_destroy(struct page_pool *pool) { if (!pool) @@ -845,6 +862,7 @@ void page_pool_destroy(struct page_pool *pool) if (!page_pool_put(pool)) return; + page_pool_unlink_napi(pool); page_pool_free_frag(pool); if (!page_pool_release(pool)) @@ -874,9 +892,11 @@ void page_pool_update_nid(struct page_pool *pool, int new_nid) } EXPORT_SYMBOL(page_pool_update_nid); -bool page_pool_return_skb_page(struct page *page) +bool page_pool_return_skb_page(struct page *page, bool napi_safe) { + struct napi_struct *napi; struct page_pool *pp; + bool allow_direct; page = compound_head(page); @@ -892,12 +912,20 @@ bool page_pool_return_skb_page(struct page *page) pp = page->pp; + /* Allow direct recycle if we have reasons to believe that we are + * in the same context as the consumer would run, so there's + * no possible race. + */ + napi = READ_ONCE(pp->p.napi); + allow_direct = napi_safe && napi && + READ_ONCE(napi->list_owner) == smp_processor_id(); + /* Driver set this to memory recycling info. Reset it on recycle. * This will *not* work for NIC using a split-page memory model. * The page will be returned to the pool here regardless of the * 'flipped' fragment being in use or not. */ - page_pool_put_full_page(pp, page, false); + page_pool_put_full_page(pp, page, allow_direct); return true; } diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 6e44e92ebdf5..653901a1bf75 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -54,11 +54,14 @@ #include <net/rtnetlink.h> #include <net/net_namespace.h> #include <net/devlink.h> +#if IS_ENABLED(CONFIG_IPV6) +#include <net/addrconf.h> +#endif #include "dev.h" #define RTNL_MAX_TYPE 50 -#define RTNL_SLAVE_MAX_TYPE 42 +#define RTNL_SLAVE_MAX_TYPE 43 struct rtnl_link { rtnl_doit_func doit; @@ -840,7 +843,7 @@ int rtnl_put_cacheinfo(struct sk_buff *skb, struct dst_entry *dst, u32 id, if (dst) { ci.rta_lastuse = jiffies_delta_to_clock_t(jiffies - dst->lastuse); ci.rta_used = dst->__use; - ci.rta_clntref = atomic_read(&dst->__refcnt); + ci.rta_clntref = rcuref_read(&dst->__rcuref); } if (expires) { unsigned long clock; @@ -6070,6 +6073,217 @@ static int rtnl_stats_set(struct sk_buff *skb, struct nlmsghdr *nlh, return 0; } +static int rtnl_mdb_valid_dump_req(const struct nlmsghdr *nlh, + struct netlink_ext_ack *extack) +{ + struct br_port_msg *bpm; + + if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*bpm))) { + NL_SET_ERR_MSG(extack, "Invalid header for mdb dump request"); + return -EINVAL; + } + + bpm = nlmsg_data(nlh); + if (bpm->ifindex) { + NL_SET_ERR_MSG(extack, "Filtering by device index is not supported for mdb dump request"); + return -EINVAL; + } + if (nlmsg_attrlen(nlh, sizeof(*bpm))) { + NL_SET_ERR_MSG(extack, "Invalid data after header in mdb dump request"); + return -EINVAL; + } + + return 0; +} + +struct rtnl_mdb_dump_ctx { + long idx; +}; + +static int rtnl_mdb_dump(struct sk_buff *skb, struct netlink_callback *cb) +{ + struct rtnl_mdb_dump_ctx *ctx = (void *)cb->ctx; + struct net *net = sock_net(skb->sk); + struct net_device *dev; + int idx, s_idx; + int err; + + NL_ASSERT_DUMP_CTX_FITS(struct rtnl_mdb_dump_ctx); + + if (cb->strict_check) { + err = rtnl_mdb_valid_dump_req(cb->nlh, cb->extack); + if (err) + return err; + } + + s_idx = ctx->idx; + idx = 0; + + for_each_netdev(net, dev) { + if (idx < s_idx) + goto skip; + if (!dev->netdev_ops->ndo_mdb_dump) + goto skip; + + err = dev->netdev_ops->ndo_mdb_dump(dev, skb, cb); + if (err == -EMSGSIZE) + goto out; + /* Moving on to next device, reset markers and sequence + * counters since they are all maintained per-device. + */ + memset(cb->ctx, 0, sizeof(cb->ctx)); + cb->prev_seq = 0; + cb->seq = 0; +skip: + idx++; + } + +out: + ctx->idx = idx; + return skb->len; +} + +static int rtnl_validate_mdb_entry(const struct nlattr *attr, + struct netlink_ext_ack *extack) +{ + struct br_mdb_entry *entry = nla_data(attr); + + if (nla_len(attr) != sizeof(struct br_mdb_entry)) { + NL_SET_ERR_MSG_ATTR(extack, attr, "Invalid attribute length"); + return -EINVAL; + } + + if (entry->ifindex == 0) { + NL_SET_ERR_MSG(extack, "Zero entry ifindex is not allowed"); + return -EINVAL; + } + + if (entry->addr.proto == htons(ETH_P_IP)) { + if (!ipv4_is_multicast(entry->addr.u.ip4) && + !ipv4_is_zeronet(entry->addr.u.ip4)) { + NL_SET_ERR_MSG(extack, "IPv4 entry group address is not multicast or 0.0.0.0"); + return -EINVAL; + } + if (ipv4_is_local_multicast(entry->addr.u.ip4)) { + NL_SET_ERR_MSG(extack, "IPv4 entry group address is local multicast"); + return -EINVAL; + } +#if IS_ENABLED(CONFIG_IPV6) + } else if (entry->addr.proto == htons(ETH_P_IPV6)) { + if (ipv6_addr_is_ll_all_nodes(&entry->addr.u.ip6)) { + NL_SET_ERR_MSG(extack, "IPv6 entry group address is link-local all nodes"); + return -EINVAL; + } +#endif + } else if (entry->addr.proto == 0) { + /* L2 mdb */ + if (!is_multicast_ether_addr(entry->addr.u.mac_addr)) { + NL_SET_ERR_MSG(extack, "L2 entry group is not multicast"); + return -EINVAL; + } + } else { + NL_SET_ERR_MSG(extack, "Unknown entry protocol"); + return -EINVAL; + } + + if (entry->state != MDB_PERMANENT && entry->state != MDB_TEMPORARY) { + NL_SET_ERR_MSG(extack, "Unknown entry state"); + return -EINVAL; + } + if (entry->vid >= VLAN_VID_MASK) { + NL_SET_ERR_MSG(extack, "Invalid entry VLAN id"); + return -EINVAL; + } + + return 0; +} + +static const struct nla_policy mdba_policy[MDBA_SET_ENTRY_MAX + 1] = { + [MDBA_SET_ENTRY_UNSPEC] = { .strict_start_type = MDBA_SET_ENTRY_ATTRS + 1 }, + [MDBA_SET_ENTRY] = NLA_POLICY_VALIDATE_FN(NLA_BINARY, + rtnl_validate_mdb_entry, + sizeof(struct br_mdb_entry)), + [MDBA_SET_ENTRY_ATTRS] = { .type = NLA_NESTED }, +}; + +static int rtnl_mdb_add(struct sk_buff *skb, struct nlmsghdr *nlh, + struct netlink_ext_ack *extack) +{ + struct nlattr *tb[MDBA_SET_ENTRY_MAX + 1]; + struct net *net = sock_net(skb->sk); + struct br_port_msg *bpm; + struct net_device *dev; + int err; + + err = nlmsg_parse_deprecated(nlh, sizeof(*bpm), tb, + MDBA_SET_ENTRY_MAX, mdba_policy, extack); + if (err) + return err; + + bpm = nlmsg_data(nlh); + if (!bpm->ifindex) { + NL_SET_ERR_MSG(extack, "Invalid ifindex"); + return -EINVAL; + } + + dev = __dev_get_by_index(net, bpm->ifindex); + if (!dev) { + NL_SET_ERR_MSG(extack, "Device doesn't exist"); + return -ENODEV; + } + + if (NL_REQ_ATTR_CHECK(extack, NULL, tb, MDBA_SET_ENTRY)) { + NL_SET_ERR_MSG(extack, "Missing MDBA_SET_ENTRY attribute"); + return -EINVAL; + } + + if (!dev->netdev_ops->ndo_mdb_add) { + NL_SET_ERR_MSG(extack, "Device does not support MDB operations"); + return -EOPNOTSUPP; + } + + return dev->netdev_ops->ndo_mdb_add(dev, tb, nlh->nlmsg_flags, extack); +} + +static int rtnl_mdb_del(struct sk_buff *skb, struct nlmsghdr *nlh, + struct netlink_ext_ack *extack) +{ + struct nlattr *tb[MDBA_SET_ENTRY_MAX + 1]; + struct net *net = sock_net(skb->sk); + struct br_port_msg *bpm; + struct net_device *dev; + int err; + + err = nlmsg_parse_deprecated(nlh, sizeof(*bpm), tb, + MDBA_SET_ENTRY_MAX, mdba_policy, extack); + if (err) + return err; + + bpm = nlmsg_data(nlh); + if (!bpm->ifindex) { + NL_SET_ERR_MSG(extack, "Invalid ifindex"); + return -EINVAL; + } + + dev = __dev_get_by_index(net, bpm->ifindex); + if (!dev) { + NL_SET_ERR_MSG(extack, "Device doesn't exist"); + return -ENODEV; + } + + if (NL_REQ_ATTR_CHECK(extack, NULL, tb, MDBA_SET_ENTRY)) { + NL_SET_ERR_MSG(extack, "Missing MDBA_SET_ENTRY attribute"); + return -EINVAL; + } + + if (!dev->netdev_ops->ndo_mdb_del) { + NL_SET_ERR_MSG(extack, "Device does not support MDB operations"); + return -EOPNOTSUPP; + } + + return dev->netdev_ops->ndo_mdb_del(dev, tb, extack); +} + /* Process one rtnetlink message. */ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, @@ -6304,4 +6518,8 @@ void __init rtnetlink_init(void) rtnl_register(PF_UNSPEC, RTM_GETSTATS, rtnl_stats_get, rtnl_stats_dump, 0); rtnl_register(PF_UNSPEC, RTM_SETSTATS, rtnl_stats_set, NULL, 0); + + rtnl_register(PF_BRIDGE, RTM_GETMDB, NULL, rtnl_mdb_dump, 0); + rtnl_register(PF_BRIDGE, RTM_NEWMDB, rtnl_mdb_add, NULL, 0); + rtnl_register(PF_BRIDGE, RTM_DELMDB, rtnl_mdb_del, NULL, 0); } diff --git a/net/core/scm.c b/net/core/scm.c index acb7d776fa6e..3cd7dd377e53 100644 --- a/net/core/scm.c +++ b/net/core/scm.c @@ -250,7 +250,10 @@ int put_cmsg(struct msghdr * msg, int level, int type, int len, void *data) } cmlen = min(CMSG_SPACE(len), msg->msg_controllen); - msg->msg_control += cmlen; + if (msg->msg_control_is_user) + msg->msg_control_user += cmlen; + else + msg->msg_control += cmlen; msg->msg_controllen -= cmlen; return 0; @@ -299,7 +302,7 @@ static int scm_max_fds(struct msghdr *msg) void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm) { struct cmsghdr __user *cm = - (__force struct cmsghdr __user *)msg->msg_control; + (__force struct cmsghdr __user *)msg->msg_control_user; unsigned int o_flags = (msg->msg_flags & MSG_CMSG_CLOEXEC) ? O_CLOEXEC : 0; int fdmax = min_t(int, scm_max_fds(msg), scm->fp->count); int __user *cmsg_data = CMSG_USER_DATA(cm); @@ -332,7 +335,7 @@ void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm) cmlen = CMSG_SPACE(i * sizeof(int)); if (msg->msg_controllen < cmlen) cmlen = msg->msg_controllen; - msg->msg_control += cmlen; + msg->msg_control_user += cmlen; msg->msg_controllen -= cmlen; } } diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 4c0879798eb8..2112146092bf 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -58,6 +58,7 @@ #include <linux/scatterlist.h> #include <linux/errqueue.h> #include <linux/prefetch.h> +#include <linux/bitfield.h> #include <linux/if_vlan.h> #include <linux/mpls.h> #include <linux/kcov.h> @@ -72,6 +73,7 @@ #include <net/mptcp.h> #include <net/mctp.h> #include <net/page_pool.h> +#include <net/dropreason.h> #include <linux/uaccess.h> #include <trace/events/skb.h> @@ -122,11 +124,59 @@ EXPORT_SYMBOL(sysctl_max_skb_frags); #undef FN #define FN(reason) [SKB_DROP_REASON_##reason] = #reason, -const char * const drop_reasons[] = { +static const char * const drop_reasons[] = { [SKB_CONSUMED] = "CONSUMED", DEFINE_DROP_REASON(FN, FN) }; -EXPORT_SYMBOL(drop_reasons); + +static const struct drop_reason_list drop_reasons_core = { + .reasons = drop_reasons, + .n_reasons = ARRAY_SIZE(drop_reasons), +}; + +const struct drop_reason_list __rcu * +drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_NUM] = { + [SKB_DROP_REASON_SUBSYS_CORE] = RCU_INITIALIZER(&drop_reasons_core), +}; +EXPORT_SYMBOL(drop_reasons_by_subsys); + +/** + * drop_reasons_register_subsys - register another drop reason subsystem + * @subsys: the subsystem to register, must not be the core + * @list: the list of drop reasons within the subsystem, must point to + * a statically initialized list + */ +void drop_reasons_register_subsys(enum skb_drop_reason_subsys subsys, + const struct drop_reason_list *list) +{ + if (WARN(subsys <= SKB_DROP_REASON_SUBSYS_CORE || + subsys >= ARRAY_SIZE(drop_reasons_by_subsys), + "invalid subsystem %d\n", subsys)) + return; + + /* must point to statically allocated memory, so INIT is OK */ + RCU_INIT_POINTER(drop_reasons_by_subsys[subsys], list); +} +EXPORT_SYMBOL_GPL(drop_reasons_register_subsys); + +/** + * drop_reasons_unregister_subsys - unregister a drop reason subsystem + * @subsys: the subsystem to remove, must not be the core + * + * Note: This will synchronize_rcu() to ensure no users when it returns. + */ +void drop_reasons_unregister_subsys(enum skb_drop_reason_subsys subsys) +{ + if (WARN(subsys <= SKB_DROP_REASON_SUBSYS_CORE || + subsys >= ARRAY_SIZE(drop_reasons_by_subsys), + "invalid subsystem %d\n", subsys)) + return; + + RCU_INIT_POINTER(drop_reasons_by_subsys[subsys], NULL); + + synchronize_rcu(); +} +EXPORT_SYMBOL_GPL(drop_reasons_unregister_subsys); /** * skb_panic - private function for out-of-line support @@ -420,10 +470,9 @@ struct sk_buff *build_skb(void *data, unsigned int frag_size) { struct sk_buff *skb = __build_skb(data, frag_size); - if (skb && frag_size) { + if (likely(skb && frag_size)) { skb->head_frag = 1; - if (page_is_pfmemalloc(virt_to_head_page(data))) - skb->pfmemalloc = 1; + skb_propagate_pfmemalloc(virt_to_head_page(data), skb); } return skb; } @@ -445,8 +494,7 @@ struct sk_buff *build_skb_around(struct sk_buff *skb, if (frag_size) { skb->head_frag = 1; - if (page_is_pfmemalloc(virt_to_head_page(data))) - skb->pfmemalloc = 1; + skb_propagate_pfmemalloc(virt_to_head_page(data), skb); } return skb; } @@ -841,11 +889,11 @@ static void skb_clone_fraglist(struct sk_buff *skb) skb_get(list); } -static bool skb_pp_recycle(struct sk_buff *skb, void *data) +static bool skb_pp_recycle(struct sk_buff *skb, void *data, bool napi_safe) { if (!IS_ENABLED(CONFIG_PAGE_POOL) || !skb->pp_recycle) return false; - return page_pool_return_skb_page(virt_to_page(data)); + return page_pool_return_skb_page(virt_to_page(data), napi_safe); } static void skb_kfree_head(void *head, unsigned int end_offset) @@ -858,12 +906,12 @@ static void skb_kfree_head(void *head, unsigned int end_offset) kfree(head); } -static void skb_free_head(struct sk_buff *skb) +static void skb_free_head(struct sk_buff *skb, bool napi_safe) { unsigned char *head = skb->head; if (skb->head_frag) { - if (skb_pp_recycle(skb, head)) + if (skb_pp_recycle(skb, head, napi_safe)) return; skb_free_frag(head); } else { @@ -871,7 +919,8 @@ static void skb_free_head(struct sk_buff *skb) } } -static void skb_release_data(struct sk_buff *skb, enum skb_drop_reason reason) +static void skb_release_data(struct sk_buff *skb, enum skb_drop_reason reason, + bool napi_safe) { struct skb_shared_info *shinfo = skb_shinfo(skb); int i; @@ -890,13 +939,13 @@ static void skb_release_data(struct sk_buff *skb, enum skb_drop_reason reason) } for (i = 0; i < shinfo->nr_frags; i++) - __skb_frag_unref(&shinfo->frags[i], skb->pp_recycle); + napi_frag_unref(&shinfo->frags[i], skb->pp_recycle, napi_safe); free_head: if (shinfo->frag_list) kfree_skb_list_reason(shinfo->frag_list, reason); - skb_free_head(skb); + skb_free_head(skb, napi_safe); exit: /* When we clone an SKB we copy the reycling bit. The pp_recycle * bit is only set on the head though, so in order to avoid races @@ -957,11 +1006,12 @@ void skb_release_head_state(struct sk_buff *skb) } /* Free everything but the sk_buff shell. */ -static void skb_release_all(struct sk_buff *skb, enum skb_drop_reason reason) +static void skb_release_all(struct sk_buff *skb, enum skb_drop_reason reason, + bool napi_safe) { skb_release_head_state(skb); if (likely(skb->head)) - skb_release_data(skb, reason); + skb_release_data(skb, reason, napi_safe); } /** @@ -975,7 +1025,7 @@ static void skb_release_all(struct sk_buff *skb, enum skb_drop_reason reason) void __kfree_skb(struct sk_buff *skb) { - skb_release_all(skb, SKB_DROP_REASON_NOT_SPECIFIED); + skb_release_all(skb, SKB_DROP_REASON_NOT_SPECIFIED, false); kfree_skbmem(skb); } EXPORT_SYMBOL(__kfree_skb); @@ -986,7 +1036,10 @@ bool __kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason) if (unlikely(!skb_unref(skb))) return false; - DEBUG_NET_WARN_ON_ONCE(reason <= 0 || reason >= SKB_DROP_REASON_MAX); + DEBUG_NET_WARN_ON_ONCE(reason == SKB_NOT_DROPPED_YET || + u32_get_bits(reason, + SKB_DROP_REASON_SUBSYS_MASK) >= + SKB_DROP_REASON_SUBSYS_NUM); if (reason == SKB_CONSUMED) trace_consume_skb(skb, __builtin_return_address(0)); @@ -1029,7 +1082,7 @@ static void kfree_skb_add_bulk(struct sk_buff *skb, return; } - skb_release_all(skb, reason); + skb_release_all(skb, reason, false); sa->skb_array[sa->skb_count++] = skb; if (unlikely(sa->skb_count == KFREE_SKB_BULK_SIZE)) { @@ -1203,7 +1256,7 @@ EXPORT_SYMBOL(consume_skb); void __consume_stateless_skb(struct sk_buff *skb) { trace_consume_skb(skb, __builtin_return_address(0)); - skb_release_data(skb, SKB_CONSUMED); + skb_release_data(skb, SKB_CONSUMED, false); kfree_skbmem(skb); } @@ -1226,9 +1279,9 @@ static void napi_skb_cache_put(struct sk_buff *skb) } } -void __kfree_skb_defer(struct sk_buff *skb) +void __napi_kfree_skb(struct sk_buff *skb, enum skb_drop_reason reason) { - skb_release_all(skb, SKB_DROP_REASON_NOT_SPECIFIED); + skb_release_all(skb, reason, true); napi_skb_cache_put(skb); } @@ -1266,7 +1319,7 @@ void napi_consume_skb(struct sk_buff *skb, int budget) return; } - skb_release_all(skb, SKB_CONSUMED); + skb_release_all(skb, SKB_CONSUMED, !!budget); napi_skb_cache_put(skb); } EXPORT_SYMBOL(napi_consume_skb); @@ -1397,7 +1450,7 @@ EXPORT_SYMBOL_GPL(alloc_skb_for_msg); */ struct sk_buff *skb_morph(struct sk_buff *dst, struct sk_buff *src) { - skb_release_all(dst, SKB_CONSUMED); + skb_release_all(dst, SKB_CONSUMED, false); return __skb_clone(dst, src); } EXPORT_SYMBOL_GPL(skb_morph); @@ -2020,9 +2073,9 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail, if (skb_has_frag_list(skb)) skb_clone_fraglist(skb); - skb_release_data(skb, SKB_CONSUMED); + skb_release_data(skb, SKB_CONSUMED, false); } else { - skb_free_head(skb); + skb_free_head(skb, false); } off = (data + nhead) - skb->head; @@ -5162,6 +5215,9 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb, skb = alloc_skb(0, GFP_ATOMIC); } else { skb = skb_clone(orig_skb, GFP_ATOMIC); + + if (skb_orphan_frags_rx(skb, GFP_ATOMIC)) + return; } if (!skb) return; @@ -5189,6 +5245,7 @@ void skb_tstamp_tx(struct sk_buff *orig_skb, } EXPORT_SYMBOL_GPL(skb_tstamp_tx); +#ifdef CONFIG_WIRELESS void skb_complete_wifi_ack(struct sk_buff *skb, bool acked) { struct sock *sk = skb->sk; @@ -5214,6 +5271,7 @@ void skb_complete_wifi_ack(struct sk_buff *skb, bool acked) kfree_skb(skb); } EXPORT_SYMBOL_GPL(skb_complete_wifi_ack); +#endif /* CONFIG_WIRELESS */ /** * skb_partial_csum_set - set up and verify partial csum values for packet @@ -5941,7 +5999,6 @@ EXPORT_SYMBOL(skb_ensure_writable); */ int __skb_vlan_pop(struct sk_buff *skb, u16 *vlan_tci) { - struct vlan_hdr *vhdr; int offset = skb->data - skb_mac_header(skb); int err; @@ -5957,13 +6014,8 @@ int __skb_vlan_pop(struct sk_buff *skb, u16 *vlan_tci) skb_postpull_rcsum(skb, skb->data + (2 * ETH_ALEN), VLAN_HLEN); - vhdr = (struct vlan_hdr *)(skb->data + ETH_HLEN); - *vlan_tci = ntohs(vhdr->h_vlan_TCI); - - memmove(skb->data + VLAN_HLEN, skb->data, 2 * ETH_ALEN); - __skb_pull(skb, VLAN_HLEN); + vlan_remove_tag(skb, vlan_tci); - vlan_set_encap_proto(skb, vhdr); skb->mac_header += VLAN_HLEN; if (skb_network_offset(skb) < ETH_HLEN) @@ -6391,12 +6443,12 @@ static int pskb_carve_inside_header(struct sk_buff *skb, const u32 off, skb_frag_ref(skb, i); if (skb_has_frag_list(skb)) skb_clone_fraglist(skb); - skb_release_data(skb, SKB_CONSUMED); + skb_release_data(skb, SKB_CONSUMED, false); } else { /* we can reuse existing recount- all we did was * relocate values */ - skb_free_head(skb); + skb_free_head(skb, false); } skb->head = data; @@ -6531,7 +6583,7 @@ static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off, skb_kfree_head(data, size); return -ENOMEM; } - skb_release_data(skb, SKB_CONSUMED); + skb_release_data(skb, SKB_CONSUMED, false); skb->head = data; skb->head_frag = 0; @@ -6815,7 +6867,6 @@ void skb_attempt_defer_free(struct sk_buff *skb) { int cpu = skb->alloc_cpu; struct softnet_data *sd; - unsigned long flags; unsigned int defer_max; bool kick; @@ -6826,12 +6877,15 @@ nodefer: __kfree_skb(skb); return; } + DEBUG_NET_WARN_ON_ONCE(skb_dst(skb)); + DEBUG_NET_WARN_ON_ONCE(skb->destructor); + sd = &per_cpu(softnet_data, cpu); defer_max = READ_ONCE(sysctl_skb_defer_max); if (READ_ONCE(sd->defer_count) >= defer_max) goto nodefer; - spin_lock_irqsave(&sd->defer_lock, flags); + spin_lock_bh(&sd->defer_lock); /* Send an IPI every time queue reaches half capacity. */ kick = sd->defer_count == (defer_max >> 1); /* Paired with the READ_ONCE() few lines above */ @@ -6840,7 +6894,7 @@ nodefer: __kfree_skb(skb); skb->next = sd->defer_list; /* Paired with READ_ONCE() in skb_defer_free_flush() */ WRITE_ONCE(sd->defer_list, skb); - spin_unlock_irqrestore(&sd->defer_lock, flags); + spin_unlock_bh(&sd->defer_lock); /* Make sure to trigger NET_RX_SOFTIRQ on the remote CPU * if we are unlucky enough (this seems very unlikely). diff --git a/net/core/sock.c b/net/core/sock.c index c25888795390..5440e67bcfe3 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1396,15 +1396,10 @@ set_sndbuf: #ifdef CONFIG_NET_RX_BUSY_POLL case SO_BUSY_POLL: - /* allow unprivileged users to decrease the value */ - if ((val > sk->sk_ll_usec) && !sockopt_capable(CAP_NET_ADMIN)) - ret = -EPERM; - else { - if (val < 0) - ret = -EINVAL; - else - WRITE_ONCE(sk->sk_ll_usec, val); - } + if (val < 0) + ret = -EINVAL; + else + WRITE_ONCE(sk->sk_ll_usec, val); break; case SO_PREFER_BUSY_POLL: if (valbool && !sockopt_capable(CAP_NET_ADMIN)) diff --git a/net/core/sock_map.c b/net/core/sock_map.c index a68a7290a3b2..7c189c2e2fbf 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -437,7 +437,7 @@ static void sock_map_delete_from_link(struct bpf_map *map, struct sock *sk, __sock_map_delete(stab, sk, link_raw); } -static int sock_map_delete_elem(struct bpf_map *map, void *key) +static long sock_map_delete_elem(struct bpf_map *map, void *key) { struct bpf_stab *stab = container_of(map, struct bpf_stab, map); u32 i = *(u32 *)key; @@ -587,8 +587,8 @@ out: return ret; } -static int sock_map_update_elem(struct bpf_map *map, void *key, - void *value, u64 flags) +static long sock_map_update_elem(struct bpf_map *map, void *key, + void *value, u64 flags) { struct sock *sk = (struct sock *)value; int ret; @@ -797,6 +797,14 @@ static void sock_map_fini_seq_private(void *priv_data) bpf_map_put_with_uref(info->map); } +static u64 sock_map_mem_usage(const struct bpf_map *map) +{ + u64 usage = sizeof(struct bpf_stab); + + usage += (u64)map->max_entries * sizeof(struct sock *); + return usage; +} + static const struct bpf_iter_seq_info sock_map_iter_seq_info = { .seq_ops = &sock_map_seq_ops, .init_seq_private = sock_map_init_seq_private, @@ -816,6 +824,7 @@ const struct bpf_map_ops sock_map_ops = { .map_lookup_elem = sock_map_lookup, .map_release_uref = sock_map_release_progs, .map_check_btf = map_check_no_btf, + .map_mem_usage = sock_map_mem_usage, .map_btf_id = &sock_map_btf_ids[0], .iter_seq_info = &sock_map_iter_seq_info, }; @@ -916,7 +925,7 @@ static void sock_hash_delete_from_link(struct bpf_map *map, struct sock *sk, raw_spin_unlock_bh(&bucket->lock); } -static int sock_hash_delete_elem(struct bpf_map *map, void *key) +static long sock_hash_delete_elem(struct bpf_map *map, void *key) { struct bpf_shtab *htab = container_of(map, struct bpf_shtab, map); u32 hash, key_size = map->key_size; @@ -1397,6 +1406,16 @@ static void sock_hash_fini_seq_private(void *priv_data) bpf_map_put_with_uref(info->map); } +static u64 sock_hash_mem_usage(const struct bpf_map *map) +{ + struct bpf_shtab *htab = container_of(map, struct bpf_shtab, map); + u64 usage = sizeof(*htab); + + usage += htab->buckets_num * sizeof(struct bpf_shtab_bucket); + usage += atomic_read(&htab->count) * (u64)htab->elem_size; + return usage; +} + static const struct bpf_iter_seq_info sock_hash_iter_seq_info = { .seq_ops = &sock_hash_seq_ops, .init_seq_private = sock_hash_init_seq_private, @@ -1416,6 +1435,7 @@ const struct bpf_map_ops sock_hash_ops = { .map_lookup_elem_sys_only = sock_hash_lookup_sys, .map_release_uref = sock_hash_release_progs, .map_check_btf = map_check_no_btf, + .map_mem_usage = sock_hash_mem_usage, .map_btf_id = &sock_hash_map_btf_ids[0], .iter_seq_info = &sock_hash_iter_seq_info, }; diff --git a/net/core/xdp.c b/net/core/xdp.c index fb85aca81961..41e5ca8643ec 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -531,21 +531,6 @@ out: } EXPORT_SYMBOL_GPL(xdp_return_buff); -/* Only called for MEM_TYPE_PAGE_POOL see xdp.h */ -void __xdp_release_frame(void *data, struct xdp_mem_info *mem) -{ - struct xdp_mem_allocator *xa; - struct page *page; - - rcu_read_lock(); - xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params); - page = virt_to_head_page(data); - if (xa) - page_pool_release_page(xa->page_pool, page); - rcu_read_unlock(); -} -EXPORT_SYMBOL_GPL(__xdp_release_frame); - void xdp_attachment_setup(struct xdp_attachment_info *info, struct netdev_bpf *bpf) { @@ -658,8 +643,8 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf, * - RX ring dev queue index (skb_record_rx_queue) */ - /* Until page_pool get SKB return path, release DMA here */ - xdp_release_frame(xdpf); + if (xdpf->mem.type == MEM_TYPE_PAGE_POOL) + skb_mark_for_recycle(skb); /* Allow SKB to reuse area used by xdp_frame */ xdp_scrub_frame(xdpf); diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c index b780827f5e0a..3ab68415d121 100644 --- a/net/dccp/ipv4.c +++ b/net/dccp/ipv4.c @@ -177,7 +177,7 @@ static inline void dccp_do_pmtu_discovery(struct sock *sk, * for the case, if this connection will not able to recover. */ if (mtu < dst_mtu(dst) && ip_dont_fragment(sk, dst)) - sk->sk_err_soft = EMSGSIZE; + WRITE_ONCE(sk->sk_err_soft, EMSGSIZE); mtu = dst_mtu(dst); @@ -339,8 +339,9 @@ static int dccp_v4_err(struct sk_buff *skb, u32 info) sk_error_report(sk); dccp_done(sk); - } else - sk->sk_err_soft = err; + } else { + WRITE_ONCE(sk->sk_err_soft, err); + } goto out; } @@ -364,8 +365,9 @@ static int dccp_v4_err(struct sk_buff *skb, u32 info) if (!sock_owned_by_user(sk) && inet->recverr) { sk->sk_err = err; sk_error_report(sk); - } else /* Only an error on timeout */ - sk->sk_err_soft = err; + } else { /* Only an error on timeout */ + WRITE_ONCE(sk->sk_err_soft, err); + } out: bh_unlock_sock(sk); sock_put(sk); diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c index b9d7c3dd1cb3..93c98990d726 100644 --- a/net/dccp/ipv6.c +++ b/net/dccp/ipv6.c @@ -174,17 +174,18 @@ static int dccp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt, */ sk_error_report(sk); dccp_done(sk); - } else - sk->sk_err_soft = err; + } else { + WRITE_ONCE(sk->sk_err_soft, err); + } goto out; } if (!sock_owned_by_user(sk) && np->recverr) { sk->sk_err = err; sk_error_report(sk); - } else - sk->sk_err_soft = err; - + } else { + WRITE_ONCE(sk->sk_err_soft, err); + } out: bh_unlock_sock(sk); sock_put(sk); @@ -783,6 +784,7 @@ lookup: if (!xfrm6_policy_check(sk, XFRM_POLICY_IN, skb)) goto discard_and_relse; + nf_reset_ct(skb); return __sk_receive_skb(sk, skb, 1, dh->dccph_doff * 4, refcounted) ? -1 : 0; diff --git a/net/dccp/timer.c b/net/dccp/timer.c index 27a3b37acd2e..b3255e87cc7e 100644 --- a/net/dccp/timer.c +++ b/net/dccp/timer.c @@ -19,7 +19,7 @@ int sysctl_dccp_retries2 __read_mostly = TCP_RETR2; static void dccp_write_err(struct sock *sk) { - sk->sk_err = sk->sk_err_soft ? : ETIMEDOUT; + sk->sk_err = READ_ONCE(sk->sk_err_soft) ? : ETIMEDOUT; sk_error_report(sk); dccp_send_reset(sk, DCCP_RESET_CODE_ABORTED); diff --git a/net/dsa/Makefile b/net/dsa/Makefile index cc7e93a562fe..12e305824a96 100644 --- a/net/dsa/Makefile +++ b/net/dsa/Makefile @@ -1,4 +1,10 @@ # SPDX-License-Identifier: GPL-2.0 + +# the stubs are built-in whenever DSA is built-in or module +ifdef CONFIG_NET_DSA +obj-y := stubs.o +endif + # the core obj-$(CONFIG_NET_DSA) += dsa_core.o dsa_core-y += \ @@ -10,7 +16,8 @@ dsa_core-y += \ slave.o \ switch.o \ tag.o \ - tag_8021q.o + tag_8021q.o \ + trace.o # tagging formats obj-$(CONFIG_NET_DSA_TAG_AR9331) += tag_ar9331.o @@ -31,3 +38,6 @@ obj-$(CONFIG_NET_DSA_TAG_RZN1_A5PSW) += tag_rzn1_a5psw.o obj-$(CONFIG_NET_DSA_TAG_SJA1105) += tag_sja1105.o obj-$(CONFIG_NET_DSA_TAG_TRAILER) += tag_trailer.o obj-$(CONFIG_NET_DSA_TAG_XRS700X) += tag_xrs700x.o + +# for tracing framework to find trace.h +CFLAGS_trace.o := -I$(src) diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c index e5f156940c67..ab1afe67fd18 100644 --- a/net/dsa/dsa.c +++ b/net/dsa/dsa.c @@ -17,6 +17,7 @@ #include <linux/of.h> #include <linux/of_mdio.h> #include <linux/of_net.h> +#include <net/dsa_stubs.h> #include <net/sch_generic.h> #include "devlink.h" @@ -1702,6 +1703,20 @@ bool dsa_mdb_present_in_other_db(struct dsa_switch *ds, int port, } EXPORT_SYMBOL_GPL(dsa_mdb_present_in_other_db); +static const struct dsa_stubs __dsa_stubs = { + .master_hwtstamp_validate = __dsa_master_hwtstamp_validate, +}; + +static void dsa_register_stubs(void) +{ + dsa_stubs = &__dsa_stubs; +} + +static void dsa_unregister_stubs(void) +{ + dsa_stubs = NULL; +} + static int __init dsa_init_module(void) { int rc; @@ -1721,6 +1736,8 @@ static int __init dsa_init_module(void) if (rc) goto netlink_register_fail; + dsa_register_stubs(); + return 0; netlink_register_fail: @@ -1735,6 +1752,8 @@ module_init(dsa_init_module); static void __exit dsa_cleanup_module(void) { + dsa_unregister_stubs(); + rtnl_link_unregister(&dsa_link_ops); dsa_slave_unregister_notifier(); diff --git a/net/dsa/master.c b/net/dsa/master.c index 22d3f16b0e6d..6be89ab0cc01 100644 --- a/net/dsa/master.c +++ b/net/dsa/master.c @@ -195,38 +195,31 @@ static void dsa_master_get_strings(struct net_device *dev, uint32_t stringset, } } -static int dsa_master_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd) +/* Deny PTP operations on master if there is at least one switch in the tree + * that is PTP capable. + */ +int __dsa_master_hwtstamp_validate(struct net_device *dev, + const struct kernel_hwtstamp_config *config, + struct netlink_ext_ack *extack) { struct dsa_port *cpu_dp = dev->dsa_ptr; struct dsa_switch *ds = cpu_dp->ds; struct dsa_switch_tree *dst; - int err = -EOPNOTSUPP; struct dsa_port *dp; dst = ds->dst; - switch (cmd) { - case SIOCGHWTSTAMP: - case SIOCSHWTSTAMP: - /* Deny PTP operations on master if there is at least one - * switch in the tree that is PTP capable. - */ - list_for_each_entry(dp, &dst->ports, list) - if (dsa_port_supports_hwtstamp(dp, ifr)) - return -EBUSY; - break; + list_for_each_entry(dp, &dst->ports, list) { + if (dsa_port_supports_hwtstamp(dp)) { + NL_SET_ERR_MSG(extack, + "HW timestamping not allowed on DSA master when switch supports the operation"); + return -EBUSY; + } } - if (dev->netdev_ops->ndo_eth_ioctl) - err = dev->netdev_ops->ndo_eth_ioctl(dev, ifr, cmd); - - return err; + return 0; } -static const struct dsa_netdevice_ops dsa_netdev_ops = { - .ndo_eth_ioctl = dsa_master_ioctl, -}; - static int dsa_master_ethtool_setup(struct net_device *dev) { struct dsa_port *cpu_dp = dev->dsa_ptr; @@ -267,15 +260,6 @@ static void dsa_master_ethtool_teardown(struct net_device *dev) cpu_dp->orig_ethtool_ops = NULL; } -static void dsa_netdev_ops_set(struct net_device *dev, - const struct dsa_netdevice_ops *ops) -{ - if (netif_is_lag_master(dev)) - return; - - dev->dsa_ptr->netdev_ops = ops; -} - /* Keep the master always promiscuous if the tagging protocol requires that * (garbles MAC DA) or if it doesn't support unicast filtering, case in which * it would revert to promiscuous mode as soon as we call dev_uc_add() on it @@ -414,16 +398,13 @@ int dsa_master_setup(struct net_device *dev, struct dsa_port *cpu_dp) if (ret) goto out_err_reset_promisc; - dsa_netdev_ops_set(dev, &dsa_netdev_ops); - ret = sysfs_create_group(&dev->dev.kobj, &dsa_group); if (ret) - goto out_err_ndo_teardown; + goto out_err_ethtool_teardown; return ret; -out_err_ndo_teardown: - dsa_netdev_ops_set(dev, NULL); +out_err_ethtool_teardown: dsa_master_ethtool_teardown(dev); out_err_reset_promisc: dsa_master_set_promiscuity(dev, -1); @@ -433,7 +414,6 @@ out_err_reset_promisc: void dsa_master_teardown(struct net_device *dev) { sysfs_remove_group(&dev->dev.kobj, &dsa_group); - dsa_netdev_ops_set(dev, NULL); dsa_master_ethtool_teardown(dev); dsa_master_reset_mtu(dev); dsa_master_set_promiscuity(dev, -1); diff --git a/net/dsa/master.h b/net/dsa/master.h index 3fc0e610b5b5..76e39d3ec909 100644 --- a/net/dsa/master.h +++ b/net/dsa/master.h @@ -15,5 +15,8 @@ int dsa_master_lag_setup(struct net_device *lag_dev, struct dsa_port *cpu_dp, struct netlink_ext_ack *extack); void dsa_master_lag_teardown(struct net_device *lag_dev, struct dsa_port *cpu_dp); +int __dsa_master_hwtstamp_validate(struct net_device *dev, + const struct kernel_hwtstamp_config *config, + struct netlink_ext_ack *extack); #endif diff --git a/net/dsa/port.c b/net/dsa/port.c index 67ad1adec2a2..71ba30538411 100644 --- a/net/dsa/port.c +++ b/net/dsa/port.c @@ -114,19 +114,21 @@ static bool dsa_port_can_configure_learning(struct dsa_port *dp) return !err; } -bool dsa_port_supports_hwtstamp(struct dsa_port *dp, struct ifreq *ifr) +bool dsa_port_supports_hwtstamp(struct dsa_port *dp) { struct dsa_switch *ds = dp->ds; + struct ifreq ifr = {}; int err; if (!ds->ops->port_hwtstamp_get || !ds->ops->port_hwtstamp_set) return false; /* "See through" shim implementations of the "get" method. - * This will clobber the ifreq structure, but we will either return an - * error, or the master will overwrite it with proper values. + * Since we can't cook up a complete ioctl request structure, this will + * fail in copy_to_user() with -EFAULT, which hopefully is enough to + * detect a valid implementation. */ - err = ds->ops->port_hwtstamp_get(ds, dp->index, ifr); + err = ds->ops->port_hwtstamp_get(ds, dp->index, &ifr); return err != -EOPNOTSUPP; } @@ -1028,9 +1030,6 @@ static int dsa_port_host_fdb_add(struct dsa_port *dp, .db = db, }; - if (!dp->ds->fdb_isolation) - info.db.bridge.num = 0; - return dsa_port_notify(dp, DSA_NOTIFIER_HOST_FDB_ADD, &info); } @@ -1055,6 +1054,9 @@ int dsa_port_bridge_host_fdb_add(struct dsa_port *dp, }; int err; + if (!dp->ds->fdb_isolation) + db.bridge.num = 0; + /* Avoid a call to __dev_set_promiscuity() on the master, which * requires rtnl_lock(), since we can't guarantee that is held here, * and we can't take it either. @@ -1079,9 +1081,6 @@ static int dsa_port_host_fdb_del(struct dsa_port *dp, .db = db, }; - if (!dp->ds->fdb_isolation) - info.db.bridge.num = 0; - return dsa_port_notify(dp, DSA_NOTIFIER_HOST_FDB_DEL, &info); } @@ -1106,6 +1105,9 @@ int dsa_port_bridge_host_fdb_del(struct dsa_port *dp, }; int err; + if (!dp->ds->fdb_isolation) + db.bridge.num = 0; + if (master->priv_flags & IFF_UNICAST_FLT) { err = dev_uc_del(master, addr); if (err) @@ -1210,9 +1212,6 @@ static int dsa_port_host_mdb_add(const struct dsa_port *dp, .db = db, }; - if (!dp->ds->fdb_isolation) - info.db.bridge.num = 0; - return dsa_port_notify(dp, DSA_NOTIFIER_HOST_MDB_ADD, &info); } @@ -1237,6 +1236,9 @@ int dsa_port_bridge_host_mdb_add(const struct dsa_port *dp, }; int err; + if (!dp->ds->fdb_isolation) + db.bridge.num = 0; + err = dev_mc_add(master, mdb->addr); if (err) return err; @@ -1254,9 +1256,6 @@ static int dsa_port_host_mdb_del(const struct dsa_port *dp, .db = db, }; - if (!dp->ds->fdb_isolation) - info.db.bridge.num = 0; - return dsa_port_notify(dp, DSA_NOTIFIER_HOST_MDB_DEL, &info); } @@ -1281,6 +1280,9 @@ int dsa_port_bridge_host_mdb_del(const struct dsa_port *dp, }; int err; + if (!dp->ds->fdb_isolation) + db.bridge.num = 0; + err = dev_mc_del(master, mdb->addr); if (err) return err; diff --git a/net/dsa/port.h b/net/dsa/port.h index 9c218660d223..dc812512fd0e 100644 --- a/net/dsa/port.h +++ b/net/dsa/port.h @@ -15,7 +15,7 @@ struct switchdev_obj_port_mdb; struct switchdev_vlan_msti; struct phy_device; -bool dsa_port_supports_hwtstamp(struct dsa_port *dp, struct ifreq *ifr); +bool dsa_port_supports_hwtstamp(struct dsa_port *dp); void dsa_port_set_tag_protocol(struct dsa_port *cpu_dp, const struct dsa_device_ops *tag_ops); int dsa_port_set_state(struct dsa_port *dp, u8 state, bool do_fast_age); diff --git a/net/dsa/stubs.c b/net/dsa/stubs.c new file mode 100644 index 000000000000..2ed8a6c85fbf --- /dev/null +++ b/net/dsa/stubs.c @@ -0,0 +1,10 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Stubs for DSA functionality called by the core network stack. + * These are necessary because CONFIG_NET_DSA can be a module, and built-in + * code cannot directly call symbols exported by modules. + */ +#include <net/dsa_stubs.h> + +const struct dsa_stubs *dsa_stubs; +EXPORT_SYMBOL_GPL(dsa_stubs); diff --git a/net/dsa/switch.c b/net/dsa/switch.c index d5bc4bb7310d..8c9a9f94b756 100644 --- a/net/dsa/switch.c +++ b/net/dsa/switch.c @@ -18,6 +18,7 @@ #include "slave.h" #include "switch.h" #include "tag_8021q.h" +#include "trace.h" static unsigned int dsa_switch_fastest_ageing_time(struct dsa_switch *ds, unsigned int ageing_time) @@ -164,14 +165,20 @@ static int dsa_port_do_mdb_add(struct dsa_port *dp, int err = 0; /* No need to bother with refcounting for user ports */ - if (!(dsa_port_is_cpu(dp) || dsa_port_is_dsa(dp))) - return ds->ops->port_mdb_add(ds, port, mdb, db); + if (!(dsa_port_is_cpu(dp) || dsa_port_is_dsa(dp))) { + err = ds->ops->port_mdb_add(ds, port, mdb, db); + trace_dsa_mdb_add_hw(dp, mdb->addr, mdb->vid, &db, err); + + return err; + } mutex_lock(&dp->addr_lists_lock); a = dsa_mac_addr_find(&dp->mdbs, mdb->addr, mdb->vid, db); if (a) { refcount_inc(&a->refcount); + trace_dsa_mdb_add_bump(dp, mdb->addr, mdb->vid, &db, + &a->refcount); goto out; } @@ -182,6 +189,7 @@ static int dsa_port_do_mdb_add(struct dsa_port *dp, } err = ds->ops->port_mdb_add(ds, port, mdb, db); + trace_dsa_mdb_add_hw(dp, mdb->addr, mdb->vid, &db, err); if (err) { kfree(a); goto out; @@ -209,21 +217,30 @@ static int dsa_port_do_mdb_del(struct dsa_port *dp, int err = 0; /* No need to bother with refcounting for user ports */ - if (!(dsa_port_is_cpu(dp) || dsa_port_is_dsa(dp))) - return ds->ops->port_mdb_del(ds, port, mdb, db); + if (!(dsa_port_is_cpu(dp) || dsa_port_is_dsa(dp))) { + err = ds->ops->port_mdb_del(ds, port, mdb, db); + trace_dsa_mdb_del_hw(dp, mdb->addr, mdb->vid, &db, err); + + return err; + } mutex_lock(&dp->addr_lists_lock); a = dsa_mac_addr_find(&dp->mdbs, mdb->addr, mdb->vid, db); if (!a) { + trace_dsa_mdb_del_not_found(dp, mdb->addr, mdb->vid, &db); err = -ENOENT; goto out; } - if (!refcount_dec_and_test(&a->refcount)) + if (!refcount_dec_and_test(&a->refcount)) { + trace_dsa_mdb_del_drop(dp, mdb->addr, mdb->vid, &db, + &a->refcount); goto out; + } err = ds->ops->port_mdb_del(ds, port, mdb, db); + trace_dsa_mdb_del_hw(dp, mdb->addr, mdb->vid, &db, err); if (err) { refcount_set(&a->refcount, 1); goto out; @@ -247,14 +264,19 @@ static int dsa_port_do_fdb_add(struct dsa_port *dp, const unsigned char *addr, int err = 0; /* No need to bother with refcounting for user ports */ - if (!(dsa_port_is_cpu(dp) || dsa_port_is_dsa(dp))) - return ds->ops->port_fdb_add(ds, port, addr, vid, db); + if (!(dsa_port_is_cpu(dp) || dsa_port_is_dsa(dp))) { + err = ds->ops->port_fdb_add(ds, port, addr, vid, db); + trace_dsa_fdb_add_hw(dp, addr, vid, &db, err); + + return err; + } mutex_lock(&dp->addr_lists_lock); a = dsa_mac_addr_find(&dp->fdbs, addr, vid, db); if (a) { refcount_inc(&a->refcount); + trace_dsa_fdb_add_bump(dp, addr, vid, &db, &a->refcount); goto out; } @@ -265,6 +287,7 @@ static int dsa_port_do_fdb_add(struct dsa_port *dp, const unsigned char *addr, } err = ds->ops->port_fdb_add(ds, port, addr, vid, db); + trace_dsa_fdb_add_hw(dp, addr, vid, &db, err); if (err) { kfree(a); goto out; @@ -291,21 +314,29 @@ static int dsa_port_do_fdb_del(struct dsa_port *dp, const unsigned char *addr, int err = 0; /* No need to bother with refcounting for user ports */ - if (!(dsa_port_is_cpu(dp) || dsa_port_is_dsa(dp))) - return ds->ops->port_fdb_del(ds, port, addr, vid, db); + if (!(dsa_port_is_cpu(dp) || dsa_port_is_dsa(dp))) { + err = ds->ops->port_fdb_del(ds, port, addr, vid, db); + trace_dsa_fdb_del_hw(dp, addr, vid, &db, err); + + return err; + } mutex_lock(&dp->addr_lists_lock); a = dsa_mac_addr_find(&dp->fdbs, addr, vid, db); if (!a) { + trace_dsa_fdb_del_not_found(dp, addr, vid, &db); err = -ENOENT; goto out; } - if (!refcount_dec_and_test(&a->refcount)) + if (!refcount_dec_and_test(&a->refcount)) { + trace_dsa_fdb_del_drop(dp, addr, vid, &db, &a->refcount); goto out; + } err = ds->ops->port_fdb_del(ds, port, addr, vid, db); + trace_dsa_fdb_del_hw(dp, addr, vid, &db, err); if (err) { refcount_set(&a->refcount, 1); goto out; @@ -332,6 +363,8 @@ static int dsa_switch_do_lag_fdb_add(struct dsa_switch *ds, struct dsa_lag *lag, a = dsa_mac_addr_find(&lag->fdbs, addr, vid, db); if (a) { refcount_inc(&a->refcount); + trace_dsa_lag_fdb_add_bump(lag->dev, addr, vid, &db, + &a->refcount); goto out; } @@ -342,6 +375,7 @@ static int dsa_switch_do_lag_fdb_add(struct dsa_switch *ds, struct dsa_lag *lag, } err = ds->ops->lag_fdb_add(ds, *lag, addr, vid, db); + trace_dsa_lag_fdb_add_hw(lag->dev, addr, vid, &db, err); if (err) { kfree(a); goto out; @@ -370,14 +404,19 @@ static int dsa_switch_do_lag_fdb_del(struct dsa_switch *ds, struct dsa_lag *lag, a = dsa_mac_addr_find(&lag->fdbs, addr, vid, db); if (!a) { + trace_dsa_lag_fdb_del_not_found(lag->dev, addr, vid, &db); err = -ENOENT; goto out; } - if (!refcount_dec_and_test(&a->refcount)) + if (!refcount_dec_and_test(&a->refcount)) { + trace_dsa_lag_fdb_del_drop(lag->dev, addr, vid, &db, + &a->refcount); goto out; + } err = ds->ops->lag_fdb_del(ds, *lag, addr, vid, db); + trace_dsa_lag_fdb_del_hw(lag->dev, addr, vid, &db, err); if (err) { refcount_set(&a->refcount, 1); goto out; @@ -656,8 +695,12 @@ static int dsa_port_do_vlan_add(struct dsa_port *dp, int err = 0; /* No need to bother with refcounting for user ports. */ - if (!(dsa_port_is_cpu(dp) || dsa_port_is_dsa(dp))) - return ds->ops->port_vlan_add(ds, port, vlan, extack); + if (!(dsa_port_is_cpu(dp) || dsa_port_is_dsa(dp))) { + err = ds->ops->port_vlan_add(ds, port, vlan, extack); + trace_dsa_vlan_add_hw(dp, vlan, err); + + return err; + } /* No need to propagate on shared ports the existing VLANs that were * re-notified after just the flags have changed. This would cause a @@ -672,6 +715,7 @@ static int dsa_port_do_vlan_add(struct dsa_port *dp, v = dsa_vlan_find(&dp->vlans, vlan); if (v) { refcount_inc(&v->refcount); + trace_dsa_vlan_add_bump(dp, vlan, &v->refcount); goto out; } @@ -682,6 +726,7 @@ static int dsa_port_do_vlan_add(struct dsa_port *dp, } err = ds->ops->port_vlan_add(ds, port, vlan, extack); + trace_dsa_vlan_add_hw(dp, vlan, err); if (err) { kfree(v); goto out; @@ -706,21 +751,29 @@ static int dsa_port_do_vlan_del(struct dsa_port *dp, int err = 0; /* No need to bother with refcounting for user ports */ - if (!(dsa_port_is_cpu(dp) || dsa_port_is_dsa(dp))) - return ds->ops->port_vlan_del(ds, port, vlan); + if (!(dsa_port_is_cpu(dp) || dsa_port_is_dsa(dp))) { + err = ds->ops->port_vlan_del(ds, port, vlan); + trace_dsa_vlan_del_hw(dp, vlan, err); + + return err; + } mutex_lock(&dp->vlans_lock); v = dsa_vlan_find(&dp->vlans, vlan); if (!v) { + trace_dsa_vlan_del_not_found(dp, vlan); err = -ENOENT; goto out; } - if (!refcount_dec_and_test(&v->refcount)) + if (!refcount_dec_and_test(&v->refcount)) { + trace_dsa_vlan_del_drop(dp, vlan, &v->refcount); goto out; + } err = ds->ops->port_vlan_del(ds, port, vlan); + trace_dsa_vlan_del_hw(dp, vlan, err); if (err) { refcount_set(&v->refcount, 1); goto out; diff --git a/net/dsa/tag.h b/net/dsa/tag.h index 7cfbca824f1c..32d12f4a9d73 100644 --- a/net/dsa/tag.h +++ b/net/dsa/tag.h @@ -229,7 +229,7 @@ static inline void *dsa_etype_header_pos_rx(struct sk_buff *skb) return skb->data - 2; } -/* On TX, skb->data points to skb_mac_header(skb), which means that EtherType +/* On TX, skb->data points to the MAC header, which means that EtherType * header taggers start exactly where the EtherType is (the EtherType is * treated as part of the DSA header). */ diff --git a/net/dsa/tag_8021q.c b/net/dsa/tag_8021q.c index 5ee9ef00954e..cbdfc392f7e0 100644 --- a/net/dsa/tag_8021q.c +++ b/net/dsa/tag_8021q.c @@ -461,8 +461,8 @@ EXPORT_SYMBOL_GPL(dsa_tag_8021q_unregister); struct sk_buff *dsa_8021q_xmit(struct sk_buff *skb, struct net_device *netdev, u16 tpid, u16 tci) { - /* skb->data points at skb_mac_header, which - * is fine for vlan_insert_tag. + /* skb->data points at the MAC header, which is fine + * for vlan_insert_tag(). */ return vlan_insert_tag(skb, htons(tpid), tci); } diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c index 0eb1c7784c3d..ea100bd25939 100644 --- a/net/dsa/tag_ksz.c +++ b/net/dsa/tag_ksz.c @@ -120,18 +120,18 @@ static struct sk_buff *ksz_common_rcv(struct sk_buff *skb, static struct sk_buff *ksz8795_xmit(struct sk_buff *skb, struct net_device *dev) { struct dsa_port *dp = dsa_slave_to_port(dev); + struct ethhdr *hdr; u8 *tag; - u8 *addr; if (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_help(skb)) return NULL; /* Tag encoding */ tag = skb_put(skb, KSZ_INGRESS_TAG_LEN); - addr = skb_mac_header(skb); + hdr = skb_eth_hdr(skb); *tag = 1 << dp->index; - if (is_link_local_ether_addr(addr)) + if (is_link_local_ether_addr(hdr->h_dest)) *tag |= KSZ8795_TAIL_TAG_OVERRIDE; return skb; @@ -273,8 +273,8 @@ static struct sk_buff *ksz9477_xmit(struct sk_buff *skb, u16 queue_mapping = skb_get_queue_mapping(skb); u8 prio = netdev_txq_to_tc(dev, queue_mapping); struct dsa_port *dp = dsa_slave_to_port(dev); + struct ethhdr *hdr; __be16 *tag; - u8 *addr; u16 val; if (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_help(skb)) @@ -284,13 +284,13 @@ static struct sk_buff *ksz9477_xmit(struct sk_buff *skb, ksz_xmit_timestamp(dp, skb); tag = skb_put(skb, KSZ9477_INGRESS_TAG_LEN); - addr = skb_mac_header(skb); + hdr = skb_eth_hdr(skb); val = BIT(dp->index); val |= FIELD_PREP(KSZ9477_TAIL_TAG_PRIO, prio); - if (is_link_local_ether_addr(addr)) + if (is_link_local_ether_addr(hdr->h_dest)) val |= KSZ9477_TAIL_TAG_OVERRIDE; *tag = cpu_to_be16(val); @@ -337,7 +337,7 @@ static struct sk_buff *ksz9893_xmit(struct sk_buff *skb, u16 queue_mapping = skb_get_queue_mapping(skb); u8 prio = netdev_txq_to_tc(dev, queue_mapping); struct dsa_port *dp = dsa_slave_to_port(dev); - u8 *addr; + struct ethhdr *hdr; u8 *tag; if (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_help(skb)) @@ -347,13 +347,13 @@ static struct sk_buff *ksz9893_xmit(struct sk_buff *skb, ksz_xmit_timestamp(dp, skb); tag = skb_put(skb, KSZ_INGRESS_TAG_LEN); - addr = skb_mac_header(skb); + hdr = skb_eth_hdr(skb); *tag = BIT(dp->index); *tag |= FIELD_PREP(KSZ9893_TAIL_TAG_PRIO, prio); - if (is_link_local_ether_addr(addr)) + if (is_link_local_ether_addr(hdr->h_dest)) *tag |= KSZ9893_TAIL_TAG_OVERRIDE; return ksz_defer_xmit(dp, skb); diff --git a/net/dsa/tag_ocelot.c b/net/dsa/tag_ocelot.c index 28ebecafdd24..20bf7074d5a6 100644 --- a/net/dsa/tag_ocelot.c +++ b/net/dsa/tag_ocelot.c @@ -26,11 +26,11 @@ static void ocelot_xmit_get_vlan_info(struct sk_buff *skb, struct dsa_port *dp, return; } - hdr = (struct vlan_ethhdr *)skb_mac_header(skb); + hdr = skb_vlan_eth_hdr(skb); br_vlan_get_proto(br, &proto); if (ntohs(hdr->h_vlan_proto) == proto) { - __skb_vlan_pop(skb, &tci); + vlan_remove_tag(skb, &tci); *vlan_tci = tci; } else { rcu_read_lock(); diff --git a/net/dsa/tag_sja1105.c b/net/dsa/tag_sja1105.c index 1c2ceba4771b..a5f3b73da417 100644 --- a/net/dsa/tag_sja1105.c +++ b/net/dsa/tag_sja1105.c @@ -256,7 +256,7 @@ static struct sk_buff *sja1105_pvid_tag_control_pkt(struct dsa_port *dp, return NULL; } - hdr = (struct vlan_ethhdr *)skb_mac_header(skb); + hdr = skb_vlan_eth_hdr(skb); /* If skb is already VLAN-tagged, leave that VLAN ID in place */ if (hdr->h_vlan_proto == xmit_tpid) @@ -516,7 +516,7 @@ static bool sja1110_skb_has_inband_control_extension(const struct sk_buff *skb) static void sja1105_vlan_rcv(struct sk_buff *skb, int *source_port, int *switch_id, int *vbid, u16 *vid) { - struct vlan_ethhdr *hdr = (struct vlan_ethhdr *)skb_mac_header(skb); + struct vlan_ethhdr *hdr = vlan_eth_hdr(skb); u16 vlan_tci; if (skb_vlan_tag_present(skb)) diff --git a/net/dsa/trace.c b/net/dsa/trace.c new file mode 100644 index 000000000000..1b107165d331 --- /dev/null +++ b/net/dsa/trace.c @@ -0,0 +1,39 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* Copyright 2022-2023 NXP + */ + +#define CREATE_TRACE_POINTS +#include "trace.h" + +void dsa_db_print(const struct dsa_db *db, char buf[DSA_DB_BUFSIZ]) +{ + switch (db->type) { + case DSA_DB_PORT: + sprintf(buf, "port %s", db->dp->name); + break; + case DSA_DB_LAG: + sprintf(buf, "lag %s id %d", db->lag.dev->name, db->lag.id); + break; + case DSA_DB_BRIDGE: + sprintf(buf, "bridge %s num %d", db->bridge.dev->name, + db->bridge.num); + break; + default: + sprintf(buf, "unknown"); + break; + } +} + +const char *dsa_port_kind(const struct dsa_port *dp) +{ + switch (dp->type) { + case DSA_PORT_TYPE_USER: + return "user"; + case DSA_PORT_TYPE_CPU: + return "cpu"; + case DSA_PORT_TYPE_DSA: + return "dsa"; + default: + return "unused"; + } +} diff --git a/net/dsa/trace.h b/net/dsa/trace.h new file mode 100644 index 000000000000..567f29a39707 --- /dev/null +++ b/net/dsa/trace.h @@ -0,0 +1,447 @@ +/* SPDX-License-Identifier: GPL-2.0 + * Copyright 2022-2023 NXP + */ + +#undef TRACE_SYSTEM +#define TRACE_SYSTEM dsa + +#if !defined(_NET_DSA_TRACE_H) || defined(TRACE_HEADER_MULTI_READ) +#define _NET_DSA_TRACE_H + +#include <net/dsa.h> +#include <net/switchdev.h> +#include <linux/etherdevice.h> +#include <linux/if_bridge.h> +#include <linux/refcount.h> +#include <linux/tracepoint.h> + +/* Enough to fit "bridge %s num %d" where num has 3 digits */ +#define DSA_DB_BUFSIZ (IFNAMSIZ + 16) + +void dsa_db_print(const struct dsa_db *db, char buf[DSA_DB_BUFSIZ]); +const char *dsa_port_kind(const struct dsa_port *dp); + +DECLARE_EVENT_CLASS(dsa_port_addr_op_hw, + + TP_PROTO(const struct dsa_port *dp, const unsigned char *addr, u16 vid, + const struct dsa_db *db, int err), + + TP_ARGS(dp, addr, vid, db, err), + + TP_STRUCT__entry( + __string(dev, dev_name(dp->ds->dev)) + __string(kind, dsa_port_kind(dp)) + __field(int, port) + __array(unsigned char, addr, ETH_ALEN) + __field(u16, vid) + __array(char, db_buf, DSA_DB_BUFSIZ) + __field(int, err) + ), + + TP_fast_assign( + __assign_str(dev, dev_name(dp->ds->dev)); + __assign_str(kind, dsa_port_kind(dp)); + __entry->port = dp->index; + ether_addr_copy(__entry->addr, addr); + __entry->vid = vid; + dsa_db_print(db, __entry->db_buf); + __entry->err = err; + ), + + TP_printk("%s %s port %d addr %pM vid %u db \"%s\" err %d", + __get_str(dev), __get_str(kind), __entry->port, __entry->addr, + __entry->vid, __entry->db_buf, __entry->err) +); + +/* Add unicast/multicast address to hardware, either on user ports + * (where no refcounting is kept), or on shared ports when the entry + * is first seen and its refcount is 1. + */ +DEFINE_EVENT(dsa_port_addr_op_hw, dsa_fdb_add_hw, + TP_PROTO(const struct dsa_port *dp, const unsigned char *addr, + u16 vid, const struct dsa_db *db, int err), + TP_ARGS(dp, addr, vid, db, err)); + +DEFINE_EVENT(dsa_port_addr_op_hw, dsa_mdb_add_hw, + TP_PROTO(const struct dsa_port *dp, const unsigned char *addr, + u16 vid, const struct dsa_db *db, int err), + TP_ARGS(dp, addr, vid, db, err)); + +/* Delete unicast/multicast address from hardware, either on user ports or + * when the refcount on shared ports reaches 0 + */ +DEFINE_EVENT(dsa_port_addr_op_hw, dsa_fdb_del_hw, + TP_PROTO(const struct dsa_port *dp, const unsigned char *addr, + u16 vid, const struct dsa_db *db, int err), + TP_ARGS(dp, addr, vid, db, err)); + +DEFINE_EVENT(dsa_port_addr_op_hw, dsa_mdb_del_hw, + TP_PROTO(const struct dsa_port *dp, const unsigned char *addr, + u16 vid, const struct dsa_db *db, int err), + TP_ARGS(dp, addr, vid, db, err)); + +DECLARE_EVENT_CLASS(dsa_port_addr_op_refcount, + + TP_PROTO(const struct dsa_port *dp, const unsigned char *addr, u16 vid, + const struct dsa_db *db, const refcount_t *refcount), + + TP_ARGS(dp, addr, vid, db, refcount), + + TP_STRUCT__entry( + __string(dev, dev_name(dp->ds->dev)) + __string(kind, dsa_port_kind(dp)) + __field(int, port) + __array(unsigned char, addr, ETH_ALEN) + __field(u16, vid) + __array(char, db_buf, DSA_DB_BUFSIZ) + __field(unsigned int, refcount) + ), + + TP_fast_assign( + __assign_str(dev, dev_name(dp->ds->dev)); + __assign_str(kind, dsa_port_kind(dp)); + __entry->port = dp->index; + ether_addr_copy(__entry->addr, addr); + __entry->vid = vid; + dsa_db_print(db, __entry->db_buf); + __entry->refcount = refcount_read(refcount); + ), + + TP_printk("%s %s port %d addr %pM vid %u db \"%s\" refcount %u", + __get_str(dev), __get_str(kind), __entry->port, __entry->addr, + __entry->vid, __entry->db_buf, __entry->refcount) +); + +/* Bump the refcount of an existing unicast/multicast address on shared ports */ +DEFINE_EVENT(dsa_port_addr_op_refcount, dsa_fdb_add_bump, + TP_PROTO(const struct dsa_port *dp, const unsigned char *addr, + u16 vid, const struct dsa_db *db, + const refcount_t *refcount), + TP_ARGS(dp, addr, vid, db, refcount)); + +DEFINE_EVENT(dsa_port_addr_op_refcount, dsa_mdb_add_bump, + TP_PROTO(const struct dsa_port *dp, const unsigned char *addr, + u16 vid, const struct dsa_db *db, + const refcount_t *refcount), + TP_ARGS(dp, addr, vid, db, refcount)); + +/* Drop the refcount of a multicast address that we still keep on + * shared ports + */ +DEFINE_EVENT(dsa_port_addr_op_refcount, dsa_fdb_del_drop, + TP_PROTO(const struct dsa_port *dp, const unsigned char *addr, + u16 vid, const struct dsa_db *db, + const refcount_t *refcount), + TP_ARGS(dp, addr, vid, db, refcount)); + +DEFINE_EVENT(dsa_port_addr_op_refcount, dsa_mdb_del_drop, + TP_PROTO(const struct dsa_port *dp, const unsigned char *addr, + u16 vid, const struct dsa_db *db, + const refcount_t *refcount), + TP_ARGS(dp, addr, vid, db, refcount)); + +DECLARE_EVENT_CLASS(dsa_port_addr_del_not_found, + + TP_PROTO(const struct dsa_port *dp, const unsigned char *addr, u16 vid, + const struct dsa_db *db), + + TP_ARGS(dp, addr, vid, db), + + TP_STRUCT__entry( + __string(dev, dev_name(dp->ds->dev)) + __string(kind, dsa_port_kind(dp)) + __field(int, port) + __array(unsigned char, addr, ETH_ALEN) + __field(u16, vid) + __array(char, db_buf, DSA_DB_BUFSIZ) + ), + + TP_fast_assign( + __assign_str(dev, dev_name(dp->ds->dev)); + __assign_str(kind, dsa_port_kind(dp)); + __entry->port = dp->index; + ether_addr_copy(__entry->addr, addr); + __entry->vid = vid; + dsa_db_print(db, __entry->db_buf); + ), + + TP_printk("%s %s port %d addr %pM vid %u db \"%s\"", + __get_str(dev), __get_str(kind), __entry->port, + __entry->addr, __entry->vid, __entry->db_buf) +); + +/* Attempt to delete a unicast/multicast address on shared ports for which + * the delete operation was called more times than the addition + */ +DEFINE_EVENT(dsa_port_addr_del_not_found, dsa_fdb_del_not_found, + TP_PROTO(const struct dsa_port *dp, const unsigned char *addr, + u16 vid, const struct dsa_db *db), + TP_ARGS(dp, addr, vid, db)); + +DEFINE_EVENT(dsa_port_addr_del_not_found, dsa_mdb_del_not_found, + TP_PROTO(const struct dsa_port *dp, const unsigned char *addr, + u16 vid, const struct dsa_db *db), + TP_ARGS(dp, addr, vid, db)); + +TRACE_EVENT(dsa_lag_fdb_add_hw, + + TP_PROTO(const struct net_device *lag_dev, const unsigned char *addr, + u16 vid, const struct dsa_db *db, int err), + + TP_ARGS(lag_dev, addr, vid, db, err), + + TP_STRUCT__entry( + __string(dev, lag_dev->name) + __array(unsigned char, addr, ETH_ALEN) + __field(u16, vid) + __array(char, db_buf, DSA_DB_BUFSIZ) + __field(int, err) + ), + + TP_fast_assign( + __assign_str(dev, lag_dev->name); + ether_addr_copy(__entry->addr, addr); + __entry->vid = vid; + dsa_db_print(db, __entry->db_buf); + __entry->err = err; + ), + + TP_printk("%s addr %pM vid %u db \"%s\" err %d", + __get_str(dev), __entry->addr, __entry->vid, + __entry->db_buf, __entry->err) +); + +TRACE_EVENT(dsa_lag_fdb_add_bump, + + TP_PROTO(const struct net_device *lag_dev, const unsigned char *addr, + u16 vid, const struct dsa_db *db, const refcount_t *refcount), + + TP_ARGS(lag_dev, addr, vid, db, refcount), + + TP_STRUCT__entry( + __string(dev, lag_dev->name) + __array(unsigned char, addr, ETH_ALEN) + __field(u16, vid) + __array(char, db_buf, DSA_DB_BUFSIZ) + __field(unsigned int, refcount) + ), + + TP_fast_assign( + __assign_str(dev, lag_dev->name); + ether_addr_copy(__entry->addr, addr); + __entry->vid = vid; + dsa_db_print(db, __entry->db_buf); + __entry->refcount = refcount_read(refcount); + ), + + TP_printk("%s addr %pM vid %u db \"%s\" refcount %u", + __get_str(dev), __entry->addr, __entry->vid, + __entry->db_buf, __entry->refcount) +); + +TRACE_EVENT(dsa_lag_fdb_del_hw, + + TP_PROTO(const struct net_device *lag_dev, const unsigned char *addr, + u16 vid, const struct dsa_db *db, int err), + + TP_ARGS(lag_dev, addr, vid, db, err), + + TP_STRUCT__entry( + __string(dev, lag_dev->name) + __array(unsigned char, addr, ETH_ALEN) + __field(u16, vid) + __array(char, db_buf, DSA_DB_BUFSIZ) + __field(int, err) + ), + + TP_fast_assign( + __assign_str(dev, lag_dev->name); + ether_addr_copy(__entry->addr, addr); + __entry->vid = vid; + dsa_db_print(db, __entry->db_buf); + __entry->err = err; + ), + + TP_printk("%s addr %pM vid %u db \"%s\" err %d", + __get_str(dev), __entry->addr, __entry->vid, + __entry->db_buf, __entry->err) +); + +TRACE_EVENT(dsa_lag_fdb_del_drop, + + TP_PROTO(const struct net_device *lag_dev, const unsigned char *addr, + u16 vid, const struct dsa_db *db, const refcount_t *refcount), + + TP_ARGS(lag_dev, addr, vid, db, refcount), + + TP_STRUCT__entry( + __string(dev, lag_dev->name) + __array(unsigned char, addr, ETH_ALEN) + __field(u16, vid) + __array(char, db_buf, DSA_DB_BUFSIZ) + __field(unsigned int, refcount) + ), + + TP_fast_assign( + __assign_str(dev, lag_dev->name); + ether_addr_copy(__entry->addr, addr); + __entry->vid = vid; + dsa_db_print(db, __entry->db_buf); + __entry->refcount = refcount_read(refcount); + ), + + TP_printk("%s addr %pM vid %u db \"%s\" refcount %u", + __get_str(dev), __entry->addr, __entry->vid, + __entry->db_buf, __entry->refcount) +); + +TRACE_EVENT(dsa_lag_fdb_del_not_found, + + TP_PROTO(const struct net_device *lag_dev, const unsigned char *addr, + u16 vid, const struct dsa_db *db), + + TP_ARGS(lag_dev, addr, vid, db), + + TP_STRUCT__entry( + __string(dev, lag_dev->name) + __array(unsigned char, addr, ETH_ALEN) + __field(u16, vid) + __array(char, db_buf, DSA_DB_BUFSIZ) + ), + + TP_fast_assign( + __assign_str(dev, lag_dev->name); + ether_addr_copy(__entry->addr, addr); + __entry->vid = vid; + dsa_db_print(db, __entry->db_buf); + ), + + TP_printk("%s addr %pM vid %u db \"%s\"", + __get_str(dev), __entry->addr, __entry->vid, __entry->db_buf) +); + +DECLARE_EVENT_CLASS(dsa_vlan_op_hw, + + TP_PROTO(const struct dsa_port *dp, + const struct switchdev_obj_port_vlan *vlan, int err), + + TP_ARGS(dp, vlan, err), + + TP_STRUCT__entry( + __string(dev, dev_name(dp->ds->dev)) + __string(kind, dsa_port_kind(dp)) + __field(int, port) + __field(u16, vid) + __field(u16, flags) + __field(bool, changed) + __field(int, err) + ), + + TP_fast_assign( + __assign_str(dev, dev_name(dp->ds->dev)); + __assign_str(kind, dsa_port_kind(dp)); + __entry->port = dp->index; + __entry->vid = vlan->vid; + __entry->flags = vlan->flags; + __entry->changed = vlan->changed; + __entry->err = err; + ), + + TP_printk("%s %s port %d vid %u%s%s%s", + __get_str(dev), __get_str(kind), __entry->port, __entry->vid, + __entry->flags & BRIDGE_VLAN_INFO_PVID ? " pvid" : "", + __entry->flags & BRIDGE_VLAN_INFO_UNTAGGED ? " untagged" : "", + __entry->changed ? " (changed)" : "") +); + +DEFINE_EVENT(dsa_vlan_op_hw, dsa_vlan_add_hw, + TP_PROTO(const struct dsa_port *dp, + const struct switchdev_obj_port_vlan *vlan, int err), + TP_ARGS(dp, vlan, err)); + +DEFINE_EVENT(dsa_vlan_op_hw, dsa_vlan_del_hw, + TP_PROTO(const struct dsa_port *dp, + const struct switchdev_obj_port_vlan *vlan, int err), + TP_ARGS(dp, vlan, err)); + +DECLARE_EVENT_CLASS(dsa_vlan_op_refcount, + + TP_PROTO(const struct dsa_port *dp, + const struct switchdev_obj_port_vlan *vlan, + const refcount_t *refcount), + + TP_ARGS(dp, vlan, refcount), + + TP_STRUCT__entry( + __string(dev, dev_name(dp->ds->dev)) + __string(kind, dsa_port_kind(dp)) + __field(int, port) + __field(u16, vid) + __field(u16, flags) + __field(bool, changed) + __field(unsigned int, refcount) + ), + + TP_fast_assign( + __assign_str(dev, dev_name(dp->ds->dev)); + __assign_str(kind, dsa_port_kind(dp)); + __entry->port = dp->index; + __entry->vid = vlan->vid; + __entry->flags = vlan->flags; + __entry->changed = vlan->changed; + __entry->refcount = refcount_read(refcount); + ), + + TP_printk("%s %s port %d vid %u%s%s%s refcount %u", + __get_str(dev), __get_str(kind), __entry->port, __entry->vid, + __entry->flags & BRIDGE_VLAN_INFO_PVID ? " pvid" : "", + __entry->flags & BRIDGE_VLAN_INFO_UNTAGGED ? " untagged" : "", + __entry->changed ? " (changed)" : "", __entry->refcount) +); + +DEFINE_EVENT(dsa_vlan_op_refcount, dsa_vlan_add_bump, + TP_PROTO(const struct dsa_port *dp, + const struct switchdev_obj_port_vlan *vlan, + const refcount_t *refcount), + TP_ARGS(dp, vlan, refcount)); + +DEFINE_EVENT(dsa_vlan_op_refcount, dsa_vlan_del_drop, + TP_PROTO(const struct dsa_port *dp, + const struct switchdev_obj_port_vlan *vlan, + const refcount_t *refcount), + TP_ARGS(dp, vlan, refcount)); + +TRACE_EVENT(dsa_vlan_del_not_found, + + TP_PROTO(const struct dsa_port *dp, + const struct switchdev_obj_port_vlan *vlan), + + TP_ARGS(dp, vlan), + + TP_STRUCT__entry( + __string(dev, dev_name(dp->ds->dev)) + __string(kind, dsa_port_kind(dp)) + __field(int, port) + __field(u16, vid) + ), + + TP_fast_assign( + __assign_str(dev, dev_name(dp->ds->dev)); + __assign_str(kind, dsa_port_kind(dp)); + __entry->port = dp->index; + __entry->vid = vlan->vid; + ), + + TP_printk("%s %s port %d vid %u", + __get_str(dev), __get_str(kind), __entry->port, __entry->vid) +); + +#endif /* _NET_DSA_TRACE_H */ + +/* We don't want to use include/trace/events */ +#undef TRACE_INCLUDE_PATH +#define TRACE_INCLUDE_PATH . +#undef TRACE_INCLUDE_FILE +#define TRACE_INCLUDE_FILE trace +/* This part must be outside protection */ +#include <trace/define_trace.h> diff --git a/net/ethtool/coalesce.c b/net/ethtool/coalesce.c index 443e7e642c96..01a59ce211c8 100644 --- a/net/ethtool/coalesce.c +++ b/net/ethtool/coalesce.c @@ -254,13 +254,14 @@ ethnl_set_coalesce_validate(struct ethnl_req_info *req_info, } static int -ethnl_set_coalesce(struct ethnl_req_info *req_info, struct genl_info *info) +__ethnl_set_coalesce(struct ethnl_req_info *req_info, struct genl_info *info, + bool *dual_change) { struct kernel_ethtool_coalesce kernel_coalesce = {}; struct net_device *dev = req_info->dev; struct ethtool_coalesce coalesce = {}; + bool mod_mode = false, mod = false; struct nlattr **tb = info->attrs; - bool mod = false; int ret; ret = dev->ethtool_ops->get_coalesce(dev, &coalesce, &kernel_coalesce, @@ -268,6 +269,7 @@ ethnl_set_coalesce(struct ethnl_req_info *req_info, struct genl_info *info) if (ret < 0) return ret; + /* Update values */ ethnl_update_u32(&coalesce.rx_coalesce_usecs, tb[ETHTOOL_A_COALESCE_RX_USECS], &mod); ethnl_update_u32(&coalesce.rx_max_coalesced_frames, @@ -286,10 +288,6 @@ ethnl_set_coalesce(struct ethnl_req_info *req_info, struct genl_info *info) tb[ETHTOOL_A_COALESCE_TX_MAX_FRAMES_IRQ], &mod); ethnl_update_u32(&coalesce.stats_block_coalesce_usecs, tb[ETHTOOL_A_COALESCE_STATS_BLOCK_USECS], &mod); - ethnl_update_bool32(&coalesce.use_adaptive_rx_coalesce, - tb[ETHTOOL_A_COALESCE_USE_ADAPTIVE_RX], &mod); - ethnl_update_bool32(&coalesce.use_adaptive_tx_coalesce, - tb[ETHTOOL_A_COALESCE_USE_ADAPTIVE_TX], &mod); ethnl_update_u32(&coalesce.pkt_rate_low, tb[ETHTOOL_A_COALESCE_PKT_RATE_LOW], &mod); ethnl_update_u32(&coalesce.rx_coalesce_usecs_low, @@ -312,17 +310,25 @@ ethnl_set_coalesce(struct ethnl_req_info *req_info, struct genl_info *info) tb[ETHTOOL_A_COALESCE_TX_MAX_FRAMES_HIGH], &mod); ethnl_update_u32(&coalesce.rate_sample_interval, tb[ETHTOOL_A_COALESCE_RATE_SAMPLE_INTERVAL], &mod); - ethnl_update_u8(&kernel_coalesce.use_cqe_mode_tx, - tb[ETHTOOL_A_COALESCE_USE_CQE_MODE_TX], &mod); - ethnl_update_u8(&kernel_coalesce.use_cqe_mode_rx, - tb[ETHTOOL_A_COALESCE_USE_CQE_MODE_RX], &mod); ethnl_update_u32(&kernel_coalesce.tx_aggr_max_bytes, tb[ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES], &mod); ethnl_update_u32(&kernel_coalesce.tx_aggr_max_frames, tb[ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES], &mod); ethnl_update_u32(&kernel_coalesce.tx_aggr_time_usecs, tb[ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS], &mod); - if (!mod) + + /* Update operation modes */ + ethnl_update_bool32(&coalesce.use_adaptive_rx_coalesce, + tb[ETHTOOL_A_COALESCE_USE_ADAPTIVE_RX], &mod_mode); + ethnl_update_bool32(&coalesce.use_adaptive_tx_coalesce, + tb[ETHTOOL_A_COALESCE_USE_ADAPTIVE_TX], &mod_mode); + ethnl_update_u8(&kernel_coalesce.use_cqe_mode_tx, + tb[ETHTOOL_A_COALESCE_USE_CQE_MODE_TX], &mod_mode); + ethnl_update_u8(&kernel_coalesce.use_cqe_mode_rx, + tb[ETHTOOL_A_COALESCE_USE_CQE_MODE_RX], &mod_mode); + + *dual_change = mod && mod_mode; + if (!mod && !mod_mode) return 0; ret = dev->ethtool_ops->set_coalesce(dev, &coalesce, &kernel_coalesce, @@ -330,6 +336,32 @@ ethnl_set_coalesce(struct ethnl_req_info *req_info, struct genl_info *info) return ret < 0 ? ret : 1; } +static int +ethnl_set_coalesce(struct ethnl_req_info *req_info, struct genl_info *info) +{ + bool dual_change; + int err, ret; + + /* SET_COALESCE may change operation mode and parameters in one call. + * Changing operation mode may cause the driver to reset the parameter + * values, and therefore ignore user input (driver does not know which + * parameters come from user and which are echoed back from ->get). + * To not complicate the drivers if user tries to change both the mode + * and parameters at once - call the driver twice. + */ + err = __ethnl_set_coalesce(req_info, info, &dual_change); + if (err < 0) + return err; + ret = err; + + if (ret && dual_change) { + err = __ethnl_set_coalesce(req_info, info, &dual_change); + if (err < 0) + return err; + } + return ret; +} + const struct ethnl_request_ops ethnl_coalesce_request_ops = { .request_cmd = ETHTOOL_MSG_COALESCE_GET, .reply_cmd = ETHTOOL_MSG_COALESCE_GET_REPLY, diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c index 646b3e490c71..59adc4e6e9ee 100644 --- a/net/ethtool/ioctl.c +++ b/net/ethtool/ioctl.c @@ -27,6 +27,7 @@ #include <linux/net.h> #include <linux/pm_runtime.h> #include <net/devlink.h> +#include <net/ipv6.h> #include <net/xdp_sock_drv.h> #include <net/flow_offload.h> #include <linux/ethtool_netlink.h> @@ -3127,7 +3128,6 @@ struct ethtool_rx_flow_rule * ethtool_rx_flow_rule_create(const struct ethtool_rx_flow_spec_input *input) { const struct ethtool_rx_flow_spec *fs = input->fs; - static struct in6_addr zero_addr = {}; struct ethtool_rx_flow_match *match; struct ethtool_rx_flow_rule *flow; struct flow_action_entry *act; @@ -3233,20 +3233,20 @@ ethtool_rx_flow_rule_create(const struct ethtool_rx_flow_spec_input *input) v6_spec = &fs->h_u.tcp_ip6_spec; v6_m_spec = &fs->m_u.tcp_ip6_spec; - if (memcmp(v6_m_spec->ip6src, &zero_addr, sizeof(zero_addr))) { + if (!ipv6_addr_any((struct in6_addr *)v6_m_spec->ip6src)) { memcpy(&match->key.ipv6.src, v6_spec->ip6src, sizeof(match->key.ipv6.src)); memcpy(&match->mask.ipv6.src, v6_m_spec->ip6src, sizeof(match->mask.ipv6.src)); } - if (memcmp(v6_m_spec->ip6dst, &zero_addr, sizeof(zero_addr))) { + if (!ipv6_addr_any((struct in6_addr *)v6_m_spec->ip6dst)) { memcpy(&match->key.ipv6.dst, v6_spec->ip6dst, sizeof(match->key.ipv6.dst)); memcpy(&match->mask.ipv6.dst, v6_m_spec->ip6dst, sizeof(match->mask.ipv6.dst)); } - if (memcmp(v6_m_spec->ip6src, &zero_addr, sizeof(zero_addr)) || - memcmp(v6_m_spec->ip6dst, &zero_addr, sizeof(zero_addr))) { + if (!ipv6_addr_any((struct in6_addr *)v6_m_spec->ip6src) || + !ipv6_addr_any((struct in6_addr *)v6_m_spec->ip6dst)) { match->dissector.used_keys |= BIT(FLOW_DISSECTOR_KEY_IPV6_ADDRS); match->dissector.offset[FLOW_DISSECTOR_KEY_IPV6_ADDRS] = diff --git a/net/ethtool/mm.c b/net/ethtool/mm.c index fce3cc2734f9..4058a557b5a4 100644 --- a/net/ethtool/mm.c +++ b/net/ethtool/mm.c @@ -214,6 +214,16 @@ static int ethnl_set_mm(struct ethnl_req_info *req_info, struct genl_info *info) return -ERANGE; } + if (cfg.verify_enabled && !cfg.tx_enabled) { + NL_SET_ERR_MSG(extack, "Verification requires TX enabled"); + return -EINVAL; + } + + if (cfg.tx_enabled && !cfg.pmac_enabled) { + NL_SET_ERR_MSG(extack, "TX enabled requires pMAC enabled"); + return -EINVAL; + } + ret = dev->ethtool_ops->set_mm(dev, &cfg, extack); return ret < 0 ? ret : 1; } @@ -249,3 +259,26 @@ bool __ethtool_dev_mm_supported(struct net_device *dev) return !ret; } + +bool ethtool_dev_mm_supported(struct net_device *dev) +{ + const struct ethtool_ops *ops = dev->ethtool_ops; + bool supported; + int ret; + + ASSERT_RTNL(); + + if (!ops) + return false; + + ret = ethnl_ops_begin(dev); + if (ret < 0) + return false; + + supported = __ethtool_dev_mm_supported(dev); + + ethnl_ops_complete(dev); + + return supported; +} +EXPORT_SYMBOL_GPL(ethtool_dev_mm_supported); diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h index f7b189ed96b2..79424b34b553 100644 --- a/net/ethtool/netlink.h +++ b/net/ethtool/netlink.h @@ -413,7 +413,7 @@ extern const struct nla_policy ethnl_features_set_policy[ETHTOOL_A_FEATURES_WANT extern const struct nla_policy ethnl_privflags_get_policy[ETHTOOL_A_PRIVFLAGS_HEADER + 1]; extern const struct nla_policy ethnl_privflags_set_policy[ETHTOOL_A_PRIVFLAGS_FLAGS + 1]; extern const struct nla_policy ethnl_rings_get_policy[ETHTOOL_A_RINGS_HEADER + 1]; -extern const struct nla_policy ethnl_rings_set_policy[ETHTOOL_A_RINGS_RX_PUSH + 1]; +extern const struct nla_policy ethnl_rings_set_policy[ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN_MAX + 1]; extern const struct nla_policy ethnl_channels_get_policy[ETHTOOL_A_CHANNELS_HEADER + 1]; extern const struct nla_policy ethnl_channels_set_policy[ETHTOOL_A_CHANNELS_COMBINED_COUNT + 1]; extern const struct nla_policy ethnl_coalesce_get_policy[ETHTOOL_A_COALESCE_HEADER + 1]; diff --git a/net/ethtool/rings.c b/net/ethtool/rings.c index f358cd57d094..1c4972526142 100644 --- a/net/ethtool/rings.c +++ b/net/ethtool/rings.c @@ -11,6 +11,7 @@ struct rings_reply_data { struct ethnl_reply_data base; struct ethtool_ringparam ringparam; struct kernel_ethtool_ringparam kernel_ringparam; + u32 supported_ring_params; }; #define RINGS_REPDATA(__reply_base) \ @@ -32,6 +33,8 @@ static int rings_prepare_data(const struct ethnl_req_info *req_base, if (!dev->ethtool_ops->get_ringparam) return -EOPNOTSUPP; + + data->supported_ring_params = dev->ethtool_ops->supported_ring_params; ret = ethnl_ops_begin(dev); if (ret < 0) return ret; @@ -57,7 +60,9 @@ static int rings_reply_size(const struct ethnl_req_info *req_base, nla_total_size(sizeof(u8)) + /* _RINGS_TCP_DATA_SPLIT */ nla_total_size(sizeof(u32) + /* _RINGS_CQE_SIZE */ nla_total_size(sizeof(u8)) + /* _RINGS_TX_PUSH */ - nla_total_size(sizeof(u8))); /* _RINGS_RX_PUSH */ + nla_total_size(sizeof(u8))) + /* _RINGS_RX_PUSH */ + nla_total_size(sizeof(u32)) + /* _RINGS_TX_PUSH_BUF_LEN */ + nla_total_size(sizeof(u32)); /* _RINGS_TX_PUSH_BUF_LEN_MAX */ } static int rings_fill_reply(struct sk_buff *skb, @@ -67,6 +72,7 @@ static int rings_fill_reply(struct sk_buff *skb, const struct rings_reply_data *data = RINGS_REPDATA(reply_base); const struct kernel_ethtool_ringparam *kr = &data->kernel_ringparam; const struct ethtool_ringparam *ringparam = &data->ringparam; + u32 supported_ring_params = data->supported_ring_params; WARN_ON(kr->tcp_data_split > ETHTOOL_TCP_DATA_SPLIT_ENABLED); @@ -98,7 +104,12 @@ static int rings_fill_reply(struct sk_buff *skb, (kr->cqe_size && (nla_put_u32(skb, ETHTOOL_A_RINGS_CQE_SIZE, kr->cqe_size))) || nla_put_u8(skb, ETHTOOL_A_RINGS_TX_PUSH, !!kr->tx_push) || - nla_put_u8(skb, ETHTOOL_A_RINGS_RX_PUSH, !!kr->rx_push)) + nla_put_u8(skb, ETHTOOL_A_RINGS_RX_PUSH, !!kr->rx_push) || + ((supported_ring_params & ETHTOOL_RING_USE_TX_PUSH_BUF_LEN) && + (nla_put_u32(skb, ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN_MAX, + kr->tx_push_buf_max_len) || + nla_put_u32(skb, ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN, + kr->tx_push_buf_len)))) return -EMSGSIZE; return 0; @@ -117,6 +128,7 @@ const struct nla_policy ethnl_rings_set_policy[] = { [ETHTOOL_A_RINGS_CQE_SIZE] = NLA_POLICY_MIN(NLA_U32, 1), [ETHTOOL_A_RINGS_TX_PUSH] = NLA_POLICY_MAX(NLA_U8, 1), [ETHTOOL_A_RINGS_RX_PUSH] = NLA_POLICY_MAX(NLA_U8, 1), + [ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN] = { .type = NLA_U32 }, }; static int @@ -158,6 +170,14 @@ ethnl_set_rings_validate(struct ethnl_req_info *req_info, return -EOPNOTSUPP; } + if (tb[ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN] && + !(ops->supported_ring_params & ETHTOOL_RING_USE_TX_PUSH_BUF_LEN)) { + NL_SET_ERR_MSG_ATTR(info->extack, + tb[ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN], + "setting tx push buf len is not supported"); + return -EOPNOTSUPP; + } + return ops->get_ringparam && ops->set_ringparam ? 1 : -EOPNOTSUPP; } @@ -189,6 +209,8 @@ ethnl_set_rings(struct ethnl_req_info *req_info, struct genl_info *info) tb[ETHTOOL_A_RINGS_TX_PUSH], &mod); ethnl_update_u8(&kernel_ringparam.rx_push, tb[ETHTOOL_A_RINGS_RX_PUSH], &mod); + ethnl_update_u32(&kernel_ringparam.tx_push_buf_len, + tb[ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN], &mod); if (!mod) return 0; @@ -209,6 +231,14 @@ ethnl_set_rings(struct ethnl_req_info *req_info, struct genl_info *info) return -EINVAL; } + if (kernel_ringparam.tx_push_buf_len > kernel_ringparam.tx_push_buf_max_len) { + NL_SET_ERR_MSG_ATTR_FMT(info->extack, tb[ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN], + "Requested TX push buffer exceeds the maximum of %u", + kernel_ringparam.tx_push_buf_max_len); + + return -EINVAL; + } + ret = dev->ethtool_ops->set_ringparam(dev, &ringparam, &kernel_ringparam, info->extack); return ret < 0 ? ret : 1; diff --git a/net/handshake/.kunitconfig b/net/handshake/.kunitconfig new file mode 100644 index 000000000000..5c48cf4abca2 --- /dev/null +++ b/net/handshake/.kunitconfig @@ -0,0 +1,11 @@ +CONFIG_KUNIT=y +CONFIG_UBSAN=y +CONFIG_STACKTRACE=y +CONFIG_NET=y +CONFIG_NETWORK_FILESYSTEMS=y +CONFIG_INET=y +CONFIG_MULTIUSER=y +CONFIG_NFS_FS=y +CONFIG_SUNRPC=y +CONFIG_NET_HANDSHAKE=y +CONFIG_NET_HANDSHAKE_KUNIT_TEST=y diff --git a/net/handshake/Makefile b/net/handshake/Makefile new file mode 100644 index 000000000000..247d73c6ff6e --- /dev/null +++ b/net/handshake/Makefile @@ -0,0 +1,13 @@ +# SPDX-License-Identifier: GPL-2.0-only +# +# Makefile for the Generic HANDSHAKE service +# +# Author: Chuck Lever <chuck.lever@oracle.com> +# +# Copyright (c) 2023, Oracle and/or its affiliates. +# + +obj-y += handshake.o +handshake-y := genl.o netlink.o request.o tlshd.o trace.o + +obj-$(CONFIG_NET_HANDSHAKE_KUNIT_TEST) += handshake-test.o diff --git a/net/handshake/genl.c b/net/handshake/genl.c new file mode 100644 index 000000000000..9f29efb1493e --- /dev/null +++ b/net/handshake/genl.c @@ -0,0 +1,58 @@ +// SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) +/* Do not edit directly, auto-generated from: */ +/* Documentation/netlink/specs/handshake.yaml */ +/* YNL-GEN kernel source */ + +#include <net/netlink.h> +#include <net/genetlink.h> + +#include "genl.h" + +#include <linux/handshake.h> + +/* HANDSHAKE_CMD_ACCEPT - do */ +static const struct nla_policy handshake_accept_nl_policy[HANDSHAKE_A_ACCEPT_HANDLER_CLASS + 1] = { + [HANDSHAKE_A_ACCEPT_HANDLER_CLASS] = NLA_POLICY_MAX(NLA_U32, 2), +}; + +/* HANDSHAKE_CMD_DONE - do */ +static const struct nla_policy handshake_done_nl_policy[HANDSHAKE_A_DONE_REMOTE_AUTH + 1] = { + [HANDSHAKE_A_DONE_STATUS] = { .type = NLA_U32, }, + [HANDSHAKE_A_DONE_SOCKFD] = { .type = NLA_U32, }, + [HANDSHAKE_A_DONE_REMOTE_AUTH] = { .type = NLA_U32, }, +}; + +/* Ops table for handshake */ +static const struct genl_split_ops handshake_nl_ops[] = { + { + .cmd = HANDSHAKE_CMD_ACCEPT, + .doit = handshake_nl_accept_doit, + .policy = handshake_accept_nl_policy, + .maxattr = HANDSHAKE_A_ACCEPT_HANDLER_CLASS, + .flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DO, + }, + { + .cmd = HANDSHAKE_CMD_DONE, + .doit = handshake_nl_done_doit, + .policy = handshake_done_nl_policy, + .maxattr = HANDSHAKE_A_DONE_REMOTE_AUTH, + .flags = GENL_CMD_CAP_DO, + }, +}; + +static const struct genl_multicast_group handshake_nl_mcgrps[] = { + [HANDSHAKE_NLGRP_NONE] = { "none", }, + [HANDSHAKE_NLGRP_TLSHD] = { "tlshd", }, +}; + +struct genl_family handshake_nl_family __ro_after_init = { + .name = HANDSHAKE_FAMILY_NAME, + .version = HANDSHAKE_FAMILY_VERSION, + .netnsok = true, + .parallel_ops = true, + .module = THIS_MODULE, + .split_ops = handshake_nl_ops, + .n_split_ops = ARRAY_SIZE(handshake_nl_ops), + .mcgrps = handshake_nl_mcgrps, + .n_mcgrps = ARRAY_SIZE(handshake_nl_mcgrps), +}; diff --git a/net/handshake/genl.h b/net/handshake/genl.h new file mode 100644 index 000000000000..2c1f1aa6a02a --- /dev/null +++ b/net/handshake/genl.h @@ -0,0 +1,24 @@ +/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */ +/* Do not edit directly, auto-generated from: */ +/* Documentation/netlink/specs/handshake.yaml */ +/* YNL-GEN kernel header */ + +#ifndef _LINUX_HANDSHAKE_GEN_H +#define _LINUX_HANDSHAKE_GEN_H + +#include <net/netlink.h> +#include <net/genetlink.h> + +#include <linux/handshake.h> + +int handshake_nl_accept_doit(struct sk_buff *skb, struct genl_info *info); +int handshake_nl_done_doit(struct sk_buff *skb, struct genl_info *info); + +enum { + HANDSHAKE_NLGRP_NONE, + HANDSHAKE_NLGRP_TLSHD, +}; + +extern struct genl_family handshake_nl_family; + +#endif /* _LINUX_HANDSHAKE_GEN_H */ diff --git a/net/handshake/handshake-test.c b/net/handshake/handshake-test.c new file mode 100644 index 000000000000..e6adc5dec11a --- /dev/null +++ b/net/handshake/handshake-test.c @@ -0,0 +1,523 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (c) 2023 Oracle and/or its affiliates. + * + * KUnit test of the handshake upcall mechanism. + */ + +#include <kunit/test.h> +#include <kunit/visibility.h> + +#include <linux/kernel.h> + +#include <net/sock.h> +#include <net/genetlink.h> +#include <net/netns/generic.h> + +#include <uapi/linux/handshake.h> +#include "handshake.h" + +MODULE_IMPORT_NS(EXPORTED_FOR_KUNIT_TESTING); + +static int test_accept_func(struct handshake_req *req, struct genl_info *info, + int fd) +{ + return 0; +} + +static void test_done_func(struct handshake_req *req, unsigned int status, + struct genl_info *info) +{ +} + +struct handshake_req_alloc_test_param { + const char *desc; + struct handshake_proto *proto; + gfp_t gfp; + bool expect_success; +}; + +static struct handshake_proto handshake_req_alloc_proto_2 = { + .hp_handler_class = HANDSHAKE_HANDLER_CLASS_NONE, +}; + +static struct handshake_proto handshake_req_alloc_proto_3 = { + .hp_handler_class = HANDSHAKE_HANDLER_CLASS_MAX, +}; + +static struct handshake_proto handshake_req_alloc_proto_4 = { + .hp_handler_class = HANDSHAKE_HANDLER_CLASS_TLSHD, +}; + +static struct handshake_proto handshake_req_alloc_proto_5 = { + .hp_handler_class = HANDSHAKE_HANDLER_CLASS_TLSHD, + .hp_accept = test_accept_func, +}; + +static struct handshake_proto handshake_req_alloc_proto_6 = { + .hp_handler_class = HANDSHAKE_HANDLER_CLASS_TLSHD, + .hp_privsize = UINT_MAX, + .hp_accept = test_accept_func, + .hp_done = test_done_func, +}; + +static struct handshake_proto handshake_req_alloc_proto_good = { + .hp_handler_class = HANDSHAKE_HANDLER_CLASS_TLSHD, + .hp_accept = test_accept_func, + .hp_done = test_done_func, +}; + +static const +struct handshake_req_alloc_test_param handshake_req_alloc_params[] = { + { + .desc = "handshake_req_alloc NULL proto", + .proto = NULL, + .gfp = GFP_KERNEL, + .expect_success = false, + }, + { + .desc = "handshake_req_alloc CLASS_NONE", + .proto = &handshake_req_alloc_proto_2, + .gfp = GFP_KERNEL, + .expect_success = false, + }, + { + .desc = "handshake_req_alloc CLASS_MAX", + .proto = &handshake_req_alloc_proto_3, + .gfp = GFP_KERNEL, + .expect_success = false, + }, + { + .desc = "handshake_req_alloc no callbacks", + .proto = &handshake_req_alloc_proto_4, + .gfp = GFP_KERNEL, + .expect_success = false, + }, + { + .desc = "handshake_req_alloc no done callback", + .proto = &handshake_req_alloc_proto_5, + .gfp = GFP_KERNEL, + .expect_success = false, + }, + { + .desc = "handshake_req_alloc excessive privsize", + .proto = &handshake_req_alloc_proto_6, + .gfp = GFP_KERNEL, + .expect_success = false, + }, + { + .desc = "handshake_req_alloc all good", + .proto = &handshake_req_alloc_proto_good, + .gfp = GFP_KERNEL, + .expect_success = true, + }, +}; + +static void +handshake_req_alloc_get_desc(const struct handshake_req_alloc_test_param *param, + char *desc) +{ + strscpy(desc, param->desc, KUNIT_PARAM_DESC_SIZE); +} + +/* Creates the function handshake_req_alloc_gen_params */ +KUNIT_ARRAY_PARAM(handshake_req_alloc, handshake_req_alloc_params, + handshake_req_alloc_get_desc); + +static void handshake_req_alloc_case(struct kunit *test) +{ + const struct handshake_req_alloc_test_param *param = test->param_value; + struct handshake_req *result; + + /* Arrange */ + + /* Act */ + result = handshake_req_alloc(param->proto, param->gfp); + + /* Assert */ + if (param->expect_success) + KUNIT_EXPECT_NOT_NULL(test, result); + else + KUNIT_EXPECT_NULL(test, result); + + kfree(result); +} + +static void handshake_req_submit_test1(struct kunit *test) +{ + struct socket *sock; + int err, result; + + /* Arrange */ + err = __sock_create(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP, + &sock, 1); + KUNIT_ASSERT_EQ(test, err, 0); + + /* Act */ + result = handshake_req_submit(sock, NULL, GFP_KERNEL); + + /* Assert */ + KUNIT_EXPECT_EQ(test, result, -EINVAL); + + sock_release(sock); +} + +static void handshake_req_submit_test2(struct kunit *test) +{ + struct handshake_req *req; + int result; + + /* Arrange */ + req = handshake_req_alloc(&handshake_req_alloc_proto_good, GFP_KERNEL); + KUNIT_ASSERT_NOT_NULL(test, req); + + /* Act */ + result = handshake_req_submit(NULL, req, GFP_KERNEL); + + /* Assert */ + KUNIT_EXPECT_EQ(test, result, -EINVAL); + + /* handshake_req_submit() destroys @req on error */ +} + +static void handshake_req_submit_test3(struct kunit *test) +{ + struct handshake_req *req; + struct socket *sock; + int err, result; + + /* Arrange */ + req = handshake_req_alloc(&handshake_req_alloc_proto_good, GFP_KERNEL); + KUNIT_ASSERT_NOT_NULL(test, req); + + err = __sock_create(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP, + &sock, 1); + KUNIT_ASSERT_EQ(test, err, 0); + sock->file = NULL; + + /* Act */ + result = handshake_req_submit(sock, req, GFP_KERNEL); + + /* Assert */ + KUNIT_EXPECT_EQ(test, result, -EINVAL); + + /* handshake_req_submit() destroys @req on error */ + sock_release(sock); +} + +static void handshake_req_submit_test4(struct kunit *test) +{ + struct handshake_req *req, *result; + struct socket *sock; + int err; + + /* Arrange */ + req = handshake_req_alloc(&handshake_req_alloc_proto_good, GFP_KERNEL); + KUNIT_ASSERT_NOT_NULL(test, req); + + err = __sock_create(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP, + &sock, 1); + KUNIT_ASSERT_EQ(test, err, 0); + sock->file = sock_alloc_file(sock, O_NONBLOCK, NULL); + KUNIT_ASSERT_NOT_ERR_OR_NULL(test, sock->file); + KUNIT_ASSERT_NOT_NULL(test, sock->sk); + + err = handshake_req_submit(sock, req, GFP_KERNEL); + KUNIT_ASSERT_EQ(test, err, 0); + + /* Act */ + result = handshake_req_hash_lookup(sock->sk); + + /* Assert */ + KUNIT_EXPECT_NOT_NULL(test, result); + KUNIT_EXPECT_PTR_EQ(test, req, result); + + handshake_req_cancel(sock->sk); + sock_release(sock); +} + +static void handshake_req_submit_test5(struct kunit *test) +{ + struct handshake_req *req; + struct handshake_net *hn; + struct socket *sock; + struct net *net; + int saved, err; + + /* Arrange */ + req = handshake_req_alloc(&handshake_req_alloc_proto_good, GFP_KERNEL); + KUNIT_ASSERT_NOT_NULL(test, req); + + err = __sock_create(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP, + &sock, 1); + KUNIT_ASSERT_EQ(test, err, 0); + sock->file = sock_alloc_file(sock, O_NONBLOCK, NULL); + KUNIT_ASSERT_NOT_ERR_OR_NULL(test, sock->file); + KUNIT_ASSERT_NOT_NULL(test, sock->sk); + + net = sock_net(sock->sk); + hn = handshake_pernet(net); + KUNIT_ASSERT_NOT_NULL(test, hn); + + saved = hn->hn_pending; + hn->hn_pending = hn->hn_pending_max + 1; + + /* Act */ + err = handshake_req_submit(sock, req, GFP_KERNEL); + + /* Assert */ + KUNIT_EXPECT_EQ(test, err, -EAGAIN); + + sock_release(sock); + hn->hn_pending = saved; +} + +static void handshake_req_submit_test6(struct kunit *test) +{ + struct handshake_req *req1, *req2; + struct socket *sock; + int err; + + /* Arrange */ + req1 = handshake_req_alloc(&handshake_req_alloc_proto_good, GFP_KERNEL); + KUNIT_ASSERT_NOT_NULL(test, req1); + req2 = handshake_req_alloc(&handshake_req_alloc_proto_good, GFP_KERNEL); + KUNIT_ASSERT_NOT_NULL(test, req2); + + err = __sock_create(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP, + &sock, 1); + KUNIT_ASSERT_EQ(test, err, 0); + sock->file = sock_alloc_file(sock, O_NONBLOCK, NULL); + KUNIT_ASSERT_NOT_ERR_OR_NULL(test, sock->file); + KUNIT_ASSERT_NOT_NULL(test, sock->sk); + + /* Act */ + err = handshake_req_submit(sock, req1, GFP_KERNEL); + KUNIT_ASSERT_EQ(test, err, 0); + err = handshake_req_submit(sock, req2, GFP_KERNEL); + + /* Assert */ + KUNIT_EXPECT_EQ(test, err, -EBUSY); + + handshake_req_cancel(sock->sk); + sock_release(sock); +} + +static void handshake_req_cancel_test1(struct kunit *test) +{ + struct handshake_req *req; + struct socket *sock; + bool result; + int err; + + /* Arrange */ + req = handshake_req_alloc(&handshake_req_alloc_proto_good, GFP_KERNEL); + KUNIT_ASSERT_NOT_NULL(test, req); + + err = __sock_create(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP, + &sock, 1); + KUNIT_ASSERT_EQ(test, err, 0); + + sock->file = sock_alloc_file(sock, O_NONBLOCK, NULL); + KUNIT_ASSERT_NOT_ERR_OR_NULL(test, sock->file); + + err = handshake_req_submit(sock, req, GFP_KERNEL); + KUNIT_ASSERT_EQ(test, err, 0); + + /* NB: handshake_req hasn't been accepted */ + + /* Act */ + result = handshake_req_cancel(sock->sk); + + /* Assert */ + KUNIT_EXPECT_TRUE(test, result); + + sock_release(sock); +} + +static void handshake_req_cancel_test2(struct kunit *test) +{ + struct handshake_req *req, *next; + struct handshake_net *hn; + struct socket *sock; + struct net *net; + bool result; + int err; + + /* Arrange */ + req = handshake_req_alloc(&handshake_req_alloc_proto_good, GFP_KERNEL); + KUNIT_ASSERT_NOT_NULL(test, req); + + err = __sock_create(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP, + &sock, 1); + KUNIT_ASSERT_EQ(test, err, 0); + + sock->file = sock_alloc_file(sock, O_NONBLOCK, NULL); + KUNIT_ASSERT_NOT_ERR_OR_NULL(test, sock->file); + + err = handshake_req_submit(sock, req, GFP_KERNEL); + KUNIT_ASSERT_EQ(test, err, 0); + + net = sock_net(sock->sk); + hn = handshake_pernet(net); + KUNIT_ASSERT_NOT_NULL(test, hn); + + /* Pretend to accept this request */ + next = handshake_req_next(hn, HANDSHAKE_HANDLER_CLASS_TLSHD); + KUNIT_ASSERT_PTR_EQ(test, req, next); + + /* Act */ + result = handshake_req_cancel(sock->sk); + + /* Assert */ + KUNIT_EXPECT_TRUE(test, result); + + sock_release(sock); +} + +static void handshake_req_cancel_test3(struct kunit *test) +{ + struct handshake_req *req, *next; + struct handshake_net *hn; + struct socket *sock; + struct net *net; + bool result; + int err; + + /* Arrange */ + req = handshake_req_alloc(&handshake_req_alloc_proto_good, GFP_KERNEL); + KUNIT_ASSERT_NOT_NULL(test, req); + + err = __sock_create(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP, + &sock, 1); + KUNIT_ASSERT_EQ(test, err, 0); + + sock->file = sock_alloc_file(sock, O_NONBLOCK, NULL); + KUNIT_ASSERT_NOT_ERR_OR_NULL(test, sock->file); + + err = handshake_req_submit(sock, req, GFP_KERNEL); + KUNIT_ASSERT_EQ(test, err, 0); + + net = sock_net(sock->sk); + hn = handshake_pernet(net); + KUNIT_ASSERT_NOT_NULL(test, hn); + + /* Pretend to accept this request */ + next = handshake_req_next(hn, HANDSHAKE_HANDLER_CLASS_TLSHD); + KUNIT_ASSERT_PTR_EQ(test, req, next); + + /* Pretend to complete this request */ + handshake_complete(next, -ETIMEDOUT, NULL); + + /* Act */ + result = handshake_req_cancel(sock->sk); + + /* Assert */ + KUNIT_EXPECT_FALSE(test, result); + + sock_release(sock); +} + +static struct handshake_req *handshake_req_destroy_test; + +static void test_destroy_func(struct handshake_req *req) +{ + handshake_req_destroy_test = req; +} + +static struct handshake_proto handshake_req_alloc_proto_destroy = { + .hp_handler_class = HANDSHAKE_HANDLER_CLASS_TLSHD, + .hp_accept = test_accept_func, + .hp_done = test_done_func, + .hp_destroy = test_destroy_func, +}; + +static void handshake_req_destroy_test1(struct kunit *test) +{ + struct handshake_req *req; + struct socket *sock; + int err; + + /* Arrange */ + handshake_req_destroy_test = NULL; + + req = handshake_req_alloc(&handshake_req_alloc_proto_destroy, GFP_KERNEL); + KUNIT_ASSERT_NOT_NULL(test, req); + + err = __sock_create(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP, + &sock, 1); + KUNIT_ASSERT_EQ(test, err, 0); + + sock->file = sock_alloc_file(sock, O_NONBLOCK, NULL); + KUNIT_ASSERT_NOT_ERR_OR_NULL(test, sock->file); + + err = handshake_req_submit(sock, req, GFP_KERNEL); + KUNIT_ASSERT_EQ(test, err, 0); + + handshake_req_cancel(sock->sk); + + /* Act */ + sock_release(sock); + + /* Assert */ + KUNIT_EXPECT_PTR_EQ(test, handshake_req_destroy_test, req); +} + +static struct kunit_case handshake_api_test_cases[] = { + { + .name = "req_alloc API fuzzing", + .run_case = handshake_req_alloc_case, + .generate_params = handshake_req_alloc_gen_params, + }, + { + .name = "req_submit NULL req arg", + .run_case = handshake_req_submit_test1, + }, + { + .name = "req_submit NULL sock arg", + .run_case = handshake_req_submit_test2, + }, + { + .name = "req_submit NULL sock->file", + .run_case = handshake_req_submit_test3, + }, + { + .name = "req_lookup works", + .run_case = handshake_req_submit_test4, + }, + { + .name = "req_submit max pending", + .run_case = handshake_req_submit_test5, + }, + { + .name = "req_submit multiple", + .run_case = handshake_req_submit_test6, + }, + { + .name = "req_cancel before accept", + .run_case = handshake_req_cancel_test1, + }, + { + .name = "req_cancel after accept", + .run_case = handshake_req_cancel_test2, + }, + { + .name = "req_cancel after done", + .run_case = handshake_req_cancel_test3, + }, + { + .name = "req_destroy works", + .run_case = handshake_req_destroy_test1, + }, + {} +}; + +static struct kunit_suite handshake_api_suite = { + .name = "Handshake API tests", + .test_cases = handshake_api_test_cases, +}; + +kunit_test_suites(&handshake_api_suite); + +MODULE_DESCRIPTION("Test handshake upcall API functions"); +MODULE_LICENSE("GPL"); diff --git a/net/handshake/handshake.h b/net/handshake/handshake.h new file mode 100644 index 000000000000..4dac965c99df --- /dev/null +++ b/net/handshake/handshake.h @@ -0,0 +1,87 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Generic netlink handshake service + * + * Author: Chuck Lever <chuck.lever@oracle.com> + * + * Copyright (c) 2023, Oracle and/or its affiliates. + */ + +#ifndef _INTERNAL_HANDSHAKE_H +#define _INTERNAL_HANDSHAKE_H + +/* Per-net namespace context */ +struct handshake_net { + spinlock_t hn_lock; /* protects next 3 fields */ + int hn_pending; + int hn_pending_max; + struct list_head hn_requests; + + unsigned long hn_flags; +}; + +enum hn_flags_bits { + HANDSHAKE_F_NET_DRAINING, +}; + +struct handshake_proto; + +/* One handshake request */ +struct handshake_req { + struct list_head hr_list; + struct rhash_head hr_rhash; + unsigned long hr_flags; + const struct handshake_proto *hr_proto; + struct sock *hr_sk; + void (*hr_odestruct)(struct sock *sk); + + /* Always the last field */ + char hr_priv[]; +}; + +enum hr_flags_bits { + HANDSHAKE_F_REQ_COMPLETED, +}; + +/* Invariants for all handshake requests for one transport layer + * security protocol + */ +struct handshake_proto { + int hp_handler_class; + size_t hp_privsize; + unsigned long hp_flags; + + int (*hp_accept)(struct handshake_req *req, + struct genl_info *info, int fd); + void (*hp_done)(struct handshake_req *req, + unsigned int status, + struct genl_info *info); + void (*hp_destroy)(struct handshake_req *req); +}; + +enum hp_flags_bits { + HANDSHAKE_F_PROTO_NOTIFY, +}; + +/* netlink.c */ +int handshake_genl_notify(struct net *net, const struct handshake_proto *proto, + gfp_t flags); +struct nlmsghdr *handshake_genl_put(struct sk_buff *msg, + struct genl_info *info); +struct handshake_net *handshake_pernet(struct net *net); + +/* request.c */ +struct handshake_req *handshake_req_alloc(const struct handshake_proto *proto, + gfp_t flags); +int handshake_req_hash_init(void); +void handshake_req_hash_destroy(void); +void *handshake_req_private(struct handshake_req *req); +struct handshake_req *handshake_req_hash_lookup(struct sock *sk); +struct handshake_req *handshake_req_next(struct handshake_net *hn, int class); +int handshake_req_submit(struct socket *sock, struct handshake_req *req, + gfp_t flags); +void handshake_complete(struct handshake_req *req, unsigned int status, + struct genl_info *info); +bool handshake_req_cancel(struct sock *sk); + +#endif /* _INTERNAL_HANDSHAKE_H */ diff --git a/net/handshake/netlink.c b/net/handshake/netlink.c new file mode 100644 index 000000000000..35c9c445e0b8 --- /dev/null +++ b/net/handshake/netlink.c @@ -0,0 +1,319 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Generic netlink handshake service + * + * Author: Chuck Lever <chuck.lever@oracle.com> + * + * Copyright (c) 2023, Oracle and/or its affiliates. + */ + +#include <linux/types.h> +#include <linux/socket.h> +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/skbuff.h> +#include <linux/mm.h> + +#include <net/sock.h> +#include <net/genetlink.h> +#include <net/netns/generic.h> + +#include <kunit/visibility.h> + +#include <uapi/linux/handshake.h> +#include "handshake.h" +#include "genl.h" + +#include <trace/events/handshake.h> + +/** + * handshake_genl_notify - Notify handlers that a request is waiting + * @net: target network namespace + * @proto: handshake protocol + * @flags: memory allocation control flags + * + * Returns zero on success or a negative errno if notification failed. + */ +int handshake_genl_notify(struct net *net, const struct handshake_proto *proto, + gfp_t flags) +{ + struct sk_buff *msg; + void *hdr; + + /* Disable notifications during unit testing */ + if (!test_bit(HANDSHAKE_F_PROTO_NOTIFY, &proto->hp_flags)) + return 0; + + if (!genl_has_listeners(&handshake_nl_family, net, + proto->hp_handler_class)) + return -ESRCH; + + msg = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return -ENOMEM; + + hdr = genlmsg_put(msg, 0, 0, &handshake_nl_family, 0, + HANDSHAKE_CMD_READY); + if (!hdr) + goto out_free; + + if (nla_put_u32(msg, HANDSHAKE_A_ACCEPT_HANDLER_CLASS, + proto->hp_handler_class) < 0) { + genlmsg_cancel(msg, hdr); + goto out_free; + } + + genlmsg_end(msg, hdr); + return genlmsg_multicast_netns(&handshake_nl_family, net, msg, + 0, proto->hp_handler_class, flags); + +out_free: + nlmsg_free(msg); + return -EMSGSIZE; +} + +/** + * handshake_genl_put - Create a generic netlink message header + * @msg: buffer in which to create the header + * @info: generic netlink message context + * + * Returns a ready-to-use header, or NULL. + */ +struct nlmsghdr *handshake_genl_put(struct sk_buff *msg, + struct genl_info *info) +{ + return genlmsg_put(msg, info->snd_portid, info->snd_seq, + &handshake_nl_family, 0, info->genlhdr->cmd); +} +EXPORT_SYMBOL(handshake_genl_put); + +/* + * dup() a kernel socket for use as a user space file descriptor + * in the current process. The kernel socket must have an + * instatiated struct file. + * + * Implicit argument: "current()" + */ +static int handshake_dup(struct socket *sock) +{ + struct file *file; + int newfd; + + if (!sock->file) + return -EBADF; + + file = get_file(sock->file); + newfd = get_unused_fd_flags(O_CLOEXEC); + if (newfd < 0) { + fput(file); + return newfd; + } + + fd_install(newfd, file); + return newfd; +} + +int handshake_nl_accept_doit(struct sk_buff *skb, struct genl_info *info) +{ + struct net *net = sock_net(skb->sk); + struct handshake_net *hn = handshake_pernet(net); + struct handshake_req *req = NULL; + struct socket *sock; + int class, fd, err; + + err = -EOPNOTSUPP; + if (!hn) + goto out_status; + + err = -EINVAL; + if (GENL_REQ_ATTR_CHECK(info, HANDSHAKE_A_ACCEPT_HANDLER_CLASS)) + goto out_status; + class = nla_get_u32(info->attrs[HANDSHAKE_A_ACCEPT_HANDLER_CLASS]); + + err = -EAGAIN; + req = handshake_req_next(hn, class); + if (!req) + goto out_status; + + sock = req->hr_sk->sk_socket; + fd = handshake_dup(sock); + if (fd < 0) { + err = fd; + goto out_complete; + } + err = req->hr_proto->hp_accept(req, info, fd); + if (err) + goto out_complete; + + trace_handshake_cmd_accept(net, req, req->hr_sk, fd); + return 0; + +out_complete: + handshake_complete(req, -EIO, NULL); + fput(sock->file); +out_status: + trace_handshake_cmd_accept_err(net, req, NULL, err); + return err; +} + +int handshake_nl_done_doit(struct sk_buff *skb, struct genl_info *info) +{ + struct net *net = sock_net(skb->sk); + struct socket *sock = NULL; + struct handshake_req *req; + int fd, status, err; + + if (GENL_REQ_ATTR_CHECK(info, HANDSHAKE_A_DONE_SOCKFD)) + return -EINVAL; + fd = nla_get_u32(info->attrs[HANDSHAKE_A_DONE_SOCKFD]); + + err = 0; + sock = sockfd_lookup(fd, &err); + if (err) { + err = -EBADF; + goto out_status; + } + + req = handshake_req_hash_lookup(sock->sk); + if (!req) { + err = -EBUSY; + fput(sock->file); + goto out_status; + } + + trace_handshake_cmd_done(net, req, sock->sk, fd); + + status = -EIO; + if (info->attrs[HANDSHAKE_A_DONE_STATUS]) + status = nla_get_u32(info->attrs[HANDSHAKE_A_DONE_STATUS]); + + handshake_complete(req, status, info); + fput(sock->file); + return 0; + +out_status: + trace_handshake_cmd_done_err(net, req, sock->sk, err); + return err; +} + +static unsigned int handshake_net_id; + +static int __net_init handshake_net_init(struct net *net) +{ + struct handshake_net *hn = net_generic(net, handshake_net_id); + unsigned long tmp; + struct sysinfo si; + + /* + * Arbitrary limit to prevent handshakes that do not make + * progress from clogging up the system. The cap scales up + * with the amount of physical memory on the system. + */ + si_meminfo(&si); + tmp = si.totalram / (25 * si.mem_unit); + hn->hn_pending_max = clamp(tmp, 3UL, 50UL); + + spin_lock_init(&hn->hn_lock); + hn->hn_pending = 0; + hn->hn_flags = 0; + INIT_LIST_HEAD(&hn->hn_requests); + return 0; +} + +static void __net_exit handshake_net_exit(struct net *net) +{ + struct handshake_net *hn = net_generic(net, handshake_net_id); + struct handshake_req *req; + LIST_HEAD(requests); + + /* + * Drain the net's pending list. Requests that have been + * accepted and are in progress will be destroyed when + * the socket is closed. + */ + spin_lock(&hn->hn_lock); + set_bit(HANDSHAKE_F_NET_DRAINING, &hn->hn_flags); + list_splice_init(&requests, &hn->hn_requests); + spin_unlock(&hn->hn_lock); + + while (!list_empty(&requests)) { + req = list_first_entry(&requests, struct handshake_req, hr_list); + list_del(&req->hr_list); + + /* + * Requests on this list have not yet been + * accepted, so they do not have an fd to put. + */ + + handshake_complete(req, -ETIMEDOUT, NULL); + } +} + +static struct pernet_operations handshake_genl_net_ops = { + .init = handshake_net_init, + .exit = handshake_net_exit, + .id = &handshake_net_id, + .size = sizeof(struct handshake_net), +}; + +/** + * handshake_pernet - Get the handshake private per-net structure + * @net: network namespace + * + * Returns a pointer to the net's private per-net structure for the + * handshake module, or NULL if handshake_init() failed. + */ +struct handshake_net *handshake_pernet(struct net *net) +{ + return handshake_net_id ? + net_generic(net, handshake_net_id) : NULL; +} +EXPORT_SYMBOL_IF_KUNIT(handshake_pernet); + +static int __init handshake_init(void) +{ + int ret; + + ret = handshake_req_hash_init(); + if (ret) { + pr_warn("handshake: hash initialization failed (%d)\n", ret); + return ret; + } + + ret = genl_register_family(&handshake_nl_family); + if (ret) { + pr_warn("handshake: netlink registration failed (%d)\n", ret); + handshake_req_hash_destroy(); + return ret; + } + + /* + * ORDER: register_pernet_subsys must be done last. + * + * If initialization does not make it past pernet_subsys + * registration, then handshake_net_id will remain 0. That + * shunts the handshake consumer API to return ENOTSUPP + * to prevent it from dereferencing something that hasn't + * been allocated. + */ + ret = register_pernet_subsys(&handshake_genl_net_ops); + if (ret) { + pr_warn("handshake: pernet registration failed (%d)\n", ret); + genl_unregister_family(&handshake_nl_family); + handshake_req_hash_destroy(); + } + + return ret; +} + +static void __exit handshake_exit(void) +{ + unregister_pernet_subsys(&handshake_genl_net_ops); + handshake_net_id = 0; + + handshake_req_hash_destroy(); + genl_unregister_family(&handshake_nl_family); +} + +module_init(handshake_init); +module_exit(handshake_exit); diff --git a/net/handshake/request.c b/net/handshake/request.c new file mode 100644 index 000000000000..94d5cef3e048 --- /dev/null +++ b/net/handshake/request.c @@ -0,0 +1,344 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Handshake request lifetime events + * + * Author: Chuck Lever <chuck.lever@oracle.com> + * + * Copyright (c) 2023, Oracle and/or its affiliates. + */ + +#include <linux/types.h> +#include <linux/socket.h> +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/skbuff.h> +#include <linux/inet.h> +#include <linux/fdtable.h> +#include <linux/rhashtable.h> + +#include <net/sock.h> +#include <net/genetlink.h> +#include <net/netns/generic.h> + +#include <kunit/visibility.h> + +#include <uapi/linux/handshake.h> +#include "handshake.h" + +#include <trace/events/handshake.h> + +/* + * We need both a handshake_req -> sock mapping, and a sock -> + * handshake_req mapping. Both are one-to-one. + * + * To avoid adding another pointer field to struct sock, net/handshake + * maintains a hash table, indexed by the memory address of @sock, to + * find the struct handshake_req outstanding for that socket. The + * reverse direction uses a simple pointer field in the handshake_req + * struct. + */ + +static struct rhashtable handshake_rhashtbl ____cacheline_aligned_in_smp; + +static const struct rhashtable_params handshake_rhash_params = { + .key_len = sizeof_field(struct handshake_req, hr_sk), + .key_offset = offsetof(struct handshake_req, hr_sk), + .head_offset = offsetof(struct handshake_req, hr_rhash), + .automatic_shrinking = true, +}; + +int handshake_req_hash_init(void) +{ + return rhashtable_init(&handshake_rhashtbl, &handshake_rhash_params); +} + +void handshake_req_hash_destroy(void) +{ + rhashtable_destroy(&handshake_rhashtbl); +} + +struct handshake_req *handshake_req_hash_lookup(struct sock *sk) +{ + return rhashtable_lookup_fast(&handshake_rhashtbl, &sk, + handshake_rhash_params); +} +EXPORT_SYMBOL_IF_KUNIT(handshake_req_hash_lookup); + +static bool handshake_req_hash_add(struct handshake_req *req) +{ + int ret; + + ret = rhashtable_lookup_insert_fast(&handshake_rhashtbl, + &req->hr_rhash, + handshake_rhash_params); + return ret == 0; +} + +static void handshake_req_destroy(struct handshake_req *req) +{ + if (req->hr_proto->hp_destroy) + req->hr_proto->hp_destroy(req); + rhashtable_remove_fast(&handshake_rhashtbl, &req->hr_rhash, + handshake_rhash_params); + kfree(req); +} + +static void handshake_sk_destruct(struct sock *sk) +{ + void (*sk_destruct)(struct sock *sk); + struct handshake_req *req; + + req = handshake_req_hash_lookup(sk); + if (!req) + return; + + trace_handshake_destruct(sock_net(sk), req, sk); + sk_destruct = req->hr_odestruct; + handshake_req_destroy(req); + if (sk_destruct) + sk_destruct(sk); +} + +/** + * handshake_req_alloc - Allocate a handshake request + * @proto: security protocol + * @flags: memory allocation flags + * + * Returns an initialized handshake_req or NULL. + */ +struct handshake_req *handshake_req_alloc(const struct handshake_proto *proto, + gfp_t flags) +{ + struct handshake_req *req; + + if (!proto) + return NULL; + if (proto->hp_handler_class <= HANDSHAKE_HANDLER_CLASS_NONE) + return NULL; + if (proto->hp_handler_class >= HANDSHAKE_HANDLER_CLASS_MAX) + return NULL; + if (!proto->hp_accept || !proto->hp_done) + return NULL; + + req = kzalloc(struct_size(req, hr_priv, proto->hp_privsize), flags); + if (!req) + return NULL; + + INIT_LIST_HEAD(&req->hr_list); + req->hr_proto = proto; + return req; +} +EXPORT_SYMBOL(handshake_req_alloc); + +/** + * handshake_req_private - Get per-handshake private data + * @req: handshake arguments + * + */ +void *handshake_req_private(struct handshake_req *req) +{ + return (void *)&req->hr_priv; +} +EXPORT_SYMBOL(handshake_req_private); + +static bool __add_pending_locked(struct handshake_net *hn, + struct handshake_req *req) +{ + if (WARN_ON_ONCE(!list_empty(&req->hr_list))) + return false; + hn->hn_pending++; + list_add_tail(&req->hr_list, &hn->hn_requests); + return true; +} + +static void __remove_pending_locked(struct handshake_net *hn, + struct handshake_req *req) +{ + hn->hn_pending--; + list_del_init(&req->hr_list); +} + +/* + * Returns %true if the request was found on @net's pending list, + * otherwise %false. + * + * If @req was on a pending list, it has not yet been accepted. + */ +static bool remove_pending(struct handshake_net *hn, struct handshake_req *req) +{ + bool ret = false; + + spin_lock(&hn->hn_lock); + if (!list_empty(&req->hr_list)) { + __remove_pending_locked(hn, req); + ret = true; + } + spin_unlock(&hn->hn_lock); + + return ret; +} + +struct handshake_req *handshake_req_next(struct handshake_net *hn, int class) +{ + struct handshake_req *req, *pos; + + req = NULL; + spin_lock(&hn->hn_lock); + list_for_each_entry(pos, &hn->hn_requests, hr_list) { + if (pos->hr_proto->hp_handler_class != class) + continue; + __remove_pending_locked(hn, pos); + req = pos; + break; + } + spin_unlock(&hn->hn_lock); + + return req; +} +EXPORT_SYMBOL_IF_KUNIT(handshake_req_next); + +/** + * handshake_req_submit - Submit a handshake request + * @sock: open socket on which to perform the handshake + * @req: handshake arguments + * @flags: memory allocation flags + * + * Return values: + * %0: Request queued + * %-EINVAL: Invalid argument + * %-EBUSY: A handshake is already under way for this socket + * %-ESRCH: No handshake agent is available + * %-EAGAIN: Too many pending handshake requests + * %-ENOMEM: Failed to allocate memory + * %-EMSGSIZE: Failed to construct notification message + * %-EOPNOTSUPP: Handshake module not initialized + * + * A zero return value from handshake_req_submit() means that + * exactly one subsequent completion callback is guaranteed. + * + * A negative return value from handshake_req_submit() means that + * no completion callback will be done and that @req has been + * destroyed. + */ +int handshake_req_submit(struct socket *sock, struct handshake_req *req, + gfp_t flags) +{ + struct handshake_net *hn; + struct net *net; + int ret; + + if (!sock || !req || !sock->file) { + kfree(req); + return -EINVAL; + } + + req->hr_sk = sock->sk; + if (!req->hr_sk) { + kfree(req); + return -EINVAL; + } + req->hr_odestruct = req->hr_sk->sk_destruct; + req->hr_sk->sk_destruct = handshake_sk_destruct; + + ret = -EOPNOTSUPP; + net = sock_net(req->hr_sk); + hn = handshake_pernet(net); + if (!hn) + goto out_err; + + ret = -EAGAIN; + if (READ_ONCE(hn->hn_pending) >= hn->hn_pending_max) + goto out_err; + + spin_lock(&hn->hn_lock); + ret = -EOPNOTSUPP; + if (test_bit(HANDSHAKE_F_NET_DRAINING, &hn->hn_flags)) + goto out_unlock; + ret = -EBUSY; + if (!handshake_req_hash_add(req)) + goto out_unlock; + if (!__add_pending_locked(hn, req)) + goto out_unlock; + spin_unlock(&hn->hn_lock); + + ret = handshake_genl_notify(net, req->hr_proto, flags); + if (ret) { + trace_handshake_notify_err(net, req, req->hr_sk, ret); + if (remove_pending(hn, req)) + goto out_err; + } + + /* Prevent socket release while a handshake request is pending */ + sock_hold(req->hr_sk); + + trace_handshake_submit(net, req, req->hr_sk); + return 0; + +out_unlock: + spin_unlock(&hn->hn_lock); +out_err: + trace_handshake_submit_err(net, req, req->hr_sk, ret); + handshake_req_destroy(req); + return ret; +} +EXPORT_SYMBOL(handshake_req_submit); + +void handshake_complete(struct handshake_req *req, unsigned int status, + struct genl_info *info) +{ + struct sock *sk = req->hr_sk; + struct net *net = sock_net(sk); + + if (!test_and_set_bit(HANDSHAKE_F_REQ_COMPLETED, &req->hr_flags)) { + trace_handshake_complete(net, req, sk, status); + req->hr_proto->hp_done(req, status, info); + + /* Handshake request is no longer pending */ + sock_put(sk); + } +} +EXPORT_SYMBOL_IF_KUNIT(handshake_complete); + +/** + * handshake_req_cancel - Cancel an in-progress handshake + * @sk: socket on which there is an ongoing handshake + * + * Request cancellation races with request completion. To determine + * who won, callers examine the return value from this function. + * + * Return values: + * %true - Uncompleted handshake request was canceled + * %false - Handshake request already completed or not found + */ +bool handshake_req_cancel(struct sock *sk) +{ + struct handshake_req *req; + struct handshake_net *hn; + struct net *net; + + net = sock_net(sk); + req = handshake_req_hash_lookup(sk); + if (!req) { + trace_handshake_cancel_none(net, req, sk); + return false; + } + + hn = handshake_pernet(net); + if (hn && remove_pending(hn, req)) { + /* Request hadn't been accepted */ + goto out_true; + } + if (test_and_set_bit(HANDSHAKE_F_REQ_COMPLETED, &req->hr_flags)) { + /* Request already completed */ + trace_handshake_cancel_busy(net, req, sk); + return false; + } + +out_true: + trace_handshake_cancel(net, req, sk); + + /* Handshake request is no longer pending */ + sock_put(sk); + return true; +} +EXPORT_SYMBOL(handshake_req_cancel); diff --git a/net/handshake/tlshd.c b/net/handshake/tlshd.c new file mode 100644 index 000000000000..fcbeb63b4eb1 --- /dev/null +++ b/net/handshake/tlshd.c @@ -0,0 +1,418 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Establish a TLS session for a kernel socket consumer + * using the tlshd user space handler. + * + * Author: Chuck Lever <chuck.lever@oracle.com> + * + * Copyright (c) 2021-2023, Oracle and/or its affiliates. + */ + +#include <linux/types.h> +#include <linux/socket.h> +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/slab.h> +#include <linux/key.h> + +#include <net/sock.h> +#include <net/handshake.h> +#include <net/genetlink.h> + +#include <uapi/linux/keyctl.h> +#include <uapi/linux/handshake.h> +#include "handshake.h" + +struct tls_handshake_req { + void (*th_consumer_done)(void *data, int status, + key_serial_t peerid); + void *th_consumer_data; + + int th_type; + unsigned int th_timeout_ms; + int th_auth_mode; + key_serial_t th_keyring; + key_serial_t th_certificate; + key_serial_t th_privkey; + + unsigned int th_num_peerids; + key_serial_t th_peerid[5]; +}; + +static struct tls_handshake_req * +tls_handshake_req_init(struct handshake_req *req, + const struct tls_handshake_args *args) +{ + struct tls_handshake_req *treq = handshake_req_private(req); + + treq->th_timeout_ms = args->ta_timeout_ms; + treq->th_consumer_done = args->ta_done; + treq->th_consumer_data = args->ta_data; + treq->th_keyring = args->ta_keyring; + treq->th_num_peerids = 0; + treq->th_certificate = TLS_NO_CERT; + treq->th_privkey = TLS_NO_PRIVKEY; + return treq; +} + +static void tls_handshake_remote_peerids(struct tls_handshake_req *treq, + struct genl_info *info) +{ + struct nlattr *head = nlmsg_attrdata(info->nlhdr, GENL_HDRLEN); + int rem, len = nlmsg_attrlen(info->nlhdr, GENL_HDRLEN); + struct nlattr *nla; + unsigned int i; + + i = 0; + nla_for_each_attr(nla, head, len, rem) { + if (nla_type(nla) == HANDSHAKE_A_DONE_REMOTE_AUTH) + i++; + } + if (!i) + return; + treq->th_num_peerids = min_t(unsigned int, i, + ARRAY_SIZE(treq->th_peerid)); + + i = 0; + nla_for_each_attr(nla, head, len, rem) { + if (nla_type(nla) == HANDSHAKE_A_DONE_REMOTE_AUTH) + treq->th_peerid[i++] = nla_get_u32(nla); + if (i >= treq->th_num_peerids) + break; + } +} + +/** + * tls_handshake_done - callback to handle a CMD_DONE request + * @req: socket on which the handshake was performed + * @status: session status code + * @info: full results of session establishment + * + */ +static void tls_handshake_done(struct handshake_req *req, + unsigned int status, struct genl_info *info) +{ + struct tls_handshake_req *treq = handshake_req_private(req); + + treq->th_peerid[0] = TLS_NO_PEERID; + if (info) + tls_handshake_remote_peerids(treq, info); + + treq->th_consumer_done(treq->th_consumer_data, -status, + treq->th_peerid[0]); +} + +#if IS_ENABLED(CONFIG_KEYS) +static int tls_handshake_private_keyring(struct tls_handshake_req *treq) +{ + key_ref_t process_keyring_ref, keyring_ref; + int ret; + + if (treq->th_keyring == TLS_NO_KEYRING) + return 0; + + process_keyring_ref = lookup_user_key(KEY_SPEC_PROCESS_KEYRING, + KEY_LOOKUP_CREATE, + KEY_NEED_WRITE); + if (IS_ERR(process_keyring_ref)) { + ret = PTR_ERR(process_keyring_ref); + goto out; + } + + keyring_ref = lookup_user_key(treq->th_keyring, KEY_LOOKUP_CREATE, + KEY_NEED_LINK); + if (IS_ERR(keyring_ref)) { + ret = PTR_ERR(keyring_ref); + goto out_put_key; + } + + ret = key_link(key_ref_to_ptr(process_keyring_ref), + key_ref_to_ptr(keyring_ref)); + + key_ref_put(keyring_ref); +out_put_key: + key_ref_put(process_keyring_ref); +out: + return ret; +} +#else +static int tls_handshake_private_keyring(struct tls_handshake_req *treq) +{ + return 0; +} +#endif + +static int tls_handshake_put_peer_identity(struct sk_buff *msg, + struct tls_handshake_req *treq) +{ + unsigned int i; + + for (i = 0; i < treq->th_num_peerids; i++) + if (nla_put_u32(msg, HANDSHAKE_A_ACCEPT_PEER_IDENTITY, + treq->th_peerid[i]) < 0) + return -EMSGSIZE; + return 0; +} + +static int tls_handshake_put_certificate(struct sk_buff *msg, + struct tls_handshake_req *treq) +{ + struct nlattr *entry_attr; + + if (treq->th_certificate == TLS_NO_CERT && + treq->th_privkey == TLS_NO_PRIVKEY) + return 0; + + entry_attr = nla_nest_start(msg, HANDSHAKE_A_ACCEPT_CERTIFICATE); + if (!entry_attr) + return -EMSGSIZE; + + if (nla_put_u32(msg, HANDSHAKE_A_X509_CERT, + treq->th_certificate) || + nla_put_u32(msg, HANDSHAKE_A_X509_PRIVKEY, + treq->th_privkey)) { + nla_nest_cancel(msg, entry_attr); + return -EMSGSIZE; + } + + nla_nest_end(msg, entry_attr); + return 0; +} + +/** + * tls_handshake_accept - callback to construct a CMD_ACCEPT response + * @req: handshake parameters to return + * @info: generic netlink message context + * @fd: file descriptor to be returned + * + * Returns zero on success, or a negative errno on failure. + */ +static int tls_handshake_accept(struct handshake_req *req, + struct genl_info *info, int fd) +{ + struct tls_handshake_req *treq = handshake_req_private(req); + struct nlmsghdr *hdr; + struct sk_buff *msg; + int ret; + + ret = tls_handshake_private_keyring(treq); + if (ret < 0) + goto out; + + ret = -ENOMEM; + msg = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + goto out; + hdr = handshake_genl_put(msg, info); + if (!hdr) + goto out_cancel; + + ret = -EMSGSIZE; + ret = nla_put_u32(msg, HANDSHAKE_A_ACCEPT_SOCKFD, fd); + if (ret < 0) + goto out_cancel; + ret = nla_put_u32(msg, HANDSHAKE_A_ACCEPT_MESSAGE_TYPE, treq->th_type); + if (ret < 0) + goto out_cancel; + if (treq->th_timeout_ms) { + ret = nla_put_u32(msg, HANDSHAKE_A_ACCEPT_TIMEOUT, treq->th_timeout_ms); + if (ret < 0) + goto out_cancel; + } + + ret = nla_put_u32(msg, HANDSHAKE_A_ACCEPT_AUTH_MODE, + treq->th_auth_mode); + if (ret < 0) + goto out_cancel; + switch (treq->th_auth_mode) { + case HANDSHAKE_AUTH_PSK: + ret = tls_handshake_put_peer_identity(msg, treq); + if (ret < 0) + goto out_cancel; + break; + case HANDSHAKE_AUTH_X509: + ret = tls_handshake_put_certificate(msg, treq); + if (ret < 0) + goto out_cancel; + break; + } + + genlmsg_end(msg, hdr); + return genlmsg_reply(msg, info); + +out_cancel: + genlmsg_cancel(msg, hdr); +out: + return ret; +} + +static const struct handshake_proto tls_handshake_proto = { + .hp_handler_class = HANDSHAKE_HANDLER_CLASS_TLSHD, + .hp_privsize = sizeof(struct tls_handshake_req), + .hp_flags = BIT(HANDSHAKE_F_PROTO_NOTIFY), + + .hp_accept = tls_handshake_accept, + .hp_done = tls_handshake_done, +}; + +/** + * tls_client_hello_anon - request an anonymous TLS handshake on a socket + * @args: socket and handshake parameters for this request + * @flags: memory allocation control flags + * + * Return values: + * %0: Handshake request enqueue; ->done will be called when complete + * %-ESRCH: No user agent is available + * %-ENOMEM: Memory allocation failed + */ +int tls_client_hello_anon(const struct tls_handshake_args *args, gfp_t flags) +{ + struct tls_handshake_req *treq; + struct handshake_req *req; + + req = handshake_req_alloc(&tls_handshake_proto, flags); + if (!req) + return -ENOMEM; + treq = tls_handshake_req_init(req, args); + treq->th_type = HANDSHAKE_MSG_TYPE_CLIENTHELLO; + treq->th_auth_mode = HANDSHAKE_AUTH_UNAUTH; + + return handshake_req_submit(args->ta_sock, req, flags); +} +EXPORT_SYMBOL(tls_client_hello_anon); + +/** + * tls_client_hello_x509 - request an x.509-based TLS handshake on a socket + * @args: socket and handshake parameters for this request + * @flags: memory allocation control flags + * + * Return values: + * %0: Handshake request enqueue; ->done will be called when complete + * %-ESRCH: No user agent is available + * %-ENOMEM: Memory allocation failed + */ +int tls_client_hello_x509(const struct tls_handshake_args *args, gfp_t flags) +{ + struct tls_handshake_req *treq; + struct handshake_req *req; + + req = handshake_req_alloc(&tls_handshake_proto, flags); + if (!req) + return -ENOMEM; + treq = tls_handshake_req_init(req, args); + treq->th_type = HANDSHAKE_MSG_TYPE_CLIENTHELLO; + treq->th_auth_mode = HANDSHAKE_AUTH_X509; + treq->th_certificate = args->ta_my_cert; + treq->th_privkey = args->ta_my_privkey; + + return handshake_req_submit(args->ta_sock, req, flags); +} +EXPORT_SYMBOL(tls_client_hello_x509); + +/** + * tls_client_hello_psk - request a PSK-based TLS handshake on a socket + * @args: socket and handshake parameters for this request + * @flags: memory allocation control flags + * + * Return values: + * %0: Handshake request enqueue; ->done will be called when complete + * %-EINVAL: Wrong number of local peer IDs + * %-ESRCH: No user agent is available + * %-ENOMEM: Memory allocation failed + */ +int tls_client_hello_psk(const struct tls_handshake_args *args, gfp_t flags) +{ + struct tls_handshake_req *treq; + struct handshake_req *req; + unsigned int i; + + if (!args->ta_num_peerids || + args->ta_num_peerids > ARRAY_SIZE(treq->th_peerid)) + return -EINVAL; + + req = handshake_req_alloc(&tls_handshake_proto, flags); + if (!req) + return -ENOMEM; + treq = tls_handshake_req_init(req, args); + treq->th_type = HANDSHAKE_MSG_TYPE_CLIENTHELLO; + treq->th_auth_mode = HANDSHAKE_AUTH_PSK; + treq->th_num_peerids = args->ta_num_peerids; + for (i = 0; i < args->ta_num_peerids; i++) + treq->th_peerid[i] = args->ta_my_peerids[i]; + + return handshake_req_submit(args->ta_sock, req, flags); +} +EXPORT_SYMBOL(tls_client_hello_psk); + +/** + * tls_server_hello_x509 - request a server TLS handshake on a socket + * @args: socket and handshake parameters for this request + * @flags: memory allocation control flags + * + * Return values: + * %0: Handshake request enqueue; ->done will be called when complete + * %-ESRCH: No user agent is available + * %-ENOMEM: Memory allocation failed + */ +int tls_server_hello_x509(const struct tls_handshake_args *args, gfp_t flags) +{ + struct tls_handshake_req *treq; + struct handshake_req *req; + + req = handshake_req_alloc(&tls_handshake_proto, flags); + if (!req) + return -ENOMEM; + treq = tls_handshake_req_init(req, args); + treq->th_type = HANDSHAKE_MSG_TYPE_SERVERHELLO; + treq->th_auth_mode = HANDSHAKE_AUTH_X509; + treq->th_certificate = args->ta_my_cert; + treq->th_privkey = args->ta_my_privkey; + + return handshake_req_submit(args->ta_sock, req, flags); +} +EXPORT_SYMBOL(tls_server_hello_x509); + +/** + * tls_server_hello_psk - request a server TLS handshake on a socket + * @args: socket and handshake parameters for this request + * @flags: memory allocation control flags + * + * Return values: + * %0: Handshake request enqueue; ->done will be called when complete + * %-ESRCH: No user agent is available + * %-ENOMEM: Memory allocation failed + */ +int tls_server_hello_psk(const struct tls_handshake_args *args, gfp_t flags) +{ + struct tls_handshake_req *treq; + struct handshake_req *req; + + req = handshake_req_alloc(&tls_handshake_proto, flags); + if (!req) + return -ENOMEM; + treq = tls_handshake_req_init(req, args); + treq->th_type = HANDSHAKE_MSG_TYPE_SERVERHELLO; + treq->th_auth_mode = HANDSHAKE_AUTH_PSK; + treq->th_num_peerids = 1; + treq->th_peerid[0] = args->ta_my_peerids[0]; + + return handshake_req_submit(args->ta_sock, req, flags); +} +EXPORT_SYMBOL(tls_server_hello_psk); + +/** + * tls_handshake_cancel - cancel a pending handshake + * @sk: socket on which there is an ongoing handshake + * + * Request cancellation races with request completion. To determine + * who won, callers examine the return value from this function. + * + * Return values: + * %true - Uncompleted handshake request was canceled + * %false - Handshake request already completed or not found + */ +bool tls_handshake_cancel(struct sock *sk) +{ + return handshake_req_cancel(sk); +} +EXPORT_SYMBOL(tls_handshake_cancel); diff --git a/net/handshake/trace.c b/net/handshake/trace.c new file mode 100644 index 000000000000..1c4d8e27e17a --- /dev/null +++ b/net/handshake/trace.c @@ -0,0 +1,20 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Trace points for transport security layer handshakes. + * + * Author: Chuck Lever <chuck.lever@oracle.com> + * + * Copyright (c) 2023, Oracle and/or its affiliates. + */ + +#include <linux/types.h> + +#include <net/sock.h> +#include <net/netlink.h> +#include <net/genetlink.h> + +#include "handshake.h" + +#define CREATE_TRACE_POINTS + +#include <trace/events/handshake.h> diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile index 880277c9fd07..b18ba8ef93ad 100644 --- a/net/ipv4/Makefile +++ b/net/ipv4/Makefile @@ -26,7 +26,7 @@ obj-$(CONFIG_IP_MROUTE) += ipmr.o obj-$(CONFIG_IP_MROUTE_COMMON) += ipmr_base.o obj-$(CONFIG_NET_IPIP) += ipip.o gre-y := gre_demux.o -fou-y := fou_core.o fou_nl.o +fou-y := fou_core.o fou_nl.o fou_bpf.o obj-$(CONFIG_NET_FOU) += fou.o obj-$(CONFIG_NET_IPGRE_DEMUX) += gre.o obj-$(CONFIG_NET_IPGRE) += ip_gre.o diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index 8db6747f892f..940062e08f57 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -1322,7 +1322,7 @@ int inet_sk_rebuild_header(struct sock *sk) sk->sk_state != TCP_SYN_SENT || (sk->sk_userlocks & SOCK_BINDADDR_LOCK) || (err = inet_sk_reselect_saddr(sk)) != 0) - sk->sk_err_soft = -err; + WRITE_ONCE(sk->sk_err_soft, -err); } return err; diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c index 4f7237661afb..9456f5bb35e5 100644 --- a/net/ipv4/arp.c +++ b/net/ipv4/arp.c @@ -375,7 +375,7 @@ static void arp_solicit(struct neighbour *neigh, struct sk_buff *skb) probes -= NEIGH_VAR(neigh->parms, UCAST_PROBES); if (probes < 0) { - if (!(neigh->nud_state & NUD_VALID)) + if (!(READ_ONCE(neigh->nud_state) & NUD_VALID)) pr_debug("trying to ucast probe in NUD_INVALID\n"); neigh_ha_snapshot(dst_ha, neigh, dev); dst_hw = dst_ha; @@ -1123,7 +1123,7 @@ static int arp_req_get(struct arpreq *r, struct net_device *dev) neigh = neigh_lookup(&arp_tbl, &ip, dev); if (neigh) { - if (!(neigh->nud_state & NUD_NOARP)) { + if (!(READ_ONCE(neigh->nud_state) & NUD_NOARP)) { read_lock_bh(&neigh->lock); memcpy(r->arp_ha.sa_data, neigh->ha, dev->addr_len); r->arp_flags = arp_state_to_flags(neigh); @@ -1144,12 +1144,12 @@ int arp_invalidate(struct net_device *dev, __be32 ip, bool force) struct neigh_table *tbl = &arp_tbl; if (neigh) { - if ((neigh->nud_state & NUD_VALID) && !force) { + if ((READ_ONCE(neigh->nud_state) & NUD_VALID) && !force) { neigh_release(neigh); return 0; } - if (neigh->nud_state & ~NUD_NOARP) + if (READ_ONCE(neigh->nud_state) & ~NUD_NOARP) err = neigh_update(neigh, NULL, NUD_FAILED, NEIGH_UPDATE_F_OVERRIDE| NEIGH_UPDATE_F_ADMIN, 0); diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c index 13fc0c185cd9..4406d796cc2f 100644 --- a/net/ipv4/bpf_tcp_ca.c +++ b/net/ipv4/bpf_tcp_ca.c @@ -72,15 +72,11 @@ static bool bpf_tcp_ca_is_valid_access(int off, int size, static int bpf_tcp_ca_btf_struct_access(struct bpf_verifier_log *log, const struct bpf_reg_state *reg, - int off, int size, enum bpf_access_type atype, - u32 *next_btf_id, enum bpf_type_flag *flag) + int off, int size) { const struct btf_type *t; size_t end; - if (atype == BPF_READ) - return btf_struct_access(log, reg, off, size, atype, next_btf_id, flag); - t = btf_type_by_id(reg->btf, reg->btf_id); if (t != tcp_sock_type) { bpf_log(log, "only read is supported\n"); @@ -113,6 +109,9 @@ static int bpf_tcp_ca_btf_struct_access(struct bpf_verifier_log *log, case offsetof(struct tcp_sock, ecn_flags): end = offsetofend(struct tcp_sock, ecn_flags); break; + case offsetof(struct tcp_sock, app_limited): + end = offsetofend(struct tcp_sock, app_limited); + break; default: bpf_log(log, "no write support to tcp_sock at off %d\n", off); return -EACCES; @@ -239,8 +238,6 @@ static int bpf_tcp_ca_init_member(const struct btf_type *t, if (bpf_obj_name_cpy(tcp_ca->name, utcp_ca->name, sizeof(tcp_ca->name)) <= 0) return -EINVAL; - if (tcp_ca_find(utcp_ca->name)) - return -EEXIST; return 1; } @@ -266,13 +263,25 @@ static void bpf_tcp_ca_unreg(void *kdata) tcp_unregister_congestion_control(kdata); } +static int bpf_tcp_ca_update(void *kdata, void *old_kdata) +{ + return tcp_update_congestion_control(kdata, old_kdata); +} + +static int bpf_tcp_ca_validate(void *kdata) +{ + return tcp_validate_congestion_control(kdata); +} + struct bpf_struct_ops bpf_tcp_congestion_ops = { .verifier_ops = &bpf_tcp_ca_verifier_ops, .reg = bpf_tcp_ca_reg, .unreg = bpf_tcp_ca_unreg, + .update = bpf_tcp_ca_update, .check_member = bpf_tcp_ca_check_member, .init_member = bpf_tcp_ca_init_member, .init = bpf_tcp_ca_init, + .validate = bpf_tcp_ca_validate, .name = "tcp_congestion_ops", }; diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c index b0acf6e19aed..5deac0517ef7 100644 --- a/net/ipv4/devinet.c +++ b/net/ipv4/devinet.c @@ -962,6 +962,7 @@ static int inet_rtm_newaddr(struct sk_buff *skb, struct nlmsghdr *nlh, extack); } else { u32 new_metric = ifa->ifa_rt_priority; + u8 new_proto = ifa->ifa_proto; inet_free_ifa(ifa); @@ -975,6 +976,8 @@ static int inet_rtm_newaddr(struct sk_buff *skb, struct nlmsghdr *nlh, ifa->ifa_rt_priority = new_metric; } + ifa->ifa_proto = new_proto; + set_ifa_lifetime(ifa, valid_lft, prefered_lft); cancel_delayed_work(&check_lifetime_work); queue_delayed_work(system_power_efficient_wq, diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index 3bb890a40ed7..65ba18a91865 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -563,7 +563,7 @@ static int fib_detect_death(struct fib_info *fi, int order, n = NULL; if (n) { - state = n->nud_state; + state = READ_ONCE(n->nud_state); neigh_release(n); } else { return 0; @@ -2191,7 +2191,7 @@ static bool fib_good_nh(const struct fib_nh *nh) if (nh->fib_nh_scope == RT_SCOPE_LINK) { struct neighbour *n; - rcu_read_lock_bh(); + rcu_read_lock(); if (likely(nh->fib_nh_gw_family == AF_INET)) n = __ipv4_neigh_lookup_noref(nh->fib_nh_dev, @@ -2202,9 +2202,9 @@ static bool fib_good_nh(const struct fib_nh *nh) else n = NULL; if (n) - state = n->nud_state; + state = READ_ONCE(n->nud_state); - rcu_read_unlock_bh(); + rcu_read_unlock(); } return !!(state & NUD_VALID); diff --git a/net/ipv4/fou_bpf.c b/net/ipv4/fou_bpf.c new file mode 100644 index 000000000000..3760a14b6b57 --- /dev/null +++ b/net/ipv4/fou_bpf.c @@ -0,0 +1,119 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Unstable Fou Helpers for TC-BPF hook + * + * These are called from SCHED_CLS BPF programs. Note that it is + * allowed to break compatibility for these functions since the interface they + * are exposed through to BPF programs is explicitly unstable. + */ + +#include <linux/bpf.h> +#include <linux/btf_ids.h> + +#include <net/dst_metadata.h> +#include <net/fou.h> + +struct bpf_fou_encap { + __be16 sport; + __be16 dport; +}; + +enum bpf_fou_encap_type { + FOU_BPF_ENCAP_FOU, + FOU_BPF_ENCAP_GUE, +}; + +__diag_push(); +__diag_ignore_all("-Wmissing-prototypes", + "Global functions as their definitions will be in BTF"); + +/* bpf_skb_set_fou_encap - Set FOU encap parameters + * + * This function allows for using GUE or FOU encapsulation together with an + * ipip device in collect-metadata mode. + * + * It is meant to be used in BPF tc-hooks and after a call to the + * bpf_skb_set_tunnel_key helper, responsible for setting IP addresses. + * + * Parameters: + * @skb_ctx Pointer to ctx (__sk_buff) in TC program. Cannot be NULL + * @encap Pointer to a `struct bpf_fou_encap` storing UDP src and + * dst ports. If sport is set to 0 the kernel will auto-assign a + * port. This is similar to using `encap-sport auto`. + * Cannot be NULL + * @type Encapsulation type for the packet. Their definitions are + * specified in `enum bpf_fou_encap_type` + */ +__bpf_kfunc int bpf_skb_set_fou_encap(struct __sk_buff *skb_ctx, + struct bpf_fou_encap *encap, int type) +{ + struct sk_buff *skb = (struct sk_buff *)skb_ctx; + struct ip_tunnel_info *info = skb_tunnel_info(skb); + + if (unlikely(!encap)) + return -EINVAL; + + if (unlikely(!info || !(info->mode & IP_TUNNEL_INFO_TX))) + return -EINVAL; + + switch (type) { + case FOU_BPF_ENCAP_FOU: + info->encap.type = TUNNEL_ENCAP_FOU; + break; + case FOU_BPF_ENCAP_GUE: + info->encap.type = TUNNEL_ENCAP_GUE; + break; + default: + info->encap.type = TUNNEL_ENCAP_NONE; + } + + if (info->key.tun_flags & TUNNEL_CSUM) + info->encap.flags |= TUNNEL_ENCAP_FLAG_CSUM; + + info->encap.sport = encap->sport; + info->encap.dport = encap->dport; + + return 0; +} + +/* bpf_skb_get_fou_encap - Get FOU encap parameters + * + * This function allows for reading encap metadata from a packet received + * on an ipip device in collect-metadata mode. + * + * Parameters: + * @skb_ctx Pointer to ctx (__sk_buff) in TC program. Cannot be NULL + * @encap Pointer to a struct bpf_fou_encap storing UDP source and + * destination port. Cannot be NULL + */ +__bpf_kfunc int bpf_skb_get_fou_encap(struct __sk_buff *skb_ctx, + struct bpf_fou_encap *encap) +{ + struct sk_buff *skb = (struct sk_buff *)skb_ctx; + struct ip_tunnel_info *info = skb_tunnel_info(skb); + + if (unlikely(!info)) + return -EINVAL; + + encap->sport = info->encap.sport; + encap->dport = info->encap.dport; + + return 0; +} + +__diag_pop() + +BTF_SET8_START(fou_kfunc_set) +BTF_ID_FLAGS(func, bpf_skb_set_fou_encap) +BTF_ID_FLAGS(func, bpf_skb_get_fou_encap) +BTF_SET8_END(fou_kfunc_set) + +static const struct btf_kfunc_id_set fou_bpf_kfunc_set = { + .owner = THIS_MODULE, + .set = &fou_kfunc_set, +}; + +int register_fou_bpf(void) +{ + return register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, + &fou_bpf_kfunc_set); +} diff --git a/net/ipv4/fou_core.c b/net/ipv4/fou_core.c index cafec9b4eee0..0c41076e31ed 100644 --- a/net/ipv4/fou_core.c +++ b/net/ipv4/fou_core.c @@ -1236,10 +1236,15 @@ static int __init fou_init(void) if (ret < 0) goto unregister; + ret = register_fou_bpf(); + if (ret < 0) + goto kfunc_failed; + ret = ip_tunnel_encap_add_fou_ops(); if (ret == 0) return 0; +kfunc_failed: genl_unregister_family(&fou_nl_family); unregister: unregister_pernet_device(&fou_net_ops); diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c index c920aa9a62a9..48ff5f13e797 100644 --- a/net/ipv4/igmp.c +++ b/net/ipv4/igmp.c @@ -2638,10 +2638,10 @@ int ip_mc_gsfget(struct sock *sk, struct group_filter *gsf, /* * check if a multicast source filter allows delivery for a given <src,dst,intf> */ -int ip_mc_sf_allow(struct sock *sk, __be32 loc_addr, __be32 rmt_addr, +int ip_mc_sf_allow(const struct sock *sk, __be32 loc_addr, __be32 rmt_addr, int dif, int sdif) { - struct inet_sock *inet = inet_sk(sk); + const struct inet_sock *inet = inet_sk(sk); struct ip_mc_socklist *pmc; struct ip_sf_socklist *psl; int i; diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index 6edae3886885..e7391bf310a7 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -826,13 +826,11 @@ bool inet_bind2_bucket_match_addr_any(const struct inet_bind2_bucket *tb, const unsigned short port, int l3mdev, const struct sock *sk) { #if IS_ENABLED(CONFIG_IPV6) - struct in6_addr addr_any = {}; - if (sk->sk_family != tb->family) { if (sk->sk_family == AF_INET) return net_eq(ib2_net(tb), net) && tb->port == port && tb->l3mdev == l3mdev && - ipv6_addr_equal(&tb->v6_rcv_saddr, &addr_any); + ipv6_addr_any(&tb->v6_rcv_saddr); return false; } @@ -840,7 +838,7 @@ bool inet_bind2_bucket_match_addr_any(const struct inet_bind2_bucket *tb, const if (sk->sk_family == AF_INET6) return net_eq(ib2_net(tb), net) && tb->port == port && tb->l3mdev == l3mdev && - ipv6_addr_equal(&tb->v6_rcv_saddr, &addr_any); + ipv6_addr_any(&tb->v6_rcv_saddr); else #endif return net_eq(ib2_net(tb), net) && tb->port == port && @@ -866,11 +864,10 @@ inet_bhash2_addr_any_hashbucket(const struct sock *sk, const struct net *net, in { struct inet_hashinfo *hinfo = tcp_or_dccp_get_hashinfo(sk); u32 hash; -#if IS_ENABLED(CONFIG_IPV6) - struct in6_addr addr_any = {}; +#if IS_ENABLED(CONFIG_IPV6) if (sk->sk_family == AF_INET6) - hash = ipv6_portaddr_hash(net, &addr_any, port); + hash = ipv6_portaddr_hash(net, &in6addr_any, port); else #endif hash = ipv4_portaddr_hash(net, 0, port); diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 4e4e308c3230..61892268e8a6 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -129,7 +129,8 @@ int ip_local_out(struct net *net, struct sock *sk, struct sk_buff *skb) } EXPORT_SYMBOL_GPL(ip_local_out); -static inline int ip_select_ttl(struct inet_sock *inet, struct dst_entry *dst) +static inline int ip_select_ttl(const struct inet_sock *inet, + const struct dst_entry *dst) { int ttl = inet->uc_ttl; @@ -146,7 +147,7 @@ int ip_build_and_send_pkt(struct sk_buff *skb, const struct sock *sk, __be32 saddr, __be32 daddr, struct ip_options_rcu *opt, u8 tos) { - struct inet_sock *inet = inet_sk(sk); + const struct inet_sock *inet = inet_sk(sk); struct rtable *rt = skb_rtable(skb); struct net *net = sock_net(sk); struct iphdr *iph; @@ -218,7 +219,7 @@ static int ip_finish_output2(struct net *net, struct sock *sk, struct sk_buff *s return res; } - rcu_read_lock_bh(); + rcu_read_lock(); neigh = ip_neigh_for_gw(rt, skb, &is_v6gw); if (!IS_ERR(neigh)) { int res; @@ -226,10 +227,10 @@ static int ip_finish_output2(struct net *net, struct sock *sk, struct sk_buff *s sock_confirm_neigh(skb, neigh); /* if crossing protocols, can not use the cached header */ res = neigh_output(neigh, skb, is_v6gw); - rcu_read_unlock_bh(); + rcu_read_unlock(); return res; } - rcu_read_unlock_bh(); + rcu_read_unlock(); net_dbg_ratelimited("%s: No header cache and no neighbour!\n", __func__); @@ -990,7 +991,7 @@ static int __ip_append_data(struct sock *sk, mtu = cork->gso_size ? IP_MAX_MTU : cork->fragsize; paged = !!cork->gso_size; - if (cork->tx_flags & SKBTX_ANY_SW_TSTAMP && + if (cork->tx_flags & SKBTX_ANY_TSTAMP && sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID) tskey = atomic_inc_return(&sk->sk_tskey) - 1; @@ -1570,9 +1571,19 @@ struct sk_buff *__ip_make_skb(struct sock *sk, cork->dst = NULL; skb_dst_set(skb, &rt->dst); - if (iph->protocol == IPPROTO_ICMP) - icmp_out_count(net, ((struct icmphdr *) - skb_transport_header(skb))->type); + if (iph->protocol == IPPROTO_ICMP) { + u8 icmp_type; + + /* For such sockets, transhdrlen is zero when do ip_append_data(), + * so icmphdr does not in skb linear region and can not get icmp_type + * by icmp_hdr(skb)->type. + */ + if (sk->sk_type == SOCK_RAW && !inet_sk(sk)->hdrincl) + icmp_type = fl4->fl4_icmp_type; + else + icmp_type = icmp_hdr(skb)->type; + icmp_out_count(net, icmp_type); + } ip_cork_release(cork); out: diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c index 2541083d49ad..beeae624c412 100644 --- a/net/ipv4/ip_tunnel.c +++ b/net/ipv4/ip_tunnel.c @@ -359,6 +359,20 @@ err_dev_set_mtu: return ERR_PTR(err); } +void ip_tunnel_md_udp_encap(struct sk_buff *skb, struct ip_tunnel_info *info) +{ + const struct iphdr *iph = ip_hdr(skb); + const struct udphdr *udph; + + if (iph->protocol != IPPROTO_UDP) + return; + + udph = (struct udphdr *)((__u8 *)iph + (iph->ihl << 2)); + info->encap.sport = udph->source; + info->encap.dport = udph->dest; +} +EXPORT_SYMBOL(ip_tunnel_md_udp_encap); + int ip_tunnel_rcv(struct ip_tunnel *tunnel, struct sk_buff *skb, const struct tnl_ptk_info *tpi, struct metadata_dst *tun_dst, bool log_ecn_error) @@ -572,7 +586,11 @@ void ip_md_tunnel_xmit(struct sk_buff *skb, struct net_device *dev, tunnel_id_to_key32(key->tun_id), RT_TOS(tos), dev_net(dev), 0, skb->mark, skb_get_hash(skb), key->flow_flags); - if (tunnel->encap.type != TUNNEL_ENCAP_NONE) + + if (!tunnel_hlen) + tunnel_hlen = ip_encap_hlen(&tun_info->encap); + + if (ip_tunnel_encap(skb, &tun_info->encap, &proto, &fl4) < 0) goto tx_error; use_cache = ip_tunnel_dst_cache_usable(skb, tun_info); @@ -732,7 +750,7 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev, dev_net(dev), tunnel->parms.link, tunnel->fwmark, skb_get_hash(skb), 0); - if (ip_tunnel_encap(skb, tunnel, &protocol, &fl4) < 0) + if (ip_tunnel_encap(skb, &tunnel->encap, &protocol, &fl4) < 0) goto tx_error; if (connected && md) { diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c index abea77759b7e..27b8f83c6ea2 100644 --- a/net/ipv4/ipip.c +++ b/net/ipv4/ipip.c @@ -241,6 +241,7 @@ static int ipip_tunnel_rcv(struct sk_buff *skb, u8 ipproto) tun_dst = ip_tun_rx_dst(skb, 0, 0, 0); if (!tun_dst) return 0; + ip_tunnel_md_udp_encap(skb, &tun_dst->u.tun_info); } skb_reset_mac_header(skb); diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c index da5998011ab9..7da1df4997d0 100644 --- a/net/ipv4/netfilter/ip_tables.c +++ b/net/ipv4/netfilter/ip_tables.c @@ -14,7 +14,6 @@ #include <linux/vmalloc.h> #include <linux/netdevice.h> #include <linux/module.h> -#include <linux/icmp.h> #include <net/ip.h> #include <net/compat.h> #include <linux/uaccess.h> @@ -31,7 +30,6 @@ MODULE_LICENSE("GPL"); MODULE_AUTHOR("Netfilter Core Team <coreteam@netfilter.org>"); MODULE_DESCRIPTION("IPv4 packet filter"); -MODULE_ALIAS("ipt_icmp"); void *ipt_alloc_initial_table(const struct xt_table *info) { @@ -1799,52 +1797,6 @@ void ipt_unregister_table_exit(struct net *net, const char *name) __ipt_unregister_table(net, table); } -/* Returns 1 if the type and code is matched by the range, 0 otherwise */ -static inline bool -icmp_type_code_match(u_int8_t test_type, u_int8_t min_code, u_int8_t max_code, - u_int8_t type, u_int8_t code, - bool invert) -{ - return ((test_type == 0xFF) || - (type == test_type && code >= min_code && code <= max_code)) - ^ invert; -} - -static bool -icmp_match(const struct sk_buff *skb, struct xt_action_param *par) -{ - const struct icmphdr *ic; - struct icmphdr _icmph; - const struct ipt_icmp *icmpinfo = par->matchinfo; - - /* Must not be a fragment. */ - if (par->fragoff != 0) - return false; - - ic = skb_header_pointer(skb, par->thoff, sizeof(_icmph), &_icmph); - if (ic == NULL) { - /* We've been asked to examine this packet, and we - * can't. Hence, no choice but to drop. - */ - par->hotdrop = true; - return false; - } - - return icmp_type_code_match(icmpinfo->type, - icmpinfo->code[0], - icmpinfo->code[1], - ic->type, ic->code, - !!(icmpinfo->invflags&IPT_ICMP_INV)); -} - -static int icmp_checkentry(const struct xt_mtchk_param *par) -{ - const struct ipt_icmp *icmpinfo = par->matchinfo; - - /* Must specify no unknown invflags */ - return (icmpinfo->invflags & ~IPT_ICMP_INV) ? -EINVAL : 0; -} - static struct xt_target ipt_builtin_tg[] __read_mostly = { { .name = XT_STANDARD_TARGET, @@ -1875,18 +1827,6 @@ static struct nf_sockopt_ops ipt_sockopts = { .owner = THIS_MODULE, }; -static struct xt_match ipt_builtin_mt[] __read_mostly = { - { - .name = "icmp", - .match = icmp_match, - .matchsize = sizeof(struct ipt_icmp), - .checkentry = icmp_checkentry, - .proto = IPPROTO_ICMP, - .family = NFPROTO_IPV4, - .me = THIS_MODULE, - }, -}; - static int __net_init ip_tables_net_init(struct net *net) { return xt_proto_init(net, NFPROTO_IPV4); @@ -1914,19 +1854,14 @@ static int __init ip_tables_init(void) ret = xt_register_targets(ipt_builtin_tg, ARRAY_SIZE(ipt_builtin_tg)); if (ret < 0) goto err2; - ret = xt_register_matches(ipt_builtin_mt, ARRAY_SIZE(ipt_builtin_mt)); - if (ret < 0) - goto err4; /* Register setsockopt */ ret = nf_register_sockopt(&ipt_sockopts); if (ret < 0) - goto err5; + goto err4; return 0; -err5: - xt_unregister_matches(ipt_builtin_mt, ARRAY_SIZE(ipt_builtin_mt)); err4: xt_unregister_targets(ipt_builtin_tg, ARRAY_SIZE(ipt_builtin_tg)); err2: @@ -1939,7 +1874,6 @@ static void __exit ip_tables_fini(void) { nf_unregister_sockopt(&ipt_sockopts); - xt_unregister_matches(ipt_builtin_mt, ARRAY_SIZE(ipt_builtin_mt)); xt_unregister_targets(ipt_builtin_tg, ARRAY_SIZE(ipt_builtin_tg)); unregister_pernet_subsys(&ip_tables_net_ops); } diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c index d8ef05347fd9..f95142e56da0 100644 --- a/net/ipv4/nexthop.c +++ b/net/ipv4/nexthop.c @@ -1124,13 +1124,13 @@ static bool ipv6_good_nh(const struct fib6_nh *nh) int state = NUD_REACHABLE; struct neighbour *n; - rcu_read_lock_bh(); + rcu_read_lock(); n = __ipv6_neigh_lookup_noref_stub(nh->fib_nh_dev, &nh->fib_nh_gw6); if (n) - state = n->nud_state; + state = READ_ONCE(n->nud_state); - rcu_read_unlock_bh(); + rcu_read_unlock(); return !!(state & NUD_VALID); } @@ -1140,14 +1140,14 @@ static bool ipv4_good_nh(const struct fib_nh *nh) int state = NUD_REACHABLE; struct neighbour *n; - rcu_read_lock_bh(); + rcu_read_lock(); n = __ipv4_neigh_lookup_noref(nh->fib_nh_dev, (__force u32)nh->fib_nh_gw4); if (n) - state = n->nud_state; + state = READ_ONCE(n->nud_state); - rcu_read_unlock_bh(); + rcu_read_unlock(); return !!(state & NUD_VALID); } diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c index 8088a5011e7d..ff712bf2a98d 100644 --- a/net/ipv4/raw.c +++ b/net/ipv4/raw.c @@ -116,10 +116,10 @@ void raw_unhash_sk(struct sock *sk) } EXPORT_SYMBOL_GPL(raw_unhash_sk); -bool raw_v4_match(struct net *net, struct sock *sk, unsigned short num, +bool raw_v4_match(struct net *net, const struct sock *sk, unsigned short num, __be32 raddr, __be32 laddr, int dif, int sdif) { - struct inet_sock *inet = inet_sk(sk); + const struct inet_sock *inet = inet_sk(sk); if (net_eq(sock_net(sk), net) && inet->inet_num == num && !(inet->inet_daddr && inet->inet_daddr != raddr) && diff --git a/net/ipv4/raw_diag.c b/net/ipv4/raw_diag.c index da3591a66a16..63a40e4b678f 100644 --- a/net/ipv4/raw_diag.c +++ b/net/ipv4/raw_diag.c @@ -34,7 +34,7 @@ raw_get_hashinfo(const struct inet_diag_req_v2 *r) * use helper to figure it out. */ -static bool raw_lookup(struct net *net, struct sock *sk, +static bool raw_lookup(struct net *net, const struct sock *sk, const struct inet_diag_req_v2 *req) { struct inet_diag_req_raw *r = (void *)req; diff --git a/net/ipv4/route.c b/net/ipv4/route.c index de6e3515ab4f..98d7e6ba7493 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -408,7 +408,7 @@ static struct neighbour *ipv4_neigh_lookup(const struct dst_entry *dst, struct net_device *dev = dst->dev; struct neighbour *n; - rcu_read_lock_bh(); + rcu_read_lock(); if (likely(rt->rt_gw_family == AF_INET)) { n = ip_neigh_gw4(dev, rt->rt_gw4); @@ -424,7 +424,7 @@ static struct neighbour *ipv4_neigh_lookup(const struct dst_entry *dst, if (!IS_ERR(n) && !refcount_inc_not_zero(&n->refcnt)) n = NULL; - rcu_read_unlock_bh(); + rcu_read_unlock(); return n; } @@ -784,7 +784,7 @@ static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flow if (!n) n = neigh_create(&arp_tbl, &new_gw, rt->dst.dev); if (!IS_ERR(n)) { - if (!(n->nud_state & NUD_VALID)) { + if (!(READ_ONCE(n->nud_state) & NUD_VALID)) { neigh_event_send(n, NULL); } else { if (fib_lookup(net, fl4, &res, 0) == 0) { @@ -1508,20 +1508,20 @@ void rt_add_uncached_list(struct rtable *rt) { struct uncached_list *ul = raw_cpu_ptr(&rt_uncached_list); - rt->rt_uncached_list = ul; + rt->dst.rt_uncached_list = ul; spin_lock_bh(&ul->lock); - list_add_tail(&rt->rt_uncached, &ul->head); + list_add_tail(&rt->dst.rt_uncached, &ul->head); spin_unlock_bh(&ul->lock); } void rt_del_uncached_list(struct rtable *rt) { - if (!list_empty(&rt->rt_uncached)) { - struct uncached_list *ul = rt->rt_uncached_list; + if (!list_empty(&rt->dst.rt_uncached)) { + struct uncached_list *ul = rt->dst.rt_uncached_list; spin_lock_bh(&ul->lock); - list_del_init(&rt->rt_uncached); + list_del_init(&rt->dst.rt_uncached); spin_unlock_bh(&ul->lock); } } @@ -1546,13 +1546,13 @@ void rt_flush_dev(struct net_device *dev) continue; spin_lock_bh(&ul->lock); - list_for_each_entry_safe(rt, safe, &ul->head, rt_uncached) { + list_for_each_entry_safe(rt, safe, &ul->head, dst.rt_uncached) { if (rt->dst.dev != dev) continue; rt->dst.dev = blackhole_netdev; netdev_ref_replace(dev, blackhole_netdev, &rt->dst.dev_tracker, GFP_ATOMIC); - list_move(&rt->rt_uncached, &ul->quarantine); + list_move(&rt->dst.rt_uncached, &ul->quarantine); } spin_unlock_bh(&ul->lock); } @@ -1644,7 +1644,6 @@ struct rtable *rt_dst_alloc(struct net_device *dev, rt->rt_uses_gateway = 0; rt->rt_gw_family = 0; rt->rt_gw4 = 0; - INIT_LIST_HEAD(&rt->rt_uncached); rt->dst.output = ip_output; if (flags & RTCF_LOCAL) @@ -1675,7 +1674,6 @@ struct rtable *rt_dst_clone(struct net_device *dev, struct rtable *rt) new_rt->rt_gw4 = rt->rt_gw4; else if (rt->rt_gw_family == AF_INET6) new_rt->rt_gw6 = rt->rt_gw6; - INIT_LIST_HEAD(&new_rt->rt_uncached); new_rt->dst.input = rt->dst.input; new_rt->dst.output = rt->dst.output; @@ -2858,8 +2856,6 @@ struct dst_entry *ipv4_blackhole_route(struct net *net, struct dst_entry *dst_or rt->rt_gw4 = ort->rt_gw4; else if (rt->rt_gw_family == AF_INET6) rt->rt_gw6 = ort->rt_gw6; - - INIT_LIST_HEAD(&rt->rt_uncached); } dst_release(dst_orig); diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 288693981b00..20db115c38c4 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -589,7 +589,8 @@ __poll_t tcp_poll(struct file *file, struct socket *sock, poll_table *wait) } /* This barrier is coupled with smp_wmb() in tcp_reset() */ smp_rmb(); - if (sk->sk_err || !skb_queue_empty_lockless(&sk->sk_error_queue)) + if (READ_ONCE(sk->sk_err) || + !skb_queue_empty_lockless(&sk->sk_error_queue)) mask |= EPOLLERR; return mask; @@ -2164,7 +2165,7 @@ static void tcp_zc_finalize_rx_tstamp(struct sock *sk, struct msghdr cmsg_dummy; msg_control_addr = (unsigned long)zc->msg_control; - cmsg_dummy.msg_control = (void *)msg_control_addr; + cmsg_dummy.msg_control_user = (void __user *)msg_control_addr; cmsg_dummy.msg_controllen = (__kernel_size_t)zc->msg_controllen; cmsg_dummy.msg_flags = in_compat_syscall() @@ -2175,7 +2176,7 @@ static void tcp_zc_finalize_rx_tstamp(struct sock *sk, zc->msg_controllen == cmsg_dummy.msg_controllen) { tcp_recv_timestamp(&cmsg_dummy, sk, tss); zc->msg_control = (__u64) - ((uintptr_t)cmsg_dummy.msg_control); + ((uintptr_t)cmsg_dummy.msg_control_user); zc->msg_controllen = (__u64)cmsg_dummy.msg_controllen; zc->msg_flags = (__u32)cmsg_dummy.msg_flags; @@ -3094,7 +3095,7 @@ int tcp_disconnect(struct sock *sk, int flags) if (old_state == TCP_LISTEN) { inet_csk_listen_stop(sk); } else if (unlikely(tp->repair)) { - sk->sk_err = ECONNABORTED; + WRITE_ONCE(sk->sk_err, ECONNABORTED); } else if (tcp_need_reset(old_state) || (tp->snd_nxt != tp->write_seq && (1 << old_state) & (TCPF_CLOSING | TCPF_LAST_ACK))) { @@ -3102,9 +3103,9 @@ int tcp_disconnect(struct sock *sk, int flags) * states */ tcp_send_active_reset(sk, gfp_any()); - sk->sk_err = ECONNRESET; + WRITE_ONCE(sk->sk_err, ECONNRESET); } else if (old_state == TCP_SYN_SENT) - sk->sk_err = ECONNRESET; + WRITE_ONCE(sk->sk_err, ECONNRESET); tcp_clear_xmit_timers(sk); __skb_queue_purge(&sk->sk_receive_queue); @@ -4569,7 +4570,7 @@ tcp_inbound_md5_hash(const struct sock *sk, const struct sk_buff *skb, const __u8 *hash_location = NULL; struct tcp_md5sig_key *hash_expected; const struct tcphdr *th = tcp_hdr(skb); - struct tcp_sock *tp = tcp_sk(sk); + const struct tcp_sock *tp = tcp_sk(sk); int genhash, l3index; u8 newhash[16]; @@ -4692,7 +4693,7 @@ int tcp_abort(struct sock *sk, int err) bh_lock_sock(sk); if (!sock_flag(sk, SOCK_DEAD)) { - sk->sk_err = err; + WRITE_ONCE(sk->sk_err, err); /* This barrier is coupled with smp_rmb() in tcp_poll() */ smp_wmb(); sk_error_report(sk); diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c index db8b4b488c31..1b34050a7538 100644 --- a/net/ipv4/tcp_cong.c +++ b/net/ipv4/tcp_cong.c @@ -75,14 +75,8 @@ struct tcp_congestion_ops *tcp_ca_find_key(u32 key) return NULL; } -/* - * Attach new congestion control algorithm to the list - * of available options. - */ -int tcp_register_congestion_control(struct tcp_congestion_ops *ca) +int tcp_validate_congestion_control(struct tcp_congestion_ops *ca) { - int ret = 0; - /* all algorithms must implement these */ if (!ca->ssthresh || !ca->undo_cwnd || !(ca->cong_avoid || ca->cong_control)) { @@ -90,6 +84,20 @@ int tcp_register_congestion_control(struct tcp_congestion_ops *ca) return -EINVAL; } + return 0; +} + +/* Attach new congestion control algorithm to the list + * of available options. + */ +int tcp_register_congestion_control(struct tcp_congestion_ops *ca) +{ + int ret; + + ret = tcp_validate_congestion_control(ca); + if (ret) + return ret; + ca->key = jhash(ca->name, sizeof(ca->name), strlen(ca->name)); spin_lock(&tcp_cong_list_lock); @@ -130,6 +138,50 @@ void tcp_unregister_congestion_control(struct tcp_congestion_ops *ca) } EXPORT_SYMBOL_GPL(tcp_unregister_congestion_control); +/* Replace a registered old ca with a new one. + * + * The new ca must have the same name as the old one, that has been + * registered. + */ +int tcp_update_congestion_control(struct tcp_congestion_ops *ca, struct tcp_congestion_ops *old_ca) +{ + struct tcp_congestion_ops *existing; + int ret; + + ret = tcp_validate_congestion_control(ca); + if (ret) + return ret; + + ca->key = jhash(ca->name, sizeof(ca->name), strlen(ca->name)); + + spin_lock(&tcp_cong_list_lock); + existing = tcp_ca_find_key(old_ca->key); + if (ca->key == TCP_CA_UNSPEC || !existing || strcmp(existing->name, ca->name)) { + pr_notice("%s not registered or non-unique key\n", + ca->name); + ret = -EINVAL; + } else if (existing != old_ca) { + pr_notice("invalid old congestion control algorithm to replace\n"); + ret = -EINVAL; + } else { + /* Add the new one before removing the old one to keep + * one implementation available all the time. + */ + list_add_tail_rcu(&ca->list, &tcp_cong_list); + list_del_rcu(&existing->list); + pr_debug("%s updated\n", ca->name); + } + spin_unlock(&tcp_cong_list_lock); + + /* Wait for outstanding readers to complete before the + * module or struct_ops gets removed entirely. + */ + if (!ret) + synchronize_rcu(); + + return ret; +} + u32 tcp_ca_get_key_by_name(struct net *net, const char *name, bool *ecn_ca) { const struct tcp_congestion_ops *ca; diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index cc072d2cfcd8..a057330d6f59 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -458,7 +458,7 @@ static void tcp_sndbuf_expand(struct sock *sk) static int __tcp_grow_window(const struct sock *sk, const struct sk_buff *skb, unsigned int skbtruesize) { - struct tcp_sock *tp = tcp_sk(sk); + const struct tcp_sock *tp = tcp_sk(sk); /* Optimize this! */ int truesize = tcp_win_from_space(sk, skbtruesize) >> 1; int window = tcp_win_from_space(sk, READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2])) >> 1; @@ -3874,7 +3874,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag) /* We passed data and got it acked, remove any soft error * log. Something worked... */ - sk->sk_err_soft = 0; + WRITE_ONCE(sk->sk_err_soft, 0); icsk->icsk_probes_out = 0; tp->rcv_tstamp = tcp_jiffies32; if (!prior_packets) @@ -4322,15 +4322,15 @@ void tcp_reset(struct sock *sk, struct sk_buff *skb) /* We want the right error as BSD sees it (and indeed as we do). */ switch (sk->sk_state) { case TCP_SYN_SENT: - sk->sk_err = ECONNREFUSED; + WRITE_ONCE(sk->sk_err, ECONNREFUSED); break; case TCP_CLOSE_WAIT: - sk->sk_err = EPIPE; + WRITE_ONCE(sk->sk_err, EPIPE); break; case TCP_CLOSE: return; default: - sk->sk_err = ECONNRESET; + WRITE_ONCE(sk->sk_err, ECONNRESET); } /* This barrier is coupled with smp_rmb() in tcp_poll() */ smp_wmb(); @@ -5693,7 +5693,7 @@ static void tcp_urg(struct sock *sk, struct sk_buff *skb, const struct tcphdr *t */ static bool tcp_reset_check(const struct sock *sk, const struct sk_buff *skb) { - struct tcp_sock *tp = tcp_sk(sk); + const struct tcp_sock *tp = tcp_sk(sk); return unlikely(TCP_SKB_CB(skb)->seq == (tp->rcv_nxt - 1) && (1 << sk->sk_state) & (TCPF_CLOSE_WAIT | TCPF_LAST_ACK | @@ -5714,6 +5714,8 @@ static bool tcp_validate_incoming(struct sock *sk, struct sk_buff *skb, tp->rx_opt.saw_tstamp && tcp_paws_discard(sk, skb)) { if (!th->rst) { + if (unlikely(th->syn)) + goto syn_challenge; NET_INC_STATS(sock_net(sk), LINUX_MIB_PAWSESTABREJECTED); if (!tcp_oow_rate_limited(sock_net(sk), skb, LINUX_MIB_TCPACKSKIPPEDPAWS, diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index b9d55277cb85..39bda2b1066e 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -361,7 +361,7 @@ void tcp_v4_mtu_reduced(struct sock *sk) * for the case, if this connection will not able to recover. */ if (mtu < dst_mtu(dst) && ip_dont_fragment(sk, dst)) - sk->sk_err_soft = EMSGSIZE; + WRITE_ONCE(sk->sk_err_soft, EMSGSIZE); mtu = dst_mtu(dst); @@ -596,13 +596,13 @@ int tcp_v4_err(struct sk_buff *skb, u32 info) ip_icmp_error(sk, skb, err, th->dest, info, (u8 *)th); if (!sock_owned_by_user(sk)) { - sk->sk_err = err; + WRITE_ONCE(sk->sk_err, err); sk_error_report(sk); tcp_done(sk); } else { - sk->sk_err_soft = err; + WRITE_ONCE(sk->sk_err_soft, err); } goto out; } @@ -625,10 +625,10 @@ int tcp_v4_err(struct sk_buff *skb, u32 info) inet = inet_sk(sk); if (!sock_owned_by_user(sk) && inet->recverr) { - sk->sk_err = err; + WRITE_ONCE(sk->sk_err, err); sk_error_report(sk); } else { /* Only an error on timeout */ - sk->sk_err_soft = err; + WRITE_ONCE(sk->sk_err_soft, err); } out: diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c index 9a7ef7732c24..dac0d62120e6 100644 --- a/net/ipv4/tcp_minisocks.c +++ b/net/ipv4/tcp_minisocks.c @@ -463,7 +463,7 @@ void tcp_ca_openreq_child(struct sock *sk, const struct dst_entry *dst) } EXPORT_SYMBOL_GPL(tcp_ca_openreq_child); -static void smc_check_reset_syn_req(struct tcp_sock *oldtp, +static void smc_check_reset_syn_req(const struct tcp_sock *oldtp, struct request_sock *req, struct tcp_sock *newtp) { @@ -492,7 +492,8 @@ struct sock *tcp_create_openreq_child(const struct sock *sk, const struct inet_request_sock *ireq = inet_rsk(req); struct tcp_request_sock *treq = tcp_rsk(req); struct inet_connection_sock *newicsk; - struct tcp_sock *oldtp, *newtp; + const struct tcp_sock *oldtp; + struct tcp_sock *newtp; u32 seq; if (!newsk) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index ba839e441450..cfe128b81a01 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -3699,7 +3699,7 @@ static void tcp_connect_init(struct sock *sk) tp->rx_opt.rcv_wscale = rcv_wscale; tp->rcv_ssthresh = tp->rcv_wnd; - sk->sk_err = 0; + WRITE_ONCE(sk->sk_err, 0); sock_reset_flag(sk, SOCK_DONE); tp->snd_wnd = 0; tcp_init_wl(tp, 0); @@ -4127,8 +4127,13 @@ int tcp_rtx_synack(const struct sock *sk, struct request_sock *req) if (!res) { TCP_INC_STATS(sock_net(sk), TCP_MIB_RETRANSSEGS); NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPSYNRETRANS); - if (unlikely(tcp_passive_fastopen(sk))) - tcp_sk(sk)->total_retrans++; + if (unlikely(tcp_passive_fastopen(sk))) { + /* sk has const attribute because listeners are lockless. + * However in this case, we are dealing with a passive fastopen + * socket thus we can change total_retrans value. + */ + tcp_sk_rw(sk)->total_retrans++; + } trace_tcp_retransmit_synack(sk, req); } return res; diff --git a/net/ipv4/tcp_recovery.c b/net/ipv4/tcp_recovery.c index 50abaa941387..acf4869c5d3b 100644 --- a/net/ipv4/tcp_recovery.c +++ b/net/ipv4/tcp_recovery.c @@ -4,7 +4,7 @@ static u32 tcp_rack_reo_wnd(const struct sock *sk) { - struct tcp_sock *tp = tcp_sk(sk); + const struct tcp_sock *tp = tcp_sk(sk); if (!tp->reord_seen) { /* If reordering has not been observed, be aggressive during diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index cb79127f45c3..b839c2f91292 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -67,7 +67,7 @@ u32 tcp_clamp_probe0_to_user_timeout(const struct sock *sk, u32 when) static void tcp_write_err(struct sock *sk) { - sk->sk_err = sk->sk_err_soft ? : ETIMEDOUT; + WRITE_ONCE(sk->sk_err, READ_ONCE(sk->sk_err_soft) ? : ETIMEDOUT); sk_error_report(sk); tcp_write_queue_purge(sk); @@ -110,7 +110,7 @@ static int tcp_out_of_resources(struct sock *sk, bool do_reset) shift++; /* If some dubious ICMP arrived, penalize even more. */ - if (sk->sk_err_soft) + if (READ_ONCE(sk->sk_err_soft)) shift++; if (tcp_check_oom(sk, shift)) { @@ -146,7 +146,7 @@ static int tcp_orphan_retries(struct sock *sk, bool alive) int retries = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_orphan_retries); /* May be zero. */ /* We know from an ICMP that something is wrong. */ - if (sk->sk_err_soft && !alive) + if (READ_ONCE(sk->sk_err_soft) && !alive) retries = 0; /* However, if socket sent something recently, select some safe diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index c605d171eb2d..aa32afd871ee 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -578,12 +578,12 @@ struct sock *udp4_lib_lookup(struct net *net, __be32 saddr, __be16 sport, EXPORT_SYMBOL_GPL(udp4_lib_lookup); #endif -static inline bool __udp_is_mcast_sock(struct net *net, struct sock *sk, +static inline bool __udp_is_mcast_sock(struct net *net, const struct sock *sk, __be16 loc_port, __be32 loc_addr, __be16 rmt_port, __be32 rmt_addr, int dif, int sdif, unsigned short hnum) { - struct inet_sock *inet = inet_sk(sk); + const struct inet_sock *inet = inet_sk(sk); if (!net_eq(sock_net(sk), net) || udp_sk(sk)->udp_port_hash != hnum || @@ -1531,10 +1531,21 @@ static void busylock_release(spinlock_t *busy) spin_unlock(busy); } +static int udp_rmem_schedule(struct sock *sk, int size) +{ + int delta; + + delta = size - sk->sk_forward_alloc; + if (delta > 0 && !__sk_mem_schedule(sk, delta, SK_MEM_RECV)) + return -ENOBUFS; + + return 0; +} + int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb) { struct sk_buff_head *list = &sk->sk_receive_queue; - int rmem, delta, amt, err = -ENOMEM; + int rmem, err = -ENOMEM; spinlock_t *busy = NULL; int size; @@ -1567,16 +1578,10 @@ int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb) goto uncharge_drop; spin_lock(&list->lock); - if (size >= sk->sk_forward_alloc) { - amt = sk_mem_pages(size); - delta = amt << PAGE_SHIFT; - if (!__sk_mem_raise_allocated(sk, delta, amt, SK_MEM_RECV)) { - err = -ENOBUFS; - spin_unlock(&list->lock); - goto uncharge_drop; - } - - sk->sk_forward_alloc += delta; + err = udp_rmem_schedule(sk, size); + if (err) { + spin_unlock(&list->lock); + goto uncharge_drop; } sk->sk_forward_alloc -= size; diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c index 3d0dfa6cf9f9..9403bbaf1b61 100644 --- a/net/ipv4/xfrm4_policy.c +++ b/net/ipv4/xfrm4_policy.c @@ -91,7 +91,6 @@ static int xfrm4_fill_dst(struct xfrm_dst *xdst, struct net_device *dev, xdst->u.rt.rt_gw6 = rt->rt_gw6; xdst->u.rt.rt_pmtu = rt->rt_pmtu; xdst->u.rt.rt_mtu_locked = rt->rt_mtu_locked; - INIT_LIST_HEAD(&xdst->u.rt.rt_uncached); rt_add_uncached_list(&xdst->u.rt); return 0; @@ -121,8 +120,7 @@ static void xfrm4_dst_destroy(struct dst_entry *dst) struct xfrm_dst *xdst = (struct xfrm_dst *)dst; dst_destroy_metrics_generic(dst); - if (xdst->u.rt.rt_uncached_list) - rt_del_uncached_list(&xdst->u.rt); + rt_del_uncached_list(&xdst->u.rt); xfrm_dst_destroy(xdst); } diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index faa47f9ea73a..3797917237d0 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -1034,7 +1034,7 @@ static int ipv6_add_addr_hash(struct net_device *dev, struct inet6_ifaddr *ifa) unsigned int hash = inet6_addr_hash(net, &ifa->addr); int err = 0; - spin_lock(&net->ipv6.addrconf_hash_lock); + spin_lock_bh(&net->ipv6.addrconf_hash_lock); /* Ignore adding duplicate addresses on an interface */ if (ipv6_chk_same_addr(net, &ifa->addr, dev, hash)) { @@ -1044,7 +1044,7 @@ static int ipv6_add_addr_hash(struct net_device *dev, struct inet6_ifaddr *ifa) hlist_add_head_rcu(&ifa->addr_lst, &net->ipv6.inet6_addr_lst[hash]); } - spin_unlock(&net->ipv6.addrconf_hash_lock); + spin_unlock_bh(&net->ipv6.addrconf_hash_lock); return err; } @@ -1139,15 +1139,15 @@ ipv6_add_addr(struct inet6_dev *idev, struct ifa6_config *cfg, /* For caller */ refcount_set(&ifa->refcnt, 1); - rcu_read_lock_bh(); + rcu_read_lock(); err = ipv6_add_addr_hash(idev->dev, ifa); if (err < 0) { - rcu_read_unlock_bh(); + rcu_read_unlock(); goto out; } - write_lock(&idev->lock); + write_lock_bh(&idev->lock); /* Add to inet6_dev unicast addr list. */ ipv6_link_dev_addr(idev, ifa); @@ -1158,9 +1158,9 @@ ipv6_add_addr(struct inet6_dev *idev, struct ifa6_config *cfg, } in6_ifa_hold(ifa); - write_unlock(&idev->lock); + write_unlock_bh(&idev->lock); - rcu_read_unlock_bh(); + rcu_read_unlock(); inet6addr_notifier_call_chain(NETDEV_UP, ifa); out: @@ -4223,7 +4223,8 @@ static void addrconf_dad_completed(struct inet6_ifaddr *ifp, bool bump_id, ipv6_accept_ra(ifp->idev) && ifp->idev->cnf.rtr_solicits != 0 && (dev->flags & IFF_LOOPBACK) == 0 && - (dev->type != ARPHRD_TUNNEL); + (dev->type != ARPHRD_TUNNEL) && + !netif_is_team_port(dev); read_unlock_bh(&ifp->idev->lock); /* While dad is in progress mld report's source address is in6_addrany. diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c index 38689bedfce7..2bbf13216a3d 100644 --- a/net/ipv6/af_inet6.c +++ b/net/ipv6/af_inet6.c @@ -845,7 +845,7 @@ int inet6_sk_rebuild_header(struct sock *sk) dst = ip6_dst_lookup_flow(sock_net(sk), sk, &fl6, final_p); if (IS_ERR(dst)) { sk->sk_route_caps = 0; - sk->sk_err_soft = -PTR_ERR(dst); + WRITE_ONCE(sk->sk_err_soft, -PTR_ERR(dst)); return PTR_ERR(dst); } @@ -952,6 +952,7 @@ static int __net_init inet6_net_init(struct net *net) net->ipv6.sysctl.icmpv6_echo_ignore_all = 0; net->ipv6.sysctl.icmpv6_echo_ignore_multicast = 0; net->ipv6.sysctl.icmpv6_echo_ignore_anycast = 0; + net->ipv6.sysctl.icmpv6_error_anycast_as_unicast = 0; /* By default, rate limit error messages. * Except for pmtu discovery, it would break it. diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c index 1f53f2a74480..9edf1f45b1ed 100644 --- a/net/ipv6/icmp.c +++ b/net/ipv6/icmp.c @@ -362,9 +362,10 @@ static struct dst_entry *icmpv6_route_lookup(struct net *net, /* * We won't send icmp if the destination is known - * anycast. + * anycast unless we need to treat anycast as unicast. */ - if (ipv6_anycast_destination(dst, &fl6->daddr)) { + if (!READ_ONCE(net->ipv6.sysctl.icmpv6_error_anycast_as_unicast) && + ipv6_anycast_destination(dst, &fl6->daddr)) { net_dbg_ratelimited("icmp6_send: acast source\n"); dst_release(dst); return ERR_PTR(-EINVAL); @@ -1195,6 +1196,15 @@ static struct ctl_table ipv6_icmp_table_template[] = { .mode = 0644, .proc_handler = proc_do_large_bitmap, }, + { + .procname = "error_anycast_as_unicast", + .data = &init_net.ipv6.sysctl.icmpv6_error_anycast_as_unicast, + .maxlen = sizeof(u8), + .mode = 0644, + .proc_handler = proc_dou8vec_minmax, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_ONE, + }, { }, }; @@ -1212,6 +1222,7 @@ struct ctl_table * __net_init ipv6_icmp_sysctl_init(struct net *net) table[2].data = &net->ipv6.sysctl.icmpv6_echo_ignore_multicast; table[3].data = &net->ipv6.sysctl.icmpv6_echo_ignore_anycast; table[4].data = &net->ipv6.sysctl.icmpv6_ratemask_ptr; + table[5].data = &net->ipv6.sysctl.icmpv6_error_anycast_as_unicast; } return table; } diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c index 5a9f4d722f35..0c50dcd35fe8 100644 --- a/net/ipv6/inet6_connection_sock.c +++ b/net/ipv6/inet6_connection_sock.c @@ -120,7 +120,7 @@ int inet6_csk_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl_unused dst = inet6_csk_route_socket(sk, &fl6); if (IS_ERR(dst)) { - sk->sk_err_soft = -PTR_ERR(dst); + WRITE_ONCE(sk->sk_err_soft, -PTR_ERR(dst)); sk->sk_route_caps = 0; kfree_skb(skb); return PTR_ERR(dst); diff --git a/net/ipv6/ip6_flowlabel.c b/net/ipv6/ip6_flowlabel.c index 18481eb76a0a..b3ca4beb4405 100644 --- a/net/ipv6/ip6_flowlabel.c +++ b/net/ipv6/ip6_flowlabel.c @@ -58,18 +58,18 @@ DEFINE_STATIC_KEY_DEFERRED_FALSE(ipv6_flowlabel_exclusive, HZ); EXPORT_SYMBOL(ipv6_flowlabel_exclusive); #define for_each_fl_rcu(hash, fl) \ - for (fl = rcu_dereference_bh(fl_ht[(hash)]); \ + for (fl = rcu_dereference(fl_ht[(hash)]); \ fl != NULL; \ - fl = rcu_dereference_bh(fl->next)) + fl = rcu_dereference(fl->next)) #define for_each_fl_continue_rcu(fl) \ - for (fl = rcu_dereference_bh(fl->next); \ + for (fl = rcu_dereference(fl->next); \ fl != NULL; \ - fl = rcu_dereference_bh(fl->next)) + fl = rcu_dereference(fl->next)) #define for_each_sk_fl_rcu(np, sfl) \ - for (sfl = rcu_dereference_bh(np->ipv6_fl_list); \ + for (sfl = rcu_dereference(np->ipv6_fl_list); \ sfl != NULL; \ - sfl = rcu_dereference_bh(sfl->next)) + sfl = rcu_dereference(sfl->next)) static inline struct ip6_flowlabel *__fl_lookup(struct net *net, __be32 label) { @@ -86,11 +86,11 @@ static struct ip6_flowlabel *fl_lookup(struct net *net, __be32 label) { struct ip6_flowlabel *fl; - rcu_read_lock_bh(); + rcu_read_lock(); fl = __fl_lookup(net, label); if (fl && !atomic_inc_not_zero(&fl->users)) fl = NULL; - rcu_read_unlock_bh(); + rcu_read_unlock(); return fl; } @@ -217,6 +217,7 @@ static struct ip6_flowlabel *fl_intern(struct net *net, fl->label = label & IPV6_FLOWLABEL_MASK; + rcu_read_lock(); spin_lock_bh(&ip6_fl_lock); if (label == 0) { for (;;) { @@ -240,6 +241,7 @@ static struct ip6_flowlabel *fl_intern(struct net *net, if (lfl) { atomic_inc(&lfl->users); spin_unlock_bh(&ip6_fl_lock); + rcu_read_unlock(); return lfl; } } @@ -249,6 +251,7 @@ static struct ip6_flowlabel *fl_intern(struct net *net, rcu_assign_pointer(fl_ht[FL_HASH(fl->label)], fl); atomic_inc(&fl_size); spin_unlock_bh(&ip6_fl_lock); + rcu_read_unlock(); return NULL; } @@ -263,17 +266,17 @@ struct ip6_flowlabel *__fl6_sock_lookup(struct sock *sk, __be32 label) label &= IPV6_FLOWLABEL_MASK; - rcu_read_lock_bh(); + rcu_read_lock(); for_each_sk_fl_rcu(np, sfl) { struct ip6_flowlabel *fl = sfl->fl; if (fl->label == label && atomic_inc_not_zero(&fl->users)) { fl->lastuse = jiffies; - rcu_read_unlock_bh(); + rcu_read_unlock(); return fl; } } - rcu_read_unlock_bh(); + rcu_read_unlock(); return NULL; } EXPORT_SYMBOL_GPL(__fl6_sock_lookup); @@ -475,10 +478,10 @@ static int mem_check(struct sock *sk) if (room > FL_MAX_SIZE - FL_MAX_PER_SOCK) return 0; - rcu_read_lock_bh(); + rcu_read_lock(); for_each_sk_fl_rcu(np, sfl) count++; - rcu_read_unlock_bh(); + rcu_read_unlock(); if (room <= 0 || ((count >= FL_MAX_PER_SOCK || @@ -515,7 +518,7 @@ int ipv6_flowlabel_opt_get(struct sock *sk, struct in6_flowlabel_req *freq, return 0; } - rcu_read_lock_bh(); + rcu_read_lock(); for_each_sk_fl_rcu(np, sfl) { if (sfl->fl->label == (np->flow_label & IPV6_FLOWLABEL_MASK)) { @@ -527,11 +530,11 @@ int ipv6_flowlabel_opt_get(struct sock *sk, struct in6_flowlabel_req *freq, freq->flr_linger = sfl->fl->linger / HZ; spin_unlock_bh(&ip6_fl_lock); - rcu_read_unlock_bh(); + rcu_read_unlock(); return 0; } } - rcu_read_unlock_bh(); + rcu_read_unlock(); return -ENOENT; } @@ -581,16 +584,16 @@ static int ipv6_flowlabel_renew(struct sock *sk, struct in6_flowlabel_req *freq) struct ipv6_fl_socklist *sfl; int err; - rcu_read_lock_bh(); + rcu_read_lock(); for_each_sk_fl_rcu(np, sfl) { if (sfl->fl->label == freq->flr_label) { err = fl6_renew(sfl->fl, freq->flr_linger, freq->flr_expires); - rcu_read_unlock_bh(); + rcu_read_unlock(); return err; } } - rcu_read_unlock_bh(); + rcu_read_unlock(); if (freq->flr_share == IPV6_FL_S_NONE && ns_capable(net->user_ns, CAP_NET_ADMIN)) { @@ -641,11 +644,11 @@ static int ipv6_flowlabel_get(struct sock *sk, struct in6_flowlabel_req *freq, if (freq->flr_label) { err = -EEXIST; - rcu_read_lock_bh(); + rcu_read_lock(); for_each_sk_fl_rcu(np, sfl) { if (sfl->fl->label == freq->flr_label) { if (freq->flr_flags & IPV6_FL_F_EXCL) { - rcu_read_unlock_bh(); + rcu_read_unlock(); goto done; } fl1 = sfl->fl; @@ -654,7 +657,7 @@ static int ipv6_flowlabel_get(struct sock *sk, struct in6_flowlabel_req *freq, break; } } - rcu_read_unlock_bh(); + rcu_read_unlock(); if (!fl1) fl1 = fl_lookup(net, freq->flr_label); @@ -809,7 +812,7 @@ static void *ip6fl_seq_start(struct seq_file *seq, loff_t *pos) state->pid_ns = proc_pid_ns(file_inode(seq->file)->i_sb); - rcu_read_lock_bh(); + rcu_read_lock(); return *pos ? ip6fl_get_idx(seq, *pos - 1) : SEQ_START_TOKEN; } @@ -828,7 +831,7 @@ static void *ip6fl_seq_next(struct seq_file *seq, void *v, loff_t *pos) static void ip6fl_seq_stop(struct seq_file *seq, void *v) __releases(RCU) { - rcu_read_unlock_bh(); + rcu_read_unlock(); } static int ip6fl_seq_show(struct seq_file *seq, void *v) diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c index e1ebf5e42ebe..d94041bb4287 100644 --- a/net/ipv6/ip6_input.c +++ b/net/ipv6/ip6_input.c @@ -404,10 +404,6 @@ resubmit_final: /* Only do this once for first final protocol */ have_final = true; - /* Free reference early: we don't need it any more, - and it may hold ip_conntrack module loaded - indefinitely. */ - nf_reset_ct(skb); skb_postpull_rcsum(skb, skb_network_header(skb), skb_network_header_len(skb)); @@ -430,10 +426,12 @@ resubmit_final: goto discard; } } - if (!(ipprot->flags & INET6_PROTO_NOPOLICY) && - !xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) { - SKB_DR_SET(reason, XFRM_POLICY); - goto discard; + if (!(ipprot->flags & INET6_PROTO_NOPOLICY)) { + if (!xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) { + SKB_DR_SET(reason, XFRM_POLICY); + goto discard; + } + nf_reset_ct(skb); } ret = INDIRECT_CALL_2(ipprot->handler, tcp_v6_rcv, udpv6_rcv, diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 95a55c6630ad..9554cf46ed88 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -116,7 +116,7 @@ static int ip6_finish_output2(struct net *net, struct sock *sk, struct sk_buff * return res; } - rcu_read_lock_bh(); + rcu_read_lock(); nexthop = rt6_nexthop((struct rt6_info *)dst, daddr); neigh = __ipv6_neigh_lookup_noref(dev, nexthop); @@ -124,7 +124,7 @@ static int ip6_finish_output2(struct net *net, struct sock *sk, struct sk_buff * if (unlikely(!neigh)) neigh = __neigh_create(&nd_tbl, nexthop, dev, false); if (IS_ERR(neigh)) { - rcu_read_unlock_bh(); + rcu_read_unlock(); IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTNOROUTES); kfree_skb_reason(skb, SKB_DROP_REASON_NEIGH_CREATEFAIL); return -EINVAL; @@ -132,7 +132,7 @@ static int ip6_finish_output2(struct net *net, struct sock *sk, struct sk_buff * } sock_confirm_neigh(skb, neigh); ret = neigh_output(neigh, skb, false); - rcu_read_unlock_bh(); + rcu_read_unlock(); return ret; } @@ -1150,11 +1150,11 @@ static int ip6_dst_lookup_tail(struct net *net, const struct sock *sk, * dst entry of the nexthop router */ rt = (struct rt6_info *) *dst; - rcu_read_lock_bh(); + rcu_read_lock(); n = __ipv6_neigh_lookup_noref(rt->dst.dev, rt6_nexthop(rt, &fl6->daddr)); - err = n && !(n->nud_state & NUD_VALID) ? -EINVAL : 0; - rcu_read_unlock_bh(); + err = n && !(READ_ONCE(n->nud_state) & NUD_VALID) ? -EINVAL : 0; + rcu_read_unlock(); if (err) { struct inet6_ifaddr *ifp; @@ -1500,7 +1500,7 @@ static int __ip6_append_data(struct sock *sk, mtu = cork->gso_size ? IP6_MAX_MTU : cork->fragsize; orig_mtu = mtu; - if (cork->tx_flags & SKBTX_ANY_SW_TSTAMP && + if (cork->tx_flags & SKBTX_ANY_TSTAMP && sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID) tskey = atomic_inc_return(&sk->sk_tskey) - 1; diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c index 2917dd8d198c..ae818ff46224 100644 --- a/net/ipv6/ipv6_sockglue.c +++ b/net/ipv6/ipv6_sockglue.c @@ -716,6 +716,7 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname, goto done; msg.msg_controllen = optlen; + msg.msg_control_is_user = false; msg.msg_control = (void *)(opt+1); ipc6.opt = opt; diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c index 1c02160cf7a4..714cdc9e2b8e 100644 --- a/net/ipv6/mcast.c +++ b/net/ipv6/mcast.c @@ -627,12 +627,12 @@ int ip6_mc_msfget(struct sock *sk, struct group_filter *gsf, return 0; } -bool inet6_mc_check(struct sock *sk, const struct in6_addr *mc_addr, +bool inet6_mc_check(const struct sock *sk, const struct in6_addr *mc_addr, const struct in6_addr *src_addr) { - struct ipv6_pinfo *np = inet6_sk(sk); - struct ipv6_mc_socklist *mc; - struct ip6_sf_socklist *psl; + const struct ipv6_pinfo *np = inet6_sk(sk); + const struct ipv6_mc_socklist *mc; + const struct ip6_sf_socklist *psl; bool rv = true; rcu_read_lock(); diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c index c4be62c99f73..18634ebd20a4 100644 --- a/net/ipv6/ndisc.c +++ b/net/ipv6/ndisc.c @@ -745,7 +745,7 @@ static void ndisc_solicit(struct neighbour *neigh, struct sk_buff *skb) saddr = &ipv6_hdr(skb)->saddr; probes -= NEIGH_VAR(neigh->parms, UCAST_PROBES); if (probes < 0) { - if (!(neigh->nud_state & NUD_VALID)) { + if (!(READ_ONCE(neigh->nud_state) & NUD_VALID)) { ND_PRINTK(1, dbg, "%s: trying to ucast probe in NUD_INVALID: %pI6\n", __func__, target); @@ -1090,7 +1090,7 @@ static enum skb_drop_reason ndisc_recv_na(struct sk_buff *skb) u8 old_flags = neigh->flags; struct net *net = dev_net(dev); - if (neigh->nud_state & NUD_FAILED) + if (READ_ONCE(neigh->nud_state) & NUD_FAILED) goto out; /* diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c index 0ce0ed17c758..fd9f049d6d41 100644 --- a/net/ipv6/netfilter/ip6_tables.c +++ b/net/ipv6/netfilter/ip6_tables.c @@ -18,7 +18,6 @@ #include <linux/netdevice.h> #include <linux/module.h> #include <linux/poison.h> -#include <linux/icmpv6.h> #include <net/ipv6.h> #include <net/compat.h> #include <linux/uaccess.h> @@ -35,7 +34,6 @@ MODULE_LICENSE("GPL"); MODULE_AUTHOR("Netfilter Core Team <coreteam@netfilter.org>"); MODULE_DESCRIPTION("IPv6 packet filter"); -MODULE_ALIAS("ip6t_icmp6"); void *ip6t_alloc_initial_table(const struct xt_table *info) { @@ -1805,52 +1803,6 @@ void ip6t_unregister_table_exit(struct net *net, const char *name) __ip6t_unregister_table(net, table); } -/* Returns 1 if the type and code is matched by the range, 0 otherwise */ -static inline bool -icmp6_type_code_match(u_int8_t test_type, u_int8_t min_code, u_int8_t max_code, - u_int8_t type, u_int8_t code, - bool invert) -{ - return (type == test_type && code >= min_code && code <= max_code) - ^ invert; -} - -static bool -icmp6_match(const struct sk_buff *skb, struct xt_action_param *par) -{ - const struct icmp6hdr *ic; - struct icmp6hdr _icmph; - const struct ip6t_icmp *icmpinfo = par->matchinfo; - - /* Must not be a fragment. */ - if (par->fragoff != 0) - return false; - - ic = skb_header_pointer(skb, par->thoff, sizeof(_icmph), &_icmph); - if (ic == NULL) { - /* We've been asked to examine this packet, and we - * can't. Hence, no choice but to drop. - */ - par->hotdrop = true; - return false; - } - - return icmp6_type_code_match(icmpinfo->type, - icmpinfo->code[0], - icmpinfo->code[1], - ic->icmp6_type, ic->icmp6_code, - !!(icmpinfo->invflags&IP6T_ICMP_INV)); -} - -/* Called when user tries to insert an entry of this type. */ -static int icmp6_checkentry(const struct xt_mtchk_param *par) -{ - const struct ip6t_icmp *icmpinfo = par->matchinfo; - - /* Must specify no unknown invflags */ - return (icmpinfo->invflags & ~IP6T_ICMP_INV) ? -EINVAL : 0; -} - /* The built-in targets: standard (NULL) and error. */ static struct xt_target ip6t_builtin_tg[] __read_mostly = { { @@ -1882,18 +1834,6 @@ static struct nf_sockopt_ops ip6t_sockopts = { .owner = THIS_MODULE, }; -static struct xt_match ip6t_builtin_mt[] __read_mostly = { - { - .name = "icmp6", - .match = icmp6_match, - .matchsize = sizeof(struct ip6t_icmp), - .checkentry = icmp6_checkentry, - .proto = IPPROTO_ICMPV6, - .family = NFPROTO_IPV6, - .me = THIS_MODULE, - }, -}; - static int __net_init ip6_tables_net_init(struct net *net) { return xt_proto_init(net, NFPROTO_IPV6); @@ -1921,19 +1861,14 @@ static int __init ip6_tables_init(void) ret = xt_register_targets(ip6t_builtin_tg, ARRAY_SIZE(ip6t_builtin_tg)); if (ret < 0) goto err2; - ret = xt_register_matches(ip6t_builtin_mt, ARRAY_SIZE(ip6t_builtin_mt)); - if (ret < 0) - goto err4; /* Register setsockopt */ ret = nf_register_sockopt(&ip6t_sockopts); if (ret < 0) - goto err5; + goto err4; return 0; -err5: - xt_unregister_matches(ip6t_builtin_mt, ARRAY_SIZE(ip6t_builtin_mt)); err4: xt_unregister_targets(ip6t_builtin_tg, ARRAY_SIZE(ip6t_builtin_tg)); err2: @@ -1946,7 +1881,6 @@ static void __exit ip6_tables_fini(void) { nf_unregister_sockopt(&ip6t_sockopts); - xt_unregister_matches(ip6t_builtin_mt, ARRAY_SIZE(ip6t_builtin_mt)); xt_unregister_targets(ip6t_builtin_tg, ARRAY_SIZE(ip6t_builtin_tg)); unregister_pernet_subsys(&ip6_tables_net_ops); } diff --git a/net/ipv6/ping.c b/net/ipv6/ping.c index 808983bc2ec9..c4835dbdfcff 100644 --- a/net/ipv6/ping.c +++ b/net/ipv6/ping.c @@ -237,7 +237,7 @@ static int ping_v6_seq_show(struct seq_file *seq, void *v) seq_puts(seq, IPV6_SEQ_DGRAM_HEADER); } else { int bucket = ((struct ping_iter_state *) seq->private)->bucket; - struct inet_sock *inet = inet_sk(v); + struct inet_sock *inet = inet_sk((struct sock *)v); __u16 srcp = ntohs(inet->inet_sport); __u16 destp = ntohs(inet->inet_dport); ip6_dgram_sock_seq_show(seq, v, srcp, destp, bucket); diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c index a327aa481df4..7d0adb612bdd 100644 --- a/net/ipv6/raw.c +++ b/net/ipv6/raw.c @@ -64,7 +64,7 @@ struct raw_hashinfo raw_v6_hashinfo; EXPORT_SYMBOL_GPL(raw_v6_hashinfo); -bool raw_v6_match(struct net *net, struct sock *sk, unsigned short num, +bool raw_v6_match(struct net *net, const struct sock *sk, unsigned short num, const struct in6_addr *loc_addr, const struct in6_addr *rmt_addr, int dif, int sdif) { @@ -193,10 +193,8 @@ static bool ipv6_raw_deliver(struct sk_buff *skb, int nexthdr) struct sk_buff *clone = skb_clone(skb, GFP_ATOMIC); /* Not releasing hash table! */ - if (clone) { - nf_reset_ct(clone); + if (clone) rawv6_rcv(sk, clone); - } } } rcu_read_unlock(); @@ -389,6 +387,7 @@ int rawv6_rcv(struct sock *sk, struct sk_buff *skb) kfree_skb_reason(skb, SKB_DROP_REASON_XFRM_POLICY); return NET_RX_DROP; } + nf_reset_ct(skb); if (!rp->checksum) skb->ip_summed = CHECKSUM_UNNECESSARY; diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 0fdb03df2287..e3aec46bd466 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -139,20 +139,20 @@ void rt6_uncached_list_add(struct rt6_info *rt) { struct uncached_list *ul = raw_cpu_ptr(&rt6_uncached_list); - rt->rt6i_uncached_list = ul; + rt->dst.rt_uncached_list = ul; spin_lock_bh(&ul->lock); - list_add_tail(&rt->rt6i_uncached, &ul->head); + list_add_tail(&rt->dst.rt_uncached, &ul->head); spin_unlock_bh(&ul->lock); } void rt6_uncached_list_del(struct rt6_info *rt) { - if (!list_empty(&rt->rt6i_uncached)) { - struct uncached_list *ul = rt->rt6i_uncached_list; + if (!list_empty(&rt->dst.rt_uncached)) { + struct uncached_list *ul = rt->dst.rt_uncached_list; spin_lock_bh(&ul->lock); - list_del_init(&rt->rt6i_uncached); + list_del_init(&rt->dst.rt_uncached); spin_unlock_bh(&ul->lock); } } @@ -169,7 +169,7 @@ static void rt6_uncached_list_flush_dev(struct net_device *dev) continue; spin_lock_bh(&ul->lock); - list_for_each_entry_safe(rt, safe, &ul->head, rt6i_uncached) { + list_for_each_entry_safe(rt, safe, &ul->head, dst.rt_uncached) { struct inet6_dev *rt_idev = rt->rt6i_idev; struct net_device *rt_dev = rt->dst.dev; bool handled = false; @@ -188,7 +188,7 @@ static void rt6_uncached_list_flush_dev(struct net_device *dev) handled = true; } if (handled) - list_move(&rt->rt6i_uncached, + list_move(&rt->dst.rt_uncached, &ul->quarantine); } spin_unlock_bh(&ul->lock); @@ -293,7 +293,7 @@ static const struct fib6_info fib6_null_entry_template = { static const struct rt6_info ip6_null_entry_template = { .dst = { - .__refcnt = ATOMIC_INIT(1), + .__rcuref = RCUREF_INIT(1), .__use = 1, .obsolete = DST_OBSOLETE_FORCE_CHK, .error = -ENETUNREACH, @@ -307,7 +307,7 @@ static const struct rt6_info ip6_null_entry_template = { static const struct rt6_info ip6_prohibit_entry_template = { .dst = { - .__refcnt = ATOMIC_INIT(1), + .__rcuref = RCUREF_INIT(1), .__use = 1, .obsolete = DST_OBSOLETE_FORCE_CHK, .error = -EACCES, @@ -319,7 +319,7 @@ static const struct rt6_info ip6_prohibit_entry_template = { static const struct rt6_info ip6_blk_hole_entry_template = { .dst = { - .__refcnt = ATOMIC_INIT(1), + .__rcuref = RCUREF_INIT(1), .__use = 1, .obsolete = DST_OBSOLETE_FORCE_CHK, .error = -EINVAL, @@ -334,7 +334,6 @@ static const struct rt6_info ip6_blk_hole_entry_template = { static void rt6_info_init(struct rt6_info *rt) { memset_after(rt, 0, dst); - INIT_LIST_HEAD(&rt->rt6i_uncached); } /* allocate dst with ip6_dst_ops */ @@ -633,15 +632,15 @@ static void rt6_probe(struct fib6_nh *fib6_nh) nh_gw = &fib6_nh->fib_nh_gw6; dev = fib6_nh->fib_nh_dev; - rcu_read_lock_bh(); + rcu_read_lock(); last_probe = READ_ONCE(fib6_nh->last_probe); idev = __in6_dev_get(dev); neigh = __ipv6_neigh_lookup_noref(dev, nh_gw); if (neigh) { - if (neigh->nud_state & NUD_VALID) + if (READ_ONCE(neigh->nud_state) & NUD_VALID) goto out; - write_lock(&neigh->lock); + write_lock_bh(&neigh->lock); if (!(neigh->nud_state & NUD_VALID) && time_after(jiffies, neigh->updated + idev->cnf.rtr_probe_interval)) { @@ -649,7 +648,7 @@ static void rt6_probe(struct fib6_nh *fib6_nh) if (work) __neigh_set_probe_once(neigh); } - write_unlock(&neigh->lock); + write_unlock_bh(&neigh->lock); } else if (time_after(jiffies, last_probe + idev->cnf.rtr_probe_interval)) { work = kmalloc(sizeof(*work), GFP_ATOMIC); @@ -667,7 +666,7 @@ static void rt6_probe(struct fib6_nh *fib6_nh) } out: - rcu_read_unlock_bh(); + rcu_read_unlock(); } #else static inline void rt6_probe(struct fib6_nh *fib6_nh) @@ -683,25 +682,25 @@ static enum rt6_nud_state rt6_check_neigh(const struct fib6_nh *fib6_nh) enum rt6_nud_state ret = RT6_NUD_FAIL_HARD; struct neighbour *neigh; - rcu_read_lock_bh(); + rcu_read_lock(); neigh = __ipv6_neigh_lookup_noref(fib6_nh->fib_nh_dev, &fib6_nh->fib_nh_gw6); if (neigh) { - read_lock(&neigh->lock); - if (neigh->nud_state & NUD_VALID) + u8 nud_state = READ_ONCE(neigh->nud_state); + + if (nud_state & NUD_VALID) ret = RT6_NUD_SUCCEED; #ifdef CONFIG_IPV6_ROUTER_PREF - else if (!(neigh->nud_state & NUD_FAILED)) + else if (!(nud_state & NUD_FAILED)) ret = RT6_NUD_SUCCEED; else ret = RT6_NUD_FAIL_PROBE; #endif - read_unlock(&neigh->lock); } else { ret = IS_ENABLED(CONFIG_IPV6_ROUTER_PREF) ? RT6_NUD_SUCCEED : RT6_NUD_FAIL_DO_RR; } - rcu_read_unlock_bh(); + rcu_read_unlock(); return ret; } @@ -2638,7 +2637,7 @@ struct dst_entry *ip6_route_output_flags(struct net *net, dst = ip6_route_output_flags_noref(net, sk, fl6, flags); rt6 = (struct rt6_info *)dst; /* For dst cached in uncached_list, refcnt is already taken. */ - if (list_empty(&rt6->rt6i_uncached) && !dst_hold_safe(dst)) { + if (list_empty(&rt6->dst.rt_uncached) && !dst_hold_safe(dst)) { dst = &net->ipv6.ip6_null_entry->dst; dst_hold(dst); } @@ -2748,7 +2747,7 @@ INDIRECT_CALLABLE_SCOPE struct dst_entry *ip6_dst_check(struct dst_entry *dst, from = rcu_dereference(rt->from); if (from && (rt->rt6i_flags & RTF_PCPU || - unlikely(!list_empty(&rt->rt6i_uncached)))) + unlikely(!list_empty(&rt->dst.rt_uncached)))) dst_ret = rt6_dst_from_check(rt, from, cookie); else dst_ret = rt6_check(rt, from, cookie); @@ -6477,7 +6476,7 @@ static int __net_init ip6_route_net_init(struct net *net) net->ipv6.ip6_null_entry->dst.ops = &net->ipv6.ip6_dst_ops; dst_init_metrics(&net->ipv6.ip6_null_entry->dst, ip6_template_metrics, true); - INIT_LIST_HEAD(&net->ipv6.ip6_null_entry->rt6i_uncached); + INIT_LIST_HEAD(&net->ipv6.ip6_null_entry->dst.rt_uncached); #ifdef CONFIG_IPV6_MULTIPLE_TABLES net->ipv6.fib6_has_custom_rules = false; @@ -6489,7 +6488,7 @@ static int __net_init ip6_route_net_init(struct net *net) net->ipv6.ip6_prohibit_entry->dst.ops = &net->ipv6.ip6_dst_ops; dst_init_metrics(&net->ipv6.ip6_prohibit_entry->dst, ip6_template_metrics, true); - INIT_LIST_HEAD(&net->ipv6.ip6_prohibit_entry->rt6i_uncached); + INIT_LIST_HEAD(&net->ipv6.ip6_prohibit_entry->dst.rt_uncached); net->ipv6.ip6_blk_hole_entry = kmemdup(&ip6_blk_hole_entry_template, sizeof(*net->ipv6.ip6_blk_hole_entry), @@ -6499,7 +6498,7 @@ static int __net_init ip6_route_net_init(struct net *net) net->ipv6.ip6_blk_hole_entry->dst.ops = &net->ipv6.ip6_dst_ops; dst_init_metrics(&net->ipv6.ip6_blk_hole_entry->dst, ip6_template_metrics, true); - INIT_LIST_HEAD(&net->ipv6.ip6_blk_hole_entry->rt6i_uncached); + INIT_LIST_HEAD(&net->ipv6.ip6_blk_hole_entry->dst.rt_uncached); #ifdef CONFIG_IPV6_SUBTREES net->ipv6.fib6_routes_require_src = 0; #endif diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c index 70d81bba5093..063560e2cb1a 100644 --- a/net/ipv6/sit.c +++ b/net/ipv6/sit.c @@ -1024,7 +1024,7 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb, ttl = iph6->hop_limit; tos = INET_ECN_encapsulate(tos, ipv6_get_dsfield(iph6)); - if (ip_tunnel_encap(skb, tunnel, &protocol, &fl4) < 0) { + if (ip_tunnel_encap(skb, &tunnel->encap, &protocol, &fl4) < 0) { ip_rt_put(rt); goto tx_error; } diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 1bf93b61aa06..244cf86c4cbb 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -493,12 +493,13 @@ static int tcp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt, ipv6_icmp_error(sk, skb, err, th->dest, ntohl(info), (u8 *)th); if (!sock_owned_by_user(sk)) { - sk->sk_err = err; + WRITE_ONCE(sk->sk_err, err); sk_error_report(sk); /* Wake people up to see the error (see connect in sock.c) */ tcp_done(sk); - } else - sk->sk_err_soft = err; + } else { + WRITE_ONCE(sk->sk_err_soft, err); + } goto out; case TCP_LISTEN: break; @@ -512,11 +513,11 @@ static int tcp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt, } if (!sock_owned_by_user(sk) && np->recverr) { - sk->sk_err = err; + WRITE_ONCE(sk->sk_err, err); sk_error_report(sk); - } else - sk->sk_err_soft = err; - + } else { + WRITE_ONCE(sk->sk_err_soft, err); + } out: bh_unlock_sock(sk); sock_put(sk); @@ -1722,6 +1723,8 @@ process: if (drop_reason) goto discard_and_relse; + nf_reset_ct(skb); + if (tcp_filter(sk, skb)) { drop_reason = SKB_DROP_REASON_SOCKET_FILTER; goto discard_and_relse; diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index a675acfb901d..e5a337e6b970 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -704,6 +704,7 @@ static int udpv6_queue_rcv_one_skb(struct sock *sk, struct sk_buff *skb) drop_reason = SKB_DROP_REASON_XFRM_POLICY; goto drop; } + nf_reset_ct(skb); if (static_branch_unlikely(&udpv6_encap_needed_key) && up->encap_type) { int (*encap_rcv)(struct sock *sk, struct sk_buff *skb); @@ -805,12 +806,12 @@ static int udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb) return 0; } -static bool __udp_v6_is_mcast_sock(struct net *net, struct sock *sk, +static bool __udp_v6_is_mcast_sock(struct net *net, const struct sock *sk, __be16 loc_port, const struct in6_addr *loc_addr, __be16 rmt_port, const struct in6_addr *rmt_addr, int dif, int sdif, unsigned short hnum) { - struct inet_sock *inet = inet_sk(sk); + const struct inet_sock *inet = inet_sk(sk); if (!net_eq(sock_net(sk), net)) return false; @@ -1027,6 +1028,7 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable, if (!xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) goto discard; + nf_reset_ct(skb); if (udp_lib_checksum_complete(skb)) goto csum_error; @@ -1710,7 +1712,7 @@ int udp6_seq_show(struct seq_file *seq, void *v) seq_puts(seq, IPV6_SEQ_DGRAM_HEADER); } else { int bucket = ((struct udp_iter_state *)seq->private)->bucket; - struct inet_sock *inet = inet_sk(v); + const struct inet_sock *inet = inet_sk((const struct sock *)v); __u16 srcp = ntohs(inet->inet_sport); __u16 destp = ntohs(inet->inet_dport); __ip6_dgram_sock_seq_show(seq, v, srcp, destp, diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c index ea435eba3053..eecc5e59da17 100644 --- a/net/ipv6/xfrm6_policy.c +++ b/net/ipv6/xfrm6_policy.c @@ -89,7 +89,6 @@ static int xfrm6_fill_dst(struct xfrm_dst *xdst, struct net_device *dev, xdst->u.rt6.rt6i_gateway = rt->rt6i_gateway; xdst->u.rt6.rt6i_dst = rt->rt6i_dst; xdst->u.rt6.rt6i_src = rt->rt6i_src; - INIT_LIST_HEAD(&xdst->u.rt6.rt6i_uncached); rt6_uncached_list_add(&xdst->u.rt6); return 0; @@ -121,8 +120,7 @@ static void xfrm6_dst_destroy(struct dst_entry *dst) if (likely(xdst->u.rt6.rt6i_idev)) in6_dev_put(xdst->u.rt6.rt6i_idev); dst_destroy_metrics_generic(dst); - if (xdst->u.rt6.rt6i_uncached_list) - rt6_uncached_list_del(&xdst->u.rt6); + rt6_uncached_list_del(&xdst->u.rt6); xfrm_dst_destroy(xdst); } diff --git a/net/mac80211/agg-tx.c b/net/mac80211/agg-tx.c index f9514bacbd4a..3b651e7f5a73 100644 --- a/net/mac80211/agg-tx.c +++ b/net/mac80211/agg-tx.c @@ -554,6 +554,23 @@ void ieee80211_tx_ba_session_handle_start(struct sta_info *sta, int tid) ieee80211_send_addba_with_timeout(sta, tid_tx); } +void ieee80211_refresh_tx_agg_session_timer(struct ieee80211_sta *pubsta, + u16 tid) +{ + struct sta_info *sta = container_of(pubsta, struct sta_info, sta); + struct tid_ampdu_tx *tid_tx; + + if (WARN_ON_ONCE(tid >= IEEE80211_NUM_TIDS)) + return; + + tid_tx = rcu_dereference(sta->ampdu_mlme.tid_tx[tid]); + if (!tid_tx) + return; + + tid_tx->last_tx = jiffies; +} +EXPORT_SYMBOL(ieee80211_refresh_tx_agg_session_timer); + /* * After accepting the AddBA Response we activated a timer, * resetting it after each frame that we send. diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c index d3d861911ed6..7317e4a5d1ff 100644 --- a/net/mac80211/cfg.c +++ b/net/mac80211/cfg.c @@ -1084,6 +1084,23 @@ ieee80211_copy_mbssid_beacon(u8 *pos, struct cfg80211_mbssid_elems *dst, return offset; } +static int +ieee80211_copy_rnr_beacon(u8 *pos, struct cfg80211_rnr_elems *dst, + struct cfg80211_rnr_elems *src) +{ + int i, offset = 0; + + for (i = 0; i < src->cnt; i++) { + memcpy(pos + offset, src->elem[i].data, src->elem[i].len); + dst->elem[i].len = src->elem[i].len; + dst->elem[i].data = pos + offset; + offset += dst->elem[i].len; + } + dst->cnt = src->cnt; + + return offset; +} + static int ieee80211_assign_beacon(struct ieee80211_sub_if_data *sdata, struct ieee80211_link_data *link, struct cfg80211_beacon_data *params, @@ -1091,6 +1108,7 @@ static int ieee80211_assign_beacon(struct ieee80211_sub_if_data *sdata, const struct ieee80211_color_change_settings *cca) { struct cfg80211_mbssid_elems *mbssid = NULL; + struct cfg80211_rnr_elems *rnr = NULL; struct beacon_data *new, *old; int new_head_len, new_tail_len; int size, err; @@ -1122,11 +1140,21 @@ static int ieee80211_assign_beacon(struct ieee80211_sub_if_data *sdata, if (params->mbssid_ies) { mbssid = params->mbssid_ies; size += struct_size(new->mbssid_ies, elem, mbssid->cnt); - size += ieee80211_get_mbssid_beacon_len(mbssid); + if (params->rnr_ies) { + rnr = params->rnr_ies; + size += struct_size(new->rnr_ies, elem, rnr->cnt); + } + size += ieee80211_get_mbssid_beacon_len(mbssid, rnr, + mbssid->cnt); } else if (old && old->mbssid_ies) { mbssid = old->mbssid_ies; size += struct_size(new->mbssid_ies, elem, mbssid->cnt); - size += ieee80211_get_mbssid_beacon_len(mbssid); + if (old && old->rnr_ies) { + rnr = old->rnr_ies; + size += struct_size(new->rnr_ies, elem, rnr->cnt); + } + size += ieee80211_get_mbssid_beacon_len(mbssid, rnr, + mbssid->cnt); } new = kzalloc(size, GFP_KERNEL); @@ -1137,7 +1165,7 @@ static int ieee80211_assign_beacon(struct ieee80211_sub_if_data *sdata, /* * pointers go into the block we allocated, - * memory is | beacon_data | head | tail | mbssid_ies + * memory is | beacon_data | head | tail | mbssid_ies | rnr_ies */ new->head = ((u8 *) new) + sizeof(*new); new->tail = new->head + new_head_len; @@ -1149,7 +1177,13 @@ static int ieee80211_assign_beacon(struct ieee80211_sub_if_data *sdata, new->mbssid_ies = (void *)pos; pos += struct_size(new->mbssid_ies, elem, mbssid->cnt); - ieee80211_copy_mbssid_beacon(pos, new->mbssid_ies, mbssid); + pos += ieee80211_copy_mbssid_beacon(pos, new->mbssid_ies, + mbssid); + if (rnr) { + new->rnr_ies = (void *)pos; + pos += struct_size(new->rnr_ies, elem, rnr->cnt); + ieee80211_copy_rnr_beacon(pos, new->rnr_ies, rnr); + } /* update bssid_indicator */ link_conf->bssid_indicator = ilog2(__roundup_pow_of_two(mbssid->cnt + 1)); @@ -1252,7 +1286,15 @@ static int ieee80211_start_ap(struct wiphy *wiphy, struct net_device *dev, prev_beacon_int = link_conf->beacon_int; link_conf->beacon_int = params->beacon_interval; + if (params->ht_cap) + link_conf->ht_ldpc = + params->ht_cap->cap_info & + cpu_to_le16(IEEE80211_HT_CAP_LDPC_CODING); + if (params->vht_cap) { + link_conf->vht_ldpc = + params->vht_cap->vht_cap_info & + cpu_to_le32(IEEE80211_VHT_CAP_RXLDPC); link_conf->vht_su_beamformer = params->vht_cap->vht_cap_info & cpu_to_le32(IEEE80211_VHT_CAP_SU_BEAMFORMER_CAPABLE); @@ -1282,6 +1324,9 @@ static int ieee80211_start_ap(struct wiphy *wiphy, struct net_device *dev, } if (params->he_cap) { + link_conf->he_ldpc = + params->he_cap->phy_cap_info[1] & + IEEE80211_HE_PHY_CAP1_LDPC_CODING_IN_PAYLOAD; link_conf->he_su_beamformer = params->he_cap->phy_cap_info[3] & IEEE80211_HE_PHY_CAP3_SU_BEAMFORMER; @@ -1297,8 +1342,28 @@ static int ieee80211_start_ap(struct wiphy *wiphy, struct net_device *dev, } if (params->eht_cap) { + if (!link_conf->he_support) + return -EOPNOTSUPP; + + link_conf->eht_support = true; link_conf->eht_puncturing = params->punct_bitmap; changed |= BSS_CHANGED_EHT_PUNCTURING; + + link_conf->eht_su_beamformer = + params->eht_cap->fixed.phy_cap_info[0] & + IEEE80211_EHT_PHY_CAP0_SU_BEAMFORMER; + link_conf->eht_su_beamformee = + params->eht_cap->fixed.phy_cap_info[0] & + IEEE80211_EHT_PHY_CAP0_SU_BEAMFORMEE; + link_conf->eht_mu_beamformer = + params->eht_cap->fixed.phy_cap_info[7] & + (IEEE80211_EHT_PHY_CAP7_MU_BEAMFORMER_80MHZ | + IEEE80211_EHT_PHY_CAP7_MU_BEAMFORMER_160MHZ | + IEEE80211_EHT_PHY_CAP7_MU_BEAMFORMER_320MHZ); + } else { + link_conf->eht_su_beamformer = false; + link_conf->eht_su_beamformee = false; + link_conf->eht_mu_beamformer = false; } if (sdata->vif.type == NL80211_IFTYPE_AP && @@ -1480,6 +1545,7 @@ static void ieee80211_free_next_beacon(struct ieee80211_link_data *link) return; kfree(link->u.ap.next_beacon->mbssid_ies); + kfree(link->u.ap.next_beacon->rnr_ies); kfree(link->u.ap.next_beacon); link->u.ap.next_beacon = NULL; } @@ -1788,7 +1854,7 @@ static int sta_link_apply_parameters(struct ieee80211_local *local, (void *)params->he_6ghz_capa, link_sta); - if (params->eht_capa) + if (params->he_capa && params->eht_capa) ieee80211_eht_cap_ie_to_sta_eht_cap(sdata, sband, (u8 *)params->he_capa, params->he_capa_len, @@ -3380,8 +3446,12 @@ cfg80211_beacon_dup(struct cfg80211_beacon_data *beacon) len = beacon->head_len + beacon->tail_len + beacon->beacon_ies_len + beacon->proberesp_ies_len + beacon->assocresp_ies_len + - beacon->probe_resp_len + beacon->lci_len + beacon->civicloc_len + - ieee80211_get_mbssid_beacon_len(beacon->mbssid_ies); + beacon->probe_resp_len + beacon->lci_len + beacon->civicloc_len; + + if (beacon->mbssid_ies) + len += ieee80211_get_mbssid_beacon_len(beacon->mbssid_ies, + beacon->rnr_ies, + beacon->mbssid_ies->cnt); new_beacon = kzalloc(sizeof(*new_beacon) + len, GFP_KERNEL); if (!new_beacon) @@ -3396,6 +3466,18 @@ cfg80211_beacon_dup(struct cfg80211_beacon_data *beacon) kfree(new_beacon); return NULL; } + + if (beacon->rnr_ies && beacon->rnr_ies->cnt) { + new_beacon->rnr_ies = + kzalloc(struct_size(new_beacon->rnr_ies, + elem, beacon->rnr_ies->cnt), + GFP_KERNEL); + if (!new_beacon->rnr_ies) { + kfree(new_beacon->mbssid_ies); + kfree(new_beacon); + return NULL; + } + } } pos = (u8 *)(new_beacon + 1); @@ -3435,10 +3517,15 @@ cfg80211_beacon_dup(struct cfg80211_beacon_data *beacon) memcpy(pos, beacon->probe_resp, beacon->probe_resp_len); pos += beacon->probe_resp_len; } - if (beacon->mbssid_ies && beacon->mbssid_ies->cnt) + if (beacon->mbssid_ies && beacon->mbssid_ies->cnt) { pos += ieee80211_copy_mbssid_beacon(pos, new_beacon->mbssid_ies, beacon->mbssid_ies); + if (beacon->rnr_ies && beacon->rnr_ies->cnt) + pos += ieee80211_copy_rnr_beacon(pos, + new_beacon->rnr_ies, + beacon->rnr_ies); + } /* might copy -1, meaning no changes requested */ new_beacon->ftm_responder = beacon->ftm_responder; @@ -4905,6 +4992,22 @@ ieee80211_del_link_station(struct wiphy *wiphy, struct net_device *dev, return ret; } +static int ieee80211_set_hw_timestamp(struct wiphy *wiphy, + struct net_device *dev, + struct cfg80211_set_hw_timestamp *hwts) +{ + struct ieee80211_sub_if_data *sdata = IEEE80211_DEV_TO_SUB_IF(dev); + struct ieee80211_local *local = sdata->local; + + if (!local->ops->set_hw_timestamp) + return -EOPNOTSUPP; + + if (!check_sdata_in_driver(sdata)) + return -EIO; + + return local->ops->set_hw_timestamp(&local->hw, &sdata->vif, hwts); +} + const struct cfg80211_ops mac80211_config_ops = { .add_virtual_intf = ieee80211_add_iface, .del_virtual_intf = ieee80211_del_iface, @@ -5015,4 +5118,5 @@ const struct cfg80211_ops mac80211_config_ops = { .add_link_station = ieee80211_add_link_station, .mod_link_station = ieee80211_mod_link_station, .del_link_station = ieee80211_del_link_station, + .set_hw_timestamp = ieee80211_set_hw_timestamp, }; diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c index dfb9f55e2685..207f772bd8ce 100644 --- a/net/mac80211/debugfs.c +++ b/net/mac80211/debugfs.c @@ -673,10 +673,6 @@ void debugfs_hw_add(struct ieee80211_local *local) statsd = debugfs_create_dir("statistics", phyd); - /* if the dir failed, don't put all the other things into the root! */ - if (!statsd) - return; - #ifdef CONFIG_MAC80211_DEBUG_COUNTERS DEBUGFS_STATS_ADD(dot11TransmittedFragmentCount); DEBUGFS_STATS_ADD(dot11MulticastTransmittedFrameCount); diff --git a/net/mac80211/debugfs_netdev.c b/net/mac80211/debugfs_netdev.c index 0bac9af3ca96..b0cef37eb394 100644 --- a/net/mac80211/debugfs_netdev.c +++ b/net/mac80211/debugfs_netdev.c @@ -23,16 +23,16 @@ #include "driver-ops.h" static ssize_t ieee80211_if_read( - struct ieee80211_sub_if_data *sdata, + void *data, char __user *userbuf, size_t count, loff_t *ppos, - ssize_t (*format)(const struct ieee80211_sub_if_data *, char *, int)) + ssize_t (*format)(const void *, char *, int)) { char buf[200]; ssize_t ret = -EINVAL; read_lock(&dev_base_lock); - ret = (*format)(sdata, buf, sizeof(buf)); + ret = (*format)(data, buf, sizeof(buf)); read_unlock(&dev_base_lock); if (ret >= 0) @@ -42,10 +42,10 @@ static ssize_t ieee80211_if_read( } static ssize_t ieee80211_if_write( - struct ieee80211_sub_if_data *sdata, + void *data, const char __user *userbuf, size_t count, loff_t *ppos, - ssize_t (*write)(struct ieee80211_sub_if_data *, const char *, int)) + ssize_t (*write)(void *, const char *, int)) { char buf[64]; ssize_t ret; @@ -58,64 +58,64 @@ static ssize_t ieee80211_if_write( buf[count] = '\0'; rtnl_lock(); - ret = (*write)(sdata, buf, count); + ret = (*write)(data, buf, count); rtnl_unlock(); return ret; } -#define IEEE80211_IF_FMT(name, field, format_string) \ +#define IEEE80211_IF_FMT(name, type, field, format_string) \ static ssize_t ieee80211_if_fmt_##name( \ - const struct ieee80211_sub_if_data *sdata, char *buf, \ + const type *data, char *buf, \ int buflen) \ { \ - return scnprintf(buf, buflen, format_string, sdata->field); \ + return scnprintf(buf, buflen, format_string, data->field); \ } -#define IEEE80211_IF_FMT_DEC(name, field) \ - IEEE80211_IF_FMT(name, field, "%d\n") -#define IEEE80211_IF_FMT_HEX(name, field) \ - IEEE80211_IF_FMT(name, field, "%#x\n") -#define IEEE80211_IF_FMT_LHEX(name, field) \ - IEEE80211_IF_FMT(name, field, "%#lx\n") +#define IEEE80211_IF_FMT_DEC(name, type, field) \ + IEEE80211_IF_FMT(name, type, field, "%d\n") +#define IEEE80211_IF_FMT_HEX(name, type, field) \ + IEEE80211_IF_FMT(name, type, field, "%#x\n") +#define IEEE80211_IF_FMT_LHEX(name, type, field) \ + IEEE80211_IF_FMT(name, type, field, "%#lx\n") -#define IEEE80211_IF_FMT_HEXARRAY(name, field) \ +#define IEEE80211_IF_FMT_HEXARRAY(name, type, field) \ static ssize_t ieee80211_if_fmt_##name( \ - const struct ieee80211_sub_if_data *sdata, \ + const type *data, \ char *buf, int buflen) \ { \ char *p = buf; \ int i; \ - for (i = 0; i < sizeof(sdata->field); i++) { \ + for (i = 0; i < sizeof(data->field); i++) { \ p += scnprintf(p, buflen + buf - p, "%.2x ", \ - sdata->field[i]); \ + data->field[i]); \ } \ p += scnprintf(p, buflen + buf - p, "\n"); \ return p - buf; \ } -#define IEEE80211_IF_FMT_ATOMIC(name, field) \ +#define IEEE80211_IF_FMT_ATOMIC(name, type, field) \ static ssize_t ieee80211_if_fmt_##name( \ - const struct ieee80211_sub_if_data *sdata, \ + const type *data, \ char *buf, int buflen) \ { \ - return scnprintf(buf, buflen, "%d\n", atomic_read(&sdata->field));\ + return scnprintf(buf, buflen, "%d\n", atomic_read(&data->field));\ } -#define IEEE80211_IF_FMT_MAC(name, field) \ +#define IEEE80211_IF_FMT_MAC(name, type, field) \ static ssize_t ieee80211_if_fmt_##name( \ - const struct ieee80211_sub_if_data *sdata, char *buf, \ + const type *data, char *buf, \ int buflen) \ { \ - return scnprintf(buf, buflen, "%pM\n", sdata->field); \ + return scnprintf(buf, buflen, "%pM\n", data->field); \ } -#define IEEE80211_IF_FMT_JIFFIES_TO_MS(name, field) \ +#define IEEE80211_IF_FMT_JIFFIES_TO_MS(name, type, field) \ static ssize_t ieee80211_if_fmt_##name( \ - const struct ieee80211_sub_if_data *sdata, \ + const type *data, \ char *buf, int buflen) \ { \ return scnprintf(buf, buflen, "%d\n", \ - jiffies_to_msecs(sdata->field)); \ + jiffies_to_msecs(data->field)); \ } #define _IEEE80211_IF_FILE_OPS(name, _read, _write) \ @@ -126,43 +126,67 @@ static const struct file_operations name##_ops = { \ .llseek = generic_file_llseek, \ } -#define _IEEE80211_IF_FILE_R_FN(name) \ +#define _IEEE80211_IF_FILE_R_FN(name, type) \ static ssize_t ieee80211_if_read_##name(struct file *file, \ char __user *userbuf, \ size_t count, loff_t *ppos) \ { \ + ssize_t (*fn)(const void *, char *, int) = (void *) \ + ((ssize_t (*)(const type, char *, int)) \ + ieee80211_if_fmt_##name); \ return ieee80211_if_read(file->private_data, \ - userbuf, count, ppos, \ - ieee80211_if_fmt_##name); \ + userbuf, count, ppos, fn); \ } -#define _IEEE80211_IF_FILE_W_FN(name) \ +#define _IEEE80211_IF_FILE_W_FN(name, type) \ static ssize_t ieee80211_if_write_##name(struct file *file, \ const char __user *userbuf, \ size_t count, loff_t *ppos) \ { \ + ssize_t (*fn)(void *, const char *, int) = (void *) \ + ((ssize_t (*)(type, const char *, int)) \ + ieee80211_if_parse_##name); \ return ieee80211_if_write(file->private_data, userbuf, count, \ - ppos, ieee80211_if_parse_##name); \ + ppos, fn); \ } #define IEEE80211_IF_FILE_R(name) \ - _IEEE80211_IF_FILE_R_FN(name) \ + _IEEE80211_IF_FILE_R_FN(name, struct ieee80211_sub_if_data *) \ _IEEE80211_IF_FILE_OPS(name, ieee80211_if_read_##name, NULL) #define IEEE80211_IF_FILE_W(name) \ - _IEEE80211_IF_FILE_W_FN(name) \ + _IEEE80211_IF_FILE_W_FN(name, struct ieee80211_sub_if_data *) \ _IEEE80211_IF_FILE_OPS(name, NULL, ieee80211_if_write_##name) #define IEEE80211_IF_FILE_RW(name) \ - _IEEE80211_IF_FILE_R_FN(name) \ - _IEEE80211_IF_FILE_W_FN(name) \ + _IEEE80211_IF_FILE_R_FN(name, struct ieee80211_sub_if_data *) \ + _IEEE80211_IF_FILE_W_FN(name, struct ieee80211_sub_if_data *) \ _IEEE80211_IF_FILE_OPS(name, ieee80211_if_read_##name, \ ieee80211_if_write_##name) #define IEEE80211_IF_FILE(name, field, format) \ - IEEE80211_IF_FMT_##format(name, field) \ + IEEE80211_IF_FMT_##format(name, struct ieee80211_sub_if_data, field) \ IEEE80211_IF_FILE_R(name) +/* Same but with a link_ prefix in the ops variable name and different type */ +#define IEEE80211_IF_LINK_FILE_R(name) \ + _IEEE80211_IF_FILE_R_FN(name, struct ieee80211_link_data *) \ + _IEEE80211_IF_FILE_OPS(link_##name, ieee80211_if_read_##name, NULL) + +#define IEEE80211_IF_LINK_FILE_W(name) \ + _IEEE80211_IF_FILE_W_FN(name) \ + _IEEE80211_IF_FILE_OPS(link_##name, NULL, ieee80211_if_write_##name) + +#define IEEE80211_IF_LINK_FILE_RW(name) \ + _IEEE80211_IF_FILE_R_FN(name, struct ieee80211_link_data *) \ + _IEEE80211_IF_FILE_W_FN(name, struct ieee80211_link_data *) \ + _IEEE80211_IF_FILE_OPS(link_##name, ieee80211_if_read_##name, \ + ieee80211_if_write_##name) + +#define IEEE80211_IF_LINK_FILE(name, field, format) \ + IEEE80211_IF_FMT_##format(name, struct ieee80211_link_data, field) \ + IEEE80211_IF_LINK_FILE_R(name) + /* common attributes */ IEEE80211_IF_FILE(rc_rateidx_mask_2ghz, rc_rateidx_mask[NL80211_BAND_2GHZ], HEX); @@ -207,9 +231,9 @@ IEEE80211_IF_FILE_R(rc_rateidx_vht_mcs_mask_5ghz); IEEE80211_IF_FILE(flags, flags, HEX); IEEE80211_IF_FILE(state, state, LHEX); -IEEE80211_IF_FILE(txpower, vif.bss_conf.txpower, DEC); -IEEE80211_IF_FILE(ap_power_level, deflink.ap_power_level, DEC); -IEEE80211_IF_FILE(user_power_level, deflink.user_power_level, DEC); +IEEE80211_IF_LINK_FILE(txpower, conf->txpower, DEC); +IEEE80211_IF_LINK_FILE(ap_power_level, ap_power_level, DEC); +IEEE80211_IF_LINK_FILE(user_power_level, user_power_level, DEC); static ssize_t ieee80211_if_fmt_hw_queues(const struct ieee80211_sub_if_data *sdata, @@ -236,9 +260,10 @@ IEEE80211_IF_FILE(bssid, deflink.u.mgd.bssid, MAC); IEEE80211_IF_FILE(aid, vif.cfg.aid, DEC); IEEE80211_IF_FILE(beacon_timeout, u.mgd.beacon_timeout, JIFFIES_TO_MS); -static int ieee80211_set_smps(struct ieee80211_sub_if_data *sdata, +static int ieee80211_set_smps(struct ieee80211_link_data *link, enum ieee80211_smps_mode smps_mode) { + struct ieee80211_sub_if_data *sdata = link->sdata; struct ieee80211_local *local = sdata->local; int err; @@ -256,7 +281,7 @@ static int ieee80211_set_smps(struct ieee80211_sub_if_data *sdata, return -EOPNOTSUPP; sdata_lock(sdata); - err = __ieee80211_request_smps_mgd(sdata, &sdata->deflink, smps_mode); + err = __ieee80211_request_smps_mgd(link->sdata, link, smps_mode); sdata_unlock(sdata); return err; @@ -269,24 +294,24 @@ static const char *smps_modes[IEEE80211_SMPS_NUM_MODES] = { [IEEE80211_SMPS_DYNAMIC] = "dynamic", }; -static ssize_t ieee80211_if_fmt_smps(const struct ieee80211_sub_if_data *sdata, +static ssize_t ieee80211_if_fmt_smps(const struct ieee80211_link_data *link, char *buf, int buflen) { - if (sdata->vif.type == NL80211_IFTYPE_STATION) + if (link->sdata->vif.type == NL80211_IFTYPE_STATION) return snprintf(buf, buflen, "request: %s\nused: %s\n", - smps_modes[sdata->deflink.u.mgd.req_smps], - smps_modes[sdata->deflink.smps_mode]); + smps_modes[link->u.mgd.req_smps], + smps_modes[link->smps_mode]); return -EINVAL; } -static ssize_t ieee80211_if_parse_smps(struct ieee80211_sub_if_data *sdata, +static ssize_t ieee80211_if_parse_smps(struct ieee80211_link_data *link, const char *buf, int buflen) { enum ieee80211_smps_mode mode; for (mode = 0; mode < IEEE80211_SMPS_NUM_MODES; mode++) { if (strncmp(buf, smps_modes[mode], buflen) == 0) { - int err = ieee80211_set_smps(sdata, mode); + int err = ieee80211_set_smps(link, mode); if (!err) return buflen; return err; @@ -295,7 +320,7 @@ static ssize_t ieee80211_if_parse_smps(struct ieee80211_sub_if_data *sdata, return -EINVAL; } -IEEE80211_IF_FILE_RW(smps); +IEEE80211_IF_LINK_FILE_RW(smps); static ssize_t ieee80211_if_parse_tkip_mic_test( struct ieee80211_sub_if_data *sdata, const char *buf, int buflen) @@ -595,6 +620,8 @@ static ssize_t ieee80211_if_parse_active_links(struct ieee80211_sub_if_data *sda } IEEE80211_IF_FILE_RW(active_links); +IEEE80211_IF_LINK_FILE(addr, conf->addr, MAC); + #ifdef CONFIG_MAC80211_MESH IEEE80211_IF_FILE(estab_plinks, u.mesh.estab_plinks, ATOMIC); @@ -685,7 +712,6 @@ static void add_sta_files(struct ieee80211_sub_if_data *sdata) DEBUGFS_ADD(bssid); DEBUGFS_ADD(aid); DEBUGFS_ADD(beacon_timeout); - DEBUGFS_ADD_MODE(smps, 0600); DEBUGFS_ADD_MODE(tkip_mic_test, 0200); DEBUGFS_ADD_MODE(beacon_loss, 0200); DEBUGFS_ADD_MODE(uapsd_queues, 0600); @@ -698,7 +724,6 @@ static void add_sta_files(struct ieee80211_sub_if_data *sdata) static void add_ap_files(struct ieee80211_sub_if_data *sdata) { DEBUGFS_ADD(num_mcast_sta); - DEBUGFS_ADD_MODE(smps, 0600); DEBUGFS_ADD(num_sta_ps); DEBUGFS_ADD(dtim_count); DEBUGFS_ADD(num_buffered_multicast); @@ -789,9 +814,6 @@ static void add_files(struct ieee80211_sub_if_data *sdata) DEBUGFS_ADD(flags); DEBUGFS_ADD(state); - DEBUGFS_ADD(txpower); - DEBUGFS_ADD(user_power_level); - DEBUGFS_ADD(ap_power_level); if (sdata->vif.type != NL80211_IFTYPE_MONITOR) add_common_files(sdata); @@ -821,6 +843,31 @@ static void add_files(struct ieee80211_sub_if_data *sdata) } } +#undef DEBUGFS_ADD_MODE +#undef DEBUGFS_ADD + +#define DEBUGFS_ADD_MODE(dentry, name, mode) \ + debugfs_create_file(#name, mode, dentry, \ + link, &link_##name##_ops) + +#define DEBUGFS_ADD(dentry, name) DEBUGFS_ADD_MODE(dentry, name, 0400) + +static void add_link_files(struct ieee80211_link_data *link, + struct dentry *dentry) +{ + DEBUGFS_ADD(dentry, txpower); + DEBUGFS_ADD(dentry, user_power_level); + DEBUGFS_ADD(dentry, ap_power_level); + + switch (link->sdata->vif.type) { + case NL80211_IFTYPE_STATION: + DEBUGFS_ADD_MODE(dentry, smps, 0600); + break; + default: + break; + } +} + void ieee80211_debugfs_add_netdev(struct ieee80211_sub_if_data *sdata) { char buf[10+IFNAMSIZ]; @@ -831,6 +878,9 @@ void ieee80211_debugfs_add_netdev(struct ieee80211_sub_if_data *sdata) sdata->debugfs.subdir_stations = debugfs_create_dir("stations", sdata->vif.debugfs_dir); add_files(sdata); + + if (!(sdata->local->hw.wiphy->flags & WIPHY_FLAG_SUPPORTS_MLO)) + add_link_files(&sdata->deflink, sdata->vif.debugfs_dir); } void ieee80211_debugfs_remove_netdev(struct ieee80211_sub_if_data *sdata) @@ -856,3 +906,66 @@ void ieee80211_debugfs_rename_netdev(struct ieee80211_sub_if_data *sdata) sprintf(buf, "netdev:%s", sdata->name); debugfs_rename(dir->d_parent, dir, dir->d_parent, buf); } + +void ieee80211_link_debugfs_add(struct ieee80211_link_data *link) +{ + char link_dir_name[10]; + + if (WARN_ON(!link->sdata->vif.debugfs_dir)) + return; + + /* For now, this should not be called for non-MLO capable drivers */ + if (WARN_ON(!(link->sdata->local->hw.wiphy->flags & WIPHY_FLAG_SUPPORTS_MLO))) + return; + + snprintf(link_dir_name, sizeof(link_dir_name), + "link-%d", link->link_id); + + link->debugfs_dir = + debugfs_create_dir(link_dir_name, + link->sdata->vif.debugfs_dir); + + DEBUGFS_ADD(link->debugfs_dir, addr); + add_link_files(link, link->debugfs_dir); +} + +void ieee80211_link_debugfs_remove(struct ieee80211_link_data *link) +{ + if (!link->sdata->vif.debugfs_dir || !link->debugfs_dir) { + link->debugfs_dir = NULL; + return; + } + + if (link->debugfs_dir == link->sdata->vif.debugfs_dir) { + WARN_ON(link != &link->sdata->deflink); + link->debugfs_dir = NULL; + return; + } + + debugfs_remove_recursive(link->debugfs_dir); + link->debugfs_dir = NULL; +} + +void ieee80211_link_debugfs_drv_add(struct ieee80211_link_data *link) +{ + if (WARN_ON(!link->debugfs_dir)) + return; + + drv_link_add_debugfs(link->sdata->local, link->sdata, + link->conf, link->debugfs_dir); +} + +void ieee80211_link_debugfs_drv_remove(struct ieee80211_link_data *link) +{ + if (!link || !link->debugfs_dir) + return; + + if (WARN_ON(link->debugfs_dir == link->sdata->vif.debugfs_dir)) + return; + + /* Recreate the directory excluding the driver data */ + debugfs_remove_recursive(link->debugfs_dir); + link->debugfs_dir = NULL; + + ieee80211_link_debugfs_add(link); +} diff --git a/net/mac80211/debugfs_netdev.h b/net/mac80211/debugfs_netdev.h index a7e9d8d518f9..99e688dcabd6 100644 --- a/net/mac80211/debugfs_netdev.h +++ b/net/mac80211/debugfs_netdev.h @@ -10,6 +10,12 @@ void ieee80211_debugfs_add_netdev(struct ieee80211_sub_if_data *sdata); void ieee80211_debugfs_remove_netdev(struct ieee80211_sub_if_data *sdata); void ieee80211_debugfs_rename_netdev(struct ieee80211_sub_if_data *sdata); + +void ieee80211_link_debugfs_add(struct ieee80211_link_data *link); +void ieee80211_link_debugfs_remove(struct ieee80211_link_data *link); + +void ieee80211_link_debugfs_drv_add(struct ieee80211_link_data *link); +void ieee80211_link_debugfs_drv_remove(struct ieee80211_link_data *link); #else static inline void ieee80211_debugfs_add_netdev( struct ieee80211_sub_if_data *sdata) @@ -20,6 +26,16 @@ static inline void ieee80211_debugfs_remove_netdev( static inline void ieee80211_debugfs_rename_netdev( struct ieee80211_sub_if_data *sdata) {} + +static inline void ieee80211_link_debugfs_add(struct ieee80211_link_data *link) +{} +static inline void ieee80211_link_debugfs_remove(struct ieee80211_link_data *link) +{} + +static inline void ieee80211_link_debugfs_drv_add(struct ieee80211_link_data *link) +{} +static inline void ieee80211_link_debugfs_drv_remove(struct ieee80211_link_data *link) +{} #endif #endif /* __IEEE80211_DEBUGFS_NETDEV_H */ diff --git a/net/mac80211/driver-ops.c b/net/mac80211/driver-ops.c index cfb09e4aed4d..30cd0c905a24 100644 --- a/net/mac80211/driver-ops.c +++ b/net/mac80211/driver-ops.c @@ -8,6 +8,7 @@ #include "trace.h" #include "driver-ops.h" #include "debugfs_sta.h" +#include "debugfs_netdev.h" int drv_start(struct ieee80211_local *local) { @@ -477,6 +478,10 @@ int drv_change_vif_links(struct ieee80211_local *local, u16 old_links, u16 new_links, struct ieee80211_bss_conf *old[IEEE80211_MLD_MAX_NUM_LINKS]) { + struct ieee80211_link_data *link; + unsigned long links_to_add; + unsigned long links_to_rem; + unsigned int link_id; int ret = -EOPNOTSUPP; might_sleep(); @@ -487,13 +492,31 @@ int drv_change_vif_links(struct ieee80211_local *local, if (old_links == new_links) return 0; + links_to_add = ~old_links & new_links; + links_to_rem = old_links & ~new_links; + + for_each_set_bit(link_id, &links_to_rem, IEEE80211_MLD_MAX_NUM_LINKS) { + link = rcu_access_pointer(sdata->link[link_id]); + + ieee80211_link_debugfs_drv_remove(link); + } + trace_drv_change_vif_links(local, sdata, old_links, new_links); if (local->ops->change_vif_links) ret = local->ops->change_vif_links(&local->hw, &sdata->vif, old_links, new_links, old); trace_drv_return_int(local, ret); - return ret; + if (ret) + return ret; + + for_each_set_bit(link_id, &links_to_add, IEEE80211_MLD_MAX_NUM_LINKS) { + link = rcu_access_pointer(sdata->link[link_id]); + + ieee80211_link_debugfs_drv_add(link); + } + + return 0; } int drv_change_sta_links(struct ieee80211_local *local, diff --git a/net/mac80211/driver-ops.h b/net/mac80211/driver-ops.h index 5d13a3dfd366..45d3e53c7383 100644 --- a/net/mac80211/driver-ops.h +++ b/net/mac80211/driver-ops.h @@ -465,6 +465,22 @@ static inline void drv_sta_remove(struct ieee80211_local *local, } #ifdef CONFIG_MAC80211_DEBUGFS +static inline void drv_link_add_debugfs(struct ieee80211_local *local, + struct ieee80211_sub_if_data *sdata, + struct ieee80211_bss_conf *link_conf, + struct dentry *dir) +{ + might_sleep(); + + sdata = get_bss_sdata(sdata); + if (!check_sdata_in_driver(sdata)) + return; + + if (local->ops->link_add_debugfs) + local->ops->link_add_debugfs(&local->hw, &sdata->vif, + link_conf, dir); +} + static inline void drv_sta_add_debugfs(struct ieee80211_local *local, struct ieee80211_sub_if_data *sdata, struct ieee80211_sta *sta, @@ -633,6 +649,21 @@ static inline void drv_flush(struct ieee80211_local *local, trace_drv_return_void(local); } +static inline void drv_flush_sta(struct ieee80211_local *local, + struct ieee80211_sub_if_data *sdata, + struct sta_info *sta) +{ + might_sleep(); + + if (sdata && !check_sdata_in_driver(sdata)) + return; + + trace_drv_flush_sta(local, sdata, &sta->sta); + if (local->ops->flush_sta) + local->ops->flush_sta(&local->hw, &sdata->vif, &sta->sta); + trace_drv_return_void(local); +} + static inline void drv_channel_switch(struct ieee80211_local *local, struct ieee80211_sub_if_data *sdata, struct ieee80211_channel_switch *ch_switch) @@ -1486,6 +1517,23 @@ static inline int drv_net_fill_forward_path(struct ieee80211_local *local, return ret; } +static inline int drv_net_setup_tc(struct ieee80211_local *local, + struct ieee80211_sub_if_data *sdata, + struct net_device *dev, + enum tc_setup_type type, void *type_data) +{ + int ret = -EOPNOTSUPP; + + sdata = get_bss_sdata(sdata); + trace_drv_net_setup_tc(local, sdata, type); + if (local->ops->net_setup_tc) + ret = local->ops->net_setup_tc(&local->hw, &sdata->vif, dev, + type, type_data); + trace_drv_return_int(local, ret); + + return ret; +} + int drv_change_vif_links(struct ieee80211_local *local, struct ieee80211_sub_if_data *sdata, u16 old_links, u16 new_links, diff --git a/net/mac80211/drop.h b/net/mac80211/drop.h new file mode 100644 index 000000000000..49dc809cab29 --- /dev/null +++ b/net/mac80211/drop.h @@ -0,0 +1,56 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * mac80211 drop reason list + * + * Copyright (C) 2023 Intel Corporation + */ + +#ifndef MAC80211_DROP_H +#define MAC80211_DROP_H +#include <net/dropreason.h> + +typedef unsigned int __bitwise ieee80211_rx_result; + +#define MAC80211_DROP_REASONS_MONITOR(R) \ + R(RX_DROP_M_UNEXPECTED_4ADDR_FRAME) \ + R(RX_DROP_M_BAD_BCN_KEYIDX) \ + R(RX_DROP_M_BAD_MGMT_KEYIDX) \ +/* this line for the trailing \ - add before this */ + +#define MAC80211_DROP_REASONS_UNUSABLE(R) \ + R(RX_DROP_U_MIC_FAIL) \ + R(RX_DROP_U_REPLAY) \ + R(RX_DROP_U_BAD_MMIE) \ +/* this line for the trailing \ - add before this */ + +/* having two enums allows for checking ieee80211_rx_result use with sparse */ +enum ___mac80211_drop_reason { +/* if we get to the end of handlers with RX_CONTINUE this will be the reason */ + ___RX_CONTINUE = SKB_CONSUMED, + +/* this never gets used as an argument to kfree_skb_reason() */ + ___RX_QUEUED = SKB_NOT_DROPPED_YET, + +#define ENUM(x) ___ ## x, + ___RX_DROP_MONITOR = SKB_DROP_REASON_SUBSYS_MAC80211_MONITOR << + SKB_DROP_REASON_SUBSYS_SHIFT, + MAC80211_DROP_REASONS_MONITOR(ENUM) + + ___RX_DROP_UNUSABLE = SKB_DROP_REASON_SUBSYS_MAC80211_UNUSABLE << + SKB_DROP_REASON_SUBSYS_SHIFT, + MAC80211_DROP_REASONS_UNUSABLE(ENUM) +#undef ENUM +}; + +enum mac80211_drop_reason { + RX_CONTINUE = (__force ieee80211_rx_result)___RX_CONTINUE, + RX_QUEUED = (__force ieee80211_rx_result)___RX_QUEUED, + RX_DROP_MONITOR = (__force ieee80211_rx_result)___RX_DROP_MONITOR, + RX_DROP_UNUSABLE = (__force ieee80211_rx_result)___RX_DROP_UNUSABLE, +#define DEF(x) x = (__force ieee80211_rx_result)___ ## x, + MAC80211_DROP_REASONS_MONITOR(DEF) + MAC80211_DROP_REASONS_UNUSABLE(DEF) +#undef DEF +}; + +#endif /* MAC80211_DROP_H */ diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h index e082582e0aa2..a0a7839cb961 100644 --- a/net/mac80211/ieee80211_i.h +++ b/net/mac80211/ieee80211_i.h @@ -33,10 +33,12 @@ #include "key.h" #include "sta_info.h" #include "debug.h" +#include "drop.h" extern const struct cfg80211_ops mac80211_config_ops; struct ieee80211_local; +struct ieee80211_mesh_fast_tx; /* Maximum number of broadcast/multicast frames to buffer when some of the * associated stations are using power saving. */ @@ -169,13 +171,6 @@ struct ieee80211_tx_data { unsigned int flags; }; - -typedef unsigned __bitwise ieee80211_rx_result; -#define RX_CONTINUE ((__force ieee80211_rx_result) 0u) -#define RX_DROP_UNUSABLE ((__force ieee80211_rx_result) 1u) -#define RX_DROP_MONITOR ((__force ieee80211_rx_result) 2u) -#define RX_QUEUED ((__force ieee80211_rx_result) 3u) - /** * enum ieee80211_packet_rx_flags - packet RX flags * @IEEE80211_RX_AMSDU: a-MSDU packet @@ -269,6 +264,7 @@ struct beacon_data { u16 cntdwn_counter_offsets[IEEE80211_MAX_CNTDWN_COUNTERS_NUM]; u8 cntdwn_current_counter; struct cfg80211_mbssid_elems *mbssid_ies; + struct cfg80211_rnr_elems *rnr_ies; struct rcu_head rcu_head; }; @@ -656,6 +652,19 @@ struct mesh_table { atomic_t entries; /* Up to MAX_MESH_NEIGHBOURS */ }; +/** + * struct mesh_tx_cache - mesh fast xmit header cache + * + * @rht: hash table containing struct ieee80211_mesh_fast_tx, using skb DA as key + * @walk_head: linked list containing all ieee80211_mesh_fast_tx objects + * @walk_lock: lock protecting walk_head and rht + */ +struct mesh_tx_cache { + struct rhashtable rht; + struct hlist_head walk_head; + spinlock_t walk_lock; +}; + struct ieee80211_if_mesh { struct timer_list housekeeping_timer; struct timer_list mesh_path_timer; @@ -696,7 +705,7 @@ struct ieee80211_if_mesh { struct mesh_stats mshstats; struct mesh_config mshcfg; atomic_t estab_plinks; - u32 mesh_seqnum; + atomic_t mesh_seqnum; bool accepting_plinks; int num_gates; struct beacon_data __rcu *beacon; @@ -734,6 +743,7 @@ struct ieee80211_if_mesh { struct mesh_table mpp_paths; /* Store paths for MPP&MAP */ int mesh_paths_generation; int mpp_paths_generation; + struct mesh_tx_cache tx_cache; }; #ifdef CONFIG_MAC80211_MESH @@ -999,6 +1009,10 @@ struct ieee80211_link_data { struct ieee80211_tx_queue_params tx_conf[IEEE80211_NUM_ACS]; struct ieee80211_bss_conf *conf; + +#ifdef CONFIG_MAC80211_DEBUGFS + struct dentry *debugfs_dir; +#endif }; struct ieee80211_sub_if_data { @@ -1167,16 +1181,34 @@ ieee80211_vif_get_shift(struct ieee80211_vif *vif) } static inline int -ieee80211_get_mbssid_beacon_len(struct cfg80211_mbssid_elems *elems) +ieee80211_get_mbssid_beacon_len(struct cfg80211_mbssid_elems *elems, + struct cfg80211_rnr_elems *rnr_elems, + u8 i) { - int i, len = 0; + int len = 0; - if (!elems) + if (!elems || !elems->cnt || i > elems->cnt) return 0; + if (i < elems->cnt) { + len = elems->elem[i].len; + if (rnr_elems) { + len += rnr_elems->elem[i].len; + for (i = elems->cnt; i < rnr_elems->cnt; i++) + len += rnr_elems->elem[i].len; + } + return len; + } + + /* i == elems->cnt, calculate total length of all MBSSID elements */ for (i = 0; i < elems->cnt; i++) len += elems->elem[i].len; + if (rnr_elems) { + for (i = 0; i < rnr_elems->cnt; i++) + len += rnr_elems->elem[i].len; + } + return len; } @@ -1938,7 +1970,8 @@ void ieee80211_color_collision_detection_work(struct work_struct *work); /* interface handling */ #define MAC80211_SUPPORTED_FEATURES_TX (NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM | \ NETIF_F_HW_CSUM | NETIF_F_SG | \ - NETIF_F_HIGHDMA | NETIF_F_GSO_SOFTWARE) + NETIF_F_HIGHDMA | NETIF_F_GSO_SOFTWARE | \ + NETIF_F_HW_TC) #define MAC80211_SUPPORTED_FEATURES_RX (NETIF_F_RXCSUM) #define MAC80211_SUPPORTED_FEATURES (MAC80211_SUPPORTED_FEATURES_TX | \ MAC80211_SUPPORTED_FEATURES_RX) @@ -2016,6 +2049,13 @@ int ieee80211_tx_control_port(struct wiphy *wiphy, struct net_device *dev, int link_id, u64 *cookie); int ieee80211_probe_mesh_link(struct wiphy *wiphy, struct net_device *dev, const u8 *buf, size_t len); +void __ieee80211_xmit_fast(struct ieee80211_sub_if_data *sdata, + struct sta_info *sta, + struct ieee80211_fast_tx *fast_tx, + struct sk_buff *skb, bool ampdu, + const u8 *da, const u8 *sa); +void ieee80211_aggr_check(struct ieee80211_sub_if_data *sdata, + struct sta_info *sta, struct sk_buff *skb); /* HT */ void ieee80211_apply_htcap_overrides(struct ieee80211_sub_if_data *sdata, @@ -2433,6 +2473,8 @@ void ieee80211_ie_build_he_6ghz_cap(struct ieee80211_sub_if_data *sdata, enum ieee80211_smps_mode smps_mode, struct sk_buff *skb); u8 *ieee80211_ie_build_he_oper(u8 *pos, struct cfg80211_chan_def *chandef); +u8 *ieee80211_ie_build_eht_oper(u8 *pos, struct cfg80211_chan_def *chandef, + const struct ieee80211_sta_eht_cap *eht_cap); int ieee80211_parse_bitrates(enum nl80211_chan_width width, const struct ieee80211_supported_band *sband, const u8 *srates, int srates_len, u32 *rates); @@ -2448,6 +2490,7 @@ void ieee80211_add_s1g_capab_ie(struct ieee80211_sub_if_data *sdata, struct sk_buff *skb); void ieee80211_add_aid_request_ie(struct ieee80211_sub_if_data *sdata, struct sk_buff *skb); +u8 *ieee80211_ie_build_s1g_cap(u8 *pos, struct ieee80211_sta_s1g_cap *s1g_cap); /* channel management */ bool ieee80211_chandef_ht_oper(const struct ieee80211_ht_operation *ht_oper, diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c index 23ed13f15067..bd2c48870add 100644 --- a/net/mac80211/iface.c +++ b/net/mac80211/iface.c @@ -813,6 +813,15 @@ ieee80211_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) dev_fetch_sw_netstats(stats, dev->tstats); } +static int ieee80211_netdev_setup_tc(struct net_device *dev, + enum tc_setup_type type, void *type_data) +{ + struct ieee80211_sub_if_data *sdata = IEEE80211_DEV_TO_SUB_IF(dev); + struct ieee80211_local *local = sdata->local; + + return drv_net_setup_tc(local, sdata, dev, type, type_data); +} + static const struct net_device_ops ieee80211_dataif_ops = { .ndo_open = ieee80211_open, .ndo_stop = ieee80211_stop, @@ -821,6 +830,7 @@ static const struct net_device_ops ieee80211_dataif_ops = { .ndo_set_rx_mode = ieee80211_set_multicast_list, .ndo_set_mac_address = ieee80211_change_mac, .ndo_get_stats64 = ieee80211_get_stats64, + .ndo_setup_tc = ieee80211_netdev_setup_tc, }; static u16 ieee80211_monitor_select_queue(struct net_device *dev, @@ -929,6 +939,7 @@ static const struct net_device_ops ieee80211_dataif_8023_ops = { .ndo_set_mac_address = ieee80211_change_mac, .ndo_get_stats64 = ieee80211_get_stats64, .ndo_fill_forward_path = ieee80211_netdev_fill_forward_path, + .ndo_setup_tc = ieee80211_netdev_setup_tc, }; static bool ieee80211_iftype_supports_hdr_offload(enum nl80211_iftype iftype) diff --git a/net/mac80211/link.c b/net/mac80211/link.c index 8c8869cc1fb4..e82db88a47f8 100644 --- a/net/mac80211/link.c +++ b/net/mac80211/link.c @@ -10,6 +10,7 @@ #include "ieee80211_i.h" #include "driver-ops.h" #include "key.h" +#include "debugfs_netdev.h" void ieee80211_link_setup(struct ieee80211_link_data *link) { @@ -34,6 +35,7 @@ void ieee80211_link_init(struct ieee80211_sub_if_data *sdata, link->link_id = link_id; link->conf = link_conf; link_conf->link_id = link_id; + link_conf->vif = &sdata->vif; INIT_WORK(&link->csa_finalize_work, ieee80211_csa_finalize_work); @@ -60,6 +62,8 @@ void ieee80211_link_init(struct ieee80211_sub_if_data *sdata, default: WARN_ON(1); } + + ieee80211_link_debugfs_add(link); } } @@ -93,6 +97,7 @@ static void ieee80211_tear_down_links(struct ieee80211_sub_if_data *sdata, if (WARN_ON(!link)) continue; ieee80211_remove_link_keys(link, &keys); + ieee80211_link_debugfs_remove(link); ieee80211_link_stop(link); } diff --git a/net/mac80211/main.c b/net/mac80211/main.c index ddf2b7811c55..55cdfaef0f5d 100644 --- a/net/mac80211/main.c +++ b/net/mac80211/main.c @@ -22,6 +22,7 @@ #include <linux/bitmap.h> #include <linux/inetdevice.h> #include <net/net_namespace.h> +#include <net/dropreason.h> #include <net/cfg80211.h> #include <net/addrconf.h> @@ -1542,6 +1543,28 @@ void ieee80211_free_hw(struct ieee80211_hw *hw) } EXPORT_SYMBOL(ieee80211_free_hw); +static const char * const drop_reasons_monitor[] = { +#define V(x) #x, + [0] = "RX_DROP_MONITOR", + MAC80211_DROP_REASONS_MONITOR(V) +}; + +static struct drop_reason_list drop_reason_list_monitor = { + .reasons = drop_reasons_monitor, + .n_reasons = ARRAY_SIZE(drop_reasons_monitor), +}; + +static const char * const drop_reasons_unusable[] = { + [0] = "RX_DROP_UNUSABLE", + MAC80211_DROP_REASONS_UNUSABLE(V) +#undef V +}; + +static struct drop_reason_list drop_reason_list_unusable = { + .reasons = drop_reasons_unusable, + .n_reasons = ARRAY_SIZE(drop_reasons_unusable), +}; + static int __init ieee80211_init(void) { struct sk_buff *skb; @@ -1559,6 +1582,11 @@ static int __init ieee80211_init(void) if (ret) goto err_netdev; + drop_reasons_register_subsys(SKB_DROP_REASON_SUBSYS_MAC80211_MONITOR, + &drop_reason_list_monitor); + drop_reasons_register_subsys(SKB_DROP_REASON_SUBSYS_MAC80211_UNUSABLE, + &drop_reason_list_unusable); + return 0; err_netdev: rc80211_minstrel_exit(); @@ -1574,6 +1602,9 @@ static void __exit ieee80211_exit(void) ieee80211_iface_exit(); + drop_reasons_unregister_subsys(SKB_DROP_REASON_SUBSYS_MAC80211_MONITOR); + drop_reasons_unregister_subsys(SKB_DROP_REASON_SUBSYS_MAC80211_UNUSABLE); + rcu_barrier(); } diff --git a/net/mac80211/mesh.c b/net/mac80211/mesh.c index 5a99b8f6e465..f72333201903 100644 --- a/net/mac80211/mesh.c +++ b/net/mac80211/mesh.c @@ -10,6 +10,7 @@ #include <asm/unaligned.h> #include "ieee80211_i.h" #include "mesh.h" +#include "wme.h" #include "driver-ops.h" static int mesh_allocated; @@ -104,7 +105,7 @@ bool mesh_matches_local(struct ieee80211_sub_if_data *sdata, ieee80211_chandef_vht_oper(&sdata->local->hw, vht_cap_info, ie->vht_operation, ie->ht_operation, &sta_chan_def); - ieee80211_chandef_he_6ghz_oper(sdata, ie->he_operation, NULL, + ieee80211_chandef_he_6ghz_oper(sdata, ie->he_operation, ie->eht_operation, &sta_chan_def); if (!cfg80211_chandef_compatible(&sdata->vif.bss_conf.chandef, @@ -638,6 +639,65 @@ int mesh_add_he_6ghz_cap_ie(struct ieee80211_sub_if_data *sdata, return 0; } +int mesh_add_eht_cap_ie(struct ieee80211_sub_if_data *sdata, + struct sk_buff *skb, u8 ie_len) +{ + const struct ieee80211_sta_he_cap *he_cap; + const struct ieee80211_sta_eht_cap *eht_cap; + struct ieee80211_supported_band *sband; + u8 *pos; + + sband = ieee80211_get_sband(sdata); + if (!sband) + return -EINVAL; + + he_cap = ieee80211_get_he_iftype_cap(sband, NL80211_IFTYPE_MESH_POINT); + eht_cap = ieee80211_get_eht_iftype_cap(sband, NL80211_IFTYPE_MESH_POINT); + if (!he_cap || !eht_cap || + sdata->vif.bss_conf.chandef.width == NL80211_CHAN_WIDTH_20_NOHT || + sdata->vif.bss_conf.chandef.width == NL80211_CHAN_WIDTH_5 || + sdata->vif.bss_conf.chandef.width == NL80211_CHAN_WIDTH_10) + return 0; + + if (skb_tailroom(skb) < ie_len) + return -ENOMEM; + + pos = skb_put(skb, ie_len); + ieee80211_ie_build_eht_cap(pos, he_cap, eht_cap, pos + ie_len, false); + + return 0; +} + +int mesh_add_eht_oper_ie(struct ieee80211_sub_if_data *sdata, struct sk_buff *skb) +{ + const struct ieee80211_sta_eht_cap *eht_cap; + struct ieee80211_supported_band *sband; + u32 len; + u8 *pos; + + sband = ieee80211_get_sband(sdata); + if (!sband) + return -EINVAL; + + eht_cap = ieee80211_get_eht_iftype_cap(sband, NL80211_IFTYPE_MESH_POINT); + if (!eht_cap || + sdata->vif.bss_conf.chandef.width == NL80211_CHAN_WIDTH_20_NOHT || + sdata->vif.bss_conf.chandef.width == NL80211_CHAN_WIDTH_5 || + sdata->vif.bss_conf.chandef.width == NL80211_CHAN_WIDTH_10) + return 0; + + len = 2 + 1 + offsetof(struct ieee80211_eht_operation, optional) + + offsetof(struct ieee80211_eht_operation_info, optional); + + if (skb_tailroom(skb) < len) + return -ENOMEM; + + pos = skb_put(skb, len); + ieee80211_ie_build_eht_oper(pos, &sdata->vif.bss_conf.chandef, eht_cap); + + return 0; +} + static void ieee80211_mesh_path_timer(struct timer_list *t) { struct ieee80211_sub_if_data *sdata = @@ -696,6 +756,98 @@ ieee80211_mesh_update_bss_params(struct ieee80211_sub_if_data *sdata, if (he_oper) sdata->vif.bss_conf.he_oper.params = __le32_to_cpu(he_oper->he_oper_params); + + sdata->vif.bss_conf.eht_support = + !!ieee80211_get_eht_iftype_cap(sband, NL80211_IFTYPE_MESH_POINT); +} + +bool ieee80211_mesh_xmit_fast(struct ieee80211_sub_if_data *sdata, + struct sk_buff *skb, u32 ctrl_flags) +{ + struct ieee80211_if_mesh *ifmsh = &sdata->u.mesh; + struct ieee80211_mesh_fast_tx *entry; + struct ieee80211s_hdr *meshhdr; + u8 sa[ETH_ALEN] __aligned(2); + struct tid_ampdu_tx *tid_tx; + struct sta_info *sta; + bool copy_sa = false; + u16 ethertype; + u8 tid; + + if (ctrl_flags & IEEE80211_TX_CTRL_SKIP_MPATH_LOOKUP) + return false; + + if (ifmsh->mshcfg.dot11MeshNolearn) + return false; + + /* Add support for these cases later */ + if (ifmsh->ps_peers_light_sleep || ifmsh->ps_peers_deep_sleep) + return false; + + if (is_multicast_ether_addr(skb->data)) + return false; + + ethertype = (skb->data[12] << 8) | skb->data[13]; + if (ethertype < ETH_P_802_3_MIN) + return false; + + if (skb->sk && skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) + return false; + + if (skb->ip_summed == CHECKSUM_PARTIAL) { + skb_set_transport_header(skb, skb_checksum_start_offset(skb)); + if (skb_checksum_help(skb)) + return false; + } + + entry = mesh_fast_tx_get(sdata, skb->data); + if (!entry) + return false; + + if (skb_headroom(skb) < entry->hdrlen + entry->fast_tx.hdr_len) + return false; + + sta = rcu_dereference(entry->mpath->next_hop); + if (!sta) + return false; + + tid = skb->priority & IEEE80211_QOS_CTL_TAG1D_MASK; + tid_tx = rcu_dereference(sta->ampdu_mlme.tid_tx[tid]); + if (tid_tx) { + if (!test_bit(HT_AGG_STATE_OPERATIONAL, &tid_tx->state)) + return false; + if (tid_tx->timeout) + tid_tx->last_tx = jiffies; + } + + skb = skb_share_check(skb, GFP_ATOMIC); + if (!skb) + return true; + + skb_set_queue_mapping(skb, ieee80211_select_queue(sdata, sta, skb)); + + meshhdr = (struct ieee80211s_hdr *)entry->hdr; + if ((meshhdr->flags & MESH_FLAGS_AE) == MESH_FLAGS_AE_A5_A6) { + /* preserve SA from eth header for 6-addr frames */ + ether_addr_copy(sa, skb->data + ETH_ALEN); + copy_sa = true; + } + + memcpy(skb_push(skb, entry->hdrlen - 2 * ETH_ALEN), entry->hdr, + entry->hdrlen); + + meshhdr = (struct ieee80211s_hdr *)skb->data; + put_unaligned_le32(atomic_inc_return(&sdata->u.mesh.mesh_seqnum), + &meshhdr->seqnum); + meshhdr->ttl = sdata->u.mesh.mshcfg.dot11MeshTTL; + if (copy_sa) + ether_addr_copy(meshhdr->eaddr2, sa); + + skb_push(skb, 2 * ETH_ALEN); + __ieee80211_xmit_fast(sdata, sta, &entry->fast_tx, skb, tid_tx, + entry->mpath->dst, sdata->vif.addr); + + return true; } /** @@ -752,10 +904,8 @@ unsigned int ieee80211_new_mesh_header(struct ieee80211_sub_if_data *sdata, meshhdr->ttl = sdata->u.mesh.mshcfg.dot11MeshTTL; - /* FIXME: racy -- TX on multiple queues can be concurrent */ - put_unaligned(cpu_to_le32(sdata->u.mesh.mesh_seqnum), &meshhdr->seqnum); - sdata->u.mesh.mesh_seqnum++; - + put_unaligned_le32(atomic_inc_return(&sdata->u.mesh.mesh_seqnum), + &meshhdr->seqnum); if (addr4or5 && !addr6) { meshhdr->flags |= MESH_FLAGS_AE_A4; memcpy(meshhdr->eaddr1, addr4or5, ETH_ALEN); @@ -782,6 +932,8 @@ static void ieee80211_mesh_housekeeping(struct ieee80211_sub_if_data *sdata) changed = mesh_accept_plinks_update(sdata); ieee80211_mbss_info_change_notify(sdata, changed); + mesh_fast_tx_gc(sdata); + mod_timer(&ifmsh->housekeeping_timer, round_jiffies(jiffies + IEEE80211_MESH_HOUSEKEEPING_INTERVAL)); @@ -813,7 +965,7 @@ ieee80211_mesh_build_beacon(struct ieee80211_if_mesh *ifmsh) struct ieee80211_chanctx_conf *chanctx_conf; struct mesh_csa_settings *csa; enum nl80211_band band; - u8 ie_len_he_cap; + u8 ie_len_he_cap, ie_len_eht_cap; u8 *pos; struct ieee80211_sub_if_data *sdata; int hdr_len = offsetofend(struct ieee80211_mgmt, u.beacon); @@ -826,6 +978,8 @@ ieee80211_mesh_build_beacon(struct ieee80211_if_mesh *ifmsh) ie_len_he_cap = ieee80211_ie_len_he_cap(sdata, NL80211_IFTYPE_MESH_POINT); + ie_len_eht_cap = ieee80211_ie_len_eht_cap(sdata, + NL80211_IFTYPE_MESH_POINT); head_len = hdr_len + 2 + /* NULL SSID */ /* Channel Switch Announcement */ @@ -849,6 +1003,9 @@ ieee80211_mesh_build_beacon(struct ieee80211_if_mesh *ifmsh) 2 + 1 + sizeof(struct ieee80211_he_operation) + sizeof(struct ieee80211_he_6ghz_oper) + 2 + 1 + sizeof(struct ieee80211_he_6ghz_capa) + + ie_len_eht_cap + + 2 + 1 + offsetof(struct ieee80211_eht_operation, optional) + + offsetof(struct ieee80211_eht_operation_info, optional) + ifmsh->ie_len; bcn = kzalloc(sizeof(*bcn) + head_len + tail_len, GFP_KERNEL); @@ -969,6 +1126,8 @@ ieee80211_mesh_build_beacon(struct ieee80211_if_mesh *ifmsh) mesh_add_he_cap_ie(sdata, skb, ie_len_he_cap) || mesh_add_he_oper_ie(sdata, skb) || mesh_add_he_6ghz_cap_ie(sdata, skb) || + mesh_add_eht_cap_ie(sdata, skb, ie_len_eht_cap) || + mesh_add_eht_oper_ie(sdata, skb) || mesh_add_vendor_ies(sdata, skb)) goto out_free; diff --git a/net/mac80211/mesh.h b/net/mac80211/mesh.h index b2b717a78114..022f41292a05 100644 --- a/net/mac80211/mesh.h +++ b/net/mac80211/mesh.h @@ -122,11 +122,41 @@ struct mesh_path { u8 rann_snd_addr[ETH_ALEN]; u32 rann_metric; unsigned long last_preq_to_root; + unsigned long fast_tx_check; bool is_root; bool is_gate; u32 path_change_count; }; +#define MESH_FAST_TX_CACHE_MAX_SIZE 512 +#define MESH_FAST_TX_CACHE_THRESHOLD_SIZE 384 +#define MESH_FAST_TX_CACHE_TIMEOUT 8000 /* msecs */ + +/** + * struct ieee80211_mesh_fast_tx - cached mesh fast tx entry + * @rhash: rhashtable pointer + * @addr_key: The Ethernet DA which is the key for this entry + * @fast_tx: base fast_tx data + * @hdr: cached mesh and rfc1042 headers + * @hdrlen: length of mesh + rfc1042 + * @walk_list: list containing all the fast tx entries + * @mpath: mesh path corresponding to the Mesh DA + * @mppath: MPP entry corresponding to this DA + * @timestamp: Last used time of this entry + */ +struct ieee80211_mesh_fast_tx { + struct rhash_head rhash; + u8 addr_key[ETH_ALEN] __aligned(2); + + struct ieee80211_fast_tx fast_tx; + u8 hdr[sizeof(struct ieee80211s_hdr) + sizeof(rfc1042_header)]; + u16 hdrlen; + + struct mesh_path *mpath, *mppath; + struct hlist_node walk_list; + unsigned long timestamp; +}; + /* Recent multicast cache */ /* RMC_BUCKETS must be a power of 2, maximum 256 */ #define RMC_BUCKETS 256 @@ -204,6 +234,10 @@ int mesh_add_he_oper_ie(struct ieee80211_sub_if_data *sdata, struct sk_buff *skb); int mesh_add_he_6ghz_cap_ie(struct ieee80211_sub_if_data *sdata, struct sk_buff *skb); +int mesh_add_eht_cap_ie(struct ieee80211_sub_if_data *sdata, + struct sk_buff *skb, u8 ie_len); +int mesh_add_eht_oper_ie(struct ieee80211_sub_if_data *sdata, + struct sk_buff *skb); void mesh_rmc_free(struct ieee80211_sub_if_data *sdata); int mesh_rmc_init(struct ieee80211_sub_if_data *sdata); void ieee80211s_init(void); @@ -298,6 +332,20 @@ void mesh_path_discard_frame(struct ieee80211_sub_if_data *sdata, void mesh_path_tx_root_frame(struct ieee80211_sub_if_data *sdata); bool mesh_action_is_path_sel(struct ieee80211_mgmt *mgmt); +struct ieee80211_mesh_fast_tx * +mesh_fast_tx_get(struct ieee80211_sub_if_data *sdata, const u8 *addr); +bool ieee80211_mesh_xmit_fast(struct ieee80211_sub_if_data *sdata, + struct sk_buff *skb, u32 ctrl_flags); +void mesh_fast_tx_cache(struct ieee80211_sub_if_data *sdata, + struct sk_buff *skb, struct mesh_path *mpath); +void mesh_fast_tx_gc(struct ieee80211_sub_if_data *sdata); +void mesh_fast_tx_flush_addr(struct ieee80211_sub_if_data *sdata, + const u8 *addr); +void mesh_fast_tx_flush_mpath(struct mesh_path *mpath); +void mesh_fast_tx_flush_sta(struct ieee80211_sub_if_data *sdata, + struct sta_info *sta); +void mesh_path_refresh(struct ieee80211_sub_if_data *sdata, + struct mesh_path *mpath, const u8 *addr); #ifdef CONFIG_MAC80211_MESH static inline diff --git a/net/mac80211/mesh_hwmp.c b/net/mac80211/mesh_hwmp.c index 9b1ce7c3925a..5217e1d97dd6 100644 --- a/net/mac80211/mesh_hwmp.c +++ b/net/mac80211/mesh_hwmp.c @@ -394,6 +394,7 @@ static u32 hwmp_route_info_get(struct ieee80211_sub_if_data *sdata, u32 orig_sn, orig_metric; unsigned long orig_lifetime, exp_time; u32 last_hop_metric, new_metric; + bool flush_mpath = false; bool process = true; u8 hopcount; @@ -491,8 +492,10 @@ static u32 hwmp_route_info_get(struct ieee80211_sub_if_data *sdata, } if (fresh_info) { - if (rcu_access_pointer(mpath->next_hop) != sta) + if (rcu_access_pointer(mpath->next_hop) != sta) { mpath->path_change_count++; + flush_mpath = true; + } mesh_path_assign_nexthop(mpath, sta); mpath->flags |= MESH_PATH_SN_VALID; mpath->metric = new_metric; @@ -502,6 +505,8 @@ static u32 hwmp_route_info_get(struct ieee80211_sub_if_data *sdata, mpath->hop_count = hopcount; mesh_path_activate(mpath); spin_unlock_bh(&mpath->state_lock); + if (flush_mpath) + mesh_fast_tx_flush_mpath(mpath); ewma_mesh_fail_avg_init(&sta->mesh->fail_avg); /* init it at a low value - 0 start is tricky */ ewma_mesh_fail_avg_add(&sta->mesh->fail_avg, 1); @@ -539,8 +544,10 @@ static u32 hwmp_route_info_get(struct ieee80211_sub_if_data *sdata, } if (fresh_info) { - if (rcu_access_pointer(mpath->next_hop) != sta) + if (rcu_access_pointer(mpath->next_hop) != sta) { mpath->path_change_count++; + flush_mpath = true; + } mesh_path_assign_nexthop(mpath, sta); mpath->metric = last_hop_metric; mpath->exp_time = time_after(mpath->exp_time, exp_time) @@ -548,6 +555,8 @@ static u32 hwmp_route_info_get(struct ieee80211_sub_if_data *sdata, mpath->hop_count = 1; mesh_path_activate(mpath); spin_unlock_bh(&mpath->state_lock); + if (flush_mpath) + mesh_fast_tx_flush_mpath(mpath); ewma_mesh_fail_avg_init(&sta->mesh->fail_avg); /* init it at a low value - 0 start is tricky */ ewma_mesh_fail_avg_add(&sta->mesh->fail_avg, 1); @@ -1215,6 +1224,20 @@ static int mesh_nexthop_lookup_nolearn(struct ieee80211_sub_if_data *sdata, return 0; } +void mesh_path_refresh(struct ieee80211_sub_if_data *sdata, + struct mesh_path *mpath, const u8 *addr) +{ + if (mpath->flags & (MESH_PATH_REQ_QUEUED | MESH_PATH_FIXED | + MESH_PATH_RESOLVING)) + return; + + if (time_after(jiffies, + mpath->exp_time - + msecs_to_jiffies(sdata->u.mesh.mshcfg.path_refresh_time)) && + (!addr || ether_addr_equal(sdata->vif.addr, addr))) + mesh_queue_preq(mpath, PREQ_Q_F_START | PREQ_Q_F_REFRESH); +} + /** * mesh_nexthop_lookup - put the appropriate next hop on a mesh frame. Calling * this function is considered "using" the associated mpath, so preempt a path @@ -1242,19 +1265,15 @@ int mesh_nexthop_lookup(struct ieee80211_sub_if_data *sdata, if (!mpath || !(mpath->flags & MESH_PATH_ACTIVE)) return -ENOENT; - if (time_after(jiffies, - mpath->exp_time - - msecs_to_jiffies(sdata->u.mesh.mshcfg.path_refresh_time)) && - ether_addr_equal(sdata->vif.addr, hdr->addr4) && - !(mpath->flags & MESH_PATH_RESOLVING) && - !(mpath->flags & MESH_PATH_FIXED)) - mesh_queue_preq(mpath, PREQ_Q_F_START | PREQ_Q_F_REFRESH); + mesh_path_refresh(sdata, mpath, hdr->addr4); next_hop = rcu_dereference(mpath->next_hop); if (next_hop) { memcpy(hdr->addr1, next_hop->sta.addr, ETH_ALEN); memcpy(hdr->addr2, sdata->vif.addr, ETH_ALEN); ieee80211_mps_set_frame_flags(sdata, next_hop, hdr); + if (ieee80211_hw_check(&sdata->local->hw, SUPPORT_FAST_XMIT)) + mesh_fast_tx_cache(sdata, skb, mpath); return 0; } diff --git a/net/mac80211/mesh_pathtbl.c b/net/mac80211/mesh_pathtbl.c index 3b81e6df3f34..d32e304eeb4b 100644 --- a/net/mac80211/mesh_pathtbl.c +++ b/net/mac80211/mesh_pathtbl.c @@ -14,6 +14,7 @@ #include "wme.h" #include "ieee80211_i.h" #include "mesh.h" +#include <linux/rhashtable.h> static void mesh_path_free_rcu(struct mesh_table *tbl, struct mesh_path *mpath); @@ -32,6 +33,41 @@ static const struct rhashtable_params mesh_rht_params = { .hashfn = mesh_table_hash, }; +static const struct rhashtable_params fast_tx_rht_params = { + .nelem_hint = 10, + .automatic_shrinking = true, + .key_len = ETH_ALEN, + .key_offset = offsetof(struct ieee80211_mesh_fast_tx, addr_key), + .head_offset = offsetof(struct ieee80211_mesh_fast_tx, rhash), + .hashfn = mesh_table_hash, +}; + +static void __mesh_fast_tx_entry_free(void *ptr, void *tblptr) +{ + struct ieee80211_mesh_fast_tx *entry = ptr; + + kfree_rcu(entry, fast_tx.rcu_head); +} + +static void mesh_fast_tx_deinit(struct ieee80211_sub_if_data *sdata) +{ + struct mesh_tx_cache *cache; + + cache = &sdata->u.mesh.tx_cache; + rhashtable_free_and_destroy(&cache->rht, + __mesh_fast_tx_entry_free, NULL); +} + +static void mesh_fast_tx_init(struct ieee80211_sub_if_data *sdata) +{ + struct mesh_tx_cache *cache; + + cache = &sdata->u.mesh.tx_cache; + rhashtable_init(&cache->rht, &fast_tx_rht_params); + INIT_HLIST_HEAD(&cache->walk_head); + spin_lock_init(&cache->walk_lock); +} + static inline bool mpath_expired(struct mesh_path *mpath) { return (mpath->flags & MESH_PATH_ACTIVE) && @@ -381,6 +417,243 @@ struct mesh_path *mesh_path_new(struct ieee80211_sub_if_data *sdata, return new_mpath; } +static void mesh_fast_tx_entry_free(struct mesh_tx_cache *cache, + struct ieee80211_mesh_fast_tx *entry) +{ + hlist_del_rcu(&entry->walk_list); + rhashtable_remove_fast(&cache->rht, &entry->rhash, fast_tx_rht_params); + kfree_rcu(entry, fast_tx.rcu_head); +} + +struct ieee80211_mesh_fast_tx * +mesh_fast_tx_get(struct ieee80211_sub_if_data *sdata, const u8 *addr) +{ + struct ieee80211_mesh_fast_tx *entry; + struct mesh_tx_cache *cache; + + cache = &sdata->u.mesh.tx_cache; + entry = rhashtable_lookup(&cache->rht, addr, fast_tx_rht_params); + if (!entry) + return NULL; + + if (!(entry->mpath->flags & MESH_PATH_ACTIVE) || + mpath_expired(entry->mpath)) { + spin_lock_bh(&cache->walk_lock); + entry = rhashtable_lookup(&cache->rht, addr, fast_tx_rht_params); + if (entry) + mesh_fast_tx_entry_free(cache, entry); + spin_unlock_bh(&cache->walk_lock); + return NULL; + } + + mesh_path_refresh(sdata, entry->mpath, NULL); + if (entry->mppath) + entry->mppath->exp_time = jiffies; + entry->timestamp = jiffies; + + return entry; +} + +void mesh_fast_tx_cache(struct ieee80211_sub_if_data *sdata, + struct sk_buff *skb, struct mesh_path *mpath) +{ + struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)skb->data; + struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb); + struct ieee80211_mesh_fast_tx *entry, *prev; + struct ieee80211_mesh_fast_tx build = {}; + struct ieee80211s_hdr *meshhdr; + struct mesh_tx_cache *cache; + struct ieee80211_key *key; + struct mesh_path *mppath; + struct sta_info *sta; + u8 *qc; + + if (sdata->noack_map || + !ieee80211_is_data_qos(hdr->frame_control)) + return; + + build.fast_tx.hdr_len = ieee80211_hdrlen(hdr->frame_control); + meshhdr = (struct ieee80211s_hdr *)(skb->data + build.fast_tx.hdr_len); + build.hdrlen = ieee80211_get_mesh_hdrlen(meshhdr); + + cache = &sdata->u.mesh.tx_cache; + if (atomic_read(&cache->rht.nelems) >= MESH_FAST_TX_CACHE_MAX_SIZE) + return; + + sta = rcu_dereference(mpath->next_hop); + if (!sta) + return; + + if ((meshhdr->flags & MESH_FLAGS_AE) == MESH_FLAGS_AE_A5_A6) { + /* This is required to keep the mppath alive */ + mppath = mpp_path_lookup(sdata, meshhdr->eaddr1); + if (!mppath) + return; + build.mppath = mppath; + } else if (ieee80211_has_a4(hdr->frame_control)) { + mppath = mpath; + } else { + return; + } + + /* rate limit, in case fast xmit can't be enabled */ + if (mppath->fast_tx_check == jiffies) + return; + + mppath->fast_tx_check = jiffies; + + /* + * Same use of the sta lock as in ieee80211_check_fast_xmit, in order + * to protect against concurrent sta key updates. + */ + spin_lock_bh(&sta->lock); + key = rcu_access_pointer(sta->ptk[sta->ptk_idx]); + if (!key) + key = rcu_access_pointer(sdata->default_unicast_key); + build.fast_tx.key = key; + + if (key) { + bool gen_iv, iv_spc; + + gen_iv = key->conf.flags & IEEE80211_KEY_FLAG_GENERATE_IV; + iv_spc = key->conf.flags & IEEE80211_KEY_FLAG_PUT_IV_SPACE; + + if (!(key->flags & KEY_FLAG_UPLOADED_TO_HARDWARE) || + (key->flags & KEY_FLAG_TAINTED)) + goto unlock_sta; + + switch (key->conf.cipher) { + case WLAN_CIPHER_SUITE_CCMP: + case WLAN_CIPHER_SUITE_CCMP_256: + if (gen_iv) + build.fast_tx.pn_offs = build.fast_tx.hdr_len; + if (gen_iv || iv_spc) + build.fast_tx.hdr_len += IEEE80211_CCMP_HDR_LEN; + break; + case WLAN_CIPHER_SUITE_GCMP: + case WLAN_CIPHER_SUITE_GCMP_256: + if (gen_iv) + build.fast_tx.pn_offs = build.fast_tx.hdr_len; + if (gen_iv || iv_spc) + build.fast_tx.hdr_len += IEEE80211_GCMP_HDR_LEN; + break; + default: + goto unlock_sta; + } + } + + memcpy(build.addr_key, mppath->dst, ETH_ALEN); + build.timestamp = jiffies; + build.fast_tx.band = info->band; + build.fast_tx.da_offs = offsetof(struct ieee80211_hdr, addr3); + build.fast_tx.sa_offs = offsetof(struct ieee80211_hdr, addr4); + build.mpath = mpath; + memcpy(build.hdr, meshhdr, build.hdrlen); + memcpy(build.hdr + build.hdrlen, rfc1042_header, sizeof(rfc1042_header)); + build.hdrlen += sizeof(rfc1042_header); + memcpy(build.fast_tx.hdr, hdr, build.fast_tx.hdr_len); + + hdr = (struct ieee80211_hdr *)build.fast_tx.hdr; + if (build.fast_tx.key) + hdr->frame_control |= cpu_to_le16(IEEE80211_FCTL_PROTECTED); + + qc = ieee80211_get_qos_ctl(hdr); + qc[1] |= IEEE80211_QOS_CTL_MESH_CONTROL_PRESENT >> 8; + + entry = kmemdup(&build, sizeof(build), GFP_ATOMIC); + if (!entry) + goto unlock_sta; + + spin_lock(&cache->walk_lock); + prev = rhashtable_lookup_get_insert_fast(&cache->rht, + &entry->rhash, + fast_tx_rht_params); + if (unlikely(IS_ERR(prev))) { + kfree(entry); + goto unlock_cache; + } + + /* + * replace any previous entry in the hash table, in case we're + * replacing it with a different type (e.g. mpath -> mpp) + */ + if (unlikely(prev)) { + rhashtable_replace_fast(&cache->rht, &prev->rhash, + &entry->rhash, fast_tx_rht_params); + hlist_del_rcu(&prev->walk_list); + kfree_rcu(prev, fast_tx.rcu_head); + } + + hlist_add_head(&entry->walk_list, &cache->walk_head); + +unlock_cache: + spin_unlock(&cache->walk_lock); +unlock_sta: + spin_unlock_bh(&sta->lock); +} + +void mesh_fast_tx_gc(struct ieee80211_sub_if_data *sdata) +{ + unsigned long timeout = msecs_to_jiffies(MESH_FAST_TX_CACHE_TIMEOUT); + struct mesh_tx_cache *cache; + struct ieee80211_mesh_fast_tx *entry; + struct hlist_node *n; + + cache = &sdata->u.mesh.tx_cache; + if (atomic_read(&cache->rht.nelems) < MESH_FAST_TX_CACHE_THRESHOLD_SIZE) + return; + + spin_lock_bh(&cache->walk_lock); + hlist_for_each_entry_safe(entry, n, &cache->walk_head, walk_list) + if (!time_is_after_jiffies(entry->timestamp + timeout)) + mesh_fast_tx_entry_free(cache, entry); + spin_unlock_bh(&cache->walk_lock); +} + +void mesh_fast_tx_flush_mpath(struct mesh_path *mpath) +{ + struct ieee80211_sub_if_data *sdata = mpath->sdata; + struct mesh_tx_cache *cache = &sdata->u.mesh.tx_cache; + struct ieee80211_mesh_fast_tx *entry; + struct hlist_node *n; + + cache = &sdata->u.mesh.tx_cache; + spin_lock_bh(&cache->walk_lock); + hlist_for_each_entry_safe(entry, n, &cache->walk_head, walk_list) + if (entry->mpath == mpath) + mesh_fast_tx_entry_free(cache, entry); + spin_unlock_bh(&cache->walk_lock); +} + +void mesh_fast_tx_flush_sta(struct ieee80211_sub_if_data *sdata, + struct sta_info *sta) +{ + struct mesh_tx_cache *cache = &sdata->u.mesh.tx_cache; + struct ieee80211_mesh_fast_tx *entry; + struct hlist_node *n; + + cache = &sdata->u.mesh.tx_cache; + spin_lock_bh(&cache->walk_lock); + hlist_for_each_entry_safe(entry, n, &cache->walk_head, walk_list) + if (rcu_access_pointer(entry->mpath->next_hop) == sta) + mesh_fast_tx_entry_free(cache, entry); + spin_unlock_bh(&cache->walk_lock); +} + +void mesh_fast_tx_flush_addr(struct ieee80211_sub_if_data *sdata, + const u8 *addr) +{ + struct mesh_tx_cache *cache = &sdata->u.mesh.tx_cache; + struct ieee80211_mesh_fast_tx *entry; + + cache = &sdata->u.mesh.tx_cache; + spin_lock_bh(&cache->walk_lock); + entry = rhashtable_lookup(&cache->rht, addr, fast_tx_rht_params); + if (entry) + mesh_fast_tx_entry_free(cache, entry); + spin_unlock_bh(&cache->walk_lock); +} + /** * mesh_path_add - allocate and add a new path to the mesh path table * @dst: destination address of the path (ETH_ALEN length) @@ -464,6 +737,8 @@ int mpp_path_add(struct ieee80211_sub_if_data *sdata, if (ret) kfree(new_mpath); + else + mesh_fast_tx_flush_addr(sdata, dst); sdata->u.mesh.mpp_paths_generation++; return ret; @@ -523,6 +798,10 @@ static void __mesh_path_del(struct mesh_table *tbl, struct mesh_path *mpath) { hlist_del_rcu(&mpath->walk_list); rhashtable_remove_fast(&tbl->rhead, &mpath->rhash, mesh_rht_params); + if (tbl == &mpath->sdata->u.mesh.mpp_paths) + mesh_fast_tx_flush_addr(mpath->sdata, mpath->dst); + else + mesh_fast_tx_flush_mpath(mpath); mesh_path_free_rcu(tbl, mpath); } @@ -747,6 +1026,7 @@ void mesh_path_fix_nexthop(struct mesh_path *mpath, struct sta_info *next_hop) mpath->exp_time = 0; mpath->flags = MESH_PATH_FIXED | MESH_PATH_SN_VALID; mesh_path_activate(mpath); + mesh_fast_tx_flush_mpath(mpath); spin_unlock_bh(&mpath->state_lock); ewma_mesh_fail_avg_init(&next_hop->mesh->fail_avg); /* init it at a low value - 0 start is tricky */ @@ -758,6 +1038,7 @@ void mesh_pathtbl_init(struct ieee80211_sub_if_data *sdata) { mesh_table_init(&sdata->u.mesh.mesh_paths); mesh_table_init(&sdata->u.mesh.mpp_paths); + mesh_fast_tx_init(sdata); } static @@ -785,6 +1066,7 @@ void mesh_path_expire(struct ieee80211_sub_if_data *sdata) void mesh_pathtbl_unregister(struct ieee80211_sub_if_data *sdata) { + mesh_fast_tx_deinit(sdata); mesh_table_free(&sdata->u.mesh.mesh_paths); mesh_table_free(&sdata->u.mesh.mpp_paths); } diff --git a/net/mac80211/mesh_plink.c b/net/mac80211/mesh_plink.c index ddfe5102b9a4..8f168bc4e4b8 100644 --- a/net/mac80211/mesh_plink.c +++ b/net/mac80211/mesh_plink.c @@ -219,12 +219,14 @@ static int mesh_plink_frame_tx(struct ieee80211_sub_if_data *sdata, bool include_plid = false; u16 peering_proto = 0; u8 *pos, ie_len = 4; - u8 ie_len_he_cap; + u8 ie_len_he_cap, ie_len_eht_cap; int hdr_len = offsetofend(struct ieee80211_mgmt, u.action.u.self_prot); int err = -ENOMEM; ie_len_he_cap = ieee80211_ie_len_he_cap(sdata, NL80211_IFTYPE_MESH_POINT); + ie_len_eht_cap = ieee80211_ie_len_eht_cap(sdata, + NL80211_IFTYPE_MESH_POINT); skb = dev_alloc_skb(local->tx_headroom + hdr_len + 2 + /* capability info */ @@ -241,6 +243,9 @@ static int mesh_plink_frame_tx(struct ieee80211_sub_if_data *sdata, 2 + 1 + sizeof(struct ieee80211_he_operation) + sizeof(struct ieee80211_he_6ghz_oper) + 2 + 1 + sizeof(struct ieee80211_he_6ghz_capa) + + ie_len_eht_cap + + 2 + 1 + offsetof(struct ieee80211_eht_operation, optional) + + offsetof(struct ieee80211_eht_operation_info, optional) + 2 + 8 + /* peering IE */ sdata->u.mesh.ie_len); if (!skb) @@ -332,7 +337,9 @@ static int mesh_plink_frame_tx(struct ieee80211_sub_if_data *sdata, mesh_add_vht_oper_ie(sdata, skb) || mesh_add_he_cap_ie(sdata, skb, ie_len_he_cap) || mesh_add_he_oper_ie(sdata, skb) || - mesh_add_he_6ghz_cap_ie(sdata, skb)) + mesh_add_he_6ghz_cap_ie(sdata, skb) || + mesh_add_eht_cap_ie(sdata, skb, ie_len_eht_cap) || + mesh_add_eht_oper_ie(sdata, skb)) goto free; } @@ -451,6 +458,11 @@ static void mesh_sta_info_init(struct ieee80211_sub_if_data *sdata, elems->he_6ghz_capa, &sta->deflink); + ieee80211_eht_cap_ie_to_sta_eht_cap(sdata, sband, elems->he_cap, + elems->he_cap_len, + elems->eht_cap, elems->eht_cap_len, + &sta->deflink); + if (bw != sta->sta.deflink.bandwidth) changed |= IEEE80211_RC_BW_CHANGED; diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c index 60792dfabc9d..e13a0354c397 100644 --- a/net/mac80211/mlme.c +++ b/net/mac80211/mlme.c @@ -2744,7 +2744,7 @@ static u32 ieee80211_handle_bss_capability(struct ieee80211_link_data *link, return changed; } -static u32 ieee80211_link_set_associated(struct ieee80211_link_data *link, +static u64 ieee80211_link_set_associated(struct ieee80211_link_data *link, struct cfg80211_bss *cbss) { struct ieee80211_sub_if_data *sdata = link->sdata; @@ -3227,7 +3227,7 @@ static void ieee80211_mgd_probe_ap(struct ieee80211_sub_if_data *sdata, struct ieee80211_if_managed *ifmgd = &sdata->u.mgd; bool already = false; - if (WARN_ON(sdata->vif.valid_links)) + if (WARN_ON_ONCE(sdata->vif.valid_links)) return; if (!ieee80211_sdata_running(sdata)) @@ -5893,7 +5893,7 @@ static void ieee80211_rx_mgmt_beacon(struct ieee80211_link_data *link, goto free; } - if (sta && elems->opmode_notif) + if (elems->opmode_notif) ieee80211_vht_handle_opmode(sdata, link_sta, *elems->opmode_notif, rx_status->band); diff --git a/net/mac80211/rc80211_minstrel_ht.c b/net/mac80211/rc80211_minstrel_ht.c index 762346598338..b34c80522047 100644 --- a/net/mac80211/rc80211_minstrel_ht.c +++ b/net/mac80211/rc80211_minstrel_ht.c @@ -1708,7 +1708,6 @@ minstrel_ht_update_caps(void *priv, struct ieee80211_supported_band *sband, struct sta_info *sta_info; bool ldpc, erp; int use_vht; - int n_supported = 0; int ack_dur; int stbc; int i; @@ -1791,8 +1790,6 @@ minstrel_ht_update_caps(void *priv, struct ieee80211_supported_band *sband, continue; mi->supported[i] = mcs->rx_mask[nss - 1]; - if (mi->supported[i]) - n_supported++; continue; } @@ -1819,9 +1816,6 @@ minstrel_ht_update_caps(void *priv, struct ieee80211_supported_band *sband, mi->supported[i] = minstrel_get_valid_vht_rates(bw, nss, vht_cap->vht_mcs.tx_mcs_map); - - if (mi->supported[i]) - n_supported++; } sta_info = container_of(sta, struct sta_info, sta); diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c index af57616d2f1d..58222c077898 100644 --- a/net/mac80211/rx.c +++ b/net/mac80211/rx.c @@ -43,6 +43,7 @@ static struct sk_buff *ieee80211_clean_skb(struct sk_buff *skb, unsigned int present_fcs_len, unsigned int rtap_space) { + struct ieee80211_rx_status *status = IEEE80211_SKB_RXCB(skb); struct ieee80211_hdr *hdr; unsigned int hdrlen; __le16 fc; @@ -51,6 +52,14 @@ static struct sk_buff *ieee80211_clean_skb(struct sk_buff *skb, __pskb_trim(skb, skb->len - present_fcs_len); pskb_pull(skb, rtap_space); + /* After pulling radiotap header, clear all flags that indicate + * info in skb->data. + */ + status->flag &= ~(RX_FLAG_RADIOTAP_TLV_AT_END | + RX_FLAG_RADIOTAP_LSIG | + RX_FLAG_RADIOTAP_HE_MU | + RX_FLAG_RADIOTAP_HE); + hdr = (void *)skb->data; fc = hdr->frame_control; @@ -117,9 +126,6 @@ ieee80211_rx_radiotap_hdrlen(struct ieee80211_local *local, /* allocate extra bitmaps */ if (status->chains) len += 4 * hweight8(status->chains); - /* vendor presence bitmap */ - if (status->flag & RX_FLAG_RADIOTAP_VENDOR_DATA) - len += 4; if (ieee80211_have_rx_timestamp(status)) { len = ALIGN(len, 8); @@ -181,34 +187,28 @@ ieee80211_rx_radiotap_hdrlen(struct ieee80211_local *local, len += 2 * hweight8(status->chains); } - if (status->flag & RX_FLAG_RADIOTAP_VENDOR_DATA) { - struct ieee80211_vendor_radiotap *rtap; - int vendor_data_offset = 0; + if (status->flag & RX_FLAG_RADIOTAP_TLV_AT_END) { + int tlv_offset = 0; /* * The position to look at depends on the existence (or non- * existence) of other elements, so take that into account... */ if (status->flag & RX_FLAG_RADIOTAP_HE) - vendor_data_offset += + tlv_offset += sizeof(struct ieee80211_radiotap_he); if (status->flag & RX_FLAG_RADIOTAP_HE_MU) - vendor_data_offset += + tlv_offset += sizeof(struct ieee80211_radiotap_he_mu); if (status->flag & RX_FLAG_RADIOTAP_LSIG) - vendor_data_offset += + tlv_offset += sizeof(struct ieee80211_radiotap_lsig); - rtap = (void *)&skb->data[vendor_data_offset]; + /* ensure 4 byte alignment for TLV */ + len = ALIGN(len, 4); - /* alignment for fixed 6-byte vendor data header */ - len = ALIGN(len, 2); - /* vendor data header */ - len += 6; - if (WARN_ON(rtap->align == 0)) - rtap->align = 1; - len = ALIGN(len, rtap->align); - len += rtap->len + rtap->pad; + /* TLVs until the mac header */ + len += skb_mac_header(skb) - &skb->data[tlv_offset]; } return len; @@ -304,9 +304,9 @@ ieee80211_add_rx_radiotap_header(struct ieee80211_local *local, u32 it_present_val; u16 rx_flags = 0; u16 channel_flags = 0; + u32 tlvs_len = 0; int mpdulen, chain; unsigned long chains = status->chains; - struct ieee80211_vendor_radiotap rtap = {}; struct ieee80211_radiotap_he he = {}; struct ieee80211_radiotap_he_mu he_mu = {}; struct ieee80211_radiotap_lsig lsig = {}; @@ -327,18 +327,17 @@ ieee80211_add_rx_radiotap_header(struct ieee80211_local *local, skb_pull(skb, sizeof(lsig)); } - if (status->flag & RX_FLAG_RADIOTAP_VENDOR_DATA) { - rtap = *(struct ieee80211_vendor_radiotap *)skb->data; - /* rtap.len and rtap.pad are undone immediately */ - skb_pull(skb, sizeof(rtap) + rtap.len + rtap.pad); + if (status->flag & RX_FLAG_RADIOTAP_TLV_AT_END) { + /* data is pointer at tlv all other info was pulled off */ + tlvs_len = skb_mac_header(skb) - skb->data; } mpdulen = skb->len; if (!(has_fcs && ieee80211_hw_check(&local->hw, RX_INCLUDES_FCS))) mpdulen += FCS_LEN; - rthdr = skb_push(skb, rtap_len); - memset(rthdr, 0, rtap_len - rtap.len - rtap.pad); + rthdr = skb_push(skb, rtap_len - tlvs_len); + memset(rthdr, 0, rtap_len - tlvs_len); it_present = &rthdr->it_present; /* radiotap header, set always present flags */ @@ -360,13 +359,8 @@ ieee80211_add_rx_radiotap_header(struct ieee80211_local *local, BIT(IEEE80211_RADIOTAP_DBM_ANTSIGNAL); } - if (status->flag & RX_FLAG_RADIOTAP_VENDOR_DATA) { - it_present_val |= BIT(IEEE80211_RADIOTAP_VENDOR_NAMESPACE) | - BIT(IEEE80211_RADIOTAP_EXT); - put_unaligned_le32(it_present_val, it_present); - it_present++; - it_present_val = rtap.present; - } + if (status->flag & RX_FLAG_RADIOTAP_TLV_AT_END) + it_present_val |= BIT(IEEE80211_RADIOTAP_TLV); put_unaligned_le32(it_present_val, it_present); @@ -697,22 +691,6 @@ ieee80211_add_rx_radiotap_header(struct ieee80211_local *local, *pos++ = status->chain_signal[chain]; *pos++ = chain; } - - if (status->flag & RX_FLAG_RADIOTAP_VENDOR_DATA) { - /* ensure 2 byte alignment for the vendor field as required */ - if ((pos - (u8 *)rthdr) & 1) - *pos++ = 0; - *pos++ = rtap.oui[0]; - *pos++ = rtap.oui[1]; - *pos++ = rtap.oui[2]; - *pos++ = rtap.subns; - put_unaligned_le16(rtap.len, pos); - pos += 2; - /* align the actual payload as requested */ - while ((pos - (u8 *)rthdr) & (rtap.align - 1)) - *pos++ = 0; - /* data (and possible padding) already follows */ - } } static struct sk_buff * @@ -788,6 +766,13 @@ ieee80211_rx_monitor(struct ieee80211_local *local, struct sk_buff *origskb, bool only_monitor = false; unsigned int min_head_len; + if (WARN_ON_ONCE(status->flag & RX_FLAG_RADIOTAP_TLV_AT_END && + !skb_mac_header_was_set(origskb))) { + /* with this skb no way to know where frame payload starts */ + dev_kfree_skb(origskb); + return NULL; + } + if (status->flag & RX_FLAG_RADIOTAP_HE) rtap_space += sizeof(struct ieee80211_radiotap_he); @@ -797,12 +782,8 @@ ieee80211_rx_monitor(struct ieee80211_local *local, struct sk_buff *origskb, if (status->flag & RX_FLAG_RADIOTAP_LSIG) rtap_space += sizeof(struct ieee80211_radiotap_lsig); - if (unlikely(status->flag & RX_FLAG_RADIOTAP_VENDOR_DATA)) { - struct ieee80211_vendor_radiotap *rtap = - (void *)(origskb->data + rtap_space); - - rtap_space += sizeof(*rtap) + rtap->len + rtap->pad; - } + if (status->flag & RX_FLAG_RADIOTAP_TLV_AT_END) + rtap_space += skb_mac_header(origskb) - &origskb->data[rtap_space]; min_head_len = rtap_space; @@ -1845,7 +1826,7 @@ ieee80211_rx_h_sta_process(struct ieee80211_rx_data *rx) cfg80211_rx_unexpected_4addr_frame( rx->sdata->dev, sta->sta.addr, GFP_ATOMIC); - return RX_DROP_MONITOR; + return RX_DROP_M_UNEXPECTED_4ADDR_FRAME; } /* * Update counter and free packet here to avoid @@ -1980,7 +1961,7 @@ ieee80211_rx_h_decrypt(struct ieee80211_rx_data *rx) cfg80211_rx_unprot_mlme_mgmt(rx->sdata->dev, skb->data, skb->len); - return RX_DROP_MONITOR; /* unexpected BIP keyidx */ + return RX_DROP_M_BAD_BCN_KEYIDX; } rx->key = ieee80211_rx_get_bigtk(rx, mmie_keyidx); @@ -1994,7 +1975,7 @@ ieee80211_rx_h_decrypt(struct ieee80211_rx_data *rx) if (mmie_keyidx < NUM_DEFAULT_KEYS || mmie_keyidx >= NUM_DEFAULT_KEYS + NUM_DEFAULT_MGMT_KEYS) - return RX_DROP_MONITOR; /* unexpected BIP keyidx */ + return RX_DROP_M_BAD_MGMT_KEYIDX; /* unexpected BIP keyidx */ if (rx->link_sta) { if (ieee80211_is_group_privacy_action(skb) && test_sta_flag(rx->sta, WLAN_STA_MFP)) @@ -2582,7 +2563,7 @@ static void ieee80211_deliver_skb_to_local_stack(struct sk_buff *skb, struct ieee80211_rx_status *status = IEEE80211_SKB_RXCB(skb); bool noencrypt = !(status->flag & RX_FLAG_DECRYPTED); - cfg80211_rx_control_port(dev, skb, noencrypt); + cfg80211_rx_control_port(dev, skb, noencrypt, rx->link_id); dev_kfree_skb(skb); } else { struct ethhdr *ehdr = (void *)skb_mac_header(skb); @@ -2720,6 +2701,65 @@ ieee80211_deliver_skb(struct ieee80211_rx_data *rx) } } +#ifdef CONFIG_MAC80211_MESH +static bool +ieee80211_rx_mesh_fast_forward(struct ieee80211_sub_if_data *sdata, + struct sk_buff *skb, int hdrlen) +{ + struct ieee80211_if_mesh *ifmsh = &sdata->u.mesh; + struct ieee80211_mesh_fast_tx *entry = NULL; + struct ieee80211s_hdr *mesh_hdr; + struct tid_ampdu_tx *tid_tx; + struct sta_info *sta; + struct ethhdr eth; + u8 tid; + + mesh_hdr = (struct ieee80211s_hdr *)(skb->data + sizeof(eth)); + if ((mesh_hdr->flags & MESH_FLAGS_AE) == MESH_FLAGS_AE_A5_A6) + entry = mesh_fast_tx_get(sdata, mesh_hdr->eaddr1); + else if (!(mesh_hdr->flags & MESH_FLAGS_AE)) + entry = mesh_fast_tx_get(sdata, skb->data); + if (!entry) + return false; + + sta = rcu_dereference(entry->mpath->next_hop); + if (!sta) + return false; + + if (skb_linearize(skb)) + return false; + + tid = skb->priority & IEEE80211_QOS_CTL_TAG1D_MASK; + tid_tx = rcu_dereference(sta->ampdu_mlme.tid_tx[tid]); + if (tid_tx) { + if (!test_bit(HT_AGG_STATE_OPERATIONAL, &tid_tx->state)) + return false; + + if (tid_tx->timeout) + tid_tx->last_tx = jiffies; + } + + ieee80211_aggr_check(sdata, sta, skb); + + if (ieee80211_get_8023_tunnel_proto(skb->data + hdrlen, + &skb->protocol)) + hdrlen += ETH_ALEN; + else + skb->protocol = htons(skb->len - hdrlen); + skb_set_network_header(skb, hdrlen + 2); + + skb->dev = sdata->dev; + memcpy(ð, skb->data, ETH_HLEN - 2); + skb_pull(skb, 2); + __ieee80211_xmit_fast(sdata, sta, &entry->fast_tx, skb, tid_tx, + eth.h_dest, eth.h_source); + IEEE80211_IFSTA_MESH_CTR_INC(ifmsh, fwded_unicast); + IEEE80211_IFSTA_MESH_CTR_INC(ifmsh, fwded_frames); + + return true; +} +#endif + static ieee80211_rx_result ieee80211_rx_mesh_data(struct ieee80211_sub_if_data *sdata, struct sta_info *sta, struct sk_buff *skb) @@ -2772,6 +2812,7 @@ ieee80211_rx_mesh_data(struct ieee80211_sub_if_data *sdata, struct sta_info *sta if (mesh_hdr->flags & MESH_FLAGS_AE) { struct mesh_path *mppath; char *proxied_addr; + bool update = false; if (multicast) proxied_addr = mesh_hdr->eaddr1; @@ -2787,11 +2828,18 @@ ieee80211_rx_mesh_data(struct ieee80211_sub_if_data *sdata, struct sta_info *sta mpp_path_add(sdata, proxied_addr, eth->h_source); } else { spin_lock_bh(&mppath->state_lock); - if (!ether_addr_equal(mppath->mpp, eth->h_source)) + if (!ether_addr_equal(mppath->mpp, eth->h_source)) { memcpy(mppath->mpp, eth->h_source, ETH_ALEN); + update = true; + } mppath->exp_time = jiffies; spin_unlock_bh(&mppath->state_lock); } + + /* flush fast xmit cache if the address path changed */ + if (update) + mesh_fast_tx_flush_addr(sdata, proxied_addr); + rcu_read_unlock(); } @@ -2816,6 +2864,10 @@ ieee80211_rx_mesh_data(struct ieee80211_sub_if_data *sdata, struct sta_info *sta skb_set_queue_mapping(skb, ieee802_1d_to_ac[skb->priority]); + if (!multicast && + ieee80211_rx_mesh_fast_forward(sdata, skb, mesh_hdrlen)) + return RX_QUEUED; + ieee80211_fill_mesh_addresses(&hdr, &hdr.frame_control, eth->h_dest, eth->h_source); hdrlen = ieee80211_hdrlen(hdr.frame_control); @@ -2857,6 +2909,7 @@ ieee80211_rx_mesh_data(struct ieee80211_sub_if_data *sdata, struct sta_info *sta info->control.flags |= IEEE80211_TX_INTCFL_NEED_TXPROCESSING; info->control.vif = &sdata->vif; info->control.jiffies = jiffies; + fwd_skb->dev = sdata->dev; if (multicast) { IEEE80211_IFSTA_MESH_CTR_INC(ifmsh, fwded_mcast); memcpy(fwd_hdr->addr2, sdata->vif.addr, ETH_ALEN); @@ -2878,7 +2931,6 @@ ieee80211_rx_mesh_data(struct ieee80211_sub_if_data *sdata, struct sta_info *sta } IEEE80211_IFSTA_MESH_CTR_INC(ifmsh, fwded_frames); - fwd_skb->dev = sdata->dev; ieee80211_add_pending_skb(local, fwd_skb); rx_accept: @@ -2934,13 +2986,23 @@ __ieee80211_rx_h_amsdu(struct ieee80211_rx_data *rx, u8 data_offset) return RX_DROP_UNUSABLE; if (rx->sta->amsdu_mesh_control < 0) { - bool valid_std = ieee80211_is_valid_amsdu(skb, true); - bool valid_nonstd = ieee80211_is_valid_amsdu(skb, false); + s8 valid = -1; + int i; - if (valid_std && !valid_nonstd) - rx->sta->amsdu_mesh_control = 1; - else if (valid_nonstd && !valid_std) - rx->sta->amsdu_mesh_control = 0; + for (i = 0; i <= 2; i++) { + if (!ieee80211_is_valid_amsdu(skb, i)) + continue; + + if (valid >= 0) { + /* ambiguous */ + valid = -1; + break; + } + + valid = i; + } + + rx->sta->amsdu_mesh_control = valid; } ieee80211_amsdu_to_8023s(skb, &frame_list, dev->dev_addr, @@ -3898,7 +3960,8 @@ ieee80211_rx_h_mgmt(struct ieee80211_rx_data *rx) } static void ieee80211_rx_cooked_monitor(struct ieee80211_rx_data *rx, - struct ieee80211_rate *rate) + struct ieee80211_rate *rate, + ieee80211_rx_result reason) { struct ieee80211_sub_if_data *sdata; struct ieee80211_local *local = rx->local; @@ -3919,8 +3982,6 @@ static void ieee80211_rx_cooked_monitor(struct ieee80211_rx_data *rx, if (!local->cooked_mntrs) goto out_free_skb; - /* vendor data is long removed here */ - status->flag &= ~RX_FLAG_RADIOTAP_VENDOR_DATA; /* room for the radiotap header based on driver features */ needed_headroom = ieee80211_rx_radiotap_hdrlen(local, status, skb); @@ -3964,42 +4025,38 @@ static void ieee80211_rx_cooked_monitor(struct ieee80211_rx_data *rx, } out_free_skb: - dev_kfree_skb(skb); + kfree_skb_reason(skb, (__force u32)reason); } static void ieee80211_rx_handlers_result(struct ieee80211_rx_data *rx, ieee80211_rx_result res) { - switch (res) { - case RX_DROP_MONITOR: - I802_DEBUG_INC(rx->sdata->local->rx_handlers_drop); - if (rx->sta) - rx->link_sta->rx_stats.dropped++; - fallthrough; - case RX_CONTINUE: { - struct ieee80211_rate *rate = NULL; - struct ieee80211_supported_band *sband; - struct ieee80211_rx_status *status; - - status = IEEE80211_SKB_RXCB((rx->skb)); + struct ieee80211_rx_status *status = IEEE80211_SKB_RXCB(rx->skb); + struct ieee80211_supported_band *sband; + struct ieee80211_rate *rate = NULL; - sband = rx->local->hw.wiphy->bands[status->band]; - if (status->encoding == RX_ENC_LEGACY) - rate = &sband->bitrates[status->rate_idx]; + if (res == RX_QUEUED) { + I802_DEBUG_INC(rx->sdata->local->rx_handlers_queued); + return; + } - ieee80211_rx_cooked_monitor(rx, rate); - break; - } - case RX_DROP_UNUSABLE: + if (res != RX_CONTINUE) { I802_DEBUG_INC(rx->sdata->local->rx_handlers_drop); if (rx->sta) rx->link_sta->rx_stats.dropped++; - dev_kfree_skb(rx->skb); - break; - case RX_QUEUED: - I802_DEBUG_INC(rx->sdata->local->rx_handlers_queued); - break; } + + if (u32_get_bits((__force u32)res, SKB_DROP_REASON_SUBSYS_MASK) == + SKB_DROP_REASON_SUBSYS_MAC80211_UNUSABLE) { + kfree_skb_reason(rx->skb, (__force u32)res); + return; + } + + sband = rx->local->hw.wiphy->bands[status->band]; + if (status->encoding == RX_ENC_LEGACY) + rate = &sband->bitrates[status->rate_idx]; + + ieee80211_rx_cooked_monitor(rx, rate, res); } static void ieee80211_rx_handlers(struct ieee80211_rx_data *rx, @@ -4496,6 +4553,12 @@ void ieee80211_check_fast_rx(struct sta_info *sta) } break; + case NL80211_IFTYPE_MESH_POINT: + fastrx.expected_ds_bits = cpu_to_le16(IEEE80211_FCTL_FROMDS | + IEEE80211_FCTL_TODS); + fastrx.da_offs = offsetof(struct ieee80211_hdr, addr3); + fastrx.sa_offs = offsetof(struct ieee80211_hdr, addr4); + break; default: goto clear; } @@ -4704,6 +4767,7 @@ static bool ieee80211_invoke_fast_rx(struct ieee80211_rx_data *rx, struct sk_buff *skb = rx->skb; struct ieee80211_hdr *hdr = (void *)skb->data; struct ieee80211_rx_status *status = IEEE80211_SKB_RXCB(skb); + static ieee80211_rx_result res; int orig_len = skb->len; int hdrlen = ieee80211_hdrlen(hdr->frame_control); int snap_offs = hdrlen; @@ -4765,7 +4829,8 @@ static bool ieee80211_invoke_fast_rx(struct ieee80211_rx_data *rx, snap_offs += IEEE80211_CCMP_HDR_LEN; } - if (!(status->rx_flags & IEEE80211_RX_AMSDU)) { + if (!ieee80211_vif_is_mesh(&rx->sdata->vif) && + !(status->rx_flags & IEEE80211_RX_AMSDU)) { if (!pskb_may_pull(skb, snap_offs + sizeof(*payload))) return false; @@ -4804,13 +4869,29 @@ static bool ieee80211_invoke_fast_rx(struct ieee80211_rx_data *rx, /* do the header conversion - first grab the addresses */ ether_addr_copy(addrs.da, skb->data + fast_rx->da_offs); ether_addr_copy(addrs.sa, skb->data + fast_rx->sa_offs); - skb_postpull_rcsum(skb, skb->data + snap_offs, - sizeof(rfc1042_header) + 2); - /* remove the SNAP but leave the ethertype */ - skb_pull(skb, snap_offs + sizeof(rfc1042_header)); + if (ieee80211_vif_is_mesh(&rx->sdata->vif)) { + skb_pull(skb, snap_offs - 2); + put_unaligned_be16(skb->len - 2, skb->data); + } else { + skb_postpull_rcsum(skb, skb->data + snap_offs, + sizeof(rfc1042_header) + 2); + + /* remove the SNAP but leave the ethertype */ + skb_pull(skb, snap_offs + sizeof(rfc1042_header)); + } /* push the addresses in front */ memcpy(skb_push(skb, sizeof(addrs)), &addrs, sizeof(addrs)); + res = ieee80211_rx_mesh_data(rx->sdata, rx->sta, rx->skb); + switch (res) { + case RX_QUEUED: + return true; + case RX_CONTINUE: + break; + default: + goto drop; + } + ieee80211_rx_8023(rx, fast_rx, orig_len); return true; diff --git a/net/mac80211/scan.c b/net/mac80211/scan.c index dc3cdee51e66..32fa8aca7005 100644 --- a/net/mac80211/scan.c +++ b/net/mac80211/scan.c @@ -9,7 +9,7 @@ * Copyright 2007, Michael Wu <flamingice@sourmilk.net> * Copyright 2013-2015 Intel Mobile Communications GmbH * Copyright 2016-2017 Intel Deutschland GmbH - * Copyright (C) 2018-2021 Intel Corporation + * Copyright (C) 2018-2022 Intel Corporation */ #include <linux/if_arp.h> @@ -1246,11 +1246,11 @@ int ieee80211_request_ibss_scan(struct ieee80211_sub_if_data *sdata, return ret; } -/* - * Only call this function when a scan can't be queued -- under RTNL. - */ void ieee80211_scan_cancel(struct ieee80211_local *local) { + /* ensure a new scan cannot be queued */ + lockdep_assert_wiphy(local->hw.wiphy); + /* * We are canceling software scan, or deferred scan that was not * yet really started (see __ieee80211_start_scan ). diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c index 941bda9141fa..1400512e0dde 100644 --- a/net/mac80211/sta_info.c +++ b/net/mac80211/sta_info.c @@ -1294,6 +1294,18 @@ static void __sta_info_destroy_part2(struct sta_info *sta) WARN_ON_ONCE(ret); } + /* Flush queues before removing keys, as that might remove them + * from hardware, and then depending on the offload method, any + * frames sitting on hardware queues might be sent out without + * any encryption at all. + */ + if (local->ops->set_key) { + if (local->ops->flush_sta) + drv_flush_sta(local, sta->sdata, sta); + else + ieee80211_flush_queues(local, sta->sdata, false); + } + /* now keys can no longer be reached */ ieee80211_free_sta_keys(local, sta); diff --git a/net/mac80211/sta_info.h b/net/mac80211/sta_info.h index e8e482a82d77..195b563132d6 100644 --- a/net/mac80211/sta_info.h +++ b/net/mac80211/sta_info.h @@ -622,8 +622,13 @@ struct link_sta_info { * taken from HT/VHT capabilities or VHT operating mode notification * @cparams: CoDel parameters for this station. * @reserved_tid: reserved TID (if any, otherwise IEEE80211_TID_UNRESERVED) - * @amsdu_mesh_control: track the mesh A-MSDU format used by the peer - * (-1: not yet known, 0: non-standard [without mesh header], 1: standard) + * @amsdu_mesh_control: track the mesh A-MSDU format used by the peer: + * + * * -1: not yet known + * * 0: non-mesh A-MSDU length field + * * 1: big-endian mesh A-MSDU length field + * * 2: little-endian mesh A-MSDU length field + * * @fast_tx: TX fastpath information * @fast_rx: RX fastpath information * @tdls_chandef: a TDLS peer can have a wider chandef that is compatible to diff --git a/net/mac80211/status.c b/net/mac80211/status.c index 3f9ddd7f04b6..2b13a52ce96c 100644 --- a/net/mac80211/status.c +++ b/net/mac80211/status.c @@ -1244,30 +1244,6 @@ void ieee80211_tx_rate_update(struct ieee80211_hw *hw, } EXPORT_SYMBOL(ieee80211_tx_rate_update); -void ieee80211_tx_status_8023(struct ieee80211_hw *hw, - struct ieee80211_vif *vif, - struct sk_buff *skb) -{ - struct ieee80211_sub_if_data *sdata; - struct ieee80211_tx_status status = { - .skb = skb, - .info = IEEE80211_SKB_CB(skb), - }; - struct sta_info *sta; - - sdata = vif_to_sdata(vif); - - rcu_read_lock(); - - if (!ieee80211_lookup_ra_sta(sdata, skb, &sta) && !IS_ERR(sta)) - status.sta = &sta->sta; - - ieee80211_tx_status_ext(hw, &status); - - rcu_read_unlock(); -} -EXPORT_SYMBOL(ieee80211_tx_status_8023); - void ieee80211_report_low_ack(struct ieee80211_sta *pubsta, u32 num_packets) { struct sta_info *sta = container_of(pubsta, struct sta_info, sta); diff --git a/net/mac80211/trace.h b/net/mac80211/trace.h index 9f4377566c42..de5d69f21306 100644 --- a/net/mac80211/trace.h +++ b/net/mac80211/trace.h @@ -1177,6 +1177,13 @@ TRACE_EVENT(drv_flush, ) ); +DEFINE_EVENT(sta_event, drv_flush_sta, + TP_PROTO(struct ieee80211_local *local, + struct ieee80211_sub_if_data *sdata, + struct ieee80211_sta *sta), + TP_ARGS(local, sdata, sta) +); + TRACE_EVENT(drv_channel_switch, TP_PROTO(struct ieee80211_local *local, struct ieee80211_sub_if_data *sdata, @@ -2478,6 +2485,31 @@ DEFINE_EVENT(sta_event, drv_net_fill_forward_path, TP_ARGS(local, sdata, sta) ); +TRACE_EVENT(drv_net_setup_tc, + TP_PROTO(struct ieee80211_local *local, + struct ieee80211_sub_if_data *sdata, + u8 type), + + TP_ARGS(local, sdata, type), + + TP_STRUCT__entry( + LOCAL_ENTRY + VIF_ENTRY + __field(u8, type) + ), + + TP_fast_assign( + LOCAL_ASSIGN; + VIF_ASSIGN; + __entry->type = type; + ), + + TP_printk( + LOCAL_PR_FMT VIF_PR_FMT " type:%d\n", + LOCAL_PR_ARG, VIF_PR_ARG, __entry->type + ) +); + TRACE_EVENT(drv_change_vif_links, TP_PROTO(struct ieee80211_local *local, struct ieee80211_sub_if_data *sdata, diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c index 7699fb410670..1a3327407552 100644 --- a/net/mac80211/tx.c +++ b/net/mac80211/tx.c @@ -488,7 +488,7 @@ ieee80211_tx_h_unicast_ps_buf(struct ieee80211_tx_data *tx) int ac = skb_get_queue_mapping(tx->skb); if (ieee80211_is_mgmt(hdr->frame_control) && - !ieee80211_is_bufferable_mmpdu(hdr->frame_control)) { + !ieee80211_is_bufferable_mmpdu(tx->skb)) { info->flags |= IEEE80211_TX_CTL_NO_PS_BUFFER; return TX_CONTINUE; } @@ -1189,10 +1189,8 @@ static bool ieee80211_tx_prep_agg(struct ieee80211_tx_data *tx, return queued; } -static void -ieee80211_aggr_check(struct ieee80211_sub_if_data *sdata, - struct sta_info *sta, - struct sk_buff *skb) +void ieee80211_aggr_check(struct ieee80211_sub_if_data *sdata, + struct sta_info *sta, struct sk_buff *skb) { struct rate_control_ref *ref = sdata->local->rate_ctrl; u16 tid; @@ -1325,7 +1323,7 @@ static struct txq_info *ieee80211_get_txq(struct ieee80211_local *local, if (!(info->flags & IEEE80211_TX_CTL_HW_80211_ENCAP) && unlikely(!ieee80211_is_data_present(hdr->frame_control))) { if ((!ieee80211_is_mgmt(hdr->frame_control) || - ieee80211_is_bufferable_mmpdu(hdr->frame_control) || + ieee80211_is_bufferable_mmpdu(skb) || vif->type == NL80211_IFTYPE_STATION) && sta && sta->uploaded) { /* @@ -3019,6 +3017,9 @@ void ieee80211_check_fast_xmit(struct sta_info *sta) if (!ieee80211_hw_check(&local->hw, SUPPORT_FAST_XMIT)) return; + if (ieee80211_vif_is_mesh(&sdata->vif)) + mesh_fast_tx_flush_sta(sdata, sta); + /* Locking here protects both the pointer itself, and against concurrent * invocations winning data access races to, e.g., the key pointer that * is used. @@ -3371,7 +3372,8 @@ static bool ieee80211_amsdu_prepare_head(struct ieee80211_sub_if_data *sdata, static bool ieee80211_amsdu_aggregate(struct ieee80211_sub_if_data *sdata, struct sta_info *sta, struct ieee80211_fast_tx *fast_tx, - struct sk_buff *skb) + struct sk_buff *skb, + const u8 *da, const u8 *sa) { struct ieee80211_local *local = sdata->local; struct fq *fq = &local->fq; @@ -3400,6 +3402,9 @@ static bool ieee80211_amsdu_aggregate(struct ieee80211_sub_if_data *sdata, if (sdata->vif.offload_flags & IEEE80211_OFFLOAD_ENCAP_ENABLED) return false; + if (ieee80211_vif_is_mesh(&sdata->vif)) + return false; + if (skb_is_gso(skb)) return false; @@ -3484,7 +3489,8 @@ static bool ieee80211_amsdu_aggregate(struct ieee80211_sub_if_data *sdata, ret = true; data = skb_push(skb, ETH_ALEN + 2); - memmove(data, data + ETH_ALEN + 2, 2 * ETH_ALEN); + ether_addr_copy(data, da); + ether_addr_copy(data + ETH_ALEN, sa); data += 2 * ETH_ALEN; len = cpu_to_be16(subframe_len); @@ -3632,10 +3638,11 @@ free: return NULL; } -static void __ieee80211_xmit_fast(struct ieee80211_sub_if_data *sdata, - struct sta_info *sta, - struct ieee80211_fast_tx *fast_tx, - struct sk_buff *skb, u8 tid, bool ampdu) +void __ieee80211_xmit_fast(struct ieee80211_sub_if_data *sdata, + struct sta_info *sta, + struct ieee80211_fast_tx *fast_tx, + struct sk_buff *skb, bool ampdu, + const u8 *da, const u8 *sa) { struct ieee80211_local *local = sdata->local; struct ieee80211_hdr *hdr = (void *)fast_tx->hdr; @@ -3644,14 +3651,13 @@ static void __ieee80211_xmit_fast(struct ieee80211_sub_if_data *sdata, ieee80211_tx_result r; int hw_headroom = sdata->local->hw.extra_tx_headroom; int extra_head = fast_tx->hdr_len - (ETH_HLEN - 2); - struct ethhdr eth; skb = skb_share_check(skb, GFP_ATOMIC); if (unlikely(!skb)) return; if ((hdr->frame_control & cpu_to_le16(IEEE80211_STYPE_QOS_DATA)) && - ieee80211_amsdu_aggregate(sdata, sta, fast_tx, skb)) + ieee80211_amsdu_aggregate(sdata, sta, fast_tx, skb, da, sa)) return; /* will not be crypto-handled beyond what we do here, so use false @@ -3664,11 +3670,10 @@ static void __ieee80211_xmit_fast(struct ieee80211_sub_if_data *sdata, ENCRYPT_NO))) goto free; - memcpy(ð, skb->data, ETH_HLEN - 2); hdr = skb_push(skb, extra_head); memcpy(skb->data, fast_tx->hdr, fast_tx->hdr_len); - memcpy(skb->data + fast_tx->da_offs, eth.h_dest, ETH_ALEN); - memcpy(skb->data + fast_tx->sa_offs, eth.h_source, ETH_ALEN); + memcpy(skb->data + fast_tx->da_offs, da, ETH_ALEN); + memcpy(skb->data + fast_tx->sa_offs, sa, ETH_ALEN); info = IEEE80211_SKB_CB(skb); memset(info, 0, sizeof(*info)); @@ -3686,7 +3691,8 @@ static void __ieee80211_xmit_fast(struct ieee80211_sub_if_data *sdata, #endif if (hdr->frame_control & cpu_to_le16(IEEE80211_STYPE_QOS_DATA)) { - tid = skb->priority & IEEE80211_QOS_CTL_TAG1D_MASK; + u8 tid = skb->priority & IEEE80211_QOS_CTL_TAG1D_MASK; + *ieee80211_get_qos_ctl(hdr) = tid; } @@ -3729,6 +3735,7 @@ static bool ieee80211_xmit_fast(struct ieee80211_sub_if_data *sdata, struct ieee80211_hdr *hdr = (void *)fast_tx->hdr; struct tid_ampdu_tx *tid_tx = NULL; struct sk_buff *next; + struct ethhdr eth; u8 tid = IEEE80211_NUM_TIDS; /* control port protocol needs a lot of special handling */ @@ -3754,6 +3761,8 @@ static bool ieee80211_xmit_fast(struct ieee80211_sub_if_data *sdata, } } + memcpy(ð, skb->data, ETH_HLEN - 2); + /* after this point (skb is modified) we cannot return false */ skb = ieee80211_tx_skb_fixup(skb, ieee80211_sdata_netdev_features(sdata)); if (!skb) @@ -3761,7 +3770,8 @@ static bool ieee80211_xmit_fast(struct ieee80211_sub_if_data *sdata, skb_list_walk_safe(skb, skb, next) { skb_mark_not_on_list(skb); - __ieee80211_xmit_fast(sdata, sta, fast_tx, skb, tid, tid_tx); + __ieee80211_xmit_fast(sdata, sta, fast_tx, skb, tid_tx, + eth.h_dest, eth.h_source); } return true; @@ -4245,8 +4255,15 @@ void __ieee80211_subif_start_xmit(struct sk_buff *skb, return; } + sk_pacing_shift_update(skb->sk, sdata->local->hw.tx_sk_pacing_shift); + rcu_read_lock(); + if (ieee80211_vif_is_mesh(&sdata->vif) && + ieee80211_hw_check(&local->hw, SUPPORT_FAST_XMIT) && + ieee80211_mesh_xmit_fast(sdata, skb, ctrl_flags)) + goto out; + if (ieee80211_lookup_ra_sta(sdata, skb, &sta)) goto out_free; @@ -4256,8 +4273,6 @@ void __ieee80211_subif_start_xmit(struct sk_buff *skb, skb_set_queue_mapping(skb, ieee80211_select_queue(sdata, sta, skb)); ieee80211_aggr_check(sdata, sta, skb); - sk_pacing_shift_update(skb->sk, sdata->local->hw.tx_sk_pacing_shift); - if (sta) { struct ieee80211_fast_tx *fast_tx; @@ -5115,6 +5130,16 @@ static int ieee80211_beacon_protect(struct sk_buff *skb, tx.key = rcu_dereference(link->default_beacon_key); if (!tx.key) return 0; + + if (unlikely(tx.key->flags & KEY_FLAG_TAINTED)) { + tx.key = NULL; + return -EINVAL; + } + + if (!(tx.key->conf.flags & IEEE80211_KEY_FLAG_SW_MGMT_TX) && + tx.key->flags & KEY_FLAG_UPLOADED_TO_HARDWARE) + IEEE80211_SKB_CB(skb)->control.hw_key = &tx.key->conf; + tx.local = local; tx.sdata = sdata; __skb_queue_head_init(&tx.skbs); @@ -5187,13 +5212,29 @@ ieee80211_beacon_get_finish(struct ieee80211_hw *hw, } static void -ieee80211_beacon_add_mbssid(struct sk_buff *skb, struct beacon_data *beacon) +ieee80211_beacon_add_mbssid(struct sk_buff *skb, struct beacon_data *beacon, + u8 i) { - int i; + if (!beacon->mbssid_ies || !beacon->mbssid_ies->cnt || + i > beacon->mbssid_ies->cnt) + return; + + if (i < beacon->mbssid_ies->cnt) { + skb_put_data(skb, beacon->mbssid_ies->elem[i].data, + beacon->mbssid_ies->elem[i].len); - if (!beacon->mbssid_ies) + if (beacon->rnr_ies && beacon->rnr_ies->cnt) { + skb_put_data(skb, beacon->rnr_ies->elem[i].data, + beacon->rnr_ies->elem[i].len); + + for (i = beacon->mbssid_ies->cnt; i < beacon->rnr_ies->cnt; i++) + skb_put_data(skb, beacon->rnr_ies->elem[i].data, + beacon->rnr_ies->elem[i].len); + } return; + } + /* i == beacon->mbssid_ies->cnt, include all MBSSID elements */ for (i = 0; i < beacon->mbssid_ies->cnt; i++) skb_put_data(skb, beacon->mbssid_ies->elem[i].data, beacon->mbssid_ies->elem[i].len); @@ -5206,7 +5247,8 @@ ieee80211_beacon_get_ap(struct ieee80211_hw *hw, struct ieee80211_mutable_offsets *offs, bool is_template, struct beacon_data *beacon, - struct ieee80211_chanctx_conf *chanctx_conf) + struct ieee80211_chanctx_conf *chanctx_conf, + u8 ema_index) { struct ieee80211_local *local = hw_to_local(hw); struct ieee80211_sub_if_data *sdata = vif_to_sdata(vif); @@ -5225,7 +5267,10 @@ ieee80211_beacon_get_ap(struct ieee80211_hw *hw, /* headroom, head length, * tail length, maximum TIM length and multiple BSSID length */ - mbssid_len = ieee80211_get_mbssid_beacon_len(beacon->mbssid_ies); + mbssid_len = ieee80211_get_mbssid_beacon_len(beacon->mbssid_ies, + beacon->rnr_ies, + ema_index); + skb = dev_alloc_skb(local->tx_headroom + beacon->head_len + beacon->tail_len + 256 + local->hw.extra_beacon_tailroom + mbssid_len); @@ -5243,7 +5288,7 @@ ieee80211_beacon_get_ap(struct ieee80211_hw *hw, offs->cntdwn_counter_offs[0] = beacon->cntdwn_counter_offsets[0]; if (mbssid_len) { - ieee80211_beacon_add_mbssid(skb, beacon); + ieee80211_beacon_add_mbssid(skb, beacon, ema_index); offs->mbssid_off = skb->len - mbssid_len; } @@ -5262,12 +5307,51 @@ ieee80211_beacon_get_ap(struct ieee80211_hw *hw, return skb; } +static struct ieee80211_ema_beacons * +ieee80211_beacon_get_ap_ema_list(struct ieee80211_hw *hw, + struct ieee80211_vif *vif, + struct ieee80211_link_data *link, + struct ieee80211_mutable_offsets *offs, + bool is_template, struct beacon_data *beacon, + struct ieee80211_chanctx_conf *chanctx_conf) +{ + struct ieee80211_ema_beacons *ema = NULL; + + if (!beacon->mbssid_ies || !beacon->mbssid_ies->cnt) + return NULL; + + ema = kzalloc(struct_size(ema, bcn, beacon->mbssid_ies->cnt), + GFP_ATOMIC); + if (!ema) + return NULL; + + for (ema->cnt = 0; ema->cnt < beacon->mbssid_ies->cnt; ema->cnt++) { + ema->bcn[ema->cnt].skb = + ieee80211_beacon_get_ap(hw, vif, link, + &ema->bcn[ema->cnt].offs, + is_template, beacon, + chanctx_conf, ema->cnt); + if (!ema->bcn[ema->cnt].skb) + break; + } + + if (ema->cnt == beacon->mbssid_ies->cnt) + return ema; + + ieee80211_beacon_free_ema_list(ema); + return NULL; +} + +#define IEEE80211_INCLUDE_ALL_MBSSID_ELEMS -1 + static struct sk_buff * __ieee80211_beacon_get(struct ieee80211_hw *hw, struct ieee80211_vif *vif, struct ieee80211_mutable_offsets *offs, bool is_template, - unsigned int link_id) + unsigned int link_id, + int ema_index, + struct ieee80211_ema_beacons **ema_beacons) { struct ieee80211_local *local = hw_to_local(hw); struct beacon_data *beacon = NULL; @@ -5296,8 +5380,29 @@ __ieee80211_beacon_get(struct ieee80211_hw *hw, if (!beacon) goto out; - skb = ieee80211_beacon_get_ap(hw, vif, link, offs, is_template, - beacon, chanctx_conf); + if (ema_beacons) { + *ema_beacons = + ieee80211_beacon_get_ap_ema_list(hw, vif, link, + offs, + is_template, + beacon, + chanctx_conf); + } else { + if (beacon->mbssid_ies && beacon->mbssid_ies->cnt) { + if (ema_index >= beacon->mbssid_ies->cnt) + goto out; /* End of MBSSID elements */ + + if (ema_index <= IEEE80211_INCLUDE_ALL_MBSSID_ELEMS) + ema_index = beacon->mbssid_ies->cnt; + } else { + ema_index = 0; + } + + skb = ieee80211_beacon_get_ap(hw, vif, link, offs, + is_template, beacon, + chanctx_conf, + ema_index); + } } else if (sdata->vif.type == NL80211_IFTYPE_ADHOC) { struct ieee80211_if_ibss *ifibss = &sdata->u.ibss; struct ieee80211_hdr *hdr; @@ -5385,10 +5490,50 @@ ieee80211_beacon_get_template(struct ieee80211_hw *hw, struct ieee80211_mutable_offsets *offs, unsigned int link_id) { - return __ieee80211_beacon_get(hw, vif, offs, true, link_id); + return __ieee80211_beacon_get(hw, vif, offs, true, link_id, + IEEE80211_INCLUDE_ALL_MBSSID_ELEMS, NULL); } EXPORT_SYMBOL(ieee80211_beacon_get_template); +struct sk_buff * +ieee80211_beacon_get_template_ema_index(struct ieee80211_hw *hw, + struct ieee80211_vif *vif, + struct ieee80211_mutable_offsets *offs, + unsigned int link_id, u8 ema_index) +{ + return __ieee80211_beacon_get(hw, vif, offs, true, link_id, ema_index, + NULL); +} +EXPORT_SYMBOL(ieee80211_beacon_get_template_ema_index); + +void ieee80211_beacon_free_ema_list(struct ieee80211_ema_beacons *ema_beacons) +{ + u8 i; + + if (!ema_beacons) + return; + + for (i = 0; i < ema_beacons->cnt; i++) + kfree_skb(ema_beacons->bcn[i].skb); + + kfree(ema_beacons); +} +EXPORT_SYMBOL(ieee80211_beacon_free_ema_list); + +struct ieee80211_ema_beacons * +ieee80211_beacon_get_template_ema_list(struct ieee80211_hw *hw, + struct ieee80211_vif *vif, + unsigned int link_id) +{ + struct ieee80211_ema_beacons *ema_beacons = NULL; + + WARN_ON(__ieee80211_beacon_get(hw, vif, NULL, false, link_id, 0, + &ema_beacons)); + + return ema_beacons; +} +EXPORT_SYMBOL(ieee80211_beacon_get_template_ema_list); + struct sk_buff *ieee80211_beacon_get_tim(struct ieee80211_hw *hw, struct ieee80211_vif *vif, u16 *tim_offset, u16 *tim_length, @@ -5396,7 +5541,9 @@ struct sk_buff *ieee80211_beacon_get_tim(struct ieee80211_hw *hw, { struct ieee80211_mutable_offsets offs = {}; struct sk_buff *bcn = __ieee80211_beacon_get(hw, vif, &offs, false, - link_id); + link_id, + IEEE80211_INCLUDE_ALL_MBSSID_ELEMS, + NULL); struct sk_buff *copy; int shift; diff --git a/net/mac80211/util.c b/net/mac80211/util.c index 8c397650b96f..1527d6aafc14 100644 --- a/net/mac80211/util.c +++ b/net/mac80211/util.c @@ -1962,6 +1962,14 @@ static int ieee80211_build_preq_ies_band(struct ieee80211_sub_if_data *sdata, rate_flags = ieee80211_chandef_rate_flags(chandef); shift = ieee80211_chandef_get_shift(chandef); + /* For direct scan add S1G IE and consider its override bits */ + if (band == NL80211_BAND_S1GHZ) { + if (end - pos < 2 + sizeof(struct ieee80211_s1g_cap)) + goto out_err; + pos = ieee80211_ie_build_s1g_cap(pos, &sband->s1g_cap); + goto done; + } + num_rates = 0; for (i = 0; i < sband->n_bitrates; i++) { if ((BIT(i) & rate_mask) == 0) @@ -3023,6 +3031,21 @@ size_t ieee80211_ie_split_vendor(const u8 *ies, size_t ielen, size_t offset) return pos; } +u8 *ieee80211_ie_build_s1g_cap(u8 *pos, struct ieee80211_sta_s1g_cap *s1g_cap) +{ + *pos++ = WLAN_EID_S1G_CAPABILITIES; + *pos++ = sizeof(struct ieee80211_s1g_cap); + memset(pos, 0, sizeof(struct ieee80211_s1g_cap)); + + memcpy(pos, &s1g_cap->cap, sizeof(s1g_cap->cap)); + pos += sizeof(s1g_cap->cap); + + memcpy(pos, &s1g_cap->nss_mcs, sizeof(s1g_cap->nss_mcs)); + pos += sizeof(s1g_cap->nss_mcs); + + return pos; +} + u8 *ieee80211_ie_build_ht_cap(u8 *pos, struct ieee80211_sta_ht_cap *ht_cap, u16 cap) { @@ -3462,6 +3485,77 @@ out: return pos; } +u8 *ieee80211_ie_build_eht_oper(u8 *pos, struct cfg80211_chan_def *chandef, + const struct ieee80211_sta_eht_cap *eht_cap) + +{ + const struct ieee80211_eht_mcs_nss_supp_20mhz_only *eht_mcs_nss = + &eht_cap->eht_mcs_nss_supp.only_20mhz; + struct ieee80211_eht_operation *eht_oper; + struct ieee80211_eht_operation_info *eht_oper_info; + u8 eht_oper_len = offsetof(struct ieee80211_eht_operation, optional); + u8 eht_oper_info_len = + offsetof(struct ieee80211_eht_operation_info, optional); + u8 chan_width = 0; + + *pos++ = WLAN_EID_EXTENSION; + *pos++ = 1 + eht_oper_len + eht_oper_info_len; + *pos++ = WLAN_EID_EXT_EHT_OPERATION; + + eht_oper = (struct ieee80211_eht_operation *)pos; + + memcpy(&eht_oper->basic_mcs_nss, eht_mcs_nss, sizeof(*eht_mcs_nss)); + eht_oper->params |= IEEE80211_EHT_OPER_INFO_PRESENT; + pos += eht_oper_len; + + eht_oper_info = + (struct ieee80211_eht_operation_info *)eht_oper->optional; + + eht_oper_info->ccfs0 = + ieee80211_frequency_to_channel(chandef->center_freq1); + if (chandef->center_freq2) + eht_oper_info->ccfs1 = + ieee80211_frequency_to_channel(chandef->center_freq2); + else + eht_oper_info->ccfs1 = 0; + + switch (chandef->width) { + case NL80211_CHAN_WIDTH_320: + chan_width = IEEE80211_EHT_OPER_CHAN_WIDTH_320MHZ; + eht_oper_info->ccfs1 = eht_oper_info->ccfs0; + if (chandef->chan->center_freq < chandef->center_freq1) + eht_oper_info->ccfs0 -= 16; + else + eht_oper_info->ccfs0 += 16; + break; + case NL80211_CHAN_WIDTH_160: + eht_oper_info->ccfs1 = eht_oper_info->ccfs0; + if (chandef->chan->center_freq < chandef->center_freq1) + eht_oper_info->ccfs0 -= 8; + else + eht_oper_info->ccfs0 += 8; + fallthrough; + case NL80211_CHAN_WIDTH_80P80: + chan_width = IEEE80211_EHT_OPER_CHAN_WIDTH_160MHZ; + break; + case NL80211_CHAN_WIDTH_80: + chan_width = IEEE80211_EHT_OPER_CHAN_WIDTH_80MHZ; + break; + case NL80211_CHAN_WIDTH_40: + chan_width = IEEE80211_EHT_OPER_CHAN_WIDTH_40MHZ; + break; + default: + chan_width = IEEE80211_EHT_OPER_CHAN_WIDTH_20MHZ; + break; + } + eht_oper_info->control = chan_width; + pos += eht_oper_info_len; + + /* TODO: eht_oper_info->optional */ + + return pos; +} + bool ieee80211_chandef_ht_oper(const struct ieee80211_ht_operation *ht_oper, struct cfg80211_chan_def *chandef) { diff --git a/net/mac80211/wpa.c b/net/mac80211/wpa.c index 20f742b5503b..4133496da378 100644 --- a/net/mac80211/wpa.c +++ b/net/mac80211/wpa.c @@ -550,7 +550,7 @@ ieee80211_crypto_ccmp_decrypt(struct ieee80211_rx_data *rx, if (res < 0 || (!res && !(status->flag & RX_FLAG_ALLOW_SAME_PN))) { key->u.ccmp.replays++; - return RX_DROP_UNUSABLE; + return RX_DROP_U_REPLAY; } if (!(status->flag & RX_FLAG_DECRYPTED)) { @@ -564,7 +564,7 @@ ieee80211_crypto_ccmp_decrypt(struct ieee80211_rx_data *rx, skb->data + hdrlen + IEEE80211_CCMP_HDR_LEN, data_len, skb->data + skb->len - mic_len)) - return RX_DROP_UNUSABLE; + return RX_DROP_U_MIC_FAIL; } memcpy(key->u.ccmp.rx_pn[queue], pn, IEEE80211_CCMP_PN_LEN); @@ -746,7 +746,7 @@ ieee80211_crypto_gcmp_decrypt(struct ieee80211_rx_data *rx) if (res < 0 || (!res && !(status->flag & RX_FLAG_ALLOW_SAME_PN))) { key->u.gcmp.replays++; - return RX_DROP_UNUSABLE; + return RX_DROP_U_REPLAY; } if (!(status->flag & RX_FLAG_DECRYPTED)) { @@ -761,7 +761,7 @@ ieee80211_crypto_gcmp_decrypt(struct ieee80211_rx_data *rx) data_len, skb->data + skb->len - IEEE80211_GCMP_MIC_LEN)) - return RX_DROP_UNUSABLE; + return RX_DROP_U_MIC_FAIL; } memcpy(key->u.gcmp.rx_pn[queue], pn, IEEE80211_GCMP_PN_LEN); @@ -930,13 +930,13 @@ ieee80211_crypto_aes_cmac_decrypt(struct ieee80211_rx_data *rx) (skb->data + skb->len - sizeof(*mmie)); if (mmie->element_id != WLAN_EID_MMIE || mmie->length != sizeof(*mmie) - 2) - return RX_DROP_UNUSABLE; /* Invalid MMIE */ + return RX_DROP_U_BAD_MMIE; /* Invalid MMIE */ bip_ipn_swap(ipn, mmie->sequence_number); if (memcmp(ipn, key->u.aes_cmac.rx_pn, 6) <= 0) { key->u.aes_cmac.replays++; - return RX_DROP_UNUSABLE; + return RX_DROP_U_REPLAY; } if (!(status->flag & RX_FLAG_DECRYPTED)) { @@ -946,7 +946,7 @@ ieee80211_crypto_aes_cmac_decrypt(struct ieee80211_rx_data *rx) skb->data + 24, skb->len - 24, mic); if (crypto_memneq(mic, mmie->mic, sizeof(mmie->mic))) { key->u.aes_cmac.icverrors++; - return RX_DROP_UNUSABLE; + return RX_DROP_U_MIC_FAIL; } } @@ -986,7 +986,7 @@ ieee80211_crypto_aes_cmac_256_decrypt(struct ieee80211_rx_data *rx) if (memcmp(ipn, key->u.aes_cmac.rx_pn, 6) <= 0) { key->u.aes_cmac.replays++; - return RX_DROP_UNUSABLE; + return RX_DROP_U_REPLAY; } if (!(status->flag & RX_FLAG_DECRYPTED)) { @@ -996,7 +996,7 @@ ieee80211_crypto_aes_cmac_256_decrypt(struct ieee80211_rx_data *rx) skb->data + 24, skb->len - 24, mic); if (crypto_memneq(mic, mmie->mic, sizeof(mmie->mic))) { key->u.aes_cmac.icverrors++; - return RX_DROP_UNUSABLE; + return RX_DROP_U_MIC_FAIL; } } @@ -1079,13 +1079,13 @@ ieee80211_crypto_aes_gmac_decrypt(struct ieee80211_rx_data *rx) (skb->data + skb->len - sizeof(*mmie)); if (mmie->element_id != WLAN_EID_MMIE || mmie->length != sizeof(*mmie) - 2) - return RX_DROP_UNUSABLE; /* Invalid MMIE */ + return RX_DROP_U_BAD_MMIE; /* Invalid MMIE */ bip_ipn_swap(ipn, mmie->sequence_number); if (memcmp(ipn, key->u.aes_gmac.rx_pn, 6) <= 0) { key->u.aes_gmac.replays++; - return RX_DROP_UNUSABLE; + return RX_DROP_U_REPLAY; } if (!(status->flag & RX_FLAG_DECRYPTED)) { @@ -1104,7 +1104,7 @@ ieee80211_crypto_aes_gmac_decrypt(struct ieee80211_rx_data *rx) crypto_memneq(mic, mmie->mic, sizeof(mmie->mic))) { key->u.aes_gmac.icverrors++; kfree(mic); - return RX_DROP_UNUSABLE; + return RX_DROP_U_MIC_FAIL; } kfree(mic); } diff --git a/net/mctp/af_mctp.c b/net/mctp/af_mctp.c index 3150f3f0c872..bb4bd0b6a4f7 100644 --- a/net/mctp/af_mctp.c +++ b/net/mctp/af_mctp.c @@ -704,7 +704,6 @@ subsys_initcall(mctp_init); module_exit(mctp_exit); MODULE_DESCRIPTION("MCTP core"); -MODULE_LICENSE("GPL v2"); MODULE_AUTHOR("Jeremy Kerr <jk@codeconstruct.com.au>"); MODULE_ALIAS_NETPROTO(PF_MCTP); diff --git a/net/mptcp/options.c b/net/mptcp/options.c index 355f798d575a..19a01b6566f1 100644 --- a/net/mptcp/options.c +++ b/net/mptcp/options.c @@ -442,7 +442,6 @@ static void clear_3rdack_retransmission(struct sock *sk) static bool mptcp_established_options_mp(struct sock *sk, struct sk_buff *skb, bool snd_data_fin_enable, unsigned int *size, - unsigned int remaining, struct mptcp_out_options *opts) { struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk); @@ -556,7 +555,6 @@ static void mptcp_write_data_fin(struct mptcp_subflow_context *subflow, static bool mptcp_established_options_dss(struct sock *sk, struct sk_buff *skb, bool snd_data_fin_enable, unsigned int *size, - unsigned int remaining, struct mptcp_out_options *opts) { struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk); @@ -580,7 +578,6 @@ static bool mptcp_established_options_dss(struct sock *sk, struct sk_buff *skb, opts->ext_copy = *mpext; } - remaining -= map_size; dss_size = map_size; if (skb && snd_data_fin_enable) mptcp_write_data_fin(subflow, skb, &opts->ext_copy); @@ -851,9 +848,9 @@ bool mptcp_established_options(struct sock *sk, struct sk_buff *skb, } snd_data_fin = mptcp_data_fin_enabled(msk); - if (mptcp_established_options_mp(sk, skb, snd_data_fin, &opt_size, remaining, opts)) + if (mptcp_established_options_mp(sk, skb, snd_data_fin, &opt_size, opts)) ret = true; - else if (mptcp_established_options_dss(sk, skb, snd_data_fin, &opt_size, remaining, opts)) { + else if (mptcp_established_options_dss(sk, skb, snd_data_fin, &opt_size, opts)) { unsigned int mp_fail_size; ret = true; @@ -1001,7 +998,7 @@ check_notify: clear_3rdack_retransmission(ssk); mptcp_pm_subflow_established(msk); } else { - mptcp_pm_fully_established(msk, ssk, GFP_ATOMIC); + mptcp_pm_fully_established(msk, ssk); } return true; diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c index 70f0ced3ca86..78c924506e83 100644 --- a/net/mptcp/pm.c +++ b/net/mptcp/pm.c @@ -126,7 +126,7 @@ static bool mptcp_pm_schedule_work(struct mptcp_sock *msk, return true; } -void mptcp_pm_fully_established(struct mptcp_sock *msk, const struct sock *ssk, gfp_t gfp) +void mptcp_pm_fully_established(struct mptcp_sock *msk, const struct sock *ssk) { struct mptcp_pm_data *pm = &msk->pm; bool announce = false; @@ -150,7 +150,7 @@ void mptcp_pm_fully_established(struct mptcp_sock *msk, const struct sock *ssk, spin_unlock_bh(&pm->lock); if (announce) - mptcp_event(MPTCP_EVENT_ESTABLISHED, msk, ssk, gfp); + mptcp_event(MPTCP_EVENT_ESTABLISHED, msk, ssk, GFP_ATOMIC); } void mptcp_pm_connection_closed(struct mptcp_sock *msk) diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c index 5c8dea49626c..bc343dab5e3f 100644 --- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -1035,8 +1035,8 @@ static int mptcp_pm_nl_create_listen_socket(struct sock *sk, lock_sock(newsk); ssock = __mptcp_nmpc_socket(mptcp_sk(newsk)); release_sock(newsk); - if (!ssock) - return -EINVAL; + if (IS_ERR(ssock)) + return PTR_ERR(ssock); mptcp_info2sockaddr(&entry->addr, &addr, entry->addr.family); #if IS_ENABLED(CONFIG_MPTCP_IPV6) @@ -2035,7 +2035,7 @@ static int mptcp_event_put_token_and_ssk(struct sk_buff *skb, nla_put_s32(skb, MPTCP_ATTR_IF_IDX, ssk->sk_bound_dev_if)) return -EMSGSIZE; - sk_err = ssk->sk_err; + sk_err = READ_ONCE(ssk->sk_err); if (sk_err && sk->sk_state == TCP_ESTABLISHED && nla_put_u8(skb, MPTCP_ATTR_ERROR, sk_err)) return -EMSGSIZE; diff --git a/net/mptcp/pm_userspace.c b/net/mptcp/pm_userspace.c index a02d3cbf2a1b..27a275805c06 100644 --- a/net/mptcp/pm_userspace.c +++ b/net/mptcp/pm_userspace.c @@ -25,8 +25,8 @@ void mptcp_free_local_addr_list(struct mptcp_sock *msk) } } -int mptcp_userspace_pm_append_new_local_addr(struct mptcp_sock *msk, - struct mptcp_pm_addr_entry *entry) +static int mptcp_userspace_pm_append_new_local_addr(struct mptcp_sock *msk, + struct mptcp_pm_addr_entry *entry) { DECLARE_BITMAP(id_bitmap, MPTCP_PM_MAX_ADDR_ID + 1); struct mptcp_pm_addr_entry *match = NULL; diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index b998e9df53ce..08dc53f56bc2 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -49,18 +49,6 @@ static void __mptcp_check_send_data_fin(struct sock *sk); DEFINE_PER_CPU(struct mptcp_delegated_action, mptcp_delegated_actions); static struct net_device mptcp_napi_dev; -/* If msk has an initial subflow socket, and the MP_CAPABLE handshake has not - * completed yet or has failed, return the subflow socket. - * Otherwise return NULL. - */ -struct socket *__mptcp_nmpc_socket(const struct mptcp_sock *msk) -{ - if (!msk->subflow || READ_ONCE(msk->can_ack)) - return NULL; - - return msk->subflow; -} - /* Returns end sequence number of the receiver's advertised window */ static u64 mptcp_wnd_end(const struct mptcp_sock *msk) { @@ -116,6 +104,31 @@ static int __mptcp_socket_create(struct mptcp_sock *msk) return 0; } +/* If the MPC handshake is not started, returns the first subflow, + * eventually allocating it. + */ +struct socket *__mptcp_nmpc_socket(struct mptcp_sock *msk) +{ + struct sock *sk = (struct sock *)msk; + int ret; + + if (!((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_LISTEN))) + return ERR_PTR(-EINVAL); + + if (!msk->subflow) { + if (msk->first) + return ERR_PTR(-EINVAL); + + ret = __mptcp_socket_create(msk); + if (ret) + return ERR_PTR(ret); + + mptcp_sockopt_sync(msk, msk->first); + } + + return msk->subflow; +} + static void mptcp_drop(struct sock *sk, struct sk_buff *skb) { sk_drops_add(sk, skb); @@ -459,7 +472,7 @@ static bool mptcp_pending_data_fin(struct sock *sk, u64 *seq) return false; } -static void mptcp_set_datafin_timeout(const struct sock *sk) +static void mptcp_set_datafin_timeout(struct sock *sk) { struct inet_connection_sock *icsk = inet_csk(sk); u32 retransmits; @@ -1662,13 +1675,31 @@ static void mptcp_set_nospace(struct sock *sk) static int mptcp_disconnect(struct sock *sk, int flags); -static int mptcp_sendmsg_fastopen(struct sock *sk, struct sock *ssk, struct msghdr *msg, +static int mptcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg, size_t len, int *copied_syn) { unsigned int saved_flags = msg->msg_flags; struct mptcp_sock *msk = mptcp_sk(sk); + struct socket *ssock; + struct sock *ssk; int ret; + /* on flags based fastopen the mptcp is supposed to create the + * first subflow right now. Otherwise we are in the defer_connect + * path, and the first subflow must be already present. + * Since the defer_connect flag is cleared after the first succsful + * fastopen attempt, no need to check for additional subflow status. + */ + if (msg->msg_flags & MSG_FASTOPEN) { + ssock = __mptcp_nmpc_socket(msk); + if (IS_ERR(ssock)) + return PTR_ERR(ssock); + } + if (!msk->first) + return -EINVAL; + + ssk = msk->first; + lock_sock(ssk); msg->msg_flags |= MSG_DONTWAIT; msk->connect_flags = O_NONBLOCK; @@ -1691,6 +1722,7 @@ static int mptcp_sendmsg_fastopen(struct sock *sk, struct sock *ssk, struct msgh } else if (ret && ret != -EINPROGRESS) { mptcp_disconnect(sk, 0); } + inet_sk(sk)->defer_connect = 0; return ret; } @@ -1699,7 +1731,6 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) { struct mptcp_sock *msk = mptcp_sk(sk); struct page_frag *pfrag; - struct socket *ssock; size_t copied = 0; int ret = 0; long timeo; @@ -1709,12 +1740,10 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) lock_sock(sk); - ssock = __mptcp_nmpc_socket(msk); - if (unlikely(ssock && (inet_sk(ssock->sk)->defer_connect || - msg->msg_flags & MSG_FASTOPEN))) { + if (unlikely(inet_sk(sk)->defer_connect || msg->msg_flags & MSG_FASTOPEN)) { int copied_syn = 0; - ret = mptcp_sendmsg_fastopen(sk, ssock->sk, msg, len, &copied_syn); + ret = mptcp_sendmsg_fastopen(sk, msg, len, &copied_syn); copied += copied_syn; if (ret == -EINPROGRESS && copied_syn > 0) goto out; @@ -2486,15 +2515,15 @@ static void mptcp_check_fastclose(struct mptcp_sock *msk) /* Mirror the tcp_reset() error propagation */ switch (sk->sk_state) { case TCP_SYN_SENT: - sk->sk_err = ECONNREFUSED; + WRITE_ONCE(sk->sk_err, ECONNREFUSED); break; case TCP_CLOSE_WAIT: - sk->sk_err = EPIPE; + WRITE_ONCE(sk->sk_err, EPIPE); break; case TCP_CLOSE: return; default: - sk->sk_err = ECONNRESET; + WRITE_ONCE(sk->sk_err, ECONNRESET); } inet_sk_state_store(sk, TCP_CLOSE); @@ -2734,10 +2763,6 @@ static int mptcp_init_sock(struct sock *sk) if (unlikely(!net->mib.mptcp_statistics) && !mptcp_mib_alloc(net)) return -ENOMEM; - ret = __mptcp_socket_create(mptcp_sk(sk)); - if (ret) - return ret; - set_bit(SOCK_CUSTOM_SOCKOPT, &sk->sk_socket->flags); /* fetch the ca name; do it outside __mptcp_init_sock(), so that clone will @@ -2942,10 +2967,13 @@ bool __mptcp_close(struct sock *sk, long timeout) goto cleanup; } - if (mptcp_check_readable(msk)) { - /* the msk has read data, do the MPTCP equivalent of TCP reset */ + if (mptcp_check_readable(msk) || timeout < 0) { + /* If the msk has read data, or the caller explicitly ask it, + * do the MPTCP equivalent of TCP reset, aka MPTCP fastclose + */ inet_sk_state_store(sk, TCP_CLOSE); mptcp_do_fastclose(sk); + timeout = 0; } else if (mptcp_close_state(sk)) { __mptcp_wr_shutdown(sk); } @@ -3157,7 +3185,7 @@ static struct sock *mptcp_accept(struct sock *sk, int flags, int *err, struct socket *listener; struct sock *newsk; - listener = __mptcp_nmpc_socket(msk); + listener = msk->subflow; if (WARN_ON_ONCE(!listener)) { *err = -EINVAL; return NULL; @@ -3377,7 +3405,7 @@ static int mptcp_get_port(struct sock *sk, unsigned short snum) struct mptcp_sock *msk = mptcp_sk(sk); struct socket *ssock; - ssock = __mptcp_nmpc_socket(msk); + ssock = msk->subflow; pr_debug("msk=%p, subflow=%p", msk, ssock); if (WARN_ON_ONCE(!ssock)) return -EINVAL; @@ -3565,8 +3593,8 @@ static int mptcp_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len) int err = -EINVAL; ssock = __mptcp_nmpc_socket(msk); - if (!ssock) - return -EINVAL; + if (IS_ERR(ssock)) + return PTR_ERR(ssock); mptcp_token_destroy(msk); inet_sk_state_store(sk, TCP_SYN_SENT); @@ -3654,8 +3682,8 @@ static int mptcp_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) lock_sock(sock->sk); ssock = __mptcp_nmpc_socket(msk); - if (!ssock) { - err = -EINVAL; + if (IS_ERR(ssock)) { + err = PTR_ERR(ssock); goto unlock; } @@ -3691,8 +3719,8 @@ static int mptcp_listen(struct socket *sock, int backlog) lock_sock(sk); ssock = __mptcp_nmpc_socket(msk); - if (!ssock) { - err = -EINVAL; + if (IS_ERR(ssock)) { + err = PTR_ERR(ssock); goto unlock; } @@ -3723,7 +3751,10 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock, pr_debug("msk=%p", msk); - ssock = __mptcp_nmpc_socket(msk); + /* buggy applications can call accept on socket states other then LISTEN + * but no need to allocate the first subflow just to error out. + */ + ssock = msk->subflow; if (!ssock) return -EINVAL; @@ -3817,7 +3848,7 @@ static __poll_t mptcp_poll(struct file *file, struct socket *sock, /* This barrier is coupled with smp_wmb() in __mptcp_error_report() */ smp_rmb(); - if (sk->sk_err) + if (READ_ONCE(sk->sk_err)) mask |= EPOLLERR; return mask; diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index d6469b6ab38e..2d7b2c80a164 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -334,10 +334,7 @@ static inline void msk_owned_by_me(const struct mptcp_sock *msk) sock_owned_by_me((const struct sock *)msk); } -static inline struct mptcp_sock *mptcp_sk(const struct sock *sk) -{ - return (struct mptcp_sock *)sk; -} +#define mptcp_sk(ptr) container_of_const(ptr, struct mptcp_sock, sk.icsk_inet.sk) /* the msk socket don't use the backlog, also account for the bulk * free memory @@ -371,7 +368,7 @@ static inline struct mptcp_data_frag *mptcp_send_next(struct sock *sk) static inline struct mptcp_data_frag *mptcp_pending_tail(const struct sock *sk) { - struct mptcp_sock *msk = mptcp_sk(sk); + const struct mptcp_sock *msk = mptcp_sk(sk); if (!msk->first_pending) return NULL; @@ -382,7 +379,7 @@ static inline struct mptcp_data_frag *mptcp_pending_tail(const struct sock *sk) return list_last_entry(&msk->rtx_queue, struct mptcp_data_frag, list); } -static inline struct mptcp_data_frag *mptcp_rtx_head(const struct sock *sk) +static inline struct mptcp_data_frag *mptcp_rtx_head(struct sock *sk) { struct mptcp_sock *msk = mptcp_sk(sk); @@ -631,7 +628,7 @@ void __mptcp_subflow_send_ack(struct sock *ssk); void mptcp_subflow_reset(struct sock *ssk); void mptcp_subflow_queue_clean(struct sock *sk, struct sock *ssk); void mptcp_sock_graft(struct sock *sk, struct socket *parent); -struct socket *__mptcp_nmpc_socket(const struct mptcp_sock *msk); +struct socket *__mptcp_nmpc_socket(struct mptcp_sock *msk); bool __mptcp_close(struct sock *sk, long timeout); void mptcp_cancel_work(struct sock *sk); void __mptcp_unaccepted_force_close(struct sock *sk); @@ -787,7 +784,7 @@ bool mptcp_pm_addr_families_match(const struct sock *sk, void mptcp_pm_subflow_chk_stale(const struct mptcp_sock *msk, struct sock *ssk); void mptcp_pm_nl_subflow_chk_stale(const struct mptcp_sock *msk, struct sock *ssk); void mptcp_pm_new_connection(struct mptcp_sock *msk, const struct sock *ssk, int server_side); -void mptcp_pm_fully_established(struct mptcp_sock *msk, const struct sock *ssk, gfp_t gfp); +void mptcp_pm_fully_established(struct mptcp_sock *msk, const struct sock *ssk); bool mptcp_pm_allow_new_subflow(struct mptcp_sock *msk); void mptcp_pm_connection_closed(struct mptcp_sock *msk); void mptcp_pm_subflow_established(struct mptcp_sock *msk); @@ -835,8 +832,6 @@ int mptcp_pm_remove_subflow(struct mptcp_sock *msk, const struct mptcp_rm_list * void mptcp_pm_remove_addrs_and_subflows(struct mptcp_sock *msk, struct list_head *rm_list); -int mptcp_userspace_pm_append_new_local_addr(struct mptcp_sock *msk, - struct mptcp_pm_addr_entry *entry); void mptcp_free_local_addr_list(struct mptcp_sock *msk); int mptcp_nl_cmd_announce(struct sk_buff *skb, struct genl_info *info); int mptcp_nl_cmd_remove(struct sk_buff *skb, struct genl_info *info); diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c index 8a9656248b0f..d4258869ac48 100644 --- a/net/mptcp/sockopt.c +++ b/net/mptcp/sockopt.c @@ -301,9 +301,9 @@ static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname, case SO_BINDTOIFINDEX: lock_sock(sk); ssock = __mptcp_nmpc_socket(msk); - if (!ssock) { + if (IS_ERR(ssock)) { release_sock(sk); - return -EINVAL; + return PTR_ERR(ssock); } ret = sock_setsockopt(ssock, SOL_SOCKET, optname, optval, optlen); @@ -396,9 +396,9 @@ static int mptcp_setsockopt_v6(struct mptcp_sock *msk, int optname, case IPV6_FREEBIND: lock_sock(sk); ssock = __mptcp_nmpc_socket(msk); - if (!ssock) { + if (IS_ERR(ssock)) { release_sock(sk); - return -EINVAL; + return PTR_ERR(ssock); } ret = tcp_setsockopt(ssock->sk, SOL_IPV6, optname, optval, optlen); @@ -693,9 +693,9 @@ static int mptcp_setsockopt_sol_ip_set_transparent(struct mptcp_sock *msk, int o lock_sock(sk); ssock = __mptcp_nmpc_socket(msk); - if (!ssock) { + if (IS_ERR(ssock)) { release_sock(sk); - return -EINVAL; + return PTR_ERR(ssock); } issk = inet_sk(ssock->sk); @@ -762,13 +762,15 @@ static int mptcp_setsockopt_first_sf_only(struct mptcp_sock *msk, int level, int { struct sock *sk = (struct sock *)msk; struct socket *sock; - int ret = -EINVAL; + int ret; /* Limit to first subflow, before the connection establishment */ lock_sock(sk); sock = __mptcp_nmpc_socket(msk); - if (!sock) + if (IS_ERR(sock)) { + ret = PTR_ERR(sock); goto unlock; + } ret = tcp_setsockopt(sock->sk, level, optname, optval, optlen); @@ -861,7 +863,7 @@ static int mptcp_getsockopt_first_sf_only(struct mptcp_sock *msk, int level, int { struct sock *sk = (struct sock *)msk; struct socket *ssock; - int ret = -EINVAL; + int ret; struct sock *ssk; lock_sock(sk); @@ -872,8 +874,10 @@ static int mptcp_getsockopt_first_sf_only(struct mptcp_sock *msk, int level, int } ssock = __mptcp_nmpc_socket(msk); - if (!ssock) + if (IS_ERR(ssock)) { + ret = PTR_ERR(ssock); goto out; + } ret = tcp_getsockopt(ssock->sk, level, optname, optval, optlen); @@ -885,7 +889,6 @@ out: void mptcp_diag_fill_info(struct mptcp_sock *msk, struct mptcp_info *info) { u32 flags = 0; - u8 val; memset(info, 0, sizeof(*info)); @@ -893,12 +896,19 @@ void mptcp_diag_fill_info(struct mptcp_sock *msk, struct mptcp_info *info) info->mptcpi_add_addr_signal = READ_ONCE(msk->pm.add_addr_signaled); info->mptcpi_add_addr_accepted = READ_ONCE(msk->pm.add_addr_accepted); info->mptcpi_local_addr_used = READ_ONCE(msk->pm.local_addr_used); - info->mptcpi_subflows_max = mptcp_pm_get_subflows_max(msk); - val = mptcp_pm_get_add_addr_signal_max(msk); - info->mptcpi_add_addr_signal_max = val; - val = mptcp_pm_get_add_addr_accept_max(msk); - info->mptcpi_add_addr_accepted_max = val; - info->mptcpi_local_addr_max = mptcp_pm_get_local_addr_max(msk); + + /* The following limits only make sense for the in-kernel PM */ + if (mptcp_pm_is_kernel(msk)) { + info->mptcpi_subflows_max = + mptcp_pm_get_subflows_max(msk); + info->mptcpi_add_addr_signal_max = + mptcp_pm_get_add_addr_signal_max(msk); + info->mptcpi_add_addr_accepted_max = + mptcp_pm_get_add_addr_accept_max(msk); + info->mptcpi_local_addr_max = + mptcp_pm_get_local_addr_max(msk); + } + if (test_bit(MPTCP_FALLBACK_DONE, &msk->flags)) flags |= MPTCP_INFO_FLAG_FALLBACK; if (READ_ONCE(msk->can_ack)) @@ -1046,7 +1056,7 @@ static int mptcp_getsockopt_tcpinfo(struct mptcp_sock *msk, char __user *optval, static void mptcp_get_sub_addrs(const struct sock *sk, struct mptcp_subflow_addrs *a) { - struct inet_sock *inet = inet_sk(sk); + const struct inet_sock *inet = inet_sk(sk); memset(a, 0, sizeof(*a)); diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index 281c1cc8dc8d..ba065b66551a 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -695,14 +695,6 @@ static bool subflow_hmac_valid(const struct request_sock *req, return !crypto_memneq(hmac, mp_opt->hmac, MPTCPOPT_HMAC_LEN); } -static void mptcp_force_close(struct sock *sk) -{ - /* the msk is not yet exposed to user-space, and refcount is 2 */ - inet_sk_state_store(sk, TCP_CLOSE); - sk_common_release(sk); - sock_put(sk); -} - static void subflow_ulp_fallback(struct sock *sk, struct mptcp_subflow_context *old_ctx) { @@ -757,7 +749,6 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk, struct mptcp_subflow_request_sock *subflow_req; struct mptcp_options_received mp_opt; bool fallback, fallback_is_fatal; - struct sock *new_msk = NULL; struct mptcp_sock *owner; struct sock *child; @@ -786,14 +777,9 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk, * options. */ mptcp_get_options(skb, &mp_opt); - if (!(mp_opt.suboptions & OPTIONS_MPTCP_MPC)) { + if (!(mp_opt.suboptions & OPTIONS_MPTCP_MPC)) fallback = true; - goto create_child; - } - new_msk = mptcp_sk_clone(listener->conn, &mp_opt, req); - if (!new_msk) - fallback = true; } else if (subflow_req->mp_join) { mptcp_get_options(skb, &mp_opt); if (!(mp_opt.suboptions & OPTIONS_MPTCP_MPJ) || @@ -822,23 +808,23 @@ create_child: subflow_add_reset_reason(skb, MPTCP_RST_EMPTCP); goto dispose_child; } - - if (new_msk) - mptcp_copy_inaddrs(new_msk, child); - mptcp_subflow_drop_ctx(child); - goto out; + goto fallback; } /* ssk inherits options of listener sk */ ctx->setsockopt_seq = listener->setsockopt_seq; if (ctx->mp_capable) { - owner = mptcp_sk(new_msk); + ctx->conn = mptcp_sk_clone(listener->conn, &mp_opt, req); + if (!ctx->conn) + goto fallback; + + owner = mptcp_sk(ctx->conn); /* this can't race with mptcp_close(), as the msk is * not yet exposted to user-space */ - inet_sk_state_store((void *)new_msk, TCP_ESTABLISHED); + inet_sk_state_store(ctx->conn, TCP_ESTABLISHED); /* record the newly created socket as the first msk * subflow, but don't link it yet into conn_list @@ -848,11 +834,9 @@ create_child: /* new mpc subflow takes ownership of the newly * created mptcp socket */ - mptcp_sk(new_msk)->setsockopt_seq = ctx->setsockopt_seq; + owner->setsockopt_seq = ctx->setsockopt_seq; mptcp_pm_new_connection(owner, child, 1); mptcp_token_accept(subflow_req, owner); - ctx->conn = new_msk; - new_msk = NULL; /* set msk addresses early to ensure mptcp_pm_get_local_id() * uses the correct data @@ -869,7 +853,7 @@ create_child: */ if (mp_opt.suboptions & OPTION_MPTCP_MPC_ACK) { mptcp_subflow_fully_established(ctx, &mp_opt); - mptcp_pm_fully_established(owner, child, GFP_ATOMIC); + mptcp_pm_fully_established(owner, child); ctx->pm_notified = 1; } } else if (ctx->mp_join) { @@ -902,11 +886,6 @@ create_child: } } -out: - /* dispose of the left over mptcp master, if any */ - if (unlikely(new_msk)) - mptcp_force_close(new_msk); - /* check for expected invariant - should never trigger, just help * catching eariler subtle bugs */ @@ -924,6 +903,10 @@ dispose_child: /* The last child reference will be released by the caller */ return child; + +fallback: + mptcp_subflow_drop_ctx(child); + return child; } static struct inet_connection_sock_af_ops subflow_specific __ro_after_init; @@ -1347,7 +1330,7 @@ fallback: subflow->reset_reason = MPTCP_RST_EMPTCP; reset: - ssk->sk_err = EBADMSG; + WRITE_ONCE(ssk->sk_err, EBADMSG); tcp_set_state(ssk, TCP_CLOSE); while ((skb = skb_peek(&ssk->sk_receive_queue))) sk_eat_skb(ssk, skb); @@ -1431,7 +1414,7 @@ void __mptcp_error_report(struct sock *sk) ssk_state = inet_sk_state_load(ssk); if (ssk_state == TCP_CLOSE && !sock_flag(sk, SOCK_DEAD)) inet_sk_state_store(sk, ssk_state); - sk->sk_err = -err; + WRITE_ONCE(sk->sk_err, -err); /* This barrier is coupled with smp_rmb() in mptcp_poll() */ smp_wmb(); diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig index 4d6737160857..441d1f134110 100644 --- a/net/netfilter/Kconfig +++ b/net/netfilter/Kconfig @@ -30,6 +30,9 @@ config NETFILTER_FAMILY_BRIDGE config NETFILTER_FAMILY_ARP bool +config NETFILTER_BPF_LINK + def_bool BPF_SYSCALL + config NETFILTER_NETLINK_HOOK tristate "Netfilter base hook dump support" depends on NETFILTER_ADVANCED @@ -753,7 +756,6 @@ if NETFILTER_XTABLES config NETFILTER_XTABLES_COMPAT bool "Netfilter Xtables 32bit support" depends on COMPAT - default y help This option provides a translation layer to run 32bit arp,ip(6),ebtables binaries on 64bit kernels. diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile index 5ffef1cd6143..d4958e7e7631 100644 --- a/net/netfilter/Makefile +++ b/net/netfilter/Makefile @@ -22,6 +22,7 @@ nf_conntrack-$(CONFIG_DEBUG_INFO_BTF) += nf_conntrack_bpf.o endif obj-$(CONFIG_NETFILTER) = netfilter.o +obj-$(CONFIG_NETFILTER_BPF_LINK) += nf_bpf_link.o obj-$(CONFIG_NETFILTER_NETLINK) += nfnetlink.o obj-$(CONFIG_NETFILTER_NETLINK_ACCT) += nfnetlink_acct.o diff --git a/net/netfilter/core.c b/net/netfilter/core.c index 358220b58521..f0783e42108b 100644 --- a/net/netfilter/core.c +++ b/net/netfilter/core.c @@ -119,6 +119,18 @@ nf_hook_entries_grow(const struct nf_hook_entries *old, for (i = 0; i < old_entries; i++) { if (orig_ops[i] != &dummy_ops) alloc_entries++; + + /* Restrict BPF hook type to force a unique priority, not + * shared at attach time. + * + * This is mainly to avoid ordering issues between two + * different bpf programs, this doesn't prevent a normal + * hook at same priority as a bpf one (we don't want to + * prevent defrag, conntrack, iptables etc from attaching). + */ + if (reg->priority == orig_ops[i]->priority && + reg->hook_ops_type == NF_HOOK_OP_BPF) + return ERR_PTR(-EBUSY); } } diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c index 13534e02346c..928e64653837 100644 --- a/net/netfilter/ipvs/ip_vs_conn.c +++ b/net/netfilter/ipvs/ip_vs_conn.c @@ -1481,6 +1481,7 @@ void __net_exit ip_vs_conn_net_cleanup(struct netns_ipvs *ipvs) int __init ip_vs_conn_init(void) { + size_t tab_array_size; int idx; /* Compute size and mask */ @@ -1494,8 +1495,9 @@ int __init ip_vs_conn_init(void) /* * Allocate the connection hash table and initialize its list heads */ - ip_vs_conn_tab = vmalloc(array_size(ip_vs_conn_tab_size, - sizeof(*ip_vs_conn_tab))); + tab_array_size = array_size(ip_vs_conn_tab_size, + sizeof(*ip_vs_conn_tab)); + ip_vs_conn_tab = vmalloc(tab_array_size); if (!ip_vs_conn_tab) return -ENOMEM; @@ -1508,10 +1510,8 @@ int __init ip_vs_conn_init(void) return -ENOMEM; } - pr_info("Connection hash table configured " - "(size=%d, memory=%ldKbytes)\n", - ip_vs_conn_tab_size, - (long)(ip_vs_conn_tab_size*sizeof(*ip_vs_conn_tab))/1024); + pr_info("Connection hash table configured (size=%d, memory=%zdKbytes)\n", + ip_vs_conn_tab_size, tab_array_size / 1024); IP_VS_DBG(0, "Each connection entry needs %zd bytes at least\n", sizeof(struct ip_vs_conn)); diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c index 2fcc26507d69..cb83ca506c5c 100644 --- a/net/netfilter/ipvs/ip_vs_core.c +++ b/net/netfilter/ipvs/ip_vs_core.c @@ -1140,7 +1140,6 @@ struct ip_vs_conn *ip_vs_new_conn_out(struct ip_vs_service *svc, __be16 vport; unsigned int flags; - EnterFunction(12); vaddr = &svc->addr; vport = svc->port; daddr = &iph->saddr; @@ -1208,7 +1207,6 @@ struct ip_vs_conn *ip_vs_new_conn_out(struct ip_vs_service *svc, IP_VS_DBG_ADDR(cp->af, &cp->vaddr), ntohs(cp->vport), IP_VS_DBG_ADDR(cp->af, &cp->daddr), ntohs(cp->dport), cp->flags, refcount_read(&cp->refcnt)); - LeaveFunction(12); return cp; } @@ -1316,13 +1314,11 @@ after_nat: ip_vs_update_conntrack(skb, cp, 0); ip_vs_conn_put(cp); - LeaveFunction(11); return NF_ACCEPT; drop: ip_vs_conn_put(cp); kfree_skb(skb); - LeaveFunction(11); return NF_STOLEN; } @@ -1341,8 +1337,6 @@ ip_vs_out_hook(void *priv, struct sk_buff *skb, const struct nf_hook_state *stat int af = state->pf; struct sock *sk; - EnterFunction(11); - /* Already marked as IPVS request or reply? */ if (skb->ipvs_property) return NF_ACCEPT; @@ -2365,7 +2359,6 @@ static void __net_exit __ip_vs_dev_cleanup_batch(struct list_head *net_list) struct netns_ipvs *ipvs; struct net *net; - EnterFunction(2); list_for_each_entry(net, net_list, exit_list) { ipvs = net_ipvs(net); ip_vs_unregister_hooks(ipvs, AF_INET); @@ -2374,7 +2367,6 @@ static void __net_exit __ip_vs_dev_cleanup_batch(struct list_head *net_list) smp_wmb(); ip_vs_sync_net_cleanup(ipvs); } - LeaveFunction(2); } static struct pernet_operations ipvs_core_ops = { diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c index 2a5ed71c82c3..62606fb44d02 100644 --- a/net/netfilter/ipvs/ip_vs_ctl.c +++ b/net/netfilter/ipvs/ip_vs_ctl.c @@ -1061,8 +1061,6 @@ ip_vs_new_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest) unsigned int atype; int ret; - EnterFunction(2); - #ifdef CONFIG_IP_VS_IPV6 if (udest->af == AF_INET6) { atype = ipv6_addr_type(&udest->addr.in6); @@ -1111,7 +1109,6 @@ ip_vs_new_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest) spin_lock_init(&dest->dst_lock); __ip_vs_update_dest(svc, dest, udest, 1); - LeaveFunction(2); return 0; err_stats: @@ -1134,8 +1131,6 @@ ip_vs_add_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest) __be16 dport = udest->port; int ret; - EnterFunction(2); - if (udest->weight < 0) { pr_err("%s(): server weight less than zero\n", __func__); return -ERANGE; @@ -1183,7 +1178,7 @@ ip_vs_add_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest) ret = ip_vs_start_estimator(svc->ipvs, &dest->stats); if (ret < 0) - goto err; + return ret; __ip_vs_update_dest(svc, dest, udest, 1); } else { /* @@ -1192,9 +1187,6 @@ ip_vs_add_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest) ret = ip_vs_new_dest(svc, udest); } -err: - LeaveFunction(2); - return ret; } @@ -1209,8 +1201,6 @@ ip_vs_edit_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest) union nf_inet_addr daddr; __be16 dport = udest->port; - EnterFunction(2); - if (udest->weight < 0) { pr_err("%s(): server weight less than zero\n", __func__); return -ERANGE; @@ -1242,7 +1232,6 @@ ip_vs_edit_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest) } __ip_vs_update_dest(svc, dest, udest, 0); - LeaveFunction(2); return 0; } @@ -1317,8 +1306,6 @@ ip_vs_del_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest) struct ip_vs_dest *dest; __be16 dport = udest->port; - EnterFunction(2); - /* We use function that requires RCU lock */ rcu_read_lock(); dest = ip_vs_lookup_dest(svc, udest->af, &udest->addr, dport); @@ -1339,8 +1326,6 @@ ip_vs_del_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest) */ __ip_vs_del_dest(svc->ipvs, dest, false); - LeaveFunction(2); - return 0; } @@ -1746,7 +1731,6 @@ void ip_vs_service_nets_cleanup(struct list_head *net_list) struct netns_ipvs *ipvs; struct net *net; - EnterFunction(2); /* Check for "full" addressed entries */ mutex_lock(&__ip_vs_mutex); list_for_each_entry(net, net_list, exit_list) { @@ -1754,7 +1738,6 @@ void ip_vs_service_nets_cleanup(struct list_head *net_list) ip_vs_flush(ipvs, true); } mutex_unlock(&__ip_vs_mutex); - LeaveFunction(2); } /* Put all references for device (dst_cache) */ @@ -1792,7 +1775,6 @@ static int ip_vs_dst_event(struct notifier_block *this, unsigned long event, if (event != NETDEV_DOWN || !ipvs) return NOTIFY_DONE; IP_VS_DBG(3, "%s() dev=%s\n", __func__, dev->name); - EnterFunction(2); mutex_lock(&__ip_vs_mutex); for (idx = 0; idx < IP_VS_SVC_TAB_SIZE; idx++) { hlist_for_each_entry(svc, &ip_vs_svc_table[idx], s_list) { @@ -1821,7 +1803,6 @@ static int ip_vs_dst_event(struct notifier_block *this, unsigned long event, } spin_unlock_bh(&ipvs->dest_trash_lock); mutex_unlock(&__ip_vs_mutex); - LeaveFunction(2); return NOTIFY_DONE; } @@ -4537,8 +4518,6 @@ int __init ip_vs_control_init(void) int idx; int ret; - EnterFunction(2); - /* Initialize svc_table, ip_vs_svc_fwm_table */ for (idx = 0; idx < IP_VS_SVC_TAB_SIZE; idx++) { INIT_HLIST_HEAD(&ip_vs_svc_table[idx]); @@ -4551,15 +4530,12 @@ int __init ip_vs_control_init(void) if (ret < 0) return ret; - LeaveFunction(2); return 0; } void ip_vs_control_cleanup(void) { - EnterFunction(2); unregister_netdevice_notifier(&ip_vs_dst_notifier); /* relying on common rcu_barrier() in ip_vs_cleanup() */ - LeaveFunction(2); } diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c index 4963fec815da..264f2f87a437 100644 --- a/net/netfilter/ipvs/ip_vs_sync.c +++ b/net/netfilter/ipvs/ip_vs_sync.c @@ -603,7 +603,7 @@ static void ip_vs_sync_conn_v0(struct netns_ipvs *ipvs, struct ip_vs_conn *cp, if (cp->flags & IP_VS_CONN_F_SEQ_MASK) { struct ip_vs_sync_conn_options *opt = (struct ip_vs_sync_conn_options *)&s[1]; - memcpy(opt, &cp->in_seq, sizeof(*opt)); + memcpy(opt, &cp->sync_conn_opt, sizeof(*opt)); } m->nr_conns++; @@ -1582,13 +1582,11 @@ ip_vs_send_async(struct socket *sock, const char *buffer, const size_t length) struct kvec iov; int len; - EnterFunction(7); iov.iov_base = (void *)buffer; iov.iov_len = length; len = kernel_sendmsg(sock, &msg, &iov, 1, (size_t)(length)); - LeaveFunction(7); return len; } @@ -1614,15 +1612,12 @@ ip_vs_receive(struct socket *sock, char *buffer, const size_t buflen) struct kvec iov = {buffer, buflen}; int len; - EnterFunction(7); - /* Receive a packet */ iov_iter_kvec(&msg.msg_iter, ITER_DEST, &iov, 1, buflen); len = sock_recvmsg(sock, &msg, MSG_DONTWAIT); if (len < 0) return len; - LeaveFunction(7); return len; } diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c index 80448885c3d7..feb1d7fcb09f 100644 --- a/net/netfilter/ipvs/ip_vs_xmit.c +++ b/net/netfilter/ipvs/ip_vs_xmit.c @@ -339,7 +339,7 @@ __ip_vs_get_out_rt(struct netns_ipvs *ipvs, int skb_af, struct sk_buff *skb, spin_unlock_bh(&dest->dst_lock); IP_VS_DBG(10, "new dst %pI4, src %pI4, refcnt=%d\n", &dest->addr.ip, &dest_dst->dst_saddr.ip, - atomic_read(&rt->dst.__refcnt)); + rcuref_read(&rt->dst.__rcuref)); } if (ret_saddr) *ret_saddr = dest_dst->dst_saddr.ip; @@ -507,7 +507,7 @@ __ip_vs_get_out_rt_v6(struct netns_ipvs *ipvs, int skb_af, struct sk_buff *skb, spin_unlock_bh(&dest->dst_lock); IP_VS_DBG(10, "new dst %pI6, src %pI6, refcnt=%d\n", &dest->addr.in6, &dest_dst->dst_saddr.in6, - atomic_read(&rt->dst.__refcnt)); + rcuref_read(&rt->dst.__rcuref)); } if (ret_saddr) *ret_saddr = dest_dst->dst_saddr.in6; @@ -706,8 +706,6 @@ ip_vs_bypass_xmit(struct sk_buff *skb, struct ip_vs_conn *cp, { struct iphdr *iph = ip_hdr(skb); - EnterFunction(10); - if (__ip_vs_get_out_rt(cp->ipvs, cp->af, skb, NULL, iph->daddr, IP_VS_RT_MODE_NON_LOCAL, NULL, ipvsh) < 0) goto tx_error; @@ -719,12 +717,10 @@ ip_vs_bypass_xmit(struct sk_buff *skb, struct ip_vs_conn *cp, ip_vs_send_or_cont(NFPROTO_IPV4, skb, cp, 0); - LeaveFunction(10); return NF_STOLEN; tx_error: kfree_skb(skb); - LeaveFunction(10); return NF_STOLEN; } @@ -735,8 +731,6 @@ ip_vs_bypass_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp, { struct ipv6hdr *iph = ipv6_hdr(skb); - EnterFunction(10); - if (__ip_vs_get_out_rt_v6(cp->ipvs, cp->af, skb, NULL, &iph->daddr, NULL, ipvsh, 0, IP_VS_RT_MODE_NON_LOCAL) < 0) @@ -747,12 +741,10 @@ ip_vs_bypass_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp, ip_vs_send_or_cont(NFPROTO_IPV6, skb, cp, 0); - LeaveFunction(10); return NF_STOLEN; tx_error: kfree_skb(skb); - LeaveFunction(10); return NF_STOLEN; } #endif @@ -768,8 +760,6 @@ ip_vs_nat_xmit(struct sk_buff *skb, struct ip_vs_conn *cp, struct rtable *rt; /* Route to the other host */ int local, rc, was_input; - EnterFunction(10); - /* check if it is a connection of no-client-port */ if (unlikely(cp->flags & IP_VS_CONN_F_NO_CPORT)) { __be16 _pt, *p; @@ -839,12 +829,10 @@ ip_vs_nat_xmit(struct sk_buff *skb, struct ip_vs_conn *cp, rc = ip_vs_nat_send_or_cont(NFPROTO_IPV4, skb, cp, local); - LeaveFunction(10); return rc; tx_error: kfree_skb(skb); - LeaveFunction(10); return NF_STOLEN; } @@ -856,8 +844,6 @@ ip_vs_nat_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp, struct rt6_info *rt; /* Route to the other host */ int local, rc; - EnterFunction(10); - /* check if it is a connection of no-client-port */ if (unlikely(cp->flags & IP_VS_CONN_F_NO_CPORT && !ipvsh->fragoffs)) { __be16 _pt, *p; @@ -927,11 +913,9 @@ ip_vs_nat_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp, rc = ip_vs_nat_send_or_cont(NFPROTO_IPV6, skb, cp, local); - LeaveFunction(10); return rc; tx_error: - LeaveFunction(10); kfree_skb(skb); return NF_STOLEN; } @@ -1149,8 +1133,6 @@ ip_vs_tunnel_xmit(struct sk_buff *skb, struct ip_vs_conn *cp, int tun_type, gso_type; int tun_flags; - EnterFunction(10); - local = __ip_vs_get_out_rt(ipvs, cp->af, skb, cp->dest, cp->daddr.ip, IP_VS_RT_MODE_LOCAL | IP_VS_RT_MODE_NON_LOCAL | @@ -1199,7 +1181,7 @@ ip_vs_tunnel_xmit(struct sk_buff *skb, struct ip_vs_conn *cp, &next_protocol, NULL, &dsfield, &ttl, dfp); if (IS_ERR(skb)) - goto tx_error; + return NF_STOLEN; gso_type = __tun_gso_type_mask(AF_INET, cp->af); if (tun_type == IP_VS_CONN_F_TUNNEL_TYPE_GUE) { @@ -1267,14 +1249,10 @@ ip_vs_tunnel_xmit(struct sk_buff *skb, struct ip_vs_conn *cp, else if (ret == NF_DROP) kfree_skb(skb); - LeaveFunction(10); - return NF_STOLEN; tx_error: - if (!IS_ERR(skb)) - kfree_skb(skb); - LeaveFunction(10); + kfree_skb(skb); return NF_STOLEN; } @@ -1298,8 +1276,6 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp, int tun_type, gso_type; int tun_flags; - EnterFunction(10); - local = __ip_vs_get_out_rt_v6(ipvs, cp->af, skb, cp->dest, &cp->daddr.in6, &saddr, ipvsh, 1, @@ -1347,7 +1323,7 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp, &next_protocol, &payload_len, &dsfield, &ttl, NULL); if (IS_ERR(skb)) - goto tx_error; + return NF_STOLEN; gso_type = __tun_gso_type_mask(AF_INET6, cp->af); if (tun_type == IP_VS_CONN_F_TUNNEL_TYPE_GUE) { @@ -1414,14 +1390,10 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp, else if (ret == NF_DROP) kfree_skb(skb); - LeaveFunction(10); - return NF_STOLEN; tx_error: - if (!IS_ERR(skb)) - kfree_skb(skb); - LeaveFunction(10); + kfree_skb(skb); return NF_STOLEN; } #endif @@ -1437,8 +1409,6 @@ ip_vs_dr_xmit(struct sk_buff *skb, struct ip_vs_conn *cp, { int local; - EnterFunction(10); - local = __ip_vs_get_out_rt(cp->ipvs, cp->af, skb, cp->dest, cp->daddr.ip, IP_VS_RT_MODE_LOCAL | IP_VS_RT_MODE_NON_LOCAL | @@ -1455,12 +1425,10 @@ ip_vs_dr_xmit(struct sk_buff *skb, struct ip_vs_conn *cp, ip_vs_send_or_cont(NFPROTO_IPV4, skb, cp, 0); - LeaveFunction(10); return NF_STOLEN; tx_error: kfree_skb(skb); - LeaveFunction(10); return NF_STOLEN; } @@ -1471,8 +1439,6 @@ ip_vs_dr_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp, { int local; - EnterFunction(10); - local = __ip_vs_get_out_rt_v6(cp->ipvs, cp->af, skb, cp->dest, &cp->daddr.in6, NULL, ipvsh, 0, @@ -1489,12 +1455,10 @@ ip_vs_dr_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp, ip_vs_send_or_cont(NFPROTO_IPV6, skb, cp, 0); - LeaveFunction(10); return NF_STOLEN; tx_error: kfree_skb(skb); - LeaveFunction(10); return NF_STOLEN; } #endif @@ -1514,8 +1478,6 @@ ip_vs_icmp_xmit(struct sk_buff *skb, struct ip_vs_conn *cp, int local; int rt_mode, was_input; - EnterFunction(10); - /* The ICMP packet for VS/TUN, VS/DR and LOCALNODE will be forwarded directly here, because there is no need to translate address/port back */ @@ -1526,7 +1488,7 @@ ip_vs_icmp_xmit(struct sk_buff *skb, struct ip_vs_conn *cp, rc = NF_ACCEPT; /* do not touch skb anymore */ atomic_inc(&cp->in_pkts); - goto out; + return rc; } /* @@ -1582,14 +1544,11 @@ ip_vs_icmp_xmit(struct sk_buff *skb, struct ip_vs_conn *cp, /* Another hack: avoid icmp_send in ip_fragment */ skb->ignore_df = 1; - rc = ip_vs_nat_send_or_cont(NFPROTO_IPV4, skb, cp, local); - goto out; + return ip_vs_nat_send_or_cont(NFPROTO_IPV4, skb, cp, local); tx_error: kfree_skb(skb); rc = NF_STOLEN; - out: - LeaveFunction(10); return rc; } @@ -1604,8 +1563,6 @@ ip_vs_icmp_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp, int local; int rt_mode; - EnterFunction(10); - /* The ICMP packet for VS/TUN, VS/DR and LOCALNODE will be forwarded directly here, because there is no need to translate address/port back */ @@ -1616,7 +1573,7 @@ ip_vs_icmp_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp, rc = NF_ACCEPT; /* do not touch skb anymore */ atomic_inc(&cp->in_pkts); - goto out; + return rc; } /* @@ -1671,14 +1628,11 @@ ip_vs_icmp_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp, /* Another hack: avoid icmp_send in ip_fragment */ skb->ignore_df = 1; - rc = ip_vs_nat_send_or_cont(NFPROTO_IPV6, skb, cp, local); - goto out; + return ip_vs_nat_send_or_cont(NFPROTO_IPV6, skb, cp, local); tx_error: kfree_skb(skb); rc = NF_STOLEN; -out: - LeaveFunction(10); return rc; } #endif diff --git a/net/netfilter/nf_bpf_link.c b/net/netfilter/nf_bpf_link.c new file mode 100644 index 000000000000..c36da56d756f --- /dev/null +++ b/net/netfilter/nf_bpf_link.c @@ -0,0 +1,228 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <linux/bpf.h> +#include <linux/filter.h> +#include <linux/netfilter.h> + +#include <net/netfilter/nf_bpf_link.h> +#include <uapi/linux/netfilter_ipv4.h> + +static unsigned int nf_hook_run_bpf(void *bpf_prog, struct sk_buff *skb, + const struct nf_hook_state *s) +{ + const struct bpf_prog *prog = bpf_prog; + struct bpf_nf_ctx ctx = { + .state = s, + .skb = skb, + }; + + return bpf_prog_run(prog, &ctx); +} + +struct bpf_nf_link { + struct bpf_link link; + struct nf_hook_ops hook_ops; + struct net *net; + u32 dead; +}; + +static void bpf_nf_link_release(struct bpf_link *link) +{ + struct bpf_nf_link *nf_link = container_of(link, struct bpf_nf_link, link); + + if (nf_link->dead) + return; + + /* prevent hook-not-found warning splat from netfilter core when + * .detach was already called + */ + if (!cmpxchg(&nf_link->dead, 0, 1)) + nf_unregister_net_hook(nf_link->net, &nf_link->hook_ops); +} + +static void bpf_nf_link_dealloc(struct bpf_link *link) +{ + struct bpf_nf_link *nf_link = container_of(link, struct bpf_nf_link, link); + + kfree(nf_link); +} + +static int bpf_nf_link_detach(struct bpf_link *link) +{ + bpf_nf_link_release(link); + return 0; +} + +static void bpf_nf_link_show_info(const struct bpf_link *link, + struct seq_file *seq) +{ + struct bpf_nf_link *nf_link = container_of(link, struct bpf_nf_link, link); + + seq_printf(seq, "pf:\t%u\thooknum:\t%u\tprio:\t%d\n", + nf_link->hook_ops.pf, nf_link->hook_ops.hooknum, + nf_link->hook_ops.priority); +} + +static int bpf_nf_link_fill_link_info(const struct bpf_link *link, + struct bpf_link_info *info) +{ + struct bpf_nf_link *nf_link = container_of(link, struct bpf_nf_link, link); + + info->netfilter.pf = nf_link->hook_ops.pf; + info->netfilter.hooknum = nf_link->hook_ops.hooknum; + info->netfilter.priority = nf_link->hook_ops.priority; + info->netfilter.flags = 0; + + return 0; +} + +static int bpf_nf_link_update(struct bpf_link *link, struct bpf_prog *new_prog, + struct bpf_prog *old_prog) +{ + return -EOPNOTSUPP; +} + +static const struct bpf_link_ops bpf_nf_link_lops = { + .release = bpf_nf_link_release, + .dealloc = bpf_nf_link_dealloc, + .detach = bpf_nf_link_detach, + .show_fdinfo = bpf_nf_link_show_info, + .fill_link_info = bpf_nf_link_fill_link_info, + .update_prog = bpf_nf_link_update, +}; + +static int bpf_nf_check_pf_and_hooks(const union bpf_attr *attr) +{ + switch (attr->link_create.netfilter.pf) { + case NFPROTO_IPV4: + case NFPROTO_IPV6: + if (attr->link_create.netfilter.hooknum >= NF_INET_NUMHOOKS) + return -EPROTO; + break; + default: + return -EAFNOSUPPORT; + } + + if (attr->link_create.netfilter.flags) + return -EOPNOTSUPP; + + /* make sure conntrack confirm is always last. + * + * In the future, if userspace can e.g. request defrag, then + * "defrag_requested && prio before NF_IP_PRI_CONNTRACK_DEFRAG" + * should fail. + */ + switch (attr->link_create.netfilter.priority) { + case NF_IP_PRI_FIRST: return -ERANGE; /* sabotage_in and other warts */ + case NF_IP_PRI_LAST: return -ERANGE; /* e.g. conntrack confirm */ + } + + return 0; +} + +int bpf_nf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) +{ + struct net *net = current->nsproxy->net_ns; + struct bpf_link_primer link_primer; + struct bpf_nf_link *link; + int err; + + if (attr->link_create.flags) + return -EINVAL; + + err = bpf_nf_check_pf_and_hooks(attr); + if (err) + return err; + + link = kzalloc(sizeof(*link), GFP_USER); + if (!link) + return -ENOMEM; + + bpf_link_init(&link->link, BPF_LINK_TYPE_NETFILTER, &bpf_nf_link_lops, prog); + + link->hook_ops.hook = nf_hook_run_bpf; + link->hook_ops.hook_ops_type = NF_HOOK_OP_BPF; + link->hook_ops.priv = prog; + + link->hook_ops.pf = attr->link_create.netfilter.pf; + link->hook_ops.priority = attr->link_create.netfilter.priority; + link->hook_ops.hooknum = attr->link_create.netfilter.hooknum; + + link->net = net; + link->dead = false; + + err = bpf_link_prime(&link->link, &link_primer); + if (err) { + kfree(link); + return err; + } + + err = nf_register_net_hook(net, &link->hook_ops); + if (err) { + bpf_link_cleanup(&link_primer); + return err; + } + + return bpf_link_settle(&link_primer); +} + +const struct bpf_prog_ops netfilter_prog_ops = { + .test_run = bpf_prog_test_run_nf, +}; + +static bool nf_ptr_to_btf_id(struct bpf_insn_access_aux *info, const char *name) +{ + struct btf *btf; + s32 type_id; + + btf = bpf_get_btf_vmlinux(); + if (IS_ERR_OR_NULL(btf)) + return false; + + type_id = btf_find_by_name_kind(btf, name, BTF_KIND_STRUCT); + if (WARN_ON_ONCE(type_id < 0)) + return false; + + info->btf = btf; + info->btf_id = type_id; + info->reg_type = PTR_TO_BTF_ID | PTR_TRUSTED; + return true; +} + +static bool nf_is_valid_access(int off, int size, enum bpf_access_type type, + const struct bpf_prog *prog, + struct bpf_insn_access_aux *info) +{ + if (off < 0 || off >= sizeof(struct bpf_nf_ctx)) + return false; + + if (type == BPF_WRITE) + return false; + + switch (off) { + case bpf_ctx_range(struct bpf_nf_ctx, skb): + if (size != sizeof_field(struct bpf_nf_ctx, skb)) + return false; + + return nf_ptr_to_btf_id(info, "sk_buff"); + case bpf_ctx_range(struct bpf_nf_ctx, state): + if (size != sizeof_field(struct bpf_nf_ctx, state)) + return false; + + return nf_ptr_to_btf_id(info, "nf_hook_state"); + default: + return false; + } + + return false; +} + +static const struct bpf_func_proto * +bpf_nf_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) +{ + return bpf_base_func_proto(func_id); +} + +const struct bpf_verifier_ops netfilter_verifier_ops = { + .is_valid_access = nf_is_valid_access, + .get_func_proto = bpf_nf_func_proto, +}; diff --git a/net/netfilter/nf_conntrack_bpf.c b/net/netfilter/nf_conntrack_bpf.c index cd99e6dc1f35..0d36d7285e3f 100644 --- a/net/netfilter/nf_conntrack_bpf.c +++ b/net/netfilter/nf_conntrack_bpf.c @@ -192,8 +192,7 @@ BTF_ID(struct, nf_conn___init) /* Check writes into `struct nf_conn` */ static int _nf_conntrack_btf_struct_access(struct bpf_verifier_log *log, const struct bpf_reg_state *reg, - int off, int size, enum bpf_access_type atype, - u32 *next_btf_id, enum bpf_type_flag *flag) + int off, int size) { const struct btf_type *ncit, *nct, *t; size_t end; @@ -381,6 +380,7 @@ __bpf_kfunc struct nf_conn *bpf_ct_insert_entry(struct nf_conn___init *nfct_i) struct nf_conn *nfct = (struct nf_conn *)nfct_i; int err; + nfct->status |= IPS_CONFIRMED; err = nf_conntrack_hash_check_insert(nfct); if (err < 0) { nf_conntrack_free(nfct); @@ -401,8 +401,6 @@ __bpf_kfunc struct nf_conn *bpf_ct_insert_entry(struct nf_conn___init *nfct_i) */ __bpf_kfunc void bpf_ct_release(struct nf_conn *nfct) { - if (!nfct) - return; nf_ct_put(nfct); } diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index c6a6a6099b4e..c4ccfec6cb98 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -932,7 +932,6 @@ nf_conntrack_hash_check_insert(struct nf_conn *ct) goto out; } - ct->status |= IPS_CONFIRMED; smp_wmb(); /* The caller holds a reference to this object */ refcount_set(&ct->ct_general.use, 2); @@ -1294,7 +1293,7 @@ dying: } EXPORT_SYMBOL_GPL(__nf_conntrack_confirm); -/* Returns true if a connection correspondings to the tuple (required +/* Returns true if a connection corresponds to the tuple (required for NAT). */ int nf_conntrack_tuple_taken(const struct nf_conntrack_tuple *tuple, diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c index bfc3aaa2c872..d40544cd61a6 100644 --- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@ -176,7 +176,12 @@ nla_put_failure: static int ctnetlink_dump_timeout(struct sk_buff *skb, const struct nf_conn *ct, bool skip_zero) { - long timeout = nf_ct_expires(ct) / HZ; + long timeout; + + if (nf_ct_is_confirmed(ct)) + timeout = nf_ct_expires(ct) / HZ; + else + timeout = ct->timeout / HZ; if (skip_zero && timeout == 0) return 0; @@ -1554,9 +1559,6 @@ static const struct nla_policy ct_nla_policy[CTA_MAX+1] = { static int ctnetlink_flush_iterate(struct nf_conn *ct, void *data) { - if (test_bit(IPS_OFFLOAD_BIT, &ct->status)) - return 0; - return ctnetlink_filter_match(ct, data); } @@ -1626,11 +1628,6 @@ static int ctnetlink_del_conntrack(struct sk_buff *skb, ct = nf_ct_tuplehash_to_ctrack(h); - if (test_bit(IPS_OFFLOAD_BIT, &ct->status)) { - nf_ct_put(ct); - return -EBUSY; - } - if (cda[CTA_ID]) { __be32 id = nla_get_be32(cda[CTA_ID]); @@ -2253,9 +2250,6 @@ ctnetlink_create_conntrack(struct net *net, if (!cda[CTA_TIMEOUT]) goto err1; - timeout = (u64)ntohl(nla_get_be32(cda[CTA_TIMEOUT])) * HZ; - __nf_ct_set_timeout(ct, timeout); - rcu_read_lock(); if (cda[CTA_HELP]) { char *helpname = NULL; @@ -2316,6 +2310,12 @@ ctnetlink_create_conntrack(struct net *net, nfct_seqadj_ext_add(ct); nfct_synproxy_ext_add(ct); + /* we must add conntrack extensions before confirmation. */ + ct->status |= IPS_CONFIRMED; + + timeout = (u64)ntohl(nla_get_be32(cda[CTA_TIMEOUT])) * HZ; + __nf_ct_set_timeout(ct, timeout); + if (cda[CTA_STATUS]) { err = ctnetlink_change_status(ct, cda); if (err < 0) diff --git a/net/netfilter/nf_conntrack_ovs.c b/net/netfilter/nf_conntrack_ovs.c index 52b776bdf526..068e9489e1c2 100644 --- a/net/netfilter/nf_conntrack_ovs.c +++ b/net/netfilter/nf_conntrack_ovs.c @@ -6,6 +6,7 @@ #include <net/netfilter/ipv6/nf_defrag_ipv6.h> #include <net/ipv6_frag.h> #include <net/ip.h> +#include <linux/netfilter_ipv6.h> /* 'skb' should already be pulled to nh_ofs. */ int nf_ct_helper(struct sk_buff *skb, struct nf_conn *ct, @@ -120,8 +121,14 @@ int nf_ct_skb_network_trim(struct sk_buff *skb, int family) len = skb_ip_totlen(skb); break; case NFPROTO_IPV6: - len = sizeof(struct ipv6hdr) - + ntohs(ipv6_hdr(skb)->payload_len); + len = ntohs(ipv6_hdr(skb)->payload_len); + if (ipv6_hdr(skb)->nexthdr == NEXTHDR_HOP) { + int err = nf_ip6_check_hbh_len(skb, &len); + + if (err) + return err; + } + len += sizeof(struct ipv6hdr); break; default: len = skb->len; diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c index e29e4ccb5c5a..ce829d434f13 100644 --- a/net/netfilter/nf_nat_core.c +++ b/net/netfilter/nf_nat_core.c @@ -549,8 +549,8 @@ get_unique_tuple(struct nf_conntrack_tuple *tuple, if (range->flags & NF_NAT_RANGE_PROTO_SPECIFIED) { if (!(range->flags & NF_NAT_RANGE_PROTO_OFFSET) && l4proto_in_range(tuple, maniptype, - &range->min_proto, - &range->max_proto) && + &range->min_proto, + &range->max_proto) && (range->min_proto.all == range->max_proto.all || !nf_nat_used_tuple(tuple, ct))) return; diff --git a/net/netfilter/nf_nat_redirect.c b/net/netfilter/nf_nat_redirect.c index f91579c821e9..6616ba5d0b04 100644 --- a/net/netfilter/nf_nat_redirect.c +++ b/net/netfilter/nf_nat_redirect.c @@ -10,6 +10,7 @@ #include <linux/if.h> #include <linux/inetdevice.h> +#include <linux/in.h> #include <linux/ip.h> #include <linux/kernel.h> #include <linux/netdevice.h> @@ -24,54 +25,56 @@ #include <net/netfilter/nf_nat.h> #include <net/netfilter/nf_nat_redirect.h> +static unsigned int +nf_nat_redirect(struct sk_buff *skb, const struct nf_nat_range2 *range, + const union nf_inet_addr *newdst) +{ + struct nf_nat_range2 newrange; + enum ip_conntrack_info ctinfo; + struct nf_conn *ct; + + ct = nf_ct_get(skb, &ctinfo); + + memset(&newrange, 0, sizeof(newrange)); + + newrange.flags = range->flags | NF_NAT_RANGE_MAP_IPS; + newrange.min_addr = *newdst; + newrange.max_addr = *newdst; + newrange.min_proto = range->min_proto; + newrange.max_proto = range->max_proto; + + return nf_nat_setup_info(ct, &newrange, NF_NAT_MANIP_DST); +} + unsigned int -nf_nat_redirect_ipv4(struct sk_buff *skb, - const struct nf_nat_ipv4_multi_range_compat *mr, +nf_nat_redirect_ipv4(struct sk_buff *skb, const struct nf_nat_range2 *range, unsigned int hooknum) { - struct nf_conn *ct; - enum ip_conntrack_info ctinfo; - __be32 newdst; - struct nf_nat_range2 newrange; + union nf_inet_addr newdst = {}; WARN_ON(hooknum != NF_INET_PRE_ROUTING && hooknum != NF_INET_LOCAL_OUT); - ct = nf_ct_get(skb, &ctinfo); - WARN_ON(!(ct && (ctinfo == IP_CT_NEW || ctinfo == IP_CT_RELATED))); - /* Local packets: make them go to loopback */ if (hooknum == NF_INET_LOCAL_OUT) { - newdst = htonl(0x7F000001); + newdst.ip = htonl(INADDR_LOOPBACK); } else { const struct in_device *indev; - newdst = 0; - indev = __in_dev_get_rcu(skb->dev); if (indev) { const struct in_ifaddr *ifa; ifa = rcu_dereference(indev->ifa_list); if (ifa) - newdst = ifa->ifa_local; + newdst.ip = ifa->ifa_local; } - if (!newdst) + if (!newdst.ip) return NF_DROP; } - /* Transfer from original range. */ - memset(&newrange.min_addr, 0, sizeof(newrange.min_addr)); - memset(&newrange.max_addr, 0, sizeof(newrange.max_addr)); - newrange.flags = mr->range[0].flags | NF_NAT_RANGE_MAP_IPS; - newrange.min_addr.ip = newdst; - newrange.max_addr.ip = newdst; - newrange.min_proto = mr->range[0].min; - newrange.max_proto = mr->range[0].max; - - /* Hand modified range to generic setup. */ - return nf_nat_setup_info(ct, &newrange, NF_NAT_MANIP_DST); + return nf_nat_redirect(skb, range, &newdst); } EXPORT_SYMBOL_GPL(nf_nat_redirect_ipv4); @@ -81,14 +84,10 @@ unsigned int nf_nat_redirect_ipv6(struct sk_buff *skb, const struct nf_nat_range2 *range, unsigned int hooknum) { - struct nf_nat_range2 newrange; - struct in6_addr newdst; - enum ip_conntrack_info ctinfo; - struct nf_conn *ct; + union nf_inet_addr newdst = {}; - ct = nf_ct_get(skb, &ctinfo); if (hooknum == NF_INET_LOCAL_OUT) { - newdst = loopback_addr; + newdst.in6 = loopback_addr; } else { struct inet6_dev *idev; struct inet6_ifaddr *ifa; @@ -98,7 +97,7 @@ nf_nat_redirect_ipv6(struct sk_buff *skb, const struct nf_nat_range2 *range, if (idev != NULL) { read_lock_bh(&idev->lock); list_for_each_entry(ifa, &idev->addr_list, if_list) { - newdst = ifa->addr; + newdst.in6 = ifa->addr; addr = true; break; } @@ -109,12 +108,6 @@ nf_nat_redirect_ipv6(struct sk_buff *skb, const struct nf_nat_range2 *range, return NF_DROP; } - newrange.flags = range->flags | NF_NAT_RANGE_MAP_IPS; - newrange.min_addr.in6 = newdst; - newrange.max_addr.in6 = newdst; - newrange.min_proto = range->min_proto; - newrange.max_proto = range->max_proto; - - return nf_nat_setup_info(ct, &newrange, NF_NAT_MANIP_DST); + return nf_nat_redirect(skb, range, &newdst); } EXPORT_SYMBOL_GPL(nf_nat_redirect_ipv6); diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index e48ab8dfb541..09542951656c 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -102,11 +102,9 @@ static const u8 nft2audit_op[NFT_MSG_MAX] = { // enum nf_tables_msg_types [NFT_MSG_DELFLOWTABLE] = AUDIT_NFT_OP_FLOWTABLE_UNREGISTER, }; -static void nft_validate_state_update(struct net *net, u8 new_validate_state) +static void nft_validate_state_update(struct nft_table *table, u8 new_validate_state) { - struct nftables_pernet *nft_net = nft_pernet(net); - - switch (nft_net->validate_state) { + switch (table->validate_state) { case NFT_VALIDATE_SKIP: WARN_ON_ONCE(new_validate_state == NFT_VALIDATE_DO); break; @@ -117,7 +115,7 @@ static void nft_validate_state_update(struct net *net, u8 new_validate_state) return; } - nft_net->validate_state = new_validate_state; + table->validate_state = new_validate_state; } static void nf_tables_trans_destroy_work(struct work_struct *w); static DECLARE_WORK(trans_destroy_work, nf_tables_trans_destroy_work); @@ -821,12 +819,20 @@ static int nf_tables_fill_table_info(struct sk_buff *skb, struct net *net, goto nla_put_failure; if (nla_put_string(skb, NFTA_TABLE_NAME, table->name) || - nla_put_be32(skb, NFTA_TABLE_FLAGS, - htonl(table->flags & NFT_TABLE_F_MASK)) || nla_put_be32(skb, NFTA_TABLE_USE, htonl(table->use)) || nla_put_be64(skb, NFTA_TABLE_HANDLE, cpu_to_be64(table->handle), NFTA_TABLE_PAD)) goto nla_put_failure; + + if (event == NFT_MSG_DELTABLE) { + nlmsg_end(skb, nlh); + return 0; + } + + if (nla_put_be32(skb, NFTA_TABLE_FLAGS, + htonl(table->flags & NFT_TABLE_F_MASK))) + goto nla_put_failure; + if (nft_table_has_owner(table) && nla_put_be32(skb, NFTA_TABLE_OWNER, htonl(table->nlpid))) goto nla_put_failure; @@ -1224,6 +1230,7 @@ static int nf_tables_newtable(struct sk_buff *skb, const struct nfnl_info *info, if (table == NULL) goto err_kzalloc; + table->validate_state = NFT_VALIDATE_SKIP; table->name = nla_strdup(attr, GFP_KERNEL_ACCOUNT); if (table->name == NULL) goto err_strdup; @@ -1575,7 +1582,8 @@ nla_put_failure: } static int nft_dump_basechain_hook(struct sk_buff *skb, int family, - const struct nft_base_chain *basechain) + const struct nft_base_chain *basechain, + const struct list_head *hook_list) { const struct nf_hook_ops *ops = &basechain->ops; struct nft_hook *hook, *first = NULL; @@ -1592,7 +1600,11 @@ static int nft_dump_basechain_hook(struct sk_buff *skb, int family, if (nft_base_chain_netdev(family, ops->hooknum)) { nest_devs = nla_nest_start_noflag(skb, NFTA_HOOK_DEVS); - list_for_each_entry(hook, &basechain->hook_list, list) { + + if (!hook_list) + hook_list = &basechain->hook_list; + + list_for_each_entry(hook, hook_list, list) { if (!first) first = hook; @@ -1617,7 +1629,8 @@ nla_put_failure: static int nf_tables_fill_chain_info(struct sk_buff *skb, struct net *net, u32 portid, u32 seq, int event, u32 flags, int family, const struct nft_table *table, - const struct nft_chain *chain) + const struct nft_chain *chain, + const struct list_head *hook_list) { struct nlmsghdr *nlh; @@ -1627,19 +1640,22 @@ static int nf_tables_fill_chain_info(struct sk_buff *skb, struct net *net, if (!nlh) goto nla_put_failure; - if (nla_put_string(skb, NFTA_CHAIN_TABLE, table->name)) - goto nla_put_failure; - if (nla_put_be64(skb, NFTA_CHAIN_HANDLE, cpu_to_be64(chain->handle), + if (nla_put_string(skb, NFTA_CHAIN_TABLE, table->name) || + nla_put_string(skb, NFTA_CHAIN_NAME, chain->name) || + nla_put_be64(skb, NFTA_CHAIN_HANDLE, cpu_to_be64(chain->handle), NFTA_CHAIN_PAD)) goto nla_put_failure; - if (nla_put_string(skb, NFTA_CHAIN_NAME, chain->name)) - goto nla_put_failure; + + if (event == NFT_MSG_DELCHAIN && !hook_list) { + nlmsg_end(skb, nlh); + return 0; + } if (nft_is_base_chain(chain)) { const struct nft_base_chain *basechain = nft_base_chain(chain); struct nft_stats __percpu *stats; - if (nft_dump_basechain_hook(skb, family, basechain)) + if (nft_dump_basechain_hook(skb, family, basechain, hook_list)) goto nla_put_failure; if (nla_put_be32(skb, NFTA_CHAIN_POLICY, @@ -1674,7 +1690,8 @@ nla_put_failure: return -1; } -static void nf_tables_chain_notify(const struct nft_ctx *ctx, int event) +static void nf_tables_chain_notify(const struct nft_ctx *ctx, int event, + const struct list_head *hook_list) { struct nftables_pernet *nft_net; struct sk_buff *skb; @@ -1694,7 +1711,7 @@ static void nf_tables_chain_notify(const struct nft_ctx *ctx, int event) err = nf_tables_fill_chain_info(skb, ctx->net, ctx->portid, ctx->seq, event, flags, ctx->family, ctx->table, - ctx->chain); + ctx->chain, hook_list); if (err < 0) { kfree_skb(skb); goto err; @@ -1740,7 +1757,7 @@ static int nf_tables_dump_chains(struct sk_buff *skb, NFT_MSG_NEWCHAIN, NLM_F_MULTI, table->family, table, - chain) < 0) + chain, NULL) < 0) goto done; nl_dump_check_consistent(cb, nlmsg_hdr(skb)); @@ -1794,7 +1811,7 @@ static int nf_tables_getchain(struct sk_buff *skb, const struct nfnl_info *info, err = nf_tables_fill_chain_info(skb2, net, NETLINK_CB(skb).portid, info->nlh->nlmsg_seq, NFT_MSG_NEWCHAIN, - 0, family, table, chain); + 0, family, table, chain, NULL); if (err < 0) goto err_fill_chain_info; @@ -1955,7 +1972,8 @@ static struct nft_hook *nft_hook_list_find(struct list_head *hook_list, static int nf_tables_parse_netdev_hooks(struct net *net, const struct nlattr *attr, - struct list_head *hook_list) + struct list_head *hook_list, + struct netlink_ext_ack *extack) { struct nft_hook *hook, *next; const struct nlattr *tmp; @@ -1969,10 +1987,12 @@ static int nf_tables_parse_netdev_hooks(struct net *net, hook = nft_netdev_hook_alloc(net, tmp); if (IS_ERR(hook)) { + NL_SET_BAD_ATTR(extack, tmp); err = PTR_ERR(hook); goto err_hook; } if (nft_hook_list_find(hook_list, hook)) { + NL_SET_BAD_ATTR(extack, tmp); kfree(hook); err = -EEXIST; goto err_hook; @@ -2003,38 +2023,41 @@ struct nft_chain_hook { struct list_head list; }; -static int nft_chain_parse_netdev(struct net *net, - struct nlattr *tb[], - struct list_head *hook_list) +static int nft_chain_parse_netdev(struct net *net, struct nlattr *tb[], + struct list_head *hook_list, + struct netlink_ext_ack *extack, u32 flags) { struct nft_hook *hook; int err; if (tb[NFTA_HOOK_DEV]) { hook = nft_netdev_hook_alloc(net, tb[NFTA_HOOK_DEV]); - if (IS_ERR(hook)) + if (IS_ERR(hook)) { + NL_SET_BAD_ATTR(extack, tb[NFTA_HOOK_DEV]); return PTR_ERR(hook); + } list_add_tail(&hook->list, hook_list); } else if (tb[NFTA_HOOK_DEVS]) { err = nf_tables_parse_netdev_hooks(net, tb[NFTA_HOOK_DEVS], - hook_list); + hook_list, extack); if (err < 0) return err; - if (list_empty(hook_list)) - return -EINVAL; - } else { - return -EINVAL; } + if (flags & NFT_CHAIN_HW_OFFLOAD && + list_empty(hook_list)) + return -EINVAL; + return 0; } static int nft_chain_parse_hook(struct net *net, + struct nft_base_chain *basechain, const struct nlattr * const nla[], struct nft_chain_hook *hook, u8 family, - struct netlink_ext_ack *extack, bool autoload) + u32 flags, struct netlink_ext_ack *extack) { struct nftables_pernet *nft_net = nft_pernet(net); struct nlattr *ha[NFTA_HOOK_MAX + 1]; @@ -2050,31 +2073,46 @@ static int nft_chain_parse_hook(struct net *net, if (err < 0) return err; - if (ha[NFTA_HOOK_HOOKNUM] == NULL || - ha[NFTA_HOOK_PRIORITY] == NULL) - return -EINVAL; + if (!basechain) { + if (!ha[NFTA_HOOK_HOOKNUM] || + !ha[NFTA_HOOK_PRIORITY]) + return -EINVAL; - hook->num = ntohl(nla_get_be32(ha[NFTA_HOOK_HOOKNUM])); - hook->priority = ntohl(nla_get_be32(ha[NFTA_HOOK_PRIORITY])); + hook->num = ntohl(nla_get_be32(ha[NFTA_HOOK_HOOKNUM])); + hook->priority = ntohl(nla_get_be32(ha[NFTA_HOOK_PRIORITY])); - type = __nft_chain_type_get(family, NFT_CHAIN_T_DEFAULT); - if (!type) - return -EOPNOTSUPP; + type = __nft_chain_type_get(family, NFT_CHAIN_T_DEFAULT); + if (!type) + return -EOPNOTSUPP; - if (nla[NFTA_CHAIN_TYPE]) { - type = nf_tables_chain_type_lookup(net, nla[NFTA_CHAIN_TYPE], - family, autoload); - if (IS_ERR(type)) { - NL_SET_BAD_ATTR(extack, nla[NFTA_CHAIN_TYPE]); - return PTR_ERR(type); + if (nla[NFTA_CHAIN_TYPE]) { + type = nf_tables_chain_type_lookup(net, nla[NFTA_CHAIN_TYPE], + family, true); + if (IS_ERR(type)) { + NL_SET_BAD_ATTR(extack, nla[NFTA_CHAIN_TYPE]); + return PTR_ERR(type); + } } - } - if (hook->num >= NFT_MAX_HOOKS || !(type->hook_mask & (1 << hook->num))) - return -EOPNOTSUPP; + if (hook->num >= NFT_MAX_HOOKS || !(type->hook_mask & (1 << hook->num))) + return -EOPNOTSUPP; - if (type->type == NFT_CHAIN_T_NAT && - hook->priority <= NF_IP_PRI_CONNTRACK) - return -EOPNOTSUPP; + if (type->type == NFT_CHAIN_T_NAT && + hook->priority <= NF_IP_PRI_CONNTRACK) + return -EOPNOTSUPP; + } else { + if (ha[NFTA_HOOK_HOOKNUM]) { + hook->num = ntohl(nla_get_be32(ha[NFTA_HOOK_HOOKNUM])); + if (hook->num != basechain->ops.hooknum) + return -EOPNOTSUPP; + } + if (ha[NFTA_HOOK_PRIORITY]) { + hook->priority = ntohl(nla_get_be32(ha[NFTA_HOOK_PRIORITY])); + if (hook->priority != basechain->ops.priority) + return -EOPNOTSUPP; + } + + type = basechain->type; + } if (!try_module_get(type->owner)) { if (nla[NFTA_CHAIN_TYPE]) @@ -2086,7 +2124,7 @@ static int nft_chain_parse_hook(struct net *net, INIT_LIST_HEAD(&hook->list); if (nft_base_chain_netdev(family, hook->num)) { - err = nft_chain_parse_netdev(net, ha, &hook->list); + err = nft_chain_parse_netdev(net, ha, &hook->list, extack, flags); if (err < 0) { module_put(type->owner); return err; @@ -2110,38 +2148,34 @@ static void nft_chain_release_hook(struct nft_chain_hook *hook) module_put(hook->type->owner); } -struct nft_rules_old { - struct rcu_head h; - struct nft_rule_blob *blob; -}; - -static void nft_last_rule(struct nft_rule_blob *blob, const void *ptr) +static void nft_last_rule(const struct nft_chain *chain, const void *ptr) { - struct nft_rule_dp *prule; + struct nft_rule_dp_last *lrule; + + BUILD_BUG_ON(offsetof(struct nft_rule_dp_last, end) != 0); - prule = (struct nft_rule_dp *)ptr; - prule->is_last = 1; + lrule = (struct nft_rule_dp_last *)ptr; + lrule->end.is_last = 1; + lrule->chain = chain; /* blob size does not include the trailer rule */ } -static struct nft_rule_blob *nf_tables_chain_alloc_rules(unsigned int size) +static struct nft_rule_blob *nf_tables_chain_alloc_rules(const struct nft_chain *chain, + unsigned int size) { struct nft_rule_blob *blob; - /* size must include room for the last rule */ - if (size < offsetof(struct nft_rule_dp, data)) - return NULL; - - size += sizeof(struct nft_rule_blob) + sizeof(struct nft_rules_old); if (size > INT_MAX) return NULL; + size += sizeof(struct nft_rule_blob) + sizeof(struct nft_rule_dp_last); + blob = kvmalloc(size, GFP_KERNEL_ACCOUNT); if (!blob) return NULL; blob->size = 0; - nft_last_rule(blob, blob->data); + nft_last_rule(chain, blob->data); return blob; } @@ -2172,12 +2206,8 @@ static int nft_basechain_init(struct nft_base_chain *basechain, u8 family, list_splice_init(&hook->list, &basechain->hook_list); list_for_each_entry(h, &basechain->hook_list, list) nft_basechain_hook_init(&h->ops, family, hook, chain); - - basechain->ops.hooknum = hook->num; - basechain->ops.priority = hook->priority; - } else { - nft_basechain_hook_init(&basechain->ops, family, hook, chain); } + nft_basechain_hook_init(&basechain->ops, family, hook, chain); chain->flags |= NFT_CHAIN_BASE | flags; basechain->policy = NF_ACCEPT; @@ -2220,7 +2250,6 @@ static int nf_tables_addchain(struct nft_ctx *ctx, u8 family, u8 genmask, struct nft_rule_blob *blob; struct nft_trans *trans; struct nft_chain *chain; - unsigned int data_size; int err; if (table->use == UINT_MAX) @@ -2228,13 +2257,13 @@ static int nf_tables_addchain(struct nft_ctx *ctx, u8 family, u8 genmask, if (nla[NFTA_CHAIN_HOOK]) { struct nft_stats __percpu *stats = NULL; - struct nft_chain_hook hook; + struct nft_chain_hook hook = {}; if (flags & NFT_CHAIN_BINDING) return -EOPNOTSUPP; - err = nft_chain_parse_hook(net, nla, &hook, family, extack, - true); + err = nft_chain_parse_hook(net, NULL, nla, &hook, family, flags, + extack); if (err < 0) return err; @@ -2308,8 +2337,7 @@ static int nf_tables_addchain(struct nft_ctx *ctx, u8 family, u8 genmask, chain->udlen = nla_len(nla[NFTA_CHAIN_USERDATA]); } - data_size = offsetof(struct nft_rule_dp, data); /* last rule */ - blob = nf_tables_chain_alloc_rules(data_size); + blob = nf_tables_chain_alloc_rules(chain, 0); if (!blob) { err = -ENOMEM; goto err_destroy_chain; @@ -2349,65 +2377,57 @@ err_destroy_chain: return err; } -static bool nft_hook_list_equal(struct list_head *hook_list1, - struct list_head *hook_list2) -{ - struct nft_hook *hook; - int n = 0, m = 0; - - n = 0; - list_for_each_entry(hook, hook_list2, list) { - if (!nft_hook_list_find(hook_list1, hook)) - return false; - - n++; - } - list_for_each_entry(hook, hook_list1, list) - m++; - - return n == m; -} - static int nf_tables_updchain(struct nft_ctx *ctx, u8 genmask, u8 policy, u32 flags, const struct nlattr *attr, struct netlink_ext_ack *extack) { const struct nlattr * const *nla = ctx->nla; + struct nft_base_chain *basechain = NULL; struct nft_table *table = ctx->table; struct nft_chain *chain = ctx->chain; - struct nft_base_chain *basechain; + struct nft_chain_hook hook = {}; struct nft_stats *stats = NULL; - struct nft_chain_hook hook; + struct nft_hook *h, *next; struct nf_hook_ops *ops; struct nft_trans *trans; + bool unregister = false; int err; if (chain->flags ^ flags) return -EOPNOTSUPP; + INIT_LIST_HEAD(&hook.list); + if (nla[NFTA_CHAIN_HOOK]) { if (!nft_is_base_chain(chain)) { NL_SET_BAD_ATTR(extack, attr); return -EEXIST; } - err = nft_chain_parse_hook(ctx->net, nla, &hook, ctx->family, - extack, false); + + basechain = nft_base_chain(chain); + err = nft_chain_parse_hook(ctx->net, basechain, nla, &hook, + ctx->family, flags, extack); if (err < 0) return err; - basechain = nft_base_chain(chain); if (basechain->type != hook.type) { nft_chain_release_hook(&hook); NL_SET_BAD_ATTR(extack, attr); return -EEXIST; } - if (nft_base_chain_netdev(ctx->family, hook.num)) { - if (!nft_hook_list_equal(&basechain->hook_list, - &hook.list)) { - nft_chain_release_hook(&hook); - NL_SET_BAD_ATTR(extack, attr); - return -EEXIST; + if (nft_base_chain_netdev(ctx->family, basechain->ops.hooknum)) { + list_for_each_entry_safe(h, next, &hook.list, list) { + h->ops.pf = basechain->ops.pf; + h->ops.hooknum = basechain->ops.hooknum; + h->ops.priority = basechain->ops.priority; + h->ops.priv = basechain->ops.priv; + h->ops.hook = basechain->ops.hook; + + if (nft_hook_list_find(&basechain->hook_list, h)) { + list_del(&h->list); + kfree(h); + } } } else { ops = &basechain->ops; @@ -2418,7 +2438,6 @@ static int nf_tables_updchain(struct nft_ctx *ctx, u8 genmask, u8 policy, return -EEXIST; } } - nft_chain_release_hook(&hook); } if (nla[NFTA_CHAIN_HANDLE] && @@ -2429,24 +2448,43 @@ static int nf_tables_updchain(struct nft_ctx *ctx, u8 genmask, u8 policy, nla[NFTA_CHAIN_NAME], genmask); if (!IS_ERR(chain2)) { NL_SET_BAD_ATTR(extack, nla[NFTA_CHAIN_NAME]); - return -EEXIST; + err = -EEXIST; + goto err_hooks; } } if (nla[NFTA_CHAIN_COUNTERS]) { - if (!nft_is_base_chain(chain)) - return -EOPNOTSUPP; + if (!nft_is_base_chain(chain)) { + err = -EOPNOTSUPP; + goto err_hooks; + } stats = nft_stats_alloc(nla[NFTA_CHAIN_COUNTERS]); - if (IS_ERR(stats)) - return PTR_ERR(stats); + if (IS_ERR(stats)) { + err = PTR_ERR(stats); + goto err_hooks; + } } + if (!(table->flags & NFT_TABLE_F_DORMANT) && + nft_is_base_chain(chain) && + !list_empty(&hook.list)) { + basechain = nft_base_chain(chain); + ops = &basechain->ops; + + if (nft_base_chain_netdev(table->family, basechain->ops.hooknum)) { + err = nft_netdev_register_hooks(ctx->net, &hook.list); + if (err < 0) + goto err_hooks; + } + } + + unregister = true; err = -ENOMEM; trans = nft_trans_alloc(ctx, NFT_MSG_NEWCHAIN, sizeof(struct nft_trans_chain)); if (trans == NULL) - goto err; + goto err_trans; nft_trans_chain_stats(trans) = stats; nft_trans_chain_update(trans) = true; @@ -2465,7 +2503,7 @@ static int nf_tables_updchain(struct nft_ctx *ctx, u8 genmask, u8 policy, err = -ENOMEM; name = nla_strdup(nla[NFTA_CHAIN_NAME], GFP_KERNEL_ACCOUNT); if (!name) - goto err; + goto err_trans; err = -EEXIST; list_for_each_entry(tmp, &nft_net->commit_list, list) { @@ -2476,18 +2514,35 @@ static int nf_tables_updchain(struct nft_ctx *ctx, u8 genmask, u8 policy, strcmp(name, nft_trans_chain_name(tmp)) == 0) { NL_SET_BAD_ATTR(extack, nla[NFTA_CHAIN_NAME]); kfree(name); - goto err; + goto err_trans; } } nft_trans_chain_name(trans) = name; } + + nft_trans_basechain(trans) = basechain; + INIT_LIST_HEAD(&nft_trans_chain_hooks(trans)); + list_splice(&hook.list, &nft_trans_chain_hooks(trans)); + nft_trans_commit_list_add_tail(ctx->net, trans); return 0; -err: + +err_trans: free_percpu(stats); kfree(trans); +err_hooks: + if (nla[NFTA_CHAIN_HOOK]) { + list_for_each_entry_safe(h, next, &hook.list, list) { + if (unregister) + nf_unregister_net_hook(ctx->net, &h->ops); + list_del(&h->list); + kfree_rcu(h, rcu); + } + module_put(hook.type->owner); + } + return err; } @@ -2611,6 +2666,59 @@ static int nf_tables_newchain(struct sk_buff *skb, const struct nfnl_info *info, return nf_tables_addchain(&ctx, family, genmask, policy, flags, extack); } +static int nft_delchain_hook(struct nft_ctx *ctx, struct nft_chain *chain, + struct netlink_ext_ack *extack) +{ + const struct nlattr * const *nla = ctx->nla; + struct nft_chain_hook chain_hook = {}; + struct nft_base_chain *basechain; + struct nft_hook *this, *hook; + LIST_HEAD(chain_del_list); + struct nft_trans *trans; + int err; + + if (!nft_is_base_chain(chain)) + return -EOPNOTSUPP; + + basechain = nft_base_chain(chain); + err = nft_chain_parse_hook(ctx->net, basechain, nla, &chain_hook, + ctx->family, chain->flags, extack); + if (err < 0) + return err; + + list_for_each_entry(this, &chain_hook.list, list) { + hook = nft_hook_list_find(&basechain->hook_list, this); + if (!hook) { + err = -ENOENT; + goto err_chain_del_hook; + } + list_move(&hook->list, &chain_del_list); + } + + trans = nft_trans_alloc(ctx, NFT_MSG_DELCHAIN, + sizeof(struct nft_trans_chain)); + if (!trans) { + err = -ENOMEM; + goto err_chain_del_hook; + } + + nft_trans_basechain(trans) = basechain; + nft_trans_chain_update(trans) = true; + INIT_LIST_HEAD(&nft_trans_chain_hooks(trans)); + list_splice(&chain_del_list, &nft_trans_chain_hooks(trans)); + nft_chain_release_hook(&chain_hook); + + nft_trans_commit_list_add_tail(ctx->net, trans); + + return 0; + +err_chain_del_hook: + list_splice(&chain_del_list, &basechain->hook_list); + nft_chain_release_hook(&chain_hook); + + return err; +} + static int nf_tables_delchain(struct sk_buff *skb, const struct nfnl_info *info, const struct nlattr * const nla[]) { @@ -2651,12 +2759,19 @@ static int nf_tables_delchain(struct sk_buff *skb, const struct nfnl_info *info, return PTR_ERR(chain); } + nft_ctx_init(&ctx, net, skb, info->nlh, family, table, chain, nla); + + if (nla[NFTA_CHAIN_HOOK]) { + if (chain->flags & NFT_CHAIN_HW_OFFLOAD) + return -EOPNOTSUPP; + + return nft_delchain_hook(&ctx, chain, extack); + } + if (info->nlh->nlmsg_flags & NLM_F_NONREC && chain->use > 0) return -EBUSY; - nft_ctx_init(&ctx, net, skb, info->nlh, family, table, chain, nla); - use = chain->use; list_for_each_entry(rule, &chain->rules, list) { if (!nft_is_active_next(net, rule)) @@ -3666,7 +3781,7 @@ static int nf_tables_newrule(struct sk_buff *skb, const struct nfnl_info *info, } if (expr_info[i].ops->validate) - nft_validate_state_update(net, NFT_VALIDATE_NEED); + nft_validate_state_update(table, NFT_VALIDATE_NEED); expr_info[i].ops = NULL; expr = nft_expr_next(expr); @@ -3716,7 +3831,7 @@ static int nf_tables_newrule(struct sk_buff *skb, const struct nfnl_info *info, if (flow) nft_trans_flow_rule(trans) = flow; - if (nft_net->validate_state == NFT_VALIDATE_DO) + if (table->validate_state == NFT_VALIDATE_DO) return nft_table_validate(net, table); return 0; @@ -4151,6 +4266,12 @@ static int nf_tables_fill_set(struct sk_buff *skb, const struct nft_ctx *ctx, if (nla_put_be64(skb, NFTA_SET_HANDLE, cpu_to_be64(set->handle), NFTA_SET_PAD)) goto nla_put_failure; + + if (event == NFT_MSG_DELSET) { + nlmsg_end(skb, nlh); + return 0; + } + if (set->flags != 0) if (nla_put_be32(skb, NFTA_SET_FLAGS, htonl(set->flags))) goto nla_put_failure; @@ -6318,7 +6439,7 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set, if (desc.type == NFT_DATA_VERDICT && (elem.data.val.verdict.code == NFT_GOTO || elem.data.val.verdict.code == NFT_JUMP)) - nft_validate_state_update(ctx->net, + nft_validate_state_update(ctx->table, NFT_VALIDATE_NEED); } @@ -6443,7 +6564,6 @@ static int nf_tables_newsetelem(struct sk_buff *skb, const struct nfnl_info *info, const struct nlattr * const nla[]) { - struct nftables_pernet *nft_net = nft_pernet(info->net); struct netlink_ext_ack *extack = info->extack; u8 genmask = nft_genmask_next(info->net); u8 family = info->nfmsg->nfgen_family; @@ -6482,7 +6602,7 @@ static int nf_tables_newsetelem(struct sk_buff *skb, } } - if (nft_net->validate_state == NFT_VALIDATE_DO) + if (table->validate_state == NFT_VALIDATE_DO) return nft_table_validate(net, table); return 0; @@ -7156,13 +7276,20 @@ static int nf_tables_fill_obj_info(struct sk_buff *skb, struct net *net, if (nla_put_string(skb, NFTA_OBJ_TABLE, table->name) || nla_put_string(skb, NFTA_OBJ_NAME, obj->key.name) || - nla_put_be32(skb, NFTA_OBJ_TYPE, htonl(obj->ops->type->type)) || - nla_put_be32(skb, NFTA_OBJ_USE, htonl(obj->use)) || - nft_object_dump(skb, NFTA_OBJ_DATA, obj, reset) || nla_put_be64(skb, NFTA_OBJ_HANDLE, cpu_to_be64(obj->handle), NFTA_OBJ_PAD)) goto nla_put_failure; + if (event == NFT_MSG_DELOBJ) { + nlmsg_end(skb, nlh); + return 0; + } + + if (nla_put_be32(skb, NFTA_OBJ_TYPE, htonl(obj->ops->type->type)) || + nla_put_be32(skb, NFTA_OBJ_USE, htonl(obj->use)) || + nft_object_dump(skb, NFTA_OBJ_DATA, obj, reset)) + goto nla_put_failure; + if (obj->udata && nla_put(skb, NFTA_OBJ_USERDATA, obj->udlen, obj->udata)) goto nla_put_failure; @@ -7568,7 +7695,8 @@ static const struct nla_policy nft_flowtable_hook_policy[NFTA_FLOWTABLE_HOOK_MAX static int nft_flowtable_parse_hook(const struct nft_ctx *ctx, const struct nlattr *attr, struct nft_flowtable_hook *flowtable_hook, - struct nft_flowtable *flowtable, bool add) + struct nft_flowtable *flowtable, + struct netlink_ext_ack *extack, bool add) { struct nlattr *tb[NFTA_FLOWTABLE_HOOK_MAX + 1]; struct nft_hook *hook; @@ -7615,7 +7743,8 @@ static int nft_flowtable_parse_hook(const struct nft_ctx *ctx, if (tb[NFTA_FLOWTABLE_HOOK_DEVS]) { err = nf_tables_parse_netdev_hooks(ctx->net, tb[NFTA_FLOWTABLE_HOOK_DEVS], - &flowtable_hook->list); + &flowtable_hook->list, + extack); if (err < 0) return err; } @@ -7747,7 +7876,7 @@ err_unregister_net_hooks: return err; } -static void nft_flowtable_hooks_destroy(struct list_head *hook_list) +static void nft_hooks_destroy(struct list_head *hook_list) { struct nft_hook *hook, *next; @@ -7758,7 +7887,8 @@ static void nft_flowtable_hooks_destroy(struct list_head *hook_list) } static int nft_flowtable_update(struct nft_ctx *ctx, const struct nlmsghdr *nlh, - struct nft_flowtable *flowtable) + struct nft_flowtable *flowtable, + struct netlink_ext_ack *extack) { const struct nlattr * const *nla = ctx->nla; struct nft_flowtable_hook flowtable_hook; @@ -7769,7 +7899,7 @@ static int nft_flowtable_update(struct nft_ctx *ctx, const struct nlmsghdr *nlh, int err; err = nft_flowtable_parse_hook(ctx, nla[NFTA_FLOWTABLE_HOOK], - &flowtable_hook, flowtable, false); + &flowtable_hook, flowtable, extack, false); if (err < 0) return err; @@ -7874,7 +8004,7 @@ static int nf_tables_newflowtable(struct sk_buff *skb, nft_ctx_init(&ctx, net, skb, info->nlh, family, table, NULL, nla); - return nft_flowtable_update(&ctx, info->nlh, flowtable); + return nft_flowtable_update(&ctx, info->nlh, flowtable, extack); } nft_ctx_init(&ctx, net, skb, info->nlh, family, table, NULL, nla); @@ -7915,7 +8045,7 @@ static int nf_tables_newflowtable(struct sk_buff *skb, goto err3; err = nft_flowtable_parse_hook(&ctx, nla[NFTA_FLOWTABLE_HOOK], - &flowtable_hook, flowtable, true); + &flowtable_hook, flowtable, extack, true); if (err < 0) goto err4; @@ -7927,7 +8057,7 @@ static int nf_tables_newflowtable(struct sk_buff *skb, &flowtable->hook_list, flowtable); if (err < 0) { - nft_flowtable_hooks_destroy(&flowtable->hook_list); + nft_hooks_destroy(&flowtable->hook_list); goto err4; } @@ -7967,7 +8097,8 @@ static void nft_flowtable_hook_release(struct nft_flowtable_hook *flowtable_hook } static int nft_delflowtable_hook(struct nft_ctx *ctx, - struct nft_flowtable *flowtable) + struct nft_flowtable *flowtable, + struct netlink_ext_ack *extack) { const struct nlattr * const *nla = ctx->nla; struct nft_flowtable_hook flowtable_hook; @@ -7977,7 +8108,7 @@ static int nft_delflowtable_hook(struct nft_ctx *ctx, int err; err = nft_flowtable_parse_hook(ctx, nla[NFTA_FLOWTABLE_HOOK], - &flowtable_hook, flowtable, false); + &flowtable_hook, flowtable, extack, false); if (err < 0) return err; @@ -8059,7 +8190,7 @@ static int nf_tables_delflowtable(struct sk_buff *skb, nft_ctx_init(&ctx, net, skb, info->nlh, family, table, NULL, nla); if (nla[NFTA_FLOWTABLE_HOOK]) - return nft_delflowtable_hook(&ctx, flowtable); + return nft_delflowtable_hook(&ctx, flowtable, extack); if (flowtable->use > 0) { NL_SET_BAD_ATTR(extack, attr); @@ -8087,9 +8218,16 @@ static int nf_tables_fill_flowtable_info(struct sk_buff *skb, struct net *net, if (nla_put_string(skb, NFTA_FLOWTABLE_TABLE, flowtable->table->name) || nla_put_string(skb, NFTA_FLOWTABLE_NAME, flowtable->name) || - nla_put_be32(skb, NFTA_FLOWTABLE_USE, htonl(flowtable->use)) || nla_put_be64(skb, NFTA_FLOWTABLE_HANDLE, cpu_to_be64(flowtable->handle), - NFTA_FLOWTABLE_PAD) || + NFTA_FLOWTABLE_PAD)) + goto nla_put_failure; + + if (event == NFT_MSG_DELFLOWTABLE && !hook_list) { + nlmsg_end(skb, nlh); + return 0; + } + + if (nla_put_be32(skb, NFTA_FLOWTABLE_USE, htonl(flowtable->use)) || nla_put_be32(skb, NFTA_FLOWTABLE_FLAGS, htonl(flowtable->data.flags))) goto nla_put_failure; @@ -8104,6 +8242,9 @@ static int nf_tables_fill_flowtable_info(struct sk_buff *skb, struct net *net, if (!nest_devs) goto nla_put_failure; + if (!hook_list) + hook_list = &flowtable->hook_list; + list_for_each_entry_rcu(hook, hook_list, list) { if (nla_put_string(skb, NFTA_DEVICE_NAME, hook->ops.dev->name)) goto nla_put_failure; @@ -8160,8 +8301,7 @@ static int nf_tables_dump_flowtable(struct sk_buff *skb, NFT_MSG_NEWFLOWTABLE, NLM_F_MULTI | NLM_F_APPEND, table->family, - flowtable, - &flowtable->hook_list) < 0) + flowtable, NULL) < 0) goto done; nl_dump_check_consistent(cb, nlmsg_hdr(skb)); @@ -8256,7 +8396,7 @@ static int nf_tables_getflowtable(struct sk_buff *skb, err = nf_tables_fill_flowtable_info(skb2, net, NETLINK_CB(skb).portid, info->nlh->nlmsg_seq, NFT_MSG_NEWFLOWTABLE, 0, family, - flowtable, &flowtable->hook_list); + flowtable, NULL); if (err < 0) goto err_fill_flowtable_info; @@ -8269,8 +8409,7 @@ err_fill_flowtable_info: static void nf_tables_flowtable_notify(struct nft_ctx *ctx, struct nft_flowtable *flowtable, - struct list_head *hook_list, - int event) + struct list_head *hook_list, int event) { struct nftables_pernet *nft_net = nft_pernet(ctx->net); struct sk_buff *skb; @@ -8634,17 +8773,20 @@ static int nf_tables_validate(struct net *net) struct nftables_pernet *nft_net = nft_pernet(net); struct nft_table *table; - switch (nft_net->validate_state) { - case NFT_VALIDATE_SKIP: - break; - case NFT_VALIDATE_NEED: - nft_validate_state_update(net, NFT_VALIDATE_DO); - fallthrough; - case NFT_VALIDATE_DO: - list_for_each_entry(table, &nft_net->tables, list) { + list_for_each_entry(table, &nft_net->tables, list) { + switch (table->validate_state) { + case NFT_VALIDATE_SKIP: + continue; + case NFT_VALIDATE_NEED: + nft_validate_state_update(table, NFT_VALIDATE_DO); + fallthrough; + case NFT_VALIDATE_DO: if (nft_table_validate(net, table) < 0) return -EAGAIN; + + nft_validate_state_update(table, NFT_VALIDATE_SKIP); } + break; } @@ -8729,7 +8871,10 @@ static void nft_commit_release(struct nft_trans *trans) break; case NFT_MSG_DELCHAIN: case NFT_MSG_DESTROYCHAIN: - nf_tables_chain_destroy(&trans->ctx); + if (nft_trans_chain_update(trans)) + nft_hooks_destroy(&nft_trans_chain_hooks(trans)); + else + nf_tables_chain_destroy(&trans->ctx); break; case NFT_MSG_DELRULE: case NFT_MSG_DESTROYRULE: @@ -8752,7 +8897,7 @@ static void nft_commit_release(struct nft_trans *trans) case NFT_MSG_DELFLOWTABLE: case NFT_MSG_DESTROYFLOWTABLE: if (nft_trans_flowtable_update(trans)) - nft_flowtable_hooks_destroy(&nft_trans_flowtable_hooks(trans)); + nft_hooks_destroy(&nft_trans_flowtable_hooks(trans)); else nf_tables_flowtable_destroy(nft_trans_flowtable(trans)); break; @@ -8817,9 +8962,8 @@ static int nf_tables_commit_chain_prepare(struct net *net, struct nft_chain *cha return -ENOMEM; } } - data_size += offsetof(struct nft_rule_dp, data); /* last rule */ - chain->blob_next = nf_tables_chain_alloc_rules(data_size); + chain->blob_next = nf_tables_chain_alloc_rules(chain, data_size); if (!chain->blob_next) return -ENOMEM; @@ -8864,12 +9008,11 @@ static int nf_tables_commit_chain_prepare(struct net *net, struct nft_chain *cha chain->blob_next->size += (unsigned long)(data - (void *)prule); } - prule = (struct nft_rule_dp *)data; - data += offsetof(struct nft_rule_dp, data); if (WARN_ON_ONCE(data > data_boundary)) return -ENOMEM; - nft_last_rule(chain->blob_next, prule); + prule = (struct nft_rule_dp *)data; + nft_last_rule(chain, prule); return 0; } @@ -8890,22 +9033,22 @@ static void nf_tables_commit_chain_prepare_cancel(struct net *net) } } -static void __nf_tables_commit_chain_free_rules_old(struct rcu_head *h) +static void __nf_tables_commit_chain_free_rules(struct rcu_head *h) { - struct nft_rules_old *o = container_of(h, struct nft_rules_old, h); + struct nft_rule_dp_last *l = container_of(h, struct nft_rule_dp_last, h); - kvfree(o->blob); + kvfree(l->blob); } static void nf_tables_commit_chain_free_rules_old(struct nft_rule_blob *blob) { - struct nft_rules_old *old; + struct nft_rule_dp_last *last; - /* rcu_head is after end marker */ - old = (void *)blob + sizeof(*blob) + blob->size; - old->blob = blob; + /* last rule trailer is after end marker */ + last = (void *)blob + sizeof(*blob) + blob->size; + last->blob = blob; - call_rcu(&old->h, __nf_tables_commit_chain_free_rules_old); + call_rcu(&last->h, __nf_tables_commit_chain_free_rules); } static void nf_tables_commit_chain(struct net *net, struct nft_chain *chain) @@ -9209,22 +9352,34 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb) case NFT_MSG_NEWCHAIN: if (nft_trans_chain_update(trans)) { nft_chain_commit_update(trans); - nf_tables_chain_notify(&trans->ctx, NFT_MSG_NEWCHAIN); + nf_tables_chain_notify(&trans->ctx, NFT_MSG_NEWCHAIN, + &nft_trans_chain_hooks(trans)); + list_splice(&nft_trans_chain_hooks(trans), + &nft_trans_basechain(trans)->hook_list); /* trans destroyed after rcu grace period */ } else { nft_chain_commit_drop_policy(trans); nft_clear(net, trans->ctx.chain); - nf_tables_chain_notify(&trans->ctx, NFT_MSG_NEWCHAIN); + nf_tables_chain_notify(&trans->ctx, NFT_MSG_NEWCHAIN, NULL); nft_trans_destroy(trans); } break; case NFT_MSG_DELCHAIN: case NFT_MSG_DESTROYCHAIN: - nft_chain_del(trans->ctx.chain); - nf_tables_chain_notify(&trans->ctx, trans->msg_type); - nf_tables_unregister_hook(trans->ctx.net, - trans->ctx.table, - trans->ctx.chain); + if (nft_trans_chain_update(trans)) { + nf_tables_chain_notify(&trans->ctx, NFT_MSG_DELCHAIN, + &nft_trans_chain_hooks(trans)); + nft_netdev_unregister_hooks(net, + &nft_trans_chain_hooks(trans), + true); + } else { + nft_chain_del(trans->ctx.chain); + nf_tables_chain_notify(&trans->ctx, NFT_MSG_DELCHAIN, + NULL); + nf_tables_unregister_hook(trans->ctx.net, + trans->ctx.table, + trans->ctx.chain); + } break; case NFT_MSG_NEWRULE: nft_clear(trans->ctx.net, nft_trans_rule(trans)); @@ -9330,7 +9485,7 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb) nft_clear(net, nft_trans_flowtable(trans)); nf_tables_flowtable_notify(&trans->ctx, nft_trans_flowtable(trans), - &nft_trans_flowtable(trans)->hook_list, + NULL, NFT_MSG_NEWFLOWTABLE); } nft_trans_destroy(trans); @@ -9348,7 +9503,7 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb) list_del_rcu(&nft_trans_flowtable(trans)->list); nf_tables_flowtable_notify(&trans->ctx, nft_trans_flowtable(trans), - &nft_trans_flowtable(trans)->hook_list, + NULL, trans->msg_type); nft_unregister_flowtable_net_hooks(net, &nft_trans_flowtable(trans)->hook_list); @@ -9388,7 +9543,10 @@ static void nf_tables_abort_release(struct nft_trans *trans) nf_tables_table_destroy(&trans->ctx); break; case NFT_MSG_NEWCHAIN: - nf_tables_chain_destroy(&trans->ctx); + if (nft_trans_chain_update(trans)) + nft_hooks_destroy(&nft_trans_chain_hooks(trans)); + else + nf_tables_chain_destroy(&trans->ctx); break; case NFT_MSG_NEWRULE: nf_tables_rule_destroy(&trans->ctx, nft_trans_rule(trans)); @@ -9405,7 +9563,7 @@ static void nf_tables_abort_release(struct nft_trans *trans) break; case NFT_MSG_NEWFLOWTABLE: if (nft_trans_flowtable_update(trans)) - nft_flowtable_hooks_destroy(&nft_trans_flowtable_hooks(trans)); + nft_hooks_destroy(&nft_trans_flowtable_hooks(trans)); else nf_tables_flowtable_destroy(nft_trans_flowtable(trans)); break; @@ -9451,6 +9609,9 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action) break; case NFT_MSG_NEWCHAIN: if (nft_trans_chain_update(trans)) { + nft_netdev_unregister_hooks(net, + &nft_trans_chain_hooks(trans), + true); free_percpu(nft_trans_chain_stats(trans)); kfree(nft_trans_chain_name(trans)); nft_trans_destroy(trans); @@ -9468,8 +9629,13 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action) break; case NFT_MSG_DELCHAIN: case NFT_MSG_DESTROYCHAIN: - trans->ctx.table->use++; - nft_clear(trans->ctx.net, trans->ctx.chain); + if (nft_trans_chain_update(trans)) { + list_splice(&nft_trans_chain_hooks(trans), + &nft_trans_basechain(trans)->hook_list); + } else { + trans->ctx.table->use++; + nft_clear(trans->ctx.net, trans->ctx.chain); + } nft_trans_destroy(trans); break; case NFT_MSG_NEWRULE: @@ -9586,11 +9752,6 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action) return 0; } -static void nf_tables_cleanup(struct net *net) -{ - nft_validate_state_update(net, NFT_VALIDATE_SKIP); -} - static int nf_tables_abort(struct net *net, struct sk_buff *skb, enum nfnl_abort_action action) { @@ -9624,7 +9785,6 @@ static const struct nfnetlink_subsystem nf_tables_subsys = { .cb = nf_tables_cb, .commit = nf_tables_commit, .abort = nf_tables_abort, - .cleanup = nf_tables_cleanup, .valid_genid = nf_tables_valid_genid, .owner = THIS_MODULE, }; @@ -10367,7 +10527,6 @@ static int __net_init nf_tables_init_net(struct net *net) INIT_LIST_HEAD(&nft_net->notify_list); mutex_init(&nft_net->commit_mutex); nft_net->base_seq = 1; - nft_net->validate_state = NFT_VALIDATE_SKIP; return 0; } diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c index 6ecd0ba2e546..4d0ce12221f6 100644 --- a/net/netfilter/nf_tables_core.c +++ b/net/netfilter/nf_tables_core.c @@ -41,39 +41,37 @@ static inline bool nf_skip_indirect_calls(void) { return false; } static inline void nf_skip_indirect_calls_enable(void) { } #endif -static noinline void __nft_trace_packet(struct nft_traceinfo *info, - const struct nft_chain *chain, +static noinline void __nft_trace_packet(const struct nft_pktinfo *pkt, + const struct nft_verdict *verdict, + const struct nft_rule_dp *rule, + struct nft_traceinfo *info, enum nft_trace_types type) { if (!info->trace || !info->nf_trace) return; - info->chain = chain; info->type = type; - nft_trace_notify(info); + nft_trace_notify(pkt, verdict, rule, info); } static inline void nft_trace_packet(const struct nft_pktinfo *pkt, + struct nft_verdict *verdict, struct nft_traceinfo *info, - const struct nft_chain *chain, const struct nft_rule_dp *rule, enum nft_trace_types type) { if (static_branch_unlikely(&nft_trace_enabled)) { info->nf_trace = pkt->skb->nf_trace; - info->rule = rule; - __nft_trace_packet(info, chain, type); + __nft_trace_packet(pkt, verdict, rule, info, type); } } static inline void nft_trace_copy_nftrace(const struct nft_pktinfo *pkt, struct nft_traceinfo *info) { - if (static_branch_unlikely(&nft_trace_enabled)) { - if (info->trace) - info->nf_trace = pkt->skb->nf_trace; - } + if (static_branch_unlikely(&nft_trace_enabled)) + info->nf_trace = pkt->skb->nf_trace; } static void nft_bitwise_fast_eval(const struct nft_expr *expr, @@ -110,8 +108,9 @@ static void nft_cmp16_fast_eval(const struct nft_expr *expr, regs->verdict.code = NFT_BREAK; } -static noinline void __nft_trace_verdict(struct nft_traceinfo *info, - const struct nft_chain *chain, +static noinline void __nft_trace_verdict(const struct nft_pktinfo *pkt, + struct nft_traceinfo *info, + const struct nft_rule_dp *rule, const struct nft_regs *regs) { enum nft_trace_types type; @@ -129,22 +128,20 @@ static noinline void __nft_trace_verdict(struct nft_traceinfo *info, type = NFT_TRACETYPE_RULE; if (info->trace) - info->nf_trace = info->pkt->skb->nf_trace; + info->nf_trace = pkt->skb->nf_trace; break; } - __nft_trace_packet(info, chain, type); + __nft_trace_packet(pkt, ®s->verdict, rule, info, type); } -static inline void nft_trace_verdict(struct nft_traceinfo *info, - const struct nft_chain *chain, +static inline void nft_trace_verdict(const struct nft_pktinfo *pkt, + struct nft_traceinfo *info, const struct nft_rule_dp *rule, const struct nft_regs *regs) { - if (static_branch_unlikely(&nft_trace_enabled)) { - info->rule = rule; - __nft_trace_verdict(info, chain, regs); - } + if (static_branch_unlikely(&nft_trace_enabled)) + __nft_trace_verdict(pkt, info, rule, regs); } static bool nft_payload_fast_eval(const struct nft_expr *expr, @@ -203,9 +200,7 @@ static noinline void nft_update_chain_stats(const struct nft_chain *chain, } struct nft_jumpstack { - const struct nft_chain *chain; const struct nft_rule_dp *rule; - const struct nft_rule_dp *last_rule; }; static void expr_call_ops_eval(const struct nft_expr *expr, @@ -248,7 +243,6 @@ indirect_call: #define nft_rule_expr_first(rule) (struct nft_expr *)&rule->data[0] #define nft_rule_expr_next(expr) ((void *)expr) + expr->ops->size #define nft_rule_expr_last(rule) (struct nft_expr *)&rule->data[rule->dlen] -#define nft_rule_next(rule) (void *)rule + sizeof(*rule) + rule->dlen #define nft_rule_dp_for_each_expr(expr, last, rule) \ for ((expr) = nft_rule_expr_first(rule), (last) = nft_rule_expr_last(rule); \ @@ -259,9 +253,9 @@ unsigned int nft_do_chain(struct nft_pktinfo *pkt, void *priv) { const struct nft_chain *chain = priv, *basechain = chain; - const struct nft_rule_dp *rule, *last_rule; const struct net *net = nft_net(pkt); const struct nft_expr *expr, *last; + const struct nft_rule_dp *rule; struct nft_regs regs = {}; unsigned int stackptr = 0; struct nft_jumpstack jumpstack[NFT_JUMP_STACK_SIZE]; @@ -271,7 +265,7 @@ nft_do_chain(struct nft_pktinfo *pkt, void *priv) info.trace = false; if (static_branch_unlikely(&nft_trace_enabled)) - nft_trace_init(&info, pkt, ®s.verdict, basechain); + nft_trace_init(&info, pkt, basechain); do_chain: if (genbit) blob = rcu_dereference(chain->blob_gen_1); @@ -279,10 +273,9 @@ do_chain: blob = rcu_dereference(chain->blob_gen_0); rule = (struct nft_rule_dp *)blob->data; - last_rule = (void *)blob->data + blob->size; next_rule: regs.verdict.code = NFT_CONTINUE; - for (; rule < last_rule; rule = nft_rule_next(rule)) { + for (; !rule->is_last ; rule = nft_rule_next(rule)) { nft_rule_dp_for_each_expr(expr, last, rule) { if (expr->ops == &nft_cmp_fast_ops) nft_cmp_fast_eval(expr, ®s); @@ -304,14 +297,14 @@ next_rule: nft_trace_copy_nftrace(pkt, &info); continue; case NFT_CONTINUE: - nft_trace_packet(pkt, &info, chain, rule, + nft_trace_packet(pkt, ®s.verdict, &info, rule, NFT_TRACETYPE_RULE); continue; } break; } - nft_trace_verdict(&info, chain, rule, ®s); + nft_trace_verdict(pkt, &info, rule, ®s); switch (regs.verdict.code & NF_VERDICT_MASK) { case NF_ACCEPT: @@ -325,9 +318,7 @@ next_rule: case NFT_JUMP: if (WARN_ON_ONCE(stackptr >= NFT_JUMP_STACK_SIZE)) return NF_DROP; - jumpstack[stackptr].chain = chain; jumpstack[stackptr].rule = nft_rule_next(rule); - jumpstack[stackptr].last_rule = last_rule; stackptr++; fallthrough; case NFT_GOTO: @@ -342,13 +333,11 @@ next_rule: if (stackptr > 0) { stackptr--; - chain = jumpstack[stackptr].chain; rule = jumpstack[stackptr].rule; - last_rule = jumpstack[stackptr].last_rule; goto next_rule; } - nft_trace_packet(pkt, &info, basechain, NULL, NFT_TRACETYPE_POLICY); + nft_trace_packet(pkt, ®s.verdict, &info, NULL, NFT_TRACETYPE_POLICY); if (static_branch_unlikely(&nft_counters_enabled)) nft_update_chain_stats(basechain, pkt); diff --git a/net/netfilter/nf_tables_trace.c b/net/netfilter/nf_tables_trace.c index 1163ba9c1401..6d41c0bd3d78 100644 --- a/net/netfilter/nf_tables_trace.c +++ b/net/netfilter/nf_tables_trace.c @@ -124,9 +124,11 @@ static int nf_trace_fill_pkt_info(struct sk_buff *nlskb, } static int nf_trace_fill_rule_info(struct sk_buff *nlskb, + const struct nft_verdict *verdict, + const struct nft_rule_dp *rule, const struct nft_traceinfo *info) { - if (!info->rule || info->rule->is_last) + if (!rule || rule->is_last) return 0; /* a continue verdict with ->type == RETURN means that this is @@ -135,15 +137,16 @@ static int nf_trace_fill_rule_info(struct sk_buff *nlskb, * Since no rule matched, the ->rule pointer is invalid. */ if (info->type == NFT_TRACETYPE_RETURN && - info->verdict->code == NFT_CONTINUE) + verdict->code == NFT_CONTINUE) return 0; return nla_put_be64(nlskb, NFTA_TRACE_RULE_HANDLE, - cpu_to_be64(info->rule->handle), + cpu_to_be64(rule->handle), NFTA_TRACE_PAD); } -static bool nft_trace_have_verdict_chain(struct nft_traceinfo *info) +static bool nft_trace_have_verdict_chain(const struct nft_verdict *verdict, + struct nft_traceinfo *info) { switch (info->type) { case NFT_TRACETYPE_RETURN: @@ -153,7 +156,7 @@ static bool nft_trace_have_verdict_chain(struct nft_traceinfo *info) return false; } - switch (info->verdict->code) { + switch (verdict->code) { case NFT_JUMP: case NFT_GOTO: break; @@ -164,9 +167,31 @@ static bool nft_trace_have_verdict_chain(struct nft_traceinfo *info) return true; } -void nft_trace_notify(struct nft_traceinfo *info) +static const struct nft_chain *nft_trace_get_chain(const struct nft_rule_dp *rule, + const struct nft_traceinfo *info) { - const struct nft_pktinfo *pkt = info->pkt; + const struct nft_rule_dp_last *last; + + if (!rule) + return &info->basechain->chain; + + while (!rule->is_last) + rule = nft_rule_next(rule); + + last = (const struct nft_rule_dp_last *)rule; + + if (WARN_ON_ONCE(!last->chain)) + return &info->basechain->chain; + + return last->chain; +} + +void nft_trace_notify(const struct nft_pktinfo *pkt, + const struct nft_verdict *verdict, + const struct nft_rule_dp *rule, + struct nft_traceinfo *info) +{ + const struct nft_chain *chain; struct nlmsghdr *nlh; struct sk_buff *skb; unsigned int size; @@ -176,9 +201,11 @@ void nft_trace_notify(struct nft_traceinfo *info) if (!nfnetlink_has_listeners(nft_net(pkt), NFNLGRP_NFTRACE)) return; + chain = nft_trace_get_chain(rule, info); + size = nlmsg_total_size(sizeof(struct nfgenmsg)) + - nla_total_size(strlen(info->chain->table->name)) + - nla_total_size(strlen(info->chain->name)) + + nla_total_size(strlen(chain->table->name)) + + nla_total_size(strlen(chain->name)) + nla_total_size_64bit(sizeof(__be64)) + /* rule handle */ nla_total_size(sizeof(__be32)) + /* trace type */ nla_total_size(0) + /* VERDICT, nested */ @@ -195,8 +222,8 @@ void nft_trace_notify(struct nft_traceinfo *info) nla_total_size(sizeof(u32)) + /* nfproto */ nla_total_size(sizeof(u32)); /* policy */ - if (nft_trace_have_verdict_chain(info)) - size += nla_total_size(strlen(info->verdict->chain->name)); /* jump target */ + if (nft_trace_have_verdict_chain(verdict, info)) + size += nla_total_size(strlen(verdict->chain->name)); /* jump target */ skb = nlmsg_new(size, GFP_ATOMIC); if (!skb) @@ -217,13 +244,13 @@ void nft_trace_notify(struct nft_traceinfo *info) if (nla_put_u32(skb, NFTA_TRACE_ID, info->skbid)) goto nla_put_failure; - if (nla_put_string(skb, NFTA_TRACE_CHAIN, info->chain->name)) + if (nla_put_string(skb, NFTA_TRACE_CHAIN, chain->name)) goto nla_put_failure; - if (nla_put_string(skb, NFTA_TRACE_TABLE, info->chain->table->name)) + if (nla_put_string(skb, NFTA_TRACE_TABLE, chain->table->name)) goto nla_put_failure; - if (nf_trace_fill_rule_info(skb, info)) + if (nf_trace_fill_rule_info(skb, verdict, rule, info)) goto nla_put_failure; switch (info->type) { @@ -232,11 +259,11 @@ void nft_trace_notify(struct nft_traceinfo *info) break; case NFT_TRACETYPE_RETURN: case NFT_TRACETYPE_RULE: - if (nft_verdict_dump(skb, NFTA_TRACE_VERDICT, info->verdict)) + if (nft_verdict_dump(skb, NFTA_TRACE_VERDICT, verdict)) goto nla_put_failure; /* pkt->skb undefined iff NF_STOLEN, disable dump */ - if (info->verdict->code == NF_STOLEN) + if (verdict->code == NF_STOLEN) info->packet_dumped = true; else mark = pkt->skb->mark; @@ -273,7 +300,6 @@ void nft_trace_notify(struct nft_traceinfo *info) } void nft_trace_init(struct nft_traceinfo *info, const struct nft_pktinfo *pkt, - const struct nft_verdict *verdict, const struct nft_chain *chain) { static siphash_key_t trace_key __read_mostly; @@ -283,8 +309,6 @@ void nft_trace_init(struct nft_traceinfo *info, const struct nft_pktinfo *pkt, info->trace = true; info->nf_trace = pkt->skb->nf_trace; info->packet_dumped = false; - info->pkt = pkt; - info->verdict = verdict; net_get_random_once(&trace_key, sizeof(trace_key)); diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c index 81c7737c803a..ae7146475d17 100644 --- a/net/netfilter/nfnetlink.c +++ b/net/netfilter/nfnetlink.c @@ -590,8 +590,6 @@ done: goto replay_abort; } } - if (ss->cleanup) - ss->cleanup(net); nfnl_err_deliver(&err_list, oskb); kfree_skb(skb); diff --git a/net/netfilter/nfnetlink_hook.c b/net/netfilter/nfnetlink_hook.c index 8120aadf6a0f..ade8ee1988b1 100644 --- a/net/netfilter/nfnetlink_hook.c +++ b/net/netfilter/nfnetlink_hook.c @@ -5,6 +5,7 @@ * Author: Florian Westphal <fw@strlen.de> */ +#include <linux/bpf.h> #include <linux/module.h> #include <linux/kallsyms.h> #include <linux/kernel.h> @@ -57,35 +58,76 @@ struct nfnl_dump_hook_data { u8 hook; }; +static struct nlattr *nfnl_start_info_type(struct sk_buff *nlskb, enum nfnl_hook_chaintype t) +{ + struct nlattr *nest = nla_nest_start(nlskb, NFNLA_HOOK_CHAIN_INFO); + int ret; + + if (!nest) + return NULL; + + ret = nla_put_be32(nlskb, NFNLA_HOOK_INFO_TYPE, htonl(t)); + if (ret == 0) + return nest; + + nla_nest_cancel(nlskb, nest); + return NULL; +} + +static int nfnl_hook_put_bpf_prog_info(struct sk_buff *nlskb, + const struct nfnl_dump_hook_data *ctx, + unsigned int seq, + const struct bpf_prog *prog) +{ + struct nlattr *nest, *nest2; + int ret; + + if (!IS_ENABLED(CONFIG_NETFILTER_BPF_LINK)) + return 0; + + if (WARN_ON_ONCE(!prog)) + return 0; + + nest = nfnl_start_info_type(nlskb, NFNL_HOOK_TYPE_BPF); + if (!nest) + return -EMSGSIZE; + + nest2 = nla_nest_start(nlskb, NFNLA_HOOK_INFO_DESC); + if (!nest2) + goto cancel_nest; + + ret = nla_put_be32(nlskb, NFNLA_HOOK_BPF_ID, htonl(prog->aux->id)); + if (ret) + goto cancel_nest; + + nla_nest_end(nlskb, nest2); + nla_nest_end(nlskb, nest); + return 0; + +cancel_nest: + nla_nest_cancel(nlskb, nest); + return -EMSGSIZE; +} + static int nfnl_hook_put_nft_chain_info(struct sk_buff *nlskb, const struct nfnl_dump_hook_data *ctx, unsigned int seq, - const struct nf_hook_ops *ops) + struct nft_chain *chain) { struct net *net = sock_net(nlskb->sk); struct nlattr *nest, *nest2; - struct nft_chain *chain; int ret = 0; - if (ops->hook_ops_type != NF_HOOK_OP_NF_TABLES) - return 0; - - chain = ops->priv; if (WARN_ON_ONCE(!chain)) return 0; if (!nft_is_active(net, chain)) return 0; - nest = nla_nest_start(nlskb, NFNLA_HOOK_CHAIN_INFO); + nest = nfnl_start_info_type(nlskb, NFNL_HOOK_TYPE_NFTABLES); if (!nest) return -EMSGSIZE; - ret = nla_put_be32(nlskb, NFNLA_HOOK_INFO_TYPE, - htonl(NFNL_HOOK_TYPE_NFTABLES)); - if (ret) - goto cancel_nest; - nest2 = nla_nest_start(nlskb, NFNLA_HOOK_INFO_DESC); if (!nest2) goto cancel_nest; @@ -171,7 +213,20 @@ static int nfnl_hook_dump_one(struct sk_buff *nlskb, if (ret) goto nla_put_failure; - ret = nfnl_hook_put_nft_chain_info(nlskb, ctx, seq, ops); + switch (ops->hook_ops_type) { + case NF_HOOK_OP_NF_TABLES: + ret = nfnl_hook_put_nft_chain_info(nlskb, ctx, seq, ops->priv); + break; + case NF_HOOK_OP_BPF: + ret = nfnl_hook_put_bpf_prog_info(nlskb, ctx, seq, ops->priv); + break; + case NF_HOOK_OP_UNDEFINED: + break; + default: + WARN_ON_ONCE(1); + break; + } + if (ret) goto nla_put_failure; diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c index d97eb280cb2e..e57eb168ee13 100644 --- a/net/netfilter/nfnetlink_log.c +++ b/net/netfilter/nfnetlink_log.c @@ -103,9 +103,9 @@ static inline u_int8_t instance_hashfn(u_int16_t group_num) } static struct nfulnl_instance * -__instance_lookup(struct nfnl_log_net *log, u_int16_t group_num) +__instance_lookup(const struct nfnl_log_net *log, u16 group_num) { - struct hlist_head *head; + const struct hlist_head *head; struct nfulnl_instance *inst; head = &log->instance_table[instance_hashfn(group_num)]; @@ -123,15 +123,25 @@ instance_get(struct nfulnl_instance *inst) } static struct nfulnl_instance * -instance_lookup_get(struct nfnl_log_net *log, u_int16_t group_num) +instance_lookup_get_rcu(const struct nfnl_log_net *log, u16 group_num) { struct nfulnl_instance *inst; - rcu_read_lock_bh(); inst = __instance_lookup(log, group_num); if (inst && !refcount_inc_not_zero(&inst->use)) inst = NULL; - rcu_read_unlock_bh(); + + return inst; +} + +static struct nfulnl_instance * +instance_lookup_get(const struct nfnl_log_net *log, u16 group_num) +{ + struct nfulnl_instance *inst; + + rcu_read_lock(); + inst = instance_lookup_get_rcu(log, group_num); + rcu_read_unlock(); return inst; } @@ -698,7 +708,7 @@ nfulnl_log_packet(struct net *net, else li = &default_loginfo; - inst = instance_lookup_get(log, li->u.ulog.group); + inst = instance_lookup_get_rcu(log, li->u.ulog.group); if (!inst) return; @@ -1030,7 +1040,7 @@ static struct hlist_node *get_first(struct net *net, struct iter_state *st) struct hlist_head *head = &log->instance_table[st->bucket]; if (!hlist_empty(head)) - return rcu_dereference_bh(hlist_first_rcu(head)); + return rcu_dereference(hlist_first_rcu(head)); } return NULL; } @@ -1038,7 +1048,7 @@ static struct hlist_node *get_first(struct net *net, struct iter_state *st) static struct hlist_node *get_next(struct net *net, struct iter_state *st, struct hlist_node *h) { - h = rcu_dereference_bh(hlist_next_rcu(h)); + h = rcu_dereference(hlist_next_rcu(h)); while (!h) { struct nfnl_log_net *log; struct hlist_head *head; @@ -1048,7 +1058,7 @@ static struct hlist_node *get_next(struct net *net, struct iter_state *st, log = nfnl_log_pernet(net); head = &log->instance_table[st->bucket]; - h = rcu_dereference_bh(hlist_first_rcu(head)); + h = rcu_dereference(hlist_first_rcu(head)); } return h; } @@ -1066,9 +1076,9 @@ static struct hlist_node *get_idx(struct net *net, struct iter_state *st, } static void *seq_start(struct seq_file *s, loff_t *pos) - __acquires(rcu_bh) + __acquires(rcu) { - rcu_read_lock_bh(); + rcu_read_lock(); return get_idx(seq_file_net(s), s->private, *pos); } @@ -1079,9 +1089,9 @@ static void *seq_next(struct seq_file *s, void *v, loff_t *pos) } static void seq_stop(struct seq_file *s, void *v) - __releases(rcu_bh) + __releases(rcu) { - rcu_read_unlock_bh(); + rcu_read_unlock(); } static int seq_show(struct seq_file *s, void *v) diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c index 87a9009d5234..e311462f6d98 100644 --- a/net/netfilter/nfnetlink_queue.c +++ b/net/netfilter/nfnetlink_queue.c @@ -29,6 +29,7 @@ #include <linux/netfilter/nfnetlink_queue.h> #include <linux/netfilter/nf_conntrack_common.h> #include <linux/list.h> +#include <linux/cgroup-defs.h> #include <net/sock.h> #include <net/tcp_states.h> #include <net/netfilter/nf_queue.h> @@ -301,6 +302,19 @@ nla_put_failure: return -1; } +static int nfqnl_put_sk_classid(struct sk_buff *skb, struct sock *sk) +{ +#if IS_ENABLED(CONFIG_CGROUP_NET_CLASSID) + if (sk && sk_fullsock(sk)) { + u32 classid = sock_cgroup_classid(&sk->sk_cgrp_data); + + if (classid && nla_put_be32(skb, NFQA_CGROUP_CLASSID, htonl(classid))) + return -1; + } +#endif + return 0; +} + static u32 nfqnl_get_sk_secctx(struct sk_buff *skb, char **secdata) { u32 seclen = 0; @@ -406,6 +420,9 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue, + nla_total_size(sizeof(u_int32_t)) /* priority */ + nla_total_size(sizeof(struct nfqnl_msg_packet_hw)) + nla_total_size(sizeof(u_int32_t)) /* skbinfo */ +#if IS_ENABLED(CONFIG_CGROUP_NET_CLASSID) + + nla_total_size(sizeof(u_int32_t)) /* classid */ +#endif + nla_total_size(sizeof(u_int32_t)); /* cap_len */ tstamp = skb_tstamp_cond(entskb, false); @@ -599,6 +616,9 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue, nfqnl_put_sk_uidgid(skb, entskb->sk) < 0) goto nla_put_failure; + if (nfqnl_put_sk_classid(skb, entskb->sk) < 0) + goto nla_put_failure; + if (seclen && nla_put(skb, NFQA_SECCTX, seclen, secdata)) goto nla_put_failure; diff --git a/net/netfilter/nft_masq.c b/net/netfilter/nft_masq.c index 9544c2f16998..b115d77fbbc7 100644 --- a/net/netfilter/nft_masq.c +++ b/net/netfilter/nft_masq.c @@ -96,23 +96,39 @@ nla_put_failure: return -1; } -static void nft_masq_ipv4_eval(const struct nft_expr *expr, - struct nft_regs *regs, - const struct nft_pktinfo *pkt) +static void nft_masq_eval(const struct nft_expr *expr, + struct nft_regs *regs, + const struct nft_pktinfo *pkt) { - struct nft_masq *priv = nft_expr_priv(expr); + const struct nft_masq *priv = nft_expr_priv(expr); struct nf_nat_range2 range; memset(&range, 0, sizeof(range)); range.flags = priv->flags; if (priv->sreg_proto_min) { - range.min_proto.all = (__force __be16)nft_reg_load16( - ®s->data[priv->sreg_proto_min]); - range.max_proto.all = (__force __be16)nft_reg_load16( - ®s->data[priv->sreg_proto_max]); + range.min_proto.all = (__force __be16) + nft_reg_load16(®s->data[priv->sreg_proto_min]); + range.max_proto.all = (__force __be16) + nft_reg_load16(®s->data[priv->sreg_proto_max]); + } + + switch (nft_pf(pkt)) { + case NFPROTO_IPV4: + regs->verdict.code = nf_nat_masquerade_ipv4(pkt->skb, + nft_hook(pkt), + &range, + nft_out(pkt)); + break; +#ifdef CONFIG_NF_TABLES_IPV6 + case NFPROTO_IPV6: + regs->verdict.code = nf_nat_masquerade_ipv6(pkt->skb, &range, + nft_out(pkt)); + break; +#endif + default: + WARN_ON_ONCE(1); + break; } - regs->verdict.code = nf_nat_masquerade_ipv4(pkt->skb, nft_hook(pkt), - &range, nft_out(pkt)); } static void @@ -125,7 +141,7 @@ static struct nft_expr_type nft_masq_ipv4_type; static const struct nft_expr_ops nft_masq_ipv4_ops = { .type = &nft_masq_ipv4_type, .size = NFT_EXPR_SIZE(sizeof(struct nft_masq)), - .eval = nft_masq_ipv4_eval, + .eval = nft_masq_eval, .init = nft_masq_init, .destroy = nft_masq_ipv4_destroy, .dump = nft_masq_dump, @@ -143,25 +159,6 @@ static struct nft_expr_type nft_masq_ipv4_type __read_mostly = { }; #ifdef CONFIG_NF_TABLES_IPV6 -static void nft_masq_ipv6_eval(const struct nft_expr *expr, - struct nft_regs *regs, - const struct nft_pktinfo *pkt) -{ - struct nft_masq *priv = nft_expr_priv(expr); - struct nf_nat_range2 range; - - memset(&range, 0, sizeof(range)); - range.flags = priv->flags; - if (priv->sreg_proto_min) { - range.min_proto.all = (__force __be16)nft_reg_load16( - ®s->data[priv->sreg_proto_min]); - range.max_proto.all = (__force __be16)nft_reg_load16( - ®s->data[priv->sreg_proto_max]); - } - regs->verdict.code = nf_nat_masquerade_ipv6(pkt->skb, &range, - nft_out(pkt)); -} - static void nft_masq_ipv6_destroy(const struct nft_ctx *ctx, const struct nft_expr *expr) { @@ -172,7 +169,7 @@ static struct nft_expr_type nft_masq_ipv6_type; static const struct nft_expr_ops nft_masq_ipv6_ops = { .type = &nft_masq_ipv6_type, .size = NFT_EXPR_SIZE(sizeof(struct nft_masq)), - .eval = nft_masq_ipv6_eval, + .eval = nft_masq_eval, .init = nft_masq_init, .destroy = nft_masq_ipv6_destroy, .dump = nft_masq_dump, @@ -204,20 +201,6 @@ static inline void nft_masq_module_exit_ipv6(void) {} #endif #ifdef CONFIG_NF_TABLES_INET -static void nft_masq_inet_eval(const struct nft_expr *expr, - struct nft_regs *regs, - const struct nft_pktinfo *pkt) -{ - switch (nft_pf(pkt)) { - case NFPROTO_IPV4: - return nft_masq_ipv4_eval(expr, regs, pkt); - case NFPROTO_IPV6: - return nft_masq_ipv6_eval(expr, regs, pkt); - } - - WARN_ON_ONCE(1); -} - static void nft_masq_inet_destroy(const struct nft_ctx *ctx, const struct nft_expr *expr) { @@ -228,7 +211,7 @@ static struct nft_expr_type nft_masq_inet_type; static const struct nft_expr_ops nft_masq_inet_ops = { .type = &nft_masq_inet_type, .size = NFT_EXPR_SIZE(sizeof(struct nft_masq)), - .eval = nft_masq_inet_eval, + .eval = nft_masq_eval, .init = nft_masq_init, .destroy = nft_masq_inet_destroy, .dump = nft_masq_dump, diff --git a/net/netfilter/nft_redir.c b/net/netfilter/nft_redir.c index 67cec56bc84a..a70196ffcb1e 100644 --- a/net/netfilter/nft_redir.c +++ b/net/netfilter/nft_redir.c @@ -64,6 +64,8 @@ static int nft_redir_init(const struct nft_ctx *ctx, } else { priv->sreg_proto_max = priv->sreg_proto_min; } + + priv->flags |= NF_NAT_RANGE_PROTO_SPECIFIED; } if (tb[NFTA_REDIR_FLAGS]) { @@ -99,25 +101,37 @@ nla_put_failure: return -1; } -static void nft_redir_ipv4_eval(const struct nft_expr *expr, - struct nft_regs *regs, - const struct nft_pktinfo *pkt) +static void nft_redir_eval(const struct nft_expr *expr, + struct nft_regs *regs, + const struct nft_pktinfo *pkt) { - struct nft_redir *priv = nft_expr_priv(expr); - struct nf_nat_ipv4_multi_range_compat mr; + const struct nft_redir *priv = nft_expr_priv(expr); + struct nf_nat_range2 range; - memset(&mr, 0, sizeof(mr)); + memset(&range, 0, sizeof(range)); + range.flags = priv->flags; if (priv->sreg_proto_min) { - mr.range[0].min.all = (__force __be16)nft_reg_load16( - ®s->data[priv->sreg_proto_min]); - mr.range[0].max.all = (__force __be16)nft_reg_load16( - ®s->data[priv->sreg_proto_max]); - mr.range[0].flags |= NF_NAT_RANGE_PROTO_SPECIFIED; + range.min_proto.all = (__force __be16) + nft_reg_load16(®s->data[priv->sreg_proto_min]); + range.max_proto.all = (__force __be16) + nft_reg_load16(®s->data[priv->sreg_proto_max]); } - mr.range[0].flags |= priv->flags; - - regs->verdict.code = nf_nat_redirect_ipv4(pkt->skb, &mr, nft_hook(pkt)); + switch (nft_pf(pkt)) { + case NFPROTO_IPV4: + regs->verdict.code = nf_nat_redirect_ipv4(pkt->skb, &range, + nft_hook(pkt)); + break; +#ifdef CONFIG_NF_TABLES_IPV6 + case NFPROTO_IPV6: + regs->verdict.code = nf_nat_redirect_ipv6(pkt->skb, &range, + nft_hook(pkt)); + break; +#endif + default: + WARN_ON_ONCE(1); + break; + } } static void @@ -130,7 +144,7 @@ static struct nft_expr_type nft_redir_ipv4_type; static const struct nft_expr_ops nft_redir_ipv4_ops = { .type = &nft_redir_ipv4_type, .size = NFT_EXPR_SIZE(sizeof(struct nft_redir)), - .eval = nft_redir_ipv4_eval, + .eval = nft_redir_eval, .init = nft_redir_init, .destroy = nft_redir_ipv4_destroy, .dump = nft_redir_dump, @@ -148,28 +162,6 @@ static struct nft_expr_type nft_redir_ipv4_type __read_mostly = { }; #ifdef CONFIG_NF_TABLES_IPV6 -static void nft_redir_ipv6_eval(const struct nft_expr *expr, - struct nft_regs *regs, - const struct nft_pktinfo *pkt) -{ - struct nft_redir *priv = nft_expr_priv(expr); - struct nf_nat_range2 range; - - memset(&range, 0, sizeof(range)); - if (priv->sreg_proto_min) { - range.min_proto.all = (__force __be16)nft_reg_load16( - ®s->data[priv->sreg_proto_min]); - range.max_proto.all = (__force __be16)nft_reg_load16( - ®s->data[priv->sreg_proto_max]); - range.flags |= NF_NAT_RANGE_PROTO_SPECIFIED; - } - - range.flags |= priv->flags; - - regs->verdict.code = - nf_nat_redirect_ipv6(pkt->skb, &range, nft_hook(pkt)); -} - static void nft_redir_ipv6_destroy(const struct nft_ctx *ctx, const struct nft_expr *expr) { @@ -180,7 +172,7 @@ static struct nft_expr_type nft_redir_ipv6_type; static const struct nft_expr_ops nft_redir_ipv6_ops = { .type = &nft_redir_ipv6_type, .size = NFT_EXPR_SIZE(sizeof(struct nft_redir)), - .eval = nft_redir_ipv6_eval, + .eval = nft_redir_eval, .init = nft_redir_init, .destroy = nft_redir_ipv6_destroy, .dump = nft_redir_dump, @@ -199,20 +191,6 @@ static struct nft_expr_type nft_redir_ipv6_type __read_mostly = { #endif #ifdef CONFIG_NF_TABLES_INET -static void nft_redir_inet_eval(const struct nft_expr *expr, - struct nft_regs *regs, - const struct nft_pktinfo *pkt) -{ - switch (nft_pf(pkt)) { - case NFPROTO_IPV4: - return nft_redir_ipv4_eval(expr, regs, pkt); - case NFPROTO_IPV6: - return nft_redir_ipv6_eval(expr, regs, pkt); - } - - WARN_ON_ONCE(1); -} - static void nft_redir_inet_destroy(const struct nft_ctx *ctx, const struct nft_expr *expr) { @@ -223,7 +201,7 @@ static struct nft_expr_type nft_redir_inet_type; static const struct nft_expr_ops nft_redir_inet_ops = { .type = &nft_redir_inet_type, .size = NFT_EXPR_SIZE(sizeof(struct nft_redir)), - .eval = nft_redir_inet_eval, + .eval = nft_redir_eval, .init = nft_redir_init, .destroy = nft_redir_inet_destroy, .dump = nft_redir_dump, diff --git a/net/netfilter/utils.c b/net/netfilter/utils.c index 2182d361e273..acef4155f0da 100644 --- a/net/netfilter/utils.c +++ b/net/netfilter/utils.c @@ -215,3 +215,55 @@ int nf_reroute(struct sk_buff *skb, struct nf_queue_entry *entry) } return ret; } + +/* Only get and check the lengths, not do any hop-by-hop stuff. */ +int nf_ip6_check_hbh_len(struct sk_buff *skb, u32 *plen) +{ + int len, off = sizeof(struct ipv6hdr); + unsigned char *nh; + + if (!pskb_may_pull(skb, off + 8)) + return -ENOMEM; + nh = (unsigned char *)(ipv6_hdr(skb) + 1); + len = (nh[1] + 1) << 3; + + if (!pskb_may_pull(skb, off + len)) + return -ENOMEM; + nh = skb_network_header(skb); + + off += 2; + len -= 2; + while (len > 0) { + int optlen; + + if (nh[off] == IPV6_TLV_PAD1) { + off++; + len--; + continue; + } + if (len < 2) + return -EBADMSG; + optlen = nh[off + 1] + 2; + if (optlen > len) + return -EBADMSG; + + if (nh[off] == IPV6_TLV_JUMBO) { + u32 pkt_len; + + if (nh[off + 1] != 4 || (off & 3) != 2) + return -EBADMSG; + pkt_len = ntohl(*(__be32 *)(nh + off + 2)); + if (pkt_len <= IPV6_MAXPLEN || + ipv6_hdr(skb)->payload_len) + return -EBADMSG; + if (pkt_len > skb->len - sizeof(struct ipv6hdr)) + return -EBADMSG; + *plen = pkt_len; + } + off += optlen; + len -= optlen; + } + + return len ? -EBADMSG : 0; +} +EXPORT_SYMBOL_GPL(nf_ip6_check_hbh_len); diff --git a/net/netfilter/xt_REDIRECT.c b/net/netfilter/xt_REDIRECT.c index 353ca7801251..ff66b56a3f97 100644 --- a/net/netfilter/xt_REDIRECT.c +++ b/net/netfilter/xt_REDIRECT.c @@ -46,7 +46,6 @@ static void redirect_tg_destroy(const struct xt_tgdtor_param *par) nf_ct_netns_put(par->net, par->family); } -/* FIXME: Take multiple ranges --RR */ static int redirect_tg4_check(const struct xt_tgchk_param *par) { const struct nf_nat_ipv4_multi_range_compat *mr = par->targinfo; @@ -65,7 +64,14 @@ static int redirect_tg4_check(const struct xt_tgchk_param *par) static unsigned int redirect_tg4(struct sk_buff *skb, const struct xt_action_param *par) { - return nf_nat_redirect_ipv4(skb, par->targinfo, xt_hooknum(par)); + const struct nf_nat_ipv4_multi_range_compat *mr = par->targinfo; + struct nf_nat_range2 range = { + .flags = mr->range[0].flags, + .min_proto = mr->range[0].min, + .max_proto = mr->range[0].max, + }; + + return nf_nat_redirect_ipv4(skb, &range, xt_hooknum(par)); } static struct xt_target redirect_tg_reg[] __read_mostly = { diff --git a/net/netfilter/xt_tcpudp.c b/net/netfilter/xt_tcpudp.c index 11ec2abf0c72..e8991130a3de 100644 --- a/net/netfilter/xt_tcpudp.c +++ b/net/netfilter/xt_tcpudp.c @@ -4,6 +4,7 @@ #include <linux/module.h> #include <net/ip.h> #include <linux/ipv6.h> +#include <linux/icmp.h> #include <net/ipv6.h> #include <net/tcp.h> #include <net/udp.h> @@ -20,6 +21,8 @@ MODULE_ALIAS("ipt_udp"); MODULE_ALIAS("ipt_tcp"); MODULE_ALIAS("ip6t_udp"); MODULE_ALIAS("ip6t_tcp"); +MODULE_ALIAS("ipt_icmp"); +MODULE_ALIAS("ip6t_icmp6"); /* Returns 1 if the port is matched by the range, 0 otherwise */ static inline bool @@ -161,6 +164,95 @@ static int udp_mt_check(const struct xt_mtchk_param *par) return (udpinfo->invflags & ~XT_UDP_INV_MASK) ? -EINVAL : 0; } +/* Returns 1 if the type and code is matched by the range, 0 otherwise */ +static bool type_code_in_range(u8 test_type, u8 min_code, u8 max_code, + u8 type, u8 code) +{ + return type == test_type && code >= min_code && code <= max_code; +} + +static bool icmp_type_code_match(u8 test_type, u8 min_code, u8 max_code, + u8 type, u8 code, bool invert) +{ + return (test_type == 0xFF || + type_code_in_range(test_type, min_code, max_code, type, code)) + ^ invert; +} + +static bool icmp6_type_code_match(u8 test_type, u8 min_code, u8 max_code, + u8 type, u8 code, bool invert) +{ + return type_code_in_range(test_type, min_code, max_code, type, code) ^ invert; +} + +static bool +icmp_match(const struct sk_buff *skb, struct xt_action_param *par) +{ + const struct icmphdr *ic; + struct icmphdr _icmph; + const struct ipt_icmp *icmpinfo = par->matchinfo; + + /* Must not be a fragment. */ + if (par->fragoff != 0) + return false; + + ic = skb_header_pointer(skb, par->thoff, sizeof(_icmph), &_icmph); + if (!ic) { + /* We've been asked to examine this packet, and we + * can't. Hence, no choice but to drop. + */ + par->hotdrop = true; + return false; + } + + return icmp_type_code_match(icmpinfo->type, + icmpinfo->code[0], + icmpinfo->code[1], + ic->type, ic->code, + !!(icmpinfo->invflags & IPT_ICMP_INV)); +} + +static bool +icmp6_match(const struct sk_buff *skb, struct xt_action_param *par) +{ + const struct icmp6hdr *ic; + struct icmp6hdr _icmph; + const struct ip6t_icmp *icmpinfo = par->matchinfo; + + /* Must not be a fragment. */ + if (par->fragoff != 0) + return false; + + ic = skb_header_pointer(skb, par->thoff, sizeof(_icmph), &_icmph); + if (!ic) { + /* We've been asked to examine this packet, and we + * can't. Hence, no choice but to drop. + */ + par->hotdrop = true; + return false; + } + + return icmp6_type_code_match(icmpinfo->type, + icmpinfo->code[0], + icmpinfo->code[1], + ic->icmp6_type, ic->icmp6_code, + !!(icmpinfo->invflags & IP6T_ICMP_INV)); +} + +static int icmp_checkentry(const struct xt_mtchk_param *par) +{ + const struct ipt_icmp *icmpinfo = par->matchinfo; + + return (icmpinfo->invflags & ~IPT_ICMP_INV) ? -EINVAL : 0; +} + +static int icmp6_checkentry(const struct xt_mtchk_param *par) +{ + const struct ip6t_icmp *icmpinfo = par->matchinfo; + + return (icmpinfo->invflags & ~IP6T_ICMP_INV) ? -EINVAL : 0; +} + static struct xt_match tcpudp_mt_reg[] __read_mostly = { { .name = "tcp", @@ -216,6 +308,24 @@ static struct xt_match tcpudp_mt_reg[] __read_mostly = { .proto = IPPROTO_UDPLITE, .me = THIS_MODULE, }, + { + .name = "icmp", + .match = icmp_match, + .matchsize = sizeof(struct ipt_icmp), + .checkentry = icmp_checkentry, + .proto = IPPROTO_ICMP, + .family = NFPROTO_IPV4, + .me = THIS_MODULE, + }, + { + .name = "icmp6", + .match = icmp6_match, + .matchsize = sizeof(struct ip6t_icmp), + .checkentry = icmp6_checkentry, + .proto = IPPROTO_ICMPV6, + .family = NFPROTO_IPV6, + .me = THIS_MODULE, + }, }; static int __init tcpudp_mt_init(void) diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index f365dfdd672d..7ef8b9a1e30c 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -1742,7 +1742,8 @@ static int netlink_getsockopt(struct socket *sock, int level, int optname, { struct sock *sk = sock->sk; struct netlink_sock *nlk = nlk_sk(sk); - int len, val, err; + unsigned int flag; + int len, val; if (level != SOL_NETLINK) return -ENOPROTOOPT; @@ -1754,39 +1755,17 @@ static int netlink_getsockopt(struct socket *sock, int level, int optname, switch (optname) { case NETLINK_PKTINFO: - if (len < sizeof(int)) - return -EINVAL; - len = sizeof(int); - val = nlk->flags & NETLINK_F_RECV_PKTINFO ? 1 : 0; - if (put_user(len, optlen) || - put_user(val, optval)) - return -EFAULT; - err = 0; + flag = NETLINK_F_RECV_PKTINFO; break; case NETLINK_BROADCAST_ERROR: - if (len < sizeof(int)) - return -EINVAL; - len = sizeof(int); - val = nlk->flags & NETLINK_F_BROADCAST_SEND_ERROR ? 1 : 0; - if (put_user(len, optlen) || - put_user(val, optval)) - return -EFAULT; - err = 0; + flag = NETLINK_F_BROADCAST_SEND_ERROR; break; case NETLINK_NO_ENOBUFS: - if (len < sizeof(int)) - return -EINVAL; - len = sizeof(int); - val = nlk->flags & NETLINK_F_RECV_NO_ENOBUFS ? 1 : 0; - if (put_user(len, optlen) || - put_user(val, optval)) - return -EFAULT; - err = 0; + flag = NETLINK_F_RECV_NO_ENOBUFS; break; case NETLINK_LIST_MEMBERSHIPS: { - int pos, idx, shift; + int pos, idx, shift, err = 0; - err = 0; netlink_lock_table(); for (pos = 0; pos * 8 < nlk->ngroups; pos += sizeof(u32)) { if (len - pos < sizeof(u32)) @@ -1803,40 +1782,32 @@ static int netlink_getsockopt(struct socket *sock, int level, int optname, if (put_user(ALIGN(nlk->ngroups / 8, sizeof(u32)), optlen)) err = -EFAULT; netlink_unlock_table(); - break; + return err; } case NETLINK_CAP_ACK: - if (len < sizeof(int)) - return -EINVAL; - len = sizeof(int); - val = nlk->flags & NETLINK_F_CAP_ACK ? 1 : 0; - if (put_user(len, optlen) || - put_user(val, optval)) - return -EFAULT; - err = 0; + flag = NETLINK_F_CAP_ACK; break; case NETLINK_EXT_ACK: - if (len < sizeof(int)) - return -EINVAL; - len = sizeof(int); - val = nlk->flags & NETLINK_F_EXT_ACK ? 1 : 0; - if (put_user(len, optlen) || put_user(val, optval)) - return -EFAULT; - err = 0; + flag = NETLINK_F_EXT_ACK; break; case NETLINK_GET_STRICT_CHK: - if (len < sizeof(int)) - return -EINVAL; - len = sizeof(int); - val = nlk->flags & NETLINK_F_STRICT_CHK ? 1 : 0; - if (put_user(len, optlen) || put_user(val, optval)) - return -EFAULT; - err = 0; + flag = NETLINK_F_STRICT_CHK; break; default: - err = -ENOPROTOOPT; + return -ENOPROTOOPT; } - return err; + + if (len < sizeof(int)) + return -EINVAL; + + len = sizeof(int); + val = nlk->flags & flag ? 1 : 0; + + if (put_user(len, optlen) || + copy_to_user(optval, &val, len)) + return -EFAULT; + + return 0; } static void netlink_cmsg_recv_pktinfo(struct msghdr *msg, struct sk_buff *skb) @@ -2098,8 +2069,6 @@ __netlink_kernel_create(struct net *net, int unit, struct module *module, nl_table[unit].bind = cfg->bind; nl_table[unit].unbind = cfg->unbind; nl_table[unit].flags = cfg->flags; - if (cfg->compare) - nl_table[unit].compare = cfg->compare; } nl_table[unit].registered = 1; } else { diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h index 5f454c8de6a4..90a3198a9b7f 100644 --- a/net/netlink/af_netlink.h +++ b/net/netlink/af_netlink.h @@ -64,7 +64,6 @@ struct netlink_table { struct module *module; int (*bind)(struct net *net, int group); void (*unbind)(struct net *net, int group); - bool (*compare)(struct net *net, struct sock *sock); int registered; }; diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index d4e76e2ae153..6080c0db0814 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -270,8 +270,11 @@ static noinline struct sk_buff *nf_hook_direct_egress(struct sk_buff *skb) } #endif -static int packet_direct_xmit(struct sk_buff *skb) +static int packet_xmit(const struct packet_sock *po, struct sk_buff *skb) { + if (!packet_sock_flag(po, PACKET_SOCK_QDISC_BYPASS)) + return dev_queue_xmit(skb); + #ifdef CONFIG_NETFILTER_EGRESS if (nf_hook_egress_active()) { skb = nf_hook_direct_egress(skb); @@ -305,11 +308,6 @@ static void packet_cached_dev_reset(struct packet_sock *po) RCU_INIT_POINTER(po->cached_dev, NULL); } -static bool packet_use_direct_xmit(const struct packet_sock *po) -{ - return po->xmit == packet_direct_xmit; -} - static u16 packet_pick_tx_queue(struct sk_buff *skb) { struct net_device *dev = skb->dev; @@ -339,14 +337,14 @@ static void __register_prot_hook(struct sock *sk) { struct packet_sock *po = pkt_sk(sk); - if (!po->running) { + if (!packet_sock_flag(po, PACKET_SOCK_RUNNING)) { if (po->fanout) __fanout_link(sk, po); else dev_add_pack(&po->prot_hook); sock_hold(sk); - po->running = 1; + packet_sock_flag_set(po, PACKET_SOCK_RUNNING, 1); } } @@ -368,7 +366,7 @@ static void __unregister_prot_hook(struct sock *sk, bool sync) lockdep_assert_held_once(&po->bind_lock); - po->running = 0; + packet_sock_flag_set(po, PACKET_SOCK_RUNNING, 0); if (po->fanout) __fanout_unlink(sk, po); @@ -388,7 +386,7 @@ static void unregister_prot_hook(struct sock *sk, bool sync) { struct packet_sock *po = pkt_sk(sk); - if (po->running) + if (packet_sock_flag(po, PACKET_SOCK_RUNNING)) __unregister_prot_hook(sk, sync); } @@ -473,7 +471,7 @@ static __u32 __packet_set_timestamp(struct packet_sock *po, void *frame, struct timespec64 ts; __u32 ts_status; - if (!(ts_status = tpacket_get_timestamp(skb, &ts, po->tp_tstamp))) + if (!(ts_status = tpacket_get_timestamp(skb, &ts, READ_ONCE(po->tp_tstamp)))) return 0; h.raw = frame; @@ -1306,22 +1304,23 @@ static int __packet_rcv_has_room(const struct packet_sock *po, static int packet_rcv_has_room(struct packet_sock *po, struct sk_buff *skb) { - int pressure, ret; + bool pressure; + int ret; ret = __packet_rcv_has_room(po, skb); pressure = ret != ROOM_NORMAL; - if (READ_ONCE(po->pressure) != pressure) - WRITE_ONCE(po->pressure, pressure); + if (packet_sock_flag(po, PACKET_SOCK_PRESSURE) != pressure) + packet_sock_flag_set(po, PACKET_SOCK_PRESSURE, pressure); return ret; } static void packet_rcv_try_clear_pressure(struct packet_sock *po) { - if (READ_ONCE(po->pressure) && + if (packet_sock_flag(po, PACKET_SOCK_PRESSURE) && __packet_rcv_has_room(po, NULL) == ROOM_NORMAL) - WRITE_ONCE(po->pressure, 0); + packet_sock_flag_set(po, PACKET_SOCK_PRESSURE, false); } static void packet_sock_destruct(struct sock *sk) @@ -1408,7 +1407,8 @@ static unsigned int fanout_demux_rollover(struct packet_fanout *f, i = j = min_t(int, po->rollover->sock, num - 1); do { po_next = pkt_sk(rcu_dereference(f->arr[i])); - if (po_next != po_skip && !READ_ONCE(po_next->pressure) && + if (po_next != po_skip && + !packet_sock_flag(po_next, PACKET_SOCK_PRESSURE) && packet_rcv_has_room(po_next, skb) == ROOM_NORMAL) { if (i != j) po->rollover->sock = i; @@ -1781,7 +1781,7 @@ static int fanout_add(struct sock *sk, struct fanout_args *args) err = -EINVAL; spin_lock(&po->bind_lock); - if (po->running && + if (packet_sock_flag(po, PACKET_SOCK_RUNNING) && match->type == type && match->prot_hook.type == po->prot_hook.type && match->prot_hook.dev == po->prot_hook.dev) { @@ -2090,18 +2090,18 @@ static unsigned int run_filter(struct sk_buff *skb, } static int packet_rcv_vnet(struct msghdr *msg, const struct sk_buff *skb, - size_t *len) + size_t *len, int vnet_hdr_sz) { - struct virtio_net_hdr vnet_hdr; + struct virtio_net_hdr_mrg_rxbuf vnet_hdr = { .num_buffers = 0 }; - if (*len < sizeof(vnet_hdr)) + if (*len < vnet_hdr_sz) return -EINVAL; - *len -= sizeof(vnet_hdr); + *len -= vnet_hdr_sz; - if (virtio_net_hdr_from_skb(skb, &vnet_hdr, vio_le(), true, 0)) + if (virtio_net_hdr_from_skb(skb, (struct virtio_net_hdr *)&vnet_hdr, vio_le(), true, 0)) return -EINVAL; - return memcpy_to_msg(msg, (void *)&vnet_hdr, sizeof(vnet_hdr)); + return memcpy_to_msg(msg, (void *)&vnet_hdr, vnet_hdr_sz); } /* @@ -2183,7 +2183,7 @@ static int packet_rcv(struct sk_buff *skb, struct net_device *dev, sll = &PACKET_SKB_CB(skb)->sa.ll; sll->sll_hatype = dev->type; sll->sll_pkttype = skb->pkt_type; - if (unlikely(po->origdev)) + if (unlikely(packet_sock_flag(po, PACKET_SOCK_ORIGDEV))) sll->sll_ifindex = orig_dev->ifindex; else sll->sll_ifindex = dev->ifindex; @@ -2250,7 +2250,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, __u32 ts_status; bool is_drop_n_account = false; unsigned int slot_id = 0; - bool do_vnet = false; + int vnet_hdr_sz = 0; /* struct tpacket{2,3}_hdr is aligned to a multiple of TPACKET_ALIGNMENT. * We may add members to them until current aligned size without forcing @@ -2308,10 +2308,9 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, netoff = TPACKET_ALIGN(po->tp_hdrlen + (maclen < 16 ? 16 : maclen)) + po->tp_reserve; - if (po->has_vnet_hdr) { - netoff += sizeof(struct virtio_net_hdr); - do_vnet = true; - } + vnet_hdr_sz = READ_ONCE(po->vnet_hdr_sz); + if (vnet_hdr_sz) + netoff += vnet_hdr_sz; macoff = netoff - maclen; } if (netoff > USHRT_MAX) { @@ -2337,7 +2336,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, snaplen = po->rx_ring.frame_size - macoff; if ((int)snaplen < 0) { snaplen = 0; - do_vnet = false; + vnet_hdr_sz = 0; } } } else if (unlikely(macoff + snaplen > @@ -2351,7 +2350,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, if (unlikely((int)snaplen < 0)) { snaplen = 0; macoff = GET_PBDQC_FROM_RB(&po->rx_ring)->max_frame_len; - do_vnet = false; + vnet_hdr_sz = 0; } } spin_lock(&sk->sk_receive_queue.lock); @@ -2367,7 +2366,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, __set_bit(slot_id, po->rx_ring.rx_owner_map); } - if (do_vnet && + if (vnet_hdr_sz && virtio_net_hdr_from_skb(skb, h.raw + macoff - sizeof(struct virtio_net_hdr), vio_le(), true, 0)) { @@ -2402,7 +2401,8 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, * closer to the time of capture. */ ts_status = tpacket_get_timestamp(skb, &ts, - po->tp_tstamp | SOF_TIMESTAMPING_SOFTWARE); + READ_ONCE(po->tp_tstamp) | + SOF_TIMESTAMPING_SOFTWARE); if (!ts_status) ktime_get_real_ts64(&ts); @@ -2460,7 +2460,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, sll->sll_hatype = dev->type; sll->sll_protocol = skb->protocol; sll->sll_pkttype = skb->pkt_type; - if (unlikely(po->origdev)) + if (unlikely(packet_sock_flag(po, PACKET_SOCK_ORIGDEV))) sll->sll_ifindex = orig_dev->ifindex; else sll->sll_ifindex = dev->ifindex; @@ -2550,16 +2550,26 @@ static int __packet_snd_vnet_parse(struct virtio_net_hdr *vnet_hdr, size_t len) } static int packet_snd_vnet_parse(struct msghdr *msg, size_t *len, - struct virtio_net_hdr *vnet_hdr) + struct virtio_net_hdr *vnet_hdr, int vnet_hdr_sz) { - if (*len < sizeof(*vnet_hdr)) + int ret; + + if (*len < vnet_hdr_sz) return -EINVAL; - *len -= sizeof(*vnet_hdr); + *len -= vnet_hdr_sz; if (!copy_from_iter_full(vnet_hdr, sizeof(*vnet_hdr), &msg->msg_iter)) return -EFAULT; - return __packet_snd_vnet_parse(vnet_hdr, *len); + ret = __packet_snd_vnet_parse(vnet_hdr, *len); + if (ret) + return ret; + + /* move iter to point to the start of mac header */ + if (vnet_hdr_sz != sizeof(struct virtio_net_hdr)) + iov_iter_advance(&msg->msg_iter, vnet_hdr_sz - sizeof(struct virtio_net_hdr)); + + return 0; } static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb, @@ -2621,8 +2631,8 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb, nr_frags = skb_shinfo(skb)->nr_frags; if (unlikely(nr_frags >= MAX_SKB_FRAGS)) { - pr_err("Packet exceed the number of skb frags(%lu)\n", - MAX_SKB_FRAGS); + pr_err("Packet exceed the number of skb frags(%u)\n", + (unsigned int)MAX_SKB_FRAGS); return -EFAULT; } @@ -2670,7 +2680,7 @@ static int tpacket_parse_header(struct packet_sock *po, void *frame, return -EMSGSIZE; } - if (unlikely(po->tp_tx_has_off)) { + if (unlikely(packet_sock_flag(po, PACKET_SOCK_TX_HAS_OFF))) { int off_min, off_max; off_min = po->tp_hdrlen - sizeof(struct sockaddr_ll); @@ -2721,6 +2731,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg) void *ph; DECLARE_SOCKADDR(struct sockaddr_ll *, saddr, msg->msg_name); bool need_wait = !(msg->msg_flags & MSG_DONTWAIT); + int vnet_hdr_sz = READ_ONCE(po->vnet_hdr_sz); unsigned char *addr = NULL; int tp_len, size_max; void *data; @@ -2778,7 +2789,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg) size_max = po->tx_ring.frame_size - (po->tp_hdrlen - sizeof(struct sockaddr_ll)); - if ((size_max > dev->mtu + reserve + VLAN_HLEN) && !po->has_vnet_hdr) + if ((size_max > dev->mtu + reserve + VLAN_HLEN) && !vnet_hdr_sz) size_max = dev->mtu + reserve + VLAN_HLEN; reinit_completion(&po->skb_completion); @@ -2807,10 +2818,10 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg) status = TP_STATUS_SEND_REQUEST; hlen = LL_RESERVED_SPACE(dev); tlen = dev->needed_tailroom; - if (po->has_vnet_hdr) { + if (vnet_hdr_sz) { vnet_hdr = data; - data += sizeof(*vnet_hdr); - tp_len -= sizeof(*vnet_hdr); + data += vnet_hdr_sz; + tp_len -= vnet_hdr_sz; if (tp_len < 0 || __packet_snd_vnet_parse(vnet_hdr, tp_len)) { tp_len = -EINVAL; @@ -2835,13 +2846,13 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg) addr, hlen, copylen, &sockc); if (likely(tp_len >= 0) && tp_len > dev->mtu + reserve && - !po->has_vnet_hdr && + !vnet_hdr_sz && !packet_extra_vlan_len_allowed(dev, skb)) tp_len = -EMSGSIZE; if (unlikely(tp_len < 0)) { tpacket_error: - if (po->tp_loss) { + if (packet_sock_flag(po, PACKET_SOCK_TP_LOSS)) { __packet_set_status(po, ph, TP_STATUS_AVAILABLE); packet_increment_head(&po->tx_ring); @@ -2854,7 +2865,7 @@ tpacket_error: } } - if (po->has_vnet_hdr) { + if (vnet_hdr_sz) { if (virtio_net_hdr_to_skb(skb, vnet_hdr, vio_le())) { tp_len = -EINVAL; goto tpacket_error; @@ -2867,7 +2878,7 @@ tpacket_error: packet_inc_pending(&po->tx_ring); status = TP_STATUS_SEND_REQUEST; - err = po->xmit(skb); + err = packet_xmit(po, skb); if (unlikely(err != 0)) { if (err > 0) err = net_xmit_errno(err); @@ -2944,7 +2955,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len) struct virtio_net_hdr vnet_hdr = { 0 }; int offset = 0; struct packet_sock *po = pkt_sk(sk); - bool has_vnet_hdr = false; + int vnet_hdr_sz = READ_ONCE(po->vnet_hdr_sz); int hlen, tlen, linear; int extra_len = 0; @@ -2988,11 +2999,10 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len) if (sock->type == SOCK_RAW) reserve = dev->hard_header_len; - if (po->has_vnet_hdr) { - err = packet_snd_vnet_parse(msg, &len, &vnet_hdr); + if (vnet_hdr_sz) { + err = packet_snd_vnet_parse(msg, &len, &vnet_hdr, vnet_hdr_sz); if (err) goto out_unlock; - has_vnet_hdr = true; } if (unlikely(sock_flag(sk, SOCK_NOFCS))) { @@ -3062,15 +3072,16 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len) packet_parse_headers(skb, sock); - if (has_vnet_hdr) { + if (vnet_hdr_sz) { err = virtio_net_hdr_to_skb(skb, &vnet_hdr, vio_le()); if (err) goto out_free; - len += sizeof(vnet_hdr); + len += vnet_hdr_sz; virtio_net_hdr_set_proto(skb, &vnet_hdr); } - err = po->xmit(skb); + err = packet_xmit(po, skb); + if (unlikely(err != 0)) { if (err > 0) err = net_xmit_errno(err); @@ -3217,7 +3228,7 @@ static int packet_do_bind(struct sock *sk, const char *name, int ifindex, if (need_rehook) { dev_hold(dev); - if (po->running) { + if (packet_sock_flag(po, PACKET_SOCK_RUNNING)) { rcu_read_unlock(); /* prevents packet_notifier() from calling * register_prot_hook() @@ -3230,7 +3241,7 @@ static int packet_do_bind(struct sock *sk, const char *name, int ifindex, dev->ifindex); } - BUG_ON(po->running); + BUG_ON(packet_sock_flag(po, PACKET_SOCK_RUNNING)); WRITE_ONCE(po->num, proto); po->prot_hook.type = proto; @@ -3352,7 +3363,6 @@ static int packet_create(struct net *net, struct socket *sock, int protocol, init_completion(&po->skb_completion); sk->sk_family = PF_PACKET; po->num = proto; - po->xmit = dev_queue_xmit; err = packet_alloc_pending(po); if (err) @@ -3406,7 +3416,7 @@ static int packet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, struct sock *sk = sock->sk; struct sk_buff *skb; int copied, err; - int vnet_hdr_len = 0; + int vnet_hdr_len = READ_ONCE(pkt_sk(sk)->vnet_hdr_sz); unsigned int origlen = 0; err = -EINVAL; @@ -3447,11 +3457,10 @@ static int packet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, packet_rcv_try_clear_pressure(pkt_sk(sk)); - if (pkt_sk(sk)->has_vnet_hdr) { - err = packet_rcv_vnet(msg, skb, &len); + if (vnet_hdr_len) { + err = packet_rcv_vnet(msg, skb, &len, vnet_hdr_len); if (err) goto out_free; - vnet_hdr_len = sizeof(struct virtio_net_hdr); } /* You lose any data beyond the buffer you gave. If it worries @@ -3511,7 +3520,7 @@ static int packet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, memcpy(msg->msg_name, &PACKET_SKB_CB(skb)->sa, copy_len); } - if (pkt_sk(sk)->auxdata) { + if (packet_sock_flag(pkt_sk(sk), PACKET_SOCK_AUXDATA)) { struct tpacket_auxdata aux; aux.tp_status = TP_STATUS_USER; @@ -3882,7 +3891,7 @@ packet_setsockopt(struct socket *sock, int level, int optname, sockptr_t optval, if (po->rx_ring.pg_vec || po->tx_ring.pg_vec) { ret = -EBUSY; } else { - po->tp_loss = !!val; + packet_sock_flag_set(po, PACKET_SOCK_TP_LOSS, val); ret = 0; } release_sock(sk); @@ -3897,9 +3906,7 @@ packet_setsockopt(struct socket *sock, int level, int optname, sockptr_t optval, if (copy_from_sockptr(&val, optval, sizeof(val))) return -EFAULT; - lock_sock(sk); - po->auxdata = !!val; - release_sock(sk); + packet_sock_flag_set(po, PACKET_SOCK_AUXDATA, val); return 0; } case PACKET_ORIGDEV: @@ -3911,14 +3918,13 @@ packet_setsockopt(struct socket *sock, int level, int optname, sockptr_t optval, if (copy_from_sockptr(&val, optval, sizeof(val))) return -EFAULT; - lock_sock(sk); - po->origdev = !!val; - release_sock(sk); + packet_sock_flag_set(po, PACKET_SOCK_ORIGDEV, val); return 0; } case PACKET_VNET_HDR: + case PACKET_VNET_HDR_SZ: { - int val; + int val, hdr_len; if (sock->type != SOCK_RAW) return -EINVAL; @@ -3927,11 +3933,19 @@ packet_setsockopt(struct socket *sock, int level, int optname, sockptr_t optval, if (copy_from_sockptr(&val, optval, sizeof(val))) return -EFAULT; + if (optname == PACKET_VNET_HDR_SZ) { + if (val && val != sizeof(struct virtio_net_hdr) && + val != sizeof(struct virtio_net_hdr_mrg_rxbuf)) + return -EINVAL; + hdr_len = val; + } else { + hdr_len = val ? sizeof(struct virtio_net_hdr) : 0; + } lock_sock(sk); if (po->rx_ring.pg_vec || po->tx_ring.pg_vec) { ret = -EBUSY; } else { - po->has_vnet_hdr = !!val; + WRITE_ONCE(po->vnet_hdr_sz, hdr_len); ret = 0; } release_sock(sk); @@ -3946,7 +3960,7 @@ packet_setsockopt(struct socket *sock, int level, int optname, sockptr_t optval, if (copy_from_sockptr(&val, optval, sizeof(val))) return -EFAULT; - po->tp_tstamp = val; + WRITE_ONCE(po->tp_tstamp, val); return 0; } case PACKET_FANOUT: @@ -3993,7 +4007,7 @@ packet_setsockopt(struct socket *sock, int level, int optname, sockptr_t optval, lock_sock(sk); if (!po->rx_ring.pg_vec && !po->tx_ring.pg_vec) - po->tp_tx_has_off = !!val; + packet_sock_flag_set(po, PACKET_SOCK_TX_HAS_OFF, val); release_sock(sk); return 0; @@ -4007,7 +4021,7 @@ packet_setsockopt(struct socket *sock, int level, int optname, sockptr_t optval, if (copy_from_sockptr(&val, optval, sizeof(val))) return -EFAULT; - po->xmit = val ? packet_direct_xmit : dev_queue_xmit; + packet_sock_flag_set(po, PACKET_SOCK_QDISC_BYPASS, val); return 0; } default: @@ -4058,13 +4072,16 @@ static int packet_getsockopt(struct socket *sock, int level, int optname, break; case PACKET_AUXDATA: - val = po->auxdata; + val = packet_sock_flag(po, PACKET_SOCK_AUXDATA); break; case PACKET_ORIGDEV: - val = po->origdev; + val = packet_sock_flag(po, PACKET_SOCK_ORIGDEV); break; case PACKET_VNET_HDR: - val = po->has_vnet_hdr; + val = !!READ_ONCE(po->vnet_hdr_sz); + break; + case PACKET_VNET_HDR_SZ: + val = READ_ONCE(po->vnet_hdr_sz); break; case PACKET_VERSION: val = po->tp_version; @@ -4094,10 +4111,10 @@ static int packet_getsockopt(struct socket *sock, int level, int optname, val = po->tp_reserve; break; case PACKET_LOSS: - val = po->tp_loss; + val = packet_sock_flag(po, PACKET_SOCK_TP_LOSS); break; case PACKET_TIMESTAMP: - val = po->tp_tstamp; + val = READ_ONCE(po->tp_tstamp); break; case PACKET_FANOUT: val = (po->fanout ? @@ -4119,10 +4136,10 @@ static int packet_getsockopt(struct socket *sock, int level, int optname, lv = sizeof(rstats); break; case PACKET_TX_HAS_OFF: - val = po->tp_tx_has_off; + val = packet_sock_flag(po, PACKET_SOCK_TX_HAS_OFF); break; case PACKET_QDISC_BYPASS: - val = packet_use_direct_xmit(po); + val = packet_sock_flag(po, PACKET_SOCK_QDISC_BYPASS); break; default: return -ENOPROTOOPT; @@ -4157,7 +4174,7 @@ static int packet_notifier(struct notifier_block *this, case NETDEV_DOWN: if (dev->ifindex == po->ifindex) { spin_lock(&po->bind_lock); - if (po->running) { + if (packet_sock_flag(po, PACKET_SOCK_RUNNING)) { __unregister_prot_hook(sk, false); sk->sk_err = ENETDOWN; if (!sock_flag(sk, SOCK_DEAD)) @@ -4468,7 +4485,7 @@ static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u, /* Detach socket from network */ spin_lock(&po->bind_lock); - was_running = po->running; + was_running = packet_sock_flag(po, PACKET_SOCK_RUNNING); num = po->num; if (was_running) { WRITE_ONCE(po->num, 0); @@ -4679,7 +4696,7 @@ static int packet_seq_show(struct seq_file *seq, void *v) s->sk_type, ntohs(READ_ONCE(po->num)), READ_ONCE(po->ifindex), - po->running, + packet_sock_flag(po, PACKET_SOCK_RUNNING), atomic_read(&s->sk_rmem_alloc), from_kuid_munged(seq_user_ns(seq), sock_i_uid(s)), sock_i_ino(s)); diff --git a/net/packet/diag.c b/net/packet/diag.c index 07812ae5ca07..d0c4eda4cdc6 100644 --- a/net/packet/diag.c +++ b/net/packet/diag.c @@ -18,18 +18,18 @@ static int pdiag_put_info(const struct packet_sock *po, struct sk_buff *nlskb) pinfo.pdi_version = po->tp_version; pinfo.pdi_reserve = po->tp_reserve; pinfo.pdi_copy_thresh = po->copy_thresh; - pinfo.pdi_tstamp = po->tp_tstamp; + pinfo.pdi_tstamp = READ_ONCE(po->tp_tstamp); pinfo.pdi_flags = 0; - if (po->running) + if (packet_sock_flag(po, PACKET_SOCK_RUNNING)) pinfo.pdi_flags |= PDI_RUNNING; - if (po->auxdata) + if (packet_sock_flag(po, PACKET_SOCK_AUXDATA)) pinfo.pdi_flags |= PDI_AUXDATA; - if (po->origdev) + if (packet_sock_flag(po, PACKET_SOCK_ORIGDEV)) pinfo.pdi_flags |= PDI_ORIGDEV; - if (po->has_vnet_hdr) + if (READ_ONCE(po->vnet_hdr_sz)) pinfo.pdi_flags |= PDI_VNETHDR; - if (po->tp_loss) + if (packet_sock_flag(po, PACKET_SOCK_TP_LOSS)) pinfo.pdi_flags |= PDI_LOSS; return nla_put(nlskb, PACKET_DIAG_INFO, sizeof(pinfo), &pinfo); diff --git a/net/packet/internal.h b/net/packet/internal.h index 48af35b1aed2..63f4865202c1 100644 --- a/net/packet/internal.h +++ b/net/packet/internal.h @@ -116,14 +116,9 @@ struct packet_sock { int copy_thresh; spinlock_t bind_lock; struct mutex pg_vec_lock; - unsigned int running; /* bind_lock must be held */ - unsigned int auxdata:1, /* writer must hold sock lock */ - origdev:1, - has_vnet_hdr:1, - tp_loss:1, - tp_tx_has_off:1; - int pressure; + unsigned long flags; int ifindex; /* bound device */ + u8 vnet_hdr_sz; __be16 num; struct packet_rollover *rollover; struct packet_mclist *mclist; @@ -134,14 +129,36 @@ struct packet_sock { unsigned int tp_tstamp; struct completion skb_completion; struct net_device __rcu *cached_dev; - int (*xmit)(struct sk_buff *skb); struct packet_type prot_hook ____cacheline_aligned_in_smp; atomic_t tp_drops ____cacheline_aligned_in_smp; }; -static inline struct packet_sock *pkt_sk(struct sock *sk) +#define pkt_sk(ptr) container_of_const(ptr, struct packet_sock, sk) + +enum packet_sock_flags { + PACKET_SOCK_ORIGDEV, + PACKET_SOCK_AUXDATA, + PACKET_SOCK_TX_HAS_OFF, + PACKET_SOCK_TP_LOSS, + PACKET_SOCK_RUNNING, + PACKET_SOCK_PRESSURE, + PACKET_SOCK_QDISC_BYPASS, +}; + +static inline void packet_sock_flag_set(struct packet_sock *po, + enum packet_sock_flags flag, + bool val) +{ + if (val) + set_bit(flag, &po->flags); + else + clear_bit(flag, &po->flags); +} + +static inline bool packet_sock_flag(const struct packet_sock *po, + enum packet_sock_flags flag) { - return (struct packet_sock *)sk; + return test_bit(flag, &po->flags); } #endif diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c index 102f5cbff91a..c32b164206f9 100644 --- a/net/rxrpc/af_rxrpc.c +++ b/net/rxrpc/af_rxrpc.c @@ -342,31 +342,44 @@ static void rxrpc_dummy_notify_rx(struct sock *sk, struct rxrpc_call *rxcall, } /** - * rxrpc_kernel_end_call - Allow a kernel service to end a call it was using + * rxrpc_kernel_shutdown_call - Allow a kernel service to shut down a call it was using * @sock: The socket the call is on * @call: The call to end * - * Allow a kernel service to end a call it was using. The call must be + * Allow a kernel service to shut down a call it was using. The call must be * complete before this is called (the call should be aborted if necessary). */ -void rxrpc_kernel_end_call(struct socket *sock, struct rxrpc_call *call) +void rxrpc_kernel_shutdown_call(struct socket *sock, struct rxrpc_call *call) { _enter("%d{%d}", call->debug_id, refcount_read(&call->ref)); mutex_lock(&call->user_mutex); - rxrpc_release_call(rxrpc_sk(sock->sk), call); - - /* Make sure we're not going to call back into a kernel service */ - if (call->notify_rx) { - spin_lock(&call->notify_lock); - call->notify_rx = rxrpc_dummy_notify_rx; - spin_unlock(&call->notify_lock); + if (!test_bit(RXRPC_CALL_RELEASED, &call->flags)) { + rxrpc_release_call(rxrpc_sk(sock->sk), call); + + /* Make sure we're not going to call back into a kernel service */ + if (call->notify_rx) { + spin_lock(&call->notify_lock); + call->notify_rx = rxrpc_dummy_notify_rx; + spin_unlock(&call->notify_lock); + } } - mutex_unlock(&call->user_mutex); +} +EXPORT_SYMBOL(rxrpc_kernel_shutdown_call); + +/** + * rxrpc_kernel_put_call - Release a reference to a call + * @sock: The socket the call is on + * @call: The call to put + * + * Drop the application's ref on an rxrpc call. + */ +void rxrpc_kernel_put_call(struct socket *sock, struct rxrpc_call *call) +{ rxrpc_put_call(call, rxrpc_call_put_kernel); } -EXPORT_SYMBOL(rxrpc_kernel_end_call); +EXPORT_SYMBOL(rxrpc_kernel_put_call); /** * rxrpc_kernel_check_life - Check to see whether a call is still alive diff --git a/net/rxrpc/key.c b/net/rxrpc/key.c index 8d53aded09c4..33e8302a79e3 100644 --- a/net/rxrpc/key.c +++ b/net/rxrpc/key.c @@ -680,7 +680,7 @@ static long rxrpc_read(const struct key *key, return -ENOPKG; } - if (WARN_ON((unsigned long)xdr - (unsigned long)oldxdr == + if (WARN_ON((unsigned long)xdr - (unsigned long)oldxdr != toksize)) return -EIO; } diff --git a/net/rxrpc/protocol.h b/net/rxrpc/protocol.h index 6760cb99c6d6..e8ee4af43ca8 100644 --- a/net/rxrpc/protocol.h +++ b/net/rxrpc/protocol.h @@ -126,7 +126,7 @@ struct rxrpc_ackpacket { uint8_t nAcks; /* number of ACKs */ #define RXRPC_MAXACKS 255 - uint8_t acks[0]; /* list of ACK/NAKs */ + uint8_t acks[]; /* list of ACK/NAKs */ #define RXRPC_ACK_TYPE_NACK 0 #define RXRPC_ACK_TYPE_ACK 1 diff --git a/net/rxrpc/rxperf.c b/net/rxrpc/rxperf.c index 4a2e90015ca7..085e7892d310 100644 --- a/net/rxrpc/rxperf.c +++ b/net/rxrpc/rxperf.c @@ -342,7 +342,8 @@ static void rxperf_deliver_to_call(struct work_struct *work) call_complete: rxperf_set_call_complete(call, ret, remote_abort); /* The call may have been requeued */ - rxrpc_kernel_end_call(rxperf_socket, call->rxcall); + rxrpc_kernel_shutdown_call(rxperf_socket, call->rxcall); + rxrpc_kernel_put_call(rxperf_socket, call->rxcall); cancel_work(&call->work); kfree(call); } diff --git a/net/sched/act_api.c b/net/sched/act_api.c index 296fc1afedd8..f7887f42d542 100644 --- a/net/sched/act_api.c +++ b/net/sched/act_api.c @@ -453,7 +453,7 @@ static size_t tcf_action_shared_attrs_size(const struct tc_action *act) + nla_total_size_64bit(sizeof(u64)) /* TCA_STATS_QUEUE */ + nla_total_size_64bit(sizeof(struct gnet_stats_queue)) - + nla_total_size(0) /* TCA_OPTIONS nested */ + + nla_total_size(0) /* TCA_ACT_OPTIONS nested */ + nla_total_size(sizeof(struct tcf_t)); /* TCA_GACT_TM */ } @@ -480,7 +480,7 @@ tcf_action_dump_terse(struct sk_buff *skb, struct tc_action *a, bool from_act) unsigned char *b = skb_tail_pointer(skb); struct tc_cookie *cookie; - if (nla_put_string(skb, TCA_KIND, a->ops->kind)) + if (nla_put_string(skb, TCA_ACT_KIND, a->ops->kind)) goto nla_put_failure; if (tcf_action_copy_stats(skb, a, 0)) goto nla_put_failure; @@ -598,7 +598,7 @@ static int tcf_del_walker(struct tcf_idrinfo *idrinfo, struct sk_buff *skb, nest = nla_nest_start_noflag(skb, 0); if (nest == NULL) goto nla_put_failure; - if (nla_put_string(skb, TCA_KIND, ops->kind)) + if (nla_put_string(skb, TCA_ACT_KIND, ops->kind)) goto nla_put_failure; ret = 0; @@ -1189,7 +1189,7 @@ tcf_action_dump_1(struct sk_buff *skb, struct tc_action *a, int bind, int ref) if (nla_put_u32(skb, TCA_ACT_IN_HW_COUNT, a->in_hw_count)) goto nla_put_failure; - nest = nla_nest_start_noflag(skb, TCA_OPTIONS); + nest = nla_nest_start_noflag(skb, TCA_ACT_OPTIONS); if (nest == NULL) goto nla_put_failure; err = tcf_action_dump_old(skb, a, bind, ref); diff --git a/net/sched/act_csum.c b/net/sched/act_csum.c index 95e9304024b7..8ed285023a40 100644 --- a/net/sched/act_csum.c +++ b/net/sched/act_csum.c @@ -376,8 +376,7 @@ static int tcf_csum_sctp(struct sk_buff *skb, unsigned int ihl, sctph->checksum = sctp_compute_cksum(skb, skb_network_offset(skb) + ihl); - skb->ip_summed = CHECKSUM_NONE; - skb->csum_not_inet = 0; + skb_reset_csum_not_inet(skb); return 1; } diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c index 8037ec9b1d31..ec43764e92e7 100644 --- a/net/sched/act_mirred.c +++ b/net/sched/act_mirred.c @@ -295,7 +295,7 @@ TC_INDIRECT_SCOPE int tcf_mirred_act(struct sk_buff *skb, at_nh = skb->data == skb_network_header(skb); if (at_nh != expects_nh) { mac_len = skb_at_tc_ingress(skb) ? skb->mac_len : - skb_network_header(skb) - skb_mac_header(skb); + skb_network_offset(skb); if (expects_nh) { /* target device/action expect data at nh */ skb_pull_rcsum(skb2, mac_len); diff --git a/net/sched/act_mpls.c b/net/sched/act_mpls.c index 809f7928a1be..1010dc632874 100644 --- a/net/sched/act_mpls.c +++ b/net/sched/act_mpls.c @@ -69,7 +69,7 @@ TC_INDIRECT_SCOPE int tcf_mpls_act(struct sk_buff *skb, skb_push_rcsum(skb, skb->mac_len); mac_len = skb->mac_len; } else { - mac_len = skb_network_header(skb) - skb_mac_header(skb); + mac_len = skb_network_offset(skb); } ret = READ_ONCE(m->tcf_action); diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c index 4559a1507ea5..fb93d4c1faca 100644 --- a/net/sched/act_pedit.c +++ b/net/sched/act_pedit.c @@ -30,12 +30,13 @@ static const struct nla_policy pedit_policy[TCA_PEDIT_MAX + 1] = { }; static const struct nla_policy pedit_key_ex_policy[TCA_PEDIT_KEY_EX_MAX + 1] = { - [TCA_PEDIT_KEY_EX_HTYPE] = { .type = NLA_U16 }, - [TCA_PEDIT_KEY_EX_CMD] = { .type = NLA_U16 }, + [TCA_PEDIT_KEY_EX_HTYPE] = + NLA_POLICY_MAX(NLA_U16, TCA_PEDIT_HDR_TYPE_MAX), + [TCA_PEDIT_KEY_EX_CMD] = NLA_POLICY_MAX(NLA_U16, TCA_PEDIT_CMD_MAX), }; static struct tcf_pedit_key_ex *tcf_pedit_keys_ex_parse(struct nlattr *nla, - u8 n) + u8 n, struct netlink_ext_ack *extack) { struct tcf_pedit_key_ex *keys_ex; struct tcf_pedit_key_ex *k; @@ -56,12 +57,14 @@ static struct tcf_pedit_key_ex *tcf_pedit_keys_ex_parse(struct nlattr *nla, struct nlattr *tb[TCA_PEDIT_KEY_EX_MAX + 1]; if (!n) { + NL_SET_ERR_MSG_MOD(extack, "Can't parse more extended keys than requested"); err = -EINVAL; goto err_out; } n--; if (nla_type(ka) != TCA_PEDIT_KEY_EX) { + NL_SET_ERR_MSG_ATTR(extack, ka, "Unknown attribute, expected extended key"); err = -EINVAL; goto err_out; } @@ -72,25 +75,26 @@ static struct tcf_pedit_key_ex *tcf_pedit_keys_ex_parse(struct nlattr *nla, if (err) goto err_out; - if (!tb[TCA_PEDIT_KEY_EX_HTYPE] || - !tb[TCA_PEDIT_KEY_EX_CMD]) { + if (NL_REQ_ATTR_CHECK(extack, nla, tb, TCA_PEDIT_KEY_EX_HTYPE)) { + NL_SET_ERR_MSG(extack, "Missing required attribute"); err = -EINVAL; goto err_out; } - k->htype = nla_get_u16(tb[TCA_PEDIT_KEY_EX_HTYPE]); - k->cmd = nla_get_u16(tb[TCA_PEDIT_KEY_EX_CMD]); - - if (k->htype > TCA_PEDIT_HDR_TYPE_MAX || - k->cmd > TCA_PEDIT_CMD_MAX) { + if (NL_REQ_ATTR_CHECK(extack, nla, tb, TCA_PEDIT_KEY_EX_CMD)) { + NL_SET_ERR_MSG(extack, "Missing required attribute"); err = -EINVAL; goto err_out; } + k->htype = nla_get_u16(tb[TCA_PEDIT_KEY_EX_HTYPE]); + k->cmd = nla_get_u16(tb[TCA_PEDIT_KEY_EX_CMD]); + k++; } if (n) { + NL_SET_ERR_MSG_MOD(extack, "Not enough extended keys to parse"); err = -EINVAL; goto err_out; } @@ -222,7 +226,7 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla, } nparms->tcfp_keys_ex = - tcf_pedit_keys_ex_parse(tb[TCA_PEDIT_KEYS_EX], parm->nkeys); + tcf_pedit_keys_ex_parse(tb[TCA_PEDIT_KEYS_EX], parm->nkeys, extack); if (IS_ERR(nparms->tcfp_keys_ex)) { ret = PTR_ERR(nparms->tcfp_keys_ex); goto out_free; @@ -247,8 +251,16 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla, memcpy(nparms->tcfp_keys, parm->keys, ksize); for (i = 0; i < nparms->tcfp_nkeys; ++i) { + u32 offmask = nparms->tcfp_keys[i].offmask; u32 cur = nparms->tcfp_keys[i].off; + /* The AT option can be added to static offsets in the datapath */ + if (!offmask && cur % 4) { + NL_SET_ERR_MSG_MOD(extack, "Offsets must be on 32bit boundaries"); + ret = -EINVAL; + goto put_chain; + } + /* sanitize the shift value for any later use */ nparms->tcfp_keys[i].shift = min_t(size_t, BITS_PER_TYPE(int) - 1, @@ -257,7 +269,7 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla, /* The AT option can read a single byte, we can bound the actual * value with uchar max. */ - cur += (0xff & nparms->tcfp_keys[i].offmask) >> nparms->tcfp_keys[i].shift; + cur += (0xff & offmask) >> nparms->tcfp_keys[i].shift; /* Each key touches 4 bytes starting from the computed offset */ nparms->tcfp_off_max_hint = @@ -313,37 +325,28 @@ static bool offset_valid(struct sk_buff *skb, int offset) return true; } -static int pedit_skb_hdr_offset(struct sk_buff *skb, - enum pedit_header_type htype, int *hoffset) +static void pedit_skb_hdr_offset(struct sk_buff *skb, + enum pedit_header_type htype, int *hoffset) { - int ret = -EINVAL; - + /* 'htype' is validated in the netlink parsing */ switch (htype) { case TCA_PEDIT_KEY_EX_HDR_TYPE_ETH: - if (skb_mac_header_was_set(skb)) { + if (skb_mac_header_was_set(skb)) *hoffset = skb_mac_offset(skb); - ret = 0; - } break; case TCA_PEDIT_KEY_EX_HDR_TYPE_NETWORK: case TCA_PEDIT_KEY_EX_HDR_TYPE_IP4: case TCA_PEDIT_KEY_EX_HDR_TYPE_IP6: *hoffset = skb_network_offset(skb); - ret = 0; break; case TCA_PEDIT_KEY_EX_HDR_TYPE_TCP: case TCA_PEDIT_KEY_EX_HDR_TYPE_UDP: - if (skb_transport_header_was_set(skb)) { + if (skb_transport_header_was_set(skb)) *hoffset = skb_transport_offset(skb); - ret = 0; - } break; default: - ret = -EINVAL; break; } - - return ret; } TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb, @@ -376,10 +379,9 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb, for (i = parms->tcfp_nkeys; i > 0; i--, tkey++) { int offset = tkey->off; + int hoffset = 0; u32 *ptr, hdata; - int hoffset; u32 val; - int rc; if (tkey_ex) { htype = tkey_ex->htype; @@ -388,36 +390,30 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb, tkey_ex++; } - rc = pedit_skb_hdr_offset(skb, htype, &hoffset); - if (rc) { - pr_info("tc action pedit bad header type specified (0x%x)\n", - htype); - goto bad; - } + pedit_skb_hdr_offset(skb, htype, &hoffset); if (tkey->offmask) { u8 *d, _d; if (!offset_valid(skb, hoffset + tkey->at)) { - pr_info("tc action pedit 'at' offset %d out of bounds\n", - hoffset + tkey->at); + pr_info_ratelimited("tc action pedit 'at' offset %d out of bounds\n", + hoffset + tkey->at); goto bad; } d = skb_header_pointer(skb, hoffset + tkey->at, sizeof(_d), &_d); if (!d) goto bad; - offset += (*d & tkey->offmask) >> tkey->shift; - } - if (offset % 4) { - pr_info("tc action pedit offset must be on 32 bit boundaries\n"); - goto bad; + offset += (*d & tkey->offmask) >> tkey->shift; + if (offset % 4) { + pr_info_ratelimited("tc action pedit offset must be on 32 bit boundaries\n"); + goto bad; + } } if (!offset_valid(skb, hoffset + offset)) { - pr_info("tc action pedit offset %d out of bounds\n", - hoffset + offset); + pr_info_ratelimited("tc action pedit offset %d out of bounds\n", hoffset + offset); goto bad; } @@ -434,8 +430,7 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb, val = (*ptr + tkey->val) & ~tkey->mask; break; default: - pr_info("tc action pedit bad command (%d)\n", - cmd); + pr_info_ratelimited("tc action pedit bad command (%d)\n", cmd); goto bad; } diff --git a/net/sched/act_tunnel_key.c b/net/sched/act_tunnel_key.c index 2d12d2626415..0c8aa7e686ea 100644 --- a/net/sched/act_tunnel_key.c +++ b/net/sched/act_tunnel_key.c @@ -420,6 +420,9 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla, nla_get_u8(tb[TCA_TUNNEL_KEY_NO_CSUM])) flags &= ~TUNNEL_CSUM; + if (nla_get_flag(tb[TCA_TUNNEL_KEY_NO_FRAG])) + flags |= TUNNEL_DONT_FRAGMENT; + if (tb[TCA_TUNNEL_KEY_ENC_DST_PORT]) dst_port = nla_get_be16(tb[TCA_TUNNEL_KEY_ENC_DST_PORT]); @@ -747,6 +750,8 @@ static int tunnel_key_dump(struct sk_buff *skb, struct tc_action *a, key->tp_dst)) || nla_put_u8(skb, TCA_TUNNEL_KEY_NO_CSUM, !(key->tun_flags & TUNNEL_CSUM)) || + ((key->tun_flags & TUNNEL_DONT_FRAGMENT) && + nla_put_flag(skb, TCA_TUNNEL_KEY_NO_FRAG)) || tunnel_key_opts_dump(skb, info)) goto nla_put_failure; diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 35785a36c802..3c3629c9e7b6 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -3211,6 +3211,7 @@ int tcf_exts_init_ex(struct tcf_exts *exts, struct net *net, int action, #ifdef CONFIG_NET_CLS_ACT exts->type = 0; exts->nr_actions = 0; + exts->miss_cookie_node = NULL; /* Note: we do not own yet a reference on net. * This reference might be taken later from tcf_exts_get_net(). */ diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c index 475fe222a855..cc49256d5318 100644 --- a/net/sched/cls_flower.c +++ b/net/sched/cls_flower.c @@ -1057,7 +1057,7 @@ static void fl_set_key_pppoe(struct nlattr **tb, * because ETH_P_PPP_SES was stored in basic.n_proto * which might get overwritten by ppp_proto * or might be set to 0, the role of key_val::type - * is simmilar to vlan_key::tpid + * is similar to vlan_key::tpid */ key_val->type = htons(ETH_P_PPP_SES); key_mask->type = cpu_to_be16(~0); diff --git a/net/sched/em_meta.c b/net/sched/em_meta.c index 49bae3d5006b..af85a73c4c54 100644 --- a/net/sched/em_meta.c +++ b/net/sched/em_meta.c @@ -44,7 +44,7 @@ * be provided for non-numeric types. * * Additionally, type dependent modifiers such as shift operators - * or mask may be applied to extend the functionaliy. As of now, + * or mask may be applied to extend the functionality. As of now, * the variable length type supports shifting the byte string to * the right, eating up any number of octets and thus supporting * wildcard interface name comparisons such as "ppp%" matching diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index aba789c30a2e..fdb8f429333d 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -639,14 +639,16 @@ void qdisc_watchdog_schedule_range_ns(struct qdisc_watchdog *wd, u64 expires, return; if (hrtimer_is_queued(&wd->timer)) { + u64 softexpires; + + softexpires = ktime_to_ns(hrtimer_get_softexpires(&wd->timer)); /* If timer is already set in [expires, expires + delta_ns], * do not reprogram it. */ - if (wd->last_expires - expires <= delta_ns) + if (softexpires - expires <= delta_ns) return; } - wd->last_expires = expires; hrtimer_start_range_ns(&wd->timer, ns_to_ktime(expires), delta_ns, diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c index 7970217b565a..891e007d5c0b 100644 --- a/net/sched/sch_cake.c +++ b/net/sched/sch_cake.c @@ -1360,7 +1360,7 @@ static u32 cake_overhead(struct cake_sched_data *q, const struct sk_buff *skb) return cake_calc_overhead(q, len, off); /* borrowed from qdisc_pkt_len_init() */ - hdr_len = skb_transport_header(skb) - skb_mac_header(skb); + hdr_len = skb_transport_offset(skb); /* + transport layer */ if (likely(shinfo->gso_type & (SKB_GSO_TCPV4 | @@ -1368,14 +1368,14 @@ static u32 cake_overhead(struct cake_sched_data *q, const struct sk_buff *skb) const struct tcphdr *th; struct tcphdr _tcphdr; - th = skb_header_pointer(skb, skb_transport_offset(skb), + th = skb_header_pointer(skb, hdr_len, sizeof(_tcphdr), &_tcphdr); if (likely(th)) hdr_len += __tcp_hdrlen(th); } else { struct udphdr _udphdr; - if (skb_header_pointer(skb, skb_transport_offset(skb), + if (skb_header_pointer(skb, hdr_len, sizeof(_udphdr), &_udphdr)) hdr_len += sizeof(struct udphdr); } diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c index 48d14fb90ba0..f59a2cb2c803 100644 --- a/net/sched/sch_fq.c +++ b/net/sched/sch_fq.c @@ -779,13 +779,17 @@ static int fq_resize(struct Qdisc *sch, u32 log) return 0; } +static struct netlink_range_validation iq_range = { + .max = INT_MAX, +}; + static const struct nla_policy fq_policy[TCA_FQ_MAX + 1] = { [TCA_FQ_UNSPEC] = { .strict_start_type = TCA_FQ_TIMER_SLACK }, [TCA_FQ_PLIMIT] = { .type = NLA_U32 }, [TCA_FQ_FLOW_PLIMIT] = { .type = NLA_U32 }, [TCA_FQ_QUANTUM] = { .type = NLA_U32 }, - [TCA_FQ_INITIAL_QUANTUM] = { .type = NLA_U32 }, + [TCA_FQ_INITIAL_QUANTUM] = NLA_POLICY_FULL_RANGE(NLA_U32, &iq_range), [TCA_FQ_RATE_ENABLE] = { .type = NLA_U32 }, [TCA_FQ_FLOW_DEFAULT_RATE] = { .type = NLA_U32 }, [TCA_FQ_FLOW_MAX_RATE] = { .type = NLA_U32 }, diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index a9aadc4e6858..37e41f972f69 100644 --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -502,7 +502,7 @@ static void dev_watchdog(struct timer_list *t) if (netif_device_present(dev) && netif_running(dev) && netif_carrier_ok(dev)) { - int some_queue_timedout = 0; + unsigned int timedout_ms = 0; unsigned int i; unsigned long trans_start; @@ -514,16 +514,16 @@ static void dev_watchdog(struct timer_list *t) if (netif_xmit_stopped(txq) && time_after(jiffies, (trans_start + dev->watchdog_timeo))) { - some_queue_timedout = 1; + timedout_ms = jiffies_to_msecs(jiffies - trans_start); atomic_long_inc(&txq->trans_timeout); break; } } - if (unlikely(some_queue_timedout)) { + if (unlikely(timedout_ms)) { trace_net_dev_xmit_timeout(dev, i); - WARN_ONCE(1, KERN_INFO "NETDEV WATCHDOG: %s (%s): transmit queue %u timed out\n", - dev->name, netdev_drivername(dev), i); + WARN_ONCE(1, "NETDEV WATCHDOG: %s (%s): transmit queue %u timed out %u ms\n", + dev->name, netdev_drivername(dev), i, timedout_ms); netif_freeze_queues(dev); dev->netdev_ops->ndo_tx_timeout(dev, i); netif_unfreeze_queues(dev); diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c index 92f2975b6a82..8aef7dd9fb88 100644 --- a/net/sched/sch_htb.c +++ b/net/sched/sch_htb.c @@ -1786,7 +1786,7 @@ static int htb_change_class(struct Qdisc *sch, u32 classid, goto failure; err = nla_parse_nested_deprecated(tb, TCA_HTB_MAX, opt, htb_policy, - NULL); + extack); if (err < 0) goto failure; @@ -1858,7 +1858,7 @@ static int htb_change_class(struct Qdisc *sch, u32 classid, /* check maximal depth */ if (parent && parent->parent && parent->parent->level < 2) { - pr_err("htb: tree is too deep\n"); + NL_SET_ERR_MSG_MOD(extack, "tree is too deep"); goto failure; } err = -ENOBUFS; @@ -1917,8 +1917,8 @@ static int htb_change_class(struct Qdisc *sch, u32 classid, }; err = htb_offload(dev, &offload_opt); if (err) { - pr_err("htb: TC_HTB_LEAF_ALLOC_QUEUE failed with err = %d\n", - err); + NL_SET_ERR_MSG_WEAK(extack, + "Failed to offload TC_HTB_LEAF_ALLOC_QUEUE"); goto err_kill_estimator; } dev_queue = netdev_get_tx_queue(dev, offload_opt.qid); @@ -1937,8 +1937,8 @@ static int htb_change_class(struct Qdisc *sch, u32 classid, }; err = htb_offload(dev, &offload_opt); if (err) { - pr_err("htb: TC_HTB_LEAF_TO_INNER failed with err = %d\n", - err); + NL_SET_ERR_MSG_WEAK(extack, + "Failed to offload TC_HTB_LEAF_TO_INNER"); htb_graft_helper(dev_queue, old_q); goto err_kill_estimator; } @@ -2067,8 +2067,9 @@ static int htb_change_class(struct Qdisc *sch, u32 classid, qdisc_put(parent_qdisc); if (warn) - pr_warn("HTB: quantum of class %X is %s. Consider r2q change.\n", - cl->common.classid, (warn == -1 ? "small" : "big")); + NL_SET_ERR_MSG_FMT_MOD(extack, + "quantum of class %X is %s. Consider r2q change.", + cl->common.classid, (warn == -1 ? "small" : "big")); qdisc_class_hash_grow(sch, &q->clhash); diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c index 48ed87b91086..dc5a0ff50b14 100644 --- a/net/sched/sch_mqprio.c +++ b/net/sched/sch_mqprio.c @@ -5,6 +5,7 @@ * Copyright (c) 2010 John Fastabend <john.r.fastabend@intel.com> */ +#include <linux/ethtool_netlink.h> #include <linux/types.h> #include <linux/slab.h> #include <linux/kernel.h> @@ -27,15 +28,19 @@ struct mqprio_sched { u32 flags; u64 min_rate[TC_QOPT_MAX_QUEUE]; u64 max_rate[TC_QOPT_MAX_QUEUE]; + u32 fp[TC_QOPT_MAX_QUEUE]; }; static int mqprio_enable_offload(struct Qdisc *sch, const struct tc_mqprio_qopt *qopt, struct netlink_ext_ack *extack) { - struct tc_mqprio_qopt_offload mqprio = {.qopt = *qopt}; struct mqprio_sched *priv = qdisc_priv(sch); struct net_device *dev = qdisc_dev(sch); + struct tc_mqprio_qopt_offload mqprio = { + .qopt = *qopt, + .extack = extack, + }; int err, i; switch (priv->mode) { @@ -60,6 +65,8 @@ static int mqprio_enable_offload(struct Qdisc *sch, return -EINVAL; } + mqprio_fp_to_offload(priv->fp, &mqprio); + err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_QDISC_MQPRIO, &mqprio); if (err) @@ -133,91 +140,193 @@ static int mqprio_parse_opt(struct net_device *dev, struct tc_mqprio_qopt *qopt, /* If ndo_setup_tc is not present then hardware doesn't support offload * and we should return an error. */ - if (qopt->hw && !dev->netdev_ops->ndo_setup_tc) + if (qopt->hw && !dev->netdev_ops->ndo_setup_tc) { + NL_SET_ERR_MSG(extack, + "Device does not support hardware offload"); return -EINVAL; + } return 0; } +static const struct +nla_policy mqprio_tc_entry_policy[TCA_MQPRIO_TC_ENTRY_MAX + 1] = { + [TCA_MQPRIO_TC_ENTRY_INDEX] = NLA_POLICY_MAX(NLA_U32, + TC_QOPT_MAX_QUEUE), + [TCA_MQPRIO_TC_ENTRY_FP] = NLA_POLICY_RANGE(NLA_U32, + TC_FP_EXPRESS, + TC_FP_PREEMPTIBLE), +}; + static const struct nla_policy mqprio_policy[TCA_MQPRIO_MAX + 1] = { [TCA_MQPRIO_MODE] = { .len = sizeof(u16) }, [TCA_MQPRIO_SHAPER] = { .len = sizeof(u16) }, [TCA_MQPRIO_MIN_RATE64] = { .type = NLA_NESTED }, [TCA_MQPRIO_MAX_RATE64] = { .type = NLA_NESTED }, + [TCA_MQPRIO_TC_ENTRY] = { .type = NLA_NESTED }, }; -static int parse_attr(struct nlattr *tb[], int maxtype, struct nlattr *nla, - const struct nla_policy *policy, int len) +static int mqprio_parse_tc_entry(u32 fp[TC_QOPT_MAX_QUEUE], + struct nlattr *opt, + unsigned long *seen_tcs, + struct netlink_ext_ack *extack) { - int nested_len = nla_len(nla) - NLA_ALIGN(len); + struct nlattr *tb[TCA_MQPRIO_TC_ENTRY_MAX + 1]; + int err, tc; + + err = nla_parse_nested(tb, TCA_MQPRIO_TC_ENTRY_MAX, opt, + mqprio_tc_entry_policy, extack); + if (err < 0) + return err; + + if (NL_REQ_ATTR_CHECK(extack, opt, tb, TCA_MQPRIO_TC_ENTRY_INDEX)) { + NL_SET_ERR_MSG(extack, "TC entry index missing"); + return -EINVAL; + } + + tc = nla_get_u32(tb[TCA_MQPRIO_TC_ENTRY_INDEX]); + if (*seen_tcs & BIT(tc)) { + NL_SET_ERR_MSG_ATTR(extack, tb[TCA_MQPRIO_TC_ENTRY_INDEX], + "Duplicate tc entry"); + return -EINVAL; + } + + *seen_tcs |= BIT(tc); - if (nested_len >= nla_attr_size(0)) - return nla_parse_deprecated(tb, maxtype, - nla_data(nla) + NLA_ALIGN(len), - nested_len, policy, NULL); + if (tb[TCA_MQPRIO_TC_ENTRY_FP]) + fp[tc] = nla_get_u32(tb[TCA_MQPRIO_TC_ENTRY_FP]); - memset(tb, 0, sizeof(struct nlattr *) * (maxtype + 1)); return 0; } +static int mqprio_parse_tc_entries(struct Qdisc *sch, struct nlattr *nlattr_opt, + int nlattr_opt_len, + struct netlink_ext_ack *extack) +{ + struct mqprio_sched *priv = qdisc_priv(sch); + struct net_device *dev = qdisc_dev(sch); + bool have_preemption = false; + unsigned long seen_tcs = 0; + u32 fp[TC_QOPT_MAX_QUEUE]; + struct nlattr *n; + int tc, rem; + int err = 0; + + for (tc = 0; tc < TC_QOPT_MAX_QUEUE; tc++) + fp[tc] = priv->fp[tc]; + + nla_for_each_attr(n, nlattr_opt, nlattr_opt_len, rem) { + if (nla_type(n) != TCA_MQPRIO_TC_ENTRY) + continue; + + err = mqprio_parse_tc_entry(fp, n, &seen_tcs, extack); + if (err) + goto out; + } + + for (tc = 0; tc < TC_QOPT_MAX_QUEUE; tc++) { + priv->fp[tc] = fp[tc]; + if (fp[tc] == TC_FP_PREEMPTIBLE) + have_preemption = true; + } + + if (have_preemption && !ethtool_dev_mm_supported(dev)) { + NL_SET_ERR_MSG(extack, "Device does not support preemption"); + return -EOPNOTSUPP; + } +out: + return err; +} + +/* Parse the other netlink attributes that represent the payload of + * TCA_OPTIONS, which are appended right after struct tc_mqprio_qopt. + */ static int mqprio_parse_nlattr(struct Qdisc *sch, struct tc_mqprio_qopt *qopt, - struct nlattr *opt) + struct nlattr *opt, + struct netlink_ext_ack *extack) { + struct nlattr *nlattr_opt = nla_data(opt) + NLA_ALIGN(sizeof(*qopt)); + int nlattr_opt_len = nla_len(opt) - NLA_ALIGN(sizeof(*qopt)); struct mqprio_sched *priv = qdisc_priv(sch); - struct nlattr *tb[TCA_MQPRIO_MAX + 1]; + struct nlattr *tb[TCA_MQPRIO_MAX + 1] = {}; struct nlattr *attr; int i, rem, err; - err = parse_attr(tb, TCA_MQPRIO_MAX, opt, mqprio_policy, - sizeof(*qopt)); - if (err < 0) - return err; + if (nlattr_opt_len >= nla_attr_size(0)) { + err = nla_parse_deprecated(tb, TCA_MQPRIO_MAX, nlattr_opt, + nlattr_opt_len, mqprio_policy, + NULL); + if (err < 0) + return err; + } - if (!qopt->hw) + if (!qopt->hw) { + NL_SET_ERR_MSG(extack, + "mqprio TCA_OPTIONS can only contain netlink attributes in hardware mode"); return -EINVAL; + } if (tb[TCA_MQPRIO_MODE]) { priv->flags |= TC_MQPRIO_F_MODE; - priv->mode = *(u16 *)nla_data(tb[TCA_MQPRIO_MODE]); + priv->mode = nla_get_u16(tb[TCA_MQPRIO_MODE]); } if (tb[TCA_MQPRIO_SHAPER]) { priv->flags |= TC_MQPRIO_F_SHAPER; - priv->shaper = *(u16 *)nla_data(tb[TCA_MQPRIO_SHAPER]); + priv->shaper = nla_get_u16(tb[TCA_MQPRIO_SHAPER]); } if (tb[TCA_MQPRIO_MIN_RATE64]) { - if (priv->shaper != TC_MQPRIO_SHAPER_BW_RATE) + if (priv->shaper != TC_MQPRIO_SHAPER_BW_RATE) { + NL_SET_ERR_MSG_ATTR(extack, tb[TCA_MQPRIO_MIN_RATE64], + "min_rate accepted only when shaper is in bw_rlimit mode"); return -EINVAL; + } i = 0; nla_for_each_nested(attr, tb[TCA_MQPRIO_MIN_RATE64], rem) { - if (nla_type(attr) != TCA_MQPRIO_MIN_RATE64) + if (nla_type(attr) != TCA_MQPRIO_MIN_RATE64) { + NL_SET_ERR_MSG_ATTR(extack, attr, + "Attribute type expected to be TCA_MQPRIO_MIN_RATE64"); return -EINVAL; + } if (i >= qopt->num_tc) break; - priv->min_rate[i] = *(u64 *)nla_data(attr); + priv->min_rate[i] = nla_get_u64(attr); i++; } priv->flags |= TC_MQPRIO_F_MIN_RATE; } if (tb[TCA_MQPRIO_MAX_RATE64]) { - if (priv->shaper != TC_MQPRIO_SHAPER_BW_RATE) + if (priv->shaper != TC_MQPRIO_SHAPER_BW_RATE) { + NL_SET_ERR_MSG_ATTR(extack, tb[TCA_MQPRIO_MAX_RATE64], + "max_rate accepted only when shaper is in bw_rlimit mode"); return -EINVAL; + } i = 0; nla_for_each_nested(attr, tb[TCA_MQPRIO_MAX_RATE64], rem) { - if (nla_type(attr) != TCA_MQPRIO_MAX_RATE64) + if (nla_type(attr) != TCA_MQPRIO_MAX_RATE64) { + NL_SET_ERR_MSG_ATTR(extack, attr, + "Attribute type expected to be TCA_MQPRIO_MAX_RATE64"); return -EINVAL; + } if (i >= qopt->num_tc) break; - priv->max_rate[i] = *(u64 *)nla_data(attr); + priv->max_rate[i] = nla_get_u64(attr); i++; } priv->flags |= TC_MQPRIO_F_MAX_RATE; } + if (tb[TCA_MQPRIO_TC_ENTRY]) { + err = mqprio_parse_tc_entries(sch, nlattr_opt, nlattr_opt_len, + extack); + if (err) + return err; + } + return 0; } @@ -231,7 +340,7 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt, int i, err = -EOPNOTSUPP; struct tc_mqprio_qopt *qopt = NULL; struct tc_mqprio_caps caps; - int len; + int len, tc; BUILD_BUG_ON(TC_MAX_QUEUE != TC_QOPT_MAX_QUEUE); BUILD_BUG_ON(TC_BITMASK != TC_QOPT_BITMASK); @@ -249,6 +358,9 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt, if (!opt || nla_len(opt) < sizeof(*qopt)) return -EINVAL; + for (tc = 0; tc < TC_QOPT_MAX_QUEUE; tc++) + priv->fp[tc] = TC_FP_EXPRESS; + qdisc_offload_query_caps(dev, TC_SETUP_QDISC_MQPRIO, &caps, sizeof(caps)); @@ -258,7 +370,7 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt, len = nla_len(opt) - NLA_ALIGN(sizeof(*qopt)); if (len > 0) { - err = mqprio_parse_nlattr(sch, qopt, opt); + err = mqprio_parse_nlattr(sch, qopt, opt, extack); if (err) return err; } @@ -399,6 +511,33 @@ nla_put_failure: return -1; } +static int mqprio_dump_tc_entries(struct mqprio_sched *priv, + struct sk_buff *skb) +{ + struct nlattr *n; + int tc; + + for (tc = 0; tc < TC_QOPT_MAX_QUEUE; tc++) { + n = nla_nest_start(skb, TCA_MQPRIO_TC_ENTRY); + if (!n) + return -EMSGSIZE; + + if (nla_put_u32(skb, TCA_MQPRIO_TC_ENTRY_INDEX, tc)) + goto nla_put_failure; + + if (nla_put_u32(skb, TCA_MQPRIO_TC_ENTRY_FP, priv->fp[tc])) + goto nla_put_failure; + + nla_nest_end(skb, n); + } + + return 0; + +nla_put_failure: + nla_nest_cancel(skb, n); + return -EMSGSIZE; +} + static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb) { struct net_device *dev = qdisc_dev(sch); @@ -449,6 +588,9 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb) (dump_rates(priv, &opt, skb) != 0)) goto nla_put_failure; + if (mqprio_dump_tc_entries(priv, skb)) + goto nla_put_failure; + return nla_nest_end(skb, nla); nla_put_failure: nlmsg_trim(skb, nla); diff --git a/net/sched/sch_mqprio_lib.c b/net/sched/sch_mqprio_lib.c index c58a533b8ec5..83b3793c4012 100644 --- a/net/sched/sch_mqprio_lib.c +++ b/net/sched/sch_mqprio_lib.c @@ -114,4 +114,18 @@ void mqprio_qopt_reconstruct(struct net_device *dev, struct tc_mqprio_qopt *qopt } EXPORT_SYMBOL_GPL(mqprio_qopt_reconstruct); +void mqprio_fp_to_offload(u32 fp[TC_QOPT_MAX_QUEUE], + struct tc_mqprio_qopt_offload *mqprio) +{ + unsigned long preemptible_tcs = 0; + int tc; + + for (tc = 0; tc < TC_QOPT_MAX_QUEUE; tc++) + if (fp[tc] == TC_FP_PREEMPTIBLE) + preemptible_tcs |= BIT(tc); + + mqprio->preemptible_tcs = preemptible_tcs; +} +EXPORT_SYMBOL_GPL(mqprio_fp_to_offload); + MODULE_LICENSE("GPL"); diff --git a/net/sched/sch_mqprio_lib.h b/net/sched/sch_mqprio_lib.h index 63f725ab8761..079f597072e3 100644 --- a/net/sched/sch_mqprio_lib.h +++ b/net/sched/sch_mqprio_lib.h @@ -14,5 +14,7 @@ int mqprio_validate_qopt(struct net_device *dev, struct tc_mqprio_qopt *qopt, struct netlink_ext_ack *extack); void mqprio_qopt_reconstruct(struct net_device *dev, struct tc_mqprio_qopt *qopt); +void mqprio_fp_to_offload(u32 fp[TC_QOPT_MAX_QUEUE], + struct tc_mqprio_qopt_offload *mqprio); #endif diff --git a/net/sched/sch_pie.c b/net/sched/sch_pie.c index 265c238047a4..2152a56d73f8 100644 --- a/net/sched/sch_pie.c +++ b/net/sched/sch_pie.c @@ -319,7 +319,7 @@ void pie_calculate_probability(struct pie_params *params, struct pie_vars *vars, } /* If qdelay is zero and backlog is not, it means backlog is very small, - * so we do not update probabilty in this round. + * so we do not update probability in this round. */ if (qdelay == 0 && backlog != 0) update_prob = false; diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c index 02098a02943e..dfd9a99e6257 100644 --- a/net/sched/sch_qfq.c +++ b/net/sched/sch_qfq.c @@ -113,6 +113,7 @@ #define QFQ_MTU_SHIFT 16 /* to support TSO/GSO */ #define QFQ_MIN_LMAX 512 /* see qfq_slot_insert */ +#define QFQ_MAX_LMAX (1UL << QFQ_MTU_SHIFT) #define QFQ_MAX_AGG_CLASSES 8 /* max num classes per aggregate allowed */ @@ -214,9 +215,14 @@ static struct qfq_class *qfq_find_class(struct Qdisc *sch, u32 classid) return container_of(clc, struct qfq_class, common); } +static struct netlink_range_validation lmax_range = { + .min = QFQ_MIN_LMAX, + .max = QFQ_MAX_LMAX, +}; + static const struct nla_policy qfq_policy[TCA_QFQ_MAX + 1] = { - [TCA_QFQ_WEIGHT] = { .type = NLA_U32 }, - [TCA_QFQ_LMAX] = { .type = NLA_U32 }, + [TCA_QFQ_WEIGHT] = NLA_POLICY_RANGE(NLA_U32, 1, QFQ_MAX_WEIGHT), + [TCA_QFQ_LMAX] = NLA_POLICY_FULL_RANGE(NLA_U32, &lmax_range), }; /* @@ -402,23 +408,19 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid, int err; int delta_w; - if (tca[TCA_OPTIONS] == NULL) { - pr_notice("qfq: no options\n"); + if (NL_REQ_ATTR_CHECK(extack, NULL, tca, TCA_OPTIONS)) { + NL_SET_ERR_MSG_MOD(extack, "missing options"); return -EINVAL; } err = nla_parse_nested_deprecated(tb, TCA_QFQ_MAX, tca[TCA_OPTIONS], - qfq_policy, NULL); + qfq_policy, extack); if (err < 0) return err; - if (tb[TCA_QFQ_WEIGHT]) { + if (tb[TCA_QFQ_WEIGHT]) weight = nla_get_u32(tb[TCA_QFQ_WEIGHT]); - if (!weight || weight > (1UL << QFQ_MAX_WSHIFT)) { - pr_notice("qfq: invalid weight %u\n", weight); - return -EINVAL; - } - } else + else weight = 1; if (tb[TCA_QFQ_LMAX]) @@ -426,11 +428,6 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid, else lmax = psched_mtu(qdisc_dev(sch)); - if (lmax < QFQ_MIN_LMAX || lmax > (1UL << QFQ_MTU_SHIFT)) { - pr_notice("qfq: invalid max length %u\n", lmax); - return -EINVAL; - } - inv_w = ONE_FP / weight; weight = ONE_FP / inv_w; @@ -442,8 +439,9 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid, delta_w = weight - (cl ? cl->agg->class_weight : 0); if (q->wsum + delta_w > QFQ_MAX_WSUM) { - pr_notice("qfq: total weight out of range (%d + %u)\n", - delta_w, q->wsum); + NL_SET_ERR_MSG_FMT_MOD(extack, + "total weight out of range (%d + %u)\n", + delta_w, q->wsum); return -EINVAL; } diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c index 1f469861eae3..76db9a10ef50 100644 --- a/net/sched/sch_taprio.c +++ b/net/sched/sch_taprio.c @@ -7,6 +7,7 @@ */ #include <linux/ethtool.h> +#include <linux/ethtool_netlink.h> #include <linux/types.h> #include <linux/slab.h> #include <linux/kernel.h> @@ -96,6 +97,7 @@ struct taprio_sched { struct list_head taprio_list; int cur_txq[TC_MAX_QUEUE]; u32 max_sdu[TC_MAX_QUEUE]; /* save info from the user */ + u32 fp[TC_QOPT_MAX_QUEUE]; /* only for dump and offloading */ u32 txtime_delay; }; @@ -1002,6 +1004,9 @@ static const struct nla_policy entry_policy[TCA_TAPRIO_SCHED_ENTRY_MAX + 1] = { static const struct nla_policy taprio_tc_policy[TCA_TAPRIO_TC_ENTRY_MAX + 1] = { [TCA_TAPRIO_TC_ENTRY_INDEX] = { .type = NLA_U32 }, [TCA_TAPRIO_TC_ENTRY_MAX_SDU] = { .type = NLA_U32 }, + [TCA_TAPRIO_TC_ENTRY_FP] = NLA_POLICY_RANGE(NLA_U32, + TC_FP_EXPRESS, + TC_FP_PREEMPTIBLE), }; static const struct nla_policy taprio_policy[TCA_TAPRIO_ATTR_MAX + 1] = { @@ -1520,22 +1525,31 @@ static int taprio_enable_offload(struct net_device *dev, return -ENOMEM; } offload->enable = 1; + offload->extack = extack; mqprio_qopt_reconstruct(dev, &offload->mqprio.qopt); + offload->mqprio.extack = extack; taprio_sched_to_offload(dev, sched, offload, &caps); + mqprio_fp_to_offload(q->fp, &offload->mqprio); for (tc = 0; tc < TC_MAX_QUEUE; tc++) offload->max_sdu[tc] = q->max_sdu[tc]; err = ops->ndo_setup_tc(dev, TC_SETUP_QDISC_TAPRIO, offload); if (err < 0) { - NL_SET_ERR_MSG(extack, - "Device failed to setup taprio offload"); + NL_SET_ERR_MSG_WEAK(extack, + "Device failed to setup taprio offload"); goto done; } q->offloaded = true; done: + /* The offload structure may linger around via a reference taken by the + * device driver, so clear up the netlink extack pointer so that the + * driver isn't tempted to dereference data which stopped being valid + */ + offload->extack = NULL; + offload->mqprio.extack = NULL; taprio_offload_free(offload); return err; @@ -1663,13 +1677,14 @@ out: static int taprio_parse_tc_entry(struct Qdisc *sch, struct nlattr *opt, u32 max_sdu[TC_QOPT_MAX_QUEUE], + u32 fp[TC_QOPT_MAX_QUEUE], unsigned long *seen_tcs, struct netlink_ext_ack *extack) { struct nlattr *tb[TCA_TAPRIO_TC_ENTRY_MAX + 1] = { }; struct net_device *dev = qdisc_dev(sch); - u32 val = 0; int err, tc; + u32 val; err = nla_parse_nested(tb, TCA_TAPRIO_TC_ENTRY_MAX, opt, taprio_tc_policy, extack); @@ -1694,15 +1709,18 @@ static int taprio_parse_tc_entry(struct Qdisc *sch, *seen_tcs |= BIT(tc); - if (tb[TCA_TAPRIO_TC_ENTRY_MAX_SDU]) + if (tb[TCA_TAPRIO_TC_ENTRY_MAX_SDU]) { val = nla_get_u32(tb[TCA_TAPRIO_TC_ENTRY_MAX_SDU]); + if (val > dev->max_mtu) { + NL_SET_ERR_MSG_MOD(extack, "TC max SDU exceeds device max MTU"); + return -ERANGE; + } - if (val > dev->max_mtu) { - NL_SET_ERR_MSG_MOD(extack, "TC max SDU exceeds device max MTU"); - return -ERANGE; + max_sdu[tc] = val; } - max_sdu[tc] = val; + if (tb[TCA_TAPRIO_TC_ENTRY_FP]) + fp[tc] = nla_get_u32(tb[TCA_TAPRIO_TC_ENTRY_FP]); return 0; } @@ -1712,29 +1730,51 @@ static int taprio_parse_tc_entries(struct Qdisc *sch, struct netlink_ext_ack *extack) { struct taprio_sched *q = qdisc_priv(sch); + struct net_device *dev = qdisc_dev(sch); u32 max_sdu[TC_QOPT_MAX_QUEUE]; + bool have_preemption = false; unsigned long seen_tcs = 0; + u32 fp[TC_QOPT_MAX_QUEUE]; struct nlattr *n; int tc, rem; int err = 0; - for (tc = 0; tc < TC_QOPT_MAX_QUEUE; tc++) + for (tc = 0; tc < TC_QOPT_MAX_QUEUE; tc++) { max_sdu[tc] = q->max_sdu[tc]; + fp[tc] = q->fp[tc]; + } nla_for_each_nested(n, opt, rem) { if (nla_type(n) != TCA_TAPRIO_ATTR_TC_ENTRY) continue; - err = taprio_parse_tc_entry(sch, n, max_sdu, &seen_tcs, + err = taprio_parse_tc_entry(sch, n, max_sdu, fp, &seen_tcs, extack); if (err) - goto out; + return err; } - for (tc = 0; tc < TC_QOPT_MAX_QUEUE; tc++) + for (tc = 0; tc < TC_QOPT_MAX_QUEUE; tc++) { q->max_sdu[tc] = max_sdu[tc]; + q->fp[tc] = fp[tc]; + if (fp[tc] != TC_FP_EXPRESS) + have_preemption = true; + } + + if (have_preemption) { + if (!FULL_OFFLOAD_IS_ENABLED(q->flags)) { + NL_SET_ERR_MSG(extack, + "Preemption only supported with full offload"); + return -EOPNOTSUPP; + } + + if (!ethtool_dev_mm_supported(dev)) { + NL_SET_ERR_MSG(extack, + "Device does not support preemption"); + return -EOPNOTSUPP; + } + } -out: return err; } @@ -2015,7 +2055,7 @@ static int taprio_init(struct Qdisc *sch, struct nlattr *opt, { struct taprio_sched *q = qdisc_priv(sch); struct net_device *dev = qdisc_dev(sch); - int i; + int i, tc; spin_lock_init(&q->current_entry_lock); @@ -2072,6 +2112,9 @@ static int taprio_init(struct Qdisc *sch, struct nlattr *opt, q->qdiscs[i] = qdisc; } + for (tc = 0; tc < TC_QOPT_MAX_QUEUE; tc++) + q->fp[tc] = TC_FP_EXPRESS; + taprio_detect_broken_mqprio(q); return taprio_change(sch, opt, extack); @@ -2215,6 +2258,7 @@ error_nest: } static int taprio_dump_tc_entries(struct sk_buff *skb, + struct taprio_sched *q, struct sched_gate_list *sched) { struct nlattr *n; @@ -2232,6 +2276,9 @@ static int taprio_dump_tc_entries(struct sk_buff *skb, sched->max_sdu[tc])) goto nla_put_failure; + if (nla_put_u32(skb, TCA_TAPRIO_TC_ENTRY_FP, q->fp[tc])) + goto nla_put_failure; + nla_nest_end(skb, n); } @@ -2273,7 +2320,7 @@ static int taprio_dump(struct Qdisc *sch, struct sk_buff *skb) nla_put_u32(skb, TCA_TAPRIO_ATTR_TXTIME_DELAY, q->txtime_delay)) goto options_error; - if (oper && taprio_dump_tc_entries(skb, oper)) + if (oper && taprio_dump_tc_entries(skb, q, oper)) goto options_error; if (oper && dump_schedule(skb, oper)) diff --git a/net/sctp/Makefile b/net/sctp/Makefile index e845e4588535..0448398408d8 100644 --- a/net/sctp/Makefile +++ b/net/sctp/Makefile @@ -13,7 +13,8 @@ sctp-y := sm_statetable.o sm_statefuns.o sm_sideeffect.o \ tsnmap.o bind_addr.o socket.o primitive.o \ output.o input.o debug.o stream.o auth.o \ offload.o stream_sched.o stream_sched_prio.o \ - stream_sched_rr.o stream_interleave.o + stream_sched_rr.o stream_sched_fc.o \ + stream_interleave.o sctp_diag-y := diag.o diff --git a/net/sctp/associola.c b/net/sctp/associola.c index 63ba5551c13f..796529167e8d 100644 --- a/net/sctp/associola.c +++ b/net/sctp/associola.c @@ -1597,9 +1597,10 @@ int sctp_assoc_set_bind_addr_from_cookie(struct sctp_association *asoc, struct sctp_cookie *cookie, gfp_t gfp) { - int var_size2 = ntohs(cookie->peer_init->chunk_hdr.length); + struct sctp_init_chunk *peer_init = (struct sctp_init_chunk *)(cookie + 1); + int var_size2 = ntohs(peer_init->chunk_hdr.length); int var_size3 = cookie->raw_addr_list_len; - __u8 *raw = (__u8 *)cookie->peer_init + var_size2; + __u8 *raw = (__u8 *)peer_init + var_size2; return sctp_raw_to_bind_addrs(&asoc->base.bind_addr, raw, var_size3, asoc->ep->base.bind_addr.port, gfp); diff --git a/net/sctp/auth.c b/net/sctp/auth.c index 34964145514e..c58fffc86a0c 100644 --- a/net/sctp/auth.c +++ b/net/sctp/auth.c @@ -738,7 +738,7 @@ void sctp_auth_calculate_hmac(const struct sctp_association *asoc, tfm = asoc->ep->auth_hmacs[hmac_id]; - digest = auth->auth_hdr.hmac; + digest = (u8 *)(&auth->auth_hdr + 1); if (crypto_shash_setkey(tfm, &asoc_key->data[0], asoc_key->len)) goto free; diff --git a/net/sctp/input.c b/net/sctp/input.c index bf70371301ff..2613c4d74b16 100644 --- a/net/sctp/input.c +++ b/net/sctp/input.c @@ -585,7 +585,7 @@ static void sctp_v4_err_handle(struct sctp_transport *t, struct sk_buff *skb, sk->sk_err = err; sk_error_report(sk); } else { /* Only an error on timeout */ - sk->sk_err_soft = err; + WRITE_ONCE(sk->sk_err_soft, err); } } @@ -1150,7 +1150,7 @@ static struct sctp_association *__sctp_rcv_init_lookup(struct net *net, init = (struct sctp_init_chunk *)skb->data; /* Walk the parameters looking for embedded addresses. */ - sctp_walk_params(params, init, init_hdr.params) { + sctp_walk_params(params, init) { /* Note: Ignoring hostname addresses. */ af = sctp_get_af_specific(param_type2af(params.p->type)); diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c index 62b436a2c8fe..43f2731bf590 100644 --- a/net/sctp/ipv6.c +++ b/net/sctp/ipv6.c @@ -155,7 +155,7 @@ static void sctp_v6_err_handle(struct sctp_transport *t, struct sk_buff *skb, sk->sk_err = err; sk_error_report(sk); } else { - sk->sk_err_soft = err; + WRITE_ONCE(sk->sk_err_soft, err); } } diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c index 20831079fb09..0dc6b8ab9963 100644 --- a/net/sctp/outqueue.c +++ b/net/sctp/outqueue.c @@ -1231,7 +1231,7 @@ static void sctp_sack_update_unack_data(struct sctp_association *assoc, unack_data = assoc->next_tsn - assoc->ctsn_ack_point - 1; - frags = sack->variable; + frags = (union sctp_sack_variable *)(sack + 1); for (i = 0; i < ntohs(sack->num_gap_ack_blocks); i++) { unack_data -= ((ntohs(frags[i].gab.end) - ntohs(frags[i].gab.start) + 1)); @@ -1252,7 +1252,6 @@ int sctp_outq_sack(struct sctp_outq *q, struct sctp_chunk *chunk) struct sctp_transport *transport; struct sctp_chunk *tchunk = NULL; struct list_head *lchunk, *transport_list, *temp; - union sctp_sack_variable *frags = sack->variable; __u32 sack_ctsn, ctsn, tsn; __u32 highest_tsn, highest_new_tsn; __u32 sack_a_rwnd; @@ -1313,8 +1312,12 @@ int sctp_outq_sack(struct sctp_outq *q, struct sctp_chunk *chunk) /* Get the highest TSN in the sack. */ highest_tsn = sack_ctsn; - if (gap_ack_blocks) + if (gap_ack_blocks) { + union sctp_sack_variable *frags = + (union sctp_sack_variable *)(sack + 1); + highest_tsn += ntohs(frags[gap_ack_blocks - 1].gab.end); + } if (TSN_lt(asoc->highest_sacked, highest_tsn)) asoc->highest_sacked = highest_tsn; @@ -1789,7 +1792,7 @@ static int sctp_acked(struct sctp_sackhdr *sack, __u32 tsn) * Block are assumed to have been received correctly. */ - frags = sack->variable; + frags = (union sctp_sack_variable *)(sack + 1); blocks = ntohs(sack->num_gap_ack_blocks); tsn_offset = tsn - ctsn; for (i = 0; i < blocks; ++i) { diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c index c7503fd64915..08527d882e56 100644 --- a/net/sctp/sm_make_chunk.c +++ b/net/sctp/sm_make_chunk.c @@ -1707,11 +1707,11 @@ static struct sctp_cookie_param *sctp_pack_cookie( ktime_get_real()); /* Copy the peer's init packet. */ - memcpy(&cookie->c.peer_init[0], init_chunk->chunk_hdr, + memcpy(cookie + 1, init_chunk->chunk_hdr, ntohs(init_chunk->chunk_hdr->length)); /* Copy the raw local address list of the association. */ - memcpy((__u8 *)&cookie->c.peer_init[0] + + memcpy((__u8 *)(cookie + 1) + ntohs(init_chunk->chunk_hdr->length), raw_addrs, addrs_len); if (sctp_sk(ep->base.sk)->hmac) { @@ -2207,7 +2207,7 @@ static enum sctp_ierror sctp_verify_param(struct net *net, break; case SCTP_PARAM_HOST_NAME_ADDRESS: - /* Tell the peer, we won't support this param. */ + /* This param has been Deprecated, send ABORT. */ sctp_process_hn_param(asoc, param, chunk, err_chunk); retval = SCTP_IERROR_ABORT; break; @@ -2306,7 +2306,7 @@ int sctp_verify_init(struct net *net, const struct sctp_endpoint *ep, ntohl(peer_init->init_hdr.a_rwnd) < SCTP_DEFAULT_MINWINDOW) return sctp_process_inv_mandatory(asoc, chunk, errp); - sctp_walk_params(param, peer_init, init_hdr.params) { + sctp_walk_params(param, peer_init) { if (param.p->type == SCTP_PARAM_STATE_COOKIE) has_cookie = true; } @@ -2329,7 +2329,7 @@ int sctp_verify_init(struct net *net, const struct sctp_endpoint *ep, chunk, errp); /* Verify all the variable length parameters */ - sctp_walk_params(param, peer_init, init_hdr.params) { + sctp_walk_params(param, peer_init) { result = sctp_verify_param(net, ep, asoc, param, cid, chunk, errp); switch (result) { @@ -2381,7 +2381,7 @@ int sctp_process_init(struct sctp_association *asoc, struct sctp_chunk *chunk, src_match = 1; /* Process the initialization parameters. */ - sctp_walk_params(param, peer_init, init_hdr.params) { + sctp_walk_params(param, peer_init) { if (!src_match && (param.p->type == SCTP_PARAM_IPV4_ADDRESS || param.p->type == SCTP_PARAM_IPV6_ADDRESS)) { @@ -2589,10 +2589,6 @@ do_addr_param: asoc->cookie_life = ktime_add_ms(asoc->cookie_life, stale); break; - case SCTP_PARAM_HOST_NAME_ADDRESS: - pr_debug("%s: unimplemented SCTP_HOST_NAME_ADDRESS\n", __func__); - break; - case SCTP_PARAM_SUPPORTED_ADDRESS_TYPES: /* Turn off the default values first so we'll know which * ones are really set by the peer. @@ -2624,10 +2620,6 @@ do_addr_param: asoc->peer.ipv6_address = 1; break; - case SCTP_PARAM_HOST_NAME_ADDRESS: - asoc->peer.hostname_address = 1; - break; - default: /* Just ignore anything else. */ break; } @@ -3210,7 +3202,7 @@ bool sctp_verify_asconf(const struct sctp_association *asoc, union sctp_params param; addip = (struct sctp_addip_chunk *)chunk->chunk_hdr; - sctp_walk_params(param, addip, addip_hdr.params) { + sctp_walk_params(param, addip) { size_t length = ntohs(param.p->length); *errp = param.p; @@ -3223,14 +3215,14 @@ bool sctp_verify_asconf(const struct sctp_association *asoc, /* ensure there is only one addr param and it's in the * beginning of addip_hdr params, or we reject it. */ - if (param.v != addip->addip_hdr.params) + if (param.v != (addip + 1)) return false; addr_param_seen = true; break; case SCTP_PARAM_IPV6_ADDRESS: if (length != sizeof(struct sctp_ipv6addr_param)) return false; - if (param.v != addip->addip_hdr.params) + if (param.v != (addip + 1)) return false; addr_param_seen = true; break; @@ -3310,7 +3302,7 @@ struct sctp_chunk *sctp_process_asconf(struct sctp_association *asoc, goto done; /* Process the TLVs contained within the ASCONF chunk. */ - sctp_walk_params(param, addip, addip_hdr.params) { + sctp_walk_params(param, addip) { /* Skip preceeding address parameters. */ if (param.p->type == SCTP_PARAM_IPV4_ADDRESS || param.p->type == SCTP_PARAM_IPV6_ADDRESS) @@ -3644,7 +3636,7 @@ static struct sctp_chunk *sctp_make_reconf(const struct sctp_association *asoc, return NULL; reconf = (struct sctp_reconf_chunk *)retval->chunk_hdr; - retval->param_hdr.v = reconf->params; + retval->param_hdr.v = (u8 *)(reconf + 1); return retval; } @@ -3886,7 +3878,7 @@ bool sctp_verify_reconf(const struct sctp_association *asoc, __u16 cnt = 0; hdr = (struct sctp_reconf_chunk *)chunk->chunk_hdr; - sctp_walk_params(param, hdr, params) { + sctp_walk_params(param, hdr) { __u16 length = ntohs(param.p->length); *errp = param.p; diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c index 463c4a58d2c3..7fbeb99d8d32 100644 --- a/net/sctp/sm_sideeffect.c +++ b/net/sctp/sm_sideeffect.c @@ -984,8 +984,7 @@ static void sctp_cmd_process_operr(struct sctp_cmd_seq *cmds, { struct sctp_chunkhdr *unk_chunk_hdr; - unk_chunk_hdr = (struct sctp_chunkhdr *) - err_hdr->variable; + unk_chunk_hdr = (struct sctp_chunkhdr *)(err_hdr + 1); switch (unk_chunk_hdr->type) { /* ADDIP 4.1 A9) If the peer responds to an ASCONF with * an ERROR chunk reporting that it did not recognized diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c index ce5426171206..97f1155a2045 100644 --- a/net/sctp/sm_statefuns.c +++ b/net/sctp/sm_statefuns.c @@ -794,8 +794,7 @@ enum sctp_disposition sctp_sf_do_5_1D_ce(struct net *net, /* This is a brand-new association, so these are not yet side * effects--it is safe to run them here. */ - peer_init = &chunk->subh.cookie_hdr->c.peer_init[0]; - + peer_init = (struct sctp_init_chunk *)(chunk->subh.cookie_hdr + 1); if (!sctp_process_init(new_asoc, chunk, &chunk->subh.cookie_hdr->c.peer_addr, peer_init, GFP_ATOMIC)) @@ -1337,7 +1336,7 @@ static int sctp_sf_send_restart_abort(struct net *net, union sctp_addr *ssa, * throughout the code today. */ errhdr = (struct sctp_errhdr *)buffer; - addrparm = (union sctp_addr_param *)errhdr->variable; + addrparm = (union sctp_addr_param *)(errhdr + 1); /* Copy into a parm format. */ len = af->to_addr_param(ssa, addrparm); @@ -1869,8 +1868,7 @@ static enum sctp_disposition sctp_sf_do_dupcook_a( /* new_asoc is a brand-new association, so these are not yet * side effects--it is safe to run them here. */ - peer_init = &chunk->subh.cookie_hdr->c.peer_init[0]; - + peer_init = (struct sctp_init_chunk *)(chunk->subh.cookie_hdr + 1); if (!sctp_process_init(new_asoc, chunk, sctp_source(chunk), peer_init, GFP_ATOMIC)) goto nomem; @@ -1990,7 +1988,7 @@ static enum sctp_disposition sctp_sf_do_dupcook_b( /* new_asoc is a brand-new association, so these are not yet * side effects--it is safe to run them here. */ - peer_init = &chunk->subh.cookie_hdr->c.peer_init[0]; + peer_init = (struct sctp_init_chunk *)(chunk->subh.cookie_hdr + 1); if (!sctp_process_init(new_asoc, chunk, sctp_source(chunk), peer_init, GFP_ATOMIC)) goto nomem; @@ -4142,7 +4140,7 @@ enum sctp_disposition sctp_sf_do_reconf(struct net *net, (void *)err_param, commands); hdr = (struct sctp_reconf_chunk *)chunk->chunk_hdr; - sctp_walk_params(param, hdr, params) { + sctp_walk_params(param, hdr) { struct sctp_chunk *reply = NULL; struct sctp_ulpevent *ev = NULL; @@ -4393,7 +4391,7 @@ static enum sctp_ierror sctp_sf_authenticate( * 3. Compute the new digest * 4. Compare saved and new digests. */ - digest = auth_hdr->hmac; + digest = (u8 *)(auth_hdr + 1); skb_pull(chunk->skb, sig_len); save_digest = kmemdup(digest, sig_len, GFP_ATOMIC); diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 218e0982c370..cda8c2874691 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -5192,10 +5192,11 @@ int sctp_get_sctp_info(struct sock *sk, struct sctp_association *asoc, info->sctpi_peer_rwnd = asoc->peer.rwnd; info->sctpi_peer_tag = asoc->c.peer_vtag; - mask = asoc->peer.ecn_capable << 1; + mask = asoc->peer.intl_capable << 1; + mask = (mask | asoc->peer.ecn_capable) << 1; mask = (mask | asoc->peer.ipv4_address) << 1; mask = (mask | asoc->peer.ipv6_address) << 1; - mask = (mask | asoc->peer.hostname_address) << 1; + mask = (mask | asoc->peer.reconf_capable) << 1; mask = (mask | asoc->peer.asconf_capable) << 1; mask = (mask | asoc->peer.prsctp_capable) << 1; mask = (mask | asoc->peer.auth_capable); diff --git a/net/sctp/stream.c b/net/sctp/stream.c index ee6514af830f..c241cc552e8d 100644 --- a/net/sctp/stream.c +++ b/net/sctp/stream.c @@ -491,7 +491,7 @@ static struct sctp_paramhdr *sctp_chunk_lookup_strreset_param( return NULL; hdr = (struct sctp_reconf_chunk *)chunk->chunk_hdr; - sctp_walk_params(param, hdr, params) { + sctp_walk_params(param, hdr) { /* sctp_strreset_tsnreq is actually the basic structure * of all stream reconf params, so it's safe to use it * to access request_seq. diff --git a/net/sctp/stream_interleave.c b/net/sctp/stream_interleave.c index b046b11200c9..840f24045ae2 100644 --- a/net/sctp/stream_interleave.c +++ b/net/sctp/stream_interleave.c @@ -1153,8 +1153,8 @@ static void sctp_generate_iftsn(struct sctp_outq *q, __u32 ctsn) } #define _sctp_walk_ifwdtsn(pos, chunk, end) \ - for (pos = chunk->subh.ifwdtsn_hdr->skip; \ - (void *)pos <= (void *)chunk->subh.ifwdtsn_hdr->skip + (end) - \ + for (pos = (void *)(chunk->subh.ifwdtsn_hdr + 1); \ + (void *)pos <= (void *)(chunk->subh.ifwdtsn_hdr + 1) + (end) - \ sizeof(struct sctp_ifwdtsn_skip); pos++) #define sctp_walk_ifwdtsn(pos, ch) \ diff --git a/net/sctp/stream_sched.c b/net/sctp/stream_sched.c index 330067002deb..e843760e9aaa 100644 --- a/net/sctp/stream_sched.c +++ b/net/sctp/stream_sched.c @@ -124,6 +124,8 @@ void sctp_sched_ops_init(void) sctp_sched_ops_fcfs_init(); sctp_sched_ops_prio_init(); sctp_sched_ops_rr_init(); + sctp_sched_ops_fc_init(); + sctp_sched_ops_wfq_init(); } static void sctp_sched_free_sched(struct sctp_stream *stream) diff --git a/net/sctp/stream_sched_fc.c b/net/sctp/stream_sched_fc.c new file mode 100644 index 000000000000..4bd18a497a6d --- /dev/null +++ b/net/sctp/stream_sched_fc.c @@ -0,0 +1,225 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* SCTP kernel implementation + * (C) Copyright Red Hat Inc. 2022 + * + * This file is part of the SCTP kernel implementation + * + * These functions manipulate sctp stream queue/scheduling. + * + * Please send any bug reports or fixes you make to the + * email addresched(es): + * lksctp developers <linux-sctp@vger.kernel.org> + * + * Written or modified by: + * Xin Long <lucien.xin@gmail.com> + */ + +#include <linux/list.h> +#include <net/sctp/sctp.h> +#include <net/sctp/sm.h> +#include <net/sctp/stream_sched.h> + +/* Fair Capacity and Weighted Fair Queueing handling + * RFC 8260 section 3.5 and 3.6 + */ +static void sctp_sched_fc_unsched_all(struct sctp_stream *stream); + +static int sctp_sched_wfq_set(struct sctp_stream *stream, __u16 sid, + __u16 weight, gfp_t gfp) +{ + struct sctp_stream_out_ext *soute = SCTP_SO(stream, sid)->ext; + + if (!weight) + return -EINVAL; + + soute->fc_weight = weight; + return 0; +} + +static int sctp_sched_wfq_get(struct sctp_stream *stream, __u16 sid, + __u16 *value) +{ + struct sctp_stream_out_ext *soute = SCTP_SO(stream, sid)->ext; + + *value = soute->fc_weight; + return 0; +} + +static int sctp_sched_fc_set(struct sctp_stream *stream, __u16 sid, + __u16 weight, gfp_t gfp) +{ + return 0; +} + +static int sctp_sched_fc_get(struct sctp_stream *stream, __u16 sid, + __u16 *value) +{ + return 0; +} + +static int sctp_sched_fc_init(struct sctp_stream *stream) +{ + INIT_LIST_HEAD(&stream->fc_list); + + return 0; +} + +static int sctp_sched_fc_init_sid(struct sctp_stream *stream, __u16 sid, + gfp_t gfp) +{ + struct sctp_stream_out_ext *soute = SCTP_SO(stream, sid)->ext; + + INIT_LIST_HEAD(&soute->fc_list); + soute->fc_length = 0; + soute->fc_weight = 1; + + return 0; +} + +static void sctp_sched_fc_free_sid(struct sctp_stream *stream, __u16 sid) +{ +} + +static void sctp_sched_fc_sched(struct sctp_stream *stream, + struct sctp_stream_out_ext *soute) +{ + struct sctp_stream_out_ext *pos; + + if (!list_empty(&soute->fc_list)) + return; + + list_for_each_entry(pos, &stream->fc_list, fc_list) + if ((__u64)pos->fc_length * soute->fc_weight >= + (__u64)soute->fc_length * pos->fc_weight) + break; + list_add_tail(&soute->fc_list, &pos->fc_list); +} + +static void sctp_sched_fc_enqueue(struct sctp_outq *q, + struct sctp_datamsg *msg) +{ + struct sctp_stream *stream; + struct sctp_chunk *ch; + __u16 sid; + + ch = list_first_entry(&msg->chunks, struct sctp_chunk, frag_list); + sid = sctp_chunk_stream_no(ch); + stream = &q->asoc->stream; + sctp_sched_fc_sched(stream, SCTP_SO(stream, sid)->ext); +} + +static struct sctp_chunk *sctp_sched_fc_dequeue(struct sctp_outq *q) +{ + struct sctp_stream *stream = &q->asoc->stream; + struct sctp_stream_out_ext *soute; + struct sctp_chunk *ch; + + /* Bail out quickly if queue is empty */ + if (list_empty(&q->out_chunk_list)) + return NULL; + + /* Find which chunk is next */ + if (stream->out_curr) + soute = stream->out_curr->ext; + else + soute = list_entry(stream->fc_list.next, struct sctp_stream_out_ext, fc_list); + ch = list_entry(soute->outq.next, struct sctp_chunk, stream_list); + + sctp_sched_dequeue_common(q, ch); + return ch; +} + +static void sctp_sched_fc_dequeue_done(struct sctp_outq *q, + struct sctp_chunk *ch) +{ + struct sctp_stream *stream = &q->asoc->stream; + struct sctp_stream_out_ext *soute, *pos; + __u16 sid, i; + + sid = sctp_chunk_stream_no(ch); + soute = SCTP_SO(stream, sid)->ext; + /* reduce all fc_lengths by U32_MAX / 4 if the current fc_length overflows. */ + if (soute->fc_length > U32_MAX - ch->skb->len) { + for (i = 0; i < stream->outcnt; i++) { + pos = SCTP_SO(stream, i)->ext; + if (!pos) + continue; + if (pos->fc_length <= (U32_MAX >> 2)) { + pos->fc_length = 0; + continue; + } + pos->fc_length -= (U32_MAX >> 2); + } + } + soute->fc_length += ch->skb->len; + + if (list_empty(&soute->outq)) { + list_del_init(&soute->fc_list); + return; + } + + pos = soute; + list_for_each_entry_continue(pos, &stream->fc_list, fc_list) + if ((__u64)pos->fc_length * soute->fc_weight >= + (__u64)soute->fc_length * pos->fc_weight) + break; + list_move_tail(&soute->fc_list, &pos->fc_list); +} + +static void sctp_sched_fc_sched_all(struct sctp_stream *stream) +{ + struct sctp_association *asoc; + struct sctp_chunk *ch; + + asoc = container_of(stream, struct sctp_association, stream); + list_for_each_entry(ch, &asoc->outqueue.out_chunk_list, list) { + __u16 sid = sctp_chunk_stream_no(ch); + + if (SCTP_SO(stream, sid)->ext) + sctp_sched_fc_sched(stream, SCTP_SO(stream, sid)->ext); + } +} + +static void sctp_sched_fc_unsched_all(struct sctp_stream *stream) +{ + struct sctp_stream_out_ext *soute, *tmp; + + list_for_each_entry_safe(soute, tmp, &stream->fc_list, fc_list) + list_del_init(&soute->fc_list); +} + +static struct sctp_sched_ops sctp_sched_fc = { + .set = sctp_sched_fc_set, + .get = sctp_sched_fc_get, + .init = sctp_sched_fc_init, + .init_sid = sctp_sched_fc_init_sid, + .free_sid = sctp_sched_fc_free_sid, + .enqueue = sctp_sched_fc_enqueue, + .dequeue = sctp_sched_fc_dequeue, + .dequeue_done = sctp_sched_fc_dequeue_done, + .sched_all = sctp_sched_fc_sched_all, + .unsched_all = sctp_sched_fc_unsched_all, +}; + +void sctp_sched_ops_fc_init(void) +{ + sctp_sched_ops_register(SCTP_SS_FC, &sctp_sched_fc); +} + +static struct sctp_sched_ops sctp_sched_wfq = { + .set = sctp_sched_wfq_set, + .get = sctp_sched_wfq_get, + .init = sctp_sched_fc_init, + .init_sid = sctp_sched_fc_init_sid, + .free_sid = sctp_sched_fc_free_sid, + .enqueue = sctp_sched_fc_enqueue, + .dequeue = sctp_sched_fc_dequeue, + .dequeue_done = sctp_sched_fc_dequeue_done, + .sched_all = sctp_sched_fc_sched_all, + .unsched_all = sctp_sched_fc_unsched_all, +}; + +void sctp_sched_ops_wfq_init(void) +{ + sctp_sched_ops_register(SCTP_SS_WFQ, &sctp_sched_wfq); +} diff --git a/net/smc/smc.h b/net/smc/smc.h index 5ed765ea0c73..2eeea4cdc718 100644 --- a/net/smc/smc.h +++ b/net/smc/smc.h @@ -283,10 +283,7 @@ struct smc_sock { /* smc sock container */ * */ }; -static inline struct smc_sock *smc_sk(const struct sock *sk) -{ - return (struct smc_sock *)sk; -} +#define smc_sk(ptr) container_of_const(ptr, struct smc_sock, sk) static inline void smc_init_saved_callbacks(struct smc_sock *smc) { diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h index 08b457c2d294..1645fba0d2d3 100644 --- a/net/smc/smc_core.h +++ b/net/smc/smc_core.h @@ -106,7 +106,10 @@ struct smc_link { unsigned long *wr_tx_mask; /* bit mask of used indexes */ u32 wr_tx_cnt; /* number of WR send buffers */ wait_queue_head_t wr_tx_wait; /* wait for free WR send buf */ - atomic_t wr_tx_refcnt; /* tx refs to link */ + struct { + struct percpu_ref wr_tx_refs; + } ____cacheline_aligned_in_smp; + struct completion tx_ref_comp; struct smc_wr_buf *wr_rx_bufs; /* WR recv payload buffers */ struct ib_recv_wr *wr_rx_ibs; /* WR recv meta data */ @@ -122,7 +125,10 @@ struct smc_link { struct ib_reg_wr wr_reg; /* WR register memory region */ wait_queue_head_t wr_reg_wait; /* wait for wr_reg result */ - atomic_t wr_reg_refcnt; /* reg refs to link */ + struct { + struct percpu_ref wr_reg_refs; + } ____cacheline_aligned_in_smp; + struct completion reg_ref_comp; enum smc_wr_reg_state wr_reg_state; /* state of wr_reg request */ u8 gid[SMC_GID_SIZE];/* gid matching used vlan id*/ diff --git a/net/smc/smc_ism.c b/net/smc/smc_ism.c index 3b0b7710c6b0..fbee2493091f 100644 --- a/net/smc/smc_ism.c +++ b/net/smc/smc_ism.c @@ -429,7 +429,7 @@ static void smcd_register_dev(struct ism_dev *ism) u8 *system_eid = NULL; system_eid = smcd->ops->get_system_eid(); - if (system_eid[24] != '0' || system_eid[28] != '0') { + if (smcd->ops->supports_v2()) { smc_ism_v2_capable = true; memcpy(smc_ism_v2_system_eid, system_eid, SMC_MAX_EID_LEN); diff --git a/net/smc/smc_wr.c b/net/smc/smc_wr.c index b0678a417e09..0021065a600a 100644 --- a/net/smc/smc_wr.c +++ b/net/smc/smc_wr.c @@ -377,12 +377,11 @@ int smc_wr_reg_send(struct smc_link *link, struct ib_mr *mr) if (rc) return rc; - atomic_inc(&link->wr_reg_refcnt); + percpu_ref_get(&link->wr_reg_refs); rc = wait_event_interruptible_timeout(link->wr_reg_wait, (link->wr_reg_state != POSTED), SMC_WR_REG_MR_WAIT_TIME); - if (atomic_dec_and_test(&link->wr_reg_refcnt)) - wake_up_all(&link->wr_reg_wait); + percpu_ref_put(&link->wr_reg_refs); if (!rc) { /* timeout - terminate link */ smcr_link_down_cond_sched(link); @@ -647,8 +646,10 @@ void smc_wr_free_link(struct smc_link *lnk) smc_wr_wakeup_tx_wait(lnk); smc_wr_tx_wait_no_pending_sends(lnk); - wait_event(lnk->wr_reg_wait, (!atomic_read(&lnk->wr_reg_refcnt))); - wait_event(lnk->wr_tx_wait, (!atomic_read(&lnk->wr_tx_refcnt))); + percpu_ref_kill(&lnk->wr_reg_refs); + wait_for_completion(&lnk->reg_ref_comp); + percpu_ref_kill(&lnk->wr_tx_refs); + wait_for_completion(&lnk->tx_ref_comp); if (lnk->wr_rx_dma_addr) { ib_dma_unmap_single(ibdev, lnk->wr_rx_dma_addr, @@ -847,6 +848,20 @@ void smc_wr_add_dev(struct smc_ib_device *smcibdev) tasklet_setup(&smcibdev->send_tasklet, smc_wr_tx_tasklet_fn); } +static void smcr_wr_tx_refs_free(struct percpu_ref *ref) +{ + struct smc_link *lnk = container_of(ref, struct smc_link, wr_tx_refs); + + complete(&lnk->tx_ref_comp); +} + +static void smcr_wr_reg_refs_free(struct percpu_ref *ref) +{ + struct smc_link *lnk = container_of(ref, struct smc_link, wr_reg_refs); + + complete(&lnk->reg_ref_comp); +} + int smc_wr_create_link(struct smc_link *lnk) { struct ib_device *ibdev = lnk->smcibdev->ibdev; @@ -890,9 +905,15 @@ int smc_wr_create_link(struct smc_link *lnk) smc_wr_init_sge(lnk); bitmap_zero(lnk->wr_tx_mask, SMC_WR_BUF_CNT); init_waitqueue_head(&lnk->wr_tx_wait); - atomic_set(&lnk->wr_tx_refcnt, 0); + rc = percpu_ref_init(&lnk->wr_tx_refs, smcr_wr_tx_refs_free, 0, GFP_KERNEL); + if (rc) + goto dma_unmap; + init_completion(&lnk->tx_ref_comp); init_waitqueue_head(&lnk->wr_reg_wait); - atomic_set(&lnk->wr_reg_refcnt, 0); + rc = percpu_ref_init(&lnk->wr_reg_refs, smcr_wr_reg_refs_free, 0, GFP_KERNEL); + if (rc) + goto dma_unmap; + init_completion(&lnk->reg_ref_comp); init_waitqueue_head(&lnk->wr_rx_empty_wait); return rc; diff --git a/net/smc/smc_wr.h b/net/smc/smc_wr.h index 45e9b894d3f8..f3008dda222a 100644 --- a/net/smc/smc_wr.h +++ b/net/smc/smc_wr.h @@ -63,14 +63,13 @@ static inline bool smc_wr_tx_link_hold(struct smc_link *link) { if (!smc_link_sendable(link)) return false; - atomic_inc(&link->wr_tx_refcnt); + percpu_ref_get(&link->wr_tx_refs); return true; } static inline void smc_wr_tx_link_put(struct smc_link *link) { - if (atomic_dec_and_test(&link->wr_tx_refcnt)) - wake_up_all(&link->wr_tx_wait); + percpu_ref_put(&link->wr_tx_refs); } static inline void smc_wr_drain_cq(struct smc_link *lnk) diff --git a/net/socket.c b/net/socket.c index 9c92c0e6c4da..a7b4b37d86df 100644 --- a/net/socket.c +++ b/net/socket.c @@ -957,6 +957,7 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk, } EXPORT_SYMBOL_GPL(__sock_recv_timestamp); +#ifdef CONFIG_WIRELESS void __sock_recv_wifi_status(struct msghdr *msg, struct sock *sk, struct sk_buff *skb) { @@ -972,6 +973,7 @@ void __sock_recv_wifi_status(struct msghdr *msg, struct sock *sk, put_cmsg(msg, SOL_SOCKET, SCM_WIFI_STATUS, sizeof(ack), &ack); } EXPORT_SYMBOL_GPL(__sock_recv_wifi_status); +#endif static inline void sock_recv_drops(struct msghdr *msg, struct sock *sk, struct sk_buff *skb) @@ -2292,9 +2294,9 @@ INDIRECT_CALLABLE_DECLARE(bool tcp_bpf_bypass_getsockopt(int level, int __sys_getsockopt(int fd, int level, int optname, char __user *optval, int __user *optlen) { + int max_optlen __maybe_unused; int err, fput_needed; struct socket *sock; - int max_optlen; sock = sockfd_lookup_light(fd, &err, &fput_needed); if (!sock) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 0b0f18ecce44..fb31e8a4409e 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -557,7 +557,7 @@ static void unix_dgram_disconnected(struct sock *sk, struct sock *other) * when peer was not connected to us. */ if (!sock_flag(other, SOCK_DEAD) && unix_peer(other) == sk) { - other->sk_err = ECONNRESET; + WRITE_ONCE(other->sk_err, ECONNRESET); sk_error_report(other); } } @@ -630,7 +630,7 @@ static void unix_release_sock(struct sock *sk, int embrion) /* No more writes */ skpair->sk_shutdown = SHUTDOWN_MASK; if (!skb_queue_empty(&sk->sk_receive_queue) || embrion) - skpair->sk_err = ECONNRESET; + WRITE_ONCE(skpair->sk_err, ECONNRESET); unix_state_unlock(skpair); skpair->sk_state_change(skpair); sk_wake_async(skpair, SOCK_WAKE_WAITD, POLL_HUP); @@ -3165,7 +3165,7 @@ static __poll_t unix_poll(struct file *file, struct socket *sock, poll_table *wa mask = 0; /* exceptional events? */ - if (sk->sk_err) + if (READ_ONCE(sk->sk_err)) mask |= EPOLLERR; if (sk->sk_shutdown == SHUTDOWN_MASK) mask |= EPOLLHUP; @@ -3208,7 +3208,8 @@ static __poll_t unix_dgram_poll(struct file *file, struct socket *sock, mask = 0; /* exceptional events? */ - if (sk->sk_err || !skb_queue_empty_lockless(&sk->sk_error_queue)) + if (READ_ONCE(sk->sk_err) || + !skb_queue_empty_lockless(&sk->sk_error_queue)) mask |= EPOLLERR | (sock_flag(sk, SOCK_SELECT_ERR_QUEUE) ? EPOLLPRI : 0); diff --git a/net/unix/garbage.c b/net/unix/garbage.c index dc2763540393..2405f0f9af31 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -305,7 +305,7 @@ void unix_gc(void) * release.path eventually putting registered files. */ skb_queue_walk_safe(&hitlist, skb, next_skb) { - if (skb->scm_io_uring) { + if (skb->destructor == io_uring_destruct_scm) { __skb_unlink(skb, &hitlist); skb_queue_tail(&skb->sk->sk_receive_queue, skb); } diff --git a/net/unix/scm.c b/net/unix/scm.c index aa27a02478dc..f9152881d77f 100644 --- a/net/unix/scm.c +++ b/net/unix/scm.c @@ -152,3 +152,9 @@ void unix_destruct_scm(struct sk_buff *skb) sock_wfree(skb); } EXPORT_SYMBOL(unix_destruct_scm); + +void io_uring_destruct_scm(struct sk_buff *skb) +{ + unix_destruct_scm(skb); +} +EXPORT_SYMBOL(io_uring_destruct_scm); diff --git a/net/vmw_vsock/Makefile b/net/vmw_vsock/Makefile index 6a943ec95c4a..5da74c4a9f1d 100644 --- a/net/vmw_vsock/Makefile +++ b/net/vmw_vsock/Makefile @@ -8,6 +8,7 @@ obj-$(CONFIG_HYPERV_VSOCKETS) += hv_sock.o obj-$(CONFIG_VSOCKETS_LOOPBACK) += vsock_loopback.o vsock-y += af_vsock.o af_vsock_tap.o vsock_addr.o +vsock-$(CONFIG_BPF_SYSCALL) += vsock_bpf.o vsock_diag-y += diag.o diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index 19aea7cba26e..413407bb646c 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -116,10 +116,13 @@ static void vsock_sk_destruct(struct sock *sk); static int vsock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb); /* Protocol family. */ -static struct proto vsock_proto = { +struct proto vsock_proto = { .name = "AF_VSOCK", .owner = THIS_MODULE, .obj_size = sizeof(struct vsock_sock), +#ifdef CONFIG_BPF_SYSCALL + .psock_update_sk_prot = vsock_bpf_update_proto, +#endif }; /* The default peer timeout indicates how long we will wait for a peer response @@ -865,7 +868,7 @@ s64 vsock_stream_has_data(struct vsock_sock *vsk) } EXPORT_SYMBOL_GPL(vsock_stream_has_data); -static s64 vsock_connectible_has_data(struct vsock_sock *vsk) +s64 vsock_connectible_has_data(struct vsock_sock *vsk) { struct sock *sk = sk_vsock(vsk); @@ -874,6 +877,7 @@ static s64 vsock_connectible_has_data(struct vsock_sock *vsk) else return vsock_stream_has_data(vsk); } +EXPORT_SYMBOL_GPL(vsock_connectible_has_data); s64 vsock_stream_has_space(struct vsock_sock *vsk) { @@ -1131,6 +1135,13 @@ static __poll_t vsock_poll(struct file *file, struct socket *sock, return mask; } +static int vsock_read_skb(struct sock *sk, skb_read_actor_t read_actor) +{ + struct vsock_sock *vsk = vsock_sk(sk); + + return vsk->transport->read_skb(vsk, read_actor); +} + static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) { @@ -1242,18 +1253,42 @@ static int vsock_dgram_connect(struct socket *sock, memcpy(&vsk->remote_addr, remote_addr, sizeof(vsk->remote_addr)); sock->state = SS_CONNECTED; + /* sock map disallows redirection of non-TCP sockets with sk_state != + * TCP_ESTABLISHED (see sock_map_redirect_allowed()), so we set + * TCP_ESTABLISHED here to allow redirection of connected vsock dgrams. + * + * This doesn't seem to be abnormal state for datagram sockets, as the + * same approach can be see in other datagram socket types as well + * (such as unix sockets). + */ + sk->sk_state = TCP_ESTABLISHED; + out: release_sock(sk); return err; } -static int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg, - size_t len, int flags) +int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg, + size_t len, int flags) { - struct vsock_sock *vsk = vsock_sk(sock->sk); +#ifdef CONFIG_BPF_SYSCALL + const struct proto *prot; +#endif + struct vsock_sock *vsk; + struct sock *sk; + + sk = sock->sk; + vsk = vsock_sk(sk); + +#ifdef CONFIG_BPF_SYSCALL + prot = READ_ONCE(sk->sk_prot); + if (prot != &vsock_proto) + return prot->recvmsg(sk, msg, len, flags, NULL); +#endif return vsk->transport->dgram_dequeue(vsk, msg, len, flags); } +EXPORT_SYMBOL_GPL(vsock_dgram_recvmsg); static const struct proto_ops vsock_dgram_ops = { .family = PF_VSOCK, @@ -1272,6 +1307,7 @@ static const struct proto_ops vsock_dgram_ops = { .recvmsg = vsock_dgram_recvmsg, .mmap = sock_no_mmap, .sendpage = sock_no_sendpage, + .read_skb = vsock_read_skb, }; static int vsock_transport_cancel_pkt(struct vsock_sock *vsk) @@ -2007,7 +2043,7 @@ static int __vsock_stream_recvmsg(struct sock *sk, struct msghdr *msg, read = transport->stream_dequeue(vsk, msg, len - copied, flags); if (read < 0) { - err = -ENOMEM; + err = read; break; } @@ -2058,7 +2094,7 @@ static int __vsock_seqpacket_recvmsg(struct sock *sk, struct msghdr *msg, msg_len = transport->seqpacket_dequeue(vsk, msg, flags); if (msg_len < 0) { - err = -ENOMEM; + err = msg_len; goto out; } @@ -2086,13 +2122,16 @@ out: return err; } -static int +int vsock_connectible_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, int flags) { struct sock *sk; struct vsock_sock *vsk; const struct vsock_transport *transport; +#ifdef CONFIG_BPF_SYSCALL + const struct proto *prot; +#endif int err; sk = sock->sk; @@ -2139,6 +2178,14 @@ vsock_connectible_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, goto out; } +#ifdef CONFIG_BPF_SYSCALL + prot = READ_ONCE(sk->sk_prot); + if (prot != &vsock_proto) { + release_sock(sk); + return prot->recvmsg(sk, msg, len, flags, NULL); + } +#endif + if (sk->sk_type == SOCK_STREAM) err = __vsock_stream_recvmsg(sk, msg, len, flags); else @@ -2148,6 +2195,7 @@ out: release_sock(sk); return err; } +EXPORT_SYMBOL_GPL(vsock_connectible_recvmsg); static int vsock_set_rcvlowat(struct sock *sk, int val) { @@ -2188,6 +2236,7 @@ static const struct proto_ops vsock_stream_ops = { .mmap = sock_no_mmap, .sendpage = sock_no_sendpage, .set_rcvlowat = vsock_set_rcvlowat, + .read_skb = vsock_read_skb, }; static const struct proto_ops vsock_seqpacket_ops = { @@ -2209,6 +2258,7 @@ static const struct proto_ops vsock_seqpacket_ops = { .recvmsg = vsock_connectible_recvmsg, .mmap = sock_no_mmap, .sendpage = sock_no_sendpage, + .read_skb = vsock_read_skb, }; static int vsock_create(struct net *net, struct socket *sock, @@ -2348,6 +2398,8 @@ static int __init vsock_init(void) goto err_unregister_proto; } + vsock_bpf_build_proto(); + return 0; err_unregister_proto: diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c index 28b5a8e8e094..e95df847176b 100644 --- a/net/vmw_vsock/virtio_transport.c +++ b/net/vmw_vsock/virtio_transport.c @@ -457,6 +457,8 @@ static struct virtio_transport virtio_transport = { .notify_send_pre_enqueue = virtio_transport_notify_send_pre_enqueue, .notify_send_post_enqueue = virtio_transport_notify_send_post_enqueue, .notify_buffer_size = virtio_transport_notify_buffer_size, + + .read_skb = virtio_transport_read_skb, }, .send_pkt = virtio_transport_send_pkt, diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c index ee78b4082ef9..e4878551f140 100644 --- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c @@ -201,7 +201,8 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk, const struct virtio_transport *t_ops; struct virtio_vsock_sock *vvs; u32 pkt_len = info->pkt_len; - struct sk_buff *skb; + u32 rest_len; + int ret; info->type = virtio_transport_get_type(sk_vsock(vsk)); @@ -221,10 +222,6 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk, vvs = vsk->trans; - /* we can send less than pkt_len bytes */ - if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) - pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE; - /* virtio_transport_get_credit might return less than pkt_len credit */ pkt_len = virtio_transport_get_credit(vvs, pkt_len); @@ -232,17 +229,49 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk, if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW) return pkt_len; - skb = virtio_transport_alloc_skb(info, pkt_len, - src_cid, src_port, - dst_cid, dst_port); - if (!skb) { - virtio_transport_put_credit(vvs, pkt_len); - return -ENOMEM; - } + rest_len = pkt_len; + + do { + struct sk_buff *skb; + size_t skb_len; + + skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE, rest_len); + + skb = virtio_transport_alloc_skb(info, skb_len, + src_cid, src_port, + dst_cid, dst_port); + if (!skb) { + ret = -ENOMEM; + break; + } + + virtio_transport_inc_tx_pkt(vvs, skb); - virtio_transport_inc_tx_pkt(vvs, skb); + ret = t_ops->send_pkt(skb); + if (ret < 0) + break; + + /* Both virtio and vhost 'send_pkt()' returns 'skb_len', + * but for reliability use 'ret' instead of 'skb_len'. + * Also if partial send happens (e.g. 'ret' != 'skb_len') + * somehow, we break this loop, but account such returned + * value in 'virtio_transport_put_credit()'. + */ + rest_len -= ret; + + if (WARN_ONCE(ret != skb_len, + "'send_pkt()' returns %i, but %zu expected\n", + ret, skb_len)) + break; + } while (rest_len); + + virtio_transport_put_credit(vvs, rest_len); - return t_ops->send_pkt(skb); + /* Return number of bytes, if any data has been sent. */ + if (rest_len != pkt_len) + ret = pkt_len - rest_len; + + return ret; } static bool virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs, @@ -278,6 +307,9 @@ u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 credit) { u32 ret; + if (!credit) + return 0; + spin_lock_bh(&vvs->tx_lock); ret = vvs->peer_buf_alloc - (vvs->tx_cnt - vvs->peer_fwd_cnt); if (ret > credit) @@ -291,6 +323,9 @@ EXPORT_SYMBOL_GPL(virtio_transport_get_credit); void virtio_transport_put_credit(struct virtio_vsock_sock *vvs, u32 credit) { + if (!credit) + return; + spin_lock_bh(&vvs->tx_lock); vvs->tx_cnt -= credit; spin_unlock_bh(&vvs->tx_lock); @@ -862,6 +897,9 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t, if (le16_to_cpu(hdr->op) == VIRTIO_VSOCK_OP_RST) return 0; + if (!t) + return -ENOTCONN; + reply = virtio_transport_alloc_skb(&info, 0, le64_to_cpu(hdr->dst_cid), le32_to_cpu(hdr->dst_port), @@ -870,11 +908,6 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t, if (!reply) return -ENOMEM; - if (!t) { - kfree_skb(reply); - return -ENOTCONN; - } - return t->send_pkt(reply); } @@ -1402,6 +1435,31 @@ int virtio_transport_purge_skbs(void *vsk, struct sk_buff_head *queue) } EXPORT_SYMBOL_GPL(virtio_transport_purge_skbs); +int virtio_transport_read_skb(struct vsock_sock *vsk, skb_read_actor_t recv_actor) +{ + struct virtio_vsock_sock *vvs = vsk->trans; + struct sock *sk = sk_vsock(vsk); + struct sk_buff *skb; + int off = 0; + int copied; + int err; + + spin_lock_bh(&vvs->rx_lock); + /* Use __skb_recv_datagram() for race-free handling of the receive. It + * works for types other than dgrams. + */ + skb = __skb_recv_datagram(sk, &vvs->rx_queue, MSG_DONTWAIT, &off, &err); + spin_unlock_bh(&vvs->rx_lock); + + if (!skb) + return err; + + copied = recv_actor(sk, skb); + kfree_skb(skb); + return copied; +} +EXPORT_SYMBOL_GPL(virtio_transport_read_skb); + MODULE_LICENSE("GPL v2"); MODULE_AUTHOR("Asias He"); MODULE_DESCRIPTION("common code for virtio vsock"); diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c index 95cc4d79ba29..b370070194fa 100644 --- a/net/vmw_vsock/vmci_transport.c +++ b/net/vmw_vsock/vmci_transport.c @@ -1831,10 +1831,17 @@ static ssize_t vmci_transport_stream_dequeue( size_t len, int flags) { + ssize_t err; + if (flags & MSG_PEEK) - return vmci_qpair_peekv(vmci_trans(vsk)->qpair, msg, len, 0); + err = vmci_qpair_peekv(vmci_trans(vsk)->qpair, msg, len, 0); else - return vmci_qpair_dequev(vmci_trans(vsk)->qpair, msg, len, 0); + err = vmci_qpair_dequev(vmci_trans(vsk)->qpair, msg, len, 0); + + if (err < 0) + err = -ENOMEM; + + return err; } static ssize_t vmci_transport_stream_enqueue( diff --git a/net/vmw_vsock/vsock_bpf.c b/net/vmw_vsock/vsock_bpf.c new file mode 100644 index 000000000000..a3c97546ab84 --- /dev/null +++ b/net/vmw_vsock/vsock_bpf.c @@ -0,0 +1,174 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2022 Bobby Eshleman <bobby.eshleman@bytedance.com> + * + * Based off of net/unix/unix_bpf.c + */ + +#include <linux/bpf.h> +#include <linux/module.h> +#include <linux/skmsg.h> +#include <linux/socket.h> +#include <linux/wait.h> +#include <net/af_vsock.h> +#include <net/sock.h> + +#define vsock_sk_has_data(__sk, __psock) \ + ({ !skb_queue_empty(&(__sk)->sk_receive_queue) || \ + !skb_queue_empty(&(__psock)->ingress_skb) || \ + !list_empty(&(__psock)->ingress_msg); \ + }) + +static struct proto *vsock_prot_saved __read_mostly; +static DEFINE_SPINLOCK(vsock_prot_lock); +static struct proto vsock_bpf_prot; + +static bool vsock_has_data(struct sock *sk, struct sk_psock *psock) +{ + struct vsock_sock *vsk = vsock_sk(sk); + s64 ret; + + ret = vsock_connectible_has_data(vsk); + if (ret > 0) + return true; + + return vsock_sk_has_data(sk, psock); +} + +static bool vsock_msg_wait_data(struct sock *sk, struct sk_psock *psock, long timeo) +{ + bool ret; + + DEFINE_WAIT_FUNC(wait, woken_wake_function); + + if (sk->sk_shutdown & RCV_SHUTDOWN) + return true; + + if (!timeo) + return false; + + add_wait_queue(sk_sleep(sk), &wait); + sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk); + ret = vsock_has_data(sk, psock); + if (!ret) { + wait_woken(&wait, TASK_INTERRUPTIBLE, timeo); + ret = vsock_has_data(sk, psock); + } + sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk); + remove_wait_queue(sk_sleep(sk), &wait); + return ret; +} + +static int __vsock_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int flags) +{ + struct socket *sock = sk->sk_socket; + int err; + + if (sk->sk_type == SOCK_STREAM || sk->sk_type == SOCK_SEQPACKET) + err = vsock_connectible_recvmsg(sock, msg, len, flags); + else if (sk->sk_type == SOCK_DGRAM) + err = vsock_dgram_recvmsg(sock, msg, len, flags); + else + err = -EPROTOTYPE; + + return err; +} + +static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg, + size_t len, int flags, int *addr_len) +{ + struct sk_psock *psock; + int copied; + + psock = sk_psock_get(sk); + if (unlikely(!psock)) + return __vsock_recvmsg(sk, msg, len, flags); + + lock_sock(sk); + if (vsock_has_data(sk, psock) && sk_psock_queue_empty(psock)) { + release_sock(sk); + sk_psock_put(sk, psock); + return __vsock_recvmsg(sk, msg, len, flags); + } + + copied = sk_msg_recvmsg(sk, psock, msg, len, flags); + while (copied == 0) { + long timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT); + + if (!vsock_msg_wait_data(sk, psock, timeo)) { + copied = -EAGAIN; + break; + } + + if (sk_psock_queue_empty(psock)) { + release_sock(sk); + sk_psock_put(sk, psock); + return __vsock_recvmsg(sk, msg, len, flags); + } + + copied = sk_msg_recvmsg(sk, psock, msg, len, flags); + } + + release_sock(sk); + sk_psock_put(sk, psock); + + return copied; +} + +/* Copy of original proto with updated sock_map methods */ +static struct proto vsock_bpf_prot = { + .close = sock_map_close, + .recvmsg = vsock_bpf_recvmsg, + .sock_is_readable = sk_msg_is_readable, + .unhash = sock_map_unhash, +}; + +static void vsock_bpf_rebuild_protos(struct proto *prot, const struct proto *base) +{ + *prot = *base; + prot->close = sock_map_close; + prot->recvmsg = vsock_bpf_recvmsg; + prot->sock_is_readable = sk_msg_is_readable; +} + +static void vsock_bpf_check_needs_rebuild(struct proto *ops) +{ + /* Paired with the smp_store_release() below. */ + if (unlikely(ops != smp_load_acquire(&vsock_prot_saved))) { + spin_lock_bh(&vsock_prot_lock); + if (likely(ops != vsock_prot_saved)) { + vsock_bpf_rebuild_protos(&vsock_bpf_prot, ops); + /* Make sure proto function pointers are updated before publishing the + * pointer to the struct. + */ + smp_store_release(&vsock_prot_saved, ops); + } + spin_unlock_bh(&vsock_prot_lock); + } +} + +int vsock_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore) +{ + struct vsock_sock *vsk; + + if (restore) { + sk->sk_write_space = psock->saved_write_space; + sock_replace_proto(sk, psock->sk_proto); + return 0; + } + + vsk = vsock_sk(sk); + if (!vsk->transport) + return -ENODEV; + + if (!vsk->transport->read_skb) + return -EOPNOTSUPP; + + vsock_bpf_check_needs_rebuild(psock->sk_proto); + sock_replace_proto(sk, &vsock_bpf_prot); + return 0; +} + +void __init vsock_bpf_build_proto(void) +{ + vsock_bpf_rebuild_protos(&vsock_bpf_prot, &vsock_proto); +} diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c index 89905c092645..5c6360df1f31 100644 --- a/net/vmw_vsock/vsock_loopback.c +++ b/net/vmw_vsock/vsock_loopback.c @@ -31,8 +31,7 @@ static int vsock_loopback_send_pkt(struct sk_buff *skb) struct vsock_loopback *vsock = &the_vsock_loopback; int len = skb->len; - skb_queue_tail(&vsock->pkt_queue, skb); - + virtio_vsock_skb_queue_tail(&vsock->pkt_queue, skb); queue_work(vsock->workqueue, &vsock->pkt_work); return len; @@ -91,6 +90,8 @@ static struct virtio_transport loopback_transport = { .notify_send_pre_enqueue = virtio_transport_notify_send_pre_enqueue, .notify_send_post_enqueue = virtio_transport_notify_send_post_enqueue, .notify_buffer_size = virtio_transport_notify_buffer_size, + + .read_skb = virtio_transport_read_skb, }, .send_pkt = vsock_loopback_send_pkt, diff --git a/net/wireless/mlme.c b/net/wireless/mlme.c index 81d3f40d6235..ac059cefbeb3 100644 --- a/net/wireless/mlme.c +++ b/net/wireless/mlme.c @@ -673,6 +673,39 @@ static bool cfg80211_allowed_address(struct wireless_dev *wdev, const u8 *addr) return ether_addr_equal(addr, wdev_address(wdev)); } +static bool cfg80211_allowed_random_address(struct wireless_dev *wdev, + const struct ieee80211_mgmt *mgmt) +{ + if (ieee80211_is_auth(mgmt->frame_control) || + ieee80211_is_deauth(mgmt->frame_control)) { + /* Allow random TA to be used with authentication and + * deauthentication frames if the driver has indicated support. + */ + if (wiphy_ext_feature_isset( + wdev->wiphy, + NL80211_EXT_FEATURE_AUTH_AND_DEAUTH_RANDOM_TA)) + return true; + } else if (ieee80211_is_action(mgmt->frame_control) && + mgmt->u.action.category == WLAN_CATEGORY_PUBLIC) { + /* Allow random TA to be used with Public Action frames if the + * driver has indicated support. + */ + if (!wdev->connected && + wiphy_ext_feature_isset( + wdev->wiphy, + NL80211_EXT_FEATURE_MGMT_TX_RANDOM_TA)) + return true; + + if (wdev->connected && + wiphy_ext_feature_isset( + wdev->wiphy, + NL80211_EXT_FEATURE_MGMT_TX_RANDOM_TA_CONNECTED)) + return true; + } + + return false; +} + int cfg80211_mlme_mgmt_tx(struct cfg80211_registered_device *rdev, struct wireless_dev *wdev, struct cfg80211_mgmt_tx_params *params, u64 *cookie) @@ -774,25 +807,9 @@ int cfg80211_mlme_mgmt_tx(struct cfg80211_registered_device *rdev, return err; } - if (!cfg80211_allowed_address(wdev, mgmt->sa)) { - /* Allow random TA to be used with Public Action frames if the - * driver has indicated support for this. Otherwise, only allow - * the local address to be used. - */ - if (!ieee80211_is_action(mgmt->frame_control) || - mgmt->u.action.category != WLAN_CATEGORY_PUBLIC) - return -EINVAL; - if (!wdev->connected && - !wiphy_ext_feature_isset( - &rdev->wiphy, - NL80211_EXT_FEATURE_MGMT_TX_RANDOM_TA)) - return -EINVAL; - if (wdev->connected && - !wiphy_ext_feature_isset( - &rdev->wiphy, - NL80211_EXT_FEATURE_MGMT_TX_RANDOM_TA_CONNECTED)) - return -EINVAL; - } + if (!cfg80211_allowed_address(wdev, mgmt->sa) && + !cfg80211_allowed_random_address(wdev, mgmt)) + return -EINVAL; /* Transmit the management frame as requested by user space */ return rdev_mgmt_tx(rdev, wdev, params, cookie); diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index 4f63059efd81..d95f8053020d 100644 --- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -812,6 +812,10 @@ static const struct nla_policy nl80211_policy[NUM_NL80211_ATTR] = { [NL80211_ATTR_MAX_NUM_AKM_SUITES] = { .type = NLA_REJECT }, [NL80211_ATTR_PUNCT_BITMAP] = NLA_POLICY_FULL_RANGE(NLA_U32, &nl80211_punct_bitmap_range), + + [NL80211_ATTR_MAX_HW_TIMESTAMP_PEERS] = { .type = NLA_U16 }, + [NL80211_ATTR_HW_TIMESTAMP_ENABLED] = { .type = NLA_FLAG }, + [NL80211_ATTR_EMA_RNR_ELEMS] = { .type = NLA_NESTED }, }; /* policy for the key attributes */ @@ -1963,6 +1967,16 @@ static int nl80211_send_band_rateinfo(struct sk_buff *msg, nla_nest_end(msg, nl_rates); + /* S1G capabilities */ + if (sband->band == NL80211_BAND_S1GHZ && sband->s1g_cap.s1g && + (nla_put(msg, NL80211_BAND_ATTR_S1G_CAPA, + sizeof(sband->s1g_cap.cap), + sband->s1g_cap.cap) || + nla_put(msg, NL80211_BAND_ATTR_S1G_MCS_NSS_SET, + sizeof(sband->s1g_cap.nss_mcs), + sband->s1g_cap.nss_mcs))) + return -ENOBUFS; + return 0; } @@ -2970,6 +2984,11 @@ static int nl80211_send_wiphy(struct cfg80211_registered_device *rdev, if (rdev->wiphy.flags & WIPHY_FLAG_SUPPORTS_MLO) nla_put_flag(msg, NL80211_ATTR_MLO_SUPPORT); + if (rdev->wiphy.hw_timestamp_max_peers && + nla_put_u16(msg, NL80211_ATTR_MAX_HW_TIMESTAMP_PEERS, + rdev->wiphy.hw_timestamp_max_peers)) + goto nla_put_failure; + /* done */ state->split_start = 0; break; @@ -3762,8 +3781,7 @@ out: return result; } -static int nl80211_send_chandef(struct sk_buff *msg, - const struct cfg80211_chan_def *chandef) +int nl80211_send_chandef(struct sk_buff *msg, const struct cfg80211_chan_def *chandef) { if (WARN_ON(!cfg80211_chandef_valid(chandef))) return -EINVAL; @@ -3794,6 +3812,7 @@ static int nl80211_send_chandef(struct sk_buff *msg, return -ENOBUFS; return 0; } +EXPORT_SYMBOL(nl80211_send_chandef); static int nl80211_send_iface(struct sk_buff *msg, u32 portid, u32 seq, int flags, struct cfg80211_registered_device *rdev, @@ -5423,6 +5442,38 @@ nl80211_parse_mbssid_elems(struct wiphy *wiphy, struct nlattr *attrs) return elems; } +static struct cfg80211_rnr_elems * +nl80211_parse_rnr_elems(struct wiphy *wiphy, struct nlattr *attrs, + struct netlink_ext_ack *extack) +{ + struct nlattr *nl_elems; + struct cfg80211_rnr_elems *elems; + int rem_elems; + u8 i = 0, num_elems = 0; + + nla_for_each_nested(nl_elems, attrs, rem_elems) { + int ret; + + ret = validate_ie_attr(nl_elems, extack); + if (ret) + return ERR_PTR(ret); + + num_elems++; + } + + elems = kzalloc(struct_size(elems, elem, num_elems), GFP_KERNEL); + if (!elems) + return ERR_PTR(-ENOMEM); + + nla_for_each_nested(nl_elems, attrs, rem_elems) { + elems->elem[i].data = nla_data(nl_elems); + elems->elem[i].len = nla_len(nl_elems); + i++; + } + elems->cnt = num_elems; + return elems; +} + static int nl80211_parse_he_bss_color(struct nlattr *attrs, struct cfg80211_he_bss_color *he_bss_color) { @@ -5449,7 +5500,8 @@ static int nl80211_parse_he_bss_color(struct nlattr *attrs, static int nl80211_parse_beacon(struct cfg80211_registered_device *rdev, struct nlattr *attrs[], - struct cfg80211_beacon_data *bcn) + struct cfg80211_beacon_data *bcn, + struct netlink_ext_ack *extack) { bool haveinfo = false; int err; @@ -5546,6 +5598,21 @@ static int nl80211_parse_beacon(struct cfg80211_registered_device *rdev, return PTR_ERR(mbssid); bcn->mbssid_ies = mbssid; + + if (bcn->mbssid_ies && attrs[NL80211_ATTR_EMA_RNR_ELEMS]) { + struct cfg80211_rnr_elems *rnr = + nl80211_parse_rnr_elems(&rdev->wiphy, + attrs[NL80211_ATTR_EMA_RNR_ELEMS], + extack); + + if (IS_ERR(rnr)) + return PTR_ERR(rnr); + + if (rnr && rnr->cnt < bcn->mbssid_ies->cnt) + return -EINVAL; + + bcn->rnr_ies = rnr; + } } return 0; @@ -5864,7 +5931,8 @@ static int nl80211_start_ap(struct sk_buff *skb, struct genl_info *info) if (!params) return -ENOMEM; - err = nl80211_parse_beacon(rdev, info->attrs, ¶ms->beacon); + err = nl80211_parse_beacon(rdev, info->attrs, ¶ms->beacon, + info->extack); if (err) goto out; @@ -6094,6 +6162,11 @@ static int nl80211_start_ap(struct sk_buff *skb, struct genl_info *info) goto out_unlock; } + if (!params->mbssid_config.ema && params->beacon.rnr_ies) { + err = -EINVAL; + goto out_unlock; + } + err = nl80211_calculate_ap_params(params); if (err) goto out_unlock; @@ -6135,6 +6208,7 @@ out: params->mbssid_config.tx_wdev->netdev && params->mbssid_config.tx_wdev->netdev != dev) dev_put(params->mbssid_config.tx_wdev->netdev); + kfree(params->beacon.rnr_ies); kfree(params); return err; @@ -6159,7 +6233,7 @@ static int nl80211_set_beacon(struct sk_buff *skb, struct genl_info *info) if (!wdev->links[link_id].ap.beacon_interval) return -EINVAL; - err = nl80211_parse_beacon(rdev, info->attrs, ¶ms); + err = nl80211_parse_beacon(rdev, info->attrs, ¶ms, info->extack); if (err) goto out; @@ -6169,6 +6243,7 @@ static int nl80211_set_beacon(struct sk_buff *skb, struct genl_info *info) out: kfree(params.mbssid_ies); + kfree(params.rnr_ies); return err; } @@ -9025,7 +9100,7 @@ static int nl80211_trigger_scan(struct sk_buff *skb, struct genl_info *info) struct nlattr *attr; struct wiphy *wiphy; int err, tmp, n_ssids = 0, n_channels, i; - size_t ie_len; + size_t ie_len, size; wiphy = &rdev->wiphy; @@ -9070,10 +9145,10 @@ static int nl80211_trigger_scan(struct sk_buff *skb, struct genl_info *info) if (ie_len > wiphy->max_scan_ie_len) return -EINVAL; - request = kzalloc(sizeof(*request) - + sizeof(*request->ssids) * n_ssids - + sizeof(*request->channels) * n_channels - + ie_len, GFP_KERNEL); + size = struct_size(request, channels, n_channels); + size = size_add(size, array_size(sizeof(*request->ssids), n_ssids)); + size = size_add(size, ie_len); + request = kzalloc(size, GFP_KERNEL); if (!request) return -ENOMEM; @@ -9406,7 +9481,7 @@ nl80211_parse_sched_scan(struct wiphy *wiphy, struct wireless_dev *wdev, struct nlattr *attr; int err, tmp, n_ssids = 0, n_match_sets = 0, n_channels, i, n_plans = 0; enum nl80211_band band; - size_t ie_len; + size_t ie_len, size; struct nlattr *tb[NL80211_SCHED_SCAN_MATCH_ATTR_MAX + 1]; s32 default_match_rssi = NL80211_SCAN_RSSI_THOLD_OFF; @@ -9515,12 +9590,14 @@ nl80211_parse_sched_scan(struct wiphy *wiphy, struct wireless_dev *wdev, attrs[NL80211_ATTR_SCHED_SCAN_RSSI_ADJUST])) return ERR_PTR(-EINVAL); - request = kzalloc(sizeof(*request) - + sizeof(*request->ssids) * n_ssids - + sizeof(*request->match_sets) * n_match_sets - + sizeof(*request->scan_plans) * n_plans - + sizeof(*request->channels) * n_channels - + ie_len, GFP_KERNEL); + size = struct_size(request, channels, n_channels); + size = size_add(size, array_size(sizeof(*request->ssids), n_ssids)); + size = size_add(size, array_size(sizeof(*request->match_sets), + n_match_sets)); + size = size_add(size, array_size(sizeof(*request->scan_plans), + n_plans)); + size = size_add(size, ie_len); + request = kzalloc(size, GFP_KERNEL); if (!request) return ERR_PTR(-ENOMEM); @@ -10026,7 +10103,8 @@ static int nl80211_channel_switch(struct sk_buff *skb, struct genl_info *info) if (!need_new_beacon) goto skip_beacons; - err = nl80211_parse_beacon(rdev, info->attrs, ¶ms.beacon_after); + err = nl80211_parse_beacon(rdev, info->attrs, ¶ms.beacon_after, + info->extack); if (err) goto free; @@ -10043,7 +10121,8 @@ static int nl80211_channel_switch(struct sk_buff *skb, struct genl_info *info) if (err) goto free; - err = nl80211_parse_beacon(rdev, csa_attrs, ¶ms.beacon_csa); + err = nl80211_parse_beacon(rdev, csa_attrs, ¶ms.beacon_csa, + info->extack); if (err) goto free; @@ -10163,6 +10242,8 @@ skip_beacons: free: kfree(params.beacon_after.mbssid_ies); kfree(params.beacon_csa.mbssid_ies); + kfree(params.beacon_after.rnr_ies); + kfree(params.beacon_csa.rnr_ies); kfree(csa_attrs); return err; } @@ -15876,7 +15957,8 @@ static int nl80211_color_change(struct sk_buff *skb, struct genl_info *info) params.count = nla_get_u8(info->attrs[NL80211_ATTR_COLOR_CHANGE_COUNT]); params.color = nla_get_u8(info->attrs[NL80211_ATTR_COLOR_CHANGE_COLOR]); - err = nl80211_parse_beacon(rdev, info->attrs, ¶ms.beacon_next); + err = nl80211_parse_beacon(rdev, info->attrs, ¶ms.beacon_next, + info->extack); if (err) return err; @@ -15890,7 +15972,8 @@ static int nl80211_color_change(struct sk_buff *skb, struct genl_info *info) if (err) goto out; - err = nl80211_parse_beacon(rdev, tb, ¶ms.beacon_color_change); + err = nl80211_parse_beacon(rdev, tb, ¶ms.beacon_color_change, + info->extack); if (err) goto out; @@ -15946,6 +16029,8 @@ static int nl80211_color_change(struct sk_buff *skb, struct genl_info *info) out: kfree(params.beacon_next.mbssid_ies); kfree(params.beacon_color_change.mbssid_ies); + kfree(params.beacon_next.rnr_ies); + kfree(params.beacon_color_change.rnr_ies); kfree(tb); return err; } @@ -16166,6 +16251,29 @@ nl80211_remove_link_station(struct sk_buff *skb, struct genl_info *info) return ret; } +static int nl80211_set_hw_timestamp(struct sk_buff *skb, + struct genl_info *info) +{ + struct cfg80211_registered_device *rdev = info->user_ptr[0]; + struct net_device *dev = info->user_ptr[1]; + struct cfg80211_set_hw_timestamp hwts = {}; + + if (!rdev->wiphy.hw_timestamp_max_peers) + return -EOPNOTSUPP; + + if (!info->attrs[NL80211_ATTR_MAC] && + rdev->wiphy.hw_timestamp_max_peers != CFG80211_HW_TIMESTAMP_ALL_PEERS) + return -EOPNOTSUPP; + + if (info->attrs[NL80211_ATTR_MAC]) + hwts.macaddr = nla_data(info->attrs[NL80211_ATTR_MAC]); + + hwts.enable = + nla_get_flag(info->attrs[NL80211_ATTR_HW_TIMESTAMP_ENABLED]); + + return rdev_set_hw_timestamp(rdev, dev, &hwts); +} + #define NL80211_FLAG_NEED_WIPHY 0x01 #define NL80211_FLAG_NEED_NETDEV 0x02 #define NL80211_FLAG_NEED_RTNL 0x04 @@ -17340,6 +17448,12 @@ static const struct genl_small_ops nl80211_small_ops[] = { .internal_flags = IFLAGS(NL80211_FLAG_NEED_NETDEV_UP | NL80211_FLAG_MLO_VALID_LINK_ID), }, + { + .cmd = NL80211_CMD_SET_HW_TIMESTAMP, + .doit = nl80211_set_hw_timestamp, + .flags = GENL_UNS_ADMIN_PERM, + .internal_flags = IFLAGS(NL80211_FLAG_NEED_NETDEV_UP), + }, }; static struct genl_family nl80211_fam __ro_after_init = { @@ -18721,7 +18835,9 @@ EXPORT_SYMBOL(cfg80211_mgmt_tx_status_ext); static int __nl80211_rx_control_port(struct net_device *dev, struct sk_buff *skb, - bool unencrypted, gfp_t gfp) + bool unencrypted, + int link_id, + gfp_t gfp) { struct wireless_dev *wdev = dev->ieee80211_ptr; struct cfg80211_registered_device *rdev = wiphy_to_rdev(wdev->wiphy); @@ -18753,6 +18869,8 @@ static int __nl80211_rx_control_port(struct net_device *dev, NL80211_ATTR_PAD) || nla_put(msg, NL80211_ATTR_MAC, ETH_ALEN, addr) || nla_put_u16(msg, NL80211_ATTR_CONTROL_PORT_ETHERTYPE, proto) || + (link_id >= 0 && + nla_put_u8(msg, NL80211_ATTR_MLO_LINK_ID, link_id)) || (unencrypted && nla_put_flag(msg, NL80211_ATTR_CONTROL_PORT_NO_ENCRYPT))) goto nla_put_failure; @@ -18771,13 +18889,14 @@ static int __nl80211_rx_control_port(struct net_device *dev, return -ENOBUFS; } -bool cfg80211_rx_control_port(struct net_device *dev, - struct sk_buff *skb, bool unencrypted) +bool cfg80211_rx_control_port(struct net_device *dev, struct sk_buff *skb, + bool unencrypted, int link_id) { int ret; - trace_cfg80211_rx_control_port(dev, skb, unencrypted); - ret = __nl80211_rx_control_port(dev, skb, unencrypted, GFP_ATOMIC); + trace_cfg80211_rx_control_port(dev, skb, unencrypted, link_id); + ret = __nl80211_rx_control_port(dev, skb, unencrypted, link_id, + GFP_ATOMIC); trace_cfg80211_return_bool(ret == 0); return ret == 0; } diff --git a/net/wireless/rdev-ops.h b/net/wireless/rdev-ops.h index 13b209a8db28..2e497cf26ef2 100644 --- a/net/wireless/rdev-ops.h +++ b/net/wireless/rdev-ops.h @@ -1494,4 +1494,21 @@ rdev_del_link_station(struct cfg80211_registered_device *rdev, return ret; } +static inline int +rdev_set_hw_timestamp(struct cfg80211_registered_device *rdev, + struct net_device *dev, + struct cfg80211_set_hw_timestamp *hwts) +{ + struct wiphy *wiphy = &rdev->wiphy; + int ret; + + if (!rdev->ops->set_hw_timestamp) + return -EOPNOTSUPP; + + trace_rdev_set_hw_timestamp(wiphy, dev, hwts); + ret = rdev->ops->set_hw_timestamp(wiphy, dev, hwts); + trace_rdev_return_int(wiphy, ret); + + return ret; +} #endif /* __CFG80211_RDEV_OPS */ diff --git a/net/wireless/scan.c b/net/wireless/scan.c index 790bc31cf82e..a1382255fab3 100644 --- a/net/wireless/scan.c +++ b/net/wireless/scan.c @@ -1810,8 +1810,7 @@ cfg80211_bss_update(struct cfg80211_registered_device *rdev, } int cfg80211_get_ies_channel_number(const u8 *ie, size_t ielen, - enum nl80211_band band, - enum cfg80211_bss_frame_type ftype) + enum nl80211_band band) { const struct element *tmp; @@ -1830,9 +1829,7 @@ int cfg80211_get_ies_channel_number(const u8 *ie, size_t ielen, if (!he_6ghz_oper) return -1; - if (ftype != CFG80211_BSS_FTYPE_BEACON || - he_6ghz_oper->control & IEEE80211_HE_6GHZ_OPER_CTRL_DUP_BEACON) - return he_6ghz_oper->primary; + return he_6ghz_oper->primary; } } else if (band == NL80211_BAND_S1GHZ) { tmp = cfg80211_find_elem(WLAN_EID_S1G_OPERATION, ie, ielen); @@ -1870,15 +1867,14 @@ EXPORT_SYMBOL(cfg80211_get_ies_channel_number); static struct ieee80211_channel * cfg80211_get_bss_channel(struct wiphy *wiphy, const u8 *ie, size_t ielen, struct ieee80211_channel *channel, - enum nl80211_bss_scan_width scan_width, - enum cfg80211_bss_frame_type ftype) + enum nl80211_bss_scan_width scan_width) { u32 freq; int channel_number; struct ieee80211_channel *alt_channel; channel_number = cfg80211_get_ies_channel_number(ie, ielen, - channel->band, ftype); + channel->band); if (channel_number < 0) { /* No channel information in frame payload */ @@ -1888,22 +1884,21 @@ cfg80211_get_bss_channel(struct wiphy *wiphy, const u8 *ie, size_t ielen, freq = ieee80211_channel_to_freq_khz(channel_number, channel->band); /* - * In 6GHz, duplicated beacon indication is relevant for - * beacons only. + * Frame info (beacon/prob res) is the same as received channel, + * no need for further processing. */ - if (channel->band == NL80211_BAND_6GHZ && - (freq == channel->center_freq || - abs(freq - channel->center_freq) > 80)) + if (freq == ieee80211_channel_to_khz(channel)) return channel; alt_channel = ieee80211_get_channel_khz(wiphy, freq); if (!alt_channel) { - if (channel->band == NL80211_BAND_2GHZ) { + if (channel->band == NL80211_BAND_2GHZ || + channel->band == NL80211_BAND_6GHZ) { /* * Better not allow unexpected channels when that could * be going beyond the 1-11 range (e.g., discovering * BSS on channel 12 when radio is configured for - * channel 11. + * channel 11) or beyond the 6 GHz channel range. */ return NULL; } @@ -1957,7 +1952,7 @@ cfg80211_inform_single_bss_data(struct wiphy *wiphy, return NULL; channel = cfg80211_get_bss_channel(wiphy, ie, ielen, data->chan, - data->scan_width, ftype); + data->scan_width); if (!channel) return NULL; @@ -2391,7 +2386,6 @@ cfg80211_inform_single_bss_frame_data(struct wiphy *wiphy, size_t ielen, min_hdr_len = offsetof(struct ieee80211_mgmt, u.probe_resp.variable); int bss_type; - enum cfg80211_bss_frame_type ftype; BUILD_BUG_ON(offsetof(struct ieee80211_mgmt, u.probe_resp.variable) != offsetof(struct ieee80211_mgmt, u.beacon.variable)); @@ -2428,16 +2422,8 @@ cfg80211_inform_single_bss_frame_data(struct wiphy *wiphy, variable = ext->u.s1g_beacon.variable; } - if (ieee80211_is_beacon(mgmt->frame_control)) - ftype = CFG80211_BSS_FTYPE_BEACON; - else if (ieee80211_is_probe_resp(mgmt->frame_control)) - ftype = CFG80211_BSS_FTYPE_PRESP; - else - ftype = CFG80211_BSS_FTYPE_UNKNOWN; - channel = cfg80211_get_bss_channel(wiphy, variable, - ielen, data->chan, data->scan_width, - ftype); + ielen, data->chan, data->scan_width); if (!channel) return NULL; diff --git a/net/wireless/trace.h b/net/wireless/trace.h index ca7474eec723..716a1fa70069 100644 --- a/net/wireless/trace.h +++ b/net/wireless/trace.h @@ -3165,14 +3165,15 @@ TRACE_EVENT(cfg80211_control_port_tx_status, TRACE_EVENT(cfg80211_rx_control_port, TP_PROTO(struct net_device *netdev, struct sk_buff *skb, - bool unencrypted), - TP_ARGS(netdev, skb, unencrypted), + bool unencrypted, int link_id), + TP_ARGS(netdev, skb, unencrypted, link_id), TP_STRUCT__entry( NETDEV_ENTRY __field(int, len) MAC_ENTRY(from) __field(u16, proto) __field(bool, unencrypted) + __field(int, link_id) ), TP_fast_assign( NETDEV_ASSIGN; @@ -3180,10 +3181,12 @@ TRACE_EVENT(cfg80211_rx_control_port, MAC_ASSIGN(from, eth_hdr(skb)->h_source); __entry->proto = be16_to_cpu(skb->protocol); __entry->unencrypted = unencrypted; + __entry->link_id = link_id; ), - TP_printk(NETDEV_PR_FMT ", len=%d, %pM, proto: 0x%x, unencrypted: %s", + TP_printk(NETDEV_PR_FMT ", len=%d, %pM, proto: 0x%x, unencrypted: %s, link: %d", NETDEV_PR_ARG, __entry->len, __entry->from, - __entry->proto, BOOL_TO_STR(__entry->unencrypted)) + __entry->proto, BOOL_TO_STR(__entry->unencrypted), + __entry->link_id) ); TRACE_EVENT(cfg80211_cqm_rssi_notify, @@ -3918,6 +3921,31 @@ TRACE_EVENT(rdev_del_link_station, __entry->link_id) ); +TRACE_EVENT(rdev_set_hw_timestamp, + TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, + struct cfg80211_set_hw_timestamp *hwts), + + TP_ARGS(wiphy, netdev, hwts), + + TP_STRUCT__entry( + WIPHY_ENTRY + NETDEV_ENTRY + MAC_ENTRY(macaddr) + __field(bool, enable) + ), + + TP_fast_assign( + WIPHY_ASSIGN; + NETDEV_ASSIGN; + MAC_ASSIGN(macaddr, hwts->macaddr); + __entry->enable = hwts->enable; + ), + + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", mac %pM, enable: %u", + WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->macaddr, + __entry->enable) +); + #endif /* !__RDEV_OPS_TRACE || TRACE_HEADER_MULTI_READ */ #undef TRACE_INCLUDE_PATH diff --git a/net/wireless/util.c b/net/wireless/util.c index d1a89e82ead0..3bc0c3072e78 100644 --- a/net/wireless/util.c +++ b/net/wireless/util.c @@ -776,7 +776,24 @@ __ieee80211_amsdu_copy(struct sk_buff *skb, unsigned int hlen, return frame; } -bool ieee80211_is_valid_amsdu(struct sk_buff *skb, bool mesh_hdr) +static u16 +ieee80211_amsdu_subframe_length(void *field, u8 mesh_flags, u8 hdr_type) +{ + __le16 *field_le = field; + __be16 *field_be = field; + u16 len; + + if (hdr_type >= 2) + len = le16_to_cpu(*field_le); + else + len = be16_to_cpu(*field_be); + if (hdr_type) + len += __ieee80211_get_mesh_hdrlen(mesh_flags); + + return len; +} + +bool ieee80211_is_valid_amsdu(struct sk_buff *skb, u8 mesh_hdr) { int offset = 0, remaining, subframe_len, padding; @@ -790,12 +807,8 @@ bool ieee80211_is_valid_amsdu(struct sk_buff *skb, bool mesh_hdr) if (skb_copy_bits(skb, offset + 2 * ETH_ALEN, &hdr, sizeof(hdr)) < 0) return false; - if (mesh_hdr) - len = le16_to_cpu(*(__le16 *)&hdr.len) + - __ieee80211_get_mesh_hdrlen(hdr.mesh_flags); - else - len = ntohs(hdr.len); - + len = ieee80211_amsdu_subframe_length(&hdr.len, hdr.mesh_flags, + mesh_hdr); subframe_len = sizeof(struct ethhdr) + len; padding = (4 - subframe_len) & 0x3; remaining = skb->len - offset; @@ -812,7 +825,7 @@ void ieee80211_amsdu_to_8023s(struct sk_buff *skb, struct sk_buff_head *list, const u8 *addr, enum nl80211_iftype iftype, const unsigned int extra_headroom, const u8 *check_da, const u8 *check_sa, - bool mesh_control) + u8 mesh_control) { unsigned int hlen = ALIGN(extra_headroom, 4); struct sk_buff *frame = NULL; @@ -837,11 +850,8 @@ void ieee80211_amsdu_to_8023s(struct sk_buff *skb, struct sk_buff_head *list, skb_copy_bits(skb, offset, &hdr, copy_len); if (iftype == NL80211_IFTYPE_MESH_POINT) mesh_len = __ieee80211_get_mesh_hdrlen(hdr.flags); - if (mesh_control) - len = le16_to_cpu(*(__le16 *)&hdr.eth.h_proto) + mesh_len; - else - len = ntohs(hdr.eth.h_proto); - + len = ieee80211_amsdu_subframe_length(&hdr.eth.h_proto, hdr.flags, + mesh_control); subframe_len = sizeof(struct ethhdr) + len; padding = (4 - subframe_len) & 0x3; diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 2ac58b282b5e..cc1e7f15fa73 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -1301,9 +1301,10 @@ static int xsk_mmap(struct file *file, struct socket *sock, loff_t offset = (loff_t)vma->vm_pgoff << PAGE_SHIFT; unsigned long size = vma->vm_end - vma->vm_start; struct xdp_sock *xs = xdp_sk(sock->sk); + int state = READ_ONCE(xs->state); struct xsk_queue *q = NULL; - if (READ_ONCE(xs->state) != XSK_READY) + if (state != XSK_READY && state != XSK_BOUND) return -EBUSY; if (offset == XDP_PGOFF_RX_RING) { @@ -1314,9 +1315,11 @@ static int xsk_mmap(struct file *file, struct socket *sock, /* Matches the smp_wmb() in XDP_UMEM_REG */ smp_rmb(); if (offset == XDP_UMEM_PGOFF_FILL_RING) - q = READ_ONCE(xs->fq_tmp); + q = state == XSK_READY ? READ_ONCE(xs->fq_tmp) : + READ_ONCE(xs->pool->fq); else if (offset == XDP_UMEM_PGOFF_COMPLETION_RING) - q = READ_ONCE(xs->cq_tmp); + q = state == XSK_READY ? READ_ONCE(xs->cq_tmp) : + READ_ONCE(xs->pool->cq); } if (!q) diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h index bfb2a7e50c26..6d40a77fccbe 100644 --- a/net/xdp/xsk_queue.h +++ b/net/xdp/xsk_queue.h @@ -133,16 +133,12 @@ static inline bool xskq_cons_read_addr_unchecked(struct xsk_queue *q, u64 *addr) static inline bool xp_aligned_validate_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc) { - u64 chunk, chunk_end; + u64 offset = desc->addr & (pool->chunk_size - 1); - chunk = xp_aligned_extract_addr(pool, desc->addr); - if (likely(desc->len)) { - chunk_end = xp_aligned_extract_addr(pool, desc->addr + desc->len - 1); - if (chunk != chunk_end) - return false; - } + if (offset + desc->len > pool->chunk_size) + return false; - if (chunk >= pool->addrs_cnt) + if (desc->addr >= pool->addrs_cnt) return false; if (desc->options) @@ -153,15 +149,12 @@ static inline bool xp_aligned_validate_desc(struct xsk_buff_pool *pool, static inline bool xp_unaligned_validate_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc) { - u64 addr, base_addr; - - base_addr = xp_unaligned_extract_addr(desc->addr); - addr = xp_unaligned_add_offset_to_addr(desc->addr); + u64 addr = xp_unaligned_add_offset_to_addr(desc->addr); if (desc->len > pool->chunk_size) return false; - if (base_addr >= pool->addrs_cnt || addr >= pool->addrs_cnt || + if (addr >= pool->addrs_cnt || addr + desc->len > pool->addrs_cnt || xp_desc_crosses_non_contig_pg(pool, addr, desc->len)) return false; diff --git a/net/xdp/xskmap.c b/net/xdp/xskmap.c index 771d0fa90ef5..2c1427074a3b 100644 --- a/net/xdp/xskmap.c +++ b/net/xdp/xskmap.c @@ -24,6 +24,7 @@ static struct xsk_map_node *xsk_map_node_alloc(struct xsk_map *map, return ERR_PTR(-ENOMEM); bpf_map_inc(&map->map); + atomic_inc(&map->count); node->map = map; node->map_entry = map_entry; @@ -32,8 +33,11 @@ static struct xsk_map_node *xsk_map_node_alloc(struct xsk_map *map, static void xsk_map_node_free(struct xsk_map_node *node) { + struct xsk_map *map = node->map; + bpf_map_put(&node->map->map); kfree(node); + atomic_dec(&map->count); } static void xsk_map_sock_add(struct xdp_sock *xs, struct xsk_map_node *node) @@ -85,6 +89,14 @@ static struct bpf_map *xsk_map_alloc(union bpf_attr *attr) return &m->map; } +static u64 xsk_map_mem_usage(const struct bpf_map *map) +{ + struct xsk_map *m = container_of(map, struct xsk_map, map); + + return struct_size(m, xsk_map, map->max_entries) + + (u64)atomic_read(&m->count) * sizeof(struct xsk_map_node); +} + static void xsk_map_free(struct bpf_map *map) { struct xsk_map *m = container_of(map, struct xsk_map, map); @@ -150,8 +162,8 @@ static void *xsk_map_lookup_elem_sys_only(struct bpf_map *map, void *key) return ERR_PTR(-EOPNOTSUPP); } -static int xsk_map_update_elem(struct bpf_map *map, void *key, void *value, - u64 map_flags) +static long xsk_map_update_elem(struct bpf_map *map, void *key, void *value, + u64 map_flags) { struct xsk_map *m = container_of(map, struct xsk_map, map); struct xdp_sock __rcu **map_entry; @@ -211,7 +223,7 @@ out: return err; } -static int xsk_map_delete_elem(struct bpf_map *map, void *key) +static long xsk_map_delete_elem(struct bpf_map *map, void *key) { struct xsk_map *m = container_of(map, struct xsk_map, map); struct xdp_sock __rcu **map_entry; @@ -231,7 +243,7 @@ static int xsk_map_delete_elem(struct bpf_map *map, void *key) return 0; } -static int xsk_map_redirect(struct bpf_map *map, u64 index, u64 flags) +static long xsk_map_redirect(struct bpf_map *map, u64 index, u64 flags) { return __bpf_xdp_redirect_map(map, index, flags, 0, __xsk_map_lookup_elem); @@ -267,6 +279,7 @@ const struct bpf_map_ops xsk_map_ops = { .map_update_elem = xsk_map_update_elem, .map_delete_elem = xsk_map_delete_elem, .map_check_btf = map_check_no_btf, + .map_mem_usage = xsk_map_mem_usage, .map_btf_id = &xsk_map_btf_ids[0], .map_redirect = xsk_map_redirect, }; diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c index 95f1436bf6a2..bef28c6187eb 100644 --- a/net/xfrm/xfrm_device.c +++ b/net/xfrm/xfrm_device.c @@ -287,7 +287,7 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x, return (is_packet_offload) ? -EINVAL : 0; } - if (x->props.flags & XFRM_STATE_ESN && + if (!is_packet_offload && x->props.flags & XFRM_STATE_ESN && !dev->xfrmdev_ops->xdo_dev_state_advance_esn) { NL_SET_ERR_MSG(extack, "Device doesn't support offload with ESN"); xso->dev = NULL; diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c index 436d29640ac2..39fb91ff23d9 100644 --- a/net/xfrm/xfrm_input.c +++ b/net/xfrm/xfrm_input.c @@ -231,9 +231,6 @@ static int xfrm4_remove_tunnel_encap(struct xfrm_state *x, struct sk_buff *skb) { int err = -EINVAL; - if (XFRM_MODE_SKB_CB(skb)->protocol != IPPROTO_IPIP) - goto out; - if (!pskb_may_pull(skb, sizeof(struct iphdr))) goto out; @@ -269,8 +266,6 @@ static int xfrm6_remove_tunnel_encap(struct xfrm_state *x, struct sk_buff *skb) { int err = -EINVAL; - if (XFRM_MODE_SKB_CB(skb)->protocol != IPPROTO_IPV6) - goto out; if (!pskb_may_pull(skb, sizeof(struct ipv6hdr))) goto out; @@ -331,22 +326,26 @@ out: */ static int xfrm_inner_mode_encap_remove(struct xfrm_state *x, - const struct xfrm_mode *inner_mode, struct sk_buff *skb) { - switch (inner_mode->encap) { + switch (x->props.mode) { case XFRM_MODE_BEET: - if (inner_mode->family == AF_INET) + switch (XFRM_MODE_SKB_CB(skb)->protocol) { + case IPPROTO_IPIP: + case IPPROTO_BEETPH: return xfrm4_remove_beet_encap(x, skb); - if (inner_mode->family == AF_INET6) + case IPPROTO_IPV6: return xfrm6_remove_beet_encap(x, skb); + } break; case XFRM_MODE_TUNNEL: - if (inner_mode->family == AF_INET) + switch (XFRM_MODE_SKB_CB(skb)->protocol) { + case IPPROTO_IPIP: return xfrm4_remove_tunnel_encap(x, skb); - if (inner_mode->family == AF_INET6) + case IPPROTO_IPV6: return xfrm6_remove_tunnel_encap(x, skb); break; + } } WARN_ON_ONCE(1); @@ -355,9 +354,7 @@ xfrm_inner_mode_encap_remove(struct xfrm_state *x, static int xfrm_prepare_input(struct xfrm_state *x, struct sk_buff *skb) { - const struct xfrm_mode *inner_mode = &x->inner_mode; - - switch (x->outer_mode.family) { + switch (x->props.family) { case AF_INET: xfrm4_extract_header(skb); break; @@ -369,17 +366,12 @@ static int xfrm_prepare_input(struct xfrm_state *x, struct sk_buff *skb) return -EAFNOSUPPORT; } - if (x->sel.family == AF_UNSPEC) { - inner_mode = xfrm_ip2inner_mode(x, XFRM_MODE_SKB_CB(skb)->protocol); - if (!inner_mode) - return -EAFNOSUPPORT; - } - - switch (inner_mode->family) { - case AF_INET: + switch (XFRM_MODE_SKB_CB(skb)->protocol) { + case IPPROTO_IPIP: + case IPPROTO_BEETPH: skb->protocol = htons(ETH_P_IP); break; - case AF_INET6: + case IPPROTO_IPV6: skb->protocol = htons(ETH_P_IPV6); break; default: @@ -387,7 +379,7 @@ static int xfrm_prepare_input(struct xfrm_state *x, struct sk_buff *skb) break; } - return xfrm_inner_mode_encap_remove(x, inner_mode, skb); + return xfrm_inner_mode_encap_remove(x, skb); } /* Remove encapsulation header. @@ -433,17 +425,16 @@ static int xfrm6_transport_input(struct xfrm_state *x, struct sk_buff *skb) } static int xfrm_inner_mode_input(struct xfrm_state *x, - const struct xfrm_mode *inner_mode, struct sk_buff *skb) { - switch (inner_mode->encap) { + switch (x->props.mode) { case XFRM_MODE_BEET: case XFRM_MODE_TUNNEL: return xfrm_prepare_input(x, skb); case XFRM_MODE_TRANSPORT: - if (inner_mode->family == AF_INET) + if (x->props.family == AF_INET) return xfrm4_transport_input(x, skb); - if (inner_mode->family == AF_INET6) + if (x->props.family == AF_INET6) return xfrm6_transport_input(x, skb); break; case XFRM_MODE_ROUTEOPTIMIZATION: @@ -461,7 +452,6 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type) { const struct xfrm_state_afinfo *afinfo; struct net *net = dev_net(skb->dev); - const struct xfrm_mode *inner_mode; int err; __be32 seq; __be32 seq_hi; @@ -491,7 +481,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type) goto drop; } - family = x->outer_mode.family; + family = x->props.family; /* An encap_type of -1 indicates async resumption. */ if (encap_type == -1) { @@ -676,17 +666,7 @@ resume: XFRM_MODE_SKB_CB(skb)->protocol = nexthdr; - inner_mode = &x->inner_mode; - - if (x->sel.family == AF_UNSPEC) { - inner_mode = xfrm_ip2inner_mode(x, XFRM_MODE_SKB_CB(skb)->protocol); - if (inner_mode == NULL) { - XFRM_INC_STATS(net, LINUX_MIB_XFRMINSTATEMODEERROR); - goto drop; - } - } - - if (xfrm_inner_mode_input(x, inner_mode, skb)) { + if (xfrm_inner_mode_input(x, skb)) { XFRM_INC_STATS(net, LINUX_MIB_XFRMINSTATEMODEERROR); goto drop; } @@ -701,7 +681,7 @@ resume: * transport mode so the outer address is identical. */ daddr = &x->id.daddr; - family = x->outer_mode.family; + family = x->props.family; err = xfrm_parse_spi(skb, nexthdr, &spi, &seq); if (err < 0) { @@ -732,7 +712,7 @@ resume: err = -EAFNOSUPPORT; rcu_read_lock(); - afinfo = xfrm_state_afinfo_get_rcu(x->inner_mode.family); + afinfo = xfrm_state_afinfo_get_rcu(x->props.family); if (likely(afinfo)) err = afinfo->transport_finish(skb, xfrm_gro || async); rcu_read_unlock(); diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c index ff114d68cc43..369e5de8558f 100644 --- a/net/xfrm/xfrm_output.c +++ b/net/xfrm/xfrm_output.c @@ -412,7 +412,7 @@ static int xfrm4_prepare_output(struct xfrm_state *x, struct sk_buff *skb) IPCB(skb)->flags |= IPSKB_XFRM_TUNNEL_SIZE; skb->protocol = htons(ETH_P_IP); - switch (x->outer_mode.encap) { + switch (x->props.mode) { case XFRM_MODE_BEET: return xfrm4_beet_encap_add(x, skb); case XFRM_MODE_TUNNEL: @@ -435,7 +435,7 @@ static int xfrm6_prepare_output(struct xfrm_state *x, struct sk_buff *skb) skb->ignore_df = 1; skb->protocol = htons(ETH_P_IPV6); - switch (x->outer_mode.encap) { + switch (x->props.mode) { case XFRM_MODE_BEET: return xfrm6_beet_encap_add(x, skb); case XFRM_MODE_TUNNEL: @@ -451,22 +451,22 @@ static int xfrm6_prepare_output(struct xfrm_state *x, struct sk_buff *skb) static int xfrm_outer_mode_output(struct xfrm_state *x, struct sk_buff *skb) { - switch (x->outer_mode.encap) { + switch (x->props.mode) { case XFRM_MODE_BEET: case XFRM_MODE_TUNNEL: - if (x->outer_mode.family == AF_INET) + if (x->props.family == AF_INET) return xfrm4_prepare_output(x, skb); - if (x->outer_mode.family == AF_INET6) + if (x->props.family == AF_INET6) return xfrm6_prepare_output(x, skb); break; case XFRM_MODE_TRANSPORT: - if (x->outer_mode.family == AF_INET) + if (x->props.family == AF_INET) return xfrm4_transport_output(x, skb); - if (x->outer_mode.family == AF_INET6) + if (x->props.family == AF_INET6) return xfrm6_transport_output(x, skb); break; case XFRM_MODE_ROUTEOPTIMIZATION: - if (x->outer_mode.family == AF_INET6) + if (x->props.family == AF_INET6) return xfrm6_ro_output(x, skb); WARN_ON_ONCE(1); break; @@ -875,21 +875,10 @@ static int xfrm6_extract_output(struct xfrm_state *x, struct sk_buff *skb) static int xfrm_inner_extract_output(struct xfrm_state *x, struct sk_buff *skb) { - const struct xfrm_mode *inner_mode; - - if (x->sel.family == AF_UNSPEC) - inner_mode = xfrm_ip2inner_mode(x, - xfrm_af2proto(skb_dst(skb)->ops->family)); - else - inner_mode = &x->inner_mode; - - if (inner_mode == NULL) - return -EAFNOSUPPORT; - - switch (inner_mode->family) { - case AF_INET: + switch (skb->protocol) { + case htons(ETH_P_IP): return xfrm4_extract_output(x, skb); - case AF_INET6: + case htons(ETH_P_IPV6): return xfrm6_extract_output(x, skb); } diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c index 50baf50dc513..49e63eea841d 100644 --- a/net/xfrm/xfrm_state.c +++ b/net/xfrm/xfrm_state.c @@ -1272,6 +1272,7 @@ found: xso->dir = xdo->dir; xso->dev = xdo->dev; xso->real_dev = xdo->real_dev; + xso->flags = XFRM_DEV_OFFLOAD_FLAG_ACQ; netdev_tracker_alloc(xso->dev, &xso->dev_tracker, GFP_ATOMIC); error = xso->dev->xfrmdev_ops->xdo_dev_state_add(x, NULL); diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c index 103af2b3e986..d720e163ae6e 100644 --- a/net/xfrm/xfrm_user.c +++ b/net/xfrm/xfrm_user.c @@ -901,6 +901,8 @@ static void copy_to_user_state(struct xfrm_state *x, struct xfrm_usersa_info *p) memcpy(&p->id, &x->id, sizeof(p->id)); memcpy(&p->sel, &x->sel, sizeof(p->sel)); memcpy(&p->lft, &x->lft, sizeof(p->lft)); + if (x->xso.dev) + xfrm_dev_state_update_curlft(x); memcpy(&p->curlft, &x->curlft, sizeof(p->curlft)); put_unaligned(x->stats.replay_window, &p->stats.replay_window); put_unaligned(x->stats.replay, &p->stats.replay); |