summaryrefslogtreecommitdiff
path: root/include/uapi
AgeCommit message (Collapse)Author
2021-10-26mctp: Implement extended addressingJeremy Kerr
This change allows an extended address struct - struct sockaddr_mctp_ext - to be passed to sendmsg/recvmsg. This allows userspace to specify output ifindex and physical address information (for sendmsg) or receive the input ifindex/physaddr for incoming messages (for recvmsg). This is typically used by userspace for MCTP address discovery and assignment operations. The extended addressing facility is conditional on a new sockopt: MCTP_OPT_ADDR_EXT; userspace must explicitly enable addressing before the kernel will consume/populate the extended address data. Includes a fix for an uninitialised var: Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25net: phy: add constants for fast retrain related registerLuo Jie
Add the constants for 2.5G fast retrain capability in 10G AN control register, fast retrain status and control register and THP bypass register into mdio.h. Signed-off-by: Luo Jie <luoj@codeaurora.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-24can: netlink: add interface for CAN-FD Transmitter Delay Compensation (TDC)Vincent Mailhol
Add the netlink interface for TDC parameters of struct can_tdc_const and can_tdc. Contrary to the can_bittiming(_const) structures for which there is just a single IFLA_CAN(_DATA)_BITTMING(_CONST) entry per structure, here, we create a nested entry IFLA_CAN_TDC. Within this nested entry, additional IFLA_CAN_TDC_TDC* entries are added for each of the TDC parameters of the newly introduced struct can_tdc_const and struct can_tdc. For struct can_tdc_const, these are: IFLA_CAN_TDC_TDCV_MIN IFLA_CAN_TDC_TDCV_MAX IFLA_CAN_TDC_TDCO_MIN IFLA_CAN_TDC_TDCO_MAX IFLA_CAN_TDC_TDCF_MIN IFLA_CAN_TDC_TDCF_MAX For struct can_tdc, these are: IFLA_CAN_TDC_TDCV IFLA_CAN_TDC_TDCO IFLA_CAN_TDC_TDCF This is done so that changes can be applied in the future to the structures without breaking the netlink interface. The TDC netlink logic works as follow: * CAN_CTRLMODE_FD is not provided: - if any TDC parameters are provided: error. - TDC parameters not provided: TDC parameters unchanged. * CAN_CTRLMODE_FD is provided and is false: - TDC is deactivated: both the structure and the CAN_CTRLMODE_TDC_{AUTO,MANUAL} flags are flushed. * CAN_CTRLMODE_FD provided and is true: - CAN_CTRLMODE_TDC_{AUTO,MANUAL} and tdc{v,o,f} not provided: call can_calc_tdco() to automatically decide whether TDC should be activated and, if so, set CAN_CTRLMODE_TDC_AUTO and uses the calculated tdco value. - CAN_CTRLMODE_TDC_AUTO and tdco provided: set CAN_CTRLMODE_TDC_AUTO and use the provided tdco value. Here, tdcv is illegal and tdcf is optional. - CAN_CTRLMODE_TDC_MANUAL and both of tdcv and tdco provided: set CAN_CTRLMODE_TDC_MANUAL and use the provided tdcv and tdco value. Here, tdcf is optional. - CAN_CTRLMODE_TDC_{AUTO,MANUAL} are mutually exclusive. Whenever one flag is turned on, the other will automatically be turned off. Providing both returns an error. - Combination other than the one listed above are illegal and will return an error. N.B. above rules mean that whenever CAN_CTRLMODE_FD is provided, the previous TDC values will be overwritten. The only option to reuse previous TDC value is to not provide CAN_CTRLMODE_FD. All the new parameters are defined as u32. This arbitrary choice is done to mimic the other bittiming values with are also all of type u32. An u16 would have been sufficient to hold the TDC values. This patch completes below series (c.f. [1]): - commit 289ea9e4ae59 ("can: add new CAN FD bittiming parameters: Transmitter Delay Compensation (TDC)") - commit c25cc7993243 ("can: bittiming: add calculation for CAN FD Transmitter Delay Compensation (TDC)") [1] https://lore.kernel.org/linux-can/20210224002008.4158-1-mailhol.vincent@wanadoo.fr/T/#t Link: https://lore.kernel.org/all/20210918095637.20108-5-mailhol.vincent@wanadoo.fr Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-10-24can: bittiming: allow TDC{V,O} to be zero and add can_tdc_const::tdc{v,o,f}_minVincent Mailhol
ISO 11898-1 specifies in section 11.3.3 "Transmitter delay compensation" that "the configuration range for [the] SSP position shall be at least 0 to 63 minimum time quanta." Because SSP = TDCV + TDCO, it means that we should allow both TDCV and TDCO to hold zero value in order to honor SSP's minimum possible value. However, current implementation assigned special meaning to TDCV and TDCO's zero values: * TDCV = 0 -> TDCV is automatically measured by the transceiver. * TDCO = 0 -> TDC is off. In order to allow for those values to really be zero and to maintain current features, we introduce two new flags: * CAN_CTRLMODE_TDC_AUTO indicates that the controller support automatic measurement of TDCV. * CAN_CTRLMODE_TDC_MANUAL indicates that the controller support manual configuration of TDCV. N.B.: current implementation failed to provide an option for the driver to indicate that only manual mode was supported. TDC is disabled if both CAN_CTRLMODE_TDC_AUTO and CAN_CTRLMODE_TDC_MANUAL flags are off, c.f. the helper function can_tdc_is_enabled() which is also introduced in this patch. Also, this patch adds three fields: tdcv_min, tdco_min and tdcf_min to struct can_tdc_const. While we are not convinced that those three fields could be anything else than zero, we can imagine that some controllers might specify a lower bound on these. Thus, those minimums are really added "just in case". Comments of struct can_tdc and can_tdc_const are updated accordingly. Finally, the changes are applied to the etas_es58x driver. Link: https://lore.kernel.org/all/20210918095637.20108-2-mailhol.vincent@wanadoo.fr Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-10-22Merge tag 'mac80211-next-for-net-next-2021-10-21' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== Quite a few changes: * the applicable eth_hw_addr_set() and const hw_addr changes * various code cleanups/refactorings * stack usage reductions across the wireless stack * some unstructured find_ie() -> structured find_element() changes * a few more pieces of multi-BSSID support * some 6 GHz regulatory support * 6 GHz support in hwsim, for testing userspace code * Light Communications (LC, 802.11bb) early band definitions to be able to add a first driver soon * tag 'mac80211-next-for-net-next-2021-10-21' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next: (35 commits) cfg80211: fix kernel-doc for MBSSID EMA mac80211: Prevent AP probing during suspend nl80211: Add LC placeholder band definition to nl80211_band ... ==================== Link: https://lore.kernel.org/r/20211021154953.134849-1-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Lots of simnple overlapping additions. With a build fix from Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-21Merge tag 'net-5.15-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter, and can. We'll have one more fix for a socket accounting regression, it's still getting polished. Otherwise things look fine. Current release - regressions: - revert "vrf: reset skb conntrack connection on VRF rcv", there are valid uses for previous behavior - can: m_can: fix iomap_read_fifo() and iomap_write_fifo() Current release - new code bugs: - mlx5: e-switch, return correct error code on group creation failure Previous releases - regressions: - sctp: fix transport encap_port update in sctp_vtag_verify - stmmac: fix E2E delay mechanism (in PTP timestamping) Previous releases - always broken: - netfilter: ip6t_rt: fix out-of-bounds read of ipv6_rt_hdr - netfilter: xt_IDLETIMER: fix out-of-bound read caused by lack of init - netfilter: ipvs: make global sysctl read-only in non-init netns - tcp: md5: fix selection between vrf and non-vrf keys - ipv6: count rx stats on the orig netdev when forwarding - bridge: mcast: use multicast_membership_interval for IGMPv3 - can: - j1939: fix UAF for rx_kref of j1939_priv abort sessions on receiving bad messages - isotp: fix TX buffer concurrent access in isotp_sendmsg() fix return error on FC timeout on TX path - ice: fix re-init of RDMA Tx queues and crash if RDMA was not inited - hns3: schedule the polling again when allocation fails, prevent stalls - drivers: add missing of_node_put() when aborting for_each_available_child_of_node() - ptp: fix possible memory leak and UAF in ptp_clock_register() - e1000e: fix packet loss in burst mode on Tiger Lake and later - mlx5e: ipsec: fix more checksum offload issues" * tag 'net-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (75 commits) usbnet: sanity check for maxpacket net: enetc: make sure all traffic classes can send large frames net: enetc: fix ethtool counter name for PM0_TERR ptp: free 'vclock_index' in ptp_clock_release() sfc: Don't use netif_info before net_device setup sfc: Export fibre-specific supported link modes net/mlx5e: IPsec: Fix work queue entry ethernet segment checksum flags net/mlx5e: IPsec: Fix a misuse of the software parser's fields net/mlx5e: Fix vlan data lost during suspend flow net/mlx5: E-switch, Return correct error code on group creation failure net/mlx5: Lag, change multipath and bonding to be mutually exclusive ice: Add missing E810 device ids igc: Update I226_K device ID e1000e: Fix packet loss on Tiger Lake and later e1000e: Separate TGP board type from SPT ptp: Fix possible memory leak in ptp_clock_register() net: stmmac: Fix E2E delay mechanism nfc: st95hf: Make spi remove() callback return zero net: hns3: disable sriov before unload hclge layer net: hns3: fix vf reset workqueue cannot exit ...
2021-10-21cfg80211: fix kernel-doc for MBSSID EMAJohannes Berg
The struct member ema_max_profile_periodicity was listed with the wrong name in the kernel-doc, fix that. Link: https://lore.kernel.org/r/20211021173038.18ec2030c66b.Iac731bb299525940948adad2c41f514b7dd81c47@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-10-21nl80211: Add LC placeholder band definition to nl80211_bandSrinivasan Raju
Define LC band which is a draft under IEEE 802.11bb. Current NL80211_BAND_LC is a placeholder band and will be more defined IEEE 802.11bb progresses. Signed-off-by: Srinivasan Raju <srini.raju@purelifi.com> Link: https://lore.kernel.org/r/20211018100143.7565-2-srini.raju@purelifi.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-10-21nl80211: vendor-cmd: intel: add more details for ↵Emmanuel Grumbach
IWL_MVM_VENDOR_CMD_HOST_GET_OWNERSHIP Explain more the expected flow for this command. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Link: https://lore.kernel.org/r/20211020051147.29297-1-emmanuel.grumbach@intel.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-10-20fq_codel: generalise ce_threshold marking for subset of trafficToke Høiland-Jørgensen
Commit e72aeb9ee0e3 ("fq_codel: implement L4S style ce_threshold_ect1 marking") expanded the ce_threshold feature of FQ-CoDel so it can be applied to a subset of the traffic, using the ECT(1) bit of the ECN field as the classifier. However, hard-coding ECT(1) as the only classifier for this feature seems limiting, so let's expand it to be more general. To this end, change the parameter from a ce_threshold_ect1 boolean, to a one-byte selector/mask pair (ce_threshold_{selector,mask}) which is applied to the whole diffserv/ECN field in the IP header. This makes it possible to classify packets by any value in either the ECN field or the diffserv field. In particular, setting a selector of INET_ECN_ECT_1 and a mask of INET_ECN_MASK corresponds to the functionality before this patch, and a mask of ~INET_ECN_MASK allows using the selector as a straight-forward match against a diffserv code point: # apply ce_threshold to ECT(1) traffic tc qdisc replace dev eth0 root fq_codel ce_threshold 1ms ce_threshold_selector 0x1/0x3 # apply ce_threshold to ECN-capable traffic marked as diffserv AF22 tc qdisc replace dev eth0 root fq_codel ce_threshold 1ms ce_threshold_selector 0x50/0xfc Regardless of the selector chosen, the normal rules for ECN-marking of packets still apply, i.e., the flow must still declare itself ECN-capable by setting one of the bits in the ECN field to get marked at all. v2: - Add tc usage examples to patch description Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20211019174709.69081-1-toke@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS for net-next: 1) Add new run_estimation toggle to IPVS to stop the estimation_timer logic, from Dust Li. 2) Relax superfluous dynset check on NFT_SET_TIMEOUT. 3) Add egress hook, from Lukas Wunner. 4) Nowadays, almost all hook functions in x_table land just call the hook evaluation loop. Remove remaining hook wrappers from iptables and IPVS. From Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-18ether: add EtherType for proprietary Realtek protocolsAlvin Šipraga
Add a new EtherType ETH_P_REALTEK to the if_ether.h uapi header. The EtherType 0x8899 is used in a number of different protocols from Realtek Semiconductor Corp [1], so no general assumptions should be made when trying to decode such packets. Observed protocols include: 0x1 - Realtek Remote Control protocol [2] 0x2 - Echo protocol [2] 0x3 - Loop detection protocol [2] 0x4 - RTL8365MB 4- and 8-byte switch CPU tag protocols [3] 0x9 - RTL8306 switch CPU tag protocol [4] 0xA - RTL8366RB switch CPU tag protocol [4] [1] https://lore.kernel.org/netdev/CACRpkdYQthFgjwVzHyK3DeYUOdcYyWmdjDPG=Rf9B3VrJ12Rzg@mail.gmail.com/ [2] https://www.wireshark.org/lists/ethereal-dev/200409/msg00090.html [3] https://lore.kernel.org/netdev/20210822193145.1312668-4-alvin@pqrs.dk/ [4] https://lore.kernel.org/netdev/20200708122537.1341307-2-linus.walleij@linaro.org/ Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-18mctp: Be explicit about struct sockaddr_mctp paddingJeremy Kerr
We currently have some implicit padding in struct sockaddr_mctp. This patch makes this padding explicit, and ensures we have consistent layout on platforms with <32bit alignmnent. Fixes: 60fc63981693 ("mctp: Add sockaddr_mctp to uapi") Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-18mctp: unify sockaddr_mctp typesJeremy Kerr
Use the more precise __kernel_sa_family_t for smctp_family, to match struct sockaddr. Also, use an unsigned int for the network member; negative networks don't make much sense. We're already using unsigned for mctp_dev and mctp_skb_cb, but need to change mctp_sock to suit. Fixes: 60fc63981693 ("mctp: Add sockaddr_mctp to uapi") Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Acked-by: Eugene Syromiatnikov <esyr@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-17Merge tag 'char-misc-5.15-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are some small char/misc driver fixes for 5.15-rc6 for reported issues that include: - habanalabs driver fixes - mei driver fixes and new ids - fpga new device ids - MAINTAINER file updates for fpga subsystem - spi module id table additions and fixes - fastrpc locking fixes - nvmem driver fix All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: eeprom: 93xx46: fix MODULE_DEVICE_TABLE nvmem: Fix shift-out-of-bound (UBSAN) with byte size cells mei: hbm: drop hbm responses on early shutdown mei: me: add Ice Lake-N device id. eeprom: 93xx46: Add SPI device ID table eeprom: at25: Add SPI ID table misc: HI6421V600_IRQ should depend on HAS_IOMEM misc: fastrpc: Add missing lock before accessing find_vma() cb710: avoid NULL pointer subtraction misc: gehc: Add SPI ID table MAINTAINERS: Drop outdated FPGA Manager website MAINTAINERS: Add Hao and Yilun as maintainers habanalabs: fix resetting args in wait for CS IOCTL fpga: ice40-spi: Add SPI device ID table
2021-10-16net/smc: add netlink support for SMC-Rv2Karsten Graul
Implement the netlink support for SMC-Rv2 related attributes that are provided to user space. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15fq_codel: implement L4S style ce_threshold_ect1 markingEric Dumazet
Add TCA_FQ_CODEL_CE_THRESHOLD_ECT1 boolean option to select Low Latency, Low Loss, Scalable Throughput (L4S) style marking, along with ce_threshold. If enabled, only packets with ECT(1) can be transformed to CE if their sojourn time is above the ce_threshold. Note that this new option does not change rules for codel law. In particular, if TCA_FQ_CODEL_ECN is left enabled (this is the default when fq_codel qdisc is created), ECT(0) packets can still get CE if codel law (as governed by limit/target) decides so. Section 4.3.b of current draft [1] states: b. A scheduler with per-flow queues such as FQ-CoDel or FQ-PIE can be used for L4S. For instance within each queue of an FQ-CoDel system, as well as a CoDel AQM, there is typically also ECN marking at an immediate (unsmoothed) shallow threshold to support use in data centres (see Sec.5.2.7 of [RFC8290]). This can be modified so that the shallow threshold is solely applied to ECT(1) packets. Then if there is a flow of non-ECN or ECT(0) packets in the per-flow-queue, the Classic AQM (e.g. CoDel) is applied; while if there is a flow of ECT(1) packets in the queue, the shallower (typically sub-millisecond) threshold is applied. Tested: tc qd replace dev eth1 root fq_codel ce_threshold_ect1 50usec netperf ... -t TCP_STREAM -- K dctcp tc -s -d qd sh dev eth1 qdisc fq_codel 8022: root refcnt 32 limit 10240p flows 1024 quantum 9212 target 5ms ce_threshold_ect1 49us interval 100ms memory_limit 32Mb ecn drop_batch 64 Sent 14388596616 bytes 9543449 pkt (dropped 0, overlimits 0 requeues 152013) backlog 0b 0p requeues 152013 maxpacket 68130 drop_overlimit 0 new_flow_count 95678 ecn_mark 0 ce_mark 7639 new_flows_len 0 old_flows_len 0 [1] L4S current draft: https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-l4s-arch Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Ingemar Johansson S <ingemar.s.johansson@ericsson.com> Cc: Tom Henderson <tomh@tomh.org> Cc: Bob Briscoe <in@bobbriscoe.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-14netfilter: Introduce egress hookLukas Wunner
Support classifying packets with netfilter on egress to satisfy user requirements such as: * outbound security policies for containers (Laura) * filtering and mangling intra-node Direct Server Return (DSR) traffic on a load balancer (Laura) * filtering locally generated traffic coming in through AF_PACKET, such as local ARP traffic generated for clustering purposes or DHCP (Laura; the AF_PACKET plumbing is contained in a follow-up commit) * L2 filtering from ingress and egress for AVB (Audio Video Bridging) and gPTP with nftables (Pablo) * in the future: in-kernel NAT64/NAT46 (Pablo) The egress hook introduced herein complements the ingress hook added by commit e687ad60af09 ("netfilter: add netfilter ingress hook after handle_ing() under unique static key"). A patch for nftables to hook up egress rules from user space has been submitted separately, so users may immediately take advantage of the feature. Alternatively or in addition to netfilter, packets can be classified with traffic control (tc). On ingress, packets are classified first by tc, then by netfilter. On egress, the order is reversed for symmetry. Conceptually, tc and netfilter can be thought of as layers, with netfilter layered above tc. Traffic control is capable of redirecting packets to another interface (man 8 tc-mirred). E.g., an ingress packet may be redirected from the host namespace to a container via a veth connection: tc ingress (host) -> tc egress (veth host) -> tc ingress (veth container) In this case, netfilter egress classifying is not performed when leaving the host namespace! That's because the packet is still on the tc layer. If tc redirects the packet to a physical interface in the host namespace such that it leaves the system, the packet is never subjected to netfilter egress classifying. That is only logical since it hasn't passed through netfilter ingress classifying either. Packets can alternatively be redirected at the netfilter layer using nft fwd. Such a packet *is* subjected to netfilter egress classifying since it has reached the netfilter layer. Internally, the skb->nf_skip_egress flag controls whether netfilter is invoked on egress by __dev_queue_xmit(). Because __dev_queue_xmit() may be called recursively by tunnel drivers such as vxlan, the flag is reverted to false after sch_handle_egress(). This ensures that netfilter is applied both on the overlay and underlying network. Interaction between tc and netfilter is possible by setting and querying skb->mark. If netfilter egress classifying is not enabled on any interface, it is patched out of the data path by way of a static_key and doesn't make a performance difference that is discernible from noise: Before: 1537 1538 1538 1537 1538 1537 Mb/sec After: 1536 1534 1539 1539 1539 1540 Mb/sec Before + tc accept: 1418 1418 1418 1419 1419 1418 Mb/sec After + tc accept: 1419 1424 1418 1419 1422 1420 Mb/sec Before + tc drop: 1620 1619 1619 1619 1620 1620 Mb/sec After + tc drop: 1616 1624 1625 1624 1622 1619 Mb/sec When netfilter egress classifying is enabled on at least one interface, a minimal performance penalty is incurred for every egress packet, even if the interface it's transmitted over doesn't have any netfilter egress rules configured. That is caused by checking dev->nf_hooks_egress against NULL. Measurements were performed on a Core i7-3615QM. Commands to reproduce: ip link add dev foo type dummy ip link set dev foo up modprobe pktgen echo "add_device foo" > /proc/net/pktgen/kpktgend_3 samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh -i foo -n 400000000 -m "11:11:11:11:11:11" -d 1.1.1.1 Accept all traffic with tc: tc qdisc add dev foo clsact tc filter add dev foo egress bpf da bytecode '1,6 0 0 0,' Drop all traffic with tc: tc qdisc add dev foo clsact tc filter add dev foo egress bpf da bytecode '1,6 0 0 2,' Apply this patch when measuring packet drops to avoid errors in dmesg: https://lore.kernel.org/netdev/a73dda33-57f4-95d8-ea51-ed483abd6a7a@iogearbox.net/ Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: Laura García Liébana <nevola@gmail.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-10-12net, neigh: Add NTF_MANAGED flag for managed neighbor entriesDaniel Borkmann
Allow a user space control plane to insert entries with a new NTF_EXT_MANAGED flag. The flag then indicates to the kernel that the neighbor entry should be periodically probed for keeping the entry in NUD_REACHABLE state iff possible. The use case for this is targeting XDP or tc BPF load-balancers which use the bpf_fib_lookup() BPF helper in order to piggyback on neighbor resolution for their backends. Given they cannot be resolved in fast-path, a control plane inserts the L3 (without L2) entries manually into the neighbor table and lets the kernel do the neighbor resolution either on the gateway or on the backend directly in case the latter resides in the same L2. This avoids to deal with L2 in the control plane and to rebuild what the kernel already does best anyway. NTF_EXT_MANAGED can be combined with NTF_EXT_LEARNED in order to avoid GC eviction. The kernel then adds NTF_MANAGED flagged entries to a per-neighbor table which gets triggered by the system work queue to periodically call neigh_event_send() for performing the resolution. The implementation allows migration from/to NTF_MANAGED neighbor entries, so that already existing entries can be converted by the control plane if needed. Potentially, we could make the interval for periodically calling neigh_event_send() configurable; right now it's set to DELAY_PROBE_TIME which is also in line with mlxsw which has similar driver-internal infrastructure c723c735fa6b ("mlxsw: spectrum_router: Periodically update the kernel's neigh table"). In future, the latter could possibly reuse the NTF_MANAGED neighbors as well. Example: # ./ip/ip n replace 192.168.178.30 dev enp5s0 managed extern_learn # ./ip/ip n 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a managed extern_learn REACHABLE [...] Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Roopa Prabhu <roopa@nvidia.com> Link: https://linuxplumbersconf.org/event/11/contributions/953/ Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-12net, neigh: Extend neigh->flags to 32 bit to allow for extensionsRoopa Prabhu
Currently, all bits in struct ndmsg's ndm_flags are used up with the most recent addition of 435f2e7cc0b7 ("net: bridge: add support for sticky fdb entries"). This makes it impossible to extend the neighboring subsystem with new NTF_* flags: struct ndmsg { __u8 ndm_family; __u8 ndm_pad1; __u16 ndm_pad2; __s32 ndm_ifindex; __u16 ndm_state; __u8 ndm_flags; __u8 ndm_type; }; There are ndm_pad{1,2} attributes which are not used. However, due to uncareful design, the kernel does not enforce them to be zero upon new neighbor entry addition, and given they've been around forever, it is not possible to reuse them today due to risk of breakage. One option to overcome this limitation is to add a new NDA_FLAGS_EXT attribute for extended flags. In struct neighbour, there is a 3 byte hole between protocol and ha_lock, which allows neigh->flags to be extended from 8 to 32 bits while still being on the same cacheline as before. This also allows for all future NTF_* flags being in neigh->flags rather than yet another flags field. Unknown flags in NDA_FLAGS_EXT will be rejected by the kernel. Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Roopa Prabhu <roopa@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-08vsock: Enable y2038 safe timeval for timeoutRichard Palethorpe
Reuse the timeval compat code from core/sock to handle 32-bit and 64-bit timeval structures. Also introduce a new socket option define to allow using y2038 safe timeval under 32-bit. The existing behavior of sock_set_timeout and vsock's timeout setter differ when the time value is out of bounds. vsocks current behavior is retained at the expense of not being able to share the full implementation. This allows the LTP test vsock01 to pass under 32-bit compat mode. Fixes: fe0c72f3db11 ("socket: move compat timeout handling into sock.c") Signed-off-by: Richard Palethorpe <rpalethorpe@suse.com> Cc: Richard Palethorpe <rpalethorpe@richiejp.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-07Merge tag 'net-5.15-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from xfrm, bpf, netfilter, and wireless. Current release - regressions: - xfrm: fix XFRM_MSG_MAPPING ABI breakage caused by inserting a new value in the middle of an enum - unix: fix an issue in unix_shutdown causing the other end read/write failures - phy: mdio: fix memory leak Current release - new code bugs: - mlx5e: improve MQPRIO resiliency against bad configs Previous releases - regressions: - bpf: fix integer overflow leading to OOB access in map element pre-allocation - stmmac: dwmac-rk: fix ethernet on rk3399 based devices - netfilter: conntrack: fix boot failure with nf_conntrack.enable_hooks=1 - brcmfmac: revert using ISO3166 country code and 0 rev as fallback - i40e: fix freeing of uninitialized misc IRQ vector - iavf: fix double unlock of crit_lock Previous releases - always broken: - bpf, arm: fix register clobbering in div/mod implementation - netfilter: nf_tables: correct issues in netlink rule change event notifications - dsa: tag_dsa: fix mask for trunked packets - usb: r8152: don't resubmit rx immediately to avoid soft lockup on device unplug - i40e: fix endless loop under rtnl if FW fails to correctly respond to capability query - mlx5e: fix rx checksum offload coexistence with ipsec offload - mlx5: force round second at 1PPS out start time and allow it only in supported clock modes - phy: pcs: xpcs: fix incorrect CL37 AN sequence, EEE disable sequence Misc: - xfrm: slightly rejig the new policy uAPI to make it less cryptic" * tag 'net-5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (66 commits) net: prefer socket bound to interface when not in VRF iavf: fix double unlock of crit_lock i40e: Fix freeing of uninitialized misc IRQ vector i40e: fix endless loop under rtnl dt-bindings: net: dsa: marvell: fix compatible in example ionic: move filter sync_needed bit set gve: report 64bit tx_bytes counter from gve_handle_report_stats() gve: fix gve_get_stats() rtnetlink: fix if_nlmsg_stats_size() under estimation gve: Properly handle errors in gve_assign_qpl gve: Avoid freeing NULL pointer gve: Correct available tx qpl check unix: Fix an issue in unix_shutdown causing the other end read/write failures net: stmmac: trigger PCS EEE to turn off on link down net: pcs: xpcs: fix incorrect steps on disable EEE netlink: annotate data races around nlk->bound net: pcs: xpcs: fix incorrect CL37 AN sequence net: sfp: Fix typo in state machine debug string net/sched: sch_taprio: properly cancel timer from taprio_destroy() net: bridge: fix under estimation in br_get_linkxstats_size() ...
2021-10-07Merge tag 'hyperv-fixes-signed-20211007' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fixes from Wei Liu: - Replace uuid.h with types.h in a header (Andy Shevchenko) - Avoid sleeping in atomic context in PCI driver (Long Li) - Avoid sending IPI to self when it shouldn't (Vitaly Kuznetsov) * tag 'hyperv-fixes-signed-20211007' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: x86/hyperv: Avoid erroneously sending IPI to 'self' hyper-v: Replace uuid.h with types.h PCI: hv: Fix sleep while in non-sleep context when removing child devices from the bus
2021-10-07Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/David S. Miller
ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2021-10-07 1) Fix a sysbot reported shift-out-of-bounds in xfrm_get_default. From Pavel Skripkin. 2) Fix XFRM_MSG_MAPPING ABI breakage. The new XFRM_MSG_MAPPING messages were accidentally not paced at the end. Fix by Eugene Syromiatnikov. 3) Fix the uapi for the default policy, use explicit field and macros and make it accessible to userland. From Nicolas Dichtel. 4) Fix a missing rcu lock in xfrm_notify_userpolicy(). From Nicolas Dichtel. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-06ethtool: Add transceiver module extended stateIdo Schimmel
Add an extended state and sub-state to describe link issues related to transceiver modules. The 'ETHTOOL_LINK_EXT_SUBSTATE_MODULE_CMIS_NOT_READY' extended sub-state tells user space that port is unable to gain a carrier because the CMIS Module State Machine did not reach the ModuleReady (Fully Operational) state. For example, if the module is stuck at ModuleLowPwr or ModuleFault state. In case of the latter, user space can read the fault reason from the module's EEPROM and potentially reset it. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-06ethtool: Add ability to control transceiver modules' power modeIdo Schimmel
Add a pair of new ethtool messages, 'ETHTOOL_MSG_MODULE_SET' and 'ETHTOOL_MSG_MODULE_GET', that can be used to control transceiver modules parameters and retrieve their status. The first parameter to control is the power mode of the module. It is only relevant for paged memory modules, as flat memory modules always operate in low power mode. When a paged memory module is in low power mode, its power consumption is reduced to the minimum, the management interface towards the host is available and the data path is deactivated. User space can choose to put modules that are not currently in use in low power mode and transition them to high power mode before putting the associated ports administratively up. This is useful for user space that favors reduced power consumption and lower temperatures over reduced link up times. In QSFP-DD modules the transition from low power mode to high power mode can take a few seconds and this transition is only expected to get longer with future / more complex modules. User space can control the power mode of the module via the power mode policy attribute ('ETHTOOL_A_MODULE_POWER_MODE_POLICY'). Possible values: * high: Module is always in high power mode. * auto: Module is transitioned by the host to high power mode when the first port using it is put administratively up and to low power mode when the last port using it is put administratively down. The operational power mode of the module is available to user space via the 'ETHTOOL_A_MODULE_POWER_MODE' attribute. The attribute is not reported to user space when a module is not plugged-in. The user API is designed to be generic enough so that it could be used for modules with different memory maps (e.g., SFF-8636, CMIS). The only implementation of the device driver API in this series is for a MAC driver (mlxsw) where the module is controlled by the device's firmware, but it is designed to be generic enough so that it could also be used by implementations where the module is controlled by the CPU. CMIS testing ============ # ethtool -m swp11 Identifier : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628)) ... Module State : 0x03 (ModuleReady) LowPwrAllowRequestHW : Off LowPwrRequestSW : Off The module is not in low power mode, as it is not forced by hardware (LowPwrAllowRequestHW is off) or by software (LowPwrRequestSW is off). The power mode can be queried from the kernel. In case LowPwrAllowRequestHW was on, the kernel would need to take into account the state of the LowPwrRequestHW signal, which is not visible to user space. $ ethtool --show-module swp11 Module parameters for swp11: power-mode-policy high power-mode high Change the power mode policy to 'auto': # ethtool --set-module swp11 power-mode-policy auto Query the power mode again: $ ethtool --show-module swp11 Module parameters for swp11: power-mode-policy auto power-mode low Verify with the data read from the EEPROM: # ethtool -m swp11 Identifier : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628)) ... Module State : 0x01 (ModuleLowPwr) LowPwrAllowRequestHW : Off LowPwrRequestSW : On Put the associated port administratively up which will instruct the host to transition the module to high power mode: # ip link set dev swp11 up Query the power mode again: $ ethtool --show-module swp11 Module parameters for swp11: power-mode-policy auto power-mode high Verify with the data read from the EEPROM: # ethtool -m swp11 Identifier : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628)) ... Module State : 0x03 (ModuleReady) LowPwrAllowRequestHW : Off LowPwrRequestSW : Off Put the associated port administratively down which will instruct the host to transition the module to low power mode: # ip link set dev swp11 down Query the power mode again: $ ethtool --show-module swp11 Module parameters for swp11: power-mode-policy auto power-mode low Verify with the data read from the EEPROM: # ethtool -m swp11 Identifier : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628)) ... Module State : 0x01 (ModuleLowPwr) LowPwrAllowRequestHW : Off LowPwrRequestSW : On SFF-8636 testing ================ # ethtool -m swp13 Identifier : 0x11 (QSFP28) ... Extended identifier description : 5.0W max. Power consumption, High Power Class (> 3.5 W) enabled Power set : Off Power override : On ... Transmit avg optical power (Channel 1) : 0.7733 mW / -1.12 dBm Transmit avg optical power (Channel 2) : 0.7649 mW / -1.16 dBm Transmit avg optical power (Channel 3) : 0.7790 mW / -1.08 dBm Transmit avg optical power (Channel 4) : 0.7837 mW / -1.06 dBm Rcvr signal avg optical power(Channel 1) : 0.9302 mW / -0.31 dBm Rcvr signal avg optical power(Channel 2) : 0.9079 mW / -0.42 dBm Rcvr signal avg optical power(Channel 3) : 0.8993 mW / -0.46 dBm Rcvr signal avg optical power(Channel 4) : 0.8778 mW / -0.57 dBm The module is not in low power mode, as it is not forced by hardware (Power override is on) or by software (Power set is off). The power mode can be queried from the kernel. In case Power override was off, the kernel would need to take into account the state of the LPMode signal, which is not visible to user space. $ ethtool --show-module swp13 Module parameters for swp13: power-mode-policy high power-mode high Change the power mode policy to 'auto': # ethtool --set-module swp13 power-mode-policy auto Query the power mode again: $ ethtool --show-module swp13 Module parameters for swp13: power-mode-policy auto power-mode low Verify with the data read from the EEPROM: # ethtool -m swp13 Identifier : 0x11 (QSFP28) Extended identifier description : 5.0W max. Power consumption, High Power Class (> 3.5 W) not enabled Power set : On Power override : On ... Transmit avg optical power (Channel 1) : 0.0000 mW / -inf dBm Transmit avg optical power (Channel 2) : 0.0000 mW / -inf dBm Transmit avg optical power (Channel 3) : 0.0000 mW / -inf dBm Transmit avg optical power (Channel 4) : 0.0000 mW / -inf dBm Rcvr signal avg optical power(Channel 1) : 0.0000 mW / -inf dBm Rcvr signal avg optical power(Channel 2) : 0.0000 mW / -inf dBm Rcvr signal avg optical power(Channel 3) : 0.0000 mW / -inf dBm Rcvr signal avg optical power(Channel 4) : 0.0000 mW / -inf dBm Put the associated port administratively up which will instruct the host to transition the module to high power mode: # ip link set dev swp13 up Query the power mode again: $ ethtool --show-module swp13 Module parameters for swp13: power-mode-policy auto power-mode high Verify with the data read from the EEPROM: # ethtool -m swp13 Identifier : 0x11 (QSFP28) ... Extended identifier description : 5.0W max. Power consumption, High Power Class (> 3.5 W) enabled Power set : Off Power override : On ... Transmit avg optical power (Channel 1) : 0.7934 mW / -1.01 dBm Transmit avg optical power (Channel 2) : 0.7859 mW / -1.05 dBm Transmit avg optical power (Channel 3) : 0.7885 mW / -1.03 dBm Transmit avg optical power (Channel 4) : 0.7985 mW / -0.98 dBm Rcvr signal avg optical power(Channel 1) : 0.9325 mW / -0.30 dBm Rcvr signal avg optical power(Channel 2) : 0.9034 mW / -0.44 dBm Rcvr signal avg optical power(Channel 3) : 0.9086 mW / -0.42 dBm Rcvr signal avg optical power(Channel 4) : 0.8885 mW / -0.51 dBm Put the associated port administratively down which will instruct the host to transition the module to low power mode: # ip link set dev swp13 down Query the power mode again: $ ethtool --show-module swp13 Module parameters for swp13: power-mode-policy auto power-mode low Verify with the data read from the EEPROM: # ethtool -m swp13 Identifier : 0x11 (QSFP28) ... Extended identifier description : 5.0W max. Power consumption, High Power Class (> 3.5 W) not enabled Power set : On Power override : On ... Transmit avg optical power (Channel 1) : 0.0000 mW / -inf dBm Transmit avg optical power (Channel 2) : 0.0000 mW / -inf dBm Transmit avg optical power (Channel 3) : 0.0000 mW / -inf dBm Transmit avg optical power (Channel 4) : 0.0000 mW / -inf dBm Rcvr signal avg optical power(Channel 1) : 0.0000 mW / -inf dBm Rcvr signal avg optical power(Channel 2) : 0.0000 mW / -inf dBm Rcvr signal avg optical power(Channel 3) : 0.0000 mW / -inf dBm Rcvr signal avg optical power(Channel 4) : 0.0000 mW / -inf dBm Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-06hyper-v: Replace uuid.h with types.hAndy Shevchenko
There is no user of anything in uuid.h in the hyperv.h. Replace it with more appropriate types.h. Fixes: f081bbb3fd03 ("hyper-v: Remove internal types from UAPI header") Reported-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://lore.kernel.org/r/20211001135544.1823-1-andriy.shevchenko@linux.intel.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2021-10-04ipv6: ioam: Add support for the ip6ip6 encapsulationJustin Iurman
This patch adds support for the ip6ip6 encapsulation by providing three encap modes: inline, encap and auto. Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-04Merge tag 'misc-habanalabs-fixes-2021-09-29' of ↵Greg Kroah-Hartman
https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux into char-misc-linus Oded writes: This tag contains the following fix for 5.15-rc4: - Prevent memset of ioctl arguments in case driver returns -EINTR * tag 'misc-habanalabs-fixes-2021-09-29' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux: habanalabs: fix resetting args in wait for CS IOCTL
2021-10-01Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski
Daniel Borkmann says: ==================== bpf-next 2021-10-02 We've added 85 non-merge commits during the last 15 day(s) which contain a total of 132 files changed, 13779 insertions(+), 6724 deletions(-). The main changes are: 1) Massive update on test_bpf.ko coverage for JITs as preparatory work for an upcoming MIPS eBPF JIT, from Johan Almbladh. 2) Add a batched interface for RX buffer allocation in AF_XDP buffer pool, with driver support for i40e and ice from Magnus Karlsson. 3) Add legacy uprobe support to libbpf to complement recently merged legacy kprobe support, from Andrii Nakryiko. 4) Add bpf_trace_vprintk() as variadic printk helper, from Dave Marchevsky. 5) Support saving the register state in verifier when spilling <8byte bounded scalar to the stack, from Martin Lau. 6) Add libbpf opt-in for stricter BPF program section name handling as part of libbpf 1.0 effort, from Andrii Nakryiko. 7) Add a document to help clarifying BPF licensing, from Alexei Starovoitov. 8) Fix skel_internal.h to propagate errno if the loader indicates an internal error, from Kumar Kartikeya Dwivedi. 9) Fix build warnings with -Wcast-function-type so that the option can later be enabled by default for the kernel, from Kees Cook. 10) Fix libbpf to ignore STT_SECTION symbols in legacy map definitions as it otherwise errors out when encountering them, from Toke Høiland-Jørgensen. 11) Teach libbpf to recognize specialized maps (such as for perf RB) and internally remove BTF type IDs when creating them, from Hengqi Chen. 12) Various fixes and improvements to BPF selftests. ==================== Link: https://lore.kernel.org/r/20211002001327.15169-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-01devlink: report maximum number of snapshots with regionsJacob Keller
Each region has an independently configurable number of maximum snapshots. This information is not reported to userspace, making it not very discoverable. Fix this by adding a new DEVLINK_ATTR_REGION_MAX_SNAPSHOST attribute which is used to report this maximum. Ex: $devlink region pci/0000:af:00.0/nvm-flash: size 10485760 snapshot [] max 1 pci/0000:af:00.0/device-caps: size 4096 snapshot [] max 10 pci/0000:af:00.1/nvm-flash: size 10485760 snapshot [] max 1 pci/0000:af:00.1/device-caps: size 4096 snapshot [] max 10 This information enables users to understand why a new region command may fail due to having too many existing snapshots. Reported-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
drivers/net/phy/bcm7xxx.c d88fd1b546ff ("net: phy: bcm7xxx: Fixed indirect MMD operations") f68d08c437f9 ("net: phy: bcm7xxx: Add EPHY entry for 72165") net/sched/sch_api.c b193e15ac69d ("net: prevent user from passing illegal stab size") 69508d43334e ("net_sched: Use struct_size() and flex_array_size() helpers") Both cases trivial - adjacent code additions. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-30net: add new socket option SO_RESERVE_MEMWei Wang
This socket option provides a mechanism for users to reserve a certain amount of memory for the socket to use. When this option is set, kernel charges the user specified amount of memory to memcg, as well as sk_forward_alloc. This amount of memory is not reclaimable and is available in sk_forward_alloc for this socket. With this socket option set, the networking stack spends less cycles doing forward alloc and reclaim, which should lead to better system performance, with the cost of an amount of pre-allocated and unreclaimable memory, even under memory pressure. Note: This socket option is only available when memory cgroup is enabled and we require this reserved memory to be charged to the user's memcg. We hope this could avoid mis-behaving users to abused this feature to reserve a large amount on certain sockets and cause unfairness for others. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-29Merge tag 'sound-5.15-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "This became a slightly large collection of changes, partly because I've been off in the last weeks. Most of changes are small and scattered while a bit big change is found in HD-audio Realtek codec driver; it's a very device-specific fix that has been long wanted, so I decided to pick up although it's in the middle RC. Some highlights: - A new guard ioctl for ALSA rawmidi API to avoid the misuse of the new timestamp framing mode; it's for a regression fix - HD-audio: a revert of the 5.15 change that might work badly, new quirks for Lenovo Legion & co, a follow-up fix for CS8409 - ASoC: lots of SOF-related fixes, fsl component fixes, corrections of mediatek drivers - USB-audio: fix for the PM resume - FireWire: oxfw and motu fixes" * tag 'sound-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (25 commits) ALSA: pcsp: Make hrtimer forwarding more robust ALSA: rawmidi: introduce SNDRV_RAWMIDI_IOCTL_USER_PVERSION ALSA: firewire-motu: fix truncated bytes in message tracepoints ASoC: SOF: trace: Omit error print when waking up trace sleepers ASoC: mediatek: mt8195: remove wrong fixup assignment on HDMITX ASoC: SOF: loader: Re-phrase the missing firmware error to avoid duplication ASoC: SOF: loader: release_firmware() on load failure to avoid batching ALSA: hda/cs8409: Setup Dolphin Headset Mic as Phantom Jack ALSA: pcxhr: "fix" PCXHR_REG_TO_PORT definition ASoC: SOF: imx: imx8m: Bar index is only valid for IRAM and SRAM types ASoC: SOF: imx: imx8: Bar index is only valid for IRAM and SRAM types ASoC: SOF: Fix DSP oops stack dump output contents ALSA: hda/realtek: Quirks to enable speaker output for Lenovo Legion 7i 15IMHG05, Yoga 7i 14ITL5/15ITL5, and 13s Gen2 laptops. ALSA: usb-audio: Unify mixer resume and reset_resume procedure Revert "ALSA: hda: Drop workaround for a hang at shutdown again" ALSA: oxfw: fix transmission method for Loud models based on OXFW971 ASoC: mediatek: common: handle NULL case in suspend/resume function ASoC: fsl_xcvr: register platform component before registering cpu dai ASoC: fsl_spdif: register platform component before registering cpu dai ASoC: fsl_micfil: register platform component before registering cpu dai ...
2021-09-29habanalabs: fix resetting args in wait for CS IOCTLRajaravi Krishna Katta
In wait for CS IOCTL code, the driver resets the incoming args structure before returning to the user, regardless of the return value of the IOCTL. In case the IOCTL returns EINTR, resetting the args will result in error in case the userspace will repeat the ioctl call immediately (which is the behavior in the hl-thunk userspace library). The solution is to reset the args only if the driver returns success (0) as a return value for the IOCTL. Signed-off-by: Rajaravi Krishna Katta <rkatta@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-09-27nl80211: MBSSID and EMA support in AP modeJohn Crispin
Add new attributes to configure support for multiple BSSID and advanced multi-BSSID advertisements (EMA) in AP mode. - NL80211_ATTR_MBSSID_CONFIG used for per interface configuration. - NL80211_ATTR_MBSSID_ELEMS used to MBSSID elements for beacons. Memory for the elements is allocated dynamically. This change frees the memory in existing functions which call nl80211_parse_beacon(), a comment is added to indicate the new references to do the same. Signed-off-by: John Crispin <john@phrozen.org> Co-developed-by: Aloka Dixit <alokad@codeaurora.org> Signed-off-by: Aloka Dixit <alokad@codeaurora.org> Link: https://lore.kernel.org/r/20210916025437.29138-2-alokad@codeaurora.org [don't leave ERR_PTR hanging around] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-09-27cfg80211: AP mode driver offload for FILS association cryptoSubrat Mishra
Add a driver FILS crypto offload extended capability flag to indicate that the driver running in AP mode is capable of handling encryption and decryption of (Re)Association request and response frames. Add a command to set FILS AAD data to driver. This feature is supported on drivers running in AP mode only. This extended capability is exchanged with hostapd during cfg80211 init. If the driver indicates this capability, then before sending the Authentication response frame, hostapd sets FILS AAD data to the driver. This allows the driver to decrypt (Re)Association Request frame and encrypt (Re)Association Response frame. FILS Key derivation will still be done in hostapd. Signed-off-by: Subrat Mishra <subratm@codeaurora.org> Link: https://lore.kernel.org/r/1631685143-13530-1-git-send-email-subratm@codeaurora.org [fix whitespace] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-09-25Merge tag 'char-misc-5.15-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are some small char and misc driver fixes for 5.15-rc3. Nothing huge in here, just fixes for a number of small issues that have been reported. These include: - habanalabs race conditions and other bugs fixed - binder driver fixes - fpga driver fixes - coresight build warning fix - nvmem driver fix - comedi memory leak fix - bcm-vk tty race fix - other tiny driver fixes All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (21 commits) comedi: Fix memory leak in compat_insnlist() nvmem: NVMEM_NINTENDO_OTP should depend on WII misc: bcm-vk: fix tty registration race fpga: dfl: Avoid reads to AFU CSRs during enumeration fpga: machxo2-spi: Fix missing error code in machxo2_write_complete() fpga: machxo2-spi: Return an error on failure habanalabs: expose a single cs seq in staged submissions habanalabs: fix wait offset handling habanalabs: rate limit multi CS completion errors habanalabs/gaudi: fix LBW RR configuration habanalabs: Fix spelling mistake "FEADBACK" -> "FEEDBACK" habanalabs: fail collective wait when not supported habanalabs/gaudi: use direct MSI in single mode habanalabs: fix kernel OOPs related to staged cs habanalabs: fix potential race in interrupt wait ioctl mcb: fix error handling in mcb_alloc_bus() misc: genwqe: Fixes DMA mask setting coresight: syscfg: Fix compiler warning nvmem: core: Add stubs for nvmem_cell_read_variable_le_u32/64 if !CONFIG_NVMEM binder: make sure fd closes complete ...
2021-09-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
net/mptcp/protocol.c 977d293e23b4 ("mptcp: ensure tx skbs always have the MPTCP ext") efe686ffce01 ("mptcp: ensure tx skbs always have the MPTCP ext") same patch merged in both trees, keep net-next. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-23ALSA: rawmidi: introduce SNDRV_RAWMIDI_IOCTL_USER_PVERSIONJaroslav Kysela
The new framing mode causes the user space regression, because the alsa-lib code does not initialize the reserved space in the params structure when the device is opened. This change adds SNDRV_RAWMIDI_IOCTL_USER_PVERSION like we do for the PCM interface for the protocol acknowledgment. Cc: David Henningsson <coding@diwic.se> Cc: <stable@vger.kernel.org> Fixes: 08fdced60ca0 ("ALSA: rawmidi: Add framing mode") BugLink: https://github.com/alsa-project/alsa-lib/issues/178 Signed-off-by: Jaroslav Kysela <perex@perex.cz> Link: https://lore.kernel.org/r/20210920171850.154186-1-perex@perex.cz Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-09-20Merge tag '5.15-rc1-smb3' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs client fixes from Steve French: - two deferred close fixes (for bugs found with xfstests 478 and 461) - a deferred close improvement in rename - two trivial fixes for incorrect Linux comment formatting of multiple cifs files (pointed out by automated kernel test robot and checkpatch) * tag '5.15-rc1-smb3' of git://git.samba.org/sfrench/cifs-2.6: cifs: Not to defer close on file when lock is set cifs: Fix soft lockup during fsstress cifs: Deferred close performance improvements cifs: fix incorrect kernel doc comments cifs: remove pathname for file from SPDX header
2021-09-18mptcp: add MPTCP_SUBFLOW_ADDRS getsockopt supportFlorian Westphal
This retrieves the address pairs of all subflows currently active for a given mptcp connection. It re-uses the same meta-header as for MPTCP_TCPINFO. A new structure is provided to hold the subflow address data: struct mptcp_subflow_addrs { union { __kernel_sa_family_t sa_family; struct sockaddr sa_local; struct sockaddr_in sin_local; struct sockaddr_in6 sin6_local; struct sockaddr_storage ss_local; }; union { struct sockaddr sa_remote; struct sockaddr_in sin_remote; struct sockaddr_in6 sin6_remote; struct sockaddr_storage ss_remote; }; }; Usage of the new getsockopt is very similar to MPTCP_TCPINFO one. Userspace allocates a 'struct mptcp_subflow_data', followed by one or more 'struct mptcp_subflow_addrs', then inits the mptcp_subflow_data structure as follows: struct mptcp_subflow_addrs *sf_addr; struct mptcp_subflow_data *addr; socklen_t olen = sizeof(*addr) + (8 * sizeof(*sf_addr)); addr = malloc(olen); addr->size_subflow_data = sizeof(*addr); addr->num_subflows = 0; addr->size_kernel = 0; addr->size_user = sizeof(struct mptcp_subflow_addrs); sf_addr = (struct mptcp_subflow_addrs *)(addr + 1); and then retrieves the endpoint addresses via: ret = getsockopt(fd, SOL_MPTCP, MPTCP_SUBFLOW_ADDRS, addr, &olen); If the call succeeds, kernel will have added up to 8 endpoint addresses after the 'mptcp_subflow_data' header. Userspace needs to re-check 'olen' value to detect how many bytes have been filled in by the kernel. Userspace can check addr->num_subflows to discover when there were more subflows that available data space. Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-18mptcp: add MPTCP_TCPINFO getsockopt supportFlorian Westphal
Allow users to retrieve TCP_INFO data of all subflows. Users need to pre-initialize a meta header that has to be prepended to the data buffer that will be filled with the tcp info data. The meta header looks like this: struct mptcp_subflow_data { __u32 size_subflow_data;/* size of this structure in userspace */ __u32 num_subflows; /* must be 0, set by kernel */ __u32 size_kernel; /* must be 0, set by kernel */ __u32 size_user; /* size of one element in data[] */ } __attribute__((aligned(8))); size_subflow_data has to be set to 'sizeof(struct mptcp_subflow_data)'. This allows to extend mptcp_subflow_data structure later on without breaking backwards compatibility. If the structure is extended later on, kernel knows where the userspace-provided meta header ends, even if userspace uses an older (smaller) version of the structure. num_subflows must be set to 0. If the getsockopt request succeeds (return value is 0), it will be updated to contain the number of active subflows for the given logical connection. size_kernel must be set to 0. If the getsockopt request is successful, it will contain the size of the 'struct tcp_info' as known by the kernel. This is informational only. size_user must be set to 'sizeof(struct tcp_info)'. This allows the kernel to only fill in the space reserved/expected by userspace. Example: struct my_tcp_info { struct mptcp_subflow_data d; struct tcp_info ti[2]; }; struct my_tcp_info ti; socklen_t olen; memset(&ti, 0, sizeof(ti)); ti.d.size_subflow_data = sizeof(struct mptcp_subflow_data); ti.d.size_user = sizeof(struct tcp_info); olen = sizeof(ti); ret = getsockopt(fd, SOL_MPTCP, MPTCP_TCPINFO, &ti, &olen); if (ret < 0) die_perror("getsockopt MPTCP_TCPINFO"); mptcp_subflow_data.num_subflows is populated with the number of subflows that exist on the kernel side for the logical mptcp connection. This allows userspace to re-try with a larger tcp_info array if the number of subflows was larger than the available space in the ti[] array. olen has to be set to the number of bytes that userspace has allocated to receive the kernel data. It will be updated to contain the real number bytes that have been copied to by the kernel. In the above example, if the number if subflows was 1, olen is equal to 'sizeof(struct mptcp_subflow_data) + sizeof(struct tcp_info). For 2 or more subflows olen is equal to 'sizeof(struct my_tcp_info)'. If there was more data that could not be copied due to lack of space in the option buffer, userspace can detect this by checking mptcp_subflow_data->num_subflows. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-18mptcp: add MPTCP_INFO getsockoptFlorian Westphal
Its not compatible with multipath-tcp.org kernel one. 1. The out-of-tree implementation defines a different 'struct mptcp_info', with embedded __user addresses for additional data such as endpoint addresses. 2. Mat Martineau points out that embedded __user addresses doesn't work with BPF_CGROUP_RUN_PROG_GETSOCKOPT() which assumes that copying in optsize bytes from optval provides all data that got copied to userspace. This provides mptcp_info data for the given mptcp socket. Userspace sets optlen to the size of the structure it expects. The kernel updates it to contain the number of bytes that it copied. This allows to append more information to the structure later. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-17bpf: Clarify data_len param in bpf_snprintf and bpf_seq_printf commentsDave Marchevsky
Since the data_len in these two functions is a byte len of the preceding u64 *data array, it must always be a multiple of 8. If this isn't the case both helpers error out, so let's make the requirement explicit so users don't need to infer it. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210917182911.2426606-10-davemarchevsky@fb.com
2021-09-17bpf: Add bpf_trace_vprintk helperDave Marchevsky
This helper is meant to be "bpf_trace_printk, but with proper vararg support". Follow bpf_snprintf's example and take a u64 pseudo-vararg array. Write to /sys/kernel/debug/tracing/trace_pipe using the same mechanism as bpf_trace_printk. The functionality of this helper was requested in the libbpf issue tracker [0]. [0] Closes: https://github.com/libbpf/libbpf/issues/315 Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210917182911.2426606-4-davemarchevsky@fb.com
2021-09-17Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski
Alexei Starovoitov says: ==================== pull-request: bpf-next 2021-09-17 We've added 63 non-merge commits during the last 12 day(s) which contain a total of 65 files changed, 2653 insertions(+), 751 deletions(-). The main changes are: 1) Streamline internal BPF program sections handling and bpf_program__set_attach_target() in libbpf, from Andrii. 2) Add support for new btf kind BTF_KIND_TAG, from Yonghong. 3) Introduce bpf_get_branch_snapshot() to capture LBR, from Song. 4) IMUL optimization for x86-64 JIT, from Jie. 5) xsk selftest improvements, from Magnus. 6) Introduce legacy kprobe events support in libbpf, from Rafael. 7) Access hw timestamp through BPF's __sk_buff, from Vadim. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (63 commits) selftests/bpf: Fix a few compiler warnings libbpf: Constify all high-level program attach APIs libbpf: Schedule open_opts.attach_prog_fd deprecation since v0.7 selftests/bpf: Switch fexit_bpf2bpf selftest to set_attach_target() API libbpf: Allow skipping attach_func_name in bpf_program__set_attach_target() libbpf: Deprecated bpf_object_open_opts.relaxed_core_relocs selftests/bpf: Stop using relaxed_core_relocs which has no effect libbpf: Use pre-setup sec_def in libbpf_find_attach_btf_id() bpf: Update bpf_get_smp_processor_id() documentation libbpf: Add sphinx code documentation comments selftests/bpf: Skip btf_tag test if btf_tag attribute not supported docs/bpf: Add documentation for BTF_KIND_TAG selftests/bpf: Add a test with a bpf program with btf_tag attributes selftests/bpf: Test BTF_KIND_TAG for deduplication selftests/bpf: Add BTF_KIND_TAG unit tests selftests/bpf: Change NAME_NTH/IS_NAME_NTH for BTF_KIND_TAG format selftests/bpf: Test libbpf API function btf__add_tag() bpftool: Add support for BTF_KIND_TAG libbpf: Add support for BTF_KIND_TAG libbpf: Rename btf_{hash,equal}_int to btf_{hash,equal}_int_tag ... ==================== Link: https://lore.kernel.org/r/20210917173738.3397064-1-ast@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts! Signed-off-by: Jakub Kicinski <kuba@kernel.org>