summaryrefslogtreecommitdiff
path: root/drivers/infiniband
AgeCommit message (Collapse)Author
2023-02-24Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "Quite a small cycle this time, even with the rc8. I suppose everyone went to sleep over xmas. - Minor driver updates for hfi1, cxgb4, erdma, hns, irdma, mlx5, siw, mana - inline CQE support for hns - Have mlx5 display device error codes - Pinned DMABUF support for irdma - Continued rxe cleanups, particularly converting the MRs to use xarray - Improvements to what can be cached in the mlx5 mkey cache" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (61 commits) IB/mlx5: Extend debug control for CC parameters IB/hfi1: Fix sdma.h tx->num_descs off-by-one errors IB/hfi1: Fix math bugs in hfi1_can_pin_pages() RDMA/irdma: Add support for dmabuf pin memory regions RDMA/mlx5: Use query_special_contexts for mkeys net/mlx5e: Use query_special_contexts for mkeys net/mlx5: Change define name for 0x100 lkey value net/mlx5: Expose bits for querying special mkeys RDMA/rxe: Fix missing memory barriers in rxe_queue.h RDMA/mana_ib: Fix a bug when the PF indicates more entries for registering memory on first packet RDMA/rxe: Remove rxe_alloc() RDMA/cma: Distinguish between sockaddr_in and sockaddr_in6 by size Subject: RDMA/rxe: Handle zero length rdma iw_cxgb4: Fix potential NULL dereference in c4iw_fill_res_cm_id_entry() RDMA/mlx5: Use rdma_umem_for_each_dma_block() RDMA/umem: Remove unused 'work' member from struct ib_umem RDMA/irdma: Cap MSIX used to online CPUs + 1 RDMA/mlx5: Check reg_create() create for errors RDMA/restrack: Correct spelling RDMA/cxgb4: Fix potential null-ptr-deref in pass_establish() ...
2023-02-24Merge tag 'for-linus-iommufd' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: "Some polishing and small fixes for iommufd: - Remove IOMMU_CAP_INTR_REMAP, instead rely on the interrupt subsystem - Use GFP_KERNEL_ACCOUNT inside the iommu_domains - Support VFIO_NOIOMMU mode with iommufd - Various typos - A list corruption bug if HWPTs are used for attach" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: iommufd: Do not add the same hwpt to the ioas->hwpt_list twice iommufd: Make sure to zero vfio_iommu_type1_info before copying to user vfio: Support VFIO_NOIOMMU with iommufd iommufd: Add three missing structures in ucmd_buffer selftests: iommu: Fix test_cmd_destroy_access() call in user_copy iommu: Remove IOMMU_CAP_INTR_REMAP irq/s390: Add arch_is_isolated_msi() for s390 iommu/x86: Replace IOMMU_CAP_INTR_REMAP with IRQ_DOMAIN_FLAG_ISOLATED_MSI genirq/msi: Rename IRQ_DOMAIN_MSI_REMAP to IRQ_DOMAIN_ISOLATED_MSI genirq/irqdomain: Remove unused irq_domain_check_msi_remap() code iommufd: Convert to msi_device_has_isolated_msi() vfio/type1: Convert to iommu_group_has_isolated_msi() iommu: Add iommu_group_has_isolated_msi() genirq/msi: Add msi_device_has_isolated_msi()
2023-02-24Merge tag 'iommu-updates-v6.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: - Consolidate iommu_map/unmap functions. There have been blocking and atomic variants so far, but that was problematic as this approach does not scale with required new variants which just differ in the GFP flags used. So Jason consolidated this back into single functions that take a GFP parameter. - Retire the detach_dev() call-back in iommu_ops - Arm SMMU updates from Will: - Device-tree binding updates: - Cater for three power domains on SM6375 - Document existing compatible strings for Qualcomm SoCs - Tighten up clocks description for platform-specific compatible strings - Enable Qualcomm workarounds for some additional platforms that need them - Intel VT-d updates from Lu Baolu: - Add Intel IOMMU performance monitoring support - Set No Execute Enable bit in PASID table entry - Two performance optimizations - Fix PASID directory pointer coherency - Fix missed rollbacks in error path - Cleanups - Apple t8110 DART support - Exynos IOMMU: - Implement better fault handling - Error handling fixes - Renesas IPMMU: - Add device tree bindings for r8a779g0 - AMD IOMMU: - Various fixes for handling on SNP-enabled systems and handling of faults with unknown request-ids - Cleanups and other small fixes - Various other smaller fixes and cleanups * tag 'iommu-updates-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (71 commits) iommu/amd: Skip attach device domain is same as new domain iommu: Attach device group to old domain in error path iommu/vt-d: Allow to use flush-queue when first level is default iommu/vt-d: Fix PASID directory pointer coherency iommu/vt-d: Avoid superfluous IOTLB tracking in lazy mode iommu/vt-d: Fix error handling in sva enable/disable paths iommu/amd: Improve page fault error reporting iommu/amd: Do not identity map v2 capable device when snp is enabled iommu: Fix error unwind in iommu_group_alloc() iommu/of: mark an unused function as __maybe_unused iommu: dart: DART_T8110_ERROR range should be 0 to 5 iommu/vt-d: Enable IOMMU perfmon support iommu/vt-d: Add IOMMU perfmon overflow handler support iommu/vt-d: Support cpumask for IOMMU perfmon iommu/vt-d: Add IOMMU perfmon support iommu/vt-d: Support Enhanced Command Interface iommu/vt-d: Retrieve IOMMU perfmon capability information iommu/vt-d: Support size of the register set in DRHD iommu/vt-d: Set No Execute Enable bit in PASID table entry iommu/vt-d: Remove sva from intel_svm_dev ...
2023-02-23Merge tag 'mm-nonmm-stable-2023-02-20-15-29' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: "There is no particular theme here - mainly quick hits all over the tree. Most notable is a set of zlib changes from Mikhail Zaslonko which enhances and fixes zlib's use of S390 hardware support: 'lib/zlib: Set of s390 DFLTCC related patches for kernel zlib'" * tag 'mm-nonmm-stable-2023-02-20-15-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (55 commits) Update CREDITS file entry for Jesper Juhl sparc: allow PM configs for sparc32 COMPILE_TEST hung_task: print message when hung_task_warnings gets down to zero. arch/Kconfig: fix indentation scripts/tags.sh: fix the Kconfig tags generation when using latest ctags nilfs2: prevent WARNING in nilfs_dat_commit_end() lib/zlib: remove redundation assignement of avail_in dfltcc_gdht() lib/Kconfig.debug: do not enable DEBUG_PREEMPT by default lib/zlib: DFLTCC always switch to software inflate for Z_PACKET_FLUSH option lib/zlib: DFLTCC support inflate with small window lib/zlib: Split deflate and inflate states for DFLTCC lib/zlib: DFLTCC not writing header bits when avail_out == 0 lib/zlib: fix DFLTCC ignoring flush modes when avail_in == 0 lib/zlib: fix DFLTCC not flushing EOBS when creating raw streams lib/zlib: implement switching between DFLTCC and software lib/zlib: adjust offset calculation for dfltcc_state nilfs2: replace WARN_ONs for invalid DAT metadata block requests scripts/spelling.txt: add "exsits" pattern and fix typo instances fs: gracefully handle ->get_block not mapping bh in __mpage_writepage cramfs: Kconfig: fix spelling & punctuation ...
2023-02-23Merge tag 'mm-stable-2023-02-20-13-37' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Daniel Verkamp has contributed a memfd series ("mm/memfd: add F_SEAL_EXEC") which permits the setting of the memfd execute bit at memfd creation time, with the option of sealing the state of the X bit. - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare") which addresses a rare race condition related to PMD unsharing. - Several folioification patch serieses from Matthew Wilcox, Vishal Moola, Sidhartha Kumar and Lorenzo Stoakes - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which does perform some memcg maintenance and cleanup work. - SeongJae Park has added DAMOS filtering to DAMON, with the series "mm/damon/core: implement damos filter". These filters provide users with finer-grained control over DAMOS's actions. SeongJae has also done some DAMON cleanup work. - Kairui Song adds a series ("Clean up and fixes for swap"). - Vernon Yang contributed the series "Clean up and refinement for maple tree". - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It adds to MGLRU an LRU of memcgs, to improve the scalability of global reclaim. - David Hildenbrand has added some userfaultfd cleanup work in the series "mm: uffd-wp + change_protection() cleanups". - Christoph Hellwig has removed the generic_writepages() library function in the series "remove generic_writepages". - Baolin Wang has performed some maintenance on the compaction code in his series "Some small improvements for compaction". - Sidhartha Kumar is doing some maintenance work on struct page in his series "Get rid of tail page fields". - David Hildenbrand contributed some cleanup, bugfixing and generalization of pte management and of pte debugging in his series "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs". - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation flag in the series "Discard __GFP_ATOMIC". - Sergey Senozhatsky has improved zsmalloc's memory utilization with his series "zsmalloc: make zspage chain size configurable". - Joey Gouly has added prctl() support for prohibiting the creation of writeable+executable mappings. The previous BPF-based approach had shortcomings. See "mm: In-kernel support for memory-deny-write-execute (MDWE)". - Waiman Long did some kmemleak cleanup and bugfixing in the series "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF". - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series "mm: multi-gen LRU: improve". - Jiaqi Yan has provided some enhancements to our memory error statistics reporting, mainly by presenting the statistics on a per-node basis. See the series "Introduce per NUMA node memory error statistics". - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog regression in compaction via his series "Fix excessive CPU usage during compaction". - Christoph Hellwig does some vmalloc maintenance work in the series "cleanup vfree and vunmap". - Christoph Hellwig has removed block_device_operations.rw_page() in ths series "remove ->rw_page". - We get some maple_tree improvements and cleanups in Liam Howlett's series "VMA tree type safety and remove __vma_adjust()". - Suren Baghdasaryan has done some work on the maintainability of our vm_flags handling in the series "introduce vm_flags modifier functions". - Some pagemap cleanup and generalization work in Mike Rapoport's series "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and "fixups for generic implementation of pfn_valid()" - Baoquan He has done some work to make /proc/vmallocinfo and /proc/kcore better represent the real state of things in his series "mm/vmalloc.c: allow vread() to read out vm_map_ram areas". - Jason Gunthorpe rationalized the GUP system's interface to the rest of the kernel in the series "Simplify the external interface for GUP". - SeongJae Park wishes to migrate people from DAMON's debugfs interface over to its sysfs interface. To support this, we'll temporarily be printing warnings when people use the debugfs interface. See the series "mm/damon: deprecate DAMON debugfs interface". - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes and clean-ups" series. - Huang Ying has provided a dramatic reduction in migration's TLB flush IPI rates with the series "migrate_pages(): batch TLB flushing". - Arnd Bergmann has some objtool fixups in "objtool warning fixes". * tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits) include/linux/migrate.h: remove unneeded externs mm/memory_hotplug: cleanup return value handing in do_migrate_range() mm/uffd: fix comment in handling pte markers mm: change to return bool for isolate_movable_page() mm: hugetlb: change to return bool for isolate_hugetlb() mm: change to return bool for isolate_lru_page() mm: change to return bool for folio_isolate_lru() objtool: add UACCESS exceptions for __tsan_volatile_read/write kmsan: disable ftrace in kmsan core code kasan: mark addr_has_metadata __always_inline mm: memcontrol: rename memcg_kmem_enabled() sh: initialize max_mapnr m68k/nommu: add missing definition of ARCH_PFN_OFFSET mm: percpu: fix incorrect size in pcpu_obj_full_size() maple_tree: reduce stack usage with gcc-9 and earlier mm: page_alloc: call panic() when memoryless node allocation fails mm: multi-gen LRU: avoid futile retries migrate_pages: move THP/hugetlb migration support check to simplify code migrate_pages: batch flushing TLB migrate_pages: share more code between _unmap and _move ...
2023-02-21Merge tag 'v6.2' into iommufd.git for-nextJason Gunthorpe
Resolve conflicts from the signature change in iommu_map: - drivers/infiniband/hw/usnic/usnic_uiom.c Switch iommu_map_atomic() to iommu_map(.., GFP_ATOMIC) - drivers/vfio/vfio_iommu_type1.c Following indenting change for GFP_KERNEL Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-19IB/mlx5: Extend debug control for CC parametersEdward Srouji
This patch adds rtt_resp_dscp to the current debug controllability of congestion control (CC) parameters. rtt_resp_dscp can be read or written through debugfs. If set, its value overwrites the DSCP of the generated RTT response. Signed-off-by: Edward Srouji <edwards@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Link: https://lore.kernel.org/r/1dcc3440ee53c688f19f579a051ded81a2aaa70a.1676538714.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-02-18Merge branches 'apple/dart', 'arm/exynos', 'arm/renesas', 'arm/smmu', ↵Joerg Roedel
'x86/vt-d', 'x86/amd' and 'core' into next
2023-02-17IB/hfi1: Fix sdma.h tx->num_descs off-by-one errorsPatrick Kelsey
Fix three sources of error involving struct sdma_txreq.num_descs. When _extend_sdma_tx_descs() extends the descriptor array, it uses the value of tx->num_descs to determine how many existing entries from the tx's original, internal descriptor array to copy to the newly allocated one. As this value was incremented before the call, the copy loop will access one entry past the internal descriptor array, copying its contents into the corresponding slot in the new array. If the call to _extend_sdma_tx_descs() fails, _pad_smda_tx_descs() then invokes __sdma_tx_clean() which uses the value of tx->num_desc to drive a loop that unmaps all descriptor entries in use. As this value was incremented before the call, the unmap loop will invoke sdma_unmap_desc() on a descriptor entry whose contents consist of whatever random data was copied into it during (1), leading to cascading further calls into the kernel and driver using arbitrary data. _sdma_close_tx() was using tx->num_descs instead of tx->num_descs - 1. Fix all of the above by: - Only increment .num_descs after .descp is extended. - Use .num_descs - 1 instead of .num_descs for last .descp entry. Fixes: f4d26d81ad7f ("staging/rdma/hfi1: Add coalescing support for SDMA TX descriptors") Link: https://lore.kernel.org/r/167656658879.2223096.10026561343022570690.stgit@awfm-02.cornelisnetworks.com Signed-off-by: Brendan Cunningham <bcunningham@cornelisnetworks.com> Signed-off-by: Patrick Kelsey <pat.kelsey@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-17IB/hfi1: Fix math bugs in hfi1_can_pin_pages()Patrick Kelsey
Fix arithmetic and logic errors in hfi1_can_pin_pages() that would allow hfi1 to attempt pinning pages in cases where it should not because of resource limits or lack of required capability. Fixes: 2c97ce4f3c29 ("IB/hfi1: Add pin query function") Link: https://lore.kernel.org/r/167656658362.2223096.10954762619837718026.stgit@awfm-02.cornelisnetworks.com Signed-off-by: Brendan Cunningham <bcunningham@cornelisnetworks.com> Signed-off-by: Patrick Kelsey <pat.kelsey@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-17RDMA/irdma: Add support for dmabuf pin memory regionsZhu Yanjun
This is a followup to the EFA dmabuf[1]. Irdma driver currently does not support on-demand-paging(ODP). So it uses habanalabs as the dmabuf exporter, and irdma as the importer to allow for peer2peer access through libibverbs. In this commit, the function ib_umem_dmabuf_get_pinned() is used. This function is introduced in EFA dmabuf[1] which allows the driver to get a dmabuf umem which is pinned and does not require move_notify callback implementation. The returned umem is pinned and DMA mapped like standard cpu umems, and is released through ib_umem_release(). [1]https://lore.kernel.org/lkml/20211007114018.GD2688930@ziepe.ca/t/ Link: https://lore.kernel.org/r/20230217011425.498847-1-yanjun.zhu@intel.com Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-17Merge mlx5-next into rdma.git for-nextJason Gunthorpe
Synchronize the shared mlx5 branch with net: - From Jiri: fixe a deadlock in mlx5_ib's netdev notifier unregister. - From Mark and Patrisious: add IPsec RoCEv2 support. - From Or: Rely on firmware to get special mkeys * branch mlx5-next: RDMA/mlx5: Use query_special_contexts for mkeys net/mlx5e: Use query_special_contexts for mkeys net/mlx5: Change define name for 0x100 lkey value net/mlx5: Expose bits for querying special mkeys net/mlx5: Configure IPsec steering for egress RoCEv2 traffic net/mlx5: Configure IPsec steering for ingress RoCEv2 traffic net/mlx5: Add IPSec priorities in RDMA namespaces net/mlx5: Implement new destination type TABLE_TYPE net/mlx5: Introduce new destination type TABLE_TYPE RDMA/mlx5: Track netdev to avoid deadlock during netdev notifier unregister net/mlx5e: Propagate an internal event in case uplink netdev changes net/mlx5e: Fix trap event handling Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-17RDMA/mlx5: Use query_special_contexts for mkeysOr Har-Toov
Use query_sepcial_contexts to get the correct value of mkeys such as null_mkey, terminate_scatter_list_mkey and dump_fill_mkey, as FW will change them in certain configurations. Link: https://lore.kernel.org/r/000236f0a9487d48809f87bcc3620a3964b2d3d3.1673960981.git.leon@kernel.org Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-17net/mlx5: Change define name for 0x100 lkey valueOr Har-Toov
Change define of 0x100 lkey value from MLX5_INVALID_LKEY to be MLX5_TERMINATE_SCATTER_LIST_LKEY as 0x100 is the value of terminate_scatter_list_mkey. Link: https://lore.kernel.org/r/3a116dc3fbae4cb6b76a63d27d418830b06ade0c.1673960981.git.leon@kernel.org Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-17Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Some of the devlink bits were tricky, but I think I got it right. Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-16RDMA/rxe: Fix missing memory barriers in rxe_queue.hBob Pearson
An earlier patch which introduced smp_load_acquire/smp_store_release into rxe_queue.h incorrectly assumed that surrounding spin-locks in rxe_verbs.c around queue updates for kernel ulps was sufficient to protect the passing of data through the queues between the ulp and the rxe tasklets. But this was incorrect. The typical sequence was ulp rxe requester tasklet ------------------------ --------------------- spin_lock_irqsave() wqe = queue_head(queue) if (!queue_full(q)) { if (!wqe) spin_unlock_irqrestore return; return -ENOMEM } <process wqe> wqe = queue_producer_addr(q) <fill in wqe> queue_advance_consumer(queue) queue_advance_producer(q) spin_unlock_irqrestore() queue_head() calls queue_empty() which calls smp_load_acquire() For user space apps queue_advance_producer() calls smp_store_release() so that there is a memory barrier between the producer and the consumer but for kernel ulps queue_advance_produce() just incremented the producer index because the lock function is a release function. But to work the barrier has to come between filling in the wqe and updating the producer index. This patch adds the missing barriers. It also changes the enum names for the ulp queue types to QUEUE_TYPE_FROM/TO_ULP instead of QUEUE_TYPE_TO/FROM_DRIVER which is very ambiguous. This bug is suspected as the cause of very rare lockups in a very high scale storage application. It is a bug in any case and should be corrected. Fixes: 0a67c46d2e99 ("RDMA/rxe: Protect user space index loads/stores") Link: https://lore.kernel.org/r/20230214071053.5395-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-16RDMA/mana_ib: Fix a bug when the PF indicates more entries for registering ↵Long Li
memory on first packet When registering memory in a large chunk that doesn't fit into a single PF message, the PF may return GDMA_STATUS_MORE_ENTRIES on the first message if there are more messages needed for registering more chunks. Fix the VF to make it process the correct return code. Fixes: 0266a177631d ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter") Link: https://lore.kernel.org/r/1676507522-21018-1-git-send-email-longli@linuxonhyperv.com Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-16RDMA/rxe: Remove rxe_alloc()Bob Pearson
Currently all the object types in the rxe driver are allocated in rdma-core except for MRs. By moving tha kzalloc() call outside of the pool code the rxe_alloc() subroutine can be eliminated and code checking for MR as a special case can be removed. This patch moves the kzalloc() and kfree_rcu() calls into the mr registration and destruction verbs. It removes that code from rxe_pool.c including the rxe_alloc() subroutine which is no longer used. Link: https://lore.kernel.org/r/20230213225551.12437-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com> Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-16RDMA/cma: Distinguish between sockaddr_in and sockaddr_in6 by sizeKees Cook
Clang can do some aggressive inlining, which provides it with greater visibility into the sizes of various objects that are passed into helpers. Specifically, compare_netdev_and_ip() can see through the type given to the "sa" argument, which means it can generate code for "struct sockaddr_in" that would have been passed to ipv6_addr_cmp() (that expects to operate on the larger "struct sockaddr_in6"), which would result in a compile-time buffer overflow condition detected by memcmp(). Logically, this state isn't reachable due to the sa_family assignment two callers above and the check in compare_netdev_and_ip(). Instead, provide a compile-time check on sizes so the size-mismatched code will be elided when inlining. Avoids the following warning from Clang: ../include/linux/fortify-string.h:652:4: error: call to '__read_overflow' declared with 'error' attribute: detected read beyond size of object (1st parameter) __read_overflow(); ^ note: In function 'cma_netevent_callback' note: which inlined function 'node_from_ndev_ip' 1 error generated. When the underlying object size is not known (e.g. with GCC and older Clang), the result of __builtin_object_size() is SIZE_MAX, which will also compile away, leaving the code as it was originally. Link: https://lore.kernel.org/r/20230208232549.never.139-kees@kernel.org Link: https://github.com/ClangBuiltLinux/linux/issues/1687 Signed-off-by: Kees Cook <keescook@chromium.org> Tested-by: Nathan Chancellor <nathan@kernel.org> # build Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-16Subject: RDMA/rxe: Handle zero length rdmaBob Pearson
Currently the rxe driver does not handle all cases of zero length rdma operations correctly. The client does not have to provide an rkey for zero length RDMA read or write operations so the rkey provided may be invalid and should not be used to lookup an mr. This patch corrects the driver to ignore the provided rkey if the reth length is zero for read or write operations and make sure to set the mr to NULL. In read_reply() if length is zero rxe_recheck_mr() is not called. Warnings are added in the routines in rxe_mr.c to catch NULL MRs when the length is non-zero. Fixes: 8700e3e7c485 ("Soft RoCE driver") Link: https://lore.kernel.org/r/20230202044240.6304-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Reviewed-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-15iw_cxgb4: Fix potential NULL dereference in c4iw_fill_res_cm_id_entry()Dan Carpenter
This condition needs to match the previous "if (epcp->state == LISTEN) {" exactly to avoid a NULL dereference of either "listen_ep" or "ep". The problem is that "epcp" has been re-assigned so just testing "if (epcp->state == LISTEN) {" a second time is not sufficient. Fixes: 116aeb887371 ("iw_cxgb4: provide detailed provider-specific CM_ID information") Signed-off-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/r/Y+usKuWIKr4dimZh@kili Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-02-15RDMA/mlx5: Use rdma_umem_for_each_dma_block()Jason Gunthorpe
Replace an open coding of rdma_umem_for_each_dma_block() with the proper function. Fixes: b3d47ebd4908 ("RDMA/mlx5: Use mlx5_umr_post_send_wait() to update MR pas") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/0-v1-c13a5b88359b+556d0-mlx5_umem_block_jgg@nvidia.com Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-02-14net/mlx5: Lag, Add single RDMA device in multiport modeMark Bloch
In MultiPort E-Switch mode a single RDMA is created. This device has multiple RDMA ports that represent the uplink ports that are connected to the E-Switch. Account for this when creating the RDMA device so it has an additional port for the non native uplink. As a side effect of this patch, use shared fdb in multiport eswitch mode. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-09mm: replace vma->vm_flags direct modifications with modifier callsSuren Baghdasaryan
Replace direct modifications to vma->vm_flags with calls to modifier functions to be able to track flag changes and to keep vma locking correctness. [akpm@linux-foundation.org: fix drivers/misc/open-dice.c, per Hyeonggon Yoo] Link: https://lkml.kernel.org/r/20230126193752.297968-5-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjun Roy <arjunroy@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Oskolkov <posk@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Song Liu <songliubraving@fb.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-08Merge tag 'mlx5-next-netdev-deadlock' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== mlx5-next-netdev-deadlock This series from Jiri solves a deadlock when removing a network namespace with mlx5 devlink instance being in it. The deadlock is between: 1) mlx5_ib->unregister_netdevice_notifier() AND 2) mlx5_core->devlink_reload->cleanup_net() To slove this introduced mlx5 netdev added/removed events to track uplink netdev to be used for register_netdevice_notifier_dev_net() purposes. * tag 'mlx5-next-netdev-deadlock' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: RDMA/mlx5: Track netdev to avoid deadlock during netdev notifier unregister net/mlx5e: Propagate an internal event in case uplink netdev changes net/mlx5e: Fix trap event handling net/mlx5: Introduce CQE error syndrome ==================== Link: https://lore.kernel.org/r/20230208005626.72930-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-08RDMA/mlx5: Track netdev to avoid deadlock during netdev notifier unregisterJiri Pirko
When removing a network namespace with mlx5 devlink instance being in it, following callchain is performed: cleanup_net (takes down_read(&pernet_ops_rwsem) devlink_pernet_pre_exit() devlink_reload() mlx5_devlink_reload_down() mlx5_unload_one_devl_locked() mlx5_detach_device() del_adev() mlx5r_remove() __mlx5_ib_remove() mlx5_ib_roce_cleanup() mlx5_remove_netdev_notifier() unregister_netdevice_notifier (takes down_write(&pernet_ops_rwsem) This deadlocks. Resolve this by converting to register_netdevice_notifier_dev_net() which does not take pernet_ops_rwsem and moves the notifier block around according to netdev it takes as arg. Use previously introduced netdev added/removed events to track uplink netdev to be used for register_netdevice_notifier_dev_net() purposes. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-08RDMA/irdma: Cap MSIX used to online CPUs + 1Mustafa Ismail
The irdma driver can use a maximum number of msix vectors equal to num_online_cpus() + 1 and the kernel warning stack below is shown if that number is exceeded. The kernel throws a warning as the driver tries to update the affinity hint with a CPU mask greater than the max CPU IDs. Fix this by capping the MSIX vectors to num_online_cpus() + 1. WARNING: CPU: 7 PID: 23655 at include/linux/cpumask.h:106 irdma_cfg_ceq_vector+0x34c/0x3f0 [irdma] RIP: 0010:irdma_cfg_ceq_vector+0x34c/0x3f0 [irdma] Call Trace: irdma_rt_init_hw+0xa62/0x1290 [irdma] ? irdma_alloc_local_mac_entry+0x1a0/0x1a0 [irdma] ? __is_kernel_percpu_address+0x63/0x310 ? rcu_read_lock_held_common+0xe/0xb0 ? irdma_lan_unregister_qset+0x280/0x280 [irdma] ? irdma_request_reset+0x80/0x80 [irdma] ? ice_get_qos_params+0x84/0x390 [ice] irdma_probe+0xa40/0xfc0 [irdma] ? rcu_read_lock_bh_held+0xd0/0xd0 ? irdma_remove+0x140/0x140 [irdma] ? rcu_read_lock_sched_held+0x62/0xe0 ? down_write+0x187/0x3d0 ? auxiliary_match_id+0xf0/0x1a0 ? irdma_remove+0x140/0x140 [irdma] auxiliary_bus_probe+0xa6/0x100 __driver_probe_device+0x4a4/0xd50 ? __device_attach_driver+0x2c0/0x2c0 driver_probe_device+0x4a/0x110 __driver_attach+0x1aa/0x350 bus_for_each_dev+0x11d/0x1b0 ? subsys_dev_iter_init+0xe0/0xe0 bus_add_driver+0x3b1/0x610 driver_register+0x18e/0x410 ? 0xffffffffc0b88000 irdma_init_module+0x50/0xaa [irdma] do_one_initcall+0x103/0x5f0 ? perf_trace_initcall_level+0x420/0x420 ? do_init_module+0x4e/0x700 ? __kasan_kmalloc+0x7d/0xa0 ? kmem_cache_alloc_trace+0x188/0x2b0 ? kasan_unpoison+0x21/0x50 do_init_module+0x1d1/0x700 load_module+0x3867/0x5260 ? layout_and_allocate+0x3990/0x3990 ? rcu_read_lock_held_common+0xe/0xb0 ? rcu_read_lock_sched_held+0x62/0xe0 ? rcu_read_lock_bh_held+0xd0/0xd0 ? __vmalloc_node_range+0x46b/0x890 ? lock_release+0x5c8/0xba0 ? alloc_vm_area+0x120/0x120 ? selinux_kernel_module_from_file+0x2a5/0x300 ? __inode_security_revalidate+0xf0/0xf0 ? __do_sys_init_module+0x1db/0x260 __do_sys_init_module+0x1db/0x260 ? load_module+0x5260/0x5260 ? do_syscall_64+0x22/0x450 do_syscall_64+0xa5/0x450 entry_SYSCALL_64_after_hwframe+0x66/0xdb Fixes: 44d9e52977a1 ("RDMA/irdma: Implement device initialization definitions") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Sindhu Devale <sindhu.devale@intel.com> Link: https://lore.kernel.org/r/20230207201938.1329-1-sindhu.devale@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-02-07RDMA/mlx5: Check reg_create() create for errorsDan Carpenter
The reg_create() can fail. Check for errors before dereferencing it. Fixes: dd1b913fb0d0 ("RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow") Signed-off-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/r/Y+ERYy4wN0LsKsm+@kili Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-02-07RDMA/rtrs: Don't call kobject_del for srv_path->kobjLi Zhijian
As the mention in commmit f7452a7e96c1 ("RDMA/rtrs-srv: fix memory leak by missing kobject free"), it was intended to remove the kobject_del for srv_path->kobj. f7452a7e96c1 said: >This patch moves kobject_del() into free_sess() so that the kobject of > rtrs_srv_sess can be freed. This patch also move rtrs_srv_destroy_once_sysfs_root_folders back to 'if (srv_path->kobj.state_in_sysfs)' block to avoid a 'held lock freed!' A kernel panic will be triggered by following script ----------------------- $ while true do echo "sessname=foo path=ip:<ip address> device_path=/dev/nvme0n1" > /sys/devices/virtual/rnbd-client/ctl/map_device echo "normal" > /sys/block/rnbd0/rnbd/unmap_device done ----------------------- The bisection pointed to commit 6af4609c18b3 ("RDMA/rtrs-srv: Fix several issues in rtrs_srv_destroy_path_files") at last. rnbd_server L777: </dev/nvme0n1@foo>: Opened device 'nvme0n1' general protection fault, probably for non-canonical address 0x765f766564753aea: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 3558 Comm: systemd-udevd Kdump: loaded Not tainted 6.1.0-rc3-roce-flush+ #51 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:kernfs_dop_revalidate+0x36/0x180 Code: 00 00 41 55 41 54 55 53 48 8b 47 68 48 89 fb 48 85 c0 0f 84 db 00 00 00 48 8b a8 60 04 00 00 48 8b 45 30 48 85 c0 48 0f 44 c5 <4c> 8b 60 78 49 81 c4 d8 00 00 00 4c 89 e7 e8 b7 78 7b 00 8b 05 3d RSP: 0018:ffffaf1700b67c78 EFLAGS: 00010206 RAX: 765f766564753a72 RBX: ffff89e2830849c0 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff89e2830849c0 RBP: ffff89e280361bd0 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000065 R11: 0000000000000000 R12: ffff89e2830849c0 R13: ffff89e283084888 R14: d0d0d0d0d0d0d0d0 R15: 2f2f2f2f2f2f2f2f FS: 00007f13fbce7b40(0000) GS:ffff89e2bbc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f93e055d340 CR3: 0000000104664002 CR4: 00000000001706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> lookup_fast+0x7b/0x100 walk_component+0x21/0x160 link_path_walk.part.0+0x24d/0x390 path_openat+0xad/0x9a0 do_filp_open+0xa9/0x150 ? lock_release+0x13c/0x2e0 ? _raw_spin_unlock+0x29/0x50 ? alloc_fd+0x124/0x1f0 do_sys_openat2+0x9b/0x160 __x64_sys_openat+0x54/0xa0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f13fc9d701b Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0 75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 91 00 00 00 48 8b 54 24 28 64 48 2b 14 25 RSP: 002b:00007ffddf242640 EFLAGS: 00000246 ORIG_RAX: 0000000000000101 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f13fc9d701b RDX: 0000000000080000 RSI: 00007ffddf2427c0 RDI: 00000000ffffff9c RBP: 00007ffddf2427c0 R08: 00007f13fcc5b440 R09: 21b2131aa64b1ef2 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000080000 R13: 00007ffddf2427c0 R14: 000055ed13be8db0 R15: 0000000000000000 Fixes: 6af4609c18b3 ("RDMA/rtrs-srv: Fix several issues in rtrs_srv_destroy_path_files") Acked-by: Guoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Link: https://lore.kernel.org/r/1675332721-2-1-git-send-email-lizhijian@fujitsu.com Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-02-06Merge branch 'aux-bus-v11' of https://github.com/ajitkhaparde1/linuxJakub Kicinski
Ajit Khaparde says: ==================== bnxt: Add Auxiliary driver support Add auxiliary device driver for Broadcom devices. The bnxt_en driver will register and initialize an aux device if RDMA is enabled in the underlying device. The bnxt_re driver will then probe and initialize the RoCE interfaces with the infiniband stack. We got rid of the bnxt_en_ops which the bnxt_re driver used to communicate with bnxt_en. Similarly We have tried to clean up most of the bnxt_ulp_ops. In most of the cases we used the functions and entry points provided by the auxiliary bus driver framework. And now these are the minimal functions needed to support the functionality. We will try to work on getting rid of the remaining if we find any other viable option in future. * 'aux-bus-v11' of https://github.com/ajitkhaparde1/linux: bnxt_en: Remove runtime interrupt vector allocation RDMA/bnxt_re: Remove the sriov config callback bnxt_en: Remove struct bnxt access from RoCE driver bnxt_en: Use auxiliary bus calls over proprietary calls bnxt_en: Use direct API instead of indirection bnxt_en: Remove usage of ulp_id RDMA/bnxt_re: Use auxiliary driver interface bnxt_en: Add auxiliary driver support ==================== Link: https://lore.kernel.org/r/20230202033809.3989-1-ajit.khaparde@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-06RDMA/cxgb4: Fix potential null-ptr-deref in pass_establish()Nikita Zhandarovich
If get_ep_from_tid() fails to lookup non-NULL value for ep, ep is dereferenced later regardless of whether it is empty. This patch adds a simple sanity check to fix the issue. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 944661dd97f4 ("RDMA/iw_cxgb4: atomically lookup ep and get a reference") Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Link: https://lore.kernel.org/r/20230202184850.29882-1-n.zhandarovich@fintech.ru Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-02-06RDMA/mlx5: Remove impossible check of mkey cache cleanup failureLeon Romanovsky
mlx5_mkey_cache_cleanup() can't fail and can be changed to be void. Link: https://lore.kernel.org/r/1acd9528995d083114e7dec2a2afc59436406583.1675328463.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-02-06RDMA/mlx5: Fix MR cache debugfs error in IB representors modeLeon Romanovsky
Block MR cache debugfs creation for IB representor flow as MR cache shouldn't be used at all in that mode. As part of this change, add missing debugfs cleanup in error path too. This change fixes the following debugfs errors: bond0: (slave enp8s0f1): Enslaving as a backup interface with an up link mlx5_core 0000:08:00.0: lag map: port 1:1 port 2:1 mlx5_core 0000:08:00.0: shared_fdb:1 mode:queue_affinity mlx5_core 0000:08:00.0: Operation mode is single FDB debugfs: Directory '2' with parent '/' already present! ... debugfs: Directory '22' with parent '/' already present! Fixes: 73d09b2fe833 ("RDMA/mlx5: Introduce mlx5r_cache_rb_key") Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://lore.kernel.org/r/482a78c54acbcfa1742a0e06a452546428900ffa.1675328463.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-02-06RDMA/siw: Fix user page pinning accountingBernard Metzler
To avoid racing with other user memory reservations, immediately account full amount of pages to be pinned. Fixes: 2251334dcac9 ("rdma/siw: application buffer management") Reported-by: Jason Gunthorpe <jgg@nvidia.com> Suggested-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20230202101000.402990-1-bmt@zurich.ibm.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-02-06RDMA/mana_ib: Prevent array underflow in mana_ib_create_qp_raw()Dan Carpenter
The "port" comes from the user and if it is zero then the: ndev = mc->ports[port - 1]; assignment does an out of bounds read. I have changed the if statement to fix this and to mirror how it is done in mana_ib_create_qp_rss(). Fixes: 0266a177631d ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter") Signed-off-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/r/Y8/3Vn8qx00kE9Kk@kili Acked-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-02-02scripts/spelling.txt: add "exsits" pattern and fix typo instancesLuca Ceresoli
Fix typos and add the following to the scripts/spelling.txt: exsits||exists Link: https://lkml.kernel.org/r/20230126152205.959277-1-luca.ceresoli@bootlin.com Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02RDMA/cxgb4: add null-ptr-check after ip_dev_find()Nikita Zhandarovich
ip_dev_find() may return NULL and assign it to pdev which is dereferenced later. Fix this by checking the return value of ip_dev_find() for NULL similar to the way it is done with other instances of said function. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 1cab775c3e75 ("RDMA/cxgb4: Fix LE hash collision bug for passive open connection") Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Link: https://lore.kernel.org/r/20230201172103.17261-1-n.zhandarovich@fintech.ru Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-02-01bnxt_en: Remove runtime interrupt vector allocationAjit Khaparde
Modified the bnxt_en code to create and pre-configure RDMA devices with the right MSI-X vector count for the ROCE driver to use. This is to align the ROCE driver to the auxiliary device model which will simply bind the driver without getting into PCI-related handling. All PCI-related logic will now be in the bnxt_en driver. Suggested-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
2023-02-01RDMA/bnxt_re: Remove the sriov config callbackAjit Khaparde
Remove the SRIOV config callback which the bnxt_en was calling to reconfigure the chip resources for a PF device when VFs are created. The code is now modified to provision the VF resources based on the total VF count instead of the actual VF count. This allows the SRIOV config callback to be removed from the list of ulp_ops. Suggested-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
2023-02-01bnxt_en: Remove struct bnxt access from RoCE driverHongguang Gao
Decouple RoCE driver from directly accessing L2's private bnxt structure. Move the fields needed by RoCE driver into bnxt_en_dev. They'll be passed to RoCE driver by bnxt_rdma_aux_device_add() function. Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com> Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
2023-02-01bnxt_en: Use auxiliary bus calls over proprietary callsAjit Khaparde
Wherever possible use the function ops provided by auxiliary bus instead of using proprietary ops. Defined bnxt_re_suspend and bnxt_re_resume calls which can be invoked by the bnxt_en driver instead of the ULP stop/start calls. Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
2023-02-01bnxt_en: Use direct API instead of indirectionAjit Khaparde
For a single ULP user there is no need for complicating function indirection calls. Remove all this complexity in favour of direct function calls exported by the bnxt_en driver. This allows to simplify the code greatly. Also remove unused ulp_async_notifier. Suggested-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
2023-02-01bnxt_en: Remove usage of ulp_idAjit Khaparde
Since the driver continues to use the single ULP model, the extra complexity and indirection is unnecessary. Remove the usage of ulp_id from the code. Suggested-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
2023-02-01RDMA/bnxt_re: Use auxiliary driver interfaceAjit Khaparde
Use auxiliary driver interface for driver load, unload ROCE driver. The driver does not need to register the interface using the netdev notifier anymore. Removed the bnxt_re_dev_list which is not needed. Currently probe, remove and shutdown ops have been implemented for the auxiliary device. Also remove exccessve validation checks for rdev. Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
2023-01-31IB/hfi1: Assign npages earlierDean Luick
Improve code clarity and enable earlier use of tidbuf->npages by moving its assignment to structure creation time. Signed-off-by: Dean Luick <dean.luick@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Link: https://lore.kernel.org/r/167329104884.1472990.4639750192433251493.stgit@awfm-02.cornelisnetworks.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-31RDMA/umem: Use dma-buf locked API to solve deadlockMaor Gottlieb
The cited commit moves umem to call the unlocked versions of dmabuf unmap/map attachment, but the lock is held while calling to these functions, hence move back to the locked versions of these APIs. Fixes: 21c9c5c0784f ("RDMA/umem: Prepare to dynamic dma-buf locking specification") Link: https://lore.kernel.org/r/311c2cb791f8af75486df446819071357353db1b.1675088709.git.leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-30RDMA/usnic: use iommu_map_atomic() under spin_lock()Yang Yingliang
usnic_uiom_map_sorted_intervals() is called under spin_lock(), iommu_map() might sleep, use iommu_map_atomic() to avoid potential sleep in atomic context. Fixes: e3cf00d0a87f ("IB/usnic: Add Cisco VIC low-level hardware driver") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20230129093757.637354-1-yangyingliang@huawei.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-01-29RDMA/irdma: Fix potential NULL-ptr-dereferenceNikita Zhandarovich
in_dev_get() can return NULL which will cause a failure once idev is dereferenced in in_dev_for_each_ifa_rtnl(). This patch adds a check for NULL value in idev beforehand. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 146b9756f14c ("RDMA/irdma: Add connection manager") Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Link: https://lore.kernel.org/r/20230126185230.62464-1-n.zhandarovich@fintech.ru Reviewed-by: Sindhu Devale <sindhu.devale@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-01-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Conflicts: drivers/net/ethernet/intel/ice/ice_main.c 418e53401e47 ("ice: move devlink port creation/deletion") 643ef23bd9dd ("ice: Introduce local var for readability") https://lore.kernel.org/all/20230127124025.0dacef40@canb.auug.org.au/ https://lore.kernel.org/all/20230124005714.3996270-1-anthony.l.nguyen@intel.com/ drivers/net/ethernet/engleder/tsnep_main.c 3d53aaef4332 ("tsnep: Fix TX queue stop/wake for multiple queues") 25faa6a4c5ca ("tsnep: Replace TX spin_lock with __netif_tx_lock") https://lore.kernel.org/all/20230127123604.36bb3e99@canb.auug.org.au/ net/netfilter/nf_conntrack_proto_sctp.c 13bd9b31a969 ("Revert "netfilter: conntrack: add sctp DATA_SENT state"") a44b7651489f ("netfilter: conntrack: unify established states for SCTP paths") f71cb8f45d09 ("netfilter: conntrack: sctp: use nf log infrastructure for invalid packets") https://lore.kernel.org/all/20230127125052.674281f9@canb.auug.org.au/ https://lore.kernel.org/all/d36076f3-6add-a442-6d4b-ead9f7ffff86@tessares.net/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-27RDMA/mlx5: Add work to remove temporary entries from the cacheMichael Guralnik
The non-cache mkeys are stored in the cache only to shorten restarting application time. Don't store them longer than needed. Configure cache entries that store non-cache MRs as temporary entries. If 30 seconds have passed and no user reclaimed the temporarily cached mkeys, an asynchronous work will destroy the mkeys entries. Link: https://lore.kernel.org/r/20230125222807.6921-7-michaelgur@nvidia.com Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>