summaryrefslogtreecommitdiff
path: root/drivers/xen
AgeCommit message (Collapse)Author
2023-08-29Merge tag 'dma-mapping-6.6-2023-08-29' of ↵Linus Torvalds
git://git.infradead.org/users/hch/dma-mapping Pull dma-maping updates from Christoph Hellwig: - allow dynamic sizing of the swiotlb buffer, to cater for secure virtualization workloads that require all I/O to be bounce buffered (Petr Tesarik) - move a declaration to a header (Arnd Bergmann) - check for memory region overlap in dma-contiguous (Binglei Wang) - remove the somewhat dangerous runtime swiotlb-xen enablement and unexport is_swiotlb_active (Christoph Hellwig, Juergen Gross) - per-node CMA improvements (Yajun Deng) * tag 'dma-mapping-6.6-2023-08-29' of git://git.infradead.org/users/hch/dma-mapping: swiotlb: optimize get_max_slots() swiotlb: move slot allocation explanation comment where it belongs swiotlb: search the software IO TLB only if the device makes use of it swiotlb: allocate a new memory pool when existing pools are full swiotlb: determine potential physical address limit swiotlb: if swiotlb is full, fall back to a transient memory pool swiotlb: add a flag whether SWIOTLB is allowed to grow swiotlb: separate memory pool data from other allocator data swiotlb: add documentation and rename swiotlb_do_find_slots() swiotlb: make io_tlb_default_mem local to swiotlb.c swiotlb: bail out of swiotlb_init_late() if swiotlb is already allocated dma-contiguous: check for memory region overlap dma-contiguous: support numa CMA for specified node dma-contiguous: support per-numa CMA for all architectures dma-mapping: move arch_dma_set_mask() declaration to header swiotlb: unexport is_swiotlb_active x86: always initialize xen-swiotlb when xen-pcifront is enabling xen/pci: add flag for PCI passthrough being possible
2023-08-22xen: privcmd: Add support for irqfdViresh Kumar
Xen provides support for injecting interrupts to the guests via the HYPERVISOR_dm_op() hypercall. The same is used by the Virtio based device backend implementations, in an inefficient manner currently. Generally, the Virtio backends are implemented to work with the Eventfd based mechanism. In order to make such backends work with Xen, another software layer needs to poll the Eventfds and raise an interrupt to the guest using the Xen based mechanism. This results in an extra context switch. This is not a new problem in Linux though. It is present with other hypervisors like KVM, etc. as well. The generic solution implemented in the kernel for them is to provide an IOCTL call to pass the interrupt details and eventfd, which lets the kernel take care of polling the eventfd and raising of the interrupt, instead of handling this in user space (which involves an extra context switch). This patch adds support to inject a specific interrupt to guest using the eventfd mechanism, by preventing the extra context switch. Inspired by existing implementations for KVM, etc.. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/8e724ac1f50c2bc1eb8da9b3ff6166f1372570aa.1692697321.git.viresh.kumar@linaro.org Signed-off-by: Juergen Gross <jgross@suse.com>
2023-08-22xen/xenbus: Avoid a lockdep warning when adding a watchPetr Pavlu
The following lockdep warning appears during boot on a Xen dom0 system: [ 96.388794] ====================================================== [ 96.388797] WARNING: possible circular locking dependency detected [ 96.388799] 6.4.0-rc5-default+ #8 Tainted: G EL [ 96.388803] ------------------------------------------------------ [ 96.388804] xenconsoled/1330 is trying to acquire lock: [ 96.388808] ffffffff82acdd10 (xs_watch_rwsem){++++}-{3:3}, at: register_xenbus_watch+0x45/0x140 [ 96.388847] but task is already holding lock: [ 96.388849] ffff888100c92068 (&u->msgbuffer_mutex){+.+.}-{3:3}, at: xenbus_file_write+0x2c/0x600 [ 96.388862] which lock already depends on the new lock. [ 96.388864] the existing dependency chain (in reverse order) is: [ 96.388866] -> #2 (&u->msgbuffer_mutex){+.+.}-{3:3}: [ 96.388874] __mutex_lock+0x85/0xb30 [ 96.388885] xenbus_dev_queue_reply+0x48/0x2b0 [ 96.388890] xenbus_thread+0x1d7/0x950 [ 96.388897] kthread+0xe7/0x120 [ 96.388905] ret_from_fork+0x2c/0x50 [ 96.388914] -> #1 (xs_response_mutex){+.+.}-{3:3}: [ 96.388923] __mutex_lock+0x85/0xb30 [ 96.388930] xenbus_backend_ioctl+0x56/0x1c0 [ 96.388935] __x64_sys_ioctl+0x90/0xd0 [ 96.388942] do_syscall_64+0x5c/0x90 [ 96.388950] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 96.388957] -> #0 (xs_watch_rwsem){++++}-{3:3}: [ 96.388965] __lock_acquire+0x1538/0x2260 [ 96.388972] lock_acquire+0xc6/0x2b0 [ 96.388976] down_read+0x2d/0x160 [ 96.388983] register_xenbus_watch+0x45/0x140 [ 96.388990] xenbus_file_write+0x53d/0x600 [ 96.388994] vfs_write+0xe4/0x490 [ 96.389003] ksys_write+0xb8/0xf0 [ 96.389011] do_syscall_64+0x5c/0x90 [ 96.389017] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 96.389023] other info that might help us debug this: [ 96.389025] Chain exists of: xs_watch_rwsem --> xs_response_mutex --> &u->msgbuffer_mutex [ 96.413429] Possible unsafe locking scenario: [ 96.413430] CPU0 CPU1 [ 96.413430] ---- ---- [ 96.413431] lock(&u->msgbuffer_mutex); [ 96.413432] lock(xs_response_mutex); [ 96.413433] lock(&u->msgbuffer_mutex); [ 96.413434] rlock(xs_watch_rwsem); [ 96.413436] *** DEADLOCK *** [ 96.413436] 1 lock held by xenconsoled/1330: [ 96.413438] #0: ffff888100c92068 (&u->msgbuffer_mutex){+.+.}-{3:3}, at: xenbus_file_write+0x2c/0x600 [ 96.413446] An ioctl call IOCTL_XENBUS_BACKEND_SETUP (record #1 in the report) results in calling xenbus_alloc() -> xs_suspend() which introduces ordering xs_watch_rwsem --> xs_response_mutex. The xenbus_thread() operation (record #2) creates xs_response_mutex --> &u->msgbuffer_mutex. An XS_WATCH write to the xenbus file then results in a complain about the opposite lock order &u->msgbuffer_mutex --> xs_watch_rwsem. The dependency xs_watch_rwsem --> xs_response_mutex is spurious. Avoid it and the warning by changing the ordering in xs_suspend(), first acquire xs_response_mutex and then xs_watch_rwsem. Reverse also the unlocking order in xs_suspend_cancel() for consistency, but keep xs_resume() as is because it needs to have xs_watch_rwsem unlocked only after exiting xs suspend and re-adding all watches. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230607123624.15739-1-petr.pavlu@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
2023-08-21xen: Fix one kernel-doc commentYang Li
Use colon to separate parameter name from their specific meaning. silence the warning: drivers/xen/grant-table.c:1051: warning: Function parameter or member 'nr_pages' not described in 'gnttab_free_pages' Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6030 Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Acked-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230731030037.123946-1-yang.lee@linux.alibaba.com Signed-off-by: Juergen Gross <jgross@suse.com>
2023-08-21xen: xenbus: Use helper function IS_ERR_OR_NULL()Li Zetao
Use IS_ERR_OR_NULL() to detect an error pointer or a null pointer open-coding to simplify the code. Signed-off-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230817014736.3094289-1-lizetao1@huawei.com Signed-off-by: Juergen Gross <jgross@suse.com>
2023-08-21xen: Switch to use kmemdup() helperRuan Jinjie
Use kmemdup() helper instead of open-coding to simplify the code. Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com> Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Chen Jiahao <chenjiahao16@huawei.com> Link: https://lore.kernel.org/r/20230815092434.1206386-1-ruanjinjie@huawei.com Signed-off-by: Juergen Gross <jgross@suse.com>
2023-08-21xen-pciback: Remove unused function declarationsYue Haibing
Commit a92336a1176b ("xen/pciback: Drop two backends, squash and cleanup some code.") declared but never implemented these functions. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230808150912.43416-1-yuehaibing@huawei.com Signed-off-by: Juergen Gross <jgross@suse.com>
2023-08-01swiotlb: make io_tlb_default_mem local to swiotlb.cPetr Tesarik
SWIOTLB implementation details should not be exposed to the rest of the kernel. This will allow to make changes to the implementation without modifying non-swiotlb code. To avoid breaking existing users, provide helper functions for the few required fields. As a bonus, using a helper function to initialize struct device allows to get rid of an #ifdef in driver core. Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-07-27xen: speed up grant-table reclaimDemi Marie Obenour
When a grant entry is still in use by the remote domain, Linux must put it on a deferred list. Normally, this list is very short, because the PV network and block protocols expect the backend to unmap the grant first. However, Qubes OS's GUI protocol is subject to the constraints of the X Window System, and as such winds up with the frontend unmapping the window first. As a result, the list can grow very large, resulting in a massive memory leak and eventual VM freeze. To partially solve this problem, make the number of entries that the VM will attempt to free at each iteration tunable. The default is still 10, but it can be overridden via a module parameter. This is Cc: stable because (when combined with appropriate userspace changes) it fixes a severe performance and stability problem for Qubes OS users. Cc: stable@vger.kernel.org Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230726165354.1252-1-demi@invisiblethingslab.com Signed-off-by: Juergen Gross <jgross@suse.com>
2023-07-26xen/evtchn: Introduce new IOCTL to bind static evtchnRahul Singh
Xen 4.17 supports the creation of static evtchns. To allow user space application to bind static evtchns introduce new ioctl "IOCTL_EVTCHN_BIND_STATIC". Existing IOCTL doing more than binding that’s why we need to introduce the new IOCTL to only bind the static event channels. Static evtchns to be available for use during the lifetime of the guest. When the application exits, __unbind_from_irq() ends up being called from release() file operations because of that static evtchns are getting closed. To avoid closing the static event channel, add the new bool variable "is_static" in "struct irq_info" to mark the event channel static when creating the event channel to avoid closing the static evtchn. Also, take this opportunity to remove the open-coded version of the evtchn close in drivers/xen/evtchn.c file and use xen_evtchn_close(). Signed-off-by: Rahul Singh <rahul.singh@arm.com> Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Acked-by: Stefano Stabellini <sstabellini@kernel.org> Link: https://lore.kernel.org/r/ae7329bf1713f83e4aad4f3fa0f316258c40a3e9.1689677042.git.rahul.singh@arm.com Signed-off-by: Juergen Gross <jgross@suse.com>
2023-07-25xenbus: check xen_domain in xenbus_probe_initcallStefano Stabellini
The same way we already do in xenbus_init. Fixes the following warning: [ 352.175563] Trying to free already-free IRQ 0 [ 352.177355] WARNING: CPU: 1 PID: 88 at kernel/irq/manage.c:1893 free_irq+0xbf/0x350 [...] [ 352.213951] Call Trace: [ 352.214390] <TASK> [ 352.214717] ? __warn+0x81/0x170 [ 352.215436] ? free_irq+0xbf/0x350 [ 352.215906] ? report_bug+0x10b/0x200 [ 352.216408] ? prb_read_valid+0x17/0x20 [ 352.216926] ? handle_bug+0x44/0x80 [ 352.217409] ? exc_invalid_op+0x13/0x60 [ 352.217932] ? asm_exc_invalid_op+0x16/0x20 [ 352.218497] ? free_irq+0xbf/0x350 [ 352.218979] ? __pfx_xenbus_probe_thread+0x10/0x10 [ 352.219600] xenbus_probe+0x7a/0x80 [ 352.221030] xenbus_probe_thread+0x76/0xc0 Fixes: 5b3353949e89 ("xen: add support for initializing xenstore later as HVM domain") Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com> Tested-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Link: https://lore.kernel.org/r/alpine.DEB.2.22.394.2307211609140.3118466@ubuntu-linux-20-04-desktop Signed-off-by: Juergen Gross <jgross@suse.com>
2023-07-13Merge tag 'for-linus-6.5-rc2-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - a cleanup of the Xen related ELF-notes - a fix for virtio handling in Xen dom0 when running Xen in a VM * tag 'for-linus-6.5-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/virtio: Fix NULL deref when a bridge of PCI root bus has no parent x86/Xen: tidy xen-head.S
2023-07-04xen/virtio: Fix NULL deref when a bridge of PCI root bus has no parentPetr Pavlu
When attempting to run Xen on a QEMU/KVM virtual machine with virtio devices (all x86_64), function xen_dt_get_node() crashes on accessing bus->bridge->parent->of_node because a bridge of the PCI root bus has no parent set: [ 1.694192][ T1] BUG: kernel NULL pointer dereference, address: 0000000000000288 [ 1.695688][ T1] #PF: supervisor read access in kernel mode [ 1.696297][ T1] #PF: error_code(0x0000) - not-present page [ 1.696297][ T1] PGD 0 P4D 0 [ 1.696297][ T1] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 1.696297][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.3.7-1-default #1 openSUSE Tumbleweed a577eae57964bb7e83477b5a5645a1781df990f0 [ 1.696297][ T1] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014 [ 1.696297][ T1] RIP: e030:xen_virtio_restricted_mem_acc+0xd9/0x1c0 [ 1.696297][ T1] Code: 45 0c 83 e8 c9 a3 ea ff 31 c0 eb d7 48 8b 87 40 ff ff ff 48 89 c2 48 8b 40 10 48 85 c0 75 f4 48 8b 82 10 01 00 00 48 8b 40 40 <48> 83 b8 88 02 00 00 00 0f 84 45 ff ff ff 66 90 31 c0 eb a5 48 89 [ 1.696297][ T1] RSP: e02b:ffffc90040013cc8 EFLAGS: 00010246 [ 1.696297][ T1] RAX: 0000000000000000 RBX: ffff888006c75000 RCX: 0000000000000029 [ 1.696297][ T1] RDX: ffff888005ed1000 RSI: ffffc900400f100c RDI: ffff888005ee30d0 [ 1.696297][ T1] RBP: ffff888006c75010 R08: 0000000000000001 R09: 0000000330000006 [ 1.696297][ T1] R10: ffff888005850028 R11: 0000000000000002 R12: ffffffff830439a0 [ 1.696297][ T1] R13: 0000000000000000 R14: ffff888005657900 R15: ffff888006e3e1e8 [ 1.696297][ T1] FS: 0000000000000000(0000) GS:ffff88804a000000(0000) knlGS:0000000000000000 [ 1.696297][ T1] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1.696297][ T1] CR2: 0000000000000288 CR3: 0000000002e36000 CR4: 0000000000050660 [ 1.696297][ T1] Call Trace: [ 1.696297][ T1] <TASK> [ 1.696297][ T1] virtio_features_ok+0x1b/0xd0 [ 1.696297][ T1] virtio_dev_probe+0x19c/0x270 [ 1.696297][ T1] really_probe+0x19b/0x3e0 [ 1.696297][ T1] __driver_probe_device+0x78/0x160 [ 1.696297][ T1] driver_probe_device+0x1f/0x90 [ 1.696297][ T1] __driver_attach+0xd2/0x1c0 [ 1.696297][ T1] bus_for_each_dev+0x74/0xc0 [ 1.696297][ T1] bus_add_driver+0x116/0x220 [ 1.696297][ T1] driver_register+0x59/0x100 [ 1.696297][ T1] virtio_console_init+0x7f/0x110 [ 1.696297][ T1] do_one_initcall+0x47/0x220 [ 1.696297][ T1] kernel_init_freeable+0x328/0x480 [ 1.696297][ T1] kernel_init+0x1a/0x1c0 [ 1.696297][ T1] ret_from_fork+0x29/0x50 [ 1.696297][ T1] </TASK> [ 1.696297][ T1] Modules linked in: [ 1.696297][ T1] CR2: 0000000000000288 [ 1.696297][ T1] ---[ end trace 0000000000000000 ]--- The PCI root bus is in this case created from ACPI description via acpi_pci_root_add() -> pci_acpi_scan_root() -> acpi_pci_root_create() -> pci_create_root_bus() where the last function is called with parent=NULL. It indicates that no parent is present and then bus->bridge->parent is NULL too. Fix the problem by checking bus->bridge->parent in xen_dt_get_node() for NULL first. Fixes: ef8ae384b4c9 ("xen/virtio: Handle PCI devices which Host controller is described in DT") Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Link: https://lore.kernel.org/r/20230621131214.9398-2-petr.pavlu@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
2023-06-28Merge tag 'mm-stable-2023-06-24-19-15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull mm updates from Andrew Morton: - Yosry Ahmed brought back some cgroup v1 stats in OOM logs - Yosry has also eliminated cgroup's atomic rstat flushing - Nhat Pham adds the new cachestat() syscall. It provides userspace with the ability to query pagecache status - a similar concept to mincore() but more powerful and with improved usability - Mel Gorman provides more optimizations for compaction, reducing the prevalence of page rescanning - Lorenzo Stoakes has done some maintanance work on the get_user_pages() interface - Liam Howlett continues with cleanups and maintenance work to the maple tree code. Peng Zhang also does some work on maple tree - Johannes Weiner has done some cleanup work on the compaction code - David Hildenbrand has contributed additional selftests for get_user_pages() - Thomas Gleixner has contributed some maintenance and optimization work for the vmalloc code - Baolin Wang has provided some compaction cleanups, - SeongJae Park continues maintenance work on the DAMON code - Huang Ying has done some maintenance on the swap code's usage of device refcounting - Christoph Hellwig has some cleanups for the filemap/directio code - Ryan Roberts provides two patch series which yield some rationalization of the kernel's access to pte entries - use the provided APIs rather than open-coding accesses - Lorenzo Stoakes has some fixes to the interaction between pagecache and directio access to file mappings - John Hubbard has a series of fixes to the MM selftesting code - ZhangPeng continues the folio conversion campaign - Hugh Dickins has been working on the pagetable handling code, mainly with a view to reducing the load on the mmap_lock - Catalin Marinas has reduced the arm64 kmalloc() minimum alignment from 128 to 8 - Domenico Cerasuolo has improved the zswap reclaim mechanism by reorganizing the LRU management - Matthew Wilcox provides some fixups to make gfs2 work better with the buffer_head code - Vishal Moola also has done some folio conversion work - Matthew Wilcox has removed the remnants of the pagevec code - their functionality is migrated over to struct folio_batch * tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (380 commits) mm/hugetlb: remove hugetlb_set_page_subpool() mm: nommu: correct the range of mmap_sem_read_lock in task_mem() hugetlb: revert use of page_cache_next_miss() Revert "page cache: fix page_cache_next/prev_miss off by one" mm/vmscan: fix root proactive reclaim unthrottling unbalanced node mm: memcg: rename and document global_reclaim() mm: kill [add|del]_page_to_lru_list() mm: compaction: convert to use a folio in isolate_migratepages_block() mm: zswap: fix double invalidate with exclusive loads mm: remove unnecessary pagevec includes mm: remove references to pagevec mm: rename invalidate_mapping_pagevec to mapping_try_invalidate mm: remove struct pagevec net: convert sunrpc from pagevec to folio_batch i915: convert i915_gpu_error to use a folio_batch pagevec: rename fbatch_count() mm: remove check_move_unevictable_pages() drm: convert drm_gem_put_pages() to use a folio_batch i915: convert shmem_sg_free_table() to use a folio_batch scatterlist: add sg_set_folio() ...
2023-06-27Merge tag 'wq-for-6.5-cleanup-ordered' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull ordered workqueue creation updates from Tejun Heo: "For historical reasons, unbound workqueues with max concurrency limit of 1 are considered ordered, even though the concurrency limit hasn't been system-wide for a long time. This creates ambiguity around whether ordered execution is actually required for correctness, which was actually confusing for e.g. btrfs (btrfs updates are being routed through the btrfs tree). There aren't that many users in the tree which use the combination and there are pending improvements to unbound workqueue affinity handling which will make inadvertent use of ordered workqueue a bigger loss. This clarifies the situation for most of them by updating the ones which require ordered execution to use alloc_ordered_workqueue(). There are some conversions being routed through subsystem-specific trees and likely a few stragglers. Once they're all converted, workqueue can trigger a warning on unbound + @max_active==1 usages and eventually drop the implicit ordered behavior" * tag 'wq-for-6.5-cleanup-ordered' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: rxrpc: Use alloc_ordered_workqueue() to create ordered workqueues net: qrtr: Use alloc_ordered_workqueue() to create ordered workqueues net: wwan: t7xx: Use alloc_ordered_workqueue() to create ordered workqueues dm integrity: Use alloc_ordered_workqueue() to create ordered workqueues media: amphion: Use alloc_ordered_workqueue() to create ordered workqueues scsi: NCR5380: Use default @max_active for hostdata->work_q media: coda: Use alloc_ordered_workqueue() to create ordered workqueues crypto: octeontx2: Use alloc_ordered_workqueue() to create ordered workqueues wifi: ath10/11/12k: Use alloc_ordered_workqueue() to create ordered workqueues wifi: mwifiex: Use default @max_active for workqueues wifi: iwlwifi: Use default @max_active for trans_pcie->rba.alloc_wq xen/pvcalls: Use alloc_ordered_workqueue() to create ordered workqueues virt: acrn: Use alloc_ordered_workqueue() to create ordered workqueues net: octeontx2: Use alloc_ordered_workqueue() to create ordered workqueues net: thunderx: Use alloc_ordered_workqueue() to create ordered workqueues greybus: Use alloc_ordered_workqueue() to create ordered workqueues powerpc, workqueue: Use alloc_ordered_workqueue() to create ordered workqueues
2023-06-19mm: ptep_get() conversionRyan Roberts
Convert all instances of direct pte_t* dereferencing to instead use ptep_get() helper. This means that by default, the accesses change from a C dereference to a READ_ONCE(). This is technically the correct thing to do since where pgtables are modified by HW (for access/dirty) they are volatile and therefore we should always ensure READ_ONCE() semantics. But more importantly, by always using the helper, it can be overridden by the architecture to fully encapsulate the contents of the pte. Arch code is deliberately not converted, as the arch code knows best. It is intended that arch code (arm64) will override the default with its own implementation that can (e.g.) hide certain bits from the core code, or determine young/dirty status by mixing in state from another source. Conversion was done using Coccinelle: ---- // $ make coccicheck \ // COCCI=ptepget.cocci \ // SPFLAGS="--include-headers" \ // MODE=patch virtual patch @ depends on patch @ pte_t *v; @@ - *v + ptep_get(v) ---- Then reviewed and hand-edited to avoid multiple unnecessary calls to ptep_get(), instead opting to store the result of a single call in a variable, where it is correct to do so. This aims to negate any cost of READ_ONCE() and will benefit arch-overrides that may be more complex. Included is a fix for an issue in an earlier version of this patch that was pointed out by kernel test robot. The issue arose because config MMU=n elides definition of the ptep helper functions, including ptep_get(). HUGETLB_PAGE=n configs still define a simple huge_ptep_clear_flush() for linking purposes, which dereferences the ptep. So when both configs are disabled, this caused a build error because ptep_get() is not defined. Fix by continuing to do a direct dereference when MMU=n. This is safe because for this config the arch code cannot be trying to virtualize the ptes because none of the ptep helpers are defined. Link: https://lkml.kernel.org/r/20230612151545.3317766-4-ryan.roberts@arm.com Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202305120142.yXsNEo6H-lkp@intel.com/ Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Potapenko <glider@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Airlie <airlied@gmail.com> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: SeongJae Park <sj@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-05-27Merge tag 'for-linus-6.4-rc4-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - a double free fix in the Xen pvcalls backend driver - a fix for a regression causing the MSI related sysfs entries to not being created in Xen PV guests - a fix in the Xen blkfront driver for handling insane input data better * tag 'for-linus-6.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/pci/xen: populate MSI sysfs entries xen/pvcalls-back: fix double frees with pvcalls_new_active_socket() xen/blkfront: Only check REQ_FUA for writes
2023-05-24xen/pvcalls-back: fix double frees with pvcalls_new_active_socket()Dan Carpenter
In the pvcalls_new_active_socket() function, most error paths call pvcalls_back_release_active(fedata->dev, fedata, map) which calls sock_release() on "sock". The bug is that the caller also frees sock. Fix this by making every error path in pvcalls_new_active_socket() release the sock, and don't free it in the caller. Fixes: 5db4d286a8ef ("xen/pvcalls: implement connect command") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/e5f98dc2-0305-491f-a860-71bbd1398a2f@kili.mountain Signed-off-by: Juergen Gross <jgross@suse.com>
2023-05-08xen/pvcalls: Use alloc_ordered_workqueue() to create ordered workqueuesTejun Heo
BACKGROUND ========== When multiple work items are queued to a workqueue, their execution order doesn't match the queueing order. They may get executed in any order and simultaneously. When fully serialized execution - one by one in the queueing order - is needed, an ordered workqueue should be used which can be created with alloc_ordered_workqueue(). However, alloc_ordered_workqueue() was a later addition. Before it, an ordered workqueue could be obtained by creating an UNBOUND workqueue with @max_active==1. This originally was an implementation side-effect which was broken by 4c16bd327c74 ("workqueue: restore WQ_UNBOUND/max_active==1 to be ordered"). Because there were users that depended on the ordered execution, 5c0338c68706 ("workqueue: restore WQ_UNBOUND/max_active==1 to be ordered") made workqueue allocation path to implicitly promote UNBOUND workqueues w/ @max_active==1 to ordered workqueues. While this has worked okay, overloading the UNBOUND allocation interface this way creates other issues. It's difficult to tell whether a given workqueue actually needs to be ordered and users that legitimately want a min concurrency level wq unexpectedly gets an ordered one instead. With planned UNBOUND workqueue updates to improve execution locality and more prevalence of chiplet designs which can benefit from such improvements, this isn't a state we wanna be in forever. This patch series audits all callsites that create an UNBOUND workqueue w/ @max_active==1 and converts them to alloc_ordered_workqueue() as necessary. WHAT TO LOOK FOR ================ The conversions are from alloc_workqueue(WQ_UNBOUND | flags, 1, args..) to alloc_ordered_workqueue(flags, args...) which don't cause any functional changes. If you know that fully ordered execution is not ncessary, please let me know. I'll drop the conversion and instead add a comment noting the fact to reduce confusion while conversion is in progress. If you aren't fully sure, it's completely fine to let the conversion through. The behavior will stay exactly the same and we can always reconsider later. As there are follow-up workqueue core changes, I'd really appreciate if the patch can be routed through the workqueue tree w/ your acks. Thanks. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Juergen Gross <jgross@suse.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Cc: xen-devel@lists.xenproject.org
2023-04-27Merge tag 'for-linus-6.4-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: - some cleanups in the Xen blkback driver - fix potential sleeps under lock in various Xen drivers * tag 'for-linus-6.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/blkback: move blkif_get_x86_*_req() into blkback.c xen/blkback: simplify free_persistent_gnts() interface xen/blkback: remove stale prototype xen/blkback: fix white space code style issues xen/pvcalls: don't call bind_evtchn_to_irqhandler() under lock xen/scsiback: don't call scsiback_free_translation_entry() under lock xen/pciback: don't call pcistub_device_put() under lock
2023-04-27Merge tag 'sysctl-6.4-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull sysctl updates from Luis Chamberlain: "This only does a few sysctl moves from the kernel/sysctl.c file, the rest of the work has been put towards deprecating two API calls which incur recursion and prevent us from simplifying the registration process / saving memory per move. Most of the changes have been soaking on linux-next since v6.3-rc3. I've slowed down the kernel/sysctl.c moves due to Matthew Wilcox's feedback that we should see if we could *save* memory with these moves instead of incurring more memory. We currently incur more memory since when we move a syctl from kernel/sysclt.c out to its own file we end up having to add a new empty sysctl used to register it. To achieve saving memory we want to allow syctls to be passed without requiring the end element being empty, and just have our registration process rely on ARRAY_SIZE(). Without this, supporting both styles of sysctls would make the sysctl registration pretty brittle, hard to read and maintain as can be seen from Meng Tang's efforts to do just this [0]. Fortunately, in order to use ARRAY_SIZE() for all sysctl registrations also implies doing the work to deprecate two API calls which use recursion in order to support sysctl declarations with subdirectories. And so during this development cycle quite a bit of effort went into this deprecation effort. I've annotated the following two APIs are deprecated and in few kernel releases we should be good to remove them: - register_sysctl_table() - register_sysctl_paths() During this merge window we should be able to deprecate and unexport register_sysctl_paths(), we can probably do that towards the end of this merge window. Deprecating register_sysctl_table() will take a bit more time but this pull request goes with a few example of how to do this. As it turns out each of the conversions to move away from either of these two API calls *also* saves memory. And so long term, all these changes *will* prove to have saved a bit of memory on boot. The way I see it then is if remove a user of one deprecated call, it gives us enough savings to move one kernel/sysctl.c out from the generic arrays as we end up with about the same amount of bytes. Since deprecating register_sysctl_table() and register_sysctl_paths() does not require maintainer coordination except the final unexport you'll see quite a bit of these changes from other pull requests, I've just kept the stragglers after rc3" Link: https://lkml.kernel.org/r/ZAD+cpbrqlc5vmry@bombadil.infradead.org [0] * tag 'sysctl-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (29 commits) fs: fix sysctls.c built mm: compaction: remove incorrect #ifdef checks mm: compaction: move compaction sysctl to its own file mm: memory-failure: Move memory failure sysctls to its own file arm: simplify two-level sysctl registration for ctl_isa_vars ia64: simplify one-level sysctl registration for kdump_ctl_table utsname: simplify one-level sysctl registration for uts_kern_table ntfs: simplfy one-level sysctl registration for ntfs_sysctls coda: simplify one-level sysctl registration for coda_table fs/cachefiles: simplify one-level sysctl registration for cachefiles_sysctls xfs: simplify two-level sysctl registration for xfs_table nfs: simplify two-level sysctl registration for nfs_cb_sysctls nfs: simplify two-level sysctl registration for nfs4_cb_sysctls lockd: simplify two-level sysctl registration for nlm_sysctls proc_sysctl: enhance documentation xen: simplify sysctl registration for balloon md: simplify sysctl registration hv: simplify sysctl registration scsi: simplify sysctl registration with register_sysctl() csky: simplify alignment sysctl registration ...
2023-04-26Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull SCSI updates from James Bottomley: "Updates to the usual drivers (megaraid_sas, scsi_debug, lpfc, target, mpi3mr, hisi_sas, arcmsr). The major core change is the constification of the host templates (which touches everything) along with other minor fixups and clean ups" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (207 commits) scsi: ufs: mcq: Use pointer arithmetic in ufshcd_send_command() scsi: ufs: mcq: Annotate ufshcd_inc_sq_tail() appropriately scsi: cxlflash: s/semahpore/semaphore/ scsi: lpfc: Silence an incorrect device output scsi: mpi3mr: Use IRQ save variants of spinlock to protect chain frame allocation scsi: scsi_debug: Fix missing error code in scsi_debug_init() scsi: hisi_sas: Work around build failure in suspend function scsi: lpfc: Fix ioremap issues in lpfc_sli4_pci_mem_setup() scsi: mpt3sas: Fix an issue when driver is being removed scsi: mpt3sas: Remove HBA BIOS version in the kernel log scsi: target: core: Fix invalid memory access scsi: scsi_debug: Drop sdebug_queue scsi: scsi_debug: Only allow sdebug_max_queue be modified when no shosts scsi: scsi_debug: Use scsi_host_busy() in delay_store() and ndelay_store() scsi: scsi_debug: Use blk_mq_tagset_busy_iter() in stop_all_queued() scsi: scsi_debug: Use blk_mq_tagset_busy_iter() in sdebug_blk_mq_poll() scsi: scsi_debug: Dynamically allocate sdebug_queued_cmd scsi: scsi_debug: Use scsi_block_requests() to block queues scsi: scsi_debug: Protect block_unblock_all_queues() with mutex scsi: scsi_debug: Change shost list lock to a mutex ...
2023-04-24xen/pvcalls: don't call bind_evtchn_to_irqhandler() under lockJuergen Gross
bind_evtchn_to_irqhandler() shouldn't be called under spinlock, as it can sleep. This requires to move the calls of create_active() out of the locked regions. This is no problem, as the worst which could happen would be a spurious call of the interrupt handler, causing a spurious wake_up(). Reported-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/lkml/Y+JUIl64UDmdkboh@kadam/ Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Link: https://lore.kernel.org/r/20230403092711.15285-1-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
2023-04-24xen/scsiback: don't call scsiback_free_translation_entry() under lockJuergen Gross
scsiback_free_translation_entry() shouldn't be called under spinlock, as it can sleep. This requires to split removing a translation entry from the v2p list from actually calling kref_put() for the entry. Reported-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/lkml/Y+JUIl64UDmdkboh@kadam/ Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Link: https://lore.kernel.org/r/20230328084602.20729-1-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
2023-04-24xen/pciback: don't call pcistub_device_put() under lockJuergen Gross
pcistub_device_put() shouldn't be called under spinlock, as it can sleep. For this reason pcistub_device_get_pci_dev() needs to be modified: instead of always calling pcistub_device_get() just do the call of pcistub_device_get() only if it is really needed. This removes the need to call pcistub_device_put(). Reported-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/lkml/Y+JUIl64UDmdkboh@kadam/ Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Link: https://lore.kernel.org/r/20230328084549.20695-1-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
2023-04-13xen: simplify sysctl registration for balloonLuis Chamberlain
register_sysctl_table() is a deprecated compatibility wrapper. register_sysctl_init() can do the directory creation for you so just use that. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Juergen Gross <jgross@suse.com>
2023-03-22ACPI: processor: Fix evaluating _PDC method when running as Xen dom0Roger Pau Monne
In ACPI systems, the OS can direct power management, as opposed to the firmware. This OS-directed Power Management is called OSPM. Part of telling the firmware that the OS going to direct power management is making ACPI "_PDC" (Processor Driver Capabilities) calls. These _PDC methods must be evaluated for every processor object. If these _PDC calls are not completed for every processor it can lead to inconsistency and later failures in things like the CPU frequency driver. In a Xen system, the dom0 kernel is responsible for system-wide power management. The dom0 kernel is in charge of OSPM. However, the number of CPUs available to dom0 can be different than the number of CPUs physically present on the system. This leads to a problem: the dom0 kernel needs to evaluate _PDC for all the processors, but it can't always see them. In dom0 kernels, ignore the existing ACPI method for determining if a processor is physically present because it might not be accurate. Instead, ask the hypervisor for this information. Fix this by introducing a custom function to use when running as Xen dom0 in order to check whether a processor object matches a CPU that's online. Such checking is done using the existing information fetched by the Xen pCPU subsystem, extending it to also store the ACPI ID. This ensures that _PDC method gets evaluated for all physically online CPUs, regardless of the number of CPUs made available to dom0. Fixes: 5d554a7bb064 ("ACPI: processor: add internal processor_physically_present()") Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-03-17Merge tag 'for-linus-6.3-rc3-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - cleanup for xen time handling - enable the VGA console in a Xen PVH dom0 - cleanup in the xenfs driver * tag 'for-linus-6.3-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: remove unnecessary (void*) conversions x86/PVH: obtain VGA console info in Dom0 x86/xen/time: cleanup xen_tsc_safe_clocksource xen: update arch/x86/include/asm/xen/cpuid.h
2023-03-16scsi: xen-scsiback: Remove default fabric ops calloutsDmitry Bogdanov
Remove callouts that are identical to the default implementations in TCM Core. Acked-by: Juergen Gross <jgross@suse.com> Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com> Link: https://lore.kernel.org/r/20230313181110.20566-10-d.bogdanov@yadro.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-03-16xen: remove unnecessary (void*) conversionsYu Zhe
Pointer variables of void * type do not require type cast. Signed-off-by: Yu Zhe <yuzhe@nfschina.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230316083954.4223-1-yuzhe@nfschina.com Signed-off-by: Juergen Gross <jgross@suse.com>
2023-02-24Merge tag 'driver-core-6.3-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the large set of driver core changes for 6.3-rc1. There's a lot of changes this development cycle, most of the work falls into two different categories: - fw_devlink fixes and updates. This has gone through numerous review cycles and lots of review and testing by lots of different devices. Hopefully all should be good now, and Saravana will be keeping a watch for any potential regression on odd embedded systems. - driver core changes to work to make struct bus_type able to be moved into read-only memory (i.e. const) The recent work with Rust has pointed out a number of areas in the driver core where we are passing around and working with structures that really do not have to be dynamic at all, and they should be able to be read-only making things safer overall. This is the contuation of that work (started last release with kobject changes) in moving struct bus_type to be constant. We didn't quite make it for this release, but the remaining patches will be finished up for the release after this one, but the groundwork has been laid for this effort. Other than that we have in here: - debugfs memory leak fixes in some subsystems - error path cleanups and fixes for some never-able-to-be-hit codepaths. - cacheinfo rework and fixes - Other tiny fixes, full details are in the shortlog All of these have been in linux-next for a while with no reported problems" [ Geert Uytterhoeven points out that that last sentence isn't true, and that there's a pending report that has a fix that is queued up - Linus ] * tag 'driver-core-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (124 commits) debugfs: drop inline constant formatting for ERR_PTR(-ERROR) OPP: fix error checking in opp_migrate_dentry() debugfs: update comment of debugfs_rename() i3c: fix device.h kernel-doc warnings dma-mapping: no need to pass a bus_type into get_arch_dma_ops() driver core: class: move EXPORT_SYMBOL_GPL() lines to the correct place Revert "driver core: add error handling for devtmpfs_create_node()" Revert "devtmpfs: add debug info to handle()" Revert "devtmpfs: remove return value of devtmpfs_delete_node()" driver core: cpu: don't hand-override the uevent bus_type callback. devtmpfs: remove return value of devtmpfs_delete_node() devtmpfs: add debug info to handle() driver core: add error handling for devtmpfs_create_node() driver core: bus: update my copyright notice driver core: bus: add bus_get_dev_root() function driver core: bus: constify bus_unregister() driver core: bus: constify some internal functions driver core: bus: constify bus_get_kset() driver core: bus: constify bus_register/unregister_notifier() driver core: remove private pointer from struct bus_type ...
2023-02-23Merge tag 'mm-stable-2023-02-20-13-37' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Daniel Verkamp has contributed a memfd series ("mm/memfd: add F_SEAL_EXEC") which permits the setting of the memfd execute bit at memfd creation time, with the option of sealing the state of the X bit. - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare") which addresses a rare race condition related to PMD unsharing. - Several folioification patch serieses from Matthew Wilcox, Vishal Moola, Sidhartha Kumar and Lorenzo Stoakes - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which does perform some memcg maintenance and cleanup work. - SeongJae Park has added DAMOS filtering to DAMON, with the series "mm/damon/core: implement damos filter". These filters provide users with finer-grained control over DAMOS's actions. SeongJae has also done some DAMON cleanup work. - Kairui Song adds a series ("Clean up and fixes for swap"). - Vernon Yang contributed the series "Clean up and refinement for maple tree". - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It adds to MGLRU an LRU of memcgs, to improve the scalability of global reclaim. - David Hildenbrand has added some userfaultfd cleanup work in the series "mm: uffd-wp + change_protection() cleanups". - Christoph Hellwig has removed the generic_writepages() library function in the series "remove generic_writepages". - Baolin Wang has performed some maintenance on the compaction code in his series "Some small improvements for compaction". - Sidhartha Kumar is doing some maintenance work on struct page in his series "Get rid of tail page fields". - David Hildenbrand contributed some cleanup, bugfixing and generalization of pte management and of pte debugging in his series "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs". - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation flag in the series "Discard __GFP_ATOMIC". - Sergey Senozhatsky has improved zsmalloc's memory utilization with his series "zsmalloc: make zspage chain size configurable". - Joey Gouly has added prctl() support for prohibiting the creation of writeable+executable mappings. The previous BPF-based approach had shortcomings. See "mm: In-kernel support for memory-deny-write-execute (MDWE)". - Waiman Long did some kmemleak cleanup and bugfixing in the series "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF". - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series "mm: multi-gen LRU: improve". - Jiaqi Yan has provided some enhancements to our memory error statistics reporting, mainly by presenting the statistics on a per-node basis. See the series "Introduce per NUMA node memory error statistics". - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog regression in compaction via his series "Fix excessive CPU usage during compaction". - Christoph Hellwig does some vmalloc maintenance work in the series "cleanup vfree and vunmap". - Christoph Hellwig has removed block_device_operations.rw_page() in ths series "remove ->rw_page". - We get some maple_tree improvements and cleanups in Liam Howlett's series "VMA tree type safety and remove __vma_adjust()". - Suren Baghdasaryan has done some work on the maintainability of our vm_flags handling in the series "introduce vm_flags modifier functions". - Some pagemap cleanup and generalization work in Mike Rapoport's series "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and "fixups for generic implementation of pfn_valid()" - Baoquan He has done some work to make /proc/vmallocinfo and /proc/kcore better represent the real state of things in his series "mm/vmalloc.c: allow vread() to read out vm_map_ram areas". - Jason Gunthorpe rationalized the GUP system's interface to the rest of the kernel in the series "Simplify the external interface for GUP". - SeongJae Park wishes to migrate people from DAMON's debugfs interface over to its sysfs interface. To support this, we'll temporarily be printing warnings when people use the debugfs interface. See the series "mm/damon: deprecate DAMON debugfs interface". - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes and clean-ups" series. - Huang Ying has provided a dramatic reduction in migration's TLB flush IPI rates with the series "migrate_pages(): batch TLB flushing". - Arnd Bergmann has some objtool fixups in "objtool warning fixes". * tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits) include/linux/migrate.h: remove unneeded externs mm/memory_hotplug: cleanup return value handing in do_migrate_range() mm/uffd: fix comment in handling pte markers mm: change to return bool for isolate_movable_page() mm: hugetlb: change to return bool for isolate_hugetlb() mm: change to return bool for isolate_lru_page() mm: change to return bool for folio_isolate_lru() objtool: add UACCESS exceptions for __tsan_volatile_read/write kmsan: disable ftrace in kmsan core code kasan: mark addr_has_metadata __always_inline mm: memcontrol: rename memcg_kmem_enabled() sh: initialize max_mapnr m68k/nommu: add missing definition of ARCH_PFN_OFFSET mm: percpu: fix incorrect size in pcpu_obj_full_size() maple_tree: reduce stack usage with gcc-9 and earlier mm: page_alloc: call panic() when memoryless node allocation fails mm: multi-gen LRU: avoid futile retries migrate_pages: move THP/hugetlb migration support check to simplify code migrate_pages: batch flushing TLB migrate_pages: share more code between _unmap and _move ...
2023-02-23Merge tag 'efi-next-for-v6.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI updates from Ard Biesheuvel: "A healthy mix of EFI contributions this time: - Performance tweaks for efifb earlycon (Andy) - Preparatory refactoring and cleanup work in the efivar layer, which is needed to accommodate the Snapdragon arm64 laptops that expose their EFI variable store via a TEE secure world API (Johan) - Enhancements to the EFI memory map handling so that Xen dom0 can safely access EFI configuration tables (Demi Marie) - Wire up the newly introduced IBT/BTI flag in the EFI memory attributes table, so that firmware that is generated with ENDBR/BTI landing pads will be mapped with enforcement enabled - Clean up how we check and print the EFI revision exposed by the firmware - Incorporate EFI memory attributes protocol definition and wire it up in the EFI zboot code (Evgeniy) This ensures that these images can execute under new and stricter rules regarding the default memory permissions for EFI page allocations (More work is in progress here) - CPER header cleanup (Dan Williams) - Use a raw spinlock to protect the EFI runtime services stack on arm64 to ensure the correct semantics under -rt (Pierre) - EFI framebuffer quirk for Lenovo Ideapad (Darrell)" * tag 'efi-next-for-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (24 commits) firmware/efi sysfb_efi: Add quirk for Lenovo IdeaPad Duet 3 arm64: efi: Make efi_rt_lock a raw_spinlock efi: Add mixed-mode thunk recipe for GetMemoryAttributes efi: x86: Wire up IBT annotation in memory attributes table efi: arm64: Wire up BTI annotation in memory attributes table efi: Discover BTI support in runtime services regions efi/cper, cxl: Remove cxl_err.h efi: Use standard format for printing the EFI revision efi: Drop minimum EFI version check at boot efi: zboot: Use EFI protocol to remap code/data with the right attributes efi/libstub: Add memory attribute protocol definitions efi: efivars: prevent double registration efi: verify that variable services are supported efivarfs: always register filesystem efi: efivars: add efivars printk prefix efi: Warn if trying to reserve memory under Xen efi: Actually enable the ESRT under Xen efi: Apply allowlist to EFI configuration tables when running under Xen efi: xen: Implement memory descriptor lookup based on hypercall efi: memmap: Disregard bogus entries instead of returning them ...
2023-02-21Merge tag 'net-next-6.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - Add dedicated kmem_cache for typical/small skb->head, avoid having to access struct page at kfree time, and improve memory use. - Introduce sysctl to set default RPS configuration for new netdevs. - Define Netlink protocol specification format which can be used to describe messages used by each family and auto-generate parsers. Add tools for generating kernel data structures and uAPI headers. - Expose all net/core sysctls inside netns. - Remove 4s sleep in netpoll if carrier is instantly detected on boot. - Add configurable limit of MDB entries per port, and port-vlan. - Continue populating drop reasons throughout the stack. - Retire a handful of legacy Qdiscs and classifiers. Protocols: - Support IPv4 big TCP (TSO frames larger than 64kB). - Add IP_LOCAL_PORT_RANGE socket option, to control local port range on socket by socket basis. - Track and report in procfs number of MPTCP sockets used. - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path manager. - IPv6: don't check net.ipv6.route.max_size and rely on garbage collection to free memory (similarly to IPv4). - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986). - ICMP: add per-rate limit counters. - Add support for user scanning requests in ieee802154. - Remove static WEP support. - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate reporting. - WiFi 7 EHT channel puncturing support (client & AP). BPF: - Add a rbtree data structure following the "next-gen data structure" precedent set by recently added linked list, that is, by using kfunc + kptr instead of adding a new BPF map type. - Expose XDP hints via kfuncs with initial support for RX hash and timestamp metadata. - Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to better support decap on GRE tunnel devices not operating in collect metadata. - Improve x86 JIT's codegen for PROBE_MEM runtime error checks. - Remove the need for trace_printk_lock for bpf_trace_printk and bpf_trace_vprintk helpers. - Extend libbpf's bpf_tracing.h support for tracing arguments of kprobes/uprobes and syscall as a special case. - Significantly reduce the search time for module symbols by livepatch and BPF. - Enable cpumasks to be used as kptrs, which is useful for tracing programs tracking which tasks end up running on which CPUs in different time intervals. - Add support for BPF trampoline on s390x and riscv64. - Add capability to export the XDP features supported by the NIC. - Add __bpf_kfunc tag for marking kernel functions as kfuncs. - Add cgroup.memory=nobpf kernel parameter option to disable BPF memory accounting for container environments. Netfilter: - Remove the CLUSTERIP target. It has been marked as obsolete for years, and we still have WARN splats wrt races of the out-of-band /proc interface installed by this target. - Add 'destroy' commands to nf_tables. They are identical to the existing 'delete' commands, but do not return an error if the referenced object (set, chain, rule...) did not exist. Driver API: - Improve cpumask_local_spread() locality to help NICs set the right IRQ affinity on AMD platforms. - Separate C22 and C45 MDIO bus transactions more clearly. - Introduce new DCB table to control DSCP rewrite on egress. - Support configuration of Physical Layer Collision Avoidance (PLCA) Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of shared medium Ethernet. - Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing preemption of low priority frames by high priority frames. - Add support for controlling MACSec offload using netlink SET. - Rework devlink instance refcounts to allow registration and de-registration under the instance lock. Split the code into multiple files, drop some of the unnecessarily granular locks and factor out common parts of netlink operation handling. - Add TX frame aggregation parameters (for USB drivers). - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning messages with notifications for debug. - Allow offloading of UDP NEW connections via act_ct. - Add support for per action HW stats in TC. - Support hardware miss to TC action (continue processing in SW from a specific point in the action chain). - Warn if old Wireless Extension user space interface is used with modern cfg80211/mac80211 drivers. Do not support Wireless Extensions for Wi-Fi 7 devices at all. Everyone should switch to using nl80211 interface instead. - Improve the CAN bit timing configuration. Use extack to return error messages directly to user space, update the SJW handling, including the definition of a new default value that will benefit CAN-FD controllers, by increasing their oscillator tolerance. New hardware / drivers: - Ethernet: - nVidia BlueField-3 support (control traffic driver) - Ethernet support for imx93 SoCs - Motorcomm yt8531 gigabit Ethernet PHY - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA) - Microchip LAN8841 PHY (incl. cable diagnostics and PTP) - Amlogic gxl MDIO mux - WiFi: - RealTek RTL8188EU (rtl8xxxu) - Qualcomm Wi-Fi 7 devices (ath12k) - CAN: - Renesas R-Car V4H Drivers: - Bluetooth: - Set Per Platform Antenna Gain (PPAG) for Intel controllers. - Ethernet NICs: - Intel (1G, igc): - support TSN / Qbv / packet scheduling features of i226 model - Intel (100G, ice): - use GNSS subsystem instead of TTY - multi-buffer XDP support - extend support for GPIO pins to E823 devices - nVidia/Mellanox: - update the shared buffer configuration on PFC commands - implement PTP adjphase function for HW offset control - TC support for Geneve and GRE with VF tunnel offload - more efficient crypto key management method - multi-port eswitch support - Netronome/Corigine: - add DCB IEEE support - support IPsec offloading for NFP3800 - Freescale/NXP (enetc): - support XDP_REDIRECT for XDP non-linear buffers - improve reconfig, avoid link flap and waiting for idle - support MAC Merge layer - Other NICs: - sfc/ef100: add basic devlink support for ef100 - ionic: rx_push mode operation (writing descriptors via MMIO) - bnxt: use the auxiliary bus abstraction for RDMA - r8169: disable ASPM and reset bus in case of tx timeout - cpsw: support QSGMII mode for J721e CPSW9G - cpts: support pulse-per-second output - ngbe: add an mdio bus driver - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing - r8152: handle devices with FW with NCM support - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation - virtio-net: support multi buffer XDP - virtio/vsock: replace virtio_vsock_pkt with sk_buff - tsnep: XDP support - Ethernet high-speed switches: - nVidia/Mellanox (mlxsw): - add support for latency TLV (in FW control messages) - Microchip (sparx5): - separate explicit and implicit traffic forwarding rules, make the implicit rules always active - add support for egress DSCP rewrite - IS0 VCAP support (Ingress Classification) - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS etc.) - ES2 VCAP support (Egress Access Control) - support for Per-Stream Filtering and Policing (802.1Q, 8.6.5.1) - Ethernet embedded switches: - Marvell (mv88e6xxx): - add MAB (port auth) offload support - enable PTP receive for mv88e6390 - NXP (ocelot): - support MAC Merge layer - support for the the vsc7512 internal copper phys - Microchip: - lan9303: convert to PHYLINK - lan966x: support TC flower filter statistics - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x - lan937x: support Credit Based Shaper configuration - ksz9477: support Energy Efficient Ethernet - other: - qca8k: convert to regmap read/write API, use bulk operations - rswitch: Improve TX timestamp accuracy - Intel WiFi (iwlwifi): - EHT (Wi-Fi 7) rate reporting - STEP equalizer support: transfer some STEP (connection to radio on platforms with integrated wifi) related parameters from the BIOS to the firmware. - Qualcomm 802.11ax WiFi (ath11k): - IPQ5018 support - Fine Timing Measurement (FTM) responder role support - channel 177 support - MediaTek WiFi (mt76): - per-PHY LED support - mt7996: EHT (Wi-Fi 7) support - Wireless Ethernet Dispatch (WED) reset support - switch to using page pool allocator - RealTek WiFi (rtw89): - support new version of Bluetooth co-existance - Mobile: - rmnet: support TX aggregation" * tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits) page_pool: add a comment explaining the fragment counter usage net: ethtool: fix __ethtool_dev_mm_supported() implementation ethtool: pse-pd: Fix double word in comments xsk: add linux/vmalloc.h to xsk.c sefltests: netdevsim: wait for devlink instance after netns removal selftest: fib_tests: Always cleanup before exit net/mlx5e: Align IPsec ASO result memory to be as required by hardware net/mlx5e: TC, Set CT miss to the specific ct action instance net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG net/mlx5: Refactor tc miss handling to a single function net/mlx5: Kconfig: Make tc offload depend on tc skb extension net/sched: flower: Support hardware miss to tc action net/sched: flower: Move filter handle initialization earlier net/sched: cls_api: Support hardware miss to tc action net/sched: Rename user cookie and act cookie sfc: fix builds without CONFIG_RTC_LIB sfc: clean up some inconsistent indentings net/mlx4_en: Introduce flexible array to silence overflow warning net: lan966x: Fix possible deadlock inside PTP net/ulp: Remove redundant ->clone() test in inet_clone_ulp(). ...
2023-02-18xen: sysfs: make kobj_type structure constantThomas Weißschuh
Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") the driver core allows the usage of const struct kobj_type. Take advantage of this to constify the structure definition to prevent modification at runtime. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230216-kobj_type-xen-v1-1-742423de7d71@weissschuh.net Signed-off-by: Juergen Gross <jgross@suse.com>
2023-02-13xen: Replace one-element array with flexible-array memberGustavo A. R. Silva
One-element arrays are deprecated, and we are replacing them with flexible array members instead. So, replace one-element array with flexible-array member in struct xen_page_directory. This helps with the ongoing efforts to tighten the FORTIFY_SOURCE routines on memcpy() and help us make progress towards globally enabling -fstrict-flex-arrays=3 [1]. This results in no differences in binary output. Link: https://github.com/KSPP/linux/issues/79 Link: https://github.com/KSPP/linux/issues/255 Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html [1] Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/Y9xjN6Wa3VslgXeX@work Signed-off-by: Juergen Gross <jgross@suse.com>
2023-02-13xen/grant-dma-iommu: Implement a dummy probe_device() callbackOleksandr Tyshchenko
Update stub IOMMU driver (which main purpose is to reuse generic IOMMU device-tree bindings by Xen grant DMA-mapping layer on Arm) according to the recent changes done in the following commit 57365a04c921 ("iommu: Move bus setup to IOMMU device registration"). With probe_device() callback being called during IOMMU device registration, the uninitialized callback just leads to the "kernel NULL pointer dereference" issue during boot. Fix that by adding a dummy callback. Looks like the release_device() callback is not mandatory to be implemented as IOMMU framework makes sure that callback is initialized before dereferencing. Reported-by: Viresh Kumar <viresh.kumar@linaro.org> Fixes: 57365a04c921 ("iommu: Move bus setup to IOMMU device registration") Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Tested-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Link: https://lore.kernel.org/r/20230208153649.3604857-1-olekstysh@gmail.com Signed-off-by: Juergen Gross <jgross@suse.com>
2023-02-13xen/pvcalls-back: fix permanently masked event channelVolodymyr Babchuk
There is a sequence of events that can lead to a permanently masked event channel, because xen_irq_lateeoi() is newer called. This happens when a backend receives spurious write event from a frontend. In this case pvcalls_conn_back_write() returns early and it does not clears the map->write counter. As map->write > 0, pvcalls_back_ioworker() returns without calling xen_irq_lateeoi(). This leaves the event channel in masked state, a backend does not receive any new events from a frontend and the whole communication stops. Move atomic_set(&map->write, 0) to the very beginning of pvcalls_conn_back_write() to fix this issue. Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com> Reported-by: Oleksii Moisieiev <oleksii_moisieiev@epam.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230119211037.1234931-1-volodymyr_babchuk@epam.com Signed-off-by: Juergen Gross <jgross@suse.com>
2023-02-13xen: Allow platform PCI interrupt to be sharedDavid Woodhouse
When we don't use the per-CPU vector callback, we ask Xen to deliver event channel interrupts as INTx on the PCI platform device. As such, it can be shared with INTx on other PCI devices. Set IRQF_SHARED, and make it return IRQ_HANDLED or IRQ_NONE according to whether the evtchn_upcall_pending flag was actually set. Now I can share the interrupt: 11: 82 0 IO-APIC 11-fasteoi xen-platform-pci, ens4 Drop the IRQF_TRIGGER_RISING. It has no effect when the IRQ is shared, and besides, the only effect it was having even beforehand was to trigger a debug message in both I/OAPIC and legacy PIC cases: [ 0.915441] genirq: No set_type function for IRQ 11 (IO-APIC) [ 0.951939] genirq: No set_type function for IRQ 11 (XT-PIC) Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/f9a29a68d05668a3636dd09acd94d970269eaec6.camel@infradead.org Signed-off-by: Juergen Gross <jgross@suse.com>
2023-02-13drivers/xen/hypervisor: Expose Xen SIF flags to userspacePer Bilse
/proc/xen is a legacy pseudo filesystem which predates Xen support getting merged into Linux. It has largely been replaced with more normal locations for data (/sys/hypervisor/ for info, /dev/xen/ for user devices). We want to compile xenfs support out of the dom0 kernel. There is one item which only exists in /proc/xen, namely /proc/xen/capabilities with "control_d" being the signal of "you're in the control domain". This ultimately comes from the SIF flags provided at VM start. This patch exposes all SIF flags in /sys/hypervisor/start_flags/ as boolean files, one for each bit, returning '1' if set, '0' otherwise. Two known flags, 'privileged' and 'initdomain', are explicitly named, and all remaining flags can be accessed via generically named files, as suggested by Andrew Cooper. Signed-off-by: Per Bilse <per.bilse@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230103130213.2129753-1-per.bilse@citrix.com Signed-off-by: Juergen Gross <jgross@suse.com>
2023-02-09mm: replace vma->vm_flags direct modifications with modifier callsSuren Baghdasaryan
Replace direct modifications to vma->vm_flags with calls to modifier functions to be able to track flag changes and to keep vma locking correctness. [akpm@linux-foundation.org: fix drivers/misc/open-dice.c, per Hyeonggon Yoo] Link: https://lkml.kernel.org/r/20230126193752.297968-5-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjun Roy <arjunroy@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Oskolkov <posk@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Song Liu <songliubraving@fb.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-27driver core: make struct bus_type.uevent() take a const *Greg Kroah-Hartman
The uevent() callback in struct bus_type should not be modifying the device that is passed into it, so mark it as a const * and propagate the function signature changes out into all relevant subsystems that use this callback. Acked-by: Rafael J. Wysocki <rafael@kernel.org> Acked-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20230111113018.459199-16-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-23net/sock: Introduce trace_sk_data_ready()Peilin Ye
As suggested by Cong, introduce a tracepoint for all ->sk_data_ready() callback implementations. For example: <...> iperf-609 [002] ..... 70.660425: sk_data_ready: family=2 protocol=6 func=sock_def_readable iperf-609 [002] ..... 70.660436: sk_data_ready: family=2 protocol=6 func=sock_def_readable <...> Suggested-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23efi: Apply allowlist to EFI configuration tables when running under XenDemi Marie Obenour
As it turns out, Xen does not guarantee that EFI boot services data regions in memory are preserved, which means that EFI configuration tables pointing into such memory regions may be corrupted before the dom0 OS has had a chance to inspect them. This is causing problems for Qubes OS when it attempts to perform system firmware updates, which requires that the contents of the EFI System Resource Table are valid when the fwupd userspace program runs. However, other configuration tables such as the memory attributes table or the runtime properties table are equally affected, and so we need a comprehensive workaround that works for any table type. So when running under Xen, check the EFI memory descriptor covering the start of the table, and disregard the table if it does not reside in memory that is preserved by Xen. Co-developed-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-01-22efi: xen: Implement memory descriptor lookup based on hypercallDemi Marie Obenour
Xen on x86 boots dom0 in EFI mode but without providing a memory map. This means that some consistency checks we would like to perform on configuration tables or other data structures in memory are not currently possible. Xen does, however, expose EFI memory descriptor info via a Xen hypercall, so let's wire that up instead. It turns out that the returned information is not identical to what Linux's efi_mem_desc_lookup would return: the address returned is the address passed to the hypercall, and the size returned is the number of bytes remaining in the configuration table. However, none of the callers of efi_mem_desc_lookup() currently care about this. In the future, Xen may gain a hypercall that returns the actual start address, which can be used instead. Co-developed-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-01-12Merge tag 'for-linus-6.2-rc4-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - two cleanup patches - a fix of a memory leak in the Xen pvfront driver - a fix of a locking issue in the Xen hypervisor console driver * tag 'for-linus-6.2-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/pvcalls: free active map buffer on pvcalls_front_free_map hvc/xen: lock console list traversal x86/xen: Remove the unused function p2m_index() xen: make remove callback of xen driver void returned
2023-01-09xen/pvcalls: free active map buffer on pvcalls_front_free_mapOleksii Moisieiev
Data buffer for active map is allocated in alloc_active_ring and freed in free_active_ring function, which is used only for the error cleanup. pvcalls_front_release is calling pvcalls_front_free_map which ends foreign access for this buffer, but doesn't free allocated pages. Call free_active_ring to clean all allocated resources. Signed-off-by: Oleksii Moisieiev <oleksii_moisieiev@epam.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Link: https://lore.kernel.org/r/6a762ee32dd655cbb09a4aa0e2307e8919761311.1671531297.git.oleksii_moisieiev@epam.com Signed-off-by: Juergen Gross <jgross@suse.com>
2022-12-15xen: make remove callback of xen driver void returnedDawei Li
Since commit fc7a6209d571 ("bus: Make remove callback return void") forces bus_type::remove be void-returned, it doesn't make much sense for any bus based driver implementing remove callbalk to return non-void to its caller. This change is for xen bus based drivers. Acked-by: Juergen Gross <jgross@suse.com> Signed-off-by: Dawei Li <set_pte_at@outlook.com> Link: https://lore.kernel.org/r/TYCP286MB23238119AB4DF190997075C9CAE39@TYCP286MB2323.JPNP286.PROD.OUTLOOK.COM Signed-off-by: Juergen Gross <jgross@suse.com>
2022-12-13Merge tag 'drm-next-2022-12-13' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm updates from Dave Airlie: "The biggest highlight is that the accel subsystem framework is merged. Hopefully for 6.3 we will be able to line up a driver to use it. In drivers land, i915 enables DG2 support by default now, and nouveau has a big stability refactoring and initial ampere support, AMD includes new hw IP support and should build on ARM again. There is also an ofdrm driver to take over offb on platforms it's used. Stuff outside my tree, the dma-buf patches hit a few places, the vc4 firmware changes also do, and i915 has some interactions with MEI for discrete GPUs. I think all of those should have been acked/reviewed by relevant parties. New driver: - ofdrm - replacement for offb fbdev: - add support for nomodeset fourcc: - add Vivante tiled modifier core: - atomic-helpers: CRTC primary plane test fixes, fb access hooks - connector: TV API consistency, cmdline parser improvements - send connector hotplug on cleanup - sort makefile objects tests: - sort kunit tests - improve DP-MST tests - add kunit helpers to create a device sched: - module param for scheduling policy - refcounting fix buddy: - add back random seed log ttm: - convert ttm_resource to size_t - optimize pool allocations edid: - HFVSDB parsing support fixes - logging/debug improvements - DSC quirks dma-buf: - Add unlocked vmap and attachment mapping - move drivers to common locking convention - locking improvements firmware: - new API for rPI firmware and vc4 xilinx: - zynqmp: displayport bridge support - dpsub fix bridge: - adv7533: Remove dynamic lane switching - it6505: Runtime PM support, sync improvements - ps8640: Handle AUX defer messages - tc358775: Drop soft-reset over I2C panel: - panel-edp: Add INX N116BGE-EA2 C2 and C4 support. - Jadard JD9365DA-H3 - NewVision NV3051D amdgpu: - DCN support on ARM - DCN 2.1 secure display - Sienna Cichlid mode2 reset fixes - new GC 11.x firmware versions - drop AMD specific DSC workarounds in favour of drm code - clang warning fixes - scheduler rework - SR-IOV fixes - GPUVM locking fixes - fix memory leak in CS IOCTL error path - flexible array updates - enable new GC/PSP/SMU/NBIO IP - GFX preemption support for gfx9 amdkfd: - cache size fixes - userptr fixes - enable cooperative launch on gfx 10.3 - enable GC 11.0.4 KFD support radeon: - replace kmap with kmap_local_page - ACPI ref count fix - HDA audio notifier support i915: - DG2 enabled by default - MTL enablement work - hotplug refactoring - VBT improvements - Display and watermark refactoring - ADL-P workaround - temp disable runtime_pm for discrete- - fix for A380 as a secondary GPU - Wa_18017747507 for DG2 - CS timestamp support fixes for gen5 and earlier - never purge busy TTM objects - use i915_sg_dma_sizes for all backends - demote GuC kernel contexts to normal priority - gvt: refactor for new MDEV interface - enable DC power states on eDP ports - fix gen 2/3 workarounds nouveau: - fix page fault handling - Ampere acceleration support - driver stability improvements - nva3 backlight support msm: - MSM_INFO_GET_FLAGS support - DPU: XR30 and P010 image formats - Qualcomm SM6115 support - DSI PHY support for QCM2290 - HDMI: refactored dev init path - remove exclusive-fence hack - fix speed-bin detection - enable clamp to idle on 7c3 - improved hangcheck detection vmwgfx: - fb and cursor refactoring - convert to generic hashtable - cursor improvements etnaviv: - hw workarounds - softpin MMU fixes ast: - atomic gamma LUT support - convert to SHMEM lcdif: - support YUV planes - Increase DMA burst size - FIFO threshold tuning meson: - fix return type of cvbs mode_valid mgag200: - fix PLL setup on some revisions sun4i: - A100 and D1 support udl: - modesetting improvements - hot unplug support vc4: - support PAL-M - fix regression preventing 4K @ 60Hz - fix NULL ptr deref v3d: - switch to drm managed resources renesas: - RZ/G2L DSI support - DU Kconfig cleanup mediatek: - fixup dpi and hdmi - MT8188 dpi support - MT8195 AFBC support tegra: - NVDEC hardware on Tegra234 SoC hdlcd: - switch to drm managed resources ingenic: - fix registration error path hisilicon: - convert to drm_mode_init maildp: - use managed resources mtk: - use drm_mode_init rockchip: - use drm_mode_copy" * tag 'drm-next-2022-12-13' of git://anongit.freedesktop.org/drm/drm: (1397 commits) drm/amdgpu: fix mmhub register base coding error drm/amdgpu: add tmz support for GC IP v11.0.4 drm/amdgpu: enable GFX Clock Gating control for GC IP v11.0.4 drm/amdgpu: enable GFX Power Gating for GC IP v11.0.4 drm/amdgpu: enable GFX IP v11.0.4 CG support drm/amdgpu: Make amdgpu_ring_mux functions as static drm/amdgpu: generally allow over-commit during BO allocation drm/amd/display: fix array index out of bound error in DCN32 DML drm/amd/display: 3.2.215 drm/amd/display: set optimized required for comp buf changes drm/amd/display: Add debug option to skip PSR CRTC disable drm/amd/display: correct DML calc error of UrgentLatency drm/amd/display: correct static_screen_event_mask drm/amd/display: Ensure commit_streams returns the DC return code drm/amd/display: read invalid ddc pin status cause engine busy drm/amd/display: Bypass DET swath fill check for max clocks drm/amd/display: Disable uclk pstate for subvp pipes drm/amd/display: Fix DCN2.1 default DSC clocks drm/amd/display: Enable dp_hdmi21_pcon support drm/amd/display: prevent seamless boot on displays that don't have the preferred dig ...
2022-12-12Merge tag 'pull-iov_iter' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull iov_iter updates from Al Viro: "iov_iter work; most of that is about getting rid of direction misannotations and (hopefully) preventing more of the same for the future" * tag 'pull-iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: use less confusing names for iov_iter direction initializers iov_iter: saner checks for attempt to copy to/from iterator [xen] fix "direction" argument of iov_iter_kvec() [vhost] fix 'direction' argument of iov_iter_{init,bvec}() [target] fix iov_iter_bvec() "direction" argument [s390] memcpy_real(): WRITE is "data source", not destination... [s390] zcore: WRITE is "data source", not destination... [infiniband] READ is "data destination", not source... [fsi] WRITE is "data source", not destination... [s390] copy_oldmem_kernel() - WRITE is "data source", not destination csum_and_copy_to_iter(): handle ITER_DISCARD get rid of unlikely() on page_copy_sane() calls