33 files changed, 593 insertions, 244 deletions
diff --git a/Documentation/virt/hyperv/index.rst b/Documentation/virt/hyperv/index.rst index 4a7a1b738bbe..de447e11b4a5 100644 --- a/Documentation/virt/hyperv/index.rst +++ b/Documentation/virt/hyperv/index.rst @@ -10,3 +10,4 @@ Hyper-V Enlightenments overview vmbus clocks + vpci diff --git a/Documentation/virt/hyperv/vpci.rst b/Documentation/virt/hyperv/vpci.rst new file mode 100644 index 000000000000..b65b2126ede3 --- /dev/null +++ b/Documentation/virt/hyperv/vpci.rst @@ -0,0 +1,316 @@ +.. SPDX-License-Identifier: GPL-2.0 + +PCI pass-thru devices +========================= +In a Hyper-V guest VM, PCI pass-thru devices (also called +virtual PCI devices, or vPCI devices) are physical PCI devices +that are mapped directly into the VM's physical address space. +Guest device drivers can interact directly with the hardware +without intermediation by the host hypervisor. This approach +provides higher bandwidth access to the device with lower +latency, compared with devices that are virtualized by the +hypervisor. The device should appear to the guest just as it +would when running on bare metal, so no changes are required +to the Linux device drivers for the device. + +Hyper-V terminology for vPCI devices is "Discrete Device +Assignment" (DDA). Public documentation for Hyper-V DDA is +available here: `DDA`_ + +.. _DDA: https://learn.microsoft.com/en-us/windows-server/virtualization/hyper-v/plan/plan-for-deploying-devices-using-discrete-device-assignment + +DDA is typically used for storage controllers, such as NVMe, +and for GPUs. A similar mechanism for NICs is called SR-IOV +and produces the same benefits by allowing a guest device +driver to interact directly with the hardware. See Hyper-V +public documentation here: `SR-IOV`_ + +.. _SR-IOV: https://learn.microsoft.com/en-us/windows-hardware/drivers/network/overview-of-single-root-i-o-virtualization--sr-iov- + +This discussion of vPCI devices includes DDA and SR-IOV +devices. 
+ +Device Presentation +------------------- +Hyper-V provides full PCI functionality for a vPCI device when +it is operating, so the Linux device driver for the device can +be used unchanged, provided it uses the correct Linux kernel +APIs for accessing PCI config space and for other integration +with Linux. But the initial detection of the PCI device and +its integration with the Linux PCI subsystem must use Hyper-V +specific mechanisms. Consequently, vPCI devices on Hyper-V +have a dual identity. They are initially presented to Linux +guests as VMBus devices via the standard VMBus "offer" +mechanism, so they have a VMBus identity and appear under +/sys/bus/vmbus/devices. The VMBus vPCI driver in Linux at +drivers/pci/controller/pci-hyperv.c handles a newly introduced +vPCI device by fabricating a PCI bus topology and creating all +the normal PCI device data structures in Linux that would +exist if the PCI device were discovered via ACPI on a bare- +metal system. Once those data structures are set up, the +device also has a normal PCI identity in Linux, and the normal +Linux device driver for the vPCI device can function as if it +were running in Linux on bare-metal. Because vPCI devices are +presented dynamically through the VMBus offer mechanism, they +do not appear in the Linux guest's ACPI tables. vPCI devices +may be added to a VM or removed from a VM at any time during +the life of the VM, and not just during initial boot. + +With this approach, the vPCI device is a VMBus device and a +PCI device at the same time. In response to the VMBus offer +message, the hv_pci_probe() function runs and establishes a +VMBus connection to the vPCI VSP on the Hyper-V host. That +connection has a single VMBus channel. The channel is used to +exchange messages with the vPCI VSP for the purpose of setting +up and configuring the vPCI device in Linux. 
Once the device +is fully configured in Linux as a PCI device, the VMBus +channel is used only if Linux changes the vCPU to be interrupted +in the guest, or if the vPCI device is removed from +the VM while the VM is running. The ongoing operation of the +device happens directly between the Linux device driver for +the device and the hardware, with VMBus and the VMBus channel +playing no role. + +PCI Device Setup +---------------- +PCI device setup follows a sequence that Hyper-V originally +created for Windows guests, and that can be ill-suited for +Linux guests due to differences in the overall structure of +the Linux PCI subsystem compared with Windows. Nonetheless, +with a bit of hackery in the Hyper-V virtual PCI driver for +Linux, the virtual PCI device is setup in Linux so that +generic Linux PCI subsystem code and the Linux driver for the +device "just work". + +Each vPCI device is set up in Linux to be in its own PCI +domain with a host bridge. The PCI domainID is derived from +bytes 4 and 5 of the instance GUID assigned to the VMBus vPCI +device. The Hyper-V host does not guarantee that these bytes +are unique, so hv_pci_probe() has an algorithm to resolve +collisions. The collision resolution is intended to be stable +across reboots of the same VM so that the PCI domainIDs don't +change, as the domainID appears in the user space +configuration of some devices. + +hv_pci_probe() allocates a guest MMIO range to be used as PCI +config space for the device. This MMIO range is communicated +to the Hyper-V host over the VMBus channel as part of telling +the host that the device is ready to enter d0. See +hv_pci_enter_d0(). When the guest subsequently accesses this +MMIO range, the Hyper-V host intercepts the accesses and maps +them to the physical device PCI config space. + +hv_pci_probe() also gets BAR information for the device from +the Hyper-V host, and uses this information to allocate MMIO +space for the BARs. 
That MMIO space is then setup to be +associated with the host bridge so that it works when generic +PCI subsystem code in Linux processes the BARs. + +Finally, hv_pci_probe() creates the root PCI bus. At this +point the Hyper-V virtual PCI driver hackery is done, and the +normal Linux PCI machinery for scanning the root bus works to +detect the device, to perform driver matching, and to +initialize the driver and device. + +PCI Device Removal +------------------ +A Hyper-V host may initiate removal of a vPCI device from a +guest VM at any time during the life of the VM. The removal +is instigated by an admin action taken on the Hyper-V host and +is not under the control of the guest OS. + +A guest VM is notified of the removal by an unsolicited +"Eject" message sent from the host to the guest over the VMBus +channel associated with the vPCI device. Upon receipt of such +a message, the Hyper-V virtual PCI driver in Linux +asynchronously invokes Linux kernel PCI subsystem calls to +shutdown and remove the device. When those calls are +complete, an "Ejection Complete" message is sent back to +Hyper-V over the VMBus channel indicating that the device has +been removed. At this point, Hyper-V sends a VMBus rescind +message to the Linux guest, which the VMBus driver in Linux +processes by removing the VMBus identity for the device. Once +that processing is complete, all vestiges of the device having +been present are gone from the Linux kernel. The rescind +message also indicates to the guest that Hyper-V has stopped +providing support for the vPCI device in the guest. If the +guest were to attempt to access that device's MMIO space, it +would be an invalid reference. Hypercalls affecting the device +return errors, and any further messages sent in the VMBus +channel are ignored. + +After sending the Eject message, Hyper-V allows the guest VM +60 seconds to cleanly shutdown the device and respond with +Ejection Complete before sending the VMBus rescind +message. 
If for any reason the Eject steps don't complete +within the allowed 60 seconds, the Hyper-V host forcibly +performs the rescind steps, which will likely result in +cascading errors in the guest because the device is now no +longer present from the guest standpoint and accessing the +device MMIO space will fail. + +Because ejection is asynchronous and can happen at any point +during the guest VM lifecycle, proper synchronization in the +Hyper-V virtual PCI driver is very tricky. Ejection has been +observed even before a newly offered vPCI device has been +fully setup. The Hyper-V virtual PCI driver has been updated +several times over the years to fix race conditions when +ejections happen at inopportune times. Care must be taken when +modifying this code to prevent re-introducing such problems. +See comments in the code. + +Interrupt Assignment +-------------------- +The Hyper-V virtual PCI driver supports vPCI devices using +MSI, multi-MSI, or MSI-X. Assigning the guest vCPU that will +receive the interrupt for a particular MSI or MSI-X message is +complex because of the way the Linux setup of IRQs maps onto +the Hyper-V interfaces. For the single-MSI and MSI-X cases, +Linux calls hv_compse_msi_msg() twice, with the first call +containing a dummy vCPU and the second call containing the +real vCPU. Furthermore, hv_irq_unmask() is finally called +(on x86) or the GICD registers are set (on arm64) to specify +the real vCPU again. Each of these three calls interact +with Hyper-V, which must decide which physical CPU should +receive the interrupt before it is forwarded to the guest VM. +Unfortunately, the Hyper-V decision-making process is a bit +limited, and can result in concentrating the physical +interrupts on a single CPU, causing a performance bottleneck. +See details about how this is resolved in the extensive +comment above the function hv_compose_msi_req_get_cpu(). 
+ +The Hyper-V virtual PCI driver implements the +irq_chip.irq_compose_msi_msg function as hv_compose_msi_msg(). +Unfortunately, on Hyper-V the implementation requires sending +a VMBus message to the Hyper-V host and awaiting an interrupt +indicating receipt of a reply message. Since +irq_chip.irq_compose_msi_msg can be called with IRQ locks +held, it doesn't work to do the normal sleep until awakened by +the interrupt. Instead hv_compose_msi_msg() must send the +VMBus message, and then poll for the completion message. As +further complexity, the vPCI device could be ejected/rescinded +while the polling is in progress, so this scenario must be +detected as well. See comments in the code regarding this +very tricky area. + +Most of the code in the Hyper-V virtual PCI driver (pci- +hyperv.c) applies to Hyper-V and Linux guests running on x86 +and on arm64 architectures. But there are differences in how +interrupt assignments are managed. On x86, the Hyper-V +virtual PCI driver in the guest must make a hypercall to tell +Hyper-V which guest vCPU should be interrupted by each +MSI/MSI-X interrupt, and the x86 interrupt vector number that +the x86_vector IRQ domain has picked for the interrupt. This +hypercall is made by hv_arch_irq_unmask(). On arm64, the +Hyper-V virtual PCI driver manages the allocation of an SPI +for each MSI/MSI-X interrupt. The Hyper-V virtual PCI driver +stores the allocated SPI in the architectural GICD registers, +which Hyper-V emulates, so no hypercall is necessary as with +x86. Hyper-V does not support using LPIs for vPCI devices in +arm64 guest VMs because it does not emulate a GICv3 ITS. + +The Hyper-V virtual PCI driver in Linux supports vPCI devices +whose drivers create managed or unmanaged Linux IRQs. If the +smp_affinity for an unmanaged IRQ is updated via the /proc/irq +interface, the Hyper-V virtual PCI driver is called to tell +the Hyper-V host to change the interrupt targeting and +everything works properly. 
However, on x86 if the x86_vector +IRQ domain needs to reassign an interrupt vector due to +running out of vectors on a CPU, there's no path to inform the +Hyper-V host of the change, and things break. Fortunately, +guest VMs operate in a constrained device environment where +using all the vectors on a CPU doesn't happen. Since such a +problem is only a theoretical concern rather than a practical +concern, it has been left unaddressed. + +DMA +--- +By default, Hyper-V pins all guest VM memory in the host +when the VM is created, and programs the physical IOMMU to +allow the VM to have DMA access to all its memory. Hence +it is safe to assign PCI devices to the VM, and allow the +guest operating system to program the DMA transfers. The +physical IOMMU prevents a malicious guest from initiating +DMA to memory belonging to the host or to other VMs on the +host. From the Linux guest standpoint, such DMA transfers +are in "direct" mode since Hyper-V does not provide a virtual +IOMMU in the guest. + +Hyper-V assumes that physical PCI devices always perform +cache-coherent DMA. When running on x86, this behavior is +required by the architecture. When running on arm64, the +architecture allows for both cache-coherent and +non-cache-coherent devices, with the behavior of each device +specified in the ACPI DSDT. But when a PCI device is assigned +to a guest VM, that device does not appear in the DSDT, so the +Hyper-V VMBus driver propagates cache-coherency information +from the VMBus node in the ACPI DSDT to all VMBus devices, +including vPCI devices (since they have a dual identity as a VMBus +device and as a PCI device). See vmbus_dma_configure(). +Current Hyper-V versions always indicate that the VMBus is +cache coherent, so vPCI devices on arm64 always get marked as +cache coherent and the CPU does not perform any sync +operations as part of dma_map/unmap_*() calls. 
+ +vPCI protocol versions +---------------------- +As previously described, during vPCI device setup and teardown +messages are passed over a VMBus channel between the Hyper-V +host and the Hyper-v vPCI driver in the Linux guest. Some +messages have been revised in newer versions of Hyper-V, so +the guest and host must agree on the vPCI protocol version to +be used. The version is negotiated when communication over +the VMBus channel is first established. See +hv_pci_protocol_negotiation(). Newer versions of the protocol +extend support to VMs with more than 64 vCPUs, and provide +additional information about the vPCI device, such as the +guest virtual NUMA node to which it is most closely affined in +the underlying hardware. + +Guest NUMA node affinity +------------------------ +When the vPCI protocol version provides it, the guest NUMA +node affinity of the vPCI device is stored as part of the Linux +device information for subsequent use by the Linux driver. See +hv_pci_assign_numa_node(). If the negotiated protocol version +does not support the host providing NUMA affinity information, +the Linux guest defaults the device NUMA node to 0. But even +when the negotiated protocol version includes NUMA affinity +information, the ability of the host to provide such +information depends on certain host configuration options. If +the guest receives NUMA node value "0", it could mean NUMA +node 0, or it could mean "no information is available". +Unfortunately it is not possible to distinguish the two cases +from the guest side. + +PCI config space access in a CoCo VM +------------------------------------ +Linux PCI device drivers access PCI config space using a +standard set of functions provided by the Linux PCI subsystem. +In Hyper-V guests these standard functions map to functions +hv_pcifront_read_config() and hv_pcifront_write_config() +in the Hyper-V virtual PCI driver. 
In normal VMs, +these hv_pcifront_*() functions directly access the PCI config +space, and the accesses trap to Hyper-V to be handled. +But in CoCo VMs, memory encryption prevents Hyper-V +from reading the guest instruction stream to emulate the +access, so the hv_pcifront_*() functions must invoke +hypercalls with explicit arguments describing the access to be +made. + +Config Block back-channel +------------------------- +The Hyper-V host and Hyper-V virtual PCI driver in Linux +together implement a non-standard back-channel communication +path between the host and guest. The back-channel path uses +messages sent over the VMBus channel associated with the vPCI +device. The functions hyperv_read_cfg_blk() and +hyperv_write_cfg_blk() are the primary interfaces provided to +other parts of the Linux kernel. As of this writing, these +interfaces are used only by the Mellanox mlx5 driver to pass +diagnostic data to a Hyper-V host running in the Azure public +cloud. The functions hyperv_read_cfg_blk() and +hyperv_write_cfg_blk() are implemented in a separate module +(pci-hyperv-intf.c, under CONFIG_PCI_HYPERV_INTERFACE) that +effectively stubs them out when running in non-Hyper-V +environments. 
diff --git a/MAINTAINERS b/MAINTAINERS index 4f298c4187fb..13158047f2af 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2157,7 +2157,7 @@ M: Shawn Guo <shawnguo@kernel.org> M: Sascha Hauer <s.hauer@pengutronix.de> R: Pengutronix Kernel Team <kernel@pengutronix.de> R: Fabio Estevam <festevam@gmail.com> -R: NXP Linux Team <linux-imx@nxp.com> +L: imx@lists.linux.dev L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers) S: Maintained T: git git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux.git @@ -8496,7 +8496,7 @@ FREESCALE IMX / MXC FEC DRIVER M: Wei Fang <wei.fang@nxp.com> R: Shenwei Wang <shenwei.wang@nxp.com> R: Clark Wang <xiaoning.wang@nxp.com> -R: NXP Linux Team <linux-imx@nxp.com> +L: imx@lists.linux.dev L: netdev@vger.kernel.org S: Maintained F: Documentation/devicetree/bindings/net/fsl,fec.yaml @@ -8531,7 +8531,7 @@ F: drivers/i2c/busses/i2c-imx.c FREESCALE IMX LPI2C DRIVER M: Dong Aisheng <aisheng.dong@nxp.com> L: linux-i2c@vger.kernel.org -L: linux-imx@nxp.com +L: imx@lists.linux.dev S: Maintained F: Documentation/devicetree/bindings/i2c/i2c-imx-lpi2c.yaml F: drivers/i2c/busses/i2c-imx-lpi2c.c @@ -15728,7 +15728,7 @@ F: drivers/iio/gyro/fxas21002c_spi.c NXP i.MX 7D/6SX/6UL/93 AND VF610 ADC DRIVER M: Haibo Chen <haibo.chen@nxp.com> L: linux-iio@vger.kernel.org -L: linux-imx@nxp.com +L: imx@lists.linux.dev S: Maintained F: Documentation/devicetree/bindings/iio/adc/fsl,imx7d-adc.yaml F: Documentation/devicetree/bindings/iio/adc/fsl,vf610-adc.yaml @@ -15765,7 +15765,7 @@ F: drivers/gpu/drm/imx/dcss/ NXP i.MX 8QXP ADC DRIVER M: Cai Huoqing <cai.huoqing@linux.dev> M: Haibo Chen <haibo.chen@nxp.com> -L: linux-imx@nxp.com +L: imx@lists.linux.dev L: linux-iio@vger.kernel.org S: Maintained F: Documentation/devicetree/bindings/iio/adc/nxp,imx8qxp-adc.yaml @@ -15773,7 +15773,7 @@ F: drivers/iio/adc/imx8qxp-adc.c NXP i.MX 8QXP/8QM JPEG V4L2 DRIVER M: Mirela Rabulea <mirela.rabulea@nxp.com> -R: NXP Linux Team <linux-imx@nxp.com> +L: 
imx@lists.linux.dev L: linux-media@vger.kernel.org S: Maintained F: Documentation/devicetree/bindings/media/nxp,imx8-jpeg.yaml @@ -15783,7 +15783,7 @@ NXP i.MX CLOCK DRIVERS M: Abel Vesa <abelvesa@kernel.org> R: Peng Fan <peng.fan@nxp.com> L: linux-clk@vger.kernel.org -L: linux-imx@nxp.com +L: imx@lists.linux.dev S: Maintained T: git git://git.kernel.org/pub/scm/linux/kernel/git/abelvesa/linux.git clk/imx F: Documentation/devicetree/bindings/clock/imx* @@ -19655,7 +19655,7 @@ F: drivers/mmc/host/sdhci-of-at91.c SECURE DIGITAL HOST CONTROLLER INTERFACE (SDHCI) NXP i.MX DRIVER M: Haibo Chen <haibo.chen@nxp.com> -L: linux-imx@nxp.com +L: imx@lists.linux.dev L: linux-mmc@vger.kernel.org S: Maintained F: drivers/mmc/host/sdhci-esdhc-imx.c diff --git a/arch/arm/boot/dts/nxp/imx/imx7s.dtsi b/arch/arm/boot/dts/nxp/imx/imx7s.dtsi index ebf7befcc11e..9c81c6baa2d3 100644 --- a/arch/arm/boot/dts/nxp/imx/imx7s.dtsi +++ b/arch/arm/boot/dts/nxp/imx/imx7s.dtsi @@ -834,16 +834,6 @@ <&clks IMX7D_LCDIF_PIXEL_ROOT_CLK>; clock-names = "pix", "axi"; status = "disabled"; - - port { - #address-cells = <1>; - #size-cells = <0>; - - lcdif_out_mipi_dsi: endpoint@0 { - reg = <0>; - remote-endpoint = <&mipi_dsi_in_lcdif>; - }; - }; }; mipi_csi: mipi-csi@30750000 { @@ -895,22 +885,6 @@ samsung,esc-clock-frequency = <20000000>; samsung,pll-clock-frequency = <24000000>; status = "disabled"; - - ports { - #address-cells = <1>; - #size-cells = <0>; - - port@0 { - reg = <0>; - #address-cells = <1>; - #size-cells = <0>; - - mipi_dsi_in_lcdif: endpoint@0 { - reg = <0>; - remote-endpoint = <&lcdif_out_mipi_dsi>; - }; - }; - }; }; }; diff --git a/arch/arm/configs/imx_v6_v7_defconfig b/arch/arm/configs/imx_v6_v7_defconfig index 0a90583f9f01..8f9dbe8d9029 100644 --- a/arch/arm/configs/imx_v6_v7_defconfig +++ b/arch/arm/configs/imx_v6_v7_defconfig @@ -297,6 +297,7 @@ CONFIG_FB_MODE_HELPERS=y CONFIG_LCD_CLASS_DEVICE=y CONFIG_LCD_L4F00242T03=y CONFIG_LCD_PLATFORM=y +CONFIG_BACKLIGHT_CLASS_DEVICE=y 
CONFIG_BACKLIGHT_PWM=y CONFIG_BACKLIGHT_GPIO=y CONFIG_FRAMEBUFFER_CONSOLE=y diff --git a/arch/arm64/boot/dts/allwinner/Makefile b/arch/arm64/boot/dts/allwinner/Makefile index 91d505b385de..1f1f8d865d0e 100644 --- a/arch/arm64/boot/dts/allwinner/Makefile +++ b/arch/arm64/boot/dts/allwinner/Makefile @@ -42,5 +42,6 @@ dtb-$(CONFIG_ARCH_SUNXI) += sun50i-h616-bigtreetech-cb1-manta.dtb dtb-$(CONFIG_ARCH_SUNXI) += sun50i-h616-bigtreetech-pi.dtb dtb-$(CONFIG_ARCH_SUNXI) += sun50i-h616-orangepi-zero2.dtb dtb-$(CONFIG_ARCH_SUNXI) += sun50i-h616-x96-mate.dtb +dtb-$(CONFIG_ARCH_SUNXI) += sun50i-h618-orangepi-zero2w.dtb dtb-$(CONFIG_ARCH_SUNXI) += sun50i-h618-orangepi-zero3.dtb dtb-$(CONFIG_ARCH_SUNXI) += sun50i-h618-transpeed-8k618-t.dtb diff --git a/arch/arm64/boot/dts/freescale/imx8mp-dhcom-som.dtsi b/arch/arm64/boot/dts/freescale/imx8mp-dhcom-som.dtsi index 4ae4fdab461e..43f1d45ccc96 100644 --- a/arch/arm64/boot/dts/freescale/imx8mp-dhcom-som.dtsi +++ b/arch/arm64/boot/dts/freescale/imx8mp-dhcom-som.dtsi @@ -255,7 +255,7 @@ <&clk IMX8MP_AUDIO_PLL2_OUT>; assigned-clock-parents = <&clk IMX8MP_AUDIO_PLL2_OUT>; assigned-clock-rates = <13000000>, <13000000>, <156000000>; - reset-gpios = <&gpio3 21 GPIO_ACTIVE_HIGH>; + reset-gpios = <&gpio4 1 GPIO_ACTIVE_HIGH>; status = "disabled"; ports { diff --git a/arch/arm64/boot/dts/freescale/imx8mp.dtsi b/arch/arm64/boot/dts/freescale/imx8mp.dtsi index 76c73daf546b..39a550c1cd26 100644 --- a/arch/arm64/boot/dts/freescale/imx8mp.dtsi +++ b/arch/arm64/boot/dts/freescale/imx8mp.dtsi @@ -1820,7 +1820,7 @@ compatible = "fsl,imx8mp-ldb"; reg = <0x5c 0x4>, <0x128 0x4>; reg-names = "ldb", "lvds"; - clocks = <&clk IMX8MP_CLK_MEDIA_LDB>; + clocks = <&clk IMX8MP_CLK_MEDIA_LDB_ROOT>; clock-names = "ldb"; assigned-clocks = <&clk IMX8MP_CLK_MEDIA_LDB>; assigned-clock-parents = <&clk IMX8MP_VIDEO_PLL1_OUT>; diff --git a/arch/arm64/boot/dts/nvidia/tegra234-p3737-0000+p3701-0000.dts b/arch/arm64/boot/dts/nvidia/tegra234-p3737-0000+p3701-0000.dts index 
ea13c4a7027c..81a82933e350 100644 --- a/arch/arm64/boot/dts/nvidia/tegra234-p3737-0000+p3701-0000.dts +++ b/arch/arm64/boot/dts/nvidia/tegra234-p3737-0000+p3701-0000.dts @@ -175,7 +175,7 @@ status = "okay"; phy-handle = <&mgbe0_phy>; - phy-mode = "usxgmii"; + phy-mode = "10gbase-r"; mdio { #address-cells = <1>; diff --git a/arch/arm64/boot/dts/nvidia/tegra234.dtsi b/arch/arm64/boot/dts/nvidia/tegra234.dtsi index 3f16595d099c..d1bd328892af 100644 --- a/arch/arm64/boot/dts/nvidia/tegra234.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra234.dtsi @@ -1459,7 +1459,7 @@ <&mc TEGRA234_MEMORY_CLIENT_MGBEAWR &emc>; interconnect-names = "dma-mem", "write"; iommus = <&smmu_niso0 TEGRA234_SID_MGBE>; - power-domains = <&bpmp TEGRA234_POWER_DOMAIN_MGBEA>; + power-domains = <&bpmp TEGRA234_POWER_DOMAIN_MGBEB>; status = "disabled"; }; @@ -1493,7 +1493,7 @@ <&mc TEGRA234_MEMORY_CLIENT_MGBEBWR &emc>; interconnect-names = "dma-mem", "write"; iommus = <&smmu_niso0 TEGRA234_SID_MGBE_VF1>; - power-domains = <&bpmp TEGRA234_POWER_DOMAIN_MGBEB>; + power-domains = <&bpmp TEGRA234_POWER_DOMAIN_MGBEC>; status = "disabled"; }; @@ -1527,7 +1527,7 @@ <&mc TEGRA234_MEMORY_CLIENT_MGBECWR &emc>; interconnect-names = "dma-mem", "write"; iommus = <&smmu_niso0 TEGRA234_SID_MGBE_VF2>; - power-domains = <&bpmp TEGRA234_POWER_DOMAIN_MGBEC>; + power-domains = <&bpmp TEGRA234_POWER_DOMAIN_MGBED>; status = "disabled"; }; diff --git a/arch/arm64/boot/dts/qcom/msm8996.dtsi b/arch/arm64/boot/dts/qcom/msm8996.dtsi index 8d41ed261adf..ee6f87c828ae 100644 --- a/arch/arm64/boot/dts/qcom/msm8996.dtsi +++ b/arch/arm64/boot/dts/qcom/msm8996.dtsi @@ -457,25 +457,6 @@ }; }; - mpm: interrupt-controller { - compatible = "qcom,mpm"; - qcom,rpm-msg-ram = <&apss_mpm>; - interrupts = <GIC_SPI 171 IRQ_TYPE_EDGE_RISING>; - mboxes = <&apcs_glb 1>; - interrupt-controller; - #interrupt-cells = <2>; - #power-domain-cells = <0>; - interrupt-parent = <&intc>; - qcom,mpm-pin-count = <96>; - qcom,mpm-pin-map = <2 184>, /* TSENS1 
upper_lower_int */ - <52 243>, /* DWC3_PRI ss_phy_irq */ - <79 347>, /* DWC3_PRI hs_phy_irq */ - <80 352>, /* DWC3_SEC hs_phy_irq */ - <81 347>, /* QUSB2_PHY_PRI DP+DM */ - <82 352>, /* QUSB2_PHY_SEC DP+DM */ - <87 326>; /* SPMI */ - }; - psci { compatible = "arm,psci-1.0"; method = "smc"; @@ -765,15 +746,8 @@ }; rpm_msg_ram: sram@68000 { - compatible = "qcom,rpm-msg-ram", "mmio-sram"; + compatible = "qcom,rpm-msg-ram"; reg = <0x00068000 0x6000>; - #address-cells = <1>; - #size-cells = <1>; - ranges = <0 0x00068000 0x7000>; - - apss_mpm: sram@1b8 { - reg = <0x1b8 0x48>; - }; }; qfprom@74000 { @@ -856,8 +830,8 @@ reg = <0x004ad000 0x1000>, /* TM */ <0x004ac000 0x1000>; /* SROT */ #qcom,sensors = <8>; - interrupts-extended = <&mpm 2 IRQ_TYPE_LEVEL_HIGH>, - <&intc GIC_SPI 430 IRQ_TYPE_LEVEL_HIGH>; + interrupts = <GIC_SPI 184 IRQ_TYPE_LEVEL_HIGH>, + <GIC_SPI 430 IRQ_TYPE_LEVEL_HIGH>; interrupt-names = "uplow", "critical"; #thermal-sensor-cells = <1>; }; @@ -1363,7 +1337,6 @@ interrupts = <GIC_SPI 208 IRQ_TYPE_LEVEL_HIGH>; gpio-controller; gpio-ranges = <&tlmm 0 0 150>; - wakeup-parent = <&mpm>; #gpio-cells = <2>; interrupt-controller; #interrupt-cells = <2>; @@ -1891,7 +1864,7 @@ <0x0400a000 0x002100>; reg-names = "core", "chnls", "obsrvr", "intr", "cnfg"; interrupt-names = "periph_irq"; - interrupts-extended = <&mpm 87 IRQ_TYPE_LEVEL_HIGH>; + interrupts = <GIC_SPI 326 IRQ_TYPE_LEVEL_HIGH>; qcom,ee = <0>; qcom,channel = <0>; #address-cells = <2>; @@ -3052,8 +3025,8 @@ #size-cells = <1>; ranges; - interrupts-extended = <&mpm 79 IRQ_TYPE_LEVEL_HIGH>, - <&mpm 52 IRQ_TYPE_LEVEL_HIGH>; + interrupts = <GIC_SPI 347 IRQ_TYPE_LEVEL_HIGH>, + <GIC_SPI 243 IRQ_TYPE_LEVEL_HIGH>; interrupt-names = "hs_phy_irq", "ss_phy_irq"; clocks = <&gcc GCC_SYS_NOC_USB3_AXI_CLK>, diff --git a/arch/arm64/boot/dts/qcom/sc8280xp-crd.dts b/arch/arm64/boot/dts/qcom/sc8280xp-crd.dts index ffc4406422ae..41215567b3ae 100644 --- a/arch/arm64/boot/dts/qcom/sc8280xp-crd.dts +++ 
b/arch/arm64/boot/dts/qcom/sc8280xp-crd.dts @@ -563,6 +563,8 @@ }; &pcie4 { + max-link-speed = <2>; + perst-gpios = <&tlmm 141 GPIO_ACTIVE_LOW>; wake-gpios = <&tlmm 139 GPIO_ACTIVE_LOW>; diff --git a/arch/arm64/boot/dts/qcom/sc8280xp-lenovo-thinkpad-x13s.dts b/arch/arm64/boot/dts/qcom/sc8280xp-lenovo-thinkpad-x13s.dts index def3976bd5bb..eb657e544961 100644 --- a/arch/arm64/boot/dts/qcom/sc8280xp-lenovo-thinkpad-x13s.dts +++ b/arch/arm64/boot/dts/qcom/sc8280xp-lenovo-thinkpad-x13s.dts @@ -722,6 +722,8 @@ }; &pcie4 { + max-link-speed = <2>; + perst-gpios = <&tlmm 141 GPIO_ACTIVE_LOW>; wake-gpios = <&tlmm 139 GPIO_ACTIVE_LOW>; diff --git a/arch/arm64/boot/dts/qcom/sm6115.dtsi b/arch/arm64/boot/dts/qcom/sm6115.dtsi index 160e098f1075..f9849b8befbf 100644 --- a/arch/arm64/boot/dts/qcom/sm6115.dtsi +++ b/arch/arm64/boot/dts/qcom/sm6115.dtsi @@ -1304,6 +1304,9 @@ &config_noc SLAVE_QUP_0 RPM_ALWAYS_TAG>, <&system_noc MASTER_QUP_0 RPM_ALWAYS_TAG &bimc SLAVE_EBI_CH0 RPM_ALWAYS_TAG>; + interconnect-names = "qup-core", + "qup-config", + "qup-memory"; #address-cells = <1>; #size-cells = <0>; status = "disabled"; diff --git a/arch/arm64/boot/dts/qcom/sm8650-mtp.dts b/arch/arm64/boot/dts/qcom/sm8650-mtp.dts index 9d916edb1c73..be133a3d5cbe 100644 --- a/arch/arm64/boot/dts/qcom/sm8650-mtp.dts +++ b/arch/arm64/boot/dts/qcom/sm8650-mtp.dts @@ -622,7 +622,7 @@ &tlmm { /* Reserved I/Os for NFC */ - gpio-reserved-ranges = <32 8>; + gpio-reserved-ranges = <32 8>, <74 1>; disp0_reset_n_active: disp0-reset-n-active-state { pins = "gpio133"; diff --git a/arch/arm64/boot/dts/qcom/sm8650-qrd.dts b/arch/arm64/boot/dts/qcom/sm8650-qrd.dts index 592a67a47c78..b9151c2ddf2e 100644 --- a/arch/arm64/boot/dts/qcom/sm8650-qrd.dts +++ b/arch/arm64/boot/dts/qcom/sm8650-qrd.dts @@ -659,7 +659,7 @@ &tlmm { /* Reserved I/Os for NFC */ - gpio-reserved-ranges = <32 8>; + gpio-reserved-ranges = <32 8>, <74 1>; bt_default: bt-default-state { bt-en-pins { diff --git a/arch/x86/hyperv/hv_vtl.c 
b/arch/x86/hyperv/hv_vtl.c
index 96e6c51515f5..cf1b78cb2d04 100644
--- a/arch/x86/hyperv/hv_vtl.c
+++ b/arch/x86/hyperv/hv_vtl.c
@@ -16,6 +16,11 @@
 extern struct boot_params boot_params;
 static struct real_mode_header hv_vtl_real_mode_header;
 
+static bool __init hv_vtl_msi_ext_dest_id(void)
+{
+	return true;
+}
+
 void __init hv_vtl_init_platform(void)
 {
 	pr_info("Linux runs in Hyper-V Virtual Trust Level\n");
@@ -38,6 +43,8 @@ void __init hv_vtl_init_platform(void)
 	x86_platform.legacy.warm_reset = 0;
 	x86_platform.legacy.reserve_bios_regions = 0;
 	x86_platform.legacy.devices.pnpbios = 0;
+
+	x86_init.hyper.msi_ext_dest_id = hv_vtl_msi_ext_dest_id;
 }
 
 static inline u64 hv_vtl_system_desc_base(struct ldttss_desc *desc)
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 7dcbf153ad72..768d73de0d09 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -15,6 +15,7 @@
 #include <asm/io.h>
 #include <asm/coco.h>
 #include <asm/mem_encrypt.h>
+#include <asm/set_memory.h>
 #include <asm/mshyperv.h>
 #include <asm/hypervisor.h>
 #include <asm/mtrr.h>
@@ -503,6 +504,31 @@ static int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
 }
 
 /*
+ * When transitioning memory between encrypted and decrypted, the caller
+ * of set_memory_encrypted() or set_memory_decrypted() is responsible for
+ * ensuring that the memory isn't in use and isn't referenced while the
+ * transition is in progress. The transition has multiple steps, and the
+ * memory is in an inconsistent state until all steps are complete. A
+ * reference while the state is inconsistent could result in an exception
+ * that can't be cleanly fixed up.
+ *
+ * But the Linux kernel load_unaligned_zeropad() mechanism could cause a
+ * stray reference that can't be prevented by the caller, so Linux has
+ * specific code to handle this case. But when the #VC and #VE exceptions
+ * routed to a paravisor, the specific code doesn't work. To avoid this
+ * problem, mark the pages as "not present" while the transition is in
+ * progress. If load_unaligned_zeropad() causes a stray reference, a normal
+ * page fault is generated instead of #VC or #VE, and the page-fault-based
+ * handlers for load_unaligned_zeropad() resolve the reference. When the
+ * transition is complete, hv_vtom_set_host_visibility() marks the pages
+ * as "present" again.
+ */
+static bool hv_vtom_clear_present(unsigned long kbuffer, int pagecount, bool enc)
+{
+	return !set_memory_np(kbuffer, pagecount);
+}
+
+/*
  * hv_vtom_set_host_visibility - Set specified memory visible to host.
  *
  * In Isolation VM, all guest memory is encrypted from host and guest
@@ -515,16 +541,28 @@ static bool hv_vtom_set_host_visibility(unsigned long kbuffer, int pagecount, bo
 	enum hv_mem_host_visibility visibility = enc ?
 			VMBUS_PAGE_NOT_VISIBLE : VMBUS_PAGE_VISIBLE_READ_WRITE;
 	u64 *pfn_array;
+	phys_addr_t paddr;
+	void *vaddr;
 	int ret = 0;
 	bool result = true;
 	int i, pfn;
 
 	pfn_array = kmalloc(HV_HYP_PAGE_SIZE, GFP_KERNEL);
-	if (!pfn_array)
-		return false;
+	if (!pfn_array) {
+		result = false;
+		goto err_set_memory_p;
+	}
 
 	for (i = 0, pfn = 0; i < pagecount; i++) {
-		pfn_array[pfn] = virt_to_hvpfn((void *)kbuffer + i * HV_HYP_PAGE_SIZE);
+		/*
+		 * Use slow_virt_to_phys() because the PRESENT bit has been
+		 * temporarily cleared in the PTEs. slow_virt_to_phys() works
+		 * without the PRESENT bit while virt_to_hvpfn() or similar
+		 * does not.
+		 */
+		vaddr = (void *)kbuffer + (i * HV_HYP_PAGE_SIZE);
+		paddr = slow_virt_to_phys(vaddr);
+		pfn_array[pfn] = paddr >> HV_HYP_PAGE_SHIFT;
 		pfn++;
 
 		if (pfn == HV_MAX_MODIFY_GPA_REP_COUNT || i == pagecount - 1) {
@@ -538,14 +576,30 @@ static bool hv_vtom_set_host_visibility(unsigned long kbuffer, int pagecount, bo
 		}
 	}
 
- err_free_pfn_array:
+err_free_pfn_array:
 	kfree(pfn_array);
+
+err_set_memory_p:
+	/*
+	 * Set the PTE PRESENT bits again to revert what hv_vtom_clear_present()
+	 * did. Do this even if there is an error earlier in this function in
+	 * order to avoid leaving the memory range in a "broken" state. Setting
+	 * the PRESENT bits shouldn't fail, but return an error if it does.
+	 */
+	if (set_memory_p(kbuffer, pagecount))
+		result = false;
+
 	return result;
 }
 
 static bool hv_vtom_tlb_flush_required(bool private)
 {
-	return true;
+	/*
+	 * Since hv_vtom_clear_present() marks the PTEs as "not present"
+	 * and flushes the TLB, they can't be in the TLB. That makes the
+	 * flush controlled by this function redundant, so return "false".
+	 */
+	return false;
 }
 
 static bool hv_vtom_cache_flush_required(void)
@@ -608,6 +662,7 @@ void __init hv_vtom_init(void)
 	x86_platform.hyper.is_private_mmio = hv_is_private_mmio;
 	x86_platform.guest.enc_cache_flush_required = hv_vtom_cache_flush_required;
 	x86_platform.guest.enc_tlb_flush_required = hv_vtom_tlb_flush_required;
+	x86_platform.guest.enc_status_change_prepare = hv_vtom_clear_present;
 	x86_platform.guest.enc_status_change_finish = hv_vtom_set_host_visibility;
 
 	/* Set WB as the default cache mode. */
diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index a5e89641bd2d..9aee31862b4a 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -47,6 +47,7 @@ int set_memory_uc(unsigned long addr, int numpages);
 int set_memory_wc(unsigned long addr, int numpages);
 int set_memory_wb(unsigned long addr, int numpages);
 int set_memory_np(unsigned long addr, int numpages);
+int set_memory_p(unsigned long addr, int numpages);
 int set_memory_4k(unsigned long addr, int numpages);
 int set_memory_encrypted(unsigned long addr, int numpages);
 int set_memory_decrypted(unsigned long addr, int numpages);
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index e9b448d1b1b7..102880404046 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -755,10 +755,14 @@ pmd_t *lookup_pmd_address(unsigned long address)
  * areas on 32-bit NUMA systems. The percpu areas can
  * end up in this kind of memory, for instance.
  *
- * This could be optimized, but it is only intended to be
- * used at initialization time, and keeping it
- * unoptimized should increase the testing coverage for
- * the more obscure platforms.
+ * Note that as long as the PTEs are well-formed with correct PFNs, this
+ * works without checking the PRESENT bit in the leaf PTE. This is unlike
+ * the similar vmalloc_to_page() and derivatives. Callers may depend on
+ * this behavior.
+ *
+ * This could be optimized, but it is only used in paths that are not perf
+ * sensitive, and keeping it unoptimized should increase the testing coverage
+ * for the more obscure platforms.
  */
 phys_addr_t slow_virt_to_phys(void *__virt_addr)
 {
@@ -2041,17 +2045,12 @@ int set_mce_nospec(unsigned long pfn)
 	return rc;
 }
 
-static int set_memory_p(unsigned long *addr, int numpages)
-{
-	return change_page_attr_set(addr, numpages, __pgprot(_PAGE_PRESENT), 0);
-}
-
 /* Restore full speculative operation to the pfn. */
 int clear_mce_nospec(unsigned long pfn)
 {
 	unsigned long addr = (unsigned long) pfn_to_kaddr(pfn);
 
-	return set_memory_p(&addr, 1);
+	return set_memory_p(addr, 1);
 }
 EXPORT_SYMBOL_GPL(clear_mce_nospec);
 #endif /* CONFIG_X86_64 */
@@ -2104,6 +2103,11 @@ int set_memory_np_noalias(unsigned long addr, int numpages)
 					CPA_NO_CHECK_ALIAS, NULL);
 }
 
+int set_memory_p(unsigned long addr, int numpages)
+{
+	return change_page_attr_set(&addr, numpages, __pgprot(_PAGE_PRESENT), 0);
+}
+
 int set_memory_4k(unsigned long addr, int numpages)
 {
 	return change_page_attr_set_clr(&addr, numpages, __pgprot(0),
diff --git a/drivers/crypto/allwinner/sun8i-ce/sun8i-ce-cipher.c b/drivers/crypto/allwinner/sun8i-ce/sun8i-ce-cipher.c
index 1262a7773ef3..de50c00ba218 100644
--- a/drivers/crypto/allwinner/sun8i-ce/sun8i-ce-cipher.c
+++ b/drivers/crypto/allwinner/sun8i-ce/sun8i-ce-cipher.c
@@ -299,22 +299,6 @@ theend:
 	return err;
 }
 
-static void sun8i_ce_cipher_run(struct crypto_engine *engine, void *areq)
-{
-	struct skcipher_request *breq = container_of(areq, struct skcipher_request, base);
-	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(breq);
-	struct sun8i_cipher_tfm_ctx *op = crypto_skcipher_ctx(tfm);
-	struct sun8i_ce_dev *ce = op->ce;
-	struct sun8i_cipher_req_ctx *rctx = skcipher_request_ctx(breq);
-	int flow, err;
-
-	flow = rctx->flow;
-	err = sun8i_ce_run_task(ce, flow, crypto_tfm_alg_name(breq->base.tfm));
-	local_bh_disable();
-	crypto_finalize_skcipher_request(engine, breq, err);
-	local_bh_enable();
-}
-
 static void sun8i_ce_cipher_unprepare(struct crypto_engine *engine,
 				      void *async_req)
 {
@@ -360,6 +344,23 @@ static void sun8i_ce_cipher_unprepare(struct crypto_engine *engine,
 	dma_unmap_single(ce->dev, rctx->addr_key, op->keylen, DMA_TO_DEVICE);
 }
 
+static void sun8i_ce_cipher_run(struct crypto_engine *engine, void *areq)
+{
+	struct skcipher_request *breq = container_of(areq, struct skcipher_request, base);
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(breq);
+	struct sun8i_cipher_tfm_ctx *op = crypto_skcipher_ctx(tfm);
+	struct sun8i_ce_dev *ce = op->ce;
+	struct sun8i_cipher_req_ctx *rctx = skcipher_request_ctx(breq);
+	int flow, err;
+
+	flow = rctx->flow;
+	err = sun8i_ce_run_task(ce, flow, crypto_tfm_alg_name(breq->base.tfm));
+	sun8i_ce_cipher_unprepare(engine, areq);
+	local_bh_disable();
+	crypto_finalize_skcipher_request(engine, breq, err);
+	local_bh_enable();
+}
+
 int sun8i_ce_cipher_do_one(struct crypto_engine *engine, void *areq)
 {
 	int err = sun8i_ce_cipher_prepare(engine, areq);
@@ -368,7 +369,6 @@ int sun8i_ce_cipher_do_one(struct crypto_engine *engine, void *areq)
 		return err;
 
 	sun8i_ce_cipher_run(engine, areq);
-	sun8i_ce_cipher_unprepare(engine, areq);
 	return 0;
 }
 
diff --git a/drivers/crypto/rockchip/rk3288_crypto_ahash.c b/drivers/crypto/rockchip/rk3288_crypto_ahash.c
index 1b13b4aa16ec..a235e6c300f1 100644
--- a/drivers/crypto/rockchip/rk3288_crypto_ahash.c
+++ b/drivers/crypto/rockchip/rk3288_crypto_ahash.c
@@ -332,12 +332,12 @@ static int rk_hash_run(struct crypto_engine *engine, void *breq)
 theend:
 	pm_runtime_put_autosuspend(rkc->dev);
 
+	rk_hash_unprepare(engine, breq);
+
 	local_bh_disable();
 	crypto_finalize_hash_request(engine, breq, err);
 	local_bh_enable();
 
-	rk_hash_unprepare(engine, breq);
-
 	return 0;
 }
 
diff --git a/drivers/firmware/microchip/mpfs-auto-update.c b/drivers/firmware/microchip/mpfs-auto-update.c
index 682e417be5a3..fbeeaee4ac85 100644
--- a/drivers/firmware/microchip/mpfs-auto-update.c
+++ b/drivers/firmware/microchip/mpfs-auto-update.c
@@ -384,7 +384,8 @@ static int mpfs_auto_update_available(struct mpfs_auto_update_priv *priv)
 	u32 *response_msg;
 	int ret;
 
-	response_msg = devm_kzalloc(priv->dev, AUTO_UPDATE_FEATURE_RESP_SIZE * sizeof(response_msg),
+	response_msg = devm_kzalloc(priv->dev,
+				    AUTO_UPDATE_FEATURE_RESP_SIZE * sizeof(*response_msg),
 				    GFP_KERNEL);
 	if (!response_msg)
 		return -ENOMEM;
diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 56f7e06c673e..adbf674355b2 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -322,125 +322,89 @@ static int create_gpadl_header(enum hv_gpadl_type type, void *kbuffer,
 
 	pagecount = hv_gpadl_size(type, size) >> HV_HYP_PAGE_SHIFT;
 
-	/* do we need a gpadl body msg */
 	pfnsize = MAX_SIZE_CHANNEL_MESSAGE -
 		  sizeof(struct vmbus_channel_gpadl_header) -
 		  sizeof(struct gpa_range);
+	pfncount = umin(pagecount, pfnsize / sizeof(u64));
+
+	msgsize = sizeof(struct vmbus_channel_msginfo) +
+		  sizeof(struct vmbus_channel_gpadl_header) +
+		  sizeof(struct gpa_range) + pfncount * sizeof(u64);
+	msgheader = kzalloc(msgsize, GFP_KERNEL);
+	if (!msgheader)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&msgheader->submsglist);
+	msgheader->msgsize = msgsize;
+
+	gpadl_header = (struct vmbus_channel_gpadl_header *)
+		msgheader->msg;
+	gpadl_header->rangecount = 1;
+	gpadl_header->range_buflen = sizeof(struct gpa_range) +
+				 pagecount * sizeof(u64);
+	gpadl_header->range[0].byte_offset = 0;
+	gpadl_header->range[0].byte_count = hv_gpadl_size(type, size);
+	for (i = 0; i < pfncount; i++)
+		gpadl_header->range[0].pfn_array[i] = hv_gpadl_hvpfn(
+			type, kbuffer, size, send_offset, i);
+	*msginfo = msgheader;
+
+	pfnsum = pfncount;
+	pfnleft = pagecount - pfncount;
+
+	/* how many pfns can we fit in a body message */
+	pfnsize = MAX_SIZE_CHANNEL_MESSAGE -
+		  sizeof(struct vmbus_channel_gpadl_body);
 	pfncount = pfnsize / sizeof(u64);
 
-	if (pagecount > pfncount) {
-		/* we need a gpadl body */
-		/* fill in the header */
+	/*
+	 * If pfnleft is zero, everything fits in the header and no body
+	 * messages are needed
+	 */
+	while (pfnleft) {
+		pfncurr = umin(pfncount, pfnleft);
 		msgsize = sizeof(struct vmbus_channel_msginfo) +
-			  sizeof(struct vmbus_channel_gpadl_header) +
-			  sizeof(struct gpa_range) + pfncount * sizeof(u64);
-		msgheader = kzalloc(msgsize, GFP_KERNEL);
-		if (!msgheader)
-			goto nomem;
-
-		INIT_LIST_HEAD(&msgheader->submsglist);
-		msgheader->msgsize = msgsize;
-
-		gpadl_header = (struct vmbus_channel_gpadl_header *)
-			msgheader->msg;
-		gpadl_header->rangecount = 1;
-		gpadl_header->range_buflen = sizeof(struct gpa_range) +
-					 pagecount * sizeof(u64);
-		gpadl_header->range[0].byte_offset = 0;
-		gpadl_header->range[0].byte_count = hv_gpadl_size(type, size);
-		for (i = 0; i < pfncount; i++)
-			gpadl_header->range[0].pfn_array[i] = hv_gpadl_hvpfn(
-				type, kbuffer, size, send_offset, i);
-		*msginfo = msgheader;
-
-		pfnsum = pfncount;
-		pfnleft = pagecount - pfncount;
-
-		/* how many pfns can we fit */
-		pfnsize = MAX_SIZE_CHANNEL_MESSAGE -
-			  sizeof(struct vmbus_channel_gpadl_body);
-		pfncount = pfnsize / sizeof(u64);
-
-		/* fill in the body */
-		while (pfnleft) {
-			if (pfnleft > pfncount)
-				pfncurr = pfncount;
-			else
-				pfncurr = pfnleft;
-
-			msgsize = sizeof(struct vmbus_channel_msginfo) +
-				  sizeof(struct vmbus_channel_gpadl_body) +
-				  pfncurr * sizeof(u64);
-			msgbody = kzalloc(msgsize, GFP_KERNEL);
-
-			if (!msgbody) {
-				struct vmbus_channel_msginfo *pos = NULL;
-				struct vmbus_channel_msginfo *tmp = NULL;
-				/*
-				 * Free up all the allocated messages.
-				 */
-				list_for_each_entry_safe(pos, tmp,
-					&msgheader->submsglist,
-					msglistentry) {
-
-					list_del(&pos->msglistentry);
-					kfree(pos);
-				}
-
-				goto nomem;
-			}
-
-			msgbody->msgsize = msgsize;
-			gpadl_body =
-				(struct vmbus_channel_gpadl_body *)msgbody->msg;
+			  sizeof(struct vmbus_channel_gpadl_body) +
+			  pfncurr * sizeof(u64);
+		msgbody = kzalloc(msgsize, GFP_KERNEL);
 
+		if (!msgbody) {
+			struct vmbus_channel_msginfo *pos = NULL;
+			struct vmbus_channel_msginfo *tmp = NULL;
 			/*
-			 * Gpadl is u32 and we are using a pointer which could
-			 * be 64-bit
-			 * This is governed by the guest/host protocol and
-			 * so the hypervisor guarantees that this is ok.
+			 * Free up all the allocated messages.
 			 */
-			for (i = 0; i < pfncurr; i++)
-				gpadl_body->pfn[i] = hv_gpadl_hvpfn(type,
-					kbuffer, size, send_offset, pfnsum + i);
-
-			/* add to msg header */
-			list_add_tail(&msgbody->msglistentry,
-				      &msgheader->submsglist);
-			pfnsum += pfncurr;
-			pfnleft -= pfncurr;
+			list_for_each_entry_safe(pos, tmp,
+				&msgheader->submsglist,
+				msglistentry) {
+
+				list_del(&pos->msglistentry);
+				kfree(pos);
+			}
+			kfree(msgheader);
+			return -ENOMEM;
 		}
-	} else {
-		/* everything fits in a header */
-		msgsize = sizeof(struct vmbus_channel_msginfo) +
-			  sizeof(struct vmbus_channel_gpadl_header) +
-			  sizeof(struct gpa_range) + pagecount * sizeof(u64);
-		msgheader = kzalloc(msgsize, GFP_KERNEL);
-		if (msgheader == NULL)
-			goto nomem;
-
-		INIT_LIST_HEAD(&msgheader->submsglist);
-		msgheader->msgsize = msgsize;
-
-		gpadl_header = (struct vmbus_channel_gpadl_header *)
-			msgheader->msg;
-		gpadl_header->rangecount = 1;
-		gpadl_header->range_buflen = sizeof(struct gpa_range) +
-					 pagecount * sizeof(u64);
-		gpadl_header->range[0].byte_offset = 0;
-		gpadl_header->range[0].byte_count = hv_gpadl_size(type, size);
-		for (i = 0; i < pagecount; i++)
-			gpadl_header->range[0].pfn_array[i] = hv_gpadl_hvpfn(
-				type, kbuffer, size, send_offset, i);
-
-		*msginfo = msgheader;
+
+		msgbody->msgsize = msgsize;
+		gpadl_body = (struct vmbus_channel_gpadl_body *)msgbody->msg;
+
+		/*
+		 * Gpadl is u32 and we are using a pointer which could
+		 * be 64-bit
+		 * This is governed by the guest/host protocol and
+		 * so the hypervisor guarantees that this is ok.
+		 */
+		for (i = 0; i < pfncurr; i++)
+			gpadl_body->pfn[i] = hv_gpadl_hvpfn(type,
+				kbuffer, size, send_offset, pfnsum + i);
+
+		/* add to msg header */
+		list_add_tail(&msgbody->msglistentry, &msgheader->submsglist);
+		pfnsum += pfncurr;
+		pfnleft -= pfncurr;
 	}
 
 	return 0;
-nomem:
-	kfree(msgheader);
-	kfree(msgbody);
-	return -ENOMEM;
 }
 
 /*
diff --git a/drivers/hv/hv_util.c b/drivers/hv/hv_util.c
index 42aec2c5606a..9c97c4065fe7 100644
--- a/drivers/hv/hv_util.c
+++ b/drivers/hv/hv_util.c
@@ -296,6 +296,11 @@ static struct {
 	spinlock_t lock;
 } host_ts;
 
+static bool timesync_implicit;
+
+module_param(timesync_implicit, bool, 0644);
+MODULE_PARM_DESC(timesync_implicit, "If set treat SAMPLE as SYNC when clock is behind");
+
 static inline u64 reftime_to_ns(u64 reftime)
 {
 	return (reftime - WLTIMEDELTA) * 100;
@@ -345,6 +350,29 @@ static void hv_set_host_time(struct work_struct *work)
 }
 
 /*
+ * Due to a bug on Hyper-V hosts, the sync flag may not always be sent on resume.
+ * Force a sync if the guest is behind.
+ */
+static inline bool hv_implicit_sync(u64 host_time)
+{
+	struct timespec64 new_ts;
+	struct timespec64 threshold_ts;
+
+	new_ts = ns_to_timespec64(reftime_to_ns(host_time));
+	ktime_get_real_ts64(&threshold_ts);
+
+	threshold_ts.tv_sec += 5;
+
+	/*
+	 * If guest behind the host by 5 or more seconds.
+	 */
+	if (timespec64_compare(&new_ts, &threshold_ts) >= 0)
+		return true;
+
+	return false;
+}
+
+/*
  * Synchronize time with host after reboot, restore, etc.
  *
  * ICTIMESYNCFLAG_SYNC flag bit indicates reboot, restore events of the VM.
@@ -384,7 +412,8 @@ static inline void adj_guesttime(u64 hosttime, u64 reftime, u8 adj_flags)
 	spin_unlock_irqrestore(&host_ts.lock, flags);
 
 	/* Schedule work to do do_settimeofday64() */
-	if (adj_flags & ICTIMESYNCFLAG_SYNC)
+	if ((adj_flags & ICTIMESYNCFLAG_SYNC) ||
+	    (timesync_implicit && hv_implicit_sync(host_ts.host_time)))
 		schedule_work(&adj_time_work);
 }
 
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index b33d5abd9beb..7f7965f3d187 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -988,7 +988,7 @@ static const struct dev_pm_ops vmbus_pm = {
 };
 
 /* The one and only one */
-static struct bus_type hv_bus = {
+static const struct bus_type hv_bus = {
 	.name = "vmbus",
 	.match = vmbus_match,
 	.shutdown = vmbus_shutdown,
diff --git a/drivers/platform/x86/amd/pmf/tee-if.c b/drivers/platform/x86/amd/pmf/tee-if.c
index 8527dca9cf56..dcbe8f85e122 100644
--- a/drivers/platform/x86/amd/pmf/tee-if.c
+++ b/drivers/platform/x86/amd/pmf/tee-if.c
@@ -458,8 +458,10 @@ int amd_pmf_init_smart_pc(struct amd_pmf_dev *dev)
 	amd_pmf_hex_dump_pb(dev);
 
 	dev->prev_data = kzalloc(sizeof(*dev->prev_data), GFP_KERNEL);
-	if (!dev->prev_data)
+	if (!dev->prev_data) {
+		ret = -ENOMEM;
 		goto error;
+	}
 
 	ret = amd_pmf_start_policy_engine(dev);
 	if (ret)
diff --git a/drivers/platform/x86/p2sb.c b/drivers/platform/x86/p2sb.c
index 6bd14d0132db..3d66e1d4eb1f 100644
--- a/drivers/platform/x86/p2sb.c
+++ b/drivers/platform/x86/p2sb.c
@@ -20,9 +20,11 @@
 #define P2SBC_HIDE		BIT(8)
 
 #define P2SB_DEVFN_DEFAULT	PCI_DEVFN(31, 1)
+#define P2SB_DEVFN_GOLDMONT	PCI_DEVFN(13, 0)
+#define SPI_DEVFN_GOLDMONT	PCI_DEVFN(13, 2)
 
 static const struct x86_cpu_id p2sb_cpu_ids[] = {
-	X86_MATCH_INTEL_FAM6_MODEL(ATOM_GOLDMONT, PCI_DEVFN(13, 0)),
+	X86_MATCH_INTEL_FAM6_MODEL(ATOM_GOLDMONT, P2SB_DEVFN_GOLDMONT),
 	{}
 };
 
@@ -98,21 +100,12 @@ static void p2sb_scan_and_cache_devfn(struct pci_bus *bus, unsigned int devfn)
 
 static int p2sb_scan_and_cache(struct pci_bus *bus, unsigned int devfn)
 {
-	unsigned int slot, fn;
-
-	if (PCI_FUNC(devfn) == 0) {
-		/*
-		 * When function number of the P2SB device is zero, scan it and
-		 * other function numbers, and if devices are available, cache
-		 * their BAR0s.
-		 */
-		slot = PCI_SLOT(devfn);
-		for (fn = 0; fn < NR_P2SB_RES_CACHE; fn++)
-			p2sb_scan_and_cache_devfn(bus, PCI_DEVFN(slot, fn));
-	} else {
-		/* Scan the P2SB device and cache its BAR0 */
-		p2sb_scan_and_cache_devfn(bus, devfn);
-	}
+	/* Scan the P2SB device and cache its BAR0 */
+	p2sb_scan_and_cache_devfn(bus, devfn);
+
+	/* On Goldmont p2sb_bar() also gets called for the SPI controller */
+	if (devfn == P2SB_DEVFN_GOLDMONT)
+		p2sb_scan_and_cache_devfn(bus, SPI_DEVFN_GOLDMONT);
 
 	if (!p2sb_valid_resource(&p2sb_resources[PCI_FUNC(devfn)].res))
 		return -ENOENT;
diff --git a/drivers/tee/optee/device.c b/drivers/tee/optee/device.c
index 4b1092127694..1892e49a8e6a 100644
--- a/drivers/tee/optee/device.c
+++ b/drivers/tee/optee/device.c
@@ -90,13 +90,14 @@ static int optee_register_device(const uuid_t *device_uuid, u32 func)
 	if (rc) {
 		pr_err("device registration failed, err: %d\n", rc);
 		put_device(&optee_device->dev);
+		return rc;
 	}
 
 	if (func == PTA_CMD_GET_DEVICES_SUPP)
 		device_create_file(&optee_device->dev,
 				   &dev_attr_need_supplicant);
 
-	return rc;
+	return 0;
 }
 
 static int __optee_enumerate_devices(u32 func)
diff --git a/drivers/video/fbdev/hyperv_fb.c b/drivers/video/fbdev/hyperv_fb.c
index c26ee6fd73c9..8fdccf033b2d 100644
--- a/drivers/video/fbdev/hyperv_fb.c
+++ b/drivers/video/fbdev/hyperv_fb.c
@@ -1010,8 +1010,6 @@ static int hvfb_getmem(struct hv_device *hdev, struct fb_info *info)
 			goto getmem_done;
 		}
 		pr_info("Unable to allocate enough contiguous physical memory on Gen 1 VM. Using MMIO instead.\n");
-	} else {
-		goto err1;
 	}
 
 	/*
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 2b00faf98017..6ef0557b4bff 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -164,8 +164,28 @@ struct hv_ring_buffer {
 	u8 buffer[];
 } __packed;
 
+
+/*
+ * If the requested ring buffer size is at least 8 times the size of the
+ * header, steal space from the ring buffer for the header. Otherwise, add
+ * space for the header so that is doesn't take too much of the ring buffer
+ * space.
+ *
+ * The factor of 8 is somewhat arbitrary. The goal is to prevent adding a
+ * relatively small header (4 Kbytes on x86) to a large-ish power-of-2 ring
+ * buffer size (such as 128 Kbytes) and so end up making a nearly twice as
+ * large allocation that will be almost half wasted. As a contrasting example,
+ * on ARM64 with 64 Kbyte page size, we don't want to take 64 Kbytes for the
+ * header from a 128 Kbyte allocation, leaving only 64 Kbytes for the ring.
+ * In this latter case, we must add 64 Kbytes for the header and not worry
+ * about what's wasted.
+ */
+#define VMBUS_HEADER_ADJ(payload_sz) \
+	((payload_sz) >= 8 * sizeof(struct hv_ring_buffer) ? \
+	0 : sizeof(struct hv_ring_buffer))
+
 /* Calculate the proper size of a ringbuffer, it must be page-aligned */
-#define VMBUS_RING_SIZE(payload_sz) PAGE_ALIGN(sizeof(struct hv_ring_buffer) + \
+#define VMBUS_RING_SIZE(payload_sz) PAGE_ALIGN(VMBUS_HEADER_ADJ(payload_sz) + \
 					       (payload_sz))
 
 struct hv_ring_buffer_info {
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index ba36c073304a..927bef3a598a 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2562,7 +2562,7 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 	update_partition_sd_lb(cs, old_prs);
 out_free:
 	free_cpumasks(NULL, &tmp);
-	return 0;
+	return retval;
 }
 
 /**
@@ -2598,9 +2598,6 @@ static int update_exclusive_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 	if (cpumask_equal(cs->exclusive_cpus, trialcs->exclusive_cpus))
 		return 0;
 
-	if (alloc_cpumasks(NULL, &tmp))
-		return -ENOMEM;
-
 	if (*buf)
 		compute_effective_exclusive_cpumask(trialcs, NULL);
 
@@ -2615,6 +2612,9 @@ static int update_exclusive_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 	if (retval)
 		return retval;
 
+	if (alloc_cpumasks(NULL, &tmp))
+		return -ENOMEM;
+
 	if (old_prs) {
 		if (cpumask_empty(trialcs->effective_xcpus)) {
 			invalidate = true;
diff --git a/security/integrity/digsig.c b/security/integrity/digsig.c
index df387de29bfa..45c3e5dda355 100644
--- a/security/integrity/digsig.c
+++ b/security/integrity/digsig.c
@@ -179,7 +179,8 @@ static int __init integrity_add_key(const unsigned int id, const void *data,
 				   KEY_ALLOC_NOT_IN_QUOTA);
 	if (IS_ERR(key)) {
 		rc = PTR_ERR(key);
-		pr_err("Problem loading X.509 certificate %d\n", rc);
+		if (id != INTEGRITY_KEYRING_MACHINE)
+			pr_err("Problem loading X.509 certificate %d\n", rc);
 	} else {
 		pr_notice("Loaded X.509 cert '%s'\n",
 			  key_ref_to_ptr(key)->description);