summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2020-03-16KVM: x86: Allocate memslot resources during prepare_memory_region()Sean Christopherson
Allocate the various metadata structures associated with a new memslot during kvm_arch_prepare_memory_region(), which paves the way for removing kvm_arch_create_memslot() altogether. Moving x86's memory allocation only changes the order of kernel memory allocations between x86 and common KVM code. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: PPC: Move memslot memory allocation into prepare_memory_region()Sean Christopherson
Allocate the rmap array during kvm_arch_prepare_memory_region() to pave the way for removing kvm_arch_create_memslot() altogether. Moving PPC's memory allocation only changes the order of kernel memory allocations between PPC and common KVM code. No functional change intended. Acked-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Allocate new rmap and large page tracking when moving memslotSean Christopherson
Reallocate a rmap array and recalcuate large page compatibility when moving an existing memslot to correctly handle the alignment properties of the new memslot. The number of rmap entries required at each level is dependent on the alignment of the memslot's base gfn with respect to that level, e.g. moving a large-page aligned memslot so that it becomes unaligned will increase the number of rmap entries needed at the now unaligned level. Not updating the rmap array is the most obvious bug, as KVM accesses garbage data beyond the end of the rmap. KVM interprets the bad data as pointers, leading to non-canonical #GPs, unexpected #PFs, etc... general protection fault: 0000 [#1] SMP CPU: 0 PID: 1909 Comm: move_memory_reg Not tainted 5.4.0-rc7+ #139 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:rmap_get_first+0x37/0x50 [kvm] Code: <48> 8b 3b 48 85 ff 74 ec e8 6c f4 ff ff 85 c0 74 e3 48 89 d8 5b c3 RSP: 0018:ffffc9000021bbc8 EFLAGS: 00010246 RAX: ffff00617461642e RBX: ffff00617461642e RCX: 0000000000000012 RDX: ffff88827400f568 RSI: ffffc9000021bbe0 RDI: ffff88827400f570 RBP: 0010000000000000 R08: ffffc9000021bd00 R09: ffffc9000021bda8 R10: ffffc9000021bc48 R11: 0000000000000000 R12: 0030000000000000 R13: 0000000000000000 R14: ffff88827427d700 R15: ffffc9000021bce8 FS: 00007f7eda014700(0000) GS:ffff888277a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7ed9216ff8 CR3: 0000000274391003 CR4: 0000000000162eb0 Call Trace: kvm_mmu_slot_set_dirty+0xa1/0x150 [kvm] __kvm_set_memory_region.part.64+0x559/0x960 [kvm] kvm_set_memory_region+0x45/0x60 [kvm] kvm_vm_ioctl+0x30f/0x920 [kvm] do_vfs_ioctl+0xa1/0x620 ksys_ioctl+0x66/0x70 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x4c/0x170 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f7ed9911f47 Code: <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 21 6f 2c 00 f7 d8 64 89 01 48 RSP: 002b:00007ffc00937498 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000000001ab0010 RCX: 00007f7ed9911f47 RDX: 0000000001ab1350 RSI: 000000004020ae46 RDI: 0000000000000004 RBP: 000000000000000a R08: 0000000000000000 R09: 00007f7ed9214700 R10: 00007f7ed92149d0 R11: 0000000000000246 R12: 00000000bffff000 R13: 0000000000000003 R14: 00007f7ed9215000 R15: 0000000000000000 Modules linked in: kvm_intel kvm irqbypass ---[ end trace 0c5f570b3358ca89 ]--- The disallow_lpage tracking is more subtle. Failure to update results in KVM creating large pages when it shouldn't, either due to stale data or again due to indexing beyond the end of the metadata arrays, which can lead to memory corruption and/or leaking data to guest/userspace. Note, the arrays for the old memslot are freed by the unconditional call to kvm_free_memslot() in __kvm_set_memory_region(). Fixes: 05da45583de9b ("KVM: MMU: large page support") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Move gpa_val and gpa_available into the emulator contextSean Christopherson
Move the GPA tracking into the emulator context now that the context is guaranteed to be initialized via __init_emulate_ctxt() prior to dereferencing gpa_{available,val}, i.e. now that seeing a stale gpa_available will also trigger a WARN due to an invalid context. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Add EMULTYPE_PF when emulation is triggered by a page faultSean Christopherson
Add a new emulation type flag to explicitly mark emulation related to a page fault. Move the propation of the GPA into the emulator from the page fault handler into x86_emulate_instruction, using EMULTYPE_PF as an indicator that cr2 is valid. Similarly, don't propagate cr2 into the exception.address when it's *not* valid. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: apic: remove unused function apic_lvt_vector()Miaohe Lin
The function apic_lvt_vector() is unused now, remove it. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: VMX: Add 'else' to split mutually exclusive caseMiaohe Lin
Each if branch in handle_external_interrupt_irqoff() is mutually exclusive. Add 'else' to make it clear and also avoid some unnecessary check. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: eliminate some unreachable codeMiaohe Lin
These code are unreachable, remove them. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Fix print format and coding styleMiaohe Lin
Use %u to print u32 var and correct some coding style. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: vmx: rewrite the comment in vmx_get_mt_maskChia-I Wu
Better reflect the structure of the code and metion why we could not always honor the guest. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Cc: Gurchetan Singh <gurchetansingh@chromium.org> Cc: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-23KVM: nVMX: Check IO instruction VM-exit conditionsOliver Upton
Consult the 'unconditional IO exiting' and 'use IO bitmaps' VM-execution controls when checking instruction interception. If the 'use IO bitmaps' VM-execution control is 1, check the instruction access against the IO bitmaps to determine if the instruction causes a VM-exit. Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-23KVM: nVMX: Refactor IO bitmap checks into helper functionOliver Upton
Checks against the IO bitmap are useful for both instruction emulation and VM-exit reflection. Refactor the IO bitmap checks into a helper function. Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-23KVM: nVMX: Don't emulate instructions in guest modePaolo Bonzini
vmx_check_intercept is not yet fully implemented. To avoid emulating instructions disallowed by the L1 hypervisor, refuse to emulate instructions by default. Cc: stable@vger.kernel.org [Made commit, added commit msg - Oliver] Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-23KVM: nVMX: Emulate MTF when performing instruction emulationOliver Upton
Since commit 5f3d45e7f282 ("kvm/x86: add support for MONITOR_TRAP_FLAG"), KVM has allowed an L1 guest to use the monitor trap flag processor-based execution control for its L2 guest. KVM simply forwards any MTF VM-exits to the L1 guest, which works for normal instruction execution. However, when KVM needs to emulate an instruction on the behalf of an L2 guest, the monitor trap flag is not emulated. Add the necessary logic to kvm_skip_emulated_instruction() to synthesize an MTF VM-exit to L1 upon instruction emulation for L2. Fixes: 5f3d45e7f282 ("kvm/x86: add support for MONITOR_TRAP_FLAG") Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-23KVM: fix error handling in svm_hardware_setupLi RongQing
rename svm_hardware_unsetup as svm_hardware_teardown, move it before svm_hardware_setup, and call it to free all memory if fail to setup in svm_hardware_setup, otherwise memory will be leaked remove __exit attribute for it since it is called in __init function Signed-off-by: Li RongQing <lirongqing@baidu.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-21KVM: SVM: Fix potential memory leak in svm_cpu_init()Miaohe Lin
When kmalloc memory for sd->sev_vmcbs failed, we forget to free the page held by sd->save_area. Also get rid of the var r as '-ENOMEM' is actually the only possible outcome here. Reviewed-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-21KVM: apic: avoid calculating pending eoi from an uninitialized valMiaohe Lin
When pv_eoi_get_user() fails, 'val' may remain uninitialized and the return value of pv_eoi_get_pending() becomes random. Fix the issue by initializing the variable. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-21KVM: nVMX: clear PIN_BASED_POSTED_INTR from nested pinbased_ctls only when ↵Vitaly Kuznetsov
apicv is globally disabled When apicv is disabled on a vCPU (e.g. by enabling KVM_CAP_HYPERV_SYNIC*), nothing happens to VMX MSRs on the already existing vCPUs, however, all new ones are created with PIN_BASED_POSTED_INTR filtered out. This is very confusing and results in the following picture inside the guest: $ rdmsr -ax 0x48d ff00000016 7f00000016 7f00000016 7f00000016 This is observed with QEMU and 4-vCPU guest: QEMU creates vCPU0, does KVM_CAP_HYPERV_SYNIC2 and then creates the remaining three. L1 hypervisor may only check CPU0's controls to find out what features are available and it will be very confused later. Switch to setting PIN_BASED_POSTED_INTR control based on global 'enable_apicv' setting. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-21KVM: nVMX: handle nested posted interrupts when apicv is disabled for L1Vitaly Kuznetsov
Even when APICv is disabled for L1 it can (and, actually, is) still available for L2, this means we need to always call vmx_deliver_nested_posted_interrupt() when attempting an interrupt delivery. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-21kvm: x86: svm: Fix NULL pointer dereference when AVIC not enabledSuravee Suthikulpanit
Launching VM w/ AVIC disabled together with pass-through device results in NULL pointer dereference bug with the following call trace. RIP: 0010:svm_refresh_apicv_exec_ctrl+0x17e/0x1a0 [kvm_amd] Call Trace: kvm_vcpu_update_apicv+0x44/0x60 [kvm] kvm_arch_vcpu_ioctl_run+0x3f4/0x1c80 [kvm] kvm_vcpu_ioctl+0x3d8/0x650 [kvm] do_vfs_ioctl+0xaa/0x660 ? tomoyo_file_ioctl+0x19/0x20 ksys_ioctl+0x67/0x90 __x64_sys_ioctl+0x1a/0x20 do_syscall_64+0x57/0x190 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Investigation shows that this is due to the uninitialized usage of struct vapu_svm.ir_list in the svm_set_pi_irte_mode(), which is called from svm_refresh_apicv_exec_ctrl(). The ir_list is initialized only if AVIC is enabled. So, fixes by adding a check if AVIC is enabled in the svm_refresh_apicv_exec_ctrl(). Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206579 Fixes: 8937d762396d ("kvm: x86: svm: Add support to (de)activate posted interrupts.") Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-21KVM: VMX: Add VMX_FEATURE_USR_WAIT_PAUSEXiaoyao Li
Commit 159348784ff0 ("x86/vmx: Introduce VMX_FEATURES_*") missed bit 26 (enable user wait and pause) of Secondary Processor-based VM-Execution Controls. Add VMX_FEATURE_USR_WAIT_PAUSE flag so that it shows up in /proc/cpuinfo, and use it to define SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE to make them uniform. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-21KVM: nVMX: Hold KVM's srcu lock when syncing vmcs12->shadowwanpeng li
For the duration of mapping eVMCS, it derefences ->memslots without holding ->srcu or ->slots_lock when accessing hv assist page. This patch fixes it by moving nested_sync_vmcs12_to_shadow to prepare_guest_switch, where the SRCU is already taken. It can be reproduced by running kvm's evmcs_test selftest. ============================= warning: suspicious rcu usage 5.6.0-rc1+ #53 tainted: g w ioe ----------------------------- ./include/linux/kvm_host.h:623 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by evmcs_test/8507: #0: ffff9ddd156d00d0 (&vcpu->mutex){+.+.}, at: kvm_vcpu_ioctl+0x85/0x680 [kvm] stack backtrace: cpu: 6 pid: 8507 comm: evmcs_test tainted: g w ioe 5.6.0-rc1+ #53 hardware name: dell inc. optiplex 7040/0jctf8, bios 1.4.9 09/12/2016 call trace: dump_stack+0x68/0x9b kvm_read_guest_cached+0x11d/0x150 [kvm] kvm_hv_get_assist_page+0x33/0x40 [kvm] nested_enlightened_vmentry+0x2c/0x60 [kvm_intel] nested_vmx_handle_enlightened_vmptrld.part.52+0x32/0x1c0 [kvm_intel] nested_sync_vmcs12_to_shadow+0x439/0x680 [kvm_intel] vmx_vcpu_run+0x67a/0xe60 [kvm_intel] vcpu_enter_guest+0x35e/0x1bc0 [kvm] kvm_arch_vcpu_ioctl_run+0x40b/0x670 [kvm] kvm_vcpu_ioctl+0x370/0x680 [kvm] ksys_ioctl+0x235/0x850 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x77/0x780 entry_syscall_64_after_hwframe+0x49/0xbe Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-21KVM: x86: don't notify userspace IOAPIC on edge-triggered interrupt EOIMiaohe Lin
Commit 13db77347db1 ("KVM: x86: don't notify userspace IOAPIC on edge EOI") said, edge-triggered interrupts don't set a bit in TMR, which means that IOAPIC isn't notified on EOI. And var level indicates level-triggered interrupt. But commit 3159d36ad799 ("KVM: x86: use generic function for MSI parsing") replace var level with irq.level by mistake. Fix it by changing irq.level to irq.trig_mode. Cc: stable@vger.kernel.org Fixes: 3159d36ad799 ("KVM: x86: use generic function for MSI parsing") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-20kvm/emulate: fix a -Werror=cast-function-typeQian Cai
arch/x86/kvm/emulate.c: In function 'x86_emulate_insn': arch/x86/kvm/emulate.c:5686:22: error: cast between incompatible function types from 'int (*)(struct x86_emulate_ctxt *)' to 'void (*)(struct fastop *)' [-Werror=cast-function-type] rc = fastop(ctxt, (fastop_t)ctxt->execute); Fix it by using an unnamed union of a (*execute) function pointer and a (*fastop) function pointer. Fixes: 3009afc6e39e ("KVM: x86: Use a typedef for fastop functions") Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-20KVM: x86: fix incorrect comparison in trace eventPaolo Bonzini
The "u" field in the event has three states, -1/0/1. Using u8 however means that comparison with -1 will always fail, so change to signed char. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-17KVM: nVMX: Fix some obsolete comments and grammar errorMiaohe Lin
Fix wrong variable names and grammar error in comment. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12KVM: x86: enable -WerrorPaolo Bonzini
Avoid more embarrassing mistakes. At least those that the compiler can catch. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12KVM: x86: fix WARN_ON check of an unsigned less than zeroPaolo Bonzini
The check cpu->hv_clock.system_time < 0 is redundant since system_time is a u64 and hence can never be less than zero. But what was actually meant is to check that the result is positive, since kernel_ns and v->kvm->arch.kvmclock_offset are both s64. Reported-by: Colin King <colin.king@canonical.com> Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Addresses-Coverity: ("Macro compares unsigned to 0") Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12KVM: x86/mmu: Fix struct guest_walker arrays for 5-level pagingSean Christopherson
Define PT_MAX_FULL_LEVELS as PT64_ROOT_MAX_LEVEL, i.e. 5, to fix shadow paging for 5-level guest page tables. PT_MAX_FULL_LEVELS is used to size the arrays that track guest pages table information, i.e. using a "max levels" of 4 causes KVM to access garbage beyond the end of an array when querying state for level 5 entries. E.g. FNAME(gpte_changed) will read garbage and most likely return %true for a level 5 entry, soft-hanging the guest because FNAME(fetch) will restart the guest instead of creating SPTEs because it thinks the guest PTE has changed. Note, KVM doesn't yet support 5-level nested EPT, so PT_MAX_FULL_LEVELS gets to stay "4" for the PTTYPE_EPT case. Fixes: 855feb673640 ("KVM: MMU: Add 5 level EPT & Shadow page table support.") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12KVM: nVMX: Use correct root level for nested EPT shadow page tablesSean Christopherson
Hardcode the EPT page-walk level for L2 to be 4 levels, as KVM's MMU currently also hardcodes the page walk level for nested EPT to be 4 levels. The L2 guest is all but guaranteed to soft hang on its first instruction when L1 is using EPT, as KVM will construct 4-level page tables and then tell hardware to use 5-level page tables. Fixes: 855feb673640 ("KVM: MMU: Add 5 level EPT & Shadow page table support.") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12KVM: nVMX: Fix some comment typos and coding styleMiaohe Lin
Fix some typos in the comments. Also fix coding style. [Sean Christopherson rewrites the comment of write_fault_to_shadow_pgtable field in struct kvm_vcpu_arch.] Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12KVM: x86/mmu: Avoid retpoline on ->page_fault() with TDPSean Christopherson
Wrap calls to ->page_fault() with a small shim to directly invoke the TDP fault handler when the kernel is using retpolines and TDP is being used. Single out the TDP fault handler and annotate the TDP path as likely to coerce the compiler into preferring it over the indirect function call. Rename tdp_page_fault() to kvm_tdp_page_fault(), as it's exposed outside of mmu.c to allow inlining the shim. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12KVM: apic: reuse smp_wmb() in kvm_make_request()Miaohe Lin
kvm_make_request() provides smp_wmb() so pending_events changes are guaranteed to be visible. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12KVM: x86: remove duplicated KVM_REQ_EVENT requestMiaohe Lin
The KVM_REQ_EVENT request is already made in kvm_set_rflags(). We should not make it again. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12KVM: x86: Deliver exception payload on KVM_GET_VCPU_EVENTSOliver Upton
KVM allows the deferral of exception payloads when a vCPU is in guest mode to allow the L1 hypervisor to intercept certain events (#PF, #DB) before register state has been modified. However, this behavior is incompatible with the KVM_{GET,SET}_VCPU_EVENTS ABI, as userspace expects register state to have been immediately modified. Userspace may opt-in for the payload deferral behavior with the KVM_CAP_EXCEPTION_PAYLOAD per-VM capability. As such, kvm_multiple_exception() will immediately manipulate guest registers if the capability hasn't been requested. Since the deferral is only necessary if a userspace ioctl were to be serviced at the same as a payload bearing exception is recognized, this behavior can be relaxed. Instead, opportunistically defer the payload from kvm_multiple_exception() and deliver the payload before completing a KVM_GET_VCPU_EVENTS ioctl. Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12KVM: nVMX: Handle pending #DB when injecting INIT VM-exitOliver Upton
SDM 27.3.4 states that the 'pending debug exceptions' VMCS field will be populated if a VM-exit caused by an INIT signal takes priority over a debug-trap. Emulate this behavior when synthesizing an INIT signal VM-exit into L1. Fixes: 4b9852f4f389 ("KVM: x86: Fix INIT signal handling in various CPU states") Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12KVM: x86: Mask off reserved bit from #DB exception payloadOliver Upton
KVM defines the #DB payload as compatible with the 'pending debug exceptions' field under VMX, not DR6. Mask off bit 12 when applying the payload to DR6, as it is reserved on DR6 but not the 'pending debug exceptions' field. Fixes: f10c729ff965 ("kvm: vmx: Defer setting of DR6 until #DB delivery") Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12KVM: x86: do not reset microcode version on INIT or RESETPaolo Bonzini
Do not initialize the microcode version at RESET or INIT, only on vCPU creation. Microcode updates are not lost during INIT, and exact behavior across a warm RESET is not specified by the architecture. Since we do not support a microcode update directly from the hypervisor, but only as a result of userspace setting the microcode version MSR, it's simpler for userspace if we do nothing in KVM and let userspace emulate behavior for RESET as it sees fit. Userspace can tie the fix to the availability of MSR_IA32_UCODE_REV in the list of emulated MSRs. Reported-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-09Merge tag 'kbuild-v5.6-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull more Kbuild updates from Masahiro Yamada: - fix randconfig to generate a sane .config - rename hostprogs-y / always to hostprogs / always-y, which are more natual syntax. - optimize scripts/kallsyms - fix yes2modconfig and mod2yesconfig - make multiple directory targets ('make foo/ bar/') work * tag 'kbuild-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kbuild: make multiple directory targets work kconfig: Invalidate all symbols after changing to y or m. kallsyms: fix type of kallsyms_token_table[] scripts/kallsyms: change table to store (strcut sym_entry *) scripts/kallsyms: rename local variables in read_symbol() kbuild: rename hostprogs-y/always to hostprogs/always-y kbuild: fix the document to use extra-y for vmlinux.lds kconfig: fix broken dependency in randconfig-generated .config
2020-02-09Merge tag 'x86-urgent-2020-02-09' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A set of fixes for X86: - Ensure that the PIT is set up when the local APIC is disable or configured in legacy mode. This is caused by an ordering issue introduced in the recent changes which skip PIT initialization when the TSC and APIC frequencies are already known. - Handle malformed SRAT tables during early ACPI parsing which caused an infinite loop anda boot hang. - Fix a long standing race in the affinity setting code which affects PCI devices with non-maskable MSI interrupts. The problem is caused by the non-atomic writes of the MSI address (destination APIC id) and data (vector) fields which the device uses to construct the MSI message. The non-atomic writes are mandated by PCI. If both fields change and the device raises an interrupt after writing address and before writing data, then the MSI block constructs a inconsistent message which causes interrupts to be lost and subsequent malfunction of the device. The fix is to redirect the interrupt to the new vector on the current CPU first and then switch it over to the new target CPU. This allows to observe an eventually raised interrupt in the transitional stage (old CPU, new vector) to be observed in the APIC IRR and retriggered on the new target CPU and the new vector. The potential spurious interrupts caused by this are harmless and can in the worst case expose a buggy driver (all handlers have to be able to deal with spurious interrupts as they can and do happen for various reasons). - Add the missing suspend/resume mechanism for the HYPERV hypercall page which prevents resume hibernation on HYPERV guests. This change got lost before the merge window. - Mask the IOAPIC before disabling the local APIC to prevent potentially stale IOAPIC remote IRR bits which cause stale interrupt lines after resume" * tag 'x86-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic: Mask IOAPIC entries when disabling the local APIC x86/hyperv: Suspend/resume the hypercall page for hibernation x86/apic/msi: Plug non-maskable MSI affinity race x86/boot: Handle malformed SRAT tables during early ACPI parsing x86/timer: Don't skip PIT setup when APIC is disabled or in legacy mode
2020-02-09Merge tag 'irq-urgent-2020-02-09' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull interrupt fixes from Thomas Gleixner: "A set of fixes for the interrupt subsystem: - Provision only ACPI enabled redistributors on GICv3 - Use the proper command colums when building the INVALL command for the GICv3-ITS - Ensure the allocation of the L2 vPE table for GICv4.1 - Correct the GICv4.1 VPROBASER programming so it uses the proper size - A set of small GICv4.1 tidy up patches - Configuration cleanup for C-SKY interrupt chip - Clarify the function documentation for irq_set_wake() to document that the wakeup functionality is orthogonal to the irq disable/enable mechanism" * tag 'irq-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/gic-v3-its: Rename VPENDBASER/VPROPBASER accessors irqchip/gic-v3-its: Remove superfluous WARN_ON irqchip/gic-v4.1: Drop 'tmp' in inherit_vpe_l1_table_from_rd() irqchip/gic-v4.1: Ensure L2 vPE table is allocated at RD level irqchip/gic-v4.1: Set vpe_l1_base for all redistributors irqchip/gic-v4.1: Fix programming of GICR_VPROPBASER_4_1_SIZE genirq: Clarify that irq wake state is orthogonal to enable/disable irqchip/gic-v3-its: Reference to its_invall_cmd descriptor when building INVALL irqchip: Some Kconfig cleanup for C-SKY irqchip/gic-v3: Only provision redistributors that are enabled in ACPI
2020-02-09Merge tag 'efi-urgent-2020-02-09' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI fix from Thomas Gleixner: "A single fix for a EFI boot regression on X86 which was caused by the recent rework of the EFI memory map parsing. On systems with invalid memmap entries the cleanup function uses an value which cannot be relied on in this stage. Use the actual EFI memmap entry instead" * tag 'efi-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/x86: Fix boot regression on systems with invalid memmap entries
2020-02-08Merge tag 'powerpc-5.6-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix an existing bug in our user access handling, exposed by one of the bug fixes we merged this cycle. - A fix for a boot hang on 32-bit with CONFIG_TRACE_IRQFLAGS and the recently added CONFIG_VMAP_STACK. Thanks to: Christophe Leroy, Guenter Roeck. * tag 'powerpc-5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc: Fix CONFIG_TRACE_IRQFLAGS with CONFIG_VMAP_STACK powerpc/futex: Fix incorrect user access blocking
2020-02-08Merge tag 'armsoc-late' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/socLinus Torvalds
Pull ARM SoC late updates from Olof Johansson: "This is some material that we picked up into our tree late, or that had more complex dependencies on more than one topic branch that makes sense to keep separately. - TI support for secure accelerators and hwrng on OMAP4/5 - TI camera changes for dra7 and am437x and SGX improvement due to better reset control support on am335x, am437x and dra7 - Davinci moves to proper clocksource on DM365, and regulator/audio improvements for DM365 and DM644x eval boards" * tag 'armsoc-late' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (32 commits) ARM: dts: omap4-droid4: Enable hdq for droid4 ds250x 1-wire battery nvmem ARM: dts: motorola-cpcap-mapphone: Configure calibration interrupt ARM: dts: Configure interconnect target module for am437x sgx ARM: dts: Configure sgx for dra7 ARM: dts: Configure rstctrl reset for am335x SGX ARM: dts: dra7: Add ti-sysc node for VPE ARM: dts: dra7: add vpe clkctrl node ARM: dts: am43x-epos-evm: Add VPFE and OV2659 entries ARM: dts: am437x-sk-evm: Add VPFE and OV2659 entries ARM: dts: am43xx: add support for clkout1 clock arm: dts: dra76-evm: Add CAL and OV5640 nodes arm: dtsi: dra76x: Add CAL dtsi node arm: dts: dra72-evm-common: Add entries for the CSI2 cameras ARM: dts: DRA72: Add CAL dtsi node ARM: dts: dra7-l4: Add ti-sysc node for CAM ARM: OMAP: DRA7xx: Make CAM clock domain SWSUP only ARM: dts: dra7: add cam clkctrl node ARM: OMAP2+: Drop legacy platform data for omap4 des ARM: OMAP2+: Drop legacy platform data for omap4 sham ARM: OMAP2+: Drop legacy platform data for omap4 aes ...
2020-02-08Merge tag 'armsoc-defconfig' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC defconfig updates from Olof Johansson: "We keep this in a separate branch to avoid cross-branch conflicts, but most of the material here is fairly boring -- some new drivers turned on for hardware since they were merged, and some refreshed files due to time having moved a lot of entries around" * tag 'armsoc-defconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (38 commits) ARM: configs: at91: enable MMC_SDHCI_OF_AT91 and MICROCHIP_PIT64B arm64: defconfig: Enable Broadcom's GENET Ethernet controller ARM: multi_v7_defconfig: Enable devfreq thermal integration ARM: exynos_defconfig: Enable devfreq thermal integration ARM: multi_v7_defconfig: Enable NFS v4.1 and v4.2 ARM: exynos_defconfig: Enable NFS v4.1 and v4.2 arm64: defconfig: Enable Actions Semi specific drivers arm64: defconfig: Enable Broadcom's STB PCIe controller arm64: defconfig: Enable CONFIG_CLK_IMX8MP by default ARM: configs: at91: enable config flags for sam9x60 SoC ARM: configs: at91: use savedefconfig arm64: defconfig: Enable tegra XUDC support ARM: defconfig: gemini: Update defconfig arm64: defconfig: enable CONFIG_ARM_QCOM_CPUFREQ_NVMEM arm64: defconfig: enable CONFIG_QCOM_CPR arm64: defconfig: Enable HFPLL arm64: defconfig: Enable CRYPTO_DEV_FSL_CAAM ARM: imx_v6_v7_defconfig: Select the TFP410 driver ARM: imx_v6_v7_defconfig: Enable NFS_V4_1 and NFS_V4_2 support arm64: defconfig: Enable ATH10K_SNOC ...
2020-02-08Merge tag 'armsoc-drivers' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC-related driver updates from Olof Johansson: "Various driver updates for platforms: - Nvidia: Fuse support for Tegra194, continued memory controller pieces for Tegra30 - NXP/FSL: Refactorings of QuickEngine drivers to support ARM/ARM64/PPC - NXP/FSL: i.MX8MP SoC driver pieces - TI Keystone: ring accelerator driver - Qualcomm: SCM driver cleanup/refactoring + support for new SoCs. - Xilinx ZynqMP: feature checking interface for firmware. Mailbox communication for power management - Overall support patch set for cpuidle on more complex hierarchies (PSCI-based) and misc cleanups, refactorings of Marvell, TI, other platforms" * tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (166 commits) drivers: soc: xilinx: Use mailbox IPI callback dt-bindings: power: reset: xilinx: Add bindings for ipi mailbox drivers: soc: ti: knav_qmss_queue: Pass lockdep expression to RCU lists MAINTAINERS: Add brcmstb PCIe controller entry soc/tegra: fuse: Unmap registers once they are not needed anymore soc/tegra: fuse: Correct straps' address for older Tegra124 device trees soc/tegra: fuse: Warn if straps are not ready soc/tegra: fuse: Cache values of straps and Chip ID registers memory: tegra30-emc: Correct error message for timed out auto calibration memory: tegra30-emc: Firm up hardware programming sequence memory: tegra30-emc: Firm up suspend/resume sequence soc/tegra: regulators: Do nothing if voltage is unchanged memory: tegra: Correct reset value of xusb_hostr soc/tegra: fuse: Add APB DMA dependency for Tegra20 bus: tegra-aconnect: Remove PM_CLK dependency dt-bindings: mediatek: add MT6765 power dt-bindings soc: mediatek: cmdq: delete not used define memory: tegra: Add support for the Tegra194 memory controller memory: tegra: Only include support for enabled SoCs memory: tegra: Support DVFS on Tegra186 and later ...
2020-02-08Merge tag 'armsoc-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/socLinus Torvalds
Pull ARM Device-tree updates from Olof Johansson: "New SoCs: - Atmel/Microchip SAM9X60 (ARM926 SoC) - OMAP 37xx gets split into AM3703/AM3715/DM3725, who are all variants of it with different GPU/media IP configurations. - ST stm32mp15 SoCs (1-2 Cortex-A7, CAN, GPU depending on SKU) - ST Ericsson ab8505 (variant of ab8500) and db8520 (variant of db8500) - Unisoc SC9863A SoC (8x Cortex-A55 mobile chipset w/ GPU, modem) - Qualcomm SC7180 (8-core 64bit SoC, unnamed CPU class) New boards: - Allwinner: + Emlid Neutis SoM (H3 variant) + Libre Computer ALL-H3-IT + PineH64 Model B - Amlogic: + Libretech Amlogic GX PC (s905d and s912-based variants) - Atmel/Microchip: + Kizboxmini, sam9x60 EK, sama5d27 Wireless SOM (wlsom1) - Marvell: + Armada 385-based SolidRun Clearfog GTR - NXP: + Gateworks GW59xx boards based on i.MX6/6Q/6QDL + Tolino Shine 3 eBook reader (i.MX6sl) + Embedded Artists COM (i.MX7ULP) + SolidRun CLearfog CX/ITX and HoneyComb (LX2160A-based systems) + Google Coral Edge TPU (i.MX8MQ) - Rockchip: + Radxa Dalang Carrier (supports rk3288 and rk3399 SOMs) + Radxa Rock Pi N10 (RK3399Pro-based) + VMARC RK3399Pro SOM - ST: + Reference boards for stm32mp15 - ST Ericsson: + Samsung Galaxy S III mini (GT-I8190) + HREF520 reference board for DB8520 - TI OMAP: + Gen1 Amazon Echo (OMAP3630-based) - Qualcomm: + Inforce 6640 Single Board Computer (msm8996-based) + SC7180 IDP (SC7180-based)" * tag 'armsoc-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (623 commits) dt-bindings: fix compilation error of the example in marvell,mmp3-hsic-phy.yaml arm64: dts: ti: k3-am654-base-board: Add CSI2 OV5640 camera arm64: dts: ti: k3-am65-main Add CAL node arm64: dts: ti: k3-j721e-main: Add McASP nodes arm64: dts: ti: k3-am654-main: Add McASP nodes arm64: dts: ti: k3-j721e: DMA support arm64: dts: ti: k3-j721e-main: Move secure proxy and smmu under main_navss arm64: dts: ti: k3-j721e-main: Correct main NAVSS representation arm64: dts: ti: k3-j721e: Correct the address for MAIN NAVSS arm64: dts: ti: k3-am65: DMA support arm64: dts: ti: k3-am65-main: Move secure proxy under cbass_main_navss arm64: dts: ti: k3-am65-main: Correct main NAVSS representation ARM: dts: aspeed: rainier: Add UCD90320 power sequencer ARM: dts: aspeed: rainier: Switch PSUs to unknown version arm64: dts: rockchip: Kill off "simple-panel" compatibles ARM: dts: rockchip: Kill off "simple-panel" compatibles arm64: dts: rockchip: rename dwmmc node names to mmc ARM: dts: rockchip: rename dwmmc node names to mmc arm64: dts: exynos: Rename Samsung and Exynos to lowercase arm64: dts: uniphier: add reset-names to NAND controller node ...
2020-02-08Merge tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/socLinus Torvalds
Pull ARM SoC platform updates from Olof Johansson: "Most of these are smaller fixes that have accrued, and some continued cleanup of OMAP platforms towards shared frameworks. One new SoC from Atmel/Microchip: sam9x60" * tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (35 commits) ARM: OMAP2+: Fix undefined reference to omap_secure_init ARM: s3c64xx: Drop unneeded select of TIMER_OF ARM: exynos: Drop unneeded select of MIGHT_HAVE_CACHE_L2X0 ARM: s3c24xx: Switch to atomic pwm API in rx1950 ARM: OMAP2+: sleep43xx: Call secure suspend/resume handlers ARM: OMAP2+: Use ARM SMC Calling Convention when OP-TEE is available ARM: OMAP2+: Introduce check for OP-TEE in omap_secure_init() ARM: OMAP2+: Add omap_secure_init callback hook for secure initialization ARM: at91: Documentation: add sam9x60 product and datasheet ARM: at91: pm: use of_device_id array to find the proper shdwc node ARM: at91: pm: use SAM9X60 PMC's compatible ARM: imx: only select ARM_ERRATA_814220 for ARMv7-A ARM: zynq: use physical cpuid in zynq_slcr_cpu_stop/start ARM: tegra: Use clk_m CPU on Tegra124 LP1 resume ARM: tegra: Modify reshift divider during LP1 ARM: tegra: Enable PLLP bypass during Tegra124 LP1 ARM: samsung: Rename Samsung and Exynos to lowercase ARM: exynos: Correct the help text for platform Kconfig option ARM: bcm: Select ARM_AMBA for ARCH_BRCMSTB ARM: brcmstb: Add debug UART entry for 7216 ...
2020-02-08Merge branch 'merge.nfs-fs_parse.1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs file system parameter updates from Al Viro: "Saner fs_parser.c guts and data structures. The system-wide registry of syntax types (string/enum/int32/oct32/.../etc.) is gone and so is the horror switch() in fs_parse() that would have to grow another case every time something got added to that system-wide registry. New syntax types can be added by filesystems easily now, and their namespace is that of functions - not of system-wide enum members. IOW, they can be shared or kept private and if some turn out to be widely useful, we can make them common library helpers, etc., without having to do anything whatsoever to fs_parse() itself. And we already get that kind of requests - the thing that finally pushed me into doing that was "oh, and let's add one for timeouts - things like 15s or 2h". If some filesystem really wants that, let them do it. Without somebody having to play gatekeeper for the variants blessed by direct support in fs_parse(), TYVM. Quite a bit of boilerplate is gone. And IMO the data structures make a lot more sense now. -200LoC, while we are at it" * 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (25 commits) tmpfs: switch to use of invalfc() cgroup1: switch to use of errorfc() et.al. procfs: switch to use of invalfc() hugetlbfs: switch to use of invalfc() cramfs: switch to use of errofc() et.al. gfs2: switch to use of errorfc() et.al. fuse: switch to use errorfc() et.al. ceph: use errorfc() and friends instead of spelling the prefix out prefix-handling analogues of errorf() and friends turn fs_param_is_... into functions fs_parse: handle optional arguments sanely fs_parse: fold fs_parameter_desc/fs_parameter_spec fs_parser: remove fs_parameter_description name field add prefix to fs_context->log ceph_parse_param(), ceph_parse_mon_ips(): switch to passing fc_log new primitive: __fs_parse() switch rbd and libceph to p_log-based primitives struct p_log, variants of warnf() et.al. taking that one instead teach logfc() to handle prefices, give it saner calling conventions get rid of cg_invalf() ...
2020-02-08Merge tag 'irqchip-fixes-5.6-1' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent Pull irqchip fixes for 5.6, take #1 from Marc Zyngier: - Guarantee allocation of L2 vPE table for GICv4.1 - Fix GICv4.1 VPROPBASER programming - Numerous GICv4.1 tidy ups - Fix disabled GICv3 redistributor provisioning with ACPI - KConfig cleanup for C-SKY