summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2012-07-23KVM: Choose better candidate for directed yieldRaghavendra K T
Currently, on a large vcpu guests, there is a high probability of yielding to the same vcpu who had recently done a pause-loop exit or cpu relax intercepted. Such a yield can lead to the vcpu spinning again and hence degrade the performance. The patchset keeps track of the pause loop exit/cpu relax interception and gives chance to a vcpu which: (a) Has not done pause loop exit or cpu relax intercepted at all (probably he is preempted lock-holder) (b) Was skipped in last iteration because it did pause loop exit or cpu relax intercepted, and probably has become eligible now (next eligible lock holder) Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> # on s390x Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-23KVM: Note down when cpu relax intercepted or pause loop exitedRaghavendra K T
Noting pause loop exited vcpu or cpu relax intercepted helps in filtering right candidate to yield. Wrong selection of vcpu; i.e., a vcpu that just did a pl-exit or cpu relax intercepted may contribute to performance degradation. Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> # on s390x Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-23KVM: Add config to support ple or cpu relax optimzationRaghavendra K T
Suggested-by: Avi Kivity <avi@redhat.com> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> # on s390x Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-20KVM: switch to symbolic name for irq_states sizeMichael S. Tsirkin
Use PIC_NUM_PINS instead of hard-coded 16 for pic pins. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-20KVM: x86: Fix typos in pmu.cGuo Chao
Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-20KVM: x86: Fix typos in lapic.cGuo Chao
Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-20KVM: x86: Fix typos in cpuid.cGuo Chao
Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-20KVM: x86: Fix typos in emulate.cGuo Chao
Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-20KVM: x86: Fix typos in x86.cGuo Chao
Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-20KVM: SVM: Fix typosGuo Chao
Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-20KVM: VMX: Fix typosGuo Chao
Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-19KVM: remove the unused parameter of gfn_to_pfn_memslotXiao Guangrong
The parameter, 'kvm', is not used in gfn_to_pfn_memslot, we can happily remove it Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-19KVM: remove is_error_hpaXiao Guangrong
Remove them since they are not used anymore Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-19KVM: make bad_pfn static to kvm_main.cXiao Guangrong
bad_pfn is not used out of kvm_main.c, so mark it static, also move it near hwpoison_pfn and fault_pfn Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-19KVM: using get_fault_pfn to get the fault pfnXiao Guangrong
Using get_fault_pfn to cleanup the code Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-19KVM: MMU: track the refcount when unmap the pageXiao Guangrong
It will trigger a WARN_ON if the page has been freed but it is still used in mmu, it can help us to detect mm bug early Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-19KVM: x86: remove unnecessary mark_page_dirtyXiao Guangrong
fix: [ 132.474633] 3.5.0-rc1+ #50 Not tainted [ 132.474634] ------------------------------- [ 132.474635] include/linux/kvm_host.h:369 suspicious rcu_dereference_check() usage! [ 132.474636] [ 132.474636] other info that might help us debug this: [ 132.474636] [ 132.474638] [ 132.474638] rcu_scheduler_active = 1, debug_locks = 1 [ 132.474640] 1 lock held by qemu-kvm/2832: [ 132.474657] #0: (&vcpu->mutex){+.+.+.}, at: [<ffffffffa01e1636>] vcpu_load+0x1e/0x91 [kvm] [ 132.474658] [ 132.474658] stack backtrace: [ 132.474660] Pid: 2832, comm: qemu-kvm Not tainted 3.5.0-rc1+ #50 [ 132.474661] Call Trace: [ 132.474665] [<ffffffff81092f40>] lockdep_rcu_suspicious+0xfc/0x105 [ 132.474675] [<ffffffffa01e0c85>] kvm_memslots+0x6d/0x75 [kvm] [ 132.474683] [<ffffffffa01e0ca1>] gfn_to_memslot+0x14/0x4c [kvm] [ 132.474693] [<ffffffffa01e3575>] mark_page_dirty+0x17/0x2a [kvm] [ 132.474706] [<ffffffffa01f21ea>] kvm_arch_vcpu_ioctl+0xbcf/0xc07 [kvm] Actually, we do not write vcpu->arch.time at this time, mark_page_dirty should be removed. Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-18KVM: MMU: Avoid handling same rmap_pde in kvm_handle_hva_range()Takuya Yoshikawa
When we invalidate a THP page, we call the handler with the same rmap_pde argument 512 times in the following loop: for each guest page in the range for each level unmap using rmap This patch avoids these extra handler calls by changing the loop order like this: for each level for each rmap in the range unmap using rmap With the preceding patches in the patch series, this made THP page invalidation more than 5 times faster on our x86 host: the host became more responsive during swapping the guest's memory as a result. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-18KVM: MMU: Push trace_kvm_age_page() into kvm_age_rmapp()Takuya Yoshikawa
This restricts the tracing to page aging and makes it possible to optimize kvm_handle_hva_range() further in the following patch. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-18KVM: MMU: Add memslot parameter to hva handlersTakuya Yoshikawa
This is needed to push trace_kvm_age_page() into kvm_age_rmapp() in the following patch. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-18KVM: Separate rmap_pde from kvm_lpage_info->write_countTakuya Yoshikawa
This makes it possible to loop over rmap_pde arrays in the same way as we do over rmap so that we can optimize kvm_handle_hva_range() easily in the following patch. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-18KVM: Introduce kvm_unmap_hva_range() for ↵Takuya Yoshikawa
kvm_mmu_notifier_invalidate_range_start() When we tested KVM under memory pressure, with THP enabled on the host, we noticed that MMU notifier took a long time to invalidate huge pages. Since the invalidation was done with mmu_lock held, it not only wasted the CPU but also made the host harder to respond. This patch mitigates this by using kvm_handle_hva_range(). Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Cc: Alexander Graf <agraf@suse.de> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-18KVM: MMU: Make kvm_handle_hva() handle range of addressesTakuya Yoshikawa
When guest's memory is backed by THP pages, MMU notifier needs to call kvm_unmap_hva(), which in turn leads to kvm_handle_hva(), in a loop to invalidate a range of pages which constitute one huge page: for each page for each memslot if page is in memslot unmap using rmap This means although every page in that range is expected to be found in the same memslot, we are forced to check unrelated memslots many times. If the guest has more memslots, the situation will become worse. Furthermore, if the range does not include any pages in the guest's memory, the loop over the pages will just consume extra time. This patch, together with the following patches, solves this problem by introducing kvm_handle_hva_range() which makes the loop look like this: for each memslot for each page in memslot unmap using rmap In this new processing, the actual work is converted to a loop over rmap which is much more cache friendly than before. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Cc: Alexander Graf <agraf@suse.de> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-18KVM: Introduce hva_to_gfn_memslot() for kvm_handle_hva()Takuya Yoshikawa
This restricts hva handling in mmu code and makes it easier to extend kvm_handle_hva() so that it can treat a range of addresses later in this patch series. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Cc: Alexander Graf <agraf@suse.de> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-18KVM: MMU: Use __gfn_to_rmap() to clean up kvm_handle_hva()Takuya Yoshikawa
We can treat every level uniformly. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-16Revert "apic: fix kvm build on UP without IOAPIC"Michael S. Tsirkin
This reverts commit f9808b7fd422b965cea52e05ba470e0a473c53d3. After commit 'kvm: switch to apic_set_eoi_write, apic_write' the stubs are no longer needed as kvm does not look at apicdrivers anymore. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-16KVM guest: switch to apic_set_eoi_write, apic_writeMichael S. Tsirkin
Use apic_set_eoi_write, apic_write to avoid meedling in core apic driver data structures directly. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-16apic: add apic_set_eoi_write for PV useMichael S. Tsirkin
KVM PV EOI optimization overrides eoi_write apic op with its own version. Add an API for this to avoid meddling with core x86 apic driver data structures directly. For KVM use, we don't need any guarantees about when the switch to the new op will take place, so it could in theory use this API after SMP init, but it currently doesn't, and restricting callers to early init makes it clear that it's safe as it won't race with actual APIC driver use. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-15Merge branch 'for-upstream' of git://github.com/agraf/linux-2.6 into nextAvi Kivity
ppc queue from Alex Graf: * Prepare some of the booke code for 64 bit support * BookE: Fix ESR flag in DSI * BookE: Add rfci emulation * 'for-upstream' of git://github.com/agraf/linux-2.6: KVM: PPC: Critical interrupt emulation support KVM: PPC: e500mc: Fix tlbilx emulation for 64-bit guests KVM: PPC64: booke: Set interrupt computation mode for 64-bit host KVM: PPC: bookehv: Add ESR flag to Data Storage Interrupt KVM: PPC: bookehv64: Add support for std/ld emulation. booke: Added crit/mc exception handler for e500v2 booke/bookehv: Add host crit-watchdog exception support Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-12KVM: VMX: Implement PCID/INVPCID for guests with EPTMao, Junjie
This patch handles PCID/INVPCID for guests. Process-context identifiers (PCIDs) are a facility by which a logical processor may cache information for multiple linear-address spaces so that the processor may retain cached information when software switches to a different linear address space. Refer to section 4.10.1 in IA32 Intel Software Developer's Manual Volume 3A for details. For guests with EPT, the PCID feature is enabled and INVPCID behaves as running natively. For guests without EPT, the PCID feature is disabled and INVPCID triggers #UD. Signed-off-by: Junjie Mao <junjie.mao@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-11KVM: Add x86_hyper_kvm to complete detect_hypervisor_platform checkPrarit Bhargava
While debugging I noticed that unlike all the other hypervisor code in the kernel, kvm does not have an entry for x86_hyper which is used in detect_hypervisor_platform() which results in a nice printk in the syslog. This is only really a stub function but it does make kvm more consistent with the other hypervisors. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Avi Kivity <avi@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Marcelo Tostatti <mtosatti@redhat.com> Cc: kvm@vger.kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-11KVM: PPC: Critical interrupt emulation supportBharat Bhushan
rfci instruction and CSRR0/1 registers are emulated. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com> Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2012-07-11KVM: PPC: e500mc: Fix tlbilx emulation for 64-bit guestsMihai Caraman
tlbilxva emulation was using an u32 variable for guest effective address. Replace it with gva_t type to handle 64-bit guests. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2012-07-11KVM: PPC64: booke: Set interrupt computation mode for 64-bit hostMihai Caraman
64-bit host needs to remain in 64-bit mode when an exception take place. Set interrupt computaion mode in EPCR register. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2012-07-11KVM: PPC: bookehv: Add ESR flag to Data Storage InterruptMihai Caraman
ESR register is required by Data Storage Interrupt handling code. Add the specific flag to the interrupt handler. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2012-07-11KVM: PPC: bookehv64: Add support for std/ld emulation.Varun Sethi
Add support for std/ld emulation. Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2012-07-11booke: Added crit/mc exception handler for e500v2Bharat Bhushan
Watchdog is taken at critical exception level. So this patch is tested with host watchdog exception happening when guest is running. Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2012-07-11booke/bookehv: Add host crit-watchdog exception supportBharat Bhushan
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2012-07-11KVM: MMU: document mmu-lock and fast page faultXiao Guangrong
Document fast page fault and mmu-lock in locking.txt Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-11KVM: MMU: fix kvm_mmu_pagetable_walk tracepointXiao Guangrong
The P bit of page fault error code is missed in this tracepoint, fix it by passing the full error code Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-11KVM: MMU: trace fast page faultXiao Guangrong
To see what happen on this path and help us to optimize it Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-11KVM: MMU: fast path of handling guest page faultXiao Guangrong
If the the present bit of page fault error code is set, it indicates the shadow page is populated on all levels, it means what we do is only modify the access bit which can be done out of mmu-lock Currently, in order to simplify the code, we only fix the page fault caused by write-protect on the fast path Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-11KVM: MMU: introduce SPTE_MMU_WRITEABLE bitXiao Guangrong
This bit indicates whether the spte can be writable on MMU, that means the corresponding gpte is writable and the corresponding gfn is not protected by shadow page protection In the later path, SPTE_MMU_WRITEABLE will indicates whether the spte can be locklessly updated Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-11KVM: MMU: fold tlb flush judgement into mmu_spte_updateXiao Guangrong
mmu_spte_update() is the common function, we can easily audit the path Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-11KVM: VMX: export PFEC.P bit on eptXiao Guangrong
Export the present bit of page fault error code, the later patch will use it Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-11KVM: MMU: cleanup spte_write_protectXiao Guangrong
Use __drop_large_spte to cleanup this function and comment spte_write_protect Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-11KVM: MMU: abstract spte write-protectXiao Guangrong
Introduce a common function to abstract spte write-protect to cleanup the code Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-11KVM: MMU: return bool in __rmap_write_protectXiao Guangrong
The reture value of __rmap_write_protect is either 1 or 0, use true/false instead of these Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09KVM: VMX: Emulate invalid guest state by defaultAvi Kivity
Our emulation should be complete enough that we can emulate guests while they are in big real mode, or in a mode transition that is not virtualizable without unrestricted guest support. Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09KVM: x86 emulator: implement LTRAvi Kivity
Opcode 0F 00 /3. Encountered during Windows XP secondary processor bringup. Signed-off-by: Avi Kivity <avi@redhat.com>