summaryrefslogtreecommitdiff
path: root/include
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2020-06-13 10:05:47 -0700
committerLinus Torvalds <torvalds@linux-foundation.org>2020-06-13 10:05:47 -0700
commit076f14be7fc942e112c94c841baec44124275cd0 (patch)
tree3bc4d01b7732ebc444060f0df84bc10f26da6238 /include
parent6c3297841472b4e53e22e53826eea9e483d993e5 (diff)
parent0bf3924bfabd13ba21aa702344fc00b3b3263e5a (diff)
Merge tag 'x86-entry-2020-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 entry updates from Thomas Gleixner: "The x86 entry, exception and interrupt code rework This all started about 6 month ago with the attempt to move the Posix CPU timer heavy lifting out of the timer interrupt code and just have lockless quick checks in that code path. Trivial 5 patches. This unearthed an inconsistency in the KVM handling of task work and the review requested to move all of this into generic code so other architectures can share. Valid request and solved with another 25 patches but those unearthed inconsistencies vs. RCU and instrumentation. Digging into this made it obvious that there are quite some inconsistencies vs. instrumentation in general. The int3 text poke handling in particular was completely unprotected and with the batched update of trace events even more likely to expose to endless int3 recursion. In parallel the RCU implications of instrumenting fragile entry code came up in several discussions. The conclusion of the x86 maintainer team was to go all the way and make the protection against any form of instrumentation of fragile and dangerous code pathes enforcable and verifiable by tooling. A first batch of preparatory work hit mainline with commit d5f744f9a2ac ("Pull x86 entry code updates from Thomas Gleixner") That (almost) full solution introduced a new code section '.noinstr.text' into which all code which needs to be protected from instrumentation of all sorts goes into. Any call into instrumentable code out of this section has to be annotated. objtool has support to validate this. Kprobes now excludes this section fully which also prevents BPF from fiddling with it and all 'noinstr' annotated functions also keep ftrace off. The section, kprobes and objtool changes are already merged. The major changes coming with this are: - Preparatory cleanups - Annotating of relevant functions to move them into the noinstr.text section or enforcing inlining by marking them __always_inline so the compiler cannot misplace or instrument them. - Splitting and simplifying the idtentry macro maze so that it is now clearly separated into simple exception entries and the more interesting ones which use interrupt stacks and have the paranoid handling vs. CR3 and GS. - Move quite some of the low level ASM functionality into C code: - enter_from and exit to user space handling. The ASM code now calls into C after doing the really necessary ASM handling and the return path goes back out without bells and whistels in ASM. - exception entry/exit got the equivivalent treatment - move all IRQ tracepoints from ASM to C so they can be placed as appropriate which is especially important for the int3 recursion issue. - Consolidate the declaration and definition of entry points between 32 and 64 bit. They share a common header and macros now. - Remove the extra device interrupt entry maze and just use the regular exception entry code. - All ASM entry points except NMI are now generated from the shared header file and the corresponding macros in the 32 and 64 bit entry ASM. - The C code entry points are consolidated as well with the help of DEFINE_IDTENTRY*() macros. This allows to ensure at one central point that all corresponding entry points share the same semantics. The actual function body for most entry points is in an instrumentable and sane state. There are special macros for the more sensitive entry points, e.g. INT3 and of course the nasty paranoid #NMI, #MCE, #DB and #DF. They allow to put the whole entry instrumentation and RCU handling into safe places instead of the previous pray that it is correct approach. - The INT3 text poke handling is now completely isolated and the recursion issue banned. Aside of the entry rework this required other isolation work, e.g. the ability to force inline bsearch. - Prevent #DB on fragile entry code, entry relevant memory and disable it on NMI, #MC entry, which allowed to get rid of the nested #DB IST stack shifting hackery. - A few other cleanups and enhancements which have been made possible through this and already merged changes, e.g. consolidating and further restricting the IDT code so the IDT table becomes RO after init which removes yet another popular attack vector - About 680 lines of ASM maze are gone. There are a few open issues: - An escape out of the noinstr section in the MCE handler which needs some more thought but under the aspect that MCE is a complete trainwreck by design and the propability to survive it is low, this was not high on the priority list. - Paravirtualization When PV is enabled then objtool complains about a bunch of indirect calls out of the noinstr section. There are a few straight forward ways to fix this, but the other issues vs. general correctness were more pressing than parawitz. - KVM KVM is inconsistent as well. Patches have been posted, but they have not yet been commented on or picked up by the KVM folks. - IDLE Pretty much the same problems can be found in the low level idle code especially the parts where RCU stopped watching. This was beyond the scope of the more obvious and exposable problems and is on the todo list. The lesson learned from this brain melting exercise to morph the evolved code base into something which can be validated and understood is that once again the violation of the most important engineering principle "correctness first" has caused quite a few people to spend valuable time on problems which could have been avoided in the first place. The "features first" tinkering mindset really has to stop. With that I want to say thanks to everyone involved in contributing to this effort. Special thanks go to the following people (alphabetical order): Alexandre Chartre, Andy Lutomirski, Borislav Petkov, Brian Gerst, Frederic Weisbecker, Josh Poimboeuf, Juergen Gross, Lai Jiangshan, Macro Elver, Paolo Bonzin,i Paul McKenney, Peter Zijlstra, Vitaly Kuznetsov, and Will Deacon" * tag 'x86-entry-2020-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (142 commits) x86/entry: Force rcu_irq_enter() when in idle task x86/entry: Make NMI use IDTENTRY_RAW x86/entry: Treat BUG/WARN as NMI-like entries x86/entry: Unbreak __irqentry_text_start/end magic x86/entry: __always_inline CR2 for noinstr lockdep: __always_inline more for noinstr x86/entry: Re-order #DB handler to avoid *SAN instrumentation x86/entry: __always_inline arch_atomic_* for noinstr x86/entry: __always_inline irqflags for noinstr x86/entry: __always_inline debugreg for noinstr x86/idt: Consolidate idt functionality x86/idt: Cleanup trap_init() x86/idt: Use proper constants for table size x86/idt: Add comments about early #PF handling x86/idt: Mark init only functions __init x86/entry: Rename trace_hardirqs_off_prepare() x86/entry: Clarify irq_{enter,exit}_rcu() x86/entry: Remove DBn stacks x86/entry: Remove debug IDT frobbing x86/entry: Optimize local_db_save() for virt ...
Diffstat (limited to 'include')
-rw-r--r--include/asm-generic/bug.h9
-rw-r--r--include/linux/bsearch.h26
-rw-r--r--include/linux/context_tracking.h6
-rw-r--r--include/linux/context_tracking_state.h6
-rw-r--r--include/linux/debug_locks.h2
-rw-r--r--include/linux/hardirq.h41
-rw-r--r--include/linux/interrupt.h8
-rw-r--r--include/linux/irqflags.h4
-rw-r--r--include/xen/events.h7
-rw-r--r--include/xen/hvm.h2
-rw-r--r--include/xen/interface/hvm/hvm_op.h2
-rw-r--r--include/xen/xen-ops.h19
12 files changed, 93 insertions, 39 deletions
diff --git a/include/asm-generic/bug.h b/include/asm-generic/bug.h
index 384b5c835ced..c94e33ae3e7b 100644
--- a/include/asm-generic/bug.h
+++ b/include/asm-generic/bug.h
@@ -83,14 +83,19 @@ extern __printf(4, 5)
void warn_slowpath_fmt(const char *file, const int line, unsigned taint,
const char *fmt, ...);
#define __WARN() __WARN_printf(TAINT_WARN, NULL)
-#define __WARN_printf(taint, arg...) \
- warn_slowpath_fmt(__FILE__, __LINE__, taint, arg)
+#define __WARN_printf(taint, arg...) do { \
+ instrumentation_begin(); \
+ warn_slowpath_fmt(__FILE__, __LINE__, taint, arg); \
+ instrumentation_end(); \
+ } while (0)
#else
extern __printf(1, 2) void __warn_printk(const char *fmt, ...);
#define __WARN() __WARN_FLAGS(BUGFLAG_TAINT(TAINT_WARN))
#define __WARN_printf(taint, arg...) do { \
+ instrumentation_begin(); \
__warn_printk(arg); \
__WARN_FLAGS(BUGFLAG_NO_CUT_HERE | BUGFLAG_TAINT(taint));\
+ instrumentation_end(); \
} while (0)
#define WARN_ON_ONCE(condition) ({ \
int __ret_warn_on = !!(condition); \
diff --git a/include/linux/bsearch.h b/include/linux/bsearch.h
index 8ed53d7524ea..e66b711d091e 100644
--- a/include/linux/bsearch.h
+++ b/include/linux/bsearch.h
@@ -4,7 +4,29 @@
#include <linux/types.h>
-void *bsearch(const void *key, const void *base, size_t num, size_t size,
- cmp_func_t cmp);
+static __always_inline
+void *__inline_bsearch(const void *key, const void *base, size_t num, size_t size, cmp_func_t cmp)
+{
+ const char *pivot;
+ int result;
+
+ while (num > 0) {
+ pivot = base + (num >> 1) * size;
+ result = cmp(key, pivot);
+
+ if (result == 0)
+ return (void *)pivot;
+
+ if (result > 0) {
+ base = pivot + size;
+ num--;
+ }
+ num >>= 1;
+ }
+
+ return NULL;
+}
+
+extern void *bsearch(const void *key, const void *base, size_t num, size_t size, cmp_func_t cmp);
#endif /* _LINUX_BSEARCH_H */
diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index 8cac62ee6add..981b880d5b60 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -33,13 +33,13 @@ static inline void user_exit(void)
}
/* Called with interrupts disabled. */
-static inline void user_enter_irqoff(void)
+static __always_inline void user_enter_irqoff(void)
{
if (context_tracking_enabled())
__context_tracking_enter(CONTEXT_USER);
}
-static inline void user_exit_irqoff(void)
+static __always_inline void user_exit_irqoff(void)
{
if (context_tracking_enabled())
__context_tracking_exit(CONTEXT_USER);
@@ -75,7 +75,7 @@ static inline void exception_exit(enum ctx_state prev_ctx)
* is enabled. If context tracking is disabled, returns
* CONTEXT_DISABLED. This should be used primarily for debugging.
*/
-static inline enum ctx_state ct_state(void)
+static __always_inline enum ctx_state ct_state(void)
{
return context_tracking_enabled() ?
this_cpu_read(context_tracking.state) : CONTEXT_DISABLED;
diff --git a/include/linux/context_tracking_state.h b/include/linux/context_tracking_state.h
index e7fe6678b7ad..65a60d3313b0 100644
--- a/include/linux/context_tracking_state.h
+++ b/include/linux/context_tracking_state.h
@@ -26,12 +26,12 @@ struct context_tracking {
extern struct static_key_false context_tracking_key;
DECLARE_PER_CPU(struct context_tracking, context_tracking);
-static inline bool context_tracking_enabled(void)
+static __always_inline bool context_tracking_enabled(void)
{
return static_branch_unlikely(&context_tracking_key);
}
-static inline bool context_tracking_enabled_cpu(int cpu)
+static __always_inline bool context_tracking_enabled_cpu(int cpu)
{
return context_tracking_enabled() && per_cpu(context_tracking.active, cpu);
}
@@ -41,7 +41,7 @@ static inline bool context_tracking_enabled_this_cpu(void)
return context_tracking_enabled() && __this_cpu_read(context_tracking.active);
}
-static inline bool context_tracking_in_user(void)
+static __always_inline bool context_tracking_in_user(void)
{
return __this_cpu_read(context_tracking.state) == CONTEXT_USER;
}
diff --git a/include/linux/debug_locks.h b/include/linux/debug_locks.h
index 257ab3c92cb8..e7e45f0cc7da 100644
--- a/include/linux/debug_locks.h
+++ b/include/linux/debug_locks.h
@@ -12,7 +12,7 @@ extern int debug_locks __read_mostly;
extern int debug_locks_silent __read_mostly;
-static inline int __debug_locks_off(void)
+static __always_inline int __debug_locks_off(void)
{
return xchg(&debug_locks, 0);
}
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index e07cf853aa16..03c9fece7d43 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -38,9 +38,24 @@ static __always_inline void rcu_irq_enter_check_tick(void)
} while (0)
/*
+ * Like __irq_enter() without time accounting for fast
+ * interrupts, e.g. reschedule IPI where time accounting
+ * is more expensive than the actual interrupt.
+ */
+#define __irq_enter_raw() \
+ do { \
+ preempt_count_add(HARDIRQ_OFFSET); \
+ lockdep_hardirq_enter(); \
+ } while (0)
+
+/*
* Enter irq context (on NO_HZ, update jiffies):
*/
-extern void irq_enter(void);
+void irq_enter(void);
+/*
+ * Like irq_enter(), but RCU is already watching.
+ */
+void irq_enter_rcu(void);
/*
* Exit irq context without processing softirqs:
@@ -53,9 +68,23 @@ extern void irq_enter(void);
} while (0)
/*
+ * Like __irq_exit() without time accounting
+ */
+#define __irq_exit_raw() \
+ do { \
+ lockdep_hardirq_exit(); \
+ preempt_count_sub(HARDIRQ_OFFSET); \
+ } while (0)
+
+/*
* Exit irq context and process softirqs if needed:
*/
-extern void irq_exit(void);
+void irq_exit(void);
+
+/*
+ * Like irq_exit(), but return with RCU watching.
+ */
+void irq_exit_rcu(void);
#ifndef arch_nmi_enter
#define arch_nmi_enter() do { } while (0)
@@ -87,20 +116,24 @@ extern void rcu_nmi_exit(void);
arch_nmi_enter(); \
printk_nmi_enter(); \
lockdep_off(); \
- ftrace_nmi_enter(); \
BUG_ON(in_nmi() == NMI_MASK); \
__preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET); \
rcu_nmi_enter(); \
lockdep_hardirq_enter(); \
+ instrumentation_begin(); \
+ ftrace_nmi_enter(); \
+ instrumentation_end(); \
} while (0)
#define nmi_exit() \
do { \
+ instrumentation_begin(); \
+ ftrace_nmi_exit(); \
+ instrumentation_end(); \
lockdep_hardirq_exit(); \
rcu_nmi_exit(); \
BUG_ON(!in_nmi()); \
__preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET); \
- ftrace_nmi_exit(); \
lockdep_on(); \
printk_nmi_exit(); \
arch_nmi_exit(); \
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 80f637c3a6f3..5db970b6615a 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -760,8 +760,10 @@ extern int arch_early_irq_init(void);
/*
* We want to know which function is an entrypoint of a hardirq or a softirq.
*/
-#define __irq_entry __attribute__((__section__(".irqentry.text")))
-#define __softirq_entry \
- __attribute__((__section__(".softirqentry.text")))
+#ifndef __irq_entry
+# define __irq_entry __attribute__((__section__(".irqentry.text")))
+#endif
+
+#define __softirq_entry __attribute__((__section__(".softirqentry.text")))
#endif
diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index d7f7e436c3af..6384d2813ded 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -32,7 +32,7 @@
#ifdef CONFIG_TRACE_IRQFLAGS
extern void trace_hardirqs_on_prepare(void);
- extern void trace_hardirqs_off_prepare(void);
+ extern void trace_hardirqs_off_finish(void);
extern void trace_hardirqs_on(void);
extern void trace_hardirqs_off(void);
# define lockdep_hardirq_context(p) ((p)->hardirq_context)
@@ -101,7 +101,7 @@ do { \
#else
# define trace_hardirqs_on_prepare() do { } while (0)
-# define trace_hardirqs_off_prepare() do { } while (0)
+# define trace_hardirqs_off_finish() do { } while (0)
# define trace_hardirqs_on() do { } while (0)
# define trace_hardirqs_off() do { } while (0)
# define lockdep_hardirq_context(p) 0
diff --git a/include/xen/events.h b/include/xen/events.h
index 12b0dcb6a120..df1e6391f63f 100644
--- a/include/xen/events.h
+++ b/include/xen/events.h
@@ -90,13 +90,6 @@ unsigned int irq_from_evtchn(evtchn_port_t evtchn);
int irq_from_virq(unsigned int cpu, unsigned int virq);
evtchn_port_t evtchn_from_irq(unsigned irq);
-#ifdef CONFIG_XEN_PVHVM
-/* Xen HVM evtchn vector callback */
-void xen_hvm_callback_vector(void);
-#ifdef CONFIG_TRACING
-#define trace_xen_hvm_callback_vector xen_hvm_callback_vector
-#endif
-#endif
int xen_set_callback_via(uint64_t via);
void xen_evtchn_do_upcall(struct pt_regs *regs);
void xen_hvm_evtchn_do_upcall(void);
diff --git a/include/xen/hvm.h b/include/xen/hvm.h
index 0b15f8cb17fc..b7fd7fc9ad41 100644
--- a/include/xen/hvm.h
+++ b/include/xen/hvm.h
@@ -58,4 +58,6 @@ static inline int hvm_get_parameter(int idx, uint64_t *value)
#define HVM_CALLBACK_VECTOR(x) (((uint64_t)HVM_CALLBACK_VIA_TYPE_VECTOR)<<\
HVM_CALLBACK_VIA_TYPE_SHIFT | (x))
+void xen_setup_callback_vector(void);
+
#endif /* XEN_HVM_H__ */
diff --git a/include/xen/interface/hvm/hvm_op.h b/include/xen/interface/hvm/hvm_op.h
index 956a04682865..25d945ef17de 100644
--- a/include/xen/interface/hvm/hvm_op.h
+++ b/include/xen/interface/hvm/hvm_op.h
@@ -21,6 +21,8 @@
#ifndef __XEN_PUBLIC_HVM_HVM_OP_H__
#define __XEN_PUBLIC_HVM_HVM_OP_H__
+#include <xen/interface/xen.h>
+
/* Get/set subcommands: the second argument of the hypercall is a
* pointer to a xen_hvm_param struct. */
#define HVMOP_set_param 0
diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
index 095be1d66f31..39a5580f8feb 100644
--- a/include/xen/xen-ops.h
+++ b/include/xen/xen-ops.h
@@ -215,17 +215,7 @@ bool xen_running_on_version_or_later(unsigned int major, unsigned int minor);
void xen_efi_runtime_setup(void);
-#ifdef CONFIG_PREEMPTION
-
-static inline void xen_preemptible_hcall_begin(void)
-{
-}
-
-static inline void xen_preemptible_hcall_end(void)
-{
-}
-
-#else
+#if defined(CONFIG_XEN_PV) && !defined(CONFIG_PREEMPTION)
DECLARE_PER_CPU(bool, xen_in_preemptible_hcall);
@@ -239,6 +229,11 @@ static inline void xen_preemptible_hcall_end(void)
__this_cpu_write(xen_in_preemptible_hcall, false);
}
-#endif /* CONFIG_PREEMPTION */
+#else
+
+static inline void xen_preemptible_hcall_begin(void) { }
+static inline void xen_preemptible_hcall_end(void) { }
+
+#endif /* CONFIG_XEN_PV && !CONFIG_PREEMPTION */
#endif /* INCLUDE_XEN_OPS_H */