Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"Updates for timekeeping, timers and clockevent/source drivers:
Core:
- Yet another round of improvements to make the clocksource watchdog
more robust:
- Relax the clocksource-watchdog skew criteria to match the NTP
criteria.
- Temporarily skip the watchdog when high memory latencies are
detected which can lead to false-positives.
- Provide an option to enable TSC skew detection even on systems
where TSC is marked as reliable.
Sigh!
- Initialize the restart block in the nanosleep syscalls to be
directed to the no restart function instead of doing a partial
setup on entry.
This prevents an erroneous restart_syscall() invocation from
corrupting user space data. While such a situation is clearly a
user space bug, preventing this is a correctness issue and caters
to the least suprise principle.
- Ignore the hrtimer slack for realtime tasks in schedule_hrtimeout()
to align it with the nanosleep semantics.
Drivers:
- The obligatory new driver bindings for Mediatek, Rockchip and
RISC-V variants.
- Add support for the C3STOP misfeature to the RISC-V timer to handle
the case where the timer stops in deeper idle state.
- Set up a static key in the RISC-V timer correctly before first use.
- The usual small improvements and fixes all over the place"
* tag 'timers-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
clocksource/drivers/timer-sun4i: Add CLOCK_EVT_FEAT_DYNIRQ
clocksource/drivers/em_sti: Mark driver as non-removable
clocksource/drivers/sh_tmu: Mark driver as non-removable
clocksource/drivers/riscv: Patch riscv_clock_next_event() jump before first use
clocksource/drivers/timer-microchip-pit64b: Add delay timer
clocksource/drivers/timer-microchip-pit64b: Select driver only on ARM
dt-bindings: timer: sifive,clint: add comaptibles for T-Head's C9xx
dt-bindings: timer: mediatek,mtk-timer: add MT8365
clocksource/drivers/riscv: Get rid of clocksource_arch_init() callback
clocksource/drivers/sh_cmt: Mark driver as non-removable
clocksource/drivers/timer-microchip-pit64b: Drop obsolete dependency on COMPILE_TEST
clocksource/drivers/riscv: Increase the clock source rating
clocksource/drivers/timer-riscv: Set CLOCK_EVT_FEAT_C3STOP based on DT
dt-bindings: timer: Add bindings for the RISC-V timer device
RISC-V: time: initialize hrtimer based broadcast clock event device
dt-bindings: timer: rk-timer: Add rktimer for rv1126
time/debug: Fix memory leak with using debugfs_lookup()
clocksource: Enable TSC watchdog checking of HPET and PMTMR only when requested
posix-timers: Use atomic64_try_cmpxchg() in __update_gt_cputime()
clocksource: Verify HPET and PMTMR when TSC unverified
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull miscellaneous x86 cleanups from Thomas Gleixner:
- Correct the common copy and pasted mishandling of kstrtobool() in the
strict_sas_size() setup function
- Make recalibrate_cpu_khz() an GPL only export
- Check TSC feature before doing anything else which avoids pointless
code execution if TSC is not available
- Remove or fixup stale and misleading comments
- Remove unused or pointelessly duplicated variables
- Spelling and typo fixes
* tag 'x86-cleanups-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/hotplug: Remove incorrect comment about mwait_play_dead()
x86/tsc: Do feature check as the very first thing
x86/tsc: Make recalibrate_cpu_khz() export GPL only
x86/cacheinfo: Remove unused trace variable
x86/Kconfig: Fix spellos & punctuation
x86/signal: Fix the value returned by strict_sas_size()
x86/cpu: Remove misleading comment
x86/setup: Move duplicate boot_cpu_data definition out of the ifdeffery
x86/boot/e820: Fix typo in e820.c comment
|
|
Do the feature check as the very first thing in the function. Everything
else comes after that and is meaningless work if the TSC CPUID bit is
not even set. Switch to cpu_feature_enabled() too, while at it.
No functional changes.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/Y5990CUCuWd5jfBH@zn.tnic
|
|
A quick search doesn't reveal any use outside of the kernel - which
would be questionable to begin with anyway - so make the export GPL
only.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/Y599miBzWRAuOwhg@zn.tnic
|
|
Unconditionally enabling TSC watchdog checking of the HPET and PMTMR
clocksources can degrade latency and performance. Therefore, provide
a new "watchdog" option to the tsc= boot parameter that opts into such
checking. Note that tsc=watchdog is overridden by a tsc=nowatchdog
regardless of their relative positions in the list of boot parameters.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Waiman Long <longman@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Waiman Long <longman@redhat.com>
|
|
On systems with two or fewer sockets, when the boot CPU has CONSTANT_TSC,
NONSTOP_TSC, and TSC_ADJUST, clocksource watchdog verification of the
TSC is disabled. This works well much of the time, but there is the
occasional production-level system that meets all of these criteria, but
which still has a TSC that skews significantly from atomic-clock time.
This is usually attributed to a firmware or hardware fault. Yes, the
various NTP daemons do express their opinions of userspace-to-atomic-clock
time skew, but they put them in various places, depending on the daemon
and distro in question. It would therefore be good for the kernel to
have some clue that there is a problem.
The old behavior of marking the TSC unstable is a non-starter because a
great many workloads simply cannot tolerate the overheads and latencies
of the various non-TSC clocksources. In addition, NTP-corrected systems
sometimes can tolerate significant kernel-space time skew as long as
the userspace time sources are within epsilon of atomic-clock time.
Therefore, when watchdog verification of TSC is disabled, enable it for
HPET and PMTMR (AKA ACPI PM timer). This provides the needed in-kernel
time-skew diagnostic without degrading the system's performance.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Waiman Long <longman@redhat.com>
Cc: <x86@kernel.org>
Tested-by: Feng Tang <feng.tang@intel.com>
|
|
The kernel assumes that the TSC frequency which is provided by the
hardware / firmware via MSRs or CPUID(0x15) is correct after applying
a few basic consistency checks. This disables the TSC recalibration
against HPET or PM timer.
As a result there is no mechanism to validate that frequency in cases
where a firmware or hardware defect is suspected. And there was case
that some user used atomic clock to measure the TSC frequency and
reported an inaccuracy issue, which was later fixed in firmware.
Add an option 'recalibrate' for 'tsc' kernel parameter to force the
tsc freq recalibration with HPET or PM timer, and warn if the
deviation from previous value is more than about 500 PPM, which
provides a way to verify the data from hardware / firmware.
There is no functional change to existing work flow.
Recently there was a real-world case: "The 40ms/s divergence between
TSC and HPET was observed on hardware that is quite recent" [1], on
that platform the TSC frequence 1896 MHz was got from CPUID(0x15),
and the force-reclibration with HPET/PMTIMER both calibrated out
value of 1975 MHz, which also matched with check from software
'chronyd', indicating it's a problem of BIOS or firmware.
[Thanks tglx for helping improving the commit log]
[ paulmck: Wordsmith Kconfig help text. ]
[1]. https://lore.kernel.org/lkml/20221117230910.GI4001@paulmck-ThinkPad-P17-Gen-1/
Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: <x86@kernel.org>
Cc: <linux-doc@vger.kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
In order to use sched_clock() from noinstr code, mark it and all it's
implenentations noinstr.
The whole pvclock thing (used by KVM/Xen) is a bit of a pain,
since it calls out to watchdogs, create a
pvclock_clocksource_read_nowd() variant doesn't do that and can be
noinstr.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230126151323.702003578@infradead.org
|
|
The symbol is not used outside of the file, so mark it static.
Fixes the following warning:
arch/x86/kernel/tsc.c:53:20: warning:
symbol 'art_related_clocksource' was not declared. Should it be static?
Signed-off-by: Chen Lifu <chenlifu@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220823021821.3052159-1-chenlifu@huawei.com
|
|
Currently loops_per_jiffy is set in tsc_early_init(), but then don't
switch to delay_tsc, with the result that delay_loop is used with
loops_per_jiffy set for delay_tsc.
Then in (late) tsc_init() lpj_fine is set (which is mostly unused) and
after which use_tsc_delay() is finally called.
Move both loops_per_jiffy and use_tsc_delay() into
tsc_enable_sched_clock() which is called the moment tsc_khz is
determined, be it early or late. Keeping the lot consistent.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220304152135.914397165@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
There are cases that the TSC clocksource is wrongly judged as unstable by
the clocksource watchdog mechanism which tries to validate the TSC against
HPET, PM_TIMER or jiffies. While there is hardly a general reliable way to
check the validity of a watchdog, Thomas Gleixner proposed [1]:
"I'm inclined to lift that requirement when the CPU has:
1) X86_FEATURE_CONSTANT_TSC
2) X86_FEATURE_NONSTOP_TSC
3) X86_FEATURE_NONSTOP_TSC_S3
4) X86_FEATURE_TSC_ADJUST
5) At max. 4 sockets
After two decades of horrors we're finally at a point where TSC seems
to be halfway reliable and less abused by BIOS tinkerers. TSC_ADJUST
was really key as we can now detect even small modifications reliably
and the important point is that we can cure them as well (not pretty
but better than all other options)."
As feature #3 X86_FEATURE_NONSTOP_TSC_S3 only exists on several generations
of Atom processorz, and is always coupled with X86_FEATURE_CONSTANT_TSC
and X86_FEATURE_NONSTOP_TSC, skip checking it, and also be more defensive
to use maximal 2 sockets.
The check is done inside tsc_init() before registering 'tsc-early' and
'tsc' clocksources, as there were cases that both of them had been
wrongly judged as unreliable.
For more background of tsc/watchdog, there is a good summary in [2]
[tglx} Update vs. jiffies:
On systems where the only remaining clocksource aside of TSC is jiffies
there is no way to make this work because that creates a circular
dependency. Jiffies accuracy depends on not missing a periodic timer
interrupt, which is not guaranteed. That could be detected by TSC, but as
TSC is not trusted this cannot be compensated. The consequence is a
circulus vitiosus which results in shutting down TSC and falling back to
the jiffies clocksource which is even more unreliable.
[1]. https://lore.kernel.org/lkml/87eekfk8bd.fsf@nanos.tec.linutronix.de/
[2]. https://lore.kernel.org/lkml/87a6pimt1f.ffs@nanos.tec.linutronix.de/
[ tglx: Refine comment and amend changelog ]
Fixes: 6e3cd95234dc ("x86/hpet: Use another crystalball to evaluate HPET usability")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211117023751.24190-2-feng.tang@intel.com
|
|
Currently, WATCHDOG_THRESHOLD is set to detect a 62.5-millisecond skew in
a 500-millisecond WATCHDOG_INTERVAL. This requires that clocks be skewed
by more than 12.5% in order to be marked unstable. Except that a clock
that is skewed by that much is probably destroying unsuspecting software
right and left. And given that there are now checks for false-positive
skews due to delays between reading the two clocks, it should be possible
to greatly decrease WATCHDOG_THRESHOLD, at least for fine-grained clocks
such as TSC.
Therefore, add a new uncertainty_margin field to the clocksource structure
that contains the maximum uncertainty in nanoseconds for the corresponding
clock. This field may be initialized manually, as it is for
clocksource_tsc_early and clocksource_jiffies, which is copied to
refined_jiffies. If the field is not initialized manually, it will be
computed at clock-registry time as the period of the clock in question
based on the scale and freq parameters to __clocksource_update_freq_scale()
function. If either of those two parameters are zero, the
tens-of-milliseconds WATCHDOG_THRESHOLD is used as a cowardly alternative
to dividing by zero. No matter how the uncertainty_margin field is
calculated, it is bounded below by twice WATCHDOG_MAX_SKEW, that is, by 100
microseconds.
Note that manually initialized uncertainty_margin fields are not adjusted,
but there is a WARN_ON_ONCE() that triggers if any such field is less than
twice WATCHDOG_MAX_SKEW. This WARN_ON_ONCE() is intended to discourage
production use of the one-nanosecond uncertainty_margin values that are
used to test the clock-skew code itself.
The actual clock-skew check uses the sum of the uncertainty_margin fields
of the two clocksource structures being compared. Integer overflow is
avoided because the largest computed value of the uncertainty_margin
fields is one billion (10^9), and double that value fits into an
unsigned int. However, if someone manually specifies (say) UINT_MAX,
they will get what they deserve.
Note that the refined_jiffies uncertainty_margin field is initialized to
TICK_NSEC, which means that skew checks involving this clocksource will
be sufficently forgiving. In a similar vein, the clocksource_tsc_early
uncertainty_margin field is initialized to 32*NSEC_PER_MSEC, which
replicates the current behavior and allows custom setting if needed
in order to address the rare skews detected for this clocksource in
current mainline.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Feng Tang <feng.tang@intel.com>
Link: https://lore.kernel.org/r/20210527190124.440372-4-paulmck@kernel.org
|
|
Some sorts of per-CPU clock sources have a history of going out of
synchronization with each other. However, this problem has purportedy been
solved in the past ten years. Except that it is all too possible that the
problem has instead simply been made less likely, which might mean that
some of the occasional "Marking clocksource 'tsc' as unstable" messages
might be due to desynchronization. How would anyone know?
Therefore apply CPU-to-CPU synchronization checking to newly unstable
clocksource that are marked with the new CLOCK_SOURCE_VERIFY_PERCPU flag.
Lists of desynchronized CPUs are printed, with the caveat that if it
is the reporting CPU that is itself desynchronized, it will appear that
all the other clocks are wrong. Just like in real life.
Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Feng Tang <feng.tang@intel.com>
Link: https://lore.kernel.org/r/20210527190124.440372-2-paulmck@kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 cleanups from Borislav Petkov:
"Trivial cleanups and fixes all over the place"
* tag 'x86_cleanups_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
MAINTAINERS: Remove me from IDE/ATAPI section
x86/pat: Do not compile stubbed functions when X86_PAT is off
x86/asm: Ensure asm/proto.h can be included stand-alone
x86/platform/intel/quark: Fix incorrect kernel-doc comment syntax in files
x86/msr: Make locally used functions static
x86/cacheinfo: Remove unneeded dead-store initialization
x86/process/64: Move cpu_current_top_of_stack out of TSS
tools/turbostat: Unmark non-kernel-doc comment
x86/syscalls: Fix -Wmissing-prototypes warnings from COND_SYSCALL()
x86/fpu/math-emu: Fix function cast warning
x86/msr: Fix wr/rdmsr_safe_regs_on_cpu() prototypes
x86: Fix various typos in comments, take #2
x86: Remove unusual Unicode characters from comments
x86/kaslr: Return boolean values from a function returning bool
x86: Fix various typos in comments
x86/setup: Remove unused RESERVE_BRK_ARRAY()
stacktrace: Move documentation for arch_stack_walk_reliable() to header
x86: Remove duplicate TSC DEADLINE MSR definitions
|
|
Fix ~144 single-word typos in arch/x86/ code comments.
Doing this in a single commit should reduce the churn.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-kernel@vger.kernel.org
|
|
The time pvops functions are the only ones left which might be
used in 32-bit mode and which return a 64-bit value.
Switch them to use the static_call() mechanism instead of pvops, as
this allows quite some simplification of the pvops implementation.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210311142319.4723-5-jgross@suse.com
|
|
Latch sequence counters have unique read and write APIs, and thus
seqcount_latch_t was recently introduced at seqlock.h.
Use that new data type instead of plain seqcount_t. This adds the
necessary type-safety and ensures that only latching-safe seqcount APIs
are to be used.
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
[peterz: unwreck cyc2ns_read_begin()]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200827114044.11173-7-a.darwish@linutronix.de
|
|
Changing base clock frequency directly impacts TSC Hz but not CPUID.16h
value. An overclocked CPU supporting CPUID.16h and with partial CPUID.15h
support will set TSC KHZ according to "best guess" given by CPUID.16h
relying on tsc_refine_calibration_work to give better numbers later.
tsc_refine_calibration_work will refuse to do its work when the outcome is
off the early TSC KHZ value by more than 1% which is certain to happen on
an overclocked system.
Fix this by adding a tsc_early_khz command line parameter that makes the
kernel skip early TSC calibration and use the given value instead.
This allows the user to provide the expected TSC frequency that is closer
to reality than the one reported by the hardware, enabling
tsc_refine_calibration_work to do meaningful error checking.
[ tglx: Made the variable __initdata as it's only used on init and
removed the error checking in the argument parser because
kstrto*() only stores to the variable if the string is valid ]
Signed-off-by: Krzysztof Piecuch <piecuch@protonmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/O2CpIOrqLZHgNRkfjRpz_LGqnc1ix_seNIiOCvHY4RHoulOVRo6kMXKuLOfBVTi0SMMevg6Go1uZ_cL9fLYtYdTRNH78ChaFaZyG3VAyYz8=@protonmail.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
"This topic tree contains more commits than usual:
- most of it are uaccess cleanups/reorganization by Al
- there's a bunch of prototype declaration (--Wmissing-prototypes)
cleanups
- misc other cleanups all around the map"
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
x86/mm/set_memory: Fix -Wmissing-prototypes warnings
x86/efi: Add a prototype for efi_arch_mem_reserve()
x86/mm: Mark setup_emu2phys_nid() static
x86/jump_label: Move 'inline' keyword placement
x86/platform/uv: Add a missing prototype for uv_bau_message_interrupt()
kill uaccess_try()
x86: unsafe_put-style macro for sigmask
x86: x32_setup_rt_frame(): consolidate uaccess areas
x86: __setup_rt_frame(): consolidate uaccess areas
x86: __setup_frame(): consolidate uaccess areas
x86: setup_sigcontext(): list user_access_{begin,end}() into callers
x86: get rid of put_user_try in __setup_rt_frame() (both 32bit and 64bit)
x86: ia32_setup_rt_frame(): consolidate uaccess areas
x86: ia32_setup_frame(): consolidate uaccess areas
x86: ia32_setup_sigcontext(): lift user_access_{begin,end}() into the callers
x86/alternatives: Mark text_poke_loc_init() static
x86/cpu: Fix a -Wmissing-prototypes warning for init_ia32_feat_ctl()
x86/mm: Drop pud_mknotpresent()
x86: Replace setup_irq() by request_irq()
x86/configs: Slightly reduce defconfigs
...
|
|
Switch to the generic VDSO clock mode storage.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> (VDSO parts)
Acked-by: Juergen Gross <jgross@suse.com> (Xen parts)
Acked-by: Paolo Bonzini <pbonzini@redhat.com> (KVM parts)
Link: https://lkml.kernel.org/r/20200207124403.152039903@linutronix.de
|
|
All architectures which use the generic VDSO code have their own storage
for the VDSO clock mode. That's pointless and just requires duplicate code.
X86 abuses the function which retrieves the architecture specific clock
mode storage to mark the clocksource as used in the VDSO. That's silly
because this is invoked on every tick when the VDSO data is updated.
Move this functionality to the clocksource::enable() callback so it gets
invoked once when the clocksource is installed. This allows to make the
clock mode storage generic.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com> (Hyper-V parts)
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> (VDSO parts)
Acked-by: Juergen Gross <jgross@suse.com> (Xen parts)
Link: https://lkml.kernel.org/r/20200207124402.934519777@linutronix.de
|
|
Fix a couple of typos in code comments.
[ bp: While at it: s/IRQ's/IRQs/. ]
Signed-off-by: Martin Molnar <martin.molnar.programming@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lkml.kernel.org/r/0819a044-c360-44a4-f0b6-3f5bafe2d35c@gmail.com
|
|
The introduction of clocksource_tsc_early broke the functionality of
"tsc=reliable" and "tsc=nowatchdog" command line parameters, since
clocksource_tsc_early is unconditionally registered with
CLOCK_SOURCE_MUST_VERIFY and thus put on the watchdog list.
This can cause the TSC to be declared unstable during boot:
clocksource: timekeeping watchdog on CPU0: Marking clocksource
'tsc-early' as unstable because the skew is too large:
clocksource: 'refined-jiffies' wd_now: fffb7018 wd_last: fffb6e9d
mask: ffffffff
clocksource: 'tsc-early' cs_now: 68a6a7070f6a0 cs_last: 68a69ab6f74d6
mask: ffffffffffffffff
tsc: Marking TSC unstable due to clocksource watchdog
The corresponding elapsed times are cs_nsec=1224152026 and wd_nsec=378942392, so
the watchdog differs from TSC by 0.84 seconds.
This happens when HPET is not available and jiffies are used as the TSC
watchdog instead and the jiffies update is not happening due to lost timer
interrupts in periodic mode, which can happen e.g. with expensive debug
mechanisms enabled or under massive overload conditions in virtualized
environments.
Before the introduction of the early TSC clocksource the command line
parameters "tsc=reliable" and "tsc=nowatchdog" could be used to work around
this issue.
Restore the behaviour by disabling the watchdog if requested on the kernel
command line.
[ tglx: Clarify changelog ]
Fixes: aa83c45762a24 ("x86/tsc: Introduce early tsc clocksource")
Signed-off-by: Michael Zhivich <mzhivich@akamai.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191024175945.14338-1-mzhivich@akamai.com
|
|
Currently big microservers have _XEON_D while small microservers have
_X, Make it uniformly: _D.
for i in `git grep -l "\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*_\(X\|XEON_D\)"`
do
sed -i -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*ATOM.*\)_X/\1_D/g' \
-e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*\)_XEON_D/\1_D/g' ${i}
done
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20190827195122.677152989@infradead.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
"Misc small cleanups: removal of superfluous code and coding style
cleanups mostly"
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/kexec: Make variable static and config dependent
x86/defconfigs: Remove useless UEVENT_HELPER_PATH
x86/amd_nb: Make hygon_nb_misc_ids static
x86/tsc: Move inline keyword to the beginning of function declarations
x86/io_delay: Define IO_DELAY macros in C instead of Kconfig
x86/io_delay: Break instead of fallthrough in switch statement
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x96 apic updates from Thomas Gleixner:
"Updates for the x86 APIC interrupt handling and APIC timer:
- Fix a long standing issue with spurious interrupts which was caused
by the big vector management rework a few years ago. Robert Hodaszi
provided finally enough debug data and an excellent initial failure
analysis which allowed to understand the underlying issues.
This contains a change to the core interrupt management code which
is required to handle this correctly for the APIC/IO_APIC. The core
changes are NOOPs for most architectures except ARM64. ARM64 is not
impacted by the change as confirmed by Marc Zyngier.
- Newer systems allow to disable the PIT clock for power saving
causing panic in the timer interrupt delivery check of the IO/APIC
when the HPET timer is not enabled either. While the clock could be
turned on this would cause an endless whack a mole game to chase
the proper register in each affected chipset.
These systems provide the relevant frequencies for TSC, CPU and the
local APIC timer via CPUID and/or MSRs, which allows to avoid the
PIT/HPET based calibration. As the calibration code is the only
usage of the legacy timers on modern systems and is skipped anyway
when the frequencies are known already, there is no point in
setting up the PIT and actually checking for the interrupt delivery
via IO/APIC.
To achieve this on a wide variety of platforms, the CPUID/MSR based
frequency readout has been made more robust, which also allowed to
remove quite some workarounds which turned out to be not longer
required. Thanks to Daniel Drake for analysis, patches and
verification"
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/irq: Seperate unused system vectors from spurious entry again
x86/irq: Handle spurious interrupt after shutdown gracefully
x86/ioapic: Implement irq_get_irqchip_state() callback
genirq: Add optional hardware synchronization for shutdown
genirq: Fix misleading synchronize_irq() documentation
genirq: Delay deactivation in free_irq()
x86/timer: Skip PIT initialization on modern chipsets
x86/apic: Use non-atomic operations when possible
x86/apic: Make apic_bsp_setup() static
x86/tsc: Set LAPIC timer period to crystal clock frequency
x86/apic: Rename 'lapic_timer_frequency' to 'lapic_timer_period'
x86/tsc: Use CPUID.0x16 to calculate missing crystal frequency
|
|
The inline keyword was not at the beginning of the function declarations.
Fix the following warnings triggered when using W=1:
arch/x86/kernel/tsc.c:62:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration]
arch/x86/kernel/tsc.c:79:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration]
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: trivial@kernel.org
Cc: kernel-janitors@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20190524103252.28575-1-malat@debian.org
|
|
Add SPDX license identifiers to all files which:
- Have no license information of any form
- Have EXPORT_.*_SYMBOL_GPL inside which was used in the
initial scan/conversion to ignore the file
These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Currently, the notifiers are called once for each CPU of the policy->cpus
cpumask. It would be more optimal if the notifier can be called only
once and all the relevant information be provided to it. Out of the 23
drivers that register for the transition notifiers today, only 4 of them
do per-cpu updates and the callback for the rest can be called only once
for the policy without any impact.
This would also avoid multiple function calls to the notifier callbacks
and reduce multiple iterations of notifier core's code (which does
locking as well).
This patch adds pointer to the cpufreq policy to the struct
cpufreq_freqs, so the notifier callback has all the information
available to it with a single call. The five drivers which perform
per-cpu updates are updated to use the cpufreq policy. The freqs->cpu
field is redundant now and is removed.
Acked-by: David S. Miller <davem@davemloft.net> (sparc)
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The APIC timer calibration (calibrate_APIC_timer()) can be skipped
in cases where we know the APIC timer frequency. On Intel SoCs,
we believe that the APIC is fed by the crystal clock; this would make
sense, and the crystal clock frequency has been verified against the
APIC timer calibration result on ApolloLake, GeminiLake, Kabylake,
CoffeeLake, WhiskeyLake and AmberLake.
Set lapic_timer_period based on the crystal clock frequency
accordingly.
APIC timer calibration would normally be skipped on modern CPUs
by nature of the TSC deadline timer being used instead,
however this change is still potentially useful, e.g. if the
TSC deadline timer has been disabled with a kernel parameter.
calibrate_APIC_timer() uses the legacy timer, but we are seeing
new platforms that omit such legacy functionality, so avoiding
such codepaths is becoming more important.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Daniel Drake <drake@endlessm.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: len.brown@intel.com
Cc: linux@endlessm.com
Cc: rafael.j.wysocki@intel.com
Link: http://lkml.kernel.org/r/20190509055417.13152-3-drake@endlessm.com
Link: https://lkml.kernel.org/r/20190419083533.32388-1-drake@endlessm.com
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1904031206440.1967@nanos.tec.linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
native_calibrate_tsc() had a data mapping Intel CPU families
and crystal clock speed, but hardcoded tables are not ideal, and this
approach was already problematic at least in the Skylake X case, as
seen in commit:
b51120309348 ("x86/tsc: Fix erroneous TSC rate on Skylake Xeon")
By examining CPUID data from http://instlatx64.atw.hu/ and units
in the lab, we have found that 3 different scenarios need to be dealt
with, and we can eliminate most of the hardcoded data using an approach a
little more advanced than before:
1. ApolloLake, GeminiLake, CannonLake (and presumably all new chipsets
from this point) report the crystal frequency directly via CPUID.0x15.
That's definitive data that we can rely upon.
2. Skylake, Kabylake and all variants of those two chipsets report a
crystal frequency of zero, however we can calculate the crystal clock
speed by condidering data from CPUID.0x16.
This method correctly distinguishes between the two crystal clock
frequencies present on different Skylake X variants that caused
headaches before.
As the calculations do not quite match the previously-hardcoded values
in some cases (e.g. 23913043Hz instead of 24MHz), TSC refinement is
enabled on all platforms where we had to calculate the crystal
frequency in this way.
3. Denverton (GOLDMONT_X) reports a crystal frequency of zero and does
not support CPUID.0x16, so we leave this entry hardcoded.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Daniel Drake <drake@endlessm.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: len.brown@intel.com
Cc: linux@endlessm.com
Cc: rafael.j.wysocki@intel.com
Link: http://lkml.kernel.org/r/20190509055417.13152-1-drake@endlessm.com
Link: https://lkml.kernel.org/r/20190419083533.32388-1-drake@endlessm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These fix the (Intel-specific) Performance and Energy Bias Hint (EPB)
handling and expose it to user space via sysfs, fix and clean up
several cpufreq drivers, add support for two new chips to the qoriq
cpufreq driver, fix, simplify and clean up the cpufreq core and the
schedutil governor, add support for "CPU" domains to the generic power
domains (genpd) framework and provide low-level PSCI firmware support
for that feature, fix the exynos cpuidle driver and fix a couple of
issues in the devfreq subsystem and clean it up.
Specifics:
- Fix the handling of Performance and Energy Bias Hint (EPB) on Intel
processors and expose it to user space via sysfs to avoid having to
access it through the generic MSR I/F (Rafael Wysocki).
- Improve the handling of global turbo changes made by the platform
firmware in the intel_pstate driver (Rafael Wysocki).
- Convert some slow-path static_cpu_has() callers to boot_cpu_has()
in cpufreq (Borislav Petkov).
- Fix the frequency calculation loop in the armada-37xx cpufreq
driver (Gregory CLEMENT).
- Fix possible object reference leaks in multuple cpufreq drivers
(Wen Yang).
- Fix kerneldoc comment in the centrino cpufreq driver (dongjian).
- Clean up the ACPI and maple cpufreq drivers (Viresh Kumar, Mohan
Kumar).
- Add support for lx2160a and ls1028a to the qoriq cpufreq driver
(Vabhav Sharma, Yuantian Tang).
- Fix kobject memory leak in the cpufreq core (Viresh Kumar).
- Simplify the IOwait boosting in the schedutil cpufreq governor and
rework the TSC cpufreq notifier on x86 (Rafael Wysocki).
- Clean up the cpufreq core and statistics code (Yue Hu, Kyle Lin).
- Improve the cpufreq documentation, add SPDX license tags to some PM
documentation files and unify copyright notices in them (Rafael
Wysocki).
- Add support for "CPU" domains to the generic power domains (genpd)
framework and provide low-level PSCI firmware support for that
feature (Ulf Hansson).
- Rearrange the PSCI firmware support code and add support for
SYSTEM_RESET2 to it (Ulf Hansson, Sudeep Holla).
- Improve genpd support for devices in multiple power domains (Ulf
Hansson).
- Unify target residency for the AFTR and coupled AFTR states in the
exynos cpuidle driver (Marek Szyprowski).
- Introduce new helper routine in the operating performance points
(OPP) framework (Andrew-sh.Cheng).
- Add support for passing on-die termination (ODT) and auto power
down parameters from the kernel to Trusted Firmware-A (TF-A) to the
rk3399_dmc devfreq driver (Enric Balletbo i Serra).
- Add tracing to devfreq (Lukasz Luba).
- Make the exynos-bus devfreq driver suspend all devices on system
shutdown (Marek Szyprowski).
- Fix a few minor issues in the devfreq subsystem and clean it up
somewhat (Enric Balletbo i Serra, MyungJoo Ham, Rob Herring,
Saravana Kannan, Yangtao Li).
- Improve system wakeup diagnostics (Stephen Boyd).
- Rework filesystem sync messages emitted during system suspend and
hibernation (Harry Pan)"
* tag 'pm-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (72 commits)
cpufreq: Fix kobject memleak
cpufreq: armada-37xx: fix frequency calculation for opp
cpufreq: centrino: Fix centrino_setpolicy() kerneldoc comment
cpufreq: qoriq: add support for lx2160a
x86: tsc: Rework time_cpufreq_notifier()
PM / Domains: Allow to attach a CPU via genpd_dev_pm_attach_by_id|name()
PM / Domains: Search for the CPU device outside the genpd lock
PM / Domains: Drop unused in-parameter to some genpd functions
PM / Domains: Use the base device for driver_deferred_probe_check_state()
cpufreq: qoriq: Add ls1028a chip support
PM / Domains: Enable genpd_dev_pm_attach_by_id|name() for single PM domain
PM / Domains: Allow OF lookup for multi PM domain case from ->attach_dev()
PM / Domains: Don't kfree() the virtual device in the error path
cpufreq: Move ->get callback check outside of __cpufreq_get()
PM / Domains: remove unnecessary unlikely()
cpufreq: Remove needless bios_limit check in show_bios_limit()
drivers/cpufreq/acpi-cpufreq.c: This fixes the following checkpatch warning
firmware/psci: add support for SYSTEM_RESET2
PM / devfreq: add tracing for scheduling work
trace: events: add devfreq trace event file
...
|
|
There are problems with running time_cpufreq_notifier() on SMP
systems.
First off, the rdtsc() called from there runs on the CPU executing
that code and not necessarily on the CPU whose sched_clock() rate is
updated which is questionable at best.
Second, in the cases when the frequencies of all CPUs in an SMP
system are always in sync, it is not sufficient to update just
one of them or the set associated with a given cpufreq policy on
frequency changes - all CPUs in the system should be updated and
that would require more than a simple transition notifier.
Note, however, that the underlying issue (the TSC rate depending on
the CPU frequency) has not been present in hardware shipping for the
last few years and in quite a few relevant cases (acpi-cpufreq in
particular) running time_cpufreq_notifier() will cause the TSC to
be marked as unstable anyway.
For this reason, make time_cpufreq_notifier() simply mark the TSC
as unstable and give up when run on SMP and only try to carry out
any adjustments otherwise.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
Clocksource watchdog has been found responsible for generating latency
spikes (in the 10-20 us range) when woken up to check for TSC stability.
Add an option to disable it at boot.
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: bigeasy@linutronix.de
Cc: linux-rt-users@vger.kernel.org
Cc: peterz@infradead.org
Cc: bristot@redhat.com
Cc: williams@redhat.com
Link: https://lkml.kernel.org/r/20190307120913.13168-1-juri.lelli@redhat.com
|
|
The threshold in tsc_read_refs() is constant which may favor slower CPUs
but may not be optimal for simple reading of reference on faster ones.
Hence make it proportional to tsc_khz when available to compensate for
this. The threshold guards against any disturbance like IRQs, NMIs, SMIs
or CPU stealing by host on guest systems so rename it accordingly and
fix comments as well.
Also on some systems there is noticeable DMI bus contention at some point
during boot keeping the readout failing (observed with about one in ~300
boots when testing). In that case retry also the second readout instead of
simply bailing out unrefined. Usually the next second the readout returns
fast just fine without any issues.
Signed-off-by: Daniel Vacek <neelx@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/1541437840-29293-1-git-send-email-neelx@redhat.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 paravirt updates from Ingo Molnar:
"Two main changes:
- Remove no longer used parts of the paravirt infrastructure and put
large quantities of paravirt ops under a new config option
PARAVIRT_XXL=y, which is selected by XEN_PV only. (Joergen Gross)
- Enable PV spinlocks on Hyperv (Yi Sun)"
* 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/hyperv: Enable PV qspinlock for Hyper-V
x86/hyperv: Add GUEST_IDLE_MSR support
x86/paravirt: Clean up native_patch()
x86/paravirt: Prevent redefinition of SAVE_FLAGS macro
x86/xen: Make xen_reservation_lock static
x86/paravirt: Remove unneeded mmu related paravirt ops bits
x86/paravirt: Move the Xen-only pv_mmu_ops under the PARAVIRT_XXL umbrella
x86/paravirt: Move the pv_irq_ops under the PARAVIRT_XXL umbrella
x86/paravirt: Move the Xen-only pv_cpu_ops under the PARAVIRT_XXL umbrella
x86/paravirt: Move items in pv_info under PARAVIRT_XXL umbrella
x86/paravirt: Introduce new config option PARAVIRT_XXL
x86/paravirt: Remove unused paravirt bits
x86/paravirt: Use a single ops structure
x86/paravirt: Remove clobbers from struct paravirt_patch_site
x86/paravirt: Remove clobbers parameter from paravirt patch functions
x86/paravirt: Make paravirt_patch_call() and paravirt_patch_jmp() static
x86/xen: Add SPDX identifier in arch/x86/xen files
x86/xen: Link platform-pci-unplug.o only if CONFIG_XEN_PVHVM
x86/xen: Move pv specific parts of arch/x86/xen/mmu.c to mmu_pv.c
x86/xen: Move pv irq related functions under CONFIG_XEN_PV umbrella
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
"The main updates in this cycle were:
- Lots of perf tooling changes too voluminous to list (big perf trace
and perf stat improvements, lots of libtraceevent reorganization,
etc.), so I'll list the authors and refer to the changelog for
details:
Benjamin Peterson, Jérémie Galarneau, Kim Phillips, Peter
Zijlstra, Ravi Bangoria, Sangwon Hong, Sean V Kelley, Steven
Rostedt, Thomas Gleixner, Ding Xiang, Eduardo Habkost, Thomas
Richter, Andi Kleen, Sanskriti Sharma, Adrian Hunter, Tzvetomir
Stoyanov, Arnaldo Carvalho de Melo, Jiri Olsa.
... with the bulk of the changes written by Jiri Olsa, Tzvetomir
Stoyanov and Arnaldo Carvalho de Melo.
- Continued intel_rdt work with a focus on playing well with perf
events. This also imported some non-perf RDT work due to
dependencies. (Reinette Chatre)
- Implement counter freezing for Arch Perfmon v4 (Skylake and newer).
This allows to speed up the PMI handler by avoiding unnecessary MSR
writes and make it more accurate. (Andi Kleen)
- kprobes cleanups and simplification (Masami Hiramatsu)
- Intel Goldmont PMU updates (Kan Liang)
- ... plus misc other fixes and updates"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (155 commits)
kprobes/x86: Use preempt_enable() in optimized_callback()
x86/intel_rdt: Prevent pseudo-locking from using stale pointers
kprobes, x86/ptrace.h: Make regs_get_kernel_stack_nth() not fault on bad stack
perf/x86/intel: Export mem events only if there's PEBS support
x86/cpu: Drop pointless static qualifier in punit_dev_state_show()
x86/intel_rdt: Fix initial allocation to consider CDP
x86/intel_rdt: CBM overlap should also check for overlap with CDP peer
x86/intel_rdt: Introduce utility to obtain CDP peer
tools lib traceevent, perf tools: Move struct tep_handler definition in a local header file
tools lib traceevent: Separate out tep_strerror() for strerror_r() issues
perf python: More portable way to make CFLAGS work with clang
perf python: Make clang_has_option() work on Python 3
perf tools: Free temporary 'sys' string in read_event_files()
perf tools: Avoid double free in read_event_file()
perf tools: Free 'printk' string in parse_ftrace_printk()
perf tools: Cleanup trace-event-info 'tdata' leak
perf strbuf: Match va_{add,copy} with va_end
perf test: S390 does not support watchpoints in test 22
perf auxtrace: Include missing asm/bitsperlong.h to get BITS_PER_LONG
tools include: Adopt linux/bits.h
...
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Looking at the asm for native_sched_clock() I noticed we don't inline
enough. Mostly caused by sharing code with cyc2ns_read_begin(), which
we didn't used to do. So mark all that __force_inline to make it DTRT.
Fixes: 59eaef78bfea ("x86/tsc: Remodel cyc2ns to use seqcount_latch()")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: hpa@zytor.com
Cc: eric.dumazet@gmail.com
Cc: bp@alien8.de
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181011104019.695196158@infradead.org
|
|
The recent rework of the TSC calibration code introduced a regression on UV
systems as it added a call to tsc_early_init() which initializes the TSC
ADJUST values before acpi_boot_table_init(). In the case of UV systems,
that is a necessary step that calls uv_system_init(). This informs
tsc_sanitize_first_cpu() that the kernel runs on a platform with async TSC
resets as documented in commit 341102c3ef29 ("x86/tsc: Add option that TSC
on Socket 0 being non-zero is valid")
Fix it by skipping the early tsc initialization on UV systems and let TSC
init tests take place later in tsc_init().
Fixes: cf7a63ef4e02 ("x86/tsc: Calibrate tsc only once")
Suggested-by: Hedi Berriche <hedi.berriche@hpe.com>
Signed-off-by: Mike Travis <mike.travis@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Russ Anderson <rja@hpe.com>
Reviewed-by: Dimitri Sivanich <sivanich@hpe.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Russ Anderson <russ.anderson@hpe.com>
Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Xiaoming Gao <gxm.linux.kernel@gmail.com>
Cc: Rajvi Jingar <rajvi.jingar@intel.com>
Link: https://lkml.kernel.org/r/20181002180144.923579706@stormcage.americas.sgi.com
|
|
Going primarily by:
https://en.wikipedia.org/wiki/List_of_Intel_Atom_microprocessors
with additional information gleaned from other related pages; notably:
- Bonnell shrink was called Saltwell
- Moorefield is the Merriefield refresh which makes it Airmont
The general naming scheme is: FAM6_ATOM_UARCH_SOCTYPE
for i in `git grep -l FAM6_ATOM` ; do
sed -i -e 's/ATOM_PINEVIEW/ATOM_BONNELL/g' \
-e 's/ATOM_LINCROFT/ATOM_BONNELL_MID/' \
-e 's/ATOM_PENWELL/ATOM_SALTWELL_MID/g' \
-e 's/ATOM_CLOVERVIEW/ATOM_SALTWELL_TABLET/g' \
-e 's/ATOM_CEDARVIEW/ATOM_SALTWELL/g' \
-e 's/ATOM_SILVERMONT1/ATOM_SILVERMONT/g' \
-e 's/ATOM_SILVERMONT2/ATOM_SILVERMONT_X/g' \
-e 's/ATOM_MERRIFIELD/ATOM_SILVERMONT_MID/g' \
-e 's/ATOM_MOOREFIELD/ATOM_AIRMONT_MID/g' \
-e 's/ATOM_DENVERTON/ATOM_GOLDMONT_X/g' \
-e 's/ATOM_GEMINI_LAKE/ATOM_GOLDMONT_PLUS/g' ${i}
done
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: dave.hansen@linux.intel.com
Cc: len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Loops per jiffy is calculated by multiplying tsc_khz with 1e3 and then
dividing it by HZ.
Both tsc_khz and the temporary variable holding the multiplication result
are of type unsigned long, so on 32bit the result is truncated to the lower
32bit.
Use u64 as type for the temporary variable and cast tsc_khz to it before
multiplying.
[ tglx: Massaged changelog and removed pointless braces ]
Fixes: cf7a63ef4e02 ("x86/tsc: Calibrate tsc only once")
Signed-off-by: Chuanhua Lei <chuanhua.lei@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: yixin.zhu@linux.intel.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@microsoft.com>
Cc: Rajvi Jingar <rajvi.jingar@intel.com>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Link: https://lkml.kernel.org/r/1536228203-18701-1-git-send-email-chuanhua.lei@linux.intel.com
|
|
Instead of using six globally visible paravirt ops structures combine
them in a single structure, keeping the original structures as
sub-structures.
This avoids the need to assemble struct paravirt_patch_template at
runtime on the stack each time apply_paravirt() is being called (i.e.
when loading a module).
[ tglx: Made the struct and the initializer tabular for readability sake ]
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel@lists.xenproject.org
Cc: virtualization@lists.linux-foundation.org
Cc: akataria@vmware.com
Cc: rusty@rustcorp.com.au
Cc: boris.ostrovsky@oracle.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/20180828074026.820-9-jgross@suse.com
|
|
Split out suplicated code from tsc_early_init() and tsc_init() into a
common helper and fixup some comment typos.
[ tglx: Massaged changelog and renamed function ]
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20180730075421.22830-2-douly.fnst@cn.fujitsu.com
|
|
During early boot enable tsc_calibrate_cpu_early() and switch to
tsc_calibrate_cpu() only later. Do this unconditionally, because it is
unknown what methods other cpus will use to calibrate once they are
onlined.
If by the time tsc_init() is called tsc frequency is still unknown do only
pit_hpet_ptimer_calibrate_cpu() to calibrate, as this function contains the
only methods wich have not been called and tried earlier.
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: steven.sistare@oracle.com
Cc: daniel.m.jordan@oracle.com
Cc: linux@armlinux.org.uk
Cc: schwidefsky@de.ibm.com
Cc: heiko.carstens@de.ibm.com
Cc: john.stultz@linaro.org
Cc: sboyd@codeaurora.org
Cc: hpa@zytor.com
Cc: douly.fnst@cn.fujitsu.com
Cc: peterz@infradead.org
Cc: prarit@redhat.com
Cc: feng.tang@intel.com
Cc: pmladek@suse.com
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: linux-s390@vger.kernel.org
Cc: boris.ostrovsky@oracle.com
Cc: jgross@suse.com
Cc: pbonzini@redhat.com
Link: https://lkml.kernel.org/r/20180719205545.16512-27-pasha.tatashin@oracle.com
|
|
During early boot TSC and CPU frequency can be calibrated using MSR, CPUID,
and quick PIT calibration methods. The other methods PIT/HPET/PMTIMER are
available only after ACPI is initialized.
Split native_calibrate_cpu() into early and late parts so they can be
called separately during early and late tsc calibration.
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: steven.sistare@oracle.com
Cc: daniel.m.jordan@oracle.com
Cc: linux@armlinux.org.uk
Cc: schwidefsky@de.ibm.com
Cc: heiko.carstens@de.ibm.com
Cc: john.stultz@linaro.org
Cc: sboyd@codeaurora.org
Cc: hpa@zytor.com
Cc: douly.fnst@cn.fujitsu.com
Cc: peterz@infradead.org
Cc: prarit@redhat.com
Cc: feng.tang@intel.com
Cc: pmladek@suse.com
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: linux-s390@vger.kernel.org
Cc: boris.ostrovsky@oracle.com
Cc: jgross@suse.com
Cc: pbonzini@redhat.com
Link: https://lkml.kernel.org/r/20180719205545.16512-26-pasha.tatashin@oracle.com
|
|
All prerequesites for enabling TSC as sched clock early in the boot
process are available now:
- Early attempt of TSC calibration
- Early availablity of static branch patching
If TSC frequency can be established in the early calibration, enable the
static key which switches sched clock to use TSC.
[ tglx: Massaged changelog ]
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: steven.sistare@oracle.com
Cc: daniel.m.jordan@oracle.com
Cc: linux@armlinux.org.uk
Cc: schwidefsky@de.ibm.com
Cc: heiko.carstens@de.ibm.com
Cc: john.stultz@linaro.org
Cc: sboyd@codeaurora.org
Cc: hpa@zytor.com
Cc: douly.fnst@cn.fujitsu.com
Cc: peterz@infradead.org
Cc: prarit@redhat.com
Cc: feng.tang@intel.com
Cc: pmladek@suse.com
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: linux-s390@vger.kernel.org
Cc: boris.ostrovsky@oracle.com
Cc: jgross@suse.com
Cc: pbonzini@redhat.com
Link: https://lkml.kernel.org/r/20180719205545.16512-22-pasha.tatashin@oracle.com
|
|
cyc2ns converts tsc to nanoseconds, and it is handled in a per-cpu data
structure.
Currently, the setup code for c2ns data for every possible CPU goes through
the same sequence of calculations as for the boot CPU, but is based on the
same tsc frequency as the boot CPU, and thus this is not necessary.
Initialize the boot cpu when tsc frequency is determined. Copy the
calculated data from the boot CPU to the other CPUs in tsc_init().
In addition do the following:
- Remove unnecessary zeroing of c2ns data by removing cyc2ns_data_init()
- Split set_cyc2ns_scale() into two functions, so set_cyc2ns_scale() can be
called when system is up, and wraps around __set_cyc2ns_scale() that can
be called directly when system is booting but avoids saving restoring
IRQs and going and waking up from idle.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: steven.sistare@oracle.com
Cc: daniel.m.jordan@oracle.com
Cc: linux@armlinux.org.uk
Cc: schwidefsky@de.ibm.com
Cc: heiko.carstens@de.ibm.com
Cc: john.stultz@linaro.org
Cc: sboyd@codeaurora.org
Cc: hpa@zytor.com
Cc: douly.fnst@cn.fujitsu.com
Cc: peterz@infradead.org
Cc: prarit@redhat.com
Cc: feng.tang@intel.com
Cc: pmladek@suse.com
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: linux-s390@vger.kernel.org
Cc: boris.ostrovsky@oracle.com
Cc: jgross@suse.com
Cc: pbonzini@redhat.com
Link: https://lkml.kernel.org/r/20180719205545.16512-21-pasha.tatashin@oracle.com
|
|
During boot tsc is calibrated twice: once in tsc_early_delay_calibrate(),
and the second time in tsc_init().
Rename tsc_early_delay_calibrate() to tsc_early_init(), and rework it so
the calibration is done only early, and make tsc_init() to use the values
already determined in tsc_early_init().
Sometimes it is not possible to determine tsc early, as the subsystem that
is required is not yet initialized, in such case try again later in
tsc_init().
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: steven.sistare@oracle.com
Cc: daniel.m.jordan@oracle.com
Cc: linux@armlinux.org.uk
Cc: schwidefsky@de.ibm.com
Cc: heiko.carstens@de.ibm.com
Cc: john.stultz@linaro.org
Cc: sboyd@codeaurora.org
Cc: hpa@zytor.com
Cc: douly.fnst@cn.fujitsu.com
Cc: peterz@infradead.org
Cc: prarit@redhat.com
Cc: feng.tang@intel.com
Cc: pmladek@suse.com
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: linux-s390@vger.kernel.org
Cc: boris.ostrovsky@oracle.com
Cc: jgross@suse.com
Cc: pbonzini@redhat.com
Link: https://lkml.kernel.org/r/20180719205545.16512-20-pasha.tatashin@oracle.com
|
|
Currently, the notsc kernel parameter disables the use of the TSC by
sched_clock(). However, this parameter does not prevent the kernel from
accessing tsc in other places.
The only rationale to boot with notsc is to avoid timing discrepancies on
multi-socket systems where TSC are not properly synchronized, and thus
exclude TSC from being used for time keeping. But that prevents using TSC
as sched_clock() as well, which is not necessary as the core sched_clock()
implementation can handle non synchronized TSC based sched clocks just
fine.
However, there is another method to solve the above problem: booting with
tsc=unstable parameter. This parameter allows sched_clock() to use TSC and
just excludes it from timekeeping.
So there is no real reason to keep notsc, but for compatibility reasons the
parameter has to stay. Make it behave like 'tsc=unstable' instead.
[ tglx: Massaged changelog ]
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: steven.sistare@oracle.com
Cc: daniel.m.jordan@oracle.com
Cc: linux@armlinux.org.uk
Cc: schwidefsky@de.ibm.com
Cc: heiko.carstens@de.ibm.com
Cc: john.stultz@linaro.org
Cc: sboyd@codeaurora.org
Cc: hpa@zytor.com
Cc: peterz@infradead.org
Cc: prarit@redhat.com
Cc: feng.tang@intel.com
Cc: pmladek@suse.com
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: linux-s390@vger.kernel.org
Cc: boris.ostrovsky@oracle.com
Cc: jgross@suse.com
Cc: pbonzini@redhat.com
Link: https://lkml.kernel.org/r/20180719205545.16512-12-pasha.tatashin@oracle.com
|