Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph fix from Sage Weil:
"This is a final commit we missed to align the protocol compatibility
with the feature bits.
It decodes a few extra fields in two different messages and reports
EIO when they are used (not yet supported)"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: initial CEPH_FEATURE_FS_FILE_LAYOUT_V2 support
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata fixes from Tejun Heo:
"Assorted fixes for libata drivers.
- Turns out HDIO_GET_32BIT ioctl was subtly broken all along.
- Recent update to ahci external port handling was incorrectly
marking hotpluggable ports as external making userland handle
devices connected to those ports incorrectly.
- ahci_xgene needs its own irq handler to work around a hardware
erratum. libahci updated to allow irq handler override.
- Misc driver specific updates"
* 'for-4.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
ata: ahci: don't mark HotPlugCapable Ports as external/removable
ahci: Workaround for ThunderX Errata#22536
libata: Align ata_device's id on a cacheline
Adding Intel Lewisburg device IDs for SATA
pata-rb532-cf: get rid of the irq_to_gpio() call
libata: fix HDIO_GET_32BIT ioctl
ahci_xgene: Implement the workaround to fix the missing of the edge interrupt for the HOST_IRQ_STAT.
ata: Remove the AHCI_HFLAG_EDGE_IRQ support from libahci.
libahci: Implement the capability to override the generic ahci interrupt handler.
|
|
Pull block fixes from Jens Axboe:
"Round 2 of this. I cut back to the bare necessities, the patch is
still larger than it usually would be at this time, due to the number
of NVMe fixes in there. This pull request contains:
- The 4 core fixes from Ming, that fix both problems with exceeding
the virtual boundary limit in case of merging, and the gap checking
for cloned bio's.
- NVMe fixes from Keith and Christoph:
- Regression on larger user commands, causing problems with
reading log pages (for instance). This touches both NVMe,
and the block core since that is now generally utilized also
for these types of commands.
- Hot removal fixes.
- User exploitable issue with passthrough IO commands, if !length
is given, causing us to fault on writing to the zero
page.
- Fix for a hang under error conditions
- And finally, the current series regression for umount with cgroup
writeback, where the final flush would happen async and hence open
up window after umount where the device wasn't consistent. fsck
right after umount would show this. From Tejun"
* 'for-linus2' of git://git.kernel.dk/linux-block:
block: support large requests in blk_rq_map_user_iov
block: fix blk_rq_get_max_sectors for driver private requests
nvme: fix max_segments integer truncation
nvme: set queue limits for the admin queue
writeback: flush inode cgroup wb switches instead of pinning super_block
NVMe: Fix 0-length integrity payload
NVMe: Don't allow unsupported flags
NVMe: Move error handling to failed reset handler
NVMe: Simplify device reset failure
NVMe: Fix namespace removal deadlock
NVMe: Use IDA for namespace disk naming
NVMe: Don't unmap controller registers on reset
block: merge: get the 1st and last bvec via helpers
block: get the 1st and last bvec via helpers
block: check virt boundary in bio_will_gap()
block: bio: introduce helpers to get the 1st and last bvec
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"A feature was added in 4.3 that allowed users to filter trace points
on a tasks "comm" field. But this prevented filtering on a comm field
that is within a trace event (like sched_migrate_task).
When trying to filter on when a program migrated, this change
prevented the filtering of the sched_migrate_task.
To fix this, the event fields are examined first, and then the extra
fields like "comm" and "cpu" are examined. Also, instead of testing
to assign the comm filter function based on the field's name, the
generic comm field is given a new filter type (FILTER_COMM). When
this field is used to filter the type is checked. The same is done
for the cpu filter field.
Two new special filter types are added: "COMM" and "CPU". This allows
users to still filter the tasks comm for events that have "comm" as
one of their fields, in cases that users would like to filter
sched_migrate_task on the comm of the task that called the event, and
not the comm of the task that is being migrated"
* tag 'trace-fixes-v4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Do not have 'comm' filter override event 'comm' field
|
|
Add support for the format change of MClientReply/MclientCaps.
Also add code that denies access to inodes with pool_ns layouts.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
|
|
Commit 9f61668073a8d "tracing: Allow triggers to filter for CPU ids and
process names" added a 'comm' filter that will filter events based on the
current tasks struct 'comm'. But this now hides the ability to filter events
that have a 'comm' field too. For example, sched_migrate_task trace event.
That has a 'comm' field of the task to be migrated.
echo 'comm == "bash"' > events/sched_migrate_task/filter
will now filter all sched_migrate_task events for tasks named "bash" that
migrates other tasks (in interrupt context), instead of seeing when "bash"
itself gets migrated.
This fix requires a couple of changes.
1) Change the look up order for filter predicates to look at the events
fields before looking at the generic filters.
2) Instead of basing the filter function off of the "comm" name, have the
generic "comm" filter have its own filter_type (FILTER_COMM). Test
against the type instead of the name to assign the filter function.
3) Add a new "COMM" filter that works just like "comm" but will filter based
on the current task, even if the trace event contains a "comm" field.
Do the same for "cpu" field, adding a FILTER_CPU and a filter "CPU".
Cc: stable@vger.kernel.org # v4.3+
Fixes: 9f61668073a8d "tracing: Allow triggers to filter for CPU ids and process names"
Reported-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Driver private request types should not get the artifical cap for the
FS requests. This is important to use the full device capabilities
for internal command or NVMe pass through commands.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Jeff Lien <Jeff.Lien@hgst.com>
Tested-by: Jeff Lien <Jeff.Lien@hgst.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Updated by me to use an explicit check for the one command type that
does support extended checking, instead of relying on the ordering
of the enum command values - as suggested by Keith.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
If cgroup writeback is in use, inodes can be scheduled for
asynchronous wb switching. Before 5ff8eaac1636 ("writeback: keep
superblock pinned during cgroup writeback association switches"), this
could race with umount leading to super_block being destroyed while
inodes are pinned for wb switching. 5ff8eaac1636 fixed it by bumping
s_active while wb switches are in flight; however, this allowed
in-flight wb switches to make umounts asynchronous when the userland
expected synchronosity - e.g. fsck immediately following umount may
fail because the device is still busy.
This patch removes the problematic super_block pinning and instead
makes generic_shutdown_super() flush in-flight wb switches. wb
switches are now executed on a dedicated isw_wq so that they can be
flushed and isw_nr_in_flight keeps track of the number of in-flight wb
switches so that flushing can be avoided in most cases.
v2: Move cgroup_writeback_umount() further below and add MS_ACTIVE
check in inode_switch_wbs() as Jan an Al suggested.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Tahsin Erdogan <tahsin@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Link: http://lkml.kernel.org/g/CAAeU0aNCq7LGODvVGRU-oU_o-6enii5ey0p1c26D1ZzYwkDc5A@mail.gmail.com
Fixes: 5ff8eaac1636 ("writeback: keep superblock pinned during cgroup writeback association switches")
Cc: stable@vger.kernel.org #v4.5
Reviewed-by: Jan Kara <jack@suse.cz>
Tested-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
This patch applies the two introduced helpers to
figure out the 1st and last bvec, and fixes the
original way after bio splitting.
Cc: stable@vger.kernel.org
Reported-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
In the following patch, the way for figuring out
the last bvec will be changed with a bit cost introduced,
so return immediately if the queue doesn't have virt
boundary limit. Actually most of devices have not
this limit.
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
The bio passed to bio_will_gap() may be fast cloned from upper
layer(dm, md, bcache, fs, ...), or from bio splitting in block
core.
Unfortunately bio_will_gap() just figures out the last bvec via
'bi_io_vec[prev->bi_vcnt - 1]' directly, and this way is obviously
wrong.
This patch introduces two helpers for getting the first and last
bvec of one bio for fixing the issue.
Cc: stable@vger.kernel.org
Reported-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull d_inode/d_flags race fix from Al Viro.
I love this fix. Not only does it fix the race in the dentry type
handling, it entirely gets rid of the nasty and subtle memory ordering
rules for d_type and d_inode, and replaces them with the basic dentry
locking rules (sequence numbers under RCU, d_lock elsewhere).
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
use ->d_seq to get coherency between ->d_inode and ->d_flags
|
|
Games with ordering and barriers are way too brittle. Just
bump ->d_seq before and after updating ->d_inode and ->d_flags
type bits, so that verifying ->d_seq would guarantee they are
coherent.
Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
"A rather largish series of 12 patches addressing a maze of race
conditions in the perf core code from Peter Zijlstra"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Robustify task_function_call()
perf: Fix scaling vs. perf_install_in_context()
perf: Fix scaling vs. perf_event_enable()
perf: Fix scaling vs. perf_event_enable_on_exec()
perf: Fix ctx time tracking by introducing EVENT_TIME
perf: Cure event->pending_disable race
perf: Fix race between event install and jump_labels
perf: Fix cloning
perf: Only update context time when active
perf: Allow perf_release() with !event->ctx
perf: Do not double free
perf: Close install vs. exit race
|
|
Merge fixes from Andrew Morton:
"10 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
dax: move writeback calls into the filesystems
dax: give DAX clearing code correct bdev
ext4: online defrag not supported with DAX
ext2, ext4: only set S_DAX for regular inodes
block: disable block device DAX by default
ocfs2: unlock inode if deleting inode from orphan fails
mm: ASLR: use get_random_long()
drivers: char: random: add get_random_long()
mm: numa: quickly fail allocations for NUMA balancing on full nodes
mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
"Enumeration:
Revert x86 pcibios_alloc_irq() to fix regression (Bjorn Helgaas)
Marvell MVEBU host bridge driver:
Restrict build to 32-bit ARM (Thierry Reding)"
* tag 'pci-v4.5-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: mvebu: Restrict build to 32-bit ARM
Revert "PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()"
Revert "PCI: Add helpers to manage pci_dev->irq and pci_dev->irq_managed"
Revert "x86/PCI: Don't alloc pcibios-irq when MSI is enabled"
|
|
Previously calls to dax_writeback_mapping_range() for all DAX filesystems
(ext2, ext4 & xfs) were centralized in filemap_write_and_wait_range().
dax_writeback_mapping_range() needs a struct block_device, and it used
to get that from inode->i_sb->s_bdev. This is correct for normal inodes
mounted on ext2, ext4 and XFS filesystems, but is incorrect for DAX raw
block devices and for XFS real-time files.
Instead, call dax_writeback_mapping_range() directly from the filesystem
->writepages function so that it can supply us with a valid block
device. This also fixes DAX code to properly flush caches in response
to sync(2).
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
dax_clear_blocks() needs a valid struct block_device and previously it
was using inode->i_sb->s_bdev in all cases. This is correct for normal
inodes on mounted ext2, ext4 and XFS filesystems, but is incorrect for
DAX raw block devices and for XFS real-time devices.
Instead, rename dax_clear_blocks() to dax_clear_sectors(), and change
its arguments to take a bdev and a sector instead of an inode and a
block. This better reflects what the function does, and it allows the
filesystem and raw block device code to pass in an appropriate struct
block_device.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit d07e22597d1d ("mm: mmap: add new /proc tunable for mmap_base
ASLR") added the ability to choose from a range of values to use for
entropy count in generating the random offset to the mmap_base address.
The maximum value on this range was set to 32 bits for 64-bit x86
systems, but this value could be increased further, requiring more than
the 32 bits of randomness provided by get_random_int(), as is already
possible for arm64. Add a new function: get_random_long() which more
naturally fits with the mmap usage of get_random_int() but operates
exactly the same as get_random_int().
Also, fix the shifting constant in mmap_rnd() to be an unsigned long so
that values greater than 31 bits generate an appropriate mask without
overflow. This is especially important on x86, as its shift instruction
uses a 5-bit mask for the shift operand, which meant that any value for
mmap_rnd_bits over 31 acts as a no-op and effectively disables mmap_base
randomization.
Finally, replace calls to get_random_int() with get_random_long() where
appropriate.
This patch (of 2):
Add get_random_long().
Signed-off-by: Daniel Cashman <dcashman@android.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: David S. Miller <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nick Kralevich <nnk@google.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Mark Salyzyn <salyzyn@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
- Two fixes for compatibility with the ACPI 6.1 specification.
Without these fixes multi-interface DIMMs will fail to be probed, and
address range scrub commands to find memory errors will give results
that the kernel will mis-interpret. For multi-interface DIMMs Linux
will accept either the original 6.0 implementation or 6.1.
For address range scrub we'll only support 6.1 since ACPI formalized
this DSM differently than the original example [1] implemented in
v4.2. The expectation is that production systems will only ever ship
the ACPI 6.1 address range scrub command definition.
- The wider async address range scrub work targeting 4.6 discovered
that the original synchronous implementation in 4.5 is not sizing its
return buffer correctly.
- Arnd caught that my recent fix to the size of the pfn_t flags missed
updating the flags variable used in the pmem driver.
- Toshi found that we mishandle the memremap() return value in
devm_memremap().
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
nvdimm: use 'u64' for pfn flags
devm_memremap: Fix error value when memremap failed
nfit: update address range scrub commands to the acpi 6.1 format
libnvdimm, tools/testing/nvdimm: fix 'ars_status' output buffer sizing
nfit: fix multi-interface dimm handling, acpi6.1 compatibility
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply
Pull power supply fixes from Sebastian Reichel:
"Add a regression fix for changed sysfs path of bq27xxx_battery and
update MAINTAINERS file"
* tag 'for-v4.5-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
power: bq27xxx_battery: Restore device name
MAINTAINERS: update bq27xxx driver
|
|
The id buffer in ata_device is a DMA target, but it isn't explicitly
cacheline aligned. Due to this, adjacent fields can be overwritten with
stale data from memory on non coherent architectures. As a result, the
kernel is sometimes unable to communicate with an ATA device.
Fix this by ensuring that the id buffer is cacheline aligned.
This issue is similar to that fixed by Commit 84bda12af31f
("libata: align ap->sector_buf").
Signed-off-by: Harvey Hunt <harvey.hunt@imgtec.com>
Cc: linux-kernel@vger.kernel.org
Cc: <stable@vger.kernel.org> # 2.6.18
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
perf_install_in_context() relies upon the context switch hooks to have
scheduled in events when the IPI misses its target -- after all, if
the task has moved from the CPU (or wasn't running at all), it will
have to context switch to run elsewhere.
This however doesn't appear to be happening.
It is possible for the IPI to not happen (task wasn't running) only to
later observe the task running with an inactive context.
The only possible explanation is that the context switch hooks are not
called. Therefore put in a sync_sched() after toggling the jump_label
to guarantee all CPUs will have them enabled before we install an
event.
A simple if (0->1) sync_sched() will not in fact work, because any
further increment can race and complete before the sync_sched().
Therefore we must jump through some hoops.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dvyukov@google.com
Cc: eranian@google.com
Cc: oleg@redhat.com
Cc: panand@redhat.com
Cc: sasha.levin@oracle.com
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160224174947.980211985@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Alexander reported that when the 'original' context gets destroyed, no
new clones happen.
This can happen irrespective of the ctx switch optimization, any task
can die, even the parent, and we want to continue monitoring the task
hierarchy until we either close the event or no tasks are left in the
hierarchy.
perf_event_init_context() will attempt to pin the 'parent' context
during clone(). At that point current is the parent, and since current
cannot have exited while executing clone(), its context cannot have
passed through perf_event_exit_task_context(). Therefore
perf_pin_task_context() cannot observe ctx->task == TASK_TOMBSTONE.
However, since inherit_event() does:
if (parent_event->parent)
parent_event = parent_event->parent;
it looks at the 'original' event when it does: is_orphaned_event().
This can return true if the context that contains the this event has
passed through perf_event_exit_task_context(). And thus we'll fail to
clone the perf context.
Fix this by adding a new state: STATE_DEAD, which is set by
perf_release() to indicate that the filedesc (or kernel reference) is
dead and there are no observers for our data left.
Only for STATE_DEAD will is_orphaned_event() be true and inhibit
cloning.
STATE_EXIT is otherwise preserved such that is_event_hup() remains
functional and will report when the observed task hierarchy becomes
empty.
Reported-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dvyukov@google.com
Cc: eranian@google.com
Cc: oleg@redhat.com
Cc: panand@redhat.com
Cc: sasha.levin@oracle.com
Cc: vince@deater.net
Fixes: c6e5b73242d2 ("perf: Synchronously clean up child events")
Link: http://lkml.kernel.org/r/20160224174947.919845295@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The original format of these commands from the "NVDIMM DSM Interface
Example" [1] are superseded by the ACPI 6.1 definition of the "NVDIMM Root
Device _DSMs" [2].
[1]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
[2]: http://www.uefi.org/sites/default/files/resources/ACPI_6_1.pdf
"9.20.7 NVDIMM Root Device _DSMs"
Changes include:
1/ New 'restart' fields in ars_status, unfortunately these are
implemented in the middle of the existing definition so this change
is not backwards compatible. The expectation is that shipping
platforms will only ever support the ACPI 6.1 definition.
2/ New status values for ars_start ('busy') and ars_status ('overflow').
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Linda Knippers <linda.knippers@hpe.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Pull NFS client bugfixes from Trond Myklebust:
"Stable bugfixes:
- Fix nfs_size_to_loff_t
- NFSv4: Fix a dentry leak on alias use
Other bugfixes:
- Don't schedule a layoutreturn if the layout segment can be freed
immediately.
- Always set NFS_LAYOUT_RETURN_REQUESTED with lo->plh_return_iomode
- rpcrdma_bc_receive_call() should init rq_private_buf.len
- fix stateid handling for the NFS v4.2 operations
- pnfs/blocklayout: fix a memeory leak when using,vmalloc_to_page
- fix panic in gss_pipe_downcall() in fips mode
- Fix a race between layoutget and pnfs_destroy_layout
- Fix a race between layoutget and bulk recalls"
* tag 'nfs-for-4.5-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv4.x/pnfs: Fix a race between layoutget and bulk recalls
NFSv4.x/pnfs: Fix a race between layoutget and pnfs_destroy_layout
auth_gss: fix panic in gss_pipe_downcall() in fips mode
pnfs/blocklayout: fix a memeory leak when using,vmalloc_to_page
nfs4: fix stateid handling for the NFS v4.2 operations
NFSv4: Fix a dentry leak on alias use
xprtrdma: rpcrdma_bc_receive_call() should init rq_private_buf.len
pNFS: Always set NFS_LAYOUT_RETURN_REQUESTED with lo->plh_return_iomode
pNFS: Fix pnfs_mark_matching_lsegs_return()
nfs: fix nfs_size_to_loff_t
|
|
Pull networking fixes from David Miller:
"Looks like a lot, but mostly driver fixes scattered all over as usual.
Of note:
1) Add conditional sched in nf conntrack in cleanup to avoid NMI
watchdogs. From Florian Westphal.
2) Fix deadlock in nfnetlink cttimeout, also from Floarian.
3) Fix handling of slaves in bonding ARP monitor validation, from Jay
Vosburgh.
4) Callers of ip_cmsg_send() are responsible for freeing IP options,
some were not doing so. Fix from Eric Dumazet.
5) Fix per-cpu bugs in mvneta driver, from Gregory CLEMENT.
6) Fix vlan handling in mv88e6xxx DSA driver, from Vivien Didelot.
7) bcm7xxx PHY driver bug fixes from Florian Fainelli.
8) Avoid unaligned accesses to protocol headers wrt. GRE, from
Alexander Duyck.
9) SKB leaks and other problems in arc_emac driver, from Alexander
Kochetkov.
10) tcp_v4_inbound_md5_hash() releases listener socket instead of
request socket on error path, oops. Fix from Eric Dumazet.
11) Missing socket release in pppoe_rcv_core() that seems to have
existed basically forever. From Guillaume Nault.
12) Missing slave_dev unregister in dsa_slave_create() error path,
from Florian Fainelli.
13) crypto_alloc_hash() never returns NULL, fix return value check in
__tcp_alloc_md5sig_pool. From Insu Yun.
14) Properly expire exception route entries in ipv4, from Xin Long.
15) Fix races in tcp/dccp listener socket dismantle, from Eric
Dumazet.
16) Don't set IFF_TX_SKB_SHARING in vxlan, geneve, or GRE, it's not
legal. These drivers modify the SKB on transmit. From Jiri Benc.
17) Fix regression in the initialziation of netdev->tx_queue_len.
From Phil Sutter.
18) Missing unlock in tipc_nl_add_bc_link() error path, from Insu Yun.
19) SCTP port hash sizing does not properly ensure that table is a
power of two in size. From Neil Horman.
20) Fix initializing of software copy of MAC address in fmvj18x_cs
driver, from Ken Kawasaki"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (129 commits)
bnx2x: Fix 84833 phy command handler
bnx2x: Fix led setting for 84858 phy.
bnx2x: Correct 84858 PHY fw version
bnx2x: Fix 84833 RX CRC
bnx2x: Fix link-forcing for KR2
net: ethernet: davicom: fix devicetree irq resource
fmvj18x_cs: fix incorrect indexing of dev->dev_addr[] when copying the MAC address
Driver: Vmxnet3: Update Rx ring 2 max size
net: netcp: rework the code for get/set sw_data in dma desc
soc: ti: knav_dma: rename pad in struct knav_dma_desc to sw_data
net: ti: netcp: restore get/set_pad_info() functionality
MAINTAINERS: Drop myself as xen netback maintainer
sctp: Fix port hash table size computation
can: ems_usb: Fix possible tx overflow
Bluetooth: hci_core: Avoid mixing up req_complete and req_complete_skb
net: bcmgenet: Fix internal PHY link state
af_unix: Don't use continue to re-execute unix_stream_read_generic loop
unix_diag: fix incorrect sign extension in unix_lookup_by_ino
bnxt_en: Failure to update PHY is not fatal condition.
bnxt_en: Remove unnecessary call to update PHY settings.
...
|
|
Rename the pad to sw_data as per description of this field in the hardware
spec(refer sprugr9 from www.ti.com). Latest version of the document is
at http://www.ti.com/lit/ug/sprugr9h/sprugr9h.pdf and section 3.1
Host Packet Descriptor describes this field.
Define and use a constant for the size of sw_data field similar to
other fields in the struct for desc and document the sw_data field
in the header. As the sw_data is not touched by hw, it's type can be
changed to u32.
Rename the helpers to match with the updated dma desc field sw_data.
Cc: Wingman Kwok <w-kwok2@ti.com>
Cc: Mugunthan V N <mugunthanvnm@ti.com>
CC: Arnd Bergmann <arnd@arndb.de>
CC: Grygorii Strashko <grygorii.strashko@ti.com>
CC: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Patch <703df6c09795> ("power: bq27xxx_battery: Reorganize I2C
into a module") has removed the device name numbering from
bq27xxx_battery_i2c_probe. Fix that by restoring the code.
Fixes: 703df6c097956d17a818e63961c82e8e9eef9fef
Signed-off-by: Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com>
Reviewed-by: Pali Rohár <pali.rohar@gmail.com>
Tested-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Sebastian Reichel <sre@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"This is unusually large, partly due to the EFI fixes that prevent
accidental deletion of EFI variables through efivarfs that may brick
machines. These fixes are somewhat involved to maintain compatibility
with existing install methods and other usage modes, while trying to
turn off the 'rm -rf' bricking vector.
Other fixes are for large page ioremap()s and for non-temporal
user-memcpy()s"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Fix vmalloc_fault() to handle large pages properly
hpet: Drop stale URLs
x86/uaccess/64: Handle the caching of 4-byte nocache copies properly in __copy_user_nocache()
x86/uaccess/64: Make the __copy_user_nocache() assembly code more readable
lib/ucs2_string: Correct ucs2 -> utf8 conversion
efi: Add pstore variables to the deletion whitelist
efi: Make efivarfs entries immutable by default
efi: Make our variable validation list include the guid
efi: Do variable name validation tests in utf8
efi: Use ucs2_as_utf8 in efivarfs instead of open coding a bad version
lib/ucs2_string: Add ucs2 -> utf8 helper functions
|
|
Use the output length specified in the command to size the receive
buffer rather than the arbitrary 4K limit.
This bug was hiding the fact that the ndctl implementation of
ndctl_bus_cmd_new_ars_status() was not specifying an output buffer size.
Cc: <stable@vger.kernel.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Merge fixes from Andrew Morton:
"10 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: slab: free kmem_cache_node after destroy sysfs file
ipc/shm: handle removed segments gracefully in shm_mmap()
MAINTAINERS: update Kselftest Framework mailing list
devm_memremap_release(): fix memremap'd addr handling
mm/hugetlb.c: fix incorrect proc nr_hugepages value
mm, x86: fix pte_page() crash in gup_pte_range()
fsnotify: turn fsnotify reaper thread into a workqueue job
Revert "fsnotify: destroy marks with call_srcu instead of dedicated thread"
mm: fix regression in remap_file_pages() emulation
thp, dax: do not try to withdraw pgtable from non-anon VMA
|
|
When I used netdev_for_each_lower_dev in commit bad531623253 ("vrf:
remove slave queue and private slave struct") I thought that it acts
like netdev_for_each_lower_private and can be used to remove the current
device from the list while walking, but unfortunately it acts more like
netdev_for_each_lower_private_rcu and doesn't allow it. The difference
is where the "iter" points to, right now it points to the current element
and that makes it impossible to remove it. Change the logic to be
similar to netdev_for_each_lower_private and make it point to the "next"
element so we can safely delete the current one. VRF is the only such
user right now, there's no change for the read-only users.
Here's what can happen now:
[98423.249858] general protection fault: 0000 [#1] SMP
[98423.250175] Modules linked in: vrf bridge(O) stp llc nfsd auth_rpcgss
oid_registry nfs_acl nfs lockd grace sunrpc crct10dif_pclmul
crc32_pclmul crc32c_intel ghash_clmulni_intel jitterentropy_rng
sha256_generic hmac drbg ppdev aesni_intel aes_x86_64 glue_helper lrw
gf128mul ablk_helper cryptd evdev serio_raw pcspkr virtio_balloon
parport_pc parport i2c_piix4 i2c_core virtio_console acpi_cpufreq button
9pnet_virtio 9p 9pnet fscache ipv6 autofs4 ext4 crc16 mbcache jbd2 sg
virtio_blk virtio_net sr_mod cdrom e1000 ata_generic ehci_pci uhci_hcd
ehci_hcd usbcore usb_common virtio_pci ata_piix libata floppy
virtio_ring virtio scsi_mod [last unloaded: bridge]
[98423.255040] CPU: 1 PID: 14173 Comm: ip Tainted: G O
4.5.0-rc2+ #81
[98423.255386] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.8.1-20150318_183358- 04/01/2014
[98423.255777] task: ffff8800547f5540 ti: ffff88003428c000 task.ti:
ffff88003428c000
[98423.256123] RIP: 0010:[<ffffffff81514f3e>] [<ffffffff81514f3e>]
netdev_lower_get_next+0x1e/0x30
[98423.256534] RSP: 0018:ffff88003428f940 EFLAGS: 00010207
[98423.256766] RAX: 0002000100000004 RBX: ffff880054ff9000 RCX:
0000000000000000
[98423.257039] RDX: ffff88003428f8b8 RSI: ffff88003428f950 RDI:
ffff880054ff90c0
[98423.257287] RBP: ffff88003428f940 R08: 0000000000000000 R09:
0000000000000000
[98423.257537] R10: 0000000000000001 R11: 0000000000000000 R12:
ffff88003428f9e0
[98423.257802] R13: ffff880054a5fd00 R14: ffff88003428f970 R15:
0000000000000001
[98423.258055] FS: 00007f3d76881700(0000) GS:ffff88005d000000(0000)
knlGS:0000000000000000
[98423.258418] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[98423.258650] CR2: 00007ffe5951ffa8 CR3: 0000000052077000 CR4:
00000000000406e0
[98423.258902] Stack:
[98423.259075] ffff88003428f960 ffffffffa0442636 0002000100000004
ffff880054ff9000
[98423.259647] ffff88003428f9b0 ffffffff81518205 ffff880054ff9000
ffff88003428f978
[98423.260208] ffff88003428f978 ffff88003428f9e0 ffff88003428f9e0
ffff880035b35f00
[98423.260739] Call Trace:
[98423.260920] [<ffffffffa0442636>] vrf_dev_uninit+0x76/0xa0 [vrf]
[98423.261156] [<ffffffff81518205>]
rollback_registered_many+0x205/0x390
[98423.261401] [<ffffffff815183ec>] unregister_netdevice_many+0x1c/0x70
[98423.261641] [<ffffffff8153223c>] rtnl_delete_link+0x3c/0x50
[98423.271557] [<ffffffff815335bb>] rtnl_dellink+0xcb/0x1d0
[98423.271800] [<ffffffff811cd7da>] ? __inc_zone_state+0x4a/0x90
[98423.272049] [<ffffffff815337b4>] rtnetlink_rcv_msg+0x84/0x200
[98423.272279] [<ffffffff810cfe7d>] ? trace_hardirqs_on+0xd/0x10
[98423.272513] [<ffffffff8153370b>] ? rtnetlink_rcv+0x1b/0x40
[98423.272755] [<ffffffff81533730>] ? rtnetlink_rcv+0x40/0x40
[98423.272983] [<ffffffff8155d6e7>] netlink_rcv_skb+0x97/0xb0
[98423.273209] [<ffffffff8153371a>] rtnetlink_rcv+0x2a/0x40
[98423.273476] [<ffffffff8155ce8b>] netlink_unicast+0x11b/0x1a0
[98423.273710] [<ffffffff8155d2f1>] netlink_sendmsg+0x3e1/0x610
[98423.273947] [<ffffffff814fbc98>] sock_sendmsg+0x38/0x70
[98423.274175] [<ffffffff814fc253>] ___sys_sendmsg+0x2e3/0x2f0
[98423.274416] [<ffffffff810d841e>] ? do_raw_spin_unlock+0xbe/0x140
[98423.274658] [<ffffffff811e1bec>] ? handle_mm_fault+0x26c/0x2210
[98423.274894] [<ffffffff811e19cd>] ? handle_mm_fault+0x4d/0x2210
[98423.275130] [<ffffffff81269611>] ? __fget_light+0x91/0xb0
[98423.275365] [<ffffffff814fcd42>] __sys_sendmsg+0x42/0x80
[98423.275595] [<ffffffff814fcd92>] SyS_sendmsg+0x12/0x20
[98423.275827] [<ffffffff81611bb6>] entry_SYSCALL_64_fastpath+0x16/0x7a
[98423.276073] Code: c3 31 c0 5d c3 0f 1f 84 00 00 00 00 00 66 66 66 66
90 48 8b 06 55 48 81 c7 c0 00 00 00 48 89 e5 48 8b 00 48 39 f8 74 09 48
89 06 <48> 8b 40 e8 5d c3 31 c0 5d c3 0f 1f 84 00 00 00 00 00 66 66 66
[98423.279639] RIP [<ffffffff81514f3e>] netdev_lower_get_next+0x1e/0x30
[98423.279920] RSP <ffff88003428f940>
CC: David Ahern <dsa@cumulusnetworks.com>
CC: David S. Miller <davem@davemloft.net>
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
CC: Vlad Yasevich <vyasevic@redhat.com>
Fixes: bad531623253 ("vrf: remove slave queue and private slave struct")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
Tested-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching
Pull livepatching fixes from Jiri Kosina:
- regression (from 4.4) fix for ordering issue, introduced by an
earlier ftrace change, that broke live patching of modules.
The fix replaces the ftrace module notifier by direct call in order
to make the ordering guaranteed and well-defined. The patch, from
Jessica Yu, has been acked both by Steven and Rusty
- error message fix from Miroslav Benes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
ftrace/module: remove ftrace module notifier
livepatch: change the error message in asm/livepatch.h header files
|
|
This reverts commit c510eff6beba ("fsnotify: destroy marks with
call_srcu instead of dedicated thread").
Eryu reported that he was seeing some OOM kills kick in when running a
testcase that adds and removes inotify marks on a file in a tight loop.
The above commit changed the code to use call_srcu to clean up the
marks. While that does (in principle) work, the srcu callback job is
limited to cleaning up entries in small batches and only once per jiffy.
It's easily possible to overwhelm that machinery with too many call_srcu
callbacks, and Eryu's reproduer did just that.
There's also another potential problem with using call_srcu here. While
you can obviously sleep while holding the srcu_read_lock, the callbacks
run under local_bh_disable, so you can't sleep there.
It's possible when putting the last reference to the fsnotify_mark that
we'll end up putting a chain of references including the fsnotify_group,
uid, and associated keys. While I don't see any obvious ways that that
could occurs, it's probably still best to avoid using call_srcu here
after all.
This patch reverts the above patch. A later patch will take a different
approach to eliminated the dedicated thread here.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Reported-by: Eryu Guan <guaneryu@gmail.com>
Tested-by: Eryu Guan <guaneryu@gmail.com>
Cc: Jan Kara <jack@suse.com>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Revert 811a4e6fce09 ("PCI: Add helpers to manage pci_dev->irq and
pci_dev->irq_managed").
This is part of reverting 991de2e59090 ("PCI, x86: Implement
pcibios_alloc_irq() and pcibios_free_irq()") to fix regressions it
introduced.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=111211
Fixes: 991de2e59090 ("PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
CC: Jiang Liu <jiang.liu@linux.intel.com>
|
|
Remove the ftrace module notifier in favor of directly calling
ftrace_module_enable() and ftrace_release_mod() in the module loader.
Hard-coding the function calls directly in the module loader removes
dependence on the module notifier call chain and provides better
visibility and control over what gets called when, which is important
to kernel utilities such as livepatch.
This fixes a notifier ordering issue in which the ftrace module notifier
(and hence ftrace_module_enable()) for coming modules was being called
after klp_module_notify(), which caused livepatch modules to initialize
incorrectly. This patch removes dependence on the module notifier call
chain in favor of hard coding the corresponding function calls in the
module loader. This ensures that ftrace and livepatch code get called in
the correct order on patch module load and unload.
Fixes: 5156dca34a3e ("ftrace: Fix the race between ftrace and insmod")
Signed-off-by: Jessica Yu <jeyu@redhat.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
Pull block fixes from Jens Axboe:
"A collection of fixes from the past few weeks that should go into 4.5.
This contains:
- Overflow fix for sysfs discard show function from Alan.
- A stacking limit init fix for max_dev_sectors, so we don't end up
artificially capping some use cases. From Keith.
- Have blk-mq proper end unstarted requests on a dying queue, instead
of pushing that to the driver. From Keith.
- NVMe:
- Update to Kconfig description for NVME_SCSI, since it was
vague and having it on is important for some SUSE distros.
From Christoph.
- Set of fixes from Keith, around surprise removal. Also kills
the no-merge flag, so it supports merging.
- Set of fixes for lightnvm from Matias, Javier, and Wenwei.
- Fix null_blk oops when asked for lightnvm, but not available. From
Matias.
- Copy-to-user EINTR fix from Hannes, fixing a case where SG_IO fails
if interrupted by a signal.
- Two floppy fixes from Jiri, fixing signal handling and blocking
open.
- A use-after-free fix for O_DIRECT, from Mike Krinkin.
- A block module ref count fix from Roman Pen.
- An fs IO wait accounting fix for O_DSYNC from Stephane Gasparini.
- Smaller reallo fix for xen-blkfront from Bob Liu.
- Removal of an unused struct member in the deadline IO scheduler,
from Tahsin.
- Also from Tahsin, properly initialize inode struct members
associated with cgroup writeback, if enabled.
- From Tejun, ensure that we keep the superblock pinned during cgroup
writeback"
* 'for-linus' of git://git.kernel.dk/linux-block: (25 commits)
blk: fix overflow in queue_discard_max_hw_show
writeback: initialize inode members that track writeback history
writeback: keep superblock pinned during cgroup writeback association switches
bio: return EINTR if copying to user space got interrupted
NVMe: Rate limit nvme IO warnings
NVMe: Poll device while still active during remove
NVMe: Requeue requests on suspended queues
NVMe: Allow request merges
NVMe: Fix io incapable return values
blk-mq: End unstarted requests on dying queue
block: Initialize max_dev_sectors to 0
null_blk: oops when initializing without lightnvm
block: fix module reference leak on put_disk() call for cgroups throttle
nvme: fix Kconfig description for BLK_DEV_NVME_SCSI
kernel/fs: fix I/O wait not accounted for RW O_DSYNC
floppy: refactor open() flags handling
lightnvm: allow to force mm initialization
lightnvm: check overflow and correct mlc pairs
lightnvm: fix request intersection locking in rrpc
lightnvm: warn if irqs are disabled in lock laddr
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"This includes two fixes.
The first is something that has come up a few times and has been
worked out individually, but it's come up now enough that the problem
should be generic. Tracepoints are protected by RCU sched. There are
several tracepoints within core infrastructure like kfree(). If a
tracepoint is called when the CPU is going down, or when it's coming
up but has yet to be recognized by RCU, a RCU warning is triggered.
This is a true bug as that tracepoint is not protected by RCU.
Usually, this is taken care of by testing for cpu online as a
tracepoint condition. But as this is happening more often, moving it
from a individual tracepoint to a check in the tracepoint
infrastructure is more robust.
Note, there is now a duplicate of a cpu online test, because this
update does not remove the individual checks. But the overhead is
small enough that the removal can be done in another release.
The second change is strange linker breakage due to the branch
tracer's builtin_constant_p() check failing, and treating the
condition as a variable instead of a constant. Arnd Bergmann found
that this can be fixed by testing !!(cond) instead of just (cond)"
* tag 'trace-fixes-v4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix freak link error caused by branch tracer
tracepoints: Do not trace when cpu is offline
|
|
unreferenced object 0xffffc90000abf000 (size 16900):
comm "fsync02", pid 15765, jiffies 4297431627 (age 423.772s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 a0 c2 19 00 88 ff ff ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff8174d54e>] kmemleak_alloc+0x4e/0xb0
[<ffffffff811b9b91>] __vmalloc_node_range+0x231/0x280
[<ffffffff811b9c2a>] __vmalloc+0x4a/0x50
[<ffffffffa02c9ec1>] ext_tree_prepare_commit+0x231/0x2e0 [blocklayoutdriver]
[<ffffffffa02c700e>] bl_prepare_layoutcommit+0xe/0x10 [blocklayoutdriver]
[<ffffffffa0596a6c>] pnfs_layoutcommit_inode+0x29c/0x330 [nfsv4]
[<ffffffffa0596b13>] pnfs_generic_sync+0x13/0x20 [nfsv4]
[<ffffffffa0585188>] nfs4_file_fsync+0x58/0x150 [nfsv4]
[<ffffffff81228e5b>] vfs_fsync_range+0x4b/0xb0
[<ffffffff81228f1d>] do_fsync+0x3d/0x70
[<ffffffff812291d0>] SyS_fsync+0x10/0x20
[<ffffffff81757def>] entry_SYSCALL_64_fastpath+0x12/0x76
[<ffffffffffffffff>] 0xffffffffffffffff
v2, add missing include header
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
problem description:
The current code sets UAR page size equal to system page size.
The ConnectX-3 and ConnectX-3 Pro HWs require minimum 128 UAR pages.
The mlx4 kernel drivers are not loaded if there is less than 128 UAR pages.
solution:
Always set UAR page to 4KB. This allows more UAR pages if the OS
has PAGE_SIZE larger than 4KB. For example, PowerPC kernel use 64KB
system page size, with 4MB uar region, there are 4MB/2/64KB = 32
uars (half for uar, half for blueflame). This does not meet minimum 128
UAR pages requirement. With 4KB UAR page, there are 4MB/2/4KB = 512 uars
which meet the minimum requirement.
Note that only codes in mlx4_core that deal with firmware know that uar
page size is 4KB. Codes that deal with usr page in cq and qp context
(mlx4_ib, mlx4_en and part of mlx4_core) still have the same assumption
that uar page size equals to system page size.
Note that with this implementation, on 64KB system page size kernel, there
are 16 uars per system page but only one uars is used. The other 15
uars are ignored because of the above assumption.
Regarding SR-IOV, mlx4_core in hypervisor will set the uar page size
to 4KB and mlx4_core code in virtual OS will obtain the uar page size from
firmware.
Regarding backward compatibility in SR-IOV, if hypervisor has this new code,
the virtual OS must be updated. If hypervisor has old code, and the virtual
OS has this new code, the new code will be backward compatible with the
old code. If the uar size is big enough, this new code in VF continues to
work with 64 KB uar page size (on PowerPc kernel). If the uar size does not
meet 128 uars requirement, this new code not loaded in VF and print the same
error message as the old code in Hypervisor.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
mlx5_ifc.h is a header file representing the API and ABI between
the driver to the firmware and hardware. This file is used from
both the mlx5_ib and mlx5_core drivers.
Previously, this file used incrementing counter to indicate
reserved fields, for example:
struct mlx5_ifc_odp_per_transport_service_cap_bits {
u8 send[0x1];
u8 receive[0x1];
u8 write[0x1];
u8 read[0x1];
u8 reserved_0[0x1];
u8 srq_receive[0x1];
u8 reserved_1[0x1a];
};
If one developer implements through net-next feature A that uses
reserved_0, they replace it with featureA and renames reserved_1 to
reserved_0. In the same kernel cycle, a 2nd developer could implement
feature B through the rdma tree, that uses reserved_1 and split it to
featureB and a smaller reserved_1 field. This will cause a conflict
when the two trees are merged.
The source of this conflict is that the 1st developer changed *all*
reserved fields.
As Linus suggested, we change the layout of structs to:
struct mlx5_ifc_odp_per_transport_service_cap_bits {
u8 send[0x1];
u8 receive[0x1];
u8 write[0x1];
u8 read[0x1];
u8 reserved_at_4[0x1];
u8 srq_receive[0x1];
u8 reserved_at_6[0x1a];
};
This makes the conflicts much more rare and preserves the locality of
changes.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull IOMMU SVM fixes from David Woodhouse:
"Minor register size and interrupt acknowledgement fixes which only
showed up in testing on newer hardware, but mostly a fix to the MM
refcount handling to prevent a recursive refcount issue when mmap() is
used on the file descriptor associated with a bound PASID"
* tag 'for-linus-20160216' of git://git.infradead.org/intel-iommu:
iommu/vt-d: Clear PPR bit to ensure we get more page request interrupts
iommu/vt-d: Fix 64-bit accesses to 32-bit DMAR_GSTS_REG
iommu/vt-d: Fix mm refcounting to hold mm_count not mm_users
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent
Pull EFI fixes from Matt Fleming:
* Prevent accidental deletion of EFI variables through efivarfs that
may brick machines. We use a whitelist of known-safe variables to
allow things like installing distributions to work out of the box, and
instead restrict vendor-specific variable deletion by making
non-whitelist variables immutable (Peter Jones)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
In my randconfig tests, I came across a bug that involves several
components:
* gcc-4.9 through at least 5.3
* CONFIG_GCOV_PROFILE_ALL enabling -fprofile-arcs for all files
* CONFIG_PROFILE_ALL_BRANCHES overriding every if()
* The optimized implementation of do_div() that tries to
replace a library call with an division by multiplication
* code in drivers/media/dvb-frontends/zl10353.c doing
u32 adc_clock = 450560; /* 45.056 MHz */
if (state->config.adc_clock)
adc_clock = state->config.adc_clock;
do_div(value, adc_clock);
In this case, gcc fails to determine whether the divisor
in do_div() is __builtin_constant_p(). In particular, it
concludes that __builtin_constant_p(adc_clock) is false, while
__builtin_constant_p(!!adc_clock) is true.
That in turn throws off the logic in do_div() that also uses
__builtin_constant_p(), and instead of picking either the
constant- optimized division, and the code in ilog2() that uses
__builtin_constant_p() to figure out whether it knows the answer at
compile time. The result is a link error from failing to find
multiple symbols that should never have been called based on
the __builtin_constant_p():
dvb-frontends/zl10353.c:138: undefined reference to `____ilog2_NaN'
dvb-frontends/zl10353.c:138: undefined reference to `__aeabi_uldivmod'
ERROR: "____ilog2_NaN" [drivers/media/dvb-frontends/zl10353.ko] undefined!
ERROR: "__aeabi_uldivmod" [drivers/media/dvb-frontends/zl10353.ko] undefined!
This patch avoids the problem by changing __trace_if() to check
whether the condition is known at compile-time to be nonzero, rather
than checking whether it is actually a constant.
I see this one link error in roughly one out of 1600 randconfig builds
on ARM, and the patch fixes all known instances.
Link: http://lkml.kernel.org/r/1455312410-1058841-1-git-send-email-arnd@arndb.de
Acked-by: Nicolas Pitre <nico@linaro.org>
Fixes: ab3c9c686e22 ("branch tracer, intel-iommu: fix build with CONFIG_BRANCH_TRACER=y")
Cc: stable@vger.kernel.org # v2.6.30+
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
The tracepoint infrastructure uses RCU sched protection to enable and
disable tracepoints safely. There are some instances where tracepoints are
used in infrastructure code (like kfree()) that get called after a CPU is
going offline, and perhaps when it is coming back online but hasn't been
registered yet.
This can probuce the following warning:
[ INFO: suspicious RCU usage. ]
4.4.0-00006-g0fe53e8-dirty #34 Tainted: G S
-------------------------------
include/trace/events/kmem.h:141 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
RCU used illegally from offline CPU! rcu_scheduler_active = 1, debug_locks = 1
no locks held by swapper/8/0.
stack backtrace:
CPU: 8 PID: 0 Comm: swapper/8 Tainted: G S 4.4.0-00006-g0fe53e8-dirty #34
Call Trace:
[c0000005b76c78d0] [c0000000008b9540] .dump_stack+0x98/0xd4 (unreliable)
[c0000005b76c7950] [c00000000010c898] .lockdep_rcu_suspicious+0x108/0x170
[c0000005b76c79e0] [c00000000029adc0] .kfree+0x390/0x440
[c0000005b76c7a80] [c000000000055f74] .destroy_context+0x44/0x100
[c0000005b76c7b00] [c0000000000934a0] .__mmdrop+0x60/0x150
[c0000005b76c7b90] [c0000000000e3ff0] .idle_task_exit+0x130/0x140
[c0000005b76c7c20] [c000000000075804] .pseries_mach_cpu_die+0x64/0x310
[c0000005b76c7cd0] [c000000000043e7c] .cpu_die+0x3c/0x60
[c0000005b76c7d40] [c0000000000188d8] .arch_cpu_idle_dead+0x28/0x40
[c0000005b76c7db0] [c000000000101e6c] .cpu_startup_entry+0x50c/0x560
[c0000005b76c7ed0] [c000000000043bd8] .start_secondary+0x328/0x360
[c0000005b76c7f90] [c000000000008a6c] start_secondary_prolog+0x10/0x14
This warning is not a false positive either. RCU is not protecting code that
is being executed while the CPU is offline.
Instead of playing "whack-a-mole(TM)" and adding conditional statements to
the tracepoints we find that are used in this instance, simply add a
cpu_online() test to the tracepoint code where the tracepoint will be
ignored if the CPU is offline.
Use of raw_smp_processor_id() is fine, as there should never be a case where
the tracepoint code goes from running on a CPU that is online and suddenly
gets migrated to a CPU that is offline.
Link: http://lkml.kernel.org/r/1455387773-4245-1-git-send-email-kda@linux-powerpc.org
Reported-by: Denis Kirjanov <kda@linux-powerpc.org>
Fixes: 97e1c18e8d17b ("tracing: Kernel Tracepoints")
Cc: stable@vger.kernel.org # v2.6.28+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
According to the VT-d specification we need to clear the PPR bit in
the Page Request Status register when handling page requests, or the
hardware won't generate any more interrupts.
This wasn't actually necessary on SKL/KBL (which may well be the
subject of a hardware erratum, although it's harmless enough). But
other implementations do appear to get it right, and we only ever get
one interrupt unless we clear the PPR bit.
Reported-by: CQ Tang <cq.tang@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@vger.kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are a number of small tty and serial driver fixes for 4.5-rc4
that resolve some reported issues.
One of them got reverted as it wasn't correct based on testing, and
all have been in linux-next for a while"
* tag 'tty-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
Revert "8250: uniphier: allow modular build with 8250 console"
pty: make sure super_block is still valid in final /dev/tty close
pty: fix possible use after free of tty->driver_data
tty: Add support for PCIe WCH382 2S multi-IO card
serial/omap: mark wait_for_xmitr as __maybe_unused
serial: omap: Prevent DoS using unprivileged ioctl(TIOCSRS485)
8250: uniphier: allow modular build with 8250 console
tty: Drop krefs for interrupted tty lock
|
|
Merge fixes from Andrew Morton:
"10 fixes"
The lockdep hlist conversion is in the locking tree too, waiting for the
next merge window. Andrew thought it should go in now. I'll take it,
since it fixes a real problem and looks trivially correct (famous last
words).
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
arch/x86/Kconfig: CONFIG_X86_UV should depend on CONFIG_EFI
mm: fix pfn_t vs highmem
kernel/locking/lockdep.c: convert hash tables to hlists
mm,thp: fix spellos in describing __HAVE_ARCH_FLUSH_PMD_TLB_RANGE
mm,thp: khugepaged: call pte flush at the time of collapse
mm/backing-dev.c: fix error path in wb_init()
mm, dax: check for pmd_none() after split_huge_pmd()
vsprintf: kptr_restrict is okay in IRQ when 2
mm: fix filemap.c kernel doc warning
ubsan: cosmetic fix to Kconfig text
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"A set of seven fixes:
Two regressions in the new hisi_sas arm driver, a blacklist entry for
the marvell console which was causing a reset cascade without it, a
race fix in the WRITE_SAME/DISCARD routines, a retry fix for the rdac
driver, without which, it would prematurely return EIO and a couple of
fixes for the hyper-v storvsc driver"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
block/sd: Return -EREMOTEIO when WRITE SAME and DISCARD are disabled
SCSI: Add Marvell Console to VPD blacklist
scsi_dh_rdac: always retry MODE SELECT on command lock violation
storvsc: Use the specified target ID in device lookup
storvsc: Install the storvsc specific timeout handler for FC devices
hisi_sas: fix v1 hw check for slot error
hisi_sas: add dependency for HAS_IOMEM
|