summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2015-08-17dlm: use sctp 1-to-1 APIMarcelo Ricardo Leitner
DLM is using 1-to-many API but in a 1-to-1 fashion. That is, it's not needed but this causes it to use sctp_do_peeloff() to mimic an kernel_accept() and this causes a symbol dependency on sctp module. By switching it to 1-to-1 API we can avoid this dependency and also reduce quite a lot of SCTP-specific code in lowcomms.c. The caveat is that now DLM won't always use the same src port. It will choose a random one, just like TCP code. This allows the peers to attempt simultaneous connections, which now are handled just like for TCP. Even more sharing between TCP and SCTP code on DLM is possible, but it is intentionally left for a later commit. Note that for using nodes with this commit, you have to have at least the early fixes on this patchset otherwise it will trigger some issues on old nodes. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David Teigland <teigland@redhat.com>
2015-08-17dlm: fix not reconnecting on connecting error handlingMarcelo Ricardo Leitner
If we don't clear that bit, lowcomms_connect_sock() will not schedule another attempt, and no further attempt will be done. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David Teigland <teigland@redhat.com>
2015-08-17dlm: fix race while closing connectionsMarcelo Ricardo Leitner
When a connection have issues DLM may need to close it. Therefore we should also cancel pending workqueues for such connection at that time, and not just when dlm is not willing to use this connection anymore. Also, if we don't clear CF_CONNECT_PENDING flag, the error handling routines won't be able to re-connect as lowcomms_connect_sock() will check for it. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David Teigland <teigland@redhat.com>
2015-08-17dlm: fix connection stealing if using SCTPMarcelo Ricardo Leitner
When using SCTP and accepting a new connection, DLM currently validates if the peer trying to connect to it is one of the cluster nodes, but it doesn't check if it already has a connection to it or not. If it already had a connection, it will be overwritten, and the new one will be used for writes, possibly causing the node to leave the cluster due to communication breakage. Still, one could DoS the node by attempting N connections and keeping them open. As said, but being explicit, both situations are only triggerable from other cluster nodes, but are doable with only user-level perms. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David Teigland <teigland@redhat.com>
2015-08-09Linux 4.2-rc6v4.2-rc6Linus Torvalds
2015-08-09Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input subsystem fixes from Dmitry Torokhov: "Just small ALPS and Elan touchpads, and other driver fixups" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: elantech - add special check for fw_version 0x470f01 touchpad Input: twl4030-vibra - fix ERROR: Bad of_node_put() warning Input: alps - only Dell laptops have separate button bits for v2 dualpoint sticks Input: axp20x-pek - add module alias Input: turbografx - fix potential out of bound access
2015-08-09Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linusLinus Torvalds
Pull MIPS fixes from Ralf Baechle: "Another round of MIPS fixes for 4.2. No area does particularly stand out but we have a two unpleasant ones: - Kernel ptes are marked with a global bit which allows the kernel to share kernel TLB entries between all processes. For this to work both entries of an adjacent even/odd pte pair need to have the global bit set. There has been a subtle race in setting the other entry's global bit since ~ 2000 but it take particularly pathological workloads that essentially do mostly vmalloc/vfree to trigger this. This pull request fixes the 64-bit case but leaves the case of 32 bit CPUs with 64 bit ptes unsolved for now. The unfixed cases affect hardware that is not available in the field yet. - Instruction emulation requires loading instructions from user space but the current fast but simplistic approach will fail on pages that are PROT_EXEC but !PROT_READ. For this reason we temporarily do not permit this permission and will map pages with PROT_EXEC | PROT_READ. The remainder of this pull request is more or less across the field and the short log explains them well" * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: MIPS: Make set_pte() SMP safe. MIPS: Replace add and sub instructions in relocate_kernel.S with addiu MIPS: Flush RPS on kernel entry with EVA Revert "MIPS: BCM63xx: Provide a plat_post_dma_flush hook" MIPS: BMIPS: Delete unused Kconfig symbol MIPS: Export get_c0_perfcount_int() MIPS: show_stack: Fix stack trace with EVA MIPS: do_mcheck: Fix kernel code dump with EVA MIPS: SMP: Don't increment irq_count multiple times for call function IPIs MIPS: Partially disable RIXI support. MIPS: Handle page faults of executable but unreadable pages correctly. MIPS: Malta: Don't reinitialise RTC MIPS: unaligned: Fix build error on big endian R6 kernels MIPS: Fix sched_getaffinity with MT FPAFF enabled MIPS: Fix build with CONFIG_OF=y for non OF-enabled targets CPUFREQ: Loongson2: Fix broken build due to incorrect include.
2015-08-09Merge branch 'for-linus-4.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fix from Chris Mason: "We have a btrfs quota regression fix. I merged this one on Thursday and have run it through tests against current master. Normally I wouldn't have sent this while you were finalizing rc6, but I'm feeding mosquitoes in the adirondacks next week, so I wanted to get this one out before leaving. I'll leave longer tests running and check on things during the week, but I don't expect any problems" * 'for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: btrfs: qgroup: Fix a regression in qgroup reserved space.
2015-08-09Merge branch 'for-rc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux Pull thermal management fixes from Zhang Rui: "Specifics: - fix an error that "weight_attr" sysfs attribute is not removed while unbinding. From: Viresh Kumar. - fix power allocator governor tracing to return the real request. From Javi Merino. - remove redundant owner assignment of hisi platform thermal driver. From Krzysztof Kozlowski. - a couple of small fixes of Exynos thermal driver. From Krzysztof Kozlowski and Chanwoo Choi" * 'for-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: thermal: Drop owner assignment from platform_driver thermal: exynos: Remove unused code related to platform_data on probe() thermal: exynos: Add the dependency of CONFIG_THERMAL_OF instead of CONFIG_OF thermal: exynos: Disable the regulator on probe failure thermal: power_allocator: trace the real requested power thermal: remove dangling 'weight_attr' device file
2015-08-08Merge tag 'arc-v4.2-rc6-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc Pull ARC fixes from Vineet Gupta: "Here's a late pull request for accumulated ARC fixes which came out of extended testing of the new ARCv2 port with LTP etc. llock/scond livelock workaround has been reviewed by PeterZ. The changes look a lot but I've crafted them into finer grained patches for better tracking later. I have some more fixes (ARC Futex backend) ready to go but those will have to wait for tglx to return from vacation. Summary: - Enable a reduced config of HS38 (w/o div-rem, ll64...) - Add software workaround for LLOCK/SCOND livelock - Fallout of a recent pt_regs update" * tag 'arc-v4.2-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: ARCv2: spinlock/rwlock/atomics: reduce 1 instruction in exponential backoff ARC: Make pt_regs regs unsigned ARCv2: spinlock/rwlock: Reset retry delay when starting a new spin-wait cycle ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with exponential backoff ARC: LLOCK/SCOND based rwlock ARC: LLOCK/SCOND based spin_lock ARC: refactor atomic inline asm operands with symbolic names Revert "ARCv2: STAR 9000837815 workaround hardware exclusive transactions livelock" ARCv2: [axs103_smp] Reduce clk for Quad FPGA configs ARCv2: Fix the peripheral address space detection ARCv2: allow selection of page size for MMUv4 ARCv2: lib: memset: Don't assume 64-bit load/stores ARCv2: lib: memcpy: Missing PREFETCHW ARCv2: add knob for DIV_REV in Kconfig ARC/time: Migrate to new 'set-state' interface
2015-08-08Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio fix from Michael Tsirkin: "A last minute fix for the new virtio input driver. It seems pretty obvious, and the problem it's fixing would be quite hard to debug" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio-input: reset device and detach unused during remove
2015-08-08Merge tag 'dm-4.2-fixes-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - stable fix for a dm_merge_bvec() regression on 32 bit Fedora systems. - fix for a 4.2 DM thinp discard regression due to inability to properly delete a range of blocks in a data mapping btree. * tag 'dm-4.2-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm btree remove: fix bug in remove_one() dm: fix dm_merge_bvec regression on 32 bit systems
2015-08-08Merge tag 'sound-4.2-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "The only bulk changes in this request is ABI updates for ASoC topology API. It's a new API that was introduced in 4.2, and we'd like to avoid ABI change after the release, so it's taken now. As there is no real in-tree user for this API, it should be fairly safe. Other than that, the usual small fixes are found in various drivers: ASoC cs4265, rt5645, intel-sst, firewire, oxygen and HD-audio" * tag 'sound-4.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ASoC: topology: Add private data type and bump ABI version to 3 ASoC: topology: Add ops support to byte controls UAPI ASoC: topology: Update TLV support so we can support more TLV types ASoC: topology: add private data to manifest ASoC: topology: Add subsequence in topology ALSA: hda - one Dell machine needs the headphone white noise fixup ALSA: fireworks/firewire-lib: add support for recent firmware quirk Revert "ALSA: fireworks: add support for AudioFire2 quirk" ASoC: topology: fix typo in soc_tplg_kcontrol_bind_io() ALSA: HDA: Dont check return for snd_hdac_chip_readl ALSA: HDA: Fix stream assignment for host in decoupled mode ASoC: rt5645: Fix lost pin setting for DMIC1 ALSA: oxygen: Fix logical-not-parentheses warning ASoC: Intel: sst_byt: fix initialize 'NULL device *' issue ASoC: Intel: haswell: fix initialize 'NULL device *' issue ASoC: cs4265: Fix setting dai format for Left/Right Justified
2015-08-08Merge tag 'hwmon-for-linus-v4.2-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fixes from Guenter Roeck: - Export module alias information in g762 and nct7904 to support auto-loading. - Blacklist Dell Studio XPS 8100 in dell-smm to fix fan control problems. * tag 'hwmon-for-linus-v4.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (g762) Export OF module alias information hwmon: (nct7904) Export I2C module alias information hwmon: (dell-smm) Blacklist Dell Studio XPS 8100
2015-08-08Merge tag 'usb-4.2-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are some USB and PHY fixes for 4.2-rc6 that resolve some reported issues. All of these have been in the linux-next tree for a while, full details on the patches are in the shortlog below" * tag 'usb-4.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: ARM: dts: dra7: Add syscon-pllreset syscon to SATA PHY drivers/usb: Delete XHCI command timer if necessary xhci: fix off by one error in TRB DMA address boundary check usb: udc: core: add device_del() call to error pathway phy: ti-pipe3: i783 workaround for SATA lockup after dpll unlock/relock phy-sun4i-usb: Add missing EXPORT_SYMBOL_GPL for sun4i_usb_phy_set_squelch_detect USB: sierra: add 1199:68AB device ID usb: gadget: f_printer: actually limit the number of instances usb: gadget: f_hid: actually limit the number of instances usb: gadget: f_uac2: fix calculation of uac2->p_interval usb: gadget: bdc: fix a driver crash on disconnect usb: chipidea: ehci_init_driver is intended to call one time USB: qcserial: Add support for Dell Wireless 5809e 4G Modem USB: qcserial/option: make AT URCs work for Sierra Wireless MC7305/MC7355
2015-08-08Merge tag 'staging-4.2-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging driver fixes from Greg KH: "Here are three bugfixes for some staging driver issues that have been reported. All have been in the linux-next tree for a while" * tag 'staging-4.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: staging: lustre: Include unaligned.h instead of access_ok.h staging: vt6655: vnt_bss_info_changed check conf->beacon_rate is not NULL staging: comedi: das1800: add missing break in switch
2015-08-08Merge tag 'char-misc-4.2-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc fixes from Greg KH: "Here are some extcon fixes for 4.2-rc6 that resolve some reported problems. All have been in linux-next for a while" * tag 'char-misc-4.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: extcon: Fix extcon_cable_get_state() from getting old state after notification extcon: Fix hang and extcon_get/set_cable_state(). extcon: palmas: Fix NULL pointer error
2015-08-08Merge tag 'drm-intel-fixes-2015-08-07' of ↵Linus Torvalds
git://anongit.freedesktop.org/drm-intel Pull drm fixes from Daniel Vetter: "One i915 regression fix and a drm core one since Dave's not around, both introduced in 4.2 so not cc: stable. The fix for the warning Ted reported isn't in here yet since he didn't yet supply a tested-by and I can't repro this one myself (it's in fixup code that needs firmware doing something i915 wouldn't do)" * tag 'drm-intel-fixes-2015-08-07' of git://anongit.freedesktop.org/drm-intel: drm/vblank: Use u32 consistently for vblank counters drm/i915: Allow parsing of variable size child device entries from VBT
2015-08-07Input: elantech - add special check for fw_version 0x470f01 touchpadDuson Lin
It is no need to check the packet[0] for sanity check when doing elantech_packet_check_v4() function for fw_version = 0x470f01 touchpad. Signed-off by: Duson Lin <dusonlin@emc.com.tw> Reviewed-by: Ulrik De Bie <ulrik.debie-os@e2big.org> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2015-08-07dm btree remove: fix bug in remove_one()Joe Thornber
remove_one() was not incrementing the key for the beginning of the range, so not all entries were being removed. This resulted in discards that were not unmapping all blocks. Fixes: 4ec331c3ea ("dm btree: add dm_btree_remove_leaves()") Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-08-07drm/vblank: Use u32 consistently for vblank countersDaniel Vetter
In commit 99264a61dfcda41d86d0960cf2d4c0fc2758a773 Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Wed Apr 15 19:34:43 2015 +0200 drm/vblank: Fixup and document timestamp update/read barriers I've switched vblank->count from atomic_t to unsigned long and accidentally created an integer comparison bug in drm_vblank_count_and_time since vblanke->count might overflow the u32 local copy and hence the retry loop never succeed. Fix this by consistently using u32. Cc: Michel Dänzer <michel@daenzer.net> Reported-by: Michel Dänzer <michel@daenzer.net> Reviewed-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2015-08-07Merge tag 'asoc-fix-v4.2-rc5' of ↵Takashi Iwai
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v4.2 There are a couple of small driver specific fixes here but the overwhelming bulk of these changes are fixes to the topology ABI that has been newly introduced in v4.2. Once this makes it into a release we will have to firm this up but for now getting enhancements in before they've made it into a release is the most expedient thing.
2015-08-07ARCv2: spinlock/rwlock/atomics: reduce 1 instruction in exponential backoffVineet Gupta
The increment of delay counter was 2 instructions: Arithmatic Shfit Left (ASL) + set to 1 on overflow This can be done in 1 using ROtate Left (ROL) Suggested-by: Nigel Topham <ntopham@synopsys.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds
Pull sparc fix from David Miller: "FPU register corruption bug fix" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc64: Fix userspace FPU register corruptions.
2015-08-07Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge fixes from Andrew Morton: "21 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (21 commits) writeback: fix initial dirty limit mm/memory-failure: set PageHWPoison before migrate_pages() mm: check __PG_HWPOISON separately from PAGE_FLAGS_CHECK_AT_* mm/memory-failure: give up error handling for non-tail-refcounted thp mm/memory-failure: fix race in counting num_poisoned_pages mm/memory-failure: unlock_page before put_page ipc: use private shmem or hugetlbfs inodes for shm segments. mm: initialize hotplugged pages as reserved ocfs2: fix shift left overflow kthread: export kthread functions fsnotify: fix oops in fsnotify_clear_marks_by_group_flags() lib/iommu-common.c: do not use 0xffffffffffffffffl for computing align_mask mm/slub: allow merging when SLAB_DEBUG_FREE is set signalfd: fix information leak in signalfd_copyinfo signal: fix information leak in copy_siginfo_to_user signal: fix information leak in copy_siginfo_from_user32 ocfs2: fix BUG in ocfs2_downconvert_thread_do_work() fs, file table: reinit files_stat.max_files after deferred memory initialisation mm, meminit: replace rwsem with completion mm, meminit: allow early_pfn_to_nid to be used during runtime ...
2015-08-06sparc64: Fix userspace FPU register corruptions.David S. Miller
If we have a series of events from userpsace, with %fprs=FPRS_FEF, like follows: ETRAP ETRAP VIS_ENTRY(fprs=0x4) VIS_EXIT RTRAP (kernel FPU restore with fpu_saved=0x4) RTRAP We will not restore the user registers that were clobbered by the FPU using kernel code in the inner-most trap. Traps allocate FPU save slots in the thread struct, and FPU using sequences save the "dirty" FPU registers only. This works at the initial trap level because all of the registers get recorded into the top-level FPU save area, and we'll return to userspace with the FPU disabled so that any FPU use by the user will take an FPU disabled trap wherein we'll load the registers back up properly. But this is not how trap returns from kernel to kernel operate. The simplest fix for this bug is to always save all FPU register state for anything other than the top-most FPU save area. Getting rid of the optimized inner-slot FPU saving code ends up making VISEntryHalf degenerate into plain VISEntry. Longer term we need to do something smarter to reinstate the partial save optimizations. Perhaps the fundament error is having trap entry and exit allocate FPU save slots and restore register state. Instead, the VISEntry et al. calls should be doing that work. This bug is about two decades old. Reported-by: James Y Knight <jyknight@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-07Merge branch 'drm-fixes-4.2' of git://people.freedesktop.org/~agd5f/linuxLinus Torvalds
Pull amdgpu fixes from Alex Deucher: "Just a few amdgpu fixes to make sure we report the proper firmware information and number of render buffers to userspace and a typo in a debugging function" [ Pulling directly from Alex since Dave Airlie is on vacation - Linus ] * 'drm-fixes-4.2' of git://people.freedesktop.org/~agd5f/linux: drm/amdgpu: set fw_version and feature_version for smu fw loading drm/amdgpu: add feature version for SDMA ucode drm/amdgpu: add feature version for RLC and MEC v2 drm/amdgpu: increment queue when iterating on this variable. drm/amdgpu: fix rb setting for CZ
2015-08-07Merge branch 'drm-tda998x-fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-armLinus Torvalds
Pull TDA998x i2c driver fixes from Russell King: "This fixes the double-checksumming of the AVI infoframe which was resulting in the checksum always being zero. It went unnoticed as none of my HDMI devices had a problem with this" [ Pulling directly from rmk since Dave Airlie is on vacation - Linus ] * 'drm-tda998x-fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: drm/i2c: tda998x: fix bad checksum of the HDMI AVI infoframe
2015-08-07writeback: fix initial dirty limitRabin Vincent
The initial value of global_wb_domain.dirty_limit set by writeback_set_ratelimit() is zeroed out by the memset in wb_domain_init(). Signed-off-by: Rabin Vincent <rabin.vincent@axis.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07mm/memory-failure: set PageHWPoison before migrate_pages()Naoya Horiguchi
Now page freeing code doesn't consider PageHWPoison as a bad page, so by setting it before completing the page containment, we can prevent the error page from being reused just after successful page migration. I added TTU_IGNORE_HWPOISON for try_to_unmap() to make sure that the page table entry is transformed into migration entry, not to hwpoison entry. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Dean Nelson <dnelson@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Hugh Dickins <hughd@google.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07mm: check __PG_HWPOISON separately from PAGE_FLAGS_CHECK_AT_*Naoya Horiguchi
The race condition addressed in commit add05cecef80 ("mm: soft-offline: don't free target page in successful page migration") was not closed completely, because that can happen not only for soft-offline, but also for hard-offline. Consider that a slab page is about to be freed into buddy pool, and then an uncorrected memory error hits the page just after entering __free_one_page(), then VM_BUG_ON_PAGE(page->flags & PAGE_FLAGS_CHECK_AT_PREP) is triggered, despite the fact that it's not necessary because the data on the affected page is not consumed. To solve it, this patch drops __PG_HWPOISON from page flag checks at allocation/free time. I think it's justified because __PG_HWPOISON flags is defined to prevent the page from being reused, and setting it outside the page's alloc-free cycle is a designed behavior (not a bug.) For recent months, I was annoyed about BUG_ON when soft-offlined page remains on lru cache list for a while, which is avoided by calling put_page() instead of putback_lru_page() in page migration's success path. This means that this patch reverts a major change from commit add05cecef80 about the new refcounting rule of soft-offlined pages, so "reuse window" revives. This will be closed by a subsequent patch. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Dean Nelson <dnelson@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Hugh Dickins <hughd@google.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07mm/memory-failure: give up error handling for non-tail-refcounted thpNaoya Horiguchi
"non anonymous thp" case is still racy with freeing thp, which causes panic due to put_page() for refcount-0 page. It seems that closing up this race might be hard (and/or not worth doing,) so let's give up the error handling for this case. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Dean Nelson <dnelson@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Hugh Dickins <hughd@google.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07mm/memory-failure: fix race in counting num_poisoned_pagesNaoya Horiguchi
When memory_failure() is called on a page which are just freed after page migration from soft offlining, the counter num_poisoned_pages is raised twi= ce. So let's fix it with using TestSetPageHWPoison. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Dean Nelson <dnelson@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Hugh Dickins <hughd@google.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07mm/memory-failure: unlock_page before put_pageNaoya Horiguchi
Recently I addressed a few of hwpoison race problems and the patches are merged on v4.2-rc1. It made progress, but unfortunately some problems still remain due to less coverage of my testing. So I'm trying to fix or avoid them in this series. One point I'm expecting to discuss is that patch 4/5 changes the page flag set to be checked on free time. In current behavior, __PG_HWPOISON is not supposed to be set when the page is freed. I think that there is no strong reason for this behavior, and it causes a problem hard to fix only in error handler side (because __PG_HWPOISON could be set at arbitrary timing.) So I suggest to change it. With this patchset, hwpoison stress testing in official mce-test testsuite (which previously failed) passes. This patch (of 5): In "just unpoisoned" path, we do put_page and then unlock_page, which is a wrong order and causes "freeing locked page" bug. So let's fix it. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Dean Nelson <dnelson@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Hugh Dickins <hughd@google.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07ipc: use private shmem or hugetlbfs inodes for shm segments.Stephen Smalley
The shm implementation internally uses shmem or hugetlbfs inodes for shm segments. As these inodes are never directly exposed to userspace and only accessed through the shm operations which are already hooked by security modules, mark the inodes with the S_PRIVATE flag so that inode security initialization and permission checking is skipped. This was motivated by the following lockdep warning: ====================================================== [ INFO: possible circular locking dependency detected ] 4.2.0-0.rc3.git0.1.fc24.x86_64+debug #1 Tainted: G W ------------------------------------------------------- httpd/1597 is trying to acquire lock: (&ids->rwsem){+++++.}, at: shm_close+0x34/0x130 but task is already holding lock: (&mm->mmap_sem){++++++}, at: SyS_shmdt+0x4b/0x180 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (&mm->mmap_sem){++++++}: lock_acquire+0xc7/0x270 __might_fault+0x7a/0xa0 filldir+0x9e/0x130 xfs_dir2_block_getdents.isra.12+0x198/0x1c0 [xfs] xfs_readdir+0x1b4/0x330 [xfs] xfs_file_readdir+0x2b/0x30 [xfs] iterate_dir+0x97/0x130 SyS_getdents+0x91/0x120 entry_SYSCALL_64_fastpath+0x12/0x76 -> #2 (&xfs_dir_ilock_class){++++.+}: lock_acquire+0xc7/0x270 down_read_nested+0x57/0xa0 xfs_ilock+0x167/0x350 [xfs] xfs_ilock_attr_map_shared+0x38/0x50 [xfs] xfs_attr_get+0xbd/0x190 [xfs] xfs_xattr_get+0x3d/0x70 [xfs] generic_getxattr+0x4f/0x70 inode_doinit_with_dentry+0x162/0x670 sb_finish_set_opts+0xd9/0x230 selinux_set_mnt_opts+0x35c/0x660 superblock_doinit+0x77/0xf0 delayed_superblock_init+0x10/0x20 iterate_supers+0xb3/0x110 selinux_complete_init+0x2f/0x40 security_load_policy+0x103/0x600 sel_write_load+0xc1/0x750 __vfs_write+0x37/0x100 vfs_write+0xa9/0x1a0 SyS_write+0x58/0xd0 entry_SYSCALL_64_fastpath+0x12/0x76 ... Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Reported-by: Morten Stevens <mstevens@fedoraproject.org> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Paul Moore <paul@paul-moore.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07mm: initialize hotplugged pages as reservedMel Gorman
Commit 92923ca3aace ("mm: meminit: only set page reserved in the memblock region") broke memory hotplug which expects the memmap for newly added sections to be reserved until onlined by online_pages_range(). This patch marks hotplugged pages as reserved when adding new zones. Signed-off-by: Mel Gorman <mgorman@suse.de> Reported-by: David Vrabel <david.vrabel@citrix.com> Tested-by: David Vrabel <david.vrabel@citrix.com> Cc: Nathan Zimmer <nzimmer@sgi.com> Cc: Robin Holt <holt@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07ocfs2: fix shift left overflowJoseph Qi
When using a large volume, for example 9T volume with 2T already used, frequent creation of small files with O_DIRECT when the IO is not cluster aligned may clear sectors in the wrong place. This will cause filesystem corruption. This is because p_cpos is a u32. When calculating the corresponding sector it should be converted to u64 first, otherwise it may overflow. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: <stable@vger.kernel.org> [4.0+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07kthread: export kthread functionsDavid Kershner
The s-Par visornic driver, currently in staging, processes a queue being serviced by the an s-Par service partition. We can get a message that something has happened with the Service Partition, when that happens, we must not access the channel until we get a message that the service partition is back again. The visornic driver has a thread for processing the channel, when we get the message, we need to be able to park the thread and then resume it when the problem clears. We can do this with kthread_park and unpark but they are not exported from the kernel, this patch exports the needed functions. Signed-off-by: David Kershner <david.kershner@unisys.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Richard Weinberger <richard.weinberger@gmail.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07fsnotify: fix oops in fsnotify_clear_marks_by_group_flags()Jan Kara
fsnotify_clear_marks_by_group_flags() can race with fsnotify_destroy_marks() so that when fsnotify_destroy_mark_locked() drops mark_mutex, a mark from the list iterated by fsnotify_clear_marks_by_group_flags() can be freed and thus the next entry pointer we have cached may become stale and we dereference free memory. Fix the problem by first moving marks to free to a special private list and then always free the first entry in the special list. This method is safe even when entries from the list can disappear once we drop the lock. Signed-off-by: Jan Kara <jack@suse.com> Reported-by: Ashish Sangwan <a.sangwan@samsung.com> Reviewed-by: Ashish Sangwan <a.sangwan@samsung.com> Cc: Lino Sanfilippo <LinoSanfilippo@gmx.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07lib/iommu-common.c: do not use 0xffffffffffffffffl for computing align_maskSowmini Varadhan
Using a 64 bit constant generates "warning: integer constant is too large for 'long' type" on 32 bit platforms. Instead use ~0ul and BITS_PER_LONG. Detected by Andrew Morton on ARMD. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David S. Miller <davem@davemloft.net> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07mm/slub: allow merging when SLAB_DEBUG_FREE is setKonstantin Khlebnikov
This patch fixes creation of new kmem-caches after enabling sanity_checks for existing mergeable kmem-caches in runtime: before that patch creation fails because unique name in sysfs already taken by existing kmem-cache. Unlike other debug options this doesn't change object layout and could be enabled and disabled at any time. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07signalfd: fix information leak in signalfd_copyinfoAmanieu d'Antras
This function may copy the si_addr_lsb field to user mode when it hasn't been initialized, which can leak kernel stack data to user mode. Just checking the value of si_code is insufficient because the same si_code value is shared between multiple signals. This is solved by checking the value of si_signo in addition to si_code. Signed-off-by: Amanieu d'Antras <amanieu@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07signal: fix information leak in copy_siginfo_to_userAmanieu d'Antras
This function may copy the si_addr_lsb, si_lower and si_upper fields to user mode when they haven't been initialized, which can leak kernel stack data to user mode. Just checking the value of si_code is insufficient because the same si_code value is shared between multiple signals. This is solved by checking the value of si_signo in addition to si_code. Signed-off-by: Amanieu d'Antras <amanieu@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07signal: fix information leak in copy_siginfo_from_user32Amanieu d'Antras
This function can leak kernel stack data when the user siginfo_t has a positive si_code value. The top 16 bits of si_code descibe which fields in the siginfo_t union are active, but they are treated inconsistently between copy_siginfo_from_user32, copy_siginfo_to_user32 and copy_siginfo_to_user. copy_siginfo_from_user32 is called from rt_sigqueueinfo and rt_tgsigqueueinfo in which the user has full control overthe top 16 bits of si_code. This fixes the following information leaks: x86: 8 bytes leaked when sending a signal from a 32-bit process to itself. This leak grows to 16 bytes if the process uses x32. (si_code = __SI_CHLD) x86: 100 bytes leaked when sending a signal from a 32-bit process to a 64-bit process. (si_code = -1) sparc: 4 bytes leaked when sending a signal from a 32-bit process to a 64-bit process. (si_code = any) parsic and s390 have similar bugs, but they are not vulnerable because rt_[tg]sigqueueinfo have checks that prevent sending a positive si_code to a different process. These bugs are also fixed for consistency. Signed-off-by: Amanieu d'Antras <amanieu@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07ocfs2: fix BUG in ocfs2_downconvert_thread_do_work()Joseph Qi
The "BUG_ON(list_empty(&osb->blocked_lock_list))" in ocfs2_downconvert_thread_do_work can be triggered in the following case: ocfs2dc has firstly saved osb->blocked_lock_count to local varibale processed, and then processes the dentry lockres. During the dentry put, it calls iput and then deletes rw, inode and open lockres from blocked list in ocfs2_mark_lockres_freeing. And this causes the variable `processed' to not reflect the number of blocked lockres to be processed, which triggers the BUG. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07fs, file table: reinit files_stat.max_files after deferred memory initialisationMel Gorman
Dave Hansen reported the following; My laptop has been behaving strangely with 4.2-rc2. Once I log in to my X session, I start getting all kinds of strange errors from applications and see this in my dmesg: VFS: file-max limit 8192 reached The problem is that the file-max is calculated before memory is fully initialised and miscalculates how much memory the kernel is using. This patch recalculates file-max after deferred memory initialisation. Note that using memory hotplug infrastructure would not have avoided this problem as the value is not recalculated after memory hot-add. 4.1: files_stat.max_files = 6582781 4.2-rc2: files_stat.max_files = 8192 4.2-rc2 patched: files_stat.max_files = 6562467 Small differences with the patch applied and 4.1 but not enough to matter. Signed-off-by: Mel Gorman <mgorman@suse.de> Reported-by: Dave Hansen <dave.hansen@intel.com> Cc: Nicolai Stange <nicstange@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Alex Ng <alexng@microsoft.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07mm, meminit: replace rwsem with completionNicolai Stange
Commit 0e1cc95b4cc7 ("mm: meminit: finish initialisation of struct pages before basic setup") introduced a rwsem to signal completion of the initialization workers. Lockdep complains about possible recursive locking: ============================================= [ INFO: possible recursive locking detected ] 4.1.0-12802-g1dc51b8 #3 Not tainted --------------------------------------------- swapper/0/1 is trying to acquire lock: (pgdat_init_rwsem){++++.+}, at: [<ffffffff8424c7fb>] page_alloc_init_late+0xc7/0xe6 but task is already holding lock: (pgdat_init_rwsem){++++.+}, at: [<ffffffff8424c772>] page_alloc_init_late+0x3e/0xe6 Replace the rwsem by a completion together with an atomic "outstanding work counter". [peterz@infradead.org: Barrier removal on the grounds of being pointless] [mgorman@suse.de: Applied review feedback] Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Alex Ng <alexng@microsoft.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07mm, meminit: allow early_pfn_to_nid to be used during runtimeMel Gorman
early_pfn_to_nid() historically was inherently not SMP safe but only used during boot which is inherently single threaded or during hotplug which is protected by a giant mutex. With deferred memory initialisation there was a thread-safe version introduced and the early_pfn_to_nid would trigger a BUG_ON if used unsafely. Memory hotplug hit that check. This patch makes early_pfn_to_nid introduces a lock to make it safe to use during hotplug. Signed-off-by: Mel Gorman <mgorman@suse.de> Reported-by: Alex Ng <alexng@microsoft.com> Tested-by: Alex Ng <alexng@microsoft.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Nicolai Stange <nicstange@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07ipc: modify message queue accounting to not take kernel data structures into ↵Marcus Gelderie
account A while back, the message queue implementation in the kernel was improved to use btrees to speed up retrieval of messages, in commit d6629859b36d ("ipc/mqueue: improve performance of send/recv"). That patch introducing the improved kernel handling of message queues (using btrees) has, as a by-product, changed the meaning of the QSIZE field in the pseudo-file created for the queue. Before, this field reflected the size of the user-data in the queue. Since, it also takes kernel data structures into account. For example, if 13 bytes of user data are in the queue, on my machine the file reports a size of 61 bytes. There was some discussion on this topic before (for example https://lkml.org/lkml/2014/10/1/115). Commenting on a th lkml, Michael Kerrisk gave the following background (https://lkml.org/lkml/2015/6/16/74): The pseudofiles in the mqueue filesystem (usually mounted at /dev/mqueue) expose fields with metadata describing a message queue. One of these fields, QSIZE, as originally implemented, showed the total number of bytes of user data in all messages in the message queue, and this feature was documented from the beginning in the mq_overview(7) page. In 3.5, some other (useful) work happened to break the user-space API in a couple of places, including the value exposed via QSIZE, which now includes a measure of kernel overhead bytes for the queue, a figure that renders QSIZE useless for its original purpose, since there's no way to deduce the number of overhead bytes consumed by the implementation. (The other user-space breakage was subsequently fixed.) This patch removes the accounting of kernel data structures in the queue. Reporting the size of these data-structures in the QSIZE field was a breaking change (see Michael's comment above). Without the QSIZE field reporting the total size of user-data in the queue, there is no way to deduce this number. It should be noted that the resource limit RLIMIT_MSGQUEUE is counted against the worst-case size of the queue (in both the old and the new implementation). Therefore, the kernel overhead accounting in QSIZE is not necessary to help the user understand the limitations RLIMIT imposes on the processes. Signed-off-by: Marcus Gelderie <redmnic@gmail.com> Acked-by: Doug Ledford <dledford@redhat.com> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Davidlohr Bueso <dbueso@suse.de> Cc: David Howells <dhowells@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: John Duffy <jb_duffy@btinternet.com> Cc: Arto Bendiken <arto@bendiken.net> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-06btrfs: qgroup: Fix a regression in qgroup reserved space.Qu Wenruo
During the change to new btrfs extent-oriented qgroup implement, due to it doesn't use the old __qgroup_excl_accounting() for exclusive extent, it didn't free the reserved bytes. The bug will cause limit function go crazy as the reserved space is never freed, increasing limit will have no effect and still cause EQOUT. The fix is easy, just free reserved bytes for newly created exclusive extent as what it does before. Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Yang Dongsheng <yangds.fnst@cn.fujitsu.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>