summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2016-05-20userfaultfd: don't pin the user memory in userfaultfd_file_create()Oleg Nesterov
userfaultfd_file_create() increments mm->mm_users; this means that the memory won't be unmapped/freed if mm owner exits/execs, and UFFDIO_COPY after that can populate the orphaned mm more. Change userfaultfd_file_create() and userfaultfd_ctx_put() to use mm->mm_count to pin mm_struct. This means that atomic_inc_not_zero(mm->mm_users) is needed when we are going to actually play with this memory. Except handle_userfault() path doesn't need this, the caller must already have a reference. The patch adds the new trivial helper, mmget_not_zero(), it can have more users. Link: http://lkml.kernel.org/r/20160516172254.GA8595@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20mm,writeback: don't use memory reserves for wb_start_writebackTetsuo Handa
When writeback operation cannot make forward progress because memory allocation requests needed for doing I/O cannot be satisfied (e.g. under OOM-livelock situation), we can observe flood of order-0 page allocation failure messages caused by complete depletion of memory reserves. This is caused by unconditionally allocating "struct wb_writeback_work" objects using GFP_ATOMIC from PF_MEMALLOC context. __alloc_pages_nodemask() { __alloc_pages_slowpath() { __alloc_pages_direct_reclaim() { __perform_reclaim() { current->flags |= PF_MEMALLOC; try_to_free_pages() { do_try_to_free_pages() { wakeup_flusher_threads() { wb_start_writeback() { kzalloc(sizeof(*work), GFP_ATOMIC) { /* ALLOC_NO_WATERMARKS via PF_MEMALLOC */ } } } } } current->flags &= ~PF_MEMALLOC; } } } } Since I/O is stalling, allocating writeback requests forever shall deplete memory reserves. Fortunately, since wb_start_writeback() can fall back to wb_wakeup() when allocating "struct wb_writeback_work" failed, we don't need to allow wb_start_writeback() to use memory reserves. Mem-Info: active_anon:289393 inactive_anon:2093 isolated_anon:29 active_file:10838 inactive_file:113013 isolated_file:859 unevictable:0 dirty:108531 writeback:5308 unstable:0 slab_reclaimable:5526 slab_unreclaimable:7077 mapped:9970 shmem:2159 pagetables:2387 bounce:0 free:3042 free_pcp:0 free_cma:0 Node 0 DMA free:6968kB min:44kB low:52kB high:64kB active_anon:6056kB inactive_anon:176kB active_file:712kB inactive_file:744kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:756kB writeback:0kB mapped:736kB shmem:184kB slab_reclaimable:48kB slab_unreclaimable:208kB kernel_stack:160kB pagetables:144kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:9708 all_unreclaimable? yes lowmem_reserve[]: 0 1732 1732 1732 Node 0 DMA32 free:5200kB min:5200kB low:6500kB high:7800kB active_anon:1151516kB inactive_anon:8196kB active_file:42640kB inactive_file:451076kB unevictable:0kB isolated(anon):116kB isolated(file):3564kB present:2080640kB managed:1775332kB mlocked:0kB dirty:433368kB writeback:21232kB mapped:39144kB shmem:8452kB slab_reclaimable:22056kB slab_unreclaimable:28100kB kernel_stack:20976kB pagetables:9404kB unstable:0kB bounce:0kB free_pcp:120kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:2701604 all_unreclaimable? no lowmem_reserve[]: 0 0 0 0 Node 0 DMA: 25*4kB (UME) 16*8kB (UME) 3*16kB (UE) 5*32kB (UME) 2*64kB (UM) 2*128kB (ME) 2*256kB (ME) 1*512kB (E) 1*1024kB (E) 2*2048kB (ME) 0*4096kB = 6964kB Node 0 DMA32: 925*4kB (UME) 140*8kB (UME) 5*16kB (ME) 5*32kB (M) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 5060kB Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB 126847 total pagecache pages 0 pages in swap cache Swap cache stats: add 0, delete 0, find 0/0 Free swap = 0kB Total swap = 0kB 524157 pages RAM 0 pages HighMem/MovableOnly 76348 pages reserved 0 pages hwpoisoned Out of memory: Kill process 4450 (file_io.00) score 998 or sacrifice child Killed process 4450 (file_io.00) total-vm:4308kB, anon-rss:100kB, file-rss:1184kB, shmem-rss:0kB kthreadd: page allocation failure: order:0, mode:0x2200020 file_io.00: page allocation failure: order:0, mode:0x2200020 CPU: 0 PID: 4457 Comm: file_io.00 Not tainted 4.5.0-rc7+ #45 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 Call Trace: warn_alloc_failed+0xf7/0x150 __alloc_pages_nodemask+0x23f/0xa60 alloc_pages_current+0x87/0x110 new_slab+0x3a1/0x440 ___slab_alloc+0x3cf/0x590 __slab_alloc.isra.64+0x18/0x1d kmem_cache_alloc+0x11c/0x150 wb_start_writeback+0x39/0x90 wakeup_flusher_threads+0x7f/0xf0 do_try_to_free_pages+0x1f9/0x410 try_to_free_pages+0x94/0xc0 __alloc_pages_nodemask+0x566/0xa60 alloc_pages_current+0x87/0x110 __page_cache_alloc+0xaf/0xc0 pagecache_get_page+0x88/0x260 grab_cache_page_write_begin+0x21/0x40 xfs_vm_write_begin+0x2f/0xf0 generic_perform_write+0xca/0x1c0 xfs_file_buffered_aio_write+0xcc/0x1f0 xfs_file_write_iter+0x84/0x140 __vfs_write+0xc7/0x100 vfs_write+0x9d/0x190 SyS_write+0x50/0xc0 entry_SYSCALL_64_fastpath+0x12/0x6a Mem-Info: active_anon:293335 inactive_anon:2093 isolated_anon:0 active_file:10829 inactive_file:110045 isolated_file:32 unevictable:0 dirty:109275 writeback:822 unstable:0 slab_reclaimable:5489 slab_unreclaimable:10070 mapped:9999 shmem:2159 pagetables:2420 bounce:0 free:3 free_pcp:0 free_cma:0 Node 0 DMA free:12kB min:44kB low:52kB high:64kB active_anon:6060kB inactive_anon:176kB active_file:708kB inactive_file:756kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:756kB writeback:0kB mapped:736kB shmem:184kB slab_reclaimable:48kB slab_unreclaimable:7160kB kernel_stack:160kB pagetables:144kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:9844 all_unreclaimable? yes lowmem_reserve[]: 0 1732 1732 1732 Node 0 DMA32 free:0kB min:5200kB low:6500kB high:7800kB active_anon:1167280kB inactive_anon:8196kB active_file:42608kB inactive_file:439424kB unevictable:0kB isolated(anon):0kB isolated(file):128kB present:2080640kB managed:1775332kB mlocked:0kB dirty:436344kB writeback:3288kB mapped:39260kB shmem:8452kB slab_reclaimable:21908kB slab_unreclaimable:33120kB kernel_stack:20976kB pagetables:9536kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:11073180 all_unreclaimable? yes lowmem_reserve[]: 0 0 0 0 Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB 123086 total pagecache pages 0 pages in swap cache Swap cache stats: add 0, delete 0, find 0/0 Free swap = 0kB Total swap = 0kB 524157 pages RAM 0 pages HighMem/MovableOnly 76348 pages reserved 0 pages hwpoisoned SLUB: Unable to allocate memory on node -1 (gfp=0x2088020) cache: kmalloc-64, object size: 64, buffer size: 64, default order: 0, min order: 0 node 0: slabs: 3218, objs: 205952, free: 0 file_io.00: page allocation failure: order:0, mode:0x2200020 CPU: 0 PID: 4457 Comm: file_io.00 Not tainted 4.5.0-rc7+ #45 Assuming that somebody will find a better solution, let's apply this patch for now to stop bleeding, for this problem frequently prevents me from testing OOM livelock condition. Link: http://lkml.kernel.org/r/20160318131136.GE7152@quack.suse.cz Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Jan Kara <jack@suse.cz> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20tmpfs/ramfs: fix VM_MAYSHARE mappings for NOMMURich Felker
The nommu do_mmap expects f_op->get_unmapped_area to either succeed or return -ENOSYS for VM_MAYSHARE (e.g. private read-only) mappings. Returning addr in the non-MAP_SHARED case was completely wrong, and only happened to work because addr was 0. However, it prevented VM_MAYSHARE mappings from sharing backing with the fs cache, and forced such mappings (including shareable program text) to be copied whenever the number of mappings transitioned from 0 to 1, impacting performance and memory usage. Subsequent mappings beyond the first still correctly shared memory with the first. Instead, treat VM_MAYSHARE identically to VM_SHARED at the file ops level; do_mmap already handles the semantic differences between them. Signed-off-by: Rich Felker <dalias@libc.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Greg Ungerer <gerg@uclinux.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge updates from Andrew Morton: - fsnotify fix - poll() timeout fix - a few scripts/ tweaks - debugobjects updates - the (small) ocfs2 queue - Minor fixes to kernel/padata.c - Maybe half of the MM queue * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (117 commits) mm, page_alloc: restore the original nodemask if the fast path allocation failed mm, page_alloc: uninline the bad page part of check_new_page() mm, page_alloc: don't duplicate code in free_pcp_prepare mm, page_alloc: defer debugging checks of pages allocated from the PCP mm, page_alloc: defer debugging checks of freed pages until a PCP drain cpuset: use static key better and convert to new API mm, page_alloc: inline pageblock lookup in page free fast paths mm, page_alloc: remove unnecessary variable from free_pcppages_bulk mm, page_alloc: pull out side effects from free_pages_check mm, page_alloc: un-inline the bad part of free_pages_check mm, page_alloc: check multiple page fields with a single branch mm, page_alloc: remove field from alloc_context mm, page_alloc: avoid looking up the first zone in a zonelist twice mm, page_alloc: shortcut watermark checks for order-0 pages mm, page_alloc: reduce cost of fair zone allocation policy retry mm, page_alloc: shorten the page allocator fast path mm, page_alloc: check once if a zone has isolated pageblocks mm, page_alloc: move __GFP_HARDWALL modifications out of the fastpath mm, page_alloc: simplify last cpupid reset mm, page_alloc: remove unnecessary initialisation from __alloc_pages_nodemask() ...
2016-05-19mm, page_alloc: avoid looking up the first zone in a zonelist twiceMel Gorman
The allocator fast path looks up the first usable zone in a zonelist and then get_page_from_freelist does the same job in the zonelist iterator. This patch preserves the necessary information. 4.6.0-rc2 4.6.0-rc2 fastmark-v1r20 initonce-v1r20 Min alloc-odr0-1 364.00 ( 0.00%) 359.00 ( 1.37%) Min alloc-odr0-2 262.00 ( 0.00%) 260.00 ( 0.76%) Min alloc-odr0-4 214.00 ( 0.00%) 214.00 ( 0.00%) Min alloc-odr0-8 186.00 ( 0.00%) 186.00 ( 0.00%) Min alloc-odr0-16 173.00 ( 0.00%) 173.00 ( 0.00%) Min alloc-odr0-32 165.00 ( 0.00%) 165.00 ( 0.00%) Min alloc-odr0-64 161.00 ( 0.00%) 162.00 ( -0.62%) Min alloc-odr0-128 159.00 ( 0.00%) 161.00 ( -1.26%) Min alloc-odr0-256 168.00 ( 0.00%) 170.00 ( -1.19%) Min alloc-odr0-512 180.00 ( 0.00%) 181.00 ( -0.56%) Min alloc-odr0-1024 190.00 ( 0.00%) 190.00 ( 0.00%) Min alloc-odr0-2048 196.00 ( 0.00%) 196.00 ( 0.00%) Min alloc-odr0-4096 202.00 ( 0.00%) 202.00 ( 0.00%) Min alloc-odr0-8192 206.00 ( 0.00%) 205.00 ( 0.49%) Min alloc-odr0-16384 206.00 ( 0.00%) 205.00 ( 0.49%) The benefit is negligible and the results are within the noise but each cycle counts. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19mm: rename _count, field of the struct page, to _refcountJoonsoo Kim
Many developers already know that field for reference count of the struct page is _count and atomic type. They would try to handle it directly and this could break the purpose of page reference count tracepoint. To prevent direct _count modification, this patch rename it to _refcount and add warning message on the code. After that, developer who need to handle reference count will find that field should not be accessed directly. [akpm@linux-foundation.org: fix comments, per Vlastimil] [akpm@linux-foundation.org: Documentation/vm/transhuge.txt too] [sfr@canb.auug.org.au: sync ethernet driver changes] Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Sunil Goutham <sgoutham@cavium.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Manish Chopra <manish.chopra@qlogic.com> Cc: Yuval Mintz <yuval.mintz@qlogic.com> Cc: Tariq Toukan <tariqt@mellanox.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19ocfs2: clean up an unneeded goto in ocfs2_put_slot()Guozhonghua
The goto is not useful in ocfs2_put_slot(), so delete it. Signed-off-by: Guozhonghua <guozhonghua@h3c.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19ocfs2: clean up unused parameter 'count' in o2hb_read_block_input()Jun Piao
Clean up unused parameter 'count' in o2hb_read_block_input(). Signed-off-by: Jun Piao <piaojun@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19ocfs2: clean up an unused variable 'wants_rotate' in ocfs2_truncate_recpiaojun
Clean up an unused variable 'wants_rotate' in ocfs2_truncate_rec. Signed-off-by: Jun Piao <piaojun@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19ocfs2: fix comment in struct ocfs2_extended_slotGuozhonghua
The comment in ocfs2_extended_slot has the offset wrong. Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19fs: poll/select/recvmmsg: use timespec64 for timeout eventsDeepa Dinamani
struct timespec is not y2038 safe. Even though timespec might be sufficient to represent timeouts, use struct timespec64 here as the plan is to get rid of all timespec reference in the kernel. The patch transitions the common functions: poll_select_set_timeout() and select_estimate_accuracy() to use timespec64. And, all the syscalls that use these functions are transitioned in the same patch. The restart block parameters for poll uses monotonic time. Use timespec64 here as well to assign timeout value. This parameter in the restart block need not change because this only holds the monotonic timestamp at which timeout should occur. And, unsigned long data type should be big enough for this timestamp. The system call interfaces will be handled in a separate series. Compat interfaces need not change as timespec64 is an alias to struct timespec on a 64 bit system. Link: http://lkml.kernel.org/r/1461947989-21926-3-git-send-email-deepa.kernel@gmail.com Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: John Stultz <john.stultz@linaro.org> Acked-by: David S. Miller <davem@davemloft.net> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19fsnotify: avoid spurious EMFILE errors from inotify_init()Jan Kara
Inotify instance is destroyed when all references to it are dropped. That not only means that the corresponding file descriptor needs to be closed but also that all corresponding instance marks are freed (as each mark holds a reference to the inotify instance). However marks are freed only after SRCU period ends which can take some time and thus if user rapidly creates and frees inotify instances, number of existing inotify instances can exceed max_user_instances limit although from user point of view there is always at most one existing instance. Thus inotify_init() returns EMFILE error which is hard to justify from user point of view. This problem is exposed by LTP inotify06 testcase on some machines. We fix the problem by making sure all group marks are properly freed while destroying inotify instance. We wait for SRCU period to end in that path anyway since we have to make sure there is no event being added to the instance while we are tearing down the instance. So it takes only some plumbing to allow for marks to be destroyed in that path as well and not from a dedicated work item. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com> Tested-by: Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linusLinus Torvalds
Pull MIPS updates from Ralf Baechle: "This is the main pull request for MIPS for 4.7. Here's the summary of the changes: - ATH79: Support for DTB passuing using the UHI boot protocol - ATH79: Remove support for builtin DTB. - ATH79: Add zboot debug serial support. - ATH79: Add initial support for Dragino MS14 (Dragine 2), Onion Omega and DPT-Module. - ATH79: Update devicetree clock support for AR9132 and AR9331. - ATH79: Cleanup the DT code. - ATH79: Support newer SOCs in ath79_ddr_ctrl_init. - ATH79: Fix regression in PCI window initialization. - BCM47xx: Move SPROM driver to drivers/firmware/ - BCM63xx: Enable partition parser in defconfig. - BMIPS: BMIPS5000 has I cache filing from D cache - BMIPS: BMIPS: Add cpu-feature-overrides.h - BMIPS: Add Whirlwind support - BMIPS: Adjust mips-hpt-frequency for BCM7435 - BMIPS: Remove maxcpus from BCM97435SVMB DTS - BMIPS: Add missing 7038 L1 register cells to BCM7435 - BMIPS: Various tweaks to initialization code. - BMIPS: Enable partition parser in defconfig. - BMIPS: Cache tweaks. - BMIPS: Add UART, I2C and SATA devices to DT. - BMIPS: Add BCM6358 and BCM63268support - BMIPS: Add device tree example for BCM6358. - BMIPS: Improve Improve BCM6328 and BCM6368 device trees - Lantiq: Add support for device tree file from boot loader - Lantiq: Allow build with no built-in DT. - Loongson 3: Reserve 32MB for RS780E integrated GPU. - Loongson 3: Fix build error after ld-version.sh modification - Loongson 3: Move chipset ACPI code from drivers to arch. - Loongson 3: Speedup irq processing. - Loongson 3: Add basic Loongson 3A support. - Loongson 3: Set cache flush handlers to nop. - Loongson 3: Invalidate special TLBs when needed. - Loongson 3: Fast TLB refill handler. - MT7620: Fallback strategy for invalid syscfg0. - Netlogic: Fix CP0_EBASE redefinition warnings - Octeon: Initialization fixes - Octeon: Add DTS files for the D-Link DSR-1000N and EdgeRouter Lite - Octeon: Enable add Octeon-drivers in cavium_octeon_defconfig - Octeon: Correctly handle endian-swapped initramfs images. - Octeon: Support CN73xx, CN75xx and CN78xx. - Octeon: Remove dead code from cvmx-sysinfo. - Octeon: Extend number of supported CPUs past 32. - Octeon: Remove some code limiting NR_IRQS to 255. - Octeon: Simplify octeon_irq_ciu_gpio_set_type. - Octeon: Mark some functions __init in smp.c - Octeon: Octeon: Add Octeon III CN7xxx interface detection - PIC32: Add serial driver and bindings for it. - PIC32: Add PIC32 deadman timer driver and bindings. - PIC32: Add PIC32 clock timer driver and bindings. - Pistachio: Determine SoC revision during boot - Sibyte: Fix Kconfig dependencies of SIBYTE_BUS_WATCHER. - Sibyte: Strip redundant comments from bcm1480_regs.h. - Panic immediately if panic_on_oops is set. - module: fix incorrect IS_ERR_VALUE macro usage. - module: Make consistent use of pr_* - Remove no longer needed work_on_cpu() call. - Remove CONFIG_IPV6_PRIVACY from defconfigs. - Fix registers of non-crashing CPUs in dumps. - Handle MIPSisms in new vmcore_elf32_check_arch. - Select CONFIG_HANDLE_DOMAIN_IRQ and make it work. - Allow RIXI to be used on non-R2 or R6 cores. - Reserve nosave data for hibernation - Fix siginfo.h to use strict POSIX types. - Don't unwind user mode with EVA. - Fix watchpoint restoration - Ptrace watchpoints for R6. - Sync icache when it fills from dcache - I6400 I-cache fills from dcache. - Various MSA fixes. - Cleanup MIPS_CPU_* definitions. - Signal: Move generic copy_siginfo to signal.h - Signal: Fix uapi include in exported asm/siginfo.h - Timer fixes for sake of KVM. - XPA TLB refill fixes. - Treat perf counter feature - Update John Crispin's email address - Add PIC32 watchdog and bindings. - Handle R10000 LL/SC bug in set_pte() - cpufreq: Various fixes for Longson1. - R6: Fix R2 emulation. - mathemu: Cosmetic fix to ADDIUPC emulation, plenty of other small fixes - ELF: ABI and FP fixes. - Allow for relocatable kernel and use that to support KASLR. - Fix CPC_BASE_ADDR mask - Plenty fo smp-cps, CM, R6 and M6250 fixes. - Make reset_control_ops const. - Fix kernel command line handling of leading whitespace. - Cleanups to cache handling. - Add brcm, bcm6345-l1-intc device tree bindings. - Use generic clkdev.h header - Remove CLK_IS_ROOT usage. - Misc small cleanups. - CM: Fix compilation error when !MIPS_CM - oprofile: Fix a preemption issue - Detect DSP ASE v3 support:1" * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (275 commits) MIPS: pic32mzda: fix getting timer clock rate. MIPS: ath79: fix regression in PCI window initialization MIPS: ath79: make ath79_ddr_ctrl_init() compatible for newer SoCs MIPS: Fix VZ probe gas errors with binutils <2.24 MIPS: perf: Fix I6400 event numbers MIPS: DEC: Export `ioasic_ssr_lock' to modules MIPS: MSA: Fix a link error on `_init_msa_upper' with older GCC MIPS: CM: Fix compilation error when !MIPS_CM MIPS: Fix genvdso error on rebuild USB: ohci-jz4740: Remove obsolete driver MIPS: JZ4740: Probe OHCI platform device via DT MIPS: JZ4740: Qi LB60: Remove support for AVT2 variant MIPS: pistachio: Determine SoC revision during boot MIPS: BMIPS: Adjust mips-hpt-frequency for BCM7435 mips: mt7620: fallback to SDRAM when syscfg0 does not have a valid value for the memory type MIPS: Prevent "restoration" of MSA context in non-MSA kernels MIPS: cevt-r4k: Dynamically calculate min_delta_ns MIPS: malta-time: Take seconds into account MIPS: malta-time: Start GIC count before syncing to RTC MIPS: Force CPUs to lose FP context during mode switches ...
2016-05-19Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: "Highlights: - A new LSM, "LoadPin", from Kees Cook is added, which allows forcing of modules and firmware to be loaded from a specific device (this is from ChromeOS, where the device as a whole is verified cryptographically via dm-verity). This is disabled by default but can be configured to be enabled by default (don't do this if you don't know what you're doing). - Keys: allow authentication data to be stored in an asymmetric key. Lots of general fixes and updates. - SELinux: add restrictions for loading of kernel modules via finit_module(). Distinguish non-init user namespace capability checks. Apply execstack check on thread stacks" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (48 commits) LSM: LoadPin: provide enablement CONFIG Yama: use atomic allocations when reporting seccomp: Fix comment typo ima: add support for creating files using the mknodat syscall ima: fix ima_inode_post_setattr vfs: forbid write access when reading a file into memory fs: fix over-zealous use of "const" selinux: apply execstack check on thread stacks selinux: distinguish non-init user namespace capability checks LSM: LoadPin for kernel file loading restrictions fs: define a string representation of the kernel_read_file_id enumeration Yama: consolidate error reporting string_helpers: add kstrdup_quotable_file string_helpers: add kstrdup_quotable_cmdline string_helpers: add kstrdup_quotable selinux: check ss_initialized before revalidating an inode label selinux: delay inode label lookup as long as possible selinux: don't revalidate an inode's label when explicitly setting it selinux: Change bool variable name to index. KEYS: Add KEYCTL_DH_COMPUTE command ...
2016-05-18Merge branch 'work.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs cleanups from Al Viro: "Assorted cleanups and fixes all over the place" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: coredump: only charge written data against RLIMIT_CORE coredump: get rid of coredump_params->written ecryptfs_lookup(): try either only encrypted or plaintext name ecryptfs: avoid multiple aliases for directories bpf: reject invalid names right in ->lookup() __d_alloc(): treat NULL name as QSTR("/", 1) mtd: switch ubi_open_volume_path() to vfs_stat() mtd: switch open_mtd_by_chdev() to use of vfs_stat()
2016-05-18Merge branch 'work.iov_iter' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull iov_iter cleanups from Al Viro. * 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fold checks into iterate_and_advance() rw_verify_area(): saner calling conventions aio: remove a pointless assignment
2016-05-18Merge branch 'work.lookups' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull parallel lookup fixups from Al Viro: "Fix for xfs parallel readdir (turns out the cxfs exposure was not enough to catch all problems), and a reversion of btrfs back to ->iterate() until the fs/btrfs/delayed-inode.c gets fixed" * 'work.lookups' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: xfs: concurrent readdir hangs on data buffer locks Revert "btrfs: switch to ->iterate_shared()"
2016-05-18xfs: concurrent readdir hangs on data buffer locksDave Chinner
There's a three-process deadlock involving shared/exclusive barriers and inverted lock orders in the directory readdir implementation. It's a pre-existing problem with lock ordering, exposed by the VFS parallelisation code. process 1 process 2 process 3 --------- --------- --------- readdir iolock(shared) get_leaf_dents iterate entries ilock(shared) map, lock and read buffer iunlock(shared) process entries in buffer ..... readdir iolock(shared) get_leaf_dents iterate entries ilock(shared) map, lock buffer <blocks> finish ->iterate_shared file_accessed() ->update_time start transaction ilock(excl) <blocks> ..... finishes processing buffer get next buffer ilock(shared) <blocks> And that's the deadlock. Fix this by dropping the current buffer lock in process 1 before trying to map the next buffer. This means we keep the lock order of ilock -> buffer lock intact and hence will allow process 3 to make progress and drop it's ilock(shared) once it is done. Reported-by: Xiong Zhou <xzhou@redhat.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-18Revert "btrfs: switch to ->iterate_shared()"Al Viro
This reverts commit 972b241f8441dc37a3f89dcd7e71d7f013873d13. Quoth Chris: didn't take the delayed inode stuff into account it got an rbtree of items and it pulls things out so in shared mode, its hugely racey sorry, lets revert and fix it for real inside of btrfs Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-18Merge branch 'sendmsg.cifs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull cifs iovec cleanups from Al Viro. * 'sendmsg.cifs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: cifs: don't bother with kmap on read_pages side cifs_readv_receive: use cifs_read_from_socket() cifs: no need to wank with copying and advancing iovec on recvmsg side either cifs: quit playing games with draining iovecs cifs: merge the hash calculation helpers
2016-05-18Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull remaining vfs xattr work from Al Viro: "The rest of work.xattr (non-cifs conversions)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: btrfs: Switch to generic xattr handlers ubifs: Switch to generic xattr handlers jfs: Switch to generic xattr handlers jfs: Clean up xattr name mapping gfs2: Switch to generic xattr handlers ceph: kill __ceph_removexattr() ceph: Switch to generic xattr handlers ceph: Get rid of d_find_alias in ceph_set_acl
2016-05-18Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs updates from Steve French: "Various small CIFS and SMB3 fixes (including some for stable)" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: remove directory incorrectly tries to set delete on close on non-empty directories Update cifs.ko version to 2.09 fs/cifs: correctly to anonymous authentication for the NTLM(v2) authentication fs/cifs: correctly to anonymous authentication for the NTLM(v1) authentication fs/cifs: correctly to anonymous authentication for the LANMAN authentication fs/cifs: correctly to anonymous authentication via NTLMSSP cifs: remove any preceding delimiter from prefix_path cifs: Use file_dentry()
2016-05-17Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (21 commits) gitignore: fix wording mfd: ab8500-debugfs: fix "between" in printk memstick: trivial fix of spelling mistake on management cpupowerutils: bench: fix "average" treewide: Fix typos in printk IB/mlx4: printk fix pinctrl: sirf/atlas7: fix printk spelling serial: mctrl_gpio: Grammar s/lines GPIOs/line GPIOs/, /sets/set/ w1: comment spelling s/minmum/minimum/ Blackfin: comment spelling s/divsor/divisor/ metag: Fix misspellings in comments. ia64: Fix misspellings in comments. hexagon: Fix misspellings in comments. tools/perf: Fix misspellings in comments. cris: Fix misspellings in comments. c6x: Fix misspellings in comments. blackfin: Fix misspelling of 'register' in comment. avr32: Fix misspelling of 'definitions' in comment. treewide: Fix typos in printk Doc: treewide : Fix typos in DocBook/filesystem.xml ...
2016-05-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: "Highlights: 1) Support SPI based w5100 devices, from Akinobu Mita. 2) Partial Segmentation Offload, from Alexander Duyck. 3) Add GMAC4 support to stmmac driver, from Alexandre TORGUE. 4) Allow cls_flower stats offload, from Amir Vadai. 5) Implement bpf blinding, from Daniel Borkmann. 6) Optimize _ASYNC_ bit twiddling on sockets, unless the socket is actually using FASYNC these atomics are superfluous. From Eric Dumazet. 7) Run TCP more preemptibly, also from Eric Dumazet. 8) Support LED blinking, EEPROM dumps, and rxvlan offloading in mlx5e driver, from Gal Pressman. 9) Allow creating ppp devices via rtnetlink, from Guillaume Nault. 10) Improve BPF usage documentation, from Jesper Dangaard Brouer. 11) Support tunneling offloads in qed, from Manish Chopra. 12) aRFS offloading in mlx5e, from Maor Gottlieb. 13) Add RFS and RPS support to SCTP protocol, from Marcelo Ricardo Leitner. 14) Add MSG_EOR support to TCP, this allows controlling packet coalescing on application record boundaries for more accurate socket timestamp sampling. From Martin KaFai Lau. 15) Fix alignment of 64-bit netlink attributes across the board, from Nicolas Dichtel. 16) Per-vlan stats in bridging, from Nikolay Aleksandrov. 17) Several conversions of drivers to ethtool ksettings, from Philippe Reynes. 18) Checksum neutral ILA in ipv6, from Tom Herbert. 19) Factorize all of the various marvell dsa drivers into one, from Vivien Didelot 20) Add VF support to qed driver, from Yuval Mintz" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1649 commits) Revert "phy dp83867: Fix compilation with CONFIG_OF_MDIO=m" Revert "phy dp83867: Make rgmii parameters optional" r8169: default to 64-bit DMA on recent PCIe chips phy dp83867: Make rgmii parameters optional phy dp83867: Fix compilation with CONFIG_OF_MDIO=m bpf: arm64: remove callee-save registers use for tmp registers asix: Fix offset calculation in asix_rx_fixup() causing slow transmissions switchdev: pass pointer to fib_info instead of copy net_sched: close another race condition in tcf_mirred_release() tipc: fix nametable publication field in nl compat drivers: net: Don't print unpopulated net_device name qed: add support for dcbx. ravb: Add missing free_irq() calls to ravb_close() qed: Remove a stray tab net: ethernet: fec-mpc52xx: use phy_ethtool_{get|set}_link_ksettings net: ethernet: fec-mpc52xx: use phydev from struct net_device bpf, doc: fix typo on bpf_asm descriptions stmmac: hardware TX COE doesn't work when force_thresh_dma_mode is set net: ethernet: fs-enet: use phy_ethtool_{get|set}_link_ksettings net: ethernet: fs-enet: use phydev from struct net_device ...
2016-05-17btrfs: Switch to generic xattr handlersAndreas Gruenbacher
The btrfs_{set,remove}xattr inode operations check for a read-only root (btrfs_root_readonly) before calling into generic_{set,remove}xattr. If this check is moved into __btrfs_setxattr, we can get rid of btrfs_{set,remove}xattr. This patch applies to mainline, I would like to keep it together with the other xattr cleanups if possible, though. Could you please review? Thanks, Andreas Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-17ubifs: Switch to generic xattr handlersAndreas Gruenbacher
Ubifs internally uses special inodes for storing xattrs. Those inodes had NULL {get,set,remove}xattr inode operations before this change, so xattr operations on them would fail. The super block's s_xattr field would also apply to those special inodes. However, the inodes are not visible outside of ubifs, and so no xattr operations will ever be carried out on them anyway. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Richard Weinberger <richard@nod.at> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-17Merge branch 'work.preadv2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs cleanups from Al Viro: "More cleanups from Christoph" * 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: nfsd: use RWF_SYNC fs: add RWF_DSYNC aand RWF_SYNC ceph: use generic_write_sync fs: simplify the generic_write_sync prototype fs: add IOCB_SYNC and IOCB_DSYNC direct-io: remove the offset argument to dio_complete direct-io: eliminate the offset argument to ->direct_IO xfs: eliminate the pos variable in xfs_file_dio_aio_write filemap: remove the pos argument to generic_file_direct_write filemap: remove pos variables in generic_file_read_iter
2016-05-17Merge branch 'work.const-path' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull 'struct path' constification update from Al Viro: "'struct path' is passed by reference to a bunch of Linux security methods; in theory, there's nothing to stop them from modifying the damn thing and LSM community being what it is, sooner or later some enterprising soul is going to decide that it's a good idea. Let's remove the temptation and constify all of those..." * 'work.const-path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: constify ima_d_path() constify security_sb_pivotroot() constify security_path_chroot() constify security_path_{link,rename} apparmor: remove useless checks for NULL ->mnt constify security_path_{mkdir,mknod,symlink} constify security_path_{unlink,rmdir} apparmor: constify common_perm_...() apparmor: constify aa_path_link() apparmor: new helper - common_path_perm() constify chmod_common/security_path_chmod constify security_sb_mount() constify chown_common/security_path_chown tomoyo: constify assorted struct path * apparmor_path_truncate(): path->mnt is never NULL constify vfs_truncate() constify security_path_truncate() [apparmor] constify struct path * in a bunch of helpers
2016-05-17Merge branch 'for-cifs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull cifs xattr updates from Al Viro: "This is the remaining parts of the xattr work - the cifs bits" * 'for-cifs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: cifs: Switch to generic xattr handlers cifs: Fix removexattr for os2.* xattrs cifs: Check for equality with ACL_TYPE_ACCESS and ACL_TYPE_DEFAULT cifs: Fix xattr name checks
2016-05-17Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull UDF fixes from Jan Kara: "A fix for UDF crash on corrupted media and one UDF header fixup" * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: udf: Export superblock magic to userspace udf: Prevent stack overflow on corrupted filesystem mount
2016-05-17Merge tag 'jfs-4.7' of git://github.com/kleikamp/linux-shaggyLinus Torvalds
Pull jfs updates from Dave Kleikamp: "Some jfs logging cleanups from Joe Perches" * tag 'jfs-4.7' of git://github.com/kleikamp/linux-shaggy: jfs: Coalesce some formats jfs: Remove unnecessary line continuations and terminating newlines jfs: Remove terminating newlines from jfs_info, jfs_warn, jfs_err uses
2016-05-17exec: clarify reasoning for euid/egid resetKees Cook
This section of code initially looks redundant, but is required. This improves the comment to explain more clearly why the reset is needed. Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-17remove directory incorrectly tries to set delete on close on non-empty ↵Steve French
directories Wrong return code was being returned on SMB3 rmdir of non-empty directory. For SMB3 (unlike for cifs), we attempt to delete a directory by set of delete on close flag on the open. Windows clients set this flag via a set info (SET_FILE_DISPOSITION to set this flag) which properly checks if the directory is empty. With this patch on smb3 mounts we correctly return "DIRECTORY NOT EMPTY" on attempts to remove a non-empty directory. Signed-off-by: Steve French <steve.french@primarydata.com> CC: Stable <stable@vger.kernel.org> Acked-by: Sachin Prabhu <sprabhu@redhat.com>
2016-05-17Update cifs.ko version to 2.09Steve French
Signed-off-by: Steven French <steve.french@primarydata.com>
2016-05-17fs/cifs: correctly to anonymous authentication for the NTLM(v2) authenticationStefan Metzmacher
Only server which map unknown users to guest will allow access using a non-null NTLMv2_Response. For Samba it's the "map to guest = bad user" option. BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913 Signed-off-by: Stefan Metzmacher <metze@samba.org> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <smfrench@gmail.com>
2016-05-17fs/cifs: correctly to anonymous authentication for the NTLM(v1) authenticationStefan Metzmacher
Only server which map unknown users to guest will allow access using a non-null NTChallengeResponse. For Samba it's the "map to guest = bad user" option. BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913 Signed-off-by: Stefan Metzmacher <metze@samba.org> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <smfrench@gmail.com>
2016-05-17fs/cifs: correctly to anonymous authentication for the LANMAN authenticationStefan Metzmacher
Only server which map unknown users to guest will allow access using a non-null LMChallengeResponse. For Samba it's the "map to guest = bad user" option. BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913 Signed-off-by: Stefan Metzmacher <metze@samba.org> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <smfrench@gmail.com>
2016-05-17fs/cifs: correctly to anonymous authentication via NTLMSSPStefan Metzmacher
See [MS-NLMP] 3.2.5.1.2 Server Receives an AUTHENTICATE_MESSAGE from the Client: ... Set NullSession to FALSE If (AUTHENTICATE_MESSAGE.UserNameLen == 0 AND AUTHENTICATE_MESSAGE.NtChallengeResponse.Length == 0 AND (AUTHENTICATE_MESSAGE.LmChallengeResponse == Z(1) OR AUTHENTICATE_MESSAGE.LmChallengeResponse.Length == 0)) -- Special case: client requested anonymous authentication Set NullSession to TRUE ... Only server which map unknown users to guest will allow access using a non-null NTChallengeResponse. For Samba it's the "map to guest = bad user" option. BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913 CC: Stable <stable@vger.kernel.org> Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>
2016-05-17cifs: remove any preceding delimiter from prefix_pathSachin Prabhu
We currently do not check if any delimiter exists before the prefix path in cifs_compose_mount_options(). Consequently when building the devname using cifs_build_devname() we can end up with multiple delimiters separating the UNC and the prefix path. An issue was reported by the customer mounting a folder within a DFS share from a Netapp server which uses McAfee antivirus. We have narrowed down the cause to the use of double backslashes in the file name used to open the file. This was determined to be caused because of additional delimiters as a result of the bug. In addition to changes in cifs_build_devname(), we also fix cifs_parse_devname() to ignore any preceding delimiter for the prefix path. The problem was originally reported on RHEL 6 in RHEL bz 1252721. This is the upstream version of the fix. The fix was confirmed by looking at the packet capture of a DFS mount. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
2016-05-17cifs: Use file_dentry()Goldwyn Rodrigues
CIFS may be used as lower layer of overlayfs and accessing f_path.dentry can lead to a crash. Fix by replacing direct access of file->f_path.dentry with the file_dentry() accessor, which will always return a native object. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <smfrench@gmail.com>
2016-05-17Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull parallel filesystem directory handling update from Al Viro. This is the main parallel directory work by Al that makes the vfs layer able to do lookup and readdir in parallel within a single directory. That's a big change, since this used to be all protected by the directory inode mutex. The inode mutex is replaced by an rwsem, and serialization of lookups of a single name is done by a "in-progress" dentry marker. The series begins with xattr cleanups, and then ends with switching filesystems over to actually doing the readdir in parallel (switching to the "iterate_shared()" that only takes the read lock). A more detailed explanation of the process from Al Viro: "The xattr work starts with some acl fixes, then switches ->getxattr to passing inode and dentry separately. This is the point where the things start to get tricky - that got merged into the very beginning of the -rc3-based #work.lookups, to allow untangling the security_d_instantiate() mess. The xattr work itself proceeds to switch a lot of filesystems to generic_...xattr(); no complications there. After that initial xattr work, the series then does the following: - untangle security_d_instantiate() - convert a bunch of open-coded lookup_one_len_unlocked() to calls of that thing; one such place (in overlayfs) actually yields a trivial conflict with overlayfs fixes later in the cycle - overlayfs ended up switching to a variant of lookup_one_len_unlocked() sans the permission checks. I would've dropped that commit (it gets overridden on merge from #ovl-fixes in #for-next; proper resolution is to use the variant in mainline fs/overlayfs/super.c), but I didn't want to rebase the damn thing - it was fairly late in the cycle... - some filesystems had managed to depend on lookup/lookup exclusion for *fs-internal* data structures in a way that would break if we relaxed the VFS exclusion. Fixing hadn't been hard, fortunately. - core of that series - parallel lookup machinery, replacing ->i_mutex with rwsem, making lookup_slow() take it only shared. At that point lookups happen in parallel; lookups on the same name wait for the in-progress one to be done with that dentry. Surprisingly little code, at that - almost all of it is in fs/dcache.c, with fs/namei.c changes limited to lookup_slow() - making it use the new primitive and actually switching to locking shared. - parallel readdir stuff - first of all, we provide the exclusion on per-struct file basis, same as we do for read() vs lseek() for regular files. That takes care of most of the needed exclusion in readdir/readdir; however, these guys are trickier than lookups, so I went for switching them one-by-one. To do that, a new method '->iterate_shared()' is added and filesystems are switched to it as they are either confirmed to be OK with shared lock on directory or fixed to be OK with that. I hope to kill the original method come next cycle (almost all in-tree filesystems are switched already), but it's still not quite finished. - several filesystems get switched to parallel readdir. The interesting part here is dealing with dcache preseeding by readdir; that needs minor adjustment to be safe with directory locked only shared. Most of the filesystems doing that got switched to in those commits. Important exception: NFS. Turns out that NFS folks, with their, er, insistence on VFS getting the fuck out of the way of the Smart Filesystem Code That Knows How And What To Lock(tm) have grown the locking of their own. They had their own homegrown rwsem, with lookup/readdir/atomic_open being *writers* (sillyunlink is the reader there). Of course, with VFS getting the fuck out of the way, as requested, the actual smarts of the smart filesystem code etc. had become exposed... - do_last/lookup_open/atomic_open cleanups. As the result, open() without O_CREAT locks the directory only shared. Including the ->atomic_open() case. Backmerge from #for-linus in the middle of that - atomic_open() fix got brought in. - then comes NFS switch to saner (VFS-based ;-) locking, killing the homegrown "lookup and readdir are writers" kinda-sorta rwsem. All exclusion for sillyunlink/lookup is done by the parallel lookups mechanism. Exclusion between sillyunlink and rmdir is a real rwsem now - rmdir being the writer. Result: NFS lookups/readdirs/O_CREAT-less opens happen in parallel now. - the rest of the series consists of switching a lot of filesystems to parallel readdir; in a lot of cases ->llseek() gets simplified as well. One backmerge in there (again, #for-linus - rockridge fix)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (74 commits) ext4: switch to ->iterate_shared() hfs: switch to ->iterate_shared() hfsplus: switch to ->iterate_shared() hostfs: switch to ->iterate_shared() hpfs: switch to ->iterate_shared() hpfs: handle allocation failures in hpfs_add_pos() gfs2: switch to ->iterate_shared() f2fs: switch to ->iterate_shared() afs: switch to ->iterate_shared() befs: switch to ->iterate_shared() befs: constify stuff a bit isofs: switch to ->iterate_shared() get_acorn_filename(): deobfuscate a bit btrfs: switch to ->iterate_shared() logfs: no need to lock directory in lseek switch ecryptfs to ->iterate_shared 9p: switch to ->iterate_shared() fat: switch to ->iterate_shared() romfs, squashfs: switch to ->iterate_shared() more trivial ->iterate_shared conversions ...
2016-05-17Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: "API: - Crypto self tests can now be disabled at boot/run time. - Add async support to algif_aead. Algorithms: - A large number of fixes to MPI from Nicolai Stange. - Performance improvement for HMAC DRBG. Drivers: - Use generic crypto engine in omap-des. - Merge ppc4xx-rng and crypto4xx drivers. - Fix lockups in sun4i-ss driver by disabling IRQs. - Add DMA engine support to ccp. - Reenable talitos hash algorithms. - Add support for Hisilicon SoC RNG. - Add basic crypto driver for the MXC SCC. Others: - Do not allocate crypto hash tfm in NORECLAIM context in ecryptfs" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (77 commits) crypto: qat - change the adf_ctl_stop_devices to void crypto: caam - fix caam_jr_alloc() ret code crypto: vmx - comply with ABIs that specify vrsave as reserved. crypto: testmgr - Add a flag allowing the self-tests to be disabled at runtime. crypto: ccp - constify ccp_actions structure crypto: marvell/cesa - Use dma_pool_zalloc crypto: qat - make adf_vf_isr.c dependant on IOV config crypto: qat - Fix typo in comments lib: asn1_decoder - add MODULE_LICENSE("GPL") crypto: omap-sham - Use dma_request_chan() for requesting DMA channel crypto: omap-des - Use dma_request_chan() for requesting DMA channel crypto: omap-aes - Use dma_request_chan() for requesting DMA channel crypto: omap-des - Integrate with the crypto engine framework crypto: s5p-sss - fix incorrect usage of scatterlists api crypto: s5p-sss - Fix missed interrupts when working with 8 kB blocks crypto: s5p-sss - Use common BIT macro crypto: mxc-scc - fix unwinding in mxc_scc_crypto_register() crypto: mxc-scc - signedness bugs in mxc_scc_ablkcipher_req_init() crypto: talitos - fix ahash algorithms registration crypto: ccp - Ensure all dependencies are specified ...
2016-05-17Merge branch 'ovl-fixes' into for-linusAl Viro
Backmerge to resolve a conflict in ovl_lookup_real(); "ovl_lookup_real(): use lookup_one_len_unlocked()" instead, but it was too late in the cycle to rebase.
2016-05-16Merge branch 'efi-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI updates from Ingo Molnar: "The main changes in this cycle were: - Drop the unused EFI_SYSTEM_TABLES efi.flags bit and ensure the ARM/arm64 EFI System Table mapping is read-only (Ard Biesheuvel) - Add a comment to explain that one of the code paths in the x86/pat code is only executed for EFI boot (Matt Fleming) - Improve Secure Boot status checks on arm64 and handle unexpected errors (Linn Crosetto) - Remove the global EFI memory map variable 'memmap' as the same information is already available in efi::memmap (Matt Fleming) - Add EFI Memory Attribute table support for ARM/arm64 (Ard Biesheuvel) - Add EFI GOP framebuffer support for ARM/arm64 (Ard Biesheuvel) - Add EFI Bootloader Control driver for storing reboot(2) data in EFI variables for consumption by bootloaders (Jeremy Compostella) - Add Core EFI capsule support (Matt Fleming) - Add EFI capsule char driver (Kweh, Hock Leong) - Unify EFI memory map code for ARM and arm64 (Ard Biesheuvel) - Add generic EFI support for detecting when firmware corrupts CPU status register bits (like IRQ flags) when performing EFI runtime service calls (Mark Rutland) ... and other misc cleanups" * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits) efivarfs: Make efivarfs_file_ioctl() static efi: Merge boolean flag arguments efi/capsule: Move 'capsule' to the stack in efi_capsule_supported() efibc: Fix excessive stack footprint warning efi/capsule: Make efi_capsule_pending() lockless efi: Remove unnecessary (and buggy) .memmap initialization from the Xen EFI driver efi/runtime-wrappers: Remove ARCH_EFI_IRQ_FLAGS_MASK #ifdef x86/efi: Enable runtime call flag checking arm/efi: Enable runtime call flag checking arm64/efi: Enable runtime call flag checking efi/runtime-wrappers: Detect firmware IRQ flag corruption efi/runtime-wrappers: Remove redundant #ifdefs x86/efi: Move to generic {__,}efi_call_virt() arm/efi: Move to generic {__,}efi_call_virt() arm64/efi: Move to generic {__,}efi_call_virt() efi/runtime-wrappers: Add {__,}efi_call_virt() templates efi/arm-init: Reserve rather than unmap the memory map for ARM as well efi: Add misc char driver interface to update EFI firmware x86/efi: Force EFI reboot to process pending capsules efi: Add 'capsule' update support ...
2016-05-16namei: Improve hash mixing if CONFIG_DCACHE_WORD_ACCESSGeorge Spelvin
The hash mixing between adding the next 64 bits of name was just a bit weak. Replaced with a still very fast but slightly more effective mixing function. Signed-off-by: George Spelvin <linux@horizon.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
The nf_conntrack_core.c fix in 'net' is not relevant in 'net-next' because we no longer have a per-netns conntrack hash. The ip_gre.c conflict as well as the iwlwifi ones were cases of overlapping changes. Conflicts: drivers/net/wireless/intel/iwlwifi/mvm/tx.c net/ipv4/ip_gre.c net/netfilter/nf_conntrack_core.c Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-14Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "Overlayfs fixes from Miklos, assorted fixes from me. Stable fodder of varying severity, all sat in -next for a while" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ovl: ignore permissions on underlying lookup vfs: add lookup_hash() helper vfs: rename: check backing inode being equal vfs: add vfs_select_inode() helper get_rock_ridge_filename(): handle malformed NM entries ecryptfs: fix handling of directory opening atomic_open(): fix the handling of create_error fix the copy vs. map logics in blk_rq_map_user_iov() do_splice_to(): cap the size before passing to ->splice_read()
2016-05-13Merge branch 'for-4.6-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "During v4.6-rc1 cgroup namespace support was merged. There is an issue where it's impossible to tell whether a given cgroup mount point is bind mounted or namespaced. Serge has been working on the issue but it took longer than expected to resolve, so the late pull request. Given that it's a completely new feature and the patches don't touch anything else, the risk seems acceptable. However, if this is too late, an alternative is plugging new cgroup ns creation for v4.6 and retrying for v4.7" * 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: fix compile warning kernfs: kernfs_sop_show_path: don't return 0 after seq_dentry call cgroup, kernfs: make mountinfo show properly scoped path for cgroup namespaces kernfs_path_from_node_locked: don't overwrite nlen
2016-05-13crash_dump: Add vmcore_elf32_check_archDaniel Wagner
parse_crash_elf{32|64}_headers will check the headers via the elf_check_arch respectively vmcore_elf64_check_arch macro. The MIPS architecture implements those two macros differently. In order to make the differentiation more explicit, let's introduce an vmcore_elf32_check_arch to allow the archs to overwrite it. Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Suggested-by: Maciej W. Rozycki <macro@imgtec.com> Reviewed-by: Maciej W. Rozycki <macro@imgtec.com> Cc: linux-kernel@vger.kernel.org Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/12535/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-05-12jfs: Switch to generic xattr handlersAndreas Gruenbacher
This is mostly the same as on other filesystems except for attribute names with an "os2." prefix: for those, the prefix is not stored on disk, and on-attribute names without a prefix have "os2." added. As on several other filesystems, the underlying function for setting/removing xattrs (__jfs_setxattr) removes attributes when the value is NULL, so the set xattr handlers will work as expected. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>