summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2017-12-21xfs: set cowblocks tag for direct cow writes tooDarrick J. Wong
If a user performs a direct CoW write, we end up loading the CoW fork with preallocated extents. Therefore, we must set the cowblocks tag so that they can be cleared out if we run low on space. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-12-21xfs: remove leftover CoW reservations when remounting roDarrick J. Wong
When we're remounting the filesystem readonly, remove all CoW preallocations prior to going ro. If the fs goes down after the ro remount, we never clean up the staging extents, which means xfs_check will trip over them on a subsequent run. Practically speaking, the next mount will clean them up too, so this is unlikely to be seen. Since we shut down the cowblocks cleaner on remount-ro, we also have to make sure we start it back up if/when we remount-rw. Found by adding clonerange to fsstress and running xfs/017. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-12-21xfs: don't be so eager to clear the cowblocks tag on truncateDarrick J. Wong
Currently, xfs_itruncate_extents clears the cowblocks tag if i_cnextents is zero. This is wrong, since i_cnextents only tracks real extents in the CoW fork, which means that we could have some delayed CoW reservations still in there that will now never get cleaned. Fix a further bug where we /don't/ clear the reflink iflag if there are any attribute blocks -- really, it's only safe to clear the reflink flag if there are no data fork extents and no cow fork extents. Found by adding clonerange to fsstress in xfs/017. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-12-20xfs: track cowblocks separately in i_flagsDarrick J. Wong
The EOFBLOCKS/COWBLOCKS tags are totally separate things, so track them with separate i_flags. Right now we're abusing IEOFBLOCKS for both, which is totally bogus because we won't tag the inode with COWBLOCKS if IEOFBLOCKS was set by a previous tagging of the inode with EOFBLOCKS. Found by wiring up clonerange to fsstress in xfs/017. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-12-14xfs: allow CoW remap transactions to use reserve blocksDarrick J. Wong
Since we as yet have no way of holding on to the indlen blocks that are reserved as part of CoW fork delalloc reservations, let the CoW remap transaction dip into the reserves so that we avoid failing writes. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-12-14xfs: avoid infinite loop when cancelling CoW blocks after writeback failureDarrick J. Wong
When we're cancelling a cow range, we don't always delete each extent that we iterate, so we have to move icur backwards in the list to avoid an infinite loop. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-12-14xfs: relax is_reflink_inode assert in xfs_reflink_find_cow_mappingDarrick J. Wong
We don't hold the ilock through the entire sequence of xfs_writepage_map -> xfs_map_cow -> xfs_reflink_find_cow_mapping. This means that we can race with another thread that is trying to clear the inode reflink flag, with the result that the flag is set for the xfs_map_cow check but cleared before we get to the assert in find_cow_mapping. When this happens, we blow the assert even though everything is fine. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-12-14xfs: remove dest file's post-eof preallocations before reflinkingDarrick J. Wong
If we try to reflink into a file with post-eof preallocations at an offset well past the preallocations, we increase i_size as one would expect. However, those allocations do not have page cache backing them, so they won't get cleaned out on their own. This leads to asserts in the collapse/insert range code and xfs_destroy_inode when they encounter delalloc extents they weren't expecting to find. Since there are plenty of other places where we dump those post-eof blocks, do the same to the reflink destination file before we start remapping extents. This was found by adding clonerange support to fsstress and running it in write-only mode. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-12-14xfs: move xfs_iext_insert tracepoint to report useful informationDarrick J. Wong
Move the tracepoint in xfs_iext_insert to after the point where we've inserted the extent because otherwise we report stale extent data in the ftrace output. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-12-14xfs: account for null transactions in bunmapiDarrick J. Wong
In e1a4e37cc7b665 ("xfs: try to avoid blowing out the transaction reservation when bunmaping a shared extent"), we try to constrain the amount of real extents we unmap from the data fork in a given call so that we don't blow out transaction reservations. However, not all bunmapi operations require a transaction -- if we're only removing a delalloc extent, no transaction is needed, so we have to code against that. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-12-14xfs: hold xfs_buf locked between shortform->leaf conversion and the addition ↵Darrick J. Wong
of an attribute The new attribute leaf buffer is not held locked across the transaction roll between the shortform->leaf modification and the addition of the new entry. As a result, the attribute buffer modification being made is not atomic from an operational perspective. Hence the AIL push can grab it in the transient state of "just created" after the initial transaction is rolled, because the buffer has been released. This leads to xfs_attr3_leaf_verify() asserting that hdr.count is zero, treating this as in-memory corruption, and shutting down the filesystem. Darrick ported the original patch to 4.15 and reworked it use the xfs_defer_bjoin helper and hold/join the buffer correctly across the second transaction roll. Signed-off-by: Alex Lyakas <alex@zadarastorage.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-12-14xfs: add the ability to join a held buffer to a defer_opsDarrick J. Wong
In certain cases, defer_ops callers will lock a buffer and want to hold the lock across transaction rolls. Similar to ijoined inodes, we want to dirty & join the buffer with each transaction roll in defer_finish so that afterwards the caller still owns the buffer lock and we haven't inadvertently pinned the log. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-12-08xfs: make iomap_begin functions trim iomaps consistentlyDarrick J. Wong
Historically, the XFS iomap_begin function only returned mappings for exactly the range queried, i.e. it doesn't do XFS_BMAPI_ENTIRE lookups. The current vfs iomap consumers are only set up to deal with trimmed mappings. xfs_xattr_iomap_begin does BMAPI_ENTIRE lookups, which is inconsistent with the current iomap usage. Remove the flag so that both iomap_begin functions behave the same way. FWIW this also fixes a behavioral regression in xattr FIEMAP that was introduced in 4.8 wherein attr fork extents are no longer trimmed like they used to be. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-12-08xfs: remove "no-allocation" reservations for file creationsChristoph Hellwig
If we create a new file we will need an inode, and usually some metadata in the parent direction. Aiming for everything to go well despite the lack of a reservation leads to dirty transactions cancelled under a heavy create/delete load. This patch removes those nospace transactions, which will lead to slightly earlier ENOSPC on some workloads, but instead prevent file system shutdowns due to cancelling dirty transactions for others. A customer could observe assertations failures and shutdowns due to cancelation of dirty transactions during heavy NFS workloads as shown below: 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728125] XFS: Assertion failed: error != -ENOSPC, file: fs/xfs/xfs_inode.c, line: 1262 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728222] Call Trace: 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728246] [<ffffffff81795daf>] dump_stack+0x63/0x81 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728262] [<ffffffff810a1a5a>] warn_slowpath_common+0x8a/0xc0 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728264] [<ffffffff810a1b8a>] warn_slowpath_null+0x1a/0x20 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728285] [<ffffffffa01bf403>] asswarn+0x33/0x40 [xfs] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728308] [<ffffffffa01bb07e>] xfs_create+0x7be/0x7d0 [xfs] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728329] [<ffffffffa01b6ffb>] xfs_generic_create+0x1fb/0x2e0 [xfs] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728348] [<ffffffffa01b7114>] xfs_vn_mknod+0x14/0x20 [xfs] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728366] [<ffffffffa01b7153>] xfs_vn_create+0x13/0x20 [xfs] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728380] [<ffffffff81231de5>] vfs_create+0xd5/0x140 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728390] [<ffffffffa045ddb9>] do_nfsd_create+0x499/0x610 [nfsd] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728396] [<ffffffffa0465fa5>] nfsd3_proc_create+0x135/0x210 [nfsd] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728401] [<ffffffffa04561e3>] nfsd_dispatch+0xc3/0x210 [nfsd] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728416] [<ffffffffa03bfa43>] svc_process_common+0x453/0x6f0 [sunrpc] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728423] [<ffffffffa03bfdf3>] svc_process+0x113/0x1f0 [sunrpc] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728427] [<ffffffffa0455bcf>] nfsd+0x10f/0x180 [nfsd] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728432] [<ffffffffa0455ac0>] ? nfsd_destroy+0x80/0x80 [nfsd] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728438] [<ffffffff810c0d58>] kthread+0xd8/0xf0 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728441] [<ffffffff810c0c80>] ? kthread_create_on_node+0x1b0/0x1b0 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728451] [<ffffffff8179d962>] ret_from_fork+0x42/0x70 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728453] [<ffffffff810c0c80>] ? kthread_create_on_node+0x1b0/0x1b0 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728454] ---[ end trace f9822c842fec81d4 ]--- 2017-05-30 21:17:06 kernel: ALERT: [ 2670.728477] XFS (sdb): Internal error xfs_trans_cancel at line 983 of file fs/xfs/xfs_trans.c. Caller xfs_create+0x4ee/0x7d0 [xfs] 2017-05-30 21:17:06 kernel: ALERT: [ 2670.728684] XFS (sdb): Corruption of in-memory data detected. Shutting down filesystem 2017-05-30 21:17:06 kernel: ALERT: [ 2670.728685] XFS (sdb): Please umount the filesystem and rectify the problem(s) Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-12-08fs: xfs: remove duplicate includesPravin Shedge
These duplicate includes have been found with scripts/checkincludes.pl but they have been removed manually to avoid removing false positives. Signed-off-by: Pravin Shedge <pravin.shedge4linux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-11-30xfs: Properly retry failed dquot items in case of error during buffer writebackCarlos Maiolino
Once the inode item writeback errors is already fixed, it's time to fix the same problem in dquot code. Although there were no reports of users hitting this bug in dquot code (at least none I've seen), the bug is there and I was already planning to fix it when the correct approach to fix the inodes part was decided. This patch aims to fix the same problem in dquot code, regarding failed buffers being unable to be resubmitted once they are flush locked. Tested with the recently test-case sent to fstests list by Hou Tao. Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-11-30xfs: scrub inode mode properlyDarrick J. Wong
Since we've used up all the bits in i_mode, the existing mode check doesn't actually do anything useful. However, we've not used all the bit values in the format portion of i_mode, so we /do/ need to test that for bad values. Fixes: 80e4e1268 ("xfs: scrub inodes") Fixes-coverity-id: 1423992 Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2017-11-30xfs: remove unused parameter from xfs_writepage_mapDarrick J. Wong
The first thing that xfs_writepage_map does is clobber the offset parameter. Since we never use the passed-in value, turn the parameter into a local variable. This gets rid of an UBSAN warning in generic/466. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2017-11-30xfs: ubsan fixesDarrick J. Wong
Fix some complaints from the UBSAN about signed integer addition overflows. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2017-11-28xfs: calculate correct offset in xfs_scrub_quota_itemEric Sandeen
It's only used for tracepoints so it's relatively harmless, but the offset is calculated incorrectly in xfs_scrub_quota_item. qi_dqperchunk is the nr. of dquots per "chunk" which we have conveniently *cough* defined to always be 1 FSB. Therefore block_offset * qi_dqperchunk == first id in that chunk, and so offset = id / qi_dqperchunk id * dqperchunk is ... meaningless. Fixes-coverity-id: 1423965 Fixes: c2fc338c ("xfs: scrub quota information") Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-11-28xfs: fix uninitialized variable in xfs_scrub_quotaEric Sandeen
On the first pass through the while(1) loop, we get to xfs_scrub_should_terminate() which can test the uninitialized error variable. Fixes-coverity-id: 1423737 Fixes: c2fc338c ("xfs: scrub quota information") Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-11-28xfs: fix leaks on corruption errors in xfs_bmap.cEric Sandeen
Use _GOTO instead of _RETURN so we can free the allocated cursor on error. Fixes: bf80628 ("xfs: remove xfs_bmse_shift_one") Fixes-coverity-id: 1423813, 1423676 Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-11-28xfs: fortify xfs_alloc_buftarg error handlingMichal Hocko
percpu_counter_init failure path doesn't clean up &btp->bt_lru list. Call list_lru_destroy in that error path. Similarly register_shrinker error path is not handled. While it is unlikely to trigger these error path, it is not impossible especially the later might fail with large NUMAs. Let's handle the failure to make the code more robust. Noticed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-11-27xfs: log recovery should replay deferred ops in orderDarrick J. Wong
As part of testing log recovery with dm_log_writes, Amir Goldstein discovered an error in the deferred ops recovery that lead to corruption of the filesystem metadata if a reflink+rmap filesystem happened to shut down midway through a CoW remap: "This is what happens [after failed log recovery]: "Phase 1 - find and verify superblock... "Phase 2 - using internal log " - zero log... " - scan filesystem freespace and inode maps... " - found root inode chunk "Phase 3 - for each AG... " - scan (but don't clear) agi unlinked lists... " - process known inodes and perform inode discovery... " - agno = 0 "data fork in regular inode 134 claims CoW block 376 "correcting nextents for inode 134 "bad data fork in inode 134 "would have cleared inode 134" Hou Tao dissected the log contents of exactly such a crash: "According to the implementation of xfs_defer_finish(), these ops should be completed in the following sequence: "Have been done: "(1) CUI: Oper (160) "(2) BUI: Oper (161) "(3) CUD: Oper (194), for CUI Oper (160) "(4) RUI A: Oper (197), free rmap [0x155, 2, -9] "Should be done: "(5) BUD: for BUI Oper (161) "(6) RUI B: add rmap [0x155, 2, 137] "(7) RUD: for RUI A "(8) RUD: for RUI B "Actually be done by xlog_recover_process_intents() "(5) BUD: for BUI Oper (161) "(6) RUI B: add rmap [0x155, 2, 137] "(7) RUD: for RUI B "(8) RUD: for RUI A "So the rmap entry [0x155, 2, -9] for COW should be freed firstly, then a new rmap entry [0x155, 2, 137] will be added. However, as we can see from the log record in post_mount.log (generated after umount) and the trace print, the new rmap entry [0x155, 2, 137] are added firstly, then the rmap entry [0x155, 2, -9] are freed." When reconstructing the internal log state from the log items found on disk, it's required that deferred ops replay in exactly the same order that they would have had the filesystem not gone down. However, replaying unfinished deferred ops can create /more/ deferred ops. These new deferred ops are finished in the wrong order. This causes fs corruption and replay crashes, so let's create a single defer_ops to handle the subsequent ops created during replay, then use one single transaction at the end of log recovery to ensure that everything is replayed in the same order as they're supposed to be. Reported-by: Amir Goldstein <amir73il@gmail.com> Analyzed-by: Hou Tao <houtao1@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-11-27xfs: always free inline data before resetting inode fork during ifreeDarrick J. Wong
In xfs_ifree, we reset the data/attr forks to extents format without bothering to free any inline data buffer that might still be around after all the blocks have been truncated off the file. Prior to commit 43518812d2 ("xfs: remove support for inlining data/extents into the inode fork") nobody noticed because the leftover inline data after truncation was small enough to fit inside the inline buffer inside the fork itself. However, now that we've removed the inline buffer, we /always/ have to free the inline data buffer or else we leak them like crazy. This test was found by turning on kmemleak for generic/001 or generic/388. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-11-25Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: - The final conversion of timer wheel timers to timer_setup(). A few manual conversions and a large coccinelle assisted sweep and the removal of the old initialization mechanisms and the related code. - Remove the now unused VSYSCALL update code - Fix permissions of /proc/timer_list. I still need to get rid of that file completely - Rename a misnomed clocksource function and remove a stale declaration * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits) m68k/macboing: Fix missed timer callback assignment treewide: Remove TIMER_FUNC_TYPE and TIMER_DATA_TYPE casts timer: Remove redundant __setup_timer*() macros timer: Pass function down to initialization routines timer: Remove unused data arguments from macros timer: Switch callback prototype to take struct timer_list * argument timer: Pass timer_list pointer to callbacks unconditionally Coccinelle: Remove setup_timer.cocci timer: Remove setup_*timer() interface timer: Remove init_timer() interface treewide: setup_timer() -> timer_setup() (2 field) treewide: setup_timer() -> timer_setup() treewide: init_timer() -> setup_timer() treewide: Switch DEFINE_TIMER callbacks to struct timer_list * s390: cmm: Convert timers to use timer_setup() lightnvm: Convert timers to use timer_setup() drivers/net: cris: Convert timers to use timer_setup() drm/vc4: Convert timers to use timer_setup() block/laptop_mode: Convert timers to use timer_setup() net/atm/mpc: Avoid open-coded assignment of timer callback function ...
2017-11-25Merge tag 'afs-fixes-20171124' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull AFS fixes from David Howells: - Make AFS file locking work again. - Don't write to a page that's being written out, but wait for it to complete. - Do d_drop() and d_add() in the right places. - Put keys on error paths. - Remove some redundant code. * tag 'afs-fixes-20171124' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: remove redundant assignment of dvnode to itself afs: cell: Remove unnecessary code in afs_lookup_cell afs: Fix signal handling in some file ops afs: Fix some dentry handling in dir ops and missing key_puts afs: Make afs_write_begin() avoid writing to a page that's being stored afs: Fix file locking
2017-11-24afs: remove redundant assignment of dvnode to itselfColin Ian King
The assignment of dvnode to itself is redundant and can be removed. Cleans up warning detected by cppcheck: fs/afs/dir.c:975: (warning) Redundant assignment of 'dvnode' to itself. Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-24afs: cell: Remove unnecessary code in afs_lookup_cellGustavo A. R. Silva
Due to recent changes this piece of code is no longer needed. Addresses-Coverity-ID: 1462033 Link: https://lkml.kernel.org/r/4923.1510957307@warthog.procyon.org.uk Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-24afs: Fix signal handling in some file opsDavid Howells
afs_mkdir(), afs_create(), afs_link() and afs_symlink() all need to drop the target dentry if a signal causes the operation to be killed immediately before we try to contact the server. Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-24afs: Fix some dentry handling in dir ops and missing key_putsDavid Howells
Fix some of dentry handling in AFS directory ops: (1) Do d_drop() on the new_dentry before assigning a new inode to it in afs_vnode_new_inode(). It's fine to do this before calling afs_iget() because the operation has taken place on the server. (2) Replace d_instantiate()/d_rehash() with d_add(). (3) Don't d_drop() the new_dentry in afs_rename() on error. Also fix afs_link() and afs_rename() to call key_put() on all error paths where the key is taken. Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-24afs: Make afs_write_begin() avoid writing to a page that's being storedDavid Howells
Make afs_write_begin() wait for a page that's marked PG_writeback because: (1) We need to avoid interference with the data being stored so that the data on the server ends up in a defined state. (2) page->private is used to track the window of dirty data within a page, but it's also used by the storage code to track what's being written, being cleared by the completion notification. Ownership can't be relinquished by the storage code until completion because it a store fails, the data must be remarked dirty. Tracing shows something like the following (edited): x86_64-linux-gn-15940 [1] afs_page_dirty: vn=ffff8800bef33800 9c75 begin 0-125 kworker/u8:3-114 [2] afs_page_dirty: vn=ffff8800bef33800 9c75 store+ 0-125 x86_64-linux-gn-15940 [1] afs_page_dirty: vn=ffff8800bef33800 9c75 begin 0-2052 kworker/u8:3-114 [2] afs_page_dirty: vn=ffff8800bef33800 9c75 clear 0-2052 kworker/u8:3-114 [2] afs_page_dirty: vn=ffff8800bef33800 9c75 store 0-0 kworker/u8:3-114 [2] afs_page_dirty: vn=ffff8800bef33800 9c75 WARN 0-0 The clear (completion) corresponding to the store+ (store continuation from a previous page) happens between the second begin (afs_write_begin) and the store corresponding to that. This results in the second store not seeing any data to write back, leading to the following warning: WARNING: CPU: 2 PID: 114 at ../fs/afs/write.c:403 afs_write_back_from_locked_page+0x19d/0x76c [kafs] Modules linked in: kafs(E) CPU: 2 PID: 114 Comm: kworker/u8:3 Tainted: G E 4.14.0-fscache+ #242 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 Workqueue: writeback wb_workfn (flush-afs-2) task: ffff8800cad72600 task.stack: ffff8800cad44000 RIP: 0010:afs_write_back_from_locked_page+0x19d/0x76c [kafs] RSP: 0018:ffff8800cad47aa0 EFLAGS: 00010246 RAX: 0000000000000001 RBX: ffff8800bef33a20 RCX: 0000000000000000 RDX: 000000000000000f RSI: ffffffff81c5d0e0 RDI: ffff8800cad72e78 RBP: ffff8800d31ea1e8 R08: ffff8800c1358000 R09: ffff8800ca00e400 R10: ffff8800cad47a38 R11: ffff8800c5d9e400 R12: 0000000000000000 R13: ffffea0002d9df00 R14: ffffffffa0023c1c R15: 0000000000007fdf FS: 0000000000000000(0000) GS:ffff8800ca700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f85ac6c4000 CR3: 0000000001c10001 CR4: 00000000001606e0 Call Trace: ? clear_page_dirty_for_io+0x23a/0x267 afs_writepages_region+0x1be/0x286 [kafs] afs_writepages+0x60/0x127 [kafs] do_writepages+0x36/0x70 __writeback_single_inode+0x12f/0x635 writeback_sb_inodes+0x2cc/0x452 __writeback_inodes_wb+0x68/0x9f wb_writeback+0x208/0x470 ? wb_workfn+0x22b/0x565 wb_workfn+0x22b/0x565 ? worker_thread+0x230/0x2ac process_one_work+0x2cc/0x517 ? worker_thread+0x230/0x2ac worker_thread+0x1d4/0x2ac ? rescuer_thread+0x29b/0x29b kthread+0x15d/0x165 ? kthread_create_on_node+0x3f/0x3f ? call_usermodehelper_exec_async+0x118/0x11f ret_from_fork+0x24/0x30 Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-22Merge tag 'xfs-4.15-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Darrick Wong: - Fix a memory leak in the new in-core extent map - Refactor the xfs_dev_t conversions for easier xfsprogs porting * tag 'xfs-4.15-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: abstract out dev_t conversions xfs: fix memory leak in xfs_iext_free_last_leaf
2017-11-22Merge branch 'work.whack-a-mole' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull mode_t whack-a-mole from Al Viro: "For all internal uses we want umode_t, which is arch-independent; mode_t (or __kernel_mode_t, for that matter) is wrong outside of userland ABI. Unfortunately, that crap keeps coming back and needs to be put down from time to time..." * 'work.whack-a-mole' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: mode_t whack-a-mole: task_dump_owner()
2017-11-22Merge branch '9p-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull 9p filesystemfixes from Al Viro: "Several 9p fixes" * '9p-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: 9p: Fix missing commas in mount options net/9p: Switch to wait_event_killable() fs/9p: Compare qid.path in v9fs_test_inode
2017-11-21treewide: setup_timer() -> timer_setup()Kees Cook
This converts all remaining cases of the old setup_timer() API into using timer_setup(), where the callback argument is the structure already holding the struct timer_list. These should have no behavioral changes, since they just change which pointer is passed into the callback with the same available pointers after conversion. It handles the following examples, in addition to some other variations. Casting from unsigned long: void my_callback(unsigned long data) { struct something *ptr = (struct something *)data; ... } ... setup_timer(&ptr->my_timer, my_callback, ptr); and forced object casts: void my_callback(struct something *ptr) { ... } ... setup_timer(&ptr->my_timer, my_callback, (unsigned long)ptr); become: void my_callback(struct timer_list *t) { struct something *ptr = from_timer(ptr, t, my_timer); ... } ... timer_setup(&ptr->my_timer, my_callback, 0); Direct function assignments: void my_callback(unsigned long data) { struct something *ptr = (struct something *)data; ... } ... ptr->my_timer.function = my_callback; have a temporary cast added, along with converting the args: void my_callback(struct timer_list *t) { struct something *ptr = from_timer(ptr, t, my_timer); ... } ... ptr->my_timer.function = (TIMER_FUNC_TYPE)my_callback; And finally, callbacks without a data assignment: void my_callback(unsigned long data) { ... } ... setup_timer(&ptr->my_timer, my_callback, 0); have their argument renamed to verify they're unused during conversion: void my_callback(struct timer_list *unused) { ... } ... timer_setup(&ptr->my_timer, my_callback, 0); The conversion is done with the following Coccinelle script: spatch --very-quiet --all-includes --include-headers \ -I ./arch/x86/include -I ./arch/x86/include/generated \ -I ./include -I ./arch/x86/include/uapi \ -I ./arch/x86/include/generated/uapi -I ./include/uapi \ -I ./include/generated/uapi --include ./include/linux/kconfig.h \ --dir . \ --cocci-file ~/src/data/timer_setup.cocci @fix_address_of@ expression e; @@ setup_timer( -&(e) +&e , ...) // Update any raw setup_timer() usages that have a NULL callback, but // would otherwise match change_timer_function_usage, since the latter // will update all function assignments done in the face of a NULL // function initialization in setup_timer(). @change_timer_function_usage_NULL@ expression _E; identifier _timer; type _cast_data; @@ ( -setup_timer(&_E->_timer, NULL, _E); +timer_setup(&_E->_timer, NULL, 0); | -setup_timer(&_E->_timer, NULL, (_cast_data)_E); +timer_setup(&_E->_timer, NULL, 0); | -setup_timer(&_E._timer, NULL, &_E); +timer_setup(&_E._timer, NULL, 0); | -setup_timer(&_E._timer, NULL, (_cast_data)&_E); +timer_setup(&_E._timer, NULL, 0); ) @change_timer_function_usage@ expression _E; identifier _timer; struct timer_list _stl; identifier _callback; type _cast_func, _cast_data; @@ ( -setup_timer(&_E->_timer, _callback, _E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, &_callback, _E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, _callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, &_callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, (_cast_func)_callback, _E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, (_cast_func)&_callback, _E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, (_cast_func)_callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, (_cast_func)&_callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E._timer, _callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, _callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, &_callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, &_callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); | _E->_timer@_stl.function = _callback; | _E->_timer@_stl.function = &_callback; | _E->_timer@_stl.function = (_cast_func)_callback; | _E->_timer@_stl.function = (_cast_func)&_callback; | _E._timer@_stl.function = _callback; | _E._timer@_stl.function = &_callback; | _E._timer@_stl.function = (_cast_func)_callback; | _E._timer@_stl.function = (_cast_func)&_callback; ) // callback(unsigned long arg) @change_callback_handle_cast depends on change_timer_function_usage@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _origtype; identifier _origarg; type _handletype; identifier _handle; @@ void _callback( -_origtype _origarg +struct timer_list *t ) { ( ... when != _origarg _handletype *_handle = -(_handletype *)_origarg; +from_timer(_handle, t, _timer); ... when != _origarg | ... when != _origarg _handletype *_handle = -(void *)_origarg; +from_timer(_handle, t, _timer); ... when != _origarg | ... when != _origarg _handletype *_handle; ... when != _handle _handle = -(_handletype *)_origarg; +from_timer(_handle, t, _timer); ... when != _origarg | ... when != _origarg _handletype *_handle; ... when != _handle _handle = -(void *)_origarg; +from_timer(_handle, t, _timer); ... when != _origarg ) } // callback(unsigned long arg) without existing variable @change_callback_handle_cast_no_arg depends on change_timer_function_usage && !change_callback_handle_cast@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _origtype; identifier _origarg; type _handletype; @@ void _callback( -_origtype _origarg +struct timer_list *t ) { + _handletype *_origarg = from_timer(_origarg, t, _timer); + ... when != _origarg - (_handletype *)_origarg + _origarg ... when != _origarg } // Avoid already converted callbacks. @match_callback_converted depends on change_timer_function_usage && !change_callback_handle_cast && !change_callback_handle_cast_no_arg@ identifier change_timer_function_usage._callback; identifier t; @@ void _callback(struct timer_list *t) { ... } // callback(struct something *handle) @change_callback_handle_arg depends on change_timer_function_usage && !match_callback_converted && !change_callback_handle_cast && !change_callback_handle_cast_no_arg@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _handletype; identifier _handle; @@ void _callback( -_handletype *_handle +struct timer_list *t ) { + _handletype *_handle = from_timer(_handle, t, _timer); ... } // If change_callback_handle_arg ran on an empty function, remove // the added handler. @unchange_callback_handle_arg depends on change_timer_function_usage && change_callback_handle_arg@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _handletype; identifier _handle; identifier t; @@ void _callback(struct timer_list *t) { - _handletype *_handle = from_timer(_handle, t, _timer); } // We only want to refactor the setup_timer() data argument if we've found // the matching callback. This undoes changes in change_timer_function_usage. @unchange_timer_function_usage depends on change_timer_function_usage && !change_callback_handle_cast && !change_callback_handle_cast_no_arg && !change_callback_handle_arg@ expression change_timer_function_usage._E; identifier change_timer_function_usage._timer; identifier change_timer_function_usage._callback; type change_timer_function_usage._cast_data; @@ ( -timer_setup(&_E->_timer, _callback, 0); +setup_timer(&_E->_timer, _callback, (_cast_data)_E); | -timer_setup(&_E._timer, _callback, 0); +setup_timer(&_E._timer, _callback, (_cast_data)&_E); ) // If we fixed a callback from a .function assignment, fix the // assignment cast now. @change_timer_function_assignment depends on change_timer_function_usage && (change_callback_handle_cast || change_callback_handle_cast_no_arg || change_callback_handle_arg)@ expression change_timer_function_usage._E; identifier change_timer_function_usage._timer; identifier change_timer_function_usage._callback; type _cast_func; typedef TIMER_FUNC_TYPE; @@ ( _E->_timer.function = -_callback +(TIMER_FUNC_TYPE)_callback ; | _E->_timer.function = -&_callback +(TIMER_FUNC_TYPE)_callback ; | _E->_timer.function = -(_cast_func)_callback; +(TIMER_FUNC_TYPE)_callback ; | _E->_timer.function = -(_cast_func)&_callback +(TIMER_FUNC_TYPE)_callback ; | _E._timer.function = -_callback +(TIMER_FUNC_TYPE)_callback ; | _E._timer.function = -&_callback; +(TIMER_FUNC_TYPE)_callback ; | _E._timer.function = -(_cast_func)_callback +(TIMER_FUNC_TYPE)_callback ; | _E._timer.function = -(_cast_func)&_callback +(TIMER_FUNC_TYPE)_callback ; ) // Sometimes timer functions are called directly. Replace matched args. @change_timer_function_calls depends on change_timer_function_usage && (change_callback_handle_cast || change_callback_handle_cast_no_arg || change_callback_handle_arg)@ expression _E; identifier change_timer_function_usage._timer; identifier change_timer_function_usage._callback; type _cast_data; @@ _callback( ( -(_cast_data)_E +&_E->_timer | -(_cast_data)&_E +&_E._timer | -_E +&_E->_timer ) ) // If a timer has been configured without a data argument, it can be // converted without regard to the callback argument, since it is unused. @match_timer_function_unused_data@ expression _E; identifier _timer; identifier _callback; @@ ( -setup_timer(&_E->_timer, _callback, 0); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, _callback, 0L); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, _callback, 0UL); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E._timer, _callback, 0); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, _callback, 0L); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, _callback, 0UL); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_timer, _callback, 0); +timer_setup(&_timer, _callback, 0); | -setup_timer(&_timer, _callback, 0L); +timer_setup(&_timer, _callback, 0); | -setup_timer(&_timer, _callback, 0UL); +timer_setup(&_timer, _callback, 0); | -setup_timer(_timer, _callback, 0); +timer_setup(_timer, _callback, 0); | -setup_timer(_timer, _callback, 0L); +timer_setup(_timer, _callback, 0); | -setup_timer(_timer, _callback, 0UL); +timer_setup(_timer, _callback, 0); ) @change_callback_unused_data depends on match_timer_function_unused_data@ identifier match_timer_function_unused_data._callback; type _origtype; identifier _origarg; @@ void _callback( -_origtype _origarg +struct timer_list *unused ) { ... when != _origarg } Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-21treewide: Switch DEFINE_TIMER callbacks to struct timer_list *Kees Cook
This changes all DEFINE_TIMER() callbacks to use a struct timer_list pointer instead of unsigned long. Since the data argument has already been removed, none of these callbacks are using their argument currently, so this renames the argument to "unused". Done using the following semantic patch: @match_define_timer@ declarer name DEFINE_TIMER; identifier _timer, _callback; @@ DEFINE_TIMER(_timer, _callback); @change_callback depends on match_define_timer@ identifier match_define_timer._callback; type _origtype; identifier _origarg; @@ void -_callback(_origtype _origarg) +_callback(struct timer_list *unused) { ... } Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-21Merge tag 'for-linus-4.15-ofs1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux Pull orangefs updates from Mike Marshall: "Fix: - stop setting atime on inode dirty (Martin Brandenburg) Cleanups: - remove initialization of i_version (Jeff Layton) - use ARRAY_SIZE (Jérémy Lefaure) - call op_release sooner when creating inodes (Mike MarshallMartin Brandenburg)" * tag 'for-linus-4.15-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: orangefs: call op_release sooner when creating inodes orangefs: stop setting atime on inode dirty orangefs: use ARRAY_SIZE orangefs: remove initialization of i_version
2017-11-21Merge tag 'ceph-for-4.15-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph updates from Ilya Dryomov: "We have a set of file locking improvements from Zheng, rbd rw/ro state handling code cleanup from myself and some assorted CephFS fixes from Jeff. rbd now defaults to single-major=Y, lifting the limit of ~240 rbd images per host for everyone" * tag 'ceph-for-4.15-rc1' of git://github.com/ceph/ceph-client: rbd: default to single-major device number scheme libceph: don't WARN() if user tries to add invalid key rbd: set discard_alignment to zero ceph: silence sparse endianness warning in encode_caps_cb ceph: remove the bump of i_version ceph: present consistent fsid, regardless of arch endianness ceph: clean up spinlocking and list handling around cleanup_cap_releases() rbd: get rid of rbd_mapping::read_only rbd: fix and simplify rbd_ioctl_set_ro() ceph: remove unused and redundant variable dropping ceph: mark expected switch fall-throughs ceph: -EINVAL on decoding failure in ceph_mdsc_handle_fsmap() ceph: disable cached readdir after dropping positive dentry ceph: fix bool initialization/comparison ceph: handle 'session get evicted while there are file locks' ceph: optimize flock encoding during reconnect ceph: make lock_to_ceph_filelock() static ceph: keep auth cap when inode has flocks or posix locks
2017-11-21xfs: abstract out dev_t conversionsChristoph Hellwig
And move them to xfs_linux.h so that xfsprogs can stub them out more easily. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-11-21xfs: fix memory leak in xfs_iext_free_last_leafShu Wang
found the issue by kmemleak. unreferenced object 0xffff8800674611c0 (size 16): xfs_iext_insert+0x82a/0xa90 [xfs] xfs_bmap_add_extent_hole_delay+0x1e5/0x5b0 [xfs] xfs_bmapi_reserve_delalloc+0x483/0x530 [xfs] xfs_file_iomap_begin+0xac8/0xd40 [xfs] iomap_apply+0xb8/0x1b0 iomap_file_buffered_write+0xac/0xe0 xfs_file_buffered_aio_write+0x198/0x420 [xfs] xfs_file_write_iter+0x23f/0x2a0 [xfs] __vfs_write+0x23e/0x340 vfs_write+0xe9/0x240 SyS_write+0xa1/0x120 do_syscall_64+0xda/0x260 Signed-off-by: Shu Wang <shuwang@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-11-18Merge tag 'nfsd-4.15' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd updates from Bruce Fields: "Lots of good bugfixes, including: - fix a number of races in the NFSv4+ state code - fix some shutdown crashes in multiple-network-namespace cases - relax our 4.1 session limits; if you've an artificially low limit to the number of 4.1 clients that can mount simultaneously, try upgrading" * tag 'nfsd-4.15' of git://linux-nfs.org/~bfields/linux: (22 commits) SUNRPC: Improve ordering of transport processing nfsd: deal with revoked delegations appropriately svcrdma: Enqueue after setting XPT_CLOSE in completion handlers nfsd: use nfs->ns.inum as net ID rpc: remove some BUG()s svcrdma: Preserve CB send buffer across retransmits nfds: avoid gettimeofday for nfssvc_boot time fs, nfsd: convert nfs4_file.fi_ref from atomic_t to refcount_t fs, nfsd: convert nfs4_cntl_odstate.co_odcount from atomic_t to refcount_t fs, nfsd: convert nfs4_stid.sc_count from atomic_t to refcount_t lockd: double unregister of inetaddr notifiers nfsd4: catch some false session retries nfsd4: fix cached replies to solo SEQUENCE compounds sunrcp: make function _svc_create_xprt static SUNRPC: Fix tracepoint storage issues with svc_recv and svc_rqst_status nfsd: use ARRAY_SIZE nfsd: give out fewer session slots as limit approaches nfsd: increase DRC cache limit nfsd: remove unnecessary nofilehandle checks nfs_common: convert int to bool ...
2017-11-17Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more updates from Andrew Morton: - a bit more MM - procfs updates - dynamic-debug fixes - lib/ updates - checkpatch - epoll - nilfs2 - signals - rapidio - PID management cleanup and optimization - kcov updates - sysvipc updates - quite a few misc things all over the place * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (94 commits) EXPERT Kconfig menu: fix broken EXPERT menu include/asm-generic/topology.h: remove unused parent_node() macro arch/tile/include/asm/topology.h: remove unused parent_node() macro arch/sparc/include/asm/topology_64.h: remove unused parent_node() macro arch/sh/include/asm/topology.h: remove unused parent_node() macro arch/ia64/include/asm/topology.h: remove unused parent_node() macro drivers/pcmcia/sa1111_badge4.c: avoid unused function warning mm: add infrastructure for get_user_pages_fast() benchmarking sysvipc: make get_maxid O(1) again sysvipc: properly name ipc_addid() limit parameter sysvipc: duplicate lock comments wrt ipc_addid() sysvipc: unteach ids->next_id for !CHECKPOINT_RESTORE initramfs: use time64_t timestamps drivers/watchdog: make use of devm_register_reboot_notifier() kernel/reboot.c: add devm_register_reboot_notifier() kcov: update documentation Makefile: support flag -fsanitizer-coverage=trace-cmp kcov: support comparison operands collection kcov: remove pointless current != NULL check kernel/panic.c: add TAINT_AUX ...
2017-11-17pid: replace pid bitmap implementation with IDR APIGargi Sharma
Patch series "Replacing PID bitmap implementation with IDR API", v4. This series replaces kernel bitmap implementation of PID allocation with IDR API. These patches are written to simplify the kernel by replacing custom code with calls to generic code. The following are the stats for pid and pid_namespace object files before and after the replacement. There is a noteworthy change between the IDR and bitmap implementation. Before text data bss dec hex filename 8447 3894 64 12405 3075 kernel/pid.o After text data bss dec hex filename 3397 304 0 3701 e75 kernel/pid.o Before text data bss dec hex filename 5692 1842 192 7726 1e2e kernel/pid_namespace.o After text data bss dec hex filename 2854 216 16 3086 c0e kernel/pid_namespace.o The following are the stats for ps, pstree and calling readdir on /proc for 10,000 processes. ps: With IDR API With bitmap real 0m1.479s 0m2.319s user 0m0.070s 0m0.060s sys 0m0.289s 0m0.516s pstree: With IDR API With bitmap real 0m1.024s 0m1.794s user 0m0.348s 0m0.612s sys 0m0.184s 0m0.264s proc: With IDR API With bitmap real 0m0.059s 0m0.074s user 0m0.000s 0m0.004s sys 0m0.016s 0m0.016s This patch (of 2): Replace the current bitmap implementation for Process ID allocation. Functions that are no longer required, for example, free_pidmap(), alloc_pidmap(), etc. are removed. The rest of the functions are modified to use the IDR API. The change was made to make the PID allocation less complex by replacing custom code with calls to generic API. [gs051095@gmail.com: v6] Link: http://lkml.kernel.org/r/1507760379-21662-2-git-send-email-gs051095@gmail.com [avagin@openvz.org: restore the old behaviour of the ns_last_pid sysctl] Link: http://lkml.kernel.org/r/20171106183144.16368-1-avagin@openvz.org Link: http://lkml.kernel.org/r/1507583624-22146-2-git-send-email-gs051095@gmail.com Signed-off-by: Gargi Sharma <gs051095@gmail.com> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Julia Lawall <julia.lawall@lip6.fr> Cc: Ingo Molnar <mingo@kernel.org> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17fat: remove redundant assignment of 0 to slotsColin Ian King
The variable slots is being assigned a value of zero that is never read, slots is being updated again a few lines later. Remove this redundant assignment. Cleans clang warning: Value stored to 'slots' is never read Link: http://lkml.kernel.org/r/20171017140258.22536-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17hfs/hfsplus: clean up unused variables in bnode.cChristos Gkekas
Delete variables 'tree' and 'sb', which are set but never used. Link: http://lkml.kernel.org/r/1507977146-15875-1-git-send-email-chris.gekas@gmail.com Signed-off-by: Christos Gkekas <chris.gekas@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17nilfs2: remove inode->i_version initializationJeff Layton
It's never used in nilfs2. Link: http://lkml.kernel.org/r/1510064486-1728-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17nilfs2: use octal for unreadable permission macroRyusuke Konishi
Replace S_IRWXUGO with 0777 because symbolic permissions are considered harmful: https://lwn.net/Articles/696229/ Link: http://lkml.kernel.org/r/1509367935-3086-5-git-send-email-konishi.ryusuke@lab.ntt.co.jp Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17nilfs2: align block comments of nilfs_sufile_truncate_range() at *Ryusuke Konishi
Fix the following checkpatch warning: WARNING: Block comments should align the * on each line #633: FILE: sufile.c:633: +/** + * nilfs_sufile_truncate_range - truncate range of segment array Link: http://lkml.kernel.org/r/1509367935-3086-4-git-send-email-konishi.ryusuke@lab.ntt.co.jp Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17fs, nilfs: convert nilfs_root.count from atomic_t to refcount_tElena Reshetova
atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable nilfs_root.count is used as pure reference counter. Convert it to refcount_t and fix up the operations. Link: http://lkml.kernel.org/r/1509367935-3086-3-git-send-email-konishi.ryusuke@lab.ntt.co.jp Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>