summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2014-03-09Merge tag 'nfs-for-3.14-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: "Highlights include: - Fix another nfs4_sequence corruptor in RELEASE_LOCKOWNER - Fix an Oopsable delegation callback race - Fix another bad stateid infinite loop - Fail the data server I/O is the stateid represents a lost lock - Fix an Oopsable sunrpc trace event" * tag 'nfs-for-3.14-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: Fix oops when trace sunrpc_task events in nfs client NFSv4: Fail the truncate() if the lock/open stateid is invalid NFSv4.1 Fail data server I/O if stateid represents a lost lock NFSv4: Fix the return value of nfs4_select_rw_stateid NFSv4: nfs4_stateid_is_current should return 'true' for an invalid stateid NFS: Fix a delegation callback race NFSv4: Fix another nfs4_sequence corruptor
2014-03-07Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "Small collection of fixes for 3.14-rc. It contains: - Three minor update to blk-mq from Christoph. - Reduce number of unaligned (< 4kb) in-flight writes on mtip32xx to two. From Micron. - Make the blk-mq CPU notify spinlock raw, since it can't be a sleeper spinlock on RT. From Mike Galbraith. - Drop now bogus BUG_ON() for bio iteration with blk integrity. From Nic Bellinger. - Properly propagate the SYNC flag on requests. From Shaohua" * 'for-linus' of git://git.kernel.dk/linux-block: blk-mq: add REQ_SYNC early rt,blk,mq: Make blk_mq_cpu_notify_lock a raw spinlock bio-integrity: Drop bio_integrity_verify BUG_ON in post bip->bip_iter world blk-mq: support partial I/O completions blk-mq: merge blk_mq_insert_request and blk_mq_run_request blk-mq: remove blk_mq_alloc_rq mtip32xx: Reduce the number of unaligned writes to 2
2014-03-05NFSv4: Fail the truncate() if the lock/open stateid is invalidTrond Myklebust
If the open stateid could not be recovered, or the file locks were lost, then we should fail the truncate() operation altogether. Reported-by: Andy Adamson <andros@netapp.com> Link: http://lkml.kernel.org/r/1393954269-3974-1-git-send-email-andros@netapp.com Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-03-05NFSv4.1 Fail data server I/O if stateid represents a lost lockAndy Adamson
Signed-off-by: Andy Adamson <andros@netapp.com> Link: http://lkml.kernel.org/r/1393954269-3974-1-git-send-email-andros@netapp.com Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-03-05NFSv4: Fix the return value of nfs4_select_rw_stateidTrond Myklebust
In commit 5521abfdcf4d6 (NFSv4: Resend the READ/WRITE RPC call if a stateid change causes an error), we overloaded the return value of nfs4_select_rw_stateid() to cause it to return -EWOULDBLOCK if an RPC call is outstanding that would cause the NFSv4 lock or open stateid to change. That is all redundant when we actually copy the stateid used in the read/write RPC call that failed, and check that against the current stateid. It is doubly so, when we consider that in the NFSv4.1 case, we also set the stateid's seqid to the special value '0', which means 'match the current valid stateid'. Reported-by: Andy Adamson <andros@netapp.com> Link: http://lkml.kernel.org/r/1393954269-3974-1-git-send-email-andros@netapp.com Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-03-05NFSv4: nfs4_stateid_is_current should return 'true' for an invalid stateidTrond Myklebust
When nfs4_set_rw_stateid() can fails by returning EIO to indicate that the stateid is completely invalid, then it makes no sense to have it trigger a retry of the READ or WRITE operation. Instead, we should just have it fall through and attempt a recovery. This fixes an infinite loop in which the client keeps replaying the same bad stateid back to the server. Reported-by: Andy Adamson <andros@netapp.com> Link: http://lkml.kernel.org/r/1393954269-3974-1-git-send-email-andros@netapp.com Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-03-04hfsplus: fix remount issueVyacheslav Dubeyko
Current implementation of HFS+ driver has small issue with remount option. Namely, for example, you are unable to remount from RO mode into RW mode by means of command "mount -o remount,rw /dev/loop0 /mnt/hfsplus". Trying to execute sequence of commands results in an error message: mount /dev/loop0 /mnt/hfsplus mount -o remount,ro /dev/loop0 /mnt/hfsplus mount -o remount,rw /dev/loop0 /mnt/hfsplus mount: you must specify the filesystem type mount -t hfsplus -o remount,rw /dev/loop0 /mnt/hfsplus mount: /mnt/hfsplus not mounted or bad option The reason of such issue is failure of mount syscall: mount("/dev/loop0", "/mnt/hfsplus", 0x2282a60, MS_MGC_VAL|MS_REMOUNT, NULL) = -1 EINVAL (Invalid argument) Namely, hfsplus_parse_options_remount() method receives empty "input" argument and return false in such case. As a result, hfsplus_remount() returns -EINVAL error code. This patch fixes the issue by means of return true for the case of empty "input" argument in hfsplus_parse_options_remount() method. Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-04ocfs2: fix quota file corruptionJan Kara
Global quota files are accessed from different nodes. Thus we cannot cache offset of quota structure in the quota file after we drop our node reference count to it because after that moment quota structure may be freed and reallocated elsewhere by a different node resulting in corruption of quota file. Fix the problem by clearing dq_off when we are releasing dquot structure. We also remove the DB_READ_B handling because it is useless - DQ_ACTIVE_B is set iff DQ_READ_B is set. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Goldwyn Rodrigues <rgoldwyn@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-04mm: close PageTail raceDavid Rientjes
Commit bf6bddf1924e ("mm: introduce compaction and migration for ballooned pages") introduces page_count(page) into memory compaction which dereferences page->first_page if PageTail(page). This results in a very rare NULL pointer dereference on the aforementioned page_count(page). Indeed, anything that does compound_head(), including page_count() is susceptible to racing with prep_compound_page() and seeing a NULL or dangling page->first_page pointer. This patch uses Andrea's implementation of compound_trans_head() that deals with such a race and makes it the default compound_head() implementation. This includes a read memory barrier that ensures that if PageTail(head) is true that we return a head page that is neither NULL nor dangling. The patch then adds a store memory barrier to prep_compound_page() to ensure page->first_page is set. This is the safest way to ensure we see the head page that we are expecting, PageTail(page) is already in the unlikely() path and the memory barriers are unfortunately required. Hugetlbfs is the exception, we don't enforce a store memory barrier during init since no race is possible. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Holger Kiehl <Holger.Kiehl@dwd.de> Cc: Christoph Lameter <cl@linux.com> Cc: Rafael Aquini <aquini@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-02NFS: Fix a delegation callback raceTrond Myklebust
The clean-up in commit 36281caa839f ended up removing a NULL pointer check that is needed in order to prevent an Oops in nfs_async_inode_return_delegation(). Reported-by: "Yan, Zheng" <zheng.z.yan@intel.com> Link: http://lkml.kernel.org/r/5313E9F6.2020405@intel.com Fixes: 36281caa839f (NFSv4: Further clean-ups of delegation stateid validation) Cc: stable@vger.kernel.org # 3.4+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-03-02Merge tag 'driver-core-3.14-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull sysfs fix from Greg KH: "Here is a single sysfs fix for 3.14-rc5. It fixes a reported problem with the namespace code in sysfs" * tag 'driver-core-3.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: sysfs: fix namespace refcnt leak
2014-03-01NFSv4: Fix another nfs4_sequence corruptorTrond Myklebust
nfs4_release_lockowner needs to set the rpc_message reply to point to the nfs4_sequence_res in order to avoid another Oopsable situation in nfs41_assign_slot. Fixes: fbd4bfd1d9d21 (NFS: Add nfs4_sequence calls for RELEASE_LOCKOWNER) Cc: stable@vger.kernel.org # 3.12+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-02-27Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull filesystem fixes from Jan Kara: "Notification, writeback, udf, quota fixes The notification patches are (with one exception) a fallout of my fsnotify rework which went into -rc1 (I've extented LTP to cover these cornercases to avoid similar breakage in future). The UDF patch is a nasty data corruption Al has recently reported, the revert of the writeback patch is due to possibility of violating sync(2) guarantees, and a quota bug can lead to corruption of quota files in ocfs2" * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fsnotify: Allocate overflow events with proper type fanotify: Handle overflow in case of permission events fsnotify: Fix detection whether overflow event is queued Revert "writeback: do not sync data dirtied after sync start" quota: Fix race between dqput() and dquot_scan_active() udf: Fix data corruption on file type conversion inotify: Fix reporting of cookies for inotify events
2014-02-25sysfs: fix namespace refcnt leakLi Zefan
As mount() and kill_sb() is not a one-to-one match, we shoudn't get ns refcnt unconditionally in sysfs_mount(), and instead we should get the refcnt only when kernfs_mount() allocated a new superblock. v2: - Changed the name of the new argument, suggested by Tejun. - Made the argument optional, suggested by Tejun. v3: - Make the new argument as second-to-last arg, suggested by Tejun. Signed-off-by: Li Zefan <lizefan@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> --- fs/kernfs/mount.c | 8 +++++++- fs/sysfs/mount.c | 5 +++-- include/linux/kernfs.h | 9 +++++---- 3 files changed, 15 insertions(+), 7 deletions(-) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-25fsnotify: Allocate overflow events with proper typeJan Kara
Commit 7053aee26a35 "fsnotify: do not share events between notification groups" used overflow event statically allocated in a group with the size of the generic notification event. This causes problems because some code looks at type specific parts of event structure and gets confused by a random data it sees there and causes crashes. Fix the problem by allocating overflow event with type corresponding to the group type so code cannot get confused. Signed-off-by: Jan Kara <jack@suse.cz>
2014-02-25fanotify: Handle overflow in case of permission eventsJan Kara
If the event queue overflows when we are handling permission event, we will never get response from userspace. So we must avoid waiting for it. Change fsnotify_add_notify_event() to return whether overflow has happened so that we can detect it in fanotify_handle_event() and act accordingly. Signed-off-by: Jan Kara <jack@suse.cz>
2014-02-25fsnotify: Fix detection whether overflow event is queuedJan Kara
Currently we didn't initialize event's list head when we removed it from the event list. Thus a detection whether overflow event is already queued wasn't working. Fix it by always initializing the list head when deleting event from a list. Signed-off-by: Jan Kara <jack@suse.cz>
2014-02-22Merge branch 'xfs-fixes-for-3.14-rc4' of git://oss.sgi.com/xfs/xfsLinus Torvalds
Pull xfs fixes from Dave Chinner: "This is the first pull request I've had to do for you, so I'm still sorting things out. The reason I'm sending this and not Ben should be obvious from the first commit below - SGI has stepped down from the XFS maintainership role. As such, I'd like to take another opportunity to thank them for their many years of effort maintaining XFS and supporting the XFS community that they developed from the ground up. So I haven't had time to work things like signed tags into my workflows yet, so this is just a repo branch I'm asking you to pull from. And yes, I named the branch -rc4 because I wanted the fixes in rc4, not because the branch was for merging into -rc3. Probably not right, either. Anyway, I should have everything sorted out by the time the next merge window comes around. If there's anything that you don't like in the pull req, feel free to flame me unmercifully. The changes are fixes for recent regressions and important thinkos in verification code: - a log vector buffer alignment issue on ia32 - timestamps on truncate got mangled - primary superblock CRC validation fixes and error message sanitisation" * 'xfs-fixes-for-3.14-rc4' of git://oss.sgi.com/xfs/xfs: xfs: limit superblock corruption errors to actual corruption xfs: skip verification on initial "guess" superblock read MAINTAINERS: SGI no longer maintaining XFS xfs: xfs_sb_read_verify() doesn't flag bad crcs on primary sb xfs: ensure correct log item buffer alignment xfs: ensure correct timestamp updates from truncate
2014-02-22Revert "writeback: do not sync data dirtied after sync start"Jan Kara
This reverts commit c4a391b53a72d2df4ee97f96f78c1d5971b47489. Dave Chinner <david@fromorbit.com> has reported the commit may cause some inodes to be left out from sync(2). This is because we can call redirty_tail() for some inode (which sets i_dirtied_when to current time) after sync(2) has started or similarly requeue_inode() can set i_dirtied_when to current time if writeback had to skip some pages. The real problem is in the functions clobbering i_dirtied_when but fixing that isn't trivial so revert is a safer choice for now. CC: stable@vger.kernel.org # >= 3.13 Signed-off-by: Jan Kara <jack@suse.cz>
2014-02-21bio-integrity: Drop bio_integrity_verify BUG_ON in post bip->bip_iter worldNicholas Bellinger
Given that bip->bip_iter.bi_size is decremented after bio_advance() -> bio_integrity_advance() is called, the BUG_ON() in bio_integrity_verify() ends up tripping in v3.14-rc1 code with the advent of immutable biovecs in: commit d57a5f7c6605f15f3b5134837e68b448a7cea88e Author: Kent Overstreet <kmo@daterainc.com> Date: Sat Nov 23 17:20:16 2013 -0800 bio-integrity: Convert to bvec_iter Given that there is no easy way to ascertain the original bi_size value, go ahead and drop this BUG_ON(). Reported-by: Sagi Grimberg <sagig@dev.mellanox.co.il> Reported-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Cc: Kent Overstreet <kmo@daterainc.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-02-20quota: Fix race between dqput() and dquot_scan_active()Jan Kara
Currently last dqput() can race with dquot_scan_active() causing it to call callback for an already deactivated dquot. The race is as follows: CPU1 CPU2 dqput() spin_lock(&dq_list_lock); if (atomic_read(&dquot->dq_count) > 1) { - not taken if (test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) { spin_unlock(&dq_list_lock); ->release_dquot(dquot); if (atomic_read(&dquot->dq_count) > 1) - not taken dquot_scan_active() spin_lock(&dq_list_lock); if (!test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) - not taken atomic_inc(&dquot->dq_count); spin_unlock(&dq_list_lock); - proceeds to release dquot ret = fn(dquot, priv); - called for inactive dquot Fix the problem by making sure possible ->release_dquot() is finished by the time we call the callback and new calls to it will notice reference dquot_scan_active() has taken and bail out. CC: stable@vger.kernel.org # >= 2.6.29 Signed-off-by: Jan Kara <jack@suse.cz>
2014-02-20udf: Fix data corruption on file type conversionJan Kara
UDF has two types of files - files with data stored in inode (ICB in UDF terminology) and files with data stored in external data blocks. We convert file from in-inode format to external format in udf_file_aio_write() when we find out data won't fit into inode any longer. However the following race between two O_APPEND writes can happen: CPU1 CPU2 udf_file_aio_write() udf_file_aio_write() down_write(&iinfo->i_data_sem); checks that i_size + count1 fits within inode => no need to convert up_write(&iinfo->i_data_sem); down_write(&iinfo->i_data_sem); checks that i_size + count2 fits within inode => no need to convert up_write(&iinfo->i_data_sem); generic_file_aio_write() - extends file by count1 bytes generic_file_aio_write() - extends file by count2 bytes Clearly if count1 + count2 doesn't fit into the inode, we overwrite kernel buffers beyond inode, possibly corrupting the filesystem as well. Fix the problem by acquiring i_mutex before checking whether write fits into the inode and using __generic_file_aio_write() afterwards which puts check and write into one critical section. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Jan Kara <jack@suse.cz>
2014-02-20Merge branch 'for-3.14-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "Quite a few fixes this time. Three locking fixes, all marked for -stable. A couple error path fixes and some misc fixes. Hugh found a bug in memcg offlining sequence and we thought we could fix that from cgroup core side but that turned out to be insufficient and got reverted. A different fix has been applied to -mm" * 'for-3.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: update cgroup_enable_task_cg_lists() to grab siglock Revert "cgroup: use an ordered workqueue for cgroup destruction" cgroup: protect modifications to cgroup_idr with cgroup_mutex cgroup: fix locking in cgroup_cfts_commit() cgroup: fix error return from cgroup_create() cgroup: fix error return value in cgroup_mount() cgroup: use an ordered workqueue for cgroup destruction nfs: include xattr.h from fs/nfs/nfs3proc.c cpuset: update MAINTAINERS entry arm, pm, vmpressure: add missing slab.h includes
2014-02-19Merge tag 'nfs-for-3.14-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: "Highlights include stable fixes for the following bugs: - General performance regression due to NFS_INO_INVALID_LABEL being set when the server doesn't support labeled NFS - Hang in the RPC code due to a socket out-of-buffer race - Infinite loop when trying to establish the NFSv4 lease - Use-after-free bug in the RPCSEC gss code. - nfs4_select_rw_stateid is returning with a non-zero error value on success Other bug fixes: - Potential memory scribble in the RPC bi-directional RPC code - Pipe version reference leak - Use the correct net namespace in the new NFSv4 migration code" * tag 'nfs-for-3.14-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS fix error return in nfs4_select_rw_stateid NFSv4: Use the correct net namespace in nfs4_update_server SUNRPC: Fix a pipe_version reference leak SUNRPC: Ensure that gss_auth isn't freed before its upcall messages SUNRPC: Fix potential memory scribble in xprt_free_bc_request() SUNRPC: Fix races in xs_nospace() SUNRPC: Don't create a gss auth cache unless rpc.gssd is running NFS: Do not set NFS_INO_INVALID_LABEL unless server supports labeled NFS
2014-02-19NFS fix error return in nfs4_select_rw_stateidAndy Adamson
Do not return an error when nfs4_copy_delegation_stateid succeeds. Signed-off-by: Andy Adamson <andros@netapp.com> Link: http://lkml.kernel.org/r/1392737765-41942-1-git-send-email-andros@netapp.com Fixes: ef1820f9be27b (NFSv4: Don't try to recover NFSv4 locks when...) Cc: NeilBrown <neilb@suse.de> Cc: stable@vger.kernel.org # 3.12+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-02-19xfs: limit superblock corruption errors to actual corruptionEric Sandeen
Today, if xfs_sb_read_verify xfs_sb_verify xfs_mount_validate_sb detects superblock corruption, it'll be extremely noisy, dumping 2 stacks, 2 hexdumps, etc. This is because we call XFS_CORRUPTION_ERROR in xfs_mount_validate_sb as well as in xfs_sb_read_verify. Also, *any* errors in xfs_mount_validate_sb which are not corruption per se; things like too-big-blocksize, bad version, bad magic, v1 dirs, rw-incompat etc - things which do not return EFSCORRUPTED - will still do the whole XFS_CORRUPTION_ERROR spew when xfs_sb_read_verify sees any error at all. And it suggests to the user that they should run xfs_repair, even if the root cause of the mount failure is a simple incompatibility. I'll submit that the probably-not-corrupted errors don't warrant this much noise, so this patch removes the warning for anything other than EFSCORRUPTED returns, and replaces the lower-level XFS_CORRUPTION_ERROR with an xfs_notice(). Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-19xfs: skip verification on initial "guess" superblock readEric Sandeen
When xfs_readsb() does the very first read of the superblock, it makes a guess at the length of the buffer, based on the sector size of the underlying storage. This may or may not match the filesystem sector size in sb_sectsize, so we can't i.e. do a CRC check on it; it might be too short. In fact, mounting a filesystem with sb_sectsize larger than the device sector size will cause a mount failure if CRCs are enabled, because we are checksumming a length which exceeds the buffer passed to it. So always read twice; the first time we read with NULL buffer ops to skip verification; then set the proper read length, hook up the proper verifier, and give it another go. Once we are sure that we've got the right buffer length, we can also use bp->b_length in the xfs_sb_read_verify, rather than the less-trusted on-disk sectorsize for secondary superblocks. Before this we ran the risk of passing junk to the crc32c routines, which didn't always handle extreme values. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-19xfs: xfs_sb_read_verify() doesn't flag bad crcs on primary sbEric Sandeen
My earlier commit 10e6e65 deserves a layer or two of brown paper bags. The logic in that commit means that a CRC failure on the primary superblock will *never* result in an error return. Hopefully this fixes it, so that we always return the error if it's a primary superblock, otherwise only if the filesystem has CRCs enabled. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2014-02-18Merge tag 'jfs-3.14-rc4' of git://github.com/kleikamp/linux-shaggyLinus Torvalds
Pull jfs fix from David Kleikamp: "Another ACL regression. This one more subtle" * tag 'jfs-3.14-rc4' of git://github.com/kleikamp/linux-shaggy: jfs: set i_ctime when setting ACL
2014-02-18Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Miscellaneous ext4 bug fixes for v3.14" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: jbd2: fix use after free in jbd2_journal_start_reserved() ext4: don't leave i_crtime.tv_sec uninitialized ext4: fix online resize with a non-standard blocks per group setting ext4: fix online resize with very large inode tables ext4: don't try to modify s_flags if the the file system is read-only ext4: fix error paths in swap_inode_boot_loader() ext4: fix xfstest generic/299 block validity failures
2014-02-18inotify: Fix reporting of cookies for inotify eventsJan Kara
My rework of handling of notification events (namely commit 7053aee26a35 "fsnotify: do not share events between notification groups") broke sending of cookies with inotify events. We didn't propagate the value passed to fsnotify() properly and passed 4 uninitialized bytes to userspace instead (so it is also an information leak). Sadly I didn't notice this during my testing because inotify cookies aren't used very much and LTP inotify tests ignore them. Fix the problem by passing the cookie value properly. Fixes: 7053aee26a3548ebaba046ae2e52396ccf56ac6c Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>
2014-02-17jbd2: fix use after free in jbd2_journal_start_reserved()Dan Carpenter
If start_this_handle() fails then it leads to a use after free of "handle". Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2014-02-17Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph fixes from Sage Weil: "We have some patches fixing up ACL support issues from Zheng and Guangliang and a mount option to enable/disable this support. (These fixes were somewhat delayed by the Chinese holiday.) There is also a small fix for cached readdir handling when directories are fragmented" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: fix __dcache_readdir() ceph: add acl, noacl options for cephfs mount ceph: make ceph_forget_all_cached_acls() static inline ceph: add missing init_acl() for mkdir() and atomic_open() ceph: fix ceph_set_acl() ceph: fix ceph_removexattr() ceph: remove xattr when null value is given to setxattr() ceph: properly handle XATTR_CREATE and XATTR_REPLACE
2014-02-17Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull CIFS fixes from Steve French: "Three cifs fixes, the most important fixing the problem with passing bogus pointers with writev (CVE-2014-0069). Two additional cifs fixes are still in review (including the fix for an append problem which Al also discovered)" * 'for-linus' of git://git.samba.org/sfrench/cifs-2.6: CIFS: Fix too big maxBuf size for SMB3 mounts cifs: ensure that uncached writes handle unmapped areas correctly [CIFS] Fix cifsacl mounts over smb2 to not call cifs
2014-02-17FS-Cache: Handle removal of unadded object to the fscache_object_list rb treeDavid Howells
When FS-Cache allocates an object, the following sequence of events can occur: -->fscache_alloc_object() -->cachefiles_alloc_object() [via cache->ops->alloc_object] <--[returns new object] -->fscache_attach_object() <--[failed] -->cachefiles_put_object() [via cache->ops->put_object] -->fscache_object_destroy() -->fscache_objlist_remove() -->rb_erase() to remove the object from fscache_object_list. resulting in a crash in the rbtree code. The problem is that the object is only added to fscache_object_list on the success path of fscache_attach_object() where it calls fscache_objlist_add(). So if fscache_attach_object() fails, the object won't have been added to the objlist rbtree. We do, however, unconditionally try to remove the object from the tree. Thanks to NeilBrown for finding this and suggesting this solution. Reported-by: NeilBrown <neilb@suse.de> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: (a customer of) NeilBrown <neilb@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-17reiserfs: fix utterly brain-damaged indentation.Dave Jones
This has been this way for years, and every time I stumble across it I lose my lunch. After coming across it for the nth time in the Coverity results, I had to overcome the bystander effect and do something about it. This ignores the 79 column limit in favor of making it look like C instead of gibberish. The correct thing to do here would be to lose some of the indentation by breaking this function up into several smaller ones. I might do that at some point if I have the stomach to look at this again. (Also some of those overlong ternary operations would likely be more readable as regular if's) Signed-off-by: Dave Jones <davej@fedoraproject.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-17ceph: fix __dcache_readdir()Yan, Zheng
If directory is fragmented, readdir() read its dirfrags one by one. After reading all dirfrags, the corresponding dentries are sorted in (frag_t, off) order in the dcache. If dentries of a directory are all cached, __dcache_readdir() can use the cached dentries to satisfy readdir syscall. But when checking if a given dentry is after the position of readdir, __dcache_readdir() compares numerical value of frag_t directly. This is wrong, it should use ceph_frag_compare(). Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-02-17ceph: add acl, noacl options for cephfs mountSage Weil
Make the 'acl' option dependent on having ACL support compiled in. Make the 'noacl' option work even without it so that one can always ask it to be off and not error out on mount when it is not supported. Signed-off-by: Guangliang Zhao <lucienchao@gmail.com> Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-17ceph: make ceph_forget_all_cached_acls() static inlineGuangliang Zhao
Signed-off-by: Guangliang Zhao <lucienchao@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org> Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-17ceph: add missing init_acl() for mkdir() and atomic_open()Yan, Zheng
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-02-17ceph: fix ceph_set_acl()Yan, Zheng
If acl is equivalent to file mode permission bits, ceph_set_acl() needs to remove any existing acl xattr. Use __ceph_setxattr() to handle both setting and removing acl xattr cases, it doesn't return -ENODATA when there is no acl xattr. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-02-17ceph: fix ceph_removexattr()Yan, Zheng
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-02-17ceph: remove xattr when null value is given to setxattr()Yan, Zheng
For the setxattr request, introduce a new flag CEPH_XATTR_REMOVE to distinguish null value case from the zero-length value case. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-02-17ceph: properly handle XATTR_CREATE and XATTR_REPLACEYan, Zheng
return -EEXIST if XATTR_CREATE is set and xattr alread exists. return -ENODATA if XATTR_REPLACE is set but xattr does not exist. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-02-17NFSv4: Use the correct net namespace in nfs4_update_serverTrond Myklebust
We need to use the same net namespace that was used to resolve the hostname and sockaddr arguments. Fixes: 32e62b7c3ef09 (NFS: Add nfs4_update_server) Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-02-16ext4: don't leave i_crtime.tv_sec uninitializedTheodore Ts'o
If the i_crtime field is not present in the inode, don't leave the field uninitialized. Fixes: ef7f38359 ("ext4: Add nanosecond timestamps") Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Tested-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2014-02-16Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "We have a small collection of fixes in my for-linus branch. The big thing that stands out is a revert of a new ioctl. Users haven't shipped yet in btrfs-progs, and Dave Sterba found a better way to export the information" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: use right clone root offset for compressed extents btrfs: fix null pointer deference at btrfs_sysfs_add_one+0x105 Btrfs: unset DCACHE_DISCONNECTED when mounting default subvol Btrfs: fix max_inline mount option Btrfs: fix a lockdep warning when cleaning up aborted transaction Revert "btrfs: add ioctl to export size of global metadata reservation"
2014-02-15ext4: fix online resize with a non-standard blocks per group settingTheodore Ts'o
The set_flexbg_block_bitmap() function assumed that the number of blocks in a blockgroup was sb->blocksize * 8, which is normally true, but not always! Use EXT4_BLOCKS_PER_GROUP(sb) instead, to fix block bitmap corruption after: mke2fs -t ext4 -g 3072 -i 4096 /dev/vdd 1G mount -t ext4 /dev/vdd /vdd resize2fs /dev/vdd 8G Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reported-by: Jon Bernard <jbernard@tuxion.com> Cc: stable@vger.kernel.org
2014-02-15ext4: fix online resize with very large inode tablesTheodore Ts'o
If a file system has a large number of inodes per block group, all of the metadata blocks in a flex_bg may be larger than what can fit in a single block group. Unfortunately, ext4_alloc_group_tables() in resize.c was never tested to see if it would handle this case correctly, and there were a large number of bugs which caused the following sequence to result in a BUG_ON: kernel bug at fs/ext4/resize.c:409! ... call trace: [<ffffffff81256768>] ext4_flex_group_add+0x1448/0x1830 [<ffffffff81257de2>] ext4_resize_fs+0x7b2/0xe80 [<ffffffff8123ac50>] ext4_ioctl+0xbf0/0xf00 [<ffffffff811c111d>] do_vfs_ioctl+0x2dd/0x4b0 [<ffffffff811b9df2>] ? final_putname+0x22/0x50 [<ffffffff811c1371>] sys_ioctl+0x81/0xa0 [<ffffffff81676aa9>] system_call_fastpath+0x16/0x1b code: c8 4c 89 df e8 41 96 f8 ff 44 89 e8 49 01 c4 44 29 6d d4 0 rip [<ffffffff81254fa1>] set_flexbg_block_bitmap+0x171/0x180 This can be reproduced with the following command sequence: mke2fs -t ext4 -i 4096 /dev/vdd 1G mount -t ext4 /dev/vdd /vdd resize2fs /dev/vdd 8G To fix this, we need to make sure the right thing happens when a block group's inode table straddles two block groups, which means the following bugs had to be fixed: 1) Not clearing the BLOCK_UNINIT flag in the second block group in ext4_alloc_group_tables --- the was proximate cause of the BUG_ON. 2) Incorrectly determining how many block groups contained contiguous free blocks in ext4_alloc_group_tables(). 3) Incorrectly setting the start of the next block range to be marked in use after a discontinuity in setup_new_flex_group_blocks(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2014-02-15Btrfs: use right clone root offset for compressed extentsFilipe David Borba Manana
For non compressed extents, iterate_extent_inodes() gives us offsets that take into account the data offset from the file extent items, while for compressed extents it doesn't. Therefore we have to adjust them before placing them in a send clone instruction. Not doing this adjustment leads to the receiving end requesting for a wrong a file range to the clone ioctl, which results in different file content from the one in the original send root. Issue reproducible with the following excerpt from the test I made for xfstests: _scratch_mkfs _scratch_mount "-o compress-force=lzo" $XFS_IO_PROG -f -c "truncate 118811" $SCRATCH_MNT/foo $XFS_IO_PROG -c "pwrite -S 0x0d -b 39987 92267 39987" $SCRATCH_MNT/foo $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1 $XFS_IO_PROG -c "pwrite -S 0x3e -b 80000 200000 80000" $SCRATCH_MNT/foo $BTRFS_UTIL_PROG filesystem sync $SCRATCH_MNT $XFS_IO_PROG -c "pwrite -S 0xdc -b 10000 250000 10000" $SCRATCH_MNT/foo $XFS_IO_PROG -c "pwrite -S 0xff -b 10000 300000 10000" $SCRATCH_MNT/foo # will be used for incremental send to be able to issue clone operations $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/clones_snap $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2 $FSSUM_PROG -A -f -w $tmp/1.fssum $SCRATCH_MNT/mysnap1 $FSSUM_PROG -A -f -w $tmp/2.fssum -x $SCRATCH_MNT/mysnap2/mysnap1 \ -x $SCRATCH_MNT/mysnap2/clones_snap $SCRATCH_MNT/mysnap2 $FSSUM_PROG -A -f -w $tmp/clones.fssum $SCRATCH_MNT/clones_snap \ -x $SCRATCH_MNT/clones_snap/mysnap1 -x $SCRATCH_MNT/clones_snap/mysnap2 $BTRFS_UTIL_PROG send $SCRATCH_MNT/mysnap1 -f $tmp/1.snap $BTRFS_UTIL_PROG send $SCRATCH_MNT/clones_snap -f $tmp/clones.snap $BTRFS_UTIL_PROG send -p $SCRATCH_MNT/mysnap1 \ -c $SCRATCH_MNT/clones_snap $SCRATCH_MNT/mysnap2 -f $tmp/2.snap _scratch_unmount _scratch_mkfs _scratch_mount $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/1.snap $FSSUM_PROG -r $tmp/1.fssum $SCRATCH_MNT/mysnap1 2>> $seqres.full $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/clones.snap $FSSUM_PROG -r $tmp/clones.fssum $SCRATCH_MNT/clones_snap 2>> $seqres.full $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/2.snap $FSSUM_PROG -r $tmp/2.fssum $SCRATCH_MNT/mysnap2 2>> $seqres.full Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>