btrfs: remove ordered extent check and wait during fallocate

For fallocate() we have this loop that checks if we have ordered extents after locking the file range, and if so unlock the range, wait for ordered extents, and retry until we don't find more ordered extents. This logic was needed in the past because: 1) Direct IO writes within the i_size boundary did not take the inode's VFS lock. This was because that lock used to be a mutex, then some years ago it was switched to a rw semaphore (commit 9902af79c01a8e ("parallel lookups: actual switch to rwsem")), and then btrfs was changed to take the VFS inode's lock in shared mode for writes that don't cross the i_size boundary (commit e9adabb9712ef9 ("btrfs: use shared lock for direct writes within EOF")); 2) We could race with memory mapped writes, because memory mapped writes don't acquire the inode's VFS lock. We don't have that race anymore, as we have a rw semaphore to synchronize memory mapped writes with fallocate (and reflinking too). That change happened with commit 8d9b4a162a37ce ("btrfs: exclude mmap from happening during all fallocate operations"). So stop looking for ordered extents after locking the file range when doing a plain fallocate. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
author: Filipe Manana <fdmanana@suse.com> 2022-03-15 15:22:38 +0000
committer: David Sterba <dsterba@suse.com> 2022-05-16 17:03:09 +0200
commit: ffa8fc603d2774ab2b22b18301a12f9d4d2be954 (patch)
tree: a0bdc9c6da403f5ff918f07366425d4bb34766e2 /fs/btrfs/file.c
parent: 1c6cbbbeeeca5702c115f4547fd0f75a7fc0f911 (diff)
1 files changed, 8 insertions, 34 deletions
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index bd638cbb5bda..0a9b334f4970 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -3474,8 +3474,12 @@ static long btrfs_fallocate(struct file *file, int mode,
 	}
 
 	/*
-	 * wait for ordered IO before we have any locks.  We'll loop again
-	 * below with the locks held.
+	 * We have locked the inode at the VFS level (in exclusive mode) and we
+	 * have locked the i_mmap_lock lock (in exclusive mode). Now before
+	 * locking the file range, flush all dealloc in the range and wait for
+	 * all ordered extents in the range to complete. After this we can lock
+	 * the file range and, due to the previous locking we did, we know there
+	 * can't be more delalloc or ordered extents in the range.
 	 */
 	ret = btrfs_wait_ordered_range(inode, alloc_start,
 				       alloc_end - alloc_start);
@@ -3489,38 +3493,8 @@ static long btrfs_fallocate(struct file *file, int mode,
 	}
 
 	locked_end = alloc_end - 1;
-	while (1) {
-		struct btrfs_ordered_extent *ordered;
-
-		/* the extent lock is ordered inside the running
-		 * transaction
-		 */
-		lock_extent_bits(&BTRFS_I(inode)->io_tree, alloc_start,
-				 locked_end, &cached_state);
-		ordered = btrfs_lookup_first_ordered_extent(BTRFS_I(inode),
-							    locked_end);
-
-		if (ordered &&
-		    ordered->file_offset + ordered->num_bytes > alloc_start &&
-		    ordered->file_offset < alloc_end) {
-			btrfs_put_ordered_extent(ordered);
-			unlock_extent_cached(&BTRFS_I(inode)->io_tree,
-					     alloc_start, locked_end,
-					     &cached_state);
-			/*
-			 * we can't wait on the range with the transaction
-			 * running or with the extent lock held
-			 */
-			ret = btrfs_wait_ordered_range(inode, alloc_start,
-						       alloc_end - alloc_start);
-			if (ret)
-				goto out;
-		} else {
-			if (ordered)
-				btrfs_put_ordered_extent(ordered);
-			break;
-		}
-	}
+	lock_extent_bits(&BTRFS_I(inode)->io_tree, alloc_start, locked_end,
+			 &cached_state);
 
 	/* First, check if we exceed the qgroup limit */
 	INIT_LIST_HEAD(&reserve_list);
author	Filipe Manana <fdmanana@suse.com>	2022-03-15 15:22:38 +0000
committer	David Sterba <dsterba@suse.com>	2022-05-16 17:03:09 +0200
commit	ffa8fc603d2774ab2b22b18301a12f9d4d2be954 (patch)
tree	a0bdc9c6da403f5ff918f07366425d4bb34766e2 /fs/btrfs/file.c
parent	1c6cbbbeeeca5702c115f4547fd0f75a7fc0f911 (diff)