diff options
author | Filipe Manana <fdmanana@suse.com> | 2023-03-06 10:13:34 +0000 |
---|---|---|
committer | David Sterba <dsterba@suse.com> | 2023-03-08 01:14:01 +0100 |
commit | 675dfe1223a69e270b3d52cb0211c8a501455cec (patch) | |
tree | 88aa1cd32c5e8884ca4040824a829f3cc30f7a97 | |
parent | e4cc1483f35940c9288c332dd275f6fad485f8d2 (diff) |
btrfs: fix block group item corruption after inserting new block group
We can often end up inserting a block group item, for a new block group,
with a wrong value for the used bytes field.
This happens if for the new allocated block group, in the same transaction
that created the block group, we have tasks allocating extents from it as
well as tasks removing extents from it.
For example:
1) Task A creates a metadata block group X;
2) Two extents are allocated from block group X, so its "used" field is
updated to 32K, and its "commit_used" field remains as 0;
3) Transaction commit starts, by some task B, and it enters
btrfs_start_dirty_block_groups(). There it tries to update the block
group item for block group X, which currently has its "used" field with
a value of 32K. But that fails since the block group item was not yet
inserted, and so on failure update_block_group_item() sets the
"commit_used" field of the block group back to 0;
4) The block group item is inserted by task A, when for example
btrfs_create_pending_block_groups() is called when releasing its
transaction handle. This results in insert_block_group_item() inserting
the block group item in the extent tree (or block group tree), with a
"used" field having a value of 32K, but without updating the
"commit_used" field in the block group, which remains with value of 0;
5) The two extents are freed from block X, so its "used" field changes
from 32K to 0;
6) The transaction commit by task B continues, it enters
btrfs_write_dirty_block_groups() which calls update_block_group_item()
for block group X, and there it decides to skip the block group item
update, because "used" has a value of 0 and "commit_used" has a value
of 0 too.
As a result, we end up with a block item having a 32K "used" field but
no extents allocated from it.
When this issue happens, a btrfs check reports an error like this:
[1/7] checking root items
[2/7] checking extents
block group [1104150528 1073741824] used 39796736 but extent items used 0
ERROR: errors found in extent allocation tree or chunk allocation
(...)
Fix this by making insert_block_group_item() update the block group's
"commit_used" field.
Fixes: 7248e0cebbef ("btrfs: skip update of block group item if used bytes are the same")
CC: stable@vger.kernel.org # 6.2+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
-rw-r--r-- | fs/btrfs/block-group.c | 13 |
1 files changed, 12 insertions, 1 deletions
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 0eec3084fb00..0ef8b8926bfa 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -2484,18 +2484,29 @@ static int insert_block_group_item(struct btrfs_trans_handle *trans, struct btrfs_block_group_item bgi; struct btrfs_root *root = btrfs_block_group_root(fs_info); struct btrfs_key key; + u64 old_commit_used; + int ret; spin_lock(&block_group->lock); btrfs_set_stack_block_group_used(&bgi, block_group->used); btrfs_set_stack_block_group_chunk_objectid(&bgi, block_group->global_root_id); btrfs_set_stack_block_group_flags(&bgi, block_group->flags); + old_commit_used = block_group->commit_used; + block_group->commit_used = block_group->used; key.objectid = block_group->start; key.type = BTRFS_BLOCK_GROUP_ITEM_KEY; key.offset = block_group->length; spin_unlock(&block_group->lock); - return btrfs_insert_item(trans, root, &key, &bgi, sizeof(bgi)); + ret = btrfs_insert_item(trans, root, &key, &bgi, sizeof(bgi)); + if (ret < 0) { + spin_lock(&block_group->lock); + block_group->commit_used = old_commit_used; + spin_unlock(&block_group->lock); + } + + return ret; } static int insert_dev_extent(struct btrfs_trans_handle *trans, |