fs/jbd2, locking/mutex, sched/wait: Use mutex_lock_io() for journal->j_checkpoint_mutex

When an ext4 fs is bogged down by a lot of metadata IOs (in the reported case, it was deletion of millions of files, but any massive amount of journal writes would do), after the journal is filled up, tasks which try to access the filesystem and aren't currently performing the journal writes end up waiting in __jbd2_log_wait_for_space() for journal->j_checkpoint_mutex. Because those mutex sleeps aren't marked as iowait, this condition can lead to misleadingly low iowait and /proc/stat:procs_blocked. While iowait propagation is far from strict, this condition can be triggered fairly easily and annotating these sleeps correctly helps initial diagnosis quite a bit. Use the new mutex_lock_io() for journal->j_checkpoint_mutex so that these sleeps are properly marked as iowait. Reported-by: Mingbo Wan <mingbo@fb.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jan Kara <jack@suse.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-team@fb.com Link: http://lkml.kernel.org/r/1477673892-28940-5-git-send-email-tj@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
author: Tejun Heo <tj@kernel.org> 2016-10-28 12:58:12 -0400
committer: Ingo Molnar <mingo@kernel.org> 2017-01-14 11:30:06 +0100
commit: 6fa7aa50b2c48400bbd045daf3a2498882eb0596 (patch)
tree: c4b5c8a06b1afe29d19dc73298b91388ee55f49d
parent: 1460cb65a10f6c7a6e3a1c76513338861a0a43b6 (diff)
2 files changed, 7 insertions, 7 deletions
diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 8c514367ba5a..b6b194ec1b4f 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -393,7 +393,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 	/* Do we need to erase the effects of a prior jbd2_journal_flush? */
 	if (journal->j_flags & JBD2_FLUSHED) {
 		jbd_debug(3, "super block updated\n");
-		mutex_lock(&journal->j_checkpoint_mutex);
+		mutex_lock_io(&journal->j_checkpoint_mutex);
 		/*
 		 * We hold j_checkpoint_mutex so tail cannot change under us.
 		 * We don't need any special data guarantees for writing sb
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index a097048ed1a3..d8a5d0a08f07 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -944,7 +944,7 @@ out:
  */
 void jbd2_update_log_tail(journal_t *journal, tid_t tid, unsigned long block)
 {
-	mutex_lock(&journal->j_checkpoint_mutex);
+	mutex_lock_io(&journal->j_checkpoint_mutex);
 	if (tid_gt(tid, journal->j_tail_sequence))
 		__jbd2_update_log_tail(journal, tid, block);
 	mutex_unlock(&journal->j_checkpoint_mutex);
@@ -1304,7 +1304,7 @@ static int journal_reset(journal_t *journal)
 		journal->j_flags |= JBD2_FLUSHED;
 	} else {
 		/* Lock here to make assertions happy... */
-		mutex_lock(&journal->j_checkpoint_mutex);
+		mutex_lock_io(&journal->j_checkpoint_mutex);
 		/*
 		 * Update log tail information. We use REQ_FUA since new
 		 * transaction will start reusing journal space and so we
@@ -1691,7 +1691,7 @@ int jbd2_journal_destroy(journal_t *journal)
 	spin_lock(&journal->j_list_lock);
 	while (journal->j_checkpoint_transactions != NULL) {
 		spin_unlock(&journal->j_list_lock);
-		mutex_lock(&journal->j_checkpoint_mutex);
+		mutex_lock_io(&journal->j_checkpoint_mutex);
 		err = jbd2_log_do_checkpoint(journal);
 		mutex_unlock(&journal->j_checkpoint_mutex);
 		/*
@@ -1713,7 +1713,7 @@ int jbd2_journal_destroy(journal_t *journal)
 
 	if (journal->j_sb_buffer) {
 		if (!is_journal_aborted(journal)) {
-			mutex_lock(&journal->j_checkpoint_mutex);
+			mutex_lock_io(&journal->j_checkpoint_mutex);
 
 			write_lock(&journal->j_state_lock);
 			journal->j_tail_sequence =
@@ -1955,7 +1955,7 @@ int jbd2_journal_flush(journal_t *journal)
 	spin_lock(&journal->j_list_lock);
 	while (!err && journal->j_checkpoint_transactions != NULL) {
 		spin_unlock(&journal->j_list_lock);
-		mutex_lock(&journal->j_checkpoint_mutex);
+		mutex_lock_io(&journal->j_checkpoint_mutex);
 		err = jbd2_log_do_checkpoint(journal);
 		mutex_unlock(&journal->j_checkpoint_mutex);
 		spin_lock(&journal->j_list_lock);
@@ -1965,7 +1965,7 @@ int jbd2_journal_flush(journal_t *journal)
 	if (is_journal_aborted(journal))
 		return -EIO;
 
-	mutex_lock(&journal->j_checkpoint_mutex);
+	mutex_lock_io(&journal->j_checkpoint_mutex);
 	if (!err) {
 		err = jbd2_cleanup_journal_tail(journal);
 		if (err < 0) {
author	Tejun Heo <tj@kernel.org>	2016-10-28 12:58:12 -0400
committer	Ingo Molnar <mingo@kernel.org>	2017-01-14 11:30:06 +0100
commit	6fa7aa50b2c48400bbd045daf3a2498882eb0596 (patch)
tree	c4b5c8a06b1afe29d19dc73298b91388ee55f49d
parent	1460cb65a10f6c7a6e3a1c76513338861a0a43b6 (diff)