summaryrefslogtreecommitdiff
path: root/Documentation/filesystems
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2020-06-08 12:47:09 -0700
committerLinus Torvalds <torvalds@linux-foundation.org>2020-06-08 12:47:09 -0700
commitca687877e05ad1bf5b4cefd9cdd091044626deac (patch)
tree6b61bf62f2d87729fcbdd125f483792a71165016 /Documentation/filesystems
parent23fc02e36e4f657af242e59175c891b27c704935 (diff)
parent300e549b6e53025ea69550f009451f7a13bfc3eb (diff)
Merge tag 'gfs2-for-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Andreas Gruenbacher: - An iopen glock locking scheme rework that speeds up deletes of inodes accessed from multiple nodes - Various bug fixes and debugging improvements - Convert gfs2-glocks.txt to ReST * tag 'gfs2-for-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: fix use-after-free on transaction ail lists gfs2: new slab for transactions gfs2: initialize transaction tr_ailX_lists earlier gfs2: Smarter iopen glock waiting gfs2: Wake up when setting GLF_DEMOTE gfs2: Check inode generation number in delete_work_func gfs2: Move inode generation number check into gfs2_inode_lookup gfs2: Minor gfs2_lookup_by_inum cleanup gfs2: Try harder to delete inodes locally gfs2: Give up the iopen glock on contention gfs2: Turn gl_delete into a delayed work gfs2: Keep track of deleted inode generations in LVBs gfs2: Allow ASPACE glocks to also have an lvb gfs2: instrumentation wrt log_flush stuck gfs2: introduce new gfs2_glock_assert_withdraw gfs2: print mapping->nrpages in glock dump for address space glocks gfs2: Only do glock put in gfs2_create_inode for free inodes gfs2: Allow lock_nolock mount to specify jid=X gfs2: Don't ignore inode write errors during inode_go_sync docs: filesystems: convert gfs2-glocks.txt to ReST
Diffstat (limited to 'Documentation/filesystems')
-rw-r--r--Documentation/filesystems/gfs2-glocks.rst (renamed from Documentation/filesystems/gfs2-glocks.txt)149
-rw-r--r--Documentation/filesystems/index.rst1
2 files changed, 86 insertions, 64 deletions
diff --git a/Documentation/filesystems/gfs2-glocks.txt b/Documentation/filesystems/gfs2-glocks.rst
index 7059623635b2..d14f230f0b12 100644
--- a/Documentation/filesystems/gfs2-glocks.txt
+++ b/Documentation/filesystems/gfs2-glocks.rst
@@ -1,5 +1,8 @@
- Glock internal locking rules
- ------------------------------
+.. SPDX-License-Identifier: GPL-2.0
+
+============================
+Glock internal locking rules
+============================
This documents the basic principles of the glock state machine
internals. Each glock (struct gfs2_glock in fs/gfs2/incore.h)
@@ -24,24 +27,28 @@ There are three lock states that users of the glock layer can request,
namely shared (SH), deferred (DF) and exclusive (EX). Those translate
to the following DLM lock modes:
-Glock mode | DLM lock mode
-------------------------------
- UN | IV/NL Unlocked (no DLM lock associated with glock) or NL
- SH | PR (Protected read)
- DF | CW (Concurrent write)
- EX | EX (Exclusive)
+========== ====== =====================================================
+Glock mode DLM lock mode
+========== ====== =====================================================
+ UN IV/NL Unlocked (no DLM lock associated with glock) or NL
+ SH PR (Protected read)
+ DF CW (Concurrent write)
+ EX EX (Exclusive)
+========== ====== =====================================================
Thus DF is basically a shared mode which is incompatible with the "normal"
shared lock mode, SH. In GFS2 the DF mode is used exclusively for direct I/O
operations. The glocks are basically a lock plus some routines which deal
with cache management. The following rules apply for the cache:
-Glock mode | Cache data | Cache Metadata | Dirty Data | Dirty Metadata
---------------------------------------------------------------------------
- UN | No | No | No | No
- SH | Yes | Yes | No | No
- DF | No | Yes | No | No
- EX | Yes | Yes | Yes | Yes
+========== ========== ============== ========== ==============
+Glock mode Cache data Cache Metadata Dirty Data Dirty Metadata
+========== ========== ============== ========== ==============
+ UN No No No No
+ SH Yes Yes No No
+ DF No Yes No No
+ EX Yes Yes Yes Yes
+========== ========== ============== ========== ==============
These rules are implemented using the various glock operations which
are defined for each type of glock. Not all types of glocks use
@@ -49,21 +56,23 @@ all the modes. Only inode glocks use the DF mode for example.
Table of glock operations and per type constants:
-Field | Purpose
-----------------------------------------------------------------------------
-go_xmote_th | Called before remote state change (e.g. to sync dirty data)
-go_xmote_bh | Called after remote state change (e.g. to refill cache)
-go_inval | Called if remote state change requires invalidating the cache
-go_demote_ok | Returns boolean value of whether its ok to demote a glock
- | (e.g. checks timeout, and that there is no cached data)
-go_lock | Called for the first local holder of a lock
-go_unlock | Called on the final local unlock of a lock
-go_dump | Called to print content of object for debugfs file, or on
- | error to dump glock to the log.
-go_type | The type of the glock, LM_TYPE_.....
-go_callback | Called if the DLM sends a callback to drop this lock
-go_flags | GLOF_ASPACE is set, if the glock has an address space
- | associated with it
+============= =============================================================
+Field Purpose
+============= =============================================================
+go_xmote_th Called before remote state change (e.g. to sync dirty data)
+go_xmote_bh Called after remote state change (e.g. to refill cache)
+go_inval Called if remote state change requires invalidating the cache
+go_demote_ok Returns boolean value of whether its ok to demote a glock
+ (e.g. checks timeout, and that there is no cached data)
+go_lock Called for the first local holder of a lock
+go_unlock Called on the final local unlock of a lock
+go_dump Called to print content of object for debugfs file, or on
+ error to dump glock to the log.
+go_type The type of the glock, ``LM_TYPE_*``
+go_callback Called if the DLM sends a callback to drop this lock
+go_flags GLOF_ASPACE is set, if the glock has an address space
+ associated with it
+============= =============================================================
The minimum hold time for each lock is the time after a remote lock
grant for which we ignore remote demote requests. This is in order to
@@ -82,21 +91,25 @@ rather than via the glock.
Locking rules for glock operations:
-Operation | GLF_LOCK bit lock held | gl_lockref.lock spinlock held
--------------------------------------------------------------------------
-go_xmote_th | Yes | No
-go_xmote_bh | Yes | No
-go_inval | Yes | No
-go_demote_ok | Sometimes | Yes
-go_lock | Yes | No
-go_unlock | Yes | No
-go_dump | Sometimes | Yes
-go_callback | Sometimes (N/A) | Yes
-
-N.B. Operations must not drop either the bit lock or the spinlock
-if its held on entry. go_dump and do_demote_ok must never block.
-Note that go_dump will only be called if the glock's state
-indicates that it is caching uptodate data.
+============= ====================== =============================
+Operation GLF_LOCK bit lock held gl_lockref.lock spinlock held
+============= ====================== =============================
+go_xmote_th Yes No
+go_xmote_bh Yes No
+go_inval Yes No
+go_demote_ok Sometimes Yes
+go_lock Yes No
+go_unlock Yes No
+go_dump Sometimes Yes
+go_callback Sometimes (N/A) Yes
+============= ====================== =============================
+
+.. Note::
+
+ Operations must not drop either the bit lock or the spinlock
+ if its held on entry. go_dump and do_demote_ok must never block.
+ Note that go_dump will only be called if the glock's state
+ indicates that it is caching uptodate data.
Glock locking order within GFS2:
@@ -104,7 +117,7 @@ Glock locking order within GFS2:
2. Rename glock (for rename only)
3. Inode glock(s)
(Parents before children, inodes at "same level" with same parent in
- lock number order)
+ lock number order)
4. Rgrp glock(s) (for (de)allocation operations)
5. Transaction glock (via gfs2_trans_begin) for non-read operations
6. i_rw_mutex (if required)
@@ -117,8 +130,8 @@ determine the lifetime of the inode in question. Locking of inodes
is on a per-inode basis. Locking of rgrps is on a per rgrp basis.
In general we prefer to lock local locks prior to cluster locks.
- Glock Statistics
- ------------------
+Glock Statistics
+----------------
The stats are divided into two sets: those relating to the
super block and those relating to an individual glock. The
@@ -173,8 +186,8 @@ we'd like to get a better idea of these timings:
1. To be able to better set the glock "min hold time"
2. To spot performance issues more easily
3. To improve the algorithm for selecting resource groups for
-allocation (to base it on lock wait time, rather than blindly
-using a "try lock")
+ allocation (to base it on lock wait time, rather than blindly
+ using a "try lock")
Due to the smoothing action of the updates, a step change in
some input quantity being sampled will only fully be taken
@@ -195,10 +208,13 @@ as possible. There are always inaccuracies in any
measuring system, but I hope this is as accurate as we
can reasonably make it.
-Per sb stats can be found here:
-/sys/kernel/debug/gfs2/<fsname>/sbstats
-Per glock stats can be found here:
-/sys/kernel/debug/gfs2/<fsname>/glstats
+Per sb stats can be found here::
+
+ /sys/kernel/debug/gfs2/<fsname>/sbstats
+
+Per glock stats can be found here::
+
+ /sys/kernel/debug/gfs2/<fsname>/glstats
Assuming that debugfs is mounted on /sys/kernel/debug and also
that <fsname> is replaced with the name of the gfs2 filesystem
@@ -206,14 +222,16 @@ in question.
The abbreviations used in the output as are follows:
-srtt - Smoothed round trip time for non-blocking dlm requests
-srttvar - Variance estimate for srtt
-srttb - Smoothed round trip time for (potentially) blocking dlm requests
-srttvarb - Variance estimate for srttb
-sirt - Smoothed inter-request time (for dlm requests)
-sirtvar - Variance estimate for sirt
-dlm - Number of dlm requests made (dcnt in glstats file)
-queue - Number of glock requests queued (qcnt in glstats file)
+========= ================================================================
+srtt Smoothed round trip time for non blocking dlm requests
+srttvar Variance estimate for srtt
+srttb Smoothed round trip time for (potentially) blocking dlm requests
+srttvarb Variance estimate for srttb
+sirt Smoothed inter request time (for dlm requests)
+sirtvar Variance estimate for sirt
+dlm Number of dlm requests made (dcnt in glstats file)
+queue Number of glock requests queued (qcnt in glstats file)
+========= ================================================================
The sbstats file contains a set of these stats for each glock type (so 8 lines
for each type) and for each cpu (one column per cpu). The glstats file contains
@@ -224,9 +242,12 @@ The gfs2_glock_lock_time tracepoint prints out the current values of the stats
for the glock in question, along with some addition information on each dlm
reply that is received:
-status - The status of the dlm request
-flags - The dlm request flags
-tdiff - The time taken by this specific request
+====== =======================================
+status The status of the dlm request
+flags The dlm request flags
+tdiff The time taken by this specific request
+====== =======================================
+
(remaining fields as per above list)
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 17795341e0a3..4c536e66dc4c 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -88,6 +88,7 @@ Documentation for filesystem implementations.
f2fs
gfs2
gfs2-uevents
+ gfs2-glocks
hfs
hfsplus
hpfs