author | Linus Torvalds <torvalds@linux-foundation.org> | 2020-01-28 08:46:13 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2020-01-28 08:46:13 -0800 |
commit | d99391ec2b42d827d92003dcdcb96fadac9d862b (patch) | |
tree | 812292b25d209fc3cb63b5fc0633acf77a4d2afb /Documentation | |
parent | 8b561778f29766675e88566215aa835fff9dc1f7 (diff) | |
parent | f8a4bb6bfa639fbdd07aede615be6dffe86a9713 (diff) |
Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
"The RCU changes in this cycle were:
- Expedited grace-period updates
- kfree_rcu() updates
- RCU list updates
- Preemptible RCU updates
- Torture-test updates
- Miscellaneous fixes
- Documentation updates"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (69 commits)
rcu: Remove unused stop-machine #include
powerpc: Remove comment about read_barrier_depends()
.mailmap: Add entries for old paulmck@kernel.org addresses
srcu: Apply *_ONCE() to ->srcu_last_gp_end
rcu: Switch force_qs_rnp() to for_each_leaf_node_cpu_mask()
rcu: Move rcu_{expedited,normal} definitions into rcupdate.h
rcu: Move gp_state_names[] and gp_state_getname() to tree_stall.h
rcu: Remove the declaration of call_rcu() in tree.h
rcu: Fix tracepoint tracking RCU CPU kthread utilization
rcu: Fix harmless omission of "CONFIG_" from #if condition
rcu: Avoid tick_dep_set_cpu() misordering
rcu: Provide wrappers for uses of ->rcu_read_lock_nesting
rcu: Use READ_ONCE() for ->expmask in rcu_read_unlock_special()
rcu: Clear ->rcu_read_unlock_special only once
rcu: Clear .exp_hint only when deferred quiescent state has been reported
rcu: Rename some instance of CONFIG_PREEMPTION to CONFIG_PREEMPT_RCU
rcu: Remove kfree_call_rcu_nobatch()
rcu: Remove kfree_rcu() special casing and lazy-callback handling
rcu: Add support for debug_objects debugging for kfree_rcu()
rcu: Add multiple in-flight batches of kfree_rcu() work
...
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/RCU/NMI-RCU.rst (renamed from Documentation/RCU/NMI-RCU.txt) | 53 |
-rw-r--r-- | Documentation/RCU/arrayRCU.rst (renamed from Documentation/RCU/arrayRCU.txt) | 34 |
-rw-r--r-- | Documentation/RCU/index.rst | 5 |
-rw-r--r-- | Documentation/RCU/lockdep-splat.txt | 2 |
-rw-r--r-- | Documentation/RCU/rcu_dereference.rst (renamed from Documentation/RCU/rcu_dereference.txt) | 75 |
-rw-r--r-- | Documentation/RCU/rcubarrier.rst (renamed from Documentation/RCU/rcubarrier.txt) | 222 |
-rw-r--r-- | Documentation/RCU/stallwarn.txt | 11 |
-rw-r--r-- | Documentation/RCU/whatisRCU.rst (renamed from Documentation/RCU/whatisRCU.txt) | 291 |
-rw-r--r-- | Documentation/admin-guide/kernel-parameters.txt | 13 |
9 files changed, 422 insertions, 284 deletions
diff --git a/Documentation/RCU/NMI-RCU.txt b/Documentation/RCU/NMI-RCU.rst index 881353fd5bff..180958388ff9 100644 --- a/Documentation/RCU/NMI-RCU.txt +++ b/Documentation/RCU/NMI-RCU.rst @@ -1,4 +1,7 @@ +.. _NMI_rcu_doc: + Using RCU to Protect Dynamic NMI Handlers +========================================= Although RCU is usually used to protect read-mostly data structures, @@ -9,7 +12,7 @@ work in "arch/x86/oprofile/nmi_timer_int.c" and in "arch/x86/kernel/traps.c". The relevant pieces of code are listed below, each followed by a -brief explanation. +brief explanation:: static int dummy_nmi_callback(struct pt_regs *regs, int cpu) { @@ -18,12 +21,12 @@ brief explanation. The dummy_nmi_callback() function is a "dummy" NMI handler that does nothing, but returns zero, thus saying that it did nothing, allowing -the NMI handler to take the default machine-specific action. +the NMI handler to take the default machine-specific action:: static nmi_callback_t nmi_callback = dummy_nmi_callback; This nmi_callback variable is a global function pointer to the current -NMI handler. +NMI handler:: void do_nmi(struct pt_regs * regs, long error_code) { @@ -53,11 +56,12 @@ anyway. However, in practice it is a good documentation aid, particularly for anyone attempting to do something similar on Alpha or on systems with aggressive optimizing compilers. -Quick Quiz: Why might the rcu_dereference_sched() be necessary on Alpha, - given that the code referenced by the pointer is read-only? +Quick Quiz: + Why might the rcu_dereference_sched() be necessary on Alpha, given that the code referenced by the pointer is read-only? +:ref:`Answer to Quick Quiz <answer_quick_quiz_NMI>` -Back to the discussion of NMI and RCU... +Back to the discussion of NMI and RCU:: void set_nmi_callback(nmi_callback_t callback) { @@ -68,7 +72,7 @@ The set_nmi_callback() function registers an NMI handler. 
Note that any data that is to be used by the callback must be initialized up -before- the call to set_nmi_callback(). On architectures that do not order writes, the rcu_assign_pointer() ensures that the NMI handler sees the -initialized values. +initialized values:: void unset_nmi_callback(void) { @@ -82,7 +86,7 @@ up any data structures used by the old NMI handler until execution of it completes on all other CPUs. One way to accomplish this is via synchronize_rcu(), perhaps as -follows: +follows:: unset_nmi_callback(); synchronize_rcu(); @@ -98,24 +102,23 @@ to free up the handler's data as soon as synchronize_rcu() returns. Important note: for this to work, the architecture in question must invoke nmi_enter() and nmi_exit() on NMI entry and exit, respectively. +.. _answer_quick_quiz_NMI: -Answer to Quick Quiz - - Why might the rcu_dereference_sched() be necessary on Alpha, given - that the code referenced by the pointer is read-only? +Answer to Quick Quiz: + Why might the rcu_dereference_sched() be necessary on Alpha, given that the code referenced by the pointer is read-only? - Answer: The caller to set_nmi_callback() might well have - initialized some data that is to be used by the new NMI - handler. In this case, the rcu_dereference_sched() would - be needed, because otherwise a CPU that received an NMI - just after the new handler was set might see the pointer - to the new NMI handler, but the old pre-initialized - version of the handler's data. + The caller to set_nmi_callback() might well have + initialized some data that is to be used by the new NMI + handler. In this case, the rcu_dereference_sched() would + be needed, because otherwise a CPU that received an NMI + just after the new handler was set might see the pointer + to the new NMI handler, but the old pre-initialized + version of the handler's data. - This same sad story can happen on other CPUs when using - a compiler with aggressive pointer-value speculation - optimizations. 
+ This same sad story can happen on other CPUs when using + a compiler with aggressive pointer-value speculation + optimizations. - More important, the rcu_dereference_sched() makes it - clear to someone reading the code that the pointer is - being protected by RCU-sched. + More important, the rcu_dereference_sched() makes it + clear to someone reading the code that the pointer is + being protected by RCU-sched. diff --git a/Documentation/RCU/arrayRCU.txt b/Documentation/RCU/arrayRCU.rst index f05a9afb2c39..4051ea3871ef 100644 --- a/Documentation/RCU/arrayRCU.txt +++ b/Documentation/RCU/arrayRCU.rst @@ -1,19 +1,21 @@ -Using RCU to Protect Read-Mostly Arrays +.. _array_rcu_doc: +Using RCU to Protect Read-Mostly Arrays +======================================= Although RCU is more commonly used to protect linked lists, it can also be used to protect arrays. Three situations are as follows: -1. Hash Tables +1. :ref:`Hash Tables <hash_tables>` -2. Static Arrays +2. :ref:`Static Arrays <static_arrays>` -3. Resizeable Arrays +3. :ref:`Resizable Arrays <resizable_arrays>` Each of these three situations involves an RCU-protected pointer to an array that is separately indexed. It might be tempting to consider use of RCU to instead protect the index into an array, however, this use -case is -not- supported. The problem with RCU-protected indexes into +case is **not** supported. The problem with RCU-protected indexes into arrays is that compilers can play way too many optimization games with integers, which means that the rules governing handling of these indexes are far more trouble than they are worth. If RCU-protected indexes into @@ -24,16 +26,20 @@ to be safely used. That aside, each of the three RCU-protected pointer situations are described in the following sections. +.. _hash_tables: Situation 1: Hash Tables +------------------------ Hash tables are often implemented as an array, where each array entry has a linked-list hash chain. 
Each hash chain can be protected by RCU as described in the listRCU.txt document. This approach also applies to other array-of-list situations, such as radix trees. +.. _static_arrays: Situation 2: Static Arrays +-------------------------- Static arrays, where the data (rather than a pointer to the data) is located in each array element, and where the array is never resized, @@ -41,13 +47,17 @@ have not been used with RCU. Rik van Riel recommends using seqlock in this situation, which would also have minimal read-side overhead as long as updates are rare. -Quick Quiz: Why is it so important that updates be rare when - using seqlock? +Quick Quiz: + Why is it so important that updates be rare when using seqlock? + +:ref:`Answer to Quick Quiz <answer_quick_quiz_seqlock>` +.. _resizable_arrays: -Situation 3: Resizeable Arrays +Situation 3: Resizable Arrays +------------------------------ -Use of RCU for resizeable arrays is demonstrated by the grow_ary() +Use of RCU for resizable arrays is demonstrated by the grow_ary() function formerly used by the System V IPC code. The array is used to map from semaphore, message-queue, and shared-memory IDs to the data structure that represents the corresponding IPC construct. The grow_ary() @@ -60,7 +70,7 @@ the remainder of the new, updates the ids->entries pointer to point to the new array, and invokes ipc_rcu_putref() to free up the old array. Note that rcu_assign_pointer() is used to update the ids->entries pointer, which includes any memory barriers required on whatever architecture -you are running on. +you are running on:: static int grow_ary(struct ipc_ids* ids, int newsize) { @@ -112,7 +122,7 @@ a simple check suffices. The pointer to the structure corresponding to the desired IPC object is placed in "out", with NULL indicating a non-existent entry. After acquiring "out->lock", the "out->deleted" flag indicates whether the IPC object is in the process of being -deleted, and, if not, the pointer is returned. 
+deleted, and, if not, the pointer is returned:: struct kern_ipc_perm* ipc_lock(struct ipc_ids* ids, int id) { @@ -144,8 +154,10 @@ deleted, and, if not, the pointer is returned. return out; } +.. _answer_quick_quiz_seqlock: Answer to Quick Quiz: + Why is it so important that updates be rare when using seqlock? The reason that it is important that updates be rare when using seqlock is that frequent updates can livelock readers. diff --git a/Documentation/RCU/index.rst b/Documentation/RCU/index.rst index 5c99185710fa..81a0a1e5f767 100644 --- a/Documentation/RCU/index.rst +++ b/Documentation/RCU/index.rst @@ -7,8 +7,13 @@ RCU concepts .. toctree:: :maxdepth: 3 + arrayRCU + rcubarrier + rcu_dereference + whatisRCU rcu listRCU + NMI-RCU UP Design/Memory-Ordering/Tree-RCU-Memory-Ordering diff --git a/Documentation/RCU/lockdep-splat.txt b/Documentation/RCU/lockdep-splat.txt index 9c015976b174..b8096316fd11 100644 --- a/Documentation/RCU/lockdep-splat.txt +++ b/Documentation/RCU/lockdep-splat.txt @@ -99,7 +99,7 @@ With this change, the rcu_dereference() is always within an RCU read-side critical section, which again would have suppressed the above lockdep-RCU splat. -But in this particular case, we don't actually deference the pointer +But in this particular case, we don't actually dereference the pointer returned from rcu_dereference(). Instead, that pointer is just compared to the cic pointer, which means that the rcu_dereference() can be replaced by rcu_access_pointer() as follows: diff --git a/Documentation/RCU/rcu_dereference.txt b/Documentation/RCU/rcu_dereference.rst index bf699e8cfc75..c9667eb0d444 100644 --- a/Documentation/RCU/rcu_dereference.txt +++ b/Documentation/RCU/rcu_dereference.rst @@ -1,4 +1,7 @@ +.. 
_rcu_dereference_doc: + PROPER CARE AND FEEDING OF RETURN VALUES FROM rcu_dereference() +=============================================================== Most of the time, you can use values from rcu_dereference() or one of the similar primitives without worries. Dereferencing (prefix "*"), @@ -8,7 +11,7 @@ subtraction of constants, and casts all work quite naturally and safely. It is nevertheless possible to get into trouble with other operations. Follow these rules to keep your RCU code working properly: -o You must use one of the rcu_dereference() family of primitives +- You must use one of the rcu_dereference() family of primitives to load an RCU-protected pointer, otherwise CONFIG_PROVE_RCU will complain. Worse yet, your code can see random memory-corruption bugs due to games that compilers and DEC Alpha can play. @@ -25,24 +28,24 @@ o You must use one of the rcu_dereference() family of primitives for an example where the compiler can in fact deduce the exact value of the pointer, and thus cause misordering. -o You are only permitted to use rcu_dereference on pointer values. +- You are only permitted to use rcu_dereference on pointer values. The compiler simply knows too much about integral values to trust it to carry dependencies through integer operations. There are a very few exceptions, namely that you can temporarily cast the pointer to uintptr_t in order to: - o Set bits and clear bits down in the must-be-zero low-order + - Set bits and clear bits down in the must-be-zero low-order bits of that pointer. This clearly means that the pointer must have alignment constraints, for example, this does -not- work in general for char* pointers. - o XOR bits to translate pointers, as is done in some + - XOR bits to translate pointers, as is done in some classic buddy-allocator algorithms. It is important to cast the value back to pointer before doing much of anything else with it. 
-o Avoid cancellation when using the "+" and "-" infix arithmetic +- Avoid cancellation when using the "+" and "-" infix arithmetic operators. For example, for a given variable "x", avoid "(x-(uintptr_t)x)" for char* pointers. The compiler is within its rights to substitute zero for this sort of expression, so that @@ -54,16 +57,16 @@ o Avoid cancellation when using the "+" and "-" infix arithmetic "p+a-b" is safe because its value still necessarily depends on the rcu_dereference(), thus maintaining proper ordering. -o If you are using RCU to protect JITed functions, so that the +- If you are using RCU to protect JITed functions, so that the "()" function-invocation operator is applied to a value obtained (directly or indirectly) from rcu_dereference(), you may need to interact directly with the hardware to flush instruction caches. This issue arises on some systems when a newly JITed function is using the same memory that was used by an earlier JITed function. -o Do not use the results from relational operators ("==", "!=", +- Do not use the results from relational operators ("==", "!=", ">", ">=", "<", or "<=") when dereferencing. For example, - the following (quite strange) code is buggy: + the following (quite strange) code is buggy:: int *p; int *q; @@ -81,11 +84,11 @@ o Do not use the results from relational operators ("==", "!=", after such branches, but can speculate loads, which can again result in misordering bugs. -o Be very careful about comparing pointers obtained from +- Be very careful about comparing pointers obtained from rcu_dereference() against non-NULL values. As Linus Torvalds explained, if the two pointers are equal, the compiler could substitute the pointer you are comparing against for the pointer - obtained from rcu_dereference(). For example: + obtained from rcu_dereference(). 
For example:: p = rcu_dereference(gp); if (p == &default_struct) @@ -93,7 +96,7 @@ o Be very careful about comparing pointers obtained from Because the compiler now knows that the value of "p" is exactly the address of the variable "default_struct", it is free to - transform this code into the following: + transform this code into the following:: p = rcu_dereference(gp); if (p == &default_struct) @@ -105,14 +108,14 @@ o Be very careful about comparing pointers obtained from However, comparisons are OK in the following cases: - o The comparison was against the NULL pointer. If the + - The comparison was against the NULL pointer. If the compiler knows that the pointer is NULL, you had better not be dereferencing it anyway. If the comparison is non-equal, the compiler is none the wiser. Therefore, it is safe to compare pointers from rcu_dereference() against NULL pointers. - o The pointer is never dereferenced after being compared. + - The pointer is never dereferenced after being compared. Since there are no subsequent dereferences, the compiler cannot use anything it learned from the comparison to reorder the non-existent subsequent dereferences. @@ -124,31 +127,31 @@ o Be very careful about comparing pointers obtained from dereferenced, rcu_access_pointer() should be used in place of rcu_dereference(). - o The comparison is against a pointer that references memory + - The comparison is against a pointer that references memory that was initialized "a long time ago." The reason this is safe is that even if misordering occurs, the misordering will not affect the accesses that follow the comparison. So exactly how long ago is "a long time ago"? Here are some possibilities: - o Compile time. + - Compile time. - o Boot time. + - Boot time. - o Module-init time for module code. + - Module-init time for module code. - o Prior to kthread creation for kthread code. + - Prior to kthread creation for kthread code. 
- o During some prior acquisition of the lock that + - During some prior acquisition of the lock that we now hold. - o Before mod_timer() time for a timer handler. + - Before mod_timer() time for a timer handler. There are many other possibilities involving the Linux kernel's wide array of primitives that cause code to be invoked at a later time. - o The pointer being compared against also came from + - The pointer being compared against also came from rcu_dereference(). In this case, both pointers depend on one rcu_dereference() or another, so you get proper ordering either way. @@ -159,13 +162,13 @@ o Be very careful about comparing pointers obtained from of such an RCU usage bug is shown in the section titled "EXAMPLE OF AMPLIFIED RCU-USAGE BUG". - o All of the accesses following the comparison are stores, + - All of the accesses following the comparison are stores, so that a control dependency preserves the needed ordering. That said, it is easy to get control dependencies wrong. Please see the "CONTROL DEPENDENCIES" section of Documentation/memory-barriers.txt for more details. - o The pointers are not equal -and- the compiler does + - The pointers are not equal -and- the compiler does not have enough information to deduce the value of the pointer. Note that the volatile cast in rcu_dereference() will normally prevent the compiler from knowing too much. @@ -175,7 +178,7 @@ o Be very careful about comparing pointers obtained from comparison will provide exactly the information that the compiler needs to deduce the value of the pointer. -o Disable any value-speculation optimizations that your compiler +- Disable any value-speculation optimizations that your compiler might provide, especially if you are making use of feedback-based optimizations that take data collected from prior runs. Such value-speculation optimizations reorder operations by design. 
@@ -188,11 +191,12 @@ o Disable any value-speculation optimizations that your compiler EXAMPLE OF AMPLIFIED RCU-USAGE BUG +---------------------------------- Because updaters can run concurrently with RCU readers, RCU readers can see stale and/or inconsistent values. If RCU readers need fresh or consistent values, which they sometimes do, they need to take proper -precautions. To see this, consider the following code fragment: +precautions. To see this, consider the following code fragment:: struct foo { int a; @@ -244,7 +248,7 @@ to some reordering from the compiler and CPUs is beside the point. But suppose that the reader needs a consistent view? -Then one approach is to use locking, for example, as follows: +Then one approach is to use locking, for example, as follows:: struct foo { int a; @@ -299,6 +303,7 @@ As always, use the right tool for the job! EXAMPLE WHERE THE COMPILER KNOWS TOO MUCH +----------------------------------------- If a pointer obtained from rcu_dereference() compares not-equal to some other pointer, the compiler normally has no clue what the value of the @@ -308,7 +313,7 @@ guarantees that RCU depends on. And the volatile cast in rcu_dereference() should prevent the compiler from guessing the value. But without rcu_dereference(), the compiler knows more than you might -expect. Consider the following code fragment: +expect. Consider the following code fragment:: struct foo { int a; @@ -354,6 +359,7 @@ dereference the resulting pointer. WHICH MEMBER OF THE rcu_dereference() FAMILY SHOULD YOU USE? +------------------------------------------------------------ First, please avoid using rcu_dereference_raw() and also please avoid using rcu_dereference_check() and rcu_dereference_protected() with a @@ -370,7 +376,7 @@ member of the rcu_dereference() to use in various situations: 2. 
If the access might be within an RCU read-side critical section on the one hand, or protected by (say) my_lock on the other, - use rcu_dereference_check(), for example: + use rcu_dereference_check(), for example:: p1 = rcu_dereference_check(p->rcu_protected_pointer, lockdep_is_held(&my_lock)); @@ -378,14 +384,14 @@ member of the rcu_dereference() to use in various situations: 3. If the access might be within an RCU read-side critical section on the one hand, or protected by either my_lock or your_lock on - the other, again use rcu_dereference_check(), for example: + the other, again use rcu_dereference_check(), for example:: p1 = rcu_dereference_check(p->rcu_protected_pointer, lockdep_is_held(&my_lock) || lockdep_is_held(&your_lock)); 4. If the access is on the update side, so that it is always protected - by my_lock, use rcu_dereference_protected(): + by my_lock, use rcu_dereference_protected():: p1 = rcu_dereference_protected(p->rcu_protected_pointer, lockdep_is_held(&my_lock)); @@ -410,18 +416,19 @@ member of the rcu_dereference() to use in various situations: SPARSE CHECKING OF RCU-PROTECTED POINTERS +----------------------------------------- The sparse static-analysis tool checks for direct access to RCU-protected pointers, which can result in "interesting" bugs due to compiler optimizations involving invented loads and perhaps also load tearing. 
-For example, suppose someone mistakenly does something like this: +For example, suppose someone mistakenly does something like this:: p = q->rcu_protected_pointer; do_something_with(p->a); do_something_else_with(p->b); If register pressure is high, the compiler might optimize "p" out -of existence, transforming the code to something like this: +of existence, transforming the code to something like this:: do_something_with(q->rcu_protected_pointer->a); do_something_else_with(q->rcu_protected_pointer->b); @@ -435,7 +442,7 @@ Load tearing could of course result in dereferencing a mashup of a pair of pointers, which also might fatally disappoint your code. These problems could have been avoided simply by making the code instead -read as follows: +read as follows:: p = rcu_dereference(q->rcu_protected_pointer); do_something_with(p->a); @@ -448,7 +455,7 @@ or as a formal parameter, with "__rcu", which tells sparse to complain if this pointer is accessed directly. It will also cause sparse to complain if a pointer not marked with "__rcu" is accessed using rcu_dereference() and friends. For example, ->rcu_protected_pointer might be declared as -follows: +follows:: struct foo __rcu *rcu_protected_pointer; diff --git a/Documentation/RCU/rcubarrier.txt b/Documentation/RCU/rcubarrier.rst index a2782df69732..f64f4413a47c 100644 --- a/Documentation/RCU/rcubarrier.txt +++ b/Documentation/RCU/rcubarrier.rst @@ -1,4 +1,7 @@ +.. _rcu_barrier: + RCU and Unloadable Modules +========================== [Originally published in LWN Jan. 14, 2007: http://lwn.net/Articles/217484/] @@ -21,7 +24,7 @@ given that readers might well leave absolutely no trace of their presence? There is a synchronize_rcu() primitive that blocks until all pre-existing readers have completed. 
An updater wishing to delete an element p from a linked list might do the following, while holding an -appropriate lock, of course: +appropriate lock, of course:: list_del_rcu(p); synchronize_rcu(); @@ -32,13 +35,13 @@ primitive must be used instead. This primitive takes a pointer to an rcu_head struct placed within the RCU-protected data structure and another pointer to a function that may be invoked later to free that structure. Code to delete an element p from the linked list from IRQ -context might then be as follows: +context might then be as follows:: list_del_rcu(p); call_rcu(&p->rcu, p_callback); Since call_rcu() never blocks, this code can safely be used from within -IRQ context. The function p_callback() might be defined as follows: +IRQ context. The function p_callback() might be defined as follows:: static void p_callback(struct rcu_head *rp) { @@ -49,6 +52,7 @@ IRQ context. The function p_callback() might be defined as follows: Unloading Modules That Use call_rcu() +------------------------------------- But what if p_callback is defined in an unloadable module? @@ -69,10 +73,11 @@ in realtime kernels in order to avoid excessive scheduling latencies. rcu_barrier() +------------- We instead need the rcu_barrier() primitive. Rather than waiting for a grace period to elapse, rcu_barrier() waits for all outstanding RCU -callbacks to complete. Please note that rcu_barrier() does -not- imply +callbacks to complete. Please note that rcu_barrier() does **not** imply synchronize_rcu(), in particular, if there are no RCU callbacks queued anywhere, rcu_barrier() is within its rights to return immediately, without waiting for a grace period to elapse. @@ -88,79 +93,79 @@ must match the flavor of rcu_barrier() with that of call_rcu(). If your module uses multiple flavors of call_rcu(), then it must also use multiple flavors of rcu_barrier() when unloading that module. 
For example, if it uses call_rcu(), call_srcu() on srcu_struct_1, and call_srcu() on -srcu_struct_2(), then the following three lines of code will be required -when unloading: +srcu_struct_2, then the following three lines of code will be required +when unloading:: 1 rcu_barrier(); 2 srcu_barrier(&srcu_struct_1); 3 srcu_barrier(&srcu_struct_2); The rcutorture module makes use of rcu_barrier() in its exit function -as follows: +as follows:: - 1 static void - 2 rcu_torture_cleanup(void) - 3 { - 4 int i; + 1 static void + 2 rcu_torture_cleanup(void) + 3 { + 4 int i; 5 - 6 fullstop = 1; - 7 if (shuffler_task != NULL) { + 6 fullstop = 1; + 7 if (shuffler_task != NULL) { 8 VERBOSE_PRINTK_STRING("Stopping rcu_torture_shuffle task"); 9 kthread_stop(shuffler_task); -10 } -11 shuffler_task = NULL; -12 -13 if (writer_task != NULL) { -14 VERBOSE_PRINTK_STRING("Stopping rcu_torture_writer task"); -15 kthread_stop(writer_task); -16 } -17 writer_task = NULL; -18 -19 if (reader_tasks != NULL) { -20 for (i = 0; i < nrealreaders; i++) { -21 if (reader_tasks[i] != NULL) { -22 VERBOSE_PRINTK_STRING( -23 "Stopping rcu_torture_reader task"); -24 kthread_stop(reader_tasks[i]); -25 } -26 reader_tasks[i] = NULL; -27 } -28 kfree(reader_tasks); -29 reader_tasks = NULL; -30 } -31 rcu_torture_current = NULL; -32 -33 if (fakewriter_tasks != NULL) { -34 for (i = 0; i < nfakewriters; i++) { -35 if (fakewriter_tasks[i] != NULL) { -36 VERBOSE_PRINTK_STRING( -37 "Stopping rcu_torture_fakewriter task"); -38 kthread_stop(fakewriter_tasks[i]); -39 } -40 fakewriter_tasks[i] = NULL; -41 } -42 kfree(fakewriter_tasks); -43 fakewriter_tasks = NULL; -44 } -45 -46 if (stats_task != NULL) { -47 VERBOSE_PRINTK_STRING("Stopping rcu_torture_stats task"); -48 kthread_stop(stats_task); -49 } -50 stats_task = NULL; -51 -52 /* Wait for all RCU callbacks to fire. */ -53 rcu_barrier(); -54 -55 rcu_torture_stats_print(); /* -After- the stats thread is stopped! 
*/ -56 -57 if (cur_ops->cleanup != NULL) -58 cur_ops->cleanup(); -59 if (atomic_read(&n_rcu_torture_error)) -60 rcu_torture_print_module_parms("End of test: FAILURE"); -61 else -62 rcu_torture_print_module_parms("End of test: SUCCESS"); -63 } + 10 } + 11 shuffler_task = NULL; + 12 + 13 if (writer_task != NULL) { + 14 VERBOSE_PRINTK_STRING("Stopping rcu_torture_writer task"); + 15 kthread_stop(writer_task); + 16 } + 17 writer_task = NULL; + 18 + 19 if (reader_tasks != NULL) { + 20 for (i = 0; i < nrealreaders; i++) { + 21 if (reader_tasks[i] != NULL) { + 22 VERBOSE_PRINTK_STRING( + 23 "Stopping rcu_torture_reader task"); + 24 kthread_stop(reader_tasks[i]); + 25 } + 26 reader_tasks[i] = NULL; + 27 } + 28 kfree(reader_tasks); + 29 reader_tasks = NULL; + 30 } + 31 rcu_torture_current = NULL; + 32 + 33 if (fakewriter_tasks != NULL) { + 34 for (i = 0; i < nfakewriters; i++) { + 35 if (fakewriter_tasks[i] != NULL) { + 36 VERBOSE_PRINTK_STRING( + 37 "Stopping rcu_torture_fakewriter task"); + 38 kthread_stop(fakewriter_tasks[i]); + 39 } + 40 fakewriter_tasks[i] = NULL; + 41 } + 42 kfree(fakewriter_tasks); + 43 fakewriter_tasks = NULL; + 44 } + 45 + 46 if (stats_task != NULL) { + 47 VERBOSE_PRINTK_STRING("Stopping rcu_torture_stats task"); + 48 kthread_stop(stats_task); + 49 } + 50 stats_task = NULL; + 51 + 52 /* Wait for all RCU callbacks to fire. */ + 53 rcu_barrier(); + 54 + 55 rcu_torture_stats_print(); /* -After- the stats thread is stopped! */ + 56 + 57 if (cur_ops->cleanup != NULL) + 58 cur_ops->cleanup(); + 59 if (atomic_read(&n_rcu_torture_error)) + 60 rcu_torture_print_module_parms("End of test: FAILURE"); + 61 else + 62 rcu_torture_print_module_parms("End of test: SUCCESS"); + 63 } Line 6 sets a global variable that prevents any RCU callbacks from re-posting themselves. This will not be necessary in most cases, since @@ -176,9 +181,14 @@ for any pre-existing callbacks to complete. 
Then lines 55-62 print status and do operation-specific cleanup, and then return, permitting the module-unload operation to be completed. -Quick Quiz #1: Is there any other situation where rcu_barrier() might +.. _rcubarrier_quiz_1: + +Quick Quiz #1: + Is there any other situation where rcu_barrier() might be required? +:ref:`Answer to Quick Quiz #1 <answer_rcubarrier_quiz_1>` + Your module might have additional complications. For example, if your module invokes call_rcu() from timers, you will need to first cancel all the timers, and only then invoke rcu_barrier() to wait for any remaining @@ -188,11 +198,12 @@ Of course, if you module uses call_rcu(), you will need to invoke rcu_barrier() before unloading. Similarly, if your module uses call_srcu(), you will need to invoke srcu_barrier() before unloading, and on the same srcu_struct structure. If your module uses call_rcu() --and- call_srcu(), then you will need to invoke rcu_barrier() -and- +**and** call_srcu(), then you will need to invoke rcu_barrier() **and** srcu_barrier(). Implementing rcu_barrier() +-------------------------- Dipankar Sarma's implementation of rcu_barrier() makes use of the fact that RCU callbacks are never reordered once queued on one of the per-CPU @@ -200,19 +211,19 @@ queues. His implementation queues an RCU callback on each of the per-CPU callback queues, and then waits until they have all started executing, at which point, all earlier RCU callbacks are guaranteed to have completed. 
-The original code for rcu_barrier() was as follows:
+The original code for rcu_barrier() was as follows::
- 1 void rcu_barrier(void)
- 2 {
- 3 BUG_ON(in_interrupt());
- 4 /* Take cpucontrol mutex to protect against CPU hotplug */
- 5 mutex_lock(&rcu_barrier_mutex);
- 6 init_completion(&rcu_barrier_completion);
- 7 atomic_set(&rcu_barrier_cpu_count, 0);
- 8 on_each_cpu(rcu_barrier_func, NULL, 0, 1);
- 9 wait_for_completion(&rcu_barrier_completion);
-10 mutex_unlock(&rcu_barrier_mutex);
-11 }
+ 1 void rcu_barrier(void)
+ 2 {
+ 3 BUG_ON(in_interrupt());
+ 4 /* Take cpucontrol mutex to protect against CPU hotplug */
+ 5 mutex_lock(&rcu_barrier_mutex);
+ 6 init_completion(&rcu_barrier_completion);
+ 7 atomic_set(&rcu_barrier_cpu_count, 0);
+ 8 on_each_cpu(rcu_barrier_func, NULL, 0, 1);
+ 9 wait_for_completion(&rcu_barrier_completion);
+ 10 mutex_unlock(&rcu_barrier_mutex);
+ 11 }
 Line 3 verifies that the caller is in process context, and lines 5 and 10 use rcu_barrier_mutex to ensure that only one rcu_barrier() is using the
@@ -226,18 +237,18 @@ This code was rewritten in 2008 and several times thereafter, but this still gives the general idea.
 The rcu_barrier_func() runs on each CPU, where it invokes call_rcu()
-to post an RCU callback, as follows:
+to post an RCU callback, as follows::
- 1 static void rcu_barrier_func(void *notused)
- 2 {
- 3 int cpu = smp_processor_id();
- 4 struct rcu_data *rdp = &per_cpu(rcu_data, cpu);
- 5 struct rcu_head *head;
+ 1 static void rcu_barrier_func(void *notused)
+ 2 {
+ 3 int cpu = smp_processor_id();
+ 4 struct rcu_data *rdp = &per_cpu(rcu_data, cpu);
+ 5 struct rcu_head *head;
 6
- 7 head = &rdp->barrier;
- 8 atomic_inc(&rcu_barrier_cpu_count);
- 9 call_rcu(head, rcu_barrier_callback);
-10 }
+ 7 head = &rdp->barrier;
+ 8 atomic_inc(&rcu_barrier_cpu_count);
+ 9 call_rcu(head, rcu_barrier_callback);
+ 10 }
 Lines 3 and 4 locate RCU's internal per-CPU rcu_data structure, which contains the struct rcu_head that needed for the later call to
@@ -248,20 +259,25 @@ the current CPU's queue. The rcu_barrier_callback() function simply atomically decrements the rcu_barrier_cpu_count variable and finalizes the completion when it
-reaches zero, as follows:
+reaches zero, as follows::
 1 static void rcu_barrier_callback(struct rcu_head *notused)
 2 {
- 3 if (atomic_dec_and_test(&rcu_barrier_cpu_count))
- 4 complete(&rcu_barrier_completion);
+ 3 if (atomic_dec_and_test(&rcu_barrier_cpu_count))
+ 4 complete(&rcu_barrier_completion);
 5 }
-Quick Quiz #2: What happens if CPU 0's rcu_barrier_func() executes
+.. _rcubarrier_quiz_2:
+
+Quick Quiz #2:
+ What happens if CPU 0's rcu_barrier_func() executes
 immediately (thus incrementing rcu_barrier_cpu_count to the value one), but the other CPU's rcu_barrier_func() invocations are delayed for a full grace period? Couldn't this result in rcu_barrier() returning prematurely?
+:ref:`Answer to Quick Quiz #2 <answer_rcubarrier_quiz_2>`
+
 The current rcu_barrier() implementation is more complex, due to the need to avoid disturbing idle CPUs (especially on battery-powered systems) and the need to minimally disturb non-idle CPUs in real-time systems.
@@ -269,6 +285,7 @@ However, the code above illustrates the concepts.
 rcu_barrier() Summary
+---------------------
 The rcu_barrier() primitive has seen relatively little use, since most code using RCU is in the core kernel rather than in modules. However, if
@@ -277,8 +294,12 @@ so that your module may be safely unloaded.
 Answers to Quick Quizzes
+------------------------
+
+.. _answer_rcubarrier_quiz_1:
-Quick Quiz #1: Is there any other situation where rcu_barrier() might
+Quick Quiz #1:
+ Is there any other situation where rcu_barrier() might
 be required?
 Answer: Interestingly enough, rcu_barrier() was not originally
@@ -292,7 +313,12 @@ Answer: Interestingly enough, rcu_barrier() was not originally implementing rcutorture, and found that rcu_barrier() solves this problem as well.
-Quick Quiz #2: What happens if CPU 0's rcu_barrier_func() executes
+:ref:`Back to Quick Quiz #1 <rcubarrier_quiz_1>`
+
+.. _answer_rcubarrier_quiz_2:
+
+Quick Quiz #2:
+ What happens if CPU 0's rcu_barrier_func() executes
 immediately (thus incrementing rcu_barrier_cpu_count to the value one), but the other CPU's rcu_barrier_func() invocations are delayed for a full grace period? Couldn't this result in
@@ -323,3 +349,5 @@ Answer: This cannot happen. The reason is that on_each_cpu() has its last is to add an rcu_read_lock() before line 8 of rcu_barrier() and an rcu_read_unlock() after line 8 of this same function. If you can think of a better change, please let me know!
+
+:ref:`Back to Quick Quiz #2 <rcubarrier_quiz_2>`
diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
index f48f4621ccbc..a360a8796710 100644
--- a/Documentation/RCU/stallwarn.txt
+++ b/Documentation/RCU/stallwarn.txt
@@ -225,18 +225,13 @@ an estimate of the total number of RCU callbacks queued across all CPUs In kernels with CONFIG_RCU_FAST_NO_HZ, more information is printed for each CPU:
- 0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 softirq=82/543 last_accelerate: a345/d342 Nonlazy posted: ..D
+ 0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 softirq=82/543 last_accelerate: a345/d342 dyntick_enabled: 1
 The "last_accelerate:" prints the low-order 16 bits (in hex) of the jiffies counter when this CPU last invoked rcu_try_advance_all_cbs() from rcu_needs_cpu() or last invoked rcu_accelerate_cbs() from
-rcu_prepare_for_idle(). The "Nonlazy posted:" indicates lazy-callback
-status, so that an "l" indicates that all callbacks were lazy at the start
-of the last idle period and an "L" indicates that there are currently
-no non-lazy callbacks (in both cases, "." is printed otherwise, as
-shown above) and "D" indicates that dyntick-idle processing is enabled
-("." is printed otherwise, for example, if disabled via the "nohz="
-kernel boot parameter).
+rcu_prepare_for_idle(). "dyntick_enabled: 1" indicates that dyntick-idle
+processing is enabled.
 If the grace period ends just as the stall warning starts printing, there will be a spurious stall-warning message, which will include
diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.rst
index 58ba05c4d97f..c7f147b8034f 100644
--- a/Documentation/RCU/whatisRCU.txt
+++ b/Documentation/RCU/whatisRCU.rst
@@ -1,15 +1,18 @@
+.. _whatisrcu_doc:
+
 What is RCU? -- "Read, Copy, Update"
+======================================
 Please note that the "What is RCU?" LWN series is an excellent place to start learning about RCU:
-1. What is RCU, Fundamentally?
http://lwn.net/Articles/262464/
-2. What is RCU? Part 2: Usage http://lwn.net/Articles/263130/
-3. RCU part 3: the RCU API http://lwn.net/Articles/264090/
-4. The RCU API, 2010 Edition http://lwn.net/Articles/418853/
- 2010 Big API Table http://lwn.net/Articles/419086/
-5. The RCU API, 2014 Edition http://lwn.net/Articles/609904/
- 2014 Big API Table http://lwn.net/Articles/609973/
+| 1. What is RCU, Fundamentally? http://lwn.net/Articles/262464/
+| 2. What is RCU? Part 2: Usage http://lwn.net/Articles/263130/
+| 3. RCU part 3: the RCU API http://lwn.net/Articles/264090/
+| 4. The RCU API, 2010 Edition http://lwn.net/Articles/418853/
+| 2010 Big API Table http://lwn.net/Articles/419086/
+| 5. The RCU API, 2014 Edition http://lwn.net/Articles/609904/
+| 2014 Big API Table http://lwn.net/Articles/609973/
 What is RCU?
@@ -24,14 +27,21 @@ the experience has been that different people must take different paths to arrive at an understanding of RCU. This document provides several different paths, as follows:
-1. RCU OVERVIEW
-2. WHAT IS RCU'S CORE API?
-3. WHAT ARE SOME EXAMPLE USES OF CORE RCU API?
-4. WHAT IF MY UPDATING THREAD CANNOT BLOCK?
-5. WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU?
-6. ANALOGY WITH READER-WRITER LOCKING
-7. FULL LIST OF RCU APIs
-8. ANSWERS TO QUICK QUIZZES
+:ref:`1. RCU OVERVIEW <1_whatisRCU>`
+
+:ref:`2. WHAT IS RCU'S CORE API? <2_whatisRCU>`
+
+:ref:`3. WHAT ARE SOME EXAMPLE USES OF CORE RCU API? <3_whatisRCU>`
+
+:ref:`4. WHAT IF MY UPDATING THREAD CANNOT BLOCK? <4_whatisRCU>`
+
+:ref:`5. WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU? <5_whatisRCU>`
+
+:ref:`6. ANALOGY WITH READER-WRITER LOCKING <6_whatisRCU>`
+
+:ref:`7. FULL LIST OF RCU APIs <7_whatisRCU>`
+
+:ref:`8.
ANSWERS TO QUICK QUIZZES <8_whatisRCU>`
 People who prefer starting with a conceptual overview should focus on Section 1, though most readers will profit by reading this section at
@@ -49,8 +59,10 @@ everything, feel free to read the whole thing -- but if you are really that type of person, you have perused the source code and will therefore never need this document anyway. ;-)
+.. _1_whatisRCU:
 1. RCU OVERVIEW
+----------------
 The basic idea behind RCU is to split updates into "removal" and "reclamation" phases. The removal phase removes references to data items
@@ -116,8 +128,10 @@ So how the heck can a reclaimer tell when a reader is done, given that readers are not doing any sort of synchronization operations??? Read on to learn about how RCU's API makes this easy.
+.. _2_whatisRCU:
 2. WHAT IS RCU'S CORE API?
+---------------------------
 The core RCU API is quite small:
@@ -136,7 +150,7 @@ later. See the kernel docbook documentation for more info, or look directly at the function header comments.
 rcu_read_lock()
-
+^^^^^^^^^^^^^^^
 void rcu_read_lock(void);
 Used by a reader to inform the reclaimer that the reader is
@@ -150,7 +164,7 @@ rcu_read_lock()
 longer-term references to data structures.
 rcu_read_unlock()
-
+^^^^^^^^^^^^^^^^^
 void rcu_read_unlock(void);
 Used by a reader to inform the reclaimer that the reader is
@@ -158,15 +172,15 @@ rcu_read_unlock()
 read-side critical sections may be nested and/or overlapping.
 synchronize_rcu()
-
+^^^^^^^^^^^^^^^^^
 void synchronize_rcu(void);
 Marks the end of updater code and the beginning of reclaimer code. It does this by blocking until all pre-existing RCU read-side critical sections on all CPUs have completed.
- Note that synchronize_rcu() will -not- necessarily wait for
+ Note that synchronize_rcu() will **not** necessarily wait for
 any subsequent RCU read-side critical sections to complete.
- For example, consider the following sequence of events:
+ For example, consider the following sequence of events::
 CPU 0 CPU 1 CPU 2
 ----------------- ------------------------- ---------------
@@ -182,7 +196,7 @@ synchronize_rcu()
 any that begin after synchronize_rcu() is invoked.
 Of course, synchronize_rcu() does not necessarily return
- -immediately- after the last pre-existing RCU read-side critical
+ **immediately** after the last pre-existing RCU read-side critical
 section completes. For one thing, there might well be scheduling delays. For another thing, many RCU implementations process requests in batches in order to improve efficiencies, which can
@@ -211,10 +225,10 @@ synchronize_rcu()
 checklist.txt for some approaches to limiting the update rate.
 rcu_assign_pointer()
-
+^^^^^^^^^^^^^^^^^^^^
 void rcu_assign_pointer(p, typeof(p) v);
- Yes, rcu_assign_pointer() -is- implemented as a macro, though it
+ Yes, rcu_assign_pointer() **is** implemented as a macro, though it
 would be cool to be able to declare a function in this manner. (Compiler experts will no doubt disagree.)
@@ -231,7 +245,7 @@ rcu_assign_pointer()
 the _rcu list-manipulation primitives such as list_add_rcu().
 rcu_dereference()
-
+^^^^^^^^^^^^^^^^^
 typeof(p) rcu_dereference(p);
 Like rcu_assign_pointer(), rcu_dereference() must be implemented
@@ -248,13 +262,13 @@ rcu_dereference()
 Common coding practice uses rcu_dereference() to copy an RCU-protected pointer to a local variable, then dereferences
- this local variable, for example as follows:
+ this local variable, for example as follows::
 p = rcu_dereference(head.next);
 return p->data;
 However, in this case, one could just as easily combine these
- into one statement:
+ into one statement::
 return rcu_dereference(head.next)->data;
@@ -266,8 +280,8 @@ rcu_dereference()
 unnecessary overhead on Alpha CPUs.
 Note that the value returned by rcu_dereference() is valid
- only within the enclosing RCU read-side critical section [1].
- For example, the following is -not- legal:
+ only within the enclosing RCU read-side critical section [1]_.
+ For example, the following is **not** legal::
 rcu_read_lock();
 p = rcu_dereference(head.next);
@@ -290,9 +304,9 @@ rcu_dereference()
 at any time, including immediately after the rcu_dereference(). And, again like rcu_assign_pointer(), rcu_dereference() is typically used indirectly, via the _rcu list-manipulation
- primitives, such as list_for_each_entry_rcu() [2].
+ primitives, such as list_for_each_entry_rcu() [2]_.
- [1] The variant rcu_dereference_protected() can be used outside
+.. [1] The variant rcu_dereference_protected() can be used outside
 of an RCU read-side critical section as long as the usage is protected by locks acquired by the update-side code. This variant avoids the lockdep warning that would happen when using (for
@@ -305,7 +319,7 @@ rcu_dereference()
 a lockdep splat is emitted. See Documentation/RCU/Design/Requirements/Requirements.rst and the API's code comments for more details and example usage.
- [2] If the list_for_each_entry_rcu() instance might be used by
+.. [2] If the list_for_each_entry_rcu() instance might be used by
 update-side code as well as by RCU readers, then an additional lockdep expression can be added to its list of arguments. For example, given an additional "lock_is_held(&mylock)" argument,
@@ -315,6 +329,7 @@ rcu_dereference()
 The following diagram shows how each API communicates among the reader, updater, and reclaimer.
+::
 rcu_assign_pointer()
@@ -375,12 +390,16 @@ c. RCU applied to scheduler and interrupt/NMI-handler tasks.
 Again, most uses will be of (a). The (b) and (c) cases are important for specialized uses, but are relatively uncommon.
+.. _3_whatisRCU:
 3. WHAT ARE SOME EXAMPLE USES OF CORE RCU API?
+-----------------------------------------------
 This section shows a simple use of the core RCU API to protect a global pointer to a dynamically allocated structure.
More-typical
-uses of RCU may be found in listRCU.txt, arrayRCU.txt, and NMI-RCU.txt.
+uses of RCU may be found in :ref:`listRCU.rst <list_rcu_doc>`,
+:ref:`arrayRCU.rst <array_rcu_doc>`, and :ref:`NMI-RCU.rst <NMI_rcu_doc>`.
+::
 struct foo {
 int a;
@@ -440,40 +459,43 @@ uses of RCU may be found in listRCU.txt, arrayRCU.txt, and NMI-RCU.txt.
 So, to sum up:
-o Use rcu_read_lock() and rcu_read_unlock() to guard RCU
+- Use rcu_read_lock() and rcu_read_unlock() to guard RCU
 read-side critical sections.
-o Within an RCU read-side critical section, use rcu_dereference()
+- Within an RCU read-side critical section, use rcu_dereference()
 to dereference RCU-protected pointers.
-o Use some solid scheme (such as locks or semaphores) to
+- Use some solid scheme (such as locks or semaphores) to
 keep concurrent updates from interfering with each other.
-o Use rcu_assign_pointer() to update an RCU-protected pointer.
+- Use rcu_assign_pointer() to update an RCU-protected pointer.
 This primitive protects concurrent readers from the updater,
- -not- concurrent updates from each other! You therefore still
+ **not** concurrent updates from each other! You therefore still
 need to use locking (or something similar) to keep concurrent rcu_assign_pointer() primitives from interfering with each other.
-o Use synchronize_rcu() -after- removing a data element from an
- RCU-protected data structure, but -before- reclaiming/freeing
+- Use synchronize_rcu() **after** removing a data element from an
+ RCU-protected data structure, but **before** reclaiming/freeing
 the data element, in order to wait for the completion of all RCU read-side critical sections that might be referencing that data item.
 See checklist.txt for additional rules to follow when using RCU.
-And again, more-typical uses of RCU may be found in listRCU.txt,
-arrayRCU.txt, and NMI-RCU.txt.
+And again, more-typical uses of RCU may be found in :ref:`listRCU.rst
+<list_rcu_doc>`, :ref:`arrayRCU.rst <array_rcu_doc>`, and :ref:`NMI-RCU.rst
+<NMI_rcu_doc>`.
+.. _4_whatisRCU:
 4. WHAT IF MY UPDATING THREAD CANNOT BLOCK?
+--------------------------------------------
 In the example above, foo_update_a() blocks until a grace period elapses. This is quite simple, but in some cases one cannot afford to wait so long -- there might be other high-priority work to be done. In such cases, one uses call_rcu() rather than synchronize_rcu().
-The call_rcu() API is as follows:
+The call_rcu() API is as follows::
 void call_rcu(struct rcu_head * head, void (*func)(struct rcu_head *head));
@@ -481,7 +503,7 @@ The call_rcu() API is as follows:
 This function invokes func(head) after a grace period has elapsed. This invocation might happen from either softirq or process context, so the function is not permitted to block. The foo struct needs to
-have an rcu_head structure added, perhaps as follows:
+have an rcu_head structure added, perhaps as follows::
 struct foo {
 int a;
@@ -490,7 +512,7 @@ have an rcu_head structure added, perhaps as follows:
 struct rcu_head rcu;
 };
-The foo_update_a() function might then be written as follows:
+The foo_update_a() function might then be written as follows::
 /*
 * Create a new struct foo that is the same as the one currently
@@ -520,7 +542,7 @@ The foo_update_a() function might then be written as follows:
 call_rcu(&old_fp->rcu, foo_reclaim);
 }
-The foo_reclaim() function might appear as follows:
+The foo_reclaim() function might appear as follows::
 void foo_reclaim(struct rcu_head *rp)
 {
@@ -544,7 +566,7 @@ namely foo_reclaim().
 The summary of advice is the same as for the previous section, except that we are now using call_rcu() rather than synchronize_rcu():
-o Use call_rcu() -after- removing a data element from an
+- Use call_rcu() **after** removing a data element from an
 RCU-protected data structure in order to register a callback function that will be invoked after the completion of all RCU read-side critical sections that might be referencing that
@@ -552,14 +574,16 @@ o Use call_rcu() -after- removing a data element from an
 If the callback for call_rcu() is not doing anything more than calling kfree() on the structure, you can use kfree_rcu() instead of call_rcu()
-to avoid having to write your own callback:
+to avoid having to write your own callback::
 kfree_rcu(old_fp, rcu);
 Again, see checklist.txt for additional rules governing the use of RCU.
+.. _5_whatisRCU:
 5. WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU?
+------------------------------------------------
 One of the nice things about RCU is that it has extremely simple "toy" implementations that are a good first step towards understanding the
@@ -579,7 +603,7 @@ more details on the current implementation as of early 2004.
 5A. "TOY" IMPLEMENTATION #1: LOCKING
-
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 This section presents a "toy" RCU implementation that is based on familiar locking primitives. Its overhead makes it a non-starter for real-life use, as does its lack of scalability. It is also unsuitable
@@ -591,7 +615,7 @@ you allow nested rcu_read_lock() calls, you can deadlock. However, it is probably the easiest implementation to relate to, so is a good starting point.
-It is extremely simple:
+It is extremely simple::
 static DEFINE_RWLOCK(rcu_gp_mutex);
@@ -614,7 +638,7 @@ It is extremely simple:
 [You can ignore rcu_assign_pointer() and rcu_dereference() without missing much. But here are simplified versions anyway. And whatever you do,
-don't forget about them when submitting patches making use of RCU!]
+don't forget about them when submitting patches making use of RCU!]::
 #define rcu_assign_pointer(p, v) \
 ({ \
@@ -647,18 +671,23 @@ that the only thing that can block rcu_read_lock() is a synchronize_rcu(). But synchronize_rcu() does not acquire any locks while holding rcu_gp_mutex, so there can be no deadlock cycle.
-Quick Quiz #1: Why is this argument naive? How could a deadlock
+.. _quiz_1:
+
+Quick Quiz #1:
+ Why is this argument naive? How could a deadlock
 occur when using this algorithm in a real-world Linux kernel? How could this deadlock be avoided?
+:ref:`Answers to Quick Quiz <8_whatisRCU>`
 5B. "TOY" EXAMPLE #2: CLASSIC RCU
-
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 This section presents a "toy" RCU implementation that is based on "classic RCU". It is also short on performance (but only for updates) and on features such as hotplug CPU and the ability to run in CONFIG_PREEMPT kernels. The definitions of rcu_dereference() and rcu_assign_pointer() are the same as those shown in the preceding section, so they are omitted.
+::
 void rcu_read_lock(void) { }
@@ -683,14 +712,14 @@ CPU in turn. The run_on() primitive can be implemented straightforwardly in terms of the sched_setaffinity() primitive. Of course, a somewhat less "toy" implementation would restore the affinity upon completion rather than just leaving all tasks running on the last CPU, but when I said
-"toy", I meant -toy-!
+"toy", I meant **toy**!
 So how the heck is this supposed to work???
 Remember that it is illegal to block while in an RCU read-side critical section. Therefore, if a given CPU executes a context switch, we know that it must have completed all preceding RCU read-side critical sections.
-Once -all- CPUs have executed a context switch, then -all- preceding
+Once **all** CPUs have executed a context switch, then **all** preceding
 RCU read-side critical sections will have completed.
 So, suppose that we remove a data item from its structure and then invoke
@@ -698,19 +727,32 @@ synchronize_rcu(). Once synchronize_rcu() returns, we are guaranteed that there are no RCU read-side critical sections holding a reference to that data item, so we can safely reclaim it.
-Quick Quiz #2: Give an example where Classic RCU's read-side
- overhead is -negative-.
+.. _quiz_2:
+
+Quick Quiz #2:
+ Give an example where Classic RCU's read-side
+ overhead is **negative**.
+
+:ref:`Answers to Quick Quiz <8_whatisRCU>`
-Quick Quiz #3: If it is illegal to block in an RCU read-side
+.. _quiz_3:
+
+Quick Quiz #3:
+ If it is illegal to block in an RCU read-side
 critical section, what the heck do you do in PREEMPT_RT, where normal spinlocks can block???
+:ref:`Answers to Quick Quiz <8_whatisRCU>`
+
+.. _6_whatisRCU:
 6. ANALOGY WITH READER-WRITER LOCKING
+--------------------------------------
 Although RCU can be used in many different ways, a very common use of RCU is analogous to reader-writer locking. The following unified diff shows how closely related RCU and reader-writer locking can be.
+::
 @@ -5,5 +5,5 @@ struct el {
 int data;
@@ -762,7 +804,7 @@ diff shows how closely related RCU and reader-writer locking can be.
 return 0;
 }
-Or, for those who prefer a side-by-side listing:
+Or, for those who prefer a side-by-side listing::
 1 struct el { 1 struct el {
 2 struct list_head list; 2 struct list_head list;
@@ -774,40 +816,44 @@ Or, for those who prefer a side-by-side listing:
 8 rwlock_t listmutex; 8 spinlock_t listmutex;
 9 struct el head; 9 struct el head;
- 1 int search(long key, int *result) 1 int search(long key, int *result)
- 2 { 2 {
- 3 struct list_head *lp; 3 struct list_head *lp;
- 4 struct el *p; 4 struct el *p;
- 5 5
- 6 read_lock(&listmutex); 6 rcu_read_lock();
- 7 list_for_each_entry(p, head, lp) { 7 list_for_each_entry_rcu(p, head, lp) {
- 8 if (p->key == key) { 8 if (p->key == key) {
- 9 *result = p->data; 9 *result = p->data;
-10 read_unlock(&listmutex); 10 rcu_read_unlock();
-11 return 1; 11 return 1;
-12 } 12 }
-13 } 13 }
-14 read_unlock(&listmutex); 14 rcu_read_unlock();
-15 return 0; 15 return 0;
-16 } 16 }
-
- 1 int delete(long key) 1 int delete(long key)
- 2 { 2 {
- 3 struct el *p; 3 struct el *p;
- 4 4
- 5 write_lock(&listmutex); 5 spin_lock(&listmutex);
- 6 list_for_each_entry(p, head, lp) { 6 list_for_each_entry(p, head, lp) {
- 7 if (p->key == key) { 7 if (p->key == key) {
- 8 list_del(&p->list); 8 list_del_rcu(&p->list);
- 9 write_unlock(&listmutex); 9 spin_unlock(&listmutex);
- 10 synchronize_rcu();
-10 kfree(p); 11 kfree(p);
-11 return 1; 12 return 1;
-12 } 13 }
-13 } 14 }
-14 write_unlock(&listmutex); 15 spin_unlock(&listmutex);
-15 return 0; 16 return 0;
-16 } 17 }
+::
+
+ 1 int search(long key, int *result) 1 int search(long key, int *result)
+ 2 { 2 {
+ 3 struct list_head *lp; 3 struct list_head *lp;
+ 4 struct el *p; 4 struct el *p;
+ 5 5
+ 6 read_lock(&listmutex); 6 rcu_read_lock();
+ 7 list_for_each_entry(p, head, lp) { 7 list_for_each_entry_rcu(p, head, lp) {
+ 8 if (p->key == key) { 8 if (p->key == key) {
+ 9 *result = p->data; 9 *result = p->data;
+ 10 read_unlock(&listmutex); 10 rcu_read_unlock();
+ 11 return 1; 11 return 1;
+ 12 } 12 }
+ 13
} 13 }
+ 14 read_unlock(&listmutex); 14 rcu_read_unlock();
+ 15 return 0; 15 return 0;
+ 16 } 16 }
+
+::
+
+ 1 int delete(long key) 1 int delete(long key)
+ 2 { 2 {
+ 3 struct el *p; 3 struct el *p;
+ 4 4
+ 5 write_lock(&listmutex); 5 spin_lock(&listmutex);
+ 6 list_for_each_entry(p, head, lp) { 6 list_for_each_entry(p, head, lp) {
+ 7 if (p->key == key) { 7 if (p->key == key) {
+ 8 list_del(&p->list); 8 list_del_rcu(&p->list);
+ 9 write_unlock(&listmutex); 9 spin_unlock(&listmutex);
+ 10 synchronize_rcu();
+ 10 kfree(p); 11 kfree(p);
+ 11 return 1; 12 return 1;
+ 12 } 13 }
+ 13 } 14 }
+ 14 write_unlock(&listmutex); 15 spin_unlock(&listmutex);
+ 15 return 0; 16 return 0;
+ 16 } 17 }
 Either way, the differences are quite small. Read-side locking moves to rcu_read_lock() and rcu_read_unlock, update-side locking moves from
@@ -825,22 +871,27 @@ delete() can now block. If this is a problem, there is a callback-based mechanism that never blocks, namely call_rcu() or kfree_rcu(), that can be used in place of synchronize_rcu().
+.. _7_whatisRCU:
 7. FULL LIST OF RCU APIs
+-------------------------
 The RCU APIs are documented in docbook-format header comments in the Linux-kernel source code, but it helps to have a full list of the APIs, since there does not appear to be a way to categorize them in docbook. Here is the list, by category.
-RCU list traversal:
+RCU list traversal::
 list_entry_rcu
+ list_entry_lockless
 list_first_entry_rcu
 list_next_rcu
 list_for_each_entry_rcu
 list_for_each_entry_continue_rcu
 list_for_each_entry_from_rcu
+ list_first_or_null_rcu
+ list_next_or_null_rcu
 hlist_first_rcu
 hlist_next_rcu
 hlist_pprev_rcu
@@ -854,7 +905,7 @@ RCU list traversal:
 hlist_bl_first_rcu
 hlist_bl_for_each_entry_rcu
-RCU pointer/list update:
+RCU pointer/list update::
 rcu_assign_pointer
 list_add_rcu
@@ -864,10 +915,12 @@ RCU pointer/list update:
 hlist_add_behind_rcu
 hlist_add_before_rcu
 hlist_add_head_rcu
+ hlist_add_tail_rcu
 hlist_del_rcu
 hlist_del_init_rcu
 hlist_replace_rcu
- list_splice_init_rcu()
+ list_splice_init_rcu
+ list_splice_tail_init_rcu
 hlist_nulls_del_init_rcu
 hlist_nulls_del_rcu
 hlist_nulls_add_head_rcu
@@ -876,7 +929,9 @@ RCU pointer/list update:
 hlist_bl_del_rcu
 hlist_bl_set_first_rcu
-RCU: Critical sections Grace period Barrier
+RCU::
+
+ Critical sections Grace period Barrier
 rcu_read_lock synchronize_net rcu_barrier
 rcu_read_unlock synchronize_rcu
@@ -885,7 +940,9 @@ RCU: Critical sections Grace period Barrier
 rcu_dereference_check kfree_rcu
 rcu_dereference_protected
-bh: Critical sections Grace period Barrier
+bh::
+
+ Critical sections Grace period Barrier
 rcu_read_lock_bh call_rcu rcu_barrier
 rcu_read_unlock_bh synchronize_rcu
@@ -896,7 +953,9 @@ bh: Critical sections Grace period Barrier
 rcu_dereference_bh_protected
 rcu_read_lock_bh_held
-sched: Critical sections Grace period Barrier
+sched::
+
+ Critical sections Grace period Barrier
 rcu_read_lock_sched call_rcu rcu_barrier
 rcu_read_unlock_sched synchronize_rcu
@@ -910,7 +969,9 @@ sched: Critical sections Grace period Barrier
 rcu_read_lock_sched_held
-SRCU: Critical sections Grace period Barrier
+SRCU::
+
+ Critical sections Grace period Barrier
 srcu_read_lock call_srcu srcu_barrier
 srcu_read_unlock synchronize_srcu
@@ -918,13 +979,14 @@ SRCU: Critical sections Grace period Barrier
 srcu_dereference_check
 srcu_read_lock_held
-SRCU: Initialization/cleanup
+SRCU: Initialization/cleanup::
+
 DEFINE_SRCU
 DEFINE_STATIC_SRCU
 init_srcu_struct
 cleanup_srcu_struct
-All: lockdep-checked RCU-protected pointer access
+All: lockdep-checked RCU-protected pointer access::
 rcu_access_pointer
 rcu_dereference_raw
@@ -974,15 +1036,19 @@ g. Otherwise, use RCU.
 Of course, this all assumes that you have determined that RCU is in fact the right tool for your job.
+.. _8_whatisRCU:
 8. ANSWERS TO QUICK QUIZZES
+----------------------------
-Quick Quiz #1: Why is this argument naive? How could a deadlock
+Quick Quiz #1:
+ Why is this argument naive? How could a deadlock
 occur when using this algorithm in a real-world Linux kernel? [Referring to the lock-based "toy" RCU algorithm.]
-Answer: Consider the following sequence of events:
+Answer:
+ Consider the following sequence of events:
 1. CPU 0 acquires some unrelated lock, call it "problematic_lock", disabling irq via
@@ -1021,10 +1087,14 @@ Answer: Consider the following sequence of events:
 approach where tasks in RCU read-side critical sections cannot be blocked by tasks executing synchronize_rcu().
-Quick Quiz #2: Give an example where Classic RCU's read-side
- overhead is -negative-.
+:ref:`Back to Quick Quiz #1 <quiz_1>`
+
+Quick Quiz #2:
+ Give an example where Classic RCU's read-side
+ overhead is **negative**.
-Answer: Imagine a single-CPU system with a non-CONFIG_PREEMPT
+Answer:
+ Imagine a single-CPU system with a non-CONFIG_PREEMPT
 kernel where a routing table is used by process-context code, but can be updated by irq-context code (for example, by an "ICMP REDIRECT" packet). The usual way of handling
@@ -1046,11 +1116,15 @@ Answer: Imagine a single-CPU system with a non-CONFIG_PREEMPT
 even the theoretical possibility of negative overhead for a synchronization primitive is a bit unexpected.
;-)
-Quick Quiz #3: If it is illegal to block in an RCU read-side
+:ref:`Back to Quick Quiz #2 <quiz_2>`
+
+Quick Quiz #3:
+ If it is illegal to block in an RCU read-side
 critical section, what the heck do you do in PREEMPT_RT, where normal spinlocks can block???
-Answer: Just as PREEMPT_RT permits preemption of spinlock
+Answer:
+ Just as PREEMPT_RT permits preemption of spinlock
 critical sections, it permits preemption of RCU read-side critical sections. It also permits spinlocks blocking while in RCU read-side critical
@@ -1069,6 +1143,7 @@ Answer: Just as PREEMPT_RT permits preemption of spinlock
 Besides, how does the computer know what pizza parlor the human being went to???
+:ref:`Back to Quick Quiz #3 <quiz_3>`
 ACKNOWLEDGEMENTS
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index cdc27cb8d8df..fe64f0c814ea 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4001,6 +4001,19 @@ test until boot completes in order to avoid interference.
+ rcuperf.kfree_rcu_test= [KNL]
+ Set to measure performance of kfree_rcu() flooding.
+
+ rcuperf.kfree_nthreads= [KNL]
+ The number of threads running loops of kfree_rcu().
+
+ rcuperf.kfree_alloc_num= [KNL]
+ Number of allocations and frees done in an iteration.
+
+ rcuperf.kfree_loops= [KNL]
+ Number of loops doing rcuperf.kfree_alloc_num number
+ of allocations and frees.
+
 rcuperf.nreaders= [KNL] Set number of RCU readers. The value -1 selects N, where N is the number of CPUs. A value |