1 files changed, 126 insertions, 82 deletions
diff --git a/Documentation/process/handling-regressions.rst b/Documentation/process/handling-regressions.rst
index abb741b1aeee..5d3c3de3f4ec 100644
--- a/Documentation/process/handling-regressions.rst
+++ b/Documentation/process/handling-regressions.rst
@@ -129,88 +129,132 @@ tools and scripts used by other kernel developers or Linux distributions; one of
 these tools is regzbot, which heavily relies on the "Link:" tags to associate
 reports for regression with changes resolving them.
 
-Prioritize work on fixing regressions
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-You should fix any reported regression as quickly as possible, to provide
-affected users with a solution in a timely manner and prevent more users from
-running into the issue; nevertheless developers need to take enough time and
-care to ensure regression fixes do not cause additional damage.
-
-In the end though, developers should give their best to prevent users from
-running into situations where a regression leaves them only three options: "run
-a kernel with a regression that seriously impacts usage", "continue running an
-outdated and thus potentially insecure kernel version for more than two weeks
-after a regression's culprit was identified", and "downgrade to a still
-supported kernel series that lack required features".
-
-How to realize this depends a lot on the situation. Here are a few rules of
-thumb for you, in order or importance:
-
- * Prioritize work on handling regression reports and fixing regression over all
-   other Linux kernel work, unless the latter concerns acute security issues or
-   bugs causing data loss or damage.
-
- * Always consider reverting the culprit commits and reapplying them later
-   together with necessary fixes, as this might be the least dangerous and
-   quickest way to fix a regression.
-
- * Developers should handle regressions in all supported kernel series, but are
-   free to delegate the work to the stable team, if the issue probably at no
-   point in time occurred with mainline.
-
- * Try to resolve any regressions introduced in the current development before
-   its end. If you fear a fix might be too risky to apply only days before a new
-   mainline release, let Linus decide: submit the fix separately to him as soon
-   as possible with the explanation of the situation. He then can make a call
-   and postpone the release if necessary, for example if multiple such changes
-   show up in his inbox.
-
- * Address regressions in stable, longterm, or proper mainline releases with
-   more urgency than regressions in mainline pre-releases. That changes after
-   the release of the fifth pre-release, aka "-rc5": mainline then becomes as
-   important, to ensure all the improvements and fixes are ideally tested
-   together for at least one week before Linus releases a new mainline version.
-
- * Fix regressions within two or three days, if they are critical for some
-   reason -- for example, if the issue is likely to affect many users of the
-   kernel series in question on all or certain architectures. Note, this
-   includes mainline, as issues like compile errors otherwise might prevent many
-   testers or continuous integration systems from testing the series.
-
- * Aim to fix regressions within one week after the culprit was identified, if
-   the issue was introduced in either:
-
-    * a recent stable/longterm release
-
-    * the development cycle of the latest proper mainline release
-
-   In the latter case (say Linux v5.14), try to address regressions even
-   quicker, if the stable series for the predecessor (v5.13) will be abandoned
-   soon or already was stamped "End-of-Life" (EOL) -- this usually happens about
-   three to four weeks after a new mainline release.
-
- * Try to fix all other regressions within two weeks after the culprit was
-   found. Two or three additional weeks are acceptable for performance
-   regressions and other issues which are annoying, but don't prevent anyone
-   from running Linux (unless it's an issue in the current development cycle,
-   as those should ideally be addressed before the release). A few weeks in
-   total are acceptable if a regression can only be fixed with a risky change
-   and at the same time is affecting only a few users; as much time is
-   also okay if the regression is already present in the second newest longterm
-   kernel series.
-
-Note: The aforementioned time frames for resolving regressions are meant to
-include getting the fix tested, reviewed, and merged into mainline, ideally with
-the fix being in linux-next at least briefly. This leads to delays you need to
-account for.
-
-Subsystem maintainers are expected to assist in reaching those periods by doing
-timely reviews and quick handling of accepted patches. They thus might have to
-send git-pull requests earlier or more often than usual; depending on the fix,
-it might even be acceptable to skip testing in linux-next. Especially fixes for
-regressions in stable and longterm kernels need to be handled quickly, as fixes
-need to be merged in mainline before they can be backported to older series.
+Expectations and best practices for fixing regressions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+As a Linux kernel developer, you are expected to give your best to prevent
+situations where a regression caused by a recent change of yours leaves users
+only these options:
+
+ * Run a kernel with a regression that impacts usage.
+
+ * Switch to an older or newer kernel series.
+
+ * Continue running an outdated and thus potentially insecure kernel for more
+   than three weeks after the regression's culprit was identified. Ideally it
+   should be less than two. And it ought to be just a few days, if the issue is
+   severe or affects many users -- either in general or in prevalent
+   environments.
+
+How to realize that in practice depends on various factors. Use the following
+rules of thumb as a guide.
+
+In general:
+
+ * Prioritize work on regressions over all other Linux kernel work, unless the
+   latter concerns a severe issue (e.g. acute security vulnerability, data loss,
+   bricked hardware, ...).
+
+ * Expedite fixing mainline regressions that recently made it into a proper
+   mainline, stable, or longterm release (either directly or via backport).
+
+ * Do not consider regressions from the current cycle as something that can wait
+   till the end of the cycle, as the issue might discourage or prevent users and
+   CI systems from testing mainline now or generally.
+
+ * Work with the required care to avoid additional or bigger damage, even if
+   resolving an issue then might take longer than outlined below.
+
+On timing once the culprit of a regression is known:
+
+ * Aim to mainline a fix within two or three days, if the issue is severe or
+   bothering many users -- either in general or in prevalent conditions like a
+   particular hardware environment, distribution, or stable/longterm series.
+
+ * Aim to mainline a fix by Sunday after the next, if the culprit made it
+   into a recent mainline, stable, or longterm release (either directly or via
+   backport); if the culprit became known early during a week and is simple to
+   resolve, try to mainline the fix within the same week.
+
+ * For other regressions, aim to mainline fixes before the hindmost Sunday
+   within the next three weeks. One or two Sundays later are acceptable, if the
+   regression is something people can live with easily for a while -- like a
+   mild performance regression.
+
+ * It's strongly discouraged to delay mainlining regression fixes till the next
+   merge window, except when the fix is extraordinarily risky or when the
+   culprit was mainlined more than a year ago.
+
+On procedure:
+
+ * Always consider reverting the culprit, as it's often the quickest and least
+   dangerous way to fix a regression. Don't worry about mainlining a fixed
+   variant later: that should be straight-forward, as most of the code went
+   through review once already.
+
+ * Try to resolve any regressions introduced in mainline during the past
+   twelve months before the current development cycle ends: Linus wants such
+   regressions to be handled like those from the current cycle, unless fixing
+   bears unusual risks.
+
+ * Consider CCing Linus on discussions or patch review, if a regression seems
+   tangly. Do the same in precarious or urgent cases -- especially if the
+   subsystem maintainer might be unavailable. Also CC the stable team, when you
+   know such a regression made it into a mainline, stable, or longterm release.
+
+ * For urgent regressions, consider asking Linus to pick up the fix straight
+   from the mailing list: he is totally fine with that for uncontroversial
+   fixes. Ideally though such requests should happen in accordance with the
+   subsystem maintainers or come directly from them.
+
+ * In case you are unsure if a fix is worth the risk applying just days before
+   a new mainline release, send Linus a mail with the usual lists and people in
+   CC; in it, summarize the situation while asking him to consider picking up
+   the fix straight from the list. He then himself can make the call and when
+   needed even postpone the release. Such requests again should ideally happen
+   in accordance with the subsystem maintainers or come directly from them.
+
+Regarding stable and longterm kernels:
+
+ * You are free to leave regressions to the stable team, if they at no point in
+   time occurred with mainline or were fixed there already.
+
+ * If a regression made it into a proper mainline release during the past
+   twelve months, ensure to tag the fix with "Cc: stable@vger.kernel.org", as a
+   "Fixes:" tag alone does not guarantee a backport. Please add the same tag,
+   in case you know the culprit was backported to stable or longterm kernels.
+
+ * When receiving reports about regressions in recent stable or longterm kernel
+   series, please evaluate at least briefly if the issue might happen in current
+   mainline as well -- and if that seems likely, take hold of the report. If in
+   doubt, ask the reporter to check mainline.
+
+ * Whenever you want to swiftly resolve a regression that recently also made it
+   into a proper mainline, stable, or longterm release, fix it quickly in
+   mainline; when appropriate thus involve Linus to fast-track the fix (see
+   above). That's because the stable team normally does neither revert nor fix
+   any changes that cause the same problems in mainline.
+
+ * In case of urgent regression fixes you might want to ensure prompt
+   backporting by dropping the stable team a note once the fix was mainlined;
+   this is especially advisable during merge windows and shortly thereafter, as
+   the fix otherwise might land at the end of a huge patch queue.
+
+On patch flow:
+
+ * Developers, when trying to reach the time periods mentioned above, remember
+   to account for the time it takes to get fixes tested, reviewed, and merged by
+   Linus, ideally with them being in linux-next at least briefly. Hence, if a
+   fix is urgent, make it obvious to ensure others handle it appropriately.
+
+ * Reviewers, you are kindly asked to assist developers in reaching the time
+   periods mentioned above by reviewing regression fixes in a timely manner.
+
+ * Subsystem maintainers, you likewise are encouraged to expedite the handling
+   of regression fixes. Thus evaluate if skipping linux-next is an option for
+   the particular fix. Also consider sending git pull requests more often than
+   usual when needed. And try to avoid holding onto regression fixes over
+   weekends -- especially when the fix is marked for backporting.
 
 
 More aspects regarding regressions developers should be aware of