diff options
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/process/handling-regressions.rst | 208 |
1 files changed, 126 insertions, 82 deletions
diff --git a/Documentation/process/handling-regressions.rst b/Documentation/process/handling-regressions.rst index abb741b1aeee..5d3c3de3f4ec 100644 --- a/Documentation/process/handling-regressions.rst +++ b/Documentation/process/handling-regressions.rst @@ -129,88 +129,132 @@ tools and scripts used by other kernel developers or Linux distributions; one of these tools is regzbot, which heavily relies on the "Link:" tags to associate reports for regression with changes resolving them. -Prioritize work on fixing regressions -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -You should fix any reported regression as quickly as possible, to provide -affected users with a solution in a timely manner and prevent more users from -running into the issue; nevertheless developers need to take enough time and -care to ensure regression fixes do not cause additional damage. - -In the end though, developers should give their best to prevent users from -running into situations where a regression leaves them only three options: "run -a kernel with a regression that seriously impacts usage", "continue running an -outdated and thus potentially insecure kernel version for more than two weeks -after a regression's culprit was identified", and "downgrade to a still -supported kernel series that lack required features". - -How to realize this depends a lot on the situation. Here are a few rules of -thumb for you, in order or importance: - - * Prioritize work on handling regression reports and fixing regression over all - other Linux kernel work, unless the latter concerns acute security issues or - bugs causing data loss or damage. - - * Always consider reverting the culprit commits and reapplying them later - together with necessary fixes, as this might be the least dangerous and - quickest way to fix a regression. - - * Developers should handle regressions in all supported kernel series, but are - free to delegate the work to the stable team, if the issue probably at no - point in time occurred with mainline. - - * Try to resolve any regressions introduced in the current development before - its end. If you fear a fix might be too risky to apply only days before a new - mainline release, let Linus decide: submit the fix separately to him as soon - as possible with the explanation of the situation. He then can make a call - and postpone the release if necessary, for example if multiple such changes - show up in his inbox. - - * Address regressions in stable, longterm, or proper mainline releases with - more urgency than regressions in mainline pre-releases. That changes after - the release of the fifth pre-release, aka "-rc5": mainline then becomes as - important, to ensure all the improvements and fixes are ideally tested - together for at least one week before Linus releases a new mainline version. - - * Fix regressions within two or three days, if they are critical for some - reason -- for example, if the issue is likely to affect many users of the - kernel series in question on all or certain architectures. Note, this - includes mainline, as issues like compile errors otherwise might prevent many - testers or continuous integration systems from testing the series. - - * Aim to fix regressions within one week after the culprit was identified, if - the issue was introduced in either: - - * a recent stable/longterm release - - * the development cycle of the latest proper mainline release - - In the latter case (say Linux v5.14), try to address regressions even - quicker, if the stable series for the predecessor (v5.13) will be abandoned - soon or already was stamped "End-of-Life" (EOL) -- this usually happens about - three to four weeks after a new mainline release. - - * Try to fix all other regressions within two weeks after the culprit was - found. Two or three additional weeks are acceptable for performance - regressions and other issues which are annoying, but don't prevent anyone - from running Linux (unless it's an issue in the current development cycle, - as those should ideally be addressed before the release). A few weeks in - total are acceptable if a regression can only be fixed with a risky change - and at the same time is affecting only a few users; as much time is - also okay if the regression is already present in the second newest longterm - kernel series. - -Note: The aforementioned time frames for resolving regressions are meant to -include getting the fix tested, reviewed, and merged into mainline, ideally with -the fix being in linux-next at least briefly. This leads to delays you need to -account for. - -Subsystem maintainers are expected to assist in reaching those periods by doing -timely reviews and quick handling of accepted patches. They thus might have to -send git-pull requests earlier or more often than usual; depending on the fix, -it might even be acceptable to skip testing in linux-next. Especially fixes for -regressions in stable and longterm kernels need to be handled quickly, as fixes -need to be merged in mainline before they can be backported to older series. +Expectations and best practices for fixing regressions +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +As a Linux kernel developer, you are expected to give your best to prevent +situations where a regression caused by a recent change of yours leaves users +only these options: + + * Run a kernel with a regression that impacts usage. + + * Switch to an older or newer kernel series. + + * Continue running an outdated and thus potentially insecure kernel for more + than three weeks after the regression's culprit was identified. Ideally it + should be less than two. And it ought to be just a few days, if the issue is + severe or affects many users -- either in general or in prevalent + environments. + +How to realize that in practice depends on various factors. Use the following +rules of thumb as a guide. + +In general: + + * Prioritize work on regressions over all other Linux kernel work, unless the + latter concerns a severe issue (e.g. acute security vulnerability, data loss, + bricked hardware, ...). + + * Expedite fixing mainline regressions that recently made it into a proper + mainline, stable, or longterm release (either directly or via backport). + + * Do not consider regressions from the current cycle as something that can wait + till the end of the cycle, as the issue might discourage or prevent users and + CI systems from testing mainline now or generally. + + * Work with the required care to avoid additional or bigger damage, even if + resolving an issue then might take longer than outlined below. + +On timing once the culprit of a regression is known: + + * Aim to mainline a fix within two or three days, if the issue is severe or + bothering many users -- either in general or in prevalent conditions like a + particular hardware environment, distribution, or stable/longterm series. + + * Aim to mainline a fix by Sunday after the next, if the culprit made it + into a recent mainline, stable, or longterm release (either directly or via + backport); if the culprit became known early during a week and is simple to + resolve, try to mainline the fix within the same week. + + * For other regressions, aim to mainline fixes before the hindmost Sunday + within the next three weeks. One or two Sundays later are acceptable, if the + regression is something people can live with easily for a while -- like a + mild performance regression. + + * It's strongly discouraged to delay mainlining regression fixes till the next + merge window, except when the fix is extraordinarily risky or when the + culprit was mainlined more than a year ago. + +On procedure: + + * Always consider reverting the culprit, as it's often the quickest and least + dangerous way to fix a regression. Don't worry about mainlining a fixed + variant later: that should be straight-forward, as most of the code went + through review once already. + + * Try to resolve any regressions introduced in mainline during the past + twelve months before the current development cycle ends: Linus wants such + regressions to be handled like those from the current cycle, unless fixing + bears unusual risks. + + * Consider CCing Linus on discussions or patch review, if a regression seems + tangly. Do the same in precarious or urgent cases -- especially if the + subsystem maintainer might be unavailable. Also CC the stable team, when you + know such a regression made it into a mainline, stable, or longterm release. + + * For urgent regressions, consider asking Linus to pick up the fix straight + from the mailing list: he is totally fine with that for uncontroversial + fixes. Ideally though such requests should happen in accordance with the + subsystem maintainers or come directly from them. + + * In case you are unsure if a fix is worth the risk applying just days before + a new mainline release, send Linus a mail with the usual lists and people in + CC; in it, summarize the situation while asking him to consider picking up + the fix straight from the list. He then himself can make the call and when + needed even postpone the release. Such requests again should ideally happen + in accordance with the subsystem maintainers or come directly from them. + +Regarding stable and longterm kernels: + + * You are free to leave regressions to the stable team, if they at no point in + time occurred with mainline or were fixed there already. + + * If a regression made it into a proper mainline release during the past + twelve months, ensure to tag the fix with "Cc: stable@vger.kernel.org", as a + "Fixes:" tag alone does not guarantee a backport. Please add the same tag, + in case you know the culprit was backported to stable or longterm kernels. + + * When receiving reports about regressions in recent stable or longterm kernel + series, please evaluate at least briefly if the issue might happen in current + mainline as well -- and if that seems likely, take hold of the report. If in + doubt, ask the reporter to check mainline. + + * Whenever you want to swiftly resolve a regression that recently also made it + into a proper mainline, stable, or longterm release, fix it quickly in + mainline; when appropriate thus involve Linus to fast-track the fix (see + above). That's because the stable team normally does neither revert nor fix + any changes that cause the same problems in mainline. + + * In case of urgent regression fixes you might want to ensure prompt + backporting by dropping the stable team a note once the fix was mainlined; + this is especially advisable during merge windows and shortly thereafter, as + the fix otherwise might land at the end of a huge patch queue. + +On patch flow: + + * Developers, when trying to reach the time periods mentioned above, remember + to account for the time it takes to get fixes tested, reviewed, and merged by + Linus, ideally with them being in linux-next at least briefly. Hence, if a + fix is urgent, make it obvious to ensure others handle it appropriately. + + * Reviewers, you are kindly asked to assist developers in reaching the time + periods mentioned above by reviewing regression fixes in a timely manner. + + * Subsystem maintainers, you likewise are encouraged to expedite the handling + of regression fixes. Thus evaluate if skipping linux-next is an option for + the particular fix. Also consider sending git pull requests more often than + usual when needed. And try to avoid holding onto regression fixes over + weekends -- especially when the fix is marked for backporting. More aspects regarding regressions developers should be aware of |