mm/munlock: delete page_mlock() and all its works

We have recommended some applications to mlock their userspace, but that turns out to be counter-productive: when many processes mlock the same file, contention on rmap's i_mmap_rwsem can become intolerable at exit: it is needed for write, to remove any vma mapping that file from rmap's tree; but hogged for read by those with mlocks calling page_mlock() (formerly known as try_to_munlock()) on *each* page mapped from the file (the purpose being to find out whether another process has the page mlocked, so therefore it should not be unmlocked yet). Several optimizations have been made in the past: one is to skip page_mlock() when mapcount tells that nothing else has this page mapped; but that doesn't help at all when others do have it mapped. This time around, I initially intended to add a preliminary search of the rmap tree for overlapping VM_LOCKED ranges; but that gets messy with locking order, when in doubt whether a page is actually present; and risks adding even more contention on the i_mmap_rwsem. A solution would be much easier, if only there were space in struct page for an mlock_count... but actually, most of the time, there is space for it - an mlocked page spends most of its life on an unevictable LRU, but since 3.18 removed the scan_unevictable_pages sysctl, that "LRU" has been redundant. Let's try to reuse its page->lru. But leave that until a later patch: in this patch, clear the ground by removing page_mlock(), and all the infrastructure that has gathered around it - which mostly hinders understanding, and will make reviewing new additions harder. Don't mind those old comments about THPs, they date from before 4.5's refcounting rework: splitting is not a risk here. Just keep a minimal version of munlock_vma_page(), as reminder of what it should attend to (in particular, the odd way PGSTRANDED is counted out of PGMUNLOCKED), and likewise a stub for munlock_vma_pages_range(). Move unchanged __mlock_posix_error_return() out of the way, down to above its caller: this series then makes no further change after mlock_fixup(). After this and each following commit, the kernel builds, boots and runs; but with deficiencies which may show up in testing of mlock and munlock. The system calls succeed or fail as before, and mlock remains effective in preventing page reclaim; but meminfo's Unevictable and Mlocked amounts may be shown too low after mlock, grow, then stay too high after munlock: with previously mlocked pages remaining unevictable for too long, until finally unmapped and freed and counts corrected. Normal service will be resumed in "mm/munlock: mlock_pte_range() when mlocking or munlocking". Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
author: Hugh Dickins <hughd@google.com> 2022-02-14 18:20:24 -0800
committer: Matthew Wilcox (Oracle) <willy@infradead.org> 2022-02-17 11:56:13 -0500
commit: ebcbc6ea7d8a604ad8504dae70a6ac1b1e64a0b7 (patch)
tree: 014fb16d25e6e420bbaef89f800f58fd24983747 /include/linux/rmap.h
parent: f71077a4d84bbe8c7b91b7db7c4ef815755ac5e3 (diff)
1 files changed, 0 insertions, 6 deletions
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index e704b1a4c06c..dc48aa8c2c94 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -237,12 +237,6 @@ unsigned long page_address_in_vma(struct page *, struct vm_area_struct *);
  */
 int folio_mkclean(struct folio *);
 
-/*
- * called in munlock()/munmap() path to check for other vmas holding
- * the page mlocked.
- */
-void page_mlock(struct page *page);
-
 void remove_migration_ptes(struct page *old, struct page *new, bool locked);
 
 /*
author	Hugh Dickins <hughd@google.com>	2022-02-14 18:20:24 -0800
committer	Matthew Wilcox (Oracle) <willy@infradead.org>	2022-02-17 11:56:13 -0500
commit	ebcbc6ea7d8a604ad8504dae70a6ac1b1e64a0b7 (patch)
tree	014fb16d25e6e420bbaef89f800f58fd24983747 /include/linux/rmap.h
parent	f71077a4d84bbe8c7b91b7db7c4ef815755ac5e3 (diff)