diff options
author | Johannes Weiner <hannes@cmpxchg.org> | 2019-11-30 17:50:09 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2019-12-01 06:29:18 -0800 |
commit | 8c8c383c04f6cbcda38e38b2430cb245da4d7e5a (patch) | |
tree | f67ddec92369719a8b2f38e06084988ccb59af65 /mm/memcontrol.c | |
parent | 7249c9f01da30ae5cd1843a54a8fab9b35dd979d (diff) |
mm: memcontrol: try harder to set a new memory.high
Setting a memory.high limit below the usage makes almost no effort to
shrink the cgroup to the new target size.
While memory.high is a "soft" limit that isn't supposed to cause OOM
situations, we should still try harder to meet a user request through
persistent reclaim.
For example, after setting a 10M memory.high on an 800M cgroup full of
file cache, the usage shrinks to about 350M:
+ cat /cgroup/workingset/memory.current
841568256
+ echo 10M
+ cat /cgroup/workingset/memory.current
355729408
This isn't exactly what the user would expect to happen. Setting the
value a few more times eventually whittles the usage down to what we
are asking for:
+ echo 10M
+ cat /cgroup/workingset/memory.current
104181760
+ echo 10M
+ cat /cgroup/workingset/memory.current
31801344
+ echo 10M
+ cat /cgroup/workingset/memory.current
10440704
To improve this, add reclaim retry loops to the memory.high write()
callback, similar to what we do for memory.max, to make a reasonable
effort that the usage meets the requested size after the call returns.
Afterwards, a single write() to memory.high is enough in all but extreme
cases:
+ cat /cgroup/workingset/memory.current
841609216
+ echo 10M
+ cat /cgroup/workingset/memory.current
10182656
790M is not a reasonable reclaim target to ask of a single reclaim
invocation. And it wouldn't be reasonable to optimize the reclaim code
for it. So asking for the full size but retrying is not a bad choice
here: we express our intent, and benefit if reclaim becomes better at
handling larger requests, but we also acknowledge that some of the
deltas we can encounter in memory_high_write() are just too ridiculously
big for a single reclaim invocation to manage.
Link: http://lkml.kernel.org/r/20191022201518.341216-2-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'mm/memcontrol.c')
-rw-r--r-- | mm/memcontrol.c | 30 |
1 files changed, 24 insertions, 6 deletions
diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 2bd6d470c5f1..94a5b6d831f9 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -6091,7 +6091,8 @@ static ssize_t memory_high_write(struct kernfs_open_file *of, char *buf, size_t nbytes, loff_t off) { struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of)); - unsigned long nr_pages; + unsigned int nr_retries = MEM_CGROUP_RECLAIM_RETRIES; + bool drained = false; unsigned long high; int err; @@ -6102,12 +6103,29 @@ static ssize_t memory_high_write(struct kernfs_open_file *of, memcg->high = high; - nr_pages = page_counter_read(&memcg->memory); - if (nr_pages > high) - try_to_free_mem_cgroup_pages(memcg, nr_pages - high, - GFP_KERNEL, true); + for (;;) { + unsigned long nr_pages = page_counter_read(&memcg->memory); + unsigned long reclaimed; + + if (nr_pages <= high) + break; + + if (signal_pending(current)) + break; + + if (!drained) { + drain_all_stock(memcg); + drained = true; + continue; + } + + reclaimed = try_to_free_mem_cgroup_pages(memcg, nr_pages - high, + GFP_KERNEL, true); + + if (!reclaimed && !nr_retries--) + break; + } - memcg_wb_domain_size_changed(memcg); return nbytes; } |