diff options
author | Huang Ying <ying.huang@intel.com> | 2023-02-13 20:34:36 +0800 |
---|---|---|
committer | Andrew Morton <akpm@linux-foundation.org> | 2023-02-16 20:43:52 -0800 |
commit | 5b855937096aea7f81e73ad6d40d433c9dd49577 (patch) | |
tree | b9dc01892689880f3cb48f0c0776e66ed8c6fe70 /fs/signalfd.c | |
parent | 0621d160f1003a8aedd3628133568ecffdd724f7 (diff) |
migrate_pages: organize stats with struct migrate_pages_stats
Patch series "migrate_pages(): batch TLB flushing", v5.
Now, migrate_pages() migrates folios one by one, like the fake code as
follows,
for each folio
unmap
flush TLB
copy
restore map
If multiple folios are passed to migrate_pages(), there are opportunities
to batch the TLB flushing and copying. That is, we can change the code to
something as follows,
for each folio
unmap
for each folio
flush TLB
for each folio
copy
for each folio
restore map
The total number of TLB flushing IPI can be reduced considerably. And we
may use some hardware accelerator such as DSA to accelerate the folio
copying.
So in this patch, we refactor the migrate_pages() implementation and
implement the TLB flushing batching. Base on this, hardware accelerated
folio copying can be implemented.
If too many folios are passed to migrate_pages(), in the naive batched
implementation, we may unmap too many folios at the same time. The
possibility for a task to wait for the migrated folios to be mapped again
increases. So the latency may be hurt. To deal with this issue, the max
number of folios be unmapped in batch is restricted to no more than
HPAGE_PMD_NR in the unit of page. That is, the influence is at the same
level of THP migration.
We use the following test to measure the performance impact of the
patchset,
On a 2-socket Intel server,
- Run pmbench memory accessing benchmark
- Run `migratepages` to migrate pages of pmbench between node 0 and
node 1 back and forth.
With the patch, the TLB flushing IPI reduces 99.1% during the test and
the number of pages migrated successfully per second increases 291.7%.
Xin Hao helped to test the patchset on an ARM64 server with 128 cores,
2 NUMA nodes. Test results show that the page migration performance
increases up to 78%.
This patch (of 9):
Define struct migrate_pages_stats to organize the various statistics in
migrate_pages(). This makes it easier to collect and consume the
statistics in multiple functions. This will be needed in the following
patches in the series.
Link: https://lkml.kernel.org/r/20230213123444.155149-1-ying.huang@intel.com
Link: https://lkml.kernel.org/r/20230213123444.155149-2-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Xin Hao <xhao@linux.alibaba.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Bharata B Rao <bharata@amd.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Diffstat (limited to 'fs/signalfd.c')
0 files changed, 0 insertions, 0 deletions