ipv4: Allow amount of dirty memory from fib resizing to be controllable

fib_trie implementation calls synchronize_rcu when a certain amount of pages are dirty from freed entries. The number of pages was determined experimentally in 2009 (commit c3059477fce2d). At the current setting, synchronize_rcu is called often -- 51 times in a second in one test with an average of an 8 msec delay adding a fib entry. The total impact is a lot of slow down modifying the fib. This is seen in the output of 'time' - the difference between real time and sys+user. For example, using 720,022 single path routes and 'ip -batch'[1]: $ time ./ip -batch ipv4/routes-1-hops real 0m14.214s user 0m2.513s sys 0m6.783s So roughly 35% of the actual time to install the routes is from the ip command getting scheduled out, most notably due to synchronize_rcu (this is observed using 'perf sched timehist'). This patch makes the amount of dirty memory configurable between 64k where the synchronize_rcu is called often (small, low end systems that are memory sensitive) to 64M where synchronize_rcu is called rarely during a large FIB change (for high end systems with lots of memory). The default is 512kB which corresponds to the current setting of 128 pages with a 4kB page size. As an example, at 16MB the worst interval shows 4 calls to synchronize_rcu in a second blocking for up to 30 msec in a single instance, and a total of almost 100 msec across the 4 calls in the second. The trade off is allowing FIB entries to consume more memory in a given time window but but with much better fib insertion rates (~30% increase in prefixes/sec). With this patch and net.ipv4.fib_sync_mem set to 16MB, the same batch file runs in: $ time ./ip -batch ipv4/routes-1-hops real 0m9.692s user 0m2.491s sys 0m6.769s So the dead time is reduced to about 1/2 second or <5% of the real time. [1] 'ip' modified to not request ACK messages which improves route insertion times by about 20% Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
author: David Ahern <dsahern@gmail.com> 2019-03-20 09:18:59 -0700
committer: David S. Miller <davem@davemloft.net> 2019-03-21 13:29:53 -0700
commit: 9ab948a91b2c2abc8e82845c0e61f4b1683e3a4f (patch)
tree: d096d05c00760a38c1d91b6f5be5bd91fc28a445 /net/ipv4/fib_trie.c
parent: 12132768dc4a79be65af75ac6262117d0adf93f3 (diff)
1 files changed, 8 insertions, 6 deletions
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index a573e37e0615..1704f432de1f 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -183,14 +183,16 @@ struct trie {
 };
 
 static struct key_vector *resize(struct trie *t, struct key_vector *tn);
-static size_t tnode_free_size;
+static unsigned int tnode_free_size;
 
 /*
- * synchronize_rcu after call_rcu for that many pages; it should be especially
- * useful before resizing the root node with PREEMPT_NONE configs; the value was
- * obtained experimentally, aiming to avoid visible slowdown.
+ * synchronize_rcu after call_rcu for outstanding dirty memory; it should be
+ * especially useful before resizing the root node with PREEMPT_NONE configs;
+ * the value was obtained experimentally, aiming to avoid visible slowdown.
  */
-static const int sync_pages = 128;
+unsigned int sysctl_fib_sync_mem = 512 * 1024;
+unsigned int sysctl_fib_sync_mem_min = 64 * 1024;
+unsigned int sysctl_fib_sync_mem_max = 64 * 1024 * 1024;
 
 static struct kmem_cache *fn_alias_kmem __ro_after_init;
 static struct kmem_cache *trie_leaf_kmem __ro_after_init;
@@ -504,7 +506,7 @@ static void tnode_free(struct key_vector *tn)
 		tn = container_of(head, struct tnode, rcu)->kv;
 	}
 
-	if (tnode_free_size >= PAGE_SIZE * sync_pages) {
+	if (tnode_free_size >= sysctl_fib_sync_mem) {
 		tnode_free_size = 0;
 		synchronize_rcu();
 	}
author	David Ahern <dsahern@gmail.com>	2019-03-20 09:18:59 -0700
committer	David S. Miller <davem@davemloft.net>	2019-03-21 13:29:53 -0700
commit	9ab948a91b2c2abc8e82845c0e61f4b1683e3a4f (patch)
tree	d096d05c00760a38c1d91b6f5be5bd91fc28a445 /net/ipv4/fib_trie.c
parent	12132768dc4a79be65af75ac6262117d0adf93f3 (diff)