From 9878196578286c5ed494778ada01da094377a686 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 3 Feb 2015 18:31:53 -0800 Subject: tcp: do not pace pure ack packets When we added pacing to TCP, we decided to let sch_fq take care of actual pacing. All TCP had to do was to compute sk->pacing_rate using simple formula: sk->pacing_rate = 2 * cwnd * mss / rtt It works well for senders (bulk flows), but not very well for receivers or even RPC : cwnd on the receiver can be less than 10, rtt can be around 100ms, so we can end up pacing ACK packets, slowing down the sender. Really, only the sender should pace, according to its own logic. Instead of adding a new bit in skb, or call yet another flow dissection, we tweak skb->truesize to a small value (2), and we instruct sch_fq to use new helper and not pace pure ack. Note this also helps TCP small queue, as ack packets present in qdisc/NIC do not prevent sending a data packet (RPC workload) This helps to reduce tx completion overhead, ack packets can use regular sock_wfree() instead of tcp_wfree() which is a bit more expensive. This has no impact in the case packets are sent to loopback interface, as we do not coalesce ack packets (were we would detect skb->truesize lie) In case netem (with a delay) is used, skb_orphan_partial() also sets skb->truesize to 1. This patch is a combination of two patches we used for about one year at Google. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- net/sched/sch_fq.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) (limited to 'net/sched') diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c index 2a50f5c62070..69a3dbf55c60 100644 --- a/net/sched/sch_fq.c +++ b/net/sched/sch_fq.c @@ -52,6 +52,7 @@ #include #include #include +#include /* * Per flow structure, dynamically allocated @@ -445,7 +446,9 @@ begin: goto begin; } - if (unlikely(f->head && now < f->time_next_packet)) { + skb = f->head; + if (unlikely(skb && now < f->time_next_packet && + !skb_is_tcp_pure_ack(skb))) { head->first = f->next; fq_flow_set_throttled(q, f); goto begin; @@ -464,12 +467,15 @@ begin: goto begin; } prefetch(&skb->end); - f->time_next_packet = now; f->credit -= qdisc_pkt_len(skb); if (f->credit > 0 || !q->rate_enable) goto out; + /* Do not pace locally generated ack packets */ + if (skb_is_tcp_pure_ack(skb)) + goto out; + rate = q->flow_max_rate; if (skb->sk) rate = min(skb->sk->sk_pacing_rate, rate); -- cgit v1.2.3-58-ga151