summaryrefslogtreecommitdiff
path: root/net/sched/sch_gred.c
AgeCommit message (Collapse)Author
2012-09-13net_sched: gred: actually perform idling in WRED modeDavid Ward
gred_dequeue() and gred_drop() do not seem to get called when the queue is empty, meaning that we never start idling while in WRED mode. And since qidlestart is not stored by gred_store_wred_set(), we would never stop idling while in WRED mode if we ever started. This messes up the average queue size calculation that influences packet marking/dropping behavior. Now, we start WRED mode idling as we are removing the last packet from the queue. Also we now actually stop WRED mode idling when we are enqueuing a packet. Cc: Bruce Osler <brosler@cisco.com> Signed-off-by: David Ward <david.ward@ll.mit.edu> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-13net_sched: gred: fix qave reporting via netlinkDavid Ward
q->vars.qavg is a Wlog scaled value, but q->backlog is not. In order to pass q->vars.qavg as the backlog value, we need to un-scale it. Additionally, the qave value returned via netlink should not be Wlog scaled, so we need to un-scale the result of red_calc_qavg(). This caused artificially high values for "Average Queue" to be shown by 'tc -s -d qdisc', but did not affect the actual operation of GRED. Signed-off-by: David Ward <david.ward@ll.mit.edu> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-13net_sched: gred: eliminate redundant DP prio comparisonsDavid Ward
Each pair of DPs only needs to be compared once when searching for a non-unique prio value. Signed-off-by: David Ward <david.ward@ll.mit.edu> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-13net_sched: gred: correct comment about qavg calculation in RIO modeDavid Ward
Signed-off-by: David Ward <david.ward@ll.mit.edu> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-15net: Convert net_ratelimit uses to net_<level>_ratelimitedJoe Perches
Standardize the net core ratelimited logging functions. Coalesce formats, align arguments. Change a printk then vprintk sequence to use printf extension %pV. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Fix merge between commit 3adadc08cc1e ("net ax25: Reorder ax25_exit to remove races") and commit 0ca7a4c87d27 ("net ax25: Simplify and cleanup the ax25 sysctl handling") The former moved around the sysctl register/unregister calls, the later simply removed them. With help from Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-16net_sched: gred: Fix oops in gred_dump() in WRED modeDavid Ward
A parameter set exists for WRED mode, called wred_set, to hold the same values for qavg and qidlestart across all VQs. The WRED mode values had been previously held in the VQ for the default DP. After these values were moved to wred_set, the VQ for the default DP was no longer created automatically (so that it could be omitted on purpose, to have packets in the default DP enqueued directly to the device without using RED). However, gred_dump() was overlooked during that change; in WRED mode it still reads qavg/qidlestart from the VQ for the default DP, which might not even exist. As a result, this command sequence will cause an oops: tc qdisc add dev $DEV handle $HANDLE parent $PARENT gred setup \ DPs 3 default 2 grio tc qdisc change dev $DEV handle $HANDLE gred DP 0 prio 8 $RED_OPTIONS tc qdisc change dev $DEV handle $HANDLE gred DP 1 prio 8 $RED_OPTIONS This fixes gred_dump() in WRED mode to use the values held in wred_set. Signed-off-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-01pkt_sched: Stop using NLA_PUT*().David S. Miller
These macros contain a hidden goto, and are thus extremely error prone and make code hard to audit. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-05net_sched: red: split red_parms into parms and varsEric Dumazet
This patch splits the red_parms structure into two components. One holding the RED 'constant' parameters, and one containing the variables. This permits a size reduction of GRED qdisc, and is a preliminary step to add an optional RED unit to SFQ. SFQRED will have a single red_parms structure shared by all flows, and a private red_vars per flow. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Dave Taht <dave.taht@gmail.com> CC: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-16sch_gred: prefer GFP_KERNEL allocationsEric Dumazet
In control path, its better to use GFP_KERNEL allocations where possible. Before taking qdisc spinlock, we preallocate memory just in case we'll need it in gred_change_vq() This is a followup to commit 3f1e6d3fd37b (sch_gred: should not use GFP_KERNEL while holding a spinlock) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/freescale/fsl_pq_mdio.c net/batman-adv/translation-table.c net/ipv6/route.c
2011-12-12sch_gred: should not use GFP_KERNEL while holding a spinlockEric Dumazet
gred_change_vq() is called under sch_tree_lock(sch). This means a spinlock is held, and we are not allowed to sleep in this context. We might pre-allocate memory using GFP_KERNEL before taking spinlock, but this is not suitable for stable material. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-09sch_red: generalize accurate MAX_P support to RED/GRED/CHOKEEric Dumazet
Now RED uses a Q0.32 number to store max_p (max probability), allow RED/GRED/CHOKE to use/report full resolution at config/dump time. Old tc binaries are non aware of new attributes, and still set/get Plog. New tc binary set/get both Plog and max_p for backward compatibility, they display "probability value" if they get max_p from new kernels. # tc -d qdisc show dev ... ... qdisc red 10: parent 1:1 limit 360Kb min 30Kb max 90Kb ecn ewma 5 probability 0.09 Scell_log 15 Make sure we avoid potential divides by 0 in reciprocal_value(), if (max_th - min_th) is big. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-19net_sched: cleanupsEric Dumazet
Cleanup net/sched code to current CodingStyle and practices. Reduce inline abuse Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-30include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo
implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2008-11-13pkt_sched: Remove qdisc->ops->requeue() etc.Jarek Poplawski
After implementing qdisc->ops->peek() and changing sch_netem into classless qdisc there are no more qdisc->ops->requeue() users. This patch removes this method with its wrappers (qdisc_requeue()), and also unused qdisc->requeue structure. There are a few minor fixes of warnings (htb_enqueue()) and comments btw. The idea to kill ->requeue() and a similar patch were first developed by David S. Miller. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-31pkt_sched: Add qdisc->ops->peek() implementation.Jarek Poplawski
Add qdisc->ops->peek() implementation for work-conserving qdiscs. With feedback from Patrick McHardy. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-20net_sched: Add accessor function for packet length for qdiscsJussi Kivilinna
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08pkt_sched: Remove 'dev' member of struct Qdisc.David S. Miller
It can be obtained via the netdev_queue. So create a helper routine, qdisc_dev(), to make the transformations nicer looking. Now, qdisc_alloc() now no longer needs a net_device pointer argument. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-03netlink: Improve returned error codesThomas Graf
Make nlmsg_trim(), nlmsg_cancel(), genlmsg_cancel(), and nla_nest_cancel() void functions. Return -EMSGSIZE instead of -1 if the provided message buffer is not big enough. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NET_SCHED]: Use nla_policy for attribute validation in packet schedulersPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NET_SCHED]: Propagate nla_parse return valuePatrick McHardy
nla_parse() returns more detailed errno codes, propagate them back on error. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NET_SCHED]: Convert packet schedulers from rtnetlink to new netlink APIPatrick McHardy
Convert packet schedulers to use the netlink API. Unfortunately a gradual conversion is not possible without breaking compilation in the middle or adding lots of casts, so this patch converts them all in one step. The patch has been mostly generated automatically with some minor edits to at least allow seperate conversion of classifiers and actions. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NET]: Move Qdisc_class_ops and Qdisc_ops in appropriate sections.Eric Dumazet
Qdisc_class_ops are const, and Qdisc_ops are mostly read. Using "const" and "__read_mostly" qualifiers helps to reduce false sharing. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10[NET_SCHED]: Remove unnecessary includesPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-10[NET] SCHED: Fix whitespace errors.YOSHIFUJI Hideaki
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-21[NET]: Conversions from kmalloc+memset to k(z|c)alloc.Panagiotis Issaris
Signed-off-by: Panagiotis Issaris <takis@issaris.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-30Remove obsolete #include <linux/config.h>Jörn Engel
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2005-11-05[PKT_SCHED]: (G)RED: Introduce hard droppingThomas Graf
Introduces a new flag TC_RED_HARDDROP which specifies that if ECN marking is enabled packets should still be dropped once the average queue length exceeds the maximum threshold. This _may_ help to avoid global synchronisation during small bursts of peers advertising but not caring about ECN. Use this option very carefully, it does more harm than good if (qth_max - qth_min) does not cover at least two average burst cycles. The difference to the current behaviour, in which we'd run into the hard queue limit, is that due to the low pass filter of RED short bursts are less likely to cause a global synchronisation. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: GRED: Support ECN markingThomas Graf
Adds a new u8 flags in a unused padding area of the netlink message. Adds ECN marking support to be used instead of dropping packets immediately. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: GRED: Fix restart of idle period in WRED mode upon dequeue and dropThomas Graf
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: GRED: Cleanup and remove unnecessary codeThomas Graf
Removes unnecessary includes, initializers, and simplifies the code a bit. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: GRED: Remove auto-creation of default VQThomas Graf
Since we are no longer depending on the default VQ to be always allocated we can leave it up to the user to actually create it. This gives the user the ability to leave it out on purpose and enqueue packets directly to the device without applying the RED algorithm. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: GRED: Dont abuse default VQ for equalizingThomas Graf
Introduces a new red parameter set for use in equalize mode, although only the qavg variable and the idle period marker are being used for now this makes it possible to allow a separate parameter set to be used for equalize later on. The use of this separate parameter set fixes a bogus start of an idle period in gred_drop() which did start an idle period on the default VQ even if equalize mode was disabled. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: GRED: Remove initd flagThomas Graf
The case when the default VQ is not set up yet is already handled in a less error prone way. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: GRED: Improve error handling and messagesThomas Graf
Try to enqueue packets if we cannot associate it with a VQ, this basically means that the default VQ has not been set up yet. We must check if the VQ still exists while requeueing, the VQ might have been changed between dequeue and the requeue of the underlying qdisc. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: GRED: Introduce tc_index_to_dp()Thomas Graf
Adds a transformation function returning the DP index for a given skb according to its tc_index. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: GRED: Use generic queue management interfaceThomas Graf
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: GRED: Report congestion related drops as NET_XMIT_CNThomas Graf
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: GRED: Do not reset statistics in gred_reset/gred_changeThomas Graf
Qdiscs are not supposed to reset statistics in reset() and while changing parameters. My argumentation is that if the user wants the counters to be reset he can simply remove and readd the qdiscs, that's what most users do anyway. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: GRED: Use new generic red interfaceThomas Graf
Simplifies code a lot by separating the red algorithm and the queueing logic. We now differentiate between probability marks and forced marks but sum them together again to not break backwards compatibility. This brings GRED back to the level of RED and improves the accuracy of the averge queue length calculations when stab suggests a zero shift. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: GRED: Use central VQ change procedureThomas Graf
Introduces a function gred_change_vq() acting as a central point to change VQ parameters. Fixes priority inheritance in rio mode when the default DP equals 0. Adds proper locking during changes. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: GRED: Report out-of-bound DPs as illegalThomas Graf
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: GRED: Use a central table definition change procedureThomas Graf
Introduces a function gred_change_table_def() acting as a central point to change the table definition. Adds missing validations for table definition: MAX_DPs > DPs > 0 and def_DP < DPs thus fixing possible invalid memory reference oopses. Only root could do it but having a typo crashing the machine is a bit hard. Adds missing locking while changing the table definition, the operation of changing the number of DPs and removing shadowed VQs may not be interrupted by a dequeue. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: GRED: Dump table definitionThomas Graf
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: GRED: Cleanup dumpingThomas Graf
Avoids the allocation of a buffer by appending the VQs directly to the skb and simplifies the code by using the appropriate message construction macros. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: GRED: Transform grio to GRED_RIO_MODEThomas Graf
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: GRED: Cleanup equalize flag and add new WRED mode detectionThomas Graf
Introduces a flags variable using bitops and transforms eqp to use it. Converts the conditions of the form (wred && rio) to (wred) since wred can only be enabled in rio mode anyway. The patch also improves WRED mode detection. The current behaviour does not allow WRED mode to be turned off again without removing the whole qdisc first. The new algorithm checks each VQ against each other looking for equal priorities every time a VQ is changed or added. The performance is poor, O(n**2), but it's used only during administrative tasks and the number of VQs is strictly limited. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-04-16Linux-2.6.12-rc2v2.6.12-rc2Linus Torvalds
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!