summaryrefslogtreecommitdiff
path: root/net/ipv4
AgeCommit message (Collapse)Author
2007-10-10[IPV4]: Add ICMPMsgStats MIB (RFC 4293)David L Stevens
Background: RFC 4293 deprecates existing individual, named ICMP type counters to be replaced with the ICMPMsgStatsTable. This table includes entries for both IPv4 and IPv6, and requires counting of all ICMP types, whether or not the machine implements the type. These patches "remove" (but not really) the existing counters, and replace them with the ICMPMsgStats tables for v4 and v6. It includes the named counters in the /proc places they were, but gets the values for them from the new tables. It also counts packets generated from raw socket output (e.g., OutEchoes, MLD queries, RA's from radvd, etc). Changes: 1) create icmpmsg_statistics mib 2) create icmpv6msg_statistics mib 3) modify existing counters to use these 4) modify /proc/net/snmp to add "IcmpMsg" with all ICMP types listed by number for easy SNMP parsing 5) modify /proc/net/snmp printing for "Icmp" to get the named data from new counters. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[IPV4] af_inet.c: use ARRAY_SIZE macro from kernel.h insteadDenis Cheng
Signed-off-by: Denis Cheng <crquan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NETLINK]: Avoid pointer in netlink_run_queueHerbert Xu
I was looking at Patrick's fix to inet_diag and it occured to me that we're using a pointer argument to return values unnecessarily in netlink_run_queue. Changing it to return the value will allow the compiler to generate better code since the value won't have to be memory-backed. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[IPV4/IPV6/DECNET]: Small cleanup for fib rules.Denis V. Lunev
This patch slightly cleanups FIB rules framework. rules_list as a pointer on struct fib_rules_ops is useless. It is always assigned with a static per/subsystem list in IPv4, IPv6 and DecNet. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[CIPSO]: remove duplicated code in the cipso_v4_*_getattr() functionsPaul Moore
The bulk of the CIPSO option parsing/processing in the cipso_v4_sock_getattr() and cipso_v4_skb_getattr() functions are identical, the only real difference being where the functions obtain the CIPSO option itself. This patch creates a new function, cipso_v4_getattr(), which contains the common CIPSO option parsing/processing code and modifies the existing functions to call this new helper function. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NET]: Nuke SET_MODULE_OWNER macro.Ralf Baechle
It's been a useless no-op for long enough in 2.6 so I figured it's time to remove it. The number of people that could object because they're maintaining unified 2.4 and 2.6 drivers is probably rather small. [ Handled drivers added by netdev tree and some missed IRDA cases... -DaveM ] Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[IPV4]: Convert rt_check_expire() from softirq processing to workqueue.Eric Dumazet
On loaded/big hosts, rt_check_expire() if of litle use, because it generally breaks out of its main loop because of a jiffies change. It can take a long time (read : timer invocations) to actually scan the whole hash table, freeing unused entries. Converting it to use a workqueue instead of softirq is a nice move because we can allow rt_check_expire() to do the scan it is supposed to do, without hogging the CPU. This has an impact on the average number of entries in cache, reducing ram usage. Cache is more responsive to parameter changes (/proc/sys/net/ipv4/route/gc_timeout and /proc/sys/net/ipv4/route/gc_interval) Note: Maybe the default value of gc_interval (60 seconds) is too high, since this means we actually need 5 (300/60) invocations to scan the whole table. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NETLINK]: Introduce nested and byteorder flag to netlink attributeThomas Graf
This change allows the generic attribute interface to be used within the netfilter subsystem where this flag was initially introduced. The byte-order flag is yet unused, it's intended use is to allow automatic byte order convertions for all atomic types. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NET]: Make the device list and device lookups per namespace.Eric W. Biederman
This patch makes most of the generic device layer network namespace safe. This patch makes dev_base_head a network namespace variable, and then it picks up a few associated variables. The functions: dev_getbyhwaddr dev_getfirsthwbytype dev_get_by_flags dev_get_by_name __dev_get_by_name dev_get_by_index __dev_get_by_index dev_ioctl dev_ethtool dev_load wireless_process_ioctl were modified to take a network namespace argument, and deal with it. vlan_ioctl_set and brioctl_set were modified so their hooks will receive a network namespace argument. So basically anthing in the core of the network stack that was affected to by the change of dev_base was modified to handle multiple network namespaces. The rest of the network stack was simply modified to explicitly use &init_net the initial network namespace. This can be fixed when those components of the network stack are modified to handle multiple network namespaces. For now the ifindex generator is left global. Fundametally ifindex numbers are per namespace, or else we will have corner case problems with migration when we get that far. At the same time there are assumptions in the network stack that the ifindex of a network device won't change. Making the ifindex number global seems a good compromise until the network stack can cope with ifindex changes when you change namespaces, and the like. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NET]: Support multiple network namespaces with netlinkEric W. Biederman
Each netlink socket will live in exactly one network namespace, this includes the controlling kernel sockets. This patch updates all of the existing netlink protocols to only support the initial network namespace. Request by clients in other namespaces will get -ECONREFUSED. As they would if the kernel did not have the support for that netlink protocol compiled in. As each netlink protocol is updated to be multiple network namespace safe it can register multiple kernel sockets to acquire a presence in the rest of the network namespaces. The implementation in af_netlink is a simple filter implementation at hash table insertion and hash table look up time. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NET]: Make device event notification network namespace safeEric W. Biederman
Every user of the network device notifiers is either a protocol stack or a pseudo device. If a protocol stack that does not have support for multiple network namespaces receives an event for a device that is not in the initial network namespace it quite possibly can get confused and do the wrong thing. To avoid problems until all of the protocol stacks are converted this patch modifies all netdev event handlers to ignore events on devices that are not in the initial network namespace. As the rest of the code is made network namespace aware these checks can be removed. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NET]: Make packet reception network namespace safeEric W. Biederman
This patch modifies every packet receive function registered with dev_add_pack() to drop packets if they are not from the initial network namespace. This should ensure that the various network stacks do not receive packets in a anything but the initial network namespace until the code has been converted and is ready for them. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NET]: Make socket creation namespace safe.Eric W. Biederman
This patch passes in the namespace a new socket should be created in and has the socket code do the appropriate reference counting. By virtue of this all socket create methods are touched. In addition the socket create methods are modified so that they will fail if you attempt to create a socket in a non-default network namespace. Failing if we attempt to create a socket outside of the default network namespace ensures that as we incrementally make the network stack network namespace aware we will not export functionality that someone has not audited and made certain is network namespace safe. Allowing us to partially enable network namespaces before all of the exotic protocols are supported. Any protocol layers I have missed will fail to compile because I now pass an extra parameter into the socket creation code. [ Integrated AF_IUCV build fixes from Andrew Morton... -DaveM ] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NET]: Make /proc/net per network namespaceEric W. Biederman
This patch makes /proc/net per network namespace. It modifies the global variables proc_net and proc_net_stat to be per network namespace. The proc_net file helpers are modified to take a network namespace argument, and all of their callers are fixed to pass &init_net for that argument. This ensures that all of the /proc/net files are only visible and usable in the initial network namespace until the code behind them has been updated to be handle multiple network namespaces. Making /proc/net per namespace is necessary as at least some files in /proc/net depend upon the set of network devices which is per network namespace, and even more files in /proc/net have contents that are relevant to a single network namespace. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NET]: DIV_ROUND_UP cleanup (part two)Ilpo Järvinen
Hopefully captured all single statement cases under net/. I'm not too sure if there is some policy about #includes that are "guaranteed" (ie., in the current tree) to be available through some other #included header, so I just added linux/kernel.h to each changed file that didn't #include it previously. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[IPV4] IPSEC: Omit redirect for tunnelled packet.Masahide NAKAMURA
IPv4 IPsec tunnel gateway incorrectly sends redirect to sender if it is onlink host when network device the IPsec tunnelled packet is arrived is the same as the one the decapsulated packet is sent. With this patch, it omits to send the redirect when the forwarding skbuff carries secpath, since such skbuff should be assumed as a decapsulated packet from IPsec tunnel by own. Request for comments: Alternatively we'd have another way to change net/ipv4/route.c (__mkroute_input) to use RTCF_DOREDIRECT flag unless skbuff has no secpath. It is better than this patch at performance point of view because IPv4 redirect judgement is done at routing slow-path. However, it should be taken care of resource changes between SAD(XFRM states) and routing table. In other words, When IPv4 SAD is changed does the related routing entry go to its slow-path? If not, it is reasonable to apply this patch. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[UDP]: Randomize port selection.Stephen Hemminger
This patch causes UDP port allocation to be randomized like TCP. The earlier code would always choose same port (ie first empty list). Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NET] Cleanup: DIV_ROUND_UPIlpo Järvinen
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP] MIB: Add counters for discarded SACK blocksIlpo Järvinen
In DSACK case, some events are not extraordinary, such as packet duplication generated DSACK. They can arrive easily below snd_una when undo_marker is not set (TCP being in CA_Open), counting such DSACKs amoung SACK discards will likely just mislead if they occur in some scenario when there are other problems as well. Similarly, excessively delayed packets could cause "normal" DSACKs. Therefore, separate counters are allocated for DSACK events. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: Discard fuzzy SACK blocksIlpo Järvinen
SACK processing code has been a sort of russian roulette as no validation of SACK blocks is previously attempted. Besides, it is not very clear what all kinds of broken SACK blocks really mean (e.g., one that has start and end sequence numbers reversed). So now close the roulette once and for all. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: Rename tcp_ack_packets_out -> tcp_rearm_rtoIlpo Järvinen
Only thing that tiny function does is rearming the RTO (if necessary), name it accordingly. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: tcp_packets_out_inc to tcp_output.c (no callers elsewhere)Ilpo Järvinen
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: Remove unnecessary wrapper tcp_packets_out_decIlpo Järvinen
Makes caller side more obvious, there's no need to have a wrapper for this oneliner! Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[IPV4] fib_trie: macro cleanupStephen Hemminger
This patch converts the messy macro for MASK_PFX to inline function and expands TKEY_GET_MASK in the one place it is used. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[IPV4] fib_trie: cleanupStephen Hemminger
Try this out: * replace macro's with inlines * get rid of places doing multiple evaluations of NODE_PARENT [akpm@linux-foundation.org: rcu_dereference wants an lval] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: Move sack_ok access to obviously named funcs & cleanupIlpo Järvinen
Previously code had IsReno/IsFack defined as macros that were local to tcp_input.c though sack_ok field has user elsewhere too for the same purpose. This changes them to static inlines as preferred according the current coding style and unifies the access to sack_ok across multiple files. Magic bitops of sack_ok for FACK and DSACK are also abstracted to functions with appropriate names. Note: - One sack_ok = 1 remains but that's self explanary, i.e., it enables sack - Couple of !IsReno cases are changed to tcp_is_sack - There were no users for IsDSack => I dropped it Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: Reduce sacked_out with reno when purging write_queueIlpo Järvinen
Previously TCP had a transitional state during which reno counted segments that are already below the current window into sacked_out, which is now prevented. In addition, re-try now the unconditional S+L skb catching. This approach conservatively calls just remove_sack and leaves reset_sack() calls alone. The best solution to the whole problem would be to first calculate the new sacked_out fully (this patch does not move reno_sack_reset calls from original sites and thus does not implement this). However, that would require very invasive change to fastretrans_alert (perhaps even slicing it to two halves). Alternatively, all callers of tcp_packets_in_flight (i.e., users that depend on sacked_out) should be postponed until the new sacked_out has been calculated but it isn't any simpler alternative. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: Keep state in Disorder also if only lost_out > 0Ilpo Järvinen
This happens rather infrequently and is only possible during FRTO. We must not allow TCP to slip to Open state because tcp_fastretrans_alert might then not be called on it's time when FRTO has exited. This become a problem when left_out got removed and was replaced by just sacked_out. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: Restore over-zealous tcp_sync_left_out-like removalsIlpo Järvinen
tcp_verify_left_out is useful for verifying S+L condition, so add it back to couple of places in where the code was not calling to tcp_sync_left_out but used own ad-hoc solution (before the tcp_sync_left_out got removed). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: Left out sync->verify (the new meaning of it) & definifyIlpo Järvinen
Left_out was dropped a while ago, thus leaving verifying consistency of the "left out" as only task for the function in question. Thus make it's name more appropriate. In addition, it is intentionally converted to #define instead of static inline because the location of the invariant failure is the most important thing to have if this ever triggers. I think it would have been helpful e.g. in this case where the location of the failure point had to be based on some quesswork: http://lkml.org/lkml/2007/5/2/464 ...Luckily the guesswork seems to have proved to be correct. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: Add tcp_left_out(tp) "back" to get cleaner looking linesIlpo Järvinen
tp->left_out got removed but nothing came to replace it back then (users just did addition by themselves), so add function for users now. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: Tighten tcp_sock's belt, drop left_outIlpo Järvinen
It is easily calculable when needed and user are not that many after all. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: Remove num_acked>0 checks from cong.ctrl mods pkts_ackedIlpo Järvinen
There is no need for such check in pkts_acked because the callback is not invoked unless at least one segment got fully ACKed (i.e., the snd_una moved past skb's end_seq) by the cumulative ACK's snd_una advancement. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: Add tcp_dec_pcount_approx int variantIlpo Järvinen
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: Move code from tcp_ecn.h to tcp*.c and tcp.h & remove itIlpo Järvinen
No other users exist for tcp_ecn.h. Very few things remain in tcp.h, for most TCP ECN functions callers reside within a single .c file and can be placed there. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: Access to highest_sack obsoletes forward_cnt_hintIlpo Järvinen
In addition, added a reference about the purpose of the loop. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP] FRTO: remove unnecessary fackets/sacked_out recountingIlpo Järvinen
F-RTO does not touch SACKED_ACKED bits at all, so there is no need to recount them in tcp_enter_frto_loss. After removal of the else branch, nested ifs can be combined. This must also reset sacked_out when SACK is not in use as TCP could have received some duplicate ACKs prior RTO. To achieve that in a sane manner, tcp_reset_reno_sack was re-placed by the previous patch. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: Move Reno SACKed_out counter functions earlierIlpo Järvinen
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: Extract DSACK detection code from tcp_sacktag_write_queue().David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: Rexmit hint must be cleared instead of setting itIlpo Järvinen
Stupid error from my side. Even though now that I noticed this, I hoped it would have been an optimization but no, the counter hint is then incorrect. Thus clearing is necessary for now (I still suspect though that this path is never executed). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: Extracted rexmit hint clearing from the LOST marking codeIlpo Järvinen
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: Add highest_sack seqno, points to globally highest SACKIlpo Järvinen
It is guaranteed to be valid only when !tp->sacked_out. In most cases this seqno is available in the last ACK but there is no guarantee for that. The new fast recovery loss marking algorithm needs this as entry point. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NET]: Generic Large Receive Offload for TCP trafficJan-Bernd Themann
This patch provides generic Large Receive Offload (LRO) functionality for IPv4/TCP traffic. LRO combines received tcp packets to a single larger tcp packet and passes them then to the network stack in order to increase performance (throughput). The interface supports two modes: Drivers can either pass SKBs or fragment lists to the LRO engine. Signed-off-by: Jan-Bernd Themann <themann@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-07[TCP]: Fix fastpath_cnt_hint when GSO skb is partially ACKedIlpo Järvinen
When only GSO skb was partially ACKed, no hints are reset, therefore fastpath_cnt_hint must be tweaked too or else it can corrupt fackets_out. The corruption to occur, one must have non-trivial ACK/SACK sequence, so this bug is not very often that harmful. There's a fackets_out state reset in TCP because fackets_out is known to be inaccurate and that fixes the issue eventually anyway. In case there was also at least one skb that got fully ACKed, the fastpath_skb_hint is set to NULL which causes a recount for fastpath_cnt_hint (the old value won't be accessed anymore), thus it can safely be decremented without additional checking. Reported by Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-09-28[TCP]: Fix MD5 signature handling on big-endian.David S. Miller
Based upon a report and initial patch by Peter Lieven. tcp4_md5sig_key and tcp6_md5sig_key need to start with the exact same members as tcp_md5sig_key. Because they are both cast to that type by tcp_v{4,6}_md5_do_lookup(). Unfortunately tcp{4,6}_md5sig_key use a u16 for the key length instead of a u8, which is what tcp_md5sig_key uses. This just so happens to work by accident on little-endian, but on big-endian it doesn't. Instead of casting, just place tcp_md5sig_key as the first member of the address-family specific structures, adjust the access sites, and kill off the ugly casts. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-09-14[IPV4]: Just increment OutDatagrams once per a datagram.YOSHIFUJI Hideaki
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-09-11[INET_DIAG]: Fix oops in netlink_rcv_skbPatrick McHardy
netlink_run_queue() doesn't handle multiple processes processing the queue concurrently. Serialize queue processing in inet_diag to fix a oops in netlink_rcv_skb caused by netlink_run_queue passing a NULL for the skb. BUG: unable to handle kernel NULL pointer dereference at virtual address 00000054 [349587.500454] printing eip: [349587.500457] c03318ae [349587.500459] *pde = 00000000 [349587.500464] Oops: 0000 [#1] [349587.500466] PREEMPT SMP [349587.500474] Modules linked in: w83627hf hwmon_vid i2c_isa [349587.500483] CPU: 0 [349587.500485] EIP: 0060:[<c03318ae>] Not tainted VLI [349587.500487] EFLAGS: 00010246 (2.6.22.3 #1) [349587.500499] EIP is at netlink_rcv_skb+0xa/0x7e [349587.500506] eax: 00000000 ebx: 00000000 ecx: c148d2a0 edx: c0398819 [349587.500510] esi: 00000000 edi: c0398819 ebp: c7a21c8c esp: c7a21c80 [349587.500517] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068 [349587.500521] Process oidentd (pid: 17943, ti=c7a20000 task=cee231c0 task.ti=c7a20000) [349587.500527] Stack: 00000000 c7a21cac f7c8ba78 c7a21ca4 c0331962 c0398819 f7c8ba00 0000004c [349587.500542] f736f000 c7a21cb4 c03988e3 00000001 f7c8ba00 c7a21cc4 c03312a5 0000004c [349587.500558] f7c8ba00 c7a21cd4 c0330681 f7c8ba00 e4695280 c7a21d00 c03307c6 7fffffff [349587.500578] Call Trace: [349587.500581] [<c010361a>] show_trace_log_lvl+0x1c/0x33 [349587.500591] [<c01036d4>] show_stack_log_lvl+0x8d/0xaa [349587.500595] [<c010390e>] show_registers+0x1cb/0x321 [349587.500604] [<c0103bff>] die+0x112/0x1e1 [349587.500607] [<c01132d2>] do_page_fault+0x229/0x565 [349587.500618] [<c03c8d3a>] error_code+0x72/0x78 [349587.500625] [<c0331962>] netlink_run_queue+0x40/0x76 [349587.500632] [<c03988e3>] inet_diag_rcv+0x1f/0x2c [349587.500639] [<c03312a5>] netlink_data_ready+0x57/0x59 [349587.500643] [<c0330681>] netlink_sendskb+0x24/0x45 [349587.500651] [<c03307c6>] netlink_unicast+0x100/0x116 [349587.500656] [<c0330f83>] netlink_sendmsg+0x1c2/0x280 [349587.500664] [<c02fcce9>] sock_sendmsg+0xba/0xd5 [349587.500671] [<c02fe4d1>] sys_sendmsg+0x17b/0x1e8 [349587.500676] [<c02fe92d>] sys_socketcall+0x230/0x24d [349587.500684] [<c01028d2>] syscall_call+0x7/0xb [349587.500691] ======================= [349587.500693] Code: f0 ff 4e 18 0f 94 c0 84 c0 0f 84 66 ff ff ff 89 f0 e8 86 e2 fc ff e9 5a ff ff ff f0 ff 40 10 eb be 55 89 e5 57 89 d7 56 89 c6 53 <8b> 50 54 83 fa 10 72 55 8b 9e 9c 00 00 00 31 c9 8b 03 83 f8 0f Reported by Athanasius <link@miggy.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-09-11[NETFILTER]: Fix/improve deadlock condition on module removal netfilterNeil Horman
So I've had a deadlock reported to me. I've found that the sequence of events goes like this: 1) process A (modprobe) runs to remove ip_tables.ko 2) process B (iptables-restore) runs and calls setsockopt on a netfilter socket, increasing the ip_tables socket_ops use count 3) process A acquires a file lock on the file ip_tables.ko, calls remove_module in the kernel, which in turn executes the ip_tables module cleanup routine, which calls nf_unregister_sockopt 4) nf_unregister_sockopt, seeing that the use count is non-zero, puts the calling process into uninterruptible sleep, expecting the process using the socket option code to wake it up when it exits the kernel 4) the user of the socket option code (process B) in do_ipt_get_ctl, calls ipt_find_table_lock, which in this case calls request_module to load ip_tables_nat.ko 5) request_module forks a copy of modprobe (process C) to load the module and blocks until modprobe exits. 6) Process C. forked by request_module process the dependencies of ip_tables_nat.ko, of which ip_tables.ko is one. 7) Process C attempts to lock the request module and all its dependencies, it blocks when it attempts to lock ip_tables.ko (which was previously locked in step 3) Theres not really any great permanent solution to this that I can see, but I've developed a two part solution that corrects the problem Part 1) Modifies the nf_sockopt registration code so that, instead of using a use counter internal to the nf_sockopt_ops structure, we instead use a pointer to the registering modules owner to do module reference counting when nf_sockopt calls a modules set/get routine. This prevents the deadlock by preventing set 4 from happening. Part 2) Enhances the modprobe utilty so that by default it preforms non-blocking remove operations (the same way rmmod does), and add an option to explicity request blocking operation. So if you select blocking operation in modprobe you can still cause the above deadlock, but only if you explicity try (and since root can do any old stupid thing it would like.... :) ). Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-09-11[NETFILTER]: nf_conntrack_ipv4: fix "Frag of proto ..." messagesPatrick McHardy
Since we're now using a generic tuple decoding function in ICMP connection tracking, ipv4_get_l4proto() might get called with a fragmented packet from within an ICMP error. Remove the error message we used to print when this happens. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-09-11[IPV4] devinet: show all addresses assigned to interfaceStephen Hemminger
Bug: http://bugzilla.kernel.org/show_bug.cgi?id=8876 Not all ips are shown by "ip addr show" command when IPs number assigned to an interface is more than 60-80 (in fact it depends on broadcast/label etc presence on each address). Steps to reproduce: It's terribly simple to reproduce: # for i in $(seq 1 100); do ip ad add 10.0.$i.1/24 dev eth10 ; done # ip addr show this will _not_ show all IPs. Looks like the problem is in netlink/ipv4 message processing. This is fix from bug submitter, it looks correct. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>