linux.git - dakr's fork of kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Age	Commit message	Author
2009-06-18	net: correct off-by-one write allocations reports	Eric Dumazet
	commit 2b85a34e911bf483c27cfdd124aeb1605145dc80 (net: No more expensive sock_hold()/sock_put() on each tx) changed the initial sk_wmem_alloc value. We need to take this offset into account when reporting sk_wmem_alloc to the user, in PROC_FS files or various ioctls (SIOCOUTQ/TIOCOUTQ). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-18	net: group address list and its count	Jiri Pirko
	This patch is inspired by a patch recently posted by Johannes Berg. It groups the address list and its count into a newly introduced structure, netdev_hw_addr_list. This brings two benefits: 1) struct net_device becomes a bit nicer. 2) In the future it will be possible to operate on lists independently of netdevices (by exporting the right functions). I wanted to introduce this patch before I post a multicast list conversion. Signed-off-by: Jiri Pirko <jpirko@redhat.com> drivers/net/bnx2.c | 4 +- drivers/net/e1000/e1000_main.c | 4 +- drivers/net/ixgbe/ixgbe_main.c | 6 +- drivers/net/mv643xx_eth.c | 2 +- drivers/net/niu.c | 4 +- drivers/net/virtio_net.c | 10 ++-- drivers/s390/net/qeth_l2_main.c | 2 +- include/linux/netdevice.h | 17 +++-- net/core/dev.c | 130 ++++++++++++++++++-------------------- 9 files changed, 89 insertions(+), 90 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-18	ipv4: Fix fib_trie rebalancing, part 2	Jarek Poplawski
	My previous patch, which explicitly delays freeing of tnodes by adding them to a list that is flushed after the update is finished, isn't strict enough. It makes an exception for tnodes without a parent, assuming they are newly created and therefore still "invisible" to the read side. But the top tnode has no parent either, so we have to drop all such exceptions (at least until a better way is found). Additionally, we need to move the RCU assignment of this node before the flush, so the return type of the trie_rebalance() function is changed. Reported-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-17	pkt_sched: Update drops stats in act_police	Jarek Poplawski
	Action police statistics could be misleading because drops are not shown when expected. With feedback from: Jamal Hadi Salim <hadi@cyberus.ca> Reported-by: Pawel Staszewski <pstaszewski@itcare.pl> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-17	skbuff: don't corrupt mac_header on skb expansion	Stephen Hemminger
	The skb mac_header field is sometimes NULL (or ~0u) as a sentinel value. The places where skb is expanded add an offset which would change this flag into an invalid pointer (or offset). Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
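The hazard described above can be sketched in plain C. This is a userspace toy, not kernel code: `struct toy_skb`, `MAC_HEADER_UNSET` and `toy_skb_headers_offset_update()` are hypothetical names chosen for illustration, and only the offset-based (~0u sentinel) case is modelled.

```c
#include <assert.h>

/* Hypothetical sentinel meaning "no MAC header recorded" (offset-based case). */
#define MAC_HEADER_UNSET (~0U)

/* Toy stand-in for the header offsets kept in an sk_buff. */
struct toy_skb {
    unsigned int mac_header;     /* offset from buffer start, or MAC_HEADER_UNSET */
    unsigned int network_header; /* offset from buffer start */
};

/* Shift header offsets after 'off' bytes of headroom were added in front. */
static void toy_skb_headers_offset_update(struct toy_skb *skb, unsigned int off)
{
    skb->network_header += off;
    /* Leave the sentinel alone: ~0U + off would wrap into a bogus offset. */
    if (skb->mac_header != MAC_HEADER_UNSET)
        skb->mac_header += off;
}
```

With the guard in place, a buffer whose MAC header was never set still reports "unset" after expansion, while a real offset moves together with the data.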
2009-06-17	skbuff: skb_mac_header_was_set is always true on >32 bit	Stephen Hemminger
	Looking at the crash in log_martians(), one suspect is that the check for mac header being set is not correct. The value of mac_header defaults to 0 on allocation, therefore skb_mac_header_was_set will always be true on platforms using NET_SKBUFF_USES_OFFSET. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
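A minimal sketch of why a zero default defeats the check, assuming the "unset" sentinel is ~0U on offset-based platforms. All names (`toy_skb2`, `toy_alloc_zeroed`, `toy_alloc_with_sentinel`) are hypothetical, and initialising the field at allocation is shown as one plausible remedy rather than as the exact change the commit makes.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAC_UNSET (~0U) /* assumed sentinel for "never set" */

struct toy_skb2 {
    unsigned int mac_header; /* offset, or TOY_MAC_UNSET */
};

/* The check compares against the sentinel, not against zero. */
static bool toy_mac_header_was_set(const struct toy_skb2 *skb)
{
    return skb->mac_header != TOY_MAC_UNSET;
}

/* Allocation that only zeroes the structure: offset 0 looks like "set". */
static void toy_alloc_zeroed(struct toy_skb2 *skb)
{
    memset(skb, 0, sizeof(*skb));
}

/* Allocation that also stores the sentinel, so the check starts out false. */
static void toy_alloc_with_sentinel(struct toy_skb2 *skb)
{
    memset(skb, 0, sizeof(*skb));
    skb->mac_header = TOY_MAC_UNSET;
}
```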
2009-06-17	net: sk_wmem_alloc has initial value of one, not zero	Eric Dumazet
	commit 2b85a34e911bf483c27cfdd124aeb1605145dc80 (net: No more expensive sock_hold()/sock_put() on each tx) changed initial sk_wmem_alloc value. Some protocols check sk_wmem_alloc value to determine if a timer must delay socket deallocation. We must take care of the sk_wmem_alloc value being one instead of zero when no write allocations are pending. Reported by Ingo Molnar, and full diagnostic from David Miller. This patch introduces three helpers to get read/write allocations and a followup patch will use these helpers to report correct write allocations to user. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
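As a rough illustration of the kind of helper the message describes, the sketch below hides the one-unit offset behind an accessor. It is a userspace model using C11 atomics; `struct toy_sock`, `toy_wmem_alloc_get()` and `toy_has_allocations()` are hypothetical names, and only the write side is shown.

```c
#include <assert.h>
#include <stdatomic.h>

/* Toy socket: the counter starts at 1, not 0, while the socket is alive. */
struct toy_sock {
    atomic_int wmem_alloc;
};

static void toy_sock_init(struct toy_sock *sk)
{
    atomic_init(&sk->wmem_alloc, 1);
}

/* Bytes actually queued for transmit, with the initial offset removed. */
static int toy_wmem_alloc_get(struct toy_sock *sk)
{
    return atomic_load(&sk->wmem_alloc) - 1;
}

/* What a protocol should ask before deciding to delay deallocation. */
static int toy_has_allocations(struct toy_sock *sk)
{
    return toy_wmem_alloc_get(sk) > 0;
}
```

A caller that tested the raw counter against zero would always see a non-zero value on an idle socket; going through the accessor avoids that.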
2009-06-16	x25: Fix sleep from timer on socket destroy.	David S. Miller
	If socket destruction gets delayed to a timer, we try to lock_sock() from that timer, which won't work. Use bh_lock_sock() in that case. Signed-off-by: David S. Miller <davem@davemloft.net> Tested-by: Ingo Molnar <mingo@elte.hu>
2009-06-15	mac80211: fix wext bssid/ssid setting	Johannes Berg
	When changing to a new BSSID or SSID, the code in ieee80211_set_disassoc() needs to have the old data still valid to be able to disconnect and clean up properly. Currently, however, the old data is thrown away before ieee80211_set_disassoc() is ever called, so fix that by calling the function _before_ the old data is overwritten. This is (one of) the issue(s) causing mac80211 to hold cfg80211's BSS structs forever, and them thus being returned in scan results after they're long gone. http://www.intellinuxwireless.org/bugzilla/show_bug.cgi?id=2015 Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-06-15	mac80211: disconnect when user changes channel	Johannes Berg
	If we do not disconnect when a channel switch is requested, we end up eventually detecting beacon loss from the AP and then disconnecting, without ever really telling the AP, so we might just as well disconnect right away. Additionally, this fixes a problem with iwlwifi where the driver will clear some internal state on channel changes like this and then get confused when we actually go clear that state from mac80211. It may look like this patch drops the no-IBSS check, but that is already handled by cfg80211 in the wext handler it provides for IBSS (cfg80211_ibss_wext_siwfreq). Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-06-15	mac80211: add queue debugfs file	Johannes Berg
	I suspect that some driver bugs can cause queues to be stopped while they shouldn't be, but it's hard to find out whether that is the case or not without having any visible information about the queues. This adds a file to debugfs that allows us to see the queues' statuses. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-06-15	mac80211: Do not try to associate with an empty SSID	Jouni Malinen
	It looks like some programs (e.g., NM) are setting an empty SSID with SIOCSIWESSID in some cases. This seems to trigger mac80211 to try to associate with an invalid configuration (wildcard SSID), which will result in failing associations (or odd issues, potentially including a kernel panic with some drivers, if the AP were to actually accept this anyway). Only start the association process if the SSID is actually set. This speeds up connection with NM in a number of cases and avoids sending out broken association request frames. Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-06-15	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6	David S. Miller
	Conflicts: Documentation/feature-removal-schedule.txt drivers/scsi/fcoe/fcoe.c net/core/drop_monitor.c net/core/net-traces.c
2009-06-15	pkt_sched: Rename PSCHED_US2NS and PSCHED_NS2US	Jarek Poplawski
	Let's use TICKS instead of US, so PSCHED_TICKS2NS and PSCHED_NS2TICKS (like in PSCHED_TICKS_PER_SEC already) to avoid misleading. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-15	ipv4: Fix fib_trie rebalancing	Jarek Poplawski
	While doing trie_rebalance(): resize(), inflate(), halve() RCU free tnodes before updating their parents. It depends on RCU delaying the real destruction, but if RCU readers start after call_rcu() and before parent update they could access freed memory. It is currently prevented with preempt_disable() on the update side, but it's not safe, except maybe classic RCU, plus it conflicts with memory allocations with GFP_KERNEL flag used from these functions. This patch explicitly delays freeing of tnodes by adding them to the list, which is flushed after the update is finished. Reported-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
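The "queue now, free after the update" idea can be sketched as a simple singly linked list. This is a single-threaded userspace toy: `toy_tnode`, `toy_tnode_free_safe()` and `toy_tnode_free_flush()` are hypothetical names, and plain `free()` stands in for whatever RCU-deferred release the kernel performs at flush time.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy trie node; next_free links nodes that are waiting to be released. */
struct toy_tnode {
    struct toy_tnode *next_free;
    int id;
};

/* Head of the pending-free list (one updater at a time is assumed). */
static struct toy_tnode *toy_free_head;

/* Called during rebalancing: remember the node instead of freeing it. */
static void toy_tnode_free_safe(struct toy_tnode *tn)
{
    tn->next_free = toy_free_head;
    toy_free_head = tn;
}

/* Called once parents point at the new nodes: release everything queued. */
static int toy_tnode_free_flush(void)
{
    int released = 0;

    while (toy_free_head != NULL) {
        struct toy_tnode *tn = toy_free_head;

        toy_free_head = tn->next_free;
        free(tn);
        released++;
    }
    return released;
}
```

The point of the ordering is that no node becomes reclaimable while a stale parent pointer could still lead a reader to it.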
2009-06-14	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial	Linus Torvalds
	* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (31 commits)
	trivial: remove the trivial patch monkey's name from SubmittingPatches
	trivial: Fix a typo in comment of addrconf_dad_start()
	trivial: usb: fix missing space typo in doc
	trivial: pci hotplug: adding __init/__exit macros to sgi_hotplug
	trivial: Remove the hyphen from git commands
	trivial: fix ETIMEOUT -> ETIMEDOUT typos
	trivial: Kconfig: .ko is normally not included in module names
	trivial: SubmittingPatches: fix typo
	trivial: Documentation/dell_rbu.txt: fix typos
	trivial: Fix Pavel's address in MAINTAINERS
	trivial: ftrace:fix description of trace directory
	trivial: unnecessary (void*) cast removal in sound/oss/msnd.c
	trivial: input/misc: Fix typo in Kconfig
	trivial: fix grammo in bus_for_each_dev() kerneldoc
	trivial: rbtree.txt: fix rb_entry() parameters in sample code
	trivial: spelling fix in ppc code comments
	trivial: fix typo in bio_alloc kernel doc
	trivial: Documentation/rbtree.txt: cleanup kerneldoc of rbtree.txt
	trivial: Miscellaneous documentation typo fixes
	trivial: fix typo milisecond/millisecond for documentation and source comments.
	...
2009-06-14	Bluetooth: Fix Kconfig issue with RFKILL integration	Marcel Holtmann
	Since the re-write of the RFKILL subsystem it is no longer good to just select RFKILL, but it is important to add a proper depends on rule. Based on a report by Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-06-14	PIM-SM: namespace changes	Tom Goff
	IPv4: - make PIM register vifs netns local - set the netns when a PIM register vif is created - make PIM available in all network namespaces (if CONFIG_IP_PIMSM_V2) by adding the protocol handler when multicast routing is initialized IPv6: - make PIM register vifs netns local - make PIM available in all network namespaces (if CONFIG_IPV6_PIMSM_V2) by adding the protocol handler when multicast routing is initialized Signed-off-by: Tom Goff <thomas.goff@boeing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-13	ipv4: update ARPD help text	Timo Teräs
	Removed the statements about ARP cache size, as this config option does not affect it. The cache size is controlled by neigh_table gc thresholds. Also remove the experimental and obsolete markings, as the API originally intended for ARP caching is useful for implementing ARP-like protocols (e.g. NHRP) in user space and has been there for a long enough time. Signed-off-by: Timo Teras <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-13	net: use a deferred timer in rt_check_expire	Eric Dumazet
	For the sake of power-saving enthusiasts, use a deferrable timer to fire rt_check_expire(). Because the cache equilibrium of some big routers depends on garbage collection being done in time, we take into account the time elapsed between two rt_check_expire() invocations to adjust the number of slots we have to check. Based on an initial idea and patch from Tero Kristo. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Tero Kristo <tero.kristo@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
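One way to picture the elapsed-time adjustment: if a deferrable timer fires late, scan proportionally more hash slots so that the whole table is still covered once per garbage-collection period. The formula and the names below (`toy_scan_goal`, `table_size`, `gc_timeout`) are assumptions made for illustration; the message itself does not spell out the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of hash slots to scan on this invocation.
 * table_size: total slots; elapsed: ticks since the previous run;
 * gc_timeout: ticks in which the whole table should be visited once.
 */
static unsigned int toy_scan_goal(unsigned int table_size,
                                  unsigned long elapsed,
                                  unsigned long gc_timeout)
{
    uint64_t goal;

    assert(gc_timeout > 0);
    /* 64-bit intermediate so table_size * elapsed cannot overflow early. */
    goal = (uint64_t)table_size * elapsed / gc_timeout;
    /* Never scan more than the whole table in one pass. */
    if (goal > table_size)
        goal = table_size;
    return (unsigned int)goal;
}
```

For example, with 1024 slots and a timeout of 100 ticks, a run after 10 ticks scans about a tenth of the table, while a run delayed past 100 ticks scans all of it.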
2009-06-13	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6	David S. Miller
2009-06-13	x_tables: Convert printk to pr_err	Joe Perches
	Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-13	netfilter: conntrack: optional reliable conntrack event delivery	Pablo Neira Ayuso
	This patch improves ctnetlink event reliability if one broadcast listener has set the NETLINK_BROADCAST_ERROR socket option. The logic is the following: if an event delivery fails, we keep the undelivered events in the missed event cache. Once the next packet arrives, we add the new events (if any) to the missed events in the cache and we try a new delivery, and so on. Thus, if ctnetlink fails to deliver an event, we try to deliver it once we see a new packet. Therefore, we may lose state transitions but the userspace process gets in sync at some point. In the worst case, if no events were delivered to userspace, we make sure that destroy events are successfully delivered. Basically, if ctnetlink fails to deliver the destroy event, we remove the conntrack entry from the hashes and insert it in the dying list, which contains inactive entries. Then, the conntrack timer is added with an extra grace timeout of random32() % 15 seconds to trigger the event again (this grace timeout is tunable via /proc). The use of a limited random timeout value allows distributing the "destroy" resends, thus avoiding accumulating lots of "destroy" events at the same time. Events may be delivered out of order, but we can identify them by means of the tuple plus the conntrack ID. The maximum number of conntrack entries (active or inactive) is still handled by nf_conntrack_max. Thus, we may start dropping packets at some point if we accumulate a lot of inactive conntrack entries that did not successfully report the destroy event to userspace. During my stress tests, consisting of setting a very small buffer of 2048 bytes for conntrackd and the NETLINK_BROADCAST_ERROR socket flag, and generating lots of very small connections, I noticed very few destroy entries on the fly waiting to be resent.
	A simple way to test this patch consists of creating a lot of entries, setting a very small Netlink buffer in conntrackd (plus a patch which is not in the git tree to set the BROADCAST_ERROR flag) and invoking `conntrack -F'. For expectations, no changes are introduced in this patch. Currently, event delivery is only done for new expectations (no events from expectation expiration, removal and confirmation). In that case, they need a per-expectation event cache to implement the same idea that is presented in this patch. This patch can be useful to provide reliable flow accounting. We still have to add a new conntrack extension to store the creation and destroy time. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
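The retry logic described above (keep undelivered events, merge them with new ones on the next packet, try again) can be modelled with a bitmask. This is a hedged userspace sketch: `toy_conntrack`, `toy_deliver_events()` and the two stub senders are hypothetical, and event bits are arbitrary values rather than the kernel's event flags.

```c
#include <assert.h>

/* Toy conntrack entry: 'missed' is the per-entry cache of undelivered events. */
struct toy_conntrack {
    unsigned int missed;
};

/* Stub senders used to simulate a listener that accepts or rejects events. */
static unsigned int toy_last_sent;

static int toy_send_ok(unsigned int events)
{
    toy_last_sent = events;
    return 0;
}

static int toy_send_fail(unsigned int events)
{
    (void)events;
    return -1;
}

/* Called when a packet is seen: merge new events with missed ones and retry. */
static int toy_deliver_events(struct toy_conntrack *ct, unsigned int new_events,
                              int (*send)(unsigned int))
{
    unsigned int events = ct->missed | new_events;

    if (events == 0)
        return 0;
    if (send(events) < 0) {
        ct->missed = events; /* keep everything for the next attempt */
        return -1;
    }
    ct->missed = 0;
    return 0;
}
```

Because events are merged rather than queued, intermediate state transitions can be collapsed, which matches the message's remark that the listener may lose transitions but eventually gets back in sync.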
2009-06-13	netfilter: conntrack: move helper destruction to nf_ct_helper_destroy()	Pablo Neira Ayuso
	This patch moves the helper destruction to a function that lives in nf_conntrack_helper.c. This new function is used in the patch to add ctnetlink reliable event delivery. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-13	netfilter: conntrack: move event caching to conntrack extension infrastructure	Pablo Neira Ayuso
	This patch reworks the per-cpu event caching to use the conntrack extension infrastructure. The main drawback is that we consume more memory per conntrack if event delivery is enabled. This patch is required by the reliable event delivery patch that follows. BTW, this patch allows you to enable/disable event delivery via /proc/sys/net/netfilter/nf_conntrack_events at runtime, although you can still disable event caching as a compile-time option. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-13	netfilter: nf_conntrack: use mod_timer_pending() for conntrack refresh	Patrick McHardy
	Use mod_timer_pending() instead of atomic sequence of del_timer()/ add_timer(). mod_timer_pending() does not rearm an inactive timer, so we don't need the conntrack lock anymore to make sure we don't accidentally rearm a timer of a conntrack which is in the process of being destroyed. With this change, we don't need to take the global lock anymore at all, counter updates can be performed under the per-conntrack lock. Signed-off-by: Patrick McHardy <kaber@trash.net>
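The property relied on here is that a "modify only if pending" operation cannot bring a deleted timer back to life. The single-threaded toy below models just that semantic; `toy_timer`, `toy_mod_timer_pending()` and `toy_del_timer()` are illustrative names, and the real kernel primitive additionally performs the check atomically with respect to concurrent timer operations.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy timer: 'pending' is true while the timer is armed. */
struct toy_timer {
    bool pending;
    unsigned long expires;
};

/*
 * Refresh path: push the expiry out only if the timer is still armed.
 * Returns 1 if the timer was pending and was updated, 0 otherwise.
 */
static int toy_mod_timer_pending(struct toy_timer *t, unsigned long expires)
{
    if (!t->pending)
        return 0; /* entry is being destroyed: do not rearm */
    t->expires = expires;
    return 1;
}

/* Destruction path: disarm the timer. */
static void toy_del_timer(struct toy_timer *t)
{
    t->pending = false;
}
```

With a del/add pair, a refresh racing with destruction could rearm the timer unless both sides took a common lock; with the conditional update, the refresh simply becomes a no-op.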
2009-06-13	netfilter: nf_log: fix sleeping function called from invalid context	Patrick McHardy
	Fix regression introduced by 17625274 "netfilter: sysctl support of logger choice":
	BUG: sleeping function called from invalid context at /mnt/s390test/linux-2.6-tip/arch/s390/include/asm/uaccess.h:234
	in_atomic(): 1, irqs_disabled(): 0, pid: 3245, name: sysctl
	CPU: 1 Not tainted 2.6.30-rc8-tipjun10-02053-g39ae214 #1
	Process sysctl (pid: 3245, task: 000000007f675da0, ksp: 000000007eb17cf0)
	0000000000000000 000000007eb17be8 0000000000000002 0000000000000000 000000007eb17c88 000000007eb17c00 000000007eb17c00 0000000000048156 00000000003e2de8 000000007f676118 000000007eb17f10 0000000000000000 0000000000000000 000000007eb17be8 000000000000000d 000000007eb17c58 00000000003e2050 000000000001635c 000000007eb17be8 000000007eb17c30
	Call Trace:
	([<00000000000162e6>] show_trace+0x13a/0x148)
	[<00000000000349ea>] __might_sleep+0x13a/0x164
	[<0000000000050300>] proc_dostring+0x134/0x22c
	[<0000000000312b70>] nf_log_proc_dostring+0xfc/0x188
	[<0000000000136f5e>] proc_sys_call_handler+0xf6/0x118
	[<0000000000136fda>] proc_sys_read+0x26/0x34
	[<00000000000d6e9c>] vfs_read+0xac/0x158
	[<00000000000d703e>] SyS_read+0x56/0x88
	[<0000000000027f42>] sysc_noemu+0x10/0x16
	Use the nf_log_mutex instead of RCU to fix this. Reported-and-tested-by: Maran Pakkirisamy <maranpsamy@in.ibm.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-13	net: use symbolic values for ndo_start_xmit() return codes	Patrick McHardy
	Convert magic values 1 and -1 to NETDEV_TX_BUSY and NETDEV_TX_LOCKED respectively. 0 (NETDEV_TX_OK) is not changed to keep the noise down, except in the very few cases where it's in direct proximity to one of the other values. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
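To show what the conversion looks like in a driver-shaped function, here is a small sketch. The three constants take the values named in the message (0, 1, -1) and are redefined locally so the example compiles outside the kernel; `toy_start_xmit()` and its two parameters are hypothetical.

```c
#include <assert.h>

/* Local copies of the symbolic return codes, with the values from the message. */
enum toy_netdev_tx {
    NETDEV_TX_OK = 0,      /* packet accepted */
    NETDEV_TX_BUSY = 1,    /* no room right now, try again later */
    NETDEV_TX_LOCKED = -1, /* transmit lock could not be taken */
};

/* Hypothetical transmit hook: symbolic codes instead of bare 1 / -1. */
static int toy_start_xmit(int lock_contended, int ring_full)
{
    if (lock_contended)
        return NETDEV_TX_LOCKED; /* was: return -1; */
    if (ring_full)
        return NETDEV_TX_BUSY;   /* was: return 1; */
    return NETDEV_TX_OK;
}
```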
2009-06-13	net: fix network drivers ndo_start_xmit() return values (part 7)	Patrick McHardy
	Fix up ATM drivers that return an errno value to qdisc_restart(), causing qdisc_restart() to print a warning and requeue/retransmit the skb. - lec: condition can only be remedied by userspace, until that retransmissions Compile tested only. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-12	trivial: Fix a typo in comment of addrconf_dad_start()	Masatake YAMATO
	Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-06-12	trivial: Kconfig: .ko is normally not included in module names	Pavel Machek
	.ko is normally not included in Kconfig help, make it consistent. Signed-off-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-06-12	trivial: Fix paramater/parameter typo in dmesg and source comments	Martin Olsson
	Signed-off-by: Martin Olsson <martin@minimum.se> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-06-12	virtio: find_vqs/del_vqs virtio operations	Michael S. Tsirkin
	This replaces find_vq/del_vq with find_vqs/del_vqs virtio operations, and updates all drivers. This is needed for MSI support, because MSI needs to know the total number of vectors upfront. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (+ lguest/9p compile fixes)
2009-06-12	virtio: add names to virtqueue struct, mapping from devices to queues.	Rusty Russell
	Add a linked list of all virtqueues for a virtio device: this helps for debugging and is also needed for upcoming interface change. Also, add a "name" field for clearer debug messages. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-11	bridge: Simplify interface for ATM LANE	Michał Mirosław
	This patch changes the FDB entry check for ATM LANE bridge integration. There's no point in holding an FDB entry around SKB building. The br_fdb_get()/br_fdb_put() pair is changed into a single br_fdb_test_addr() hook that checks whether the address has an FDB entry pointing to a port other than the one the request arrived on. FDB entry refcounting is removed as it's not used anywhere else. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-11	[PATCH] net core: Some interface flags not returned by SIOCGIFFLAGS	John Dykstra
	Commit b00055aacdb172c05067612278ba27265fcd05ce " [NET] core: add RFC2863 operstate" defined new interface flag values. Its documentation specified that these flags could be accessed from user space via SIOCGIFFLAGS. However, this does not work because the new flags do not fit in that ioctl's argument width. Change the documentation to match the code's behavior. Also change the source to explicitly show the truncation. This _should_ have no effect on executable code, and did not with gcc 4.2.4 generating x86 code. A new ioctl could be defined to return all interface flags to user space. However, since this has been broken for three years with no one complaining, there doesn't seem much need. They are still accessible via netlink. Reported-by: "Fredrik Arnerup" <fredrik.arnerup@edgeware.tv> Signed-off-by: John Dykstra <john.dykstra1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
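The truncation can be made explicit with a narrowing cast. The sketch below assumes a 16-bit flags field in the ioctl argument and uses illustrative flag values (`TOY_IFF_LOWER_UP` and `TOY_IFF_DORMANT` placed above bit 15); these names and values are assumptions for the example, not definitions copied from the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_IFF_UP       0x1u     /* fits in 16 bits */
#define TOY_IFF_LOWER_UP 0x10000u /* assumed: bit 16, does not fit */
#define TOY_IFF_DORMANT  0x20000u /* assumed: bit 17, does not fit */

/* What a 16-bit ioctl field can carry: the upper bits are silently dropped. */
static uint16_t toy_siocgifflags(unsigned int dev_flags)
{
    return (uint16_t)dev_flags; /* explicit truncation */
}
```

Any flag above bit 15 is invisible to a caller that only receives the 16-bit field, which is consistent with the message's note that these flags remain accessible via netlink instead.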
2009-06-11	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6	David S. Miller
2009-06-11	Merge branch 'linux-2.6.31.y' of git://git.kernel.org/pub/scm/linux/kernel/git/inaky/wimax	David S. Miller
2009-06-12	netfilter: ip_tables: fix build error	Patrick McHardy
	Fix build error introduced by commit bb70dfa5 (netfilter: xtables: consolidate comefrom debug cast access): net/ipv4/netfilter/ip_tables.c: In function 'ipt_do_table': net/ipv4/netfilter/ip_tables.c:421: error: 'comefrom' undeclared (first use in this function) net/ipv4/netfilter/ip_tables.c:421: error: (Each undeclared identifier is reported only once net/ipv4/netfilter/ip_tables.c:421: error: for each function it appears in.) Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-11	wimax: fix warning caused by not checking retval of rfkill_set_hw_state()	Inaky Perez-Gonzalez
	Caused by an API update. The return value can be safely ignored, as there is nothing we can do with it. Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
2009-06-11	netfilter: nf_ct_tcp: fix up build after merge	Patrick McHardy
	Replace the last occurrence of tcp_lock by the per-conntrack lock. Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-11	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6	Patrick McHardy
2009-06-11	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-next-2.6	David S. Miller
2009-06-11	neigh: fix state transition INCOMPLETE->FAILED via Netlink request	Timo Teras
	The current code errors out the INCOMPLETE neigh entry skb queue only from the timer, if maximum probes have been attempted and there has been no reply. This also causes the transition to FAILED state. However, the neigh entry can also be updated via Netlink to inform that the address is unavailable. Currently, neigh_update() just stops the timers and leaves the pending skbs unreleased. As a result, the cleanup code in the timer callback is never called, which also prevents proper garbage collection. This fixes neigh_update() to process the pending skb queue immediately if an INCOMPLETE -> FAILED state transition occurs due to a Netlink request. Signed-off-by: Timo Teras <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-11	net: No more expensive sock_hold()/sock_put() on each tx	Eric Dumazet
	One of the problems with sock memory accounting is that it uses a pair of sock_hold()/sock_put() for each transmitted packet. This slows down bidirectional flows because the receive path also needs to take a refcount on the socket and might use a different cpu than the transmit path or transmit completion path. So these two atomic operations also trigger cache line bounces. We can see this in tx or tx/rx workloads (media gateways for example), where sock_wfree() can be in the top five functions in profiles. We use this sock_hold()/sock_put() so that sock freeing is delayed until all tx packets are completed. As we also update sk_wmem_alloc, we could offset sk_wmem_alloc by one unit at init time, until sk_free() is called. Once sk_free() is called, we atomic_dec_and_test(sk_wmem_alloc) to decrement the initial offset and atomically check if any packets are in flight. skb_set_owner_w() doesn't call sock_hold() anymore. sock_wfree() doesn't call sock_put() anymore, but checks if sk_wmem_alloc reached 0 to perform the final freeing. The drawback is that a skb->truesize error could lead to unfreeable sockets, or even worse, prematurely calling __sk_free() on a live socket. Nice speedups on SMP. tbench, for example, goes from 2691 MB/s to 2711 MB/s on my 8 cpu dev machine, even if tbench was not really hitting the sk_refcnt contention point. 5 % speedup on a UDP transmit workload (depends on number of flows), lowering TX completion cpu usage. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
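The scheme described above, where the byte counter doubles as the lifetime reference, can be sketched with C11 atomics. This is a userspace model under stated assumptions: `toy_sock` and the `toy_*` functions are hypothetical stand-ins for the kernel structures and functions the message names, and "freeing" is reduced to setting a flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy socket; field names are illustrative. */
struct toy_sock {
    atomic_int wmem_alloc; /* bytes in flight, plus 1 while the owner holds it */
    bool freed;            /* set once the final free has run */
};

static void toy_sock_init(struct toy_sock *sk)
{
    atomic_init(&sk->wmem_alloc, 1); /* the one-unit offset */
    sk->freed = false;
}

/* Charge a packet to the socket: no separate reference is taken. */
static void toy_set_owner_w(struct toy_sock *sk, int truesize)
{
    atomic_fetch_add(&sk->wmem_alloc, truesize);
}

/* Transmit completion: whoever brings the counter to zero frees the socket. */
static void toy_wfree(struct toy_sock *sk, int truesize)
{
    /* fetch_sub returns the old value; old == truesize means new == 0. */
    if (atomic_fetch_sub(&sk->wmem_alloc, truesize) == truesize)
        sk->freed = true;
}

/* Owner releases the socket: drop the initial offset and test for zero. */
static void toy_sk_free(struct toy_sock *sk)
{
    if (atomic_fetch_sub(&sk->wmem_alloc, 1) == 1)
        sk->freed = true;
}
```

The sketch also makes the stated drawback visible: if a completion subtracts a different size than was charged, the counter either never reaches zero or reaches it too early.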
2009-06-11	ieee802154: Use '%Zu' printf format for size_t.	David S. Miller
	Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-10	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6	David S. Miller
2009-06-10	Merge branch 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip	Linus Torvalds
	* 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (244 commits)
	Revert "x86, bts: reenable ptrace branch trace support"
	tracing: do not translate event helper macros in print format
	ftrace/documentation: fix typo in function grapher name
	tracing/events: convert block trace points to TRACE_EVENT(), fix !CONFIG_BLOCK
	tracing: add protection around module events unload
	tracing: add trace_seq_vprint interface
	tracing: fix the block trace points print size
	tracing/events: convert block trace points to TRACE_EVENT()
	ring-buffer: fix ret in rb_add_time_stamp
	ring-buffer: pass in lockdep class key for reader_lock
	tracing: add annotation to what type of stack trace is recorded
	tracing: fix multiple use of __print_flags and __print_symbolic
	tracing/events: fix output format of user stack
	tracing/events: fix output format of kernel stack
	tracing/trace_stack: fix the number of entries in the header
	ring-buffer: discard timestamps that are at the start of the buffer
	ring-buffer: try to discard unneeded timestamps
	ring-buffer: fix bug in ring_buffer_discard_commit
	ftrace: do not profile functions when disabled
	tracing: make trace pipe recognize latency format flag
	...
2009-06-10	cfg80211: fix rfkill locking problem	Johannes Berg
	rfkill currently requires a global lock within the rfkill_register() function, and holds that lock over calls to the set_block() methods. This means that we cannot hold a lock around rfkill_register() that we also require in set_block(), directly or indirectly. Fix cfg80211 to register rfkill outside the block locked by its global lock. Much of what cfg80211 does in the locked block doesn't need to be locked anyway. Reported-by: Vasanthakumar Thiagarajan <vasanth@atheros.com> Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-06-10	mac80211: disable PS while probing AP	Johannes Berg
	When associated, but probing the AP because we detected beacon loss, we need to disable powersave to be able to receive the probe response. Change the code to do that by checking whether we're trying to probe when determining the possibility of going into PS, and recalculate the PS ability at the necessary spots. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>