linux.git - dakr's fork of kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Age	Commit message (Collapse)	Author
2012-03-28	radix-tree: use iterators in find_get_pages* functions	Konstantin Khlebnikov
	Replace radix_tree_gang_lookup_slot() and radix_tree_gang_lookup_tag_slot() in page-cache lookup functions with brand-new radix-tree direct iterating. This avoids the double-scanning and pointer copying. Iterator don't stop after nr_pages page-get fails in a row, it continue lookup till the radix-tree end. Thus we can safely remove these restart conditions. Unfortunately, old implementation didn't forbid nr_pages == 0, this corner case does not fit into new code, so the patch adds an extra check at the beginning. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Tested-by: Hugh Dickins <hughd@google.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28	radix-tree: rewrite gang lookup using iterator	Konstantin Khlebnikov
	Rewrite radix_tree_gang_lookup_* functions using the new radix-tree iterator. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Tested-by: Hugh Dickins <hughd@google.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28	radix-tree: introduce bit-optimized iterator	Konstantin Khlebnikov
	A series of radix tree cleanups, and usage of them in the core pagecache code. Micro-benchmark: lookup 14 slots (typical page-vector size) in radix-tree there earch <step> slot filled and tagged before/after - nsec per full scan through tree * Intel Sandy Bridge i7-2620M 4Mb L3 New code always faster * AMD Athlon 6000+ 2x1Mb L2, without L3 New code generally faster, Minor degradation (marked with "") for huge sparse trees i386 on Sandy Bridge New code faster for common cases: tagged and dense trees. Some degradations for non-tagged lookup on sparse trees. Ideally, there might help __ffs() analog for searching first non-zero long element in array, gcc sometimes cannot optimize this loop corretly. Numbers: CPU: Intel Sandy Bridge i7-2620M 4Mb L3 radix-tree with 1024 slots: tagged lookup step 1 before 7156 after 3613 step 2 before 5399 after 2696 step 3 before 4779 after 1928 step 4 before 4456 after 1429 step 5 before 4292 after 1213 step 6 before 4183 after 1052 step 7 before 4157 after 951 step 8 before 4016 after 812 step 9 before 3952 after 851 step 10 before 3937 after 732 step 11 before 4023 after 709 step 12 before 3872 after 657 step 13 before 3892 after 633 step 14 before 3720 after 591 step 15 before 3879 after 578 step 16 before 3561 after 513 normal lookup step 1 before 4266 after 3301 step 2 before 2695 after 2129 step 3 before 2083 after 1712 step 4 before 1801 after 1534 step 5 before 1628 after 1313 step 6 before 1551 after 1263 step 7 before 1475 after 1185 step 8 before 1432 after 1167 step 9 before 1373 after 1092 step 10 before 1339 after 1134 step 11 before 1292 after 1056 step 12 before 1319 after 1030 step 13 before 1276 after 1004 step 14 before 1256 after 987 step 15 before 1228 after 992 step 16 before 1247 after 999 radix-tree with 10241024128 slots: tagged lookup step 1 before 1086102841 after 674196409 step 2 before 816839155 after 498138306 step 7 before 599728907 after 240676762 step 15 before 555729253 after 185219677 step 63 before 606637748 after 128585664 step 64 before 608384432 after 102945089 step 65 before 596987114 after 123996019 step 128 before 304459225 after 56783056 step 256 before 158846855 after 31232481 step 512 before 86085652 after 18950595 step 12345 before 6517189 after 1674057 normal lookup step 1 before 626064869 after 544418266 step 2 before 418809975 after 336321473 step 7 before 242303598 after 207755560 step 15 before 208380563 after 176496355 step 63 before 186854206 after 167283638 step 64 before 176188060 after 170143976 step 65 before 185139608 after 167487116 step 128 before 88181865 after 86913490 step 256 before 45733628 after 45143534 step 512 before 24506038 after 23859036 step 12345 before 2177425 after 2018662 * AMD Athlon 6000+ 2x1Mb L2, without L3 radix-tree with 1024 slots: tag-lookup step 1 before 8164 after 5379 step 2 before 5818 after 5581 step 3 before 4959 after 4213 step 4 before 4371 after 3386 step 5 before 4204 after 2997 step 6 before 4950 after 2744 step 7 before 4598 after 2480 step 8 before 4251 after 2288 step 9 before 4262 after 2243 step 10 before 4175 after 2131 step 11 before 3999 after 2024 step 12 before 3979 after 1994 step 13 before 3842 after 1929 step 14 before 3750 after 1810 step 15 before 3735 after 1810 step 16 before 3532 after 1660 normal-lookup step 1 before 7875 after 5847 step 2 before 4808 after 4071 step 3 before 4073 after 3462 step 4 before 3677 after 3074 step 5 before 4308 after 2978 step 6 before 3911 after 3807 step 7 before 3635 after 3522 step 8 before 3313 after 3202 step 9 before 3280 after 3257 step 10 before 3166 after 3083 step 11 before 3066 after 3026 step 12 before 2985 after 2982 step 13 before 2925 after 2924 step 14 before 2834 after 2808 step 15 before 2805 after 2803 step 16 before 2647 after 2622 radix-tree with 10241024128 slots: tag-lookup step 1 before 1288059720 after 951736580 step 2 before 961292300 after 884212140 step 7 before 768905140 after 547267580 step 15 before 771319480 after 456550640 step 63 before 504847640 after 242704304 step 64 before 392484800 after 177920786 step 65 before 491162160 after 246895264 step 128 before 208084064 after 97348392 step 256 before 112401035 after 51408126 step 512 before 75825834 after 29145070 step 12345 before 5603166 after 2847330 normal-lookup step 1 before 1025677120 after 861375100 step 2 before 647220080 after 572258540 step 7 before 505518960 after 484041813 step 15 before 430483053 after 444815320 * step 63 before 388113453 after 404250546 * step 64 before 374154666 after 396027440 * step 65 before 381423973 after 396704853 * step 128 before 190078700 after 202619384 * step 256 before 100886756 after 102829108 * step 512 before 64074505 after 56158720 step 12345 before 4237289 after 4422299 * * i686 on Sandy bridge radix-tree with 1024 slots: tagged lookup step 1 before 7990 after 4019 step 2 before 5698 after 2897 step 3 before 5013 after 2475 step 4 before 4630 after 1721 step 5 before 4346 after 1759 step 6 before 4299 after 1556 step 7 before 4098 after 1513 step 8 before 4115 after 1222 step 9 before 3983 after 1390 step 10 before 4077 after 1207 step 11 before 3921 after 1231 step 12 before 3894 after 1116 step 13 before 3840 after 1147 step 14 before 3799 after 1090 step 15 before 3797 after 1059 step 16 before 3783 after 745 normal lookup step 1 before 5103 after 3499 step 2 before 3299 after 2550 step 3 before 2489 after 2370 step 4 before 2034 after 2302 * step 5 before 1846 after 2268 * step 6 before 1752 after 2249 * step 7 before 1679 after 2164 * step 8 before 1627 after 2153 * step 9 before 1542 after 2095 * step 10 before 1479 after 2109 * step 11 before 1469 after 2009 * step 12 before 1445 after 2039 * step 13 before 1411 after 2013 * step 14 before 1374 after 2046 * step 15 before 1340 after 1975 * step 16 before 1331 after 2000 * radix-tree with 10241024128 slots: tagged lookup step 1 before 1225865377 after 667153553 step 2 before 842427423 after 471533007 step 7 before 609296153 after 276260116 step 15 before 544232060 after 226859105 step 63 before 519209199 after 141343043 step 64 before 588980279 after 141951339 step 65 before 521099710 after 138282060 step 128 before 298476778 after 83390628 step 256 before 149358342 after 43602609 step 512 before 76994713 after 22911077 step 12345 before 5328666 after 1472111 normal lookup step 1 before 819284564 after 533635310 step 2 before 512421605 after 364956155 step 7 before 271443305 after 305721345 * step 15 before 223591630 after 273960216 * step 63 before 190320247 after 217770207 * step 64 before 178538168 after 267411372 * step 65 before 186400423 after 215347937 * step 128 before 88106045 after 140540612 * step 256 before 44812420 after 70660377 * step 512 before 24435438 after 36328275 * step 12345 before 2123924 after 2148062 * bloat-o-meter delta for this patchset + patchset with related shmem cleanups bloat-o-meter: x86_64 add/remove: 4/3 grow/shrink: 5/6 up/down: 928/-939 (-11) function old new delta radix_tree_next_chunk - 499 +499 shmem_unuse 428 554 +126 shmem_radix_tree_replace 131 227 +96 find_get_pages_tag 354 419 +65 find_get_pages_contig 345 407 +62 find_get_pages 362 396 +34 __kstrtab_radix_tree_next_chunk - 22 +22 __ksymtab_radix_tree_next_chunk - 16 +16 __kcrctab_radix_tree_next_chunk - 8 +8 radix_tree_gang_lookup_slot 204 203 -1 static.shmem_xattr_set 384 381 -3 radix_tree_gang_lookup_tag_slot 208 191 -17 radix_tree_gang_lookup 231 187 -44 radix_tree_gang_lookup_tag 247 199 -48 shmem_unlock_mapping 278 190 -88 __lookup 217 - -217 __lookup_tag 242 - -242 radix_tree_locate_item 279 - -279 bloat-o-meter: i386 add/remove: 3/3 grow/shrink: 8/9 up/down: 1075/-1275 (-200) function old new delta radix_tree_next_chunk - 757 +757 shmem_unuse 352 449 +97 find_get_pages_contig 269 322 +53 shmem_radix_tree_replace 113 154 +41 find_get_pages_tag 277 318 +41 dcache_dir_lseek 426 458 +32 __kstrtab_radix_tree_next_chunk - 22 +22 vc_do_resize 968 977 +9 snd_pcm_lib_read1 725 733 +8 __ksymtab_radix_tree_next_chunk - 8 +8 netlbl_cipsov4_list 1120 1127 +7 find_get_pages 293 291 -2 new_slab 467 459 -8 bitfill_unaligned_rev 425 417 -8 radix_tree_gang_lookup_tag_slot 177 146 -31 blk_dump_cmd 267 229 -38 radix_tree_gang_lookup_slot 212 134 -78 shmem_unlock_mapping 221 128 -93 radix_tree_gang_lookup_tag 275 162 -113 radix_tree_gang_lookup 255 126 -129 __lookup 227 - -227 __lookup_tag 271 - -271 radix_tree_locate_item 277 - -277 This patch: Implement a clean, simple and effective radix-tree iteration routine. Iterating divided into two phases: * lookup next chunk in radix-tree leaf node * iterating through slots in this chunk Main iterator function radix_tree_next_chunk() returns pointer to first slot, and stores in the struct radix_tree_iter index of next-to-last slot. For tagged-iterating it also constuct bitmask of tags for retunted chunk. All additional logic implemented as static-inline functions and macroses. Also adds radix_tree_find_next_bit() static-inline variant of find_next_bit() optimized for small constant size arrays, because find_next_bit() too heavy for searching in an array with one/two long elements. [akpm@linux-foundation.org: rework comments a bit] Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Tested-by: Hugh Dickins <hughd@google.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28	fs/proc/namespaces.c: prevent crash when ns_entries[] is empty	Andrew Morton
	If CONFIG_NET_NS, CONFIG_UTS_NS and CONFIG_IPC_NS are disabled, ns_entries[] becomes empty and things like ns_entries[ARRAY_SIZE(ns_entries) - 1] will explode. Reported-by: Richard Weinberger <richard@nod.at> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Daniel Lezcano <daniel.lezcano@free.fr> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28	nbd: rename the nbd_device variable from lo to nbd	Wanlong Gao
	rename the nbd_device variable from "lo" to "nbd", since "lo" is just a name copied from loop.c. Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> Cc: Paul Clements <paul.clements@steeleye.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28	pidns: add reboot_pid_ns() to handle the reboot syscall	Daniel Lezcano
	In the case of a child pid namespace, rebooting the system does not really makes sense. When the pid namespace is used in conjunction with the other namespaces in order to create a linux container, the reboot syscall leads to some problems. A container can reboot the host. That can be fixed by dropping the sys_reboot capability but we are unable to correctly to poweroff/ halt/reboot a container and the container stays stuck at the shutdown time with the container's init process waiting indefinitively. After several attempts, no solution from userspace was found to reliabily handle the shutdown from a container. This patch propose to make the init process of the child pid namespace to exit with a signal status set to : SIGINT if the child pid namespace called "halt/poweroff" and SIGHUP if the child pid namespace called "reboot". When the reboot syscall is called and we are not in the initial pid namespace, we kill the pid namespace for "HALT", "POWEROFF", "RESTART", and "RESTART2". Otherwise we return EINVAL. Returning EINVAL is also an easy way to check if this feature is supported by the kernel when invoking another 'reboot' option like CAD. By this way the parent process of the child pid namespace knows if it rebooted or not and can take the right decision. Test case: ========== #include <alloca.h> #include <stdio.h> #include <sched.h> #include <unistd.h> #include <signal.h> #include <sys/reboot.h> #include <sys/types.h> #include <sys/wait.h> #include <linux/reboot.h> static int do_reboot(void arg) { int cmd = arg; if (reboot(cmd)) printf("failed to reboot(%d): %m\n", cmd); } int test_reboot(int cmd, int sig) { long stack_size = 4096; void stack = alloca(stack_size) + stack_size; int status; pid_t ret; ret = clone(do_reboot, stack, CLONE_NEWPID \| SIGCHLD, &cmd); if (ret < 0) { printf("failed to clone: %m\n"); return -1; } if (wait(&status) < 0) { printf("unexpected wait error: %m\n"); return -1; } if (!WIFSIGNALED(status)) { printf("child process exited but was not signaled\n"); return -1; } if (WTERMSIG(status) != sig) { printf("signal termination is not the one expected\n"); return -1; } return 0; } int main(int argc, char argv[]) { int status; status = test_reboot(LINUX_REBOOT_CMD_RESTART, SIGHUP); if (status < 0) return 1; printf("reboot(LINUX_REBOOT_CMD_RESTART) succeed\n"); status = test_reboot(LINUX_REBOOT_CMD_RESTART2, SIGHUP); if (status < 0) return 1; printf("reboot(LINUX_REBOOT_CMD_RESTART2) succeed\n"); status = test_reboot(LINUX_REBOOT_CMD_HALT, SIGINT); if (status < 0) return 1; printf("reboot(LINUX_REBOOT_CMD_HALT) succeed\n"); status = test_reboot(LINUX_REBOOT_CMD_POWER_OFF, SIGINT); if (status < 0) return 1; printf("reboot(LINUX_REBOOT_CMD_POWERR_OFF) succeed\n"); status = test_reboot(LINUX_REBOOT_CMD_CAD_ON, -1); if (status >= 0) { printf("reboot(LINUX_REBOOT_CMD_CAD_ON) should have failed\n"); return 1; } printf("reboot(LINUX_REBOOT_CMD_CAD_ON) has failed as expected\n"); return 0; } [akpm@linux-foundation.org: tweak and add comments] [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Tested-by: Serge Hallyn <serge.hallyn@canonical.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28	sysctl: use bitmap library functions	Akinobu Mita
	Use bitmap_set() instead of using set_bit() for each bit. This conversion is valid because the bitmap is private in the function call and atomic bitops were unnecessary. This also includes minor change. - Use bitmap_copy() for shorter typing Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28	ipmi: use locks on watchdog timeout set on reboot	Corey Minyard
	The IPMI watchdog timer clears or extends the timer on reboot/shutdown. It was using the non-locking routine for setting the watchdog timer, but this was causing race conditions. Instead, use the locking version to avoid the races. It seems to work fine. Signed-off-by: Corey Minyard <cminyard@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28	ipmi: simplify locking	Corey Minyard
	Now that the the IPMI driver is using a tasklet, we can simplify the locking in the driver and get rid of the message lock. Signed-off-by: Corey Minyard <cminyard@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28	ipmi: fix message handling during panics	Corey Minyard
	The part of the IPMI driver that delivered panic information to the event log and extended the watchdog timeout during a panic was not properly handling the messages. It used static messages to avoid allocation, but wasn't properly waiting for these, or wasn't properly handling the refcounts. Signed-off-by: Corey Minyard <cminyard@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28	ipmi: use a tasklet for handling received messages	Corey Minyard
	The IPMI driver would release a lock, deliver a message, then relock. This is obviously ugly, and this patch converts the message handler interface to use a tasklet to schedule work. This lets the receive handler be called from an interrupt handler with interrupts enabled. Signed-off-by: Corey Minyard <cminyard@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28	ipmi: increase KCS timeouts	Matthew Garrett
	We currently time out and retry KCS transactions after 1 second of waiting for IBF or OBF. This appears to be too short for some hardware. The IPMI spec says "All system software wait loops should include error timeouts. For simplicity, such timeouts are not shown explicitly in the flow diagrams. A five-second timeout or greater is recommended". Change the timeout to five seconds to satisfy the slow hardware. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Corey Minyard <cminyard@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28	ipmi: decrease the IPMI message transaction time in interrupt mode	Srinivas_Gowda
	Call the event handler immediately after starting the next message. This change considerably decreases the IPMI transaction time (cuts off ~9ms for a single ipmitool transaction). Signed-off-by: Srinivas_Gowda <srinivas_g_gowda@dell.com> Signed-off-by: Corey Minyard <cminyard@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28	kdump x86: fix total mem size calculation for reservation	Dave Young
	crashkernel reservation need know the total memory size. Current get_total_mem simply use max_pfn - min_low_pfn. It is wrong because it will including memory holes in the middle. Especially for kvm guest with memory > 0xe0000000, there's below in qemu code: qemu split memory as below: if (ram_size >= 0xe0000000 ) { above_4g_mem_size = ram_size - 0xe0000000; below_4g_mem_size = 0xe0000000; } else { below_4g_mem_size = ram_size; } So for 4G mem guest, seabios will insert a 512M usable region beyond of 4G. Thus in above case max_pfn - min_low_pfn will be more than original memsize. Fixing this issue by using memblock_phys_mem_size() to get the total memsize. Signed-off-by: Dave Young <dyoung@redhat.com> Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28	kexec: add further check to crashkernel	Zhenzhong Duan
	When using crashkernel=2M-256M, the kernel doesn't give any warning. This is misleading sometimes. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28	kexec: crash: don't save swapper_pg_dir for !CONFIG_MMU configurations	Will Deacon
	nommu platforms don't have very interesting swapper_pg_dir pointers and usually just #define them to NULL, meaning that we can't include them in the vmcoreinfo on the kexec crash path. This patch only saves the swapper_pg_dir if we have an MMU. Signed-off-by: Will Deacon <will.deacon@arm.com> Reviewed-by: Simon Horman <horms@verge.net.au> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28	arch/ia64: remove references to cpu_*_map	Srivatsa S. Bhat
	This was marked as obsolete for quite a while now.. Now it is time to remove it altogether. And while doing this, get rid of first_cpu() as well. Also, remove the redundant setting of cpu_online_mask in smp_prepare_cpus() because the generic code would have already set cpu 0 in cpu_online_mask. Reported-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28	lib/cpumask.c: remove __any_online_cpu()	Srivatsa S. Bhat
	__any_online_cpu() is not optimal and also unnecessary. So, replace its use by faster cpumask_* operations. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Venkatesh Pallipadi <venki@google.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28	mm: only IPI CPUs to drain local pages if they exist	Gilad Ben-Yossef
	Calculate a cpumask of CPUs with per-cpu pages in any zone and only send an IPI requesting CPUs to drain these pages to the buddy allocator if they actually have pages when asked to flush. This patch saves 85%+ of IPIs asking to drain per-cpu pages in case of severe memory pressure that leads to OOM since in these cases multiple, possibly concurrent, allocation requests end up in the direct reclaim code path so when the per-cpu pages end up reclaimed on first allocation failure for most of the proceeding allocation attempts until the memory pressure is off (possibly via the OOM killer) there are no per-cpu pages on most CPUs (and there can easily be hundreds of them). This also has the side effect of shortening the average latency of direct reclaim by 1 or more order of magnitude since waiting for all the CPUs to ACK the IPI takes a long time. Tested by running "hackbench 400" on a 8 CPU x86 VM and observing the difference between the number of direct reclaim attempts that end up in drain_all_pages() and those were more then 1/2 of the online CPU had any per-cpu page in them, using the vmstat counters introduced in the next patch in the series and using proc/interrupts. In the test sceanrio, this was seen to save around 3600 global IPIs after trigerring an OOM on a concurrent workload: $ cat /proc/vmstat \| tail -n 2 pcp_global_drain 0 pcp_global_ipi_saved 0 $ cat /proc/interrupts \| grep CAL CAL: 1 2 1 2 2 2 2 2 Function call interrupts $ hackbench 400 [OOM messages snipped] $ cat /proc/vmstat \| tail -n 2 pcp_global_drain 3647 pcp_global_ipi_saved 3642 $ cat /proc/interrupts \| grep CAL CAL: 6 13 6 3 3 3 1 2 7 Function call interrupts Please note that if the global drain is removed from the direct reclaim path as a patch from Mel Gorman currently suggests this should be replaced with an on_each_cpu_cond invocation. Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Christoph Lameter <cl@linux.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Pekka Enberg <penberg@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Acked-by: Michal Nazarewicz <mina86@mina86.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28	fs: only send IPI to invalidate LRU BH when needed	Gilad Ben-Yossef
	In several code paths, such as when unmounting a file system (but not only) we send an IPI to ask each cpu to invalidate its local LRU BHs. For multi-cores systems that have many cpus that may not have any LRU BH because they are idle or because they have not performed any file system accesses since last invalidation (e.g. CPU crunching on high perfomance computing nodes that write results to shared memory or only using filesystems that do not use the bh layer.) This can lead to loss of performance each time someone switches the KVM (the virtual keyboard and screen type, not the hypervisor) if it has a USB storage stuck in. This patch attempts to only send an IPI to cpus that have LRU BH. Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28	slub: only IPI CPUs that have per cpu obj to flush	Gilad Ben-Yossef
	flush_all() is called for each kmem_cache_destroy(). So every cache being destroyed dynamically ends up sending an IPI to each CPU in the system, regardless if the cache has ever been used there. For example, if you close the Infinband ipath driver char device file, the close file ops calls kmem_cache_destroy(). So running some infiniband config tool on one a single CPU dedicated to system tasks might interrupt the rest of the 127 CPUs dedicated to some CPU intensive or latency sensitive task. I suspect there is a good chance that every line in the output of "git grep kmem_cache_destroy linux/ \| grep '\->'" has a similar scenario. This patch attempts to rectify this issue by sending an IPI to flush the per cpu objects back to the free lists only to CPUs that seem to have such objects. The check which CPU to IPI is racy but we don't care since asking a CPU without per cpu objects to flush does no damage and as far as I can tell the flush_all by itself is racy against allocs on remote CPUs anyway, so if you required the flush_all to be determinstic, you had to arrange for locking regardless. Without this patch the following artificial test case: $ cd /sys/kernel/slab $ for DIR in *; do cat $DIR/alloc_calls > /dev/null; done produces 166 IPIs on an cpuset isolated CPU. With it it produces none. The code path of memory allocation failure for CPUMASK_OFFSTACK=y config was tested using fault injection framework. Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Pekka Enberg <penberg@kernel.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Sasha Levin <levinsasha928@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Avi Kivity <avi@redhat.com> Cc: Michal Nazarewicz <mina86@mina86.org> Cc: Kosaki Motohiro <kosaki.motohiro@gmail.com> Cc: Milton Miller <miltonm@bga.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28	smp: add func to IPI cpus based on parameter func	Gilad Ben-Yossef
	Add the on_each_cpu_cond() function that wraps on_each_cpu_mask() and calculates the cpumask of cpus to IPI by calling a function supplied as a parameter in order to determine whether to IPI each specific cpu. The function works around allocation failure of cpumask variable in CONFIG_CPUMASK_OFFSTACK=y by itereating over cpus sending an IPI a time via smp_call_function_single(). The function is useful since it allows to seperate the specific code that decided in each case whether to IPI a specific cpu for a specific request from the common boilerplate code of handling creating the mask, handling failures etc. [akpm@linux-foundation.org: s/gfpflags/gfp_flags/] [akpm@linux-foundation.org: avoid double-evaluation of `info' (per Michal), parenthesise evaluation of `cond_func'] [akpm@linux-foundation.org: s/CPU/CPUs, use all 80 cols in comment] Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux-foundation.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Pekka Enberg <penberg@kernel.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Sasha Levin <levinsasha928@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Avi Kivity <avi@redhat.com> Acked-by: Michal Nazarewicz <mina86@mina86.org> Cc: Kosaki Motohiro <kosaki.motohiro@gmail.com> Cc: Milton Miller <miltonm@bga.com> Reviewed-by: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28	smp: introduce a generic on_each_cpu_mask() function	Gilad Ben-Yossef
	We have lots of infrastructure in place to partition multi-core systems such that we have a group of CPUs that are dedicated to specific task: cgroups, scheduler and interrupt affinity, and cpuisol= boot parameter. Still, kernel code will at times interrupt all CPUs in the system via IPIs for various needs. These IPIs are useful and cannot be avoided altogether, but in certain cases it is possible to interrupt only specific CPUs that have useful work to do and not the entire system. This patch set, inspired by discussions with Peter Zijlstra and Frederic Weisbecker when testing the nohz task patch set, is a first stab at trying to explore doing this by locating the places where such global IPI calls are being made and turning the global IPI into an IPI for a specific group of CPUs. The purpose of the patch set is to get feedback if this is the right way to go for dealing with this issue and indeed, if the issue is even worth dealing with at all. Based on the feedback from this patch set I plan to offer further patches that address similar issue in other code paths. This patch creates an on_each_cpu_mask() and on_each_cpu_cond() infrastructure API (the former derived from existing arch specific versions in Tile and Arm) and uses them to turn several global IPI invocation to per CPU group invocations. Core kernel: on_each_cpu_mask() calls a function on processors specified by cpumask, which may or may not include the local processor. You must not call this function with disabled interrupts or from a hardware interrupt handler or from a bottom half handler. arch/arm: Note that the generic version is a little different then the Arm one: 1. It has the mask as first parameter 2. It calls the function on the calling CPU with interrupts disabled, but this should be OK since the function is called on the other CPUs with interrupts disabled anyway. arch/tile: The API is the same as the tile private one, but the generic version also calls the function on the with interrupts disabled in UP case This is OK since the function is called on the other CPUs with interrupts disabled. Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com> Reviewed-by: Christoph Lameter <cl@linux.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Pekka Enberg <penberg@kernel.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Sasha Levin <levinsasha928@gmail.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Avi Kivity <avi@redhat.com> Acked-by: Michal Nazarewicz <mina86@mina86.org> Cc: Kosaki Motohiro <kosaki.motohiro@gmail.com> Cc: Milton Miller <miltonm@bga.com> Cc: Russell King <linux@arm.linux.org.uk> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28	swapon: check validity of swap_flags	Hugh Dickins
	Most system calls taking flags first check that the flags passed in are valid, and that helps userspace to detect when new flags are supported. But swapon never did so: start checking now, to help if we ever want to support more swap_flags in future. It's difficult to get stray bits set in an int, and swapon is not widely used, so this is most unlikely to break any userspace; but we can just revert if it turns out to do so. Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28	mm, coredump: fail allocations when coredumping instead of oom killing	David Rientjes
	The size of coredump files is limited by RLIMIT_CORE, however, allocating large amounts of memory results in three negative consequences: - the coredumping process may be chosen for oom kill and quickly deplete all memory reserves in oom conditions preventing further progress from being made or tasks from exiting, - the coredumping process may cause other processes to be oom killed without fault of their own as the result of a SIGSEGV, for example, in the coredumping process, or - the coredumping process may result in a livelock while writing to the dump file if it needs memory to allocate while other threads are in the exit path waiting on the coredumper to complete. This is fixed by implying __GFP_NORETRY in the page allocator for coredumping processes when reclaim has failed so the allocations fail and the process continues to exit. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28	mm: thp: fix up pmd_trans_unstable() locations	Andrea Arcangeli
	pmd_trans_unstable() should be called before pmd_offset_map() in the locations where the mmap_sem is held for reading. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Ulrich Obergfell <uobergfe@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mark Salter <msalter@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28	mm for fs: add truncate_pagecache_range()	Hugh Dickins
	Holepunching filesystems ext4 and xfs are using truncate_inode_pages_range but forgetting to unmap pages first (ocfs2 remembers). This is not really a bug, since races already require truncate_inode_page() to handle that case once the page is locked; but it can be very inefficient if the file being punched happens to be mapped into many vmas. Provide a drop-in replacement truncate_pagecache_range() which does the unmapping pass first, handling the awkward mismatch between arguments to truncate_inode_pages_range() and arguments to unmap_mapping_range(). Note that holepunching does not unmap privately COWed pages in the range: POSIX requires that we do so when truncating, but it's hard to justify, difficult to implement without an i_size cutoff, and no filesystem is attempting to implement it. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Ben Myers <bpm@sgi.com> Cc: Alex Elder <elder@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <david@fromorbit.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28	procfs: fix /proc/statm	KAMEZAWA Hiroyuki
	bda7bad62bc4 ("procfs: speed up /proc/pid/stat, statm") broke /proc/statm - 'text' is printed twice by mistake. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reported-by: Ulrich Drepper <drepper@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28	Merge tag 'writeback-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux Pull trivial writeback fixes from Wu Fengguang: "They've been tested in linux-next for 20 days actually." * tag 'writeback-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux: writeback: Remove outdated comment fs: Remove bogus wait in write_inode_now()
2012-03-28	Merge tag 'ext4_for_linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates for 3.4 from Ted Ts'o: "Ext4 commits for 3.3 merge window; mostly cleanups and bug fixes The changes to export dirty_writeback_interval are from Artem's s_dirt cleanup patch series. The same is true of the change to remove the s_dirt helper functions which never got used by anyone in-tree. I've run these changes by Al Viro, and am carrying them so that Artem can more easily fix up the rest of the file systems during the next merge window. (Originally we had hopped to remove the use of s_dirt from ext4 during this merge window, but his patches had some bugs, so I ultimately ended dropping them from the ext4 tree.)" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (66 commits) vfs: remove unused superblock helpers mm: export dirty_writeback_interval ext4: remove useless s_dirt assignment ext4: write superblock only once on unmount ext4: do not mark superblock as dirty unnecessarily ext4: correct ext4_punch_hole return codes ext4: remove restrictive checks for EOFBLOCKS_FL ext4: always set then trimmed blocks count into len ext4: fix trimmed block count accunting ext4: fix start and len arguments handling in ext4_trim_fs() ext4: update s_free_{inodes,blocks}_count during online resize ext4: change some printk() calls to use ext4_msg() instead ext4: avoid output message interleaving in ext4_error_<foo>() ext4: remove trailing newlines from ext4_msg() and ext4_error() messages ext4: add no_printk argument validation, fix fallout ext4: remove redundant "EXT4-fs: " from uses of ext4_msg ext4: give more helpful error message in ext4_ext_rm_leaf() ext4: remove unused code from ext4_ext_map_blocks() ext4: rewrite punch hole to use ext4_ext_remove_space() jbd2: cleanup journal tail after transaction commit ...
2012-03-28	Merge branch 'for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph updates for 3.4-rc1 from Sage Weil: "Alex has been busy. There are a range of rbd and libceph cleanups, especially surrounding device setup and teardown, and a few critical fixes in that code. There are more cleanups in the messenger code, virtual xattrs, a fix for CRC calculation/checks, and lots of other miscellaneous stuff. There's a patch from Amon Ott to make inos behave a bit better on 32-bit boxes, some decode check fixes from Xi Wang, and network throttling fix from Jim Schutt, and a couple RBD fixes from Josh Durgin. No new functionality, just a lot of cleanup and bug fixing." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (65 commits) rbd: move snap_rwsem to the device, rename to header_rwsem ceph: fix three bugs, two in ceph_vxattrcb_file_layout() libceph: isolate kmap() call in write_partial_msg_pages() libceph: rename "page_shift" variable to something sensible libceph: get rid of zero_page_address libceph: only call kernel_sendpage() via helper libceph: use kernel_sendpage() for sending zeroes libceph: fix inverted crc option logic libceph: some simple changes libceph: small refactor in write_partial_kvec() libceph: do crc calculations outside loop libceph: separate CRC calculation from byte swapping libceph: use "do" in CRC-related Boolean variables ceph: ensure Boolean options support both senses libceph: a few small changes libceph: make ceph_tcp_connect() return int libceph: encapsulate some messenger cleanup code libceph: make ceph_msgr_wq private libceph: encapsulate connection kvec operations libceph: move prepare_write_banner() ...
2012-03-28	Merge branch 'for_linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext3, UDF, and quota fixes from Jan Kara: "A couple of ext3 & UDF fixes and also one improvement in quota locking." * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext3: fix start and len arguments handling in ext3_trim_fs() udf: Fix deadlock in udf_release_file() udf: Fix file entry logicalBlocksRecorded udf: Fix handling of i_blocks quota: Make quota code not call tty layer with dqptr_sem held udf: Init/maintain file entry checkpoint field ext3: Update ctime in ext3_splice_branch() only when needed ext3: Don't call dquot_free_block() if we don't update anything udf: Remove unnecessary OOM messages
2012-03-28	Merge tag 'for-linus-3.4-merge-window' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs Pull 9p changes for the 3.4 merge window from Eric Van Hensbergen. * tag 'for-linus-3.4-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: 9p: statfs should not override server f_type net/9p: handle flushed Tclunk/Tremove net/9p: don't allow Tflush to be interrupted
2012-03-28	vfs: fix d_ancestor() case in d_materialize_unique	Michel Lespinasse
	In d_materialise_unique() there are 3 subcases to the 'aliased dentry' case; in two subcases the inode i_lock is properly released but this does not occur in the -ELOOP subcase. This seems to have been introduced by commit 1836750115f2 ("fix loop checks in d_materialise_unique()"). Signed-off-by: Michel Lespinasse <walken@google.com> Cc: stable@vger.kernel.org # v3.0+ [ Added a comment, and moved the unlock to where we generate the -ELOOP, which seems to be more natural. You probably can't actually trigger this without a buggy network file server - d_materialize_unique() is for finding aliases on non-local filesystems, and the d_ancestor() case is for a hardlinked directory loop. But we should be robust in the case of such buggy servers anyway. ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-27	Merge branch 'for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 patches part 2 from Martin Schwidefsky: "Some minor improvements and one additional feature for the 3.4 merge window: Hendrik added perf support for the s390 CPU counters." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: [S390] register cpu devices for SMP=n [S390] perf: add support for s390x CPU counters [S390] oprofile: Allow multiple users of the measurement alert interrupt [S390] qdio: log all adapter characteristics [S390] Remove unncessary export of arch_pick_mmap_layout
2012-03-27	Merge branch 'for-linus-3.4-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml Pull UML changes from Richard Weinberger: "Mostly bug fixes and cleanups" * 'for-linus-3.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: (35 commits) um: Update defconfig um: Switch to large mcmodel on x86_64 MTD: Relax dependencies um: Wire CONFIG_GENERIC_IO up um: Serve io_remap_pfn_range() Introduce CONFIG_GENERIC_IO um: allow SUBARCH=x86 um: most of the SUBARCH uses can be killed um: deadlock in line_write_interrupt() um: don't bother trying to rebuild CHECKFLAGS for USER_OBJS um: use the right ifdef around exports in user_syms.c um: a bunch of headers can be killed by using generic-y um: ptrace-generic.h doesn't need user.h um: kill HOST_TASK_PID um: remove pointless include of asm/fixmap.h from asm/pgtable.h um: asm-offsets.h might as well come from underlying arch... um: merge processor_{32,64}.h a bit... um: switch close_chan() to struct line um: race fix: initialize delayed_work before registering IRQ um: line->have_irq is never checked... ...
2012-03-27	Merge branch 'next' of git://git.monstr.eu/linux-2.6-microblaze	Linus Torvalds
	Pull arch/microblaze fixes from Michal Simek * 'next' of git://git.monstr.eu/linux-2.6-microblaze: microblaze: Handle TLB skip size dynamically microblaze: Introduce TLB skip size microblaze: Improve TLB calculation for small systems microblaze: Extend space for compiled-in FDT to 32kB microblaze: Clear all MSR flags on the first kernel instruction microblaze: Use node name instead of compatible string microblaze: Fix mapin_ram function microblaze: Highmem support microblaze: Use active regions microblaze: Show more detailed information about memory microblaze: Introduce fixmap microblaze: mm: Fix lowmem max memory size limits microblaze: mm: Use ZONE_DMA instead of ZONE_NORMAL microblaze: trivial: Fix typo fault in timer.c microblaze: Use vsprintf extention %pf with builtin_return_address microblaze: Add PVR version string for MB 8.20.b and 8.30.a microblaze: Fix makefile to work with latest toolchain microblaze: Fix typo in early_printk.c
2012-03-27	Merge branch 'platforms' of git://git.linaro.org/people/rmk/linux-arm	Linus Torvalds
	Pull ARM platform updates from Russell King: "This covers platform stuff for platforms I have a direct interest in (iow, I have the hardware). Essentially: - as we no longer support any other Acorn platforms other than RiscPC anymore, we can collect all that code into mach-rpc. - convert Acorn expansion card stuff to use IRQ allocation functions, and get rid of NO_IRQ from there. - cleanups to the ebsa110 platform to move some private stuff out of its header files. - large amount of SA11x0 updates: - conversion of private DMA implementation to DMA engine support (this actually gives us greater flexibility in drivers over the old API.) - re-worked ucb1x00 updates - convert to genirq, remove sa11x0 dependencies, fix various minor issues - move platform specific sa11x0 framebuffer data into platform files in arch/arm instead of keeping this in the driver itself - update sa11x0 IrDA driver for DMA engine, and allow it to use DMA for SIR transmissions as well as FIR - rework sa1111 support for genirq, and irq allocation - fix sa1111 IRQ support so it works again - use sparse IRQ support After this, I have one more pull request remaining from my current set, which I think is going to be the most problematical as it generates 8 conflicts." Fixed up the trivial conflict in arch/arm/mach-rpc/Makefile as per Russell. * 'platforms' of git://git.linaro.org/people/rmk/linux-arm: (125 commits) ARM: 7343/1: sa11x0: convert to sparse IRQ ARM: 7342/2: sa1100: prepare for sparse irq conversion ARM: 7341/1: input: prepare jornada720 keyboard and ts for sa11x0 sparse irq ARM: 7340/1: rtc: sa1100: include mach/irqs.h instead of asm/irq.h ARM: sa11x0: remove unused DMA controller definitions ARM: sa11x0: remove old SoC private DMA driver USB: sa1111: add hcd .reset method USB: sa1111: add OHCI shutdown methods USB: sa1111: reorganize ohci-sa1111.c USB: sa1111: get rid of nasty printk(KERN_DEBUG "%s: ...", __FILE__) USB: sa1111: sparse and checkpatch cleanups ARM: sa11x0: don't static map sa1111 ARM: sa1111: use dev_err() rather than printk() ARM: sa1111: cleanup sub-device registration and unregistration ARM: sa1111: only setup DMA for DMA capable devices ARM: sa1111: register sa1111 devices with dmabounce in bus notifier ARM: sa1111: move USB interface register definitions to ohci-sa1111.c ARM: sa1111: move PCMCIA interface register definitions to sa1111_generic.c ARM: sa1111: move PS/2 interface register definitions to sa1111p2.c ARM: sa1111: delete unused physical GPIO register definitions ...
2012-03-27	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	Linus Torvalds
	Pull networking fixes from David Miller: 1) Name string overrun fix in gianfar driver from Joe Perches. 2) VHOST bug fixes from Michael S. Tsirkin and Nadav Har'El 3) Fix dependencies on xt_LOG netfilter module, from Pablo Neira Ayuso. 4) Fix RCU locking in xt_CT, also from Pablo Neira Ayuso. 5) Add a parameter to skb_add_rx_frag() so we can fix the truesize adjustments in the drivers that use it. The individual drivers aren't fixed by this commit, but will be dealt with using follow-on commits. From Eric Dumazet. 6) Add some device IDs to qmi_wwan driver, from Andrew Bird. 7) Fix a potential rcu_read_lock() imbalancein rt6_fill_node(). From Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: net: fix a potential rcu_read_lock() imbalance in rt6_fill_node() net: add a truesize parameter to skb_add_rx_frag() gianfar: Fix possible overrun and simplify interrupt name field creation USB: qmi_wwan: Add ZTE (Vodafone) K3570-Z and K3571-Z net interfaces USB: option: Ignore ZTE (Vodafone) K3570/71 net interfaces USB: qmi_wwan: Add ZTE (Vodafone) K3565-Z and K4505-Z net interfaces qlcnic: Bug fix for LRO netfilter: nf_conntrack: permanently attach timeout policy to conntrack netfilter: xt_CT: fix assignation of the generic protocol tracker netfilter: xt_CT: missing rcu_read_lock section in timeout assignment netfilter: cttimeout: fix dependency with l4protocol conntrack module netfilter: xt_LOG: use CONFIG_IP6_NF_IPTABLES instead of CONFIG_IPV6 vhost: fix release path lockdep checks vhost: don't forget to schedule() tools/virtio: stub out strong barriers tools/virtio: add linux/hrtimer.h stub tools/virtio: add linux/module.h stub
2012-03-27	Merge tag 'dt' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc	Linus Torvalds
	Pull "ARM: device tree work" from Arnd Bergmann: "Most of these patches convert code from using static platform data to describing the hardware in the device tree. This is only the first half of the changes for v3.4 because a lot of patches for this topic came in the last week before the merge window. Signed-off-by: Arnd Bergmann <arnd@arndb.de>" Fix up trivial conflicts in arch/arm/mach-vexpress/{Kconfig,core.h} * tag 'dt' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (86 commits) Document: devicetree: add OF documents for arch-mmp ARM: dts: append DTS file of pxa168 ARM: mmp: append OF support on pxa168 ARM: mmp: enable rtc clk in pxa168 i2c: pxa: add OF support serial: pxa: add OF support arm/dts: mt_ventoux: very basic support for TeeJet Mt.Ventoux board ARM: OMAP2+: Remove extra ifdefs for board-generic ARM: OMAP2+: Fix build error when only ARCH_OMAP2/3 or 4 is selected ASoC: DT: Add digital microphone binding to PAZ00 board. ARM: dt: Add ARM PMU to tegra*.dtsi ARM: at91: at91sam9x5cm/dt: add leds support ARM: at91: usb_a9g20/dt: add gpio-keys support ARM: at91: at91sam9m10g45ek/dt: add gpio-keys support ARM: at91: at91sam9m10g45ek/dt: add leds support ARM: at91: usb_a9g20/dt: add leds support ARM: at91/pio: add new PIO3 features ARM: at91: add sam9_smc.o to at91sam9x5 build ARM: at91/tc/clocksource: Add 32 bit variant to Timer Counter ARM: at91/tc: add device tree support to atmel_tclib ...
2012-03-27	Merge tag 'drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc	Linus Torvalds
	Pull "ARM: driver specific updates" from Arnd Bergmann: "These are all specific to some driver. They are typically the platform side of a change in the drivers directory, such as adding a new driver or extending the interface to the platform. In cases where there is no maintainer for the driver, or the maintainer prefers to have the platform changes in the same branch as the driver changes, the patches to the drivers are included as well. A much smaller set of driver updates that depend on other branches getting merged first will be sent later. The new export of tegra_chip_uid conflicts with other changes in fuse.c. In rtc-sa1100.c, the global removal of IRQF_DISABLED conflicts with the cleanup of the interrupt handling of that driver. Signed-off-by: Arnd Bergmann <arnd@arndb.de>" Fixed up aforementioned trivial conflicts. * tag 'drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (94 commits) ARM: SAMSUNG: change the name from s3c-sdhci to exynos4-sdhci mmc: sdhci-s3c: add platform data for the second capability ARM: SAMSUNG: support the second capability for samsung-soc ARM: EXYNOS: add support DMA for EXYNOS4X12 SoC ARM: EXYNOS: Add apb_pclk clkdev entry for mdma1 ARM: EXYNOS: Enable MDMA driver regulator: Remove bq24022 regulator driver rtc: sa1100: add OF support pxa: magician/hx4700: Convert to gpio-regulator from bq24022 ARM: OMAP3+: SmartReflex: fix error handling ARM: OMAP3+: SmartReflex: fix the use of debugfs_create_* API ARM: OMAP3+: SmartReflex: micro-optimization for sanity check ARM: OMAP3+: SmartReflex: misc cleanups ARM: OMAP3+: SmartReflex: move late_initcall() closer to its argument ARM: OMAP3+: SmartReflex: add missing platform_set_drvdata() ARM: OMAP3+: hwmod: add SmartReflex IRQs ARM: OMAP3+: SmartReflex: clear ERRCONFIG_VPBOUNDINTST only on a need ARM: OMAP3+: SmartReflex: Fix status masking in ERRCONFIG register ARM: OMAP3+: SmartReflex: Add a shutdown hook ARM: OMAP3+: SmartReflex Class3: disable errorgen before disable VP ... Conflicts: arch/arm/mach-tegra/Makefile arch/arm/mach-tegra/fuse.c drivers/rtc/rtc-sa1100.c
2012-03-27	Merge tag 'rpmsg' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc	Linus Torvalds
	Pull "remoteproc/rpmsg: new subsystem" from Arnd Bergmann: "This new subsystem provides a common way to talk to secondary processors on an SoC, e.g. a DSP, GPU or service processor, using virtio as the transport. In the long run, it should replace a few dozen vendor specific ways to do the same thing, which all never made it into the upstream kernel. There is a broad agreement that rpmsg is the way to go here and several vendors have started working on replacing their own subsystems. Two branches each add one virtio protocol number. Fortunately the numbers were agreed upon in advance, so there are only context changes. Signed-off-by: Arnd Bergmann <arnd@arndb.de>" Fixed up trivial protocol number conflict due to the mentioned additions next to each other. * tag 'rpmsg' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (32 commits) remoteproc: cleanup resource table parsing paths remoteproc: remove the hardcoded vring alignment remoteproc/omap: remove the mbox_callback limitation remoteproc: remove the single rpmsg vdev limitation remoteproc: safer boot/shutdown order remoteproc: remoteproc_rpmsg -> remoteproc_virtio remoteproc: resource table overhaul rpmsg: fix build warning when dma_addr_t is 64-bit rpmsg: fix published buffer length in rpmsg_recv_done rpmsg: validate incoming message length before propagating rpmsg: fix name service endpoint leak remoteproc/omap: two Kconfig fixes remoteproc: make sure we're parsing a 32bit firmware remoteproc: s/big switch/lookup table/ remoteproc: bail out if firmware has different endianess remoteproc: don't use virtio's weak barriers rpmsg: rename virtqueue_add_buf_gfp to virtqueue_add_buf rpmsg: depend on EXPERIMENTAL remoteproc: depend on EXPERIMENTAL rpmsg: add Kconfig menu ... Conflicts: include/linux/virtio_ids.h
2012-03-27	Merge tag 'boards' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc	Linus Torvalds
	Pull "ARM: board specific updates" from Arnd Bergmann/Olof Johansson: "These changes are all specific to one board only. We're trying to keep the number of board files low, but generally board level updates are ok on platforms that are working on moving towards DT based probing, which will eventually lead to removing them. The board-ams-delta.c board file gets a conflict between the removal of ams_delta_config and the addition of a lot of other data. The Kconfig file has two changes in the same line, and in exynos, the power domain cleanup conflicts with the addition of the image sensor device. Signed-off-by: Arnd Bergmann <arnd@arndb.de> [olof: Amended a fix for a mismerge to board-omap4panda.c] Signed-off-by: Olof Johansson <olof@lixom.net>" Fixed up some fairly trivial conflicts manually. * tag 'boards' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (82 commits) i.MX35-PDK: Add Camera support ARM : mx35: 3ds-board: add framebuffer device pxa/hx4700: Remove pcmcia platform_device structure ARM: pxa/hx4700: Reduce sleep mode battery discharge by 35% ARM: pxa/hx4700: Remove unwanted request for GPIO105 ARM: EXYNOS: support Exynos4210-bus Devfreq driver on Nuri board ARM: EXYNOS: Register JPEG on nuri ARM: EXYNOS: Register JPEG on universal_c210 ARM: S5PV210: Enable JPEG on SMDKV210 ARM: S5PV210: Add JPEG board definition ARM: EXYNOS: Enable JPEG on Origen ARM: EXYNOS: Enable JPEG on SMDKV310 ARM: EXYNOS: Add __init attribute to universal_camera_init() ARM: EXYNOS: Add __init attribute to nuri_camera_init() ARM: S5PV210: Enable FIMC on SMDKC110 ARM: S5PV210: Enable FIMC on SMDKV210 ARM: S5PV210: Enable MFC on SMDKC110 ARM: S5PV210: Enable MFC on SMDKV210 ARM: EXYNOS: Enable G2D on SMDKV310 ARM: tegra: update defconfig ...
2012-03-27	Merge tag 'soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc	Linus Torvalds
	Pull "ARM: SoC specific updates" from Arnd Bergmann: "These changes are all specific to an soc family or the code for one soc. Lots of work for Tegra3 this time, but also a lot of other platforms. There will be another (smaller) set of soc patches later in the merge window for stuff that has dependencies on external trees or that was sent just before the merge window opened. The asoc tree added a few devices to the i.mx platform, which conflict with other devices added in the same place here. The tegra Makefile conflicts between a number of branches, mostly because of changes regarding localtimer.c, which was removed in the end. Signed-off-by: Arnd Bergmann <arnd@arndb.de>" Fix up some trivial conflicts, including the mentioned Tegra Makefile. * tag 'soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (51 commits) ARM: EXYNOS: fix cycle count for periodic mode of clock event timers ARM: EXYNOS: add support JPEG ARM: EXYNOS: Add DMC1, allow PPMU access for DMC ARM: SAMSUNG: Correct MIPI-CSIS io memory resource definition ARM: SAMSUNG: fix __init attribute on regarding s3c_set_platdata() ARM: SAMSUNG: Add __init attribute to samsung_bl_set() ARM: S5PV210: Add usb otg phy control ARM: S3C64XX: Add usb otg phy control ARM: EXYNOS: Enable l2 configuration through device tree ARM: EXYNOS: remove useless code to save/restore L2 ARM: EXYNOS: save L2 settings during bootup ARM: S5P: add L2 early resume code ARM: EXYNOS: Add support AFTR mode on EXYNOS4210 ARM: mx35: Setup the AIPS registers ARM: mx5: Use common function for configuring AIPS ARM: mx3: Setup AIPS registers ARM: mx3: Let mx31 and mx35 enter in LPM mode in WFI ARM: defconfig: imx_v6_v7: build in REGULATOR_FIXED_VOLTAGE ARM: imx: update imx_v6_v7_defconfig ARM: tegra: Demote EMC clock inconsistency BUG to WARN ...
2012-03-27	Merge tag 'timer' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc	Linus Torvalds
	Pull "ARM: timer cleanup work" from Arnd Bergmann: "These are split out from the generic soc and driver updates because there was a lot of conflicting work by multiple people. Marc Zyngier worked on simplifying the "localtimer" interfaces, and some of the platforms are touching the same code as they move to device tree based booting. Signed-off-by: Arnd Bergmann <arnd@arndb.de>" * tag 'timer' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (61 commits) ARM: tegra: select USB_ULPI if USB is selected arm/tegra: pcie: fix return value of function ARM: ux500: fix compilation after local timer rework ARM: shmobile: remove additional __io() macro use ARM: local timers: make the runtime registration interface mandatory ARM: local timers: convert MSM to runtime registration interface ARM: local timers: convert exynos to runtime registration interface ARM: smp_twd: remove old local timer interface ARM: imx6q: convert to twd_local_timer_register() interface ARM: highbank: convert to twd_local_timer_register() interface ARM: ux500: convert to twd_local_timer_register() interface ARM: shmobile: convert to twd_local_timer_register() interface ARM: tegra: convert to twd_local_timer_register() interface ARM: plat-versatile: convert to twd_local_timer_register() interface ARM: OMAP4: convert to twd_local_timer_register() interface ARM: smp_twd: add device tree support ARM: smp_twd: add runtime registration support ARM: local timers: introduce a new registration interface ARM: smp_twd: make local_timer_stop a symbol instead of a #define ARM: mach-shmobile: default to no earlytimer ...
2012-03-27	Merge tag 'cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc	Linus Torvalds
	Pull "ARM: global cleanups" from Arnd Bergmann: "Quite a bit of code gets removed, and some stuff moved around, mostly the old samsung s3c24xx stuff. There should be no functional changes in this series otherwise. Some cleanups have dependencies on other arm-soc branches and will be sent in the second round. Signed-off-by: Arnd Bergmann <arnd@arndb.de>" Fixed up trivial conflicts mainly due to #include's being changes on both sides. * tag 'cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (121 commits) ep93xx: Remove unnecessary includes of ep93xx-regs.h ep93xx: Move EP93XX_SYSCON defines to SoC private header ep93xx: Move crunch code to mach-ep93xx directory ep93xx: Make syscon access functions private to SoC ep93xx: Configure GPIO ports in core code ep93xx: Move peripheral defines to local SoC header ep93xx: Convert the watchdog driver into a platform device. ep93xx: Use ioremap for backlight driver ep93xx: Move GPIO defines to gpio-ep93xx.h ep93xx: Don't use system controller defines in audio drivers ep93xx: Move PHYS_BASE defines to local SoC header file ARM: EXYNOS: Add clock register addresses for EXYNOS4X12 bus devfreq driver ARM: EXYNOS: add clock registers for exynos4x12-cpufreq PM / devfreq: update the name of EXYNOS clock registers that were omitted PM / devfreq: update the name of EXYNOS clock register ARM: EXYNOS: change the prefix S5P_ to EXYNOS4_ for clock ARM: EXYNOS: use static declaration on regarding clock ARM: EXYNOS: replace clock.c for other new EXYNOS SoCs ARM: OMAP2+: Fix build error after merge ARM: S3C24XX: remove call to s3c24xx_setup_clocks ...
2012-03-27	Merge tag 'maintainers' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull "ARM: subarch maintainer updates" from Arnd Bergmann: "This is a collection of updates to the MAINTAINERS file, separated out mostly to give an overview of what has changed regarding who does what. Signed-off-by: Arnd Bergmann <arnd@arndb.de>" * tag 'maintainers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: ARM: tegra: update main repo and add patchwork MAINTAINERS: update MAINTAINERS email entry MAINTAINERS: update maintainer entry for pxa/hx4700 MAINTAINERS: ARM: tegra: update Stephen's email address MAINTAINERS: add TI DaVinci git tree information MAINTAINERS: mark TI DaVinci list as "moderated" MAINTAINERS: remove arch/arm/mach-mx*/ from IMX entry
2012-03-27	Merge tag 'fixes-non-critical' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull "ARM: Non-critical bug fixes" from Ardn Bergmann: "Simple bug fixes that were not considered important enough for inclusion into 3.3. One bug fix was originally intended for 3.3 but accidentally got missed, but is not marked stable because it should only get backported once later fixes also make it into v3.4. Signed-off-by: Arnd Bergmann <arnd@arndb.de>" * tag 'fixes-non-critical' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (66 commits) iomux-mx25.h slew rate adjusted for LCD __LD pins ARM: davinci: DA850: move da850_register_pm to .init.text ARM: davinci: cpufreq: fix compiler warning ARM: OMAP2+: Fix build for omap4 only builds with missing include of linux/bug.h ARM: OMAP2+: Fix section warnings for hsmmc_init_one ARM: OMAP2+: Fix build issues with missing include of linux/bug.h ARM: OMAP2+: gpmc-smsc911x: only register regulator for first instance ARM: OMAP3+: PM: VP: fix integer truncation error ARM: OMAP2+: PM: fix wakeupgen warning when hotplug disabled ARM: OMAP2+: PM: fix section mismatch with omap2_init_processor_devices() ARM: OMAP2: Fix section warning for n8x0 when CONFIG_MMC_OMAP is not set ARM: OMAP2+: Fix omap24xx_io_desc warning if SoC subtypes are not selected ARM: OMAP1: Fix section mismatch for omap1_init_early() ARM: OMAP1: Fix typo in lcd_dma.c ARM: OMAP: mailbox: trivial whitespace fix ARM: OMAP: Remove definition cpu_is_omap4430() ARM: OMAP2+: included some headers twice ARM: OMAP: clock.c: included linux/debugfs.h twice ARM: OMAP: don't build hwspinlock in vain ARM: OMAP2+: ads7846_init: put gpio_pendown into pdata if it's provided ...
2012-03-27	net: fix a potential rcu_read_lock() imbalance in rt6_fill_node()	Eric Dumazet
	Commit f2c31e32b378 (net: fix NULL dereferences in check_peer_redir() ) added a regression in rt6_fill_node(), leading to rcu_read_lock() imbalance. Thats because NLA_PUT() can make a jump to nla_put_failure label. Fix this by using nla_put() Many thanks to Ben Greear for his help Reported-by: Ben Greear <greearb@candelatech.com> Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-27	Merge tag 'asoc-3.4' of ↵	Arnd Bergmann
	git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into next/boards The asoc branch that was already merged into v3.4 contains some board-level changes that conflict with patches we already have here, so pull in that branch to resolve the conflicts. Conflicts: arch/arm/mach-imx/mach-imx27_visstrim_m10.c arch/arm/mach-omap2/board-omap4panda.c Signed-off-by: Arnd Bergmann <arnd@arndb.de> [olof: Amended fix for mismerge as reported by Kevin Hilman] Signed-off-by: Olof Johansson <olof@lixom.net>