summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2012-07-31documentation: update how page-cluster affects swap I/OChristian Ehrhardt
Fix of the documentation of /proc/sys/vm/page-cluster to match the behavior of the code and add some comments about what the tunable will change in that behavior. Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Acked-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Minchan Kim <minchan@kernel.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31swap: allow swap readahead to be mergedChristian Ehrhardt
Swap readahead works fine, but the I/O to disk is almost always done in page size requests, despite the fact that readahead submits 1<<page-cluster pages at a time. On older kernels the old per device plugging behavior might have captured this and merged the requests, but currently all comes down to much more I/Os than required. On a single device this might not be an issue, but as soon as a server runs on shared san resources savin I/Os not only improves swapin throughput but also provides a lower resource utilization. With a load running KVM in a lot of memory overcommitment (the hot memory is 1.5 times the host memory) swapping throughput improves significantly and the lead feels more responsive as well as achieves more throughput. In a test setup with 16 swap disks running blocktrace on one of those disks shows the improved merging: Prior: Reads Queued: 560,888, 2,243MiB Writes Queued: 226,242, 904,968KiB Read Dispatches: 544,701, 2,243MiB Write Dispatches: 159,318, 904,968KiB Reads Requeued: 0 Writes Requeued: 0 Reads Completed: 544,716, 2,243MiB Writes Completed: 159,321, 904,980KiB Read Merges: 16,187, 64,748KiB Write Merges: 61,744, 246,976KiB IO unplugs: 149,614 Timer unplugs: 2,940 With the patch: Reads Queued: 734,315, 2,937MiB Writes Queued: 300,188, 1,200MiB Read Dispatches: 214,972, 2,937MiB Write Dispatches: 215,176, 1,200MiB Reads Requeued: 0 Writes Requeued: 0 Reads Completed: 214,971, 2,937MiB Writes Completed: 215,177, 1,200MiB Read Merges: 519,343, 2,077MiB Write Merges: 73,325, 293,300KiB IO unplugs: 337,130 Timer unplugs: 11,184 I got ~10% to ~40% more throughput in my cases and at the same time much lower cpu consumption when broken down per transferred kilobyte (the majority of that due to saved interrupts and better cache handling). In a shared SAN others might get an additional benefit as well, because this now causes less protocol overhead. Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Minchan Kim <minchan@kernel.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31memcg: remove MEM_CGROUP_CHARGE_TYPE_FORCEKamezawa Hiroyuki
There are no users since commit b24028572fb69 ("memcg: remove PCG_CACHE"). Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31memcg: rename MEM_CGROUP_CHARGE_TYPE_MAPPED as MEM_CGROUP_CHARGE_TYPE_ANONKamezawa Hiroyuki
Now, in memcg, 2 "MAPPED" enum/macro are found MEM_CGROUP_CHARGE_TYPE_MAPPED MEM_CGROUP_STAT_FILE_MAPPED Thier names looks similar to each other but the former is used for accounting anonymous memory. rename it as TYPE_ANON. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31memcg: rename MEM_CGROUP_STAT_SWAPOUT as MEM_CGROUP_STAT_SWAPKamezawa Hiroyuki
MEM_CGROUP_STAT_SWAPOUT represents the usage of swap rather than the number of swap-out events. Rename it to be MEM_CGROUP_STAT_SWAP. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31mm: make vb_alloc() more foolproofJan Kara
If someone calls vb_alloc() (or vm_map_ram() for that matter) to allocate 0 bytes (0 pages), get_order() returns BITS_PER_LONG - PAGE_CACHE_SHIFT and interesting stuff happens. So make debugging such problems easier and warn about 0-size allocation. [akpm@linux-foundation.org: use WARN_ON-return-value feature] Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31vmalloc: walk vmap_areas by sorted list instead of rb_next()Hong zhi guo
There's a walk by repeating rb_next to find a suitable hole. Could be simply replaced by walk on the sorted vmap_area_list. More simpler and efficient. Mutation of the list and tree only happens in pair within __insert_vmap_area and __free_vmap_area, under protection of vmap_area_lock. The patch code is also under vmap_area_lock, so the list walk is safe, and consistent with the tree walk. Tested on SMP by repeating batch of vmalloc anf vfree for random sizes and rounds for hours. Signed-off-by: Hong Zhiguo <honkiko@gmail.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31drivers/media/video/v4l2-ioctl.c: fix buildAndrew Morton
Fix zillions of these: drivers/media/video/v4l2-ioctl.c:1848: error: unknown field 'func' specified in initializer drivers/media/video/v4l2-ioctl.c:1848: warning: missing braces around initializer drivers/media/video/v4l2-ioctl.c:1848: warning: (near initialization for 'v4l2_ioctls[0].<anonymous>') drivers/media/video/v4l2-ioctl.c:1848: warning: initialization makes integer from pointer without a cast drivers/media/video/v4l2-ioctl.c:1848: error: initializer element is not computable at load time drivers/media/video/v4l2-ioctl.c:1848: error: (near initialization for 'v4l2_ioctls[0].<anonymous>.offset') Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31xtensa: select generic atomic64_t supportFengguang Wu
This will fix build errors: block/blk-cgroup.c:609:2: error: unknown type name 'atomic64_t' block/blk-cgroup.c:609:2: error: implicit declaration of function 'ATOMIC64_INIT' [-Werror=implicit-function-declaration] Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Cc: Chris Zankel <chris@zankel.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31fault-injection: fix failcmd.sh warningAkinobu Mita
"fault-injection: add tool to run command with failslab or fail_page_alloc" added tools/testing/fault-injection/failcmd.sh to make it easier to inject slab/page allocation failures by fault injection. failcmd.sh prints the following warning when running with arguments for command. # ./failcmd.sh echo aaa failcmd.sh: line 209: [: echo: binary operator expected aaa This warning is caused by an improper check whether at least one parameter is left after parsing command options. Fix it by testing the length of $1 instead of $@ Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30Merge tag 'writeback-proportions' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux Pull writeback updates from Wu Fengguang: "Use time based periods to age the writeback proportions, which can adapt equally well to fast/slow devices." Fix up trivial conflict in comment in fs/sync.c * tag 'writeback-proportions' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux: writeback: Fix some comment errors block: Convert BDI proportion calculations to flexible proportions lib: Fix possible deadlock in flexible proportion code lib: Proportions with flexible period
2012-07-30Merge tag 'nfs-for-3.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust: "Features include: - More preparatory patches for modularising NFSv2/v3/v4. Split out the various NFSv2/v3/v4-specific code into separate files - More preparation for the NFSv4 migration code - Ensure that OPEN(O_CREATE) observes the pNFS mds threshold parameters - pNFS fast failover when the data servers are down - Various cleanups and debugging patches" * tag 'nfs-for-3.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (67 commits) nfs: fix fl_type tests in NFSv4 code NFS: fix pnfs regression with directio writes NFS: fix pnfs regression with directio reads sunrpc: clnt: Add missing braces nfs: fix stub return type warnings NFS: exit_nfs_v4() shouldn't be an __exit function SUNRPC: Add a missing spin_unlock to gss_mech_list_pseudoflavors NFS: Split out NFS v4 client functions NFS: Split out the NFS v4 filesystem types NFS: Create a single nfs_clone_super() function NFS: Split out NFS v4 server creating code NFS: Initialize the NFS v4 client from init_nfs_v4() NFS: Move the v4 getroot code to nfs4getroot.c NFS: Split out NFS v4 file operations NFS: Initialize v4 sysctls from nfs_init_v4() NFS: Create an init_nfs_v4() function NFS: Split out NFS v4 inode operations NFS: Split out NFS v3 inode operations NFS: Split out NFS v2 inode operations NFS: Clean up nfs4_proc_setclientid() and friends ...
2012-07-30Merge tag 'mfd-for-linus-3.6-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6 Pull MFD fix from Samuel Ortiz: "This one fixes an s5m8767 regulator build breakage due to a merge conflict caused by the MFD s5m API changes." * tag 'mfd-for-linus-3.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6: regulator: Fix an s5m8767 build failure
2012-07-30Merge branch 'v4l_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media updates from Mauro Carvalho Chehab: "This is the first part of the media patches for v3.6. This patch series contain: - new DVB frontend: rtl2832 - new video drivers: adv7393 - some unused files got removed - a selection API cleanup between V4L2 and V4L2 subdev API's - a major redesign at v4l-ioctl2, in order to clean it up - several driver fixes and improvements." * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (174 commits) v4l: Export v4l2-common.h in include/linux/Kbuild media: Revert "[media] Terratec Cinergy S2 USB HD Rev.2" [media] media: Use pr_info not homegrown pr_reg macro [media] Terratec Cinergy S2 USB HD Rev.2 [media] v4l: Correct conflicting V4L2 subdev selection API documentation [media] Feature removal: V4L2 selections API target and flag definitions [media] v4l: Unify selection flags documentation [media] v4l: Unify selection flags [media] v4l: Common documentation for selection targets [media] v4l: Unify selection targets across V4L2 and V4L2 subdev interfaces [media] v4l: Remove "_ACTUAL" from subdev selection API target definition names [media] V4L: Remove "_ACTIVE" from the selection target name definitions [media] media: dvb-usb: print mac address via native %pM [media] s5p-tv: Use module_i2c_driver in sii9234_drv.c file [media] media: gpio-ir-recv: add allowed_protos for platform data [media] s5p-jpeg: Use module_platform_driver in jpeg-core.c file [media] saa7134: fix spelling of detach in label [media] cx88-blackbird: replace ioctl by unlocked_ioctl [media] cx88: don't use current_norm [media] cx88: fix a number of v4l2-compliance violations ...
2012-07-30Merge branch 'akpm' (Andrew's patch-bomb)Linus Torvalds
Merge Andrew's first set of patches: "Non-MM patches: - lots of misc bits - tree-wide have_clk() cleanups - quite a lot of printk tweaks. I draw your attention to "printk: convert the format for KERN_<LEVEL> to a 2 byte pattern" which looks a bit scary. But afaict it's solid. - backlight updates - lib/ feature work (notably the addition and use of memweight()) - checkpatch updates - rtc updates - nilfs updates - fatfs updates (partial, still waiting for acks) - kdump, proc, fork, IPC, sysctl, taskstats, pps, etc - new fault-injection feature work" * Merge emailed patches from Andrew Morton <akpm@linux-foundation.org>: (128 commits) drivers/misc/lkdtm.c: fix missing allocation failure check lib/scatterlist: do not re-write gfp_flags in __sg_alloc_table() fault-injection: add tool to run command with failslab or fail_page_alloc fault-injection: add selftests for cpu and memory hotplug powerpc: pSeries reconfig notifier error injection module memory: memory notifier error injection module PM: PM notifier error injection module cpu: rewrite cpu-notifier-error-inject module fault-injection: notifier error injection c/r: fcntl: add F_GETOWNER_UIDS option resource: make sure requested range is included in the root range include/linux/aio.h: cpp->C conversions fs: cachefiles: add support for large files in filesystem caching pps: return PTR_ERR on error in device_create taskstats: check nla_reserve() return sysctl: suppress kmemleak messages ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION ipc: compat: use signed size_t types for msgsnd and msgrcv ipc: allow compat IPC version field parsing if !ARCH_WANT_OLD_COMPAT_IPC ipc: add COMPAT_SHMLBA support ...
2012-07-30drivers/misc/lkdtm.c: fix missing allocation failure checkAlan Cox
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=44691 Reported-by: <rucsoftsec@gmail.com> Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30lib/scatterlist: do not re-write gfp_flags in __sg_alloc_table()Mandeep Singh Baines
We are seeing a lot of sg_alloc_table allocation failures using the new drm prime infrastructure. We isolated the cause to code in __sg_alloc_table that was re-writing the gfp_flags. There is a comment in the code that suggest that there is an assumption about the allocation coming from a memory pool. This was likely true when sg lists were primarily used for disk I/O. Signed-off-by: Mandeep Singh Baines <msb@chromium.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Cong Wang <amwang@redhat.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Rob Clark <rob.clark@linaro.org> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Inki Dae <inki.dae@samsung.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Sonny Rao <sonnyrao@chromium.org> Cc: Olof Johansson <olofj@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30fault-injection: add tool to run command with failslab or fail_page_allocAkinobu Mita
This adds tools/testing/fault-injection/failcmd.sh to run a command while injecting slab/page allocation failures via fault injection. Example: Run a command "make -C tools/testing/selftests/ run_tests" with injecting slab allocation failure. # ./tools/testing/fault-injection/failcmd.sh \ -- make -C tools/testing/selftests/ run_tests Same as above except to specify 100 times failures at most instead of one time at most by default. # ./tools/testing/fault-injection/failcmd.sh --times=100 \ -- make -C tools/testing/selftests/ run_tests Same as above except to inject page allocation failure instead of slab allocation failure. # env FAILCMD_TYPE=fail_page_alloc \ ./tools/testing/fault-injection/failcmd.sh --times=100 \ -- make -C tools/testing/selftests/ run_tests Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30fault-injection: add selftests for cpu and memory hotplugAkinobu Mita
This adds two selftests * tools/testing/selftests/cpu-hotplug/on-off-test.sh is testing script for CPU hotplug 1. Online all hot-pluggable CPUs 2. Offline all hot-pluggable CPUs 3. Online all hot-pluggable CPUs again 4. Exit if cpu-notifier-error-inject.ko is not available 5. Offline all hot-pluggable CPUs in preparation for testing 6. Test CPU hot-add error handling by injecting notifier errors 7. Online all hot-pluggable CPUs in preparation for testing 8. Test CPU hot-remove error handling by injecting notifier errors * tools/testing/selftests/memory-hotplug/on-off-test.sh is doing the similar thing for memory hotplug. 1. Online all hot-pluggable memory 2. Offline 10% of hot-pluggable memory 3. Online all hot-pluggable memory again 4. Exit if memory-notifier-error-inject.ko is not available 5. Offline 10% of hot-pluggable memory in preparation for testing 6. Test memory hot-add error handling by injecting notifier errors 7. Online all hot-pluggable memory in preparation for testing 8. Test memory hot-remove error handling by injecting notifier errors Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Greg KH <greg@kroah.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Dave Jones <davej@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30powerpc: pSeries reconfig notifier error injection moduleAkinobu Mita
This provides the ability to inject artifical errors to pSeries reconfig notifier chain callbacks. It is controlled through debugfs interface under /sys/kernel/debug/notifier-error-inject/pSeries-reconfig If the notifier call chain should be failed with some events notified, write the error code to "actions/<notifier event>/error". Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Greg KH <greg@kroah.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Dave Jones <davej@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30memory: memory notifier error injection moduleAkinobu Mita
This provides the ability to inject artifical errors to memory hotplug notifier chain callbacks. It is controlled through debugfs interface under /sys/kernel/debug/notifier-error-inject/memory If the notifier call chain should be failed with some events notified, write the error code to "actions/<notifier event>/error". Example: Inject memory hotplug offline error (-12 == -ENOMEM) # cd /sys/kernel/debug/notifier-error-inject/memory # echo -12 > actions/MEM_GOING_OFFLINE/error # echo offline > /sys/devices/system/memory/memoryXXX/state bash: echo: write error: Cannot allocate memory Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Greg KH <greg@kroah.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Dave Jones <davej@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30PM: PM notifier error injection moduleAkinobu Mita
This provides the ability to inject artifical errors to PM notifier chain callbacks. It is controlled through debugfs interface under /sys/kernel/debug/notifier-error-inject/pm Each of the files in "error" directory represents an event which can be failed and contains the error code. If the notifier call chain should be failed with some events notified, write the error code to the files. If the notifier call chain should be failed with some events notified, write the error code to "actions/<notifier event>/error". Example: Inject PM suspend error (-12 = -ENOMEM) # cd /sys/kernel/debug/notifier-error-inject/pm # echo -12 > actions/PM_SUSPEND_PREPARE/error # echo mem > /sys/power/state bash: echo: write error: Cannot allocate memory Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Cc: Greg KH <greg@kroah.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Dave Jones <davej@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30cpu: rewrite cpu-notifier-error-inject moduleAkinobu Mita
Rewrite existing cpu-notifier-error-inject module to use debugfs based new framework. This change removes cpu_up_prepare_error and cpu_down_prepare_error module parameters which were used to specify error code to be injected. We could keep these module parameters for backward compatibility by module_param_cb but it seems overkill for this module. This provides the ability to inject artifical errors to CPU notifier chain callbacks. It is controlled through debugfs interface under /sys/kernel/debug/notifier-error-inject/cpu If the notifier call chain should be failed with some events notified, write the error code to "actions/<notifier event>/error". Example1: inject CPU offline error (-1 == -EPERM) # cd /sys/kernel/debug/notifier-error-inject/cpu # echo -1 > actions/CPU_DOWN_PREPARE/error # echo 0 > /sys/devices/system/cpu/cpu1/online bash: echo: write error: Operation not permitted Example2: inject CPU online error (-2 == -ENOENT) # cd /sys/kernel/debug/notifier-error-inject/cpu # echo -2 > actions/CPU_UP_PREPARE/error # echo 1 > /sys/devices/system/cpu/cpu1/online bash: echo: write error: No such file or directory Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Greg KH <greg@kroah.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Dave Jones <davej@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30fault-injection: notifier error injectionAkinobu Mita
This patchset provides kernel modules that can be used to test the error handling of notifier call chain failures by injecting artifical errors to the following notifier chain callbacks. * CPU notifier * PM notifier * memory hotplug notifier * powerpc pSeries reconfig notifier Example: Inject CPU offline error (-1 == -EPERM) # cd /sys/kernel/debug/notifier-error-inject/cpu # echo -1 > actions/CPU_DOWN_PREPARE/error # echo 0 > /sys/devices/system/cpu/cpu1/online bash: echo: write error: Operation not permitted The patchset also adds cpu and memory hotplug tests to tools/testing/selftests These tests first do simple online and offline test and then do fault injection tests if notifier error injection module is available. This patch: The notifier error injection provides the ability to inject artifical errors to specified notifier chain callbacks. It is useful to test the error handling of notifier call chain failures. This adds common basic functions to define which type of events can be fail and to initialize the debugfs interface to control what error code should be returned and which event should be failed. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Greg KH <greg@kroah.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Dave Jones <davej@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30c/r: fcntl: add F_GETOWNER_UIDS optionCyrill Gorcunov
When we restore file descriptors we would like them to look exactly as they were at dumping time. With help of fcntl it's almost possible, the missing snippet is file owners UIDs. To be able to read their values the F_GETOWNER_UIDS is introduced. This option is valid iif CONFIG_CHECKPOINT_RESTORE is turned on, otherwise returning -EINVAL. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30resource: make sure requested range is included in the root rangeOctavian Purdila
When the requested range is outside of the root range the logic in __reserve_region_with_split will cause an infinite recursion which will overflow the stack as seen in the warning bellow. This particular stack overflow was caused by requesting the (100000000-107ffffff) range while the root range was (0-ffffffff). In this case __request_resource would return the whole root range as conflict range (i.e. 0-ffffffff). Then, the logic in __reserve_region_with_split would continue the recursion requesting the new range as (conflict->end+1, end) which incidentally in this case equals the originally requested range. This patch aborts looking for an usable range when the request does not intersect with the root range. When the request partially overlaps with the root range, it ajust the request to fall in the root range and then continues with the new request. When the request is modified or aborted errors and a stack trace are logged to allow catching the errors in the upper layers. [ 5.968374] WARNING: at kernel/sched.c:4129 sub_preempt_count+0x63/0x89() [ 5.975150] Modules linked in: [ 5.978184] Pid: 1, comm: swapper Not tainted 3.0.22-mid27-00004-gb72c817 #46 [ 5.985324] Call Trace: [ 5.987759] [<c1039dfc>] ? console_unlock+0x17b/0x18d [ 5.992891] [<c1039620>] warn_slowpath_common+0x48/0x5d [ 5.998194] [<c1031758>] ? sub_preempt_count+0x63/0x89 [ 6.003412] [<c1039644>] warn_slowpath_null+0xf/0x13 [ 6.008453] [<c1031758>] sub_preempt_count+0x63/0x89 [ 6.013499] [<c14d60c4>] _raw_spin_unlock+0x27/0x3f [ 6.018453] [<c10c6349>] add_partial+0x36/0x3b [ 6.022973] [<c10c7c0a>] deactivate_slab+0x96/0xb4 [ 6.027842] [<c14cf9d9>] __slab_alloc.isra.54.constprop.63+0x204/0x241 [ 6.034456] [<c103f78f>] ? kzalloc.constprop.5+0x29/0x38 [ 6.039842] [<c103f78f>] ? kzalloc.constprop.5+0x29/0x38 [ 6.045232] [<c10c7dc9>] kmem_cache_alloc_trace+0x51/0xb0 [ 6.050710] [<c103f78f>] ? kzalloc.constprop.5+0x29/0x38 [ 6.056100] [<c103f78f>] kzalloc.constprop.5+0x29/0x38 [ 6.061320] [<c17b45e9>] __reserve_region_with_split+0x1c/0xd1 [ 6.067230] [<c17b4693>] __reserve_region_with_split+0xc6/0xd1 ... [ 7.179057] [<c17b4693>] __reserve_region_with_split+0xc6/0xd1 [ 7.184970] [<c17b4779>] reserve_region_with_split+0x30/0x42 [ 7.190709] [<c17a8ebf>] e820_reserve_resources_late+0xd1/0xe9 [ 7.196623] [<c17c9526>] pcibios_resource_survey+0x23/0x2a [ 7.202184] [<c17cad8a>] pcibios_init+0x23/0x35 [ 7.206789] [<c17ca574>] pci_subsys_init+0x3f/0x44 [ 7.211659] [<c1002088>] do_one_initcall+0x72/0x122 [ 7.216615] [<c17ca535>] ? pci_legacy_init+0x3d/0x3d [ 7.221659] [<c17a27ff>] kernel_init+0xa6/0x118 [ 7.226265] [<c17a2759>] ? start_kernel+0x334/0x334 [ 7.231223] [<c14d7482>] kernel_thread_helper+0x6/0x10 Signed-off-by: Octavian Purdila <octavian.purdila@intel.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30include/linux/aio.h: cpp->C conversionsAndrew Morton
Convert init_sync_kiocb() from a nasty macro into a nice C function. The struct assignment trick takes care of zeroing all unmentioned fields. Shrinks fs/read_write.o's .text from 9857 bytes to 9714. Also demacroize is_sync_kiocb() and aio_ring_avail(). The latter fixes an arg-referenced-multiple-times hand grenade. Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30fs: cachefiles: add support for large files in filesystem cachingJustin Lecher
Support the caching of large files. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=31182 Signed-off-by: Justin Lecher <jlec@gentoo.org> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.com> Tested-by: Suresh Jayaraman <sjayaraman@suse.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30pps: return PTR_ERR on error in device_createEmil Goode
We should return PTR_ERR if the call to the device_create function fails. Without this patch we instead return the value from a successful call to cdev_add if the call to device_create fails. Signed-off-by: Emil Goode <emilgoode@gmail.com> Acked-by: Devendra Naga <devendra.aaru@gmail.com> Cc: Alexander Gordeev <lasaine@lvk.cs.msu.su> Cc: Rodolfo Giometti <giometti@enneenne.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30taskstats: check nla_reserve() returnAlan Cox
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=44621 Reported-by: <rucsoftsec@gmail.com> Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30sysctl: suppress kmemleak messagesSteven Rostedt
register_sysctl_table() is a strange function, as it makes internal allocations (a header) to register a sysctl_table. This header is a handle to the table that is created, and can be used to unregister the table. But if the table is permanent and never unregistered, the header acts the same as a static variable. Unfortunately, this allocation of memory that is never expected to be freed fools kmemleak in thinking that we have leaked memory. For those sysctl tables that are never unregistered, and have no pointer referencing them, kmemleak will think that these are memory leaks: unreferenced object 0xffff880079fb9d40 (size 192): comm "swapper/0", pid 0, jiffies 4294667316 (age 12614.152s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff8146b590>] kmemleak_alloc+0x73/0x98 [<ffffffff8110a935>] kmemleak_alloc_recursive.constprop.42+0x16/0x18 [<ffffffff8110b852>] __kmalloc+0x107/0x153 [<ffffffff8116fa72>] kzalloc.constprop.8+0xe/0x10 [<ffffffff811703c9>] __register_sysctl_paths+0xe1/0x160 [<ffffffff81170463>] register_sysctl_paths+0x1b/0x1d [<ffffffff8117047d>] register_sysctl_table+0x18/0x1a [<ffffffff81afb0a1>] sysctl_init+0x10/0x14 [<ffffffff81b05a6f>] proc_sys_init+0x2f/0x31 [<ffffffff81b0584c>] proc_root_init+0xa5/0xa7 [<ffffffff81ae5b7e>] start_kernel+0x3d0/0x40a [<ffffffff81ae52a7>] x86_64_start_reservations+0xae/0xb2 [<ffffffff81ae53ad>] x86_64_start_kernel+0x102/0x111 [<ffffffffffffffff>] 0xffffffffffffffff The sysctl_base_table used by sysctl itself is one such instance that registers the table to never be unregistered. Use kmemleak_not_leak() to suppress the kmemleak false positive. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSIONWill Deacon
Rather than #define the options manually in the architecture code, add Kconfig options for them and select them there instead. This also allows us to select the compat IPC version parsing automatically for platforms using the old compat IPC interface. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30ipc: compat: use signed size_t types for msgsnd and msgrcvWill Deacon
The msgsnd and msgrcv system calls use size_t to represent the size of the message being transferred. POSIX states that values of msgsz greater than SSIZE_MAX cause the result to be implementation-defined. On Linux, this equates to returning -EINVAL if (long) msgsz < 0. For compat tasks where !CONFIG_ARCH_WANT_OLD_COMPAT_IPC and compat_size_t is smaller than size_t, negative size values passed from userspace will be interpreted as positive values by do_msg{rcv,snd} and will fail to exit early with -EINVAL. This patch changes the compat prototypes for msg{rcv,snd} so that the message size is represented as a compat_ssize_t, which we cast to the native ssize_t type for the core IPC code. Cc: Arnd Bergmann <arnd@arndb.de> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30ipc: allow compat IPC version field parsing if !ARCH_WANT_OLD_COMPAT_IPCWill Deacon
Commit 48b25c43e6ee ("ipc: provide generic compat versions of IPC syscalls") added a new ARCH_WANT_OLD_COMPAT_IPC config option for architectures to select if their compat target requires the old IPC syscall interface. For architectures (such as AArch64) that do not require the internal calling conventions provided by this option, but have a compat target where the C library passes the IPC_64 flag explicitly, compat_ipc_parse_version no longer strips out the flag before calling the native system call implementation, resulting in unknown SHM/IPC commands and -EINVAL being returned to userspace. This patch separates the selection of the internal calling conventions for the IPC syscalls from the version parsing, allowing architectures to select __ARCH_WANT_COMPAT_IPC_PARSE_VERSION if they want to use version parsing whilst retaining the newer syscall calling conventions. Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: Arnd Bergmann <arnd@arndb.de> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30ipc: add COMPAT_SHMLBA supportWill Deacon
If the SHMLBA definition for a native task differs from the definition for a compat task, the do_shmat() function would need to handle both. This patch introduces COMPAT_SHMLBA, which is used by the compat shmat syscall when calling the ipc code and allows architectures such as AArch64 (where the native SHMLBA is 64k but the compat (AArch32) definition is 16k) to provide the correct semantics for compat IPC system calls. Cc: David S. Miller <davem@davemloft.net> Cc: Chris Zankel <chris@zankel.net> Cc: Arnd Bergmann <arnd@arndb.de> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30kdump: append newline to the last lien of vmcoreinfo noteVivek Goyal
The last line of vmcoreinfo note does not end with \n. Parsing all the lines in note becomes easier if all lines end with \n instead of trying to special case the last line. I know at least one tool, vmcore-dmesg in kexec-tools tree which made the assumption that all lines end with \n. I think it is a good idea to fix it. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30fork: fix error handling in dup_task()Akinobu Mita
The function dup_task() may fail at the following function calls in the following order. 0) alloc_task_struct_node() 1) alloc_thread_info_node() 2) arch_dup_task_struct() Error by 0) is not a matter, it can just return. But error by 1) requires releasing task_struct allocated by 0) before it returns. Likewise, error by 2) requires releasing task_struct and thread_info allocated by 0) and 1). The existing error handling calls free_task_struct() and free_thread_info() which do not only release task_struct and thread_info, but also call architecture specific arch_release_task_struct() and arch_release_thread_info(). The problem is that task_struct and thread_info are not fully initialized yet at this point, but arch_release_task_struct() and arch_release_thread_info() are called with them. For example, x86 defines its own arch_release_task_struct() that releases a task_xstate. If alloc_thread_info_node() fails in dup_task(), arch_release_task_struct() is called with task_struct which is just allocated and filled with garbage in this error handling. This actually happened with tools/testing/fault-injection/failcmd.sh # env FAILCMD_TYPE=fail_page_alloc \ ./tools/testing/fault-injection/failcmd.sh --times=100 \ --min-order=0 --ignore-gfp-wait=0 \ -- make -C tools/testing/selftests/ run_tests In order to fix this issue, make free_{task_struct,thread_info}() not to call arch_release_{task_struct,thread_info}() and call arch_release_{task_struct,thread_info}() implicitly where needed. Default arch_release_task_struct() and arch_release_thread_info() are defined as empty by default. So this change only affects the architectures which implement their own arch_release_task_struct() or arch_release_thread_info() as listed below. arch_release_task_struct(): x86, sh arch_release_thread_info(): mn10300, tile Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: David Howells <dhowells@redhat.com> Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Salman Qazi <sqazi@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30revert "sched: Fix fork() error path to not crash"Andrew Morton
To make way for "fork: fix error handling in dup_task()", which fixes the errors more completely. Cc: Salman Qazi <sqazi@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30fork: use vma_pages() to simplify the codeHuang Shijie
The current code can be replaced by vma_pages(). So use it to simplify the code. [akpm@linux-foundation.org: initialise `len' at its definition site] Signed-off-by: Huang Shijie <shijie8@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30proc: do not allow negative offsets on /proc/<pid>/environDjalal Harouni
__mem_open() which is called by both /proc/<pid>/environ and /proc/<pid>/mem ->open() handlers will allow the use of negative offsets. /proc/<pid>/mem has negative offsets but not /proc/<pid>/environ. Clean this by moving the 'force FMODE_UNSIGNED_OFFSET flag' to mem_open() to allow negative offsets only on /proc/<pid>/mem. Signed-off-by: Djalal Harouni <tixxdz@opendz.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Brad Spengler <spender@grsecurity.net> Acked-by: Kees Cook <keescook@chromium.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30proc: environ_read() make sure offset points to environment address rangeDjalal Harouni
Currently the following offset and environment address range check in environ_read() of /proc/<pid>/environ is buggy: int this_len = mm->env_end - (mm->env_start + src); if (this_len <= 0) break; Large or negative offsets on /proc/<pid>/environ converted to 'unsigned long' may pass this check since '(mm->env_start + src)' can overflow and 'this_len' will be positive. This can turn /proc/<pid>/environ to act like /proc/<pid>/mem since (mm->env_start + src) will point and read from another VMA. There are two fixes here plus some code cleaning: 1) Fix the overflow by checking if the offset that was converted to unsigned long will always point to the [mm->env_start, mm->env_end] address range. 2) Remove the truncation that was made to the result of the check, storing the result in 'int this_len' will alter its value and we can not depend on it. For kernels that have commit b409e578d ("proc: clean up /proc/<pid>/environ handling") which adds the appropriate ptrace check and saves the 'mm' at ->open() time, this is not a security issue. This patch is taken from the grsecurity patch since it was just made available. Signed-off-by: Djalal Harouni <tixxdz@opendz.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Brad Spengler <spender@grsecurity.net> Acked-by: Kees Cook <keescook@chromium.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30coredump: fix wrong comments on core limits of pipe coredump caseJovi Zhang
In commit 898b374af6f7 ("exec: replace call_usermodehelper_pipe with use of umh init function and resolve limit"), the core limits recursive check value was changed from 0 to 1, but the corresponding comments were not updated. Signed-off-by: Jovi Zhang <bookjovi@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30kmod: avoid deadlock from recursive kmod callTetsuo Handa
The system deadlocks (at least since 2.6.10) when call_usermodehelper(UMH_WAIT_EXEC) request triggers call_usermodehelper(UMH_WAIT_PROC) request. This is because "khelper thread is waiting for the worker thread at wait_for_completion() in do_fork() since the worker thread was created with CLONE_VFORK flag" and "the worker thread cannot call complete() because do_execve() is blocked at UMH_WAIT_PROC request" and "the khelper thread cannot start processing UMH_WAIT_PROC request because the khelper thread is waiting for the worker thread at wait_for_completion() in do_fork()". The easiest example to observe this deadlock is to use a corrupted /sbin/hotplug binary (like shown below). # : > /tmp/dummy # chmod 755 /tmp/dummy # echo /tmp/dummy > /proc/sys/kernel/hotplug # modprobe whatever call_usermodehelper("/tmp/dummy", UMH_WAIT_EXEC) is called from kobject_uevent_env() in lib/kobject_uevent.c upon loading/unloading a module. do_execve("/tmp/dummy") triggers a call to request_module("binfmt-0000") from search_binary_handler() which in turn calls call_usermodehelper(UMH_WAIT_PROC). In order to avoid deadlock, as a for-now and easy-to-backport solution, do not try to call wait_for_completion() in call_usermodehelper_exec() if the worker thread was created by khelper thread with CLONE_VFORK flag. Future and fundamental solution might be replacing singleton khelper thread with some workqueue so that recursive calls up to max_active dependency loop can be handled without deadlock. [akpm@linux-foundation.org: add comment to kmod_thread_locker] Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30kernel/kmod.c: document call_usermodehelper_fns() a bitAndrew Morton
This function's interface is, uh, subtle. Attempt to apologise for it. Cc: WANG Cong <xiyou.wangcong@gmail.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Kees Cook <keescook@chromium.org> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30fat: refactor shortname parsingSteven J. Magnani
Nearly identical shortname parsing is performed in fat_search_long() and __fat_readdir(). Extract this code into a function that may be called by both. Signed-off-by: Steven J. Magnani <steve@digidescorp.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30fat: accessors for msdos_dir_entry 'start' fieldsSteven J. Magnani
Simplify code by providing accessor functions for the directory entry start cluster fields. Signed-off-by: Steven J. Magnani <steve@digidescorp.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30hfsplus: use -ENOMEM when kzalloc() failsNamjae Jeon
Use -ENOMEM return value instead of -EINVAL when kzalloc() fails. Signed-off-by: Namjae Jeon <linkinjeon@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30nilfs2: add omitted comments for different structures in driver implementationVyacheslav Dubeyko
Add omitted comments for different structures in driver implementation. Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30nilfs2: add omitted comments for structures in nilfs2_fs.hVyacheslav Dubeyko
Add omitted comments for structures in nilfs2_fs.h. Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30nilfs2: fix deadlock issue between chcp and thaw ioctlsRyusuke Konishi
An fs-thaw ioctl causes deadlock with a chcp or mkcp -s command: chcp D ffff88013870f3d0 0 1325 1324 0x00000004 ... Call Trace: nilfs_transaction_begin+0x11c/0x1a0 [nilfs2] wake_up_bit+0x20/0x20 copy_from_user+0x18/0x30 [nilfs2] nilfs_ioctl_change_cpmode+0x7d/0xcf [nilfs2] nilfs_ioctl+0x252/0x61a [nilfs2] do_page_fault+0x311/0x34c get_unmapped_area+0x132/0x14e do_vfs_ioctl+0x44b/0x490 __set_task_blocked+0x5a/0x61 vm_mmap_pgoff+0x76/0x87 __set_current_blocked+0x30/0x4a sys_ioctl+0x4b/0x6f system_call_fastpath+0x16/0x1b thaw D ffff88013870d890 0 1352 1351 0x00000004 ... Call Trace: rwsem_down_failed_common+0xdb/0x10f call_rwsem_down_write_failed+0x13/0x20 down_write+0x25/0x27 thaw_super+0x13/0x9e do_vfs_ioctl+0x1f5/0x490 vm_mmap_pgoff+0x76/0x87 sys_ioctl+0x4b/0x6f filp_close+0x64/0x6c system_call_fastpath+0x16/0x1b where the thaw ioctl deadlocked at thaw_super() when called while chcp was waiting at nilfs_transaction_begin() called from nilfs_ioctl_change_cpmode(). This deadlock is 100% reproducible. This is because nilfs_ioctl_change_cpmode() first locks sb->s_umount in read mode and then waits for unfreezing in nilfs_transaction_begin(), whereas thaw_super() locks sb->s_umount in write mode. The locking of sb->s_umount here was intended to make snapshot mounts and the downgrade of snapshots to checkpoints exclusive. This fixes the deadlock issue by replacing the sb->s_umount usage in nilfs_ioctl_change_cpmode() with a dedicated mutex which protects snapshot mounts. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>