summaryrefslogtreecommitdiff
path: root/include/linux/slab.h
AgeCommit message (Collapse)Author
2014-06-04memcg: cleanup kmem cache creation/destruction functions namingVladimir Davydov
Current names are rather inconsistent. Let's try to improve them. Brief change log: ** old name ** ** new name ** kmem_cache_create_memcg memcg_create_kmem_cache memcg_kmem_create_cache memcg_regsiter_cache memcg_kmem_destroy_cache memcg_unregister_cache kmem_cache_destroy_memcg_children memcg_cleanup_cache_params mem_cgroup_destroy_all_caches memcg_unregister_all_caches create_work memcg_register_cache_work memcg_create_cache_work_func memcg_register_cache_func memcg_create_cache_enqueue memcg_schedule_register_cache Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04memcg: get rid of memcg_create_cache_nameVladimir Davydov
Instead of calling back to memcontrol.c from kmem_cache_create_memcg in order to just create the name of a per memcg cache, let's allocate it in place. We only need to pass the memcg name to kmem_cache_create_memcg for that - everything else can be done in slab_common.c. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04memcg, slab: simplify synchronization schemeVladimir Davydov
At present, we have the following mutexes protecting data related to per memcg kmem caches: - slab_mutex. This one is held during the whole kmem cache creation and destruction paths. We also take it when updating per root cache memcg_caches arrays (see memcg_update_all_caches). As a result, taking it guarantees there will be no changes to any kmem cache (including per memcg). Why do we need something else then? The point is it is private to slab implementation and has some internal dependencies with other mutexes (get_online_cpus). So we just don't want to rely upon it and prefer to introduce additional mutexes instead. - activate_kmem_mutex. Initially it was added to synchronize initializing kmem limit (memcg_activate_kmem). However, since we can grow per root cache memcg_caches arrays only on kmem limit initialization (see memcg_update_all_caches), we also employ it to protect against memcg_caches arrays relocation (e.g. see __kmem_cache_destroy_memcg_children). - We have a convention not to take slab_mutex in memcontrol.c, but we want to walk over per memcg memcg_slab_caches lists there (e.g. for destroying all memcg caches on offline). So we have per memcg slab_caches_mutex's protecting those lists. The mutexes are taken in the following order: activate_kmem_mutex -> slab_mutex -> memcg::slab_caches_mutex Such a syncrhonization scheme has a number of flaws, for instance: - We can't call kmem_cache_{destroy,shrink} while walking over a memcg::memcg_slab_caches list due to locking order. As a result, in mem_cgroup_destroy_all_caches we schedule the memcg_cache_params::destroy work shrinking and destroying the cache. - We don't have a mutex to synchronize per memcg caches destruction between memcg offline (mem_cgroup_destroy_all_caches) and root cache destruction (__kmem_cache_destroy_memcg_children). Currently we just don't bother about it. This patch simplifies it by substituting per memcg slab_caches_mutex's with the global memcg_slab_mutex. It will be held whenever a new per memcg cache is created or destroyed, so it protects per root cache memcg_caches arrays and per memcg memcg_slab_caches lists. The locking order is following: activate_kmem_mutex -> memcg_slab_mutex -> slab_mutex This allows us to call kmem_cache_{create,shrink,destroy} under the memcg_slab_mutex. As a result, we don't need memcg_cache_params::destroy work any more - we can simply destroy caches while iterating over a per memcg slab caches list. Also using the global mutex simplifies synchronization between concurrent per memcg caches creation/destruction, e.g. mem_cgroup_destroy_all_caches vs __kmem_cache_destroy_memcg_children. The downside of this is that we substitute per-memcg slab_caches_mutex's with a hummer-like global mutex, but since we already take either the slab_mutex or the cgroup_mutex along with a memcg::slab_caches_mutex, it shouldn't hurt concurrency a lot. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Glauber Costa <glommer@gmail.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04memcg, slab: do not schedule cache destruction when last page goes awayVladimir Davydov
This patchset is a part of preparations for kmemcg re-parenting. It targets at simplifying kmemcg work-flows and synchronization. First, it removes async per memcg cache destruction (see patches 1, 2). Now caches are only destroyed on memcg offline. That means the caches that are not empty on memcg offline will be leaked. However, they are already leaked, because memcg_cache_params::nr_pages normally never drops to 0 so the destruction work is never scheduled except kmem_cache_shrink is called explicitly. In the future I'm planning reaping such dead caches on vmpressure or periodically. Second, it substitutes per memcg slab_caches_mutex's with the global memcg_slab_mutex, which should be taken during the whole per memcg cache creation/destruction path before the slab_mutex (see patch 3). This greatly simplifies synchronization among various per memcg cache creation/destruction paths. I'm still not quite sure about the end picture, in particular I don't know whether we should reap dead memcgs' kmem caches periodically or try to merge them with their parents (see https://lkml.org/lkml/2014/4/20/38 for more details), but whichever way we choose, this set looks like a reasonable change to me, because it greatly simplifies kmemcg work-flows and eases further development. This patch (of 3): After a memcg is offlined, we mark its kmem caches that cannot be deleted right now due to pending objects as dead by setting the memcg_cache_params::dead flag, so that memcg_release_pages will schedule cache destruction (memcg_cache_params::destroy) as soon as the last slab of the cache is freed (memcg_cache_params::nr_pages drops to zero). I guess the idea was to destroy the caches as soon as possible, i.e. immediately after freeing the last object. However, it just doesn't work that way, because kmem caches always preserve some pages for the sake of performance, so that nr_pages never gets to zero unless the cache is shrunk explicitly using kmem_cache_shrink. Of course, we could account the total number of objects on the cache or check if all the slabs allocated for the cache are empty on kmem_cache_free and schedule destruction if so, but that would be too costly. Thus we have a piece of code that works only when we explicitly call kmem_cache_shrink, but complicates the whole picture a lot. Moreover, it's racy in fact. For instance, kmem_cache_shrink may free the last slab and thus schedule cache destruction before it finishes checking that the cache is empty, which can lead to use-after-free. So I propose to remove this async cache destruction from memcg_release_pages, and check if the cache is empty explicitly after calling kmem_cache_shrink instead. This will simplify things a lot w/o introducing any functional changes. And regarding dead memcg caches (i.e. those that are left hanging around after memcg offline for they have objects), I suppose we should reap them either periodically or on vmpressure as Glauber suggested initially. I'm going to implement this later. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Glauber Costa <glommer@gmail.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04mm: get rid of __GFP_KMEMCGVladimir Davydov
Currently to allocate a page that should be charged to kmemcg (e.g. threadinfo), we pass __GFP_KMEMCG flag to the page allocator. The page allocated is then to be freed by free_memcg_kmem_pages. Apart from looking asymmetrical, this also requires intrusion to the general allocation path. So let's introduce separate functions that will alloc/free pages charged to kmemcg. The new functions are called alloc_kmem_pages and free_kmem_pages. They should be used when the caller actually would like to use kmalloc, but has to fall back to the page allocator for the allocation is large. They only differ from alloc_pages and free_pages in that besides allocating or freeing pages they also charge them to the kmem resource counter of the current memory cgroup. [sfr@canb.auug.org.au: export kmalloc_order() to modules] Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Glauber Costa <glommer@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-13Merge branch 'slab/next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux Pull slab changes from Pekka Enberg: "The biggest change is byte-sized freelist indices which reduces slab freelist memory usage: https://lkml.org/lkml/2013/12/2/64" * 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux: mm: slab/slub: use page->list consistently instead of page->lru mm/slab.c: cleanup outdated comments and unify variables naming slab: fix wrongly used macro slub: fix high order page allocation problem with __GFP_NOFAIL slab: Make allocations with GFP_ZERO slightly more efficient slab: make more slab management structure off the slab slab: introduce byte sized index for the freelist of a slab slab: restrict the number of objects in a slab slab: introduce helper functions to get/set free object slab: factor out calculate nr objects in cache_estimate
2014-04-07memcg, slab: separate memcg vs root cache creation pathsVladimir Davydov
Memcg-awareness turned kmem_cache_create() into a dirty interweaving of memcg-only and except-for-memcg calls. To clean this up, let's move the code responsible for memcg cache creation to a separate function. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Glauber Costa <glommer@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-01slab: fix wrongly used macroJoonsoo Kim
commit 'slab: restrict the number of objects in a slab' uses __builtin_constant_p() on #if macro. It is wrong usage of builtin function, but it is compiled on x86 without any problem, so I can't find it before 0 day build system find it. This commit fixes the situation by using KMALLOC_MIN_SIZE, instead of KMALLOC_SHIFT_LOW. KMALLOC_SHIFT_LOW is parsed to ilog2() on some architecture and this ilog2() uses __builtin_constant_p() and results in the problem. This problem would disappear by using KMALLOC_MIN_SIZE, since it is just constant. Tested-by: David Rientjes <rientjes@google.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2014-03-10mm: fix GFP_THISNODE callers and clarifyJohannes Weiner
GFP_THISNODE is for callers that implement their own clever fallback to remote nodes. It restricts the allocation to the specified node and does not invoke reclaim, assuming that the caller will take care of it when the fallback fails, e.g. through a subsequent allocation request without GFP_THISNODE set. However, many current GFP_THISNODE users only want the node exclusive aspect of the flag, without actually implementing their own fallback or triggering reclaim if necessary. This results in things like page migration failing prematurely even when there is easily reclaimable memory available, unless kswapd happens to be running already or a concurrent allocation attempt triggers the necessary reclaim. Convert all callsites that don't implement their own fallback strategy to __GFP_THISNODE. This restricts the allocation a single node too, but at the same time allows the allocator to enter the slowpath, wake kswapd, and invoke direct reclaim if necessary, to make the allocation happen when memory is full. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Rik van Riel <riel@redhat.com> Cc: Jan Stancek <jstancek@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-08slab: restrict the number of objects in a slabJoonsoo Kim
To prepare to implement byte sized index for managing the freelist of a slab, we should restrict the number of objects in a slab to be less or equal to 256, since byte only represent 256 different values. Setting the size of object to value equal or more than newly introduced SLAB_OBJ_MIN_SIZE ensures that the number of objects in a slab is less or equal to 256 for a slab with 1 page. If page size is rather larger than 4096, above assumption would be wrong. In this case, we would fall back on 2 bytes sized index. If minimum size of kmalloc is less than 16, we use it as minimum object size and give up this optimization. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2014-02-02Merge branch 'slab/next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux Pull SLAB changes from Pekka Enberg: "Random bug fixes that have accumulated in my inbox over the past few months" * 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux: mm: Fix warning on make htmldocs caused by slab.c mm: slub: work around unneeded lockdep warning mm: sl[uo]b: fix misleading comments slub: Fix possible format string bug. slub: use lockdep_assert_held slub: Fix calculation of cpu slabs slab.h: remove duplicate kmalloc declaration and fix kernel-doc warnings
2014-01-31mm: sl[uo]b: fix misleading commentsDave Hansen
On x86, SLUB creates and handles <=8192-byte allocations internally. It passes larger ones up to the allocator. Saying "up to order 2" is, at best, ambiguous. Is that order-1? Or (order-2 bytes)? Make it more clear. SLOB commits a similar sin. It *handles* page-size requests, but the comment says that it passes up "all page size and larger requests". SLOB also swaps around the order of the very-similarly-named KMALLOC_SHIFT_HIGH and KMALLOC_SHIFT_MAX #defines. Make it consistent with the order of the other two allocators. Cc: Matt Mackall <mpm@selenic.com> Cc: Andrew Morton <akpm@linux-foundation.org> Acked-by: Christoph Lameter <cl@linux-foundation.org> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2014-01-23memcg, slab: RCU protect memcg_params for root cachesVladimir Davydov
We relocate root cache's memcg_params whenever we need to grow the memcg_caches array to accommodate all kmem-active memory cgroups. Currently on relocation we free the old version immediately, which can lead to use-after-free, because the memcg_caches array is accessed lock-free (see cache_from_memcg_idx()). This patch fixes this by making memcg_params RCU-protected for root caches. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Glauber Costa <glommer@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-18slab.h: remove duplicate kmalloc declaration and fix kernel-doc warningsRandy Dunlap
Fix kernel-doc warning for duplicate definition of 'kmalloc': Documentation/DocBook/kernel-api.xml:9483: element refentry: validity error : ID API-kmalloc already defined <refentry id="API-kmalloc"> Also combine the kernel-doc info from the 2 kmalloc definitions into one block and remove the "see kcalloc" comment since kmalloc now contains the @flags info. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-11-24slab.h: remove duplicate kmalloc declaration and fix kernel-doc warningsRandy Dunlap
Fix kernel-doc warning for duplicate definition of 'kmalloc': Documentation/DocBook/kernel-api.xml:9483: element refentry: validity error : ID API-kmalloc already defined <refentry id="API-kmalloc"> Also combine the kernel-doc info from the 2 kmalloc definitions into one block and remove the "see kcalloc" comment since kmalloc now contains the @flags info. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-22Merge branch 'slab/next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux Pull SLAB changes from Pekka Enberg: "The patches from Joonsoo Kim switch mm/slab.c to use 'struct page' for slab internals similar to mm/slub.c. This reduces memory usage and improves performance: https://lkml.org/lkml/2013/10/16/155 Rest of the changes are bug fixes from various people" * 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux: (21 commits) mm, slub: fix the typo in mm/slub.c mm, slub: fix the typo in include/linux/slub_def.h slub: Handle NULL parameter in kmem_cache_flags slab: replace non-existing 'struct freelist *' with 'void *' slab: fix to calm down kmemleak warning slub: proper kmemleak tracking if CONFIG_SLUB_DEBUG disabled slab: rename slab_bufctl to slab_freelist slab: remove useless statement for checking pfmemalloc slab: use struct page for slab management slab: replace free and inuse in struct slab with newly introduced active slab: remove SLAB_LIMIT slab: remove kmem_bufctl_t slab: change the management method of free objects of the slab slab: use __GFP_COMP flag for allocating slab pages slab: use well-defined macro, virt_to_slab() slab: overloading the RCU head over the LRU for RCU free slab: remove cachep in struct slab_rcu slab: remove nodeid in struct slab slab: remove colouroff in struct slab slab: change return type of kmem_getpages() to struct page ...
2013-10-24slab: overloading the RCU head over the LRU for RCU freeJoonsoo Kim
With build-time size checking, we can overload the RCU head over the LRU of struct page to free pages of a slab in rcu context. This really help to implement to overload the struct slab over the struct page and this eventually reduce memory usage and cache footprint of the SLAB. Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Pekka Enberg <penberg@iki.fi>
2013-09-04slab: Use correct GFP_DMA constantChristoph Lameter
On Thu, 5 Sep 2013, kbuild test robot wrote: > >> include/linux/slab.h:433:53: sparse: restricted gfp_t degrades to integer > 429 static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node) > 430 { > 431 #ifndef CONFIG_SLOB > 432 if (__builtin_constant_p(size) && > > 433 size <= KMALLOC_MAX_CACHE_SIZE && !(flags & SLAB_CACHE_DMA)) { > 434 int i = kmalloc_index(size); > 435 flags is of type gfp_t and not a slab internal flag. Therefore use GFP_DMA. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-09-04mm/sl[aou]b: Move kmallocXXX functions to common codeChristoph Lameter
The kmalloc* functions of all slab allcoators are similar now so lets move them into slab.h. This requires some function naming changes in slob. As a results of this patch there is a common set of functions for all allocators. Also means that kmalloc_large() is now available in general to perform large order allocations that go directly via the page allocator. kmalloc_large() can be substituted if kmalloc() throws warnings because of too large allocations. kmalloc_large() has exactly the same semantics as kmalloc but can only used for allocations > PAGE_SIZE. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-07-07slab: add kmalloc() to kernel API documentationMichael Opdenacker
At the moment, kmalloc() isn't even listed in the kernel API documentation (DocBook/kernel-api.html after running "make htmldocs"). Another issue is that the documentation for kmalloc_node() refers to kcalloc()'s documentation to describe its 'flags' parameter, while kcalloc() refered to kmalloc()'s documentation, which doesn't exist! This patch is a proposed fix for this. It also removes the documentation for kmalloc() in include/linux/slob_def.h which isn't included to generate the documentation anyway. This way, kmalloc() is described in only one place. Acked-by: Christoph Lameter <cl@linux.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-06-18slob: Rework #ifdeffery in slab.hChristoph Lameter
Make the SLOB specific stuff harmonize more with the way the other allocators do it. Create the typical kmalloc constants for that purpose. SLOB does not support it but the constants help us avoid #ifdefs. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-02-06slab: Handle ARCH_DMA_MINALIGN correctlyChristoph Lameter
James Hogan hit boot problems in next-20130204 on Meta: META213-Thread0 DSP [LogF] kobject (4fc03980): tried to init an initialized object, something is seriously wrong. META213-Thread0 DSP [LogF] META213-Thread0 DSP [LogF] Call trace: META213-Thread0 DSP [LogF] [<4000888c>] _show_stack+0x68/0x7c META213-Thread0 DSP [LogF] [<400088b4>] _dump_stack+0x14/0x28 META213-Thread0 DSP [LogF] [<40103794>] _kobject_init+0x58/0x9c META213-Thread0 DSP [LogF] [<40103810>] _kobject_create+0x38/0x64 META213-Thread0 DSP [LogF] [<40103eac>] _kobject_create_and_add+0x14/0x8c META213-Thread0 DSP [LogF] [<40190ac4>] _mnt_init+0xd8/0x220 META213-Thread0 DSP [LogF] [<40190508>] _vfs_caches_init+0xb0/0x160 META213-Thread0 DSP [LogF] [<401851f4>] _start_kernel+0x274/0x340 META213-Thread0 DSP [LogF] [<40188424>] _metag_start_kernel+0x58/0x6c META213-Thread0 DSP [LogF] [<40000044>] __start+0x44/0x48 META213-Thread0 DSP [LogF] META213-Thread0 DSP [LogF] devtmpfs: initialized META213-Thread0 DSP [LogF] L2 Cache: Not present META213-Thread0 DSP [LogF] BUG: failure at fs/sysfs/dir.c:736/sysfs_read_ns_type()! META213-Thread0 DSP [LogF] Kernel panic - not syncing: BUG! META213-Thread0 DSP [Thread Exit] Thread has exited - return code = 4294967295 And bisected the problem to commit 95a05b4 ("slab: Common constants for kmalloc boundaries"). As it turns out, a fixed KMALLOC_SHIFT_LOW does not work for arches with higher alignment requirements. Determine KMALLOC_SHIFT_LOW from ARCH_DMA_MINALIGN instead. Reported-and-tested-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-02-01slab: Common definition for the array of kmalloc cachesChristoph Lameter
Have a common definition fo the kmalloc cache arrays in SLAB and SLUB Acked-by: Glauber Costa <glommer@parallels.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-02-01slab: Common constants for kmalloc boundariesChristoph Lameter
Standardize the constants that describe the smallest and largest object kept in the kmalloc arrays for SLAB and SLUB. Differentiate between the maximum size for which a slab cache is used (KMALLOC_MAX_CACHE_SIZE) and the maximum allocatable size (KMALLOC_MAX_SIZE, KMALLOC_MAX_ORDER). Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-02-01slab: Common kmalloc slab index determinationChristoph Lameter
Extract the function to determine the index of the slab within the array of kmalloc caches as well as a function to determine maximum object size from the nr of the kmalloc slab. This is used here only to simplify slub bootstrap but will be used later also for SLAB. Acked-by: Glauber Costa <glommer@parallels.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-02-01slab: Move kmalloc related function defsChristoph Lameter
Move these functions higher up in slab.h so that they are grouped with other generic kmalloc related definitions. Acked-by: Glauber Costa <glommer@parallels.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-12-18slab: propagate tunable valuesGlauber Costa
SLAB allows us to tune a particular cache behavior with tunables. When creating a new memcg cache copy, we'd like to preserve any tunables the parent cache already had. This could be done by an explicit call to do_tune_cpucache() after the cache is created. But this is not very convenient now that the caches are created from common code, since this function is SLAB-specific. Another method of doing that is taking advantage of the fact that do_tune_cpucache() is always called from enable_cpucache(), which is called at cache initialization. We can just preset the values, and then things work as expected. It can also happen that a root cache has its tunables updated during normal system operation. In this case, we will propagate the change to all caches that are already active. This change will require us to move the assignment of root_cache in memcg_params a bit earlier. We need this to be already set - which memcg_kmem_register_cache will do - when we reach __kmem_cache_create() Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18memcg: aggregate memcg cache values in slabinfoGlauber Costa
When we create caches in memcgs, we need to display their usage information somewhere. We'll adopt a scheme similar to /proc/meminfo, with aggregate totals shown in the global file, and per-group information stored in the group itself. For the time being, only reads are allowed in the per-group cache. Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18memcg: destroy memcg cachesGlauber Costa
Implement destruction of memcg caches. Right now, only caches where our reference counter is the last remaining are deleted. If there are any other reference counters around, we just leave the caches lying around until they go away. When that happens, a destruction function is called from the cache code. Caches are only destroyed in process context, so we queue them up for later processing in the general case. Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18slab/slub: consider a memcg parameter in kmem_create_cacheGlauber Costa
Allow a memcg parameter to be passed during cache creation. When the slub allocator is being used, it will only merge caches that belong to the same memcg. We'll do this by scanning the global list, and then translating the cache to a memcg-specific cache Default function is created as a wrapper, passing NULL to the memcg version. We only merge caches that belong to the same memcg. A helper is provided, memcg_css_id: because slub needs a unique cache name for sysfs. Since this is visible, but not the canonical location for slab data, the cache name is not used, the css_id should suffice. Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18slab/slub: struct memcg_paramsGlauber Costa
For the kmem slab controller, we need to record some extra information in the kmem_cache structure. Signed-off-by: Glauber Costa <glommer@parallels.com> Signed-off-by: Suleiman Souhlal <suleiman@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-31mm/sl[aou]b: Move common kmem_cache_size() to slab.hEzequiel Garcia
This function is identically defined in all three allocators and it's trivial to move it to slab.h Since now it's static, inline, header-defined function this patch also drops the EXPORT_SYMBOL tag. Cc: Pekka Enberg <penberg@kernel.org> Cc: Matt Mackall <mpm@selenic.com> Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Ezequiel Garcia <elezegarcia@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-09-25mm, slob: Add support for kmalloc_track_caller()Ezequiel Garcia
Currently slob falls back to regular kmalloc for this case. With this patch kmalloc_track_caller() is correctly implemented, thus tracing the specified caller. This is important to trace accurately allocations performed by krealloc, kstrdup, kmemdup, etc. Signed-off-by: Ezequiel Garcia <elezegarcia@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-07-09mm, sl[aou]b: Common definition for boot state of the slab allocatorsChristoph Lameter
All allocators have some sort of support for the bootstrap status. Setup a common definition for the boot states and make all slab allocators use that definition. Reviewed-by: Glauber Costa <glommer@parallels.com> Reviewed-by: Joonsoo Kim <js1304@gmail.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-07-09mm, sl[aou]b: Extract common code for kmem_cache_create()Christoph Lameter
Kmem_cache_create() does a variety of sanity checks but those vary depending on the allocator. Use the strictest tests and put them into a slab_common file. Make the tests conditional on CONFIG_DEBUG_VM. This patch has the effect of adding sanity checks for SLUB and SLOB under CONFIG_DEBUG_VM and removes the checks in SLAB for !CONFIG_DEBUG_VM. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-06-14mm, sl[aou]b: Extract common fields from struct kmem_cacheChristoph Lameter
Define a struct that describes common fields used in all slab allocators. A slab allocator either uses the common definition (like SLOB) or is required to provide members of kmem_cache with the definition given. After that it will be possible to share code that only operates on those fields of kmem_cache. The patch basically takes the slob definition of kmem cache and uses the field namees for the other allocators. It also standardizes the names used for basic object lengths in allocators: object_size Struct size specified at kmem_cache_create. Basically the payload expected to be used by the subsystem. size The size of memory allocator for each object. This size is larger than object_size and includes padding, alignment and extra metadata for each object (f.e. for debugging and rcu). Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-05-31introduce SIZE_MAXXi Wang
ULONG_MAX is often used to check for integer overflow when calculating allocation size. While ULONG_MAX happens to work on most systems, there is no guarantee that `size_t' must be the same size as `long'. This patch introduces SIZE_MAX, the maximum value of `size_t', to improve portability and readability for allocation size validation. Signed-off-by: Xi Wang <xi.wang@gmail.com> Acked-by: Alex Elder <elder@dreamhost.com> Cc: David Airlie <airlied@linux.ie> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-06slab: introduce kmalloc_array()Xi Wang
Introduce a kmalloc_array() wrapper that performs integer overflow checking without zeroing the memory. Suggested-by: Andrew Morton <akpm@linux-foundation.org> Suggested-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Xi Wang <xi.wang@gmail.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2011-07-07slab allocators: Provide generic description of alignment definesChristoph Lameter
Provide description for alignment defines. Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2011-06-16slab, slub, slob: Unify alignment definitionChristoph Lameter
Every slab has its on alignment definition in include/linux/sl?b_def.h. Extract those and define a common set in include/linux/slab.h. SLOB: As notes sometimes we need double word alignment on 32 bit. This gives all structures allocated by SLOB a unsigned long long alignment like the others do. SLAB: If ARCH_SLAB_MINALIGN is not set SLAB would set ARCH_SLAB_MINALIGN to zero meaning no alignment at all. Give it the default unsigned long long alignment. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2011-01-23mm: Remove support for kmem_cache_name()Christoph Lameter
The last user was ext4 and Eric Sandeen removed the call in a recent patch. See the following URL for the discussion: http://marc.info/?l=linux-ext4&m=129546975702198&w=2 Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2011-01-07kernel: kmem_ptr_validate considered harmfulNick Piggin
This is a nasty and error prone API. It is no longer used, remove it. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2010-07-04slab: fix caller tracking on !CONFIG_DEBUG_SLAB && CONFIG_TRACINGXiaotian Feng
In slab, all __xxx_track_caller is defined on CONFIG_DEBUG_SLAB || CONFIG_TRACING, thus caller tracking function should be worked for CONFIG_TRACING. But if CONFIG_DEBUG_SLAB is not set, include/linux/slab.h will define xxx_track_caller to __xxx() without consideration of CONFIG_TRACING. This will break the caller tracking behaviour then. Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Vegard Nossum <vegard.nossum@gmail.com> Cc: Dmitry Monakhov <dmonakhov@openvz.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2010-04-09slab: Generify kernel pointer validationPekka Enberg
As suggested by Linus, introduce a kern_ptr_validate() helper that does some sanity checks to make sure a pointer is a valid kernel pointer. This is a preparational step for fixing SLUB kmem_ptr_validate(). Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Matt Mackall <mpm@selenic.com> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-26failslab: add ability to filter slab cachesDmitry Monakhov
This patch allow to inject faults only for specific slabs. In order to preserve default behavior cache filter is off by default (all caches are faulty). One may define specific set of slabs like this: # mark skbuff_head_cache as faulty echo 1 > /sys/kernel/slab/skbuff_head_cache/failslab # Turn on cache filter (off by default) echo 1 > /sys/kernel/debug/failslab/cache-filter # Turn on fault injection echo 1 > /sys/kernel/debug/failslab/times echo 1 > /sys/kernel/debug/failslab/probability Acked-by: David Rientjes <rientjes@google.com> Acked-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2009-06-15Merge commit 'linus/master' into HEADVegard Nossum
Conflicts: MAINTAINERS Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
2009-06-15kmemcheck: add mm functionsVegard Nossum
With kmemcheck enabled, the slab allocator needs to do this: 1. Tell kmemcheck to allocate the shadow memory which stores the status of each byte in the allocation proper, e.g. whether it is initialized or uninitialized. 2. Tell kmemcheck which parts of memory that should be marked uninitialized. There are actually a few more states, such as "not yet allocated" and "recently freed". If a slab cache is set up using the SLAB_NOTRACK flag, it will never return memory that can take page faults because of kmemcheck. If a slab cache is NOT set up using the SLAB_NOTRACK flag, callers can still request memory with the __GFP_NOTRACK flag. This does not prevent the page faults from occuring, however, but marks the object in question as being initialized so that no warnings will ever be produced for this object. In addition to (and in contrast to) __GFP_NOTRACK, the __GFP_NOTRACK_FALSE_POSITIVE flag indicates that the allocation should not be tracked _because_ it would produce a false positive. Their values are identical, but need not be so in the future (for example, we could now enable/disable false positives with a config option). Parts of this patch were contributed by Pekka Enberg but merged for atomicity. Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu> [rebased for mainline inclusion] Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
2009-06-12slab,slub: don't enable interrupts during early bootPekka Enberg
As explained by Benjamin Herrenschmidt: Oh and btw, your patch alone doesn't fix powerpc, because it's missing a whole bunch of GFP_KERNEL's in the arch code... You would have to grep the entire kernel for things that check slab_is_available() and even then you'll be missing some. For example, slab_is_available() didn't always exist, and so in the early days on powerpc, we used a mem_init_done global that is set form mem_init() (not perfect but works in practice). And we still have code using that to do the test. Therefore, mask out __GFP_WAIT, __GFP_IO, and __GFP_FS in the slab allocators in early boot code to avoid enabling interrupts. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2009-06-11kmemleak: Add the slab memory allocation/freeing hooksCatalin Marinas
This patch adds the callbacks to kmemleak_(alloc|free) functions from the slab allocator. The patch also adds the SLAB_NOLEAKTRACE flag to avoid recursive calls to kmemleak when it allocates its own data structures. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
2009-02-20slab: introduce kzfree()Johannes Weiner
kzfree() is a wrapper for kfree() that additionally zeroes the underlying memory before releasing it to the slab allocator. Currently there is code which memset()s the memory region of an object before releasing it back to the slab allocator to make sure security-sensitive data are really zeroed out after use. These callsites can then just use kzfree() which saves some code, makes users greppable and allows for a stupid destructor that isn't necessarily aware of the actual object size. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Matt Mackall <mpm@selenic.com> Acked-by: Christoph Lameter <cl@linux-foundation.org> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>