Diffstat (limited to 'Documentation/sysctl')
-rw-r--r--   Documentation/sysctl/kernel.txt   51
-rw-r--r--   Documentation/sysctl/net.txt      63
-rw-r--r--   Documentation/sysctl/vm.txt       30
3 files changed, 117 insertions, 27 deletions
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index ccd42589e124..9d4c1d18ad44 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -70,12 +70,12 @@ show up in /proc/sys/kernel:
 - shmall
 - shmmax                     [ sysv ipc ]
 - shmmni
-- softlockup_thresh
 - stop-a                     [ SPARC only ]
 - sysrq                      ==> Documentation/sysrq.txt
 - tainted
 - threads-max
 - unknown_nmi_panic
+- watchdog_thresh
 - version

 ==============================================================
@@ -182,6 +182,7 @@ core_pattern is used to specify a core dumpfile pattern name.
 	%<NUL>	'%' is dropped
 	%%	output one '%'
 	%p	pid
+	%P	global pid (init PID namespace)
 	%u	uid
 	%g	gid
 	%d	dump mode, matches PR_SET_DUMPABLE and
@@ -427,6 +428,32 @@ This file shows up if CONFIG_DEBUG_STACKOVERFLOW is enabled.

 ==============================================================

+perf_cpu_time_max_percent:
+
+Hints to the kernel how much CPU time it should be allowed to
+use to handle perf sampling events.  If the perf subsystem
+is informed that its samples are exceeding this limit, it
+will drop its sampling frequency to attempt to reduce its CPU
+usage.
+
+Some perf sampling happens in NMIs.  If these samples
+unexpectedly take too long to execute, the NMIs can become
+stacked up next to each other so much that nothing else is
+allowed to execute.
+
+0: disable the mechanism.  Do not monitor or correct perf's
+   sampling rate no matter how much CPU time it takes.
+
+1-100: attempt to throttle perf's sample rate to this
+   percentage of CPU.  Note: the kernel calculates an
+   "expected" length of each sample event.  100 here means
+   100% of that expected length.  Even if this is set to
+   100, you may still see sample throttling if this
+   length is exceeded.  Set to 0 if you truly do not care
+   how much CPU is consumed.
+
+==============================================================
+
 pid_max:
@@ -604,15 +631,6 @@ without users and with a dead originative process will be destroyed.

 ==============================================================

-softlockup_thresh:
-
-This value can be used to lower the softlockup tolerance threshold.  The
-default threshold is 60 seconds.  If a cpu is locked up for 60 seconds,
-the kernel complains.  Valid values are 1-60 seconds.  Setting this
-tunable to zero will disable the softlockup detection altogether.
-
-==============================================================
-
 tainted:

 Non-zero if the kernel has been tainted.  Numeric values, which
@@ -648,3 +666,16 @@ that time, kernel debugging information is displayed on console.

 NMI switch that most IA32 servers have fires unknown NMI up, for
 example.  If a system hangs up, try pressing the NMI switch.
+
+==============================================================
+
+watchdog_thresh:
+
+This value can be used to control the frequency of hrtimer and NMI
+events and the soft and hard lockup thresholds. The default threshold
+is 10 seconds.
+
+The softlockup threshold is (2 * watchdog_thresh). Setting this
+tunable to zero will disable lockup detection altogether.
+
+==============================================================
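For orientation, a minimal usage sketch for the two kernel.txt tunables
introduced above; the values are illustrative only, and the paths simply
follow the /proc/sys/kernel layout this file documents:

    # echo 25 > /proc/sys/kernel/perf_cpu_time_max_percent
    # echo 10 > /proc/sys/kernel/watchdog_thresh

Here 25 asks the kernel to throttle perf sampling at roughly 25% of the
"expected" sample length, and a watchdog_thresh of 10 gives a softlockup
threshold of 20 seconds (2 * watchdog_thresh); writing 0 to either file
disables the respective mechanism, as described above.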
diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
index 98335b7a5337..9a0319a82470 100644
--- a/Documentation/sysctl/net.txt
+++ b/Documentation/sysctl/net.txt
@@ -1,4 +1,4 @@
-Documentation for /proc/sys/net/*	kernel version 2.4.0-test11-pre4
+Documentation for /proc/sys/net/*
 	(c) 1999		Terrehon Bowden <terrehon@pacbell.net>
 				Bodo Bauer <bb@ricochet.net>
 	(c) 2000		Jorge Nerin <comandante@zaralinux.com>
@@ -9,10 +9,10 @@ For general info and legal blurb, please look in README.
 ==============================================================

 This file contains the documentation for the sysctl files in
-/proc/sys/net and is valid for Linux kernel version 2.4.0-test11-pre4.
+/proc/sys/net

 The interface to the networking parts of the kernel is located in
-/proc/sys/net. The following table shows all possible subdirectories.You may
+/proc/sys/net. The following table shows all possible subdirectories. You may
 see only some of them, depending on your kernel's configuration.

@@ -26,7 +26,7 @@ Table : Subdirectories in /proc/sys/net
 ipv4      IP version 4        x25        X.25 protocol
 ipx       IPX                 token-ring IBM token ring
 bridge    Bridging            decnet     DEC net
-ipv6      IP version 6
+ipv6      IP version 6        tipc       TIPC
 ..............................................................................

 1. /proc/sys/net/core - Network core options
@@ -50,6 +50,43 @@ The maximum number of packets that kernel can handle on a NAPI interrupt,
 it's a Per-CPU variable.
 Default: 64

+default_qdisc
+--------------
+
+The default queuing discipline to use for network devices. This allows
+overriding the default of pfifo_fast with an alternative. Since the default
+queuing discipline is created without additional parameters, it is best
+suited to queuing disciplines that work well without configuration, like
+stochastic fair queue (sfq), CoDel (codel) or fair queue CoDel (fq_codel).
+Don't use queuing disciplines like Hierarchical Token Bucket or Deficit
+Round Robin, which require setting up classes and bandwidths.
+Default: pfifo_fast
+
+busy_read
+----------------
+Low latency busy poll timeout for socket reads. (needs CONFIG_NET_RX_BUSY_POLL)
+Approximate time in us to busy loop waiting for packets on the device queue.
+This sets the default value of the SO_BUSY_POLL socket option.
+Can be set or overridden per socket by setting socket option SO_BUSY_POLL,
+which is the preferred method of enabling. If you need to enable the feature
+globally via sysctl, a value of 50 is recommended.
+Will increase power usage.
+Default: 0 (off)
+
+busy_poll
+----------------
+Low latency busy poll timeout for poll and select. (needs CONFIG_NET_RX_BUSY_POLL)
+Approximate time in us to busy loop waiting for events.
+Recommended value depends on the number of sockets you poll on.
+For several sockets 50, for several hundreds 100.
+For more than that you probably want to use epoll.
+Note that only sockets with SO_BUSY_POLL set will be busy polled,
+so you want to either selectively set SO_BUSY_POLL on those sockets or set
+sysctl.net.busy_read globally.
+Will increase power usage.
+Default: 0 (off)
+
 rmem_default
 ------------
@@ -93,8 +130,7 @@ netdev_budget

 Maximum number of packets taken from all interfaces in one polling cycle (NAPI
 poll). In one polling cycle interfaces which are registered to polling are
-probed in a round-robin manner. The limit of packets in one such probe can be
-set per-device via sysfs class/net/<device>/weight .
+probed in a round-robin manner.

 netdev_max_backlog
 ------------------
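Likewise, a hedged sketch of the new net.core entries documented above;
the values are illustrative, and the paths assume the usual mapping of the
names in section 1 onto /proc/sys/net/core:

    # echo fq_codel > /proc/sys/net/core/default_qdisc
    # echo 50 > /proc/sys/net/core/busy_read
    # echo 50 > /proc/sys/net/core/busy_poll

fq_codel needs no extra configuration, which is what makes it suitable as a
default_qdisc, and 50 is the value the text above recommends when enabling
busy polling globally; setting SO_BUSY_POLL per socket remains the preferred
mechanism.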
@@ -201,3 +237,18 @@ IPX.
 The /proc/net/ipx_route table holds a list of IPX routes. For each route it
 gives the destination network, the router node (or Directly) and the network
 address of the router (or Connected) for internal networks.
+
+6. TIPC
+-------------------------------------------------------
+
+The TIPC protocol now has a tunable for the receive memory, similar to
+tcp_rmem - i.e. a vector of 3 integers: (min, default, max)
+
+    # cat /proc/sys/net/tipc/tipc_rmem
+    4252725 34021800        68043600
+    #
+
+The max value is set to CONN_OVERLOAD_LIMIT, and the default and min values
+are scaled (shifted) versions of that same value.  Note that the min value
+is not currently used in any meaningful way, but the triplet is preserved
+in order to be consistent with things like tcp_rmem.
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index dcc75a9ed919..79a797eb3e87 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -200,17 +200,25 @@ fragmentation index is <= extfrag_threshold. The default value is 500.

 hugepages_treat_as_movable

-This parameter is only useful when kernelcore= is specified at boot time to
-create ZONE_MOVABLE for pages that may be reclaimed or migrated. Huge pages
-are not movable so are not normally allocated from ZONE_MOVABLE. A non-zero
-value written to hugepages_treat_as_movable allows huge pages to be allocated
-from ZONE_MOVABLE.
+This parameter controls whether hugepages can be allocated from ZONE_MOVABLE.
+If set to non-zero, hugepages can be allocated from ZONE_MOVABLE.
+ZONE_MOVABLE is created when the kernel boot parameter kernelcore= is
+specified, so this parameter has no effect if used without kernelcore=.

-Once enabled, the ZONE_MOVABLE is treated as an area of memory the huge
-pages pool can easily grow or shrink within. Assuming that applications are
-not running that mlock() a lot of memory, it is likely the huge pages pool
-can grow to the size of ZONE_MOVABLE by repeatedly entering the desired value
-into nr_hugepages and triggering page reclaim.
+Hugepage migration is now available in some situations, depending on the
+architecture and/or the hugepage size. If a hugepage supports migration,
+allocation from ZONE_MOVABLE is always enabled for it regardless of the
+value of this parameter.
+In other words, this parameter affects only non-migratable hugepages.
+
+Assuming that hugepages are not migratable in your system, one use case of
+this parameter is to make the hugepage pool more extensible by enabling
+allocation from ZONE_MOVABLE: on ZONE_MOVABLE, page reclaim, migration and
+compaction are more effective, so contiguous memory is more likely to be
+found. Note that using ZONE_MOVABLE for non-migratable hugepages can harm
+other features such as memory hot-remove (which expects that memory blocks
+on ZONE_MOVABLE are always removable), so this is a trade-off that users
+are responsible for.

 ==============================================================
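As a rough illustration of the hugepages_treat_as_movable trade-off described
above (a sketch only; it assumes the system was booted with a kernelcore=
value so that ZONE_MOVABLE exists, and 512 is an arbitrary pool size):

    # echo 1 > /proc/sys/vm/hugepages_treat_as_movable
    # echo 512 > /proc/sys/vm/nr_hugepages

Growing the pool this way draws non-migratable huge pages from ZONE_MOVABLE,
which can later block memory hot-remove, as noted above.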
@@ -510,7 +518,7 @@ Specify "[Dd]efault" to request automatic configuration.
 Autoconfiguration will select "node" order in following case.
 (1) if the DMA zone does not exist or
 (2) if the DMA zone comprises greater than 50% of the available memory or
-(3) if any node's DMA zone comprises greater than 60% of its local memory and
+(3) if any node's DMA zone comprises greater than 70% of its local memory and
     the amount of local memory is big enough.

 Otherwise, "zone" order will be selected. Default order is recommended unless
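The hunk above adjusts the heuristic used when automatic zonelist ordering is
requested; assuming this is the numa_zonelist_order section of vm.txt (the
surrounding context suggests so, but the name is not visible in this hunk),
the automatic behaviour is selected with:

    # echo default > /proc/sys/vm/numa_zonelist_order

With that setting, "node" order is chosen when the DMA zone is missing or
disproportionately large, per the conditions listed above.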