summaryrefslogtreecommitdiff
path: root/drivers/cpufreq/cpufreq_ondemand.c
AgeCommit message (Collapse)Author
2015-12-09cpufreq: ondemand: update update_sampling_rate() to make it more efficientViresh Kumar
Currently update_sampling_rate() runs over each online CPU and cancels/queues timers on all policy->cpus every time. This should be done just once for any cpu belonging to a policy. Create a cpumask and keep on clearing it as and when we process policies, so that we don't have to traverse through all CPUs of the same policy. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-09cpufreq: governor: replace per-CPU delayed work with timersViresh Kumar
cpufreq governors evaluate load at sampling rate and based on that they update frequency for a group of CPUs belonging to the same cpufreq policy. This is required to be done in a single thread for all policy->cpus, but because we don't want to wakeup idle CPUs to do just that, we use deferrable work for this. If we would have used a single delayed deferrable work for the entire policy, there were chances that the CPU required to run the handler can be in idle and we might end up not changing the frequency for the entire group with load variations. And so we were forced to keep per-cpu works, and only the one that expires first need to do the real work and others are rescheduled for next sampling time. We have been using the more complex solution until now, where we used a delayed deferrable work for this, which is a combination of a timer and a work. This could be made lightweight by keeping per-cpu deferred timers with a single work item, which is scheduled by the first timer that expires. This patch does just that and here are important changes: - The timer handler will run in irq context and so we need to use a spin_lock instead of the timer_mutex. And so a separate timer_lock is created. This also makes the use of the mutex and lock quite clear, as we know what exactly they are protecting. - A new field 'skip_work' is added to track when the timer handlers can queue a work. More comments present in code. Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Ashwin Chaugule <ashwin.chaugule@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-07cpufreq: governor: Pass policy as argument to ->gov_dbs_timer()Viresh Kumar
Pass 'policy' as argument to ->gov_dbs_timer() instead of cdbs and dbs_data. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-07cpufreq: ondemand: Work is guaranteed to be pendingViresh Kumar
We are guaranteed to have works scheduled for policy->cpus, as the policy isn't stopped yet. And so there is no need to check that again. Drop it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-07cpufreq: ondemand: Update sampling rate only for concerned policiesViresh Kumar
We are comparing policy->governor against cpufreq_gov_ondemand to make sure that we update sampling rate only for the concerned CPUs. But that isn't enough. In case of governor_per_policy, there can be multiple instances of ondemand governor and we will always end up updating all of them with current code. What we rather need to do, is to compare dbs_data with poilcy->governor_data, which will match only for the policies governed by dbs_data. This code is also racy as the governor might be getting stopped at that time and we may end up scheduling work for a policy, which we have just disabled. Fix that by protecting the entire function with &od_dbs_cdata.mutex, which will prevent against races with policy START/STOP/etc. After these locks are in place, we can safely get the policy via per-cpu dbs_info. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-10-28cpufreq: ondemand: Drop unnecessary locks from update_sampling_rate()Viresh Kumar
'timer_mutex' is required to sync work-handlers of policy->cpus. update_sampling_rate() is just canceling the works and queuing them again. This isn't protecting anything at all in update_sampling_rate() and is not gonna be of any use. Even if a work-handler is already running for a CPU, cancel_delayed_work_sync() will wait for it to finish. Drop these unnecessary locks. Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-07-21cpufreq: governor: split out common part of {cs|od}_dbs_timer()Viresh Kumar
Some part of cs_dbs_timer() and od_dbs_timer() is exactly same and is unnecessarily duplicated. Create the real work-handler in cpufreq_governor.c and put the common code in this routine (dbs_timer()). Shouldn't make any functional change. Reviewed-and-tested-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-07-21cpufreq: governor: Keep single copy of information common to policy->cpusViresh Kumar
Some information is common to all CPUs belonging to a policy, but are kept on per-cpu basis. Lets keep that in another structure common to all policy->cpus. That will make updates/reads to that less complex and less error prone. The memory for cpu_common_dbs_info is allocated/freed at INIT/EXIT, so that it we don't reallocate it for STOP/START sequence. It will be also be used (in next patch) while the governor is stopped and so must not be freed that early. Reviewed-and-tested-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-07-17cpufreq: governor: rename cur_policy as policyViresh Kumar
Just call it 'policy', cur_policy is unnecessarily long and doesn't have any special meaning. Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-07-17cpufreq: governor: Name delayed-work as dworkViresh Kumar
Delayed work was named as 'work' and to access work within it we do work.work. Not much readable. Rename delayed_work as 'dwork'. Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-06-15cpufreq: governor: Serialize governor callbacksViresh Kumar
There are several races reported in cpufreq core around governors (only ondemand and conservative) by different people. There are at least two race scenarios present in governor code: (a) Concurrent access/updates of governor internal structures. It is possible that fields such as 'dbs_data->usage_count', etc. are accessed simultaneously for different policies using same governor structure (i.e. CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag unset). And because of this we can dereference bad pointers. For example consider a system with two CPUs with separate 'struct cpufreq_policy' instances. CPU0 governor: ondemand and CPU1: powersave. CPU0 switching to powersave and CPU1 to ondemand: CPU0 CPU1 store* store* cpufreq_governor_exit() cpufreq_governor_init() dbs_data = cdata->gdbs_data; if (!--dbs_data->usage_count) kfree(dbs_data); dbs_data->usage_count++; *Bad pointer dereference* There are other races possible between EXIT and START/STOP/LIMIT as well. Its really complicated. (b) Switching governor state in bad sequence: For example trying to switch a governor to START state, when the governor is in EXIT state. There are some checks present in __cpufreq_governor() but they aren't sufficient as they compare events against 'policy->governor_enabled', where as we need to take governor's state into account, which can be used by multiple policies. These two issues need to be solved separately and the responsibility should be properly divided between cpufreq and governor core. The first problem is more about the governor core, as it needs to protect its structures properly. And the second problem should be fixed in cpufreq core instead of governor, as its all about sequence of events. This patch is trying to solve only the first problem. There are two types of data we need to protect, - 'struct common_dbs_data': No matter what, there is going to be a single copy of this per governor. - 'struct dbs_data': With CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag set, we will have per-policy copy of this data, otherwise a single copy. Because of such complexities, the mutex present in 'struct dbs_data' is insufficient to solve our problem. For example we need to protect fetching of 'dbs_data' from different structures at the beginning of cpufreq_governor_dbs(), to make sure it isn't currently being updated. This can be fixed if we can guarantee serialization of event parsing code for an individual governor. This is best solved with a mutex per governor, and the placeholder for that is 'struct common_dbs_data'. And so this patch moves the mutex from 'struct dbs_data' to 'struct common_dbs_data' and takes it at the beginning and drops it at the end of cpufreq_governor_dbs(). Tested with and without following configuration options: CONFIG_LOCKDEP_SUPPORT=y CONFIG_DEBUG_RT_MUTEXES=y CONFIG_DEBUG_PI_LIST=y CONFIG_DEBUG_SPINLOCK=y CONFIG_DEBUG_MUTEXES=y CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_PROVE_LOCKING=y CONFIG_LOCKDEP=y CONFIG_DEBUG_ATOMIC_SLEEP=y Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-06-15cpufreq: governor: register notifier from cs_init()Viresh Kumar
Notifiers are required only for conservative governor and the common governor code is unnecessarily polluted with that. Handle that from cs_init/exit() instead of cpufreq_governor_dbs(). Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-21cpufreq: ondemand: Eliminate the deadband effectStratos Karafotis
Currently, ondemand calculates the target frequency proportional to load using the formula: Target frequency = C * load where C = policy->cpuinfo.max_freq / 100 Though, in many cases, the minimum available frequency is pretty high and the above calculation introduces a dead band from load 0 to 100 * policy->cpuinfo.min_freq / policy->cpuinfo.max_freq where the target frequency is always calculated to less than policy->cpuinfo.min_freq and the minimum frequency is selected. For example: on Intel i7-3770 @ 3.4GHz the policy->cpuinfo.min_freq = 1600000 and the policy->cpuinfo.max_freq = 3400000 (without turbo). Thus, the CPU starts to scale up at a load above 47. On quad core 1500MHz Krait the policy->cpuinfo.min_freq = 384000 and the policy->cpuinfo.max_freq = 1512000. Thus, the CPU starts to scale at load above 25. Change the calculation of target frequency to eliminate the above effect using the formula: Target frequency = A + B * load where A = policy->cpuinfo.min_freq and B = (policy->cpuinfo.max_freq - policy->cpuinfo->min_freq) / 100 This will map load values 0 to 100 linearly to cpuinfo.min_freq to cpuinfo.max_freq. Also, use the CPUFREQ_RELATION_C in __cpufreq_driver_target to select the closest frequency in frequency_table. This is necessary to avoid selection of minimum frequency only when load equals to 0. It will also help for selection of frequencies using a more 'fair' criterion. Tables below show the difference in selected frequency for specific values of load without and with this patch. On Intel i7-3770 @ 3.40GHz: Without With Load Target Selected Target Selected 0 0 1600000 1600000 1600000 5 170050 1600000 1690050 1700000 10 340100 1600000 1780100 1700000 15 510150 1600000 1870150 1900000 20 680200 1600000 1960200 2000000 25 850250 1600000 2050250 2100000 30 1020300 1600000 2140300 2100000 35 1190350 1600000 2230350 2200000 40 1360400 1600000 2320400 2400000 45 1530450 1600000 2410450 2400000 50 1700500 1900000 2500500 2500000 55 1870550 1900000 2590550 2600000 60 2040600 2100000 2680600 2600000 65 2210650 2400000 2770650 2800000 70 2380700 2400000 2860700 2800000 75 2550750 2600000 2950750 3000000 80 2720800 2800000 3040800 3000000 85 2890850 2900000 3130850 3100000 90 3060900 3100000 3220900 3300000 95 3230950 3300000 3310950 3300000 100 3401000 3401000 3401000 3401000 On ARM quad core 1500MHz Krait: Without With Load Target Selected Target Selected 0 0 384000 384000 384000 5 75600 384000 440400 486000 10 151200 384000 496800 486000 15 226800 384000 553200 594000 20 302400 384000 609600 594000 25 378000 384000 666000 702000 30 453600 486000 722400 702000 35 529200 594000 778800 810000 40 604800 702000 835200 810000 45 680400 702000 891600 918000 50 756000 810000 948000 918000 55 831600 918000 1004400 1026000 60 907200 918000 1060800 1026000 65 982800 1026000 1117200 1134000 70 1058400 1134000 1173600 1134000 75 1134000 1134000 1230000 1242000 80 1209600 1242000 1286400 1242000 85 1285200 1350000 1342800 1350000 90 1360800 1458000 1399200 1350000 95 1436400 1458000 1455600 1458000 100 1512000 1512000 1512000 1512000 Tested on Intel i7-3770 CPU @ 3.40GHz and on ARM quad core 1500MHz Krait (Android smartphone). Benchmarks on Intel i7 shows a performance improvement on low and medium work loads with lower power consumption. Specifics: Phoronix Linux Kernel Compilation 3.1: Time: -0.40%, energy: -0.07% Phoronix Apache: Time: -4.98%, energy: -2.35% Phoronix FFMPEG: Time: -6.29%, energy: -4.02% Also, running mp3 decoding (very low load) shows no differences with and without this patch. Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-11-01cpufreq: ondemand: Remove redundant return statementStratos Karafotis
After commit dfa5bb622555 (cpufreq: ondemand: Change the calculation of target frequency), this return statement is no longer needed. Reported-by: Henrik Nilsson <Karl.Henrik.Nilsson@gmail.com> Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-28cpufreq: governors: Remove duplicate check of target freq in supported rangeStratos Karafotis
Function __cpufreq_driver_target() checks if target_freq is within policy->min and policy->max range. generic_powersave_bias_target() also checks if target_freq is valid via a cpufreq_frequency_table_target() call. So, drop the unnecessary duplicate check in *_check_cpu(). Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-14Merge back earlier 'pm-cpufreq' materialRafael J. Wysocki
2013-08-07cpufreq: Use sizeof(*ptr) convetion for computing sizesViresh Kumar
Chapter 14 of Documentation/CodingStyle says: The preferred form for passing a size of a struct is the following: p = kmalloc(sizeof(*p), ...); The alternative form where struct name is spelled out hurts readability and introduces an opportunity for a bug when the pointer variable type is changed but the corresponding sizeof that is passed to a memory allocator is not. This wasn't followed consistently in drivers/cpufreq, let's make it more consistent by always following this rule. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-07cpufreq: Give consistent names to cpufreq_policy objectsViresh Kumar
They are called policy, cur_policy, new_policy, data, etc. Just call them policy wherever possible. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-07cpufreq: Clean up header files included in the coreViresh Kumar
This patch addresses the following issues in the header files in the cpufreq core: - Include headers in ascending order, so that we don't add same many times by mistake. - <asm/> must be included after <linux/>, so that they override whatever they need to. - Remove unnecessary includes. - Don't include files already included by cpufreq.h or cpufreq_governor.h. [rjw: Changelog] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-07cpufreq: rename ignore_nice as ignore_nice_loadViresh Kumar
This sysfs file was called ignore_nice_load earlier and commit 4d5dcc4 (cpufreq: governor: Implement per policy instances of governors) changed its name to ignore_nice by mistake. Lets get it renamed back to its original name. Reported-by: Martin von Gagern <Martin.vGagern@gmx.net> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: 3.10+ <stable@vger.kernel.org> # 3.10+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-07-26cpufreq: ondemand: Change the calculation of target frequencyStratos Karafotis
The ondemand governor calculates load in terms of frequency and increases it only if load_freq is greater than up_threshold multiplied by the current or average frequency. This appears to produce oscillations of frequency between min and max because, for example, a relatively small load can easily saturate minimum frequency and lead the CPU to the max. Then, it will decrease back to the min due to small load_freq. Change the calculation method of load and target frequency on the basis of the following two observations: - Load computation should not depend on the current or average measured frequency. For example, absolute load of 80% at 100MHz is not necessarily equivalent to 8% at 1000MHz in the next sampling interval. - It should be possible to increase the target frequency to any value present in the frequency table proportional to the absolute load, rather than to the max only, so that: Target frequency = C * load where we take C = policy->cpuinfo.max_freq / 100. Tested on Intel i7-3770 CPU @ 3.40GHz and on Quad core 1500MHz Krait. Phoronix benchmark of Linux Kernel Compilation 3.1 test shows an increase ~1.5% in performance. cpufreq_stats (time_in_state) shows that middle frequencies are used more, with this patch. Highest and lowest frequencies were used less by ~9%. [rjw: We have run multiple other tests on kernels with this change applied and in the vast majority of cases it turns out that the resulting performance improvement also leads to reduced consumption of energy. The change is additionally justified by the overall simplification of the code in question.] Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-06-25cpufreq: fix NULL pointer deference at od_set_powersave_bias()Jacob Shin
When initializing the default powersave_bias value, we need to first make sure that this policy is running the ondemand governor. Reported-and-tested-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Jacob Shin <jacob.shin@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-05-13cpufreq, ondemand: Remove leftover debug lineBorislav Petkov
I don't see how the virtual address of the tuners pointer would be of any help to anyone so remove it. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-10cpufreq: ondemand: allow custom powersave_bias_target handler to be registeredJacob Shin
This allows for another [arch specific] driver to hook into existing powersave bias function of the ondemand governor. i.e. This allows AMD specific powersave bias function (in a separate AMD specific driver) to aid ondemand governor's frequency transition decisions. Signed-off-by: Jacob Shin <jacob.shin@amd.com> Acked-by: Thomas Renninger <trenn@suse.de> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-01cpufreq: governors: Calculate iowait time only when necessaryStratos Karafotis
Currently we always calculate the CPU iowait time and add it to idle time. If we are in ondemand and we use io_is_busy, we re-calculate iowait time and we subtract it from idle time. With this patch iowait time is calculated only when necessary avoiding the double call to get_cpu_iowait_time_us. We use a parameter in function get_cpu_idle_time to distinguish when the iowait time will be added to idle time or not, without the need of keeping the prev_io_wait. Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Acked-by: Viresh Kumar <viresh.kumar@linaro.,org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-01cpufreq: governors: Avoid unnecessary per cpu timer interruptsViresh Kumar
Following patch has introduced per cpu timers or works for ondemand and conservative governors. commit 2abfa876f1117b0ab45f191fb1f82c41b1cbc8fe Author: Rickard Andersson <rickard.andersson@stericsson.com> Date: Thu Dec 27 14:55:38 2012 +0000 cpufreq: handle SW coordinated CPUs This causes additional unnecessary interrupts on all cpus when the load is recently evaluated by any other cpu. i.e. When load is recently evaluated by cpu x, we don't really need any other cpu to evaluate this load again for the next sampling_rate time. Some sort of code is present to avoid that but we are still getting timer interrupts for all cpus. A good way of avoiding this would be to modify delays for all cpus (policy->cpus) whenever any cpu has evaluated load. This patch does this change and some related code cleanup. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-01cpufreq: ondemand: Don't update sample_type if we don't evaluate load againViresh Kumar
Because we have per cpu timer now, we check if we need to evaluate load again or not (In case it is recently evaluated). Here the 2nd cpu which got timer interrupt updates core_dbs_info->sample_type irrespective of load evaluation is required or not. Which is wrong as the first cpu is dependent on this variable set to an older value. Moreover it would be best in this case to schedule 2nd cpu's timer to sampling_rate instead of freq_lo or hi as that must be managed by the other cpu. In case the other cpu idles in between then also we wouldn't loose much power. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-01cpufreq: governor: Implement per policy instances of governorsViresh Kumar
Currently, there can't be multiple instances of single governor_type. If we have a multi-package system, where we have multiple instances of struct policy (per package), we can't have multiple instances of same governor. i.e. We can't have multiple instances of ondemand governor for multiple packages. Governors directory in sysfs is created at /sys/devices/system/cpu/cpufreq/ governor-name/. Which again reflects that there can be only one instance of a governor_type in the system. This is a bottleneck for multicluster system, where we want different packages to use same governor type, but with different tunables. This patch uses the infrastructure provided by earlier patch and implements init/exit routines for ondemand and conservative governors. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-09cpufreq: ondemand: Fix typos in commentsStratos Karafotis
Fix some typos in comments. Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-09cpufreq: ondemand: Replace down_differential tuner with adj_up_thresholdStratos Karafotis
In order to avoid the calculation of up_threshold - down_differential every time that the frequency must be decreased, we replace the down_differential tuner with the adj_up_threshold which keeps the difference across multiple checks. Update the adj_up_threshold only when the up_theshold is also updated. Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-02cpufreq: governors: Remove code redundancy between governorsViresh Kumar
With the inclusion of following patches: 9f4eb10 cpufreq: conservative: call dbs_check_cpu only when necessary 772b4b1 cpufreq: ondemand: call dbs_check_cpu only when necessary code redundancy between the conservative and ondemand governors is introduced again, so get rid of it. [rjw: Changelog] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Fabio Baltieri <fabio.baltieri@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-02cpufreq: governors: fix misuse of cdbs.cpuFabio Baltieri
Fix governors code to set all cpu's cdbs->cpu to the the actual cpu id and use cur_policy->cpu istead of cdbs->cpu to track current governor's leader cpu. Reported-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Fabio Baltieri <fabio.baltieri@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-02cpufreq: governors: implement generic policy_is_sharedFabio Baltieri
Implement a generic helper function policy_is_shared() to replace the current dbs_sw_coordinated_cpus() at cpufreq level, so that it can be used by code other than cpufreq governors. Suggested-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Fabio Baltieri <fabio.baltieri@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-02cpufreq: ondemand: use all CPUs in update_sampling_rateFabio Baltieri
Modify update_sampling_rate() to check, and eventually immediately schedule, all CPU's do_dbs_timer delayed work. This is required in case of software coordinated CPUs, as we now have a separate delayed work for each CPU. Signed-off-by: Fabio Baltieri <fabio.baltieri@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-02cpufreq: ondemand: call dbs_check_cpu only when necessaryFabio Baltieri
Modify ondemand timer to not resample CPU utilization if recently sampled from another SW coordinated core. Signed-off-by: Fabio Baltieri <fabio.baltieri@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-02cpufreq: handle SW coordinated CPUsRickard Andersson
This patch fixes a bug that occurred when we had load on a secondary CPU and the primary CPU was sleeping. Only one sampling timer was spawned and it was spawned as a deferred timer on the primary CPU, so when a secondary CPU had a change in load this was not detected by the cpufreq governor (both ondemand and conservative). This patch make sure that deferred timers are run on all CPUs in the case of software controlled CPUs that run on the same frequency. Signed-off-by: Rickard Andersson <rickard.andersson@stericsson.com> Signed-off-by: Fabio Baltieri <fabio.baltieri@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-27cpufreq: ondemand: update sampling rate only on right CPUsFabio Baltieri
Fix cpufreq_gov_ondemand to skip CPU where another governor is used. The bug present itself as NULL pointer access on the mutex_lock() call, an can be reproduced on an SMP machine by setting the default governor to anything other than ondemand, setting a single CPU's governor to ondemand, then changing the sample rate by writing on: > /sys/devices/system/cpu/cpufreq/ondemand/sampling_rate Backtrace: Nov 26 17:36:54 balto kernel: [ 839.585241] BUG: unable to handle kernel NULL pointer dereference at (null) Nov 26 17:36:54 balto kernel: [ 839.585311] IP: [<ffffffff8174e082>] __mutex_lock_slowpath+0xb2/0x170 [snip] Nov 26 17:36:54 balto kernel: [ 839.587005] Call Trace: Nov 26 17:36:54 balto kernel: [ 839.587030] [<ffffffff8174da82>] mutex_lock+0x22/0x40 Nov 26 17:36:54 balto kernel: [ 839.587067] [<ffffffff81610b8f>] store_sampling_rate+0xbf/0x150 Nov 26 17:36:54 balto kernel: [ 839.587110] [<ffffffff81031e9c>] ? __do_page_fault+0x1cc/0x4c0 Nov 26 17:36:54 balto kernel: [ 839.587153] [<ffffffff813309bf>] kobj_attr_store+0xf/0x20 Nov 26 17:36:54 balto kernel: [ 839.587192] [<ffffffff811bb62d>] sysfs_write_file+0xcd/0x140 Nov 26 17:36:54 balto kernel: [ 839.587234] [<ffffffff8114c12c>] vfs_write+0xac/0x180 Nov 26 17:36:54 balto kernel: [ 839.587271] [<ffffffff8114c472>] sys_write+0x52/0xa0 Nov 26 17:36:54 balto kernel: [ 839.587306] [<ffffffff810321ce>] ? do_page_fault+0xe/0x10 Nov 26 17:36:54 balto kernel: [ 839.587345] [<ffffffff81751202>] system_call_fastpath+0x16/0x1b Signed-off-by: Fabio Baltieri <fabio.baltieri@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-23cpufreq: ondemand: fix wrong delay sampling rateFabio Baltieri
Restore the correct delay value for ondemand's od_dbs_timer, as it was changed erroneously in commit 83f0e55 (cpufreq: governors: remove redundant code). Signed-off-by: Fabio Baltieri <fabio.baltieri@linaro.org> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15cpufreq: governors: remove redundant codeViresh Kumar
Initially ondemand governor was written and then using its code conservative governor is written. It used a lot of code from ondemand governor, but copy of code was created instead of using the same routines from both governors. Which increased code redundancy, which is difficult to manage. This patch is an attempt to move common part of both the governors to cpufreq_governor.c file to come over above mentioned issues. This shouldn't change anything from functionality point of view. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15cpufreq: Move common part from governors to separate file, v2viresh kumar
Multiple cpufreq governers have defined similar get_cpu_idle_time_***() routines. These routines must be moved to some common place, so that all governors can use them. So moving them to cpufreq_governor.c, which seems to be a better place for keeping these routines. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-10-02Merge tag 'pm-for-3.7-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael J Wysocki: - Improved system suspend/resume and runtime PM handling for the SH TMU, CMT and MTU2 clock event devices (also used by ARM/shmobile). - Generic PM domains framework extensions related to cpuidle support and domain objects lookup using names. - ARM/shmobile power management updates including improved support for the SH7372's A4S power domain containing the CPU core. - cpufreq changes related to AMD CPUs support from Matthew Garrett, Andre Przywara and Borislav Petkov. - cpu0 cpufreq driver from Shawn Guo. - cpufreq governor fixes related to the relaxing of limit from Michal Pecio. - OMAP cpufreq updates from Axel Lin and Richard Zhao. - cpuidle ladder governor fixes related to the disabling of states from Carsten Emde and me. - Runtime PM core updates related to the interactions with the system suspend core from Alan Stern and Kevin Hilman. - Wakeup sources modification allowing more helper functions to be called from interrupt context from John Stultz and additional diagnostic code from Todd Poynor. - System suspend error code path fix from Feng Hong. Fixed up conflicts in cpufreq/powernow-k8 that stemmed from the workqueue fixes conflicting fairly badly with the removal of support for hardware P-state chips. The changes were independent but somewhat intertwined. * tag 'pm-for-3.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (76 commits) Revert "PM QoS: Use spinlock in the per-device PM QoS constraints code" PM / Runtime: let rpm_resume() succeed if RPM_ACTIVE, even when disabled, v2 cpuidle: rename function name "__cpuidle_register_driver", v2 cpufreq: OMAP: Check IS_ERR() instead of NULL for omap_device_get_by_hwmod_name cpuidle: remove some empty lines PM: Prevent runtime suspend during system resume PM QoS: Use spinlock in the per-device PM QoS constraints code PM / Sleep: use resume event when call dpm_resume_early cpuidle / ACPI : move cpuidle_device field out of the acpi_processor_power structure ACPI / processor: remove pointless variable initialization ACPI / processor: remove unused function parameter cpufreq: OMAP: remove loops_per_jiffy recalculate for smp sections: fix section conflicts in drivers/cpufreq cpufreq: conservative: update frequency when limits are relaxed cpufreq / ondemand: update frequency when limits are relaxed properly __init-annotate pm_sysrq_init() cpufreq: Add a generic cpufreq-cpu0 driver PM / OPP: Initialize OPP table from device tree ARM: add cpufreq transiton notifier to adjust loops_per_jiffy for smp cpufreq: Remove support for hardware P-state chips from powernow-k8 ...
2012-09-14cpufreq / ondemand: update frequency when limits are relaxedMichal Pecio
Reevaluate CPU load and update frequency immediately whenever limits are changed. Currently ondemand doesn't do that when limits are relaxed, wasting power on systems with relatively low sampling rate. Signed-off-by: Michal Pecio <mpecio@nvidia.com> Reviewed-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-08-21workqueue: make deferrable delayed_work initializer names consistentTejun Heo
Initalizers for deferrable delayed_work are confused. * __DEFERRED_WORK_INITIALIZER() * DECLARE_DEFERRED_WORK() * INIT_DELAYED_WORK_DEFERRABLE() Rename them to * __DEFERRABLE_WORK_INITIALIZER() * DECLARE_DEFERRABLE_WORK() * INIT_DEFERRABLE_WORK() This patch doesn't cause any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-02-29[CPUFREQ] CPUfreq ondemand: update sampling rate without waiting for next ↵MyungJoo Ham
sampling When a new sampling rate is shorter than the current one, (e.g., 1 sec --> 10 ms) regardless how short the new one is, the current ondemand mechanism wait for the previously set timer to be expired. For example, if the user has just expressed that the sampling rate should be 10 ms from now and the previous was 1000 ms, the new rate may become effective 999 ms later, which could be not acceptable for the user if the user has intended to speed up sampling because the system is expected to react to CPU load fluctuation quickly from __now__. In order to address this issue, we need to cancel the previously set timer (schedule_delayed_work) and reset the timer if resetting timer is expected to trigger the delayed_work ealier. Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Dave Jones <davej@redhat.com>
2012-01-11Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq: (23 commits) [CPUFREQ] EXYNOS: Removed useless headers and codes [CPUFREQ] EXYNOS: Make EXYNOS common cpufreq driver [CPUFREQ] powernow-k8: Update copyright, maintainer and documentation information [CPUFREQ] powernow-k8: Fix indexing issue [CPUFREQ] powernow-k8: Avoid Pstate MSR accesses on systems supporting CPB [CPUFREQ] update lpj only if frequency has changed [CPUFREQ] cpufreq:userspace: fix cpu_cur_freq updation [CPUFREQ] Remove wall variable from cpufreq_gov_dbs_init() [CPUFREQ] EXYNOS4210: cpufreq code is changed for stable working [CPUFREQ] EXYNOS4210: Update frequency table for cpu divider [CPUFREQ] EXYNOS4210: Remove code about bus on cpufreq [CPUFREQ] s3c64xx: Use pr_fmt() for consistent log messages cpufreq: OMAP: fixup for omap_device changes, include <linux/module.h> cpufreq: OMAP: fix freq_table leak cpufreq: OMAP: put clk if cpu_init failed cpufreq: OMAP: only supports OPP library cpufreq: OMAP: dont support !freq_table cpufreq: OMAP: deny initialization if no mpudev cpufreq: OMAP: move clk name decision to init cpufreq: OMAP: notify even with bad boot frequency ...
2011-12-19Merge branch 'sched/core' of ↵Martin Schwidefsky
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into cputime-tip Conflicts: drivers/cpufreq/cpufreq_conservative.c drivers/cpufreq/cpufreq_ondemand.c drivers/macintosh/rack-meter.c fs/proc/stat.c fs/proc/uptime.c kernel/sched/core.c
2011-12-15[S390] cputime: add sparse checking and cleanupMartin Schwidefsky
Make cputime_t and cputime64_t nocast to enable sparse checking to detect incorrect use of cputime. Drop the cputime macros for simple scalar operations. The conversion macros are still needed. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-09[CPUFREQ] Remove wall variable from cpufreq_gov_dbs_init()Kamalesh Babulal
CPUFREQ Remove wall variable from cpufreq_gov_dbs_init() Remove wall variable from cpufreq_gov_dbs_init() as get_cpu_idle_time_us() no longer updates the last_update_time unconditionally. Passing non-NULL last_update_time address will result in accounting additional idle time with update_ts_time_stats() before returning idle_sleeptime. Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Dave Jones <davej@redhat.com> -- drivers/cpufreq/cpufreq_ondemand.c | 3 +-- 1 files changed, 1 insertions(+), 2 deletions(-)
2011-12-06sched/accounting: Change cpustat fields to an arrayGlauber Costa
This patch changes fields in cpustat from a structure, to an u64 array. Math gets easier, and the code is more flexible. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul Tuner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1322498719-2255-2-git-send-email-glommer@parallels.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-26Merge branch 'timers-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) time, s390: Get rid of compile warning dw_apb_timer: constify clocksource name time: Cleanup old CONFIG_GENERIC_TIME references that snuck in time: Change jiffies_to_clock_t() argument type to unsigned long alarmtimers: Fix error handling clocksource: Make watchdog reset lockless posix-cpu-timers: Cure SMP accounting oddities s390: Use direct ktime path for s390 clockevent device clockevents: Add direct ktime programming function clockevents: Make minimum delay adjustments configurable nohz: Remove "Switched to NOHz mode" debugging messages proc: Consider NO_HZ when printing idle and iowait times nohz: Make idle/iowait counter update conditional nohz: Fix update_ts_time_stat idle accounting cputime: Clean up cputime_to_usecs and usecs_to_cputime macros alarmtimers: Rework RTC device selection using class interface alarmtimers: Add try_to_cancel functionality alarmtimers: Add more refined alarm state tracking alarmtimers: Remove period from alarm structure alarmtimers: Remove interval cap limit hack ...