Introduction
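uclamp (utilization clamping) is a feature introduced in kernel 5.3; cgroup support followed in kernel 5.4. By setting clamp values on a task's util, the scheduler can boost (or cap) the task's util, achieving an effect similar to Android's SchedTune.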
I. Data Structures
1. uclamp in the rq
Figure 1-1: uclamp in the rq
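struct rq {
	...
#ifdef CONFIG_UCLAMP_TASK
	/* Utilization clamp values based on CPU's RUNNABLE tasks */
	struct uclamp_rq	uclamp[UCLAMP_CNT] ____cacheline_aligned;
	unsigned int		uclamp_flags;
#define UCLAMP_FLAG_IDLE 0x01
#endif
	...
};

struct uclamp_rq {
	unsigned int value;
	struct uclamp_bucket bucket[UCLAMP_BUCKETS];
};

struct uclamp_bucket {
	unsigned long value : bits_per(SCHED_CAPACITY_SCALE);
	unsigned long tasks : BITS_PER_LONG - bits_per(SCHED_CAPACITY_SCALE);
};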
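The rq embeds two uclamp groups, one for the minimum clamp (uclamp[UCLAMP_MIN]) and one for the maximum clamp (uclamp[UCLAMP_MAX]), to clamp the CPU's util. In each group, value is the clamp value currently in effect on the rq. Each group contains 5 buckets by default; CONFIG_UCLAMP_BUCKETS_COUNT can configure a different number of buckets (5 to 20). uclamp_flags is zeroed when uclamp is initialized in init_uclamp(). It currently has only one flag, UCLAMP_FLAG_IDLE, which records whether the CPU still has tasks running on it.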
Each bucket covers a range of util values. With the default 5 buckets, each bucket spans 20% of the CPU's maximum capacity: SCHED_CAPACITY_SCALE/UCLAMP_BUCKETS_COUNT, i.e. 1024/5 (rounded to the closest integer, 205).
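Tasks on the rq are placed into buckets according to their clamp values. For example, if taskA runs on cpu0 and requests 25% util, it is placed into bucket[1]: bucket[1]::value = 25% records the effective util value, and the tasks count is incremented, meaning one more task is accounted in this bucket. If taskB, requesting 35% util, is then scheduled onto cpu0 as well, bucket[1]::value becomes 35% and tasks is incremented again. From then on taskA benefits from taskB's 35% util until taskA leaves the rq. If letting taskA inherit taskB's higher boost (here 10% more) is unacceptable for the system, for example because power consumption rises noticeably, the number of buckets can be increased: each bucket then covers a narrower util range, which improves the precision of bucket util tracking at the cost of more memory for the buckets. When a bucket has no tasks, its value falls back to the minimum of the bucket's range; for example, when bucket[1] is empty, bucket[1]::value = 20%.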
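The bucket arithmetic above can be checked with a small userspace model. This is a sketch under assumptions: CAPACITY_SCALE and the helpers bucket_delta(), bucket_id(), bucket_base_value() and percent_to_capacity() are hypothetical stand-ins. The rounding mirrors the kernel's DIV_ROUND_CLOSEST(SCHED_CAPACITY_SCALE, UCLAMP_BUCKETS), while capping the index at the last bucket is a guard added in this sketch for bucket counts where 1024/delta would run past the array.

```c
/* Model of SCHED_CAPACITY_SCALE. */
#define CAPACITY_SCALE 1024u

/* Round-to-closest unsigned division, like DIV_ROUND_CLOSEST. */
static unsigned int div_round_closest(unsigned int n, unsigned int d)
{
	return (n + d / 2) / d;
}

/* Width of one bucket in capacity units (205 with 5 buckets). */
static unsigned int bucket_delta(unsigned int nr_buckets)
{
	return div_round_closest(CAPACITY_SCALE, nr_buckets);
}

/* Bucket index of a clamp value, capped at the last bucket (sketch guard). */
static unsigned int bucket_id(unsigned int clamp_value, unsigned int nr_buckets)
{
	unsigned int id = clamp_value / bucket_delta(nr_buckets);

	return id < nr_buckets - 1 ? id : nr_buckets - 1;
}

/* Lowest value represented by the bucket that holds clamp_value. */
static unsigned int bucket_base_value(unsigned int clamp_value, unsigned int nr_buckets)
{
	return bucket_delta(nr_buckets) * bucket_id(clamp_value, nr_buckets);
}

/* Convert a whole percentage to capacity units (truncating). */
static unsigned int percent_to_capacity(unsigned int pct)
{
	return pct * CAPACITY_SCALE / 100;
}
```

With 5 buckets, 25% (256) and 35% (358) both map to bucket[1], whose base value is 205 (about 20%), which is why taskA can be over-boosted to taskB's 35%.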
2. uclamp in the task
Figure 1-2: uclamp in the task
struct task_struct {
	...
#ifdef CONFIG_UCLAMP_TASK
	/* Clamp values requested for a scheduling entity */
	struct uclamp_se		uclamp_req[UCLAMP_CNT];	/* (1) */
	/* Effective clamp values used for a scheduling entity */
	struct uclamp_se		uclamp[UCLAMP_CNT];	/* (2) */
#endif
	...
};
struct uclamp_se {
	unsigned int value		: bits_per(SCHED_CAPACITY_SCALE);
	unsigned int bucket_id		: bits_per(UCLAMP_BUCKETS);
	unsigned int active		: 1;
	unsigned int user_defined	: 1;
};
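value: the clamp value of the scheduling entity; the maximum is 1024, so it takes 11 bits.
bucket_id: the id of the bucket matching the clamp value; by default each uclamp_id has 5 buckets, so 3 bits are used.
active: the task has been placed into one of the rq's buckets: that bucket's tasks count includes this task, and bucket::value also applies to the task.
user_defined: marks the clamp value as requested by userspace, which overrides the default boost the system assigns to the task.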
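uclamp is embedded in task_struct as per-scheduling-entity data, in two kinds of uclamp_se:
(1) the request se, which records the requested clamp values;
(2) the active se, which records the effective clamp values.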
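A rough userspace model of how a uclamp_se is packed and filled may help. struct uclamp_se_model is a hypothetical copy of struct uclamp_se with the bit widths worked out for the defaults (bits_per(1024) = 11, bits_per(5) = 3), and uclamp_se_set_model() imitates what uclamp_se_set() does: store the value, derive the bucket id, and record user_defined, leaving active untouched.

```c
#include <stdbool.h>

#define NR_BUCKETS   5u
#define BUCKET_DELTA 205u /* DIV_ROUND_CLOSEST(1024, 5) */

/* Hypothetical model of struct uclamp_se with the default widths. */
struct uclamp_se_model {
	unsigned int value        : 11; /* 0..1024 */
	unsigned int bucket_id    : 3;  /* 0..4 */
	unsigned int active       : 1;  /* refcounted in an rq bucket? */
	unsigned int user_defined : 1;  /* requested by userspace? */
};

/* Like uclamp_se_set(): 'active' is only changed on rq (un)refcounting. */
static void uclamp_se_set_model(struct uclamp_se_model *uc_se,
				unsigned int value, bool user_defined)
{
	unsigned int id = value / BUCKET_DELTA;

	uc_se->value = value;
	uc_se->bucket_id = id < NR_BUCKETS - 1 ? id : NR_BUCKETS - 1;
	uc_se->user_defined = user_defined;
}
```

All four fields fit in 16 bits, so the whole entity occupies a single unsigned int.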
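3. Extending the CPU cgroup controller
The cpu cgroup supports uclamp: new attributes uclamp.{min,max} are added to cpu_cgrp_subsys to limit the boost util and the maximum util of all tasks in a group. The Android 5.4 kernel additionally adds uclamp.latency_sensitive.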
#ifdef CONFIG_UCLAMP_TASK_GROUP
	{
		.name = "uclamp.min",
		.flags = CFTYPE_NOT_ON_ROOT,
		.seq_show = cpu_uclamp_min_show,
		.write = cpu_uclamp_min_write,
	},
	{
		.name = "uclamp.max",
		.flags = CFTYPE_NOT_ON_ROOT,
		.seq_show = cpu_uclamp_max_show,
		.write = cpu_uclamp_max_write,
	},
	{
		.name = "uclamp.latency_sensitive",
		.flags = CFTYPE_NOT_ON_ROOT,
		.read_u64 = cpu_uclamp_ls_read_u64,
		.write_u64 = cpu_uclamp_ls_write_u64,
	},
#endif
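On Android, the util of group tasks is limited with a configuration such as the following. Group limits follow a parent-first rule: a child group cannot exceed the limits of its parent group.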
/dev/cpuctl/cpu.uclamp.min 1024
/dev/cpuctl/cpu.uclamp.max 0
/dev/cpuctl/cpu.uclamp.latency_sensitive 1000000
/dev/cpuctl/bg_non_interactive/cpu.uclamp.min 1024
/dev/cpuctl/bg_non_interactive/cpu.uclamp.max 0
/dev/cpuctl/bg_non_interactive/cpu.uclamp.latency_sensitive 1000000
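The parent-first rule can be sketched for a simplified two-level hierarchy. struct tg_model and update_eff_model() are hypothetical names, and the logic follows our reading of cpu_util_update_eff(): each effective clamp starts as the group's requested value, is capped by the parent's effective value, and the effective minimum is then capped by the effective maximum. The Android-specific uclamp.latency_sensitive attribute is not modeled.

```c
#define UCLAMP_MIN 0
#define UCLAMP_MAX 1
#define UCLAMP_CNT 2

/* Hypothetical per-cgroup clamp state. */
struct tg_model {
	unsigned int req[UCLAMP_CNT]; /* written via cpu.uclamp.{min,max} */
	unsigned int eff[UCLAMP_CNT]; /* what tasks in the group actually get */
};

/* Derive a child's effective clamps from its request and its parent. */
static void update_eff_model(struct tg_model *child, const struct tg_model *parent)
{
	for (int id = 0; id < UCLAMP_CNT; id++) {
		unsigned int v = child->req[id];

		/* A child never exceeds what its parent grants. */
		if (v > parent->eff[id])
			v = parent->eff[id];
		child->eff[id] = v;
	}
	/* The protection (min) is always capped by the limit (max). */
	if (child->eff[UCLAMP_MIN] > child->eff[UCLAMP_MAX])
		child->eff[UCLAMP_MIN] = child->eff[UCLAMP_MAX];
}
```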
II. Key Functions
1. uclamp initialization
static void __init init_uclamp(void)
{
	struct uclamp_se uc_max = {};
	enum uclamp_id clamp_id;
	int cpu;

	mutex_init(&uclamp_mutex);

	for_each_possible_cpu(cpu) {
		memset(&cpu_rq(cpu)->uclamp, 0, sizeof(struct uclamp_rq));	/* (1) */
		cpu_rq(cpu)->uclamp_flags = 0;
	}

	for_each_clamp_id(clamp_id) {
		uclamp_se_set(&init_task.uclamp_req[clamp_id],		/* (2) */
			      uclamp_none(clamp_id), false);
	}

	/* System defaults allow max clamp values for both indexes */
	uclamp_se_set(&uc_max, uclamp_none(UCLAMP_MAX), false);		/* (3) */
	for_each_clamp_id(clamp_id) {
		uclamp_default[clamp_id] = uc_max;			/* (4) */
#ifdef CONFIG_UCLAMP_TASK_GROUP						/* (5) */
		root_task_group.uclamp_req[clamp_id] = uc_max;
		root_task_group.uclamp[clamp_id] = uc_max;
#endif
	}
}
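init_uclamp() is the uclamp initialization function; it is called at the end of the scheduler initialization function sched_init().
(1) Initialize the uclamp of each CPU's rq. The size cleared here should be sizeof(struct uclamp_rq)*UCLAMP_CNT; as written, only sizeof(struct uclamp_rq), i.e. the UCLAMP_MIN entry, is cleared. A fix has been submitted upstream.
(2) Initialize init_task's request se: UCLAMP_MIN gets value=0 and UCLAMP_MAX gets value=1024, and the corresponding bucket ids are computed.
(3) Define a uc_max se, initialize it to value=1024 and compute its bucket; uc_max represents the maximum clamp value.
(4) Set uclamp_default for both UCLAMP_MIN and UCLAMP_MAX to uc_max. uclamp_default is the upper bound of every clamp se: any clamp se must be less than or equal to uclamp_default.
(5) Initialize root_task_group's request se and active se to uc_max.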
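The size problem in step (1) is easy to reproduce in userspace. In this sketch the *_model structures are simplified, hypothetical stand-ins for the kernel types (not their real layout); the point is only that sizeof(struct uclamp_rq) covers one array element, whereas sizeof(rq->uclamp) covers all UCLAMP_CNT of them.

```c
#include <string.h>

#define UCLAMP_CNT 2
#define NR_BUCKETS 5

/* Simplified stand-ins for struct uclamp_bucket / uclamp_rq / rq. */
struct bucket_model { unsigned long value; unsigned long tasks; };
struct uclamp_rq_model { unsigned int value; struct bucket_model bucket[NR_BUCKETS]; };
struct rq_model { struct uclamp_rq_model uclamp[UCLAMP_CNT]; unsigned int uclamp_flags; };

/* Form used in init_uclamp() above: clears only uclamp[0] (UCLAMP_MIN). */
static void clear_buggy(struct rq_model *rq)
{
	memset(&rq->uclamp, 0, sizeof(struct uclamp_rq_model));
}

/* Whole-array form: sizeof(rq->uclamp) == UCLAMP_CNT * element size. */
static void clear_fixed(struct rq_model *rq)
{
	memset(&rq->uclamp, 0, sizeof(rq->uclamp));
}
```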
2. uclamp of a forked task
static void uclamp_fork(struct task_struct *p)
{
	enum uclamp_id clamp_id;

	for_each_clamp_id(clamp_id)
		p->uclamp[clamp_id].active = false;			/* (1) */

	if (likely(!p->sched_reset_on_fork))				/* (2) */
		return;

	for_each_clamp_id(clamp_id) {
		unsigned int clamp_value = uclamp_none(clamp_id);

		/* By default, RT tasks always get 100% boost */
		if (unlikely(rt_task(p) && clamp_id == UCLAMP_MIN))	/* (3) */
			clamp_value = uclamp_none(UCLAMP_MAX);

		uclamp_se_set(&p->uclamp_req[clamp_id], clamp_value, false);	/* (4) */
	}
}
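uclamp_fork() is called from sched_fork().
(1) A newly forked task has not been placed into any bucket, so its uclamp::active is set to false.
(2) Usually a newly forked task does not request reset-on-fork (sched_reset_on_fork), in which case the function simply returns and the child keeps the clamps copied from its parent.
(3) If the new task requests reset-on-fork, its request se is reset. For an RT task, both UCLAMP_MIN and UCLAMP_MAX of the request se are set to value=1024, i.e. RT task boost is unrestricted and the task gets 100% boost.
(4) For other tasks, the request se is set to UCLAMP_MIN value=0 and UCLAMP_MAX value=1024.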
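The reset-on-fork decision can be modeled as a small function. struct task_model, uclamp_none_model() and uclamp_fork_model() are hypothetical; the sketch only mirrors the control flow described above: clear active, return early without reset-on-fork, otherwise restore the defaults with a 100% boost for RT tasks.

```c
#include <stdbool.h>

#define UCLAMP_MIN 0
#define UCLAMP_MAX 1
#define UCLAMP_CNT 2
#define CAPACITY_SCALE 1024u

/* Like uclamp_none(): "no clamping" means min = 0 and max = 1024. */
static unsigned int uclamp_none_model(int clamp_id)
{
	return clamp_id == UCLAMP_MIN ? 0 : CAPACITY_SCALE;
}

/* Hypothetical, simplified per-task state. */
struct task_model {
	bool rt;
	bool reset_on_fork;
	unsigned int req[UCLAMP_CNT];
	bool active[UCLAMP_CNT];
};

/* Same decision flow as uclamp_fork(). */
static void uclamp_fork_model(struct task_model *p)
{
	for (int id = 0; id < UCLAMP_CNT; id++)
		p->active[id] = false;      /* not in any rq bucket yet */

	if (!p->reset_on_fork)
		return;                     /* keep the clamps copied from the parent */

	for (int id = 0; id < UCLAMP_CNT; id++) {
		unsigned int v = uclamp_none_model(id);

		if (p->rt && id == UCLAMP_MIN)
			v = uclamp_none_model(UCLAMP_MAX); /* RT: 100% boost */
		p->req[id] = v;
	}
}
```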
3. Setting a task's uclamp from userspace: __setscheduler_uclamp()
__setscheduler_uclamp() is called from __sched_setscheduler().
static void __setscheduler_uclamp(struct task_struct *p,
				  const struct sched_attr *attr)
{
	enum uclamp_id clamp_id;

	/*
	 * On scheduling class change, reset to default clamps for tasks
	 * without a task-specific value.
	 */
	for_each_clamp_id(clamp_id) {					/* (1) */
		struct uclamp_se *uc_se = &p->uclamp_req[clamp_id];
		unsigned int clamp_value = uclamp_none(clamp_id);

		/* Keep using defined clamps across class changes */
		if (uc_se->user_defined)
			continue;

		/* By default, RT tasks always get 100% boost */
		if (unlikely(rt_task(p) && clamp_id == UCLAMP_MIN))
			clamp_value = uclamp_none(UCLAMP_MAX);

		uclamp_se_set(uc_se, clamp_value, false);
	}

	if (likely(!(attr->sched_flags & SCHED_FLAG_UTIL_CLAMP)))
		return;

	if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MIN) {		/* (2) */
		uclamp_se_set(&p->uclamp_req[UCLAMP_MIN],
			      attr->sched_util_min, true);
	}

	if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MAX) {		/* (3) */
		uclamp_se_set(&p->uclamp_req[UCLAMP_MAX],
			      attr->sched_util_max, true);
	}
}
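__setscheduler_uclamp() is the path through which userspace sets a task's uclamp.
(1) Reset the task's request se to the defaults, skipping clamps already defined by userspace (user_defined): RT tasks get value=1024 for both UCLAMP_MIN and UCLAMP_MAX; other tasks get UCLAMP_MIN value=0 and UCLAMP_MAX value=1024. If attr->sched_flags contains no util-clamp flag, the function returns here.
(2) If attr->sched_flags contains SCHED_FLAG_UTIL_CLAMP_MIN, set the UCLAMP_MIN value of the task's request se to attr->sched_util_min and mark it user_defined;
(3) If attr->sched_flags contains SCHED_FLAG_UTIL_CLAMP_MAX, set the UCLAMP_MAX value of the task's request se to attr->sched_util_max and mark it user_defined.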
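From userspace these fields are reached through the sched_setattr() system call. The sketch below assumes Linux with a kernel built with CONFIG_UCLAMP_TASK and invokes the call via syscall(), since older C libraries provide no wrapper. struct uclamp_sched_attr is a local copy of the kernel's struct sched_attr layout (SCHED_ATTR_SIZE_VER1, 56 bytes) under a hypothetical name, the flag values are copied from include/uapi/linux/sched.h, and make_uclamp_attr() and set_task_uclamp() are illustrative helpers.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Same layout as the kernel's struct sched_attr, VER1 (56 bytes). */
struct uclamp_sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;
	uint64_t sched_deadline;
	uint64_t sched_period;
	uint32_t sched_util_min;
	uint32_t sched_util_max;
};

/* Flag values from include/uapi/linux/sched.h (v5.3+). */
#define SCHED_FLAG_KEEP_POLICY    0x08
#define SCHED_FLAG_KEEP_PARAMS    0x10
#define SCHED_FLAG_UTIL_CLAMP_MIN 0x20
#define SCHED_FLAG_UTIL_CLAMP_MAX 0x40

/* Build an attr that changes only the util clamps. */
static struct uclamp_sched_attr make_uclamp_attr(uint32_t util_min, uint32_t util_max)
{
	struct uclamp_sched_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.sched_flags = SCHED_FLAG_KEEP_POLICY | SCHED_FLAG_KEEP_PARAMS |
			   SCHED_FLAG_UTIL_CLAMP_MIN | SCHED_FLAG_UTIL_CLAMP_MAX;
	attr.sched_util_min = util_min;
	attr.sched_util_max = util_max;
	return attr;
}

/* Apply the clamps to thread 'tid' (0 = calling thread).
 * Returns 0 on success, -1 with errno set on failure. */
static int set_task_uclamp(pid_t tid, uint32_t util_min, uint32_t util_max)
{
#ifdef SYS_sched_setattr
	struct uclamp_sched_attr attr = make_uclamp_attr(util_min, util_max);

	return (int)syscall(SYS_sched_setattr, tid, &attr, 0);
#else
	(void)tid; (void)util_min; (void)util_max;
	errno = ENOSYS;  /* no sched_setattr syscall on this platform */
	return -1;
#endif
}
```

For example, set_task_uclamp(0, 256, 512) would request a 25% boost and a 50% cap for the calling thread. The KEEP_POLICY and KEEP_PARAMS flags leave the policy and priority untouched so that only the clamps change; to our understanding the kernel rejects values above 1024, or a minimum greater than the maximum, with EINVAL.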
4. uclamp_rq_inc_id()
static inline void uclamp_rq_inc_id(struct rq *rq, struct task_struct *p,
				    enum uclamp_id clamp_id)
{
	struct uclamp_rq *uc_rq = &rq->uclamp[clamp_id];
	struct uclamp_se *uc_se = &p->uclamp[clamp_id];
	struct uclamp_bucket *bucket;

	lockdep_assert_held(&rq->lock);

	/* Update task effective clamp */
	p->uclamp[clamp_id] = uclamp_eff_get(p, clamp_id);		/* (1) */

	bucket = &uc_rq->bucket[uc_se->bucket_id];			/* (2) */
	bucket->tasks++;
	uc_se->active = true;

	uclamp_idle_reset(rq, clamp_id, uc_se->value);			/* (3) */

	/*
	 * Local max aggregation: rq buckets always track the max
	 * "requested" clamp value of its RUNNABLE tasks.
	 */
	if (bucket->tasks == 1 || uc_se->value > bucket->value)	/* (4) */
		bucket->value = uc_se->value;

	if (uc_se->value > READ_ONCE(uc_rq->value))			/* (5) */
		WRITE_ONCE(uc_rq->value, uc_se->value);
}
Call paths:
- enqueue_task()->uclamp_rq_inc()->uclamp_rq_inc_id(): on enqueue, place the task into an rq bucket and update uclamp_rq::value.
- cpu_cgrp_subsys::cpu_legacy_files::cpu_uclamp_min_write()/cpu_uclamp_max_write()->cpu_uclamp_write()->cpu_util_update_eff()->uclamp_update_active_tasks()->uclamp_update_active()->uclamp_rq_inc_id(): setting a cgroup's uclamp.
- sysctl_sched_uclamp_handler()->uclamp_update_root_tg()->cpu_util_update_eff()->uclamp_update_active_tasks()->uclamp_update_active()->uclamp_rq_inc_id(): setting root_task_group's uclamp through /proc/sys/kernel/sched_util_clamp_{min,max}.
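(1) Check the task's requested clamp (task::uclamp_req) against the restrictions in priority order: uclamp_default (the only restriction for tasks whose task group is an autogroup or root_task_group) > p::sched_task_group (a group that is neither an autogroup nor the root group) > task::uclamp_req, and assign the result to the task's effective uclamp.
(2) Place the task into the bucket matching its effective clamp value (bucket->tasks++) and set the task's effective uclamp::active=true.
(3) Use UCLAMP_FLAG_IDLE to check whether the rq has just left idle. If it has, this task is the only one on the rq, so set rq::uclamp::value = task::uclamp::value.
(4) If the bucket holds only this task, or task::uclamp::value > bucket::value, update bucket::value = task::uclamp::value, because bucket::value is always the maximum effective clamp among all tasks in the bucket.
(5) If task::uclamp::value > rq::uclamp::value, update rq::uclamp::value, because rq::uclamp::value always represents the maximum uclamp among all tasks on the rq.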
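Step (1), the restriction order, can be sketched as two small functions. The names struct clamp_req, enum tg_kind, tg_restrict() and eff_get() are hypothetical; the conditions follow our reading of uclamp_tg_restrict() and uclamp_eff_get() in v5.4: tasks in the root group or an autogroup are restricted only by the system default, other tasks are first capped by their group's clamp (which also replaces a request that userspace never set), and the system default is applied last.

```c
#include <stdbool.h>

/* Hypothetical model of a task's requested clamp. */
struct clamp_req {
	unsigned int value;
	bool user_defined;
};

enum tg_kind { TG_ROOT, TG_AUTOGROUP, TG_OTHER };

/* Like uclamp_tg_restrict(): apply the task group's clamp. */
static unsigned int tg_restrict(struct clamp_req req, enum tg_kind kind,
				unsigned int tg_clamp)
{
	if (kind == TG_ROOT || kind == TG_AUTOGROUP)
		return req.value;
	if (req.value > tg_clamp || !req.user_defined)
		return tg_clamp;
	return req.value;
}

/* Like uclamp_eff_get(): the system default always applies last. */
static unsigned int eff_get(struct clamp_req req, enum tg_kind kind,
			    unsigned int tg_clamp, unsigned int sys_default)
{
	unsigned int v = tg_restrict(req, kind, tg_clamp);

	return v > sys_default ? sys_default : v;
}
```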
5. uclamp_rq_dec_id()
static inline void uclamp_rq_dec_id(struct rq *rq, struct task_struct *p,
				    enum uclamp_id clamp_id)
{
	struct uclamp_rq *uc_rq = &rq->uclamp[clamp_id];
	struct uclamp_se *uc_se = &p->uclamp[clamp_id];
	struct uclamp_bucket *bucket;
	unsigned int bkt_clamp;
	unsigned int rq_clamp;

	lockdep_assert_held(&rq->lock);

	bucket = &uc_rq->bucket[uc_se->bucket_id];
	SCHED_WARN_ON(!bucket->tasks);
	if (likely(bucket->tasks))					/* (1) */
		bucket->tasks--;
	uc_se->active = false;

	/*
	 * Keep "local max aggregation" simple and accept to (possibly)
	 * overboost some RUNNABLE tasks in the same bucket.
	 * The rq clamp bucket value is reset to its base value whenever
	 * there are no more RUNNABLE tasks refcounting it.
	 */
	if (likely(bucket->tasks))					/* (2) */
		return;

	rq_clamp = READ_ONCE(uc_rq->value);
	/*
	 * Defensive programming: this should never happen. If it happens,
	 * e.g. due to future modification, warn and fixup the expected value.
	 */
	SCHED_WARN_ON(bucket->value > rq_clamp);			/* (3) */
	if (bucket->value >= rq_clamp) {
		bkt_clamp = uclamp_rq_max_value(rq, clamp_id, uc_se->value);
		WRITE_ONCE(uc_rq->value, bkt_clamp);
	}
}
Call paths:
- dequeue_task()->uclamp_rq_dec()->uclamp_rq_dec_id(): remove the task from its rq bucket.
- cpu_cgrp_subsys::cpu_legacy_files::cpu_uclamp_min_write()/cpu_uclamp_max_write()->cpu_uclamp_write()->cpu_util_update_eff()->uclamp_update_active_tasks()->uclamp_update_active()->uclamp_rq_dec_id(): setting a cgroup's uclamp.
- sysctl_sched_uclamp_handler()->uclamp_update_root_tg()->cpu_util_update_eff()->uclamp_update_active_tasks()->uclamp_update_active()->uclamp_rq_dec_id(): setting root_task_group's uclamp.
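(1) Remove the task from its bucket and set task::uclamp::active=false, meaning the task is no longer accounted in any rq bucket.
(2) If the bucket still holds other tasks after this one is removed, return immediately. (Neither bucket::value nor rq::uclamp::value is updated here. As the code comment explains, updates to a bucket's value are kept simple, and over-boosting other RUNNABLE tasks in the same bucket is accepted. The clamp of an already-dequeued task can therefore keep over-boosting the runnable tasks on the rq; the impact on power consumption remains to be evaluated.)
(3) Emit a debug warning for the abnormal and very rare case bucket::value > rq::uclamp::value. If the task's bucket::value is greater than or equal to (normally equal to) rq::uclamp::value, update rq::uclamp::value to the maximum value among the remaining buckets. (Because (2) does not update bucket::value, a bucket's maximum may still be the task::uclamp::value of a task that has already been dequeued, until the bucket has no tasks left.)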
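The combined effect of uclamp_rq_inc_id() and uclamp_rq_dec_id(), including the over-boost discussed in (2), can be traced with a userspace model of a single UCLAMP_MAX index. Everything here is a simplified, hypothetical stand-in: the idle boolean plays the role of UCLAMP_FLAG_IDLE (cleared inside rq_inc() because only one clamp index is modeled), and rq_max_value() follows our reading of uclamp_rq_max_value(), which scans from the highest bucket down and, when no task is left, retains the last task's UCLAMP_MAX value.

```c
#include <stdbool.h>

#define NR_BUCKETS   5u
#define BUCKET_DELTA 205u /* DIV_ROUND_CLOSEST(1024, 5) */

struct bucket { unsigned int value; unsigned int tasks; };

/* Hypothetical model of one clamp index (UCLAMP_MAX) of an rq. */
struct uclamp_rq_model {
	unsigned int value;
	struct bucket bucket[NR_BUCKETS];
	bool idle;                  /* stands in for UCLAMP_FLAG_IDLE */
};

static unsigned int bucket_id(unsigned int v)
{
	unsigned int id = v / BUCKET_DELTA;

	return id < NR_BUCKETS - 1 ? id : NR_BUCKETS - 1;
}

/* Like uclamp_rq_inc_id(): refcount the task and max-aggregate. */
static void rq_inc(struct uclamp_rq_model *rq, unsigned int v)
{
	struct bucket *b = &rq->bucket[bucket_id(v)];

	b->tasks++;
	if (rq->idle) {             /* first task after idle: drop retained value */
		rq->value = v;
		rq->idle = false;
	}
	if (b->tasks == 1 || v > b->value)
		b->value = v;
	if (v > rq->value)
		rq->value = v;
}

/* Like uclamp_rq_max_value() for UCLAMP_MAX. */
static unsigned int rq_max_value(struct uclamp_rq_model *rq, unsigned int last)
{
	for (int id = NR_BUCKETS - 1; id >= 0; id--)
		if (rq->bucket[id].tasks)
			return rq->bucket[id].value;
	rq->idle = true;            /* no tasks left: retain last max clamp */
	return last;
}

/* Like uclamp_rq_dec_id(). */
static void rq_dec(struct uclamp_rq_model *rq, unsigned int v)
{
	struct bucket *b = &rq->bucket[bucket_id(v)];

	if (b->tasks)
		b->tasks--;
	if (b->tasks)
		return;                 /* bucket value kept: possible over-boost */
	if (b->value >= rq->value)
		rq->value = rq_max_value(rq, v);
}
```

Enqueuing taskA (256) and taskB (358) and then dequeuing taskB leaves the rq value at 358 until taskA also leaves, which is the over-boost described in (2).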
6. uclamp_rq_util_with()
static __always_inline
unsigned long uclamp_rq_util_with(struct rq *rq, unsigned long util,
				  struct task_struct *p)
{
	unsigned long min_util = READ_ONCE(rq->uclamp[UCLAMP_MIN].value);
	unsigned long max_util = READ_ONCE(rq->uclamp[UCLAMP_MAX].value);

	if (p) {
		min_util = max(min_util, uclamp_eff_value(p, UCLAMP_MIN));
		max_util = max(max_util, uclamp_eff_value(p, UCLAMP_MAX));
	}

	/*
	 * Since CPU's {min,max}_util clamps are MAX aggregated considering
	 * RUNNABLE tasks with _different_ clamps, we can end up with an
	 * inversion. Fix it now when the clamps are applied.
	 */
	if (unlikely(min_util >= max_util))
		return min_util;

	return clamp(util, min_util, max_util);
}
Call paths:
- compute_energy()->schedutil_cpu_util()->uclamp_rq_util_with()
- sugov_get_util()->schedutil_cpu_util()->uclamp_rq_util_with()
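This function is the interface between uclamp and schedutil/energy computation: through it they obtain the CPU's util clamped by the rq's uclamp values (additionally considering the clamps of task p when p is given).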
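The clamping done by uclamp_rq_util_with() reduces to a few comparisons. In this sketch clamp_util_model() is a hypothetical name, plain unsigned long arguments replace struct rq, and have_task stands in for p != NULL; the order of the checks matches the code above, including the inversion case where the boost wins.

```c
/* rq_min/rq_max: the rq's aggregated UCLAMP_MIN/UCLAMP_MAX values.
 * task_min/task_max: effective clamps of the task being placed,
 * used only when have_task is nonzero. */
static unsigned long clamp_util_model(unsigned long util,
				      unsigned long rq_min, unsigned long rq_max,
				      int have_task,
				      unsigned long task_min, unsigned long task_max)
{
	unsigned long min_util = rq_min;
	unsigned long max_util = rq_max;

	if (have_task) {
		/* Both bounds are MAX-aggregated with the task's clamps. */
		if (task_min > min_util)
			min_util = task_min;
		if (task_max > max_util)
			max_util = task_max;
	}
	/* Inversion (min >= max) is resolved in favour of the boost. */
	if (min_util >= max_util)
		return min_util;
	if (util < min_util)
		return min_util;
	return util > max_util ? max_util : util;
}
```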
7. sysctl interface
sysctl provides the /proc/sys/kernel/sched_util_clamp_{min,max} interface, which defines the system-wide default clamp range and unconditionally restricts all tasks.
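The consistency rule for the sysctl pair can be written as a one-line predicate. This is a sketch: sysctl_uclamp_valid() and apply_system_default() are hypothetical helpers, and the condition reflects our reading of sysctl_sched_uclamp_handler(), which requires min not to exceed max and max not to exceed 1024, otherwise rejecting the write with EINVAL and restoring the old values. Writing 512 to /proc/sys/kernel/sched_util_clamp_max would then cap every task's effective UCLAMP_MAX at 512.

```c
#include <stdbool.h>

#define CAPACITY_SCALE 1024u

/* Constraint checked when either sysctl is written. */
static bool sysctl_uclamp_valid(unsigned int util_min, unsigned int util_max)
{
	return util_min <= util_max && util_max <= CAPACITY_SCALE;
}

/* The system default caps every task's request (see uclamp_eff_get()). */
static unsigned int apply_system_default(unsigned int requested,
					 unsigned int sys_default)
{
	return requested > sys_default ? sys_default : requested;
}
```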
III. References
- Patrick Bellasi's kernel tree:
http://www.linux-arm.org/git?p=linux-pb.git;a=summary
http://www.linux-arm.org/git?p=linux-pb.git;a=shortlog;h=refs/heads/lkml/utilclamp_v10
- uclamp's predecessor, SchedTune: http://retis.sssup.it/~luca/ospm-summit/2017/Downloads/OSPM_SchedTune.pdf
- Base patch list:
af24bde sched/uclamp: Add uclamp support to energy_compute()
9d20ad7 sched/uclamp: Add uclamp_util_with()
982d9cd sched/cpufreq, sched/uclamp: Add clamps for FAIR and RT tasks
1a00d99 sched/uclamp: Set default clamps for RT tasks
a87498a sched/uclamp: Reset uclamp values on RESET_ON_FORK
a509a7c sched/uclamp: Extend sched_setattr() to support utilization clamping
1d6362f sched/core: Allow sched_setattr() to use the current policy
e8f1417 sched/uclamp: Add system default clamps
e496187 sched/uclamp: Enforce last task's UCLAMP_MAX
60daf9c sched/uclamp: Add bucket local max tracking
69842cb sched/uclamp: Add CPU's clamp buckets refcounting
- android5.4:
15d93f6 UPSTREAM: sched/fair: Make EAS wakeup placement consider uclamp restrictions
d5c2a09 UPSTREAM: sched/fair: Make task_fits_capacity() consider uclamp restrictions
1356a58 UPSTREAM: sched/uclamp: Rename uclamp_util_with() into uclamp_rq_util_with()
fedb670 UPSTREAM: sched/uclamp: Make uclamp util helpers use and return UL values
6966eb9 BACKPORT: sched/uclamp: Remove uclamp_util()
1473e20 Revert "ANDROID: sched/fair: EAS: Add uclamp support to find_energy_efficient_cpu()"
c598c8a sched/uclamp: Fix overzealous type replacement
6e1ff07 sched/uclamp: Fix incorrect condition
0e00b6f ANDROID: sched: Introduce uclamp latency and boost wrapper
c28f9d3 ANDROID: sched/core: Add a latency-sensitive flag to uclamp
b61876e ANDROID: sched/fair: EAS: Add uclamp support to find_energy_efficient_cpu()
1251201 sched/core: Fix uclamp ABI bug, clean up and robustify sched_read_attr() ABI logic and code
0413d7f sched/uclamp: Always use 'enum uclamp_id' for clamp_id values
babbe17 sched/uclamp: Update CPU's refcount on TG's clamp changes
3eac870 sched/uclamp: Use TG's clamps to restrict TASK's clamps
7274a5c sched/uclamp: Propagate system defaults to the root group
0b60ba2dd3 sched/uclamp: Propagate parent clamps
2480c09 sched/uclamp: Extend CPU's cgroup controller