EAS k5.4 (2): v5.4 - Util(ization) clamping

 Contents

Introduction

I. Data structures

1. uclamp in the rq

2. uclamp in the task

3. Extending the CPU's cgroup controller

II. Key functions

1. uclamp initialization

2. uclamp for a forked task

3. Setting a task's uclamp from userspace: __setscheduler_uclamp()

4. uclamp_rq_inc_id()

5. uclamp_rq_dec_id()

6. uclamp_rq_util_with()

7. The sysctl interface

III. References


 

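Introduction

uclamp is a feature introduced in kernel 5.3; kernel 5.4 added cgroup support for it. By setting clamp values on a task's util, the task's util can be boosted, which achieves an effect similar to Android's SchedTune.

I. Data structures

1. uclamp in the rq

 Figure 1-1 uclamp implementation in the rq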

/* Excerpt from struct rq */
#ifdef CONFIG_UCLAMP_TASK
    /* Utilization clamp values based on CPU's RUNNABLE tasks */
    struct uclamp_rq    uclamp[UCLAMP_CNT] ____cacheline_aligned;
    unsigned int        uclamp_flags;
#define UCLAMP_FLAG_IDLE 0x01
#endif

struct uclamp_rq {
    unsigned int value;
    struct uclamp_bucket bucket[UCLAMP_BUCKETS];
};

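The rq embeds two sets of uclamp state, a minimum set uclamp[UCLAMP_MIN] and a maximum set uclamp[UCLAMP_MAX], to clamp the util of the CPU. In each set, value is the clamp value currently in effect for the rq. Each set contains 5 buckets by default; a different number of buckets (5 to 20) can be selected with CONFIG_UCLAMP_BUCKETS_COUNT. uclamp_flags is cleared to zero by init_uclamp() during uclamp initialization. For now it holds a single flag, UCLAMP_FLAG_IDLE, which records whether the CPU still has runnable tasks (it is set once the rq has none left).

Each bucket covers a range of util values. With the default 5 buckets, each bucket spans 20% of the CPU's maximum capacity: SCHED_CAPACITY_SCALE/UCLAMP_BUCKETS, i.e. 1024/5 (about 205).

Each task on the rq is assigned to a bucket according to its clamp value. For example, taskA runs on cpu0 and requests a util of 25%. It is assigned to bucket[1], bucket[1]::value becomes 25% to record the value in effect, and the tasks counter is incremented to show that one more task is accounted in this bucket. If taskB, with a util of 35%, is then scheduled on cpu0, bucket[1]::value becomes 35% and tasks is incremented again. From then on taskA benefits from taskB's 35% until taskA leaves the rq.

If the system cannot accept taskA being boosted above its own request by taskB's higher value (a delta of 10% here), for example because power consumption rises noticeably, the number of buckets can be increased. That narrows the util range each bucket covers and improves the accuracy of per-bucket tracking, at the cost of more memory for the buckets. When a bucket holds no task, its value is set to the default, which is the minimum of the bucket's range: with no task in bucket[1], bucket[1]::value=20%.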
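
The bucket arithmetic described above can be sketched in plain user-space C. This is only an illustration under the default configuration of 5 buckets; the helper names bucket_id(), bucket_base_value() and pct_to_clamp() are made up for this example and are not claimed to match the kernel's own helpers.

```c
#include <assert.h>

#define SCHED_CAPACITY_SCALE 1024u
#define UCLAMP_BUCKETS       5u   /* default CONFIG_UCLAMP_BUCKETS_COUNT */

/* Width of one bucket, rounded to the closest integer: 1024 / 5 -> 205 */
#define UCLAMP_BUCKET_DELTA \
    ((SCHED_CAPACITY_SCALE + UCLAMP_BUCKETS / 2) / UCLAMP_BUCKETS)

/* Map a clamp value in [0, 1024] to the index of the bucket tracking it. */
static unsigned int bucket_id(unsigned int clamp_value)
{
    unsigned int id;

    assert(clamp_value <= SCHED_CAPACITY_SCALE);
    id = clamp_value / UCLAMP_BUCKET_DELTA;
    /*
     * Safeguard added in this sketch: never index past the last bucket,
     * which could happen for bucket counts where the division overshoots.
     */
    return id < UCLAMP_BUCKETS ? id : UCLAMP_BUCKETS - 1;
}

/* Lowest clamp value a bucket covers: the value an empty bucket falls back to. */
static unsigned int bucket_base_value(unsigned int clamp_value)
{
    return UCLAMP_BUCKET_DELTA * bucket_id(clamp_value);
}

/* Convert a percentage of the maximum capacity to a clamp value. */
static unsigned int pct_to_clamp(unsigned int pct)
{
    assert(pct <= 100u);
    return pct * SCHED_CAPACITY_SCALE / 100u;
}
```

With these defaults, pct_to_clamp(25) is 256 and pct_to_clamp(35) is 358; both map to bucket 1, whose base value is 205 (roughly 20% of 1024). This is why taskA and taskB from the example share a bucket.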

2. uclamp in the task


Figure 1-2 uclamp implementation in the task

struct task_struct 
{
 #ifdef CONFIG_UCLAMP_TASK
     /* Clamp values requested for a scheduling entity */
     struct uclamp_se        uclamp_req[UCLAMP_CNT];                             (1)
     /* Effective clamp values used for a scheduling entity */
      struct uclamp_se        uclamp[UCLAMP_CNT];                                (2)
 #endif
}     

struct uclamp_se {       
    unsigned int value      : bits_per(SCHED_CAPACITY_SCALE);
    unsigned int bucket_id      : bits_per(UCLAMP_BUCKETS);    
    unsigned int active     : 1;   
    unsigned int user_defined   : 1;
};

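value: the clamp value of the scheduling entity; the maximum is 1024, so 11 bits are used
bucket_id: the id of the bucket the clamp value maps to; each uclamp_id has 5 buckets by default, so 3 bits are used
active: set when the task is accounted in one of the rq's buckets; the task is counted in that bucket's bucket::tasks, and bucket::value also applies to the task
user_defined: marks a clamp value requested from userspace, which can override the default boost the system assigns to the task

uclamp is embedded in task_struct as scheduling entities (se). There are two kinds of uclamp_se:
(1) the request se, which records the requested clamp values
(2) the active se, which records the clamp values in effect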
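
The bit widths quoted above (11 bits for value, 3 bits for bucket_id) can be checked with a small user-space sketch. bits_needed() is a hypothetical stand-in for the kernel's bits_per(), and struct uclamp_se_sketch and make_user_se() exist only for this illustration.

```c
#include <assert.h>

#define SCHED_CAPACITY_SCALE 1024u
#define UCLAMP_BUCKETS       5u

/* Bits needed to store n; a user-space stand-in for the kernel's bits_per(). */
static unsigned int bits_needed(unsigned int n)
{
    unsigned int bits = 1;      /* 0 and 1 both need a single bit */

    while (n >>= 1)
        bits++;
    return bits;
}

/* Hypothetical copy of struct uclamp_se with the default widths spelled out. */
struct uclamp_se_sketch {
    unsigned int value        : 11; /* bits_needed(1024) */
    unsigned int bucket_id    : 3;  /* bits_needed(5) */
    unsigned int active       : 1;
    unsigned int user_defined : 1;
};

/* Build an se the way a userspace request would: value set, user_defined on. */
static struct uclamp_se_sketch make_user_se(unsigned int value,
                                            unsigned int bucket_id)
{
    struct uclamp_se_sketch se = { 0 };

    assert(value <= SCHED_CAPACITY_SCALE && bucket_id < UCLAMP_BUCKETS);
    se.value = value;
    se.bucket_id = bucket_id;
    se.user_defined = 1;
    return se;
}
```

The four fields add up to 16 bits, so the whole entity fits in a single unsigned int, which keeps the two arrays added to task_struct small.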

3. Extending the CPU's cgroup controller

The cpu cgroup controller supports uclamp: new uclamp.{min,max} attributes are added to cpu_cgrp_subsys to limit the boost (minimum) util and the maximum util of all tasks in a group.

#ifdef CONFIG_UCLAMP_TASK_GROUP
    {    
        .name = "uclamp.min",
        .flags = CFTYPE_NOT_ON_ROOT,
        .seq_show = cpu_uclamp_min_show,
        .write = cpu_uclamp_min_write,
    },   
    {    
        .name = "uclamp.max",
        .flags = CFTYPE_NOT_ON_ROOT,
        .seq_show = cpu_uclamp_max_show,
        .write = cpu_uclamp_max_write,
    },   
    {    
        .name = "uclamp.latency_sensitive",
        .flags = CFTYPE_NOT_ON_ROOT,
        .read_u64 = cpu_uclamp_ls_read_u64,
        .write_u64 = cpu_uclamp_ls_write_u64,
    },   
#endif

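On Android the groups can be configured as in the example below to restrict the util of the tasks in each group. Group restrictions follow a parent-first rule: the parent group's limits take precedence over those of its children.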

/dev/cpuctl/cpu.uclamp.min 1024
/dev/cpuctl/cpu.uclamp.max 0
/dev/cpuctl/cpu.uclamp.latency_sensitive 1000000

/dev/cpuctl/bg_non_interactive/cpu.uclamp.min 1024
/dev/cpuctl/bg_non_interactive/cpu.uclamp.max 0
/dev/cpuctl/bg_non_interactive/cpu.uclamp.latency_sensitive 1000000
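
The parent-first rule can be sketched as follows. This is a simplified user-space model, assuming that a group's effective clamp is its own request capped by every ancestor's request; struct group_sketch and group_eff() are hypothetical names, and the kernel's actual propagation code handles more cases than this.

```c
#include <assert.h>
#include <stddef.h>

enum uclamp_id { UCLAMP_MIN, UCLAMP_MAX, UCLAMP_CNT };

/* Hypothetical, minimal model of a task group. */
struct group_sketch {
    const struct group_sketch *parent;  /* NULL for the root group */
    unsigned int req[UCLAMP_CNT];       /* values written to uclamp.{min,max} */
};

static unsigned int min_u(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

/* Effective clamp of a group: its own request, capped by every ancestor. */
static unsigned int group_eff(const struct group_sketch *g, enum uclamp_id id)
{
    unsigned int eff;

    assert(g != NULL);
    eff = g->req[id];
    for (g = g->parent; g != NULL; g = g->parent)
        eff = min_u(eff, g->req[id]);
    return eff;
}
```

For example, a child group requesting uclamp.min=400 under a parent that allows only 200 ends up with an effective minimum of 200.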

II. Key functions

1. uclamp initialization

static void __init init_uclamp(void)
{
    struct uclamp_se uc_max = {};
    enum uclamp_id clamp_id;
    int cpu; 

    mutex_init(&uclamp_mutex);

    for_each_possible_cpu(cpu) {
        memset(&cpu_rq(cpu)->uclamp, 0, sizeof(struct uclamp_rq));                (1)
        cpu_rq(cpu)->uclamp_flags = 0; 
    }    

    for_each_clamp_id(clamp_id) {
        uclamp_se_set(&init_task.uclamp_req[clamp_id],                            (2)
                  uclamp_none(clamp_id), false);
    }    

    /* System defaults allow max clamp values for both indexes */
    uclamp_se_set(&uc_max, uclamp_none(UCLAMP_MAX), false);                        (3)
    for_each_clamp_id(clamp_id) {
        uclamp_default[clamp_id] = uc_max;                                         (4)
#ifdef CONFIG_UCLAMP_TASK_GROUP                                                    (5)
        root_task_group.uclamp_req[clamp_id] = uc_max;
        root_task_group.uclamp[clamp_id] = uc_max;
#endif
    }    
}

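init_uclamp() is the uclamp initialization function. It is called at the end of the scheduler initialization function sched_init().

(1) Zeroes the uclamp state of each CPU's rq. The size cleared here should be sizeof(struct uclamp_rq)*UCLAMP_CNT; a patch for this has been submitted upstream.

(2) Initializes the request se of init_task: UCLAMP_MIN gets value=0 and UCLAMP_MAX gets value=1024, and the matching bucket id is computed.

(3) Defines a uc_max se, initializes it with value=1024 and computes the matching bucket. uc_max represents the maximum clamp value.

(4) Initializes the uclamp_default se so that both UCLAMP_MIN and UCLAMP_MAX equal uc_max. uclamp_default is the upper limit for every clamp se: any clamp se must be less than or equal to the uclamp_default value.

(5) Initializes the request se and the active se of root_task_group to uc_max.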
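
The values that steps (2) to (5) leave behind can be summarized in a short user-space sketch. clamp_none() mirrors what the text says about uclamp_none() (0 for the minimum index, 1024 for the maximum); struct uclamp_init_sketch and init_sketch() are hypothetical and only collect the resulting numbers in one place.

```c
#include <assert.h>
#include <stddef.h>

#define SCHED_CAPACITY_SCALE 1024u

enum uclamp_id { UCLAMP_MIN, UCLAMP_MAX, UCLAMP_CNT };

/* "No clamp" value for an index: 0 for the minimum, 1024 for the maximum. */
static unsigned int clamp_none(enum uclamp_id id)
{
    return id == UCLAMP_MIN ? 0 : SCHED_CAPACITY_SCALE;
}

/* Hypothetical container for the values that initialization sets up. */
struct uclamp_init_sketch {
    unsigned int init_task_req[UCLAMP_CNT]; /* step (2) */
    unsigned int sys_default[UCLAMP_CNT];   /* step (4) */
    unsigned int root_group[UCLAMP_CNT];    /* step (5) */
};

static void init_sketch(struct uclamp_init_sketch *s)
{
    unsigned int uc_max = clamp_none(UCLAMP_MAX);   /* step (3) */
    int id;

    assert(s != NULL);
    for (id = 0; id < UCLAMP_CNT; id++) {
        s->init_task_req[id] = clamp_none((enum uclamp_id)id);
        s->sys_default[id] = uc_max;    /* both indexes allow the maximum */
        s->root_group[id] = uc_max;
    }
}
```

Note the asymmetry: init_task requests "no boost, no cap" (0 and 1024), while the system defaults and the root group are set to 1024 for both indexes because they act as upper limits rather than as requests.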

2. uclamp for a forked task

static void uclamp_fork(struct task_struct *p)
{
    enum uclamp_id clamp_id;

    for_each_clamp_id(clamp_id)
        p->uclamp[clamp_id].active = false;                                (1)

    if (likely(!p->sched_reset_on_fork))                                    (2)
        return;

    for_each_clamp_id(clamp_id) {
        unsigned int clamp_value = uclamp_none(clamp_id);

        /* By default, RT tasks always get 100% boost */
        if (unlikely(rt_task(p) && clamp_id == UCLAMP_MIN))                 (3)
            clamp_value = uclamp_none(UCLAMP_MAX);

        uclamp_se_set(&p->uclamp_req[clamp_id], clamp_value, false);        (4)
    }    
}

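uclamp_fork() is called from sched_fork().

(1) A newly forked task is not accounted in any bucket, so its uclamp::active is set to false.

(2) A newly forked task normally does not ask for a scheduler reset (sched_reset_on_fork), so the function simply returns here.

(3) If the new task does ask for a reset, its request se is re-initialized. For an RT task, both UCLAMP_MIN and UCLAMP_MAX of the request se are set to value=1024, so the boost of an RT task is unrestricted and it gets a 100% boost.

(4) For tasks of other classes, the request se gets UCLAMP_MIN value=0 and UCLAMP_MAX value=1024.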
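
The three outcomes of the fork path (inherit, reset for a normal task, reset for an RT task) can be modelled with a small user-space sketch. struct task_sketch and fork_sketch() are hypothetical; they only mirror the control flow of the listing above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum uclamp_id { UCLAMP_MIN, UCLAMP_MAX, UCLAMP_CNT };

/* Hypothetical, minimal model of the fields the fork path looks at. */
struct task_sketch {
    bool is_rt;
    bool reset_on_fork;
    bool active[UCLAMP_CNT];
    unsigned int req[UCLAMP_CNT];
};

/* Apply the fork-time rules to a child that starts as a copy of its parent. */
static void fork_sketch(struct task_sketch *child)
{
    int id;

    assert(child != NULL);

    /* (1) The child is not accounted in any rq bucket yet. */
    for (id = 0; id < UCLAMP_CNT; id++)
        child->active[id] = false;

    /* (2) Without reset-on-fork the inherited requests are kept. */
    if (!child->reset_on_fork)
        return;

    /* (3)/(4) Reset to defaults; an RT task gets a full boost. */
    for (id = 0; id < UCLAMP_CNT; id++) {
        unsigned int value = (id == UCLAMP_MIN) ? 0 : 1024;

        if (child->is_rt && id == UCLAMP_MIN)
            value = 1024;
        child->req[id] = value;
    }
}
```

A child forked without the reset flag keeps whatever clamps its parent requested, which is the common case the likely() hint in the listing optimizes for.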

3. Setting a task's uclamp from userspace: __setscheduler_uclamp()

__setscheduler_uclamp() is called from __sched_setscheduler().

static void __setscheduler_uclamp(struct task_struct *p,
                  const struct sched_attr *attr) 
{
    enum uclamp_id clamp_id;  

    /* 
     * On scheduling class change, reset to default clamps for tasks
     * without a task-specific value.
     */
    for_each_clamp_id(clamp_id) {                                          (1)
        struct uclamp_se *uc_se = &p->uclamp_req[clamp_id];
        unsigned int clamp_value = uclamp_none(clamp_id);

        /* Keep using defined clamps across class changes */
        if (uc_se->user_defined)       
            continue;

        /* By default, RT tasks always get 100% boost */
        if (unlikely(rt_task(p) && clamp_id == UCLAMP_MIN))
            clamp_value = uclamp_none(UCLAMP_MAX);

        uclamp_se_set(uc_se, clamp_value, false);
    }  

    if (likely(!(attr->sched_flags & SCHED_FLAG_UTIL_CLAMP)))
        return;

    if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MIN) {                    (2)
        uclamp_se_set(&p->uclamp_req[UCLAMP_MIN],
                  attr->sched_util_min, true);   
    }  

    if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MAX) {                    (3)
        uclamp_se_set(&p->uclamp_req[UCLAMP_MAX],
                  attr->sched_util_max, true);   
    }  
}

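__setscheduler_uclamp() is how userspace sets the uclamp of a task.

(1) Sets the default request se values for clamps that were not defined by the user (user_defined clamps are kept): for RT tasks both UCLAMP_MIN and UCLAMP_MAX get the maximum value=1024; for other tasks UCLAMP_MIN gets value=0 and UCLAMP_MAX gets value=1024.

(2) If SCHED_FLAG_UTIL_CLAMP_MIN is set in attr::sched_flags, the UCLAMP_MIN value of the task's request se is set to attr::sched_util_min.

(3) If SCHED_FLAG_UTIL_CLAMP_MAX is set in attr::sched_flags, the UCLAMP_MAX value of the task's request se is set to attr::sched_util_max.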
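
From the userspace side, the request reaches this function through the sched_setattr() system call. The sketch below assumes a Linux system and calls the raw syscall. The struct is declared locally under the made-up name struct uclamp_sched_attr, with the field layout of the kernel's struct sched_attr as extended for utilization clamping, and the flag values are copied into DEMO_* macros so the example does not depend on which headers the C library ships. fill_uclamp_attr() and set_task_uclamp() are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Flag values as found in the kernel's uapi sched.h (assumed unchanged). */
#define DEMO_FLAG_KEEP_POLICY    0x08u
#define DEMO_FLAG_KEEP_PARAMS    0x10u
#define DEMO_FLAG_UTIL_CLAMP_MIN 0x20u
#define DEMO_FLAG_UTIL_CLAMP_MAX 0x40u

/* Local mirror of struct sched_attr including the two uclamp fields. */
struct uclamp_sched_attr {
    uint32_t size;
    uint32_t sched_policy;
    uint64_t sched_flags;
    int32_t  sched_nice;
    uint32_t sched_priority;
    uint64_t sched_runtime;
    uint64_t sched_deadline;
    uint64_t sched_period;
    uint32_t sched_util_min;
    uint32_t sched_util_max;
};

/* Prepare a request that changes only the clamps, not policy or priority. */
void fill_uclamp_attr(struct uclamp_sched_attr *attr,
                      uint32_t util_min, uint32_t util_max)
{
    assert(util_min <= util_max && util_max <= 1024u);
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->sched_flags = DEMO_FLAG_KEEP_POLICY | DEMO_FLAG_KEEP_PARAMS |
                        DEMO_FLAG_UTIL_CLAMP_MIN | DEMO_FLAG_UTIL_CLAMP_MAX;
    attr->sched_util_min = util_min;
    attr->sched_util_max = util_max;
}

/* Returns 0 on success, -1 with errno set otherwise. pid 0 means "self". */
int set_task_uclamp(pid_t pid, uint32_t util_min, uint32_t util_max)
{
    struct uclamp_sched_attr attr;

    fill_uclamp_attr(&attr, util_min, util_max);
    return (int)syscall(SYS_sched_setattr, pid, &attr, 0);
}
```

A call such as set_task_uclamp(0, 256, 512) would ask for a 25% boost and a 50% cap on the calling thread. It is expected to fail on kernels built without CONFIG_UCLAMP_TASK, so callers should check the return value and errno.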

4. uclamp_rq_inc_id() 

static inline void uclamp_rq_inc_id(struct rq *rq, struct task_struct *p,
                    enum uclamp_id clamp_id)
{
    struct uclamp_rq *uc_rq = &rq->uclamp[clamp_id];
    struct uclamp_se *uc_se = &p->uclamp[clamp_id];
    struct uclamp_bucket *bucket;

    lockdep_assert_held(&rq->lock);

    /* Update task effective clamp */
    p->uclamp[clamp_id] = uclamp_eff_get(p, clamp_id);                (1)

    bucket = &uc_rq->bucket[uc_se->bucket_id];                        (2)
    bucket->tasks++;
    uc_se->active = true;

    uclamp_idle_reset(rq, clamp_id, uc_se->value);                    (3)

    /*   
     * Local max aggregation: rq buckets always track the max
     * "requested" clamp value of its RUNNABLE tasks.
     */
    if (bucket->tasks == 1 || uc_se->value > bucket->value)           (4)
        bucket->value = uc_se->value;

    if (uc_se->value > READ_ONCE(uc_rq->value))                       (5)
        WRITE_ONCE(uc_rq->value, uc_se->value);
}

enqueue_task()->uclamp_rq_inc()->uclamp_rq_inc_id() /* on enqueue, account the task in an rq bucket and update uclamp_rq::value */

|cpu_cgrp_subsys::cpu_legacy_files::cpu_uclamp_min_write()/cpu_uclamp_max_write()->cpu_uclamp_write()->| /* set the uclamp of a cgroup */
|sysctl_sched_uclamp_handler()->uclamp_update_root_tg()->| /* set the uclamp of root_task_group through /proc/sys/kernel/sched_util_clamp_{min,max} */
|->cpu_util_update_eff()->uclamp_update_active_tasks()->uclamp_update_active()->uclamp_rq_inc_id()

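(1) Checks the uclamp requested in task::uclamp_req against the restrictions, in priority order: uclamp_default (which also covers the case where task::sched_task_group is an autogroup or root_task_group) > p::sched_task_group (neither autogroup nor root) > task::uclamp_req, and stores the result as the effective uclamp value.

(2) Accounts the task in the bucket matching the effective uclamp value (bucket->tasks++) and sets uclamp::active=true in the task's effective uclamp.

(3) Uses UCLAMP_FLAG_IDLE to check whether the rq has just left idle. If it has, this is the only task on the rq, so rq::uclamp::value is set to task::uclamp::value.

(4) If this is the only task in the bucket, or task::uclamp::value > bucket::value, bucket::value is updated to task::uclamp::value, because bucket::value always holds the maximum effective clamp of all tasks in the bucket.

(5) If task::uclamp::value > rq::uclamp::value, rq::uclamp::value is updated, because rq::uclamp::value always represents the maximum uclamp of all tasks on the rq.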
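
Steps (2) to (5) can be replayed in user space with the taskA/taskB numbers from section I. The sketch below models a single clamp index with the default 5 buckets; struct rq_sketch and rq_inc() are hypothetical names, the effective value is passed in directly instead of being computed as in step (1), and the idle flag is cleared inside rq_inc(), which is a simplification made for this single-index model.

```c
#include <assert.h>

#define UCLAMP_BUCKETS      5
#define UCLAMP_BUCKET_DELTA 205u    /* round(1024 / 5) */
#define RQ_FLAG_IDLE        0x01u

struct bucket_sketch {
    unsigned int value;     /* max effective clamp of the tasks accounted here */
    unsigned int tasks;     /* number of runnable tasks accounted here */
};

struct rq_sketch {
    unsigned int value;     /* clamp currently in effect for the rq */
    unsigned int flags;
    struct bucket_sketch bucket[UCLAMP_BUCKETS];
};

/* Account one task with effective clamp `eff` on the rq. */
static void rq_inc(struct rq_sketch *rq, unsigned int eff)
{
    struct bucket_sketch *b;

    assert(eff <= 1024u);

    /* (2) Refcount the task in the bucket matching its effective value. */
    b = &rq->bucket[eff / UCLAMP_BUCKET_DELTA];
    b->tasks++;

    /* (3) Idle exit: the first task replaces the value retained while idle. */
    if (rq->flags & RQ_FLAG_IDLE) {
        rq->value = eff;
        rq->flags &= ~RQ_FLAG_IDLE;     /* simplification of this sketch */
    }

    /* (4) Local max aggregation inside the bucket. */
    if (b->tasks == 1 || eff > b->value)
        b->value = eff;

    /* (5) The rq tracks the max over all runnable tasks. */
    if (eff > rq->value)
        rq->value = eff;
}
```

Enqueuing taskA (256) and then taskB (358) leaves bucket[1] with tasks=2 and value=358, and the rq value at 358; a later task with value 100 lands in bucket[0] without lowering the rq value.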

5.  uclamp_rq_dec_id()

static inline void uclamp_rq_dec_id(struct rq *rq, struct task_struct *p, 
                    enum uclamp_id clamp_id)       
{
    struct uclamp_rq *uc_rq = &rq->uclamp[clamp_id];
    struct uclamp_se *uc_se = &p->uclamp[clamp_id]; 
    struct uclamp_bucket *bucket;  
    unsigned int bkt_clamp;   
    unsigned int rq_clamp;    

    lockdep_assert_held(&rq->lock);

    bucket = &uc_rq->bucket[uc_se->bucket_id];
    SCHED_WARN_ON(!bucket->tasks); 
    if (likely(bucket->tasks))                                              (1)
        bucket->tasks--;      
    uc_se->active = false;

    /* 
     * Keep "local max aggregation" simple and accept to (possibly)
     * overboost some RUNNABLE tasks in the same bucket.
     * The rq clamp bucket value is reset to its base value whenever
     * there are no more RUNNABLE tasks refcounting it.
     */
    if (likely(bucket->tasks))                                              (2)
        return;

    rq_clamp = READ_ONCE(uc_rq->value);
    /* 
     * Defensive programming: this should never happen. If it happens,
     * e.g. due to future modification, warn and fixup the expected value.
     */
    SCHED_WARN_ON(bucket->value > rq_clamp);                                (3)
    if (bucket->value >= rq_clamp) {                            
        bkt_clamp = uclamp_rq_max_value(rq, clamp_id, uc_se->value);
        WRITE_ONCE(uc_rq->value, bkt_clamp);
    }  
}

Call chains:

  • dequeue_task()->uclamp_rq_dec()->uclamp_rq_dec_id()
    /* remove the task from the rq bucket accounting */
  • |cpu_cgrp_subsys::cpu_legacy_files::cpu_uclamp_min_write()/cpu_uclamp_max_write()->cpu_uclamp_write()->|
    |sysctl_sched_uclamp_handler()->uclamp_update_root_tg()->|
    |->cpu_util_update_eff()->uclamp_update_active_tasks()->uclamp_update_active()->uclamp_rq_dec_id()

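(1) Removes the task from the bucket's accounting and sets task::uclamp::active=false, meaning the task is no longer accounted in any rq bucket.

(2) If the bucket still holds other tasks after this one is removed, return immediately. (Neither bucket::value nor rq::uclamp::value is updated here. The code comment explains that this keeps the update of the bucket value simple, and that over-boosting the other runnable tasks in the same bucket is acceptable. In this case the uclamp of an already dequeued task keeps over-boosting the runnable tasks on the rq; the impact on power consumption remains to be evaluated.)

(3) Emits a warning for the abnormal case bucket::value > rq::uclamp::value, which should never happen in practice. Then, if the bucket::value of the task's bucket is greater than or equal to rq::uclamp::value, rq::uclamp::value is updated to the maximum over the current buckets. (Because (2) does not update bucket::value, the maximum of a bucket may still be the task::uclamp::value of a task that has already been dequeued, until that bucket has no tasks left.)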
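
The dequeue path, including the over-boost described in (2), can be traced with a user-space sketch of a single clamp index. struct rq_sketch, rq_max_value() and rq_dec() are hypothetical names. When the last task leaves, this model keeps the departing task's value and sets an idle flag, which is an assumption standing in for the kernel's idle handling.

```c
#include <assert.h>

#define UCLAMP_BUCKETS      5
#define UCLAMP_BUCKET_DELTA 205u    /* round(1024 / 5) */
#define RQ_FLAG_IDLE        0x01u

struct bucket_sketch {
    unsigned int value;
    unsigned int tasks;
};

struct rq_sketch {
    unsigned int value;
    unsigned int flags;
    struct bucket_sketch bucket[UCLAMP_BUCKETS];
};

/* Value of the highest non-empty bucket; falls back to the last task's value. */
static unsigned int rq_max_value(struct rq_sketch *rq, unsigned int last_value)
{
    int i;

    for (i = UCLAMP_BUCKETS - 1; i >= 0; i--)
        if (rq->bucket[i].tasks)
            return rq->bucket[i].value;

    /* No runnable task left: remember that the rq went idle (assumption). */
    rq->flags |= RQ_FLAG_IDLE;
    return last_value;
}

/* Drop one task with effective clamp `eff` from the rq accounting. */
static void rq_dec(struct rq_sketch *rq, unsigned int eff)
{
    struct bucket_sketch *b;

    assert(eff <= 1024u);
    b = &rq->bucket[eff / UCLAMP_BUCKET_DELTA];

    /* (1) Release the refcount. */
    if (b->tasks)
        b->tasks--;

    /* (2) Bucket still in use: keep its (possibly stale) maximum. */
    if (b->tasks)
        return;

    /* (3) The emptied bucket was defining the rq value: recompute it. */
    if (b->value >= rq->value)
        rq->value = rq_max_value(rq, eff);
}
```

Starting from taskA (256) and taskB (358) in bucket[1] plus one task (100) in bucket[0], dequeuing taskB leaves the rq value at 358 even though only taskA remains in the bucket. The value drops to 100 only once taskA is dequeued as well.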

6.  uclamp_rq_util_with()

static __always_inline
unsigned long uclamp_rq_util_with(struct rq *rq, unsigned long util,
                  struct task_struct *p)
{
    unsigned long min_util = READ_ONCE(rq->uclamp[UCLAMP_MIN].value);
    unsigned long max_util = READ_ONCE(rq->uclamp[UCLAMP_MAX].value);

    if (p) {
        min_util = max(min_util, uclamp_eff_value(p, UCLAMP_MIN));
        max_util = max(max_util, uclamp_eff_value(p, UCLAMP_MAX));
    }    

    /*   
     * Since CPU's {min,max}_util clamps are MAX aggregated considering
     * RUNNABLE tasks with _different_ clamps, we can end up with an
     * inversion. Fix it now when the clamps are applied.
     */
    if (unlikely(min_util >= max_util))
        return min_util;

    return clamp(util, min_util, max_util);
}

Call chains:
|compute_energy()->|
|sugov_get_util()->|
|->schedutil_cpu_util()-> uclamp_rq_util_with()

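This function is the interface between uclamp and schedutil/energy computation: through it they obtain the util value with the uclamp values applied.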
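
The clamping logic, including the inversion case mentioned in the code comment, can be exercised in user space. struct clamps and clamp_util() are hypothetical names; the rq clamps are passed in as plain numbers instead of being read from the rq.

```c
#include <assert.h>
#include <stddef.h>

/* A {min, max} clamp pair, in capacity units (0..1024). */
struct clamps {
    unsigned long min;
    unsigned long max;
};

static unsigned long max_ul(unsigned long a, unsigned long b)
{
    return a > b ? a : b;
}

/*
 * Clamp `util` with the rq clamps, optionally max-aggregated with the
 * clamps of one extra task (pass NULL when there is none).
 */
static unsigned long clamp_util(unsigned long util, struct clamps rq,
                                const struct clamps *task)
{
    unsigned long min_util = rq.min;
    unsigned long max_util = rq.max;

    assert(rq.min <= 1024ul && rq.max <= 1024ul);

    if (task != NULL) {
        min_util = max_ul(min_util, task->min);
        max_util = max_ul(max_util, task->max);
    }

    /* Inversion: min and max come from different tasks; min wins. */
    if (min_util >= max_util)
        return min_util;

    if (util < min_util)
        return min_util;
    if (util > max_util)
        return max_util;
    return util;
}
```

With rq clamps {205, 820}, a util of 100 is raised to 205 and a util of 900 is capped at 820. With inverted clamps {600, 400}, every input returns 600, so the boost request takes priority over the cap.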

7. The sysctl interface

sysctl provides the /proc/sys/kernel/sched_util_clamp_{min,max} interface, which defines the system-default clamp range and unconditionally restricts all tasks.
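
The current system defaults can be read like any other procfs file. The helper below is a minimal sketch: read_clamp_sysctl() is a hypothetical name, and the assumption is that each file holds a single decimal number in the range 0..1024.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read one clamp value from a sysctl file such as
 * /proc/sys/kernel/sched_util_clamp_min.
 * Returns the value (0..1024), or -1 if the file is missing or malformed.
 */
static int read_clamp_sysctl(const char *path)
{
    FILE *f;
    unsigned int value;
    int ok;

    assert(path != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;      /* e.g. kernel built without uclamp support */

    ok = (fscanf(f, "%u", &value) == 1);
    fclose(f);

    return (ok && value <= 1024u) ? (int)value : -1;
}
```

Calling read_clamp_sysctl("/proc/sys/kernel/sched_util_clamp_max") on a kernel with uclamp enabled should return the current system-wide cap; a result of -1 means the interface is not available or the content could not be parsed.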

 

III. References

  1. Patrick Bellasi's kernel:
    http://www.linux-arm.org/git?p=linux-pb.git;a=summary
    http://www.linux-arm.org/git?p=linux-pb.git;a=shortlog;h=refs/heads/lkml/utilclamp_v10

  2. SchedTune, the predecessor of uclamp: http://retis.sssup.it/~luca/ospm-summit/2017/Downloads/OSPM_SchedTune.pdf

  3. base patch list:
    af24bde sched/uclamp: Add uclamp support to energy_compute()
    9d20ad7 sched/uclamp: Add uclamp_util_with()
    982d9cd sched/cpufreq, sched/uclamp: Add clamps for FAIR and RT tasks
    1a00d99 sched/uclamp: Set default clamps for RT tasks
    a87498a sched/uclamp: Reset uclamp values on RESET_ON_FORK
    a509a7c sched/uclamp: Extend sched_setattr() to support utilization clamping
    1d6362f sched/core: Allow sched_setattr() to use the current policy
    e8f1417 sched/uclamp: Add system default clamps
    e496187 sched/uclamp: Enforce last task's UCLAMP_MAX
    60daf9c sched/uclamp: Add bucket local max tracking
    69842cb sched/uclamp: Add CPU's clamp buckets refcounting

  4. android5.4
    15d93f6 UPSTREAM: sched/fair: Make EAS wakeup placement consider uclamp restrictions
    d5c2a09 UPSTREAM: sched/fair: Make task_fits_capacity() consider uclamp restrictions
    1356a58 UPSTREAM: sched/uclamp: Rename uclamp_util_with() into uclamp_rq_util_with()
    fedb670 UPSTREAM: sched/uclamp: Make uclamp util helpers use and return UL values
    6966eb9 BACKPORT: sched/uclamp: Remove uclamp_util()
    1473e20 Revert "ANDROID: sched/fair: EAS: Add uclamp support to find_energy_efficient_cpu()"
    c598c8a sched/uclamp: Fix overzealous type replacement
    6e1ff07 sched/uclamp: Fix incorrect condition
    0e00b6f ANDROID: sched: Introduce uclamp latency and boost wrapper
    c28f9d3 ANDROID: sched/core: Add a latency-sensitive flag to uclamp
    b61876e ANDROID: sched/fair: EAS: Add uclamp support to find_energy_efficient_cpu()
    1251201 sched/core: Fix uclamp ABI bug, clean up and robustify sched_read_attr() ABI logic and code
    0413d7f sched/uclamp: Always use 'enum uclamp_id' for clamp_id values
    babbe17 sched/uclamp: Update CPU's refcount on TG's clamp changes
    3eac870 sched/uclamp: Use TG's clamps to restrict TASK's clamps
    7274a5c sched/uclamp: Propagate system defaults to the root group
    0b60ba2dd3 sched/uclamp: Propagate parent clamps
    2480c09 sched/uclamp: Extend CPU's cgroup controller


 
