kube-scheduler Source Code Analysis (5): PrioritizeNodes

Personal blog version of this article: https://www.huweihuang.com/kubernetes-notes/code-analysis/kube-scheduler/PrioritizeNodes.html


The following code analysis is based on Kubernetes v1.12.0.

This article focuses on the prioritization (scoring) logic, i.e. selecting the best node from the nodes that passed the predicates. Prioritization is implemented by PrioritizeNodes, which ultimately returns a list recording each node's score.

1. Call Site

PrioritizeNodes is called from genericScheduler.Schedule as follows:

This code is located in pkg/scheduler/core/generic_scheduler.go.

func (g *genericScheduler) Schedule(pod *v1.Pod, nodeLister algorithm.NodeLister) (string, error) {
  ...
	trace.Step("Prioritizing")
	startPriorityEvalTime := time.Now()
	// When only one node after predicate, just use it.
	if len(filteredNodes) == 1 {
		metrics.SchedulingAlgorithmPriorityEvaluationDuration.Observe(metrics.SinceInMicroseconds(startPriorityEvalTime))
		return filteredNodes[0].Name, nil
	}

	metaPrioritiesInterface := g.priorityMetaProducer(pod, g.cachedNodeInfoMap)
	// Run the prioritization logic and return a list recording each node's score
	priorityList, err := PrioritizeNodes(pod, g.cachedNodeInfoMap, metaPrioritiesInterface, g.prioritizers, filteredNodes, g.extenders)
	if err != nil {
		return "", err
	}
	metrics.SchedulingAlgorithmPriorityEvaluationDuration.Observe(metrics.SinceInMicroseconds(startPriorityEvalTime))
	metrics.SchedulingLatency.WithLabelValues(metrics.PriorityEvaluation).Observe(metrics.SinceInSeconds(startPriorityEvalTime))
  ...
}  

The core call:

// Score the nodes that passed the predicates (filteredNodes) and return a list recording each node's score
priorityList, err := PrioritizeNodes(pod, g.cachedNodeInfoMap, metaPrioritiesInterface, g.prioritizers, filteredNodes, g.extenders)

2. PrioritizeNodes

Prioritization selects the best node from those that satisfied the predicates. PrioritizeNodes ultimately returns a list recording each node's score.

It works as follows:

  • PrioritizeNodes ranks the nodes by running the individual priority functions in parallel.
  • Each priority function scores every node in the range 0-10.
  • 0 means the lowest-priority (least preferred) node, 10 the highest.
  • Each priority function also has its own weight.
  • The score returned by a priority function is multiplied by its weight to get a weighted score.
  • Finally all weighted scores are combined (added) to obtain each node's total weighted score.

The main flow of PrioritizeNodes is:

  1. If no priority functions and no extenders are configured, give every node the same score and return immediately.
  2. Run each priority function's map function against every node to produce raw scores.
  3. Run the corresponding reduce function over those map results to compute each function's final per-node score.
  4. Combine the per-function scores into a weighted sum using each priority function's weight (a minimal toy sketch of this pipeline follows this list).
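
To make the flow concrete, here is a minimal, self-contained sketch of the map-then-weighted-sum pipeline. The hostPriority and priorityConfig types, the map functions, and the weights below are toy stand-ins invented for illustration, not the scheduler's actual types:

package main

import "fmt"

// Toy stand-in for schedulerapi.HostPriority.
type hostPriority struct {
	Host  string
	Score int
}

// Toy stand-in for a priority config: a per-node map function plus a weight.
type priorityConfig struct {
	name   string
	mapFn  func(node string) int // returns a 0-10 score for one node
	weight int
}

func main() {
	nodes := []string{"node-1", "node-2"}
	configs := []priorityConfig{
		// Hypothetical scores; real priorities derive them from pod and node state.
		{name: "toyLeastRequested", mapFn: func(n string) int { return map[string]int{"node-1": 8, "node-2": 5}[n] }, weight: 1},
		{name: "toySelectorSpread", mapFn: func(n string) int { return map[string]int{"node-1": 3, "node-2": 10}[n] }, weight: 2},
	}

	// "Map" phase: one raw score per (priority function, node) pair.
	results := make([][]hostPriority, len(configs))
	for i, cfg := range configs {
		results[i] = make([]hostPriority, len(nodes))
		for j, n := range nodes {
			results[i][j] = hostPriority{Host: n, Score: cfg.mapFn(n)}
		}
	}

	// Combine: weighted sum per node, mirroring the "Summarize all scores" step.
	final := make([]hostPriority, 0, len(nodes))
	for j, n := range nodes {
		total := 0
		for i, cfg := range configs {
			total += results[i][j].Score * cfg.weight
		}
		final = append(final, hostPriority{Host: n, Score: total})
	}
	fmt.Println(final) // [{node-1 14} {node-2 25}] -> node-2 is preferred
}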

Input parameters:

  • pod
  • nodeNameToInfo
  • meta interface{}
  • priorityConfigs
  • nodes
  • extenders

Output:

  • HostPriorityList: a list recording each node's score.

HostPriority is defined as follows:

// HostPriority represents the priority of scheduling to a particular host, higher priority is better.
type HostPriority struct {
	// Name of the host
	Host string
	// Score associated with the host
	Score int
}

The full code of PrioritizeNodes is as follows:

This code is located in pkg/scheduler/core/generic_scheduler.go.

// PrioritizeNodes prioritizes the nodes by running the individual priority functions in parallel.
// Each priority function is expected to set a score of 0-10
// 0 is the lowest priority score (least preferred node) and 10 is the highest
// Each priority function can also have its own weight
// The node scores returned by the priority function are multiplied by the weights to get weighted scores
// All scores are finally combined (added) to get the total weighted scores of all nodes
func PrioritizeNodes(
	pod *v1.Pod,
	nodeNameToInfo map[string]*schedulercache.NodeInfo,
	meta interface{},
	priorityConfigs []algorithm.PriorityConfig,
	nodes []*v1.Node,
	extenders []algorithm.SchedulerExtender,
) (schedulerapi.HostPriorityList, error) {
	// If no priority configs are provided, then the EqualPriority function is applied
	// This is required to generate the priority list in the required format
	if len(priorityConfigs) == 0 && len(extenders) == 0 {
		result := make(schedulerapi.HostPriorityList, 0, len(nodes))
		for i := range nodes {
			hostPriority, err := EqualPriorityMap(pod, meta, nodeNameToInfo[nodes[i].Name])
			if err != nil {
				return nil, err
			}
			result = append(result, hostPriority)
		}
		return result, nil
	}

	var (
		mu   = sync.Mutex{}
		wg   = sync.WaitGroup{}
		errs []error
	)
	appendError := func(err error) {
		mu.Lock()
		defer mu.Unlock()
		errs = append(errs, err)
	}

	results := make([]schedulerapi.HostPriorityList, len(priorityConfigs), len(priorityConfigs))

	for i, priorityConfig := range priorityConfigs {
		if priorityConfig.Function != nil {
			// DEPRECATED
			wg.Add(1)
			go func(index int, config algorithm.PriorityConfig) {
				defer wg.Done()
				var err error
				results[index], err = config.Function(pod, nodeNameToInfo, nodes)
				if err != nil {
					appendError(err)
				}
			}(i, priorityConfig)
		} else {
			results[i] = make(schedulerapi.HostPriorityList, len(nodes))
		}
	}
	processNode := func(index int) {
		nodeInfo := nodeNameToInfo[nodes[index].Name]
		var err error
		for i := range priorityConfigs {
			if priorityConfigs[i].Function != nil {
				continue
			}
			results[i][index], err = priorityConfigs[i].Map(pod, meta, nodeInfo)
			if err != nil {
				appendError(err)
				return
			}
		}
	}
	workqueue.Parallelize(16, len(nodes), processNode)
	for i, priorityConfig := range priorityConfigs {
		if priorityConfig.Reduce == nil {
			continue
		}
		wg.Add(1)
		go func(index int, config algorithm.PriorityConfig) {
			defer wg.Done()
			if err := config.Reduce(pod, meta, nodeNameToInfo, results[index]); err != nil {
				appendError(err)
			}
			if glog.V(10) {
				for _, hostPriority := range results[index] {
					glog.Infof("%v -> %v: %v, Score: (%d)", pod.Name, hostPriority.Host, config.Name, hostPriority.Score)
				}
			}
		}(i, priorityConfig)
	}
	// Wait for all computations to be finished.
	wg.Wait()
	if len(errs) != 0 {
		return schedulerapi.HostPriorityList{}, errors.NewAggregate(errs)
	}

	// Summarize all scores.
	result := make(schedulerapi.HostPriorityList, 0, len(nodes))

	for i := range nodes {
		result = append(result, schedulerapi.HostPriority{Host: nodes[i].Name, Score: 0})
		for j := range priorityConfigs {
			result[i].Score += results[j][i].Score * priorityConfigs[j].Weight
		}
	}

	if len(extenders) != 0 && nodes != nil {
		combinedScores := make(map[string]int, len(nodeNameToInfo))
		for _, extender := range extenders {
			if !extender.IsInterested(pod) {
				continue
			}
			wg.Add(1)
			go func(ext algorithm.SchedulerExtender) {
				defer wg.Done()
				prioritizedList, weight, err := ext.Prioritize(pod, nodes)
				if err != nil {
					// Prioritization errors from extender can be ignored, let k8s/other extenders determine the priorities
					return
				}
				mu.Lock()
				for i := range *prioritizedList {
					host, score := (*prioritizedList)[i].Host, (*prioritizedList)[i].Score
					combinedScores[host] += score * weight
				}
				mu.Unlock()
			}(extender)
		}
		// wait for all go routines to finish
		wg.Wait()
		for i := range result {
			result[i].Score += combinedScores[result[i].Host]
		}
	}

	if glog.V(10) {
		for i := range result {
			glog.V(10).Infof("Host %s => Score %d", result[i].Host, result[i].Score)
		}
	}
	return result, nil
}

The following sections analyze PrioritizeNodes piece by piece.

3. EqualPriorityMap

If no priority functions and no extenders are provided, all nodes are given the same priority, i.e. a score of 1, and the result is returned directly. (In practice the priority function list is normally not empty.)

// If no priority configs are provided, then the EqualPriority function is applied
// This is required to generate the priority list in the required format
if len(priorityConfigs) == 0 && len(extenders) == 0 {
	result := make(schedulerapi.HostPriorityList, 0, len(nodes))
	for i := range nodes {
		hostPriority, err := EqualPriorityMap(pod, meta, nodeNameToInfo[nodes[i].Name])
		if err != nil {
			return nil, err
		}
		result = append(result, hostPriority)
	}
	return result, nil
}

EqualPriorityMap is implemented as follows:

// EqualPriorityMap is a prioritizer function that gives an equal weight of one to all nodes
func EqualPriorityMap(_ *v1.Pod, _ interface{}, nodeInfo *schedulercache.NodeInfo) (schedulerapi.HostPriority, error) {
	node := nodeInfo.Node()
	if node == nil {
		return schedulerapi.HostPriority{}, fmt.Errorf("node not found")
	}
	return schedulerapi.HostPriority{
		Host:  node.Name,
		Score: 1,
	}, nil
}

4. processNode

processNode looks up the node's info by index and invokes the previously registered priority functions (here, their map functions) against the pod and that node, writing the node's scores into the results list. processNode is run in parallel via workqueue.Parallelize. (It plays a role similar to checkNode, used in the predicate logic findNodesThatFit.)

The priority functions are recorded in priorityConfigs; each kind of priority function consists of a PriorityMapFunction and a PriorityReduceFunction. For how priority functions are registered, see registerAlgorithmProvider.

processNode := func(index int) {
	nodeInfo := nodeNameToInfo[nodes[index].Name]
	var err error
	for i := range priorityConfigs {
		if priorityConfigs[i].Function != nil {
			continue
		}
		results[i][index], err = priorityConfigs[i].Map(pod, meta, nodeInfo)
		if err != nil {
			appendError(err)
			return
		}
	}
}
// Run processNode in parallel
workqueue.Parallelize(16, len(nodes), processNode)
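
workqueue.Parallelize fans processNode out across a fixed number of worker goroutines (16 here), one call per node index. The helper below is only a simplified, self-contained sketch of that idea, not the actual client-go implementation:

package main

import (
	"fmt"
	"sync"
)

// parallelize runs doWorkPiece(0 .. pieces-1), using at most `workers` goroutines.
func parallelize(workers, pieces int, doWorkPiece func(piece int)) {
	toProcess := make(chan int, pieces)
	for i := 0; i < pieces; i++ {
		toProcess <- i
	}
	close(toProcess)

	var wg sync.WaitGroup
	wg.Add(workers)
	for i := 0; i < workers; i++ {
		go func() {
			defer wg.Done()
			for piece := range toProcess {
				doWorkPiece(piece)
			}
		}()
	}
	wg.Wait()
}

func main() {
	nodes := []string{"node-1", "node-2", "node-3"}
	scores := make([]int, len(nodes))
	// Each piece writes to its own index, so no extra locking is needed here.
	parallelize(2, len(nodes), func(i int) {
		scores[i] = len(nodes[i]) // placeholder for a real map function
	})
	fmt.Println(scores) // [6 6 6]
}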

priorityConfigs is a list of PriorityConfig, defined as follows.

Core fields:

  • Map: PriorityMapFunction
  • Reduce: PriorityReduceFunction

// PriorityConfig is a config used for a priority function.
type PriorityConfig struct {
	Name   string
	Map    PriorityMapFunction   
	Reduce PriorityReduceFunction
	// TODO: Remove it after migrating all functions to
	// Map-Reduce pattern.
	Function PriorityFunction
	Weight   int
}

The processing logic of concrete priority functions is analyzed below, using NewSelectorSpreadPriority as the example.

5. PriorityMapFunction

PriorityMapFunction computes the per-node result for a given node.

PriorityMapFunction is defined as follows:

// PriorityMapFunction is a function that computes per-node results for a given node.
// TODO: Figure out the exact API of this method.
// TODO: Change interface{} to a specific type.
type PriorityMapFunction func(pod *v1.Pod, meta interface{}, nodeInfo *schedulercache.NodeInfo) (schedulerapi.HostPriority, error)

PriorityMapFunction is invoked inside processNode, as follows:

results[i][index], err = priorityConfigs[i].Map(pod, meta, nodeInfo)

The map function of NewSelectorSpreadPriority, CalculateSpreadPriorityMap, is analyzed below.

6. PriorityReduceFunction

PriorityReduceFunction aggregates the per-node results and computes the final scores for all nodes.

PriorityReduceFunction is defined as follows:

// PriorityReduceFunction is a function that aggregated per-node results and computes
// final scores for all nodes.
// TODO: Figure out the exact API of this method.
// TODO: Change interface{} to a specific type.
type PriorityReduceFunction func(pod *v1.Pod, meta interface{}, nodeNameToInfo map[string]*schedulercache.NodeInfo, result schedulerapi.HostPriorityList) error
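
A reduce function typically rescales the raw per-node values produced in the map phase onto the 0-10 range. The snippet below is a minimal standalone sketch of that pattern; the hostPriority type and the normalizeReduce name are invented here, not the scheduler's own helpers:

package main

import "fmt"

// Toy stand-in for schedulerapi.HostPriority.
type hostPriority struct {
	Host  string
	Score int
}

const maxPriority = 10 // schedulerapi.MaxPriority plays this role in the real code

// normalizeReduce rescales raw scores so that the largest becomes maxPriority
// and the rest are scaled proportionally.
func normalizeReduce(result []hostPriority) {
	maxScore := 0
	for _, hp := range result {
		if hp.Score > maxScore {
			maxScore = hp.Score
		}
	}
	if maxScore == 0 {
		return // nothing to rescale; avoid division by zero
	}
	for i := range result {
		result[i].Score = maxPriority * result[i].Score / maxScore
	}
}

func main() {
	result := []hostPriority{{"node-1", 30}, {"node-2", 15}, {"node-3", 5}}
	normalizeReduce(result)
	fmt.Println(result) // [{node-1 10} {node-2 5} {node-3 1}]
}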

PrioritizeNodes invokes the reduce functions as follows:

for i, priorityConfig := range priorityConfigs {
	if priorityConfig.Reduce == nil {
		continue
	}
	wg.Add(1)
	go func(index int, config algorithm.PriorityConfig) {
		defer wg.Done()
		if err := config.Reduce(pod, meta, nodeNameToInfo, results[index]); err != nil {
			appendError(err)
		}
		if glog.V(10) {
			for _, hostPriority := range results[index] {
				glog.Infof("%v -> %v: %v, Score: (%d)", pod.Name, hostPriority.Host, config.Name, hostPriority.Score)
			}
		}
	}(i, priorityConfig)
}

The reduce function of NewSelectorSpreadPriority, CalculateSpreadPriorityReduce, is analyzed below.

7. Summarize all scores

First wait for all computations to finish, then combine the weighted scores.

// Wait for all computations to be finished.
wg.Wait()
if len(errs) != 0 {
	return schedulerapi.HostPriorityList{}, errors.NewAggregate(errs)
}

Compute each node's total weighted score (the sum of each priority function's score multiplied by its weight).

// Summarize all scores.
result := make(schedulerapi.HostPriorityList, 0, len(nodes))

for i := range nodes {
	result = append(result, schedulerapi.HostPriority{Host: nodes[i].Name, Score: 0})
	for j := range priorityConfigs {
		result[i].Score += results[j][i].Score * priorityConfigs[j].Weight
	}
}

If scheduler extenders are configured, their weighted scores are added on top.

if len(extenders) != 0 && nodes != nil {
	combinedScores := make(map[string]int, len(nodeNameToInfo))
	for _, extender := range extenders {
		if !extender.IsInterested(pod) {
			continue
		}
		wg.Add(1)
		go func(ext algorithm.SchedulerExtender) {
			defer wg.Done()
			prioritizedList, weight, err := ext.Prioritize(pod, nodes)
			if err != nil {
				// Prioritization errors from extender can be ignored, let k8s/other extenders determine the priorities
				return
			}
			mu.Lock()
			for i := range *prioritizedList {
				host, score := (*prioritizedList)[i].Host, (*prioritizedList)[i].Score
				combinedScores[host] += score * weight
			}
			mu.Unlock()
		}(extender)
	}
	// wait for all go routines to finish
	wg.Wait()
	for i := range result {
		result[i].Score += combinedScores[result[i].Host]
	}
}
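
Extender scores are accumulated in a separate combinedScores map and only merged into result at the end. The toy snippet below illustrates that merge with invented hosts, weights, and scores:

package main

import "fmt"

func main() {
	// Totals already computed by the built-in priority functions (invented numbers).
	result := map[string]int{"node-1": 14, "node-2": 25}

	// Hypothetical output of two extenders: per-host scores plus each extender's weight.
	extenderScores := []struct {
		weight int
		scores map[string]int
	}{
		{weight: 1, scores: map[string]int{"node-1": 5, "node-2": 2}},
		{weight: 3, scores: map[string]int{"node-1": 1}}, // an extender need not score every node
	}

	// Accumulate weighted extender scores per host, then add them to the totals.
	combinedScores := map[string]int{}
	for _, ext := range extenderScores {
		for host, score := range ext.scores {
			combinedScores[host] += score * ext.weight
		}
	}
	for host := range result {
		result[host] += combinedScores[host]
	}
	fmt.Println(result) // map[node-1:22 node-2:27]
}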

8. NewSelectorSpreadPriority

The following analysis uses the NewSelectorSpreadPriority priority function as an example; other important priority functions will be covered in dedicated follow-up articles.

NewSelectorSpreadPriority tries to spread pods belonging to the same service, RC, RS, or StatefulSet across different nodes as much as possible.

The registration code for this function is as follows:

This code is located in pkg/scheduler/algorithmprovider/defaults/defaults.go.

// ServiceSpreadingPriority is a priority config factory that spreads pods by minimizing
// the number of pods (belonging to the same service) on the same node.
// Register the factory so that it's available, but do not include it as part of the default priorities
// Largely replaced by "SelectorSpreadPriority", but registered for backward compatibility with 1.0
factory.RegisterPriorityConfigFactory(
	"ServiceSpreadingPriority",
	factory.PriorityConfigFactory{
		MapReduceFunction: func(args factory.PluginFactoryArgs) (algorithm.PriorityMapFunction, algorithm.PriorityReduceFunction) {
			return priorities.NewSelectorSpreadPriority(args.ServiceLister, algorithm.EmptyControllerLister{}, algorithm.EmptyReplicaSetLister{}, algorithm.EmptyStatefulSetLister{})
		},
		Weight: 1,
	},
)

NewSelectorSpreadPriority is implemented as follows:

This code is located in pkg/scheduler/algorithm/priorities/selector_spreading.go.

// NewSelectorSpreadPriority creates a SelectorSpread.
func NewSelectorSpreadPriority(
	serviceLister algorithm.ServiceLister,
	controllerLister algorithm.ControllerLister,
	replicaSetLister algorithm.ReplicaSetLister,
	statefulSetLister algorithm.StatefulSetLister) (algorithm.PriorityMapFunction, algorithm.PriorityReduceFunction) {
	selectorSpread := &SelectorSpread{
		serviceLister:     serviceLister,
		controllerLister:  controllerLister,
		replicaSetLister:  replicaSetLister,
		statefulSetLister: statefulSetLister,
	}
	return selectorSpread.CalculateSpreadPriorityMap, selectorSpread.CalculateSpreadPriorityReduce
}

NewSelectorSpreadPriority consists of a map function and a reduce function: CalculateSpreadPriorityMap and CalculateSpreadPriorityReduce.

8.1. CalculateSpreadPriorityMap

CalculateSpreadPriorityMap helps spread pods of the same service, RC, RS, or StatefulSet across different nodes. When scheduling a pod, it first finds the services, RSs, RCs, or StatefulSets that match the pod, then finds existing pods matching their selectors, and favors the nodes with the fewest such pods.

The basic flow is:

  1. Find the selectors of the services, RSs, RCs, and StatefulSets that match the pod.
  2. Iterate over all pods on the current node and use the number of existing pods matched by those selectors as the node's raw score. At this stage a higher score means more matching pods, i.e. a less suitable node; the reduce phase rescales it onto the 10-point scale so that a higher score means a more suitable node. (A standalone sketch of the counting step follows this list.)
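
The counting step boils down to matching each existing pod's labels against the collected selectors. A standalone sketch using the k8s.io/apimachinery/pkg/labels package (the selector and pod labels below are invented):

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/labels"
)

func main() {
	// Selector derived from the service/RC/RS/StatefulSet that owns the pod being scheduled.
	selector := labels.SelectorFromSet(labels.Set{"app": "web"})

	// Labels of pods already running on the node under evaluation.
	podsOnNode := []map[string]string{
		{"app": "web", "tier": "frontend"},
		{"app": "db"},
		{"app": "web"},
	}

	count := 0
	for _, podLabels := range podsOnNode {
		if selector.Matches(labels.Set(podLabels)) {
			count++
		}
	}
	fmt.Println(count) // 2 matching pods -> this node's raw map-phase score is 2
}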

This code is located in pkg/scheduler/algorithm/priorities/selector_spreading.go.

// CalculateSpreadPriorityMap spreads pods across hosts, considering pods
// belonging to the same service,RC,RS or StatefulSet.
// When a pod is scheduled, it looks for services, RCs,RSs and StatefulSets that match the pod,
// then finds existing pods that match those selectors.
// It favors nodes that have fewer existing matching pods.
// i.e. it pushes the scheduler towards a node where there's the smallest number of
// pods which match the same service, RC,RSs or StatefulSets selectors as the pod being scheduled.
func (s *SelectorSpread) CalculateSpreadPriorityMap(pod *v1.Pod, meta interface{}, nodeInfo *schedulercache.NodeInfo) (schedulerapi.HostPriority, error) {
	var selectors []labels.Selector
	node := nodeInfo.Node()
	if node == nil {
		return schedulerapi.HostPriority{}, fmt.Errorf("node not found")
	}

	priorityMeta, ok := meta.(*priorityMetadata)
	if ok {
		selectors = priorityMeta.podSelectors
	} else {
		selectors = getSelectors(pod, s.serviceLister, s.controllerLister, s.replicaSetLister, s.statefulSetLister)
	}

	if len(selectors) == 0 {
		return schedulerapi.HostPriority{
			Host:  node.Name,
			Score: int(0),
		}, nil
	}

	count := int(0)
	for _, nodePod := range nodeInfo.Pods() {
		if pod.Namespace != nodePod.Namespace {
			continue
		}
		// When we are replacing a failed pod, we often see the previous
		// deleted version while scheduling the replacement.
		// Ignore the previous deleted version for spreading purposes
		// (it can still be considered for resource restrictions etc.)
		if nodePod.DeletionTimestamp != nil {
			glog.V(4).Infof("skipping pending-deleted pod: %s/%s", nodePod.Namespace, nodePod.Name)
			continue
		}
		for _, selector := range selectors {
			if selector.Matches(labels.Set(nodePod.ObjectMeta.Labels)) {
				count++
				break
			}
		}
	}
	return schedulerapi.HostPriority{
		Host:  node.Name,
		Score: int(count),
	}, nil
}

Analyzed piece by piece below:

First, obtain the selectors:

selectors = getSelectors(pod, s.serviceLister, s.controllerLister, s.replicaSetLister, s.statefulSetLister)

Then count the pods on the node that match the selectors and record the count as the node's score. This is not the node's final score, only an intermediate value that the reduce phase will rescale.

count := int(0)
for _, nodePod := range nodeInfo.Pods() {
	...
	for _, selector := range selectors {
		if selector.Matches(labels.Set(nodePod.ObjectMeta.Labels)) {
			count++
			break
		}
	}
}

8.2. CalculateSpreadPriorityReduce

CalculateSpreadPriorityReduce converts the raw matching-pod counts into a 0-10 score for each node: nodes with fewer existing matching pods get higher scores and are therefore more likely to be scheduled to.

The basic flow is:

  1. Record the highest raw score among all nodes, i.e. the largest number of matching pods found on any node.
  2. Iterate over all nodes and rescale to the 10-point scale: score = MaxPriority × (maxCount − nodeCount) / maxCount, where maxCount is the largest matching-pod count and nodeCount is the current node's count. A higher score now means fewer matching pods on that node, so it is more likely to be chosen, which achieves the goal of spreading pods with the same selector across nodes. (When nodes carry zone labels, a zone-level score is blended in as well; see the worked example after this list.)
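
A worked example of the rescaling, as a standalone sketch with invented counts. The zoneWeighting constant below reflects my reading of the value used by SelectorSpread (2.0/3.0); treat it as an assumption:

package main

import "fmt"

const (
	maxPriority   = 10.0
	zoneWeighting = 2.0 / 3.0 // assumed value of the zoneWeighting constant in selector_spreading.go
)

func main() {
	// Raw map-phase counts of matching pods (invented numbers).
	maxCountByNodeName := 4.0 // busiest node has 4 matching pods
	nodeCount := 1.0          // node being scored has 1 matching pod
	maxCountByZone := 6.0     // busiest zone has 6 matching pods
	zoneCount := 2.0          // this node's zone has 2 matching pods

	// Node-level score: fewer matching pods -> higher score.
	fScore := maxPriority * (maxCountByNodeName - nodeCount) / maxCountByNodeName // 10 * 3/4 = 7.5

	// If the node carries zone labels, blend in a zone-level score as well.
	zoneScore := maxPriority * (maxCountByZone - zoneCount) / maxCountByZone // 10 * 4/6 ≈ 6.67
	fScore = fScore*(1.0-zoneWeighting) + zoneWeighting*zoneScore            // 2.5 + 4.44 ≈ 6.94

	fmt.Println(int(fScore)) // final integer score for this node: 6
}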

This code is located in pkg/scheduler/algorithm/priorities/selector_spreading.go.

// CalculateSpreadPriorityReduce calculates the source of each node
// based on the number of existing matching pods on the node
// where zone information is included on the nodes, it favors nodes
// in zones with fewer existing matching pods.
func (s *SelectorSpread) CalculateSpreadPriorityReduce(pod *v1.Pod, meta interface{}, nodeNameToInfo map[string]*schedulercache.NodeInfo, result schedulerapi.HostPriorityList) error {
	countsByZone := make(map[string]int, 10)
	maxCountByZone := int(0)
	maxCountByNodeName := int(0)

	for i := range result {
		if result[i].Score > maxCountByNodeName {
			maxCountByNodeName = result[i].Score
		}
		zoneID := utilnode.GetZoneKey(nodeNameToInfo[result[i].Host].Node())
		if zoneID == "" {
			continue
		}
		countsByZone[zoneID] += result[i].Score
	}

	for zoneID := range countsByZone {
		if countsByZone[zoneID] > maxCountByZone {
			maxCountByZone = countsByZone[zoneID]
		}
	}

	haveZones := len(countsByZone) != 0

	maxCountByNodeNameFloat64 := float64(maxCountByNodeName)
	maxCountByZoneFloat64 := float64(maxCountByZone)
	MaxPriorityFloat64 := float64(schedulerapi.MaxPriority)

	for i := range result {
		// initializing to the default/max node score of maxPriority
		fScore := MaxPriorityFloat64
		if maxCountByNodeName > 0 {
			fScore = MaxPriorityFloat64 * (float64(maxCountByNodeName-result[i].Score) / maxCountByNodeNameFloat64)
		}
		// If there is zone information present, incorporate it
		if haveZones {
			zoneID := utilnode.GetZoneKey(nodeNameToInfo[result[i].Host].Node())
			if zoneID != "" {
				zoneScore := MaxPriorityFloat64
				if maxCountByZone > 0 {
					zoneScore = MaxPriorityFloat64 * (float64(maxCountByZone-countsByZone[zoneID]) / maxCountByZoneFloat64)
				}
				fScore = (fScore * (1.0 - zoneWeighting)) + (zoneWeighting * zoneScore)
			}
		}
		result[i].Score = int(fScore)
		if glog.V(10) {
			glog.Infof(
				"%v -> %v: SelectorSpreadPriority, Score: (%d)", pod.Name, result[i].Host, int(fScore),
			)
		}
	}
	return nil
}

Analyzed piece by piece below:

First, find the largest number of matching pods on any node (per-zone counts are accumulated at the same time):

for i := range result {
	if result[i].Score > maxCountByNodeName {
		maxCountByNodeName = result[i].Score
	}
	zoneID := utilnode.GetZoneKey(nodeNameToInfo[result[i].Host].Node())
	if zoneID == "" {
		continue
	}
	countsByZone[zoneID] += result[i].Score
}

Then iterate over all nodes and rescale the scores proportionally onto the 10-point scale:

for i := range result {
	// initializing to the default/max node score of maxPriority
	fScore := MaxPriorityFloat64
	if maxCountByNodeName > 0 {
		fScore = MaxPriorityFloat64 * (float64(maxCountByNodeName-result[i].Score) / maxCountByNodeNameFloat64)
	}
  ...
}  

9. Summary

Prioritization selects the best node from those that satisfied the predicates. PrioritizeNodes ultimately returns a list recording each node's score.

9.1. PrioritizeNodes

The main flow is:

  1. If no priority functions and no extenders are configured, give every node the same score and return immediately.
  2. Run each priority function's map function against every node to produce raw scores.
  3. Run the corresponding reduce function over those map results to compute each function's final per-node score.
  4. Combine the per-function scores into a weighted sum using each priority function's weight.

Each kind of priority function includes both a map function and a reduce function.

9.2. NewSelectorSpreadPriority

NewSelectorSpreadPriority was analyzed as the example priority function; it tries to spread pods belonging to the same service, RS, RC, or StatefulSet across different nodes. It consists of a map function and a reduce function, summarized below.

9.2.1. CalculateSpreadPriorityMap

The basic flow is:

  1. Find the selectors of the services, RSs, RCs, and StatefulSets that match the pod.
  2. Iterate over all pods on the current node and use the number of existing pods matched by those selectors as the node's raw score; a higher score means more matching pods and a less suitable node, and the reduce phase rescales it onto the 10-point scale so that a higher score means a more suitable node.

9.2.2. CalculateSpreadPriorityReduce

The basic flow is:

  1. Record the highest raw score among all nodes, i.e. the largest number of matching pods found on any node.
  2. Iterate over all nodes and rescale to the 10-point scale: score = MaxPriority × (maxCount − nodeCount) / maxCount. A higher score means fewer matching pods on that node, so it is more likely to be chosen, spreading pods with the same selector across different nodes.

