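Simply put, scheduling in k8s means assigning a suitable node to each pod. It happens in two steps: predicates (filtering) and priorities (scoring). For the predicate step, k8s registers a set of predicate algorithms by default, and a node passes filtering only if it passes every one of them, i.e., only if the pod could actually run on it. Only those nodes move on to the priority step, so filtering usually eliminates a large share of the nodes. The priority step then picks the best of the remaining candidates: each node is run through all the priority algorithms to obtain a weighted score, and the node with the highest score is the best node, the one the pod will finally run on. This article walks through the code logic of the scheduler module.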
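To make the two-step flow concrete, here is a minimal Go sketch of filter-then-score. Everything in it is hypothetical: the Node and Pod types, the single predicate fitsResources and the single priority freeCPUAfter are toys rather than the scheduler's real types or algorithms, and ties here simply go to the first node seen.

```go
package main

import "fmt"

// Node and Pod are hypothetical, heavily simplified stand-ins for the real types.
type Node struct {
	Name       string
	FreeCPU    int64 // millicores still unallocated
	FreeMemory int64 // bytes still unallocated
}

type Pod struct {
	CPU    int64 // requested millicores
	Memory int64 // requested bytes
}

// Predicate answers "could this pod run on this node at all?".
type Predicate func(p Pod, n Node) bool

// Priority scores a node that already passed every predicate; Weight scales it.
type Priority struct {
	Weight int64
	Score  func(p Pod, n Node) int64
}

// fitsResources is a toy predicate: the requests must fit in what is free.
func fitsResources(p Pod, n Node) bool {
	return p.CPU <= n.FreeCPU && p.Memory <= n.FreeMemory
}

// freeCPUAfter is a toy score: CPU left over after placing the pod.
func freeCPUAfter(p Pod, n Node) int64 { return n.FreeCPU - p.CPU }

// schedule keeps only nodes that pass all predicates, sums the weighted priority
// scores of the survivors and returns the best node ("" if none survive).
func schedule(p Pod, nodes []Node, preds []Predicate, prios []Priority) string {
	best, bestScore, found := "", int64(0), false
	for _, n := range nodes {
		fits := true
		for _, pred := range preds {
			if !pred(p, n) {
				fits = false // one failed predicate eliminates the node
				break
			}
		}
		if !fits {
			continue
		}
		var total int64
		for _, pr := range prios {
			total += pr.Weight * pr.Score(p, n)
		}
		if !found || total > bestScore { // ties keep the first node seen
			best, bestScore, found = n.Name, total, true
		}
	}
	return best
}

func main() {
	nodes := []Node{
		{Name: "node-a", FreeCPU: 500, FreeMemory: 1 << 30}, // not enough CPU
		{Name: "node-b", FreeCPU: 2000, FreeMemory: 1 << 30},
		{Name: "node-c", FreeCPU: 4000, FreeMemory: 1 << 20}, // not enough memory
	}
	pod := Pod{CPU: 1000, Memory: 256 << 20}
	preds := []Predicate{fitsResources}
	prios := []Priority{{Weight: 1, Score: freeCPUAfter}}
	fmt.Println(schedule(pod, nodes, preds, prios)) // node-b
}
```

node-a and node-c are eliminated during filtering, so node-b wins without competition even though node-c has the most free CPU.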
- The scheduler's entry function is in E:\dev\golang\k8s\src\k8s.io\kubernetes\cmd\kube-scheduler\scheduler.go (that is, cmd/kube-scheduler/scheduler.go in the kubernetes repository).
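When command.Execute() on line 47 in the figure above runs, the function assigned to the command's Run field is executed, i.e., line 85 in the figure below:
That function in turn calls runCommand on line 86, which starts a scheduler.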
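The Execute, Run, runCommand chain follows the usual cobra pattern: a command's behaviour lives in a function stored in its Run field, and Execute ends up calling it. The sketch below imitates that shape using only the standard library. Command, newSchedulerCommand and this runCommand are hypothetical stand-ins; unlike cobra's Execute, which reads the process arguments and parses flags itself, this toy receives the arguments explicitly.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Command is a hypothetical, stripped-down imitation of cobra.Command:
// the behaviour lives in the function-valued Run field.
type Command struct {
	Use string
	Run func(cmd *Command, args []string)
}

// Execute hands control to whatever function was stored in Run.
func (c *Command) Execute(args []string) error {
	if c.Run == nil {
		return errors.New("command has no Run function")
	}
	c.Run(c, args)
	return nil
}

// runCommand stands in for the real runCommand, which builds and starts the scheduler.
func runCommand(cmd *Command, args []string) error {
	fmt.Printf("%s: starting scheduler with args %v\n", cmd.Use, args)
	return nil
}

// newSchedulerCommand wires Run to runCommand.
func newSchedulerCommand() *Command {
	return &Command{
		Use: "kube-scheduler",
		Run: func(cmd *Command, args []string) {
			if err := runCommand(cmd, args); err != nil {
				fmt.Fprintln(os.Stderr, err)
				os.Exit(1)
			}
		},
	}
}

func main() {
	command := newSchedulerCommand()
	if err := command.Execute(os.Args[1:]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```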
Next, step into runCommand. The important code is:
Continuing further down:
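And further still:
Notice that before ApplyFeatureGates is called, an init function has already run. This follows from Go's rules: a package's init() functions run automatically when the package is initialized, before main starts, so the init() in the package holding the default algorithms has finished long before runCommand reaches ApplyFeatureGates. That init function is the focus of this article.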
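The ordering guarantee relied on here is a Go language rule: a package's init functions run after its package-level variables are initialized and before main starts, and an imported package is initialized before the package importing it. The single-file sketch below shows the effect; registry and applyFeatureGates are hypothetical stand-ins for the scheduler's algorithm maps and ApplyFeatureGates.

```go
package main

import "fmt"

// registry plays the role of the scheduler's global algorithm maps.
var registry = map[string]bool{}

// order records the sequence in which things happen.
var order []string

// init runs automatically before main, just like the init() that registers
// the scheduler's default algorithms.
func init() {
	registry["PodFitsResources"] = true
	order = append(order, "init")
}

// applyFeatureGates is a hypothetical stand-in: by the time it runs,
// init has already filled the registry.
func applyFeatureGates() int {
	order = append(order, "applyFeatureGates")
	return len(registry)
}

func main() {
	fmt.Println(applyFeatureGates(), order) // 1 [init applyFeatureGates]
}
```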
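- init first calls registerAlgorithmProvider with defaultPredicates() and defaultPriorities() as arguments, which clearly registers the default predicate and priority algorithms. Starting with the predicates, there are 14 by default: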
NoVolumeZoneConflictPred // Fit is determined by volume zone requirements.
MaxEBSVolumeCountPred // Fit is determined by whether or not there would be too many AWS EBS volumes attached to the node
MaxGCEPDVolumeCountPred // Fit is determined by whether or not there would be too many GCE PD volumes attached to the node
MaxAzureDiskVolumeCountPred // Fit is determined by whether or not there would be too many Azure Disk volumes attached to the node
MaxCSIVolumeCountPred
MatchInterPodAffinityPred // Fit is determined by affinity between pods (inter-pod affinity).
NoDiskConflictPred // Fit is determined by non-conflicting disk volumes.
GeneralPred // GeneralPredicates are the predicates that are enforced by all Kubernetes components
CheckNodeMemoryPressurePred // Fit is determined by node memory pressure condition.
CheckNodeDiskPressurePred // Fit is determined by node disk pressure condition.
CheckNodePIDPressurePred // Fit is determined by node pid pressure condition.
CheckNodeConditionPred // Fit is determined by node conditions: not ready, network unavailable or out of disk.
PodToleratesNodeTaintsPred // Fit is determined based on whether a pod can tolerate all of the node's taints
CheckVolumeBindingPred // Fit is determined by volume topology requirements.
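GeneralPred deserves a closer look. In the scheduler source of roughly this vintage (I am assuming around v1.13), GeneralPredicates bundles PodFitsResources with PodFitsHost, PodFitsHostPorts and PodMatchNodeSelector, runs all of them and collects every failure reason rather than stopping at the first. The sketch below mimics that structure with hypothetical, heavily simplified types and function names; the real node-selector check also evaluates node affinity, and the real resource check covers far more than CPU.

```go
package main

import "fmt"

// Pod and Node are hypothetical simplifications holding only what the checks need.
type Pod struct {
	NodeName  string            // like spec.nodeName; empty means "any node"
	Selector  map[string]string // like spec.nodeSelector
	HostPorts []int
	CPU       int64
}

type Node struct {
	Name      string
	Labels    map[string]string
	UsedPorts map[int]bool
	FreeCPU   int64
}

// check returns "" when the pod fits, otherwise a failure reason.
type check func(p Pod, n Node) string

func fitsResources(p Pod, n Node) string {
	if p.CPU > n.FreeCPU {
		return "insufficient cpu"
	}
	return ""
}

// fitsHost only matters when the pod names a node, so it usually passes.
func fitsHost(p Pod, n Node) string {
	if p.NodeName != "" && p.NodeName != n.Name {
		return "node name mismatch"
	}
	return ""
}

func fitsHostPorts(p Pod, n Node) string {
	for _, port := range p.HostPorts {
		if n.UsedPorts[port] {
			return fmt.Sprintf("host port %d in use", port)
		}
	}
	return ""
}

func matchNodeSelector(p Pod, n Node) string {
	for k, v := range p.Selector {
		if n.Labels[k] != v {
			return "node selector mismatch"
		}
	}
	return ""
}

// generalPredicates runs every bundled check and collects all failure reasons.
func generalPredicates(p Pod, n Node) (bool, []string) {
	var reasons []string
	for _, c := range []check{fitsResources, fitsHost, fitsHostPorts, matchNodeSelector} {
		if r := c(p, n); r != "" {
			reasons = append(reasons, r)
		}
	}
	return len(reasons) == 0, reasons
}

func main() {
	node := Node{Name: "node-1", Labels: map[string]string{"disk": "ssd"}, UsedPorts: map[int]bool{80: true}, FreeCPU: 1000}
	pod := Pod{Selector: map[string]string{"disk": "hdd"}, HostPorts: []int{80}, CPU: 500}
	fmt.Println(generalPredicates(pod, node)) // false [host port 80 in use node selector mismatch]
}
```

Collecting every reason, instead of returning on the first failure, is what lets the scheduler report all the ways a node failed for a pod.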
There are 8 default priorities:
SelectorSpreadPriority
InterPodAffinityPriority
LeastRequestedPriority
BalancedResourceAllocation
NodePreferAvoidPodsPriority
NodeAffinityPriority
TaintTolerationPriority
ImageLocalityPriority
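As a worked example of how a priority turns into a score, here is a Go sketch of the formulas behind LeastRequestedPriority and BalancedResourceAllocation as I understand them in releases of roughly this vintage. It assumes the 0 to 10 score range (MaxPriority = 10) used then; newer releases use a different range. The function names are hypothetical, and the requested values are assumed to already include the incoming pod's requests.

```go
package main

import (
	"fmt"
	"math"
)

// maxPriority mirrors the 0-10 score range assumed for this era.
const maxPriority = 10

// leastRequestedScore: the larger the share of capacity left free, the higher the score.
func leastRequestedScore(requested, capacity int64) int64 {
	if capacity == 0 || requested > capacity {
		return 0
	}
	return (capacity - requested) * maxPriority / capacity
}

// leastRequested averages the CPU and memory scores.
func leastRequested(cpuReq, cpuCap, memReq, memCap int64) int64 {
	return (leastRequestedScore(cpuReq, cpuCap) + leastRequestedScore(memReq, memCap)) / 2
}

func fraction(requested, capacity int64) float64 {
	if capacity == 0 {
		return 1
	}
	return float64(requested) / float64(capacity)
}

// balancedAllocation prefers nodes whose CPU and memory usage fractions are close.
func balancedAllocation(cpuReq, cpuCap, memReq, memCap int64) int64 {
	cpuFrac, memFrac := fraction(cpuReq, cpuCap), fraction(memReq, memCap)
	if cpuFrac >= 1 || memFrac >= 1 {
		return 0 // a node that would be full is never preferred
	}
	return int64((1 - math.Abs(cpuFrac-memFrac)) * maxPriority)
}

func main() {
	// Node with 4000m CPU and 8 GiB memory; 1000m and 2 GiB requested in total.
	fmt.Println(leastRequested(1000, 4000, 2<<30, 8<<30))     // 7
	fmt.Println(balancedAllocation(1000, 4000, 2<<30, 8<<30)) // 10
}
```

LeastRequested yields 7 here because integer division truncates 7.5, while BalancedResourceAllocation gives a perfect 10 since both resources are exactly 25% used. Each priority's result is then multiplied by its weight and summed into the node's final score.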
init then registers 5 more predicates:
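factory.RegisterFitPredicate("PodFitsPorts", predicates.PodFitsHostPorts) // kept only for backward compatibility: newer versions replaced the name PodFitsPorts with PodFitsHostPorts, registered on the next line
factory.RegisterFitPredicate(predicates.PodFitsHostPortsPred, predicates.PodFitsHostPorts) // ensures there are no host port conflicts
factory.RegisterFitPredicate(predicates.PodFitsResourcesPred, predicates.PodFitsResources) // ensures enough resources are available, including CPU, memory, GPU, etc.
factory.RegisterFitPredicate(predicates.HostNamePred, predicates.PodFitsHost) // checks whether the node name specified by the pod matches the current node; it only matters when the pod sets spec.nodeName, so it usually has no effect
factory.RegisterFitPredicate(predicates.MatchNodeSelectorPred, predicates.PodMatchNodeSelector) // checks whether the pod's node selector matches the node's labels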
and 4 more priorities:
ServiceSpreadingPriority
EqualPriority
MostRequestedPriority
RequestedToCapacityRatioPriority
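MostRequestedPriority, registered here but not part of the default set, inverts the LeastRequested idea: it favours nodes that would end up fuller, which packs pods tightly (as I recall, the source comments call it cluster-autoscaler friendly). A sketch under the same assumptions as before, a 0 to 10 score range and hypothetical function names:

```go
package main

import "fmt"

// maxPriority mirrors the assumed 0-10 score range.
const maxPriority = 10

// mostRequestedScore: the fuller the node would be after placing the pod,
// the higher the score (bin packing rather than spreading).
func mostRequestedScore(requested, capacity int64) int64 {
	if capacity == 0 || requested > capacity {
		return 0
	}
	return requested * maxPriority / capacity
}

// mostRequested averages the CPU and memory scores.
func mostRequested(cpuReq, cpuCap, memReq, memCap int64) int64 {
	return (mostRequestedScore(cpuReq, cpuCap) + mostRequestedScore(memReq, memCap)) / 2
}

func main() {
	fmt.Println(mostRequested(1000, 4000, 2<<30, 8<<30)) // 2 (lightly loaded)
	fmt.Println(mostRequested(3000, 4000, 6<<30, 8<<30)) // 7 (heavily loaded, preferred)
}
```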
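All the predicates registered above end up in the global map fitPredicateMap, keyed by algorithm name, with the algorithm as the value (strictly, a factory function that returns it).
All the priorities registered above end up in the global map priorityFunctionMap, keyed by algorithm name, with the algorithm as the value (strictly, its factory together with its weight).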
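The registration pattern can be sketched as a name-to-factory map. This is a simplified imitation, not the real factory package: the lowercase function names and the one-field PluginFactoryArgs are hypothetical, and the real registration code additionally guards the maps with a mutex and validates algorithm names. It also shows why registering both PodFitsPorts and PodFitsHostPorts is harmless: they are simply two keys pointing at the same function.

```go
package main

import (
	"fmt"
	"sort"
)

// Pod and Node are hypothetical placeholders for the real pod/node types.
type Pod struct{ Name string }
type Node struct{ Name string }

// FitPredicate decides whether a pod fits on a node.
type FitPredicate func(p Pod, n Node) bool

// PluginFactoryArgs is a hypothetical one-field stand-in for the shared state
// that the real factories receive.
type PluginFactoryArgs struct{ ClusterName string }

// FitPredicateFactory builds a predicate from the shared arguments.
type FitPredicateFactory func(args PluginFactoryArgs) FitPredicate

// fitPredicateMap is the global registry: algorithm name -> factory.
var fitPredicateMap = map[string]FitPredicateFactory{}

// registerFitPredicate wraps a ready-made predicate in a factory that ignores
// its arguments and stores it under name; re-registering a name overwrites it.
func registerFitPredicate(name string, pred FitPredicate) string {
	fitPredicateMap[name] = func(PluginFactoryArgs) FitPredicate { return pred }
	return name
}

// buildPredicates looks up each requested name and runs its factory.
func buildPredicates(names []string, args PluginFactoryArgs) (map[string]FitPredicate, error) {
	out := make(map[string]FitPredicate, len(names))
	for _, name := range names {
		factory, ok := fitPredicateMap[name]
		if !ok {
			return nil, fmt.Errorf("predicate %q is not registered", name)
		}
		out[name] = factory(args)
	}
	return out, nil
}

// podFitsHostPorts is a placeholder that always fits; the real check compares host ports.
func podFitsHostPorts(Pod, Node) bool { return true }

func main() {
	// One function registered under two names, like PodFitsPorts/PodFitsHostPorts.
	registerFitPredicate("PodFitsPorts", podFitsHostPorts)
	registerFitPredicate("PodFitsHostPorts", podFitsHostPorts)

	names := make([]string, 0, len(fitPredicateMap))
	for name := range fitPredicateMap {
		names = append(names, name)
	}
	sort.Strings(names)
	fmt.Println(names) // [PodFitsHostPorts PodFitsPorts]

	preds, err := buildPredicates(names, PluginFactoryArgs{ClusterName: "demo"})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(preds["PodFitsPorts"](Pod{Name: "web"}, Node{Name: "node-1"})) // true
}
```

Storing factories rather than finished functions lets algorithms that depend on cluster state be constructed later, once that state exists, while plain predicates are just wrapped in a factory that ignores its arguments.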
- Once init() has finished, we return to the main thread of the story: the call to ApplyFeatureGates.