kubelet Source Code Analysis (Part 3): startKubelet

The blog version of this article: https://www.huweihuang.com/kubernetes-notes/code-analysis/kubelet/startKubelet.html


The following code analysis is based on kubernetes v1.12.0.

This article analyzes startKubelet, mainly the kubelet.Run part, which initializes and runs the various managers. The execution logic of each manager and the pod lifecycle management logic will be analyzed in later articles.

Subsequent articles will analyze the code under pkg/kubelet by category.

Directory structure of pkg/kubelet:

kubelet
├── apis  # defines related interfaces
├── cadvisor # cadvisor
├── cm # ContainerManager, cpu manager, cgroup manager
├── config
├── configmap # configmap manager
├── container  # Runtime, ImageService
├── dockershim  # docker-related calls
├── eviction # eviction manager
├── images  # image manager
├── kubeletconfig
├── kuberuntime # core: kubeGenericRuntimeManager, runtime container operations
├── lifecycle
├── mountpod
├── network  # pod dns
├── nodelease
├── nodestatus  # MachineInfo, node-related information
├── pleg  # PodLifecycleEventGenerator
├── pod  # core: pod manager, mirror pod
├── preemption
├── qos  # resource quality of service (still fairly small)
├── remote # RemoteRuntimeService
├── server
├── stats # StatsProvider
├── status # status manager
├── types  # PodUpdate, PodOperation
├── volumemanager # VolumeManager
├── kubelet.go  # core: SyncHandler, most kubelet operations
├── kubelet_getters.go # various getters, e.g. directory getters: getRootDir, getPodsDir, getPluginsDir
├── kubelet_network.go
├── kubelet_network_linux.go
├── kubelet_node_status.go # registerWithAPIServer, initialNode, syncNodeStatus
├── kubelet_pods.go # core: pod create/update/delete/query operations, podKiller
├── kubelet_resources.go
├── kubelet_volumes.go # ListVolumesForPod, cleanupOrphanedPodDirs
├── oom_watcher.go  # OOMWatcher
├── pod_container_deletor.go
├── pod_workers.go # core: PodWorkers, UpdatePodOptions, syncPodOptions, managePodLoop
├── runonce.go  # RunOnce
├── runtime.go
...

1. startKubelet

The startKubelet function is located in cmd/kubelet/app/server.go. It starts and runs a kubelet; the logic that actually runs the kubelet lives in pkg/kubelet/kubelet.go.

It mainly does the following:

  1. Runs a kubelet, which executes the logic of the kubelet's various managers.
  2. Runs the kubelet server to start the listening service.

This code is located in cmd/kubelet/app/server.go:

func startKubelet(k kubelet.Bootstrap, podCfg *config.PodConfig, kubeCfg *kubeletconfiginternal.KubeletConfiguration, kubeDeps *kubelet.Dependencies, enableServer bool) {
	// start the kubelet
	go wait.Until(func() {
		k.Run(podCfg.Updates())
	}, 0, wait.NeverStop)

	// start the kubelet server
	if enableServer {
		go k.ListenAndServe(net.ParseIP(kubeCfg.Address), uint(kubeCfg.Port), kubeDeps.TLSOptions, kubeDeps.Auth, kubeCfg.EnableDebuggingHandlers, kubeCfg.EnableContentionProfiling)

	}
	if kubeCfg.ReadOnlyPort > 0 {
		go k.ListenAndServeReadOnly(net.ParseIP(kubeCfg.Address), uint(kubeCfg.ReadOnlyPort))
	}
}

2. Kubelet.Run

The Kubelet.Run method starts the various managers constructed by NewMainKubelet and lets each perform its function; most managers run as long-lived goroutines.

The full code of Kubelet.Run is as follows.

This code is located in pkg/kubelet/kubelet.go:

// Run starts the kubelet reacting to config updates
func (kl *Kubelet) Run(updates <-chan kubetypes.PodUpdate) {
	if kl.logServer == nil {
		kl.logServer = http.StripPrefix("/logs/", http.FileServer(http.Dir("/var/log/")))
	}
	if kl.kubeClient == nil {
		glog.Warning("No api server defined - no node status update will be sent.")
	}

	// Start the cloud provider sync manager
	if kl.cloudResourceSyncManager != nil {
		go kl.cloudResourceSyncManager.Run(wait.NeverStop)
	}

	if err := kl.initializeModules(); err != nil {
		kl.recorder.Eventf(kl.nodeRef, v1.EventTypeWarning, events.KubeletSetupFailed, err.Error())
		glog.Fatal(err)
	}

	// Start volume manager
	go kl.volumeManager.Run(kl.sourcesReady, wait.NeverStop)

	if kl.kubeClient != nil {
		// Start syncing node status immediately, this may set up things the runtime needs to run.
		go wait.Until(kl.syncNodeStatus, kl.nodeStatusUpdateFrequency, wait.NeverStop)
		go kl.fastStatusUpdateOnce()

		// start syncing lease
		if utilfeature.DefaultFeatureGate.Enabled(features.NodeLease) {
			go kl.nodeLeaseController.Run(wait.NeverStop)
		}
	}
	go wait.Until(kl.updateRuntimeUp, 5*time.Second, wait.NeverStop)

	// Start loop to sync iptables util rules
	if kl.makeIPTablesUtilChains {
		go wait.Until(kl.syncNetworkUtil, 1*time.Minute, wait.NeverStop)
	}

	// Start a goroutine responsible for killing pods (that are not properly
	// handled by pod workers).
	go wait.Until(kl.podKiller, 1*time.Second, wait.NeverStop)

	// Start component sync loops.
	kl.statusManager.Start()
	kl.probeManager.Start()

	// Start syncing RuntimeClasses if enabled.
	if kl.runtimeClassManager != nil {
		go kl.runtimeClassManager.Run(wait.NeverStop)
	}

	// Start the pod lifecycle event generator.
	kl.pleg.Start()
	kl.syncLoop(updates, kl)
}

The sections below analyze Kubelet.Run piece by piece.

3. initializeModules

initializeModules covers the imageManager, serverCertificateManager, oomWatcher, and resourceAnalyzer.

The main flow is as follows:

  1. Create the filesystem directories: the kubelet root directory, the pods directory, the plugins directory, and the container log directory.
  2. Start the imageManager, serverCertificateManager, oomWatcher, and resourceAnalyzer.

The managers are described below:

  • imageManager: responsible for image garbage collection.
  • serverCertificateManager: responsible for handling certificates.
  • oomWatcher: watches memory usage for out-of-memory events.
  • resourceAnalyzer: monitors resource usage.

The full code is as follows.

This code is located in pkg/kubelet/kubelet.go:

// initializeModules will initialize internal modules that do not require the container runtime to be up.
// Note that the modules here must not depend on modules that are not initialized here.
func (kl *Kubelet) initializeModules() error {
	// Prometheus metrics.
	metrics.Register(kl.runtimeCache, collectors.NewVolumeStatsCollector(kl))

	// Setup filesystem directories.
	if err := kl.setupDataDirs(); err != nil {
		return err
	}

	// If the container logs directory does not exist, create it.
	if _, err := os.Stat(ContainerLogsDir); err != nil {
		if err := kl.os.MkdirAll(ContainerLogsDir, 0755); err != nil {
			glog.Errorf("Failed to create directory %q: %v", ContainerLogsDir, err)
		}
	}

	// Start the image manager.
	kl.imageManager.Start()

	// Start the certificate manager if it was enabled.
	if kl.serverCertificateManager != nil {
		kl.serverCertificateManager.Start()
	}

	// Start out of memory watcher.
	if err := kl.oomWatcher.Start(kl.nodeRef); err != nil {
		return fmt.Errorf("Failed to start OOM watcher %v", err)
	}

	// Start resource analyzer
	kl.resourceAnalyzer.Start()

	return nil
}

3.1. setupDataDirs

initializeModules first creates the relevant directories.

The directories are:

  • ContainerLogsDir: /var/log/containers.
  • rootDirectory: passed in as a parameter, usually /var/lib/kubelet.
  • PodsDir: {rootDirectory}/pods.
  • PluginsDir: {rootDirectory}/plugins.

The directory-related code in initializeModules is as follows:

// Setup filesystem directories.
if err := kl.setupDataDirs(); err != nil {
	return err
}

// If the container logs directory does not exist, create it.
if _, err := os.Stat(ContainerLogsDir); err != nil {
	if err := kl.os.MkdirAll(ContainerLogsDir, 0755); err != nil {
		glog.Errorf("Failed to create directory %q: %v", ContainerLogsDir, err)
	}
}

The setupDataDirs code is as follows:

// setupDataDirs creates:
// 1.  the root directory
// 2.  the pods directory
// 3.  the plugins directory
func (kl *Kubelet) setupDataDirs() error {
	kl.rootDirectory = path.Clean(kl.rootDirectory)
	if err := os.MkdirAll(kl.getRootDir(), 0750); err != nil {
		return fmt.Errorf("error creating root directory: %v", err)
	}
	if err := kl.mounter.MakeRShared(kl.getRootDir()); err != nil {
		return fmt.Errorf("error configuring root directory: %v", err)
	}
	if err := os.MkdirAll(kl.getPodsDir(), 0750); err != nil {
		return fmt.Errorf("error creating pods directory: %v", err)
	}
	if err := os.MkdirAll(kl.getPluginsDir(), 0750); err != nil {
		return fmt.Errorf("error creating plugins directory: %v", err)
	}
	return nil
}

3.2. manager

The managers started in initializeModules are:

// Start the image manager.
kl.imageManager.Start()

// Start the certificate manager if it was enabled.
if kl.serverCertificateManager != nil {
	kl.serverCertificateManager.Start()
}

// Start out of memory watcher.
if err := kl.oomWatcher.Start(kl.nodeRef); err != nil {
	return fmt.Errorf("Failed to start OOM watcher %v", err)
}

// Start resource analyzer
kl.resourceAnalyzer.Start()

4. Running the managers

4.1. volumeManager

volumeManager runs a set of asynchronous loops that decide which volumes need to be attached/detached/mounted/unmounted based on the pods scheduled on this node.

// Start volume manager
go kl.volumeManager.Run(kl.sourcesReady, wait.NeverStop)

volumeManager.Run is implemented as follows:

func (vm *volumeManager) Run(sourcesReady config.SourcesReady, stopCh <-chan struct{}) {
	defer runtime.HandleCrash()

	go vm.desiredStateOfWorldPopulator.Run(sourcesReady, stopCh)
	glog.V(2).Infof("The desired_state_of_world populator starts")

	glog.Infof("Starting Kubelet Volume Manager")
	go vm.reconciler.Run(stopCh)

	metrics.Register(vm.actualStateOfWorld, vm.desiredStateOfWorld, vm.volumePluginMgr)

	<-stopCh
	glog.Infof("Shutting down Kubelet Volume Manager")
}

4.2. syncNodeStatus

syncNodeStatus runs periodically in a goroutine. It syncs the node status to the master and registers the kubelet when necessary.

if kl.kubeClient != nil {
	// Start syncing node status immediately, this may set up things the runtime needs to run.
	go wait.Until(kl.syncNodeStatus, kl.nodeStatusUpdateFrequency, wait.NeverStop)
	go kl.fastStatusUpdateOnce()

	// start syncing lease
	if utilfeature.DefaultFeatureGate.Enabled(features.NodeLease) {
		go kl.nodeLeaseController.Run(wait.NeverStop)
	}
}

4.3. updateRuntimeUp

updateRuntimeUp calls the container runtime status callback. On the runtime's first startup it initializes the runtime-dependent modules, and it returns an error if the status check fails. If the status check passes, it updates the container runtime's uptime in the kubelet's runtimeState.

go wait.Until(kl.updateRuntimeUp, 5*time.Second, wait.NeverStop)

4.4. syncNetworkUtil

Syncs the iptables util rules in a loop, although the current code does not actually perform any operation here.

// Start loop to sync iptables util rules
if kl.makeIPTablesUtilChains {
	go wait.Until(kl.syncNetworkUtil, 1*time.Minute, wait.NeverStop)
}

4.5. podKiller

When pods are not handled properly by the pod workers, a goroutine is started that is responsible for killing them.

// Start a goroutine responsible for killing pods (that are not properly
// handled by pod workers).
go wait.Until(kl.podKiller, 1*time.Second, wait.NeverStop)

The podKiller code is as follows.

This code is located in pkg/kubelet/kubelet_pods.go:

// podKiller launches a goroutine to kill a pod received from the channel if
// another goroutine isn't already in action.
func (kl *Kubelet) podKiller() {
	killing := sets.NewString()
	// guard for the killing set
	lock := sync.Mutex{}
	for podPair := range kl.podKillingCh {
		runningPod := podPair.RunningPod
		apiPod := podPair.APIPod

		lock.Lock()
		exists := killing.Has(string(runningPod.ID))
		if !exists {
			killing.Insert(string(runningPod.ID))
		}
		lock.Unlock()

		if !exists {
			go func(apiPod *v1.Pod, runningPod *kubecontainer.Pod) {
				glog.V(2).Infof("Killing unwanted pod %q", runningPod.Name)
				err := kl.killPod(apiPod, runningPod, nil, nil)
				if err != nil {
					glog.Errorf("Failed killing the pod %q: %v", runningPod.Name, err)
				}
				lock.Lock()
				killing.Delete(string(runningPod.ID))
				lock.Unlock()
			}(apiPod, runningPod)
		}
	}
}
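The interesting part here is the mutex-guarded `killing` set, which guarantees at most one kill goroutine per pod ID even if the same pod arrives on podKillingCh repeatedly. The same pattern, isolated into a sketch with stand-in types (the real code keys on runningPod.ID and calls kl.killPod):

```go
package main

import (
	"fmt"
	"sync"
)

// dedupRunner reproduces podKiller's deduplication: a mutex-guarded set
// ensures at most one worker is "in flight" per key at any time.
type dedupRunner struct {
	mu       sync.Mutex
	inFlight map[string]bool
}

// tryStart reports whether the caller may launch a worker for key; it
// returns false if one is already running.
func (d *dedupRunner) tryStart(key string) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.inFlight[key] {
		return false
	}
	d.inFlight[key] = true
	return true
}

// finish marks the worker for key as done, allowing a future retry.
func (d *dedupRunner) finish(key string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	delete(d.inFlight, key)
}

func main() {
	d := &dedupRunner{inFlight: map[string]bool{}}
	fmt.Println(d.tryStart("pod-a")) // true: first request starts a worker
	fmt.Println(d.tryStart("pod-a")) // false: duplicate is skipped
	d.finish("pod-a")
	fmt.Println(d.tryStart("pod-a")) // true: allowed again after finish
}
```

In podKiller the `finish` step is the `killing.Delete` call at the end of the spawned goroutine, so a pod that fails to die can be retried on a later channel event.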

4.6. statusManager

Syncs pod statuses with the apiserver; it also serves as a status cache.

// Start component sync loops.
kl.statusManager.Start()

statusManager.Start is implemented as follows:

func (m *manager) Start() {
	// Don't start the status manager if we don't have a client. This will happen
	// on the master, where the kubelet is responsible for bootstrapping the pods
	// of the master components.
	if m.kubeClient == nil {
		glog.Infof("Kubernetes client is nil, not starting status manager.")
		return
	}

	glog.Info("Starting to sync pod status with apiserver")
	syncTicker := time.Tick(syncPeriod)
	// syncPod and syncBatch share the same go routine to avoid sync races.
	go wait.Forever(func() {
		select {
		case syncRequest := <-m.podStatusChannel:
			glog.V(5).Infof("Status Manager: syncing pod: %q, with status: (%d, %v) from podStatusChannel",
				syncRequest.podUID, syncRequest.status.version, syncRequest.status.status)
			m.syncPod(syncRequest.podUID, syncRequest.status)
		case <-syncTicker:
			m.syncBatch()
		}
	}, 0)
}

4.7. probeManager

Handles container probes.

kl.probeManager.Start()

4.8. runtimeClassManager

// Start syncing RuntimeClasses if enabled.
if kl.runtimeClassManager != nil {
	go kl.runtimeClassManager.Run(wait.NeverStop)
}

4.9. PodLifecycleEventGenerator

// Start the pod lifecycle event generator.
kl.pleg.Start()

PodLifecycleEventGenerator is an interface for generating pod lifecycle events:

// PodLifecycleEventGenerator contains functions for generating pod life cycle events.
type PodLifecycleEventGenerator interface {
	Start()
	Watch() chan *PodLifecycleEvent
	Healthy() (bool, error)
}

The Start method is implemented as follows:

// Start spawns a goroutine to relist periodically.
func (g *GenericPLEG) Start() {
	go wait.Until(g.relist, g.relistPeriod, wait.NeverStop)
}

4.10. syncLoop

Finally, syncLoop is called to run the loop that synchronizes changes.

kl.syncLoop(updates, kl)

5. syncLoop

syncLoop is the main loop for processing changes. It watches for changes from three kinds of channels (file, apiserver, and http). For any new change seen, it runs a sync against the desired state and the running state. If no configuration changes are seen, it syncs the last known desired state every sync-frequency seconds.

// syncLoop is the main loop for processing changes. It watches for changes from
// three channels (file, apiserver, and http) and creates a union of them. For
// any new change seen, will run a sync against desired state and running state. If
// no changes are seen to the configuration, will synchronize the last known desired
// state every sync-frequency seconds. Never returns.
func (kl *Kubelet) syncLoop(updates <-chan kubetypes.PodUpdate, handler SyncHandler) {
	glog.Info("Starting kubelet main sync loop.")
	// The resyncTicker wakes up kubelet to checks if there are any pod workers
	// that need to be sync'd. A one-second period is sufficient because the
	// sync interval is defaulted to 10s.
	syncTicker := time.NewTicker(time.Second)
	defer syncTicker.Stop()
	housekeepingTicker := time.NewTicker(housekeepingPeriod)
	defer housekeepingTicker.Stop()
	plegCh := kl.pleg.Watch()
	const (
		base   = 100 * time.Millisecond
		max    = 5 * time.Second
		factor = 2
	)
	duration := base
	for {
		if rs := kl.runtimeState.runtimeErrors(); len(rs) != 0 {
			glog.Infof("skipping pod synchronization - %v", rs)
			// exponential backoff
			time.Sleep(duration)
			duration = time.Duration(math.Min(float64(max), factor*float64(duration)))
			continue
		}
		// reset backoff if we have a success
		duration = base

		kl.syncLoopMonitor.Store(kl.clock.Now())
		if !kl.syncLoopIteration(updates, handler, syncTicker.C, housekeepingTicker.C, plegCh) {
			break
		}
		kl.syncLoopMonitor.Store(kl.clock.Now())
	}
}

syncLoop calls syncLoopIteration, which runs the more concrete loop that watches for pod changes. The syncLoopIteration logic will be analyzed separately in a later article.

6. Summary

6.1. Basic flow

The main flow of Kubelet.Run is as follows:

  1. Initialize the modules, i.e. start the imageManager, serverCertificateManager, oomWatcher, and resourceAnalyzer.
  2. Run the various managers, most of them as long-lived goroutines, including the volumeManager, statusManager, and others.
  3. Run syncLoop, the loop function that processes changes and manages the pod lifecycle.

syncLoop:

The syncLoop function manages the pod lifecycle. It calls syncLoopIteration, which, based on the podUpdate information, dispatches to the SyncHandler to perform pod lifecycle operations such as create, update, delete, and query. The SyncHandler methods include HandlePodSyncs, HandlePodCleanups, and so on; this logic will be analyzed in detail in a later article.

6.2. Manager

The managers involved in running the kubelet are summarized below.

  • imageManager: responsible for image garbage collection.
  • serverCertificateManager: responsible for handling certificates.
  • oomWatcher: watches memory usage for out-of-memory (OOM) events.
  • resourceAnalyzer: monitors resource usage.
  • volumeManager: performs attach/detach/mount/unmount operations for pod volumes.
  • statusManager: syncs pod statuses with the apiserver; also serves as a status cache.
  • probeManager: handles container probes.
  • runtimeClassManager: syncs RuntimeClasses.
  • podKiller: responsible for killing pods.
