No Manual Commands Required: Enable GPUs in a K8s Cluster in 3 Simple Steps

As enterprises around the world adopt Kubernetes at scale, we see Kubernetes moving into a new phase. On one hand, Kubernetes is being adopted for edge workloads and is delivering value beyond the data center. On the other hand, Kubernetes is driving the development of machine learning (ML) and of high-quality, high-speed data analytics.

The ability to apply Kubernetes to machine learning, as we know it today, stems mainly from a feature in Kubernetes 1.10, when graphics processing units (GPUs) became a schedulable resource. That feature is now in beta. Taken individually, both of these are exciting developments in Kubernetes. Even more exciting is that Kubernetes can be used to adopt GPUs both in the data center and at the edge. In the data center, GPUs are a means of building ML libraries. Those trained libraries will then be migrated to edge Kubernetes clusters as inference tools for machine learning, providing data analysis as close as possible to where the data is collected.

In the early days, Kubernetes was about providing a pool of CPU and RAM resources for distributed applications. If we can have CPU and RAM pools, why not a GPU pool? That is certainly possible, but not all servers have GPUs. So how do we make our GPU-equipped servers available in Kubernetes?

In this article, I will lay out a simple way to use GPUs in a Kubernetes cluster. In a future article, we will push GPUs to the edge and show you how to accomplish that step. To keep things truly simple, I will use the Rancher UI to walk through enabling GPUs. The Rancher UI is simply a client of the Rancher RESTful APIs. You can use other clients of those APIs, such as Golang, Python, and Terraform, in GitOps, DevOps, and other automation solutions. We will not explore them in depth here, though.

In essence, the process is very simple:

- Build the infrastructure for the Kubernetes cluster
- Install Kubernetes
- Install the gpu-operator from Helm

## Up and Running with Rancher and Available GPU Resources

Rancher is a multi-cluster management solution and is the glue that holds the steps above together. On the NVIDIA blog you can find a pure NVIDIA solution for simplifying GPU management, along with important information on how the gpu-operator differs from building a GPU driver stack without the operator.

(https://developer.nvidia.com/blog/nvidia-gpu-operator-simplifying-gpu-management-in-kubernetes/)

### Prerequisites

Here is the bill of materials (BOM) needed to get GPUs up and running in Rancher:

1. Rancher
2. GPU Operator (https://nvidia.github.io/gpu-operator/)
3. Infrastructure: we will use GPU nodes on AWS

The official documentation has a dedicated section on installing Rancher for high availability, so we assume you already have Rancher installed:

https://docs.rancher.cn/docs/rancher2/installation/k8s-install/_index/

### Process Steps

#### Install a Kubernetes Cluster with GPUs

With Rancher installed, we first build and configure a Kubernetes cluster (you can use any cluster with NVIDIA GPUs).

Using the **Global** context, we select **Add Cluster**

![](https://static001.infoq.cn/resource/image/5e/ae/5e4b105f22f67a8e6dbbd741cac632ae.png)

and, in the section for hosts from a cloud (infrastructure) provider, select Amazon EC2.

![](https://static001.infoq.cn/resource/image/d7/fc/d7d8c71f8101fcf71355420563271efc.png)

We do this through node drivers: a set of preconfigured infrastructure templates, some of which include GPU resources.

![](https://static001.infoq.cn/resource/image/98/68/983283ddda01a31c089c19506c1b6368.png)

Notice that there are three node pools: one for the masters, one for standard worker nodes, and one for workers with GPUs. The GPU template is based on the p3.2xlarge machine type and uses the Ubuntu 18.04 Amazon Machine Image, or AMI (ami-0ac80df6eff0e70b5). These choices will, of course, vary with each infrastructure provider and with enterprise requirements. We also left the Kubernetes options in the **Add Cluster** form at their default values.

#### Set Up the GPU Operator

Now we will set up a catalog in Rancher using the GPU Operator repository (https://nvidia.github.io/gpu-operator). (There are other solutions for exposing GPUs, including the Linux for Tegra [L4T] Linux distribution or the device plugin.) At the time of writing, the GPU Operator had been tested and validated with NVIDIA Tesla Driver 440.

Using the Rancher Global context menu, we select the cluster we want to install into:

![](https://static001.infoq.cn/resource/image/71/6d/71d5563fb3e309637d8b5fb35c2a486d.png)

Then we use the **Tools** menu to view the list of catalogs.

![](https://static001.infoq.cn/resource/image/1a/29/1a88448fa03ff50acac8547ea2bd7e29.png)

Click the **Add Catalog** button, give the catalog a name, and add the URL: https://nvidia.github.io/gpu-operator

We selected **Helm v3** and cluster scope, then clicked **Create** to add the catalog to Rancher. When using automation, we can make this step part of the cluster build. Depending on enterprise policy, we could add this catalog to every cluster, even those that do not yet have GPU nodes or node pools. This step gives us access to the GPU Operator chart, which we will install next.

![](https://static001.infoq.cn/resource/image/d8/d3/d8a1d301cb8b6261523f2ea3a0b0c3d3.png)

Now we use the Rancher context menu in the upper left to go to the cluster's "System" project, which is where we add the GPU Operator functionality.

![](https://static001.infoq.cn/resource/image/7a/f2/7afba229e11411d244dd520879c534f2.png)

In the System project, select Apps:

![](https://static001.infoq.cn/resource/image/71/66/71f2ce98399fec596842667bf68b7366.png)

Then click the **Launch** button in the upper right.

![](https://static001.infoq.cn/resource/image/77/9e/77b048c6d7c74613ef0b699443a8c99e.png)

We can search for "nvidia" or scroll down to the catalog we just created.

![](https://static001.infoq.cn/resource/image/f9/c0/f93b3a127bf36703e3ab306ac9c2cdc0.png)

Click the gpu-operator app, then click **Launch** at the bottom of the page.

![](https://static001.infoq.cn/resource/image/eb/db/ebdyy9344d757550fb663a90ceeef2db.png)

In this case, all of the default values should be fine. Again, we could add this step to automation through the Rancher APIs.

**Use the GPU**

Now that the GPUs are accessible, we can deploy a **GPU-capable** workload. We can also verify that the installation succeeded by looking at the **Cluster -> Nodes** page in Rancher. There we see that the GPU Operator has installed Node Feature Discovery (NFD) and has labeled our nodes for GPU use.

![](https://static001.infoq.cn/resource/image/d7/31/d712c6d56b259ab2ed257bb2d717d631.png)

## Summary

Getting Kubernetes to run with GPUs in such a simple way depends on three important pieces:

1. NVIDIA's GPU Operator
2. Node Feature Discovery (NFD), from the Kubernetes SIG of the same name
3. Rancher's cluster deployment and catalog app integration

Article reposted from: RancherLabs (ID: RancherLabs)

Original link: [No Manual Commands Required: Enable GPUs in a K8s Cluster in 3 Simple Steps](https://mp.weixin.qq.com/s/7DgzmhqrYKZt4r9Zly0iSQ)
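As a follow-up to the "Use the GPU" step, here is a minimal sketch of what a GPU-capable workload and a node check might look like when scripted. It is not taken from the original article. It assumes the GPU Operator's device plugin advertises GPUs under the extended resource name `nvidia.com/gpu`; the pod name, the container image `example.com/cuda-sample:latest`, and both helper functions are hypothetical placeholders. The code only builds and inspects plain dictionaries, so it runs without a cluster.

```python
import json

# Extended resource name under which the NVIDIA device plugin advertises GPUs.
GPU_RESOURCE = "nvidia.com/gpu"


def gpu_pod_manifest(name, image, gpus=1):
    """Build a Pod manifest (as a plain dict) that requests `gpus` NVIDIA GPUs."""
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "OnFailure",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # GPUs are requested through resource limits.
                    "resources": {"limits": {GPU_RESOURCE: gpus}},
                }
            ],
        },
    }


def gpu_nodes(nodes):
    """Return the names of nodes whose reported capacity includes a GPU.

    `nodes` is a list of Node objects as dicts, for example the `items`
    list from a Kubernetes API response that lists nodes.
    """
    names = []
    for node in nodes:
        capacity = node.get("status", {}).get("capacity", {})
        # The API reports capacity values as strings such as "1".
        if int(capacity.get(GPU_RESOURCE, 0)) > 0:
            names.append(node["metadata"]["name"])
    return names


if __name__ == "__main__":
    # Hypothetical pod name and image, used for illustration only.
    manifest = gpu_pod_manifest("gpu-smoke-test", "example.com/cuda-sample:latest")
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and submitted to the cluster, for example with `kubectl apply -f`, which accepts JSON as well as YAML. If the pod stays unscheduled, checking node capacity in the way `gpu_nodes` does is one way to see whether any node is actually advertising GPUs.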