Default process group has not been initialized, please make sure to call init_process_group.

Original post

2020-06-16 02:48

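While reading model_builder in the SlowFastNet source code (https://github.com/facebookresearch/SlowFast), I wanted to train on multiple GPUs. When the number of GPUs is greater than 1, the code automatically wraps the model in torch.nn.parallel.DistributedDataParallel.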

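torch.nn.parallel.DistributedDataParallel and torch.nn.parallel.DataParallel serve a similar purpose: both let PyTorch train on multiple GPUs. However, torch.nn.parallel.DistributedDataParallel requires the default process group to be initialized first; without that step, constructing it fails with the error in the title.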

Add the following call before constructing torch.nn.parallel.DistributedDataParallel:

torch.distributed.init_process_group('nccl', init_method='file:///home/.../my_file', world_size=1, rank=0)

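This setup uses several GPUs on a single machine (single machine, multiple cards) from one process, so world_size=1: world_size counts the participating processes, and this script runs as a single process with rank 0. For details, see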
https://github.com/pytorch/examples/tree/master/imagenet
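To make the initialization step concrete, here is a minimal sketch that can be tried on a machine without GPUs. It is not the SlowFast code: it swaps the 'nccl' backend for 'gloo' so it runs on CPU, and the helper names file_init_method and run_single_process_demo, the temporary rendezvous file, and the tiny Linear model are all assumptions made for illustration.

```python
import os
import tempfile


def file_init_method(path):
    """Build a file:// init_method URL from a filesystem path."""
    return "file://" + os.path.abspath(path)


def run_single_process_demo(backend="gloo"):
    """Initialize a one-process group, wrap a tiny model in DDP, then clean up."""
    # Imported here so file_init_method stays usable without PyTorch installed.
    import torch
    import torch.distributed as dist

    # The rendezvous file should not already exist (or should be empty),
    # so place it in a fresh temporary directory. The file name is hypothetical.
    rendezvous = os.path.join(tempfile.mkdtemp(), "ddp_rendezvous")

    dist.init_process_group(
        backend,
        init_method=file_init_method(rendezvous),
        world_size=1,  # one participating process
        rank=0,        # this process is rank 0
    )
    try:
        # With the default group in place, DDP can be constructed.
        model = torch.nn.parallel.DistributedDataParallel(torch.nn.Linear(4, 2))
        output = model(torch.randn(3, 4))
        return dist.is_initialized(), dist.get_world_size(), tuple(output.shape)
    finally:
        # Tear the group down so the demo can be run again in the same process.
        dist.destroy_process_group()
```

Calling run_single_process_demo() should report an initialized group with a world size of 1. On a GPU machine, passing backend="nccl" and moving the model to a CUDA device would be closer to the call shown above.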

The following code sets up the model to train on two GPUs at once, using only the GPUs with index 1 and 2:

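import os
import torch
# Expose only physical GPUs 1 and 2; PyTorch then sees them as cuda:0 and cuda:1.
os.environ['CUDA_VISIBLE_DEVICES'] = '1,2'
device = torch.device('cuda:0')
model = ConvNet(num_classes)  # ConvNet and num_classes are defined elsewhere
# The process group must exist before DistributedDataParallel is constructed.
torch.distributed.init_process_group('nccl', init_method='file:///home/.../my_file', world_size=1, rank=0)
model = torch.nn.parallel.DistributedDataParallel(model.to(device))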
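One detail in the code above that is easy to miss: after CUDA_VISIBLE_DEVICES is set to '1,2', the visible GPUs are renumbered from zero, which is why the device is 'cuda:0' rather than 'cuda:1'. The sketch below illustrates that renumbering in plain Python; visible_gpu_mapping is a hypothetical helper written for this note, not a PyTorch or CUDA API, and it assumes a simple comma-separated list of IDs.

```python
import os


def visible_gpu_mapping(visible_devices):
    """Map logical PyTorch device names to the physical GPU IDs listed in
    a CUDA_VISIBLE_DEVICES-style string such as '1,2'."""
    ids = [item.strip() for item in visible_devices.split(",") if item.strip()]
    return {f"cuda:{logical}": physical for logical, physical in enumerate(ids)}


# Set the variable before any CUDA call so the restriction takes effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "1,2"
mapping = visible_gpu_mapping(os.environ["CUDA_VISIBLE_DEVICES"])
for logical, physical in mapping.items():
    print(f"{logical} -> physical GPU {physical}")
```

The mapping only mirrors the order of the list, so changing the variable to '2,1' would make physical GPU 2 appear as cuda:0.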

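On multi-GPU training in PyTorch, the article below is somewhat disorganized; from a quick read, it combines the several articles it lists at the end.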
https://blog.csdn.net/m0_38008956/article/details/86559432
