[ pytorch ] Common errors and fixes: a roundup

GPU memory

1. Out of memory: GPU memory usage keeps growing as the program runs

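This usually happens because the program holds many intermediate variables in GPU memory and never releases them (for example with del).
For example: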

feature_loader = []
for i,(data, target) in enumerate(train_loader):
    # data, target = data.to(device), target.to(device)
    data, target = Variable(data.cuda()), Variable(target.cuda())

    feature, pred = model(data)
    
    loss = CELOSS(pred, target)
    optimizer4nn.zero_grad()
    loss.backward()
    optimizer4nn.step()

    feature_loader.append(feature)   # <--- here: every batch's GPU tensor is kept alive

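Here feature is still a Variable that lives on the GPU (variable.cuda()). On each iteration the name feature is rebound to a new tensor and the old one can be freed, so the variable by itself does not add GPU memory. But feature_loader keeps a reference to every feature it collects, so none of them can be freed and GPU memory usage grows a little with each batch.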

The fix is to move feature from GPU memory (.cuda()) to host memory (.cpu()) as it is collected, so that feature_loader no longer occupies GPU memory.

feature_loader.append(feature.data.cpu())   # <--- add .data.cpu()
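
The same fix can be sketched as a complete loop. This is only an illustration, not the original training script: TinyNet, the random batches and the optimizer settings are hypothetical stand-ins for the model and train_loader above, and it uses feature.detach().cpu(), which in recent PyTorch versions is the usual replacement for .data.cpu(). It falls back to the CPU when no GPU is present.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class TinyNet(nn.Module):
    """Hypothetical stand-in for the model above: returns (feature, pred)."""

    def __init__(self):
        super().__init__()
        self.body = nn.Linear(8, 4)
        self.head = nn.Linear(4, 3)

    def forward(self, x):
        feature = self.body(x)
        return feature, self.head(feature)


model = TinyNet().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

feature_loader = []
for _ in range(3):  # three random batches stand in for train_loader
    data = torch.randn(5, 8, device=device)
    target = torch.randint(0, 3, (5,), device=device)

    feature, pred = model(data)
    loss = criterion(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # detach() drops the autograd history; cpu() moves the data to host memory,
    # so the list holds no references to GPU tensors.
    feature_loader.append(feature.detach().cpu())
```

If the collected features are needed on the GPU later, they can be moved back in one go, for example with torch.cat(feature_loader).to(device).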

Uncategorized

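1. Parameters cannot be loaded because every key in the checkpoint has an extra module. prefix. The cause: the model was wrapped in nn.DataParallel(model_structure,device_ids=gpu_ids) during training, so the saved keys carry that prefix.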

RuntimeError: Error(s) in loading state_dict for ftnet_EncoderDecoder:
	
Missing key(s) in state_dict: 
"BackBone.model.conv1.weight", "BackBone.model.bn1.weight", "BackBone.model.bn1.bias", "BackBone.model.bn1.running_mean", "BackBone.model.bn1.running_var", .....

Unexpected key(s) in state_dict: 
"module.BackBone.model.conv1.weight", "module.BackBone.model.bn1.weight", "module.BackBone.model.bn1.bias", "module.BackBone.model.bn1.running_mean", "module.BackBone.model.bn1.running_var", "module.BackBone.model.l  "  ........

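Fix: wrap the model in nn.DataParallel before loading the weights, so that its keys carry the same module. prefix as the checkpoint.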

model_structure = net()
model_structure = nn.DataParallel(model_structure,device_ids=gpu_ids)
model = load_network(model_structure)
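
An alternative that is not from the original post, for when you want to load the weights into a model that is not wrapped in nn.DataParallel, is to strip the module. prefix from the checkpoint keys first. The sketch below shows only the key renaming on a plain dictionary; strip_module_prefix is a hypothetical helper name, and the keys are copied from the error message above with placeholder values.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading `prefix` removed from each key."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        # Only a prefix at the very start of the key is removed.
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Key names taken from the error message above; the values are placeholders.
saved = OrderedDict([
    ("module.BackBone.model.conv1.weight", 0),
    ("module.BackBone.model.bn1.weight", 1),
])
cleaned = strip_module_prefix(saved)
print(list(cleaned))
```

With a real checkpoint, the cleaned dictionary would then be passed to the unwrapped model's load_state_dict().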

2. There is an imbalance between your GPUs. You may want to exclude GPU 0 which has less than 75% of the memory or cores of GPU 1. You can do so by setting the device_ids argument to DataParallel, or by setting the CUDA_VISIBLE_DEVICES environment variable.

Fix
As the Stack Overflow answer linked below explains, call model.cuda() before passing the model to nn.DataParallel.

net = nn.DataParallel(model.cuda(), device_ids=[0,1])

URL that solved this problem: https://stackoverflow.com/questions/55343893/how-to-do-parallel-processing-in-pytorch
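
A guarded version of the line above, as a minimal sketch: the nn.Linear model and the input shape are hypothetical, and the device checks are my addition so that the snippet also runs on a machine with one GPU or none.

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 2)  # hypothetical stand-in for the real model

if torch.cuda.device_count() > 1:
    # Move the model to the GPU first, then wrap it, as in the fix above.
    net = nn.DataParallel(model.cuda(), device_ids=[0, 1])
elif torch.cuda.is_available():
    net = model.cuda()
else:
    net = model  # CPU fallback, no DataParallel

# Put the input on the same device as the model's parameters.
device = next(net.parameters()).device
out = net(torch.randn(4, 8).to(device))
```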
