cuda runtime error (10) : invalid device ordinal at torch/csrc/cuda/Module.cpp:88


I hit this error when calling torch.load, and it puzzled me: I only have one GPU, so how could the device ordinal be invalid?
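One way to check where a file's tensors came from: torch.load's map_location argument also accepts a callable that receives each storage together with its location tag ('cpu', 'cuda:1', ...). The sketch below is my own and not part of PyTorch. recording_map_location and invalid_cuda_tags are hypothetical helpers, and the check assumes the usual 0-based CUDA ordinals, so on a machine with N GPUs only cuda:0 to cuda:N-1 are valid.

```python
from typing import Any, Callable, Iterable, Set, Tuple


def recording_map_location() -> Tuple[Callable[[Any, str], Any], Set[str]]:
    """Return (callable, seen). The callable records every location tag that
    torch.load passes to it and returns the storage unchanged, so every
    tensor stays on the CPU, where it was first deserialized."""
    seen: Set[str] = set()

    def record(storage: Any, location: str) -> Any:
        seen.add(location)  # e.g. 'cpu' or 'cuda:1'
        return storage      # keep the CPU copy; no GPU is touched

    return record, seen


def invalid_cuda_tags(tags: Iterable[str], device_count: int) -> Set[str]:
    """Return the CUDA tags whose ordinal does not exist on this machine.

    Valid ordinals are 0 .. device_count - 1, so with a single GPU only
    'cuda:0' is usable. A bare 'cuda' tag is treated as ordinal 0 here,
    which is an assumption of this sketch.
    """
    bad: Set[str] = set()
    for tag in tags:
        if not tag.startswith("cuda"):
            continue  # 'cpu' and other non-CUDA tags are not checked
        _, _, index = tag.partition(":")
        ordinal = int(index) if index else 0
        if ordinal >= device_count:
            bad.add(tag)
    return bad
```

Usage would look like: record, seen = recording_map_location(), then torch.load('tensors.pt', map_location=record), then invalid_cuda_tags(seen, torch.cuda.device_count()). For a file saved from GPU 1 and loaded on a single-GPU machine, this should report {'cuda:1'}.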

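Tracing the error into the source of __init__.py, I found that self.idx was indeed 1, when it should have been 0. In other words, the tensors in the file had been saved from GPU 1, a device that does not exist on a single-GPU machine. A careful read of the docstring of load shows that it already spells out the fix: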

"""Loads an object saved with :func:`torch.save` from a file.

    torch.load uses Python's unpickling facilities but treats storages,
    which underlie tensors, specially. They are first deserialized on the
    CPU and are then moved to the device they were saved from. If this fails
    (e.g. because the run time system doesn't have certain devices), an exception
    is raised. However, storages can be dynamically remapped to an alternative
    set of devices using the map_location argument.

    If map_location is a callable, it will be called once for each serialized
    storage with two arguments: storage and location. The storage argument
    will be the initial deserialization of the storage, residing on the CPU.
    Each serialized storage has a location tag associated with it which
    identifies the device it was saved from, and this tag is the second
    argument passed to map_location. The builtin location tags are 'cpu' for
    CPU tensors and 'cuda:device_id' (e.g. 'cuda:2') for CUDA tensors.
    map_location should return either None or a storage. If map_location returns
    a storage, it will be used as the final deserialized object, already moved to
    the right device. Otherwise, torch.load will fall back to the default behavior,
    as if map_location wasn't specified.

    If map_location is a dict, it will be used to remap location tags
    appearing in the file (keys), to ones that specify where to put the
    storages (values).

    User extensions can register their own location tags and tagging and
    deserialization methods using register_package.

    Args:
        f: a file-like object (has to implement fileno that returns a file
            descriptor, and must implement seek), or a string containing a file
            name
        map_location: a function or a dict specifying how to remap storage
            locations
        pickle_module: module used for unpickling metadata and objects (has to
            match the pickle_module used to serialize file)

    Example:
        >>> torch.load('tensors.pt')
        # Load all tensors onto the CPU
        >>> torch.load('tensors.pt', map_location=lambda storage, loc: storage)
        # Load all tensors onto GPU 1
        >>> torch.load('tensors.pt', map_location=lambda storage, loc: storage.cuda(1))
        # Map tensors from GPU 1 to GPU 0
        >>> torch.load('tensors.pt', map_location={'cuda:1':'cuda:0'})

    """

The last example in the docstring is the key one. To remap GPU 1 to GPU 0, use that line (replacing tensors.pt with your own file):

torch.load('tensors.pt', map_location={'cuda:1':'cuda:0'})

Mapping GPU 1 to GPU 0 resolved the error.
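That dict only covers the one tag it names. If you do not know which GPU a file was saved from, one hedged generalization is to map every plausible index onto GPU 0. The upper bound of 8 GPUs below is an arbitrary assumption, and collapse_to_gpu0 is a hypothetical helper.

```python
from typing import Dict


def collapse_to_gpu0(max_saved_gpus: int = 8) -> Dict[str, str]:
    """Map tags 'cuda:1' .. 'cuda:<max_saved_gpus - 1>' onto 'cuda:0'.

    'cuda:0' itself needs no entry: torch.load leaves any tag that is not
    a key of the dict unchanged.
    """
    return {f"cuda:{i}": "cuda:0" for i in range(1, max_saved_gpus)}
```

Usage would be torch.load('tensors.pt', map_location=collapse_to_gpu0()). Keys that never appear in the file are simply ignored. PyTorch also accepts a string or a torch.device as map_location, e.g. map_location='cuda:0' or map_location='cpu', which sends every storage to that single device.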
