Installing the NLTK corpora

Original post

2020-04-20 20:35

When I ran the following code in Jupyter Notebook:

import nltk
paragraph = "i am a good boy ! are you ok? hahaha i am fine"
words_list = nltk.word_tokenize(paragraph)
print(words_list)

I got this error:

ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-4-55bf564de021> in <module>
----> 1 import nltk
      2 paragraph = "i am a good boy ! are you ok? hahaha i am fine"
      3 words_list = nltk.word_tokenize(paragraph)
      4 print(words_list)

ModuleNotFoundError: No module named 'nltk'

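It says there is no module named nltk.

I then ran pip list in cmd and conda list in conda, and nltk showed up as installed in both. (A ModuleNotFoundError usually means the notebook kernel is running a different Python environment from the one the package was installed into.) After reading a blog post, I learned that the NLTK corpus data also has to be downloaded separately.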
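
Since pip list and conda list can each report on a different environment from the one the notebook uses, it may help to ask the running interpreter directly. The following is a minimal sketch using only the standard library; `describe_environment` is a name made up for this illustration, not part of NLTK.

```python
import importlib.util
import sys


def describe_environment(module_name="nltk"):
    """Report which interpreter is running and whether it can see module_name."""
    # find_spec returns None when a top-level module cannot be found.
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,
        "module": module_name,
        "importable": spec is not None,
        "location": spec.origin if spec is not None else None,
    }


print(describe_environment())
```

Run in a notebook cell, this shows the path of the Python executable behind the kernel. If "importable" is False there while pip list shows nltk, the package was probably installed into another environment.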

I tried the automatic download first.

Run this in IDLE 3.7 (or whichever version is on your machine):

>>> import nltk
>>> nltk.download()

The NLTK Downloader window popped up, followed by a getaddrinfo failed error, which looks like a problem reaching the server.

A blog post then suggested changing the Server Index field in the NLTK Downloader from:

https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml

to: http://www.nltk.org/nltk_data/

Clicking download again produced the same error: getaddrinfo failed
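
getaddrinfo failed is the message Python reports when a host name cannot be resolved to an address, so the failure happens before any download starts. A small sketch for checking whether the two hosts above resolve from your machine follows; `can_resolve` is a hypothetical helper written for this post, and port 443 is just an assumed default.

```python
import socket


def can_resolve(host, port=443):
    """Return True if the host name resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(host, port)) > 0
    except socket.gaierror:
        # Raised for address-related errors such as a failed DNS lookup.
        return False


# Example usage (performs real lookups, so it needs a network connection):
# for host in ("raw.githubusercontent.com", "www.nltk.org"):
#     print(host, can_resolve(host))
```

If either host comes back False, changing the Server Index alone is unlikely to help, and the manual download is the practical route.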

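After trying the suggestions from a pile of blog posts with no luck, I gave in and installed the NLTK data manually.

Manual installation is a bit tedious, but there was no other option.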

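I did come across a post by someone who wrote a script to do the installation, which looked impressive:

Manually installing the Python NLTK language packs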

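I downloaded everything by hand and then extracted it.

Download the corpora from GitHub: https://github.com/nltk/nltk_data

After downloading, rename the packages folder inside it to nltk_data (every zip archive in it has to be extracted), then put the folder in one of the locations NLTK searches.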
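
The rename and extraction can also be scripted. This is only a rough sketch of the manual steps above, not the script from the linked post; `rename_packages` and `extract_archives` are names invented here, and it assumes each archive should be unpacked into the folder it sits in.

```python
import zipfile
from pathlib import Path


def rename_packages(repo_dir):
    """Rename <repo_dir>/packages to <repo_dir>/nltk_data and return the new path."""
    source = Path(repo_dir) / "packages"
    target = Path(repo_dir) / "nltk_data"
    source.rename(target)
    return target


def extract_archives(data_root):
    """Unpack every .zip under data_root next to the archive itself."""
    data_root = Path(data_root)
    extracted = []
    # sorted() collects the archive list before extraction creates new files.
    for archive in sorted(data_root.rglob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(archive.parent)
        extracted.append(archive)
    return extracted


# Example usage, assuming the repository was unpacked to a folder of your choice:
# root = rename_packages(r"C:\path\to\downloaded\repo")
# extract_archives(root)
```

The resulting nltk_data folder still has to be moved into one of the searched locations by hand.
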
To find out where the folder should go, run some code first (in IDLE); the error message lists the paths, for example:

>>> import nltk
>>> paragraph = "i am a good boy ! are you ok? hahaha i am fine"
>>> words_list = nltk.word_tokenize(paragraph)
Traceback (most recent call last):
  File "<pyshell#5>", line 1, in <module>
    words_list = nltk.word_tokenize(paragraph)
  File "C:\Program Files\Python37\lib\site-packages\nltk\tokenize\__init__.py", line 144, in word_tokenize
    sentences = [text] if preserve_line else sent_tokenize(text, language)
  File "C:\Program Files\Python37\lib\site-packages\nltk\tokenize\__init__.py", line 105, in sent_tokenize
    tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
  File "C:\Program Files\Python37\lib\site-packages\nltk\data.py", line 868, in load
    opened_resource = _open(resource_url)
  File "C:\Program Files\Python37\lib\site-packages\nltk\data.py", line 993, in _open
    return find(path_, path + ['']).open()
  File "C:\Program Files\Python37\lib\site-packages\nltk\data.py", line 701, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt/english.pickle

  Searched in:
    - 'C:\\Users\\馬靜靜/nltk_data'
    - 'C:\\Program Files\\Python37\\nltk_data'
    - 'C:\\Program Files\\Python37\\share\\nltk_data'
    - 'C:\\Program Files\\Python37\\lib\\nltk_data'
    - 'C:\\Users\\馬靜靜\\AppData\\Roaming\\nltk_data'
    - 'C:\\nltk_data'
    - 'D:\\nltk_data'
    - 'E:\\nltk_data'
    - ''
**********************************************************************

This part lists the paths where nltk_data can be placed:

Searched in:
    - 'C:\\Users\\馬靜靜/nltk_data'
    - 'C:\\Program Files\\Python37\\nltk_data'
    - 'C:\\Program Files\\Python37\\share\\nltk_data'
    - 'C:\\Program Files\\Python37\\lib\\nltk_data'
    - 'C:\\Users\\馬靜靜\\AppData\\Roaming\\nltk_data'
    - 'C:\\nltk_data'
    - 'D:\\nltk_data'
    - 'E:\\nltk_data'
    - ''
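
The same list is available from NLTK itself as nltk.data.path, so there is no need to trigger an error to see it. The sketch below checks which of a given list of directories already contain the resource named in the traceback; `find_resource` is a hypothetical helper for illustration and does not import NLTK itself.

```python
from pathlib import Path


def find_resource(search_dirs, resource="tokenizers/punkt/english.pickle"):
    """Return the directories from search_dirs that contain resource."""
    hits = []
    for directory in search_dirs:
        # Skip the empty-string entry that appears at the end of the list.
        if directory and (Path(directory) / resource).exists():
            hits.append(directory)
    return hits


# Example usage with the real search path (requires nltk to be importable):
# import nltk
# print(nltk.data.path)
# print(find_resource(nltk.data.path))
```

An empty result after copying the folder suggests it landed somewhere NLTK does not look, or that the archives inside were not extracted.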

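After extracting everything, I put the folder (the packages folder renamed to nltk_data) directly under C:\Users\馬靜靜\, which matches the first path in the list.
Running the code again then worked.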

>>> import nltk
>>> paragraph = "i am a good boy ! are you ok? hahaha i am fine"
>>> words_list = nltk.word_tokenize(paragraph)
>>> print(words_list)
['i', 'am', 'a', 'good', 'boy', '!', 'are', 'you', 'ok', '?', 'hahaha', 'i', 'am', 'fine']
