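Today I was reading the book 《Python网络数据采集》 (Web Scraping with Python) and tried out the NLTK section myself. When I ran the code, however, it could not find a resource it needed.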
from nltk import word_tokenize
from nltk import Text
tokens = word_tokenize("Here is some not very interesting text")
text = Text(tokens)
Running it produced:
D:\Python27\python.exe H:/temp/python-scraping/chapter8/5-NltkTokenize.py
Traceback (most recent call last):
File "H:/temp/python-scraping/chapter8/5-NltkTokenize.py", line 7, in <module>
tokens = word_tokenize("Here is some not very interesting text")
File "D:\Python27\lib\site-packages\nltk\tokenize\__init__.py", line 106, in word_tokenize
return [token for sent in sent_tokenize(text, language)
File "D:\Python27\lib\site-packages\nltk\tokenize\__init__.py", line 90, in sent_tokenize
tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
File "D:\Python27\lib\site-packages\nltk\data.py", line 801, in load
opened_resource = _open(resource_url)
File "D:\Python27\lib\site-packages\nltk\data.py", line 919, in _open
return find(path_, path + ['']).open()
File "D:\Python27\lib\site-packages\nltk\data.py", line 641, in find
raise LookupError(resource_not_found)
LookupError:
**********************************************************************
Resource u'tokenizers/punkt/english.pickle' not found. Please
use the NLTK Downloader to obtain the resource: >>>
nltk.download()
Searched in:
- 'C:\\Users\\Administrator/nltk_data'
- 'C:\\nltk_data'
- 'D:\\nltk_data'
- 'E:\\nltk_data'
- 'D:\\Python27\\nltk_data'
- 'D:\\Python27\\lib\\nltk_data'
- 'C:\\Users\\Administrator\\AppData\\Roaming\\nltk_data'
- u''
**********************************************************************
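I searched online for a long time before taking a closer look at the output. Damn: none of the directories it searches is the directory I downloaded the data to, because I had changed the download path.
So the search path needs to be changed: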
from nltk import data
data.path.append(u"G:\\nltk_data")
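If the script gets run repeatedly in the same session, it may be nicer to append the directory only once and to check straight away that the resource can be found. Below is a minimal sketch of that idea. The helper names add_data_dir and locate_punkt are my own (hypothetical), the resource name is the one from the traceback above, and nltk is imported inside the function so the first helper also works on a plain list.

```python
import os


def add_data_dir(search_path, directory):
    """Append directory to search_path unless an equivalent entry is already there.

    Returns True if the list was changed, False otherwise.
    """
    wanted = os.path.normcase(os.path.normpath(directory))
    for entry in search_path:
        if os.path.normcase(os.path.normpath(entry)) == wanted:
            return False
    search_path.append(directory)
    return True


def locate_punkt(directory):
    """Register directory with NLTK and look up the resource named in the traceback."""
    from nltk import data  # imported lazily so add_data_dir works without NLTK

    add_data_dir(data.path, directory)
    # data.find raises LookupError if the resource is still not on the search path
    return data.find("tokenizers/punkt/english.pickle")
```

Calling locate_punkt(u"G:\\nltk_data") should return a path pointer if the download really is under that directory, and raise the same LookupError as before if it is not.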
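With that, the program runs. Alternatively, you can set the NLTK_DATA environment variable to change the paths NLTK searches.
Suggestion: it is best to put these two lines before the other NLTK imports.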
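For the environment-variable route, here is a rough sketch of doing it from inside the script instead of in the system settings. As far as I can tell, NLTK reads NLTK_DATA once, when nltk.data is first imported, and splits it on os.pathsep, so the variable has to be set before anything imports nltk. The function names prepend_nltk_data_env and tokenize_with_env are hypothetical, chosen just for this example.

```python
import os


def prepend_nltk_data_env(directory, environ=os.environ):
    """Put directory at the front of NLTK_DATA, keeping any existing entries."""
    current = environ.get("NLTK_DATA", "")
    parts = [p for p in current.split(os.pathsep) if p]
    if directory not in parts:
        parts.insert(0, directory)
    environ["NLTK_DATA"] = os.pathsep.join(parts)
    return environ["NLTK_DATA"]


def tokenize_with_env(directory, sentence):
    """Set NLTK_DATA first, then import NLTK and tokenize the sentence."""
    prepend_nltk_data_env(directory)
    # Assumption: nltk has not been imported yet in this process;
    # if it has, changing the variable here no longer affects data.path.
    from nltk import word_tokenize

    return word_tokenize(sentence)
```

Setting the variable in the operating system has the same effect and keeps the script itself unchanged, which is probably the better choice when several scripts share one data directory.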