I trained a Latent Dirichlet Allocation model with sklearn. After unpickling it, I use the CountVectorizer to transform a document and then call transform on the LDA instance to get the topic distribution, but I get the following error:
AttributeError: module '__main__' has no attribute 'tokenize'
Here is my code:
lda = joblib.load('lda_good.pkl')            # trained LDA model
tf_vect = joblib.load('tf_vectorizer_.pkl')  # vectorizer
texts = readContent('doc_name.pdf')
new_doc = tf_vect.transform(texts)
print(new_doc)
print(lda.transform(new_doc))
The thing is, the unpickled CountVectorizer object works fine and I can call its .transform method, but when I try to call .transform on the LDA instance, it seems to reference the tokenize function used by the CountVectorizer... tokenize is defined earlier in the code, but I can't understand what tokenize has to do with the LDA's transform method. The strange part is that all of this code runs fine in a Jupyter notebook, but fails when run as a script...
All the code is in a single file. The model was trained in a Jupyter notebook, and now I'm trying to use it from a script.
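For context on why tokenize shows up at all: pickle serializes functions by reference (module name plus attribute name), not by their code. If the CountVectorizer was built with a custom tokenizer defined at the notebook's top level, its pickle records `__main__.tokenize`, and unpickling fails in any process whose `__main__` does not define that name. A minimal stdlib-only sketch of the mechanism (the function and variable names here are illustrative, not the asker's actual code):

```python
import pickle
import sys

def tokenize(text):
    # stand-in for a custom tokenizer passed to CountVectorizer(tokenizer=...)
    return text.split()

# pickle records the function as (module, name), e.g. '__main__.tokenize'
blob = pickle.dumps(tokenize)

mod = sys.modules[tokenize.__module__]
fn = tokenize                # keep a reference before removing the global name
delattr(mod, "tokenize")     # simulate a process that never defined tokenize

try:
    pickle.loads(blob)       # this is what joblib.load hits internally
except AttributeError as err:
    error_message = str(err) # mentions the missing attribute 'tokenize'

# restoring the name in the same module makes unpickling work again
mod.tokenize = fn
restored = pickle.loads(blob)
```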
Here is the traceback:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\spawn.py", line 106, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\spawn.py", line 115, in _main
    prepare(preparation_data)
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\spawn.py", line 226, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\spawn.py", line 278, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\runpy.py", line 254, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\eduard.bermejo\Documents\Machine learning\gestió documental\POC\program_POC.py", line 160, in <module>
    tf_vect = joblib.load('tf_vectorizer_.pkl')
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\externals\joblib\numpy_pickle.py", line 459, in load
    obj = unpickler.load()
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\pickle.py", line 1039, in load
    dispatch[key[0]](self)
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\pickle.py", line 1334, in load_global
    klass = self.find_class(module, name)
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\pickle.py", line 1388, in find_class
    return getattr(sys.modules[module], name)
AttributeError: module '__main__' has no attribute 'tokenize'

(The same traceback is printed, interleaved, by several spawned child processes, so the original output repeats these frames many times.)
It actually keeps going, but I guess this is enough, since it enters some kind of loop.
Let me know if you need further information.
Thanks in advance.
Answer:
Looking at similar questions on Stack Overflow, this points to a pickling/unpickling problem. My guess is that the code you used to run joblib.dump lives in a different directory. Could you put it in the same directory as this program and re-run the dump and load? The `__main__` reference is stored for the script that did the pickling, and the unpickler tries to resolve it in the running process at load time.
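A more robust way to avoid this class of error entirely (a sketch, not the asker's exact layout) is to keep the tokenizer in its own module, e.g. a hypothetical text_utils.py kept next to both the training notebook and the loading script. The pickle then references text_utils.tokenize instead of __main__.tokenize, so any process that can import that module can load the vectorizer. A self-contained demo using a temporary directory to stand in for the project folder:

```python
import importlib
import pathlib
import pickle
import sys
import tempfile

# write the tokenizer into its own module so it is importable by name
# (hypothetical file name; in practice keep text_utils.py next to both
# the training notebook and the loading script)
tmpdir = pathlib.Path(tempfile.mkdtemp())
(tmpdir / "text_utils.py").write_text(
    "def tokenize(text):\n"
    "    return text.split()\n"
)
sys.path.insert(0, str(tmpdir))
importlib.invalidate_caches()

from text_utils import tokenize

# the pickle now references 'text_utils.tokenize', not '__main__.tokenize',
# so it no longer depends on what the loading script's __main__ defines
blob = pickle.dumps(tokenize)
restored = pickle.loads(blob)
```

The same idea applies unchanged when the function is stored inside a CountVectorizer and dumped with joblib: define tokenize in the shared module, import it in the notebook before fitting and dumping, and import it again in the script before joblib.load.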