Error after unpickling when calling the transform method of a Latent Dirichlet Allocation model

I trained a Latent Dirichlet Allocation model with sklearn. After unpickling it, I use a CountVectorizer to transform documents and then call transform on the LDA instance to get the topic distribution, but I get the following error:

AttributeError: module '__main__' has no attribute 'tokenize'

Here is my code:

lda = joblib.load('lda_good.pkl')  # trained LDA model
tf_vect = joblib.load('tf_vectorizer_.pkl')  # vectorizer
texts = readContent('doc_name.pdf')
new_doc = tf_vect.transform(texts)
print(new_doc)
print(lda.transform(new_doc))

The thing is that the unpickled CountVectorizer object works fine and I can call its .transform method, but when I try to call .transform on the LDA instance, it seems to reference the tokenize function from the CountVectorizer... tokenize is defined earlier in the code, but I can't understand what tokenize has to do with the LDA transform method. Strangely, all of this code runs fine in a Jupyter notebook but fails when run as a script...

All the code is in a single file. The model was trained in a Jupyter notebook, and now I am trying to use it from a script.

Here is the traceback:

(Several spawned worker processes print the same traceback interleaved; a single deinterleaved copy follows.)

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\spawn.py", line 106, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\spawn.py", line 115, in _main
    prepare(preparation_data)
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\spawn.py", line 226, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\spawn.py", line 278, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\runpy.py", line 254, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\eduard.bermejo\Documents\Machine learning\gestió documental\POC\program_POC.py", line 160, in <module>
    tf_vect = joblib.load('tf_vectorizer_.pkl')
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\externals\joblib\numpy_pickle.py", line 459, in load
    obj = unpickler.load()
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\pickle.py", line 1039, in load
    dispatch[key[0]](self)
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\pickle.py", line 1334, in load_global
    klass = self.find_class(module, name)
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\pickle.py", line 1388, in find_class
    return getattr(sys.modules[module], name)
AttributeError: module '__main__' has no attribute 'tokenize'

It actually goes on, but I think this is enough, since it starts some kind of loop.

Let me know if you need any further information.

Thanks in advance


Answer:

Looking at similar questions on Stack Overflow, this points to a pickling/unpickling problem. My guess is that the code that ran joblib.dump lives in a different directory. Could you move it into the same directory as this program and re-run the dump and load? References are stored relative to the __main__ module that did the pickling, and the unpickler tries to resolve them there at load time.
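To make the failure mode concrete, here is a minimal, self-contained sketch. It uses a hypothetical toy Vectorizer class instead of sklearn's CountVectorizer, and a stand-in tokenize function, to show that pickle stores a custom tokenizer only by its qualified name ("__main__.tokenize"), not by its body:

```python
import pickle

def tokenize(text):
    # stand-in for the tokenize function defined earlier in the script
    return text.lower().split()

class Vectorizer:
    # minimal stand-in for CountVectorizer holding a tokenizer reference
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def transform(self, texts):
        return [self.tokenizer(t) for t in texts]

vect = Vectorizer(tokenizer=tokenize)
payload = pickle.dumps(vect)

# Only the name "tokenize" is stored in the pickle stream, not the code:
print(b"tokenize" in payload)  # True

# Unpickling works HERE because tokenize still exists in this process's
# __main__.  In a multiprocessing spawn child (where the script is
# re-imported as "__mp_main__"), __main__ has no tokenize attribute and
# loading fails with exactly the AttributeError in the traceback above.
restored = pickle.loads(payload)
print(restored.transform(["Some Sample Text"]))  # [['some', 'sample', 'text']]
```

A common workaround for this class of error is to define tokenize in a separate importable module (e.g. a hypothetical tokenizers.py) and import it both in the script that dumps the pickle and in the one that loads it, so the unpickler can resolve it by module name in any process.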
