Error after unpickling when using the transform method of a Latent Dirichlet Allocation model

I trained a Latent Dirichlet Allocation model with sklearn. After unpickling it, I use a CountVectorizer to transform my documents and then call the LDA model's transform on the result to get the topic distribution, but I get the following error:

AttributeError: module '__main__' has no attribute 'tokenize'

Here is my code:

from sklearn.externals import joblib  # joblib as bundled with this sklearn version

lda = joblib.load('lda_good.pkl')            # trained LDA model
tf_vect = joblib.load('tf_vectorizer_.pkl')  # fitted vectorizer
texts = readContent('doc_name.pdf')          # user-defined PDF reader
new_doc = tf_vect.transform(texts)
print(new_doc)
print(lda.transform(new_doc))

The odd thing is that the unpickled CountVectorizer object works fine and I can call its .transform method, but when I call .transform on the LDA model, it apparently tries to look up the tokenize function referenced by the CountVectorizer. tokenize is defined earlier in the code, and I don't understand what it has to do with LDA's transform method. Stranger still, all of this runs fine in a Jupyter notebook but fails when run as a script.

All of the code is in a single file. The model was trained in a Jupyter notebook, and I am now trying to use it from a script.
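For illustration, the failure can be reproduced without sklearn at all. This is a minimal sketch (plain pickle, with a stand-in tokenize function) of why the lookup fails: pickling a function stores only a module-qualified name, never the function's code, so the loading process must be able to resolve that name again.

```python
import pickle

# a stand-in for the custom tokenizer passed to CountVectorizer
def tokenize(text):
    return text.lower().split()

# pickling a function stores only a reference ('module.name'), not its code
payload = pickle.dumps(tokenize)
print(b"tokenize" in payload)  # True: only the name is in the pickle stream

# unpickling resolves that reference again; if the loading process's
# __main__ module has no 'tokenize' attribute, this raises exactly the
# AttributeError from the question
restored = pickle.loads(payload)
print(restored("Hello World"))
```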

Here is the traceback:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\spawn.py", line 106, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\spawn.py", line 115, in _main
    prepare(preparation_data)
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\spawn.py", line 226, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\spawn.py", line 278, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\runpy.py", line 254, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\eduard.bermejo\Documents\Machine learning\gestió documental\POC\program_POC.py", line 160, in <module>
    tf_vect = joblib.load('tf_vectorizer_.pkl')
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\externals\joblib\numpy_pickle.py", line 459, in load
    obj = unpickler.load()
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\pickle.py", line 1039, in load
    dispatch[key[0]](self)
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\pickle.py", line 1334, in load_global
    klass = self.find_class(module, name)
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\pickle.py", line 1388, in find_class
    return getattr(sys.modules[module], name)
AttributeError: module '__main__' has no attribute 'tokenize'

(The same traceback is printed, interleaved, by each spawned worker process.)

It actually keeps going, but I think this is enough, because it seems to enter some kind of loop.

Let me know if you need any further information.

Thanks in advance.


回答:

Similar questions on Stack Overflow suggest this is a pickling/unpickling problem. My guess is that the code you used to run joblib.dump lives in a different directory from this script. Could you move it into the same directory as this program and re-run the pickling and unpickling? The reference to __main__ is stored relative to the module that did the pickling, and the unpickler tries to resolve that name at load time.
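A more robust fix (an assumption on my part, not tested against your files) is to define tokenize in a small importable module rather than in the script itself, so the pickle records a name that any process, whether notebook, script, or spawned worker, can resolve. Here is a runnable sketch that simulates the helper module with a temporary file; the module name text_utils is made up for illustration:

```python
import os
import pickle
import sys
import tempfile

# Simulate a small helper module on disk; in practice you would create
# text_utils.py next to your script (the module name is hypothetical)
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "text_utils.py"), "w") as f:
    f.write("def tokenize(text):\n    return text.lower().split()\n")
sys.path.insert(0, tmpdir)

import text_utils

# The pickle now records 'text_utils.tokenize' instead of
# '__main__.tokenize', so any process that can import text_utils can
# unpickle it, including multiprocessing workers spawned on Windows
payload = pickle.dumps(text_utils.tokenize)
print(b"text_utils" in payload)  # True

restored = pickle.loads(payload)
print(restored("Hello World"))
```

After re-running joblib.dump with the tokenizer imported from such a module, the script should unpickle cleanly. Separately, since your traceback shows Windows' spawn start method re-importing the script for each worker, wrapping the top-level loading code in an `if __name__ == '__main__':` guard should stop the repeated-traceback loop.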

