I've been trying to run the blei_lda.py file from Chapter 4 of Building Machine Learning Systems with Python, but with no success. I'm using Python 2.7 and the Enthought Canopy GUI. Below is the actual file provided by the authors; there are also several copies of it on GitHub.
The problem is that I keep getting the following error:
TypeError                                 Traceback (most recent call last)
c:\users\matt\desktop\pythonprojects\pml\ch04\blei_lda.py in <module>()
        for ti in range(model.num_topics):
            words = model.show_topic(ti, 64)
------>     tf = sum(f for f, w in words)
            with open('topics.txt', 'w') as output:
                output.write('\n'.join('{}:{}'.format(w, int(1000. * f / tf)) for f, w in words))
                output.write("\n\n\n")

TypeError: unsupported operand type(s) for +: 'int' and 'unicode'
I've tried to put together a fix myself, but haven't found anything that fully works.
I've also searched online and on Stack Overflow for a solution, but I seem to be the only one having trouble running this file.
# This code is supporting material for the book
# Building Machine Learning Systems with Python
# by Willi Richert and Luis Pedro Coelho
# published by PACKT Publishing
#
# It is made available under the MIT License

from __future__ import print_function
from wordcloud import create_cloud
try:
    from gensim import corpora, models, matutils
except:
    print("import gensim failed.")
    print()
    print("Please install it")
    raise

import matplotlib.pyplot as plt
import numpy as np
from os import path

NUM_TOPICS = 100

# Check that data exists
if not path.exists('./data/ap/ap.dat'):
    print('Error: Expected data to be present at data/ap/')
    print('Please cd into ./data & run ./download_ap.sh')

# Load the data
corpus = corpora.BleiCorpus('./data/ap/ap.dat', './data/ap/vocab.txt')

# Build the topic model
model = models.ldamodel.LdaModel(
    corpus, num_topics=NUM_TOPICS, id2word=corpus.id2word, alpha=None)

# Iterate over all the topics in the model
for ti in range(model.num_topics):
    words = model.show_topic(ti, 64)
    tf = sum(f for f, w in words)
    with open('topics.txt', 'w') as output:
        output.write('\n'.join('{}:{}'.format(w, int(1000. * f / tf)) for f, w in words))
        output.write("\n\n\n")

# We first identify the most discussed topic, i.e., the one with the
# highest total weight
topics = matutils.corpus2dense(model[corpus], num_terms=model.num_topics)
weight = topics.sum(1)
max_topic = weight.argmax()

# Get the top 64 words for this topic
# Without the argument, show_topic would return only 10 words
words = model.show_topic(max_topic, 64)

# This function will actually check for the presence of pytagcloud and is otherwise a no-op
create_cloud('cloud_blei_lda.png', words)

num_topics_used = [len(model[doc]) for doc in corpus]

fig, ax = plt.subplots()
ax.hist(num_topics_used, np.arange(42))
ax.set_ylabel('Nr of documents')
ax.set_xlabel('Nr of topics')
fig.tight_layout()
fig.savefig('Figure_04_01.png')

# Now, repeat the same exercise using alpha=1.0
# You can edit the constant below to play around with this parameter
ALPHA = 1.0

model1 = models.ldamodel.LdaModel(
    corpus, num_topics=NUM_TOPICS, id2word=corpus.id2word, alpha=ALPHA)
num_topics_used1 = [len(model1[doc]) for doc in corpus]

fig, ax = plt.subplots()
ax.hist([num_topics_used, num_topics_used1], np.arange(42))
ax.set_ylabel('Nr of documents')
ax.set_xlabel('Nr of topics')

# The coordinates below were fit by trial and error to look good
ax.text(9, 223, r'default alpha')
ax.text(26, 156, 'alpha=1.0')
fig.tight_layout()
fig.savefig('Figure_04_02.png')
Answer:
On this line:

words = model.show_topic(ti, 64)

words is a list of (unicode, float64) tuples,
for example: [(u'school', 0.029515796999228502), (u'prom', 0.018586355008452897)]
So on the line

tf = sum(f for f, w in words)

f is bound to the unicode word and w to the float value. You end up trying to sum the unicode values, which is what raises the unsupported operand type error.
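Here is a minimal sketch that reproduces the problem using the two pairs shown above (the words and weights are just the illustrative values, not real model output):

# words as returned by show_topic here: (unicode word, float weight) pairs
words = [(u'school', 0.029515796999228502), (u'prom', 0.018586355008452897)]

# Unpacking as (f, w) binds f to the unicode word, so sum() ends up computing 0 + u'school'
tf = sum(f for f, w in words)
# TypeError: unsupported operand type(s) for +: 'int' and 'unicode'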
Change the line to

tf = sum(f for w, f in words)

so that it sums the float values instead.
For the same reason, also change this line:

output.write('\n'.join('{}:{}'.format(w, int(1000. * f / tf)) for w, f in words))
So the code snippet will look like this:
for ti in range(model.num_topics):
    words = model.show_topic(ti, 64)
    tf = sum(f for w, f in words)
    with open('topics.txt', 'w') as output:
        output.write('\n'.join('{}:{}'.format(w, int(1000. * f / tf)) for w, f in words))
        output.write("\n\n\n")
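If you want to double-check which order your installed gensim returns before editing the loop (the (word, weight) vs. (weight, word) order has differed across gensim releases, which is presumably why the book's snippet no longer runs as-is), a quick inspection like this, run after the model is built, prints a few pairs:

# Print the first topic's top 3 pairs to see whether the word or the weight comes first
print(model.show_topic(0, 3))
# e.g. [(u'school', 0.0295...), (u'prom', 0.0185...), ...] means (word, weight)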