Yesterday I was trying to finish Lesson 11 of the Udacity course, on text vectorization. I went through the code and everything seemed to run fine: I processed a batch of emails, opened them, removed a few signature words, and returned each email's stemmed words into a list.
This is loop 1:
for name, from_person in [("sara", from_sara), ("chris", from_chris)]:
    for path in from_person:
        ### only look at first 200 emails when developing
        ### once everything is working, remove this line to run over full dataset
        # temp_counter += 1
        if temp_counter < 200:
            path = os.path.join('/xxx', path[:-1])
            email = open(path, "r")

            ### use parseOutText to extract the text from the opened email
            email_stemmed = parseOutText(email)

            ### use str.replace() to remove any instances of the words
            ### ["sara", "shackleton", "chris", "germani"]
            email_stemmed.replace("sara","")
            email_stemmed.replace("shackleton","")
            email_stemmed.replace("chris","")
            email_stemmed.replace("germani","")

            ### append the text to word_data
            word_data.append(email_stemmed.replace('\n', ' ').strip())

            ### append a 0 to from_data if email is from Sara, and 1 if email is from Chris
            if from_person == "sara":
                from_data.append(0)
            elif from_person == "chris":
                from_data.append(1)

            email.close()
This is loop 2:
for name, from_person in [("sara", from_sara), ("chris", from_chris)]:
    for path in from_person:
        ### only look at first 200 emails when developing
        ### once everything is working, remove this line to run over full dataset
        # temp_counter += 1
        if temp_counter < 200:
            path = os.path.join('/xxx', path[:-1])
            email = open(path, "r")

            ### use parseOutText to extract the text from the opened email
            stemmed_email = parseOutText(email)

            ### use str.replace() to remove any instances of the words
            ### ["sara", "shackleton", "chris", "germani"]
            signature_words = ["sara", "shackleton", "chris", "germani"]
            for each_word in signature_words:
                stemmed_email = stemmed_email.replace(each_word, '')  # careful here, dont use another variable, I did and broke my head to solve it

            ### append the text to word_data
            word_data.append(stemmed_email)

            ### append a 0 to from_data if email is from Sara, and 1 if email is from Chris
            if name == "sara":
                from_data.append(0)
            else:  # its chris
                from_data.append(1)

            email.close()
The next part of the code works as expected:
print("emails processed")from_sara.close()from_chris.close()pickle.dump( word_data, open("/xxx/your_word_data.pkl", "wb") )pickle.dump( from_data, open("xxx/your_email_authors.pkl", "wb") )print("Answer to Lesson 11 quiz 19: ")print(word_data[152])### in Part 4, do TfIdf vectorization herefrom sklearn.feature_extraction.text import TfidfVectorizerfrom sklearn.feature_extraction import stop_wordsprint("SKLearn has this many Stop Words: ")print(len(stop_words.ENGLISH_STOP_WORDS))vectorizer = TfidfVectorizer(stop_words="english", lowercase=True)vectorizer.fit_transform(word_data)feature_names = vectorizer.get_feature_names()print('Number of different words: ')print(len(feature_names))
But when I count the total number of words with loop 1, I get the wrong result; with loop 2, I get the correct one.
I have stared at this code for so long that I can't spot the difference. What am I doing wrong in loop 1?
For the record, the wrong answer I keep getting is 38825. The correct answer should be 38757.
Many thanks for your help, kind stranger!
Answer:
These lines have no effect:
email_stemmed.replace("sara","")email_stemmed.replace("shackleton","")email_stemmed.replace("chris","")email_stemmed.replace("germani","")
replace returns a new string; it does not modify email_stemmed in place. Instead, assign the return value back to email_stemmed:
email_stemmed = email_stemmed.replace("sara", "")
And so on.
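A quick illustration in the Python shell, using a made-up string rather than the real email data:

>>> text = "hi sara"
>>> text.replace("sara", "")          # returns a new string...
'hi '
>>> text                              # ...while the original is untouched
'hi sara'
>>> text = text.replace("sara", "")   # assign the result back to keep it
>>> text
'hi '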
Loop 2, by contrast, does assign the return value inside the for loop:
for each_word in signature_words:
    stemmed_email = stemmed_email.replace(each_word, '')
The two snippets are not equivalent: at the end of the first one, email_stemmed has not changed at all because replace was used incorrectly, while at the end of the second one, stemmed_email really has had each of the words removed.
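To make the difference concrete, here is a minimal, self-contained sketch using a toy string (not the real email data) that mirrors the two patterns:

# Toy input standing in for one stemmed email; not the real data.
stemmed = "sara shackleton wrote to chris germani about stock"

# Pattern from loop 1: every replace() result is thrown away,
# so the string comes out exactly as it went in.
broken = stemmed
broken.replace("sara", "")
broken.replace("shackleton", "")
broken.replace("chris", "")
broken.replace("germani", "")
print(broken)   # -> "sara shackleton wrote to chris germani about stock"

# Pattern from loop 2: the result is assigned back on every pass,
# so the signature words really are removed.
fixed = stemmed
for each_word in ["sara", "shackleton", "chris", "germani"]:
    fixed = fixed.replace(each_word, "")
print(fixed)    # -> "  wrote to   about stock" (only the surrounding spaces remain)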