Using three different labels in machine learning

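I am still new to machine learning. I am working through code that classifies emails as spam or ham, and I ran into a problem when adapting it to a different dataset. My dataset does not just contain ham-or-spam values: it has two different categorical values (age and gender). When I try to use both categorical values in the code block below, I get an error saying there are too many values to unpack. How can I handle all of the values?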

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(messages_bow, import_data['age'], import_data['gender'], test_size=0.20, random_state=0)

Full code:

import numpy as np
import pandas
import nltk
from nltk.corpus import stopwords
import string

# Import the data.
import_data = pandas.read_csv('/root/Desktop/%20/%100.csv', encoding='cp1252')

# View the column headers.
print(import_data.columns)

# Remove duplicates.
import_data.drop_duplicates(inplace=True)

# View the size of the data.
print(import_data.shape)

# Tokenization (a list of tokens), to be used as the analyzer.
# 1. Punctuation is [!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~]
# 2. Stop words in natural language processing are useless words (data).
def process_text(text):
    '''
    What this covers:
    1. Remove punctuation
    2. Remove stop words
    3. Return a list of cleaned text words
    '''
    # 1
    nopunc = [char for char in text if char not in string.punctuation]
    nopunc = ''.join(nopunc)
    # 2
    clean_words = [word for word in nopunc.split() if word.lower() not in stopwords.words('english')]
    # 3
    return clean_words

# Show the tokenization (a list of tokens).
print(import_data['text'].head().apply(process_text))

# Convert the text into a matrix of token counts.
from sklearn.feature_extraction.text import CountVectorizer
messages_bow = CountVectorizer(analyzer=process_text).fit_transform(import_data['text'])

# Split the data into 80% training and 20% testing.
# This is the line that fails: three arrays go in, but only four names receive the result.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(messages_bow, import_data['gender'], import_data['frequency'], test_size=0.20, random_state=0)

# Get the shape of messages_bow.
print(messages_bow.shape)

Answer:

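train_test_split splits every array you pass to it into a training part and a test part, so it returns two values per input. You are splitting three arrays (the features plus two label columns), so you need six variables to receive the result: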

X_train, X_test, age_train, age_test, gender_train, gender_test = train_test_split(messages_bow, import_data['age'], import_data['gender'], test_size=0.20, random_state=0)
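To see the six-way unpacking without the original CSV, here is a minimal sketch on made-up data. The arrays `X`, `age`, and `gender` are hypothetical stand-ins for `messages_bow`, `import_data['age']`, and `import_data['gender']`; none of the values come from the question. It also shows one possible alternative, assuming you would rather keep the usual four-variable form: stack both label columns into a single 2-D array before splitting.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 10 samples, 3 features, two label arrays.
X = np.arange(30).reshape(10, 3)
age = np.array([20, 25, 30, 35, 40, 45, 50, 55, 60, 65])
gender = np.array(["F", "M", "F", "M", "F", "M", "F", "M", "F", "M"])

# Three input arrays produce six outputs: a train part and a test part for each.
parts = train_test_split(X, age, gender, test_size=0.20, random_state=0)
X_train, X_test, age_train, age_test, gender_train, gender_test = parts

# Alternative: combine both labels into one 2-D array, so only four values come back.
# (Mixing ints and strings in one NumPy array converts the ages to strings.)
labels = np.column_stack([age, gender])
X_train2, X_test2, y_train2, y_test2 = train_test_split(
    X, labels, test_size=0.20, random_state=0
)

print(len(parts))                    # number of returned arrays
print(X_train.shape, X_test.shape)   # feature splits
print(y_train2.shape, y_test2.shape) # combined-label splits
```

Because every array is shuffled with the same indices, row i of `X_train` still lines up with element i of `age_train` and `gender_train`. With a pandas DataFrame, passing `import_data[['age', 'gender']]` as the second argument should give the same four-output form while keeping each column's dtype.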
