Merging predictions back into the original DataFrame?

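I have built a machine learning algorithm that classifies categories from text. I'm 99% done, but I don't know how to merge my predictions back into the original DataFrame so that I can print the data I started with alongside the predictions.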

Here is my code:

# imports data from excel file and shows first 5 rows of data
file_name = r'C:\Users\aac1928\Documents\Machine Learning\Training        Data\RFP Training Data.xlsx'
sheet = 'Sheet1'

import pandas as pd
import numpy
import xlsxwriter
import sklearn

df = pd.read_excel(io=file_name, sheet_name=sheet)

# extracts specific columns from data
data = df.iloc[:, [0, 2]]
print(data)

# Gets data ready for model
newdata = df.iloc[:, [1, 2]]
newdata = newdata.rename(columns={'Label': 'label'})
newdata = newdata.rename(columns={'RFP Question': 'question'})
print(newdata)

# how to define X and y for use with COUNTVECTORIZER
X = newdata.question
y = newdata.label
print(X.shape)
print(y.shape)

# split X and y into training and testing sets
X_train = X
y_train = y
X_test = newdata.question[:50]
y_test = newdata.label[:50]
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

# import and instantiate CountVectorizer (with the default parameters)
from sklearn.feature_extraction.text import CountVectorizer
vect = CountVectorizer()

# equivalently: combine fit and transform into a single step
X_train_dtm = vect.fit_transform(X_train)

# transform testing data (using fitted vocabulary) into a document-term matrix
X_test_dtm = vect.transform(X_test)
X_test_dtm

# import and instantiate a logistic regression model
from sklearn.linear_model import LogisticRegression
logreg = LogisticRegression()

# train the model using X_train_dtm
%time logreg.fit(X_train_dtm, y_train)

# make class predictions for X_test_dtm
y_pred_class = logreg.predict(X_test_dtm)
y_pred_class

# calculate predicted probabilities for X_test_dtm (well calibrated)
y_pred_prob = logreg.predict_proba(X_test_dtm)[:, 1]
y_pred_prob

# calculate accuracy
from sklearn import metrics
metrics.accuracy_score(y_test, y_pred_class)

This is the new data I added to make predictions on; it has the same length as the prediction array:

# split X and y into training and testing sets
X_train = X
y_train = y
X_testnew = dfpred.question
y_testnew = dfpred.label
print(X_train.shape)
print(X_testnew.shape)
print(y_train.shape)
print(y_testnew.shape)

(447,)
(168,)
(447,)
(168,)

# transform new testing data (using fitted vocabulary) into a document-term matrix
X_test_dtm_new = vect.transform(X_testnew)
X_test_dtm_new

<168x1382 sparse matrix of type '<class 'numpy.int64'>' with 2240 stored elements in Compressed Sparse Row format>

# make class predictions for new X_test_dtm
y_pred_class_new = nb.predict(X_test_dtm_new)
y_pred_class_new

array([ 3, 3, 19, 18, 5, 10, 10, 5, 19, 3, 3, 3, 5, 3, 3, 3, 3, 9, 19, 5, 5, 10, 9, 5, 18, 19, 9, 9, 19, 19, 18, 18, 18, 4, 18, 3, 9, 18, 19, 19, 18, 19, 5, 19, 19, 3, 3, 18, 18, 5, 18, 3, 4, 5, 6, 4, 5, 19, 19, 5, 5, 19, 19, 4, 5, 18, 5, 5, 19, 5, 18, 5, 19, 18, 19, 5, 7, 5, 9, 9, 9, 9, 10, 9, 9, 5, 5, 5, 5, 3, 18, 4, 9, 5, 3, 6, 9, 18, 7, 5, 9, 5, 5, 19, 5, 5, 19, 5, 6, 5, 5, 6, 9, 21, 10, 9, 18, 9, 9, 3, 18, 5, 6, 18, 6, 3, 6, 5, 18, 6, 5, 18, 5, 6, 7, 7, 5, 7, 19, 18, 6, 5, 5, 5, 5, 5, 19, 16, 5, 19, 5, 5, 5, 5, 19, 5, 7, 19, 6, 7, 3, 18, 18, 18, 6, 19, 19, 7], dtype=int64)

# calculate predicted probabilities for X_test_dtm (well calibrated)
y_pred_prob_new = logreg.predict_proba(X_test_dtm_new)[:, 1]
y_pred_prob_new

df['prediction'] = pd.Series(y_pred_class_new)
dfout = pd.merge(dfpred, df['prediction'].dropna().to_frame(), how='left', left_index=True, right_index=True)

print(dfout)

I hope this helps; I tried to explain it as clearly as I could.


Answer:

I think that since your predictions are just an array, you are better off assigning them directly:

df['predictions'] = y_pred_class
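
As a minimal sketch of that idea applied to the new data in the question: the frame and array below are small hypothetical stand-ins for `dfpred` and `y_pred_class_new`, and the values are made up for illustration. The sketch assumes the array holds exactly one prediction per row of the frame it is attached to. It also shows what can go wrong when the array is wrapped in `pd.Series` first, as in the question's code.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the new data (dfpred in the question), with a
# non-default index to show that index labels do not matter for arrays.
dfpred = pd.DataFrame(
    {"question": ["q1", "q2", "q3"], "label": [3, 5, 19]},
    index=[10, 11, 12],
)

# Hypothetical stand-in for the output of model.predict(...): a plain
# NumPy array with one prediction per row of dfpred.
y_pred_class_new = np.array([3, 5, 18])

# A NumPy array is assigned by position, so it only has to match the
# number of rows in the frame.
assert len(y_pred_class_new) == len(dfpred)
dfout = dfpred.copy()
dfout["prediction"] = y_pred_class_new

# Wrapping the array in pd.Series first gives it a fresh 0..n-1 index, and
# assignment then aligns on index labels, leaving NaN where they differ.
misaligned = dfpred.copy()
misaligned["prediction"] = pd.Series(y_pred_class_new)

print(dfout)
print(misaligned)
```

Because the array is attached to the same frame that produced the features, no merge is needed afterwards, and the row count check guards against attaching predictions to the wrong frame.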
