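I have two datasets (a training set and a test set) with exactly the same feature and label columns; only the data inside them (the numbers and values) differ. Here is my code: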
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Load the two separate datasets
datatraining = pd.read_csv("datatrain.csv")
datatesting = pd.read_csv("datatest.csv")

# Feature columns shared by both files
columns = ["Full", "Id", "Id & PPDB", "Id & Words Sequence", "Id & Synonyms", "Id & Hypernyms", "Id & Hyponyms"]

labeltrain = datatraining["Gold Standard"].values
featurestrain = datatraining[list(columns)].values
labeltest = datatesting["Gold Standard"].values
featurestest = datatesting[list(columns)].values

X_train = featurestrain
y_train = labeltrain
X_test = featurestest
y_test = labeltest

mlp = MLPRegressor(solver='lbfgs', hidden_layer_sizes=50, max_iter=1000, learning_rate='constant')

mlp.fit(X_train, y_train)
print('Accuracy training : {:.3f}'.format(mlp.score(X_train, y_train)))
print()
mlp.fit(X_test, y_test)
print('Accuracy testing : {:.3f}'.format(mlp.score(X_test, y_test)))
print()
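I still doubt whether my code computes the training and test scores correctly, because I don't see anything that distinguishes the training set from the test set. As far as I can tell, either both are being trained on or both are being tested on. Can someone explain how to tell them apart? Or is my code already correct? Thanks
Answer:
Once you have fitted the model on the training set, you should not fit it again on the test set. A second call to fit trains the model on the test data, so the score that follows is no longer measured on unseen data. Instead, use the test set only to evaluate the model. So you need to remove the following line from your code: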
mlp.fit(X_test, y_test)
Then, with the following line:
print('Accuracy testing : {:.3f}'.format(mlp.score(X_test, y_test)))
you will be able to evaluate how the model performs on unseen data.
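Here is a minimal sketch of the corrected flow: fit once on the training data, then call score on each set. Since I don't have your CSV files, the helper make_synthetic below is a hypothetical stand-in for pd.read_csv that generates random data with 7 feature columns; the row counts, seeds, and random_state are likewise assumptions made only for illustration. Also note that for a regressor such as MLPRegressor, score returns the coefficient of determination (R^2), not an accuracy, so the labels are renamed accordingly.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor


def make_synthetic(n_rows, seed):
    """Hypothetical stand-in for pd.read_csv(...): 7 features, 1 target."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n_rows, 7))
    # Assumed target: the mean of the features plus a little noise
    y = X.mean(axis=1) + rng.normal(0.0, 0.01, size=n_rows)
    return X, y


# Two separate datasets with the same columns, as in the question
X_train, y_train = make_synthetic(200, seed=0)
X_test, y_test = make_synthetic(80, seed=1)

mlp = MLPRegressor(solver='lbfgs', hidden_layer_sizes=(50,),
                   max_iter=1000, random_state=0)

# Fit exactly once, on the training data only
mlp.fit(X_train, y_train)

# score() only predicts and compares; it never updates the weights
train_r2 = mlp.score(X_train, y_train)
test_r2 = mlp.score(X_test, y_test)

print('R^2 training : {:.3f}'.format(train_r2))
print('R^2 testing  : {:.3f}'.format(test_r2))
```

What separates training from testing is which method you call with which data: fit learns from the arrays it receives, while score and predict only use the weights that have already been learned.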