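Using K-fold cross-validation

I am trying to predict a binary response variable with a logistic regression classifier, using three binary explanatory variables related to a customer's banking history: default, housing, and loan.

I have the following dataset:

A mapping function converts the text values "no/yes" to the integers "0/1":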

convert_to_binary = {'no': 0, 'yes': 1}
default = bank['default'].map(convert_to_binary)
housing = bank['housing'].map(convert_to_binary)
loan = bank['loan'].map(convert_to_binary)
response = bank['response'].map(convert_to_binary)
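As a quick sanity check on a mapping like this, it can help to confirm that every value in a column is covered by the dictionary, because `Series.map` returns NaN for any value that is not a key. The sketch below uses a small made-up column (the `sample` Series and its "unknown" entry are hypothetical, not taken from the bank data) to show the effect:

```python
import pandas as pd

convert_to_binary = {'no': 0, 'yes': 1}

# Hypothetical sample column; the real bank data is not shown in the post.
sample = pd.Series(['no', 'yes', 'unknown', 'yes'])

# Values missing from the dictionary ('unknown' here) are mapped to NaN.
mapped = sample.map(convert_to_binary)

# Count how many entries were not covered by the mapping.
n_unmapped = int(mapped.isna().sum())
print(n_unmapped)
```

If the count is nonzero on the real columns, those rows would need handling before fitting a model.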

I put the three explanatory variables and the response variable into an array:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

data = np.array([np.array(default), np.array(housing), np.array(loan), np.array(response)]).T
kfold = KFold(n_splits=3)
scores = []
for train_index, test_index in kfold.split(data):
    X_train, X_test = data[train_index], data[test_index]
    y_train, y_test = response[train_index], response[test_index]
    model = LogisticRegression().fit(X_train, y_train)
    pred = model.predict(data[test_index])
    results = model.score(X_test, y_test)
    scores.append(results)
print(np.mean(scores))

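My accuracy is always 100%, which I know is not correct. Shouldn't the accuracy be somewhere around 50-65%?

Am I doing something wrong?

Answer:

The split is not done correctly.

This is the correct way to split: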

import sklearn.metrics

# Training features and training labels
X_train, X_labels = data[train_index], response[train_index]
# Test features (named y_test here) and test labels
y_test, y_labels = data[test_index], response[test_index]
model = LogisticRegression().fit(X_train, X_labels)
pred = model.predict(y_test)
acc = sklearn.metrics.accuracy_score(y_labels, pred, normalize=True)
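One more thing that may be worth checking: in the question's code, `data` is built with `response` as its fourth column, so the model is given the label as an input feature. That alone could explain a score of 100%, regardless of how the folds are split. The sketch below illustrates the effect on synthetic data (the random `features` and `response` arrays and the `kfold_accuracy` helper are assumptions made for illustration, not the real bank data), comparing a feature matrix that contains the response with one that does not:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold


def kfold_accuracy(X, y, n_splits=3):
    """Mean test accuracy of LogisticRegression over K folds (hypothetical helper)."""
    scores = []
    for train_index, test_index in KFold(n_splits=n_splits).split(X):
        model = LogisticRegression().fit(X[train_index], y[train_index])
        scores.append(model.score(X[test_index], y[test_index]))
    return float(np.mean(scores))


# Synthetic stand-in for the bank data: three binary features and a
# binary response that is generated independently of them.
rng = np.random.default_rng(0)
n = 600
features = rng.integers(0, 2, size=(n, 3))  # plays the role of default, housing, loan
response = rng.integers(0, 2, size=n)

# Same layout as the question: the response is included as a fourth column.
leaky = np.column_stack([features, response])

acc_leaky = kfold_accuracy(leaky, response)     # label is among the inputs
acc_clean = kfold_accuracy(features, response)  # only the three explanatory variables
print(acc_leaky, acc_clean)
```

Under these assumptions, the leaky matrix lets the classifier read the answer directly, while the clean matrix gives accuracy near chance because the synthetic response is unrelated to the features. On the real data, building the feature matrix from only `default`, `housing`, and `loan` would be the analogous change.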
