How to use my own dataset in sklearn – Python 3

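I want to run a linear regression, but I would like to use my own data, which comes from some .txt files. The data is a table with 3 columns.

So I would like to know how to change the following code, which is an example from http://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html

I modified the code from that example slightly and made up some data. Is this the right way to do it, i.e. using some X and Y like this? I would also like to know what effect [:2] has in the statement x_train = x[:2]. I don't really understand that part.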

    from sklearn import linear_model
    import matplotlib.pyplot as plt
    from sklearn.metrics import mean_squared_error, r2_score

    # X must be a NumPy array rather than a list.
    x = ([0], [1], [2], [3], [4], [5], [6], [7], [8], [9], [10])
    y = [5, 3, 8, 3, 4, 5, 5, 7, 8, 9, 10]

    x_train = x[:2]
    x_test = x[2:]
    y_train = y[:2]
    y_test = y[2:]

    regr = linear_model.LinearRegression()
    regr.fit(x_train, y_train)
    y_pred = regr.predict(x_test)

    # Coefficients
    print('Coefficients: \n', regr.coef_)
    # Mean squared error
    print("Mean squared error: %.2f" % mean_squared_error(y_test, y_pred))
    print('Variance score: %.2f' % r2_score(y_test, y_pred))

    plt.scatter(x_test, y_test, color='black')
    plt.plot(x_test, y_pred, color='blue', linewidth=3)
    plt.axis([0, 20, 0, 20])
    plt.show()
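On the slicing question, here is a minimal sketch of what `[:2]` and `[2:]` select, using the same made-up numbers stored as NumPy arrays. The alternative split at the end (`[:-2]` / `[-2:]`) is only an illustration of a split that keeps more rows for training, not something taken from the question.

```python
import numpy as np

# Same toy data as in the question, stored as NumPy arrays.
x = np.arange(11).reshape(-1, 1)   # shape (11, 1): one feature per row
y = np.array([5, 3, 8, 3, 4, 5, 5, 7, 8, 9, 10])

# x[:2] keeps only rows 0 and 1; x[2:] keeps everything from row 2 onward.
x_train, x_test = x[:2], x[2:]
y_train, y_test = y[:2], y[2:]
print(len(x_train), "training rows,", len(x_test), "test rows")

# Illustrative alternative: negative indices count from the end,
# so [:-2] is "all but the last two rows" and [-2:] is "the last two rows".
x_train_alt, x_test_alt = x[:-2], x[-2:]
y_train_alt, y_test_alt = y[:-2], y[-2:]
print(len(x_train_alt), "training rows,", len(x_test_alt), "test rows")
```

With `[:2]` the model is fitted on just two points, so the line passes exactly through those two points and the remaining nine rows are only used for testing.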

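Edit 1

With the help of this page I tried to write some code to fit my own data, but I cannot get a correct fit. If anyone has time to help me again, or can tell me whether I am doing something wrong, I would appreciate it.

The code I used and the picture I obtained: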

    import pandas as pd
    import numpy as np
    from sklearn.model_selection import KFold
    from sklearn.model_selection import cross_val_score
    from sklearn import linear_model
    from sklearn.metrics import mean_squared_error, r2_score
    import matplotlib.pyplot as plt

    data = pd.read_csv('data.txt')
    #x = data[['col1','col2']]
    x = data[['col1']]
    y = data['col3']

    # Convert to arrays so they can be fed to the model
    x = np.asarray(x)
    y = np.asarray(y)

    # Define the KFold splitter
    kf = KFold(n_splits=2)

    # Define the model
    regr = linear_model.LinearRegression()

    # Use cross-validation and return the r2 score of each fold.
    # To return a score other than r2, change the scoring argument of cross_val_score.
    scores = cross_val_score(regr, x, y, cv=kf, scoring='r2')
    print(scores)

    for train_index, test_index in kf.split(x):
        print("Train:", train_index, "Test:", test_index)
        X_train, X_test = x[train_index], x[test_index]
        y_train, y_test = y[train_index], y[test_index]

    plt.scatter(X_test, y_test)
    plt.show()
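One thing worth noting about the loop above: `X_train`, `X_test`, `y_train` and `y_test` are reassigned on every iteration, so after the loop only the split from the last fold remains. The sketch below uses a hypothetical stand-in array (not the real data file) to show what that last split looks like when `KFold` is used without shuffling, and how `shuffle=True` changes it. Whether this explains the behaviour seen with the real data is an assumption that would need checking.

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical stand-in for col1: ten ordered values, one feature per row.
x = np.arange(10).reshape(-1, 1)

# Without shuffling, each fold is a contiguous block of rows.
kf = KFold(n_splits=2)
folds = list(kf.split(x))
for train_index, test_index in folds:
    print("Train:", train_index, "Test:", test_index)

# This is what is left in the variables once the loop has finished:
# the model would train on the first half and be tested on the second half.
last_train, last_test = folds[-1]

# With shuffling, rows are assigned to folds at random, so the training
# and test rows are no longer two contiguous blocks by construction.
kf_shuffled = KFold(n_splits=2, shuffle=True, random_state=0)
shuffled_folds = list(kf_shuffled.split(x))
for train_index, test_index in shuffled_folds:
    print("Train:", train_index, "Test:", test_index)
```

If the rows of the file are ordered by col1, the unshuffled last fold amounts to fitting on one end of the range and predicting on the other.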

Here is a picture of my data and of what I get from the training and test split.

Then I ran a fit, but I don't know whether it is correct:

    regr.fit(X_train, y_train)
    y_pred = regr.predict(X_test)
    print(y_pred)

    plt.scatter(X_test, y_test, color='black')
    plt.plot(X_test, y_pred, color='blue', linewidth=3)
    plt.show()

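I get a completely weird fit.

I don't understand why I get it, since the fit works when I use MINUIT. So if anyone has some hints, they would help me a lot.

Why does the program apparently not use my "y" data for the training or test sample?

My data is available here: https://www.dropbox.com/sh/nbbsc0fqznkwxvt/AAD-u6lM4orJOGrgIyz0o8B9a?dl=0

For me, only col1 and col3 matter; col2 should be ignored. I want to run a fit on this data and extract my fit values. I know that a straight line fits this data.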

Answer:
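A minimal sketch of one way to do this, under stated assumptions: the file is taken to hold three whitespace-separated columns with no header row, and the in-memory text below is a hypothetical stand-in for data.txt with made-up numbers. Note that `pd.read_csv` splits on commas by default, so a whitespace-separated file would need an explicit `sep`; whether that applies to the real file is a guess. The sketch fits col3 against col1 on all rows, ignores col2, and reads the slope and intercept off the fitted model.

```python
import io

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for data.txt (made-up values, col3 = 2 * col1 + 1).
# Replace `raw` with the path to the real file.
raw = io.StringIO(
    "0.0 7.1 1.0\n"
    "1.0 3.4 3.0\n"
    "2.0 9.9 5.0\n"
    "3.0 0.2 7.0\n"
    "4.0 5.5 9.0\n"
    "5.0 1.8 11.0\n"
)

# Assumption: whitespace-separated columns and no header line.
data = pd.read_csv(raw, sep=r"\s+", header=None, names=["col1", "col2", "col3"])

# scikit-learn expects X with shape (n_samples, n_features) and y with shape (n_samples,).
X = data[["col1"]].to_numpy()
y = data["col3"].to_numpy()

# Fit a straight line y = slope * x + intercept on all rows.
model = LinearRegression().fit(X, y)
slope = model.coef_[0]
intercept = model.intercept_
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")

# Fitted values at the measured col1 positions, e.g. for plotting the line.
y_fit = model.predict(X)
```

It may be worth printing `data.head()` and `data.shape` right after loading, to confirm that three numeric columns were actually parsed before fitting anything.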
