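Is the regression line underfitting, and if so, what can I do to get accurate results? I can't tell whether the regression line is overfitting, underfitting, or accurate, so suggestions on that are welcome too. The file "Advertising.csv": https://github.com/marcopeix/ISL-linear-regression/tree/master/data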
# Importing the libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error

# Reading and knowing the data
data = pd.read_csv('Advertising.csv')
# print(data.head())
# print(data.columns)
# print(data.shape)

# Plotting the data
plt.figure(figsize=(10, 8))
plt.scatter(data['TV'], data['sales'], c='black')
plt.xlabel('Money Spent on TV ads')
plt.ylabel('Sales')
plt.show()

# Storing data into variables and shaping the data
X = data['TV'].values.reshape(-1, 1)
Y = data['sales'].values.reshape(-1, 1)

# Calling the model and fitting the model
reg = LinearRegression()
reg.fit(X, Y)

# Making predictions
predictions = reg.predict(X)

# Plotting the predicted data
plt.figure(figsize=(16, 8))
plt.scatter(data['TV'], data['sales'], c='black')
plt.plot(data['TV'], predictions, c='blue', linewidth=2)
plt.xlabel('Money Spent on TV ads')
plt.ylabel('Sales')
plt.show()

r2 = r2_score(Y, predictions)
print("R2 score is: ", r2)
print("Accuracy: {:.2f}".format(reg.score(X, Y)))
Answer:
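To tell whether your model is underfitting (or overfitting), you need to look at its bias (the distance between the model's predicted output and the expected output). As far as I know, you can't tell just by reading the code; you also need to evaluate the model (run it).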
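Since this is a plain linear regression, underfitting is a real possibility: a straight line cannot follow a curved relationship.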
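One way to look at the distance between predictions and expected outputs is through the residuals (observed minus predicted). The sketch below is only an illustration: it uses synthetic, deliberately curved data rather than Advertising.csv, and the split into three ranges of x is an arbitrary choice. If a straight line underfits, the average residual tends to change sign from one range of x to the next instead of hovering around zero everywhere.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (an assumption, not Advertising.csv):
# the true relationship is curved, so a straight line should underfit it.
rng = np.random.default_rng(0)
x = np.linspace(0, 300, 200)
y = 3 * np.sqrt(x) + rng.normal(0, 0.5, size=x.size)

X = x.reshape(-1, 1)
reg = LinearRegression().fit(X, y)

# Residual = observed value minus predicted value
residuals = y - reg.predict(X)

# Average residual in the low, middle and high thirds of x
# (x is sorted, so splitting the array splits the x range)
low, mid, high = (part.mean() for part in np.array_split(residuals, 3))
print("mean residual (low, mid, high x):", low, mid, high)
```

A pattern such as negative, positive, negative across the three ranges suggests the line is missing curvature in the data. Residuals that average close to zero in every range give no such hint.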
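I suggest splitting the data into a training set and a test set. You fit the model on the training set and use the test set to see how it performs on unseen data. If the model performs poorly on both the training data and the test data, it is underfitting. If it performs well on the training data but poorly on the test data, it is overfitting.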
You can try something like this:
from sklearn.model_selection import train_test_split

# Split the data into a train set and a test set,
# leaving 20% (the test_size parameter) for testing
X, X_test, Y, Y_test = train_test_split(
    data['TV'].values.reshape(-1, 1),
    data['sales'].values.reshape(-1, 1),
    test_size=0.2,
)

# Then fit your model on the training data only
# (reuses `data` and `LinearRegression` from the question's code)
reg = LinearRegression()
reg.fit(X, Y)

# Finally, evaluate how well it does on the training and test data
# (for a regressor, score returns R^2)
print("Train score " + str(reg.score(X, Y)))
print("Test score " + str(reg.score(X_test, Y_test)))
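A single 80/20 split on a small dataset can give noticeably different scores from one run to the next. As a possible variation on the same idea, k-fold cross-validation repeats the train/test comparison over several splits. This is a minimal sketch on synthetic data (an assumption, standing in for Advertising.csv); the choice of 5 folds and the fixed seeds are arbitrary.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_validate

# Synthetic stand-in data (an assumption, not Advertising.csv)
rng = np.random.default_rng(0)
X = rng.uniform(0, 300, size=(200, 1))
y = 7 + 0.05 * X[:, 0] + rng.normal(0, 2, size=200)

# 5-fold cross-validation, shuffling the rows before splitting
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_validate(
    LinearRegression(), X, y, cv=cv, scoring="r2", return_train_score=True
)

# Average the per-fold R^2 on the training and held-out parts
train_r2 = scores["train_score"].mean()
test_r2 = scores["test_score"].mean()
print("Mean train R^2:", train_r2)
print("Mean test R^2:", test_r2)
```

The two averages are read the same way as the single-split scores: both low points to underfitting, while a high train score with a clearly lower test score points to overfitting.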