Error caused by increasing the degree of polynomial regression

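I'm trying to predict Boston house prices. With polynomial regression of degree 1 or 2, the R² score is acceptable. But with degree 3, the R² score drops sharply and even becomes hugely negative.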

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this import only works with older scikit-learn versions.
from sklearn.datasets import load_boston
boston_dataset = load_boston()
dataset = pd.DataFrame(boston_dataset.data, columns = boston_dataset.feature_names)
dataset['MEDV'] = boston_dataset.target
X = dataset.iloc[:, 0:13].values
y = dataset.iloc[:, 13].values.reshape(-1,1)

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

# Fitting Polynomial Regression to the dataset
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree = 2)   # <-- Tuning to 3
X_poly = poly_reg.fit_transform(X_train)   # learn the feature mapping on the training set
lin_reg_2 = LinearRegression()
lin_reg_2.fit(X_poly, y_train)
# Reuse the mapping fitted on the training set (transform, not fit_transform)
y_pred = lin_reg_2.predict(poly_reg.transform(X_test))

from sklearn.metrics import r2_score
print('Prediction Score is: ', r2_score(y_test, y_pred))

Output (degree = 2):

Prediction Score is:  0.6903318065831567

Output (degree = 3):

Prediction Score is:  -12898.308114085281

Answer:

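This is called overfitting. By raising the degree to 3, you let the model fit the training set almost perfectly, which leads to high variance: a hypothesis that fits the training set too closely generalizes worse to the test set. With 13 input features, PolynomialFeatures(degree = 3) generates C(16, 3) = 560 columns (including the bias column), which is more than the 404 training samples, so ordinary least squares can interpolate the training data, noise included. Degree 2 generates only C(15, 2) = 105 columns. You can confirm this by computing the training-set R² with r2_score(y_train, lin_reg_2.predict(X_poly)); it will be very high. You need to find a balance between bias and variance.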
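To make the train/test gap concrete, here is a minimal sketch. Because `load_boston` is not available in recent scikit-learn, it assumes a hypothetical synthetic stand-in: 506 samples with 13 standard-normal features and a target that is linear in those features plus Gaussian noise. The names `train_r2`, `test_r2` and `n_features` are illustrative. For degrees 1 to 3 it fits PolynomialFeatures followed by LinearRegression and reports the number of generated columns together with the training and test R².

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical stand-in for the Boston data: 506 samples, 13 features,
# target = linear combination of the features + Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(506, 13))
y = X @ rng.normal(size=13) + rng.normal(scale=2.0, size=506)

# Same split as in the question: 404 training rows, 102 test rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

train_r2, test_r2, n_features = {}, {}, {}
for degree in (1, 2, 3):
    poly = PolynomialFeatures(degree=degree)
    X_train_poly = poly.fit_transform(X_train)  # mapping learned on training data
    X_test_poly = poly.transform(X_test)        # same mapping applied to test data
    model = LinearRegression().fit(X_train_poly, y_train)

    n_features[degree] = X_train_poly.shape[1]
    train_r2[degree] = r2_score(y_train, model.predict(X_train_poly))
    test_r2[degree] = r2_score(y_test, model.predict(X_test_poly))
    print(f"degree={degree} columns={n_features[degree]} "
          f"train R2={train_r2[degree]:.3f} test R2={test_r2[degree]:.3f}")
```

On data like this you should expect the degree-3 model to reach a training R² of essentially 1 while its test R² falls below that of the degree-1 model. The exact numbers depend on the synthetic data and will differ from the Boston results in the question.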

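If you want a higher R² score, try other regression models such as Lasso and Ridge and tune their alpha values. To help you understand, I've attached an image showing how the hypothesis curve is affected as the polynomial degree increases.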
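A minimal sketch of the regularized alternative, again on the same hypothetical synthetic data rather than the Boston set. It keeps the degree-3 features, standardizes them, and fits Ridge, choosing alpha from a small illustrative grid with 5-fold cross-validation on the training set only. The alpha grid and the noise level are assumptions, not tuned recommendations; `plain_r2` and `ridge_r2` are illustrative names.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Hypothetical synthetic data (same construction as a linear target + noise).
rng = np.random.default_rng(0)
X = rng.normal(size=(506, 13))
y = X @ rng.normal(size=13) + rng.normal(scale=2.0, size=506)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Unregularized degree-3 model, as in the question.
plain = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
plain.fit(X_train, y_train)
plain_r2 = r2_score(y_test, plain.predict(X_test))

# Degree-3 features, standardized, then an L2 penalty whose strength
# (alpha) is picked by 5-fold cross-validation on the training set.
ridge = make_pipeline(PolynomialFeatures(degree=3, include_bias=False),
                      StandardScaler(), Ridge())
search = GridSearchCV(ridge, {"ridge__alpha": [0.1, 1, 10, 100, 1000]},
                      cv=5, scoring="r2")
search.fit(X_train, y_train)
ridge_r2 = r2_score(y_test, search.predict(X_test))

print("plain degree-3 test R2:", round(plain_r2, 3))
print("ridge degree-3 test R2:", round(ridge_r2, 3),
      "with alpha =", search.best_params_["ridge__alpha"])
```

Swapping `Ridge` for `Lasso` (and the grid key for `lasso__alpha`) gives the L1-penalized version; Lasso's coordinate descent may need a larger `max_iter` to converge with this many columns. Standardizing before the penalty matters because a single alpha shrinks every coefficient, so unscaled columns such as cubic terms would otherwise be penalized unevenly.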
