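I have a data frame and I want to compute a polynomial regression for ozone. I use o3 as the y values and the date as the x values. Why does my polynomial regression look the same for every degree from 2 to 15? I compared degree 4 with degree 15 and saw no difference. I also compared my regression with the output of the CurveExpert software, and the results are completely different. How can I fix this and see the difference between degree 4 and degree 15?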
import matplotlib.pyplot as plt
import datetime as dt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('https://raw.githubusercontent.com/iulianastroia/csv_data/master/final_dataframe.csv')
dataset['day'] = pd.to_datetime(dataset['day'], dayfirst=True)
dataset = dataset.sort_values(by=['readable time'])
print(dataset.head())

group_by_df = pd.DataFrame([name, group.mean()["o3"]] for name, group in dataset.groupby('day'))
group_by_df.columns = ['day', "o3"]
group_by_df['day'] = pd.to_datetime(group_by_df['day'])
group_by_df['day'] = group_by_df['day'].map(dt.datetime.toordinal)

X = group_by_df[['day']].values
y = group_by_df[['o3']].values

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fitting Linear Regression to the dataset
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(X, y)

# Visualizing the Linear Regression results
def viz_linear():
    plt.scatter(X, y, color='red')
    plt.plot(X, lin_reg.predict(X), color='blue')
    plt.title('Linear Regression')
    plt.xlabel('Date')
    plt.ylabel('O3 levels')
    plt.show()
    return

viz_linear()

# Fitting Polynomial Regression to the dataset
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree=15)
X_poly = poly_reg.fit_transform(X)
pol_reg = LinearRegression()
pol_reg.fit(X_poly, y)

# Visualizing the Polymonial Regression results
def viz_polymonial():
    plt.scatter(X, y, color='red')
    plt.plot(X, pol_reg.predict(poly_reg.fit_transform(X)), color='blue')
    plt.title('poly Regression grade 15')
    plt.xlabel('Date')
    plt.ylabel('O3 levels')
    plt.show()
    return

viz_polymonial()
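Answer:
You are very close. Nice work, you have a lot of the pieces here.
I think you want to fit the linear regression on the test set, like this: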
# Fitting Linear Regression to the dataset
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(X_test, y_test)
And for the polynomial regression, like this:
# Fitting Polynomial Regression to the dataset
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree=15)
X_poly = poly_reg.fit_transform(X_test)
pol_reg = LinearRegression()
pol_reg.fit(X_poly, y_test)
Now the curves will look very different visually.
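To check the difference between degrees with numbers rather than by eye, here is a minimal sketch that fits degree 4 and degree 15 on the test split, as suggested above, and measures how far apart the two curves are. It does not download the original CSV: the `days` and `o3` arrays are synthetic stand-ins I made up for illustration, and `fit_poly`, `curves` and `max_gap` are hypothetical names, not part of the original code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for the daily o3 means (assumption: NOT the original data).
rng = np.random.default_rng(0)
days = np.arange(737000, 737060).reshape(-1, 1)  # date ordinals, as in the question
o3 = 30 + 10 * np.sin((days.ravel() - 737000) / 6.0) + rng.normal(0, 2, len(days))

X_train, X_test, y_train, y_test = train_test_split(
    days, o3, test_size=0.2, random_state=0
)


def fit_poly(X, y, degree):
    """Fit a polynomial regression of the given degree; return (features, model)."""
    poly = PolynomialFeatures(degree=degree)
    model = LinearRegression().fit(poly.fit_transform(X), y)
    return poly, model


# train_test_split shuffles the rows, so sort the x values before drawing a line
# through the predictions; otherwise plt.plot zig-zags between points.
order = np.argsort(X_test.ravel())
X_sorted = X_test[order]

curves = {}
for degree in (4, 15):
    poly, model = fit_poly(X_test, y_test, degree)
    # transform (not fit_transform) reuses the already fitted feature expansion
    curves[degree] = model.predict(poly.transform(X_sorted))

# Largest pointwise difference between the degree-4 and degree-15 predictions.
max_gap = np.max(np.abs(curves[4] - curves[15]))
print("max |degree 4 - degree 15| =", max_gap)
```

Plotting `curves[4]` and `curves[15]` against `X_sorted` with `plt.plot` gives the two lines to compare. If `max_gap` comes out close to zero on your own data, the two fits really are almost the same curve and the plot is not hiding a difference.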