Linear model convergence changes with column order

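When I change the order of the columns (features) passed to a regularized scikit-learn linear model, I get different scores. I have tested this with ElasticNet and Lasso. I am using scikit-learn==0.23.1.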

```python
import pandas as pd
import numpy as np
from sklearn import linear_model
from sklearn import metrics

df = pd.DataFrame({
    'col1': [1, 2, 3, 4, 5, 6],
    'col2': [16, 32, 64, 12, 5, 256],
    'col3': [7, 8, 9, 10, 12, 11],
    'out': [40, 5, 60, 7, 9, 100]
})
print(df)

X_df = df[['col1', 'col2', 'col3']]
y_df = df['out']

regr = linear_model.ElasticNet(alpha=0.1, random_state=0)
regr.fit(X_df, y_df)
y_pred = regr.predict(X_df)
print("R2:", regr.score(X_df, y_df))
print("MSE:", metrics.mean_squared_error(y_df, y_pred))

# change the order to: [col2, col1, col3]
first_cols = ['col2']
cols = first_cols.copy()
for c in X_df.columns:
    if c not in cols:
        cols.append(c)
X_df = X_df[cols]

regr.fit(X_df, y_df)
y_pred = regr.predict(X_df)
print("\nReorder:")
print("R2:", regr.score(X_df, y_df))
print("MSE:", metrics.mean_squared_error(y_df, y_pred))
```

The output of the code above is:

```
   col1  col2  col3  out
0     1    16     7   40
1     2    32     8    5
2     3    64     9   60
3     4    12    10    7
4     5     5    12    9
5     6   256    11  100
R2: 0.8277462579081043
MSE: 207.13034003933535

Reorder:
R2: 0.8277586094134455
MSE: 207.11548769725997
```

Why does this happen?


Answer:

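This comes down to the `tol` (stopping tolerance) parameter. ElasticNet and Lasso are fit by coordinate descent, which with the default `selection='cyclic'` updates one coefficient at a time in column order. With the default `tol=1e-4` the solver stops before it has fully converged, so when the columns are reordered it follows a slightly different path and stops at a slightly different point.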
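To see why the visiting order only matters when the solver stops early, here is a minimal sketch in NumPy. It is not scikit-learn's implementation: `soft_threshold` and `cyclic_cd` are hypothetical helpers that apply the elastic-net coordinate update to the question's data, using a simplified stopping rule (largest coefficient change in one pass below `tol`; scikit-learn additionally checks the duality gap). The `order` argument only changes which column is updated first; the coefficients stay indexed by the original columns, so `[1, 0, 2]` mimics moving `col2` to the front.

```python
import numpy as np

# Toy data from the question: columns col1, col2, col3 -> out
X = np.array([[1, 16, 7], [2, 32, 8], [3, 64, 9],
              [4, 12, 10], [5, 5, 12], [6, 256, 11]], dtype=float)
y = np.array([40, 5, 60, 7, 9, 100], dtype=float)


def soft_threshold(r, t):
    """Hypothetical helper: shrink r toward zero by t (the L1 step)."""
    return np.sign(r) * max(abs(r) - t, 0.0)


def cyclic_cd(X, y, order, alpha=0.1, l1_ratio=0.5, tol=1e-4, max_passes=100_000):
    """Hypothetical sketch of cyclic coordinate descent for the elastic net.

    Minimises 1/(2n)||y - Xw||^2 + alpha*l1_ratio*||w||_1
    + 0.5*alpha*(1 - l1_ratio)*||w||^2, visiting features in `order`.
    Stops when the largest coefficient change in a full pass is below `tol`.
    """
    X = X - X.mean(axis=0)  # centring plays the role of fit_intercept=True
    y = y - y.mean()
    n = X.shape[0]
    w = np.zeros(X.shape[1])
    residual = y.copy()  # equals y - X @ w while w is all zeros
    for _ in range(max_passes):
        max_change = 0.0
        for j in order:
            xj = X[:, j]
            rho = xj @ (residual + xj * w[j]) / n
            new_wj = soft_threshold(rho, alpha * l1_ratio) / (xj @ xj / n + alpha * (1 - l1_ratio))
            residual += xj * (w[j] - new_wj)  # keep residual = y - X @ w
            max_change = max(max_change, abs(new_wj - w[j]))
            w[j] = new_wj
        if max_change < tol:
            break
    return w


# Stopped after a single pass: the result depends on the visiting order
print(cyclic_cd(X, y, order=[0, 1, 2], max_passes=1))
print(cyclic_cd(X, y, order=[1, 0, 2], max_passes=1))
# Run to a tight tolerance: both orders reach the same coefficients
print(cyclic_cd(X, y, order=[0, 1, 2], tol=1e-12))
print(cyclic_cd(X, y, order=[1, 0, 2], tol=1e-12))
```

Because the L2 term makes the elastic-net objective strictly convex, there is a single minimiser; any visiting order reaches it if the solver is allowed to converge, and the order only shows up in where an early stop happens to land.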

According to the documentation:

tol : float, default=1e-4

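The tolerance for the optimization: if the updates are smaller than `tol`, the optimization code checks the dual gap for optimality and continues until it is smaller than `tol`.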
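One way to see this stopping rule at work is to compare how long the solver runs at each tolerance. The sketch below assumes only the fitted attributes that ElasticNet documents, `n_iter_` (coordinate-descent iterations used) and `dual_gap_` (the duality gap at the end of the fit); the exact numbers it prints will depend on the scikit-learn version.

```python
import pandas as pd
from sklearn.linear_model import ElasticNet

df = pd.DataFrame({
    'col1': [1, 2, 3, 4, 5, 6],
    'col2': [16, 32, 64, 12, 5, 256],
    'col3': [7, 8, 9, 10, 12, 11],
    'out': [40, 5, 60, 7, 9, 100],
})
X = df[['col1', 'col2', 'col3']]
y = df['out']

models = {}
for tol in (1e-4, 1e-12):
    # Same data and alpha; only the stopping tolerance changes
    models[tol] = ElasticNet(alpha=0.1, random_state=0, tol=tol).fit(X, y)
    print(f"tol={tol:g}  n_iter_={models[tol].n_iter_}  "
          f"dual_gap_={float(models[tol].dual_gap_):.3e}")
```

Since cyclic updates follow the same path regardless of `tol`, a smaller tolerance can only make the solver stop later, never earlier.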

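Simply set `tol=1e-12` in both fits to get the precision you are after: the coefficients then agree (in permuted order), and R2 and MSE match to about 12 significant digits.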

```python
import pandas as pd
from sklearn import linear_model
from sklearn import metrics

df = pd.DataFrame({
    'col1': [1, 2, 3, 4, 5, 6],
    'col2': [16, 32, 64, 12, 5, 256],
    'col3': [7, 8, 9, 10, 12, 11],
    'out': [40, 5, 60, 7, 9, 100]
})
# print(df)

X_df = df[['col1', 'col2', 'col3']]
y_df = df['out']

# A much tighter tolerance lets coordinate descent run to (near) full convergence
regr = linear_model.ElasticNet(alpha=0.1, random_state=0, tol=1e-12)
regr.fit(X_df, y_df)
y_pred = regr.predict(X_df)
print(regr.coef_)
print("R2:", regr.score(X_df, y_df))
print("MSE:", metrics.mean_squared_error(y_df, y_pred))

# change the order to: [col2, col1, col3]
first_cols = ['col2']
cols = first_cols.copy()
for c in X_df.columns:
    if c not in cols:
        cols.append(c)
X_df = X_df[cols]

regr = linear_model.ElasticNet(alpha=0.1, random_state=0, tol=1e-12)
regr.fit(X_df, y_df)
y_pred = regr.predict(X_df)
print("\nReorder:")
print(regr.coef_)
print("R2:", regr.score(X_df, y_df))
print("MSE:", metrics.mean_squared_error(y_df, y_pred))
```
```
[-8.92519779  0.42980208  3.59812779]
R2: 0.8277593357239204
MSE: 207.11461432908925

Reorder:
[ 0.42980208 -8.92519779  3.59812779]
R2: 0.8277593357240851
MSE: 207.11461432889112
```
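Comparing printed arrays by eye gets awkward once the coefficients are permuted. A small check, sketched under the assumption that labelling `coef_` with column names is enough to align the two fits, reports the largest coefficient difference caused by reordering. `order_gap` is a hypothetical helper, not part of scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNet

df = pd.DataFrame({
    'col1': [1, 2, 3, 4, 5, 6],
    'col2': [16, 32, 64, 12, 5, 256],
    'col3': [7, 8, 9, 10, 12, 11],
    'out': [40, 5, 60, 7, 9, 100],
})
X = df[['col1', 'col2', 'col3']]
y = df['out']


def order_gap(X, y, new_order, **params):
    """Hypothetical helper: max |coef difference| between fitting X as-is
    and fitting it with its columns rearranged into new_order."""
    base = ElasticNet(**params).fit(X, y)
    moved = ElasticNet(**params).fit(X[new_order], y)
    # Label coefficients by column name so the two fits line up
    base_coef = pd.Series(base.coef_, index=X.columns)
    moved_coef = pd.Series(moved.coef_, index=new_order).reindex(X.columns)
    return float(np.max(np.abs(base_coef - moved_coef)))


for tol in (1e-4, 1e-12):
    gap = order_gap(X, y, ['col2', 'col1', 'col3'], alpha=0.1, random_state=0, tol=tol)
    print(f"tol={tol:g}: max |coef difference| = {gap:.2e}")
```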
