### Coefficient paths for Ridge regression in scikit-learn

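I start from a pandas DataFrame `d_train` with 774 rows.

I want to follow this example to study the coefficient paths of Ridge regression.

In that example, the variables have the following types: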

    from sklearn.datasets import make_regression

    X, y, w = make_regression(n_samples=10, n_features=10, coef=True,
                              random_state=1, bias=3.5)
    print X.shape, type(X), y.shape, type(y), w.shape, type(w)
    # (10, 10) <type 'numpy.ndarray'> (10,) <type 'numpy.ndarray'> (10,) <type 'numpy.ndarray'>

To avoid the problem mentioned in this Stack Overflow discussion, I converted all of my data to numpy arrays:

    import numpy as np

    predictors = ['p1', 'p2', 'p3', 'p4']
    target = ['target_bins']
    X = d_train[predictors].as_matrix()
    ### X = np.transpose(d_train[predictors].as_matrix())
    y = d_train['target_bins'].as_matrix()
    w = np.full((774,), 3, dtype=float)
    print X.shape, type(X), y.shape, type(y), w.shape, type(w)
    # (774, 4) <type 'numpy.ndarray'> y_shape: (774,) <type 'numpy.ndarray'>     w_shape: (774,) <type 'numpy.ndarray'>

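I then tried running (a) the exact code from the example and (b) the same code with `fit_intercept=True, normalize=True` added to the Ridge call (my data is not scaled). Both attempts produced the same error message: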

    from sklearn.linear_model import Ridge
    from sklearn.metrics import mean_squared_error

    my_ridge = Ridge()
    coefs = []
    errors = []
    alphas = np.logspace(-6, 6, 200)
    for a in alphas:
        my_ridge.set_params(alpha=a, fit_intercept=True, normalize=True)
        my_ridge.fit(X, y)
        coefs.append(my_ridge.coef_)
        errors.append(mean_squared_error(my_ridge.coef_, w))
    # ValueError: Found input variables with inconsistent numbers of samples: [4, 774]

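As the commented-out line in the code shows, I also tried the "same" code with a transposed X matrix. I also tried scaling the data before building the X matrix. Both gave the same error message.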

Finally, I did the same thing with `RidgeClassifier` and got a different error message:

    # Found input variables with inconsistent numbers of samples: [1, 774]

Question: I have no idea what is going on here. Can you help?

I am using Python 2.7 on Canopy 1.7.4.3348 (64-bit), with scikit-learn 18.01-3 and pandas 0.19.2-2.

Thank you.

Answer:

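You need one weight in `w` per feature (because you learn one weight per feature), but in your code the weight vector has 774 entries, which is the number of rows in the training set. The error is raised by `mean_squared_error(my_ridge.coef_, w)`, not by `fit`: `coef_` has 4 entries and `w` has 774, hence `[4, 774]`. Change the code as follows (4 weights instead of 774) and everything runs: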

    w = np.full((4,), 3, dtype=float)  # 4 features: p1, p2, p3, p4
    print X.shape, type(X), y.shape, type(y), w.shape, type(w)
    # (774L, 4L) <type 'numpy.ndarray'> (774L,) <type 'numpy.ndarray'> (4L,) <type 'numpy.ndarray'>
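
To see where the two error messages come from, here is a minimal sketch that reproduces the shape mismatch. It rests on several assumptions: `d_train` is not available, so the data is synthetic; the `RidgeClassifier` target is assumed to have two classes, which is consistent with the `[1, 774]` message but is not confirmed by the question; and `mse_or_error` is a hypothetical helper written only for this illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge, RidgeClassifier
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for d_train[predictors] and the target (assumption).
rng = np.random.RandomState(0)
X = rng.normal(size=(774, 4))
y = X.dot(np.array([1.0, -2.0, 0.5, 3.0])) + rng.normal(scale=0.1, size=774)


def mse_or_error(coef, w):
    """Hypothetical helper: return the MSE, or the ValueError text on mismatch."""
    try:
        return mean_squared_error(coef, w)
    except ValueError as exc:
        return str(exc)


ridge = Ridge(alpha=1.0).fit(X, y)        # fit() itself succeeds
w_rows = np.full((774,), 3, dtype=float)  # one entry per row: mismatched
w_feats = np.full((4,), 3, dtype=float)   # one entry per feature: matches coef_

ridge_bad = mse_or_error(ridge.coef_, w_rows)   # coef_ has shape (4,)
ridge_ok = mse_or_error(ridge.coef_, w_feats)

# Assumed two-class target; coef_ is then 2-D with a single row.
y_cls = (y > np.median(y)).astype(int)
clf = RidgeClassifier(alpha=1.0).fit(X, y_cls)
clf_bad = mse_or_error(clf.coef_, w_rows)       # coef_ has shape (1, 4)

print(ridge.coef_.shape, ridge_bad)
print(clf.coef_.shape, clf_bad)
print(ridge_ok)
```

In both failing cases the first number in the message is the first dimension of `coef_`, and the second is the length of `w`. With `RidgeClassifier`, a `w` of shape `(4,)` would still not match the 2-D `coef_`; comparing against `clf.coef_.ravel()` is one option in the two-class case.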

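Now you can run the rest of the code from http://scikit-learn.org/stable/auto_examples/linear_model/plot_ridge_coeffs.html#sphx-glr-auto-examples-linear-model-plot-ridge-coeffs-py to see how the weights and the error change with the regularization parameter alpha, and plot both over the grid of alpha values.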
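
The loop from the linked example can be sketched roughly as follows. This is an illustration under assumptions, not the asker's actual run: the data is synthetic, and `w` holds the weights used to generate it, so the error against `w` is meaningful. With real data the true weights are unknown, and a constant vector like the one above is only a placeholder. The `normalize` argument is left out because newer scikit-learn releases no longer accept it.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Synthetic data with known weights, one per feature (assumption).
rng = np.random.RandomState(0)
w = np.array([1.0, -2.0, 0.5, 3.0])
X = rng.normal(size=(774, 4))
y = X.dot(w) + rng.normal(scale=0.1, size=774)

alphas = np.logspace(-6, 6, 200)
ridge = Ridge(fit_intercept=True)

coefs, errors = [], []
for a in alphas:
    ridge.set_params(alpha=a)
    ridge.fit(X, y)
    coefs.append(ridge.coef_.copy())                   # shape (4,) per alpha
    errors.append(mean_squared_error(ridge.coef_, w))  # both have shape (4,)

coefs = np.array(coefs)      # shape (200, 4): one coefficient path per column
errors = np.array(errors)    # shape (200,)
best_alpha = alphas[np.argmin(errors)]
print(coefs.shape, best_alpha)
```

Plotting each column of `coefs` against `alphas` on a log-scaled x-axis gives the coefficient paths; plotting `errors` the same way shows the error curve. On unscaled data, standardizing the features first (for example with `StandardScaler`) is one substitute for the old `normalize=True`, though the two scalings are not identical.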
