Fitting all variables with Scikit Learn in Python

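This question has been asked elsewhere for other packages, but is there a way in Scikit Learn to include all variables, or all variables except some specified ones, as can be done in R?

As an example, suppose I have a regression model y = x1 + x2 + x3 + x4. In R, I can evaluate this regression by running: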

result = lm(y ~ ., data=DF)
summary(result)

I assume there is a similar way to shorten the formula in Python, since writing out every variable by hand is impractical for large data sets.

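Answer:

We can try the following approach. It uses the iris data set, converts the species label to a numeric value, and fits a linear regression model, to show how to use all independent variables in both R and Python's sklearn.

In R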

summary(lm(as.numeric(Species)~., iris))[c('coefficients', 'r.squared')]

$coefficients
                Estimate Std. Error   t value     Pr(>|t|)
(Intercept)   1.18649525 0.20484104  5.792273 4.150495e-08
Sepal.Length -0.11190585 0.05764674 -1.941235 5.416918e-02
Sepal.Width  -0.04007949 0.05968881 -0.671474 5.029869e-01
Petal.Length  0.22864503 0.05685036  4.021874 9.255215e-05
Petal.Width   0.60925205 0.09445750  6.450013 1.564180e-09

$r.squared
[1] 0.9303939

In Python (using sklearn and patsy)

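from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
import pandas as pd
from patsy import dmatrices

iris = load_iris()
# Turn "sepal length (cm)" into "sepal_length" so the names are valid in a formula
names = [f_name.replace(" (cm)", "").replace(" ", "_") for f_name in iris.feature_names]
iris_df = pd.DataFrame(iris.data, columns=names)
iris_df['species'] = iris.target

# On Windows with Python 2.7, at least, patsy does not support '.', so here is a workaround:
# build the right-hand side from every column except the response (difference() returns them sorted).
# The trailing "- 1" drops patsy's own intercept column, because LinearRegression fits an intercept itself.
predictors = iris_df.columns.difference(['species'])
y, X = dmatrices('species ~ ' + ' + '.join(predictors) + ' - 1',
                 iris_df, return_type="dataframe")

model = LinearRegression()
model.fit(X, y)

print(model.score(X, y))
# 0.930422367533 (as reported in the original answer)
print(model.intercept_, model.coef_)
# [ 0.19208399] [[0.22700138  0.60989412 -0.10974146 -0.04424045]] (as reported in the original answer)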

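As we can see, the models learned in R and in Python with patsy are similar (the coefficients are listed in a different order). The intercepts differ by roughly 1 because as.numeric(Species) in R codes the classes as 1 to 3, whereas iris.target in sklearn codes them as 0 to 2.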
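If a formula interface is not required, the same "all variables" and "all variables except some" behaviour can probably be had with pandas alone. The following is a minimal sketch, not the answer author's method: the helper name fit_all_but and its exclude parameter are hypothetical and are not part of scikit-learn.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression


def fit_all_but(df, target, exclude=()):
    """Fit LinearRegression on every column except `target` and `exclude`.

    Hypothetical helper for illustration; not a scikit-learn API.
    """
    X = df.drop(columns=[target, *exclude])
    y = df[target]
    model = LinearRegression().fit(X, y)
    return model, list(X.columns)


iris = load_iris()
names = [n.replace(" (cm)", "").replace(" ", "_") for n in iris.feature_names]
iris_df = pd.DataFrame(iris.data, columns=names)
iris_df["species"] = iris.target

# Roughly what R's  species ~ .  does: use every remaining column
model_all, cols_all = fit_all_but(iris_df, "species")
print(dict(zip(cols_all, model_all.coef_)))
print(model_all.intercept_, model_all.score(iris_df[cols_all], iris_df["species"]))

# Roughly what R's  species ~ . - sepal_width  does: leave one column out
model_sub, cols_sub = fit_all_but(iris_df, "species", exclude=["sepal_width"])
print(cols_sub)
```

Because the predictors are selected with DataFrame.drop, the column order of the data frame is preserved, so the coefficients come out in the same order as the columns.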
