I have just started learning Python, having used R until now. When I build a simple regression model in R, I get very different results from the same operation in IPython.
The R-squared, the p-values, the significance of the coefficients — none of them match. Am I misreading the output, or am I making some other basic mistake?
Here are my code and results in R and Python:
R code
str(df_nv)
Classes 'tbl_df', 'tbl' and 'data.frame': 81 obs. of 2 variables:
 $ Dependent Variable  : num 733 627 405 353 434 556 381 558 612 901 ...
 $ Independent Variable: num 0.193 0.167 0.169 0.14 0.145 ...

summary(lm(`Dependent Variable` ~ `Independent Variable`, data = df_nv))

Call:
lm(formula = `Dependent Variable` ~ `Independent Variable`, data = df_nv)

Residuals:
    Min      1Q  Median      3Q     Max 
-501.18 -139.20  -82.61  -15.82 2136.74 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)   
(Intercept)               478.2      148.2   3.226  0.00183 **
`Independent Variable`   -196.1     1076.9  -0.182  0.85601   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 381.5 on 79 degrees of freedom
Multiple R-squared:  0.0004194, Adjusted R-squared:  -0.01223 
F-statistic: 0.03314 on 1 and 79 DF,  p-value: 0.856
IPython Notebook code
df_nv.dtypes
Dependent Variable      float64
Independent Variable    float64
dtype: object

model = sm.OLS(df_nv['Dependent Variable'], df_nv['Independent Variable'])
results = model.fit()
results.summary()

                       OLS Regression Results
Dep. Variable:    Dependent Variable   R-squared:           0.537
Model:            OLS                  Adj. R-squared:      0.531
Method:           Least Squares        F-statistic:         92.63
Date:             Fri, 20 Jan 2017     Prob (F-statistic):  5.23e-15
Time:             09:08:54             Log-Likelihood:      -600.40
No. Observations: 81                   AIC:                 1203.
Df Residuals:     80                   BIC:                 1205.
Df Model:         1
Covariance Type:  nonrobust

                        coef       std err   t       P>|t|   [95.0% Conf. Int.]
Independent Variable    3133.1825  325.537   9.625   0.000   2485.342  3781.023

Omnibus:        89.595   Durbin-Watson:     1.940
Prob(Omnibus):  0.000    Jarque-Bera (JB):  980.289
Skew:           3.489    Prob(JB):          1.36e-213
Kurtosis:       18.549   Cond. No.          1.00
For reference, the first few rows of the data frame in R and in Python:
R:
head(df_nv)
  Dependent Variable Independent Variable
               <dbl>                <dbl>
1                733            0.1932367
2                627            0.1666667
3                405            0.1686183
4                353            0.1398601
5                434            0.1449275
6                556            0.1475410
Python:
df_nv.head()
      Dependent Variable  Independent Variable
5292               733.0              0.193237
5320               627.0              0.166667
5348               405.0              0.168618
5404               353.0              0.139860
5460               434.0              0.144928
Answer:
Below are the results of a linear regression on the gapminder dataset, run in Python with pandas (using statsmodels.formula.api) and in R; they are exactly the same:
R code
df <- read.csv('gapminder.csv')
df <- df[c('internetuserate', 'urbanrate')]
df <- df[complete.cases(df),]
dim(df)
# [1] 190   2
m <- lm(internetuserate~urbanrate, df)
summary(m)
#Call:
#lm(formula = internetuserate ~ urbanrate, data = df)
#
#Residuals:
#    Min      1Q  Median      3Q     Max 
#-51.474 -15.857  -3.954  14.305  74.590 
#
#Coefficients:
#            Estimate Std. Error t value Pr(>|t|)    
#(Intercept) -4.90375    4.11485  -1.192    0.235    
#urbanrate    0.72022    0.06753  10.665   <2e-16 ***
#---
#Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
#Residual standard error: 22.03 on 188 degrees of freedom
#Multiple R-squared:  0.3769, Adjusted R-squared:  0.3736 
#F-statistic: 113.7 on 1 and 188 DF,  p-value: < 2.2e-16