I have a set of 5000 data points in the form (x, y, z), for example (0, 1, 50), or x=1, y=2, z=120. Using these 5000 points, I need to derive an equation that computes z for a given x and y.
Answer:
You can use OLS from statsmodels (sm.OLS). Assuming you can build a pd.DataFrame from your (x, y, z) data, here is some sample data:
    import numpy as np
    import pandas as pd

    # 150 rows of random integers in [0, 100) for the columns X, Y and Z
    df = pd.DataFrame(np.random.randint(100, size=(150, 3)), columns=list('XYZ'))
    df.info()

    RangeIndex: 150 entries, 0 to 149
    Data columns (total 3 columns):
    X    150 non-null int64
    Y    150 non-null int64
    Z    150 non-null int64
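If your points are stored as a list of (x, y, z) tuples, a DataFrame can be built from it directly. A minimal sketch; the `points` list and the name `df_points` are made up for illustration and are not the asker's data:

```python
import pandas as pd

# Hypothetical sample points in (x, y, z) form; replace with your own 5000 rows.
points = [(0, 1, 50), (1, 2, 120), (2, 3, 185)]

# Each tuple becomes one row; the column names match those used in the answer.
df_points = pd.DataFrame(points, columns=['X', 'Y', 'Z'])
print(df_points.shape)
```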
Now estimate the linear regression parameters:
    import numpy as np
    import statsmodels.api as sm

    # Regress Z on X and Y (no intercept term is included here)
    model = sm.OLS(df['Z'], df[['X', 'Y']])
    results = model.fit()
This gives the following results:
    results.summary()

                                OLS Regression Results
    ==============================================================================
    Dep. Variable:                      Z   R-squared:                       0.652
    Model:                            OLS   Adj. R-squared:                  0.647
    Method:                 Least Squares   F-statistic:                     138.6
    Date:                Fri, 17 Jun 2016   Prob (F-statistic):           1.21e-34
    Time:                        13:48:38   Log-Likelihood:                -741.94
    No. Observations:                 150   AIC:                             1488.
    Df Residuals:                     148   BIC:                             1494.
    Df Model:                           2
    Covariance Type:            nonrobust
    ==============================================================================
                     coef    std err          t      P>|t|      [95.0% Conf. Int.]
    ------------------------------------------------------------------------------
    X              0.5224      0.076      6.874      0.000         0.372     0.673
    Y              0.3531      0.076      4.667      0.000         0.204     0.503
    ==============================================================================
    Omnibus:                        5.869   Durbin-Watson:                   1.921
    Prob(Omnibus):                  0.053   Jarque-Bera (JB):                2.990
    Skew:                          -0.000   Prob(JB):                        0.224
    Kurtosis:                       2.308   Cond. No.                         2.70
    ==============================================================================

    Warnings:
    [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
To make predictions, use the following code:
    # Fitted coefficients for X and Y
    params = results.params
    # Predicted Z for every row of the data the model was built on
    df['predictions'] = model.predict(params)
The predictions look like this:
        X   Y   Z  predictions
    0  31  85  75    54.701830
    1  36  46  43    34.828605
    2  77  42   8    43.795386
    3  78  84  65    66.932761
    4  27  54  50    36.737606