I used statsmodels.formula.api.quantreg to make predictions on my test set. When I ran the prediction, I hit an unexpected error:
AttributeError                            Traceback (most recent call last)
<ipython-input-34-12e0d345b0fc> in <module>
----> 1 test['ypredL'] = model1.predict( test ).values
      2 test['FVC'] = model2.predict( test ).values
      3 test['ypredH'] = model3.predict( test ).values
      4 test['Confidence'] = np.abs(test['ypredH'] - test['ypredL']) / 2

~\anaconda3\envs\knk\lib\site-packages\statsmodels\base\model.py in predict(self, exog, transform, *args, **kwargs)
   1081                 '\n\nThe original error message returned by patsy is:\n'
   1082                 '{0}'.format(str(str(exc))))
-> 1083             raise exc.__class__(msg)
   1084         if orig_exog_len > len(exog) and not is_dict:
   1085             import warnings

AttributeError: predict requires that you use a DataFrame when predicting from a model
that was created using the formula api.

The original error message returned by patsy is:
'DataFrame' object has no attribute 'dtype'
Interestingly, the same prediction code works without any problem on the training set. Here is the training code:
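from statsmodels.formula.api import quantreg

# Fit one quantile regression per quantile: lower (0.25), median (0.5), upper (0.75).
model1 = quantreg('FVC ~ Weeks+Percent+Age+Sex+SmokingStatus', train).fit(q = 0.25)
model2 = quantreg('FVC ~ Weeks+Percent+Age+Sex+SmokingStatus', train).fit(q = 0.5)
model3 = quantreg('FVC ~ Weeks+Percent+Age+Sex+SmokingStatus', train).fit(q = 0.75)

# Predicting on the training frame itself works as expected.
train['y_predL'] = model1.predict(train).values
train['y_pred'] = model2.predict(train).values
train['y_predH'] = model3.predict(train).values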
Answer:
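The error message 'DataFrame' object has no attribute 'dtype' is accurate, but it is hard to interpret. What it actually indicates is a data type conflict between the training set and the test set. In this question, the dtype of the Weeks column does not match between the two.
In the training set Weeks is int, while in the test set Weeks is str.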
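One way to confirm and repair this kind of mismatch is to compare column dtypes across the two frames and cast the test columns to whatever the training set uses. The following is only a minimal sketch: the small `train` and `test` frames are made-up stand-ins for the real data, and `dtype_mismatches` is a hypothetical helper, not part of pandas or statsmodels.

```python
import pandas as pd
from pandas.api.types import is_dtype_equal

# Hypothetical stand-ins for the question's frames: Weeks is int in train, str in test.
train = pd.DataFrame({"Weeks": [0, 5, 10], "Percent": [58.2, 55.7, 60.1]})
test = pd.DataFrame({"Weeks": ["6", "12", "15"], "Percent": [70.2, 82.0, 65.3]})


def dtype_mismatches(reference, other):
    """Return {column: (reference dtype, other dtype)} for shared columns whose dtypes differ."""
    shared = [col for col in reference.columns if col in other.columns]
    return {
        col: (reference[col].dtype, other[col].dtype)
        for col in shared
        if not is_dtype_equal(reference[col].dtype, other[col].dtype)
    }


# Inspect which columns disagree before touching anything.
mismatches = dtype_mismatches(train, test)
print(mismatches)

# Cast each mismatched test column to the dtype used in training.
for col in mismatches:
    test[col] = test[col].astype(train[col].dtype)

# After the cast, no shared column should differ in dtype.
remaining = dtype_mismatches(train, test)
print(remaining)
```

With the dtypes aligned, the same `model1.predict(test)` call from the question should receive a Weeks column of the type the formula was fitted on. Note that `astype` raises if a value cannot be converted (for example a non-numeric string), which is usually preferable to silently predicting on bad data.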