This is a very strange problem, because the same code works fine on other datasets.
Here is the full code:
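import numpy as np
import pandas as pd
import xgboost as xgb
# Old scikit-learn versions exposed this as sklearn.cross_validation
from sklearn.model_selection import train_test_split

# train, target and test are loaded earlier in the script (not shown)

# Split off a validation set from the training data
X_fit, X_eval, y_fit, y_eval = train_test_split(
    train, target, test_size=0.2, random_state=1)

clf = xgb.XGBClassifier(missing=np.nan, max_depth=6, n_estimators=5,
                        learning_rate=0.15, subsample=1, colsample_bytree=0.9,
                        seed=1400)

# Fit (newer xgboost releases take early_stopping_rounds and eval_metric
# in the XGBClassifier constructor instead of in fit())
clf.fit(X_fit, y_fit, early_stopping_rounds=50, eval_metric="logloss",
        eval_set=[(X_eval, y_eval)])

y_pred = clf.predict_proba(test)[:, 1]
# print(y_pred)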
The last line raises the following error (full output):
Will train until validation_0 error hasn't decreased in 50 rounds.
[0] validation_0-logloss:0.554366
[1] validation_0-logloss:0.451454
[2] validation_0-logloss:0.372142
[3] validation_0-logloss:0.309450
[4] validation_0-logloss:0.259002
Traceback (most recent call last):
  File "../src/script.py", line 57, in <module>
    y_pred= clf.predict_proba(test)[:,1]
  File "/opt/conda/lib/python3.4/site-packages/xgboost-0.4-py3.4.egg/xgboost/sklearn.py", line 435, in predict_proba
    test_dmatrix = DMatrix(data, missing=self.missing)
  File "/opt/conda/lib/python3.4/site-packages/xgboost-0.4-py3.4.egg/xgboost/core.py", line 220, in __init__
    feature_types)
  File "/opt/conda/lib/python3.4/site-packages/xgboost-0.4-py3.4.egg/xgboost/core.py", line 147, in _maybe_pandas_data
    raise ValueError('DataFrame.dtypes for data must be int, float or bool')
ValueError: DataFrame.dtypes for data must be int, float or bool
Exception ignored in: >
Traceback (most recent call last):
  File "/opt/conda/lib/python3.4/site-packages/xgboost-0.4-py3.4.egg/xgboost/core.py", line 289, in __del__
    _check_call(_LIB.XGDMatrixFree(self.handle))
AttributeError: 'DMatrix' object has no attribute 'handle'
What is going wrong here? I don't know how to fix it.
Update 1: this actually comes from Kaggle: https://www.kaggle.com/insaff/bnp-paribas-cardif-claims-management/xgboost
Answer:
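The problem is in the input data: some columns are numeric (float or int), while others have dtype object (strings). XGBoost's DMatrix accepts only int, float or bool columns, which is exactly what the ValueError says, so the object columns have to be converted to numbers first: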
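import numpy as np
from sklearn import preprocessing

# Encode every object (string) column as integers. Each encoder is fitted on
# the values of BOTH train and test, so a given category gets the same code in
# both sets (separate encoders could map the same string to different codes).
# Missing values are replaced by a string first, because LabelEncoder cannot
# sort a mix of strings and NaN floats.
for f in train.columns:
    if train[f].dtype == 'object':
        train_vals = train[f].fillna('missing').astype(str)
        test_vals = test[f].fillna('missing').astype(str)
        lbl = preprocessing.LabelEncoder()
        lbl.fit(list(train_vals.values) + list(test_vals.values))
        train[f] = lbl.transform(list(train_vals.values))
        test[f] = lbl.transform(list(test_vals.values))

# Replace the remaining (numeric) missing values with a sentinel
train.fillna(-999, inplace=True)
test.fillna(-999, inplace=True)

# Convert to plain float arrays, which DMatrix accepts
train = np.array(train).astype(float)
test = np.array(test).astype(float)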
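The same idea can be sketched with pandas alone, which also makes it easy to see which columns trigger the error. This is a minimal sketch under stated assumptions: the helpers `non_numeric_columns` and `encode_shared` and the toy frames are hypothetical, missing values in string columns are treated as their own category `'missing'`, and the dtype-kind test only mirrors the wording of the error message ("int, float or bool"), not necessarily how every xgboost version implements the check. Recent XGBoost releases can also consume pandas `category` columns directly via `enable_categorical=True`; check the documentation for your version before relying on that.

```python
import numpy as np
import pandas as pd


def non_numeric_columns(df):
    """Return the columns whose dtype kind is not int, uint, float or bool,
    i.e. the kind of column the DMatrix dtype check rejects."""
    return [c for c, t in df.dtypes.items() if t.kind not in "iufb"]


def encode_shared(train, test):
    """Integer-encode the non-numeric columns of `train` with one mapping
    shared by train and test. Returns encoded copies; inputs are unchanged.
    Assumes test has the same columns as train (hypothetical helper)."""
    train, test = train.copy(), test.copy()
    for col in non_numeric_columns(train):
        # Treat missing values as their own string category
        tr = train[col].fillna("missing").astype(str)
        te = test[col].fillna("missing").astype(str)
        # Categories seen in either frame, sorted so codes are deterministic
        cats = sorted(set(tr) | set(te))
        train[col] = pd.Categorical(tr, categories=cats).codes
        test[col] = pd.Categorical(te, categories=cats).codes
    return train, test


# Hypothetical toy data: 'v3' is a string column containing a missing value
train = pd.DataFrame({"v1": [0.5, np.nan, 1.2], "v3": ["C", "A", np.nan]})
test = pd.DataFrame({"v1": [0.7, 2.0], "v3": ["A", "B"]})

print(non_numeric_columns(train))  # ['v3']
enc_train, enc_test = encode_shared(train, test)
print(enc_train["v3"].tolist(), enc_test["v3"].tolist())  # [2, 0, 3] [0, 1]
```

Because the category list is built from both frames, "A" maps to 0 in train and in test alike, which is the property the model needs at prediction time.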