XGBoost: AttributeError: 'DataFrame' object has no attribute 'feature_names'

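I am training an XGBoost classifier for a binary classification task. When I train on the training data with cross-validation and then predict on the validation fold and the test data, I get the error AttributeError: 'DataFrame' object has no attribute 'feature_names'

My code is as follows: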

folds = StratifiedKFold(n_splits=5, shuffle=False, random_state=44000)
oof = np.zeros(len(X_train))
predictions = np.zeros(len(X_test))

for fold_, (trn_idx, val_idx) in enumerate(folds.split(X_train, y_train)):
    print("Fold {}".format(fold_+1))
    trn_data = xgb.DMatrix(X_train.iloc[trn_idx], y_train.iloc[trn_idx])
    val_data = xgb.DMatrix(X_train.iloc[val_idx], y_train.iloc[val_idx])
    clf = xgb.train(params = best_params,
                    dtrain = trn_data,
                    num_boost_round = 2000,
                    evals = [(trn_data, 'train'), (val_data, 'valid')],
                    maximize = False,
                    early_stopping_rounds = 100,
                    verbose_eval=100)

    oof[val_idx] = clf.predict(X_train.iloc[val_idx], ntree_limit=clf.best_ntree_limit)

    predictions += clf.predict(X_test, ntree_limit=clf.best_ntree_limit)/folds.n_splits

How can I fix this?

Here is the full error traceback:

Fold 1
[0] train-auc:0.919667  valid-auc:0.822968
Multiple eval metrics have been passed: 'valid-auc' will be used for early stopping.

Will train until valid-auc hasn't improved in 100 rounds.
[100]   train-auc:1 valid-auc:0.974659
[200]   train-auc:1 valid-auc:0.97668
[300]   train-auc:1 valid-auc:0.977696
[400]   train-auc:1 valid-auc:0.977704
Stopping. Best iteration:
[376]   train-auc:1 valid-auc:0.977862

Exception ignored in: <bound method DMatrix.__del__ of <xgboost.core.DMatrix object at 0x7f3d9c285550>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/xgboost/core.py", line 368, in __del__
    if self.handle is not None:
AttributeError: 'DMatrix' object has no attribute 'handle'
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-55-d52b20cc0183> in <module>()
     19                     verbose_eval=100)
     20
---> 21     oof[val_idx] = clf.predict(X_train.iloc[val_idx], ntree_limit=clf.best_ntree_limit)
     22
     23     predictions += clf.predict(X_test, ntree_limit=clf.best_ntree_limit)/folds.n_splits

/usr/local/lib/python3.6/dist-packages/xgboost/core.py in predict(self, data, output_margin, ntree_limit, pred_leaf, pred_contribs, approx_contribs)
   1042             option_mask |= 0x08
   1043
-> 1044         self._validate_features(data)
   1045
   1046         length = c_bst_ulong()

/usr/local/lib/python3.6/dist-packages/xgboost/core.py in _validate_features(self, data)
   1271         else:
   1272             # Booster can't accept data with different feature names
-> 1273             if self.feature_names != data.feature_names:
   1274                 dat_missing = set(self.feature_names) - set(data.feature_names)
   1275                 my_missing = set(data.feature_names) - set(self.feature_names)

/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in __getattr__(self, name)
   3612             if name in self._info_axis:
   3613                 return self[name]
-> 3614             return object.__getattribute__(self, name)
   3615
   3616     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'feature_names'

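Answer:

The problem is solved. The issue was that I passed X_train.iloc[val_idx], a pandas DataFrame, straight to clf.predict without converting it to an xgb.DMatrix. The Booster returned by xgb.train reads data.feature_names during prediction, and a DataFrame has no such attribute. After converting both X_train.iloc[val_idx] and X_test to xgb.DMatrix, the error went away!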
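The failure mode in the traceback can be reproduced without XGBoost at all. The sketch below is only an illustration: StandInDMatrix, StandInFrame, validate_features and check are hypothetical names, not part of xgboost or pandas. It mimics the comparison shown at line 1273 of the traceback, where the data object is asked for feature_names.

```python
class StandInDMatrix:
    """Hypothetical stand-in for xgb.DMatrix: it exposes feature_names."""

    def __init__(self, feature_names):
        self.feature_names = list(feature_names)


class StandInFrame:
    """Hypothetical stand-in for a pandas DataFrame: columns, but no feature_names."""

    def __init__(self, columns):
        self.columns = list(columns)


def validate_features(model_feature_names, data):
    # Mirrors the comparison in the traceback: reading data.feature_names
    # raises AttributeError when the object does not define that attribute.
    if model_feature_names != data.feature_names:
        raise ValueError("feature names differ between model and data")


def check(data):
    # Report the outcome as a string instead of letting the exception propagate.
    try:
        validate_features(["f0", "f1"], data)
        return "ok"
    except AttributeError as exc:
        return "AttributeError: {}".format(exc)


result_matrix = check(StandInDMatrix(["f0", "f1"]))
result_frame = check(StandInFrame(["f0", "f1"]))
print(result_matrix)
print(result_frame)
```

The DMatrix-like object passes the check, while the frame-like object fails with the same kind of AttributeError as in the question.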

I updated the following two lines:

oof[val_idx] = clf.predict(xgb.DMatrix(X_train.iloc[val_idx]), ntree_limit=clf.best_ntree_limit)
predictions += clf.predict(xgb.DMatrix(X_test), ntree_limit=clf.best_ntree_limit)/folds.n_splits
