precision_score error when using an XGBoost classifier with RandomizedSearchCV

I am trying to build a classifier with XGBoost and fit it using RandomizedSearchCV.

Here is my function:

def xgboost_classifier_rscv(x, y):
    from scipy import stats
    from xgboost import XGBClassifier
    from sklearn.metrics import fbeta_score, make_scorer, recall_score, accuracy_score, precision_score
    from sklearn.model_selection import StratifiedKFold, GridSearchCV, RandomizedSearchCV
    # split the dataset into training and test parts
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
    # bag-of-words implementation
    cv = CountVectorizer()
    x_train = cv.fit_transform(x_train).toarray()
    # TF-IDF implementation
    vector = TfidfTransformer()
    x_train = vector.fit_transform(x_train).toarray()
    x_test = cv.transform(x_test)

    scorers = {
        'f1_score': make_scorer(f1_score),
        'precision_score': make_scorer(precision_score),
        'recall_score': make_scorer(recall_score),
        'accuracy_score': make_scorer(accuracy_score)
    }
    param_dist = {'n_estimators': stats.randint(150, 1000),
                  'learning_rate': stats.uniform(0.01, 0.59),
                  'subsample': stats.uniform(0.3, 0.6),
                  'max_depth': [3, 4, 5, 6, 7, 8, 9],
                  'colsample_bytree': stats.uniform(0.5, 0.4),
                  'min_child_weight': [1, 2, 3, 4]
                  }
    skf = StratifiedKFold(n_splits=3, shuffle=True)
    gridCV = RandomizedSearchCV(xgb_model,
                                param_distributions=param_dist,
                                cv=skf,
                                n_iter=5,
                                scoring=scorers,
                                verbose=3,
                                n_jobs=-1,
                                return_train_score=True,
                                refit=precision_score)
    gridCV.fit(x_train, y_train)
    best_pars = gridCV.best_params_
    print("Best parameters : ", best_pars)
    xgb_predict = gridCV.predict(x_test)
    xgb_pred_prob = gridCV.predict_proba(x_test)
    print('Best scores : ', gridCV.grid_scores_)
    scores = [x[1] for x in gridCV.grid_scores_]
    print("Best scores : ", scores)
    return y_test, xgb_predict, xgb_pred_prob

When I run the code, I get the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-30-9adf84d48e5c> in <module>
      1 print("********** Xgboost Classifier *************")
      2 start_time = time.monotonic()
----> 3 y_test, xgb_predict, xgb_pred_prob = xgboost_classifier_rscv(x,y)
      4 end_time = time.monotonic()
      5 print("Time taken : ", timedelta(seconds=end_time - start_time))

<ipython-input-29-e0c6ae026076> in xgboost_classifier_rscv(x, y)
     70 #                                 verbose=3, random_state=1001, refit='precision_score' )
     71 
---> 72     gridCV.fit(x_train,y_train)
     73     best_pars = gridCV.best_params_
     74     print("Best parameters : ", best_pars)

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64 
     65             # extra_args > 0

~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
    858             # parameter set.
    859             if callable(self.refit):
--> 860                 self.best_index_ = self.refit(results)
    861                 if not isinstance(self.best_index_, numbers.Integral):
    862                     raise TypeError('best_index_ returned is not an integer')

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64 
     65             # extra_args > 0

TypeError: precision_score() missing 1 required positional argument: 'y_pred'

When I use GridSearchCV instead of RandomizedSearchCV, the code runs fine!


Answer:

It should not be precision_score but the string 'precision_score' (with quotes). When scoring is a dict of scorers, refit has to name one of the scorer keys (or be a callable that returns the index of the best candidate); passing the precision_score function itself makes scikit-learn call it on the cv_results_ dict, which is why it complains about the missing y_pred argument. Like this:

gridCV = RandomizedSearchCV(xgb_model,
                            param_distributions=param_dist,
                            cv=skf,
                            n_iter=5,
                            scoring=scorers,
                            verbose=3,
                            n_jobs=-1,
                            return_train_score=True,
                            refit='precision_score')
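If you do want to pass a callable instead of a string, scikit-learn expects it to receive cv_results_ and return the integer index of the best candidate. A minimal sketch, assuming the 'precision_score' scorer key from the question:

import numpy as np

def refit_by_precision(cv_results):
    # cv_results is the cv_results_ dict; pick the candidate with the
    # highest mean test precision and return its integer index.
    return int(np.argmax(cv_results['mean_test_precision_score']))

gridCV = RandomizedSearchCV(xgb_model,
                            param_distributions=param_dist,
                            cv=skf,
                            n_iter=5,
                            scoring=scorers,
                            refit=refit_by_precision)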

Another error:

grid_scores_ has been removed from scikit-learn, so use cv_results_ instead (in the third- and fourth-to-last lines of the function):

print('Best scores : ', gridCV.cv_results_)
scores = gridCV.cv_results_['mean_test_precision_score']
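For completeness, cv_results_ is a dict of parallel arrays with one entry per candidate parameter set, and each scorer gets its own mean_test_<name> column. A quick way to inspect it, a minimal sketch assuming pandas is available and the scorer keys from the question's scorers dict:

import pandas as pd

# One row per sampled candidate; sort by the metric you care about.
results_df = pd.DataFrame(gridCV.cv_results_)
cols = ['params', 'mean_test_precision_score', 'mean_test_f1_score',
        'mean_test_recall_score', 'mean_test_accuracy_score']
print(results_df[cols].sort_values('mean_test_precision_score', ascending=False))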

One more error:

You never defined xgb_model, so add the following line:

xgb_model = XGBClassifier(n_jobs = -1, random_state = 42)
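Putting the three fixes together, the search setup might look like this. This is a minimal sketch; the preprocessing, scorers, param_dist and skf are assumed to be defined exactly as in the question:

xgb_model = XGBClassifier(n_jobs=-1, random_state=42)

gridCV = RandomizedSearchCV(xgb_model,
                            param_distributions=param_dist,
                            cv=skf,
                            n_iter=5,
                            scoring=scorers,
                            verbose=3,
                            n_jobs=-1,
                            return_train_score=True,
                            refit='precision_score')  # string key of the scorer, not the function

gridCV.fit(x_train, y_train)
print("Best parameters : ", gridCV.best_params_)
print("Best precision : ", gridCV.best_score_)  # mean CV score of the refit metric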
