I am trying to build a classifier with XGBoost and fit it using RandomizedSearchCV.
Here is the code of my function:
def xgboost_classifier_rscv(x,y):
    from scipy import stats
    from xgboost import XGBClassifier
    from sklearn.metrics import fbeta_score, make_scorer, recall_score, accuracy_score, precision_score
    from sklearn.model_selection import StratifiedKFold, GridSearchCV, RandomizedSearchCV

    # Split the dataset into training and test parts
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

    # Bag-of-words implementation
    cv = CountVectorizer()
    x_train = cv.fit_transform(x_train).toarray()

    # TF-IDF implementation
    vector = TfidfTransformer()
    x_train = vector.fit_transform(x_train).toarray()
    x_test = cv.transform(x_test)

    scorers = {
        'f1_score': make_scorer(f1_score),
        'precision_score': make_scorer(precision_score),
        'recall_score': make_scorer(recall_score),
        'accuracy_score': make_scorer(accuracy_score)
    }

    param_dist = {'n_estimators': stats.randint(150, 1000),
                  'learning_rate': stats.uniform(0.01, 0.59),
                  'subsample': stats.uniform(0.3, 0.6),
                  'max_depth': [3, 4, 5, 6, 7, 8, 9],
                  'colsample_bytree': stats.uniform(0.5, 0.4),
                  'min_child_weight': [1, 2, 3, 4]
                  }

    n_folds = numFolds)
    skf = StratifiedKFold(n_splits=3, shuffle = True)

    gridCV = RandomizedSearchCV(xgb_model,
                                param_distributions = param_dist,
                                cv = skf,
                                n_iter = 5,
                                scoring = scorers,
                                verbose = 3,
                                n_jobs = -1,
                                return_train_score=True,
                                refit = precision_score)

    gridCV.fit(x_train,y_train)
    best_pars = gridCV.best_params_
    print("Best parameters : ", best_pars)
    xgb_predict = gridCV.predict(x_test)
    xgb_pred_prob = gridCV.predict_proba(x_test)
    print('Best score : ', gridCV.grid_scores_)
    scores = [x[1] for x in gridCV.grid_scores_]
    print("Best score : ", scores)

    return y_test, xgb_predict, xgb_pred_prob
When I run the code, I get the following error:
TypeError                                 Traceback (most recent call last)
<ipython-input-30-9adf84d48e5c> in <module>
      1 print("********** Xgboost Classifier *************")
      2 start_time = time.monotonic()
----> 3 y_test, xgb_predict, xgb_pred_prob = xgboost_classifier_rscv(x,y)
      4 end_time = time.monotonic()
      5 print("Elapsed time : ", timedelta(seconds=end_time - start_time))

<ipython-input-29-e0c6ae026076> in xgboost_classifier_rscv(x, y)
     70 # verbose=3, random_state=1001, refit='precision_score' )
     71
---> 72 gridCV.fit(x_train,y_train)
     73 best_pars = gridCV.best_params_
     74 print("Best parameters : ", best_pars)

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     61 extra_args = len(args) - len(all_args)
     62 if extra_args <= 0:
---> 63 return f(*args, **kwargs)
     64
     65 # extra_args > 0

~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
    858 # parameter set.
    859 if callable(self.refit):
--> 860 self.best_index_ = self.refit(results)
    861 if not isinstance(self.best_index_, numbers.Integral):
    862 raise TypeError('best_index_ returned is not an integer')

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     61 extra_args = len(args) - len(all_args)
     62 if extra_args <= 0:
---> 63 return f(*args, **kwargs)
     64
     65 # extra_args > 0

TypeError: precision_score() missing 1 required positional argument: 'y_pred'
When I use GridSearchCV instead of RandomizedSearchCV, the code runs fine!
Answer:
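The value passed to refit should not be the function precision_score but the string 'precision_score' (in quotes), which is the key you used in the scorers dict. When refit is a callable, scikit-learn calls it with the results dictionary as its only argument and expects the index of the best candidate back; that is why precision_score complains about the missing y_pred. Like this: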
gridCV = RandomizedSearchCV(xgb_model, param_distributions = param_dist, cv = skf, n_iter = 5, scoring = scorers, verbose = 3, n_jobs = -1, return_train_score=True, refit = 'precision_score')
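To make the difference between the two forms of refit concrete, here is a minimal sketch. It is not your pipeline: LogisticRegression and a synthetic dataset stand in for XGBClassifier and your vectorized text (both are my assumptions, chosen so the snippet runs with scikit-learn and SciPy alone), and pick_best_precision is a hypothetical helper name.

```python
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, precision_score, recall_score
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

# Synthetic binary data standing in for the vectorized text features (assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

scorers = {
    "precision_score": make_scorer(precision_score),
    "recall_score": make_scorer(recall_score),
}
param_dist = {"C": stats.uniform(0.01, 10)}
skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

# Form 1: refit is a string naming one key of `scorers`.
search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions=param_dist,
    n_iter=4,
    scoring=scorers,
    refit="precision_score",
    cv=skf,
    random_state=0,
)
search.fit(X, y)
print("best params (string refit):", search.best_params_)


# Form 2: refit is a callable that receives cv_results_ and returns an index.
def pick_best_precision(cv_results):
    # Hypothetical helper: choose the candidate with the highest mean precision.
    return int(cv_results["mean_test_precision_score"].argmax())


search_callable = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions=param_dist,
    n_iter=4,
    scoring=scorers,
    refit=pick_best_precision,
    cv=skf,
    random_state=0,
)
search_callable.fit(X, y)
print("best index (callable refit):", search_callable.best_index_)
```

Passing precision_score itself falls into the second form, but it has the wrong signature for that role, hence the TypeError. Note that with a callable refit, best_score_ is not available, so the string form is usually the simpler choice.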
Another error: grid_scores_ has been removed, so use cv_results_ instead (on the third and fourth lines from the end of the function):
print('Best score : ', gridCV.cv_results_)
# cv_results_ is a dict, so index it by key instead of iterating over it
scores = gridCV.cv_results_['mean_test_precision_score']
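A note on reading the scores: the question's comprehension `[x[1] for x in gridCV.grid_scores_]` relied on grid_scores_ being a list of tuples, whereas cv_results_ is a dictionary, so the same comprehension would iterate over its keys. The sketch below shows this with a hand-made dictionary that only imitates the shape of cv_results_; the numbers and parameter values in it are invented for illustration and are not results from your model.

```python
# Hand-made stand-in for gridCV.cv_results_ (values invented for illustration).
cv_results = {
    "mean_test_precision_score": [0.81, 0.86, 0.79],
    "mean_test_recall_score": [0.75, 0.72, 0.80],
    "params": [{"max_depth": 3}, {"max_depth": 5}, {"max_depth": 7}],
}

# Iterating over a dict yields its keys, so x[1] is only the second
# character of each key, not a score.
wrong = [x[1] for x in cv_results]
print(wrong)  # ['e', 'e', 'a']

# Index by key to get one mean score per sampled candidate.
scores = cv_results["mean_test_precision_score"]

# Position of the highest mean precision, and the parameters that produced it.
best = max(range(len(scores)), key=scores.__getitem__)
print(cv_results["params"][best], scores[best])
```

With multi-metric scoring, the keys follow the pattern mean_test_<name>, where <name> is a key of your scorers dict, so 'mean_test_recall_score' and the others can be read the same way.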
One more error: you never define xgb_model, so add the following line before you create the RandomizedSearchCV object:
xgb_model = XGBClassifier(n_jobs = -1, random_state = 42)