How do I get a model's SHAP values averaged over folds?

This is how I get the values from a model trained on a single fold:

    import shap  # package used to calculate SHAP values

    clf.fit(X_train, y_train,
            eval_set=[(X_train, y_train), (X_test, y_test)],
            eval_metric='auc', verbose=100, early_stopping_rounds=200)

    # Create object that can calculate SHAP values
    explainer = shap.TreeExplainer(clf)
    # Calculate SHAP values
    shap_values = explainer.shap_values(X_test)
    shap.summary_plot(shap_values, X_test)

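As you know, results can differ from fold to fold. How do I average these shap_values?

Answer:

Because we have this rule:

It is fine to average the SHAP values of models that have the same output and were trained on the same input features; just make sure to also average the expected_value of each explainer. However, if your test sets do not overlap, you cannot average the SHAP values of the test sets, because they belong to different samples. You can instead use each model to explain the whole dataset and then average those values into a single matrix. (It is fine to explain examples from the training set, just remember that the models may be overfit to them.)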
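The averaging described in this rule can be sketched in a few lines of NumPy. This is a minimal illustration, not the shap API: the helper name `average_fold_explanations` is hypothetical, and two toy linear models stand in for the fold models, because their SHAP values have a simple closed form (weight times deviation from the background mean, assuming independent features). The point is that when every fold explains the same rows, the averaged SHAP matrix plus the averaged expected value still adds up to the averaged model output.

```python
import numpy as np


def average_fold_explanations(fold_shap_values, fold_expected_values):
    """Average per-fold SHAP matrices and base values.

    Every matrix must describe the same samples (rows) and features (columns).
    """
    stacked = np.stack([np.asarray(sv, dtype=float) for sv in fold_shap_values])
    mean_shap = stacked.mean(axis=0)
    mean_expected = float(np.mean(fold_expected_values))
    return mean_shap, mean_expected


# Toy hold-out set shared by all "fold models" (made-up data).
rng = np.random.default_rng(0)
X_holdout = rng.normal(size=(5, 3))
background_mean = X_holdout.mean(axis=0)

# Two hypothetical linear fold models: f(x) = x @ w + b.
weights = [np.array([1.0, -2.0, 0.5]), np.array([0.8, -1.5, 0.7])]
biases = [0.1, -0.3]

fold_shap, fold_expected, fold_preds = [], [], []
for w, b in zip(weights, biases):
    # Closed-form SHAP values of a linear model with independent features.
    fold_shap.append((X_holdout - background_mean) * w)
    fold_expected.append(float(background_mean @ w + b))
    fold_preds.append(X_holdout @ w + b)

mean_shap, mean_expected = average_fold_explanations(fold_shap, fold_expected)
mean_pred = np.mean(fold_preds, axis=0)

# Additivity survives averaging: SHAP row sums + base value == mean prediction.
assert np.allclose(mean_shap.sum(axis=1) + mean_expected, mean_pred)
```

With real tree models the per-fold matrices would come from each fold's explainer applied to the same hold-out rows, and the base values from each explainer's expected_value.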

So we need a hold-out dataset to follow this rule. I did something like the following to get everything working as expected:

    import lightgbm as lgb
    import numpy as np
    import shap
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import StratifiedKFold, train_test_split

    # Hold out a test set that every fold model will explain
    X_train, X_test, y_train, y_test = train_test_split(
        df[feat], df['target'].values,
        test_size=0.2, shuffle=True, stratify=df['target'].values,
        random_state=42)

    folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
    folds_idx = list(folds.split(X_train, y=y_train))

    shap_values = None
    auc_scores = []
    oof_preds = np.zeros(X_train.shape[0])

    for n_fold, (train_idx, valid_idx) in enumerate(folds_idx):
        # The fold indices refer to the training split, so index X_train/y_train, not df
        train_x, train_y = X_train.iloc[train_idx], y_train[train_idx]
        valid_x, valid_y = X_train.iloc[valid_idx], y_train[valid_idx]

        clf = lgb.LGBMClassifier(
            nthread=4, boosting_type='gbdt', is_unbalance=True, random_state=42,
            learning_rate=0.05, max_depth=3,
            reg_lambda=0.1, reg_alpha=0.01, min_child_samples=21,
            subsample_for_bin=5000, metric='auc', n_estimators=5000)
        # Newer LightGBM releases removed verbose/early_stopping_rounds from fit();
        # there, pass callbacks=[lgb.early_stopping(100)] instead.
        clf.fit(train_x, train_y,
                eval_set=[(train_x, train_y), (valid_x, valid_y)],
                eval_metric='auc', verbose=False, early_stopping_rounds=100)

        explainer = shap.TreeExplainer(clf)
        fold_shap = explainer.shap_values(X_test)
        if isinstance(fold_shap, list):
            # Some shap versions return one array per class; keep the positive class
            fold_shap = fold_shap[1]
        # Accumulate the SHAP matrices; every fold explains the same X_test rows
        shap_values = fold_shap if shap_values is None else shap_values + fold_shap

        oof_preds[valid_idx] = clf.predict_proba(valid_x)[:, 1]
        auc_scores.append(roc_auc_score(valid_y, oof_preds[valid_idx]))

    print('AUC: ', np.mean(auc_scores))
    shap_values = shap_values / len(folds_idx)  # average over the folds
    shap.summary_plot(shap_values, X_test)
