How to perform SMOTE with cross-validation in sklearn in Python

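I have a highly imbalanced dataset and would like to use SMOTE to balance it and measure accuracy with cross-validation. However, most existing tutorials apply SMOTE using only a single training and testing iteration.

So I would like to know the correct procedure for performing SMOTE with cross-validation.

My current code is below. However, as mentioned above, it uses only a single iteration.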

    from imblearn.over_sampling import SMOTE
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    # Oversample the training split only
    sm = SMOTE(random_state=2)
    X_train_res, y_train_res = sm.fit_resample(X_train, y_train.ravel())

    clf_rf = RandomForestClassifier(n_estimators=25, random_state=12)
    clf_rf.fit(X_train_res, y_train_res)

I am happy to provide more details if needed.

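Answer:

You need to perform SMOTE within each fold, so that the synthetic samples are generated from the training portion only and the test portion stays untouched. Accordingly, you need to avoid train_test_split in favour of KFold: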

    from imblearn.over_sampling import SMOTE
    from sklearn.metrics import f1_score
    from sklearn.model_selection import KFold

    kf = KFold(n_splits=5)

    for fold, (train_index, test_index) in enumerate(kf.split(X), 1):
        X_train = X[train_index]
        y_train = y[train_index]  # Based on your code, you might need to call ravel here, but I would look into how you are generating y
        X_test = X[test_index]
        y_test = y[test_index]  # See the comment on ravel and y_train

        # Oversample the training fold only
        sm = SMOTE()
        X_train_oversampled, y_train_oversampled = sm.fit_resample(X_train, y_train)

        model = ...  # Choose a model here
        model.fit(X_train_oversampled, y_train_oversampled)
        y_pred = model.predict(X_test)

        print(f'For fold {fold}:')
        print(f'Accuracy: {model.score(X_test, y_test)}')
        print(f'f-score: {f1_score(y_test, y_pred)}')

You can also, for example, append the scores to a list defined outside the loop.
