How to adapt deprecated StratifiedKFold code

The response values in my dataset are imbalanced: there are far more rejected samples than non-rejected ones, so I want to balance my dataset.

To do this, I had been using code based on cross_validation.StratifiedKFold, which is now deprecated. I need to adapt it, but I don't fully understand it, so I'm asking for help.

The original code is:

def stratified_cv(X, y, clf_class, shuffle=True, n_folds=10, **kwargs):
    stratified_k_fold = cross_validation.StratifiedKFold(y, n_folds=n_folds, shuffle=shuffle)
    y_pred = y.copy()
    # ii -> training set indices
    # jj -> test set indices
    for ii, jj in stratified_k_fold:
        X_train, X_test = X[ii], X[jj]
        y_train = y[ii]
        clf = clf_class(**kwargs)
        clf.fit(X_train, y_train)
        y_pred[jj] = clf.predict(X_test)
    return y_pred

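Here X is the dataset after fit_transform, converted to a NumPy float array and scaled, and y is the "rejected" vs. "not rejected" class, converted to an integer array (0 or 1, of course). Finally, clf_class(**kwargs) can be a classifier such as ensemble.GradientBoostingClassifier, svm.SVC, or ensemble.RandomForestClassifier.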

X = np.array([
    [-0.6786493 ,  0.67648946, -0.52360328, -0.32758048,  1.6170861 ,
      1.23488274,  1.56676695,  0.47664315,  1.56703625, -0.07060962,
     -0.05594035, -0.07042665,  0.86674322, -0.46549436,  0.86602851,
     -0.08500823, -0.60119509, -0.0856905 , -0.42793202],
    [ 0.6031696 ,  0.14906505, -0.52360328, -0.32758048,  1.6170861 ,
      1.30794844, -0.33373776,  1.12450284, -0.33401297, -0.10808036,
      0.14486653, -0.10754944,  1.05857074,  0.14782467,  1.05938994,
      1.24048169, -0.60119509,  1.2411686 , -0.42793202],
    [ 0.33331299,  0.9025285 , -0.52360328, -0.32758048, -0.61839626,
     -0.59175986,  1.16830364,  0.67598459,  1.168464  , -1.57338336,
      0.49627857, -1.57389963, -0.75686906,  0.19893459, -0.75557074,
      0.70312091,  0.21153386,  0.69715637, -1.1882185 ],
    [ 0.6031696 , -0.42859027, -0.68883427,  3.05268496, -0.61839626,
     -0.59175986,  2.19659605, -1.46693591,  2.19675881, -2.74286476,
     -0.60815927, -2.7432675 , -0.07855114, -0.5677142 , -0.07880574,
     -1.30302599,  1.02426282, -1.30640087,  0.33235445],
    [ 0.67063375, -0.6546293 , -0.52360328,  3.05268496, -0.61839626,
     -0.59175986, -0.24008971,  0.62614923, -0.24004065, -1.03893233,
      1.0986992 , -1.03793936, -0.27631146,  1.06780322, -0.27656174,
     -0.04918418, -0.60119509, -0.04588472,  1.09264093],
    [-0.74611345, -0.90578379, -0.52360328, -0.32758048, -0.61839626,
     -0.59175986, -0.93051461,  1.82219789, -0.93025113,  0.54272717,
     -0.85916786,  0.54209937,  0.15678365,  0.55670403,  0.15850147,
      0.88224117,  0.61789834,  0.88291665,  1.8529274 ],
    [ 0.53570545,  1.50529926, -0.52360328, -0.32758048, -0.61839626,
     -0.59175986,  2.81173526, -1.66627735,  2.81135938,  2.30385178,
     -0.15634379,  2.3031117 , -0.79642112,  1.42557266, -0.79512194,
     -1.73291462,  1.83699177, -1.73099578,  1.8529274 ],
])

y = np.array([0,0,0,0,0,1,1])


Answer:

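StratifiedKFold has moved to model_selection. The constructor no longer takes y; it takes n_splits instead of n_folds, and the index pairs come from calling split(X, y) rather than from iterating over the object itself. So you should do this: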

from sklearn.model_selection import StratifiedKFold

def stratified_cv(X, y, clf_class, shuffle=True, n_folds=10, **kwargs):
    stratified_k_fold = StratifiedKFold(n_splits=n_folds, shuffle=shuffle)
    y_pred = y.copy()
    # ii -> training set indices
    # jj -> test set indices
    for ii, jj in stratified_k_fold.split(X, y):
        X_train, X_test = X[ii], X[jj]
        y_train = y[ii]
        clf = clf_class(**kwargs)
        clf.fit(X_train, y_train)
        y_pred[jj] = clf.predict(X_test)
    return y_pred
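As a possible alternative to the hand-written loop, the same out-of-fold predictions can probably be obtained with cross_val_predict from model_selection. The sketch below is only an illustration: the data is synthetic (X_demo and y_demo are made-up stand-ins, not the dataset from the question), and the classifier, n_splits=5, and the random seeds are arbitrary choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic, imbalanced stand-in data (an assumption, not the asker's dataset):
# 40 samples of class 0 and 10 samples of class 1, with 5 features each.
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(50, 5))
y_demo = np.array([0] * 40 + [1] * 10)

# Keep n_splits at or below the size of the smallest class (10 here)
# so that every fold can contain samples of both classes.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Print the class counts of each test fold to see the stratification.
for train_idx, test_idx in cv.split(X_demo, y_demo):
    print(np.bincount(y_demo[test_idx]))

# Out-of-fold predictions: each sample is predicted by a model
# that was fitted on the folds that do not contain it.
clf = RandomForestClassifier(n_estimators=20, random_state=0)
y_pred_demo = cross_val_predict(clf, X_demo, y_demo, cv=cv)
```

Note that the seven-row sample from the question has fewer rows than the default n_folds=10, and scikit-learn rejects a split count larger than the number of samples, so trying stratified_cv on that sample needs a smaller value such as n_folds=2.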
