I have a sample data frame like the one shown below. The Y columns all contain binary outcomes (0 or 1). The X columns run from x_1 to x_13.
         x_1  x_2  ...  x_13  y_1  y_2  y_3  ...  y_48
    1    0.1  0.2  ...   0.1    0    1    0  ...     0
    2    0.5  0.2  ...   0.2    1    0    1  ...     1
    ...
    100  0.1  0.0  ...   0.5    0    1    0  ...     0
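I am still fairly new to machine learning methods. I plan to use leave-one-out cross-validation to compute the F1 score. Without leave-one-out, we could use the code below: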
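    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import f1_score

    accs = []
    for i in range(48):
        Y = df['y_{}'.format(i+1)]
        model = RandomForestClassifier()
        model.fit(X, Y)
        # Predicting on the same rows the model was trained on
        predicts = model.predict(X)
        accs.append(f1_score(Y, predicts))
    print(accs)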
The output is [1, 1, 1, ..., 1]. How can I incorporate leave-one-out so that only a single average F1 score, such as 0.45, is printed?
Answer:
Example dataset:
    import pandas as pd
    import numpy as np

    np.random.seed(111)
    df = pd.concat([
        pd.DataFrame(np.random.uniform(0, 1, (100, 10)),
                     columns=["x_" + str(i) for i in np.arange(1, 11)]),
        pd.DataFrame(np.random.binomial(1, 0.5, (100, 5)),
                     columns=["y_" + str(i) for i in np.arange(1, 6)]),
    ], axis=1)
    X = df.filter(like="x_")
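You can then use cross_val_predict and KFold to get the prediction for each fold. Set the number of splits equal to the number of observations, so that each fold holds out exactly one row: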
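    from sklearn.model_selection import cross_val_predict, KFold
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import f1_score

    accs = []
    result = []
    loocv = KFold(len(X))  # one split per observation = leave-one-out
    for i in range(5):
        Y = df['y_{}'.format(i+1)]
        model = RandomForestClassifier()
        # Out-of-fold prediction for every row, scored against the true labels
        fold_pred = cross_val_predict(model, X, Y, cv=loocv)
        result.append(f1_score(Y, fold_pred))
        # In-sample score, kept only for comparison with the question's code
        model.fit(X, Y)
        predicts = model.predict(X)
        accs.append(f1_score(Y, predicts))
    print(result)

Output (exact values vary between runs because the random forest is not seeded):

    [0.5, 0.5871559633027522, 0.5585585585585585, 0.5585585585585585, 0.5871559633027522]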
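The question asks for one average number rather than a list, so here is a minimal sketch of how the per-target scores could be reduced to a single mean. It assumes the same x_/y_ column naming as above and swaps in scikit-learn's LeaveOneOut splitter, which produces the same splits as KFold with one split per row. The helper name mean_loo_f1, the small synthetic frame (40 rows, 3 targets), and the n_estimators and random_state values are illustrative assumptions chosen to keep the run short, not part of the original answer.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict


def mean_loo_f1(df, n_estimators=10, random_state=0):
    """Hypothetical helper: leave-one-out F1 per y_ column, plus their mean."""
    X = df.filter(like="x_")        # feature columns
    targets = df.filter(like="y_")  # binary outcome columns
    scores = []
    for name in targets.columns:
        model = RandomForestClassifier(
            n_estimators=n_estimators, random_state=random_state
        )
        # One out-of-fold prediction per row: each row is predicted by a
        # model trained on all the other rows.
        pred = cross_val_predict(model, X, targets[name], cv=LeaveOneOut())
        # F1 is computed once over the pooled predictions for this target.
        scores.append(f1_score(targets[name], pred))
    return float(np.mean(scores)), scores


# Small synthetic frame (assumed sizes, smaller than the example above).
rng = np.random.default_rng(0)
demo = pd.concat(
    [
        pd.DataFrame(rng.uniform(0, 1, (40, 4)),
                     columns=["x_" + str(i) for i in range(1, 5)]),
        pd.DataFrame(rng.integers(0, 2, (40, 3)),
                     columns=["y_" + str(i) for i in range(1, 4)]),
    ],
    axis=1,
)

mean_f1, per_target = mean_loo_f1(demo)
print(round(mean_f1, 3))  # a single average F1 across all targets
```

With leave-one-out, each test fold contains a single row, so an F1 score per fold would not be meaningful. Pooling the held-out predictions first and scoring them once per target, as cross_val_predict allows, avoids that problem. On the real data, passing the full data frame to the helper would give the single number the question asks for.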