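I wrote a function that displays the evaluation metrics for a single model, and now I want to apply it to a set of models that I have already fitted.
The old function's signature was: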
OldFunction(code: str, x, X_train: np.array, X_test: np.array, X:pd.DataFrame)
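where:
code is a string used to build the column name in the data frame
x is the model name
X_train and X_test are the np.arrays produced by the train/test split
X is the data frame holding the full data set
To compute the metrics for a set of models, I tried modifying the function by adding a loop and passing the models in as a list.
But this does not work.
The problem is that I cannot iterate over the set of models. What are my options? Do you have any ideas?
I have put the new function below.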
import numpy as np
import pandas as pd
from sklearn.metrics import roc_curve, auc
from sklearn.metrics import accuracy_score, recall_score, precision_score
from sklearn.model_selection import cross_val_score

def displaymetrics(code: list, models: list, X_train: np.array, X_test: np.array, X: pd.DataFrame):
    for i in models:
        y_score = models[i].fit(X_train, y_train).decision_function(X_test)
        fpr, tpr, _ = roc_curve(y_test, y_score)
        roc_auc = auc(fpr, tpr)

        # Conventional scores
        y_pred = pd.DataFrame(model[i].predict(X_train)).reset_index(drop=True)
        Recall_Train, Precision_Train, Accuracy_Train = recall_score(y_train, y_pred), precision_score(y_train, y_pred), accuracy_score(y_train, y_pred)
        y_pred = pd.DataFrame(model[i].predict(X_test)).reset_index(drop=True)
        Recall_Test = recall_score(y_test, y_pred)
        Precision_Test = precision_score(y_test, y_pred)
        Accuracy_Test = accuracy_score(y_test, y_pred)

        # Cross-validation
        cv_au = cross_val_score(models[i], X_test, y_test, cv=30, scoring='roc_auc')
        cv_f1 = cross_val_score(models[i], X_test, y_test, cv=30, scoring='f1')
        cv_pr = cross_val_score(models[i], X_test, y_test, cv=30, scoring='precision')
        cv_re = cross_val_score(models[i], X_test, y_test, cv=30, scoring='recall')
        cv_ac = cross_val_score(models[i], X_test, y_test, cv=30, scoring='accuracy')
        cv_ba = cross_val_score(models[i], X_test, y_test, cv=30, scoring='balanced_accuracy')

        cv_au_m, cv_au_std = cv_au.mean(), cv_au.std()
        cv_f1_m, cv_f1_std = cv_f1.mean(), cv_f1.std()
        cv_pr_m, cv_pr_std = cv_pr.mean(), cv_pr.std()
        cv_re_m, cv_re_std = cv_re.mean(), cv_re.std()
        cv_ac_m, cv_ac_std = cv_ac.mean(), cv_ac.std()
        cv_ba_m, cv_ba_std = cv_ba.mean(), cv_ba.std()

        cv_au, cv_f1, cv_pr = (cv_au_m, cv_au_std), (cv_f1_m, cv_f1_std), (cv_pr_m, cv_pr_std)
        cv_re, cv_ac, cv_ba = (cv_re_m, cv_re_std), (cv_ac_m, cv_ac_std), (cv_ba_m, cv_ba_std)

        tuples = [cv_au, cv_f1, cv_pr, cv_re, cv_ac, cv_ba]
        tuplas = [0]*len(tuples)
        for i in range(len(tuples)):
            tuplas[i] = [round(x, 4) for x in tuples[i]]

        results = pd.DataFrame()
        results['Metrics'] = ['roc_auc', 'Accuracy_Train', 'Precision_Train', 'Recall_Train',
                              'Accuracy_Test', 'Precision_Test', 'Recall_Test',
                              'cv_roc-auc (mean, std)', 'cv_f1score(mean, std)',
                              'cv_precision (mean, std)', 'cv_recall (mean, std)',
                              'cv_accuracy (mean, std)', 'cv_bal_accuracy (mean, std)']
        results.set_index(['Metrics'], inplace=True)
        results['Model_'+code[i]] = [roc_auc, Accuracy_Train, Precision_Train, Recall_Train,
                                     Accuracy_Test, Precision_Test, Recall_Test,
                                     tuplas[0], tuplas[1], tuplas[2], tuplas[3], tuplas[4], tuplas[5]]
    return results
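The output should be a data frame in which each column is a model and each row is a metric.
Answer:
You should say whether an error is raised or whether the output is simply wrong. I will assume you are getting an error.
Are you sure you are passing the models as a list when you call displaymetrics?
For example:
models = [model1, model2, ...]
displaymetrics(code, models, X_train, X_test, X)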
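Also, there is a bug in your code: you call models[i].fit(...), but i is already a model, not an index. You should simply use i.fit(...), or better, rename i, because that name conventionally denotes an index rather than an item. (If you want to iterate over the indices of the list, use for i in range(0, len(models)): ... instead.)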
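To make the difference between the two loop styles concrete, here is a minimal sketch. The strings in `models` are hypothetical stand-ins for fitted estimators; the point is only how the loop variable behaves.

```python
# Hypothetical stand-ins for fitted estimators; any objects behave the same way.
models = ["model_a", "model_b", "model_c"]

# Style 1: iterate over the items. The loop variable IS the element.
direct = []
for model in models:
    direct.append(model)

# Style 2: iterate over the indices. The loop variable is an int used to index.
by_index = []
for i in range(len(models)):
    by_index.append(models[i])

# enumerate() yields both the index and the item when both are needed.
pairs = list(enumerate(models))

# Mixing the styles (item used as an index) raises a TypeError,
# which is what models[i] does when i is itself an element of models.
error_message = None
try:
    for model in models:
        models[model]
except TypeError as exc:
    error_message = str(exc)
```

Both styles visit the same elements in the same order; only the meaning of the loop variable changes.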
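Note: you should not import pandas and numpy on every model iteration. I also suggest placing all the imports (including the sklearn modules) at the top of your code.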
So I think your code should look like this:
import numpy as np
import pandas as pd
from sklearn.metrics import roc_curve, auc
from sklearn.metrics import accuracy_score, recall_score, precision_score
from sklearn.model_selection import cross_val_score

def displaymetrics(code: list, models: list, X_train: np.array, X_test: np.array, X: pd.DataFrame):
    for model in models:  # or: for i in range(0, len(models)):
        y_score = model.fit(X_train, y_train).decision_function(X_test)  # or: y_score = models[i].fit(X_train, y_train).decision_function(X_test)
        fpr, tpr, _ = roc_curve(y_test, y_score)
        # and so on
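For reference, one way the whole function could be completed is sketched below. This is an illustration under several assumptions, not the asker's exact code: the name `display_metrics`, the explicit `y_train` and `y_test` parameters (the original relies on them being globals), the `cv` parameter (the original hard-codes `cv=30`), and the example estimators and synthetic data are all hypothetical. It also stores the cross-validation mean and standard deviation in separate rows rather than as tuples, so that every cell is a plain number.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, auc, precision_score,
                             recall_score, roc_curve)
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

CV_SCORINGS = ("roc_auc", "f1", "precision", "recall",
               "accuracy", "balanced_accuracy")


def display_metrics(codes, models, X_train, y_train, X_test, y_test, cv=5):
    """Return a DataFrame with one row per metric and one column per model."""
    columns = {}
    # zip pairs each label with its model, so no index variable is needed.
    for code, model in zip(codes, models):
        model.fit(X_train, y_train)

        # ROC AUC on the test set (requires a decision_function method).
        fpr, tpr, _ = roc_curve(y_test, model.decision_function(X_test))
        scores = {"roc_auc": auc(fpr, tpr)}

        # Conventional scores on both splits.
        for split, X_part, y_part in (("Train", X_train, y_train),
                                      ("Test", X_test, y_test)):
            y_pred = model.predict(X_part)
            scores[f"Accuracy_{split}"] = accuracy_score(y_part, y_pred)
            scores[f"Precision_{split}"] = precision_score(y_part, y_pred)
            scores[f"Recall_{split}"] = recall_score(y_part, y_pred)

        # Cross-validation on the test split, mirroring the question's code.
        for scoring in CV_SCORINGS:
            cv_scores = cross_val_score(model, X_test, y_test,
                                        cv=cv, scoring=scoring)
            scores[f"cv_{scoring}_mean"] = round(cv_scores.mean(), 4)
            scores[f"cv_{scoring}_std"] = round(cv_scores.std(), 4)

        columns["Model_" + code] = scores

    # A dict of dicts becomes columns (outer keys) and rows (inner keys).
    results = pd.DataFrame(columns)
    results.index.name = "Metrics"
    return results


# Usage on synthetic binary data (purely illustrative).
X_all, y_all = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X_all, y_all, test_size=0.3, stratify=y_all, random_state=0)

results = display_metrics(
    ["logreg", "svc"],
    [LogisticRegression(max_iter=1000), SVC(kernel="linear")],
    X_train, y_train, X_test, y_test)
print(results)
```

Collecting each model's scores in a dictionary and building the data frame once, after the loop, avoids two problems in the question's version: `results` being recreated on every iteration, and the inner loop reusing the name `i`.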
Try editing your question to show us how you call displaymetrics and with which arguments.