I would like to know whether this is a legitimate way of computing classification accuracy:
- obtain the precision/recall thresholds
- for each threshold, binarize the continuous y_scores
- compute their accuracy from the contingency table (confusion matrix)
- return the average accuracy over the thresholds
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_curve
from sklearn.preprocessing import binarize

# note: precision_recall_curve returns (precision, recall, thresholds), in that order
precision, recall, thresholds = precision_recall_curve(np.array(np_y_true), np.array(np_y_scores))

accuracy = 0
for threshold in thresholds:
    # binarize expects a 2D array, hence the reshape and the [0]
    y_pred = binarize(np.array(np_y_scores).reshape(1, -1), threshold=threshold)[0]
    contingency_table = confusion_matrix(np_y_true, y_pred)
    # accuracy = (TN + TP) / total
    accuracy += (contingency_table[0][0] + contingency_table[1][1]) / np.sum(contingency_table)

print("Classification accuracy is: {}".format(accuracy / len(thresholds)))
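For what it's worth, the per-threshold accuracy computed by hand from the confusion matrix is exactly what `sklearn.metrics.accuracy_score` returns; a minimal sketch of the same procedure, with made-up toy arrays standing in for `np_y_true` / `np_y_scores`:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_curve

# toy data, assumed for illustration only
y_true = np.array([1, 1, 0, 0, 1, 0])
y_scores = np.array([0.9, 0.8, 0.7, 0.3, 0.6, 0.2])

precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
# binarize the scores at each threshold and score the resulting predictions
accuracies = [accuracy_score(y_true, (y_scores >= t).astype(int)) for t in thresholds]
print("Mean accuracy over thresholds: {}".format(np.mean(accuracies)))
```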
Answer:
You are heading in the right direction. The confusion matrix is definitely the right starting point for computing a classifier's accuracy. It seems to me that what you are actually aiming for is the receiver operating characteristic (ROC) curve.
In statistics, a receiver operating characteristic (ROC) curve is a graphical plot that illustrates the performance of a binary classifier system as its discrimination threshold is varied. https://en.wikipedia.org/wiki/Receiver_operating_characteristic
The AUC (area under the curve) is a measure of a classifier's performance. More information and explanations can be found here:
https://stats.stackexchange.com/questions/132777/what-does-auc-stand-for-and-what-is-it
http://mlwiki.org/index.php/ROC_Analysis
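As a sanity check, scikit-learn already ships an AUC computation, so any hand-rolled implementation can be validated against it; a minimal sketch with toy data (the arrays are made up, using the same {-1, 1} label convention):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# toy data: three positives (label 1) and three negatives (label -1)
y_true = np.array([1, 1, -1, -1, 1, -1])
y_scores = np.array([0.9, 0.8, 0.7, 0.3, 0.6, 0.2])

# 8 of the 9 positive/negative pairs are ranked correctly, so AUC = 8/9
print(roc_auc_score(y_true, y_scores))
```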
Here is my implementation, which you are welcome to improve or comment on:
import numpy as np
import matplotlib.pyplot as plt

def auc(y_true, y_val, plot=False):
    # check input
    if len(y_true) != len(y_val):
        raise ValueError('Label vector (y_true) and corresponding value vector (y_val) must have the same length.\n')
    # empty lists for true positive and false positive counts
    tp = []
    fp = []
    # count 1's and -1's in y_true
    cond_positive = list(y_true).count(1)
    cond_negative = list(y_true).count(-1)
    # all possibly relevant bias parameters stored in a list
    bias_set = sorted(list(set(y_val)), key=float, reverse=True)
    bias_set.append(min(bias_set) * 0.9)
    # initialize y_pred array full of negative predictions (-1)
    y_pred = np.ones(len(y_true)) * (-1)
    # the computation time is mainly influenced by this for loop
    # for a contamination rate of 1% it already takes ~8s to terminate
    for bias in bias_set:
        # "lower values tend to correspond to label -1"
        # indices of values which exceed the bias
        posIdx = np.where(y_val > bias)
        # set predicted values to 1
        y_pred[posIdx] = 1
        # the following computation yields values which enable a distinction
        # between the cases of true positive (3) and false positive (1)
        results = np.asarray(y_true) + 2 * np.asarray(y_pred)
        # append the number of tp's and fp's
        tp.append(float(list(results).count(3)))
        fp.append(float(list(results).count(1)))
    # calculate true positive rate and false positive rate
    tpr = np.asarray(tp) / cond_positive
    fpr = np.asarray(fp) / cond_negative
    # optional scatterplot of the ROC curve
    if plot:
        plt.scatter(fpr, tpr)
        plt.show()
    # calculate AUC by trapezoidal integration
    AUC = np.trapz(tpr, fpr)
    return AUC
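If the per-bias loop becomes the bottleneck, the same curve can be obtained from scikit-learn's `roc_curve`, which sweeps every distinct score as a threshold in vectorized code; a sketch under that assumption (`fast_auc` and the toy arrays are my own names/data, not from the original):

```python
import numpy as np
from sklearn.metrics import roc_curve

def fast_auc(y_true, y_val):
    # roc_curve returns the fpr/tpr pair for every distinct threshold at once
    fpr, tpr, _ = roc_curve(y_true, y_val)
    # trapezoidal integration of the ROC curve, as in the loop-based version
    return np.trapz(tpr, fpr)

# toy data with the same {-1, 1} label convention
y_true = [1, 1, -1, -1, 1, -1]
y_val = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
print(fast_auc(y_true, y_val))
```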