Random forest is overfitting

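I am using scikit-learn with stratified cross-validation to compare several classifiers. I compute accuracy, recall, and AUC.

For parameter optimization I used GridSearchCV with 5-fold cross-validation.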
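For readers who want to reproduce the setup, here is a minimal sketch of a 5-fold stratified grid search. The synthetic data, the small parameter grid, and the `roc_auc` scoring are assumptions for illustration; the post does not show the data or the grid that was actually searched.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in data; the original data set is not shown in the post.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Hypothetical, deliberately small grid; the real grid is not given.
param_grid = {
    "n_estimators": [20, 50],
    "max_features": ["sqrt", "log2"],
    "min_samples_split": [2, 5],
}

# 5 stratified folds, matching the "5-fold" setting described above.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",  # assumed scoring; the post does not say which was used
    cv=cv,
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Note that `best_score_` is measured on the same folds that were used to pick the parameters, so it tends to be optimistic compared with a score from data the search never saw.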

RandomForestClassifier(warm_start=True, min_samples_leaf=1, n_estimators=800, min_samples_split=5, max_features='log2', max_depth=400, class_weight=None)

These are the best parameters found by GridSearchCV.

My problem is that I think I am really overfitting. For example:

Random forest, with standard deviation (+/-)

precision: 0.99 (+/- 0.06)

sensitivity: 0.94 (+/- 0.06)

specificity: 0.94 (+/- 0.06)

B_accuracy: 0.94 (+/- 0.06)

AUC: 0.94 (+/- 0.11)

Logistic regression, with standard deviation (+/-)

precision: 0.88 (+/- 0.06)

sensitivity: 0.79 (+/- 0.06)

specificity: 0.68 (+/- 0.06)

B_accuracy: 0.73 (+/- 0.06)

AUC: 0.73 (+/- 0.041)

The other classifiers perform much more like logistic regression (so they do not look overfitted).

My cross-validation code is:

import math
import numpy as np

# data, titles, classifiers and skf are defined elsewhere (not shown here)
X, y = [], []
for row in data:
    X.append(row[0])
    y.append(float(row[1]))
x = np.array(X)
y = np.array(y)

def SD(values):
    # return the population standard deviation and the mean of values
    mean = sum(values) / len(values)
    squared_devs = [(v - mean) ** 2 for v in values]
    return math.sqrt(sum(squared_devs) / len(values)), mean

for name, clf in zip(titles, classifiers):
    # go through all classifiers, compute 10 folds
    pre, sen, spe, ba, area = [], [], [], [], []
    # skf yields (train_index, test_index) pairs; with a current scikit-learn
    # StratifiedKFold object this would be skf.split(x, y)
    for train_index, test_index in skf:
        # select the training and test rows of this fold
        X_train, X_test = x[train_index], x[test_index]
        y_train, y_test = y[train_index], y[test_index]
        #clf=clf.fit(X_train,y_train)
        #predicted=clf.predict_proba(X_test)
        #... other code, calculating metrics and so on...
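The metric code itself is elided above, so the following is only a sketch of how per-fold sensitivity, specificity, balanced accuracy, and AUC could be collected and reported as mean (+/- standard deviation). The synthetic data, the logistic regression stand-in, and the 0.5 decision threshold are assumptions, not the poster's code.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the poster's data (binary labels 0/1).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
clf = LogisticRegression(max_iter=1000)

sen, spe, ba, area = [], [], [], []
for train_index, test_index in skf.split(X, y):
    # Fit an unfitted copy on the training part of this fold only.
    model = clone(clf).fit(X[train_index], y[train_index])
    proba = model.predict_proba(X[test_index])[:, 1]
    pred = (proba >= 0.5).astype(int)  # assumed threshold

    tn, fp, fn, tp = confusion_matrix(y[test_index], pred, labels=[0, 1]).ravel()
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    sen.append(sensitivity)
    spe.append(specificity)
    ba.append((sensitivity + specificity) / 2)
    area.append(roc_auc_score(y[test_index], proba))

results = {"sensitivity": sen, "specificity": spe, "B_accuracy": ba, "AUC": area}
for label, values in results.items():
    # np.std defaults to the population standard deviation, like SD() above.
    print(f"{label}: {np.mean(values):.2f} (+/- {np.std(values):.2f})")
```

Computing the AUC from the predicted probabilities rather than from the thresholded labels keeps it a ranking measure; computed from hard labels it would collapse to the balanced accuracy.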

Answer:
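One thing that may be worth checking, offered as a possibility rather than a confirmed diagnosis: the best parameters include warm_start=True, and the evaluation loop appears to reuse the same clf object for every fold. As far as I know, calling fit again on a warm-started forest without increasing n_estimators keeps the trees that are already there instead of training new ones. If that happens in the loop, folds after the first would be scored with trees that have already seen their test samples, which could inflate the numbers. The sketch below uses synthetic data and a small forest (both assumptions) to show the behaviour and one way around it with sklearn.base.clone.

```python
import warnings

from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data; the two halves play the role of two training folds.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_a, y_a = X[:100], y[:100]
X_b, y_b = X[100:], y[100:]

clf = RandomForestClassifier(n_estimators=10, warm_start=True, random_state=0)
clf.fit(X_a, y_a)
trees_after_first_fit = list(clf.estimators_)

with warnings.catch_warnings():
    # scikit-learn warns here that no new trees are being fitted.
    warnings.simplefilter("ignore")
    clf.fit(X_b, y_b)

# True if the second fit kept the tree objects from the first fit.
reused = all(a is b for a, b in zip(trees_after_first_fit, clf.estimators_))
print("trees reused across fits:", reused)

# clone() returns an unfitted copy with the same parameters,
# so fitting it trains a new set of trees.
fresh = clone(clf).fit(X_b, y_b)
refit = not any(a is b for a, b in zip(trees_after_first_fit, fresh.estimators_))
print("clone fitted new trees:", refit)
```

GridSearchCV clones the estimator before each fit, so the search itself would not be affected; only a hand-written loop that calls fit repeatedly on the same object would be. Setting warm_start=False for the final evaluation, or cloning per fold as above, should rule this out.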
