I want to evaluate several classifiers in a single run and collect the results in a pandas DataFrame.
    # Let's create some test data
    import pandas as pd
    import numpy as np
    import string
    import random

    integers = pd.DataFrame(np.random.randint(0, 100, size=(50, 1)), columns=list('I'))
    strings = pd.DataFrame([random.choice('ab') for _ in range(50)], columns=list('S'))
    df2 = pd.concat([strings, integers], axis=1)
    df2.head()

       S   I
    0  a   5
    1  a  31
    2  b  84
    3  a  79
    4  b  92

    # Train - Test
    from sklearn.model_selection import train_test_split
    X = df2[["I"]].values
    y = df2["S"]
    X_train, X_test, y_train, y_test = train_test_split(X, y)

    # Load libraries
    from sklearn import metrics
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
    from sklearn.linear_model import LogisticRegression

    # Classifiers
    classifiers = [KNeighborsClassifier(30),
                   DecisionTreeClassifier(),
                   RandomForestClassifier(),
                   AdaBoostClassifier(),
                   LogisticRegression()]

    n_range = list(range(1, 10))
    RandomForestClf = []
    data_frame = []
    for n in n_range:
        # name = clf.__class__.__name__
        model = RandomForestClassifier(n_estimators=n)
        scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
        RandomForestClf.append(scores.mean())
    data_frame = pd.DataFrame({"Random Forest": RandomForestClf})
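I can't get the different classifiers to run through the for loop.
How do I set up the for loop so that every classifier is run and its predictions are stored in the pandas DataFrame?
My current for loop only works when the model is named explicitly in the code.
I'm new to Python, sorry.
Thanks for your help!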
Answer:
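You can define the DataFrame outside the for loop, then use the type of each object to look up the classifier's name and assign its scores to a column with that name: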
    import pandas as pd
    from sklearn import metrics
    from sklearn.model_selection import cross_val_score, train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
    from sklearn.linear_model import LogisticRegression

    # Classifiers
    classifiers = [KNeighborsClassifier(30),
                   DecisionTreeClassifier(),
                   RandomForestClassifier(),
                   AdaBoostClassifier(),
                   LogisticRegression()]

    from sklearn.datasets import load_iris
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y)

    k = 5
    # One row per cross-validation fold, one column per classifier
    preds = pd.DataFrame(index=[*range(k)])
    for cls in classifiers:
        scores = cross_val_score(cls, X, y, cv=k, scoring="accuracy")
        # Use the class name (e.g. "KNeighborsClassifier") as the column label
        preds[type(cls).__name__] = scores
In this case, you get:
    print(preds)

       KNeighborsClassifier  DecisionTreeClassifier  RandomForestClassifier  \
    0              0.900000                0.966667                0.966667
    1              0.966667                0.966667                0.966667
    2              0.933333                0.900000                0.933333
    3              0.900000                0.966667                0.966667
    4              1.000000                1.000000                1.000000

       AdaBoostClassifier  LogisticRegression
    0            0.966667            0.966667
    1            0.933333            1.000000
    2            0.900000            0.933333
    3            0.933333            0.966667
    4            1.000000            1.000000
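If you only need one number per classifier, as in the question's loop that stored `scores.mean()`, the per-fold table can be collapsed afterwards. The following is a minimal sketch under some assumptions: it uses the iris data, the `summarize_cv` helper is a hypothetical name introduced here for illustration, and `random_state=0` and `max_iter=1000` are arbitrary choices that do not come from the original answer.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def summarize_cv(classifiers, X, y, k=5):
    """Hypothetical helper: return (per-fold scores, mean/std summary)."""
    # One column per classifier, one row per fold
    fold_scores = pd.DataFrame(index=range(k))
    for clf in classifiers:
        fold_scores[type(clf).__name__] = cross_val_score(
            clf, X, y, cv=k, scoring="accuracy"
        )
    # Collapse the folds: one row per classifier, columns "mean" and "std"
    summary = fold_scores.agg(["mean", "std"]).T
    return fold_scores, summary


X, y = load_iris(return_X_y=True)
classifiers = [
    KNeighborsClassifier(30),
    DecisionTreeClassifier(random_state=0),  # assumption: fixed seed for repeatability
    LogisticRegression(max_iter=1000),       # assumption: extra iterations to converge
]
fold_scores, summary = summarize_cv(classifiers, X, y)
print(summary.sort_values("mean", ascending=False))
```

Because the column label is the class name, two instances of the same class (for example two `RandomForestClassifier` objects with different `n_estimators`) would overwrite each other. In that case you would need a more specific label, such as one that includes the parameter value.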
Here is a related answer that shows how to plot multiple confusion matrices from a list of classifiers, in case you find that useful too.
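The related answer is not reproduced here. As a rough sketch of the same loop pattern applied to confusion matrices, the code below computes one matrix per classifier from out-of-fold predictions and stores them in a dict keyed by class name. Using `cross_val_predict`, the iris data, and the two classifiers shown are assumptions made for this illustration, not necessarily what the related answer does.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
classifiers = [KNeighborsClassifier(30), DecisionTreeClassifier(random_state=0)]

matrices = {}
for clf in classifiers:
    # Out-of-fold predictions: each sample is predicted by a model
    # that was trained without seeing that sample
    y_pred = cross_val_predict(clf, X, y, cv=5)
    # Rows are true classes, columns are predicted classes
    matrices[type(clf).__name__] = confusion_matrix(y, y_pred)

for name, cm in matrices.items():
    print(name)
    print(cm)
```

Each matrix can then be passed to a plotting routine of your choice, for example scikit-learn's `ConfusionMatrixDisplay` if matplotlib is available.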