I have a basic decision tree classifier using Scikit-Learn:
    # Used to tell men from women based on height and shoe size
    from sklearn import tree

    # height and shoe size
    X = [[65, 9], [67, 7], [70, 11], [62, 6], [60, 7], [72, 13], [66, 10], [67, 7.5]]
    Y = ["male", "female", "male", "female", "female", "male", "male", "female"]

    # creating a decision tree
    clf = tree.DecisionTreeClassifier()

    # fitting the data to the tree
    clf.fit(X, Y)

    # predicting the gender of a new sample (predict expects a 2D array, one row per sample)
    prediction = clf.predict([[68, 9]])

    # print the predicted gender
    print(prediction)
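When I run the program, it always outputs either "male" or "female", but how can I see the probability of the prediction being male or female? For example, the prediction above returns "male", but how would I get it to print the probability that the prediction is male?
Thanks!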
Answer:
You can do something like the following:
    from sklearn import tree

    # load data
    X = [[65, 9], [67, 7], [70, 11], [62, 6], [60, 7], [72, 13], [66, 10], [67, 7.5]]
    Y = ["male", "female", "male", "female", "female", "male", "male", "female"]

    # build model
    clf = tree.DecisionTreeClassifier()

    # fit
    clf.fit(X, Y)

    # predict
    prediction = clf.predict([[68, 9], [66, 9]])

    # probabilities
    probs = clf.predict_proba([[68, 9], [66, 9]])

    # print the predicted gender
    print(prediction)
    print(probs)
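Theory
The result of clf.predict_proba(X) is the predicted class probability, which is the fraction of training samples of each class in the leaf that the input falls into.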
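To make the leaf-fraction idea concrete, here is a minimal sketch using made-up data (not the data from the question). It assumes three training samples with identical measurements but mixed labels, so the tree cannot separate them and they end up in one impure leaf. The values and variable names are hypothetical and only meant for illustration.

```python
from sklearn import tree

# Hypothetical data: three people share the same height and shoe size,
# two labelled "female" and one "male", so they must land in the same leaf.
X = [[66, 8], [66, 8], [66, 8], [72, 13]]
Y = ["female", "female", "male", "male"]

clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(X, Y)

# Columns follow clf.classes_, i.e. ['female', 'male'].
probs = clf.predict_proba([[66, 8], [72, 13]])

# First row: the impure leaf holds 2 female and 1 male sample -> about [0.667, 0.333].
# Second row: the other leaf holds a single male sample -> [0., 1.].
print(clf.classes_)
print(probs)
```

With the data from the question, every leaf is pure, which is why the probabilities there come out as exactly 0 or 1.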
Interpreting the results:
The first print returns ['male' 'male'], so both samples in [[68,9],[66,9]] are predicted as male.
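The second print returns:
    [[0. 1.]
     [0. 1.]]
Each row corresponds to one input sample and each column to one class. The 1 in the second column means both samples are predicted as male with probability 1.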
To see the order of the classes (and therefore which column belongs to which class), use clf.classes_
This returns: ['female', 'male']
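Since the columns of predict_proba follow the order of clf.classes_, one way to print just the probability of "male" is to look up its column index, or to pair the labels with the probabilities. The following is a sketch built on the question's data; the names by_class, male_index and p_male are my own and not part of scikit-learn.

```python
from sklearn import tree

# Data from the question: [height, shoe size] and the matching labels.
X = [[65, 9], [67, 7], [70, 11], [62, 6], [60, 7], [72, 13], [66, 10], [67, 7.5]]
Y = ["male", "female", "male", "female", "female", "male", "male", "female"]

clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(X, Y)

# predict_proba returns one row per sample; take the row for our single sample.
probs = clf.predict_proba([[68, 9]])[0]

# Pair each class label with its probability.
by_class = {str(label): float(p) for label, p in zip(clf.classes_, probs)}

# Or pick out the "male" column directly by its position in clf.classes_.
male_index = list(clf.classes_).index("male")
p_male = float(probs[male_index])

print(by_class)
print("P(male) =", p_male)
```

Looking the column up through clf.classes_ avoids hard-coding the index, which would silently break if the set of labels changed.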