I am writing a Python script with the Orange data mining tool that uses a previously saved model (a pickle file) to get the classification accuracy on test data.
import pickle

import Orange

# train a random forest on the training data
dataFile = "training.csv"
data = Orange.data.Table(dataFile)
learner = Orange.classification.RandomForestLearner()
cf = learner(data)

# save the pickle file
with open("1.pkcls", "wb") as f:
    pickle.dump(cf, f)

# load the pickle file
with open("1.pkcls", "rb") as f:
    loadCF = pickle.load(f)

testFile = "testing.csv"
test = Orange.data.Table(testFile)

learners = [1]
learners[0] = cf
result = Orange.evaluation.testing.TestOnTestData(data, test, learners)

# get classification accuracy
CAs = Orange.evaluation.CA(result)
I can save and load the model successfully, but I get an error:
    CAs = Orange.evaluation.CA(result)
  File "/Users/anaconda2/envs/py36/lib/python3.6/site-packages/Orange/evaluation/scoring.py", line 39, in __new__
    return self(results, **kwargs)
  File "/Users/anaconda2/envs/py36/lib/python3.6/site-packages/Orange/evaluation/scoring.py", line 48, in __call__
    return self.compute_score(results, **kwargs)
  File "/Users/anaconda2/envs/py36/lib/python3.6/site-packages/Orange/evaluation/scoring.py", line 84, in compute_score
    return self.from_predicted(results, skl_metrics.accuracy_score)
  File "/Users/anaconda2/envs/py36/lib/python3.6/site-packages/Orange/evaluation/scoring.py", line 75, in from_predicted
    dtype=np.float64, count=len(results.predicted))
  File "/Users/anaconda2/envs/py36/lib/python3.6/site-packages/Orange/evaluation/scoring.py", line 74, in <genexpr>
    for predicted in results.predicted),
  File "/Users/anaconda2/envs/py36/lib/python3.6/site-packages/sklearn/metrics/classification.py", line 172, in accuracy_score
    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
  File "/Users/anaconda2/envs/py36/lib/python3.6/site-packages/sklearn/metrics/classification.py", line 82, in _check_targets
    "".format(type_true, type_pred))
ValueError: Can't handle mix of multiclass and continuous
I found a way to make the error go away, which is to remove this line:
cf = learner(data)
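However, if I remove that line, I can no longer train the model and save it, because RandomForestLearner is then never trained on the input file before the code that saves and loads the model runs: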
with open("1.pkcls", "wb") as f:
    pickle.dump(cf, f)

# load the pickle file
with open("1.pkcls", "rb") as f:
    loadCF = pickle.load(f)
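Does anyone know whether it is possible to train a model first, save it as a pickle file, and then use it later on a different file to get the classification accuracy?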
Answer:
You cannot pre-train the classifier before passing it to TestOnTestData (a more accurate name for it would be TrainOnTrainAndTestOnTestData, i.e. it calls the fit/training step itself).
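To make the calling convention concrete, here is a minimal sketch using plain Python stand-ins. `MajorityLearner`, `MajorityModel`, and `evaluate_on_test_data` are hypothetical names invented for this illustration; they are not Orange classes and do not reproduce Orange's implementation. The sketch only shows the general idea that an evaluation routine of this kind calls each learner on the training data and expects a model back. The way the stand-in fails is also not the same as the traceback in the question, which surfaces later, during scoring.

```python
# Hypothetical stand-ins that mimic a learner/model protocol; not Orange's code.

class MajorityModel:
    """A fitted model: called on rows, returns one predicted label per row."""

    def __init__(self, label):
        self.label = label

    def __call__(self, rows):
        return [self.label for _ in rows]


class MajorityLearner:
    """A learner: called on training rows, returns a fitted model."""

    def __call__(self, rows):
        labels = [label for _, label in rows]
        most_common = max(set(labels), key=labels.count)
        return MajorityModel(most_common)


def evaluate_on_test_data(train, test, learners):
    """Simplified evaluation loop: fit every learner on train, predict on test."""
    predictions = []
    for learner in learners:
        model = learner(train)           # the training step happens here
        predictions.append(model(test))  # the result is then used as a model
    return predictions


train = [(0.1, "a"), (0.2, "a"), (0.9, "b")]
test = [(0.3, "a"), (0.8, "b")]

# Passing a learner works: it is fitted inside the loop.
ok = evaluate_on_test_data(train, test, [MajorityLearner()])

# Passing an already fitted model does not: calling it on the training rows
# yields a list of predictions, and a list cannot be called on the test rows.
fitted = MajorityLearner()(train)
try:
    evaluate_on_test_data(train, test, [fitted])
    failed = False
except TypeError:
    failed = True
```

In this toy version the mistake shows up immediately as a TypeError; the point is only that whatever is placed in the learners list gets called with the training data first.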
Unfortunately, there is no obvious ready-made way to create a Results instance from applying a pre-trained classifier to a test data set.
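If plain accuracy is all that is needed, one possible way around the missing Results object is to compare predicted and actual class labels directly. The sketch below uses plain Python lists and a hypothetical helper, `classification_accuracy`; how the predictions are obtained from the loaded model and how the true labels are read from the test table are assumptions that are deliberately left out here.

```python
def classification_accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the actual one."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if len(actual) == 0:
        raise ValueError("cannot score an empty test set")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Placeholder labels for illustration only. With Orange, these would come from
# the loaded model's predictions and the test table's class column (assumed,
# not shown here).
predicted = [0, 1, 1, 0]
actual = [0, 1, 0, 0]
accuracy = classification_accuracy(predicted, actual)
```

Three of the four placeholder predictions match, so `accuracy` is 0.75 here. This bypasses Orange's scoring classes entirely, which may or may not be acceptable depending on what else the results are needed for.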
A quick and dirty workaround is to turn the 'learners' passed to TestOnTestData into functions that return the pre-trained model:
results = Orange.evaluation.testing.TestOnTestData(data, test, [lambda testdata: loadCF])