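I am trying to predict a crop name from four inputs: temperature, soil humidity, pH, and average rainfall. The reported accuracy is high every time, usually between 88% and 94%, but the final prediction is always wrong. Here is my code: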
    # Import the required libraries
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.model_selection import train_test_split

    # Read the CSV file
    data=pd.read_csv('cpdata.csv')

    # Create dummy variables for the target variable
    label= pd.get_dummies(data.label).iloc[: , 1:]
    data= pd.concat([data,label],axis=1)
    data.drop('label', axis=1,inplace=True)
    print('The data present in one row of the dataset is')
    print(data.head(1))
    train=data.iloc[:, 0:4].values
    test=data.iloc[: ,4:].values

    # Split the data into training and test sets
    X_train,X_test,y_train,y_test=train_test_split(train,test,test_size=0.3)

    from sklearn.preprocessing import StandardScaler
    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)
    X_test = sc.transform(X_test)

    # Import the decision tree classifier
    from sklearn.tree import DecisionTreeRegressor
    clf=DecisionTreeRegressor()

    # Fit the classifier to the training set
    clf.fit(X_train,y_train)
    pred=clf.predict(X_test)

    from sklearn.metrics import accuracy_score
    # Compute the accuracy of the model
    a=accuracy_score(y_test,pred)
    print("The accuracy of this model is: ", a*100)

    ah=89.41
    atemp=26.98
    shum=28
    pH=6.26
    rain=58.54
    l=[]
    l.append(atemp)
    l.append(ah)
    l.append(pH)
    l.append(rain)
    predictcrop=[l]

    # Put the crop names in a list
    crops=['rice','wheat','mungbean','Tea','millet','maize','lentil','jute','cofee','cotton','ground nut','peas','rubber','sugarcane','tobacco','kidney beans','moth beans','coconut','blackgram','adzuki beans','pigeon peas','chick peas','banana','grapes','apple','mango','muskmelon','orange','papaya','pomegranate','watermelon']
    cr='rice'

    # Predict the crop
    predictions = clf.predict(predictcrop)
    count=0
    for i in range(0,31):
        if(predictions[0][i]==1):
            c=crops[i]
            count=count+1
            break;
        i=i+1
    if(count==0):
        print('The predicted crop is %s'%cr)
    else:
        print('The predicted crop is %s'%c)
The output I get is:

    The accuracy of this model is: 90.43010752688173
    The predicted crop is apple
Even when I enter the exact values for other crops, I get apple or mango every time.
Please help me solve this problem.
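Answer:
You also need to apply the scaler to the new data at prediction time. The model was trained on scaled features, so the raw values in predictcrop are on a different scale from anything it saw during training. I can't test this without your data, but it should look something like this: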
    datascaled = sc.transform(predictcrop)
    predictions = clf.predict(datascaled)
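To make the effect concrete, here is a minimal, self-contained sketch on synthetic data. Everything in it is an assumption for illustration: the feature values and the labels crop_a and crop_b are made up and do not come from cpdata.csv, and it uses DecisionTreeClassifier with string labels instead of the DecisionTreeRegressor with dummy columns from the question, purely to keep the example short.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real dataset (hypothetical values):
# columns are temperature, humidity, pH, rainfall; the two classes
# differ only in rainfall (about 50 vs. about 200).
rng = np.random.default_rng(0)
spread = [1.0, 2.0, 0.1, 5.0]
low_rain = rng.normal(loc=[25.0, 80.0, 6.5, 50.0], scale=spread, size=(50, 4))
high_rain = rng.normal(loc=[25.0, 80.0, 6.5, 200.0], scale=spread, size=(50, 4))
X = np.vstack([low_rain, high_rain])
y = np.array(["crop_a"] * 50 + ["crop_b"] * 50)

# Fit the scaler on the training features and train on the scaled values.
sc = StandardScaler()
X_scaled = sc.fit_transform(X)
clf = DecisionTreeClassifier(random_state=0).fit(X_scaled, y)

# A new sample that clearly belongs to the low-rainfall class.
new_sample = [[25.0, 80.0, 6.5, 50.0]]

# Raw values are far outside the scaled range the tree was trained on.
raw_pred = clf.predict(new_sample)[0]
# Scaling with the same fitted scaler puts the sample back on the training scale.
scaled_pred = clf.predict(sc.transform(new_sample))[0]

print("without scaling:", raw_pred)
print("with scaling:   ", scaled_pred)
```

In this sketch the unscaled sample lands on the wrong side of the tree's rainfall split, which is the same kind of failure as always getting apple or mango. Note that a decision tree does not need scaled inputs in the first place; what matters is that training and prediction use the same preprocessing.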
To apply the same scaler to new data later on (for example in a separate script), you need to save the fitted scaler:
    # sklearn.externals.joblib was removed in scikit-learn 0.23; import joblib directly
    from joblib import dump, load
    dump(sc, 'scaler.bin', compress=True)
Then, when you need it later:

    sc = load('scaler.bin')
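As a quick sanity check of the save-and-load step, the sketch below round-trips a fitted scaler through joblib and confirms that the restored object transforms a sample identically. The training rows are hypothetical placeholders, and a temporary directory is used only so the example leaves no files behind; in real use you would write scaler.bin next to your model.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.preprocessing import StandardScaler

# Hypothetical training features (temperature, humidity, pH, rainfall).
X_train = np.array([
    [26.0, 85.0, 6.2, 60.0],
    [22.0, 70.0, 6.8, 110.0],
    [30.0, 90.0, 5.9, 200.0],
])
sc = StandardScaler().fit(X_train)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "scaler.bin")
    dump(sc, path, compress=True)  # save the fitted scaler
    restored = load(path)          # later, e.g. in the prediction script

# The restored scaler should apply exactly the same mean and variance.
new_sample = [[26.98, 89.41, 6.26, 58.54]]
same = np.allclose(sc.transform(new_sample), restored.transform(new_sample))
print("restored scaler matches:", same)
```

The trained model can be saved and loaded the same way. Files written by joblib are generally best loaded with the same scikit-learn version that created them.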