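I'm a beginner in machine learning working on a university project. I have successfully trained a model, but I'm not sure how to test it on user input. The project checks whether a person has diabetes based on the data entered for them.
Data (CSV):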
Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin  BMI   DiabetesPedigreeFunction  Age  Outcome
6            148      72             35             0        33.6  0.627                     50   1
1            85       66             29             0        26.6  0.351                     31   0
8            183      64             0              0        23.3  0.672                     32   1
1            89       66             23             94       28.1  0.167                     21   0
0            137      40             35             168      43.1  2.288                     33   1
5            116      74             0              0        25.6  0.201                     30   0
3            78       50             32             88       31    0.248                     26   1
10           115      0              0              0        35.3  0.134                     29   0
2            197      70             45             543      30.5  0.158                     53   1
Code:
    from sklearn.ensemble import RandomForestClassifier

    random_forest_model = RandomForestClassifier(random_state=10)
    random_forest_model.fit(X_train, y_train.ravel())

    predict_train_data = random_forest_model.predict(X_test)

    from sklearn import metrics

    print("Accuracy = {0:.3f}".format(metrics.accuracy_score(y_test, predict_train_data)))
User input code:
    print("Enter your own data to test the model:")
    pregnancy = int(input("Enter Pregnancy:"))
    glucose = int(input("Enter Glucose:"))
    bloodpressure = int(input("Enter Blood Pressure:"))
    skinthickness = int(input("Enter Skin Thickness:"))
    insulin = int(input("Enter Insulin:"))
    bmi = float(input("Enter BMI:"))
    DiabetesPedigreeFunction = float(input("Enter DiabetesPedigreeFunction:"))
    age = int(input("Enter Age:"))

    userInput = [pregnancy, glucose, bloodpressure, skinthickness, insulin, bmi, DiabetesPedigreeFunction, age]
I want it to return 1 if the person is diabetic, or 0 if not.
Edit: added X_train and y_train:
    from sklearn.model_selection import train_test_split

    feature_columns = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI', 'DiabetesPedigreeFunction', 'Age']
    predicted_class = ['Outcome']

    X = data[feature_columns].values
    y = data[predicted_class].values

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=10)

    from sklearn.ensemble import RandomForestClassifier

    random_forest_model = RandomForestClassifier(random_state=10)
    random_forest_model.fit(X_train, y_train.ravel())
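Answer:
Try:
    result = random_forest_model.predict([userInput])[0]
This works because the model expects a 2D array as input (one row per sample) and returns one prediction per row. Wrapping userInput in a list turns it into a single-row batch, and [0] picks that row's prediction (0 or 1) out of the returned array.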
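To make the shape requirement concrete, here is a minimal self-contained sketch. It assumes scikit-learn and NumPy are installed, fits a forest on just the nine sample rows from the question (far too few for a meaningful model, so the predicted label itself should not be trusted), and uses a hard-coded `user_input` list as a stand-in for the values read with `input()`. The names `model`, `batch`, and `probabilities` are illustrative, not from the original post.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# The nine sample rows from the question: eight feature columns, then Outcome.
rows = np.array([
    [6, 148, 72, 35, 0, 33.6, 0.627, 50, 1],
    [1, 85, 66, 29, 0, 26.6, 0.351, 31, 0],
    [8, 183, 64, 0, 0, 23.3, 0.672, 32, 1],
    [1, 89, 66, 23, 94, 28.1, 0.167, 21, 0],
    [0, 137, 40, 35, 168, 43.1, 2.288, 33, 1],
    [5, 116, 74, 0, 0, 25.6, 0.201, 30, 0],
    [3, 78, 50, 32, 88, 31.0, 0.248, 26, 1],
    [10, 115, 0, 0, 0, 35.3, 0.134, 29, 0],
    [2, 197, 70, 45, 543, 30.5, 0.158, 53, 1],
])
X = rows[:, :8]              # features, shape (9, 8)
y = rows[:, 8].astype(int)   # labels, shape (9,)

model = RandomForestClassifier(random_state=10)
model.fit(X, y)

# Stand-in for the values collected with input(); same column order as X.
user_input = [6, 148, 72, 35, 0, 33.6, 0.627, 50]

# A flat list is a 1D array, which predict() rejects with a ValueError.
try:
    model.predict(user_input)
    rejected_1d = False
except ValueError:
    rejected_1d = True

# Wrapping the list gives one row with eight columns: shape (1, 8).
batch = np.array([user_input])
predictions = model.predict(batch)           # one prediction per row
result = int(predictions[0])                 # 1 = diabetic, 0 = not diabetic
probabilities = model.predict_proba(batch)[0]  # class probabilities for that row

print("Prediction:", result)
print("Class probabilities:", probabilities)
```

If a probability is more useful than a hard label, `predict_proba` returns one row of class probabilities per input row, in the order given by `model.classes_`.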