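I'm building a simple Python machine-learning script that predicts whether a loan will be approved, based on the following parameters:
business experience: should be greater than 7
year of founded: should be after 2015
loan: no previous or current loan
The loan is approved only if all of the above conditions are met. The dataset can be downloaded from the following link: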
https://drive.google.com/file/d/1QtJ3EED7KDqJDrSHxHB6g9kc5YAfTlmF/view?usp=sharing
I have the following script for this data:
from sklearn.linear_model import LogisticRegression
import pandas as pd
import numpy as np

# Load the custom dataset and peek at the first rows
data = pd.read_csv("test2.csv")
data.head()

# Features and target
X = data[["Business Exp", "Year of Founded", "Previous/Current Loan"]]
Y = data["OUTPUT"]

clf = LogisticRegression()
clf.fit(X, Y)

# Test case: 9 years of experience, founded in 2017, no loan
test_x2 = np.array([[9, 2017, 0]])
Y_pred = clf.predict(test_x2)
print(Y_pred)
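I pass the test data in test_x2. The test case has 9 years of business experience, a founding year of 2017, and no current or previous loan, so the loan should be granted. The prediction should therefore be 1, but the script prints 0. Is there a problem with the code or with the dataset? I'm still a beginner in machine learning, so I created this custom dataset to help myself understand.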
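As a sanity check on the expected label, the approval rule stated above can be written as a plain function. This is only a sketch of the rule as described in the question; the function name `approve_loan` and its argument names are made up for illustration and do not come from the dataset.

```python
def approve_loan(business_exp, year_founded, has_loan):
    """Return 1 if all three stated conditions hold, otherwise 0.

    Hypothetical helper: encodes the rule from the question,
    not anything learned from test2.csv.
    """
    meets_experience = business_exp > 7   # strictly greater than 7
    meets_year = year_founded > 2015      # founded after 2015
    meets_loan = has_loan == 0            # no previous or current loan
    return int(meets_experience and meets_year and meets_loan)


# The test case from the question: 9 years, founded 2017, no loan
expected = approve_loan(9, 2017, 0)
print(expected)
```

Under this rule the test case is labeled 1, which is the value the model is expected to predict.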
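Answer:
You should use StandardScaler() in a pipeline. The features are on very different scales (Year of Founded is around 2000, while the other two columns are small numbers), and this most likely keeps the regularized LogisticRegression from fitting well; standardizing the features first puts them on a comparable scale: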
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
import pandas as pd
import numpy as np

data = pd.read_csv("test2.csv")
data.head()

X = data[["Business Exp", "Year of Founded", "Previous/Current Loan"]]
Y = data["OUTPUT"]

# Scale the features, then fit the classifier, as a single estimator
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X, Y)

test_x2 = np.array([[9, 2017, 0]])
Y_pred = clf.predict(test_x2)

print("prediction = ", Y_pred.item())
# prediction = 1

print("score = ", clf.score(X, Y))
# score = 0.95535
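To make the scaling step less of a black box: StandardScaler subtracts each column's mean and divides by its population standard deviation. The sketch below reproduces that arithmetic with the standard library only. The sample values are hypothetical and are not taken from test2.csv, and `standardize` is an illustrative helper, not part of scikit-learn.

```python
from statistics import fmean, pstdev


def standardize(column):
    """Return z-scores: (x - mean) / population standard deviation.

    Assumes the column is not constant (a zero standard deviation
    would cause a division by zero here).
    """
    mu = fmean(column)
    sigma = pstdev(column)
    return [(x - mu) / sigma for x in column]


# Hypothetical sample values, for illustration only
years = [2010, 2012, 2014, 2016, 2018]
experience = [3, 5, 8, 9, 12]

z_years = standardize(years)
z_experience = standardize(experience)

# After scaling, both columns are centered on 0 with unit spread,
# so neither dominates the other purely because of its units.
print(z_years)
print(z_experience)
```

Because the scaler sits inside the pipeline, the same means and standard deviations learned during fit are applied to test_x2 when predict is called, so the test row does not need to be scaled by hand.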