Python: float() argument must be a string or a number, not 'pandas._libs.interval.Interval'

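I am working through the Loan Prediction machine learning practice problem from Analytics Vidhya. When I fit a random forest classifier, I get the following error: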

TypeError: float() argument must be a string or a number, not 'pandas._libs.interval.Interval'

Here is the code:

train['Loan_Status'] = np.where(train['Loan_Status'] == 'Y', 1, 0)
train_test_data = [train, test]

# Gender feature
for dataset in train_test_data:
    dataset["Gender"] = dataset["Gender"].fillna('Male')
for dataset in train_test_data:
    dataset["Gender"] = dataset["Gender"].map({"Female": 1, "Male": 0}).astype(int)

# Married feature
for dataset in train_test_data:
    dataset['Married'] = dataset['Married'].fillna('Yes')
for dataset in train_test_data:
    dataset['Married'] = dataset['Married'].map({"Yes": 1, "No": 0}).astype(int)

# Education feature
for dataset in train_test_data:
    dataset['Education'] = dataset['Education'].map({'Graduate': 1, 'Not Graduate': 0}).astype(int)

# Combine applicant income and co-applicant income
for dataset in train_test_data:
    dataset['Income'] = dataset['ApplicantIncome'] + dataset['CoapplicantIncome']
train['IncomeBand'] = pd.cut(train['Income'], 4)
print(train[['IncomeBand', 'Loan_Status']].groupby(['IncomeBand'], as_index=False).mean())
for dataset in train_test_data:
    dataset.loc[dataset['Income'] <= 21331.5, 'Income'] = 0
    dataset.loc[(dataset['Income'] > 21331.5) & (dataset['Income'] <= 41221.0), 'Income'] = 1
    dataset.loc[(dataset['Income'] > 41221.0) & (dataset['Income'] <= 61110.5), 'Income'] = 2
    dataset.loc[dataset['Income'] > 61110.5, 'Income'] = 3
    dataset['Income'] = dataset['Income'].astype(int)

# LoanAmount feature
fillin = train.LoanAmount.median()
for dataset in train_test_data:
    dataset['LoanAmount'] = dataset['LoanAmount'].fillna(fillin)
train['LoanAmountBand'] = pd.cut(train['LoanAmount'], 4)
print(train[['LoanAmountBand', 'Loan_Status']].groupby(['LoanAmountBand'], as_index=False).mean())
for dataset in train_test_data:
    dataset.loc[dataset['LoanAmount'] <= 181.75, 'LoanAmount'] = 0
    dataset.loc[(dataset['LoanAmount'] > 181.75) & (dataset['LoanAmount'] <= 354.5), 'LoanAmount'] = 1
    dataset.loc[(dataset['LoanAmount'] > 354.5) & (dataset['LoanAmount'] <= 527.25), 'LoanAmount'] = 2
    dataset.loc[dataset['LoanAmount'] > 527.25, 'LoanAmount'] = 3
    dataset['LoanAmount'] = dataset['LoanAmount'].astype(int)

# Loan_Amount_Term feature
for dataset in train_test_data:
    dataset['Loan_Amount_Term'] = dataset['Loan_Amount_Term'].fillna(360.0)
Loan_Amount_Term_mapping = {360.0: 1, 180.0: 2, 480.0: 3, 300.0: 4, 84.0: 5, 240.0: 6, 120.0: 7, 36.0: 8, 60.0: 9, 12.0: 10}
for dataset in train_test_data:
    dataset['Loan_Amount_Term'] = dataset['Loan_Amount_Term'].map(Loan_Amount_Term_mapping)

# Credit_History feature
for dataset in train_test_data:
    dataset['Credit_History'] = dataset['Credit_History'].fillna(2)

# Property_Area feature
for dataset in train_test_data:
    dataset['Property_Area'] = dataset['Property_Area'].map({'Semiurban': 0, 'Urban': 1, 'Rural': 2}).astype(int)

# Feature selection
features_drop = ['Self_Employed', 'ApplicantIncome', 'CoapplicantIncome', 'Dependents']
train = train.drop(features_drop, axis=1)
test = test.drop(features_drop, axis=1)
train.drop(['Loan_ID', 'IncomeBand', 'LoanAmountBand'], axis=1)
X_train = train.drop('Loan_Status', axis=1)
y_train = train['Loan_Status']
X_test = test.drop('Loan_ID', axis=1).copy()
X_train.shape, y_train.shape, X_test.shape

clf = RandomForestClassifier(n_estimators=100)
clf.fit(X_train, y_train)
y_pred_random_forest = clf.predict(X_test)
acc_random_forest = round(clf.score(X_train, y_train) * 100, 2)
print(acc_random_forest)

X_train.dtypes

I can't tell where this float error is coming from. Any suggestions would be much appreciated.


Answer:

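The problem lies in the columns with a categorical dtype, most likely the ones created by pd.cut. Each value in such a column is an Interval object, which the random forest classifier cannot accept as input, so you need to convert these columns to numbers.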
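To see where the Interval in the error message comes from, here is a minimal sketch. The income values are made up for illustration and are not taken from the loan dataset; the point is only that pd.cut yields a categorical column whose entries are Interval objects, and that float() cannot convert them.

```python
import pandas as pd

# Hypothetical income values, used only for illustration.
income = pd.Series([1000.0, 25000.0, 45000.0, 80000.0])

# Split the value range into four equal-width bins.
bands = pd.cut(income, 4)

print(bands.dtype)           # categorical dtype
print(type(bands.iloc[0]))   # each entry is a pandas Interval

# Converting an Interval to float fails, which is what happens
# when the estimator tries to turn the feature matrix into floats.
raised = False
try:
    float(bands.iloc[0])
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```

Running X_train.dtypes on the real data is a quick way to spot any column that still has the category dtype before calling fit.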

The simplest way to convert them is cat.codes.

In the code above, the two columns IncomeBand and LoanAmountBand need to be converted from a categorical type to a numeric type:

train['IncomeBand'] = pd.cut(train['Income'], 4).cat.codes
train['LoanAmountBand'] = pd.cut(train['LoanAmount'], 4).cat.codes
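As a rough check of this conversion, the sketch below runs it on a small made-up frame (the values and the Loan_ID strings are hypothetical). It also shows a second way out: in the question's code the result of train.drop(['Loan_ID', 'IncomeBand', 'LoanAmountBand'], axis=1) is never assigned, and since drop returns a new DataFrame by default, the band columns are still present in X_train. If the bands are not needed as features, assigning the result of drop should avoid the error as well.

```python
import pandas as pd

# Hypothetical data standing in for the real training set.
train = pd.DataFrame({
    "Loan_ID": ["A", "B", "C", "D"],
    "Income": [1000.0, 25000.0, 45000.0, 80000.0],
    "Loan_Status": [1, 0, 1, 1],
})

# Option 1: keep the band as a feature, stored as integer codes.
train["IncomeBand"] = pd.cut(train["Income"], 4).cat.codes
print(train["IncomeBand"].tolist())

# pd.cut(..., labels=False) returns the same bin indices directly.
same = pd.cut(train["Income"], 4, labels=False)

# Option 2: drop the helper columns. drop() returns a new DataFrame,
# so the result has to be assigned for the columns to go away.
X_train = train.drop(["Loan_ID", "IncomeBand", "Loan_Status"], axis=1)
print(X_train.columns.tolist())
```

Note that cat.codes assigns -1 to missing values, so any NaN left in the source column would show up as a -1 band.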
