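I have run recursive feature elimination (RFE) on my dataset and am now trying to make predictions based on the features RFE returned, but I keep running into this error: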
ValueError: X has 31 features per sample; expecting 9
Here is the code I wrote to find the best features and transform the data according to the returned features:
no_list = np.arange(1, len(list(dat)))
acc_score = 0
n_features = 0
score_list = []
for x in range(len(no_list)):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=100)
    log_reg = LogisticRegression()
    rfe = RFE(log_reg, no_list[x])
    X_train_rfe = rfe.fit_transform(X_train, y_train)
    X_test_rfe = rfe.transform(X_test)
    log_reg.fit(X_train_rfe, y_train)
    score = log_reg.score(X_test_rfe, y_test)
    score_list.append(score)
    if score > acc_score:
        acc_score = score
        n_features = no_list[x]
rfe = RFE(log_reg, n_features)
rfe.fit_transform(X_train, y_train)
predictions = rfe.predict(X_test)
Answer:
Transform X_test before passing it to the predict function. Your rfe wraps the log_reg model, which after fitting accepts only n_features features.
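rfe = RFE(log_reg, n_features_to_select=n_features)
rfe.fit(X_train, y_train)
# Keep only the columns RFE selected, then predict with the fitted inner model.
# (rfe.predict applies this transform itself, so it expects the untransformed X_test.)
X_test_rfe = rfe.transform(X_test)
predictions = rfe.estimator_.predict(X_test_rfe)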
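
To make the shape mismatch concrete, here is a minimal, self-contained sketch. It is not the asker's data: the dataset comes from `make_classification` with 31 columns, and `n_features = 9` is a hypothetical stand-in for whatever the search loop picked. It assumes a reasonably recent scikit-learn, where `n_features_to_select` is passed by keyword.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (assumption): 300 samples, 31 features.
X, y = make_classification(
    n_samples=300, n_features=31, n_informative=9, random_state=100
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=100
)

n_features = 9  # hypothetical result of the feature-count search

rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=n_features)
rfe.fit(X_train, y_train)

# The inner estimator (rfe.estimator_) was fitted on the reduced matrix only.
X_test_rfe = rfe.transform(X_test)
print(X_test.shape, X_test_rfe.shape)  # (60, 31) (60, 9)

# Feeding all 31 columns to a model fitted on 9 reproduces the mismatch.
try:
    rfe.estimator_.predict(X_test)
    mismatch_raised = False
except ValueError as exc:
    mismatch_raised = True
    print("ValueError:", exc)

# Option 1: transform explicitly, then predict with the inner estimator.
predictions = rfe.estimator_.predict(X_test_rfe)

# Option 2: let RFE do the transform; it takes the full 31-column matrix.
predictions_via_rfe = rfe.predict(X_test)

print(np.array_equal(predictions, predictions_via_rfe))
```

Either option works as long as the matrix you pass matches what the object was fitted on: the reduced columns for the inner estimator, the full columns for the RFE wrapper. Mixing the two is what produces the "features per sample" error.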