How do I use Recursive Feature Elimination (RFE)?

I am new to machine learning and am trying to do feature selection with Recursive Feature Elimination (RFE). My dataset has 5000 records and it is a binary classification problem. Below is the code I followed from an online tutorial:

#no of features
nof_list = np.arange(1, 13)
high_score = 0
#Variable to store the optimum features
nof = 0
score_list = []
for n in range(len(nof_list)):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    model = RandomForestClassifier()
    rfe = RFE(model, nof_list[n])
    X_train_rfe = rfe.fit_transform(X_train, y_train)
    X_test_rfe = rfe.transform(X_test)
    model.fit(X_train_rfe, y_train)
    score = model.score(X_test_rfe, y_test)
    score_list.append(score)
    if (score > high_score):
        high_score = score
        nof = nof_list[n]
print("Optimum number of features: %d" % nof)
print("Score with %d features: %f" % (nof, high_score))

I get the following error. Can anyone help?

TypeError                                 Traceback (most recent call last)
<ipython-input-332-a23dfb331001> in <module>
      9     model = RandomForestClassifier()
     10     rfe = RFE(model,nof_list[n])
---> 11     X_train_rfe = rfe.fit_transform(X_train,y_train)
     12     X_test_rfe = rfe.transform(X_test)
     13     model.fit(X_train_rfe,y_train)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\base.py in fit_transform(self, X, y, **fit_params)
    554             Training set.
    555
--> 556         y : numpy array of shape [n_samples]
    557             Target values.
    558

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\feature_selection\_base.py in transform(self, X)
     75         X = check_array(X, dtype=None, accept_sparse='csr',
     76                         force_all_finite=not tags.get('allow_nan', True))
---> 77         mask = self.get_support()
     78         if not mask.any():
     79             warn("No features were selected: either the data is"

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\feature_selection\_base.py in get_support(self, indices)
     44             values are indices into the input feature vector.
     45         """
---> 46         mask = self._get_support_mask()
     47         return mask if not indices else np.where(mask)[0]
     48

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\feature_selection\_rfe.py in _get_support_mask(self)
    269
    270     def _get_support_mask(self):
--> 271         check_is_fitted(self)
    272         return self.support_
    273

TypeError: check_is_fitted() missing 1 required positional argument: 'attributes'

Answer:

What version of sklearn are you using?
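You can check it in the same session that raises the error (a minimal sketch; sklearn is the standard scikit-learn import name):

import sklearn

# Print the installed scikit-learn version to compare against the versions tested below
print(sklearn.__version__)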

The following code (with made-up data) should run fine:

from sklearn.model_selection import train_test_split
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.ensemble import RandomForestClassifier

X = np.random.rand(100, 20)
y = np.ones((X.shape[0]))

#no of features
nof_list = np.arange(1, 13)
high_score = 0
#Variable to store the optimum features
nof = 0
score_list = []
for n in range(len(nof_list)):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    model = RandomForestClassifier()
    rfe = RFE(model, nof_list[n])
    X_train_rfe = rfe.fit_transform(X_train, y_train)
    X_test_rfe = rfe.transform(X_test)
    model.fit(X_train_rfe, y_train)
    score = model.score(X_test_rfe, y_test)
    score_list.append(score)
    if (score > high_score):
        high_score = score
        nof = nof_list[n]
print("Optimum number of features: %d" % nof)
print("Score with %d features: %f" % (nof, high_score))

Optimum number of features: 1
Score with 1 features: 1.000000

Versions tested:

sklearn.__version__
'0.20.4'

sklearn.__version__
'0.21.3'
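One more version-related caveat, beyond the releases tested above: in recent scikit-learn releases (roughly 1.0 and later), RFE's n_features_to_select parameter is keyword-only, so a call like RFE(model, nof_list[n]) raises a TypeError there as well. A minimal sketch of the more version-robust construction, using made-up data and an arbitrary example value of 5 features:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

X = np.random.rand(100, 20)
y = np.random.randint(0, 2, size=X.shape[0])  # made-up binary target, matching the original problem type

model = RandomForestClassifier()
# Pass n_features_to_select by keyword; recent scikit-learn releases reject it as a positional argument
rfe = RFE(estimator=model, n_features_to_select=5)
X_rfe = rfe.fit_transform(X, y)
print(X_rfe.shape)  # (100, 5): only the 5 selected features remain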

