The value of k does not affect the results of SelectKBest

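I am currently working on a project where I process data and try to find its best features. I am using sklearn's SelectKBest. When I run the code I get results, but they are the same no matter what value of K I use. Could someone look at my code and tell me what is wrong? I am building this in a Jupyter Notebook, so I change the value and then rerun the cell.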

    from sklearn.feature_selection import SelectKBest

    # data_dict, featureFormat and targetFeatureSplit are defined elsewhere in the project.
    features_list = ['poi', 'salary', 'to_messages', 'deferral_payments',
                     'total_payments', 'exercised_stock_options', 'bonus',
                     'restricted_stock', 'shared_receipt_with_poi', 'restricted_stock_deferred',
                     'total_stock_value', 'expenses', 'loan_advances', 'from_messages', 'other',
                     'from_this_person_to_poi', 'director_fees', 'deferred_income',
                     'long_term_incentive', 'from_poi_to_this_person']

    data = featureFormat(data_dict, features_list, sort_keys=True)
    labels, features = targetFeatureSplit(data)

    clf = SelectKBest()
    new_features = clf.fit_transform(features, labels)
    params = clf.get_params()

    # Pair each score with its feature name; features_list[0] ('poi') is the label.
    featureImportance = []
    for i, item in enumerate(clf.scores_):
        featureImportance.append((item, features_list[i + 1]))
    featureImportance = sorted(featureImportance, reverse=True)

    for item in featureImportance:
        print("{0} , {1:4.2f}%".format(item[1], item[0]))

Output:

    exercised_stock_options , 25.10%
    total_stock_value , 24.47%
    bonus , 21.06%
    salary , 18.58%
    deferred_income , 11.60%
    long_term_incentive , 10.07%
    restricted_stock , 9.35%
    total_payments , 8.87%
    shared_receipt_with_poi , 8.75%
    loan_advances , 7.24%
    expenses , 6.23%
    from_poi_to_this_person , 5.34%
    other , 4.20%
    from_this_person_to_poi , 2.43%
    director_fees , 2.11%
    to_messages , 1.70%
    deferral_payments , 0.22%
    from_messages , 0.16%
    restricted_stock_deferred , 0.06%

The percentages do not change.

Answer:

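scores_ is computed over the full set of features using the score_func you supply; see the SelectKBest documentation:

score_func : callable. Function taking two arrays X and y, and returning a pair of arrays (scores, pvalues) or a single array with scores. Default is f_classif (see "See also" below). The default function only works with classification tasks.

By default, SelectKBest uses f_classif, which computes an ANOVA F-value for each feature against the labels.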

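Because the features passed to it do not change, the scores do not change either. What changes is how many features are selected on the basis of those scores: for the k you choose, the k highest-scoring features are kept.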
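To make this concrete, here is a minimal sketch on synthetic data. The dataset from make_classification and the variable names are assumptions for illustration, not the asker's data. It fits two selectors with different k and compares their scores_ and the shape of the transformed output.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for the real data (assumption: 200 samples, 8 features).
X, y = make_classification(n_samples=200, n_features=8, n_informative=3,
                           random_state=0)

selector_k3 = SelectKBest(score_func=f_classif, k=3).fit(X, y)
selector_k5 = SelectKBest(score_func=f_classif, k=5).fit(X, y)

# scores_ depends only on X, y and score_func, so it is the same for any k.
same_scores = np.allclose(selector_k3.scores_, selector_k5.scores_)

# It also matches calling the score function directly.
direct_scores, _ = f_classif(X, y)
matches_direct = np.allclose(selector_k3.scores_, direct_scores)

# What k changes is how many columns transform() keeps.
shape_k3 = selector_k3.transform(X).shape
shape_k5 = selector_k5.transform(X).shape

print(same_scores, matches_direct, shape_k3, shape_k5)
```

In the question's code, this means the effect of k shows up in new_features (its number of columns), not in clf.scores_.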

You can use the get_support() method to see which features were selected.
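As a rough illustration of get_support(), the sketch below maps the boolean mask it returns back to feature names. The feature names, the synthetic data and the helper selected_names are hypothetical and exist only for this example.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest

# Hypothetical feature names and synthetic data, for illustration only.
feature_names = ["f0", "f1", "f2", "f3", "f4", "f5"]
X, y = make_classification(n_samples=150, n_features=6, n_informative=3,
                           n_redundant=1, random_state=1)


def selected_names(k):
    """Return the names of the k features kept by SelectKBest (default f_classif)."""
    selector = SelectKBest(k=k).fit(X, y)
    mask = selector.get_support()  # boolean array, True for each kept feature
    return [name for name, keep in zip(feature_names, mask) if keep]


top2 = selected_names(2)
top4 = selected_names(4)

print(top2)
print(top4)
```

With the question's data, the names to zip against the mask would be features_list[1:], since the first entry ('poi') is the label rather than a feature. get_support(indices=True) returns integer column indices instead of a boolean mask.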
