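Good morning! I've just started learning Python, and I'm using Spyder 4.0 to build a neural network. In the script below, I use a random forest for feature importance analysis, so the variable importances tells me how important each feature is. Unfortunately I can't upload the dataset, but I can tell you that it has 18 features and 1 label, that the features and the label are all physical quantities, and that this is a regression problem. I want to export the variable importances to an Excel file, but when I copy the vector directly, the numbers are written with a decimal point (e.g. 0.012, 0.015, and so on). For use in the Excel file, I'd prefer a comma instead of the decimal point. I tried .replace('.',','), but it doesn't work. The error is:
AttributeError: 'numpy.ndarray' object has no attribute 'replace'
I think this happens because importances is a float64 array of shape (18,). What should I do?
Thanks.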
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from matplotlib import pyplot as plt

dataset = pd.read_csv('Dataset.csv', decimal=',', delimiter = ";")
label = dataset.iloc[:,-1]
features = dataset.drop(columns = ['Label'])

y_max_pre_normalize = max(label)
y_min_pre_normalize = min(label)

def denormalize(y):
    final_value = y*(y_max_pre_normalize-y_min_pre_normalize)+y_min_pre_normalize
    return final_value

X_train1, X_test1, y_train1, y_test1 = train_test_split(features, label, test_size = 0.20, shuffle = True)
y_test2 = y_test1.to_frame()
y_train2 = y_train1.to_frame()

scaler1 = preprocessing.MinMaxScaler()
scaler2 = preprocessing.MinMaxScaler()
X_train = scaler1.fit_transform(X_train1)
X_test = scaler2.fit_transform(X_test1)

scaler3 = preprocessing.MinMaxScaler()
scaler4 = preprocessing.MinMaxScaler()
y_train = scaler3.fit_transform(y_train2)
y_test = scaler4.fit_transform(y_test2)

sel = RandomForestRegressor(n_estimators = 200,max_depth = 9, max_features = 5, min_samples_leaf = 1, min_samples_split = 2,bootstrap = False)
sel.fit(X_train, y_train)
importances = sel.feature_importances_

# sel.fit(X_train, y_train)
# a = []
# for feature_list_index in sel.get_support(indices=True):
#     a.append(feat_labels[feature_list_index])
#     print(feat_labels[feature_list_index])
# X_important_train = sel.transform(X_train1)
# X_important_test = sel.transform(X_test1)
Answer:
I'll show you what to do using some random values. I ran this in the Python shell, which is why you see ">>>".
>>> import numpy as np  # first, I import numpy under the name "np"
# I generate 10 random values and store them in "importance"
>>> importance=np.random.rand(10)
# here I just want to look at the contents of "importance"
>>> importance
array([0.77609076, 0.97746829, 0.56946118, 0.23986983, 0.93655692,
       0.22003531, 0.7711095 , 0.36083248, 0.58277805, 0.57865248])
# here is your error, which I reproduce for teaching purposes
>>> importance.replace(".", ",")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'numpy.ndarray' object has no attribute 'replace'
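What you need to do is convert the elements of "importance" into a list of strings, because replace is a string method, not a NumPy array method: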
>>> imp_astr=[str(i) for i in importance]
>>> imp_astr
['0.7760907642658763', '0.9774682868805988', '0.569461184647781', '0.23986982589422634', '0.9365569207431337', '0.22003531170279356', '0.7711094966708247', '0.3608324767276052', '0.5827780487688116', '0.5786524781334242']
# finally, you can call "replace" on each string
>>> imp_astr=[i.replace(".", ",") for i in imp_astr]
>>> imp_astr
['0,7760907642658763', '0,9774682868805988', '0,569461184647781', '0,23986982589422634', '0,9365569207431337', '0,22003531170279356', '0,7711094966708247', '0,3608324767276052', '0,5827780487688116', '0,5786524781334242']
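If the end goal is a file that Excel opens with decimal commas, another option is to let pandas do the formatting when writing. This is only a sketch, not part of the original answer. It assumes the importances are put in a DataFrame and written as a semicolon-separated CSV. The sample values, the feature names and the in-memory buffer are placeholders for illustration. In the question's script the values would be sel.feature_importances_ and the names could be features.columns.

```python
import io

import numpy as np
import pandas as pd

# Placeholder for sel.feature_importances_ (made-up values for illustration).
importances = np.array([0.012, 0.015, 0.973])

# Hypothetical feature names; in the question they could come from features.columns.
feature_names = ["feature_1", "feature_2", "feature_3"]

table = pd.DataFrame({"feature": feature_names, "importance": importances})

# Write to an in-memory buffer so the result can be inspected here.
# To write to disk, pass a file name such as "importances.csv" instead.
buffer = io.StringIO()
table.to_csv(buffer, sep=";", decimal=",", index=False)
csv_text = buffer.getvalue()

print(csv_text)
```

The values stay numeric in the DataFrame, and only the written text uses the comma. The semicolon separator matches the way the question reads Dataset.csv with pd.read_csv, so commas inside the numbers do not clash with the column separator.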