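I want to use a random forest to predict electricity consumption. After some adjustments to the data, its current state is as follows:

    X = df[['Temp(⁰C)', 'Araç Sayısı (adet)', 'Montaj V362_WH', 'Montaj V363_WH', 'Montaj_Temp', 'avg_humidity']]
    X.head(15)
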
Output:

        Temp(⁰C)  Araç Sayısı (adet)  Montaj V362_WH  Montaj V363_WH  Montaj_Temp  avg_humidity
    0   3.250000                 0.0             0.0             0.0    17.500000     88.250000
    1   3.500000               868.0            16.0            18.0    20.466667     82.316667
    2   3.958333               774.0            18.0            18.0    21.166667     87.533333
    3   6.541667                 0.0             0.0             0.0    18.900000     83.916667
    4   4.666667               785.0            16.0            18.0    20.416667     72.650000
    5   2.458333               813.0            18.0            18.0    21.166667     73.983333
    6  -0.458333               804.0            16.0            18.0    20.500000     72.150000
    7  -1.041667               850.0            16.0            16.0    19.850000     76.433333
    8  -0.375000               763.0            16.0            18.0    20.500000     76.583333
    9   4.375000              1149.0            16.0            16.0    21.416667     84.300000
    10  8.541667                 0.0             0.0             0.0    21.916667     71.650000
    11  6.625000               763.0            16.0            18.0    22.833333     73.733333
    12  5.333333               783.0            16.0            16.0    22.166667     69.250000
    13  4.708333               764.0            16.0            18.0    21.583333     66.800000
    14  4.208333               813.0            16.0            16.0    20.750000     68.150000

    y.head(15)

Output:

        Montaj_ET_kWh/day
    0             11951.0
    1             41821.0
    2             42534.0
    3             14537.0
    4             41305.0
    5             42295.0
    6             44923.0
    7             44279.0
    8             45752.0
    9             44432.0
    10            25786.0
    11            42203.0
    12            40676.0
    13            39980.0
    14            39404.0

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=None)
    clf = RandomForestRegressor(n_estimators=10000, random_state=0, n_jobs=-1)
    clf.fit(X_train, y_train['Montaj_ET_kWh/day'])
    for feature in zip(feature_list, clf.feature_importances_):
        print(feature)

Output:

    ('Temp(⁰C)', 0.11598075020423881)
    ('Araç Sayısı (adet)', 0.7047301384616493)
    ('Montaj V362_WH', 0.04065706901940535)
    ('Montaj V363_WH', 0.023077554218712878)
    ('Montaj_Temp', 0.08082006262985514)
    ('avg_humidity', 0.03473442546613837)

    sfm = SelectFromModel(clf, threshold=0.10)
    sfm.fit(X_train, y_train['Montaj_ET_kWh/day'])
    for feature_list_index in sfm.get_support(indices=True):
        print(feature_list[feature_list_index])

Output:

    Temp(⁰C)
    Araç Sayısı (adet)

    X_important_train = sfm.transform(X_train)
    X_important_test = sfm.transform(X_test)
    clf_important = RandomForestRegressor(n_estimators=10000, random_state=0, n_jobs=-1)
    clf_important.fit(X_important_train, y_train)
    y_test = y_test.values
    y_pred = clf.predict(X_test)
    y_test = y_test.reshape(-1, 1)
    y_pred = y_pred.reshape(-1, 1)
    y_test = y_test.ravel()
    y_pred = y_pred.ravel()
    label_encoder = LabelEncoder()
    y_pred = label_encoder.fit_transform(y_pred)
    y_test = label_encoder.fit_transform(y_test)
    accuracy_score(y_test, y_pred)

Output:

    0.010964912280701754

I don't understand why the accuracy is so low. Does anyone know where I went wrong?
Answer:
Your mistake is that you are asking for accuracy (a classification metric) in a regression setting, which is meaningless.
From the accuracy_score documentation (emphasis on "classification" added):

    sklearn.metrics.accuracy_score(y_true, y_pred, normalize=True, sample_weight=None)
    Accuracy classification score.
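To make the point concrete, here is a minimal sketch in plain Python. The target and prediction values are made up for illustration, and `rank_encode` is a hypothetical helper that only mimics what `LabelEncoder.fit_transform` does to numeric input (each distinct value is replaced by its position in sorted order). Accuracy counts exact matches, so continuous predictions that are all within 1% of the true values still score zero, and encoding both arrays separately merely compares rank positions rather than closeness.

```python
def rank_encode(values):
    """Replace each distinct value by its index in sorted order.

    Hypothetical stand-in for LabelEncoder.fit_transform on numeric input.
    """
    lookup = {v: i for i, v in enumerate(sorted(set(values)))}
    return [lookup[v] for v in values]


# Made-up numbers: every prediction is within 1% of the true value.
y_true = [41821.0, 42534.0, 14537.0, 41305.0, 42295.0]
y_pred = [41500.0, 42900.0, 14600.0, 41650.0, 42100.0]

# Accuracy on the raw values: fraction of exact matches.
exact_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Accuracy after encoding each array separately: this compares rank
# positions within each array, not how close the predictions are.
encoded_true = rank_encode(y_true)
encoded_pred = rank_encode(y_pred)
rank_accuracy = sum(
    t == p for t, p in zip(encoded_true, encoded_pred)
) / len(y_true)

# A regression-style measure: mean absolute error in kWh/day.
mean_abs_error = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

print("exact-match accuracy:", exact_accuracy)
print("accuracy on rank-encoded values:", rank_accuracy)
print("mean absolute error:", mean_abs_error)
```

Neither accuracy figure says anything about how far off the predictions are, whereas the mean absolute error does.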
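See the list of metrics for the regression-appropriate metrics available in scikit-learn (where you can also confirm that accuracy is used only for classification); for more details, see my answer in Accuracy Score ValueError: Can't Handle mix of binary and continuous target.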
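As a rough sketch of what the evaluation step could look like with regression metrics, assuming scikit-learn and NumPy are installed: the data below is synthetic and purely illustrative (it is not the asker's dataset), and the variable names are my own. `mean_absolute_error`, `mean_squared_error`, and `r2_score` all live in `sklearn.metrics`, and the regressor's own `score` method returns R².

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption, not the original dataset):
# the first feature drives the target, the second is irrelevant.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1000, size=(200, 2))
y = 15000 + 30 * X[:, 0] + rng.normal(0, 500, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Regression metrics: no label encoding and no accuracy_score needed.
mae = mean_absolute_error(y_test, y_pred)           # same units as the target
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # penalizes large errors more
r2 = r2_score(y_test, y_pred)                       # 1.0 would be a perfect fit

print("MAE:", mae)
print("RMSE:", rmse)
print("R^2:", r2)
print("model.score (also R^2):", model.score(X_test, y_test))
```

MAE and RMSE are expressed in the units of the target (kWh/day in the question), which usually makes them easier to interpret than a unitless score.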