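Low accuracy with Lasso and Ridge regression

I applied Lasso regression and Ridge regression to my forest fires sample dataset, but my accuracy is far below the target I expected.

I have already tried changing the alpha values and the train/test split.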

```python
# Import the libraries
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge

# Load the file
forest = pd.read_csv('forestfires.csv')

# Encode the month and day names as integers
# (assign the result back instead of chained inplace=True, which newer pandas may not apply)
months = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec']
days = ['mon', 'tue', 'wed', 'thu', 'fri', 'sat', 'sun']
forest['month'] = forest['month'].map({m: i for i, m in enumerate(months, start=1)})
forest['day'] = forest['day'].map({d: i for i, d in enumerate(days, start=1)})

# iloc selects by position, loc by index label.
# Columns 0-11 are the features, column 12 is the target.
X = forest.iloc[:, 0:12].values
y = forest.iloc[:, 12].values

# Split into 70% train / 30% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=3)

# Fit a linear regression on the training data
lr = LinearRegression()
lr.fit(X_train, y_train)

# Fit ridge regression models with a low and a high alpha
rr = Ridge(alpha=0.01)
rr.fit(X_train, y_train)
rr100 = Ridge(alpha=100)
rr100.fit(X_train, y_train)

# Score the models (score() returns R^2 for regressors)
train_score = lr.score(X_train, y_train)
test_score = lr.score(X_test, y_test)
Ridge_train_score = rr.score(X_train, y_train)
Ridge_test_score = rr.score(X_test, y_test)
Ridge_train_score100 = rr100.score(X_train, y_train)
Ridge_test_score100 = rr100.score(X_test, y_test)

print("linear regression train score:", train_score)
print("linear regression test score:", test_score)
print('ridge regression train score (alpha=0.01): %.2f' % Ridge_train_score)
print('ridge regression test score (alpha=0.01): %.2f' % Ridge_test_score)
print('ridge regression train score (alpha=100): %.2f' % Ridge_train_score100)
print('ridge regression test score (alpha=100): %.2f' % Ridge_test_score100)
```
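A point worth keeping in mind when reading these numbers: for scikit-learn regressors, `score()` returns the coefficient of determination R², not a classification-style accuracy, so it can be far below 1 (or even negative) without the code being wrong. The minimal sketch below assumes synthetic data from `make_regression` as a stand-in for `forestfires.csv` and only checks that `score()` matches `r2_score`:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 12 forest-fire features (assumption: the real CSV is not loaded here)
X, y = make_regression(n_samples=500, n_features=12, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=3)

model = Ridge(alpha=1.0).fit(X_train, y_train)

# For regressors, score() is R^2 = 1 - SS_res / SS_tot, not accuracy
score = model.score(X_test, y_test)
r2 = r2_score(y_test, model.predict(X_test))
print(f"score(): {score:.4f}  r2_score(): {r2:.4f}")
```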

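Answer:

Regarding your question: I don't see any Lasso regression in your code. Try `LassoCV` or `ElasticNetCV(l1_ratio=[.1, .5, .7, .9, .95, .99, 1])`; they are always a good start for finding reasonable alpha values. For Ridge, `RidgeCV` is the cross-validated counterpart. Unlike `LassoCV` and `ElasticNetCV`, which build their own grid of alphas, `RidgeCV` uses (efficient) leave-one-out CV by default and only tries a fixed set of alphas (by default 0.1, 1.0 and 10.0), so it needs more manual work to get the best result. Here is an example: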

```python
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, LassoCV, ElasticNetCV
from sklearn.linear_model import Ridge, RidgeCV

forest = pd.read_csv('forestfires.csv')

# Encode the month and day names as integers
months = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec']
days = ['mon', 'tue', 'wed', 'thu', 'fri', 'sat', 'sun']
forest['month'] = forest['month'].map({m: i for i, m in enumerate(months, start=1)})
forest['day'] = forest['day'].map({d: i for i, d in enumerate(days, start=1)})

# Columns 0-11 are the features, column 12 is the target
X = forest.iloc[:, 0:12].values
y = forest.iloc[:, 12].values

# Split into 70% train / 30% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=3)

lr = LinearRegression()

# The cross-validation estimators:
# LassoCV builds its own grid of alphas and picks the best one by CV
lasso_cv = LassoCV()
# ElasticNetCV searches alpha for each l1_ratio in the list
# (l1_ratio mixes the L1/Lasso and L2/Ridge penalties)
enet_cv = ElasticNetCV(l1_ratio=[.1, .5, .7, .9, .95, .99, 1])
# RidgeCV only tries the alphas it is given (default: 0.1, 1.0, 10.0)
ridge_cv = RidgeCV()

lr.fit(X_train, y_train)
lasso_cv.fit(X_train, y_train)
enet_cv.fit(X_train, y_train)
ridge_cv.fit(X_train, y_train)

# Fixed-alpha ridge models for comparison
rr = Ridge(alpha=0.01)
rr.fit(X_train, y_train)
rr100 = Ridge(alpha=100)
rr100.fit(X_train, y_train)
```

Now check the alpha values that were found:

```python
print('LassoCV alpha:', lasso_cv.alpha_)
print('RidgeCV alpha:', ridge_cv.alpha_)
print('ElasticNetCV alpha:', enet_cv.alpha_, 'ElasticNetCV l1_ratio:', enet_cv.l1_ratio_)

ridge_alpha = ridge_cv.alpha_
enet_alpha, enet_l1ratio = enet_cv.alpha_, enet_cv.l1_ratio_
```

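Then tune new `RidgeCV` and/or `ElasticNetCV` models around these values. When `ElasticNetCV` builds its own alpha grid, each `l1_ratio` must lie in (0, 1], so out-of-range candidates are filtered out:

```python
enet_new_l1ratios = [enet_l1ratio * mult for mult in [.9, .95, 1, 1.05, 1.1]]
# Keep only valid l1_ratio values in (0, 1]
enet_new_l1ratios = [r for r in enet_new_l1ratios if 0 < r <= 1]
ridge_new_alphas = [ridge_alpha * mult for mult in [.9, .95, 1, 1.05, 1.1]]

# Fit ElasticNet and Ridge again on the refined grids
enet_cv = ElasticNetCV(l1_ratio=enet_new_l1ratios)
ridge_cv = RidgeCV(alphas=ridge_new_alphas)
enet_cv.fit(X_train, y_train)
ridge_cv.fit(X_train, y_train)
```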
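Refining by ±10% only helps if the first `RidgeCV` search bracketed the optimum. Because `RidgeCV` only tries the alphas it is given, one rough check is whether `alpha_` landed on the smallest or largest candidate; if it did, a wider grid is the next step. The sketch below assumes synthetic data, and the `np.logspace(-3, 3, 25)` grid and the helper `at_grid_edge` are hypothetical choices for illustration:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

# Synthetic stand-in data (assumption: not the forest-fire CSV)
X, y = make_regression(n_samples=300, n_features=12, noise=20.0, random_state=0)

# Default RidgeCV searches only alphas=(0.1, 1.0, 10.0)
default_cv = RidgeCV().fit(X, y)

# A wider, log-spaced grid (hypothetical choice: 1e-3 .. 1e3)
wide_alphas = np.logspace(-3, 3, 25)
wide_cv = RidgeCV(alphas=wide_alphas).fit(X, y)

def at_grid_edge(alpha, grid):
    """Hypothetical helper: True if alpha is the smallest or largest candidate."""
    return bool(np.isclose(alpha, np.min(grid)) or np.isclose(alpha, np.max(grid)))

print("default alpha_:", default_cv.alpha_, "at edge:", at_grid_edge(default_cv.alpha_, default_cv.alphas))
print("wide alpha_:", wide_cv.alpha_, "at edge:", at_grid_edge(wide_cv.alpha_, wide_alphas))
```

If `at edge` is `True`, the chosen alpha is only a bound, and refining around it will not find the optimum.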

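This should be a good first step toward finding alpha and/or l1_ratio values for your model. Of course, other steps such as feature engineering and choosing the right model (e.g. Lasso, which performs feature selection) should come before searching for good parameters. :)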
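One feature-engineering step that matters specifically for Lasso and Ridge is feature scaling: both penalties act on coefficient size, so features on very different scales are penalized unevenly. A minimal sketch, assuming synthetic data whose columns are deliberately rescaled to mimic raw features with different ranges, puts `StandardScaler` in a pipeline in front of the CV estimators so the scaler is fitted on the training data only. The alpha grid for `RidgeCV` is a hypothetical choice:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption), with columns given very different scales
X, y = make_regression(n_samples=500, n_features=12, noise=15.0, random_state=0)
rng = np.random.default_rng(0)
X = X * rng.uniform(0.01, 1000.0, size=X.shape[1])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=3)

# Standardize inside the pipeline so test data never influences the scaler
lasso = make_pipeline(StandardScaler(), LassoCV(cv=5)).fit(X_train, y_train)
ridge = make_pipeline(StandardScaler(), RidgeCV(alphas=np.logspace(-3, 3, 25))).fit(X_train, y_train)

# The last pipeline step is the fitted CV estimator
print("Lasso alpha_:", lasso[-1].alpha_, "test R^2:", round(lasso.score(X_test, y_test), 3))
print("Ridge alpha_:", ridge[-1].alpha_, "test R^2:", round(ridge.score(X_test, y_test), 3))

# Lasso can set coefficients exactly to zero, i.e. perform feature selection
print("non-zero Lasso coefficients:", np.count_nonzero(lasso[-1].coef_))
```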
