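I'm trying to figure out how to unscale my predictions (presumably with inverse_transform) after using RobustScaler and Lasso. The data below is only an example. My actual data is much larger and more complex, but I want to use RobustScaler (because my data has outliers) and Lasso (because my data has dozens of useless features).
Basically, whenever I use this model to predict anything, I want the prediction back in unscaled units. When I try this on an example data point, I get an error that seems to ask me to unscale data with the same number of columns the scaler was fit on (i.e. two). The error is: ValueError: non-broadcastable output operand with shape (1,1) doesn't match the broadcast shape (1,2)
How can I unscale just a single prediction? Is that even possible?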
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import RobustScaler

data = [[100, 1, 50], [500, 3, 25], [1000, 10, 100]]
df = pd.DataFrame(data, columns=['Cost', 'People', 'Supplies'])
X = df[['People', 'Supplies']]
y = df[['Cost']]

# Split
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Scale data
transformer = RobustScaler().fit(X_train)
transformer.transform(X_train)
X_rtrain = RobustScaler().fit_transform(X_train)
y_rtrain = RobustScaler().fit_transform(y_train)
X_rtest = RobustScaler().fit_transform(X_test)
y_rtest = RobustScaler().fit_transform(y_test)

# Fit Train Model
lasso = Lasso()
lasso_alg = lasso.fit(X_rtrain, y_rtrain)
train_score = lasso_alg.score(X_rtrain, y_rtrain)
test_score = lasso_alg.score(X_rtest, y_rtest)
print("training score:", train_score)
print("test score:", test_score)

# Predict example (the last line raises the ValueError quoted above)
example = [[10, 100]]
transformer.inverse_transform(lasso_alg.predict(example).reshape(-1, 1))
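Answer:
You can't use the same transformer object for both X and y. In your snippet, the transformer was fit on the two-column X, so you get an error when you try to inverse-transform the predictions, which have only one column. (You're actually lucky to get an error; if your X had a single column, you would silently get meaningless results.)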
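To make the shape mismatch concrete, here is a minimal sketch. It assumes plain NumPy arrays built from the question's three example rows, and the names `scaler_x`, `scaler_y`, and `pred_scaled` are made up for illustration. Each fitted RobustScaler stores one center and one scale per column it saw during `fit`, so the scaler fit on X cannot undo the scaling of a one-column prediction, while a scaler fit on y can.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# The question's toy data: two feature columns and one target column.
X = np.array([[1.0, 50.0], [3.0, 25.0], [10.0, 100.0]])
y = np.array([[100.0], [500.0], [1000.0]])

scaler_x = RobustScaler().fit(X)  # learns 2 centers and 2 scales
scaler_y = RobustScaler().fit(y)  # learns 1 center and 1 scale

print(scaler_x.center_.shape, scaler_y.center_.shape)

# A stand-in for one scaled prediction: shape (1, 1), i.e. a single column.
pred_scaled = np.array([[0.5]])

# The X scaler expects two columns, so this is expected to fail.
try:
    scaler_x.inverse_transform(pred_scaled)
    x_scaler_failed = False
except ValueError as err:
    x_scaler_failed = True
    print("X scaler rejected the prediction:", err)

# The y scaler was fit on one column, so the shapes line up.
pred_unscaled = scaler_y.inverse_transform(pred_scaled)
print(pred_unscaled)
```

The value passed in as `pred_scaled` is arbitrary; only its shape matters for the point being made.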
Something like this should work:
# Fit one scaler per array, on the training data only
transformer_x = RobustScaler().fit(X_train)
transformer_y = RobustScaler().fit(y_train)

# Reuse the fitted scalers for both the train and test splits
X_rtrain = transformer_x.transform(X_train)
y_rtrain = transformer_y.transform(y_train)
X_rtest = transformer_x.transform(X_test)
y_rtest = transformer_y.transform(y_test)

# Fit Train Model
lasso = Lasso()
lasso_alg = lasso.fit(X_rtrain, y_rtrain)
train_score = lasso_alg.score(X_rtrain, y_rtrain)
test_score = lasso_alg.score(X_rtest, y_rtest)
print("training score:", train_score)
print("test score:", test_score)

# Scale the example with the X scaler, predict, then unscale with the y scaler
example = [[10, 100]]
example_scaled = transformer_x.transform(example)
transformer_y.inverse_transform(lasso.predict(example_scaled).reshape(-1, 1))
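As an alternative to keeping two scalers by hand, scikit-learn can do the bookkeeping itself. The sketch below is not part of the answer above; it swaps in `Pipeline` (via `make_pipeline`) to scale X and `TransformedTargetRegressor` to scale y and unscale predictions automatically. It assumes the question's toy data, fits on all three rows without a train/test split to stay deterministic, and uses the default `Lasso()` settings, which would need tuning on real data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler

data = [[100, 1, 50], [500, 3, 25], [1000, 10, 100]]
df = pd.DataFrame(data, columns=["Cost", "People", "Supplies"])
X = df[["People", "Supplies"]]
y = df["Cost"]

# The inner pipeline scales X before Lasso sees it; the outer wrapper
# scales y for fitting and inverse-transforms whatever predict() returns.
model = TransformedTargetRegressor(
    regressor=make_pipeline(RobustScaler(), Lasso()),
    transformer=RobustScaler(),
)
model.fit(X, y)

# Pass the example in original units; the result is already in Cost units.
example = pd.DataFrame([[10, 100]], columns=["People", "Supplies"])
prediction = model.predict(example)
print(prediction)
```

With this setup there is no separate inverse_transform call to get wrong, and a single prediction is handled the same way as a batch.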