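How do I apply two transformations with TransformedTargetRegressor?

I am trying to apply two transformations to my target variable: first a log(y+1) transformation, then a min-max scaler. I have been trying to use TransformedTargetRegressor from scikit-learn together with the following two functions: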

    import numpy as np

    tmin = 0
    tmax = 0

    def target_logtransform(target):
        target_ = target.copy()
        global tmin
        global tmax
        tmin = np.min(target_)
        tmax = np.max(target_)
        target_ = (target_ - tmin)/(tmax - tmin)
        target_ = np.log(target_+1)
        return target_

    def target_inverselog(target):
        global tmin
        global tmax
        target_ = target.copy()
        target_ = np.exp(target_)-1
        target_ = (target_*(tmax-tmin))+tmin
        return target_

The regressor looks like this:

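    from lightgbm import LGBMRegressor
    from sklearn.compose import ColumnTransformer, TransformedTargetRegressor
    from sklearn.model_selection import TimeSeriesSplit, cross_val_score
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import MinMaxScaler

    # random_state, params, X_train, y_train and my_scorer are defined elsewhere

    # Use a time series split
    tscv = TimeSeriesSplit(n_splits=4)

    # Scaler for the features
    col_transform = ColumnTransformer(
        transformers=[('num', MinMaxScaler(), [2, 3, 4, 5])],
        remainder='passthrough',
    )

    # Model to fit
    m = LGBMRegressor(random_state=random_state, **params)

    # Set up the pipeline
    pipeline = Pipeline(steps=[('prep', col_transform), ('m', m)])

    # Set up TransformedTargetRegressor to transform the target
    model = TransformedTargetRegressor(
        regressor=pipeline,
        func=target_logtransform,
        inverse_func=target_inverselog,
        check_inverse=False,
    )

    score = -cross_val_score(model, X_train, y_train, cv=tscv, scoring=my_scorer).mean()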

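I am confused about how to apply the two transformations to the target under cross-validation, where the min and max must come from the training target of each fold.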


Answer:

The full answer shows two equivalent ways of solving this problem. Skip to the TL;DR for the solution.


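Step 1: Setting up the problem

Let's replace the target_logtransform and target_inverselog functions with built-ins. NumPy's log1p and expm1, wrapped in scikit-learn's FunctionTransformer, handle the log step, and MinMaxScaler handles the scaling: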

    import numpy as np
    from sklearn.preprocessing import FunctionTransformer
    from sklearn.preprocessing import MinMaxScaler

    log_transformer = FunctionTransformer(func=np.log1p, inverse_func=np.expm1)
    scaler = MinMaxScaler()
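As a quick illustration of why this pair works as a transform and its inverse, the sketch below round-trips a few made-up values through the same FunctionTransformer. The `sample` array is an assumption chosen for illustration; the only requirement is that every value is greater than -1, since log1p is undefined at or below that point.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# Same transformer as in the answer: log(1 + y) forward, exp(z) - 1 backward
log_transformer = FunctionTransformer(func=np.log1p, inverse_func=np.expm1)

# Illustrative values only (not from the original post); all are > -1
sample = np.array([[0.0], [0.5], [-0.3]])

sample_log = log_transformer.fit_transform(sample)
sample_back = log_transformer.inverse_transform(sample_log)

# The inverse should recover the input up to floating-point error
print(np.allclose(sample, sample_back))
```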

First, we rescale the target y manually. This serves as a sanity check that the later steps are correct:

    # Initialize some data to reproduce the results:
    X = np.array([-0.916,-0.916,-0.836,-0.768,-0.700,-0.608,-0.528,-0.472,-0.404,-0.300,-0.184,-0.0840,0.0480,0.168,0.328,0.468,0.640,0.760,0.872]).reshape(-1, 1)
    y = np.array([0.899,0.899,0.895,0.871,0.827,0.747,0.607,0.479,0.339,0.167,0.00294,-0.0971,-0.181,-0.233,-0.309,-0.333,-0.333,-0.321,-0.301]).reshape(-1, 1)

    # Do the two steps manually
    y_log = log_transformer.fit_transform(y)
    y_log_scaled = scaler.fit_transform(y_log)
    print(y_log_scaled)

Output:

    array([[1.        ],
           [1.        ],
           [0.9979847 ],
           [0.98580284],
           ...
           [0.03378575],
           [0.        ],
           [0.        ],
           [0.01704215],
           [0.04478737]])
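The two manual steps amount to computing z = (log(1 + y) - lo) / (hi - lo), where lo and hi are the minimum and maximum of log(1 + y), and the inverse is y = exp(z * (hi - lo) + lo) - 1. The sketch below checks that closed form against MinMaxScaler on a handful of the target values; the names `y_demo`, `lo`, `hi`, and `z_formula` are illustrative and not part of the original answer.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# A few of the target values from the data above, as a column vector
y_demo = np.array([[0.899], [0.607], [0.167], [-0.333]])

# Step 1: log(1 + y)
y_log = np.log1p(y_demo)

# Step 2: min-max scaling written out by hand
lo, hi = y_log.min(), y_log.max()
z_formula = (y_log - lo) / (hi - lo)

# The same scaling done by scikit-learn
z_sklearn = MinMaxScaler().fit_transform(y_log)

# Inverse: undo the scaling first, then undo the log
y_back = np.expm1(z_formula * (hi - lo) + lo)

print(np.allclose(z_formula, z_sklearn))
print(np.allclose(y_back, y_demo))
```

Note that the inverse needs lo and hi from the data the scaler was fitted on, which is why the scaling step has to be stateful.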

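Step 2: Defining TwoTransformers to log-transform and then scale

Let's define a TwoTransformers class that extends scikit-learn's TransformerMixin and BaseEstimator classes and implements the fit_transform and inverse_transform methods. The first looks much like our manual approach, and keeping both fitted transformers on the instance makes the inverse easy to define: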

    from sklearn.base import TransformerMixin, BaseEstimator

    class TwoTransformers(TransformerMixin, BaseEstimator):
        def fit_transform(self, y):
            self.log_transformer = FunctionTransformer(
                func=np.log1p,
                inverse_func=np.expm1,
            )
            self.scaler = MinMaxScaler()
            y_log = self.log_transformer.fit_transform(y)
            y_log_scaled = self.scaler.fit_transform(y_log)
            return y_log_scaled

        def inverse_transform(self, y):
            y_unscaled = self.scaler.inverse_transform(y)
            y_unscaled_unlog = self.log_transformer.inverse_transform(y_unscaled)
            return y_unscaled_unlog

We can show that this is equivalent to our earlier result:

    two_steps = TwoTransformers()
    print(np.all(y_log_scaled == two_steps.fit_transform(y)))
    print(two_steps.fit_transform(y))

The output matches:

    True
    [[1.        ]
     [1.        ]
     [0.9979847 ]
     [0.98580284]
     ...
     [0.03378575]
     [0.        ]
     [0.        ]
     [0.01704215]
     [0.04478737]]

Step 3: Integrating with TransformedTargetRegressor

Let's demonstrate with LinearRegression (ignoring cross-validation for now) to make sure everything works:

    from sklearn.linear_model import LinearRegression
    from sklearn.compose import TransformedTargetRegressor

    two_steps = TwoTransformers()
    regr_trans = TransformedTargetRegressor(
        regressor=LinearRegression(),
        func=two_steps.fit_transform,
        inverse_func=two_steps.inverse_transform,
    )
    regr_trans.fit(X, y)
    y_pred_two_step = regr_trans.predict(X)

For comparison, here is an equivalent version in which we fit the model on the y_log_scaled variable from earlier and then undo our operations manually:

    clf = LinearRegression()
    clf.fit(X, y_log_scaled)

    # Undo the scaling, then undo the log
    y_pred = clf.predict(X)
    y_pred_unscale = scaler.inverse_transform(y_pred)
    y_pred_unscale_unlog = log_transformer.inverse_transform(y_pred_unscale)

Again, we can show that both approaches give the same result:

    print(np.all(y_pred_two_step == y_pred_unscale_unlog))
    print(np.c_[y_pred_two_step, y_pred_unscale_unlog])

Output:

    True
    [[ 0.9212786   0.9212786 ]
     [ 0.9212786   0.9212786 ]
     [ 0.81566306  0.81566306]
     ...
     [-0.36027418 -0.36027418]
     [-0.41229248 -0.41229248]
     [-0.45701964 -0.45701964]]
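A possible alternative, not part of the original answer, is to skip the custom class and pass a scikit-learn Pipeline through the `transformer` parameter of TransformedTargetRegressor instead of `func`/`inverse_func`. The sketch below assumes a Pipeline built with make_pipeline from the same two transformers; the names `target_pipeline`, `regr_pipe`, and `y_pred_manual` are illustrative. It compares the predictions with a manual reference computed the same way as above.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler

X = np.array([-0.916,-0.916,-0.836,-0.768,-0.700,-0.608,-0.528,-0.472,-0.404,-0.300,-0.184,-0.0840,0.0480,0.168,0.328,0.468,0.640,0.760,0.872]).reshape(-1, 1)
y = np.array([0.899,0.899,0.895,0.871,0.827,0.747,0.607,0.479,0.339,0.167,0.00294,-0.0971,-0.181,-0.233,-0.309,-0.333,-0.333,-0.321,-0.301]).reshape(-1, 1)

# Assumption: chain the two target transformations in a Pipeline;
# its inverse_transform applies the inverses in reverse order
target_pipeline = make_pipeline(
    FunctionTransformer(func=np.log1p, inverse_func=np.expm1),
    MinMaxScaler(),
)

regr_pipe = TransformedTargetRegressor(
    regressor=LinearRegression(),
    transformer=target_pipeline,
)
regr_pipe.fit(X, y)
y_pred_pipe = regr_pipe.predict(X)

# Manual reference: log, scale, fit, then undo the scaling and the log
y_log = np.log1p(y)
scaler = MinMaxScaler().fit(y_log)
clf = LinearRegression().fit(X, scaler.transform(y_log))
y_pred_manual = np.expm1(scaler.inverse_transform(clf.predict(X)))

print(np.allclose(y_pred_pipe, y_pred_manual))
```

With this variant the fitted min and max live on a clone stored as `regr_pipe.transformer_`, not on an object shared between calls, which may be easier to reason about under cross-validation.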

TL;DR: Final code

Define a class with fit_transform and inverse_transform methods, then pass the instance's bound methods to TransformedTargetRegressor as func and inverse_func:

    import numpy as np
    from sklearn.base import TransformerMixin, BaseEstimator
    from sklearn.preprocessing import FunctionTransformer
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.linear_model import LinearRegression
    from sklearn.compose import TransformedTargetRegressor

    X = np.array([-0.916,-0.916,-0.836,-0.768,-0.700,-0.608,-0.528,-0.472,-0.404,-0.300,-0.184,-0.0840,0.0480,0.168,0.328,0.468,0.640,0.760,0.872]).reshape(-1, 1)
    y = np.array([0.899,0.899,0.895,0.871,0.827,0.747,0.607,0.479,0.339,0.167,0.00294,-0.0971,-0.181,-0.233,-0.309,-0.333,-0.333,-0.321,-0.301]).reshape(-1, 1)

    class TwoTransformers(TransformerMixin, BaseEstimator):
        def fit_transform(self, y):
            self.log_transformer = FunctionTransformer(
                func=np.log1p,
                inverse_func=np.expm1,
            )
            self.scaler = MinMaxScaler()
            y_log = self.log_transformer.fit_transform(y)
            y_log_scaled = self.scaler.fit_transform(y_log)
            return y_log_scaled

        def inverse_transform(self, y):
            y_unscaled = self.scaler.inverse_transform(y)
            y_unscaled_unlog = self.log_transformer.inverse_transform(y_unscaled)
            return y_unscaled_unlog

    two_steps = TwoTransformers()
    regr_trans = TransformedTargetRegressor(
        regressor=LinearRegression(),
        func=two_steps.fit_transform,
        inverse_func=two_steps.inverse_transform,
    )
    regr_trans.fit(X, y)
    y_pred_two_step = regr_trans.predict(X)
    print(y_pred_two_step)
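The question asked specifically about cross-validation, which the answer sets aside. The following is a minimal sketch of one way to check that the target's min and max are re-estimated on each training fold. It makes several assumptions that are not in the original answer: it uses the Pipeline-based `transformer` variant rather than TwoTransformers, LinearRegression stands in for the asker's LGBMRegressor pipeline, and `neg_mean_absolute_error` stands in for the asker's `my_scorer`.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler

X = np.array([-0.916,-0.916,-0.836,-0.768,-0.700,-0.608,-0.528,-0.472,-0.404,-0.300,-0.184,-0.0840,0.0480,0.168,0.328,0.468,0.640,0.760,0.872]).reshape(-1, 1)
y = np.array([0.899,0.899,0.895,0.871,0.827,0.747,0.607,0.479,0.339,0.167,0.00294,-0.0971,-0.181,-0.233,-0.309,-0.333,-0.333,-0.321,-0.301])

# Assumption: a Pipeline as the target transformer (log1p, then min-max)
model = TransformedTargetRegressor(
    regressor=LinearRegression(),  # stand-in for the asker's pipeline
    transformer=make_pipeline(
        FunctionTransformer(func=np.log1p, inverse_func=np.expm1),
        MinMaxScaler(),
    ),
)

# cross_validate clones and refits the whole model, target transformer
# included, on the training part of every split
cv_results = cross_validate(
    model,
    X,
    y,
    cv=TimeSeriesSplit(n_splits=4),
    scoring="neg_mean_absolute_error",  # stand-in for my_scorer
    return_estimator=True,
)
scores = -cv_results["test_score"]

# Recover the smallest training target seen by each fold's MinMaxScaler
fold_minima = [
    float(np.expm1(est.transformer_[-1].data_min_[0]))
    for est in cv_results["estimator"]
]

print(scores)
print(fold_minima)
```

If the per-fold minima differ, the scaler was fitted separately on each training split, so no target statistics from the held-out data reached the model.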
