I'm trying to apply two transformations to the target variable: first a log(y+1) transform, then a min-max scaler. I've been trying to use the TransformedTargetRegressor available in scikit-learn, combined with the following two functions:
```python
tmin = 0
tmax = 0

def target_logtransform(target):
    target_ = target.copy()
    global tmin
    global tmax
    tmin = np.min(target_)
    tmax = np.max(target_)
    target_ = (target_ - tmin) / (tmax - tmin)
    target_ = np.log(target_ + 1)
    return target_

def target_inverselog(target):
    global tmin
    global tmax
    target_ = target.copy()
    target_ = np.exp(target_) - 1
    target_ = (target_ * (tmax - tmin)) + tmin
    return target_
```
The regressor is set up as follows:
```python
# Use time series splits
tscv = TimeSeriesSplit(n_splits=4)

# Scaler for the features
col_transform = ColumnTransformer(
    transformers=[('num', MinMaxScaler(), [2, 3, 4, 5])],
    remainder='passthrough'
)

# Model to fit
m = LGBMRegressor(random_state=random_state, **params)

# Set up the pipeline
pipeline = Pipeline(steps=[('prep', col_transform), ('m', m)])

# Set up the TransformedTargetRegressor to transform the target
model = TransformedTargetRegressor(
    regressor=pipeline,
    func=target_logtransform,
    inverse_func=target_inverselog,
    check_inverse=False
)

score = -cross_val_score(model, X_train, y_train, cv=tscv, scoring=my_scorer).mean()
```
I'm confused about how to apply two transformations to the target during cross-validation, when the inverse transform needs the min and max values computed on the training target.
Answer:
The full answer shows two equivalent ways to solve this problem; skip to the TL;DR for the final solution.
Step 1: Set up the problem
Let's replace the target_logtransform and target_inverselog functions. scikit-learn provides built-in equivalents for both:
```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer
from sklearn.preprocessing import MinMaxScaler

log_transformer = FunctionTransformer(func=np.log1p, inverse_func=np.expm1)
scaler = MinMaxScaler()
```
We can use these to rescale our target y manually. We'll do so as a sanity check, to make sure our later steps are correct:
```python
# Initialize some data to reproduce the results:
X = np.array([-0.916, -0.916, -0.836, -0.768, -0.700, -0.608, -0.528,
              -0.472, -0.404, -0.300, -0.184, -0.0840, 0.0480, 0.168,
              0.328, 0.468, 0.640, 0.760, 0.872]).reshape(-1, 1)
y = np.array([0.899, 0.899, 0.895, 0.871, 0.827, 0.747, 0.607, 0.479,
              0.339, 0.167, 0.00294, -0.0971, -0.181, -0.233, -0.309,
              -0.333, -0.333, -0.321, -0.301]).reshape(-1, 1)

# Do the two steps manually
y_log = log_transformer.fit_transform(y)
y_log_scaled = scaler.fit_transform(y_log)
print(y_log_scaled)
```
Output:
```
array([[1.        ],
       [1.        ],
       [0.9979847 ],
       [0.98580284],
       ...
       [0.03378575],
       [0.        ],
       [0.        ],
       [0.01704215],
       [0.04478737]])
```
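Note that the order matters when undoing these transforms: the scaling must be inverted before the log. A minimal round-trip sketch confirms this (the three-point y_demo array is made up purely for illustration):

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler

# Hypothetical tiny target, just to illustrate the round trip.
y_demo = np.array([0.899, 0.167, -0.333]).reshape(-1, 1)

log_transformer = FunctionTransformer(func=np.log1p, inverse_func=np.expm1)
scaler = MinMaxScaler()

# Forward: log1p first, then min-max scale into [0, 1].
forward = scaler.fit_transform(log_transformer.fit_transform(y_demo))

# Inverse: unscale first, then undo the log, i.e. reverse order.
back = log_transformer.inverse_transform(scaler.inverse_transform(forward))

print(np.allclose(back, y_demo))  # True
```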
Step 2: Define TwoTransformers to log-transform, then scale
Let's define a TwoTransformers class that extends scikit-learn's TransformerMixin and BaseEstimator classes, and implements fit_transform and inverse_transform methods on this object. The first will look similar to our manual approach, but now we can easily define the inverse operation:
```python
from sklearn.base import TransformerMixin, BaseEstimator

class TwoTransformers(TransformerMixin, BaseEstimator):
    def fit_transform(self, y):
        self.log_transformer = FunctionTransformer(
            func=np.log1p,
            inverse_func=np.expm1,
        )
        self.scaler = MinMaxScaler()
        y_log = self.log_transformer.fit_transform(y)
        y_log_scaled = self.scaler.fit_transform(y_log)
        return y_log_scaled

    def inverse_transform(self, y):
        y_unscaled = self.scaler.inverse_transform(y)
        y_unscaled_unlog = self.log_transformer.inverse_transform(y_unscaled)
        return y_unscaled_unlog
```
We can show it is equivalent to our earlier result:
```python
two_steps = TwoTransformers()
print(np.all(y_log_scaled == two_steps.fit_transform(y)))
print(two_steps.fit_transform(y))
```
The output matches:
```
True
[[1.        ]
 [1.        ]
 [0.9979847 ]
 [0.98580284]
 ...
 [0.03378575]
 [0.        ]
 [0.        ]
 [0.01704215]
 [0.04478737]]
```
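Since TransformedTargetRegressor will call inverse_transform on every prediction, it is also worth confirming the class actually round-trips the target. A self-contained sketch, using a condensed version of the class defined above:

```python
import numpy as np
from sklearn.base import TransformerMixin, BaseEstimator
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler

class TwoTransformers(TransformerMixin, BaseEstimator):
    def fit_transform(self, y):
        # Fit both steps on y: log1p first, then min-max scaling.
        self.log_transformer = FunctionTransformer(func=np.log1p, inverse_func=np.expm1)
        self.scaler = MinMaxScaler()
        return self.scaler.fit_transform(self.log_transformer.fit_transform(y))

    def inverse_transform(self, y):
        # Undo the steps in reverse order: unscale, then undo the log.
        return self.log_transformer.inverse_transform(self.scaler.inverse_transform(y))

y = np.array([0.899, 0.479, 0.00294, -0.333]).reshape(-1, 1)
two_steps = TwoTransformers()
round_trip = two_steps.inverse_transform(two_steps.fit_transform(y))
print(np.allclose(round_trip, y))  # True
```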
Step 3: Integrate with TransformedTargetRegressor
Let's demonstrate with LinearRegression (ignoring cross-validation for now) to make sure everything works:
```python
from sklearn.linear_model import LinearRegression
from sklearn.compose import TransformedTargetRegressor

two_steps = TwoTransformers()
regr_trans = TransformedTargetRegressor(
    regressor=LinearRegression(),
    func=two_steps.fit_transform,
    inverse_func=two_steps.inverse_transform,
)
regr_trans.fit(X, y)
y_pred_two_step = regr_trans.predict(X)
```
For comparison, here is an equivalent version where we use the y_log_scaled variable from before, fit our model, then manually undo our operations:
```python
clf = LinearRegression()
clf.fit(X, y_log_scaled)

# Undo the scaling, then the log transform
y_pred = clf.predict(X)
y_pred_unscale = scaler.inverse_transform(y_pred)
y_pred_unscale_unlog = log_transformer.inverse_transform(y_pred_unscale)
```
Again, we can show that the two approaches give the same result:
```python
print(np.all(y_pred_two_step == y_pred_unscale_unlog))
print(np.c_[y_pred_two_step, y_pred_unscale_unlog])
```
Output:
```
True
[[ 0.9212786   0.9212786 ]
 [ 0.9212786   0.9212786 ]
 [ 0.81566306  0.81566306]
 ...
 [-0.36027418 -0.36027418]
 [-0.41229248 -0.41229248]
 [-0.45701964 -0.45701964]]
```
TL;DR: Final code
Define a class with fit_transform and inverse_transform, and pass an instance to TransformedTargetRegressor:
```python
import numpy as np
from sklearn.base import TransformerMixin, BaseEstimator
from sklearn.preprocessing import FunctionTransformer
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LinearRegression
from sklearn.compose import TransformedTargetRegressor

X = np.array([-0.916, -0.916, -0.836, -0.768, -0.700, -0.608, -0.528,
              -0.472, -0.404, -0.300, -0.184, -0.0840, 0.0480, 0.168,
              0.328, 0.468, 0.640, 0.760, 0.872]).reshape(-1, 1)
y = np.array([0.899, 0.899, 0.895, 0.871, 0.827, 0.747, 0.607, 0.479,
              0.339, 0.167, 0.00294, -0.0971, -0.181, -0.233, -0.309,
              -0.333, -0.333, -0.321, -0.301]).reshape(-1, 1)

class TwoTransformers(TransformerMixin, BaseEstimator):
    def fit_transform(self, y):
        self.log_transformer = FunctionTransformer(
            func=np.log1p,
            inverse_func=np.expm1,
        )
        self.scaler = MinMaxScaler()
        y_log = self.log_transformer.fit_transform(y)
        y_log_scaled = self.scaler.fit_transform(y_log)
        return y_log_scaled

    def inverse_transform(self, y):
        y_unscaled = self.scaler.inverse_transform(y)
        y_unscaled_unlog = self.log_transformer.inverse_transform(y_unscaled)
        return y_unscaled_unlog

two_steps = TwoTransformers()
regr_trans = TransformedTargetRegressor(
    regressor=LinearRegression(),
    func=two_steps.fit_transform,
    inverse_func=two_steps.inverse_transform,
)
regr_trans.fit(X, y)
y_pred_two_step = regr_trans.predict(X)
print(y_pred_two_step)
```