Prediction results are negative

I have the sample data below and I am using regression algorithms to predict the time column. I trained models with both LightGBM and XGBoost, but the predictions I get are very poor, and some of them are negative. I am not sure what I am doing wrong. Thanks.

size  channels  time
3     3         4.980278
3     16        4.972054
3     64        4.899884
3     256       5.499221
3     512       5.599495
3     1024      5.936933
16    3         5.221653
16    16        5.994821
16    64        6.648254
16    256       7.176828
16    512       8.1707
16    1024      8.651496
64    3         7.801533
64    16        7.398248
64    64        8.395648
64    256       17.49494
64    512       26.43354
64    1024      49.55192
256   3         12.36093
256   16        20.50781
256   64        46.49553
256   256       170.5452
256   512       333.8809
256   1024      675.9459
512   3         22.44313
512   16        53.82643
512   64        164.3493
512   256       659.4345
512   512       1306.881
512   1024      3122.403

LightGBM code

import lightgbm
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

x_train, x_val, y_train, y_val = train_test_split(X, Y, train_size=0.8)
print(f"Number of training examples {len(x_train)}")
print(f"Number of testing examples {len(x_val)}")

regressor = lightgbm.LGBMRegressor()
regressor.fit(x_train, y_train)

train_pred = regressor.predict(x_train)
train_rmse = mean_squared_error(y_train, train_pred) ** 0.5
print(f"Train RMSE is {train_rmse}")

val_pred = regressor.predict(x_val)
val_rmse = mean_squared_error(y_val, val_pred) ** 0.5
print(f"Test RMSE is {val_rmse}")

R_squared = r2_score(y_val, val_pred)  # y_true comes first in r2_score
print('R2', R_squared)

Results

Train RMSE is 5385.50
Test RMSE is 1245.1
R2 -2.9991290197894976e+31

XGBoost code tuned with Optuna

from functools import partial

import numpy as np
import optuna
import xgboost as xgb
from sklearn import metrics, model_selection
from sklearn.metrics import mean_squared_error as mse, r2_score
from sklearn.model_selection import RepeatedKFold, cross_val_score


def optimize(trial, x, y, regressor):  # note: regressor is not used below
    max_depth = trial.suggest_int("max_depth", 3, 10)
    n_estimators = trial.suggest_int("n_estimators", 5000, 10000)
    max_leaves = trial.suggest_int("max_leaves", 1, 10)
    learning_rate = trial.suggest_loguniform('learning_rate', 0.001, 0.1)
    colsample_bytree = trial.suggest_uniform('colsample_bytree', 0.0, 1.0)
    min_child_weight = trial.suggest_uniform('min_child_weight', 1, 3)
    subsample = trial.suggest_uniform('subsample', 0.5, 1)
    model = xgb.XGBRegressor(
        objective='reg:squarederror',
        n_estimators=n_estimators,
        max_depth=max_depth,
        learning_rate=learning_rate,
        colsample_bytree=colsample_bytree,
        min_child_weight=min_child_weight,
        max_leaves=max_leaves,
        subsample=subsample)
    # 5-fold CV: return the mean RMSE across folds for Optuna to minimize
    kf = model_selection.KFold(n_splits=5)
    error = []
    for train_idx, test_idx in kf.split(X=x, y=y):
        xtrain, xtest = x[train_idx], x[test_idx]
        ytrain, ytest = y[train_idx], y[test_idx]
        model.fit(xtrain, ytrain)
        y_pred = model.predict(xtest)
        fold_err = metrics.mean_squared_error(ytest, y_pred)
        error.append(np.sqrt(fold_err))
    return np.mean(error)


best_params = {'max_depth': 9, 'n_estimators': 9242, 'max_leaves': 7,
               'learning_rate': 0.0015809052065858954,
               'colsample_bytree': 0.4908644884609704,
               'min_child_weight': 2.3502876962874435,
               'subsample': 0.5927926099148189}


def optimize_xgb(X, y):
    list_of_y = ["Target 1"]
    for i, m in zip(range(y.shape[1]), list_of_y):
        print("{} optimized Parameters on MSE Error".format(m))
        optimization_function = partial(optimize, x=X, y=y[:, i],
                                        regressor="random_forest")
        study = optuna.create_study(direction="minimize")
        study.optimize(optimization_function, n_trials=50)


optimize_xgb(X_train, y_train)


def modeling(X, y, optimize="no", max_depth=50, n_estimators=3000,
             max_leaves=30, learning_rate=0.01, colsample_bytree=1.0,
             gamma=0.0001, min_child_weight=2, reg_lambda=0.0001):
    if optimize == "no":
        model = xgb.XGBRegressor(objective='reg:squarederror')
    else:
        model = xgb.XGBRegressor(objective='reg:squarederror', **best_params)
    if y.shape[1] == 1:
        model_xgb = model.fit(X, y)
    cv = RepeatedKFold(n_splits=5, n_repeats=3, random_state=1)
    scores = []
    for i in range(y.shape[1]):
        scores.append(np.abs(cross_val_score(
            model, X, y[:, i], scoring='neg_mean_squared_error',
            cv=cv, n_jobs=-1)))
        print('Mean MSE of the {} target : {}  ({})'.format(
            i, scores[i].mean(), scores[i].std()))
    return model_xgb


model_xgb = modeling(X_train, y_train, optimize="yes")
model_xgb.fit(X_train, y_train)
y_pred = model_xgb.predict(X_test)

MSE = mse(y_test, y_pred)
RMSE = np.sqrt(MSE)
print("TEST MSE", MSE)
R_squared = r2_score(y_test, y_pred)  # y_true comes first in r2_score
print("RMSE: ", np.round(RMSE, 2))
print("R-Squared: ", np.round(R_squared, 2))

XGBoost results

TEST MSE 2653915.139388934
RMSE:  1629.08
R-Squared:  -1.69

Answer:

First of all, it must be said that gradient boosting models can return values both inside and outside the range of the training targets.

A gradient-boosting regressor fits each tree to the residuals of the previous stage. So if the prediction at stage t overshoots the target variable, the residual at stage t is negative, and the regression tree at stage t+1 is then fit on these negative target values left over from the earlier residuals.

Since the resulting trees are added together stage by stage, the model can end up predicting negative values.
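To see that nothing constrains the output sign, here is a minimal self-contained sketch (synthetic data, not the question's table) that trains a LightGBM regressor on strictly positive targets and checks whether any prediction comes out negative. Whether negatives actually appear depends on the data and the hyper-parameters; the point is that no part of the model clamps the output to be non-negative.

import numpy as np
import lightgbm

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(500, 2))
y = np.exp(6.0 * X[:, 0])          # strictly positive, heavily skewed targets

regressor = lightgbm.LGBMRegressor()
regressor.fit(X, y)

# predict on fresh points; each prediction is a sum of per-stage leaf
# values, so nothing restricts it to the range of the training targets
X_new = rng.uniform(0.0, 1.0, size=(500, 2))
preds = regressor.predict(X_new)

print("min training target:", y.min())      # always > 0
print("min prediction:     ", preds.min())  # may be < 0
print("any negative preds?:", bool((preds < 0).any()))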

A common technique for handling negative predictions of a strictly positive quantity is a log transform of the target.

The transformation of the target variable is Y -> log(Y + c), where c is a constant. People usually choose something like Y -> log(Y + 0.001), or any other "very small" positive number, so that a zero target does not blow up the logarithm.
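Here is a minimal sketch of this workflow, assuming X and Y are the feature matrix and time column from the question's code:

import numpy as np
import lightgbm
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

c = 0.001  # small positive constant so that log(0 + c) is defined

x_train, x_val, y_train, y_val = train_test_split(X, Y, train_size=0.8)

# fit in log space: the model now predicts log(time + c)
regressor = lightgbm.LGBMRegressor()
regressor.fit(x_train, np.log(y_train + c))

# invert the transform; exp(.) - c is bounded below by -c,
# so predictions on the original scale are effectively non-negative
val_pred = np.exp(regressor.predict(x_val)) - c

val_rmse = mean_squared_error(y_val, val_pred) ** 0.5
print(f"Test RMSE on the original scale: {val_rmse}")

scikit-learn's TransformedTargetRegressor (in sklearn.compose) wraps the same idea, e.g. with func=np.log1p and inverse_func=np.expm1, which corresponds to c = 1.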
