Early stopping does not work in LightGBM when using RMSLE as the evaluation metric

I am training a LightGBM model in Python with rmsle as the evaluation metric, and I run into a problem as soon as I try to enable early stopping.

Here is my code:

    import numpy as np
    import pandas as pd
    import lightgbm as lgb
    from sklearn.model_selection import train_test_split

    df_train = pd.read_csv('train_data.csv')
    X_train = df_train.drop('target', axis=1)
    y_train = np.log(df_train['target'])

    sample_params = {
        'boosting_type': 'gbdt',
        'objective': 'regression',
        'random_state': 42,
        'metric': 'rmsle',
        'lambda_l1': 5,
        'lambda_l2': 5,
        'num_leaves': 5,
        'bagging_freq': 5,
        'max_depth': 5,
        'max_bin': 5,
        'min_child_samples': 5,
        'feature_fraction': 0.5,
        'bagging_fraction': 0.5,
        'learning_rate': 0.1,
    }

    X_train_tr, X_train_val, y_train_tr, y_train_val = train_test_split(
        X_train, y_train, test_size=0.2, random_state=42)

    def train_lightgbm(X_train_tr, y_train_tr, X_train_val, y_train_val, params,
                       num_boost_round, early_stopping_rounds, verbose_eval):
        d_train = lgb.Dataset(X_train_tr, y_train_tr)
        d_val = lgb.Dataset(X_train_val, y_train_val)
        model = lgb.train(
            params=params,
            train_set=d_train,
            num_boost_round=num_boost_round,
            valid_sets=d_val,
            early_stopping_rounds=early_stopping_rounds,
            verbose_eval=verbose_eval,
        )
        return model

    model = train_lightgbm(
        X_train_tr,
        y_train_tr,
        X_train_val,
        y_train_val,
        params=sample_params,
        num_boost_round=500,
        early_stopping_rounds=True,
        verbose_eval=1)

    df_test = pd.read_csv('test_data.csv')
    X_test = df_test.drop('target', axis=1)
    y_test = np.log(df_test['target'])

    df_train['prediction'] = np.exp(model.predict(X_train))
    df_test['prediction'] = np.exp(model.predict(X_test))

    def rmsle(y_true, y_pred):
        assert len(y_true) == len(y_pred)
        return np.sqrt(np.mean(np.power(np.log1p(y_true + 1) - np.log1p(y_pred + 1), 2)))

    metric = rmsle(y_test, df_test['prediction'])
    print('Test Metric Value:', round(metric, 4))

If I pass early_stopping_rounds=False to train_lightgbm, the code runs fine.

However, when I set early_stopping_rounds=True, it throws the following error:

ValueError: For early stopping, at least one dataset and eval metric is required for evaluation.

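If I run a similar script but use 'metric': 'rmse' instead of 'rmsle' in sample_params, it runs fine even with early_stopping_rounds=True.

What do I need to add so that LightGBM recognizes my dataset and evaluation metric? Thanks!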


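Answer:

rmsle is not among the metrics LightGBM supports out of the box (see the LightGBM documentation for the list of available metrics).

To use it, you need to define it yourself as a custom evaluation function: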

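    def rmsle_lgbm(y_pred, data):
        # data is the lgb.Dataset being evaluated; its labels are the ground truth
        y_true = np.array(data.get_label())
        score = np.sqrt(np.mean(np.power(np.log1p(y_true) - np.log1p(y_pred), 2)))
        # returns (eval_name, eval_result, is_higher_better)
        return 'rmsle', score, False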
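As a quick sanity check of that callback contract, the sketch below calls the same function outside of training. `FakeDataset` is a hypothetical stand-in that only mimics `get_label()`; it is not part of LightGBM and exists here purely so the function can be exercised with NumPy alone.

```python
import numpy as np

def rmsle_lgbm(y_pred, data):
    y_true = np.array(data.get_label())
    score = np.sqrt(np.mean(np.power(np.log1p(y_true) - np.log1p(y_pred), 2)))
    return 'rmsle', score, False

class FakeDataset:
    """Hypothetical stand-in for lgb.Dataset; only get_label() is mimicked."""
    def __init__(self, label):
        self._label = np.asarray(label, dtype=float)

    def get_label(self):
        return self._label

data = FakeDataset([1.0, 3.0, 7.0])

# Perfect predictions: the error should be exactly zero.
name, value, higher_is_better = rmsle_lgbm(np.array([1.0, 3.0, 7.0]), data)
print(name, value, higher_is_better)

# Imperfect predictions: the error should be strictly positive.
_, worse, _ = rmsle_lgbm(np.array([2.0, 3.0, 7.0]), data)
print('imperfect:', worse)
```

The third element of the returned tuple being False tells LightGBM that lower values are better, which is what early stopping needs to know to pick the best iteration.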

Redefine your parameter dictionary like this:

    params = {
        ....
        'objective': 'regression',
        'metric': 'custom',  # <=============
        ....
    }

Then train, passing the function through feval:

    model = lgb.train(
        params=params,
        train_set=d_train,
        num_boost_round=num_boost_round,
        valid_sets=d_val,
        early_stopping_rounds=early_stopping_rounds,
        verbose_eval=verbose_eval,
        feval=rmsle_lgbm  # <=============
    )

P.S.: np.log(y + 1) = np.log1p(y), so the np.log1p(y + 1) in your rmsle function looks like a mistake.
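The identity is easy to check numerically. The sketch below compares `np.log1p(y)` with `np.log(y + 1)` and then contrasts the question's metric (named `rmsle_shifted` here, a label chosen only for this example) with a version that drops the extra `+ 1`. The sample values are arbitrary.

```python
import numpy as np

y = np.array([0.0, 1.0, 9.0, 99.0])

# log1p(y) is log(1 + y), so adding 1 again computes log(y + 2).
print(np.allclose(np.log1p(y), np.log(y + 1)))
print(np.allclose(np.log1p(y + 1), np.log(y + 2)))

def rmsle(y_true, y_pred):
    """RMSLE using log1p once, i.e. log(1 + y)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    assert len(y_true) == len(y_pred)
    return np.sqrt(np.mean(np.power(np.log1p(y_true) - np.log1p(y_pred), 2)))

def rmsle_shifted(y_true, y_pred):
    """The question's version, which effectively uses log(2 + y)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    assert len(y_true) == len(y_pred)
    return np.sqrt(np.mean(np.power(np.log1p(y_true + 1) - np.log1p(y_pred + 1), 2)))

a, b = [1.0, 10.0], [2.0, 20.0]
print('rmsle:        ', rmsle(a, b))
print('rmsle_shifted:', rmsle_shifted(a, b))
```

The two functions give different values on the same inputs, so the extra `+ 1` changes the reported score, not just its form.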
