By piecing together various basic and documentation examples, I managed to come up with the following code:
```python
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score
from ray.tune import run
from ray.tune.schedulers import AsyncHyperBandScheduler
from ray.tune.suggest.bayesopt import BayesOptSearch

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

def objective(config, reporter):
    for i in range(config['iterations']):
        model = RandomForestClassifier(
            random_state=0, n_jobs=-1, max_depth=None,
            n_estimators=int(config['n_estimators']),
            min_samples_split=int(config['min_samples_split']),
            min_samples_leaf=int(config['min_samples_leaf']))
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        # Feed the score back to Tune?
        reporter(precision=precision_score(y_test, y_pred, average='macro'))

space = {
    'n_estimators': (100, 200),
    'min_samples_split': (2, 10),
    'min_samples_leaf': (1, 5)
}

algo = BayesOptSearch(
    space,
    metric="precision",
    mode="max",
    utility_kwargs={"kind": "ucb", "kappa": 2.5, "xi": 0.0},
    verbose=3
)

scheduler = AsyncHyperBandScheduler(metric="precision", mode="max")

config = {
    "num_samples": 1000,
    "config": {
        "iterations": 10,
    }
}

results = run(
    objective,
    name="my_exp",
    search_alg=algo,
    scheduler=scheduler,
    stop={"training_iteration": 400, "precision": 0.80},
    resources_per_trial={"cpu": 2, "gpu": 0.5},
    **config
)

print(results.dataframe())
print("Best config: ", results.get_best_config(metric="precision"))
```
The code runs, and I do get the best configuration at the end. However, my main doubt concerns the `objective` function. Is the function I wrote correct? I couldn't find any examples.
Follow-up question:
- What is `num_samples` in the config object? Is it the number of samples drawn from the overall training data for each trial?
Answer:
Tune now has native sklearn bindings: https://github.com/ray-project/tune-sklearn

Could you give that a try?

To answer your original questions: the objective function looks fine; `num_samples` is the total number of hyperparameter configurations you want to try.
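To make the meaning of `num_samples` concrete, here is a minimal plain-Python sketch (not Tune's actual implementation — `sample_config` is a hypothetical stand-in for the search algorithm's sampler): `num_samples` controls how many configurations are drawn from the search space, and has nothing to do with the size of the training data.

```python
import random

# The same search space as in the question: (low, high) ranges per parameter.
space = {'n_estimators': (100, 200),
         'min_samples_split': (2, 10),
         'min_samples_leaf': (1, 5)}

def sample_config(space, rng):
    # Draw one configuration: one value from each parameter's (low, high) range.
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}

rng = random.Random(0)
num_samples = 5  # Tune would launch this many trials
trials = [sample_config(space, rng) for _ in range(num_samples)]
print(len(trials))  # 5 -- one trial per sampled configuration
```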
Also, you'll want to remove the for loop from your training function:
```python
def objective(config, reporter):
    model = RandomForestClassifier(
        random_state=0, n_jobs=-1, max_depth=None,
        n_estimators=int(config['n_estimators']),
        min_samples_split=int(config['min_samples_split']),
        min_samples_leaf=int(config['min_samples_leaf']))
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Feed the score back to Tune
    reporter(precision=precision_score(y_test, y_pred, average='macro'))
```
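To see why the loop is redundant, here is a small sketch with a stub reporter (`StubReporter` is hypothetical, not part of Tune): each `reporter()` call counts as one training iteration, so the looped version reports the same retrained model ten times per trial, while the model itself never changes between iterations.

```python
class StubReporter:
    # Records every metrics dict it receives, mimicking Tune's reporter.
    def __init__(self):
        self.calls = []
    def __call__(self, **metrics):
        self.calls.append(metrics)

def objective_with_loop(config, reporter):
    # The original version: re-reports an identical score every iteration.
    for _ in range(config['iterations']):
        reporter(precision=0.75)  # placeholder for the real precision score

def objective_single(config, reporter):
    # The corrected version: train once, report once per trial.
    reporter(precision=0.75)

r1, r2 = StubReporter(), StubReporter()
objective_with_loop({'iterations': 10}, r1)
objective_single({}, r2)
print(len(r1.calls), len(r2.calls))  # 10 reports vs. 1 report per trial
```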