Batch prediction hangs on GCP for a custom xgb model

I successfully ran my model in Vertex AI, but the job hangs when I try to get batch predictions.

When I run the model locally, it finishes in a few seconds. On GCP, the model takes 8 minutes to compute.

My model code is as follows:

from google.cloud import storage
import os
import gcsfs
import pandas as pd
import pickle

# read in the file
print("Mod script starts")
ds = pd.read_csv("gs://shottypeids/ShotTypeModel_alldata.csv")
print("Data read in success")

# model
import xgboost as xgb
from sklearn.model_selection import train_test_split

y = ds[["Label_Num", "ShotPlus"]]
# y["Player"]=shots2["Player"]
# adjust in i
X = ds.drop(["ShotPlus", "Label_Num",
             # ,"DSL_Available_Bandwidth","Band_2_DSL_rel","DSL_vals"
             ], axis=1)
X_train, X_test, y_train1, y_test1 = train_test_split(X, y, test_size=0.3, random_state=785)
y_test = y_test1[["Label_Num"]]
y_train = y_train1[["Label_Num"]]
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

params = {
    'max_depth': 6,
    'min_child_weight': 4,
    'eta': 0.1,
    'subsample': 0.8,
    'colsample_bytree': 0.8,
    # "scale_pos_weight" : 8, #change me
    # Other parameters
    # 'eval_metric' : "auc",
    'objective': 'multi:softprob',
    "num_class": 7,
    'seed': 123
}
num_boost_round = 999
print("Mod Prep Success")

mod_addK = xgb.train(params,
                     dtrain,
                     num_boost_round=num_boost_round,
                     evals=[(dtest, "Test")],
                     early_stopping_rounds=10)
print("Mod Run")

artifact_filename = 'ShotTypeModel_2pt1.pkl'

# Save model artifact to local filesystem (doesn't persist)
local_path = artifact_filename
with open(local_path, 'wb') as model_file:
    pickle.dump(mod_addK, model_file)

# Upload model artifact to Cloud Storage
model_directory = os.environ['AIP_MODEL_DIR']
storage_path = os.path.join(model_directory, artifact_filename)
blob = storage.blob.Blob.from_string(storage_path, client=storage.Client())
blob.upload_from_filename(local_path)
print("Model artefacts saved")

I checked the logs and found some pip-related errors, but the job still ran and completed.

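I then found the model in the Models tab in GCP and saved it as an artifact to Cloud Storage. I set up a batch job on a csv file, but it hangs for a very long time. I thought this might be because I had not put it in a container straight away, so I re-ran it and loaded the same container that was used in training (xgb 1.1).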

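It has now been running for over 45 minutes, and the previous attempts also ran for more than half an hour. I cancelled the last job, and it reported that the model server startup had timed out and that I should check the container spec. I have not found any information on what to do about this.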

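I followed the instructions here exactly, but it just hangs. I could not get the API to work, but I only ran this in Cloud Shell rather than on a VM, so I will try that next.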

Any suggestions welcome, J

Answer:

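The simple answer seems to be that the file really does have to be saved as "model.pkl". I thought the name before the extension could vary, but it cannot.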
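For illustration, here is a minimal sketch of how the saving step in the question's training script might be changed so that the artifact is always named "model.pkl". The helper names (`artifact_uri`, `save_model_locally`, `upload_model`) are hypothetical and not part of any library; the upload uses the same `google.cloud.storage` calls as the question and assumes `AIP_MODEL_DIR` is a `gs://` URI.

```python
import os
import pickle
import posixpath

# Filename the prediction container expects, per the answer above.
REQUIRED_ARTIFACT_NAME = "model.pkl"


def artifact_uri(model_dir):
    """Build the Cloud Storage URI for the model artifact.

    posixpath.join keeps forward slashes on every OS and handles
    a model_dir with or without a trailing slash.
    """
    return posixpath.join(model_dir, REQUIRED_ARTIFACT_NAME)


def save_model_locally(model, directory="."):
    """Pickle the model to <directory>/model.pkl and return the local path."""
    local_path = os.path.join(directory, REQUIRED_ARTIFACT_NAME)
    with open(local_path, "wb") as model_file:
        pickle.dump(model, model_file)
    return local_path


def upload_model(local_path, model_dir):
    """Upload the local artifact to Cloud Storage (hypothetical helper)."""
    # Imported here so the helpers above work without the GCP libraries installed.
    from google.cloud import storage

    blob = storage.blob.Blob.from_string(
        artifact_uri(model_dir), client=storage.Client()
    )
    blob.upload_from_filename(local_path)
```

In the training script this would replace the final block, roughly as `upload_model(save_model_locally(mod_addK), os.environ['AIP_MODEL_DIR'])`.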

I am still struggling to generate predictions, but the failure message now comes back within about 15 minutes.
