How to link a dataset to a model in GCP

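This is my first attempt at modelling in GCP, and I can't find or work out how to link the data to the model. In my own scripts I would normally just read a CSV file from some path.

I know I need to load the data into Google Cloud Storage. It is a CSV file, and I am running an XGBoost classification on it. The question is how to connect the pieces so that the script knows to run on that data.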

#read in the file
#ds

#model
import xgboost as xgb
from sklearn.model_selection import train_test_split

y=ds[["Label_Num","ShotPlus"]]
y["Player"]=shots2["Player"]
#adjust in i
X=ds.drop(["ShotPlus", "Label_Num",
              #,"DSL_Available_Bandwidth","Band_2_DSL_rel","DSL_vals"
             ],axis=1)
X_train, X_test, y_train1,y_test1=train_test_split(X,y, test_size=0.3, random_state=785)
y_test = y_test1[["Label_Num"]]
y_train = y_train1[["Label_Num"]]
dtrain=xgb.DMatrix(X_train,label=y_train)
dtest=xgb.DMatrix(X_test,label=y_test)

params={
    'max_depth':6,
    'min_child_weight': 4,
    'eta':0.1,
    'subsample': 0.8,
    'colsample_bytree': 0.8,
#     "scale_pos_weight" : 8, #change me
    # Other parameters
#     'eval_metric' : "auc",
    'objective':'multi:softprob',
    "num_class":7,
    'seed':123
}
num_boost_round = 999

mod_addK=xgb.train(params,
             dtrain,
             num_boost_round=num_boost_round,
             evals=[(dtest, "Test")],
             early_stopping_rounds=10)

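I haven't found an example that loads the data from a CSV file. This one reads a tf.dataset, and this one shows how to use the data with an AutoML classification model. But how does it work in a custom job, where I write the code myself and want to tune it?

The code above will be part of the task element of my own source distribution, and it still needs the part that writes the model out. I took that part from the GCS page.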

artifact_filename = 'ShotTypeModel.pkl'

# Save model artifact to local filesystem (doesn't persist)
local_path = artifact_filename
with open(local_path, 'wb') as model_file:
    pickle.dump(mod_addK, model_file)

# Upload model artifact to Cloud Storage
model_directory = os.environ['AIP_MODEL_DIR']
storage_path = os.path.join(model_directory, artifact_filename)
blob = storage.blob.Blob.from_string(storage_path, client=storage.Client())
blob.upload_from_filename(local_path)
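The upload step above depends on AIP_MODEL_DIR holding a gs:// URI, which is then joined with the file name and handed to Blob.from_string. As a rough sketch of what that URI consists of, the helpers below build the destination and split it into bucket and object name using only the standard library. The names artifact_uri and split_gcs_uri, and the bucket example-bucket, are made up for illustration and are not part of any Google library.

```python
import os
from typing import Tuple
from urllib.parse import urlparse


def artifact_uri(model_dir: str, filename: str) -> str:
    """Join a gs:// directory and a file name with exactly one slash."""
    return model_dir.rstrip("/") + "/" + filename


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/path/to/object' into (bucket, object name)."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a gs:// URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


if __name__ == "__main__":
    # Falls back to a placeholder (hypothetical) bucket when run
    # outside a training job, where AIP_MODEL_DIR is not set.
    model_dir = os.environ.get("AIP_MODEL_DIR", "gs://example-bucket/model/")
    destination = artifact_uri(model_dir, "ShotTypeModel.pkl")
    bucket_name, object_name = split_gcs_uri(destination)
    print(destination, bucket_name, object_name)
```

Joining with an explicit "/" rather than os.path.join keeps the result a valid URI even if the script is tried out on a machine whose path separator is not a forward slash.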

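There is a lot of documentation on Google's site that sounds like it should help but doesn't give me the specifics, for example "Using a managed dataset in a custom training application".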

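Answer:

The answer is that the storage bucket gives each object a URL of the form gs://bucket-name/path/to/file, and that URL is what you pass when loading the data.

You also need to include the gcsfs package for this to work.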

from google.cloud import storage
import os
import gcsfs
import pandas as pd
import pickle

#read in the file
print("Mod script starts")
ds = pd.read_csv("gs://shottypeids/ShotTypeModel_alldata.csv")
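One possible variation, sketched here under assumptions rather than taken from the answer: read the data location from an environment variable so the same script can run against a local CSV while testing and against the bucket in the training job. The variable name TRAINING_CSV_URI and the function load_dataset are hypothetical. GCP does not set this variable for you, so it would have to be supplied in the job configuration.

```python
import os

import pandas as pd

# Hypothetical variable name chosen for this sketch; not set by GCP.
DATA_URI_ENV = "TRAINING_CSV_URI"
DEFAULT_URI = "gs://shottypeids/ShotTypeModel_alldata.csv"


def load_dataset(uri: str) -> pd.DataFrame:
    """Read a CSV from a local path or a gs:// URI.

    pandas passes gs:// paths to fsspec, which looks up gcsfs by
    protocol, so gcsfs has to be installed in the training
    environment even though this function never calls it directly.
    """
    return pd.read_csv(uri)


if __name__ == "__main__":
    uri = os.environ.get(DATA_URI_ENV, DEFAULT_URI)
    print("Mod script starts, reading", uri)
    ds = load_dataset(uri)
    print(ds.shape)
```

When the script is packaged as a source distribution, gcsfs would also need to be listed among the package's install requirements so that it is present in the training container.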
