Training with validation using SageMaker's Random Cut Forest algorithm

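For the past few days I have been having trouble with SageMaker's built-in Random Cut Forest (RCF) algorithm.

I want to validate the model during training, but there is probably something I haven't fully understood.

First of all, fitting with only the training channel works fine: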

```python
import sagemaker

# region, role, bucket, prefix and train_location are defined earlier in the notebook.
# The third positional argument of retrieve() is the algorithm version, not a region;
# RCF has a single version, so it can simply be omitted.
container = sagemaker.image_uris.retrieve("randomcutforest", region)
print(container)

rcf = sagemaker.estimator.Estimator(
    image_uri=container,
    role=role,
    instance_count=1,
    sagemaker_session=sagemaker.Session(),
    instance_type="ml.m4.xlarge",
    data_location=f"s3://{bucket}/{prefix}/",
    output_path=f"s3://{bucket}/{prefix}/output",
)
rcf.set_hyperparameters(
    feature_dim=116,
    eval_metrics="precision_recall_fscore",
    num_samples_per_tree=256,
    num_trees=100,
)

# Training data is unlabeled, hence label_size=0
train_data = sagemaker.inputs.TrainingInput(
    s3_data=train_location,
    content_type="text/csv;label_size=0",
    distribution="ShardedByS3Key",
)
rcf.fit({"train": train_data})
```

```
[06/28/2021 09:45:24 INFO 140226936620864] Test data is not provided.
#metrics {"StartTime": 1624873524.6154933, "EndTime": 1624873524.6156445, "Dimensions": {"Algorithm": "RandomCutForest", "Host": "algo-1", "Operation": "training"}, "Metrics": {"setuptime": {"sum": 40.169477462768555, "count": 1, "min": 40.169477462768555, "max": 40.169477462768555}, "totaltime": {"sum": 13035.491704940796, "count": 1, "min": 13035.491704940796, "max": 13035.491704940796}}}

2021-06-28 09:45:50 Completed - Training job completed
ProfilerReport-1624873226: NoIssuesFound
Training seconds: 78
Billable seconds: 78
```

However, when I try to validate the model during training:

```python
train_data = sagemaker.inputs.TrainingInput(
    s3_data=train_location,
    content_type="text/csv;label_size=0",
    distribution="ShardedByS3Key",
)
val_data = sagemaker.inputs.TrainingInput(
    s3_data=val_location,
    content_type="text/csv;label_size=1",
    distribution="FullyReplicated",
)
rcf.fit({"train": train_data, "validation": val_data}, wait=True)
```
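The two content types above imply different CSV layouts. With `label_size=0` every column is a feature; with `label_size=1` the first column is the label, which for RCF test data is, according to the RCF documentation, 1 for an anomalous point and 0 for a normal one. Below is a minimal stdlib sketch of building both layouts and checking the column counts, assuming both files carry `feature_dim` feature columns (116 in the question, 3 here to keep it short). `to_csv` and `check_columns` are hypothetical helpers, not part of the SageMaker SDK.

```python
import csv
import io


def to_csv(rows):
    """Serialize rows to headerless CSV text (hypothetical helper)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def check_columns(csv_text, feature_dim, label_size):
    """Raise ValueError unless every row has label_size + feature_dim columns."""
    expected = label_size + feature_dim
    for i, row in enumerate(csv.reader(io.StringIO(csv_text))):
        if len(row) != expected:
            raise ValueError(f"row {i}: expected {expected} columns, got {len(row)}")
    return True


FEATURE_DIM = 3  # 116 in the question

# Train channel (text/csv;label_size=0): features only
train_rows = [[0.1, 0.2, 0.3], [0.2, 0.1, 0.4]]

# Test channel (text/csv;label_size=1): label first (1 = anomaly, 0 = normal), then features
features = [[0.1, 0.2, 0.3], [9.0, 8.5, 7.7]]
labels = [0, 1]
test_rows = [[label] + feats for label, feats in zip(labels, features)]

train_csv = to_csv(train_rows)
test_csv = to_csv(test_rows)
check_columns(train_csv, FEATURE_DIM, label_size=0)
check_columns(test_csv, FEATURE_DIM, label_size=1)
print(test_csv)  # prints two lines: 0,0.1,0.2,0.3 and 1,9.0,8.5,7.7
```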

I get the following error:

```
AWS Region: us-east-1
RoleArn: arn:aws:iam::517714493426:role/service-role/AmazonSageMaker-ExecutionRole-20210409T152960
382416733822.dkr.ecr.us-east-1.amazonaws.com/randomcutforest:1
2021-06-28 10:14:12 Starting - Starting the training job...
2021-06-28 10:14:14 Starting - Launching requested ML instances
ProfilerReport-1624875252: InProgress
......
2021-06-28 10:15:27 Starting - Preparing the instances for training.........
2021-06-28 10:17:07 Downloading - Downloading input data...
2021-06-28 10:17:27 Training - Downloading the training image..
Docker entrypoint called with argument(s): train
Running default environment configuration script
[06/28/2021 10:17:53 INFO 140648505521984] Reading default configuration from /opt/amazon/lib/python3.7/site-packages/algorithm/resources/default-conf.json: {'num_samples_per_tree': 256, 'num_trees': 100, 'force_dense': 'true', 'eval_metrics': ['accuracy', 'precision_recall_fscore'], 'epochs': 1, 'mini_batch_size': 1000, '_log_level': 'info', '_kvstore': 'dist_async', '_num_kv_servers': 'auto', '_num_gpus': 'auto', '_tuning_objective_metric': '', '_ftp_port': 8999}
[06/28/2021 10:17:53 INFO 140648505521984] Merging with provided configuration from /opt/ml/input/config/hyperparameters.json: {'num_trees': '100', 'num_samples_per_tree': '256', 'feature_dim': '116', 'eval_metrics': 'precision_recall_fscore'}
[06/28/2021 10:17:53 INFO 140648505521984] Final configuration: {'num_samples_per_tree': '256', 'num_trees': '100', 'force_dense': 'true', 'eval_metrics': 'precision_recall_fscore', 'epochs': 1, 'mini_batch_size': 1000, '_log_level': 'info', '_kvstore': 'dist_async', '_num_kv_servers': 'auto', '_num_gpus': 'auto', '_tuning_objective_metric': '', '_ftp_port': 8999, 'feature_dim': '116'}
[06/28/2021 10:17:53 ERROR 140648505521984] Customer Error: Unable to initialize the algorithm. Failed to validate input data configuration. (caused by ValidationError)

Caused by: Additional properties are not allowed ('validation' was unexpected)

Failed validating 'additionalProperties' in schema:
    {'$schema': 'http://json-schema.org/draft-04/schema#',
     'additionalProperties': False,
     'definitions': {'data_channel_replicated': {'properties': {'ContentType': {'type': 'string'},
                                                                'RecordWrapperType': {'$ref': '#/definitions/record_wrapper_type'},
                                                                'S3DistributionType': {'$ref': '#/definitions/s3_replicated_type'},
                                                                'TrainingInputMode': {'$ref': '#/definitions/training_input_mode'}},
                                                 'type': 'object'},
                     'data_channel_sharded': {'properties': {'ContentType': {'type': 'string'},
                                                             'RecordWrapperType': {'$ref': '#/definitions/record_wrapper_type'},
                                                             'S3DistributionType': {'$ref': '#/definitions/s3_sharded_type'},
                                                             'TrainingInputMode': {'$ref': '#/definitions/training_input_mode'}},
                                              'type': 'object'},
                     'record_wrapper_type': {'enum': ['None', 'Recordio'],
                                             'type': 'string'},
                     's3_replicated_type': {'enum': ['FullyReplicated'],
                                            'type': 'string'},
                     's3_sharded_type': {'enum': ['ShardedByS3Key'],
                                         'type': 'string'},
                     'training_input_mode': {'enum': ['File', 'Pipe'],
                                             'type': 'string'}},
     'properties': {'state': {'$ref': '#/definitions/data_channel'},
                    'test': {'$ref': '#/definitions/data_channel_replicated'},
                    'train': {'$ref': '#/definitions/data_channel_sharded'}},
     'required': ['train'],
     'type': 'object'}

On instance:
    {'train': {'ContentType': 'text/csv;label_size=0',
               'RecordWrapperType': 'None',
               'S3DistributionType': 'ShardedByS3Key',
               'TrainingInputMode': 'File'},
     'validation': {'ContentType': 'text/csv;label_size=1',
                    'RecordWrapperType': 'None',
                    'S3DistributionType': 'FullyReplicated',
                    'TrainingInputMode': 'File'}}

2021-06-28 10:18:10 Uploading - Uploading generated training model
2021-06-28 10:18:10 Failed - Training job failed
ProfilerReport-1624875252: Stopping
---------------------------------------------------------------------------
UnexpectedStatusException                 Traceback (most recent call last)
<ipython-input-34-c624ace00c69> in <module>
     33 
     34 
---> 35 rcf.fit({'train': train_data, 'validation': val_data}, wait=True)

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/estimator.py in fit(self, inputs, wait, logs, job_name, experiment_config)
    680         self.jobs.append(self.latest_training_job)
    681         if wait:
--> 682             self.latest_training_job.wait(logs=logs)
    683 
    684     def _compilation_job_name(self):

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/estimator.py in wait(self, logs)
   1623         # If logs are requested, call logs_for_jobs.
   1624         if logs != "None":
-> 1625             self.sagemaker_session.logs_for_job(self.job_name, wait=True, log_type=logs)
   1626         else:
   1627             self.sagemaker_session.wait_for_job(self.job_name)

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/session.py in logs_for_job(self, job_name, wait, poll, log_type)
   3679 
   3680         if wait:
-> 3681             self._check_job_status(job_name, description, "TrainingJobStatus")
   3682             if dot:
   3683                 print()

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/session.py in _check_job_status(self, job, desc, status_key_name)
   3243                 ),
   3244                 allowed_statuses=["Completed", "Stopped"],
-> 3245                 actual_status=status,
   3246             )
   3247 

UnexpectedStatusException: Error for Training job randomcutforest-2021-06-28-10-14-12-783: Failed. Reason: ClientError: Unable to initialize the algorithm. Failed to validate input data configuration. (caused by ValidationError)

Caused by: Additional properties are not allowed ('validation' was unexpected)

Failed validating 'additionalProperties' in schema:
    {'$schema': 'http://json-schema.org/draft-04/schema#',
     'additionalProperties': False,
     'definitions': {'data_channel_replicated': {'properties': {'ContentType': {'type': 'string'},
                                                                'RecordWrapperType': {'$ref': '#/definitions/record_wrapper_type'},
                                                                'S3DistributionType': {'$ref': '#/definitions/s3_replicated_type'},
                                                                'TrainingInputMode': {'$ref': '#/definitions/training_input_mode'}},
                                                 'type': 'object'},
                     'data_channel_sharded': {'properties': {'ContentType': {'type': 'string'},
```
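The schema in the log explains the failure: it sets `'additionalProperties': False`, requires `train`, and lists only `train`, `test` and `state` under `properties`, with `test` restricted to `FullyReplicated` and `train` to `ShardedByS3Key`. The sketch below mirrors just those rules in plain Python so a channel dict can be checked locally before launching a job. It is a simplified, hand-written re-implementation of the rules visible in the log, not the algorithm's actual validator, and `check_rcf_channels` is a hypothetical name.

```python
# Simplified rules copied from the schema printed in the training log
ALLOWED_CHANNELS = {"train", "test", "state"}
REQUIRED_CHANNELS = {"train"}
REQUIRED_DISTRIBUTION = {"train": "ShardedByS3Key", "test": "FullyReplicated"}


def check_rcf_channels(channels):
    """Return a list of problems for a {channel_name: config} dict (empty if OK)."""
    names = set(channels)
    problems = []
    for name in sorted(REQUIRED_CHANNELS - names):
        problems.append(f"'{name}' is a required channel")
    for name in sorted(names - ALLOWED_CHANNELS):
        problems.append(f"'{name}' was unexpected")
    for name, expected in REQUIRED_DISTRIBUTION.items():
        actual = channels.get(name, {}).get("S3DistributionType")
        if actual is not None and actual != expected:
            problems.append(f"'{name}' must use {expected}, got {actual}")
    return problems


# Channel configuration from the "On instance" part of the log
failing = {
    "train": {"ContentType": "text/csv;label_size=0", "S3DistributionType": "ShardedByS3Key"},
    "validation": {"ContentType": "text/csv;label_size=1", "S3DistributionType": "FullyReplicated"},
}
print(check_rcf_channels(failing))  # ["'validation' was unexpected"]

# Same inputs with the second channel renamed to 'test'
fixed = {"train": failing["train"], "test": failing["validation"]}
print(check_rcf_channels(fixed))  # []
```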

Could someone help me set up validation during training correctly? That would make my day. :-D

Kind regards, Christina

Answer:

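I found the error: the channel has to be named 'test' instead of 'validation'. The schema in the error message only allows the channels 'train' (required), 'test' and 'state', so any other channel name is rejected. With that change it works:

```python
# test_data: a TrainingInput like val_data above (text/csv;label_size=1, FullyReplicated)
rcf.fit({'train': train_data, 'test': test_data}, wait=True)
```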
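With the test channel in place, RCF can report the metrics requested through `eval_metrics='precision_recall_fscore'`; per the RCF documentation, the optional test channel is used to compute accuracy, precision, recall and F1 on labeled data after training rather than as a validation set during training. As a reminder of what those numbers measure, here is a plain-Python sketch of binary precision, recall and F1 with 1 = anomaly as the positive class. The labels and predictions are made up, and how RCF turns anomaly scores into predicted labels internally is not modeled here.

```python
def precision_recall_f1(y_true, y_pred):
    """Binary precision, recall and F1, treating 1 (anomaly) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # anomalies correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal points flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # anomalies missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical test labels and hypothetical predicted labels
y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0, 1, 1, 0, 0, 1]
print(precision_recall_f1(y_true, y_pred))
```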
