I have Amazon sample code for running comprehend.start_topics_detection_job. This is the code after I filled in the variables for my job:
import re
import csv
import pytz
import boto3
import json

# https://docs.aws.amazon.com/code-samples/latest/catalog/python-comprehend-TopicModeling.py.html
# https://docs.aws.amazon.com/comprehend/latest/dg/API_InputDataConfig.html

# Set these values before running the program
input_s3_url = "s3://comprehend-topic-modelling-bucket/input_800_cleaned_articles/"
input_doc_format = "ONE_DOC_PER_LINE"
output_s3_url = "s3://comprehend-topic-modelling-bucket/output"
data_access_role_arn = "arn:aws:iam::372656143103:role/access-aws-services-from-sagemaker"
number_of_topics = 30

# Set up the job configuration
input_data_config = {"S3Uri": input_s3_url, "InputFormat": input_doc_format}
output_data_config = {"S3Uri": output_s3_url}

# Begin a job to detect the topics in the document collection
comprehend = boto3.client('comprehend')
start_result = comprehend.start_topics_detection_job(
    NumberOfTopics=number_of_topics,
    InputDataConfig=input_data_config,
    OutputDataConfig=output_data_config,
    DataAccessRoleArn=data_access_role_arn)

# Output the results
print('Start Topic Detection Job: ' + json.dumps(start_result))
job_id = start_result['JobId']
print(f'job_id: {job_id}')

# Retrieve and output information about the job
describe_result = comprehend.describe_topics_detection_job(JobId=job_id)
print('Describe Job: ' + json.dumps(describe_result))  # <=== LINE 36

# List and output information about current jobs
list_result = comprehend.list_topics_detection_jobs()
print('list_topics_detection_jobs_result: ' + json.dumps(list_result))
It fails with the following error:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-840a7ee043d4> in <module>()
     34 # Retrieve and output information about the job
     35 describe_result = comprehend.describe_topics_detection_job(JobId=job_id)
---> 36 print('Describe Job: ' + json.dumps(describe_result))
     37 
     38 # List and output information about current jobs

~/anaconda3/envs/python3/lib/python3.6/json/__init__.py in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
    229         cls is None and indent is None and separators is None and
    230         default is None and not sort_keys and not kw):
--> 231         return _default_encoder.encode(obj)
    232     if cls is None:
    233         cls = JSONEncoder

~/anaconda3/envs/python3/lib/python3.6/json/encoder.py in encode(self, o)
    197         # exceptions aren't as detailed. The list call should be roughly
    198         # equivalent to the PySequence_Fast that ''.join() would do.
--> 199         chunks = self.iterencode(o, _one_shot=True)
    200         if not isinstance(chunks, (list, tuple)):
    201             chunks = list(chunks)

~/anaconda3/envs/python3/lib/python3.6/json/encoder.py in iterencode(self, o, _one_shot)
    255                 self.key_separator, self.item_separator, self.sort_keys,
    256                 self.skipkeys, _one_shot)
--> 257         return _iterencode(o, 0)
    258 
    259 def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,

~/anaconda3/envs/python3/lib/python3.6/json/encoder.py in default(self, o)
    178         """
    179         raise TypeError("Object of type '%s' is not JSON serializable" %
--> 180                         o.__class__.__name__)
    181 
    182     def encode(self, o):

TypeError: Object of type 'datetime' is not JSON serializable
It fails the instant I press "Run". I suspect the call to comprehend.start_topics_detection_job failed, which then caused the error on line 36, print('Describe Job: ' + json.dumps(describe_result)).
What am I missing?
Update
The notebook and the code above use the same IAM role. These are the permissions currently assigned to that IAM role:
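Answer:
It turns out the call to comprehend.describe_topics_detection_job was fine. It simply returned something in describe_result that is not JSON serializable (a datetime, according to the traceback), so json.dumps(describe_result) raised the error.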
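One common way around this is to give json.dumps a fallback for types it cannot encode. The sketch below is not from the original sample: it uses a made-up dict in place of the real response (the job ID, status, and timestamp are placeholders) and passes default=str so the datetime is written out as a string.

```python
import json
from datetime import datetime, timezone

# Hypothetical stand-in for the dict returned by
# comprehend.describe_topics_detection_job; the values are placeholders.
# The real response carries datetime values such as SubmitTime.
describe_result = {
    "TopicsDetectionJobProperties": {
        "JobId": "example-job-id",
        "JobStatus": "SUBMITTED",
        "SubmitTime": datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc),
    }
}

# json.dumps calls `default` for any object it cannot encode natively;
# str() turns the datetime into readable text.
serialized = json.dumps(describe_result, default=str)
print('Describe Job: ' + serialized)
```

The same default=str argument should work for the list_topics_detection_jobs output as well, assuming it also contains datetime fields. If you need to parse the timestamps back later, a custom function that returns obj.isoformat() may be a better fit than str.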