Invoking a Huggingface summarization model on an AWS serverless SageMaker endpoint

I am trying to run a simple text-summarization function on an AWS serverless SageMaker endpoint, using the huggingface-summarization-bert-small2bert-small-finetuned-cnn-daily-mail-summarization model.

This is my AWS CloudFormation template:

SageMakerModel:
  Type: AWS::SageMaker::Model
  Properties:
    ModelName: SummarizationModel
    Containers:
      - Image: "763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:1.13.1-transformers4.26.0-cpu-py39-ubuntu20.04"
        ModelDataUrl: "s3://jumpstart-cache-prod-us-east-1/huggingface-infer/infer-huggingface-summarization-bert-small2bert-small-finetuned-cnn-daily-mail-summarization.tar.gz"
        Mode: SingleModel
    ExecutionRoleArn: !GetAtt SageMakerExecutionRole.Arn

SageMakerEndpointConfig:
  Type: "AWS::SageMaker::EndpointConfig"
  Properties:
    ProductionVariants:
      - ModelName: !GetAtt SageMakerModel.ModelName
        VariantName: "ServerlessVariant"
        ServerlessConfig:
          MaxConcurrency: 1
          MemorySizeInMB: 2048

SageMakerEndpoint:
  Type: "AWS::SageMaker::Endpoint"
  Properties:
    EndpointName: SummarizationEndpoint
    EndpointConfigName: !GetAtt SageMakerEndpointConfig.EndpointConfigName

As far as I can tell, the model is deployed successfully.

I deployed a Python Lambda function to invoke the endpoint. This is my code:

import boto3
import json

client = boto3.client('runtime.sagemaker')

payload = {
    'inputs': 'Summarize this text: This is a beautiful day. I am happy. I am going to the park.'
}

response = client.invoke_endpoint(
    EndpointName="SummarizationEndpoint",
    ContentType="application/json",
    Accept="application/json",
    Body=json.dumps(payload)
    # Body=bytes(json.dumps(payload), 'utf-8')  # alternative attempt - not working
    # Body=json.dumps(payload).encode("utf-8")  # alternative attempt - not working
)

When I run this code, I get the following error:

An error occurred: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from model with message "{   "code": 400,   "type": "InternalServerException",    "message": "\u0027str\u0027 object is not callable"}".

Since this is a ModelError, I assume the model is deployed and the inference pipeline is being invoked; what I am not sure about is the payload format. Judging from the test code here, my guess is that the text to be summarized should be passed in the inputs attribute of the payload, as done here. Looking at SummarizationPipeline, though, I don't quite understand the comment here – should there be a documents attribute somewhere? I tried all possible combinations of inputs, documents, etc., but with no luck.
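To make "all possible combinations" concrete, these are the two payload shapes I tried (the documents variant is only my guess based on the SummarizationPipeline comment, not a format I found documented anywhere; both fail the same way):

# Variant 1: text under "inputs", as in the test code linked above
payload = {
    'inputs': 'Summarize this text: This is a beautiful day. I am happy. I am going to the park.'
}

# Variant 2: text under "documents", guessed from the SummarizationPipeline comment
payload = {
    'documents': 'Summarize this text: This is a beautiful day. I am happy. I am going to the park.'
}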

What is the correct way to pass the payload to the model? Can I see a working example?

Update 1: This is the CloudWatch log when I use the payload = {'inputs': '...'} version:

[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.9/site-packages/transformers/pipelines/base.py", line 1084, in __call__
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Prediction error
[INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 3
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Traceback (most recent call last):
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.9/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 234, in handle
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     response = self.transform_fn(self.model, input_data, content_type, accept)
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.9/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 190, in transform_fn
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     predictions = self.predict(processed_data, model)
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.9/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 158, in predict
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     prediction = model(inputs)
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.9/site-packages/transformers/pipelines/text2text_generation.py", line 165, in __call__
[INFO ] W-9000-model ACCESS_LOG - /127.0.0.1:48184 "POST /invocations HTTP/1.1" 400 3416
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     result = super().__call__(*args, **kwargs)
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.9/site-packages/transformers/pipelines/base.py", line 1084, in __call__
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.9/site-packages/transformers/pipelines/base.py", line 1090, in run_single
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     model_inputs = self.preprocess(inputs, **preprocess_params)
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.9/site-packages/transformers/pipelines/text2text_generation.py", line 175, in preprocess
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     inputs = self._parse_and_tokenize(inputs, truncation=truncation, **kwargs)
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.9/site-packages/transformers/pipelines/text2text_generation.py", line 130, in _parse_and_tokenize
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     inputs = self.tokenizer(*args, padding=padding, truncation=truncation, return_tensors=self.framework)
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - TypeError: 'str' object is not callable

I looked at the code of handler_service.py. Since line 158 is executed, the payload must have successfully made it past line 151

        inputs = data.pop("inputs", data)

…which confirms that inputs has to be the attribute name.
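As a standalone illustration of what that line does (my own snippet, not the toolkit's code): when the "inputs" key is present its value is extracted, otherwise the whole payload dict is forwarded to the pipeline unchanged.

# Illustrates handler_service.py line 151: inputs = data.pop("inputs", data)
data_with_inputs = {"inputs": "This is a beautiful day."}
data_without_inputs = {"documents": "This is a beautiful day."}

print(data_with_inputs.pop("inputs", data_with_inputs))
# -> 'This is a beautiful day.' (the text itself is handed to the pipeline)

print(data_without_inputs.pop("inputs", data_without_inputs))
# -> {'documents': 'This is a beautiful day.'} (the whole dict is handed to the pipeline)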

Looking further down the stack trace, however, I didn't find anything interesting: the inputs are being passed to the tokenizer, and that is where my stack trace ends.

Update 2: I noticed that the same invocation code works with another model. Here is the YAML of the model that works:

SageMakerModel2:
  Type: AWS::SageMaker::Model
  Properties:
    ModelName: SummarizationModel2
    Containers:
      - Image: "763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:1.7.1-transformers4.6.1-cpu-py36-ubuntu18.04"
        ModelDataUrl: "s3://jumpstart-cache-prod-us-east-1/huggingface-infer/infer-huggingface-translation-t5-small.tar.gz"
        Mode: SingleModel
    ExecutionRoleArn: !GetAtt SageMakerExecutionRole.Arn

After further analysis, I learned that the multi-model server calls handler_service.initialize when loading the model, which creates the pipeline using the pipeline() function.

I then downloaded both models and tried to instantiate a pipeline from each of them on my machine, to see what happens to the tokenizer. Here is the code…

from transformers import pipeline

# p1 - the model that does not work
p1 = pipeline("summarization", "/REDACTED/Code/infer-huggingface-summarization-bert-small2bert-small-finetuned-cnn-daily-mail-summarization")
# p2 - the model that works
p2 = pipeline("text2text-generation", "/REDACTED/Code/infer-huggingface-translation-t5-small/")

print("Tokenizer of P1: " + str(type(p1.tokenizer)))
print("Tokenizer of P2: " + str(type(p2.tokenizer)))

The code proves that p1.tokenizer is a NoneType, while p2.tokenizer is of the class 'transformers.models.t5.tokenization_t5_fast.T5TokenizerFast'.

Investigating the code of the pipeline() function further, I found that on this line…

load_tokenizer = type(model_config) in TOKENIZER_MAPPING or model_config.tokenizer_class is not None

…load_tokenizer is set to False for p1, because type(model_config) is not found in TOKENIZER_MAPPING, whereas for p2 load_tokenizer is True, because it is found (see here and here). I'm not sure this finding is relevant, though, because the model_fn() function in the model's inference.py does create a tokenizer using tokenizer = AutoTokenizer.from_pretrained(model_dir) and then passes it to SummarizationPipeline. I tried creating the tokenizer locally the same way…

tokenizer = AutoTokenizer.from_pretrained("/REDACTED/Code/infer-huggingface-summarization-bert-small2bert-small-finetuned-cnn-daily-mail-summarization")

print(str(type(tokenizer)))

…and I did get an instance of type transformers.models.bert.tokenization_bert_fast.BertTokenizerFast.
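This suggests a local sanity check: create the tokenizer explicitly and hand it to pipeline(), the same way the model's inference.py does for SummarizationPipeline. A minimal sketch (using the generic pipeline() factory with tokenizer= is my shortcut; the original inference.py constructs the SummarizationPipeline directly):

from transformers import AutoTokenizer, pipeline

model_dir = "/REDACTED/Code/infer-huggingface-summarization-bert-small2bert-small-finetuned-cnn-daily-mail-summarization"

# Create the tokenizer explicitly, as the model's inference.py model_fn() does...
tokenizer = AutoTokenizer.from_pretrained(model_dir)

# ...and pass it to the pipeline instead of relying on the automatic lookup,
# which is skipped for this model (load_tokenizer == False).
summarizer = pipeline("summarization", model=model_dir, tokenizer=tokenizer)

print(type(summarizer.tokenizer))  # expected: BertTokenizerFast instead of NoneType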

(I admit I don't fully understand everything that is going on here, but I will keep investigating…)


Answer:

TLDR: The problem was that my model tar.gz file (i.e. s3://jumpstart-cache-prod-us-east-1/huggingface-infer/infer-huggingface-summarization-bert-small2bert-small-finetuned-cnn-daily-mail-summarization.tar.gz) was missing a custom inference.py script. As a result, sagemaker_huggingface_inference_toolkit loaded the model with its default load() function, which did not create a tokenizer instance, so the call to the tokenizer failed with the error 'str' object is not callable.

This is what I did to solve the problem:

  • I downloaded the model's tar.gz file from S3.
  • I looked up the location of the corresponding sourcedir.tar.gz by calling sagemaker.script_uris (see here) and downloaded the sourcedir.tar.gz (see the sketch after this list).
  • I extracted both the model tar.gz and the sourcedir.tar.gz, and moved the code from sourcedir into a subdirectory named code inside the model directory.
  • I packed the model directory (which now includes the code subdirectory containing inference.py) into a tar.gz and uploaded it to my own S3 bucket.
  • I used the URL in this new S3 bucket as my ModelDataUrl.
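A rough sketch of the sourcedir lookup step from the second bullet, using sagemaker.script_uris (the model_id and version values below are my assumptions based on the artifact name and may need adjusting):

from sagemaker import script_uris

# Look up the S3 location of the JumpStart sourcedir.tar.gz (which contains inference.py)
# for this model. model_id is assumed from the artifact name above.
sourcedir_uri = script_uris.retrieve(
    region="us-east-1",
    model_id="huggingface-summarization-bert-small2bert-small-finetuned-cnn-daily-mail-summarization",
    model_version="*",
    script_scope="inference",
)
print(sourcedir_uri)  # prints the S3 URI of the sourcedir.tar.gz to download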

More details: While debugging the problem I discovered a few more things. Neither of the two model tar.gz files I used contains an inference.py, yet one works and the other doesn't. I tried to understand the call stack and why one fails while the other doesn't. If you are a beginner (like me) and want to understand more of what is going on, I'm sharing it here.

  1. The docker image specified in the SageMaker Image property starts the multi-model-server via the Python script specified in its ENTRYPOINT. It does this by calling sagemaker_huggingface_inference_toolkit.mms_model_server.start_model_server(). That function passes the handler_service to be used to the command line that starts the MMS server. The MMS server itself is installed in the docker image (/opt/conda/bin/multi-model-server).
  2. MMS loads sagemaker_huggingface_inference_toolkit.handler_service and calls its __init__() constructor, followed by its initialize() function.
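To tie this back to the TLDR, here is a conceptual sketch (my simplification, not the toolkit's actual code) of the difference between the default load path used when no inference.py is packaged and what the model's own inference.py model_fn() does:

from transformers import AutoTokenizer, pipeline

def default_load(model_dir):
    # Roughly what the toolkit's default load() amounts to when the archive has
    # no code/inference.py: build the pipeline from the model directory alone.
    # For this summarization model, pipeline() then skips tokenizer creation
    # (load_tokenizer == False), so the pipeline ends up without a usable
    # tokenizer and the first tokenization attempt fails.
    return pipeline("summarization", model_dir)

def custom_model_fn(model_dir):
    # Roughly what the model's own inference.py model_fn() does: create the
    # tokenizer explicitly and pass it in, so tokenization works at inference
    # time. (The real script builds a SummarizationPipeline directly.)
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    return pipeline("summarization", model=model_dir, tokenizer=tokenizer)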
