Invoking a Huggingface summarization model on an AWS serverless SageMaker endpoint

I am trying to run a simple text-summarization function on an AWS serverless SageMaker endpoint, using the huggingface-summarization-bert-small2bert-small-finetuned-cnn-daily-mail-summarization model.

This is my AWS CloudFormation template:

SageMakerModel:
  Type: AWS::SageMaker::Model
  Properties:
    ModelName: SummarizationModel
    Containers:
      - Image: "763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:1.13.1-transformers4.26.0-cpu-py39-ubuntu20.04"
        ModelDataUrl: "s3://jumpstart-cache-prod-us-east-1/huggingface-infer/infer-huggingface-summarization-bert-small2bert-small-finetuned-cnn-daily-mail-summarization.tar.gz"
        Mode: SingleModel
    ExecutionRoleArn: !GetAtt SageMakerExecutionRole.Arn

SageMakerEndpointConfig:
  Type: "AWS::SageMaker::EndpointConfig"
  Properties:
    ProductionVariants:
      - ModelName: !GetAtt SageMakerModel.ModelName
        VariantName: "ServerlessVariant"
        ServerlessConfig:
          MaxConcurrency: 1
          MemorySizeInMB: 2048

SageMakerEndpoint:
  Type: "AWS::SageMaker::Endpoint"
  Properties:
    EndpointName: SummarizationEndpoint
    EndpointConfigName: !GetAtt SageMakerEndpointConfig.EndpointConfigName

As far as I can tell, the model is deployed successfully.

I deployed a Python Lambda function to invoke the endpoint. This is my code:

import boto3
import json

client = boto3.client('runtime.sagemaker')

payload = {
    'inputs': 'Summarize this text: This is a beautiful day. I am happy. I am going to the park.'
}

response = client.invoke_endpoint(
    EndpointName="SummarizationEndpoint",
    ContentType="application/json",
    Accept="application/json",
    Body=json.dumps(payload)
    # Body=bytes(json.dumps(payload), 'utf-8')  # alternative attempt - not working
    # Body=json.dumps(payload).encode("utf-8")  # alternative attempt - not working
)

When I run this code, I get the following error:

An error occurred: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from model with message "{   "code": 400,   "type": "InternalServerException",    "message": "\u0027str\u0027 object is not callable"}".

Since this is a ModelError, I assume the model is deployed and the inference pipeline is being invoked; what I am not sure about is the payload format. Judging from the test code here, my guess is that the text to be summarized should be passed in the inputs attribute of the payload, as done here. Looking at SummarizationPipeline, though, I don't quite understand the comment here – should there be a documents attribute somewhere? I tried all possible combinations of inputs, documents, etc., but with no luck.
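To make "all possible combinations" concrete, these are the two payload shapes I tried (the documents variant is only my guess based on the SummarizationPipeline comment, not a format I found documented anywhere; both fail the same way):

# Variant 1: text under "inputs", as in the test code linked above
payload = {
    'inputs': 'Summarize this text: This is a beautiful day. I am happy. I am going to the park.'
}

# Variant 2: text under "documents", guessed from the SummarizationPipeline comment
payload = {
    'documents': 'Summarize this text: This is a beautiful day. I am happy. I am going to the park.'
}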

What is the correct way to pass the payload to the model? Can I see a working example?

Update 1: This is the CloudWatch log when I use the payload = {'inputs': '...'} version:

[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.9/site-packages/transformers/pipelines/base.py", line 1084, in __call__
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Prediction error
[INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 3
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Traceback (most recent call last):
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.9/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 234, in handle
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     response = self.transform_fn(self.model, input_data, content_type, accept)
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.9/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 190, in transform_fn
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     predictions = self.predict(processed_data, model)
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.9/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 158, in predict
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     prediction = model(inputs)
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.9/site-packages/transformers/pipelines/text2text_generation.py", line 165, in __call__
[INFO ] W-9000-model ACCESS_LOG - /127.0.0.1:48184 "POST /invocations HTTP/1.1" 400 3416
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     result = super().__call__(*args, **kwargs)
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.9/site-packages/transformers/pipelines/base.py", line 1084, in __call__
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.9/site-packages/transformers/pipelines/base.py", line 1090, in run_single
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     model_inputs = self.preprocess(inputs, **preprocess_params)
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.9/site-packages/transformers/pipelines/text2text_generation.py", line 175, in preprocess
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     inputs = self._parse_and_tokenize(inputs, truncation=truncation, **kwargs)
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.9/site-packages/transformers/pipelines/text2text_generation.py", line 130, in _parse_and_tokenize
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     inputs = self.tokenizer(*args, padding=padding, truncation=truncation, return_tensors=self.framework)
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - TypeError: 'str' object is not callable

I looked at the code of handler_service.py. Since line 158 is executed, the payload must have successfully made it past line 151

        inputs = data.pop("inputs", data)

…which confirms that inputs has to be the attribute name.
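As a standalone illustration of what that line does (my own snippet, not the toolkit's code): when the "inputs" key is present its value is extracted, otherwise the whole payload dict is forwarded to the pipeline unchanged.

# Illustrates handler_service.py line 151: inputs = data.pop("inputs", data)
data_with_inputs = {"inputs": "This is a beautiful day."}
data_without_inputs = {"documents": "This is a beautiful day."}

print(data_with_inputs.pop("inputs", data_with_inputs))
# -> 'This is a beautiful day.' (the text itself is handed to the pipeline)

print(data_without_inputs.pop("inputs", data_without_inputs))
# -> {'documents': 'This is a beautiful day.'} (the whole dict is handed to the pipeline)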

Looking further down the stack trace, however, I didn't find anything interesting: the inputs are being passed to the tokenizer, and that is where my stack trace ends.

Update 2: I noticed that the same invocation code works with another model. Here is the YAML of the model that works:

SageMakerModel2:
  Type: AWS::SageMaker::Model
  Properties:
    ModelName: SummarizationModel2
    Containers:
      - Image: "763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:1.7.1-transformers4.6.1-cpu-py36-ubuntu18.04"
        ModelDataUrl: "s3://jumpstart-cache-prod-us-east-1/huggingface-infer/infer-huggingface-translation-t5-small.tar.gz"
        Mode: SingleModel
    ExecutionRoleArn: !GetAtt SageMakerExecutionRole.Arn

After further analysis, I learned that the multi-model server calls handler_service.initialize when loading the model, which creates the pipeline using the pipeline() function.

I then downloaded both models and tried to instantiate a pipeline from each of them on my machine, to see what happens to the tokenizer. Here is the code…

from transformers import pipeline

# p1 - the model that does not work
p1 = pipeline("summarization", "/REDACTED/Code/infer-huggingface-summarization-bert-small2bert-small-finetuned-cnn-daily-mail-summarization")
# p2 - the model that works
p2 = pipeline("text2text-generation", "/REDACTED/Code/infer-huggingface-translation-t5-small/")

print("Tokenizer of P1: " + str(type(p1.tokenizer)))
print("Tokenizer of P2: " + str(type(p2.tokenizer)))

The code proves that p1.tokenizer is a NoneType, while p2.tokenizer is of the class 'transformers.models.t5.tokenization_t5_fast.T5TokenizerFast'.

Investigating the code of the pipeline() function further, I found that on this line…

load_tokenizer = type(model_config) in TOKENIZER_MAPPING or model_config.tokenizer_class is not None

…load_tokenizer is set to False for p1, because type(model_config) is not found in TOKENIZER_MAPPING, whereas for p2 load_tokenizer is True, because it is found (see here and here). I'm not sure this finding is relevant, though, because the model_fn() function in the model's inference.py does create a tokenizer using tokenizer = AutoTokenizer.from_pretrained(model_dir) and then passes it to SummarizationPipeline. I tried creating the tokenizer locally the same way…

tokenizer = AutoTokenizer.from_pretrained("/REDACTED/Code/infer-huggingface-summarization-bert-small2bert-small-finetuned-cnn-daily-mail-summarization")

print(str(type(tokenizer)))

…and I did get an instance of type transformers.models.bert.tokenization_bert_fast.BertTokenizerFast.
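This suggests a local sanity check: create the tokenizer explicitly and hand it to pipeline(), the same way the model's inference.py does for SummarizationPipeline. A minimal sketch (using the generic pipeline() factory with tokenizer= is my shortcut; the original inference.py constructs the SummarizationPipeline directly):

from transformers import AutoTokenizer, pipeline

model_dir = "/REDACTED/Code/infer-huggingface-summarization-bert-small2bert-small-finetuned-cnn-daily-mail-summarization"

# Create the tokenizer explicitly, as the model's inference.py model_fn() does...
tokenizer = AutoTokenizer.from_pretrained(model_dir)

# ...and pass it to the pipeline instead of relying on the automatic lookup,
# which is skipped for this model (load_tokenizer == False).
summarizer = pipeline("summarization", model=model_dir, tokenizer=tokenizer)

print(type(summarizer.tokenizer))  # expected: BertTokenizerFast instead of NoneType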

(I admit I don't fully understand everything that is going on here, but I will keep investigating…)


Answer:

TLDR: The problem was that my model tar.gz file (i.e. s3://jumpstart-cache-prod-us-east-1/huggingface-infer/infer-huggingface-summarization-bert-small2bert-small-finetuned-cnn-daily-mail-summarization.tar.gz) was missing a custom inference.py script. As a result, sagemaker_huggingface_inference_toolkit loaded the model with its default load() function, which did not create a tokenizer instance, so the call to the tokenizer failed with the error 'str' object is not callable.

This is what I did to solve the problem:

  • I downloaded the model's tar.gz file from S3.
  • I looked up the location of the corresponding sourcedir.tar.gz by calling sagemaker.script_uris (see here) and downloaded the sourcedir.tar.gz (see the sketch after this list).
  • I extracted both the model tar.gz and the sourcedir.tar.gz, and moved the code from sourcedir into a subdirectory named code inside the model directory.
  • I packed the model directory (which now includes the code subdirectory containing inference.py) into a tar.gz and uploaded it to my own S3 bucket.
  • I used the URL in this new S3 bucket as my ModelDataUrl.
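A rough sketch of the sourcedir lookup step from the second bullet, using sagemaker.script_uris (the model_id and version values below are my assumptions based on the artifact name and may need adjusting):

from sagemaker import script_uris

# Look up the S3 location of the JumpStart sourcedir.tar.gz (which contains inference.py)
# for this model. model_id is assumed from the artifact name above.
sourcedir_uri = script_uris.retrieve(
    region="us-east-1",
    model_id="huggingface-summarization-bert-small2bert-small-finetuned-cnn-daily-mail-summarization",
    model_version="*",
    script_scope="inference",
)
print(sourcedir_uri)  # prints the S3 URI of the sourcedir.tar.gz to download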

More details: While debugging the problem I discovered a few more things. Neither of the two model tar.gz files I used contains an inference.py, yet one works and the other doesn't. I tried to understand the call stack and why one fails while the other doesn't. If you are a beginner (like me) and want to understand more of what is going on, I'm sharing it here.

  1. The docker image specified in the SageMaker Image property starts the multi-model-server via the Python script specified in its ENTRYPOINT. It does this by calling sagemaker_huggingface_inference_toolkit.mms_model_server.start_model_server(). That function passes the handler_service to be used to the command line that starts the MMS server. The MMS server itself is installed in the docker image (/opt/conda/bin/multi-model-server).
  2. MMS loads sagemaker_huggingface_inference_toolkit.handler_service and calls its __init__() constructor, followed by its initialize() function.
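To tie this back to the TLDR, here is a conceptual sketch (my simplification, not the toolkit's actual code) of the difference between the default load path used when no inference.py is packaged and what the model's own inference.py model_fn() does:

from transformers import AutoTokenizer, pipeline

def default_load(model_dir):
    # Roughly what the toolkit's default load() amounts to when the archive has
    # no code/inference.py: build the pipeline from the model directory alone.
    # For this summarization model, pipeline() then skips tokenizer creation
    # (load_tokenizer == False), so the pipeline ends up without a usable
    # tokenizer and the first tokenization attempt fails.
    return pipeline("summarization", model_dir)

def custom_model_fn(model_dir):
    # Roughly what the model's own inference.py model_fn() does: create the
    # tokenizer explicitly and pass it in, so tokenization works at inference
    # time. (The real script builds a SummarizationPipeline directly.)
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    return pipeline("summarization", model=model_dir, tokenizer=tokenizer)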
