I'm trying to run a simple text-summarization function on an AWS serverless SageMaker endpoint, using the huggingface-summarization-bert-small2bert-small-finetuned-cnn-daily-mail-summarization model.
Here is my AWS CloudFormation template:
```yaml
SageMakerModel:
  Type: AWS::SageMaker::Model
  Properties:
    ModelName: SummarizationModel
    Containers:
      - Image: "763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:1.13.1-transformers4.26.0-cpu-py39-ubuntu20.04"
        ModelDataUrl: "s3://jumpstart-cache-prod-us-east-1/huggingface-infer/infer-huggingface-summarization-bert-small2bert-small-finetuned-cnn-daily-mail-summarization.tar.gz"
        Mode: SingleModel
    ExecutionRoleArn: !GetAtt SageMakerExecutionRole.Arn

SageMakerEndpointConfig:
  Type: "AWS::SageMaker::EndpointConfig"
  Properties:
    ProductionVariants:
      - ModelName: !GetAtt SageMakerModel.ModelName
        VariantName: "ServerlessVariant"
        ServerlessConfig:
          MaxConcurrency: 1
          MemorySizeInMB: 2048

SageMakerEndpoint:
  Type: "AWS::SageMaker::Endpoint"
  Properties:
    EndpointName: SummarizationEndpoint
    EndpointConfigName: !GetAtt SageMakerEndpointConfig.EndpointConfigName
```
As far as I can tell, the model was deployed successfully.

I deployed a Python Lambda function to invoke the endpoint. Here is my code:
```python
import json

import boto3

client = boto3.client('runtime.sagemaker')

payload = {
    'inputs': 'Summarize this text: This is a beautiful day. I am happy. I am going to the park.'
}

response = client.invoke_endpoint(
    EndpointName="SummarizationEndpoint",
    ContentType="application/json",
    Accept="application/json",
    Body=json.dumps(payload)
    # Body=bytes(json.dumps(payload), 'utf-8')    # alternative attempt - not working
    # Body=json.dumps(payload).encode("utf-8")    # alternative attempt - not working
)
```
When I run this code, I get the following error:
```
An error occurred: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from model with message "{ "code": 400, "type": "InternalServerException", "message": "\u0027str\u0027 object is not callable"}".
```
Since this is a `ModelError`, I assume the model has been deployed and the inference pipeline is being invoked. However, I'm not sure about the payload format. Judging from the test code here, I would guess that the text to be summarized should be passed in the payload's `inputs` attribute, as it is there. But looking at `SummarizationPipeline`, I don't quite understand the comment here – should there be a `documents` attribute somewhere? I have tried every combination of `inputs`, `documents`, etc. that I could think of, but nothing worked.

What is the correct way to pass the payload to the model? Can I see a working example?
Update 1: These are the CloudWatch logs when I use the `payload = {'inputs': '...'}` version:
```
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.9/site-packages/transformers/pipelines/base.py", line 1084, in __call__
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Prediction error
[INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 3
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Traceback (most recent call last):
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.9/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 234, in handle
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - response = self.transform_fn(self.model, input_data, content_type, accept)
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.9/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 190, in transform_fn
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - predictions = self.predict(processed_data, model)
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.9/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 158, in predict
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - prediction = model(inputs)
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.9/site-packages/transformers/pipelines/text2text_generation.py", line 165, in __call__
[INFO ] W-9000-model ACCESS_LOG - /127.0.0.1:48184 "POST /invocations HTTP/1.1" 400 3416
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - result = super().__call__(*args, **kwargs)
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.9/site-packages/transformers/pipelines/base.py", line 1084, in __call__
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.9/site-packages/transformers/pipelines/base.py", line 1090, in run_single
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - model_inputs = self.preprocess(inputs, **preprocess_params)
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.9/site-packages/transformers/pipelines/text2text_generation.py", line 175, in preprocess
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - inputs = self._parse_and_tokenize(inputs, truncation=truncation, **kwargs)
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.9/site-packages/transformers/pipelines/text2text_generation.py", line 130, in _parse_and_tokenize
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - inputs = self.tokenizer(*args, padding=padding, truncation=truncation, return_tensors=self.framework)
[INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - TypeError: 'str' object is not callable
```
I looked at the code of `handler_service.py`. Since line 158 is reached, the payload must have made it past line 151:

```python
inputs = data.pop("inputs", data)
```

…which confirms that `inputs` has to be the attribute name. Going further down the stack trace, however, I don't see anything interesting: the input is passed to the `tokenizer`, and that is where my stack trace ends.
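To make the failure mode concrete, here is a simplified sketch of what the handler appears to do with the request body. This is my reading of the stack trace above, not the actual toolkit code:

```python
import json

def handle(body, pipeline_model):
    """Simplified view of handler_service.transform_fn/predict,
    reconstructed from the stack trace above (not the real toolkit code)."""
    data = json.loads(body)              # deserialize the JSON request
    inputs = data.pop("inputs", data)    # line 151: 'inputs' is the expected key
    # line 158: pipeline_model is the Transformers pipeline built at load time;
    # calling it ends up in self.tokenizer(...), which fails here because no
    # usable tokenizer was set up -> TypeError: 'str' object is not callable
    return pipeline_model(inputs)
```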
Update 2: I noticed that the same invocation code works against another model. Here is the YAML of the model that works:
```yaml
SageMakerModel2:
  Type: AWS::SageMaker::Model
  Properties:
    ModelName: SummarizationModel2
    Containers:
      - Image: "763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:1.7.1-transformers4.6.1-cpu-py36-ubuntu18.04"
        ModelDataUrl: "s3://jumpstart-cache-prod-us-east-1/huggingface-infer/infer-huggingface-translation-t5-small.tar.gz"
        Mode: SingleModel
    ExecutionRoleArn: !GetAtt SageMakerExecutionRole.Arn
```
After some further analysis, I learned that when the multi-model server loads the model, it calls `handler_service.initialize`, which creates the pipeline using the `pipeline()` function.

I then downloaded both models and tried to instantiate a pipeline from each of them on my machine, to see what happens with the tokenizer. Here is the code…
```python
from transformers import pipeline

# p1 is the model that does NOT work
p1 = pipeline("summarization", "/REDACTED/Code/infer-huggingface-summarization-bert-small2bert-small-finetuned-cnn-daily-mail-summarization")
# p2 is the model that works
p2 = pipeline("text2text-generation", "/REDACTED/Code/infer-huggingface-translation-t5-small/")

print("Tokenizer of P1: " + str(type(p1.tokenizer)))
print("Tokenizer of P2: " + str(type(p2.tokenizer)))
```
The code shows that `p1.tokenizer` is of type `NoneType`, while `p2.tokenizer` is of class `transformers.models.t5.tokenization_t5_fast.T5TokenizerFast`.
After digging further into the code of the `pipeline()` function, I found that at this line…

```python
load_tokenizer = type(model_config) in TOKENIZER_MAPPING or model_config.tokenizer_class is not None
```

…`load_tokenizer` is set to `False` for `p1`, because `type(model_config)` is not found in `TOKENIZER_MAPPING`, whereas for `p2` it is `True`, because it is found. (See here and here.) I'm not sure this finding is relevant, though, because the `model_fn()` function in the model's `inference.py` does create a `tokenizer` with `tokenizer = AutoTokenizer.from_pretrained(model_dir)` and then passes it to `SummarizationPipeline`. I tried creating the tokenizer locally in the same way…
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("/REDACTED/Code/infer-huggingface-summarization-bert-small2bert-small-finetuned-cnn-daily-mail-summarization")
print(str(type(tokenizer)))
```

…and I did get an instance of type `transformers.models.bert.tokenization_bert_fast.BertTokenizerFast`.
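As a further local sanity check (this is not the fix, just an experiment), passing that tokenizer explicitly to `pipeline()` – which mirrors what the model's `inference.py` does with `SummarizationPipeline` – should give a usable pipeline. A sketch, reusing the local model directory from above:

```python
from transformers import AutoTokenizer, pipeline

model_dir = "/REDACTED/Code/infer-huggingface-summarization-bert-small2bert-small-finetuned-cnn-daily-mail-summarization"

# Passing the tokenizer explicitly sidesteps the TOKENIZER_MAPPING lookup
# that left p1.tokenizer as None above.
tokenizer = AutoTokenizer.from_pretrained(model_dir)
summarizer = pipeline("summarization", model=model_dir, tokenizer=tokenizer)

print(summarizer("This is a beautiful day. I am happy. I am going to the park."))
```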
(I have to admit that I don't fully understand everything that is going on here, but I will keep investigating…)
Answer:
TL;DR: The problem was that my model tar.gz file (i.e. s3://jumpstart-cache-prod-us-east-1/huggingface-infer/infer-huggingface-summarization-bert-small2bert-small-finetuned-cnn-daily-mail-summarization.tar.gz) was missing the custom `inference.py` script, so `sagemaker_huggingface_inference_toolkit` loaded the model with its default `load()` function. That path fails to create a tokenizer instance, so the call to the tokenizer fails with the error `'str' object is not callable`.
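With the repackaged model (see the steps below) deployed, the original payload format works unchanged. For reference, a sketch of the invocation including reading the response; I'm assuming the usual summarization pipeline output shape of `[{"summary_text": ...}]`:

```python
import json

import boto3

client = boto3.client('runtime.sagemaker')

payload = {'inputs': 'This is a beautiful day. I am happy. I am going to the park.'}

response = client.invoke_endpoint(
    EndpointName="SummarizationEndpoint",
    ContentType="application/json",
    Accept="application/json",
    Body=json.dumps(payload),
)

# The HuggingFace summarization pipeline typically returns a list of dicts.
result = json.loads(response['Body'].read().decode('utf-8'))
print(result[0]['summary_text'])
```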
Here is what I did to fix the problem:

- I downloaded the model tar.gz file from S3.
- I looked up the location of the corresponding sourcedir.tar.gz file by calling `sagemaker.script_uris` (see here, and the sketch after this list), and downloaded the `sourcedir.tar.gz`.
- I unpacked the model tar.gz and the `sourcedir.tar.gz`, and moved the code from sourcedir into a subdirectory named `code` inside the model directory.
- I packaged the model directory (now including the `code` subdirectory containing `inference.py`) as a tar.gz and uploaded it to an S3 bucket.
- I used this new S3 URL as my `ModelDataUrl`.
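For the `sagemaker.script_uris` lookup in the second step, the call looks roughly like this; the `model_id` here is an assumption derived from the tar.gz file name, so verify it against the JumpStart catalog for your region:

```python
from sagemaker import script_uris

# Assumed JumpStart model id (derived from the tar.gz file name) - verify it.
MODEL_ID = "huggingface-summarization-bert-small2bert-small-finetuned-cnn-daily-mail-summarization"

sourcedir_uri = script_uris.retrieve(
    region="us-east-1",
    model_id=MODEL_ID,
    model_version="*",
    script_scope="inference",
)
print(sourcedir_uri)  # S3 URI of the matching sourcedir.tar.gz
```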
More details: While debugging the issue, I discovered a few more things. Neither of the two model tar.gz files I used contained an `inference.py`, yet one worked and the other didn't. I tried to understand from the call stack why one fails and the other doesn't. If you are a beginner (like me) and want to understand more of what is going on, I'm sharing it here.
- The docker image specified in the SageMaker `Image` property uses the Python script referenced in its ENTRYPOINT environment variable to start the `multi-model-server`. It does this by calling `sagemaker_huggingface_inference_toolkit.mms_model_server.start_model_server()`. That function passes the `handler_service` to be used on the command line that starts the MMS server; the MMS server itself is installed in the docker image (/opt/conda/bin/multi-model-server).
- MMS loads `sagemaker_huggingface_inference_toolkit.handler_service` and calls its constructor `__init__()` function, followed by its `initialize()` function.
- The constructor uses `sagemaker_inference.Environment()` to set `environment.module_name` to the default value `inference.py`, unless something else is specified in the `SAGEMAKER_PROGRAM` parameter. It also adds `code_dir` (i.e. the `code` subdirectory of `module-path`) to `PYTHON_PATH_ENV` – this becomes important later.
- The `initialize()` function receives the `model-path` argument of the `multi-model-server` command-line invocation (see above, i.e. "/opt/ml/model") as the `model_dir` variable in `context.system_properties`. `initialize()` then calls `self.validate_and_initialize_user_module()`, which looks for `module_name` in the loaded `sage_maker.environment`. If `module_name` is specified (e.g. `"inference.py"`) and it can, using `imp`, be…