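I'm trying to upload a PDF file to Google's Document AI service for processing. I'm using Google's Google.Cloud.DocumentAI.V1 library for C#. I've looked through GitHub and the documentation, but haven't found much information. The PDF file is on a local drive. I converted the PDF to a byte array and then to a ByteString. I then set the request's MIME type to "application/pdf", but an error comes back: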
Status(StatusCode="InvalidArgument", Detail="Unsupported input file format.", DebugException="Grpc.Core.Internal.CoreErrorDetailException: {"created":"@1627582435.256000000","description":"Error received from peer ipv4:142.250.72.170:443","file":"……\src\core\lib\surface\call.cc","file_line":1067,"grpc_message":"Unsupported input file format.","grpc_status":3}")
Code:
    try
    {
        // Build the document
        string pdfFilePath = "C:\\Users\\maponte\\Documents\\Projects\\SettonProjects\\OCRSTUFF\\DOC071621-0016.pdf";
        var bytes = Encoding.UTF8.GetBytes(pdfFilePath);
        ByteString content = ByteString.CopyFrom(bytes);

        // Create the client
        DocumentProcessorServiceClient documentProcessorServiceClient = await DocumentProcessorServiceClient.CreateAsync();

        // Initialize the request arguments
        ProcessRequest request = new ProcessRequest
        {
            ProcessorName = ProcessorName.FromProjectLocationProcessor("*****", "mycountry", "***"),
            SkipHumanReview = false,
            InlineDocument = new Document(),
            RawDocument = new RawDocument(),
        };
        request.RawDocument.MimeType = "application/pdf";
        request.RawDocument.Content = content;

        // Make the request
        ProcessResponse response = await documentProcessorServiceClient.ProcessDocumentAsync(request);
        Document docResponse = response.Document;
        Console.WriteLine(docResponse.Text);
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
    }
Answer:
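This is the problem (or at least one problem): you aren't actually loading the file. Encoding.UTF8.GetBytes(pdfFilePath) encodes the characters of the path string, not the contents of the file at that path, so the service receives bytes that are not a PDF: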
    string pdfFilePath = "C:\\Users\\maponte\\Documents\\Projects\\SettonProjects\\OCRSTUFF\\DOC071621-0016.pdf";
    var bytes = Encoding.UTF8.GetBytes(pdfFilePath);
    ByteString content = ByteString.CopyFrom(bytes);
You should do this instead:
    string pdfFilePath = "path-as-before";
    var bytes = File.ReadAllBytes(pdfFilePath);
    ByteString content = ByteString.CopyFrom(bytes);
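To make the difference concrete, here is a minimal sketch using only the .NET base class library. It compares the bytes of a path string with the bytes read from the file at that path. The class name PdfBytesCheck and the helper LooksLikePdf are hypothetical names made up for this illustration and are not part of the Document AI library. The check relies on the assumption that a PDF file normally begins with the ASCII signature "%PDF-", and the tiny temp file is only a stand-in so the sketch runs without a real document.

```csharp
using System;
using System.IO;
using System.Text;

public static class PdfBytesCheck
{
    // Hypothetical helper: true when the buffer starts with the "%PDF-" signature.
    public static bool LooksLikePdf(byte[] data)
    {
        byte[] signature = Encoding.ASCII.GetBytes("%PDF-");
        if (data == null || data.Length < signature.Length)
        {
            return false;
        }
        for (int i = 0; i < signature.Length; i++)
        {
            if (data[i] != signature[i])
            {
                return false;
            }
        }
        return true;
    }

    public static void Main()
    {
        // Write a tiny stand-in file so the sketch runs without a real PDF.
        string path = Path.Combine(Path.GetTempPath(), "sample.pdf");
        File.WriteAllBytes(path, Encoding.ASCII.GetBytes("%PDF-1.4\n%%EOF\n"));

        byte[] pathBytes = Encoding.UTF8.GetBytes(path); // bytes of the path text
        byte[] fileBytes = File.ReadAllBytes(path);      // bytes of the file itself

        Console.WriteLine(LooksLikePdf(pathBytes)); // False
        Console.WriteLine(LooksLikePdf(fileBytes)); // True

        File.Delete(path);
    }
}
```

A guard like this before building the request can catch the mistake locally, rather than waiting for the service to reject the upload.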
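I'd also point out that InlineDocument and RawDocument are alternatives to each other: setting either one clears the other, so there is no point in assigning both. Your request creation would be better written as: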
    ProcessRequest request = new ProcessRequest
    {
        ProcessorName = ProcessorName.FromProjectLocationProcessor("*****", "mycountry", "***"),
        SkipHumanReview = false,
        RawDocument = new RawDocument
        {
            MimeType = "application/pdf",
            Content = content
        }
    };