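I am currently implementing RAG on Azure using OpenAI and Azure AI Search (formerly Azure Cognitive Search). I have about 50 to 65 JSON files that I want to search as my enterprise data. In the chatbot's citations I only get the text "citation"; I am trying to retrieve the DOI, which is the URL where the document is available online, and the title of the scientific article. The files are saved in .txt format.
I formatted my JSON files as follows. The keys 'content' and 'title' are the only fields I want to be semantically searchable and retrievable, while the DOI (URL) should be retrievable only.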
{ "content": "The human eye is a complex organ responsible for vision, capturing light and converting it into neural signals for the brain to interpret. It consists of multiple parts, including the cornea, lens, and retina, each playing a vital role in the process of seeing.", "date": "2023-07-15", "Title": "The Magic of Vision", "editorial_house": "MIT Research Meds and Public Health", "doi": "https://doi.org/10.1234", "author": "Dr. John Mayer"}
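However, on the Azure AI Search page I am never offered the other fields when selecting the metadata:
As you can see, only 'content' shows up, and I still get this unappealing citation in the footnote of the search. How can I make the data retrievable the way I want?
Since I am not using code for this, only the Azure Studio web page, I am not sure whether code is the only way to do it.
My expected output looks like this:
Is this possible? Can it be done in Azure Studio, or only with code?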
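Update
I set up the custom mapping like this:
However, although I now get the correct title and content in the citation panel, the DOI, which is the URL of the publication, is missing. Am I doing something wrong?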
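Answer:
You can do this in two ways: through Import data, or by defining the index yourself.
In the portal, after you click Import data you will see an option to connect to your data; there, set the parsing mode to JSON.
You will then get the correct fields.
Here you can delete any fields you do not need.
The other way is to create the index with a custom definition, as shown below.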
[
{ "name": "content", "type": "Edm.String", "searchable": true, "filterable": false, "retrievable": true, "stored": true, "sortable": false, "facetable": false, "key": false, "indexAnalyzer": null, "searchAnalyzer": null, "analyzer": "standard.lucene", "normalizer": null, "dimensions": null, "vectorSearchProfile": null, "vectorEncoding": null, "synonymMaps": [] },
{ "name": "date", "type": "Edm.DateTimeOffset", "searchable": false, "filterable": false, "retrievable": true, "stored": true, "sortable": false, "facetable": false, "key": false, "indexAnalyzer": null, "searchAnalyzer": null, "analyzer": null, "normalizer": null, "dimensions": null, "vectorSearchProfile": null, "vectorEncoding": null, "synonymMaps": [] },
{ "name": "Title", "type": "Edm.String", "searchable": true, "filterable": false, "retrievable": true, "stored": true, "sortable": false, "facetable": false, "key": false, "indexAnalyzer": null, "searchAnalyzer": null, "analyzer": "standard.lucene", "normalizer": null, "dimensions": null, "vectorSearchProfile": null, "vectorEncoding": null, "synonymMaps": [] },
{ "name": "editorial_house", "type": "Edm.String", "searchable": true, "filterable": false, "retrievable": true, "stored": true, "sortable": false, "facetable": false, "key": false, "indexAnalyzer": null, "searchAnalyzer": null, "analyzer": "standard.lucene", "normalizer": null, "dimensions": null, "vectorSearchProfile": null, "vectorEncoding": null, "synonymMaps": [] },
{ "name": "doi", "type": "Edm.String", "searchable": true, "filterable": false, "retrievable": true, "stored": true, "sortable": false, "facetable": false, "key": false, "indexAnalyzer": null, "searchAnalyzer": null, "analyzer": "standard.lucene", "normalizer": null, "dimensions": null, "vectorSearchProfile": null, "vectorEncoding": null, "synonymMaps": [] },
{ "name": "author", "type": "Edm.String", "searchable": true, "filterable": false, "retrievable": true, "stored": true, "sortable": false, "facetable": false, "key": false, "indexAnalyzer": null, "searchAnalyzer": null, "analyzer": "standard.lucene", "normalizer": null, "dimensions": null, "vectorSearchProfile": null, "vectorEncoding": null, "synonymMaps": [] },
{ "name": "metadata_storage_size", "type": "Edm.Int64", "searchable": false, "filterable": false, "retrievable": true, "stored": true, "sortable": false, "facetable": false, "key": false, "indexAnalyzer": null, "searchAnalyzer": null, "analyzer": null, "normalizer": null, "dimensions": null, "vectorSearchProfile": null, "vectorEncoding": null, "synonymMaps": [] },
{ "name": "metadata_storage_path", "type": "Edm.String", "searchable": true, "filterable": false, "retrievable": true, "stored": true, "sortable": false, "facetable": false, "key": true, "indexAnalyzer": null, "searchAnalyzer": null, "analyzer": "standard.lucene", "normalizer": null, "dimensions": null, "vectorSearchProfile": null, "vectorEncoding": null, "synonymMaps": [] }
]
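Since most attributes in the definition above repeat from field to field, the same list can be generated with a small helper. The following is only a sketch: the helper name make_field and the index name "articles-index" are made up for illustration, and doi is set to retrievable-only here, which departs from the definition above in order to match what the question asks for.

```python
import json


def make_field(name, edm_type="Edm.String", searchable=False, key=False):
    """Build one index field definition as a plain dict (hypothetical helper)."""
    field = {
        "name": name,
        "type": edm_type,
        "searchable": searchable,
        "filterable": False,
        "retrievable": True,
        "sortable": False,
        "facetable": False,
        "key": key,
    }
    # Only searchable string fields carry an analyzer.
    if searchable:
        field["analyzer"] = "standard.lucene"
    return field


fields = [
    make_field("content", searchable=True),
    make_field("date", "Edm.DateTimeOffset"),
    make_field("Title", searchable=True),
    make_field("editorial_house", searchable=True),
    # Assumption: retrievable only, as requested in the question.
    make_field("doi"),
    make_field("author", searchable=True),
    make_field("metadata_storage_size", "Edm.Int64"),
    make_field("metadata_storage_path", searchable=True, key=True),
]

# "articles-index" is a placeholder name, not taken from the original post.
index_definition = {"name": "articles-index", "fields": fields}
print(json.dumps(index_definition, indent=2))
```

The printed JSON can be pasted into the portal's JSON editor for the index or sent as the body of a create-index request; attributes left out here fall back to the service defaults.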
Next, configure the indexer as shown below.
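The indexer configuration from the original answer is not reproduced here, so the following is a rough reconstruction under assumptions rather than the author's exact settings. It builds an indexer definition that uses the JSON parsing mode mentioned earlier and base64-encodes metadata_storage_path, which is commonly needed when a blob path serves as the document key. The names "articles-indexer", "articles-datasource" and "articles-index" are placeholders.

```python
import json


def make_indexer(name, data_source, index_name):
    """Build an indexer definition as a plain dict (illustrative sketch)."""
    return {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": index_name,
        # Parse each blob as one JSON document instead of plain text.
        "parameters": {"configuration": {"parsingMode": "json"}},
        "fieldMappings": [
            {
                # Assumption: the key is the blob path, encoded so that it
                # only contains characters that are valid in a document key.
                "sourceFieldName": "metadata_storage_path",
                "targetFieldName": "metadata_storage_path",
                "mappingFunction": {"name": "base64Encode"},
            }
        ],
    }


# All three names below are placeholders, not taken from the original post.
indexer = make_indexer("articles-indexer", "articles-datasource", "articles-index")
print(json.dumps(indexer, indent=2))
```

JSON properties such as content, Title and doi should not need explicit entries in fieldMappings as long as their names match the index fields; explicit mappings are for names that differ or values that need a mapping function.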
After saving, reset and run the indexer.
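Reset and Run are buttons on the indexer page in the portal. If you would rather script the step, the REST API exposes both as POST requests. The sketch below only builds the two URLs and sends nothing; the service name, indexer name and API version are assumptions to replace with your own, and a real request also needs an api-key header.

```python
from urllib.parse import quote

# Assumption: a stable API version; adjust to the one your service supports.
API_VERSION = "2023-11-01"


def indexer_action_url(service, indexer, action):
    """Return the REST URL for resetting or running an indexer."""
    if action not in ("reset", "run"):
        raise ValueError(f"unsupported action: {action}")
    return (
        f"https://{service}.search.windows.net/indexers/"
        f"{quote(indexer, safe='')}/{action}?api-version={API_VERSION}"
    )


# "my-search-service" and "articles-indexer" are placeholder names.
urls = [
    indexer_action_url("my-search-service", "articles-indexer", action)
    for action in ("reset", "run")
]
for url in urls:
    print("POST", url)
```

Resetting first clears the indexer's change tracking, so the following run reprocesses every document against the updated index definition.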