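I am using Azure OpenAI to build a chatbot for my editorial company that retrieves information from a large body of corporate text data. I have run into a challenge with Azure AI Search. Initially all of the data lived in a single index, but because I now need conditional search, I have to split the data across three separate indexes. Here are the details: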
- Index 1: Biology index (private, FR)
- Index 2: Engineering and Technology index (EN)
- Index 3: Art and Architecture index (USA, UK)
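These indexes contain a variety of data sources and publications, and their topics overlap. For example, for queries on anatomy-related topics such as eyesight, cardiovascular disease, or growth hormone therapy, I want those queries, along with related biology topics, to retrieve data only from the Biology index (Index 1).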
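My Python code retrieves accurate data from a single index, but I am looking for a solution within Azure AI Search that prioritizes specific indexes based on the context of the query.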
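For example:
- Biology-related queries should retrieve data only from Indexes 1 and 2.
- Queries related to technology, data science, and AI should retrieve data only from Index 2.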
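I have not found a service or GitHub repository that directly addresses this requirement. I know that Azure AI Search does not support searching across multiple indexes.
How can I find a solution or workaround?
This is the code I use for RAG: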
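import os

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

# Assumption: the endpoint and admin key come from environment variables
# (these variable names are hypothetical; the original left both undefined).
endpoint = os.environ["AZURE_SEARCH_ENDPOINT"]
admin_key = os.environ["AZURE_SEARCH_ADMIN_KEY"]

index_name = 'indx-editorials-bio-fr-old'

# Query to execute
query = 'Please retrieve publications from editorial certified houses covering cardiovascular diseases'


# Function to execute the query with semantic ranking
def execute_query_with_semantic_ranking():
    try:
        # Create a SearchClient for the index
        credential = AzureKeyCredential(admin_key)
        client = SearchClient(endpoint=endpoint, index_name=index_name, credential=credential)

        # Execute the query with semantic ranking.
        # Assumption: a recent azure-search-documents release, where semantic
        # ranking needs query_type="semantic" plus a semantic configuration
        # (here the "article-semantic" one from the index definition below)
        # in place of semantic_fields=["content", "title"].
        results = client.search(
            search_text=query,
            query_type="semantic",
            semantic_configuration_name="article-semantic",
        )

        # Print the results
        print(f"Results from index '{index_name}' with semantic ranking:")
        for result in results:
            print(result)
            print()
    except Exception as e:
        print(f"Error querying index '{index_name}' with semantic ranking: {e}")


# Execute the query with semantic ranking
execute_query_with_semantic_ranking()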
Index definition:
{
  "@odata.context": "search.windows.net",
  "@odata.etag": "\"123547858WRF\"",
  "name": "all_articles_index",
  "defaultScoringProfile": null,
  "fields": [
    {
      "name": "content", "type": "Edm.String",
      "searchable": true, "filterable": true, "retrievable": true, "stored": true,
      "sortable": true, "facetable": false, "key": false,
      "indexAnalyzer": null, "searchAnalyzer": null, "analyzer": null, "normalizer": null,
      "dimensions": null, "vectorSearchProfile": null, "vectorEncoding": null, "synonymMaps": []
    },
    {
      "name": "title", "type": "Edm.String",
      "searchable": true, "filterable": true, "retrievable": true, "stored": true,
      "sortable": true, "facetable": false, "key": false,
      "indexAnalyzer": null, "searchAnalyzer": null, "analyzer": null, "normalizer": null,
      "dimensions": null, "vectorSearchProfile": null, "vectorEncoding": null, "synonymMaps": []
    },
    {
      "name": "doi", "type": "Edm.String",
      "searchable": false, "filterable": false, "retrievable": true, "stored": true,
      "sortable": false, "facetable": false, "key": false,
      "indexAnalyzer": null, "searchAnalyzer": null, "analyzer": null, "normalizer": null,
      "dimensions": null, "vectorSearchProfile": null, "vectorEncoding": null, "synonymMaps": []
    },
    {
      "name": "editorial_house", "type": "Edm.String",
      "searchable": false, "filterable": false, "retrievable": true, "stored": true,
      "sortable": false, "facetable": false, "key": false,
      "indexAnalyzer": null, "searchAnalyzer": null, "analyzer": null, "normalizer": null,
      "dimensions": null, "vectorSearchProfile": null, "vectorEncoding": null, "synonymMaps": []
    },
    {
      "name": "metadata_storage_path", "type": "Edm.String",
      "searchable": false, "filterable": false, "retrievable": true, "stored": true,
      "sortable": false, "facetable": false, "key": true,
      "indexAnalyzer": null, "searchAnalyzer": null, "analyzer": null, "normalizer": null,
      "dimensions": null, "vectorSearchProfile": null, "vectorEncoding": null, "synonymMaps": []
    }
  ],
  "scoringProfiles": [],
  "corsOptions": null,
  "suggesters": [],
  "analyzers": [],
  "normalizers": [],
  "tokenizers": [],
  "tokenFilters": [],
  "charFilters": [],
  "encryptionKey": null,
  "similarity": { "@odata.type": "BM25Similarity", "k1": null, "b": null },
  "semantic": {
    "defaultConfiguration": null,
    "configurations": [
      {
        "name": "article-semantic",
        "prioritizedFields": {
          "titleField": { "fieldName": "title" },
          "prioritizedContentFields": [ { "fieldName": "content" } ],
          "prioritizedKeywordsFields": []
        }
      }
    ]
  },
  "vectorSearch": null
}
Sample data:
[
  {
    "content": "This article explores the potential of AI to revolutionize genomics, highlighting recent breakthroughs and future prospects.",
    "title": "The Impact of AI on Genomics: Recent Breakthroughs and Future Prospects",
    "doi": "10.1234/ai-bio-2024-001",
    "editorial_house": "BioTech Publishers",
    "metadata_storage_path": "/articles/2024/ai-bio-2024-001"
  },
  {
    "content": "In this study, we discuss the integration of machine learning in drug discovery processes, focusing on its benefits and challenges.",
    "title": "Machine Learning in Drug Discovery: Benefits and Challenges",
    "doi": "10.1234/ai-bio-2024-002",
    "editorial_house": "BioTech Publishers",
    "metadata_storage_path": "/articles/2024/ai-bio-2024-002"
  },
  {
    "content": "This paper examines the role of AI in ecological monitoring, presenting case studies on wildlife conservation efforts.",
    "title": "AI in Ecological Monitoring: Wildlife Conservation Case Studies",
    "doi": "10.1234/ai-bio-2024-003",
    "editorial_house": "BioTech Publishers",
    "metadata_storage_path": "/articles/2024/ai-bio-2024-003"
  },
  {
    "content": "The article reviews advances in bioinformatics driven by AI, with a focus on data analysis techniques and their applications.",
    "title": "Advances in Bioinformatics: AI-Driven Data Analysis Techniques",
    "doi": "10.1234/ai-bio-2024-004",
    "editorial_house": "BioTech Publishers",
    "metadata_storage_path": "/articles/2024/ai-bio-2024-004"
  },
  {
    "content": "This study highlights the use of AI in personalized medicine, detailing the technology's impact on treatment plans and patient outcomes.",
    "title": "Personalized Medicine: AI's Role in Tailoring Treatment Plans",
    "doi": "10.1234/ai-bio-2024-005",
    "editorial_house": "BioTech Publishers",
    "metadata_storage_path": "/articles/2024/ai-bio-2024-005"
  }
]
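Answer:
Yes, as you said, a single query cannot span multiple indexes. Here is a possible workaround for your problem:
You mentioned that you are creating three new indexes. In addition, create a fourth index whose fields hold the full content for each topic and the name of the index that covers it.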
Sample data:
{"index_name":"Biology Index","content":"All content on biology topics"},
{"index_name":"Engineering and Technology Index","content":"All content on engineering and technology topics"},
{"index_name":"Art and Architecture Index","content":"All content on art and architecture topics"}
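So create a fourth index containing data like the sample above. If a topic has several documents, merge them and store the result in the content field.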
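The merge step could look roughly like the sketch below. It is only an illustration: the `build_router_documents` helper, the `id` key field, and the choice to concatenate titles and content are assumptions, not something prescribed by Azure AI Search. The input is assumed to be a plain dict mapping each index name to the list of article dicts it holds.

```python
def build_router_documents(docs_by_index):
    """Merge the articles of each topic index into one routing document.

    docs_by_index: dict mapping an index name to a list of article dicts
    that carry "title" and "content" keys (as in the sample data).
    Returns one dict per index, ready to upload to the fourth index.
    """
    router_docs = []
    for position, (index_name, docs) in enumerate(docs_by_index.items()):
        # Join title and content of every article into a single text blob.
        merged = "\n".join(
            f"{doc.get('title', '')}. {doc.get('content', '')}".strip()
            for doc in docs
        )
        router_docs.append(
            {
                "id": str(position),  # hypothetical key field for the fourth index
                "index_name": index_name,
                "content": merged,
            }
        )
    return router_docs
```

For a large corpus, a representative sample or a summary per topic may be more practical than merging the full text of every document.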
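Next, run the user's input as a query against this fourth index, take the index name from the result with the highest @search.score, and then use that index name in your Python code for the follow-up query.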
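The two-step lookup described above might be sketched as follows. The function names `pick_index` and `routed_search`, the `default_index` fallback, and the injected callables are all hypothetical; the search calls are passed in as functions so the routing logic stays independent of the SDK. The sketch also assumes that the `index_name` field stores the real Azure index names.

```python
def pick_index(router_results, default_index):
    """Return the index_name of the highest-scoring routing hit.

    Falls back to default_index when the routing query returns nothing.
    """
    best = max(router_results, key=lambda hit: hit["@search.score"], default=None)
    return best["index_name"] if best is not None else default_index


def routed_search(query, search_router, search_index, default_index):
    """Query the routing index first, then the topic index it points to.

    search_router(query) yields hits from the fourth index.
    search_index(index_name, query) yields hits from the chosen topic index.
    """
    target = pick_index(list(search_router(query)), default_index)
    return target, list(search_index(target, query))


# Possible wiring with azure-search-documents (assumed names, not run here):
#   router_client = SearchClient(endpoint, "router-index", credential)
#   search_router = lambda q: router_client.search(search_text=q, top=3)
#   search_index = lambda name, q: SearchClient(endpoint, name, credential).search(search_text=q)
```

If a query legitimately belongs to more than one index, the same idea extends to taking every routing hit above some score threshold and querying each of those indexes in turn.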