Azure AI Search multi-index / conditional semantic search

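I'm using Azure OpenAI to build a chatbot for my editorial company that retrieves information from a large corpus of enterprise text data. I've run into a challenge with Azure AI Search. Originally all the data lived in a single index, but because I now need conditional search, I have to split it into three separate indexes. Here are the details: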

Index 1: Biology Index (Private, FR)
Index 2: Engineering and Technology Index (EN)
Index 3: Art and Architecture Index (USA, UK)

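These indexes contain a variety of data sources and publications, and their topics overlap. For example, when a query concerns anatomy-related topics such as eyesight, cardiovascular disease, or growth hormone treatment, I want it, along with related biology topics, to retrieve data exclusively from the Biology Index (Index 1).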

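My Python code retrieves accurate data from a single index, but I'm looking for a solution within Azure AI Search that prioritizes specific indexes based on the context of the query.

For example:

Queries related to biology should retrieve data only from Index 1 and 2.
Queries related to technology, data science, and artificial intelligence should retrieve data only from Index 2.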

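I haven't found a service or GitHub repository that addresses this specific need. As far as I know, Azure AI Search does not support searching multiple indexes in a single query.

How can I find a solution or a workaround?

Here is the code I use for RAG: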

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

# endpoint and admin_key are assumed to be defined elsewhere (not shown here)
index_name = 'indx-editorials-bio-fr-old'

# Query to execute
query = 'Please retrieve publications from editorial certified houses covering cardiovascular diseases'

# Function to execute the query with semantic ranking
def execute_query_with_semantic_ranking():
    try:
        # Create a SearchClient for the index
        credential = AzureKeyCredential(admin_key)
        client = SearchClient(endpoint=endpoint, index_name=index_name, credential=credential)

        # Execute the query with semantic ranking, using the semantic
        # configuration named in the index definition
        results = client.search(
            search_text=query,
            query_type="semantic",
            semantic_configuration_name="article-semantic",
        )

        # Print the results
        print(f"Results from index '{index_name}' with semantic ranking:")
        for result in results:
            print(result)
        print()
    except Exception as e:
        print(f"Error querying index '{index_name}' with semantic ranking: {e}")

# Execute the query with semantic ranking
execute_query_with_semantic_ranking()

Index definition:

{
  "@odata.context": "search.windows.net",
  "@odata.etag": "\"123547858WRF\"",
  "name": "all_articles_index",
  "defaultScoringProfile": null,
  "fields": [
    {
      "name": "content",
      "type": "Edm.String",
      "searchable": true, "filterable": true, "retrievable": true, "stored": true,
      "sortable": true, "facetable": false, "key": false,
      "indexAnalyzer": null, "searchAnalyzer": null, "analyzer": null, "normalizer": null,
      "dimensions": null, "vectorSearchProfile": null, "vectorEncoding": null,
      "synonymMaps": []
    },
    {
      "name": "title",
      "type": "Edm.String",
      "searchable": true, "filterable": true, "retrievable": true, "stored": true,
      "sortable": true, "facetable": false, "key": false,
      "indexAnalyzer": null, "searchAnalyzer": null, "analyzer": null, "normalizer": null,
      "dimensions": null, "vectorSearchProfile": null, "vectorEncoding": null,
      "synonymMaps": []
    },
    {
      "name": "doi",
      "type": "Edm.String",
      "searchable": false, "filterable": false, "retrievable": true, "stored": true,
      "sortable": false, "facetable": false, "key": false,
      "indexAnalyzer": null, "searchAnalyzer": null, "analyzer": null, "normalizer": null,
      "dimensions": null, "vectorSearchProfile": null, "vectorEncoding": null,
      "synonymMaps": []
    },
    {
      "name": "editorial_house",
      "type": "Edm.String",
      "searchable": false, "filterable": false, "retrievable": true, "stored": true,
      "sortable": false, "facetable": false, "key": false,
      "indexAnalyzer": null, "searchAnalyzer": null, "analyzer": null, "normalizer": null,
      "dimensions": null, "vectorSearchProfile": null, "vectorEncoding": null,
      "synonymMaps": []
    },
    {
      "name": "metadata_storage_path",
      "type": "Edm.String",
      "searchable": false, "filterable": false, "retrievable": true, "stored": true,
      "sortable": false, "facetable": false, "key": true,
      "indexAnalyzer": null, "searchAnalyzer": null, "analyzer": null, "normalizer": null,
      "dimensions": null, "vectorSearchProfile": null, "vectorEncoding": null,
      "synonymMaps": []
    }
  ],
  "scoringProfiles": [],
  "corsOptions": null,
  "suggesters": [],
  "analyzers": [],
  "normalizers": [],
  "tokenizers": [],
  "tokenFilters": [],
  "charFilters": [],
  "encryptionKey": null,
  "similarity": {
    "@odata.type": "BM25Similarity",
    "k1": null,
    "b": null
  },
  "semantic": {
    "defaultConfiguration": null,
    "configurations": [
      {
        "name": "article-semantic",
        "prioritizedFields": {
          "titleField": { "fieldName": "title" },
          "prioritizedContentFields": [
            { "fieldName": "content" }
          ],
          "prioritizedKeywordsFields": []
        }
      }
    ]
  },
  "vectorSearch": null
}

Sample data:

[
  {
    "content": "This article explores the potential of AI to revolutionize genomics, highlighting recent breakthroughs and future prospects.",
    "title": "The Impact of AI on Genomics: Recent Breakthroughs and Future Prospects",
    "doi": "10.1234/ai-bio-2024-001",
    "editorial_house": "BioTech Publishers",
    "metadata_storage_path": "/articles/2024/ai-bio-2024-001"
  },
  {
    "content": "In this study, we discuss the integration of machine learning in drug discovery processes, focusing on its benefits and challenges.",
    "title": "Machine Learning in Drug Discovery: Benefits and Challenges",
    "doi": "10.1234/ai-bio-2024-002",
    "editorial_house": "BioTech Publishers",
    "metadata_storage_path": "/articles/2024/ai-bio-2024-002"
  },
  {
    "content": "This paper examines the role of AI in ecological monitoring, presenting case studies on wildlife conservation efforts.",
    "title": "AI in Ecological Monitoring: Wildlife Conservation Case Studies",
    "doi": "10.1234/ai-bio-2024-003",
    "editorial_house": "BioTech Publishers",
    "metadata_storage_path": "/articles/2024/ai-bio-2024-003"
  },
  {
    "content": "The article reviews advances in bioinformatics driven by AI, with a focus on data analysis techniques and their applications.",
    "title": "Advances in Bioinformatics: AI-Driven Data Analysis Techniques",
    "doi": "10.1234/ai-bio-2024-004",
    "editorial_house": "BioTech Publishers",
    "metadata_storage_path": "/articles/2024/ai-bio-2024-004"
  },
  {
    "content": "This study highlights the use of AI in personalized medicine, detailing the technology's impact on treatment plans and patient outcomes.",
    "title": "Personalized Medicine: AI's Role in Tailoring Treatment Plans",
    "doi": "10.1234/ai-bio-2024-005",
    "editorial_house": "BioTech Publishers",
    "metadata_storage_path": "/articles/2024/ai-bio-2024-005"
  }
]

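Answer:

Yes, as you said, querying multiple indexes in a single request is not possible. Here is a possible workaround for your problem:

You mentioned that you are creating three new indexes. In addition to those, create a fourth index whose fields hold the index name and the combined content of all topics covered by that index.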

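Sample data:

{"index_name": "Biology Index", "content": "All content about biology topics"},
{"index_name": "Engineering and Technology Index", "content": "All content about engineering and technology topics"},
{"index_name": "Art and Architecture Index", "content": "All content about art and architecture topics"}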

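So, create a fourth index populated with data like the sample above. If a topic has multiple documents, merge them and put the combined text in the content field.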
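As a rough sketch of how the documents for that fourth index could be assembled, the helper below merges the title and content of every document under each topic into a single content field. The function name build_routing_documents, the id key field, and the index names in the demo are assumptions made for illustration; they are not part of the original setup.

```python
def build_routing_documents(docs_by_index):
    """Merge each index's documents into one routing document.

    docs_by_index maps an index name to the list of documents stored in it;
    each document is a dict with "title" and "content" keys, as in the
    sample data from the question.
    """
    routing_docs = []
    for position, (index_name, docs) in enumerate(docs_by_index.items(), start=1):
        # Concatenate the title and content of every document for this topic.
        merged = " ".join(
            f"{doc.get('title', '')} {doc.get('content', '')}".strip()
            for doc in docs
        )
        routing_docs.append(
            {
                "id": str(position),  # hypothetical key field for the fourth index
                "index_name": index_name,
                "content": merged,
            }
        )
    return routing_docs


if __name__ == "__main__":
    # Hypothetical index names and shortened documents, for illustration only.
    sample = {
        "biology-index": [
            {"title": "AI in Ecological Monitoring", "content": "Wildlife conservation case studies."},
            {"title": "Personalized Medicine", "content": "Tailoring treatment plans."},
        ],
        "engineering-index": [],
    }
    for doc in build_routing_documents(sample):
        print(doc)
```

The resulting list could then be sent to the fourth index with SearchClient.upload_documents(documents=...), provided the index defines matching id, index_name, and content fields.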

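Next, run the user's input as a query against this fourth index, take the index name from the result with the highest @search.score, and use that index name in your Python code for the follow-up query.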
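The two-step lookup could be sketched roughly as follows. The names pick_index, routed_search, and make_client are hypothetical, and the clients are only assumed to expose a search(search_text=..., top=...) method that yields dict-like hits, which is how azure.search.documents.SearchClient behaves. Treat this as a minimal outline under those assumptions, not a finished implementation.

```python
def pick_index(routing_hits):
    """Return the index_name of the hit with the highest @search.score, or None."""
    best = max(routing_hits, key=lambda hit: hit["@search.score"], default=None)
    return None if best is None else best["index_name"]


def routed_search(query, routing_client, make_client, top=5):
    """Query the fourth (routing) index first, then search the index it names.

    routing_client: client for the fourth index.
    make_client: callable that maps an index name to a client for that index.
    """
    # Step 1: find out which topic index matches the query best.
    routing_hits = list(routing_client.search(search_text=query, top=3))
    index_name = pick_index(routing_hits)
    if index_name is None:
        return None, []  # nothing matched in the routing index

    # Step 2: run the same query against the selected index.
    results = make_client(index_name).search(search_text=query, top=top)
    return index_name, list(results)
```

With the real SDK, make_client could be something like lambda name: SearchClient(endpoint=endpoint, index_name=name, credential=credential), and the index_name values stored in the fourth index would need to match the actual index names on the service. Since only the single best-scoring index is queried, a rule such as "biology queries go to Index 1 and 2" would require taking more than one routing hit.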
