How to retrieve source documents via LangChain's get_relevant_documents method only when the answer comes from the custom knowledge base

I'm building a chatbot that accesses an external knowledge base, docs. I want to get the relevant documents the bot consulted to answer a question, but not when the user input is something like "hello", "how are you", "what is 2+2", or anything else whose answer is not retrieved from the external knowledge base docs. In that case, I want retriever.get_relevant_documents(query), or any other line of code, to return an empty list or something similar.

import os
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate

os.environ['OPENAI_API_KEY'] = ''

custom_template = """This is conversation with a human. Answer the questions you get based on the knowledge you have.
If you don't know the answer, just say that you don't, don't try to make up an answer.
Chat History:
{chat_history}
Follow Up Input: {question}"""

CUSTOM_QUESTION_PROMPT = PromptTemplate.from_template(custom_template)

llm = ChatOpenAI(
    model_name="gpt-3.5-turbo",  # Name of the language model
    temperature=0  # Parameter that controls the randomness of the generated responses
)

embeddings = OpenAIEmbeddings()

docs = [
    "Buildings are made out of brick",
    "Buildings are made out of wood",
    "Buildings are made out of stone",
    "Buildings are made out of atoms",
    "Buildings are made out of building materials",
    "Cars are made out of metal",
    "Cars are made out of plastic",
]

vectorstore = FAISS.from_texts(docs, embeddings)
retriever = vectorstore.as_retriever()
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

qa = ConversationalRetrievalChain.from_llm(
    llm,
    retriever,
    condense_question_prompt=CUSTOM_QUESTION_PROMPT,
    memory=memory
)

query = "what are cars made of?"
result = qa({"question": query})
print(result)
print(retriever.get_relevant_documents(query))

I tried setting a similarity threshold on the retriever, but I still get relevant documents back for off-topic queries because they score highly, while for other user prompts that do have relevant documents, I get nothing back.

retriever = vectorstore.as_retriever(search_type="similarity_score_threshold", search_kwargs={"score_threshold": .9})
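The thresholding behavior itself can be reproduced without any API calls. The sketch below uses a hypothetical `threshold_retrieve` helper (not part of LangChain) over toy 2-dimensional vectors standing in for real embeddings, to show how a score cutoff yields an empty list when no document is close enough to the query:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def threshold_retrieve(query_vec, doc_vecs, docs, score_threshold=0.9):
    # Keep only documents whose similarity to the query clears the
    # threshold, mirroring search_type="similarity_score_threshold"
    scored = [(cosine_similarity(query_vec, v), d) for v, d in zip(doc_vecs, docs)]
    return [d for score, d in scored if score >= score_threshold]

# Toy vectors standing in for real embeddings
docs = ["Cars are made out of metal", "Buildings are made out of brick"]
doc_vecs = [[1.0, 0.0], [0.0, 1.0]]

print(threshold_retrieve([0.9, 0.1], doc_vecs, docs))  # on-topic query: one hit
print(threshold_retrieve([0.5, 0.5], doc_vecs, docs))  # off-topic query: []
```

The difficulty in practice, as described above, is that real embedding models assign surprisingly high similarity to unrelated text, so a single threshold that separates on-topic from off-topic queries is hard to find.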

Answer:

To solve this, I had to switch the chain type to RetrievalQA and introduce an agent and tools.

import os
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.memory import ConversationBufferMemory
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.agents import AgentExecutor, Tool, initialize_agent
from langchain.agents.types import AgentType

os.environ['OPENAI_API_KEY'] = ''

system_message = """You are the XYZ bot.
This is conversation with a human. Answer the questions you get based on the knowledge you have.
If you don't know the answer, just say that you don't, don't try to make up an answer."""

llm = ChatOpenAI(
    model_name="gpt-3.5-turbo",  # Name of the language model
    temperature=0  # Parameter that controls the randomness of the generated responses
)

embeddings = OpenAIEmbeddings()

docs = [
    "Buildings are made out of brick",
    "Buildings are made out of wood",
    "Buildings are made out of stone",
    "Buildings are made out of atoms",
    "Buildings are made out of building materials",
    "Cars are made out of metal",
    "Cars are made out of plastic",
]

vectorstore = FAISS.from_texts(docs, embeddings)
retriever = vectorstore.as_retriever()
memory = ConversationBufferMemory(memory_key="chat_history", input_key='input',
                                  return_messages=True, output_key='output')

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    verbose=True,
    return_source_documents=True
)

tools = [
    Tool(
        name="doc_search_tool",
        func=qa,
        description="This tool is used to retrieve information from the knowledge base"
    )
]

agent = initialize_agent(
    agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION,
    tools=tools,
    llm=llm,
    memory=memory,
    return_source_documents=True,
    return_intermediate_steps=True,
    agent_kwargs={"system_message": system_message}
)

query1 = "what are buildings made of?"
result1 = agent(query1)

query2 = "who are you?"
result2 = agent(query2)

If the result accessed the sources, it will have a value under the "intermediate_steps" key, and the source documents can then be accessed via result1["intermediate_steps"][0][1]["source_documents"].

Otherwise, when the query does not need the sources, result2["intermediate_steps"] will be empty.
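That lookup can be wrapped in a small helper so callers always get a list back. The `get_sources` function below is a hypothetical convenience, sketched here over mocked result dicts rather than real agent output (a real result would carry AgentAction objects and Document instances in the same positions):

```python
def get_sources(result):
    # Return the source documents from the first tool call, or []
    # when the agent answered without touching the knowledge base.
    steps = result.get("intermediate_steps", [])
    if not steps:
        return []
    # Each step is an (action, tool_output) pair; the RetrievalQA
    # output carries its sources under "source_documents".
    tool_output = steps[0][1]
    return tool_output.get("source_documents", [])

# Mocked results illustrating both cases
result_with_sources = {
    "intermediate_steps": [
        (None, {"source_documents": ["Buildings are made out of brick"]})
    ]
}
result_without_sources = {"intermediate_steps": []}

print(get_sources(result_with_sources))     # one source document
print(get_sources(result_without_sources))  # []
```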
