How to retrieve source documents via LangChain's get_relevant_documents method only when the answer comes from a custom knowledge base

I am building a chatbot that accesses an external knowledge base docs. I want to obtain the relevant documents the bot consulted to answer a question, but not when the user input is something like "hello", "how are you", "what is 2+2", or anything else whose answer is not retrieved from the external knowledge base docs. In that case, I would like retriever.get_relevant_documents(query) (or any other line of code) to return an empty list or something similar.

import os
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate

os.environ['OPENAI_API_KEY'] = ''

custom_template = """This is conversation with a human. Answer the questions you get based on the knowledge you have.
If you don't know the answer, just say that you don't, don't try to make up an answer.
Chat History:
{chat_history}
Follow Up Input: {question}"""

CUSTOM_QUESTION_PROMPT = PromptTemplate.from_template(custom_template)

llm = ChatOpenAI(
    model_name="gpt-3.5-turbo",  # Name of the language model
    temperature=0  # Parameter that controls the randomness of the generated responses
)

embeddings = OpenAIEmbeddings()

docs = [
    "Buildings are made out of brick",
    "Buildings are made out of wood",
    "Buildings are made out of stone",
    "Buildings are made out of atoms",
    "Buildings are made out of building materials",
    "Cars are made out of metal",
    "Cars are made out of plastic",
]

vectorstore = FAISS.from_texts(docs, embeddings)
retriever = vectorstore.as_retriever()
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

qa = ConversationalRetrievalChain.from_llm(
    llm,
    retriever,
    condense_question_prompt=CUSTOM_QUESTION_PROMPT,
    memory=memory
)

query = "what are cars made of?"
result = qa({"question": query})
print(result)
print(retriever.get_relevant_documents(query))

I tried setting a threshold for the retriever, but I still get back irrelevant documents with high similarity scores, while for other user prompts that do have relevant documents, I get nothing back.

retriever = vectorstore.as_retriever(search_type="similarity_score_threshold", search_kwargs={"score_threshold": .9})
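Before settling on a cutoff, it can help to inspect what the raw scores actually look like, e.g. via vectorstore.similarity_search_with_score(query); note that FAISS returns raw L2 distances there (lower = closer), whereas the retriever's score_threshold is compared against normalized relevance scores, which may explain the surprising threshold behavior. Below is a minimal sketch of the empty-list behavior I was after, assuming the (doc, score) pairs have already been converted to normalized relevance scores in [0, 1]; filter_by_relevance is a hypothetical helper of mine, not a LangChain API:

```python
# Hypothetical helper (not part of LangChain): keep only documents whose
# normalized relevance score meets the threshold; returns [] otherwise.
def filter_by_relevance(scored_docs, threshold=0.9):
    """scored_docs: list of (doc, score) pairs, where score is a
    normalized relevance score in [0, 1] (higher = more similar)."""
    return [doc for doc, score in scored_docs if score >= threshold]

# Example with made-up scores: only the first pair clears a 0.9 cutoff.
scored = [("Cars are made out of metal", 0.95), ("hello", 0.31)]
print(filter_by_relevance(scored))         # ['Cars are made out of metal']
print(filter_by_relevance([("hi", 0.2)]))  # []
```
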

Answer:

To solve this, I had to switch the chain type to RetrievalQA and introduce an agent and tools.

import os
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.memory import ConversationBufferMemory
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.agents import AgentExecutor, Tool, initialize_agent
from langchain.agents.types import AgentType

os.environ['OPENAI_API_KEY'] = ''

system_message = """You are the XYZ bot.
This is conversation with a human. Answer the questions you get based on the knowledge you have.
If you don't know the answer, just say that you don't, don't try to make up an answer."""

llm = ChatOpenAI(
    model_name="gpt-3.5-turbo",  # Name of the language model
    temperature=0  # Parameter that controls the randomness of the generated responses
)

embeddings = OpenAIEmbeddings()

docs = [
    "Buildings are made out of brick",
    "Buildings are made out of wood",
    "Buildings are made out of stone",
    "Buildings are made out of atoms",
    "Buildings are made out of building materials",
    "Cars are made out of metal",
    "Cars are made out of plastic",
]

vectorstore = FAISS.from_texts(docs, embeddings)
retriever = vectorstore.as_retriever()

memory = ConversationBufferMemory(
    memory_key="chat_history",
    input_key='input',
    output_key='output',
    return_messages=True
)

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(),
    verbose=True,
    return_source_documents=True
)

tools = [
    Tool(
        name="doc_search_tool",
        func=qa,
        description="This tool is used to retrieve information from the knowledge base"
    )
]

agent = initialize_agent(
    agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION,
    tools=tools,
    llm=llm,
    memory=memory,
    return_source_documents=True,
    return_intermediate_steps=True,
    agent_kwargs={"system_message": system_message}
)

query1 = "what are buildings made of?"
result1 = agent(query1)
query2 = "who are you?"
result2 = agent(query2)

If the result consulted the sources, it will have a value under the "intermediate_steps" key, and the source documents can then be accessed via result1["intermediate_steps"][0][1]["source_documents"].

Otherwise, when the query does not need sources, result2["intermediate_steps"] will be empty.
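Both cases can be handled with one small accessor, sketched below from the result shape described above; extract_sources is a name I am introducing for illustration, not a LangChain API:

```python
# Hypothetical helper: return the source documents from an agent result,
# or [] when the agent answered without consulting the knowledge base.
def extract_sources(result):
    steps = result.get("intermediate_steps", [])
    if not steps:
        return []  # no tool was called, so there are no sources
    # steps[0] is an (AgentAction, observation) pair; the observation is
    # the RetrievalQA output dict, which carries "source_documents".
    return steps[0][1].get("source_documents", [])

# Dicts mimicking the shapes of result1 / result2 above:
with_sources = {"intermediate_steps": [
    (("action",), {"source_documents": ["Buildings are made out of brick"]})
]}
without_sources = {"intermediate_steps": []}
print(extract_sources(with_sources))     # ['Buildings are made out of brick']
print(extract_sources(without_sources))  # []
```
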
