Problem description
While generating JSONL data with Llama Index, everything runs smoothly until the final step of saving the results to a JSONL file. Every save attempt apparently fails: I always get the message "Wrote 0 examples to finetuning_events.jsonl". I'm not sure what is causing this.
Steps to reproduce
- Generate JSONL data with Llama Index successfully.
- Attempt to save the results to a JSONL file.
- Receive the message "Wrote 0 examples to finetuning_events.jsonl".
Additional information
- Llama Index version: 0.10.22
- Operating system: Windows
Logs
Wrote 0 examples to ./dataset_data/finetuning_events.jsonl
My code:
def jsonl_generation(self):
    """
    Generate JSONL file for fine-tuning events and perform model refinement.
    """
    # Initialize OpenAI FineTuningHandler and CallbackManager
    finetuning_handler = OpenAIFineTuningHandler()
    callback_manager = CallbackManager([finetuning_handler])
    self.llm.callback_manager = callback_manager

    # Load questions for fine-tuning from a file
    questions = []
    with open(f'{self.dataset_path}/train_questions.txt', "r", encoding='utf-8') as f:
        for line in f:
            questions.append(line.strip())

    try:
        # Generate responses to the questions using GPT-4 and save the
        # fine-tuning events to a JSONL file
        index = VectorStoreIndex.from_documents(self.documents)
        query_engine = index.as_query_engine(similarity_top_k=2, llm=self.llm)
        for question in questions:
            response = query_engine.query(question)
    except Exception as e:
        # Handle the exception here; you might want to log the error or take appropriate action
        print(f"An error occurred: {e}")
    finally:
        # Save the fine-tuning events to a JSONL file
        finetuning_handler.save_finetuning_events(f'{self.dataset_path}/finetuning_events.jsonl')
Answer:
I just solved this problem; here is my solution. The dataset is now being written to the JSONL file. The key change is registering the CallbackManager globally via Settings instead of only on self.llm, so the fine-tuning handler actually observes the LLM calls made by the query engine.
def jsonl_generation(self):
    """
    Generate JSONL file for fine-tuning events and perform model refinement.
    """
    from llama_index.core import Settings, VectorStoreIndex
    from llama_index.core.callbacks import CallbackManager
    from llama_index.finetuning.callbacks import OpenAIFineTuningHandler
    from llama_index.llms.openai import OpenAI

    # Initialize OpenAI FineTuningHandler and CallbackManager
    finetuning_handler = OpenAIFineTuningHandler()
    callback_manager = CallbackManager([finetuning_handler])
    llm = OpenAI(model="gpt-4", temperature=0.3)
    # Register the callback manager globally so the handler observes the
    # LLM calls made inside the query engine
    Settings.callback_manager = callback_manager

    # Load questions for fine-tuning from a file
    questions = []
    with open(f'{self.dataset_path}/train_questions.txt', "r", encoding='utf-8') as f:
        for line in f:
            questions.append(line.strip())

    try:
        # Generate responses to the questions using GPT-4; the handler
        # records each request/response pair as a fine-tuning event
        index = VectorStoreIndex.from_documents(self.documents)
        query_engine = index.as_query_engine(similarity_top_k=2, llm=llm)
        for question in questions:
            response = query_engine.query(question)
    except Exception as e:
        # Handle the exception here; you might want to log the error or take appropriate action
        print(f"An error occurred: {e}")
    finally:
        # Save the fine-tuning events to a JSONL file
        finetuning_handler.save_finetuning_events(f'{self.dataset_path}/finetuning_events.jsonl')
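To confirm the fix actually wrote data, you can count the examples in the generated file using only the standard library. This is a minimal sketch assuming the usual JSONL layout of one JSON object per line; `count_jsonl_examples` is a hypothetical helper, not part of Llama Index:

```python
import json

def count_jsonl_examples(path):
    """Count valid JSON objects in a JSONL file (one object per line)."""
    count = 0
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            json.loads(line)  # raises ValueError if a line is malformed
            count += 1
    return count
```

If this returns 0 even though the query loop ran without errors, the handler never received any events, which points back at how the callback manager was attached.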