AttributeError: Caught AttributeError in DataLoader worker process 0. - Fine-tuning a pretrained transformer model

Can anyone help me resolve this error?

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-aaa58b106c77> in <module>()
     25     output_path='fine_tuned_bert',
     26     save_best_model= True,
---> 27     show_progress_bar= True
     28     )

4 frames
/usr/local/lib/python3.7/dist-packages/torch/_utils.py in reraise(self)
    423             # have message field
    424             raise self.exc_type(message=msg)
--> 425         raise self.exc_type(msg)
    426
    427

AttributeError: Caught AttributeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/usr/local/lib/python3.7/dist-packages/sentence_transformers/SentenceTransformer.py", line 518, in smart_batching_collate
    num_texts = len(batch[0].texts)
AttributeError: 'str' object has no attribute 'texts'

Code:

import pandas as pd

# initialise data of lists.
data = {'input': [
    "Alpro, Cioccolato bevanda a base di soia 1 ltr",      # Alpro, Chocolate soy drink 1 ltr
    "Milka  cioccolato al latte 100 g",                    # Milka milk chocolate 100 g
    "Danone, HiPRO 25g Proteine gusto cioccolato 330 ml",  # Danone, HiPRO 25g Protein chocolate flavor 330 ml
]}

# Creates pandas DataFrame.
x_sample = pd.DataFrame(data)
print(x_sample['input'])

# load model
from sentence_transformers import SentenceTransformer, SentencesDataset, InputExample, losses, evaluation
from torch.utils.data import DataLoader

embedder = SentenceTransformer('sentence-transformers/paraphrase-xlm-r-multilingual-v1')  # or any other pretrained model
print("embedder loaded...")

# define your train dataset, the dataloader, and the train loss
train_dataset = SentencesDataset(x_sample["input"].tolist(), embedder)
train_dataloader = DataLoader(train_dataset, shuffle=False, batch_size=4, num_workers=1)
train_loss = losses.CosineSimilarityLoss(embedder)

# dummy evaluator to make the api work
sentences1 = ['latte al cioccolato', 'latte al cioccolato', 'latte al cioccolato']
sentences2 = ['Alpro, Cioccolato bevanda a base di soia 1 ltr',
              'Danone, HiPRO 25g Proteine gusto cioccolato 330 ml',
              'Milka  cioccolato al latte 100 g']
scores = [0.99, 0.95, 0.4]
evaluator = evaluation.EmbeddingSimilarityEvaluator(sentences1, sentences2, scores)

# tune the model
embedder.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=5,
    warmup_steps=500,
    evaluator=evaluator,
    evaluation_steps=1,
    output_path='fine_tuned_bert',
    save_best_model=True,
    show_progress_bar=True,
)

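Answer:

[UPDATE] I skimmed the documentation on how to use the fit() method and realized there is a simpler way to get what you want. The only changes you need are to define proper InputExample objects (each holding a pair of texts and a label) to build the DataLoader from, and then to create the loss function.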

import pandas as pd

# initialise data of lists.
data = {'input': [
    "Alpro, Cioccolato bevanda a base di soia 1 ltr",      # Alpro, Chocolate soy drink 1 ltr
    "Milka  cioccolato al latte 100 g",                    # Milka milk chocolate 100 g
    "Danone, HiPRO 25g Proteine gusto cioccolato 330 ml",  # Danone, HiPRO 25g Protein chocolate flavor 330 ml
]}

# Creates pandas DataFrame.
x_sample = pd.DataFrame(data)
print(x_sample['input'])

# load model
from sentence_transformers import SentenceTransformer, SentencesDataset, InputExample, losses, evaluation
from torch.utils.data import DataLoader

embedder = SentenceTransformer('sentence-transformers/paraphrase-xlm-r-multilingual-v1')  # or any other pretrained model
print("embedder loaded...")

# define your train dataset, the dataloader, and the train loss
# train_dataset = SentencesDataset(x_sample["input"].tolist(), embedder)
# train_dataloader = DataLoader(train_dataset, shuffle=False, batch_size=4, num_workers=1)
# train_loss = losses.CosineSimilarityLoss(embedder)

# dummy evaluator to make the api work
sentences1 = ['latte al cioccolato', 'latte al cioccolato', 'latte al cioccolato']
sentences2 = ['Alpro, Cioccolato bevanda a base di soia 1 ltr',
              'Danone, HiPRO 25g Proteine gusto cioccolato 330 ml',
              'Milka  cioccolato al latte 100 g']
scores = [0.99, 0.95, 0.4]
evaluator = evaluation.EmbeddingSimilarityEvaluator(sentences1, sentences2, scores)

# build one InputExample per sentence pair, with its similarity score as the label
examples = []
for s1, s2, l in zip(sentences1, sentences2, scores):
    examples.append(InputExample(texts=[s1, s2], label=l))

train_dataloader = DataLoader(examples, shuffle=False, batch_size=4, num_workers=1)
train_loss = losses.CosineSimilarityLoss(embedder)

# tune the model
embedder.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=5,
    warmup_steps=500,
    evaluator=evaluator,
    evaluation_steps=1,
    output_path='fine_tuned_bert',
    save_best_model=True,
    show_progress_bar=True,
)
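
For anyone wondering why the original version failed: the last frame of the traceback shows the collate step reading batch[0].texts, so every item in the batch has to be an object with a texts attribute, not a plain string. Below is a minimal sketch that reproduces the same failure without installing anything. FakeInputExample and sketch_collate are hypothetical stand-ins I wrote for illustration; they are not the library's classes and only mimic the single attribute access seen in the traceback.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeInputExample:
    """Hypothetical stand-in for InputExample: a list of texts plus a label."""
    texts: List[str]
    label: float = 0.0


def sketch_collate(batch):
    """Simplified, assumed version of a collate step that reads .texts from each item."""
    num_texts = len(batch[0].texts)  # same attribute access as the failing traceback line
    columns = [[] for _ in range(num_texts)]
    labels = []
    for example in batch:
        for i, text in enumerate(example.texts):
            columns[i].append(text)
        labels.append(example.label)
    return columns, labels


# Case 1: a batch of raw strings, as produced by x_sample["input"].tolist()
raw_strings = ["Milka  cioccolato al latte 100 g"]
try:
    sketch_collate(raw_strings)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
print(error_message)

# Case 2: a batch of example objects that carry a sentence pair and a score
pairs = [
    FakeInputExample(
        texts=["latte al cioccolato", "Milka  cioccolato al latte 100 g"],
        label=0.4,
    )
]
columns, labels = sketch_collate(pairs)
print(columns, labels)
```

The first call fails because a str has no texts attribute; the second works because each item wraps its sentence pair. That is also why CosineSimilarityLoss needs pairs with a score: a list of single product names gives it nothing to compare.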
