I am trying to fit an LSTM model using PyTorch. My data is too large to read into memory all at once, so I want to use PyTorch's DataLoader class to create mini-batches of the data.
My input has two features (X1, X2) and I have one output feature (y). I use 365 time steps of X1 and X2 as features to predict y.
The dimensions of my training array are:

(n_observations, n_timesteps, n_features) == (9498, 365, 2)
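As a quick sanity check, the shapes described above can be sketched with random placeholder tensors (the values are not the real data, only the dimensions matter here):

```python
import torch

# (n_observations, n_timesteps, n_features) for the inputs,
# one target value per observation for the output.
train_x = torch.rand(9498, 365, 2)
train_y = torch.rand(9498, 1)

print(train_x.shape)  # torch.Size([9498, 365, 2])
print(train_y.shape)  # torch.Size([9498, 1])
```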
I don't understand why the code below does not work, because I have seen other examples where the X, y pairs have different numbers of dimensions (an LSTM for runoff modelling, PyTorch's own documentation).
Minimal reproducible example
Output:
```
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-47-2a0b28b53c8f> in <module>
     13
     14 iterator = train_dataloader.__iter__()
---> 15 iterator.next()

/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py in __next__(self)
    344     def __next__(self):
    345         index = self._next_index()  # may raise StopIteration
--> 346         data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
    347         if self._pin_memory:
    348             data = _utils.pin_memory.pin_memory(data)

/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py in fetch(self, possibly_batched_index)
     45         else:
     46             data = self.dataset[possibly_batched_index]
---> 47         return self.collate_fn(data)

/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py in default_collate(batch)
     53             storage = elem.storage()._new_shared(numel)
     54             out = elem.new(storage)
---> 55             return torch.stack(batch, 0, out=out)
     56     elif elem_type.__module__ == 'numpy' and elem_type.__name__ != 'str_' \
     57             and elem_type.__name__ != 'string_':

RuntimeError: invalid argument 0: Tensors must have same number of dimensions: got 4 and 3 at /tmp/pip-req-build-4baxydiv/aten/src/TH/generic/THTensor.cpp:680
```
Answer:
torch.utils.data.DataLoader must receive a torch.utils.data.Dataset as its argument. You are passing it a tuple of tensors instead. I suggest you use torch.utils.data.TensorDataset, like this:
```python
import torch
from torch.utils.data import DataLoader, TensorDataset

train_x = torch.rand(9498, 365, 2)
train_y = torch.rand(9498, 1)

train_dataset = TensorDataset(train_x, train_y)
train_dataloader = DataLoader(train_dataset, batch_size=256)

for x, y in train_dataloader:
    print(x.shape)
```
Check whether this solves your problem.