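I want to create a dataset with NumPy and then train and test a simple model on it, such as a linear or logistic model. I am learning PyTorch Lightning. I found a tutorial showing that a NumPy dataset can be used, with values drawn from a uniform distribution. As a beginner, I don't fully understand how to do this.

My code is below: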

    import numpy as np
    import pytorch_lightning as pl
    import torch
    from torch.utils.data import random_split, DataLoader, TensorDataset
    from torch.autograd import Variable
    from torchvision import transforms

    np.random.seed(42)
    device = 'cuda' if torch.cuda.is_available() else 'cpu'


    class DataModuleClass(pl.LightningDataModule):
        def __init__(self):
            super().__init__()
            self.constant = 2
            self.batch_size = 10
            self.transform = transforms.Compose([transforms.ToTensor()])

        def prepare_data(self):
            a = np.random.uniform(0, 500, 500)
            b = np.random.normal(0, self.constant, len(a))
            c = a + b
            X = np.transpose(np.array([a, b]))

            idx = np.arange(500)
            np.random.shuffle(idx)
            # Use the first 400 random indices for training
            train_idx = idx[:400]
            # Use the remaining indices for validation
            val_idx = idx[400:]

            # Generate train and validation datasets (features X, target c)
            x_train, y_train = X[train_idx], c[train_idx]
            x_val, y_val = X[val_idx], c[val_idx]

            # Convert the NumPy arrays to tensors
            self.x_train_tensor = torch.from_numpy(x_train).float().to(device)
            self.y_train_tensor = torch.from_numpy(y_train).float().to(device)
            self.x_val_tensor = torch.from_numpy(x_val).float().to(device)
            self.y_val_tensor = torch.from_numpy(y_val).float().to(device)

            training_dataset = TensorDataset(self.x_train_tensor, self.y_train_tensor)
            validation_dataset = TensorDataset(self.x_val_tensor, self.y_val_tensor)
            return training_dataset, validation_dataset

        def train_dataloader(self):
            training_dataloader = prepare_data()  # Most probably this is wrong way!!!
            return DataLoader(self.training_dataloader)

        def val_dataloader(self):
            validation_dataloader = prepare_data()  # Most probably this is wrong way!!!
            return DataLoader(self.validation_dataloader)

        # def test_dataloader(self):


    obj = DataModuleClass()
    print(obj.prepare_data())

[This part was written following the answer given.] Here, I want to use a and b as the features and c as the label, or target variable.
Now, how do I pass the dataset to the training and validation methods?

Answer:
You can build the data in prepare_data() or setup(), store it on self, and return it from the dataloader methods, as in the following code.

    def prepare_data(self):
        a = np.random.uniform(0, 500, 500)
        b = np.random.normal(0, self.constant, len(a))
        c = a + b
        X = np.transpose(np.array([a, b]))

        # Convert the NumPy arrays to tensors
        self.x_train_tensor = torch.from_numpy(X).float().to(device)
        self.y_train_tensor = torch.from_numpy(c).float().to(device)
        self.training_dataset = TensorDataset(self.x_train_tensor, self.y_train_tensor)

    def setup(self, stage=None):
        # Lightning passes a stage argument to setup(), so the method accepts it
        data = self.training_dataset
        self.train_data, self.val_data = random_split(data, [400, 100])

    def train_dataloader(self):
        return DataLoader(self.train_data)

    def val_dataloader(self):
        return DataLoader(self.val_data)

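You can use random_split() to split the dataset. The lengths passed to it (400 and 100 here) add up to the 500 samples in the dataset.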
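
To check the split and the loaders outside of Lightning, here is a minimal sketch that uses only NumPy and PyTorch. It rebuilds the same data (a and b as features, c as target) and wraps the two subsets in DataLoader objects. The seeded generator, the batch_size of 10 (taken from self.batch_size in the question), the shuffle flag, and the names rng, split_gen, train_loader and val_loader are my own assumptions, not part of the code above.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Synthetic data as in the question: a and b are the features, c = a + b is the target.
rng = np.random.default_rng(42)
a = rng.uniform(0, 500, 500)
b = rng.normal(0, 2, 500)
c = a + b

X = torch.from_numpy(np.column_stack([a, b])).float()  # shape (500, 2)
y = torch.from_numpy(c).float().unsqueeze(1)            # shape (500, 1)
dataset = TensorDataset(X, y)

# Assumption: a seeded generator, so the 400/100 split is the same on every run.
split_gen = torch.Generator().manual_seed(42)
train_data, val_data = random_split(dataset, [400, 100], generator=split_gen)

# Assumption: batch_size=10 mirrors self.batch_size; only the training set is shuffled.
train_loader = DataLoader(train_data, batch_size=10, shuffle=True)
val_loader = DataLoader(val_data, batch_size=10)

# Pull one batch to inspect what a training step would receive.
xb, yb = next(iter(train_loader))
print(len(train_data), len(val_data), xb.shape, yb.shape)
```

Inside the data module, the same DataLoader arguments would go in train_dataloader() and val_dataloader(). Keeping the target as a column of shape (N, 1) is a choice made here so that it lines up with the output of a layer such as torch.nn.Linear(2, 1).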