How do I use WeightedRandomSampler in PyTorch to balance (oversample) imbalanced data?

IT Technology · xiaolong · May 22, 2025

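I have a binary classification problem with highly imbalanced data: one class has 232,550 samples and the other has 13,498. The PyTorch documentation and various online sources say I should pass a WeightedRandomSampler to the DataLoader to deal with this.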
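To see why inverse-frequency weights balance the classes, the arithmetic can be checked with the class sizes from the question. WeightedRandomSampler draws index i with probability proportional to its weight, so if every sample of class c gets weight 1/n_c, each class carries the same total weight. The snippet below is a sketch of that calculation only; `class_share` and `draws_per_sample` are illustrative names, not part of the PyTorch API.

```python
import torch

# Class sizes from the question (class 0 = majority, class 1 = minority)
class_counts = torch.tensor([232550.0, 13498.0], dtype=torch.float64)

# Inverse-frequency weight given to every sample of a class
class_weights = 1.0 / class_counts

# Index i is drawn with probability w_i / sum(w), so a class's share of
# the draws is (count * weight) / (total weight over all samples)
class_mass = class_counts * class_weights
class_share = class_mass / class_mass.sum()

# Expected number of times each individual sample is drawn in one
# "epoch" of len(dataset) draws with replacement
num_samples = int(class_counts.sum().item())
draws_per_sample = num_samples * class_share / class_counts

print(class_share, draws_per_sample)
```

With 246,048 draws per epoch, each of the 13,498 minority samples is repeated about 9 times on average, while each majority sample is drawn about 0.53 times; this is the oversampling effect, and it relies on sampling with replacement.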

I tried using WeightedRandomSampler, but I keep running into errors.

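    import numpy as np
    import torch
    from torch.utils.data import DataLoader, WeightedRandomSampler

    # trainset is my own Dataset; trainset.labels holds the class labels
    trainratio = np.bincount(trainset.labels)            # samples per class
    classcount = trainratio.tolist()
    train_weights = 1./torch.tensor(classcount, dtype=torch.float)  # 1 / class count
    train_sampleweights = train_weights[trainset.labels]  # line named in the traceback
    train_sampler = WeightedRandomSampler(weights=train_sampleweights,
                                          num_samples=len(train_sampleweights))
    trainloader = DataLoader(trainset, sampler=train_sampler,
                             shuffle=False)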

I don't understand why I get this error while setting up the WeightedRandomSampler.

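I have tried other, similar approaches, but so far every attempt has produced some error. How should I implement this to balance my training, validation, and test data?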

The error I currently get is:

    train_sampleweights = train_weights[trainset.labels]
    ValueError: too many dimensions 'str'
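A minimal reproduction of the type problem, assuming the labels arrive as strings (as they often do when read from a CSV or text file). Indexing `train_weights` with a label list requires PyTorch to turn that list into an index tensor, and a list of strings cannot be converted; calling `torch.tensor` on it directly shows the same message. The four-element `labels_as_str` list and the weight values are hypothetical stand-ins for `trainset.labels` and `train_weights`.

```python
import torch

train_weights = torch.tensor([0.25, 0.75])

# Hypothetical labels loaded as strings instead of integers
labels_as_str = ["0", "1", "1", "0"]

# Converting a list of strings to a tensor fails with the error from the question
caught = None
try:
    torch.tensor(labels_as_str)
except ValueError as err:
    caught = err
print("ValueError:", caught)

# Integer labels produce an int64 (long) index tensor, which indexing accepts
labels_as_int = torch.tensor([int(s) for s in labels_as_str])
per_sample = train_weights[labels_as_int]
print(labels_as_int.dtype, per_sample)

# Float labels are NOT valid indices: indexing requires long/int/byte/bool tensors
float_error = None
try:
    train_weights[labels_as_int.float()]
except IndexError as err:
    float_error = err
print("IndexError:", float_error)
```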

Answer:

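The problem is the type of trainset.labels: the labels are strings, and PyTorch cannot turn a list of strings into the index tensor that train_weights[trainset.labels] needs. To fix the error, convert trainset.labels to an integer type (int64, i.e. torch.long) before computing the class counts and the per-sample weights. A float conversion is not enough: tensor indices must be integer (or boolean) tensors, and np.bincount also expects non-negative integers. If the labels are class names rather than digit strings, map each name to an integer class index first.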
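A sketch of the corrected pipeline, assuming `trainset.labels` is a list of digit strings such as "0" and "1". `ToyDataset`, its 900/100 class split, and `batch_size=64` are hypothetical choices made only so the example runs on its own; the weighting itself follows the question's code, with the labels converted to int64 first.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset, WeightedRandomSampler

torch.manual_seed(0)  # make the sampled epoch reproducible


class ToyDataset(Dataset):
    """Hypothetical stand-in for trainset: string labels, 9:1 imbalance."""

    def __init__(self, n_major=900, n_minor=100):
        self.labels = ["0"] * n_major + ["1"] * n_minor
        self.features = torch.randn(n_major + n_minor, 4)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.features[idx], int(self.labels[idx])


trainset = ToyDataset()

# Convert the string labels to non-negative int64 class indices.
# (For class names such as "cat"/"dog", map each name to an index instead of int().)
labels = np.array([int(s) for s in trainset.labels], dtype=np.int64)

# Per-class weight = 1 / number of samples in that class
class_count = np.bincount(labels)
class_weights = 1.0 / torch.tensor(class_count, dtype=torch.float)

# One weight per sample, looked up with an integer (long) index tensor
sample_weights = class_weights[torch.from_numpy(labels)]

# Draw len(trainset) indices per epoch, with replacement
sampler = WeightedRandomSampler(weights=sample_weights,
                                num_samples=len(sample_weights),
                                replacement=True)

# When a sampler is given, shuffle must stay False (its default)
trainloader = DataLoader(trainset, batch_size=64, sampler=sampler)

# Count how many samples of each class one epoch actually yields
seen = torch.zeros(2, dtype=torch.long)
for _, y in trainloader:
    seen += torch.bincount(y, minlength=2)
print(seen)
```

The printed counts should be close to 500/500 rather than 900/100; because sampling is with replacement, each of the 100 minority samples appears about five times per epoch on average.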

