Here is a word2vec implementation:
%reset -f

import torch
from torch.autograd import Variable
import numpy as np
import torch.nn.functional as F

corpus = [
    'this test',
    'this separate test'
]

def get_input_layer(word_idx):
    x = torch.zeros(vocabulary_size).float()
    x[word_idx] = 1.0
    return x

def tokenize_corpus(corpus):
    tokens = [x.split() for x in corpus]
    return tokens

tokenized_corpus = tokenize_corpus(corpus)

vocabulary = []
for sentence in tokenized_corpus:
    for token in sentence:
        if token not in vocabulary:
            vocabulary.append(token)

word2idx = {w: idx for (idx, w) in enumerate(vocabulary)}
idx2word = {idx: w for (idx, w) in enumerate(vocabulary)}

window_size = 2
idx_pairs = []
# for each sentence
for sentence in tokenized_corpus:
    indices = [word2idx[word] for word in sentence]
    # for each word, treated as center word
    for center_word_pos in range(len(indices)):
        # for each window position
        for w in range(-window_size, window_size + 1):
            context_word_pos = center_word_pos + w
            # make sure not to jump out of the sentence
            if context_word_pos < 0 or context_word_pos >= len(indices) or center_word_pos == context_word_pos:
                continue
            context_word_idx = indices[context_word_pos]
            idx_pairs.append((indices[center_word_pos], context_word_idx))

idx_pairs = np.array(idx_pairs)  # it will be useful to have this as a numpy array

vocabulary_size = len(vocabulary)
embedding_dims = 4
W1 = Variable(torch.randn(embedding_dims, vocabulary_size).float(), requires_grad=True)
W2 = Variable(torch.randn(vocabulary_size, embedding_dims).float(), requires_grad=True)
num_epochs = 1
learning_rate = 0.001

for epo in range(num_epochs):
    loss_val = 0
    for data, target in idx_pairs:
        x = Variable(get_input_layer(data)).float()
        y_true = Variable(torch.from_numpy(np.array([target])).long())

        z1 = torch.matmul(W1, x)
        z2 = torch.matmul(W2, z1)

        log_softmax = F.log_softmax(z2, dim=0)
        loss = F.nll_loss(log_softmax.view(1, -1), y_true)
        print(float(loss))
        loss_val += loss.data.item()
        loss.backward()
        W1.data -= learning_rate * W1.grad.data
        W2.data -= learning_rate * W2.grad.data

        W1.grad.data.zero_()
        W2.grad.data.zero_()
        print(W1.shape)
        print(W2.shape)
        print(f'Loss at epo {epo}: {loss_val/len(idx_pairs)}')
This prints:
0.33185482025146484
torch.Size([4, 3])
torch.Size([3, 4])
Loss at epo 0: 0.041481852531433105
3.302438735961914
torch.Size([4, 3])
torch.Size([3, 4])
Loss at epo 0: 0.45428669452667236
2.3144636154174805
torch.Size([4, 3])
torch.Size([3, 4])
Loss at epo 0: 0.7435946464538574
0.33418864011764526
torch.Size([4, 3])
torch.Size([3, 4])
Loss at epo 0: 0.7853682264685631
1.0644199848175049
torch.Size([4, 3])
torch.Size([3, 4])
Loss at epo 0: 0.9184207245707512
0.4970806837081909
torch.Size([4, 3])
torch.Size([3, 4])
Loss at epo 0: 0.980555810034275
3.2861199378967285
torch.Size([4, 3])
torch.Size([3, 4])
Loss at epo 0: 1.3913208022713661
6.170125961303711
torch.Size([4, 3])
torch.Size([3, 4])
Loss at epo 0: 2.16258654743433
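As a sanity check (not part of the original code): for this toy corpus the pair generation above produces 8 (center, context) index pairs, which is why the running average divides by 8. A quick way to inspect them after running the code above:

# Inspect the vocabulary mapping and the generated skip-gram pairs.
print(word2idx)        # {'this': 0, 'test': 1, 'separate': 2}
print(len(idx_pairs))  # 8 pairs for this two-sentence corpus
print(idx_pairs)       # each row is (center_word_idx, context_word_idx)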
To modify the code to use mse_loss, y_true is changed to the float type:
y_true = Variable(torch.from_numpy(np.array([target])).float())
and mse_loss is used:
loss = F.mse_loss(log_softmax.view(1,-1), y_true)
The updated code:
for epo in range(num_epochs):
    loss_val = 0
    for data, target in idx_pairs:
        x = Variable(get_input_layer(data)).float()
        y_true = Variable(torch.from_numpy(np.array([target])).float())

        z1 = torch.matmul(W1, x)
        z2 = torch.matmul(W2, z1)

        log_softmax = F.log_softmax(z2, dim=0)
        loss = F.mse_loss(log_softmax.view(1, -1), y_true)
        print(float(loss))
        loss_val += loss.data.item()
        loss.backward()
        W1.data -= learning_rate * W1.grad.data
        W2.data -= learning_rate * W2.grad.data

        W1.grad.data.zero_()
        W2.grad.data.zero_()
        print(W1.shape)
        print(W2.shape)
        print(f'Loss at epo {epo}: {loss_val/len(idx_pairs)}')
The output now is:
41.75048828125
torch.Size([4, 3])
torch.Size([3, 4])
Loss at epo 0: 5.21881103515625
16.929386138916016
torch.Size([4, 3])
torch.Size([3, 4])
Loss at epo 0: 7.334984302520752
50.63690948486328
torch.Size([4, 3])
torch.Size([3, 4])
Loss at epo 0: 13.664597988128662
36.21110534667969
torch.Size([4, 3])
torch.Size([3, 4])
Loss at epo 0: 18.190986156463623
5.304859638214111
torch.Size([4, 3])
torch.Size([3, 4])
Loss at epo 0: 18.854093611240387
9.802173614501953
torch.Size([4, 3])
torch.Size([3, 4])
Loss at epo 0: 20.07936531305313
15.515325546264648
torch.Size([4, 3])
torch.Size([3, 4])
Loss at epo 0: 22.018781006336212
30.408292770385742
torch.Size([4, 3])
torch.Size([3, 4])
Loss at epo 0: 25.81981760263443
-c:12: UserWarning: Using a target size (torch.Size([1])) that is different to the input size (torch.Size([1, 3])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
Why does the mse loss function perform so much worse than the nll loss function? Is this related to the PyTorch warning:
Using a target size (torch.Size([1])) that is different to the input size (torch.Size([1, 3])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
?
Answer:
For nn.MSELoss, the input and target must have the same size, because the loss is computed from the i-th element of the input and the i-th element of the target, i.e. mse_i = (input_i - target_i) ** 2.
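A minimal standalone sketch of what that shape mismatch does in practice (the values below are made up for illustration): with an input of shape (1, 3) and a target of shape (1,), mse_loss broadcasts the single target value against all three prediction entries instead of comparing element i with element i, which is exactly what the warning is about:

import torch
import torch.nn.functional as F

pred = torch.tensor([[-1.0, -2.0, -3.0]])  # shape (1, 3), e.g. a log-softmax output
target = torch.tensor([2.0])               # shape (1,), a class index stored as a float

loss = F.mse_loss(pred, target)            # triggers the broadcasting UserWarning
manual = ((pred - 2.0) ** 2).mean()        # what the broadcast actually computes
print(loss.item(), manual.item())          # both ~16.67: every entry is compared against 2.0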
Furthermore, your targets are non-negative integers in the range [0, vocabulary_size), while log-softmax produces values in the range [-inf, 0]. The point of MSE is to drive the prediction toward the target value, but the only overlap between these two ranges is 0, which means every class other than 0 is unreachable.
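To make the range argument concrete, here is a small standalone illustration (random scores, only to show the sign of the outputs): log-softmax values are always <= 0, so an MSE toward a positive class index such as 2 can never be driven to zero:

import torch
import torch.nn.functional as F

z2 = torch.randn(3)                  # arbitrary scores over a 3-word vocabulary
log_probs = F.log_softmax(z2, dim=0)
print(log_probs)                     # every entry is <= 0, since probabilities are <= 1
print((log_probs - 2.0) ** 2)        # squared errors against a target of 2.0 stay >= 4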
MSE is a regression loss and is not appropriate here, since you are dealing with categorical data.
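For completeness, a small standalone check (my addition, not part of the original answer) showing that the classification loss the original code already uses, log_softmax followed by nll_loss, is equivalent to F.cross_entropy applied directly to the raw scores, which is the usual choice for categorical targets:

import torch
import torch.nn.functional as F

scores = torch.randn(1, 3)     # raw scores z2 for a vocabulary of 3 words
target = torch.tensor([2])     # context-word index as a long tensor

ce = F.cross_entropy(scores, target)
nll = F.nll_loss(F.log_softmax(scores, dim=1), target)
print(ce.item(), nll.item())   # identical: cross_entropy = log_softmax + nll_loss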