Here is a word2vec implementation:
%reset -f

import torch
from torch.autograd import Variable
import numpy as np
import torch.nn.functional as F

corpus = [
    'this test',
    'this separate test'
]

def get_input_layer(word_idx):
    x = torch.zeros(vocabulary_size).float()
    x[word_idx] = 1.0
    return x

def tokenize_corpus(corpus):
    tokens = [x.split() for x in corpus]
    return tokens

tokenized_corpus = tokenize_corpus(corpus)

vocabulary = []
for sentence in tokenized_corpus:
    for token in sentence:
        if token not in vocabulary:
            vocabulary.append(token)

word2idx = {w: idx for (idx, w) in enumerate(vocabulary)}
idx2word = {idx: w for (idx, w) in enumerate(vocabulary)}

window_size = 2
idx_pairs = []
# for each sentence
for sentence in tokenized_corpus:
    indices = [word2idx[word] for word in sentence]
    # for each word, treated as center word
    for center_word_pos in range(len(indices)):
        # for each window position
        for w in range(-window_size, window_size + 1):
            context_word_pos = center_word_pos + w
            # make sure not to jump out of the sentence
            if context_word_pos < 0 or context_word_pos >= len(indices) or center_word_pos == context_word_pos:
                continue
            context_word_idx = indices[context_word_pos]
            idx_pairs.append((indices[center_word_pos], context_word_idx))

idx_pairs = np.array(idx_pairs)  # it will be useful to have this as a numpy array

vocabulary_size = len(vocabulary)
embedding_dims = 4
W1 = Variable(torch.randn(embedding_dims, vocabulary_size).float(), requires_grad=True)
W2 = Variable(torch.randn(vocabulary_size, embedding_dims).float(), requires_grad=True)
num_epochs = 1
learning_rate = 0.001

for epo in range(num_epochs):
    loss_val = 0
    for data, target in idx_pairs:
        x = Variable(get_input_layer(data)).float()
        y_true = Variable(torch.from_numpy(np.array([target])).long())

        z1 = torch.matmul(W1, x)
        z2 = torch.matmul(W2, z1)

        log_softmax = F.log_softmax(z2, dim=0)
        loss = F.nll_loss(log_softmax.view(1, -1), y_true)
        print(float(loss))
        loss_val += loss.data.item()
        loss.backward()
        W1.data -= learning_rate * W1.grad.data
        W2.data -= learning_rate * W2.grad.data

        W1.grad.data.zero_()
        W2.grad.data.zero_()
        print(W1.shape)
        print(W2.shape)
        print(f'Loss at epo {epo}: {loss_val/len(idx_pairs)}')
This prints:
0.33185482025146484
torch.Size([4, 3])
torch.Size([3, 4])
Loss at epo 0: 0.041481852531433105
3.302438735961914
torch.Size([4, 3])
torch.Size([3, 4])
Loss at epo 0: 0.45428669452667236
2.3144636154174805
torch.Size([4, 3])
torch.Size([3, 4])
Loss at epo 0: 0.7435946464538574
0.33418864011764526
torch.Size([4, 3])
torch.Size([3, 4])
Loss at epo 0: 0.7853682264685631
1.0644199848175049
torch.Size([4, 3])
torch.Size([3, 4])
Loss at epo 0: 0.9184207245707512
0.4970806837081909
torch.Size([4, 3])
torch.Size([3, 4])
Loss at epo 0: 0.980555810034275
3.2861199378967285
torch.Size([4, 3])
torch.Size([3, 4])
Loss at epo 0: 1.3913208022713661
6.170125961303711
torch.Size([4, 3])
torch.Size([3, 4])
Loss at epo 0: 2.16258654743433
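As a sanity check (not part of the original code): for this toy corpus the pair generation above produces 8 (center, context) index pairs, which is why the running average divides by 8. A quick way to inspect them after running the code above:

# Inspect the vocabulary mapping and the generated skip-gram pairs.
print(word2idx)        # {'this': 0, 'test': 1, 'separate': 2}
print(len(idx_pairs))  # 8 pairs for this two-sentence corpus
print(idx_pairs)       # each row is (center_word_idx, context_word_idx)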
To modify the code to use mse_loss, y_true is changed to the float type:
y_true = Variable(torch.from_numpy(np.array([target])).float())
and mse_loss is used:
loss = F.mse_loss(log_softmax.view(1,-1), y_true)
The updated code:
for epo in range(num_epochs):
    loss_val = 0
    for data, target in idx_pairs:
        x = Variable(get_input_layer(data)).float()
        y_true = Variable(torch.from_numpy(np.array([target])).float())

        z1 = torch.matmul(W1, x)
        z2 = torch.matmul(W2, z1)

        log_softmax = F.log_softmax(z2, dim=0)
        loss = F.mse_loss(log_softmax.view(1, -1), y_true)
        print(float(loss))
        loss_val += loss.data.item()
        loss.backward()
        W1.data -= learning_rate * W1.grad.data
        W2.data -= learning_rate * W2.grad.data

        W1.grad.data.zero_()
        W2.grad.data.zero_()
        print(W1.shape)
        print(W2.shape)
        print(f'Loss at epo {epo}: {loss_val/len(idx_pairs)}')
The output now is:
41.75048828125
torch.Size([4, 3])
torch.Size([3, 4])
Loss at epo 0: 5.21881103515625
16.929386138916016
torch.Size([4, 3])
torch.Size([3, 4])
Loss at epo 0: 7.334984302520752
50.63690948486328
torch.Size([4, 3])
torch.Size([3, 4])
Loss at epo 0: 13.664597988128662
36.21110534667969
torch.Size([4, 3])
torch.Size([3, 4])
Loss at epo 0: 18.190986156463623
5.304859638214111
torch.Size([4, 3])
torch.Size([3, 4])
Loss at epo 0: 18.854093611240387
9.802173614501953
torch.Size([4, 3])
torch.Size([3, 4])
Loss at epo 0: 20.07936531305313
15.515325546264648
torch.Size([4, 3])
torch.Size([3, 4])
Loss at epo 0: 22.018781006336212
30.408292770385742
torch.Size([4, 3])
torch.Size([3, 4])
Loss at epo 0: 25.81981760263443
-c:12: UserWarning: Using a target size (torch.Size([1])) that is different to the input size (torch.Size([1, 3])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
Why does the mse loss function perform so much worse than the nll loss function? Is this related to the PyTorch warning:
Using a target size (torch.Size([1])) that is different to the input size (torch.Size([1, 3])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
?
Answer:
For nn.MSELoss, the input and target must have the same size, because the loss is computed from the i-th element of the input and the i-th element of the target, i.e. mse_i = (input_i - target_i) ** 2.
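A minimal standalone sketch of what that shape mismatch does in practice (the values below are made up for illustration): with an input of shape (1, 3) and a target of shape (1,), mse_loss broadcasts the single target value against all three prediction entries instead of comparing element i with element i, which is exactly what the warning is about:

import torch
import torch.nn.functional as F

pred = torch.tensor([[-1.0, -2.0, -3.0]])  # shape (1, 3), e.g. a log-softmax output
target = torch.tensor([2.0])               # shape (1,), a class index stored as a float

loss = F.mse_loss(pred, target)            # triggers the broadcasting UserWarning
manual = ((pred - 2.0) ** 2).mean()        # what the broadcast actually computes
print(loss.item(), manual.item())          # both ~16.67: every entry is compared against 2.0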
Furthermore, your targets are non-negative integers in the range [0, vocabulary_size), while log-softmax produces values in the range [-inf, 0]. The point of MSE is to drive the prediction toward the target value, but the only overlap between these two ranges is 0, which means every class other than 0 is unreachable.
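To make the range argument concrete, here is a small standalone illustration (random scores, only to show the sign of the outputs): log-softmax values are always <= 0, so an MSE toward a positive class index such as 2 can never be driven to zero:

import torch
import torch.nn.functional as F

z2 = torch.randn(3)                  # arbitrary scores over a 3-word vocabulary
log_probs = F.log_softmax(z2, dim=0)
print(log_probs)                     # every entry is <= 0, since probabilities are <= 1
print((log_probs - 2.0) ** 2)        # squared errors against a target of 2.0 stay >= 4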
MSE is a regression loss and is not appropriate here, since you are dealing with categorical data.
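For completeness, a small standalone check (my addition, not part of the original answer) showing that the classification loss the original code already uses, log_softmax followed by nll_loss, is equivalent to F.cross_entropy applied directly to the raw scores, which is the usual choice for categorical targets:

import torch
import torch.nn.functional as F

scores = torch.randn(1, 3)     # raw scores z2 for a vocabulary of 3 words
target = torch.tensor([2])     # context-word index as a long tensor

ce = F.cross_entropy(scores, target)
nll = F.nll_loss(F.log_softmax(scores, dim=1), target)
print(ce.item(), nll.item())   # identical: cross_entropy = log_softmax + nll_loss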