I am learning about neural networks and I would like to write a cross_entropy function in Python. It is defined as

$$\mathrm{CE} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{k} t_{i,j}\,\log(p_{i,j}),$$

where $N$ is the number of samples, $k$ is the number of classes, $\log$ is the natural logarithm, $t_{i,j}$ is 1 if sample $i$ is in class $j$ and 0 otherwise, and $p_{i,j}$ is the predicted probability that sample $i$ is in class $j$. To avoid numerical issues with the logarithm, clip the predictions to the range $[10^{-12},\ 1 - 10^{-12}]$.
Following this description, I wrote code that clips the predictions to the range $[\epsilon,\ 1 - \epsilon]$ and then computes the cross entropy according to the formula above:
```python
import numpy as np

def cross_entropy(predictions, targets, epsilon=1e-12):
    """
    Computes cross entropy between targets (encoded as one-hot vectors)
    and predictions.
    Input: predictions (N, k) ndarray
           targets (N, k) ndarray
    Returns: scalar
    """
    predictions = np.clip(predictions, epsilon, 1. - epsilon)
    ce = -np.mean(np.log(predictions) * targets)
    return ce
```
The following code is used to check whether the cross_entropy function is correct:
```python
predictions = np.array([[0.25, 0.25, 0.25, 0.25],
                        [0.01, 0.01, 0.01, 0.96]])
targets = np.array([[0, 0, 0, 1],
                    [0, 0, 0, 1]])
ans = 0.71355817782  # correct answer
x = cross_entropy(predictions, targets)
print(np.isclose(x, ans))
```
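(For reference, the expected value follows directly from the definition: both targets are one-hot on the last class, so $\mathrm{CE} = -\tfrac{1}{2}\bigl(\log 0.25 + \log 0.96\bigr) \approx -\tfrac{1}{2}(-1.386294 - 0.040822) \approx 0.713558$.)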
The code above prints False, which means my implementation of cross_entropy is not correct. I then printed the result of cross_entropy(predictions, targets): it gives 0.178389544455, whereas the correct result should be ans = 0.71355817782. Could someone help me find what is wrong with my code?
Answer:
You are not far off, but remember that the formula averages N per-sample sums, i.e. it divides the double sum by N = 2 here, whereas np.mean divides by the total number of entries N·k = 8. That is why your result is exactly k = 4 times too small. So your code could read:
```python
import numpy as np

def cross_entropy(predictions, targets, epsilon=1e-12):
    """
    Computes cross entropy between targets (encoded as one-hot vectors)
    and predictions.
    Input: predictions (N, k) ndarray
           targets (N, k) ndarray
    Returns: scalar
    """
    predictions = np.clip(predictions, epsilon, 1. - epsilon)
    N = predictions.shape[0]
    ce = -np.sum(targets * np.log(predictions + 1e-9)) / N
    return ce

predictions = np.array([[0.25, 0.25, 0.25, 0.25],
                        [0.01, 0.01, 0.01, 0.96]])
targets = np.array([[0, 0, 0, 1],
                    [0, 0, 0, 1]])
ans = 0.71355817782  # correct answer
x = cross_entropy(predictions, targets)
print(np.isclose(x, ans))
```
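A quick numerical check of that factor, reusing the arrays from your test case (no new assumptions):

```python
import numpy as np

predictions = np.array([[0.25, 0.25, 0.25, 0.25],
                        [0.01, 0.01, 0.01, 0.96]])
targets = np.array([[0, 0, 0, 1],
                    [0, 0, 0, 1]])

# The individual t_ij * log(p_ij) terms of the double sum
terms = np.log(predictions) * targets

print(-np.mean(terms))                  # 0.178389... -> divides by N*k = 8
print(-np.sum(terms) / terms.shape[0])  # 0.713558... -> divides by N = 2
```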
Here I think using np.sum() is a little clearer. Also, I added 1e-9 inside np.log() to avoid the possibility of a log(0) in the computation. Hope this helps!
Note: as @Peter pointed out in the comments, if your epsilon value is greater than 0, then the 1e-9 offset is indeed redundant.
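Concretely, a minimal sketch of the clipping step with arbitrary example values:

```python
import numpy as np

epsilon = 1e-12
p = np.clip(np.array([0.0, 0.5, 1.0]), epsilon, 1. - epsilon)

print(p)          # [1.e-12 5.e-01 1.e+00] -- every entry is >= epsilon > 0
print(np.log(p))  # all finite; log never receives 0, with or without +1e-9
```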