Say I have a classification problem with 30 potential binary labels. These labels are not mutually exclusive. The labels tend to be sparse — on average roughly 1 out of the 30 labels is positive, but sometimes more than 1. In the code below, how can I penalize the model for predicting all zeros? The accuracy would be high, but the recall would be awful!
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

OUTPUT_NODES = 30
np.random.seed(0)


def get_dataset():
    """
    Get a dataset of X and y. This is a learnable problem as there is some signal in the features.
    10% of the time, a positive-output's index will also have a positive feature for that index
    :return: X and y data for training
    """
    n_observations = 30000
    y = np.random.rand(n_observations, OUTPUT_NODES)
    y = (y <= (1 / OUTPUT_NODES)).astype(int)  # Makes a sparse output where there is roughly 1 positive label: ((1 / OUTPUT_NODES) * OUTPUT_NODES ≈ 1)
    X = np.zeros((n_observations, OUTPUT_NODES))
    for i in range(len(y)):
        for j, feature in enumerate(y[i]):
            if feature == 1:
                X[i][j] = 1 if np.random.rand(1) > 0.9 else 0  # Makes the input features more noisy
                # X[i][j] = 1  # Using this instead will make the model perform very well
    return X, y


def create_model():
    input_layer = Input(shape=(OUTPUT_NODES, ))
    dense1 = Dense(100, activation='relu')(input_layer)
    dense2 = Dense(100, activation='relu')(dense1)
    output_layer = Dense(30, activation='sigmoid')(dense2)
    model = Model(inputs=input_layer, outputs=output_layer)
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['Recall'])
    return model


def main():
    X, y = get_dataset()
    model = create_model()
    model.fit(X, y, epochs=10, batch_size=10)
    X_pred = np.random.randint(0, 2, (100, OUTPUT_NODES))
    y_pred = model.predict(X_pred)
    print(X_pred)
    print(y_pred.round(1))


if __name__ == '__main__':
    main()
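To make the "accuracy high, recall awful" point above concrete, here is a quick illustrative calculation (my own aside, not part of the original code): a model that always predicts all zeros gets roughly 29 of 30 labels right per sample, yet recovers no positives at all.

import numpy as np

# Simulate labels with exactly one positive out of 30 per row
y_true = np.zeros((1000, 30), dtype=int)
y_true[np.arange(1000), np.random.randint(0, 30, 1000)] = 1
y_pred = np.zeros_like(y_true)  # the degenerate "all zeros" predictor

accuracy = (y_true == y_pred).mean()               # ~0.967 (29 of 30 labels correct)
recall = y_pred[y_true == 1].sum() / y_true.sum()  # 0.0: not a single positive found
print(accuracy, recall)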
I believe I read here that I could use:
weighted_cross_entropy_with_logits
to address this issue. How would this affect my final output layer's activation function? Do I have to have an activation function at all? How do I specify a penalty for misclassifying the true classes?
Answer:
OK, this is an interesting problem.
First, you need to define a weighted cross-entropy loss wrapper:
import tensorflow as tf

def wce_logits(positive_class_weight=1.):
    def mylossw(y_true, logits):
        cross_entropy = tf.reduce_mean(tf.nn.weighted_cross_entropy_with_logits(
            logits=logits,
            labels=tf.cast(y_true, dtype=tf.float32),
            pos_weight=positive_class_weight))
        return cross_entropy
    return mylossw
positive_class_weight is applied to the positive-class data. You need this wrapper so that tf.nn.weighted_cross_entropy_with_logits yields a loss function that takes y_true and y_pred (and only those) as inputs. Note that you must cast y_true to float32.
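To see concretely what pos_weight does, here is a small sanity check (my own sketch, assuming TF 2.x eager execution): with pos_weight > 1, a missed positive is penalized proportionally harder, while the terms for the negative labels are unchanged.

import tensorflow as tf

y_true = tf.constant([[1., 0., 0.]])     # one positive label
logits = tf.constant([[-2., -2., -2.]])  # the model predicts "all zeros"

plain = tf.nn.weighted_cross_entropy_with_logits(labels=y_true, logits=logits, pos_weight=1.)
weighted = tf.nn.weighted_cross_entropy_with_logits(labels=y_true, logits=logits, pos_weight=27.)

print(plain.numpy())     # missed positive costs ~2.13, each negative ~0.13
print(weighted.numpy())  # the same miss now costs ~57.4 (27x); negatives unchanged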
Second, you can't use the predefined Recall metric, because it doesn't work with logits. I found a workaround in this discussion here:
class Recall(tf.keras.metrics.Recall):
    def __init__(self, from_logits=False, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._from_logits = from_logits

    def update_state(self, y_true, y_pred, sample_weight=None):
        if self._from_logits:
            super(Recall, self).update_state(y_true, tf.nn.sigmoid(y_pred), sample_weight)
        else:
            super(Recall, self).update_state(y_true, y_pred, sample_weight)
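As a quick standalone check that this subclass does what we want (a hypothetical test of my own, assuming TF 2.x eager mode), feed it raw logits and verify that it applies the sigmoid before thresholding:

m = Recall(from_logits=True)
y_true = tf.constant([[1., 0., 0.]])
logits = tf.constant([[3., -3., -3.]])  # sigmoid(3) > 0.5, so the positive is predicted
m.update_state(y_true, logits)
print(m.result().numpy())  # 1.0: the single true positive was recovered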
Finally, you need to remove the sigmoid activation from the last layer, since you are now working with logits:
def create_model():
    input_layer = Input(shape=(OUTPUT_NODES, ))
    dense1 = Dense(100, activation='relu')(input_layer)
    dense2 = Dense(100, activation='relu')(dense1)
    output_layer = Dense(30)(dense2)
    model = Model(inputs=input_layer, outputs=output_layer)
    model.compile(optimizer='adam', loss=wce_logits(positive_class_weight=27.), metrics=[Recall(from_logits=True)])
    return model
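One consequence of dropping the sigmoid is that model.predict now returns logits rather than probabilities, so the main() from the question needs a sigmoid before the outputs are inspected or thresholded. A minimal sketch of that adjustment (assuming the imports from the question plus tensorflow as tf):

def main():
    X, y = get_dataset()
    model = create_model()
    model.fit(X, y, epochs=10, batch_size=10)
    X_pred = np.random.randint(0, 2, (100, OUTPUT_NODES))
    logits = model.predict(X_pred)
    probs = tf.nn.sigmoid(logits).numpy()  # map logits back into [0, 1]
    print(X_pred)
    print(probs.round(1))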
Note that the positive weight is set to 27 here. You can read a discussion on how to correctly compute the weight here.
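One common heuristic (my own sketch, not necessarily what the linked discussion recommends) is the ratio of negative to positive labels in the training data; with roughly 1 positive out of 30 labels this comes out near 29, so the value 27 above is in the same ballpark.

def estimate_pos_weight(y):
    # Ratio of negative to positive entries across the whole label matrix
    n_pos = y.sum()
    n_neg = y.size - n_pos
    return n_neg / n_pos

# With the dataset from the question: X, y = get_dataset()
# estimate_pos_weight(y) comes out at roughly 29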