I am developing a graph neural network (GNN) that generates embeddings of input graphs for use in other applications, such as reinforcement learning.
I started from an example in the spektral library (TUDataset classification with GIN) and modified it to split the network into two parts: the first part generates the embedding, and the second part performs the classification. My goal is to train this network with supervised learning on a dataset with graph labels (e.g., TUDataset), and, once training is finished, to reuse the first part (the embedding generator) in other applications.
My approach gives different results on two datasets: with this new setup, TUDataset shows improved loss and accuracy, while another, local dataset shows a significant increase in loss.
Is my approach to generating embeddings appropriate? Any suggestions for further improvement?
Here is the code I use to generate graph embeddings:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.metrics import categorical_accuracy
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.optimizers import Adam
from spektral.data import DisjointLoader
from spektral.datasets import TUDataset
from spektral.layers import GINConv, GlobalAvgPool

################################################################################
# PARAMETERS
################################################################################
learning_rate = 1e-3  # Learning rate
channels = 128        # Hidden units
layers = 3            # GIN layers
epochs = 300          # Number of training epochs
batch_size = 32       # Batch size

################################################################################
# LOAD DATA
################################################################################
dataset = TUDataset("PROTEINS", clean=True)

# Parameters
F = dataset.n_node_features  # Dimension of node features
n_out = dataset.n_labels     # Dimension of the target

# Train/test split
idxs = np.random.permutation(len(dataset))
split = int(0.9 * len(dataset))
idx_tr, idx_te = np.split(idxs, [split])
dataset_tr, dataset_te = dataset[idx_tr], dataset[idx_te]

loader_tr = DisjointLoader(dataset_tr, batch_size=batch_size, epochs=epochs)
loader_te = DisjointLoader(dataset_te, batch_size=batch_size, epochs=1)

################################################################################
# BUILD MODEL
################################################################################
class GIN0(Model):
    def __init__(self, channels, n_layers):
        super().__init__()
        self.conv1 = GINConv(channels, epsilon=0, mlp_hidden=[channels, channels])
        self.convs = []
        for _ in range(1, n_layers):
            self.convs.append(
                GINConv(channels, epsilon=0, mlp_hidden=[channels, channels])
            )
        self.pool = GlobalAvgPool()
        self.dense1 = Dense(channels, activation="relu")

    def call(self, inputs):
        x, a, i = inputs
        x = self.conv1([x, a])
        for conv in self.convs:
            x = conv([x, a])
        x = self.pool([x, i])
        return self.dense1(x)


# Build model
model = GIN0(channels, layers)  # First part: embedding generator
model_op = Sequential()         # Second part: classification head
model_op.add(Dropout(0.5, input_shape=(channels,)))
model_op.add(Dense(n_out, activation="softmax"))
opt = Adam(lr=learning_rate)
loss_fn = CategoricalCrossentropy()


################################################################################
# FIT MODEL
################################################################################
@tf.function(input_signature=loader_tr.tf_signature(), experimental_relax_shapes=True)
def train_step(inputs, target):
    with tf.GradientTape(persistent=True) as tape:
        node2vec = model(inputs, training=True)
        predictions = model_op(node2vec, training=True)
        loss = loss_fn(target, predictions)
        loss += sum(model.losses)
    gradients = tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(gradients, model.trainable_variables))
    gradients2 = tape.gradient(loss, model_op.trainable_variables)
    opt.apply_gradients(zip(gradients2, model_op.trainable_variables))
    acc = tf.reduce_mean(categorical_accuracy(target, predictions))
    return loss, acc


print("Fitting model")
current_batch = 0
model_lss = model_acc = 0
for batch in loader_tr:
    lss, acc = train_step(*batch)
    model_lss += lss.numpy()
    model_acc += acc.numpy()
    current_batch += 1
    if current_batch == loader_tr.steps_per_epoch:
        model_lss /= loader_tr.steps_per_epoch
        model_acc /= loader_tr.steps_per_epoch
        print("Loss: {}. Acc: {}".format(model_lss, model_acc))
        model_lss = model_acc = 0
        current_batch = 0

################################################################################
# EVALUATE MODEL
################################################################################
def tolist(predictions):
    result = []
    for item in predictions:
        result.append((float(item[0]), float(item[1])))
    return result


loss_data = []
print("Testing model")
model_lss = model_acc = 0
for batch in loader_te:
    inputs, target = batch
    node2vec = model(inputs, training=False)
    predictions = model_op(node2vec, training=False)
    predictions_list = tolist(predictions)
    loss_data.append(zip(target, predictions_list))
    model_lss += loss_fn(target, predictions)
    model_acc += tf.reduce_mean(categorical_accuracy(target, predictions))
model_lss /= loader_te.steps_per_epoch
model_acc /= loader_te.steps_per_epoch
print("Done. Test loss: {}. Test acc: {}".format(model_lss, model_acc))

for batchi in loss_data:
    for item in batchi:
        print(list(item), '\n')
Answer:
Your approach to generating graph embeddings is correct; the GIN0 model will return a vector for a given input graph.
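Once training is done, the embedding part can be used on its own for the downstream applications you mention. A minimal sketch, reusing the imports and the trained model from your code above (my_graphs is a placeholder name for whatever Spektral dataset of graphs you want to embed):

# Reuse only the trained embedding part; `my_graphs` is a hypothetical dataset.
emb_loader = DisjointLoader(my_graphs, batch_size=32, epochs=1, shuffle=False)
embeddings = []
for inputs, _ in emb_loader:
    # Each call returns one embedding per graph in the batch: (n_graphs, channels)
    embeddings.append(model(inputs, training=False).numpy())
embeddings = np.concatenate(embeddings, axis=0)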
However, this piece of code looks odd:
gradients = tape.gradient(loss, model.trainable_variables)
opt.apply_gradients(zip(gradients, model.trainable_variables))
gradients2 = tape.gradient(loss, model_op.trainable_variables)
opt.apply_gradients(zip(gradients2, model_op.trainable_variables))
What you are doing here is updating the weights of model twice, while the weights of model_op are updated only once.
When you compute the loss inside a tf.GradientTape context, every computation that contributes to the final value is tracked. This means that if you call loss = foo(bar(x)) and then use that loss for a training step, the weights of both foo and bar will be updated.
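As a minimal sketch of how the update could be written instead (reusing the names from your code, not tested on your data), you can use a single non-persistent tape and take one gradient over the combined variable list, so that each weight is updated exactly once:

@tf.function(input_signature=loader_tr.tf_signature(), experimental_relax_shapes=True)
def train_step(inputs, target):
    with tf.GradientTape() as tape:
        node2vec = model(inputs, training=True)
        predictions = model_op(node2vec, training=True)
        loss = loss_fn(target, predictions) + sum(model.losses)
    # One gradient call over both parts of the network
    variables = model.trainable_variables + model_op.trainable_variables
    gradients = tape.gradient(loss, variables)
    opt.apply_gradients(zip(gradients, variables))
    acc = tf.reduce_mean(categorical_accuracy(target, predictions))
    return loss, acc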
Apart from that, I don't see any other problems in the code, so it mostly comes down to the local dataset you are using.
Cheers