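I am trying to build the Jacobian matrix of specific output variables with respect to specific input features of a Keras model. For example, if I have a model with 100 input features and 10 output variables, and I want the Jacobian of outputs 2, 3 and 4 with respect to inputs 50-70, I can build it like this: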
    from keras.models import Model
    from keras.layers import Dense, Input
    import tensorflow as tf
    import keras.backend as K
    import numpy as np

    input_ = Input(shape=(100,))
    output_ = Dense(10)(input_)
    model = Model(input_, output_)

    x_indices = np.arange(50, 70)
    y_indices = [2, 3, 4]
    y_list = tf.unstack(model.output[0])

    x = np.random.random((1, 100))

    jacobian_matrix = []
    for i in y_indices:
        J = tf.gradients(y_list[i], model.input)
        jacobian_func = K.function([model.input, K.learning_phase()], J)
        jac = jacobian_func([x, False])[0][0, x_indices]
        jacobian_matrix.append(jac)
    jacobian_matrix = np.array(jacobian_matrix)
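But for a much more complex model this is very slow. I would like to build the Jacobian function above only with respect to the inputs of interest. I tried something like this: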
    from keras.models import Model
    from keras.layers import Dense, Input
    import tensorflow as tf
    import keras.backend as K
    import numpy as np

    input_ = Input(shape=(100,))
    output_ = Dense(10)(input_)
    model = Model(input_, output_)

    x_indices = np.arange(50, 60)
    y_indices = [2, 3, 4]
    y_list = tf.unstack(model.output[0])
    x_list = tf.unstack(model.input[0])

    x = np.random.random((1, 100))

    jacobian_matrix = []
    for i in y_indices:
        jacobian_row = []
        for j in x_indices:
            J = tf.gradients(y_list[i], x_list[j])
            jacobian_func = K.function([model.input, K.learning_phase()], J)
            jac = jacobian_func([x, False])[0][0, :]
            jacobian_row.append(jac)
        jacobian_matrix.append(jacobian_row)
    jacobian_matrix = np.array(jacobian_matrix)
But I got this error:
    TypeError                                 Traceback (most recent call last)
    <ipython-input-33-d0d524ad0e40> in <module>()
         23     for j in x_indices:
         24         J = tf.gradients(y_list[i], x_list[j])
    ---> 25         jacobian_func = K.function([model.input, K.learning_phase()], J)
         26         jac = jacobian_func([x, False])[0][0,:]
         27         jacobian_row.append(jac)

    /opt/conda/lib/python2.7/site-packages/keras/backend/tensorflow_backend.pyc in function(inputs, outputs, updates, **kwargs)
       2500                 msg = 'Invalid argument "%s" passed to K.function with TensorFlow backend' % key
       2501                 raise ValueError(msg)
    -> 2502     return Function(inputs, outputs, updates=updates, **kwargs)
       2503
       2504

    /opt/conda/lib/python2.7/site-packages/keras/backend/tensorflow_backend.pyc in __init__(self, inputs, outputs, updates, name, **session_kwargs)
       2443         self.inputs = list(inputs)
       2444         self.outputs = list(outputs)
    -> 2445         with tf.control_dependencies(self.outputs):
       2446             updates_ops = []
       2447             for update in updates:

    /opt/conda/lib/python2.7/site-packages/tensorflow/python/framework/ops.pyc in control_dependencies(control_inputs)
       4302   """
       4303   if context.in_graph_mode():
    -> 4304     return get_default_graph().control_dependencies(control_inputs)
       4305   else:
       4306     return _NullContextmanager()

    /opt/conda/lib/python2.7/site-packages/tensorflow/python/framework/ops.pyc in control_dependencies(self, control_inputs)
       4015       if isinstance(c, IndexedSlices):
       4016         c = c.op
    -> 4017       c = self.as_graph_element(c)
       4018       if isinstance(c, Tensor):
       4019         c = c.op

    /opt/conda/lib/python2.7/site-packages/tensorflow/python/framework/ops.pyc in as_graph_element(self, obj, allow_tensor, allow_operation)
       3033
       3034     with self._lock:
    -> 3035       return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
       3036
       3037   def _as_graph_element_locked(self, obj, allow_tensor, allow_operation):

    /opt/conda/lib/python2.7/site-packages/tensorflow/python/framework/ops.pyc in _as_graph_element_locked(self, obj, allow_tensor, allow_operation)
       3122       # We give up!
       3123       raise TypeError("Can not convert a %s into a %s."
                                % (type(obj).__name__,
    -> 3124                        types_str))
       3125
       3126   def get_operations(self):

    TypeError: Can not convert a NoneType into a Tensor or Operation.
Any ideas? Thanks.

Answer:
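The problem is the line `J = tf.gradients(y_list[i], x_list[j])`. `x_list[j]` is derived from `model.input[0]`, but there is no path in the graph from `x_list[j]` to `model.output[0]`: the model consumes `model.input`, not the unstacked copies of it. `tf.gradients` therefore returns `None` for that pair, and `K.function` fails when it tries to convert the `None` into a tensor. You need to either split the model input, recombine the pieces and run the model on the recombined tensor, or take the derivative with respect to the whole input and then select only the `j`-th entries of it.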
First approach:
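    inputs = tf.keras.Input(shape=(100,))
    # Split the input so the slice of interest is its own tensor in the graph.
    uninteresting, interesting, more_uninteresting = tf.split(inputs, [50, 10, 40], axis=1)
    # Recombine the pieces; the model has to be built on this recombined tensor
    # so that its output actually depends on `interesting`.
    inputs = tf.concat([uninteresting, interesting, more_uninteresting], axis=1)
    model = Model(inputs)  # schematic: build/run the model on the recombined tensor
    ...
    J, = tf.gradients(y_list[i], interesting)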
Second approach:
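    # Differentiate with respect to the full input tensor (not a slice of it,
    # which would again be disconnected from the output), then keep columns 50-59.
    J, = tf.gradients(y_list[i], model.input)
    J = J[:, 50:60]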
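To make the second approach concrete without tying it to a particular TensorFlow version, here is a minimal NumPy sketch. It assumes the single linear `Dense(10)` layer from the question, for which the exact Jacobian is simply the transposed weight matrix. The names `W`, `b`, `forward` and `full_jacobian` are made up for illustration and are not part of Keras, and finite differences stand in for `tf.gradients`. The only point is that the sub-block is cut out of one full Jacobian rather than built entry by entry.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for Dense(10) on 100 inputs: y = x @ W + b (hypothetical weights).
W = rng.normal(size=(100, 10))
b = rng.normal(size=10)

def forward(x):
    return x @ W + b

def full_jacobian(f, x, eps=1e-6):
    """Central finite differences: J[i, j] = d f(x)[i] / d x[j]."""
    cols = []
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        cols.append((f(x + step) - f(x - step)) / (2 * eps))
    return np.stack(cols, axis=1)

x = rng.random(100)
y_indices = [2, 3, 4]
x_indices = np.arange(50, 60)

# Differentiate with respect to the whole input, then keep only the block of interest.
J_full = full_jacobian(forward, x)                 # shape (10, 100)
J_block = J_full[np.ix_(y_indices, x_indices)]     # shape (3, 10)

# For a linear layer the exact Jacobian is W.T, so the block can be checked directly.
J_exact = W.T[np.ix_(y_indices, x_indices)]
```

`np.ix_` selects the rows in `y_indices` and the columns in `x_indices` in one step, which mirrors slicing `J[:, 50:60]` for each chosen output.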
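Even so, this will still be slow for a large number of `y` indices, so I strongly suggest making sure you really need the Jacobian itself (and not, for example, the result of a Jacobian-vector product).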
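As a rough illustration of that last point, the following NumPy sketch compares building a Jacobian block row by row with computing a single weighted combination of its rows. It assumes a hypothetical two-layer tanh network; the weights `W1`, `W2` and the helper `vjp` are invented for this example and do not come from the question. The backward pass is written by hand, but it plays the same role as one `tf.gradients` call, which likewise returns a weighted sum of output gradients rather than a full Jacobian.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(100, 32)) * 0.1   # hypothetical hidden-layer weights
W2 = rng.normal(size=(32, 10)) * 0.1    # hypothetical output-layer weights

def forward(x):
    h = np.tanh(x @ W1)
    return h @ W2, h

def vjp(x, v):
    """One backward pass: returns v @ J, where J = dy/dx has shape (10, 100)."""
    _, h = forward(x)
    grad_h = W2 @ v                       # back through the output layer, shape (32,)
    grad_pre = grad_h * (1.0 - h ** 2)    # back through tanh
    return W1 @ grad_pre                  # back through the hidden layer, shape (100,)

x = rng.random(100)
y_indices = [2, 3, 4]
x_indices = np.arange(50, 60)

# Full Jacobian block: one backward pass per requested output.
rows = []
for i in y_indices:
    e_i = np.zeros(10)
    e_i[i] = 1.0                          # one-hot vector selects output i
    rows.append(vjp(x, e_i)[x_indices])
J_block = np.stack(rows)                  # shape (3, 10)

# If only a weighted combination of those rows is needed, one pass is enough.
weights = np.array([0.5, -1.0, 2.0])
v = np.zeros(10)
v[y_indices] = weights
combined = vjp(x, v)[x_indices]           # shape (10,)
```

The cost of the loop grows with the number of `y` indices, while the single product costs one backward pass regardless of how many outputs are weighted into `v`.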