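Theano: how to feed training data to a neural network

I am trying to build a simple multilayer perceptron (MLP) for "logical AND" using Theano. There is one layer between the input and the output. The structure is:

input of 2 values -> multiply with weights, add bias -> softmax -> output of 1 value

The change in dimension is caused by the weight matrix.

The implementation is based on this tutorial: http://deeplearning.net/tutorial/logreg.html

This is my Layer class: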

    import numpy
    import theano
    import theano.tensor as T


    class Layer():
        """
        This is a layer in an MLP.
        It is not meant to predict the result, so it does not compute a loss.
        Apply the negative log likelihood (= cost) function to the output of the last layer.
        """

        def __init__(self, input, n_in, n_out):
            self.W = theano.shared(
                    value=numpy.zeros(
                            (n_in, n_out),
                            dtype=theano.config.floatX
                    ),
                    name="W",
                    borrow=True
            )
            self.b = theano.shared(
                    value=numpy.zeros((n_in
                                       , n_out),
                                      dtype=theano.config.floatX),
                    name="b",
                    borrow=True
            )

            self.output = T.nnet.softmax(T.dot(input, self.W) + self.b)
            self.params = (self.W, self.b)
            self.input = input

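The class is meant to be modular: I want to be able to add multiple layers, not just one. The functions for prediction, cost and errors therefore live outside the class (unlike in the tutorial):

    def y_pred(output):
        return T.argmax(output, axis=1)


    def negative_log_likelihood(output, y):
        return -T.mean(T.log(output)[T.arange(y.shape[0]), y])


    def errors(output, y):
        # check if y has the same dimension as y_pred
        if y.ndim != y_pred(output).ndim:
            raise TypeError(
                    'y should have the same shape as self.y_pred',
                    ('y', y.type, 'y_pred', y_pred(output).type)
            )
        # check if y is of the correct datatype
        if y.dtype.startswith('int'):
            # the T.neq operator returns a vector of 0s and 1s, where 1
            # represents a mistake in prediction
            return T.mean(T.neq(y_pred(output), y))
        else:
            raise NotImplementedError()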
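As a rough illustration of what the indexing in negative_log_likelihood does, here is a minimal NumPy sketch. It uses NumPy instead of Theano, and the probabilities are made up for the example. For each row i it picks the log-probability in column y[i], so the labels act as column indices into the output.

```python
import numpy as np

# Hypothetical softmax outputs for 3 samples and 2 classes (each row sums to 1).
output = np.array([[0.9, 0.1],
                   [0.2, 0.8],
                   [0.6, 0.4]])
y = np.array([0, 1, 0])  # integer class labels, one per row

# Same indexing pattern as T.log(output)[T.arange(y.shape[0]), y]:
# row i contributes log(output[i, y[i]]).
picked = np.log(output)[np.arange(y.shape[0]), y]
nll = -np.mean(picked)

# Equivalent explicit loop, for comparison.
nll_loop = -sum(np.log(output[i, y[i]]) for i in range(len(y))) / len(y)
print(nll, nll_loop)
```

Because the label is used as a column index, the output needs at least as many columns as there are distinct labels.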

Logical AND has 4 training cases:

[0,0] -> 0
[1,0] -> 0
[0,1] -> 0
[1,1] -> 1

Here is the setup of the classifier and the functions for training and evaluation:

    data_x = numpy.matrix([[0, 0],
                           [1, 0],
                           [0, 1],
                           [1, 1]])

    data_y = numpy.array([0,
                          0,
                          0,
                          1])

    train_set_x = theano.shared(numpy.asarray(data_x,
                             dtype=theano.config.floatX),
                             borrow=True)

    train_set_y = T.cast(theano.shared(numpy.asarray(data_y,
                             dtype=theano.config.floatX),
                             borrow=True),"int32")

    x = T.vector("x",theano.config.floatX)  # data
    y = T.ivector("y")  # labels

    classifier = Layer(input=x, n_in=2, n_out=1)
    cost = negative_log_likelihood(classifier.output, y)

    g_W = T.grad(cost=cost, wrt=classifier.W)
    g_b = T.grad(cost=cost, wrt=classifier.b)
    index = T.lscalar()

    learning_rate = 0.15

    updates = [
        (classifier.W, classifier.W - learning_rate * g_W),
        (classifier.b, classifier.b - learning_rate * g_b)
    ]

    train_model = theano.function(
            inputs=[index],
            outputs=cost,
            updates=updates,
            givens={
                x: train_set_x[index],
                y: train_set_y[index]
            }
    )
    validate_model = theano.function(
            inputs=[index],
            outputs=classifier.errors(y),
            givens={
                x: train_set_x[index],
                y: train_set_y[index]
            }
    )

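I tried to follow the conventions. Each row of the data matrix is one training sample, and each training sample is matched with its correct output. Unfortunately, the code fails, and I can't interpret the error message. What am I doing wrong? The error:

    TypeError: Cannot convert Type TensorType(int32, scalar) (of Variable Subtensor{int64}.0) into Type TensorType(int32, vector). You can try to manually convert Subtensor{int64}.0 into a TensorType(int32, vector).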

This error occurs deep inside the Theano code. The offending line in my program is:

    train_model = theano.function(
            inputs=[index],
            outputs=cost,
            updates=updates,
            givens={
                x: train_set_x[index],
                y: train_set_y[index]      # <---------------here
            }
    )

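Apparently there is a mismatch between the dimensions of y and the training data. My complete code is on pastebin: http://pastebin.com/U5jYitk2
The complete error message is on pastebin: http://pastebin.com/hUQJhfNM

Concise question: what is the correct way to feed training data to an MLP in Theano? Where is my mistake?

I copied most of the code from the tutorial. The notable changes (and likely causes of the error) are:

The training data for y is not a matrix. I think this is right, because the output of my network is just a scalar value.
The input of the first layer is a vector. This variable is named x.
Access to the training data does not use slicing. In the tutorial the training data is very complex and I find the data access code hard to read. I believe x should be one row of the data matrix, and that is how I implemented it.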

Update: I used Amir's code. It looks very good, thank you.

But it also produces an error. The last loop goes out of bounds:

    /usr/bin/python3.4 /home/lhk/programming/sk/mlp/mlp/Layer.py
    Traceback (most recent call last):
      File "/usr/local/lib/python3.4/dist-packages/theano/compile/function_module.py", line 595, in __call__
        outputs = self.fn()
    ValueError: y_i value out of bounds

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/home/lhk/programming/sk/mlp/mlp/Layer.py", line 113, in <module>
        train_model(i)
      File "/usr/local/lib/python3.4/dist-packages/theano/compile/function_module.py", line 606, in __call__
        storage_map=self.fn.storage_map)
      File "/usr/local/lib/python3.4/dist-packages/theano/gof/link.py", line 206, in raise_with_op
        raise exc_type(exc_value).with_traceback(exc_trace)
      File "/usr/local/lib/python3.4/dist-packages/theano/compile/function_module.py", line 595, in __call__
        outputs = self.fn()
    ValueError: y_i value out of bounds
    Apply node that caused the error: CrossentropySoftmaxArgmax1HotWithBias(Dot22.0, b, Elemwise{Cast{int32}}.0)
    Inputs types: [TensorType(float64, matrix), TensorType(float64, vector), TensorType(int32, vector)]
    Inputs shapes: [(1, 1), (1,), (1,)]
    Inputs strides: [(8, 8), (8,), (4,)]
    Inputs values: [array([[ 0.]]), array([ 0.]), array([1], dtype=int32)]

    HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
    HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

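Line 113 is this one:

    # train the model
    for i in range(train_set_x.shape[0].eval()):
        train_model(i)              # <-----------------here

I think this is because the training data is indexed with index and index+1. Why is that necessary? One row should be one training sample, and one row is train_set_x[index].

Edit: I debugged the code. Without slicing, a 1-D array is returned; with slicing, the result is 2-D. A 1-D array should be incompatible with the matrix x.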
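The difference between indexing and slicing can be reproduced in plain NumPy. This is only a sketch of the shape behaviour, with NumPy arrays standing in for the Theano shared variables; it is consistent with the TypeError quoted earlier, where train_set_y[index] turned out to be a scalar while y was declared as a vector.

```python
import numpy as np

data_x = np.asarray([[0, 0], [1, 0], [0, 1], [1, 1]], dtype="float64")
data_y = np.asarray([0, 0, 0, 1], dtype="int32")
index = 3

row_by_index = data_x[index]              # integer index drops a dimension
row_by_slice = data_x[index:index + 1]    # slice keeps the leading dimension
label_by_index = data_y[index]            # 0-d scalar
label_by_slice = data_y[index:index + 1]  # 1-d vector of length 1

print(row_by_index.shape, row_by_slice.shape)     # (2,) (1, 2)
print(label_by_index.ndim, label_by_slice.shape)  # 0 (1,)
```

So a slice of length one is a way to pass a single sample while keeping the matrix and vector types that x and y were declared with.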

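But while doing this I found another strange problem. I added this code to see the effect of the training:

    print("before")
    print(classifier.W.get_value())
    print(classifier.b.get_value())

    for i in range(3):
        train_model(i)

    print("after")
    print(classifier.W.get_value())
    print(classifier.b.get_value())

    before
    [[ 0.]
     [ 0.]]
    [ 0.]
    after
    [[ 0.]
     [ 0.]]
    [ 0.]

This makes sense, since the correct output for the first three samples is 0. If I change the order and move the training sample (1,1),1 to the front, the program crashes.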

    before
    [[ 0.]
     [ 0.]]
    [ 0.]
    Traceback (most recent call last):
      File "/usr/local/lib/python3.4/dist-packages/theano/compile/function_module.py", line 595, in __call__
        outputs = self.fn()
    ValueError: y_i value out of bounds

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/home/lhk/programming/sk/mlp/mlp/Layer.py", line 121, in <module>
        train_model(i)
      File "/usr/local/lib/python3.4/dist-packages/theano/compile/function_module.py", line 606, in __call__
        storage_map=self.fn.storage_map)
      File "/usr/local/lib/python3.4/dist-packages/theano/gof/link.py", line 206, in raise_with_op
        raise exc_type(exc_value).with_traceback(exc_trace)
      File "/usr/local/lib/python3.4/dist-packages/theano/compile/function_module.py", line 595, in __call__
        outputs = self.fn()
    ValueError: y_i value out of bounds
    Apply node that caused the error: CrossentropySoftmaxArgmax1HotWithBias(Dot22.0, b, Elemwise{Cast{int32}}.0)
    Inputs types: [TensorType(float64, matrix), TensorType(float64, vector), TensorType(int32, vector)]
    Inputs shapes: [(1, 1), (1,), (1,)]
    Inputs strides: [(8, 8), (8,), (4,)]
    Inputs values: [array([[ 0.]]), array([ 0.]), array([1], dtype=int32)]

    HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
    HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
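One possible explanation for both observations, sketched in NumPy under the assumption that the layer has a single output column (n_out=1): the softmax of one value is always 1, so for label 0 the cost is -log(1) = 0 and nothing pushes the weights away from zero, while label 1 asks for a second column that does not exist. The softmax helper below is written for this example and is not Theano's implementation.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# One sample, one output unit: any pre-activation gives probability 1.
single_outputs = [softmax(np.array([[v]]))[0, 0] for v in (-5.0, 0.0, 3.0)]
print(single_outputs)

out = softmax(np.array([[0.0]]))

# Label 0 selects column 0: the cost is -log(1) = 0, independent of W and b.
cost_label0 = -np.mean(np.log(out)[np.arange(1), np.array([0])])
print(cost_label0)

# Label 1 selects column 1, which does not exist in a 1-column output.
try:
    np.log(out)[np.arange(1), np.array([1])]
    failed = False
except IndexError as err:
    failed = True
    print("IndexError:", err)
```

A cost that is constant in W and b has zero gradient, which would match the unchanged parameters printed before and after training on the label-0 samples.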

Update

I installed Python 2.7 and Theano and tried running the code again. The same error occurred. I added verbose exception handling. Here is the output:

    /usr/bin/python2.7 /home/lhk/programming/sk/mlp/mlp/Layer.py
    Traceback (most recent call last):
      File "/home/lhk/programming/sk/mlp/mlp/Layer.py", line 113, in <module>
        train_model(i)
      File "/home/lhk/.local/lib/python2.7/site-packages/theano/compile/function_module.py", line 595, in __call__
        outputs = self.fn()
      File "/home/lhk/.local/lib/python2.7/site-packages/theano/gof/link.py", line 485, in streamline_default_f
        raise_with_op(node, thunk)
      File "/home/lhk/.local/lib/python2.7/site-packages/theano/gof/link.py", line 481, in streamline_default_f
        thunk()
      File "/home/lhk/.local/lib/python2.7/site-packages/theano/gof/op.py", line 768, in rval
        r = p(n, [x[0] for x in i], o)
      File "/home/lhk/.local/lib/python2.7/site-packages/theano/tensor/nnet/nnet.py", line 896, in perform
        nll[i] = -row[y_idx[i]] + m + numpy.log(sum_j)
    IndexError: index 1 is out of bounds for axis 0 with size 1
    Apply node that caused the error: CrossentropySoftmaxArgmax1HotWithBias(Dot22.0, b, Subtensor{int32:int32:}.0)
    Inputs types: [TensorType(float64, matrix), TensorType(float64, vector), TensorType(int32, vector)]
    Inputs shapes: [(1, 1), (1,), (1,)]
    Inputs strides: [(8, 8), (8,), (4,)]
    Inputs values: [array([[ 0.]]), array([ 0.]), array([1], dtype=int32)]

    Debugprint of the apply node:
    CrossentropySoftmaxArgmax1HotWithBias.0 [@A] <TensorType(float64, vector)> ''
     |Dot22 [@B] <TensorType(float64, matrix)> ''
     | |Subtensor{int32:int32:} [@C] <TensorType(float64, matrix)> ''
     | | |<TensorType(float64, matrix)> [@D] <TensorType(float64, matrix)>
     | | |ScalarFromTensor [@E] <int32> ''
     | | | |<TensorType(int32, scalar)> [@F] <TensorType(int32, scalar)>
     | | |ScalarFromTensor [@G] <int32> ''
     | |   |Elemwise{add,no_inplace} [@H] <TensorType(int32, scalar)> ''
     | |     |<TensorType(int32, scalar)> [@F] <TensorType(int32, scalar)>
     | |     |TensorConstant{1} [@I] <TensorType(int8, scalar)>
     | |W [@J] <TensorType(float64, matrix)>
     |b [@K] <TensorType(float64, vector)>
     |Subtensor{int32:int32:} [@L] <TensorType(int32, vector)> ''
       |Elemwise{Cast{int32}} [@M] <TensorType(int32, vector)> ''
       | |<TensorType(float64, vector)> [@N] <TensorType(float64, vector)>
       |ScalarFromTensor [@E] <int32> ''
       |ScalarFromTensor [@G] <int32> ''
    CrossentropySoftmaxArgmax1HotWithBias.1 [@A] <TensorType(float64, matrix)> ''
    CrossentropySoftmaxArgmax1HotWithBias.2 [@A] <TensorType(int32, vector)> ''

    HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.

    Process finished with exit code 1

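Update:

I looked at the training data again. Any sample labeled 1 produces the error above.

    data_y = numpy.array([1,
                          1,
                          1,
                          1])

With the sample labels above, every call train_model(i) crashes, for i in (0,1,2,3). Apparently there is some interference between the index of a sample and the sample's content.

Update: The problem was indeed, as Amir's contact pointed out, the dimension of the output layer. I had mistakenly assumed I could train the network to encode the output of the "logical AND" function directly in a single output neuron. While that is certainly possible, this training approach uses the value of y as an index to select the output node that should have the highest value. After changing the output size to two, the code works. With enough training, the error does indeed drop to zero for all cases.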
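To illustrate the fix described in this update, here is a minimal NumPy sketch of softmax regression on the AND data with two output units, one per class. It is not the code from the pastebin or from the answer. The full-batch loop, the step count of 10000 and the helper names are assumptions chosen for the example; the learning rate of 0.15 is taken from the question.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

data_x = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype="float64")
data_y = np.array([0, 0, 0, 1])

n_in, n_out = 2, 2               # two output units: class 0 and class 1
W = np.zeros((n_in, n_out))
b = np.zeros(n_out)              # bias is a vector of length n_out
learning_rate = 0.15
one_hot = np.eye(n_out)[data_y]  # labels as rows of the identity matrix

for step in range(10000):        # assumed number of full-batch steps
    p = softmax(data_x @ W + b)
    # Gradient of the mean negative log likelihood w.r.t. the pre-activations.
    grad_z = (p - one_hot) / len(data_y)
    W -= learning_rate * (data_x.T @ grad_z)
    b -= learning_rate * grad_z.sum(axis=0)

y_pred = np.argmax(softmax(data_x @ W + b), axis=1)
error_rate = np.mean(y_pred != data_y)
print(y_pred, error_rate)
```

With two columns, label 1 has an output unit to select, so the label-as-index cost is defined for every sample.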

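Answer:

Here is working code for your problem. There were quite a few small mistakes in your code. What caused the error you were getting was defining b as an n_in by n_out matrix instead of simply defining it as an 'n_out' vector. The update part was defined with brackets [] rather than with parentheses ().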
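The effect of the bias shape can be illustrated with NumPy broadcasting. This sketch shows NumPy's behaviour only; Theano decides broadcasting from the declared tensor types and may be stricter, and the sizes below are chosen purely for illustration. A bias vector of length n_out is added to every row of the batch, whereas an n_in by n_out bias matrix does not line up with a batch of a different size.

```python
import numpy as np

n_in, n_out = 2, 3                       # sizes chosen only for illustration
x = np.ones((4, n_in))                   # a batch of 4 samples
W = np.zeros((n_in, n_out))
b_vector = np.zeros(n_out)               # one bias per output unit
b_matrix = np.zeros((n_in, n_out))       # shape used in the question

ok = x @ W + b_vector                    # bias is added to every row
print(ok.shape)                          # (4, 3)

try:
    x @ W + b_matrix                     # (4, 3) + (2, 3) cannot broadcast
    mismatch = False
except ValueError:
    mismatch = True

# With a single sample NumPy broadcasts silently: 1 input row becomes 2 output rows.
single = x[:1] @ W + b_matrix
print(mismatch, single.shape)            # True (2, 3)
```

Either outcome is undesirable for a layer: an exception for most batch sizes, or an output with more rows than there were samples.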

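Moreover, the index was defined as an int32 symbolic scalar (this is not very important). Another important change was to define the functions in terms of correct indexing. The way you used index to compile the functions would, for some reason, not let them compile. You also declared your input as a vector. That way you cannot train the model with mini-batches or with the full batch, so it is safer to declare it as a symbolic matrix. To use a vector, you would need to store your inputs in the shared variable as a vector instead of a matrix to make the program run, so declaring it as a vector causes exactly this kind of trouble. Finally, you used classifier.errors(y) to compile the validation function, even though you had removed the errors function from the Layer class.

... (code omitted here) ...

Here is the updated code. Note that the main difference between the code above and the code below is that the latter is suited to a binary problem, while the former only works if you have a multi-class problem, which is not the case here. I am putting both code snippets here for educational purposes. Please read the comments to understand the problems with the code above and how I went about solving them.

... (code omitted here) ...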
