I'm using the Caffe framework and want to train the network shown below.
When I run the following command:
caffe train --solver solver.prototxt
it throws the following error:
F0802 14:31:54.506695 28038 insert_splits.cpp:29] Unknown bottom blob 'image' (layer 'conv1', bottom index 0)
*** Check failure stack trace: ***
    @ 0x7ff2941c3f9d google::LogMessage::Fail()
    @ 0x7ff2941c5e03 google::LogMessage::SendToLog()
    @ 0x7ff2941c3b2b google::LogMessage::Flush()
    @ 0x7ff2941c67ee google::LogMessageFatal::~LogMessageFatal()
    @ 0x7ff2947cedbe caffe::InsertSplits()
    @ 0x7ff2948306de caffe::Net<>::Init()
    @ 0x7ff294833a81 caffe::Net<>::Net()
    @ 0x7ff29480ce6a caffe::Solver<>::InitTestNets()
    @ 0x7ff29480ee85 caffe::Solver<>::Init()
    @ 0x7ff29480f19a caffe::Solver<>::Solver()
    @ 0x7ff2947f4343 caffe::Creator_SGDSolver<>()
    @ 0x40b1a0 (unknown)
    @ 0x407373 (unknown)
    @ 0x7ff292e40741 __libc_start_main
    @ 0x407b79 (unknown)
Abortado (`core' generado)
The network definition (train2.prototxt):
name: "xxxxxx"
layer {
  name: "image"
  type: "HDF5Data"
  top: "image"
  top: "label"
  hdf5_data_param {
    source: "h5a.train.h5.txt"
    batch_size: 64
  }
  include {
    phase: TRAIN
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "image"
  top: "conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "conv1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "norm1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "pool1"
  top: "norm2"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "norm2"
  top: "conv3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv3"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "improd3"
  type: "InnerProduct"
  bottom: "pool2"
  top: "improd3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 1000
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "improd3"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "improd3"
  bottom: "label"
  top: "loss"
}
The contents of solver.prototxt:
net: "train2.prototxt"
test_iter: 100
test_interval: 1000
# lr for fine-tuning should be lower than when starting from scratch
base_lr: 0.001
lr_policy: "step"
gamma: 0.1
# stepsize should also be lower, as we're closer to being done
stepsize: 20000
display: 20
max_iter: 100000
momentum: 0.9
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: "caffe"
solver_mode: CPU
I'm stuck on this and can't get the network to start training.
Answer:
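This happens because, even though you are only running the TRAIN phase, Caffe also builds a TEST-phase copy of the network for validation (note the Solver<>::InitTestNets() frame in the stack trace). Your network has no input data layer for the TEST phase, since the HDF5Data layer is restricted to phase: TRAIN, so in the TEST net the conv1 layer cannot find its input blob image. The TEST net is built because you defined the test_* parameters (test_iter, test_interval) in the solver and marked some layers in train2.prototxt with phase: TEST. Removing those parameters from the solver, and the layers that belong to the TEST phase from train2.prototxt, will let you run training without this error.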