run vgg_16 error #45

@fbiswt

Description

I got an error when I ran "train.sh" in demo/image_classification:
I0907 14:35:07.593504 49407 Util.cpp:144] commandline: /home/hadoop/paddle/paddle/build/bin/../opt/paddle/bin/paddle_trainer --config=vgg_16_cifar.py --dot_period=10 --log_period=100 --test_all_data_in_one_period=1 --use_gpu=1 --trainer_count=1 --num_passes=200 --save_dir=./cifar_vgg_model
I0907 14:35:08.002375 49407 Util.cpp:113] Calling runInitFunctions
I0907 14:35:08.002609 49407 Util.cpp:126] Call runInitFunctions done.
[INFO 2016-09-07 14:35:08,043 layers.py:1438] channels=3 size=3072
[INFO 2016-09-07 14:35:08,043 layers.py:1438] output size for conv_0 is 32
[INFO 2016-09-07 14:35:08,044 layers.py:1438] channels=64 size=65536
[INFO 2016-09-07 14:35:08,045 layers.py:1438] output size for conv_1 is 32
[INFO 2016-09-07 14:35:08,046 layers.py:1499] output size for pool_0 is 16_16
[INFO 2016-09-07 14:35:08,046 layers.py:1438] channels=64 size=16384
[INFO 2016-09-07 14:35:08,046 layers.py:1438] output size for conv_2 is 16
[INFO 2016-09-07 14:35:08,047 layers.py:1438] channels=128 size=32768
[INFO 2016-09-07 14:35:08,048 layers.py:1438] output size for conv_3 is 16
[INFO 2016-09-07 14:35:08,049 layers.py:1499] output size for pool_1 is 8_8
[INFO 2016-09-07 14:35:08,049 layers.py:1438] channels=128 size=8192
[INFO 2016-09-07 14:35:08,049 layers.py:1438] output size for conv_4 is 8
[INFO 2016-09-07 14:35:08,051 layers.py:1438] channels=256 size=16384
[INFO 2016-09-07 14:35:08,051 layers.py:1438] output size for conv_5 is 8
[INFO 2016-09-07 14:35:08,052 layers.py:1438] channels=256 size=16384
[INFO 2016-09-07 14:35:08,052 layers.py:1438] output size for conv_6 is 8
[INFO 2016-09-07 14:35:08,053 layers.py:1499] output size for pool_2 is 4_4
[INFO 2016-09-07 14:35:08,054 layers.py:1438] channels=256 size=4096
[INFO 2016-09-07 14:35:08,054 layers.py:1438] output size for conv_7 is 4
[INFO 2016-09-07 14:35:08,055 layers.py:1438] channels=512 size=8192
[INFO 2016-09-07 14:35:08,055 layers.py:1438] output size for conv_8 is 4
[INFO 2016-09-07 14:35:08,056 layers.py:1438] channels=512 size=8192
[INFO 2016-09-07 14:35:08,056 layers.py:1438] output size for conv_9 is 4
[INFO 2016-09-07 14:35:08,058 layers.py:1499] output size for pool_3 is 2_2
[INFO 2016-09-07 14:35:08,058 layers.py:1499] output size for pool_4 is 1_1
[INFO 2016-09-07 14:35:08,060 networks.py:1122] The input order is [image, label]
[INFO 2016-09-07 14:35:08,060 networks.py:1129] The output order is [cost_0]
I0907 14:35:08.067443 49407 Trainer.cpp:169] trainer mode: Normal
I0907 14:35:08.075389 49407 PyDataProvider2.cpp:219] loading dataprovider image_provider::processData
[INFO 2016-09-07 14:35:08,109 image_provider.py:52] Image size: 32
[INFO 2016-09-07 14:35:08,109 image_provider.py:53] Meta path: data/cifar-out/batches/batches.meta
[INFO 2016-09-07 14:35:08,109 image_provider.py:58] DataProvider Initialization finished
I0907 14:35:08.109668 49407 PyDataProvider2.cpp:219] loading dataprovider image_provider::processData
[INFO 2016-09-07 14:35:08,109 image_provider.py:52] Image size: 32
[INFO 2016-09-07 14:35:08,109 image_provider.py:53] Meta path: data/cifar-out/batches/batches.meta
[INFO 2016-09-07 14:35:08,109 image_provider.py:58] DataProvider Initialization finished
I0907 14:35:08.109978 49407 GradientMachine.cpp:134] Initing parameters..
I0907 14:35:08.554066 49407 GradientMachine.cpp:141] Init parameters done.
Current Layer forward/backward stack is
LayerName: batch_norm_10
LayerName: fc_layer_0
LayerName: dropout_0
LayerName: pool_4
LayerName: pool_3
LayerName: batch_norm_9
LayerName: conv_9
LayerName: batch_norm_8
LayerName: conv_8
LayerName: batch_norm_7
LayerName: conv_7
LayerName: pool_2
LayerName: batch_norm_6
LayerName: conv_6
LayerName: batch_norm_5
LayerName: conv_5
LayerName: batch_norm_4
LayerName: conv_4
LayerName: pool_1
LayerName: batch_norm_3
LayerName: conv_3
LayerName: batch_norm_2
LayerName: conv_2
LayerName: pool_0
LayerName: batch_norm_1
LayerName: conv_1
LayerName: batch_norm_0
LayerName: conv_0
LayerName: image
*** Aborted at 1473230129 (unix time) try "date -d @1473230129" if you are using GNU date ***
PC: @ 0x7fb72227a855 (unknown)
*** SIGSEGV (@0x130701aa00) received by PID 49407 (TID 0x7fb7386fe800) from PID 117549568; stack trace: ***
@ 0x7fb737fdf100 (unknown)
@ 0x7fb72227a855 (unknown)
@ 0x8b3350 hl_batch_norm_backward()
@ 0x5d4684 paddle::CudnnBatchNormLayer::backward()
@ 0x620bae paddle::NeuralNetwork::backward()
@ 0x69c95d paddle::TrainerInternal::forwardBackwardBatch()
@ 0x69cf14 paddle::TrainerInternal::trainOneBatch()
@ 0x698350 paddle::Trainer::trainOnePass()
@ 0x69ba47 paddle::Trainer::train()
@ 0x53aea3 main
@ 0x7fb73587bb15 __libc_start_main
@ 0x545b15 (unknown)
/home/hadoop/paddle/paddle/build/bin/paddle: line 46: 49407 Segmentation fault (core dumped) ${DEBUGGER} $MYDIR/../opt/paddle/bin/paddle_trainer ${@:2}
No data to plot. Exiting!

Does anyone know why this happens?
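For what it's worth, the stack trace ends in hl_batch_norm_backward() via paddle::CudnnBatchNormLayer::backward(), so one way to narrow it down is to force the non-cuDNN batch-norm implementation and see whether the crash goes away (another is rerunning with --use_gpu=0). A minimal sketch of such a tweak to vgg_16_cifar.py; the batch_norm_layer signature and the batch_norm_type argument are assumptions about this PaddlePaddle version's v1 config API, not a confirmed fix:

```python
# Hypothetical diagnostic change for vgg_16_cifar.py, not a confirmed fix:
# select the generic batch-norm kernel instead of the cuDNN one that the
# stack trace crashes in. If the segfault disappears, the bug is specific
# to the cuDNN batch-norm path rather than to the network definition.
bn = batch_norm_layer(
    input=conv,                    # output of the preceding conv layer
    act=ReluActivation(),
    batch_norm_type="batch_norm",  # instead of the default "cudnn_batch_norm"
)
```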
