Floating point exception (overflow) #46

@F0REacH

I used the same config as in Issue #44, running on CPU (the only change is batch_size=45).
It looks like a floating-point overflow, but I can't figure out what is causing it. Maybe an incorrect layer connection?

I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/utils/Util.cpp:144] commandline: /opt/paddle/bin/../opt/paddle/bin/paddle_trainer --config=trainer_config.py --save_dir=./model_output --job=train --use_gpu=false --trainer_count=4 --num_passes=100000 --log_period=10 --dot_period=1 --show_parameter_stats_period=1000 --test_all_data_in_one_period=1 --saving_period=100 
I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/utils/Util.cpp:113] Calling runInitFunctions
I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/utils/Util.cpp:126] Call runInitFunctions done.
[INFO 2016-09-08 02:48:21,778 networks.py:1122] The input order is [input, label]
[INFO 2016-09-08 02:48:21,778 networks.py:1129] The output order is [__cost_0__]
I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/trainer/Trainer.cpp:169] trainer mode: Normal
I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/dataproviders/PyDataProvider2.cpp:219] loading dataprovider dataprovider::process
I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/dataproviders/PyDataProvider2.cpp:219] loading dataprovider dataprovider::process
I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/gradientmachines/GradientMachine.cpp:134] Initing parameters..
I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/gradientmachines/GradientMachine.cpp:141] Init parameters done.
....I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/trainer/TrainerInternal.cpp:179]  Pass=0 Batch=4 samples=178 AvgCost=15884.9 Eval: classification_error_evaluator=0.993309 
I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/trainer/Tester.cpp:111]  Test samples=2 cost=172604 Eval: classification_error_evaluator=0.995207 
I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/gradientmachines/GradientMachine.cpp:112] Saving parameters to ./model_output/pass-00000
I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/utils/Util.cpp:219] copy trainer_config.py to ./model_output/pass-00000
....I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/trainer/TrainerInternal.cpp:179]  Pass=1 Batch=4 samples=178 AvgCost=14954.9 Eval: classification_error_evaluator=0.980111 
I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/trainer/Tester.cpp:111]  Test samples=2 cost=159166 Eval: classification_error_evaluator=0.975115 
....I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/trainer/TrainerInternal.cpp:179]  Pass=2 Batch=4 samples=178 AvgCost=14009.3 Eval: classification_error_evaluator=0.935489 
I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/trainer/Tester.cpp:111]  Test samples=2 cost=135530 Eval: classification_error_evaluator=0.871007 
... some steps ....

I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/trainer/Tester.cpp:111]  Test samples=2 cost=97979 Eval: classification_error_evaluator=0.676567 
....I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/trainer/TrainerInternal.cpp:179]  Pass=46 Batch=4 samples=178 AvgCost=8838.92 Eval: classification_error_evaluator=0.705236 
I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/trainer/Tester.cpp:111]  Test samples=2 cost=102650 Eval: classification_error_evaluator=0.730931 
....I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/trainer/TrainerInternal.cpp:179]  Pass=47 Batch=4 samples=178 AvgCost=8806.03 Eval: classification_error_evaluator=0.701997 
I /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/trainer/Tester.cpp:111]  Test samples=2 cost=91264.2 Eval: classification_error_evaluator=0.659892 
/opt/paddle/bin/paddle: line 46:  4547 Floating point exception(core dumped) ${DEBUGGER} $MYDIR/../opt/paddle/bin/paddle_trainer ${@:2}

The error recurs after ~40 passes every time I run training.
Backtrace:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/opt/paddle/bin/../opt/paddle/bin/paddle_trainer --config=trainer_config.py --s'.
Program terminated with signal SIGFPE, Arithmetic exception.
#0  0x00007f53d32d8a15 in __ieee754_exp_avx (x=<optimized out>) at ../sysdeps/ieee754/dbl-64/e_exp.c:214
214     ../sysdeps/ieee754/dbl-64/e_exp.c: No such file or directory.
[Current thread is 1 (Thread 0x7f53cec29700 (LWP 4548))]
(gdb) bt
#0  0x00007f53d32d8a15 in __ieee754_exp_avx (x=<optimized out>) at ../sysdeps/ieee754/dbl-64/e_exp.c:214
#1  0x00007f53d329847f in __GI___exp (x=711.2794189453125) at ../sysdeps/ieee754/dbl-64/w_exp.c:26
#2  0x0000000000e2c4dd in hppl::tanh (a=-355.639709) at /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/cuda/src/hl_cpu_functions.cc:33
#3  0x0000000000a3dd22 in hppl::forward::lstm::operator() (this=0x7f53cec28050, valueIn=@0x7f53cec28014: -0.940858305, valueIg=@0x7f53cec28010: 0.999997735, valueFg=@0x7f53cec2800c: 0.999997735, valueOg=@0x7f53cec28008: 0.999997735, prevState=@0x7f53cec27ff4: -354.699646, 
    state=@0x7f53cec27ff8: -355.639709, stateAtv=@0x7f53cec27ff0: 0.368853271, output=@0x7f53cec27fec: 0.165656254, checkI=@0x7f53cec28004: -0.0588896535, checkF=@0x7f53cec28000: -0.0764867961, checkO=@0x7f53cec27ffc: -0.0473404899, actInput=0xe2c4ba <hppl::tanh(float)>, 
    actGate=0xe2c431 <hppl::sigmoid(float)>, actState=0xe2c4ba <hppl::tanh(float)>) at /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/cuda/include/hl_lstm_ops.cuh:65
#4  0x0000000000a3ec6c in hl_naive_lstm_forward_one_sequence<hppl::forward::lstm> (op=..., value=..., frameSize=6, active_node=HL_ACTIVATION_TANH, active_gate=HL_ACTIVATION_SIGMOID, active_state=HL_ACTIVATION_TANH)
    at /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/cuda/include/hl_cpu_lstm.cuh:60
#5  0x0000000000a3e662 in hl_cpu_lstm_forward<hppl::forward::lstm> (op=..., value=..., frameSize=6, active_node=HL_ACTIVATION_TANH, active_gate=HL_ACTIVATION_SIGMOID, active_state=HL_ACTIVATION_TANH) at /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/cuda/include/hl_cpu_lstm.cuh:348
#6  0x0000000000a3d94f in paddle::LstmCompute::forwardOneSequence<false> (this=0x2e423a8, value=..., frameSize=6) at /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/layers/LstmCompute.cpp:32
#7  0x0000000000a3da0f in paddle::LstmCompute::forwardBatch<false> (this=0x2e423a8, value=..., frameSize=6, batchSize=10) at /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/layers/LstmCompute.cpp:47
#8  0x0000000000a3b75d in paddle::LstmLayer::forwardBatch (this=0x2e42010, batchSize=37105, numSequences=11, starts=0x7f53b805bb40, inputValue=std::shared_ptr (count 2, weak 0) 0x7f53b80d1a10) at /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/layers/LstmLayer.cpp:501
#9  0x0000000000a38c8c in paddle::LstmLayer::forward (this=0x2e42010, passType=paddle::enumeration_wrapper::PASS_TRAIN) at /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/layers/LstmLayer.cpp:172
#10 0x0000000000ac2334 in paddle::NeuralNetwork::forward (this=0x2e1a3e0, inArgs=std::vector of length 2, capacity 2 = {...}, outArgs=0x2e10d08, passType=paddle::enumeration_wrapper::PASS_TRAIN)
    at /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/gradientmachines/NeuralNetwork.cpp:242
#11 0x0000000000ad620c in paddle::TrainerThread::forward (this=0x2e10be0) at /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/gradientmachines/MultiGradientMachine.cpp:581
#12 0x0000000000ad5ef2 in paddle::TrainerThread::computeThread (this=0x2e10be0) at /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/gradientmachines/MultiGradientMachine.cpp:519
#13 0x0000000000ad5abd in paddle::TrainerThread::<lambda()>::operator()(void) const (__closure=0x2ef45f8) at /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/gradientmachines/MultiGradientMachine.cpp:465
#14 0x0000000000adb9b2 in std::_Bind_simple<paddle::TrainerThread::start()::<lambda()>()>::_M_invoke<>(std::_Index_tuple<>) (this=0x2ef45f8) at /opt/gcc/include/c++/4.9.4/functional:1700
#15 0x0000000000adb6ed in std::_Bind_simple<paddle::TrainerThread::start()::<lambda()>()>::operator()(void) (this=0x2ef45f8) at /opt/gcc/include/c++/4.9.4/functional:1688
#16 0x0000000000adb4d2 in std::thread::_Impl<std::_Bind_simple<paddle::TrainerThread::start()::<lambda()>()> >::_M_run(void) (this=0x2ef45e0) at /opt/gcc/include/c++/4.9.4/thread:115
#17 0x00007f53d363d380 in std::execute_native_thread_routine_compat (__p=<optimized out>) at ../../../../../libstdc++-v3/src/c++11/thread.cc:110
#18 0x00007f53d7578454 in start_thread (arg=0x7f53cec29700) at pthread_create.c:333
#19 0x00007f53d2da715d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) frame 0
#0  0x00007f53d32d8a15 in __ieee754_exp_avx (x=<optimized out>) at ../sysdeps/ieee754/dbl-64/e_exp.c:214
214     in ../sysdeps/ieee754/dbl-64/e_exp.c
(gdb) info locals
ctx = {env = {__control_word = <optimized out>, __glibc_reserved1 = <optimized out>, __status_word = <optimized out>, __glibc_reserved2 = <optimized out>, __tags = <optimized out>, __glibc_reserved3 = <optimized out>, __eip = <optimized out>, __cs_selector = <optimized out>, 
    __opcode = <optimized out>, __glibc_reserved4 = <optimized out>, __data_offset = <optimized out>, __data_selector = <optimized out>, __glibc_reserved5 = <optimized out>, __mxcsr = 39281}, updated_status = <optimized out>}
bexp = <optimized out>
t = 0.11041169086502123
eps = <optimized out>
del = <optimized out>
base = 0.11041259765625
y = 25769803776.110413
al = 1.1167387406605691
bet = -1.4572163044673799e-09
res = 1.1167377264919247
rem = -1.0141686444258574e-06
cor = -2.8229067033144067e-17
junk1 = <optimized out>
m = 1082538556
n = 1082538556
ex = <optimized out>
retval = <optimized out>
(gdb) frame 1
#1  0x00007f53d329847f in __GI___exp (x=711.2794189453125) at ../sysdeps/ieee754/dbl-64/w_exp.c:26
26      ../sysdeps/ieee754/dbl-64/w_exp.c: No such file or directory.
(gdb) info locals
z = <optimized out>
(gdb) frame 2
#2  0x0000000000e2c4dd in hppl::tanh (a=-355.639709) at /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/cuda/src/hl_cpu_functions.cc:33
33          return (2.0 / (1.0 + exp(-2.0*a))) - 1.0;
(gdb) info locals
No locals.
(gdb) frame 3
#3  0x0000000000a3dd22 in hppl::forward::lstm::operator() (this=0x7f53cec28050, valueIn=@0x7f53cec28014: -0.940858305, valueIg=@0x7f53cec28010: 0.999997735, valueFg=@0x7f53cec2800c: 0.999997735, valueOg=@0x7f53cec28008: 0.999997735, prevState=@0x7f53cec27ff4: -354.699646, 
    state=@0x7f53cec27ff8: -355.639709, stateAtv=@0x7f53cec27ff0: 0.368853271, output=@0x7f53cec27fec: 0.165656254, checkI=@0x7f53cec28004: -0.0588896535, checkF=@0x7f53cec28000: -0.0764867961, checkO=@0x7f53cec27ffc: -0.0473404899, actInput=0xe2c4ba <hppl::tanh(float)>, 
    actGate=0xe2c431 <hppl::sigmoid(float)>, actState=0xe2c4ba <hppl::tanh(float)>) at /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/cuda/include/hl_lstm_ops.cuh:65
65          stateAtv = actState(state);
(gdb) info locals
No locals.
(gdb) frame 4
#4  0x0000000000a3ec6c in hl_naive_lstm_forward_one_sequence<hppl::forward::lstm> (op=..., value=..., frameSize=6, active_node=HL_ACTIVATION_TANH, active_gate=HL_ACTIVATION_SIGMOID, active_state=HL_ACTIVATION_TANH)
    at /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/cuda/include/hl_cpu_lstm.cuh:60
60          op(rValueIn,
(gdb) info locals
i = 3
rValueIn = -0.940858305
rValueFg = 0.999997735
rCheckO = -0.0473404899
rPrevState = -354.699646
rOut = 0.165656254
valueOg = 0x7f5324c72908
rValueIg = 0.999997735
valueIn = 0x7f5324c728c0
valueFg = 0x7f5324c728f0
rCheckI = -0.0588896535
valueIg = 0x7f5324c728d8
rValueOg = 0.999997735
rCheckF = -0.0764867961
rState = -355.639709
rStateAtv = 0.368853271
(gdb) frame 5
#5  0x0000000000a3e662 in hl_cpu_lstm_forward<hppl::forward::lstm> (op=..., value=..., frameSize=6, active_node=HL_ACTIVATION_TANH, active_gate=HL_ACTIVATION_SIGMOID, active_state=HL_ACTIVATION_TANH) at /home/foreach/SOFT/BAIDU/PADDLE/Paddle/paddle/cuda/include/hl_cpu_lstm.cuh:348
348         hl_naive_lstm_forward_one_sequence(op, value, frameSize,
(gdb) info locals
No locals.
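
Reading the frames above: hppl::tanh is called with a=-355.639709, so the expression on hl_cpu_functions.cc:33 evaluates exp(-2.0*a) = exp(711.279...), which is above the largest argument for which exp still fits in a double (about 709.78). The saved __mxcsr = 39281 (0x9971) appears to have the overflow mask bit cleared, which would explain why the overflow is delivered as SIGFPE instead of quietly producing inf. Below is a minimal sketch of the failure and of one possible workaround, clamping the exponent argument. The names naive_tanh, clamped_tanh, raises_overflow and the bound kExpArgLimit are hypothetical and not taken from Paddle; this is an illustration under those assumptions, not the project's actual fix.

```cpp
#include <cfenv>
#include <cmath>

// Hypothetical bound (an assumption, not a Paddle constant). exp(40) is far
// below the double overflow threshold, and tanh already rounds to +/-1 in
// single precision at this magnitude, so clamping does not change the result.
constexpr double kExpArgLimit = 40.0;

// Same formula as hl_cpu_functions.cc:33 in the backtrace.
// For a = -355.639709 it calls exp(711.279...), which overflows a double.
inline float naive_tanh(float a) {
  return static_cast<float>((2.0 / (1.0 + std::exp(-2.0 * a))) - 1.0);
}

// Same formula, but the argument passed to exp is clamped to
// [-kExpArgLimit, kExpArgLimit] so exp can neither overflow nor underflow.
inline float clamped_tanh(float a) {
  double x = -2.0 * static_cast<double>(a);
  if (x > kExpArgLimit) x = kExpArgLimit;
  if (x < -kExpArgLimit) x = -kExpArgLimit;
  return static_cast<float>((2.0 / (1.0 + std::exp(x))) - 1.0);
}

// Returns true if evaluating f(a) sets the FE_OVERFLOW status flag.
// With exceptions masked (the C++ default) nothing traps; the flag is only
// recorded. The volatile variables keep the compiler from folding the call
// away or moving it across the fenv calls.
inline bool raises_overflow(float (*f)(float), float a) {
  volatile float in = a;
  std::feclearexcept(FE_ALL_EXCEPT);
  volatile float out = f(in);
  (void)out;
  return std::fetestexcept(FE_OVERFLOW) != 0;
}
```

With the state value from frame 3, raises_overflow(naive_tanh, -355.639709f) should report the overflow, while clamped_tanh returns -1 for the same input without setting the flag. Note that this only hides the symptom: prevState is already -354.7 with all three gates saturated near 1, so the LSTM cell state itself seems to be growing without bound, and that may still point to a problem in the config or the learning rate.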

Paddle build options:

cmake -DWITH_GPU=ON -DWITH_DOC=OFF -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=/opt/paddle ..

BLAS backend is Intel MKL 11.3.3.210; CPU is Intel i5 4690K.
