Description
Can't run train.sh when `batch_size > 16` is set in trainer_config.py. I get the following error:
train.log:

```
./train.sh
I /home/user/SOFT/BAIDU/PADDLE/Paddle/paddle/utils/Util.cpp:144] commandline: /opt/paddle/bin/../opt/paddle/bin/paddle_trainer --config=trainer_config.py --save_dir=./model_output --job=train --use_gpu=true --trainer_count=1 --num_passes=100000 --log_period=15 --dot_period=1 --show_parameter_stats_period=100 --test_all_data_in_one_period=1 --saving_period=100 --test_period=100
I /home/user/SOFT/BAIDU/PADDLE/Paddle/paddle/utils/Util.cpp:113] Calling runInitFunctions
I /home/user/SOFT/BAIDU/PADDLE/Paddle/paddle/utils/Util.cpp:126] Call runInitFunctions done.
[INFO 2016-09-06 20:10:47,439 networks.py:1122] The input order is [input, label]
[INFO 2016-09-06 20:10:47,439 networks.py:1129] The output order is [cost_0]
I /home/user/SOFT/BAIDU/PADDLE/Paddle/paddle/trainer/Trainer.cpp:169] trainer mode: Normal
I /home/user/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/dataproviders/PyDataProvider2.cpp:219] loading dataprovider dataprovider::process
I /home/user/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/dataproviders/PyDataProvider2.cpp:219] loading dataprovider dataprovider::process
I /home/user/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/gradientmachines/GradientMachine.cpp:134] Initing parameters..
I /home/user/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/gradientmachines/GradientMachine.cpp:141] Init parameters done.
F /home/user/SOFT/BAIDU/PADDLE/Paddle/paddle/cuda/src/hl_cuda_matrix.cu:322] 0x933ba8[hl_matrix_classification_error] CUDA error: invalid configuration argument
/opt/paddle/bin/paddle: line 46: 10921 Aborted (core dumped) ${DEBUGGER}$MYDIR/../opt/paddle/bin/paddle_trainer ${@:2}
```
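A "CUDA error: invalid configuration argument" at a kernel launch usually means the computed grid or block dimensions exceed a device limit. As a rough, unverified guess at why only the larger batch size trips it (assumptions entirely mine, not checked against the Paddle source: the failing kernel launches on the order of one block per timestep row, and one launch dimension is capped at 65535):

```python
# Back-of-envelope check. Assumptions (mine, not verified against the
# Paddle source): the failing kernel launches roughly one block per
# timestep row, and one launch dimension is capped at 65535.
SEQ_LEN = 5000       # approximate timesteps per example, from the report
GRID_LIMIT = 65535   # assumed per-dimension grid limit

def total_rows(batch_size, seq_len=SEQ_LEN):
    """Rough total of timestep rows one batch hands to the error kernel."""
    return batch_size * seq_len

# batch_size=12 stays under the assumed limit, batch_size=24 does not;
# with variable sequence lengths the exact crossover would wander,
# which might explain why the failure only shows up above some batch size.
ok_at_12 = total_rows(12) <= GRID_LIMIT
over_at_24 = total_rows(24) > GRID_LIMIT
```

If this guess is in the right direction, the crash would depend on the summed sequence length in a batch rather than on `batch_size` itself.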
I'm trying to solve a classification task with an LSTM model. My dataset is 180 examples, each roughly 5000 timesteps long (variable length). Each timestep is a length-24 float vector labeled with an integer label in the range [0, 132].

```python
settings.input_types = [
    dense_vector_sequence(settings.inputSize),
    integer_value_sequence(settings.vocabSize),
]
```

Smaller batch sizes, e.g. 12, produce no error, but my data is not very redundant, so the gradients become unstable. My setup is a 980 Ti (6 GB VRAM); memory usage at batch_size=12 is around 20%.
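For anyone trying to reproduce this without the real data, a synthetic stand-in with the same shape should trigger the same code path (all names here are hypothetical, not from the actual dataprovider):

```python
import random

# Hypothetical synthetic stand-in for the real dataset: 180 variable-length
# sequences of roughly 5000 timesteps, each timestep a 24-dim float vector
# paired with an integer label in [0, 132].
INPUT_SIZE = 24
NUM_CLASSES = 133
NUM_EXAMPLES = 180

def make_example(rng, mean_len=5000, jitter=500):
    """One (features, labels) pair with a length near mean_len."""
    length = rng.randint(mean_len - jitter, mean_len + jitter)
    features = [[rng.random() for _ in range(INPUT_SIZE)]
                for _ in range(length)]
    labels = [rng.randrange(NUM_CLASSES) for _ in range(length)]
    return features, labels

rng = random.Random(0)          # fixed seed for a reproducible report
dataset = [make_example(rng) for _ in range(NUM_EXAMPLES)]
```

Feeding this through the same `process` dataprovider should show whether the crash depends only on the shapes or also on the values.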
trainer_config.py:

```python
settings(
    batch_size=24,
    learning_rate=0.001,
    learning_method=RMSPropOptimizer(),
)
stacked_lstm_net(
    input_dim=24,
    class_dim=133,
    hid_dim=24,
    stacked_num=7,
    is_predict=is_predict,
)
```
stacked_lstm_net:

```python
# simple sequential LSTM stack
lstm_act = TanhActivation()
fc_act = LinearActivation()

data = data_layer("input", size=input_dim)
fc1 = fc_layer(input=data, size=hid_dim, act=fc_act)
lstm1 = lstmemory(input=fc1, act=lstm_act)

inputs = [fc1, lstm1]
for i in range(2, stacked_num + 1):
    fc = fc_layer(input=inputs, size=hid_dim, act=fc_act)
    lstm = lstmemory(input=fc, act=lstm_act)
    inputs = [fc, lstm]

output = fc_layer(input=[inputs[0], inputs[1]], size=class_dim,
                  act=SoftmaxActivation())
if is_predict:
    outputs(output)
else:
    outputs(classification_cost(input=output,
                                label=data_layer("label", size=class_dim)))
```
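In case the crash really scales with the total timesteps per batch rather than with `batch_size` itself, one workaround I'm considering is to cut each sequence into shorter chunks in the dataprovider. A minimal sketch (`split_sequence` is a hypothetical pre-processing helper, not part of the Paddle API):

```python
def split_sequence(features, labels, max_len=1000):
    """Cut one (features, labels) sequence into chunks of at most max_len
    timesteps, so a batch never carries more than batch_size * max_len rows.
    Hypothetical pre-processing helper -- not part of the Paddle API."""
    assert len(features) == len(labels)
    chunks = []
    for start in range(0, len(features), max_len):
        chunks.append((features[start:start + max_len],
                       labels[start:start + max_len]))
    return chunks

# Example: a 5000-step sequence becomes five 1000-step chunks.
feats = [[0.0] * 24 for _ in range(5000)]
labs = [0] * 5000
parts = split_sequence(feats, labs)
```

The obvious cost is that the LSTM state is reset at every chunk boundary, so long-range dependencies across chunks would be lost.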
Could you please explain this error, or point me to how I can debug such an issue?