
Getting hl_matrix_classification_error if using trainer_config settings.batch_size > 16 #44

@F0REacH

Description


Can't run train.sh when batch_size > 16 in the trainer_config.py settings. Getting the following error:
train.log:

./train.sh
I /home/user/SOFT/BAIDU/PADDLE/Paddle/paddle/utils/Util.cpp:144] commandline: /opt/paddle/bin/../opt/paddle/bin/paddle_trainer --config=trainer_config.py --save_dir=./model_output --job=train --use_gpu=true --trainer_count=1 --num_passes=100000 --log_period=15 --dot_period=1 --show_parameter_stats_period=100 --test_all_data_in_one_period=1 --saving_period=100 --test_period=100
I /home/user/SOFT/BAIDU/PADDLE/Paddle/paddle/utils/Util.cpp:113] Calling runInitFunctions
I /home/user/SOFT/BAIDU/PADDLE/Paddle/paddle/utils/Util.cpp:126] Call runInitFunctions done.
[INFO 2016-09-06 20:10:47,439 networks.py:1122] The input order is [input, label]
[INFO 2016-09-06 20:10:47,439 networks.py:1129] The output order is [cost_0]
I /home/user/SOFT/BAIDU/PADDLE/Paddle/paddle/trainer/Trainer.cpp:169] trainer mode: Normal
I /home/user/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/dataproviders/PyDataProvider2.cpp:219] loading dataprovider dataprovider::process
I /home/user/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/dataproviders/PyDataProvider2.cpp:219] loading dataprovider dataprovider::process
I /home/user/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/gradientmachines/GradientMachine.cpp:134] Initing parameters..
I /home/user/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/gradientmachines/GradientMachine.cpp:141] Init parameters done.
F /home/user/SOFT/BAIDU/PADDLE/Paddle/paddle/cuda/src/hl_cuda_matrix.cu:322] 0x933ba8[hl_matrix_classification_error] CUDA error: invalid configuration argument
/opt/paddle/bin/paddle: line 46: 10921 Aborted (core dumped) ${DEBUGGER} $MYDIR/../opt/paddle/bin/paddle_trainer ${@:2}

I'm trying to solve a classification task with an LSTM model. My dataset is 180 examples, each roughly 5000 timesteps long (variable length). Each timestep is a length-24 float vector, labeled with an integer label in the range [0, 132].

settings.input_types = [
    dense_vector_sequence(settings.inputSize),   # 24 floats per timestep
    integer_value_sequence(settings.vocabSize)]  # one integer label per timestep, 133 classes
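
For reference, the matching PyDataProvider2 process looks roughly like this (a minimal sketch; read_example stands in for my actual file parsing):

from paddle.trainer.PyDataProvider2 import *

def init_hook(settings, **kwargs):
    settings.inputSize = 24
    settings.vocabSize = 133
    settings.input_types = [
        dense_vector_sequence(settings.inputSize),
        integer_value_sequence(settings.vocabSize)]

@provider(init_hook=init_hook)
def process(settings, file_name):
    # One yield per example: (sequence of dense vectors, sequence of labels).
    for features, labels in read_example(file_name):  # read_example is hypothetical
        yield features, labels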

Smaller batch sizes, e.g. 12, give no error, but my data is not very redundant, so the gradients become unstable. My setup is a GTX 980 Ti (6 GB VRAM); GPU memory usage at batch_size=12 is ~20%.
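
As a workaround I'm considering gradient clipping to stabilize the small-batch runs; a sketch, assuming gradient_clipping_threshold is accepted by settings() in this build:

settings(
    batch_size=12,
    learning_rate=0.001,
    learning_method=RMSPropOptimizer(),
    gradient_clipping_threshold=25.0)  # assumption: supported in this Paddle version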
trainer_config.py:

from paddle.trainer_config_helpers import *  # standard config-helper import

settings(
    batch_size=24,
    learning_rate=0.001,
    learning_method=RMSPropOptimizer())

stacked_lstm_net(
    input_dim=24,
    class_dim=133,
    hid_dim=24,
    stacked_num=7,
    is_predict=is_predict)

stacked_lstm_net:

# simple sequential lstm
def stacked_lstm_net(input_dim, class_dim, hid_dim, stacked_num, is_predict=False):
    lstm_act = TanhActivation()
    fc_act = LinearActivation()

    data = data_layer("input", size=input_dim)

    # first fc + lstm pair
    fc1 = fc_layer(input=data, size=hid_dim, act=fc_act)
    lstm1 = lstmemory(input=fc1, act=lstm_act)

    # stack the remaining fc + lstm pairs; each level
    # sees both outputs of the previous level
    inputs = [fc1, lstm1]
    for i in range(2, stacked_num + 1):
        fc = fc_layer(input=inputs, size=hid_dim, act=fc_act)
        lstm = lstmemory(input=fc, act=lstm_act)
        inputs = [fc, lstm]

    output = fc_layer(input=[inputs[0], inputs[1]], size=class_dim,
                      act=SoftmaxActivation())

    if is_predict:
        outputs(output)
    else:
        outputs(classification_cost(
            input=output, label=data_layer("label", size=class_dim)))
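
To rule out bad labels as the cause (out-of-range labels could also break the classification-error kernel), here is the sanity check I run over the dataset; a sketch, where load_labels is a placeholder for my parser:

# Verify every label lies inside [0, class_dim) before training.
class_dim = 133
for seq_labels in load_labels('data/train.list'):  # load_labels is hypothetical
    assert all(0 <= l < class_dim for l in seq_labels), seq_labels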

Could you please explain this error, or point me to how to debug this kind of issue?
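
In case it helps narrow things down, I can bisect the largest working batch_size with a small driver script. This is a sketch: it assumes trainer_config.py reads the value via get_config_arg, e.g. batch_size = get_config_arg('batch_size', int, 24), and that the crash makes paddle exit non-zero:

import subprocess

def trains_ok(n):
    # Run one pass with batch_size=n; assumption: --config_args is
    # wired into trainer_config.py via get_config_arg as above.
    cmd = ['paddle', 'train',
           '--config=trainer_config.py',
           '--config_args=batch_size=%d' % n,
           '--use_gpu=true', '--trainer_count=1',
           '--num_passes=1']
    return subprocess.call(cmd) == 0

lo, hi = 12, 24          # 12 is known good, 24 is known bad
while lo + 1 < hi:
    mid = (lo + hi) // 2
    if trains_ok(mid):
        lo = mid
    else:
        hi = mid
print('largest working batch_size:', lo)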
