Liblinear (Linear SVMs) does not train, exits with "index out of range" in Accord.Math #330
Description
Hi there, I was trying to reproduce the simple example using the a9a dataset (included).
So I compiled the library into train.exe with Visual Studio 2013.
After a while everything compiled fine, so I ran it with the command-line parameters: -s 2 a9a
(the 'a9a' file has been placed in the same directory).
The file is read perfectly, but on calling train(problem, parameters); the program exits with this error:
(this is the console output)
L2RegularizedL2LossSvc
iter 1 act 1.174E+003 pre 1.161E+003 delta 5.718E-001 f 2.348E+003 |g| 6.779E+003 CG 2
cg reaches trust region boundary
iter 2 act 1.424E+002 pre 1.253E+002 delta 6.722E-001 f 1.174E+003 |g| 4.085E+001 CG 4
iter 3 act 3.261E+001 pre 2.966E+001 delta 6.722E-001 f 1.032E+003 |g| 3.819E+001 CG 6
cg reaches trust region boundary
iter 4 act 5.202E+000 pre 4.930E+000 delta 7.117E-001 f 9.989E+002 |g| 2.083E+001 CG 14
A first chance exception of type 'System.IndexOutOfRangeException' occurred in Accord.Math.dll
The program '[0x548] linear.vshost.exe' has exited with code 1 (0x1).
Any clue?
I could not debug any deeper; I don't know all the implementation tricks and issues!
Comment
Actually, I saw that all the matrix math and vector training is performed over dense matrices.
Therefore I cannot load a huge sparse problem into memory; I tried, and it cannot even be read.
The LIBLINEAR C++ code handles this internally with sparse arrays of (index, value) pairs and is really very fast: it trains on a whole 99-megabyte text file (240k samples, 70 jagged parameters) in just under 2 seconds. The same data in C# does not finish after several hours.
Another thing
I want to know if the 'model' files are compatible between the original C++ code and your C# version, and whether the support vectors are loaded the same way, so that I could train with the original C++ code, load the model file in C#, and just use Decide().
Am I right?
And thanks for such a good job!
I guess a sparse vector implementation may be faster and less memory-hungry than the dense one (on sparse data, of course).
I am doing lots of NLP work and currently use C#, so I need your code. I am using some code I've developed on my own, but you implement new algorithms faster than I can keep up with.
I even asked you about CRFs some time ago and you just did it!
The problem is the sparse data: I have tons of training corpora, and the problem does not fit in memory (I have only a miserable 8 GB on Windows 10 x64, and sometimes I guess it needs 120 GB or more).
I am also thinking about using CUDA and optimized code, because training a deep belief network with more than 10k dimensions and several deep layers becomes impossible on human timescales (weeks of training),
while with CUDA it can take just a few minutes, rarely going into hours.
Best regards, and I hope we can find this bug, or whatever it is.