Parallel implementation #11
Merged
The standard module is unchanged, so it can still be used the same way as before. Example imports:
import sparse_dot_topn.sparse_dot_topn as ct_original
import sparse_dot_topn.sparse_dot_topn_threaded as ct_thread
Example function calls:
ct_original.sparse_dot_topn(row_len, col_len, train_ptr, train_indices, train_data, g_ptr, g_indices, g_data, best_entries_len, probability_above, c_ptr, c_indicies, c_data)
ct_thread.sparse_dot_topn_threaded(row_len, col_len, train_ptr, train_indices, train_data, g_ptr, g_indices, g_data, best_entries_len, probability_above, c_ptr, c_indicies, c_data, n_jobs)
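To make the role of n_jobs concrete, here is a minimal pure-Python sketch of one way a threaded top-n sparse product can be organized: rows of the left matrix are split into contiguous blocks and each block is handled by its own worker. This is an assumption about the general approach, not the extension's actual C++ code. The names row_blocks, top_n_block and dot_topn_threaded, and the dict-based matrix layout, are hypothetical and exist only for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq


def row_blocks(n_rows, n_jobs):
    """Split range(n_rows) into at most n_jobs contiguous (start, stop) blocks."""
    n_jobs = max(1, min(n_jobs, n_rows))
    base, extra = divmod(n_rows, n_jobs)
    blocks, start = [], 0
    for job in range(n_jobs):
        # The first `extra` blocks get one additional row each.
        stop = start + base + (1 if job < extra else 0)
        blocks.append((start, stop))
        start = stop
    return blocks


def top_n_block(a_rows, b_rows, start, stop, ntop, lower_bound):
    """For rows start..stop-1 of A, keep the ntop largest products above lower_bound.

    a_rows: list of {column: value} dicts, one per row of A (hypothetical layout).
    b_rows: {row: {column: value}} dict holding the non-zeros of B.
    """
    out = []
    for i in range(start, stop):
        scores = {}
        for k, a_val in a_rows[i].items():
            for j, b_val in b_rows.get(k, {}).items():
                scores[j] = scores.get(j, 0.0) + a_val * b_val
        kept = [(j, v) for j, v in scores.items() if v > lower_bound]
        out.append(heapq.nlargest(ntop, kept, key=lambda jv: jv[1]))
    return out


def dot_topn_threaded(a_rows, b_rows, ntop, lower_bound=0.0, n_jobs=1):
    """Compute per-row top-n results, one worker per row block."""
    blocks = row_blocks(len(a_rows), n_jobs)
    with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
        parts = pool.map(
            lambda b: top_n_block(a_rows, b_rows, b[0], b[1], ntop, lower_bound),
            blocks,
        )
        # pool.map preserves block order, so rows come back in original order.
        return [row for part in parts for row in part]
```

Because each row's result depends only on that row of the left matrix, the blocks do not need to coordinate, and the output is the same for any n_jobs. Pure-Python threads will not run this faster because of the interpreter lock; the sketch only shows how the work is divided.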
Speed comparison using two matrices:
tf_idf_train_names is a 100000x61134 sparse matrix of type numpy.float64
tf_idf_g_names is a 450256x61134 sparse matrix of type numpy.float64
Results using original function and threaded function with 1 thread:
Threaded function results using 8 threads:
The only shared file with major changes is setup.py. It now builds two extensions and packs them into a single Python module. I also modified extra_compile_args to optimize for performance instead of size. Tested on Linux and Windows; I don't know how, or whether, it will compile on Mac with these settings.
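As a rough illustration of the setup.py change described above, the sketch below picks speed-oriented compiler flags per platform and describes two extensions that share them. The specific flags and the source file paths are assumptions made for this example, not the values used in this pull request. The dict keys mirror the name, sources and extra_compile_args arguments of setuptools.Extension, and the helper names are hypothetical.

```python
import os


def optimization_flags(os_name=os.name):
    """Return speed-oriented compiler flags (illustrative values only)."""
    if os_name == "nt":
        return ["/O2"]  # MSVC: favor speed (/O1 would favor size)
    return ["-O3"]      # GCC/Clang: favor speed (-Os would favor size)


def extension_specs(os_name=os.name):
    """Describe the two extensions as keyword arguments for setuptools.Extension."""
    flags = optimization_flags(os_name)
    return [
        {
            "name": "sparse_dot_topn.sparse_dot_topn",
            "sources": ["sparse_dot_topn/sparse_dot_topn.pyx"],  # hypothetical path
            "extra_compile_args": flags,
        },
        {
            "name": "sparse_dot_topn.sparse_dot_topn_threaded",
            "sources": ["sparse_dot_topn/sparse_dot_topn_threaded.pyx"],  # hypothetical path
            "extra_compile_args": flags,
        },
    ]
```

In a real setup.py each dict would be unpacked into an Extension object and the list passed to setup() as ext_modules, which is how both modules end up in the same package.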
It is also worth mentioning that threading speeds up the process significantly only under certain conditions. If there is a CPU-to-memory bottleneck (as on my work laptop), running more than one thread will slow the calculations down.