@Vegoo89 Vegoo89 commented Jul 24, 2019

The standard module is not modified, so it can still be used the same way as before. Example imports:

import sparse_dot_topn.sparse_dot_topn as ct_original
import sparse_dot_topn.sparse_dot_topn_threaded as ct_thread

Example function calls:

ct_original.sparse_dot_topn(row_len, col_len, train_ptr, train_indices, train_data, g_ptr, g_indices, g_data, best_entries_len, probability_above, c_ptr, c_indicies, c_data)

ct_thread.sparse_dot_topn_threaded(row_len, col_len, train_ptr, train_indices, train_data, g_ptr, g_indices, g_data, best_entries_len, probability_above, c_ptr, c_indicies, c_data, n_jobs)
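For readers unfamiliar with the call signature, here is a minimal pure-Python sketch of what such a call appears to compute, judging by the argument names: a sparse product of two CSR matrices where each output row keeps at most `best_entries_len` values above `probability_above`. The function name `sparse_dot_topn_reference`, the strict `>` threshold, and the choice to return the output arrays rather than fill preallocated ones are assumptions made for illustration; this is not the extension's actual code.

```python
import heapq


def sparse_dot_topn_reference(n_rows, a_ptr, a_idx, a_data,
                              b_ptr, b_idx, b_data, ntop, lower_bound):
    """Multiply CSR matrix A by CSR matrix B, keeping per row at most
    `ntop` values strictly greater than `lower_bound`.

    Returns the result as CSR arrays (c_ptr, c_idx, c_data).
    """
    c_ptr, c_idx, c_data = [0], [], []
    for i in range(n_rows):
        # Accumulate row i of A @ B in a dict keyed by output column.
        sums = {}
        for jj in range(a_ptr[i], a_ptr[i + 1]):
            k, v = a_idx[jj], a_data[jj]
            for kk in range(b_ptr[k], b_ptr[k + 1]):
                col = b_idx[kk]
                sums[col] = sums.get(col, 0.0) + v * b_data[kk]
        # Drop values at or below the threshold, then keep the ntop largest.
        kept = [(val, col) for col, val in sums.items() if val > lower_bound]
        for val, col in heapq.nlargest(ntop, kept):
            c_idx.append(col)
            c_data.append(val)
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_data


# A = [[1, 0, 2], [0, 3, 0]] and B = [[1, 0], [0, 1], [4, 5]] in CSR form.
A = ([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0])
B = ([0, 1, 2, 4], [0, 1, 0, 1], [1.0, 1.0, 4.0, 5.0])
result = sparse_dot_topn_reference(2, *A, *B, ntop=1, lower_bound=0.0)
```

The compiled functions take the output arrays (`c_ptr`, `c_indicies`, `c_data`) as arguments instead of returning them, which is why the calls above pass them in.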

Speed comparison using two matrices:

  • tf_idf_train_names is <100000x61134 sparse matrix of type '<class 'numpy.float64'>'
  • tf_idf_g_names is <450256x61134 sparse matrix of type '<class 'numpy.float64'>'

Results using the original function and the threaded function with 1 thread:

[Image: cosine_topn_comparision]

Threaded function results using 8 threads:

[Image: cosine_topn_8_threads]

The only shared file with major changes is setup.py.

It now builds two extensions and packs them into a single Python module. I also modified extra_compile_args to optimize for performance instead of size.

Tested on Linux and Windows. I don't know how, or if, it will compile on Mac with these settings.
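As a rough illustration of the setup.py change, the sketch below picks speed-oriented compiler flags per platform and describes the two extensions as plain dictionaries. The helper names `optimization_flags` and `extension_specs` are hypothetical, and the exact flags used in the PR are not shown in this thread; `/O2` and `-O3` are the usual speed settings for MSVC and GCC/Clang, while `/O1` and `-Os` favor size.

```python
import sys


def optimization_flags(platform=sys.platform, prefer="speed"):
    """Return compiler optimization flags for the given platform string."""
    if platform.startswith("win"):
        # MSVC: /O2 maximizes speed, /O1 minimizes size.
        return ["/O2"] if prefer == "speed" else ["/O1"]
    # GCC / Clang: -O3 optimizes for speed, -Os for size.
    return ["-O3"] if prefer == "speed" else ["-Os"]


def extension_specs(platform=sys.platform):
    """Describe the two extensions; the module names follow the imports
    quoted earlier in this comment, everything else is an assumption."""
    flags = optimization_flags(platform)
    return [
        {"name": "sparse_dot_topn.sparse_dot_topn",
         "extra_compile_args": flags},
        {"name": "sparse_dot_topn.sparse_dot_topn_threaded",
         "extra_compile_args": flags},
    ]
```

In a real setup.py each dictionary would become a setuptools `Extension` with its own source list. macOS would take the non-Windows branch here, which is the untested case mentioned above.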

It's also worth mentioning that threading speeds up the process significantly only under certain conditions. If there is a CPU-to-memory bottleneck (as on my work laptop), running more than 1 thread will slow the calculations down.
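To make the role of `n_jobs` concrete, here is a small sketch of one way to parallelize a top-n sparse product: split the output rows into contiguous chunks, compute each chunk in its own worker, and stitch the results back together in row order. The chunking scheme and the names `split_rows`, `topn_for_rows`, and `topn_threaded` are assumptions for illustration, not the extension's implementation. Pure-Python threads will not speed up this CPU-bound loop because of the GIL; the sketch only shows the partitioning idea and that the result does not depend on the number of workers.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def split_rows(n_rows, n_jobs):
    """Split range(n_rows) into at most n_jobs contiguous, near-equal chunks."""
    n_jobs = max(1, min(n_jobs, n_rows))
    base, extra = divmod(n_rows, n_jobs)
    bounds, start = [], 0
    for j in range(n_jobs):
        stop = start + base + (1 if j < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def topn_for_rows(start, stop, a, b, ntop, lower_bound):
    """Top-n product rows for rows [start, stop); a and b are
    (ptr, indices, data) CSR triples."""
    a_ptr, a_idx, a_data = a
    b_ptr, b_idx, b_data = b
    rows = []
    for i in range(start, stop):
        sums = {}
        for jj in range(a_ptr[i], a_ptr[i + 1]):
            k, v = a_idx[jj], a_data[jj]
            for kk in range(b_ptr[k], b_ptr[k + 1]):
                col = b_idx[kk]
                sums[col] = sums.get(col, 0.0) + v * b_data[kk]
        kept = [(val, col) for col, val in sums.items() if val > lower_bound]
        rows.append(heapq.nlargest(ntop, kept))
    return rows


def topn_threaded(n_rows, a, b, ntop, lower_bound, n_jobs):
    """Run topn_for_rows on row chunks in parallel and merge into CSR arrays."""
    chunks = split_rows(n_rows, n_jobs)
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        futures = [pool.submit(topn_for_rows, s, e, a, b, ntop, lower_bound)
                   for s, e in chunks]
        parts = [f.result() for f in futures]  # collected in chunk order
    c_ptr, c_idx, c_data = [0], [], []
    for part in parts:
        for row in part:
            for val, col in row:
                c_idx.append(col)
                c_data.append(val)
            c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_data
```

Each worker streams through its own slice of both matrices, so all workers compete for the same memory bandwidth. That is consistent with the slowdown described above on machines where memory access, not compute, is the limiting factor.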


aerdem4 commented Jul 25, 2019

@Vegoo89 Thanks for your PR! Can you add your test to comparison.py? Can you also squash the commits into one?

@ymwdalex can you review too?


Vegoo89 commented Jul 26, 2019

> @Vegoo89 Thanks for your PR! Can you add your test to comparison.py? Can you also squash the commits into one?
>
> @ymwdalex can you review too?

Updated comparison files and squashed commits

@ymwdalex
Collaborator

@Vegoo89 thanks for the pull request. It looks good!

Could you please also update the readme.md file to show this new parallel implementation?

Commit squash

First Commit

Parallel implementation

Comparision added
Awesome cossim top modified
Duplicated functions/structs moved to single file

Readme and example update

Vegoo89 commented Aug 1, 2019

> @Vegoo89 thanks for the pull request. It looks good!
>
> Could you please also update the readme.md file to show this new parallel implementation?

Done

@ymwdalex ymwdalex merged commit f9e99ed into ing-bank:master Aug 6, 2019
This was referenced Aug 22, 2019