Faster Hierarchical Agglomerative clustering, output clustering metrics to Data Sets #939

drroe · 2022-01-31T15:32:29Z

Version 6.2.3.

This PR contains the following improvements.

Improve the speed of hierarchical agglomerative clustering by an order of magnitude.
Create data sets for Davies-Bouldin index, pseudo-F, and SSR/SST ratio clustering metrics.
Check for and fix a potential integer overflow when setting the random number generator seed from the wall clock.
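The commit messages below describe how the first item was achieved. Instead of rescanning every cluster pair to find the closest one, the code keeps track of the closest cluster to each cluster and updates that only as needed, and FindMin orders the returned pair so that iOut < jOut. A minimal Python sketch of that bookkeeping follows. It assumes average linkage with the Lance-Williams update and a dense distance matrix; the function name `average_linkage_nn` is hypothetical, and this is not cpptraj's C++ implementation.

```python
import math

def average_linkage_nn(dist, n_clusters):
    """Hypothetical sketch: average-linkage agglomerative clustering that caches
    each active cluster's nearest active neighbor, so finding the closest pair
    is an O(N) scan of the cache instead of an O(N^2) scan of the matrix."""
    n = len(dist)
    d = [row[:] for row in dist]   # working copy of the distance matrix
    active = [True] * n
    size = [1] * n
    label = list(range(n))         # label[f] = cluster that frame f belongs to
    nn = [0] * n                   # nn[k] = nearest active cluster to cluster k
    nn_dist = [math.inf] * n

    def rescan(k):
        # Full O(N) search for the nearest active neighbor of cluster k.
        nn_dist[k], nn[k] = math.inf, k
        for m in range(n):
            if active[m] and m != k and d[k][m] < nn_dist[k]:
                nn_dist[k], nn[k] = d[k][m], m

    for k in range(n):
        rescan(k)

    n_active = n
    while n_active > max(n_clusters, 1):
        # FindMin: scan the cached nearest distances, then order the pair so i < j.
        i = min((k for k in range(n) if active[k]), key=lambda k: nn_dist[k])
        j = nn[i]
        if i > j:
            i, j = j, i
        # Lance-Williams update for average linkage; cluster j is merged into i.
        for k in range(n):
            if active[k] and k not in (i, j):
                d[i][k] = d[k][i] = (size[i] * d[i][k] + size[j] * d[j][k]) / (size[i] + size[j])
        size[i] += size[j]
        active[j] = False
        label = [i if c == j else c for c in label]
        n_active -= 1
        # Refresh the cache only where it may have changed.
        for k in range(n):
            if not active[k] or k == i:
                continue
            if nn[k] in (i, j):
                rescan(k)  # old nearest neighbor was merged: full rescan
            elif d[k][i] < nn_dist[k]:
                nn_dist[k], nn[k] = d[k][i], i  # merged cluster is now closer
        rescan(i)
    return label

xs = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(a - b) for b in xs] for a in xs]
print(average_linkage_nn(dist, 2))  # [0, 0, 2, 2]
```

Clusters whose cached neighbor was one of the two merged clusters get a full rescan; every other cluster only has to be compared against the single merged cluster, so in the typical case each merge costs O(N) work rather than an O(N^2) search for the minimum.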
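For the second item, the three metrics have standard textbook definitions. The sketch below computes them for Euclidean points with integer cluster labels: the Davies-Bouldin index (lower is better), pseudo-F in its usual Calinski-Harabasz form (between-cluster over within-cluster variance, higher is better), and SSR/SST (the fraction of the total sum of squares explained by the clustering). It is only an illustration of the definitions: cpptraj works with whatever distance metric the clustering was configured with (for example coordinate RMSD), so its values will not match this plain Euclidean version exactly, and `cluster_metrics` is a hypothetical name.

```python
import math

def _centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    return [sum(axis) / len(points) for axis in zip(*points)]

def cluster_metrics(points, labels):
    """Return (DBI, pseudo-F, SSR/SST) for Euclidean points; needs 1 < k < n clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    ids = sorted(clusters)
    k, n = len(ids), len(points)
    cent = {c: _centroid(clusters[c]) for c in ids}
    overall = _centroid(points)

    # Within-cluster (SSE) and between-cluster (SSR) sums of squares; SST = SSR + SSE.
    sse = sum(math.dist(p, cent[c]) ** 2 for c in ids for p in clusters[c])
    ssr = sum(len(clusters[c]) * math.dist(cent[c], overall) ** 2 for c in ids)
    sst = ssr + sse
    pseudo_f = (ssr / (k - 1)) / (sse / (n - k))

    # Davies-Bouldin: average over clusters of the worst (s_i + s_j) / d(c_i, c_j),
    # where s_i is the mean distance of cluster i's points to its centroid.
    s = {c: sum(math.dist(p, cent[c]) for p in clusters[c]) / len(clusters[c]) for c in ids}
    dbi = sum(
        max((s[a] + s[b]) / math.dist(cent[a], cent[b]) for b in ids if b != a)
        for a in ids
    ) / k
    return dbi, pseudo_f, ssr / sst

# Two well-separated pairs of 1-D points.
print(cluster_metrics([[0.0], [1.0], [10.0], [11.0]], [0, 0, 1, 1]))
# DBI 0.1, pseudo-F 200.0, SSR/SST ~0.990
```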
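The third item, together with the two seed commits below, concerns a random number seed derived from the wall clock. The Python sketch below only illustrates the failure mode, under the assumption that the seed ends up in a 32-bit signed int: a fine-grained clock value such as microseconds since the epoch is far larger than 2**31 - 1, so a plain narrowing conversion wraps it (emulated here with `ctypes.c_int32`, which does no overflow checking), possibly to a negative number. Reducing the value modulo INT32_MAX keeps it in range while it still changes from one microsecond to the next. The helper names are hypothetical, and cpptraj's actual C++ fix may differ.

```python
import ctypes
import time

INT32_MAX = 2**31 - 1

def narrowed_seed(usec):
    """Emulate storing a large clock value in a 32-bit int: no check, it wraps."""
    return ctypes.c_int32(usec).value

def safe_seed(usec):
    """Fold a large clock value into [0, INT32_MAX - 1] so it always fits."""
    return usec % INT32_MAX

def wallclock_seed():
    usec = time.time_ns() // 1000  # microseconds since the epoch (Python 3.7+)
    return safe_seed(usec)

usec = time.time_ns() // 1000
print(usec > INT32_MAX)     # True: the raw count does not fit in 32 bits
print(narrowed_seed(usec))  # wrapped value, possibly negative
print(wallclock_seed())     # always within range
```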

drroe · 2022-01-31T19:17:07Z

The pytraj failure is because I added 3 new data sets during clustering, which invalidates the offset pytraj uses to access the cluster data:

https://jenkins.jasonswails.com/blue/organizations/jenkins/amber-github%2Fpytraj/detail/pytraj/414/pipeline#step-22-log-663

@hainm Is there any reason why we're using data[-2] and not data[0] or something similar? The former (second from the end, I think) means that every time a new data set is added to cluster, the index becomes invalid. Can we make them [0]? I would test myself but I'm having issues getting pytraj compiled on my platform (unrelated glibc issues).
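The problem with a negative index can be seen in a toy model. Assuming new data sets are appended to the end of the list a cluster run produces (the set names below are hypothetical stand-ins, not cpptraj's actual set names or order), an index counted from the end such as [-2] shifts whenever sets are added, while an index counted from the front such as [0] does not:

```python
def cluster_output(with_metrics):
    """Hypothetical list of data sets produced by one cluster run."""
    sets = ["cluster_set", "other_set"]
    if with_metrics:
        sets += ["DBI", "pSF", "SSRSST"]  # three new metric sets appended at the end
    return sets

old = cluster_output(with_metrics=False)
new = cluster_output(with_metrics=True)
print(old[-2], new[-2])  # cluster_set pSF
print(old[0], new[0])    # cluster_set cluster_set
```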

hainm · 2022-01-31T20:55:32Z

> Can we make them [0]? I would test myself but I'm having issues getting pytraj compiled on my platform (unrelated glibc issues).

For the record: Amber-MD/pytraj#1598 (comment)

drroe added 22 commits January 24, 2022 12:54

Start experimental command to compare clusters

e0e94f8

Report some stats

9a0d0c0

Use time instead of wall time to seed RNG. Protect against int overflow of seed

dbbfc7b

Use wallclock time again, finer grained so less likelihood of rapid runs in succession using the same seed I hope

b1b3b0e

Add missing letter

2e6fd7c

Put DBI in separate file

993234a

Put pseudo F calc in separate file

60c0098

Take DBI/pseudo-F calc out of list and store those metrics in Control, have them calculated at the end of the run instead of just when info happens. This way they can eventually be put into datasets.

03d50ea

Allow cpptraj to be set via env var

a148105

Add data sets for DBI and pseudo-F

fbc13e9

Store SSR/SST value in set. Update manual.

effc938

Fix TIMER compile

813a3e9

Start exploring a different way of speeding up finding cluster to cluster min distance by keeping track of closest cluster to each cluster and updating as needed.

6aad510

FindMin needs to ensure iOut < jOut

2dd9db6

Disable some debug info

f2be4d3

Bench hieragglo

44073b6

Remove old code and openmp code. Finding min is no longer the bottleneck

ed8c66c

Check openmp

f411dea

Remove debug info

abb8c01

Merge branch 'master' into cluster.work

68f24a8

Update copyright year. Add Franz and Klaus to contributor list.

33399a4

Version 6.2.3. Revision bump for HA cluster speed improvement and DBI/pSF/SSRSST DataSet output.

53a7926

drroe added the enhancement and bugfix labels Jan 31, 2022

drroe self-assigned this Jan 31, 2022

drroe added 2 commits January 31, 2022 12:52

Test cluster metrics write out

5946e22

Allocate cluster metric sets last to try to fix pytraj

c58e003

hainm mentioned this pull request Jan 31, 2022

Use positive index instead of negative to select data set in cluster test case Amber-MD/pytraj#1598

Merged

drroe merged commit caf123a into Amber-MD:master Jan 31, 2022

drroe deleted the cluster.work branch January 31, 2022 23:31
