Should be able to cluster on existing pairwise distance matrices #769

@drroe

Currently the cluster command requires you to specify some sort of data set to cluster on, even when a pairwise distance matrix already exists and nothing else is strictly necessary. The issue was raised by user Tim Davis on the Amber mailing list. Here's what I posted in response:

Hi,

You can load a pairwise distance file either via a 'readdata' command
prior to cluster or by using the 'loadpairdist' option, e.g.

readdata Cmatrix.cmatrix name PW
runanalysis cluster crd1 pairdist PW ...

or

cluster C1 loadpairdist pairdist pw.out ...

If you don't want to do any clustering you can use the 'readinfo'
option to read the results of previous clustering.

I guess it can be a bit annoying to have to specify something to
cluster when you just want a summary. I never really considered that
case, to be honest. If you want, you can "fool" cpptraj by creating a
"fake" data set that has the same size as the data you want to
cluster, then reading it in and "clustering" on it after reading in
your pairwise distance matrix. Here's an example where I've modified
one of cpptraj's cluster test cases to do just that.

Original:

# Test loading PW distances from Cmatrix file
cat > cluster.in <<EOF
readdata Cmatrix.cmatrix name PW
parm ../tz2.parm7
loadtraj ../tz2.nc name MyTraj
runanalysis cluster crd1 crdset MyTraj :2-10 clusters 3 epsilon 4.0 \
                    summary summary2.dat \
                    complete nofit pairdist PW \
                    cpopvtime normpop.agr normpop
EOF
cpptraj -i cluster.in

The trajectory tz2.nc has 101 frames, so I create a fake data set with
101 entries to cluster via something like:

#!/bin/bash
rm -f fakedata.dat
for ((i=1; i<= 101; i++)) ; do
  echo "$i" >> fakedata.dat
done
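
The frame count is hard-coded in the loop above. A minimal sketch of the same idea with the count passed as an argument, assuming (as above) that one value per line is all the fake data set needs. The function name make_fake_data and its arguments are hypothetical and not part of cpptraj:

```shell
#!/bin/bash
# Hypothetical helper (not part of cpptraj): write N dummy values, one per
# line, so the fake data set has one entry per frame of the pairwise matrix.
make_fake_data() {
  local nframes=$1
  local outfile=${2:-fakedata.dat}
  # seq prints 1..N, one number per line; '>' truncates any existing file,
  # so no separate rm is needed.
  seq 1 "$nframes" > "$outfile"
}

# 101 entries, matching the 101 frames of tz2.nc.
make_fake_data 101 fakedata.dat
```

Because the output file is truncated on every call, the helper can be rerun safely with a different count.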

Then you can use the following modified input. Note that 'loadtraj' is
replaced by a 'readdata' of the fake data set, and 'crdset MyTraj' by
'data MyData nocoords':

# Test loading PW distances from Cmatrix file
cat > cluster.in <<EOF
readdata Cmatrix.cmatrix name PW
parm ../tz2.parm7
readdata fakedata.dat name MyData
runanalysis cluster crd1 data MyData nocoords :2-10 clusters 3 epsilon 4.0 \
                    summary summary2.dat \
                    complete nofit pairdist PW \
                    cpopvtime normpop.agr normpop
EOF
cpptraj -i cluster.in

That seems to work just fine. I'll add a feature request to cpptraj
GitHub to make clustering on existing pairwise distance matrices
easier. Thanks for the report!
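
Since the workaround relies on the fake data set having the same size as the pairwise distance data, a quick check before running cpptraj can catch a mismatch early. This is only a sketch, assuming the expected frame count is known in advance; check_entries is a hypothetical helper, not a cpptraj feature:

```shell
#!/bin/bash
# Hypothetical helper: succeed only if FILE has exactly EXPECTED lines.
check_entries() {
  local file=$1 expected=$2
  local actual
  actual=$(wc -l < "$file")
  # Arithmetic comparison tolerates the whitespace padding that some
  # wc implementations put in front of the count.
  if (( actual != expected )); then
    echo "Error: $file has $((actual)) entries, expected $expected" >&2
    return 1
  fi
}

# Example: 101 entries, as for the 101 frames of tz2.nc.
seq 1 101 > fakedata.dat
check_entries fakedata.dat 101 && echo "fakedata.dat OK"
```

The check only counts lines in the text file; it does not inspect the pairwise distance matrix itself.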
